After the model is trained, you will see a series of metrics in the Statistics area that show how well the classifier would predict new data. These metrics are key to understanding your model and how you can improve it.

You can see examples of these metrics in this public Deals Classifier module.


Accuracy

The accuracy is the percentage of samples that were predicted in the correct category.

It’s a metric that shows how well a parent category distinguishes between its children categories. In the previous example, the Root category has an accuracy of 80% when distinguishing between its 6 children (Entertainment & Recreation, Food & Drinks, Health & Beauty, Miscellaneous, Retail and Travel & Vacations).
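
To make the definition concrete, here is a minimal sketch of how accuracy can be computed from a list of tagged categories and a list of predicted categories. The `accuracy` function and the five samples are made up for illustration; they are not part of the product.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose predicted category matches the tagged one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one sample")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical tags and predictions for five samples.
actual = ["Travel & Vacations", "Retail", "Food & Drinks", "Retail", "Health & Beauty"]
predicted = ["Entertainment & Recreation", "Retail", "Food & Drinks", "Retail", "Health & Beauty"]

score = accuracy(actual, predicted)
print(f"{score:.0%}")  # 4 of 5 samples are correct: 80%
```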

Tips on improving the Accuracy:

  • Add more training samples to its children categories.
  • Retag samples that might be incorrectly tagged into the children categories (see confusion matrix section below).
  • Sometimes sibling categories could be too ambiguous. If possible, we recommend merging those categories.

Accuracy on its own is not a good metric; you also have to look at precision and recall. You can have a classifier with very good accuracy that still has categories with bad precision and recall.
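
As a rough illustration of why accuracy alone can mislead, the sketch below uses made-up data where one category is much larger than the other. A classifier that always predicts the large category scores high accuracy while never finding a single sample of the small one.

```python
# Hypothetical, heavily unbalanced data: 95 Retail samples, 5 Travel samples.
actual = ["Retail"] * 95 + ["Travel & Vacations"] * 5
# A classifier that predicts Retail for everything.
predicted = ["Retail"] * 100

overall_accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Recall for Travel & Vacations: how many of its samples were found.
travel_total = actual.count("Travel & Vacations")
travel_found = sum(a == p == "Travel & Vacations" for a, p in zip(actual, predicted))
travel_recall = travel_found / travel_total

print(overall_accuracy, travel_recall)  # 0.95 0.0
```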

Precision and Recall

Precision and Recall are useful metrics to check the accuracy on each child category:

If a child category has low precision, it means that samples from other sibling categories were predicted as this child category, also known as false positives.

If a child category has a low recall, that means that samples from this child category were predicted as other sibling categories, also known as false negatives.

Usually, there’s a trade-off between precision and recall in a particular category. That means that if you try to increase precision, you could end up doing so at the cost of lowering recall, and vice versa.
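
The definitions above can be sketched in a few lines of Python. This is only an illustration: `precision_recall` is a hypothetical helper and the samples are invented, with category names shortened for readability.

```python
def precision_recall(actual, predicted, category):
    """Precision and recall of one category from tagged and predicted labels."""
    pairs = list(zip(actual, predicted))
    # Correctly predicted as this category.
    tp = sum(a == category and p == category for a, p in pairs)
    # False positives: samples from sibling categories predicted as this one.
    fp = sum(a != category and p == category for a, p in pairs)
    # False negatives: samples from this category predicted as a sibling.
    fn = sum(a == category and p != category for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical tags and predictions for six samples.
actual = ["Travel", "Travel", "Travel", "Entertainment", "Entertainment", "Retail"]
predicted = ["Travel", "Entertainment", "Entertainment", "Entertainment", "Entertainment", "Travel"]

travel_precision, travel_recall = precision_recall(actual, predicted, "Travel")
ent_precision, ent_recall = precision_recall(actual, predicted, "Entertainment")
```

In this example Travel has one false positive and two false negatives, so both its precision and its recall are low, while Entertainment finds all of its own samples (perfect recall) but also absorbs two Travel samples (low precision).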

By using the confusion matrix, you can see the false positives and false negatives of your model.

Tips on improving Precision and Recall:

  • By using the confusion matrix, you can explore the false positives and false negatives of your model.
  • If a sample was initially tagged as child category X but was correctly predicted as child category Y, move that sample to child category Y.
  • If the sample was incorrectly predicted as child category Y, try to make the classifier learn more about the difference by adding more samples to both category X and category Y.
  • Check that the keywords associated with child categories X and Y are correct (see Keyword Cloud section to see how to fix that).

Confusion Matrix

After selecting a parent category in the category tree, you can see the confusion matrix which shows the confusion between the actual category and the predicted category for its children categories:

In the previous example, we can see that 4 samples that were tagged as Travel & Vacations were incorrectly predicted as Entertainment (the red 4 in the bottom-left corner of the matrix).

You get perfect results if you obtain a confusion matrix that has non-zero numbers only in its diagonal.
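
If it helps to see the idea in code, here is a minimal sketch of a confusion matrix built from tagged and predicted categories. The helper names and the data are assumptions made for illustration; the data is chosen so that 4 Travel & Vacations samples are predicted as Entertainment & Recreation.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count samples for every (actual, predicted) category pair."""
    return Counter(zip(actual, predicted))


def confusions(matrix):
    """Keep only the off-diagonal cells, i.e. the misclassified pairs."""
    return {pair: count for pair, count in matrix.items() if pair[0] != pair[1]}


# Hypothetical data: 6 Travel samples (4 of them mispredicted) and 3 Entertainment samples.
actual = ["Travel & Vacations"] * 6 + ["Entertainment & Recreation"] * 3
predicted = (
    ["Travel & Vacations"] * 2
    + ["Entertainment & Recreation"] * 4
    + ["Entertainment & Recreation"] * 3
)

matrix = confusion_matrix(actual, predicted)
errors = confusions(matrix)
```

An empty `errors` dictionary corresponds to a matrix with non-zero numbers only on its diagonal.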

You can click that particular number to see the corresponding samples in the Samples section. In the example below you can see the samples that were tagged as ‘Travel & Vacations’ but predicted as ‘Entertainment’.

Here you can fix the problem as described in the previous sections, when the solution is to tag or retag samples:

  • You can select samples with the checkbox on the left or with the X or Space shortcut keys.
  • You can paginate by using the left and right arrow keys.
  • You can delete or move samples to categories by using the Actions menu after selecting the corresponding samples.
  • See shortcuts by hitting Ctrl + h.

Keyword Cloud

You can see the keywords correlated to each category by selecting the corresponding category in the Tree section. In this example you can see the keyword cloud for the Food & Drinks category:

Tips on improving the Keywords:

  • Check that the keywords used to represent samples (the dictionary) and correlated to each category make sense.
  • Discover keywords that should not be in the dictionary or should not be correlated with that particular category.
  • You can see a more detailed list of keywords and their relevance by clicking the Keyword List link below the keyword cloud.
  • You can click a particular keyword (either in the cloud or the list) to filter the samples that match with that particular keyword.
  • Filter undesired keywords by adding the particular string into the stopwords list (see Parameters section below).
  • If a keyword that is useful to represent your category is missing from your list of keywords, try adding more data to your model that uses that specific term.
  • When using a Naive Bayes classifier, you can take a look at the features that have a positive or negative influence on the prediction of each category. That’s particularly useful for debugging the features used by, and correlated to, each category.
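
To give an intuition for what a positive or negative influence means, the sketch below scores each word by how much more likely it is inside a category than outside it, in the spirit of Naive Bayes. It is an assumption-laden illustration, not the product's actual computation: `keyword_influence`, the whitespace tokenization, the add-one smoothing and the four samples are all hypothetical.

```python
import math
from collections import Counter


def keyword_influence(samples, category):
    """Score words by log P(word | category) - log P(word | other categories).

    samples is a list of (text, category) pairs. Positive scores push a
    prediction toward the category, negative scores push it away.
    """
    inside, outside = Counter(), Counter()
    for text, tag in samples:
        words = text.lower().split()
        (inside if tag == category else outside).update(words)
    vocab = set(inside) | set(outside)
    # Add-one smoothing so unseen words never produce log(0).
    in_total = sum(inside.values()) + len(vocab)
    out_total = sum(outside.values()) + len(vocab)
    return {
        word: math.log((inside[word] + 1) / in_total)
        - math.log((outside[word] + 1) / out_total)
        for word in vocab
    }


# Hypothetical tagged samples.
samples = [
    ("pizza and wine dinner", "Food & Drinks"),
    ("wine tasting tour", "Food & Drinks"),
    ("beach hotel tour", "Travel & Vacations"),
    ("hotel spa weekend", "Travel & Vacations"),
]

scores = keyword_influence(samples, "Food & Drinks")
```

Here "wine" gets a positive score for Food & Drinks and "hotel" a negative one. A keyword with a strong score in a category where it makes no sense is a good candidate for the stopwords list.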


Parameters

You can set special parameters in the classifier that affect its behavior and can considerably improve the prediction accuracy.

Tips on improving Parameters:

  • Add keywords to the stopwords list if you want to prevent the classifier from using them as keywords.
  • Use Multinomial Naive Bayes while developing the classifier, as it gives you more insight into the predictions and more debugging information. Switch to Support Vector Machines when you finish developing the classifier to get some extra accuracy.
  • Enable stemming (to transform words into their roots) when useful for your particular case.
  • Try increasing the max features parameter, up to the maximum of 20,000.
  • Don’t filter default stopwords if you’re working with sentiment analysis.
  • Enable Preprocess social media when working with social media texts like tweets or Facebook comments.
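
To show roughly how the stopwords list and the max features parameter shape the keywords a classifier can use, here is a simplified sketch. `build_dictionary` is a hypothetical helper with naive whitespace tokenization; it does not reproduce the classifier's real feature extraction and leaves out stemming and social media preprocessing.

```python
from collections import Counter


def build_dictionary(texts, stopwords=frozenset(), max_features=20000):
    """Return the most frequent words, skipping stopwords, capped at max_features."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stopwords
    )
    return [word for word, _ in counts.most_common(max_features)]


# Hypothetical training texts and stopwords.
texts = [
    "the best pizza in town",
    "the best hotel in town",
    "pizza and pasta deals",
]
stopwords = {"the", "in", "and"}

small_dictionary = build_dictionary(texts, stopwords, max_features=3)
full_dictionary = build_dictionary(texts, stopwords)
```

With `max_features=3` only the three most frequent words survive, so rarer but possibly useful words such as "pasta" are dropped. Raising the limit keeps them available as keywords.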

Final words

Machine learning is a really powerful technology but in order to have an accurate model, you may need to iterate until you achieve the results you are looking for.

To reach the minimum accuracy, precision and recall you require, you will need to repeat steps 1 to 5 of the process, that is:

  1. Refine your Category tree.
  2. Gather more data.
  3. Tag more data.
  4. Upload the new data and retrain the classifier.
  5. Test and improve:
     • Metrics (accuracy, precision and recall).
     • False positives & false negatives.
     • Confusion matrix.
     • Keyword cloud and keyword list.
     • Parameters.

This process can be done with two options:

  • Manually tagging the additional data.
  • Bootstrapping, that is, using the currently trained model to classify untagged samples and then verifying that each prediction is correct. Usually verifying the tags is easier (and faster) than manually tagging the samples from scratch.
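
The bootstrapping option can be sketched as a small loop. Everything here is hypothetical: `bootstrap`, `classify` and `verify` are stand-ins for the trained model and for a person reviewing its suggestions, not real API calls.

```python
def bootstrap(untagged_texts, classify, verify):
    """Pre-tag samples with the current model, then keep the verified tag.

    classify(text) returns the category suggested by the trained model.
    verify(text, suggested) returns the category a reviewer confirms or corrects.
    """
    tagged = []
    for text in untagged_texts:
        suggested = classify(text)
        confirmed = verify(text, suggested)
        tagged.append((text, confirmed))
    return tagged


def classify(text):
    # Stand-in for the trained classifier.
    return "Food & Drinks" if "pizza" in text else "Travel & Vacations"


def verify(text, suggested):
    # Stand-in for a human reviewer who corrects one wrong suggestion.
    corrections = {"spa day voucher": "Health & Beauty"}
    return corrections.get(text, suggested)


untagged = ["two-for-one pizza", "weekend in Rome", "spa day voucher"]
new_training_data = bootstrap(untagged, classify, verify)
```

The reviewer only has to accept or correct each suggestion, and the verified pairs can then be uploaded as new training data.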

Besides adding data, you can also improve your model by:

  • Fixing any confusions found in your model.
  • Improving the keywords of your categories.
  • Finding the best parameters for your use case.

At the end of the day, training data is key in this process; if you train your algorithm with bad examples, the model will make plenty of mistakes. But if you are able to build a quality dataset, your model will be accurate and you will be able to benefit from the automatic analysis of text with machine learning.
