Understanding your classifier's statistics is a key part of improving the model's performance.

MonkeyLearn offers two groups of statistics. One group applies to the classifier overall, and the other is for each tag. Each group offers distinct statistics:

  • Overall: Accuracy and F1 Score
  • Tag Level: Precision and Recall

Overall Statistics

Overall statistics can be seen under the "Build" tab in the Stats section.

Accuracy

The accuracy is the percentage of texts that were predicted with the correct tag. It is the total number of correct predictions divided by the total number of texts in the dataset.

Accuracy is a good first indication, but it can hide large imbalances in the number of texts between tags, as well as other issues that might exist at a tag level.
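As a rough illustration of how accuracy is calculated and why imbalance matters, here is a minimal Python sketch. The tag names and the tiny dataset are invented for the example, and the accuracy helper is hypothetical, not part of MonkeyLearn. It assumes each text has exactly one correct tag.

```python
def accuracy(true_tags, predicted_tags):
    """Share of texts whose predicted tag matches the correct tag."""
    correct = sum(t == p for t, p in zip(true_tags, predicted_tags))
    return correct / len(true_tags)

# Made-up, imbalanced dataset: 9 "Praise" texts and 1 "Complaint" text.
true_tags = ["Praise"] * 9 + ["Complaint"]

# A classifier that always answers "Praise" never finds the complaint...
predicted_tags = ["Praise"] * 10

# ...yet it still gets 9 of the 10 texts right.
print(accuracy(true_tags, predicted_tags))  # 0.9
```

Here the accuracy looks high even though the "Complaint" tag is never predicted, which is the kind of problem that tag level statistics reveal.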

F1 Score

F1 Score is another measure of how well the classifier is doing its job. It combines the Precision and Recall of all the tags (see below). Compared with accuracy, it does a better job of accounting for imbalances in the distribution of texts among tags.
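For a single tag, F1 is commonly defined as the harmonic mean of precision and recall. The sketch below uses that standard definition and then averages the per-tag scores equally (a "macro" average). The averaging step is an assumption made for illustration; this article does not state exactly how MonkeyLearn combines the tags, and the function names are hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall for one tag."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both are 0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_tag):
    """Unweighted average of per-tag F1 (assumed aggregation)."""
    scores = [f1(p, r) for p, r in per_tag.values()]
    return sum(scores) / len(scores)

# Made-up (precision, recall) pairs for two tags.
per_tag = {"Praise": (0.9, 1.0), "Complaint": (1.0, 0.5)}

print(round(f1(1.0, 0.5), 2))  # 0.67
print(round(macro_f1(per_tag), 2))
```

Because the harmonic mean is pulled toward the lower of the two values, a tag with high precision but low recall (or the reverse) still gets a modest F1.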

Tag Level Statistics

Tag level statistics can be seen in the Stats section by clicking on the individual tag.

Precision

Precision refers to the percentage of texts the classifier got right out of the total number of texts that it predicted for a given tag.

Recall

Recall refers to the percentage of texts the classifier correctly predicted for a given tag out of the total number of texts it should have predicted for that tag.
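The two definitions above can be sketched in a few lines of Python. This is only an illustration: the data is made up, the precision_recall helper is hypothetical, and it assumes each text has exactly one correct tag.

```python
def precision_recall(true_tags, predicted_tags, tag):
    """Precision and recall for one tag, counted directly from the texts."""
    pairs = list(zip(true_tags, predicted_tags))
    predicted_as_tag = sum(p == tag for _, p in pairs)   # texts given this tag
    should_be_tag = sum(t == tag for t, _ in pairs)      # texts that deserve it
    correct = sum(t == tag and p == tag for t, p in pairs)

    precision = correct / predicted_as_tag if predicted_as_tag else 0.0
    recall = correct / should_be_tag if should_be_tag else 0.0
    return precision, recall

true_tags      = ["Praise", "Praise", "Praise", "Complaint", "Complaint"]
predicted_tags = ["Praise", "Praise", "Praise", "Praise",    "Complaint"]

# "Complaint" was predicted once and that prediction was right (precision 1.0),
# but only one of the two real complaints was found (recall 0.5).
print(precision_recall(true_tags, predicted_tags, "Complaint"))  # (1.0, 0.5)
```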

What Makes Up These Metrics?

Many of the statistics for a classifier start with a simple question: was a text correctly classified or not? 

This forms the basis for four possible outcomes:

A true positive is an outcome where the model predicts a tag that does apply to the text. Similarly, a true negative is an outcome where the model correctly leaves out a tag that doesn't apply.

A false positive is an outcome where the model predicts a tag that doesn't apply to the text. And a false negative is an outcome where the model fails to predict a tag that does apply.

The statistics for your classifier are calculated from different combinations of these four outcomes.
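To make the link between the four outcomes and the statistics concrete, here is a small sketch that counts the outcomes for one tag and derives precision and recall from them using the standard formulas. The outcome_counts helper and the data are hypothetical, and the sketch assumes one correct tag per text.

```python
def outcome_counts(true_tags, predicted_tags, tag):
    """Count the four outcomes for one tag across all texts."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(true_tags, predicted_tags):
        if predicted == tag and actual == tag:
            counts["tp"] += 1   # tag predicted, and it applies
        elif predicted == tag:
            counts["fp"] += 1   # tag predicted, but it doesn't apply
        elif actual == tag:
            counts["fn"] += 1   # tag applies, but wasn't predicted
        else:
            counts["tn"] += 1   # tag doesn't apply and wasn't predicted
    return counts

true_tags      = ["Praise", "Praise", "Praise", "Complaint", "Complaint"]
predicted_tags = ["Praise", "Praise", "Praise", "Praise",    "Complaint"]

c = outcome_counts(true_tags, predicted_tags, "Complaint")
print(c)  # {'tp': 1, 'fp': 0, 'fn': 1, 'tn': 3}

# Standard formulas: precision = TP / (TP + FP), recall = TP / (TP + FN)
print(c["tp"] / (c["tp"] + c["fp"]))  # 1.0
print(c["tp"] / (c["tp"] + c["fn"]))  # 0.5
```

Note that true negatives do not appear in either formula, which is why precision and recall stay informative even when most texts belong to other tags.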

