When you tag data to train a classifier, consistent tagging criteria are essential for building an accurate model.

When you start training a classifier, you begin tagging with certain criteria. As you tag more data, you learn from it, and your tagging criteria tend to shift. This change is entirely normal and a good thing: the more you understand about your data, the better.

But keep in mind that these changes in your criteria can confuse your classifier and hurt its accuracy. Imagine the following scenario. At the start of your tagging, you tell the classifier that for a particular input (text), you expect a specific output (tags). Then over time, you change your criteria: for similar inputs, you start telling the classifier you expect additional tags that you were initially leaving out. By doing this, you are sending the classifier mixed signals that would confuse even a human tagger.

If you detect that your criteria have changed over time, we recommend reviewing the samples you used to train your classifier. Review the tags you used in your training data, and if you find inconsistencies, re-tag those samples.
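As a rough illustration of this kind of review, the sketch below scans an exported list of training samples and flags texts that were tagged in more than one way. It is plain Python and not part of MonkeyLearn: the function name, the `(text, tags)` sample format, and the sample data (including the `Login` tag) are all assumptions made for the example.

```python
from collections import defaultdict


def find_conflicting_samples(samples):
    """Return normalized texts that appear with more than one distinct tag set.

    `samples` is an iterable of (text, tags) pairs. This format is an
    assumption for the sketch, not a MonkeyLearn export format.
    """
    tag_sets_by_text = defaultdict(set)
    for text, tags in samples:
        # Normalize case and whitespace so near-identical texts are grouped.
        key = " ".join(text.lower().split())
        tag_sets_by_text[key].add(frozenset(tags))
    # Keep only the texts that were tagged in at least two different ways.
    return {
        text: tag_sets
        for text, tag_sets in tag_sets_by_text.items()
        if len(tag_sets) > 1
    }


# Hypothetical training data, made up for illustration.
samples = [
    ("Check out this new video in my feed", ["News Feed"]),
    ("check out this new video  in my feed", ["News Feed", "Videos"]),
    ("I can't log in", ["Login"]),
]

conflicts = find_conflicting_samples(samples)
for text, tag_sets in conflicts.items():
    print(text, [sorted(tags) for tags in tag_sets])
```

This only catches texts that match after normalization. Samples that are merely similar still need a manual review.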

A good way to find these inconsistencies is to explore the False Positive and False Negative samples for a particular tag. To find these samples, go to the Build > Stats tab, select the tag you wish to improve, and then click either False Positives or False Negatives.

When you click False Positives or False Negatives, MonkeyLearn shows you the samples where the ‘human tag’ and the ‘predicted tag’ do not match, which is usually a good place to look for tagging inconsistencies. As an example, consider the False Positive samples for the Videos tag.

In the first sample, the human tagger categorized this sample as News Feed, but the prediction also included the tag Videos. In this example, the prediction is actually correct: the human tagger made a mistake while training this classifier and missed tagging this example with the Videos category.

To fix a tagging mistake or inconsistency, select the sample, click the Actions button, and choose Tag selected data.

This will take you to the tagging UI, where you should add the correct tags for your sample. In this example, we should add the Videos tag. Once you have made the fix, click the Confirm button to save the change.

Tagging inconsistencies can also be found in the False Negatives samples, so make sure to review your false negatives too.
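To make the two categories concrete, here is a minimal sketch of how false positives and false negatives for a single tag can be derived from human and predicted tags. It is not MonkeyLearn's implementation or API: the function name, the `(text, human_tags, predicted_tags)` format, and the sample texts are assumptions for illustration.

```python
def mismatches_for_tag(samples, tag):
    """Split mismatched samples for `tag` into false positives and false negatives.

    `samples` is an iterable of (text, human_tags, predicted_tags) triples.
    This format is an assumption for the sketch.
    """
    false_positives = []  # predicted the tag, but the human tagger did not apply it
    false_negatives = []  # human tagger applied the tag, but it was not predicted
    for text, human_tags, predicted_tags in samples:
        in_human = tag in human_tags
        in_predicted = tag in predicted_tags
        if in_predicted and not in_human:
            false_positives.append(text)
        elif in_human and not in_predicted:
            false_negatives.append(text)
    return false_positives, false_negatives


# Hypothetical samples, made up for illustration.
samples = [
    ("Videos in my news feed keep freezing", {"News Feed"}, {"News Feed", "Videos"}),
    ("The video player has no sound", {"Videos"}, {"Videos"}),
    ("Clips won't load on mobile", {"Videos"}, {"News Feed"}),
]

fp, fn = mismatches_for_tag(samples, "Videos")
print("False positives:", fp)
print("False negatives:", fn)
```

Either list can point to a tagging mistake rather than a model error, as with the News Feed sample above that was missing the Videos tag, so both are worth reviewing.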
