Manually tagging data is challenging. After tagging a few examples, people start to get tired, bored, and distracted, and that makes it easy to tag samples incorrectly. Because the classifier is trained on those tags, such mistakes can have a profound impact on its accuracy.
Machine learning models are straightforward in how they work. From your tagged data, they learn that from a particular input (text), you expect a specific output (tag). If your training data has tagging mistakes, the classifier will also learn from those mistakes. These errors confuse your classifier and cause your model to make similar mistakes when making predictions on new data.
This is why reviewing your training data to find these tagging mistakes is a great way to improve your classifier.
How to Find Mistakes in the Training Data
A straightforward way to find tagging mistakes is to review the false positives and false negatives for a particular tag. To find these samples, open the Build > Stats tab, go to the tag you wish to improve, and then click either False Positives or False Negatives:
In the example above, False Positives are the samples that the classifier tagged with the selected category during evaluation (the Videos tag in this example) but that the human tagger did not tag with that category. False Negatives are the opposite: samples that the human tagger tagged with the selected category (e.g. Videos) but that the classifier did not.
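The two definitions above can be sketched in a few lines of Python. This is only an illustration, not MonkeyLearn's implementation: the `Sample` class, the `find_mismatches` function, the sample texts, and the Performance tag are all invented here, and the sketch assumes each evaluated sample carries one set of human tags and one set of predicted tags.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Sample:
    """Hypothetical evaluation record (not a MonkeyLearn type)."""
    text: str
    human_tags: Set[str]      # tags assigned by the human tagger
    predicted_tags: Set[str]  # tags assigned by the classifier


def find_mismatches(samples: List[Sample], tag: str) -> Tuple[List[Sample], List[Sample]]:
    """Return (false_positives, false_negatives) for a single tag."""
    # False positive: predicted with the tag, but not tagged by the human.
    false_positives = [
        s for s in samples
        if tag in s.predicted_tags and tag not in s.human_tags
    ]
    # False negative: tagged by the human, but not predicted with the tag.
    false_negatives = [
        s for s in samples
        if tag in s.human_tags and tag not in s.predicted_tags
    ]
    return false_positives, false_negatives


# Made-up samples for illustration only.
samples = [
    Sample("New clips keep autoplaying in my feed", {"News Feed"}, {"News Feed", "Videos"}),
    Sample("Uploading a video takes forever", {"Videos"}, {"Performance"}),
    Sample("Love the new timeline layout", {"News Feed"}, {"News Feed"}),
]

false_positives, false_negatives = find_mismatches(samples, "Videos")

for s in false_positives:
    print("False positive:", s.text)
for s in false_negatives:
    print("False negative:", s.text)
```

Both lists are candidates for review: a sample in either one may be a classifier error, or it may be a tagging mistake made by the human tagger.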
When you click False Positives or False Negatives, MonkeyLearn shows you the samples where the ‘human tag’ and the predicted tag do not match. Following the example above, these are the False Positive samples for the Videos tag:
The human tagger categorized the first sample as News Feed, but the prediction also included the Videos tag. Here, the prediction is actually correct: the human tagger made a mistake while tagging the training data and missed the Videos category for this sample.
To fix a mistake, select the sample, then click the Actions button and select Tag selected data:
This will take you to the tagging UI, where you should add the correct tags for your sample. In this example, we should add the Videos tag. Once you have made the fix, click the Confirm button to save the change:
Mistakes can also be found among the False Negative samples, so make sure to review those too.