Precision and recall are useful metrics for checking how accurately your classifier predicts each tag, and there are clear steps you can take to improve them.

Before working with these metrics, it may help to read the article Understanding Classifier Statistics for more background.

The Basics

If a tag has low precision, texts that belong to other tags are being incorrectly predicted as this tag (false positives).

If a tag has low recall, texts that belong to this tag are being predicted as other tags instead (false negatives).
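To make these two definitions concrete, here is a minimal Python sketch (an illustration only, not the product's own calculation) that counts true positives, false positives, and false negatives for one tag and computes precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name, tag names, and data are hypothetical.

```python
def precision_recall(true_tags, predicted_tags, tag):
    """Compute precision and recall for a single tag.

    true_tags and predicted_tags are parallel lists holding the correct tag
    and the predicted tag for each text (hypothetical input format).
    """
    tp = fp = fn = 0
    for actual, predicted in zip(true_tags, predicted_tags):
        if predicted == tag and actual == tag:
            tp += 1  # correctly predicted as this tag
        elif predicted == tag:
            fp += 1  # a text from another tag predicted as this tag
        elif actual == tag:
            fn += 1  # a text from this tag predicted as another tag
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: correct tags vs. predicted tags for seven texts.
true_tags = ["Billing", "Billing", "Billing", "Billing", "Shipping", "Shipping", "Other"]
predicted_tags = ["Billing", "Billing", "Shipping", "Other", "Billing", "Shipping", "Other"]

print(precision_recall(true_tags, predicted_tags, "Billing"))
```

In this made-up data, "Billing" has one false positive (a Shipping text predicted as Billing), giving a precision of 2/3, and two false negatives (Billing texts predicted as Shipping and Other), giving a recall of 0.5.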

There is usually a trade-off between precision and recall for a particular tag: increasing precision often comes at the cost of lower recall, and vice versa. You'll need to find the balance that works for your use case.
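One common way to see this trade-off is through a confidence threshold. The Python sketch below assumes, purely for illustration, that a tag is assigned only when the classifier's confidence score for it is at or above some threshold; whether and how your classifier exposes such a setting is not covered here. The scores and labels are made up.

```python
# Hypothetical confidence scores for the tag "Billing", paired with
# whether each text really belongs to "Billing".
scored_texts = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, True), (0.55, False), (0.40, True), (0.30, False),
]


def precision_recall_at(threshold, scored):
    """Predict the tag only when the score is at or above the threshold."""
    tp = sum(1 for score, is_tag in scored if score >= threshold and is_tag)
    fp = sum(1 for score, is_tag in scored if score >= threshold and not is_tag)
    fn = sum(1 for score, is_tag in scored if score < threshold and is_tag)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for threshold in (0.3, 0.5, 0.85):
    p, r = precision_recall_at(threshold, scored_texts)
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
```

With this data, raising the threshold from 0.3 to 0.85 moves precision from 0.625 to 1.0 while recall drops from 1.0 to 0.4: being stricter about assigning the tag removes false positives but also misses more texts that belong to it.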

Tips to Improve Precision for a Tag

Under the "Build" tab, go to Stats and click on the tag in question. There are two areas to concentrate on: "Samples" and "False Positives".

Both show a list of texts. "Samples" shows all the texts tagged with that tag, and "False Positives" shows the texts that have been incorrectly predicted as that tag.
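As a rough model of what these two lists contain, the Python sketch below filters a set of hypothetical records, each holding a text, its correct tag, and the classifier's prediction. The record format and function names are assumptions for illustration; the product builds these lists itself.

```python
# Hypothetical records: each text with its correct tag and the predicted tag.
records = [
    {"text": "I was charged twice this month", "tag": "Billing", "predicted": "Billing"},
    {"text": "Where is my package?", "tag": "Shipping", "predicted": "Billing"},
    {"text": "Refund still not processed", "tag": "Billing", "predicted": "Billing"},
    {"text": "Delivery address is wrong", "tag": "Shipping", "predicted": "Shipping"},
]


def samples(records, tag):
    """Texts tagged with `tag` (similar to the "Samples" list)."""
    return [r["text"] for r in records if r["tag"] == tag]


def false_positives(records, tag):
    """Texts predicted as `tag` whose correct tag is something else."""
    return [r["text"] for r in records if r["predicted"] == tag and r["tag"] != tag]


print(samples(records, "Billing"))
print(false_positives(records, "Billing"))
```

Here the Shipping text "Where is my package?" shows up as a false positive for "Billing", which is exactly the kind of text worth reviewing and retagging.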

Best practice is to review the texts in both areas and check whether each one is tagged correctly. For any texts that are incorrectly tagged, select them using their checkboxes, then go to the "Actions" button in the top right and select "Tag Selected Data". This lets you retag the selected texts appropriately.

Whenever new data is added, it's good practice to review both "Samples" and "False Positives" for each tag to find anything that might be hurting the classifier's performance.

You may be able to improve precision further by analyzing the keywords associated with the tag.
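If you want a quick, rough view of which words dominate a tag's texts outside the product, you could count word frequencies yourself. The Python sketch below is a simple approximation (word counts with a tiny, illustrative stop-word list) and is not how the classifier itself determines keywords; the texts and names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical texts currently tagged "Billing".
billing_texts = [
    "I was charged twice for my subscription",
    "Please refund the extra charge on my card",
    "Why was my card charged after I cancelled?",
]

# A tiny, illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"i", "was", "for", "my", "the", "on", "after", "why", "please"}


def top_keywords(texts, n=5):
    """Return the n most frequent non-stop-words across the given texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


print(top_keywords(billing_texts, n=2))
```

Comparing such a list against the keywords in a tag's false positives can hint at which words are pulling texts from other tags into this one.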

Tips to Improve Recall for a Tag

Improving recall involves adding more accurately tagged texts to the tag in question. In this case, you are looking for texts that belong in this tag but are not tagged with it, or that were incorrectly predicted as another tag (false negatives).
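To illustrate what a false negative looks like, the Python sketch below lists the texts whose correct tag is the one in question but which were predicted as something else, along with the tag they were predicted as. The record format, function name, and data are hypothetical.

```python
# Hypothetical records: each text with its correct tag and the predicted tag.
records = [
    {"text": "I was charged twice this month", "tag": "Billing", "predicted": "Billing"},
    {"text": "Refund for a damaged delivery", "tag": "Billing", "predicted": "Shipping"},
    {"text": "Card declined at checkout", "tag": "Billing", "predicted": "Other"},
    {"text": "Where is my package?", "tag": "Shipping", "predicted": "Shipping"},
]


def false_negatives(records, tag):
    """(text, predicted tag) pairs for texts of `tag` predicted as another tag."""
    return [
        (r["text"], r["predicted"])
        for r in records
        if r["tag"] == tag and r["predicted"] != tag
    ]


for text, predicted in false_negatives(records, "Billing"):
    print(f"{text!r} was predicted as {predicted}")
```

Seeing which tags the missed texts land in (here "Shipping" and "Other") can also suggest which neighboring tags are competing with this one.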

The best way to find these kinds of texts is to search for them using keywords.

Under the "Build" tab, go to Data. This shows all the texts in your classifier. Use the search bar at the top to search for keywords that you believe might be relevant to the tag in question. It helps to be familiar with the keywords most commonly associated with that tag.

You can use the "Tags" filter to limit the search to "Unassigned" texts. This ensures that you are only searching texts that have been uploaded but not yet tagged.
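The search-and-filter step above can be modeled in a few lines of Python: keep only texts with no tag, then keep those that contain any of your keywords. This is a hypothetical sketch of the idea (case-insensitive substring matching, with `None` standing in for "Unassigned"); the product's search may match text differently.

```python
# Hypothetical data set: tag is None for texts uploaded but not yet tagged.
texts = [
    {"text": "Can I get a refund for last month?", "tag": None},
    {"text": "My invoice shows the wrong amount", "tag": "Billing"},
    {"text": "The app crashes on startup", "tag": None},
    {"text": "Refund requested but never received", "tag": None},
]


def search_unassigned(texts, keywords):
    """Return unassigned texts containing any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        t["text"]
        for t in texts
        if t["tag"] is None and any(k in t["text"].lower() for k in lowered)
    ]


print(search_unassigned(texts, ["refund", "invoice"]))
```

The invoice text matches a keyword but is skipped because it is already tagged, so only the two unassigned refund texts are returned as candidates to tag.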

For any unassigned texts you want to tag, select them using their checkboxes, then go to the "Actions" button in the top right and select "Tag Selected Data". This lets you tag the selected texts appropriately.

Searching for relevant texts within your data set is a great way to improve recall: the more texts you correctly assign to the tag in question, the higher its recall is likely to be.
