It's perfectly normal for a classifier to analyze a text and return a prediction with no tags. This happens because of the type of classifier you are training: a multi-tag classifier.

Context

For context, there are two types of classifiers you can train: single-tag classifiers and multi-tag classifiers:

  • Single-tag classifiers are trained with examples using just one tag per example. When analyzing and making predictions on new data, single-tag classifiers will always return 1 tag as a result.

  • Multi-tag classifiers are trained with examples using one or more tags per example. When analyzing and making predictions on new data, a multi-tag classifier can return one tag or several tags. It can also return no tags at all when it thinks that the text doesn't belong to any tag.

So, in other words, single-tag classifiers and multi-tag classifiers are slightly different in the way they are trained and how they make their predictions later on.

For a given text, a single-tag classifier asks 'what is the most probable category?'. In contrast, in a multi-tag classifier, each category works as its own 'binary' classifier: for each tag, the model decides whether the text belongs or doesn't belong to that tag. Repeating this process for each of the tags is what creates the multiple predictions in a multi-tag classifier.
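The difference between the two decision rules can be sketched in a few lines of Python. This is only an illustration, not MonkeyLearn's actual implementation: the scores, the function names, and the 0.5 cut-off are all assumptions made for the example.

```python
# Hypothetical per-tag scores for one text. In a real classifier these
# would come from the trained model; the values here are made up.
scores = {"Features": 0.35, "Pricing": 0.20, "Support": 0.10}


def predict_single_tag(scores):
    """Single-tag rule: always return the most probable tag."""
    return [max(scores, key=scores.get)]


def predict_multi_tag(scores, threshold=0.5):
    """Multi-tag rule: one independent yes/no decision per tag.

    The 0.5 threshold is an assumption for this sketch.
    """
    return [tag for tag, score in scores.items() if score >= threshold]


single = predict_single_tag(scores)
multi = predict_multi_tag(scores)
print(single)  # ['Features']
print(multi)   # []
```

With these scores, the single-tag rule still picks 'Features' because it is the best of the available options, while the multi-tag rule answers 'No' for every tag and so returns an empty list.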

By default, MonkeyLearn detects your tagging strategy to determine whether you are training a single-tag classifier or a multi-tag classifier. You can see this in the settings of your model:

If some of the training samples in your classifier were tagged with multiple tags, the 'Autodetect' tagging strategy setting inferred that you wanted to create a multi-tag classifier. For example, the following training sample was tagged with 2 tags, 'Features' and 'Pricing':

So, in this case, for each text it analyzes, this multi-tag classifier first has to make a decision and ask: "does this particular text belong to the 'Features' category?" (so there are two possible answers, 'Yes' and 'No'). After it has made this decision, it goes to the 'Pricing' category and asks "does this particular text belong to the 'Pricing' category?"... and so on for each tag.

So, when this classifier returns 0 tags for some of the entries, it means the classifier answered 'No' for every tag: it thinks the text doesn't belong to any category at all.
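The autodetect behavior described earlier in this section can be pictured with a small Python sketch. The rule shown here (any sample with more than one tag makes the model multi-tag) is inferred from the description in this article, and the function name and sample texts are hypothetical, so treat it as an illustration rather than the product's real logic.

```python
def detect_tagging_strategy(samples):
    """Guess the tagging strategy from (text, tags) training samples.

    Assumed rule: if any sample carries more than one tag, treat the
    classifier as multi-tag; otherwise treat it as single-tag.
    """
    for _text, tags in samples:
        if len(tags) > 1:
            return "multi-tag"
    return "single-tag"


# Made-up training samples for illustration.
samples = [
    ("The dashboard is great but it costs too much", ["Features", "Pricing"]),
    ("Love the new export option", ["Features"]),
]

strategy = detect_tagging_strategy(samples)
print(strategy)  # multi-tag
```

Under this assumed rule, a single sample with two tags is enough to make the whole classifier multi-tag.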

Solution

In this situation, you have two options. The first and most straightforward option is to go to the settings of your model:

And change the Tagging Strategy from 'Autodetect' to 'Single-tag':

This will transform the model into a single-tag classifier, which causes it to always return 1 tag. To save this advanced setting, please click the 'Save' button:

Finally, to re-train the model with this new setting, go to the Build > Train tab and tag 1 more sample. This will force the model to re-train.

The advantage of this 'solution' is that it's fast; you will start getting tags in all of your results immediately after changing your classifier's settings. On the downside, sooner rather than later, you should review the training samples in your classifier and use only 1 tag per sample. You will have to spend some time re-tagging your training data to transform it into 'single-tag'. If you don't do this, the classifier results will be poor because the training data is multi-tag but the model is set up as single-tag. Another disadvantage is that your model will provide just 1 tag for each result, and this might not be a good representation of the data you want to analyze (most problems are multi-tag).
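If you keep a local copy of your training data, a short script can list the samples that would need re-tagging before a single-tag setup makes sense. The sketch below assumes the data is available as (text, tags) pairs; that layout, the function name, and the sample texts are hypothetical and do not reflect any MonkeyLearn export format.

```python
def samples_needing_retag(samples):
    """Return the samples that do not have exactly one tag."""
    return [(text, tags) for text, tags in samples if len(tags) != 1]


# Made-up training samples for illustration.
samples = [
    ("The dashboard is great but it costs too much", ["Features", "Pricing"]),
    ("Love the new export option", ["Features"]),
    ("Too expensive for small teams", ["Pricing"]),
]

to_review = samples_needing_retag(samples)
for text, tags in to_review:
    print(f"{text!r} has {len(tags)} tags: {tags}")
```

Each sample printed by this loop is one you would re-tag with a single tag.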

The second option to solve this problem is improving the multi-tag classifier. Right now, it returns no tags for some of the entries because it believes those texts don't belong to any tag. If you find that these entries DO belong to one or more tags, this means that the model needs further training to better understand each tag. This is the best option if you want an accurate model in the long term. Your data seems to be multi-tag data, so you should train a multi-tag classifier to model the data with the same criteria.
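One practical way to find candidates for further training is to collect the texts that came back with no tags and review them by hand. The sketch below assumes the results are held as a list of dictionaries with 'text' and 'tags' keys; that structure is a hypothetical stand-in, not the actual response format of any API.

```python
def untagged_texts(results):
    """Return the texts whose prediction contains no tags."""
    return [result["text"] for result in results if not result["tags"]]


# Made-up classification results for illustration.
results = [
    {"text": "The price went up again", "tags": ["Pricing"]},
    {"text": "Can I export reports to PDF?", "tags": []},
    {"text": "Great tool overall", "tags": []},
]

review_queue = untagged_texts(results)
print(review_queue)
```

Texts in the review queue that clearly belong to a tag are good candidates to tag and add as new training samples.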
