Keywords are not the only signal a classifier uses to make predictions about texts, but they are one of the best places to look when troubleshooting classifier performance.

Understanding the Keyword Cloud

Once you have built and trained a custom classifier in MonkeyLearn, go to the "Build" tab and open the "Stats" section.

By default, you will see statistics for the classifier as a whole. The "Keywords" section appears towards the bottom of the dashboard.

To see keywords for a specific tag, click on the desired tag and the keyword cloud will update to show the keywords for that tag.

The keywords that are most prominent in your keyword cloud should have a common-sense association with the tag. To see the full list of keywords, click on the "Keyword List" option.

Things to look for:

  • Are there any keywords that shouldn't be there?

  • Are there any phrases or keywords that should be included that aren't?

If you answer yes to either of those questions, see "Troubleshooting with Keywords" below to investigate those keywords and make changes.
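If you keep a local copy of your tagged training texts, a rough word count per tag can serve as a cross-check against the keyword cloud. The sketch below is only an approximation under that assumption: the sample data and the `top_words` helper are hypothetical, and a raw frequency count is not how MonkeyLearn selects its keywords.

```python
import re
from collections import Counter

# Hypothetical sample of (tag, text) pairs standing in for your training data.
samples = [
    ("Shipping", "The delivery was late and the package was damaged"),
    ("Shipping", "Late delivery again"),
    ("Billing", "I was charged twice on my invoice"),
]

def top_words(samples, tag, n=3):
    """Return the n most frequent lowercase words among texts with this tag."""
    counts = Counter()
    for sample_tag, text in samples:
        if sample_tag == tag:
            counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

print(top_words(samples, "Shipping", n=4))
```

Common words such as "the" tend to rank highly in a raw count like this, which is one reason a keyword that carries no meaning for the tag can end up looking prominent.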

Tip: Click a particular keyword (either in the cloud or the list) to see the texts that match with that particular keyword.

Some classifiers may show unusual tokens in the keyword cloud, such as __number__. Models may group certain low-information features together (such as numbers or punctuation) so that richer parts of the text take priority. Each group is represented by a placeholder token like this one, which may appear depending on the parameters set for your model.
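To illustrate how a placeholder like __number__ can arise, here is a minimal sketch of a preprocessing step that replaces every number with a single token. The pattern and the `normalize_numbers` function are assumptions made for illustration; MonkeyLearn's actual preprocessing may work differently.

```python
import re

# Hypothetical pattern: a run of digits, optionally followed by
# decimal or thousands groups such as ".99" or ",000".
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")

def normalize_numbers(text):
    """Replace each number in the text with the placeholder __number__."""
    return NUMBER_PATTERN.sub("__number__", text)

print(normalize_numbers("Order 1042 arrived 3 days late and cost 19.99"))
```

After a step like this, "1042", "3", and "19.99" all count as the same feature, so the model learns from the presence of a number rather than from each individual value.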

Having an idea of which keywords correspond to the tag, both correctly and incorrectly, can go a long way in troubleshooting problems and improving the performance of the classifier. It can also be helpful in improving precision and recall.
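As a reminder of what those two metrics measure, the sketch below computes them from counts for a single tag. The counts are invented for illustration. In general, a keyword that wrongly pulls texts into a tag adds false positives and lowers precision, while a missing keyword tends to add false negatives and lower recall.

```python
def precision(true_pos, false_pos):
    """Share of texts predicted as the tag that really belong to it."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    """Share of texts belonging to the tag that the classifier found."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts for one tag.
true_pos, false_pos, false_neg = 40, 10, 20

print(f"precision = {precision(true_pos, false_pos):.2f}")
print(f"recall    = {recall(true_pos, false_neg):.2f}")
```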

Troubleshooting with Keywords

To change which keywords are associated with a given tag, you have to change the corresponding texts.

To do this, go to the Data section of the "Build" tab and search for the keyword in question. You can use the "Tags" filter to search all texts in the data set or only the texts in a specific tag.
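The same kind of search can be approximated offline if you have a local copy of your tagged texts. The following sketch assumes a simple list of dictionaries; the sample data and the `find_texts` helper are hypothetical and are not part of MonkeyLearn.

```python
# Hypothetical tagged texts; a tag of None stands in for an untagged text.
samples = [
    {"text": "The delivery was late again", "tag": "Shipping"},
    {"text": "Late fees on my invoice are wrong", "tag": "Shipping"},
    {"text": "I was charged twice this month", "tag": "Billing"},
    {"text": "Package arrived late and damaged", "tag": None},
]

def find_texts(samples, keyword, tags=None):
    """Return samples whose text contains the keyword (case-insensitive).

    When `tags` is given, only samples whose tag is in that collection are
    returned. This is a plain substring match, so "late" also matches "later".
    """
    needle = keyword.lower()
    return [
        s for s in samples
        if needle in s["text"].lower() and (tags is None or s["tag"] in tags)
    ]

# Search within one tag to find texts that may need retagging.
for s in find_texts(samples, "late", tags={"Shipping"}):
    print(s["text"])

# Search the whole data set, including untagged texts.
print(len(find_texts(samples, "late")))
```

In this made-up data, the second "Shipping" text is about an invoice, so it is the kind of result you would review and possibly retag.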

To Disassociate Keywords from a Specific Tag

Use the "Tags" filter dropdown to select only the tag in question, then search for the keyword.

Review the results and select the texts that you want to retag. To start the retagging process, go to "Actions" in the top right and click "Tag selected data".

To Associate New Keywords with a Specific Tag

Search for the keyword in the entire data set. You may need to use the "Tags" filter to include unassigned texts and all other tags where this keyword may appear.

After searching, review the results and select the texts that both match the tag criteria and include the keyword. To assign those texts to the tag, go to "Actions" in the top right and click "Tag selected data".

Any changes you make will be applied. You may need to refresh your browser in order to see the changes to the keyword associations.

To Ignore a Keyword Completely in Your Model

In some cases, it may make sense to have a classifier ignore a keyword completely. You can force this behavior by adding the keyword to the list of stopwords in the settings for your model.
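Conceptually, a stopword is dropped from the text before the model sees it. The sketch below illustrates that idea with a hypothetical stopword list and a simple tokenizer; it is not MonkeyLearn's implementation, and "acme" stands in for a term (such as your own product name) that appears under every tag.

```python
import re

# Hypothetical custom stopword list.
custom_stopwords = {"the", "a", "is", "acme"}

def tokenize(text, stopwords):
    """Lowercase the text, split it into words, and drop any stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]

print(tokenize("The Acme app is crashing on login", custom_stopwords))
```

Because the stopword never reaches the model, it cannot be associated with any tag, whichever texts it appears in.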

