Defining Tags for Classifiers

Best practices and recommendations for defining tags and taxonomies when making a custom classifier

Written by Raul Garreta

One of the most critical aspects of building a classifier is defining the tags you want to classify texts into.

Trying to do too much too fast will hurt a model's performance. Although machine learning models can eventually be trained to handle complex tasks, every one of them had to start with something simple.

👉 First-Time Best Practices

For best results, keep the following in mind when defining tags for the first time. Remember, you can make changes and add more complexity later.

  • Limit the number of tags as much as possible. It's best to work with fewer than 10, at least initially.

  • Make sure you have enough example texts for each tag. If you aren't sure you do, hold off on creating that tag until you have more data.

  • Avoid tags that overlap or could easily be confused with one another.

  • Use one classification criterion per model: each classifier should have a single, explicit purpose.
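To apply the first two guidelines before training, you can tally a labeled sample. The sketch below is a minimal illustration, not a MonkeyLearn feature: it assumes your data is a list of (text, tag) pairs, and the MIN_TEXTS_PER_TAG value of 20 is a hypothetical placeholder, since this article does not give a specific minimum.

```python
from collections import Counter

# Guideline from this article: work with fewer than 10 tags at first.
MAX_TAGS = 10
# Hypothetical threshold; the article does not specify a minimum per tag.
MIN_TEXTS_PER_TAG = 20


def check_tag_set(examples):
    """Return a list of warnings for a list of (text, tag) pairs."""
    counts = Counter(tag for _text, tag in examples)
    warnings = []
    if len(counts) >= MAX_TAGS:
        warnings.append(f"{len(counts)} tags; try fewer than {MAX_TAGS} at first")
    # Flag tags that may not have enough texts yet.
    for tag, n in sorted(counts.items()):
        if n < MIN_TEXTS_PER_TAG:
            warnings.append(f"tag {tag!r} has only {n} texts; consider adding it later")
    return warnings


if __name__ == "__main__":
    sample = [("Great service!", "Positive")] * 25 + [("Meh.", "Neutral")] * 3
    for warning in check_tag_set(sample):
        print(warning)
```

The check only counts texts; it cannot tell you whether two tags are easy to confuse, which still requires reading examples from each tag side by side.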

Sets of Tags and Examples

Some tags are fairly self-explanatory. For example, sentiment analysis generally uses:

  • Positive

  • Negative

  • Neutral

A classifier to categorize daily deals might involve:

  • Entertainment

  • Food & Drinks

  • Health & Beauty

  • Retail

  • Travel & Vacations

  • etc.

Defining Tags in the Dark

Things get more confusing when the tags aren't as obvious. For example, categorizing feedback by topic (what the feedback is about) is more challenging. You may not immediately know where to start, how specific you need to be, or how many texts you have for each potential tag.

In tough situations, the first recommendation is always to look at your data. If you aren't familiar with it, it will be hard to make the necessary associations or to know whether your sample is large enough to work with.

The second recommendation is to start with broad tags. Looking at your data with a few broad tags in mind can give you a better sense of what might work. Established categories in your field may also help. For example, with customer feedback, you could start with the traditional areas of a business (which become your tags): product development, customer support, sales and pricing, and so on.

This way you can move forward while meeting all the best practices listed above. Remember, you can add more complexity later.
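One quick way to look at your data with broad tags in mind is a rough keyword tally, which shows approximately how many texts might fall under each candidate tag before you label anything. This is a hypothetical sketch, not a MonkeyLearn feature: the keyword lists are illustrative assumptions, and keyword matching is only a crude stand-in for actually reading your texts.

```python
import re
from collections import Counter

# Illustrative keyword lists for the broad business-area tags mentioned above.
# These are assumptions; replace them with terms taken from your own data.
BROAD_TAGS = {
    "Product Development": ["feature", "bug", "crash", "update"],
    "Customer Support": ["support", "agent", "help", "response"],
    "Sales and Pricing": ["price", "pricing", "expensive", "discount", "plan"],
}


def tally_broad_tags(texts, broad_tags=BROAD_TAGS):
    """Count how many texts mention at least one keyword for each tag.

    Matching is on whole lowercase words. A text can count toward several
    tags; texts matching no tag are counted under "(unmatched)".
    """
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower()))
        matched = [tag for tag, keywords in broad_tags.items()
                   if words.intersection(keywords)]
        if matched:
            counts.update(matched)
        else:
            counts["(unmatched)"] += 1
    return counts


feedback = [
    "The app keeps crashing after the update",
    "Support agent was very helpful",
    "Your pricing is too expensive for small teams",
    "Love the new dark mode",
]
print(tally_broad_tags(feedback))
```

A large "(unmatched)" count suggests your broad tags miss part of the data, and a tag with very few matches may not have enough texts to start with.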

Subcategories and Hierarchies in Tags

You may want to include subcategories, or parent and child tags, to build a hierarchy in your tag list. However, the newest version of MonkeyLearn does not support creating hierarchies in custom classifiers.

If you need a second level of tags, our first suggestion is to build a model for the top level first. This helps the model reach the necessary accuracy in the early training stages.
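If your existing labeled data already uses detailed tags, you can collapse them to their top-level parents to train that first model. This is a minimal sketch assuming a hand-written child-to-parent mapping; CHILD_TO_PARENT and to_top_level are hypothetical names for illustration, not part of MonkeyLearn.

```python
from collections import Counter

# Hypothetical mapping from detailed (child) tags to top-level (parent) tags.
CHILD_TO_PARENT = {
    "Movies": "Entertainment",
    "Music": "Entertainment",
    "Restaurants": "Food & Drinks",
    "Hotels": "Travel & Vacations",
}


def to_top_level(examples, child_to_parent=CHILD_TO_PARENT):
    """Relabel (text, child_tag) pairs with their parent tag.

    Tags missing from the mapping are kept unchanged.
    """
    return [(text, child_to_parent.get(tag, tag)) for text, tag in examples]


data = [
    ("New Marvel movie tickets", "Movies"),
    ("Jazz festival pass", "Music"),
    ("Two-for-one sushi", "Restaurants"),
]
top_level = to_top_level(data)
print(Counter(tag for _text, tag in top_level))
```

Collapsing pools the examples of sibling tags, so each parent tag has more texts to learn from than any single child tag.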

After the first working model has been built, you can look into the following options:

  • Create new tags that include the subcategory in the label itself. For example, for tags like Movies and Music under Entertainment, you can create Entertainment_Movies and Entertainment_Music.

  • Create a new classifier at each subsequent level to classify the child tags themselves.
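Both options can be sketched in a few lines. This is a hypothetical illustration, not MonkeyLearn's API: classifiers are modeled as plain Python functions that take a text and return a tag, and the names flatten_tag, split_tag, and classify_two_level are made up for this example. The underscore separator follows the Entertainment_Movies naming above.

```python
from typing import Callable, Dict, Optional, Tuple

# A "classifier" here is any function mapping a text to a tag.
# In practice each one would be a separately trained model.
Classifier = Callable[[str], str]

SEPARATOR = "_"  # matches labels such as Entertainment_Movies


def flatten_tag(parent: str, child: str) -> str:
    """Option 1: encode the subcategory in the label itself."""
    return f"{parent}{SEPARATOR}{child}"


def split_tag(label: str) -> Tuple[str, str]:
    """Recover (parent, child) from a flattened label.

    Assumes parent names contain no underscore.
    """
    parent, _, child = label.partition(SEPARATOR)
    return parent, child


def classify_two_level(text: str, top_level: Classifier,
                       child_models: Dict[str, Classifier]) -> Tuple[str, str]:
    """Option 2: run the top-level model, then the model for that parent.

    If no child model exists for the predicted parent, the child is "".
    """
    parent = top_level(text)
    child_model: Optional[Classifier] = child_models.get(parent)
    child = child_model(text) if child_model is not None else ""
    return parent, child


# Toy keyword rules standing in for trained models (illustration only).
def toy_top_level(text: str) -> str:
    return "Entertainment" if "ticket" in text.lower() else "Food & Drinks"


def toy_entertainment(text: str) -> str:
    return "Music" if "concert" in text.lower() else "Movies"


if __name__ == "__main__":
    print(flatten_tag("Entertainment", "Movies"))  # Entertainment_Movies
    print(split_tag("Entertainment_Music"))        # ('Entertainment', 'Music')
    print(classify_two_level("2 concert tickets", toy_top_level,
                             {"Entertainment": toy_entertainment}))
```

The trade-off: flattened labels keep a single model but multiply the number of tags, which works against the fewer-than-10 guideline, while the cascade keeps each model's tag set small at the cost of training and running more models.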
