Text classification models are used to categorize text into organized groups. Text is analyzed by a model and then the appropriate tags are applied based on the content. Machine learning models that can automatically apply tags for classification are known as classifiers.

Classifiers don't work out of the box; they must first be trained to make predictions for specific kinds of text. Training a classifier involves:

  • defining a set of tags that the model will work with
  • making associations between pieces of text and the corresponding tag or tags

Once enough texts have been tagged, the classifier can learn from those associations and begin to make predictions on new texts.
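The two training steps above can be sketched in a few lines of Python. This is only an illustration, not how MonkeyLearn trains its models: it uses a small Naive Bayes classifier written from scratch, and the class name TinyClassifier, the tags, and the four tagged texts are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyClassifier:
    """Minimal multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, tags):
        self.tags = set(tags)
        self.tag_counts = Counter()              # number of tagged texts per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()

    def train(self, tagged_texts):
        for text, tag in tagged_texts:
            if tag not in self.tags:
                raise ValueError(f"unknown tag: {tag}")
            self.tag_counts[tag] += 1
            for word in tokenize(text):
                self.word_counts[tag][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_texts = sum(self.tag_counts.values())
        words = tokenize(text)
        scores = {}
        for tag, count in self.tag_counts.items():
            denominator = sum(self.word_counts[tag].values()) + len(self.vocabulary)
            # Log prior for the tag plus smoothed log likelihood of each word.
            score = math.log(count / total_texts)
            for word in words:
                score += math.log((self.word_counts[tag][word] + 1) / denominator)
            scores[tag] = score
        return max(scores, key=scores.get)


# Step 1: define the set of tags the model will work with.
TAGS = ["positive", "negative"]

# Step 2: associate pieces of text with their tags (hypothetical examples).
TAGGED_TEXTS = [
    ("The staff was friendly and the room was excellent", "positive"),
    ("Loved the breakfast, great service", "positive"),
    ("Terrible smell and a dirty room", "negative"),
    ("Awful service, I would avoid this hotel", "negative"),
]

classifier = TinyClassifier(TAGS)
classifier.train(TAGGED_TEXTS)

print(classifier.predict("Excellent breakfast and friendly staff"))  # positive
print(classifier.predict("Terrible hotel, avoid"))                   # negative
```

With only four tagged texts the predictions are fragile. A real classifier needs many more examples per tag before its predictions become reliable.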

Classifier Examples

A classifier is most effective when it is built for a specific use case, with a set of tags and training texts that pertain to it. The following are some of the ways that classifiers, and their corresponding sets of tags, are used.

Sentiment Analysis

Sentiment analysis is one of the most common use cases for classifiers. This kind of analysis is used to detect positive or negative sentiment from a user or customer in their comments, tweets, reviews, etc.

For example, with the following hotel reviews:

Text A:
"Friendly service. Superior room! Loved the high ceiling. Housekeeping service was top quality. Excellent breakfast and fitness room."

Text B:
"If possible I would give the hotel zero stars. The smell is terrible and gave me a headache after one night. Recommend avoiding."

A sentiment analysis classifier would likely classify Text A as positive, since phrases such as "Friendly service" and "top quality" indicate that the reviewer was generally satisfied.

Text B would likely be classified as negative, since the reviewer points out several negative aspects of the hotel room and the overall experience.

Language Detection

In language detection, an incoming piece of text is analyzed against a list of languages (e.g., Spanish, English, French) to programmatically detect the language it is written in.

For example, if we have the following texts:

Text A:
"Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data…"

Text B:
"El aprendizaje automático o aprendizaje de máquinas es una rama de la inteligencia artificial cuyo objetivo es desarrollar técnicas que permitan a las computadoras aprender…"

Text C:
"L’apprentissage automatique (machine learning en anglais), un des champs d’étude de l’intelligence artificielle, est la discipline scientifique concernée par le développement…"

Text A would be classified as English, Text B as Spanish, and Text C as French.
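As a rough illustration of the idea, the sketch below guesses the language of the three texts above by counting how many of their words appear in a short list of common words for each language. This is a simple heuristic standing in for a trained classifier, not MonkeyLearn's method; the function name detect_language and the word lists are assumptions made for this example.

```python
import re

# Hypothetical, deliberately short lists of very common words per language.
COMMON_WORDS = {
    "English": {"the", "of", "and", "a", "that", "can", "from", "is", "to", "in"},
    "Spanish": {"el", "la", "de", "es", "una", "que", "las", "o", "a", "en", "un"},
    "French": {"le", "la", "de", "un", "des", "est", "en", "par", "l", "d", "les"},
}


def detect_language(text):
    """Return the language whose common words occur most often in the text."""
    # \w is Unicode-aware in Python 3, so accented letters stay inside words.
    words = re.findall(r"\w+", text.lower())
    hits = {
        language: sum(1 for word in words if word in common)
        for language, common in COMMON_WORDS.items()
    }
    return max(hits, key=hits.get)


text_a = ("Machine learning, a branch of artificial intelligence, concerns the "
          "construction and study of systems that can learn from data…")
text_b = ("El aprendizaje automático o aprendizaje de máquinas es una rama de la "
          "inteligencia artificial cuyo objetivo es desarrollar técnicas que "
          "permitan a las computadoras aprender…")
text_c = ("L’apprentissage automatique (machine learning en anglais), un des champs "
          "d’étude de l’intelligence artificielle, est la discipline scientifique "
          "concernée par le développement…")

for label, text in [("Text A", text_a), ("Text B", text_b), ("Text C", text_c)]:
    print(label, "->", detect_language(text))
```

A trained language classifier learns these signals from tagged texts instead of relying on hand-picked word lists, which lets it cope better with short or mixed-language inputs.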

Product Classification

Imagine you are working with a set of apparel products and you want to automatically classify them using their descriptions.

Let's look at the following descriptions as examples:

Product A:
"This women’s printed pullover sweater is a great basic to add to your wardrobe. Throw this sweater over a top for extra warmth and to add some fun pops of color to your outfit. Pair it with jeans and boots this winter. This top is an excellent essential…"

Product B:
"Relieve tired, aching feet from the stress of high heels. The fully-lined padded insole provides a comfortable fit and a rubber outsole for durability. Bendable comes in a convenient carry-bag…"

The description from Product A mentions a sweater in various capacities, so in all probability it would be classified as Apparel_Sweaters.

The description in Product B mentions aspects of a sandal, plus various references to feet and other kinds of shoes. The likely classification would be Apparel_Shoes.

Topic Classification

Text classification is often used to organize text by topic. This is commonly done for emails, support tickets, reviews, articles, etc.

We can train a topic classifier to tag what incoming texts are about, as long as we supply associations between texts and tags that a machine learning model can learn from.

As an example of topic classification, consider the following support tickets:

Hello, I'm just wondering if Best Buy will be carrying the Panasonic GH1, I notice Best Buy currently only carry 1 Panasonic camera, I'm really looking forward to the GH1. Thanks.

A topic classifier might tag this as Availability Inquiry.

My orders keep getting cancelled! This seem to be a common problem. I've tried to place an order twice and each time the order is cancelled within seconds.

The same topic classifier would classify this text as Order Issue.

Classifiers and MonkeyLearn

With MonkeyLearn, you can use our library of publicly available classifiers, such as sentiment analysis or language detection, or build a custom classifier for your own specific use case.
