The main difference between text classification and text extraction has to do with where the resulting prediction comes from.

In text classification, the result is usually not present in the text itself; the model has to infer a category from the text it is given.

In text extraction, the result is a piece of the text itself, and the model is trained to locate a particular kind of entity within it.

As a result, classifiers typically assign tags that represent categories, while extractors typically return tags that represent entities.

Looking at an example

"Rafa Nadal's season got off to a miserable start as his comeback from injury and illness stalled in the Qatar Open first round with a 1-6 6-3 6-4 defeat by German journeyman Michael Berrer on Tuesday."

A news classifier might tag this text as shown below. Note that these tags are implied by the text rather than stated in it:

  • Sports 
  • Tennis 

An extractor, on the other hand, might return these tags, or entities, which appear explicitly in the text:

  • Rafa Nadal 
  • Qatar Open 
  • Michael Berrer 
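For contrast, here is a minimal extraction sketch. It swaps in simple dictionary (gazetteer) lookup instead of a trained entity-recognition model, and `KNOWN_ENTITIES` and `extract` are hypothetical names. What it illustrates is the defining property of extraction: every result is a span of the original text, so it comes with start and end positions.

```python
import re

# Hypothetical gazetteer: known entity strings mapped to an entity type.
# A trained extractor would recognize entities it has never seen listed.
KNOWN_ENTITIES = {
    "Rafa Nadal": "PERSON",
    "Michael Berrer": "PERSON",
    "Roger Federer": "PERSON",
    "Qatar Open": "EVENT",
}


def extract(text: str) -> list[tuple[str, str, int, int]]:
    """Return (entity, type, start, end) for each known entity found in `text`."""
    matches = []
    for entity, label in KNOWN_ENTITIES.items():
        # re.escape treats the entity as a literal string, not a pattern.
        for m in re.finditer(re.escape(entity), text):
            matches.append((m.group(), label, m.start(), m.end()))
    # Report entities in the order they appear in the text.
    return sorted(matches, key=lambda match: match[2])


headline = (
    "Rafa Nadal's season got off to a miserable start as his comeback from injury "
    "and illness stalled in the Qatar Open first round with a 1-6 6-3 6-4 defeat "
    "by German journeyman Michael Berrer on Tuesday."
)

for entity, label, start, end in extract(headline):
    print(label, entity, start, end)
```

Because each result carries its offsets, `headline[start:end]` always gives back the exact entity, which is not possible for the category tags a classifier returns.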