Defining Tags for Extractors

Recommendations for defining tags when making a custom extractor

Written by Raul Garreta
Updated over a week ago

When defining the set of tags you will be using for a custom extractor, there are some important things to keep in mind.

👉 First Time Best Practices

  • Keep the number of tags as low as you can. Initially, it's best to work with fewer than 10.

  • Make sure you know your data and have enough texts for each tag that you create. If you don't have enough text data for a tag, create the tag later.

  • Avoid tags that might be confused with each other. Each tag should have its own clearly defined criteria.

  • Make sure your set of tags works on one level. It's important to avoid tags that are more general than others, or higher up in a hierarchy. Keep the scope the same.
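
The first two recommendations above can be checked before any tagging begins. The following is a minimal sketch in plain Python, not part of any extractor API: the function name `check_tag_set`, the input format, and the `min_examples` threshold of 20 are all assumptions made for illustration. Only the "fewer than 10 tags" limit comes from the guidance above.

```python
def check_tag_set(examples_per_tag, max_tags=10, min_examples=20):
    """Return a list of warnings for a proposed tag set.

    examples_per_tag maps each tag name to the number of texts that
    contain at least one example of that tag. min_examples is an
    assumed threshold; pick a value that suits your own data.
    """
    warnings = []
    # Recommendation: start with fewer than max_tags tags.
    if len(examples_per_tag) >= max_tags:
        warnings.append(
            f"{len(examples_per_tag)} tags defined; "
            f"consider starting with fewer than {max_tags}."
        )
    # Recommendation: postpone tags that lack enough text data.
    for tag, count in examples_per_tag.items():
        if count < min_examples:
            warnings.append(
                f"Tag '{tag}' has only {count} texts; consider adding it later."
            )
    return warnings


# Hypothetical counts for a laptop-specification dataset.
proposed = {"Brand": 120, "RAM": 95, "Storage": 80, "Warranty": 4}
for warning in check_tag_set(proposed):
    print(warning)
```

A check like this cannot tell whether two tags overlap in meaning or sit at different levels of a hierarchy. Those two recommendations still need a manual review of the tag definitions.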

Examples of Extraction Tags

To give you some ideas of how extractors can be used, and the role that tags play, the following are some examples of tag sets defined for commonly used extractor models.

For texts that describe various features, such as the specifications of laptop models, the tags might be:

  • Processor or CPU

  • Brand

  • RAM

  • Storage
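
To make the laptop example concrete, here is a rough sketch of how one tagged text could be represented as character spans. The sample sentence, the `annotations` list, and the `to_spans` helper are hypothetical and only illustrate the idea of one tag per extracted piece of text. They are not the format of any particular extractor.

```python
text = "Lenovo ThinkPad with Intel Core i7, 16GB RAM and 512GB SSD"

# Hypothetical hand-labeled annotations: (tag, exact substring of text).
annotations = [
    ("Brand", "Lenovo"),
    ("Processor or CPU", "Intel Core i7"),
    ("RAM", "16GB"),
    ("Storage", "512GB SSD"),
]


def to_spans(text, annotations):
    """Convert (tag, substring) pairs to (tag, start, end) character offsets.

    Uses the first occurrence of each substring, which is enough for
    this short example but not for texts with repeated values.
    """
    spans = []
    for tag, value in annotations:
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        spans.append((tag, start, start + len(value)))
    return spans


spans = to_spans(text, annotations)
for tag, start, end in spans:
    print(f"{tag}: {text[start:end]}")
```

Each span carries exactly one tag and the spans do not overlap, which mirrors the advice to keep tags distinct and on the same level.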

For a series of resumes in text format, you could define tags to extract:

  • Name

  • Email

  • Education

  • Work History

Similarly, for processing documents like old contracts:

  • Names of Entities Involved

  • Articles changing hands

  • Dates

You can even use custom extractors to tag larger texts. For example, if you want to extract information out of support tickets, you can set up tags to extract:

  • The phrase that includes a specific question or request

  • Paragraphs of contextual information
