When defining the set of tags you will be using for a custom extractor, there are some important things to keep in mind.
First Time Best Practices
Keep the number of tags as low as you can. Initially, it's best to work with fewer than 10.
Make sure you know your data and have enough texts for each tag that you create. If you don't have enough text data for a tag, create the tag later.
Avoid tags that might be confused with each other. Each tag should have its own clearly defined criteria.
Make sure your set of tags works on one level. It's important to avoid tags that are more general than others, or higher up in a hierarchy. Keep the scope the same.
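As a rough illustration of the first two practices, the following Python sketch reviews a proposed tag set before training. The function name review_tag_set and both thresholds are assumptions made for this example, not limits imposed by any extractor; adjust them to your own data.

```python
from collections import Counter


def review_tag_set(labeled_examples, max_tags=9, min_examples_per_tag=20):
    """Return a list of warnings for a proposed tag set.

    labeled_examples: iterable of (text, tag) pairs.
    max_tags and min_examples_per_tag are illustrative assumptions.
    """
    # Count how many labeled texts exist for each tag.
    counts = Counter(tag for _, tag in labeled_examples)
    warnings = []

    # Practice 1: keep the number of tags low (fewer than 10 to start).
    if len(counts) > max_tags:
        warnings.append(
            f"{len(counts)} tags defined; consider starting with {max_tags} or fewer"
        )

    # Practice 2: each tag needs enough example texts.
    for tag, n in sorted(counts.items()):
        if n < min_examples_per_tag:
            warnings.append(
                f"tag '{tag}' has only {n} example(s); consider adding it later"
            )
    return warnings
```

Calling review_tag_set on your labeled texts returns an empty list when the tag set passes both checks. It cannot tell you whether two tags overlap in meaning or sit at different levels of a hierarchy; those checks still need a human review.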
Examples of Extraction Tags
To give you some ideas of how extractors can be used, and the role that tags play, the following are some examples of tag sets defined for commonly used extractor models.
For texts that describe various features, such as the specifications of laptop models, the tags might be:
Processor or CPU
Brand
RAM
Storage
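To make the laptop example concrete, here is a minimal Python sketch of what one tagged text could look like as data. The sample sentence, the annotate helper, and the span format (tag, start, end, text) are all hypothetical and chosen for illustration; the format your extractor expects may differ.

```python
def annotate(text, tagged_phrases):
    """Locate each tagged phrase in text and return its character span.

    tagged_phrases: list of (tag, phrase) pairs; each phrase must occur in text.
    Only the first occurrence of each phrase is used.
    """
    spans = []
    for tag, phrase in tagged_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase {phrase!r} not found in text")
        spans.append(
            {"tag": tag, "start": start, "end": start + len(phrase), "text": phrase}
        )
    return spans


# Made-up specification text, tagged with the four tags listed above.
spec = "Dell XPS 13 with Intel Core i7, 16GB RAM and 512GB SSD"
annotations = annotate(
    spec,
    [
        ("Brand", "Dell"),
        ("Processor or CPU", "Intel Core i7"),
        ("RAM", "16GB"),
        ("Storage", "512GB SSD"),
    ],
)
```

Each tag marks a distinct kind of phrase at the same level of detail, which is what keeps the set easy to label consistently.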
For a series of resumes in text format, you could define tags to extract:
Name
Email
Education
Work History
Similarly, for processing documents like old contracts:
Names of Entities Involved
Articles changing hands
Dates
You can even use custom extractors to tag larger texts. For example, if you want to extract information out of support tickets, you can set up tags to extract:
The phrase that includes a specific question or request
Paragraphs of contextual information