The Models view is where models (topics, subtopics, sentiment, keywords, and extraction) are added to the workflow. They can be created from scratch or chosen from a list of pre-made models.
There are five types of models available:
Sentiment
Topic
Subtopic
Keywords
Extraction
Sentiment
This enrichment comes with a fixed, non-customizable list of labels: Positive, Negative, and Neutral. This means that when you add a sentiment model, there's no need to define labels; a pre-made sentiment model is used.
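As a quick illustration (plain Python, not part of the product), the fixed label set could be represented like this:

```python
from enum import Enum

# The three fixed, non-customizable sentiment labels.
class Sentiment(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    NEUTRAL = "Neutral"

print([label.value for label in Sentiment])  # ['Positive', 'Negative', 'Neutral']
```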
Topics
These are root-level topic classification models where a list of labels is defined, e.g., "pricing", "customer service", "usability", etc. Topic models can be added from a list of pre-made models or defined from scratch; see "Custom Topics" below.
Subtopics
Subtopics work like topics, with the difference that each subtopic belongs to a parent topic label. This way, you can organize topics into hierarchies with a second level of topic categorization.
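To make the hierarchy concrete, here is a hypothetical sketch in Python (the topic and subtopic names are made up for illustration):

```python
# Hypothetical two-level hierarchy: each subtopic belongs to a parent topic.
topic_hierarchy = {
    "Customer Service": ["Response Time", "Agent Friendliness"],
    "Pricing": ["Discounts", "Subscription Cost"],
}

for topic, subtopics in topic_hierarchy.items():
    for subtopic in subtopics:
        print(f"{topic} > {subtopic}")  # e.g. "Pricing > Discounts"
```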
Keywords
It extracts the most relevant keywords within a text, including aspects (typically nominal groups), for example: food; qualifiers (typically adjectives), for example: delicious; and opinions (a combination of an aspect and a qualifier), for example: delicious food.
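The sketch below shows an illustrative output shape for the example above; it is not the actual response format:

```python
text = "The delicious food made up for the slow service."

# Illustrative result: aspects, qualifiers, and opinions extracted from the text.
keywords = {
    "aspects":    ["food", "service"],                 # nominal groups
    "qualifiers": ["delicious", "slow"],               # adjectives
    "opinions":   ["delicious food", "slow service"],  # aspect + qualifier
}
```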
Extraction
It extracts particular data points defined in the chosen extraction model, e.g., entities, company names, etc.
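As a rough, hypothetical analogue (not the product's implementation), a regex-based extractor for one kind of data point could look like this:

```python
import re

# Toy extractor that pulls email addresses out of a text.
text = "Contact jane@acme.com or support@globex.io for details."
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
print(emails)  # ['jane@acme.com', 'support@globex.io']
```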
Custom Topics
MonkeyLearn allows you to create and train custom models for topic classification. It works by defining a custom list of labels and then providing a definition for each label (mapping keywords to the label or defining labeling functions).
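A minimal sketch of the keyword-mapping idea, assuming simple substring matching (a real model would use more robust matching):

```python
# Hypothetical keyword map: each label is defined by its mapped keywords.
keyword_map = {
    "pricing": ["expensive", "cheap", "cost"],
    "support": ["helpdesk", "agent", "ticket"],
}

def label_by_keywords(text: str) -> list[str]:
    """Return every label whose mapped keywords appear in the text."""
    text = text.lower()
    return [label for label, kws in keyword_map.items()
            if any(kw in text for kw in kws)]

print(label_by_keywords("The agent resolved my ticket quickly"))  # ['support']
```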
Adding Labels
When you add a label, you can type a custom label name, or you can pick one of the pre-made topic labels shown at the top.
The advantage of using a pre-made label is that it already comes with "knowledge", meaning it may include pre-made labeling functions:
Already mapped keywords
Pre-made classification models
Pre-made regexes
Pre-made inference prompts
These pre-made labels are organized in thematic groups, e.g., for product feedback there's a list of common topics such as Pricing, Usability, Support, Reliability, and Functionality.
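A hypothetical sketch of that grouping, using the product feedback topics mentioned above:

```python
# Pre-made labels organized in thematic groups (names from the example above).
premade_label_groups = {
    "Product Feedback": ["Pricing", "Usability", "Support",
                         "Reliability", "Functionality"],
}
```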
Defining Labels
Each topic label must be defined, meaning that you must map keywords to each of them. That way, the label model will have custom labeling functions based on keyword matching that it can use to label training samples.
To define labels, you must map keywords:
Select the label in the label list.
Check if there are keywords already mapped in the Mapped Keywords list.
Map keywords to a label by clicking the left-pointing arrow button on the keyword you want to map from the Unused Keywords list.
Other utilities:
You can map custom keywords by typing the keyword in the text box under Mapped Keywords and clicking the Add button.
You can search for particular keywords using the search bar below Unused Keywords.
If you click the 🔍 button on a particular keyword in the lists, you will see samples from the dataset that contain that keyword at the bottom of the screen. This is useful to verify in what context that keyword is used (a rough analogue is sketched below).
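A rough analogue of that lookup in plain Python (the sample texts are invented; note that substring matching also hits words like "Unfriendly"):

```python
samples = [
    "The staff was very friendly.",
    "Unfriendly support, never again.",
    "Fast shipping and friendly service.",
]

def samples_with(keyword: str) -> list[str]:
    """Return dataset samples that contain the given keyword."""
    return [s for s in samples if keyword.lower() in s.lower()]

print(samples_with("friendly"))  # all three samples match
```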
💡 Try to map as many keywords as possible, but avoid overly generic keywords that can introduce noise and confuse the model. Focus on specific keywords that fit only one particular label. Take into account that mapped keywords will be used to initially label samples by keyword matching. We are looking for high-precision keywords; the strategy here is to map a lot of low-recall, high-precision keywords and discard generic ones.
💡 When mapping a particular keyword, try searching for it in the search bar to find similar ones, i.e., other n-grams that contain the keyword. For example, if you mapped "friendly", try searching "friend" and you might also map "unfriendly", "friendliness", "friendly staff", etc.
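The same tip, sketched over a made-up list of unused keywords:

```python
unused_keywords = ["friendly staff", "unfriendly", "friendliness",
                   "pricing plan", "friend recommended"]

# Searching a seed substring surfaces related n-grams worth mapping.
seed = "friend"
related = [kw for kw in unused_keywords if seed in kw]
print(related)  # ['friendly staff', 'unfriendly', 'friendliness', 'friend recommended']
```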
Training
Here you can trigger the labeling process and the training process. When they finish, you will see that the Dataset view now contains a column with the results of this model for each row. The stats below are also updated.
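Outside the UI, the same two-step idea (weakly label by keyword matching, then train a classifier on those labels) could be sketched like this, assuming scikit-learn is available; this is not MonkeyLearn's actual training code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

keyword_map = {"pricing": ["expensive", "cost"], "support": ["agent", "ticket"]}
samples = ["Too expensive for what it does",
           "The agent closed my ticket fast",
           "Cost went up again",
           "Support agent was rude"]

# Step 1: labeling pass (keyword matching assigns an initial label).
labels = [next((lbl for lbl, kws in keyword_map.items()
                if any(kw in s.lower() for kw in kws)), None)
          for s in samples]

# Step 2: training pass on the weakly labeled samples.
X = TfidfVectorizer().fit_transform(samples)
model = LogisticRegression().fit(X, labels)
```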
Stats
The Stats section shows handy performance metrics for the selected model, including:
Number of labels defined in the model.
Labeling coverage: the percentage of training samples that were labeled with at least one label.
Prediction coverage: the percentage of training samples that were assigned at least one label by the trained machine learning model (both coverage metrics are illustrated in the sketch after this list).
Number of mapped keywords per label.
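To make the two coverage metrics concrete, here is a small sketch; the per-sample labels are invented:

```python
# Each sample records the labels assigned by keyword matching ("labeled")
# and by the trained model ("predicted").
samples = [
    {"labeled": ["pricing"], "predicted": ["pricing"]},
    {"labeled": [],          "predicted": ["support"]},
    {"labeled": ["support"], "predicted": []},
]

def coverage(key: str) -> float:
    """Percentage of samples with at least one label under the given key."""
    return 100 * sum(1 for s in samples if s[key]) / len(samples)

print(f"Labeling coverage:   {coverage('labeled'):.0f}%")    # 67%
print(f"Prediction coverage: {coverage('predicted'):.0f}%")  # 67%
```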
Test
Here you can test the model's predictions on a particular text.
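If you prefer to test programmatically, the classic MonkeyLearn Python SDK (pip install monkeylearn) exposes a classify call; note that models built in this view may not be available through it, and the API key and model ID below are placeholders:

```python
from monkeylearn import MonkeyLearn

ml = MonkeyLearn("<YOUR_API_KEY>")
response = ml.classifiers.classify("cl_XXXXXXXX", ["This app is great but pricey."])
print(response.body)  # list with classifications per input text
```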