Whitelisting Words

Read this if you want to know what the whitelist parameter is and what it is useful for.

Written by Raul Garreta
Updated over a week ago

The whitelist parameter is a list of words that the model will always use as features. Including a word in the whitelist forces the classifier to learn from it, regardless of how often the word appears or how important it is for each category. This is useful when you know that texts containing a particular word should always fall into a certain category.

Say you have a category called Pricing & Billing in a topic classifier trained on feedback about the services you provide. Chances are you will want every text containing the word pricing to land in that category. If you add the word pricing to your classifier's whitelist, you will only need to tag a few texts for the classifier to correctly predict most of the texts containing that word.
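
To make the idea concrete, here is a minimal Python sketch of how a whitelist can interact with feature selection. It is an illustration only, not the product's actual implementation: the function select_features, its max_features parameter, and the simple "keep the most frequent words" rule are all assumptions made for this example.

```python
from collections import Counter


def select_features(texts, max_features, whitelist=()):
    """Hypothetical feature selection: keep the most frequent words,
    then force in every whitelisted word."""
    counts = Counter(word for text in texts for word in text.lower().split())
    top_words = [word for word, _ in counts.most_common(max_features)]
    # Whitelisted words are added even when they are too rare to be selected.
    forced = [w.lower() for w in whitelist if w.lower() not in top_words]
    return top_words + forced


texts = [
    "the app is slow",
    "the app crashes",
    "the pricing is unclear",
]

# With a small feature budget, the rare word "pricing" is dropped...
without_whitelist = select_features(texts, max_features=2)
# ...unless it is whitelisted, in which case it is always kept.
with_whitelist = select_features(texts, max_features=2, whitelist=["pricing"])

print(without_whitelist)
print(with_whitelist)
```

In this sketch the word pricing appears only once, so a frequency-based selection would ignore it. Whitelisting it guarantees that it is available as a feature, which is why a few tagged texts can be enough for the classifier to pick up on it.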
