The n-gram range sets which features are used to characterize texts:

  • Unigrams or words (n-gram size = 1)
  • Bigrams or terms composed of two words (n-gram size = 2)
  • Trigrams or terms composed of three words (n-gram size = 3)

Currently we support the following combinations:

  • Unigrams
  • Unigrams and Bigrams (Default)
  • Unigrams, Bigrams and Trigrams
  • Bigrams
  • Bigrams and Trigrams
  • Trigrams
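
As a rough illustration of what each combination means, the sketch below extracts the n-grams for a given range from a short text. It assumes a simple lowercase, whitespace-based tokenizer, which is probably not the tokenization used internally; the names extract_ngrams and NGRAM_RANGES are hypothetical and exist only for this example.

```python
# Hypothetical mapping from each supported combination to (min size, max size).
NGRAM_RANGES = {
    "Unigrams": (1, 1),
    "Unigrams and Bigrams": (1, 2),  # the default combination
    "Unigrams, Bigrams and Trigrams": (1, 3),
    "Bigrams": (2, 2),
    "Bigrams and Trigrams": (2, 3),
    "Trigrams": (3, 3),
}


def extract_ngrams(text, min_n=1, max_n=2):
    """Return every n-gram of size min_n..max_n, in order of size then position.

    Assumption: tokens are lowercase words split on whitespace.
    """
    tokens = text.lower().split()
    features = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the text.
        for start in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[start:start + n]))
    return features


if __name__ == "__main__":
    sample = "The movie was good"
    for name, (min_n, max_n) in NGRAM_RANGES.items():
        print(name, "->", extract_ngrams(sample, min_n, max_n))
```

With the default combination (Unigrams and Bigrams), the four-word sample yields four unigrams and three bigrams, so wider ranges produce more features per text.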

For problems like sentiment analysis, n-gram ranges that include bigrams or trigrams can dramatically improve classification accuracy, because they capture more complex expressions formed by more than one word. The rationale is that in sentiment analysis the outcome depends not only on how often words occur but also on how they are combined: "good" means something different on its own than when it is preceded by "not", as in "not good".
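
The "not good" case can be sketched with two short texts. This is only an illustration under the assumption of lowercase, whitespace-split tokens; ngram_set is a hypothetical helper, not part of any API.

```python
def ngram_set(text, min_n, max_n):
    """Return the set of n-grams of size min_n..max_n (whitespace tokens, lowercased)."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[start:start + n])
        for n in range(min_n, max_n + 1)
        for start in range(len(tokens) - n + 1)
    }


positive = "the food was good"
negative = "the food was not good"

# With unigrams only, both texts contain the feature "good";
# the single extra feature in the negative text is "not".
unigram_diff = ngram_set(negative, 1, 1) - ngram_set(positive, 1, 1)

# With bigrams, the negative text gains the feature "not good",
# and the positive text keeps "was good" as a feature of its own.
bigrams_only_in_negative = ngram_set(negative, 2, 2) - ngram_set(positive, 2, 2)
bigrams_only_in_positive = ngram_set(positive, 2, 2) - ngram_set(negative, 2, 2)

if __name__ == "__main__":
    print(sorted(unigram_diff))               # ['not']
    print(sorted(bigrams_only_in_negative))   # ['not good', 'was not']
    print(sorted(bigrams_only_in_positive))   # ['was good']
```

With unigrams alone, a classifier sees "good" in both texts and has to rely on the isolated word "not"; with bigrams it gets "not good" and "was good" as distinct features.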

You can change the n-gram setting in the parameters and then retrain your model to see whether it improves the classifier statistics.
