Preprocessing URLs

Read this if you want to know what URL preprocessing does and when it is useful

Written by Raul Garreta
Updated over a week ago

When the Preprocess URLs parameter is selected, every URL in the text is replaced by the special word __url__. This allows a model to learn from the fact that a URL is mentioned at all, rather than from specific URLs or instances.
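To illustrate the effect, here is a minimal Python sketch of this kind of replacement. It is not the platform's actual implementation: the regular expression, the function name preprocess_urls, and the constant names are assumptions made for the example, and the pattern is deliberately simple (it will, for instance, swallow punctuation attached to the end of a URL).

```python
import re

# Assumed, simplified pattern: anything starting with http://, https:// or www.
# up to the next whitespace. Real URL detection is usually more careful.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# The special word described in this article.
URL_TOKEN = "__url__"


def preprocess_urls(text: str) -> str:
    """Replace every URL-like substring with the generic __url__ token."""
    return URL_PATTERN.sub(URL_TOKEN, text)


print(preprocess_urls("Check https://example.com/pricing and www.example.org today"))
# prints: Check __url__ and __url__ today
```

After this step, two texts that link to different pages share the same __url__ feature, which is what lets a model generalize over "contains a URL".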

This is particularly useful for excluding URLs from the features (i.e., using the Preprocess URLs parameter in combination with the model's stopwords) in cases where the information encoded in URLs is of little value for classification purposes.
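The combination with stopwords can be sketched as follows, assuming it works by adding __url__ to the stopword list so that the token is dropped along with the other stopwords. The tokenizer, the stopword set, and the function name extract_features are hypothetical and only serve the illustration; they do not describe how the platform builds features internally.

```python
import re

# Assumed, simplified URL pattern (same idea as any basic URL matcher).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
URL_TOKEN = "__url__"

# Hypothetical stopword list; the key point is that it includes __url__.
STOPWORDS = {"the", "a", "at", URL_TOKEN}


def extract_features(text: str) -> list[str]:
    """Replace URLs with __url__, tokenize, then drop stopwords."""
    replaced = URL_PATTERN.sub(URL_TOKEN, text)
    # \w+ keeps "__url__" intact because underscores are word characters.
    tokens = re.findall(r"\w+", replaced.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(extract_features("Read the docs at https://example.com/docs"))
# prints: ['read', 'docs']
```

Without the replacement step, the URL would be split into pieces such as "https", "example" and "com", each becoming a separate feature; with it, a single stopword entry removes all URLs at once.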

    
