The following is a list of public extractor modules that you can use with your MonkeyLearn account. 

They can be used both for manual processing (uploading files directly) and through our API, Zapier, or RapidMiner.

Keywords

  • Keyword Extractor (English) - Extract keywords from text in English. This model was trained on texts from different domains.
  • Keyword Extractor (Spanish) - Extract keywords from text in Spanish. Keywords can consist of one or more words and represent the important topics in your content. They can be used to index data, generate tag clouds, or support search.

Entities

  • Entity Extractor (English) - Extract entities from text using Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as person and company names. This implementation labels 3 classes: PERSON, ORGANIZATION and LOCATION.
  • Entity Extractor (Spanish) - Extract entities from text in Spanish using Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as person and company names. This implementation labels 4 classes: PERS, ORG, LUG and OTROS.

Web to Text

  • Boilerplate Extractor (English) - Extract relevant text from HTML. This algorithm can be used to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
  • Html to Text Extractor - Converts a page of HTML into clean, easy-to-read plain ASCII text.
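
To illustrate what converting HTML to plain text involves, here is a minimal Python sketch built only on the standard library. It is not the implementation behind the Html to Text Extractor; the class name TextExtractor, the function html_to_text, and the choice of block-level tags are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores script and style content."""

    SKIP = {"script", "style"}
    # Assumed set of tags after which a line break is inserted.
    BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling html_to_text on a page returns its visible text as a single whitespace-normalized string. Removing boilerplate such as navigation and footers is a separate, harder problem that this sketch does not attempt.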

Opinion Units

Summary and Insight

  • Insight Extractor (English) - Extract the most important insights from text in English. Given a text, the output will be the most important keywords in the text, each with the most representative sentences in which it appears. This module is useful for things like seeing what most users are saying about a product or place: simply send all the reviews concatenated as one text.
  • Summary Extractor (English) - Given a text, the output will be a shorter version of it that maintains its meaning. This summarization module employs statistical algorithms and natural language processing technology to analyze your content and generate a summary that preserves the gist of the original. No new sentences are generated; every sentence of the summary is present in the original text.
  • Sentence Extractor (English) - Extracts the sentences from a given text. Useful to separate articles or paragraphs into smaller pieces of data.
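
The idea of a summary made only of sentences from the original text (an extractive summary) can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the algorithm used by the Summary Extractor: sentences are split naively on end punctuation and scored by average word frequency, and the names split_sentences and summarize are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    sentences = split_sentences(text)
    # Word frequencies over the whole text act as a crude importance signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the highest-scoring sentences, then restore their original order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Because the function only selects from the list produced by split_sentences, every sentence it returns appears verbatim in the input.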

Data Points

  • US Address Extractor - Extract US addresses from text. This algorithm can be used to detect one or more addresses in a text and extract information about them.
  • Date and Time Extractor - Extracts dates and times from text, and outputs them in ISO format. If a date includes a time, both are extracted together. When any element of the date is missing (such as the year), it is taken from the current date. A different base date can be specified instead.
  • Price Extractor - Extract prices in different currencies from text. The number and currency are returned separately for more convenient parsing.
  • Email Extractor - Extract email addresses from text.
  • Phone Number Extractor - Extracts North American phone numbers from text and returns them with unified formatting. All the numbers extracted will be valid under the North American Numbering Plan, which means they can be from the US, from Canada, or from certain Caribbean countries.
  • URL Extractor - Extract URLs from text.
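
As an illustration of how a base date can fill in missing elements during date extraction, here is a small Python sketch. It is not the Date and Time Extractor itself: it only recognizes English dates written as "Month day" or "Month day, year", and the names extract_dates and base are assumptions made for this example.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Assumed input format: "<Month> <day>" with an optional ", <year>".
DATE_RE = re.compile(r"\b([A-Za-z]+) (\d{1,2})(?:, (\d{4}))?\b")


def extract_dates(text, base=None):
    """Return ISO 8601 dates found in text, taking a missing year from base."""
    base = base or date.today()
    results = []
    for month_name, day, year in DATE_RE.findall(text):
        month = MONTHS.get(month_name.lower())
        if month is None:
            continue  # The word before the number is not a month name.
        resolved_year = int(year) if year else base.year
        try:
            results.append(date(resolved_year, month, int(day)).isoformat())
        except ValueError:
            continue  # Skip impossible dates such as "June 31".
    return results
```

With base set to date(2019, 1, 1), the text "Meet on March 5 and again on July 4, 2021." yields 2019-03-05 for the first date, because its year is taken from the base date, and 2021-07-04 for the second.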