The following is a list of public extractor models that you can use with your MonkeyLearn account. 

They can be used either manually (by uploading files directly) or through our API, Zapier, the Google Sheets Extension, or RapidMiner. Please see integrations for more details.

Keywords

  • Keyword Extractor (English) - Extract keywords from text in English. This model was trained on texts from different domains.
  • Keyword Extractor (Spanish) - Extract keywords from text in Spanish. Keywords can consist of one or more words and are defined as the important topics in your content. They can be used to index data, generate tag clouds or for searching.

Entities

  • Entity Extractor (English) - Extract entities from text using Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as person and company names. This implementation labels 3 classes: PERSON, ORGANIZATION and LOCATION.
  • Entity Extractor (Spanish) - Extract entities from text in Spanish using Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as person and company names. This implementation labels 4 classes: PERS, ORG, LUG and OTROS.

Email Thread Cleaning

  • Email Cleaner & Last Reply Extractor - Extract the last reply from an email thread. A cleaning model that removes signatures, confidentiality clauses and other replies from email threads. It relies on statistical algorithms and natural language processing technology to analyze your emails and generate a cleaned version that captures the actual message. The targeted language is English.

Web to Text

  • Boilerplate Extractor (English) - Extract relevant text from HTML. This algorithm can be used to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
  • Html to Text Extractor - Converts a page of HTML into clean, easy-to-read plain ASCII text.
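
To give a rough idea of what an HTML-to-text conversion involves, here is a minimal sketch using only Python's standard library. It is not the model's actual implementation: the class `TextOnly`, the function `html_to_text` and the list of block-level tags are assumptions made for illustration, and it does not attempt boilerplate detection or ASCII conversion.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Hypothetical choice of tags that should end a line of output.
    BLOCK = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Calling `html_to_text` on a page returns one line of plain text per block element, with script and style contents left out.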

Opinion Units

Summary and Insight

  • Insight Extractor (English) - Extract the most important insights from text in English. Given a text, the output will be the most important keywords in the text. Each keyword will include the most representative sentences where it appears. This model is useful for things like seeing what most users are saying about a product or place: simply send all the reviews concatenated as one text.
  • Summary Extractor (English) - Given a text, the output will be a shorter version of it that maintains its meaning. This summarization model employs statistical algorithms and natural language processing technology to analyze your content and generate a summary that preserves the gist of the original. No new sentences are generated; every sentence of the summary is present in the original text.
  • Sentence Extractor (English) - Extracts the sentences from a given text. Useful to separate articles or paragraphs into smaller pieces of data.
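
The idea of a summary made only of sentences that already exist in the original text can be illustrated with a small sketch. The word-frequency scoring used here is a common textbook technique and an assumption on our part, not the algorithm behind the Summary Extractor; the names `split_sentences`, `summarize` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "was"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return up to max_sentences original sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Because the function only selects and reorders nothing, every sentence it returns appears verbatim in the input.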

Data Points

  • US Address Extractor - Extract US addresses from text. This algorithm can be used to detect a single or multiple addresses inside a text and extract information about them.
  • Date and Time Extractor - Extracts dates and times from text, and outputs them in ISO format. If a date contains a time, they will be extracted together. When an element of the date is missing (such as the year), it is filled in from the current date. A different base date can be specified instead.
  • Price Extractor - Extract prices in different currencies from text. The number and currency are returned separately for more convenient parsing.
  • Email Extractor - Extract email addresses from text.
  • Phone Number Extractor - Extracts North American phone numbers from text and returns them with unified formatting. All the numbers extracted will be valid under the North American Numbering Plan, which means they can be from the US, from Canada, or from certain Caribbean countries.
  • URL Extractor - Extract URLs from text.
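
The base-date behaviour described for the Date and Time Extractor can be sketched as follows. This is only an illustration under stated assumptions: it recognises a single pattern ("Month day", optionally followed by a year and by "at HH:MM"), fills in only a missing year, and the names `extract_dates`, `PATTERN` and the `base` parameter are hypothetical rather than part of the MonkeyLearn API.

```python
import re
from datetime import date, datetime

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches e.g. "March 5", "July 4, 2019" or "March 5 at 14:30".
PATTERN = re.compile(
    r"\b(?P<month>[A-Za-z]+)\s+(?P<day>\d{1,2})(?!\d)"
    r"(?:,?\s+(?P<year>\d{4})(?!\d))?"
    r"(?:\s+at\s+(?P<hour>\d{1,2}):(?P<minute>\d{2}))?"
)


def extract_dates(text, base=None):
    """Return ISO strings for the dates found; a missing year comes from base."""
    base = base or datetime.now()
    results = []
    for m in PATTERN.finditer(text):
        month = MONTHS.get(m.group("month").lower())
        if month is None:
            continue  # the word before the number was not a month name
        year = int(m.group("year")) if m.group("year") else base.year
        day = int(m.group("day"))
        try:
            if m.group("hour"):
                # Date and time are kept together in one ISO value.
                value = datetime(year, month, day,
                                 int(m.group("hour")), int(m.group("minute")))
            else:
                value = date(year, month, day)
        except ValueError:
            continue  # e.g. "February 31" or "at 25:00"
        results.append(value.isoformat())
    return results
```

For example, with `base=datetime(2020, 1, 1)` the text "Meet on March 5 at 14:30" yields `2020-03-05T14:30:00`, while "Due July 4, 2019." keeps its own year.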