MonkeyLearn

This tab lets you clean up the data by stacking filters in a user-defined order, each with its own parameters. There are two main types of filters:

- Ignore filters: filter out rows that are just noise for your application by completely ignoring any row that matches the filter.
- Clean filters: clean up the text field by removing and/or substituting artifacts within the text.

The Cleaning Filters table on the left shows the list of filters that will be applied. You can add as many filters as you want, and you can also edit them, delete them, or change the order in which they are applied.

The Preview button applies the current filters to a subset of the data, shown in the table on the right. It takes a few seconds to process and update. This is very useful for iterating on the filter configuration before saving and processing the entire dataset.

Save and Clean saves the filter settings and processes all the training samples in the project with those filters. The result is then reflected in the Dataset View (you will see the cleaned texts, and you will not see samples that were filtered out).

💡 Keep in mind that the list of filters is actually a pipeline: the output of the first filter is the input of the second filter, and so on.
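To make the pipeline idea concrete, here is a minimal Python sketch. It is not MonkeyLearn's implementation: the function names (`filter_empty`, `clean_urls`, `run_pipeline`) and the convention of returning `None` to ignore a row are assumptions made only for illustration.

```python
import re
from typing import Callable, Iterable, List, Optional

# Assumed convention for this sketch: a filter returns the (possibly cleaned)
# text, or None to signal that the row should be ignored.
Filter = Callable[[str], Optional[str]]


def filter_empty(text: str) -> Optional[str]:
    """Ignore filter: drop rows that contain no visible characters."""
    return text if text.strip() else None


def clean_urls(text: str) -> Optional[str]:
    """Clean filter: remove URLs and collapse the leftover whitespace."""
    without_urls = re.sub(r"https?://\S+", "", text)
    return " ".join(without_urls.split())


def run_pipeline(rows: Iterable[str], filters: List[Filter]) -> List[str]:
    """Apply the filters in order; each filter sees the previous one's output."""
    kept = []
    for text in rows:
        current: Optional[str] = text
        for apply_filter in filters:
            current = apply_filter(current)
            if current is None:  # row ignored, skip the remaining filters
                break
        if current is not None:
            kept.append(current)
    return kept


rows = ["Great app! https://example.com", "   ", "Needs work"]
cleaned = run_pipeline(rows, [filter_empty, clean_urls])
print(cleaned)
```

Because each filter receives the previous filter's output, the order matters: in this sketch a row that contains only a URL is dropped when `clean_urls` runs before `filter_empty`, but survives as an empty string when the order is reversed.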

When you add a filter, you select the filter type from a predefined list of available filters. For each filter you can configure parameters to customize its behavior. For example, a filter may expose a phrase list parameter: a list of phrases such that, if the text contains any of them, the sample is discarded.
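The behavior of a phrase list parameter can be sketched in a few lines of Python. This is only an illustration: the helper name `make_phrase_filter`, the sample phrases, and the case-insensitive matching are assumptions, not a description of how the product matches phrases.

```python
from typing import Callable, List, Optional


def make_phrase_filter(phrases: List[str]) -> Callable[[str], Optional[str]]:
    """Build an ignore filter from a phrase list (hypothetical helper)."""
    lowered = [phrase.lower() for phrase in phrases]

    def phrase_filter(text: str) -> Optional[str]:
        haystack = text.lower()  # assumption: matching ignores case
        if any(phrase in haystack for phrase in lowered):
            return None  # the sample is discarded
        return text  # the sample is kept unchanged

    return phrase_filter


ignore_auto_replies = make_phrase_filter(["out of office", "unsubscribe"])
samples = ["I am Out of Office until Monday", "The checkout page is slow"]
kept = [s for s in samples if ignore_auto_replies(s) is not None]
print(kept)
```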

You can always edit a filter you have already added by clicking on the edit button on the corresponding filter in the list.

💡 Put ignore filters that can quickly discard rows first in the list (e.g., Filter Empty, Filter Containing, Filter Matching Regex). This reduces processing in downstream filters: if a row is removed in step 1 or 2, there's no need to process it in step 3 and beyond.
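The saving can be illustrated by counting how often each step runs. The sketch below is hypothetical (the `counted` wrapper, the `run` helper, and the stand-in `normalize_whitespace` step are all assumptions); it only shows that placing a cheap ignore filter first means later steps see fewer rows.

```python
from collections import Counter
from typing import Callable, List, Optional

Filter = Callable[[str], Optional[str]]


def counted(name: str, fn: Filter, calls: Counter) -> Filter:
    """Wrap a filter so that every invocation is tallied under `name`."""
    def wrapper(text: str) -> Optional[str]:
        calls[name] += 1
        return fn(text)
    return wrapper


def filter_empty(text: str) -> Optional[str]:
    return text if text.strip() else None


def normalize_whitespace(text: str) -> Optional[str]:
    # Stand-in for a more expensive clean filter.
    return " ".join(text.split())


def run(rows: List[str], filters: List[Filter]) -> List[str]:
    kept = []
    for text in rows:
        current: Optional[str] = text
        for apply_filter in filters:
            current = apply_filter(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


rows = ["", "hello   world", "  ", "ok"]

ignore_first = Counter()
result_a = run(rows, [counted("empty", filter_empty, ignore_first),
                      counted("clean", normalize_whitespace, ignore_first)])

clean_first = Counter()
result_b = run(rows, [counted("clean", normalize_whitespace, clean_first),
                      counted("empty", filter_empty, clean_first)])

print(ignore_first["clean"], clean_first["clean"])
```

Both orders keep the same two rows here, but the clean step runs on every row when it comes first and only on the non-empty rows when the ignore filter comes first.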

💡 Some ignore filters, such as "Filter by Length", may work better at the end of the list, because earlier cleaning steps (e.g., cleaning URLs or long tokens) can reduce the length of the text.
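A small Python sketch shows why the position of a length filter matters. The 40-character limit, the function names, and the sample row are assumptions chosen for illustration, not product defaults.

```python
import re
from typing import Callable, List, Optional

MAX_LENGTH = 40  # hypothetical limit for this sketch

Filter = Callable[[str], Optional[str]]


def clean_urls(text: str) -> Optional[str]:
    """Clean filter: remove URLs and collapse the leftover whitespace."""
    return " ".join(re.sub(r"https?://\S+", "", text).split())


def filter_by_length(text: str) -> Optional[str]:
    """Ignore filter: drop rows longer than MAX_LENGTH characters."""
    return text if len(text) <= MAX_LENGTH else None


def apply_filters(text: str, filters: List[Filter]) -> Optional[str]:
    current: Optional[str] = text
    for apply_filter in filters:
        current = apply_filter(current)
        if current is None:
            return None
    return current


row = "Love it! https://example.com/a/very/long/tracking/link?id=12345"

length_first = apply_filters(row, [filter_by_length, clean_urls])
length_last = apply_filters(row, [clean_urls, filter_by_length])
print(length_first, length_last)
```

With the length filter first, the row is discarded because the URL pushes it over the limit; with the length filter last, the URL has already been removed and the short remaining text is kept.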

How to clean up data to improve MonkeyLearn Workflows' output

Cleaning View
