Working with CSV/Excel Data Files

More on formatting, encoding requirements, file size and structure suggestions, and upload limits for your data files

Written by Raul Garreta
Updated over a week ago

MonkeyLearn uses Comma Separated Values (CSV) or Excel files when importing data into models and CSV when exporting data. The following sections show more details on the formats accepted and suggestions to ensure you can work with these file formats correctly. If you are having issues uploading data, see Common Problems when Uploading Data Files.

CSV Primer

CSV files are just plain text files, with a specific format to represent rows and columns. Usually each line represents a row and commas in each line are used to signify where columns would be.

Take a simple two-column table of movie titles and genres, for example. Represented as a CSV file, that data would look like this:

2001: A Space Odyssey,Science Fiction
Kung Fu Panda,Animation

If a value itself contains a comma, wrap the whole value in quotation marks ("). If a quoted value needs to contain a quotation mark, escape it by adding another quotation mark before it. Let's add a couple of lines to our CSV to illustrate this:

2001: A Space Odyssey,Science Fiction
Kung Fu Panda,Animation
"The Good, the Bad and the Ugly",Western
"Dr. Strangelove or: ""How I Learned to Stop Worrying and Love the Bomb""",Comedy
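If you generate CSV files from code, most CSV libraries apply these quoting rules for you. The following is a minimal sketch using Python's standard `csv` module; the `rows_to_csv` helper name is just an illustration and is not part of MonkeyLearn.

```python
import csv
import io

# The same four rows as the example above, as plain Python strings.
rows = [
    ["2001: A Space Odyssey", "Science Fiction"],
    ["Kung Fu Panda", "Animation"],
    ["The Good, the Bad and the Ugly", "Western"],
    ['Dr. Strangelove or: "How I Learned to Stop Worrying and Love the Bomb"', "Comedy"],
]


def rows_to_csv(rows):
    """Serialize rows to CSV text (hypothetical helper for illustration)."""
    buffer = io.StringIO()
    # By default, csv.writer quotes only the values that need it and
    # doubles any quotation marks found inside a quoted value.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

The values with a comma or quotation marks come out wrapped in quotation marks, while the plain values are left as they are.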

Finally, please note that you can have multi-line values but you must wrap those in quotation marks. Take a look at the following example:

"This is a single line column","And another single line column."
"This is a multi-line column.
It continues here","This is a second multi-line column.
It ends here"

The last three lines represent a single row whose columns contain newline characters.
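To check how a file with multi-line values will be split into rows, you can parse it yourself before uploading. This is a small sketch with Python's standard `csv` module, using the example above as input; `parse_csv` is a hypothetical helper name.

```python
import csv
import io

# The multi-line example from above: four physical lines, two CSV rows.
csv_text = (
    '"This is a single line column","And another single line column."\n'
    '"This is a multi-line column.\n'
    'It continues here","This is a second multi-line column.\n'
    'It ends here"\n'
)


def parse_csv(text):
    """Parse CSV text into a list of rows (hypothetical helper)."""
    # newline="" leaves line breaks untouched so the csv module can
    # keep the ones that appear inside quoted values.
    return list(csv.reader(io.StringIO(text, newline="")))


parsed = parse_csv(csv_text)
for row in parsed:
    print(len(row), "columns:", row)
```

Counting the parsed rows rather than the lines in the file shows how many rows the file really contains.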

Now that you have the basics of the CSV format, we can describe MonkeyLearn's specific CSV requirements.

MonkeyLearn CSV/Excel format

MonkeyLearn will ask for CSV or Excel files to import text data to a model. Within your files, MonkeyLearn will need you to define the following fields or columns:

  • Texts or Text Data

  • Tag or Category (optional)

If your file contains more columns than just the text and tags, MonkeyLearn will show all of the available columns and will ask you to select which columns are to be used.

The tag or category field is optional when uploading text data. It is only used when creating a custom classification model where the tags have already been defined.

Zip Files

MonkeyLearn also supports zipped CSV or Excel files. If you are working with large data files, consider zipping the file before uploading it. Zipping reduces the file size, so the upload takes less time and you are less likely to reach a potential file size limit.
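Any standard zip tool should work. If you prepare your files with a script, the following sketch shows one way to zip a data file with Python's standard `zipfile` module. The `zip_data_file` helper and the file name in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_data_file(source_path):
    """Create a .zip archive next to the given file and return its path."""
    source = Path(source_path)
    archive = source.with_suffix(".zip")
    # ZIP_DEFLATED compresses the contents; the zipfile default only stores them.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps just the file name inside the archive, not the full path.
        zf.write(source, arcname=source.name)
    return archive
```

For example, `zip_data_file("reviews.csv")` would write `reviews.zip` to the same folder.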

Limits on Data Files

Note that MonkeyLearn enforces some limitations on data files:

  • A maximum of 200 tags (or categories)

  • A maximum number of texts that depends on your account plan's query limit

  • A maximum of 100 MB (zipped or not)
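You can check a CSV file against these limits before uploading it. The sketch below is only an illustration under several assumptions: the file has a header row, the tag column is named `tag`, 1 MB is counted as 1024 × 1024 bytes, and you pass in your own plan's text limit as `max_texts`. None of these names come from MonkeyLearn.

```python
import csv
import os

MAX_TAGS = 200                      # limit stated above
MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB; the exact byte count is an assumption


def check_data_file(path, tag_column="tag", max_texts=None):
    """Return a list of human-readable problems found in a CSV file."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 100 MB")

    tags = set()
    row_count = 0
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader treats the first row as the header (assumption).
        for row in csv.DictReader(handle):
            row_count += 1
            tag = (row.get(tag_column) or "").strip()
            if tag:
                tags.add(tag)

    if len(tags) > MAX_TAGS:
        problems.append(f"{len(tags)} distinct tags found; the maximum is {MAX_TAGS}")
    # max_texts stands in for your account plan's limit (hypothetical parameter).
    if max_texts is not None and row_count > max_texts:
        problems.append(f"{row_count} texts found; your limit is {max_texts}")
    return problems
```

An empty list means none of the checks failed. It does not guarantee that the upload will be accepted.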

Encoding Requirements for CSVs

To work with MonkeyLearn, a CSV file should be UTF-8 encoded (latin-1 should also work).

We try our best to auto detect other encodings, but to ensure we can accept your data, we strongly recommend UTF-8.
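If you are not sure how a file is encoded, you can convert it to UTF-8 yourself before uploading. This is a minimal sketch, assuming the file is either valid UTF-8 already or latin-1. The `to_utf8_bytes` helper is hypothetical, and any other encoding would need different handling.

```python
def to_utf8_bytes(raw):
    """Return the given file contents as UTF-8 encoded bytes."""
    try:
        # Already valid UTF-8: keep the bytes unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is latin-1.
        # latin-1 maps every byte to a character, so this step never fails,
        # but the text will be wrong if the file used another encoding.
        return raw.decode("latin-1").encode("utf-8")


# Example: the latin-1 bytes for "café" are not valid UTF-8.
latin1_bytes = "café".encode("latin-1")
utf8_bytes = to_utf8_bytes(latin1_bytes)
print(utf8_bytes.decode("utf-8"))
```

To convert a file, read it in binary mode, pass the bytes through the function, and write the result to a new file in binary mode.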

If you are exporting the CSV from Excel, please note that Excel's CSV export is known to have issues (in particular with some Unicode characters). Instead of exporting from Excel, try an alternative tool like Google Spreadsheets or LibreOffice. Often, simply opening the file in one of these applications and exporting it from there is enough.

If you run into other issues with your data, please let us know on the live chat.
