Autodetect encoding from CSV files #129
We are using rows in https://github.com/interlegis/interlegis.portalmodelo.transparency to import generic CSV files that users upload, but sometimes the tools used to generate the CSV (MS Excel, for example) don't follow the standard and produce data in unusual encodings. We need to autodetect the encoding used in these files, maybe using a library such as `chardet` or the Linux `file` command.
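As a rough idea of what detection could look like without extra dependencies, here is a minimal stdlib-only sketch. It is a simple heuristic swapped in for illustration, not chardet or `file`: `guess_encoding` is a hypothetical name, and the cp1252 fallback assumes Excel exports from a Western-locale Windows machine.

```python
import codecs


def guess_encoding(sample: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of a byte sample read from the start of a file.

    Heuristic sketch: BOM check, then strict UTF-8, then a single-byte
    fallback (cp1252 is an assumption). UTF-32 is not handled.
    """
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads and strips the BOM itself
    # The sample may end in the middle of a multi-byte character, so use an
    # incremental decoder with final=False: an incomplete trailing sequence
    # is buffered instead of raising UnicodeDecodeError.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        sample.decode(fallback)
        return fallback
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined; latin-1 accepts every byte
        return "latin-1"
```

Using an incremental decoder matters when only a sample is read, since a cut can land inside a multi-byte UTF-8 character. One limitation of this sketch: a single accented cp1252 byte at the very end of the sample still looks like a truncated UTF-8 sequence, so it is reported as `utf-8`.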
It's a very good feature to have! I suggest implementing it as a function called
In my experience chardet is a bit slow, and it can make mistakes if you don't pass it all of the data.
I prefer to use
In my opinion the best approach for this issue is:
Update (after some months...): I just uploaded the official Python wrapper for `file` to PyPI. We can install it by running:
We can use it to detect either the encoding or the file type, which could solve both this issue and #143.
I've also written a blog post about how to use the file-magic library.
@jeanferri, do you still need this feature?
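```python
import magic  # at the beginning of the file

# [...]

if encoding is None:
    # read raw bytes (binary mode) so the detector sees the original encoding
    encoding = magic.detect_from_content(fobj.read(4096)).encoding
    fobj.seek(0)  # rewind so the reader starts from the first byte
```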
Could you please try this solution and verify if this function from
@jeanferri I think the detection should only happen if you do not specify
```bash
pip install git+https://github.com/turicas/rows.git@129-detect-csv-encoding
```
Could you please test this version with your current code?
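Putting the thread's points together (detect only when no encoding is passed, sniff a 4096-byte sample, then seek back), a self-contained sketch might look like this. `read_csv_rows` and `_sniff_encoding` are hypothetical names, not the API of the rows branch; the stand-in detector uses only the standard library, whereas the snippet above calls `magic.detect_from_content(...).encoding` from file-magic.

```python
import codecs
import csv
import io


def _sniff_encoding(sample: bytes) -> str:
    """Stand-in detector (assumption): BOM check, strict UTF-8, else cp1252."""
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # final=False tolerates a multi-byte character cut at the sample's end
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        # cp1252 is an assumed fallback for Excel-style exports; decoding can
        # still fail later on the few bytes cp1252 leaves undefined
        return "cp1252"


def read_csv_rows(fobj, encoding=None, sample_size=4096, detect=_sniff_encoding):
    """Return all rows of a CSV file given as a seekable binary file object."""
    if encoding is None:
        # only sniff when the caller did not choose an encoding
        encoding = detect(fobj.read(sample_size))
        fobj.seek(0)  # rewind so the CSV reader sees the whole file
    text = io.TextIOWrapper(fobj, encoding=encoding, newline="")
    try:
        return list(csv.reader(text))
    finally:
        text.detach()  # leave the caller's file object open
```

Passing `encoding=` skips detection entirely, and the `detect` parameter is where a file-magic based detector could be plugged in instead of the stdlib heuristic.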