Autodetect encoding from CSV files #129
It's a very good feature to have! I suggest implementing it as a function called [...]. chardet in my experience is a bit slow and can make some mistakes if you don't pass it the whole data. I prefer to use [...]. In my opinion the best approach for this issue is:
Pros:
Cons:
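As a plain-stdlib illustration of the "whole data" caveat mentioned above (this does not use chardet itself, just a hedged sketch): a multi-byte character cut at a sample boundary can make a truncated sample look invalid in the very encoding the full file uses, which is one way sample-based detectors get misled.

```python
# A multi-byte UTF-8 character split at a chunk boundary looks like
# invalid UTF-8, even though the full file decodes cleanly.
data = "café".encode("utf-8")  # b'caf\xc3\xa9' -- 'é' is two bytes
sample = data[:4]              # cuts the two-byte 'é' in half

def decodes_as(raw, encoding):
    """Return True if `raw` decodes cleanly under `encoding`."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

print(decodes_as(data, "utf-8"))    # full data decodes: True
print(decodes_as(sample, "utf-8"))  # truncated sample does not: False
```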
Update on it (after some months...): I just uploaded the official version of the file wrapper to PyPI. We can install it by running:
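The install command itself was lost in this copy of the thread; since the next sentence names the file-magic library on PyPI, it was presumably:

```shell
pip install file-magic
```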
And we can use it to either detect the encoding or the file type, which could be used to solve this issue and also #143. I've also written a blog post about file-magic library usage.
@jeanferri, do you still need this feature?

```python
import magic  # in the beginning of the file

[...]

if encoding is None:
    result = magic.detect_from_content(fobj.read(4096)).encoding
    fobj.seek(0)
```

Could you please try this solution and verify if this function from [...]
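The pattern in the snippet above (sample the first few KiB, detect, then rewind so the CSV parser sees the whole file) can be sketched with only the standard library. Here `detect_encoding` is a hypothetical BOM-based stand-in for `magic.detect_from_content(...).encoding`, not the actual file-magic API:

```python
import codecs
import io

def detect_encoding(sample):
    """Hypothetical stand-in for magic.detect_from_content(...).encoding:
    guesses from a byte-order mark, falling back to UTF-8."""
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if sample.startswith(bom):
            return name
    return "utf-8"

# Simulate an uploaded CSV file that starts with a UTF-8 BOM
# (as MS Excel commonly writes).
fobj = io.BytesIO(codecs.BOM_UTF8 + "name,city\nÁlvaro,São Paulo\n".encode("utf-8"))

encoding = None
if encoding is None:
    encoding = detect_encoding(fobj.read(4096))  # sample only the first 4 KiB
    fobj.seek(0)  # rewind so the parser reads the file from the start

print(encoding)  # utf-8-sig
```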
Yes I do, please! It'll be good for Portal Modelo to be able to process any kind of CSV file. Could you please commit this patch and make a release on PyPI?
@jeanferri I think the detection should only be made if you do not specify [...]

```shell
pip install git+https://github.com/turicas/rows.git@129-detect-csv-encoding
```

Could you please test this version with your current code?
We are using rows in https://github.com/interlegis/interlegis.portalmodelo.transparency to import generic CSV files that users upload, but sometimes the tools used to generate the CSV, like MS Excel, do not follow a standard and produce data with unusual encodings. We need to autodetect the encoding used in the files, maybe using some library such as 'chardet' or the 'file' Linux command.