New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force quoting on certain columns? #751
Comments
tl;dr: This doesn't answer your question, but it's a long explanation of why this feature (probably) doesn't exist. And also, why the behavior you've noticed ("001" being interpreted as a string and not a number, by an importing program) is just a quirk specific to that program – it's not CSV standard behavior. FWIW, how a CSV file is interpreted – particularly, how to typecast each column, is generally up to the interpreting program. For example, importing your latter example (i..e. with quoted >>> import pandas as pd
>>> df = pd.read_csv('/tmp/thedata.csv')
>>> print(df)
name COUNTYFP10
0 Adair 1
1 Andrew 3
2 Atchison 5
>>> print(df.dtypes)
name object
COUNTYFP10 int64
dtype: object Not sure how Agate handles the import, just using Pandas as an example of another client program that has its own ways of auto-guessing the data types of imported CSVs. The upshot of all this: I'd be really surprised if Agate or csvkit had a way to specify quoting by column when exporting to CSV, because there wouldn't be any point. There's just no accepted standard for how to assume datatypes of imported CSV, because CSV only has an understanding of every value being text...as opposed to something like JSON, in which there's a concept of strings, numbers, and booleans. My advice is that you should accept having to configure datatyping on the importing program's side, e.g. in Excel's Text Import Wizard. Or in Python pandas, it would be to set the dtype argument, e.g. >>> df = pd.read_csv('/tmp/thedata.csv', dtype={'COUNTYFP10': str})
>>> print(df)
name COUNTYFP10
0 Adair 001
1 Andrew 003
2 Atchison 005
>>> print(df.dtypes)
name object
COUNTYFP10 object
dtype: object It's a pain in the ass, but this kind of vaguery is just inherent to the CSV format. You don't want to depend on application-specific quirks when figuring out the import-export workflow. |
|
Is it possible to force
.to_csv()
to quote particular columns when writing the CSV?I have in mind situations like a column of zero-padded numbers. If that data is not enclosed in quotes in the CSV, then when other programs read the CSV, they interpret it as a number (1) instead of a string ("001").
If I set a column to
text_type
, Agate will write the column's values properly as001
,002
, etc.:Is there a way to make agate generate the CSV like this:
The text was updated successfully, but these errors were encountered: