Force quoting on certain columns? #751

Kirkman · 2020-12-03T22:58:30Z

Is it possible to force .to_csv() to quote particular columns when writing the CSV?

I have in mind situations like a column of zero-padded numbers. If that data is not enclosed in quotes in the CSV, then when other programs read the CSV, they interpret it as a number (1) instead of a string ("001").

If I set a column to text_type, Agate will write the column's values properly as 001, 002, etc.:

name,COUNTYFP10
Adair,001
Andrew,003
Atchison,005

Is there a way to make agate generate the CSV like this:

name,COUNTYFP10
Adair,"001"
Andrew,"003"
Atchison,"005"

The text was updated successfully, but these errors were encountered:

dannguyen · 2020-12-24T16:27:14Z

tl;dr: This doesn't answer your question, but it's a long explanation of why this feature (probably) doesn't exist. And also, why the behavior you've noticed ("001" being interpreted as a string and not a number, by an importing program) is just a quirk specific to that program – it's not CSV standard behavior.

FWIW, how a CSV file is interpreted – particularly, how to typecast each column, is generally up to the interpreting program. For example, importing your latter example (i..e. with quoted COUNTYFP10 values) into a pandas.DataFrame will result in the same as importing the non-quoted data: the COUNTYFP10 column will be interpreted as integers:

>>> import pandas as pd
>>> df = pd.read_csv('/tmp/thedata.csv')
>>> print(df)
       name  COUNTYFP10
0     Adair           1
1    Andrew           3
2  Atchison           5

>>> print(df.dtypes)
name          object
COUNTYFP10     int64
dtype: object

Not sure how Agate handles the import, just using Pandas as an example of another client program that has its own ways of auto-guessing the data types of imported CSVs.

The upshot of all this: I'd be really surprised if Agate or csvkit had a way to specify quoting by column when exporting to CSV, because there wouldn't be any point. There's just no accepted standard for how to assume datatypes of imported CSV, because CSV only has an understanding of every value being text...as opposed to something like JSON, in which there's a concept of strings, numbers, and booleans.

My advice is that you should accept having to configure datatyping on the importing program's side, e.g. in Excel's Text Import Wizard.

Or in Python pandas, it would be to set the dtype argument, e.g.

>>> df = pd.read_csv('/tmp/thedata.csv', dtype={'COUNTYFP10': str})
>>> print(df)
       name COUNTYFP10
0     Adair        001
1    Andrew        003
2  Atchison        005
>>> print(df.dtypes)
name          object
COUNTYFP10    object
dtype: object

It's a pain in the ass, but this kind of vaguery is just inherent to the CSV format. You don't want to depend on application-specific quirks when figuring out the import-export workflow.

jpmckinney · 2021-07-14T16:54:51Z

to_csv passes through keyword arguments to Python's csv module. You can import csv and pass quoting=csv.QUOTE_ALL or maybe even quoting=csv.QUOTE_MINIMAL to to_csv to achieve what you want. https://docs.python.org/3/library/csv.html#csv.QUOTE_ALL

jpmckinney closed this as completed Jul 14, 2021

jpmckinney added the question label Jul 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Force quoting on certain columns? #751

Force quoting on certain columns? #751

Kirkman commented Dec 3, 2020

dannguyen commented Dec 24, 2020

jpmckinney commented Jul 14, 2021 •

edited

Force quoting on certain columns? #751

Force quoting on certain columns? #751

Comments

Kirkman commented Dec 3, 2020

dannguyen commented Dec 24, 2020

jpmckinney commented Jul 14, 2021 • edited

jpmckinney commented Jul 14, 2021 •

edited