[Python] read_csv converts strings with leading zeros to integers #46853

Open
@WillAyd

Description

Describe the bug, including details regarding any error messages, version, and platform.

When a CSV file contains quoted values with leading zeros, `pyarrow.csv.read_csv` strips the leading zeros and infers the column as int64 instead of keeping the values as strings. For example:

In [15]: with io.BytesIO(b'col1,col2\n"001","foo"\n"002","bar"') as buf:
    ...:     tbl = pa.csv.read_csv(buf, parse_options=pa.csv.ParseOptions(quote_char='"'))
    ...:     print(tbl)
pyarrow.Table
col1: int64
col2: string
----
col1: [[1,2]]
col2: [["foo","bar"]]

Component(s)

Python
