Check input file for BOM / Byte Order Mark (REGRESSION?) #219

Closed
tilo opened this issue Mar 15, 2023 · 2 comments

Comments

@tilo
Owner

tilo commented Mar 15, 2023

Some CSV files start with a Byte Order Mark (BOM):
https://en.wikipedia.org/wiki/Byte_order_mark

e.g.

$ hexdump -C /tmp/sample.csv
00000000  ef bb bf 75 73 65 72 5f  69 64 2c 74 79 70 65 2c  |...user_id,type,|
00000010  6d 65 74 61 6c 5f 70 69  64 0d 0a 34 33 32 31 30  |metal_pid..43210|
00000020  38 30 35 2c 72 65 69 73  73 75 65 2c 31 32 33 34  |805,reissue,1234|

The first 3 bytes (ef bb bf) are the UTF-8 BOM and should be ignored when parsing the file.

BOM byte sequences by encoding:

* UTF-8: EF BB BF
* UTF-16BE (big-endian): FE FF
* UTF-16LE (little-endian): FF FE
* UTF-32BE (big-endian): 00 00 FE FF
* UTF-32LE (little-endian): FF FE 00 00
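
A minimal Python sketch of the check described above, assuming the leading bytes of the file are available as a bytes object. The helper name strip_bom and the encoding labels are illustrative only; they are not part of this project's API, and this is not necessarily how the project handles it. Signatures are tested longest first, because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```python
import codecs

# Longest signatures first: FF FE 00 00 (UTF-32LE) would otherwise
# be mistaken for FF FE (UTF-16LE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def strip_bom(data: bytes):
    """Return (encoding, payload) with a leading BOM removed.

    encoding is None when no known BOM is present (hypothetical helper).
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


# Leading bytes modelled on the hexdump in this issue.
sample = bytes.fromhex("efbbbf") + b"user_id,type,metal_pid\r\n"
encoding, payload = strip_bom(sample)
```

Here encoding is "utf-8" and payload starts directly with user_id, so the first header name no longer carries the three BOM bytes.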

@tilo tilo changed the title from "Check input file for" to "Check input file for BOM / Byte Order Mark (REGRESSION?)" on Mar 15, 2023
@tilo
Owner Author

tilo commented Mar 15, 2023

HINT: the BOM is typically written by some Microsoft tools.
One way to remove it is to run dos2unix filename, which strips the BOM by default and also converts CRLF line endings to LF.
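
If rewriting the file is not an option, the BOM can also be dropped at read time. A small Python sketch, assuming a UTF-8 file like the sample above; the helper name header is hypothetical. Python's built-in utf-8-sig codec skips a leading BOM, whereas plain utf-8 leaves U+FEFF attached to the first header name.

```python
import csv
import io

# Header line modelled on the hexdump in this issue, BOM included.
raw = b"\xef\xbb\xbfuser_id,type,metal_pid\r\n"


def header(data: bytes, encoding: str):
    """Decode data with the given codec and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


plain = header(raw, "utf-8")      # BOM survives as U+FEFF in the first field
clean = header(raw, "utf-8-sig")  # BOM is consumed by the codec
```

With plain utf-8 the first column is named "\ufeffuser_id", which is why lookups by "user_id" fail on such files; with utf-8-sig it is "user_id".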

@tilo
Owner Author

tilo commented Mar 19, 2023

fixed in #220

@tilo tilo closed this as completed Mar 19, 2023