Check input file for BOM / Byte Order Mark (REGRESSION?) #219

tilo · 2023-03-15T22:16:54Z

Some CSV files begin with a Byte Order Mark (BOM):
https://en.wikipedia.org/wiki/Byte_order_mark

For example:

$ hexdump -C /tmp/sample.csv
00000000  ef bb bf 75 73 65 72 5f  69 64 2c 74 79 70 65 2c  |...user_id,type,|
00000010  6d 65 74 61 6c 5f 70 69  64 0d 0a 34 33 32 31 30  |metal_pid..43210|
00000020  38 30 35 2c 72 65 69 73  73 75 65 2c 31 32 33 34  |805,reissue,1234|

The first 3 bytes (ef bb bf) are the UTF-8 BOM and should be ignored when the file is parsed.
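To illustrate why the BOM matters, here is a minimal Python sketch (not the fix from this project) that parses the bytes visible in the hexdump above twice: once as plain UTF-8 and once with Python's `utf-8-sig` codec, which skips a leading UTF-8 BOM. The helper name `read_rows` is made up for this example, and the last field of the data row is cut off in the dump, so only the visible part is used.

```python
import csv
import io

# Bytes copied from the hexdump: UTF-8 BOM, header row, CRLF, start of a data row.
# The dump is truncated after "1234", so the last field is incomplete.
raw = b"\xef\xbb\xbfuser_id,type,metal_pid\r\n43210805,reissue,1234"


def read_rows(data: bytes, encoding: str):
    """Decode the bytes with the given encoding and parse them as CSV."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))


naive = read_rows(raw, "utf-8")      # BOM stays glued to the first header
clean = read_rows(raw, "utf-8-sig")  # codec drops a leading UTF-8 BOM

print(repr(naive[0][0]))  # '\ufeffuser_id'
print(repr(clean[0][0]))  # 'user_id'
```

With plain UTF-8 decoding the first header becomes `'\ufeffuser_id'`, so a lookup by the key `user_id` would fail even though the file looks correct in an editor.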

BOM byte sequences by encoding:

* UTF-8 with BOM: EF BB BF
* UTF-16BE (big-endian): FE FF
* UTF-16LE (little-endian): FF FE
* UTF-32BE (big-endian): 00 00 FE FF
* UTF-32LE (little-endian): FF FE 00 00
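The list above could be checked against the start of a file roughly as follows. This is a sketch in Python using the BOM constants from the standard `codecs` module; the function name `strip_bom` is hypothetical and is not taken from this project or from #220.

```python
import codecs

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM
# (FF FE), so the 4-byte signatures are tested before the 2-byte ones.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def strip_bom(data: bytes):
    """Return (encoding, data without BOM), or (None, data) if no BOM is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


print(strip_bom(b"\xef\xbb\xbfuser_id,type"))  # ('utf-8', b'user_id,type')
```

One caveat: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE. That is unlikely for CSV input, but the detection is a heuristic rather than a guarantee.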


tilo · 2023-03-15T23:10:55Z

HINT: the BOM is typically added by Microsoft tools.
A workaround is to run dos2unix filename, which removes the BOM by default when converting to Unix line endings.

tilo · 2023-03-19T05:12:28Z

Fixed in #220.

tilo changed the title ~~Check input file for~~ Check input file for BOM / Byte Order Mark (REGRESSION?) Mar 15, 2023

tilo mentioned this issue Mar 17, 2023

fixing issue with BOM at beginning of file #220

Merged

tilo closed this as completed Mar 19, 2023
