"Could not determine delimiter" when trying to render TSV via stdin #54

Open
hjacobs opened this issue May 24, 2022 · 5 comments

@hjacobs

hjacobs commented May 24, 2022

Rendering TSV (tab-separated values) works when passing a file name:

rich temp.tsv

But it fails for the same file when passed via stdin (-), with the error "Could not determine delimiter":

cat temp.tsv | rich - --csv

Apparently the CSV/TSV sniffer does not work correctly. When the file name is passed, detection via the .tsv extension makes it work (the excel-tab dialect of the csv parser is used), but that does not happen when the same data is passed via stdin (-).
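For illustration, the behaviour described above can be sketched like this. It is not rich-cli's actual code: the helper name `pick_dialect`, the idea that a `.tsv` extension skips sniffing entirely, and the candidate delimiter set are assumptions. Python's `csv` module does provide the `excel_tab` dialect (registered as "excel-tab") and `csv.Sniffer`.

```python
import csv
import os


def pick_dialect(path, sample):
    """Hypothetical helper: choose a csv dialect for `path`.

    A ".tsv" extension short-circuits to the built-in excel-tab dialect;
    anything else (including "-" for stdin) falls back to sniffing `sample`.
    """
    extension = os.path.splitext(path)[1].lower()
    if extension == ".tsv":
        return csv.excel_tab
    # "-" has no extension, so stdin input always goes through the sniffer.
    return csv.Sniffer().sniff(sample, delimiters=",\t;|")
```

With a shape like this, `rich temp.tsv` would never reach the sniffer, while `cat temp.tsv | rich - --csv` always would, which is consistent with the difference reported above.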

@hjacobs
Author

hjacobs commented May 24, 2022

OK, apparently the problem is that only a truncated sample ([:1024], i.e. the first 1024 characters) is sniffed, which can break the CSV sniffer algorithm: it tries to detect the delimiter by counting the occurrences of candidate characters on each line, so truncating in the middle of a line corrupts the data for the sniffer.
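The effect can be reproduced with the standard library alone. The sketch below builds a hypothetical TSV whose lines are long enough (303 characters each) that the 1024-character cut lands inside the fourth line. Restricting `delimiters` to a few candidates is an assumption made here so that only plausible delimiter characters are considered.

```python
import csv

# Hypothetical TSV: ten rows of three 100-character fields (303 chars per line).
rows = ["\t".join(ch * 100 for ch in "xyz") for _ in range(10)]
data = "\n".join(rows) + "\n"

sniffer = csv.Sniffer()
candidates = "\t,;"  # only consider these characters as delimiters


def sniff_delimiter(sample):
    """Return the sniffed delimiter, or None if the sniffer gives up."""
    try:
        return sniffer.sniff(sample, delimiters=candidates).delimiter
    except csv.Error:  # "Could not determine delimiter"
        return None


# First 1024 characters: 3 complete lines plus a partial 4th line with one tab.
truncated = data[:1024]
# The first 3 complete lines only.
whole_lines = "".join(data.splitlines(keepends=True)[:3])

print(repr(sniff_delimiter(truncated)))    # None
print(repr(sniff_delimiter(whole_lines)))  # '\t'
```

In the truncated sample the tab count per line is 2, 2, 2, 1, which falls below the sniffer's consistency threshold, so no delimiter is accepted.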

Changing the logic to sniff the first N lines instead of the first 1024 characters would solve this issue.
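A minimal sketch of that change, assuming the input arrives as a text stream such as `sys.stdin`. The helper name `read_rows`, the default of 20 sample lines, and the candidate delimiter set are illustrative choices, not rich-cli's. Because stdin cannot be rewound, the sampled lines are chained back in front of the rest of the stream before parsing.

```python
import csv
import io
import itertools


def read_rows(stream, sample_lines=20, delimiters="\t,;|"):
    """Hypothetical helper: sniff the dialect from whole lines, then parse.

    Only complete lines are handed to csv.Sniffer, so no line is cut in half.
    """
    # Pull the first `sample_lines` complete lines (each keeps its "\n").
    head = list(itertools.islice(stream, sample_lines))
    dialect = csv.Sniffer().sniff("".join(head), delimiters=delimiters)
    # A pipe cannot seek(0), so re-attach the consumed lines to the remainder.
    return list(csv.reader(itertools.chain(head, stream), dialect))


if __name__ == "__main__":
    tsv = io.StringIO("name\tage\tcity\nalice\t30\tparis\nbob\t25\tlondon\n")
    for row in read_rows(tsv):
        print(row)
```

One caveat of a line-based sample: a quoted field that itself contains a newline can still be split, so this mainly avoids the common case of a row cut off at an arbitrary character count.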

@patatetom

patatetom commented Jun 29, 2022

Hi,
and/or add a --delim option (or something similar) on the command line to force the delimiter (for example, in case of a detection problem)...
Regards.
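Such an override could look roughly like this. The `--delim` flag is only the proposal from this comment, not an existing rich-cli option; the `render-csv` program name, the argparse wiring, the `load_rows` helper, and the handling of a literal `\t` are assumptions for illustration. When the flag is given, the sniffer is skipped entirely.

```python
import argparse
import csv
import io
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="render-csv")  # hypothetical tool
    parser.add_argument("path", help='input file, or "-" for stdin')
    parser.add_argument(
        "--delim",
        default=None,
        help=r'force the delimiter instead of sniffing it (e.g. "\t" or ",")',
    )
    return parser


def load_rows(text, delim=None):
    """Parse `text` with `delim` if given, otherwise with a sniffed dialect."""
    if delim is not None:
        # A quoted '\t' arrives as backslash + "t", so map it to a real tab.
        delim = "\t" if delim == r"\t" else delim
        return list(csv.reader(io.StringIO(text), delimiter=delim))
    dialect = csv.Sniffer().sniff(text, delimiters="\t,;|")
    return list(csv.reader(io.StringIO(text), dialect))


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.path == "-":
        text = sys.stdin.read()
    else:
        with open(args.path, newline="") as f:
            text = f.read()
    for row in load_rows(text, args.delim):
        print(row)
```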

@harkabeeparolus

> OK, apparently the problem is with only sniffing truncated data ([:1024]) which can break the CSV sniffer algorithm as it tries to detect the delimiter by counting the occurrences on each line (and truncating in the middle of a line will therefore corrupt the data for the sniffer).

Wow, is that the reason why?!? 😲 I've been wondering for years why the code example in the official Python csv.Sniffer docs doesn't seem to work. I never realized it's because the sample breaks off in the middle of a line. 🤨

It seems to me this should be fixed in the official Python docs as well, since I've never managed to get that example to work...

Anyway, thanks for this gem! 😊

@luckman212

Is there any --delim or similar option to force the delimiter instead of relying on detection?

@YUKI2eN3e

I just submitted a pull request that adds a --csv-format option for setting the dialect to use.
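The pull request itself is not shown here, so the following is only a rough sketch of the idea: an option restricted to the dialects registered with Python's `csv` module, which bypasses the sniffer when set. The `--csv-format` spelling comes from the comment above; the `render-csv` program name, the argparse setup, and the `open_reader` helper are assumptions. On a stock Python install, `csv.list_dialects()` contains "excel", "excel-tab" and "unix".

```python
import argparse
import csv
import io


def build_parser():
    parser = argparse.ArgumentParser(prog="render-csv")  # hypothetical tool
    parser.add_argument(
        "--csv-format",
        choices=csv.list_dialects(),  # e.g. "excel", "excel-tab", "unix"
        default=None,
        help="use this csv dialect instead of sniffing one",
    )
    return parser


def open_reader(text, csv_format=None):
    """Return a csv.reader over `text`, using the named dialect if given."""
    if csv_format is not None:
        dialect = csv.get_dialect(csv_format)  # no sniffing at all
    else:
        dialect = csv.Sniffer().sniff(text, delimiters="\t,;|")
    return csv.reader(io.StringIO(text), dialect)
```

Choosing a whole dialect rather than a single delimiter also lets the user pin quoting behaviour, at the cost of only offering the registered dialect names.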


5 participants