Add options to enable parsing CSV with single/double quote. #2574

Merged 1 commit on Jul 4, 2018

Conversation

amosbird
Collaborator

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

#2192 #2517

@alexey-milovidov alexey-milovidov self-requested a review July 4, 2018 20:52
@alexey-milovidov
Member

TODO:

  • rename settings to more descriptive names;
  • check the maximum performance difference in the extreme case;

alexey-milovidov added 3 commits that referenced this pull request Jul 4, 2018
@alexey-milovidov
Member

Performance is OK.

seq 1 1000000 | sed -r -e "s/^.+$/'',hello,'\0'/" > test.csv
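
For reference, a small sketch of what the generator above produces, shown on three rows instead of a million. It assumes GNU sed, where `-r` enables extended regexes and `\0` in the replacement stands for the whole match; the `gen_rows` name is only illustrative and is not part of the PR.

```shell
# Hypothetical helper: emit N rows in the same shape as test.csv.
# Assumes GNU sed (-r for extended regexes, \0 for the whole match).
gen_rows() {
    seq 1 "$1" | sed -r -e "s/^.+$/'',hello,'\0'/"
}

gen_rows 3
# '',hello,'1'
# '',hello,'2'
# '',hello,'3'
```

Each row has an empty single-quoted string, an unquoted string, and a single-quoted number, so every line exercises the single-quote handling.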

for i in {1..100}; do (time ./clickhouse-master local --input-format CSV --structure 'a String, b String, c UInt64' --query="SELECT count() FROM table" < test.csv) 2>&1 | grep user; done | sort | head -n10

for i in {1..100}; do (time ./clickhouse-more-csv-settings local --input-format CSV --structure 'a String, b String, c UInt64' --query="SELECT count() FROM table" < test.csv) 2>&1 | grep user; done | sort | head -n10

No visible performance difference.

@alexey-milovidov alexey-milovidov merged commit 900b046 into ClickHouse:master Jul 4, 2018
@alexey-milovidov
Member

alexey-milovidov commented Jul 4, 2018

Or it became slightly faster for an unknown reason (no need to investigate):

$ for i in {1..100}; do (time ./clickhouse-master local --input-format CSV --structure 'a String, b String, c UInt64' --query="SELECT count() FROM table" < test.csv) 2>&1 | grep user; done | sort | head -n10
user    0m0.116s
user    0m0.120s
user    0m0.124s
user    0m0.124s
user    0m0.124s
user    0m0.124s
user    0m0.124s
user    0m0.124s
user    0m0.124s
user    0m0.128s
$ for i in {1..100}; do (time ./clickhouse-more-csv-settings local --input-format CSV --structure 'a String, b String, c UInt64' --query="SELECT count() FROM table" < test.csv) 2>&1 | grep user; done | sort | head -n10
user    0m0.112s
user    0m0.112s
user    0m0.112s
user    0m0.116s
user    0m0.116s
user    0m0.116s
user    0m0.116s
user    0m0.116s
user    0m0.116s
user    0m0.116s
$ for i in {1..100}; do (time clickhouse local --input-format CSV --structure 'a String, b String, c UInt64' --query="SELECT count() FROM table" < test.csv) 2>&1 | grep user; done | sort | head -n10
user    0m0.116s
user    0m0.116s
user    0m0.120s
user    0m0.120s
user    0m0.120s
user    0m0.120s
user    0m0.120s
user    0m0.120s
user    0m0.120s
user    0m0.120s
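
To compare runs numerically rather than by eye, the `user` lines can be reduced to a single minimum. This is only a sketch: `min_user` is a hypothetical helper, not something used in the PR, and it assumes the `user    0m0.116s` format that bash's `time` prints above.

```shell
# Hypothetical helper: read `user    0m0.116s` lines on stdin
# and print the smallest user time in seconds.
min_user() {
    awk '$1 == "user" {
        split($2, t, "m")            # t[1] = minutes, t[2] = seconds with a trailing "s"
        s = t[1] * 60 + (t[2] + 0)   # "+ 0" keeps the numeric prefix and drops the "s"
        if (n++ == 0 || s < min) min = s
    }
    END { if (n) printf "%.3f\n", min }'
}

printf 'user    0m0.116s\nuser    0m0.112s\nuser    0m0.120s\n' | min_user
# 0.112
```

It can be appended to any of the loops above in place of `sort | head -n10`. Unlike the textual `sort`, it also stays correct if a run takes 10 seconds or more.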
