Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

format_csv_allow_single_quotes not working for dictionaries #2829

Closed
champtar opened this issue Aug 8, 2018 · 12 comments
Closed

format_csv_allow_single_quotes not working for dictionaries #2829

champtar opened this issue Aug 8, 2018 · 12 comments
Labels
comp-dictionary Dictionaries comp-formats Input / output formats easy task Good for first contributors feature help wanted st-fixed

Comments

@champtar
Copy link
Contributor

champtar commented Aug 8, 2018

Hello ClickHouse Team,

I'm trying to use GeoLite2-City-Locations-en.csv as dictionary

Using Clickhouse 18.5.1

<dictionaries>
    <dictionary>
        <name>geoname</name>

        <source>
            <file>
                <path>GeoLite2-City-Locations-en.csv</path>
                <format>CSVWithNames</format>
            </file>
        </source>

        <lifetime>600</lifetime>

        <layout>
            <flat/>
        </layout>

        <structure>
            <id>
                <name>geoname_id</name>
            </id>
            <attribute>
                <name>locale_code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>continent_code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>continent_name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>country_iso_code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>country_name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>subdivision_1_iso_code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>subdivision_1_name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>subdivision_2_iso_code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>subdivision_2_name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>city_name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>metro_code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>time_zone</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>is_in_european_union</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>
    </dictionary>
</dictionaries>

extract of GeoLite2-City-Locations-en.csv to reproduce the problem

geoname_id,locale_code,continent_code,continent_name,country_iso_code,country_name,subdivision_1_iso_code,subdivision_1_name,subdivision_2_iso_code,subdivision_2_name,city_name,metro_code,time_zone,is_in_european_union
102912,en,AS,Asia,SA,"Saudi Arabia",04,"Eastern Province",,,"Ra's al Khafji",,Asia/Riyadh,0
103012,en,AS,Asia,SA,"Saudi Arabia",08,"Northern Borders",,,Rafha,,Asia/Riyadh,0
103035,en,AS,Asia,SA,"Saudi Arabia",02,"Makkah Province",,,Rabigh,,Asia/Riyadh,0
103174,en,AS,Asia,SA,"Saudi Arabia",04,"Eastern Province",,,"Al Qudayh",,Asia/Riyadh,0
103369,en,AS,Asia,SA,"Saudi Arabia",14,'Asir,,,Bisha,,Asia/Riyadh,0
103630,en,AS,Asia,SA,"Saudi Arabia",10,Najran,,,Najran,,Asia/Riyadh,0
104514,en,AS,Asia,SA,"Saudi Arabia",02,"Makkah Province",,,,,Asia/Riyadh,0
104515,en,AS,Asia,SA,"Saudi Arabia",02,"Makkah Province",,,Mecca,,Asia/Riyadh,0
105072,en,AS,Asia,SA,"Saudi Arabia",14,'Asir,,,"Khamis Mushait",,Asia/Riyadh,0
105298,en,AS,Asia,SA,"Saudi Arabia",09,Jizan,,,,,Asia/Riyadh,0
SELECT dictGetString('geoname', 'country_name', toUInt64(103369))

Received exception from server (version 18.5.54400):
Code: 27. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: Cannot parse input: expected , before: Asir,,,"Khamis Mushait",,Asia/Riyadh,0\n105298,en,AS,Asia,SA,"Saudi Arabia",09,Jizan,,,,,Asia/Riyadh,0\n: (at row 5)

Row 4:
Column 0,   name: geoname_id,             type: UInt64, parsed text: "103174"
Column 1,   name: locale_code,            type: String, parsed text: "en"
Column 2,   name: continent_code,         type: String, parsed text: "AS"
Column 3,   name: continent_name,         type: String, parsed text: "Asia"
Column 4,   name: country_iso_code,       type: String, parsed text: "SA"
Column 5,   name: country_name,           type: String, parsed text: "<DOUBLE QUOTE>Saudi Arabia<DOUBLE QUOTE>"
Column 6,   name: subdivision_1_iso_code, type: String, parsed text: "04"
Column 7,   name: subdivision_1_name,     type: String, parsed text: "<DOUBLE QUOTE>Eastern Province<DOUBLE QUOTE>"
Column 8,   name: subdivision_2_iso_code, type: String, parsed text: <EMPTY>
Column 9,   name: subdivision_2_name,     type: String, parsed text: <EMPTY>
Column 10,  name: city_name,              type: String, parsed text: "<DOUBLE QUOTE>Al Qudayh<DOUBLE QUOTE>"
Column 11,  name: metro_code,             type: String, parsed text: <EMPTY>
Column 12,  name: time_zone,              type: String, parsed text: "Asia/Riyadh"
Column 13,  name: is_in_european_union,   type: String, parsed text: "0"

Row 5:
Column 0,   name: geoname_id,             type: UInt64, parsed text: "103369"
Column 1,   name: locale_code,            type: String, parsed text: "en"
Column 2,   name: continent_code,         type: String, parsed text: "AS"
Column 3,   name: continent_name,         type: String, parsed text: "Asia"
Column 4,   name: country_iso_code,       type: String, parsed text: "SA"
Column 5,   name: country_name,           type: String, parsed text: "<DOUBLE QUOTE>Saudi Arabia<DOUBLE QUOTE>"
Column 6,   name: subdivision_1_iso_code, type: String, parsed text: "14"
Column 7,   name: subdivision_1_name,     type: String, parsed text: "<SINGLE QUOTE>Asir,,,Bisha,,Asia/Riyadh,0<LINE FEED>103630,en,AS,Asia,SA,<DOUBLE QUOTE>Saudi Arabia<DOUBLE QUOTE>,10,Najran,,,Najran,,Asia/Riyadh,0<LINE FEED>104514,en,AS,Asia,SA,<DOUBLE QUOTE>Saudi Arabia<DOUBLE QUOTE>,02,<DOUBLE QUOTE>Makkah Province<DOUBLE QUOTE>,,,,,Asia/Riyadh,0<LINE FEED>104515,en,AS,Asia,SA,<DOUBLE QUOTE>Saudi Arabia<DOUBLE QUOTE>,02,<DOUBLE QUOTE>Makkah Province<DOUBLE QUOTE>,,,Mecca,,Asia/Riyadh,0<LINE FEED>105072,en,AS,Asia,SA,<DOUBLE QUOTE>Saudi Arabia<DOUBLE QUOTE>,14,<SINGLE QUOTE>"
ERROR: There is no delimiter (,). "A" found instead.

. 

0 rows in set. Elapsed: 0.161 sec. 

I've tried to add

<format_csv_allow_single_quotes>0</format_csv_allow_single_quotes>

to config.xml
and also

--format_csv_allow_single_quotes=0

to the client but no changes

Related to #2574

@champtar
Copy link
Contributor Author

champtar commented Aug 8, 2018

workaround for now: quote everything

import csv
with open('GeoLite2-City-Locations-en.csv', 'rb') as csvfilein:
    with open('GeoLite2-City-Locations-en-quoted.csv', 'w') as csvfileout:
        csvreader = csv.DictReader(csvfilein, delimiter=',')
        writer = csv.DictWriter(csvfileout, fieldnames=csvreader.fieldnames, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        for row in csvreader:
            writer.writerow(row)

@alexey-milovidov
Copy link
Member

It can be solved if we will allow to specify custom settings for dictionaries:

<dictionaries>
    <dictionary>
        <name>geoname</name>
        ...
        <settings>
            ...

It is not implemented but it looks quite easy to do.

@alexey-milovidov alexey-milovidov added easy task Good for first contributors feature labels Aug 8, 2018
@alexey-milovidov
Copy link
Member

For cached dictionaries, settings should override query settings.

@fessmage
Copy link
Contributor

fessmage commented Feb 8, 2019

Same problem with same database - GeoLite2-City-Locations. We definitely need this feature for dictionaries.

@champtar
Copy link
Contributor Author

@alexey-milovidov is this issue somewhere on the roadmap ?

@alexey-milovidov
Copy link
Member

This issue is not on the roadmap. But it's pretty easy in implementation and I almost forgot about it.

@filimonov filimonov added the comp-dictionary Dictionaries label May 7, 2019
@filimonov filimonov added the comp-formats Input / output formats label Jun 11, 2019
@gongled
Copy link

gongled commented Mar 7, 2020

Any news? I am facing the same issue with an external dictionaries in CSV format. I have already bypassed it, but it would be nice to implement setting overriding for dictionaries.

@alexey-milovidov
Copy link
Member

This is quite easy task to do. Just add support for settings clause in configuration of dictionaries and in CREATE DICTIONARY query.

@kekekekule
Copy link
Contributor

maybe we'll take it
@evillique @namikhnenko

@alexey-milovidov
Copy link
Member

The original issue is fixed in #10137 for XML configuration.

@alexey-milovidov
Copy link
Member

Example: https://github.com/ClickHouse/ClickHouse/pull/10137/files#diff-fb9153de36f137866ae2fe4d54767f3bR51

Available since version 20.4.

@alexey-milovidov
Copy link
Member

The implementation of SETTINGS clause for DDL queries is pending.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp-dictionary Dictionaries comp-formats Input / output formats easy task Good for first contributors feature help wanted st-fixed
Projects
None yet
Development

No branches or pull requests

7 participants