Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Type DOUBLE does not exist" in Vertica database #3236

Closed
ablack3 opened this issue Dec 1, 2017 · 25 comments
Closed

"Type DOUBLE does not exist" in Vertica database #3236

ablack3 opened this issue Dec 1, 2017 · 25 comments
Assignees
Labels
feature a feature request or enhancement

Comments

@ablack3
Copy link

ablack3 commented Dec 1, 2017

I have an ODBC connection to a Vertica database and am using dplyr to query it.

Vertica has types DOUBLE PRECISION and NUMERIC but does not have type DOUBLE.

When I try to convert a variable of type CHARACTER to NUMERIC dplyr generates incorrect SQL code.

Unfortunately I cannot create a fully reproducible example because the issue involves a Vertica database connection.

con <- DBI:: dbConnect(odbc::odbc(), dsn = "Vertica")
dbplyr::translate_sql(as.numeric(myVar), con = con)

CAST("myVar" AS DOUBLE)
This will not work since DOUBLE is not a valid type in Vertica.
DOUBLE PRECISION or NUMERIC should work.

I get the same issue when I use as.double()

dbplyr::translate_sql(as.double(myVar), con = con)

CAST("myVar" AS DOUBLE)

However if I leave off the connection argument the correct SQL is generated

dbplyr::translate_sql(as.numeric(myVar))

CAST("myVar" AS NUMERIC)

How do I fix this issue and get started adding support for Vertica?
Thanks!

We’re working to add support for more databases over time, but adding support on your own is surprisingly easy. Submit an issue to dplyr and we’ll help you get started.
https://blog.rstudio.com/2017/06/27/dbplyr-1-1-0/

@edgararuiz-zz
Copy link

Hi @ablack3 , this is because there's no custom translation for Vertica, so dbplyr uses the default conversion: https://github.com/tidyverse/dbplyr/blob/a424f67cee36a328cae9db77eea0443e4947a495/R/translate-sql-odbc.R#L5-L6

A workaround is to add a sql_translate_env.Vertica method and overwriting those translations. I'm assuming that the connection's name in your environment is Vertica, you can confirm that by just running: con

@ablack3
Copy link
Author

ablack3 commented Dec 6, 2017

Thank for the suggestions @edgararuiz. I'm an intermediate R programmer and don't much of a handle on methods and classes in R yet. I'm having trouble understanding the code in translate-sql-odbc.R. It looks like sql_translate_env just calls sql_variant and the type conversions are handled in the base_odbc_scalar function.

I'm interested in being able to extend this awesome package. Is there a higher level viewpoint I can start from so I can get an overall sense of how dplyr generates SQL and how to modify and extend it? Thanks!

@edgararuiz-zz
Copy link

Hi, yes, we have a Translation page in our official site, here: http://db.rstudio.com/advanced/translation/

Happy to help further if there's anything else that would need clarification after that

@ablack3
Copy link
Author

ablack3 commented Dec 7, 2017

@edgararuiz Thanks for the helpful info.

class(con) is "Vertica Database".
How should I name a method for a class that has spaces?

@edgararuiz-zz
Copy link

Yes, you can use back ticks, see the MS SQL translation: https://github.com/tidyverse/dbplyr/blob/a424f67cee36a328cae9db77eea0443e4947a495/R/db-odbc-mssql.R#L2

@ablack3
Copy link
Author

ablack3 commented Dec 31, 2017

OK type casting is working! Thanks!

Now I would like to get the copy_to function working. When I try to use copy_to I get the same error: "Type DOUBLE does not exist". I'm also going to need to customize the SQL command used to create the table. To create a temporary table I need to send a SQL command of the form
CREATE LOCAL TEMPORARY TABLE tableName ON COMMIT PRESERVE ROWS

@edgararuiz-zz
Copy link

Right, that's a change that need to be submitted to the odbc package, in the DataTypes script are all of the custom translations for writing tables: https://github.com/r-dbi/odbc/blob/master/R/DataTypes.R, which is what copy_to() ultimately uses.

@edgararuiz-zz
Copy link

Oh I forgot, I believe that to test a custom translation without having to submit a PR or commit to a fork, you can create your own custom odbcDataType() function in your script, kind of like this: https://github.com/r-dbi/odbc/blob/f460f4d58d488713c6aa6c11fb25c91a1b798c31/R/DataTypes.R#L88-L102

@ablack3
Copy link
Author

ablack3 commented Feb 21, 2018

Hi Edgar,

I'm having an issue with using the latest version of dbplyr on my Vertica database. I think the problem started when I updated to dbplyr 1.2.0

The problem is that this code

tbl(con, "my_table") %>%
        count() %>%
        show_query()

produces
SELECT "", COUNT(*) AS "n" FROM "my_table" GROUP BY ""

This gives me an error in the database because I can't select or group by an empty string.
If I add ungroup() before count() or change count() to tally() I get the same thing. When I try this with a local postgres database it produces the correct query:
SELECT COUNT(*) AS "n" FROM "my_table"

Any idea how to fix this?

Great meeting you at the RStudio conference by the way. Thanks very much for your help and all of the amazing work you all are doing!

-Adam

@edgararuiz-zz
Copy link

Thanks @ablack3 !

And yes, I believe this is due an older issue in the odbc package: r-dbi/odbc#146

I'd recommend upgrading that package to the latest version and try again.

@ablack3
Copy link
Author

ablack3 commented Feb 23, 2018

That worked like a charm. I'd really like to get the copy_to function working on my database but I need to tweak the syntax a bit in order for it to work. copy_to generates SQL of the form
create temporary table "my_table" ("my_int_column" integer)
but in order to create tables in the database I'm using I have to use this syntax:
create local temporary table "my_table" ("my_int_column" integer) on commit preserve rows

Is there any way to get copy_to to add the local and on commit preserve rows keywords or somehow manually add them before the query is sent to the database?

@edgararuiz-zz
Copy link

Yes, you can create a custom db_create_table function for your connection, it should be called:
` db_create_table.Vertica Database`

The code should be a modified version of the base function, found here: https://github.com/tidyverse/dbplyr/blob/c0e51957a2fbeb4fe7bc93f0ed677d6d33e14828/R/dbi-s3.r#L55-L73

BTW - Would you mind posting that code once you get it working? We can re-use the code in dbplyr once work to add Vertica support starts. Thanks!

@ablack3
Copy link
Author

ablack3 commented Mar 6, 2018

I'm more than happy to post my code. dplyr + Vertica is a powerful combination that I'd like to get working well for myself and others. I appreciate your help so much, and anything I can do to contribute or give back I'd love to do.

@ablack3
Copy link
Author

ablack3 commented Mar 6, 2018

Ok I created two methods: db_write_table.Vertica Database and db_create_table.Vertica Database. Now copy_to basically works but does throw an error saying "Command ANALYZE is not supported". I'm not familiar with the ANALYZE command and I'm not sure it needs to be called at all. So copy to does succeed in creating a table in the database but also throws an error that would be nice to get rid of. Progress!

@edgararuiz-zz
Copy link

Right, you can remove that step in your custom write and create tables?

@ablack3
Copy link
Author

ablack3 commented Mar 7, 2018

The call to db_analyze that produces the error occurs in db_copy_to here which I did not overwrite. However if I add analyze = FALSE argument when I call copy_to it works and does not produce an error.

A couple questions:
Should I create a custom db_analyze.Vertica Database method? I don't actually know what the ANALYZE command does or if it is needed for Vertica. Vertica does have an ANALYZE_STATISTICS command that might be comparable.

How should I go about adding my code? Should I fork dbplyr, add a db-vertica.R file with my Vertica Database methods and then submit a pull request?

@hadley
Copy link
Member

hadley commented May 20, 2018

@edgararuiz can you please close the loop here?

@hadley hadley added the feature a feature request or enhancement label May 20, 2018
@edgararuiz-zz
Copy link

Sure. What I'll do is work on a PR for this, and ask if @ablack3 can test before marking as ready for review by you.

@edgararuiz-zz edgararuiz-zz self-assigned this May 23, 2018
@edgararuiz-zz
Copy link

@ablack3 - I just reread your last post, do you wish to submit a PR with your changes? If so, do you need any help through the process? You're right, you need to create a fork of the package in your repo, and make your changes, and submit a PR.

@ablack3

This comment has been minimized.

@ablack3
Copy link
Author

ablack3 commented May 26, 2018

@edgararuiz - I looked over my code again and I think there are still too many weird idiosyncrasies to be useful to the general population of dplyr users.

With your help I have found workarounds for the issues I had initially, but I would need a better understanding of both dbplyr and Vertica to make them useful for others. The main issues were the creating a custom sql_translate_env for Vertica and getting copy_to working. I did define a custom sql_translate_env function which fixed the "TYPE DOUBLE DOES NOT EXIST" error but seemed to create other problems. One workaround I've used is simply to multiply character columns by 1.0 to convert them to DOUBLE PRECISION.

Getting copy_to to work has been tricky. I defined custom db_create_table and db_write_table functions and now copy_to works as long as the analyze argument is FALSE. A couple problems remain though. Characters are converted to type VARCHAR(80) so I cannot create tables with character variables longer than 80 characters. I've also seen cases where copy_to only copies the first row of a local dataframe to the database and have no idea why.

Someone with a much greater understanding of dplyr and Vertica than me created the vertica.dplyr package which presumably solved these problems and more. However it has not been updated and does not work with odbc. It uses the RODBC and RJDBC packages which I have not been able to get working.

All that is to say that adding a Vertica support seems a bit more complicated than I initially thought. Perhaps someone else will update the vertica.dplyr package so that it works with odbc. I'll keep working on this as I have time but it is fine with me if you want to close the issue. Thanks again very much for your help!

@edgararuiz-zz
Copy link

Ok, thanks @ablack3 for the follow up. It looks like that repo has not been touched in 2 yrs, so I'll see about addressing some of your findings in a PR.

@hadley
Copy link
Member

hadley commented Jun 7, 2018

@edgararuiz can you please write up a summary of this thread in dbplyr issues, then close this issue?

@ghost
Copy link

ghost commented Jun 21, 2018

This issue was moved by batpigandme to tidyverse/dbplyr/issues/106.

@ghost ghost closed this as completed Jun 21, 2018
@lock
Copy link

lock bot commented Dec 18, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Dec 18, 2018
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants