
Troubleshooting: ATK to spark-tk

This page provides resolutions to issues you may encounter when switching from the Analytics Toolkit to the spark-tk library.

Q: I am seeing the error message below when running my spark-tk application, which connects to a PostgreSQL database. The application worked fine with the Analytics Toolkit. How do I get my spark-tk application to work?

Error message:

java.sql.SQLException: No suitable driver found for <jdbcUrl>  

This troubleshooting tip applies to any JDBC database connection.

The Analytics Toolkit included a driver for the PostgreSQL database it used, so compatibility was ensured. Since spark-tk doesn't bundle any drivers, each JDBC connection needs its own driver.

If you encounter this error while running your application, the node(s) running the application cannot find your JDBC driver. The fix is described below.

You need to locate the `.jar` file containing a compatible JDBC driver and specify it when creating the TkContext instance:

>>> import sparktk
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
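
With the driver on the classpath, the JDBC import should go through. Below is a minimal sketch of reading a table into a frame; the connection URL and table name are placeholders, and the import_jdbc call is shown as in the spark-tk frame constructors, so verify the details against your spark-tk version:

>>> import sparktk
>>> # Driver jar supplied at context creation, as shown above
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
>>> # Placeholder URL and table name -- substitute your own
>>> url = "jdbc:postgresql://localhost:5432/mydb?user=me&password=secret"
>>> frame = tc.frame.import_jdbc(url, "my_table")
>>> frame.inspect()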

Q: I am using spark-tk and want to save files/export models to my local file system instead of HDFS. How do I do that?

The SparkContext created by TkContext follows the system's current Spark configuration. If your system defaults to HDFS but you want to use the local file system instead, pass use_local_fs=True when creating your TkContext, as follows:

>>> import sparktk
>>> tc = sparktk.TkContext(use_local_fs=True)
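
To confirm that saves now land on the local disk, you can round-trip a small frame. This is a sketch using spark-tk's frame create/save/load calls; the path and data are illustrative:

>>> import sparktk
>>> tc = sparktk.TkContext(use_local_fs=True)
>>> # A small in-memory frame; the schema is a list of (name, type) pairs
>>> frame = tc.frame.create([[1, "a"], [2, "b"]], [("id", int), ("letter", str)])
>>> # With use_local_fs=True this path refers to the local file system, not HDFS
>>> frame.save("/tmp/my_frame")
>>> restored = tc.load("/tmp/my_frame")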