
Troubleshooting: ATK to spark-tk

This page provides resolutions to issues you may encounter when switching from the Analytics Toolkit to the spark-tk library.

Q: I am seeing the error message below when running my spark-tk application, which connects to a PostgreSQL database. The application worked fine with the Analytics Toolkit. How do I get my spark-tk application to work?

Error message:

java.sql.SQLException: No suitable driver found for <jdbcUrl>  

This troubleshooting tip applies to any JDBC database connection.

The Analytics Toolkit included a driver for the PostgreSQL database it used, so compatibility was ensured. Since spark-tk doesn't bundle any drivers, each JDBC connection needs its own driver.

If you encounter this error while running your application, the node(s) running the application cannot find your JDBC driver. The fix is described below.

You need to locate the `.jar` file containing a compatible JDBC driver and specify it when creating the TkContext instance:

>>> import sparktk
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
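
With the driver on the classpath, the JDBC import should go through. Below is a minimal sketch of reading a table into a frame; the connection URL and table name are placeholders, and the import_jdbc call is shown as in the spark-tk frame constructors, so verify the details against your spark-tk version:

>>> import sparktk
>>> # Driver jar supplied at context creation, as shown above
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
>>> # Placeholder URL and table name -- substitute your own
>>> url = "jdbc:postgresql://localhost:5432/mydb?user=me&password=secret"
>>> frame = tc.frame.import_jdbc(url, "my_table")
>>> frame.inspect()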

Q: I am using spark-tk and want to save files/export models to my local file system instead of HDFS. How do I do that?

The SparkContext created by TkContext follows the system's current Spark configuration. If your system defaults to HDFS but you want to use the local file system instead, pass use_local_fs=True when creating your TkContext, as follows:

>>> import sparktk
>>> tc = sparktk.TkContext(use_local_fs=True)
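
To confirm that saves now land on the local disk, you can round-trip a small frame. This is a sketch using spark-tk's frame create/save/load calls; the path and data are illustrative:

>>> import sparktk
>>> tc = sparktk.TkContext(use_local_fs=True)
>>> # A small in-memory frame; the schema is a list of (name, type) pairs
>>> frame = tc.frame.create([[1, "a"], [2, "b"]], [("id", int), ("letter", str)])
>>> # With use_local_fs=True this path refers to the local file system, not HDFS
>>> frame.save("/tmp/my_frame")
>>> restored = tc.load("/tmp/my_frame")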