### Custom SQL Database support

Our JDBC-based SQL integration for DataFrame has become extensible!

This means that if you have an SQL database that we don't support yet, you can
create your own `DbType` instance and read from your database into a DataFrame.

Remember that we already support quite a few databases: MariaDB, PostgreSQL, MySQL, SQLite, MS SQL, and H2 (with dialects).

To get started, we need a custom `DbType`.

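For the sake of example, we'll use a custom `DbType` for the HSQLDB database. The `HSQLDB` object used in the cells below comes from the imported examples package; ordinarily, you'd define your own by extending `DbType("jdbc name of your database")`.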
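To make the idea more concrete, here is a minimal, self-contained sketch of the kind of decisions a database type description has to encode: which JDBC URLs it handles, which driver class to load, which tables count as system tables, and how to limit a query. `DialectSketch`, `TableInfo`, and `hsqldbSketch` are hypothetical stand-ins invented for illustration, not the library's `DbType` API, and the system-table rule is an assumption about HSQLDB naming.

```kotlin
import java.util.Locale

// Hypothetical stand-in for the table metadata reported by a JDBC driver.
data class TableInfo(val name: String, val schemaName: String?)

// Hypothetical dialect description; the real DbType class has its own members.
class DialectSketch(val jdbcName: String, val driverClassName: String) {

    // True when a JDBC URL such as jdbc:hsqldb:hsql://localhost/testdb targets this dialect.
    fun matches(url: String): Boolean = url.startsWith("jdbc:$jdbcName:")

    // Assumption: system objects live in INFORMATION_SCHEMA or SYSTEM_* schemas,
    // or have names starting with SYSTEM_.
    fun isSystemTable(table: TableInfo): Boolean {
        val schema = table.schemaName.orEmpty().lowercase(Locale.ROOT)
        val name = table.name.lowercase(Locale.ROOT)
        return schema == "information_schema" ||
            schema.startsWith("system_") ||
            name.startsWith("system_")
    }

    // Appends a row limit, useful when only a sample of rows is needed.
    fun limit(sqlQuery: String, limit: Int): String = "$sqlQuery LIMIT $limit"
}

val hsqldbSketch = DialectSketch("hsqldb", "org.hsqldb.jdbc.JDBCDriver")
```

With this sketch, `hsqldbSketch.limit("SELECT * FROM ORDERS", 5)` produces `SELECT * FROM ORDERS LIMIT 5`, and a table named `ORDERS` in the `PUBLIC` schema is not classified as a system table.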

In [2]:
USE {
    dependencies("org.hsqldb:hsqldb:2.7.3")
}

In [1]:
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.add
import org.jetbrains.kotlinx.dataframe.api.describe
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.getSchemaForSqlTable
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import org.jetbrains.kotlinx.dataframe.io.readAllSqlTables
import org.jetbrains.kotlinx.dataframe.io.getSchemaForAllSqlTables
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.sql.DriverManager
import java.util.*
import org.jetbrains.kotlinx.dataframe.examples.jdbc.customdb.*


In [3]:
DriverManager.getConnection(URL, USER_NAME, PASSWORD).use { con ->
    createAndPopulateTable(con)
}

java.sql.SQLTransientConnectionException: java.net.ConnectException: Connection refused: connect
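`createAndPopulateTable` comes from the imported examples package and is not shown in this section. As a rough idea of what such a helper could look like with plain JDBC, here is a sketch that assumes the `ORDERS` layout and the two rows printed further down. The function name `createAndPopulateOrders`, the column sizes, and the use of today's date are assumptions, not the actual helper.

```kotlin
import java.sql.Connection
import java.sql.Date
import java.time.LocalDate

// Assumed DDL, chosen to match the schema printed for ORDERS (ORDER_DATE is nullable).
val createOrdersSql = """
    CREATE TABLE IF NOT EXISTS orders (
        id INTEGER PRIMARY KEY,
        item VARCHAR(50) NOT NULL,
        price DOUBLE NOT NULL,
        order_date DATE
    )
""".trimIndent()

val insertOrderSql = "INSERT INTO orders (id, item, price, order_date) VALUES (?, ?, ?, ?)"

// Hypothetical stand-in for the helper; the caller owns and closes the connection.
fun createAndPopulateOrders(con: Connection) {
    // Create the table if it is not there yet.
    con.createStatement().use { it.executeUpdate(createOrdersSql) }

    // Insert the two sample rows with a parameterized statement.
    val rows = listOf(Triple(0, "Laptop", 1500.0), Triple(1, "Smartphone", 700.0))
    con.prepareStatement(insertOrderSql).use { st ->
        for ((id, item, price) in rows) {
            st.setInt(1, id)
            st.setString(2, item)
            st.setDouble(3, price)
            st.setDate(4, Date.valueOf(LocalDate.now()))
            st.executeUpdate()
        }
    }
}
```

It would be called the same way as the original helper: `DriverManager.getConnection(URL, USER_NAME, PASSWORD).use { con -> createAndPopulateOrders(con) }`.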

**HSQLDB database exploration: printing schemas for all non-system tables**

In [5]:
val dbConfig = DbConnectionConfig(URL, USER_NAME, PASSWORD)

val dataschemas = DataFrame.getSchemaForAllSqlTables(dbConfig, dbType = HSQLDB)

dataschemas.forEach { 
    println("--- Schema for Table ${it.key} ---")
    println(it.value)
    println()
}

--- Schema for Table ORDERS ---
ID: Int
ITEM: String
PRICE: Double
ORDER_DATE: java.util.Date?



**HSQLDB quick data exploration: printing summary statistics and up to 5 rows from each non-system table**

In [6]:
val dfs = DataFrame.readAllSqlTables(dbConfig, dbType = HSQLDB).values

dfs.forEach {
    it.describe().print()
    it.print(5)
}

         name           type count unique nulls        top freq   mean        std        min     median        max
 0         ID            Int     2      2     0          0    1    0,5   0,707107          0          0          1
 1       ITEM         String     2      2     0     Laptop    1   null       null     Laptop     Laptop Smartphone
 2      PRICE         Double     2      2     0       1500    1 1100,0 565,685425        700       1100       1500
 3 ORDER_DATE java.util.Date     2      1     0 2024-12-04    2   null       null 2024-12-04 2024-12-04 2024-12-04

   ID       ITEM  PRICE ORDER_DATE
 0  0     Laptop 1500,0 2024-12-04
 1  1 Smartphone  700,0 2024-12-04



In [7]:
dbConfig

DbConnectionConfig(url=jdbc:hsqldb:hsql://localhost/testdb, user=SA, password=)

In [8]:
val ordersDf = DataFrame.readSqlTable(dbConfig, "orders", dbType = HSQLDB)
ordersDf

ID,ITEM,PRICE,ORDER_DATE
0,Laptop,1500.000000,2024-12-04
1,Smartphone,700.000000,2024-12-04


In [27]:
val updatedDf = ordersDf.add("TAX") { (it["PRICE"] as Double) * 0.1 }
updatedDf

ID,ITEM,PRICE,ORDER_DATE,TAX
0,Laptop,1500.000000,2024-12-04,150.000000
1,Smartphone,700.000000,2024-12-04,70.000000


In [9]:
DriverManager.getConnection(URL, USER_NAME, PASSWORD).use { con ->
    removeTable(con)
}

0