Adds jdbc macros from internal #1267

ianoc merged 136 commits into twitter:develop from rubanm:rubanm/jdbc_macros_merged on Jul 2, 2015


rubanm commented May 1, 2015

Part 1 of the new scalding-db and scalding-db-macros subprojects. Addresses #1124

This adds @ianoc's original JDBC macros along with some refactoring and fixes done during internal use at Twitter.

Part 2, to follow, will add the new JDBC source and sink classes built on these macros.

The motivation is an improved way of defining typed JDBC sources and sinks in two simple steps:

1. Define a case class that represents your table schema:

```scala
case class ExampleDBRecord(
  user_id: Long,
  tweet_id: Long,
  created_at: java.util.Date,
  deleted: Boolean)
```

2. Define a TypedJDBCSource based on the above case class:

```scala
case class ExampleDBRecordsTable(implicit dbsInEnv: AvailableDatabases)
    extends TypedJDBCSource[ExampleDBRecord](dbsInEnv) {
  override val tableName = TableName("example_table")
  override val database = Database("example_schema")
}
```

Under the hood, the case class is mapped to the underlying DB schema by a macro-generated DBTypeDescriptor, so users do not need to write SQL mappings or Injections for the back-and-forth translation.
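To make the back-and-forth translation concrete, here is a minimal hand-written sketch of the kind of information such a descriptor carries for `ExampleDBRecord`: a column list on the schema side, plus conversions between the case class and a row of values. `ColumnDefinition`, `RowDescriptor`, `toRow` and `fromRow` are hypothetical names, not the scalding-db `DBTypeDescriptor` API. The SQL types follow the Supported Mappings table quoted in the review below, assuming `BOOLEAN` for `Boolean`.

```scala
// Hypothetical stand-ins; NOT the scalding-db DBTypeDescriptor API.
final case class ColumnDefinition(name: String, sqlType: String, nullable: Boolean)

trait RowDescriptor[T] {
  def columns: List[ColumnDefinition] // schema side: one column per field, in declaration order
  def toRow(t: T): List[Any]           // case class -> column values, same order as `columns`
  def fromRow(row: List[Any]): T       // column values -> case class
}

case class ExampleDBRecord(
  user_id: Long,
  tweet_id: Long,
  created_at: java.util.Date,
  deleted: Boolean)

// Written by hand here; the point of the PR is that a macro derives the equivalent.
object ExampleDBRecordDescriptor extends RowDescriptor[ExampleDBRecord] {
  val columns = List(
    ColumnDefinition("user_id", "BIGINT", nullable = false),
    ColumnDefinition("tweet_id", "BIGINT", nullable = false),
    ColumnDefinition("created_at", "DATETIME", nullable = false),
    ColumnDefinition("deleted", "BOOLEAN", nullable = false))

  def toRow(r: ExampleDBRecord): List[Any] =
    List(r.user_id, r.tweet_id, r.created_at, r.deleted)

  def fromRow(row: List[Any]): ExampleDBRecord = row match {
    case List(u: Long, t: Long, c: java.util.Date, d: Boolean) => ExampleDBRecord(u, t, c, d)
    case other => throw new IllegalArgumentException(s"unexpected row: $other")
  }
}
```

A round trip `fromRow(toRow(r))` should give back `r`; keeping `columns` and `toRow` in the same field order is exactly the bookkeeping the macro removes.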

There are also optimizations that avoid common pitfalls of talking to databases directly from Hadoop nodes (too many open connections, inefficient OFFSET-based queries, table lock contention). For smaller datasets, these include taking a JDBC -> HDFS snapshot from the job submitter, and likewise for writes. They will come in a separate PR along with the new JDBC source classes.
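Since those optimizations land in a later PR, the following is only a generic illustration of the OFFSET pitfall, not the scalding-db implementation. With `LIMIT n OFFSET k` paging, the database typically has to walk past the first k rows for every page, so later pages get slower. Splitting the read on a numeric key into bounded ranges lets each mapper issue an independent query whose cost does not depend on its position. `rangeQueries` and `offsetQuery` are hypothetical helpers, and the sketch assumes a reasonably dense numeric key.

```scala
// Hypothetical helper, not part of scalding-db: one bounded query per split of a numeric key.
def rangeQueries(table: String, keyCol: String, minKey: Long, maxKey: Long, splits: Int): Seq[String] = {
  require(splits > 0 && maxKey >= minKey, "need at least one split and a non-empty key range")
  val span = maxKey - minKey + 1 // number of key values covered, both ends inclusive
  (0 until splits).flatMap { i =>
    // Integer division spreads the span evenly; the upper bound is exclusive.
    val lo = minKey + span * i / splits
    val hi = minKey + span * (i + 1) / splits
    if (hi > lo) Some(s"SELECT * FROM $table WHERE $keyCol >= $lo AND $keyCol < $hi")
    else None // more splits than key values: skip empty ranges
  }
}

// For contrast, OFFSET paging: page k asks the database to skip k * pageSize rows first.
def offsetQuery(table: String, keyCol: String, pageSize: Int, page: Int): String =
  s"SELECT * FROM $table ORDER BY $keyCol LIMIT $pageSize OFFSET ${page.toLong * pageSize}"
```

Skewed keys would give uneven splits; a real implementation would also need to pick `minKey`/`maxKey` (for example with a `MIN`/`MAX` query) before fanning out.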

ianoc and others added some commits Jan 4, 2015
ianoc: Initial copy of a scalding internal db repo 7312121
ianoc: Adds in vertica support 15bdb70
ianoc: WIP e0fa45d
ianoc: fix a9d98e3
ianoc: Support ad-hoc better b6ff884
rubanm: first checkin 75651db
rubanm: more changes 9f0aec3
rubanm: minor cleanup ca5d199
rubanm: order imports 0da6fb9
rubanm: more cleanup db25b79
rubanm: unit test note e8c1eec
rubanm: add note on tsv 015c338
rubanm: cr edits c7c26aa
rubanm: more cleanup 9cf1bec
rubanm: add test case for Date tuple converter bdb31f2
rubanm: add partitioning support d694387
rubanm: cr edits aebce90
rubanm: update tw artifactory path d009fa1
rubanm: http 221248d
rubanm: Scalding - use json for storing jdbc data 2e366b1
rubanm: add another unit test a927f82
rubanm: initial changes 9559966
rubanm: working version 5351e9b
rubanm: minor cleanup 946279a
rubanm: cr edits af462da
rubanm: clean comment c4a37ab
rubanm: Merge branch 'master' into rmonu/jdbc_validation
rubanm: more unit tests 26a278e
rubanm: use Try instead of assert fb8d020
ianoc: Merge branch 'master' of 0a960b7
rubanm: Scalding - jdbc drop prefix for nested case-classes 7ed24b5
rubanm: minor cleanup c0b5904
rubanm: minor cleanup d881c2c
rubanm: case class resolution repro test 0c8b738
rubanm: repro test for case class resolution, undo previous repro test 5f32695
rubanm: add missing test file 5cbec3a
ianoc: Fixes compile issues fd2da96
rubanm: remove test comments 225c28f
rubanm: initial commit 9806d4a
rubanm: working changes 414eac1
rubanm: minor cleanup 0125093
rubanm: add preload queries, more refactoring 6d2b83c
rubanm: add mysql driver check a8e3d4a
rubanm: Merge branch 'master' into rmonu/jdbc_mysql_writes baadf3d
rubanm: Jdbc statement setter macro 6a4291d
rubanm: more comments 6b70f40
rubanm: cr edits eb70b84
rubanm: cr edits fc1d7f1
rubanm: working on changes 09f7cbf
rubanm: No prefix option to fields provider macro 673dc0e
rubanm: unit tests 00acd9f
rubanm: use sealed trait dee9bf0
rubanm: Merge branch 'master' into rmonu/setter_refactor 0f10fd5
rubanm: cr edits 432aeb8
rubanm: minor cleanup 1e3f855
rubanm: Merge branch 'master' into rmonu/jdbc_mysql_writes
rubanm: fix merge conflict 686f6e1
rubanm: fix indexes in jdbc setter 6979bfb
rubanm: use try helpers 75291e5
rubanm: user DriverClass 92e5264
rubanm: rename to onComplete 9765e80
rubanm: add missing file bdd57f9
rubanm: Vertica load abort on error 3ec1491
rubanm: Merge branch 'master' into rmonu/jdbc_mysql_writes
rubanm: cr edits ab3f73c
rubanm: clean up recursion, types e95aa58
rubanm: use onFailure 5c6444c
rubanm: make driver check lazy 06ba563
rubanm: better setter getter 5e96cb8
rubanm: remove comment 73659ab
rubanm: scalding-jdbc queryFromMappers option 44227bf
rubanm: cr edits 9f43191
rubanm: cr edits 5bd5958
rubanm: disable auto commit f5f427c
rubanm: commit runQuery 8ea2ce7
rubanm: rename loader to writer d23c3a4
rubanm: add exception to error log 9780f0b
rubanm: rename to run 8c7ba7d
rubanm: jdbc macro handle multiple case class apply methods b0f8c46
rubanm: Merge branch 'master' into rmonu/jdbc_mysql_writes 34ee045
rubanm: Merge branch 'rmonu/jdbc_mysql_writes' into rmonu/mysql_mappers_flag 580c1be
ianoc: fixes vertica
ianoc: Vertica native support
ianoc: updates 1dd9eda
ianoc: WIP 25c2d29
rubanm: Mysql transform for read fix, more comments b273754
rubanm: jdbc - fix writes, count logic 8062f6e
rubanm: debug logging 811ff02
rubanm: fix try sequence 760e74d
rubanm: undo debug logs a5be0ed
rubanm: initial changes 2e1f78a
rubanm: jdbc - fix table exists check ea2920b
rubanm: minor cleanup 26e1663
rubanm: initial working changes d5cce1f
rubanm: remove sqlTableCreateStmt from parent class 106a506
rubanm: cleanup metadata check; 18cf9e2
rubanm: Merge branch 'master' into rmonu/jdbc_charset_fix 090fff8
rubanm: merge in hidden files fix ca18cf9
rubanm: Merge branch 'master' into rmonu/jdbc_charset_fix 86f84ea
rubanm: there's a val initialization order bug that breaks JobTests so we use a separate one with no call to connectionConfig for now
rubanm: vertica jdbc - pick active namenode 8add553
rubanm: logging changes 8af7a42
rubanm: rename charset to encoding d2e860b
rubanm: Merge branch 'master' into rmonu/vertica_namenode_fix 7fa2e96
rubanm: cr edits 4fa399d
rubanm: vertica jdbc - set public.Hdfs a9c1be6
rubanm: vertica jdbc - add load query option to apply method 2732aa0
rubanm: naming scheme for FieldsProviderImpl d043cbf
rubanm: remove jdbc and vertica subprojects for now. those will be merged with oss separately
rubanm: remove jdbc and vertica subprojects from build file ca1ef9c
rubanm: tuple setter macro refactor 7caebcb
rubanm: use tuple setters/converters from scalding-macros 0a4adaf
rubanm: use bijection-macros f169051
rubanm: move jdbc macros closer to root dir 71dfe18
rubanm: fix imports after file move 76d2fa1
rubanm: fix copyright year 6518512
rubanm: rename package names from scalding_internal to scalding 942d04d
rubanm: rename dirs scalding_internal-db-* to scalding-db-* b66d625
rubanm: initial merge of jdbc macros ded33d7
rubanm: add new build entries for db and db-macros 4294324
rubanm: rename scalding-db-core to scalding-db e686898
ianoc commented on an outdated diff May 1, 2015
@@ -13,4 +13,6 @@ addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.10.2")
addSbtPlugin("com.typesafe.sbt" % "sbt-ghpages" % "0.5.1")
+addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.6")
ianoc May 1, 2015 Collaborator

mima is already in this file

ianoc commented May 1, 2015

@rubanm can you add some examples of using the new JDBC stuff to the description, to show the motivation for this?

ianoc commented May 1, 2015

Looks great though, thanks for splitting this out

rubanm commented May 1, 2015

@ianoc updated the description

johnynek commented May 6, 2015

can we add a README or something that explains how to get going with this? It's great, but reading the code to learn it is pretty hard.

rubanm commented May 6, 2015

@johnynek makes sense. Added a README. It probably needs a few more edits though.

johnynek commented on an outdated diff May 22, 2015
+ res1: cascading.tuple.Fields = 'card_id', 'tweet_id', 'created_at', 'deleted | long, long, Date, boolean
+### Supported Mappings
+Scala type | SQL type
+------------- | -------------
+`Int` | `INTEGER`
+`Long` | `BIGINT`
+`Short` | `SMALLINT`
+`Double` | `DOUBLE`
+`@varchar @size(20) String` | `VARCHAR(20)`
+`@text String` | `TEXT`
+`java.util.Date` | `DATETIME`
+`@date java.util.Date` | `DATE`
+`Boolean` | `BOOL`, `BOOLEAN`, `TINYINT`
johnynek May 22, 2015 Collaborator

what is up with the three values on the right in this column?

johnynek May 22, 2015 Collaborator

Seems like the SQL type is BOOLEAN, from reading the code below.
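The Supported Mappings table can be read as a small function from a Scala field type (plus its annotations) to a SQL column type. The sketch below encodes the table's rows under stated assumptions: the annotation objects are hypothetical stand-ins for `@varchar`, `@size`, `@text` and `@date`; `Boolean` maps to `BOOLEAN`, the reading suggested in the review comment above; an `Option[...]` wrapper marks the column nullable; and a bare `String` is rejected, which is this sketch's choice rather than documented macro behavior. It needs Scala 2.12+ for `Either.map`.

```scala
// Hypothetical stand-ins for the README's @varchar, @size(n), @text and @date annotations.
sealed trait ColumnAnnotation
case object VarcharAnn extends ColumnAnnotation
final case class SizeAnn(n: Int) extends ColumnAnnotation
case object TextAnn extends ColumnAnnotation
case object DateAnn extends ColumnAnnotation

/** SQL type for one Scala field type, following the Supported Mappings table. */
def sqlType(scalaType: String, anns: Seq[ColumnAnnotation] = Nil): Either[String, String] =
  scalaType match {
    case "Int"     => Right("INTEGER")
    case "Long"    => Right("BIGINT")
    case "Short"   => Right("SMALLINT")
    case "Double"  => Right("DOUBLE")
    case "Boolean" => Right("BOOLEAN") // table lists BOOL, BOOLEAN, TINYINT; BOOLEAN assumed
    case "java.util.Date" => Right(if (anns.contains(DateAnn)) "DATE" else "DATETIME")
    case "String" =>
      val size = anns.collectFirst { case SizeAnn(n) => n }
      if (anns.contains(TextAnn)) Right("TEXT")
      else (anns.contains(VarcharAnn), size) match {
        case (true, Some(n)) => Right(s"VARCHAR($n)")
        case _               => Left("String needs @text, or @varchar with @size")
      }
    case other => Left(s"unsupported type: $other")
  }

/** Option[X] maps like X but marks the column nullable; returns (sqlType, nullable). */
def column(scalaType: String, anns: Seq[ColumnAnnotation] = Nil): Either[String, (String, Boolean)] =
  if (scalaType.startsWith("Option[") && scalaType.endsWith("]"))
    sqlType(scalaType.stripPrefix("Option[").stripSuffix("]"), anns).map(t => (t, true))
  else sqlType(scalaType, anns).map(t => (t, false))
```

For example, `column("Option[java.util.Date]")` yields `Right(("DATETIME", true))`, matching the `NULLABLE` rule for `Option`s.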

johnynek and 1 other commented on an outdated diff May 22, 2015
+* Annotations are used for String types to clearly distinguish between TEXT and VARCHAR column types
+* Scala `Option`s can be used to denote columns that are `NULLABLE` in the DB
+* Nested case classes can be used as a workaround for the 22-size limitation on Scala tuples, case classes
johnynek May 22, 2015 Collaborator

this seems to stop abruptly: case classes... ? what? I think you are going to explain that case classes are also flattened in left to right order. You might want to show an example of that.

rubanm May 27, 2015 Collaborator

Added a nested case classes example.
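The flattening discussed here can be sketched in plain Scala: nested case classes are expanded into their fields left to right, while `Option` values stay single (nullable) columns. Column names are the leaf field names with no parent prefix, mirroring the "drop prefix for nested case-classes" commit in the list above; that this matches the macro's naming exactly is an assumption. `UserPart`, `TweetPart`, `NestedRecord` and `flattenColumns` are hypothetical, and `productElementNames` requires Scala 2.13+, whereas a macro would derive the same layout at compile time.

```scala
// Hypothetical nested schema: 5 columns in total once flattened.
case class UserPart(user_id: Long, screen_name: String)
case class TweetPart(tweet_id: Long, created_at: java.util.Date)
case class NestedRecord(user: UserPart, tweet: TweetPart, deleted: Boolean)

/** Flattens nested case classes into (column name, value) pairs, left to right.
  * Nested field names are used without a parent prefix; Options are kept as leaf values. */
def flattenColumns(p: Product): List[(String, Any)] =
  p.productElementNames.zip(p.productIterator).toList.flatMap {
    case (name, opt: Option[_]) => List(name -> opt)    // nullable column, not a nested record
    case (_, nested: Product)   => flattenColumns(nested) // recurse into nested case classes
    case (name, value)          => List(name -> value)
  }
```

Because flattening happens field by field, a wide table can be split across several small case classes without hitting the 22-field limit.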


Are the tests being run? Did we update the travis running script?

We really need a more reliable way to make sure we are testing our sub-modules.

rubanm commented Jun 17, 2015

Updated the travis script to include the new db modules.

johnynek commented on the diff Jul 2, 2015
+(in the REPL)
+Necessary imports:
+ scalding> import com.twitter.scalding.db._
+ scalding> import com.twitter.scalding.db.macros._
+Case class representing your DB schema:
+ scalding> case class ExampleDBRecord(
+ | card_id: Long,
+ | tweet_id: Long,
+ | created_at: Option[java.util.Date],
+ | deleted: Boolean = false)
+ defined class ExampleDBRecord
johnynek Jul 2, 2015 Collaborator

what's the last step? How do I read from or write to a database? Seems like we are stopping short of clearly explaining the use of the whole package here.

rubanm Jul 2, 2015 Collaborator

Yes, the sources will be added in a follow-up PR. So this is a little incomplete in that sense.
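The REPL example above makes `created_at` an `Option`, which the README maps to a `NULLABLE` column. Over plain JDBC, writing such a record means binding each field, in case class order, to a 1-based parameter index and calling `setNull` for `None`. The sketch below is hand-written against the standard `java.sql.PreparedStatement` methods; `setParameters` is a hypothetical name, this is not the code the setter macro generates, and binding the date with `setTimestamp`/`Types.TIMESTAMP` assumes a `DATETIME` column.

```scala
import java.sql.{PreparedStatement, Timestamp, Types}

case class ExampleDBRecord(
  card_id: Long,
  tweet_id: Long,
  created_at: Option[java.util.Date],
  deleted: Boolean = false)

// Hypothetical hand-written setter; not the code generated by the setter macro.
// JDBC parameter indexes are 1-based and follow the case class field order.
def setParameters(stmt: PreparedStatement, r: ExampleDBRecord): Unit = {
  stmt.setLong(1, r.card_id)
  stmt.setLong(2, r.tweet_id)
  r.created_at match {
    case Some(d) => stmt.setTimestamp(3, new Timestamp(d.getTime)) // assumes a DATETIME column
    case None    => stmt.setNull(3, Types.TIMESTAMP)               // Option => NULLABLE column
  }
  stmt.setBoolean(4, r.deleted)
}
```

It would be paired with a statement such as `conn.prepareStatement("INSERT INTO example_table (card_id, tweet_id, created_at, deleted) VALUES (?, ?, ?, ?)")`, where `conn` is an open `java.sql.Connection` (assumed).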

ianoc merged commit 500ab80 into twitter:develop Jul 2, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
rubanm deleted the rubanm:rubanm/jdbc_macros_merged branch Jul 2, 2015
ianoc referenced this pull request Aug 10, 2015

Release 0.16.0 #1413


Coverage Status

Changes Unknown when pulling 9c93939 on rubanm:rubanm/jdbc_macros_merged into ** on twitter:develop**.
