Simplify SQL Workflows with Scala
CSS JavaScript Scala Shell HTML
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.idea/copyright
project
silk-core/src
silk-cui/src
silk-examples/src
silk-frame/src
silk-macros/src/main/scala/xerial/silk/macro
silk-sbt/src/main/scala/xerial/sbt
silk-server/src/main
silk-workflow/src
src/pack
.gitignore
.travis.yml
LICENSE
README.md
SUMMARY.md
build.sbt
sbt
version.sbt

README.md

Silk: A framework for managing SQL data flows.

http://xerial.org/silk

Examples

import xerial.silk.core._

import sampledb._

// SELECT count(*) FROM nasdaq
def dataCount = nasdaq.size

// SELECT time, close FROM nasdaq WHERE symbol = 'APPL'
def appleStock = nasdaq.filter(_.symbol is "APPL").select(_.time, _.close)

// You can use a raw SQL statjement as well:
def appleStockSQL = sql"SELECT time, close FROM nasdaq where symbol = 'APPL'"

// SELECT time, close FROM nasdaq WHERE symbol = 'APPL' LIMIT 10
appleStock.limit(10).print

// time-column based filtering
appleStock.between("2015-05-01", "2015-06-01")

for(company <- Seq("YHOO", "GOOG", "MSFT")) yield {
  nasdaq.filter(_.symbol is company).selectAll
}

Milestones

  • Build SQL + local analysis workflows

  • Submit queries to Presto / Treasure Data

  • Run scheduled queries

  • Retry upon failures

  • Cache intermediate results

  • Resume workflow

  • Partial workflow executions

  • Sampling display

    • Interactive mode
  • Split a large query into small ones

    • Differential computation for time-series data
  • Windowing for stream queries

  • Object-oriented workflow

  • Input Source: fluentd/embulk

  • Output Source:

  • Workflow Executor

    • Local-only mode
    • Register SQL part to Treasure Data
    • Run complex analysis on local cache
    • UNIX command executor