Simplify SQL Workflows with Scala


Silk: A framework for managing SQL data flows.

http://xerial.org/silk

Examples

import xerial.silk.core._

import sampledb._

// SELECT count(*) FROM nasdaq
def dataCount = nasdaq.size

// SELECT time, close FROM nasdaq WHERE symbol = 'AAPL'
def appleStock = nasdaq.filter(_.symbol is "AAPL").select(_.time, _.close)

// You can use a raw SQL statement as well:
def appleStockSQL = sql"SELECT time, close FROM nasdaq WHERE symbol = 'AAPL'"

// SELECT time, close FROM nasdaq WHERE symbol = 'AAPL' LIMIT 10
appleStock.limit(10).print

// time-column based filtering
appleStock.between("2015-05-01", "2015-06-01")

// Build one query per company symbol
for(company <- Seq("YHOO", "GOOG", "MSFT")) yield {
  nasdaq.filter(_.symbol is company).selectAll
}
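
The comments in the examples above pair each chained call with the SQL it stands for. The following is a toy sketch of how such a chain could be rendered as SQL text. It is not Silk's implementation: the `Query` class, its string-based `filter` and `select`, and `toSQL` are hypothetical names invented for this illustration, and Silk's own API uses typed column accessors such as `_.symbol`.

```scala
// Toy model only: NOT Silk's implementation. All names here are hypothetical.
final case class Query(
    table: String,
    columns: List[String] = List("*"),
    conditions: List[String] = Nil,
    rowLimit: Option[Int] = None
) {
  // Each operation returns a new immutable Query, so calls can be chained.
  def filter(column: String, value: String): Query = {
    // Double any single quote so the value stays inside the SQL string literal.
    val escaped = value.replace("'", "''")
    copy(conditions = conditions :+ s"$column = '$escaped'")
  }

  def select(cols: String*): Query = copy(columns = cols.toList)

  def limit(n: Int): Query = copy(rowLimit = Some(n))

  // Render the accumulated operations as a single SQL statement.
  def toSQL: String = {
    val cols  = columns.mkString(", ")
    val where = if (conditions.isEmpty) "" else conditions.mkString(" WHERE ", " AND ", "")
    val lim   = rowLimit.map(n => s" LIMIT $n").getOrElse("")
    s"SELECT $cols FROM $table$where$lim"
  }
}

object QueryDemo {
  val nasdaq: Query = Query("nasdaq")
  val googleStock: Query = nasdaq.filter("symbol", "GOOG").select("time", "close")

  def main(args: Array[String]): Unit = {
    println(googleStock.toSQL)
    println(googleStock.limit(10).toSQL)
  }
}
```

Because every step returns a new value, a base query such as `googleStock` can be reused and extended (for example with `limit`) without affecting other queries built from it.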

Milestones

  • Build SQL + local analysis workflows
  • Submit queries to Presto / Treasure Data
  • Run scheduled queries
  • Retry upon failures
  • Cache intermediate results
  • Resume workflow
  • Partial workflow executions
  • Sampling display
    • Interactive mode
  • Split a large query into small ones
    • Differential computation for time-series data
  • Windowing for stream queries
  • Object-oriented workflow
  • Input Source: fluentd/embulk
  • Output Source:
  • Workflow Executor
    • Local-only mode
    • Register SQL part to Treasure Data
    • Run complex analysis on local cache
    • UNIX command executor
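
As a rough illustration of the "Split a large query into small ones" milestone, the sketch below cuts a date range into consecutive windows and prints one query per window. It is an assumption about how such splitting might look, not a description of Silk's design: `TimeRangeSplit`, `split`, the half-open `[start, end)` convention, and the generated SQL text are all hypothetical.

```scala
import java.time.LocalDate

// Hypothetical helper, not part of Silk.
object TimeRangeSplit {
  // Split the half-open range [start, end) into consecutive windows
  // that each cover at most `days` days. The last window may be shorter.
  def split(start: LocalDate, end: LocalDate, days: Int): List[(LocalDate, LocalDate)] = {
    require(days > 0, "days must be positive")
    Iterator
      .iterate(start)(_.plusDays(days.toLong))
      .takeWhile(_.isBefore(end))
      .map { from =>
        val to = from.plusDays(days.toLong)
        (from, if (to.isAfter(end)) end else to)
      }
      .toList
  }

  def main(args: Array[String]): Unit = {
    val windows = split(LocalDate.parse("2015-05-01"), LocalDate.parse("2015-06-01"), 7)
    // One small query per window (illustrative SQL text only).
    windows.foreach { case (from, to) =>
      println(s"SELECT time, close FROM nasdaq WHERE time >= '$from' AND time < '$to'")
    }
  }
}
```

Half-open windows are used here so that adjacent sub-queries never overlap, which would matter if their results were later concatenated or cached per window.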