Build pipelines that won’t even compile when contracts drift. Keep transformations pure, put effects at the edges, and run on Spark and Flink.
- Runtime schema drift burns weekends. We believe failures should move left - into the compiler.
- Retries and speculative execution amplify side‑effects inside transforms. We believe effects belong at the edges and must be idempotent.
- Engineers deserve fast, local feedback. We believe pure transformations and compile‑fail tests make data engineering joyful again.
"A partner team removed a nullable column late Friday. We couldn’t roll back in time; both teams were up all night. If that change had been a compile error, we would have slept."
Think "contracts like Pydantic/Avro - but enforced before jobs run," "pure functions you can test without a cluster," and "connectors/engines that make IO explicit and safe."
You get compile‑time guarantees (not CI or runtime heuristics), a small opinionated surface, and batteries‑included defaults with escape hatches.
- Compile‑time contracts: SchemaConforms[Out, Contract, Policy] proves compatibility; policies include Exact, Backward, Forward (+ Ordered/CI/ByPosition). See docs/how-it-fails.md.
- Typestate builder: build() exists only when source, transforms, and sink are present. Incomplete pipelines are unbuildable.
- Pure vs effect boundary: transforms are pure functions; F[_] appears only at IO edges; engines plug into a single algebra (see the sketch after this list).
- Pictures over prose: see flowchart.svg and optionality.md.
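A minimal sketch of that boundary in plain Scala with cats-effect (illustrative only, not the FlowForge API): the transform is a deterministic function you can unit-test without a cluster, and the only effect lives at the edge inside IO.

```scala
import cats.effect.IO

final case class Event(id: Long, email: String)

// Pure transform: deterministic, no IO, testable without a cluster
def normalize(e: Event): Event = e.copy(email = e.email.trim.toLowerCase)

// Effectful edge: the only place IO happens; kept idempotent so retries are safe
def writeOut(e: Event): IO[Unit] = IO(println(s"upsert ${e.id} -> ${e.email}"))
```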
- Core: contracts, builder, EffectSystem, DataAlgebra.
- Engines: Spark (primary engine for 1.0), Flink (Scala 2.12 only).
- Connectors: filesystem, JDBC, GCS (more coming).
- Data Quality: native checks by default; optional Deequ when present.
- Template: flowforge.g8 for new projects.
- Getting started: docs/getting-started.md
- Public API: docs/public-api.md
- How it fails (error anatomy): docs/how-it-fails.md
- Framework behaviors (non‑negotiables): docs/design/framework-behaviors.md
- Cut a release: docs/release/how-to-cut-a-release.md
Nightly runs provide broader integration coverage.
- Compile‑fail contracts for typed endpoints under policy lattice
- Typestate builder: build() exists only when the pipeline is complete - incomplete pipelines can’t compile
- Pure transforms; effectful edges; idempotent side‑effects by design (see the sketch after this list)
- See: docs/design/framework-behaviors.md
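A minimal sketch of what "idempotent side‑effect" means here (plain Scala, not the FlowForge API): the write is keyed, so re-running it after a retry or speculative re-execution leaves the result unchanged.

```scala
import cats.effect.IO
import scala.collection.concurrent.TrieMap

final case class Row(id: Long, payload: String)

// Keyed upsert: writing the same Row twice is a no-op the second time,
// so retries and speculative re-execution cannot produce duplicates.
val store = TrieMap.empty[Long, String]
def upsert(r: Row): IO[Unit] = IO(store.update(r.id, r.payload))
```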
Prereq: JDK 17+, sbt 1.9+
1) Clone & build
```bash
git clone https://github.com/vim89/flowforge.git && cd flowforge
sbt compile
```

2) See a compile‑time contract failure (red → green)
```scala
// Paste in REPL or a scratch test to feel it
import com.flowforge.core.contracts._

final case class Out(id: Long)
final case class Contract(id: Long, email: String)

implicitly[SchemaConforms[Out, Contract, SchemaPolicy.Exact]] // ❌ compile‑time error (missing email)
```

Relax the policy to Backward (allows extra producer fields and missing optional/defaults):

```scala
implicitly[SchemaConforms[Out, Contract, SchemaPolicy.Backward]] // ✅
```

3) Build a pipeline - typestate forbids incomplete builds
```scala
import cats.effect.IO
import com.flowforge.core.PipelineBuilder
import com.flowforge.core.types._
import com.flowforge.core.contracts._

final case class User(id: Long, email: String)

val src  = TypedSource[User](LocalDataSource("/tmp/in", DataFormat.Parquet))
val sink = TypedSink[User](LocalDataSink("/tmp/out", DataFormat.Parquet))

PipelineBuilder[IO]("demo")
  .addTypedSource[User, User, SchemaPolicy.Exact](src, _ => IO.pure(User(1, "a@b")))
  .noTransform
  .addTypedSink[User, SchemaPolicy.Exact](sink, (_, _) => IO.unit)
  .build() // ✅ build is available only now
```

4) Explore diagrams and failure messages
- Diagrams: flowchart.svg, optionality.md
- Failure anatomy: docs/how-it-fails.md
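To complement step 3, here is a sketch of what the typestate forbids, shown commented out because it does not compile (the exact compiler wording will differ): without a typed sink, build() is simply not available on the builder's current state.

```scala
// PipelineBuilder[IO]("incomplete")
//   .addTypedSource[User, User, SchemaPolicy.Exact](src, _ => IO.pure(User(1, "a@b")))
//   .noTransform
//   .build() // ❌ does not compile: build() only appears once a typed sink is added
```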
| Path | Goal | Commands |
|---|---|---|
| A - Examples | Try locally (no cluster) | sbt ffDev (compile + focused tests), sbt ffRunSpark (Spark local[*]) |
| B - Red→Green | See compile‑time error then fix | Use the snippet above; run sbt compile |
| C - New project | Scaffold with g8 | sbt new flowforge.g8 --name="ff-demo" --organization="com.acme" then sbt test / sbt run |
| Component | Version | Notes |
|---|---|---|
| JDK | 17+ | CI pinned to 17; Spark 3.5.x compatibility |
| sbt | 1.9+ | |
| Scala | 2.13 (primary) | Scala 3 for core only (no Spark deps) |
| Spark | 3.5.x | Runs on Java 17 |
| Flink | Scala 2.12 only | Scala API constraints |
Flink’s Scala API is 2.12‑only. The root build excludes Flink from the default aggregate so that +compile, +test:compile, and +test stay green for 2.13 (and Scala 3 where applicable). Build/test Flink explicitly when you need it:
```bash
# Compile Flink (Scala 2.12)
sbt "++2.12.* enginesFlink/compile"

# Run Flink tests (Scala 2.12)
sbt "++2.12.* enginesFlink/test"
```
References: Flink documents binary incompatibility across Scala lines and the need to select the matching _2.12 artifacts for the Scala API. See Flink’s docs on Scala versions and sbt cross‑build guidance.
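A sketch of how such an exclusion can look in sbt (module names, paths, and versions are assumptions, not the actual build.sbt): the root project simply does not aggregate the 2.12-only Flink module, so cross-building the aggregate never touches it.

```scala
// build.sbt sketch (hypothetical module names/paths)
lazy val core         = project in file("modules/core")
lazy val enginesSpark = project in file("modules/engines-spark")
lazy val enginesFlink = (project in file("modules/engines-flink"))
  .settings(scalaVersion := "2.12.19", crossScalaVersions := Seq("2.12.19"))

// Root aggregates everything except Flink, so +compile / +test stay green on 2.13
lazy val root = (project in file("."))
  .aggregate(core, enginesSpark)
```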
The diagrams above summarize derivation and policy checks; see also docs/diagrams/compile-time-contracts/guide.md for narrative.
- Examples module: modules/examples (runnable demos)
- Optional Deequ mode: add -Dff.quality.mode=deequ (auto‑enables when on classpath)
- Start here: docs/start-here.md; quick: docs/getting-started.md
- Why compile‑time: docs/why-compile-time.md
- How it fails: docs/how-it-fails.md
- Public API: docs/public-api.md
- ADR index: docs/adr/INDEX.md
- Evidence: docs/evidence (e.g., scala3-alignment.md)
- Plan & Readiness: docs/plan/v1.0-readiness.md, docs/quality/release-criteria.md
- Talks: docs/talks (WHY→HOW→WHAT outline)
- CHANGELOG: CHANGELOG.md
- Security: SECURITY.md
- Scala 3?
- Core compiles on Scala 3; engines depend on Spark/Flink ecosystem (Spark 3.x limits Scala 3 today).
- Why compile‑time vs tests?
- Tests are sampled; compile‑time proofs are exhaustive for shapes and policy compatibility (see the sketch after this FAQ).
- How does this compare to Databricks DLT/Dagster/dbt?
- They perform runtime/CI checks; FlowForge enforces compile‑time gates and typestate builder. See docs/evidence for deeper comparisons.
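A minimal sketch of that contrast, reusing the contract types from the quick start (illustrative, assuming those imports are available): the assertion only covers the sampled value, while the implicit summons a proof over the types themselves.

```scala
import com.flowforge.core.contracts._

final case class Out(id: Long, email: String)
final case class Contract(id: Long, email: String)

// Runtime/test-style check: only as good as the rows you feed it
assert(Out(1, "a@b").email.nonEmpty)

// Compile-time proof: holds for every possible Out value, or the build fails
implicitly[SchemaConforms[Out, Contract, SchemaPolicy.Exact]]
```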
We welcome folks from Python/ETL backgrounds and JVM veterans alike. Start with docs/contributing/HANDBOOK.md, then pick an issue. Please run sbt scalafmtAll and sbt scalafixAll before submitting.
Flowforge adopts a hybrid licensing structure combining open innovation and IP protection.
- Legacy / historical releases remain under MIT (for transparency and ecosystem continuity).
- Active and future releases (v1.0 and onward) are licensed under AGPLv3 with additional Flowforge terms (“RESTRICTED COMMERCIAL & DERIVATIVE TERMS FOR FLOWFORGE” in LICENSE).
- Commercial usage (offering as SaaS, embedding in proprietary systems, or internal closed-source deployments) requires a separate commercial license. See COMMERCIAL_LICENSE.md for a template.
- The Contributor License Agreement (CLA) in CLA.md governs contribution terms, ensuring compatibility with the hybrid licensing framework.
- Commercial exceptions and dual-licensing are handled directly by Vitthal Mirji for partners and enterprise use.
The goal: protect Flowforge’s compile-time innovation while keeping community use free and open.