Flowfile

Documentation: Website - Core - Worker - Frontend - Try Online - Technical Architecture

Flowfile is a visual ETL tool and Python library for building data pipelines. It offers a drag-and-drop canvas with 30+ node types (joins, fuzzy matching, filters, pivots, aggregations, and more) and connects to databases (PostgreSQL, MySQL, SQL Server, Oracle), cloud storage (S3, ADLS, GCS), and Kafka. Pipelines run on Polars and can be exported as standalone Python scripts. Flowfile also includes a data catalog backed by Delta Lake, a scheduler, sandboxed Python execution via Docker kernels, and a programmatic API with Polars-like syntax. It is available as a desktop app, web UI, Docker deployment, Python package, or browser-only WASM version.

Flowfile Interface

Example Use Cases

Data Cleaning & Transformation

Perform complex joins (fuzzy matching), text-to-rows transformations, and advanced filtering/grouping using a visual interface.
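To make the idea concrete, here is a minimal sketch of what a fuzzy-match join does under the hood, written with only the standard library's difflib; Flowfile's own matching implementation may use a different algorithm, and the function names here are illustrative:

```python
# Illustrative sketch of a fuzzy-match join (not Flowfile's implementation).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] describing how closely two strings match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_join(left, right, key, threshold=0.8):
    """Pair each left row with every right row whose key is 'close enough'."""
    matches = []
    for l in left:
        for r in right:
            score = similarity(l[key], r[key])
            if score >= threshold:
                matches.append({**l, **r, "match_score": round(score, 2)})
    return matches

customers = [{"name": "Acme Corp."}, {"name": "Globex"}]
orders = [{"name": "ACME Corp", "total": 120}, {"name": "Initech", "total": 75}]
print(fuzzy_join(customers, orders, "name"))
```

In the visual interface you configure the same thing declaratively: pick the join keys and a similarity threshold, and the node handles the matching.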

Flowfile Layout

Code Generation

Export your visual flows as standalone Python/Polars scripts. Deploy workflows without Flowfile dependencies or share ETL logic as readable code.

Automatically generated Polars code

Data Integration

Standardize data formats and handle messy Excel files efficiently.

Read Excel

Performance at Scale

Processes larger-than-memory datasets out of core, powered by Polars for fast data processing.

Flowfile Write Demo

YAML/JSON Export

Save flows as human-readable YAML or JSON files, making them portable and version-control friendly.
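Because the on-disk format is plain text, flows diff cleanly in pull requests. A rough sketch of what a flow file might contain (the node types and field names here are illustrative, not the exact Flowfile schema):

```yaml
name: clean_orders
nodes:
  - id: 1
    type: read_csv
    settings:
      path: orders.csv
  - id: 2
    type: filter
    settings:
      expression: value > 150
edges:
  - from: 1
    to: 2
```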


Getting Started

Flowfile is designed to be flexible. Choose the installation method that fits your workflow.

Prerequisites

  • Python 3.10+
  • Node.js 16+ (for frontend development)
  • Poetry (Python package manager)
  • Docker & Docker Compose (optional, for Docker setup)
  • Make (optional, for build automation)

1. Python Package (Recommended for Developers)

Install Flowfile directly from PyPI. This gives you both the visual UI and the programmatic flowfile_frame API.

pip install Flowfile

Launch the Visual UI: Start the web-based UI with a single command:

flowfile run ui

Use the FlowFrame API: Create pipelines programmatically using a Polars-like syntax:

import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a data pipeline
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Process the data
result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the graph in the web UI
open_graph_in_editor(result.flow_graph)

For more details, see the flowfile_frame documentation.

2. Docker (Self-Hosted)

Run the full suite (Frontend, Core, Worker) using Docker Compose. Ideal for server deployments or local isolation.

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
docker compose up -d

Access the app at http://localhost:8080.

3. Desktop Application

The desktop version offers the best experience for non-technical users with a native interface and integrated backend services.

Option A: Download Pre-built Application

Download the latest release from GitHub Releases and run the installer for your platform (Windows, macOS, or Linux).

Note: You may see security warnings since the app isn't signed with a developer certificate yet.

  • Windows: Click "More info" → "Run anyway"
  • macOS: If you see an "app is damaged" error, run this in Terminal:
    find /Applications/Flowfile.app -exec xattr -c {} \;
    Then open the app normally. This clears the quarantine flag that macOS sets on downloaded apps.

Option B: Build from Source

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile

# Build packaged executable
make    # Creates platform-specific executable

# Or manually:
poetry install
poetry run build_backends
cd flowfile_frontend
npm install
npm run build

4. Browser Version (Lite)

For a zero-setup experience, try the WASM version. It runs entirely in your browser using Pyodide (no server required).

Live Demo: demo.flowfile.org

This lite version includes 14 essential nodes for data transformation:

  • Input: Read CSV, Manual Input
  • Transformation:
    • Basic: Filter, Select, Sort, Unique, Take Sample
    • Reshape: Group By, Pivot, Unpivot, Join
  • Advanced: Polars Code (write custom Python/Polars logic)
  • Output: Preview (view in browser), Download (CSV or Parquet)

5. Manual Setup (Development)

For contributors who need hot-reloading and direct access to services.

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
poetry install

# Start backend services
poetry run flowfile_worker  # Starts worker on :63579
poetry run flowfile_core    # Starts core on :63578

# Start frontend (in a new terminal)
cd flowfile_frontend
npm install && npm run dev:web  # Starts web interface on :8080

Visualizing and Sharing Pipelines

One of the most powerful features is the ability to visualize your data transformation pipelines:

  • Inspect Data Flow: See exactly how your data is transformed step by step
  • Debugging: Identify issues in your data pipeline visually
  • Documentation: Share your data transformation logic with teammates
  • Iteration: Modify your pipeline in the Designer UI and export it back to code

Recent Additions

  • Data Catalog — Register tables backed by Delta Lake with version history, time travel, and merge/upsert support. Organize tables into catalogs and schemas, and track lineage across flows.
  • Scheduler — Run registered flows on an interval or trigger them when specific catalog tables are updated. Configured directly from the catalog UI.
  • Kafka / Redpanda Source — A canvas node for reading from Kafka topics with automatic schema inference.
  • Python Kernels — The Python Script node runs user code in isolated Docker containers (kernel_runtime) with their own package environments, keeping the host process safe.
  • Cloud Storage — Read/write to S3, Azure Data Lake Storage, and Google Cloud Storage.
  • Flow Parameters — ${variable} substitution across node settings, configurable via the UI or CLI.
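The ${variable} substitution in the last item can be sketched with the standard library's string.Template; Flowfile's actual resolver may differ, and resolve_settings is a hypothetical helper name:

```python
# Minimal sketch of ${variable} substitution in node settings
# (illustrative only, not Flowfile's implementation).
from string import Template

def resolve_settings(settings: dict, params: dict) -> dict:
    """Substitute ${name} placeholders in every string-valued setting."""
    resolved = {}
    for key, value in settings.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched
            resolved[key] = Template(value).safe_substitute(params)
        else:
            resolved[key] = value
    return resolved

settings = {"path": "s3://bucket/${env}/orders.csv", "limit": 100}
print(resolve_settings(settings, {"env": "prod"}))
# {'path': 's3://bucket/prod/orders.csv', 'limit': 100}
```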

Technical Design

Flowfile operates as three interconnected services:

  • Designer (Electron + Vue): Visual interface for building data flows
  • Core (FastAPI): ETL engine using Polars for data transformations (:63578)
  • Worker (FastAPI): Handles computation and caching of data operations (:63579)

Each flow is represented as a directed acyclic graph (DAG), where nodes represent data operations and edges represent data flow between operations. You can export any visual flow as standalone Python/Polars code for production use.
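The DAG model above can be illustrated with a toy executor built on the standard library's graphlib: resolve the nodes in dependency order, then feed each node's output to its downstream neighbors. The node names and operations here are made up for illustration; Flowfile's real engine runs these steps as Polars transformations:

```python
# Toy illustration of executing a flow DAG in dependency order
# (not Flowfile's engine, which runs on Polars).
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its inputs).
dag = {
    "read": set(),
    "filter": {"read"},
    "aggregate": {"filter"},
    "write": {"aggregate"},
}

operations = {
    "read": lambda _: [1, 2, 3, 4],
    "filter": lambda rows: [r for r in rows if r > 2],
    "aggregate": lambda rows: [sum(rows)],
    "write": lambda rows: rows,
}

results = {}
for node in TopologicalSorter(dag).static_order():
    upstream = [results[dep] for dep in dag[node]]
    results[node] = operations[node](upstream[0] if upstream else None)

print(results["write"])  # [7]
```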

For a deeper dive, check out this article on our architecture.


TODO

Core Features

  • Add cloud storage support (S3, ADLS, GCS)
  • Multi-flow execution support
  • Polars code reverse engineering
    • Generate Polars code from visual flows (via the "Generate code" button)
    • Import existing Polars scripts and convert to visual flows
  • Data catalog with Delta Lake storage
  • Flow scheduling (interval and table-trigger based)
  • Kafka / Redpanda ingestion
  • Sandboxed Python execution (Docker-based kernels)

Documentation

  • Add comprehensive docstrings
  • Create detailed node documentation
  • Add architectural documentation
  • Improve inline code comments
  • Create user guides and tutorials

Infrastructure

  • Implement proper testing
  • Add CI/CD pipeline
  • Improve error handling
  • Add monitoring and logging

License

MIT License


Acknowledgments

Built with Polars, Vue.js, FastAPI, VueFlow, Delta Lake, and Electron.
