Flowfile is a visual ETL tool and Python library for building data pipelines. It has a drag-and-drop canvas with 30+ node types (joins, fuzzy matching, filters, pivots, aggregations, etc.), connects to databases (PostgreSQL, MySQL, SQL Server, Oracle), cloud storage (S3, ADLS, GCS), and Kafka. Pipelines run on Polars and can be exported as standalone Python scripts. It also includes a data catalog backed by Delta Lake, a scheduler, sandboxed Python execution via Docker kernels, and a programmatic API with Polars-like syntax. Available as a desktop app, web UI, Docker deployment, Python package, or browser-only WASM version.
Perform complex joins (fuzzy matching), text-to-rows transformations, and advanced filtering/grouping using a visual interface.
Export your visual flows as standalone Python/Polars scripts. Deploy workflows without Flowfile dependencies or share ETL logic as readable code.
Standardize data formats and handle messy Excel files efficiently.
Built to scale out-of-core using Polars for lightning-fast data processing.
Save flows as human-readable YAML or JSON files, making them portable and version-control friendly.
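To make the "human-readable, version-control friendly" point concrete, here is a minimal sketch of serializing a flow definition and round-tripping it with the standard library. The node and edge fields shown are illustrative only, not Flowfile's actual file schema:

```python
import json

# Hypothetical flow definition -- the "nodes"/"edges" structure here is
# illustrative, not Flowfile's actual on-disk schema.
flow = {
    "name": "example_flow",
    "nodes": [
        {"id": 1, "type": "read_csv", "settings": {"path": "data.csv"}},
        {"id": 2, "type": "filter", "settings": {"expression": "value > 150"}},
    ],
    "edges": [{"from": 1, "to": 2}],
}

# Serialize to a readable string (diff-friendly in git) and load it back
serialized = json.dumps(flow, indent=2)
restored = json.loads(serialized)
```

Because the serialized form is plain text with stable key ordering, diffs between pipeline versions stay reviewable in pull requests.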
Flowfile is designed to be flexible. Choose the installation method that fits your workflow.
- Python 3.10+
- Node.js 16+ (for frontend development)
- Poetry (Python package manager)
- Docker & Docker Compose (optional, for Docker setup)
- Make (optional, for build automation)
Install Flowfile directly from PyPI. This gives you both the visual UI and the programmatic flowfile_frame API.
```bash
pip install Flowfile
```

Launch the Visual UI: Start the web-based UI with a single command:
```bash
flowfile run ui
```

Use the FlowFrame API: Create pipelines programmatically using a Polars-like syntax:
```python
import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a data pipeline
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Process the data
result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the graph in the web UI
open_graph_in_editor(result.flow_graph)
```

For more details, see the flowfile_frame documentation.
Run the full suite (Frontend, Core, Worker) using Docker Compose. Ideal for server deployments or local isolation.
```bash
git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
docker compose up -d
```

Access the app at http://localhost:8080.
The desktop version offers the best experience for non-technical users with a native interface and integrated backend services.
Option A: Download Pre-built Application
Download the latest release from GitHub Releases and run the installer for your platform (Windows, macOS, or Linux).
Note: You may see security warnings since the app isn't signed with a developer certificate yet.
- Windows: Click "More info" → "Run anyway"
- macOS: If you see an "app is damaged" error, run this in Terminal:

```bash
find /Applications/Flowfile.app -exec xattr -c {} \;
```

Then open the app normally. This clears the quarantine flag that macOS sets on downloaded apps.
Option B: Build from Source
```bash
git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile

# Build packaged executable
make  # Creates platform-specific executable

# Or manually:
poetry install
poetry run build_backends
cd flowfile_frontend
npm install
npm run build
```

For a zero-setup experience, try the WASM version. It runs entirely in your browser using Pyodide (no server required).
Live Demo: demo.flowfile.org
This lite version includes 14 essential nodes for data transformation:
- Input: Read CSV, Manual Input
- Transformation:
- Basic: Filter, Select, Sort, Unique, Take Sample
- Reshape: Group By, Pivot, Unpivot, Join
- Advanced: Polars Code (write custom Python/Polars logic)
- Output: Preview (view in browser), Download (CSV or Parquet)
For contributors who need hot-reloading and direct access to services.
```bash
git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
poetry install

# Start backend services
poetry run flowfile_worker  # Starts worker on :63579
poetry run flowfile_core    # Starts core on :63578

# Start frontend (in a new terminal)
cd flowfile_frontend
npm install && npm run dev:web  # Starts web interface on :8080
```

One of the most powerful features is the ability to visualize your data transformation pipelines:
- Inspect Data Flow: See exactly how your data is transformed step by step
- Debugging: Identify issues in your data pipeline visually
- Documentation: Share your data transformation logic with teammates
- Iteration: Modify your pipeline in the Designer UI and export it back to code
- Data Catalog — Register tables backed by Delta Lake with version history, time travel, and merge/upsert support. Organize tables into catalogs and schemas, and track lineage across flows.
- Scheduler — Run registered flows on an interval or trigger them when specific catalog tables are updated. Configured directly from the catalog UI.
- Kafka / Redpanda Source — A canvas node for reading from Kafka topics with automatic schema inference.
- Python Kernels — The Python Script node runs user code in isolated Docker containers (`kernel_runtime`) with their own package environments, keeping the host process safe.
- Cloud Storage — Read/write to S3, Azure Data Lake Storage, and Google Cloud Storage.
- Flow Parameters — `${variable}` substitution across node settings, configurable via UI or CLI.
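The `${variable}` substitution described above can be sketched as a simple placeholder replacement. This is an illustrative stand-in, not Flowfile's actual implementation, and `substitute_params` is a hypothetical helper name:

```python
import re

def substitute_params(setting: str, params: dict) -> str:
    """Replace each ${name} placeholder in a node setting with its value.

    Hypothetical helper for illustration; not part of the Flowfile API.
    """
    return re.sub(r"\$\{(\w+)\}", lambda m: str(params[m.group(1)]), setting)

# A node setting templated per environment
rendered = substitute_params("s3://bucket/${env}/sales.parquet", {"env": "prod"})
```

Templating node settings this way lets one saved flow run against dev and prod paths without editing the flow file itself.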
Flowfile operates as three interconnected services:
- Designer (Electron + Vue): Visual interface for building data flows
- Core (FastAPI): ETL engine using Polars for data transformations (`:63578`)
- Worker (FastAPI): Handles computation and caching of data operations (`:63579`)
Each flow is represented as a directed acyclic graph (DAG), where nodes represent data operations and edges represent data flow between operations. You can export any visual flow as standalone Python/Polars code for production use.
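To make the DAG model concrete, here is a minimal sketch of ordering a flow's nodes for execution with the standard library's topological sorter. The node names and dependency structure are illustrative, not Flowfile internals:

```python
from graphlib import TopologicalSorter

# Map each node to the nodes it depends on (its upstream inputs).
# Node names are illustrative only.
dependencies = {
    "read_csv": set(),
    "filter": {"read_csv"},
    "group_by": {"filter"},
    "write_parquet": {"group_by"},
}

# A valid execution order runs every node only after all of its inputs
order = list(TopologicalSorter(dependencies).static_order())
```

Because the graph is acyclic, such an ordering always exists, which is what makes step-by-step execution (and export to a linear Polars script) possible.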
For a deeper dive, check out this article on our architecture.
- Add cloud storage support (S3, ADLS, GCS)
- Multi-flow execution support
- Polars code reverse engineering
- Generate Polars code from visual flows (via the "Generate code" button)
- Import existing Polars scripts and convert to visual flows
- Data catalog with Delta Lake storage
- Flow scheduling (interval and table-trigger based)
- Kafka / Redpanda ingestion
- Sandboxed Python execution (Docker-based kernels)
- Add comprehensive docstrings
- Create detailed node documentation
- Add architectural documentation
- Improve inline code comments
- Create user guides and tutorials
- Implement proper testing
- Add CI/CD pipeline
- Improve error handling
- Add monitoring and logging
Built with Polars, Vue.js, FastAPI, VueFlow, Delta Lake, and Electron.