Flowfile

Documentation: Website - Core - Worker - Frontend - Try Online - Technical Architecture

Flowfile is a visual ETL tool and Python library for building data pipelines. It offers a drag-and-drop canvas with 30+ node types (joins, fuzzy matching, filters, pivots, aggregations, and more) and connects to databases (PostgreSQL, MySQL, SQL Server, Oracle), cloud storage (S3, ADLS, GCS), and Kafka. Pipelines run on Polars and can be exported as standalone Python scripts. Flowfile also includes a data catalog backed by Delta Lake, a scheduler, sandboxed Python execution via Docker kernels, and a programmatic API with Polars-like syntax. It is available as a desktop app, web UI, Docker deployment, Python package, or browser-only WASM version.

Flowfile Interface

Example Use Cases

Data Cleaning & Transformation

Perform complex joins (fuzzy matching), text-to-rows transformations, and advanced filtering/grouping using a visual interface.
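To make the idea concrete, here is a minimal sketch of what a fuzzy-match join does under the hood, written with only the standard library's difflib; Flowfile's own matching implementation may use a different algorithm, and the function names here are illustrative:

```python
# Illustrative sketch of a fuzzy-match join (not Flowfile's implementation).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] describing how closely two strings match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_join(left, right, key, threshold=0.8):
    """Pair each left row with every right row whose key is 'close enough'."""
    matches = []
    for l in left:
        for r in right:
            score = similarity(l[key], r[key])
            if score >= threshold:
                matches.append({**l, **r, "match_score": round(score, 2)})
    return matches

customers = [{"name": "Acme Corp."}, {"name": "Globex"}]
orders = [{"name": "ACME Corp", "total": 120}, {"name": "Initech", "total": 75}]
print(fuzzy_join(customers, orders, "name"))
```

In the visual interface you configure the same thing declaratively: pick the join keys and a similarity threshold, and the node handles the matching.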

Flowfile Layout

Code Generation

Export your visual flows as standalone Python/Polars scripts. Deploy workflows without Flowfile dependencies or share ETL logic as readable code.

Automatically generated Polars code

Data Integration

Standardize data formats and handle messy Excel files efficiently.

Read Excel

Performance at Scale

Processes larger-than-memory datasets out of core, powered by Polars for fast data processing.

Flowfile Write Demo

YAML/JSON Export

Save flows as human-readable YAML or JSON files, making them portable and version-control friendly.
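Because the on-disk format is plain text, flows diff cleanly in pull requests. A rough sketch of what a flow file might contain (the node types and field names here are illustrative, not the exact Flowfile schema):

```yaml
name: clean_orders
nodes:
  - id: 1
    type: read_csv
    settings:
      path: orders.csv
  - id: 2
    type: filter
    settings:
      expression: value > 150
edges:
  - from: 1
    to: 2
```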


Getting Started

Flowfile is designed to be flexible. Choose the installation method that fits your workflow.

Prerequisites

  • Python 3.10+
  • Node.js 16+ (for frontend development)
  • Poetry (Python package manager)
  • Docker & Docker Compose (optional, for Docker setup)
  • Make (optional, for build automation)

1. Python Package (Recommended for Developers)

Install Flowfile directly from PyPI. This gives you both the visual UI and the programmatic flowfile_frame API.

pip install Flowfile

Launch the Visual UI: Start the web-based UI with a single command:

flowfile run ui

Use the FlowFrame API: Create pipelines programmatically using a Polars-like syntax:

import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a data pipeline
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Process the data
result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the graph in the web UI
open_graph_in_editor(result.flow_graph)

For more details, see the flowfile_frame documentation.

2. Docker (Self-Hosted)

Run the full suite (Frontend, Core, Worker) using Docker Compose. Ideal for server deployments or local isolation.

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
docker compose up -d

Access the app at http://localhost:8080.

3. Desktop Application

The desktop version offers the best experience for non-technical users with a native interface and integrated backend services.

Option A: Download Pre-built Application

Download the latest release from GitHub Releases and run the installer for your platform (Windows, macOS, or Linux).

Note: You may see security warnings since the app isn't signed with a developer certificate yet.

  • Windows: Click "More info" → "Run anyway"
  • macOS: If you see an "app is damaged" error, run this in Terminal:
    find /Applications/Flowfile.app -exec xattr -c {} \;
    Then open the app normally. This clears the quarantine flag that macOS sets on downloaded apps.

Option B: Build from Source

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile

# Build packaged executable
make    # Creates platform-specific executable

# Or manually:
poetry install
poetry run build_backends
cd flowfile_frontend
npm install
npm run build

4. Browser Version (Lite)

For a zero-setup experience, try the WASM version. It runs entirely in your browser using Pyodide (no server required).

Live Demo: demo.flowfile.org

This lite version includes 14 essential nodes for data transformation:

  • Input: Read CSV, Manual Input
  • Transformation:
    • Basic: Filter, Select, Sort, Unique, Take Sample
    • Reshape: Group By, Pivot, Unpivot, Join
  • Advanced: Polars Code (write custom Python/Polars logic)
  • Output: Preview (view in browser), Download (CSV or Parquet)

5. Manual Setup (Development)

For contributors who need hot-reloading and direct access to services.

git clone https://github.com/edwardvaneechoud/Flowfile.git
cd Flowfile
poetry install

# Start backend services
poetry run flowfile_worker  # Starts worker on :63579
poetry run flowfile_core    # Starts core on :63578

# Start frontend (in a new terminal)
cd flowfile_frontend
npm install && npm run dev:web  # Starts web interface on :8080

Visualizing and Sharing Pipelines

One of the most powerful features is the ability to visualize your data transformation pipelines:

  • Inspect Data Flow: See exactly how your data is transformed step by step
  • Debugging: Identify issues in your data pipeline visually
  • Documentation: Share your data transformation logic with teammates
  • Iteration: Modify your pipeline in the Designer UI and export it back to code

Recent Additions

  • Data Catalog — Register tables backed by Delta Lake with version history, time travel, and merge/upsert support. Organize tables into catalogs and schemas, and track lineage across flows.
  • Scheduler — Run registered flows on an interval or trigger them when specific catalog tables are updated. Configured directly from the catalog UI.
  • Kafka / Redpanda Source — A canvas node for reading from Kafka topics with automatic schema inference.
  • Python Kernels — The Python Script node runs user code in isolated Docker containers (kernel_runtime) with their own package environments, keeping the host process safe.
  • Cloud Storage — Read/write to S3, Azure Data Lake Storage, and Google Cloud Storage.
  • Flow Parameters — ${variable} substitution across node settings, configurable via the UI or CLI.
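The ${variable} substitution in the last item can be sketched with the standard library's string.Template; Flowfile's actual resolver may differ, and resolve_settings is a hypothetical helper name:

```python
# Minimal sketch of ${variable} substitution in node settings
# (illustrative only, not Flowfile's implementation).
from string import Template

def resolve_settings(settings: dict, params: dict) -> dict:
    """Substitute ${name} placeholders in every string-valued setting."""
    resolved = {}
    for key, value in settings.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched
            resolved[key] = Template(value).safe_substitute(params)
        else:
            resolved[key] = value
    return resolved

settings = {"path": "s3://bucket/${env}/orders.csv", "limit": 100}
print(resolve_settings(settings, {"env": "prod"}))
# {'path': 's3://bucket/prod/orders.csv', 'limit': 100}
```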

Technical Design

Flowfile operates as three interconnected services:

  • Designer (Electron + Vue): Visual interface for building data flows
  • Core (FastAPI): ETL engine using Polars for data transformations (:63578)
  • Worker (FastAPI): Handles computation and caching of data operations (:63579)

Each flow is represented as a directed acyclic graph (DAG), where nodes represent data operations and edges represent data flow between operations. You can export any visual flow as standalone Python/Polars code for production use.
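The DAG model above can be illustrated with a toy executor built on the standard library's graphlib: resolve the nodes in dependency order, then feed each node's output to its downstream neighbors. The node names and operations here are made up for illustration; Flowfile's real engine runs these steps as Polars transformations:

```python
# Toy illustration of executing a flow DAG in dependency order
# (not Flowfile's engine, which runs on Polars).
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its inputs).
dag = {
    "read": set(),
    "filter": {"read"},
    "aggregate": {"filter"},
    "write": {"aggregate"},
}

operations = {
    "read": lambda _: [1, 2, 3, 4],
    "filter": lambda rows: [r for r in rows if r > 2],
    "aggregate": lambda rows: [sum(rows)],
    "write": lambda rows: rows,
}

results = {}
for node in TopologicalSorter(dag).static_order():
    upstream = [results[dep] for dep in dag[node]]
    results[node] = operations[node](upstream[0] if upstream else None)

print(results["write"])  # [7]
```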

For a deeper dive, check out this article on our architecture.


TODO

Core Features

  • Add cloud storage support (S3, ADLS, GCS)
  • Multi-flow execution support
  • Polars code reverse engineering
    • Generate Polars code from visual flows (via the "Generate code" button)
    • Import existing Polars scripts and convert to visual flows
  • Data catalog with Delta Lake storage
  • Flow scheduling (interval and table-trigger based)
  • Kafka / Redpanda ingestion
  • Sandboxed Python execution (Docker-based kernels)

Documentation

  • Add comprehensive docstrings
  • Create detailed node documentation
  • Add architectural documentation
  • Improve inline code comments
  • Create user guides and tutorials

Infrastructure

  • Implement proper testing
  • Add CI/CD pipeline
  • Improve error handling
  • Add monitoring and logging

License

MIT License


Acknowledgments

Built with Polars, Vue.js, FastAPI, VueFlow, Delta Lake, and Electron.
