Zweer/FlowRAG

FlowRAG 🌊

TypeScript RAG library with knowledge graph support.

Why FlowRAG?

FlowRAG solves common problems with existing RAG solutions:

🐍 Python Complexity: No Python environments, virtual envs, or dependency conflicts. Pure TypeScript.

🖥️ Always-On Servers: Works as a library, not a service. Import, use, done.

☁️ Serverless Unfriendly: Optimized for Lambda with fast cold starts and stateless queries.

📁 Storage Lock-in: File-based storage that's Git-friendly. Commit your knowledge base.

🔗 Missing Knowledge Graphs: Combines vector search with entity relationships for richer context.

🔧 Complex Setup: npm install and 10 lines of code to get started.

Installation

npm install @flowrag/core @flowrag/pipeline @flowrag/storage-json @flowrag/storage-sqlite @flowrag/storage-lancedb
npm install @flowrag/provider-local @flowrag/provider-gemini

Or for a complete local setup:

npm install @flowrag/pipeline @flowrag/presets

For AWS cloud deployment:

npm install @flowrag/provider-bedrock @flowrag/storage-s3 @flowrag/storage-opensearch

Quick Start

import { defineSchema } from '@flowrag/core';
import { createFlowRAG } from '@flowrag/pipeline';
import { createLocalStorage } from '@flowrag/presets';

// Define your schema
const schema = defineSchema({
  entityTypes: ['SERVICE', 'DATABASE', 'PROTOCOL'],
  relationTypes: ['USES', 'PRODUCES', 'CONSUMES'],
});

// Create RAG instance
const rag = createFlowRAG({
  schema,
  ...createLocalStorage('./data'),
});

// Index documents
await rag.index('./content');

// Search
const results = await rag.search('how does authentication work');

Features

Schema-Flexible

Define your own entity and relation types with optional custom fields:

const schema = defineSchema({
  entityTypes: ['SERVICE', 'PROTOCOL', 'TEAM'],
  relationTypes: ['PRODUCES', 'CONSUMES', 'OWNS'],
  
  // Optional custom fields for richer metadata
  entityFields: {
    status: { type: 'enum', values: ['active', 'deprecated'], default: 'active' },
    owner: { type: 'string' },
  },
  relationFields: {
    syncType: { type: 'enum', values: ['sync', 'async'] },
  },
});

// schema.isValidEntityType('SERVICE') → true
// schema.normalizeEntityType('UNKNOWN') → 'Other'
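
The two commented calls above suggest a validate-or-fall-back rule for entity types. A minimal sketch of that rule, assuming exact string matching and an 'Other' fallback; `makeTypeGuard` is a hypothetical helper written for illustration and is not the @flowrag/core implementation:

```typescript
// Hypothetical helper: shows validate-or-fallback logic only.
function makeTypeGuard(entityTypes: readonly string[], fallback = 'Other') {
  const known = new Set(entityTypes);
  return {
    // True only for types declared in the schema.
    isValidEntityType: (type: string): boolean => known.has(type),
    // Declared types pass through; anything else maps to the fallback.
    normalizeEntityType: (type: string): string => (known.has(type) ? type : fallback),
  };
}

const guard = makeTypeGuard(['SERVICE', 'PROTOCOL', 'TEAM']);
console.log(guard.isValidEntityType('SERVICE'));   // true
console.log(guard.normalizeEntityType('UNKNOWN')); // Other
```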

Graph-First

Trace data flows through your system:

// Where does this data come from?
const sources = await rag.traceDataFlow('dashboard-metric', 'upstream');

// Where does this data go?
const consumers = await rag.traceDataFlow('user-event', 'downstream');
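
Conceptually, upstream and downstream tracing is a graph walk in opposite directions. The sketch below is a plain breadth-first traversal over an in-memory edge list; the `FlowEdge` type and `trace` function are hypothetical names, and FlowRAG's own storage and traversal may work differently:

```typescript
// Hypothetical model: data flows from `from` to `to`.
type FlowEdge = { from: string; to: string };
type Direction = 'upstream' | 'downstream';

// Breadth-first walk; 'downstream' follows edges forward, 'upstream' backward.
function trace(edges: FlowEdge[], start: string, direction: Direction): string[] {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  const order: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of edges) {
      const near = direction === 'downstream' ? edge.from : edge.to;
      const far = direction === 'downstream' ? edge.to : edge.from;
      if (near === current && !visited.has(far)) {
        visited.add(far); // the visited set also guards against cycles
        order.push(far);
        queue.push(far);
      }
    }
  }
  return order;
}

const edges: FlowEdge[] = [
  { from: 'ingest-service', to: 'user-event' },
  { from: 'user-event', to: 'analytics-service' },
  { from: 'analytics-service', to: 'dashboard-metric' },
];

const upstream = trace(edges, 'dashboard-metric', 'upstream');
const downstream = trace(edges, 'user-event', 'downstream');
console.log(upstream);   // analytics-service, user-event, ingest-service
console.log(downstream); // analytics-service, dashboard-metric
```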

Dual Retrieval

Combines vector search with graph traversal:

  1. Vector search: Find semantically similar chunks
  2. Graph expansion: Follow entity relationships
  3. Merge & dedupe: Combine results
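
The three steps above can be sketched with in-memory data. This is an illustration only: the `Chunk` shape, the one-hop expansion, and the `dualRetrieve` function are assumptions, not FlowRAG's internal pipeline.

```typescript
// Hypothetical shapes for illustration.
type Chunk = { id: string; vector: number[]; entities: string[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function dualRetrieve(
  query: number[],
  chunks: Chunk[],
  neighbors: Record<string, string[]>, // entity name -> related entity names
  topK: number,
): string[] {
  // 1. Vector search: rank chunks by cosine similarity to the query vector.
  const seeds = [...chunks]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, topK);

  // 2. Graph expansion: collect entities one hop away from the seeds' entities.
  const related = new Set<string>();
  for (const chunk of seeds) {
    for (const entity of chunk.entities) {
      for (const n of neighbors[entity] ?? []) related.add(n);
    }
  }
  const expanded = chunks.filter((c) => c.entities.some((e) => related.has(e)));

  // 3. Merge & dedupe: seeds first, then graph hits not already present.
  return [...new Set([...seeds, ...expanded].map((c) => c.id))];
}

const chunks: Chunk[] = [
  { id: 'c1', vector: [1, 0], entities: ['Auth Service'] },
  { id: 'c2', vector: [0, 1], entities: ['User DB'] },
  { id: 'c3', vector: [-1, 0], entities: ['Billing Service'] },
];
const hits = dualRetrieve([1, 0], chunks, { 'Auth Service': ['User DB'] }, 1);
console.log(hits); // c1 from vector search, c2 via the graph
```

Here `c2` is not similar to the query vector, but it is returned because its entity is related to one found in the top vector hit.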

Entity Search

Search entities semantically by description, not just exact name:

const results = await rag.searchEntities('the service that handles login');
// [{ entity: { name: 'Auth Service', type: 'SERVICE', ... }, score: 0.92 }]

Reranker (Optional)

Improve result quality with a post-retrieval reranking step:

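import { createFlowRAG } from '@flowrag/pipeline';
import { createLocalStorage } from '@flowrag/presets';
import { LocalReranker } from '@flowrag/provider-local';

// `schema` is defined as in Quick Start
const rag = createFlowRAG({
  schema,
  ...createLocalStorage('./data'),
  reranker: new LocalReranker(), // Cross-encoder ONNX, fully offline
});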

Three implementations available:

  • LocalReranker: cross-encoder via ONNX (Xenova/ms-marco-MiniLM-L-6-v2), no API needed
  • GeminiReranker: LLM-based relevance scoring
  • BedrockReranker: Amazon Rerank API (amazon.rerank-v1:0)

Incremental Indexing

Only re-process changed documents. Content is hashed (SHA-256) and compared on re-index:

await rag.index('./content');                  // Skips unchanged docs
await rag.index('./content', { force: true }); // Re-index everything
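
The hash-and-compare idea can be sketched with Node's built-in crypto module. The `HashIndex` map and `selectChanged` function are hypothetical; how FlowRAG actually stores and compares hashes is not shown in this README.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical bookkeeping: document id -> SHA-256 of the last indexed content.
type HashIndex = Map<string, string>;

function sha256(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Returns the ids that need (re)processing and records their new hashes.
function selectChanged(docs: Record<string, string>, seen: HashIndex, force = false): string[] {
  const changed: string[] = [];
  for (const [id, content] of Object.entries(docs)) {
    const hash = sha256(content);
    if (force || seen.get(id) !== hash) {
      changed.push(id);
      seen.set(id, hash);
    }
  }
  return changed;
}

const seen: HashIndex = new Map();
const first = selectChanged({ 'doc:readme': 'hello', 'doc:guide': 'world' }, seen);
const second = selectChanged({ 'doc:readme': 'hello', 'doc:guide': 'world!' }, seen);
const forced = selectChanged({ 'doc:readme': 'hello' }, seen, true);
console.log(first);  // both documents: nothing was indexed before
console.log(second); // only doc:guide: its content changed
console.log(forced); // doc:readme again: force bypasses the comparison
```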

Document Deletion

Delete a document and automatically clean up orphaned entities and relations:

await rag.deleteDocument('doc:readme');
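
One way to picture orphan cleanup: if each entity records which documents mentioned it, an entity becomes an orphan once its last source document is deleted, and any relation touching it goes too. The sketch below assumes that provenance model; the types and the `removeDocument` function are hypothetical and may not match FlowRAG's bookkeeping.

```typescript
// Hypothetical model: each entity remembers which documents mentioned it.
type Entity = { name: string; documentIds: string[] };
type Relation = { source: string; target: string };
type Graph = { entities: Entity[]; relations: Relation[] };

function removeDocument(graph: Graph, documentId: string): Graph {
  const entities = graph.entities
    // Drop the document from every entity's provenance list.
    .map((e) => ({ ...e, documentIds: e.documentIds.filter((d) => d !== documentId) }))
    // An entity with no remaining source documents is an orphan.
    .filter((e) => e.documentIds.length > 0);

  // Keep only relations whose endpoints both survived.
  const alive = new Set(entities.map((e) => e.name));
  const relations = graph.relations.filter((r) => alive.has(r.source) && alive.has(r.target));
  return { entities, relations };
}

const before: Graph = {
  entities: [
    { name: 'Auth Service', documentIds: ['doc:readme', 'doc:guide'] },
    { name: 'OCPP 1.6', documentIds: ['doc:readme'] },
  ],
  relations: [{ source: 'Auth Service', target: 'OCPP 1.6' }],
};
const after = removeDocument(before, 'doc:readme');
console.log(after.entities.map((e) => e.name)); // only Auth Service remains
console.log(after.relations.length);            // 0
```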

Document Parsers

Pluggable file parsing for non-text documents (PDF, DOCX, images, etc.):

const rag = createFlowRAG({
  schema,
  ...createLocalStorage('./data'),
  parsers: [new PDFParser(), new DocxParser()],
});

Citation / Source Attribution

Search results include source references for traceability:

const results = await rag.search('how does auth work');
// Each result includes: sources: [{ documentId, filePath, chunkIndex }]
// Plus document metadata fields in result.metadata (e.g., author, domain, version)

Entity Merging

Merge duplicate entities extracted by the LLM:

await rag.mergeEntities({
  sources: ['Auth Service', 'AuthService', 'auth-service'],
  target: 'Auth Service',
});
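
Merging aliases mostly comes down to rewriting relation endpoints to the canonical name and dropping the duplicates that result. A minimal sketch under that assumption; `mergeRelations` and the `Relation` shape are hypothetical and say nothing about how FlowRAG merges descriptions or embeddings:

```typescript
type Relation = { source: string; target: string; type: string };

// Rewrites endpoints from any alias to the canonical name, then dedupes.
function mergeRelations(relations: Relation[], sources: string[], target: string): Relation[] {
  const aliases = new Set(sources);
  const canonical = (name: string): string => (aliases.has(name) ? target : name);
  const seen = new Set<string>();
  const merged: Relation[] = [];
  for (const rel of relations) {
    const next = { ...rel, source: canonical(rel.source), target: canonical(rel.target) };
    const key = JSON.stringify([next.source, next.type, next.target]);
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(next);
    }
  }
  return merged;
}

const merged = mergeRelations(
  [
    { source: 'AuthService', target: 'User DB', type: 'USES' },
    { source: 'auth-service', target: 'User DB', type: 'USES' },
    { source: 'Gateway', target: 'Auth Service', type: 'USES' },
  ],
  ['Auth Service', 'AuthService', 'auth-service'],
  'Auth Service',
);
console.log(merged.length); // 2: the two aliased relations collapse into one
```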

Observability Hooks

Extension points for tracing, monitoring, and token tracking:

const rag = createFlowRAG({
  // ...
  observability: {
    onLLMCall: ({ model, duration, usage }) => console.log(model, usage),
    onEmbedding: ({ model, textsCount, duration }) => console.log(model, textsCount),
    onSearch: ({ query, mode, resultsCount, duration }) => console.log(query, duration),
  },
});

Export

Export the knowledge graph in multiple formats:

await rag.export('json'); // Entities + relations as JSON
await rag.export('csv');  // Relation table
await rag.export('dot');  // Graphviz digraph
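
For readers unfamiliar with the DOT format, the sketch below turns a relation list into a Graphviz digraph. `toDot` is a hypothetical function; the exact output of `rag.export('dot')` may differ in graph name, attributes, and ordering.

```typescript
type Relation = { source: string; target: string; type: string };

// Escapes backslashes and double quotes so names are valid DOT quoted strings.
const quote = (s: string): string => `"${s.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;

function toDot(relations: Relation[]): string {
  const lines = relations.map(
    (r) => `  ${quote(r.source)} -> ${quote(r.target)} [label=${quote(r.type)}];`,
  );
  return ['digraph G {', ...lines, '}'].join('\n');
}

const dot = toDot([{ source: 'Auth Service', target: 'User DB', type: 'USES' }]);
console.log(dot);
// digraph G {
//   "Auth Service" -> "User DB" [label="USES"];
// }
```

The resulting text can be rendered with the Graphviz `dot` command, for example `dot -Tsvg graph.dot -o graph.svg`.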

Extraction Gleaning

Multi-pass entity extraction for higher accuracy:

const rag = createFlowRAG({
  // ...
  options: { indexing: { extractionGleanings: 2 } },
});
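
Gleaning generally means asking the extractor again for anything it missed. The sketch below assumes `extractionGleanings: 2` means one initial pass plus up to two follow-up passes; that reading, the `ExtractFn` signature, and the stub extractor are all assumptions for illustration.

```typescript
// Hypothetical extractor signature: receives the text and the names found so far.
type ExtractFn = (text: string, known: string[]) => Promise<string[]>;

// One initial pass plus up to `gleanings` follow-ups; stops early when a pass adds nothing.
async function extractWithGleaning(
  text: string,
  extract: ExtractFn,
  gleanings: number,
): Promise<string[]> {
  const found = new Set<string>();
  for (let pass = 0; pass <= gleanings; pass++) {
    const fresh = (await extract(text, [...found])).filter((name) => !found.has(name));
    if (fresh.length === 0) break;
    for (const name of fresh) found.add(name);
  }
  return [...found];
}

// Stub standing in for an LLM call: it "notices" one new entity per pass.
const stub: ExtractFn = async (_text, known) =>
  ['Auth Service', 'User DB', 'OCPP 1.6'].filter((n) => !known.includes(n)).slice(0, 1);

extractWithGleaning('some chunk text', stub, 2).then((names) => console.log(names.length)); // 3
```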

Evaluation

Pluggable RAG quality evaluation:

const rag = createFlowRAG({
  // ...
  evaluator: myEvaluator, // implements Evaluator interface
});

const result = await rag.evaluate('query', { reference: 'expected answer' });
// result.scores: { precision: 0.85, recall: 0.72, faithfulness: 0.91 }
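
The Evaluator interface itself is not shown in this README. As a rough idea of what a scorer might compute, here is a token-overlap precision and recall between an answer and a reference; `overlapScores` is a hypothetical function and a much cruder metric than an LLM-based evaluator would use.

```typescript
// Lowercased word set; punctuation and whitespace act as separators.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// precision: share of answer tokens found in the reference.
// recall: share of reference tokens found in the answer.
function overlapScores(answer: string, reference: string): { precision: number; recall: number } {
  const a = tokenize(answer);
  const r = tokenize(reference);
  const shared = [...a].filter((t) => r.has(t)).length;
  return {
    precision: a.size === 0 ? 0 : shared / a.size,
    recall: r.size === 0 ? 0 : shared / r.size,
  };
}

const scores = overlapScores(
  'tokens are issued by the auth service',
  'the auth service issues tokens',
);
console.log(scores); // precision 4/7, recall 4/5
```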

CLI

Full-featured command-line interface for local usage:

# Initialize data directory
flowrag init

# Index documents (with optional interactive entity review)
flowrag index ./content
flowrag index ./content --force          # Re-index all documents
flowrag index ./content --interactive    # Review extracted entities

# Search
flowrag search "how does OCPP work"
flowrag search "OCPP" --type entities    # Search entities
flowrag search "ServiceA" --type relations  # Show entity relations
flowrag search "query" --mode local --limit 20

# Knowledge graph
flowrag graph stats                      # Entity/relation breakdown
flowrag graph export                     # Export as DOT format

# Statistics
flowrag stats

Human-in-the-Loop

Interactive entity review during indexing with --interactive:

📄 Chunk chunk:abc123 - doc:readme

? Entities - select to keep:
  ◉ [SERVICE]  becky-ocpp16 - "Backend OCPP 1.6..."
  ◉ [PROTOCOL] OCPP 1.6 - "Open Charge Point Protocol..."
  ◯ [OTHER]    WebSocket - "Communication protocol..."

? What next?
  → Continue to relations
    ✏️  Edit an entity
    ➕ Add new entity
    📄 Show chunk content

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         FlowRAG                             │
├─────────────────────────────────────────────────────────────┤
│  Schema Definition  │  Pipeline  │  Graph Traversal         │
├─────────────────────┴────────────┴──────────────────────────┤
│                      STORAGE LAYER                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                   │
│  │    KV    │  │  Vector  │  │  Graph   │                   │
│  │ JSON/S3  │  │SQLite/   │  │SQLite/OS │                   │
│  │  Redis   │  │Lance/OS  │  │          │                   │
│  └──────────┘  └──────────┘  └──────────┘                   │
├─────────────────────────────────────────────────────────────┤
│                      PROVIDERS                              │
│  Embedder: Local ONNX │ Gemini │ Bedrock                    │
│  Extractor: Gemini │ Bedrock                                │
│  Reranker: Local ONNX │ Gemini │ Bedrock                    │
└─────────────────────────────────────────────────────────────┘

Packages

| Package | Description | Status |
| --- | --- | --- |
| @flowrag/core | Interfaces, schema, types | ✅ Complete |
| @flowrag/pipeline | Indexing & querying pipelines | ✅ Complete |
| @flowrag/storage-json | JSON file KV storage | ✅ Complete |
| @flowrag/storage-sqlite | SQLite graph & vector storage | ✅ Complete |
| @flowrag/storage-lancedb | LanceDB vector storage | ✅ Complete |
| @flowrag/storage-s3 | S3 KV storage | ✅ Complete |
| @flowrag/storage-opensearch | OpenSearch vector & graph storage | ✅ Complete |
| @flowrag/provider-local | Local AI provider (ONNX embeddings) | ✅ Complete |
| @flowrag/provider-gemini | Gemini AI provider (embeddings + extraction) | ✅ Complete |
| @flowrag/provider-bedrock | AWS Bedrock provider (embeddings + extraction) | ✅ Complete |
| @flowrag/provider-openai | OpenAI provider (embeddings + extraction) | ✅ Complete |
| @flowrag/provider-anthropic | Anthropic provider (extraction only) | ✅ Complete |
| @flowrag/storage-redis | Redis KV + vector storage | ✅ Complete |
| @flowrag/presets | Opinionated presets | ✅ Complete |
| @flowrag/cli | Command-line interface | ✅ Complete |
| @flowrag/mcp | MCP server for AI assistants | ✅ Complete |

Development Status

  • ✅ Complete: Fully implemented with 100% test coverage
  • 🚧 In Progress: Currently being developed
  • 📋 Planned: Scheduled for future development

Use Cases

Local Development

flowrag index ./content    # Index your docs
flowrag search "query"     # Search locally
# DB files committed to Git ✓

AWS Lambda

import { defineSchema } from '@flowrag/core';
import { createFlowRAG } from '@flowrag/pipeline';
import { BedrockEmbedder, BedrockExtractor } from '@flowrag/provider-bedrock';
import { S3KVStorage } from '@flowrag/storage-s3';
import { OpenSearchVectorStorage, OpenSearchGraphStorage } from '@flowrag/storage-opensearch';

// Same schema as in Quick Start
const schema = defineSchema({
  entityTypes: ['SERVICE', 'DATABASE', 'PROTOCOL'],
  relationTypes: ['USES', 'PRODUCES', 'CONSUMES'],
});

// s3Client and osClient are pre-configured S3 and OpenSearch clients.
// Their construction is not shown here; create them once at module scope.

export const handler = async (event: { query: string }) => {
  const rag = createFlowRAG({
    schema,
    storage: {
      kv: new S3KVStorage({ client: s3Client, bucket: 'my-rag-bucket', prefix: 'kv/' }),
      vector: new OpenSearchVectorStorage({ client: osClient, dimensions: 1024 }),
      graph: new OpenSearchGraphStorage({ client: osClient }),
    },
    embedder: new BedrockEmbedder(),
    extractor: new BedrockExtractor(),
  });

  return await rag.search(event.query);
};

Tech Stack

| Purpose | Tool |
| --- | --- |
| Runtime | Node.js >=20 |
| Language | TypeScript (strict, isolatedDeclarations) |
| Build | tsdown (Rolldown-based) |
| Test | Vitest |
| Lint/Format | Biome |
| Schema | Zod |

Development

npm install        # Install dependencies
npm run build      # Build all packages
npm test           # Run all tests
npm run test:e2e   # Run end-to-end tests
npm run lint       # Lint code
npm run typecheck  # Type check

Documentation

The docs site is built with VitePress and includes guides, API reference, provider docs, deployment patterns, and blog posts:

npm run docs:dev   # Local dev server
npm run docs:build # Build for production

AI-friendly llms.txt and llms-full.txt are auto-generated and served from the docs site.

Release

Releases are managed by bonvoy with independent versioning per package. CI runs on every push to main: tests (Node 20/22/24), e2e, lint, then auto-release and docs deploy.

Comparison

FlowRAG vs LightRAG

| Aspect | LightRAG | FlowRAG |
| --- | --- | --- |
| Language | Python | TypeScript |
| Model | Server (always running) | Library (import and use) |
| Indexing | Continuous, real-time | Batch, scheduled |
| Deploy | Container/server | Lambda-friendly |
| Storage | External DBs (Neo4j, Postgres) | File-based (Git-friendly) |
| Complexity | Feature-rich, many deps | Minimal, focused |

License

MIT


Inspired by LightRAG, built for TypeScript developers.

About

🌊 TypeScript RAG library with knowledge graph support: batch indexing, semantic search, entity extraction, and graph traversal. Lambda-friendly, Git-friendly, zero servers.
