# Autoflow

Autoflow is a RAG framework that supports:

- Vector Search Based RAG
- Knowledge Graph Based RAG (aka GraphRAG)
- Knowledge Base and Document Management

## Installation

In [None]:
%pip install autoflow-ai==0.0.1.dev31
%pip install autoflow-ai[experiment]==0.0.1.dev31
%pip install ipywidgets

## Prerequisites

- Go to [tidbcloud.com](https://tidbcloud.com/) or use [tiup playground](https://docs.pingcap.com/tidb/stable/tiup-playground/) to create a free TiDB database cluster.
- Go to the [OpenAI platform](https://platform.openai.com/api-keys) to create your API key.

#### For Jupyter Notebook

Configuration can be provided through environment variables or a `.env` file:

In [None]:
# Create a .env file from the template, then edit it. For example:
# $ cat .env
# TIDB_HOST=localhost
# TIDB_PORT=4000
# TIDB_USERNAME=root
# TIDB_PASSWORD=
# TIDB_DATABASE=test
# OPENAI_API_KEY='your_openai_api_key'
%cp .env.example .env

In [3]:
import os
import dotenv

dotenv.load_dotenv()

True
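Before connecting, it can help to confirm that the settings from the `.env` example are actually present. The following is a minimal sketch using only the standard library; `REQUIRED_KEYS` and `missing_settings` are hypothetical names introduced for illustration and are not part of Autoflow.

```python
import os

# Keys taken from the .env example above. TIDB_PASSWORD is left out because
# the example leaves it empty for a local playground cluster.
REQUIRED_KEYS = ("TIDB_HOST", "TIDB_PORT", "TIDB_USERNAME", "OPENAI_API_KEY")


def missing_settings(env=None):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.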

#### For Google Colab

In [None]:
import os

from google.colab import userdata

os.environ["TIDB_HOST"] = userdata.get("TIDB_HOST")
os.environ["TIDB_PORT"] = userdata.get("TIDB_PORT")
os.environ["TIDB_USERNAME"] = userdata.get("TIDB_USERNAME")
os.environ["TIDB_PASSWORD"] = userdata.get("TIDB_PASSWORD")
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

## Quickstart

### Init Autoflow

In [4]:
import os
from autoflow import Autoflow

af = Autoflow.from_config(
    db_host=os.getenv("TIDB_HOST"),
    db_port=int(os.getenv("TIDB_PORT", "4000")),  # 4000 is TiDB's default port
    db_username=os.getenv("TIDB_USERNAME"),
    db_password=os.getenv("TIDB_PASSWORD"),
    db_name=os.getenv("TIDB_DATABASE"),
)

### Create knowledge base

In [5]:
from uuid import UUID
from autoflow.schema import IndexMethod
from autoflow.llms.chat_models import ChatModel
from autoflow.llms.embeddings import EmbeddingModel

chat_model = ChatModel("gpt-4o-mini")
embed_model = EmbeddingModel(model_name="text-embedding-3-small", dimensions=1536)

kb = af.create_knowledge_base(
    id=UUID("655b6cf3-8b30-4839-ba8b-5ed3c502f30e"),
    name="New KB",
    description="This is a knowledge base for testing",
    index_methods=[IndexMethod.VECTOR_SEARCH, IndexMethod.KNOWLEDGE_GRAPH],
    chat_model=chat_model,
    embedding_model=embed_model,
)
kb.model_dump()

{'id': UUID('655b6cf3-8b30-4839-ba8b-5ed3c502f30e'),
 'name': 'New KB',
 'index_methods': [<IndexMethod.VECTOR_SEARCH: 'VECTOR_SEARCH'>,
  <IndexMethod.KNOWLEDGE_GRAPH: 'KNOWLEDGE_GRAPH'>],
 'description': 'This is a knowledge base for testing',
 'chunking_config': {'mode': <ChunkingMode.GENERAL: 'general'>,
  'chunk_size': 1200,
  'chunk_overlap': 200,
  'paragraph_separator': '\n\n\n'},
 'data_sources': [],
 'class_name': 'base_component'}
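The `chunking_config` in the output above reports a `chunk_size` of 1200 and a `chunk_overlap` of 200. To illustrate how these two settings interact, here is a minimal sketch of a sliding-window splitter. It is not Autoflow's chunker: `split_with_overlap` is a hypothetical helper, it measures sizes in characters as an assumption, and it ignores `paragraph_separator`.

```python
def split_with_overlap(text, chunk_size=1200, chunk_overlap=200):
    """Split `text` into windows of `chunk_size` characters, where each
    window repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With the defaults, each window advances by 1000 characters, so neighbouring chunks share 200 characters of context.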

### Import documents from files

In [6]:
# Notebooks do not define __file__, so use the working directory as the base path.
current_dir = os.getcwd()
current_dir

'/Users/liangzhiyuan/Projects/autoflow.ai/core/docs'

In [7]:
from pathlib import Path

kb.import_documents_from_files(
    files=[
        Path(current_dir) / "fixtures" / "tidb-overview.md",
    ]
)

[]

### Search Documents

In [8]:
result = kb.search_documents(
    query="What is TiDB?",
    similarity_top_k=2,
)
[(c.score, c.chunk.text) for c in result.chunks]

[(0.7382171054172685,
  'What is TiDB Self-Managed Key features\n<!-- Localization note for TiDB:\n- English: use distributed SQL, and start to emphasize HTAP\n- Chinese: can keep "NewSQL" and emphasize one-stop real-time HTAP ("一栈式实时 HTAP")\n- Japanese: use NewSQL because it is well-recognized\n-->\nTiDB (/\'taɪdiːbi:/, "Ti" stands for Titanium) is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. The goal of TiDB is to provide users with a one-stop database solution that covers OLTP (Online Transactional Processing), OLAP (Online Analytical Processing), and HTAP services. TiDB is suitable for various use cases that require high availability and strong consistency with large-scale data.\nTiDB Self-Managed is a product option of TiDB, where users or organizations can deploy and manage TiDB on their own infrastructure
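Each result above pairs a score with a chunk, and `similarity_top_k` limits how many chunks come back. The sketch below shows one common way such a ranking can be computed, assuming cosine similarity over embedding vectors. The source does not state which metric Autoflow uses, and `cosine_similarity`, `top_k`, and the toy vectors are hypothetical names and data for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (score, chunk_id) pairs for the k most similar chunks."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
    chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for score, chunk_id in top_k([1.0, 0.0], chunks, k=2):
        print(chunk_id, round(score, 4))
```

In practice the vector database performs this ranking; the pure-Python version only makes the role of the top-k cutoff visible.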

### Search Knowledge Graph

In [10]:
kg = kb.search_knowledge_graph(
    query="What is TiDB?",
)
[r.rag_description for r in kg.relationships]

['TiDB -> TiDB can be deployed in a Self-Managed model, providing users with control over their database setup. -> Self-Managed',
 'TiDB -> TiDB Self-Managed is a deployment option of TiDB that allows users to manage the database on their own infrastructure. -> TiDB Self-Managed',
 'TiDB -> TiDB utilizes TiKV as its row-based storage engine to support real-time data replication. -> TiKV',
 'TiDB -> TiDB employs TiFlash as its columnar storage engine to ensure consistent data storage and real-time replication from TiKV. -> TiFlash',
 'TiDB -> TiDB Operator facilitates the management of TiDB on Kubernetes, automating operational tasks. -> TiDB Operator',
 'TiDB -> TiDB Cloud is the fully-managed service that allows users to deploy and run TiDB clusters in the cloud. -> TiDB Cloud',
 'TiDB -> TiDB can be deployed in a Self-Managed model, providing users with control over their database setup. -> Self-Managed',
 'TiDB -> TiDB Self-Managed is a deployment option of TiDB that allows users to