
Simple RAG (Retrieval-Augmented Generation) — Python

A minimal, production-ready RAG service built with FastAPI. It embeds queries, retrieves relevant context from a vector store, and generates answers using an LLM. The project uses clean service boundaries and FastAPI dependency injection for testability and performance.

What it does

  • Accepts a question via an HTTP API.
  • Creates an embedding for the question.
  • Retrieves top-k relevant documents from a vector database.
  • Calls an LLM to generate a grounded answer using those documents.
  • Returns the answer, sources, and processing time (see the example exchange below).
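
A minimal example exchange, assuming the service runs locally on port 8000 and that QueryRequest carries a question field (the field name is an assumption; only the endpoint path and the response fields are confirmed elsewhere in this README):

# Hypothetical client call; requires `pip install requests`.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/query",
    json={"question": "What is retrieval-augmented generation?"},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])           # grounded answer text
print(body["sources"])          # documents used as context
print(body["processing_time"])  # server-side latency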

Project structure

simple-rag-python/
  api/                # FastAPI routes (health, query)
  config/             # Settings and environment configuration
  models/             # Pydantic request/response models (sketched below)
  server/             # FastAPI app setup and middleware
  services/           # RAGService, embedder, vector store, LLM adapters
  main.py             # Entry point
  README.md           # This file
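
A sketch of what the schemas in models/data_models.py might look like, reconstructed from the fields this README names (answer, sources, processing_time); the remaining field names are assumptions:

from typing import List

from pydantic import BaseModel


class Document(BaseModel):
    # One retrieved context chunk; id/text/score are assumed field names.
    id: str
    text: str
    score: float


class QueryRequest(BaseModel):
    # Incoming question; top_k is an assumed optional knob.
    question: str
    top_k: int = 5


class QueryResponse(BaseModel):
    # Fields confirmed by this README.
    answer: str
    sources: List[Document]
    processing_time: float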

Key components

  • api/routes.py: HTTP endpoints; resolves the service through FastAPI DI via Depends(get_rag_service) (see the route sketch after this list).
  • services/pincecone_service.py: RAGService orchestrates embed → retrieve → generate.
  • services/embedder.py: Creates embeddings.
  • services/vector_store.py: Vector search (e.g., Pinecone).
  • services/llm_service.py: Generates a response from the LLM.
  • server/server.py: App factory, CORS, request timing middleware.
  • config/settings.py: Centralized configuration via environment variables.
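
A minimal sketch of the route wiring in api/routes.py; only Depends(get_rag_service) and the /api/v1 paths are confirmed by this README, so the handler names and import locations are assumptions:

from fastapi import APIRouter, Depends

from models.data_models import QueryRequest, QueryResponse
from services.pincecone_service import RAGService, get_rag_service

router = APIRouter(prefix="/api/v1")


@router.get("/health")
async def health() -> dict:
    # Liveness probe.
    return {"status": "ok"}


@router.post("/query", response_model=QueryResponse)
async def query(
    request: QueryRequest,
    rag: RAGService = Depends(get_rag_service),  # injected via cached provider
) -> QueryResponse:
    # FastAPI has already validated the request body; delegate to the service.
    return await rag.query(request)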

Prerequisites

  • Python 3.12+
  • A package installer: uv (recommended) or pip
  • Vector DB and LLM credentials (see Environment variables)

Architecture overview

Layers

  • API: api/routes.py (FastAPI routes, validation, DI)
  • Service: services/pincecone_service.py (RAGService orchestration)
  • Integrations: services/embedder.py, services/vector_store.py, services/llm_service.py
  • Platform: server/server.py (app factory, CORS, timing), config/settings.py (env), models/data_models.py (schemas)

Request flow

  1. Client → POST /api/v1/query
  2. FastAPI validates QueryRequest, injects RAGService
  3. RAGService → Embedder → VectorStore → LLM
  4. Return QueryResponse with answer, sources, processing_time (see the sketch below)
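
A sketch of that flow inside RAGService (step 3 above); the method names on the embedder, vector store, and LLM adapters are assumptions:

import time

from models.data_models import QueryRequest, QueryResponse


class RAGService:
    def __init__(self, embedder, vector_store, llm):
        self.embedder = embedder
        self.vector_store = vector_store
        self.llm = llm

    async def query(self, request: QueryRequest) -> QueryResponse:
        start = time.perf_counter()
        # 1) Embed the question.
        vector = await self.embedder.embed(request.question)
        # 2) Retrieve the top-k most relevant documents.
        docs = await self.vector_store.search(vector, top_k=request.top_k)
        # 3) Generate an answer grounded in the retrieved context.
        answer = await self.llm.generate(question=request.question, context=docs)
        # 4) Return answer, sources, and processing time.
        return QueryResponse(
            answer=answer,
            sources=docs,
            processing_time=time.perf_counter() - start,
        )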

ASCII diagram

Client
  |
  v
API (FastAPI) --> RAGService --> Embedder
                         |           |
                         v           v
                   VectorStore -->  LLM

Notes

  • DI uses a cached provider, avoiding global singletons (see the sketch after these notes)
  • Clear service boundaries make providers swappable (real or mock implementations)
  • Request latency is exposed via the X-Process-Time-ms response header
  • Structured logs via structlog in the service layer
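
Two sketches for the notes above: a cached DI provider (one RAGService per process, no module-level singleton) and the middleware that sets X-Process-Time-ms. The settings accessor, constructor arguments, and middleware registration are assumptions; the class names come from the system graph below:

import time
from functools import lru_cache

from fastapi import FastAPI, Request

from config.settings import get_settings  # hypothetical accessor name
from services.embedder import OpenAIEmbedder
from services.llm_service import OpenAILLM
from services.pincecone_service import RAGService
from services.vector_store import PineconeVectorStore


@lru_cache
def get_rag_service() -> RAGService:
    # Built once per process; every request reuses it via Depends().
    settings = get_settings()
    return RAGService(
        embedder=OpenAIEmbedder(settings),
        vector_store=PineconeVectorStore(settings),
        llm=OpenAILLM(settings),
    )


def add_timing_middleware(app: FastAPI) -> None:
    @app.middleware("http")
    async def add_process_time_header(request: Request, call_next):
        start = time.perf_counter()
        response = await call_next(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        response.headers["X-Process-Time-ms"] = f"{elapsed_ms:.2f}"
        return response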

uv setup and Makefile usage (appendix)

Bootstrapping the project using uv and pyenv:

# 1) Initialize a uv project (creates pyproject.toml if missing)
uv init

# 2) Create and activate a virtual environment
uv venv
source .venv/bin/activate

# 3) Import requirements into pyproject (keeps TOML as source of truth)
uv add --requirements requirements.txt

# 4) Sync/install dependencies
uv sync

# 5) Editable install for this project (if not already)
uv pip install -e .

Makefile shortcuts:

make setup   # Initializes venv, runs uv init, imports requirements, syncs deps
make run     # Runs the app via uv: uv run .venv/bin/python main.py
make test    # Runs tests if present
make fmt     # Formats with Black (add with: uv add --dev black)
make deps    # Shows dependency tree (uv tree)
make lock    # Rebuilds uv lock file
make clean   # Removes caches

High-level system graph

The following diagram illustrates the flow across the client, API, business logic, data/config, and external services; a configuration sketch follows it.

graph TB
    subgraph Client_Layer[Client Layer]
        C[Client Apps]
    end

    subgraph API_Layer[API Layer - api-routes.py]
        R[Router /api/v1]
        H1[Health Handler: GET /health]
        H3[Query Handler: POST /query]
    end

    subgraph Business_Logic[Business Logic - services]
        RS[RAGService]
        EMB[Embedder: OpenAIEmbedder or MockEmbedder]
        VEC[VectorStore: PineconeVectorStore or MockVectorStore]
        LLM[LLM Service: OpenAILLM or MockLLM]
    end

    subgraph External_Services[External Services]
        L1[OpenAI API Chat Completions]
        L2[OpenAI API Embeddings]
        P[Pinecone Index]
    end

    subgraph Data_Config[Data and Config]
        M[Models: QueryRequest, QueryResponse, Document]
        CFG[Settings: config/settings.py]
    end

   
    C --> R
    R --> H1
    R --> H3
    H3 --> RS

    RS --> EMB
    RS --> VEC
    RS --> LLM

    EMB --> L2
    LLM --> L1
    VEC --> P

    RS --> M
    RS --> CFG
    EMB --> CFG
    VEC --> CFG
    LLM --> CFG
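
Every component in the graph reads its configuration through config/settings.py. A sketch using pydantic-settings; the variable names and defaults here are assumptions:

from functools import lru_cache

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Assumed variable names; the real set lives in config/settings.py.
    openai_api_key: str = ""
    pinecone_api_key: str = ""
    pinecone_index: str = "simple-rag"
    embedding_model: str = "text-embedding-3-small"
    llm_model: str = "gpt-4o-mini"
    top_k: int = 5


@lru_cache
def get_settings() -> Settings:
    # Cached so the environment is parsed once per process.
    return Settings()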

More documentation

For more detail on the individual services, see the Service Documentation.
