AgenticPipelines

A space for researching and developing agentic pipelines with different development frameworks, logging pipelines, RAG flows, and various models.

Features

  • πŸš€ RAG Pipeline with MongoDB: Production-ready RAG implementation using LangChain and MongoDB Atlas Vector Search
  • πŸ“¦ Pydantic Models: Type-safe configuration management with validation
  • βš™οΈ Dynamic Configuration: YAML-based config files with environment variable substitution
  • 🎯 PEP-8 Compliant: Clean, maintainable code following Python best practices
  • πŸ”§ Modular Design: Easily extensible architecture for adding new components
  • πŸ“ Comprehensive Logging: Built-in logging with file and console handlers
  • πŸ§ͺ Type Hints: Full type annotations for better IDE support and code quality

Project Structure

AgenticPipelines/
├── src/
│   └── rag_pipeline/              # RAG pipeline implementation
│       ├── core/                  # Core pipeline components
│       │   ├── embeddings.py      # Embeddings management
│       │   ├── llm.py             # LLM management
│       │   ├── pipeline.py        # Main RAG pipeline
│       │   └── vector_store.py    # Vector store management
│       ├── models/                # Pydantic models
│       │   └── config.py          # Configuration models
│       └── utils/                 # Utility modules
│           ├── config_loader.py       # Config loading utilities
│           ├── document_processor.py  # Document processing
│           └── logger.py              # Logging utilities
├── configs/                       # Configuration files
│   └── default_config.yaml        # Default RAG configuration
├── examples/                      # Example scripts
│   ├── basic_usage.py             # Basic RAG usage
│   ├── document_processing.py     # Document loading example
│   └── custom_config.py           # Custom configuration example
├── tests/                         # Test files
├── .env.example                   # Environment variables template
├── requirements.txt               # Python dependencies
├── pyproject.toml                 # Project configuration
└── README.md                      # This file

Installation

Prerequisites

  • Python 3.12+
  • MongoDB (local or MongoDB Atlas)
  • OpenAI API key

Setup

  1. Clone the repository:
git clone https://github.com/vijulshah/AgenticPipelines.git
cd AgenticPipelines
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up environment variables (a sample .env is sketched below):
cp .env.example .env
# Edit .env and add your API keys and MongoDB connection string
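
The exact contents of .env.example are not reproduced here; a minimal .env, assuming the variable names referenced in configs/default_config.yaml, might look like:

# Hypothetical .env values; replace the placeholders with your own credentials
OPENAI_API_KEY=sk-...
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net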

Quick Start

Basic Usage

from pathlib import Path
from src.rag_pipeline import RAGPipeline, load_config

# Load configuration
config = load_config(Path("configs/default_config.yaml"))

# Initialize pipeline
with RAGPipeline(config) as pipeline:
    # Add documents
    text = "Your document text here..."
    pipeline.add_text(text)
    
    # Query the pipeline
    result = pipeline.query("Your question here?")
    print(result['answer'])
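
How add_text splits long documents is not shown above. A minimal sketch of that step using LangChain's RecursiveCharacterTextSplitter with the default chunking settings (the splitter choice here is an assumption, not the repository's confirmed implementation):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split raw text into overlapping chunks before embedding and indexing.
# chunk_size and chunk_overlap mirror configs/default_config.yaml.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text("Your document text here...")
print(f"Produced {len(chunks)} chunks")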

Configuration

Create a YAML configuration file or use the default one in configs/default_config.yaml:

pipeline_name: "my_rag_pipeline"
vector_store_type: "mongodb"

mongodb:
  connection_string: "${MONGODB_URI}"
  database_name: "rag_database"
  collection_name: "vector_store"
  index_name: "vector_index"

embedding:
  provider: "openai"
  model_name: "text-embedding-3-small"
  dimensions: 1536
  api_key: "${OPENAI_API_KEY}"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.7
  max_tokens: 1000
  api_key: "${OPENAI_API_KEY}"

retriever:
  search_type: "similarity"
  k: 4

chunking:
  chunk_size: 1000
  chunk_overlap: 200
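
The ${MONGODB_URI} and ${OPENAI_API_KEY} placeholders are resolved from environment variables when the configuration is loaded. A minimal sketch of how that substitution can work (the repository's config_loader.py may implement it differently):

import os
import re
from pathlib import Path

import yaml


def load_yaml_with_env(path: Path) -> dict:
    """Load a YAML file, replacing ${VAR} placeholders with environment values."""
    raw = path.read_text()
    expanded = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)
    return yaml.safe_load(expanded)


config_dict = load_yaml_with_env(Path("configs/default_config.yaml"))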

Programmatic Configuration

from src.rag_pipeline import RAGPipeline, RAGPipelineConfig
from src.rag_pipeline.models import MongoDBConfig, EmbeddingConfig, LLMConfig

config = RAGPipelineConfig(
    pipeline_name="custom_pipeline",
    mongodb=MongoDBConfig(
        connection_string="mongodb://localhost:27017",
        database_name="my_db",
    ),
    embedding=EmbeddingConfig(
        provider="openai",
        model_name="text-embedding-3-small",
    ),
    llm=LLMConfig(
        provider="openai",
        model_name="gpt-4o-mini",
        temperature=0.5,
    ),
)

pipeline = RAGPipeline(config)

Examples

See the examples/ directory for detailed usage examples:

  • basic_usage.py: Simple RAG pipeline setup
  • document_processing.py: Loading and processing documents
  • custom_config.py: Using custom configurations

Run an example:

python examples/basic_usage.py

MongoDB Setup

Local MongoDB

# Start MongoDB
mongod --dbpath /path/to/data/directory
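
If Docker is available, a local instance can also be started from the official image (a common alternative, not a path documented by this repository):

# Run MongoDB in a container, exposing the default port
docker run -d --name mongodb -p 27017:27017 mongo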

MongoDB Atlas

  1. Create a free MongoDB Atlas cluster at https://www.mongodb.com/cloud/atlas
  2. Create a database user and get the connection string
  3. Create a vector search index on your collection (a programmatic sketch follows these steps):
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
  4. Update your .env file with the connection string
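
The same index can also be created programmatically with PyMongo (a sketch; the database, collection, and index names are the defaults from the configuration above):

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("your-atlas-connection-string")
collection = client["rag_database"]["vector_store"]

# Atlas Vector Search index matching the JSON definition above.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)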

Development

Code Style

This project follows PEP-8 guidelines and uses:

  • Black for code formatting
  • Ruff for linting
  • MyPy for type checking

Running Linters

# Install dev dependencies
pip install -e ".[dev]"

# Format code
black src/ examples/ tests/

# Lint code
ruff check src/ examples/ tests/

# Type check
mypy src/

Dependencies

Core dependencies:

  • langchain>=0.3.0 - LLM framework
  • langchain-mongodb>=0.2.0 - MongoDB vector store
  • langchain-openai>=0.2.0 - OpenAI integration
  • pymongo>=4.8.0 - MongoDB driver
  • pydantic>=2.9.0 - Data validation
  • pyyaml>=6.0.1 - YAML parsing
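
As a rough illustration of how these dependencies fit together (independent of the repository's actual wiring), a vector store can be assembled directly from langchain-openai and langchain-mongodb:

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# Embeddings model matching the default configuration.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Vector store backed by a MongoDB Atlas collection and search index.
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    "your-atlas-connection-string",
    "rag_database.vector_store",  # "<database>.<collection>" namespace
    embeddings,
    index_name="vector_index",
)

docs = vector_store.similarity_search("Your question here?", k=4)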

Roadmap

  • RAG Pipeline with MongoDB
  • Tool Chaining Flows
  • Additional Vector Stores (Chroma, Pinecone)
  • Multi-modal RAG
  • Agent-based pipelines
  • Advanced retrieval strategies
  • Evaluation metrics and benchmarking

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Acknowledgments

Built with LangChain, MongoDB Atlas Vector Search, OpenAI, and Pydantic.
