A space for researching and developing agentic pipelines with different development frameworks, logging pipelines, RAG flows, and various models.
- RAG Pipeline with MongoDB: Production-ready RAG implementation using LangChain and MongoDB Atlas Vector Search
- Pydantic Models: Type-safe configuration management with validation
- Dynamic Configuration: YAML-based config files with environment variable substitution
- PEP-8 Compliant: Clean, maintainable code following Python best practices
- Modular Design: Easily extensible architecture for adding new components
- Comprehensive Logging: Built-in logging with file and console handlers (see the sketch after this list)
- Type Hints: Full type annotations for better IDE support and code quality
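The logging utilities live in src/rag_pipeline/utils/logger.py. Their exact API is not documented here, so the following is only a minimal sketch of the file-plus-console pattern, assuming a hypothetical get_logger helper:

```python
import logging
from pathlib import Path


def get_logger(name: str, log_file: Path | None = None) -> logging.Logger:
    """Hypothetical helper: console logging plus an optional file handler."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured
        return logger
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    # Console handler
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    # Optional file handler
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger


logger = get_logger("rag_pipeline", Path("pipeline.log"))
logger.info("Pipeline initialized")
```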
AgenticPipelines/
├── src/
│   └── rag_pipeline/                 # RAG pipeline implementation
│       ├── core/                     # Core pipeline components
│       │   ├── embeddings.py         # Embeddings management
│       │   ├── llm.py                # LLM management
│       │   ├── pipeline.py           # Main RAG pipeline
│       │   └── vector_store.py       # Vector store management
│       ├── models/                   # Pydantic models
│       │   └── config.py             # Configuration models
│       └── utils/                    # Utility modules
│           ├── config_loader.py      # Config loading utilities
│           ├── document_processor.py # Document processing
│           └── logger.py             # Logging utilities
├── configs/                          # Configuration files
│   └── default_config.yaml           # Default RAG configuration
├── examples/                         # Example scripts
│   ├── basic_usage.py                # Basic RAG usage
│   ├── document_processing.py        # Document loading example
│   └── custom_config.py              # Custom configuration example
├── tests/                            # Test files
├── .env.example                      # Environment variables template
├── requirements.txt                  # Python dependencies
├── pyproject.toml                    # Project configuration
└── README.md                         # This file
Prerequisites:

- Python 3.12+
- MongoDB (local or MongoDB Atlas)
- OpenAI API key
Installation:

- Clone the repository:
git clone https://github.com/vijulshah/AgenticPipelines.git
cd AgenticPipelines

- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:
pip install -r requirements.txt

- Set up environment variables:
cp .env.example .env
# Edit .env and add your API keys and MongoDB connection string
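A filled-in .env might look like the following; the variable names MONGODB_URI and OPENAI_API_KEY match the ${...} placeholders used in configs/default_config.yaml, and the values below are placeholders:

```
# .env -- example values only
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net
OPENAI_API_KEY=sk-<your-openai-api-key>
```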
Basic usage:

from pathlib import Path
from src.rag_pipeline import RAGPipeline, load_config
# Load configuration
config = load_config(Path("configs/default_config.yaml"))
# Initialize pipeline
with RAGPipeline(config) as pipeline:
    # Add documents
    text = "Your document text here..."
    pipeline.add_text(text)

    # Query the pipeline
    result = pipeline.query("Your question here?")
    print(result['answer'])
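The same two calls also cover multi-document ingestion; here is a small, self-contained extension of the snippet above (the file paths are placeholders):

```python
from pathlib import Path

from src.rag_pipeline import RAGPipeline, load_config

config = load_config(Path("configs/default_config.yaml"))

with RAGPipeline(config) as pipeline:
    # Ingest several plain-text files (placeholder paths)
    for doc_path in [Path("docs/intro.txt"), Path("docs/faq.txt")]:
        pipeline.add_text(doc_path.read_text(encoding="utf-8"))

    # Query against everything that was ingested
    result = pipeline.query("What topics do these documents cover?")
    print(result["answer"])
```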
Create a YAML configuration file or use the default one in configs/default_config.yaml:

pipeline_name: "my_rag_pipeline"
vector_store_type: "mongodb"

mongodb:
  connection_string: "${MONGODB_URI}"
  database_name: "rag_database"
  collection_name: "vector_store"
  index_name: "vector_index"

embedding:
  provider: "openai"
  model_name: "text-embedding-3-small"
  dimensions: 1536
  api_key: "${OPENAI_API_KEY}"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.7
  max_tokens: 1000
  api_key: "${OPENAI_API_KEY}"

retriever:
  search_type: "similarity"
  k: 4

chunking:
  chunk_size: 1000
  chunk_overlap: 200
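The ${MONGODB_URI} and ${OPENAI_API_KEY} placeholders are substituted from environment variables when the file is loaded, so those variables must be set (for example from your .env file) before calling load_config. A quick sanity check, assuming the loaded config exposes attributes that mirror the YAML keys:

```python
import os
from pathlib import Path

from src.rag_pipeline import load_config

# The ${...} placeholders are resolved from the environment at load time
assert os.getenv("MONGODB_URI") and os.getenv("OPENAI_API_KEY"), "Set the variables from .env first"

config = load_config(Path("configs/default_config.yaml"))
print(config.pipeline_name)   # assumed to mirror the YAML key names
print(config.llm.model_name)
```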
A configuration can also be built programmatically with the Pydantic models:

from src.rag_pipeline import RAGPipeline, RAGPipelineConfig
from src.rag_pipeline.models import MongoDBConfig, EmbeddingConfig, LLMConfig
config = RAGPipelineConfig(
    pipeline_name="custom_pipeline",
    mongodb=MongoDBConfig(
        connection_string="mongodb://localhost:27017",
        database_name="my_db",
    ),
    embedding=EmbeddingConfig(
        provider="openai",
        model_name="text-embedding-3-small",
    ),
    llm=LLMConfig(
        provider="openai",
        model_name="gpt-4o-mini",
        temperature=0.5,
    ),
)
pipeline = RAGPipeline(config)
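As in the quick-start example, RAGPipeline can also be used as a context manager with a programmatically built config, which is the safer option if the pipeline holds open resources such as the MongoDB connection. Continuing directly from the snippet above:

```python
with RAGPipeline(config) as pipeline:
    pipeline.add_text("Some text to index...")
    print(pipeline.query("What does the text say?")["answer"])
```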
See the examples/ directory for detailed usage examples:

- basic_usage.py: Simple RAG pipeline setup
- document_processing.py: Loading and processing documents
- custom_config.py: Using custom configurations
Run an example:
python examples/basic_usage.py

For a local MongoDB instance:

# Start MongoDB
mongod --dbpath /path/to/data/directory

To use MongoDB Atlas instead:

- Create a free MongoDB Atlas cluster at https://www.mongodb.com/cloud/atlas
- Create a database user and get the connection string
- Create a vector search index on your collection (or create it from code; see the sketch after these steps):
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}

- Update your .env file with the connection string
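The index can also be created from code instead of the Atlas UI. The sketch below uses pymongo's Atlas search-index helpers (available in the pinned pymongo>=4.8.0) with the same definition as the JSON above; treat it as illustrative rather than the project's own setup script:

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("<your MongoDB Atlas connection string>")
collection = client["rag_database"]["vector_store"]

# Same definition as the JSON above, registered as an Atlas vectorSearch index
index = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index)
```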
This project follows PEP-8 guidelines and uses:
- Black for code formatting
- Ruff for linting
- MyPy for type checking
# Install dev dependencies
pip install -e ".[dev]"
# Format code
black src/ examples/ tests/
# Lint code
ruff check src/ examples/ tests/
# Type check
mypy src/
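The dev extra referenced by pip install -e ".[dev]" is typically declared in pyproject.toml; the project's actual file is not reproduced here, so the excerpt below is only a plausible sketch (version pins are assumptions):

```toml
# pyproject.toml -- illustrative excerpt, not the actual file
[project.optional-dependencies]
dev = [
    "black>=24.0",
    "ruff>=0.6",
    "mypy>=1.11",
]
```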
Core dependencies:

- langchain>=0.3.0 - LLM framework
- langchain-mongodb>=0.2.0 - MongoDB vector store
- langchain-openai>=0.2.0 - OpenAI integration
- pymongo>=4.8.0 - MongoDB driver
- pydantic>=2.9.0 - Data validation
- pyyaml>=6.0.1 - YAML parsing
Roadmap:

- RAG Pipeline with MongoDB
- Tool Chaining Flows
- Additional Vector Stores (Chroma, Pinecone)
- Multi-modal RAG
- Agent-based pipelines
- Advanced retrieval strategies
- Evaluation metrics and benchmarking
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License
Built with LangChain, MongoDB, OpenAI, and Pydantic.