This repo is your all-in-one launchpad for onboarding engineering teams into the world of Generative AI.
Whether you're starting to build internal assistants, integrating Retrieval-Augmented Generation (RAG) into your apps, or scaling GenAI use across departments — this starter kit gives you a hands-on, modular foundation that’s easy to clone, extend, and deploy.
💻 Start the Hands-On Colab Notebook!
🗒️ Fill out this 30-second Survey
If you'd like a quick intro lab before getting started, go to the Intro to RAG and Tools lab.
- ✅ Interactive RAG pipeline in Colab using LangChain, OpenAI gpt-4o and embeddings, and ChromaDB
- 📝 Markdown-based internal playbook for async learning and fast ramp-up
- 🧪 LLM evaluation script using Ragas to measure accuracy, completeness, and style
- 💬 Prompt patterns and templates to guide better assistant behavior
- 🛠️ Poetry for dependency management — no environment headaches
- This project demonstrates a simple Retrieval-Augmented Generation (RAG) system, in which an LLM looks up relevant passages in a set of stored documents and uses them as extra context when answering prompts.
- It uses OpenAI's text-embedding-3-small model to generate embeddings for text documents and stores them in a ChromaDB vector store.
- The system then uses these embeddings to retrieve relevant document chunks based on user queries and generate responses.
- The project also includes an evaluation component using the Ragas framework to assess the performance of the RAG pipeline, focusing on context precision, context recall, and faithfulness.
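To make the retrieve-then-generate flow concrete, here is a minimal, dependency-free sketch. It is not how the project's scripts work internally: the hypothetical embed() function below uses word counts as a stand-in for OpenAI's text-embedding-3-small, a plain Python list stands in for ChromaDB, and build_prompt() is an illustrative template. The shape of the pipeline is the same, though: embed the query, rank the stored chunks by cosine similarity, and hand the top matches to the LLM as context.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: a bag-of-words count
    # vector. The project itself uses OpenAI's text-embedding-3-small.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every stored chunk by similarity to the query and keep the top k.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    # Hypothetical prompt template: retrieved chunks are passed in as extra context.
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


chunks = [
    "Passwords must be rotated every 90 days for compliance.",
    "The cafeteria opens at 8am on weekdays.",
    "Security incidents must be reported to the compliance team.",
]
query = "How often must passwords be rotated?"
print(build_prompt(query, retrieve(query, chunks, k=1)))
```

In the real pipeline, swapping the toy embed() for real embeddings and the list for a vector store is the part LangChain and ChromaDB handle for you; the resulting prompt then goes to the LLM.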
- To provide a practical example of building a RAG system.
- To showcase the use of OpenAI embeddings for text processing.
- To demonstrate the integration of ChromaDB as a vector store.
- To illustrate how to evaluate a RAG system using Ragas.
- To offer a clear and reproducible setup for others to experiment with RAG systems.
- Go to the Gen AI Dev Starter Kit Colab file to run the code and start building a RAG system!
GenAI-Dev-Onboarding-Starter-Kit/
├── pyproject.toml       # Project dependencies managed by Poetry
├── doc1.txt             # Example document about "Security and Compliance"
├── doc2.txt             # Example document about "Company FAQs"
├── my_rag_project/
│   ├── embedding_processor_langchain.py   # Script for generating and storing embeddings
│   └── evaluation_langchain.py            # Script for evaluating the RAG system
└── README.md            # This file!
- Python 3.9 or higher
- Poetry for dependency management
- An OpenAI API key
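If you want to confirm these prerequisites before going further, a small preflight check like the one below can help. It is a sketch, not part of the repo: the function name preflight() is hypothetical, and it only checks the Python version, whether a poetry executable is on your PATH, and whether OPENAI_API_KEY is set (you configure the key in a later step, so expect that check to fail until then).

```python
import os
import shutil
import sys


def preflight(version_info=sys.version_info, env=os.environ, which=shutil.which) -> list[str]:
    # Hypothetical helper: returns a list of human-readable problems (empty means OK).
    # The arguments are injectable so the checks can be tested without touching the real system.
    problems = []
    if tuple(version_info[:2]) < (3, 9):
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"Python 3.9+ is required, found {found}.")
    if which("poetry") is None:
        problems.append("Poetry was not found on your PATH.")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set.")
    return problems


if __name__ == "__main__":
    for line in preflight() or ["All prerequisites look OK."]:
        print(line)
```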
If you have cloned this project from a Git repository, navigate to the project directory.
If you don't have Poetry installed, you can install it using the following command:
curl -sSL https://install.python-poetry.org | python3 -
Make sure to add Poetry to your PATH as instructed by the installer.
Navigate to the project's root directory (where pyproject.toml is located) and run:
poetry install
This will create a virtual environment and install all necessary packages.
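To double-check that the install worked, you can ask the Poetry environment whether the main libraries are importable. The module names below (langchain, chromadb, ragas, openai) are an assumption based on the tools this kit mentions; check pyproject.toml for the actual dependency list. Save the snippet under any name you like (for example, a hypothetical check_env.py) and run it with poetry run python check_env.py so it executes inside the project's virtual environment.

```python
import importlib.util

# Assumed top-level module names; adjust to match pyproject.toml.
EXPECTED_MODULES = ["langchain", "chromadb", "ragas", "openai"]


def missing_modules(names=EXPECTED_MODULES) -> list[str]:
    # find_spec() returns None when a top-level module cannot be found,
    # without actually importing (and running) the package.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing), "(try running poetry install again)")
    else:
        print("All expected modules are importable.")
```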
This project requires an OpenAI API key. You need to set it as an environment variable named OPENAI_API_KEY.
For example, you can add the following line to your shell configuration file (e.g., .bashrc, .zshrc):
export OPENAI_API_KEY='your_api_key_here'
Then, source the file (e.g., source ~/.bashrc) or open a new terminal session.
Alternatively, you can set it directly in your terminal session:
export OPENAI_API_KEY='your_api_key_here'
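The OpenAI Python client reads OPENAI_API_KEY from the environment by default, so exporting it is usually all you need. If you want a clearer error when the key is missing, or want to log which key is in use without exposing it, a small helper like this can sit at the top of a script. Both function names are hypothetical, not part of the repo.

```python
import os


def get_openai_api_key(env=os.environ) -> str:
    # Hypothetical helper: fail fast with an actionable message instead of a
    # less obvious authentication error later in the pipeline.
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it in your shell before running the scripts."
        )
    return key


def mask_key(key: str) -> str:
    # Keep only the last 4 characters visible so logs never contain the full key.
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```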
Next, change into the directory that contains the scripts:
cd my_rag_project
The embedding_processor_langchain.py script is used to generate embeddings for your documents and store them in ChromaDB. Make sure your documents are in the correct location, or update the script with the correct file paths.
To run the script:
poetry run python embedding_processor_langchain.py
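Because the scripts run from my_rag_project/ while doc1.txt and doc2.txt sit in the repository root in the layout above, a relative path such as "doc1.txt" can easily point at the wrong place. One way to avoid that is to resolve paths relative to the script's own location rather than the current working directory. The sketch below is a hypothetical helper, not code from embedding_processor_langchain.py: it looks for each document next to the script first, then one directory up.

```python
from pathlib import Path

# Assumed document names, taken from the project layout.
DOC_NAMES = ["doc1.txt", "doc2.txt"]


def resolve_docs(script_dir: Path, names=DOC_NAMES) -> list[Path]:
    # Hypothetical helper: check the script's directory, then its parent
    # (the repo root), and fail loudly if a document is in neither place.
    resolved = []
    for name in names:
        for candidate in (script_dir / name, script_dir.parent / name):
            if candidate.is_file():
                resolved.append(candidate)
                break
        else:
            raise FileNotFoundError(
                f"{name} not found in {script_dir} or {script_dir.parent}"
            )
    return resolved
```

Inside a script you would call it as resolve_docs(Path(__file__).resolve().parent).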
The evaluation_langchain.py script evaluates the RAG system with Ragas. It uses the embeddings generated in the previous step.
To run the script:
poetry run python evaluation_langchain.py
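For intuition about one of the Ragas metrics: context precision rewards a retriever that ranks relevant chunks above irrelevant ones. The sketch below computes a rank-weighted precision (precision@k averaged over the ranks that hold a relevant chunk) from relevance labels you supply by hand. Ragas obtains its relevance verdicts from an LLM judge and may differ in details, so treat this as an illustration of the idea, not a reimplementation of Ragas or of evaluation_langchain.py.

```python
def context_precision(relevance: list[int]) -> float:
    # relevance[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0.
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    score = 0.0
    hits = 0
    for k, rel in enumerate(relevance, start=1):
        hits += rel
        if rel:
            score += hits / k  # precision@k, counted only at relevant ranks
    return score / total_relevant
```

For example, relevance labels [1, 0, 1] score (1/1 + 2/3) / 2, about 0.83, while moving the irrelevant chunk to the end ([1, 1, 0]) scores a perfect 1.0.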
- Ensure you have a stable internet connection when running the scripts, as they interact with external APIs (OpenAI) and may download data.
- The paths to documents and the ChromaDB database might need adjustment based on where you run the scripts.
- This project is for demonstration purposes and may require further modifications for production use.
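Since the scripts call the OpenAI API over the network, a transient failure can end a run partway through. If that happens often, wrapping individual calls in a retry with exponential backoff is a common remedy. This is a generic, hypothetical sketch: by default it retries on Python's built-in ConnectionError and TimeoutError, so pass in the exception classes your client actually raises (the OpenAI client defines its own).

```python
import random
import time


def with_retries(fn, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    # Call fn(), retrying on the given exceptions with exponential backoff.
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds plus a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The sleep parameter is injectable so the backoff schedule can be tested without actually waiting.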