Welcome to the LLM Zoomcamp! This repository contains all the materials for the free online course provided by DataTalks.Club. The course focuses on real-life applications of Large Language Models (LLMs), with a comprehensive syllabus designed to teach you how to build an AI bot that can answer questions about your knowledge base.
The course is structured over 10 weeks, covering the following topics:
- Week 1: Introduction to LLMs and RAG
  - Basics of Large Language Models (LLMs)
  - Retrieval-Augmented Generation (RAG)
  - Setting up the environment and simple RAG with OpenAI API
- Week 2: Open-Source LLMs and Self-Hosting
  - Exploring open-source LLMs
    - Flan-T5-xl by Google
    - Phi-3-mini-128k-instruct by Microsoft
    - Mistral-7B-v0.1 by MistralAI
  - Self-hosting techniques
- Week 3: Vector Databases and Retrieval Techniques
  - Understanding embeddings and vector search
  - Integrating vectors with RAG
- Week 4: Orchestration and Ingestion Pipelines
  - Building data ingestion pipelines
  - Using tools like Mage for orchestration
- Week 5: Monitoring and Guardrails
  - Implementing monitoring strategies
  - Using metrics and dashboards for visualization
  - Ensuring safe operations with guardrails
- Week 6: Tips and Tricks for Advanced RAG Systems
  - Best practices for optimizing RAG systems
- Weeks 7-10: Hands-on Project
  - Apply your knowledge in a practical project
  - Building a comprehensive AI bot
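
To give a feel for the retrieval and prompt-building steps that the first weeks cover, here is a minimal, self-contained sketch in Python. It is not taken from the course materials: the `DOCUMENTS` list, the function names, and the bag-of-words "embedding" are all illustrative assumptions standing in for a real knowledge base and a real embedding model, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base (an assumption for illustration only).
DOCUMENTS = [
    "You can still join the course after the start date.",
    "Homework is submitted through the course platform.",
    "The final project takes place in the last weeks of the course.",
]


def embed(text):
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "Can I join the course after it starts?"
    prompt = build_prompt(question, retrieve(question, DOCUMENTS))
    # In a full RAG system, this prompt would now be sent to an LLM.
    print(prompt)
```

The sketch only covers the "retrieve, then augment the prompt" half of RAG. Swapping `embed` for a real embedding model and the sorted scan for a vector database corresponds to what Week 3 is described as covering.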
To begin, clone the repository and follow the instructions provided in each week's directory:

    git clone https://github.com/yakhyo/llm-practice.git
    cd llm-practice

For more details, visit the LLM Zoomcamp GitHub page and the course overview.