Create a RAG system using OpenVINO and LlamaIndex

Retrieval-augmented generation (RAG) is a technique for augmenting LLM knowledge with additional data, often private or real-time. LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data available up to their training cutoff. To build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to supply the model with the specific information it needs. RAG does this by retrieving the relevant information and inserting it into the model prompt.
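To make the retrieve-then-insert idea concrete, here is a minimal sketch in plain Python. It is not how the notebook or LlamaIndex implements retrieval: it assumes a toy word-overlap (cosine) score in place of a real embedding model, and the documents, function names, and prompt template are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical "private" documents the LLM was never trained on.
DOCUMENTS = [
    "The Q3 internal sales report shows revenue grew 12 percent.",
    "Office Wi-Fi password rotation happens on the first Monday of each month.",
    "The new vacation policy grants 25 days of paid leave per year.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Insert the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


question = "How many days of paid leave do we get?"
prompt = build_prompt(question, DOCUMENTS)
print(prompt)
```

The resulting `prompt` string is what would be passed to the LLM. A real pipeline replaces the word-overlap score with vector embeddings and a vector index, but the two stages (retrieve, then augment the prompt) stay the same.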

LlamaIndex is a framework for building context-augmented generative AI applications with LLMs. LlamaIndex imposes no restriction on how you use LLMs: you can use them as auto-complete, chatbots, semi-autonomous agents, and more. It just makes using them easier.

Notebook Contents

The tutorial consists of the following steps:

Install prerequisites
Download and convert the model from a public source using the OpenVINO integration with Hugging Face Optimum
Compress model weights to 4-bit or 8-bit data types using NNCF
Create a RAG chain pipeline
Run the Q&A pipeline
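For readers unfamiliar with the weight compression step, the sketch below illustrates the basic idea of storing weights as 8-bit integers plus a scale factor. It is a simplified, assumed scheme (symmetric, one scale for the whole tensor) written in plain Python for illustration only. It is not NNCF's implementation, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 if all weights are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now needs one byte instead of four (for 32-bit floats), at the cost of a small rounding error. A 4-bit scheme follows the same pattern with a narrower integer range, so it is smaller but less precise.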

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, refer to the Installation Guide.
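As a quick sanity check before installing the prerequisites, a cell like the following can report whether the current Python interpreter is running inside a virtual environment. This is an optional sketch, not part of the notebook. The function name is hypothetical, and the check covers environments created with `venv` or `virtualenv`, not necessarily other environment managers such as conda.

```python
import sys


def in_virtual_environment():
    """Return True if the interpreter runs inside a venv/virtualenv."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if in_virtual_environment():
    print(f"Running inside a virtual environment: {sys.prefix}")
else:
    print("Not in a virtual environment; consider creating one first.")
```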