NVIDIA Chat with RTX is a popular tool that demonstrates the capabilities of chatbots based on an LLM-RAG setup.
It requires a Windows PC with an RTX GPU with at least 8 GB of VRAM.
This project demonstrates the use of an LLM that supports German. The demo uses LAION's LeoLM.
Check the system requirements and download Chat with RTX from NVIDIA's website.
Check that everything is working by running the tool.
The default installation directory is C:\Users\username\AppData\Local\NVIDIA\ChatWithRTX
Replace username in the following commands with the user name on your PC.
Install Git for Windows and run Git Bash:
cd /c/Users/username/AppData/Local/NVIDIA/ChatWithRTX/RAG/trt-llm-rag-windows-main/model
git clone https://huggingface.co/LeoLM/leo-hessianai-7b-chat
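Hugging Face model repositories store the multi-gigabyte weight files via Git LFS, so it is worth checking that LFS is set up before (or verified after) the clone; otherwise the checkout contains only small pointer files and the engine build below fails. A quick check, run in Git Bash from the model directory:

```shell
# Git LFS must be installed for the weight files to download fully;
# without it, the clone contains ~130-byte pointer files instead.
git lfs install 2>/dev/null || echo "git-lfs missing - install it first, then re-clone"
# If the clone already exists, its size should be several GB:
[ -d leo-hessianai-7b-chat ] && du -sh leo-hessianai-7b-chat || echo "clone not present yet"
```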
TensorRT-LLM is already installed as part of Chat with RTX.
Open the Anaconda Prompt
cd C:\Users\username\AppData\Local\NVIDIA\ChatWithRTX
Check available environments and activate the TensorRT-LLM environment
conda info --envs
conda activate C:\Users\username\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag
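Before running the conversion, it can help to confirm that the activated environment actually provides TensorRT-LLM (a quick sketch; tensorrt_llm is the upstream package's import name):

```shell
# Verify the bundled TensorRT-LLM is importable in the activated env;
# if this fails, the wrong environment is active.
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)" 2>/dev/null \
  || echo "tensorrt_llm not found - re-check the 'conda activate' step above"
```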
Convert the model to a TensorRT engine:
python TensorRT-LLM\TensorRT-LLM-0.7.0\examples\llama\build.py --model_dir RAG\trt-llm-rag-windows-main\model\leo-hessianai-7b-chat --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --output_dir RAG\trt-llm-rag-windows-main\model\leo-hessianai-7b-chat
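When build.py finishes, the serialized engine should sit next to the model weights. A quick existence check (run from the ChatWithRTX directory; forward slashes for Git Bash, engine file name as referenced in config.json):

```shell
# After build.py completes, the TensorRT engine should exist alongside
# the model weights; if not, the build step needs to be re-run.
ENGINE="RAG/trt-llm-rag-windows-main/model/leo-hessianai-7b-chat/llama_float16_tp1_rank0.engine"
[ -f "$ENGINE" ] && echo "engine built OK" || echo "engine missing - re-run build.py"
```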
Open config.json in C:\Users\username\AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\config and add the following entry:
{
    "name": "LeoLM",
    "installed": true,
    "metadata": {
        "model_path": "model\\leo-hessianai-7b-chat",
        "engine": "llama_float16_tp1_rank0.engine",
        "tokenizer_path": "model\\leo-hessianai-7b-chat",
        "max_new_tokens": 512,
        "max_input_token": 2048,
        "temperature": 0.1
    }
}
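A stray comma or unmatched brace in the edited file will silently break model loading, so it is worth validating the JSON after saving. Python's standard-library json.tool (available in the same conda environment) can do this from the config directory:

```shell
# Validate config.json after adding the LeoLM entry; json.tool exits
# non-zero and reports the error position if the file is malformed.
python -m json.tool config.json > /dev/null && echo "config.json OK" || echo "config.json is malformed"
```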
