This is an LLM Japanese chatbot demo program using Intel OpenVINO toolkit. The demo uses rinna/youri-7b-chat
model developed by Rinna Co.,Ltd.
This program can use either one of 'CPU', 'GPU.0' (integrated GPU), and 'GPU.1' Intel discrete GPU for inferencing.
- At least main memory of 32GB. 64GB is recommended.
Recommend to use Python virtual env.
You need to have Python installed.
# (optional) create a python venv and enable it.
python -m venv venv
(win) venv/Scripts/activate
(Lnx) source venv/bin/activate
pip install -r requirements0-uninstall.txt
pip install -r requirements1-install.txt
file | description |
---|---|
config.yml |
The system setting file. You can specify huggingface model cache directory and device for inferencing |
youri-7b-chat-openvino.py |
WebUI chatbot demo using rinna/youri-7b-chat model. This program uses OpenVINO as inference engine.You need to run model_download.py to download and convert the model into OpenVINO IR before you run this demo. |
model_download.py |
This program downloads rinna/youri-7b-chat model from huggingface and convert it into FP16 OpenVINO IR model.The converted model will be stored in ./youri-7b-chat/FP16/ directory. Also, the original model downloaded from hugging face will be stored in ./cache/huggingface/hub/ directory. |
benchmark_pyt.py |
Simple benchmark program using PyTorch. |
benchmark_ov.py |
Simple benchmark program using OpenVINO. |
- System 1
- Software
- Windows 11 23H2
- OpenVINO 2023.1.0
- Hardware
- CPU: Core i7-10700K
- MEM: 64GB
- GPU: Intel A380 (discrete GPU is optional)
- Software
- System 2
- Software
- Ubuntu 22.04.3
- OpenVINO 2023.1.0
- Hardware
- CPU: Xeon Platinum 8358
- MEM: 256GB
- Software
Rinna Co.,Ltd - Original developer of rinna/youri-7b-chat model.