llama-cpp-search

A local GenerativeAI powered search engine that utilizes the powers of llama-cpp-python for running LLMs on your local and enahances your search experience. The current version uses the Phi-3-mini-4k-Instruct model for summarizing the search.

Performance

The Phi-3-mini models performs really well and the tokens per second is also very good. On my M2 Macbook Air I'm getting a performance of around 16 to 20 tokens per second for generation. One thing I noticed is when stopping VS Code and other electron apps I got a performance boost of 27 to 29 tokens per seconds. Anyways both of those are decent for a local model.

The performance entirely depends on your local machine. I you have a Nvidia GPU or a better M series PRO or MAX processor you might get better performance interms of tokens per second.

Setup steps

Please follow the following steps to run this locally on your machine.

Clone the repository

git clone https://github.com/vatsalsaglani/llama-cpp-search.git

Install dependencies

pip install -r requirements.txt --no-cache-dir

Create a `model` folder

Create a model folder inside the local-cpp-search folder so that we can download the model in that folder.

mkdir model

Download the model

Download the quantized Phi-3-mini-4k-Instruct model in GGUF format from HuggingFace.

Remember to move the model to the model folder.

Environment variables

Add the Brave Search API key into the environment. Create a .env file and add your BRAVE_API_KEY

# .env
BRAVE_API_KEY="YOUR_API_KEY"

Start the FastAPI server

python app.py

The server will expose the API endpoints for search and chat on http://localhost:8900.

The chat endpoint will allow you to get chat completions.

The search endpoint will take a query and search it using the Brave Search API and provide a search summary.

Demo

Let's look at a real-time demo.

P.S.: You can check the network speeds for the time required to search and summarize it via the Phi-3 model locally. The video is not fast forwarded.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
assets		assets
.gitignore		.gitignore
README.md		README.md
api_schemas.py		api_schemas.py
app.py		app.py
brave_search.py		brave_search.py
configs.py		configs.py
ctx.py		ctx.py
llamacheck.py		llamacheck.py
llm_invoke.py		llm_invoke.py
requirements.txt		requirements.txt
search_gen.py		search_gen.py

vatsalsaglani/llama-cpp-search

Folders and files

Latest commit

History

Repository files navigation

llama-cpp-search

Performance

Setup steps

Clone the repository

Install dependencies

Create a model folder

Download the model

Environment variables

Start the FastAPI server

Demo

Query: Global economy in 2024

Query: LLM Releases

Query: LLM Research

About

Resources

Stars

Watchers

Forks

Languages

Create a `model` folder