LLaMA Inference API 🦙


Inference API for LLaMA

pip install llama-inference

or

pip install git+https://github.com/aniketmaurya/llama-inference-api.git@main

Note: You need to manually install and set up Lit-LLaMA to use this project.

pip install lit-llama@git+https://github.com/Lightning-AI/lit-llama.git@main

For Inference

from llama_inference import LLaMAInference
import os

WEIGHTS_PATH = os.environ["WEIGHTS"]

checkpoint_path = f"{WEIGHTS_PATH}/lit-llama/7B/state_dict.pth"
tokenizer_path = f"{WEIGHTS_PATH}/lit-llama/tokenizer.model"

model = LLaMAInference(checkpoint_path=checkpoint_path, tokenizer_path=tokenizer_path, dtype="bfloat16")

print(model("New York is located in"))
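Since the loaded model is callable on a single prompt string, several prompts can be run through one instance in a loop. Below is a minimal sketch of such a helper; `complete_all` is a hypothetical convenience function written for this example, not part of the library:

```python
from typing import Callable, Iterable


def complete_all(model: Callable[[str], str], prompts: Iterable[str]) -> dict:
    """Run each prompt through the model once and map prompt -> completion.

    `model` is any callable that takes a prompt string and returns text,
    e.g. the LLaMAInference instance from the example above.
    """
    return {prompt: model(prompt) for prompt in prompts}


# Stand-in model for illustration; replace with the real LLaMAInference object.
fake_model = lambda prompt: prompt + " ..."
results = complete_all(fake_model, ["New York is located in", "Paris is located in"])
print(results["New York is located in"])
```

Loading the checkpoint is the expensive step, so reusing one `LLaMAInference` instance across prompts like this avoids paying it per call.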

For deploying as a REST API

Create a Python file app.py and initialize the ServeLLaMA App.

# app.py
from llama_inference.serve import ServeLLaMA, Response, PromptRequest

import lightning as L

component = ServeLLaMA(input_type=PromptRequest, output_type=Response)
app = L.LightningApp(component)

Then run the app with the Lightning CLI:

lightning run app app.py
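Once the app is running, a client can POST a prompt to the server. The route and field names below are assumptions for illustration (a `/predict` endpoint, a `prompt` request field, and a `text` response field); check the running app's logs and the `PromptRequest`/`Response` definitions for the actual shapes:

```python
import json
from urllib import request


def build_payload(prompt: str) -> bytes:
    # Assumed request shape; adjust the field name to match PromptRequest.
    return json.dumps({"prompt": prompt}).encode("utf-8")


def query(url: str, prompt: str) -> str:
    # url e.g. "http://127.0.0.1:7501/predict" -- endpoint assumed, not documented here.
    req = request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Field name "text" assumed from the Response type; adjust as needed.
        return json.loads(resp.read())["text"]
```

This uses only the standard library, so any HTTP client (requests, curl) works the same way against the same endpoint.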
