Welcome to the LLM Dashboard, an interactive tool for exploring different language model inference methods!
- Compare different inference methods: Normal, Caching, and Batching
- Visualize generation times with interactive graphs
- Customize token generation length
To get started:

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/llm-dashboard.git
  cd llm-dashboard
  ```

- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Start the Flask server:

  ```bash
  python app.py
  ```

- Open your web browser and navigate to http://localhost:5000.
- Enter your text prompt in the input field
- Select the inference method (Normal, Caching, or Batching)
- Choose the number of tokens to generate
- Click "Generate" and watch the magic happen!
The dashboard compares three inference methods (sketched below):

- Normal: standard token-by-token generation
- Caching: uses KV caching so each subsequent token is generated faster
- Batching: processes multiple inputs simultaneously for improved throughput
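To make the three strategies concrete, here is a minimal sketch of each using a Hugging Face GPT-2 checkpoint. The checkpoint, function names, and greedy decoding are illustrative assumptions; the dashboard's own implementation in `app.py` may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; the dashboard may load a different model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def generate_normal(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedy decoding that re-encodes the whole sequence at every step."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids, use_cache=False).logits
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

@torch.no_grad()
def generate_cached(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedy decoding that reuses cached key/value tensors (KV cache)."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model(ids, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    pieces = [ids, next_id]
    for _ in range(max_new_tokens - 1):
        # Only the newest token is fed in; earlier positions come from the cache.
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        pieces.append(next_id)
    return tokenizer.decode(torch.cat(pieces, dim=-1)[0], skip_special_tokens=True)

@torch.no_grad()
def generate_batched(prompts: list[str], max_new_tokens: int = 20) -> list[str]:
    """One forward pass per step for a whole batch of prompts."""
    tokenizer.padding_side = "left"            # align final tokens for generation
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    batch = tokenizer(prompts, return_tensors="pt", padding=True)
    out = model.generate(**batch, max_new_tokens=max_new_tokens,
                         do_sample=False, pad_token_id=tokenizer.pad_token_id)
    return tokenizer.batch_decode(out, skip_special_tokens=True)
```

The key difference: `generate_normal` redoes attention over the full sequence at every step, while `generate_cached` feeds in only the newest token and reuses cached keys and values, which is why caching pulls ahead as sequences grow.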
Experiment with different methods and observe the performance differences in the generated graphs!
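As a rough way to see these differences outside the dashboard, you can time the sketches above directly. This is illustrative wall-clock timing, not the dashboard's own instrumentation:

```python
import time

def timed(fn, *args, **kwargs):
    # Rough wall-clock timing of a single generation call.
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start

prompt = "Once upon a time"
print(f"normal:   {timed(generate_normal, prompt, max_new_tokens=50):.2f}s")
print(f"caching:  {timed(generate_cached, prompt, max_new_tokens=50):.2f}s")
print(f"batching: {timed(generate_batched, [prompt] * 4, max_new_tokens=50):.2f}s")
```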
We welcome contributions! Please see our CONTRIBUTING.md for details on how to get started.
This project is not yet licensed.