This web tool helps you estimate the GPU infrastructure required for Large Language Model (LLM) inference. It assists developers and architects in planning the necessary hardware based on model characteristics and performance requirements.
- Comprehensive Estimation: Calculates the required VRAM (model weights + KV cache), the number of GPUs, latency, and capital expenditure (CAPEX); a first-order sketch of the underlying math follows this list.
- Pre-defined Models: Includes a list of popular models (e.g., Llama, Mistral) for a quick start.
- GPU Catalog: Contains specifications for common GPUs; unit prices can be edited by the user.
- Customizable Parameters: Allows you to adjust all key parameters: model size, precision, context length, QPS, etc.
- CSV Export: Export the input data and estimation results to a CSV file.
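
As a rough illustration of the kind of first-order math behind these estimates, here is a minimal TypeScript sketch. Every name, formula, and constant in it is an illustrative assumption rather than the app's actual code; in particular, the KV-cache term assumes standard multi-head attention, where each cached token stores K and V vectors of `hiddenSize` values per layer.

```typescript
// Illustrative first-order estimate; NOT the app's actual code.
// Assumes dense transformer weights and a standard multi-head-attention
// KV cache (2 tensors x layers x hiddenSize bytes per cached token).

interface EstimateInput {
  paramsBillion: number;      // model size, in billions of parameters
  bytesPerParam: number;      // 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit
  numLayers: number;          // transformer layer count
  hiddenSize: number;         // model (embedding) dimension
  contextTokens: number;      // input + output tokens per request
  concurrentRequests: number; // requests held in memory at once
  gpuVramGB: number;          // usable VRAM per GPU
  gpuPriceUsd: number;        // unit price, editable in the GPU catalog
}

interface Estimate {
  weightsGB: number;
  kvCacheGB: number;
  totalVramGB: number;
  gpuCount: number;
  capexUsd: number;
}

function estimate(i: EstimateInput): Estimate {
  // Weights: parameter count x bytes per parameter.
  const weightsGB = (i.paramsBillion * 1e9 * i.bytesPerParam) / 1e9;

  // KV cache: 2 tensors (K and V) x layers x hiddenSize x bytes per value,
  // per token, scaled by context length and concurrency.
  const kvBytesPerToken = 2 * i.numLayers * i.hiddenSize * i.bytesPerParam;
  const kvCacheGB =
    (kvBytesPerToken * i.contextTokens * i.concurrentRequests) / 1e9;

  const totalVramGB = weightsGB + kvCacheGB;
  const gpuCount = Math.ceil(totalVramGB / i.gpuVramGB);
  return {
    weightsGB,
    kvCacheGB,
    totalVramGB,
    gpuCount,
    capexUsd: gpuCount * i.gpuPriceUsd,
  };
}
```
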
- Frontend: React, TypeScript
- Build Tool: Vite
- Styling: Tailwind CSS
Follow these steps to run the project on your local machine.
- Clone the repository (if you haven't already):

  ```bash
  git clone https://github.com/your-username/your-repo.git
  ```

- Navigate to the project directory:

  ```bash
  cd repo-name
  ```

- Install the dependencies:

  ```bash
  npm install
  ```
To start the development server, run:

```bash
npm run dev
```

The application will then be available at http://localhost:5173 (Vite prints the actual port in the terminal on startup).
- To create an optimized production build:

  ```bash
  npm run build
  ```

- To preview the production build locally:

  ```bash
  npm run preview
  ```
- Select a model from the dropdown list or manually enter the model parameters.
- Adjust the inference parameters such as context size (input/output tokens) and QPS (queries per second).
- Choose a target GPU to see estimates based on that hardware.
- The results are calculated in real time and displayed on the right.
- (Optional) Go to the GPU Catalog tab to adjust prices and see the impact on the total cost.
- Click Export to CSV to download a summary of your estimation (a hypothetical worked example and CSV row follow this list).
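
For a sense of scale, here is a hypothetical run of the sketch from the feature list for a 7B-parameter model at FP16 (32 layers, hidden size 4096) serving eight concurrent 4,096-token requests on 24 GB GPUs, followed by a minimal CSV row in the spirit of the export feature. All figures, column names, and the assumed GPU price are illustrative, not output from the tool.

```typescript
// Hypothetical inputs for a 7B-parameter model at FP16 (illustrative only).
const result = estimate({
  paramsBillion: 7,
  bytesPerParam: 2,      // FP16
  numLayers: 32,
  hiddenSize: 4096,
  contextTokens: 4096,   // input + output
  concurrentRequests: 8,
  gpuVramGB: 24,
  gpuPriceUsd: 8000,     // assumed catalog price
});
// weightsGB   ~ 14.0
// kvCacheGB   ~ 17.2  (about 0.5 MB per token x 4096 tokens x 8 requests)
// totalVramGB ~ 31.2  -> gpuCount = 2, capexUsd = 16000

// A CSV export of such a result can be as simple as one header row plus
// one data row (the tool's actual columns may differ):
const csv = [
  "weights_gb,kv_cache_gb,total_vram_gb,gpu_count,capex_usd",
  [result.weightsGB, result.kvCacheGB, result.totalVramGB,
   result.gpuCount, result.capexUsd].map(v => v.toFixed(2)).join(","),
].join("\n");
```
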
This tool is a work in progress and provides estimates for educational purposes. It does not account for some elements of a complete infrastructure, such as redundancy, network costs (ingress/egress), load balancers, monitoring, and other application components.
- Vincent Méoc - LinkedIn
