# Alternative: Running Mistral 7B Without CUDA
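If you do not have a CUDA-enabled GPU, or if you are running into issues with PyTorch/xFormers, this notebook provides an alternative way to run Mistral 7B using `llama.cpp`. This approach works on **CPU-only machines and machines with low-VRAM GPUs** by using a **quantized GGUF model**, which requires significantly less memory than the full-precision weights. On Linux, the default build used below runs inference on the CPU; offloading layers to a GPU requires building `llama.cpp` with a GPU backend enabled.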

## Step 1: Install Dependencies
Before proceeding, make sure that **CMake**, a C/C++ compiler, `git`, and `wget` are installed. The cell below installs them on Debian/Ubuntu using `apt`; on macOS or Windows, install the equivalent tools with your platform's package manager.

In [None]:
!sudo apt update && sudo apt install -y build-essential cmake git wget
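
Before moving on, it can help to confirm that the tools are actually on the `PATH`. The following is a minimal sketch using only the Python standard library; the tool list is an assumption that mirrors the packages installed above, with `cc` assumed to be the compiler name that `build-essential` provides.

```python
import shutil

# Assumed tool names, mirroring the apt packages installed above.
REQUIRED_TOOLS = ["cmake", "git", "wget", "cc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
print("Missing tools:", missing if missing else "none")
```

If the list is not empty, rerun the install cell or install the missing tools manually before building.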

## Step 2: Clone and Compile `llama.cpp`
Clone the repository, then configure and build it with CMake. The build places the resulting executables under `build/bin/`.

In [None]:
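# Each "!" command runs in its own subshell, so "cd" would not persist.
# Use the %cd magic to change the notebook's working directory instead.
!git clone https://github.com/ggerganov/llama.cpp
%cd llama.cpp
!rm -rf build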

In [None]:
!cmake -B build
!cmake --build build --config Release
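
The name of the command-line binary has changed across `llama.cpp` versions, so it is worth checking what the build produced. The sketch below looks for the two names I am aware of, `llama-cli` (newer) and `main` (older), under `build/bin/`. Both the names and the location are assumptions about your checkout, and `find_cli` is a hypothetical helper, not part of `llama.cpp`.

```python
from pathlib import Path

# Assumed binary names: newer checkouts use "llama-cli", older ones "main".
CANDIDATE_NAMES = ("llama-cli", "main")


def find_cli(build_dir="build"):
    """Return the first candidate binary found under <build_dir>/bin, or None."""
    bin_dir = Path(build_dir) / "bin"
    for name in CANDIDATE_NAMES:
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


print("CLI binary:", find_cli())
```

If this prints `None`, the build most likely failed or wrote its output elsewhere; check the CMake log in the cell above.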

## Step 3: Download a Quantized Mistral Model
We will use a **4-bit quantized model** in GGUF format, which works well on CPUs and low-VRAM GPUs.
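
To get a feel for the memory savings, here is a back-of-the-envelope estimate of the weight storage at different precisions. The parameter count (about 7.24 billion) is an approximation, and the calculation ignores quantization overhead, so treat the numbers as a rough lower bound rather than exact file sizes.

```python
def estimated_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7.24e9  # assumption: approximate parameter count of Mistral 7B

print(f"16-bit weights: {estimated_size_gb(N_PARAMS, 16):.1f} GB")  # 14.5 GB
print(f"4-bit weights:  {estimated_size_gb(N_PARAMS, 4):.1f} GB")   # 3.6 GB
```

Real quantized files are somewhat larger than the nominal 4-bit figure, because the format also stores scaling factors and keeps some tensors at higher precision.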

In [None]:
!wget https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf -P models/
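
A failed download can leave an HTML error page or an empty file under the model's name. As a quick sanity check, the sketch below reads the first eight bytes and confirms that they look like a GGUF header: the ASCII magic `GGUF` followed by a little-endian 32-bit version number. `read_gguf_version` is a hypothetical helper written for this notebook, and it only inspects the header, so it will not detect a file truncated further in.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the magic bytes do not match."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version


model_path = Path("models/mistral-7b-v0.1.Q4_K_M.gguf")
if model_path.exists():
    print("GGUF version:", read_gguf_version(model_path))
else:
    print("Model file not found:", model_path)
```

A result of `None` means the file is not a GGUF file and should be downloaded again.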

## Step 4: Run Mistral 7B Using `llama.cpp`
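Once compilation finishes, run the following command to start the model in interactive mode. Recent versions of `llama.cpp` name the command-line binary `llama-cli`; older checkouts call it `main`, so use `build/bin/main` if that is what your build produced. Interactive mode reads from the keyboard, and some notebook front ends do not forward input to `!` commands, so you may need to run this command from a terminal instead.

In [None]:
!build/bin/llama-cli -m models/mistral-7b-v0.1.Q4_K_M.gguf --interactive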
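
For use inside a notebook, a single non-interactive completion is often more convenient than an interactive session. The sketch below builds such a command and runs it with `subprocess`. It assumes the binary is `build/bin/llama-cli` (older checkouts name it `main`) and relies on the `-m` (model), `-p` (prompt), and `-n` (number of tokens to generate) options. `build_command` and `run_once` are hypothetical helpers, and some versions may need additional flags to exit after one completion.

```python
import subprocess


def build_command(binary, model, prompt, n_predict=64):
    """Build the argument list for a single, non-interactive completion."""
    return [str(binary), "-m", str(model), "-p", prompt, "-n", str(n_predict)]


def run_once(binary, model, prompt, n_predict=64):
    """Run one completion and return the program's standard output."""
    result = subprocess.run(
        build_command(binary, model, prompt, n_predict),
        stdin=subprocess.DEVNULL,  # never wait for keyboard input
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Assumed paths: adjust the binary name if your build produced "main".
cmd = build_command(
    "build/bin/llama-cli",
    "models/mistral-7b-v0.1.Q4_K_M.gguf",
    "The capital of France is",
)
print(" ".join(cmd))
```

Calling `run_once(...)` with the same arguments executes the command; it is left out here so that the cell does not start a long-running generation on import.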

## 🎉 Success!
You should now be able to run Mistral 7B on your **CPU or low-VRAM GPU** without requiring CUDA, PyTorch, or xFormers.