A comprehensive evaluation of Small Language Models (SLMs) running locally on mobile hardware. This project compares SmolLM (360M), Qwen2 (0.5B), and Phi-3.5-mini (3.8B) across four technical domains: General Knowledge, Python Coding, Math Reasoning, and Science Logic.
- Tester: Tathagata Mitra
- Device: POCO F1 (M1805E10A)
- Processor: Octa-core Max 2.8GHz (Snapdragon 845)
- RAM: 6.00 GB
- OS: Android 8.1.0 (MIUI Global 10.0)
- Platform: Android via PocketPal AI
- Model Format: GGUF (Quantization: Q4_K_M / Q8_0)
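As a rough sanity check on whether each model fits on a 6 GB device, the weight footprint of a GGUF file can be estimated from the parameter count and the quantization's average bits per weight. The sketch below is not part of the project. The bits-per-weight figures are approximations (Q8_0 stores 8.5 bits per weight; Q4_K_M averages roughly 4.85, and the true figure varies by architecture), and the estimate ignores the KV cache and runtime overhead.

```python
# Approximate average bits per weight for the two quantizations listed above.
# These are assumptions for illustration; real GGUF files vary by architecture.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}

# Parameter counts as stated in this README.
MODELS = {"SmolLM": 360e6, "Qwen2": 0.5e9, "Phi-3.5-mini": 3.8e9}


def estimate_weights_gb(n_params: float, quant: str) -> float:
    """Estimate weight size in GB (10^9 bytes), excluding KV cache and overhead."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    for name, n_params in MODELS.items():
        for quant in BITS_PER_WEIGHT:
            print(f"{name:14s} {quant:7s} ~{estimate_weights_gb(n_params, quant):.2f} GB")
```

Under these assumptions, only Phi-3.5-mini comes close to a meaningful share of the 6 GB of RAM, which is why the quantization choice matters most for that model.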
This benchmark utilizes an "LLM-as-a-Judge" framework to ensure consistent and objective scoring across different models.
- Inference: Each model was prompted with 20 targeted questions (5 per category) within the PocketPal AI mobile environment.
- Evaluation: The raw outputs were judged by Gemini 3.1 Pro.
- Scoring: Gemini evaluated responses based on the specific technical `evaluation_criteria` defined in the project data files.
- UX Evaluation: Beyond accuracy, a comparative UX analysis was performed using physical hardware to measure latency, thermal stability, and RAM usage on a 6GB device.
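The judging step described above can be sketched as a small prompt builder plus a score parser. This is a minimal illustration, not the exact prompt used in this project: the template wording, the 0 to 10 scale, and the `SCORE:` marker are all assumptions, and the call to the judge model itself is left out.

```python
import re

# Hypothetical judge template; the project's actual wording may differ.
JUDGE_TEMPLATE = """You are grading a small language model's answer.

Question:
{question}

Evaluation criteria:
{criteria}

Model response:
{response}

Reply with a short justification, then a final line of the form
SCORE: <integer from 0 to 10>"""


def build_judge_prompt(question: str, criteria: str, response: str) -> str:
    """Fill the template with one question, its criteria, and the model's answer."""
    return JUDGE_TEMPLATE.format(
        question=question, criteria=criteria, response=response
    )


def parse_score(judge_reply: str):
    """Return the integer after the last 'SCORE:' marker, or None if absent."""
    matches = re.findall(r"SCORE:\s*(\d+)", judge_reply)
    return int(matches[-1]) if matches else None
```

Asking for a fixed final line keeps parsing simple and makes malformed judge replies easy to detect, since `parse_score` returns `None` for them.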
This chart represents the weighted average of performance across all benchmarking categories.
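The README does not state which weights are used. One plausible reading, assumed here, is that each category is weighted by its number of questions; with 5 questions per category this reduces to a plain mean. The scores in the usage example are made-up placeholders, not benchmark results.

```python
def weighted_average(category_scores: dict, question_counts: dict) -> float:
    """Average per-category scores, weighting each by its question count."""
    total_questions = sum(question_counts[c] for c in category_scores)
    weighted_sum = sum(
        category_scores[c] * question_counts[c] for c in category_scores
    )
    return weighted_sum / total_questions


if __name__ == "__main__":
    # Placeholder scores for illustration only (not measured values).
    scores = {
        "General Knowledge": 8.0,
        "Python Coding": 6.0,
        "Math Reasoning": 4.0,
        "Science Logic": 6.0,
    }
    counts = {category: 5 for category in scores}  # 5 questions per category
    print(weighted_average(scores, counts))
```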
| General Knowledge | Python Coding |
|---|---|
| ![General Knowledge results]() | ![Python Coding results]() |

| Math Reasoning | Science Logic |
|---|---|
| ![Math Reasoning results]() | ![Science Logic results]() |
- Data/: Contains the ground-truth prompts in JSON format.
- Results/Histogram_plots/: High-resolution histograms generated from the benchmarking data.
- Results/Output/output_responses/: Raw output responses from each model.
- Install PocketPal AI on your Android device.
- Download the models in GGUF format (SmolLM-360M-Instruct, Qwen2-0.5B-Instruct, Phi-3.5-mini-instruct).
- Run the Prompts: Input the specific prompts located in the `Data/` directory.
- Grade the Output: Use Gemini 3.1 Pro to score the model's response against the provided `evaluation_criteria`.
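To make the prompt step less manual, the files in the data directory can be flattened into a single list to work through on the device. The sketch below assumes each JSON file holds a list of objects with a `prompt` key and an `evaluation_criteria` key; the `prompt` key name and the list layout are guesses, so adjust them to match the actual files.

```python
import json
from pathlib import Path


def load_prompts(data_dir: str = "Data") -> list:
    """Collect prompts and their grading criteria from every JSON file in data_dir.

    Assumed (hypothetical) schema: each file is a JSON list of objects
    with "prompt" and "evaluation_criteria" keys.
    """
    items = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            entries = json.load(f)
        for entry in entries:
            items.append(
                {
                    "source": path.name,
                    "prompt": entry["prompt"],
                    "evaluation_criteria": entry["evaluation_criteria"],
                }
            )
    return items
```

Each returned item carries the source file name, so a graded response can be traced back to its category.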
I am a second-year undergraduate physics student at Jadavpur University. My academic work focuses on computational physics and numerical methods.
- GitHub: tmdeveloper007




