Mobile LLM Benchmarking Suite

A comprehensive evaluation of Small Language Models (SLMs) running locally on mobile hardware. This project compares SmolLM (360M), Qwen2 (0.5B), and Phi-3.5-mini (3.8B) across four technical domains: General Knowledge, Python Coding, Math Reasoning, and Science Logic.

Environment and Hardware

Tester: Tathagata Mitra
Device: POCO F1 (M1805E10A)
Processor: Octa-core Max 2.8GHz (Snapdragon 845)
RAM: 6.00 GB
OS: Android 8.1.0 (MIUI Global 10.0)
Platform: Android via PocketPal AI
Model Format: GGUF (Quantization: Q4_K_M / Q8_0)

Methodology

This benchmark uses an "LLM-as-a-Judge" approach: a single judge model scores every response against the same written criteria, which keeps scoring consistent across the models under test.

Inference: Each model was prompted with 20 targeted questions (5 per category) within the PocketPal AI mobile environment.
Evaluation: The raw outputs were judged by Gemini 3.1 Pro.
Scoring: Gemini evaluated responses based on the specific technical evaluation_criteria defined in the project data files.
UX Evaluation: Beyond accuracy, a comparative UX analysis on the physical device (6 GB RAM) covered latency, thermal stability, and RAM usage.
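The evaluation step above can be sketched as a small helper that assembles the text handed to the judge model. This is a minimal illustration, not the prompt actually used in this project: the function name build_judge_prompt, the prompt wording, and the 0 to 10 scale are all assumptions. Only the idea of scoring a response against evaluation_criteria comes from the methodology.

```python
def build_judge_prompt(question: str, response: str, criteria: str,
                       max_score: int = 10) -> str:
    """Assemble the grading prompt for one model response.

    The wording and the 0..max_score scale are hypothetical; adapt them
    to whatever the judge model and the data files actually use.
    """
    return (
        "You are grading an answer produced by a small language model.\n\n"
        f"Question:\n{question}\n\n"
        f"Model response:\n{response}\n\n"
        f"Evaluation criteria:\n{criteria}\n\n"
        f"Give an integer score from 0 to {max_score} "
        "and a one-sentence justification."
    )
```

Using the same template for every model is what makes the judge's scores comparable; only the response text changes between runs.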

Results and Analysis

Overall Performance

The overall chart shows each model's weighted average score across all four benchmark categories.
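For reference, a weighted average over categories can be computed as below. This is a sketch under assumptions: the README does not state the weights, so the example weights each category by its question count (5 each), and the function name weighted_average is hypothetical. With equal weights the result is simply the plain mean of the category scores.

```python
def weighted_average(scores: dict, weights: dict) -> float:
    """Return sum(score * weight) / sum(weight) over the categories in scores."""
    total_weight = sum(weights[category] for category in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(scores[category] * weights[category] for category in scores)
    return weighted_sum / total_weight


# Assumed weights: one unit per question, 5 questions per category.
QUESTION_COUNTS = {
    "General Knowledge": 5,
    "Python Coding": 5,
    "Math Reasoning": 5,
    "Science Logic": 5,
}
```

Calling weighted_average(category_means, QUESTION_COUNTS) with a dictionary of per-category mean scores for one model gives that model's overall score.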

Category Breakdowns

[Figures: per-category histograms for General Knowledge, Python Coding, Math Reasoning, and Science Logic]

Repository Structure

Data/: Contains the ground-truth prompts in JSON format.
Results/Histogram_plots/: High-resolution histograms generated from the benchmarking data.
Results/Output/output_responses/: Raw output responses from each model.
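To inspect the prompt files programmatically, something like the following may help. It assumes only that Data/ holds files ending in .json; it makes no assumption about the fields inside them, and the function name load_prompt_files is hypothetical.

```python
import json
from pathlib import Path


def load_prompt_files(data_dir: str = "Data") -> dict:
    """Parse every *.json file directly under data_dir.

    Returns a dict mapping file name to the parsed JSON content,
    whatever its structure happens to be.
    """
    loaded = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```

Printing the keys of the returned dictionary is a quick way to see which prompt files are available before reading their contents.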

How to Reproduce

Install PocketPal AI on your Android device.
Download the models in GGUF format (SmolLM-360M-Instruct, Qwen2-0.5B-Instruct, Phi-3.5-mini-instruct).
Run the Prompts: Enter each prompt from the Data/ directory into PocketPal AI and save the model's response.
Grade the Output: Use Gemini 3.1 Pro to score each model's response against the provided evaluation_criteria.
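Once the judge has returned a score for every response, the scores can be averaged per model and category. The sketch below assumes the scores were collected by hand into (model, category, score) tuples; that layout and the function name mean_scores are illustrative and not part of the repository.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(records):
    """Average judge scores per (model, category) pair.

    records: iterable of (model, category, score) tuples.
    """
    grouped = defaultdict(list)
    for model, category, score in records:
        grouped[(model, category)].append(score)
    return {key: mean(values) for key, values in grouped.items()}
```

The resulting dictionary holds one mean per model and category, which is the kind of per-category summary that a histogram can be drawn from.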

About the Author

I am a second-year undergraduate physics student at Jadavpur University. My academic work focuses on computational physics and numerical methods.

GitHub: tmdeveloper007
