MobileKernelBench evaluates LLM-generated mobile operator implementations against native inference frameworks. The repository currently contains NCNN and MNN pipelines plus a MoKA agent wrapper that can iteratively generate, compile, verify, and benchmark operator code.
This top-level README covers repository setup, datasets, environment requirements, and MoKA usage. Framework-specific commands live in the pipeline READMEs (mnnpipeline/README.md and ncnnpipeline/README.md).
Core directories:
- `MoKA/`: agent loop and unified pipeline interface.
- `mnnpipeline/`: MNN standalone pipeline, operator map, runtime helpers, CLI, and MNN-specific documentation.
- `ncnnpipeline/`: NCNN standalone pipeline, operator map, verification helpers, and NCNN-specific documentation.
- `pipeline_MNN/` and `pipeline_NCNN/`: compatibility packages used by MoKA.
- `prompt/`: common prompt and LLM API helpers.
- `MNN_utils/`: legacy MNN utilities and the MNN dataset used by the new `mnnpipeline`.
- `dataset/`: NCNN/PyTorch datasets and converted model artifacts.
- `MNN-3.3.0/`: local MNN 3.3.0 checkout used for MNN evaluation. This is third-party source code.
Important index files:
- `mnnpipeline/mnn_op_map.yaml`: maps MNN task names to PyTorch files, ONNX files, MNN model paths, target source folders, and allowed generated source files.
- `ncnnpipeline/ncnn_op_map.yaml`: maps NCNN task names to NCNN source file/layer/test metadata.
NCNN uses the dataset/ tree:
- `dataset/Mobilekernelbench`: PyTorch reference models.
- `dataset/Mobilekernelbench_onnx`: ONNX exports.
- `dataset/Mobilekernelbench_onnx_ncnn`: NCNN conversion workspace.
- `dataset/Mobilekernelbench_pt`: TorchScript or PyTorch-converted artifacts.
- `dataset/Mobilekernelbench_pt_ncnn_success`: NCNN-converted models used for benchmark inputs.
MNN uses the MNN_utils/dataset/ tree:
- `MNN_utils/dataset/mnn_dataset_test/Dataset_version1`: PyTorch reference models.
- `MNN_utils/dataset/mnn_dataset_test/Dataset_version1_onnx`: ONNX reference models.
- `MNN_utils/dataset/mnn_models/Dataset_version1`: generated or original MNN model files grouped by model/provider.
- `MNN_utils/dataset/mnn_models/Dataset_version1/original`: MNN models produced by the original MNN implementation during original-op evaluation.
The MNN map and NCNN map are the source of truth for locating task files. Prefer adding new task metadata there instead of hard-coding paths in scripts.
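For scripts that need task metadata, load the map rather than duplicating paths. A minimal sketch, assuming the map is a YAML mapping keyed by task name (the field name used below is illustrative, not confirmed):

```python
import yaml

# Load the MNN operator map; the top-level structure is assumed to be
# a mapping from task name to a metadata entry.
with open("mnnpipeline/mnn_op_map.yaml") as f:
    op_map = yaml.safe_load(f)

entry = op_map["Abs"]  # "Abs" is a task name used elsewhere in this README
# A key like "onnx" is hypothetical; inspect the YAML for the real field names.
print(sorted(entry), entry.get("onnx"))
```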
Use the repository root as the working directory:
```
cd /Users/zeezou/python/project/MobileKernelBench_git
```
Create or activate a Python environment, then install the common dependencies:
```
pip install pyyaml tqdm numpy onnx onnxruntime openai torch
```
For LLM-backed modes, configure OpenRouter:
```
export OPENROUTER_API_KEY='your-openrouter-key'
export OPENROUTER_MAX_TOKENS='your-max-token'
```
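The repository's prompt helpers read these variables when calling the LLM. As a rough sketch of the pattern (not the repository's exact code): OpenRouter exposes an OpenAI-compatible endpoint, so the `openai` client can target it directly.

```python
import os
from openai import OpenAI

# OpenRouter speaks the OpenAI chat API; only the base URL and key differ.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",  # a model id as passed via --model
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=int(os.environ.get("OPENROUTER_MAX_TOKENS", "4096")),
)
print(resp.choices[0].message.content)
```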
For Android performance tests, prepare the following (a preflight check is sketched after this list):
- `adb` available in `PATH`.
- An authorized Android device shown by `adb devices`.
- `ANDROID_NDK` set to a valid Android NDK path.
- CMake and Make available on the host.
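A minimal preflight script, using nothing beyond the tools listed above, can catch missing pieces before a long benchmark run:

```python
import os
import shutil
import subprocess

# Check that the required host tools are on PATH.
for tool in ("adb", "cmake", "make"):
    if shutil.which(tool) is None:
        raise SystemExit(f"{tool} not found in PATH")

# ANDROID_NDK must point at an existing directory.
ndk = os.environ.get("ANDROID_NDK", "")
if not os.path.isdir(ndk):
    raise SystemExit("ANDROID_NDK is unset or not a directory")

# At least one authorized device should appear in `adb devices`
# (authorized devices end with "device"; unauthorized ones do not).
out = subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
devices = [l for l in out.splitlines()[1:] if l.strip().endswith("device")]
if not devices:
    raise SystemExit("no authorized Android device found")
print("environment OK:", devices)
```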
Framework source requirements:
- MNN: use `MNN-3.3.0/` in this repository, or pass another checkout with `--mnn-root`.
- NCNN: use an NCNN checkout at `ncnn/`; see ncnnpipeline/README.md for setup details.
MoKA wraps a framework pipeline and runs iterative rounds. Each round can generate code, compile it, verify correctness, benchmark it, and use failure information to build the next prompt.
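The round structure can be pictured roughly as the loop below. This is an illustrative sketch only; the helper names (`initial_prompt`, `compile_and_verify`, `benchmark`, `build_repair_prompt`) are hypothetical and do not mirror MoKA's actual interfaces:

```python
def run_rounds(task, pipeline, llm, max_rounds=5):
    """Illustrative MoKA-style repair loop: generate, build, verify, benchmark."""
    prompt = pipeline.initial_prompt(task)           # hypothetical helper
    for _ in range(max_rounds):
        code = llm.generate(prompt)                  # LLM proposes operator code
        result = pipeline.compile_and_verify(task, code)
        if result.correct:
            result.latency = pipeline.benchmark(task, code)
            return result                            # success: stop iterating
        # Feed compiler errors / mismatches back into the next prompt.
        prompt = pipeline.build_repair_prompt(task, code, result.errors)
    return None                                      # exhausted rounds without success
```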
Show all options:
```
python MoKA/run_moka.py --help
```
Run an MNN task:
```
python MoKA/run_moka.py \
  --framework mnn \
  --task-name Abs \
  --mnn-root path/to/your/mnn/folder \
  --model model/from/openrouter \
  --max-rounds 5
```
Useful MNN options:
- `--mnn-op-map`: defaults to `mnnpipeline/mnn_op_map.yaml`.
- `--mnn-data-root`: defaults to `MNN_utils/dataset/mnn_dataset_test/Dataset_version1`.
- `--mnn-onnx-root`: defaults to `MNN_utils/dataset/mnn_dataset_test/Dataset_version1_onnx`.
- `--mnn-converted-root`: defaults to `MNN_utils/dataset/mnn_models/Dataset_version1`.
- `--mnn-root`: path to the MNN source checkout.
Run an NCNN task:
```
python MoKA/run_moka.py \
  --framework ncnn \
  --task-name Abs \
  --ncnn-op-map ncnnpipeline/ncnn_op_map.yaml \
  --ncnn-data-root dataset/Mobilekernelbench \
  --ncnn-converted-root dataset/Mobilekernelbench_pt_ncnn_success \
  --ncnn-prompt-config prompt/prompt_template.yaml \
  --model anthropic/claude-sonnet-4.5 \
  --max-rounds 5
```
MoKA writes round-level and final files under:
```
MoKA_plus_response/<framework>/<task_name>/
MoKA_plus_results/<framework>/<task_name>.json
```
Typical files include prompts, LLM responses, pipeline results, memory history, and final summaries.
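To inspect a finished run programmatically, the final JSON can be loaded directly. A minimal sketch; the fields inside the JSON are not documented here, so the code prints the keys rather than assuming a schema:

```python
import json
from pathlib import Path

# Path layout follows the output convention above.
result_path = Path("MoKA_plus_results") / "mnn" / "Abs.json"
with result_path.open() as f:
    summary = json.load(f)

print(sorted(summary))  # discover the top-level fields of the summary
```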
Use standalone pipelines when you want to test one framework without the MoKA repair loop.
- MNN standalone commands: mnnpipeline/README.md
- NCNN standalone commands: ncnnpipeline/README.md
Recommended workflow:
- Run standalone correctness without benchmark first.
- Enable Android benchmark only after host correctness passes.
- Use MoKA after the standalone pipeline is confirmed to work for the target framework.
We build our pipelines on MNN and NCNN and thank both projects for their great work; see their repositories at MNN and NCNN.
If you find MobileKernelBench useful, please cite:
```
@misc{zou2026mobilekernelbenchllmswriteefficient,
  title={MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?},
  author={Xingze Zou and Jing Wang and Yuhua Zheng and Xueyi Chen and Haolei Bai and Lingcheng Kong and Syed A. R. Abu-Bakar and Zhaode Wang and Chengfei Lv and Haoji Hu and Huan Wang},
  year={2026},
  eprint={2603.11935},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2603.11935},
}
```