
MobileKernelBench

MobileKernelBench evaluates LLM-generated mobile operator implementations against native inference frameworks. The repository currently contains NCNN and MNN pipelines plus a MoKA agent wrapper that can iteratively generate, compile, verify, and benchmark operator code.

This top-level README covers repository setup, datasets, environment requirements, and MoKA usage. Framework-specific commands live in the pipeline READMEs (mnnpipeline/README.md and ncnnpipeline/README.md).

Repository Structure

Core directories:

  • MoKA/: agent loop and unified pipeline interface.
  • mnnpipeline/: MNN standalone pipeline, operator map, runtime helpers, CLI, and MNN-specific documentation.
  • ncnnpipeline/: NCNN standalone pipeline, operator map, verification helpers, and NCNN-specific documentation.
  • pipeline_MNN/ and pipeline_NCNN/: compatibility packages used by MoKA.
  • prompt/: common prompt and LLM API helpers.
  • MNN_utils/: legacy MNN utilities and the MNN dataset used by the new mnnpipeline.
  • dataset/: NCNN/PyTorch datasets and converted model artifacts.
  • MNN-3.3.0/: local MNN 3.3.0 checkout used for MNN evaluation. This is third-party source code.

Important index files:

  • mnnpipeline/mnn_op_map.yaml: maps MNN task names to PyTorch files, ONNX files, MNN model paths, target source folders, and allowed generated source files.
  • ncnnpipeline/ncnn_op_map.yaml: maps NCNN task names to NCNN source file/layer/test metadata.
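
As an illustration only, an entry in such a map might look like the sketch below. The key names are hypothetical, not the actual schema; consult mnnpipeline/mnn_op_map.yaml for the real field names.

```yaml
# Hypothetical shape of one mnn_op_map.yaml entry; real key names may differ.
Abs:
  pytorch_file: MNN_utils/dataset/mnn_dataset_test/Dataset_version1/Abs.py
  onnx_file: MNN_utils/dataset/mnn_dataset_test/Dataset_version1_onnx/Abs.onnx
  mnn_model: MNN_utils/dataset/mnn_models/Dataset_version1/Abs/original/Abs.mnn
  target_source_dir: MNN-3.3.0/source/backend/cpu
  allowed_generated_files:
    - CPUAbs.cpp
    - CPUAbs.hpp
```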

Datasets

NCNN uses the dataset/ tree:

  • dataset/Mobilekernelbench: PyTorch reference models.
  • dataset/Mobilekernelbench_onnx: ONNX exports.
  • dataset/Mobilekernelbench_onnx_ncnn: NCNN conversion workspace.
  • dataset/Mobilekernelbench_pt: TorchScript or PyTorch-converted artifacts.
  • dataset/Mobilekernelbench_pt_ncnn_success: NCNN converted models used for benchmark inputs.

MNN uses the MNN_utils/dataset/ tree:

  • MNN_utils/dataset/mnn_dataset_test/Dataset_version1: PyTorch reference models.
  • MNN_utils/dataset/mnn_dataset_test/Dataset_version1_onnx: ONNX reference models.
  • MNN_utils/dataset/mnn_models/Dataset_version1: generated or original MNN model files grouped by model/provider.
  • MNN_utils/dataset/mnn_models/Dataset_version1/original: MNN models generated from the original MNN implementation during original-op evaluation.

The MNN map and NCNN map are the source of truth for locating task files. Prefer adding new task metadata there instead of hard-coding paths in scripts.
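
To make this concrete, here is a minimal sketch of resolving a task through the map rather than hard-coding paths. The inline dict stands in for the parsed YAML (in practice you would load it with `yaml.safe_load`), and the entry's field names are assumptions, not the real schema.

```python
# Sketch of looking up a task's files through an operator map instead of
# hard-coding paths in scripts. The schema below is illustrative; the real
# keys live in mnnpipeline/mnn_op_map.yaml and ncnnpipeline/ncnn_op_map.yaml.
op_map = {
    "Abs": {  # hypothetical entry
        "pytorch_file": "MNN_utils/dataset/mnn_dataset_test/Dataset_version1/Abs.py",
        "onnx_file": "MNN_utils/dataset/mnn_dataset_test/Dataset_version1_onnx/Abs.onnx",
    },
}

def resolve_task(op_map, task_name):
    """Return a task's file-path entry, failing loudly on unknown tasks."""
    try:
        return op_map[task_name]
    except KeyError:
        raise KeyError(
            f"task {task_name!r} not in op map; add its metadata to the map "
            "instead of hard-coding paths in scripts"
        )

print(resolve_task(op_map, "Abs")["onnx_file"])
```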

Environment Setup

Use the repository root as the working directory:

cd /path/to/MobileKernelBench_git

Create or activate a Python environment, then install the common dependencies:

pip install pyyaml tqdm numpy onnx onnxruntime openai torch
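
A quick sanity check (not part of the repository) can confirm the dependencies above import cleanly in the active environment. Note that the `pyyaml` package imports as `yaml`; the rest import under their own names.

```python
# Hypothetical sanity check: report which of the common dependencies
# listed above are importable in the current Python environment.
import importlib

def check_imports(modules):
    """Return {module_name: True/False} for importability."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

for name, ok in check_imports(
    ["yaml", "tqdm", "numpy", "onnx", "onnxruntime", "openai", "torch"]
).items():
    print(("ok: " if ok else "missing: ") + name)
```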

For LLM-backed modes, configure OpenRouter:

export OPENROUTER_API_KEY='your-openrouter-key'
export OPENROUTER_MAX_TOKENS='your-max-tokens'

For Android performance tests, prepare:

  • adb available in PATH.
  • An authorized Android device shown by adb devices.
  • ANDROID_NDK set to a valid Android NDK path.
  • CMake and Make available on the host.
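
The prerequisites above can be checked before running anything. The script below is a hypothetical preflight helper, not part of the repository; it only reports status and does not verify that the device is authorized (run `adb devices` yourself for that).

```python
# Hypothetical preflight check for the Android benchmark prerequisites
# listed above: adb/cmake/make on PATH and a valid ANDROID_NDK directory.
import os
import shutil

def preflight():
    """Return {requirement: bool} for the Android benchmark prerequisites."""
    status = {}
    for tool in ("adb", "cmake", "make"):
        status[tool] = shutil.which(tool) is not None
    ndk = os.environ.get("ANDROID_NDK", "")
    status["ANDROID_NDK"] = bool(ndk) and os.path.isdir(ndk)
    return status

for name, ok in preflight().items():
    print(("ok: " if ok else "missing: ") + name)
```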

Framework source requirements:

  • MNN: use MNN-3.3.0/ in this repository or pass another checkout with --mnn-root.
  • NCNN: use an NCNN checkout at ncnn/; see ncnnpipeline/README.md for setup details.

MoKA Agent Usage

MoKA wraps a framework pipeline and runs iterative rounds. Each round can generate code, compile it, verify correctness, benchmark it, and use failure information to build the next prompt.

Show all options:

python MoKA/run_moka.py --help

Run MoKA With MNN

python MoKA/run_moka.py \
  --framework mnn \
  --task-name Abs \
  --mnn-root path/to/your/mnn/folder \
  --model model/from/openrouter \
  --max-rounds 5

Useful MNN options:

  • --mnn-op-map: defaults to mnnpipeline/mnn_op_map.yaml.
  • --mnn-data-root: defaults to MNN_utils/dataset/mnn_dataset_test/Dataset_version1.
  • --mnn-onnx-root: defaults to MNN_utils/dataset/mnn_dataset_test/Dataset_version1_onnx.
  • --mnn-converted-root: defaults to MNN_utils/dataset/mnn_models/Dataset_version1.
  • --mnn-root: path to the MNN source checkout.

Run MoKA With NCNN

python MoKA/run_moka.py \
  --framework ncnn \
  --task-name Abs \
  --ncnn-op-map ncnnpipeline/ncnn_op_map.yaml \
  --ncnn-data-root dataset/Mobilekernelbench \
  --ncnn-converted-root dataset/Mobilekernelbench_pt_ncnn_success \
  --ncnn-prompt-config prompt/prompt_template.yaml \
  --model anthropic/claude-sonnet-4.5 \
  --max-rounds 5

MoKA Outputs

MoKA writes round-level and final files under:

MoKA_plus_response/<framework>/<task_name>/
MoKA_plus_results/<framework>/<task_name>.json

Typical files include prompts, LLM responses, pipeline results, memory history, and final summaries.
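
For batch reporting, the per-task final JSON files can be summarized with a few lines of Python. The key names used below ("task", "passed", "rounds") are assumptions about the schema, not the actual field names; inspect a real file under MoKA_plus_results/ to see the true layout.

```python
# Hypothetical reader for a MoKA final result file
# (MoKA_plus_results/<framework>/<task_name>.json). Field names are
# assumed, not taken from the actual output schema.
import json
import pathlib

def summarize(result_path):
    """One-line summary of a final result JSON (hypothetical schema)."""
    data = json.loads(pathlib.Path(result_path).read_text())
    verdict = "pass" if data["passed"] else "fail"
    return f"{data['task']}: {verdict} after {data['rounds']} round(s)"
```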

Standalone Framework Pipelines

Use standalone pipelines when you want to test one framework without the MoKA repair loop.

Recommended workflow:

  1. Run standalone correctness without benchmark first.
  2. Enable Android benchmark only after host correctness passes.
  3. Use MoKA after the standalone pipeline is confirmed to work for the target framework.

Thanks

Our pipelines build on MNN and NCNN; thanks to both teams for their great work. Check out their repositories: MNN and NCNN.

Citation

@misc{zou2026mobilekernelbenchllmswriteefficient,
      title={MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?}, 
      author={Xingze Zou and Jing Wang and Yuhua Zheng and Xueyi Chen and Haolei Bai and Lingcheng Kong and Syed A. R. Abu-Bakar and Zhaode Wang and Chengfei Lv and Haoji Hu and Huan Wang},
      year={2026},
      eprint={2603.11935},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.11935}, 
}
