notmecab-rs is a very basic mecab clone, designed only to do parsing, not training.

notmecab-rs loads everything into memory, so it has higher memory requirements than mecab, which uses memory mapping for most things.

It is meant to be used as a library by other tools, such as frequency analyzers, not directly by people. It also only works with UTF-8 dictionaries. (Stop using encodings other than UTF-8 for infrastructural software.)
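As a rough illustration of the UTF-8 requirement, a caller could sanity-check text pulled from a dictionary's source files before converting and loading it. This is only a sketch: `check_utf8` is a hypothetical helper and not part of notmecab-rs, and it says nothing about how the library itself reads dictionaries.

```rust
/// Hypothetical helper (not part of notmecab-rs): returns the decoded text if
/// `bytes` is valid UTF-8, or the byte offset of the first invalid sequence.
fn check_utf8(bytes: &[u8]) -> Result<&str, usize> {
    // `std::str::from_utf8` validates without copying; `valid_up_to` reports
    // how many leading bytes were valid before the first error.
    std::str::from_utf8(bytes).map_err(|e| e.valid_up_to())
}
```

Text in a legacy encoding such as Shift_JIS will usually fail this check at or near the first non-ASCII character.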

Licensed under the Apache License, Version 2.0.


Get unidic's sys.dic, matrix.bin, unk.dic, and char.bin and put them in data/. Then invoke tests from the repository root.

Example (from tests):

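use std::fs::File;
use std::io::BufReader;

// Dict, parse, and tokenstream_to_string come from this crate
// you need to acquire a mecab dictionary and place these files here manually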
let mut sysdic = BufReader::new(File::open("data/sys.dic").unwrap());
let mut matrix = BufReader::new(File::open("data/matrix.bin").unwrap());
let mut unkdic = BufReader::new(File::open("data/unk.dic").unwrap());
let mut unkdef = BufReader::new(File::open("data/char.bin").unwrap());

let dict = Dict::load(&mut sysdic, &mut matrix, &mut unkdic, &mut unkdef).unwrap();

let result = parse(&dict, &"これを持っていけ".to_string()).unwrap();

for token in &result.0
{
    println!("{}", token.feature);
}
let split_up_string = tokenstream_to_string(&result.0, "|");
println!("{}", split_up_string);
assert_eq!(split_up_string, "これ|を|持っ|て|いけ"); // this test might fail if you're not testing with unidic (i.e. the correct parse might be different)

Output of example:


You can also call parse_to_lexertoken, which does less string allocation, but you don't get the feature string as a string.
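Since the library is aimed at tools like frequency analyzers, here is a minimal sketch of what such a tool might do with the delimiter-joined string from the example above. `count_surfaces` is a hypothetical helper and not part of notmecab-rs. It works only on the plain string, so it does not depend on the library's token types.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of notmecab-rs): counts how often each
/// surface form occurs in a delimiter-joined token string, such as the one
/// produced by `tokenstream_to_string(&result.0, "|")` in the example.
fn count_surfaces(split_up: &str, delimiter: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // Skip empty pieces so that empty input or a trailing delimiter
    // does not produce a bogus "" entry.
    for surface in split_up.split(delimiter).filter(|s| !s.is_empty()) {
        *counts.entry(surface.to_string()).or_insert(0) += 1;
    }
    counts
}
```

This assumes the delimiter never appears in the parsed text itself. A real analyzer would more likely iterate over the tokens directly instead of re-splitting a string.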


  • This software is unusably slow if optimizations are disabled.
  • Cost rewriting is not performed when user dictionaries are loaded.
  • There are some cases where multiple parses tie for the lowest cost. It's not defined which parse gets chosen in these cases.
  • There are some cases where mecab fails to find an ideal parse, but notmecab-rs does. notmecab-rs should never produce a parse that has a higher total cost than the parse that mecab gives. If it does, that indicates an underlying bug; please report it.
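The notes on ties and total cost can be made concrete with a small sketch. It assumes the usual mecab-style cost model, in which a parse's total cost is the sum of each token's word cost plus a connection cost looked up by the previous token's right context ID and the next token's left context ID. `Node`, `path_cost`, and `compare_parses` are illustrative names, not notmecab-rs types or functions, and connections to the beginning and end of the sentence are left out for brevity.

```rust
use std::cmp::Ordering;

/// One token on a candidate path. Illustrative only, not a notmecab-rs type.
struct Node {
    left_id: usize,  // left context ID, used when connecting from the previous token
    right_id: usize, // right context ID, used when connecting to the next token
    word_cost: i64,  // cost of the dictionary entry itself
}

/// Total cost of a path: all word costs plus the connection cost between
/// each adjacent pair, read as `matrix[prev.right_id][next.left_id]`.
fn path_cost(path: &[Node], matrix: &[Vec<i64>]) -> i64 {
    let word_costs: i64 = path.iter().map(|n| n.word_cost).sum();
    let connection_costs: i64 = path
        .windows(2)
        .map(|pair| matrix[pair[0].right_id][pair[1].left_id])
        .sum();
    word_costs + connection_costs
}

/// Compares two candidate parses by total cost. `Ordering::Equal` is the
/// tie case, where which parse gets chosen is not defined.
fn compare_parses(a: &[Node], b: &[Node], matrix: &[Vec<i64>]) -> Ordering {
    path_cost(a, matrix).cmp(&path_cost(b, matrix))
}
```

Under this model, a parse from notmecab-rs comparing as `Ordering::Greater` against mecab's parse for the same input and dictionary is the situation described above as a bug worth reporting.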