Commit

Update README.md
timwarnock committed Jun 7, 2019
1 parent 2e3e473 commit 473e828
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -81,7 +81,7 @@ Scan through a 10,000 x 10,000 character grid. As expected, datrie outperformed
0inputs+0outputs (0major+76902minor)pagefaults 0swaps

## Trie vs set() for Japanese (N=1,000,000)
-Interestingly, for Japanese characters (scanning through a random grid of mostly Kanji), the performance difference was more pronounced. This is useful to know because Japanese (like Chinese) does not use obvious word boundaries and would benefit from using set() rather than Trie for Japanese language parsers.
+Interestingly, for Japanese characters (scanning through a random grid of mostly Kanji), the performance difference was more pronounced. This is useful to know because Japanese (like Chinese) does not use obvious word boundaries and would benefit from using set() rather than Trie for Japanese language parsers. For 日本.txt, I extracted all kanji and kana from [EDICT](http://edrdg.org/jmdict/edict.html).

$ /usr/bin/time ./test_j_set.py
4140
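
The diff does not show the contents of `test_j_set.py`, so the following is only a minimal sketch of the set()-based approach the paragraph describes, not the author's script. It assumes a scan that tries every substring up to the longest vocabulary entry at each position, which is one way to find words in text without word boundaries. The names `find_words` and `vocab`, and the tiny sample vocabulary, are hypothetical.

```python
def find_words(text, vocab, max_len=None):
    """Return (start, word) for every substring of text that is in vocab.

    vocab is a plain Python set; max_len caps the substring length so the
    inner loop stays bounded by the longest vocabulary entry.
    """
    if max_len is None:
        max_len = max((len(w) for w in vocab), default=0)
    hits = []
    for start in range(len(text)):
        # Try every candidate length from 1 up to max_len at this position.
        for end in range(start + 1, min(start + max_len, len(text)) + 1):
            candidate = text[start:end]
            if candidate in vocab:  # average O(1) hash lookup
                hits.append((start, candidate))
    return hits


# Hypothetical sample data, not taken from 日本.txt.
vocab = {"日本", "日本語", "本", "語"}
hits = find_words("日本語の本", vocab)
print(hits)
```

A trie can stop extending a candidate as soon as no vocabulary entry shares its prefix, whereas this sketch always tries every length up to `max_len`. With short entries that cap is small, so the set version does little extra work per position.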