taixhi/kanji_frequency

Kanji Frequency Calculator

Visualisation

Use a Wikipedia dump to produce a kanji frequency CSV.

Requirements

How to Use

I have dumped the frequency hashmap and the CSV, generated from 500 MB of Wikipedia data, for your viewing pleasure: frqdist.csv

How to use with your own corpus

Download the Japanese Wikipedia dump and parse it with WikiExtractor.

To generate the kanji frequency hashmap:

python3 main.py
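
The internals of main.py are not shown here, so the following is only a minimal sketch of how a kanji frequency hashmap could be built from extracted text. It assumes that "kanji" means code points in the CJK Unified Ideographs block (U+4E00 to U+9FFF), that the extracted corpus is a directory of UTF-8 text files, and that the hashmap is saved with pickle. The function names and the file layout are hypothetical.

```python
import pickle
import re
from collections import Counter
from pathlib import Path

# Assumption: "kanji" = CJK Unified Ideographs block (U+4E00..U+9FFF).
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def count_kanji(text, counts=None):
    """Add every kanji occurrence in `text` to `counts` and return it."""
    if counts is None:
        counts = Counter()
    counts.update(KANJI_RE.findall(text))
    return counts


def count_kanji_in_dir(root):
    """Count kanji across every regular file under `root` (hypothetical layout)."""
    counts = Counter()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            count_kanji(text, counts)
    return counts


def save_counts(counts, path):
    """Persist the frequency hashmap as a plain dict (assumed pickle format)."""
    with open(path, "wb") as f:
        pickle.dump(dict(counts), f)
```

For example, count_kanji("日本語の日本") counts 日 and 本 twice each and 語 once, and ignores the hiragana の.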

Use the saved hashmap to create a CSV:

python3 load.py
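
How load.py does this is not shown here, so the following is only a rough sketch of turning a saved hashmap into a CSV. It assumes the hashmap is a pickled dict mapping each kanji to its count, and that the CSV has one row per kanji, most frequent first. The column names and function names are hypothetical and may not match the layout of frqdist.csv.

```python
import csv
import pickle


def load_counts(path):
    """Load the frequency hashmap (assumed to be a pickled dict)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def frequency_rows(counts):
    """Return (kanji, count) pairs, most frequent first; ties sorted by code point."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_frequency_csv(counts, csv_path):
    """Write the sorted rows to `csv_path` with a hypothetical header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["kanji", "count"])
        writer.writerows(frequency_rows(counts))
```

Sorting before writing keeps the most common kanji at the top of the file, which is convenient when pasting the result into a visualisation tool.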

Optional:

Use WordItOut for visualisation
