wdimmy/LmCSC

LmCSC (Language Model-based Chinese Spelling Check)

This is an implementation of a Chinese spelling check system.

Quick Links

About

The system mainly consists of the following three parts:

  • A Tri-gram Language Model
  • A confusion set
  • Other sources
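
As a rough illustration of how a tri-gram language model and a confusion set can work together, here is a minimal sketch. It is not the code used in this repository: the toy corpus, the `CONFUSION` table, and the helper names `ngrams`, `score`, and `correct` are all hypothetical, and a small add-one smoothed character model stands in for the real KenLM model. It tries one substitution at a time and keeps a candidate only if it scores strictly higher than the input.

```python
import math
from collections import Counter

# Hypothetical toy data; the real system relies on a large KenLM model
# and its own confusion set.
CORPUS = ["我喜欢吃苹果", "我喜欢喝茶", "他喜欢吃苹果"]
CONFUSION = {"平": ["苹", "评"], "渴": ["喝"]}


def ngrams(chars, n):
    """Return character n-grams with start/end padding."""
    padded = ["<s>"] * (n - 1) + list(chars) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


# Tri-gram counts and the counts of their two-character contexts.
TRI = Counter(g for s in CORPUS for g in ngrams(s, 3))
CTX = Counter(g[:2] for s in CORPUS for g in ngrams(s, 3))
VOCAB = {c for s in CORPUS for c in s} | {"</s>"}


def score(sentence):
    """Add-one smoothed tri-gram log-probability of a sentence."""
    return sum(
        math.log((TRI[g] + 1) / (CTX[g[:2]] + len(VOCAB)))
        for g in ngrams(sentence, 3)
    )


def correct(sentence):
    """Try each single-character substitution from the confusion set
    and return the highest-scoring sentence."""
    best, best_score = sentence, score(sentence)
    chars = list(sentence)
    for i, ch in enumerate(chars):
        for cand in CONFUSION.get(ch, []):
            trial = "".join(chars[:i] + [cand] + chars[i + 1:])
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best


print(correct("我喜欢吃平果"))
```

Restricting candidates to the confusion set keeps the search small: only characters known to be confused with each other are scored, rather than the whole vocabulary.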

Demo

Installation

In addition to the commonly pre-installed Python libraries, a few extra packages are needed to run the system. The required packages are listed in requirements.txt. Run the following commands to clone the repository and install LmCSC:

git clone https://github.com/wdimmy/LmCSC.git
cd LmCSC; pip install -r requirements.txt; python setup.py develop

Note: requirements.txt covers only a subset of the packages that may be required. Depending on what you want to run, you might need to install extra packages.

You can train the language model yourself using kenlm, or download our pre-trained model by running:

chmod +x ./download.sh
./download.sh

NOTE: We provide two versions of the trained model:

kenlm_3.bin (about 13GB): https://pan.baidu.com/s/1g7LL_sLs-ra2l9VxeDp-9w Extraction code: 0u3q

kenlm_3_small.bin (about 3GB): https://pan.baidu.com/s/1mMVVHmNtM_FXLJ5yIiRX7Q Extraction code: 91qj

The larger model (kenlm_3.bin) works better.
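
Once a model file is available, it can be queried from Python through the kenlm bindings. The sketch below is only an assumption about how one might do this, not the loading code used by LmCSC: the helper `load_scorer` is hypothetical, the path `kenlm_3.bin` assumes the file sits in the current directory, and joining characters with spaces assumes the model was trained on space-separated characters.

```python
import os


def load_scorer(model_path):
    """Return a function mapping a sentence to its log10 probability,
    or None when the model file is missing."""
    if not os.path.exists(model_path):
        return None
    import kenlm  # assumption: the kenlm Python bindings are installed

    model = kenlm.Model(model_path)
    # Assumption: the model was trained on space-separated characters.
    return lambda sentence: model.score(" ".join(sentence), bos=True, eos=True)


scorer = load_scorer("kenlm_3.bin")  # hypothetical location of the model
if scorer is not None:
    print(scorer("我喜欢吃苹果"))
```

Returning None for a missing file lets calling code fall back gracefully, for example by prompting the user to run download.sh first.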
