Skip to content

yongyehuang/New-Word-Detection

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

新词发现算法(New Word Detection)

Difference with the original fork: Updated to python 3

参考:
Reference:

代码说明:
Code describing:

  • extract.py:
    提供文档中所有成词的可能组合,以及计算词的点互信息时的一个组合。
    Providing all possible word combines in document, and computing all word combines of PMI.
  • entropy.py:
    计算左右邻居熵的大小
    Computing left and right entropy of neighbors
  • wordseg.py:
    根据计算出的频数、点互信息和左右熵来找出成词可能的组合,最后与词典对比找出新词。
    Finding all possible word combines by frequency, PMI and left/right entropy of neighbors, then comparing with diction to detect new word.

About

新词发现算法(NewWordDetection)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%