yoongi0428/Kor-Sentence-Similarity

Kor-Sentence-Similarity

Sentence/Text Similarity for Korean (In Simple Way)

Models

Details

  • Data

    • In the data, the two questions on each line are separated by '\t'
  • Preprocessing

    • Character level: phoneme (jamo) or syllable (eumjeol)
    • Digits and special characters
    • For eumjeol (syllable) mode, use the 2350 frequent syllables
  • Configuration

    • main.py : main run file
    • --epochs : # of training epochs
    • --batch : Batch Size
    • --lr : Learning rate
    • --strmaxlen : Maximum string length
    • --charsize : Vocab Size
    • filter_num : Number of filters for each CNN filter
    • --emb : Embedding Dimension
    • --eumjeol : Use Eumjeol(Syllable-level) if specified
    • threshold : Threshold for deciding whether two sentences are similar
    • --model : Model Selection (CNN, MLP)
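
The Data and Preprocessing items above can be illustrated with a minimal sketch. It is not the repository's code: the function names (split_pair, to_jamo, encode), the padding and unknown ids, and the vocabulary passed to encode are all assumptions made for illustration. Only the tab-separated pair format, the phoneme-level option, and the role of a maximum string length come from the README. The jamo decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables.

```python
# Hypothetical sketch: names and id scheme are assumptions, not the repo's code.
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none


def split_pair(line):
    """Split one data line into its two tab-separated questions."""
    q1, q2 = line.rstrip("\n").split("\t", 1)
    return q1, q2


def to_jamo(text):
    """Decompose Hangul syllables into jamo; other characters pass through."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            out.append(CHOSEONG[offset // 588])          # 588 = 21 * 28
            out.append(JUNGSEONG[(offset % 588) // 28])
            tail = JONGSEONG[offset % 28]
            if tail:                                     # skip the empty final
                out.append(tail)
        else:
            out.append(ch)
    return out


def encode(chars, vocab, max_len, pad_id=0, unk_id=1):
    """Map characters to ids, truncating or padding to max_len (assumed scheme)."""
    ids = [vocab.get(c, unk_id) for c in chars[:max_len]]
    return ids + [pad_id] * (max_len - len(ids))
```

In this sketch, max_len plays the role of --strmaxlen and the size of vocab would correspond to --charsize; how the repository actually builds its vocabulary is not stated here.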

To Run

  • Set the FC layers and CNN layers in 'main.py'
  • Run 'main.py' with the arguments you want
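
As a rough illustration of how the listed arguments might be wired up, here is an argparse sketch. The flag names come from the Configuration list; the types, default values, and help strings are assumptions, and the real main.py may differ. filter_num and threshold are left out because the list shows them without a leading `--`, so it is unclear how they are set.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the README's flags; defaults are assumed."""
    p = argparse.ArgumentParser(description="Korean sentence similarity (sketch)")
    p.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    p.add_argument("--batch", type=int, default=64, help="batch size")
    p.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    p.add_argument("--strmaxlen", type=int, default=100, help="maximum string length")
    p.add_argument("--charsize", type=int, default=2350, help="vocabulary size")
    p.add_argument("--emb", type=int, default=16, help="embedding dimension")
    p.add_argument("--eumjeol", action="store_true", help="use syllable-level input")
    p.add_argument("--model", choices=["CNN", "MLP"], default="CNN", help="model selection")
    return p


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--epochs", "5", "--model", "MLP", "--eumjeol"])
print(args.epochs, args.model, args.eumjeol)
```

Under these assumptions, the matching command line would be `python main.py --epochs 5 --model MLP --eumjeol`; the values are illustrative only.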
