zrb250 / word2vec Public

forked from hehuihui1994/word2vec

A C-language version of the word2vec tool

Files:
- .gitattributes
- .gitignore
- README.md
- compute-accuracy.c
- demo-analogy.sh
- demo-classes.sh
- demo-word-accuracy.sh
- demo-word.sh
- distance.c
- makefile
- text8.zip
- word-analogy.c
- word2phrase.c
- word2vec.c


word2vec

This is the word2vec toolkit. The official introduction follows.

Tools for computing distributed representations of words

We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG), as well as several demo scripts.

Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous Bag-of-Words or the Skip-Gram neural network architectures. The user should specify the following:
- desired vector dimensionality
- the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model
- training algorithm: hierarchical softmax and/or negative sampling
- threshold for downsampling the frequent words
- number of threads to use
- the format of the output word vector file (text or binary)
Usually, the other hyper-parameters such as the learning rate do not need to be tuned for different training sets.
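
To make the context-window parameter concrete, here is a minimal sketch of how a fixed symmetric window turns a sentence into (center, context) pairs for the Skip-Gram model. The names `pair_t` and `skipgram_pairs` are hypothetical and do not come from word2vec.c. The actual training loop is more involved and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* One (center, context) training pair, stored as token positions. */
typedef struct {
    size_t center;
    size_t context;
} pair_t;

/*
 * Enumerate Skip-Gram pairs for a sentence of n_tokens tokens using a
 * fixed symmetric window (an assumption made for this sketch).
 * Writes at most max_pairs entries to out (out may be NULL) and returns
 * the total number of pairs the sentence yields.
 */
size_t skipgram_pairs(size_t n_tokens, size_t window,
                      pair_t *out, size_t max_pairs)
{
    size_t count = 0;
    assert(window > 0);
    for (size_t i = 0; i < n_tokens; i++) {
        /* Clamp the window to the sentence boundaries. */
        size_t lo = (i > window) ? i - window : 0;
        size_t hi = (n_tokens - 1 - i > window) ? i + window : n_tokens - 1;
        for (size_t j = lo; j <= hi; j++) {
            if (j == i) continue; /* a word is not its own context */
            if (out != NULL && count < max_pairs) {
                out[count].center = i;
                out[count].context = j;
            }
            count++;
        }
    }
    return count;
}
```

Calling `skipgram_pairs(n, window, NULL, 0)` first returns the number of pairs, which can be used to size the output buffer for a second call. A larger window produces more pairs per word and therefore more training work per sentence.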

The script demo-word.sh downloads a small (100MB) text corpus from the web, and trains a small word vector model. After the training is finished, the user can interactively explore the similarity of the words.
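
Word similarity is commonly measured with cosine similarity, which reduces to a plain dot product when every vector has unit length. The sketch below ranks words that way. It assumes the vectors are already normalized and stored row by row in one flat array. The name `most_similar` is hypothetical and is not a function from this repository.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the word whose vector has the largest dot product
 * with the vector of word `query`, skipping the query word itself.
 * Assumes `vectors` holds n_words rows of `dim` floats, each row already
 * scaled to unit length, so the dot product equals cosine similarity.
 * If there is no other word, the query index is returned.
 */
size_t most_similar(const float *vectors, size_t n_words, size_t dim,
                    size_t query)
{
    size_t best = query;
    double best_score = 0.0;
    assert(n_words > 0 && dim > 0 && query < n_words);
    for (size_t w = 0; w < n_words; w++) {
        double dot = 0.0;
        if (w == query) continue;
        for (size_t k = 0; k < dim; k++) {
            dot += (double)vectors[query * dim + k] * vectors[w * dim + k];
        }
        /* Take the first candidate unconditionally, then keep the maximum. */
        if (best == query || dot > best_score) {
            best = w;
            best_score = dot;
        }
    }
    return best;
}
```

For an interactive tool, the same loop would typically keep the top N scores rather than only the best one.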

More information about the scripts is provided at https://code.google.com/p/word2vec/

word2vec principles and usage

word2vec blog
