Model Iaith Fectorau Word2vec ar sail corpora ymchwil yr Uned Technolegau Iaith a gasglwyd o ffynonellau amrywiol at ddibenion ymchwil fel cynhyrchu modelau iaith. | A Word2vec Language Model based on the Language Technologies Unit's research corpora.

techiaith/word2vec-cy

Model Iaith Fectorau Cymraeg | Welsh Vector Language Model

Model Iaith Fectorau Word2vec ar sail adnoddau ymchwil yr Uned Technolegau Iaith a gasglwyd o ffynonellau amrywiol.

A Word2vec Language Model based on the Language Technologies Unit's research resources, collected from various sources.

Gweler https://github.com/techiaith/word2vec-cy/tags a chlicio ar 'Latest' i gael at y data.

See https://github.com/techiaith/word2vec-cy/tags and click on 'Latest' to access the data.

NODYN: Mae ffurfiau'r model hwn bellach i gyd mewn llythrennau bach.

NOTE: All word forms in this model are now in lower case.
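
Because every form is stored in lower case, a query for a capitalised form such as 'Athro' would not be found as-is. The following is a minimal sketch of normalising a query before lookup. `lookup_key` is a hypothetical helper, not part of this repository, and a plain Python set stands in for the loaded vectors (Gensim's `KeyedVectors` supports the same `in` membership test).

```python
def lookup_key(word, vocab):
    """Return the lower-cased form of `word` if it is in `vocab`, else None."""
    key = word.strip().lower()
    return key if key in vocab else None


# Toy vocabulary standing in for the real model (assumption for illustration)
toy_vocab = {"athro", "athrawes", "dynes", "dynion"}

print(lookup_key("Athro", toy_vocab))     # athro
print(lookup_key("Caerdydd", toy_vocab))  # None
```

With the real model, `vocab` would be the loaded `KeyedVectors` object, and a `None` result signals that the word is out of vocabulary.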

I'w ddefnyddio gyda Gensim 4:

To use with Gensim 4:

pip install gensim

Yna:

Then:

from gensim.models import KeyedVectors

# Load the word vectors memory-mapped and read-only
wv = KeyedVectors.load("word2vec.wordvectors", mmap='r')

print("MODEL SIZE:", len(wv))  # 518,260


# Analogy query: start from 'athro' (= male teacher), add 'dynes' (= woman)
# and subtract the vector associated with 'dynion' (= men)
similar_to_athro = wv.most_similar(positive=['athro', 'dynes'], negative=['dynion'], topn=10)

# The top result should be 'athrawes' (= female teacher): adding 'dynes' contributes
# femaleness, while subtracting 'dynion' subtracts both maleness and the plural
# aspect found in 'athrawon' (= teachers)
print(similar_to_athro)

# RESULTS
# [('athrawes', 0.6490613222122192),
# ('addysgwr', 0.4838572144508362),
# ('ymarferydd', 0.4762175381183624),
# ('ymarferwr', 0.4626823663711548),
# ('aseswr', 0.462118536233902),
# ('tiwtor', 0.4528316557407379),
# ('hyfforddai', 0.4441806972026825),
# ('mentor', 0.43711039423942566),
# ('asesydd', 0.4269064962863922),
# ('prifathrawes', 0.4217046797275543)]
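
To make the arithmetic behind the query above more concrete, here is a rough pure-Python sketch of the idea: unit-normalise the input vectors, add the positive ones, subtract the negative ones, then rank the remaining words by cosine similarity to the result. The three-dimensional vectors (teacher-ness, gender, plurality) are invented for illustration and are not taken from the model, and this `most_similar` function is a simplified stand-in, not Gensim's implementation.

```python
import math

# Toy vectors: (teacher-ness, gender, plurality). Made up for illustration only.
TOY = {
    "athro":    [1.0, -1.0, 0.0],   # male teacher
    "athrawes": [1.0,  1.0, 0.0],   # female teacher
    "athrawon": [1.0,  0.0, 1.0],   # teachers
    "tiwtor":   [1.0,  0.0, 0.0],   # tutor
    "dynes":    [0.0,  1.0, 0.0],   # woman
    "dynion":   [0.0, -1.0, 1.0],   # men
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar(vectors, positive, negative, topn=10):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    signed = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += sign * x
    query = unit(query)

    used = set(positive) | set(negative)  # the query words themselves are excluded
    scored = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in used
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


ranked = most_similar(TOY, positive=["athro", "dynes"], negative=["dynion"])
print(ranked)
```

With these toy vectors 'athrawes' ranks first, ahead of 'tiwtor' and 'athrawon', mirroring the ordering logic of the real result; the actual scores from the model will of course differ.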

Ariannwyd creu'r model hwn gan Lywodraeth Cymru.

The creation of this model was financed by the Welsh Government.
