Why Chinese is cmn and not chi? #37

Closed
d668 opened this issue Sep 29, 2016 · 9 comments

d668 commented Sep 29, 2016

According to standards Chinese should be chi but the library detects it as cmn

Owner

wooorm commented Sep 29, 2016

Which standards? cmn is ISO 639-3.

Author

d668 commented Sep 29, 2016

Author

d668 commented Sep 29, 2016

Author

d668 commented Sep 29, 2016

I need to match them with lang attribute codes

Owner

wooorm commented Sep 29, 2016

That seems to be ISO 639-1, which doesn’t have all codes for languages supported by franc.

Also: w3schools is a terrible source.

> I need to match them with lang attribute codes

Use ISO 639-3.

wooorm closed this as completed Sep 29, 2016

Author

d668 commented Sep 29, 2016

This is crazy, it is 800KB minified: https://github.com/wooorm/iso-639-3/releases. Where can I get a list of only the codes used in franc-most.js? Thanks

Owner

wooorm commented Sep 29, 2016

If you’re worried about size, I don’t suggest using franc-most.

800kB is rather large because it encompasses more information than you need (btw, gzip makes it 75kB).

Here’s the list of currently detected languages in the most bundle:

abk
ace
acu
ada
afr
agr
ajg
aka
als
alt
amc
ame
amh
amr
arb
arl
arn
ast
ayr
azj
azj
bam
ban
bba
bci
bcl
bel
bem
bfa
bho
bin
bis
boa
bod
bos
bos
bre
bug
bul
cab
cak
cat
cbr
cbt
cbu
ceb
ces
cha
chj
chk
cic
cjk
cjs
ckb
cnh
cni
cof
cos
cot
cpu
crs
csa
csw
ctd
cym
dag
dan
ddn
deu
dga
dip
dyo
dyu
emk
eng
epo
est
eus
eve
evn
ewe
fao
fij
fin
fon
fra
fur
gaa
gag
gax
gjn
gkp
gla
gle
glg
guc
gug
guu
gyr
hat
hau
haw
hea
heb
hil
hin
hlt
hms
hna
hni
hrv
hsb
hun
hus
huu
ibb
ibo
ike
ilo
ind
isl
ita
jav
jiv
kal
kaz
kbp
kde
kea
kek
kha
khk
kin
kir
kjh
kmb
knc
kng
koi
koo
kqn
kri
krl
kwi
lad
lav
lia
lin
lit
lns
lob
lot
loz
ltz
lua
lue
lug
lun
lus
mad
mag
mah
mai
mam
mar
maz
mcd
mcf
men
mic
min
miq
mkd
mlt
mos
mri
mxi
mxv
mzi
nav
nba
nbl
ndo
nds
nep
nhn
nio
njo
nld
nno
nob
not
nso
nya
nym
nyn
nzi
ojb
oss
ote
pam
pau
pbb
pcd
pes
pis
plt
pol
pon
por
pov
ppl
quc
qud
qug
quy
quz
qva
qvc
qvh
qvm
qvn
qwh
qxa
qxn
qxu
rar
rgn
rmn
rmy
roh
ron
run
rus
sag
sah
san
sco
shk
shp
skr
slk
slv
sme
smo
sna
snk
snn
som
sot
spa
src
srp
srp
srr
ssw
suk
sun
sus
swb
swe
swh
tah
tat
tbz
tca
tem
tet
tgk
tgl
tir
tiv
tob
toi
toj
ton
top
tpi
tsn
tso
tsz
tuk
tuk
tur
tyv
tzm
uig
uig
ukr
umb
ura
urd
uzn
uzn
ven
vep
vie
vmw
war
wln
wol
wwa
xho
xsm
yad
yao
yap
ydd
ykg
yor
yua
zam
ztu
zul
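
For the size concern raised above, one option is to keep only the rows of a larger table whose three-letter code appears in this list. A minimal sketch, assuming the full table is an array of objects with an `iso6393` field; the `fullTable` rows and the `detected` excerpt are illustrative samples, not the real data:

```javascript
// Excerpt of the detected codes listed above. Some codes appear twice in
// that list, so a Set is used to collapse repeats.
const detected = new Set(['afr', 'azj', 'azj', 'deu', 'eng', 'fra']);

// Hypothetical sample of a larger language table; the field names are assumptions.
const fullTable = [
  {iso6393: 'afr', name: 'Afrikaans'},
  {iso6393: 'deu', name: 'German'},
  {iso6393: 'eng', name: 'English'},
  {iso6393: 'jpn', name: 'Japanese'},
  {iso6393: 'fra', name: 'French'}
];

// Keep only the rows whose code is in the detected set.
const subset = fullTable.filter((row) => detected.has(row.iso6393));

console.log(subset.map((row) => row.iso6393).join(', '));
// prints "afr, deu, eng, fra"
```

Running the same filter once at build time and shipping only the result would avoid loading the full table in the browser.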

Author

d668 commented Sep 29, 2016

But how can I get JSON to match the 2-letter codes? Should I do it myself?

Owner

wooorm commented Sep 29, 2016

Yup, that’s outside the scope of this project, but it would be useful if someone created a list of such mappings (see GH-30). Note, however, that many of the languages listed above do not have two-letter codes.
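
A mapping of this kind could be sketched as a plain lookup with a fallback for codes that have no two-letter equivalent. The table below is a small hand-written sample, not generated from the ISO 639-3 registry, and the helper name `toLangAttribute` is hypothetical. Falling back to the three-letter code is an assumption that follows the BCP 47 convention of using a three-letter subtag when no two-letter one exists:

```javascript
// Hand-written sample: ISO 639-3 codes that have an ISO 639-1 equivalent.
const threeToTwo = new Map([
  ['deu', 'de'],
  ['eng', 'en'],
  ['fra', 'fr'],
  ['nld', 'nl'],
  ['rus', 'ru'],
  ['spa', 'es']
]);

// Hypothetical helper: returns the two-letter code when the table has one,
// otherwise returns the three-letter code unchanged.
function toLangAttribute(code) {
  return threeToTwo.has(code) ? threeToTwo.get(code) : code;
}

console.log(toLangAttribute('eng')); // prints "en"
console.log(toLangAttribute('ace')); // prints "ace" (no two-letter code)
console.log(toLangAttribute('cmn')); // prints "cmn" (not in the sample table)
```

`cmn` has no two-letter code of its own; mapping it to `zh`, the code for the Chinese macrolanguage, would be a deliberate choice rather than a direct equivalence, so it is left out of the sample table here.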
