
How well will this handle Chinese? #23

Open
benjiwheeler opened this issue Jun 27, 2019 · 1 comment

@benjiwheeler

I know that Chinese does not have the same density of spaces as English and most other languages; a Chinese character is more analogous to an English word than to an English letter.

Would you expect your classifier to treat Chinese characters as letters, or as words?

@toonimoadi

toonimoadi commented Oct 21, 2021

It depends on your tokenizer.
By default it will tokenize Chinese characters as letters, but you can easily override that by passing a custom tokenizer:

bayes({
    tokenizer: function (text) { return text.replace(/\s/g, '').split('') }
})
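As a sketch of what that tokenizer does, here is the inline function pulled out on its own (the names `tokenize` and `tokenizeByCodePoint` are mine for illustration, not part of the library). One caveat worth knowing: `split('')` splits on UTF-16 code units, so characters outside the Basic Multilingual Plane would be broken into surrogate halves; common Chinese characters are in the BMP, but `Array.from` is a code-point-safe alternative.

```javascript
// Character-level tokenizer: strip all whitespace, then split the text
// into individual characters, so each Chinese character becomes a token.
function tokenize(text) {
  return text.replace(/\s/g, '').split('');
}

// Code-point-safe variant: Array.from iterates by Unicode code point,
// so astral characters (e.g. '𠮷') stay intact instead of being split
// into two surrogate halves.
function tokenizeByCodePoint(text) {
  return Array.from(text.replace(/\s/g, ''));
}

console.log(tokenize('我 爱 中文'));          // [ '我', '爱', '中', '文' ]
console.log(tokenizeByCodePoint('我爱中文')); // [ '我', '爱', '中', '文' ]
```

For BMP text the two behave identically; the second is just safer if your input may contain rarer CJK characters from the supplementary planes.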
