Fix language detection of non-latin alphabets even at few characters #10276

Merged
merged 1 commit into from Mar 15, 2019

Gargron commented Mar 15, 2019

CLD3 is unreliable on short text in Latin alphabets, because many languages share them. However, some languages have their own character sets, which makes detecting them more reliable even at short lengths. This PR adds an exception to the 140-character threshold rule, so that we can reliably detect Japanese, Chinese, Hebrew, Korean, and some others.

Resolve #9760
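
The exception described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the names `LENGTH_THRESHOLD`, `DISTINCTIVE_SCRIPTS`, `distinctive_script?` and `run_detector?` are hypothetical, the list of scripts is an assumption, and the "more than half of the letters" rule is an arbitrary choice made for the example. It only shows the gating decision (should the detector be trusted on this text?), not the CLD3 call itself.

```ruby
# Hypothetical sketch; none of these names come from the PR itself.
LENGTH_THRESHOLD = 140

# Scripts assumed to be distinctive enough to trust on short text.
# The exact set is an assumption made for illustration.
DISTINCTIVE_SCRIPTS =
  /[\p{Hiragana}\p{Katakana}\p{Han}\p{Hangul}\p{Hebrew}\p{Arabic}\p{Thai}]/

# True when more than half of the letters belong to a distinctive script.
def distinctive_script?(text)
  letters = text.scan(/\p{L}/)
  return false if letters.empty?

  letters.count { |ch| ch.match?(DISTINCTIVE_SCRIPTS) } > letters.size / 2.0
end

# Decide whether the detector's result should be used at all:
# long text always qualifies, short text only in a distinctive script.
def run_detector?(text)
  text.length >= LENGTH_THRESHOLD || distinctive_script?(text)
end

puts run_detector?('こんにちは')   # short, but Hiragana
puts run_detector?('Hello world')  # short and Latin
```

Under these assumptions, a five-character Japanese post passes the gate, while an equally short Latin-alphabet post still falls back to whatever default the application uses.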

Gargron force-pushed the fix-language-detection-non-latin branch from c25b57f to f5d4aa3 Mar 15, 2019

Gargron force-pushed the fix-language-detection-non-latin branch from f5d4aa3 to 0f5a9db Mar 15, 2019

ykzts approved these changes Mar 15, 2019

Gargron merged commit 1b16770 into master Mar 15, 2019

11 checks passed

ci/circleci: build Your tests passed on CircleCI!
ci/circleci: check-i18n Your tests passed on CircleCI!
ci/circleci: install Your tests passed on CircleCI!
ci/circleci: install-ruby2.4 Your tests passed on CircleCI!
ci/circleci: install-ruby2.5 Your tests passed on CircleCI!
ci/circleci: install-ruby2.6 Your tests passed on CircleCI!
ci/circleci: test-ruby2.4 Your tests passed on CircleCI!
ci/circleci: test-ruby2.5 Your tests passed on CircleCI!
ci/circleci: test-ruby2.6 Your tests passed on CircleCI!
ci/circleci: test-webui Your tests passed on CircleCI!
codeclimate All good!

Gargron deleted the fix-language-detection-non-latin branch Mar 15, 2019
