Disable language detection for texts shorter than 140 characters #8010

Merged
merged 1 commit into from Jul 14, 2018

@Gargron Gargron commented Jul 13, 2018

It's wildly inaccurate (and confident of it!) for short sentences:

image

@Gargron Gargron added the bug Something isn't working label Jul 13, 2018
If the input text is blank after preparation (only mention, or
only URL, or empty as in a media post), then use nil as language,
since it's OK to show to everyone.

Otherwise, always fall back to the server's default locale
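A minimal sketch of the fallback rules described in the commit message, written in Python for illustration rather than taken from the actual patch. The names `prepare_text`, `choose_language`, `detect`, and `DEFAULT_LOCALE` are hypothetical, the regular expressions are simplified stand-ins for the real preparation step, and applying the 140-character threshold to the prepared text (rather than the raw text) is an assumption. `None` plays the role of `nil`.

```python
import re
from typing import Callable, Optional

# Threshold taken from the PR title; where exactly it is applied is an assumption.
MIN_LENGTH_FOR_DETECTION = 140
# Hypothetical server default locale.
DEFAULT_LOCALE = "en"

# Simplified patterns; the real preparation step may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+(?:@[\w.-]+)?")


def prepare_text(text: str) -> str:
    """Strip URLs and mentions, then collapse whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


def choose_language(
    text: str,
    detect: Callable[[str], Optional[str]],
    default_locale: str = DEFAULT_LOCALE,
) -> Optional[str]:
    """Pick a language tag for a post.

    `detect` is any callable that returns a language code or None;
    it stands in for the CLD3 call.
    """
    prepared = prepare_text(text)
    if not prepared:
        # Only a mention, only a URL, or empty (media post): no language.
        return None
    if len(prepared) < MIN_LENGTH_FOR_DETECTION:
        # Too short for the detector to be trusted.
        return default_locale
    # Long enough: trust the detector, falling back if it gives nothing.
    return detect(prepared) or default_locale
```

With a detector stub such as `lambda t: "fr"`, a post containing only `@alice https://example.com` yields `None`, and a one-line greeting yields the default locale without the detector's answer being used.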

@ykzts ykzts left a comment

nice!

@Gargron Gargron merged commit 38e9662 into master Jul 14, 2018
@Gargron Gargron deleted the fix-cld-false-positives branch July 14, 2018 02:06

tribela commented Jul 14, 2018

I agree that CLD3 is inaccurate for text in the Latin character set.
But many Asian languages are written in non-Latin scripts with completely different character sets, and CLD3 detects them accurately (as seen below):
image

How about enabling CLD3 when a post contains non-Latin characters?
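One rough way to sketch that suggestion, in Python for illustration only. The helper names `contains_non_latin_letter` and `should_run_detector` are hypothetical, and classifying a letter as Latin by its Unicode character name is a simplification, not how the project decides it.

```python
import unicodedata

# Threshold from the PR title.
MIN_LENGTH_FOR_DETECTION = 140


def contains_non_latin_letter(text: str) -> bool:
    """Return True if any alphabetic character is not a Latin letter.

    A letter counts as Latin when its Unicode name contains "LATIN".
    Digits, punctuation, whitespace, and emoji are ignored.
    """
    for ch in text:
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return True
    return False


def should_run_detector(text: str) -> bool:
    """Run detection on long posts, or on short posts in non-Latin scripts."""
    return len(text) >= MIN_LENGTH_FOR_DETECTION or contains_non_latin_letter(text)
```

Under this sketch a short Korean or Japanese post would still be sent to the detector, while a short English or French one would not.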
