Reduce toot size limit for language detection #9760
Is there any data about that?
Because, anecdotally, 140 characters is a high enough minimum to make detection basically useless, and Google Translate at least (no idea if it actually uses CLD3 in the background) can detect the language of much shorter texts.
Sure, but that shows that it's possible to reliably detect languages with texts shorter than 140 characters.
So I'd like to know if anyone knows why 140 was picked, and whether CLD3 could handle shorter texts.
Maybe 20 is too short, but 40 or 60 would work?
That screenshot shows:
So it doesn't seem all that awful to use a sliding scale? I.e. for things under 40 characters or things marked unreliable, don't trust the detection.
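To make that rule concrete, here is a minimal sketch in Python. It is not Mastodon's actual code and does not call CLD3: the function name `trusted_language`, its parameters, and the 40-character default are all hypothetical, and it assumes the detector hands back a language code plus a reliability flag.

```python
from typing import Optional

# Assumption: 40 is just the threshold floated in this thread, not a measured value.
MIN_TRUSTED_LENGTH = 40


def trusted_language(
    text: str,
    detected: Optional[str],
    is_reliable: bool,
    min_length: int = MIN_TRUSTED_LENGTH,
) -> Optional[str]:
    """Return the detected language only if the detection can be trusted.

    `detected` and `is_reliable` stand in for whatever the language
    detector reports; this helper never runs detection itself.
    """
    # Discard results the detector itself marked as unreliable.
    if detected is None or not is_reliable:
        return None
    # Discard results for texts shorter than the minimum length.
    if len(text.strip()) < min_length:
        return None
    return detected
```

A caller would fall back to the user's default language whenever this returns `None`. The right threshold would still need real data, which is the open question here.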