Language detection defaults to nil #3666

Merged
merged 2 commits into from Jun 9, 2017


@mjankowski
Collaborator

commented Jun 9, 2017

I believe these code changes are sufficient to change the behavior, if we want to. Here's some background:

  • Prior to this change, language detection would "fall back" to the default locale setting for the UI of the instance.
  • This seemed like a reasonable language guess, but it has proven not to be, especially on highly diverse instances where many languages are used.
  • In those cases, when the language cannot be detected and we have to fall back to the instance default, the statuses wind up with the WRONG language applied to them rather than no language at all, which seems worse.

Example: prior to this change, a status written in Farsi might be undetectable by CLD3 (or at least not reliably detected), so that step would bail out. If the user did not have a locale set in the UI, that step would be skipped too, and detection would fall back to the instance default, which might be en.
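
A rough sketch of the chain in that example, and of what changes once the instance default is removed from it. This is illustrative only: `guess_language`, its keyword arguments, and the `unreliable` lambda standing in for CLD3 are hypothetical names, not the actual code touched by this PR.

```ruby
# Hypothetical sketch; none of these names come from the Mastodon codebase.
# `detector` stands in for CLD3: it returns a language code, or nil when
# it cannot detect the language reliably.
def guess_language(text, detector:, user_locale: nil, instance_default: nil)
  # Take the first non-nil guess; nil if every step comes up empty.
  detector.call(text) || user_locale || instance_default
end

unreliable = ->(_text) { nil } # simulates a failed detection

# Old behavior: the chain ends at the instance UI default.
old_guess = guess_language('some Farsi text', detector: unreliable, instance_default: 'en')

# New behavior: no instance default in the chain, so the guess stays nil.
new_guess = guess_language('some Farsi text', detector: unreliable)
```

With the old chain the undetected status ends up tagged with the instance default; with the new one it ends up with no language at all.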

This leads to a bunch of statuses in Farsi being tagged as en, which is wrong, and which makes it impossible to either a) ignore them by adding a "filter all undetectable languages" setting, or b) re-process them with improved language detection in the future.
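
To illustrate why an untagged status is more useful than a mis-tagged one, here is a minimal sketch of options a) and b), assuming undetected statuses carry a nil language. `Status` here is a plain Struct with made-up data, not the Mastodon model, and neither the filter setting nor the re-processing job exists in this PR.

```ruby
# Hypothetical data; Status is a plain Struct, not the Mastodon model.
Status = Struct.new(:text, :language)

statuses = [
  Status.new('hello', 'en'),
  Status.new('undetected text', nil),
]

# a) A "filter all undetectable languages" setting could hide these.
visible = statuses.reject { |s| s.language.nil? }

# b) A later job could pick them up for re-processing with a better detector.
to_reprocess = statuses.select { |s| s.language.nil? }
```

Neither selection is possible if the undetected status was already stored as en, because it is then indistinguishable from a status that really is in English.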

It seems more accurate to just not tag things when we aren't sure, which is what this change would do.

I'm open to counterarguments before merge here. I THINK the lang attributes we have in various places, in feeds and in HTML pages, will just wind up being blank, and that that's fine, but I have not thoroughly examined those places yet.

@Gargron
Gargron approved these changes Jun 9, 2017
@Gargron Gargron merged commit 022008a into tootsuite:master Jun 9, 2017
2 checks passed
codeclimate no new or fixed issues
continuous-integration/travis-ci/pr The Travis CI build passed
@mjankowski mjankowski deleted the mjankowski:language-default-to-nil branch Jun 9, 2017
koteitan added a commit to koteitan/mastodon that referenced this pull request Jun 25, 2017
* Default to nil for statuses.language

* Language detection defaults to nil instead of instance UI default
YaQ00 added a commit to YaQ00/mastodon that referenced this pull request Sep 5, 2017
* Default to nil for statuses.language

* Language detection defaults to nil instead of instance UI default