[REQ] only translate "foreign" text #29

Closed
gingerbeardman opened this issue Aug 5, 2018 · 20 comments · Fixed by #89

Comments

@gingerbeardman
Collaborator

gingerbeardman commented Aug 5, 2018

It would be great if the extension only translated "foreign" text

ideas:

  • pages that have a different locale than target translation language (or user Safari/macOS locale?)
  • if translation output is the same as input

screen shot 2018-08-05 at 22 35 55

@gingerbeardman changed the title from [REQ] only translate foreign text to [REQ] only translate "foreign" text on Aug 5, 2018
@devemio
Collaborator

devemio commented Aug 6, 2018

To implement this, we can retrieve the detected language from the Google API and compare it with our targetLanguage. But in that case we shouldn't dispatch the showPanel message with the loading icon first. In other words, we need to translate the text first and then decide whether to show the panel. Perhaps there is another good way to determine the language of the selected text on the extension side.
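
A rough sketch of the "translate first, then decide" step described above. The `result` object shape (`detectedLanguage`, `translatedText`) and the function names are assumptions made for illustration, not the real Google API response or Polyglot code. It also folds in the earlier idea of hiding the panel when the output equals the input.

```javascript
// Hypothetical helper: reduce a language tag to its primary subtag,
// so "en-US" and "en_GB" both become "en".
function primarySubtag(tag) {
  return String(tag || '').trim().toLowerCase().split(/[-_]/)[0]
}

// Hypothetical decision function, called only after the translation
// request has returned. `result` is an assumed shape:
//   { detectedLanguage: 'ja', translatedText: 'Mount Fuji' }
function shouldShowPanel(result, sourceText, targetLanguage) {
  var detected = primarySubtag(result.detectedLanguage)
  var target = primarySubtag(targetLanguage)

  // The text is already in the target language: nothing useful to show
  if (detected !== '' && detected === target) return false

  // The translation came back unchanged: treat the text as not "foreign"
  if (result.translatedText.trim() === sourceText.trim()) return false

  return true
}
```

Comparing only primary subtags treats regional variants (for example zh-CN and zh-TW) as the same language, which may or may not be what users want.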

@gingerbeardman
Collaborator Author

gingerbeardman commented Aug 6, 2018

To get the language of the page we have the following:

  1. HTML lang (cf HTML spec)
    <html lang="ja-JP"...
  2. META charset (cf HTML spec)
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"...
  3. META locale (cf Facebook)
    <meta property="og:locale" content="ja_JP"...

If testing those disables translation and the user still wants to translate, they can use the context menu.

To detect user language:

  1. navigator.language (language of browser app?)
  2. navigator.languages (user's preferred languages?)
    (these are sometimes the same but not always)

There's also Window.onlanguagechange to take care of.
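
As a minimal sketch, the page-language hints and the user-language list above could be compared like this. The function names are hypothetical. The normalization step is there because HTML lang uses a hyphen ("ja-JP") while og:locale uses an underscore ("ja_JP").

```javascript
// Hypothetical helper: bring "ja_JP" and "ja-JP" to one lowercase form.
function normalizeLangTag(tag) {
  return String(tag || '').trim().replace(/_/g, '-').toLowerCase()
}

// Hypothetical check: is the page's declared language one of the
// user's preferred languages? Only the primary subtag is compared.
function isPageInUserLanguage(pageLang, userLanguages) {
  var page = normalizeLangTag(pageLang)
  if (page === '') return false // unknown page language: do not suppress
  var pagePrimary = page.split('-')[0]
  return userLanguages.some(function (userLang) {
    return normalizeLangTag(userLang).split('-')[0] === pagePrimary
  })
}
```

In a browser this would be called as `isPageInUserLanguage(document.documentElement.lang, navigator.languages)`.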

@uetchy
Owner

uetchy commented Aug 7, 2018

Yes. We need to do something for this.

However, language detection is inefficient because ...

  • Not all site admins follow the rules (some sites mistakenly declare the wrong language)
  • Detecting a language using APIs takes a large number of network requests
  • Doing language detection without showing a loading indicator hurts the UX

Instead, how about we offer a toolbar button (and keyboard shortcut) for toggling the instant translation feature?

@gingerbeardman
Collaborator Author

That doesn't solve the problem IMHO.

Maybe an option to switch off language detection for Instant Translation?

@devemio
Collaborator

devemio commented Aug 8, 2018

Using the API to detect the language brings a lot of extra network requests.

I think some kind of shortcut for toggling the instant translation feature would be a good idea.

@gingerbeardman
Collaborator Author

I'm working on this one :)

@gingerbeardman
Collaborator Author

gingerbeardman commented Aug 8, 2018

Just putting some thoughts down...

If we try to simplify the goal: what do we really want to do?

Check if the language of the text/page is the same as the target language.
We don't actually need to know what it is, Google Translate can figure that out.

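function getPageLanguage() {
  // Language declared on the root element, e.g. <html lang="ja-JP">
  var langHTML = document.documentElement.lang

  // Character encoding of the document, e.g. "UTF-8" or "EUC-JP"
  var langCharSet = document.characterSet

  // Open Graph locale, e.g. <meta property="og:locale" content="ja_JP">
  var langOgLocale = null
  var metas = document.getElementsByTagName('meta')
  for (var i = 0; i < metas.length; i++) {
    if (metas[i].getAttribute('property') === 'og:locale') {
      langOgLocale = metas[i].getAttribute('content')
    }
  }

  // Log whichever hints are present
  if (langHTML) console.log(langHTML)
  if (langCharSet) console.log(langCharSet)
  if (langOgLocale) console.log(langOgLocale)

  return { lang: langHTML, characterSet: langCharSet, ogLocale: langOgLocale }
}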

| URL | lang | characterSet | og:locale |
| --- | --- | --- | --- |
| https://www.github.com | en | UTF-8 | |
| https://www.hbo.com/about/faqs | en | UTF-8 | |
| https://www.bbc.co.uk/cymrufyw | cy | UTF-8 | cy_GB |
| https://www.jp.playstation.com | ja-JP | UTF-8 | ja_JP |
| https://ja.wikipedia.org/wiki/富士山 | ja | UTF-8 | |
| http://www.jah.ne.jp/~hanhan4/imasara/8lakes/ | | EUC-JP | |

Maybe it's enough to just check lang? The benefit is that it uses the same format as the languages stored in Settings.plist.

There's also a possibility that elements within the page are given a lang attribute, so it's better to start at the click and go up the DOM to the closest lang attribute.

var langClosest = e.closest('[lang]').attr('lang') //closest lang attribute wrapping what was clicked
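
The one-liner above assumes e is a jQuery-style object and that a match always exists. A minimal sketch of the same walk in plain JavaScript, which returns null when nothing is found, might look like the following. The function name is hypothetical.

```javascript
// Hypothetical helper: find the nearest lang attribute at or above a node.
function closestLang(node) {
  // Walk from the clicked/selected node up through its ancestors
  for (var el = node; el; el = el.parentNode) {
    // Text nodes and the document have no getAttribute, so skip them
    if (typeof el.getAttribute !== 'function') continue
    var lang = el.getAttribute('lang')
    if (lang) return lang
  }
  return null // no lang attribute anywhere above the node
}
```

In a browser it could be fed the start of the current selection, e.g. `closestLang(window.getSelection().anchorNode)`.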

What if our logic gets it wrong? The user still sees the translation! No worries.

@uetchy
Owner

uetchy commented Aug 9, 2018

Thank you for the detailed survey!

I think this feature could be shipped if the following issues were solved.

  • English speakers may have trouble translating Chinese content on GitHub, because the lang attribute of the page says 'en' and so Polyglot won't show a translation.
  • Related to the above, every web service that hosts user-generated content has a similar issue, so at the very least we should make this feature optional.

Consider filing a PR for this feature, so we can test in the real world whether it works well 👍

@gingerbeardman
Collaborator Author

Agreed. This is a difficult problem to solve satisfactorily. I will give it more thought.

@gingerbeardman
Collaborator Author

gingerbeardman commented Jul 8, 2019

I was thinking it is possible to guess the encoding of the text using JavaScript or Obj-C (now that the extension is native), with a quick initial check to see if it contains only ASCII bytes.

Background
https://unicodebook.readthedocs.io/guess_encoding.html

JavaScript
https://github.com/aadsm/jschardet

Obj-C (Foundation)
https://developer.apple.com/documentation/foundation/nsstring/1413576-stringencodingfordata?language=objc
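
A minimal sketch of the quick initial check, written in JavaScript. The function name is hypothetical. JavaScript strings are sequences of UTF-16 code units rather than bytes, so the test here is "every code unit is at most 0x7F", which is equivalent to the text being pure ASCII.

```javascript
// Hypothetical helper: true when the text contains only ASCII characters.
function isAsciiOnly(text) {
  for (var i = 0; i < text.length; i++) {
    // Any code unit above 0x7F is outside the ASCII range
    if (text.charCodeAt(i) > 0x7f) return false
  }
  return true
}
```

This can only rule text in, not out: an ASCII-only selection can still be in a foreign language (unaccented French or Indonesian, for example), so it works as a cheap first filter rather than as a language detector.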

@gingerbeardman gingerbeardman mentioned this issue Jul 8, 2019
5 tasks
@uetchy uetchy added this to To do in Polyglot 3 Jul 9, 2019
@uetchy uetchy removed this from To do in Polyglot 3 Jul 16, 2019
@stale

stale bot commented Sep 6, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Sep 6, 2019
@gingerbeardman
Collaborator Author

gingerbeardman commented Sep 6, 2019

Let's still do this.

@gingerbeardman
Collaborator Author

I still run into this problem.

I think perhaps my issues would mostly be solved by switching off Polyglot for the contents of input/textarea elements?
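
A rough sketch of such a check, assuming the extension can get hold of the element that holds the selection. The function name is hypothetical, and the contenteditable case is an addition beyond the input/textarea case mentioned above.

```javascript
// Hypothetical helper: is this element something the user types into?
function isEditableTarget(el) {
  if (!el) return false
  var tag = String(el.tagName || '').toUpperCase()
  // Form fields where a selection is usually the user's own text
  if (tag === 'INPUT' || tag === 'TEXTAREA') return true
  // Rich-text editors built on contenteditable
  return el.isContentEditable === true
}
```

In a browser, `isEditableTarget(document.activeElement)` would be one way to call it, since a selection made inside an input or textarea leaves that field focused.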

@uetchy
Owner

uetchy commented Mar 18, 2020

Putting an "On/off" toggle switch inside the textarea (as Grammarly does) is one of our options.
Screen Shot 2020-03-18 at 19 49 38

@gingerbeardman
Collaborator Author

I will look into how that works.

@AntonUspishnyi

Hi there.
I have Safari in English, but I want to translate English to Russian.
How can I do this?

@gingerbeardman
Collaborator Author

gingerbeardman commented Aug 12, 2020

@AntonUspehov did you set your preferences in the Polyglot app?

See readme > setup > step 3

@gingerbeardman
Collaborator Author

gingerbeardman commented Sep 9, 2020

This issue still annoys me every day.

How about disabling Polyglot for certain domains? For me that would be English Wikipedia for example.
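
A minimal sketch of a per-domain switch, assuming the disabled domains would be kept in a user setting. The function name and the list are hypothetical.

```javascript
// Hypothetical helper: should the extension stay quiet on this host?
function isDisabledForHost(hostname, disabledDomains) {
  var host = String(hostname || '').toLowerCase()
  return disabledDomains.some(function (domain) {
    var d = String(domain).toLowerCase()
    // Match the domain itself or any subdomain of it
    return host === d || host.endsWith('.' + d)
  })
}
```

With a list of `['en.wikipedia.org']`, calling `isDisabledForHost(location.hostname, list)` would switch the feature off on English Wikipedia while leaving ja.wikipedia.org alone.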

@uetchy
Owner

uetchy commented Jan 25, 2021

Doing a quick experiment. Mostly working fine, though.

Screen Shot 2021-01-25 at 2 17 32 PM

@uetchy uetchy mentioned this issue Jan 27, 2021
@gingerbeardman
Collaborator Author

This does not work reliably for me?

e.g. in my eBay.co.uk messages, I select part of a message and that triggers Polyglot.


4 participants