Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/i18n headers #67

Merged
merged 8 commits into from Jul 22, 2017
Merged

Fix/i18n headers #67

merged 8 commits into from Jul 22, 2017

Conversation

@untra
Copy link
Owner

@untra untra commented Jul 22, 2017

  • absolute_url_regex handles the matching of absolute urls that are not excluded or lang'd
  • relative_url_regex will no longer regex match absolute urls
  • process_documents will now instantiate the relative_url_regex and absolute_url_regex,
    and pass these into the processes responsible for gsub!ing the docs. Much faster.
  • fixed i18n_headers to whitespace avoid matching the default lang (href=" http://..." won't match)
  • cleaned up tests
untra added 5 commits Jul 18, 2017
* `absolute_url_regex` handles the matching of absolute urls that are not excluded or lang'd
* `relative_url_regex` will no longer regex match absolute urls
* `process_documents` will now instantiate the `relative_url_regex` and `absolute_url_regex`,
and pass these into the processes responsible for `gsub!`ing the docs. Much faster.
* fixed i18n_headers to whitespace avoid matching the default lang (`href=" http://..."` won't match)
* cleaned up tests
@untra untra requested a review from vlsi Jul 22, 2017
@untra
Copy link
Owner Author

@untra untra commented Jul 22, 2017

Absolute URL parsing will stay in

@vlsi

It makes sense to have some other way to mark unprocessed links.
For instance: [about](http://mywebsite.com/fr/about/<!--nopolyglot-->) or [about](<!--nopolyglot-->http://mywebsite.com/fr/about/) or something else. That way polyglot could trim <!--nopolyglot--> and skip URL processing for the particular URL be it absolute or relative one.

I think I came up with a better solution:
processed: href="http://mywebsite.com/about"
unprocessed: href=" http://mywebsite.com/about"

the extra whitespace after the first quote avoids the regex matching, but is still valid html, and is properly rendered / linked by browsers. This fixes the i18n_headers, rendering them as:

<meta http-equiv="Content-Language" content="de">
<link rel="alternate" i18n="en" href=" http://localhost:4000/about/"/>
<link rel="alternate" i18n="es" href="http://localhost:4000/es/about/"/>
<link rel="alternate" i18n="de" href="http://localhost:4000/de/about/"/>
<link rel="alternate" i18n="fr" href="http://localhost:4000/fr/about/"/>

Let me know if this works for you. Thanks! 👍

@vlsi
Copy link
Collaborator

@vlsi vlsi commented Jul 22, 2017

the extra whitespace after the first quote avoids the regex matching, but is still valid html

I think it is worth documenting.

@vlsi
Copy link
Collaborator

@vlsi vlsi commented Jul 22, 2017

By the way: have you considered removing the whitespace?
That is treat leading whitespace as "nolocalize" flag, and kill the whitespace to make final markup clean.

@untra
Copy link
Owner Author

@untra untra commented Jul 22, 2017

I added some documentation to the readme explaining the whitespace approach. I'm going to add further instructions to the v.1.3.0 blogpost.

By the way: have you considered removing the whitespace?
That is treat leading whitespace as "nolocalize" flag, and kill the whitespace to make final markup clean.

I'd say we should leave it up to the user to install a html minifier. The generated html is still valid, and a jekyll minifier will certainly get the job done anyways.

@untra
Copy link
Owner Author

@untra untra commented Jul 22, 2017

If this PR looks good, mark it as reviewed and good

README.md Outdated
@@ -90,7 +90,13 @@ becomes
Notice the link `<a href="/fr/menu/">...` directs to the french website.

Even if you are falling back to `default_lang` page, relative links built on the *french* site will
still link to *french* pages.
still link to *french* pages.v

This comment has been minimized.

@vlsi

vlsi Jul 22, 2017
Collaborator

The last v looks odd.

url_quoted = config['url']
url_quoted = Regexp.quote(url_quoted) unless url_quoted.nil?
%r{href=\"(?:#{url_quoted})?#{@baseurl}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}
%r{href=\"?#{@baseurl}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}

This comment has been minimized.

@vlsi

vlsi Jul 22, 2017
Collaborator

As far as I can tell, relative URLs like /en/about will still be processed, while absolute URLs that include /:lang are excluded. Is that difference between absolute/relative intentional?
Should this be documented/tested?

This comment has been minimized.

@untra

untra Jul 22, 2017
Author Owner

It should be intentional. The exclusion of the /:lang from absolute urls is needed to ensure that the i18n_headers are mostly left alone.

You are right thought a relative url like /en/about will be processed into /en/en/about which isn't right. Maybe the relative urls should get the same treatment.

This comment has been minimized.

@untra

untra Jul 22, 2017
Author Owner

they're getting the same treatment 👍

README.md Outdated
If you defined a site `url` in your `_config.yaml`, polyglot will automatically relativize links pointing to your absolute site url. If you don't want them relativized, adding a space explicitly to an href prevents the the absolute url from being relativized.

processed: `href="http://mywebsite.com/about"`
unprocessed: `href=" http://mywebsite.com/about"`

This comment has been minimized.

@vlsi

vlsi Jul 22, 2017
Collaborator

Should the space trick work for absolute URLs only? Frankly speaking, I think it is easier to understand in a form of "in case you want to keep the URL as is, just prepend it with a whitespace and polyglot would ignore the link".

If whitespace trick works for all kinds of URLs, then the wording should probably be improved. Currently it sounds like the feature is absolute-URL-only.

@vlsi
Copy link
Collaborator

@vlsi vlsi commented Jul 22, 2017

On the other hand, the following versions build the site for me just fine (except #68).
I18N headers are fine, the links between pages are fine.

ruby 2.4.1p111 (2017-03-22 revision 58053)

Using i18n 0.8.6
Using minitest 5.10.3
Using thread_safe 0.3.6
Using public_suffix 2.0.5
Using bundler 1.15.2
Using colorator 1.1.0
Using multipart-post 2.0.0
Using ffi 1.9.18
Using forwardable-extended 2.6.0
Using gemoji 3.0.0
Using mini_portile2 2.2.0
Using rb-fsevent 0.10.2
Using kramdown 1.14.0
Using liquid 3.0.6
Using mercenary 0.3.6
Using rouge 1.11.1
Using safe_yaml 1.0.4
Using jekyll-paginate 1.1.0
Using tzinfo 1.2.3
Using addressable 2.5.1
Using faraday 0.12.2
Using rb-inotify 0.9.10
Using pathutil 0.14.0
Using nokogiri 1.8.0
Using activesupport 4.2.9
Using sawyer 0.8.1
Using sass-listen 4.0.0
Using listen 3.0.8
Using html-pipeline 2.6.0
Using octokit 4.7.0
Using sass 3.5.1
Using jekyll-watch 1.5.0
Using jekyll-gist 1.4.1
Using jekyll-sass-converter 1.5.0
Using jekyll 3.4.3
Using jekyll-feed 0.9.2
Using jekyll-polyglot 1.3.0 from git://github.com/untra/polyglot.git (at fix/i18n_headers@b033972)
Using jekyll-sitemap 0.13.0
Using jemoji 0.8.0
Using minimal-mistakes-jekyll 4.1.0 from /Users/vladimirsitnikov/Documents/code/minimal-mistakes (at vlsi_master@cec37d6)
end

def relativize_absolute_urls(doc, regex, url)
doc.output.gsub!(regex, "href=\"#{url}/#{@active_lang}/" + '\1"')

This comment has been minimized.

@untra

untra Jul 22, 2017
Author Owner

should be

doc.output.gsub!(regex, "href=\"#{url}#{baseurl}/#{@active_lang}/" + '\1"')
@languages.each do |l|
regex += "(?!#{l}\/)"
end
%r{href=\"?#{url}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}

This comment has been minimized.

@untra

untra Jul 22, 2017
Author Owner

should be

%r{href=\"?#{url}#{baseurl}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}
@vlsi
vlsi approved these changes Jul 22, 2017
@untra untra merged commit 504a2f7 into master Jul 22, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@untra untra deleted the fix/i18n_headers branch Jul 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.