Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/i18n headers #67

Merged
merged 8 commits into from
Jul 22, 2017
Merged

Fix/i18n headers #67

merged 8 commits into from
Jul 22, 2017

Conversation

untra
Copy link
Owner

@untra untra commented Jul 22, 2017

  • absolute_url_regex handles the matching of absolute urls that are not excluded or lang'd
  • relative_url_regex will no longer regex match absolute urls
  • process_documents will now instantiate the relative_url_regex and absolute_url_regex,
    and pass these into the processes responsible for gsub!ing the docs. Much faster.
  • fixed i18n_headers to whitespace avoid matching the default lang (href=" http://..." won't match)
  • cleaned up tests

untra and others added 5 commits July 22, 2017 09:17
* `absolute_url_regex` handles the matching of absolute urls that are not excluded or lang'd
* `relative_url_regex` will no longer regex match absolute urls
* `process_documents` will now instantiate the `relative_url_regex` and `absolute_url_regex`,
and pass these into the processes responsible for `gsub!`ing the docs. Much faster.
* fixed i18n_headers to whitespace avoid matching the default lang (`href=" http://..."` won't match)
* cleaned up tests
@untra untra requested a review from vlsi July 22, 2017 17:43
@untra
Copy link
Owner Author

untra commented Jul 22, 2017

Absolute URL parsing will stay in

@vlsi

It makes sense to have some other way to mark unprocessed links.
For instance: [about](http://mywebsite.com/fr/about/<!--nopolyglot-->) or [about](<!--nopolyglot-->http://mywebsite.com/fr/about/) or something else. That way polyglot could trim <!--nopolyglot--> and skip URL processing for the particular URL be it absolute or relative one.

I think I came up with a better solution:
processed: href="http://mywebsite.com/about"
unprocessed: href=" http://mywebsite.com/about"

the extra whitespace after the first quote avoids the regex matching, but is still valid html, and is properly rendered / linked by browsers. This fixes the i18n_headers, rendering them as:

<meta http-equiv="Content-Language" content="de">
<link rel="alternate" i18n="en" href=" http://localhost:4000/about/"/>
<link rel="alternate" i18n="es" href="http://localhost:4000/es/about/"/>
<link rel="alternate" i18n="de" href="http://localhost:4000/de/about/"/>
<link rel="alternate" i18n="fr" href="http://localhost:4000/fr/about/"/>

Let me know if this works for you. Thanks! 👍

@vlsi
Copy link
Collaborator

vlsi commented Jul 22, 2017

the extra whitespace after the first quote avoids the regex matching, but is still valid html

I think it is worth documenting.

@vlsi
Copy link
Collaborator

vlsi commented Jul 22, 2017

By the way: have you considered removing the whitespace?
That is treat leading whitespace as "nolocalize" flag, and kill the whitespace to make final markup clean.

@untra
Copy link
Owner Author

untra commented Jul 22, 2017

I added some documentation to the readme explaining the whitespace approach. I'm going to add further instructions to the v.1.3.0 blogpost.

By the way: have you considered removing the whitespace?
That is treat leading whitespace as "nolocalize" flag, and kill the whitespace to make final markup clean.

I'd say we should leave it up to the user to install a html minifier. The generated html is still valid, and a jekyll minifier will certainly get the job done anyways.

@untra
Copy link
Owner Author

untra commented Jul 22, 2017

If this PR looks good, mark it as reviewed and good ✅

README.md Outdated
@@ -90,7 +90,13 @@ becomes
Notice the link `<a href="/fr/menu/">...` directs to the french website.

Even if you are falling back to `default_lang` page, relative links built on the *french* site will
still link to *french* pages.
still link to *french* pages.v
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last v looks odd.

url_quoted = config['url']
url_quoted = Regexp.quote(url_quoted) unless url_quoted.nil?
%r{href=\"(?:#{url_quoted})?#{@baseurl}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}
%r{href=\"?#{@baseurl}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}
Copy link
Collaborator

@vlsi vlsi Jul 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, relative URLs like /en/about will still be processed, while absolute URLs that include /:lang are excluded. Is that difference between absolute/relative intentional?
Should this be documented/tested?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be intentional. The exclusion of the /:lang from absolute urls is needed to ensure that the i18n_headers are mostly left alone.

You are right thought a relative url like /en/about will be processed into /en/en/about which isn't right. Maybe the relative urls should get the same treatment.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're getting the same treatment 👍

README.md Outdated
If you defined a site `url` in your `_config.yaml`, polyglot will automatically relativize links pointing to your absolute site url. If you don't want them relativized, adding a space explicitly to an href prevents the the absolute url from being relativized.

processed: `href="http://mywebsite.com/about"`
unprocessed: `href=" http://mywebsite.com/about"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the space trick work for absolute URLs only? Frankly speaking, I think it is easier to understand in a form of "in case you want to keep the URL as is, just prepend it with a whitespace and polyglot would ignore the link".

If whitespace trick works for all kinds of URLs, then the wording should probably be improved. Currently it sounds like the feature is absolute-URL-only.

@vlsi
Copy link
Collaborator

vlsi commented Jul 22, 2017

On the other hand, the following versions build the site for me just fine (except #68).
I18N headers are fine, the links between pages are fine.

ruby 2.4.1p111 (2017-03-22 revision 58053)

Using i18n 0.8.6
Using minitest 5.10.3
Using thread_safe 0.3.6
Using public_suffix 2.0.5
Using bundler 1.15.2
Using colorator 1.1.0
Using multipart-post 2.0.0
Using ffi 1.9.18
Using forwardable-extended 2.6.0
Using gemoji 3.0.0
Using mini_portile2 2.2.0
Using rb-fsevent 0.10.2
Using kramdown 1.14.0
Using liquid 3.0.6
Using mercenary 0.3.6
Using rouge 1.11.1
Using safe_yaml 1.0.4
Using jekyll-paginate 1.1.0
Using tzinfo 1.2.3
Using addressable 2.5.1
Using faraday 0.12.2
Using rb-inotify 0.9.10
Using pathutil 0.14.0
Using nokogiri 1.8.0
Using activesupport 4.2.9
Using sawyer 0.8.1
Using sass-listen 4.0.0
Using listen 3.0.8
Using html-pipeline 2.6.0
Using octokit 4.7.0
Using sass 3.5.1
Using jekyll-watch 1.5.0
Using jekyll-gist 1.4.1
Using jekyll-sass-converter 1.5.0
Using jekyll 3.4.3
Using jekyll-feed 0.9.2
Using jekyll-polyglot 1.3.0 from git://github.com/untra/polyglot.git (at fix/i18n_headers@b033972)
Using jekyll-sitemap 0.13.0
Using jemoji 0.8.0
Using minimal-mistakes-jekyll 4.1.0 from /Users/vladimirsitnikov/Documents/code/minimal-mistakes (at vlsi_master@cec37d6)

end

def relativize_absolute_urls(doc, regex, url)
doc.output.gsub!(regex, "href=\"#{url}/#{@active_lang}/" + '\1"')
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be

doc.output.gsub!(regex, "href=\"#{url}#{baseurl}/#{@active_lang}/" + '\1"')

@languages.each do |l|
regex += "(?!#{l}\/)"
end
%r{href=\"?#{url}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be

%r{href=\"?#{url}#{baseurl}\/((?:#{regex}[^,'\"\s\/?\.#]+\.?)*(?:\/[^\]\[\)\(\"\'\s]*)?)\"}

@untra untra merged commit 504a2f7 into master Jul 22, 2017
@untra untra deleted the fix/i18n_headers branch July 22, 2017 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants