Stream processing #37

Closed
turquoiseowl opened this issue Jan 31, 2013 · 13 comments

@turquoiseowl
Owner

I've been trying to work out how to localize DataAnnotations, jQuery.validation stuff etc. and had an idea:

Is there scope in i18n for general processing of the HTTP response output and doing the translation there? E.g. we could scan for msgids wrapped in some markers like ###Translate me!###, look up any corresponding message and, if found, swap it in.

Maybe this has already been thought of, or is not practical?

About to go away and investigate ASP.NET HttpModules/Handlers... I guess it would also require xgettext or equivalent to be able to locate these marked strings in the project source.

@danielcrenna
Collaborator

I tried that in the beginning by using the CodeDom and actually parsing out
_("") symbols in the code. It worked reasonably well but I switched to
xgettext to avoid having to reinvent all the parsing logic once I had the
symbols.

The idea to intercept incoming HTTP traffic is a creative one. If it's
practical to do that then we don't even need to provide dummy classes that
pretend to be real DataAnnotations. Though for the server side we might
have to resort to private reflection if we want to intercept the real
annotations. All very cool ideas. Maybe add them as possible features for
the 2.0 bucket.

It might be a lot of effort, but I can see the value of not having to think
about anything usage wise beyond using the alias.

@turquoiseowl
Owner Author

First attempt at doing this:

69f4d19

It's been working well so far with early testing. An HttpModule needs to be installed in web.config like this:

          <system.webServer>
            <modules>
              <add name="i18n.LocalizingModule" type="i18n.LocalizingModule, i18n" />
            </modules>
          </system.webServer>

Uses regex to look out for 'nuggets' in the HTML or JavaScript matching the pattern:

[[[Translate me!]]]

If found, extracts the msgid from within the braces and replaces the entire nugget with the result of a GetText call.
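
A minimal sketch of that replacement step, for illustration only. The `NuggetSketch` class, the `lookup` delegate (a stand-in for the GetText call) and the exact pattern are assumptions, not the code in the commit above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NuggetSketch
{
    // Matches [[[msgid]]], capturing the shortest text between the markers.
    // Hypothetical pattern; the module's actual regex may differ.
    private static readonly Regex Nugget =
        new Regex(@"\[\[\[(.+?)\]\]\]", RegexOptions.Singleline);

    // lookup stands in for GetText: it returns the translation,
    // or null when no message is found for the msgid.
    public static string Localize(string body, Func<string, string> lookup)
    {
        return Nugget.Replace(body, m =>
        {
            string msgid = m.Groups[1].Value;
            // Fall back to the msgid itself when there is no translation.
            return lookup(msgid) ?? msgid;
        });
    }
}
```

Calling `NuggetSketch.Localize(responseBody, lookup)` on the response text would swap each nugget for its message and strip the markers from anything untranslated.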

Outstanding issues on which I'd welcome input:

  1. Best way to parse these nuggets in the javascript and .cs files?
  2. Resources subject to this translation and that are outside of our route localization (e.g. javascript files) will need their URL patched to reflect the language.
  3. Best choice of markers. Originally I went for «««User Name»»» which I like, but not so easy to type in, so went for the braces. Used 3 braces to avoid problems with double brace in CDATA... Haven't gone for curly braces because...
  4. Whether and how to include formatting options. E.g.
[[[Hello {0}, you last visited {1} {2} ago||{Fred}{10}{days}]]]

Point 2. is making me nervous for it may require route localization to be moved out of MVC into another HTTP module/filter.

Cheers Martin

@danielcrenna
Collaborator

Moving everything out to a filter may not be all bad. But it requires some
thought. We can leverage something like
https://github.com/danielcrenna/minirack to be able to self-install the
module so no web.config fiddling is required. I'm working on a project that
requires most of my time but should be able to work on this in earnest next
week. I like your first thoughts. For JavaScript we could use the approach
I took for the original YuiCompressor port and use the C# ECMA script mod
to parse it, or use Jurassic, whichever is faster to get a symbol tree, but
it's not trivial. We could try to use Regex... and have two problems :)

Using a module to tag external resource URLs for localization is no problem
for local resources, this is how most minification / auto-combining tools
work. I think we'd have to ignore external JS since most libraries don't
provide localized equivalents.

@raulvejar
Contributor

Another thing that I've thought of is that this approach will not allow you to see potential unclosed string tags at compile time, which is very useful, especially if you are doing complex HTML in the middle.
We might want to add an extension to the editor that allows you to visualize these...

@turquoiseowl
Owner Author

Thanks Raul. By string tags, do you mean the [[[ and ]]] markers (or whatever we might finalize on)?

@turquoiseowl
Owner Author

@daniel, my assumption was that we (you :) ) would use Regex to parse Javascript in the PO build phase, in fact with the same pattern as used in the response filter as they are both doing the same job.

But I'm assuming the user edits the Javascript sources so as to decorate the target strings, thus have their own versions of jquery.validate or whatever.

Or are you thinking of extracting undecorated strings from Javascript?

@turquoiseowl
Owner Author

Self-install of the module = sweet.

@turquoiseowl
Owner Author

Update on progress with the HTTP module:

Following on from the idea to move more out to it, the module facilitates two functions:

Nugget Localization - The translation of 'nuggets' (e.g. [[[Translate me!]]]) which are in the HTTP response entity body and for one reason or another were not translated earlier on in the ASP.NET pipeline. The notable example of this is with DataAnnotations, but may also include Javascript and any other enabled content type.

For example, I have a model property annotated like this:

    public class SignupModel
    {
        [Required(ErrorMessage = "[[[A User Name is required]]]")]
        [Display(Name="[[[User Name]]]")]
        public string UserName { get; set; }
    }

The [[[User Name]]] nugget is picked up in the response entity and replaced with the result of calling GetText("User Name").

The choice of markers is user configurable, but needs to correspond to the markers used by the xgettext/postbuild generator (which remains outstanding work).

URL Localization - This is the localization of URLs by the HTTP Module.

A request's URL is processed early in the pipeline in order to derive the principal language for the request (this is called Early URL Localization). Any langtag in the URL is then stripped off so as not to bother the app with it. The app can read the principal language from an HttpContext extension method. This includes the GetText language matching algorithm.
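
To make the stripping step concrete, here is a rough sketch that pulls a leading langtag off a URL path. `EarlyUrlSketch`, `StripLangTag` and the loose langtag pattern are hypothetical names and assumptions for illustration; they are not the module's API, and real langtag matching is stricter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EarlyUrlSketch
{
    // Very loose langtag shape: a 2-3 letter language plus optional subtags,
    // e.g. "fr", "en-GB", "zh-Hans". Assumed for illustration only.
    private static readonly Regex LeadingLangTag =
        new Regex(@"^/([a-zA-Z]{2,3}(?:-[a-zA-Z0-9]{2,8})*)(?=/|$)");

    // Returns the path with any leading langtag removed.
    // langTag is set to null when the path does not start with one.
    public static string StripLangTag(string path, out string langTag)
    {
        Match m = LeadingLangTag.Match(path);
        if (!m.Success)
        {
            langTag = null;
            return path;
        }
        langTag = m.Groups[1].Value;
        string rest = path.Substring(m.Length);
        return rest.Length == 0 ? "/" : rest;
    }
}
```

A pattern this loose would also treat a first segment such as "api" as a langtag, so a real implementation would need to check the candidate against the languages the app actually supports.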

Early URL Localization may cause a redirect from a nonlocalized URL to a localized one, according to a scheme setting. It can also be disabled entirely to support consistent URLs regardless of principal user language.

As an optimization, URL Localization also supports patching of outgoing URLs found in the response entity where appropriate. This patching converts nonlocalized same-host URLs to a localized form e.g. example.com -> example.com/fr. This avoids an unnecessary redirect should the user agent subsequently request the URL. A regex is used to do this. It works well for me, so far, but is configurable and may be disabled if so desired.
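
A hedged sketch of what such patching could look like, limited to root-relative `href` and `src` attributes. The `OutgoingUrlSketch` class and its regex are assumptions made for illustration; the module's own pattern is configurable and is not shown in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OutgoingUrlSketch
{
    // Finds href="/..." or src="/..." attributes holding root-relative URLs.
    // Protocol-relative URLs ("//host/...") are excluded by the negative lookahead.
    private static readonly Regex LocalUrlAttr = new Regex(
        "(?<attr>\\b(?:href|src)\\s*=\\s*\")(?<path>/(?!/)[^\"]*)\"",
        RegexOptions.IgnoreCase);

    public static string Patch(string html, string langTag)
    {
        string prefix = "/" + langTag;
        return LocalUrlAttr.Replace(html, m =>
        {
            string path = m.Groups["path"].Value;
            // Leave URLs that already carry this langtag untouched.
            bool localized =
                path.Equals(prefix, StringComparison.OrdinalIgnoreCase) ||
                path.StartsWith(prefix + "/", StringComparison.OrdinalIgnoreCase);
            string patched = localized ? path : prefix + (path == "/" ? "" : path);
            return m.Groups["attr"].Value + patched + "\"";
        });
    }
}
```

Absolute same-host URLs, single-quoted attributes and URLs that carry a different langtag are not handled here.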

Other features:

  • User-defined filters may be added to control processing of both incoming and outgoing URLs
  • Pivot point for controlling formatting of localized URLs
  • URL localization now independent of MVC.

Changes to initialization

Here's an example of what an Application_Start may contain:

            // Init i18n support.
            i18n.LocalizedApplication.DefaultLanguage = "fr";
            i18n.LocalizedApplication.PermanentRedirects = true;
            i18n.LocalizedApplication.EarlyUrlLocalizer = null; // disable EUL = consistent URLs
            i18n.UrlLocalizer.UrlLocalizationScheme = i18n.UrlLocalizationScheme.Scheme2;
            i18n.UrlLocalizer.IncomingUrlFilters += delegate(Uri url) {
                if (url.LocalPath.EndsWith("sitemap.xml", StringComparison.InvariantCultureIgnoreCase)) {
                    return false;
                }
                return true;
            };

I intend to back off the v2.0 branch now for a while. The Nugget Localization feature is dependent on xgettext/postbuildtask support for it which I understand others will do. If not, please let me know and I'll get onto it as I need it for my project.

My testing has been limited to a small website; it should be tested with a larger site (which I do not have at present) before being released. If this work is accepted, there will then be some redundant stuff in there related to MVC route localization and action filter to lop out.

Regards Martin Connell

525fd1f

turquoiseowl added a commit that referenced this issue Feb 6, 2013
@turquoiseowl
Owner Author

Added support for formatted nuggets which enables more sophisticated DataAnnotations:

    public class SignupModel
    {
        [StringLength(100, ErrorMessage = "[[[Enter between %0 and %1 characters|||{2}|||{1}]]]", MinimumLength = 6)]
        public string Password { get; set; }
    }

The syntax is like this:

[[[<canonical_msgid>|||val0|||val1|||valn]]]

valn is a runtime value stuffed into the nugget and picked up by the post processor for the replacement of a corresponding %n identifier in the canonical msgid.

So in the post-processing of nuggets:

  1. message = GetText(<canonical_msgid>)
  2. Replace %n with {n} in message ...
  3. message = string.Format(message, val0, val1, ... valn)
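
The three steps above can be sketched roughly as follows. `FormattedNuggetSketch` and the `lookup` delegate (standing in for GetText) are hypothetical names used for illustration, and the brace escaping is an added assumption rather than something taken from the module.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormattedNuggetSketch
{
    // content is the text found between the [[[ and ]]] markers.
    // lookup stands in for GetText and returns null when there is no translation.
    public static string Process(string content, Func<string, string> lookup)
    {
        // Split into the canonical msgid followed by val0..valn.
        string[] parts = content.Split(new[] { "|||" }, StringSplitOptions.None);
        string msgid = parts[0];

        // Step 1: translate the canonical msgid, falling back to the msgid itself.
        string message = lookup(msgid) ?? msgid;
        if (parts.Length == 1) return message;

        // Escape literal braces so string.Format leaves them alone, then
        // step 2: turn each %n into the {n} placeholder string.Format expects.
        message = message.Replace("{", "{{").Replace("}", "}}");
        message = Regex.Replace(message, @"%(\d+)", "{$1}");

        // Step 3: substitute the runtime values.
        object[] values = new object[parts.Length - 1];
        Array.Copy(parts, 1, values, 0, values.Length);
        return string.Format(message, values);
    }
}
```

By the time the response is post-processed, the validation framework has presumably already replaced {2} and {1} in the StringLength example with 6 and 100, so the content seen here would be "Enter between %0 and %1 characters|||6|||100". A message that refers to a %n with no matching value makes string.Format throw a FormatException, which a real implementation would need to guard against.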

PO Generation Notes

The parser for PO file generation needs to extract the canonical msgid from the nugget and output that.

Some experimentation with xgettext -a (extract all strings) has successfully managed to extract the nuggets from .cs, .cshtml and .js files. Output of that is suitable as input to a nugget parser.
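
As an illustration of the nugget parser mentioned above, the following sketch pulls the distinct canonical msgids out of a block of text, such as strings extracted by xgettext -a. `PoExtractSketch` and its pattern are assumptions; the actual postbuild parser was still outstanding at this point in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PoExtractSketch
{
    // Same hypothetical nugget pattern as a response filter might use.
    private static readonly Regex Nugget =
        new Regex(@"\[\[\[(.+?)\]\]\]", RegexOptions.Singleline);

    // Returns each distinct canonical msgid in order of first appearance,
    // dropping any |||-delimited runtime values.
    public static List<string> ExtractMsgids(string text)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (Match m in Nugget.Matches(text))
        {
            string content = m.Groups[1].Value;
            int sep = content.IndexOf("|||", StringComparison.Ordinal);
            string msgid = sep >= 0 ? content.Substring(0, sep) : content;
            if (seen.Add(msgid)) result.Add(msgid);
        }
        return result;
    }
}
```

Each returned msgid would then be written out as a msgid entry in the PO template.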

@raulvejar
Contributor

Great job.
I think your example is missing a parameter, or it should be %1 instead of %2.

@turquoiseowl
Owner Author

@raulvejar Thanks for your feedback. Just testing :) and well spotted, it should be %1. I'll edit it.

@danielcrenna
Collaborator

Nice work. I'll attempt to get caught up on all of this over the next
little while, will get back with any bugs / ideas, and definitely will
complete the postbuild enhancements first of all.

D.

@turquoiseowl
Owner Author

@danielcrenna Thanks, sounds good, very happy to answer questions etc.

This issue was closed.