Stream processing #37

turquoiseowl · 2013-01-31T21:16:19Z

I've been trying to work out how to localize DataAnnotations, jQuery.validation stuff etc. and had an idea:

Is there scope in i18n for general processing of the HTTP response output, with translation done there? E.g. we could scan for msgids wrapped in some markers like ###Translate me!###, look up any corresponding message and, if found, swap it in.

Maybe this has already been thought of, or is not practical?

About to go away and investigate ASP.NET HttpModules/Handlers... I guess it would also require xgettext or equivalent to be able to locate these marked strings in the project source.

danielcrenna · 2013-01-31T21:49:34Z

I tried that in the beginning by using the CodeDom and actually parsing out _("") symbols in the code. It worked reasonably well, but I switched to xgettext to avoid having to reinvent all the parsing logic once I had the symbols.

The idea to intercept incoming HTTP traffic is a creative one. If it's practical to do that, then we don't even need to provide dummy classes that pretend to be real DataAnnotations. Though for the server side we might have to resort to private reflection if we want to intercept the real annotations. All very cool ideas. Maybe add them as possible features for the 2.0 bucket.

It might be a lot of effort, but I can see the value of not having to think about anything usage-wise beyond using the alias.


turquoiseowl · 2013-02-01T15:52:23Z

First attempt at doing this:

69f4d19

It's been working well so far with early testing. An HttpModule needs to be installed in web.config like this:

          <system.webServer>
            <modules>
              <add name="i18n.LocalizingModule" type="i18n.LocalizingModule, i18n" />
            </modules>
          </system.webServer>

Uses a regex to look out for 'nuggets' in the HTML or JavaScript matching the pattern:

[[[Translate me!]]]

If found, extracts the msgid from within the brackets and replaces the entire nugget with the result of a GetText call.
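A minimal C# sketch of the regex-based nugget replacement described above. It is not the code from 69f4d19: the NuggetSketch name is hypothetical, a plain dictionary stands in for the GetText lookup, and returning the msgid unchanged when no translation exists is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: replace every [[[msgid]]] nugget in a response body with its
// translation. The dictionary is a stand-in for a GetText lookup.
public static class NuggetSketch
{
    // Lazy (.+?) so that two nuggets on one line are matched separately.
    private static readonly Regex NuggetPattern =
        new Regex(@"\[\[\[(.+?)\]\]\]", RegexOptions.Compiled);

    public static string Translate(string body, IDictionary<string, string> messages)
    {
        return NuggetPattern.Replace(body, match =>
        {
            string msgid = match.Groups[1].Value;
            // Assumption: with no translation, fall back to the msgid itself.
            return messages.TryGetValue(msgid, out string msgstr) ? msgstr : msgid;
        });
    }
}
```

Because . does not match a newline by default, a nugget split across lines is left untouched by this pattern.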

Outstanding issues on which I'd welcome input:

1. Best way to parse these nuggets in the JavaScript and .cs files?
2. Resources subject to this translation that are outside of our route localization (e.g. JavaScript files) will need their URL patched to reflect the language.
3. Best choice of markers. Originally I went for «««User Name»»», which I like, but it's not so easy to type, so I went for the square brackets. I used 3 brackets to avoid problems with double brackets in CDATA... Haven't gone for curly braces because...
4. Whether and how to include formatting options, e.g.

       [[[Hello {0}, you last visited {1} {2} ago||{Fred}{10}{days}]]]

Point 2 makes me nervous, as it may require route localization to be moved out of MVC into another HTTP module/filter.

Cheers Martin

danielcrenna · 2013-02-01T16:09:07Z

Moving everything out to a filter may not be all bad, but it requires some thought. We can leverage something like https://github.com/danielcrenna/minirack to be able to self-install the module so no web.config fiddling is required. I'm working on a project that requires most of my time but should be able to work on this in earnest next week. I like your first thoughts. For JavaScript we could use the approach I took for the original YuiCompressor port and use the C# ECMAScript mod to parse it, or use Jurassic, whichever is faster to get a symbol tree, but it's not trivial. We could try to use Regex... and have two problems :)

Using a module to tag external resource URLs for localization is no problem for local resources; this is how most minification/auto-combining tools work. I think we'd have to ignore external JS, since most libraries don't provide localized equivalents.

Daniel Crenna
Conatus Creative Inc.
cell:613.400.4286

raulvejar · 2013-02-01T20:57:45Z

Another thing I've thought of is that this approach will not let you see potential unclosed string tags at compile time, which is very useful, especially if you are doing complex HTML in the middle.
We might want to add an extension to the editor that allows you to visualize these...
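One way to get back some of that compile-time safety would be a build step that flags unbalanced markers. This sketch is hypothetical (nothing like it appears in the commits discussed here); it assumes the [[[ and ]]] markers and that nuggets never nest, and returns the position of every unmatched marker.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical build-time check for unbalanced nugget markers.
// Assumes "[[[" opens and "]]]" closes a nugget and that nuggets never nest.
public static class NuggetLint
{
    private const string Open = "[[[";
    private const string Close = "]]]";

    private static bool TokenAt(string text, int i, string token) =>
        i + token.Length <= text.Length &&
        string.CompareOrdinal(text, i, token, 0, token.Length) == 0;

    // Returns the index of each unmatched "[[[" or "]]]", in order of position.
    public static List<int> FindUnbalanced(string text)
    {
        var problems = new List<int>();
        int openAt = -1; // index of the currently open "[[[", or -1 if none
        int i = 0;
        while (i < text.Length)
        {
            if (TokenAt(text, i, Open))
            {
                if (openAt >= 0) problems.Add(openAt); // previous opener never closed
                openAt = i;
                i += Open.Length;
            }
            else if (TokenAt(text, i, Close))
            {
                if (openAt < 0) problems.Add(i); // closer with no opener
                openAt = -1;
                i += Close.Length;
            }
            else
            {
                i++;
            }
        }
        if (openAt >= 0) problems.Add(openAt); // opener still open at end of text
        return problems;
    }
}
```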

turquoiseowl · 2013-02-02T10:52:24Z

Thanks Raul. By string tags, do you mean the [[[ and ]]] markers (or whatever we might finalize on)?

turquoiseowl · 2013-02-02T11:02:27Z

@daniel, my assumption was that we (you :) ) would use Regex to parse JavaScript in the PO build phase, in fact with the same pattern as used in the response filter, as they are both doing the same job.

But I'm assuming the user edits the JavaScript sources so as to decorate the target strings, and thus has their own versions of jquery.validate or whatever.

Or are you thinking of extracting undecorated strings from JavaScript?

turquoiseowl · 2013-02-02T11:03:45Z

Self-install of the module = sweet.

turquoiseowl · 2013-02-05T15:17:17Z

Update on progress with the HTTP module:

Following on from the idea to move more out to it, the module facilitates two functions:

Nugget Localization - The translation of 'nuggets' (e.g. [[[Translate me!]]]) that are in the HTTP response entity body and, for one reason or another, were not translated earlier in the ASP.NET pipeline. The notable example is DataAnnotations, but this may also include JavaScript and any other enabled content type.

For example, I have a model property annotated like this:

    public class SignupModel
    {
        [Required(ErrorMessage = "[[[A User Name is required]]]")]
        [Display(Name="[[[User Name]]]")]
        public string UserName { get; set; }
    }

The [[[User Name]]] nugget is picked up in the response entity and replaced with the result of calling GetText("User Name").

The choice of markers is user configurable, but needs to correspond to the markers used by the xgettext/postbuild generator (which remains outstanding work).

URL Localization - This is the localization of URLs by the HTTP Module.

A request's URL is processed early in the pipeline in order to derive the principal language for the request (this is called Early URL Localization). Any langtag in the URL is then stripped off so as not to bother the app with it. The app can read the principal language from an HttpContext extension method. Deriving the language makes use of the GetText language matching algorithm.

Early URL Localization may cause a redirect from a nonlocalized URL to a localized one, according to a scheme setting. It can also be disabled entirely to support consistent URLs regardless of principal user language.
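A rough C# sketch of the incoming side of Early URL Localization, assuming the langtag is the first path segment (as in example.com/fr) and that only languages the app supports count. The EarlyUrlSketch name and the simplified langtag regex are illustrative assumptions; real BCP 47 tags and the module's language matching are richer. A null language is the point at which the scheme setting would decide whether to redirect.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of Early URL Localization on an incoming path:
// "/fr/account/signup" -> language "fr", path "/account/signup".
public static class EarlyUrlSketch
{
    // Simplified langtag: 2-3 letters plus an optional 2-letter region,
    // followed by "/" or the end of the path. Real BCP 47 tags are richer.
    private static readonly Regex LeadingLangtag =
        new Regex(@"^/([a-zA-Z]{2,3}(?:-[a-zA-Z]{2})?)(?=/|$)");

    public static (string Lang, string Path) Extract(string path, ISet<string> appLanguages)
    {
        Match m = LeadingLangtag.Match(path);
        // Only accept the segment if the app supports that language,
        // so a path such as "/faq" is not mistaken for a langtag.
        if (m.Success && appLanguages.Contains(m.Groups[1].Value))
        {
            string rest = path.Substring(m.Length);
            return (m.Groups[1].Value, rest.Length == 0 ? "/" : rest);
        }
        // No langtag: a real module would pick a language and, depending on
        // the scheme setting, redirect to the localized URL.
        return (null, path);
    }
}
```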

As an optimization, URL Localization also supports patching of outgoing URLs found in the response entity where appropriate. This patching converts nonlocalized same-host URLs to a localized form e.g. example.com -> example.com/fr. This avoids an unnecessary redirect should the user agent subsequently request the URL. A regex is used to do this. It works well for me, so far, but is configurable and may be disabled if so desired.
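The outgoing patching can be sketched the same way. This hypothetical version only rewrites double-quoted, root-relative href/src values (e.g. /about becomes /fr/about, and / becomes /fr); absolute same-host URLs, single-quoted attributes and the module's own configurable regex are not covered.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of outgoing URL patching: prefix root-relative,
// double-quoted href/src values with the request's langtag.
public static class UrlPatchSketch
{
    // (?!/) skips protocol-relative URLs such as "//cdn.example.com/x.js".
    private static readonly Regex RootRelativeUrl = new Regex(
        @"(?<attr>\b(?:href|src)\s*=\s*"")(?<url>/(?!/)[^""]*)""",
        RegexOptions.IgnoreCase);

    public static string Patch(string html, string langtag)
    {
        string prefix = "/" + langtag;
        return RootRelativeUrl.Replace(html, m =>
        {
            string url = m.Groups["url"].Value;
            // Leave URLs that already carry the langtag alone.
            if (url == prefix || url.StartsWith(prefix + "/", StringComparison.Ordinal))
                return m.Value;
            string patched = url == "/" ? prefix : prefix + url;
            return m.Groups["attr"].Value + patched + "\"";
        });
    }
}
```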

Other features:

User-defined filters may be added to control processing of both incoming and outgoing URLs
Pivot point for controlling formatting of localized URLs
URL localization now independent of MVC.

Changes to initialization

Here's an example of what an Application_Start may contain:

    // Init i18n support.
    i18n.LocalizedApplication.DefaultLanguage = "fr";
    i18n.LocalizedApplication.PermanentRedirects = true;
    i18n.LocalizedApplication.EarlyUrlLocalizer = null; // disable EUL = consistent URLs
    i18n.UrlLocalizer.UrlLocalizationScheme = i18n.UrlLocalizationScheme.Scheme2;
    i18n.UrlLocalizer.IncomingUrlFilters += delegate(Uri url) {
        // Returning false excludes the URL (here, sitemap.xml) from URL localization.
        if (url.LocalPath.EndsWith("sitemap.xml", StringComparison.InvariantCultureIgnoreCase)) {
            return false;
        }
        return true;
    };

I intend to back off the v2.0 branch for a while now. The Nugget Localization feature depends on xgettext/postbuild task support, which I understand others will do. If not, please let me know and I'll get onto it, as I need it for my project.

My testing has been limited to a small website; it should be tested with a larger site (which I do not have at present) before being released. If this work is accepted, there will then be some redundant stuff in there related to MVC route localization and action filter to lop out.

Regards Martin Connell

525fd1f

turquoiseowl · 2013-02-06T16:03:39Z

Added support for formatted nuggets which enables more sophisticated DataAnnotations:

    public class SignupModel
    {
        [StringLength(100, ErrorMessage = "[[[Enter between %0 and %1 characters|||{2}|||{1}]]]", MinimumLength = 6)]
        public string Password { get; set; }
    }

The syntax is like this:

[[[<canonical_msgid>|||val0|||val1|||valn]]]

valn is a runtime value stuffed into the nugget and picked up by the post-processor to replace the corresponding %n identifier in the canonical msgid.

So in the post-processing of nuggets:

    message = GetText(<canonical_msgid>)
    replace each %n with {n} in message
    message = string.Format(message, val0, val1, ... valn)
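The three post-processing steps can be sketched in C# as follows. The dictionary again stands in for GetText (hypothetical), and escaping literal braces before the %n to {n} conversion is an extra step not in the description, added so that a message containing { or } survives string.Format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch of formatted-nugget post-processing for
//   [[[<canonical_msgid>|||val0|||val1|||...|||valn]]]
// The dictionary is a stand-in for GetText.
public static class FormattedNuggetSketch
{
    private static readonly Regex Nugget = new Regex(@"\[\[\[(.+?)\]\]\]");
    private static readonly Regex PercentN = new Regex(@"%(\d+)");

    public static string Process(string body, IDictionary<string, string> messages)
    {
        return Nugget.Replace(body, m =>
        {
            string[] parts = m.Groups[1].Value.Split(new[] { "|||" }, StringSplitOptions.None);
            string msgid = parts[0];
            object[] values = parts.Skip(1).Cast<object>().ToArray();

            // 1. message = GetText(<canonical_msgid>), falling back to the msgid.
            string message = messages.TryGetValue(msgid, out string msgstr) ? msgstr : msgid;

            // 2. Escape literal braces, then replace %n with {n}.
            message = message.Replace("{", "{{").Replace("}", "}}");
            message = PercentN.Replace(message, p => "{" + p.Groups[1].Value + "}");

            // 3. message = string.Format(message, val0, val1, ... valn)
            return string.Format(message, values);
        });
    }
}
```

By the time the module sees the StringLength example, ASP.NET has already formatted the ErrorMessage with the field name, maximum length and minimum length as {0}, {1} and {2}, so the nugget arrives as [[[Enter between %0 and %1 characters|||6|||100]]]. A %n with no matching value makes string.Format throw a FormatException.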

PO Generation Notes

The parser for PO file generation needs to extract the canonical msgid from the nugget and output that.

Some experimentation with xgettext -a (extract all strings) has successfully managed to extract the nuggets from .cs, .cshtml and .js files. Output of that is suitable as input to a nugget parser.
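For the PO side, a hypothetical sketch of a nugget parser that keeps only the canonical msgid (the text before the first |||), de-duplicates it, and writes minimal PO entries. It assumes the [[[ ]]] markers and handles only backslash and double-quote escaping; real PO output would also need source references and a header entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical PO-generation step: collect canonical msgids from nuggets in
// source text (e.g. xgettext -a output) and emit minimal PO entries.
public static class NuggetPoSketch
{
    private static readonly Regex Nugget = new Regex(@"\[\[\[(.+?)\]\]\]");

    public static IReadOnlyList<string> ExtractMsgids(string source)
    {
        var seen = new HashSet<string>();
        var msgids = new List<string>();
        foreach (Match m in Nugget.Matches(source))
        {
            string body = m.Groups[1].Value;
            // Runtime values after "|||" are not part of the msgid.
            int sep = body.IndexOf("|||", StringComparison.Ordinal);
            string msgid = sep >= 0 ? body.Substring(0, sep) : body;
            if (seen.Add(msgid)) msgids.Add(msgid); // keep first-seen order
        }
        return msgids;
    }

    public static string ToPo(IEnumerable<string> msgids)
    {
        var sb = new StringBuilder();
        foreach (string id in msgids)
        {
            // PO strings use C-style escaping for backslash and double quote.
            string escaped = id.Replace("\\", "\\\\").Replace("\"", "\\\"");
            sb.Append("msgid \"").Append(escaped).Append("\"\n");
            sb.Append("msgstr \"\"\n\n");
        }
        return sb.ToString();
    }
}
```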

raulvejar · 2013-02-06T16:42:32Z

Great job!
I think your example is missing a parameter, or it should be %1 instead of %2.

turquoiseowl · 2013-02-06T17:03:37Z

@raulvejar Thanks for your feedback. Just testing :) and well spotted, it should be %1. I'll edit it.

danielcrenna · 2013-02-06T19:15:44Z

Nice work. I'll attempt to get caught up on all of this over the next little while, will get back with any bugs/ideas, and will definitely complete the postbuild enhancements first of all.

D.


turquoiseowl · 2013-02-07T11:46:04Z

@danielcrenna Thanks, sounds good, very happy to answer questions etc.

turquoiseowl mentioned this issue Feb 1, 2013

Strings used for annotations are not included in POT file #39

Closed

turquoiseowl added a commit that referenced this issue Feb 6, 2013

Formatted nugget support #37

33668ba

turquoiseowl mentioned this issue Apr 6, 2013

Future Direction for i18n of Web Applications #50

Closed

turquoiseowl closed this as completed May 10, 2013
