Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flat XML storage format #3776

Merged
merged 7 commits into from Jun 21, 2019

Conversation

Projects
None yet
3 participants
@BhaaLseN
Copy link
Contributor

commented Mar 11, 2018

This introduces a new monolingual storage format for a flat XML file that may look like this:

<root>
    <str key="hello_world">Hello World!</str>
    <str key="new_format">This is the new XML-based format.</str>
    <str key="flat_structure">It uses a flat structure.
Just a root element, and a number of what is basically key/value pairs.</str>
    <str key="customizable">Names and namespaces can be customized as parameters (so you can use any &lt;root&gt; or &lt;element&gt; name as you like).</str>
</root>

Main motivation to add such a format was to use it with Weblate (so CC @nijel in case he wants to chime in on things that might be required for Weblate; such as the mandatory TranslationUnit.settarget method), as none of the existing formats were close enough to what we use to provide translation in our XSLT based stylesheets (where XML simply made the most sense).

I'd still consider this WIP, as I could use some feedback/help on the following:

  • Choice of attributes: Assuming this is fine, since base classes also do that. Leaving out comments for now (can always add them later if someone really wants them)
    • The resource key goes to msgid (unless I am the only one that thinks this is fitting), while the value goes to msgstr. msgctxt isn't used; nor are locations (since those strings aren't really from a particular source file but their own seperate entity; similar to RESX)
    • Comments aren't supported (yet), mostly because our own sources are both inconsistent (XML comment before/after the actual element it belongs to) and unspecific (XML comments delimit "blocks" of resources where the developer thought it makes sense to seperate them; rather than documenting one particular resource). I can however add support for those if wanted; and we can agree on where the comments should be to be picked up (my choice would be before the resource element)
  • Copyright headers: Going by the other files; it just goes on the actual sources (not on tests); with creation year and original author
    • Do they go in all files, with no exceptions; or just on actual workable source files (not on tests)?
    • Is there a copy&paste ready template somewhere, or do I just use one from any other file?
    • Do I have to change anything from there, like the copyright year, names or whatever?
  • Documentation files:
    • Are they generated from source (if so: which commands can I use?) or written by hand? Or a combination of both?
    • Should I assume their location and just put an URL in the format documentation comment, or is this done in another way?
  • Tests:
    • Need more, want less?
    • Some of the existing test files didn't make sense to me, so I wrote them differently. If that was simply something I didn't see/understand/recognize (which is entirely possible; since Python is far from my main language), please point it out so I can address that.
    • Find out why Python 3 returns a different encoding. Requires me first fixing my local virtualenv to return a working python3 installation newer than 3.2... Looks like the default string encoding was different; added an explicit XML Declaration for that.

And also: Everything else I didn't think about while implementing this :)

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch 6 times, most recently from bd81f8e to b10e749 Mar 11, 2018

Show resolved Hide resolved translate/storage/flatxml.py Outdated

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch from b10e749 to 671cbc9 Mar 16, 2018

@unho

This comment has been minimized.

Copy link
Member

commented Mar 18, 2018

I only gave this PR a quick review, but I have some major blockers:

  • Format needs documentation (documentation is written by hand)
    • This should include a link to the official format specification
  • Converters need documentation (documentation is written by hand)
  • Converters need tests in Python, despite having functional tests is great

I must say I am quite pleased to see that the converters are coded using the pattern I am trying to introduce for all the converters in TTK, which should allow for easier maintenance and refactoring of common code.

@BhaaLseN

This comment has been minimized.

Copy link
Contributor Author

commented Mar 18, 2018

Thanks, I did open this PR early to get this kind of feedback.

I'll look into the documentation soon-ish, and add some more tests for the converters.

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch from 671cbc9 to 14d5390 Mar 19, 2018

@BhaaLseN

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2018

Rebased on latest master to get the changes in (especially for the tests that now require the capsys parameter for test_help).
I noticed a bunch of cleanup commits for the converters, is there anything I should take care of? Style-wise it doesn't seem to be too consistent regarding inputfile vs. input_file (not even in a po2x/x2po pair).

I also added some documentation, taking some inspiration from other documentation pages. I put the format under "translatable documents" next to HTML, since it seemed most fitting (vs. DTD, which would be be more fitting IMO, but is Mozilla-specific).
Feedback (obviously) appreciated, since I usually don't write end-user focused documentation :S

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch 2 times, most recently from cdc2faf to d995676 Apr 14, 2018

@BhaaLseN BhaaLseN referenced this pull request May 11, 2018

Open

Custom file format #3810

Show resolved Hide resolved translate/convert/po2flatxml.py Outdated
Show resolved Hide resolved translate/convert/po2flatxml.py Outdated
Show resolved Hide resolved translate/convert/po2flatxml.py Outdated
Show resolved Hide resolved docs/formats/flatxml.rst Outdated
@unho

This comment has been minimized.

Copy link
Member

commented Oct 6, 2018

@BhaaLseN I have performed another review (sorry for the delay). This PR has greatly improved but I still found some stuff that needs to be addressed. If you manage to get those nits sorted out this will be a very welcome contribution, thanks!

@BhaaLseN

This comment has been minimized.

Copy link
Contributor Author

commented Oct 6, 2018

I'll look into getting them addressed either later today or tomorrow, thanks for getting back to it!

Got the things addressed (and fixed a typo in docs), so I'm ready for some more review (if necessary).
CI is still running, but my local tests were fine; so I'm not too concerned there.

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch from d995676 to 2b690f9 Oct 6, 2018

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch from 2b690f9 to fa75843 Feb 13, 2019

@BhaaLseN

This comment has been minimized.

Copy link
Contributor Author

commented Feb 13, 2019

Rebased on latest master and updated the tests to not use the mixed quotes, although #3871 would be a nice addition to use instead of my lazy single-level reindent there.

CI fail seems to be the known (?) segfault, not sure I can do anything about that.

@nijel

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2019

I've created separate issue for the test segfaults: #3873

@unho

This comment has been minimized.

Copy link
Member

commented Feb 15, 2019

@BhaaLseN master issues seem to be fixed, can you please rebase this PR again? Thanks.

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch from fa75843 to 9576bb3 Feb 15, 2019

@BhaaLseN

This comment has been minimized.

Copy link
Contributor Author

commented Feb 15, 2019

Rebased. CI seems happy now :)
I also dropped my lazy reindent in favor of xml_helpers.reindent.

Show resolved Hide resolved translate/storage/flatxml.py Outdated
Show resolved Hide resolved translate/storage/flatxml.py Outdated

BhaaLseN added some commits Mar 7, 2018

add storage class for flat XML conversion
Flat XML is a simple, one-level XML file that uses an attribute to
identify the resource and the actual translation as its element value.

Root element name, Value element name and Key attribute name can be
customized, as well as the element namespace and the desired indentation
when serializing the file back to XML.

@BhaaLseN BhaaLseN force-pushed the BhaaLseN:flat-xml branch from 9576bb3 to 46d2349 Mar 5, 2019

@nijel nijel merged commit 03c3d9c into translate:master Jun 21, 2019

4 checks passed

codecov/patch 94.08% of diff hit (target 69.77%)
Details
codecov/project 69.98% (+0.2%) compared to 4718fab
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@nijel

This comment has been minimized.

Copy link
Contributor

commented Jun 21, 2019

Merged, thanks for your contribution!

@nijel nijel self-assigned this Jun 21, 2019

@BhaaLseN BhaaLseN deleted the BhaaLseN:flat-xml branch Jun 21, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.