Translate strings via Japanese→$language dictionaries? #38

Open
@nmlgc

With the reorganization of the project and the progress we've made in the last year, I think it's worth raising this question again. The new project administration will need to take a stance on this before covering the next game, and it'll be important to know the background behind my decision against such a system.

It has always been my design goal for thcrap's translation functionality not to rely on any original Japanese strings. This decision stems from two observations:

  • There are static translation patches for earlier games into a variety of languages.
  • With the older games essentially being abandonware [citation needed] and, as of this writing, only one game being distributed as an official download release, piracy remains the only distribution channel that really matters. And it has been shown time and time again that most site owners would rather provide complete, easy-to-use packages aimed at speakers of their own language than clean, original copies - which is further justified by some inadvertent "region locking" that e.g. causes 東方紅魔郷.exe and every game's custom.exe to simply not work on non-Japanese locales.

Combine the two and you have rampant piracy of unofficially translated games. We can't rely on pirated copies coming with the original text anymore. That's just how it is.

This means, however, that we have needed all sorts of alternative indexing systems to correctly assign translations:

  • Dialogue and endings use a rather involved time code system that additionally gives translators the freedom to edit dialogue on a text box level and allows them to use 2 lines where the original only uses one (and vice versa). This runs strongly against the line-centric design of ZUN's original formats, and is one source of all the complexity in the dialogue patcher (with the other one being the hard line system used in th06, th07 and parts of th08).

  • Spell cards use their (zero-based) index number in the game's Result screen, which is pulled out of the game's memory using a bunch of breakpoints. This works fine for games that have a result screen. All games that don't (TH095, Uwabami Breakers, TH125, TH143) coincidentally also happen to have no reliable index numbers, and therefore require even more build-specific binary hacks and breakpoints to somehow derive them from the game progression - or, in the case of Uwabami Breakers, a full copy of all .ECL danmaku scripts with corrected IDs. :(

  • Hardcoded string translation assigns IDs to virtual memory addresses (stringlocs.js), then looks up translations for these IDs in a separate table (stringdefs.js). This is reliable, needs zero game- and build-specific hacks, and I have a script to locate the addresses, but they still need to be committed for every build of every game.

  • Music Room translation uses a combination of hardcoded string translation (for the "No. X ??????" strings displayed for locked tracks), a separate file containing song title translations (themes.js) and a separate file for comments (musiccmt.js).

    The themes.js system was designed before thcrap to serve as a song title source for (hypothetical) third-party applications to cope with frequent translation changes (which in turn was the main motivation for thcrap in the first place).

    musiccmt.js basically uses the same format as the generic plaintext translation support that would later be developed for th143, but with a special syntax that replaces a single @ character in a line with a customizable format string, printing the title of the currently selected track.

    Not to mention that pulling the theme number out of the game also requires its fair share of build-specific breakpoints.

  • Dialog resource translation makes use of the fact that the widgets (and thus, their strings) internally appear in a set order, builds a JSON array of hardcoded string IDs (dialog_*.js) in this order, then pulls the translations out of stringdefs.js. Other than that, no build-specific hacks are necessary.
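
To make the indirection behind hardcoded string and dialog resource translation more concrete, here is a minimal Python sketch. Only the file names stringlocs.js, stringdefs.js and dialog_*.js come from the description above; the addresses, string IDs, translations and function names are made up for illustration, and the actual file layout and lookup code in thcrap may well differ.

```python
# Hypothetical data; the real files are JSON and their exact layout may differ.

# stringlocs.js: build-specific, maps virtual memory addresses to string IDs.
stringlocs = {
    "0x4a1b2c": "th06_menu_start",
    "0x4a1b40": "th06_menu_quit",
}

# stringdefs.js: build-independent, maps string IDs to translations.
stringdefs = {
    "th06_menu_start": "Start",
    "th06_menu_quit": "Quit",
}

# dialog_*.js: string IDs listed in the order the widgets appear.
dialog_ids = ["th06_menu_start", "th06_menu_quit"]


def translate_at(address, original):
    """Return the translation for the string at `address`, else `original`."""
    string_id = stringlocs.get(hex(address))
    if string_id is None:
        return original
    return stringdefs.get(string_id, original)


def translate_dialog(widget_texts):
    """Translate widget texts by position, keeping texts without an entry."""
    result = list(widget_texts)
    for i, string_id in enumerate(dialog_ids[: len(result)]):
        result[i] = stringdefs.get(string_id, result[i])
    return result
```

Neither lookup ever reads the text that is already in the game, which is why this kind of scheme keeps working on copies that have been statically patched into another language. The price is the stringlocs table, which has to be rebuilt for every build.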

Using one single dictionary-based solution instead of these systems would greatly reduce the amount of effort required to support new games, at the cost of compatibility with static patches and more bloat in the translation files.
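
For comparison, a dictionary-based lookup could be sketched roughly like this in Python. The dictionary entries and the function name are purely illustrative assumptions, not an existing thcrap format or API.

```python
# Hypothetical Japanese -> English dictionary; the entries are illustrative.
dictionary = {
    "ゲーム開始": "Start Game",
    "終了": "Quit",
}


def translate(original):
    """Look up `original` directly; fall back to it when there is no entry."""
    return dictionary.get(original, original)
```

This needs no addresses, indices or breakpoints, but it shows both costs: every translation file has to carry the full original Japanese text as its keys, and a copy that already ships with statically patched text (say, "Iniciar juego" instead of "ゲーム開始") never matches any key and stays untranslated.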
