Translate strings via Japanese→$language dictionaries? #38

nmlgc · 2014-10-29T03:32:27Z

With the reorganization of the project and the progress we've made in the last year, I think it's worth raising this question again. The new project administration will need to take a stance on this before covering the next game, and it'll be important to know the background behind my decision against such a system.

It has always been my design goal for thcrap's translation functionality not to rely on any original Japanese strings. This decision stems from two observations:

1. There are static translation patches for earlier games into a variety of languages.
2. With the older games essentially being abandonware [citation needed] and, as of this writing, only one game being distributed as an official download release, piracy remains the only distribution channel that really matters. And it has been shown time and time again that most site owners would rather provide complete, easy-to-use packages aimed at speakers of their own language than clean, original copies. This is further justified by some inadvertent "region locking" that, e.g., causes 東方紅魔郷.exe and every game's custom.exe to simply not work on non-Japanese locales.

Combine the two and you have rampant piracy of unofficially translated games. We can't rely on pirated copies coming with the original text anymore. That's just how it is.

This means, however, that we have needed all sorts of alternative indexing systems to correctly assign translations:

- Dialogue and endings use a rather involved time code system that additionally gives translators the freedom to edit dialogue on a text box level, and allows them to use 2 lines where the original only uses one (and vice versa). This runs counter to the line-centric design of ZUN's original formats, and is one source of all the complexity in the dialogue patcher (the other one being the hard line system used in th06, th07 and parts of th08).
- Spell cards use their (zero-based) index number in the game's Result screen, which is pulled out of the game's memory using a bunch of breakpoints. This works fine for games that have a Result screen. All games that don't (TH095, Uwabami Breakers, TH125, TH143) coincidentally also happen to have no reliable index numbers, and therefore require even more build-specific binary hacks and breakpoints to somehow derive them from the game progression, or, in the case of Uwabami Breakers, a full copy of all .ECL danmaku scripts with corrected IDs. :(
- Hardcoded string translation assigns IDs to virtual memory addresses (stringlocs.js), then looks up translations for these IDs in a separate table (stringdefs.js). This is reliable and needs zero game- and build-specific hacks, and I have a script to locate the addresses, but they still need to be committed for every build of every game.
- Music Room translation uses a combination of hardcoded string translation (for the "No. X ??????" strings displayed for locked tracks), a separate file containing song title translations (themes.js) and a separate file for comments (musiccmt.js).

  The themes.js system was designed before thcrap, to serve as a song title source for (hypothetical) third-party applications to cope with frequent translation changes (which in turn was the main motivation for thcrap in the first place).

  musiccmt.js basically uses the same format as the generic plaintext translation support that would later be developed for th143, but with a special syntax that replaces a single @ character in a line with a customizable format string, printing the title of the currently selected track.

  Not to mention that pulling the theme number out of the game also requires its fair share of build-specific breakpoints.
- Dialog resource translation makes use of the fact that the widgets (and thus, their strings) internally appear in a set order: it builds a JSON array of hardcoded string IDs (dialog_*.js) in this order, then pulls the translations out of stringdefs.js. Other than that, no build-specific hacks are necessary.
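The two purely ID-based systems above (hardcoded strings and dialog resources) can be modeled in a few lines of Python to show how they avoid the Japanese text entirely. This is a simplified sketch, not thcrap's code: the dicts merely stand in for stringlocs.js, stringdefs.js and a dialog_*.js file, and every address, ID and function name is hypothetical.

```python
# Hypothetical sketch of the address -> ID -> translation indirection.
# thcrap itself does this in C against game memory; every address, ID and
# string below is made up for illustration.

# Role of stringlocs.js: virtual address of a hardcoded string -> string ID.
STRINGLOCS = {
    0x0046F1A0: "example_quit",
    0x0046F1B8: "example_retry",
}

# Role of stringdefs.js: string ID -> translation (per language patch).
STRINGDEFS = {
    "example_quit": "Quit",
    "example_retry": "Retry",
}

# Role of a dialog_*.js file: string IDs in the order the widgets appear.
DIALOG_IDS = ["example_retry", "example_quit"]


def translate_hardcoded(address, original):
    """Translate the string found at `address`, or keep the original."""
    string_id = STRINGLOCS.get(address)
    if string_id is None:
        return original
    return STRINGDEFS.get(string_id, original)


def translate_dialog(widget_strings):
    """Translate widget strings by position, never by their Japanese text."""
    return [
        STRINGDEFS.get(string_id, original)
        for string_id, original in zip(DIALOG_IDS, widget_strings)
    ]
```

Note that the original text only ever serves as the fallback value, never as a lookup key, which is exactly what keeps these systems working on statically patched copies. The price is the per-build address table.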

Using one single dictionary-based solution instead of these systems would greatly reduce the effort required to support new games, at the expense of compatibility with static patches and of more bloat in the translation files.

nmlgc · 2014-10-29T03:48:58Z

Reposting this from our (now apparently dead) Trello page. I've taken a look at Nutzer's Touhou 8.3 patch and noticed that it frequently uses multiple spell card declarations within a single scene. These would pretty much be impossible to translate without a dictionary system.

Currently, base_tsa merely has an extremely ugly solution to handle this case for the single scene in the original TH14.3, 3-7, which actually has a second spell card name. It involves finding a certain location in the ECL parsing code that only seems to be called in that specific case, then resetting thcrap's internal spell card ID and assigning the translation for "「リザレクション」" to ID № 1.

(Just in case anyone was still thinking that 14.3 is just a generic Touhou game that shouldn't have posed any difficulty in automatic patching. It is not.)

nmlgc · 2016-07-16T11:44:55Z

Turns out the best implementation is as follows:

Keep the current overall translation file layout, with a separate dictionary for every original file. This is necessary because according to UnKnwn, there are many cases of identical sentences in different contexts which translators might want to translate differently.
Still, there should be an additional global fallback table per game, not least to save work for translators who don't want to translate those cases differently.
Add a layer of indirection, so that we go "Japanese": some ID (in filename.ids.json in base_tsa) and then some ID: "translation" (in filename.table.json in the translation patches). This will be necessary for supporting static patches by simply adding "statically translated text": some ID to filename.ids.json.

Note that this means that we technically don't need separate filename.table.json files and could keep all translations in one single big table per game, but I think separate files would still be better, both for saving traffic when doing HTTP updates and for clarity in editing.
By making the ID step optional, we eliminate the need for separate ID tables where they aren't really necessary, as in client-side character name translation, which we can do in the context of an ending like this:

base_tsa/th14/e01.msg.ids.json:
```
{
    "霊夢　「やっと、お祓い棒が大人しくなってきたわ」": "e01_02"
}
```
lang_en/th14/e01.msg.table.json:
```
{
    "e01_02": "<r$<d$霊夢>  >\"Looks like my purification rod's finally calming down.\""
}
```
script_latin/global.table.json:
```
{
    "霊夢": "Reimu"
}
```
The `<d$>` command would then look up the given text in the global table. As you can see, we don't need an additional ID table in this case, because the Kanji name lookup can only have been initiated from our own translation, which always contains the correct source text.
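The proposed lookup chain can be modeled in Python using the example data above. This is a sketch under several assumptions: the tables are plain dicts, the per-game fallback table and script_latin's global table are merged into one dict, only the `<d$>` command is expanded (the `<r$>` command is passed through untouched), and `translate()` and `expand_d_commands()` are hypothetical names.

```python
import re

# Sample data copied from the e01 example.
# Role of base_tsa/th14/e01.msg.ids.json: source text -> ID.
E01_IDS = {"霊夢　「やっと、お祓い棒が大人しくなってきたわ」": "e01_02"}
# Role of lang_en/th14/e01.msg.table.json: ID -> translation.
E01_TABLE = {
    "e01_02": "<r$<d$霊夢>  >\"Looks like my purification rod's finally calming down.\""
}
# Role of script_latin/global.table.json (also used as the per-game fallback).
GLOBAL_TABLE = {"霊夢": "Reimu"}

# Matches an innermost <d$...> command (no angle brackets inside it).
D_COMMAND = re.compile(r"<d\$([^<>]*)>")


def expand_d_commands(translation, global_table):
    """Replace each <d$text> with the global table's entry for `text`."""
    return D_COMMAND.sub(
        lambda m: global_table.get(m.group(1), m.group(1)), translation
    )


def translate(source, ids, table, global_table):
    """Look up `source`; return it unchanged if nothing matches."""
    # The ID step is optional: without an ids entry, the text is its own key.
    key = ids.get(source, source)
    translation = table.get(key)
    if translation is None:
        # Global fallback table (assumed to use the same kind of keys).
        translation = global_table.get(key)
    if translation is None:
        return source
    return expand_d_commands(translation, global_table)
```

Translating the e01 line yields `<r$Reimu  >"Looks like my purification rod's finally calming down."`, while any text missing from all tables comes back unchanged.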
The only instance where we do need a global ID table is TH08 spell card owner translation on top of static patches, where we simply do the reverse for every language we have static patches for:

base_tsa/global.ids.json:
```
{
    "Reimu": "霊夢",
    "Рейму": "霊夢",
    "灵梦": "霊夢"
}
```
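A minimal Python sketch of this reverse mapping: statically translated names are first mapped back to the original Japanese through global.ids.json, and the Japanese name then serves as the key into the target language's table. The Russian target table and `translate_owner()` are hypothetical, chosen only to illustrate the two steps.

```python
# Role of base_tsa/global.ids.json: statically translated name -> Japanese.
GLOBAL_IDS = {
    "Reimu": "霊夢",
    "Рейму": "霊夢",
    "灵梦": "霊夢",
}

# Hypothetical target-language table, keyed by the original Japanese name.
LANG_RU_TABLE = {"霊夢": "Рейму"}


def translate_owner(name, global_ids, target_table):
    """Map a possibly static-patched owner name to the target language."""
    # Names that are still Japanese (unpatched games) are their own key.
    japanese = global_ids.get(name, name)
    return target_table.get(japanese, name)
```

With this, an English, Chinese or unpatched copy of TH08 all resolve to the same target-language name, and unknown names are shown as they are.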
Dialog boxes, Music Room comments, and other multi-line strings are looked up and replaced as a single, concatenated string. For better readability in plaintext editors, we should probably support JSON arrays for the values in the table.json files and \n-concatenate those automatically.
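The array convenience could look roughly like this in Python. It is a sketch assuming that source lines are joined with a newline before the lookup, and that the key is the source text itself (i.e. the optional ID step is skipped); `load_table()` and `translate_multiline()` are hypothetical names.

```python
import json


def load_table(json_text):
    """Parse a table.json; array values are joined with newlines."""
    table = json.loads(json_text)
    return {
        key: "\n".join(value) if isinstance(value, list) else value
        for key, value in table.items()
    }


def translate_multiline(lines, table):
    """Look up a multi-line string as one newline-concatenated key."""
    source = "\n".join(lines)
    return table.get(source, source)
```

Translators can then write each line as its own array element, while the lookup still happens on the whole concatenated string.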

… screen in all supported versions. And even in some versions that aren't supported yet. Some of those trials for older games don't even have the safe sprintf() hacks yet, which are necessary for translated versions to show up in the first place, heh. Oh, and while I'm at it:

- Don't cover the "th?? JP" string unless there is a good reason. This string is typically only used for the human-readable section of replay files, which we shouldn't translate in order to not introduce incompatibilities.
- "th08 Music Room spoiler 5" just consists of a single U+3000 IDEOGRAPHIC SPACE, and isn't meaningfully used in later games.

… yeah, we *really* need thpatch/thcrap#38.

32th-System · 2023-05-14T14:16:03Z

For spell cards, dictionary-based translation is now technically possible. The new spell_id breakpoint can take multiple parameters from multiple places and combine them into one string to be used as the spell ID. These parameters can even be strings. So it would be possible to have a spell_name breakpoint like this:

```json
{
    "spell_id": [
        {
            "type": "s",
            "param": "ecx"
        }
    ],
    "spell_name": "ecx"
}
```

or this:

```json
{
    "spell_id": "ecx",
    "spell_id_type": "s",
    "spell_name": "ecx"
}
```

However, since you can combine as many parameters as you want, there is also no need for dictionary-based translation, even in ISC or ISC mods that use multiple spell declarations in the same scene. On the other hand, I think that spell names in content mods should be entirely up to the mod itself, and that just putting the name in the ECL file and leaving it at that is therefore perfectly OK.
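To illustrate why combined parameters make a dictionary unnecessary here, the following Python model assembles a composite spell ID from several breakpoint parameters. It is not thcrap's implementation: registers are modeled as a dict that already holds decoded values (a real breakpoint would read the string from game memory), and both the "." separator and treating every non-"s" type as an integer are assumptions.

```python
def build_spell_id(specs, registers, separator="."):
    """Combine breakpoint parameters into a single spell ID string."""
    parts = []
    for spec in specs:
        value = registers[spec["param"]]
        if spec.get("type") == "s":
            # String parameter: use the text itself.
            parts.append(str(value))
        else:
            # Anything else is treated as an integer here (an assumption).
            parts.append(str(int(value)))
    return separator.join(parts)
```

For example, combining a scene number with a per-scene declaration counter gives two spells in the same scene distinct IDs such as "7.0" and "7.1", so each can get its own translation entry without looking up the Japanese name.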

32th-System · 2023-07-19T08:23:52Z

In th19, certain server status messages are pulled from the internet. This means that:

- they are impossible to static patch, and
- we only want to translate known messages; if someone encounters a new server message, it should be shown as is.

Therefore, they can only be translated with a dictionary-based system. Because of that, I have added a new dict_translate breakpoint. It's placed right before the draw_ctext call that's responsible for drawing those status messages to the screen.
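The behavior described above, exact-match translation of known messages with pass-through for anything new, boils down to a single lookup with a default. A Python sketch; the message text is invented for illustration, since the actual th19 server messages aren't listed here, and `dict_translate()` only mirrors the breakpoint's name.

```python
# Hypothetical table of known th19 server messages -> translations.
KNOWN_SERVER_MESSAGES = {
    "サーバーに接続しています": "Connecting to the server",
}


def dict_translate(message, table):
    """Translate a known message; show unknown ones exactly as received."""
    return table.get(message, message)
```

Because unknown keys fall through unchanged, a new message from the server never turns into garbage or an empty string on screen.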

Since we were basically forced to add this functionality, and it has now been 9 years since this issue was opened, using a dictionary-based system for other things might warrant another, more thorough discussion.

nmlgc added the question label Oct 29, 2014

This was referenced May 1, 2015

- Len'en Project support #42 (open)
- Regex text replacement #49 (open)
- Assign character mood to dialog translation units? thpatch/thpatch#1 (open)
