Case 3: Grisaia no Kajitsu UTF8 transplant

Robert Jordan edited this page Aug 15, 2022 · 3 revisions

Case 3: Grisaia no Kajitsu UTF8 transplant (2nd attempt)

Used games: Grisaia no Kajitsu (18+, Denpasoft), Happiness 2! Sakura Celebration (JP release, engine ver 2.7.1.90), Nekopara Vol.3 (EN, engine ver 2.7.1.76, source unknown)

🚧 This page is a work in progress

This documents the first successful major engine transplant and lists the steps and knowledge used during the procedure. It still needs heavy reformatting to better explain the process.

UTF-8 Support

cs2.exe

Includes: cs2.exe, cs2confx.dll (and emotedriver.dll if present)

Upgrading

Upgrading is as simple as copying the KEY_CODE:KEY, V_CODE:DATA, and V_CODE2:DATA resources from the original game's executable into the newer engine executable, replacing the ones already there.

Things to avoid

Unless you are creating a patch for a Steam game, avoid using a target engine with Steam DRM. It's not the end of the world, but it means extra hoops to jump through (and more legal issues).

If upgrading an engine with E-Mote

Encoding Changes

The newest CatSystem 2 engines now use and expect modern text encodings to support all system locales, languages, and special characters without any extra special modifications.

Older CatSystem 2 engines all expect Shift_JIS for text files and text data. The engine switched to being locale independent around 2010. This change involved removing APIs that read and wrote text data automatically.

For example, in C:

// 't' for text mode, text encoding
// at the mercy of the system locale
FILE *fText = fopen("somefile.txt", "rt");

// 'b' for binary mode, text encoding
// is manually parsed
FILE *fBin = fopen("somefile.txt", "rb");

This issue still plagues the open toolset compilers like mc.exe, fes.exe, and potentially ac.exe. However, ac.exe doesn't expect Shift_JIS text, so it may be unaffected.

These programs still have to be run while emulating the Japanese locale, unless your files are pure ASCII (needs confirmation).
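To see whether a given script falls into the "pure ASCII" case, the bytes can be checked directly. The following is a minimal sketch, not part of any CatSystem 2 tool; the function names `is_pure_ascii` and `file_is_pure_ascii` are made up for illustration. It only tests the precondition and does not confirm that the compilers actually behave correctly without locale emulation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte is 7-bit ASCII, 0 otherwise. */
int is_pure_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if the file is pure ASCII, 0 if it is not,
   and -1 if it could not be opened or read. The file is opened in binary
   mode so no locale-dependent translation gets in the way. */
int file_is_pure_ascii(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    unsigned char chunk[4096];
    size_t n;
    int ok = 1;
    while (ok && (n = fread(chunk, 1, sizeof chunk, f)) > 0)
        ok = is_pure_ascii(chunk, n);

    if (ferror(f))
        ok = -1;
    fclose(f);
    return ok;
}
```

If `file_is_pure_ascii("somefile.txt")` returns 0, the file contains non-ASCII bytes and the Japanese locale is presumably still needed.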

WCHAR

Newer engines now use Windows Unicode wchar_t strings instead of the standard ANSI multibyte char strings. This is a major difference, as it unlocks access to filesystem paths no matter what obscure characters they contain. Previously, file paths were assumed to be in either Shift_JIS or the ANSI code page used by the system locale.

UTF-8

Includes: *.xml, *.csv, language*.txt, (TODO: more files)

Just as it uses Unicode internally (the only real choice for Windows programming with the WinAPI), CatSystem 2 now assumes most text files are stored in UTF-8.
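Before handing a text file to a UTF-8 engine, it can help to confirm that the bytes really are well-formed UTF-8. The sketch below is an illustrative validator (the name `is_valid_utf8` is an assumption, not something from the engine or toolset). A failure strongly suggests the file is still Shift_JIS or otherwise not converted; a pass only says the bytes are structurally valid.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8.
   Rejects overlong forms, surrogates (U+D800..U+DFFF), and code points
   above U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point valid for this length */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < need)
            return 0;       /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min) return 0;                       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        if (cp > 0x10FFFF) return 0;                  /* out of range */
        i += need + 1;
    }
    return 1;
}
```

For example, the Shift_JIS bytes 0x82 0xA0 fail immediately because 0x82 can only be a continuation byte in UTF-8, while the UTF-8 bytes 0xE3 0x81 0x82 pass.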

cs2confx.dll replaces the older cs2conf.dll file. The older one will not work due to compatibility differences. (Likely now uses Unicode WCHAR API)

Front End Scripts

Includes: fes.int, *.fes files

In order to support UTF-8 text drawing, a front end script must declare string objects of the type STRING8, LOGSTRING8, or NVSTRING8. These are object types dedicated to rendering UTF-8 text data, while STRING, LOGSTRING, and NVSTRING are the Shift_JIS equivalents. Older games will always be using the Shift_JIS string types.

STRING8 object example

#DEFINE
    // String types shown to draw a meswnd (message window)
    STRING8 str
    STRING8 namestr

One other major part of string drawing in the Front End Scripts needs changing: the #STRLAYOUT header.

#STRLAYOUT is called to prepare drawing of message (and name) strings. Cs2 games using localization features will conditionally change some layout settings depending on the language variable.

#STRLAYOUT
    str layout size 0,0,0,0
    str layout margin 0,0,0,0
    str layout frame \110
    // one of the two lines below
    str layout fontind 0 FontName.ttf
    str layout font  // default font

    str layout indent ...
    str layout fixedstr 0  // fixed string layout; this is very important and must be zero

This is essentially it for FES when it comes to scenes. Many configuration Front End Scripts (primarily flow.fes, the so-called 'main' script of Cs2) set up localization settings and global variables. Everything else is handled in configuration files.

KCS

Includes: kcs.int, mot.int, ptcl.int, *.kcs files

Binary scripts for the Cs2 engine. As they contain low-level assembly-like instructions, decompiling is less feasible.

Assume that all of these archives (and any others containing .kcs files) should be completely replaced with those of the target UTF-8 compatible game.

These are almost guaranteed to break things (or crash on startup) if missing.

Config

Includes: config.int, font.int, *.xml, *.csv, *.ttf

All .csv (nametable.csv) and .xml files should be re-encoded as UTF-8. For XML, remember to change the encoding attribute in the XML declaration: <?xml version="1.0" encoding="UTF-8"?>
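It is easy to re-encode a file and forget to update the `<?xml ... ?>` line. The following sketch checks whether the declaration at the start of an XML buffer names UTF-8. The function name `xml_declares_utf8` and its return convention are assumptions made for this example, and it is a simple string scan rather than a real XML parser.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of a and b. */
static int equals_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper. Returns:
    1 if the XML declaration has encoding="UTF-8" (any letter case),
    0 if it names another encoding or has no encoding attribute,
   -1 if the text does not start with an XML declaration. */
int xml_declares_utf8(const char *xml)
{
    if (strncmp(xml, "\xEF\xBB\xBF", 3) == 0)
        xml += 3;                       /* skip a UTF-8 byte order mark */
    if (strncmp(xml, "<?xml", 5) != 0)
        return -1;

    const char *end = strstr(xml, "?>");
    if (!end)
        return -1;

    const char *p = strstr(xml, "encoding");
    if (!p || p > end)
        return 0;                       /* no encoding inside the declaration */
    p += strlen("encoding");

    while (p < end && isspace((unsigned char)*p)) p++;
    if (p >= end || *p != '=')
        return 0;
    p++;
    while (p < end && isspace((unsigned char)*p)) p++;
    if (p >= end || (*p != '"' && *p != '\''))
        return 0;

    char quote = *p++;
    const char *q = memchr(p, quote, (size_t)(end - p));
    if (!q)
        return 0;

    size_t n = (size_t)(q - p);
    return n == 5 && equals_nocase(p, "utf-8", 5);
}
```

A file that still says encoding="Shift_JIS" after conversion returns 0 here and should have its first line fixed by hand.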

Startup.xml (and Master)

Add a languages grouping as seen in Nekopara Vol.3's startup.xml.

DO NOT add $0000-0000$ style localization strings to game titles without adding support for such in FES scripts. Only keybind names can probably be safely changed.

Optionally add font.int INT archives to INTS, but only if needed by FES strings.

GameDef.xml

Add this file to config/ (not the archive), using the one from the target engine as a reference.

Soundconf.xml and other new XML files aren't required.

Adv.xml

Add some elements from Nekopara Vol.3's Adv.xml, primarily <script><encoding>utf8</encoding></script> (or it may belong under a scene tag instead; unconfirmed). Other fields may need adding too, but they haven't been confirmed.

Unknown

One config file, possibly GameDef.xml or Adv.xml, has some required font config, specifically the one for 'fixed-length strings'.

CSTL

The base CSTL localization files can be generated from CST scene scripts by joining all the name/message fields within an 'input region'. The trailing \@ should be removed from any messages; it just marks continued text before (usually) an eventual empty message, then input.
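Removing the trailing \@ is a small string operation. Here is a minimal sketch; the function name `strip_continuation_marker` is made up for illustration and is not part of any existing tool. It assumes the message has already been converted to UTF-8: in Shift_JIS the byte 0x5C can also be the second byte of a double-byte character (ソ is 0x83 0x5C, for instance), so a naive byte comparison on unconverted text could cut a character in half.

```c
#include <string.h>

/* Hypothetical helper: removes one trailing "\@" (backslash, at sign)
   from msg in place. Returns 1 if a marker was removed, 0 otherwise.
   Assumes msg is UTF-8 (or ASCII), where byte 0x5C is always a backslash. */
int strip_continuation_marker(char *msg)
{
    size_t len = strlen(msg);
    if (len >= 2 && msg[len - 2] == '\\' && msg[len - 1] == '@') {
        msg[len - 2] = '\0';
        return 1;
    }
    return 0;
}
```

Messages without the marker are left untouched, so the function can be applied to every message field while joining an input region.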

At this point, you can add translations for each new language and include them with the original scraped messages when building the CSTL file. Note: The order in which languages are listed in the CSTL does not correspond to the indexes in startup.xml's languages section, but they do seem to be listed alphabetically.