Updating Unicode data files

To update the Unicode data files, follow these steps:

  1. Download the following files into a folder named dat in your current working directory. If updating to another version, replace 12.1.0 with the version you are aiming for.
  2. Run src/com.oracle.truffle.regex/tools/unicode-script.sh. This generates the following files in dat:
    • UnicodeFoldTable.txt
    • NonUnicodeFoldTable.txt
    • PythonFoldTable.txt
  3. Run src/com.oracle.truffle.regex/tools/generate_case_fold_table.clj >> src/com.oracle.truffle.regex/src/com/oracle/truffle/regex/tregex/parser/CaseFoldTable.java to generate the new case fold tables and append them to CaseFoldTable.java. Then open CaseFoldTable.java in an editor to replace the old character data with the new definitions.
    • Running this script requires a way to run Clojure scripts.
      • You can use Boot (https://boot-clj.com/), which lets you execute the script directly. Boot can usually be installed from your distribution's package manager.
      • Alternatively, you can use a Clojure jar file directly, as in java -jar clojure-1.8.0.jar --init src/com.oracle.truffle.regex/tools/generate_case_fold_table.clj --eval '(-main)'.
  4. Run src/com.oracle.truffle.regex/tools/generate_unicode_properties.py > src/com.oracle.truffle.regex/src/com/oracle/truffle/regex/charset/UnicodePropertyData.java. This rewrites UnicodePropertyData.java to contain the new definitions of Unicode properties.
  5. Run the main method of com.oracle.truffle.regex.charset.UnicodeGeneralCategoriesGenerator and replace src/com.oracle.truffle.regex/src/com/oracle/truffle/regex/charset/UnicodeGeneralCategories.java with its output.
  6. Run src/com.oracle.truffle.regex/tools/generate_ruby_case_folding.py and replace src/com/oracle/truffle/regex/tregex/parser/flavors/RubyCaseFoldingData.java with its output.
  7. Run src/com.oracle.truffle.regex/tools/generate_name_alias_table.py and replace src/com/oracle/truffle/regex/chardata/UnicodeCharacterAliases.java with its output.
  8. Run mx eclipseformat to fix any code formatting issues.
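
When targeting a version other than 12.1.0, the version substitution described in the download step can be scripted. The following is a minimal sketch, not part of the repository's tooling: the helper names `retarget` and `ensure_dat_dir` are hypothetical, and the example URL is a placeholder rather than one of the actual download locations.

```python
from pathlib import Path

# Version string that appears in the documented download locations.
DEFAULT_VERSION = "12.1.0"


def retarget(urls, version, default=DEFAULT_VERSION):
    """Return a copy of urls with the default Unicode version replaced by version."""
    return [url.replace(default, version) for url in urls]


def ensure_dat_dir(base="."):
    """Create the dat folder under base if it does not exist yet and return its path."""
    dat = Path(base) / "dat"
    dat.mkdir(parents=True, exist_ok=True)
    return dat


if __name__ == "__main__":
    # Placeholder URL (an assumption for illustration only).
    example = ["https://example.org/unicode/12.1.0/SomeFile.txt"]
    for url in retarget(example, "13.0.0"):
        print(url)
```

The rewritten URLs can then be passed to whichever download tool you prefer, with the results saved into the dat folder.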

All of the above steps are automated by run_scripts.sh. This script assumes you have the following things installed: clojure, python3, wget, and unzip.
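
Before running run_scripts.sh, it may help to confirm that the tools it assumes are on your PATH. The following is a minimal sketch of such a check; the function name `missing_tools` is hypothetical, and the check is not something run_scripts.sh is known to perform itself.

```python
import shutil

# Tools that run_scripts.sh assumes are installed, per the note above.
REQUIRED_TOOLS = ("clojure", "python3", "wget", "unzip")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from the given list that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.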