libvoikko IntegrationTesting

Harri Pitkänen edited this page Dec 19, 2015 · 7 revisions

Integration testing of libvoikko and its backends

During the early days of developing libvoikko and its Malaga-based Finnish morphology (Suomi-malaga) it was decided that most of the automated testing of the system should happen at the integration level. The Malaga backend was expected to be replaced with something else some day (this has now in fact happened), and we wanted to be able to use our test suite with the new backend without any modifications.

All of the compulsory pre-commit tests that every developer of libvoikko and voikko-fi is required to run are driven by the Python script voikkotest. The script should be run whenever a change is made that may affect the functionality of the libvoikko/voikko-fi combination.

Setting up the test environment

These instructions use examples and defaults that work in a Linux environment. Some changes may be needed for other operating systems. Please add information and/or fix the test tools if you try this on OS X or Windows.

  1. Make sure you have all the required development tools installed. You need at least
    • C++ compiler (GCC and MSVC should both work)
      • With GCC you will also need autoconf, automake and pkg-config
    • Python
    • Foma (and Malaga if you want to test the old dictionary format)
    • GNU Make
    • Git (the official command line client and TODO some Windows client should both work)
  2. Do a Git clone of the core project sources. If you want to minimize the need for additional configuration the recommended location for the checkout is $HOME/git/corevoikko. Later we refer to this directory as $CORE.
  3. Compile libvoikko with the desired backends. Currently tests exist for the Malaga and VFST backends.
  4. If you used the autotools build system, run make check to see whether the internal test suite passes. For now this suite does not include many tests, because most would need external data such as dictionaries to work. This is being worked on by improving the test automation.
  5. Install libvoikko. In these instructions we assume that it is installed in a location such that the system library loader finds the shared libraries and executable binaries are in $PATH. Other private locations are possible but you will need to do some additional configuration.
  6. (Only needed for slow mode testing) Configure libvoikko to look for the default morphology from $CORE/voikko-fi/vvfst. On Linux and OS X you can do this by issuing the following commands on the command prompt:
         mkdir -p ~/.voikko/2
         ln -s $CORE/voikko-fi/vvfst ~/.voikko/2/mor-standard
     On Windows you can use system registry to set the dictionary path and install voikko-fi binary morphology under that directory in subdirectory 2/mor-standard. See the READMEs from $CORE/libvoikko and $CORE/voikko-fi for more information.
  7. Add $CORE/tools/bin to your $PATH, at least in the shell you use for development.
  8. Add $CORE/tools/pylib and $CORE/libvoikko/python to your $PYTHONPATH, at least in the shell you use for development.
  9. Create the directory $HOME/tmp/voikkotest, which will be used for storing the temporary (and some not so temporary) files required by the test suite. You can choose a different location for this directory, but that requires a change in configuration.
  10. (Only needed for slow mode testing) Compile a list of Finnish words (preferably at least 200 000 unique entries) as a gzipped text file at $HOME/tmp/voikkotest/wordlist.txt.gz. The list does not need to consist only of valid words; it should contain spelling errors and other garbage too. See $CORE/tools/bin/wp-wordlist for a Python script and instructions on how to create such a list from a Wikipedia dump. You do not have to use this method; you can build the list from other sources too.
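
The path-related steps above are easy to get subtly wrong, so a small self-check can help before the first run. The following is only a sketch and not part of the Voikko tools: the function name check_test_environment is hypothetical, and it assumes that the checkout is in the recommended location $HOME/git/corevoikko unless an environment variable named CORE says otherwise (the page uses $CORE only as notation, so that variable is an assumption too).

```python
import os


def check_test_environment(environ, path_exists=os.path.exists, slow_mode=False):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    home = environ.get("HOME", "")
    # Assumption: recommended checkout location unless a CORE variable is set.
    core = environ.get("CORE", os.path.join(home, "git", "corevoikko"))

    # Step 7: $CORE/tools/bin must be on PATH.
    path_entries = environ.get("PATH", "").split(os.pathsep)
    if os.path.join(core, "tools", "bin") not in path_entries:
        problems.append("$CORE/tools/bin is not in PATH")

    # Step 8: both Python library directories must be on PYTHONPATH.
    pythonpath_entries = environ.get("PYTHONPATH", "").split(os.pathsep)
    for parts in (("tools", "pylib"), ("libvoikko", "python")):
        if os.path.join(core, *parts) not in pythonpath_entries:
            problems.append("$CORE/%s is not in PYTHONPATH" % "/".join(parts))

    # Step 9: the working directory for the test suite must exist.
    work_dir = os.path.join(home, "tmp", "voikkotest")
    if not path_exists(work_dir):
        problems.append("missing directory " + work_dir)

    if slow_mode:
        # Step 6: default morphology link; step 10: the gzipped word list.
        morphology = os.path.join(home, ".voikko", "2", "mor-standard")
        if not path_exists(morphology):
            problems.append("missing morphology link " + morphology)
        wordlist = os.path.join(work_dir, "wordlist.txt.gz")
        if not path_exists(wordlist):
            problems.append("missing word list " + wordlist)

    return problems
```

Calling check_test_environment(os.environ, slow_mode=True) in the development shell and printing the result would list whatever is still missing. The defaults checked here stop being correct once paths are changed through the configuration mechanism described later on this page.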

Using voikkotest

There are two modes in which voikkotest can be used: "quick" mode for running the basic test suite and "slow" mode for pre-commit testing.

voikkotest in quick mode

In quick mode voikkotest will compile morphologies if needed and then run the basic test suite. If the binaries are up to date this will only take a few seconds. To run voikkotest in quick mode simply type voikkotest in the terminal and press Enter. It does not even matter which directory you are in. Try this first once you have set up your test environment; it is the easiest way of making sure that everything works as it should. The output from a successful test run looks like this:

    Starting test suite under /home/harri/git/corevoikko/tests/voikkotest/fi-x-malmor
    "voikko-fi_FI.pro" is up to date.
    "voikko-fi_FI.pro" is up to date.
    install -m 755 -d /home/harri/tmp/voikkotest/inst/2/mor-malmor
    install -m 644 /home/harri/tmp/voikkotest/build/fi-x-malmor/voikko-fi_FI.pro /home/harri/tmp/voikkotest/build/fi-x-malmor/voikko-fi_FI.lex_? /home/harri/tmp/voikkotest/build/fi-x-malmor/voikko-fi_FI.mor_? /home/harri/tmp/voikkotest/build/fi-x-malmor/voikko-fi_FI.sym_? /home/harri/tmp/voikkotest/inst/2/mor-malmor
    Build complete, starting tests
    All 33 morphological tests were successful.
    Starting test suite under /home/harri/git/corevoikko/tests/voikkotest/fi-x-vfst
    make: Kohteelle "vvfst" ei tarvitse tehdä mitään.
    install -m 755 -d /home/harri/tmp/voikkotest/inst/2/mor-vfstd
    install -m 644 /home/harri/tmp/voikkotest/build/fi-x-vfstd/voikko-fi_FI.pro /home/harri/tmp/voikkotest/build/fi-x-vfstd/mor.vfst /home/harri/tmp/voikkotest/inst/2/mor-vfstd
    Build complete, starting tests
    All 24 spelling tests were successful.
    All 5 morphological tests were successful.
    Starting test suite under /home/harri/git/corevoikko/tests/voikkotest/fi-x-malstd
    "voikko-fi_FI.pro" is up to date.
    "voikko-fi_FI.pro" is up to date.
    install -m 755 -d /home/harri/tmp/voikkotest/inst/2/mor-malstd
    install -m 644 /home/harri/tmp/voikkotest/build/fi-x-malstd/voikko-fi_FI.pro /home/harri/tmp/voikkotest/build/fi-x-malstd/voikko-fi_FI.lex_? /home/harri/tmp/voikkotest/build/fi-x-malstd/voikko-fi_FI.mor_? /home/harri/tmp/voikkotest/build/fi-x-malstd/voikko-fi_FI.sym_? /home/harri/tmp/voikkotest/inst/2/mor-malstd
    Build complete, starting tests
    All 863 spelling tests were successful.
    All 50 suggestion tests were successful.
    All 127 morphological tests were successful.
    All 72 hyphenation tests were successful.
    All 164 grammar tests were successful.
    All 31 tokenizer tests were successful.
    All 18 sentence splitting tests were successful.
    All 4 compatibility tests were successful.

If you cannot get fully successful results from voikkotest in this mode and cannot solve the problem yourself, ask for help on our mailing list. A failing test suite is a sign of a problem that needs to be fixed immediately, and developers should never make commits that leave the test suite in such a state.
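
When the output is long, it can be handy to tally the "All N ... tests were successful." lines shown above. The sketch below is not part of voikkotest; the helper name summarize_successes is hypothetical, and it only recognizes the success lines from the sample output, because the format of failure messages is not shown on this page.

```python
import re

# Matches lines such as "All 863 spelling tests were successful."
SUMMARY_LINE = re.compile(r"All (\d+) (.+?) tests were successful\.")


def summarize_successes(output):
    """Sum the reported number of successful tests per test kind."""
    totals = {}
    for count, kind in SUMMARY_LINE.findall(output):
        totals[kind] = totals.get(kind, 0) + int(count)
    return totals
```

One possible use is to save the output of a run to a file, pass its contents to summarize_successes, and compare the totals between runs to notice a test category that has silently disappeared.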

Quick mode tests can be written for different backends and different languages. For examples see files under $CORE/tests.

voikkotest in slow mode

Slow mode is currently only available for testing the default Finnish dictionary. There are three commands involved in operating voikkotest in slow mode:

  • voikkotest --base will generate and store the baseline list of known good words from the full list in wordlist.txt.gz. You should run this after you have managed to get the tests to pass in quick mode and have not yet made any changes to the source code.
  • voikkotest --current will do a full rebuild of the voikko-fi binaries, run the basic test suite, use the current code to generate a new list of good words from wordlist.txt.gz and finally tell whether the list of good words has changed since voikkotest --base was last run. You should run this after you have made your changes to the source and are ready to commit them.
  • If voikkotest --current reported changes compared to baseline, you can use voikkotest --compare to view the differences and determine if they were expected. If not, you have introduced a bug. Add a test case for it to the test suite, fix it in the code and try again. Once there are no undesired changes in the output, you can do the actual commit. After that you should run voikkotest --base to reset the baseline.
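
The idea behind the baseline comparison can be illustrated with plain set arithmetic. This is a conceptual sketch only, not the actual implementation of voikkotest --compare; the function name compare_word_lists is hypothetical, and the inputs are assumed to be the lists of words accepted by the baseline build and by the current build.

```python
def compare_word_lists(baseline_words, current_words):
    """Return (words no longer accepted, words newly accepted), both sorted."""
    baseline = set(baseline_words)
    current = set(current_words)
    no_longer_accepted = sorted(baseline - current)
    newly_accepted = sorted(current - baseline)
    return no_longer_accepted, newly_accepted
```

Both result lists deserve attention: a valid word that is no longer accepted and a misspelling that is newly accepted are equally likely to indicate a bug.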

In slow mode voikkotest will also report some timing data that can be used to detect performance regressions. Usually there is no need to actively follow these numbers.

Configuring the test environment

While it is possible to run voikkotest without any configuration, you may want to adjust some settings or use non-default paths for certain directories. voikkotest uses the same shared configuration mechanism as our other test and development tools. Here is how to activate it:

  1. Take the sample configuration file from $CORE/tools/doc/voikko_dev_prefs.py and place it somewhere within your $PYTHONPATH. You may want to add an extra directory for this or use some of the directories that are already there.
  2. Uncomment the settings you want to change and set the value appropriately. The sample file has comments explaining what the settings do.

On Windows it is necessary to set encoding='UTF-8'.
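
A preferences module that is found through $PYTHONPATH is typically read with an optional import and a per-setting fallback. The sketch below illustrates that general pattern under stated assumptions; it is not taken from the Voikko tools, and the helper name get_preference and its default handling are hypothetical. Only the module name voikko_dev_prefs and the encoding setting come from this page.

```python
import importlib


def get_preference(name, default, module_name="voikko_dev_prefs"):
    """Return a setting from the preferences module, or default if it is absent."""
    try:
        prefs = importlib.import_module(module_name)
    except ImportError:
        # No preferences module on the Python path: fall back to the default.
        return default
    # Commented-out settings are simply missing attributes.
    return getattr(prefs, name, default)
```

With this pattern, a voikko_dev_prefs.py that contains only the line encoding='UTF-8' would override that one setting and leave all others at their defaults.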

Adding new tests to the basic test suite

TODO: write this paragraph and extend the tools to support testing of multiple languages and language variants.

Dictionary compatibility tests

Dictionaries for Voikko are versioned with a single interface version number. Any version of libvoikko that supports a given dictionary interface version should work with any dictionary that uses that version. It is possible to introduce bug fixes or features to the dictionary-libvoikko interface without changing the interface version, but only if this does not break the ability to use older dictionaries with the same interface version.

In quick mode voikkotest will run dictionary compatibility tests to make sure that the version of libvoikko being tested works sufficiently well with old dictionaries. No compatibility tests are provided with the program, and not having them will not cause a test failure, just a warning. You need to provide the compatibility tests yourself if you wish to run them. We don't expect everyone to run them, but at least dictionary developers are invited to run these tests against all released versions of the dictionaries they maintain. Of course dictionaries with an older interface version do not need to be tested.

You need two things to be able to run the compatibility tests:

  • Dictionaries that you want to test. These should be placed under $HOME/tmp/voikkotest/compatibility/ within subdirectories that start with mor-. For example testing Finnish with Suomi-malaga versions 1.4 and 1.5 would require these files:
        $HOME/tmp/voikkotest/compatibility/mor-sm1.4/2/mor-standard/voikko-fi_FI.pro
        $HOME/tmp/voikkotest/compatibility/mor-sm1.4/2/mor-standard/voikko-fi_FI.sym_l
        $HOME/tmp/voikkotest/compatibility/mor-sm1.4/2/mor-standard/voikko-fi_FI.mor_l
        $HOME/tmp/voikkotest/compatibility/mor-sm1.4/2/mor-standard/voikko-fi_FI.lex_l
        $HOME/tmp/voikkotest/compatibility/mor-sm1.5/2/mor-standard/voikko-fi_FI.pro
        $HOME/tmp/voikkotest/compatibility/mor-sm1.5/2/mor-standard/voikko-fi_FI.sym_l
        $HOME/tmp/voikkotest/compatibility/mor-sm1.5/2/mor-standard/voikko-fi_FI.mor_l
        $HOME/tmp/voikkotest/compatibility/mor-sm1.5/2/mor-standard/voikko-fi_FI.lex_l
    Additionally each dictionary variant name should be modified to contain the string that follows mor- in the directory name. For example mor-sm1.5/2/mor-standard/voikko-fi_FI.pro should contain a line like this:
        info: Language-Variant: standardsm1.5
  • Assertions file $HOME/tmp/voikkotest/compatibility/assertions.txt which should contain Python expressions that evaluate to True when any of the dictionaries is used. In the expressions v represents the Voikko object. Only basic stuff that works with all dictionary versions should be tested here. For Finnish the assertions file could look like this:
        v.spell(u"kissa")
        not v.spell(u"kisssa")
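
To make the semantics of the assertions file concrete, the sketch below evaluates each non-empty line as a Python expression with v bound to an object that has a spell method. It is an illustration only and not the code voikkotest uses: run_assertions and StubVoikko are hypothetical names, and the stub stands in for a real Voikko object so that the example runs without libvoikko or any dictionary installed.

```python
def run_assertions(lines, voikko):
    """Evaluate each expression with v bound to voikko; return those that fail."""
    failures = []
    for line in lines:
        expression = line.strip()
        if not expression:
            continue  # skip blank lines
        # eval is tolerable here only because the developer writes the file.
        if not eval(expression, {"v": voikko}):
            failures.append(expression)
    return failures


class StubVoikko:
    """Hypothetical stand-in that accepts only a fixed set of words."""

    def __init__(self, known_words):
        self._known_words = set(known_words)

    def spell(self, word):
        return word in self._known_words
```

With the two example assertions and StubVoikko(["kissa"]), run_assertions returns an empty list. In a real run the same expressions would be evaluated once for every dictionary under the compatibility directory, which is why only behaviour common to all dictionary versions belongs in the file.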