Find a better way to test Trixi2Vtk #680

efaulhaber · 2021-06-29T09:59:52Z

Currently, Trixi2Vtk tests are comparing hashes of the generated VTK files. This doesn't work reliably, especially not when testing on different operating systems.

sloede · 2021-08-19T18:31:39Z

I am trying to collect some ideas here, and weigh the pros and cons. Feel free to amend this post with more arguments or new ideas (and to leave a message afterwards stating what you did):

0 Do not change anything

That is, compare SHA hashes of the VTK files generated while testing against reference values.

Pro

Zero initial overhead (we already have this!)
Works for all file formats (VTI, VTK, VTS)
Transparent to changes to underlying data written etc. (the hash is oblivious to the file format)
Highly sensitive: anything changes, the hash will change

Con

Overly sensitive: VTK format changes or machine-accuracy differences in Trixi results already produce changed hashes even though no differences are visible to the user
Not portable: different hashes for different OS, different versions of zlib
No diff: a change in the hash only says something has change, but not what has changed
Not very robust & high maintenance: need to continuously update the hashes for any change --> real risk of people just updating hashes without checking files

1 Store VTK uncompressed

Achievable by passing compress=false to vtk_grid as an additional option.

Pro

Might circumvent the issue of different zlib implementations causing different hashes
Easy to implement

Con

Does not reflect what we use in practice (we store as compressed when running Trixi2Vtk)
Other disadvantages of current approach remain

2 Compare only headers

Instead of comparing the entire file and its contents, we could just compare the headers without the data. For example, for a VTK file, the file would look something like:

<?xml version="1.0" encoding="utf-8"?>
<VTKFile type="UnstructuredGrid" version="1.0" byte_order="LittleEndian" header_type="UInt64" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="259997" NumberOfCells="197440">
      <Points>
        <DataArray type="Float64" Name="Points" NumberOfComponents="3" format="appended" offset="0"/>
      </Points>
      <Cells>
        <DataArray type="Int32" Name="connectivity" NumberOfComponents="1" format="appended" offset="543461"/>
        <DataArray type="Int32" Name="offsets" NumberOfComponents="1" format="appended" offset="1369328"/>
        <DataArray type="UInt8" Name="types" NumberOfComponents="1" format="appended" offset="1598133"/>
      </Cells>
      <CellData>
        <DataArray type="Float64" Name="rho" NumberOfComponents="1" format="appended" offset="1598380"/>
        <DataArray type="Float64" Name="v1" NumberOfComponents="1" format="appended" offset="2969685"/>
        <DataArray type="Float64" Name="v2" NumberOfComponents="1" format="appended" offset="4283158"/>
        <DataArray type="Float64" Name="p" NumberOfComponents="1" format="appended" offset="5822483"/>
      </CellData>
    </Piece>
  </UnstructuredGrid>
  <AppendedData encoding="raw">
  [raw encoded data]
  </AppendedData>
</VTKFile>

We could just read in the files line by line until AppendedData, then diff the string against a reference.

Pros

Robust against changes in the file format, the compression, or even changes to the solution
On error, there is a chance to find out what went wrong

Cons

Maybe too insensitive against, e.g., errors during interpolation
Requires new code for creating a suitable diff

3 Read in VTU files and compare directly

Since ParaView files are "somewhat" compatible to XML, we could just parse them as XML and then compare. Alternatively, we could try to read them in as actual VTK files.

Pros

Fine-grained control over sensitivity (e.g., allow differences of floating point values up to a given epsilon)

Cons

Tremendous effort to implement testing infrastructure
Need to store entire VTU files as references

sloede · 2021-08-19T18:32:44Z

My current favorite would be to try 1) and if it doesn't help enough, go for 2).

efaulhaber · 2021-08-19T19:51:17Z

It looks to me like the headers of two simulations in 2) are identical if the same number of cells and the same polydeg has been used. So, the output could have completely different nodal coordinates and nodal values, and the tests would still pass, right?

sloede · 2021-08-19T20:30:18Z

It looks to me like the headers of two simulations in 2) are identical if the same number of cells and the same polydeg has been used. So, the output could have completely different nodal coordinates and nodal values, and the tests would still pass, right?

Yes, correct. Although I think it's unlikely (but not impossible) that the number of cells is "accidentally" generated correctly, this check no. 2) will tell us nothing about the contents of the individual arrays. If that's not enough, I think we'd have to go towards something like 3) and either parse the files as XML or directly write a (simplified) parser to be able to compare values one by one against a reference file.

ranocha · 2021-08-20T06:46:43Z

We're already doing this in practice, aren't we? Speaking at least for me, whenever I see a failing hash check, I expect it to be caused by minor differences and tend to think we should just update the hash. Effectively, we could just remove the checks of the hashes and would only get some smoke tests 🤷

sloede · 2021-08-20T07:07:08Z

Yes, I agree, @ranocha - I have done the same...

In the past, I have done such tests for comparing two files by first comparing their headers and then compare the contents variable by variable. I expected equality for integer types and allowed for 1e-12 difference for floating point values.

The VTK XML format is not super sophisticated, so I guess it would be possible to write a rudimentary reader in Julia. Given that we will likely always rely on ParaView for 3D visualization, I think the question about a more sophisticated test routine is not "if" but rather "when".

sloede · 2021-08-22T11:58:15Z

To support this discussion at the next meeting, I went ahead and implemented a rudimentary VTK XML reader: ReadVTK.jl. It is capable of reading in all unstructured VTK files produced by Trixi2Vtk, i.e., those files ending on .vtu.

Thus, it should be sufficient for comparing VTK files against references during testing, should we decide to go down this road. If necessary, adding support for the remaining file types wouldn't be too hard either.

sloede · 2021-08-24T16:19:29Z

The consensus at today's Trixi meeting was to rewrite Trixi2Vtk.jl testing to use ReadVTK.jl for comparing generated VTK files against reference files. Maybe @FelipeSantillan will be able to help us with this.

sloede · 2021-10-20T07:45:45Z

@FelipeSantillan @ranocha @jlchan

This is what I had in mind for ReadVTK.jl-based testing (but feel free to comment/improve):

Rewrite test_trixi2vtk such that it takes the names of reference VTK files against which to test. Then, for each reference file/test file (we should probably adopt a unique naming convention and stick to it), run the following checks (in this order) in a separate function:

Compare if expected file exists
Compare headers
- meta data such as file type, version, byte order, compressor should match
- number of cells and points should match as well
Compare points
- Shape should match
- Values should match approximately (FP comparison)
Compare cells
- Connectivity, offsets, cell types should match exactly (all integers IIRC)
Compare {point, cell} data
- Number of variables should match
- For each variable in the reference file, a variable should exist in the file to test and vice versa (i.e., no missing, no extra variables)
- For each variable in the reference file, compare to the variable in the file to test:
  - Length should match
  - Types should match
  - Values should match approximately for floating point data, exactly for all other data

A possible API for said test function could be

function compare_vtk_files(reference_file, vtk_file, atol=1e-12)
  ...
end

where atol (and possible rtol?) allow to specify tolerances for floating point comparisons.

Each test category should be in its own @testset such that an error can immediately be tracked to the offending part of the VTK file. Equally important: If possible a failed test should not result in the remaining tests to be skipped (unless they are rendered impossible, e.g., when a variable is missing). This way, you get the full picture of failures after a single run and don't have to step through it multiple times.

The code that compares and checks a single file against its reference (i.e., compare_vtk_files above) should be stateless and usable outside the testing framework, such that it can be used to also manually test VTK files against references. Maybe it could even become an exported function from Trixi2Vtk?

I would put the reference files in a repo, e.g., Trixi2Vtk_reference_files (or Trixi2Vtk_test_files). In the repo, I would probably create a folder structure that mimics the test structure with a clear rule to translate from test name to folder, e.g., "2D" -> "TreeMesh" -> "uniform test" to 2d/treemesh/uniform-test etc. In the test scripts of Trixi2Vtk, I would store the latest commit hash from that reference file repo for which to retrieve all reference files, such that files can be overwritten/deleted without destroying testing for older versions.

Now, these are my thoughts that I had when starting to write ReadVTK.jl, based on experience from other codes I used to work with, where we compared results against reference files. Please note that these are merely taken to be suggestions, and I might very well have missed something, but maybe this can be used to start a conversation and give @FelipeSantillan a starting point.

sloede · 2021-10-20T07:52:25Z

Regarding the ReadVTK.jl package registration: For the time being, @FelipeSantillan could install ReadVTK.jl locally as a dev version and develop initial testing capabilities based on it. Once he finds that the interface seems to work well enough, we need to register it as a package such that it can be used for CI testing. Alternatively, we could just register v0.1.0 right now, and then do a breaking release v0.2 (if necessary) once Felipe has gathered some experience with it.

The former approach might be slightly cleaner, the latter is likely more convenient. What do you think?

ranocha · 2021-10-20T07:56:53Z

Your overall proposal sounds good to me - thanks 👍
If ReadVTK.jl is already in a "somewhat usable" shape, I think it's okay to register it right now. That way, we will not have to wait for three days when everything is fine.

sloede · 2021-10-20T08:46:44Z

Your overall proposal sounds good to me - thanks 👍 If ReadVTK.jl is already in a "somewhat usable" shape, I think it's okay to register it right now. That way, we will not have to wait for three days when everything is fine.

OK. I think it is very well in a usable state right now. I'll check again, though, and then register it as a new package if it seems OK.

ranocha · 2021-10-20T10:50:54Z

Moreover, we should copy the elixirs we use for testing Trixi2Vtk.jl over to Trixi2Vtk.jl - the exact content of elixirs shouldn't be part of the public API of Trixi.jl, I think. This has already caused problems in trixi-framework/Trixi2Vtk.jl#46

This allows us to merge simple improvements such as #50 without tedious and useless work. Xref trixi-framework/Trixi.jl#680

efaulhaber added discussion visualization labels Jun 29, 2021

sloede mentioned this issue Jun 29, 2021

Trixi2Vtk test results for P4estMesh on macOS are unverified #681

Open

sloede mentioned this issue Jul 22, 2021

Update hashes trixi-framework/Trixi2Vtk.jl#39

Merged

ranocha assigned FelipeSantillan Oct 20, 2021

ranocha removed the discussion label Oct 20, 2021

ranocha added a commit to trixi-framework/Trixi2Vtk.jl that referenced this issue Jan 13, 2022

skip testing hashes since they are useless

a5d38b2

This allows us to merge simple improvements such as #50 without tedious and useless work. Xref trixi-framework/Trixi.jl#680

ranocha mentioned this issue Jan 13, 2022

skip testing hashes since they are useless trixi-framework/Trixi2Vtk.jl#51

Merged

ranocha added a commit to trixi-framework/Trixi2Vtk.jl that referenced this issue Jan 13, 2022

skip testing hashes since they are useless (#51)

771cb3d

This allows us to merge simple improvements such as #50 without tedious and useless work. Xref trixi-framework/Trixi.jl#680

andrewwinters5000 mentioned this issue Dec 2, 2022

Add capability to not perform reinterpolation trixi-framework/Trixi2Vtk.jl#57

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Find a better way to test Trixi2Vtk #680

Find a better way to test Trixi2Vtk #680

efaulhaber commented Jun 29, 2021

sloede commented Aug 19, 2021

sloede commented Aug 19, 2021

efaulhaber commented Aug 19, 2021

sloede commented Aug 19, 2021

ranocha commented Aug 20, 2021

sloede commented Aug 20, 2021

sloede commented Aug 22, 2021

sloede commented Aug 24, 2021

sloede commented Oct 20, 2021

sloede commented Oct 20, 2021

ranocha commented Oct 20, 2021 •

edited

Loading

sloede commented Oct 20, 2021

ranocha commented Oct 20, 2021

Find a better way to test Trixi2Vtk #680

Find a better way to test Trixi2Vtk #680

Comments

efaulhaber commented Jun 29, 2021

sloede commented Aug 19, 2021

0 Do not change anything

Pro

Con

1 Store VTK uncompressed

Pro

Con

2 Compare only headers

Pros

Cons

3 Read in VTU files and compare directly

Pros

Cons

sloede commented Aug 19, 2021

efaulhaber commented Aug 19, 2021

sloede commented Aug 19, 2021

ranocha commented Aug 20, 2021

sloede commented Aug 20, 2021

sloede commented Aug 22, 2021

sloede commented Aug 24, 2021

sloede commented Oct 20, 2021

sloede commented Oct 20, 2021

ranocha commented Oct 20, 2021 • edited Loading

sloede commented Oct 20, 2021

ranocha commented Oct 20, 2021

ranocha commented Oct 20, 2021 •

edited

Loading