
Reference datasets #20

Closed
neurolabusc opened this issue Apr 4, 2022 · 26 comments

Comments

@neurolabusc
Member

@frheault is it possible to have some demo code that converts from/to TRX to help validate upcoming TRX tools? You have TRK/TCK/VTK(FIB) samples here. Can you provide a simple script that reads each of these images and writes the output as TRX? Some concrete real world examples would be a great help.

@frheault
Collaborator

frheault commented Apr 5, 2022

Great suggestion!
I had to borrow a bit of code from Dipy to include TRX support (but this means most of the work is already done if Dipy wants to support it)

Here is the PR #22

@neurolabusc
Member Author

neurolabusc commented Apr 6, 2022

@frheault thanks for your quick work. Your converter appears to work. I do have a couple of comments:

  1. When the input files define positions with float32 precision, the resulting TRX file generates "positions.3.float16". I would suggest that float32 be the default. Some languages (e.g. JavaScript) and some architectures (e.g. x86-64) are not tuned for float16, and a loss of precision should not be the default behavior.
  2. It would be great if the input ".trk.gz" format was supported, as GZip compression is common for TRK files.
  3. Your software does a nice job of converting TRK scalars and properties. I do wonder if these should be trimmed of white space. For example, your converter does a literal job of converting TRK files generated in Matlab, but the resulting names are padded (effect .float32, pval .float32, pval_corr.float32), presumably because Matlab string arrays were used, which require equal lengths. I am not sure what the ideal behavior is. My sense is that effect.float32 is preferred, but I recognize this is not a direct translation. I do not have a strong opinion here, but I do think there should be a discussion. In particular, as TRX maps property names onto the file system, I think scalars and properties with special characters should be censored (with Windows being the lowest common denominator, as it forbids characters that are allowed in Unix).
  4. I wonder if the software should default to offsets.uint32 instead of offsets.uint64 for tractograms with fewer than 4 billion streamlines. The decompressed array would require less RAM, and the zip format (as well as zstd and gz) does not fully leverage larger integer datatype swizzling for compression.

The detail of my comments reflects my enthusiasm for this format rather than a core criticism. I am working on a JavaScript implementation designed to work on computers, tablets and phones, so memory considerations are at the top of my mind.
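For context on point 1, here is a minimal NumPy sketch (the coordinate value is illustrative) of the precision lost when a scanner-space position is stored as float16:

```python
import numpy as np

# A streamline vertex near the edge of a typical field of view, in mm.
v32 = np.float32(123.456)
v16 = np.float16(v32)  # what positions.3.float16 would store

# float16 has a 10-bit mantissa: near 123 mm, the spacing between
# representable values is 2**-4 = 0.0625 mm.
err = abs(float(v16) - float(v32))
print(err)  # roughly 0.02 mm at this magnitude
```

Whether ~0.05 mm worst-case rounding matters depends on the application, which is exactly the trade-off debated below.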

@frheault
Collaborator

frheault commented Apr 6, 2022

Thanks for the feedback!

  1. I agree that having a default data type that leads to a loss of precision is not optimal. But would it be more appropriate to allow this data type to be changed in software like mrtrix, dsi-studio, dipy, etc., and leave the default at float16? I assume most people will keep the default 99% of the time; is it worth literally hundreds of terabytes across thousands of labs to save some precision at the fourth or fifth decimal? (Or maybe I underestimate the loss of precision on these architectures.)

  2. I will add trk.gz support ASAP, good suggestion. Out of curiosity, which software are you using to generate these?

  3. Could I have a TRK file with these data as an example? I will try to fix the problem; it should be relatively easy, but I want to make sure I actually make your case work. And I agree that special characters should be avoided in filenames; I don't think anyone would want to use them anyway. @arokem what do you think?

  4. offsets are values related to vertices, so while 4 billion is a gigantic number it can be reached pretty fast when using a small step size. The default could be made uint32, and if the number of vertices passes a certain threshold (when doing on-the-fly saving for example) the array could be switched to uint64?
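The threshold idea in point 4 can be sketched in a few lines (a hypothetical helper, not code from the converter): pick uint32 while every vertex index fits, and switch to uint64 beyond that.

```python
import numpy as np

def offsets_dtype(nb_vertices: int) -> np.dtype:
    """Smallest unsigned dtype able to index every vertex.

    uint32 covers tractograms with fewer than 2**32 vertices;
    anything larger needs uint64.
    """
    return np.dtype(np.uint32) if nb_vertices < 2**32 else np.dtype(np.uint64)

print(offsets_dtype(10_000_000))  # uint32
print(offsets_dtype(2**32 + 1))   # uint64
```

For on-the-fly saving, the array would have to be re-written as uint64 once the running vertex count crosses the 2**32 boundary.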

@neurolabusc
Member Author

  1. For new datasets moving forward, I think there could be some discussion regarding whether Float16 is sufficient (I suspect it is for most applications). It may be reasonable that float16 is the default when generating new streamlines from scratch. However, when converting legacy formats like TCK, TRK, VTK, I would try to preserve their precision. It is worth noting that tck does not support float16, but it does support float64! My sense is modern DWI scans will allow denser tractography where size matters, but legacy formats were developed when there were fewer streamlines.
  2. I was trying your converter on a wide range of data available on the web, most of it created with DSI studio, e.g. here.
  3. I do think it is an open debate regarding whether white space should be removed or not (should one censor data), but the forbidden characters are a real issue with the TRX system where the property name is embedded in the file name.
  4. I think for tools creating new streamlines of unknown size, they might want to default to uint64, but for conversion and datasets of known size using uint32 makes a lot of sense. It is worth noting that most graphics draw calls are limited to UINT32 so if you really want more than 4294967296 vertices you will have to cull the number to display them or segment them into multiple VAOs.
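On the filename-censoring question in point 3, a minimal sketch of what trimming plus Windows-safe sanitization could look like (the character set is Windows' forbidden list; the underscore replacement policy is my assumption, not the converter's behavior):

```python
import re

# Windows forbids < > : " / \ | ? * and control characters in file names;
# as the strictest common target, use it on every platform.
_FORBIDDEN = r'[<>:"/\\|?*\x00-\x1f]'

def sanitize_property_name(name: str) -> str:
    """Strip padding whitespace and replace forbidden characters with '_'."""
    return re.sub(_FORBIDDEN, '_', name.strip())

print(sanitize_property_name('effect '))  # 'effect'
print(sanitize_property_name('p<0.05?'))  # 'p_0.05_'
```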

This is outside my domain, so maybe experts like @jdtournier want to weigh in, but my sense is that if you are trying to track more than 4294967296 vertices, you should try to make the software smarter. One option would be vertex reuse (after all, if you are OK with float16 precision, the streamlines will share many of the same vertices); alternatively, you may want to work on signal averaging to create more accurate streamlines rather than more of them. I can certainly see the logic for creating a new format with headroom for the future. However, the human brain is of known, limited volume, and MR signal is based on the volume of hydrogen: a 2mm isotropic scan has 8 times (2^3) more signal than a 1mm voxel, with SNR effectively linear with field strength for most modalities.

@frheault
Collaborator

frheault commented Apr 7, 2022

@neurolabusc I added a new script (tff_convert_dsi_studio.py). I decided to go this way because TRK files from DSI-Studio do not seem to have a valid header (one that matches their associated NIFTI):
[screenshot, 2022-04-07: TRK header fields that do not match the associated NIFTI header]

Also, the NIFTI seems to be spatially aligned with MNI152, but the streamlines are not (all software but TrackVis uses rasmm/world space/scanner space; TrackVis uses voxmm):
[figure: coordinate-space mismatch between the streamlines and the volume]

This leads my script tff_visualize_overlap.py to expose that problem.
[figure: tff_visualize_overlap.py output showing the streamlines outside the volume]

So I made a special script (kind of hacky/experimental) to fix it separately. Modifying the loading/saving code of DSI-Studio to increase compatibility with other tools would be nice! @frankyeh

Technical note, once these operations are done, the resulting streamlines can be saved with Dipy StatefulTractogram and they can be visualized in MITK, MI-Brain (TRK/TCK/FIB), SurfIce, TrackVis, Dipy and if saved in TCK-> mrtrix.
At the moment, I think TRK from DSI-Studio only works within DSI-Studio (see the second figure).

    # Excerpt from tff_convert_dsi_studio.py. `sft` is a dipy
    # StatefulTractogram; get_axis_shift_vector and flip_sft come from the
    # script's tractogram utilities.
    from dipy.io.stateful_tractogram import Space, StatefulTractogram

    sft.to_vox()
    # Re-declare the voxel coordinates as voxmm against the FA reference.
    sft_fix = StatefulTractogram(sft.streamlines, args.in_dsi_fa, Space.VOXMM)
    sft_fix.to_vox()

    # DSI-Studio streamlines are flipped along x and y relative to the NIFTI.
    flip_axis = ['x', 'y']
    sft_fix.streamlines._data -= get_axis_shift_vector(flip_axis)
    sft_flip = flip_sft(sft_fix, flip_axis)

    sft_flip.to_rasmm()
    # Half-voxel shift from corner-based to center-based coordinates.
    sft_flip.streamlines._data -= [0.5, 0.5, -0.5]

EDIT: I took TRK from https://pitt-my.sharepoint.com/:f:/g/personal/yehfc_pitt_edu/Ek0DdO67iQ9NvkJUci91lzMBXCVBq926QXTTY7JK6LIjgw?e=jvydcC
and I took NIFTI from https://pitt-my.sharepoint.com/:f:/g/personal/yehfc_pitt_edu/EvAcb1QyogFPg206v-FRl2gB6EcDf3TIPG37JyugoL3hdA?e=SuGBZ4

@neurolabusc
Member Author

@frheault this is an elegant script. One unusual feature is that it can convert compressed .trk.gz files but not uncompressed trk files:

$python ./tff_convert_dsi_studio.py C_PHP.trk.gz FA.nii CPHP.trk  
$python ./tff_convert_dsi_studio.py uCPHP.trk FA.nii uCPHP.trx 
Traceback (most recent call last):
  File "/Users/chrisrorden/Neuro/tractography_file_format/scripts/./tff_convert_dsi_studio.py", line 86, in <module>
    main()
  File "/Users/chrisrorden/Neuro/tractography_file_format/scripts/./tff_convert_dsi_studio.py", line 64, in main
    shutil.copyfileobj(f_in, f_out)
  File "/Users/chrisrorden/miniforge3/lib/python3.9/shutil.py", line 205, in copyfileobj
    buf = fsrc_read(length)
  File "/Users/chrisrorden/miniforge3/lib/python3.9/gzip.py", line 300, in read
    return self._buffer.read(size)
  File "/Users/chrisrorden/miniforge3/lib/python3.9/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Users/chrisrorden/miniforge3/lib/python3.9/gzip.py", line 487, in read
    if not self._read_gzip_header():
  File "/Users/chrisrorden/miniforge3/lib/python3.9/gzip.py", line 435, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'TR')

@frheault
Collaborator

frheault commented Apr 7, 2022

I will add an extra line, I forgot that part! I was so focused on the data online that I forgot trk.gz files can be decompressed manually.

Also, for now, it only converts to TRK/TCK/FIB, TRX will be added this afternoon.

@frankyeh

frankyeh commented Apr 7, 2022 via email

@neurolabusc
Member Author

@frheault I have one other issue with tff_convert_dsi_studio.py: it does not preserve properties or scalars. Here is an example of a mis-aligned TRK scan and anatomical image as well as the output from your script. Notice that the property "DATASETID" does not survive.

@frheault
Collaborator

frheault commented Apr 7, 2022

@frankyeh Thank you for the quick answer! Do you have an example of an old.trk.gz (not yet fixed) and a new.trk.gz (loaded and saved with the fix) from DSI-Studio, so I can make sure my code works well with as many tools as possible?
And is there a way to know whether a TRK is pre- or post-fix? That way a script could crash and warn the user to fix it, either with the above-mentioned script or a more recent DSI-Studio.

@neurolabusc Thanks for the example data, I will incorporate the change (the metadata is kept in the other conversion script, but not the DSI-Studio one, because it was kind of hacky in the first place).

EDIT: It should keep data_per_*, and load trk/trk.gz and save trx now.

@frankyeh

frankyeh commented Apr 7, 2022 via email

@frheault
Collaborator

frheault commented Apr 8, 2022

@frankyeh Thanks for the data. After looking at it I think the new version still has a problem.
First, I would expect the header of the TRK to be identical to the NIFTI. Right now the voxel order is not the same (LAS for the NIFTI and LPS for the TRK) and the translations do not match (possibly due to the voxel order):
[screenshot: mismatched TRK and NIFTI header affines]

Also, assuming the provided NIFTI is the right one, both the older and newer versions of the TRK load outside of the volume in a RASMM viewer, and my tff_visualize_overlap.py shows the same problem.
[figure: both TRK versions rendered outside the volume]

I am not sure how, but it seems my tff_convert_dsi_studio.py does the job for both TRK files (green-ish, in the middle of the volume above).

I am not sure exactly what the source of the problem is, but I think the inconsistency between the headers' affines is the start of it. They should be identical (assuming the user provided the right reference NIFTI).

@frankyeh

frankyeh commented Apr 8, 2022 via email

@frheault
Collaborator

frheault commented Apr 8, 2022

So the NIFTI provided by the user was the wrong one? Also, no matter the voxel order (or stride) of a NIFTI, its world-space coordinates should not change as long as the affine is modified accordingly. Same thing with the TRK: the coordinates written on disk should always be in voxel/grid space, and the affine brings the coordinates to world space.

Do you have another example data with an old vs new TRK file with the right NIFTI you expect it to overlap with?

If it matches the NIFTI it is expected to match with, that's fine. But if it doesn't it means something is wrong. I would expect a FA and bundle saved by the new DSI-Studio to overlap in TrackVis and Nibabel or Mi-brain, do you have that lying around?

@neurolabusc
Member Author

@frheault thanks, your latest version of tff_convert_dsi_studio.py works elegantly. I do think this actually highlights the need for a new format like TRX that provides an unambiguous spatial position for vertices. Beyond this DSI-Studio bug, I have seen implementations of TRK that display vertices 0.5 voxels off relative to TrackVis.

I see you are using Imeka's MI-Brain. I am unable to open TRX files using the latest 2020.04.09 release on macOS. Do I need a plug-in, or are you using a development version (or merely viewing TRK files)?

@frankyeh

frankyeh commented Apr 8, 2022 via email

@frheault
Collaborator

frheault commented Apr 8, 2022

@neurolabusc I am not loading TRX in MI-Brain, only TRK/TCK. Since I worked on the Python code to load TRX and the StatefulTractogram in Dipy, and coded the C++ TRK loader in MI-Brain, I know that Dipy and MI-Brain work the same way.

Once we have the C++ prototype for the TRX file, I will try to convince people at Imeka (MI-Brain) to add it to their list of supported file formats. I know they would not use most functionalities, but loading/saving with basic data_per_point/streamline with the TRX would be very nice! But I also know they will likely prefer to wait for us (the research branch of the project) to somewhat agree on the design to avoid wasting resources.

@frankyeh I will try to generate it myself using the latest available version online for Windows. I will share the file and version number here (likely next week) with the steps I did so you can confirm (or not) that what I did is right/wrong.

@neurolabusc
Member Author

@frheault thanks to your conversion tool, it is helping me create a JavaScript implementation. One issue I have with benchmarking my solution is that the script does not generate valid vtk files. Specifically,

python ./tff_convert_tractogram.py dpv.trx dpv.vtk

Will create a binary file that includes

LINES 3 32
OFFSETS vtktypeint64
numPoints0, i0, j0, k0, ...

whereas the specification only allows int32 and therefore there is no data type for lines (and VTK files that do allow a custom datatype require it on the same line as the data kind):

LINES 3 32
numPoints0, i0, j0, k0, ...

Beyond this clear violation: while legal, it is also unusual that the VTK file uses float64 (POINTS 32 double) rather than the more standard float32 (POINTS 32 float). Given that TRX chooses float16 for positions, this excessive precision seems to yield an unfair comparison.

@frheault
Collaborator

At the moment this VTK code is in Dipy and was not intended to be customizable (for benchmarking or other uses), but rather a simple conversion to VTK with no specific "target" software/tool in mind.

For this specific benchmarking task, you would like to be able to customize the offsets and positions data type of the VTK file format?

@neurolabusc
Member Author

@frheault DiPy is able to leverage the collective work of 124 contributors. It is a terrific tool that plays a key role in our community. However, the DiPy group did not create the VTK standard, nor did they update the standard. The streamlines created by DiPy retain the signature of the VTK format. I think the DiPy team needs to be mindful of all the smaller teams that are unaware of these unilateral changes and may not have the resources to support them. Off the top of my head: Camino, MedInria, TVFromEigenSystem, VTK, SingleTensorFT, FSL tools, Surfice, Slicer3D. If DiPy wants to make these unilateral changes, they should use a different extension and signature, or alternatively commit to supporting these other tools in adopting the standard. All these other tools will now mysteriously fail through no fault of the original developers, who faithfully implemented the standard, and with no explanation for the end user.

I have updated my own JavaScript code to support this data, but please do not see this as an endorsement for this behavior.

@frheault
Collaborator

frheault commented Apr 11, 2022

@neurolabusc It seems that the problem is simply that the vtk_support module defaults to float64 and int64 when converting NumPy arrays to VTK arrays (the code does not currently fix a dtype); maybe this changed when the VTK file format was updated in VTK 9.0 to include offsets and connectivity in the file. I see VTK_USE_64BIT_IDS being set to True in a lot of applications, so maybe it defaults to True on 64-bit machines since VTK 9.0.
(I see some StackOverflow questions about the variable being set to True on some machines and False on others at compilation, leading to different precision both in memory and on disk, which is our problem here.)

Anyway, this is breaking for various tools, so I can try to fix this in the code and set the default to int32/float32?

Do you know which of the software listed above supports the newer VTK file format vs the legacy file format (or both)?
(For example, I know MI-Brain only supports the legacy VTK file format.)

EDIT: Here are two otherwise identical files (ASCII): one uses the defaults (the previous, problematic version), and for the other I specified the data types as float32 and int32. Is the second one valid/working in your application?
files.zip

@neurolabusc
Member Author

@frheault ASCII does not make sense for scientific data: there are implementation differences in rounding, the files are slow to load and require a lot of disk space. I adapted my script to handle the binary DiPy version as well as legitimate VTK binary lines data.

Regardless, your samples are not valid VTK files:

LINES 11 700
OFFSETS vtktypeint64
0 43 137 249 332 404 495 515 541 

Note that DiPy stores the OFFSETS, with each entry listing the start of a sequential streamline. While this is efficient for sequential streamlines, it does not allow vertex reuse and is not what the VTK format specifies. The VTK format would look something like

LINES 3000 7000
43 0 1 2 3 4 ...

I can completely understand the rationale for the DiPy format, but these should not have the .vtk extension or the VTK signature on the first line # vtk DataFile....
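For anyone needing to bridge the two layouts, the new-style OFFSETS array can be expanded into the legacy count-prefixed connectivity list with a few lines of NumPy (a sketch; the function name is mine, and it assumes sequential vertex indices with no reuse, as DiPy writes them):

```python
import numpy as np

def offsets_to_legacy_lines(offsets: np.ndarray) -> np.ndarray:
    """Expand a VTK>=5 OFFSETS array into the legacy LINES layout:
    [numPoints0, i0, ..., numPoints1, j0, ...]."""
    counts = np.diff(offsets)  # points per streamline
    out = []
    for start, n in zip(offsets[:-1], counts):
        out.append(n)                         # count prefix
        out.extend(range(start, start + n))   # sequential vertex indices
    return np.asarray(out, dtype=np.int32)

# Three streamlines of 2, 3 and 2 points:
print(offsets_to_legacy_lines(np.array([0, 2, 5, 7])))
# [2 0 1 3 2 3 4 2 5 6]
```

The legacy header would then declare `LINES 3 10` (number of cells, total ints including the count prefixes).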

@frheault
Collaborator

frheault commented Apr 11, 2022

@neurolabusc Maybe there is a misunderstanding. Isn't the offsets/connectivity layout the new VTK file format convention (since VTK 9, spring 2020)?
https://discourse.vtk.org/t/upcoming-changes-to-vtkcellarray/2066

I see now that this change from VTK caused a problem elsewhere. I think Dipy follows the current VTK file format convention (version >5), but it is true that the vast majority of applications follow the legacy (version <5) file format.

Apparently there is code in VTK to switch back and forth between the old and the new format. Maybe Dipy could support (and even default to) the legacy convention? What do you think?

EDIT: All of this is indeed a major VTK update that was not properly announced or transitioned. The current VTK file format convention is not backward compatible, and the new default breaks applications expecting the older version (<5).
https://discourse.vtk.org/t/legacy-polydata-file-compatibility/5354/6
https://discourse.vtk.org/t/can-we-write-out-the-old-vtk-4-2-file-format-with-vtk-9/5066

PS: If VTK is 'pushing' forward with this, what is the best approach? Defaulting to legacy format or the newest format?

@neurolabusc
Member Author

OK, so they never updated a specification, they just added notes to the vtkCellArray class documentation? Despite the comments from users, it is not available in the current Slicer 4.11.20210226, though it is supported by the Preview release. I guess if this was initiated by the VTK group it is technically legal, but the description of the legacy implementation seems very sparse. I guess I (and others) will have to update our code for this variation. Well, mea culpa, I retract my comments regarding the rogue DiPy team.

@frheault
Collaborator

frheault commented Apr 11, 2022

@neurolabusc No problem, I understand the confusion. I struggled to find documentation about the changes, and I did look for it. This seems to be a very sneaky, breaking update from VTK, and I guess Dipy went along very fast without warning (well, with as much warning as VTK gave us).

This is yet another example of a challenge with file formats! I'm glad we figured it out. I think my understanding of the VTK file format is better than it was; I was not really sure about the new convention (and I sure thought it would be better detailed).

@neurolabusc
Member Author

Closing the issue. Thanks @frheault for scripts that allow us to create datasets.
