Community file-format technological survey discussion #1

Open
frheault opened this issue Nov 13, 2020 · 10 comments

@frheault
Collaborator

frheault commented Nov 13, 2020

General goals

  • Standardization across tools
  • Support for features needed by the field (from trk, tck, or new ones)
  • Conceptually simpler (Simplicity)
  • Lower memory AND hard drive usage
  • General robustness
  • Load/save efficiency/speed
  • Independence/standalone
  • Extensibility

What is 'supported' by the memmap implementation?

  • Agreement on the coordinate system (RASMM) [x]
  • Random-access support with minimal upfront IO [x]
  • Support data streaming and on-the-fly saving [x]
  • Minimalistic header [x]
  • Ability to store (sparse) groups [x]
  • Ability to store zero, one, or more additional metadata fields [x]
  • Self-explanatory [x]
  • Self-coherent / strict checks [-] (depends on the implementation)
  • Compression [-] (only zip_deflate)
  • GPU compatibility [-] (Array+offsets are compatible 'enough' with VTK/OpenGL)
  • “Easy” to implement with basic libraries (in C++, python, matlab, etc.) [x]
  • Parallel-friendly for saving (per process on-the-fly saving) [ ] (Not investigated, likely depends on implementation)
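The "array + offsets" layout behind several of the checked items (random access with minimal upfront IO, sparse groups, compatibility with VTK/OpenGL) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the proposed implementation: the file names (`positions.3.float32`, `offsets.uint64`, `groups.<name>.uint32`), the dtypes, and the helper functions are all hypothetical. All points are concatenated into one flat (N, 3) array, and `offsets[i]` is the index of the first point of streamline `i`, so once the arrays are memory-mapped, reading streamline `i` only touches its own bytes.

```python
import os

import numpy as np

# Hypothetical naming convention "<name>.<ncols>.<dtype>": each file describes
# its own shape and dtype, so no separate header entry is needed for it.
POSITIONS = "positions.3.float32"
OFFSETS = "offsets.uint64"


def write_tractogram(folder, streamlines, groups=None):
    """Store streamlines as one flat (N, 3) array plus per-streamline offsets."""
    positions = np.concatenate(streamlines).astype(np.float32)
    lengths = np.array([len(s) for s in streamlines], dtype=np.uint64)
    offsets = np.zeros(len(streamlines), dtype=np.uint64)
    offsets[1:] = np.cumsum(lengths)[:-1]  # index of each streamline's first point
    positions.tofile(os.path.join(folder, POSITIONS))
    offsets.tofile(os.path.join(folder, OFFSETS))
    # A (sparse) group is only the list of streamline indices that belong to it.
    for name, indices in (groups or {}).items():
        np.asarray(indices, dtype=np.uint32).tofile(
            os.path.join(folder, f"groups.{name}.uint32"))


def open_tractogram(folder):
    """Memory-map the arrays: no point data is read until it is indexed."""
    positions = np.memmap(os.path.join(folder, POSITIONS),
                          dtype=np.float32, mode="r").reshape(-1, 3)
    offsets = np.memmap(os.path.join(folder, OFFSETS),
                        dtype=np.uint64, mode="r")
    return positions, offsets


def get_streamline(positions, offsets, i):
    """Random access: only the points of streamline i are paged in."""
    start = int(offsets[i])
    end = int(offsets[i + 1]) if i + 1 < len(offsets) else positions.shape[0]
    return np.asarray(positions[start:end])


def get_group(folder, positions, offsets, name):
    """Return the streamlines listed in the hypothetical group file `name`."""
    indices = np.fromfile(os.path.join(folder, f"groups.{name}.uint32"),
                          dtype=np.uint32)
    return [get_streamline(positions, offsets, int(i)) for i in indices]
```

Because a group only stores indices, a streamline can belong to several groups (or to none) without any point data being duplicated.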

(Initial thread: nipy/nibabel#942)

@frheault
Collaborator Author

frheault commented Nov 13, 2020

Here is a draft of the (simple) specification. It is still a work in progress (I don't know exactly what to add, since it has to cover multiple languages):
https://drive.google.com/file/d/1DVKisuoENqU5Q_652wZdQNrcerxKC2xL/view?usp=sharing

Here is some data to test the current implementation. It is fairly large because, to truly test speed, I had to generate a large tractogram with a lot of metadata:
https://drive.google.com/drive/folders/1fjxyLskcFXYizg6sDNMrRPUu-N1PdvZu?usp=sharing

It would be nice to have comments on the current specification, but if someone wants to propose a new format to showcase, I can easily move "my" README into my module, change the setup accordingly, or accommodate any new languages, so that the repository simply hosts code and PDFs. For now, I made it easy to use and test my single proposal, but anyone who wants to expand on it can tell me; we can video chat and then plan a PR to accommodate new ideas.

If anyone would like, I could explain the specification, implementation, or examples in detail. To facilitate discussion, it could be a conference call, so we can quickly go over my implementation, its limitations, etc. as a group (it is largely, if not entirely, inspired by recommendations from @arokem @jdtournier @neurolabusc).

Again, it is important to reiterate that this is not about the specific implementation, code, or language; it is mostly about the specification and the file format description/contents. My code is only there because I personally liked this idea and wanted to test whether it could indeed achieve what was on the list of features.

@Lestropie

A few small suggestions:

  • Make the specification document a part of the repository itself, so that changes can be proposed there, as well as ensuring that any changes to the format / examples are also made in parallel to that document;

  • Encourage separate issues for separate topics; this issue (#1) could be reserved for big-picture discussion, but we don't want it to turn into a mess;

  • Include link to the original nibabel discussion thread in README.md.

@frheault
Collaborator Author

frheault commented Nov 27, 2020

@arokem @francopestilli @frankyeh @MarcCote @neurolabusc @Garyfallidis @jchoude @mdesco @jdtournier @ppoulin @gabknight

If anyone wants to go over the questions and discussions in the Issues section, it would help to get a boost before the Dipy meeting next Wednesday.

Sorry for the GitHub notifications, but I don't know who has time for this.
dipy/dipy#2229 (comment)

@emanuele

emanuele commented Dec 3, 2020

@frheault unfortunately I couldn't participate in the meeting, but I am very interested in the discussion. Following a recent request from @MarcCote and a few others before him, I have made available the large tractogram (500K streamlines) I used to benchmark my fast TRK loader (https://github.com/emanuele/load_trk): https://nilab.cimec.unitn.it/people/olivetti/data/sub-100206_var-FNAL_tract.trk . And here is another, much larger, test tractogram (10M streamlines, 2.9 GB): https://nilab.cimec.unitn.it/people/olivetti/data/sub-599469_var-10M_tract.trk
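For comparison with the array + offsets idea, a .trk file has no index: each record starts with its own point count, so a reader has to walk the records in order to reach streamline `i`. Below is a minimal reader sketch, assuming a little-endian file and following the published TrackVis header layout (1000-byte header, `n_scalars` at byte 36, `n_properties` at byte 238, `n_count` at byte 988, `hdr_size` at byte 996). It is not the load_trk implementation linked above, and it returns points in TrackVis voxmm space without applying `vox_to_ras` (so not in RASMM).

```python
import struct

import numpy as np

HEADER_SIZE = 1000  # TrackVis headers are always 1000 bytes


def read_trk(path):
    """Minimal TrackVis .trk reader; assumes a little-endian file.

    Returns (n_count, streamlines), where each streamline is an (n, 3)
    float32 array in TrackVis voxmm space (vox_to_ras is NOT applied).
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
        body = f.read()  # loads the whole body: acceptable for a sketch only
    if header[:5] != b"TRACK":
        raise ValueError("not a TrackVis file")
    (hdr_size,) = struct.unpack_from("<i", header, 996)
    if hdr_size != HEADER_SIZE:
        raise ValueError("big-endian or corrupt header (not handled here)")
    (n_scalars,) = struct.unpack_from("<h", header, 36)
    (n_properties,) = struct.unpack_from("<h", header, 238)
    (n_count,) = struct.unpack_from("<i", header, 988)  # 0 means "not stored"

    streamlines = []
    pos = 0
    # No offsets table: every record must be visited to find the next one.
    while pos < len(body):
        (n_points,) = struct.unpack_from("<i", body, pos)
        pos += 4
        n_values = n_points * (3 + n_scalars)
        points = np.frombuffer(body, dtype="<f4", count=n_values, offset=pos)
        streamlines.append(points.reshape(n_points, 3 + n_scalars)[:, :3])
        pos += 4 * (n_values + n_properties)  # skip points, scalars, properties
    return n_count, streamlines
```

The sequential `while` loop is the cost that an explicit offsets array removes: with offsets, the start of any streamline is known without reading the ones before it.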

@frheault
Collaborator Author

Hello everyone,

@arokem @francopestilli @frankyeh @MarcCote @neurolabusc @Garyfallidis @jchoude @jdtournier @ppoulin @gabknight @Lestropie @StongeEtienne

I don't want to let the project die again, but I cannot unilaterally decide most of the issues. If no one else has opinions on the various issues raised by me or @Lestropie (in the Issues section), I will have to make decisions that might not please some people. So if you have something to add, we should restart this discussion immediately.

Also, I don't have the time to write a C++ (or Rust, MATLAB, C) reader and writer to try out the specification. I think it is important that someone other than me independently tries to write a small & simple reader/writer, to check whether anything is unclear, whether there are limitations, or whether something is impossible to implement.
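To give a sense of how small such a reader/writer can be, here is a Python sketch of on-the-fly saving plus the kind of strict coherence check a reader could run before trusting the files. The `StreamingWriter` class, the `check_coherence` function, and the file names are hypothetical; they only follow the array + offsets idea and are not taken from the draft specification. Each streamline is written to disk as soon as it arrives, so memory use does not grow with the size of the tractogram.

```python
import os

import numpy as np

# Hypothetical file names following a "<name>.<ncols>.<dtype>" convention.
POSITIONS = "positions.3.float32"
OFFSETS = "offsets.uint64"


class StreamingWriter:
    """Append streamlines to disk one at a time (on-the-fly saving)."""

    def __init__(self, folder):
        self._positions = open(os.path.join(folder, POSITIONS), "wb")
        self._offsets = open(os.path.join(folder, OFFSETS), "wb")
        self._n_points = 0

    def append(self, streamline):
        arr = np.ascontiguousarray(streamline, dtype=np.float32)
        if arr.ndim != 2 or arr.shape[1] != 3:
            raise ValueError("expected an (n, 3) array of points")
        # Record where this streamline starts, then write its points.
        self._offsets.write(np.uint64(self._n_points).tobytes())
        self._positions.write(arr.tobytes())
        self._n_points += arr.shape[0]

    def close(self):
        self._positions.close()
        self._offsets.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


def check_coherence(folder):
    """Strict checks a reader could run before memory-mapping the arrays.

    Returns (n_streamlines, n_points) or raises ValueError.
    """
    n_bytes = os.path.getsize(os.path.join(folder, POSITIONS))
    if n_bytes % 12:  # 3 coordinates x 4 bytes per float32
        raise ValueError("positions is not a whole number of float32 triplets")
    n_points = n_bytes // 12
    offsets = np.fromfile(os.path.join(folder, OFFSETS), dtype=np.uint64)
    offsets = offsets.astype(np.int64)  # avoid unsigned wrap-around in diff
    if offsets.size and offsets[0] != 0:
        raise ValueError("the first offset must be 0")
    if np.any(np.diff(offsets) < 0):
        raise ValueError("offsets must be non-decreasing")
    if offsets.size and offsets[-1] > n_points:
        raise ValueError("an offset points past the end of positions")
    return int(offsets.size), int(n_points)
```

Writing per-process files like this and concatenating them afterwards (shifting each file's offsets) is one possible route to the parallel-friendly saving that is still marked as not investigated.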

Is anyone interested in a video call to discuss implementations in other languages? This would be very important; once this is done, we could settle a few points in the issues and move forward.

@jdtournier

I'd love to volunteer for the C++ implementation, but my teaching load is pretty intense at the moment, I'm finding it difficult to get anything done other than that. I'll have a think, and maybe I can cobble together a really quick & dirty proof of concept - but no guarantees... If anyone else feels they have the time and the inclination, feel free to put your hand up!

@neurolabusc
Member

neurolabusc commented Jan 19, 2021 via email

@jdtournier

jdtournier commented Jan 19, 2021

Good to see you've already had a go, @neurolabusc!

I was looking to use libzip here, it seems pretty active and well maintained. I'll take a look at what you've done when I get the chance...
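Whatever zip library is used (libzip in C++, the standard `zipfile` module in Python), the operation that matters for memory-mapping is finding where an uncompressed member's bytes start inside the archive. A minimal Python sketch of one way to do it, assuming members are written with `ZIP_STORED` (deflated members, as with the zip_deflate option, would have to be decompressed instead); `write_zip` and `memmap_member` are hypothetical helpers, not part of any existing library:

```python
import struct
import zipfile

import numpy as np


def write_zip(path, arrays):
    """Store each array uncompressed (ZIP_STORED) so it can be memory-mapped."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, arr in arrays.items():
            zf.writestr(name, np.ascontiguousarray(arr).tobytes())


def memmap_member(path, name, dtype):
    """Memory-map one uncompressed member in place, without extracting it."""
    with zipfile.ZipFile(path) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed and cannot be memory-mapped")
    # Local file header: 30 fixed bytes, then the file name and extra field.
    # Their lengths are read here because the local extra field may differ
    # from the one recorded in the central directory.
    with open(path, "rb") as f:
        f.seek(info.header_offset)
        local = f.read(30)
    if local[:4] != b"PK\x03\x04":
        raise ValueError("unexpected local file header signature")
    name_len, extra_len = struct.unpack("<HH", local[26:30])
    data_offset = info.header_offset + 30 + name_len + extra_len
    count = info.file_size // np.dtype(dtype).itemsize
    return np.memmap(path, dtype=dtype, mode="r", offset=data_offset,
                     shape=(count,))
```

The trade-off is explicit here: a stored member can be mapped directly, while a deflated one saves disk space at the cost of a full decompression before any random access.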

@Lestropie

I had naive plans to start looking into MRtrix3 support (which will be neither small nor simple) over the new year break, but unsurprisingly it didn't happen.

I'm not sure that a small & simple implementation will actually be capable of identifying issues with the specification. It's when working towards GUI support & the more exotic use cases that the pros and cons of different formulations are going to come out. So we may be more dependent on foresight than on crashing into things after the fact.

@frheault
Collaborator Author

Glad to see that attempts at this will be made in C++. @Lestropie, about the small & simple implementation: it's true, maybe it is not enough. If someone does it as they see fit and we encounter corner cases, that's perfect!

@arokem @Garyfallidis I don't even know where to start to convert my existing code into something that would fit nicely into Nibabel. My current code is more of a showcase, independent and in a vacuum. Would you be interested in a call to talk specifically about class and function design to refactor my code? I think re-implementing my code in a Nibabel branch right away would be nice.

Just to re-share, here are some examples of how to use my code, as well as trx/tck/trk side by side, trying to load the same data:
https://vanderbilt365-my.sharepoint.com/:f:/g/personal/francois_rheault_vanderbilt_edu/ElXpfTBbbkVDq44-yy_FJicBYva86qHi5zBbFihelDsP9A?e=0gI5us


5 participants