Zarr at Scipy 2019 #20

Closed
rabernat opened this issue Jan 22, 2019 · 31 comments

Comments

@rabernat

I think someone should propose a talk about Zarr to Scipy. This would really help raise the profile of the project to a broad audience.

Deadline is Feb. 11:
https://www.scipy2019.scipy.org/talk-poster-presentations

@rabernat rabernat changed the title Zarr at Scipy Zarr at Scipy 2019 Jan 22, 2019
@alimanfoo
Member

Thanks @rabernat for raising this. I've been wanting to get to scipy for years and haven't managed it yet, so perhaps this is the year. I'd be happy to write an abstract, I'll do it as a text file and PR it back to this repo so hopefully it's easy for anyone to give comments before submission.

@alimanfoo
Member

@rabernat, @jhamman, it would be good to illustrate zarr with one or two use cases from geosciences. Could you suggest something?

@alimanfoo
Member

@jakirkham could you suggest a use case from your domain?

@alimanfoo
Member

For the author list I'll include everyone who is a member of @zarr-developers/core-devs by default. Please let me know if you have any objection to being listed as an author.

@alimanfoo
Member

If anyone else would like to suggest a use case I'd be very happy to include it. Ideally it will be an example where zarr is already being used or being prototyped/evaluated, but if you have a potential use case that you're interested in using zarr for then I'd be interested to know too.

@mzjp2
Member

mzjp2 commented Jan 25, 2019

@dazzag24 -- do you still use Zarr?

@dazzag24

dazzag24 commented Jan 25, 2019 via email

@alimanfoo
Member

To get the ball rolling, I've made a PR (#397) with an initial draft of an abstract. I'd very much welcome any comments, suggestions or contributions.

@rabernat
Author

> it would be good to illustrate zarr with one or two use cases from geosciences. Could you suggest something?

I have dozens of examples. Basically all of our data on the cloud is in zarr, will follow up in the PR.

@jakirkham
Member

@alimanfoo, I'm not sure I'll be able to do too much on this before the deadline as I'm out of the country. That said, would be happy to chip in when I get back. Don't have a clear enough idea of my schedule to commit to being at SciPy this year. Will update once I do.

@alimanfoo
Member

alimanfoo commented Jan 29, 2019 via email

@alimanfoo
Member

Hi folks, I did a bit of editing of the abstract following comments from @jhamman, I think it's in pretty good shape but would be cool if we could mention one more example of data being stored as Zarr. I think we literally just need a sentence. E.g., here is the current text of the results subsection:

We illustrate the use of Zarr with examples from several different
scientific domains. Zarr is being used within the Pangeo project [5_],
which is building a community platform for big data geoscience. The
Pangeo community have converted a number of existing climate modelling
and satellite observation datasets to Zarr [6_], and have demonstrated
their use in computations using HPC and cloud computing
environments. Within the MalariaGEN project [7_], Zarr is used to
store genome variation data from next-generation sequencing of natural
populations of malaria parasites and mosquitoes (see, e.g., [8_]), and
these data are used as input to analyses of the evolution of these
organisms in response to selective pressure from anti-malarial drugs
and insecticides. @@todo another example.

I'd be extremely grateful if someone could volunteer a sentence describing another example. @jakirkham? @ambrosejcarr? @dazzag24?

Any other comments welcome. I'm aiming to submit on Friday.

@ambrosejcarr

Happy to contribute some examples. One or both of the below sentences could be used.

Zarr is being used within the Human Cell Atlas (HCA) project [9_], which is building a reference atlas of healthy human cell types. This project hopes to leverage this information to better understand the dysregulation of cellular states that underlie human disease. The Human Cell Atlas uses Zarr as the output data format because it enables the project to easily generate matrices containing user-selected subsets of cells.

[Optional] The Human Cell Atlas is also exploring the use of Zarr for the input and output of all of its image-based cellular assays [10_], as the format supports cloud-based analysis and visualization of these data.

.. _9: https://www.humancellatlas.org/
.. _10: https://spacetx-starfish.readthedocs.io/en/latest/

@alimanfoo
Member

Wonderful, thanks @ambrosejcarr.

Just heard deadline has been extended to February 15, so we have a few extra days to contemplate.

@alimanfoo
Member

I've pushed another commit to the abstract in #397 adding in the short summary, the HCA example from @ambrosejcarr, and an author list. In the author list I've included everyone who has contributed code or is a member of @zarr-developers/core-devs or who participated in the first zarr/n5 conference call or who contributed to the abstract. The list comprises @rabernat, @sbalmer, @ambrosejcarr, @tjcrone, @dazzag24, @martindurant, @funkey, @meggart, @jhamman, @shoyer, @jeromekelleher, @jakirkham, @alimanfoo, @joshmoore, @CSNoyes, @onalant, @constantinpape, @mzjp2, @mrocklin, @axtimwalde, @vincentschut, @shikharsg, @jmswaney, @ryan-williams. Apologies I did not know everyone's name or affiliation. If you would prefer not to be included in the author list or would like me to edit your name or affiliation please let me know asap. If you are not in this list but have contributed in some way to the project and would like to be included then please let me know (and apologies for not including you already). Submission deadline is noon CST so I'll submit in a couple of hours.

@rabernat
Author

The deadline is 11:59pm, so midnight, not noon. 😉

@alimanfoo
Member

> The deadline is 11:59pm, so midnight, not noon.

Ha, I should pay more attention! I'll probably still submit in a couple of hours, before I go home.

@martindurant
Member

FYI: I have submitted a proposal for Intake, and I believe there will be one for Dask (although I haven't heard for sure). Supposing that not all of these are accepted, perhaps we could merge some content, catalog of zarrs, dask-parallel processing of zarrs, etc.

@rabernat
Author

Any news on the zarr scipy talk? Was it accepted?

@rabernat
Author

Ok, it looks like they just listed the first three authors in alphabetical order. Is @alimanfoo planning to attend / present this?

@alimanfoo
Member

alimanfoo commented Apr 18, 2019 via email

@alimanfoo
Member

For interest, here are the review comments. Also interesting is that I initially got a rejection notice, then a couple of hours later got an acceptance notice, so I guess we initially just missed the cut but then someone above us dropped out. If so, that suggests SciPy is really competitive, and we had a bit of luck. In any case, great to see the positive reviews.

----------------------- REVIEW 1 ---------------------
Overall evaluation: 4 (accept)

This talk illustrates the use of Zarr with examples from several scientific domains. It is used for several interesting projects such as to store genome variation data from next-generation sequencing of natural populations of malaria parasites and mosquitoes and within the Human Cell Atlas project. It will be great for this conference and health care researchers will get to learn the utilization of Zarr.

----------------------- REVIEW 2 ---------------------
Overall evaluation: 5 (strong accept)

The need for storage of tensor data is only increasing in applications involving parallel and distributed computing of large data sets such as used in machine learning applications. This paper presents an important and ongoing research effort in this area by a large group of accomplished data scientists. The abstract is clearly written, and the work is novel and significant. Although the topic covers a technical implementation, the focus on case studies of real-world problems is a strength.

----------------------- REVIEW 3 ---------------------
Overall evaluation: 4 (accept)

The authors have proposed a talk on Zarr project for distributed and parallel computing. It is also related to the application of Python to life science (Malaria genomics and Human cell atlas) and climate studies.
The proposal is certainly interesting. All relevant resources are also clearly cited.

@alimanfoo
Member

Btw does anyone have a suggestion for how to author the slides for the talk in a way that is amenable to putting in a PR and collaborating on?

I suppose the only text-based PR-friendly format would be latex+beamer, haven't used it before but happy to have a go.

Suppose could also be done with jupyter+reveal, although it's impossible to do line comments on a jupyter notebook.

Failing that, google doc?

@mrocklin

mrocklin commented Apr 23, 2019 via email

@jakirkham
Member

Who will be at SciPy? Also how long are people planning to be there? Finally would it be worth doing a sprint?

@alimanfoo
Member

alimanfoo commented Jun 6, 2019 via email

@rabernat
Author

rabernat commented Jun 7, 2019 via email

@jakirkham
Member

That could be good. Will also be there for the sprints.

Related: There is an N-D image analysis sprint that overlaps with a few projects. I wonder if we can set it up so we have neighboring rooms or share the same room given that there will likely be a lot of common interest between participants.

@axtimwalde

@hanslovsky will be at SciPy and present his ImgLib2 <-> Numpy bridge ImgLyb https://github.com/imglib/imglib2-imglyb and Paintera https://github.com/saalfeldlab/payntera

@alimanfoo alimanfoo transferred this issue from zarr-developers/zarr-python Jul 3, 2019
@alimanfoo alimanfoo reopened this Jul 3, 2019
@alimanfoo alimanfoo pinned this issue Jul 3, 2019
@joshmoore
Member

Seeing as how SciPy 2022 will have Zarrish attendees, I feel it's safe to close this knowing we can always find it when/if we need it.

@jakirkham jakirkham unpinned this issue Jul 25, 2022