
[ENH] support for sinks in csv format for RAMSES frontend #4424

Merged
merged 15 commits into yt-project:main from sink_csv on Jul 31, 2023

Conversation

Lenoble-lab
Contributor

Hi everyone,
This pull request implements a reader for the output of the RAMSES sink particle algorithm.
For now, the sink output from standard RAMSES (the version from the Bitbucket repository) cannot be read; it is simply another format.

In each output_***** folder, the sink algorithm writes a single sink_*****.csv file with the properties of the sinks. I have added an example as an attached file (a hypothetical sample is also sketched after the list below). So we need to read a single file per output folder, in CSV format.

I think the ideal would be to have a single ptype corresponding to the sinks (i.e. not create a separate 'sink_csv' type); read_header would have to be modified and specialized as appropriate. For now, I have implemented a new name, 'sink_csv', to differentiate it from the first class.

The main issue I had is that IOHandlerRAMSES._read_particle_fields reads every chunk concerned, whereas here there is only one single file to read. I hacked around this with the if ptype == 'sink_csv' at line 217 in io.py, but this is really not ideal.

Thinking about it, I see several solutions:

  • add a flag in the sink class to specify that it should only be read once -> add an if in IOHandlerRAMSES._read_particle_fields so that, in this case, a single read is enough and we avoid entering the for chunk in chunks loop.

  • create a new IOHandlerRAMSES._read_sink_fields that reads the CSV file only once, directly in IOHandlerRAMSES. It is possible to add a field type that would not be split per CPU (I don't know whether other RAMSES features also write a single file per output). But then you lose the inheritance from the particle class (or do we re-create another one?).

Attached: sink_00010.csv
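
For illustration only, a hypothetical sink_*****.csv could look like the sketch below. The layout is an assumption (two header lines skipped by the reader, consistent with the skiprows=2 used later in this thread), and the columns follow the msink, x, y, z, vx, ..., lx, ... headers mentioned in the review; the real attached file is authoritative:

    msink,x,y,z,vx,vy,vz,lx,ly,lz
    (second header line, skipped by the reader)
    1.2e-06,0.49,0.51,0.50,0.013,-0.002,0.004,1.1e-04,2.3e-04,-9.0e-05
    3.4e-07,0.12,0.88,0.25,-0.008,0.011,0.000,5.6e-05,-1.2e-05,7.8e-05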

Thanks in advance!
Romain Lenoble

@welcome

welcome bot commented Apr 28, 2023

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

@cphyc cphyc changed the title from "Sink csv" to "[ENH] support for sinks in csv format for RAMSES frontend" on Apr 28, 2023
@cphyc cphyc added labels: enhancement (Making something better), code frontends (Things related to specific frontends)
yt/frontends/ramses/particle_handlers.py — outdated review threads, resolved.
field_offsets = {}
_pfields = {}

import pandas as pd
Member

In yt we have a way of supporting imports of external packages:

from yt.utilities.on_demand_imports import _pandas as pd

This fails gracefully, with a useful message, if pandas is not installed.
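
A minimal sketch of the pattern (read_sink_csv is a hypothetical helper; the assumption is that the shim defers the actual import until first attribute access, so the error only surfaces when pandas is really needed):

    from yt.utilities.on_demand_imports import _pandas as pd

    def read_sink_csv(fname):
        # the actual pandas import happens here, on first attribute access;
        # if pandas is missing, an informative ImportError is raised at this point
        return pd.read_csv(fname, delimiter=",", skiprows=2, header=None)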

@@ -131,6 +131,22 @@ class RAMSESFieldInfo(FieldInfoContainer):
("particle_metallicity", ("", [], None)),
("particle_family", ("", [], None)),
("particle_tag", ("", [], None)),
# sink field parameters
("particle_msink", ("code_mass", [], None)),
Member

Is there any reason not to use the canonical particle_mass here and below?
Also, the convention in yt is to use explicit names rather than common shorthand notations: e.g. lx, ly, lz should be angular_momentum_{xyz}, rho should be density, cs** should be sound_speed_square(d), tform should be formation_time, ...

Contributor Author

OK, I see. Here I use the format "particle_" + name of the header column in the CSV file.

The column names are: msink, x, y, z, vx, ..., lx, ...
Maybe using the default yt convention is better.
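
To make the suggested renaming concrete, here is a hypothetical translation table from the CSV headers to yt-convention field names (only the correspondences named above are grounded in this thread; the exact column set is an assumption):

    # hypothetical mapping: CSV header -> yt-convention field name
    _csv_to_yt = {
        "msink": "particle_mass",
        "lx": "particle_angular_momentum_x",
        "ly": "particle_angular_momentum_y",
        "lz": "particle_angular_momentum_z",
        "rho": "particle_density",
        "tform": "particle_formation_time",
    }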

@@ -92,6 +92,27 @@ def _ramses_particle_file_handler(fname, foffsets, data_types, subset, fields, c
    tr = {}
    ds = subset.domain.ds
    current_time = ds.current_time.in_units("code_time").v
    # test whether this is a sink CSV file -> not the same reader as the classic Fortran files
Member

I would suggest moving that piece of code to its own function (_ramses_particle_csv_file_handler?) and renaming the current function from _ramses_particle_file_handler to _ramses_particle_binary_file_handler.

Contributor Author

OK, that is probably clearer.
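
A skeleton of the proposed split, keeping the existing signature (bodies elided, names as suggested above; this is a sketch, not the final implementation):

    def _ramses_particle_binary_file_handler(fname, foffsets, data_types, subset, fields, count):
        # current logic, unchanged: read Fortran records from the per-CPU files
        ...

    def _ramses_particle_csv_file_handler(fname, foffsets, data_types, subset, fields, count):
        # new logic: parse the single sink_*****.csv with pandas
        ...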

@@ -192,6 +213,9 @@ def _read_particle_fields(self, chunks, ptf, selector):
            for ax in "xyz":
                if pn % ax not in field_list:
                    fields.append((ptype, pn % ax))
        # attention: read only one chunk, as it's a global file; I don't know whether this is the right way to do it
Member

I would suggest removing this piece of code and instead modifying the following chunk (currently sitting at line 260):

            tr.update(
                _ramses_particle_file_handler(
                    fname, foffsets, data_types, subset, subs_fields, count=count
                )
            )

with something like

            tr.update(
                ph.reader(
                    fname, foffsets, data_types, subset, subs_fields, count=count
                )
            )

The reader attribute could then be a method of the particle readers rather than a stand-alone function.
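
Concretely, each particle file handler could then carry its reader as a class attribute; a sketch with hypothetical class names (the handler names and attachment are illustrative, not existing yt API):

    class DefaultParticleFileHandler(ParticleFileHandler):
        # existing per-CPU Fortran binary files
        reader = staticmethod(_ramses_particle_binary_file_handler)

    class SinkParticleCSVFileHandler(ParticleFileHandler):
        # single global CSV file per output
        ptype = "sink_csv"
        reader = staticmethod(_ramses_particle_csv_file_handler)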

Contributor Author

Ah yes, that would definitely solve our problem, and it would make it much easier to later implement readers for any sort of file.

Contributor Author

I can try to implement this if you want

Contributor Author

But actually, if you keep something like this:

            tr.update(
                ph.reader(
                    fname, foffsets, data_types, subset, subs_fields, count=count
                )
            )

you are still updating the value of tr by appending the variables at the end, iterating over the chunks concerned.

To keep using this method, you need to check that the particles you are adding are indeed in the subset when you read them from the CSV file, hence checking the position of each particle with a few ifs. I don't know whether that is fast (but here the files contain fewer than 1e3 sinks, so it should not be too slow).

A second method would be to read all the data of the CSV file once and for all; in this case, you need to avoid entering the iteration at line 240 in _read_particle_fields:

            for chunk in chunks:
                for subset in chunk.objs:

In this case, the class would store all the sinks produced, independently of their position.
It would require writing some condition in _read_particle_fields before entering the iteration.
Maybe we could implement this with the _read_particle_data_file method?

Member

It might be difficult to check whether a particle is in the current subset though.

I think we have two options that avoid having sink-specific code here:

  • we update _read_particle_fields so that it returns a boolean that informs whether all the data has been read (sink case) or not (all other cases);
  • we update the particle handlers so that they carry a flag that says "all data is in one file, please only read one chunk".

At the end of the day, the reason for this (slight) mess is that the index assumes the data is spread over Ncpu different domains, each of which forms one chunk. The best solution would require having a second index, I guess, but that would be a massive change, especially for a simple case like this one.
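
A sketch of the second option (the flag name, handler class name, and helper are illustrative, not existing yt API):

    class SinkParticleCSVFileHandler(ParticleFileHandler):
        ptype = "sink_csv"
        # all sinks live in a single file per output,
        # not in one file per CPU domain
        single_file = True

    # hypothetically, in IOHandlerRAMSES._read_particle_fields:
    #   if getattr(ph, "single_file", False):
    #       yield from self._read_csv_once(ph, ptf, selector)
    #       break  # do not iterate over the remaining chunks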

Contributor Author

  • we update _read_particle_fields so that it returns a boolean that informs whether all the data has been read (sink case) or not (all other cases).

    • in that case, we would need to modify the BaseIOHandler class deeply, since we would be changing the output of _read_particle_fields, no?

  • we update the particle handlers so that they carry a flag that says "all data is in one file, please only read one chunk".

    • yes, that could be the simplest idea, no?

What I wanted to say is that BaseIOHandler has a _read_particle_data_file method. I don't get it: it raises NotImplementedError there, and it is not implemented in the IOHandlerRAMSES class either. Maybe we could implement such a per-data-file method, so that when reading the CSV data this is the function that gets called rather than _read_particle_fields. The ideal would be a function set as an attribute of the particle class, but that might require modifying the BaseIOHandler class, I think.

Member

You may be right. I don't remember the inheritance of BaseIOHandler. In that case, updating the particle handlers (they're specific to RAMSES and already encode all the specifics of each particle output file anyway) seems the best route.

Contributor Author

Hum, yes, that seems like the right idea. But for now, I can't manage the yield of _read_particle_fields.
Do you know if it can be called from another function like this (this code does not work; I don't know a lot about coroutines in Python)?

    def _read_particle_fields(self, chunks, ptf, selector):
        pn = "particle_position_%s"
        chunks = list(chunks)
        fields = [
            (ptype, fname) for ptype, field_list in ptf.items() for fname in field_list
        ]
        for ptype, field_list in sorted(ptf.items()):
            for ax in "xyz":
                if pn % ax not in field_list:
                    fields.append((ptype, pn % ax))

        self._read_particle_from_binary_file(chunks, ptf, selector, fields)

    def _read_particle_from_binary_file(self, chunks, ptf, selector, fields):
        pn = "particle_position_%s"

        for chunk in chunks:
            for subset in chunk.objs:
                rv = self._read_particle_subset(subset, fields)
                for ptype, field_list in sorted(ptf.items()):
                    x, y, z = (np.asarray(rv[ptype, pn % ax], "=f8") for ax in "xyz")
                    mask = selector.select_points(x, y, z, 0.0)
                    if mask is None:
                        mask = []
                    for field in field_list:
                        data = np.asarray(rv.pop((ptype, field))[mask], "=f8")
                        print(data)
                        yield (ptype, field), data

In this case, the function _read_particle_from_binary_file would be an attribute of the ParticleHandler class.
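
Note: as written, the snippet above cannot work because _read_particle_from_binary_file is a generator; calling it merely creates a generator object that is never iterated, and the outer function no longer contains any yield. Delegating with yield from would fix exactly this; a minimal sketch:

    def _read_particle_fields(self, chunks, ptf, selector):
        pn = "particle_position_%s"
        chunks = list(chunks)
        fields = [
            (ptype, fname) for ptype, field_list in ptf.items() for fname in field_list
        ]
        for ptype, field_list in sorted(ptf.items()):
            for ax in "xyz":
                if pn % ax not in field_list:
                    fields.append((ptype, pn % ax))
        # re-yield every ((ptype, field), data) pair produced by the inner generator
        yield from self._read_particle_from_binary_file(chunks, ptf, selector, fields)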

@neutrinoceros
Member

@Lenoble-lab when you are done pushing new changes, please take a minute to address the linting errors reported by pre-commit.ci. All of them seem to be auto-fixable (see the note at the bottom of this page: https://results.pre-commit.ci/run/github/88776493/1683098596.TWKhnqatSLSZQeSp2akaZw)

@Lenoble-lab
Contributor Author

Hello guys,
Did you have time to look into this project? I'm using my branch for the sinks in my simulations (but I may lose it with the next update of yt). We can discuss it if you want (I'm far from knowing everything, but I can help a bit with the changes I have made).
Cheers,
Romain

@neutrinoceros
Member

Let's try to get this across the finish line. The priority would be to fix failing checks.

@cphyc
Member

cphyc commented Jul 17, 2023 via email

@Lenoble-lab
Contributor Author

Hi, I don't understand the errors from https://results.pre-commit.ci/run/github/88776493/1690275703.l2NvkRYcQf62QbImBS2O4g

Do you know what the problems are?

@neutrinoceros
Member

Hi, I don't understand the errors from results.pre-commit.ci/run/github/88776493/1690275703.l2NvkRYcQf62QbImBS2O4g

Do you know what the problems are?

All the linting errors on your branch appear to be auto-fixable (that's why the error messages are minimal).
If you look carefully, you'll note that at the bottom of the page you linked are instructions for how to solve these problems by commenting on this PR. I would also advise you to have a look at our contribution guidelines for how to work with linters locally, which would help keep the PR tidy enough for merging.
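
For reference, working with the linters locally boils down to pre-commit's standard commands (the same hooks the CI runs):

    pip install pre-commit
    pre-commit install            # run the hooks automatically at each commit
    pre-commit run --all-files    # lint (and auto-fix) the whole tree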

@neutrinoceros
Member

There's a merge conflict to resolve now. Would you allow me to rebase your branch, or would you prefer to resolve it yourself?

@Lenoble-lab
Contributor Author

Hum, as you want. I was eating so I didn't see it. I don't have time now, but later in the afternoon I can have a deeper look. I ran `pre-commit run --all-files` but it deeply changes some files.

@neutrinoceros neutrinoceros force-pushed the sink_csv branch 3 times, most recently from 65ed224 to 53a8288 on July 25, 2023 12:42
Co-authored-by: Corentin Cadiou <contact@cphyc.me>
Co-authored-by: Clément Robert <cr52@protonmail.com>
@neutrinoceros
Member

neutrinoceros commented Jul 25, 2023

Fixed. I condensed the branch's history to eliminate back-and-forth edits, which make stuff like git bisect and git blame less useful for future maintenance.

Here are the commands you need to run to update your local clone of the repo:

git checkout main
git branch -D sink_csv
git fetch
git checkout sink_csv

Now we should be able to focus on getting tests to pass.

@neutrinoceros (Member) left a comment

Here are a couple of lightweight questions and suggestions to improve clarity and readability.

yt/frontends/ramses/particle_handlers.py, yt/frontends/ramses/io.py — outdated review threads, resolved.
        for chunk in chunks:
            for subset in chunk.objs:

                if ptype != "sink_csv":
Member

I'm confused. Shouldn't the guard clause be if ptype == "sink_csv": here?

Contributor Author

There is an if and then an else, so it's more or less equivalent, isn't it?

I was forced to add this condition because the reader is really not the same for the Fortran files and the CSV file. I didn't manage to find a way around it.

Member

Indeed there's an else in there, and having this if/else branching isn't an issue, but I would expect the old code to correspond to the previously supported cases (i.e. if ptype != "sink_csv"). Here, it seems to me that the old code runs specifically in the else clause, which matches the case you are adding. Am I missing anything?

Contributor Author

OK, I see. I will change it along these lines.
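
That is, restructured so that the new case is the one singled out; a sketch (bodies elided):

    for chunk in chunks:
        for subset in chunk.objs:
            if ptype == "sink_csv":
                # new path: the single global CSV file
                ...
            else:
                # previously supported path: per-CPU Fortran binary files
                ...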

yt/frontends/ramses/io.py — outdated review threads, resolved.
Lenoble-lab and others added 6 commits July 25, 2023 18:28
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Comment on lines 144 to 146
                dat = pd.read_csv(
                    fname, delimiter=",", usecols=[ind], skiprows=2, header=None
                )  # read only selected fields
Member

I'm wondering if we could call pd.read_csv just once ahead of the loop instead of doing multiple (costly) reading operations? If there were a chance of memory errors, I would expect the current implementation to hit those anyway.

Contributor Author

OK, I'll change it now.
But it's far from optimized: I need to read the entire file in _read_part_csv_file_descriptor to get the number of lines (to then initialize self.local_particle_count), and then I read it again (but only the columns I'm interested in).
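
A possible way around the double read: load the file once and derive both the particle count and the selected columns from the same DataFrame. A sketch; the attribute names follow the ones quoted above, everything else is illustrative:

    # read the whole file once: two header lines skipped, no header row used
    dat = pd.read_csv(fname, delimiter=",", skiprows=2, header=None)
    self.local_particle_count = len(dat)  # number of sinks = number of data rows
    selected = dat[ind]  # then slice out only the requested column(s)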

Contributor Author

I think reading a CSV file is too different from reading the Fortran binary file, so trying to use the same structure is not ideal.

@neutrinoceros (Member) left a comment

Almost there! Just a couple more small nits.

yt/frontends/ramses/io.py — outdated review threads, resolved.
Lenoble-lab and others added 3 commits July 31, 2023 10:51
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
@Lenoble-lab
Contributor Author

Sorry for the delay, I was away and I completely forgot...

@neutrinoceros neutrinoceros enabled auto-merge (squash) July 31, 2023 13:21
@neutrinoceros
Copy link
Member

No need to apologise, and thank you for your contribution!

@neutrinoceros neutrinoceros added this to the 4.3.0 milestone Jul 31, 2023
@neutrinoceros neutrinoceros merged commit 6c715aa into yt-project:main Jul 31, 2023
12 checks passed
@welcome

welcome bot commented Jul 31, 2023

Hooray! Congratulations on your first merged pull request! We hope we keep seeing you around! 🎆
