[WIP] Move to new calculator API #517

Open · wants to merge 26 commits into main
Conversation

@SamTov (Member) commented Apr 13, 2022

Purpose

The purpose of this PR is to split calculator usage into two steps: one call that constructs a calculator object and another that executes it, in the following way:

    my_calc = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
    project.experiments.NaCl_example_data.execute_operation(my_calc)

This currently works in this branch for the Einstein diffusion coefficients ONLY.

Please read the full PR before commenting.

What is broken?

Currently it still uses the call decorator and is able to check whether data is already stored. The only thing broken here is that you cannot call this from the project at the moment and have it loop over experiments, because at the time of the call the list of experiments is not passed to the calculator. This can, however, be put into the experiment class, presumably as part of a broader restructuring of the calculators.

Comments

I would propose that we move to using call methods on operations so that they can all be invoked in this way. I added the following method to the experiment class:

    def execute_operation(self, operation: callable):
        # Attach this experiment so the operation can access its data and databases.
        operation.experiment = self
        # The operation's __call__ performs the actual computation.
        operation()

In this way it does not matter whether the operation is a calculator, transformation, or graph operation; all of them can be run through the same method, as sketched below.
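
As a rough illustration of the unified dispatch (a sketch only; UnwrapCoordinates is a hypothetical transformation name used purely for illustration and is not part of this branch):

    # Both objects are "operations": they get an experiment attached and are then called.
    diffusion = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
    unwrap = UnwrapCoordinates(species=["Na", "Cl"])  # hypothetical transformation

    exp = project.experiments.NaCl_example_data
    exp.execute_operation(unwrap)     # a transformation
    exp.execute_operation(diffusion)  # a calculator, dispatched through the same method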

Work program

I was thinking of going about this in the following way:

  • Decide on call API
  • Move all calculators to call method and put their attributes into their inits
  • Add functionality to call this from the Project class

If this is complete, we can merge it into main with no changes to the functionality of the package, just with the new API in which one has to define the calculator class before running it.

Once this is merged, I would then suggest moving the data handling of the calculators into the experiment and changing the database in one shot.

Comments

  • All naming and methods at this stage are for demonstration purposes only.
  • The examples are likely broken due to API changes.
  • An example of the new API is in the NaCl_walkthrough notebook.

Updates

  • Resolved calling from the project class with an essentially mirroring method in the Project class:
    def execute_operation(self, operation):
        # Forward the operation to every active experiment in turn.
        for item in self.active_experiments.values():
            item.execute_operation(operation)

For some reason, however, this does not check whether the calculation was already run.

  • Removed the necessity of the call method and enabled checking of the SQL database for previously saved data.

@christophlohrmann (Collaborator)

I like the approach of changing the API first and then caring about the database.
As long as the "I did this already, will load instead of repeating the calculation" behaviour still works, it justifies using the __call__ of the calculator.
In the future we can probably move away from this by just calling a common function of the calculators inside execute_operation.
At that point the calculator has all the information it needs ("static info" like data_range from its init and "dynamic info" like species from the arguments passed by the experiment) and can decide whether it should run or load.
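
A minimal sketch of that idea, assuming a common method (here called run_or_load, which does not exist yet) that every calculator would expose:

    # Hypothetical future version of Experiment.execute_operation: no __call__ needed,
    # the calculator decides internally whether to run or load.
    def execute_operation(self, operation):
        operation.experiment = self
        # "Static info" (e.g. data_range) came from the calculator's __init__;
        # "dynamic info" (e.g. species) is supplied by the experiment here.
        return operation.run_or_load(species=self.species)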

@SamTov (Member, Author) commented Apr 22, 2022


The current issue is an odd one. When you call this new API directly from an experiment, it checks whether it has already been run; however, if you call it from the project, it does not. I haven't tried to find out why, but it is the only issue preventing full adoption at this stage.

@Fratorhe (Collaborator)

This is a good idea.
I feel it improves the decoupling of the code, such that in principle you could use the calculator/transformation anywhere else.
This will also make testing easier.
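
For example (a sketch only; FakeExperiment and the test name are hypothetical, and a real calculator may need more wiring than shown):

    # Sketch of the kind of isolated test the decoupling would allow.
    def test_einstein_diffusion_in_isolation():
        calc = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
        calc.experiment = FakeExperiment()  # stub standing in for a full project/experiment
        calc()  # the operation runs against the stub, no project setup required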

@PythonFZ (Member)

I tried to visualize our approach here a bit. Is this what we are after?

    flowchart TD
        calc[Calculator] -->|pass user arguments| B(Calculator Instance)
        B --> |exp.execute_operation| exp[Experiment]
        exp --> db{Check DB}
        db --> |exist| res[Show results]
        db --> |new, pass exp args| D[run calculation]
        B --> D
        D -->|store in db| exp

I'm still struggling a bit with this:

    my_calc = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
    project.experiments.NaCl_example_data.execute_operation(my_calc)

In my mind I would pass the data to the calculator, not the calculator to the data.
But I know we cannot nicely do this, because the calculator writes to the experiment database.
But if it did, it could simplify things:

    flowchart TD
        calc[Calculator] -->|pass user arguments| B(Calculator Instance)
        B --> prep[Prepare Calculation]
        exp[Experiment Data + DB handle] --> prep
        prep --> db{check DB}
        db --> yes[return results]
        db --> no[run computation, save in db]
        no --> yes

I do prefer having the experiment, or rather the project, handle all the database stuff and not the calculators, though.
Ideally the calculator just takes some dataset and returns a dataset.
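
A minimal sketch of that split, mirroring the second flowchart. Everything here is simplified and hypothetical (the dict-backed "DB", the run method, and the data handling are placeholders, not existing MDSuite API):

    class EinsteinDiffusionCoefficients:
        """Pure computation: takes a dataset, returns a result, never touches the DB."""

        def __init__(self, species, data_range):
            self.species = species        # "static" arguments from the user
            self.data_range = data_range

        def run(self, dataset):
            # Placeholder for the real MSD fit over the chosen data range.
            window = dataset[: self.data_range]
            return sum(window) / len(window)

    class Experiment:
        def __init__(self, data, db):
            self.data = data  # e.g. {"Na": [...], "Cl": [...]}
            self.db = db      # simple dict standing in for the SQL database

        def execute_operation(self, calc):
            key = (type(calc).__name__, tuple(calc.species), calc.data_range)
            if key in self.db:                    # "check DB"
                return self.db[key]               # "return results"
            dataset = self.data[calc.species[0]]  # "Prepare Calculation" (simplified)
            result = calc.run(dataset)            # "run computation"
            self.db[key] = result                 # "save in db"
            return result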

@Fratorhe (Collaborator)

I agree with this sentence: "Ideally the calculator just takes some dataset and returns a dataset."
The second sketch does indeed look more linear and less coupled, which should be more beneficial in the long term for maintenance.
However, it is unclear to me who would perform the "Prepare calculation" and "check DB" steps in this case.

@SamTov (Member, Author) commented Apr 25, 2022

So the goal will be the following:

  1. Instantiate calculator
  2. Call experiment.something_nice(calculator, **kwargs)
  3. Experiment loads the required data in a memory-safe manner, or checks if the result already exists
  4. Passes data to calculator
  5. Calculator returns data
  6. Experiment puts data into SQL and can return it if necessary

The kwargs in this case could be for a post-RDF calculator where you pass RDF data or an MSD. In this way you have the flexibility to pass data, but it is all handled by the experiment, which provides a linear flow of data and communication between databases.
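
A rough sketch of steps 1-6 as an experiment method, where something_nice, query_sql, load_data, and store_sql are placeholder names rather than existing API:

    def something_nice(self, calculator, **kwargs):
        # Step 3: check whether this result already exists in the SQL database.
        cached = self.query_sql(calculator)
        if cached is not None:
            return cached
        # Step 3: load the required data in a memory-safe manner.
        data = self.load_data(calculator.species)
        # Steps 4-5: pass data (plus e.g. RDF/MSD results via kwargs) to the calculator.
        result = calculator(data, **kwargs)
        # Step 6: put the result into SQL and return it if necessary.
        self.store_sql(calculator, result)
        return result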

@SamTov (Member, Author) commented Jun 21, 2022

I have a very odd bug at this stage. On the run where we compute the result, the calculator cannot find the data in the database, but when I run it a second time, it finds the stored data and returns it.

This is resolved.
