# Martignac tutorial

This Jupyter notebook introduces how Martignac is typically used, along with several of its features. The tutorial focuses on generating a solute, which is the simplest Martignac workflow.

Before you go any further, please read the `README.md` and the online documentation to check that you have installed and configured everything. Also go through the Martignac configuration file; the default configuration is stored in `martignac/config_default.yaml`. Notable settings include the NOMAD parameters and the paths to the local simulation input and output directories and files.

Let's first import a number of functions we'll be using.

In [38]:
from dataclasses import asdict
from pprint import pprint

from martignac.nomad.users import search_users_by_name
from martignac.nomad.datasets import retrieve_datasets, create_dataset, delete_dataset
from martignac.nomad.entries import get_entry_by_id
from martignac.workflows.solute_generation import project as solute_gen_flow_project

## NOMAD API connection

Martignac is tightly integrated with NOMAD (<https://nomad-lab.eu/nomad-lab/>). NOMAD is a state-of-the-art webserver to normalize, store, and share materials-science data. It supports Gromacs, which is the molecular dynamics simulation engine we'll be using here.

First, make sure you have a NOMAD account and that you've exported the environment variables `NOMAD_USERNAME` and `NOMAD_PASSWORD`.
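A quick way to confirm that both variables are visible to the Python process is sketched below. The helper `missing_nomad_credentials` is hypothetical and not part of Martignac; only the two variable names are taken from the text above.

```python
import os

# The two variable names mentioned in this tutorial.
REQUIRED_VARS = ("NOMAD_USERNAME", "NOMAD_PASSWORD")


def missing_nomad_credentials(env=None):
    """Return the names of the required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to the process
    environment and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_nomad_credentials()
if missing:
    print("Please export:", ", ".join(missing))
else:
    print("NOMAD credentials found in the environment.")
```

Running this check in the same shell session as the notebook kernel catches the common case where the variables were exported in a different terminal.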

Because we're running tests, we'll systematically work with the `test` database, **not the production one**. Make sure this is correctly set in your `config.yaml` file, located either in the `martignac/martignac` directory or, perhaps more conveniently, in your current directory. The YAML file should contain the following:

``` yaml
nomad:
  upload_to_nomad: true
  publish_uploads: false
  use_prod: false
```
which indicates that we will upload simulations to NOMAD, we will **not** publish the entries (publishing is permanent!), and we will use the NOMAD test database (`use_prod: false`).
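As a minimal sketch, the same three settings can be checked programmatically once the YAML file has been parsed into a dictionary. The function `unsafe_nomad_settings` is a hypothetical helper, not a Martignac function, and it assumes the configuration is already available as a plain Python `dict` with the key names shown in the fragment above.

```python
def unsafe_nomad_settings(config):
    """List the reasons why `config` is not safe for test runs.

    Hypothetical helper: both flags must be explicitly false, so a
    missing key is reported as well.
    """
    nomad = config.get("nomad", {})
    problems = []
    if nomad.get("use_prod") is not False:
        problems.append("use_prod must be false to target the test database")
    if nomad.get("publish_uploads") is not False:
        problems.append("publish_uploads must be false: publishing is permanent")
    return problems


# The settings of the YAML fragment, written as a Python dict.
tutorial_config = {
    "nomad": {"upload_to_nomad": True, "publish_uploads": False, "use_prod": False}
}
print(unsafe_nomad_settings(tutorial_config))  # prints []
```

Requiring an explicit `false` is a deliberately strict choice here, so that a missing key cannot silently fall back to a default you did not intend.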

Let's first look up a NOMAD user object. Feel free to replace the query name with yours and try to find yourself:

In [9]:
users = search_users_by_name("Bereau", use_prod=False)
users

24-08-23 14:42:45 - martignac.nomad.users - INFO - retrieving user Bereau on test server
24-08-23 14:42:45 - martignac.nomad.utils - INFO - Sending get request @ https://nomad-lab.eu/prod/v1/test/api/v1/users?prefix=Bereau


[NomadUser(name='Tristan Bereau')]

The object contains various attributes:

In [20]:
user = users[0]
user.__dict__

{'user_id': '30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b',
 'name': 'Tristan Bereau',
 'first_name': 'Tristan',
 'last_name': 'Bereau',
 'username': 'tbereau',
 'affiliation': 'Heidelberg University',
 'affiliation_address': 'Heidelberg, Germany',
 'email': None,
 'is_oasis_admin': False,
 'is_admin': False,
 'repo_user_id': None,
 'created': datetime.datetime(2023, 5, 23, 14, 22, 15, 547000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000'))}

Let's have a look at the datasets attached to the NOMAD user:

In [21]:
my_datasets = retrieve_datasets(user_id=user.user_id, use_prod=False)
pprint([asdict(d) for d in my_datasets])

24-08-23 15:56:58 - martignac.nomad.utils - INFO - Sending get request @ https://nomad-lab.eu/prod/v1/test/api/v1/datasets/?user_id=30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b&page_size=10
24-08-23 15:56:58 - martignac.nomad.users - INFO - retrieving user 30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b on prod server
24-08-23 15:56:58 - martignac.nomad.utils - INFO - Sending get request @ https://nomad-lab.eu/prod/v1/test/api/v1/users/30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b


[{'dataset_create_time': datetime.datetime(2024, 2, 14, 9, 39, 55, 876000),
  'dataset_id': 'HJdEI1q4SV-c5Di43BTT_Q',
  'dataset_modified_time': datetime.datetime(2024, 2, 14, 9, 39, 55, 876000),
  'dataset_name': 'Martignac test dataset',
  'dataset_type': 'DatasetType.owned',
  'doi': None,
  'm_annotations': None,
  'pid': None,
  'use_prod': False,
  'user': {'affiliation': 'Heidelberg University',
           'affiliation_address': 'Heidelberg, Germany',
           'created': None,
           'email': None,
           'first_name': 'Tristan',
           'is_admin': None,
           'is_oasis_admin': None,
           'last_name': 'Bereau',
           'name': 'Tristan Bereau',
           'repo_user_id': None,
           'user_id': '30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b',
           'username': 'tbereau'}},
 {'dataset_create_time': datetime.datetime(2024, 8, 20, 11, 15, 40, 253000),
  'dataset_id': 'IZYdt0ZhQeWq63WoyavGsQ',
  'dataset_modified_time': datetime.datetime(2024, 8, 20, 11, 

Let's now create a new dataset specifically for this tutorial:

In [14]:
dataset_id = create_dataset("Martignac tutorial", use_prod=False)
dataset_id

24-08-23 14:43:55 - martignac.nomad.datasets - INFO - creating dataset name Martignac tutorial on test server
24-08-23 14:43:55 - martignac.nomad.utils - INFO - Requesting authentication token @ https://nomad-lab.eu/prod/v1/test/api/v1
24-08-23 14:43:55 - martignac.nomad.utils - INFO - Sending post request @ https://nomad-lab.eu/prod/v1/test/api/v1/datasets/


'nla3SC_5TAKT5S08kKTXYQ'

You can access your datasets in the NOMAD GUI: simply head to <https://nomad-lab.eu/prod/v1/test/gui/user/datasets> (test database).

If you set this ID in your Martignac config `config.yaml` file, any Martignac simulations will be pulled from and pushed to this dataset.

Let's fetch a simulation entry from NOMAD. The entry has ID `IWfLp8VCyT7z9t3BtVy21Q5WZSRW`. The flag `with_authentication` is only necessary when querying private entries, for which the API requires a bearer token. As you can see, the entry contains lots of information:

In [25]:
entry_id = "IWfLp8VCyT7z9t3BtVy21Q5WZSRW"
entry = get_entry_by_id(entry_id, with_authentication=False, use_prod=False)
pprint(asdict(entry))

{'authors': [{'affiliation': 'Heidelberg University',
              'affiliation_address': 'Heidelberg, Germany',
              'created': None,
              'email': None,
              'first_name': 'Tristan',
              'is_admin': None,
              'is_oasis_admin': None,
              'last_name': 'Bereau',
              'name': 'Tristan Bereau',
              'repo_user_id': None,
              'user_id': '30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b',
              'username': 'tbereau'}],
 'calc_id': 'IWfLp8VCyT7z9t3BtVy21Q5WZSRW',
 'comment': None,
 'datasets': [{'dataset_create_time': datetime.datetime(2024, 2, 14, 9, 39, 55, 876000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')),
               'dataset_id': 'HJdEI1q4SV-c5Di43BTT_Q',
               'dataset_modified_time': datetime.datetime(2024, 2, 14, 9, 39, 55, 876000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')),
               'dataset_name': 'Martignac test dataset',
               'dataset_type':
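To illustrate what the bearer token mentioned above amounts to, here is a minimal sketch that prepares an HTTP request without sending it. It is not Martignac's implementation: the function `build_entry_request` is hypothetical, the `/entries/<entry_id>` path is an assumption, and only the base URL is taken from the log lines in this tutorial.

```python
from urllib.request import Request

# Base URL of the NOMAD test API, as it appears in the logs of this tutorial.
TEST_API = "https://nomad-lab.eu/prod/v1/test/api/v1"


def build_entry_request(entry_id, token=None):
    """Prepare (but do not send) a GET request for a single entry.

    Hypothetical helper; the `/entries/<entry_id>` path is an assumption.
    """
    request = Request(f"{TEST_API}/entries/{entry_id}", method="GET")
    if token is not None:
        # Standard HTTP bearer scheme: the token goes in the Authorization header.
        request.add_header("Authorization", f"Bearer {token}")
    return request


public = build_entry_request("IWfLp8VCyT7z9t3BtVy21Q5WZSRW")
private = build_entry_request("IWfLp8VCyT7z9t3BtVy21Q5WZSRW", token="<your-token>")
print(public.has_header("Authorization"), private.has_header("Authorization"))
```

The only difference between the two requests is the `Authorization` header, which is why the flag can be left off for entries that are publicly visible.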

## MD simulation workflow

Now let's run a simulation workflow. We'll focus on generating a solute. The Martignac workflow class is called `SoluteGenFlow`. Simply evaluating the project object displays all the jobs that exist locally. If you start fresh, you'll likely see an empty dataframe:

In [26]:
solute_gen_flow_project

| | sp.type | sp.solute_name | doc.SoluteGenFlow | doc.nomad_dataset_id |
|---|---|---|---|---|
| 4021422cb94c1299990b3e0320c45330 | solute | C2 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| 823c6c8d69ff6586a2e710423f25655c | solute | C3 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| bc3c7ca12d4e7b0fa984e8cbb813acbf | solute | P6 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| 7cbbfc58777902a755f28b4e00fef4ba | solute | C4 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| be40db643c4b16e8558ba80e06b9b8ce | solute | P4 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| 3e793a7b2a1e83233c40458fddf958ab | solute | P5 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| f6c4cb3240dfe51db788f2d718f37baa | solute | C5 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |
| 78d55cc75f63f0f7d5bf8ff6fdbe2697 | solute | C1 | {'files_symlinked': True, 'fetched_nomad': Tru... | HJdEI1q4SV-c5Di43BTT_Q |


We'll generate a solute composed of a single bead type, "P6" in Martini. Following the `signac` design principle, each workflow is defined by its state point, which here corresponds to the chemistry used. In `signac`, a state point is a dictionary that fully specifies the workflow. Let's define a new state point and add it to the project if it doesn't already exist. We'll "open the job", which either creates it or accesses it:

In [28]:
sp = {"type": "solute", "solute_name": "P6"}
job = solute_gen_flow_project.open_job(sp).init()
job

Job(project=SoluteGenFlow('/Users/bereau/work/projects/martignac/workspaces/solute_generation'), statepoint={'type': 'solute', 'solute_name': 'P6'})
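The idea that a state point fully specifies a job can be sketched by hashing the dictionary into a deterministic identifier, similar in spirit to the 32-character job identifiers in the table above. The function `state_point_id` below is a hypothetical stand-in: the exact scheme used by `signac` may differ in its details, so these identifiers should not be expected to match the real ones.

```python
import hashlib
import json


def state_point_id(state_point):
    """Hash a state point into a 32-character hexadecimal identifier.

    Illustrative only; not guaranteed to reproduce signac's job ids.
    """
    # Sorting the keys makes the serialization independent of insertion order.
    blob = json.dumps(state_point, sort_keys=True)
    return hashlib.md5(blob.encode("utf-8")).hexdigest()


a = state_point_id({"type": "solute", "solute_name": "P6"})
b = state_point_id({"solute_name": "P6", "type": "solute"})
c = state_point_id({"type": "solute", "solute_name": "C1"})
print(a == b, a == c)  # prints True False
```

Because the identifier depends only on the content of the dictionary, opening a job with the same state point twice leads to the same job rather than a duplicate.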

Now we can run the job. Running it leads to one of three outcomes:

1. The job is present locally and has already been completed. Nothing needs to be done.
2. The job is absent locally, or has only partially been run, and it is not on NOMAD. Martignac first attempts to download it from NOMAD; since the job is not found there, it is run (or continued) locally.
3. The job is absent locally, or has only partially been run, but it is present on NOMAD. The job is simply downloaded, and the workflow is complete.
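The three cases listed above can be summarized as a small decision function. This is only a sketch of the logic as described in this tutorial, not Martignac's actual code; the function name and the returned strings are made up for illustration.

```python
def plan_for_job(complete_locally, on_nomad):
    """Map the state of a job to the action described in the list above.

    Hypothetical helper: `complete_locally` is True only when the job has
    fully finished on the local machine.
    """
    if complete_locally:
        return "nothing to do"
    if on_nomad:
        return "download from NOMAD"
    return "run locally"


for local, remote in [(True, True), (False, True), (False, False)]:
    print(local, remote, "->", plan_for_job(local, remote))
```

Checking the local state first means that a completed job never triggers a download, which matches the behaviour seen when an existing job is run again.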

To run the workflow, we simply need to invoke `solute_gen_flow_project.run()`. We can also restrict ourselves to the aforementioned job:

In [29]:
solute_gen_flow_project.run(jobs=[job])

24-08-23 16:06:56 - martignac.nomad.entries - INFO - retrieving entries for upload q-HvqWzAR8WIfkdai1z2Nw on test server
24-08-23 16:06:56 - martignac.nomad.utils - INFO - Sending get request @ https://nomad-lab.eu/prod/v1/test/api/v1/uploads/q-HvqWzAR8WIfkdai1z2Nw/entries
24-08-23 16:06:56 - martignac.nomad.users - INFO - retrieving user 30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b on prod server
24-08-23 16:06:56 - martignac.nomad.utils - INFO - Sending get request @ https://nomad-lab.eu/prod/v1/test/api/v1/users/30d3a108-d2cc-45ec-9ddb-0c1dc6a2c99b
24-08-23 16:06:56 - martignac.nomad.users - INFO - retrieving user 7c85bdf1-8b53-40a8-81a4-04f26ff56f29 on prod server
24-08-23 16:06:56 - martignac.nomad.utils - INFO - Sending get request @ https://nomad-lab.eu/prod/v1/test/api/v1/users/7c85bdf1-8b53-40a8-81a4-04f26ff56f29


As the output shows, very little happened because the job already existed locally. Martignac simply sent a few requests to the NOMAD API to check that the NOMAD `entry_id` stored in the job corresponds to what's online. The job also stores metadata about the simulation workflow, which can be accessed through two attributes. First, `job.sp` returns the `signac` state point:

In [30]:
job.sp

{'type': 'solute', 'solute_name': 'P6'}

The richer `job.doc` contains Martignac-specific information:

In [37]:
pprint(dict(job.doc))

{'SoluteGenFlow': {'files_symlinked': True, 'fetched_nomad': True, 'gromacs_logs': {'minimize': 'solute_minimize.log', 'equilibrate': 'solute_equilibrate.log'}, 'itp_files': 'a52590b1d87d122ba1e376b83c3d6bee', 'mdp_files': '2d7a9e52d14d23e0dfb97192d75a3463', 'nomad_upload_id': 'q-HvqWzAR8WIfkdai1z2Nw', 'nomad_workflow': 'solute_generation.archive.yaml', 'ready_for_nomad_upload': True, 'tasks': {'build': 'run'}, 'solute_itp': 'solute.itp', 'solute_top': 'solute.top', 'solute_name': 'P6', 'solute_has_charged_beads': False, 'solute_gro': 'solute_equilibrate.gro'},
 'nomad_dataset_id': 'HJdEI1q4SV-c5Di43BTT_Q'}


Here the NOMAD dataset ID is stored in `nomad_dataset_id`, and the information specific to the solute-generation workflow is stored under the key `SoluteGenFlow`. Its subkeys are largely used as flags to track the internal progress of the workflow:

- `files_symlinked`: ITP and MDP files have already been symlinked
- `fetched_nomad`: fetching the workflow from NOMAD was attempted
- `gromacs_logs`: Gromacs log files of the workflow operations that run MD simulations
- `itp_files`: hash of the ITP files
- `mdp_files`: hash of the MDP files
- `nomad_upload_id`: NOMAD upload ID after workflow upload
- `nomad_workflow`: name of the NOMAD workflow YAML file
- `ready_for_nomad_upload`: flag indicating that the workflow is ready to be pushed to NOMAD
- `tasks`: workflow operations that are not MD simulations
- `solute_itp`: ITP file of the solute
- `solute_top`: TOP file of the solute
- `solute_name`: name of the solute (i.e., identical to `job.sp["solute_name"]`)
- `solute_has_charged_beads`: whether any bead is charged (for free-energy calculations)
- `solute_gro`: GRO file of the solute

The `job.doc` is significantly longer for more complex workflows, because it combines the entries of every workflow involved. For instance, a solute-in-solvent generation needs the solute and solvent generation workflows, in addition to `SoluteInSolventGenFlow` itself.
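As a minimal sketch of how such a document can be inspected, the snippet below works on a plain dictionary holding a trimmed copy of the output shown earlier. The two helper functions are hypothetical and not part of Martignac; with a real job you would pass `dict(job.doc)` instead of the hard-coded `doc`.

```python
from collections.abc import Mapping

# Trimmed copy of the job document printed earlier in this tutorial.
doc = {
    "SoluteGenFlow": {
        "files_symlinked": True,
        "fetched_nomad": True,
        "gromacs_logs": {
            "minimize": "solute_minimize.log",
            "equilibrate": "solute_equilibrate.log",
        },
        "ready_for_nomad_upload": True,
        "tasks": {"build": "run"},
        "solute_has_charged_beads": False,
    },
    "nomad_dataset_id": "HJdEI1q4SV-c5Di43BTT_Q",
}


def workflow_sections(job_doc):
    """Return the top-level keys that hold a nested, per-workflow document."""
    return [key for key, value in job_doc.items() if isinstance(value, Mapping)]


def boolean_flags(workflow_doc):
    """Keep only the true/false entries of one workflow's document."""
    return {key: value for key, value in workflow_doc.items() if isinstance(value, bool)}


print(workflow_sections(doc))  # prints ['SoluteGenFlow']
print(boolean_flags(doc["SoluteGenFlow"]))
```

For a composed workflow, `workflow_sections` would be expected to return one name per workflow involved, each with its own set of flags.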

## Remove your dataset

You may want to clean things up and remove your simulations and dataset. You can remove simulations as long as you have not published them! Keep in mind that NOMAD allows at most 10 unpublished uploads. From the NOMAD GUI you can access your uploads: <https://nomad-lab.eu/prod/v1/test/gui/user/uploads>. From there, select the simulations you'd like to remove and click on the trash bin at the top right. Once you have removed all simulations from the dataset, you can also remove the dataset itself. This can be done through the API with the following command. If things run smoothly, you should get an acknowledgment in the logs ("successfully deleted dataset ..."):

In [41]:
delete_dataset(dataset_id, use_prod=False)

24-08-23 16:20:27 - martignac.nomad.datasets - INFO - deleting dataset nla3SC_5TAKT5S08kKTXYQ on test server
24-08-23 16:20:27 - martignac.nomad.utils - INFO - Requesting authentication token @ https://nomad-lab.eu/prod/v1/test/api/v1
24-08-23 16:20:27 - martignac.nomad.utils - INFO - Sending delete request @ https://nomad-lab.eu/prod/v1/test/api/v1/datasets/nla3SC_5TAKT5S08kKTXYQ
24-08-23 16:20:27 - martignac.nomad.datasets - INFO - successfully deleted dataset nla3SC_5TAKT5S08kKTXYQ


## That's all

That's it for this tutorial. Head over to the paper and the online docs for a closer look at the different simulation workflows. The docs also describe how to set up the initialization of jobs. Last, head to the Martignac Streamlit app to browse existing simulations: <https://martignac.streamlit.app/>.