## Imports

In [1]:
import re, os, sys, shutil
import shlex, subprocess
from importlib import reload
import glob
import gromacs
import matplotlib.pyplot as plt
import MDAnalysis as mda
import numpy as np
import pandas as pd
import panedr
import parmed as pmd
import pathlib
import py
import scipy
from scipy import stats
from thtools import cd
from paratemp import copy_no_overwrite
from paratemp import geometries as gm
from paratemp import coordinate_analysis as ca
import paratemp.sim_setup.para_temp_setup as pts
import paratemp as pt
from gautools import submit_gaussian as subg
from gautools.tools import use_gen_template as ugt

In [None]:
reload(subg)
reload(pts)
reload(pt)

# Using a class to setup a simulation

An example-based introduction to object-oriented programming and test-driven development

Thomas Heavey, David Coker

Group Meeting 2019-01-08

# Motivation

To set up a simulation, I used to copy and paste a bunch of code between notebooks.
Occasionally, I would add a function to a package for later re-use, but the resulting interface was not very simple.

To make it easier for other non-experts to use, I aimed to simplify the interface.

## Ideal interface

Most basic functionality

In [None]:
input_geometry = 'geometry.gro'
topology = 'parameters/topology.top'

sim = Simulation(name='test_simulation',
                 gro=input_geometry,
                 top=topology)

This should create an instance of the class Simulation named `sim`.

## Let's make a test of this

In [2]:
def test_sim_exists():
    input_geometry = 'geometry.gro'
    topology = 'parameters/topology.top'

    sim = Simulation(name='test_simulation',
                     gro=input_geometry,
                     top=topology)
    assert isinstance(sim, Simulation), 'sim is not an instance of Simulation'

This would normally be run by the test runner (e.g., pytest or unittest), but we can also just run it here.
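Here is a toy version of what a test runner does. It finds every function whose name starts with `test_`, calls it, and collects failures instead of stopping at the first one. This is a hypothetical sketch: `run_tests` and `example_suite` are not part of pytest or unittest, and real runners do much more (discovery across files, fixtures, reporting).

```python
def run_tests(namespace):
    """Call every callable named test_* in `namespace`; return {name: exception}."""
    failures = {}
    for name, obj in sorted(namespace.items()):
        if name.startswith('test_') and callable(obj):
            try:
                obj()
            except Exception as exc:  # record the failure and keep going
                failures[name] = exc
    return failures


def example_suite():
    def test_passes():
        assert 1 + 1 == 2

    def test_fails():
        assert 1 + 1 == 3, 'arithmetic is broken'

    def helper():  # skipped: the name does not start with test_
        raise RuntimeError('should never be called')

    return {'test_passes': test_passes, 'test_fails': test_fails,
            'helper': helper}


failures = run_tests(example_suite())
print(sorted(failures))  # ['test_fails']
```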

In [3]:
test_sim_exists()

NameError: name 'Simulation' is not defined

# Try to pass the test

Let's start by defining a class.

It won't do anything at this point other than exist.

In [4]:
class Simulation(object):
    pass

Defining a class like this means it is a subclass of the most basic class called `object`.

In [5]:
test_sim_exists()

TypeError: object() takes no parameters

## That didn't make it too far

It seems our parent class `object` doesn't take any arguments when it is instantiated (that is, when an instance of the class is created).
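In Python 3, `class Simulation:` and `class Simulation(object):` are equivalent, because every class ultimately inherits from `object`. Our class did not define its own `__init__`, so the call fell through to `object`'s methods, which reject extra arguments. The quick check below uses `Bare`, a class name chosen only for illustration:

```python
class Bare:  # no base listed: still a subclass of object in Python 3
    pass


print(issubclass(Bare, object))    # True
print(Bare.__mro__[-1] is object)  # True

try:
    Bare(name='test_simulation')
except TypeError as exc:
    # Neither Bare nor object accepts extra arguments; wording varies by version
    print('TypeError:', exc)
```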

Let's define an instantiator that takes some arguments.

In [6]:
class Simulation(object):
    
    def __init__(self, *args, **kwargs):
        print(f'I was instantiated with arguments: {args}\n '
              f'and keyword arguments: {kwargs}')
        pass

`__init__` is the name of the instantiator (more precisely, the initializer) method, which is called automatically when an instance of the class is created.
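Strictly speaking, Python builds an instance in two steps. `__new__` creates the object, and then `__init__` initializes it. The sketch below records the order of those calls. `Traced` is a made-up class for demonstration only.

```python
class Traced:
    calls = []  # class-level list shared by all instances, for demonstration

    def __new__(cls, *args, **kwargs):
        Traced.calls.append('__new__')
        return super().__new__(cls)  # object.__new__ only needs the class

    def __init__(self, name):
        Traced.calls.append('__init__')
        self.name = name


t = Traced('test_simulation')
print(Traced.calls)  # ['__new__', '__init__']
```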

`self` is the first argument given to the methods of the class.
It is the object itself (in our cases so far, it would be `sim`).
It doesn't have to be called `self`, but by convention it generally is.
In other languages, the equivalent is often called `this`.
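One way to see what `self` is: calling a method on an instance is shorthand for calling the function on the class and passing the instance as the first argument. The `Demo` class and its `describe` method are hypothetical. They use a different name so they don't replace `Simulation` in this notebook.

```python
class Demo(object):

    def __init__(self, name):
        self.name = name

    def describe(self):
        # `self` is whichever instance the method was called on
        return f'Demo named {self.name}'


demo = Demo('test_simulation')
print(demo.describe())      # Demo named test_simulation
print(Demo.describe(demo))  # Demo named test_simulation
```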

`*args` will be a tuple (an immutable sequence) of all the positional (non-keyword) arguments.  
`**kwargs` will be a dict (a type of mapping) of all the arguments given as keywords (`key=value`).
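The same `*` and `**` syntax also works in the other direction. At a call site, `*` unpacks a sequence into positional arguments and `**` unpacks a dict into keyword arguments. Here is a minimal sketch, where `collect` is a throwaway function:

```python
def collect(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict
    return args, kwargs


positional = [1, 2]
keywords = {'name': 'test_simulation', 'gro': 'geometry.gro'}

args, kwargs = collect(*positional, **keywords)
print(args)    # (1, 2)
print(kwargs)  # {'name': 'test_simulation', 'gro': 'geometry.gro'}
```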

In [7]:
sim = Simulation(1)
sim = Simulation(key='value')

I was instantiated with arguments: (1,)
 and keyword arguments: {}
I was instantiated with arguments: ()
 and keyword arguments: {'key': 'value'}


In [8]:
test_sim_exists()

I was instantiated with arguments: ()
 and keyword arguments: {'name': 'test_simulation', 'gro': 'geometry.gro', 'top': 'parameters/topology.top'}


## Testing beyond 'cogito ergo sum'

This instance should store the information it was given so that it can use it later.

Let's write a test of what it should know.

In [9]:
def test_knows_more_than_existence():
    input_geometry = 'geometry.gro'
    topology = 'parameters/topology.top'

    sim = Simulation(name='test_simulation',
                     gro=input_geometry,
                     top=topology)
    assert sim.name == 'test_simulation', 'The name is wrong'
    assert sim.gro == input_geometry, 'The geometry is wrong'
    assert sim.top == topology, 'The topology is wrong'

In [10]:
test_knows_more_than_existence()

I was instantiated with arguments: ()
 and keyword arguments: {'name': 'test_simulation', 'gro': 'geometry.gro', 'top': 'parameters/topology.top'}


AttributeError: 'Simulation' object has no attribute 'name'

We could just manually define these attributes after creating an instance:

In [13]:
input_geometry = 'geometry.gro'
topology = 'parameters/topology.top'

sim = Simulation(name='test_simulation',
                 gro=input_geometry,
                 top=topology)

sim.name = 'test_simulation'
sim.gro = input_geometry
sim.top = topology

assert sim.name == 'test_simulation'
assert sim.gro == input_geometry
assert sim.top == topology

I was instantiated with arguments: ()
 and keyword arguments: {'name': 'test_simulation', 'gro': 'geometry.gro', 'top': 'parameters/topology.top'}


First, that seems silly. 
We're giving it the information when we make the instance, but then have to manually assign it.

Second, will it even pass the test?

In [14]:
test_knows_more_than_existence()

I was instantiated with arguments: ()
 and keyword arguments: {'name': 'test_simulation', 'gro': 'geometry.gro', 'top': 'parameters/topology.top'}


AttributeError: 'Simulation' object has no attribute 'name'
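It still fails because `test_knows_more_than_existence` creates its own, brand-new instance. The attributes we attached to our `sim` exist only on that one object. The quick demonstration below uses a throwaway class, `Empty`, for illustration:

```python
class Empty(object):
    pass


first = Empty()
second = Empty()
first.name = 'test_simulation'  # set on this one instance only

print(hasattr(first, 'name'))   # True
print(hasattr(second, 'name'))  # False
```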

## Define a better instantiator

In [19]:
class Simulation(object):
    
    def __init__(self, name, gro, top):
        self.name = name
        self.gro = gro
        self.top = top
        print('I was instantiated')
        
sim = Simulation(name='test',
                 gro=input_geometry,
                 top=topology)
sim.name

I was instantiated


'test'

In [16]:
test_knows_more_than_existence()

I was instantiated
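Writing `self.x = x` for every argument gets repetitive. If all `__init__` needs to do is store its arguments, the standard-library `dataclasses` module (Python 3.7+) can generate it. This is an alternative, not what the class in this notebook uses. `SimulationRecord` is a made-up name so it doesn't overwrite `Simulation`. It would satisfy the same attribute checks as `test_knows_more_than_existence`.

```python
from dataclasses import dataclass


@dataclass
class SimulationRecord:
    name: str
    gro: str
    top: str


record = SimulationRecord(name='test_simulation',
                          gro='geometry.gro',
                          top='parameters/topology.top')
print(record.name)  # test_simulation
print(record)       # the generated __repr__ lists all three fields
```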


# Testing what the instances can do

It should be able to do something, not just know things.

Let's write a test of what it should be able to do.

In [20]:

def test_do_something():
    input_geometry = 'geometry.gro'
    topology = 'parameters/topology.top'

    sim = Simulation(name='test_simulation',
                     gro=input_geometry,
                     top=topology)
    tpr = sim.make_tpr(mdp='minim.mdp')
    assert tpr == 'minim.tpr', 'The name is wrong'
    assert pathlib.Path(tpr).exists(), 'It does not exist'

In [21]:
test_do_something()

I was instantiated


AttributeError: 'Simulation' object has no attribute 'make_tpr'

Let's make it do something.

In [22]:
class Simulation(object):
    
    def __init__(self, name, gro, top):
        self.name = name
        self.gro = gro
        self.top = top
        
    def make_tpr(self, mdp):
        tpr_name = f'{mdp[:-4]}.tpr'
        return tpr_name
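Slicing with `mdp[:-4]` assumes the name always ends in a four-character `.mdp` suffix. The sketch below is slightly more robust. It uses `pathlib`, which replaces whatever suffix is present and keeps any directory part. `tpr_name_for` is a hypothetical helper, not part of the class.

```python
import pathlib


def tpr_name_for(mdp):
    # Replace the file's suffix (whatever it is) with .tpr, keeping any folders
    return str(pathlib.Path(mdp).with_suffix('.tpr'))


print(tpr_name_for('minim.mdp'))         # minim.tpr
print(tpr_name_for('minim'))             # minim.tpr (slicing would give 'm.tpr')
print(tpr_name_for('inputs/equil.mdp'))  # inputs/equil.tpr on POSIX systems
```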

In [23]:
test_do_something()

AssertionError: It does not exist

Just because we give the file a name doesn't mean that the file exists.

In [24]:
class Simulation(object):
    
    def __init__(self, name, gro, top):
        self.name = name
        self.gro = gro
        self.top = top
        
    def make_tpr(self, mdp):
        tpr_name = f'{mdp[:-4]}.tpr'
        return_code, output, rest = gromacs.grompp_mpi(
            c=self.gro,
            p=self.top,
            f=mdp,
            o=tpr_name,
            stdout=False
        )
        print(output)
        return tpr_name

In [25]:
test_do_something()

Analysing residue names:
There are:     1      Other residues
There are:     1      Water residues
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...
Largest charge group radii for Van der Waals: 0.108, 0.045 nm
Largest charge group radii for Coulomb:       0.108, 0.094 nm
This run will generate roughly 0 Mb of data
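The cell above unpacks `return_code` but never checks it, so a failed `grompp` could go unnoticed until the test looks for the file. The sketch below calls GROMACS through `subprocess` directly, as an alternative to GromacsWrapper. It assumes the executable is called `gmx` (on some clusters it is `gmx_mpi`). `grompp_command` and `run_grompp` are hypothetical helpers.

```python
import subprocess


def grompp_command(gro, top, mdp, tpr, gmx='gmx'):
    # -f: mdp parameters, -c: input coordinates, -p: topology, -o: output tpr
    return [gmx, 'grompp', '-f', mdp, '-c', gro, '-p', top, '-o', tpr]


def run_grompp(gro, top, mdp, tpr, gmx='gmx'):
    result = subprocess.run(grompp_command(gro, top, mdp, tpr, gmx),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # GROMACS writes most of its messages to stderr
        raise RuntimeError(f'grompp failed:\n{result.stderr}')
    return tpr


print(grompp_command('geometry.gro', 'parameters/topology.top',
                     'minim.mdp', 'minim.tpr'))
```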



# Side note: Importance of reasonable tests

In [None]:
class Simulation(object):
    
    def __init__(self, name, gro, top):
        self.name = name
        self.gro = gro
        self.top = top
        
    def make_tpr(self, mdp):
        tpr_name = f'{mdp[:-4]}.tpr'
        open(tpr_name, 'a').close()
        return tpr_name

In [None]:
test_do_something()

Obviously, this passes the test, but it doesn't actually make the file we wanted.

Many of my tests are like this: I assume that no bad actor will write code that tries to deceive the tests into passing.

Writing a test that effectively checks what we want would be challenging.
The tpr is a binary file whose contents even change depending on the version of GROMACS used (and, of course, on the inputs).

We could just test that it's not empty, but that would be almost as easy to cheat.
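To make that concrete, a test that only checks for a non-empty file is defeated by writing a single byte. The helpers `cheating_make_tpr` and `file_is_nonempty` are hypothetical, and they run in a temporary directory.

```python
import pathlib
import tempfile


def cheating_make_tpr(mdp, directory):
    # Writes one junk byte instead of running grompp
    tpr = pathlib.Path(directory) / pathlib.Path(mdp).with_suffix('.tpr').name
    tpr.write_bytes(b'\x00')
    return tpr


def file_is_nonempty(path):
    path = pathlib.Path(path)
    return path.exists() and path.stat().st_size > 0


with tempfile.TemporaryDirectory() as tmp:
    fake_tpr = cheating_make_tpr('minim.mdp', tmp)
    cheat_passed = file_is_nonempty(fake_tpr)

print(fake_tpr.name, cheat_passed)  # minim.tpr True
```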

# Interface as I've actually written it

In [26]:
from paratemp.sim_setup import Simulation

? Simulation 

In [27]:
sim = Simulation(name='test_sim',
                 gro='geometry.gro',
                 top='parameters/topology.top',
                 mdps={'minimize': 'minim.mdp'})

This creates a method on the instance called `minimize` that runs the simulation defined by the given mdp file.
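Here is one way such per-step methods could be generated. This is a sketch under assumptions, not how `paratemp` implements it, and `StepRunner` and `_run_step` are hypothetical names. Each key in `mdps` becomes an attribute holding a function that calls a shared step runner with that key's mdp file.

```python
import functools


class StepRunner(object):

    def __init__(self, name, gro, top, mdps):
        self.name = name
        self.gro = gro
        self.top = top
        self.mdps = dict(mdps)
        for step_name, mdp in self.mdps.items():
            # e.g. self.minimize() -> self._run_step('minimize', 'minim.mdp')
            setattr(self, step_name,
                    functools.partial(self._run_step, step_name, mdp))

    def _run_step(self, step_name, mdp):
        # Placeholder: a real version would run grompp and mdrun here
        return f'{step_name} would run with {mdp}'


runner = StepRunner(name='test_sim', gro='geometry.gro',
                    top='parameters/topology.top',
                    mdps={'minimize': 'minim.mdp'})
print(runner.minimize())  # minimize would run with minim.mdp
```

One drawback of this design: a step named like an existing attribute (say `top`) would silently replace it. A real implementation might therefore validate the keys.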

In [28]:
sim.minimize()

AttributeError: module 'gromacs' has no attribute 'grompp'

It also runs the step for each given mdp file in its own folder.

It then keeps track of the output file names, the folders used, and the output from each GROMACS step.
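Below is a minimal sketch of that bookkeeping. It assumes each step runs in a sub-folder named after the step and produces a `.gro` file. `StepLog`, `working_dir`, and the `fake_step` stand-in for GROMACS are all hypothetical. The sketch also shows one way an attribute like `last_geometry` could work: it returns the newest output geometry, or falls back to the input if nothing has run.

```python
import contextlib
import os
import pathlib
import tempfile


@contextlib.contextmanager
def working_dir(path):
    # Temporarily change directory, always changing back afterwards
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield pathlib.Path(path)
    finally:
        os.chdir(previous)


class StepLog(object):

    def __init__(self, gro, base_dir):
        self.gro = pathlib.Path(gro).resolve()
        self.base_dir = pathlib.Path(base_dir).resolve()
        self.folders = {}
        self.outputs = {}
        self.geometries = []

    def run(self, step_name, step_function):
        folder = self.base_dir / step_name
        folder.mkdir(parents=True, exist_ok=True)
        with working_dir(folder):
            output, gro_name = step_function(self.last_geometry)
        self.folders[step_name] = folder
        self.outputs[step_name] = output
        self.geometries.append(folder / gro_name)

    @property
    def last_geometry(self):
        # Newest output geometry, or the input geometry if nothing has run
        return self.geometries[-1] if self.geometries else self.gro


def fake_step(input_gro):
    # Stand-in for GROMACS: writes an output .gro in the current folder
    pathlib.Path('out.gro').write_text(f'from {input_gro.name}\n')
    return 'fake log text', 'out.gro'


with tempfile.TemporaryDirectory() as tmp:
    log = StepLog(gro='geometry.gro', base_dir=tmp)
    before = log.last_geometry.name  # the input geometry
    log.run('minimize', fake_step)
    after = log.last_geometry.relative_to(log.base_dir)

print(before, after)  # geometry.gro minimize/out.gro (on POSIX)
```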

After running `minimize`, we can now get the minimized geometry:

In [29]:
sim.last_geometry

PosixPath('/projectnb/nonadmd/theavey/GPX-project/04-Simulation_class/geometry.gro')