# EXERCISES

## Philosophy

All graduate students in the group will be "forced" to engage in a series of exercises related to molecular simulation before really getting started on "real research." Series A focuses on molecular simulation, and Series B (yet to be developed) focuses on machine learning. The primary purpose of these exercises is to expose students to essential elements of these disciplines and to provide a firm foundation for understanding the literature and pursuing research in the area. Interested prospective students may also elect to try these exercises before joining the group, either to gauge their interest in computational research (although, admittedly, these are not the sexiest of topics!) or to get a head start on research in the group (by time-shifting these exercises from early spring after matriculation to sometime in the first semester or winter break).

**Note on implementation**: You can use whatever programming language you want. Even if slow, the simulations are feasible in interpreted languages. As a first recommendation, I would suggest Python for both series. However, for Series A, it would also be quite reasonable and worthwhile to write some programs in *C* or *FORTRAN* and perhaps perform the analysis in Python. Using a compiled language forces you to think more about some of the mechanical aspects of implementing algorithms, and of coding in general, which can pay dividends. It is also possible to write some routines in a compiled language to accelerate an overarching Python code. Jupyter notebooks or PowerPoint slides are a good way to share your results.

**Note on resources**: Most of these tasks can be found as examples in textbooks or courses, so it would be very easy to copy code from a relevant resource. Use whatever resources you want (**FS**: Frenkel and Smit, *Understanding Molecular Simulation*; **AT**: Allen and Tildesley, *Computer Simulation of Liquids*; [Shell course](https://sites.engineering.ucsb.edu/~shell/che210d/)), but, in a marked exception to my usual hierarchy of coding priorities, in this instance you *must* understand what the code is doing and what the algorithms are doing.

**Note on timing**: These exercises take a variable amount of time depending on your prior knowledge of, and comfort with, the required skills and tools. G1 students also always have other responsibilities and time commitments, but they can usually still find pockets of time here and there to make progress. About a week is the shortest I have seen (the exercises are quite similar to those you would find in a molecular simulation course, and in that case the student could make use of their coursework). For a person with reasonable programming experience but little exposure to molecular simulation, a few weeks is probably adequate for Series A. If you are starting from scratch with programming, the timescale extends a bit. In the grand scheme of things, it should constitute an insignificant period of time.

**Note on brevity/lack of detail**: "It's all part of the plan." -- The Joker. Without a literal prescription of how to proceed with every detail, you will have to spend more time with references and thinking about how to implement the code and analysis. You will recognize just how many choices go into doing something deceptively simple. *My advice:* start simple, and consider ways to benchmark or validate your code as you proceed. For example, once you think you have a working code or method of analysis, find a result in a textbook or paper and reproduce it using exactly the indicated conditions.

***Series A: Fundamentals of Molecular Simulation***

By thoughtfully completing the exercises in this series, you will

1. develop/demonstrate basic programming proficiency. If you have only limited exposure to programming, then this is the time to learn through dedicated study! We cannot shortchange this. The goal here, however, is proficiency, not expertise. This can also be a good opportunity to learn an unfamiliar language or programming paradigm.
2. gain a detailed understanding of the key algorithms that underlie molecular simulation. You will then also be able to understand the limitations of these algorithms. Fundamental understanding equips you to diagnose issues more readily when they arise.
3. be equipped to make connections between molecular simulation, statistical mechanics, and macroscopic properties.
4. understand common forms of analysis, error estimation, and visualization.

***Series B: Fundamentals of Machine Learning in Chemistry***

By thoughtfully completing the exercises in this series, you will... (TBD)


## A. Fundamentals of Molecular Simulation 

### Task 1. Monte Carlo Simulation of a Lennard-Jones Fluid

**Objective(s):** the goal of this exercise is to write a code to perform *Monte Carlo simulation* of a simple Lennard-Jones fluid in three dimensions in the NVT/canonical ensemble (fixed particle number, volume, and temperature). The data generated by the simulations will be used to perform some linchpin analyses.

**Some details/hints:**
* A Lennard-Jones fluid is an archetypal, particle-based model for simple substances. The particles interact via a [Lennard-Jones](https://en.wikipedia.org/wiki/Lennard-Jones_potential) (or 12-6) potential. This potential is extensively used in molecular simulation to describe intermolecular interactions.
* Use cubic, periodic boundary conditions.
* To obtain reasonable results, your systems need not contain more than 256 particles.
* Your code should be flexible enough to allow for specification of the thermodynamic constraints (e.g., density and temperature). 
* Your code should use a cutoff distance for the interactions between two particles. I recommend using a truncate-and-shift approach. 
* Simple, single-particle displacement moves are sufficient. Treat the maximum displacement as a tunable parameter, and consider how it affects the frequency with which proposed moves are accepted.
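One minimal sketch of the ingredients above, assuming Python with NumPy and reduced LJ units ($\sigma = \epsilon = k_B = 1$). The truncated-and-shifted potential is $u_{\text{ts}}(r) = u(r) - u(r_c)$ for $r < r_c$ and $0$ otherwise. All function names are hypothetical, the simple-cubic starting lattice is only an illustrative choice (an FCC lattice, e.g., $4 \times 4^3 = 256$ particles, is also common), and the minimum-image convention used here requires $r_c \le L/2$.

```python
import numpy as np

def cubic_lattice(n_side, density):
    """Simple-cubic starting configuration of n_side**3 particles at the given number density."""
    n = n_side**3
    box = (n / density) ** (1.0 / 3.0)
    a = box / n_side
    grid = (np.arange(n_side) + 0.5) * a
    pos = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T
    return pos, box

def lj_shifted(r2, rc2):
    """Truncated-and-shifted LJ energy (reduced units) from squared distances r2."""
    inv6 = 1.0 / r2**3
    inv6_c = 1.0 / rc2**3
    u = 4.0 * (inv6 * inv6 - inv6) - 4.0 * (inv6_c * inv6_c - inv6_c)
    return np.where(r2 < rc2, u, 0.0)          # zero beyond the cutoff

def particle_energy(i, pos, box, rc):
    """Interaction energy of particle i with all others (minimum-image convention)."""
    d = pos - pos[i]
    d -= box * np.round(d / box)               # nearest periodic image; requires rc <= box/2
    r2 = np.delete(np.sum(d * d, axis=1), i)   # drop the self term (r2 = 0)
    return float(np.sum(lj_shifted(r2, rc * rc)))

def mc_sweep(pos, box, T, rc, max_disp, rng):
    """N attempted single-particle Metropolis moves; modifies pos in place, returns # accepted."""
    n = len(pos)
    accepted = 0
    for _ in range(n):
        i = rng.integers(n)
        old = pos[i].copy()
        u_old = particle_energy(i, pos, box, rc)
        pos[i] = (old + rng.uniform(-max_disp, max_disp, 3)) % box   # trial move, wrapped
        du = particle_energy(i, pos, box, rc) - u_old
        if du <= 0.0 or rng.random() < np.exp(-du / T):
            accepted += 1                      # accept with probability min(1, exp(-dU/T))
        else:
            pos[i] = old                       # rejected: restore the old position
    return accepted
```

Because only particle $i$ moves, each trial move needs $O(N)$ work for the energy change rather than a full recomputation of the total energy.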

**Recommended Resource(s)**: **FS** 

**Target(s)/Questions:** 

a.) In Monte Carlo, you typically do not want nearly every proposed move to be accepted; a fair fraction should be rejected. A good initial rule of thumb is that perhaps 40% of moves should be accepted. Can you make sense of this idea? How does the maximum displacement size affect the "acceptance rate" of proposed MC moves? How do you expect this acceptance rate to change as a function of temperature?
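To explore the acceptance rate empirically, one common approach (sketched here assuming you tally accepted and attempted moves every few sweeps) is to rescale the maximum displacement during equilibration only and then freeze it for production. The damping bounds and the 40% default are illustrative choices, and `tune_max_disp` is a hypothetical name.

```python
def tune_max_disp(max_disp, n_accepted, n_attempted, target=0.4, max_allowed=None):
    """Rescale the maximum MC displacement toward a target acceptance ratio.

    Intended for equilibration only: adapting the step size on the fly during
    production breaks detailed balance.
    """
    ratio = n_accepted / n_attempted
    factor = min(max(ratio / target, 0.5), 1.5)   # damp large corrections
    new = max_disp * factor
    if max_allowed is not None:
        new = min(new, max_allowed)               # e.g., half the box length
    return new
```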

b.) Estimate a constant-volume heat capacity $C_v$ when the system is a gas versus a liquid (the representative gas and liquid need not be at coexistence/can be at different temperatures).
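In the canonical ensemble, $C_V$ follows from energy fluctuations. Because configurational MC samples only the potential energy $U$, the kinetic (ideal-gas) contribution is added analytically:

$$C_V = \tfrac{3}{2} N k_B + \frac{\langle U^2\rangle - \langle U\rangle^2}{k_B T^2}.$$

A sketch in reduced units ($k_B = 1$), assuming `u_samples` holds the total potential energy from sufficiently decorrelated production configurations. Block averaging is one simple way to attach an error bar; the function names are hypothetical.

```python
import numpy as np

def heat_capacity_per_particle(u_samples, n_particles, T):
    """C_V / (N k_B) in reduced units from NVT MC potential-energy samples."""
    u = np.asarray(u_samples, dtype=float)
    # 3/2 per particle is the kinetic (ideal-gas) part that configurational MC never samples
    return 1.5 + u.var() / (n_particles * T**2)

def heat_capacity_with_error(u_samples, n_particles, T, n_blocks=10):
    """Mean and standard error of C_V/(N k_B) over contiguous blocks of the time series."""
    blocks = np.array_split(np.asarray(u_samples, dtype=float), n_blocks)
    vals = np.array([heat_capacity_per_particle(b, n_particles, T) for b in blocks])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n_blocks)
```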

c.) Obtain a numerical equation of state (pressure vs. density) for two isotherms (one above and one below the critical temperature). What does it mean for a system to exhibit negative pressure? Write a function to output the coordinates of particles in your system (if you haven't already), and create visualizations of your systems for a few representative state points using visualization software.
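The pressure can be estimated from the virial, $P = \rho k_B T + \frac{1}{3V}\left\langle \sum_{i<j} \mathbf{r}_{ij}\cdot\mathbf{f}_{ij} \right\rangle$. The sketch below evaluates the instantaneous value for one configuration (average it over production samples), in reduced units, assuming the same truncated-and-shifted model; the shift does not change the forces below $r_c$. The tail correction matters only when comparing with full-LJ literature data and assumes $g(r) = 1$ beyond $r_c$. XYZ is one simple format that tools such as VMD or OVITO can read; the element label `Ar` is an arbitrary placeholder, and all names are hypothetical.

```python
import numpy as np

def virial_pressure(pos, box, T, rc):
    """Instantaneous P = rho*T + (1/3V) * sum_{i<j} r_ij . f_ij over pairs inside rc."""
    n = len(pos)
    vol = box**3
    w = 0.0
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]
        d -= box * np.round(d / box)                    # minimum image
        r2 = np.sum(d * d, axis=1)
        r2 = r2[r2 < rc * rc]
        inv6 = 1.0 / r2**3
        w += np.sum(48.0 * inv6 * inv6 - 24.0 * inv6)   # r . f = 48 r^-12 - 24 r^-6
    return n * T / vol + w / (3.0 * vol)

def lj_pressure_tail(density, rc):
    """Standard LJ tail correction to P, assuming g(r) = 1 beyond rc."""
    return 16.0 / 3.0 * np.pi * density**2 * (2.0 / 3.0 * rc**-9 - rc**-3)

def xyz_frame(pos, comment="", symbol="Ar"):
    """One frame in XYZ format; append successive frames to a .xyz file to build a trajectory."""
    lines = [str(len(pos)), comment]
    lines += [f"{symbol} {x:.6f} {y:.6f} {z:.6f}" for x, y, z in pos]
    return "\n".join(lines) + "\n"
```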

d.) Compute and plot pair radial distribution functions for two thermodynamic conditions: one in the gas phase and the other in the liquid phase. Remark on their characteristics.
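A sketch of $g(r)$ as a pair-distance histogram normalized by the count expected for an ideal gas at the same density, assuming a cubic box, frames stored as `(N, 3)` NumPy arrays, and $r_{\max} \le L/2$. Normalizing by $N(N-1)/2$ pairs makes a uniformly random configuration give $g(r) \approx 1$ on average. The function name and signature are hypothetical.

```python
import numpy as np

def rdf(frames, box, n_bins=100, r_max=None):
    """Radial distribution function averaged over frames of an N-particle cubic system.

    Returns bin centers and g(r); r_max must not exceed box/2 (minimum image).
    """
    r_max = 0.5 * box if r_max is None else r_max
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    n = len(frames[0])
    for pos in frames:
        for i in range(n - 1):
            d = pos[i + 1:] - pos[i]
            d -= box * np.round(d / box)                 # minimum image
            counts += np.histogram(np.sqrt(np.sum(d * d, axis=1)), bins=edges)[0]
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    # Expected number of unordered pairs per shell for an ideal gas at the same density
    ideal = len(frames) * 0.5 * n * (n - 1) * shell_vol / box**3
    return 0.5 * (edges[1:] + edges[:-1]), counts / ideal
```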

### Task 2. Molecular Dynamics Simulation of a Lennard-Jones Fluid

**Objective(s):** the goal of this exercise is to write a code to perform a *molecular dynamics simulation* of a simple Lennard-Jones fluid in three dimensions. Simulations will be run in both the NVE/microcanonical ensemble (fixed particle number, volume, and energy) and a pseudo-canonical ensemble. In addition to reproducing some prior analyses, dynamical characterization will be introduced.

**Some details/hints:**
* You can generate your MD code from your MC code with fairly modest modifications.
* I would recommend time integration by the velocity-Verlet scheme (which yields positions and velocities at the same time points).
* Statistical convergence of per-particle quantities can be improved by exploiting the fact that distinct particles are nominally independent. When considering time-dependent quantities, note that the time origin $t = 0$ is arbitrary, so you can extract many effectively uncorrelated samples for a given lag $\Delta t$ from within the same trajectory.
* Pay special attention to the handling of periodic boundaries.
* If your particle coordinates are always kept *inside* your periodic simulation cell, note that you can reconstruct a continuous particle trajectory by summing the (minimum-image) displacements between sampled frames, so long as the sampling interval is comfortably shorter than the time it takes a particle to travel half the box length.
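The reconstruction described in the last hint might look like the following, assuming wrapped frames stacked in an array of shape `(n_frames, N, 3)` and a cubic box. Each frame-to-frame step is mapped to its minimum image before being summed; the function name is hypothetical.

```python
import numpy as np

def unwrap(frames, box):
    """Continuous trajectories from coordinates wrapped into [0, box).

    Assumes no particle moves more than box/2 between consecutive frames.
    """
    frames = np.asarray(frames, dtype=float)
    steps = np.diff(frames, axis=0)
    steps -= box * np.round(steps / box)          # minimum-image displacement per interval
    out = np.empty_like(frames)
    out[0] = frames[0]
    out[1:] = frames[0] + np.cumsum(steps, axis=0)
    return out
```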

**Recommended Resource(s)**: **FS** or **AT**

**Target(s)/Questions:** 

a.) Begin by implementing vanilla MD, which should correspond to simulation in the NVE ensemble. Why?
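A minimal velocity-Verlet sketch with no thermostat, assuming NumPy, unit masses by default, reduced units, and the same truncated-and-shifted LJ model as the MC code. The $O(N^2)$ pair loop is deliberately simple (no neighbor lists), and `lj_forces`/`velocity_verlet` are hypothetical names. The recorded total energy is the quantity to monitor in part b.

```python
import numpy as np

def lj_forces(pos, box, rc):
    """Forces and potential energy for truncated-and-shifted LJ (reduced units, O(N^2))."""
    n = len(pos)
    rc2 = rc * rc
    inv6c = 1.0 / rc2**3
    u_shift = 4.0 * (inv6c * inv6c - inv6c)
    f = np.zeros_like(pos)
    u = 0.0
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]                 # r_j - r_i for all j > i
        d -= box * np.round(d / box)             # minimum image
        r2 = np.sum(d * d, axis=1)
        mask = r2 < rc2
        d, r2 = d[mask], r2[mask]
        inv6 = 1.0 / r2**3
        u += np.sum(4.0 * (inv6 * inv6 - inv6) - u_shift)
        # Force on j from i: (48 r^-14 - 24 r^-8) (r_j - r_i); Newton's third law for i
        fij = ((48.0 * inv6 * inv6 - 24.0 * inv6) / r2)[:, None] * d
        f[np.nonzero(mask)[0] + i + 1] += fij
        f[i] -= fij.sum(axis=0)
    return f, u

def velocity_verlet(pos, vel, box, rc, dt, n_steps, mass=1.0):
    """NVE integration; updates pos/vel in place and returns the total-energy history."""
    f, u = lj_forces(pos, box, rc)
    energies = []
    for _ in range(n_steps):
        vel += 0.5 * dt * f / mass               # half kick
        pos += dt * vel                          # drift
        pos %= box                               # keep coordinates in the cell
        f, u = lj_forces(pos, box, rc)
        vel += 0.5 * dt * f / mass               # second half kick with new forces
        energies.append(0.5 * mass * np.sum(vel * vel) + u)
    return pos, vel, np.array(energies)
```

When drawing initial velocities at random, subtracting their mean sets the total momentum to zero, which the dynamics then conserves.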

b.) What physically sets the timestep for numerical integration of the equations of motion? What practically determines its value? Demonstrate how the timestep affects the simulation by monitoring the total energy as a function of time for different timesteps.
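To put numbers on the energy trace, one option is to separate a linear drift from the fluctuations about it, as in the hypothetical helper below (assuming NumPy arrays of times and total energies). For a symplectic integrator such as velocity Verlet, the fluctuations are expected to shrink roughly as $\Delta t^2$ for small timesteps, while a timestep that is too large shows up as systematic drift or outright instability.

```python
import numpy as np

def energy_stats(times, energies):
    """Return (rms fluctuation about a linear fit, drift per unit time) of a total-energy series."""
    t = np.asarray(times, dtype=float)
    e = np.asarray(energies, dtype=float)
    slope, intercept = np.polyfit(t, e, 1)        # linear drift
    resid = e - (slope * t + intercept)           # fluctuations about the drift
    return np.sqrt(np.mean(resid**2)), slope
```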

c.) Running an NVE simulation, determine the average temperature of the system. Also, create probability distributions/histograms of the particle velocity/speed distributions. How does this compare to kinetic theory?
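Two helpers for this comparison, assuming reduced units and NumPy: the instantaneous kinetic temperature $T = 2K/(k_B N_f)$, with $N_f = 3N - 3$ degrees of freedom when the total momentum is fixed at zero, and the Maxwell-Boltzmann speed distribution $f(v) = 4\pi \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^2 e^{-m v^2/(2 k_B T)}$. The function names are hypothetical.

```python
import numpy as np

def kinetic_temperature(vel, mass=1.0, remove_com=True):
    """Instantaneous temperature from velocities (k_B = 1)."""
    n = len(vel)
    ke = 0.5 * mass * np.sum(vel * vel)
    dof = 3 * n - 3 if remove_com else 3 * n     # 3 constraints if total momentum is fixed
    return 2.0 * ke / dof

def maxwell_boltzmann_speed_pdf(v, T, mass=1.0):
    """Probability density of particle speeds at temperature T (k_B = 1)."""
    a = mass / (2.0 * np.pi * T)
    return 4.0 * np.pi * a**1.5 * v**2 * np.exp(-mass * v**2 / (2.0 * T))
```

One way to compare is to overlay `maxwell_boltzmann_speed_pdf` on `np.histogram(speeds, bins=50, density=True)`; each velocity component should likewise follow a Gaussian with variance $k_B T/m$.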

d.) Now modify your MD code to use a simple thermostat; I recommend implementing an Andersen thermostat. Confirm that your simulation maintains the target temperature. Demonstrate the statistical equivalence of MC and MD by comparing a set of results you obtained previously with MC to results obtained with MD at the same conditions.
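A sketch of the Andersen collision step, assuming it is applied once per timestep after the velocity-Verlet update: each particle, with probability $\nu\,\Delta t$ (collision frequency $\nu$), receives a fresh velocity drawn from the Maxwell-Boltzmann distribution at the target temperature. The name and the per-step placement are illustrative assumptions.

```python
import numpy as np

def andersen_collisions(vel, T_target, nu, dt, rng, mass=1.0):
    """Resample velocities of randomly chosen particles from Maxwell-Boltzmann at T_target.

    Modifies vel in place and returns the number of particles that 'collided'.
    """
    hit = rng.random(len(vel)) < nu * dt                # independent collision per particle
    sigma = np.sqrt(T_target / mass)                    # per-component std dev (k_B = 1)
    vel[hit] = rng.normal(0.0, sigma, size=(int(hit.sum()), 3))
    return int(hit.sum())
```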

e.) Estimate the self-diffusion coefficient of a Lennard-Jones particle in the liquid phase. You can do this calculation in two ways: *(1)* by computing the mean-squared displacement and *(2)* by integrating the velocity autocorrelation function. Compare the values from the two methods. For this analysis, you should collect statistics during an NVE simulation and perhaps periodically "reset" the temperature to a specific target. Why might it be problematic to compute these quantities in the presence of a thermostat?
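Possible sketches of both routes, assuming unwrapped positions and velocities saved every `dt_frame` as arrays of shape `(n_frames, N, 3)`, with averages taken over all time origins and particles. They use the Einstein relation $D = \lim_{t\to\infty} \frac{\langle |\mathbf{r}(t) - \mathbf{r}(0)|^2 \rangle}{6t}$ and the Green-Kubo relation $D = \frac{1}{3}\int_0^\infty \langle \mathbf{v}(t)\cdot\mathbf{v}(0)\rangle\,dt$. The MSD fit window (which should sit in the linear, diffusive regime) and the upper limit of the VACF integral are choices you have to make; all function names are hypothetical.

```python
import numpy as np

def msd(unwrapped):
    """Mean-squared displacement vs. lag, averaged over time origins and particles."""
    x = np.asarray(unwrapped, dtype=float)
    n = x.shape[0]
    out = np.zeros(n)
    for lag in range(1, n):
        d = x[lag:] - x[:-lag]                   # all origins separated by `lag` frames
        out[lag] = np.mean(np.sum(d * d, axis=2))
    return out

def vacf(vel):
    """Velocity autocorrelation <v(t+lag) . v(t)> vs. lag, averaged over origins and particles."""
    v = np.asarray(vel, dtype=float)
    n = v.shape[0]
    out = np.empty(n)
    for lag in range(n):
        out[lag] = np.mean(np.sum(v[lag:] * v[:n - lag], axis=2))
    return out

def diffusion_from_msd(msd_vals, dt_frame, fit_start, fit_end):
    """D = slope / 6 from a linear fit over frames [fit_start, fit_end)."""
    t = np.arange(len(msd_vals)) * dt_frame
    slope = np.polyfit(t[fit_start:fit_end], msd_vals[fit_start:fit_end], 1)[0]
    return slope / 6.0

def diffusion_from_vacf(c, dt_frame):
    """D = (1/3) * integral of the VACF (trapezoidal rule over the supplied lags)."""
    integral = dt_frame * (np.sum(c) - 0.5 * (c[0] + c[-1]))
    return integral / 3.0
```

The loop over lags costs $O(n_{\text{frames}}^2)$; FFT-based correlation is a common speedup for long trajectories.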

### Task 3. Simulation of a Lennard-Jones Fluid with an MD software

**Objective(s):** The goal of this exercise is to gain experience with how to work with more advanced MD software. You will essentially reproduce some subset of results from the prior tasks but using your choice of MD software as the engine to generate results.

**Some details/hints:** I would recommend using [LAMMPS](https://www.lammps.org/#gsc.tab=0) or maybe [HOOMD-blue](https://hoomd-blue.readthedocs.io/en/v3.5.0/). Think about what you have needed to implement yourself and let that guide what you search for in the documentation.

**Recommended Resource(s)**: Tutorials and documentation pages related to the software.

**Target(s)/Questions:** 

a.) Develop an input file to drive your simulation. Each command accomplishes something; annotate/comment on how each command is (or is not) reflected in the code that you wrote yourself.

b.) Demonstrate the equivalence of some result from the chosen software with your "in-house" code. How does the performance (e.g., timesteps per day) of your code compare with that of the chosen software?

## B. Fundamentals of Machine Learning

To be determined!