<a href="https://colab.research.google.com/github/wsamyono/BulldogTeamFacHackGW23/blob/main/JSEP2022Week3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Data Collections: Tuples, Lists, For Loops, and Dictionaries
A staggering degree of algorithmic complexity is possible using only variables, functions, and control flow concepts. However, thus far, numbers and strings are the only data types that have been discussed. Such data types can be used to represent protein sequences (a string) and molecular masses (a floating point number), but actual scientific data are seldom so simple! The data from a mass spectrometry experiment are a list of intensities at various m/z values (the mass spectrum). Optical microscopy experiments yield thousands of images, each consisting of a large two-dimensional array of pixels, and each pixel has color information that one may wish to access [68]. A protein multiple sequence alignment can be considered as a two-dimensional array of characters drawn from a 21-letter alphabet (one letter per amino acid (AA) and a gap symbol), and a protein 3D structural alignment is even more complex. Phylogenetic trees consist of sets of species, individual proteins, or other taxonomic entities, organized as (typically) binary trees with branch weights that represent some metric of evolutionary distance. A trajectory from an MD or Brownian dynamics simulation is especially dense: Cartesian coordinates and velocities are specified for upwards of $10^6$ atoms at $>10^6$ time-points (every ps in a μs-scale trajectory). As illustrated by these examples, real scientific data exhibit a level of complexity far beyond Python’s relatively simple built-in data types. Modern datasets are often quite heterogeneous, particularly in the biosciences [69], and therefore data abstraction and integration are often the major goals. The data challenges hold true at all levels, from individual RNA transcripts [70] to whole bacterial cells [71] to biomedical informatics [72].

In each of the above examples, the relevant data comprise a collection of entities, each of which, in turn, is of some simpler data type. This unifying principle offers a way forward. The term data structure refers to an object that stores data in a specifically organized (structured) manner, as defined by the programmer. Given an adequately well-specified/defined data structure, arbitrarily complex collections of data can be readily handled by Python, from a simple array of integers to a highly intricate, multi-dimensional, heterogeneous (mixed-type) data structure. Python offers several built-in sequence data structures, including strings, lists, and tuples.

**Tuples**
A tuple (pronounced like “couple”) is simply an ordered sequence of objects, with essentially no restrictions as to the types of the objects. Thus, the tuple is especially useful in building data structures as higher-order collections. Data that are inherently sequential (e.g., time-series data recorded by an instrument) are naturally expressed as a tuple, as illustrated by the following syntactic form: myTuple = (0,1,3). The tuple is surrounded by parentheses, and commas separate the individual elements. The empty tuple is denoted (), and a tuple of one element contains a comma after that element, e.g., (1,); the final comma lets Python distinguish between a tuple and a mathematical operation. That is, 2*(3+1) must not treat (3+1) as a tuple. A parenthesized expression is therefore not made into a tuple unless it contains commas. (The type function is a useful built-in function to probe an object’s type. At the Python interpreter, try the statements type((1)) and type((1,)). How do the results differ?)

A tuple can contain any sort of object, including another tuple. For example, diverseTuple = (15.38,"someString",(0,1)) contains a floating-point number, a string, and another tuple. This versatility makes tuples an effective means of representing complex or heterogeneous data structures. Note that any component of a tuple can be referenced using the same notation used to index individual characters within a string; e.g., diverseTuple[0] gives 15.38.
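The tuple syntax described above can be tried out with a short sketch such as the following. Apart from diverseTuple, the variable names are illustrative and do not come from the original text.

```python
# Parentheses alone do not make a tuple; the comma does.
notATuple = (1)      # a parenthesized expression
oneTuple = (1,)      # a tuple holding one element
emptyTuple = ()      # the empty tuple

print(type(notATuple))
print(type(oneTuple))
print(len(emptyTuple))      # 0

# A tuple may mix types and may contain other tuples.
diverseTuple = (15.38, "someString", (0, 1))
print(diverseTuple[0])      # 15.38
print(diverseTuple[2][1])   # 1, the second element of the inner tuple
print(len(diverseTuple))    # 3
```

Chaining two index operations, as in diverseTuple[2][1], first selects the inner tuple and then an element within it.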

In general, data are optimally stored, analyzed, modified, and otherwise processed using data structures that reflect any underlying structure of the data itself. Thus, for example, two-dimensional datasets are most naturally stored as tuples of tuples. This abstraction can be taken to arbitrary depth, making tuples useful for storing arbitrarily complex data. For instance, tuples have been used to create generic tensor-like objects. These rich data structures have been used in developing new tools for the analysis of MD trajectories [18] and to represent biological sequence information as hierarchical, multidimensional entities that are amenable to further processing in Python [20].

As a concrete example, consider the problem of representing signal intensity data collected over time. If the data are sampled with perfect periodicity, say every second, then the information could be stored (most compactly) in a one-dimensional tuple, as a simple succession of intensities; the index of an element in the tuple maps to a time-point (index 0 corresponds to the measurement at time t0, index 1 is at time t1, etc.). What if the data were sampled unevenly in time? Then each datum could be represented as an ordered pair, (t, I(t)), of the intensity I at each time-point t; the full time-series of measurements is then given by the sequence of 2-element tuples, like so:



In [None]:
dataSet = ((0.00,0.2),(0.17,0.3),(0.33,0.4),(0.40,0.2),(0.90,0.0))
print(dataSet[4]) # print the final ordered pair
print(dataSet[0][1]) # print the intensity at time step 0
print(dataSet[2][0]) # print time at time step 2

(0.9, 0.0)
0.2
0.33


Three notes concern the above code: (i) From this two-dimensional data structure, the syntax
dataSet[i][j] retrieves the jth element from the ith tuple. (ii) Negative indices can be
used as shorthand to index from the end of most collections (tuples, lists, etc.), as shown in
Fig 1; thus, in the above example dataSet[-1] represents the same value as dataSet[4].
(iii) Recall that Python treats all lines of code that belong to the same block (or degree of indentation)
as a single unit. A line that leaves a parenthesis unclosed is not a valid (closed) expression,
so Python allows the expression to continue on to the next line; a lengthy expression such as
dataSet can therefore be split across several lines to aid readability.

Once defined, a tuple cannot be altered; tuples are said to be immutable data structures.
This rigidity can be helpful or restrictive, depending on the context and intended purpose. For
instance, tuples are suitable for storing numerical constants, or for ordered collections that are
generated once during execution and intended only for referencing thereafter (e.g., an input
stream of raw data).
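Negative indexing and immutability can both be checked directly. The following is a minimal sketch that reuses the dataSet tuple from above; the names mutated and extended are illustrative.

```python
dataSet = ((0.00, 0.2), (0.17, 0.3), (0.33, 0.4), (0.40, 0.2), (0.90, 0.0))

# Negative indices count from the end of the sequence.
print(dataSet[-1] == dataSet[4])   # True

# Assigning to an element of a tuple raises a TypeError.
try:
    dataSet[0] = (0.00, 0.5)
    mutated = True
except TypeError as err:
    mutated = False
    print("tuples are immutable:", err)

# To "change" a tuple, build a new one from the old contents.
extended = dataSet + ((1.20, 0.1),)
print(len(dataSet), len(extended))   # 5 6
```

Note that the original tuple is untouched by the concatenation; extended is a separate, new object.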

**Lists**

A mutable data structure is the Python list. This built-in sequence type allows for the addition,
removal, and modification of elements. The syntactic form used to define lists resembles the
definition of a tuple, except that the parentheses are replaced with square brackets, e.g.
myList = [0, 1, 42, 78]. (A trailing comma is unnecessary in one-element lists, as [1]
is unambiguously a list.) As suggested by the preceding line, the elements in a Python list are
typically more homogeneous than might be found in a tuple: The statement myList2 =
['a',1], which defines a list containing both string and numeric types, is technically valid,
but myList2 = ['a','b'] or myList2 = [0, 1] would be more frequently encountered
in practice. Note that myList[1] = 3.14 is a perfectly valid statement that can be
applied to the already-defined object named myList (as long as myList already contains
two or more elements), resulting in the modification of the second element in the list. Finally,
note that myList[5] = 3.14 will raise an error, as the list defined above does not contain a
sixth element. The index is said to be out of range, and a valid approach would be to append
the value via myList.append(3.14).

The foregoing description only scratches the surface of Python’s built-in data structures. Several
functions and methods are available for lists, tuples, strings, and other built-in types. For
lists, append, insert, and remove are examples of oft-used methods; the function len()
returns the number of items in a sequence or collection, such as the length of a string or number
of elements in a list. All of these “list methods” behave like any other function: arguments
are generally provided as input, some processing occurs, and values may be returned.
(The OOP section, below, elaborates the relationship between functions and methods.)
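The list operations mentioned above can be sketched as follows, starting from the same myList; the specific values inserted and removed are arbitrary choices for illustration.

```python
myList = [0, 1, 42, 78]

myList[1] = 3.14          # modify the second element in place
myList.append(2.72)       # add an element to the end
myList.insert(0, -1)      # insert -1 before index 0
myList.remove(42)         # remove the first occurrence of 42
print(myList)             # [-1, 0, 3.14, 78, 2.72]
print(len(myList))        # 5

# Assigning past the end of a list raises an IndexError.
try:
    myList[10] = 3.14
except IndexError as err:
    print("index out of range:", err)
```

Each of these methods modifies the list in place, which is exactly what a tuple does not permit.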

**Iteration with For Loops**

Lists and tuples are examples of iterable types in Python, and the for loop is a useful construct
in handling such objects. (Custom iterable types are introduced in Supplemental Chapter 17 in
S1 Text.) A Python for loop iterates over a collection, which is a common operation in virtually
all data-analysis workflows. Recall that a while loop requires a counter to track progress
through the iteration, and this counter is tested against the continuation condition. In contrast,
a for loop handles the count implicitly, given an argument that is an iterable object:

In [None]:
myData = [1.414, 2.718, 3.142, 4.669]
total = 0
for datum in myData:
  # the next statement uses a compound assignment operator; in
  # the addition assignment operator, a += b means a = a + b
  total += datum
  print("added " + str(datum) + " to sum.")
# str makes a string from datum so we can concatenate with +.
print(total)


added 1.414 to sum.
added 2.718 to sum.
added 3.142 to sum.
added 4.669 to sum.
11.942999999999998


In the above loop, all elements in myData are of the same type (namely, floating-point numbers).
This is not mandatory. For instance, the heterogeneous object myData =
['a','b',1,2] is iterable, and therefore it is a valid argument to a for loop (though not
the above loop, as string and integer types cannot be mixed as operands to the + operator). The
context dependence of the + symbol, meaning either numeric addition or a concatenation operator, depending on the arguments, is an example of operator overloading. (Together with
dynamic typing, operator overloading helps make Python a highly expressive programming
language.) In each iteration of the above loop, the variable datum is assigned each successive
element in myData; specifying this iterative task as a while loop is possible, but less straightforward.
Finally, note the syntactic difference between Python’s for loops and the for
(<initialize>; <condition>; <update>) {<body>} construct that is found in
C, Perl, and other languages encountered in computational biology.
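For comparison, the same summation can be written with both loop types. This is a sketch only; the names forTotal, whileTotal, and index are illustrative.

```python
myData = [1.414, 2.718, 3.142, 4.669]

# for loop: the iteration is handled implicitly
forTotal = 0
for datum in myData:
    forTotal += datum

# while loop: the counter must be created, tested, and updated by hand
whileTotal = 0
index = 0
while index < len(myData):
    whileTotal += myData[index]
    index += 1

print(forTotal == whileTotal)   # True

# Operator overloading: + adds numbers but concatenates strings.
print(1 + 2)        # 3
print("a" + "b")    # ab
```

The while version needs three extra pieces of bookkeeping (initialize, test, update), each of which is an opportunity for an off-by-one error.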

**Exercise 7:** Consider the fermentation of glucose into ethanol: $\mathrm{C_6H_{12}O_6 \rightarrow 2\,C_2H_5OH + 2\,CO_2}$.
A fermentor is initially charged with 10,000 liters of feed solution and the rate of carbon
dioxide production is measured by a sensor in moles/hour. At t = 10, 20, 30, 40, 50, 60, 70, and
80 hours, the CO2 generation rates are 58.2, 65.2, 67.8, 65.4, 58.8, 49.6, 39.1, and 15.8 moles/
hour respectively. Assuming that each reading represents the average CO2 production rate
over the previous ten hours, calculate the total amount of CO2 generated and the final ethanol
concentration in grams per liter. Note that Supplemental Chapters 6 and 9 might be useful
here.


In [None]:
# Write the code for Exercise 7 below
#


**Exercise 8:** Write a program to compute the distance, d(r1, r2), between two arbitrary (user-specified)
points, r1 = (x1, y1, z1) and r2 = (x2, y2, z2), in 3D space. Use the usual Euclidean distance
between two points—the straight-line, “as the bird flies” distance. Other distance metrics,
such as the Mahalanobis and Manhattan distances, often appear in computational biology too.
With your code in hand, note the ease with which you can adjust your entire data-analysis
workflow simply by modifying a few lines of code that correspond to the definition of the distance
function. As a bonus exercise, generalize your code to read in a list of points and compute
the total path length. Supplemental Chapters 6, 7, and 9 might be useful here.

In [None]:
# Write the code for Exercise 8 below
#


**Sets and Dictionaries**

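Whereas lists, tuples, and strings are sequential data types whose elements are indexed by position,
Python’s sets and dictionaries are not. (A set is unordered; since Python 3.7 a dictionary preserves
insertion order, but its items are still retrieved by key rather than by position.) Dictionaries, also
known as associative arrays or hashes in Perl and other common languages, consist of key:value pairs
enclosed in braces. They are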
particularly useful data structures because, unlike lists and tuples, the values are not restricted
to being indexed solely by the integers corresponding to sequential position in the data series.
Rather, the keys in a dictionary serve as the index, and they can be of any immutable data type
(strings, numbers, or tuples of immutable data). A simple example, indexing on three-letter
abbreviations for amino acids and including molar masses, would be aminoAcids =
{'ala':('a','alanine', 89.1),'cys':('c','cysteine', 121.2)}. A dictionary’s
items are accessed via square brackets, as for a tuple or list; e.g., aminoAcids['ala']
would retrieve the tuple ('a','alanine', 89.1). As another
example, dictionaries can be used to create lookup tables for the properties of a collection of
closely related proteins. Each key could be set to a unique identifier for each protein, such as its
UniProt ID (e.g., Q8ZYG5), and the corresponding values could be an intricate tuple data structure
that contains the protein’s isoelectric point, molecular weight, PDB accession code (if a
structure exists), and so on. Dictionaries are described in greater detail in Supplemental Chapter
10 in the S1 Text.
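A few common dictionary operations are sketched below using the aminoAcids example. The glycine entry is an addition made here for illustration and is not part of the original example.

```python
aminoAcids = {
    'ala': ('a', 'alanine', 89.1),
    'cys': ('c', 'cysteine', 121.2),
}

# Look up a value by its key, then index into the stored tuple.
print(aminoAcids['ala'])        # ('a', 'alanine', 89.1)
print(aminoAcids['cys'][2])     # 121.2

# Dictionaries are mutable: assigning to a new key adds a pair.
aminoAcids['gly'] = ('g', 'glycine', 75.1)

# Test for membership before indexing to avoid a KeyError.
print('trp' in aminoAcids)      # False

# Iterate over the key:value pairs, unpacking each stored tuple.
for abbrev, (letter, name, mass) in aminoAcids.items():
    print(abbrev, letter, name, mass)
```

Because the values here are tuples, each record is fixed once stored, while the dictionary itself can still grow or shrink.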

**Further Data Structures: Trees and Beyond**

Python’s built-in data structures are made for sequential data, and using them for other purposes
can quickly become awkward. Consider the task of representing genealogy: an individual
may have some number of children, and each child may have their own children, and so on.
There is no straightforward way to represent this type of information as a flat list or tuple. A better approach would be to represent each organism as a tuple containing its children. Each of those
elements would, in turn, be another tuple with children, and so on. A specific organism would
be a node in this data structure, with a branch leading to each of its child nodes; an organism
having no children is effectively a leaf. A node that is not the child of any other node would be
the root of this tree. This intuitive description corresponds, in fact, to exactly the terminology
used by computer scientists in describing trees [73]. Trees are pervasive in computer science.
This document, for example, could be represented purely as a list of characters, but doing so
neglects its underlying structure, which is that of a tree (sections, sub-sections, sub-sub-sections,
. . .). The whole document is the root entity, each section is a node on a branch, each subsection
a branch from a section, and so on down through the paragraphs, sentences, words,
and letters. A common and intuitive use of trees in bioinformatics is to represent phylogenetic
relationships. However, trees are such a general data structure that they also find use, for
instance, in computational geometry applications to biomolecules (e.g., to optimally partition
data along different spatial dimensions [74,75]).

Trees are, by definition, (i) acyclic, meaning that following a branch from node i will never
lead back to node i, and any node has exactly one parent; and (ii) directed, meaning that a node
knows only about the nodes “below” it, not the ones “above” it. Relaxing these requirements
gives a graph [76], which is an even more fundamental and universal data structure: A graph is
a set of vertices that are connected by edges. Graphs can be subtle to work with and a number
of clever algorithms are available to analyze them [77].

There are countless data structures available, and more are constantly being devised.
Advanced examples range from the biologically-inspired neural network, which is essentially a
graph wherein the vertices are linked into communication networks to emulate the neuronal
layers in a brain [78], to very compact probabilistic data structures such as the Bloom filter
[79], to self-balancing trees [80] that provide extremely fast insertion and removal of elements
for performance-critical code, to copy-on-write B-trees that organize terabytes of information
on hard drives [81].
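The genealogy tree described earlier in this section can be sketched with nested tuples. As an assumption made here for readability, each node is stored as a (name, children) pair rather than as a bare tuple of children; the names and the two helper functions are hypothetical.

```python
# Each node is a (name, children) pair; children is a tuple of nodes.
familyTree = ("root", (
    ("childA", (
        ("grandchildA1", ()),
        ("grandchildA2", ()),
    )),
    ("childB", ()),
))

def countLeaves(node):
    """Count the nodes at or below this node that have no children."""
    name, children = node
    if len(children) == 0:
        return 1                      # a leaf counts itself
    total = 0
    for child in children:
        total += countLeaves(child)   # recurse into each branch
    return total

def depth(node):
    """Return the number of levels in the tree rooted at this node."""
    name, children = node
    if len(children) == 0:
        return 1
    return 1 + max(depth(child) for child in children)

print(countLeaves(familyTree))   # 3
print(depth(familyTree))         # 3
```

Both functions call themselves on each child, which mirrors the recursive definition of the tree itself.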


## Object-Oriented Programming in a Nutshell: Classes, Objects, Methods, and All That

### OOP in Theory: Some Basic Principles
Computer programs are characterized by two essential features [82]: (i) algorithms or, loosely,
the “programming logic,” and (ii) data structures, or how data are represented within the program,
whether certain components are manipulable, iterable, etc. The object-oriented programming
(OOP) paradigm, to which Python is particularly well-suited, treats these two
features of a program as inseparable. Several thorough treatments of OOP are available, including
texts that are independent of any language [83] and books that specifically focus on OOP
in Python [84]. The core ideas are explored in this section and in Supplemental Chapters 15
and 16 in S1 Text.

Most scientific data have some form of inherent structure, and this serves as a starting point
in understanding OOP. For instance, the time-series example mentioned above is structured as
a series of ordered pairs, (t, I(t)), an X-ray diffraction pattern consists of a collection of intensities
that are indexed by integer triples (h, k, l), and so on. In general, the intrinsic structure of
scientific data cannot be easily or efficiently described using one of Python’s standard data
structures because those types (strings, lists, etc.) are far too simple and limited. Consider, for
instance, the task of representing a protein 3D structure, where “representing” means storing
all the information that one may wish to access and manipulate: AA sequence (residue types
and numbers), the atoms comprising each residue, the spatial coordinates of each atom, whether a cysteine residue is disulfide-bonded or not, the protein’s function, the year the protein
was discovered, a list of orthologs of known structure, and so on. What data structure
might be capable of most naturally representing such an entity? A simple (generic) Python
tuple or list is clearly insufficient.

For this problem, one could try to represent the protein as a single tuple, where the first element
is a list of the sequence of residues, the second element is a string describing the protein’s
function, the third element lists orthologs, etc. Somewhere within this top-level tuple, the coordinates
of the Cα atom of Alanine-42 might be represented as [x,y,z], which is a simple list of
length three. (The list is “simple” in the sense that its rank is one; the rank of a tuple or list is,
loosely, the number of dimensions spanned by its rows, and in this case we have but one row.)
In other words, our overall data-representation problem can be hierarchically decomposed into
simpler sub-problems that are amenable to representation via Python’s built-in types. While
valid, such a data structure will be difficult to use: The programmer will have to recall multiple
arbitrary numbers (list and sub-list indices) in order to access anything, and extensions to this
approach will only make it clumsier. Additionally, there are many functions that are meaningful
only in the context of proteins, not all tuples. For example, we may need to compute the solvent-
accessible surface areas of all residues in all β-strands for a list of proteins, but this operation
would be nonsensical for a list of Supreme Court cases. Conversely, not all tuple methods
would be relevant to this protein data structure, yet a function to find Court cases that reached a
5-4 decision along party lines would accept the protein as an argument. In other words, the
tuple mentioned above has no clean way to make the necessary associations. It’s just a tuple.
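To make the awkwardness concrete, here is a minimal sketch of such a tuple-based protein. Every value in it (residue names, coordinates, function string, ortholog labels) is hypothetical and chosen only for illustration.

```python
# Position 0: residues, position 1: function, position 2: orthologs.
proteinTuple = (
    [("ALA", 42, [1.0, 2.5, -0.3]),    # (residue type, number, CA coordinates)
     ("CYS", 43, [2.1, 3.0, 0.4])],
    "hypothetical hydrolase",
    ["orthologA", "orthologB"],
)

# Retrieving the x-coordinate of the first residue's CA atom requires
# remembering four arbitrary index positions.
caX = proteinTuple[0][0][2][0]
print(caX)   # 1.0

# Nothing stops a generic function from being applied to the protein,
# even when the result has no scientific meaning.
print(len(proteinTuple))   # 3
```

The expression proteinTuple[0][0][2][0] works, but nothing in it tells the reader what is being retrieved.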

### OOP Terminology
This protein representation problem is elegantly solved via the OOP concepts of classes,
objects, and methods. Briefly, an object is an instance of a data structure that contains members
and methods. Members are data of potentially any type, including other objects. Unlike lists
and tuples, where the elements are indexed by numbers starting from zero, the members of an
object are given names, such as yearDiscovered. Methods are functions that (typically)
make use of the members of the object. Methods perform operations that are related to the
data in the object’s members. Objects are constructed from class definitions, which are blocks
that define what most of the methods will be for an object. The examples in the 'OOP in Practice'
section will help clarify this terminology. (Note that some languages require that all methods
and members be specified in the class declaration, but Python allows duck punching (also known
as monkey patching), or adding members after declaring a class. Adding methods later is possible too, but uncommon.
Some built-in types, such as int, do not support duck punching.)

During execution of an actual program, a specific object is created by calling the name of
the class, as one would do for a function. The interpreter will set aside some memory for the
object’s methods and members, and then call a method named __init__, which initializes
the object for use.

Classes can be created from previously defined classes. In such cases, all properties of the
parent class are said to be inherited by the child class. The child class is termed a derived class,
while the parent is described as a base class. For instance, a user-defined Biopolymer class
may have derived classes named Protein and NucleicAcid, and may itself be derived
from a more general Molecule base class. Class names often begin with a capital letter, while
object names (i.e., variables) often start with a lowercase letter. Within a class definition, a leading
underscore marks member names that are protected by convention (Python itself does not enforce
this restriction). Working examples and annotated
descriptions of these concepts can be found, in the context of protein structural analysis,
in ref [85].
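The class hierarchy described above might be sketched as follows. The members, methods, and the example name and sequence are assumptions made for illustration; only the four class names come from the text.

```python
class Molecule:
    def __init__(self, name):
        self.name = name                 # a member common to all molecules

    def describe(self):
        return self.name + " is a " + type(self).__name__

class Biopolymer(Molecule):              # derived from Molecule
    def __init__(self, name, sequence):
        Molecule.__init__(self, name)    # initialize the inherited part
        self.sequence = sequence

    def length(self):
        return len(self.sequence)

class Protein(Biopolymer):               # derived from Biopolymer
    pass

class NucleicAcid(Biopolymer):           # derived from Biopolymer
    pass

# Calling the class name creates an object and runs __init__.
myProtein = Protein("exampleProtein", "MKVLA")
print(myProtein.describe())                 # exampleProtein is a Protein
print(myProtein.length())                   # 5
print(isinstance(myProtein, Molecule))      # True
print(isinstance(myProtein, NucleicAcid))   # False
```

Protein defines nothing of its own here, yet it inherits describe() from Molecule and length() from Biopolymer.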

The OOP paradigm suffuses the Python language: Every value is an object. For example, the
statement foo = 'bar' instantiates a new object (of type str) and binds the name foo to
that object. All built-in string methods will be exposed for that object (e.g., foo.upper()
returns 'BAR'). Python’s built-in dir() function can be used to list all attributes and methods
of an object, so dir(foo) will list all available attributes and valid methods on the variable
foo. The statement dir(1) will show all the methods and members of an int (there
are many!). This example also illustrates the conventional OOP dot-notation, object.attribute,
which is used to access an object’s members, and to invoke its methods (Fig 1,
left). For instance, protein1.residues[2].CA.x might give the x-coordinate of the Cα
atom of the third residue in protein1 as a floating-point number, and
protein1.residues[5].ssbond(protein2.residues[6]) might be used to define a disulfide bond
(the ssbond() method) between residue-6 of protein1 and residue-7 of protein2. In
this example, the residues member is a list or tuple of objects, and an item is retrieved from
the collection using an index in brackets.
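The built-in behavior described above can be explored with a few statements. This sketch uses only built-in types, so nothing beyond a standard Python 3 interpreter is assumed.

```python
foo = 'bar'                     # foo names an object of type str
print(type(foo))                # <class 'str'>
print(foo.upper())              # BAR

# dir() returns a list of attribute and method names as strings.
print('upper' in dir(foo))      # True
print(len(dir(1)) > 0)          # True; an int has many attributes

# Even an integer literal is an object with methods.
print((255).bit_length())       # 8
```

The parentheses around 255 are needed only so that the dot is not read as a decimal point.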

### Benefits of OOP
By effectively compartmentalizing the programming logic and implicitly requiring a disciplined
approach to data structures, the OOP paradigm offers several benefits. Chief among
these are (i) clean data/code separation and bundling (i.e., modularization), (ii) code reusability,
(iii) greater extensibility (derived classes can be created as needs become more specialized),
and (iv) encapsulation into classes/objects provides a clearer interface for other programmers
and users. Indeed, a generally good practice is to discourage end-users from directly accessing
and modifying all of the members of an object. Instead, one can expose a limited and clean
interface to the user, while the back-end functionality (which defines the class) remains safely
under the control of the class’ author. As an example, custom getter and setter methods can be
specified in the class definition itself, and these methods can be called in another user’s code in
order to enable the safe and controlled access/modification of the object’s members. A setter
can ‘sanity-check’ its input to verify that the values do not send the object into a nonsensical or
broken state; e.g., specifying the string "ham" as the x-coordinate of an atom could be caught
before program execution continues with a corrupted object. By forcing alterations and other
interactions with an object to occur via a limited number of well-defined getters/setters, one
can ensure that the integrity of the object’s data structure is preserved for downstream usage.

The OOP paradigm also solves the aforementioned problem wherein a protein implemented
as a tuple had no good way to be associated with the appropriate functions—we could call
Python’s built-in max() on a protein, which would be meaningless, or we could try to compute
the isoelectric point of an arbitrary list (of Supreme Court cases), which would be similarly nonsensical.
Using classes sidesteps these problems. If our Protein class does not define a max()
method, then no attempt can be made to calculate its maximum. If it does define an isoelectricPoint()
method, then that method can be applied only to an object of type Protein.
For users/programmers, this is invaluable: If a class from a library has a particular
method, one can be assured that that method will work with objects of that class.
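The sanity-checking setter described above could look something like the following sketch. The class name CheckedAtom and its methods are hypothetical, and raising a TypeError is just one reasonable way to reject bad input.

```python
class CheckedAtom:
    def __init__(self):
        self._x = 0.0      # leading underscore: not meant for direct access

    def getX(self):
        return self._x

    def setX(self, newX):
        # Reject non-numeric input before it can corrupt the object.
        if not isinstance(newX, (int, float)):
            raise TypeError("x-coordinate must be a number, got " + repr(newX))
        self._x = float(newX)

atom = CheckedAtom()
atom.setX(1.5)
print(atom.getX())        # 1.5

try:
    atom.setX("ham")      # nonsensical input is caught by the setter
    rejected = False
except TypeError as err:
    rejected = True
    print("rejected:", err)

print(atom.getX())        # still 1.5; the object was not corrupted
```

Because the check runs before the assignment, a failed call leaves the previous value in place.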

### OOP in Practice: Some Examples
A classic example of a data structure that is naturally implemented via OOP is the creation of a
Human class. Each Human object can be fully characterized by her respective properties (members
such as height, weight, etc.) and functionality (methods such as breathing, eating, speaking,
etc.). A specific human being, e.g. guidoVanRossum, is an instance of the Human class; this
class may, itself, be a subclass of a Hominidae base class. The following code illustrates how one might define a Human class, including some functionality to age the Human and to set/get
various members (descriptors such as height, age, etc.):

In [None]:
from random import randint

class Human():
  _age = 0
  _height = 0
  _sex = ""

  def __init__(self, theSex):
    self.setSex(theSex)
  def haveBirthday(self):
    self._age += 1
    # randint(a, b) returns a random integer from a to b, inclusive
    if (self._age < 21):
      self._height += randint(0, 15)
    if (self._age > 60):
      self._height -= randint(0, 5)
  def setAge(self, theAge):
    assert(theAge >= 0)
    self._age = theAge
  def setHeight(self, theHeight):
    self._height = theHeight
  def setSex(self, theSex):
    assert(theSex == "male" or theSex == "female") # Validate input.
    self._sex = theSex
  def getAge(self):
    return self._age
  def getHeight(self):
    return self._height
  def getSex(self):
    return self._sex
# now use the class by instantiating an object and manipulating it:
guido = Human("male") # this calls __init__, which sets guido's sex to "male"
print(guido.getSex())
print(guido.getAge()) # see the default value of age
guido.setAge(21)      # set guido's age.
guido.haveBirthday()  # it's guido's birthday.
print(guido.getAge()) # verify that age has advanced by one unit.


male
0
22


Note the usage of self as the first argument in each method defined in the above code.
The self parameter is necessary because when a method is invoked it must know which object
to use. That is, an object instantiated from a class requires that methods on that object have
some way to reference that particular instance of the class, versus other potential instances of
that class. The name self, which is a strong convention rather than a reserved keyword, provides
such a “hook” to reference the specific object for which
a method is called. Every method invocation for a given object, including even the initializer
called __init__, must pass itself (the current instance) as the first argument to the method;
this subtlety is further described at [86] and [87]. A practical way to view the effect of self is
that any occurrence of objName.methodName(arg1, arg2) effectively becomes
ClassName.methodName(objName, arg1, arg2), where ClassName is the class of objName. This is one key deviation from the behavior of
top-level functions, which exist outside of any class. When defining methods, usage of self
provides an explicit way for the object itself to be provided as an argument (self-reference), and
its disciplined usage will help minimize confusion about expected arguments.
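The role of self can be seen by calling the same method in two ways: through the object, and through the class with the object passed explicitly. The Counter class below is a hypothetical example written for this purpose.

```python
class Counter:
    def __init__(self):
        self.count = 0

    def add(self, amount):
        self.count += amount    # self is the object the method was called on
        return self.count

first = Counter()
second = Counter()

first.add(5)                # the usual dot-notation call
Counter.add(first, 2)       # the same call, with the object passed explicitly
second.add(1)               # a different instance keeps its own count

print(first.count)          # 7
print(second.count)         # 1
```

The two instances do not interfere with each other because each call receives its own object as self.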

To illustrate how objects may interact with one another, consider a class to represent a
chemical’s atom:

In [None]:
class Atom:
  (x,y,z) = (0,0,0) # compactly set all three vars simultaneously (rather
                    # than on three separate lines, as x=0, y=0, z=0)
  name = "X"
  def distanceFrom(self,other):
    return ((self.x - other.x)**2 +
            (self.y - other.y)**2 +
            (self.z - other.z)**2)**0.5
  def setName(self,newName):
    assert(newName in ("C", "H", "O", "N", "S"))
    self.name = newName
  def moveTo(self, newX, newY, newZ):
    self.x = newX
    self.y = newY
    self.z = newZ

Then, we can use this Atom class in constructing another class to represent molecules:

In [None]:
class Molecule:
  def __init__(self):
    self.atoms = []  # created per object; lists defined at class level
    self.bonds = []  # would be shared by every Molecule instance
  def addAtom(self, newAtom):
    self.atoms.append(newAtom)
  def makeBond(self, a1, a2):
    self.bonds.append((a1, a2))
  def avgBondLength(self):
    totLength = 0
    for firstAtom, secondAtom in self.bonds:  # unpack each bonded pair
      totLength += firstAtom.distanceFrom(secondAtom)
    return totLength / len(self.bonds)


And, finally, the following code illustrates the construction of a diatomic molecule:


In [None]:
a1 = Atom(); a2 = Atom()   # instantiate two atoms
a2.moveTo(1,1,0)           # move the second atom to (1, 1, 0)
m = Molecule()             # instantiate a new molecule
m.addAtom(a1)              # .... and populate it with
m.addAtom(a2)              # these two atoms
m.makeBond(a1, a2)         # define a bond between the two atoms
print(m.avgBondLength())

1.4142135623730951


If the above code is run, for example, in an interactive Python session, then note that the
aforementioned dir() function is an especially useful built-in tool for querying the properties
of new classes and objects. For instance, issuing the statement dir(Molecule) will return
detailed information about the Molecule class (including its available methods).

**Exercise 9:** Amino acids can be effectively represented via OOP because each AA has a well-defined
chemical composition: a specific number of atoms of various element types (carbon,
nitrogen, etc.) and a covalent bond connectivity that adheres to a specific pattern. For these reasons,
the prototype of an L-amino acid can be unambiguously defined by the SMILES [88] string
$N[C@@H](R)C(=O)O$, where $R$ denotes the side-chain and $@@$ indicates the L enantiomer. In addition to chemical structure, each AA also features specific physicochemical properties
(molar mass, isoelectric point, optical activity/specific rotation, etc.). In this exercise, create an
AA class and use it to define any two of the twenty standard AAs, in terms of their chemical
composition and unique physical properties. To extend this exercise, consider expanding your
AA class to include additional class members (e.g., the frequency of occurrence of that AA type)
and methods (e.g., the possibility of applying post-translational modifications). To see the utility
of this exercise in a broader OOP schema, see the discussion of the hierarchical Structure $\supset$
Model $\supset$ Chain $\supset$ Residue $\supset$ Atom (SMCRA) design used in ref [85] to create classes that can
represent entire protein assemblies.

In [None]:
# Write the code for Exercise 9 below.
#
