#  <font color = 545AA7> Introduction to Jupyter and Python </font>

Before we can begin analyzing protein PDB files, we need to cover some Python and Jupyter notebook basics. This notebook will not make you an expert programmer, but it will give you a quick foundation in the skills you will need for the subsequent notebooks. The goals of this notebook are to:
- familiarize everyone with running a Jupyter notebook
- introduce the basic Python we will use later in this activity, including basic math, variables, functions, and basic plotting

##  <font color = 545AA7> 1. Jupyter Notebooks </font>

A Jupyter notebook is a shareable, interactive electronic document that contains two main types of cells: code and markdown. The **code cells** contain live code that can be executed directly inside the Jupyter notebook, with any output appearing directly below the code cell. **Markdown cells** can contain text, equations, and images to provide background and instructions to the user.

To provide rich content in the markdown cells, equations can be formatted using LaTeX-like syntax (example below), and text can be formatted using either the lightweight [markdown language](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) or HTML.

$$ E = E^\circ - \frac{RT}{nF}\ln Q $$

## <font color = 545AA7> 2. Basic Python </font>

We are now going to cover a quick primer on some very basic Python used in the following activities. Ultimately, computer programming is an application of mathematics, so we need to be able to perform mathematical operations. Common operations are listed below.

| Operator   | Operation    |
|:-------------:|:---------------:|
|     +       | Addition      |
|     -       | Subtraction      |
|     *       | Multiplication      |
|     /       | Division        |
|     **       | Exponentiation      |


Parentheses may also be used to alter or clarify the order of operations.
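
As a quick illustration of the operators in the table, the short sketch below evaluates a few expressions. The numbers are arbitrary and chosen only for demonstration, and the variable names are placeholders.

```python
# Arbitrary example values, chosen only to illustrate each operator
total = 8 + 2          # addition
difference = 8 - 2     # subtraction
product = 8 * 2        # multiplication
quotient = 8 / 2       # division; the result is a float in Python 3
power = 8 ** 2         # exponentiation

# Parentheses change the order in which operations are carried out
without_parens = 2 + 3 * 4     # multiplication happens first
with_parens = (2 + 3) * 4      # addition happens first

print(total, difference, product, quotient, power)
print(without_parens, with_parens)
```

Note that `/` returns a decimal (floating-point) value even when the division is exact.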

### <font color = 545AA7> Variables </font>


Often you will want to continue using the output of a previous calculation. While you certainly could retype the number in a later cell, it is more efficient to attach the value to a variable. You can create your own variable names, but you should strive to use intuitive names to make your code easier to read and follow.

To attach a number to a variable, use the single `=` symbol. Once a value is attached to a variable, the variable can be used in a calculation in lieu of the number itself.

In [None]:
mass = 121.32
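
A minimal sketch of how a variable can stand in for its value is shown here. The `mass` value is taken from the cell above (the text gives no units for it); the names `doubled` and `half` are hypothetical and exist only for this illustration.

```python
mass = 121.32          # attach the value to the variable name mass

# The variable can now be used wherever the number would be typed
doubled = mass * 2
half = mass / 2

print(doubled)
print(half)
```

If `mass` is later assigned a different value, rerunning the calculations picks up the new value without any retyping.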

### <font color = 545AA7> Functions </font>

Python allows the use of functions provided natively with every Python installation. Additional functions can be provided in external libraries such as those listed below.

The general structure of a function call is shown below, where $func$ is the function name and any input is placed inside the parentheses.

$$ func() $$

For example, `abs()` is the absolute value function that comes with Python.

Python also includes a series of **modules** containing more functions, and a list of these modules can be found at [https://docs.python.org/3/py-modindex.html](https://docs.python.org/3/py-modindex.html). Before these modules can be used, they must be **imported**, which is how Python loads them into memory. The general format is `import <module>`.

Once a module has been imported, any function in that module may be executed using the format `module.func()`.
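
The sketch below contrasts a built-in function with functions from an imported module. It uses the standard `math` module as an example; the input values and variable names are arbitrary choices for illustration.

```python
import math  # load the math module from the standard library

# Built-in function: available without any import
magnitude = abs(-4.2)

# Functions from an imported module use the module.func() format
root = math.sqrt(16)
angle_deg = math.degrees(math.pi)   # convert pi radians to degrees

print(magnitude, root, angle_deg)
```

Calling `math.sqrt()` before running `import math` raises a `NameError`, which is a common first error when working with modules.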

## <font color = 545AA7> 3. Using for Loops </font>

It is often necessary to perform the same operation on every item in a list or series of data. In Python, we typically use a construct called a `for` loop. Let's say, for example, that we have a list of peptide bond angles in radians, and we want to convert them to degrees. There are 2$\pi$ radians in a full circle (i.e., 360$^\circ$), so

$$ degrees = radians \times \frac{180}{\pi} $$

The `for` loop effectively grabs each number from the list one at a time and performs any operation indented (1 tab or 4 spaces) below.

`for value in list:
    value / 10`

In [None]:
radians = [1.91, 1.89, 1.88, 1.92, 1.92, 1.91, 1.85, 1.91]
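
One possible way to carry out the conversion described above is sketched below. It assumes the `radians` list from the preceding cell and collects the results in a new list called `degrees` using `append()`, a list method not otherwise covered in this notebook; other approaches would work just as well.

```python
import math

radians = [1.91, 1.89, 1.88, 1.92, 1.92, 1.91, 1.85, 1.91]

degrees = []  # empty list to collect the converted values
for value in radians:
    # degrees = radians * 180 / pi
    degrees.append(value * 180 / math.pi)

print(degrees)
```

Everything indented under the `for` line runs once per item, while the unindented `print()` runs a single time after the loop finishes.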

## <font color = 545AA7> 4. External Libraries </font>

While Python comes with an impressive collection of modules, users often want to complete tasks that are not covered by the native Python modules. For this, users can import external **libraries**. Common Python scientific libraries are listed below with brief descriptions.

Libraries can contain submodules, which are collections of functions or data with a similar theme or purpose. Submodules of the SciPy library are listed below as an example.

- [**SciPy:**](https://www.scipy.org/scipylib/index.html) includes common functions for scientific data processing tasks such as signal processing, interpolation, and optimization
    - signal: signal processing tools
    - fft: fast Fourier transform tools
    - optimize: optimization tools
    - integrate: integration functions
    - stats: statistics functions
    - constants: collection of scientific constants
- [**NumPy:**](https://numpy.org/) basic library for handling larger amounts of data; includes additional mathematical functions
- [**Pandas:**](https://pandas.pydata.org/) more advanced library for handling data
- [**Matplotlib:**](https://matplotlib.org/) standard data plotting and visualization library
- [**Seaborn:**](https://seaborn.pydata.org/) more advanced data plotting and visualization library
- [**SymPy:**](https://www.sympy.org/en/index.html) symbolic mathematics library 
- [**Biopython:**](https://biopython.org/) bioinformatics library
- [**Scikit-image:**](https://scikit-image.org/) scientific image processing library
- [**Scikit-learn:**](https://scikit-learn.org/stable/) general purpose machine learning library

Almost all of the above libraries come with the [Anaconda](https://www.anaconda.com/products/individual#Downloads) installation of Python, so you should have most of these already installed (except Biopython).
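
As a small illustration of importing a submodule from an external library, the sketch below pulls two values from the `constants` submodule of SciPy. It assumes SciPy is installed; the two constants shown are examples picked for this illustration, not ones the later notebooks necessarily use.

```python
from scipy import constants  # import the constants submodule from SciPy

# Physical constants are stored as attributes, in SI units
gas_constant = constants.R          # molar gas constant, J/(mol K)
avogadro = constants.Avogadro       # Avogadro constant, 1/mol

print(gas_constant)
print(avogadro)
```

The `from <library> import <submodule>` form loads only the named submodule, which is then used with the same `module.name` format as before.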

## <font color = 545AA7> 5. Plotting with Matplotlib </font>

Matplotlib is a common plotting library used with Python to visualize data. The following commands need to be run in order to import the matplotlib library and to set plotting to display the outputs inside the Jupyter notebook, respectively.

`import matplotlib.pyplot as plt`

`%matplotlib inline`

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

We can plot data as a **scatter plot** using the `plt.scatter()` function. This function requires the `x` and `y` data as shown below.

`plt.scatter(x, y)`

The following lines can also be included to add the x- and y-labels on the plot.

`plt.xlabel('Text')`

`plt.ylabel('Text')`

In [None]:
chains = [1,  2,  3,  4,  5,  6,  7,  8,  9]
counts = [3843, 2738, 303, 734, 43, 139, 9, 91, 16]
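
A minimal sketch of a labeled scatter plot for these data follows. It assumes matplotlib is installed, and the axis label text is a guess at what the two lists represent, so adjust it to match your data. In a Jupyter notebook with `%matplotlib inline`, the figure appears below the cell.

```python
import matplotlib.pyplot as plt

chains = [1, 2, 3, 4, 5, 6, 7, 8, 9]
counts = [3843, 2738, 303, 734, 43, 139, 9, 91, 16]

points = plt.scatter(chains, counts)   # x data first, then y data
plt.xlabel('Number of chains')         # assumed label text
plt.ylabel('Count')                    # assumed label text
```

All three plotting commands sit in the same code cell so that the labels are applied to the same figure as the data.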

### <font color = 545AA7> Matplotlib Functions </font>

Matplotlib includes a series of functions for generating different types of plots.

- `plt.scatter(x,y)`: yields scatter plot with just markers
- `plt.plot(x,y)`: yields line plot, markers optional
- `plt.bar(x,y)`: yields bar plot
- `plt.stem(x,y)`: yields stem plot (like scatter plot with lines to x-axis)
- `plt.boxplot(nums)`: yields box plot
- `plt.hist(nums)`: yields histogram plot showing distribution of values in dataset
- `plt.pie(nums)`: yields a pie plot showing relative ratios

## <font color = F28500> Plotting Activity</font>

Below is a series of data either included in the Jupyter notebook or imported from an external file. Visualize these data as requested or in whatever way you feel is most appropriate.

### <font color = F28500> Other Plotting Types </font>

Experiment with displaying the above oligomer data using either the `plt.stem(x, y)` function or the `plt.plot(x,y)` function.

### <font color = F28500> Histogram Plots</font>

A **histogram plot** is a frequency plot that shows how many values fall within each of a set of ranges known as bins. It looks like a bar plot, except that the width of the bars is significant and the histogram function automatically tallies how many values fall in each bin. The matplotlib histogram function is shown below. The first example provides only the data, so the function uses its default number of bins (10 unless the Matplotlib defaults have been changed). The second example provides both the data and the `bins` argument, which explicitly sets how many bins the data are sorted into.

`plt.hist(data)`

`plt.hist(data, bins=10)`

If you want to zoom in on the graph, you can set the x-axis limits using the `plt.xlim()` function. Just add it to another line in the same code cell as the main plotting function.

`plt.xlim(min, max)`
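
To show how these pieces fit together, the sketch below draws a histogram of the peptide bond angles from the `for` loop section rather than the bond length data used in the activity. The bin count, axis limits, and label text are arbitrary choices made for this illustration.

```python
import matplotlib.pyplot as plt

# Peptide bond angles in radians, reused from the for loop section
angles = [1.91, 1.89, 1.88, 1.92, 1.92, 1.91, 1.85, 1.91]

# plt.hist returns the tally for each bin, the bin edges, and the drawn bars
tallies, edges, bars = plt.hist(angles, bins=4)
plt.xlim(1.80, 2.00)             # arbitrary limits chosen to frame the data
plt.xlabel('Angle (radians)')
plt.ylabel('Frequency')
```

Asking for 4 bins produces 5 bin edges, and the tallies always add up to the number of values in the dataset.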

<font color = F28500> Below is code that loads peptide bond length data from an external file into the variable `lengths`. Visualize these data using a histogram plot. </font>

In [None]:
import numpy as np
lengths = np.genfromtxt('amide_bond_lengths.csv', delimiter=',')

## <font color = 545AA7> Additional Resources </font>

Additional resources for learning Python and plotting are listed below.
- **Scientific Computing for Chemists** is a free electronic textbook for learning and teaching Python and Jupyter notebooks and for applying these skills to chemical problems. It is available at [https://github.com/weisscharlesj/SciCompforChemists](https://github.com/weisscharlesj/SciCompforChemists)