# A 5-10 min intro to mixing markdown and code + version control

### Alisandra Denton

### 2022.03.02

## 1. Why? An ever-recurring hypothetical

- You do something important and good and useful
- It sits around for a while (waiting on another aspect of the project, write-up, on review, etc)
- Something small needs to change, while everything else is kept _exactly_ the same
  - If your documentation and organization are not good enough, this is now _harder_ than doing it from scratch.

### Plotting example 


#### let's imagine

- you have a clustering plot like the following as 'figure 1' in a word document
- a collaborator or reviewer asks you to use a color-blind friendly color scale
- if you can't find the script, or _which_ script exactly created figure 1
  - you have to match the _exact_ filter threshold, and the _exact_ scaling, etc... (via trial and error, **potentially very hard**)
  - then plot and change the color (**easy**)
     
![Alt text](data/heatmap.png "a title")


##### we're now worse off effort-wise than if this hadn't been plotted to start with; yet iterative progress and intermittent feedback and revision are inherent to science


## 2. How to make this better? Jupyter notebook & co.

Beyond the advice to "get organized", notebooks make staying organized easier:

- get everything in one place
- linked / auto-updating
- re-running everything before inclusion/hand-over becomes feasible

In general, any what-you-see-is-what-you-get combination of markdown & code (e.g. RMarkdown) can help. Jupyter is just the largest of these tools.

### Same plotting example

> Use the markdown for context such as background and aim

#### Background

The data used here are borrowed from: 

Dominik Brilhaus, Andrea Bräutigam, Tabea Mettler-Altmann, Klaus Winter, Andreas P.M. Weber, Reversible Burst of Transcriptional Changes during Induction of Crassulacean Acid Metabolism in _Talinum triangulare_, _Plant Physiology_, Volume 170, Issue 1, January 2016, Pages 102–122, https://doi.org/10.1104/pp.15.01076

and were previously included in the DataAnalysis course.

#### Aim
Convince you to work on analysis reproducibility before it is too late.

#### Pre-processing
> Markdown allows for explicit, in-place, and copy-paste friendly documentation of
> anything non-python that needs to be done.


The following pre-processing was performed. **You must re-run this in bash if the data have been updated.**

```bash
sed 's/\t/,/g' data/Talinum_Data.tsv > data/Talinum_Data.csv
```
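
If you would rather keep this step inside the notebook, the conversion could be sketched in Python roughly as follows. This is an illustration, not what was run for the analysis here: the function name `tsv_to_csv` is made up, and unlike the `sed` one-liner it quotes any field that itself contains a comma.

```python
import csv
from pathlib import Path


def tsv_to_csv(tsv_path, csv_path):
    """Rewrite a tab-separated file as comma-separated; returns the number of rows written."""
    n_rows = 0
    with open(tsv_path, newline="") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)
            n_rows += 1
    return n_rows


# same paths as the bash command above; skipped if the input file is not there
if Path("data/Talinum_Data.tsv").exists():
    tsv_to_csv("data/Talinum_Data.tsv", "data/Talinum_Data.csv")
```

Either way, the point is the same: the step is written down next to the analysis that depends on it.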

#### Data import, processing, and plotting
> The Python below is not just documentation; it runs directly

In [None]:
# import and setup standard data analysis libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

> Note that the data here is _linked_: if `data/Talinum_Data.csv` changes,
> the results change as soon as this is re-run.
> You can link many things, e.g. by importing your own code that you need
> for multiple different notebooks.

In [None]:
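# read in data
talinum = pd.read_csv('data/Talinum_Data.csv')
# use the columns from index 4 onward as a plain numeric array
n_tal = np.array(talinum.iloc[:, 4:])
# filter: keep rows whose minimum across all columns is above 20
n_tal = n_tal[np.min(n_tal, axis=1) > 20]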
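
The filter above is the kind of step that is worth sharing between notebooks. As a minimal sketch, assuming a hypothetical module `talinum_helpers.py` next to the notebooks, the function below could live there and be pulled in with `from talinum_helpers import filter_low_expression`. Both the module name and the function name are made up for illustration.

```python
import numpy as np


def filter_low_expression(counts, min_count=20):
    """Keep only rows whose smallest value across all columns exceeds min_count."""
    counts = np.asarray(counts)
    keep = counts.min(axis=1) > min_count
    return counts[keep]


# toy example (not real data): 3 rows x 2 columns
toy = np.array([[25, 30], [5, 100], [21, 22]])
filtered = filter_low_expression(toy)
print(filtered.shape[0], "of", toy.shape[0], "toy rows passed")
```

Every notebook that imports the function then uses the same threshold logic, and a fix in one place reaches all of them the next time they are re-run.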

> We can also extract and auto-update numbers we might want elsewhere
> (e.g. in the main text of a paper)

In [None]:
f"{n_tal.shape[0]} transcripts passed filter"
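
One way to make such numbers available "elsewhere" is to write them to a small file each time the notebook runs. The sketch below is only a suggestion; the function name `save_summary` and the path `results/summary.json` are assumptions, not part of the original analysis.

```python
import json
from pathlib import Path


def save_summary(values, path="results/summary.json"):
    """Write a dict of named numbers to a JSON file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(values, indent=2, sort_keys=True))
    return path


# in this notebook the call could look like:
# save_summary({"n_transcripts_passed": n_tal.shape[0]})
```

The file is overwritten on every run, so the number quoted in the text can always be traced back to the current state of the data and code.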

In [None]:
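# center each row on its mean (transpose so the per-row means broadcast)
s_tal = n_tal.T - np.mean(n_tal, axis=1)
# scale each row to unit standard deviation, then transpose back
s_tal = (s_tal / np.std(s_tal, axis=0)).T
# plot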
sns.clustermap(s_tal, cmap='RdYlGn_r', figsize=(7, 7))
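
The centering and scaling can also be wrapped in a function and checked on toy numbers, which makes the "_exact_ scaling" from section 1 explicit. This is a sketch under the assumption that it should give the same result as the two steps above; the name `center_and_scale` is made up.

```python
import numpy as np


def center_and_scale(counts):
    """Standardize each row to mean 0 and standard deviation 1.

    Rows with zero variance would cause a division by zero and are not handled here.
    """
    counts = np.asarray(counts, dtype=float)
    centered = counts - counts.mean(axis=1, keepdims=True)
    return centered / counts.std(axis=1, keepdims=True)


# toy numbers (not real data) to check the behaviour
toy = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]])
z = center_and_scale(toy)
assert np.allclose(z.mean(axis=1), 0)
assert np.allclose(z.std(axis=1), 1)
```

With everything up to the plot pinned down like this, the reviewer's request from section 1 comes down to changing the `cmap` argument of `sns.clustermap`, for example to matplotlib's `'viridis'`, which is often suggested as a more color-blind friendly choice.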

#### Conclusions
- This is about the Notebooks, not good plotting practice. Label your plots. 
- We've seen 
  - all-in-one 
  - dynamic linking
- We still need
  - re-run everything before hand-over! (effectiveness of tools _always_ depends on _how_ you use them)
- Also, what about versioning...?
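
For the "re-run everything" point, one option is to let `jupyter nbconvert` execute the whole notebook from top to bottom in a fresh kernel. The sketch below assumes `jupyter` and `nbconvert` are installed and on your path; the function names and the notebook name `my_analysis.ipynb` are placeholders.

```python
import subprocess


def rerun_command(notebook_path):
    """Build the command that executes all cells in order and saves the notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(notebook_path)]


def rerun_notebook(notebook_path):
    """Run the command; raises subprocess.CalledProcessError if it exits with an error."""
    subprocess.run(rerun_command(notebook_path), check=True)


# show the command without running it
print(" ".join(rerun_command("my_analysis.ipynb")))
```

Running from a fresh kernel catches the common problem of cells that only work because of leftover state or out-of-order execution.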


### Versioning example

#### Let's imagine you have the following files
```
project_notebook.ipynb 
project_notebookB_v1.ipynb 
project_notebookB_v10.ipynb 
project_notebookB_v11.ipynb 
project_notebookB_v2.ipynb 
project_notebookB_v3.ipynb 
project_notebookB_v4.ipynb 
project_notebookB_v5.ipynb 
project_notebookB_v6.ipynb 
project_notebookB_v7.ipynb 
project_notebookB_v8.ipynb 
project_notebookB_v9.ipynb 
project_notebookB_v11_publication_ready.ipynb 
project_notebookfinal.ipynb
project_notebookfinalFINAL.ipynb
project_notebookfinalFINAL_v2.ipynb
project_notebookfinalFINAL_v2_AD.ipynb
project_notebooknew.ipynb
```

It's nice that everything is linked _inside_ each notebook,
but keeping track of which file is which is still going to be tedious...

But you also don't want to just _delete_ them, in case
you have to come back to something.