# Best Practices
This document presents some important tips about Reproducible Research. It describes what worked and what didn't work during my research in the IA369-Z course at FEEC Unicamp. Here you will find *my lessons learned* during the course, and I hope they will help you with your research.

## Reproducible Research
Reproducible Research is something you practice throughout your research. It is a way of doing research, not a final step like: "Ok, now I will put my files on GitHub and everybody will be able to reproduce my experiments". Reproducible Research is a practical approach that will help you to improve the **quality** of your work, to **validate** your findings and to **make relevant contributions in your area**. When other researchers can reproduce your experiment, they can use it and build on your work. Even if they find problems in your experiment, they will likely let you know, and that is a good thing.

I think that sometimes we are so focused on validating our hypothesis and on writing up our insights (the paper) that we forget to report detailed steps and decisions that seem obvious to us, which makes reproduction impossible for other researchers and even for ourselves.

What helped me during the experiments was to keep in mind: *"A human being will read my paper, code, and notes."*

If you are still not convinced, please read these references: [Useless Studies](http://brasil.elpais.com/brasil/2017/01/10/internacional/1484073680_523691.html?id_externo_rsoc=FB_CC) and [Manifesto from Nature](https://www.nature.com/articles/s41562-016-0021).


## Resources for Reproducibility

Now, I hope, you are motivated to do Reproducible Research. You can start by reading these [10 Simple Rules](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285).

After discussions during the classes, we agreed upon the following elements of Reproducible Research:
* Version Control
* Workflow
* Data repository
* Code development
* Documentation
* Environment

I will comment on each element. To be clear, I will separate the tips into DO (it works) and DON'T (it doesn't work):

### Workflow

**---- DO ----**
* For me, designing the workflow should be the first step of your research. Even at the beginning, it will help you to think through and draw the flow of the research with the inputs, outputs, dependencies and data involved. Moreover, it will help you to plan your research;
* Save your editable files (for any tool), because the workflow will certainly change during the process;
* [Draw.io](http://draw.io/) is a good tool and it integrates with GitHub (the integration works very well);

**---- DON'T ----**
* Do not underestimate this step. Even in the first version, include as many details as possible;

### Version Control

**---- DO ----**
* After planning your research, "befriend" version control;
* [Git](https://git-scm.com/) is a good and widely used option (and you will find a lot of documentation);
* Choose a hosting service for your repository; [GitHub](https://github.com/) and [Bitbucket](https://bitbucket.org/) are popular ones;
* Use version control DURING your work, making ``commits`` for the changes that you want to save;
* If you are new to Git, there are some good guides: [Try Git](https://try.github.io/) and [GitHub Guides](https://guides.github.com/)

**---- DON'T ----**
* Do not wait until the end of your research to put the code (and related files) in the repository. Create your repository at the beginning.
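
As a minimal sketch of this habit, the commands below create a repository at the start of a project and record a first commit. The directory name, file name, identity values and commit message are placeholders chosen for illustration, and the project is created under a temporary directory only so that the sketch can be run without touching existing files.

```shell
# Throwaway location so the sketch is safe to run; use your real project folder instead.
project_dir="$(mktemp -d)/my-research"
mkdir -p "$project_dir"
cd "$project_dir"

# Create the repository at the beginning of the research, not at the end.
git init

# Identity used for commits in this repository only (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.com"

# Save a first change.
echo "# My research" > README.md
git add README.md
git commit -m "Add initial README"

# Repeat add/commit whenever you reach a state you want to keep.
git log --oneline
```

To publish the history on a hosting service, you would typically add a remote with `git remote add origin <repository-url>` and then run `git push -u origin <branch>`.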

### Data Repository

**---- DO ----**
* This is an important step. Provide the raw and/or processed dataset in your repository;
* Describe the dataset clearly in the documentation;
* If you are using a dataset from another researcher, you must reference it.

**---- DON'T ----**
* Do not manipulate the dataset manually. Use code or scripts to do that, because manual changes are easy to forget.
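
A minimal sketch of what a scripted manipulation could look like, assuming a CSV dataset in which rows with an empty second column should be dropped. The folder layout (`data/raw`, `data/processed`), the file names and the sample values are invented for illustration, and a temporary directory is used only so the sketch can be run safely.

```shell
# Throwaway working directory so the sketch is safe to run anywhere.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/data/raw" "$work_dir/data/processed"

# Small sample standing in for a real raw dataset (invented values).
printf 'id,value\n1,10\n2,\n3,30\n' > "$work_dir/data/raw/measurements.csv"

# Keep the header (NR == 1) and every row whose second column is not empty.
# The raw file is only read; the result goes to a separate processed file.
awk -F',' 'NR == 1 || $2 != ""' \
    "$work_dir/data/raw/measurements.csv" \
    > "$work_dir/data/processed/measurements_clean.csv"

cat "$work_dir/data/processed/measurements_clean.csv"
```

Because the cleaning rule lives in a script, it can be committed with the rest of the code and rerun on the raw data at any time.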

### Code Development and Literate Programming

Literate Programming was a new concept for me, but a very important one. To do Reproducible Research, you must understand it:
>*“Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.” Donald Knuth*

**---- DO ----**
* Select a good environment to develop your code and texts (Literate Programming);
* [Jupyter Notebook](http://jupyter.org/) is a good option. The Jupyter project recommends installing [Anaconda](https://www.continuum.io/downloads), which comes with Python and Jupyter;
* Learn about [Markdown](https://blog.da2k.com.br/2015/02/08/aprenda-markdown/); you will use this kind of text to explain your code and experiments;
* Follow [good programming practices](http://www.devmedia.com.br/boas-praticas-de-programacao/31163?utm_content=buffer1cddf&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer); it will make a big difference;
* Report the libraries that must be installed to run your code (including versions);
* In your notebooks, put a text cell explaining the decisions or steps, followed by the code cell(s).
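
One possible way to report the installed libraries with their versions is to export them from the package manager. The sketch below assumes a Python environment where `pip` is available (as it is with Anaconda); the output file names are conventions or arbitrary choices, not requirements.

```shell
# Write every installed Python package with its exact version to a file
# that can be committed next to the code.
python3 -m pip freeze > requirements.txt

# Record the interpreter version as well (the file name is an arbitrary choice).
python3 --version > python-version.txt
```

Someone reproducing the work can then run `python3 -m pip install -r requirements.txt`. If you manage the environment with conda, `conda env export > environment.yml` and `conda env create -f environment.yml` play the same roles.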

**---- DON'T ----**
* Do not choose the Anaconda installation path carelessly. It looks like a trivial detail, but I installed it thinking "I will select any folder, because I can change it later". When I needed to define the paths to the repository, I couldn't, and I had to uninstall and install it again.

### Documentation
I treated my official paper as the documentation, so to be more productive I wrote it in Jupyter Notebook. The experience was good, because the same file can hold cells with code and cells with text. The problem came when I converted Markdown to [TeX (LaTeX)](https://www.latex-project.org/): it was not as productive as I had hoped, and I had to correct many things in the generated file. Still, it was worth it, at least for the first version.

Keep in mind: "The Notebook file will be converted to Markdown, and then to TeX". You may ask: why do we have to convert it to TeX? Because you will want to cite references, refer to figures, use templates and format your text, and Markdown has limitations for these needs.

My recommendation: if your paper will contain many code cells, use Jupyter Notebook and make the conversions; **otherwise, write directly in LaTeX**.

If in your case you have many code cells:

**---- DO ----**
* Create your [BibTeX](http://www.bibtex.org/) file with your references;
* Write your paper (in Jupyter Notebook) using the LaTeX commands for references to figures and tables (`\ref{}`) and for citations (`\cite{}`);
* Convert your file:

```
jupyter nbconvert --to markdown <filename>.ipynb
```

```
pandoc -f markdown -t latex <filename>.md -o <filename>.tex
```

* You can use [Overleaf](https://www.overleaf.com/). It is an online tool to create, edit and share LaTeX files in a collaborative way (and I recommend it); 
* Open your generated tex file and make the necessary corrections (figures, lists and so on);

**---- DON'T ----**
* Do not leave this step to the end of your research, since the documentation needs to be built gradually;
* Some resources make sense in the paper and others do not. Do not insert code that does not make sense in a paper, for example a call to a code file, or code that does not produce a readable output (table, figure, etc.);
* Do not count on automatic synchronization with Overleaf. I used this reference, [Update Overleaf from a Jupyter Notebook](https://medium.com/thoughts-philosophy-writing/how-to-update-overleaf-from-a-juypter-notebook-5469b1405fdc), but it was not productive for me and I ended up copying files manually many times;

### Environment
Based on my experience (including the course classes), the environment is the main obstacle to reproducing computational research. Software has releases and dependencies, and these can cause problems (e.g. compatibility). The options for environments are:

1. Local - It is not a safe option. You have to report ALL dependencies and versions (and keep backup copies of the installers);
2. Virtual Machine - It is a good option, but you have the size problem. Whoever reproduces your research will need a good Internet connection to download it;
3. [Docker](http://docker.com/) - In my experience, it is the best option! It uses the concept of containers (which I strongly recommend).

I used Local and Docker environments:

**---- DO ----**
* I recommend you learn about Docker. Some references: [Docker Basics](https://gitlab.com/daitan-learn/docker-basics), [Docker Lab Lessons](http://labs.play-with-docker.com/), [Blog](http://www.diego-garcia.info/2015/02/15/docker-por-onde-comecar/), [Mundo Docker](http://www.mundodocker.com.br/tag/docker-no-windows/), [Tutorials](https://www.digitalocean.com/community/tutorials/como-instalar-e-utilizar-o-docker-primeiros-passos-pt), [Blog](https://woliveiras.com.br/posts/Criando-uma-imagem-Docker-personalizada/);
* Important to know: an **Image** is a configured environment and a **Container** is an instance of an image running on your machine;
* Check whether the applications you need in your project are already in [Docker Hub](https://hub.docker.com/). You can find images ready to use with the complete environment;
* Run all your experiments (if possible) using containers. In other words, configure your development environment using an image instead of a local installation;
* You can use a different image for each application you need (e.g. database, programming environment) and run them in parallel;
* When you run an image, check the parameters of the `docker run` command (ports, volume mapping);
* The command `docker run -it <image>` means you want an interactive session linked to the container's shell. On the other hand, the command `docker run -d <image>` means you want to run the container in the background (take care: for someone who is not familiar with Docker, the terminal only prints the container ID and does not explicitly say that the container is up). Note that the options must come before the image name, because anything after it is passed to the container as its command.
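
To make the `run` parameters concrete, here is a hedged sketch of a small helper that starts a Jupyter container with a port and a volume mapping. The function name `start_notebook` is hypothetical, and the image name `jupyter/base-notebook` and the mount point `/home/jovyan/work` are assumptions based on the Jupyter Docker Stacks images; check the documentation of the image you actually use.

```shell
# start_notebook: hypothetical helper that starts a Jupyter container
# for the project in the current directory.
start_notebook() {
    # -it   : interactive session attached to the terminal
    # --rm  : remove the container when it exits
    # -p    : publish container port 8888 on host port 8888
    # -v    : mount the current directory inside the container, so that
    #         notebooks and data stay on the host
    # All options come before the image name; anything after it is
    # passed to the container as its command.
    docker run -it --rm \
        -p 8888:8888 \
        -v "$PWD":/home/jovyan/work \
        jupyter/base-notebook
}
```

Calling `start_notebook` from the project root starts the container. Keeping the command in a script inside the repository documents the exact parameters for whoever reproduces the work; replacing `-it` with `-d` would run the container in the background, in which case `docker ps` shows whether it is up.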


**---- DON'T ----**
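* Don't put the research data in the Docker image. The data is probably the most important element of your research, and anything written only inside a container is lost when the container is removed. Keep the data on the host (or in a data repository) and mount it into the container as a volume.
* Don't forget to stop and remove the containers after use: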

*Verify the containers:*
```
docker ps -a
```

*Stop all containers:*
```
docker stop $(docker ps -a -q)
```

*Remove all containers:*
```
docker rm $(docker ps -a -q)
```

## Final Recommendations
**---- DO ----**
* As a final step of your research, when your results and paper are done, create the README file. The instructions in it are important, so make sure you are clear about every step of your research. Ask a colleague to follow your instructions and try to reproduce your experiment (another person will spot confusing explanations).

**---- DON'T ----**
* Don't create many README files. I did that, and my instructions became confusing.