# The goal of this notebook

* Explain what **Literate Computing** is and how it relates to **Scientific Reproducibility**
* Describe what makes a good final report notebook for DSC291 projects

# Some History
* [Fernando Perez](https://bids.berkeley.edu/people/fernando-p%C3%A9rez) and [Brian Granger](https://physics.calpoly.edu/bgranger) are the original developers of Jupyter Notebooks. [Project Jupyter](https://jupyter.org) has since become one of the most popular platforms for scientific computing and data science.
* You can learn more about the history of the project from [Fernando's Blog](http://blog.fperez.org/2012/01/ipython-notebook-historical.html)
* In 2001 Fernando Perez was a PhD candidate in **Physics** in Boulder, Colorado. He was doing a lot of programming in `C/C++` on the one hand and in `Mathematica` on the other. He was looking for a way to combine the computational power of `C/C++` with the interactivity, graphics, and readability of `Mathematica notebooks`. Eventually, this led to `IPython notebooks`, which later became `Jupyter notebooks`.
* One of the main goals of the notebook format is to advance [Literate Computing](http://blog.fperez.org/2013/04/literate-computing-and-computational.html). Literate computing is the idea that a single document should include the motivation, the theory, the code and the graphics for a project.

## Reproducibility
* Reproducibility is at the heart of the scientific method. When a scientist makes a claim, other scientists must be able to reproduce the experiment that led to that claim.
* In many cases, reproducing an experiment means that the replicating scientist must recreate the complete physical setup used by the scientist who made the claim. This can be an extremely expensive process. For example, many replication efforts are under way around the world today for tests, as well as cures, for the *COVID-19* epidemic.
* **The reproducibility (or replication) crisis:** It has been observed that an increasing fraction of published scientific papers cannot be replicated. This is a serious problem in all scientific work, and particularly in medicine, where life-or-death decisions might rely on published papers. See, for example, [this article](https://blogs.plos.org/thestudentblog/2019/11/18/living-in-the-reproducibility-crisis/). In response, journals are requiring authors to submit all supporting data together with their paper.

### Reproducibility in Data Science
* As high-throughput sensors become ubiquitous, much of the data they produce is publicly available. The focus moves from the **collection** of data to the **analysis** of the collected data, which is part of so-called **Data Science**. As a result, **reproducibility of the analysis** has become increasingly important.
* A **reproducible analysis** is one where the original scientist provides a complete collection of data, software, and documentation such that the replicating scientist can perform the complete analysis, starting from raw data, and arrive at the same conclusions. This replication can include evaluating the statistical methods used, the software used, and the choice of parameters at each step. The replicating scientist might have doubts about some parts of the analysis and may wish to use a different method or different software.
* The subject of **Reproducible research using Jupyter Notebooks** is drawing a lot of interest from many communities. Here is a pointer to a recent [workshop](https://reproducible-science-curriculum.github.io/workshop-RR-Jupyter/#examples) and the corresponding [github repository](https://github.com/Reproducible-Science-Curriculum).
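
One small step toward the reproducible analysis described above is to record the software environment and the parameter choices alongside the results. The following is a minimal sketch using only the Python standard library. The function name `describe_run`, the parameter names, and the seed value are illustrative assumptions, not part of any course requirement.

```python
import json
import platform
import random


def describe_run(params, seed):
    """Collect details a replicating scientist would need to rerun an analysis."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "params": params,
    }


SEED = 0  # arbitrary value, chosen only for illustration
random.seed(SEED)  # fix the seed so any random sampling is repeatable

# Hypothetical parameter choices for some analysis step
run_info = describe_run({"n_bootstrap": 1000, "alpha": 0.05}, SEED)
print(json.dumps(run_info, indent=2, sort_keys=True))
```

Printing this record in the notebook, or saving it next to the output files, lets a reader see which software and parameters produced the reported results.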

# Good reports for DSC291
* A good report is a balance between readability and reproducibility. A good report should be easy to read and understand. On the other hand, the report, together with the other files in the repository, should allow the reviewer to reproduce the results given in the report.
* Readability consists of good writing, organization of the material, and visual layout.
* The layout of a Jupyter notebook is affected by the width of the browser window, the font, and other variables. To best control the layout, each team should submit a PDF file for the report notebook(s), generated by selecting **File > Download as > PDF via LaTeX** in the Jupyter menu.
* There should be only **one** results notebook. If needed for full reproducibility, there can be additional **methods** notebooks.

## Elements of a good report
The following is a description of the sections in a good report. It is not meant to be the one and only way to write a report. Rather, consider these general guidelines.

1. **Background:** One paragraph. Describe the context in which the question is being asked. Define any special words used in the question.
2. **The question:** 1-2 sentences defining the goal of the study.
3. **Experimental setup and methods:** Describe data collection, preparation, and cleaning. Provide the high-level calls to code. Do not include lengthy code in the notebook; store it in external `.py` files.
4. **Experimental design:**  Describe experimental choices and explain why these experimental choices are appropriate for answering **the question**.
5. **Results:** Present the results in a condensed, well-designed format. Take the time to make figures and tables that are easy to read (in the exported PDF). Remove redundant or irrelevant values, but **do not** remove results that are contrary to your conclusions. Include statistical tests or confidence intervals.
6. **Conclusions:** A one-paragraph summary of the statistically significant conclusions and how they relate to the question and to the results.
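
As a rough illustration of items 3 and 5, the cell below shows the kind of short, high-level call that fits in a results notebook, together with a confidence interval for the reported value. The helper `mean_confidence_interval` and the sample values are hypothetical. In a real project the helper would live in an external `.py` file and be imported; it is defined inline here only so the example runs on its own. The interval uses a normal approximation, which is an assumption that may not suit small samples.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation divided by sqrt(n)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err


# Made-up measurements, for illustration only
sample = [38.5, 39.0, 40.1, 39.4, 38.9, 40.3, 39.7, 39.2]

# The results notebook needs only this high-level call and a readable summary
mean, lower, upper = mean_confidence_interval(sample)
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Keeping the helper in a separate file leaves the notebook focused on the question and the results, while the full code remains available for anyone who wants to reproduce them.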

## Examples of good results notebooks

These are examples of pretty good notebooks. They could still be improved by hiding programming details and by refining the layout and the conclusions.
* [Pregnancy length analysis](./1_pregnancy_length_analysis.pdf) 
* [Does the amount of snow vary more from year to year or from place to place?](6.is_SNWD_Variation_spatial_or_temporal.pdf)