# git & GitHub: Introduction for Scientists

# git, GitHub, etc.: Why do we care?

**Replication** and **reproducibility** are two of the cornerstones of the scientific method. Per the [American Statistical Association Reproducible Research Recommendations](https://www.amstat.org/asa/files/pdfs/POL-ReproducibleResearchRecommendations.pdf), these are defined as:

* **Reproducibility**: A study is reproducible if you can take the original data and the computer code used to analyze the data and reproduce all of the numerical findings from the study. This may initially sound like a trivial task but experience has shown that it’s not always easy to achieve this seemingly minimal standard.

* **Replicability**: This is the act of repeating an entire study, independently of the original investigator without the use of original data (but generally using the same methods).

In summary: A sound scientific result should be reproducible, and a sound scientific study should be replicable.

To achieve these goals (especially reproducibility), we need to:

* Keep track of *exactly* which version of the source code was used to produce the data and figures in published papers.

* Record which versions of external software were used, and keep access to the computing environment that was used.

* Make sure that old code and notes are backed up and kept for future reference.

* Ideally, publish the code online so that other scientists who are interested in it can easily access it.

# What is git?

* A tool that efficiently __saves snapshots__ of a set of files (what their contents were at some point in time).
* A tool that __tracks relationships__ between those snapshots (which snapshot preceded which, etc.).
* A tool that can efficiently __merge__ different snapshots, and that helps you resolve conflicting changes when they occur.
* A tool that makes it possible to __share__ these snapshots with others, enabling __collaboration__ (and adding a degree of reproducibility).
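
To make the first two bullets concrete, here is a toy Python model. It is not git's actual implementation. File contents are stored under a hash of their bytes, so a file that did not change costs nothing extra in a new snapshot. Each snapshot also records its parent(s), so history can be walked backwards. The `blob_id` function follows the format git uses for file contents in a SHA-1 repository: SHA-1 over a `blob <size>\0` header followed by the bytes. The `ToyRepo` class and its methods are hypothetical and only illustrate the idea.

```python
import hashlib
from collections import deque


def blob_id(content: bytes) -> str:
    """Hash file contents the way git hashes a blob in a SHA-1 repository."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Hypothetical miniature of git's object model, for illustration only."""

    def __init__(self):
        self.blobs = {}    # blob id -> file contents; identical contents are stored once
        self.commits = {}  # commit id -> (snapshot, parent ids, message)

    def commit(self, files, parents=(), message=""):
        """Store a snapshot of `files` (name -> bytes) and return its commit id."""
        snapshot = {}
        for name, content in files.items():
            bid = blob_id(content)
            self.blobs.setdefault(bid, content)
            snapshot[name] = bid
        # Unlike real git, this toy id hashes a Python repr rather than a git commit
        # object, but it still changes whenever the snapshot, parents or message change.
        payload = repr((sorted(snapshot.items()), tuple(parents), message)).encode()
        cid = hashlib.sha1(payload).hexdigest()
        self.commits[cid] = (snapshot, tuple(parents), message)
        return cid

    def history(self, cid):
        """Return cid and every ancestor reachable through parent links (breadth-first)."""
        seen, order, queue = set(), [], deque([cid])
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            queue.extend(self.commits[current][1])
        return order


if __name__ == "__main__":
    repo = ToyRepo()
    c1 = repo.commit({"analysis.py": b"print(1)\n", "data.csv": b"x,y\n1,2\n"},
                     message="first version")
    c2 = repo.commit({"analysis.py": b"print(2)\n", "data.csv": b"x,y\n1,2\n"},
                     parents=[c1], message="fix analysis")
    print(len(repo.blobs))               # 3: the unchanged data.csv is stored only once
    print(repo.history(c2) == [c2, c1])  # True: c2's history leads back to c1
    print(blob_id(b""))                  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The last line prints the same id that `git hash-object` reports for an empty file. Real git additionally stores directory trees, author, and timestamp in its objects, and compresses them on disk.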

# What is GitHub?

* It’s a company that will __host copies of your git repositories__.
* It’s a [website](https://github.com) that provides a nice way to __browse, view, and sometimes edit__ the contents of those repositories.
* It’s a company that will let you __share__ them with others, and make them __discoverable__.
* It’s a website that makes __collaboration__ on software (incl. research, analysis, etc.) projects __easy__. E.g.:
  * Report and discuss issues.
  * Review, discuss, accept proposed changes.


Note: Git and GitHub are two different things. Git is a tool; GitHub is a service built around it, and there are alternatives. You can use git without GitHub; GitHub just makes some things, such as sharing and collaboration, easier.

# GitHub Student Developer Pack

https://education.github.com/pack

![image.png](attachment:image.png)

# git + GitHub: introduction by example

#### Note
	
`git` repositories are also excellent for version-controlling manuscripts, figures, thesis files, data files, lab logs, etc.: basically any digital content that must be preserved and is frequently updated.

They are also excellent collaboration tools!

### Finding out more

 * [Google](http://google.com)
 * [YouTube](http://youtube.com)
   * [Scott Chacon on git](https://www.youtube.com/watch?v=ZDR433b0HJY)

 * [Step-by-step git tutorial](https://learngitbranching.js.org/)

 * [git website](http://git-scm.com)
 * [github.com](http://github.com)

 * [Reproducible Research in Computational Science](https://doi.org/10.1126/science.1213847), R.D. Peng, Science 334, 1226 (2011).
 * [Shining Light into Black Boxes](https://doi.org/10.1126/science.1218263), A. Morin et al., Science 336, 159-160 (2012).
 * [The case for open computer programs](https://doi.org/10.1038/nature10836), D.C. Ince et al., Nature 482, 485 (2012).