# git & GitHub: Introduction for Scientists

# git, github, etc... Why do we care?

**Replication** and **reproducibility** are two of the cornerstones of the scientific method. With respect to data analysis (and scientific computing in general!), these concepts have the following practical implications:

* **Replication**: An author of a scientific paper that involves some data analysis should be able to rerun the analysis code and replicate the results upon request. Other scientists should be able to perform the same analysis and obtain the same results, given the information about the methods used in a publication.

* **Reproducibility**: The results obtained by analyzing the data should be reproducible with an independent implementation of the method, or using a different method altogether.


In summary: A sound scientific result should be reproducible, and a sound scientific study should be replicable.

To achieve these goals, we need to:

* Keep a record of *exactly* which source code, and which version of it, was used to produce the data and figures in published papers.

* Record which versions of external software were used. Keep access to the environment in which the analysis was run.

* Make sure that old codes and notes are backed up and kept for future reference.

* Ideally, publish the codes online, so that other scientists interested in them can access them easily.
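
As a rough illustration of the first two points, the sketch below records which commit of the code was used, together with the version of one external tool. It is only one possible approach. The file names `analysis.py` and `provenance.txt`, the author identity, and the choice of `git --version` as the "external software" are placeholders made up for this example.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

# A stand-in for an analysis script (hypothetical file name and content).
echo 'print("fit results")' > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"

# Record exactly which snapshot of the code is checked out...
commit=$(git rev-parse HEAD)

# ...and whether the working directory has uncommitted modifications.
if [ -n "$(git status --porcelain)" ]; then
    state=dirty
else
    state=clean
fi

# Store the information next to the results.
{
    echo "code version: $commit ($state)"
    echo "git version:  $(git --version)"
    echo "run date:     $(date -u +%Y-%m-%dT%H:%M:%SZ)"
} > provenance.txt

cat provenance.txt
```

A real analysis would record the versions of whatever it actually depends on (interpreter, libraries, compilers) in the same way.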

# What is git?

* A tool that efficiently __saves snapshots__ of a set of files (what their contents were at some point in time).
* A tool that __tracks relationships__ between those snapshots (which snapshot preceded which, etc.).
* A tool that can efficiently __merge__ different snapshots, and that flags conflicting changes so you can resolve them.
* A tool that makes it possible to __share__ these snapshots with others, enabling __collaboration__ (and adding a degree of reproducibility).
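
The first three points can be tried out locally. The sketch below makes a few snapshots (commits) on two branches, merges them, and prints the resulting history. The file names, commit messages, branch name `add-targets`, and author identity are invented for illustration.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

# Name of the default branch (master or main, depending on configuration).
base=$(git symbolic-ref --short HEAD)

# Snapshot 1: the first version of a file.
echo "Observing log" > notes.txt
git add notes.txt
git commit -q -m "Start observing log"

# Snapshot 2: made on a side branch.
git checkout -q -b add-targets
echo "Target: M31" > targets.txt
git add targets.txt
git commit -q -m "Add target list"

# Snapshot 3: made independently on the original branch.
git checkout -q "$base"
echo "Night 1: clear skies" >> notes.txt
git commit -q -am "Record night 1"

# Merge the two lines of development. The changes touch different
# files, so git can combine them without help.
git merge -q --no-edit add-targets

# Show how the snapshots relate to each other.
git log --oneline --graph
```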

# What is GitHub?

* It’s a company that will __host copies of your git repositories__.
* It’s a [website](https://github.com) that provides a nice way to __browse, view, and sometimes edit__ the contents of those repositories.
* It’s a company that will let you __share__ them with others, and make them __discoverable__.
* It’s a website that makes __collaboration__ on software (incl. research, analysis, etc.) projects __easy__. E.g.:
  * Report and discuss issues.
  * Review, discuss, accept proposed changes.


Note: Git and GitHub are two different things -- one is a tool, the other is a service built around it (and there are alternatives). You can use git without GitHub; it's just that GitHub makes some things easier.
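
To make the distinction concrete, here is a minimal sketch of two people sharing work through git alone, with a bare repository in a local directory playing the role that a GitHub-hosted copy normally would. The directory names (`shared.git`, `alice`, `bob`) and identities are placeholders.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"

# A bare repository (history only, no working files) acts as the shared copy.
git init -q --bare shared.git

# First collaborator: clone the (still empty) shared copy, commit, push.
git clone -q shared.git alice 2>/dev/null
cd alice
git config user.name "Alice Example"         # placeholder identity
git config user.email "alice@example.org"
echo "calibration notes" > README.txt
git add README.txt
git commit -q -m "Add README"
git push -q origin HEAD
cd ..

# Second collaborator: a fresh clone now contains the pushed work.
git clone -q shared.git bob
cat bob/README.txt
```

With GitHub, the only difference in these commands would be the address of the shared copy: a URL instead of a local path.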

# git+github: introduction by example

# GitHub Student Developer Pack

<center>[https://education.github.com/pack](https://education.github.com/pack)</center>
<img src="images/gh-pack.jpg" width="600">

#### Note

`git` repositories are also excellent for version-controlling manuscripts, figures, thesis files, data files, lab logs, etc.: basically, any digital content that must be preserved and is frequently updated.

They are also excellent collaboration tools!

## Homework #2: Practicing git, and collaborating using GitHub

At:

        https://github.com/uw-astr-302-w18/astr-302-scicalc

you will find a project attempting to collaboratively build a toy scientific calculator. The calculator has only a few operations implemented, but also a number of proposed enhancements recorded as GitHub issues. Your task is to:

* By the end of this class:
  - Claim an issue you will work on by assigning yourself to it. It cannot be the issue you opened!


* By next Thursday (end-of-day):
  - Implement the proposed enhancement in your own fork, and initiate a Pull Request to merge it back.
  - Assign one of your colleagues as a reviewer (ask on #general who is available, or ask in class). You should also get a review assignment from one of your colleagues.


* By next Sunday (end-of-day):
  - Review the pull request assigned to you for correctness. At minimum:
    - Clone the PR branch and test that it works as promised.
    - Make sure the documentation (the docstring) is there.
  - If there are issues, work with the submitter to have them fixed.
    - (Also fix any issues in your own code that someone else is reviewing.)
  - Merge the PR into master, fixing any merge conflicts as necessary.
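
Merge conflicts are the step most likely to trip you up, so here is a small local rehearsal. It is a sketch only: the file `calc.py`, its contents, and the branch name `feature` are invented and are not taken from the homework repository. Two branches change the same line, git stops the merge, and the conflict is resolved by writing the agreed-upon final version.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Reviewer"      # placeholder identity
git config user.email "reviewer@example.org"
base=$(git symbolic-ref --short HEAD)        # master or main

# Starting point (hypothetical file, not the real homework code).
echo "def add(a, b): return a + b" > calc.py
git add calc.py
git commit -q -m "Initial calculator"

# A contributor's branch edits the line...
git checkout -q -b feature
echo "def add(a, b): return a + b  # sum of a and b" > calc.py
git commit -q -am "Document add"

# ...while the same line is also changed on the base branch.
git checkout -q "$base"
echo "def add(x, y): return x + y" > calc.py
git commit -q -am "Rename arguments"

# The merge cannot be completed automatically.
if ! git merge --no-edit feature >/dev/null 2>&1; then
    echo "Conflicted files:"
    git diff --name-only --diff-filter=U

    # Resolve by writing the final content, then conclude the merge.
    printf '%s\n' \
        'def add(x, y):' \
        '    """Return the sum of x and y."""' \
        '    return x + y' > calc.py
    git add calc.py
    git commit -q --no-edit
fi

git log --oneline --graph
```

In practice you would open the conflicted file in an editor, choose between the versions that git marks with `<<<<<<<`, `=======`, and `>>>>>>>`, and then run `git add` and `git commit` as above.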

***Feel free to use #general and collaborate in case you get stuck!***

### Finding out more

 * [Google](http://google.com)
 * [YouTube](http://youtube.com)
   * [Scott Chacon on git](https://www.youtube.com/watch?v=ZDR433b0HJY)
   
   
 * [gitref.org](http://gitref.org/index.html)
 * [LSST's page on git](https://confluence.lsstcorp.org/display/LDMDG/Using+Git+for+LSST+Development)


 * [git website](http://git-scm.com)
 * [github.com](http://github.com)


 * [Reproducible Research in Computational Science](http://dx.doi.org/10.1126/science.1213847), Roger D. Peng, Science 334, 1226 (2011).
 * [Shining Light into Black Boxes](http://dx.doi.org/10.1126/science.1218263), A. Morin et al., Science 336, 159-160 (2012).
 * [The case for open computer programs](http://dx.doi.org/10.1038/nature10836), D. C. Ince et al., Nature 482, 485 (2012).