# Version control for fun and profit: the tool you didn't know you needed. From personal workflows to open collaboration

This Notebook incorporates materials from the following sources:

- Materials from https://github.com/fperez/reprosw, which are licensed CC-BY. Note that the figures taken from the ProGit book instead carry an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license.
- [Software Carpentry](https://software-carpentry.org) from this [git repository](https://github.com/swcarpentry/git-novice) made available under the [Creative Commons Attribution
license][cc-by-human]. The following is a human-readable summary of
(and not a substitute for) the [full legal text of the CC BY 4.0
license][cc-by-legal]. 



[![Piled Higher and Deeper by Jorge Cham, http://www.phdcomics.com/comics/archive_print.php?comicid=1531](../fig/phd101212s.png)](http://www.phdcomics.com)

"Piled Higher and Deeper" by Jorge Cham, http://www.phdcomics.com



## Have You Been There?  
- Multiple nearly-identical versions of the same document
- Confusing to identify order of changes and true final version
- Word and Google Docs have a "track changes" mode that enables a somewhat better workflow

#### Version Control
```
"A component of software configuration management, **version control**, also known as revision control or source control, is the management of changes to documents, computer programs, large web sites, and other collections of information. Changes are usually identified by a number or letter code, termed the 'revision number', 'revision level', or simply 'revision'."
```
\- [Wikipedia](https://en.wikipedia.org/wiki/Version_control)



## Reproducibility is Critical for Science (and don't think that data science isn't science)

```
"Science is facing a 'reproducibility crisis' where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments, research suggests."
```
  -[BBC News Article](http://www.bbc.com/news/science-environment-39054778)

```
"[...] Manage versions. Manage data versions. Being able to reproduce the models. What if, you know, the data disappears, the person disappears, the model disappears... And we cannot reproduce this. I have seen this hundreds of times in Bing. I have seen it every day. Like... Oh yeah, we had a good model. Ok, I need to tweak it. I need to understand it. And then... Now we cannot reproduce it. That is my biggest nightmare!"
```
  -Microsoft Employee Answering question "What is your worst nightmare? (Related to Machine Learning Systems)" as quoted in [*Machine Teaching: A New Paradigm for Building Machine Learning Systems*](https://arxiv.org/abs/1707.06742v2)





## The Long History of Version Control Systems in CS/Application Development

- Automated version control systems (VCS) are nothing new.
- Tools like RCS, CVS, and Subversion have been around for decades (RCS dates from the early 1980s) and are used by many large companies.
- However, many of these are now considered legacy systems because of limitations in their capabilities.
- In particular, the more modern systems, such as Git and [Mercurial](http://swcarpentry.github.io/hg-novice/), are *distributed*, meaning that they do not need a centralized server to host the repository.
- New data-science-specific VCSs like [Pachyderm](http://www.pachyderm.io) are emerging.



## GitHub and GitHub Desktop
- If you are not too comfortable with the command line (or even if you are), GitHub Desktop is convenient.
- We will introduce concepts and leave it to you to do more in-depth study. 
- Help for using the desktop software is [here](https://help.github.com/desktop-beta/guides/getting-started-with-github-desktop/).
- Create an ID on the [GitHub](http://github.com/) website.

## Git is an enabling technology: Use version control for everything

* Paper writing (never get `paper_v5_john_jane_final_oct22_really_final.tex` by email again!)
* Grant writing
* Everyday research
* Teaching (never accept an emailed homework assignment again!)

## Teaching courses with Git

<!-- offline: 
<img src="files/fig/indefero_projects_notes.png" width="100%">
-->
![](https://cdn-images-1.medium.com/max/1044/1*2FQ54aQ1dllVrpVAf86RBg.png)

## The plan for this tutorial

This tutorial is structured in the following way: we will begin with a brief overview of key concepts you need to understand in order for git to really make sense.  We will then dive into hands-on work: after a brief interlude into necessary configuration we will discuss 5 "stages of git" with scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:
            
1. Local, single-user, linear workflow
2. Single local user, branching
3. Using remotes as a single user
4. Remotes for collaborating in a small team
5. Full-contact github: distributed collaboration with large teams
    
In reality, this tutorial only covers stages 1-4, since for #5 there are many software development-oriented tutorials and documents of very high quality online.  But most scientists start working alone with a few files or with a small team, so I feel it's important to build first the key concepts and practices based on problems scientists encounter in their everyday life and without the jargon of the software world.  Once you've become familiar with 1-4, the excellent tutorials that exist about collaborating on GitHub on open-source projects should make sense.

## Very high level picture: an overview of key concepts

The **commit**: *a snapshot of work at a point in time*

<!-- offline: 
![](fig/commit_anatomy.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/commit_anatomy.png">

Credit: ProGit book, by Scott Chacon, CC License.

In [18]:
ls

01-notebook-basics/     PythonBasics.ipynb      local-intro.ipynb
01_introduction.pptx    Readme.md               test_setup.ipynb
02-intro-git-cl.ipynb   Version Control.ipynb
2017_introduction.pptx  changed format.pptx


A **repository**: a group of *linked* commits

<!-- offline: 
![](files/fig/threecommits.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/threecommits.png" >

Note: these form a Directed Acyclic Graph (DAG), with nodes identified by their *hash*.

A **hash**: a fingerprint of the content of each commit *and its parent*

In [19]:
import hashlib
data1 = 'date: 1/1/17'+'This is the start of my paper2.'
hashlib.sha1(data1.encode('utf-8')).hexdigest()

'792b6df404265ceed4c37b28fa4baca966383e0a'

And this is pretty much the essence of Git!
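To see that a commit's hash really depends on its parent, one can inspect the raw commit object in a throwaway repository. This is only a sketch: the directory comes from `mktemp`, and the file name `paper.txt`, the identity, and the commit messages are made up for illustration.

```shell
set -e
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is the start of my paper." > paper.txt
git add paper.txt
git commit -q -m "First snapshot"
first=$(git rev-parse HEAD)

echo "A second paragraph." >> paper.txt
git commit -q -a -m "Second snapshot"

# The raw commit object lists the tree (content), the parent, the author and the message.
git cat-file -p HEAD

# The parent recorded in the second commit is the hash of the first one.
parent=$(git rev-parse HEAD^)
echo "first commit:   $first"
echo "parent of HEAD: $parent"
```

Because the parent's hash is part of what gets hashed, changing anything in an earlier commit would change the hash of every commit that follows it.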

## First things first: git must be configured before first use

The minimal amount of configuration for git to work without pestering you is to tell it who you are:
(Fill in your details here)

In [23]:
%%bash
#Fill in your details here
git config --global user.name "Jason Kuruzovich"  
git config --global user.email "jkuruzovich@gmail.com"  
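To check what git has stored, `git config --get` reads a value back. The sketch below uses a scratch repository and repository-level (`--local`) settings so that it does not touch your global configuration; the name and address are placeholders.

```shell
set -e
repo=$(mktemp -d)   # scratch repository, for illustration only
cd "$repo"
git init -q .

# Repository-level settings live in .git/config and take precedence over --global ones.
git config --local user.name "Example User"
git config --local user.email "user@example.com"

# Read the values back.
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"
```

The same `--get` call works with `--global` once you have run the configuration cell above.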

Also set your favorite text editor:

| Editor             | Configuration command                            |
|:-------------------|:-------------------------------------------------|
| Atom | `$ git config --global core.editor "atom --wait"`|
| nano               | `$ git config --global core.editor "nano -w"`    |
| Text Wrangler (Mac)      | `$ git config --global core.editor "edit -w"`    |
| Sublime Text (Mac) | `$ git config --global core.editor "subl -n -w"` |
| Sublime Text (Win, 32-bit install) | `$ git config --global core.editor "'c:/program files (x86)/sublime text 3/sublime_text.exe' -w"` |
| Sublime Text (Win, 64-bit install) | `$ git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w"` |
| Notepad++ (Win, 32-bit install)    | `$ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"`|
| Notepad++ (Win, 64-bit install)    | `$ git config --global core.editor "'c:/program files/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"`|
| Kate (Linux)       | `$ git config --global core.editor "kate"`       |
| Gedit (Linux)      | `$ git config --global core.editor "gedit --wait --new-window"`   |
| Scratch (Linux)       | `$ git config --global core.editor "scratch-text-editor"`  |
| emacs              | `$ git config --global core.editor "emacs"`   |
| vim                | `$ git config --global core.editor "vim"`   |


Git also needs to know how you edit text files, since it will often ask you to edit commit messages and other information:

In [24]:
%%bash
# Put here your preferred editor. If this is not set, git will honor
# the $EDITOR environment variable
git config --global core.editor "atom --wait"  # Atom; pick your own editor from the table above

# On Windows, Notepad will do in a pinch, but I recommend Notepad++ as a free alternative.
# On the Mac, you can set nano or emacs as a basic option.

# And while we're at it, we also turn on the use of color, which is very useful
git config --global color.ui "auto"

Set git to use the credential memory cache so we don't have to retype passwords too frequently. On Linux, this is done with `git config --global credential.helper cache` (note that this requires git version 1.7.10 or newer).
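As a sketch of that setting, the commands below enable the cache and then extend its lifetime. The two-hour timeout is only an example value; choose whatever suits your machine.

```shell
# Keep credentials in memory (the default timeout is 15 minutes).
git config --global credential.helper cache

# Or keep them for two hours; the timeout is given in seconds.
git config --global credential.helper "cache --timeout=7200"
```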

## Stage 1: Local, single-user, linear workflow

Simply type `git` to see a full list of all the 'core' commands.  We'll now go through most of these via small practical exercises:

In [26]:
!git

usage: git [--version] [--help] [-C <path>] [-c name=value]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone      Clone a repository into a new directory
   init       Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add        Add file contents to the index
   mv         Move or rename a file, a directory, or a symlink
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect     Use binary search to find the commit that introduced a bug
  

### `git init`: create an empty repository

In [27]:
%%bash
rm -rf test
git init test

Initialized empty Git repository in /Users/jasonkuruzovich/githubdesktop/techfundamentals-fall2017-materials/classes/01-overview/test/.git/


**Note:** all these cells below are meant to be run by you in a terminal  where  you change *once* to the `test` directory and continue working there.

Since we are putting all of them here in a single notebook for the purposes of the tutorial, they will all be prepended with the first two lines:

    %%bash
    cd test

that tell IPython to do that each time.  But you should ignore those two lines and type the rest of each cell yourself in your terminal.

Let's look at what git did:

In [28]:
%%bash
cd test

ls

In [29]:
%%bash
cd test

ls -la

total 0
drwxr-xr-x   3 jasonkuruzovich  staff  102 Aug 31 12:24 .
drwxr-xr-x  15 jasonkuruzovich  staff  510 Aug 31 12:24 ..
drwxr-xr-x  10 jasonkuruzovich  staff  340 Aug 31 12:24 .git


In [30]:
%%bash
cd test

ls -l .git

total 24
-rw-r--r--   1 jasonkuruzovich  staff   23 Aug 31 12:24 HEAD
drwxr-xr-x   2 jasonkuruzovich  staff   68 Aug 31 12:24 branches
-rw-r--r--   1 jasonkuruzovich  staff  137 Aug 31 12:24 config
-rw-r--r--   1 jasonkuruzovich  staff   73 Aug 31 12:24 description
drwxr-xr-x  12 jasonkuruzovich  staff  408 Aug 31 12:24 hooks
drwxr-xr-x   3 jasonkuruzovich  staff  102 Aug 31 12:24 info
drwxr-xr-x   4 jasonkuruzovich  staff  136 Aug 31 12:24 objects
drwxr-xr-x   4 jasonkuruzovich  staff  136 Aug 31 12:24 refs


Now let's edit our first file in the test directory with a text editor... I'm doing it programmatically here for automation purposes, but you'd normally be editing by hand.

In [31]:
%%bash
cd test

echo "My first bit of text" > file1.txt

### `git add`: tell git about this new file

In [32]:
%%bash
cd test

git add file1.txt

We can now ask git about what happened with `status`:

In [33]:
%%bash
cd test

git status

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   file1.txt



### `git commit`: permanently record our changes in git's database

For now, we are *always* going to call `git commit` either with the `-a` option *or* with specific filenames (`git commit file1 file2...`).  This delays the discussion of an aspect of git called the *index* (often referred to also as the 'staging area') that we will cover later.  Most everyday work in regular scientific practice doesn't require understanding the extra moving parts that the index involves, so on a first round we'll bypass it.  Later on we will discuss how to use it to achieve more fine-grained control of what and how git records our actions.

In [34]:
%%bash
cd test

git commit -a -m"This is our first commit"

[master (root-commit) 314a40f] This is our first commit
 1 file changed, 1 insertion(+)
 create mode 100644 file1.txt
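One caveat worth seeing in practice: `git commit -a` only picks up changes to files git already tracks, so a brand-new file still needs `git add` first. A minimal sketch in a scratch repository (the file names and identity are placeholders):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "tracked" > tracked.txt
git add tracked.txt
git commit -q -m "Start tracking tracked.txt"

# Modify the tracked file and create a new, untracked one.
echo "more" >> tracked.txt
echo "new" > untracked.txt

# -a stages the modification to tracked.txt, but ignores untracked.txt.
git commit -q -a -m "Commit with -a"

# '??' in the short status marks files git is not tracking yet.
status=$(git status --short)
echo "$status"
```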


In the commit above, we used the `-m` flag to specify a message at the command line.  If we don't do that, git will open the editor we specified in our configuration above and require that we enter a message.  By default, git refuses to record changes that don't have a message to go along with them, and an empty string is rejected as well (though you can obviously 'cheat' by using a meaningless message or by passing `--allow-empty-message`: git only tries to facilitate best practices, it's not your nanny).

### `git log`: what has been committed so far

In [35]:
%%bash
cd test

git log

commit 314a40fadb854a08c3baf346dc43562db5b64cea
Author: Jason Kuruzovich <jkuruzovich@gmail.com>
Date:   Thu Aug 31 12:41:27 2017 -0400

    This is our first commit


### `git diff`: what have I changed?

Let's do a little bit more work... Again, in practice you'll be editing the files by hand; here we do it via shell commands for the sake of automation (and therefore the reproducibility of this tutorial!).

In [36]:
%%bash
cd test

echo "And now some more text..." >> file1.txt

And now we can ask git what is different:

In [37]:
%%bash
cd test

git diff

diff --git a/file1.txt b/file1.txt
index ce645c7..4baa979 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1,2 @@
 My first bit of text
+And now some more text...


### The cycle of git virtue: work, commit, work, commit, ...

In [38]:
%%bash
cd test

git commit -a -m"I have made great progress on this critical matter."

[master 2ce7082] I have made great progress on this critical matter.
 1 file changed, 1 insertion(+)


### `git log` revisited

First, let's see what the log shows us now:

In [39]:
%%bash
cd test

git log

commit 2ce7082363d00c8ddf6875e7bf15829f71f024df
Author: Jason Kuruzovich <jkuruzovich@gmail.com>
Date:   Thu Aug 31 12:41:54 2017 -0400

    I have made great progress on this critical matter.

commit 314a40fadb854a08c3baf346dc43562db5b64cea
Author: Jason Kuruzovich <jkuruzovich@gmail.com>
Date:   Thu Aug 31 12:41:27 2017 -0400

    This is our first commit


Sometimes it's handy to see a very summarized version of the log:

In [40]:
%%bash
cd test

git log --oneline --topo-order --graph

* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit


Git supports *aliases:* new names given to command combinations. Let's make this handy shortlog an alias, so we only have to type `git slog` and see this compact log:

In [41]:
%%bash
cd test

# We create our alias (this saves it in git's permanent configuration file):
git config --global alias.slog "log --oneline --topo-order --graph"

# And now we can use it
git slog

* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit


### `git mv` and `rm`: moving and removing files

While `git add` is used to add files to the list git tracks, we must also tell it if we want their names to change or for it to stop tracking them.  In familiar Unix fashion, the `mv` and `rm` git commands do precisely this:

In [42]:
%%bash
cd test

git mv file1.txt file-newname.txt
git status

On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	renamed:    file1.txt -> file-newname.txt



Note that these changes must be committed too, to become permanent!  In git's world, until something has been committed, it isn't permanently recorded anywhere.

In [43]:
%%bash
cd test

git commit -a -m"I like this new name better"
echo "Let's look at the log again:"
git slog

[master ff677cb] I like this new name better
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename file1.txt => file-newname.txt (100%)
Let's look at the log again:
* ff677cb I like this new name better
* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit


And `git rm` works in a similar fashion.
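A minimal sketch of `git rm`, run in a scratch repository so it does not interfere with `test` (the file name `notes.txt` and the identity are placeholders):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "temporary notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# git rm deletes the file from the working directory and stages the removal.
git rm -q notes.txt
git status --short

# As with git mv, the removal only becomes permanent once committed.
git commit -q -m "Remove notes.txt"
tracked=$(git ls-files)
echo "tracked files: '$tracked'"
```

The file is gone from the working directory and from the list of tracked files, but the earlier commits still contain it.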

### Exercise

Add a new file `file2.txt`, commit it, make some changes to it, commit them again, and then remove it (and don't forget to commit this last step!).

## Stage 2: Single local user, branching

What is a branch?  Simply a *label for the 'current' commit in a sequence of ongoing commits*:

<!-- offline: 
![](files/fig/masterbranch.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/masterbranch.png" >

There can be multiple branches alive at any point in time; the working directory reflects the commit that a special pointer called HEAD refers to.  In this example there are two branches, *master* and *testing*, and *testing* is the currently active branch since it's what HEAD points to:

<!-- offline: 
![](files/fig/HEAD_testing.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/HEAD_testing.png" >

Once new commits are made on a branch, HEAD and the branch label move with the new commits:

<!-- offline: 
![](files/fig/branchcommit.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/branchcommit.png" >

This allows the history of both branches to diverge:

<!-- offline: 
![](files/fig/mergescenario.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/mergescenario.png" >

But based on this graph structure, git can compute the necessary information to merge the divergent branches back and continue with a unified line of development:
    
<!-- offline: 
![](files/fig/mergeaftermath.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/mergeaftermath.png" >
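The idea that a branch is just a movable label can be checked directly. The sketch below runs in a scratch repository; the branch name `testing` follows the figures, while the file name and identity are placeholders. The name of the initial branch is read from git rather than assumed, since it may be `master` or `main` depending on your setup.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

main_branch=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on your git setup
git branch testing

# Right after creation, both labels point at the same commit.
before_main=$(git rev-parse "$main_branch")
before_testing=$(git rev-parse testing)

# HEAD is a pointer to the active branch.
git checkout -q testing
head_ref=$(git symbolic-ref HEAD)
echo "HEAD -> $head_ref"

# A new commit moves 'testing' (and HEAD with it); the other label stays put.
echo "two" >> file.txt
git commit -q -a -m "Work on testing"
after_main=$(git rev-parse "$main_branch")
after_testing=$(git rev-parse testing)
echo "$main_branch: $after_main"
echo "testing: $after_testing"
```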

Let's now illustrate all of this with a concrete example.  Let's get our bearings first:

In [44]:
%%bash
cd test

git status
ls

On branch master
nothing to commit, working tree clean
file-newname.txt


We are now going to try two different routes of development: on the `master` branch we will add one file and on the `experiment` branch, which we will create, we will add a different one.  We will then merge the experimental branch into `master`.

In [45]:
%%bash
cd test

git branch experiment
git checkout experiment

Switched to branch 'experiment'


In [46]:
%%bash
cd test

echo "Some crazy idea" > experiment.txt
git add experiment.txt
git commit -a -m"Trying something new"
git slog

[experiment aca9da9] Trying something new
 1 file changed, 1 insertion(+)
 create mode 100644 experiment.txt
* aca9da9 Trying something new
* ff677cb I like this new name better
* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit


In [47]:
%%bash
cd test

git checkout master
git slog

* ff677cb I like this new name better
* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit


Switched to branch 'master'


In [48]:
%%bash
cd test

echo "All the while, more work goes on in master..." >> file-newname.txt
git commit -a -m"The mainline keeps moving"
git slog

[master b680adf] The mainline keeps moving
 1 file changed, 1 insertion(+)
* b680adf The mainline keeps moving
* ff677cb I like this new name better
* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit


In [49]:
%%bash
cd test

ls

file-newname.txt


In [50]:
%%bash
cd test

git merge experiment
git slog

Merge made by the 'recursive' strategy.
 experiment.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 experiment.txt
*   4309a12 Merge branch 'experiment'
|\  
| * aca9da9 Trying something new
* | b680adf The mainline keeps moving
|/  
* ff677cb I like this new name better
* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit
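What makes the merge commit special is that it records two parents, one for each line of development it joins. A sketch that replays the same scenario in a scratch repository and counts them (file contents and identity are placeholders):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
main_branch=$(git symbolic-ref --short HEAD)

# Diverge: one commit on 'experiment', one on the main line.
git checkout -q -b experiment
echo "idea" > experiment.txt
git add experiment.txt
git commit -q -m "Trying something new"
git checkout -q "$main_branch"
echo "more" >> file.txt
git commit -q -a -m "The mainline keeps moving"

# --no-edit accepts the default merge message instead of opening an editor.
git merge -q --no-edit experiment

# rev-list --parents prints the commit followed by each of its parents.
parents=$(git rev-list --parents -n 1 HEAD)
echo "$parents"
n_parents=$(( $(echo "$parents" | wc -w) - 1 ))
echo "number of parents: $n_parents"
```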


## Stage 3: Using remotes as a single user

We are now going to introduce the concept of a *remote repository*: a pointer to another copy of the repository that lives on a different location.  This can be simply a different path on the filesystem or a server on the internet.

For this discussion, we'll be using remotes hosted on the [GitHub.com](http://github.com) service, but you can equally use other services like [BitBucket](http://bitbucket.org) as well as host your own.
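Since a remote can be a plain path on the filesystem, the whole push and clone cycle can be tried without any network access. In this sketch a bare repository called `server.git` (a made-up name) stands in for the hosting service; the directory names `test` and `test2` mirror the ones used in this tutorial.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository (no working files) plays the role of the server.
git init -q --bare server.git

git init -q test
cd test
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)

# Register the path as a remote called 'origin' and publish our branch to it.
git remote add origin "$work/server.git"
git push -q -u origin "$branch"
git remote -v

# Cloning from the same path simulates a second computer.
cd "$work"
git clone -q -b "$branch" server.git test2
cloned=$(cat test2/file.txt)
echo "$cloned"
```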

In [56]:
%%bash
cd test

ls
echo "Let's see if we have any remote repositories here:"
git remote -v

experiment.txt
file-newname.txt
Let's see if we have any remote repositories here:
origin	https://github.com/jkuruzovich/test.git (fetch)
origin	https://github.com/jkuruzovich/test.git (push)


On a fresh repository you shouldn't see a remote here. If one is listed, as in the output above, `git remote rm origin` will remove it.

When the `git remote -v` call produces no output, it means we have no remote repositories configured.  We will now proceed to add one.  Once logged into GitHub, go to the [new repository page](https://github.com/new) and make a repository called `test`.  Do **not** check the box that says `Initialize this repository with a README`, since we already have an existing repository here.  That option is useful when you're starting first at GitHub and don't have a repo made already on a local computer.

We can now follow the instructions from the next page:

In [55]:
%%bash
cd test

git remote add origin https://github.com/jkuruzovich/test.git
git push -u origin master

Branch master set up to track remote branch master from origin.


To https://github.com/jkuruzovich/test.git
 * [new branch]      master -> master


Let's see the remote situation again:

In [54]:
%%bash
cd test
git remote -v

We can now [see this repository publicly on github](https://github.com/jkuruzovich/test).

Let's see how this can be useful for backup and syncing work between two different computers.  I'll simulate a 2nd computer by working in a different directory...

In [58]:
%%bash

# Here I clone my 'test' repo but with a different name, test2, to simulate a 2nd computer
git clone https://github.com/jkuruzovich/test.git test2
cd test2
pwd
git remote -v

/Users/jasonkuruzovich/githubdesktop/techfundamentals-fall2017-materials/classes/01-overview/test2
origin	https://github.com/jkuruzovich/test.git (fetch)
origin	https://github.com/jkuruzovich/test.git (push)


Cloning into 'test2'...


Let's now make some changes in one 'computer' and synchronize them on the second.

In [59]:
%%bash
cd test2  # working on computer #2

echo "More new content on my experiment" >> experiment.txt
git commit -a -m"More work, on machine #2"

[master 1706ab6] More work, on machine #2
 1 file changed, 1 insertion(+)


Now we put this new work up on the GitHub server so it's available from the internet:

In [60]:
%%bash
cd test2

git push

To https://github.com/jkuruzovich/test.git
   4309a12..1706ab6  master -> master


Now let's fetch that work from machine #1:

In [61]:
%%bash
cd test

git pull

Updating 4309a12..1706ab6
Fast-forward
 experiment.txt | 1 +
 1 file changed, 1 insertion(+)


From https://github.com/jkuruzovich/test
   4309a12..1706ab6  master     -> origin/master


### An important aside: conflict management

While git is very good at merging, if two different branches modify the same file in the same location, it simply can't decide which change should prevail.  At that point, human intervention is necessary to make the decision.  Git will help you by marking the location in the file that has a problem, but it's up to you to resolve the conflict.  Let's see how that works by intentionally creating a conflict.

We start by creating a branch and making a change to our experiment file:

In [62]:
%%bash
cd test

git branch trouble
git checkout trouble
echo "This is going to be a problem..." >> experiment.txt
git commit -a -m"Changes in the trouble branch"

[trouble 1638726] Changes in the trouble branch
 1 file changed, 1 insertion(+)


Switched to branch 'trouble'


And now we go back to the master branch, where we change the *same* file:

In [63]:
%%bash
cd test

git checkout master
echo "More work on the master branch..." >> experiment.txt
git commit -a -m"Mainline work"

Your branch is up-to-date with 'origin/master'.
[master 60667d2] Mainline work
 1 file changed, 1 insertion(+)


Switched to branch 'master'


So now let's see what happens if we try to merge the `trouble` branch into `master`:

In [64]:
%%bash
cd test

git merge trouble

Auto-merging experiment.txt
CONFLICT (content): Merge conflict in experiment.txt
Automatic merge failed; fix conflicts and then commit the result.


Let's see what git has put into our file:

In [65]:
%%bash
cd test

cat experiment.txt

Some crazy idea
More new content on my experiment
<<<<<<< HEAD
More work on the master branch...
=======
This is going to be a problem...
>>>>>>> trouble


At this point, we go into the file with a text editor, decide which changes to keep, remove the marker lines, and make a new commit that records our decision.  In this case I decided that both pieces of text were useful, but integrated them with some changes.  Note that the output below still shows the conflict markers; make sure they are gone from your own file before you commit:

In [66]:
%%bash
cd test

cat experiment.txt

Some crazy idea
More new content on my experiment
<<<<<<< HEAD
More work on the master branch...
=======
This is going to be a problem...
>>>>>>> trouble


Let's then make our new commit:

In [67]:
%%bash
cd test

git commit -a -m"Completed merge of trouble, fixing conflicts along the way"
git slog

[master d29d792] Completed merge of trouble, fixing conflicts along the way
*   d29d792 Completed merge of trouble, fixing conflicts along the way
|\  
| * 1638726 Changes in the trouble branch
* | 60667d2 Mainline work
|/  
* 1706ab6 More work, on machine #2
*   4309a12 Merge branch 'experiment'
|\  
| * aca9da9 Trying something new
* | b680adf The mainline keeps moving
|/  
* ff677cb I like this new name better
* 2ce7082 I have made great progress on this critical matter.
* 314a40f This is our first commit
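The whole conflict cycle can be replayed in a scratch repository. This is only a sketch: here the resolution is written with `printf` rather than in an editor, and the file contents and identity are placeholders modelled on the example above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "Some crazy idea" > experiment.txt
git add experiment.txt
git commit -q -m "Base"
main_branch=$(git symbolic-ref --short HEAD)

# Two branches append a different line at the same place.
git checkout -q -b trouble
echo "This is going to be a problem..." >> experiment.txt
git commit -q -a -m "Changes in the trouble branch"
git checkout -q "$main_branch"
echo "More work on the main branch..." >> experiment.txt
git commit -q -a -m "Mainline work"

# The merge stops with a conflict, so its non-zero exit status is expected here.
git merge trouble || echo "merge stopped, conflict to resolve"
markers_before=$(grep -c '^[<=>]\{7\}' experiment.txt)

# Resolve by writing the text we want to keep, without any marker lines.
printf '%s\n' "Some crazy idea" \
  "More work on the main branch..." \
  "This is going to be a problem..." > experiment.txt

# Committing with -a records the resolution and completes the merge.
git commit -q -a -m "Completed merge of trouble, fixing conflicts along the way"
markers_after=$(grep -c '^[<=>]\{7\}' experiment.txt || true)
echo "marker lines before: $markers_before, after: $markers_after"
```

Counting the marker lines before committing is a cheap safeguard against recording a half-resolved file.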


*Note:* While it's a good idea to understand the basics of fixing merge conflicts by hand, in some cases you may find the use of an automated tool useful.  Git supports multiple [merge tools](https://www.kernel.org/pub/software/scm/git/docs/git-mergetool.html): a merge tool is a piece of software that conforms to a basic interface and knows how to merge two files into a new one.  Since these are typically graphical tools, there are various to choose from for the different operating systems, and as long as they obey a basic command structure, git can work with any of them.

## Stage 4: Sharing GitHub with a Small Team

Single remote with shared access: we are going to set up a shared collaboration with one partner (the person sitting next to you).  This will show the basic workflow of collaborating on a project with a small team where everyone has write privileges to the same repository.  


We will have two people, let's call them Alice and Bob, sharing a repository.  Alice will be the owner of the repo and she will give Bob write privileges.  

We begin with a simple synchronization example, much like we just did above, but now between *two people* instead of one person.  Otherwise it's the same:

- Bob clones Alice's repository.
- Bob makes changes to a file and commits them locally.
- Bob pushes his changes to github.
- Alice pulls Bob's changes into her own repository.

Next, we will have both parties make non-conflicting changes each, and commit them locally.  Then both try to push their changes:

- Alice adds a new file, `alice.txt` to the repo and commits.
- Bob adds `bob.txt` and commits.
- Alice pushes to github.
- Bob tries to push to github.  What happens here?

The problem is that Bob's local history does not contain Alice's latest commit, so his push would overwrite her work on the server and git refuses it.  It forces Bob to first do the merge on his machine, so that if there is a conflict in the merge, Bob deals with the conflict manually (git could try to do the merge on the server, but in that case if there's a conflict, the server repo would be left in a conflicted state without a human to fix things up).  The solution is for Bob to first pull the changes (pull in git is really fetch+merge), and then push again.
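The Alice and Bob scenario can be rehearsed on a single machine. In this sketch a local bare repository named `shared.git` (a made-up name) stands in for GitHub, and two directories play the two collaborators. The `--no-rebase` flag makes the pull an explicit fetch+merge, and `--no-edit` accepts the default merge message.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git          # stands in for the shared GitHub repository

# Alice creates the project and publishes it.
git init -q alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "project" > README.txt
git add README.txt
git commit -q -m "Start project"
branch=$(git symbolic-ref --short HEAD)
git remote add origin "$work/shared.git"
git push -q -u origin "$branch"

# Bob clones it.
cd "$work"
git clone -q -b "$branch" shared.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"

# Both add a different file; Alice pushes first.
cd "$work/alice"
echo "from alice" > alice.txt
git add alice.txt
git commit -q -m "Add alice.txt"
git push -q
cd "$work/bob"
echo "from bob" > bob.txt
git add bob.txt
git commit -q -m "Add bob.txt"

# Bob's push is refused because his history lacks Alice's latest commit.
if git push -q 2>/dev/null; then first_push=accepted; else first_push=rejected; fi
echo "Bob's first push: $first_push"

# Pull (fetch + merge), then push again.
git pull -q --no-rebase --no-edit
git push -q
ls
```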

## Stage 5: GitHub Flow and Large Teams
- This is the typical workflow of introducing changes via Git in team projects. 
- [https://guides.github.com/introduction/flow/](https://guides.github.com/introduction/flow/)
![](https://guides.github.com/activities/hello-world/branching.png)

Multiple remotes and merging based on pull request workflow: this is beyond the scope of this brief tutorial, so we'll simply discuss how it works very briefly, illustrating it with the activity on the [JupyterHub on Kubernetes repository](https://github.com/jupyterhub/helm-chart).

## Other useful commands

- [show](http://www.kernel.org/pub/software/scm/git/docs/git-show.html)
- [reflog](http://www.kernel.org/pub/software/scm/git/docs/git-reflog.html)
- [rebase](http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html)
- [tag](http://www.kernel.org/pub/software/scm/git/docs/git-tag.html)
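A quick taste of three of these commands, as a sketch in a scratch repository (the tag name `v0.1`, the file name, and the messages are made up for illustration; `rebase` is left to the linked documentation):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft" > paper.txt
git add paper.txt
git commit -q -m "First draft"

# tag: give a memorable name to a commit (here an annotated tag).
git tag -a v0.1 -m "Version sent to co-authors"
tagged=$(git rev-parse "v0.1^{commit}")

# show: display a commit together with a summary of its changes.
git show --stat HEAD

# reflog: list where HEAD has pointed recently, useful for recovering from mistakes.
git reflog
```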

## Git resources

### Introductory materials

There are lots of good tutorials and introductions for Git, which you
can easily find yourself; this is just a short list of things I've found
useful.  For a beginner, I would recommend the following 'core' reading list, and
below I mention a few extra resources:

1. The smallest, and in the style of this tutorial: [git - the simple guide](http://rogerdudler.github.com/git-guide)
contains 'just the basics'.  Very quick read.

1.  The concise [Git Reference](http://gitref.org): compact but with
    all the key ideas. If you only read one document, make it this one.

1. In my own experience, the most useful resource was [Understanding Git
Conceptually](http://www.sbf5.com/~cduan/technical/git).
Git has a reputation for being hard to use, but I have found that with a
clear view of what is actually a *very simple* internal design, its
behavior is remarkably consistent, simple and comprehensible.

1.  For more detail, see the start of the excellent [Pro
    Git](http://progit.org/book) online book, or similarly the early
    parts of the [Git community book](http://book.git-scm.com). Pro
    Git's chapters are very short and well illustrated; the community
    book tends to have more detail and has nice screencasts at the end
    of some sections.

If you are really impatient and just want a quick start, this [visual git tutorial](http://www.ralfebert.de/blog/tools/visual_git_tutorial_1)
may be sufficient. It is nicely illustrated with diagrams that show what happens on the filesystem.

For windows users, [an Illustrated Guide to Git on Windows](http://nathanj.github.com/gitguide/tour.html) is useful in that
it contains also some information about handling SSH (necessary to interface with git hosted on remote servers when collaborating) as well
as screenshots of the Windows interface.

Cheat sheets
:   Two different
    [cheat](http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html)
    [sheets](http://jan-krueger.net/development/git-cheat-sheet-extended-edition)
    in PDF format that can be printed for frequent reference.

### Beyond the basics

At some point, it will pay off to understand how git itself is *built*.  These two documents, written in a similar spirit, 
are probably the most useful descriptions of the Git architecture short of diving into the actual implementation.  They walk you through
how you would go about building a version control system with a little story. By the end you realize that Git's model is almost
an inevitable outcome of the proposed constraints:

* The [Git parable](http://tom.preston-werner.com/2009/05/19/the-git-parable.html) by Tom Preston-Werner.
* [Git foundations](http://matthew-brett.github.com/pydagogue/foundation.html) by Matthew Brett.

[Git ready](http://www.gitready.com)
:   A great website of posts on specific git-related topics, organized
    by difficulty.

[QGit](http://sourceforge.net/projects/qgit/): an excellent Git GUI
:   Git ships by default with gitk and git-gui, a pair of Tk graphical
    clients to browse a repo and to operate in it. I personally have
    found [qgit](http://sourceforge.net/projects/qgit/) to be nicer and
    easier to use. It is available on modern linux distros, and since it
    is based on Qt, it should run on OSX and Windows.

[Git Magic](http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html)
:   Another book-size guide that has useful snippets.

The [learning center](http://learn.github.com) at Github
:   Guides on a number of topics, some specific to github hosting but
    much of it of general value.

A [port](http://cworth.org/hgbook-git/tour) of the Hg book's beginning
:   The [Mercurial book](http://hgbook.red-bean.com) has a reputation
    for clarity, so Carl Worth decided to
    [port](http://cworth.org/hgbook-git/tour) its introductory chapter
    to Git. It's a nicely written intro, which is possible in good
    measure because of how similar the underlying models of Hg and Git
    ultimately are.

[Intermediate tips](http://andyjeffries.co.uk/articles/25-tips-for-intermediate-git-users)
:   A set of tips that contains some very valuable nuggets, once you're
    past the basics.

Finally, if you prefer a video presentation, this 1-hour tutorial prepared by the GitHub educational team will walk you through the entire process: