# Version Control With Git

**Based on materials by Katy Huff, Anthony Scopatz, Joshua R. Smith, Sri 
Hari Krishna Narayanan, Matthew Gidden, and Adam Klingler**

## Local Operations

## git : What is Version Control ?

Very briefly, version control is a way to keep a backup of changing
files, to store a history of those changes, and most importantly to
allow many people in a collaboration to make changes to the same files
concurrently. There are a lot of version control systems. Wikipedia
provides both a nice vocabulary list and a fairly complete table of some
popular version control systems and their equivalent commands.

Today, we'll be using git. Git is an example of a distributed version
control system, distinct from centralized version control systems. I'll
make the distinction clear later, but for now, the table below will
suffice.

Version Control System Tool Options

- **Distributed** 
  - Decentralized CVS (dcvs)  
  - mercurial (hg)
  - git (git) 
  - bazaar (bzr)
  
- **Centralized**
  - concurrent versions system (cvs)
  - subversion (svn)

## git --help : Getting Help

The first thing I like to know about any tool is how to get help. Let's see what happens when we type :

```bash
$ git --help
```

Excellent, it gives a list of commands it is able to help with, as well
as their descriptions.

```bash
$ git --help
usage: git [--version] [--exec-path[=<path>]] [--html-path]
           [-p|--paginate|--no-pager] [--no-replace-objects]
           [--bare] [--git-dir=<path>] [--work-tree=<path>]
           [-c name=value] [--help]
           <command> [<args>]

The most commonly used git commands are:
   add        Add file contents to the index
   bisect     Find by binary search the change that introduced a bug
   branch     List, create, or delete branches
   checkout   Checkout a branch or paths to the working tree
   clone      Clone a repository into a new directory
   commit     Record changes to the repository
   diff       Show changes between commits, commit and working tree, etc
   fetch      Download objects and refs from another repository
   grep       Print lines matching a pattern
   init       Create an empty git repository or reinitialize an existing one
   log        Show commit logs
   merge      Join two or more development histories together
   mv         Move or rename a file, a directory, or a symlink
   pull       Fetch from and merge with another repository or a local branch
   push       Update remote refs along with associated objects
   rebase     Forward-port local commits to the updated upstream head
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index
   show       Show various types of objects
   status     Show the working tree status
   tag        Create, list, delete or verify a tag object signed with GPG

See 'git help <command>' for more information on a specific command.
```

## git config : Controls the behavior of git

```bash
$ git config --global user.name "YOUR NAME"
$ git config --global user.email "YOUR EMAIL"
```     
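
To check what git has recorded, you can read the values back. The sketch below is a minimal example that assumes a throwaway repository and uses `--local` settings so your real global configuration isn't touched; the name and email are placeholders.

```bash
set -eu
demo=$(mktemp -d)                 # throwaway directory for this sketch
cd "$demo"
git init -q

# --local stores the value in this repository's .git/config only
git config --local user.name "Ada Example"
git config --local user.email "ada@example.com"

# Read the values back
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"
```

With `--global` instead of `--local`, the same values would apply to every repository you use on this machine.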

## git init : Creating a Local Repository

To keep track of numerous versions of your work without saving numerous
copies, you can make a local repository for it on your computer. Each
time you commit, git records a snapshot of your files. Content that has
not changed is not stored again, and git compresses what it does store,
so the history stays compact. With this information, git is able to
recreate any version on demand.

To create your own local (on your own machine) repository, you must
initialize the repository with the infrastructure git needs in order to
keep a record of things within the repository that you're concerned
about. The command to do this is **git init** .

* * * * 
### Exercise : Create a Local Repository

Step 1 : Initialize your repository.

```bash
$ cd
$ mkdir simplestats
$ cd simplestats
$ git init
Initialized empty Git repository in /home/me/simplestats/.git/
# git has created the hidden .git directory. It holds everything git needs to track your project: the commit history, remote repository addresses, and so on.
# The simplestats folder is your repo. Run git commands from inside it, not from inside .git.
```

Step 2 : Browse the directory's hidden files to see what happened here.
Open directories, browse file contents. Learn what you can in a minute.

```bash
$ ls -A
.git
$ cd .git
$ ls -A
HEAD        config      description hooks       info        objects     refs 
```

Step 3 : Use what you've learned. You may have noticed the file called
description. You can describe your repository by opening the description
file and replacing the text with a name for the repository.  We will be
creating a module with some simple statistical methods, so mine will be
called "Some simple methods for statistical analysis". You may call yours 
anything you like.

```bash
$ nano description
```

Step 4 : Applications sometimes create files that are not needed. For
example, emacs creates a temporary file called 'filename~' when you edit
the file 'filename'.  You can ask git to ignore such files by editing
the file '.git/info/exclude'. Edit the file to ignore files that end with '~'.

```bash
# git ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~
```
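
To see the effect of an exclude pattern, here is a minimal sketch that assumes a scratch repository (the directory and file names are made up for illustration). It adds `*~` to the exclude file and checks that git no longer reports the backup file.

```bash
set -eu
demo=$(mktemp -d)                 # scratch repository for this sketch
cd "$demo"
git init -q

# Add the pattern '*~' to this repository's private exclude file
mkdir -p .git/info
echo '*~' >> .git/info/exclude

touch notes.txt 'notes.txt~'      # a real file and an editor backup file

# --porcelain gives stable, script-friendly output
status=$(git status --porcelain)
echo "$status"                    # prints: ?? notes.txt
```

Only `notes.txt` is listed as untracked; `notes.txt~` matches the pattern and is ignored.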
    
* * * *

## git add : Adding a File To Version Control

For the git repository to know which files within this directory you
would like to keep track of, you must add them. First, you'll need to
create one, then we'll learn the **git add** command.

* * * * 
### Exercise : Add a File to Your Local Repository

Step 1 : Create a file to add to your repository.

```bash
$ cd ../ 
# Now you are in your repository simplestats
$ touch README.md
```

Step 2 : Inform git that you would like to keep track of future changes
in this file.

```bash
$ git add README.md
```

* * * * 

## git status : Checking the Status of Your Local Copy

The files you've created on your machine are your local "working" copy.
The changes you make in this local copy aren't backed up online
automatically. Until you commit them, the changes you make are local
changes. When you change anything, your set of files becomes different
from the files in the official repository copy. To find out what's
different about them in the terminal, try:

```bash
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   README.md
#
```

This result indicates that the only difference between the repository
HEAD (which, so far, is empty) and your `simplestats` directory is the
new README.md file, which is staged and ready to be committed.
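
A compact way to watch a file move from untracked to staged is the `--short` flag. The following is a small sketch in a scratch repository (the directory is a temporary one, not your `simplestats` repo).

```bash
set -eu
demo=$(mktemp -d)                 # scratch repository for this sketch
cd "$demo"
git init -q

touch README.md
before=$(git status --short)      # '?? README.md' : untracked
git add README.md
after=$(git status --short)       # 'A  README.md' : staged as a new file
printf '%s\n%s\n' "$before" "$after"
```

The two-character code at the start of each line describes the staged state (first column) and the working-copy state (second column).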

## git commit : Saving a Snapshot

In order to save a snapshot of the current state (revision) of the
repository, we use the commit command. This command is always associated
with a message describing the changes since the last commit and
indicating their purpose. Informative commit messages will serve you
well someday, so make a habit of never committing changes without at
least a full sentence description.

**ADVICE: Commit often**

In the same way that it is wise to often save a document that you are
working on, so too is it wise to save numerous revisions of your code.
More frequent commits increase the granularity of your **undo** button.

**ADVICE: Good commit messages**

There are no hard and fast rules, but good commits are atomic: each one is the smallest change that remains meaningful. A good commit message usually contains a one-line description followed by a longer explanation if necessary.

[Our repo](https://bitbucket.org/scopatz/aims-scicomp) has some good commit messages.
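
One way to write a one-line description plus a longer explanation from the command line is to pass `-m` twice. This is a sketch in a scratch repository; the identity and message text are placeholders.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# simplestats" > README.md
git add README.md

# Each -m becomes its own paragraph: the first is the one-line
# description, the second is the longer explanation.
git commit -q -m "Add a README describing the project." \
              -m "The README gives new users a short summary of what simplestats is for."

subject=$(git log -1 --format=%s)          # one-line description only
body=$(git log -1 --format=%b)             # longer explanation only
echo "$subject"
echo "$body"
```

If you leave out `-m` entirely, git opens your editor so you can write the message there instead.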

* * * 
### Exercise : Commit Your Changes

Step 1 : Commit the file you've added to your repository.

    $ git commit -am "This is the first commit. It adds a readme file."
    [master (root-commit) 1863aef] This is the first commit. It adds a readme file.
     1 files changed, 2 insertions(+), 0 deletions(-)
     create mode 100644 README.md

Step 2 : Admire your work.

    $ git status
    # On branch master
    nothing to commit (working directory clean)

* * * 

## git log : Viewing the History

A log of the commit messages is kept by the repository and can be
reviewed with the log command.

    $ git log
    commit 1863aefd7db752f58226264e5f4282bda641ddb3
    Author: Joshua Smith <joshua.r.smith@gmail.com>
    Date:   Wed Feb 8 16:08:08 2012 -0600

        This is the first commit. It adds a readme file.

There are some useful flags for this command, such as

    -p
    -3
    --stat
    --oneline
    --graph
    --pretty=short/full/fuller/oneline
    --since=X.minutes/hours/days/weeks/months/years or YY-MM-DD-HH:MM
    --until=X.minutes/hours/days/weeks/months/years or YY-MM-DD-HH:MM
    --author=<pattern>
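
As a rough illustration of how these flags combine, the sketch below builds a small history in a scratch repository (placeholder identity and messages) and then views it two ways.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

# Make four small commits so there is some history to look at
for n in 1 2 3 4; do
    echo "line $n" >> README.md
    git add README.md
    git commit -q -m "Add line $n to the README."
done

# One line per commit, limited to the three most recent commits
recent=$(git log --oneline -3)
echo "$recent"

# Per-file change summary for the latest commit only
git log --stat -1
```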

## git tag : Marking an important commit

Tagging allows you to give a commit a more readable name than a random string of characters. This can be useful for skimming what has happened as well as making it easier to "undo" to this point, as will be shown later.

* * * 
### Exercise : Rename your commit

Step 1 : Find the unique identifier for the commit.

    $ git log 
    commit 1863aefd7db752f58226264e5f4282bda641ddb3
    Author: Joshua Smith <joshua.r.smith@gmail.com>
    Date:   Wed Feb 8 16:08:08 2012 -0600

        This is the first commit. It adds a readme file.

Step 2 : The identifier is next to the ```commit``` line. Give it a more useful tag.

    $ git tag readme 1863aefd7db752f58226264e5f4282bda641ddb3

Step 3 : Check that the tag worked.
    
    $ git log 
    commit 1863aefd7db752f58226264e5f4282bda641ddb3 (tag: readme)
    Author: Joshua Smith <joshua.r.smith@gmail.com>
    Date:   Wed Feb 8 16:08:08 2012 -0600

        This is the first commit. It adds a readme file.
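
The same steps can be scripted without copying the identifier by hand. This is a sketch in a scratch repository with a placeholder identity; `git rev-parse` is used here to look up which commit a name points to.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

touch README.md
git add README.md
git commit -q -m "This is the first commit. It adds a readme file."

commit_id=$(git rev-parse HEAD)            # full identifier of the latest commit
git tag readme "$commit_id"

git tag                                    # lists the tags: readme
tagged=$(git rev-parse readme)             # the commit the tag points to
echo "$tagged"
```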

* * * 

## git diff : Viewing the Differences

There are many diff tools.

If you have a favorite you can set your default git diff tool to execute
that one. Git, however, comes with its own diff system.

Let's recall the behavior of the diff command on the command line.
Choosing two files that are similar, the command :

    $ diff file1 file2

will output the lines that differ between the two files. This
information can be saved as what's known as a patch, but we won't go
deeply into that just now.

The only difference between the command line diff tool and git's diff
tool is that the git tool is aware of all of the revisions in your
repository, allowing each revision of each file to be treated as a full
file.

Thus, git diff will output the changes in your working directory that
are not yet staged for a commit. To see how this works, make a change in
your README.md file, but don't yet commit it.

    $ git diff

A summarized version of this output can be output with the --stat flag :

    $ git diff --stat

To see only the differences in a certain path, try:

    $ git diff HEAD -- [path]

To see what IS staged for commit (that is, what will be committed if you
type git commit without the -a flag), you can try :

    $ git diff --cached
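
To make the staged versus unstaged distinction concrete, here is a small sketch in a scratch repository (placeholder identity and file contents). It uses `--name-only` so the output is just the list of changed files.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "first line" > README.md
git add README.md
git commit -q -m "Add a README."

echo "second line" >> README.md            # modified, but not staged
unstaged=$(git diff --name-only)           # README.md

git add README.md                          # now the change is staged
unstaged_after_add=$(git diff --name-only) # empty: nothing left unstaged
staged=$(git diff --cached --name-only)    # README.md

echo "unstaged before add: $unstaged"
echo "unstaged after add:  $unstaged_after_add"
echo "staged after add:    $staged"
```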

## git reset : Unstaging a staged file

There are a number of ways that you may accidentally stage a file that
you don't want to commit.  Create a file called `temp_notes` that
describes what you had for breakfast, and then add that file to your
repo.  Check with `status` to see that it is added but not committed.

You can now unstage that file with:

```bash
$ git reset temp_notes
```

Check with `status`.
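
The whole round trip looks roughly like this in a scratch repository (placeholder identity and file contents):

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

touch README.md
git add README.md
git commit -q -m "Add a README."

echo "toast and coffee" > temp_notes
git add temp_notes
staged=$(git status --short)               # 'A  temp_notes' : staged

git reset -q temp_notes                    # unstage; the file itself is untouched
unstaged=$(git status --short)             # '?? temp_notes' : untracked again

printf '%s\n%s\n' "$staged" "$unstaged"
```

Note that `git reset` with a file name only changes what is staged. The file stays on disk with its contents intact.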

## git checkout : Discarding unstaged modifications (git checkout has other purposes)

Perhaps you have made a number of changes that you realize are not
going anywhere.  Add a line to `README.md` that describes your dinner
last night.  Check with `status` to see that the file is changed and
ready to be added.

You can now return to the previously checked-in version with:

    $ git checkout -- README.md

Check with `status` and take a look at the file.
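
Here is a sketch of the same sequence in a scratch repository (placeholder identity and text). Be aware that the discarded edit cannot be recovered afterwards, since it was never committed.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# simplestats" > README.md
git add README.md
git commit -q -m "Add a README."

echo "Dinner was pasta." >> README.md      # an edit we decide to throw away
git status --short                         # ' M README.md' : modified, unstaged

git checkout -- README.md                  # restore the committed version
contents=$(cat README.md)
clean=$(git status --short)                # empty: nothing differs from HEAD
echo "$contents"
```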

## git rm : Removing files

There are a variety of reasons you may want to remove a file from the
repository after it has been committed.  Create a file called
`READYOU.md` with the first names of all your immediate family
members, and add/commit it to the repository.

You can now remove the file from the repository with:

    git rm READYOU.md

List the directory to see that you have no file named `READYOU.md`.
Use `status` to determine if you need any additional steps.

What if you delete a file in the shell without `git rm`? Try deleting
`README.md`

     rm README.md

What does `git status` say?  Oops! How can you recover this important
file?

     git checkout -- README.md
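
Both cases, removing a file on purpose and recovering one deleted by accident, are sketched below in a scratch repository. The identity and file contents are placeholders.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# simplestats" > README.md
echo "Ada, Grace, Alan" > READYOU.md
git add README.md READYOU.md
git commit -q -m "Add README and READYOU files."

# Intentional removal: git rm deletes the file and stages the deletion
git rm -q READYOU.md
git status --short                         # 'D  READYOU.md' : deletion staged
git commit -q -m "Remove READYOU.md."

# Accidental removal: plain rm deletes the file but stages nothing
rm README.md
git status --short                         # ' D README.md' : deletion not staged
git checkout -- README.md                  # bring the committed copy back
ls
```

After `git rm`, the deletion still has to be committed, which is the "additional step" that `status` hints at.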


## git revert : the promised "undo" button

It is possible that after many commits, you decide that you really
want to "roll back" the changes made by an earlier commit.  It is easy
to undo a commit with `git revert`.

You can use `git log` and `git diff` to explore your history and
determine which commit you want to undo.  Choose a commit and
note the *hash* for that commit. (Let's assume `abc456`)

     git revert abc456
     
You can also use the tag you put onto the commit to revert. Let's use the example from above about tagging.

    git revert readme

**Importantly,** this will not erase any commits.  Instead, it creates
a new commit that undoes the changes introduced by the commit you
named.  This retains a complete provenance of your software, and can be
compared to the prohibition on removing pages from a lab notebook.
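
To see that a revert adds to the history rather than erasing it, here is a minimal sketch in a scratch repository (placeholder identity and messages). The `--no-edit` flag accepts the default revert message instead of opening an editor.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# simplestats" > README.md
git add README.md
git commit -q -m "Add a README."

echo "An experiment that did not work out." >> README.md
git commit -q -am "Add an experimental note."
bad=$(git rev-parse HEAD)                  # the commit we want to undo

git revert --no-edit "$bad"                # new commit that undoes $bad

count=$(git rev-list --count HEAD)         # 3 commits: nothing was erased
contents=$(cat README.md)                  # file is back to its first version
echo "$count commits in history"
echo "$contents"
```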

## git branch : Listing, Creating, and Deleting Branches

Branches are parallel instances of a repository that can be edited and
version controlled in parallel. They are useful for pursuing various
implementations experimentally or maintaining a stable core while
developing separate sections of a code base.

Without an argument, the **branch** command lists the branches that
exist in your repository.

    $ git branch
    * master

The master branch is created when the repository is initialized. With an
argument, the **branch** command creates a new branch with the given
name.

    $ git branch experimental

    $ git branch
      experimental
    * master

To delete a branch, use the **-d** flag.

    $ git branch -d experimental

    $ git branch
    * master
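
The create, list, and delete cycle can be sketched in a scratch repository as follows. The identity is a placeholder, and the default branch name is looked up rather than assumed, since newer git setups may call it `main` instead of `master`.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

touch README.md
git add README.md
git commit -q -m "Add a README."           # branches need at least one commit

main_branch=$(git symbolic-ref --short HEAD)   # name of the branch we are on

git branch experimental                    # create a branch (we stay where we are)
listed=$(git branch)                       # the asterisk marks the current branch
echo "$listed"

git branch -d experimental                 # delete it again
git branch
```

`-d` refuses to delete a branch whose commits have not been merged anywhere, which protects you from losing work by accident.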


## git checkout : Switching Between Branches, Abandoning Local Changes

The **git checkout** command allows context switching between branches
as well as abandoning local changes.

To switch between branches, try

    $ git branch add_stats

    $ git checkout add_stats

    $ git branch

How can you tell we've switched between branches? When we used the
branch command before there was an asterisk next to the master branch.
That's because the asterisk indicates which branch you're currently in.

* * * * 
### Exercise : Copy files into your repo

Let's make sure we have a good copy of `stats.py` and `test_stats.py`.

    $ cd ~/simplestats

    $ cp ~/EPEtutorials/notebooks/OtherFiles/stats.py ./

    $ cp ~/EPEtutorials/notebooks/OtherFiles/test_stats.py ./

Now let's add them to our repo, but in the current branch.

    $ git add *stats.py

    $ git commit -m "Adding a first version of the files for mean."

* * * *

### Exercise : Add a comment to one of the stats files.

1. Open either `stats.py` or `test_stats.py` in the text editor of your choice.
2. Add a comment line to this file with your name.  Comment lines in Python start with the `#` character.  For example,

   ```python
   # Anthony is a very nice teacher
   ```
   
3. Commit the changed files to your repo.

* * * * 

## git merge : Merging Branches

At some point, the `add_stats` branch may be ready to become part of
the `master` branch.  In real life, we might do a lot more testing and
development.  For now, let's assume that our mean function is ready
and merge this back to the master.  One method for combining the
changes in two parallel branches is the **merge** command.

    $ git checkout master
    
    $ git merge add_stats
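
The full branch, commit, and merge cycle looks roughly like this in a scratch repository. The identity is a placeholder and `stats.py` here is a one-line stand-in, not the real file from the exercise.

```bash
set -eu
demo=$(mktemp -d)                          # scratch repository for this sketch
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

touch README.md
git add README.md
git commit -q -m "Add a README."
main_branch=$(git symbolic-ref --short HEAD)   # 'master' in this tutorial

# Do some work on a separate branch
git branch add_stats
git checkout -q add_stats
echo "# placeholder for the stats code" > stats.py
git add stats.py
git commit -q -m "Add a first version of stats.py."

# Switch back: stats.py is not part of this branch yet
git checkout -q "$main_branch"
ls

# Bring the work from add_stats into the main branch
git merge -q add_stats
ls
```

Because the main branch had no new commits of its own, git simply moves it forward to match `add_stats` (a "fast-forward" merge).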


## GitHub.com

GitHub is a site where many people store their open (and closed) source
code repositories. It provides tools for browsing, collaborating on and
documenting code. Your home institution may have a repository hosting
system of its own. To find out, ask your system administrator.  GitHub,
much like other forge hosting services ([launchpad](https://launchpad.net),
[bitbucket](https://bitbucket.org), [googlecode](http://code.google.com),
[sourceforge](http://sourceforge.net),
etc.) provides :

-   landing page support 
-   wiki support
-   network graphs and time histories of commits
-   code browser with syntax highlighting
-   issue (ticket) tracking
-   user downloads
-   varying permissions for various groups of users
-   commit triggered mailing lists
-   other service hooks (twitter, etc.)

**NOTE** Public repos are visible to everyone **by default**. If you don't
want to share (in the most liberal sense) your stuff with the world, pay
GitHub money for private repos, or host your own.

## GitHub password 

Setting up GitHub requires a GitHub user name and password.  Please take a
moment to [create a free GitHub account](https://github.com/signup/free) (if you
want to start paying, you can add that to your account some other day).

## git remote : Steps for Forking a Repository

A key step to interacting with an online repository that you have forked
is adding the original as a remote repository. By adding the remote
repository, you inform git of a new option for fetching updates and
pushing commits.

The **git remote** command allows you to add, name, rename, list, and
delete repositories such as the original one **upstream** from your
fork, others that may be **parallel** to your fork, and so on.

We'll be continuing our testing exercises using GitHub as the online repository,
so you'll need to start off by getting a copy of that repository to work on!

* * * *
### Exercise : Fork Our GitHub Repository

Step 1 : Go to our
[repository](https://github.com/UW-Madison-ACI/simplestats)
from your browser, and click on the Fork button. Choose to fork it to your
user name rather than any organizations.

Step 2 : Clone it. From your terminal :

```bash
$ git clone https://github.com/YOU/simplestats.git
$ cd simplestats
```

Step 3 : 

```bash
$ git remote add upstream https://github.com/UW-Madison-ACI/simplestats.git
$ git remote -v
origin  https://github.com/YOU/simplestats.git (fetch)
origin  https://github.com/YOU/simplestats.git (push)
upstream        https://github.com/UW-Madison-ACI/simplestats.git (fetch)
upstream        https://github.com/UW-Madison-ACI/simplestats.git (push)
```

All repositories that are clones begin with a remote called origin.
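
You can try the remote commands without a network connection. In this sketch, two empty local repositories stand in for the original project and your fork; the paths are temporary and the names `upstream.git` and `fork.git` are made up for illustration.

```bash
set -eu
work=$(mktemp -d)                          # scratch area for this sketch
cd "$work"

git init -q --bare upstream.git            # stand-in for the original repository
git init -q --bare fork.git                # stand-in for your fork

git clone -q fork.git simplestats          # cloning sets up the 'origin' remote
cd simplestats

git remote add upstream "$work/upstream.git"
git remote -v                              # lists origin and upstream with their URLs
```

Git warns that you cloned an empty repository, which is expected here.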

* * * *

## git fetch : Fetching the contents of a remote

Now that you have alerted your repository to the presence of others, it
is able to pull in updates from those repositories. In this case, if you
want your master branch to keep up with updates in the original
simplestats repository, you first **git fetch** from that repository.

The fetch command alone merely pulls down information about recent
changes from the original (upstream) repository. By itself, the fetch
command does not change your local working copy. To update your local
working copy to include recent changes in the original (upstream)
repository, it is necessary to also merge.

## git merge : Merging the contents of a remote

To incorporate upstream changes from the original master repository (in
this case UW-Madison-ACI/simplestats) into your local working copy, you
must both fetch and merge. The process of merging may result in
conflicts, so pay attention. This is where version control is both at
its most powerful and its most complicated.

* * * * 
### Exercise : Fetch and Merge the Contents of Our GitHub Repository

This exercise is meant to represent the general work flow you should use to
update your fork. Let's say that you come in and sit down in the morning, you've
gotten your coffee (or tea) and you're ready to get started. However, someone
from your research group has added something to the project you're working on,
and you need to add it into your work to keep up to date. I'll add a comment,
then let's get started!

Step 1 : Fetch the recent remote repository history

```bash
$ git fetch upstream
```

Step 2 : Merge the master branch

```bash
$ git checkout master
$ git merge upstream/master
```

Step 3 : Check out what happened by browsing the directory.
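
If you want to rehearse this workflow offline, the sketch below uses a local directory as a stand-in for the upstream repository. All paths, identities, and file contents are placeholders; in this sketch `origin` and `upstream` happen to point at the same place.

```bash
set -eu
work=$(mktemp -d)                          # scratch area for this sketch
cd "$work"

# Stand-in for the original (upstream) repository, with one commit
git init -q upstream_repo
cd upstream_repo
git config user.name "Upstream Dev"        # placeholder identity
git config user.email "upstream@example.com"
echo "# simplestats" > README.md
git add README.md
git commit -q -m "Add a README."
branch=$(git symbolic-ref --short HEAD)    # 'master' in this tutorial

# Your working copy, made before the colleague's change
cd "$work"
git clone -q upstream_repo mine

# A colleague adds a commit upstream
cd upstream_repo
echo "A comment from a colleague." >> README.md
git commit -q -am "Add a comment to the README."

# Back in your copy: fetch, then merge
cd "$work/mine"
git remote add upstream "$work/upstream_repo"
git fetch -q upstream                      # downloads history, changes no files
git checkout -q "$branch"
git merge -q "upstream/$branch"            # now the files are updated
cat README.md
```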

* * * *

## git pull : Pull = Fetch + Merge

The command **git pull** is the same as executing **git fetch** followed
by **git merge**. Though it is not recommended for cases in which there
are many branches to consider, the pull command is shorter and simpler
than fetching and merging as it automates the branch matching.
Specifically, to perform the same task as we did in the previous
exercise, the pull command would be :

```bash
$ git pull upstream master
Already up-to-date.
```

When there have been remote changes, the pull will apply those changes
to your local branch, unless there are conflicts with your local
changes.

## git push : Sending Your Commits to Remote Repositories

The **git push** command pushes commits in a local working copy to a
remote repository. The syntax is git push [remote] [local branch].
Before pushing, a developer should always pull (or fetch + merge), so
that there is an opportunity to resolve conflicts before pushing to the
remote. 

**git push** is also great for pushing local branches to the remote so that you can see them on other machines or collaborate with others. In order to push a local branch that doesn't exist in the remote repository, you will have to include the ```--set-upstream``` flag, like this:

```git push --set-upstream origin example_branch```
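
Here is a sketch of pushing a new branch, with a local bare repository standing in for the hosted remote. The paths and identity are placeholders.

```bash
set -eu
work=$(mktemp -d)                          # scratch area for this sketch
cd "$work"

git init -q --bare remote.git              # stand-in for a hosted repository
git clone -q remote.git mine
cd mine
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# simplestats" > README.md
git add README.md
git commit -q -m "Add a README."

# Create a branch that exists only locally, then publish it
git checkout -q -b example_branch
git push -q --set-upstream origin example_branch

# Ask git which remote branch example_branch now follows
tracking=$(git rev-parse --abbrev-ref 'example_branch@{upstream}')
echo "$tracking"                           # prints: origin/example_branch
```

Once the upstream is set, a plain `git push` or `git pull` on that branch knows which remote branch to use.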

## git checkout --track: Pulling a branch from the remote

Sometimes there is a branch that exists in the remote that you want to work on. This happens a lot when you are working from multiple different machines or want to look at someone else's work to validate it before it is merged into master. However, when you type ```git pull``` the branch does not come with it. This is because ```git pull``` only pulls the branch you are in. Here is how you get a remote branch.

Step 1: Fetch the remote repository to update your local information.

    $ git fetch
    
Step 2: Find the names of the remote branches.

    $ git branch -r
    origin/HEAD -> origin/master
    origin/master
    origin/wanted_branch
    
Step 3: Track the remote branch.
    
    $ git checkout --track origin/wanted_branch
    Branch 'wanted_branch' set up to track remote branch 'wanted_branch' from 'origin'.
    Switched to a new branch 'wanted_branch'
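
These steps can be rehearsed locally. In this sketch a local directory plays the role of the remote; the paths and identity are placeholders.

```bash
set -eu
work=$(mktemp -d)                          # scratch area for this sketch
cd "$work"

# Stand-in for the remote: a repository with an extra branch
git init -q remote_repo
cd remote_repo
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
echo "# simplestats" > README.md
git add README.md
git commit -q -m "Add a README."
git branch wanted_branch

# Your copy: cloning gives you a local branch only for the default branch
cd "$work"
git clone -q remote_repo mine
cd mine

git fetch -q                               # update information about the remote
git branch -r                              # origin/wanted_branch is listed here

git checkout -q --track origin/wanted_branch
current=$(git symbolic-ref --short HEAD)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "$current tracks $upstream"           # prints: wanted_branch tracks origin/wanted_branch
```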

## Test your version control skills!

Feel up to testing all of your skills? Check out
[this](https://learngitbranching.js.org/) excellent website. We
haven't taught you all the things you'll need to progress through the entire
exercise, but feel free to take a look and try it out!