# Version control
This notebook is heavily based on one from J.R. Johansson (jrjohansson at gmail.com)

In any software development, one of the most important tools is revision control software (RCS).

Such tools are used in virtually all software development and in all environments, by everyone and everywhere (no kidding!).

RCS can be used on almost any digital content, so it is not restricted to software development: it is also very useful for manuscript files, figures, data and notebooks!



## There are two main purposes of RCS systems:

1. Keep track of changes in the source code.
    * Allow reverting back to an older revision if something goes wrong.
    * Work on several "branches" of the software concurrently.
    * Tag revisions to keep track of which version of the software was used for what (for example, "release-1.0", "paper-A-final", ...)
2. Make it possible for several people to collaboratively work on the same code base simultaneously.
    * Allow many authors to make changes to the code.
    * Clearly communicate and visualize changes in the code base to everyone involved.

## Basic principles and terminology for RCS systems

In an RCS, the source code or digital content is stored in a **repository**. 

* The repository does not only contain the latest version of all files, but the complete history of all changes to the files since they were added to the repository. 

* A user can **checkout** the repository, and obtain a local working copy of the files. All changes are made to the files in the local working directory, where files can be added, removed and updated. 

* When a task has been completed, the changes to the local files are **committed** (saved to the repository).

* If someone else has been making changes to the same files, a **conflict** can occur. In many cases conflicts can be **resolved** automatically by the system, but in some cases we might manually have to **merge** different changes together.

* It is often useful to create a new **branch** in a repository, or a **fork** or **clone** of an entire repository, when we are doing larger experimental development. The main branch in a repository is often called **master** or **trunk**. When work on a branch or fork is completed, it can be merged into the master branch/repository.

* With distributed RCSs such as GIT or Mercurial, we can **pull** and **push** changesets between different repositories, for example between a local copy of the repository and a central online repository (for instance on a community repository host site like github.com).
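
The terms above can be made concrete with a toy model. The following is a minimal sketch and not how git is implemented: the class `ToyRepository` and its methods are hypothetical names chosen for illustration. Each commit stores a full snapshot of all files and a link to its parent, so the complete history stays available and any earlier revision can be checked out again.

```python
import hashlib
import json


class ToyRepository:
    """Hypothetical in-memory model of a repository (illustration only)."""

    def __init__(self):
        self.commits = {}   # revision id -> commit record
        self.head = None    # id of the latest commit, None while empty

    def commit(self, files, message):
        """Store a snapshot of `files` (dict: name -> content) and return its id."""
        record = {"parent": self.head, "message": message, "files": dict(files)}
        # Derive the revision id from the commit's content and its parent.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        revision = hashlib.sha1(payload).hexdigest()
        self.commits[revision] = record
        self.head = revision
        return revision

    def checkout(self, revision):
        """Return a working copy (a fresh dict) of the files at `revision`."""
        return dict(self.commits[revision]["files"])

    def log(self):
        """Return the commit messages from newest to oldest."""
        messages = []
        revision = self.head
        while revision is not None:
            record = self.commits[revision]
            messages.append(record["message"])
            revision = record["parent"]
        return messages


repo = ToyRepository()
first = repo.commit({"notes.txt": "draft"}, "add notes")
second = repo.commit({"notes.txt": "final"}, "finish notes")

print(repo.log())              # newest commit first
print(repo.checkout(first))    # the old revision is still available
```

Real systems store changes far more efficiently, but the idea of a chain of revisions, each with an id, a message and a parent, carries over.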

### Some good and modern RCS software

1. GIT (`git`) : http://git-scm.com/
2. Mercurial (`hg`) : http://mercurial.selenic.com/

In the rest of this lecture we will look at `git`, although `hg` is just as good and works in almost exactly the same way.

## Installing git

On Linux:
    
    $ sudo apt-get install git

On Mac (with macports):

    $ sudo port install git

The first time you start to use git, you'll need to configure your author information:

    $ git config --global user.name 'Thomas Erben'
    
    $ git config --global user.email 'terben@.......'

## Notes
- In the following we will consider work-flows including `github` repositories.
It is certainly best if you create and administer your projects right from the start
with an RCS host such as [github](https://github.com/) or [bitbucket](https://bitbucket.org/). Besides the version control aspect,
they also provide you with online backup and internet availability of all your code!
  <img src="figs/Git_data_flow.png" style="width: 200px">
  
- `github` offers unlimited private repositories for academia (students and researchers). To profit from it:
  - Register for `github` with your **student e-mail address** (avoid gmail etc.)!
  - Apply for an [academic account](https://education.github.com/discount_requests/new)

## Creating and cloning a repository

You can create a brand-new repository on `github`. For this lecture, you should create a **public** repository named `gitdemo`. We use a public repository so that we do not need to type user name and password all the time.

I will walk you through this in class.

Clone the newly created github repository to your computer:

In [None]:
!git clone https://github.com/terben/gitdemo
    # instead of 'terben', you need to use your own
    # username here

In [None]:
%cd gitdemo
  # The cloned repository is in a subdirectory
  # with the repository name
%ls

## Status

Using the command `git status` we get a summary of the current status of the working directory. It shows if we have modified, added or removed files.

In [None]:
!git status

In this case, as we just cloned the repository, it simply tells us that everything is up to date.
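
To get a feeling for what `git status` has to figure out, here is a minimal sketch. It assumes a strongly simplified model in which the last committed state is just a dictionary of file names and content hashes. The function name `toy_status` and the file contents are hypothetical, and git's real bookkeeping is more elaborate.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path):
    """Return the SHA-1 hex digest of a file's content."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def toy_status(workdir, committed):
    """Compare the files in `workdir` with `committed` (dict: name -> hash).

    Returns a dict with sorted lists of untracked, modified and deleted names.
    """
    current = {p.name: file_hash(p) for p in Path(workdir).iterdir() if p.is_file()}
    return {
        "untracked": sorted(n for n in current if n not in committed),
        "modified": sorted(n for n in current
                           if n in committed and current[n] != committed[n]),
        "deleted": sorted(n for n in committed if n not in current),
    }


with tempfile.TemporaryDirectory() as workdir:
    readme = Path(workdir) / "README.md"
    readme.write_text("gitdemo\n")
    # Pretend that this state was committed.
    committed = {"README.md": file_hash(readme)}
    clean = toy_status(workdir, committed)

    # Change a tracked file and create a new one.
    readme.write_text("gitdemo, second edition\n")
    (Path(workdir) / "prime.py").write_text("print(23)\n")
    dirty = toy_status(workdir, committed)

print(clean)   # nothing to report right after the "commit"
print(dirty)   # one modified and one untracked file
```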

## Adding files and committing changes

To add a new file to the repository, we first create the file and then use the `git add filename` command:

In [None]:
%%file prime.py

# simple demo script to test an integer for the prime property
import numpy as np

def is_prime(n):
    """
    tests whether an integer is a prime number

    input: the number to be tested
    return: True if number is prime and False otherwise
    """

    if n != 2 and n%2 == 0:
        return False
    else:
        for i in range(3, int(np.sqrt(n))):
            if n%i == 0:
                return False

    return True


print(is_prime(23))

In [None]:
!git status

After having added the file `prime.py` to the directory, the command `git status` lists it as an *untracked* file.

In [None]:
!git add prime.py
   # 'git add' adds files to the 'index'.

In [None]:
!git status

Now that it has been added, it is listed as a *new file* that has not yet been committed to the repository.

In [None]:
!git commit -m "program to test integers for prime property"
  # git commit takes into account all files in the 'index'

In [None]:
!git status 

After *committing* the change to the repository from the local working directory, `git status` again reports that the working directory is clean, but also that the local repository is not yet in sync with your remote github repository.

The following command, which pushes your changes to github, needs to be given on the command line because it asks for your github user name and password:

    user$ git push origin master

## Committing changes

When files that are tracked by GIT are changed, they are listed as *modified* by `git status`:

In [None]:
%%file prime.py

# simple demo script to test an integer for the prime property

# 23.07.2017:
# Bug fix: The program did not work properly for
#       square numbers
import numpy as np

def is_prime(n):
    """
    tests whether an integer is a prime number

    input: the number to be tested
    return: True if number is prime and False otherwise
    """

    if n != 2 and n%2 == 0:
        return False
    else:
        for i in range(3, int(np.sqrt(n) + 1)):
            if n%i == 0:
                return False

    return True

print(is_prime(25))

In [None]:
!git status

Again, we can add the changes to the index and commit them to the local and github repositories.

We can look at the changes between the new and the old version.
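
Besides the textual changes, it can be instructive to check what the bug fix changes in behaviour. The sketch below restates both versions of `is_prime` side by side and lists the numbers on which they disagree. The names `is_prime_old` and `is_prime_new` are hypothetical, and `math.sqrt` is used in place of `np.sqrt` so that the example runs without numpy.

```python
import math


def is_prime_old(n):
    """First version of prime.py: the loop stops before int(sqrt(n))."""
    if n != 2 and n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n))):
        if n % i == 0:
            return False
    return True


def is_prime_new(n):
    """Second version of prime.py: the loop includes int(sqrt(n))."""
    if n != 2 and n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n) + 1)):
        if n % i == 0:
            return False
    return True


# Numbers between 2 and 50 that the two versions classify differently.
differences = [n for n in range(2, 51) if is_prime_old(n) != is_prime_new(n)]
print(differences)  # → [9, 15, 25, 35, 49]
```

The old version reported all of these composite numbers as prime. The list contains not only squares but also products such as 15 and 35, whose larger factor lies just above the square root.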

In [None]:
!git add prime.py

In [None]:
!git status

In [None]:
!git commit -m "bug fix: prime numbers were not treated correctly"

As before, the push to github is done on the command line:

    user$ git push origin master

In [None]:
!git status

## Removing files

To remove a file that has been added to the repository, use `git rm filename`, which works similarly to `git add filename`:

In [None]:
%%file tmpfile

A short-lived file.

Add it:

In [None]:
!git add tmpfile

In [None]:
!git commit -m "adding temporary file"

Remove it again:

In [None]:
!git rm tmpfile

In [None]:
!git commit -m "remove file tmpfile"

## Commit logs

The messages that are added to the commit command are supposed to give a short (often one-line) description of the changes/additions/deletions in the commit. If the `-m "message"` option is omitted when invoking the `git commit` command, an editor will be opened for you to type a commit message (useful, for example, when a longer commit message is required).

We can look at the revision log by using the command `git log`:

In [None]:
!git log

In the commit log, each revision is shown with a timestamp, a unique hash, author information and the commit message.
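
The hash is not a random number but is computed from the content it identifies. As an illustration, git's default (SHA-1) object format identifies the content of a file, a so-called blob, by the SHA-1 digest of a short header followed by the raw bytes. Commits are hashed in the same spirit, with their own header and metadata. The helper name `git_blob_hash` below is hypothetical, and the sketch covers blobs only.

```python
import hashlib


def git_blob_hash(data):
    """Return the SHA-1 id git assigns to a blob with content `data` (bytes)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The id of an empty file.
print(git_blob_hash(b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Even a small change to the content yields a different id.
print(git_blob_hash(b"A short-lived file.\n"))
print(git_blob_hash(b"A short-lived file!\n"))
```

On a machine with git installed, the result for a given file can be compared with the output of `git hash-object filename`.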

## Diffs

Every commit results in a changeset, which has a "diff" describing the changes to the files associated with it. We can use `git diff` to see what has changed in a file:

In [None]:
%%file prime.py

# simple demo script to test an integer for the prime property

# 23.07.2017:
# Bug fix: The program did not work properly for
#       square numbers
import numpy as np

def is_prime(n):
    """
    tests whether an integer is a prime number

    input: the number to be tested
    return: True if number is prime and False otherwise
    """

    if n != 2 and n%2 == 0:
        return False
    else:
        for i in range(3, int(np.sqrt(n) + 1)):
            if n%i == 0:
                return False

    return True


In [None]:
!git diff prime.py

That looks quite cryptic but is a standard form for describing changes in files. We can use other tools, like graphical user interfaces or web-based systems, to get a more easily understandable diff.
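
The same kind of output can be produced with Python's standard library, which may help in learning to read it. The following minimal sketch applies `difflib.unified_diff` to two short, made-up versions of a file. Lines starting with `-` were removed, lines starting with `+` were added, and the remaining lines are unchanged context. The `a/` and `b/` file labels merely imitate git's convention.

```python
import difflib

old_version = [
    "def is_prime(n):",
    "    ...",
    "",
    "print(is_prime(25))",
]
new_version = [
    "def is_prime(n):",
    "    ...",
    "",
    "print(is_prime(23))",
]

diff_lines = list(difflib.unified_diff(
    old_version, new_version,
    fromfile="a/prime.py", tofile="b/prime.py",
    lineterm="",  # the input lines carry no trailing newlines
))
print("\n".join(diff_lines))
```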

## Further reading
I only gave you a very brief introduction to version control and the use of git!
Important topics that I did not cover include:
- Checking out old revisions
- Tagging and branching
- Use of `github` for collaborations

Here are some tips for further reading on the topic:
* http://git-scm.com/book
* http://www.vogella.com/articles/Git/article.html
* http://cheat.errtheblog.com/s/git