# Lecture 2: Version Control with Git

In the first part of this lecture, you worked with the playground repository.  You learned how to navigate the repository from the `Git` point of view, make changes to the repo, and work with the remote repo.

One very important topic in `Git` involves the concept of the branch.  You will work **extensively** with branches in any real project.  In fact, branches are central to the `Git` workflow.  In this portion of the lecture, we will discuss branches with `Git`.

----

### Branching

As you might have seen by now, branches are everywhere in git. We have branches on remote (upstream) repositories, copies of remote branches in our local repository (remote-tracking branches such as `origin/master`), and branches in our local repository which (so far) track remote branches, or more precisely, the local copies of remote branches.

In [1]:
%%bash
cd /tmp/playground
git branch -avv

* L2-branches             fd50512 Extending Lecture 2 to incluce branches.
  master                  a847425 [origin/master] Fixed a few typos in Lecture 2.
  remotes/origin/HEAD     -> origin/master
  remotes/origin/gh-pages 36aa8a1 Shifted schedule around.
  remotes/origin/master   a847425 Fixed a few typos in Lecture 2.
  remotes/upstream/master 21634fb Initializing Lecture 2 presentation.




And all of these branches are nothing but commit-streams in disguise: as the hashes above show, each branch simply names one commit, and the chain of parent commits behind it is the stream. It's a very simple model which leads to a lot of interesting version control patterns.
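To see that a branch is only a name for a commit, here is a minimal sketch in a throwaway repository. The temporary directory, file name, and committer identity are hypothetical demo choices. It creates a branch and checks that both branch names resolve to the same commit hash; no files are copied.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository in a temporary directory (hypothetical demo setup)
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "# Hello" > README.md
git add README.md
git commit -q -m "First commit"

# Creating a branch only writes a new name that points at the current commit
git branch mybranch1

# Both names resolve to the same commit hash
git rev-parse master mybranch1

# The object a branch name points to is a commit
git cat-file -t mybranch1
```

Committing on one branch moves only that branch's pointer forward; the other branch keeps pointing at its old commit.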

Since branches are so lightweight, the recommended way of working on software using git is to create a new branch for each new feature you add, test it out, and, if it works, merge it into master. Then you deploy the software from master. We have been using branches under the hood. Let's now lift the hood.

----
### `branch`

![git_branch](figs/git_branch.png)

Branches can also be created manually, and they are a useful way of organizing unfinished changes.

The `branch` command has two forms. The first:

`git branch`

simply lists all of the branches in your local repository. If you run it without having created any branches, it will list only one, called `master`. This is the default branch. You have seen the use of `git branch -avv` to show all branches.

The other form creates a branch with a given name:

`git branch my-new-branch`

It's important to note that the new branch is not *active*. If you make changes, they will still apply to the `master` branch, not `my-new-branch`. To change this, you need the next command.
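A minimal sketch of that behavior, assuming a throwaway repository with a hypothetical file and identity: after `git branch my-new-branch`, a new commit still lands on `master`, and `my-new-branch` stays where it was created.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "# Hello" > README.md
git add README.md
git commit -q -m "First commit"

# Create the branch, but do not switch to it
git branch my-new-branch
git branch            # the asterisk is still next to master

# A new commit lands on the active branch (master), not on my-new-branch
echo "more text" >> README.md
git commit -q -am "Second commit"

git log --oneline master          # two commits
git log --oneline my-new-branch   # still only the first commit
```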

----
### `checkout`

![git_checkout](figs/git_checkout.png)

Checkout switches the active branch. Since branches can have different changes, `checkout` may make the working directory look very different. For instance, if you have added new files to one branch, and then check another branch out, those files will no longer show up in the directory. They are still stored in the `.git` folder, but since they only exist in the other branch, they cannot be accessed until you check out the original branch.
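The disappearing-file behavior can be checked directly. This is a sketch under assumed, hypothetical names (`other`, `extra.md`) in a scratch repository: a file committed only on `other` vanishes from the working directory when `master` is checked out and comes back when we return.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "# Hello" > README.md
git add README.md
git commit -q -m "First commit"

# Add a file on a separate branch only
git branch other
git checkout -q other
echo "# Only on other" > extra.md
git add extra.md
git commit -q -m "Add extra.md on other"
ls                    # README.md  extra.md

# Back on master, extra.md disappears from the working directory...
git checkout -q master
ls                    # README.md

# ...but it is still stored in .git and returns when we switch back
git checkout -q other
ls                    # README.md  extra.md
```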

You can combine creating a new branch and checking it out with the shortcut:

`git checkout -b my-new-branch`

OK, let's try this out on our repository.

In [3]:
%%bash
cd /tmp/playground
git branch mybranch1

Let's see which branches we have now.

In [4]:
%%bash
cd /tmp/playground
git branch 

* master
  mybranch1


Jump onto the `mybranch1` branch...

In [5]:
%%bash
cd /tmp/playground
git checkout mybranch1
git branch

  master
* mybranch1


Switched to branch 'mybranch1'


Notice that it is branched off `master`, so it starts from the same commit and has the same files.

In [6]:
%%bash
cd /tmp/playground
ls

README.md
new.md
world.md


You could have created and switched to this branch in one step using `git checkout -b mybranch1`. Let's see the status here.

In [7]:
%%bash
cd /tmp/playground
git status

On branch mybranch1
nothing to commit, working tree clean


Let's add a new file. Once we commit it, this file will exist on this branch only.

In [8]:
%%bash
cd /tmp/playground
echo '# Hello Aliens' > aliens.md
git status

On branch mybranch1
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	aliens.md

nothing added to commit but untracked files present (use "git add" to track)


We add the file to the index, and then commit the files to the local repository on the `mybranch1` branch.

In [9]:
%%bash
cd /tmp/playground
git add .
git status


On branch mybranch1
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   aliens.md



In [10]:
%%bash
cd /tmp/playground
git commit -m "Added another test file to demonstrate git features" -a
git status

[mybranch1 e0f4713] Added another test file to demonstrate git features
 1 file changed, 1 insertion(+)
 create mode 100644 aliens.md
On branch mybranch1
nothing to commit, working tree clean


OK, we have committed. Let's try to push!

In [11]:
%%bash
cd /tmp/playground
git push

fatal: The current branch mybranch1 has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin mybranch1



Oops, that failed. Why? `mybranch1` has no upstream branch configured, so git didn't know what to push to on `origin`, and it didn't want to assume we wanted the branch to be called `mybranch1` on the remote. We need to tell git explicitly, as the error message suggests.

In [12]:
%%bash
cd /tmp/playground
git push --set-upstream origin mybranch1

Branch mybranch1 set up to track remote branch mybranch1 from origin.


To https://github.com/dsondak/playground.git
 * [new branch]      mybranch1 -> mybranch1


Aha, now we are set up with both a remote and a local `mybranch1`.

In [16]:
%%bash
cd /tmp/playground
git branch -avv

* master                   4163942 [origin/master] Merge remote-tracking branch 'course/master'
  mybranch1                e0f4713 [origin/mybranch1] Added another test file to demonstrate git features
  remotes/course/master    bc97b48 Important file added.
  remotes/origin/HEAD      -> origin/master
  remotes/origin/master    4163942 Merge remote-tracking branch 'course/master'
  remotes/origin/mybranch1 e0f4713 Added another test file to demonstrate git features
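If you want to try the upstream step without a GitHub repository, one option is a local bare repository standing in for `origin`. The sketch below assumes that setup (all paths and the identity are hypothetical) and checks which upstream `mybranch1` tracks after `git push --set-upstream`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A local bare repository plays the role of the GitHub remote (hypothetical stand-in)
remote=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
git remote add origin "$remote"

echo "# Hello" > README.md
git add README.md
git commit -q -m "First commit"
git push -q --set-upstream origin master

# New branch with a new commit
git checkout -q -b mybranch1
echo "# Hello Aliens" > aliens.md
git add aliens.md
git commit -q -m "Added another test file"

# A plain `git push` would fail here because mybranch1 has no upstream yet.
# Set the upstream while pushing (-u is the short form of --set-upstream).
git push -q --set-upstream origin mybranch1

# Show the upstream that mybranch1 now tracks
git rev-parse --abbrev-ref 'mybranch1@{upstream}'
git branch -avv
```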


We make sure we are back on `master`.

In [17]:
%%bash
cd /tmp/playground
git checkout master

Your branch is up-to-date with 'origin/master'.


Already on 'master'


### Recovering from a mistake

Now suppose for a second that this `mybranch1` was created by someone else. We wanted to download it using `fetch` and play with it. But we called `pull` instead, which fetched it and automatically merged it into our current branch.

In [32]:
%%bash
cd /tmp/toplay
git pull origin mybranch1

Updating 255b608..7e68482
Fast-forward
 aliens.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 aliens.md


From github.com:rahuldave/toplay
 * branch            mybranch1  -> FETCH_HEAD


In [33]:
%%bash
cd /tmp/toplay
git status

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean


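Oops, that was not what we wanted. We undo it using `git reset --hard origin/master`, which moves our `master` back to the commit that `origin/master` points to, i.e. the state before the pull. Be careful: `--hard` also discards any uncommitted changes in the working directory.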

In [34]:
%%bash
cd /tmp/toplay
git reset --hard origin/master

HEAD is now at 255b608 Said hello world to world


In [35]:
%%bash
cd /tmp/toplay
git status

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
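The accidental pull and its undo can be reproduced locally. This is a minimal sketch assuming a local bare repository stands in for `origin` and that a hypothetical collaborator has already pushed `mybranch1`. It ends with the `fetch`-first approach the text intended, which lets you inspect the branch without touching `master`.

```bash
#!/usr/bin/env bash
set -euo pipefail

remote=$(mktemp -d)                        # hypothetical stand-in for GitHub
git -c init.defaultBranch=master init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
git remote add origin "$remote"

echo "# Hello world" > world.md
git add world.md
git commit -q -m "Said hello world to world"
git push -q -u origin master

# Simulate a collaborator's branch that exists only on the remote
git checkout -q -b mybranch1
echo "# Hello Aliens" > aliens.md
git add aliens.md
git commit -q -m "Added another test file"
git push -q origin mybranch1
git checkout -q master
git branch -q -D mybranch1

# The mistake: pull brings mybranch1 into master
git pull -q origin mybranch1
git status -sb                 # master is now ahead of origin/master

# The fix: move master back to where origin/master points.
# Careful: --hard also discards any uncommitted changes.
git reset -q --hard origin/master

# What we meant to do: fetch, then inspect without merging
git fetch -q origin mybranch1
git log --oneline master..origin/mybranch1   # commits we would get
git diff --stat master origin/mybranch1      # files that would change
```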


### Changing only one file

The last git command I want to talk about, since it might come in handy in the course, covers the situation in which you don't want to merge an entire branch from upstream, but just one file from it. There is a direct use case for this. Suppose I've made an error in this lab and want to correct it, so I fix it upstream. In the meantime you have edited some other files, and you don't want to have to manually reject my older copies of those files. So you want to take just the one fixed file from this new branch. This is how you do it.

First, fetch from the remote. (I ought to be doing this with the `course` remote, but I just want to illustrate it here with `origin`.)

In [36]:
%%bash
cd /tmp/playground
git fetch origin

In [37]:
%%bash
cd /tmp/playground
git branch -avv

* master                   255b608 [origin/master] Said hello world to world
  mybranch1                7e68482 [origin/mybranch1] Added another test file to demonstrate git features
  remotes/course/master    62846b8 Said hello to myself
  remotes/origin/HEAD      -> origin/master
  remotes/origin/master    255b608 Said hello world to world
  remotes/origin/mybranch1 7e68482 Added another test file to demonstrate git features


In [38]:
%%bash
cd /tmp/playground
git checkout origin/mybranch1 -- aliens.md

Why does the syntax have `origin/mybranch1`? Remember that multiple remotes may have a branch with the same name, so you want to be specific. So we check out `origin`'s `mybranch1`, and only want `aliens.md`. The `--` separates the branch name from the file path.

In [39]:
%%bash
cd /tmp/playground
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   aliens.md



Note that the file was automatically added to the index. We'll commit and push.

In [40]:
%%bash
cd /tmp/playground
git commit -m "want aliens in master" 

[master aad929d] want aliens in master
 1 file changed, 1 insertion(+)
 create mode 100644 aliens.md


In [41]:
%%bash
cd /tmp/playground
git status

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean


In [42]:
%%bash
cd /tmp/playground
git push

To git@github.com:rahuldave/toplay
   255b608..aad929d  master -> master
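Here is a self-contained sketch of the single-file checkout. To avoid needing a remote, it assumes a local branch `mybranch1` plays the role of `origin/mybranch1`; file contents and identity are hypothetical. The branch changes two files, but only `aliens.md` is brought into `master`.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "# Hello world" > world.md
git add world.md
git commit -q -m "Said hello world to world"

# Another branch changes two files
git checkout -q -b mybranch1
echo "# Hello Aliens" > aliens.md
echo "# Edited on mybranch1" > world.md
git add aliens.md world.md
git commit -q -m "Two changes on mybranch1"
git checkout -q master

# Bring over only aliens.md; "--" separates the branch from the path
git checkout mybranch1 -- aliens.md
git status --short        # aliens.md is already staged
git commit -q -m "want aliens in master"
```

`world.md` on `master` is untouched. In git 2.23 and later, `git restore --source=mybranch1 --staged --worktree -- aliens.md` is another way to do the same thing.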


Note that in git 2.0 and later, which you must use, a bare `git push` pushes the current branch to its upstream branch on the corresponding remote (the default `push.default=simple` behavior). These upstream settings can be seen through `git branch -avv` or by looking at the `config` file.

In [43]:
%%bash
cd /tmp/playground
cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = git@github.com:rahuldave/toplay
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[remote "course"]
	url = git@github.com:iacs-cs207/toplay.git
	fetch = +refs/heads/*:refs/remotes/course/*
[branch "mybranch1"]
	remote = origin
	merge = refs/heads/mybranch1
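Rather than reading `.git/config` by hand, you can query individual settings with `git config`. A minimal sketch, assuming a throwaway repository with a local bare repository as `origin` (paths and identity are hypothetical):

```bash
#!/usr/bin/env bash
set -euo pipefail

remote=$(mktemp -d)                        # hypothetical stand-in remote
git -c init.defaultBranch=master init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
git remote add origin "$remote"
echo "# Hello" > README.md
git add README.md
git commit -q -m "First commit"
git push -q --set-upstream origin master

# The [branch "master"] section written by --set-upstream
git config --get branch.master.remote    # origin
git config --get branch.master.merge     # refs/heads/master

# The [remote "origin"] section written by `git remote add`
git config --get remote.origin.url
git config --get remote.origin.fetch     # +refs/heads/*:refs/remotes/origin/*

# Everything in this repository's config, in key=value form
git config --list --local
```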


----

## Git habits

***Commit early, commit often.***

Git is more effective when used at a fine granularity. For starters, you can't undo what you haven't committed, so committing lots of small changes makes it easier to find the right rollback point. Also, merging becomes a lot easier when you only have to deal with a handful of conflicts.
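As an illustration of why small commits help, here is a sketch with three hypothetical commits in a scratch repository. Because the unwanted change lives in its own commit, `git revert` can undo exactly that commit while keeping the work that came after it.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

# Three small, separate commits
echo "step 1" > notes.md
git add notes.md
git commit -q -m "Add step 1"

echo "bad idea" > idea.md
git add idea.md
git commit -q -m "Add idea"

echo "step 2" >> notes.md
git commit -q -am "Add step 2"

git log --oneline          # each change is its own rollback point

# Undo only the middle commit; the later work on notes.md is kept
git revert --no-edit HEAD~1
```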

***Commit unrelated changes separately.***

Identifying the source of a bug or understanding the reason why a particular piece of code exists is much easier when commits focus on related changes. Some of this has to do with simplifying commit messages and making it easier to look through logs, but it has other related benefits: commits are smaller and simpler, and merge conflicts are confined to only the commits which actually have conflicting code.
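One way to split unrelated edits is to stage them by path. A minimal sketch with hypothetical files: two unrelated changes made in one session become two commits, each touching one file. (`git add -p` can split changes within a single file, but it is interactive, so it is not shown here.)

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "# Docs" > README.md
echo "x = 1" > solver.py
git add README.md solver.py
git commit -q -m "Initial files"

# Two unrelated edits made in the same working session
echo "Fixed a typo." >> README.md
echo "x = 2" > solver.py

# Stage and commit each change on its own, by path
git add README.md
git commit -q -m "Fix typo in README"
git add solver.py
git commit -q -m "Fix off-by-one in solver"

git log --oneline --stat -2   # one file per commit
```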

***Do not commit binaries and other temporary files.***

Git is meant for tracking changes. In nearly all cases, the only meaningful difference between the contents of two binaries is that they are different. If you change source files, compile, and commit the resulting binary, git sees an entirely different file. The end result is that the git repository (which contains a complete history, remember) becomes bloated with the history of many dissimilar binaries. Worse, there's often little advantage to keeping those files in the history. An argument can be made for periodically snapshotting working binaries, but things like object files, compiled Python files, and editor auto-saves are basically wasted space.

***Ignore files which should not be committed***

Git comes with a built-in mechanism for ignoring certain types of files. Listing filenames or wildcard patterns in a `.gitignore` file in the top-level directory (where the `.git` directory is also located) will cause git to ignore matching untracked files when checking file status. This is a good way to ensure you don't commit the wrong files accidentally, and it also makes the output of `git status` somewhat cleaner.
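A small sketch of a `.gitignore` in action. The patterns are example choices covering the file types mentioned above (compiled Python files, object files, editor backups), and the file names are hypothetical. `git check-ignore -v` reports which pattern matched a path.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q

# Patterns for files that should never be committed (example choices)
cat > .gitignore <<'EOF'
# compiled Python files
__pycache__/
*.pyc
# object files and editor backups
*.o
*~
EOF

touch main.py main.pyc util.o notes.md~
mkdir -p __pycache__
touch __pycache__/main.cpython-311.pyc

git status --short                     # only .gitignore and main.py are listed
git check-ignore -v main.pyc util.o    # shows which pattern matched each file
```

Note that `.gitignore` only affects untracked files; a file that is already tracked must first be removed from the index (for example with `git rm --cached`) before the pattern applies to it.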

***Always make a branch for new changes***

While it's tempting to work on new code directly in the `master` branch, it's usually a good idea to create a new one instead, especially for team-based projects. The major advantage to this practice is that it keeps logically disparate change sets separate. This means that if two people are working on improvements in two different branches, when they merge, the actual workflow is reflected in the git history. Plus, explicitly creating branches adds some semantic meaning to your branch structure. Moreover, working on a branch changes very little about how you use git day to day.
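A minimal sketch of the branch-per-change workflow, using a hypothetical branch `add-greeting` in a scratch repository. Because `master` has not moved, a plain merge would fast-forward and leave no trace of the branch; `--no-ff` records a merge commit so the workflow stays visible in `git log --graph`.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Do the new work on its own branch
git checkout -q -b add-greeting
echo "print('hello')" > greet.py
git add greet.py
git commit -q -m "Add greeting script"

# Merge back into master; --no-ff records a merge commit even though a
# fast-forward would be possible, so the branch stays visible in history
git checkout -q master
git merge -q --no-ff -m "Merge branch 'add-greeting'" add-greeting

git log --oneline --graph
git branch -d add-greeting      # safe delete: the branch is fully merged
```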

***Write good commit messages***

I cannot overstate the importance of this.

**Seriously. Write good commit messages.**
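A widely used convention (not something git enforces) is a short summary line, a blank line, and then a body explaining what changed and why. A sketch with a hypothetical file and message: each `-m` option becomes its own paragraph, so two of them produce exactly that layout.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)                          # hypothetical scratch location
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "def mean(xs): return sum(xs) / len(xs)" > stats.py
git add stats.py

# Each -m becomes its own paragraph: a short summary line, a blank line,
# then a body explaining what changed and why
git commit -q \
  -m "Add mean() helper to stats module" \
  -m "The lab notebooks computed averages inline in several places.
Collect that logic in one function so it can be tested once."

git log -1 --format=%s     # the summary line
git log -1 --format=%b     # the body
```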