# Seminar 1
by Martin Hronec <br>
October 12th

**Contents**

1. [Git revision](#revision)
2. [Git's data model](#model) 
3. [Branches](#branches)
4. [Collaboration](#collaboration)

# 1. Revision <a name="revision"></a>

### Lifecycle of a file/blob

* each file in the working directory can be in one of two states: 
    * tracked
        * files that were in the last snapshot or have been staged - can be *unmodified, modified or staged*
    * untracked
        * any files that weren't in the last snapshot and are not in the staging area (explained later)
* a newly created file stays untracked until you add it
    * when you clone a repository (instead of `git init`-ing it), all of your files will be tracked and unmodified
       
![From Git pro book](pictures/file_lifecycle.png)
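
The states above can be sketched as a small classifier. This is a simplified toy model, not how Git is implemented: `last_snapshot`, `index` (the staging area) and `working_dir` are hypothetical dictionaries mapping file names to contents, and `file_state` is a name made up for this illustration.

```python
def file_state(path, last_snapshot, index, working_dir):
    """Classify one file; every argument after `path` maps file name -> content."""
    if path not in last_snapshot and path not in index:
        return "untracked"      # Git has never been told about this file
    if working_dir.get(path) != index.get(path):
        return "modified"       # edited since it was last staged
    if index.get(path) != last_snapshot.get(path):
        return "staged"         # staged content differs from the last commit
    return "unmodified"


last_snapshot = {"a.txt": "1", "b.txt": "2"}
index = {"a.txt": "1", "b.txt": "2", "c.txt": "3"}   # c.txt was added to the index
working_dir = {"a.txt": "1", "b.txt": "2 edited", "c.txt": "3", "d.txt": "4"}

for name in sorted(working_dir):
    print(name, file_state(name, last_snapshot, index, working_dir))
```

In a real repository, `git status` reports the same information.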

# 2. Git's data model <a name="model"></a>

* version control is a method for tracking changes to source code
* Git models the history of a collection of files and folders within some top-level directory as a series of related snapshots
* to be a Git repository, we need objects and a system for naming those objects
    * refs are the system for naming those objects
    
* a history is a directed acyclic graph (DAG) of snapshots
    * each snapshot refers to a set of "parents" (previous snapshots)
        * why set of "parent" snapshots instead of just one "parent" snapshot?
* in Git's terms:        
    * these snapshots are called **commits**
    * a file is called a **blob** (a bunch of bytes)
    * a directory is called a **tree**
        * maps names to blobs or trees (directories can contain other directories)
        
        
* a "snapshot" is basically the top-level tree in your project

* ASCII art for pedagogical purposes:

        root_folder (tree)
        |
        +- folder (tree)
        |  |
        |  + file.txt (blob, contents = "A barber learns to shave by shaving fools.")
        |
        +- another_file.txt (blob, contents = "I don't give a git.")
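
The same structure can be written down as plain data types. The sketch below is only a teaching model of blobs, trees and commits: the class names mirror Git's terms, but the fields, the author name and the commit message are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Blob:
    data: bytes  # a file is just a bunch of bytes


@dataclass(frozen=True)
class Tree:
    # maps names to blobs or other trees (directories can nest)
    entries: Dict[str, Union["Tree", Blob]]


@dataclass(frozen=True)
class Commit:
    parents: Tuple["Commit", ...]  # zero, one or several earlier commits
    author: str
    message: str
    snapshot: Tree                 # the top-level tree of the project


root_folder = Tree({
    "folder": Tree({
        "file.txt": Blob(b"A barber learns to shave by shaving fools."),
    }),
    "another_file.txt": Blob(b"I don't give a git."),
})

# hypothetical author and message
first = Commit(parents=(), author="alice", message="initial snapshot",
               snapshot=root_folder)
print(sorted(first.snapshot.entries))
```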


### Objects, references and content-addressing

* an "object" is a blob, a tree, or a commit
* in Git's data store, all objects are content-addressed by their **SHA-1 hash**
    * a SHA-1 hash is a 160-bit digest of the object's content, written as 40 hexadecimal characters
    * more on this later

* all snapshots can be identified by their SHA-1 hash
    * inconvenient - try to remember strings of 40 hexadecimal characters
    * Git's solution: readable names for SHA-1 hashes - **references**
    
* references are pointers to commits
* objects are immutable (building blocks)
* references are mutable (can be updated to point to a new commit)
    * e.g. `master` reference - typically pointing to the main "codebase"
    
* we want a notion of "where we currently are" (e.g. in the history of the project)
    * when we take a new snapshot, we know what it is related to
    * "where we are/what are we looking at" is a special reference called **HEAD**



* a Git **repository** consists of data objects and references
* objects and references are all that Git stores
    * easy
    
* all git commands map to some manipulation of the DAG which captures the relation between snapshots (history and development) 
    * e.g. by adding objects and adding/updating references
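
Content addressing and mutable references can be sketched in a few lines. The sketch assumes Git's blob hashing scheme (SHA-1 over a `blob <size>` header, a NUL byte, then the content). The `objects` and `references` dictionaries and the reference name `demo` are hypothetical stand-ins for the real store. In real Git a reference names a commit; a blob id stands in here for brevity.

```python
import hashlib

objects = {}      # object id -> bytes; an entry is never changed once stored
references = {}   # human-readable name -> object id; may be re-pointed


def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over the header '<kind> <size>', a NUL byte, then the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def store(kind: str, payload: bytes) -> str:
    oid = object_id(kind, payload)
    objects[oid] = payload
    return oid


first = store("blob", b"test content\n")
again = store("blob", b"test content\n")    # identical content, identical id
other = store("blob", b"changed content\n")

references["demo"] = first   # create a reference
references["demo"] = other   # update it; both objects stay in the store

print(first)
print(first == again, len(objects))
```

The printed id should match what `echo 'test content' | git hash-object --stdin` reports, since both hash the same bytes.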
    

## Example of Git's data model

* after the first commit:
    * Git checksums each subdirectory (only the root project directory in this case)
    * stores objects (one file in our case) in the Git repository
    * then it creates a commit object that has the metadata and a pointer to the root project tree so it can re-create that snapshot when needed
    
* repository now contains 3 objects:
    * one blob for the content of your file, 
    * one tree that lists the contents of the directory and specifies which file names are stored as which blobs 
    * one commit with the pointer to that root tree and all the commit metadata
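
The three objects can be rebuilt by hand to see how they point at each other. This is a sketch under assumptions: the file name `README.md`, its content, the author identity and the timestamp are all made up, and a plain dictionary stands in for the object store.

```python
import hashlib

objects = {}  # object id -> (kind, payload)


def store(kind: str, payload: bytes) -> str:
    header = f"{kind} {len(payload)}".encode() + b"\0"
    oid = hashlib.sha1(header + payload).hexdigest()
    objects[oid] = (kind, payload)
    return oid


# 1) blob: the file content only, without the file name
blob_id = store("blob", b"hello\n")

# 2) tree: per entry "<mode> <name>", a NUL byte, then the raw 20-byte id
tree_id = store("tree", b"100644 README.md\0" + bytes.fromhex(blob_id))

# 3) commit: pointer to the root tree plus metadata (no parent line: first commit)
who = "Alice Example <alice@example.com> 1600000000 +0000"  # hypothetical
commit_id = store("commit", (
    f"tree {tree_id}\n"
    f"author {who}\n"
    f"committer {who}\n"
    "\n"
    "first commit\n"
).encode())

for oid, (kind, _) in objects.items():
    print(kind, oid)
```

The file name lives in the tree, not in the blob, which is why two files with identical content share one blob.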

![caption](./pictures/single_commit_detailed.PNG)


* we will use [git-graph](https://github.com/hoduche/git-graph) to quickly show the underlying data model of our repository
    * see the github page for the color reference
* e.g.

![caption](pictures/2020_09_22_19_37_48_git_graph.dot.png)

* note that this graph does not correspond to the above one (different hashes)

### 2-commits repository representation

* if you make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it
  
![caption](pictures/2020_09_22_19_37_51_git_graph.dot.png)

* even if you are not using the command line interface to Git, you are still working with the same underlying objects
* let's quickly look at 2 "GUI" Git interfaces
    * Visual Studio Code
    * GitKraken

### 3-commits repository representation
![caption](pictures/2020_09_22_19_37_54_git_graph.dot.png)

# 3. Branches <a name="branches"></a>

* branching means:
    * diverging from the main line of development
        * and continuing to work without affecting the main (stable/production) development line
    * developing multiple things in parallel

* in other VCS tools, this is often expensive
    * e.g. it requires creating a new copy of your source code directory
* not in Git! - branching is very lightweight, a killer feature

* when you commit, Git stores a commit object that contains:
    * a pointer to the snapshot of the content you staged
    * the author and message metadata
    * zero or more pointers to the commit or commits that were the direct parents of this commit:
        * zero parents for the first commit
        * one parent for a normal commit
        * and multiple parents for a commit that results from a merge of two or more branches
        
* you can view all commits across branches using `git log --branches=*`        

### Creating a new branch

* a branch in Git is simply a lightweight movable pointer to one of the commits
* the default branch name in Git is *master*
    * as you initially make commits, you're given a master branch that points to the last commit you made
    and every time you commit, it moves forward automatically
* **what happens when you create a new branch?** 
    * it creates a new pointer for you to move around
    
* `git branch <branch_name>` to create a new branch
* look at the following 2 graphs to see that a new branch is indeed just another pointer to the last commit

![caption](./pictures/before_branch.png)

* added a new branch called "brave" with `git branch brave`
![caption](./pictures/branch_as_pointer.png)
* HEAD is still pointing to master
    * if we added another commit now, it would go onto master
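
The pointer behaviour can be imitated with two dictionaries. This is a toy model only: the commit ids `c1` to `c4` are made-up labels rather than real hashes, and `create_branch` and `commit` are hypothetical helpers standing in for `git branch` and `git commit`.

```python
# commit id -> list of parent ids
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
branches = {"master": "c3"}
head = "master"  # HEAD names the branch we are on


def create_branch(name: str) -> None:
    """Like `git branch <name>`: copy the current commit id; HEAD is untouched."""
    branches[name] = branches[head]


def commit(new_id: str) -> None:
    """Record a new commit and advance only the branch that HEAD names."""
    parents[new_id] = [branches[head]]
    branches[head] = new_id


create_branch("brave")
after_branch = dict(branches)   # both names point at c3
commit("c4")                    # master moves on, brave stays behind

print(after_branch)
print(branches)
```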

### HEAD

* how does Git know what branch you're currently on?
    * it keeps a special pointer called *HEAD*
        * a pointer to the local branch you're currently on
        * you can see it in the file .git/HEAD
        
* for now we are still on the "master" branch; we have not yet switched to the "new branch"

* to switch to another branch, you can use `git checkout <new_branch>`
    * `git status` will confirm that you are indeed looking at it
    * `git log` will show *HEAD* pointing to the "new branch"
* if you change a file, stage it and commit it, your project history will diverge
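
While you are on a branch, the file `.git/HEAD` holds a single line of the form `ref: refs/heads/<branch>`. The helper below is a minimal sketch that reads that format. `current_branch` is a name invented here, and it takes the file's text as an argument so it can run without a repository.

```python
from typing import Optional


def current_branch(head_text: str) -> Optional[str]:
    """Return the branch named in the contents of .git/HEAD, or None when
    HEAD holds a bare commit id instead (a so-called detached HEAD)."""
    prefix = "ref: refs/heads/"
    text = head_text.strip()
    if text.startswith(prefix):
        return text[len(prefix):]
    return None


print(current_branch("ref: refs/heads/master\n"))       # before switching
print(current_branch("ref: refs/heads/new_branch\n"))   # after switching
```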

### Basic branching and merging

* you've decided that you're going to work on issue #12
    * issues are generally the way code development is organized and managed, see [github manual](https://guides.github.com/features/issues/)

* `git checkout -b iss12`


* now on the new branch
    * create some new file or make some changes
    * commit it and look at the graph again
        * your project history has diverged


* a branch in Git is actually a simple file that contains the 40 character SHA-1 checksum of the commit it points to
    * => branches are cheap to create and destroy
    * creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline).
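
The 41-byte claim is easy to check by writing such a file by hand. The sketch below uses a temporary directory as a stand-in for `.git`, and the commit id is a placeholder hash, not a commit from any real repository.

```python
import hashlib
import os
import tempfile

# placeholder 40-character hex id
commit_id = hashlib.sha1(b"stand-in commit").hexdigest()

with tempfile.TemporaryDirectory() as fake_git_dir:
    heads = os.path.join(fake_git_dir, "refs", "heads")
    os.makedirs(heads)
    branch_file = os.path.join(heads, "iss12")
    # newline="\n" keeps the line ending a single byte on every platform
    with open(branch_file, "w", newline="\n") as f:
        f.write(commit_id + "\n")
    size = os.path.getsize(branch_file)

print(size)
```

Git can also keep references packed together in a single `packed-refs` file, so a branch does not always have a file of its own.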

* now imagine you need to make a *hotfix*; to do this, you need to move away from iss12 back to the main line of development
* before switching from the current branch to another, it is best to commit your changes
    * or stash them, which we will be talking about later
    
* you switch back to *master*
    * you create a new branch named "hotfix"
    * you hotfix what you need
    * then you need to merge it back to the master (after some tests)



### Merging - fast-forward
* branching is useful only if we can combine things back in the end
* `git merge <branch_to_be_merged_into_where_HEAD_is_pointing>`
* because the commit pointed to by the branch you merged in was directly ahead of the commit you’re on, Git moves the pointer forward
    * when you try to merge one commit with a commit that can be reached by following the first commit’s history, Git simplifies things by moving the pointer forward because there is no divergent work to merge together; this is called a **fast-forward**
    * notice that Git itself tells us this was a fast-forward merge
    * Git does its best to merge things automatically; if that is not possible, you need to do it manually or use `git mergetool`

* once the fix is merged, delete the branch you no longer need: `git branch -d hotfix`
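
Whether a merge can be a fast-forward comes down to an ancestry check on the commit graph. The sketch below is a toy model: the commit labels `C0` to `C4` and the helper names are assumptions, and `merge` only reports what kind of merge would be needed.

```python
# commit id -> list of parent ids (made-up labels)
parents = {"C0": [], "C1": ["C0"], "C2": ["C1"], "C4": ["C2"]}
branches = {"master": "C2", "hotfix": "C4"}


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge(current: str, other: str) -> str:
    if is_ancestor(branches[other], branches[current]):
        return "already up to date"
    if is_ancestor(branches[current], branches[other]):
        branches[current] = branches[other]   # just move the pointer
        return "fast-forward"
    return "needs a real merge"


result = merge("master", "hotfix")
print(result, branches["master"])
```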

### Merging - recursive (three-way merge)

* your development history has diverged from some older point
* because the commit on the branch you’re on isn’t a direct ancestor of the branch you’re merging in, Git has to do some work
    * in this case, Git does a simple **three-way merge**, using the two snapshots pointed to by the branch tips and the common ancestor of the two.

* `git checkout master`, then `git merge iss12`

* instead of just moving the branch pointer forward, Git creates a new snapshot that results from this three-way merge and automatically creates a new commit that points to it 
    * this is referred to as a merge commit
        * it is special by the fact that it has more than one parent

* if the merge stops on a conflict, resolve it in the affected files, then:
    * `git add <changed file>` 
    * `git merge --continue` 
* to give up and return to the state before the merge, use `git merge --abort`
* look at the git log to see the branched development
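
The idea of a three-way merge can be shown on whole files. This is a deliberately coarse sketch: Git itself merges at the level of lines within a file, while the hypothetical `three_way_merge` below only compares complete file contents. The file names and contents are made up.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two snapshots (path -> content) relative to their common ancestor."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither touched the file)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both sides changed it differently
            conflicts.append(path)
            continue
        if result is not None:   # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"index.html": "v1", "style.css": "v1"}
master = {"index.html": "v1 + hotfix", "style.css": "v1"}
iss12 = {"index.html": "v1", "style.css": "v1", "footer.html": "new"}

merged, conflicts = three_way_merge(base, master, iss12)
# a merge commit records the merged snapshot and has two parents
merge_commit = {"parents": ["master tip", "iss12 tip"], "snapshot": merged}
print(merged)
print(conflicts)
```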

### Merge conflicts - diverged paths

* if you changed the same part of the same file differently in the two branches you're merging together, Git won't be able to merge them cleanly

* this is what your repository looks like ![](./pictures/pre_conflict_diverged.png)

### Merge conflicts - resolving
* if we run `git merge branch` when there is a conflict, we will get (something like) the following message

    * `CONFLICT (content): Merge conflict in index.html` 
      `Automatic merge failed; fix conflicts and then commit the result.`

* Git hasn’t automatically created a new merge commit
    * it has paused the process while you resolve the conflict.
    * `git status` to see which files are unmerged at any point after a merge conflict


* open the conflicting file
    * `<<<<<<< HEAD`
    
    `......one version`
    
    `=======`
    
    `...... second version`
    
    `>>>>>>> <conflicting branch name>`

* resolve each conflicted section in the conflicted file
* run `git add` on each resolved file
    * staging the file marks it as resolved in Git

* run `git commit` to finalize the merge commit

* see the resulting graph on the next slide
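
To see what the markers delimit, the sketch below splits a conflicted file into its two competing versions. It assumes Git's default marker lines (seven `<`, `=` or `>` characters). The file content, the branch name `iss12` and the function `parse_conflicts` are made up for illustration.

```python
def parse_conflicts(text: str):
    """Return a list of (ours, theirs) line lists, one pair per conflict block."""
    blocks, ours, theirs, side = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs, side = [], [], "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            blocks.append((ours, theirs))
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
    return blocks


conflicted = """<div id="footer">
<<<<<<< HEAD
contact: support@example.com
=======
please contact us at email.support@example.com
>>>>>>> iss12
</div>
"""

blocks = parse_conflicts(conflicted)
print(len(blocks), "conflict block(s)")
print(blocks)
```

Resolving the conflict means replacing each block, markers included, with the content you want to keep.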

### Merged - conflict resolved 
![.](./pictures/conflict_resolved.png)

# 4. Collaboration <a name="collaboration"></a>

* make groups of 2, e.g. using [this](https://docs.google.com/spreadsheets/d/1Ajav7uEDKNhNS1VcxUMsIOwzMgK9P4BHi129YmOF95k/edit?usp=sharing) spreadsheet
* do the following:
    * create a public GitHub repo
    * each member clones the repository 
    * each member of the group makes some changes locally
    * one member pushes them to the (central) remote repository
    * the other member pulls the changes
    * then adds further changes and pushes them again
    
* put link to your github repositories next to your names in the matching spreadsheet
 