# Introduction to Git

## Overview

- Git vs svn
- Git configuration
- Repository setup
- Making changes
- Sharing changes

## Git vs svn

### Distributed

- Subversion (and hence fcm) is **centralised**: there is one central source of truth. Nearly all actions involve communicating with a server.

- Git is **distributed**: any number of copies can be synchronised as required. A local copy of a repository is entirely self-contained: nearly all actions are local, only communicated to the server when ready.

In practice, git is usually used in a centralised way, by simply agreeing that one of the copies is the main one - such as one on GitHub.

### Staging changes

- Subversion assumes that all changes to the **worktree** should be committed.
- Git additionally has a **staging area** to represent what will be included in the next commit.

The staging area gives more control over which changes go into which commit. Small, focused commits are a useful resource for future developers - for example when using `blame` to identify when a bug was introduced.

### Structure

- A subversion commit is like a snapshot of the entire repository, including all branches. Every commit therefore knows which branch it affects.
- A git commit is a snapshot of files, with no branch information. Commits simply point to parent commits, forming a graph, and a branch is nothing more than a pointer to a commit.

```
      branch1
      C
     /          main
A - B - D - G - H
     \     /
      E - F
```
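
The idea that a branch is only a pointer can be checked directly. The following is a minimal sketch in a throwaway repository; the path, file name and the branch name `topic` are made up for illustration.

```bash
# Scratch repository; the path and file name are illustrative only
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > file.txt
git add file.txt
git commit -q -m "Add file"
echo "second" >> file.txt
git commit -q -a -m "Extend file"

# Creating a branch just records a new name for the current commit
git branch topic
main_id="$(git rev-parse main)"
topic_id="$(git rev-parse topic)"
echo "main:  $main_id"
echo "topic: $topic_id"

# The commit object lists a tree, a parent, an author and a message,
# but no branch name
git cat-file -p HEAD
```

Both names resolve to the same commit ID, and the commit itself only refers to its parent.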

## Git configuration

### Identity

Before anything else, tell git what your name and email address are.

```bash
git config --global user.name "Your Name"
git config --global user.email "you@email"
```

This is because a commit in git **must** have an associated name and email address, so if you have not provided them, it will either guess (badly!) from your username and hostname, or refuse to commit.

Configuration is stored in a simple *ini* file `~/.gitconfig`, which could also be modified directly:

```ini
[user]
name = Your Name
email = you@email
```
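
To see how `git config` and the file contents relate without touching a real `~/.gitconfig`, here is a small sketch that uses a temporary file as a stand-in (the temporary file is an assumption for the demonstration only).

```bash
# Temporary file standing in for ~/.gitconfig (illustrative only)
cfg="$(mktemp)"

# --file makes git read and write this file instead of the usual locations
git config --file "$cfg" user.name "Your Name"
git config --file "$cfg" user.email "you@email"

# The file now contains a [user] section with both values
cat "$cfg"

# Individual values can be read back
name="$(git config --file "$cfg" --get user.name)"
echo "user.name is: $name"
```

With `--global` in place of `--file "$cfg"`, the same commands operate on `~/.gitconfig`.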

### Editor

Git sometimes needs to open a text editor, such as for entering a commit message. This can be configured in the same way as name and email:

```bash
git config --global core.editor "vim"
```

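Graphical editors often return control to the terminal immediately, so they need an option telling them to block until the file is closed, otherwise git cannot wait for them. For example: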

- `gedit -s`
- `gvim -f`
- `code -nw`

Git checks the environment variable `$GIT_EDITOR` first, then the `core.editor` setting, then `$VISUAL` and `$EDITOR`, and ultimately defaults to `vi`.
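
To check which editor git would actually launch, `git var GIT_EDITOR` prints the resolved value. The sketch below uses a throwaway repository, with `vim` and `nano` as example names only; it shows that a `GIT_EDITOR` environment variable wins over `core.editor`.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config core.editor "vim"

# With no GIT_EDITOR in the environment, the configured value is used
from_config="$(unset GIT_EDITOR; git var GIT_EDITOR)"
echo "core.editor only:  $from_config"

# GIT_EDITOR in the environment takes precedence over core.editor
from_env="$(GIT_EDITOR=nano git var GIT_EDITOR)"
echo "GIT_EDITOR=nano:   $from_env"
```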

### SSH

Commits will be transferred to and from GitHub via SSH.

It is recommended to generate a new key pair for GitHub, rather than using any existing keys:
```bash
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_github
```

Then, add the **public** SSH key (the contents of `~/.ssh/id_rsa_github.pub`) to your GitHub account, under ["SSH and GPG keys" settings](https://github.com/settings/keys).

It is also recommended to configure SSH to use this key, and only this key, when connecting to `github.com`. This can be done by adding the following to `~/.ssh/config`:

```
Host github.com
IdentityFile ~/.ssh/id_rsa_github
IdentitiesOnly yes
```

## Repository setup

In [1]:
mkdir -p "$SCRATCH/git"
cd "$SCRATCH/git"

### Cloning a repository

To contribute to an existing repository, first *clone* it with `git clone <url>`.

This copies the entire repository with all its branches and history to your machine.

The URL for a GitHub repository can be found by clicking the green "Code" button:

![GitHub clone URLs](clone.png)

*Note*

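- **HTTPS** works without credentials only for public repositories; cloning private repositories or pushing over HTTPS requires authentication, such as a personal access token.
- **SSH** uses the key pair configured above, and works for cloning private repositories and for pushing changes.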
- The URL can also be a path to a folder, on any machine accessible via SSH.
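
Since the URL can be a plain path, cloning is easy to try locally. This is a sketch under the assumption of a small stand-in repository called `source`; the folder names are hypothetical.

```bash
work="$(mktemp -d)"
cd "$work"

# A small source repository to clone from (stands in for a real project)
git init -q source
git -C source symbolic-ref HEAD refs/heads/main
git -C source config user.name "Example User"
git -C source config user.email "user@example.com"
echo "# Source" > source/README.md
git -C source add README.md
git -C source commit -q -m "Initial commit"

# Clone using a filesystem path as the URL
git clone -q source copy

# The clone has the history, plus a remote named "origin"
# that points back at the source
git -C copy log --oneline
git -C copy remote -v
```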

### Initialising a repository

To put some existing code under version control, a repository needs to be *initialised* with `git init`.

In [4]:
if [ -d example-repo ]; then rm -rf example-repo; fi

In [5]:
git init example-repo

Initialized empty Git repository in /net/spice/scratch/bsherrat/git/example-repo/.git/


*Behind the scenes*

All the information about a repository is in a hidden `.git` folder:

In [6]:
ls -A example-repo

.git


In [7]:
ls example-repo/.git

branches  config  description  HEAD  hooks  info  objects  refs


### Initial commit

It is customary in a fresh repository to create a very simple initial commit with a README file. This would eventually contain basic information about the repository contents such as installation or usage.

In [8]:
cd example-repo
echo '# Example repository' > README.md

A very important command that should be run regularly, to ensure you and git are on the same page, is `git status`.

In [9]:
git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md

nothing added to commit but untracked files present (use "git add" to track)


In [10]:
git add README.md
git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md



In [11]:
git commit -m "Initial commit"

[master (root-commit) 01bf887] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.md


In [12]:
git status

On branch master
nothing to commit, working tree clean


## Making changes

### Branching

Branches are managed with `git branch`:

- `git branch` - list current branches
- `git branch <name> [<from>]` - create a branch (from another)
- `git branch -d <name>` - delete a branch
- `git branch -m [<name>] <new_name>` - rename a branch
- `git branch -u <name>` - set the "upstream" (where `pull` and `push` go)

In [13]:
git branch

* master


First, `master` is no longer the preferred default branch name, so change it to `main`:

In [14]:
git branch -m main
git branch

* main


In [15]:
git branch dev
git branch

  dev
* main


Note that the new branch was NOT checked out: `main` is still the current branch.

To check out a branch, use `git checkout <branch>`.

In [16]:
git checkout dev

Switched to branch 'dev'


In [17]:
git branch

* dev
  main


Checking out works by modifying the files in your working copy to match how they should look at the tip of the branch. Git is kind enough to block you from checking out a branch if it would overwrite uncommitted changes.
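
The protection against overwriting uncommitted changes can be seen in a throwaway repository. The sketch below assumes a hypothetical file `notes.txt` that differs between two branches.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Commit a different version of the file on another branch
git checkout -q -b other
echo "changed on other" > notes.txt
git commit -q -a -m "Change notes"
git checkout -q main

# Leave an uncommitted edit on main, then try to switch
echo "uncommitted edit" > notes.txt
if git checkout other 2>/dev/null; then
    result="switched"
else
    result="blocked"
fi
echo "checkout was $result"
cat notes.txt   # the uncommitted edit is still there
```

Committing or abandoning the edit first lets the checkout go ahead.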

While `git branch` can't check out branches, `git checkout` is able to create them:

In [18]:
git checkout -b example
git branch

Switched to a new branch 'example'
  dev
* example
  main


In [19]:
git checkout dev
git branch -d example

Switched to branch 'dev'
Deleted branch example (was 01bf887).


### Do work

Working on a branch largely involves these commands:

- `git status` - describe the state of files, and suggest how to change it
- `git add` - add changes to the staging area
- `git restore <file>` *or* `git checkout -- <file>` - abandon unstaged changes
- `git restore --staged <file>` *or* `git reset HEAD <file>` - unstage changes
- `git diff` - show unstaged changes
- `git diff --staged` - show staged changes
- `git difftool` - show changes in a GUI
- `git commit` - commit staged changes

*Note*

- The term "stage" evolved relatively recently; some resources still refer to it as the "index" or "cache".
- `git restore` is quite a new command (v2.23, ~Aug 2019) and still marked as "experimental". It is recommended over the `checkout` / `reset` approach as those commands are heavily overloaded with different functionality.
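
As a rough illustration of the two `restore` forms, the sketch below stages an unwanted edit, unstages it, then abandons it. It assumes git 2.23 or newer and uses a hypothetical file `notes.txt` in a scratch repository.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "committed" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make a change and stage it
echo "unwanted" >> notes.txt
git add notes.txt
git status --short        # "M" in the first column: staged

# Unstage: the edit stays in the working tree
git restore --staged notes.txt
git status --short        # "M" in the second column: unstaged

# Abandon: the file goes back to its committed content
git restore notes.txt
git status --short        # no output: nothing to commit
```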

In [20]:
echo 'Hello, world!' >> README.md

In [21]:
git status

On branch dev
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")


In [22]:
git diff

diff --git a/README.md b/README.md
index 64624b3..2e2a1c4 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
 # Example repository
+Hello, world!


In [23]:
git add README.md
git status

On branch dev
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   README.md



In [24]:
git diff

In [25]:
git diff --staged

diff --git a/README.md b/README.md
index 64624b3..2e2a1c4 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
 # Example repository
+Hello, world!


In [26]:
git commit -m "Update readme"

[dev 52364e3] Update readme
 1 file changed, 1 insertion(+)


There are some [strong commit message conventions](https://commit.style/):

- Start with a one-line summary - try not to go beyond 50 characters, certainly not over 72. Since a git commit is essentially an email, this is the "subject".
- If more explanation is needed, separate it from the summary by a blank line. This is the "body" of the "email".
- Wrap at 72 characters - git commands generally do not do their own wrapping.
- Use the *imperative mood* (eg "fix x", "add y").
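
A message following these conventions might look like the one below. This is only a sketch: the wording of the message is invented, and `git commit -F -` is used so that the whole message can be supplied on standard input rather than through an editor.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Example" > README.md
git add README.md

# -F - reads the whole message from standard input
git commit -q -F - <<'EOF'
Add README with project title

Start the repository with a minimal README so that the hosting
site has something to display. Installation and usage notes can
follow once the layout settles.
EOF

git log -1 --format='subject: %s'   # the one-line summary
git log -1 --format='%b'            # the body, after the blank line
```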

Note that there are some shortcuts, instead of needing to add every change:

- `git add <folder>` - recursively add a folder and its contents.
  - For example, `git add .` to add *everything*. **Be careful** that temporary or built files are correctly ignored first!
- `git add -u` - stage all changes to files git already tracks. New (untracked) files are not added.
- `git commit -a` - commit all changes, again excluding new files.
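
The difference between tracked and new files is easy to check. In this sketch (file names are hypothetical), `git add -u` stages the edit to the tracked file but leaves the new file alone.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

# One edit to a tracked file, plus one brand new file
echo "v2" > tracked.txt
echo "new" > untracked.txt

git add -u
git status --short
```

The status output marks `tracked.txt` as staged (`M` in the first column) while `untracked.txt` is still listed with `??`.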

### When things go wrong...

Something will definitely go wrong at some point, from simple typos to committing to the wrong branch and only realising several commits in. **There is always a way out**. [This website](https://dangitgit.com/en) contains a good summary.

The main commands are:

- `git commit --amend` - amend the most recent commit with any staged changes, instead of creating a new commit.
- `git commit --amend --no-edit` - as above but without a prompt to edit the commit message.
- `git reflog` - show a log of every single state the `HEAD` has been in.
- `git reset --soft <commit>` - move `HEAD` (and the current branch) to a different commit, such as one found in the reflog, leaving the staging area and working tree untouched.
- `git reset` - as above but also update the staging area to match the commit.
- `git reset --hard` - as above but update the working tree as well. **Caution**: this will overwrite changes with no warning.
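
As one example of a way out, the sketch below deliberately discards a commit with `git reset --hard`, then recovers it through the reflog. The repository and file are throwaway examples.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -a -m "Second commit"

# Oops: throw away the latest commit
git reset -q --hard HEAD~1
git log --oneline          # only "First commit" remains on the branch

# The reflog still remembers where HEAD was before the reset
git reflog

# HEAD@{1} means "where HEAD was one move ago"
git reset -q --hard 'HEAD@{1}'
git log --oneline          # "Second commit" is back
```

This only recovers committed work; uncommitted changes overwritten by `--hard` are not in the reflog.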

## Sharing changes

### Remotes

A remote is another copy of a repository, such as one on GitHub. Changes can be pulled from a remote, or pushed to it.

When cloning a repository with `git clone <url>`, a remote called `origin` is automatically defined with the URL it was cloned from.

When a repository was created with `git init` instead, the remote must be defined separately. GitHub will remind you how to do this after [creating a new repository](https://github.com/new).

In [27]:
git remote add origin git@github.com:bsherratt/example-repo.git

In [28]:
# Actually just use a local folder, so the notebook is more self-contained
if [ -d ../origin ]; then rm -rf ../origin; fi
git init --bare ../origin
git remote set-url origin ../origin

Initialized empty Git repository in /net/spice/scratch/bsherrat/git/origin/


### Remote branches

- **Local** branches can be freely checked out and committed to as expected. This is what we have been using so far.
- **Remote** branches (shown as `origin/<name>`) exist only to represent the last known state of a remote repository, and cannot be committed to directly.
- A local branch can **track** a remote branch (or even another local branch).
- The branch being tracked is referred to as **upstream**.

The main commands to keep these branches in sync with each other are:

- `git fetch` - update remote branches to match the remote repository
- `git merge` (with no argument) - merge the upstream into the current branch
- `git pull` - `git fetch` followed by `git merge`
- `git push` - push to the upstream branch, if there is one. If not, it is an error, but git will suggest...
- `git push -u origin <branch>` - push `<branch>` to a branch of the same name at `origin`, and set it as the upstream
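
To see `fetch` and `merge` as the two halves of `pull`, the sketch below uses a local bare repository in place of GitHub and two clones. The names `alice` and `bob` are purely illustrative.

```bash
work="$(mktemp -d)"
cd "$work"

# A bare repository standing in for GitHub
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/main

# First clone: create and publish the initial commit
git clone -q origin.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"
git push -q -u origin main
cd ..

# Second clone, taken before the next change is pushed
git clone -q origin.git bob

# Alice publishes another commit
cd alice
echo "more" >> file.txt
git commit -q -a -m "Extend file"
git push -q
cd ..

cd bob
git fetch -q                     # updates origin/main only
git status --short --branch      # local main is reported as behind
git merge -q                     # no argument: merge the upstream branch
git status --short --branch      # back in sync with origin/main
```

Running `git pull` in `bob` would have performed both steps at once.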

In [29]:
git push -u origin main dev

Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), 464 bytes | 154.00 KiB/s, done.
Total 6 (delta 1), reused 0 (delta 0)
To ../origin
 * [new branch]      main -> main
 * [new branch]      dev -> dev
Branch 'main' set up to track remote branch 'main' from 'origin'.
Branch 'dev' set up to track remote branch 'dev' from 'origin'.


There is now a copy of the `dev` branch at `origin` - ready for someone else to review.

On GitHub, this is done via a "Pull Request", and it will even prompt you to create one after pushing to a branch:

![GitHub pull request prompt](pr-prompt.png)

Upstream and remote branches are mentioned by several of the commands already seen:

In [30]:
git status

On branch dev
Your branch is up to date with 'origin/dev'.

nothing to commit, working tree clean


In [31]:
git branch -r

  origin/dev
  origin/main


In [32]:
git branch -a

* dev
  main
  remotes/origin/dev
  remotes/origin/main


In [33]:
git log --oneline

52364e3 (HEAD -> dev, origin/dev) Update readme
01bf887 (origin/main, main) Initial commit


### Merges

**True merge**: a new commit is created with two parents.

```
    main                 main
A - B            A - B - D
 \         -->    \     /
  C                C --'
  dev              dev
```

**Fast-forward**: one branch is simply updated to point at the other, which is possible because it has no commits that are not already in the other branch.

```
    main
A - B                     main
     \      -->   A - B - C
      C                   dev
      dev
```

**Rebase**: rewriting all the commits in a branch, as if it had been branched from somewhere else.

```
    main              main
A - B             A - B
 \          -->        \
  C - D                 C' - D'
      dev                    dev
```

The default behaviour of `git merge` is to attempt a fast-forward, or fall back to a true merge, though this can be controlled with the `--no-ff` or `--ff-only` options, or appropriate config settings.
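
For comparison with a true merge, here is a minimal sketch of a fast-forward in a scratch repository (branch and file names are illustrative). With `--ff-only`, git refuses to do anything other than move the branch pointer.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "A" > file.txt
git add file.txt
git commit -q -m "A"

# dev gains a commit that main does not have
git checkout -q -b dev
echo "B" >> file.txt
git commit -q -a -m "B"

# main has nothing new, so it can simply be moved forward to dev
git checkout -q main
git merge -q --ff-only dev
git log --oneline --graph   # two commits in a line, no merge commit
```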

When incorporating changes into `main`, repositories usually either always merge, or always fast-forward (rebasing if necessary). GitHub makes both options easy:

![Merge button](merge.png)

For bringing changes to `main` back into a development branch, merging and rebasing are both good options, entirely down to preference.

Rebasing creates a simpler, linear history than merging, but conflicts are generally harder to resolve correctly.
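
A rebase can be tried safely in a scratch repository. In the sketch below (file names are hypothetical and chosen so that no conflicts occur), `dev` is replayed on top of `main`, which gives its commit a new ID.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "A" > base.txt
git add base.txt
git commit -q -m "A"

# dev and main each gain a commit, touching different files
git checkout -q -b dev
echo "C" > dev.txt
git add dev.txt
git commit -q -m "C"
old_tip="$(git rev-parse dev)"

git checkout -q main
echo "B" > main.txt
git add main.txt
git commit -q -m "B"

# Replay dev's commits on top of the current main
git checkout -q dev
git rebase -q main
new_tip="$(git rev-parse dev)"

git log --oneline --graph --all    # a single straight line of commits
echo "dev tip before: $old_tip"
echo "dev tip after:  $new_tip"    # different ID: the commit was rewritten
```

Because the commit IDs change, rebasing a branch that others have already pulled will force them to reconcile their copies.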

In [34]:
git checkout main

Switched to branch 'main'
Your branch is up to date with 'origin/main'.


In [35]:
git log --oneline --all

52364e3 (origin/dev, dev) Update readme
01bf887 (HEAD -> main, origin/main) Initial commit


Since there are no commits on `main` that are not on `dev`, this would be a fast-forward. Let's force a true merge instead.

In [36]:
git merge --no-ff --no-edit dev

Merge made by the 'recursive' strategy.
 README.md | 1 +
 1 file changed, 1 insertion(+)


In [37]:
git log --graph --oneline --all

*   2c6588e (HEAD -> main) Merge branch 'dev' into main
|\  
| * 52364e3 (origin/dev, dev) Update readme
|/  
* 01bf887 (origin/main) Initial commit


## Appendix

### Recommended config

| Setting                | Value  | Why |
|:-----------------------|:-------|:----|
| core.eol               | lf     | Use Linux line endings in checked out files, even on Windows |
| merge.ff               | false  | Make `merge` always a "true merge" |
| merge.conflictStyle    | diff3  | Add common ancestor to conflict markers |
| mergetool.keepBackup   | false  | Tidy `*.orig` files when `mergetool` finishes |
| color.status.changed   | yellow | Colour unstaged files differently to staged (green) |
| color.status.untracked | blue   | Nicer than the default red |
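
These settings could be applied with `git config` as sketched below. A temporary file is used here as a stand-in so that nothing real is modified; replacing `--file "$cfg"` with `--global` would write to `~/.gitconfig` instead.

```bash
# Temporary file standing in for ~/.gitconfig (illustrative only)
cfg="$(mktemp)"

git config --file "$cfg" core.eol lf
git config --file "$cfg" merge.ff false
git config --file "$cfg" merge.conflictStyle diff3
git config --file "$cfg" mergetool.keepBackup false
git config --file "$cfg" color.status.changed yellow
git config --file "$cfg" color.status.untracked blue

# Show everything that was set
git config --file "$cfg" --list
```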

### Suggested aliases

| Alias         | Command |
|:--------------|:--------|
| alias.st      | `status` |
| alias.br      | `branch` |
| alias.co      | `checkout` |
| alias.cb      | `checkout -b` |
| alias.lg      | `log --oneline` |
| alias.graph   | `log --graph --oneline --all` |
| alias.sdiff   | `diff --cached` |
| alias.unstage | `restore --staged` |
| alias.amend   | `commit --amend --no-edit` |
| alias.reword  | `commit --amend --only --` |
| alias.ff      | `merge --ff-only` |
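
Aliases are ordinary config entries, so they are defined with `git config` as well. The sketch below sets three of them in a scratch repository only; adding `--global` would make them available everywhere.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Repository-local aliases; add --global to define them for all repositories
git config alias.st status
git config alias.lg "log --oneline"
git config alias.unstage "restore --staged"

echo "# Example" > README.md
git add README.md
git commit -q -m "Initial commit"

git st --short      # same as: git status --short
git lg              # same as: git log --oneline
```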

### Common commands

| Action                    | git                          | fcm                    |
|:--------------------------|:-----------------------------|:-----------------------|
| Check file status         | `status`                     | `status`               |
| Track new file            | `add <files>`                | `add <files>`          |
| Stage changes             | `add <files>`                | -                      |
| Stage all changes         | `add -u`                     | -                      |
| Unstage changes           | `restore --staged <files>`   | -                      |
| Abandon unstaged changes  | `restore <files>`            | `revert <files>`       |
| Remove file               | `rm <files>`                 | `rm <files>`           |
| View unstaged changes     | `diff`                       | -                      |
| View staged changes       | `diff --cached`              | `diff`                 |
| Open diff GUI             | `difftool`                   | `diff -g`              |
| Commit staged changes     | `commit`                     | -                      |
| Commit unstaged changes   | `commit -a`                  | `commit`               |
| Edit latest commit        | `commit --amend`             | -                      |
| List commits              | `log`                        | `log`                  |

| Action                    | git                          | fcm                    |
|:--------------------------|:-----------------------------|:-----------------------|
| Get a local copy          | `clone <url>`                | `checkout <url>`       |
| Create a branch           | `branch <name>`              | `branch-create <name>` |
| Switch checked-out branch | `checkout <branch>`          | `switch <branch>`      |
| Check out new branch      | `checkout -b <name>`         | -                      |
| Merge without committing  | `merge --no-commit <branch>` | `merge <branch>`       |
| Create a merge commit     | `merge <branch>`             | -                      |
| Resolve conflicts         | `mergetool`                  | `conflicts`            |
| Update local copy         | `pull`                       | `update`               |
| Publish branch            | `push -u origin <branch>`    | -                      |
| Share new commits         | `push`                       | -                      |