This notebook is derived from [this](https://neurathsboat.blog/post/git-intro/) excellent git tutorial, which in turn borrows from [here](https://swcarpentry.github.io/git-novice/). You can refer to these resources for more details on what's covered below, as well as more git functionality.

# TL;DR

"Too long; didn't read"

This is all you need to know to get started and for 90% of your git work.

Here's how staging is structured in git:

![git structure](https://neurathsboat.blog/post/git-intro/featured.png)

Here's how you set up a new git repo and push it to GitHub:

In [None]:
cd your_project_directory
# initialise a new repository in the current directory
git init
# add all files to a staging area
git add .
# commit files to the repository
git commit -m "Commit message"
# connect the local project with a remote (GitHub) repository
# you'll need to have already created a GitHub project to do this
# you'll need to have created an SSH key pair and registered it on GitHub
git remote add origin git@github.com:account/git-demo.git
# make sure the local branch is called "main" (older git versions default to "master")
git branch -M main
# push your local repo to the remote repo
git push -u origin main

Here's how you continue to work on the project and push new changes:

In [None]:
# pull the remote repository (defined above as "origin") to this directory
git pull origin main
# make edits then add all files to the staging area
git add .
# commit changes to the local repo
git commit -m "Commit message"
# push changes to the remote repo
git push origin main

# Using this notebook

This notebook runs shell commands (Bash), __not__ Python. To run this notebook, you'll need to install Bash Kernel. You can do this by running the following commands in your terminal:

```
pip install bash_kernel
python -m bash_kernel.install
```

Afterwards, restart Jupyter (Notebook or Lab) and select "Bash" as the kernel for this notebook.

Alternatively, you can copy the commands from this notebook into the terminal if you don't wish to use Bash kernel. 

You need to have git installed on your system. You can find instructions for installing git on all systems [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).

Here's how to install it on a Debian-based Linux operating system (however, it may already be installed). Again, run this in your terminal. You'll need admin rights to install git.

```
sudo apt install git-all
```

Now that git is installed you can configure it with your details.

In [None]:
git config --global user.name "Your Name"
git config --global user.email "name@mail.com"
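
To check what git has recorded, you can read the values back. The sketch below is only an illustration: it works in a throwaway repository and uses repository-level (`--local`) settings so that your real global configuration is left alone, and the name and email are placeholders.

```shell
# Create a throwaway repository so the demo leaves your real settings alone
demo=$(mktemp -d)
git init -q "$demo"

# --local stores the value in this repository only (it takes priority over --global here)
git -C "$demo" config --local user.name "Demo User"
git -C "$demo" config --local user.email "demo@example.com"

# Read the values back
git -C "$demo" config --get user.name
git -C "$demo" config --get user.email
```

Running `git config --global --get user.name` in the same way shows the value that applies to all of your repositories.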

Let's get ready to start the class by clearing any previous demo results from the directory.

In [None]:
rm -rf git_demo # -f: don't complain if the folder doesn't exist yet

# Why git?

Scientific practice needs to

- be collaborative
- be documented
- be open
- be amenable to revision
- include verifiable computer code

Scientists need to collaborate in a way that gracefully and robustly handles concurrent editing. Common practices to handle even simple cases of single-file versioning edited by few collaborators are cumbersome. An example that should be familiar to all academic researchers is to have a folder piled with versions of a single document.

![naming](https://neurathsboat.blog/post/git-intro/img/notFinal.gif)

The purpose of distributed source control systems, such as Git, is to enable multiple people to work on the same project in parallel or asynchronously. In combination with using a central online repository, everyone can have immediate access to the latest main version, while at the same time working on their private copies.

Simultaneously editing a file in Google Docs, although admirably powerful, can be very distracting, as a change introduced by someone may disrupt another person’s work. But with Git every collaborator is working with a private copy, or branch as we will later see. Changes inside a private branch do not affect other branches.

At the most basic level, Git automatically tracks the answers to the questions:

- What has changed?
- Who made the change?
- When was the change made?

and in addition motivates you to answer the question

- Why did it change? What is this change about?
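
As a rough illustration of where those answers end up, the sketch below makes one commit in a throwaway repository and then asks `git log` for the author, date and message. The file name, user name and commit message are made up for the example.

```shell
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

echo "first draft" > "$demo/notes.txt"
git -C "$demo" add notes.txt
git -C "$demo" commit -q -m "Add first draft of notes"

# Who (%an), when (%ad) and why (%s, the commit message) for each commit
git -C "$demo" log --date=short --format="%an | %ad | %s"

# What changed: summarise the files touched by the latest commit
git -C "$demo" show --stat --oneline HEAD
```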

# Local git repositories

I will first describe some basic commands for use through a terminal. However, many text editors and integrated development environments (IDEs, e.g. VS Code, Atom) now offer a way to use these basic commands through a graphical user interface. For more advanced usage, there are even standalone graphical git clients, and we will see some examples of graphical interfaces for git later in this notebook.

In [1]:
mkdir git_demo # Create folder
cd git_demo # Move to that folder
git init # Tell Git to start watching this folder

Initialized empty Git repository in /home/jvrt/git_class/git_demo/.git/


If you look in the directory, you can see there is now a hidden directory called `.git` (names starting with "." are hidden) which contains all of the git information:

In [53]:
ls -la

total 16
drwxr-xr-x 3 jvrt jvrt 4096 Mar  8 14:26 .
drwxr-xr-x 4 jvrt jvrt 4096 Mar  8 16:27 ..
drwxr-xr-x 8 jvrt jvrt 4096 Mar  8 16:26 .git
-rw-r--r-- 1 jvrt jvrt    5 Mar  8 14:26 test.txt


Now the git software knows this is a git repo so we can start using git commands with it.

In [54]:
git status

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
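
If you are ever unsure whether a folder is under git's control, you can ask git directly instead of looking for the hidden directory. This is a small sketch using a throwaway folder rather than the demo project:

```shell
demo=$(mktemp -d)
git init -q "$demo"

# Prints "true" when the directory is inside a git working tree
git -C "$demo" rev-parse --is-inside-work-tree

# Prints the location of the directory holding git's data
git -C "$demo" rev-parse --git-dir
```

Outside a repository the same commands fail with a "not a git repository" error.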


Now let's start adding some files and text to our project.

In [4]:
touch test.txt # Create file
echo spam >> test.txt # Write "spam" inside test.txt
cat test.txt # Display file contents

spam


To include this file in our staging area we need to `add` it. We should `add` things frequently. Think of it as backing up the work you're doing, like the save button in Word.

In [None]:
git add test.txt
git status
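
To see the difference between staged and unstaged files more clearly, here is a small sketch in a throwaway repository. The second file, `other.txt`, is made up for the example and is deliberately left unstaged.

```shell
demo=$(mktemp -d)
git init -q "$demo"
echo spam > "$demo/test.txt"
echo eggs > "$demo/other.txt"

# Stage only one of the two new files
git -C "$demo" add test.txt

# Short status: "A" marks a newly staged file, "??" an untracked one
git -C "$demo" status --short

# Show exactly what is staged and would go into the next commit
git -C "$demo" diff --staged
```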

Every so often, when we've performed a chunk of work that has achieved something in the project (e.g. fixed a bug, added a new function), we can `commit` it to the repository. When we do so, it's important to include a message that briefly describes what this chunk of work achieved:

In [None]:
git commit -m "Include spam"

So far everything has happened on the default branch. To try out changes without touching it, we can create a new branch and switch to it in one step with `git checkout -b`. Here we call the branch `develop`, edit the file there, and commit the change:

In [59]:
git checkout -b develop

Switched to a new branch 'develop'


In [60]:
echo "I am developing" >> test.txt
cat test.txt

spam
I am developing


In [61]:
git status

On branch develop
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

no changes added to commit (use "git add" and/or "git commit -a")


In [66]:
git add --all
git commit -m "include asinine statement"

[develop 6a6a4e0] include asinine statement
 1 file changed, 1 insertion(+)


To revert a file to the latest state Git has recorded, we simply check it out from the branch’s HEAD:

In [None]:
git checkout HEAD test.txt
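
The command only has a visible effect when the file contains uncommitted edits. The sketch below shows that situation in a throwaway repository. Be aware that the discarded edit cannot be recovered afterwards.

```shell
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

echo spam > "$demo/test.txt"
git -C "$demo" add test.txt
git -C "$demo" commit -q -m "Include spam"

# Make an edit we will regret, without committing it
echo "oops" >> "$demo/test.txt"
git -C "$demo" status --short

# Restore the file to the version stored in the last commit (HEAD);
# the "--" separates the revision from the file name
git -C "$demo" checkout HEAD -- test.txt
cat "$demo/test.txt"
```

Recent versions of git also offer `git restore test.txt` for the same purpose, as the hint in the `git status` output suggests.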

At some point, we will feel that our edits are complete. We can then merge the changes we made in the `develop` branch into the `main` branch. First, we need to switch back to the `main` branch and then use the git command `merge`:

In [68]:
git checkout main
git merge develop

Switched to branch 'main'
Your branch is up to date with 'origin/main'.
Updating bb13dd5..6a6a4e0
Fast-forward
 test.txt | 1 +
 1 file changed, 1 insertion(+)
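
The whole branch-and-merge cycle can be sketched end to end in a throwaway repository. The commit messages are made up, and deleting the merged branch at the end is an optional tidy-up rather than something the steps above require.

```shell
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

echo spam > "$demo/test.txt"
git -C "$demo" add test.txt
git -C "$demo" commit -q -m "Include spam"
git -C "$demo" branch -M main            # name the default branch "main"

# Do some work on a separate branch
git -C "$demo" checkout -q -b develop
echo "I am developing" >> "$demo/test.txt"
git -C "$demo" commit -q -a -m "Add a line on develop"

# Merge it back: main has not moved, so git simply fast-forwards
git -C "$demo" checkout -q main
git -C "$demo" merge -q develop

# Inspect the history, then delete the branch that is no longer needed
git -C "$demo" log --oneline --graph --all
git -C "$demo" branch -d develop
```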


# Remote git repositories

This is where online services such as GitHub and GitLab come in. Both GitHub and GitLab offer free public repositories with many useful features.

While anyone can view public GitHub projects, you need to create an account to create or interact with a hosted project.

## Remote repository security

Interacting with your GitHub repository outside its own web interface will be "simplified" (read: (╯°□°）╯︵ ┻━┻ ) if you create a secure shell (SSH) key pair (one public key, one private key). This process can be fraught with errors and you may need to do some Googling to find solutions to problems you run into. GitHub no longer accepts your account password for git operations from the command line (you need either an SSH key or a personal access token), so it's good to figure it out now. The steps below should cover some of the major headaches.

You need an SSH key pair for every device you intend to use with GitHub. If you do not already have an SSH key associated with your computer & GitHub account, create one with:

In [None]:
ssh-keygen -t ed25519 -C "jvrt@pm.me" -f ~/.ssh/id_ed25519 -N "" -m PEM

You'll need to add the public part of this key pair to your GitHub account under:

Settings ➡ SSH and GPG Keys

Then click __New SSH key__. Give it a sensible name so you know which device the key is associated with (e.g. "my laptop"), then copy the output of the following cell into the key cell.

In [None]:
cat ~/.ssh/id_ed25519.pub

Finally hit __Add SSH key__.

You'll now need to make your computer "aware" of this SSH key pair using:

In [36]:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519

Agent pid 19077
Identity added: /home/jvrt/.ssh/id_ed25519 (jvrt@pm.me)


You also need to restrict access to the keys (so only you can access them on your computer):

In [40]:
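chmod 700 ~/.ssh                 # only you can enter the directory
chmod 600 ~/.ssh/id_ed25519      # private key: owner read/write only
chmod 644 ~/.ssh/id_ed25519.pub  # the public key may be readable by others
touch ~/.ssh/known_hosts         # make sure the file exists before changing its mode
chmod 644 ~/.ssh/known_hosts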
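
You can confirm the result with `ls -l`, whose first column shows the permissions. The sketch below uses a scratch directory with empty placeholder files instead of your real `~/.ssh`, and applies one common layout (700 for the directory, 600 for the private key, 644 for the public key).

```shell
# Scratch directory standing in for ~/.ssh (placeholder files, not real keys)
scratch=$(mktemp -d)
touch "$scratch/id_ed25519" "$scratch/id_ed25519.pub"

chmod 700 "$scratch"
chmod 600 "$scratch/id_ed25519"
chmod 644 "$scratch/id_ed25519.pub"

# The first column shows the mode of the directory, private key and public key
ls -ld "$scratch" "$scratch/id_ed25519" "$scratch/id_ed25519.pub"
```

To inspect your real keys, run `ls -ld ~/.ssh ~/.ssh/id_ed25519 ~/.ssh/id_ed25519.pub`.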

Now add github.com as a known SSH host, so your computer trusts it:

In [None]:
ssh-keyscan github.com >> ~/.ssh/known_hosts

## Creating a remote repository

Now it's time to create your remote repository. Do so on GitHub.com like so:

![create github repo](./git_create_repo.png)

GitHub helpfully gives you the commands you need next to sync your local repo with their servers:

![create github repo options](./git_create_repo_options.png)

In [None]:
git remote add origin git@github.com:JamesTownend/git_demo.git
git branch -M main
git push -u origin main

This local repo is now aware of the remote repo, so we can push changes to it using

In [None]:
git push origin main

Or, because the `-u` flag above recorded `origin/main` as the upstream of our local `main` branch, we can omit both names:

In [None]:
git status

In [None]:
git push

And we can pull changes from the remote using

In [None]:
git pull
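
If you want to practise the push/pull cycle without touching GitHub, a bare repository on your own disk can stand in for the remote. The sketch below is an illustration under that assumption: the folder names `remote.git`, `copy_a` and `copy_b` are made up, and the two working copies play the role of two collaborators (or two of your own computers).

```shell
work=$(mktemp -d)

# A bare repository plays the role of the GitHub remote
git init -q --bare "$work/remote.git"

# First working copy: commit and push, setting the upstream with -u
git init -q "$work/copy_a"
git -C "$work/copy_a" config user.name "Demo User"
git -C "$work/copy_a" config user.email "demo@example.com"
echo spam > "$work/copy_a/test.txt"
git -C "$work/copy_a" add test.txt
git -C "$work/copy_a" commit -q -m "Include spam"
git -C "$work/copy_a" branch -M main
git -C "$work/copy_a" remote add origin "$work/remote.git"
git -C "$work/copy_a" push -q -u origin main

# Second working copy: clone the main branch of the "remote"
git clone -q -b main "$work/remote.git" "$work/copy_b"

# A new change is pushed from the first copy (no arguments: upstream is set)
echo eggs >> "$work/copy_a/test.txt"
git -C "$work/copy_a" commit -q -a -m "Add eggs"
git -C "$work/copy_a" push -q

# The second copy picks it up with a pull
git -C "$work/copy_b" pull -q --ff-only
cat "$work/copy_b/test.txt"
```

`git remote -v` inside either working copy lists the address that `origin` points to.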

# Merge (AKA pull) requests

If we are working with others on a remote repository, when we want to merge a local branch with the main branch of the remote repository we can open a merge request (or pull request in GitHub’s terminology). The main difference from a local merge like the one we performed before is that collaborators on the project are now free to submit a new merge request, or to review an existing one and suggest improvements or comment on it as a whole.

Every merge request starts by creating a branch. Merge requests are performed on 

GitHub.com ➡ your repo ➡ pull request

Let's try it with our demo repo.

First let's create a new branch, make some changes, and commit those changes.

In [None]:
git checkout -b test_merge_request
echo "random text for our merge request!" >> test.txt
git add --all
git commit -m "included some random text"

Now we can push this branch to the remote repo (`origin`). GitHub conveniently gives us a link to the pull request.

In [None]:
git push origin test_merge_request
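
Before opening the request, it can help to preview locally what it will contain. The following sketch rebuilds the same situation in a throwaway repository and compares the branch against `main`. It is only a local approximation of what GitHub will show on the pull request page.

```shell
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

echo spam > "$demo/test.txt"
git -C "$demo" add test.txt
git -C "$demo" commit -q -m "Include spam"
git -C "$demo" branch -M main

git -C "$demo" checkout -q -b test_merge_request
echo "random text for our merge request!" >> "$demo/test.txt"
git -C "$demo" commit -q -a -m "included some random text"

# Commits on the branch that main does not have yet
git -C "$demo" log --oneline main..test_merge_request

# Files changed since the branch split off from main (note the three dots)
git -C "$demo" diff --stat main...test_merge_request
```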

# Git for non-text

Putting your manuscripts and other text-based files under Git can be rewarding, as long as you take some considerations into account. While Git can track any file that you instruct it to, with some file formats you are not getting the most out of Git. An example that is relevant to many academics is Microsoft Word’s native file format, DOCX. Since a DOCX file is a zip archive of XML files, Git treats it as a binary file and a diff of it is not very informative.

Consider using LaTeX or Markdown to write documents, as these are text-based file formats that play well with git.

You should make it a habit of ignoring large non-text files in your repository, which GitHub is not suited for (and may be too large to store for free anyway). You can ignore files and folders by including them in a text file called `.gitignore`. To associate ignored files with your project when you want to share it, you can include code to download these files from another storage location, such as [Zenodo](https://zenodo.org/). Zenodo allows you to associate files with a DOI, which makes sharing easier.

Here's an example of a `.gitignore` file, used to ignore the `.ipynb_checkpoints` folder used to store Jupyter notebook save data, which serves no purpose by being included in the repo:

In [None]:
cat .gitignore
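
As a minimal sketch of how such a file behaves, the example below creates a `.gitignore` with a single rule in a throwaway repository and checks which files git still reports. The notebook file names are made up for the illustration.

```shell
demo=$(mktemp -d)
git init -q "$demo"

# Ignore the Jupyter checkpoint folder (the trailing slash means "directory")
echo ".ipynb_checkpoints/" > "$demo/.gitignore"

mkdir "$demo/.ipynb_checkpoints"
touch "$demo/.ipynb_checkpoints/notebook-checkpoint.ipynb" "$demo/notebook.ipynb"

# Only .gitignore and notebook.ipynb are reported as untracked
git -C "$demo" status --short

# Ask git which rule causes a path to be ignored
git -C "$demo" check-ignore -v .ipynb_checkpoints/notebook-checkpoint.ipynb
```

The `.gitignore` file itself is usually committed, so that everyone working on the project ignores the same files.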

# Other ways of using git

Despite the flexibility it offers, using the command line is a deterrent for some people. However, everything that we have done so far with Git and even more can be performed from inside the comfort of a modern text editor, IDE, or dedicated software application.

I use [VS Code](https://code.visualstudio.com/) to develop in, which has built-in git (and Jupyter!) functionality.

You can also manage git in [RStudio](https://www.rstudio.com/products/RStudio/)

![Rstudio](https://neurathsboat.blog/post/git-intro/img/Rstudio_commit.png)

and [GitKraken](https://www.gitkraken.com/git-client)

![GitKraken](https://neurathsboat.blog/post/git-intro/img/GitKraken_commit.png)