# Things to remember about `git` and GitHub

## `git` ≠ GitHub

There's been some confusion about the difference between `git` and GitHub, specifically about why changes you make using `git` don't show up on GitHub.

### `git` is the program you use for tracking changes
This will be a sped-up version of the `git` intro from Software Carpentry and will hit all the high points that you need to remember for working with `git` in this class.

Let's start with a fresh new repository.

```
$ cd ~/code
$ mkdir biom262-git-test
$ cd biom262-git-test
$ git init
Initialized empty Git repository in /Users/olga/workspace-git/biom262-git-test/.git/
```

Let's see what `git` knows about this repository using `git status`:

```
$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)
```

There are no files here yet. Just like in the `git` tutorial, we'll create a few empty files and then see how this changes the `git` repo.

```
$ touch pipeline.sh
$ touch statistics.sh
$ touch plots.sh
```

We should now see a few files, both using `ls` (which literally tells us which files are there) and `git status` (which tells us what `git` knows about the files):

```
$ ls
pipeline.sh   plots.sh      statistics.sh
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	pipeline.sh
	plots.sh
	statistics.sh

nothing added to commit but untracked files present (use "git add" to track)
```

At this point, **`git` has no idea that these files matter to you**. Where it says "`Untracked files`", it shows a list of files that `git` sees but doesn't think you care about, so it's not *tracking* their changes. Let's `git add` and `git commit` these files. I'm going to be lazy and add everything that ends in "`.sh`".

```
$ git add *.sh
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   pipeline.sh
	new file:   plots.sh
	new file:   statistics.sh
```

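Oh wait, I don't want to commit `plots.sh` yet. So let's remove it from the staging area using `git`'s hint of `git rm --cached <file>` to "unstage." You can think of the "staging area" as a corral for cattle: it's where you gather the files you're about to commit, and `git rm --cached` lets one back out without deleting it from disk.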
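As a minimal sketch of that unstaging step, the commands below rebuild the same three empty files in a throwaway directory (made with `mktemp -d`, an assumption so the example doesn't touch your real repository), stage everything, and then unstage `plots.sh`. The name and email passed with `-c` are placeholders so the commit works on a machine where `git` isn't configured yet.

```shell
# Work in a scratch directory (hypothetical; any empty folder works)
demo=$(mktemp -d)
cd "$demo"
git init -q
touch pipeline.sh statistics.sh plots.sh

# Stage every .sh file, then take plots.sh back out of the staging area
git add *.sh
git rm --cached -q plots.sh

# Commit only what is still staged (placeholder identity via -c)
git -c user.name="Example Student" -c user.email="student@example.com" \
    commit -q -m "Add pipeline and statistics scripts"

# plots.sh still exists on disk, but git lists it as untracked ("??")
git status --short
```

After this, `pipeline.sh` and `statistics.sh` are committed, and `plots.sh` is waiting for you to `git add` it whenever it's ready.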

The same workflow applies on TSCC: you edit some files in a `git` repository, run `git status` to see what changed, and then `git add` and `git commit` those files.

### GitHub is the place you put files so you don't lose your dissertation work when your computer explodes

After you've `git add`'d, `git commit`'d and everything, GitHub (the place) still has no idea what you've done. It's only when you `git push` and `git pull` that you talk to GitHub: `git push` sends your commits up, and `git pull` brings down commits that somebody else pushed.

```
$ git pull upstream master
$ git push origin week04
```
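To see that commits stay local until you push, here is a small sketch that uses a bare repository on disk as a stand-in for GitHub (an assumption, so the example runs without a network connection or an account). The `week04` branch name follows the command above; the paths, file name, and commit identity are made up.

```shell
# A bare repository plays the role of GitHub in this sketch
demo=$(mktemp -d)
git init -q --bare "$demo/fake-github.git"

# Clone it, like you would clone a repository from GitHub
# (2>/dev/null hides the "cloned an empty repository" warning)
git clone -q "$demo/fake-github.git" "$demo/laptop" 2>/dev/null
cd "$demo/laptop"
git checkout -q -b week04

# add + commit happen only on your computer
touch pipeline.sh
git add pipeline.sh
git -c user.name="Example Student" -c user.email="student@example.com" \
    commit -q -m "Add pipeline script"

# The "GitHub" side has no branches yet
before=$(git ls-remote --heads origin)
echo "before push: ${before:-nothing}"

# Only after pushing does the remote know about the commit
git push -q origin week04
after=$(git ls-remote --heads origin)
echo "after push: $after"
```

The first `echo` reports nothing on the remote even though the commit already exists locally; the second shows `refs/heads/week04` once the push has gone through.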


## Forks ≠ Clones

When you "Fork" a repository (repo), you make your own copy of it on GitHub, under your account, that you personally can edit, commit, add, and push changes to. When you "Clone" a repository, you're copying all history of all code that was ever written in that project onto your computer. Cloning doesn't change who is allowed to push: if it's your project (or your fork), then you have write access. Otherwise, no. Usually you don't have write access to things you're cloning.
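A common pattern is to do both: fork on GitHub, then clone your fork. The sketch below imitates that with two bare repositories on disk standing in for the class repository and your fork (both paths are invented for illustration; on GitHub they would be URLs). The remote names `origin` and `upstream` match the `git pull upstream master` and `git push origin week04` commands above.

```shell
demo=$(mktemp -d)
# Stand-ins for the class repository and your fork of it on GitHub
git init -q --bare "$demo/class-repo.git"
git init -q --bare "$demo/your-fork.git"

# Cloning your fork makes it the remote called "origin"
# (2>/dev/null hides the "cloned an empty repository" warning)
git clone -q "$demo/your-fork.git" "$demo/laptop" 2>/dev/null
cd "$demo/laptop"

# Add the original repository as a second remote called "upstream"
git remote add upstream "$demo/class-repo.git"

# Lists the fetch and push address of each remote
git remote -v
```

With this setup you pull new material from `upstream` (where you can't write) and push your own work to `origin` (where you can).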

## No `LICENSE` = no fun

One thing we didn't go over in class is that if your code is up somewhere publicly available without an explicit license, then technically nobody is allowed to use it. That's why you should use a license for your code. I recommend the [UCSD-specific 3-Clause BSD license "UCSD Software Copyright Notice"](https://confluence.crbs.ucsd.edu/display/CRBS/Releasing+Open+Source+Software+at+UCSD) that we used for the first homework because it allows the most people to use it (academics, companies, industry, non-profit, for-profit) with very few restrictions.

There's another license called the GPL that many people use and is (in my opinion) a little idealistic because it makes the code hard to use in a proprietary setting (i.e. a company like Microsoft can't ship it inside a closed-source product without releasing that product's source under the GPL too). If you're interested in open source licenses, I encourage you to browse [opensource.org](https://opensource.org/).

## Branches are useful for working on homework

### Why branches

Branches are great because you can work on one notebook in one branch and leave the other code completely untouched, and then switch to another branch and work on some other notebook.

### Working with branches

Create a new branch:
```
git checkout -b newbranchname
```

Change to a different branch:

```
git checkout otherbranch
```
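Here is a short sketch of what switching branches does to the files in your folder. It runs in a throwaway repository; the file names, the `week04` branch name, and the commit identity are placeholders.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
# Placeholder identity so commits work on an unconfigured machine
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit on the default branch (its name depends on your git setup)
echo "shared code" > pipeline.sh
git add pipeline.sh
git commit -q -m "Add pipeline script"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Create a homework branch and commit a notebook there
git checkout -q -b week04
echo "{}" > homework.ipynb
git add homework.ipynb
git commit -q -m "Start week 4 homework"

# Switch back: the notebook is gone from the folder, but safe on week04
git checkout -q "$main_branch"
ls

# Switch to week04 again and it reappears
git checkout -q week04
ls
```

The first `ls` shows only `pipeline.sh`; the second shows both files. Nothing was deleted, `git` just swapped the folder contents to match the branch.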

## Fixing merge conflicts

If you update your repository and you see a merge conflict, it'll look like the thing below. This is totally normal, we can fix it, and there is no need to panic.

```
[ucsd-train01@tscc-0-63 biom262-2016]$ git pull upstream master
From github.com:biom262/biom262-2016
 * branch            master     -> FETCH_HEAD
Auto-merging weeks/week04/0_alignment_expression_quantification.ipynb
CONFLICT (content): Merge conflict in weeks/week04/0_alignment_expression_quantification.ipynb
Automatic merge failed; fix conflicts and then commit the result.
```

You'll need to edit the file:

```
nano weeks/week04/0_alignment_expression_quantification.ipynb
```

And find the merge conflict that looks like:


```
<<<<<<< HEAD
    Stuff that's in your version (whether you wrote it or it's an old version)
=======
    Updates from "upstream"
>>>>>>> 537cb50... add clarification for alignment quantification
```

If it's an answer you want to keep, you'll want to keep the first section (your answer) and delete the second, but if it's updates from upstream you'll want to remove the first section and keep the second. What you keep in between the markers is what you decide makes sense. Either way, at the end you don't want any of these marker lines left in the file; they should be removed:

```
<<<<<<< HEAD
=======
>>>>>>> 537cb50... add clarification for alignment quantification
```
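If you want to practice on something that doesn't matter, this sketch manufactures a small conflict in a throwaway repository and then resolves it. The file name `answers.txt`, the branch name `upstream-changes`, and the commit identity are all made up, and the `printf` that overwrites the file stands in for editing it by hand in `nano`.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo "Question 1: original text" > answers.txt
git add answers.txt
git commit -q -m "Add question"
mine=$(git rev-parse --abbrev-ref HEAD)

# One branch changes the line...
git checkout -q -b upstream-changes
echo "Question 1: clarified text" > answers.txt
git commit -q -am "Clarify question"

# ...and your branch changes the same line differently
git checkout -q "$mine"
echo "Question 1: my answer" > answers.txt
git commit -q -am "Answer question"

# The merge stops with a conflict; print the file to see the markers
git merge upstream-changes || cat answers.txt

# "Edit" the file: keep your answer, drop all three marker lines
printf 'Question 1: my answer\n' > answers.txt

# Tell git the conflict is resolved, then finish the merge
git add answers.txt
git commit -q -m "Fix updates from upstream"
```

The `git add` followed by `git commit` at the end is how `git` learns that you consider the conflict fixed; editing the file alone is not enough.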

## Fixing "Unreadable Notebook" error

If you updated your notebook, you're probably seeing this kind of error:


> Unreadable Notebook: /home/ucsd-train01/code/biom262-2016/weeks/week04/0_alignment_expression_quantification.ipynb NotJSONError('Notebook does not appear to be JSON: \'{\\n "cells": [\\n {\\n "cell_type": "m...',)

That's because there are merge conflicts in the Jupyter notebook that make it incomprehensible to Jupyter, but we can fix that.

### Finding the files with merge conflicts

Do `git status` to see which files are modified. The ones with "both modified" are the ones with the merge conflict:

```
[ucsd-train01@tscc-login2 biom262-2016]$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 73 commits.
#
# Unmerged paths:
#   (use "git add/rm <file>..." as appropriate to mark resolution)
#
#       both modified:      weeks/week04/0_alignment_expression_quantification.ipynb
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       weeks/week01/data/gencode.v19.annotation.chr22.transcript.promoter.nfkb.gtf
#       weeks/week01/data/tf.nkfb.bed
no changes added to commit (use "git add" and/or "git commit -a")
```

### Fixing conflicts by editing the modified file

Now you'll need to edit the modified file in question using `nano`.

```
nano weeks/week04/0_alignment_expression_quantification.ipynb
```
(You'll need to replace the above path with the actual path to your conflicted notebook.)

Use the arrow keys (up and down) to scroll through the file until you see the merge conflict syntax, which looks like this:

```
<<<<<<< HEAD
    "Now use `ln -s filename newplace` to create soft links of the folders in your `~/scratch/shalek2013/` directory to your `~/projects/shalek2013` directory so when yo$
=======
    "Now use `ln -s filename newplace` to create **soft links** (aka \"shortcuts\" or \"pointers\") of the folders in your `~/scratch/shalek2013/` directory to your `~/p$
    "\n",
    "```\n",
    "ln -s /projects/ps-yeolab/biom262-2016/seqdata/shalek2013 $HOME/projects/shalek2013/raw_data\n",
    "ln -s $HOME/scratch/shalek2013/processed_data $HOME/projects/shalek2013/processed data\n",
    "```\n",
    "\n",
    "Fix bottom one to:\n",
    "```\n",
    "ln -s $HOME/scratch/shalek2013/processed_data $HOME/projects/shalek2013/processed_data\n",
    "```\n",
    "\n",
    "If you're seeing black and red links you've done something wrong... remove the `~/projects/shalek2013` directory and start over.\n",
    "\n",
    "```\n",
    "rm -f ~/projects/shalek2013\n",
    "```\n",
    "\n",
    "Then make sure your:\n",
>>>>>>> 537cb50... add clarification for alignment quantification
```

To fix the conflict, remove the lines:

```
<<<<<<< HEAD
    "Now use `ln -s filename newplace` to create soft links of the folders in your `~/scratch/shalek2013/` directory to your `~/projects/shalek2013` directory so when yo$
=======
```

and 
```
>>>>>>> 537cb50... add clarification for alignment quantification
```

If that's the only one you can find, then you're done! Refresh the notebook. If you find more, you'll need to resolve the rest the same way. Remember that the first section, "`HEAD`", is what you have locally, and the second section is what the remote has.
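Scrolling through a long notebook is error-prone, so one option is to let `grep` list any marker lines that are left. This is a sketch on a made-up file (`example.ipynb` here is not a real notebook, just three marker lines around two placeholder strings); on TSCC you would point `grep` at your own conflicted notebook instead.

```shell
demo=$(mktemp -d)
cd "$demo"

# A tiny stand-in for a conflicted notebook (contents are invented)
cat > example.ipynb <<'EOF'
<<<<<<< HEAD
    "your version"
=======
    "upstream version"
>>>>>>> 537cb50... add clarification for alignment quantification
EOF

# -n prints line numbers; the pattern matches lines that start with a marker
grep -n -E '^(<<<<<<<|=======|>>>>>>>)' example.ipynb

# -c counts matching lines instead; once this reaches 0 the markers are gone
remaining=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' example.ipynb)
echo "marker lines left: $remaining"
```

The line numbers from `grep -n` tell you where to jump to in `nano` instead of hunting with the arrow keys.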

Now you'll need to add and commit the file.


```
[ucsd-train01@tscc-login2 biom262-2016]$ git add weeks/week04/0_alignment_expression_quantification.ipynb
[ucsd-train01@tscc-login2 biom262-2016]$ git commit -m 'fix updates from upstream'
```

## Show files with merge conflicts

```
git ls-files -u
```
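`git ls-files -u` prints up to three lines per conflicted file (the common ancestor, your version, and the incoming version), which can look noisy. The sketch below builds a throwaway conflict (file name, branch name, and commit identity are invented) and compares that output with `git diff --name-only --diff-filter=U`, which lists each conflicted file once.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
mine=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b other
echo "their edit" > notes.txt
git commit -q -am "Their edit"

git checkout -q "$mine"
echo "my edit" > notes.txt
git commit -q -am "My edit"

# Leave the merge in its conflicted state on purpose
git merge -q other >/dev/null 2>&1 || true

# Three entries for notes.txt: stage 1 (ancestor), 2 (yours), 3 (theirs)
git ls-files -u

# Just the file name, once
git diff --name-only --diff-filter=U
```

Either command prints nothing once every conflict has been resolved and `git add`'d.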