# Version Control and Git

## Why do I need to learn about version control?

### Reproducibility in Research

Version control is something you should not only learn as early as possible, but also make a habit of using throughout your scientific career. Hopefully, we all know how important it is to document your experiments, your reagents, and your protocols fully. This _must_ also extend to your data analysis. Modern computational biology and bioinformatics workflows often involve a number of community-supported tools, or custom programs/scripts that you create while designing your workflows. While using and developing these tools, it is very likely that you will not remember exactly how you performed some component of your workflow, or what specific parameters you set for a given program. And of course, we all make mistakes, and we would like to be able to backtrack through our analysis to find these mistakes, find out when they were introduced, and work out what the potential consequences are.

What we would like to have is a system where we can make frequent 'checkpoints' in our project analysis and keep a log or record of how our projects have changed over time.

### Saving Lost Work

Even setting computational work aside, just consider writing a report/essay/article using a word processor - working alone, but also collaboratively with rounds of edits:

![phd101212s.gif](../files/phd101212s.gif)

credit - phdcomics

But even worse, as you are working - you might only use one filename the entire time.  What happens if you lose work this way, deleting a section that you later realize you need, or saving over a file with a mistake?

In fact - even normal word processors use version control under the hood now, which helps prevent situations like this.  If you look at Google Docs, you can see the version history:

![image.png](attachment:image.png)

By looking at this you can even roll back and see what edits were made and when.  Google Docs, for example, lets you "name" a specific version so you can go back to it later - like the version from just before sharing a document, submitting to a journal, etc.


## Git

`Git` is one popular tool (certainly not the only one) that attempts to solve some of these issues by providing a mechanism for version control.  It allows you to take snapshots (commits) of a project or directory that can then be referred back to, or shared publicly as you see fit.  Sharing code/methods/projects openly and publicly also allows you to A) solicit feedback and/or identify errors you may have introduced, B) work on a project collaboratively with multiple people simultaneously (everyone can work on their own parts locally and then MERGE them together into a common repository), and C) publish your analysis concurrently with your manuscripts/reports to allow others to reproduce your work and confirm its validity.
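
To make the idea of snapshots concrete, here is a minimal sketch of two commits and the resulting log. It runs in a throwaway directory; the file name `params.txt`, its contents, and the demo identity are made up for illustration and are not part of the course materials.

```bash
#!/bin/bash
set -e
# Work in a throwaway directory so nothing else on your machine is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
# Identity for this demo repository only (placeholder values).
git config user.name "Demo Student"
git config user.email "student@example.com"

# First checkpoint: record the parameter used for an analysis step.
echo "min_quality=20" > params.txt
git add params.txt
git commit -q -m "Record initial filtering parameter"

# Second checkpoint: the parameter changes, and the change is logged.
echo "min_quality=30" > params.txt
git commit -q -am "Raise minimum quality to 30"

# Review the history, newest first, one line per commit.
git log --oneline
# Show exactly what changed between the two checkpoints.
git diff HEAD~1 HEAD -- params.txt
```

Each commit message becomes a line in the log, so short, specific messages are what make the history useful when you backtrack later.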

![git_2x.png](../files/git_2x.png)

Credit - xkcd

As an example, most of the materials for this course were developed on a collaborative GitHub repository: [https://github.com/timplab/bcmb_bootcamp](https://github.com/timplab/bcmb_bootcamp)

## GitHub

GitHub is perhaps the most popular platform for hosting Git repositories.  It allows you to host your code, data, and other files in a public or private repository, to collaborate with others on projects, and to solicit feedback on your work.  It is also a great place to find code and data that others have made available for public use; for example, many bioinformatics tools and pipelines are hosted on GitHub.  Microsoft acquired GitHub in 2018, and it remains a very popular platform for hosting code and data.

To set up a GitHub account, navigate to `github.com` in a web browser on your computer and create an account - we'll pause while you do this.  I suggest using your JHU email - you will then be able to take advantage of educational discounts and other benefits.

After you've set up your account, you should also apply for GitHub Education - follow the instructions linked here: https://docs.github.com/en/education/explore-the-benefits-of-teaching-and-learning-with-github-education/github-education-for-students/apply-to-github-education-as-a-student

### SSH Keys

Next we are going to make a set of Secure SHell (SSH) keys.  SSH keys allow passwordless access to computers and servers.  This is useful for connecting to remote servers, and for authenticating with GitHub.

Note - for Windows machines - you will need to install OpenSSH using Windows Update under "Optional Features" - you can then use the `ssh-keygen` command in the command prompt to generate a key.

macOS already has OpenSSH installed.

To create an SSH key for GitHub use, follow these steps:

1. Open a terminal or command prompt.
2. Generate a new SSH key by running the following command:
```bash
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
```
    Replace `"your_email@example.com"` with your JHU email address.

3. You will be prompted to choose a location to save the key. Press Enter to accept the default location (`~/.ssh/id_rsa`) or specify a different location.

4. You will also be prompted to enter a passphrase. For class - please leave this blank.  If you are using this for your own work, you may want to enter a passphrase to add an extra layer of security.

5. Once the key is generated, you can view the public key by running the following command:
```bash
more ~/.ssh/id_rsa.pub
```

6. Copy the entire contents of the public key that is displayed in the terminal to the clipboard (e.g. Ctrl-C on Windows or Cmd-C on a Mac).

7. Go to your GitHub account settings and navigate to the "SSH and GPG keys" section.

8. Click on "New SSH key" or "Add SSH key".

9. Give your SSH key a descriptive title and paste the copied public key into the "Key" field.

10. Click "Add SSH key" to save the key to your GitHub account.

Now you have successfully created an SSH key for GitHub use. You can use this key to securely authenticate with GitHub when pushing or pulling repositories.
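
If you would rather skip the prompts in steps 2-4, the same key generation can be sketched non-interactively. This is only an illustration under some assumptions: it writes a demo key called `id_rsa_demo` (a made-up name) into a temporary directory so it cannot overwrite a real key, and it uses an empty passphrase as we do for class.

```bash
#!/bin/bash
set -e
# Throwaway directory so an existing ~/.ssh/id_rsa is never overwritten.
keydir=$(mktemp -d)

# -f sets the output file, -N "" sets an empty passphrase, -q hides progress output.
ssh-keygen -q -t rsa -b 4096 -C "your_email@example.com" -f "$keydir/id_rsa_demo" -N ""

# Two files are created: the private key and the matching .pub public key.
ls "$keydir"

# Print the fingerprint of the public key, a short summary that identifies it.
ssh-keygen -l -f "$keydir/id_rsa_demo.pub"
```

Only the `.pub` file is ever pasted into GitHub; the private key stays on your machine. Once your real key has been added to your account, running `ssh -T git@github.com` should greet you by username, which is a quick way to confirm that the key works.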

(Optional): You can also use this key instead of passwords to authenticate to the server.  To do this, you will need to copy the public key to the server's `~/.ssh/authorized_keys` file.  This is a bit more advanced, but is a good practice for security.

## Make first GitHub Repo

Let's make a first repository on GitHub - go to github.com, click the "+" in the upper right corner, and select "New Repository".  Name it "first_repo" and give it a description.  You can make it public or private - for now, make it public.  You can also initialize it with a README file, which is a good idea.  Click "Create Repository".

This will then land you on your repo's basic page - you can see the README file, and you can see the URL for the repository:

![image.png](attachment:image.png)

## Cloning a Repository to your local machine

To clone a Git repository from GitHub, follow these steps:

1. Open your terminal or command prompt.
2. Navigate to the directory where you want to clone the repository.
3. Copy the URL of the GitHub repository you want to clone.
4. In the terminal or command prompt, use the `git clone` command followed by the repository URL. For example:
    ```bash
    git clone https://github.com/username/repository.git
    ```
    Replace `username` with the GitHub username and `repository` with the name of the repository you want to clone.
5. Press Enter to execute the command.
6. Git will create a new directory with the same name as the repository and download all the files from the repository into that directory.
7. Once the cloning process is complete, you can navigate into the cloned repository directory using the `cd` command.
8. You now have a local copy of the Git repository on your machine, and you can start working with the files and making changes as needed.

By following these instructions, you will be able to clone a Git repository from GitHub and have a local copy of the repository on your machine for further development or collaboration.
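
To see what `git clone` does without needing a network connection, here is a small sketch in which a bare repository on disk stands in for the one hosted on GitHub. The name `remote_repo.git` is invented for this example.

```bash
#!/bin/bash
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for a repository hosted on GitHub: an empty bare repository on disk.
git init -q --bare remote_repo.git

# Clone it just as you would clone a URL; Git creates the directory "remote_repo".
# (Git warns that the repository is empty, which is expected here.)
git clone -q remote_repo.git
cd remote_repo

# "origin" is the name Git gives to the place you cloned from.
git remote -v
```

With a real GitHub repository the only difference is the address. Since you have added an SSH key to your account, you can also use the SSH form of the URL, `git@github.com:username/repository.git`, in place of the `https://` one.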

## Configuring Git Environment

To configure Git on the command line interface (CLI), follow these steps:

1. Open the terminal or command prompt on your computer.
2. Set your global username by running the following command:
    ```bash
    git config --global user.name "Your Name"
    ```
    Replace "Your Name" with your desired username.
3. Set your global email address by running the following command:
    ```bash
    git config --global user.email "your_email@example.com"
    ```
    Replace "your_email@example.com" with the email address associated with your GitHub account.
4. (Optional) Set your preferred text editor for Git by running the following command:
    ```bash
    git config --global core.editor "your_text_editor"
    ```
    Replace "your_text_editor" with the command or path to your preferred text editor (e.g., "nano", "vim", "code").
5. (Optional) Enable helpful Git command suggestions by running the following command:
    ```bash
    git config --global help.autocorrect 1
    ```
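    With this setting, Git does not just suggest a correction for a mistyped command: it runs the command it thinks you meant, either immediately or after a 0.1-second delay depending on your Git version, so use it with care.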
6. Verify your Git configuration by running the following command:
    ```bash
    git config --list
    ```
    This will display every setting Git can see, including the global values you just set.

By following these steps, you will have successfully configured Git on the command line interface (CLI), allowing you to use Git for version control in your projects.
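
The `--global` settings apply to every repository on your computer. As a sketch of how a single repository can use different values, the example below sets a name and email without `--global` inside a throwaway repository; the "Lab Shared Account" identity is a placeholder.

```bash
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# No --global flag: these values are stored in this repository's .git/config only.
git config user.name "Lab Shared Account"
git config user.email "lab@example.com"

# Ask Git which name it will actually use inside this repository.
git config --get user.name

# List only the repository-level settings.
git config --local --list
```

Repository-level values take precedence over global ones, which is handy if you ever need to commit to one project under a different email address.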

## Using git from VSCode

Git integration is a key feature of VSCode, providing seamless version control capabilities within the editor itself. With Git integrated directly into VSCode, developers can easily manage their code repositories, track changes, collaborate with others, and revert to previous versions if needed.

One example of where Git integration can be found in VSCode is the Source Control view. This view allows you to see the status of your Git repository, including any modified, added, or deleted files. You can also view the commit history and branch information.

![screenshot of source control view](../files/vscode_sourcecontrol.png)

To access the Source Control view in VSCode, you can use the shortcut `Ctrl + Shift + G` or click on the Git icon in the left sidebar. Once in the Source Control view, you can see a list of changed files and their status. You can stage changes, commit them with a message, and push them to a remote repository.

Additionally, VSCode provides a range of Git-related commands and features to streamline your workflow. For example, you can use the "Git: Clone" command to clone a remote repository directly from within VSCode. You can also use the "Git: Pull" command to fetch the latest changes from a remote repository and merge them into your local branch.

In summary, Git integration in VSCode empowers developers to effectively manage their code repositories, collaborate with others, and maintain a comprehensive version control history, all within the familiar and powerful VSCode environment.

## Pull in the Bootcamp Repo

To clone the `timplab/bcmb_bootcamp` repository using VSCode, follow these steps:

1. Click on the "Source Control" icon in the left sidebar (it looks like a branching line connecting small circles).
2. In the Source Control view, click on the "Clone Repository" button.
3. Enter the URL of the `timplab/bcmb_bootcamp` repository: `https://github.com/timplab/bcmb_bootcamp.git`.
4. Choose a local directory where you want to clone the repository.
5. Click "Clone" to start the cloning process.
6. Once the cloning is complete, you will see the `bcmb_bootcamp` repository listed in the Source Control view.
7. You can now navigate through the repository, view files, make changes, and commit them using the Git integration in VSCode.

By following these steps, you will be able to clone the `timplab/bcmb_bootcamp` repository using VSCode and start working with the code and documentation provided in the repository.


But then - you don't want to edit on the "main" branch, but on your own personal branch.



To make your own branch of the repository and merge new changes from the main branch in VSCode, follow these instructions:

1. Open the repository in VSCode.
2. Click on the "Source Control" icon in the left sidebar (it looks like a branching line connecting small circles).
3. In the Source Control view, click on the branch name (usually "main") to open the branch dropdown menu.
4. Click on the "+" icon next to the branch dropdown menu to create a new branch.
5. Enter a descriptive name for your branch and press Enter to create it.
6. VSCode will automatically switch to your newly created branch.
7. Make the desired changes to the files in your branch.
8. Once you have made your changes, save the files.
9. In the Source Control view, you will see the changed files listed under "Changes".
10. Click on the checkbox next to each file you want to include in the commit.
11. Enter a commit message in the text box at the top of the Source Control view.
12. Click on the checkmark icon to commit your changes.
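
For reference, the same branch-and-commit workflow can be sketched on the command line. This is an illustrative equivalent rather than what VSCode literally runs; the branch name `my-edits` and the demo repository are assumptions made for the example.

```bash
#!/bin/bash
set -e
# Set up a throwaway repository with one commit on "main".
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "# Notes" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main   # make sure the starting branch is called "main"

# Create a new branch and switch to it (the branch menu in VSCode).
git checkout -q -b my-edits

# Edit and save a file.
echo "My first change" >> README.md

# Stage the changed file (the checkbox/plus next to the file in VSCode).
git add README.md

# Commit with a message (the text box and checkmark in VSCode).
git commit -q -m "Add my first change to the README"

# The new commit exists on my-edits but not on main.
git log --oneline --all
```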

Pause here - I'll make some edits when everyone is caught up, then you can try to merge them:

Making lots of changes, ch-ch-ch-changes.


1. To merge new changes from the main branch into your branch, switch back to the main branch by selecting it from the branch dropdown menu.
2. Click on the ellipsis (...) icon in the Source Control view and select "Pull".
3. VSCode will fetch the latest changes from the remote repository and merge them into your main branch.
4. Switch back to your branch by selecting it from the branch dropdown menu.
5. Click on the ellipsis (...) icon in the Source Control view and select "Merge Branch".
6. Select the main branch as the branch to merge from.
7. VSCode will merge the new changes from the main branch into your branch.
8. Resolve any merge conflicts, if necessary.
9. Once the merge is complete, you can continue working on your branch and repeat the process of committing and pushing your changes.

By following these instructions, you will be able to create your own branch of the repository, make changes, and merge new changes from the main branch in VSCode.
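
The merge steps can also be sketched on the command line. To keep the example self-contained, the "new changes on main" are simulated with a local commit instead of a pull from GitHub; the file names and the `my-edits` branch are invented for illustration.

```bash
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git branch -M main

# Your own branch, with one commit of its own.
git checkout -q -b my-edits
echo "my analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add my analysis"

# Meanwhile, main moves ahead (in class this arrives when you pull).
git checkout -q main
echo "instructor edit" >> notes.txt
git commit -q -am "Instructor updates notes"

# Switch back to your branch and merge the new changes from main into it.
git checkout -q my-edits
git merge -q --no-edit main

# The graph shows the two lines of work joined by a merge commit.
git log --oneline --graph
```

In a real clone, the simulated commit on main is replaced by `git checkout main` followed by `git pull`. Because the two branches touched different files here, the merge completes without conflicts.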

## Copilot and ChatGPT

GitHub Copilot is an LLM coding assistant developed by GitHub and OpenAI. It is designed to assist developers by generating code suggestions and completing code snippets in real-time. Copilot is trained on a vast amount of publicly available code, making it capable of providing context-aware code suggestions based on the code you are currently working on.

It's especially useful if you _know what you want to do_, algorithmically or otherwise, but don't know the specific "syntax" or "function" to use in a programming language.  So if you learn Python but don't know R, you can use Copilot to help you write R code.

One of the main advantages of Copilot is its ability to save time and increase productivity. It can quickly generate boilerplate code, complete function signatures, and suggest relevant code snippets based on the task at hand. This can significantly speed up the development process, especially for repetitive or complex coding tasks.

However, it is important to note that Copilot makes a ton of mistakes. It can generate code that's not what you really intend, or code that doesn't even work.  If anything, it requires more understanding and more careful checking of what the steps are doing than not. It's a tool - basically advanced googling for code pieces. It's not a replacement for understanding what you are doing.

### Let's try to sign up:

Your educational GitHub account - when approved - should give you free Copilot access while you are a student.  Meanwhile, let's sign up for a trial:

To get a free Copilot trial on GitHub, follow these steps:

1. Open your web browser and navigate to the GitHub website (github.com).
2. Once you are logged in, go to the Copilot page by visiting the following URL: [https://copilot.github.com/](https://copilot.github.com/)
3. On the Copilot page, you will see a button that says "Get Copilot for free". Click on this button to start the trial process.
4. You will be prompted to enter your email address to receive an invitation to the Copilot trial. Enter your email address and click "Request access".

To install the GitHub Copilot extension in Visual Studio Code, follow these steps:

1. Click on the Extensions icon in the left sidebar (or press `Ctrl+Shift+X`).
2. In the search bar, type "GitHub Copilot" and press Enter.
3. The GitHub Copilot extension should appear in the search results. Click on it.
4. On the extension page, click the "Install" button to begin the installation process.
5. Once the installation is complete, you will see a "Reload" button. Click on it to reload Visual Studio Code.
6. After the reload, the GitHub Copilot extension will be ready to use.

Now you have successfully installed the GitHub Copilot extension in Visual Studio Code. You can start using it to assist you with code suggestions and completions while you work.

It'll do tab completion/autocomplete for you, or you can use Ctrl-I (Cmd-I on a Mac) to give it a prompt for what you want.

In [None]:
#!/bin/bash 

for i in {1..20}
