# Python for (open) Neuroscience

_Lecture 2.3_ - GitHub and publishing code

Luigi Petrucco

Jean-Charles Mariani

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec/blob/main/lectures/Lecture2.0_Real-world-Python.ipynb)

# Git and GitHub

## Git

Git is a tool for code versioning.

Code versioning is a way to avoid a scripts folder that looks like:

 - `final_script.py`
 - `final_script_final.py`
 - `final_script_final_afterrevision0304.py`
 - `final_script_final_afterrevision0304_isweartogodthisisthelastone.py`

The concept is simple: as source code files are small (it's just text after all), we can keep every version of them stored and accessible on demand!

The core concept of Git is the repository (or repo).

A repository is a folder that contains a project that we want to use version control on.



When we work in a repository, every now and then (as we would do with file saving), we can `stage` all our changes, and `commit` them.

This creates a snapshot of the state of the repository that we can go back to whenever we need it!

All those snapshots live only locally, in the `.git` folder you will find in the project. GitHub offers a very straightforward way of keeping them safe online, and of working on the project from different machines!
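
To make the idea of snapshots concrete, here is a toy model in Python. It is only a sketch of the stage/commit concept and not how Git works internally; the class `ToyRepo` and its methods are hypothetical names made up for this illustration.

```python
import hashlib


class ToyRepo:
    """A toy model of stage/commit snapshots (NOT Git's real internals)."""

    def __init__(self):
        self.staged = {}   # filename -> content waiting to be committed
        self.commits = []  # list of (commit_id, message, snapshot) tuples

    def stage(self, filename, content):
        # Mark a new version of a file as ready for the next snapshot
        self.staged[filename] = content

    def commit(self, message):
        # A snapshot is the previous snapshot plus all the staged changes
        previous = self.commits[-1][2] if self.commits else {}
        snapshot = {**previous, **self.staged}
        # Identify the snapshot with a short hash of its content and message
        payload = repr(sorted(snapshot.items())) + message
        commit_id = hashlib.sha1(payload.encode()).hexdigest()[:7]
        self.commits.append((commit_id, message, snapshot))
        self.staged = {}
        return commit_id

    def checkout(self, commit_id):
        # Go back to the files as they were at a given snapshot
        for cid, _, snapshot in self.commits:
            if cid == commit_id:
                return dict(snapshot)
        raise KeyError(f"Unknown commit: {commit_id}")


repo = ToyRepo()
repo.stage("analysis.py", "print('v1')")
first = repo.commit("First version")
repo.stage("analysis.py", "print('v2')")
second = repo.commit("Second version")

# The old version is still there, no need for analysis_final_final.py
print(repo.checkout(first)["analysis.py"])
print(repo.checkout(second)["analysis.py"])
```

Both versions of `analysis.py` remain retrievable through their commit identifiers, which is the behaviour we rely on when using the real tool.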

## The GitHub flow

With GitHub, we can sync our local repository with a remote twin repository that lives on GitHub's servers (there are other hosting options, but we won't cover them here):

- `push`: every time we make local changes, we can `push` them to the remote repository to save them in the cloud.

- `pull`: if we have edited files of the repository online, or changed code on another machine and `push`ed it to the GitHub repository, we can `pull` the changes to update our local version of the project.

## Sounds cool, but how do I start all this?

Let's create an actual repository and use it!

On GitHub, let's click the plus sign in the top right corner and create a new repository.

### `clone`

Now, to make a local version of a remote repo, we need to `clone` it.

To do this, and use all the other Git/GitHub commands, we need to use a Git interface:

- The most basic and straightforward interface is the command line tool
- There are graphical interfaces (such as GitHub Desktop or Fork) that can be useful to visualize the project history
- Some development environments like VSCode, PyCharm or JupyterLab directly embed Git tools

To `clone` a repository:
- From the command line, you can use:

```shell
git clone url_of_repository
```

Otherwise, you can do it with GitHub Desktop.

If you are using the terminal, make sure you then run
```shell
cd repo_name
```
to make the repository the working directory of the terminal!
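
If you are unsure what `repo_name` will be: by default, `git clone` creates a folder named after the last part of the URL, without the optional `.git` suffix. A minimal Python sketch of that rule follows; `repo_folder_name` is a hypothetical helper written for this example, and it assumes you did not pass a custom folder name to `git clone`.

```python
from urllib.parse import urlparse


def repo_folder_name(url):
    """Guess the folder that `git clone url` creates by default."""
    path = urlparse(url).path.rstrip("/")
    name = path.split("/")[-1]
    # Strip the optional ".git" suffix
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


# The repository of this course, used here only as an example URL
print(repo_folder_name("https://github.com/vigji/python-cimec.git"))
```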

### `add`

Let's now make some changes to the repository! For example, add a small `.py` file with code that just prints `"Hello World"` in it.

Then, let's `stage`/`add` this change:
- from the terminal, we can just say:
```shell
git add .
```

This will stage all the changes we made in the current folder (we could do it for a single file by specifying the filename instead of `.`). Otherwise, we could have staged them from GitHub Desktop.

### `commit`

To save all the changes that we staged, we need one last step: committing the changes, with a message specifying what we just changed in the code.

From the terminal, we can:
```shell
git commit -m "Added test.py file"
```

(Or commit from GitHub Desktop)

### `push`

Finally, we can push our changes to the remote repository to save them forever. To do so, from the terminal, we just run:

```shell
git push
```

(Or push from GitHub Desktop)
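
The stage, commit and push steps above always come in the same order, so they can also be scripted. Below is a minimal Python sketch that builds the three commands and, optionally, runs them with `subprocess`. The function names `save_and_upload_commands` and `run_commands` are hypothetical, and actually running the commands assumes that Git is installed and that `repo_dir` is a cloned repository.

```python
import shlex
import subprocess


def save_and_upload_commands(message, paths=(".",)):
    """Build the stage -> commit -> push sequence as argument lists."""
    return [
        ["git", "add", *paths],            # stage the changes
        ["git", "commit", "-m", message],  # snapshot them with a message
        ["git", "push"],                   # upload to the remote repository
    ]


def run_commands(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


commands = save_and_upload_commands("Added test.py file", paths=["test.py"])
for command in commands:
    # Show each command as it would be typed in the terminal
    print(shlex.join(command))

# To actually run them (needs Git installed and a cloned repository):
# run_commands(commands, repo_dir="repo_name")
```

Passing the commands as lists rather than as a single string avoids quoting problems with commit messages that contain spaces.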

### `pull` remote changes

Let's now go on GitHub and add some changes to our `README.md` file! (which is super-cool and you should write on it, by the way).

After saving the change by committing it directly from GitHub, we can go to our terminal (or GitHub Desktop), and (from inside the repository):

```shell
git pull
```

## More advanced Git/GitHub features:

There are more advanced features of Git and GitHub you might want to explore, in particular:

- `branches`: we can work on different variants of the same project where we explore different ideas. We can split off a new `branch` from the main one, and `merge` it back into the main branch only when we are happy with it.

- Collaborative coding using `pull requests`: GitHub offers smart ways of working collaboratively with people. Each developer can work on their own copy, or `fork`, of the project, and then ask for their code to be merged into the main project via a `pull request`.

# Publishing code

Why publish code? Two main cases:

- Software tools that we want to give to the community
- Code associated with a paper


## Software tools for the community

We won't be covering this section. Publishing and maintaining user-oriented software is worth an entire course by itself!

## Code associated with a paper

This one's easy! You should ALWAYS do it!

## Why publish code

As papers become more and more dependent on computer analysis, your code is your best Methods section!

From experimental paradigms to every step of an analysis, the flow from raw data to a figure on a paper is almost entirely code-driven.

Publishing code incentivizes us to keep the manual steps of an analysis to a minimum!

## When not to publish code?

Never!

The main reasons people don't publish code in 2023 are:
- They have not been told how good publishing code is
- They feel ashamed of putting out ugly code
- They feel they would need to spend a lot of time to make code publishable

Not true! And nothing to be ashamed of!

Ugly code is better than no code (also, if you don't publish your code, one can only assume the code was ugly).

## Publishing code is simple

The fundamental requirements for publishing code are:
- A repository with the project's code
- A `README.md` stating:
    - what the repository contains
    - which scripts are the most important ones to replicate the results
    - what the software requirements are (Python/MATLAB version, libraries, etc.)
- A license, even if you think no one will ever use your code
- A DOI!!! 
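
For the software requirements part of the `README.md`, the versions can be collected automatically instead of typed by hand. Here is a minimal sketch using only the Python standard library; `describe_requirements` is a hypothetical helper, and the package names in the example are placeholders for whatever your analysis actually imports.

```python
import platform
from importlib import metadata


def describe_requirements(packages):
    """Return README-ready lines with the Python and package versions."""
    lines = [f"- Python {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"- {name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"- {name} (not installed in this environment)")
    return lines


# Replace this list with the libraries your analysis actually imports
print("\n".join(describe_requirements(["numpy", "pandas"])))
```

The printed lines can be pasted directly into a Markdown list in the `README.md`.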

## DOIs

Digital Object Identifiers: they are the only way to ensure that something we put on the internet will stay there for a while.

Websites that give us DOIs (such as journals or other agents) ensure that the corresponding address will be active and reachable for some number of years.
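
In practice, a DOI becomes a link by putting it after `https://doi.org/`, and the resolver then redirects to wherever the object currently lives. A minimal sketch of that conversion is below; `doi_to_url` is a hypothetical helper, and its validity check is deliberately loose (it only assumes that a DOI starts with `10.` and contains a `/`).

```python
def doi_to_url(doi):
    """Turn a DOI into a link through the doi.org resolver."""
    doi = doi.strip()
    # Accept DOIs written with a "doi:" prefix
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Loose sanity check: "10." prefix, then a "/" before the suffix
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"Not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# 10.1000/182 is the DOI of the DOI Handbook itself
print(doi_to_url("doi:10.1000/182"))
```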

## GitHub does not mint DOIs

For publishing your code, GitHub is not a good place! The address of a repository is not permanent: the repository can be deleted or made private!

NEVER use GitHub to publish your paper-associated code!

Give a GitHub link in a paper only if you want to refer to the project/developer community associated with it

## Servers for publishing code

There are nice services that let you publish code directly from a GitHub repo, for example:
- Zenodo
- FigShare

Those websites will give you a DOI that you can use to cite your code in a paper.

You can find nice instructions on their websites for putting your code there.

# Publishing data

Ideally, the more data you can share together with your code, the better!

If size is a problem, you can think about sharing some partially processed version of the data.

Published datasets are likely to become increasingly important and citable in the future!

For very computationally heavy projects, I would recommend always sharing at least a small toy dataset that can be used to reproduce your analyses!
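
One simple way to build such a toy dataset is to keep only a regular subset of the rows of the full one. The sketch below does this for CSV data with the standard library; `make_toy_dataset` is a hypothetical helper, the "full" dataset is faked in memory, and keeping one row out of ten is an arbitrary choice that may not suit data where neighbouring rows must stay together.

```python
import csv
import io


def make_toy_dataset(source, target, every=10):
    """Copy the header and one row out of `every` from source to target."""
    reader = csv.reader(source)
    writer = csv.writer(target)
    writer.writerow(next(reader))  # always keep the header
    kept = 0
    for i, row in enumerate(reader):
        if i % every == 0:
            writer.writerow(row)
            kept += 1
    return kept


# Fake "full" dataset with 100 trials, built in memory for the example
full = io.StringIO()
full_writer = csv.writer(full)
full_writer.writerow(["trial", "response"])
full_writer.writerows([i, i * 0.5] for i in range(100))
full.seek(0)

toy = io.StringIO()
n_kept = make_toy_dataset(full, toy, every=10)
print(n_kept)
```

With real files, the same function works on objects returned by `open(path, newline="")`.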

**Note**: Always keep your data separate from code! You do not want to put data in a GitHub repository, remember!
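
One possible way to keep the two separate is to never hard-code the data location in the scripts, and to read it from an environment variable instead. This is a minimal sketch of that pattern; the variable name `MY_PROJECT_DATA`, the default folder and the file name are all hypothetical choices for this example.

```python
import os
from pathlib import Path


def get_data_dir(default="~/data/my_project"):
    """Return the data folder, read from an environment variable if set."""
    # MY_PROJECT_DATA is a hypothetical variable name chosen for this example
    return Path(os.environ.get("MY_PROJECT_DATA", default)).expanduser()


data_dir = get_data_dir()
print(data_dir / "session_01.csv")
```

Anyone who downloads the published dataset can then point the variable at their own copy without editing the code.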

### Where to publish data

For datasets, there are hosting services that can store tens or hundreds of gigabytes of data and give us a DOI!
- FigShare
- Zenodo
- Dandi
- OpenNeuro
- ...

Make sure you always mint a DOI if you want to publish data!