# <center>Open source, Git, and GitHub</center>
<br>
<center>
    <div style="width:auto; height:100px;">
        <img style="float:center; display:inline; height:80px; width:auto;" src="figures/Git-Logo-2Color.png" />
        <img style="float:right; display:inline; height:80px; width:auto;" src="figures/GitHub-Mark-120px-plus.png" />
    </div>
</center>
    
### <center>Timothy E. Holy</center>
<center>Department of Neuroscience</center>
<center>Washington University in St. Louis</center>

# What we'll cover today

- open source: expectations, etiquette, and economies
- open source licenses
- how to recognize a great package
- package structure
- core git concepts (*not* the practicalities of git or GitHub)

# Covered in your homework

Practical use of `git` and GitHub

![open source rules](figures/os_take_over_world.png)

# Open-source expectations & etiquette

- GitHub is a social media platform (people get to know you)

- Be nice!

Specifically:
  + you're not a *customer* (you haven't paid anything)
  + much of the code is written by volunteers
  + the authors are probably like you (busy with a variety of demands and reward structures)

- Creating great packages takes a diversity of skills (code development, writing documentation, and growing your community). People might be good at some of these and not so good at others.

# Open source economies

- the currencies are *time* and *passion*

- many developers care about their packages and will fix bugs if you report them. **Most developers are very grateful to receive "good" bug reports.**

- most developers know that code-familiarity helps: if it's my package, I can fix it more efficiently than someone who doesn't know the code

- bug reports should be fully reproducible: whoever handles your issue should be able to copy/paste your code and reproduce the bug. The gold standard is the *MWE* (*Minimal Working Example*): the smallest piece of code that still triggers the problem.

- it's OK for a developer to *not* fix your bug or implement your feature request: the code is open, so you can do it yourself.

# GitHub repository structure

![package structure](figures/repo_structure_dataframes_1.png)

![readme](figures/repo_structure_dataframes_2.png)

# Things to check when using or contributing: open-source licenses

The license declares what you can do with code:
- can you use it in a research project?
- can you use it to produce a commercial product?
- can you share it?
- can you modify it?

Categories of license:
- restrictive (awareness required even when reading the source code)
  + commercial
  + "copyleft" (examples: GPL, LGPL): fully open, can *use & distribute* without encumbrance, but *derived* works must use the same license
- permissive (examples: MIT, BSD, Apache): largely unencumbered (but do read the license terms for more specifics)

## The MIT license


    The Revise.jl package is licensed under the MIT "Expat" License:

    Copyright (c) 2017: Tim Holy.

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:


    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# "Scoping out" repositories on GitHub

To identify packages that are likely to be useful to you, good signs are:
- packages you hear about or that seem to be widely used
- packages with a lot of stars and good documentation
- packages with multiple contributors
- packages with responsive maintainers (check the history of issues & pull requests)

A package that meets *all* of these criteria is **DataFrames.jl**.
(Many wonderful packages do not come close to this same standard; absence of these signs does not prove that a package is not useful.)

Stars (upper right): ![stars](figures/dataframes_stars.png)

Contributors (lower right): ![contributors](figures/dataframes_contributors.png)


and "Issues" (bug reports etc.):

![issues](figures/dataframes_issues.png)

Major positives: rapid replies to users, many issues have a corresponding pull request. This is "platinum-level" maintenance; commercial software should be so good!

# Version control

*Manual* "version control": email documents like `myfile_v1.docx`, `myfile_v2.docx`, etc. to your collaborators

Problems:
- they cannot easily see each other's comments or changes as they work through the document
- you have to manually merge the changes from multiple modifications (`myfile_v1_PersonA_edits.docx`, `myfile_v1_PersonB_edits.docx`, etc.)

*Centralized* version control: Google Docs & similar services

Pros:
- everyone sees the document in real-time
- the changes tend to be the union of all changes, preserving a single master document

Cons:
- you can't work if you're not connected to the internet (or if you do, you break the model, and changes may need manual merging)

*Distributed* version control: `git`

Features:
- there is no single "master document" ("official version" is a *social* construct, not a technical one)
- different copies do not automatically sync, but you can trigger syncing yourself
- merge disparate changes based on shared history

A reasonably good mental model: biological evolution
- there is no "master copy" of a species, only individuals
- "syncing" = "mating": the combination of mutations from two parent organisms
- there can be no syncing without shared history (you need a foundation of sequence homology)

The model is imperfect:
- in software there is a social drive to (eventually) unify into "a single thing"
- unlike reproduction, changes are merged *exhaustively*, not randomly
- there is no analog of the sexes: any two "individuals" (*branches*) can merge if they have shared history
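
The "shared history" requirement can be made concrete with a toy commit graph. The following is a minimal sketch, not how `git` is implemented: the commit names (`A` to `E`, `X`) and the helper functions are hypothetical, and real commits are identified by hashes. It finds the most recent common ancestor of two branch tips, which is the starting point a merge needs.

```python
# Toy commit graph: each commit maps to the list of its parents.
# Commit names are hypothetical; real git identifies commits by hash.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],   # tip of a branch we will call "main"
    "D": ["B"],
    "E": ["D"],   # tip of a branch we will call "feature"
    "X": [],      # a commit from an unrelated history
}

def ancestors(commit):
    """Return the set containing `commit` and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c in ancestors(d) for d in common if d != c)}

print(merge_bases("C", "E"))  # {'B'}: the two branches diverged at B
print(merge_bases("C", "X"))  # set(): no shared history, nothing to merge from
```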

# Distributed version control works well for code

- *pro*: develop your feature without interference from unrelated changes occurring simultaneously all over the globe
- *pro*: combine changes when they are ready
- *con*: conflicts sometimes have to be resolved manually

# Why does history matter?

Merges could be made purely based on comparisons between final products, the *diff*:

| Version A | Version B | Differences |
|:---- |:--- | ---:|
| `x = 6` | `x = 6` | |
| `y = 2` | `y = 3` | <--- |
| `z = x + y` | `z = x + y` | |

But which one do you want to keep? History endows modifications with *directionality* and allows most merges to be automatic.
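
Here is a minimal sketch of how a common ancestor settles the question. It assumes, hypothetically, that the shared ancestor of the two versions contained `y = 2`, so that only Version B changed that line. The `merge3` function is an illustration that compares files line by line; `git` itself works on changed regions rather than fixed line positions.

```python
def merge3(base, a, b):
    """Line-by-line three-way merge for versions with the same number of lines.

    Returns (merged_lines, conflicts), where conflicts lists the indices of
    lines that both sides changed in different ways.
    """
    merged, conflicts = [], []
    for i, (o, x, y) in enumerate(zip(base, a, b)):
        if x == y:        # both sides agree (untouched, or changed identically)
            merged.append(x)
        elif x == o:      # only B changed this line: take B's change
            merged.append(y)
        elif y == o:      # only A changed this line: take A's change
            merged.append(x)
        else:             # both changed it differently: a human must decide
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts

base      = ["x = 6", "y = 2", "z = x + y"]   # assumed common ancestor
version_a = ["x = 6", "y = 2", "z = x + y"]   # unchanged relative to base
version_b = ["x = 6", "y = 3", "z = x + y"]   # changed y

print(merge3(base, version_a, version_b))
# (['x = 6', 'y = 3', 'z = x + y'], [])
```

If the ancestor had instead contained `y = 3`, the same function would keep `y = 2`: the diff is identical, but the history points the other way.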

<img style="float:center; display:inline; height:120px; width:auto;" src="figures/Git-Logo-2Color.png" />

`git` has become the dominant version control system (VCS) for software.

Install it on all machines you use (Windows, macOS, and Linux/Unix)

Works with or without a *hosting service* (GitHub, GitLab, Bitbucket, etc.)

# Git concepts (1/3): branch and merge

![branching and merging](figures/git-branches-merge.png)

Inspired by *NobleDesktop*

# Git concepts (2/3): checkouts and staging

![checkout](figures/git-staging.png)

The code you see in your folder is a "checkout" from the overall `git` tree: you can switch branches mid-project.

From *Pro Git book*
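
The relationship between your folder, the staging area, and the commit history can be sketched with three plain Python containers. This is a deliberately simplified toy (the names `working`, `index`, `stage`, `commit`, and `checkout` are chosen for illustration, and real `git` stores much more and protects uncommitted work when switching), but it shows why an edit that was never staged does not end up in a commit.

```python
working = {"a.txt": "hello", "b.txt": "draft"}  # files as you see them in your folder
index = {}                                      # staging area: contents of the next commit
commits = []                                    # repository history, oldest first

def stage(name):
    """Roughly `git add`: copy the file's current content into the staging area."""
    index[name] = working[name]

def commit(message):
    """Roughly `git commit`: record a snapshot of the staging area."""
    commits.append({"message": message, "files": dict(index)})

def checkout(n):
    """Roughly a checkout: make the folder and staging area match snapshot n."""
    snapshot = commits[n]["files"]
    working.clear()
    working.update(snapshot)
    index.clear()
    index.update(snapshot)

stage("a.txt")
commit("add a.txt")

working["a.txt"] = "hello world"   # edited in the folder, but never staged
stage("b.txt")
commit("add b.txt")

print(commits[1]["files"])  # {'a.txt': 'hello', 'b.txt': 'draft'}
```

The second commit still contains the old `a.txt`, because only staged content is committed.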

# Git concepts (3/3): remotes

![remotes](figures/remotes.png)

- *push*: send code (needs read access to local, and write access to remote)
- *pull*: receive code (needs read access to remote, write access to local)
- *clone*: make initial copy of a remote repository
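
The three operations above can be sketched with a toy model in which a repository is just a set of commits plus branch pointers. Everything here is hypothetical (the function names, the commit IDs `c1` to `c3`, and the rule that only "fast-forward" transfers are accepted); it is a sketch of the idea, not of the actual `git` protocol, and a real `pull` can also merge.

```python
import copy

def new_repo():
    # commits: id -> parent id (None for the first commit); branches: name -> tip id
    return {"commits": {}, "branches": {}}

def commit(repo, branch, cid):
    """Add commit `cid` on top of `branch` and move the branch pointer to it."""
    repo["commits"][cid] = repo["branches"].get(branch)
    repo["branches"][branch] = cid

def clone(remote):
    """Make an independent initial copy of a repository."""
    return copy.deepcopy(remote)

def is_ancestor(repo, old, new):
    """True if `old` is `new` or lies in its history (None counts as 'no commits yet')."""
    while new is not None:
        if new == old:
            return True
        new = repo["commits"][new]
    return old is None

def transfer(src, dst, branch):
    """Copy commits from src to dst, but only if dst's tip is part of src's history."""
    tip = src["branches"][branch]
    if not is_ancestor(src, dst["branches"].get(branch), tip):
        return False              # histories diverged: a merge is needed first
    dst["commits"].update(src["commits"])
    dst["branches"][branch] = tip
    return True

def push(local, remote, branch):
    return transfer(local, remote, branch)   # reads local, writes remote

def pull(local, remote, branch):
    return transfer(remote, local, branch)   # reads remote, writes local

remote = new_repo()
commit(remote, "main", "c1")
alice = clone(remote)
bob = clone(remote)
commit(alice, "main", "c2")
commit(bob, "main", "c3")

print(push(alice, remote, "main"))  # True: the remote simply moves forward to c2
print(push(bob, remote, "main"))    # False: Bob's history does not contain c2
```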

# Hosting concepts: forks

![forks](figures/hosting1.png)

Inspired by *Thomas Beuzen*

![forks](figures/hosting2.png)

# Advanced topics

`git` is history-based. It is possible to rewrite history:

- `git commit --amend`
- `git rebase`

**Warning**: don't do this on `main` if anyone besides yourself is using the repository: you will generate incompatible histories.

You can do it to "clean up a branch," but this is a bit advanced and we won't cover it here. (You can read about it.)
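
To see why rewritten history is incompatible with the original, recall that a commit's ID in `git` is a hash that depends on its parent as well as on its own content. The sketch below is a toy version of that idea: `commit_id` and `build_history` are hypothetical helpers, and real commit hashes also cover metadata such as author and timestamp.

```python
import hashlib

def commit_id(content, parent_id):
    """Toy commit ID: a hash of the parent's ID together with this commit's content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode()).hexdigest()

def build_history(contents):
    """Return the list of commit IDs for a linear history, oldest first."""
    ids, parent = [], ""
    for content in contents:
        parent = commit_id(content, parent)
        ids.append(parent)
    return ids

original = build_history(["first", "second", "third"])
amended  = build_history(["first", "second (typo fixed)", "third"])

print(original[0] == amended[0])  # True: the untouched first commit keeps its ID
print(original[2] == amended[2])  # False: "third" has the same content but a new ID
```

Changing one commit changes the ID of every commit after it, so anyone still holding the old commits now has a history that no longer matches yours.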


# Summary

- `git` and hosting services like GitHub enable worldwide collaboration
- the hosting services are social media platforms: be respectful and earn the respect of others
- `git` is based on a few core concepts:
  + changes get *staged* and *committed*
  + branches get `push`ed, `pull`ed, and `merge`d


On the homework:
- complete a set of "bot" exercises on the basics of GitHub
- use VS Code on your local machine to create a repository, make changes, and push up to GitHub