- What is version control and why use it?
- What are commits and branches?
- What are forks and clones?
- Get a mental representation for commits and branches.
- Understand the difference between forks and clones.
- Understand the difference between Git and GitHub.
- Command line interface
- Cloning using SSH protocol and SSH keys
- Rebasing and squashing
- Many Git tricks which can be explored later {: .discussion}
Version control is the answer to these questions:
- "It broke ... hopefully I have a working version somewhere?"
- "Can you please send me the latest version?"
- "Where is the latest version?"
- "Which version are you using?"
- "Which version have the authors used in the paper I am trying to reproduce?"
- "Found a bug! Since when was it there?"
- "I am sure it used to work. When did it change?"
- Version control is a tool that can record snapshots of a project.
- You can think of version control like regularly taking a photo of your work (movie sets take regular polaroids to be able to recreate a scene the next day).
Snapshots (**commits**) in the [EHT-imaging](https://github.com/achael/eht-imaging) repository.
- Software (this is how it started but Git/GitHub can track a lot more)
- Scripts
- Documents (plain text files are much better suited than Word documents)
- Manuscripts (Git is great for collaborating/sharing LaTeX manuscripts)
- Configuration files
- Website sources
- Data
Figure: Research comic.
- We can always go back if we make a mistake.
- We can test new ideas without editing the working version.
- If we discover a problem, we can find out when it was introduced.
- We have the means to refer to a well-defined version of a project when sharing, collaborating, and publishing.
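Finding out when a problem was introduced can be sketched as a binary search over the recorded snapshots. The example below is a minimal illustration, not Git's own implementation (Git offers a `git bisect` command for this purpose). The commit names, the `is_good` check, and the function name `first_bad_commit` are hypothetical, and the sketch assumes that the oldest snapshot works, the newest one is broken, and the project stays broken once the bug appears.

```python
from typing import Callable, Sequence


def first_bad_commit(history: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_good() fails.

    Assumes history is ordered oldest to newest, history[0] is good,
    history[-1] is bad, and every commit after the first bad one is bad.
    """
    low, high = 0, len(history) - 1  # history[low] is good, history[high] is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(history[mid]):
            low = mid   # the bug was introduced after this snapshot
        else:
            high = mid  # the bug was introduced at or before this snapshot
    return history[high]


# Hypothetical history of eight snapshots; the bug appears in "commit-6".
history = [f"commit-{i}" for i in range(1, 9)]


def is_good(commit: str) -> bool:
    # Stand-in for "check out this snapshot and run the tests".
    return int(commit.split("-")[1]) < 6


culprit = first_bad_commit(history, is_good)
print(culprit)
```

With eight snapshots, only three of them need to be tested, instead of checking each one in turn.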
- Tool that can record and synchronize snapshots.
- Not the only tool that can record snapshots (other popular tools are Subversion and Mercurial).
- Not only a tool but also a format that can be read by many different tools.
- Service that provides hosting for Git repositories with a nice web interface.
- Not the only service that provides this (other popular services are GitLab and Bitbucket).
GitHub Desktop
- Graphical user interface to Git and GitHub which runs locally on your computer.
- There are other tools that can do this, too (e.g. Sourcetree).
- repository: The project, contains all data and history (commits, branches, tags).
- branch: Independent development line; one branch usually serves as the main development line.
- commit: Snapshot of the project, gets a unique identifier.
- tag: A pointer to one commit, to be able to refer to it later. Like a sticky note that you attach to a particular commit.
- cloning: Copying the whole repository to your laptop - the first time. It is not necessary to download each file one by one.
- forking: Taking a copy of a repository (which is typically not yours) - your copy (fork) stays on GitHub and you can make changes to your copy.
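One way to build a mental representation of these terms is a toy model in which commits are snapshots that point to their parent, and branches and tags are just names that point to a commit. The sketch below is a simplification for illustration only: the class names, the branch names `main` and `new-idea`, the tag `submitted`, and the identifier scheme are all assumptions, and real Git stores its objects in a different format.

```python
import copy
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """A snapshot of all files plus a pointer to the previous snapshot."""
    files: tuple            # sorted (path, content) pairs: the snapshot itself
    message: str
    parent: Optional[str]   # identifier of the parent commit, None for the first

    @property
    def id(self) -> str:
        # Toy identifier: a hash of snapshot, message, and parent.
        # Real Git hashes a differently formatted object.
        payload = repr((self.files, self.message, self.parent)).encode()
        return hashlib.sha1(payload).hexdigest()


@dataclass
class Repository:
    """The project: all commits plus the branch and tag pointers."""
    commits: Dict[str, Commit] = field(default_factory=dict)
    branches: Dict[str, Optional[str]] = field(default_factory=lambda: {"main": None})
    tags: Dict[str, str] = field(default_factory=dict)

    def commit(self, branch: str, files: Dict[str, str], message: str) -> str:
        new = Commit(tuple(sorted(files.items())), message, self.branches[branch])
        self.commits[new.id] = new
        self.branches[branch] = new.id  # the branch pointer moves forward
        return new.id

    def create_branch(self, name: str, start: str) -> None:
        # A new branch starts out pointing to the same commit as `start`.
        self.branches[name] = self.branches[start]

    def tag(self, name: str, commit_id: str) -> None:
        # A tag is a sticky note on one commit; it never moves.
        self.tags[name] = commit_id

    def history(self, branch: str) -> List[str]:
        # Walk the parent pointers from the branch tip back to the first commit.
        messages, commit_id = [], self.branches[branch]
        while commit_id is not None:
            current = self.commits[commit_id]
            messages.append(current.message)
            commit_id = current.parent
        return messages

    def clone(self) -> "Repository":
        # Cloning copies the whole repository, history included.
        return copy.deepcopy(self)


repo = Repository()
first = repo.commit("main", {"paper.tex": "draft"}, "Start manuscript")
repo.tag("submitted", first)
repo.create_branch("new-idea", "main")
repo.commit("new-idea", {"paper.tex": "draft + idea"}, "Try a new idea")
local = repo.clone()

print(repo.history("main"))
print(repo.history("new-idea"))
```

In this model a fork and a clone are both complete copies; the difference described above is only where the copy lives (on GitHub for a fork, on your computer for a clone).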
GitHub file view of the repository. This is the version of all files at a single point in time.
GitHub history view of the repository. This is the progression of the repository over time (with the **commit messages**).
Network graph of all commits in the repository. This shows the relationship between different **forks** of people who are contributing and sharing code.
- Event Horizon Telescope imaging software
- Repository: https://github.com/achael/eht-imaging
- Commits, branches, forks: https://github.com/achael/eht-imaging/network
- Activity inequality study
- Contains data and code necessary to create figures from their article.
- Data: https://github.com/timalthoff/activityinequality/tree/master/data
- FiveThirtyEight story "Why We’re Sharing 3 Million Russian Troll Tweets"
- Contains data and readme file, no code.
- Data: https://github.com/fivethirtyeight/russian-troll-tweets
- The NY Times Coronavirus (Covid-19) Data in the United States
- Contains data, readme, and license, but no code. As of April 2020, it is updated every day.
- Data: https://github.com/nytimes/covid-19-data
- Website: https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
- CSV exports of the Getty Provenance Index
- Entire books are written using Git/GitHub:
- Papers under open review:
- All changes are recorded.
- We do not have to send changes via email.
- We can experiment with several ideas which might not work out (using branches).
- Several people can work on the same project at the same time (using branches).
- We do not have to wait for others to send us "the latest version" over email.
- We do not have to merge parallel developments by hand.
- Group-based access model where shared access is the default: instead of everything being owned by individuals who manage sharing as needed, Git makes it easy for collaboration to be the default.
- It is possible to serve websites directly from a repository. {: .discussion}
- How have you solved these in the past without version control? {: .discussion}