# Introduction to Version Control
----------------------------------------------------------------

## Definitions and Key Concepts
----------------------------------------------------------------

Version control, also known as revision control, is a method for tracking changes to files and folders within a source code tree, project, or any complex set of files or documents. Informally, many of us already build version control concepts and strategies into our everyday workflows through:

* file and folder naming conventions
* (documented and consistent) backup procedures
* application- or platform-specific change tracking features

Each strategy has its pros and cons. Often the disadvantages of a given approach become apparent as we try to scale it to larger projects with more collaborators. An advantage of software version control systems is that they are designed to support the workflows of large, often distributed teams.

Version control systems have changed over time, but most of the applications in common use today are of a kind known as distributed version control systems, or DVCS. Throughout the rest of the session/tutorial, we will use VCS and DVCS interchangeably.

### Features of (D)VCS

Some of the features below apply to DVCS generally, but some are Git-specific.

![Illustration of DVCS system features](images/features.png)

* __Distributed__

Beyond being web-accessible from remote clients such as a local workstation, the _distributed_ in DVCS means that each repository, with its entire change history, is copied to each contributor's local environment (desktop, server, etc.). This has important workflow implications and accounts for part of what is considered to be the VCS learning curve.

But as the saying goes, it's a feature, not a bug. Because of their distributed nature, DVCS offer redundancy, which can be of great value when a central server goes down, a file needs to be recovered, the wifi is spotty, etc. Additionally, beyond a set of basic conventions, distribution allows for individual flexibility regarding workflows, organization, content management and development, etc. Most importantly, multiple users can edit the 'same' file at the same time without creating immediate conflicts (though conflicts can still occur when their changes are later combined).
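
To make the idea of a full local copy concrete, here is a minimal sketch. It assumes Git is installed and uses a throwaway temporary folder; the folder names (`origin`, `copy`), the file `notes.txt`, and the author identity are placeholders chosen for illustration.

```shell
# Work in a scratch folder so nothing here touches an existing repository.
work=$(mktemp -d)
cd "$work"

# Create a small repository with two commits.
git init -q origin
cd origin
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "second draft" >> notes.txt
git commit -q -am "Revise notes"
cd ..

# Cloning copies the files and the complete change history.
git clone -q origin copy
origin_commits=$(git -C origin rev-list --count HEAD)
copy_commits=$(git -C copy rev-list --count HEAD)
echo "origin: $origin_commits commits, copy: $copy_commits commits"
```

Both counts should match, because the clone carries every commit and not just the latest version of the files.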

* __Granular change tracking__

There is some variation among platforms, but in general a DVCS tracks and records every change to every file and directory in a repository. It is possible and often desirable to exclude certain types of content, but for text-based files this change tracking includes content edits (insertions and deletions) as well as file creation, deletion, moving, etc. For binary file types and directories this includes creating, renaming, moving, and deleting.
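
In Git, excluding content is usually done with a `.gitignore` file. The sketch below assumes Git is installed and works in a temporary folder; the `*.bak` rule and the file names are hypothetical examples, not a recommended set of rules.

```shell
# Work in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Ignore rule (an example choice): skip every file ending in .bak.
echo "*.bak" > .gitignore

# One file we want tracked and one we want Git to leave alone.
echo "draft text" > report.md
echo "old copy" > report.md.bak

# List the files Git would consider tracking; report.md.bak is not among them.
git status --porcelain
```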

In a Git-based workflow, changes are periodically (frequently!) registered as repository snapshots using the `add` and `commit` commands. This makes it easy to view, revert to, or restore a given snapshot or repository state. It can also simplify bug or error tracking without requiring contributors to maintain an extensive set of backup copies or previous versions of files. (How many 'myfile_old.docx' or '.bak' files do you have?)
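
As a rough sketch of what registering and restoring snapshots looks like, assuming Git is installed: the file name `paper.md`, the commit messages, and the author identity below are placeholders.

```shell
# Work in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

# Snapshot 1
echo "Version one" > paper.md
git add paper.md
git commit -q -m "Start paper"

# Snapshot 2
echo "Version two" > paper.md
git add paper.md
git commit -q -m "Rewrite opening"

# List the snapshots, newest first.
git log --oneline

# Restore the file as it was one commit earlier; no manual backup copy needed.
git checkout -q HEAD~1 -- paper.md
cat paper.md
```

The final `cat` shows the earlier text again, while both commits remain in the history.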

* __Annotation and attribution (_blame_)__

VCS support transparency and accountability: changes within a repository are automatically attributed to the users who committed them. Additionally, commits include messages (comments or annotations) that should, in principle, describe or explain the committed changes.
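
The sketch below shows how attribution and commit messages can be inspected in Git. It assumes Git is installed; the two contributors ("Alice Example" and "Bob Example") and the file `paper.md` are invented for illustration, and `-c` is used only to set an identity for a single command.

```shell
# Work in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# First contributor commits a line.
echo "Methods" > paper.md
git add paper.md
git -c user.name="Alice Example" -c user.email="alice@example.org" \
    commit -q -m "Add methods heading"

# Second contributor adds another line.
echo "Results" >> paper.md
git -c user.name="Bob Example" -c user.email="bob@example.org" \
    commit -q -am "Add results heading"

# Who committed what, and why (author name followed by the commit message).
git log --format="%an: %s"

# Line-by-line attribution of the current file.
git blame paper.md
```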

* __Branching__

Branching is one of the features that contribute to the scalability of VCS. It allows contributors to test ideas, draft sections of a paper, troubleshoot, or experiment on a 'branched' copy of the repository without affecting the master or production copy. Branches can be kept local without pushing any changes to the master copy, or they can be merged into the master branch in order to move changes into production.
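
A minimal sketch of a branch-and-merge cycle, assuming Git is installed. The branch name `draft-methods`, the file `paper.md`, and the author identity are placeholders; the default branch may be called `master` or `main` depending on configuration, so the script looks its name up instead of assuming one.

```shell
# Work in a scratch repository with one initial commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity
echo "Introduction" > paper.md
git add paper.md
git commit -q -m "Start paper"

# Remember the name of the default branch.
base=$(git rev-parse --abbrev-ref HEAD)

# Create a branch and experiment on it.
git checkout -q -b draft-methods
echo "Methods (draft)" >> paper.md
git commit -q -am "Draft methods section"

# Back on the default branch, the file is unchanged by the experiment.
git checkout -q "$base"
cat paper.md

# Merging brings the branch's changes into the default branch.
git merge -q draft-methods
cat paper.md
```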



## Use Cases
----------------------------------------------------------------

Above, we noted that version control is useful for tracking changes to 'complex' sets of files or documents. That is true, but there are many cases when version control may be useful for smaller scale, individual projects or even for drafting standalone documents (memos, posters, article drafts, etc.).

The use cases below focus on common workflows and are not specific to any application or file type. That said, while it is certainly possible to use a VCS for managing changes to binary file types (MS Word and Excel, SQL databases, images, etc.), we recommend using text formats wherever possible: CSV, Markdown (.md) or LaTeX, HTML/XML, etc. Especially with regard to content edits, the full benefit of using a VCS is generally only achieved with text files.
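
To illustrate why text formats work well, the sketch below edits one value in a small CSV file and asks Git what changed. It assumes Git is installed; the file `data.csv`, its contents, and the author identity are made up for the example.

```shell
# Work in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

# Commit a small text-based data file.
printf 'site,count\nA,10\nB,12\n' > data.csv
git add data.csv
git commit -q -m "Add counts"

# Correct one value.
printf 'site,count\nA,10\nB,21\n' > data.csv

# For a text file, the diff pinpoints the exact line that changed.
git diff data.csv
```

The diff marks the old line with `-` and the new line with `+`. For a binary file, Git can typically report only that the file differs.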

* __Document Authoring__

In some ways an obvious use case, but also one which can require the most extensive changes to existing workflows (and mindsets). We routinely create reports, papers, posters, and other forms of primarily text-based content using productivity software applications like MS Word and PowerPoint.

As mentioned, increased efficiency can be achieved by adopting workflows which utilize text formats.

[https://github.com/data-8/textbook](https://github.com/data-8/textbook)

* __Data Management__

Researchers are increasingly expected to share data, which, in addition to the logistics of making data publicly available, has implications for transparency, reproducibility, replicability, etc.

[https://github.com/cisco-ie/telemetry](https://github.com/cisco-ie/telemetry)

* __Web Content Management__

In addition to managing web assets like HTML and CSS files, the online front ends of services like GitHub and Bitbucket are in themselves publishing platforms. 

[https://dataoneorg.github.io/Education/](https://dataoneorg.github.io/Education/) & [GitHub Repository](https://github.com/DataONEorg/Education)

[https://github.com/nawrs/nawrs](https://github.com/nawrs/nawrs)

* __Software and Code__

The canonical use case.

[https://github.com/jonathanwheeler01/lter_dspace_harvester](https://github.com/jonathanwheeler01/lter_dspace_harvester)

## Basic Conceptual Workflow

![An illustration of the basic Git workflow](images/basic_cycle.png)

The conceptual workflow for Git's distributed version control model has four steps: creating a repository within which changes are tracked; creating, modifying, or deleting content; adding those changes to the set of "tracked" information in the repository; and committing them as a specific, identified point in the repository's history. Optionally, this repository history can also be synchronized with a remote repository hosted on a service such as GitHub.
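
The cycle described above can be sketched on the command line as follows, assuming Git is installed. The file `README.md`, the commit message, and the author identity are placeholders, and the remote URL in the last step is deliberately left as a template, so those lines stay commented out.

```shell
# 1. Create a repository (here in a scratch folder).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

# 2. Create or modify content. Git sees the file but is not tracking it yet.
echo "# Project notes" > README.md
git status --short          # shows: ?? README.md

# 3. Add the change to the tracked (staged) information.
git add README.md
git status --short          # shows: A  README.md

# 4. Commit the staged change as a point in the repository history.
git commit -q -m "Add README"
git status --short          # prints nothing: no uncommitted changes remain

# 5. Optional: synchronize with a remote repository (placeholder URL).
# git remote add origin https://github.com/<user>/<repository>.git
# git push -u origin "$(git rev-parse --abbrev-ref HEAD)"
```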

## Interlude - How Our Files and Folders are Organized

__Where am I and what's here?__

#### The relationship between your GUI view of your file system and the command line view ####

([full resolution image](images/shellPathFigure.png))

![View of how the Windows Explorer and Mac Finder views of the file system relate to the same locations in shell paths](images/shellPathFigure_sm.png)
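
The two questions above map onto a handful of shell commands. This is a small sketch in a temporary folder; the folder names `project` and `images` are invented for the example.

```shell
# Start in a scratch folder and create a small folder structure.
start=$(mktemp -d)
cd "$start"
mkdir -p project/images

# Move into a folder, just as you would double-click it in a GUI.
cd project

# Where am I? Print the full path of the current folder.
pwd

# What's here? List the files and folders it contains.
ls

# Go one level down, check the location, then come back up.
cd images
pwd
cd ..
```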




## More Information and Resources
-----------------------------------------------------------------

For more information about version control in general or specific VCS (Git, Mercurial, etc.), see:

* Safari Bookshelf ([link](https://learning.oreilly.com/home/))

    * _Git Essentials - Second Edition_ ([link](https://learning.oreilly.com/library/view/git-essentials-/9781787120723/))
    * _Git for Teams_ ([link](https://learning.oreilly.com/library/view/git-for-teams/9781491911204/))
* [Git](https://git-scm.com/)
    * [Git Reference Manual](https://git-scm.com/docs)
    * [The Pro Git Book](https://git-scm.com/book/en/v2)
* [GitHub Git tutorial](https://try.github.io/levels/1/challenges/1)

