# Git version control system

## Why do we care?
- Do you recall receiving or sharing files by email with names like **file_rev01.doc, file_rev02.doc, ..., file_final.doc, file_final1.doc or file_final_final.doc**?
- Making copies of projects/folders like **project_20aug2018**, **project_02sep2018**, etc. on your computer?

## So what? What are the issues here?

- Most of the content is the same across copies; the actual change (delta) is small.
    + Keeping full copies wastes storage space.
- After a month or a year, it is hard to tell how the different files/folders differ.
    + What is the timeline of changes? What was changed, and why?

If we go back to adding suffixes (like \_01, \_02, ...), we should ask: why did we do this?

+ \_01, \_02 act as a kind of implicit timeline.
    - However, we seldom keep metadata (who changed what, when, and why) about each version of a file, so going back to a particular version is very hard.
+ Parallel changes are not advisable: who will resolve conflicting edits (a **merge conflict**) to the same words, lines, etc.?
    - e.g. you are working with someone and you take turns updating a document to produce the final version.
+ If we do agree to make parallel changes to the documents, someone has to take care of merging them.

As a student, I worked with people in this fashion. It was time-consuming and unproductive.

__Clearly we need a system (version control or source control) to handle the above issues.__

A version control (or source control) system
- keeps track of the history of files,
- allows creating different versions of files,
- enables collaboration.

<center> <h1> Version control </h1> </center>

# Types of version control systems (VCS)
- Centralized
- Distributed




# Centralized VCS (SVN, CVS)
<center> <h1> Centralized version control system </h1>
    <img src= "./figures/vcs_centralized.png" height= "1000" width = "1000">  </img>
</center>

<center>  <b>Repository: the place where versions of files are kept </b> </center>


<center>  <font color = "red">A centralized version control system has a single point of failure: the central server. </font></center>

dot code for the figure above:

```dot
digraph G {
  "centralized repository" -> "file 1";
  "centralized repository" -> "file 2";
  "centralized repository" -> "file 3";
}
```

```
!emacs ./figures/vcs_centralized.gv
!dot -Tpng ./figures/vcs_centralized.gv -o ./figures/vcs_centralized.png
```

<center> <h1> Distributed version control system </h1>
    <img src= "./figures/git_dist_arch.png" height= "1000" width = "1000">  </img>
</center>



```
!emacs ./figures/git_dist_arch.gv
!dot -Tpng ./figures/git_dist_arch.gv -o ./figures/git_dist_arch.png
```

In a distributed version control system:

- Each user has a full copy of the repository, including its history.
    + Making this copy is called **cloning**.
    + The resulting copy is called a **clone**.
    
**_Git is a popular distributed version control system._**

# Git configuration
When you work in a team, it is important to know who made a particular change.

- git config --global user.name "your name"
- git config --global user.email "your email"

To check the current configuration, run

**git config --list**

# Using git
Git can be used in the following situations:
1. Tracking a local project/folder
2. Working on a remote repository in a team
    + Creating a remote repository for the team if there is none.
3. Contributing to open source projects when you don't have write permission on the repository.

# 1. Tracking local project/folder
- Run the **git init** command within the project folder.
    + If you run **ls -al** from the command prompt, you will see a **.git** folder. This is where git stores the history and metadata it uses to track the project.
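Git locates the repository by looking for **.git** in the current folder and then in each parent folder, which is why git commands work from any subfolder of the project. The sketch below is a simplified Python version of that lookup; `find_repo_root` is a hypothetical helper, not part of git, and real git also honours settings such as the `GIT_DIR` and `GIT_CEILING_DIRECTORIES` environment variables.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest folder at or above `start` that contains .git, else None.

    Simplified sketch: it only checks that a .git entry exists, whereas real git
    also validates the repository and respects environment variables such as
    GIT_DIR and GIT_CEILING_DIRECTORIES.
    """
    current = start.resolve()
    # Check the starting folder first, then walk up towards the filesystem root.
    for folder in [current, *current.parents]:
        if (folder / ".git").exists():
            return folder
    return None


if __name__ == "__main__":
    print(find_repo_root(Path.cwd()))
```

Running it from a subfolder of a project initialized with **git init** prints the project folder, the same folder git commands operate on.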

## Demo for creating a folder and tracking it with git.
We'll run
- git init to initialize an empty repository
- ls -al to see the .git folder
- git status to check the repository state
- Create some Python code, e.g. printing the squares of numbers up to n. Also create a .pyc file using py_compile. We don't actually want .pyc files to be tracked; we will see how to exclude them using .gitignore.
- Run git status again (what are untracked files?)
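The demo's Python file might look like the sketch below. The file name `squares.py` and the helper `make_demo_project` are hypothetical. `py_compile.compile` returns the path of the .pyc file it writes, which by default lands in a `__pycache__/` folder next to the source.

```python
import py_compile
import tempfile
from pathlib import Path

# Contents of a hypothetical squares.py for the demo.
SOURCE = '''\
def squares(n):
    """Return the squares of 1..n."""
    return [i * i for i in range(1, n + 1)]


if __name__ == "__main__":
    print(squares(5))
'''


def make_demo_project(folder: Path) -> Path:
    """Write squares.py into `folder`, byte-compile it and return the .pyc path."""
    src = folder / "squares.py"
    src.write_text(SOURCE)
    # doraise=True turns syntax errors into an exception instead of a printed message.
    return Path(py_compile.compile(str(src), doraise=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pyc = make_demo_project(Path(tmp))
        print(pyc.name)  # e.g. squares.cpython-<version>.pyc
```

Pointing `make_demo_project` at the tracked project folder instead of a temporary one reproduces the demo: git status then lists squares.py and the `__pycache__/` folder as untracked.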

## Working Tree
The collection of files in a local repository is called the working tree.

A file in the working tree can be in one of the following states:


<table style="width:100%" >

  <tr>
    <th> <font size = "6">State </font> </th>
    <th> <font size = "6">Description </font></th>
  </tr>
  <tr>
    <td> <font size = "6">untracked </font></td>
    <td> <font size = "6">file not tracked by git </font></td>
  </tr>
  <tr>
    <td><font size = "6">tracked </font></td>
    <td><font size = "6">committed, with no changes since the last commit (unmodified) </font></td>
  </tr>
  <tr>
    <td><font size = "6">staged </font></td>
    <td><font size = "6">to be included in the next commit </font></td>
  </tr>
  <tr>
    <td><font size = "6">dirty/modified </font></td>
    <td><font size = "6">file has changes that are not staged </font></td>
  </tr>
</table>


# How to commit a file in git for tracking.
<center>
Let's see how the state of a file changes.
<img src= "https://git-scm.com/book/en/v2/images/lifecycle.png" height="800" width="800"> </img>
source : Pro Git book, written by Scott Chacon and Ben Straub
</center>
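The lifecycle in the figure above can be written as a small state machine. This is a toy Python model, not how git stores state internally; it uses the figure's state names ("unmodified" corresponds to the table's "tracked" row) and ignores the case where a staged file is edited again, which makes it both staged and modified.

```python
from typing import Dict, List, Tuple

# Toy model of the file lifecycle in the Pro Git figure; not git's internals.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("untracked", "add"): "staged",         # git add
    ("unmodified", "edit"): "modified",     # change the file in an editor
    ("unmodified", "remove"): "untracked",  # e.g. git rm --cached
    ("modified", "add"): "staged",          # git add
    ("staged", "commit"): "unmodified",     # git commit
}


def next_state(state: str, action: str) -> str:
    """Return the state after `action`, or raise ValueError if the figure has no such arrow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a file that is {state!r}") from None


def replay(actions: List[str], state: str = "untracked") -> List[str]:
    """Apply `actions` in order and return every state visited, starting state included."""
    states = [state]
    for action in actions:
        state = next_state(state, action)
        states.append(state)
    return states


if __name__ == "__main__":
    print(replay(["add", "commit", "edit", "add", "commit"]))
```

The sequence in the usage line mirrors the demo: a new file is added and committed, then edited, staged and committed again.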

**Demo:** We will do the following in the sample project folder created earlier:
- **git status** in the sample folder
- **git add file_name** (or **git add -A** to stage all new, modified and deleted files) to put a file in the staging area.
    + Use **git reset file_name** to remove it from the staging area (*unstage*).
    + or use **git rm --cached file_name** (for a file that has already been committed, this instead stages its removal from the repository while keeping it on disk).
    + **git reset** with no file name unstages all files.
- **git status** to see the files moved to the staging area.


**Demo cont.:**
- Create a **.gitignore** file containing patterns such as \*.pyc so that .pyc files (or any other files we don't want tracked) are ignored.
- **git status** to see that the .pyc file no longer shows up as untracked.
- **git commit -m "commit message"** to record the staged files in the repository; they are now tracked.
    + Use **git log** to see the detailed commit history.
- **git status** to show the status again.
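A .gitignore is a plain-text file with one pattern per line, normally edited in a text editor. The Python sketch below (the helper `ensure_gitignore` is hypothetical) appends the patterns for Python bytecode if they are missing. Because py_compile writes .pyc files into `__pycache__/` by default, both patterns are listed. Note that .gitignore only affects untracked files: a file that is already staged or committed stays tracked until you run **git rm --cached** on it.

```python
from pathlib import Path
from typing import Sequence

# Python bytecode patterns; .pyc files usually live under __pycache__/.
DEFAULT_PATTERNS = ("*.pyc", "__pycache__/")


def ensure_gitignore(repo_root: Path, patterns: Sequence[str] = DEFAULT_PATTERNS) -> Path:
    """Append any missing `patterns` to repo_root/.gitignore (creating it) and return its path."""
    path = repo_root / ".gitignore"
    text = path.read_text() if path.exists() else ""
    existing = set(text.splitlines())
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Start on a new line if the existing file does not end with one.
        prefix = "\n" if text and not text.endswith("\n") else ""
        path.write_text(text + prefix + "\n".join(missing) + "\n")
    return path


if __name__ == "__main__":
    print(ensure_gitignore(Path.cwd()).read_text(), end="")
```

Calling it repeatedly is safe: patterns already present are not added twice.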
<center><font color = 'red'>Why is there a staging area before the commit? </font> </center>


# 2. Working on a remote repository in a team

## But wait: how does one create a new remote repository on GitHub?
Click the + icon at the top right of www.github.com.

<img src = "./figures/create_remote_rep01.png"> </img>

Give the repository a name (like **analysis_of_world_bank_data** in the following image) -> select private or public -> optionally check "Initialize this repository with a README" and add a license and a .gitignore -> **then click on Create repository**.
<img src= "./figures/create_remote_rep02.png" height = "800" width = "800"> </img>

Then follow the instructions for connecting a new or existing local repository.

<img src= "./figures/create_remote_rep03.png" height = "1000" width = "1000"> </img>

<center> <h1> Demo for creating a remote repository on github.com as shown in the previous slides. </h1> </center>

- Create the analysis_of_world_bank_data repository.
- In the command prompt, navigate to the directory into which you want to clone this remote repository.
- Let's clone it by running

**git clone https://github.com/psnegi/analysis_of_world_bank_data.git**. 

After cloning, you have the complete repository with its entire history.
Note: **To find the remote repository URL**, click the <font color = "green" size = "8">clone or download</font> button on the right-hand side of the remote repository's web page, then copy the HTTPS or SSH URL.
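When no target directory is given, git clone names the new folder after the "humanish" part of the URL, which is why the next step navigates into analysis_of_world_bank_data. The Python function below approximates that rule under simplifying assumptions: it strips a trailing `/.git` or `.git` and keeps the last path component. `default_clone_dir` is a hypothetical helper, not git's implementation.

```python
def default_clone_dir(url: str) -> str:
    """Approximate the folder name git clone picks when no directory is given."""
    name = url.rstrip("/")
    # "host:foo/.git" style URLs: drop the trailing "/.git" first.
    if name.endswith("/.git"):
        name = name[: -len("/.git")]
    name = name.rstrip("/")
    # Treat ':' like '/' so scp-like URLs (git@host:user/repo.git) work too.
    name = name.replace(":", "/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


if __name__ == "__main__":
    print(default_clone_dir("https://github.com/psnegi/analysis_of_world_bank_data.git"))
```

The HTTPS and SSH URLs of the same repository yield the same folder name.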


# Demo cont.
- Navigate to the analysis_of_world_bank_data directory. Run **git remote -v** to see the remote repository connection information.
- Add **README.md** and **util.py** to the local repository (util.py contains code for calculating the nth power of integers up to a given upper limit), then stage and commit them with **git add** and **git commit**.

dot code for the figure above:

```dot
digraph G {
  rankdir="LR";
  "working tree" -> "staging area";
  "staging area" -> "Git repository";
}
```

<center> <h1> Are README.md and util.py in the remote repository? </h1></center>

<center> <h1>To synchronize with the remote repository, run the following commands: </h1></center>

- **git pull origin master**. *Why pull first?*


- **git push origin master**
This uploads the commits in your local repository to the remote repository.

**origin:** the default name git gives to the remote repository you cloned from.

**master:** Local master branch will be pushed to the master branch of the remote origin.

 <center> <b>  Check remote repository on github </b> </center>

# Why do we need the branch concept?
Note that the master branch generally contains production code.


- Branches let you work on different versions of your files.
- By switching between branches, one can work on different features.



Note: __*You can also work directly on your local master branch: finish the assigned feature/work, pull, test, and then push the code to the remote master (production).*__

Run *__git branch__* to list the branches in your local repository (**git branch -a** also lists remote-tracking branches).


 __*Aside: in a real production environment, pushing to the remote is a bit more involved.*__

- Run unit and integration tests for the assigned feature.
- If they pass, push to remote master.
- Run the whole integration suite in the deployment pipeline.
    + Some stages may mock external dependencies.
- Promote to production.
    + Manually or continuously.


## Workflow to work in a team
To work on a new feature/issue, you create a branch. Let's say nth_pow.py was supposed to print the nth power of the numbers up to a given parameter *high*. Clearly, it prints only up to *high-1*, so we need to fix this. Let's say we got assigned this fix.
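The content of nth_pow.py is not shown, so the Python below is a hypothetical reconstruction of this kind of off-by-one bug, assuming the code loops with `range` starting at 1. `range`'s stop argument is exclusive, so `range(1, high)` never reaches `high`.

```python
from typing import List


def nth_powers_buggy(n: int, high: int) -> List[int]:
    # range(1, high) stops *before* high, so high**n is missing.
    return [i ** n for i in range(1, high)]


def nth_powers_fixed(n: int, high: int) -> List[int]:
    # range's stop is exclusive; high + 1 makes high itself included.
    return [i ** n for i in range(1, high + 1)]


if __name__ == "__main__":
    for value in nth_powers_fixed(2, 5):
        print(value)
```

The one-line change from `high` to `high + 1` is exactly the kind of small, focused fix that belongs on its own feature branch.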

- We need to create a branch. Use **git checkout -b branch_name**. It creates the new branch and switches you to it. Run **git branch** to confirm this.
    + Note that you can also run **git branch branch_name** to create a branch first, and then check out (switch to) the created branch with **git checkout branch_name**.


## Workflow to work in a team. Cont.
- Modify the code, then run **git add nth_pow.py** and **git commit -m "fixed range issue"**.
- Note that the branch is local. You may push the branch to a remote:
    + **git push origin branch_name**

## Demo
- git branch
- git checkout -b nth_pow_upper_limit
- fix the upper-limit issue; run git status to see the file in the modified state
- git add nth_pow.py; run git status to see the file moved to the staging area
- git commit -m "fixed upper limit for the power" to commit the file; run git status to show a clean working tree
- git diff COMMIT~ COMMIT to see the difference between a commit and its parent, or use git show COMMIT
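git diff prints changes in the unified diff format: `---`/`+++` file headers, an `@@` hunk header, then lines prefixed with a space (unchanged), `-` (removed) or `+` (added). Python's `difflib.unified_diff` produces the same format, although its matching algorithm differs from git's, so this sketch only approximates what git would show. The before/after contents of nth_pow.py are hypothetical.

```python
import difflib

# Hypothetical before/after versions of nth_pow.py.
OLD = """def nth_powers(n, high):
    return [i ** n for i in range(1, high)]
"""
NEW = """def nth_powers(n, high):
    return [i ** n for i in range(1, high + 1)]
"""


def unified_diff_text(old: str, new: str, path: str = "nth_pow.py") -> str:
    """Return a unified diff of two file versions, labelled a/ and b/ like git diff."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="a/" + path,
            tofile="b/" + path,
        )
    )


if __name__ == "__main__":
    print(unified_diff_text(OLD, NEW), end="")
```

The output has a single hunk with the unchanged `def` line as context, one `-` line for the old `range(1, high)` and one `+` line for `range(1, high + 1)`, which is what the commit in this demo changes.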


# Demo cont.
The above operations only affect the newly created branch; the local and remote master are still intact.
- git checkout master to show the original content
- git checkout nth_pow_upper_limit to go back to the current feature branch
- push this branch to the remote using
    + **git push -u origin nth_pow_upper_limit**
        - **-u** tells git to set the remote nth_pow_upper_limit branch as the upstream of the local nth_pow_upper_limit branch. In future, we can just run git pull and git push.
        

# Demo cont. How do we get this fix into the local master branch and push it to the remote?

- **git checkout master** and **git pull origin master** (why?)
- **git branch --merged master** to list branches merged into master.
    + or **git branch --no-merged** to list branches that have not been merged.
- **git merge nth_pow_upper_limit**
- **git pull origin master** to see whether the remote master has new changes.
    + This may result in merge conflicts.
- **git push origin master**    
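Both checks above come down to reachability in the commit graph. **git branch --merged master** lists the branches whose tip commit is reachable from master's tip. A push is accepted without forcing only if it fast-forwards the remote branch, meaning the remote tip is already part of your local history. That is why we pull before pushing. The Python below is a toy model under these assumptions: commit ids like `c1` and the `graph`/`branches` dictionaries are hypothetical, not git's object store.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit id -> parent commit ids (toy model)


def reachable(graph: Graph, start: str) -> Set[str]:
    """Commits reachable from `start` by following parent links, `start` included."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def merged_into(graph: Graph, branches: Dict[str, str], target: str) -> List[str]:
    """Branches whose tip is reachable from `target`'s tip (cf. git branch --merged target)."""
    history = reachable(graph, branches[target])
    return sorted(name for name, tip in branches.items() if tip in history)


def is_fast_forward(graph: Graph, remote_tip: str, local_tip: str) -> bool:
    """True if pushing local_tip would only add commits on top of remote_tip."""
    return remote_tip in reachable(graph, local_tip)


if __name__ == "__main__":
    # c1 <- c2 (master) <- c3 (nth_pow_upper_limit)
    graph = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
    branches = {"master": "c2", "nth_pow_upper_limit": "c3"}
    print(merged_into(graph, branches, "master"))  # ['master']
    branches["master"] = "c3"  # after git merge (a fast-forward here)
    print(merged_into(graph, branches, "master"))  # ['master', 'nth_pow_upper_limit']
    graph["c4"] = ["c2"]  # a teammate pushed c4 to the remote master meanwhile
    print(is_fast_forward(graph, "c4", "c3"))  # False: pull first
    graph["c5"] = ["c3", "c4"]  # merge commit created by git pull
    print(is_fast_forward(graph, "c4", "c5"))  # True: the push can now go through
```

After the pull creates the merge commit, the remote tip is part of the local history, so the push no longer needs to overwrite anyone's work.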

 __*We are done with the feature, so we can delete the branch locally and remotely.*__

# Demo: deleting the local and remote branches

- git branch -d nth_pow_upper_limit
- git push origin --delete nth_pow_upper_limit

# Resources
## Book
- [Pro Git book](https://git-scm.com/book/en/v2) by Scott Chacon and Ben Straub 
- [github guides](https://guides.github.com/)