# Introduction

Welcome! In this course I'll try to teach the basic tools that you need for data analysis in Python. How things generally work is that I will provide a short explanation of some concepts and leave the details to the readings. I may also complement them with some everyday useful tricks. Then, I'll give you some exercises to test your skills. You should go through everything properly - there are no due dates, there is no grading, so take your time and do things right. You can only cheat yourself here :)

We'll start off in this lecture by showing you how to use Jupyter Lab (the environment in which you will be working all the time) and Git, a very useful tool for keeping track of changes in your code. There will be no programming in this lecture.

## Jupyter Lab

You're using it right now :) It's an evolution of Jupyter Notebooks, a really great leap which brings it closer to being a full-fledged IDE. It's relatively new, so you will probably hear people mentioning Jupyter notebooks more - but this is the *future* :)

### Cells
As you can see, the code is organized in cells (even this text is a cell - double click on it to see it!). Cells are of two types - Markdown (text) or code. 
You can execute a cell by selecting it and then pressing either <kbd>Shift</kbd>+<kbd>Enter</kbd> or <kbd>Ctrl</kbd>+<kbd>Enter</kbd>. The first one executes the cell and moves you to the cell below, while the second one just executes the cell (and doesn't move you anywhere). If you have a lot of cells to execute you may find yourself pressing <kbd>Shift</kbd>+<kbd>Enter</kbd> furiously :)

There are two cells below all this text - the first one is text, the second one is code. Execute them both, and try out the two different keyboard commands!

### Keyboard shortcuts 
Now that we know how to execute cells, let's talk about creating and deleting them. You probably see a + sign in the menu bar above - yup, that creates a new cell (you can try it). But this is not the best way to do it. The best way to do it is with keyboard shortcuts.

Instead of writing everything down, read the link below:
- [Link below](http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html)

Try out the shortcuts explained there! If you are having some problems:
- To enter edit mode, just double click on the cell (this you probably got)
- To enter command mode, click in the white space just to the right of the cell.

Two very useful shortcuts not mentioned there:
- <kbd>Z</kbd> - it's like <kbd>Ctrl</kbd>+<kbd>Z</kbd>, but for cell operations (so if you deleted a cell, <kbd>Z</kbd> in command mode will bring it back, etc.)
- <kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>-</kbd> splits the cell at the selected place.

### Other stuff

That's all we need right now. You can probably see that Jupyter Lab has tabs (very useful), and you can also make these tabs half-window sized, like you do in Windows. Open a new tab and try it out :) If you haven't figured it out yet, you can hide the sidebar as well. Also, Jupyter Lab includes editors for a variety of files. For example, open the `README.md` file (you should see it in the sidebar). When you are in it, right click and select "Show Markdown preview". Amazing, isn't it! (Don't be afraid of making changes to that file). To see the cool stuff that Jupyter Lab can do, you can have a look at this post, or the documentation:

- [Post](https://blog.jupyter.org/jupyterlab-is-ready-for-users-5a6f039b8906)
- [Documentation](https://jupyterlab.readthedocs.io/en/stable/)

### Exercises - very easy ones

You have a bunch of cells below. I want you to do the following with them (in the order in which they appear - the cells give you clues):

1. Execute the cell (if you haven't already)
2. Execute the cell (if you haven't already)

From now on, I want you to do **everything** using keyboard shortcuts. No cheating!

3. Change the cell to code, execute it
4. Change the cell to Markdown, execute it
5. Create a cell below, write "1+1" in it
6. Create a cell above it, write "2+2" in it
7. Delete the cell
8. Delete the cell, then undo the deletion (using keyboard!)
9. Merge the two cells, then execute
10. Split the two cells, then execute

Cool, you are done!

**This is a text cell**

In [1]:
print('This is a code cell')

This is a code cell


In [None]:
print('Change me to code')

Change me to *Markdown* please $!$

Create a cell below me!

Create a cell above me!

*Delete me*!

Don't delete me :(

In [2]:
print('Please,')

Please,


In [3]:
print('merge us!')

merge us!


In [1]:
print('Split us')

Split us


In [2]:
print('in the middle')

in the middle


## Markdown

You've seen it already - Markdown is a really lightweight, easy markup language (basically, it transforms plain text into something that looks nice with almost no effort).
Markdown files have the file ending `.md` (like the one you opened in the previous section), but in Jupyter Lab you can mix code and Markdown, which is nice.

### How to use it

I'll let you figure this out by reading through the clearly written cheatsheet below:
- [Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet)

Now that you've read it, I just want to show you another cool feature not included in the cheatsheet (edit this cell to see how it's done):

- [x] Jupyter Lab
- [ ] Markdown
- [ ] Git

Also, I want to make a distinction between inline LaTeX (like this: $\alpha + \beta = \gamma$) and full line (equation) LaTeX:
$$
\beta + \alpha = \gamma
$$

### Exercise

Here you should replicate what you see below.
- The links are to the Wikipedia articles with the same name. (also, the first paragraph is copied from Wikipedia, if that helps you)
- The image is in the `Images` folder, so the full path (link) to it that you will use is `Images/regression.png`.
- The image should show "Linear regression" when you hover over it.
- Make sure to get the code highlighting right (the language is Python).

![](Images/exercise_md.png)

Do the replication in this cell.
# Ordinary least squares

In [statistics](https://en.wikipedia.org/wiki/Statistics), **ordinary least squares (OLS)** is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being predicted) in the given dataset and those predicted by the linear function.

![linear regression](Images/regression.png)

# Example

Let's try this on an example. The model we are considering is
$$
y_i = \beta_0 + \beta_1 x_i + \epsilon_i
$$
where $y_i$ represents car sales in year $i$, and $x_i$ is the GDP in that year. The coefficients we obtain are

|Coefficients   |$\beta_0$      |$\beta_1$| 
| :-----------: |:-------------:|:-----:|
| Value         | 1.43          | 3.84  |

# Code

Here's how to compute the previous example in Python. We will be using `pandas` and `statsmodels`, so make sure you have them installed.

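```python
import pandas as pd
import statsmodels.api as sm

# Load the data (expects the columns `sales` and `GDP`)
data = pd.read_csv('car_sales.csv')

# Regress sales on GDP with ordinary least squares
model = sm.OLS.from_formula('sales ~ GDP', data=data)
results = model.fit()

# Show results
results.summary()
```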
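For intuition about what the fit above computes, here is a minimal sketch of the least-squares formulas for the one-variable model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, written in plain Python. The function name `fit_simple_ols` and the numbers are made up for illustration (they are not from `car_sales.csv`), and this is not how `statsmodels` does it internally.

```python
def fit_simple_ols(x, y):
    """Return (beta_0, beta_1) minimizing the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    beta_1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1


# Made-up data (an assumption for this sketch): sales = 1 + 2 * gdp exactly
gdp = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

beta_0, beta_1 = fit_simple_ols(gdp, sales)
print(beta_0, beta_1)
```

Because the made-up data lie exactly on a line, the recovered intercept and slope are the ones used to build it.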


# Git

Yay, almost done! Now I'll show you git - the basics, and also what the workflow for this course will look like (i.e., how you submit exercises). You don't have to worry about installing anything - git is already installed on your machines. Here is a short introduction to what git is, why it is useful, and why you shouldn't just use Dropbox:

First, git is version control software. It's great for:
- Keeping track of exactly when you implemented feature X in your code
- Collaborating on the same codebase (i.e., all your code files) with your teammates, each working on their own feature and merging the work when finished
- Collaborating on open source projects

It's not good for:
- Just saving files. First of all, it's super slow, so forget uploading large files. Really, it's just meant for code - simple, (relatively) small text files. Saving files is what Dropbox is for.

It beats Dropbox for code because, as you will learn, it enables multiple people to work on the same document at the same time, merging their edits gracefully when they finish. Dropbox, on the other hand, would try to enforce the same version on everybody - if two people were editing a document at the same time, it would produce some nasty duplicated "conflict" files, which you would have to merge manually.
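To make the idea of merging edits concrete, here is a toy sketch in Python. It is not git's actual algorithm (git also handles inserted and deleted lines); the function name `merge_lines` and the file contents are made up for illustration, and the sketch assumes every version has the same number of lines.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge: combine two edits of the same base version.

    Returns the merged lines and the (1-based) numbers of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles line-for-line edits")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both versions agree (or nobody touched the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(b)
            conflicts.append(number)
    return merged, conflicts


# Hypothetical file contents: two people edit different lines of one script
base = ["import pandas as pd", "data = pd.read_csv('sales.csv')", "print(data)"]
jenny = ["import pandas as pd", "data = pd.read_csv('car_sales.csv')", "print(data)"]
teammate = ["import pandas as pd", "data = pd.read_csv('sales.csv')", "print(data.head())"]

merged, conflicts = merge_lines(base, jenny, teammate)
print("\n".join(merged))
print("conflicting lines:", conflicts)
```

Since the two edits touch different lines, both survive in the merged result and nothing is reported as a conflict. Only when both people change the same line does somebody have to choose by hand.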

Last thing before I send you off to the readings: there are two ways to use git. Using the terminal, or using a GUI app. GUI apps are great, but it's useful to know how to use the terminal as well - for that reason, I think you should execute all git commands via terminal for this course (I have no way of controlling for this, of course, but I think it is for your own good). 

However, some things are just so much better using a GUI that it makes no sense doing them in the terminal. The two things I have in mind are:
- Seeing the history of commits. A GUI can show you nice branches and everything; it's not limited to ASCII characters, as the terminal is.
- Merging changes - you can see the changes that occurred much more clearly in the GUI, with the code highlighting and all, and also more easily select which ones to keep (and which not to).

## Readings 
Ok, without further ado, here are the readings. It's a lot, but there's no rush, so take your time to read through them (but note that these readings contain more information than you will need, so don't spend too much time memorizing everything):
1. Getting started
    - [About version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
    - [A short history of git](https://git-scm.com/book/en/v2/Getting-Started-A-Short-History-of-Git)
    - [What is Git?](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F)
    - [First-time Git Setup](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup)
2. Git basics
    - [Getting a git repository](https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository)
    - [Recording changes to the Repository](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository)
    - [Undoing things](https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things)
    - [Working with Remotes](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes)
3. Git branching
    - [Branches in a Nutshell](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell)
    - [Basic Branching and Merging](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging)
    - [Branch Management](https://git-scm.com/book/en/v2/Git-Branching-Branch-Management)
    - [Branching Workflows](https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows)
    - [Remote Branches](https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches)

## Some everyday commands

As a complement to the readings above, I want to highlight some commands that you will be using most of the time. First, checking what changes you have made to the code is done with
```
git status
```
If you have not created any new files, then you can stage and commit all the changes at once with
```
git commit -a -m "comment"
```
However, if you created new files, the command above will not stage them (so they also won't be committed). You can make sure that they are included with
```
git add .
git commit -m "comment"
```
After you have committed your glorious changes, you push them to GitHub with
```
git push
```
And when you wake up the next day you pull what your teammate in another time zone did overnight with
```
git pull
```

## GUI

As you will need a GUI from time to time, it would be wise to install one. I recommend you install [GitKraken](https://www.gitkraken.com/).

## Exercises
You will be doing all these exercises from the terminal. Don't forget to navigate to the project folder first, with
```
cd ~/Documents/Python-data-course/
```

(Also, you have a terminal window already open running the notebook. You can either open a new window, or a new terminal tab (like Chrome tabs) with Ctrl+Shift+T).

You should copy all the terminal commands you run into the cell below (use Markdown). To avoid conflicts, save this notebook now and don't edit it while you work through the exercise (write the commands in a text file meanwhile). Return to enter the commands for this exercise after you are done with it.

1. Do the First time git setup (from the readings) - i.e., set your name and email for git
2. Stage and commit all changes you have done so far (comment should be `jupyterlab, markdown done`)
3. Create a new branch called `test1` (and check it out)
4. Create a folder called `Test`, and inside it create a plain document called `testing.txt`, write something in it. (You don't need to do this one in the Terminal)
5. Stage and commit all changes (be careful here)
6. Oops, you forgot something! Go to `testing.txt` and add `done!` at the end. **Amend** the previous commit.
7. Push your new `test1` branch to GitHub. Go to GitHub, check that it is there!
8. Check out the master branch
9. Merge it with the `test1` branch
10. Delete the `test1` branch (note: usually this is not a necessary step - there is no cost in letting old unused branches linger. You usually do it when you have a lot of such dead branches and want to clean up).

Write the commands in this cell:

## Submitting exercises

Last, but not least! I'll show you how to submit exercises now. Let's first describe the setup. I assume your name is Jenny; if it's not, change as appropriate.

### Setup
So, we have the original (Tadej's) repository of this class on GitHub. This repository contains (among others) these two branches:
- master
- Jenny

Then we have your (Jenny's) fork. It contains (among others) the following branch:
- master

### Workflow

#### Submitting exercises

So, say that you have diligently completed all the exercises, and have committed and pushed the changes to your fork (always make sure that the master branch contains your final changes, even if you used, for some reason, other branches meanwhile). What do you do then?

To submit the exercises you have to do something called a **pull request**. Note that this is completely different from the `pull` command you read about in the readings (just unfortunate terminology). A pull request asks that the changes from your fork (Jenny's) be merged into the original repository (Tadej's). How do you do it?

On the GitHub page of your fork there should be a button "Pull request". Click it. Then, when you are there, it shows that your master branch will be merged into the original master branch. **Change this**. Namely, change the original branch from master to Jenny. Then, just click "Submit pull request", write a comment for me if you want, and you are done! I will comment on your solutions, telling you if you need to improve something.

#### Getting new exercises
You submitted your solutions yesterday, and today a new lecture with new exercises already awaits you! But it is in the original repository, so how do you get it? You need to sync your fork with the original repository. This article provides clear instructions on how to do that in 5 easy steps:
- [Article](https://help.github.com/en/articles/syncing-a-fork)
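As a rough sketch of what syncing usually involves, the small Python helper below prints the usual sequence of git commands so you can type them into the terminal yourself. The helper names `sync_fork_commands` and `sync_fork`, the remote name `upstream`, and the URL are assumptions for illustration (use the real address of the original repository), and the article above remains the reference.

```python
import subprocess


def sync_fork_commands(upstream_url, branch="master"):
    """Build the usual git commands for syncing a fork with the original."""
    return [
        # Only needed once: register the original repository as "upstream"
        ["git", "remote", "add", "upstream", upstream_url],
        # Download the new commits from the original repository
        ["git", "fetch", "upstream"],
        # Switch to your local branch and merge the new commits into it
        ["git", "checkout", branch],
        ["git", "merge", f"upstream/{branch}"],
        # Update your fork on GitHub as well
        ["git", "push", "origin", branch],
    ]


def sync_fork(upstream_url, branch="master", dry_run=True):
    """Print each command; run it too if dry_run is False."""
    for command in sync_fork_commands(upstream_url, branch):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Hypothetical URL - replace it with the address of the original repository
sync_fork("https://github.com/ORIGINAL-OWNER/REPOSITORY.git")
```

With `dry_run=True` (the default here) nothing is executed, which fits the advice above to run git commands in the terminal. On later syncs, skip the first command, because the `upstream` remote already exists.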

Below I provide a simple diagram illustrating the situation:
![Git workflow](Images/git_flow.jpg)

# Up next...

You've used the terminal a bit in this lecture. We'll supercharge that in the next one and give you all the knowledge you need to be a proficient and efficient terminal user. We'll also start coding, slowly, by showing you how to properly set up a Python environment. We'll then combine these two new skills to show you how to deploy a new Amazon server and start coding on it in a matter of minutes!