## Installing Python

I highly recommend installing Python using the [Anaconda distribution](https://conda.io/docs/user-guide/install/index.html). It comes with almost all of the packages we will need for this class and pretty much just works. Windows users should especially be grateful as installing Python on Windows used to be an enormous pain...

## IPython vs Python

IPython is what makes Python interactive: you can type some code, get some results, and then type some more code. This is very useful for exploring data because you don't always know what you are looking for, and it can be annoying to have to rerun your entire program every time you make a change.

## Jupyter Notebooks

As you may have noticed, I write all my lectures in Jupyter notebooks. These notebooks wrap IPython with a web editor that allows you to mix code and markdown in one place. While they are not perfect, they offer a lot of benefits for data scientists. I would recommend doing most of your homework in notebooks. To start a notebook, open your terminal, navigate to the folder you want to work in, type "jupyter notebook", and hit enter. That will open the notebook interface in your browser. Then click "New" in the top right and select "Python 3", and you should be good to go.

There are a lot of things that notebooks can do, but we will not cover them all here. The most basic thing is toggling a cell's type by selecting "Cell" from the menu and then "Cell Type." This allows you to change cells from code to markdown and vice versa.

To learn more, use Google or read this [decent guide](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook##UseJupyter).

## The Terminal

Some of you might not be very familiar with the terminal on your computer. It is worth investing some time to learn the basic commands. For Mac / Linux users, [here](https://computers.tutsplus.com/tutorials/navigating-the-terminal-a-gentle-introduction--mac-3855) is a pretty gentle introduction. I found [something similar](https://www.bleepingcomputer.com/tutorials/windows-command-prompt-introduction/) for Windows, but I am not a Windows user, so I am not sure how good it is. Please feel free to open a pull request if you have a better resource for Windows.

## Git and GitHub/BitBucket

Git is an amazing tool for software development. It allows for pretty easy version control and collaboration.

First, install Git by following [this guide](https://www.atlassian.com/git/tutorials/install-git).

Then, [learn](https://www.atlassian.com/git/tutorials/what-is-version-control) why it is useful.

Lastly, figure out [how to use it](https://www.atlassian.com/git/tutorials/setting-up-a-repository).

## Native Python for Science

Let's see what Python looks like! First, you will notice that Python makes use of whitespace and does not need a `;` at the end of each line like some other languages do:

```python
x = 5
y = 10
print(x + y)
```

```
15
```
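To illustrate the point about whitespace, here is a minimal sketch in which indentation alone marks the body of a loop and of an `if` statement. The names `ages` and `over_25` are made up for this illustration.

```python
# Indentation (four spaces by convention) marks which lines belong to a block.
ages = [22, 25, 30]  # hypothetical example data
over_25 = []

for age in ages:
    # Everything indented under the "for" runs once per item.
    if age > 25:
        # This line runs only when the condition is true.
        over_25.append(age)

# Back at the left margin, so this runs once, after the loop has finished.
print(over_25)
```

If you remove the indentation from the lines under `for` or `if`, Python raises an `IndentationError` instead of guessing what you meant.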


We will make use of many libraries. Some come pre-installed with Python, even more come with Anaconda, and some we will have to install ourselves. To use a library, load it with an import statement:

```python
from collections import Counter
```

This command imports the class "Counter" from the "collections" library. Counter is actually a really useful tool for data scientists. It can count the number of times items appear in collections like lists. Lists are a useful data structure to store data. For example:

```python
marriage_ages = [22, 22, 25, 25, 30, 24, 26, 24, 35, 22]
value_counts = Counter(marriage_ages)
value_counts.most_common(5)
```

```
[(22, 3), (25, 2), (24, 2), (30, 1), (26, 1)]
```
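Here is a short sketch of a few other things you can do with the same list and its `Counter`. The variable names (`first_age`, `number_of_people`, `count_of_22`, `count_of_40`) are only for illustration.

```python
from collections import Counter

marriage_ages = [22, 22, 25, 25, 30, 24, 26, 24, 35, 22]

# Lists are ordered and indexed from zero.
first_age = marriage_ages[0]
number_of_people = len(marriage_ages)

# A Counter behaves like a dictionary mapping each item to its count.
value_counts = Counter(marriage_ages)
count_of_22 = value_counts[22]
count_of_40 = value_counts[40]  # items never seen count as 0 instead of raising an error

print(first_age, number_of_people, count_of_22, count_of_40)
```

Looking up a missing item returns 0 rather than raising a `KeyError`, which is one reason a `Counter` is often more convenient than a plain dictionary for tallying.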

Functions are also very useful.
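As a minimal sketch of what a function looks like, the example below defines a hypothetical helper called `average` (it is not part of the original material) and applies it to the `marriage_ages` list from earlier.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


marriage_ages = [22, 22, 25, 25, 30, 24, 26, 24, 35, 22]
mean_age = average(marriage_ages)
print(mean_age)
```

Wrapping the calculation in a function means you can reuse it on any other list of numbers without rewriting the arithmetic.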

## The Data Science Stack


- Pandas - Provides R-like data structures and a high-level API for working with data
- NumPy - Provides fast numerical computing, such as arrays and linear algebra
- SciPy - For scientific computing, such as drawing from distributions
- Matplotlib - For plotting
    - Seaborn - To make your plots look better
- Scikit-Learn - For machine learning; great documentation and tutorials
- Statsmodels - For more traditional statistics