# Introduction to Python and Jupyter notebooks

Below is a brief (and far from comprehensive) introduction to Python and Jupyter notebooks.
There are many great resources for learning Python.
Some of my favorites include:
* Automate the Boring Stuff with Python: https://automatetheboringstuff.com/
* The [Python Data Science Handbook](https://www.oreilly.com/library/view/python-data-science/9781491912126/), which has free notebooks available on GitHub: https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/README.md

## Moving around in notebooks

Notebooks are a convenient way to combine text, code, and output.
The text is written in a markup language called Markdown.
Code can be written in any number of languages; we'll be using Python.

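Navigating notebooks:
* Cells can hold either text (Markdown) or code
* To run a code cell or render a text cell, use `Shift+Enter` (run and move to the next cell) or `Command+Enter` (`Ctrl+Enter` on Windows/Linux; run and stay on the cell)
* Hit `Enter` to edit the text or code in the selected cell
* Use the arrow keys to move up and down between cells
* Hit `Esc` to leave editing and move into navigation mode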

Markdown cheatsheet: https://www.markdownguide.org/cheat-sheet/


## Python basics

Basics of python cheatsheet: https://www.pythoncheatsheet.org/cheatsheet/basics

In [None]:
# python comment (will not run)

In [None]:
# types
'String Type'  # string type
3.41           # float type
1              # int type
True           # bool type
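
To check which type a value has, Python's built-in `type()` and `isinstance()` functions can be used. The following is a small sketch; the variable names `values` and `type_names` are illustrative and not part of the cell above.

```python
# Inspect the type of each literal with the built-in type() function.
values = ['String Type', 3.41, 1, True]
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['str', 'float', 'int', 'bool']

# isinstance() tests a value against a type.
# Note: bool is a subclass of int, so True also counts as an int.
print(isinstance(3.41, float))  # True
print(isinstance(True, int))    # True
```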

In [None]:
# define variables/objects
# Numbers
age = 25                    # int
price = 19.99              # float
coordinate = 2 + 3j        # complex

# Text
name = "Alice"             # str

# Boolean
is_student = True          # bool

# None
result = None              # NoneType

In [None]:
# Collections
scores = [85, 92, 78]      # list
print(scores)              # a bare name is only displayed when it is the last line of a cell
person = {'name': ['Bob'], 'age': [30]}  # dict
person
# coordinates = (10, 20)     # tuple
# unique_ids = {1, 2, 3}     # set
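
The four collection types above are accessed in slightly different ways. The sketch below is illustrative only: the `person` dict is simplified to hold plain values rather than lists, and the helper names (`first_score`, `x`, `y`) are assumptions.

```python
scores = [85, 92, 78]                 # list: ordered, mutable
person = {'name': 'Bob', 'age': 30}   # dict: key -> value lookup
coordinates = (10, 20)                # tuple: ordered, immutable
unique_ids = {1, 2, 2, 3}             # set: duplicates are dropped

first_score = scores[0]               # lists are indexed from 0
scores.append(100)                    # lists can grow in place
age = person['age']                   # dicts are indexed by key
x, y = coordinates                    # tuples can be unpacked

print(first_score, scores)  # 85 [85, 92, 78, 100]
print(age)                  # 30
print(x, y)                 # 10 20
print(len(unique_ids))      # 3
```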

In [None]:
def foo():
    """
    This is a function docstring
    You can also use:
    ''' Function Docstring '''
    """
    print('Hello world!')
foo()
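
Functions usually take arguments and return a value rather than only printing. A minimal sketch, where the function `greet` and its parameters are hypothetical names chosen for illustration:

```python
def greet(name, greeting='Hello'):
    """Return a greeting for `name`; `greeting` has a default value."""
    return f'{greeting}, {name}!'

message = greet('world')               # uses the default greeting
print(message)                         # Hello, world!
print(greet('Alice', greeting='Hi'))   # Hi, Alice!
print(greet.__doc__)                   # the docstring is stored on the function
```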


## Packages

Built-in functions in Python only do so much. You generally need Python packages to do interesting analysis.
To use a package, you may first need to install it into your local Python distribution.
This can be done using `pip install` or `conda install` on the command line.
In a Jupyter notebook, you can access the command line by preceding your command with `!`.

In [None]:
!pip install pandas
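
Before installing, it can be handy to check whether a package is already available. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical. Separately, recent versions of IPython also provide a `%pip install` magic, which installs into the environment of the running kernel.

```python
import importlib.util

def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed('json'))                           # True: json ships with Python
print(is_installed('a_package_that_does_not_exist'))  # False
```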

You must `import` a package in order to use it in your code. 
You'll often see an `as` clause following an import statement; the name after `as` is known as an *alias*.
When a package has an alias, you reference the alias instead of the package name.
After you've imported a package, you can use its functions and classes.
Below, I define a pandas Series. 

In [None]:
import pandas as pd
import numpy as np

pd.Series([3])
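
Aliases work the same way for any module. As an illustration that needs no installation, here is a sketch using two standard-library modules; the alias `st` is an arbitrary choice, not a convention.

```python
import statistics as st          # alias: refer to the module as `st`
from math import sqrt            # import a single name directly

print(st.mean([1, 2, 3, 4]))     # 2.5
print(sqrt(16))                  # 4.0
```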

## Pandas and Numpy

### Why do we need these packages?

Numpy and Pandas are often used for analyzing data. 

Question: Why do we need these packages?
Python has a list type; can't we just store data as a bunch of lists?

Answer: In python, lists can store different types:

In [None]:
[True, '2', 3.0, 4]

This is very flexible, but it is also slow, particularly when you're dealing with a lot of data. In contrast, all elements of a NumPy array have the same type. This lets NumPy store the data compactly and apply operations to the whole array at once, which makes computation much faster.

In [None]:
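# print each array; a bare expression is only displayed when it is the last line of a cell
print(np.array([1, 2, 3]))         # integers
print(np.array([1., 2.2, 2.1]))    # floats
print(np.array(['s', 't', 'r']))   # strings
print(np.array([True, False]))     # booleans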
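
The single type of an array is stored in its `dtype` attribute. The sketch below, with illustrative variable names, shows how NumPy picks one type for all elements, including when the inputs are mixed, and how an operation applies to every element at once.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1., 2.2, 2.1])
mixed = np.array([1, 2.5])               # int and float together
as_text = np.array([True, '2', 3.0, 4])  # the mixed list from earlier

print(ints.dtype.kind)     # i (an integer type; exact width depends on the platform)
print(floats.dtype)        # float64
print(mixed.dtype)         # float64: the int was converted to float
print(as_text.dtype.kind)  # U: every element became a string

# Operations apply to the whole array at once, without a Python loop.
doubled = ints * 2
print(doubled)             # [2 4 6]
```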

Pandas objects are built on NumPy arrays and carry extra information: column labels and index (row) labels.
Pandas is a bit slower than NumPy, so analysis is sometimes done with NumPy directly instead of pandas.

Pandas objects generally come in two types: Series (one-dimensional, like vectors) and DataFrames (two-dimensional, like matrices).

In [None]:
pd.Series([2, 3, 1])

In [None]:
pd.DataFrame([[2, 3, 4], [5, 6, 7]], columns=['col1', 'col2', 'col3'])
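
To make the labels and the underlying NumPy data visible, here is a short sketch. The index labels `'a'`, `'b'`, `'c'` and the variable names are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, 1], index=['a', 'b', 'c'])  # custom row labels
df = pd.DataFrame([[2, 3, 4], [5, 6, 7]], columns=['col1', 'col2', 'col3'])

print(s['b'])                 # 3: look up a value by its label
print(type(s.to_numpy()))     # <class 'numpy.ndarray'>
print(df.shape)               # (2, 3): two rows, three columns
print(list(df.columns))       # ['col1', 'col2', 'col3']
print(list(df.index))         # [0, 1]: default row labels
print(type(df['col1']).__name__)  # Series: each column is a Series
```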

## Pandas introduction

Google Colab introduction to pandas: https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb#scrollTo=dPmpVM_8IoBO


In [None]:
# Create and populate a 5x2 NumPy array.
my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])

# Create a Python list that holds the names of the two columns.
my_column_names = ['temperature', 'activity']

# Create a DataFrame.
my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)

# Print the entire DataFrame
print(my_dataframe)

In [None]:
# The index of a DataFrame holds the row labels. Sometimes you can ignore these, and sometimes they matter.
my_dataframe.index = ['A', 'B', 'C', 'D', 'E']
my_dataframe

In [None]:
# Create a new column named adjusted.
my_dataframe["adjusted"] = my_dataframe["activity"] + 2

# Print the entire DataFrame
print(my_dataframe)

In [None]:
print("Rows #0, #1, and #2:")
print(my_dataframe.head(3), '\n')

print("Row #2:")
print(my_dataframe.iloc[[2]], '\n')

print("Rows #1, #2, and #3:")
print(my_dataframe[1:4], '\n')

print('Row B:')
print(my_dataframe.loc['B'], '\n')

print("Column 'temperature':")
print(my_dataframe['temperature'])
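
The selections above return different kinds of objects depending on how they are written. The sketch below rebuilds the same DataFrame so it runs on its own; the variable names `by_position`, `single_row`, `by_label`, and `one_value` are illustrative.

```python
import numpy as np
import pandas as pd

my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])
my_dataframe = pd.DataFrame(my_data, columns=['temperature', 'activity'],
                            index=['A', 'B', 'C', 'D', 'E'])

by_position = my_dataframe.iloc[[2]]   # list of positions -> DataFrame
single_row = my_dataframe.iloc[2]      # one position -> Series
by_label = my_dataframe.loc['B']       # one row label -> Series
one_value = my_dataframe.loc['B', 'activity']  # row label and column label

print(type(by_position).__name__)  # DataFrame
print(type(single_row).__name__)   # Series
print(by_label['temperature'])     # 10
print(one_value)                   # 7
```

In short, `iloc` selects by integer position and `loc` selects by label; passing a list keeps the result two-dimensional.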