## Jupyter Notebook Basics

- this is a cell!
- there are two modes to this cell
    - edit mode - when you are typing in the cell
    - command mode - hit escape to go into this mode
- you can write Python code here. You can also write Markdown here. But not both together :(
- if you write code, the value of the last line is displayed automatically
- see the brackets to the left of the cell? If they are empty, it means this cell has not been run yet!
- if they have an asterisk in them, the cell is running!
- if they have a number, the cell has been run. The number goes up by one every time you run any cell, so it shows the order in which cells were run
- markdown cells do not get these numbers
- if you are ever stuck or the notebook is not behaving, you can `Restart Kernel` (this clears all variables, so re-run the cells you need)
- to run a cell, use the menu on top. You can also run a cell by using `shift + enter`
- you can insert a cell above or below the current cell by using the toolbar menu or by using the following shortcuts in command mode
   - A - insert above
   - B - insert below
   - DD - delete cell


In [None]:
# simply click on the cell and write some code
34+8

In [None]:
# you can install packages from PyPI as usual.
!pip install pyjokes

In [None]:
# you can now use the package
import pyjokes

print(pyjokes.get_joke())

## Some Python if you need a quick refresher

Let's look at some basic data structures in Python

   - str (string, immutable)
   - list (mutable, order maintained)
   - tuple (immutable, order maintained)
   - set (unordered, unique)
   - dict (dictionary; mutable, key-value pairs)
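
The cells in this section demonstrate strings, lists, and dictionaries, but not tuples or sets. Here is a minimal sketch of those two, using made-up sample values (`point` and `visited` are illustrative names, not part of the original notebook):

```python
# A tuple is immutable: you can read items but not reassign them.
point = ("CA", "California")      # illustrative sample values
abb, name = point                 # tuple unpacking
try:
    point[0] = "CO"               # raises TypeError
    changed = True
except TypeError:
    changed = False
print('tuple changed?', changed)

# A set keeps only unique items and has no guaranteed order.
visited = {"AL", "AK", "AL", "AZ"}    # the duplicate "AL" is dropped
print('unique items:', len(visited))
print('is AK in the set?', "AK" in visited)
visited.add("AR")
print('sorted set:', sorted(visited))
```

Because a set is unordered, `sorted()` is used above to get a predictable listing.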

    


In [None]:
#you can also write python functions
def greetings(name):
    if name:
        print('hello', name, '!')
    else:
        print('hello there!')

In [None]:
#you can call that function from any other cell once it has been declared and defined
greetings('upkar')

In [None]:
#this is a string (str)
hackathon = "spectra"
print('length of the string:', len(hackathon))
print('first character of the string:', hackathon[0])
print('last character of the string:', hackathon[-1])
print('capitalize string: ', hackathon.capitalize())
print('upper case string: ', hackathon.upper())
print('find tra in the string: ', hackathon.find('tra')) #start counting from 0
#Many more methods available. Read the docs - https://docs.python.org/3/library/stdtypes.html#string-methods

In [None]:
#this is a list
states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida"]
print('number of states:', len(states))
print('first state: ', states[0])
print('last state: ', states[-1])
print('list contains California?', 'California' in states)
print('where is California in the list?', states.index('California'))
print('list first three states:', states[:3])
print('list last three states:', states[-3:])

#more advanced
print('filter list to only states starting with C:', list(filter(lambda x: x.startswith('C'),states)))
print('filter list to only states starting with C:', [state for state in states if state.startswith('C')])
#Many more methods available. Read the docs - https://docs.python.org/3/tutorial/datastructures.html
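
The list cell above only reads from the list. Since lists are mutable, here is a small sketch of the common in-place operations, using a made-up `cities` list rather than the notebook's data:

```python
# Lists are mutable, so these calls change the list in place.
cities = ["Denver", "Austin", "Boston"]   # illustrative sample data
cities.append("Chicago")                  # add to the end
cities.insert(0, "Seattle")               # add at a given position
cities.remove("Austin")                   # delete the first matching value
cities.sort()                             # sort in place (returns None)
print(cities)
```

Note that `sort()` returns `None`; use `sorted(cities)` if you want a new list and need the original left alone.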

In [None]:
#this is a dictionary. Since Python 3.7 it keeps insertion order, but you look items up by key, not by position
states = {
'AL': 'Alabama',
'AK': 'Alaska',
'AZ': 'Arizona',
'AR': 'Arkansas',
'CA' : 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DE': 'Delaware',
'FL': 'Florida'
}
print('keys in states:', states.keys())
print('values in states:', states.values())
print('# states in dict:', len(states))
print('find ca:', states['CA'])
print('is FL in the dict keys:', 'FL' in states)
print('is Florida in the dict values: ', 'Florida' in states.values())
#Many more methods available. Read the docs - https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries
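
A few more dictionary operations that come up often, sketched with a made-up `capitals` dictionary (not part of the original notebook):

```python
capitals = {'CA': 'Sacramento', 'CO': 'Denver'}   # illustrative sample data

# .get() returns a default instead of raising KeyError for a missing key
missing = capitals.get('TX', 'not found')
print('lookup missing key:', missing)

# dictionaries are mutable: add or overwrite by assigning to a key
capitals['FL'] = 'Tallahassee'

# .items() yields (key, value) pairs in insertion order (Python 3.7+)
for abb, city in capitals.items():
    print(abb, '->', city)
```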

## Data structures in Pandas - Series

In [None]:
import pandas as pd
import numpy as np

In [None]:
#This is a series. Think of this as a list with an index!
states = pd.Series(["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida"])
states

In [None]:
#you can give the data your own index!
states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida"]
abb = ['AL','AK','AZ','AR','CA','CO','CT','DE','FL']
states = pd.Series(data=states, index=abb)
states


In [None]:
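#slice and dice. Use .iloc for positions; plain states[0] on a string index is deprecated in recent pandas
print('first state:', states.iloc[0])
print('first index:', states.index[0])
print('\n\n')
print('first three states:')
print(states.iloc[0:3])
print('\n\n')
print('last three states:')
print(states.iloc[-3:])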
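
A Series with a custom index can be read either by label or by position. The sketch below shows the difference on a shortened, illustrative copy of the states data (`states_demo` is a made-up name):

```python
import pandas as pd

# shortened sample data with a custom string index
states_demo = pd.Series(["Alabama", "Alaska", "Arizona"], index=["AL", "AK", "AZ"])

by_label = states_demo.loc["AK"]          # label-based lookup
by_position = states_demo.iloc[1]         # position-based lookup
label_slice = states_demo.loc["AL":"AK"]  # label slices include the end label
pos_slice = states_demo.iloc[0:1]         # position slices exclude the end
print(by_label, by_position, len(label_slice), len(pos_slice))
```

The slice lengths differ because `.loc` includes the end label while `.iloc` follows the usual Python rule of excluding the end position.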

In [None]:
#you can create series out of comprehensions and anything else that returns an iterable
fours = pd.Series([x**4 for x in range(6)])
fours

In [None]:
#numpy has a wonderful random number generator that is useful for creating dummy data. np.random.rand(5) returns a numpy array of 5 floats in [0, 1)
rand = pd.Series(np.random.rand(5))
rand
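
Random dummy data changes on every run, which can be confusing when you re-run cells. One way to get repeatable numbers is a seeded generator, sketched below with NumPy's `default_rng` (the seed value 42 is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary
rand_a = pd.Series(rng.random(5))      # 5 floats in [0, 1)

rng = np.random.default_rng(seed=42)   # same seed, so the same numbers again
rand_b = pd.Series(rng.random(5))

print(rand_a.equals(rand_b))
```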


## Data structures in Pandas - Dataframes

In [None]:
#a dataframe is like a table: each dictionary key becomes a column
state_data = {
    'abb':['AL','AK','AZ','AR','CA','CO','CT','DE','FL'],
    'name':["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida"],
    'temp': np.random.rand(9)
}
df_state = pd.DataFrame(state_data)
df_state
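
Once a dataframe exists, the usual next step is pulling out columns and rows. A minimal sketch on a small stand-in table (`df_demo` and its `temp` values are made up for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'abb': ['AL', 'AK', 'AZ'],
    'name': ['Alabama', 'Alaska', 'Arizona'],
    'temp': [0.2, 0.9, 0.5],            # made-up values
})

names = df_demo['name']                 # one column -> Series
subset = df_demo[['abb', 'temp']]       # list of columns -> DataFrame
first_row = df_demo.iloc[0]             # row by position -> Series
warm = df_demo[df_demo['temp'] > 0.4]   # boolean filter on rows
print(warm)
```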

In [None]:
df = pd.DataFrame(data = np.random.randn(2,2), columns=['Temp', 'Humidity'])
df

In [None]:
#the very cool thing about dataframes is that you can operate on them as a single unit. NO NEED TO ITERATE EXPLICITLY!
df = df * 10
df

In [None]:
#you can use the apply method to perform a more complex operation.
#remove the comment below and press shift+tab+tab+tab inside the apply brackets to bring up the documentation. Yes, three tabs. See what each one does!

#df.apply()

In [None]:
def add100(x):
    return x + 100

In [None]:
#let's apply this function to the dataframe
df.apply(add100)

In [None]:
#notice that the original dataframe does not change. This can get tricky.
#If you want to change the df, assign the result back to it (df = df.apply(add100)). Some other methods also accept inplace=True to modify the object in place
df
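
To make the "assign it back" point concrete, here is a small sketch with fixed values instead of random ones (`df_demo` is an illustrative stand-in for `df`):

```python
import pandas as pd

df_demo = pd.DataFrame({'Temp': [1.0, 2.0], 'Humidity': [3.0, 4.0]})  # made-up values

def add100(x):
    return x + 100

result = df_demo.apply(add100)        # returns a new dataframe
before = df_demo['Temp'].tolist()     # the original is still unchanged here
df_demo = df_demo.apply(add100)       # assign back to keep the change
after = df_demo['Temp'].tolist()
print(before, after)
```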

In [None]:
#you can also use lambda or anonymous functions to do the same
df.apply(lambda x: x+100)
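
By default `apply` passes each column to the function as a Series; with `axis=1` it passes each row instead. A minimal sketch of both, again on made-up values:

```python
import pandas as pd

df_demo = pd.DataFrame({'Temp': [1.0, 2.0], 'Humidity': [3.0, 4.0]})  # made-up values

# axis=0 (the default): the function receives one column at a time
col_max = df_demo.apply(lambda col: col.max())

# axis=1: the function receives one row at a time
row_sum = df_demo.apply(lambda row: row.sum(), axis=1)

print(col_max)
print(row_sum)
```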

## How to get help?

In [None]:
import pandas as pd

In [None]:
pd?

In [None]:
# shift + tab
# shift + tab + tab
# shift + tab + tab + tab
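
The `?` suffix and the shift + tab popups are Jupyter/IPython features. In plain Python the same documentation is available through the built-in `help()` and the `__doc__` attribute, as this short sketch shows:

```python
# help() works in any Python session, not only in Jupyter
help(len)

# every documented object also exposes its docstring as plain text
doc = str.upper.__doc__
print(doc)
```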