# Python Basics
**by [Jason DeBacker](http://jasondebacker.com), August 2017**

This Jupyter Notebook is intended to introduce students to Python, with a particular focus on functionality that has direct application to economic modeling and econometrics.  This Notebook will cover built-in object types and some aspects of the Standard Library, as well as an introduction to NumPy and Pandas.


## Built-in Types
There are several data types built into Python.  These include:
* Numeric Types
    * int
    * float
    * complex
* Booleans
* Sequence Types
    * Strings
    * Lists
    * Tuples
    * Ranges
* Set Types
    * Sets
* Mapping Types
    * Dictionaries
    
### Numeric Types

In [154]:
# You do not need to declare the numeric type - Python will infer it from the value of the object
# You can always check the type of an object with type()
three = 3
type(three)

int

A few notes on the vocabulary of Python here.  The variable is `three`; it is the name for the object `3`.  A name is a reference to an object.  In this case, the name `three` references an object that is the number 3.

A "namespace" is thus the collection of names (i.e., variables) that point to objects.  In particular, a namespace in Python is a dictionary that maps each variable name to its value (the object).  More on dictionaries below.
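To make the idea of names and namespaces concrete, here is a minimal sketch.  The names `also_three`, `same_object`, `namespace`, and `looked_up` are made up for this illustration.

```python
three = 3
also_three = three  # bind a second name to the same object

# Both names reference one object, so the identity test is True
same_object = also_three is three

# locals() returns the current namespace as a dictionary; at the top level
# of a script or notebook it is the same dictionary that globals() returns
namespace = locals()
looked_up = namespace['three']

print(same_object)
print(looked_up)
```

Looking a name up in the namespace dictionary gives back the same object that the name references.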

In [3]:
# But you can cast an object as a certain type
three = float(3)
type(three)

float

In [4]:
# Even as a non-numeric type
three = str(3)
type(three)

str

Numeric types support most basic mathematical operations.  They mostly have obvious names or use standard symbols. For example:

In [6]:
x = 3
y = 5
# addition
print(x+y)

# subtraction
print(x-y)

# multiplication
print(x*y)

# division
print(x/y)

# absolute value
abs(x-y)

8
-2
15
0.6


2

One operation that uses a less obvious symbol is raising a number to a power, which is done with `**`.

In [7]:
# raising x to the y
x ** y

243
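A few other arithmetic operators are worth knowing alongside those above.  The sketch below reuses `x = 3` and `y = 5` and shows floor division, the remainder (modulo) operator, and the built-in `divmod()`, which returns both at once; the result names are made up for this illustration.

```python
x = 3
y = 5

floor_quotient = y // x   # floor division drops the fractional part
remainder = y % x         # modulo returns the remainder of the division
both = divmod(y, x)       # tuple of (floor quotient, remainder)

print(floor_quotient, remainder, both)
```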

## Booleans

Booleans are a special type of integer (in Python, `bool` is a subclass of `int`).  A Boolean takes the value `True` or `False`, which are equal to `1` and `0`, respectively.

In [8]:
# By setting the variable equal to True/False, Python will set it as the Boolean type
drop_out = True
drop_out

True

In [9]:
# Test if drop_out equals 1
drop_out == 1

True

In [10]:
drop_out == 0

False
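Because Booleans behave like the integers 1 and 0, they can be summed to count observations.  The sketch below uses a made-up list of drop-out indicators (the data are hypothetical) to compute a count and a share.

```python
# Hypothetical data: whether each of four students dropped out
dropped = [True, False, True, True]

is_int_subclass = isinstance(True, int)       # bool is a subclass of int
n_dropped = sum(dropped)                      # True counts as 1, False as 0
share_dropped = sum(dropped) / len(dropped)   # fraction of True values

print(is_int_subclass, n_dropped, share_dropped)
```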

## Sequence types

### Strings

Strings are handled as sequences of characters.  That may seem odd, so let's look at what it means.

In [11]:
# A string must be enclosed in single or double quotes
name = 'Jason'
name

'Jason'

In [13]:
type(name)

str

#### Indexing

Indexing refers to selecting an element (or "slice") of a sequence by referencing its index.  It's important to remember that Python uses 0-indexing, so the index of the first element in a sequence is 0.

We can select one or more elements from any sequence, including strings, by referencing the index.

In [14]:
print('First letter = ', name[0])

First letter =  J


In [16]:
# You can use a colon to return a slice of a sequence based on index values (remember the 0-indexing!)
print('First 3 letters = ', name[0:3])

First 3 letters =  Jas


But notice that `0:3` doesn't pull the first four letters.  This notation says to select from the element with index 0 up to (but not including) the element with index 3.

Furthermore, if you leave the left side of the colon empty, that means take the elements from the first up to (but not including) the index on the right side of the colon.

In [17]:
name[:3]

'Jas'

In [18]:
# And to get from the element with index 3 to the last, do
name[3:]

'on'

And a colon on its own takes all elements from the sequence (you wouldn't typically use this with a one-dimensional object like a string, but it's useful for slicing multi-dimensional arrays).

In [19]:
name[:]

'Jason'
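Slices also accept negative indices, which count from the end of the sequence, and an optional third value for the step size.  A short sketch using the same string (the result names are made up for this illustration):

```python
name = 'Jason'

last_letter = name[-1]    # index -1 is the last element
last_three = name[-3:]    # from the third-to-last element to the end
every_other = name[::2]   # every second element, starting at index 0

print(last_letter, last_three, every_other)
```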

There are many built-in operations and methods that can be applied to strings.  Some examples:

In [21]:
name + ' DeBacker'

'Jason DeBacker'

In [22]:
name * 2

'JasonJason'

In [25]:
name.upper()

'JASON'


In [45]:
name.lower()

'jason'
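A few more string methods come up often when cleaning data.  The sketch below shows `split()`, `join()`, `replace()`, and an f-string; the variable names are made up for this illustration.

```python
full_name = 'Jason DeBacker'

parts = full_name.split()                    # split on whitespace into a list
rejoined = ', '.join(parts[::-1])            # join the reversed list with ', '
swapped = full_name.replace('Jason', 'J.')   # returns a new string
greeting = f'{parts[0]} has {len(parts[0])} letters'  # f-string formatting

print(parts)
print(rejoined)
print(swapped)
print(greeting)
```

Strings are immutable, so each of these methods returns a new string (or list) and leaves `full_name` unchanged.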

### Lists

Lists are ordered collections of objects.  A list can be composed of objects that are all the same type or of objects of different types, but typically lists contain objects of similar types.  One feature of lists is that they are "mutable", which means that you can change the object (in this case a list) after it's been created.

In [32]:
# To define a list, use square brackets
num_list = [1, 2, 3]
num_list

[1, 2, 3]

In [34]:
# Lists can repeat objects that have the same value
num_list2 = [1, 2, 3, 2]
num_list2

[1, 2, 3, 2]

In [35]:
# You can reference elements of a list and slice lists just as we did with strings
num_list2[3]

2

In [37]:
num_list2[:3]

[1, 2, 3]

In [39]:
# You can also easily iterate over a list
for item in num_list:
    print(item)

1
2
3


In [40]:
# In fact you can do this with any sequence, including strings
for item in name:
    print(item)

J
a
s
o
n
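Two common variations on these loops are list comprehensions, which build a new list in one line, and `enumerate()`, which yields each element together with its index.  A minimal sketch (the names `squares` and `pairs` are made up for this illustration):

```python
num_list = [1, 2, 3]

# A list comprehension builds a new list from an existing sequence
squares = [item ** 2 for item in num_list]

# enumerate() yields (index, element) pairs while iterating
pairs = []
for index, item in enumerate(num_list):
    pairs.append((index, item))

print(squares)
print(pairs)
```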


There are a bunch of operations you can do with lists.  A few examples:

In [41]:
# addition appends lists together
num_list + num_list2

[1, 2, 3, 1, 2, 3, 2]

In [43]:
# multiplication repeats the list n times in one list
num_list * 2

[1, 2, 3, 1, 2, 3]

In [44]:
# Reversing a list
num_list[::-1]

[3, 2, 1]

In [46]:
# Deleting an element by index
del(num_list[1])
num_list

[1, 3]

In [49]:
# Deleting an element by value (deletes the first instance of that value in the list)
num_list2.remove(2)
num_list2

[1, 3, 2]

In [52]:
# showing how a list is mutable
# we can change the 3rd element in num_list2 through the following assignment
num_list2[2] = 99
num_list2

[1, 3, 99]
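One consequence of mutability is that two names can reference the same list, so a change made through one name shows up under the other.  The sketch below contrasts that with an explicit copy; the names `original`, `alias`, and `independent` are made up for this illustration.

```python
original = [1, 3, 99]

alias = original       # a second name for the SAME list object
alias.append(4)        # the change is visible through both names

independent = original.copy()   # a new list with the same elements
independent.append(5)           # does not affect original

print(original)
print(independent)
```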

### Tuples

Tuples are ordered collections of objects.  They can contain objects of the same or different types, and they more often contain distinct types than lists do.  The major difference between tuples and lists is that tuples are "immutable": they cannot be changed after they are created.

In [53]:
# A tuple is defined by placing a comma-separated sequence of objects in parentheses
num_tuple = (1, 2, 3)
num_tuple


(1, 2, 3)

In [54]:
# Showing that a tuple is immutable
# Try to change the 3rd element of the tuple
num_tuple[2] = 99
num_tuple

TypeError: 'tuple' object does not support item assignment

In [55]:
# But indexing and slicing works the same as we saw with other sequences
num_tuple[2]

3

In [56]:
num_tuple[:1]

(1,)

In [57]:
# and iterating over the sequence
for item in num_tuple:
    print(item)

1
2
3


In [58]:
# appending tuples works through addition
num_tuple + (4, 5)

(1, 2, 3, 4, 5)

In [59]:
# But deleting and removing elements don't - remember, immutable!
del(num_tuple[2])

TypeError: 'tuple' object doesn't support item deletion

In [60]:
num_tuple.remove(2)

AttributeError: 'tuple' object has no attribute 'remove'
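Tuples are often used to return several values from a function and to "unpack" them into separate names.  Because they are immutable, they can also serve as dictionary keys.  The function `min_and_max` and the dictionary `wins_by_school_year` below are hypothetical, written only for this sketch.

```python
def min_and_max(values):
    """Return the smallest and largest value as a tuple (hypothetical helper)."""
    return min(values), max(values)

# Unpack the returned tuple into two names
low, high = min_and_max([6, 5, 5, 10, 8])

# Tuples are immutable, so they can be used as dictionary keys
wins_by_school_year = {('UGA', 2015): 10}

print(low, high)
print(wins_by_school_year[('UGA', 2015)])
```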

### Ranges

The last sequence type we'll discuss is the range.  Ranges are immutable sequences of numbers, typically used in `for` loops.

In [61]:
range(3)

range(0, 3)

In [64]:
# notice how the range starts at 0 by default and range(n) contains n elements
for i in range(3):
    print(i)

0
1
2


In [65]:
# but range() has other arguments, so we can start at a value other than zero
# range(x, y) will start at x and go up to (but not including) y
for i in range(10, 15):
    print(i)

10
11
12
13
14


In [71]:
# and range() can accept a third argument, that is the step size
# so if we want to count down from 15 to 10 (but not including 10), we can do so with a step size = -1
for i in range(15, 10, -1):
    print(i)

15
14
13
12
11
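A range does not store its elements until they are needed, which is why `range(3)` displays as `range(0, 3)`.  To see the values, you can convert the range to a list.  A short sketch (the result names are made up for this illustration):

```python
evens = list(range(0, 10, 2))       # convert a range to a list to see its values
n_steps = len(range(15, 10, -1))    # ranges support len() like other sequences
contains_12 = 12 in range(10, 15)   # and membership tests

print(evens, n_steps, contains_12)
```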


### Sets

A set is a collection of *unordered* and *unique* objects.  Because sets are unordered, they do not support indexing or slicing as the sequence types do.  Sets are mutable (although there is another object type, a frozenset, that is like a set but is immutable).  These properties make sets very useful for testing membership or finding the unique values in a collection.

Sets are defined with curly brackets.

In [73]:
num_set = {1, 2, 3}
num_set

{1, 2, 3}

In [79]:
# Look what happens when values are repeated and numbers entered not in order
num_set2 = {5, 1, 2, 3, 2, 3}
num_set2

{1, 2, 3, 5}

In [80]:
# since they are mutable, we can remove elements from sets
num_set2.remove(2)
num_set2

{1, 3, 5}

In [81]:
# because they contain only unique values, you can do some membership testing
# e.g.
5 in num_set2

True
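Sets also support the usual set operations, and `set()` is a quick way to find the unique values in a list.  A minimal sketch with made-up values:

```python
a = {1, 2, 3}
b = {1, 3, 5}

union = a | b          # elements in either set
intersection = a & b   # elements in both sets
difference = a - b     # elements in a but not in b

# Converting a list to a set keeps only the unique values
unique_schools = set(['Texas', 'Texas', 'UGA'])

print(union, intersection, difference, unique_schools)
```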

### Dictionaries

Dictionaries contain key-value pairs.  Each key points to an associated value.  Typically keys are strings, but they can be numeric or other immutable (hashable) types; lists, dictionaries, and other mutable types cannot be keys.  Values can be any type.  Dictionaries are thus very useful for mapping a key word to a value.

Dictionaries are created by placing a comma-separated list of key: value pairs within curly braces, for example:

In [83]:
address_dict = {'street': 'Green St.', 'number': 1014, 'city': 'Columbia', 'state': 'SC'}
address_dict

{'city': 'Columbia', 'number': 1014, 'state': 'SC', 'street': 'Green St.'}

In [85]:
# you reference a particular element of a dictionary by indexing with the key
address_dict['city']

'Columbia'

In [89]:
# you can iterate over the keys, the values, or both
# to get both, note the .items() method
for key,value in address_dict.items():
    print('Key = ', key)
    print('Value = ', value)

Key =  street
Value =  Green St.
Key =  number
Value =  1014
Key =  city
Value =  Columbia
Key =  state
Value =  SC


In [94]:
# iterating only over the keys
# note the .keys() method
for key in address_dict.keys():
    print(key)

street
number
city
state


In [95]:
# iterating only over the values
# note the .values() method
for value in address_dict.values():
    print(value)

Green St.
1014
Columbia
SC


In [96]:
# just to show that the loop variable can have any name - it doesn't need to be called key or value
for xyz in address_dict.values():
    print(xyz)

Green St.
1014
Columbia
SC
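Dictionaries are mutable, so keys can be added, tested for, and removed after creation.  The sketch below reuses the address dictionary; the `'country'` entry and the `'unknown'` default are added only for illustration.

```python
address_dict = {'street': 'Green St.', 'number': 1014,
                'city': 'Columbia', 'state': 'SC'}

# Add a new key-value pair by assignment (illustrative entry)
address_dict['country'] = 'USA'

# Test whether a key exists
has_zip = 'zip' in address_dict

# .get() returns a default instead of raising KeyError for a missing key
zip_code = address_dict.get('zip', 'unknown')

# Remove a key-value pair
del address_dict['country']

print(has_zip, zip_code, list(address_dict.keys()))
```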


## NumPy

NumPy is a Python package that is important for economics applications and scientific computing in general.  It allows you to define N-dimensional arrays, use sophisticated (broadcasting) functions, more easily integrate C/C++ and Fortran code, and provides useful linear algebra, Fourier transform, and random number capabilities.  In this sense, you can think of it as bringing the standard Matlab functionality into Python.

NumPy is a Python package; it's not part of the standard library or Python itself.  Since you installed the Anaconda distribution of Python, you installed Python along with a number of packages, including NumPy.  However, if we want to have access to the functionality of NumPy, we need to import it into our Python session.

To do this:

In [99]:
# note that "as" provides an alias for referencing numpy - so instead of typing numpy, we can just type np
import numpy as np

Now NumPy is available in the IPython session running this notebook!

Let's start using NumPy by creating a NumPy array.  You can think of an array as a list of lists.  E.g., a 2-D matrix can be thought of as a list of rows, where each row is a list of elements (the items in each column in that row).  In fact, you could just make a list of lists with the built-in list type rather than using a NumPy array.  But then matrix operations would involve a lot of (slow) loops.  NumPy arrays allow you to do matrix operations with NumPy functions that leverage compiled, optimized code so that they run more quickly.

But since an array can be thought of as a list of lists and lists are defined using square brackets, we'll use similar syntax to define a NumPy array:

In [101]:
A = np.array([[1, 2], [3, 4]])
A

array([[1, 2],
       [3, 4]])

In [102]:
# what is the shape of A?
A.shape

(2, 2)

We can do a whole bunch of mathematical operations on these arrays.  By default, these are performed element by element.

For example:

In [103]:
A + 2

array([[3, 4],
       [5, 6]])

In [104]:
A * 2

array([[2, 4],
       [6, 8]])

In [105]:
A / 2

array([[ 0.5,  1. ],
       [ 1.5,  2. ]])

In [106]:
A ** 2

array([[ 1,  4],
       [ 9, 16]])

If we want to do matrix operations, we need to call NumPy functions (or, for matrix multiplication, use the `@` operator).

In [107]:
# perform matrix multiplication (np.dot of two 2-D arrays is the matrix product)
np.dot(A, A)

array([[ 7, 10],
       [15, 22]])

In [108]:
# perform a Kronecker product
np.kron(A, A)

array([[ 1,  2,  2,  4],
       [ 3,  4,  6,  8],
       [ 3,  6,  4,  8],
       [ 9, 12, 12, 16]])
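A few other matrix operations are common in economic applications.  The sketch below shows the `@` operator (matrix multiplication, equivalent to `np.dot` for 2-D arrays), the transpose, and the inverse from `np.linalg`; the result names are made up for this illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

product = A @ A                # matrix multiplication, same as np.dot(A, A)
transpose = A.T                # transpose of A
inverse = np.linalg.inv(A)     # matrix inverse (A must be non-singular)
identity_check = A @ inverse   # should be (numerically) the identity matrix

print(product)
print(transpose)
print(inverse)
print(np.allclose(identity_check, np.eye(2)))
```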

### Broadcasting arrays

It's important to understand how NumPy works when you try to do operations on arrays that aren't the same shape.  NumPy compares the shapes of the two arrays dimension by dimension, starting with the last (trailing) dimension and working its way forward.  The operation will work if, for each pair of dimensions:
* the two dimensions are the same size, or
* one of the two dimensions has size 1 (it is then stretched to match the other)

If one array has fewer dimensions than the other, its shape is treated as if 1's were prepended to it.

In [109]:
# Some examples
B = np.array([2, 5])
# this multiplication works because B has shape (2,), which matches
# the last dimension of A; B is then applied to each row of A
A * B

array([[ 2, 10],
       [ 6, 20]])

In [127]:
# Define a 1-D array of length 3
C = np.array([1, 2, 1])
print(A.shape, C.shape)
# Multiplication of a 2x2 array with a length-3 array won't work
A * C

(2, 2) (3,)


ValueError: operands could not be broadcast together with shapes (2,2) (3,) 
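Reshaping one of the arrays so that it has a dimension of size 1 is the usual way to control how broadcasting lines things up.  A minimal sketch, assuming the same `A`, `B`, and `C` as above (the names `B_col`, `by_column`, `by_row`, and `outer` are made up for this illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([2, 5])
C = np.array([1, 2, 1])

# Shape (2,) lines up with the columns of A: each column is scaled
by_column = A * B

# Reshaping B to (2, 1) lines it up with the rows of A: each row is scaled
B_col = B.reshape((2, 1))
by_row = A * B_col

# Shapes (3, 1) and (2,) are compatible and broadcast to (3, 2)
outer = C.reshape((3, 1)) * B

print(by_column)
print(by_row)
print(outer.shape)
```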

In [113]:
# NumPy allows you to easily make arrays of 1's, 0's, or the identity matrix
np.ones(2)

array([ 1.,  1.])

In [114]:
np.zeros(2)

array([ 0.,  0.])

In [116]:
np.eye(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [117]:
# You can also reshape arrays
# Here we reshape the 2x2 array A into a vector of length 4
A.reshape((4,))

array([1, 2, 3, 4])

In [124]:
# And you can tile arrays in other dimensions
# Note that reshape() returns a new array and leaves A unchanged, so neither the
# reshape above nor the one below changes the shape of A
A.reshape((2, 2, 1))
print(A.shape)
# With 3 repetition counts, np.tile treats the 2x2 array A as 1x2x2
# and repeats it 4 times along the last dimension
D = np.tile(A, (1, 1, 4))
D.shape

(2, 2)


(1, 2, 8)

In [119]:
D

array([[[1, 2, 1, 2, 1, 2, 1, 2],
        [3, 4, 3, 4, 3, 4, 3, 4]]])
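Because `reshape` returns a new array rather than modifying `A` in place, the reshaped array has to be assigned to a name before it is tiled.  A minimal sketch, assuming the goal is to stack four copies of the 2x2 matrix along a third dimension (the name `A3` is introduced here for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# Assign the reshaped array to a new name; A itself keeps shape (2, 2)
A3 = A.reshape((2, 2, 1))

# Repeat the 2x2x1 array 4 times along the third dimension
D = np.tile(A3, (1, 1, 4))

print(A.shape, A3.shape, D.shape)
# Each slice along the third dimension is a copy of A
print(np.array_equal(D[:, :, 0], A))
```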

## Pandas

Pandas is a Python package that excels at data analysis.  The name references its initial use for **Pan**el **Da**ta, and it's certainly excellent for that and for time series, but it also works well for cross-sectional data.  The main object you'll work with in Pandas is the DataFrame.

### Dataframes
A DataFrame is a 2-dimensional object (so you can think of it as a table, or as your view of data in Stata).  Typically, you'll have rows representing observations and columns representing variables.

#### Indexing
Specific rows can be referenced by an index value for each row.  Pandas is very flexible here: indices can be numeric or strings.  In addition, you can have DataFrames with multiple index values, or hierarchical indices.  This advanced indexing allows you to represent high-dimensional data in the 2-D DataFrame.


Let's look at a simple dataframe and how we can manipulate it.

In [129]:
# import the pandas package and assign an alias
import pandas as pd

# create a dictionary with some data
data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}
# create a DataFrame from the dictionary 
frame = pd.DataFrame(data)
frame

  school  wins  year
0  Texas     6  2014
1  Texas     5  2015
2  Texas     5  2016
3    UGA    10  2015
4    UGA     8  2016


In [130]:
# Show the column names
frame.columns

Index(['school', 'wins', 'year'], dtype='object')

In [131]:
# Another way to list column names
list(frame)

['school', 'wins', 'year']

In [132]:
# To reference a column of the data frame, use the column name
frame['school']

0    Texas
1    Texas
2    Texas
3      UGA
4      UGA
Name: school, dtype: object

In [139]:
# You can reference a row through the index value
# note the .loc indexer, which locates a record by its index label
frame.loc[0]

school    Texas
wins          6
year       2014
Name: 0, dtype: object

In [137]:
# You can also use .loc to reference a specific cell by using the column name as the second argument
frame.loc[0,'school']

'Texas'

In [140]:
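# and you can take slices of dataframes
# (unlike ordinary Python slices, .loc slices include the end label, so :2 returns rows 0, 1, and 2)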
frame.loc[:2,'school']

0    Texas
1    Texas
2    Texas
Name: school, dtype: object
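The `.loc` indexer selects by index label, while the related `.iloc` indexer selects by integer position and follows the usual Python slicing rules.  The sketch below rebuilds the same DataFrame so that it can run on its own; the result names are made up for this illustration.

```python
import pandas as pd

data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}
frame = pd.DataFrame(data)

# .loc slices by label and includes the end label (rows 0, 1, and 2)
by_label = frame.loc[:2, 'school']

# .iloc slices by position and excludes the end position (rows 0 and 1)
by_position = frame.iloc[:2]

print(len(by_label), len(by_position))
```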

In [142]:
# another key functionality is the "groupby"
# here we'll sum the number of wins by school
frame.groupby(['school'])['wins'].sum()

school
Texas    16
UGA      18
Name: wins, dtype: int64
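A groupby can also compute several statistics at once with the `agg()` method.  A minimal sketch, rebuilding the same DataFrame so the block is self-contained (the name `summary` is made up for this illustration):

```python
import pandas as pd

data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}
frame = pd.DataFrame(data)

# Mean and maximum number of wins for each school
summary = frame.groupby('school')['wins'].agg(['mean', 'max'])
print(summary)
```

The result is a DataFrame indexed by school, with one column per statistic.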

In [143]:
# summary stats can be seen by using the describe() method
frame.describe()

           wins     year
count       5.0      5.0
mean        6.8   2015.2
std    2.167948  0.83666
min         5.0   2014.0
25%         5.0   2015.0
50%         6.0   2015.0
75%         8.0   2016.0
max        10.0   2016.0


DataFrames allow many of the NumPy operations.

For example, we can use mathematical operations on columns.  Let's do that and create a column with the number of wins squared.

In [145]:
frame['wins squared'] = frame['wins'] ** 2
frame

  school  wins  year  wins squared
0  Texas     6  2014            36
1  Texas     5  2015            25
2  Texas     5  2016            25
3    UGA    10  2015           100
4    UGA     8  2016            64
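Columns also work with comparison operators and NumPy functions, element by element.  The sketch below filters rows with a Boolean condition and applies `np.log` to a column; the threshold of 6 wins and the column name `'log wins'` are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}
frame = pd.DataFrame(data)

# A comparison on a column gives a Boolean Series, which can select rows
at_least_six = frame[frame['wins'] >= 6]

# NumPy functions apply element by element to a column
frame['log wins'] = np.log(frame['wins'])

print(at_least_six)
print(frame)
```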


For large DataFrames you don't always want to look at the whole thing, but you do want to see a sample of its rows.  The `head()` method is useful here.

In [146]:
# this prints the column names and the first two rows
frame.head(n=2)

  school  wins  year  wins squared
0  Texas     6  2014            36
1  Texas     5  2015            25


In [153]:
# renaming columns
# note the argument "inplace=True" - this modifies the existing dataframe rather than returning a renamed copy
frame.rename(columns={"wins": "Wins", "wins squared": "win sq"}, inplace=True)
frame

  school  Wins  year  win sq
0  Texas     6  2014      36
1  Texas     5  2015      25
2  Texas     5  2016      25
3    UGA    10  2015     100
4    UGA     8  2016      64
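Returning to the hierarchical indices mentioned in the Indexing section, here is a minimal sketch of a two-level index on school and year.  It rebuilds the original data so the block runs on its own; the names `indexed`, `uga_2015`, and `texas` are made up for this illustration.

```python
import pandas as pd

data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}
frame = pd.DataFrame(data)

# Use school and year together as a hierarchical (multi-level) index
indexed = frame.set_index(['school', 'year'])

# Select a single cell with a tuple of index values
uga_2015 = indexed.loc[('UGA', 2015), 'wins']

# Selecting on the first level returns the rows for that school, indexed by year
texas = indexed.loc['Texas']

print(uga_2015)
print(texas)
```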
