# Two-hour Python test-drive

Questions:

1. How do I run a python environment? (And what is python? What is an environment?)
1. What is python good for?
1. Where can I get more information?

Objectives:

1. Define python's data types and structures.
1. Store file content in a variable.
1. Save data to file, read data from file and URL.

#### Python is an interpreted language:

Code can be run interactively using an interpreter; it does not have to be compiled first.

Things to know -

Python versions 2 and 3 exist. Version 2 reached end of life in 2020, so we recommend using the latest release of version 3.

Python and many IDEs are free and open source. We recommend the Anaconda distribution.

* complexity of interpreters can vary from command line interface to full featured IDEs
* any interpreter or IDE should be able to run any python script - interoperable and cross-platform
* large and active community - where to go for more info?
    * the main python website: https://www.python.org/, includes documentation, tutorials, etc. - walk through some
    * Stack Overflow - how to get help on a specific problem (python list index out of range)
    * documentation for specific libraries (pandas, matplotlib - most have tutorials, etc.)


## The interpreter

In [None]:
# recall - python is an interpreted language
# we can execute commands - in this case mathematical operations - within the interpreter

# addition
3 + 3

6

In [None]:
# multiplication
3 * 3

9

In [None]:
# subtraction
9-5

4

In [None]:
# division - there are different types of division!
# the standard division
9/4

2.25

In [None]:
# don't linger too long on modulo and integer division
# modulo (returns the remainder)
9%4

1

In [None]:
# Integer division
# returns the whole number, no remainder
9//4

2
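
The three division operators are related: for whole numbers, the integer quotient times the divisor plus the remainder gives back the original number. A minimal sketch using the built-in `divmod()` function, which returns both results at once (the variable names `quotient` and `remainder` are chosen only for this illustration):

```python
# divmod() returns the integer quotient and the remainder together
quotient, remainder = divmod(9, 4)
print(quotient, remainder)

# the two results recombine into the original number
print(quotient * 4 + remainder)
```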

Python recognizes different data types. We have used two common numeric data types - integer and floating point number.

In [None]:
type(9)

int

In [None]:
type(4)

int

In [None]:
type(2.25)

float

In [None]:
type(9/4)

float

Another common data type is the string - a sequence of characters.

In [None]:
'my cat is hiding'

'my cat is hiding'

In [None]:
type('my cat is hiding')

str

In [None]:
# Quotes make a string
type(9)

int

In [None]:
type('9')

str

In [None]:
type(cat)

NameError: name 'cat' is not defined

**Question:**

What is the output of the following:

```
type(True)
```

What kind of data type is 'bool'? Where can you find out more info?

In [None]:
type(True)

bool

## Variables

Variables are used to store values.

In [None]:
a = 5
b = 10
a + b

15

In [None]:
# variables can be reassigned manually
b = 24
a + b

29

In [None]:
# current value of a variable can be used to reassign that variable
t = 84
print('initial value of t:', t)
t = t + 5
print('final value of t:', t)

initial value of t: 84
final value of t: 89


In [None]:
# values of variables can also be updated programmatically
a = 5
b = 10
while a < b:
    print('value of a is:', a)
    a = a + 1

print('the final value of a is:', a)

value of a is: 5
value of a is: 6
value of a is: 7
value of a is: 8
value of a is: 9
the final value of a is: 10


In [None]:
# variables have data types
# 'a' in the next line refers to the variable with that name
type(a)

int

In [None]:
# note the difference
# 'a' in the next line does not refer to the variable but to a string
type('a')

str

In [None]:
animal = 'cat'
type(animal)

str

Given the following variable assignments:

```
x = 12
y = str(14)
z = 'donuts'
```

Predict the output of the following:
```
1. y + z
2. x + y
3. x + int(y)
4. str(x) + y
```
Check your answers in the interpreter.
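
The behavior of `+` depends on the data types on either side of it. The sketch below uses different, made-up values (`count` and `label` are not part of the exercise) so that it does not give the answers away. It also uses `try` and `except`, which are not covered in this lesson, only to show the error without stopping the code.

```python
count = 3          # an int
label = str(7)     # the string '7'

# + concatenates two strings
print(label + label)

# + adds two numbers once the string is converted back to an int
print(count + int(label))

# mixing an int and a str raises a TypeError
try:
    count + label
except TypeError as err:
    print('TypeError:', err)
```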

### Variable Naming Rules

Variable names are case sensitive and:

1. Can only consist of one "word" (no spaces).
2. Must begin with a letter or underscore character ('_').
3. Can only use letters, numbers, and the underscore character.

We further recommend using variable names that are meaningful within the context of the script and the research.
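
One way to check a candidate name against these rules is the string method `isidentifier()`. The sketch below is an illustration only: the candidate names are made up, and it adds one check the rules above do not mention, namely that reserved words such as `class` cannot be used as variable names.

```python
import keyword

candidates = ['names_2010', '_count', '2010_names', 'baby names', 'class']
valid_names = []

for name in candidates:
    # isidentifier() checks the character rules
    # keyword.iskeyword() flags reserved words
    if name.isidentifier() and not keyword.iskeyword(name):
        valid_names.append(name)

print(valid_names)
```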

## Read and save tabular data from a URL to a file

As an application of what we have done so far, here we demonstrate using variables to download data and save it to a file on our local system.

In [None]:
# need to add functionality to base python - import library
import requests

In [None]:
#help(requests) # the basic usage is all we need for today but help is available

In [None]:
file_url = "https://raw.githubusercontent.com/unmrds/cc-python/master/tutorials/beowulf_babynames/names/2010"

In [None]:
r = requests.get(file_url) # dot syntax - "get" is a function of the requests library, file_url is the argument

In [None]:
print(r)

<Response [200]>


In [None]:
print(type(r))

<class 'requests.models.Response'>


In [None]:
# okay but what can we do with this? Didn't we just download a file?
# NOTE: inspecting object, getting data types and help are part of a workflow!
help(r)

In [None]:
print(r.status_code)

200


In [None]:
#print(r.text)

In [None]:
# save to file

with open('2010', 'w') as o:
    o.write(r.text)
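
A minimal sketch of the same write-then-read round trip, with a short string standing in for `r.text` so that it runs without a network connection. The file name `example_names.csv` and the use of a temporary directory are assumptions made for this illustration, not part of the lesson's workflow.

```python
import os
import tempfile

# stand-in for r.text - a few lines of comma separated text
text = "name,sex,count\nIsabella,F,22925\nJacob,M,22139\n"

# write to a temporary directory so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'example_names.csv')
with open(path, 'w', encoding='utf-8') as out_file:
    out_file.write(text)

# read the file back and confirm the content is unchanged
with open(path, encoding='utf-8') as in_file:
    content = in_file.read()

print(content == text)
```

Using `with` closes the file automatically when the indented block ends, which is why no explicit `close()` call is needed.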

In [None]:
# Before proceeding - check whether anyone is not using Anaconda
# demo installing pandas as needed
import pandas as pd

In [None]:
names_2010 = pd.read_csv('2010', encoding='latin1')

In [None]:
names_2010 # inspect the data - note that most rows are omitted from the display, this is the head and tail

Unnamed: 0,name,sex,count
0,Isabella,F,22925
1,Sophia,F,20648
2,Emma,F,17354
3,Olivia,F,17030
4,Ava,F,15436
...,...,...,...
34084,Zymaire,M,5
34085,Zyonne,M,5
34086,Zyquarius,M,5
34087,Zyran,M,5


In [None]:
# other ways to inspect the data - note again this is an important part of a workflow
# not just something we're demonstrating here
names_2010.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34089 entries, 0 to 34088
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    34089 non-null  object
 1   sex     34089 non-null  object
 2   count   34089 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 799.1+ KB


In [None]:
names_2010.head()

Unnamed: 0,name,sex,count
0,Isabella,F,22925
1,Sophia,F,20648
2,Emma,F,17354
3,Olivia,F,17030
4,Ava,F,15436


In [None]:
# attributes - no parentheses
names_2010.shape

(34089, 3)

In [None]:
# descriptive stats
# default is to only show stats for numeric data types
names_2010.describe()

Unnamed: 0,count
count,34089.0
mean,108.352812
std,697.685909
min,5.0
25%,7.0
50%,11.0
75%,29.0
max,22925.0


In [None]:
# in our case it can be useful to get all stats
names_2010.describe(include='all')

Unnamed: 0,name,sex,count
count,34089,34089,34089.0
unique,31643,2,
top,Isabella,F,
freq,2,19823,
mean,,,108.352812
std,,,697.685909
min,,,5.0
25%,,,7.0
50%,,,11.0
75%,,,29.0


In [None]:
# we know the 2010 US SSA data contain 34089 rows - one per combination of name and sex
# the data provide counts by name
# what about counts by sex?
# a one liner!
names_2010.groupby('sex').count()

Unnamed: 0_level_0,name,count
sex,Unnamed: 1_level_1,Unnamed: 2_level_1
F,19823,19823
M,14266,14266


In [None]:
# that is a total count of names - there were 19823 different girl names registered, 14266 boy names
# what about the total number of boys and girls?
# also a one liner!
# note in this case we have to specify the numeric column we are summing - pandas will complain otherwise
names_2010.groupby('sex')['count'].sum()

sex
F    1776223
M    1917416
Name: count, dtype: int64

In [None]:
# we can do a lot with one-liners in python
# but for clarity's sake from here on we will use a more verbose style
names_grouped = names_2010.groupby("sex")

In [None]:
# we know the most popular girl name by the way the data are sorted
# what about the boy name?
# the below tells us how many boys had the most popular boy name, but not the name
names_grouped['count'].max()

sex
F    22925
M    22139
Name: count, dtype: int64

In [None]:
names_grouped = names_2010.groupby('sex')
names_grouped.first() # note this only works because the data are already sorted by count

Unnamed: 0_level_0,name,count
sex,Unnamed: 1_level_1,Unnamed: 2_level_1
F,Isabella,22925
M,Jacob,22139


In [None]:
# this approach will work with unsorted data
names_sorted = names_2010.sort_values(['count'], ascending=False)
names_grouped = names_sorted.groupby('sex')
names_grouped.first()

Unnamed: 0_level_0,name,count
sex,Unnamed: 1_level_1,Unnamed: 2_level_1
F,Isabella,22925
M,Jacob,22139
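
To make the grouping idea concrete, here is a rough sketch of what "group by sex, then sum or take the top row" means, written in plain python rather than pandas. It is not how pandas works internally. The three rows are copied from the outputs above, and it relies on lists, dictionaries and loops, which are covered later in this lesson.

```python
# three rows in the same shape as the names data: (name, sex, count)
rows = [
    ('Isabella', 'F', 22925),
    ('Sophia', 'F', 20648),
    ('Jacob', 'M', 22139),
]

totals = {}   # running sum of count for each sex
top = {}      # (name, count) with the largest count for each sex

for name, sex, count in rows:
    totals[sex] = totals.get(sex, 0) + count
    if sex not in top or count > top[sex][1]:
        top[sex] = (name, count)

print(totals)
print(top)
```

Because the comparison looks at every row, this approach does not depend on the data being sorted.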


Pandas is quite powerful and we could spend days on it! For now let's dig a little deeper into data structures, beginning with lists.

In [None]:
# about lists
# collections of objects, separated by commas
# ordered and mutable

number_list = [1, 2, 3, 4, 4, 3, 9, 12]
string_list = ['cat', 'bat', 'hat', 'mat', 'pat']

In [None]:
print(number_list)

[1, 2, 3, 4, 4, 3, 9, 12]


In [None]:
print(string_list)

['cat', 'bat', 'hat', 'mat', 'pat']


In [None]:
mixed_type_list = [1, 'dog', 99, 'pencil', 3.14]

In [None]:
for i in mixed_type_list:
    print(i, type(i))

1 <class 'int'>
dog <class 'str'>
99 <class 'int'>
pencil <class 'str'>
3.14 <class 'float'>


In [None]:
# objects in a list can be other lists
# rather than create a new list, use the append() method
mixed_type_list.append(['a', 1, 'b', 2])
mixed_type_list.append({'c1': 'v1', 'c2': 'v2'})

In [None]:
# rerun our loop
for i in mixed_type_list:
    print(i, type(i))

1 <class 'int'>
dog <class 'str'>
99 <class 'int'>
pencil <class 'str'>
3.14 <class 'float'>
['a', 1, 'b', 2] <class 'list'>
{'c1': 'v1', 'c2': 'v2'} <class 'dict'>


In [None]:
# just because we can mix types like this does not mean we should - it generally makes
# more sense to keep lists to a single data type like the number_list and string_list above

# list indexing - every object in a list has an index position
# the first object
number_list[0]

1

In [None]:
string_list[0]

'cat'

In [None]:
# the second
number_list[1]

2

In [None]:
string_list[1]

'bat'

In [None]:
# the last object
number_list[-1]

12

In [None]:
string_list[-1]

'pat'

In [None]:
# second from last, etc.
number_list[-2]

9

In [None]:
string_list[-2]

'mat'

In [None]:
# slicing - subsetting lists
# index pos to right of colon is up to but not including - in this case, index pos 0, 1, 2 but not 3
number_list[0:3]

[1, 2, 3]

In [None]:
# when starting from the beginning of a list, we can leave the start position out - same as above
number_list[:3]

[1, 2, 3]

In [None]:
number_list[2:5]

[3, 4, 4]

In [None]:
number_list[2:]

[3, 4, 4, 3, 9, 12]

In [None]:
number_list

[1, 2, 3, 4, 4, 3, 9, 12]

In [None]:
number_list[:]

[1, 2, 3, 4, 4, 3, 9, 12]

In [None]:
number_list[0:-1] # exercise - what will this output, and why?

[1, 2, 3, 4, 4, 3, 9]
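
Two more details about slicing, sketched under the assumption that `number_list` holds the same values as above: a slice is always a new list, so `[:]` is a quick way to copy one, and an optional third value sets the step. The name `copy_of_list` is made up for this example.

```python
number_list = [1, 2, 3, 4, 4, 3, 9, 12]

# [:] builds a new list holding the same objects
copy_of_list = number_list[:]
copy_of_list.append(100)

print(number_list)     # the original is unchanged
print(copy_of_list)

# an optional third value sets the step - every second object here
print(number_list[::2])
```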

In [None]:
# we don't always know how many objects are in a list
# to find out, use the len() function
len(number_list)

8

In [None]:
len(string_list)

5

In [None]:
# nested lists are very handy - a good way to represent tabular data
# indexing and slicing nested lists
nested_list = [['a', 'b', 'c'], [1, 2, 3], [4, 5, 6]]

In [None]:
nested_list[0]

['a', 'b', 'c']

In [None]:
nested_list[0][0]

'a'
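
If we treat the first inner list as column names and the rest as rows, indexing and slicing are enough to pull out a row, a cell or a whole column. This is only a sketch of that idea: the names `header`, `rows` and `col` are made up here, and the last line uses a list comprehension, a compact kind of loop not covered in this lesson.

```python
nested_list = [['a', 'b', 'c'], [1, 2, 3], [4, 5, 6]]

header = nested_list[0]    # first inner list holds the column names
rows = nested_list[1:]     # the remaining inner lists are the data rows

# value in the second data row, under column 'b'
col = header.index('b')
print(rows[1][col])

# every value in column 'b'
print([row[col] for row in rows])
```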

**Exercise**

Given the following nested list:

```
my_data = [['a', 'b', 'c'], [[1, 2, 3], [4, 5, 6]], [['cat', 'cow', 'dog'], ['red', 'green', 'blue']]]
```

Write a statement to produce the following outputs:

```
5
```

and

```
[['cat', 'cow', 'dog'], ['red', 'green', 'blue']]
```

and

```
['b', 'c']
```

**Hint:** experiment and build your statement iteratively!

In [None]:
my_data = [['a', 'b', 'c'], [[1, 2, 3], [4, 5, 6]], [['cat', 'cow', 'dog'], ['red', 'green', 'blue']]]

In [None]:
# 5
my_data[1][1][1]

5

In [None]:
# last two lists
my_data[2]

[['cat', 'cow', 'dog'], ['red', 'green', 'blue']]

In [None]:
# b and c
my_data[0][1:]

['b', 'c']

## Dictionaries

In [None]:
# Dictionaries
# a final data structure to demonstrate (but there are many we haven't demonstrated!)

# similar to lists - collections of objects
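# mutable; insertion order is preserved since Python 3.7 (earlier versions did not guarantee it)
# store objects as key:value pairs

# lookups stay fast even in larger collections because values are retrieved by key, not by position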

'''
Think of things that have properties - what are those properties? For example, a car:

make
model
color
mpg
transmission
'''

'\nThink of things that have properties - what are those properties? For example, a car:\n    \nmake\nmodel\ncolor\nmpg\ntransmission\n'

In [None]:
my_car = {'make': 'honda',
         'model': 'fit',
         'year': '2013',
         'color': 'blue',
         'transmission': 'manual'}

In [None]:
# indexing dictionaries with keys
my_car['model']

'fit'

In [None]:
# what if we don't know the keys?
my_car.keys() # note the output is a dict_keys view, not a list - use list() to convert it

dict_keys(['make', 'model', 'year', 'color', 'transmission'])

In [None]:
# we can also get the values
my_car.values() # note the output is a dict_values view, not a list - use list() to convert it

dict_values(['honda', 'fit', '2013', 'blue', 'manual'])
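
The objects returned by `keys()` and `values()` are views rather than lists. A short sketch of what that means in practice, using a smaller made-up dictionary (`small_car`) so that `my_car` is left untouched:

```python
small_car = {'make': 'honda', 'model': 'fit', 'year': '2013'}

keys = small_car.keys()
print(type(keys))

# a view cannot be indexed, but it converts to a list easily
key_list = list(keys)
print(key_list[0])

# a view reflects later changes to the dictionary
small_car['color'] = 'blue'
print(len(keys))

# 'in' tests whether a key exists
print('mpg' in small_car)
```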

In [None]:
# a value in a dictionary can be any data type - str, int, list, dictionary
# let's say we want to add info about optional features in my car - we can use a list
my_car['options'] = ['radio', 'air conditioning', 'seat covers']

In [None]:
my_car

{'make': 'honda',
 'model': 'fit',
 'year': '2013',
 'color': 'blue',
 'transmission': 'manual',
 'options': ['radio', 'air conditioning', 'seat covers']}

In [None]:
my_car['mpg'] = 35

In [None]:
my_car

{'make': 'honda',
 'model': 'fit',
 'year': '2013',
 'color': 'blue',
 'transmission': 'manual',
 'options': ['radio', 'air conditioning', 'seat covers'],
 'mpg': 35}

In [None]:
# we can build a catalog or directory of people's cars
deans_car = {'make': 'chevrolet', 'model': 'impala', 'color': 'black'}

In [None]:
# Exercise - create a dictionary for your vehicle (or if you don't drive, a famous movie car)

In [None]:
# now we can build a catalog of nested dictionaries
cars = {}
cars['jon'] = my_car
cars['dean'] = deans_car

In [None]:
cars

{'jon': {'make': 'honda',
  'model': 'fit',
  'year': '2013',
  'color': 'blue',
  'transmission': 'manual',
  'options': ['radio', 'air conditioning', 'seat covers'],
  'mpg': 35},
 'dean': {'make': 'chevrolet', 'model': 'impala', 'color': 'black'}}

In [None]:
# add your car to the catalog
# now how do we get nested values?
# model of Dean's car
cars['dean']['model']

'impala'
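
Not every car in the catalog has the same keys, and asking for a missing key raises an error. A minimal sketch of one way to handle that with the dictionary method `get()`. The `catalog` dictionary is a cut-down copy of `cars`, and the mpg value assigned at the end is made up for illustration.

```python
catalog = {
    'jon': {'make': 'honda', 'model': 'fit', 'mpg': 35},
    'dean': {'make': 'chevrolet', 'model': 'impala'},
}

# a missing key raises a KeyError
try:
    catalog['dean']['mpg']
except KeyError as err:
    print('KeyError:', err)

# get() returns a default value instead
dean_mpg = catalog['dean'].get('mpg', 'not recorded')
print(dean_mpg)

# nested values can be added or updated in place (17 is a made-up number)
catalog['dean']['mpg'] = 17
print(catalog['dean']['mpg'])
```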

## Loops

In [None]:
# for loops
# once we have data of specific types stored within data structures, what do we do?
# slicing and indexing is not very useful if we're just retrieving one object at a time

# so we use loops to do things

In [None]:
# syntax
# the value of the loop variable is updated each time the loop runs
# the collection can be anything

'''
for loop_variable in collection:
    do something
'''

'\nfor loop_variable in collection:\n    do something\n'

In [None]:
for letter in 'snailshell':
    print(letter)

s
n
a
i
l
s
h
e
l
l


In [None]:
s = 'snailshell'
for letter in s:
    print(letter)

s
n
a
i
l
s
h
e
l
l


In [None]:
for word in ['cat', 'hat', 9, 18, s]:
    print(word)

cat
hat
9
18
snailshell


In [None]:
some_list = ['cat', 'hat', 9, 18, s]

In [None]:
for obj in some_list:
    print(obj)

cat
hat
9
18
snailshell


In [None]:
# for loops for dictionaries are a little different
# we need loop variables for the keys and values, and the 'items()' method

'''
for key, value in dictionary.items():
   do something
'''

'\nfor key, value in dictionary.items():\n   do something\n'

In [None]:
# first of all - what is 'items()'?
# the output is a dict_items view of (key, value) tuples
# not really going into tuples today

print(my_car.items())

dict_items([('make', 'honda'), ('model', 'fit'), ('year', '2013'), ('color', 'blue'), ('transmission', 'manual'), ('options', ['radio', 'air conditioning', 'seat covers']), ('mpg', 35)])


In [None]:
print(my_car) # compare with above

{'make': 'honda', 'model': 'fit', 'year': '2013', 'color': 'blue', 'transmission': 'manual', 'options': ['radio', 'air conditioning', 'seat covers'], 'mpg': 35}


In [None]:
# this is a common error - looping over a dictionary directly yields only its keys

for key, value in my_car:
    print(key)
    print(value)

ValueError: too many values to unpack (expected 2)

In [None]:
for key, value in my_car.items():
    print(key, ":", value)

make : honda
model : fit
year : 2013
color : blue
transmission : manual
options : ['radio', 'air conditioning', 'seat covers']
mpg : 35
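
Why does leaving out `items()` fail? Looping over a dictionary directly yields one key at a time, and python then tries to split that key string into two loop variables. A sketch with a made-up two-entry dictionary (`small_car`):

```python
small_car = {'make': 'honda', 'model': 'fit'}

# looping over a dictionary directly yields only the keys
for key in small_car:
    print(key, ':', small_car[key])

# unpacking a string into two names only works by accident,
# when the string happens to be exactly two characters long
first, second = 'c1'
print(first, second)

# a four character key like 'make' has too many values for two names
try:
    first, second = 'make'
except ValueError as err:
    print('ValueError:', err)
```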


In [None]:
# finally - conditionals
# evaluate a statement
# do something different depending on whether the statement evaluates to True or False

a = 5
b = -19

# what do we mean by evaluate True or False?
print(a < b)

False


In [None]:
print(a > b)

True


In [None]:
# based on this evaluation, we might do something different

checking = 1000
savings = 20
bills = 250

if checking > bills:
    print('move money into savings!')

move money into savings!


In [None]:
# we can specify an else statement for all other cases
if savings > bills:
    print('move money from savings!')
else:
    print('be sure to avoid an overdraft!')

be sure to avoid an overdraft!


In [None]:
# if we only have one condition, we can use a single if statement
# but we may have multiple conditions to evaluate

checking = 20
savings = 1000
bills = 200

# additional conditions use elif (else if)
# the else statement handles all other cases - no condition needed
if checking > bills:
    print('pay bills!')
elif savings + checking > bills:
    print('move money from savings and pay bills')
else:
    print('tighten that belt!')

move money from savings and pay bills
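
To see every branch run, we can combine the conditional with a loop over several scenarios. This is a sketch with made-up amounts. Each inner list is assumed to hold `[checking, savings, bills]`, and `scenarios` and `advice` are names invented for this example.

```python
# each inner list is [checking, savings, bills] - made-up amounts
scenarios = [[500, 0, 200], [20, 1000, 200], [20, 50, 200]]
advice = []

for checking, savings, bills in scenarios:
    if checking > bills:
        advice.append('pay bills!')
    elif savings + checking > bills:
        advice.append('move money from savings and pay bills')
    else:
        advice.append('tighten that belt!')

print(advice)
```

Only the first condition that evaluates to True runs, so the order of the `if` and `elif` tests matters.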


In [None]:
# another kind of loop is the while loop - using conditional tests to determine
# if/how we exit a loop

# define value for initial test
# while Test:
#   Do Something

In [None]:
a = 10
while a>0:
    print(a)
    a = a-1
print("Blast Off")

# Question:
# What do you predict would happen if we didn't include the
# 'a = a-1' command in the loop?

10
9
8
7
6
5
4
3
2
1
Blast Off
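
Without the `a = a-1` line the test `a>0` never becomes False, so the loop would run forever. The sketch below shows this safely by adding a step counter and the `break` statement, which exits a loop immediately. The limit of 25 steps is arbitrary, and `steps` and `max_steps` are names made up for this example.

```python
a = 10
steps = 0
max_steps = 25   # arbitrary safety limit chosen for this sketch

while a > 0:
    # 'a' is never decremented here, so 'a > 0' stays True
    steps = steps + 1
    if steps >= max_steps:
        print('stopping after', steps, 'steps; a is still', a)
        break
```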


In [None]:
# let's pick some things at random using the random.choice() function
# and add them to a list of things - animals in this case
import random

menagerieLength = 3

done = False
menagerie = []
animals = ["cat","dog","parakeet","goldfish","carp","dragon"]

while not done:
    newAnimal = random.choice(animals)
    print("adding",newAnimal, "to our menagerie")
    menagerie.append(newAnimal)
    if len(menagerie) == menagerieLength:
        done = True


adding goldfish to our menagerie
adding goldfish to our menagerie
adding cat to our menagerie


In [None]:
# Now let's list the items in our list - our menagerie

print("Our menagerie includes ")
for animal in menagerie:
    print("  ",animal)


Our menagerie includes 
   goldfish
   goldfish
   cat


In [None]:
# Finally let's count the items in our list - how many of each animal type do we have

# Approach One: use the source list as the reference for potential animals
print(type(animals),"\n")
for animal in animals:
    animalCount = menagerie.count(animal)
    print(animal,":",animalCount)

<class 'list'> 

cat : 1
dog : 0
parakeet : 0
goldfish : 2
carp : 0
dragon : 0


In [None]:
# Approach 2: use the menagerie list as the reference for animals
uniqueAnimals = set(menagerie)
print(type(uniqueAnimals),"\n")
for animal in uniqueAnimals:
    animalCount = menagerie.count(animal)
    print(animal,":",animalCount)

<class 'set'> 

cat : 1
goldfish : 2
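
A possible third approach, offered as an alternative rather than as part of the lesson: the standard library's `collections.Counter` does the counting in one step. The list `menagerie_example` is a fixed copy of the output above, used because the real `menagerie` is random and will differ on each run.

```python
from collections import Counter

# fixed list so the result is repeatable
menagerie_example = ['goldfish', 'goldfish', 'cat']

animal_counts = Counter(menagerie_example)
print(animal_counts['goldfish'])
print(animal_counts['dragon'])       # items not in the list count as 0
print(animal_counts.most_common(1))  # the most frequent item and its count
```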


In [None]:
# python includes more data types and structures than we have introduced here
# but what we have done is enough to develop some powerful workflows
# define variables and data structures, and use logic - loops and conditionals - to do things with them