# Two-hour introduction to Python

Questions:

1. How do I run a python environment? (And what is python? What is an environment?)
2. What is python good for?
3. Where can I get more information?

Objectives:

1. Define python's data types and structures.
2. Store file content in a variable.
3. Demonstrate a workflow using the concepts covered.

In [1]:
# Python is an interpreted language -
# code can be run interactively in an interpreter; it does not have to be compiled first

'''
Things to know - 

Python versions 2 and 3 both exist, but Python 2 reached end of life in 2020. Use the latest release of Python 3.

Python and many IDEs (integrated development environments) are free and open source. We recommend the Anaconda distribution.

    * interpreters range from a simple command-line interface to full-featured IDEs
    * any interpreter or IDE should be able to run a standard Python script, provided the required Python version and libraries are installed - Python is interoperable and cross-platform
    * large and active community - where to go for more info?
        * the main python website: https://www.python.org/, includes documentation, tutorials, etc. - walk through some
        * Stack Overflow - help with a specific problem (e.g. search for "python list index out of range")
        * documentation for specific libraries (pandas, matplotlib - most have tutorials, etc.)

'''


## The interpreter

In [2]:
# recall - python is an interpreted language
# we can execute commands - in this case mathematical operations - within the interpreter

# addition
3 + 3

6

In [3]:
# multiplication
3 * 3

9

In [4]:
# subtraction
9-5

4

In [5]:
# division - there are different types of division!
# standard (true) division - in Python 3 this always returns a float, even for 8/4
9/4

2.25

In [6]:
# don't linger too long on modulo and integer division
# modulo (returns the remainder)
9%4

1

In [7]:
# Integer (floor) division
# returns the quotient rounded down to a whole number; the remainder is dropped
9//4

2
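Floor division and modulo are two halves of the same division. The sketch below shows how they fit together, using the built-in `divmod()` to get both at once, and why negative numbers can give surprising results (an aside beyond what the lesson needs today):

```python
# floor division (//) and modulo (%) together describe one division:
# dividend == divisor * quotient + remainder
dividend, divisor = 9, 4

quotient = dividend // divisor   # 2
remainder = dividend % divisor   # 1
print(quotient, remainder)                          # 2 1
print(divisor * quotient + remainder == dividend)   # True

# the built-in divmod() returns both results at once, as a tuple
print(divmod(dividend, divisor))   # (2, 1)

# // rounds DOWN (toward negative infinity), so negative numbers can surprise you
print(-9 // 4)   # -3, not -2
print(-9 % 4)    # 3, so that 4 * -3 + 3 == -9
```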

Python recognizes different data types. We have used two common numeric data types: integers (int) and floating-point numbers (float).

In [8]:
type(9)

int

In [9]:
type(4)

int

In [10]:
type(2.25)

float

In [11]:
type(9/4)

float

Another common data type is the string (str): a sequence of characters.

In [12]:
'my cat is hiding'

'my cat is hiding'

In [13]:
type('my cat is hiding')

str

In [14]:
# Quotes make a string
type(9)

int

In [15]:
type('9')

str

In [16]:
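# without quotes, python reads cat as a variable name - and no variable called cat has been defined
type(cat)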

NameError: name 'cat' is not defined

**Question:**

What is the output of the following:

```
type(True)
```

What kind of data type is 'bool'? Where can you find out more info?

In [17]:
type(True)

bool
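One way to start answering the question is to experiment in the interpreter. This sketch shows where bool values usually come from (comparisons) and how `bool()` treats other values; the built-in types page of the official Python documentation covers the details:

```python
# comparisons produce bool values
print(type(3 > 2))    # <class 'bool'>
print(3 > 2, 3 == 2)  # True False

# bool() converts other values; empty or zero values count as False
print(bool(0), bool(42))      # False True
print(bool(''), bool('cat'))  # False True
print(bool([]), bool([1]))    # False True

# bool is a subclass of int: True behaves like 1 and False like 0
print(isinstance(True, int))  # True
print(True + True)            # 2
```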

## Variables

Variables are used to store values.

In [18]:
a = 5
b = 10
a + b

15

In [19]:
# variables can be reassigned manually
b = 24
a + b

29

In [20]:
# current value of a variable can be used to reassign that variable
t = 84
print('initial value of t:', t)
t = t + 5
print('final value of t:', t)

initial value of t: 84
final value of t: 89


In [21]:
# values of variables can also be updated programmatically
a = 5
b = 10
while a < b:
    print('value of a is:', a)
    a = a + 1

print('the final value of a is:', a)

value of a is: 5
value of a is: 6
value of a is: 7
value of a is: 8
value of a is: 9
the final value of a is: 10


In [22]:
# variables have data types
type(a)

int

In [23]:
# note the difference
type('a')

str

In [24]:
animal = 'cat'
type(animal)

str

Given the following variable assignments:

```
x = 12
y = str(14)
z = 'donuts'
```

Predict the output of the following:
```
1. y + z
2. x + y
3. x + int(y)
4. str(x) + y
```
Check your answers in the interpreter.

### Variable Naming Rules

Variable names are case-sensitive and:

1. Can only consist of one "word" (no spaces).
2. Must begin with a letter or underscore character ('_').
3. Can only use letters, numbers, and the underscore character.

We further recommend using variable names that are meaningful within the context of the script and the research.
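Python can check a name against these rules for you. A minimal sketch, with a made-up list of candidate names: the string method `isidentifier()` tests the spelling rules, and `keyword.iskeyword()` catches reserved words such as `for` or `if`, which Python also refuses as variable names even though they are not listed above:

```python
import keyword

# hypothetical candidate variable names to check
candidates = ['animal', '_count', 'file_path2', '2nd_file', 'my cat', 'total-count', 'for']

for name in candidates:
    # isidentifier() checks the start character and the allowed characters
    # iskeyword() flags reserved words, which cannot be used as variable names
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print(repr(name), 'valid' if valid else 'not valid')
```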

In [25]:
# Using variables to store file content

# One way is to open the whole file as a block
file_path = "./beowulf" # We can save the path to the file as a variable
file_in = open(file_path, "r") # common modes are 'r', 'w', and 'a' (read, write, append)
beowulf_a = file_in.read()
file_in.close()
print(beowulf_a)

ï»¿BEOWULF.

I.

THE LIFE AND DEATH OF SCYLD.


{The famous race of Spear-Danes.}

          Lo! the Spear-Danes' glory through splendid achievements
          The folk-kings' former fame we have heard of,
          How princes displayed then their prowess-in-battle.

{Scyld, their mighty king, in honor of whom they are often called
Scyldings. He is the great-grandfather of Hrothgar, so prominent in the
poem.}

          Oft Scyld the Scefing from scathers in numbers
        5 From many a people their mead-benches tore.
          Since first he found him friendless and wretched,
          The earl had had terror: comfort he got for it,
          Waxed 'neath the welkin, world-honor gained,
          Till all his neighbors o'er sea were compelled to
       10 Bow to his bidding and bring him their tribute:
          An excellent atheling! After was borne him

{A son is born to him, who receives the name of Beowulf--a name afterwards
made so famous by the hero of the poem.}

          A 
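The three characters `ï»¿` at the start of the output are most likely a UTF-8 byte order mark (BOM) that was decoded with a non-UTF-8 default encoding, which is common on Windows. That is an inference from the output, not something confirmed about the `beowulf` file. The sketch below uses a small temporary file as a hypothetical stand-in and shows how passing `encoding='utf-8-sig'` to `open()` avoids the stray characters:

```python
import os
import tempfile

# hypothetical stand-in for ./beowulf: a small file that begins with a UTF-8 byte order mark
# (writing with 'utf-8-sig' adds the BOM automatically)
sample_path = os.path.join(tempfile.mkdtemp(), 'beowulf_sample')
with open(sample_path, 'w', encoding='utf-8-sig') as out:
    out.write('BEOWULF.\n\nI.\n')

# decoding the same bytes as latin-1 reproduces the stray characters seen above
with open(sample_path, 'r', encoding='latin-1') as f:
    garbled = f.read()
print(repr(garbled[:11]))   # 'ï»¿BEOWULF.'

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present
with open(sample_path, 'r', encoding='utf-8-sig') as f:
    text = f.read()
print(repr(text[:8]))       # 'BEOWULF.'
```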

In [26]:
# Another way is to read the file as a list of individual lines
# 'with' closes the file automatically when the indented block ends

with open(file_path, "r") as b:
    beowulf_b = b.readlines()

print(beowulf_b)

['ï»¿BEOWULF.\n', '\n', 'I.\n', '\n', 'THE LIFE AND DEATH OF SCYLD.\n', '\n', '\n', '{The famous race of Spear-Danes.}\n', '\n', "          Lo! the Spear-Danes' glory through splendid achievements\n", "          The folk-kings' former fame we have heard of,\n", '          How princes displayed then their prowess-in-battle.\n', '\n', '{Scyld, their mighty king, in honor of whom they are often called\n', 'Scyldings. He is the great-grandfather of Hrothgar, so prominent in the\n', 'poem.}\n', '\n', '          Oft Scyld the Scefing from scathers in numbers\n', '        5 From many a people their mead-benches tore.\n', '          Since first he found him friendless and wretched,\n', '          The earl had had terror: comfort he got for it,\n', "          Waxed 'neath the welkin, world-honor gained,\n", "          Till all his neighbors o'er sea were compelled to\n", '       10 Bow to his bidding and bring him their tribute:\n', '          An excellent atheling! After was borne him\n', '\n'

In [27]:
# In order to get a similar printout to the first method, we use a for loop
# to print line by line - more on for loops below!

for l in beowulf_b:
    print(l)

ï»¿BEOWULF.



I.



THE LIFE AND DEATH OF SCYLD.





{The famous race of Spear-Danes.}



          Lo! the Spear-Danes' glory through splendid achievements

          The folk-kings' former fame we have heard of,

          How princes displayed then their prowess-in-battle.



{Scyld, their mighty king, in honor of whom they are often called

Scyldings. He is the great-grandfather of Hrothgar, so prominent in the

poem.}



          Oft Scyld the Scefing from scathers in numbers

        5 From many a people their mead-benches tore.

          Since first he found him friendless and wretched,

          The earl had had terror: comfort he got for it,

          Waxed 'neath the welkin, world-honor gained,

          Till all his neighbors o'er sea were compelled to

       10 Bow to his bidding and bring him their tribute:

          An excellent atheling! After was borne him



{A son is born to him, who receives the name of Beowulf--a name afterwards

made so famous by the hero 
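The extra blank lines appear because each item from `readlines()` still ends with a newline character (`'\n'`), and `print()` adds another one. Two common fixes, shown on a short stand-in list rather than the full `beowulf_b`:

```python
# a short stand-in for beowulf_b: readlines() keeps the '\n' at the end of each line
lines = ['BEOWULF.\n', '\n', 'I.\n']

# option 1: end='' tells print() not to add its own newline
for l in lines:
    print(l, end='')

# option 2: strip the trailing newline from each line before printing
for l in lines:
    print(l.rstrip('\n'))
```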

In [28]:
# We now have two variables with the content of our 'beowulf' file represented using two different data structures.
# Why do you think we get the different outputs from the next two statements?

# Beowulf text stored as one large string
print("As string:", beowulf_a[0])

# Beowulf text stored as a list of lines
print("As list of lines:", beowulf_b[0])

As string: ï
As list of lines: ï»¿BEOWULF.



In [29]:
# We can confirm our expectations by checking on the types of our two beowulf variables
print(type(beowulf_a))
print(type(beowulf_b))

<class 'str'>
<class 'list'>


In [30]:
# Read tabular data - there are many ways to do this, but we are going to use the csv module,
# which is part of Python's standard library (no extra install needed)

# we have to import the library - it is not loaded on startup
import csv

In [136]:
# we will store the data in a list
csv_data = []

# data source: Baby Names from Social Security Card Applications - National Data
# https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
with open('./names/2021', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        csv_data.append(row)

In [149]:
# csv_data

In [138]:
csv_data[0]

{'name': 'Olivia', 'sex': 'F', 'count': '17728'}

In [139]:
csv_data[0]['name']

'Olivia'

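Discuss what happened above when the CSV file was read: the data have been structured as a list of dictionaries, one dictionary per row.

Within each dictionary, the column names are the keys and the row's data are the values, stored as key:value pairs. Note that every value, including the count, was read as a string ('17728').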
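To see this structure without the full file, here is a sketch that runs `csv.DictReader` on a two-row sample held in memory. `io.StringIO` stands in for the open file, and the header `name,sex,count` is inferred from the output above:

```python
import csv
import io

# a tiny in-memory stand-in for the first rows of './names/2021'
# (values copied from the output above; the real file is much longer)
sample = io.StringIO('name,sex,count\nOlivia,F,17728\nEmma,F,15433\n')

rows = list(csv.DictReader(sample))   # a list of dictionaries, one per row

print(rows[0])            # {'name': 'Olivia', 'sex': 'F', 'count': '17728'}
print(rows[1]['name'])    # Emma

# every value is read as a string, so convert before doing arithmetic
total = int(rows[0]['count']) + int(rows[1]['count'])
print(total)              # 33161
```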

So let's look at lists and dictionaries. These are two commonly used types of python data structures.

In [35]:
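# there are other libraries and methods for opening CSVs and other tabular data files
# pandas is a third-party library (included with Anaconda) that reads them into a table-like DataFrame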
import pandas as pd

In [150]:
nm_ins = pd.read_csv('./names/2021', encoding='latin1')

In [151]:
nm_ins

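|       | name      | sex | count |
|-------|-----------|-----|-------|
| 0     | Olivia    | F   | 17728 |
| 1     | Emma      | F   | 15433 |
| 2     | Charlotte | F   | 13285 |
| 3     | Amelia    | F   | 12952 |
| 4     | Ava       | F   | 12759 |
| ...   | ...       | ... | ...   |
| 31532 | Zyeire    | M   | 5     |
| 31533 | Zyel      | M   | 5     |
| 31534 | Zyian     | M   | 5     |
| 31535 | Zylar     | M   | 5     |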


In [38]:
# this is a little easier to work with!

In [39]:
# about lists
# collections of objects, separated by commas
# ordered and mutable

number_list = [1, 2, 3, 4, 4, 3, 9, 12]
string_list = ['cat', 'bat', 'hat', 'mat', 'pat']

In [40]:
print(number_list)

[1, 2, 3, 4, 4, 3, 9, 12]


In [41]:
print(string_list)

['cat', 'bat', 'hat', 'mat', 'pat']


In [42]:
mixed_type_list = [1, 'dog', 99, 'pencil', 3.14]

In [43]:
for i in mixed_type_list:
    print(i, type(i))

1 <class 'int'>
dog <class 'str'>
99 <class 'int'>
pencil <class 'str'>
3.14 <class 'float'>


In [44]:
# objects in a list can themselves be lists (or dictionaries)
# the append() method adds an object to the end of an existing list, rather than creating a new list
mixed_type_list.append(['a', 1, 'b', 2])
mixed_type_list.append({'c1': 'v1', 'c2': 'v2'})

In [45]:
# rerun our loop
for i in mixed_type_list:
    print(i, type(i))

1 <class 'int'>
dog <class 'str'>
99 <class 'int'>
pencil <class 'str'>
3.14 <class 'float'>
['a', 1, 'b', 2] <class 'list'>
{'c1': 'v1', 'c2': 'v2'} <class 'dict'>
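The comment at the start of this section says lists are mutable, but so far we have only seen `append()`. A short sketch of other in-place changes, using a new example list `pets`:

```python
# lists are mutable: they can be changed in place after they are created
pets = ['cat', 'bat', 'hat']

pets[1] = 'dog'          # replace the object at index 1
pets.append('fish')      # add to the end
pets.insert(0, 'bird')   # add at a specific index
pets.remove('hat')       # remove the first matching object
print(pets)              # ['bird', 'cat', 'dog', 'fish']

pets.sort(reverse=True)  # sort in place, reverse alphabetical order
print(pets)              # ['fish', 'dog', 'cat', 'bird']
```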


In [46]:
# but just because we can do stuff like this, it generally makes more sense to
# keep lists to a single data type like the number_list and string_list above

# list indexing - every object in a list has an index position
# the first object
number_list[0]

1

In [47]:
string_list[0]

'cat'

In [48]:
# the second
number_list[1]

2

In [49]:
string_list[1]

'bat'

In [50]:
# the last object
number_list[-1]

12

In [51]:
string_list[-1]

'pat'

In [52]:
# second from last, etc.
number_list[-2]

9

In [53]:
string_list[-2]

'mat'

In [54]:
# slicing - subsetting lists
# the index to the right of the colon is excluded (up to but not including) - this returns index positions 0, 1, and 2 but not 3
number_list[0:3]

[1, 2, 3]

In [55]:
# when starting from the beginning of a list, we can leave the start position out - same as above
number_list[:3]

[1, 2, 3]

In [56]:
number_list[2:5]

[3, 4, 4]

In [57]:
number_list[2:]

[3, 4, 4, 3, 9, 12]

In [58]:
number_list

[1, 2, 3, 4, 4, 3, 9, 12]

In [59]:
number_list[:]

[1, 2, 3, 4, 4, 3, 9, 12]

In [60]:
number_list[0:-1] # exercise - what will this output, and why?

[1, 2, 3, 4, 4, 3, 9]

In [61]:
# we don't always know how many objects are in a list
# to find out, use the len() function
len(number_list)

8

In [62]:
len(string_list)

5
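Because indexing starts at 0, the last valid index is always one less than `len()`. Going past it produces the "list index out of range" error mentioned earlier as a typical thing to search for on Stack Overflow:

```python
number_list = [1, 2, 3, 4, 4, 3, 9, 12]

# valid indexes run from 0 to len(number_list) - 1
last_index = len(number_list) - 1
print(last_index)               # 7
print(number_list[last_index])  # 12, the same object as number_list[-1]

# asking for an index past the end raises an IndexError
try:
    number_list[len(number_list)]
except IndexError as err:
    print('IndexError:', err)   # IndexError: list index out of range
```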

In [63]:
# nested lists are very handy - a good way to represent tabular data
# indexing and slicing nested lists
nested_list = [['a', 'b', 'c'], [1, 2, 3], [4, 5, 6]]

In [64]:
nested_list[0]

['a', 'b', 'c']

In [65]:
nested_list[0][0]

'a'
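One way to read `nested_list` is as a small table, treating the first inner list as column names. That interpretation is just for illustration; the lesson does not give the letters that meaning. The first index then picks a row and the second picks a column:

```python
# a nested list used as a small table: the first inner list holds column names,
# and each following inner list is one row of data
nested_list = [['a', 'b', 'c'], [1, 2, 3], [4, 5, 6]]

header = nested_list[0]
rows = nested_list[1:]

# nested_list[row][column]: row 2, column 1
print(nested_list[2][1])   # 5

# pull out one column by taking the same position from every data row
col = header.index('b')
column_b = []
for row in rows:
    column_b.append(row[col])
print(column_b)            # [2, 5]
```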

**Exercise**

Given the following nested list:

```
my_data = [['a', 'b', 'c'], [[1, 2, 3], [4, 5, 6]], [['cat', 'cow', 'dog'], ['red', 'green', 'blue']]]
```

Write a statement to produce the following outputs:

```
5
```

and 

```
[['cat', 'cow', 'dog'], ['red', 'green', 'blue']]
```

and

```
['b', 'c']
```

**Hint:** experiment and build your statement iteratively!

In [66]:
my_data = [['a', 'b', 'c'], [[1, 2, 3], [4, 5, 6]], [['cat', 'cow', 'dog'], ['red', 'green', 'blue']]]

In [67]:
# 5
my_data[1][1][1]

5

In [68]:
# last two lists
my_data[2]

[['cat', 'cow', 'dog'], ['red', 'green', 'blue']]

In [69]:
# b and c
my_data[0][1:]

['b', 'c']

In [70]:
# Dictionaries
# a final data structure for today

# similar to lists - collections of objects
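# mutable; since Python 3.7 they keep insertion order, but objects are looked up by key, not by position
# store objects as key:value pairs

# looking up a value by its key stays fast even in large collections, because keys are hashed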

'''
Think of things that have properties - what are those properties? Like, a car:
    
make
model
color
mpg
transmission
'''


In [71]:
my_car = {'make': 'honda',
         'model': 'fit',
         'year': '2013',
         'color': 'blue',
         'transmission': 'manual'}

In [72]:
# indexing dictionaries with keys
my_car['model']

'fit'

In [73]:
# what if we don't know the keys?
my_car.keys() # note the output is a dict_keys view, not a list - wrap it in list() to get an actual list

dict_keys(['make', 'model', 'year', 'color', 'transmission'])

In [74]:
# we can also get the values
my_car.values() # likewise a dict_values view, not a list

dict_values(['honda', 'fit', '2013', 'blue', 'manual'])
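When the keys are not known in advance, it helps to check for a key before using it. A sketch, with `my_car` redefined as it stands at this point so the block runs on its own:

```python
my_car = {'make': 'honda', 'model': 'fit', 'year': '2013',
          'color': 'blue', 'transmission': 'manual'}

# keys() returns a view; list() turns it into an ordinary list
key_list = list(my_car.keys())
print(key_list)                 # ['make', 'model', 'year', 'color', 'transmission']

# 'in' checks whether a key exists
print('color' in my_car)        # True
print('mpg' in my_car)          # False

# my_car['mpg'] would raise a KeyError here; get() returns a default instead
print(my_car.get('mpg', 'unknown'))   # unknown
```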

In [75]:
# a value in a dictionary can be any data type - str, int, list, dictionary
# let's say we want to add info about optional features in my car - we can use a list
my_car['options'] = ['radio', 'air conditioning', 'seat covers']

In [76]:
my_car

{'make': 'honda',
 'model': 'fit',
 'year': '2013',
 'color': 'blue',
 'transmission': 'manual',
 'options': ['radio', 'air conditioning', 'seat covers']}

In [77]:
my_car['mpg'] = 35

In [78]:
my_car

{'make': 'honda',
 'model': 'fit',
 'year': '2013',
 'color': 'blue',
 'transmission': 'manual',
 'options': ['radio', 'air conditioning', 'seat covers'],
 'mpg': 35}

In [79]:
# we can build a catalog or directory of people's cars
deans_car = {'make': 'chevrolet', 'model': 'impala', 'color': 'black'}

In [80]:
# Exercise - create a dictionary for your vehicle (or if you don't drive, a famous movie car)

In [81]:
# now we can build a catalog of nested dictionaries
cars = {}
cars['jon'] = my_car
cars['dean'] = deans_car

In [82]:
cars

{'jon': {'make': 'honda',
  'model': 'fit',
  'year': '2013',
  'color': 'blue',
  'transmission': 'manual',
  'options': ['radio', 'air conditioning', 'seat covers'],
  'mpg': 35},
 'dean': {'make': 'chevrolet', 'model': 'impala', 'color': 'black'}}

In [83]:
# add your car to the catalog
# now how do we get nested values?
# model of Dean's car
cars['dean']['model']

'impala'

In [84]:
# for loops
# once we have data of specific types stored within data structures, what do we do?
# indexing and slicing get tedious when we want to work with every object in a collection

# so we use loops to do things

In [85]:
# syntax
# the value of the loop variable is updated each time the loop runs
# the collection can be any iterable object - a string, list, dictionary, file, etc.

'''
for loop_variable in collection:
    do something
'''


In [87]:
for letter in 'snailshell':
    print(letter)

s
n
a
i
l
s
h
e
l
l


In [88]:
s = 'snailshell'
for letter in s:
    print(letter)

s
n
a
i
l
s
h
e
l
l


In [89]:
for word in ['cat', 'hat', 9, 18, s]:
    print(word)

cat
hat
9
18
snailshell


In [90]:
some_list = ['cat', 'hat', 9, 18, s]

In [91]:
for obj in some_list:
    print(obj)

cat
hat
9
18
snailshell


In [92]:
# for loops for dictionaries are a little different
# we need loop variables for the keys and values, and the 'items()' method

'''
for key, value in dictionary.items():
   do something
'''


In [103]:
# first of all - what is 'items()'?
# the output is a dict_items view containing (key, value) pairs stored as tuples
# we are not really going into tuples today

print(my_car.items())

dict_items([('make', 'honda'), ('model', 'fit'), ('year', '2013'), ('color', 'blue'), ('transmission', 'manual'), ('options', ['radio', 'air conditioning', 'seat covers']), ('mpg', 35)])


In [104]:
print(my_car) # compare with above

{'make': 'honda', 'model': 'fit', 'year': '2013', 'color': 'blue', 'transmission': 'manual', 'options': ['radio', 'air conditioning', 'seat covers'], 'mpg': 35}


In [105]:
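# this is a common error: looping over a dictionary directly yields only its keys,
# so python tries to unpack each key string (e.g. 'make') into key and value, and fails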

for key, value in my_car:
    print(key)
    print(value)

ValueError: too many values to unpack (expected 2)

In [107]:
for key, value in my_car.items():
    print(key, ":", value)

make : honda
model : fit
year : 2013
color : blue
transmission : manual
options : ['radio', 'air conditioning', 'seat covers']
mpg : 35
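The same `items()` pattern works on the nested `cars` catalog built earlier. A sketch, with a trimmed copy of `cars` so the block runs on its own; `get()` handles keys that not every car has:

```python
cars = {
    'jon': {'make': 'honda', 'model': 'fit', 'year': '2013', 'color': 'blue'},
    'dean': {'make': 'chevrolet', 'model': 'impala', 'color': 'black'},
}

summaries = []
# the outer loop visits each owner; each value is itself a dictionary
for owner, car in cars.items():
    # dean's car has no 'year' key, so fall back to 'unknown'
    year = car.get('year', 'unknown')
    summaries.append(owner + ': ' + car['make'] + ' ' + car['model'] + ' (' + year + ')')

for line in summaries:
    print(line)
```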


In [108]:
# another kind of loop is the while loop (we used one earlier to count a up to b) - we may not cover it much today

In [109]:
# finally - conditionals
# evaluate a statement
# do something different depending on whether the statement evaluates to True or False

a = 5
b = -19

# what do we mean by evaluate True or False?
print(a < b)

False


In [110]:
print(a > b)

True
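`<` and `>` are only two of the comparison operators. This sketch lists the others, plus `and`, `or`, and `not` for combining conditions. Note that a single `=` assigns a value, while `==` compares two values:

```python
a = 5
b = -19

# comparison operators each evaluate to True or False
print(a == b)   # False: equal to
print(a != b)   # True: not equal to
print(a >= 5)   # True: greater than or equal to
print(a <= b)   # False: less than or equal to

# 'and', 'or', and 'not' combine or invert conditions
print(a > 0 and b > 0)   # False: both must be True
print(a > 0 or b > 0)    # True: at least one must be True
print(not a > b)         # False
```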


In [115]:
# based on this evaluation, we might do something different

checking = 1000
savings = 20
bills = 250

if checking > bills:
    print('move money into savings!')

move money into savings!


In [117]:
# we can specify an else statement for all other cases
if savings > bills:
    print('move money from savings!')
else:
    print('be sure to avoid an overdraft!')

be sure to avoid an overdraft!


In [112]:
# if we only have one condition, we can use a single if statement
# but we may have multiple conditions to evaluate

checking = 20
savings = 1000
bills = 200

# additional conditions use elif (else if)
# the else statement handles all other cases - no condition needed
if checking > bills:
    print('pay bills!')
elif savings + checking > bills:
    print('move money from savings and pay bills')
else:
    print('tighten that belt!')

move money from savings and pay bills


In [118]:
# python includes more data types and structures than we have introduced here
# but what we have done is enough to develop some powerful workflows
# define variables and data structures, and use logic - loops and conditionals - to do things with them

In [140]:
# we have baby names from Social Security card applications for 2010-2021, stored in one file per year
# can we get the most and least popular names for both sexes?
# Note this dataset records only two sexes (F and M), excluding non-binary genders - for future iterations pick a different dataset

'''
think about the process - we have 12 files of baby names
we need to 

1. get a list of the files so python can open them
2. read each file individually
3. get the most common names per sex
'''

# 1. get a list of files
import glob
flist = sorted(glob.glob('./names/*')) # glob does not guarantee any order, so sort the file names
print(flist)

['./names\\2010', './names\\2011', './names\\2012', './names\\2013', './names\\2014', './names\\2015', './names\\2016', './names\\2017', './names\\2018', './names\\2019', './names\\2020', './names\\2021']


In [141]:
# we need a for loop to do steps 2 and 3
# but we can use slicing to try our code on a subset of the list

for f in flist[:2]:
    print(f)

./names\2010
./names\2011


In [142]:
# reuse some code from above

for file in flist[:2]:
    print(file)
    name_data = []
    # use a different name (f) for the opened file so it doesn't overwrite the loop variable
    with open(file, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            name_data.append(row)
    print(len(name_data), 'names in the file')

./names\2010
34089 names in the file
./names\2011
33923 names in the file


In [148]:
# we can iteratively develop our code and test it
import os

for file in flist:
    name_data = []
    with open(file, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            name_data.append(row)
    # comparing values is a little clumsy with the csv library
    # here's my solution - set a baseline and update it whenever a row's count is higher
    f_max_c = 0
    f_popular = ''
    m_max_c = 0
    m_popular = ''
    for name in name_data:
        count = int(name['count'])  # counts are read as strings, so convert once
        if name['sex'] == 'F':
            if count > f_max_c:
                f_max_c = count
                f_popular = name['name']
        elif name['sex'] == 'M':
            if count > m_max_c:
                m_max_c = count
                m_popular = name['name']
    # clean up the filename a little - basename() drops the folder part on any operating system
    y = os.path.basename(file)
    print(y, 'most popular girl name:', f_popular, '(', f_max_c, ')')
    print(y, 'most popular boy name:', m_popular, '(', m_max_c, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )
2011 most popular girl name: Sophia ( 21855 )
2011 most popular boy name: Jacob ( 20381 )
2012 most popular girl name: Sophia ( 22322 )
2012 most popular boy name: Jacob ( 19091 )
2013 most popular girl name: Sophia ( 21236 )
2013 most popular boy name: Noah ( 18269 )
2014 most popular girl name: Emma ( 20949 )
2014 most popular boy name: Noah ( 19324 )
2015 most popular girl name: Emma ( 20468 )
2015 most popular boy name: Noah ( 19654 )
2016 most popular girl name: Emma ( 19531 )
2016 most popular boy name: Noah ( 19164 )
2017 most popular girl name: Emma ( 19847 )
2017 most popular boy name: Liam ( 18838 )
2018 most popular girl name: Emma ( 18786 )
2018 most popular boy name: Liam ( 19940 )
2019 most popular girl name: Olivia ( 18534 )
2019 most popular boy name: Liam ( 20578 )
2020 most popular girl name: Olivia ( 17641 )
2020 most popular boy name: Liam ( 19777 )
2021 most popular girl nam
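The baseline-and-compare loop above works, but the built-in `max()` function with a `key` argument can do the same search, and `min()` answers the "least popular" half of the question. A sketch on a few sample rows (taken from the 2021 output shown earlier, not the full file); `row_count` is a helper name made up for this example:

```python
# a few rows in the same shape as name_data (values from the 2021 output shown earlier)
name_data = [
    {'name': 'Olivia', 'sex': 'F', 'count': '17728'},
    {'name': 'Emma', 'sex': 'F', 'count': '15433'},
    {'name': 'Zyeire', 'sex': 'M', 'count': '5'},
    {'name': 'Zylar', 'sex': 'M', 'count': '5'},
]

def row_count(row):
    # hypothetical helper: counts are stored as strings, so convert for numeric comparison
    return int(row['count'])

girls = []
boys = []
for row in name_data:
    if row['sex'] == 'F':
        girls.append(row)
    elif row['sex'] == 'M':
        boys.append(row)

# on a tie, max() and min() return the first row encountered,
# just like the strict '>' comparison in the loop above
top_girl = max(girls, key=row_count)
top_boy = max(boys, key=row_count)
least_girl = min(girls, key=row_count)
print(top_girl['name'], top_girl['count'])      # Olivia 17728
print(top_boy['name'], top_boy['count'])        # Zyeire 5
print(least_girl['name'], least_girl['count'])  # Emma 15433
```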

In [None]:
# we now have the most popular girl and boy names for each year since 2010
# How many babies were given the same name as yours each year?
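A possible starting point: a small function that looks up one name and sex in a single year's rows. `babies_named` and `sample_2021` are hypothetical names for this sketch, and the sample rows are copied from the 2021 output shown earlier. The smallest count in that output is 5, so a name missing from a file does not necessarily mean no babies got it; the function returns `None` in that case rather than 0:

```python
def babies_named(name_data, name, sex):
    # hypothetical helper: count for one name and sex in one year's rows, or None if absent
    for row in name_data:
        if row['name'] == name and row['sex'] == sex:
            return int(row['count'])
    return None

# a few rows in the same shape as name_data (values from the 2021 output shown earlier)
sample_2021 = [
    {'name': 'Olivia', 'sex': 'F', 'count': '17728'},
    {'name': 'Emma', 'sex': 'F', 'count': '15433'},
    {'name': 'Zylar', 'sex': 'M', 'count': '5'},
]

print(babies_named(sample_2021, 'Emma', 'F'))     # 15433
print(babies_named(sample_2021, 'Beowulf', 'M'))  # None - not in this sample
```

Inside the `for file in flist:` loop, calling `babies_named(name_data, 'Olivia', 'F')` (with your own name and sex substituted) after each file is read would print one count per year.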