# Introduction to Python
----------------------------------

This hands-on tutorial is intended to provide an introduction to concepts that are key to programming in Python (or most any other language), through the development of a worked example. Topics to be covered include:

- Interpreted languages
    - Development environments
- Variables
    - Data types
- Data structures
    - Lists
    - Dictionaries
- Control of execution
    - Loops
    - Conditions
 
We will also demonstrate data analysis with Pandas.

We will use Jupyter Notebooks for the tutorial. Learners are encouraged to use the interpreter or development environment of their choice. Python 3 is recommended.

## Python is an interpreted language

Code does not need to be compiled. It is evaluated and executed by an interpreter. 

In [1]:
# Demonstration: Python as a calculator
# Mathematical expressions are evaluated using the standard order of operations (PEMDAS).
3 * ((14 / 7) + 99 - 16)

255.0
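
The result above is `255.0` rather than `255` because `/` always returns a float in Python 3. The following is a minimal sketch of the related arithmetic operators; the variable names are illustrative and are not used elsewhere in the tutorial.

```python
# True division always returns a float in Python 3
true_div = 14 / 7                      # 2.0

# Floor division discards the fractional part
floor_div = 15 // 7                    # 2

# Modulo returns the remainder of a division
remainder = 15 % 7                     # 1

# Parentheses change the order of evaluation
without_parens = 3 * 2 + 99 - 16       # 89
with_parens = 3 * (2 + 99 - 16)        # 255

print(true_div, floor_div, remainder, without_parens, with_parens)
```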

## Variables store values

In [2]:
# Values are assigned to variables using =
a = 5
b = 10
print("The value of 'a' is:", a)
print("The value of 'b' is:", b)

The value of 'a' is: 5
The value of 'b' is: 10


In [3]:
# Variables can be used in expressions
print("Sum of a + b:", a + b)
print("Product of a * b:", a * b)
print("a to the power of b:", a ** b)

Sum of a + b: 15
Product of a * b: 50
a to the power of b: 9765625


In [4]:
# A variable can be reassigned.
# The new value can be computed from the variable's previous value.
t = 84
print("Initial value of t:", t)
t = t + (a * b)
print("Final value of t:", t)

Initial value of t: 84
Final value of t: 134


## Variables have data types

Python supports many different data types. Common data types are

- string (character data)
- integer
- floating point number
- boolean (True/False)

In [5]:
# Use the type() function to find out an object's data type
type(1)

int

In [6]:
type(1.0)

float

In [7]:
type("one")

str

In [8]:
type("1")

str

In [9]:
type(True)

bool

In [10]:
# Python will do its best to coerce data types as needed to evaluate an expression
print(1 + 1.0)
print(type(1 + 1.0))

2.0
<class 'float'>


In [11]:
# But generally expressions that attempt to perform operations on both string and numeric data types will cause an error.
# Troubleshooting tip - this is a common error.
print(1 + "1")

TypeError: unsupported operand type(s) for +: 'int' and 'str'
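
One way to resolve this error is to convert one of the values explicitly, so that both operands have the same type. The sketch below uses the built-in `int()` and `str()` functions; the variable names are illustrative.

```python
# Convert the string "1" to an integer, then add numerically
total = 1 + int("1")

# Or convert the integer 1 to a string, then concatenate
joined = str(1) + "1"

print(total)    # 2
print(joined)   # 11
```

Which conversion is appropriate depends on whether the intended result is a number or a piece of text.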

In [12]:
# Let's use some variables to describe a car
make = "Honda"
model = "Fit"
year = 2013
cyl = 4
transmission = "manual"

In [13]:
# Describe a second car.
# What happens if we reuse the previous variable names?
# One solution is to use more variable names.
make_c2 = "GMC"
model_c2 = "Canyon"
year_c2 = 2006
cyl_c2 = 4
transmission_c2 = "automatic"

In [14]:
# We can use this variable naming schema to describe any number of cars.
# Describe a third vehicle, using a '_c3' suffix after each of the common variable names.
make_c3 = "Honda"
model_c3 = "Rebel"
year_c3 = 2018
cyl_c3 = 1
transmission_c3 = "manual"

## Lists

A list is an ordered, mutable collection of objects.

In [15]:
# Ordered - elements keep the order in which they were added; it does not mean sorted
number_list = [1, 23, 8, 14, 200, 75, 6]
print(number_list)

[1, 23, 8, 14, 200, 75, 6]


In [16]:
# Objects in a list can be any data type
string_list = ["a", "ball", "car", "Jon"]
print(string_list)

['a', 'ball', 'car', 'Jon']


In [17]:
# We can mix data types in a list.
# For example if we create a list of information about one of our cars.
first_car = [make, model, year, cyl, transmission]

In [18]:
print(first_car)

['Honda', 'Fit', 2013, 4, 'manual']


In [19]:
# Mutable
# What if we want to update a value?
# We use the index position of the element we want to replace.
first_car[4]

'manual'

In [20]:
first_car[4] = 'standard'

In [21]:
print(first_car)

['Honda', 'Fit', 2013, 4, 'standard']


In [22]:
# Index positions are used to select or retrieve elements from a list
# Python uses zero-indexing
# Retrieve the first element
number_list[0]

1

In [23]:
# Retrieve the 2nd element, etc.
string_list[1]

'ball'

In [24]:
# Negative index positions work
# Get the last element
print(number_list[-1])

# And the second to last element
print(string_list[-2])

6
car


In [25]:
# Use index positions to slice or subset list elements
# A slice begins at the left index position and goes up to, but does not include, the right index position
number_list[1:3]

[23, 8]

In [26]:
# Start and end positions can be left out if starting from the beginning or going to the end
string_list[:4]

['a', 'ball', 'car', 'Jon']

In [27]:
number_list[3:]

[14, 200, 75, 6]
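
Slices can also use negative positions and an optional third value, the step. The following sketch redefines `number_list` so that it runs on its own; the other variable names are illustrative.

```python
number_list = [1, 23, 8, 14, 200, 75, 6]

# Negative positions count from the end of the list
last_two = number_list[-2:]        # [75, 6]

# A third value sets the step: here, every second element
every_other = number_list[::2]     # [1, 8, 200, 6]

# A step of -1 walks the list backwards
backwards = number_list[::-1]      # [6, 75, 200, 14, 8, 23, 1]

print(last_two)
print(every_other)
print(backwards)

# Slicing returns a new list; the original is unchanged
print(len(number_list))            # 7
```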

## Loops

Often we interact with lists and other iterable objects using *for loops*.

In [28]:
# Use a for loop to iterate over a collection.
# A string is iterable - it is a sequence of characters.
# We use a new loop variable name so that we don't overwrite 'a' from earlier.
for char in 'letter':
    print(char)

l
e
t
t
e
r


In [29]:
# Lists are iterable
for v in first_car:
    print("value:", v)
    print("data type:", type(v))

value: Honda
data type: <class 'str'>
value: Fit
data type: <class 'str'>
value: 2013
data type: <class 'int'>
value: 4
data type: <class 'int'>
value: standard
data type: <class 'str'>


In [30]:
# Loops can be used to modify the value of a variable.
c = 10
for i in range(1, 10):
    print("Initial value of c this time through the loop:", c)
    c += 1

print("Final value of c:", c)

Initial value of c this time through the loop: 10
Initial value of c this time through the loop: 11
Initial value of c this time through the loop: 12
Initial value of c this time through the loop: 13
Initial value of c this time through the loop: 14
Initial value of c this time through the loop: 15
Initial value of c this time through the loop: 16
Initial value of c this time through the loop: 17
Initial value of c this time through the loop: 18
Final value of c: 19


In [31]:
# We can use loops to add elements to a list.
for n in range(2, 20, 2):
    number_list.append(n)

print(number_list)

[1, 23, 8, 14, 200, 75, 6, 2, 4, 6, 8, 10, 12, 14, 16, 18]
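
The two loops above rely on `range()`, which stops before its second argument. That is why the first loop runs nine times rather than ten. As a quick sketch, converting a range to a list shows exactly which values it produces:

```python
# range(start, stop) includes start but excludes stop
one_to_nine = list(range(1, 10))
print(one_to_nine)    # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# range(start, stop, step) counts in increments of step
evens = list(range(2, 20, 2))
print(evens)          # [2, 4, 6, 8, 10, 12, 14, 16, 18]

# With a single argument, counting starts at zero
first_five = list(range(5))
print(first_five)     # [0, 1, 2, 3, 4]
```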


In [32]:
# Use help() to find out more about list methods
#help(list)

In [33]:
# Another type of loop is a while loop.
# Execution continues as long as the condition is True.
x = 10
y = 20
while x < y:
    print("Current value of x:", x)
    # The below is shorthand for x = x + 1
    x += 1

print("Final value of x:", x)

Current value of x: 10
Current value of x: 11
Current value of x: 12
Current value of x: 13
Current value of x: 14
Current value of x: 15
Current value of x: 16
Current value of x: 17
Current value of x: 18
Current value of x: 19
Final value of x: 20


In [34]:
# Make two more car lists.
second_car = [make_c2, model_c2, year_c2, cyl_c2, transmission_c2]
third_car = [make_c3, model_c3, year_c3, cyl_c3, transmission_c3]
print(second_car)
print(third_car)

['GMC', 'Canyon', 2006, 4, 'automatic']
['Honda', 'Rebel', 2018, 1, 'manual']


In [35]:
# We can create an inventory or catalog of our cars by nesting lists.
car_inventory = [first_car, second_car, third_car]

In [36]:
# Note that this is a nested list - a list of lists.
print(car_inventory)

[['Honda', 'Fit', 2013, 4, 'standard'], ['GMC', 'Canyon', 2006, 4, 'automatic'], ['Honda', 'Rebel', 2018, 1, 'manual']]


In [37]:
# In order to select elements, we extend the syntax from above.
# First nested list
print("First element in the nested list:", car_inventory[0])

# First element in the first nested list
print("First element of the first element in the nested list:", car_inventory[0][0])

First element in the nested list: ['Honda', 'Fit', 2013, 4, 'standard']
First element of the first element in the nested list: Honda
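
Nested lists pair naturally with nested loops: an outer loop visits each car and an inner loop visits each detail of that car. The sketch below redefines `car_inventory` so that it is self-contained; the `makes` list is an illustrative addition.

```python
car_inventory = [
    ["Honda", "Fit", 2013, 4, "standard"],
    ["GMC", "Canyon", 2006, 4, "automatic"],
    ["Honda", "Rebel", 2018, 1, "manual"],
]

makes = []
for car in car_inventory:
    # car is one of the inner lists; index 0 holds the make
    makes.append(car[0])
    for detail in car:
        # detail is a single value from the inner list
        print(detail)
    print("---")

print(makes)    # ['Honda', 'GMC', 'Honda']
```

Note that we have to remember that position 0 holds the make. Nothing in the list itself records that.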


## Dictionaries

Our *car_inventory* is a nested list, which contains lists of details about a set of cars. The example was selected as a way to demonstrate lists and list methods, but wouldn't be a very practical way to actually keep an inventory. What are some problems with this approach?

- Fragile (doesn't scale well to adding lots of cars)
- Requires use of many similar variable names
- Make, model, and other descriptors are lost when lists are created

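Dictionaries are another commonly used data structure in Python. A dictionary is a mutable collection of key:value pairs. As of Python 3.7, dictionaries preserve the order in which keys were added, but values are retrieved by key rather than by position.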

In [38]:
# Use curly brackets to create a dictionary.

my_car = {"make": "Honda",
         "model": "Fit",
         "year": 2013,
         "color": "blue"}

In [39]:
# We can inspect the dictionary.
print(my_car)

{'make': 'Honda', 'model': 'Fit', 'year': 2013, 'color': 'blue'}


In [40]:
# Get the keys
my_car.keys()

dict_keys(['make', 'model', 'year', 'color'])

In [41]:
# Get the values
my_car.values()

dict_values(['Honda', 'Fit', 2013, 'blue'])

In [42]:
# Retrieve a particular value by using its key
my_car['color']

'blue'
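
Asking for a key that does not exist, for example `my_car['cyl']`, raises a `KeyError`. Two common ways to avoid that are sketched below: testing for the key with `in`, and using the dictionary's `get()` method with a default value. The default `"unknown"` is an arbitrary choice for illustration.

```python
my_car = {"make": "Honda", "model": "Fit", "year": 2013, "color": "blue"}

# 'in' tests whether a key is present
has_color = "color" in my_car
has_cyl = "cyl" in my_car
print(has_color, has_cyl)    # True False

# get() returns the value if the key exists, otherwise the default
color = my_car.get("color", "unknown")
cyl = my_car.get("cyl", "unknown")
print(color, cyl)            # blue unknown
```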

In [43]:
# Add a new key:value pair
# Note the value in this case is a list - values can be anything, including lists and other dictionaries.
my_car["options"] = ["cd player", "a/c", "power windows"]

In [44]:
my_car

{'make': 'Honda',
 'model': 'Fit',
 'year': 2013,
 'color': 'blue',
 'options': ['cd player', 'a/c', 'power windows']}

In [45]:
# Dictionaries are iterable, though the syntax for using a dictionary in a for loop is different:
for key, value in my_car.items():
    print("Key:", key)
    print("Value:", value)

Key: make
Value: Honda
Key: model
Value: Fit
Key: year
Value: 2013
Key: color
Value: blue
Key: options
Value: ['cd player', 'a/c', 'power windows']


In [46]:
# Now we can create a more useful inventory
car_inventory = {"my_car": my_car}

In [47]:
# CREATE DICTIONARIES FOR THE OTHER CARS WE DESCRIBED ABOVE - DO TOGETHER OR INDIVIDUALLY

In [48]:
another_car = {"make": "GMC",
              "model": "Canyon",
              "year": 2006,
              "color": "grey"}

In [49]:
car_inventory["another_car"] = another_car

In [50]:
# A different way to do the above
car_inventory["deans_car"] = {"make": "Chevrolet",
                             "model": "Impala",
                             "color": "black"}

In [51]:
car_inventory

{'my_car': {'make': 'Honda',
  'model': 'Fit',
  'year': 2013,
  'color': 'blue',
  'options': ['cd player', 'a/c', 'power windows']},
 'another_car': {'make': 'GMC',
  'model': 'Canyon',
  'year': 2006,
  'color': 'grey'},
 'deans_car': {'make': 'Chevrolet', 'model': 'Impala', 'color': 'black'}}
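
Because the inventory is now a dictionary of dictionaries, values can be selected by chaining keys, in the same way that index positions were chained for nested lists. The following sketch redefines the inventory so that it runs on its own; the `descriptions` list is an illustrative addition.

```python
car_inventory = {
    "my_car": {"make": "Honda", "model": "Fit", "year": 2013, "color": "blue",
               "options": ["cd player", "a/c", "power windows"]},
    "another_car": {"make": "GMC", "model": "Canyon", "year": 2006, "color": "grey"},
    "deans_car": {"make": "Chevrolet", "model": "Impala", "color": "black"},
}

# Chain keys (and index positions) to reach nested values
first_option = car_inventory["my_car"]["options"][0]
print(first_option)    # cd player

# Loop over the inventory: each value is itself a dictionary
descriptions = []
for label, car in car_inventory.items():
    descriptions.append(label + ": " + car["make"] + " " + car["model"])

print(descriptions)
```

Unlike the nested list version, each value is labeled by its key, so we do not need to remember which position holds the make or model.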

## Conditionals

We can control which lines of code are evaluated based on conditions.

In [52]:
# If - the only required condition
if 1 > 0:
    print("Greater than")

Greater than


In [53]:
# Else covers all other conditions
if 1 > 2:
    print("Greater than")
else:
    print("Less than or equal to")

Less than or equal to


In [54]:
# use elif - else if - for multiple conditions
# again, the final else is optional; it handles cases where none of the preceding conditions evaluate to True
my_pets = ["heeler", "terrier", "gecko", "bearded dragon"]
your_pets = ["terrier", "poodle", "turtle", "goldfish"]
some_animal = 'poodle'

if some_animal in my_pets:
    print("I have a", some_animal, "!")
elif some_animal in your_pets:
    print("You have a", some_animal, "!")
else:
    print("Does anyone want a", some_animal, "?")

You have a poodle !


In [55]:
# What happens if we change some_animal to "terrier" and rerun the above cell? What do you anticipate? 
# What actually happens? Why?

In [56]:
# To address that we can combine conditions
# During tutorial, just update the cell above

if some_animal in my_pets and some_animal in your_pets:
    print("We both have a", some_animal, "!")
elif some_animal in my_pets:
    print("I have a", some_animal, "!")
elif some_animal in your_pets:
    print("You have a", some_animal, "!")
else:
    print("Does anyone want a", some_animal, "?")

You have a poodle !
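
To see every branch of the combined conditions in action, we can put the conditional inside a loop and test several animals at once. This is a sketch only: the list of animals to test and the `results` list are illustrative additions.

```python
my_pets = ["heeler", "terrier", "gecko", "bearded dragon"]
your_pets = ["terrier", "poodle", "turtle", "goldfish"]

results = []
for some_animal in ["terrier", "gecko", "poodle", "cat"]:
    # The most specific condition is tested first
    if some_animal in my_pets and some_animal in your_pets:
        owner = "both"
    elif some_animal in my_pets:
        owner = "mine"
    elif some_animal in your_pets:
        owner = "yours"
    else:
        owner = "neither"
    print(some_animal, "->", owner)
    results.append(owner)
```

The order of the conditions matters. If the combined `and` test came after the single tests, it would never be reached for "terrier", because the first condition that evaluates to True ends the chain.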


## Pandas

Read and save tabular data from a URL to a file

As an application of what we have done so far, here we demonstrate using variables to download data and save it to a file on our local system.


In [57]:
# need to add functionality to base python - import library
import requests

In [58]:
file_url = "https://raw.githubusercontent.com/unmrds/cc-python/master/tutorials/beowulf_babynames/names/2010"

In [59]:
r = requests.get(file_url)  # dot syntax - get() is a function provided by the requests module; file_url is the argument

In [60]:
# inspect
# print(r.text)

In [61]:
# save to file

with open('2010', 'w') as o:
    o.write(r.text)
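
The `with` statement closes the file automatically once the indented block finishes. The sketch below demonstrates this without a network connection by writing a short stand-in string to a temporary directory and reading it back. The sample text, the file name, and the variable names are all hypothetical.

```python
import os
import tempfile

# Stand-in text; in the tutorial this would be r.text
sample_text = "name,sex,count\nIsabella,F,22925\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_names.csv")

    # 'w' opens the file for writing, replacing any existing content
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write(sample_text)

    # The file was closed when the with block ended
    file_is_closed = out_file.closed

    # 'r' opens the file for reading
    with open(path, "r", encoding="utf-8") as in_file:
        read_back = in_file.read()

print(file_is_closed)
print(read_back == sample_text)
```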

In [62]:
# Before proceeding - check if anyone is not using Anaconda
# demo installing pandas as needed
import pandas as pd

In [63]:
names_2010 = pd.read_csv('2010', encoding='latin1')

In [64]:
names_2010  # inspect the data - note that only the head and tail are displayed; the remaining rows are hidden

            name sex  count
0       Isabella   F  22925
1         Sophia   F  20648
2           Emma   F  17354
3         Olivia   F  17030
4            Ava   F  15436
...          ...  ..    ...
34084    Zymaire   M      5
34085     Zyonne   M      5
34086  Zyquarius   M      5
34087      Zyran   M      5


In [65]:
# other ways to inspect the data - note again this is an important part of a workflow
# not just something we're demonstrating here
names_2010.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34089 entries, 0 to 34088
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    34089 non-null  object
 1   sex     34089 non-null  object
 2   count   34089 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 799.1+ KB


In [67]:
names_2010.head()

       name sex  count
0  Isabella   F  22925
1    Sophia   F  20648
2      Emma   F  17354
3    Olivia   F  17030
4       Ava   F  15436


In [68]:
# attributes - no parentheses
names_2010.shape

(34089, 3)

In [69]:
# descriptive stats
# default is to only show stats for numeric data types
names_2010.describe()

            count
count     34089.0
mean   108.352812
std    697.685909
min           5.0
25%           7.0
50%          11.0
75%          29.0
max       22925.0


In [70]:
# in our case it can be useful to get all stats
names_2010.describe(include='all')

            name    sex       count
count      34089  34089     34089.0
unique     31643      2         NaN
top     Isabella      F         NaN
freq           2  19823         NaN
mean         NaN    NaN  108.352812
std          NaN    NaN  697.685909
min          NaN    NaN         5.0
25%          NaN    NaN         7.0
50%          NaN    NaN        11.0
75%          NaN    NaN        29.0


In [71]:
# we know the data contain 34089 rows - one for each name and sex combination registered with the US SSA in 2010
# the data provide counts by name
# what about counts by sex?
# a one liner!
names_2010.groupby('sex').count()

      name  count
sex
F    19823  19823
M    14266  14266


In [72]:
# that is a total count of names - there were 19823 different girl names registered, 14266 boy names
# what about the total number of boys and girls?
# also a one liner!
# note in this case we select the numeric 'count' column before summing, so that only the counts are added up
names_2010.groupby('sex')['count'].sum()

sex
F    1776223
M    1917416
Name: count, dtype: int64
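
The difference between `count()` and `sum()` is easier to see on a table small enough to check by hand. The sketch below uses a toy DataFrame with made-up names and counts (not values from the SSA file); `toy`, `rows_per_sex`, and `babies_per_sex` are illustrative names.

```python
import pandas as pd

# Toy data: hypothetical names and counts, for illustration only
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan", "Eve"],
    "sex": ["F", "M", "F", "M", "F"],
    "count": [10, 7, 25, 30, 5],
})

# count() tallies the rows in each group: 3 rows for F, 2 rows for M
rows_per_sex = toy.groupby("sex")["name"].count()
print(rows_per_sex)

# sum() adds up the values in each group: 10 + 25 + 5 for F, 7 + 30 for M
babies_per_sex = toy.groupby("sex")["count"].sum()
print(babies_per_sex)
```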

In [73]:
# we can do a lot with one-liners in python
# but for clarity's sake from here on we will use a more verbose style
names_grouped = names_2010.groupby("sex")

In [74]:
# we know the most popular girl name by the way the data are sorted
# what about the boy name?
# the below tells us how many boys had the most popular boy name, but not the name
names_grouped['count'].max()

sex
F    22925
M    22139
Name: count, dtype: int64

In [76]:
names_grouped = names_2010.groupby('sex')
names_grouped.first()  # note this only works because the data are already sorted by count

         name  count
sex
F    Isabella  22925
M       Jacob  22139


In [77]:
# this approach will work with unsorted data
names_sorted = names_2010.sort_values(['count'], ascending=False)
names_grouped = names_sorted.groupby('sex')
names_grouped.first()

         name  count
sex
F    Isabella  22925
M       Jacob  22139
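
An alternative that does not require sorting at all is to ask each group for the index label of its largest count with `idxmax()`, then select those rows with `.loc`. This is a sketch of a different technique from the one used above, shown on a toy DataFrame with made-up values; `toy` and `top_rows` are illustrative names.

```python
import pandas as pd

# Toy data: hypothetical names and counts, deliberately unsorted
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "sex": ["F", "M", "F", "M"],
    "count": [10, 7, 25, 30],
})

# For each sex, idxmax() returns the index label of the row with the largest count
top_index = toy.groupby("sex")["count"].idxmax()

# .loc selects the full rows that carry those index labels
top_rows = toy.loc[top_index]
print(top_rows)
```

With real data, ties are worth keeping in mind: `idxmax()` returns only the first row that reaches the maximum.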
