## 🐍 Intro to Python 

Python is a programming language that we can use to do a lot of different things!

Here's how I use it in my day-to-day as a data reporter:
* Analyze and visualize data
* Build websites and interactives 
* Scrape websites
* Build and manage databases 
* Scrape PDFs for information 

In this session, we are going to go over the basics of Python. 

#### Goals
I hope that you come away from the class knowing enough that you can learn to use Pandas, a powerful data analysis library. Also, I want you to know enough to be able to google what you want to do in Python and be able to understand what you find. 

#### Who am I? 
Hi! I am Will Craft. Pronouns he/him. 

💻 email: wcraft@apmreports.org 

🐦 @craftworksxyz 


## 🐍 Why use Python over Excel? 
Excel is good for some tasks, but ...

#### Python is versatile 
I do almost all my work in Python. Here's a sample workflow from a recent project:

Get scanned PDF of data from local utility service -> Write Python code to pull text data out of the PDF -> Write code to parse the text data and turn it into a CSV -> Use Python to analyze the data -> Visualize findings for publication 

You can use Python to scrape the web, make visuals, do statistics, write up the results of an investigation, build a website, and a whole lot more. 

#### Python is reproducible
*Reproducibility* is important.

Work done in Excel is hard to reproduce. With Python, I can hand someone my original data and my code and they can re-create all my work with a few keystrokes. This is not possible with Excel unless you take meticulous notes about all the changes you make. Reproducibility reduces errors and allows for fact-checking.

If I get new data or I make a change to the data, I can run the same script and update my results. 

#### Python is shareable 
We are working in a Jupyter notebook, a program for writing Python in bite-sized chunks and sharing the results. You can share your code with co-workers and the public. 

#### Lots of resources online for learning
I do a lot of learning on the job. There are tons of resources for getting started:
* NICAR: https://ire.org/resource-center/tipsheets/?q=python 
* Ben Welsh: http://www.firstpythonnotebook.org/
* Django tutorials: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Tutorial_local_library_website
* Other data journalists (Lonely Coders Club/News Nerdery) 


## 🐍 Herpetology 101

There are several different versions of Python, because Python itself is constantly evolving, with new features being added and changes being made to the language.

We are using Python 3.7 and a program called pipenv. There are lots of packages that people have written to do different things, and these packages sometimes need different versions of Python and might conflict with one another. Pipenv creates and manages a virtual environment, a self-contained sandbox to install different packages. 

We are going to be working in a jupyter notebook. Jupyter is a program that allows us to write, test, and share python code in handy cells.

Let's start our notebook. Open a terminal and navigate to `~/Desktop/hands_on_classes/python_1_the_fundamentals_repeat_1651`. 

Type `pipenv run jupyter lab` into the terminal, and it will open up a notebook in a browser. 

You can run the code in a cell by hitting shift+enter.

## Variables
Type something into the cell below

In [None]:
snake

You probably got some kind of error. I got `NameError: name 'snake' is not defined`. 

Here are some tips for when you get error messages in the future:
* Proofread your code. Errors from misspellings, misplaced punctuation, etc. are common
* Google the error message 


Our problem here is that `snake` is not defined. Python thought it was a variable, but we haven't created a variable called `snake`, so we got an error.

In [None]:
# Let's create a variable
# A variable name can be almost any set of letters, numbers, and underscores (No spaces, and it can't start with a number!)

# Pikachu was the name of a pet snake I had as a kid :) 
my_favorite_snake = pikachu

# What do you think will happen when I run this?


In [None]:
# What went wrong? 
# Same error! 
# Python sees a set of letters and expects a variable. We need to put the information in quotation marks

my_favorite_snake = "pikachu"

# We want Python to show us something. Let's tell Python to print what we want

print(my_favorite_snake)

## Data Types

Python has several different data types. 
* Strings -> character data. Strings start and end with quotation marks. 
* Integers -> whole numbers
* Float -> floating point numbers. Numbers with decimal points
* Booleans -> True or False (capitalization matters!) 
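If you're ever unsure which data type you're holding, the built-in `type()` function will tell you. Here's a minimal sketch; the variable names and values are made up for illustration.

```python
# Hypothetical example values, one per data type
paper_name = "The Oregonian"   # string: text inside quotation marks
circulation = 140000           # integer: a whole number
share_of_city = 0.24           # float: a number with a decimal point
is_daily = True                # boolean: True or False, capitalized

# type() reports the data type of whatever you hand it
print(type(paper_name))
print(type(circulation))
print(type(share_of_city))
print(type(is_daily))
```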

Things to keep in mind:
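* some words are reserved by Python and can't be used as variable names (`def`, `import`, `from`, `in`, `while`, `for`, `if`, `elif`, `else`, and many more). Built-in function names like `print` and `input` aren't reserved, but reusing them as variable names overwrites the function, so avoid those too.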
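If you're not sure whether a word is off limits, Python can tell you. Here's a small sketch using the standard-library `keyword` module. Note that `print` is a built-in function rather than a reserved keyword, so Python will let you reassign it, though it's best not to.

```python
import keyword

# Every reserved word in the version of Python you're running
print(keyword.kwlist)

# Check individual words
for_is_reserved = keyword.iskeyword("for")
print_is_reserved = keyword.iskeyword("print")

print(for_is_reserved)
print(print_is_reserved)
```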

### Strings

Python treats anything between quotation marks as a string. Strings are text data, basically.

In [None]:
# Python has a lot of built-in functions for handling strings. 

city_hall = "1 Frank H Ogawa Plaza, Oakland, CA 94612"

In [None]:
# We can get the length of the string with len()

print(len(city_hall))

In [None]:
# We can break the string into pieces based on a separator. Here we split on commas. 
city_hall.split(",")

Notice the output of `city_hall.split(",")`. Each piece is its own string and they are all in something called a list, which we will get to soon.
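The separator you hand to `split` changes the pieces you get back. Here's a quick comparison using the same address: no separator (which splits on whitespace), a comma, and a comma followed by a space.

```python
city_hall = "1 Frank H Ogawa Plaza, Oakland, CA 94612"

# With no separator, split() breaks the string on whitespace
by_space = city_hall.split()

# Splitting on "," leaves a leading space on the later pieces
by_comma = city_hall.split(",")

# Splitting on ", " removes the comma and the space together
by_comma_and_space = city_hall.split(", ")

print(by_space)
print(by_comma)
print(by_comma_and_space)
```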

In [None]:
# We can add strings together 

first_half = "I hate snakes"

second_half = "that eat kittens."


# Will this give us any problems? 
print(first_half+second_half)

In [None]:
print(first_half+ ' '+ second_half)

In [None]:
# You can call variables in strings with something called an f-string

number_of_registered_voters = 1556
number_of_votes = 556

print(f"There were {number_of_votes} votes and there were {number_of_registered_voters} registered voters")

In [None]:
# You can even do math in f-strings
# I've used this to generate paragraphs of text for different cities where the only difference is the numbers. 
# Very handy for quickly generating a lot of text for accompanying visuals

print(f"If {number_of_votes} people voted and there were {number_of_registered_voters} registered voters, \
then {round(number_of_votes/number_of_registered_voters * 100)}% of registered voters voted")
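F-strings can also control how a number is displayed. This sketch reuses the vote counts above and adds a format specifier after a colon inside the curly braces: `.1f` keeps one decimal place and `,` adds thousands separators. The variable names `turnout`, `one_decimal`, and `with_commas` are made up for this example.

```python
number_of_registered_voters = 1556
number_of_votes = 556

turnout = number_of_votes / number_of_registered_voters * 100

# .1f -> show the number with one digit after the decimal point
one_decimal = f"{turnout:.1f}%"

# , -> insert commas as thousands separators
with_commas = f"{number_of_registered_voters:,}"

print(f"Turnout was {one_decimal} of {with_commas} registered voters")
```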

### Integers and Floats
Whole numbers and floating point numbers 

In [None]:
# What do you think will happen if I run this? 
print("5" + "5")

# What happened? 

In [None]:
# What about this?
print(5 + 5)

Integers and floats, don't need no quotes 

In [None]:
# Python automatically follows order of operations 
print(100/80 + 5 * 2)

The difference between floats and ints does matter. 

You can convert between the two... but be careful. Converting a float to an integer with `int()` drops everything after the decimal point instead of rounding, which can give you results you might not expect! 

In [None]:
float_num = 18.5

# convert to an integer with int() ... what happens? 
print(int(float_num))
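To make the difference concrete, here's a short comparison of `int()`, which chops off the decimal part, and `round()`, which goes to the nearest whole number. The value 18.7 is just an example.

```python
float_num = 18.7  # example value

# int() throws away everything after the decimal point
truncated = int(float_num)

# round() goes to the nearest whole number
rounded = round(float_num)

# float() converts an integer back to a floating point number
back_to_float = float(truncated)

print(truncated, rounded, back_to_float)
```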

### Booleans
There are two booleans, True and False. They need to be capitalized! 

In [None]:
# You assign a variable with a single =
teacher = 'Will Craft'

# You compare with a double equal sign == 
# What will this give us? 
teacher == 'will craft'

Python demands exactness! We are testing if something is exactly equal. 
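When you want a looser comparison, one common approach is to clean up both strings before comparing them. This sketch lowercases both sides with `.lower()` and trims extra spaces with `.strip()`; `typed_name` is a made-up example of messy input.

```python
teacher = 'Will Craft'
typed_name = '  will craft '  # hypothetical messy input

# An exact comparison fails because of the capitalization and extra spaces
exact_match = teacher == typed_name

# Lowercasing and stripping both sides makes the comparison more forgiving
loose_match = teacher.strip().lower() == typed_name.strip().lower()

print(exact_match)
print(loose_match)
```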

In [None]:
# We can string together booleans with a couple keywords: `and` and `or`

print(True and True)
print(True and False)
print(True or False)
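In practice, the booleans you combine usually come from comparisons rather than being typed out. Here's a sketch using the vote counts from earlier; it also uses `not`, a third keyword that flips a boolean. The names `has_votes` and `majority_voted` are made up for this example.

```python
number_of_registered_voters = 1556
number_of_votes = 556

# Each comparison produces a boolean
has_votes = number_of_votes > 0
majority_voted = number_of_votes > number_of_registered_voters / 2

both = has_votes and majority_voted      # True only if both are True
either = has_votes or majority_voted     # True if at least one is True
no_majority = not majority_voted         # flips False to True

print(both, either, no_majority)
```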

## Lists

Lists are ordered groups of elements enclosed in square brackets.


In [None]:
# Here's a list of months. 
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November"]

In [None]:
# We can find out how long the list is. The function len() tells us how many elements are in the list. 
len(months)

In [None]:
# Something is wrong with our list! Let's add an element. 
# Notice we call append at the end of our variable, after a dot. Don't be afraid to look up different functions and how to use them! 
months.append('December')

print(months)

In [None]:
# We can get the element at a given position in the list by using indexing. 

# What do you think will happen if we do this? 
print(months[0])

In [None]:
# What about this? 
print(months[1])

In [None]:
# Can we go backwards? 
print(months[-1])

In [None]:
# I only want a part of the list 
print(months[0:6])

In [None]:
# We can save that selection as its own variable! 
first_half_of_months = months[0:6]
print(first_half_of_months)

In [None]:
# What if you know the thing you want but not the position in the list???
print(months.index('August'))
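One thing to watch for: `index` raises a `ValueError` if the element isn't in the list. A simple way to check first is the `in` keyword, which returns a boolean. This sketch uses a shortened list and a made-up month name to show the idea.

```python
months = ["January", "February", "March"]  # shortened list for this example

# `in` tells us whether an element is in the list
has_march = "March" in months
has_smarch = "Smarch" in months  # hypothetical misspelling

print(has_march)
print(has_smarch)

# index() is safe to call once we know the element is there
march_position = months.index("March")
print(march_position)

# months.index("Smarch") would stop the program with a ValueError
```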

## Dictionaries 

Dictionaries are like lists, but instead of storing a sequence of individual elements, dictionaries store pieces of information that are connected, called key-value pairs. 

Think of the keys as column names and the values as the contents of the cells. 

In [None]:
# Curly braces! Not brackets 
my_dict = { "city": "Newport Beach", "state": "California", "county": "Orange", "zip": 92657}

print(my_dict)

In [None]:
# Get a single value from the dictionary 

print(my_dict['city'])

In [None]:
# Let's look at all the keys and all the values 

print(my_dict.keys())

print(my_dict.values())

In [None]:
# We can check to see if there is a key in the dictionary 

print('weather' in my_dict)

In [None]:
# We can add keys and values to the dictionary. Notice that it's a bit different than adding an element to a list. 

my_dict['weather'] = 'great!'
print(my_dict)
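Asking for a key that doesn't exist with square brackets stops the program with a `KeyError`. A gentler option is the dictionary's `get` method, which returns `None` (or a default you choose) when the key is missing. Here's a sketch; the `"population"` key is a made-up example of a missing key.

```python
my_dict = {"city": "Newport Beach", "state": "California", "county": "Orange", "zip": 92657}

# get() returns the value when the key exists
county = my_dict.get("county")

# ... and None when it doesn't, instead of raising a KeyError
population = my_dict.get("population")

# A second argument sets the value to fall back on
population_or_default = my_dict.get("population", "unknown")

print(county, population, population_or_default)
```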

## For Loops and if else statements 

For loops and if/else statements are very important! 

A for-loop iterates through every element in a list. 

They look like this: 

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


# indentation matters in a for loop. Many Python programs will automatically indent for you
for number in numbers:
    print(f"{number} is a {type(number)}")
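Loops get more useful when you carry a value from one pass to the next. This sketch adds up the same list of numbers by updating a variable, here called `total`, on each trip through the loop.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Start the running total at zero before the loop begins
total = 0

for number in numbers:
    # Each pass adds the current number to the running total
    total = total + number

# This line is not indented, so it runs once, after the loop finishes
print(total)
```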

    

If/else statements, like in Excel, do one thing if some condition is met, and something else if the condition isn't met. 
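On its own, without a loop, an if/else statement might look like this minimal sketch. It reuses the vote counts from the f-string examples; the 50 percent cutoff and the wording of the messages are made up for illustration.

```python
number_of_registered_voters = 1556
number_of_votes = 556

turnout = number_of_votes / number_of_registered_voters * 100

# Only one of the two indented blocks will run
if turnout >= 50:
    message = "At least half of registered voters turned out"
else:
    message = "Fewer than half of registered voters turned out"

print(message)
```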

In [None]:
for number in numbers: 
    if number%2==0:
        print(f"{number} is ... EVEN!")
    # elif is short for "else if"
    # As soon as the condition is met, the code executes and moves to the next element of the list 
    elif number == 5: 
        print(f"{number} is ... the number 5")
    else:
        print(f"{number} is ... ODD!")
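Because Python stops at the first condition that is met, the order of your checks matters. In this sketch, the `elif` tests for the number 4, but 4 is even, so the first check always catches it and the `elif` block never runs. The `labels` list is a made-up name used to collect the results.

```python
labels = []

for number in [4, 5]:
    if number % 2 == 0:
        labels.append("even")
    elif number == 4:
        # Never reached: 4 already matched the check above
        labels.append("four")
    else:
        labels.append("odd")

print(labels)
```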

## All together now
Now let's put everything we've learned together and write some code!

Here's a list of dictionaries. Each dictionary has details on a city, the city's newspaper, the paper's circulation, the city's population, etc.

What are some things we can find out from this information? 

In [2]:
newspaper_details  = [
    {"city": "Seattle", 
     "paper_name": "The Seattle Times", 
     "circulation": "229,764", 
     "city_population": 608660, 
     "radio_station": "KUOW", 
     "radio_listeners": 375800},
    {"city": "Portland", 
     "paper_name": "The Oregonian", 
     "circulation": "140,000", 
     "city_population":583776, 
     "radio_station": "KOBP", 
     "radio_listeners": 380000},
    {"city": "Denver", 
     "paper_name": "The Denver Post", 
     "circulation": "253,261", 
     "city_population": 704621, 
     "radio_station": "KCFM", 
     "radio_listeners": 68000 },
    {"city": "Minneapolis", 
     "paper_name": "The Star Tribune", 
     "circulation": "288,315", 
     "city_population":  422331, 
     "radio_station": "MPR", 
     "radio_listeners": 1000000},
    {"city": "Boston", 
     "paper_name": "The Globe", 
     "circulation": 245572, 
     "city_population": 617594, 
     "radio_station": "WGBH", 
     "radio_listeners": 1438300}
    
]

Which city has the highest per capita circulation? Which has the highest per capita listenership? 

What else could you learn? 

In [17]:
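for dictionary in newspaper_details:
    pop = dictionary['city_population']
    # Some circulation figures are stored as strings with commas, others as integers
    if isinstance(dictionary['circulation'], int):
        circ = dictionary['circulation']
    else:
        # Strip the commas so the string can be converted to an integer
        circ = int(dictionary['circulation'].replace(',',''))
    # Circulation per 1,000 residents
    circ_rate = circ/pop * 1000
    
    print(f"{dictionary['city']} has {circ_rate} subscribers per 1000 people")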

Seattle has 377.49153879012914 subscribers per 1000 people
Portland has 239.81801238831332 subscribers per 1000 people
Denver has 359.42868577575746 subscribers per 1000 people
Minneapolis has 682.6754370387208 subscribers per 1000 people
Boston has 397.6269199506472 subscribers per 1000 people
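The listenership question can be tackled the same way. Here's one possible sketch, not the only way to do it: it copies just the population and listener figures from the data above into a shorter list called `stations` (a made-up name), computes listeners per 1,000 residents, and keeps track of the highest rate as it loops.

```python
# A trimmed copy of the data above, keeping only what this example needs
stations = [
    {"city": "Seattle", "city_population": 608660, "radio_listeners": 375800},
    {"city": "Portland", "city_population": 583776, "radio_listeners": 380000},
    {"city": "Denver", "city_population": 704621, "radio_listeners": 68000},
    {"city": "Minneapolis", "city_population": 422331, "radio_listeners": 1000000},
    {"city": "Boston", "city_population": 617594, "radio_listeners": 1438300},
]

# Track the best rate seen so far and the city it belongs to
top_city = None
top_rate = 0

for station in stations:
    rate = station["radio_listeners"] / station["city_population"] * 1000
    print(f"{station['city']} has {round(rate)} listeners per 1000 people")

    # If this city beats the best so far, it becomes the new leader
    if rate > top_rate:
        top_rate = rate
        top_city = station["city"]

print(f"{top_city} has the highest per capita listenership")
```

A rate above 1,000 means more listeners than city residents, which is worth a closer look at how the listener numbers were counted before drawing conclusions.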
