# Python Workshop Series I - Intro to Python

# Why Python?

Python is an open-source programming language. It is increasingly used in professional and emerging-tech settings. It is one of the top 10 most popular programming languages and recently passed 40 million users.

The popularity of Python stems from its flexibility and its ability to use APIs to route third-party tools into processes. Python also interacts well with many other software packages, including R, QGIS, ArcGIS, Git, and many more.

Some reasons to use Python are:

1. It is free!
2. Many other software packages support it
3. Lots of free resources and guides
4. Great for data visualization
5. Great for automation of tasks

Python is great for programming, coding, and data science...all at the same time! However, its learning curve is steeper than that of many other tools, and it may not be the best option for single-use projects. For data analytics, R is often the simpler choice.

# How do we access Python?

If you came to an install night, you will have had Anaconda installed on your laptop. Anaconda has a console that allows us to access Python. To access the Anaconda console:

- On PC: Go to Start, type 'Anaconda', and click on Anaconda Console

- On Mac:

To install Jupyter Notebooks, we go to our Anaconda Console and type:

pip install jupyter
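
Once the install finishes, one way to confirm that Python can see a package is to ask whether it can be found. The sketch below is only an illustration: the helper name `is_installed` is made up for this example, and it assumes the notebook server is importable under the name `notebook`, which may differ between Jupyter versions.

```python
## Check whether Python can find a package (a sketch, not an official Jupyter check)
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

## 'notebook' is an assumed package name; the result depends on your machine
print('notebook installed:', is_installed('notebook'))

## 'json' ships with Python, so this one should always be found
print('json installed:', is_installed('json'))
```

If the first line prints False, running the install command again from the Anaconda Console is a reasonable next step.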

# Lesson 1 - Variables and Assignment & Loading Data Into Python

- Navigating Jupyter Notebooks
- Typing in Python/Jupyter Notebooks
- Use variables to store values
- Use 'print' to display values
- Variables persist between cells
- Variables must be created before they are used
- Variables can be used in calculations
- Python is case-sensitive
- Use meaningful variable names
- Import Data

# Navigating Jupyter Notebooks

Jupyter Notebooks organize code into cells. To run a cell, press Shift+Enter. This runs the code within the highlighted cell.

You can add a new cell via the 'Insert' toolbar option or set up a shortcut command. Cells can be deleted the same way, and new cells can be inserted between existing ones. There are also many editing options for cells in the 'Edit' toolbar option.

From the drop-down menu above, you can select whether the cell will contain code, formatted text (Markdown), a title (Heading), or plain text (Raw NBConvert).

Jupyter Notebooks can be as simple or as complex as necessary and have many functions. One of their greatest features is how easy they are to share and work in.

# Typing in Python/Jupyter Notebooks

Coding can appear intimidating; however, it is just typing. Thanks to Python, many functions have straightforward names that are easy to use, and thanks to Jupyter Notebooks, much of the code formatting is taken care of for you.

As you go along in Python, use ## to leave notes to yourself about what you are doing (a single # works too, since Python ignores everything after the first # on a line). This is very important coding etiquette!

Let's try typing in Jupyter Notebooks and running Python by using it as a simple calculator.

In [24]:
## Simple Calculator
3+3*5

18

Any simple calculation can be done in Python, and with additional libraries it can also handle advanced calculus, linear algebra, and regression analysis.
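
The calculator cell gives 18 rather than 30 because Python multiplies before it adds. A short sketch of the most common arithmetic operators follows; the variable names are chosen just for this illustration.

```python
## Multiplication happens before addition, so this is 3 + 15
without_parentheses = 3 + 3 * 5
print(without_parentheses)   ## 18

## Parentheses change the order, so this is 6 * 5
with_parentheses = (3 + 3) * 5
print(with_parentheses)      ## 30

## A few other operators
power = 2 ** 10              ## exponent: 1024
division = 7 / 2             ## regular division: 3.5
whole_division = 7 // 2      ## division rounded down: 3
remainder = 7 % 2            ## remainder after division: 1
print(power, division, whole_division, remainder)
```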

# Use variables to store values

Using Python as a simple calculator isn't very exciting and does not tap into the full potential of the language. To explore more capabilities, let's use variables to store values. This idea will be expanded later to entire data sets.

For an application, let's work with kitchen measurements and convert cups, pints, and gallons into fluid ounces.

# Tips for working with variables in Python

Python has several types of data. The three most common are:

- integer numbers
- floating point numbers (numbers with a decimal point), and
- strings (words/characters)
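
As a small sketch of these three types, the cell below creates one value of each and asks Python what it is. The variable names and values are made up for this illustration.

```python
## One example of each common data type
whole_number = 8        ## integer
decimal_number = 0.5    ## floating point number
label = 'cups'          ## string

print(type(whole_number))
print(type(decimal_number))
print(type(label))

## Floats are stored with limited precision, so tiny rounding
## differences can show up in calculations
print(0.1 + 0.2)
```

The last line prints a value very slightly above 0.3, which is normal behavior for floating point numbers in most programming languages.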

You can make comments anywhere in your code with ##.
Comments are essential when writing code: the notes (documentation) you write for yourself are essentially your methodology, and they remind you what each step is doing.

Also, Python is case-sensitive, meaning variables must be spelled exactly right each time, with upper- and lowercase letters in the appropriate spots.

It also helps to give variables names that describe the information they hold, so they are easier to keep track of.
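
To see case sensitivity in action, the sketch below creates two variables whose names differ only in capitalization. The value 12 is a made-up number used only to show that Python treats them as separate variables.

```python
## These are two different variables as far as Python is concerned
cup_ounces = 8
Cup_Ounces = 12   ## hypothetical value, only here to show the difference

print(cup_ounces)    ## 8
print(Cup_Ounces)    ## 12

## Comparing them shows they do not hold the same value
same_value = cup_ounces == Cup_Ounces
print(same_value)    ## False
```

Sticking to one naming style, such as all lowercase with underscores, makes this kind of mix-up less likely.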

In [2]:
## Use variables to store values

## Numeric Value
cup_ounces = 8 

## String Value
ounces_in_a_cup = 'Fluid ounces in one cup:' ## This is a string

In [3]:
## Use print to display values

print(cup_ounces)
print(ounces_in_a_cup)

8
Fluid ounces in one cup:


In [27]:
print(ounces_in_a_cup, cup_ounces) ## We add in the comma to list all the variables we want to display

Fluid ounces in one cup: 8


In [28]:
## Manipulate variables in the print feature | Let's convert cups to a pint (2 cups)
print('Fluid ounces in one pint:', 2*cup_ounces)

Fluid ounces in one pint: 16


Variables must be created before they are used. If they aren't, Python will let us know.

In [29]:
print(ounces_in_gallon)

NameError: name 'ounces_in_gallon' is not defined
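
A NameError normally stops the cell. As a sketch of what is going on, the error can be caught with try/except so the message can be read without the cell failing. This assumes `ounces_in_gallon` has not been created anywhere earlier in the notebook.

```python
## Try to use a variable that was never created
try:
    print(ounces_in_gallon)
except NameError as error:
    ## Python tells us exactly which name it could not find
    message = str(error)
    print('Python says:', message)
```

The usual fix is to create the variable first, or to check the spelling and capitalization of the name.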

In [30]:
cups_in_gallon = 16

Variables can be used in calculations

In [31]:
ounces_in_a_gallon = cup_ounces*cups_in_gallon

In [32]:
print('Fluid ounces in one gallon:',ounces_in_a_gallon)

Fluid ounces in one gallon: 128
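
Once a value is stored in a variable, it can be reused in further calculations. The sketch below repeats the variables from the cells above and adds a made-up amount, `gallons`, to convert. It also uses an f-string, which is one of several ways to mix text and values in print.

```python
## Values from the cells above
cup_ounces = 8
cups_in_gallon = 16
ounces_in_a_gallon = cup_ounces * cups_in_gallon

## Hypothetical amount to convert
gallons = 2.5
total_ounces = gallons * ounces_in_a_gallon

## An f-string places the values of variables inside the text
print(f'Fluid ounces in {gallons} gallons: {total_ounces}')
```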


# Loading Data Into Python

Python is great for working with simple and complex mathematical formulas; however, it is also great for handling data sets, or data frames as they are known in the pandas library.

In the coming segment, we will go over how to import data into Python directly from an internet source.

Let's try importing data on Bob Ross's paintings from FiveThirtyEight.

In [4]:
## Importing pandas, a library with commands for working with data frames
import pandas as pd

## as we import pandas, we can assign it an abbreviation
## so we do not have to type pandas over and over again. Let's just call it
## pd. However, you can call it anything you would like

bob_ross = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bob-ross/elements-by-episode.csv')

## Much like a variable, we create a new data frame by giving it
## a name, in this case bob_ross. Then, we use pandas (pd) to
## read a csv. Pandas can take a url that points to a csv file and
## load the data directly into Python. Here, we are taking the data directly from
## FiveThirtyEight's GitHub
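
To see what read_csv does without needing an internet connection, the sketch below reads a tiny CSV that is typed directly into the cell. The episode names and values are made up for illustration and are not part of the Bob Ross data.

```python
import io
import pandas as pd

## A tiny, made-up CSV: the first line holds the column names
csv_text = """EPISODE,TITLE,MOUNTAIN,TREE
E01,First Painting,1,1
E02,Second Painting,0,1
E03,Third Painting,1,0
"""

## io.StringIO lets pandas treat the text as if it were a file
paintings = pd.read_csv(io.StringIO(csv_text))

print(paintings)
print(list(paintings.columns))
```

The same read_csv call accepts a file path on your computer or a url, as in the cell above.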

In [5]:
## View the data we just imported
bob_ross

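|  | EPISODE | TITLE | APPLE_FRAME | AURORA_BOREALIS | BARN | BEACH | BOAT | BRIDGE | BUILDING | BUSHES | ... | TOMB_FRAME | TREE | TREES | TRIPLE_FRAME | WATERFALL | WAVES | WINDMILL | WINDOW_FRAME | WINTER | WOOD_FRAMED |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | S01E01 | "A WALK IN THE WOODS" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | S01E02 | "MT. MCKINLEY" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | S01E03 | "EBONY SUNSET" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | S01E04 | "WINTER MIST" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | S01E05 | "QUIET STREAM" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | S01E06 | "WINTER MOON" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 6 | S01E07 | "AUTUMN MOUNTAINS" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | S01E08 | "PEACEFUL VALLEY" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | S01E09 | "SEASCAPE" | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | S01E10 | "MOUNTAIN LAKE" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |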


In [35]:
## first 5 rows/observations
bob_ross[:5]

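|  | EPISODE | TITLE | APPLE_FRAME | AURORA_BOREALIS | BARN | BEACH | BOAT | BRIDGE | BUILDING | BUSHES | ... | TOMB_FRAME | TREE | TREES | TRIPLE_FRAME | WATERFALL | WAVES | WINDMILL | WINDOW_FRAME | WINTER | WOOD_FRAMED |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | S01E01 | "A WALK IN THE WOODS" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | S01E02 | "MT. MCKINLEY" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | S01E03 | "EBONY SUNSET" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | S01E04 | "WINTER MIST" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | S01E05 | "QUIET STREAM" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |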
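
Slicing with [:5] is one way to peek at the first rows. pandas also has head and tail methods for the same job. The sketch below uses a small made-up data frame named `numbers` so it can run on its own.

```python
import pandas as pd

## A small, made-up data frame with 10 rows
numbers = pd.DataFrame({
    'n': list(range(1, 11)),
    'n_squared': [n ** 2 for n in range(1, 11)],
})

## Two ways to get the first 5 rows
first_five = numbers[:5]
also_first_five = numbers.head()   ## head() returns 5 rows unless told otherwise

## tail(3) returns the last 3 rows
last_three = numbers.tail(3)

print(first_five)
print(last_three)
```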


In [36]:
## Import a library that does fancy numerical operations
import numpy as np

In [37]:
## We have generated several variables and a data frame.
## To keep track of what we have made, we can ask Python to tell us the 'type'
## of the variable we are looking at
print(type(cup_ounces))
print(type(bob_ross))

<class 'int'>
<class 'pandas.core.frame.DataFrame'>


In [38]:
## We can also have Python easily tell us the dimensions of our data frame
print(bob_ross.shape)

(403, 69)
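
The shape is reported as (rows, columns), so the Bob Ross data has 403 rows and 69 columns. As a sketch, the two numbers can be stored in separate variables. The small data frame below is made up so the cell runs on its own.

```python
import pandas as pd

## A small, made-up data frame: 3 rows and 2 columns
small = pd.DataFrame({
    'TITLE': ['First Painting', 'Second Painting', 'Third Painting'],
    'MOUNTAIN': [1, 0, 1],
})

## shape holds (number of rows, number of columns)
n_rows, n_columns = small.shape
print('Rows:', n_rows)
print('Columns:', n_columns)

## len gives the number of rows; columns lists the column names
print(len(small))
print(list(small.columns))
```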


In [39]:
## Column averages; numeric_only=True skips the text columns EPISODE and TITLE
bob_ross.mean(numeric_only=True)

APPLE_FRAME           0.002481
AURORA_BOREALIS       0.004963
BARN                  0.042184
BEACH                 0.066998
BOAT                  0.004963
BRIDGE                0.017370
BUILDING              0.002481
BUSHES                0.297767
CABIN                 0.171216
CACTUS                0.009926
CIRCLE_FRAME          0.004963
CIRRUS                0.069479
CLIFF                 0.019851
CLOUDS                0.444169
CONIFER               0.526055
CUMULUS               0.213400
DECIDUOUS             0.563275
DIANE_ANDRE           0.002481
DOCK                  0.002481
DOUBLE_OVAL_FRAME     0.002481
FARM                  0.002481
FENCE                 0.059553
FIRE                  0.002481
FLORIDA_FRAME         0.002481
FLOWERS               0.029777
FOG                   0.057072
FRAMED                0.131514
GRASS                 0.352357
GUEST                 0.054591
HALF_CIRCLE_FRAME     0.002481
                        ...   
MOUNTAIN              0.397022
                        ...   

In [45]:
np.mean(bob_ross.MOUNTAIN)

0.3970223325062035

In [23]:
print('Percent of Bob Ross Paintings with Mountains:',np.mean(bob_ross.MOUNTAIN)*100)

Percent of Bob Ross Paintings with Mountains: 39.70223325062035
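
The mean works as a percentage here because each column only holds 0s and 1s: adding up the 1s and dividing by the number of rows gives the share of paintings with that element. The sketch below shows this on a made-up column of five values and rounds the result for easier reading.

```python
import pandas as pd

## Made-up column: 1 means the painting has a mountain, 0 means it does not
has_mountain = pd.Series([1, 0, 1, 1, 0])

## Counting the 1s and dividing by the number of rows...
count_with_mountain = has_mountain.sum()
share_by_hand = count_with_mountain / len(has_mountain)

## ...gives the same answer as the mean
share = has_mountain.mean()

## round keeps one decimal place in the percentage
percent = round(share * 100, 1)
print('Percent of paintings with mountains:', percent)
```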


# Looking Ahead -- Python Workshop Series II

- Integration of APIs
- JSON