# Lesson 1: Intro to Python

## Variables and Types

### Calculations
Python allows you to make calculations, just like a regular calculator:

In [6]:
# Addition
3+1+1

5

In [2]:
# Multiplication
3*4

12

In [3]:
# Order of operations
(3+4)*4

28
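Besides `+`, `*` and parentheses, Python has a few other arithmetic operators worth knowing. The sketch below uses illustrative values. Note that exponentiation is written `**`, not `^`; in Python, `^` means bitwise XOR.

```python
division = 7 / 2          # true division always returns a float
floor_division = 7 // 2   # floor division rounds down to a whole number
remainder = 7 % 2         # modulo gives the remainder of the division
power = 3 ** 2            # exponentiation uses **

print(division, floor_division, remainder, power)   # 3.5 3 1 9
```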

### Variables: Storing Values
For more complex calculations, you can "save" values in variables.

Imagine you need to calculate the properties of a cylindrical can whose radius (*r*) is 1.82364 and height (*h*) is 2.984757.

Measurements:
- **Volume**: $V = \pi r^2 h$
- **Surface Area**: $A = 2\pi r h + 2\pi r^2$

Typing out the full values of *r* and *h* every time they appear in these equations would be tedious and error-prone.

Instead, we can assign the radius and height to variables and write the equations in terms of those variables.


In [4]:
# Store the values of the radius, height and pi into variables
r = 1.82364
h = 2.984757
pi = 3.14  # a rounded approximation of pi

# We may now calculate the volume and surface area using normal math

Vol = pi*(r**2)*h
SA = (2*pi*r*h)+(2*pi*(r**2))

### Display your information: print()
`print()` is a function that displays the results of your code.

In [5]:
print('The volume of your cylinder is', Vol, 'and the surface area is', SA)

The volume of your cylinder is 31.168567775748336 and the surface area is 55.0679704599024
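The value `pi = 3.14` used above is a rounded approximation. A minimal sketch, assuming you want more precision: the standard-library `math` module provides `math.pi`. The names `vol_precise` and `sa_precise` are illustrative and chosen so that `Vol` and `SA` are not overwritten. The results come out slightly larger than the ones printed above, because 3.14 is a little smaller than pi.

```python
import math

r = 1.82364
h = 2.984757

# math.pi holds pi to double precision, unlike the rounded value 3.14
vol_precise = math.pi * r**2 * h
sa_precise = 2 * math.pi * r * h + 2 * math.pi * r**2

print('Volume:', round(vol_precise, 2), 'Surface area:', round(sa_precise, 2))
```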


### Manipulating Variables
You can also manipulate variables to update their values. In the example below, you will update the values of *r* and *Vol*.

In [6]:
# Increase the radius of the cylinder by 3 

r += 3  # this is shorthand for r = r + 3

# Calculate the new volume with the larger radius
Vol = pi*(r**2)*h

# Print the newly calculated volume
print('The volume for the new cylinder is',Vol)

The volume for the new cylinder is 218.06622388899157
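`+=` is one of several augmented assignment operators, each combining an arithmetic operation with an assignment. A short sketch with an illustrative variable `count`:

```python
count = 10
count -= 4   # same as count = count - 4  -> 6
count *= 3   # same as count = count * 3  -> 18
count /= 2   # same as count = count / 2  -> 9.0 (/ always gives a float)

print(count)   # 9.0
```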


### Types
- All values in Python have a specific type.
    - There are many data types in Python, but the most common are:
        - **Int** (`int`): whole numbers with no decimal point. *e.g., 2 or -11*
        - **Float** (`float`): real numbers with a decimal point. *e.g., 65.23 or 0.854*
        - **String** (`str`): a sequence of characters, such as a word. *e.g., 'Music' or 'television'*
        - **Boolean** (`bool`): a value with only two states, `True` or `False` (which behave like 1 and 0 in arithmetic), used to represent logical propositions and to filter data
        
- You can find the type of any value with the `type()` function.


In [7]:
# Create some variables
x = 1.56
y = 2
z = x + y

# Print variable type
print(type(x))
print(type(y))
print(type(z))

<class 'float'>
<class 'int'>
<class 'float'>


In [8]:
# Create a variable
a = 'this is a string'
# Print variable type
type(a)

str

In [9]:
# Boolean type

print(type(True))
print(type(False))

<class 'bool'>
<class 'bool'>
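Values can also be converted from one type to another with the built-in functions `int()`, `float()` and `str()`, and Booleans behave like 1 and 0 in arithmetic. The sketch below uses illustrative variable names. Converting a numeric string such as `'74'` to an int is the same idea the NumPy lab later applies to a whole array with `astype(int)`.

```python
height_text = '74'           # a number stored as text (a str)
height = int(height_text)    # convert the str to an int
print(height + 1)            # 75

print(float(2))              # 2.0
print(int(2.9))              # 2: int() drops the decimal part, it does not round
label = str(3.5)             # convert a float to the str '3.5'
print(type(label))           # <class 'str'>

# Booleans behave like 1 (True) and 0 (False) in arithmetic
print(True + True)           # 2
print(int(False))            # 0
```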


### Python Lists
- A list is just another data type. This data type allows us to work with multiple data points at the same time.

- In data science you will often work with many data points. Imagine that you are building a machine learning model that takes your friends' heights as input. It would be very inconvenient to create a separate variable for every person and then refer to each variable one by one.

- Instead what you could do is create a list to simplify this process.
- Lists are mutable, meaning that their elements can be changed after the list is created.



In [10]:
# Create a list. Note square brackets
friends_height = [5.5,5.6,5.7,6.1,6.2,4.9]

# Print friends height list
print(friends_height)

[5.5, 5.6, 5.7, 6.1, 6.2, 4.9]


**Sublist**: elements within lists can be of any type (int, float, string). List elements can even be other lists.

In [11]:
#  Sublist Example
sublist_a = [5.5,5.6,[5.7,6.1,6.3]]

# Print sublist
print(sublist_a)

[5.5, 5.6, [5.7, 6.1, 6.3]]


**Index:** elements within a list are indexed according to their location in the list, starting with 0 (zero) from left to right. If you need to access an element within a list, you can use its indexing value. To do this, you type the name of the list with the index value in square brackets. 

**Zero-based indexing:** a way of numbering the elements of a list in which the very first element is assigned index 0.

For example, in `list_b = [a, b, c, d, e]`:

| Element | a | b | c | d | e |
|---|---|---|---|---|---|
| Index | 0 | 1 | 2 | 3 | 4 |

In this example, the index for element c is 2, while the index for element e is 4, and so on. 


In [12]:
# Extract 1st value from friends_height list
friends_height[0]

5.5
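Two more indexing patterns build on zero-based indexing: negative indices count from the end of the list, and square brackets can be chained to reach into a sublist. The lists are re-created here with their original values so the sketch runs on its own.

```python
friends_height = [5.5, 5.6, 5.7, 6.1, 6.2, 4.9]

# Negative indices count from the end: -1 is the last element
print(friends_height[-1])     # 4.9
print(friends_height[-2])     # 6.2

# For a list inside a list, chain square brackets to reach inside the sublist
sublist_a = [5.5, 5.6, [5.7, 6.1, 6.3]]
print(sublist_a[2])           # [5.7, 6.1, 6.3]
print(sublist_a[2][1])        # 6.1
```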

**Slicing:** sometimes you need to extract several elements from a list at once. You can use the slice operator, a colon (`:`), for this.
- Note the slicing syntax `list[start:end]`: the element at `start` is included, but the element at `end` is not [inclusive:exclusive].

In [13]:
#Reprint Friends Height List
print(friends_height)

[5.5, 5.6, 5.7, 6.1, 6.2, 4.9]


In [14]:
# Slicing Example: Extract the first 2 elements of the friends_height list
friends_height[0:2]

[5.5, 5.6]

In the example above, only the elements with index values 0 and 1 were returned. Index 2 is excluded because the end of a slice is exclusive.
 

In [15]:
# Extract all elements from index 4 to the end of the list
friends_height[4:]

[6.2, 4.9]

In [16]:
# Extract all elements from the beginning of the list up to and including index 4 (index 5 is excluded)
friends_height[:5]

[5.5, 5.6, 5.7, 6.1, 6.2]
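Slices also accept negative positions and an optional third value, the step, as in `list[start:end:step]`. A short sketch on a fresh copy of the original height list; the lab below uses a negative slice in the same way (`areas[-4:]`).

```python
friends_height = [5.5, 5.6, 5.7, 6.1, 6.2, 4.9]

# A negative start counts from the end: the last two elements
print(friends_height[-2:])     # [6.2, 4.9]

# A step of 2 takes every second element, starting at index 0
print(friends_height[::2])     # [5.5, 5.7, 6.2]

# Starting at index 1 with a step of 2 picks the other half
print(friends_height[1::2])    # [5.6, 6.1, 4.9]

# A step of -1 walks backwards, reversing the list
print(friends_height[::-1])    # [4.9, 6.2, 6.1, 5.7, 5.6, 5.5]
```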

**List Manipulation:** since lists in Python are mutable, you can update, add, or remove their elements.

In [17]:
# Print the original list
print('This is the original list:', friends_height)

# Update the 6th element (index 5) from the list
friends_height[5] = 7.1

# Print updated list
print('This is the updated list:', friends_height)

This is the original list: [5.5, 5.6, 5.7, 6.1, 6.2, 4.9]
This is the updated list: [5.5, 5.6, 5.7, 6.1, 6.2, 7.1]


In [18]:
# Add an element to the end of the list

# Create a new height element for your new friend
new_friend = [3.1]

# Modify the friends_height list with the newly created variable at the end
friends_height += new_friend

# Print the updated list
print(friends_height)


[5.5, 5.6, 5.7, 6.1, 6.2, 7.1, 3.1]
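The examples above update and add elements; Python lists also have methods for adding and removing single elements. A minimal sketch on a separate, illustrative `heights` list, so that `friends_height` stays unchanged for the cells that follow. These calls use the dot notation described under Objects and Methods.

```python
heights = [5.5, 5.6, 5.7]

heights.append(6.0)      # add one element to the end -> [5.5, 5.6, 5.7, 6.0]
heights.remove(5.6)      # remove the first element equal to 5.6 -> [5.5, 5.7, 6.0]
del heights[0]           # remove the element at index 0 -> [5.7, 6.0]
last = heights.pop()     # remove and return the last element (6.0)

print(heights, last)     # [5.7] 6.0
```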


## Functions and Packages

### Functions
- A **Function** is a piece of reusable code that solves a particular task. Instead of writing the same code yourself every time you need to perform an action, you can call a function that does it.


In [19]:
# Simple function example
# Use the max() function to find the maximum height (or the tallest height) among your friends
print(max(friends_height))

# Print size of list
print('There are',len(friends_height), 'elements in this list')



7.1
There are 7 elements in this list


- You can store a function's result in a variable

In [20]:
# Store the tallest height in a variable called "tallest"
tallest = max(friends_height)

# Print the variable
print('My tallest friend is:', tallest)

# Sort the list and save in another variable
sorted_friends_height = sorted(friends_height)
print("\nThe new sorted list is shown below:")
print(sorted_friends_height)


My tallest friend is: 7.1

The new sorted list is shown below:
[3.1, 5.5, 5.6, 5.7, 6.1, 6.2, 7.1]


- Functions may take multiple inputs:

In [1]:
# Example of multiple inputs using the round() function, store the value in a variable called "z"
z = round(1.9835,2)

# Print the result
print(z)

1.98
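Inputs can be passed by position, as in `round(1.9835, 2)`, or by name (a keyword argument). A short sketch using an illustrative list; `reverse=True` is the same keyword the lab below passes to `sort()`.

```python
heights = [5.5, 5.6, 5.7, 6.1, 6.2, 7.1, 3.1]

# Positional inputs: the number to round, then the number of decimal places
print(round(3.14159, 3))                 # 3.142

# Keyword input: reverse=True sorts from largest to smallest
print(sorted(heights, reverse=True))     # [7.1, 6.2, 6.1, 5.7, 5.6, 5.5, 3.1]
```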


- You don't need to know how a function works internally in order to use it; Python takes care of those details.
- If you want to know more about how a function works, look up its documentation online or call `help()` on it, e.g., `help(round)`.
- You can either create your own functions or use existing ones.
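As a sketch of writing your own function, the cylinder volume formula from earlier can be wrapped with `def`. The name `cylinder_volume` and the default `pi=3.14` are illustrative choices made to match the earlier calculation, not something defined elsewhere in this lesson.

```python
def cylinder_volume(radius, height, pi=3.14):
    """Return the volume of a cylinder, pi * radius**2 * height."""
    return pi * radius**2 * height

# Same inputs as the earlier cylinder example
print(cylinder_volume(1.82364, 2.984757))

# A keyword argument can override the default value of pi
print(cylinder_volume(1, 2, pi=3))   # 6
```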
 

### Objects and Methods
- Everything in Python is an **object**, which is why Python is described as an "object-oriented" programming language
- **Methods** are functions that belong to objects
    - Different objects have specific methods
    - You can apply a method to an object using the dot notation
    
**Dot Notation Example:** to call a method on an object, write the object followed by a period and the name of the method you want to apply. In this example, we use the `index()` method on a list to find the location of an element in that list.

In [22]:
## Dot notation example on a method.
# Apply the index method to the friends_height list
# Find the location of a friend with the height of 6.1 feet in the list
print(friends_height)
friends_height.index(6.1)

[5.5, 5.6, 5.7, 6.1, 6.2, 7.1, 3.1]


3
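Which methods are available depends on the type of the object. A short sketch with illustrative values: strings have `upper()`, lists have `count()`, and asking a list for a string method raises an `AttributeError`.

```python
name = 'Liza'
heights = [5.5, 6.1, 5.5]

# str objects have string methods such as upper()
print(name.upper())          # LIZA

# list objects have list methods such as count()
print(heights.count(5.5))    # 2

# A list has no upper() method, so Python raises an AttributeError
try:
    heights.upper()
except AttributeError as error:
    print('AttributeError:', error)
```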

In [23]:
## Example 2
# The replace() method returns a modified copy of a string
# Define two friend names
friend_a = 'Liza'
friend_b = 'Hailey'

print(friend_a)
print(friend_b)

Liza
Hailey


In [24]:
# Replace 'z' with 's' and store it in a new variable
new_friend_a = friend_a.replace('z','s')

# Replace 'ai' with 'a' and store the result back in the same variable
friend_b = friend_b.replace('ai','a')

# print
print(new_friend_a)
print(friend_b)

Lisa
Haley
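Note that `replace()` does not change the original string: strings are immutable, so string methods return a new string instead. A minimal sketch:

```python
friend_a = 'Liza'

# replace() builds and returns a new string
new_friend_a = friend_a.replace('z', 's')
print(friend_a, new_friend_a)    # Liza Lisa

# Strings cannot be changed in place, so item assignment raises a TypeError
try:
    friend_a[2] = 's'
except TypeError as error:
    print('TypeError:', error)
```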


### Packages
- There is a very large number of methods and types available for Python.
- Putting all of them in the core Python distribution would be messy.
- Instead, they are organized into packages.
- You can think of a package as a directory of Python scripts.
- Python packages are made up of **Modules**
    - Each script is a module, and each module contains functions, methods, and types
- Among the most important packages for data scientists are:
    - NumPy: allows us to work with numbers in array form
    - Matplotlib: allows us to create data visualizations
    - Scikit-Learn: allows us to use machine learning models
- Before you can use a package, you must install it on your system
    - **conda install** installs packages in the Anaconda Python distribution (outside Anaconda, **pip install** is the usual alternative)


In [25]:
conda install numpy

Collecting package metadata (current_repodata.json): ...working... done
Note: you may need to restart the kernel to use updated packages.
Solving environment: ...working... done

# All requested packages already installed.




In [26]:
conda install pandas

Collecting package metadata (current_repodata.json): ...working... done
Note: you may need to restart the kernel to use updated packages.

Solving environment: ...working... done

# All requested packages already installed.



- After installing a package, you must import it into Python before you can use it
- You will use NumPy often in your code, so you can import it and immediately give it the conventional short name "np"

In [27]:
import numpy as np

- At this point, you are ready to use the NumPy package in your code
- In the example below we use numpy to create an array of zeros

In [28]:
# Define a 5x2 matrix with only zeros 
a = np.zeros((5,2))

# print the matrix you defined
print(a)

# Get the number of dimensions and the shape of the array
dimensions = np.ndim(a)
shape = np.shape(a)
print('\nThis is a', dimensions, 'dimensional array of shape' ,shape)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]

This is a 2 dimensional array of shape (5, 2)


You can also create your own arrays and manipulate their data.

In [29]:
# Create a new array
my_array = np.array([[3,5,7],[11,0.2,-2],[-1,-2,1.9]])

# Print array and type
print(my_array)
print(type(my_array))

# Find the value of the middle element of the array (row 1, column 1)
middle = my_array[1,1]
print(middle)

#Find the size of the array
print('Array size:',my_array.size)

[[ 3.   5.   7. ]
 [11.   0.2 -2. ]
 [-1.  -2.   1.9]]
<class 'numpy.ndarray'>
0.2
Array size: 9
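All elements of a NumPy array share a single type (its `dtype`), which is why the integers in `my_array` above were printed as floats such as `3.`. A short sketch with illustrative arrays: mixing numbers with text turns every element into a string, which is why the baseball lab below has to convert columns with `astype(int)`. The exact integer dtype name can vary by platform, so the comments avoid stating it.

```python
import numpy as np

ints = np.array([3, 5, 7])              # only integers
mixed = np.array([3, 0.2, -2])          # integers and a float
with_text = np.array([74, 'Catcher'])   # a number and a string

print(ints.dtype)        # an integer dtype
print(mixed.dtype)       # float64: the integers were converted to floats
print(with_text.dtype)   # a string dtype: 74 became the text '74'
```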


## LAB - Lists
- Given the area measurements of a house, complete the following steps:

In [30]:
# List with locations in a house with their corresponding areas
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]


# Use slicing to create a list of the areas that would belong to the first level of the house. Name this list "downstairs"
downstairs = areas[:6]

# Use slicing to create a list of the areas that would belong to the second level of the house. Name this list "upstairs"
upstairs = areas[-4:]

# Print areas from the first level
print('The rooms downstairs are:',downstairs)

# Print areas from the second level
print('The rooms upstairs are:',upstairs)

# Add a restroom to the downstairs list with an area of 13.5
restroom_area = ["restroom", 13.5]
downstairs += restroom_area

# Print new areas from the first level
print('The updated rooms downstairs are:',downstairs)


The rooms downstairs are: ['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0]
The rooms upstairs are: ['bedroom', 10.75, 'bathroom', 9.5]
The updated rooms downstairs are: ['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'restroom', 13.5]


In [31]:
# Create a copy of the original areas, remove all the room names and sort from largest to smallest

new_areas = areas.copy()  # copy() leaves the original areas list unchanged

new_areas.pop(0)
new_areas.pop(1)
new_areas.pop(2)
new_areas.pop(3)
new_areas.pop(4)
new_areas.sort(reverse = True)

print('The room areas are:')
print(new_areas)

# Think of other ways to achieve the result above without having to individually remove elements

The room areas are:
[20.0, 18.0, 11.25, 10.75, 9.5]
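One way to approach the closing question, sketched here with slicing with a step rather than individual `pop()` calls: the room areas sit at odd indices, so `areas[1::2]` picks them out. The second half illustrates why copying matters: a plain assignment only gives a second name to the same list. The names `original`, `alias` and `copied` are illustrative.

```python
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Start at index 1 and take every second element: only the numbers remain
room_areas = areas[1::2]

# sorted() returns a new sorted list and leaves areas unchanged
print(sorted(room_areas, reverse=True))   # [20.0, 18.0, 11.25, 10.75, 9.5]

# A plain assignment does not copy: both names refer to the same list
original = [1, 2, 3]
alias = original
copied = original.copy()   # an independent copy

alias.append(4)
print(original)   # [1, 2, 3, 4]
print(copied)     # [1, 2, 3]
```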


## LAB - NumPy
- As a data scientist, you will often work with millions of data points
- You will need to be able to interpret that data easily
- NumPy makes it easy to compute basic statistics about your data, such as the mean, median, sum, and standard deviation, and to sort it
- Because NumPy stores data in arrays whose elements all share one type, it can process that data much faster than plain Python lists
- For this lab, you are given a dataset of baseball player information
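To illustrate why arrays help here: arithmetic on a NumPy array is applied to every element at once, with no explicit loop. A sketch, assuming the height column is in inches (as the name `np_height_in` below suggests), using the first five heights from the dataset and the exact conversion 1 inch = 0.0254 m. The variable names are illustrative.

```python
import numpy as np

height_in = np.array([74, 74, 72, 72, 73])   # first five heights in the dataset

# One expression multiplies every element; no loop needed
height_m = height_in * 0.0254

# The equivalent for a plain list needs an explicit loop (here a list comprehension)
height_m_list = [h * 0.0254 for h in [74, 74, 72, 72, 73]]

print(height_m)
print(np.allclose(height_m, height_m_list))   # True
```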



In [32]:
# Import numpy
import numpy as np

# Baseball Dataset
baseball = [['Adam_Donachie','BAL','Catcher','74','180','22.99','Catcher'],['Paul_Bako','BAL','Catcher','74','215','34.69','Catcher'],
         ['Ramon_Hernandez','BAL','Catcher','72','210','30.78','Catcher'],['Kevin_Millar','BAL','First_Baseman','72','210','35.43','Infielder'],
         ['Chris_Gomez','BAL','First_Baseman','73','188','35.71','Infielder'],['Brian_Roberts','BAL','Second_Baseman','69','176','29.39','Infielder'],
         ['Miguel_Tejada','BAL','Shortstop','69','209','30.77','Infielder'],['Melvin_Mora','BAL','Third_Baseman','71','200','35.07','Infielder'],
         ['Aubrey_Huff','BAL','Third_Baseman','76','231','30.19','Infielder'],['Adam_Stern','BAL','Outfielder','71','180','27.05','Outfielder'],
         ['Jeff_Fiorentino','BAL','Outfielder','73','188','23.88','Outfielder'],['Freddie_Bynum','BAL','Outfielder','73','180','26.96','Outfielder'],
         ['Nick_Markakis','BAL','Outfielder','74','185','23.29','Outfielder'],['Brandon_Fahey','BAL','Outfielder','74','160','26.11','Outfielder'],
         ['Corey_Patterson','BAL','Outfielder','69','180','27.55','Outfielder'],['Jay_Payton','BAL','Outfielder','70','185','34.27','Outfielder'],
         ['Erik_Bedard','BAL','Starting_Pitcher','73','189','27.99','Pitcher'],['Hayden_Penn','BAL','Starting_Pitcher','75','185','22.38','Pitcher'],
         ['Adam_Loewen','BAL','Starting_Pitcher','78','219','22.89','Pitcher'],['Daniel_Cabrera','BAL','Starting_Pitcher','79','230','25.76','Pitcher'],
         ['Steve_Trachsel','BAL','Starting_Pitcher','76','205','36.33','Pitcher'],['Jaret_Wright','BAL','Starting_Pitcher','74','230','31.17','Pitcher'],
         ['Kris_Benson','BAL','Starting_Pitcher','76','195','32.31','Pitcher'],['Scott_Williamson','BAL','Relief_Pitcher','72','180','31.03','Pitcher']]

# Convert the list of lists to a NumPy array and print it
np_baseball = np.array(baseball)
print(np_baseball)

# Create np_height_in from np_baseball
np_height_in = np_baseball[:,3]

# Convert the height value type to int
np_height_in = np_height_in.astype(int)

# Print new height array
print(np_height_in)

# Print out the mean of np_height_in
print('The average height is:',np.mean(np_height_in))

# Print out the median of np_height_in
print('The median height is:',np.median(np_height_in))

# Print out the standard deviation of np_height_in
print('The standard deviation:', np.std(np_height_in))

# Print out the correlation between the height and weight columns
np_weight = np_baseball[:,4]
np_weight = np_weight.astype(int)

corr = np.corrcoef(np_height_in, np_weight)  # adding [1,0] after the () yields a single value

print("Correlation:", corr)

[['Adam_Donachie' 'BAL' 'Catcher' '74' '180' '22.99' 'Catcher']
 ['Paul_Bako' 'BAL' 'Catcher' '74' '215' '34.69' 'Catcher']
 ['Ramon_Hernandez' 'BAL' 'Catcher' '72' '210' '30.78' 'Catcher']
 ['Kevin_Millar' 'BAL' 'First_Baseman' '72' '210' '35.43' 'Infielder']
 ['Chris_Gomez' 'BAL' 'First_Baseman' '73' '188' '35.71' 'Infielder']
 ['Brian_Roberts' 'BAL' 'Second_Baseman' '69' '176' '29.39' 'Infielder']
 ['Miguel_Tejada' 'BAL' 'Shortstop' '69' '209' '30.77' 'Infielder']
 ['Melvin_Mora' 'BAL' 'Third_Baseman' '71' '200' '35.07' 'Infielder']
 ['Aubrey_Huff' 'BAL' 'Third_Baseman' '76' '231' '30.19' 'Infielder']
 ['Adam_Stern' 'BAL' 'Outfielder' '71' '180' '27.05' 'Outfielder']
 ['Jeff_Fiorentino' 'BAL' 'Outfielder' '73' '188' '23.88' 'Outfielder']
 ['Freddie_Bynum' 'BAL' 'Outfielder' '73' '180' '26.96' 'Outfielder']
 ['Nick_Markakis' 'BAL' 'Outfielder' '74' '185' '23.29' 'Outfielder']
 ['Brandon_Fahey' 'BAL' 'Outfielder' '74' '160' '26.11' 'Outfielder']
 ['Corey_Patterson' 'BAL' 'Outfielder' 
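`np.corrcoef()` returns a correlation matrix rather than a single number, which is why the comment in the lab code suggests indexing with `[1,0]`. A short sketch on the first five heights and weights from the dataset: the diagonal compares each variable with itself, and the two off-diagonal entries both hold the height-weight correlation.

```python
import numpy as np

height = np.array([74, 74, 72, 72, 73])
weight = np.array([180, 215, 210, 210, 188])

corr_matrix = np.corrcoef(height, weight)
print(corr_matrix.shape)      # (2, 2)

# Diagonal entries compare each variable with itself, so they are 1
print(corr_matrix[0, 0], corr_matrix[1, 1])

# Either off-diagonal entry gives the single height-weight correlation
print(corr_matrix[1, 0])
```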

In [2]:
# List with locations in a house with their corresponding areas
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]


# Use slicing to create a list of the areas that would belong to the first level of the house. Name this list "downstairs"


# Use slicing to create a list of the areas that would belong to the second level of the house. Name this list "upstairs"


# Print areas from the first level


# Print areas from the second level


# Add a restroom to the downstairs list with an area of 13.5


# Print new areas from the first level



In [3]:
# Create a copy of the original areas, remove all the room names and sort from largest to smallest


# Think of other ways to achieve the result above without having to individually remove elements

In [4]:
# Import numpy
import numpy as np

# Baseball Dataset
baseball = [['Adam_Donachie','BAL','Catcher','74','180','22.99','Catcher'],['Paul_Bako','BAL','Catcher','74','215','34.69','Catcher'],
         ['Ramon_Hernandez','BAL','Catcher','72','210','30.78','Catcher'],['Kevin_Millar','BAL','First_Baseman','72','210','35.43','Infielder'],
         ['Chris_Gomez','BAL','First_Baseman','73','188','35.71','Infielder'],['Brian_Roberts','BAL','Second_Baseman','69','176','29.39','Infielder'],
         ['Miguel_Tejada','BAL','Shortstop','69','209','30.77','Infielder'],['Melvin_Mora','BAL','Third_Baseman','71','200','35.07','Infielder'],
         ['Aubrey_Huff','BAL','Third_Baseman','76','231','30.19','Infielder'],['Adam_Stern','BAL','Outfielder','71','180','27.05','Outfielder'],
         ['Jeff_Fiorentino','BAL','Outfielder','73','188','23.88','Outfielder'],['Freddie_Bynum','BAL','Outfielder','73','180','26.96','Outfielder'],
         ['Nick_Markakis','BAL','Outfielder','74','185','23.29','Outfielder'],['Brandon_Fahey','BAL','Outfielder','74','160','26.11','Outfielder'],
         ['Corey_Patterson','BAL','Outfielder','69','180','27.55','Outfielder'],['Jay_Payton','BAL','Outfielder','70','185','34.27','Outfielder'],
         ['Erik_Bedard','BAL','Starting_Pitcher','73','189','27.99','Pitcher'],['Hayden_Penn','BAL','Starting_Pitcher','75','185','22.38','Pitcher'],
         ['Adam_Loewen','BAL','Starting_Pitcher','78','219','22.89','Pitcher'],['Daniel_Cabrera','BAL','Starting_Pitcher','79','230','25.76','Pitcher'],
         ['Steve_Trachsel','BAL','Starting_Pitcher','76','205','36.33','Pitcher'],['Jaret_Wright','BAL','Starting_Pitcher','74','230','31.17','Pitcher'],
         ['Kris_Benson','BAL','Starting_Pitcher','76','195','32.31','Pitcher'],['Scott_Williamson','BAL','Relief_Pitcher','72','180','31.03','Pitcher']]

# Convert baseball to a NumPy array called np_baseball and print it


# Create np_height_in from np_baseball


# Convert the height value type to int

# Print new height array


# Print out the mean of np_height_in


# Print out the median of np_height_in


# Print out the standard deviation of np_height_in


# Print out correlation between height and weight column.


