# Programming Fundamentals II - Tools for Data Science 
Agenda today:
- Loops, lists, functions continued
- Lambda function
- List comprehension
- Knowing your data - measurements of central tendency & dispersion

After this class, students will be able to:
- lay out a plan for and write more advanced functions
- understand lambda function syntax and write lambda functions in conjunction with other Python operations
- understand list comprehensions and use them to replace for loops

#### Starting off - reflections from yesterday 

Based on what you learned yesterday and on Monday in _What is Data Science?_, what real-world applications can you think of where the tools you have learned would help you achieve a data science goal?

## Part I - loops continued
- Nested loops 
- While, Break, and Continue

#### 1.1 While, break, and continue
In our last class, we learned about the behavior of while loops. How do we tame them?

In [None]:
i = 1
while i < 6:
    print(i)
    i += 1

What do break and continue do?

In [None]:
i = 0
while i < 6:
    i += 1
    if i == 3:
        print("foo")
        continue
    print(i)

How is the code below different?

In [None]:
i = 0
while i < 6:
    i += 1
    if i == 3:
        print("foo")
        break
    print(i)
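
To contrast the two keywords on a different task, here is a minimal sketch. The list `readings`, the use of `None` for a missing value, and the sentinel `-1` are all made up for illustration. `continue` skips the rest of the current iteration, while `break` leaves the loop entirely.

```python
# Hypothetical data: None marks a missing reading, -1 marks the end of the data
readings = [4, None, 7, 9, -1, 12]

kept = []
i = 0
while i < len(readings):
    value = readings[i]
    i += 1                # advance first, so `continue` cannot cause an infinite loop
    if value is None:
        continue          # skip this reading, but keep looping
    if value == -1:
        break             # stop the loop entirely; 12 is never visited
    kept.append(value)

print(kept)
```

Note that the counter is incremented before `continue` can run; if it were incremented at the end of the loop body, the loop would get stuck on the missing reading.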

#### 1.2 Nested Loops 

In [None]:
lst = [1,2,3,4,5]

for x in lst:
    print('loop1:', x)
    for y in lst:
        print('loop2---', y)

What do you expect to see and why?
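
As a related sketch on different data, nested loops are a common way to build every combination of two collections. The lists `sizes` and `colours` are hypothetical. The inner loop runs to completion once for each item of the outer loop.

```python
# Hypothetical example: pair every size with every colour
sizes = ['S', 'M']
colours = ['red', 'blue', 'green']

pairs = []
for size in sizes:            # outer loop: 2 iterations
    for colour in colours:    # inner loop: 3 iterations per outer iteration
        pairs.append((size, colour))

print(len(pairs))
print(pairs)
```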

## Part II - lambda function 
A lambda function is an anonymous function in Python. The syntax is __lambda arguments : expression__. <br>
We use lambda functions when we require a nameless function for a short period of time.

In Python, we generally use it as an argument to a higher-order function (a function that takes other functions as arguments). Lambda functions are often used with the built-in functions filter() and map(), and with reduce() from the functools module. <br>

Lambda functions can also come in really handy for feature engineering.
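
One everyday higher-order function is the built-in `sorted()`, whose `key` argument takes a function. The sketch below passes a lambda as that key. The `dogs` records are made up for illustration.

```python
# Hypothetical (name, age) records
dogs = [('dolce', 11), ('dengue', 6), ('rex', 9)]

# The lambda tells sorted() to compare records by their second field (age)
by_age = sorted(dogs, key=lambda dog: dog[1])
print(by_age)
```

The lambda is only needed for this one call, which is why it is not given a name with `def`.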

In [None]:
# example 1:
func_1 = lambda x: x+10
print(func_1(10))

In [None]:
def plus_ten(x):
    return x+10

In [None]:
# example 2: with more arguments
func_2 = lambda x,y : x**y 
print(func_2(2,3))

In [None]:
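# using lambda with other operations, such as a conditional expression
# returns True when x is odd (% gives the remainder; // would be floor division)
func_3 = lambda x: False if x % 2 == 0 else True
func_3(9)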

Write a function that converts Fahrenheit to Celsius using a lambda expression. Subtract 32 from the temperature and then multiply by 5/9.

Write a function that converts Celsius to Fahrenheit using a lambda expression. Multiply the temperature by 9/5 and then add 32 to it.

#### Using it in conjunction with map(), reduce(), and filter()

In [None]:
# map applies a function to every element of an iterable (such as a list)
# without map:
# take a list of dog ages and multiply each by 7 to get the age in human years
age_of_dogs = [2,5,10,6,13,18]
age_of_dogs_human_years = []
for age in age_of_dogs:
    age_of_dogs_human_years.append(age*7)
print(age_of_dogs_human_years)

In [None]:
# with map
print(list(map(lambda x: x*7, age_of_dogs)))

In [None]:
# filter - filtering a list of dictionaries
# syntax: filter(function_object, iterable)
# function_object is called for each element of the iterable, and filter returns only those elements for which
# function_object returns True.
dog_dictionary = [{'name': 'dolce', 'age': 11}, {'name': 'dengue', 'age': 6}]
which_dog = list(filter(lambda x : x['name'] == 'dolce', dog_dictionary))
print(which_dog)

In [None]:
# filter through a list
age_of_cats = [12, 15, 30, 25, 30, 27] # i secretly believe cats actually live forever
even_aged_cats = list(filter(lambda x : x % 2 == 0, age_of_cats)) 
print(even_aged_cats)

In [None]:
# reduce: reduce(func, seq) repeatedly applies func() to the running result
# and the next element of seq. It returns a single value.
from functools import reduce
# x + y adds the ages up, so the result is their sum
age_of_cat_sum = reduce((lambda x, y: x + y), age_of_cats)
print(age_of_cat_sum)

In [None]:
#average temperature in summer 
temperature = [68,84,75,73,69,79,64,84,79,84,86,93]

Convert this list of temperatures to Celsius.

A temperature over 85°F is considered too hot. Create a new list that picks out the days when it is too hot.

## Part III - List Comprehension
List comprehensions provide a concise way to create lists instead of using for loops. Syntax: __[expression for item in list]__

In [None]:
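# using for loops

age_of_cats_in_human_years_fl = []
for age in age_of_cats:
    # multiply by 7 to convert to human years, and append to the list defined above
    age_of_cats_in_human_years_fl.append(age * 7)
print(age_of_cats_in_human_years_fl)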


In [None]:
print(age_of_cats)
age_of_cats_in_human_years = [cat*7 for cat in age_of_cats]


In [None]:
# using list comprehension with other operations such as conditionals
num_list = [cat*7 for cat in age_of_cats if cat % 2 == 0]
print(num_list)

In [None]:
# list comprehension used in conjunction with conditionals -> like filtering
age_of_cats_in_human_years_still_alive = [cat for cat in age_of_cats if cat*7 < 100]
age_of_cats_in_human_years_still_alive

Good resources for list comprehension:
- [Datacamp Tutorial](https://www.datacamp.com/community/tutorials/python-list-comprehension)
- [Map, Filter and Reduce (Python Tips)](http://book.pythontips.com/en/latest/map_filter.html)

## Part IV - Get to know your data: central tendency and dispersion

### Measurement of Central Tendency
Mean, Median, and Mode 
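
As a minimal sketch, all three measures are available in Python's standard `statistics` module. The sample `values` is made up for illustration.

```python
import statistics

# Made-up sample for illustration
values = [2, 10, 10, 20, 30]

mean_value = statistics.mean(values)      # arithmetic average: sum / count
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```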
____

### Measurement of Dispersion
- __Absolute Deviation__

The simplest measure of dispersion, calculated by taking the absolute difference between a value and the mean.
E.g. in the list [2,10,20,30], the absolute deviation of 2 is |[(2+10+20+30) / 4] - 2| = |15.5 - 2| = 13.5
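
The same calculation can be sketched with a list comprehension, applied to every value in the list rather than only to 2. The variable names are chosen for illustration.

```python
data = [2, 10, 20, 30]

mean = sum(data) / len(data)                   # 15.5
abs_deviations = [abs(x - mean) for x in data]  # distance of each value from the mean

print(mean)
print(abs_deviations)
```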

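- __Variance__
The variance is calculated by taking the squared difference of each value from the mean, adding them all up, and dividing by the number of values

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2$$

E.g. σ² of [2,10,20,30] is 110.75. Dividing by n - 1 instead of n (the sample variance) gives 147.67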
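
The sketch below computes the variance of the same list by hand and with the standard `statistics` module. Note that the result depends on the divisor: dividing by n, as in the formula above, gives 110.75, while dividing by n - 1 (the sample variance) gives about 147.67.

```python
import statistics

data = [2, 10, 20, 30]
mean = sum(data) / len(data)

# Population variance by hand: average of the squared differences from the mean
pop_var = sum((x - mean) ** 2 for x in data) / len(data)

print(pop_var)                              # divides by n
print(statistics.pvariance(data))           # population variance, also divides by n
print(round(statistics.variance(data), 2))  # sample variance, divides by n - 1
```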

- __Standard Deviation__
The square root of variance. 
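
A minimal sketch of that relationship, using the population variance (the 1/n version) of the list from the examples above:

```python
import math
import statistics

data = [2, 10, 20, 30]

# Standard deviation as the square root of the population variance
pop_std = math.sqrt(statistics.pvariance(data))

print(round(pop_std, 2))                  # about 10.52
print(round(statistics.pstdev(data), 2))  # the library function gives the same value
```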

Why is standard deviation better than variance?
____

- __Quantile__

Quantiles are points in a distribution that relate to the rank order of values in that distribution. We can find any quantile by sorting the sample. The middle value of the sorted sample (middle quantile, 50th percentile) is known as the median. The limits are the minimum and maximum values. Any other locations between these points can be described in terms of percentiles.
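
As a small sketch of the sorting idea, the limits and the median can be read off a sorted copy of the sample. The unsorted `sample` is made up for illustration.

```python
import statistics

# Made-up, unsorted sample for illustration
sample = [30, 2, 20, 10]
ordered = sorted(sample)              # rank order: [2, 10, 20, 30]

minimum = ordered[0]                  # lower limit
maximum = ordered[-1]                 # upper limit
median = statistics.median(ordered)   # 50th percentile; averages the two middle values here

print(minimum, median, maximum)
```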

- __Quartile__

The quartiles of a data set divide the data into four equal parts, with one-fourth of the data values in each part. The second quartile is the median of the data set, which divides the data set in half, as shown for a simple dataset below.

<img src="attachment:Screen%20Shot%202019-04-24%20at%209.45.40%20AM.png" width="400">
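
In code, one way to sketch this is with `statistics.quantiles` from the standard library (Python 3.8 or later). With `n=4` it returns the three cut points between the four parts. This is an illustration only: several conventions exist for interpolating quartiles, so other tools may report slightly different first and third quartiles for the same data.

```python
import statistics

data = [2, 10, 20, 30]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)
print(q2 == statistics.median(data))  # the second quartile is the median
```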