In [48]:
import numpy as np
import pandas as pd

# Challenge 1 - Iterators, Generators and `yield`. 

An iterator in Python is an object that represents a stream of data. Iterators produce a countable sequence of values, and we traverse them one value at a time. We move through an iterator by calling the built-in `next` function, which asks the iterator for its next value (under the hood it calls the iterator's `__next__` method). We can create an iterator from any iterable, such as a list, with the built-in `iter` function. Below is an example of an iterator.

In [9]:
# We first define our iterator:

iterator = iter([1,2,3])

# We can now iterate through the object using the next function

print(next(iterator))

1


In [4]:
# We continue to iterate through the iterator.

print(next(iterator))

2


In [5]:
print(next(iterator))

3


In [6]:
# After we have iterated through all elements, next raises a StopIteration exception

print(next(iterator))

StopIteration: 
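If raising `StopIteration` is not what you want, the built-in `next` also accepts a second argument that it returns once the iterator is exhausted. The sketch below uses `None` as that default; the choice of `None` is only an illustration.

```python
iterator = iter([1, 2, 3])

# Consume the three values the iterator holds.
values = [next(iterator), next(iterator), next(iterator)]

# With a default argument, next() returns it instead of raising StopIteration.
sentinel = next(iterator, None)
print(values, sentinel)  # [1, 2, 3] None
```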

In [11]:
# We can also iterate through an iterator using a for loop like this:
# Note: an iterator cannot be rewound once we have traversed its elements.
# This is why we are redefining the iterator below

iterator = iter([1,2,3])
for i in iterator:
    print(i)

1
2
3
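A `for` loop is essentially shorthand for calling `next` until `StopIteration` is raised. The sketch below spells that loop out by hand (the helper name `collect_like_for` is hypothetical) and also shows why the iterator had to be recreated: once exhausted, it yields nothing more.

```python
def collect_like_for(iterator):
    """Gather items the way a for loop does: call next() until StopIteration."""
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            # The iterator is exhausted; a for loop stops here silently.
            break
    return items


iterator = iter([1, 2, 3])
print(collect_like_for(iterator))  # [1, 2, 3]
print(collect_like_for(iterator))  # [] because the iterator is already used up
```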


In the cell below, write a function that takes an iterator and returns the first element in the iterator that is divisible by 2. Assume that all iterators contain only numeric data. If no element is divisible by 2, return zero.

In [26]:
def divisible2(iterator):
    # This function takes an iterator and returns the first element that is divisible by 2, and zero otherwise
    # Input: Iterator
    # Output: Integer
    
    # Sample Input: iter([1,2,3])
    # Sample Output: 2
    
    # Your code here:
    # Walk through the iterator and stop at the first even value
    for value in iterator:
        if value % 2 == 0:
            return value
    # No element was divisible by 2
    return 0
    
iterator = iter([1,2,3,4])
print(divisible2(iterator))



2
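The same behaviour can be written more compactly by combining a generator expression with the default argument of `next`. This is a sketch of an alternative, not the exercise's required form; `first_even` is a hypothetical name.

```python
def first_even(iterator):
    """Return the first element divisible by 2, or 0 if there is none."""
    # The generator expression lazily filters even values; next() takes the
    # first one, or falls back to the default 0 once the iterator is exhausted.
    return next((x for x in iterator if x % 2 == 0), 0)


print(first_even(iter([1, 2, 3])))  # 2
print(first_even(iter([1, 3, 5])))  # 0
```

Like the loop version, this consumes the iterator only up to and including the first match. Note that an iterator whose first even value is 0 returns the same result as one with no even values, which is inherent in the exercise's "return zero" rule.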


### Generators

Writing your own iterator by hand is tedious, since you have to implement the `__iter__` and `__next__` methods and raise `StopIteration` yourself. Generators are functions that make creating iterators easy. The difference between a regular function and a generator is that instead of using `return`, a generator uses `yield` to produce each value. For example, below we have a generator that returns an iterator containing the numbers 0 through n - 1:
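For comparison, here is a hand-written iterator class that produces the same numbers as the `firstn` generator defined next. The class name `FirstN` is hypothetical. It shows the bookkeeping a generator does for you: storing state between calls, returning itself from `__iter__`, and raising `StopIteration` when done.

```python
class FirstN:
    """Iterator over 0, 1, ..., n - 1, written without a generator."""

    def __init__(self, n):
        self.n = n
        self.number = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal exhaustion explicitly, as the iterator protocol requires.
        if self.number >= self.n:
            raise StopIteration
        current = self.number
        self.number += 1
        return current


print(list(FirstN(5)))  # [0, 1, 2, 3, 4]
```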

In [10]:
def firstn(n):
    number = 0
    while number < n:
        yield number
        number = number + 1

If we pass 5 to the function, we get an iterator containing the numbers 0 through 4.

In [12]:
iterator = firstn(5)

for i in iterator:
    print(i)

0
1
2
3
4


In the cell below, create a generator that takes a number and returns an iterator containing all even numbers between 0 and the number you passed to the generator.

In [32]:
def even_iterator(n):
    # This function produces an iterator containing all even numbers between 0 and n
    # Input: integer
    # Output: iterator
    
    # Sample Input: 5
    # Sample Output: iter([0, 2, 4])
    
    # Your code here:
    number = 0
    while number < n:
        if number % 2 == 0:
            yield number
        number = number + 1

iterator = even_iterator(7)
for i in iterator:
    print(i)
    

0
2
4
6
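A shorter sketch of the same idea delegates to `range` with a step of 2, using `yield from` so the function is still a generator. Like `even_iterator`, it stops before `n`; the name `even_numbers` is hypothetical.

```python
def even_numbers(n):
    """Yield the even numbers from 0 up to, but not including, n."""
    # range(start, stop, step) already skips odd numbers, so no modulo test is needed.
    yield from range(0, n, 2)


print(list(even_numbers(5)))  # [0, 2, 4]
print(list(even_numbers(7)))  # [0, 2, 4, 6]
```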


# Challenge 2 - Applying Functions to DataFrames

In this challenge, we will look at how to transform cells or entire columns at once.

First, let's load a dataset. We will download the famous Iris classification dataset in the cell below.

In [33]:
columns = ['sepal_length', 'sepal_width', 'petal_length','petal_width','iris_type']
iris = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", names=columns)

Let's look at the dataset using the `head` function.

In [36]:
# Your code here:

iris.head()

|   | sepal_length | sepal_width | petal_length | petal_width | iris_type |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


Let's start off by using built-in functions. Try to apply the numpy mean function and describe what happens in the comments of the code.

In [45]:
# Your code here:
print(np.mean(iris))
# The mean is computed only for the numeric columns; the text column iris_type is left out.
# (Newer pandas versions may raise a TypeError on the text column instead of dropping it.)
# These values match the mean row of the pandas describe output below.

iris.describe()

# The pandas describe method computes descriptive statistics for the numeric columns of the DataFrame.
# It reports the following values:
# count: number of non-missing values (observations) in each column
# mean: arithmetic mean
# std: sample standard deviation
# min: smallest value
# 25%: first quartile (25th percentile): 25% of the observations are at or below this value
# 50%: median (50th percentile)
# 75%: third quartile (75th percentile)
# max: largest value

sepal_length    5.843333
sepal_width     3.054000
petal_length    3.758667
petal_width     1.198667
dtype: float64


|       | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |
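To avoid relying on non-numeric columns being dropped silently, pandas lets you ask for numeric columns explicitly with `numeric_only=True`. The sketch below uses a tiny made-up stand-in for the iris data (three rows, not the real dataset) and checks that the `mean` and `50%` rows of `describe` agree with `mean` and `median`.

```python
import pandas as pd

# A tiny illustrative frame: two numeric columns plus a text label.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "petal_length": [1.4, 1.4, 6.0],
    "iris_type": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# numeric_only=True skips the text column explicitly.
means = df.mean(numeric_only=True)
print(means)

# describe() summarises numeric columns by default.
summary = df.describe()
print(summary.loc["mean"])  # same values as `means`
print(summary.loc["50%"])   # the median of each column
```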


Next, we'll apply the standard deviation function in numpy (`np.std`). Describe what happened in the comments.

In [42]:
# Your code here:
np.std(iris)

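# The standard deviation is computed for the numeric columns only.
# np.std uses ddof=0 (population standard deviation), while describe reports the
# sample standard deviation (ddof=1), so these values are slightly smaller than its std row.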



sepal_length    0.825301
sepal_width     0.432147
petal_length    1.758529
petal_width     0.760613
dtype: float64
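The gap between these numbers and the `std` row of `describe` comes from the `ddof` ("delta degrees of freedom") parameter: NumPy divides by `n` by default, pandas by `n - 1`. The sketch below shows this on five made-up sepal lengths; with `n = 150` the two versions differ by a factor of `sqrt(149/150)`, which is consistent with 0.825301 versus 0.828066 above.

```python
import numpy as np
import pandas as pd

s = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0])

population_std = np.std(s.to_numpy())        # ddof=0: divide by n
sample_std = np.std(s.to_numpy(), ddof=1)    # ddof=1: divide by n - 1
pandas_std = s.std()                         # pandas defaults to ddof=1

print(population_std, sample_std, pandas_std)
```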

The measurements are in centimeters. Let's convert them all to inches. First, we will create a dataframe that contains only the numeric columns. Assign this new dataframe to `iris_numeric`.

In [205]:
# Your code here:

# Keep only the numeric columns (this drops the text column iris_type)
iris_numeric = iris.select_dtypes(include=['float64'])
print(iris_numeric)


     sepal_length  sepal_width  petal_length  petal_width
0             5.1          3.5           1.4          0.2
1             4.9          3.0           1.4          0.2
2             4.7          3.2           1.3          0.2
3             4.6          3.1           1.5          0.2
4             5.0          3.6           1.4          0.2
5             5.4          3.9           1.7          0.4
6             4.6          3.4           1.4          0.3
7             5.0          3.4           1.5          0.2
8             4.4          2.9           1.4          0.2
9             4.9          3.1           1.5          0.1
10            5.4          3.7           1.5          0.2
11            4.8          3.4           1.6          0.2
12            4.8          3.0           1.4          0.1
13            4.3          3.0           1.1          0.1
14            5.8          4.0           1.2          0.2
15            5.7          4.4           1.5          0.4
16            
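A variant worth knowing: `select_dtypes(include="number")` keeps every numeric dtype (integers as well as floats), which is more robust than listing `float64` alone. The sketch uses a small illustrative frame, not the real iris data.

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "petal_width": [0.2, 0.2],
    "iris_type": ["Iris-setosa", "Iris-setosa"],
})

# "number" matches all numeric dtypes, so integer columns would be kept too.
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))  # ['sepal_length', 'petal_width']

# Dropping the known label column by name gives the same result here.
print(list(df.drop(columns="iris_type").columns))
```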

Next, we will write a function that converts centimeters to inches in the cell below. Recall that 1 cm = 0.393701 in.

In [156]:
def cm_to_in(x):
    # This function takes in a numeric value in centimeters and converts it to inches
    # Input: numeric value
    # Output: float
    
    # Sample Input: 1.0
    # Sample Output: 0.393701
    
    # Your code here:
    
    return x*0.393701
    
print(cm_to_in(5.1))


2.0078751


Now convert all columns in `iris_numeric` to inches in the cell below. We like to think of functional transformations as immutable. Therefore, save the transformed data in a dataframe called `iris_inch`.

In [182]:
# Your code here:
iris_inch = iris_numeric.apply(cm_to_in)

iris_inch


|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 2.007875 | 1.377954 | 0.551181 | 0.078740 |
| 1 | 1.929135 | 1.181103 | 0.551181 | 0.078740 |
| 2 | 1.850395 | 1.259843 | 0.511811 | 0.078740 |
| 3 | 1.811025 | 1.220473 | 0.590552 | 0.078740 |
| 4 | 1.968505 | 1.417324 | 0.551181 | 0.078740 |
| 5 | 2.125985 | 1.535434 | 0.669292 | 0.157480 |
| 6 | 1.811025 | 1.338583 | 0.551181 | 0.118110 |
| 7 | 1.968505 | 1.338583 | 0.590552 | 0.078740 |
| 8 | 1.732284 | 1.141733 | 0.551181 | 0.078740 |
| 9 | 1.929135 | 1.220473 | 0.590552 | 0.039370 |
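`DataFrame.apply` passes each column to the function as a `Series`, which works here because `cm_to_in` is plain arithmetic. For such functions, multiplying the whole frame gives the same result in one vectorised step, and neither approach modifies the input. A sketch on a small illustrative frame (`CM_TO_IN` is a name introduced here):

```python
import pandas as pd

CM_TO_IN = 0.393701  # inches per centimetre, as given above


def cm_to_in(x):
    # Same conversion as the cm_to_in function above.
    return x * CM_TO_IN


df = pd.DataFrame({"sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]})

# apply() calls cm_to_in once per column.
via_apply = df.apply(cm_to_in)

# Broadcasting the scalar over the whole frame does the same work at once.
via_broadcast = df * CM_TO_IN

print(via_apply)
print(via_apply.equals(via_broadcast))  # True
```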


We have just found that the original measurements were off by a constant. Define the global constant `error` and set it to 2. Write a function that uses the global constant and adds it to each cell in the dataframe. Apply this function to `iris_numeric` and save the result in `iris_constant`.

In [183]:
# Define constant below:
error = 2

def add_constant(x):
    # This function adds the global constant `error` to our input.
    # Input: numeric value
    # Output: numeric value
    
    # Your code here:
    return x + error

iris_constant = iris_numeric.apply(add_constant)

print(iris_constant)

     sepal_length  sepal_width  petal_length  petal_width
0             7.1          5.5           3.4          2.2
1             6.9          5.0           3.4          2.2
2             6.7          5.2           3.3          2.2
3             6.6          5.1           3.5          2.2
4             7.0          5.6           3.4          2.2
5             7.4          5.9           3.7          2.4
6             6.6          5.4           3.4          2.3
7             7.0          5.4           3.5          2.2
8             6.4          4.9           3.4          2.2
9             6.9          5.1           3.5          2.1
10            7.4          5.7           3.5          2.2
11            6.8          5.4           3.6          2.2
12            6.8          5.0           3.4          2.1
13            6.3          5.0           3.1          2.1
14            7.8          6.0           3.2          2.2
15            7.7          6.4           3.5          2.4
16            
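A global constant works, but `DataFrame.apply` can also forward extra positional arguments through its `args` parameter, so the offset can be passed in explicitly. A sketch on a small illustrative frame; `add_offset` is a hypothetical name.

```python
import pandas as pd


def add_offset(column, offset):
    # Adds `offset` to every value in the column it receives.
    return column + offset


df = pd.DataFrame({"sepal_length": [5.1, 4.9], "petal_width": [0.2, 0.2]})

# args=(2,) is passed to add_offset after the column, so no global is needed.
shifted = df.apply(add_offset, args=(2,))
print(shifted)
```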

# Bonus Challenge - Applying Functions to Columns

Read more about applying functions to either rows or columns [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) and write a function that computes the maximum value for each row of `iris_numeric`

In [184]:
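# Your code here:
# Note: this computes the sum of each row; the task asks for the row maximum.
def calculateSumRow(df_arg):
    # axis=1 applies np.sum to each row, giving one total per row
    return df_arg.apply(np.sum, axis=1)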
    
print(calculateSumRow(iris_numeric))



0      10.2
1       9.5
2       9.4
3       9.4
4      10.2
5      11.4
6       9.7
7      10.1
8       8.9
9       9.6
10     10.8
11     10.0
12      9.3
13      8.5
14     11.2
15     12.0
16     11.0
17     10.3
18     11.5
19     10.7
20     10.7
21     10.7
22      9.4
23     10.6
24     10.3
25      9.8
26     10.4
27     10.4
28     10.2
29      9.7
       ... 
120    18.1
121    15.3
122    19.2
123    15.7
124    17.8
125    18.2
126    15.6
127    15.8
128    16.9
129    17.6
130    18.2
131    20.1
132    17.0
133    15.7
134    15.7
135    19.1
136    17.7
137    16.8
138    15.6
139    17.5
140    17.8
141    17.4
142    15.5
143    18.2
144    18.2
145    17.2
146    15.7
147    16.7
148    17.3
149    15.8
Length: 150, dtype: float64
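The exercise asks for the maximum of each row. A minimal sketch using `apply` with `axis=1`, so each row arrives as a `Series`, is shown below on two illustrative rows; the built-in `DataFrame.max(axis=1)` gives the same result. `max_per_row` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 7.0],
    "sepal_width": [3.5, 3.2],
    "petal_length": [1.4, 4.7],
    "petal_width": [0.2, 1.4],
})


def max_per_row(frame):
    # axis=1 hands each row to the function as a Series, giving one value per row.
    return frame.apply(lambda row: row.max(), axis=1)


row_max = max_per_row(df)
print(row_max)
print(row_max.equals(df.max(axis=1)))  # True
```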


Compute the combined lengths for each row and the combined widths for each row using a function. Assign these values to new columns `total_length` and `total_width`.

In [211]:
# Your code here:
def SumMeasure(iris_arg, col1, col2):
    # Add two columns, selected by position, and return the row-wise sum
    nameCol1 = iris_arg.columns[col1]
    nameCol2 = iris_arg.columns[col2]
    return iris_arg[nameCol1] + iris_arg[nameCol2]

# Columns 0 and 2 are sepal_length and petal_length; columns 1 and 3 are the widths
iris_numeric["total_length"] = SumMeasure(iris_numeric, 0, 2)
iris_numeric["total_width"] = SumMeasure(iris_numeric, 1, 3)



In [212]:
iris_numeric


|   | sepal_length | sepal_width | petal_length | petal_width | total_length | total_width |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 6.5 | 3.7 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 6.3 | 3.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 6.0 | 3.4 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 6.1 | 3.3 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 6.4 | 3.8 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | 7.1 | 4.3 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | 6.0 | 3.7 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | 6.5 | 3.6 |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | 5.8 | 3.1 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | 6.4 | 3.2 |
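Selecting columns by position makes it easy to mix up lengths and widths. A sketch that selects them by name and returns a new frame with `assign`, leaving the input untouched, is shown below on two illustrative rows. `add_totals` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "petal_length": [1.4, 1.4],
    "petal_width": [0.2, 0.2],
})


def add_totals(frame):
    # Sum the named columns row by row, then return a new frame.
    lengths = frame[["sepal_length", "petal_length"]].sum(axis=1)
    widths = frame[["sepal_width", "petal_width"]].sum(axis=1)
    return frame.assign(total_length=lengths, total_width=widths)


result = add_totals(df)
print(result)
```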
