In [1]:
#pre-run
import re

<!-- TITLE 1 -->
## 1D NumPy Arrays

<!-- CONCEPT 1 -->
When you're manipulating data, you often need to perform operations on every element in a collection. To do this with a list, you'd need to write a loop that iterates over every element, but with the <span class="girk">ndarray</span> data type from the <span class="girk">NumPy</span> (Numerical Python) library it's possible to perform element-wise operations without writing an explicit loop.

If you tried to add two lists together using the <span class="girk">+</span> operator, one list would simply be appended to the end of the other. Adding two <span class="girk">NumPy</span> arrays, however, sums the elements at corresponding index positions, pair by pair.




You can create an array by inputting a <span class="girk">list</span> into the <span class="girk">np.array()</span> function.
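
To make the contrast concrete, here is a minimal sketch using two small made-up lists. The names `a_list`, `b_list`, and the values are illustrative assumptions, not part of the lesson data.

```python
import numpy as np

# Illustrative values only; these names are not part of the exercise.
a_list = [1, 2, 3]
b_list = [10, 20, 30]

# Adding two lists concatenates them.
combined_list = a_list + b_list
print(combined_list)  # [1, 2, 3, 10, 20, 30]

# Adding two arrays sums the elements at matching index positions.
a_array = np.array(a_list)
b_array = np.array(b_list)
summed_array = a_array + b_array
print(summed_array.tolist())  # [11, 22, 33]
```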

In [None]:
## STARTER CODE for 1
import numpy as np
hours_list = [38.5, 40, 35.0, 40, 25, 41.5, 45]

In [3]:
## SOLUTION CODE for 1
import numpy as np
hours_list = [38.5, 40, 35.0, 40, 25, 41.5, 45]
hours_array = np.array(hours_list)
type(hours_array)
set([type(x) for x in hours_list])
set([type(x) for x in hours_array])

{numpy.float64}

<!-- TASK 1.1 -->
Create a NumPy array from <span class="girk">hours_list</span> using the <span class="girk">array()</span> function. Assign the output to the variable <span class="girk">hours_array</span>.

<!-- HINT 1.1 -->
You can create <span class="girk">hours_array</span> by inputting <span class="girk">hours_list</span> into the function <span class="girk">np.array()</span>.

In [3]:
## TEST 1.1
try:
    hours_array, "Please assign a value to the variable hours_array."
    assert isinstance(hours_array, np.ndarray), "Use the np.array() function to make hours_array a NumPy array."
    assert len(hours_array) == 7, "hours_array should have the same number of elements as hours_list."
    assert np.max(hours_array) == 45.0 and np.min(hours_array) == 25.0, "Input hours_list into the array() function to generate hours_array."
    print("K44: Great job!")
except (AssertionError, NameError) as e:
    print(e.args[0])

K44: Great job!


<span class="girk">NumPy</span> arrays differ from lists in that they can have multiple dimensions, like a matrix, but they are limited to a single data type. If you attempt to create an array from values of multiple types, the elements will be coerced to a single type: mixing integers and floats produces floats, and mixing numbers with text produces strings.
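
The coercion can be observed through the `.dtype` attribute. The sketch below uses made-up inputs (`ints_and_floats` and `numbers_and_text` are hypothetical names) and is only meant to illustrate the behavior.

```python
import numpy as np

# Illustrative inputs; none of these names appear in the exercise.
ints_and_floats = np.array([1, 2.5, 3])
numbers_and_text = np.array([1, 2.5, "three"])

# Integers mixed with floats are stored as floats.
print(ints_and_floats.dtype)        # float64

# Numbers mixed with text are stored as strings.
# dtype.kind is "U" for NumPy's Unicode string type.
print(numbers_and_text.dtype.kind)  # U
```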

<!-- TASK 1.2 -->
Check the type of <span class="girk">hours_array</span> to confirm that it's of type <span class="girk">numpy.ndarray</span>.

<!-- HINT 1.2 -->
You can check type using the <span class="girk">type()</span> function.

In [5]:
## TEST 1.2
search = re.search("type\(hours_array", In[2])
if not search:
    print("Check the type of hours_array.")
else:
    print("K44: Good! The type ndarray stands for n-dimensional array.")

K44: Good! The type ndarray stands for n-dimensional array.


<!-- TASK 1.3 -->
Check the data types in hours_list by running the following line of code: <span class="girk">set([type(x) for x in hours_list])</span>

<!-- TASK 1.4 -->
Now, check the data types in hours_array by running this line of code: <span class="girk">set([type(x) for x in hours_array])</span>.

<!-- HINT 1.3 -->
Run <span class="girk">set([type(x) for x in hours_list])</span> to see the data types contained in <span class="girk">hours_list</span>. (You should see <span class="girk">float</span> and <span class="girk">int</span>.)

In [41]:
## TEST 1.3
search = re.search("set\(\[\s*?type\(\s*?x\s*?\)\s+?for\s+?x\s+?in\s+?hours_list", In[32])
if not search:
    print("Run the given code to see the data types contained in hours_list.")
else:
    print("K44: Very good. You can see that hours_list contains both floats and integers.")

K44: Very good. You can see that hours_list contains both floats and integers.


<!-- HINT 1.4 -->
Run <span class="girk">set([type(x) for x in hours_array])</span>.

In [43]:
## TEST 1.4
search = re.search("set\(\[\s*?type\(\s*?x\s*?\)\s+?for\s+?x\s+?in\s+?hours_array", In[32])
if not search:
    print("Run the given code to see the data types contained in hours_array.")
else:
    print("K44: Very good. You can see that hours_array contains only elements of type numpy.float64.")

Run the given code to see the data types contained in hours_array.


<!-- Info -->
You can see that hours_list contains both floats and integers, while hours_array contains only elements of type numpy.float64.

<!-- TITLE 2 -->
## Arithmetic with NumPy Arrays

<!-- CONCEPT 2 -->
The main benefit of storing data in a <span class="girk">NumPy</span> <span class="girk">ndarray</span> is the ease of performing element-wise calculations, such as addition, subtraction, multiplication, division, or modulo (remainder).

Given <span class="girk">hours_array</span>, which contains the number of hours worked in a week for seven employees, and <span class="girk">pay_array</span>, which contains each employee's hourly rate, it's easy to multiply the hours worked by the pay rate to determine each worker's total earnings. You can also add a value to, or subtract a value from, each element of <span class="girk">pay_array</span> to reflect adjustments in pay rate.
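
As a rough sketch of both kinds of operation, the example below uses hypothetical arrays (`quantities` and `unit_prices` are invented for illustration and are separate from the exercise data).

```python
import numpy as np

# Hypothetical data, separate from the exercise arrays.
quantities = np.array([2.0, 4.0, 10.0])
unit_prices = np.array([1.5, 0.25, 3.0])

# Multiplying two arrays pairs up values by index position.
totals = quantities * unit_prices
print(totals.tolist())  # [3.0, 1.0, 30.0]

# Adding a single number applies it to every element.
raised_prices = unit_prices + 0.5
print(raised_prices.tolist())  # [2.0, 0.75, 3.5]
```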

In [19]:
## STARTER CODE for 2
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

In [3]:
## SOLUTION CODE for 2
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25]) + 1.50
earnings_array = hours_array * pay_array 
withholdings = earnings_array * 0.062

<!-- TASK 2.1 -->
Create a new array called <span class="girk">earnings_array</span> that represents the earnings for each worker by multiplying <span class="girk">hours_array</span> and <span class="girk">pay_array</span>. Print it out to check the values.

<!-- HINT 2.1 -->
You can create <span class="girk">earnings_array</span> with the following operation: <span class="girk">hours_array * pay_array</span>.

In [10]:
## TEST 2.1
try:
    earnings_array
    assert isinstance(earnings_array, np.ndarray), "Create earnings_array, a new NumPy array."
    assert max(earnings_array) == 911.25, "earnings_array must be the product of pay_array and hours_array."
    print("K44: Very good!")
except (AssertionError, NameError) as e:
    print(e.args[0])

name 'earnings_array' is not defined


<!-- TASK 2.2 -->
Recalculate the values of <span class="girk">pay_array</span> to reflect a $1.50 hourly raise for each employee. Print out the array to check that it worked the way you'd expect.

<!-- HINT 2.2 -->
You can add $1.50 to each element of pay_array like this: <span class="girk">pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25]) + 1.50</span>.

In [5]:
## TEST 2.2
try:
    pay_array
    assert not max(pay_array) == 33.75, "Use addition to increase each hourly wage by $1.50." 
    assert max(pay_array) == 24.0, "It doesn't look like you calculated the raise correctly."
    print("K44: Great job! You should notice that each element of the array has increased by 1.5.")
except (AssertionError, NameError) as e:
    print(e.args[0])

K44: Great job! You should notice that each element of the array has increased by 1.5.


<!-- TASK 2.3 -->
Calculate how many dollars are withheld from each employee's pay in <span class="girk">earnings_array</span> given a 6.2% Social Security tax rate. Save the values in a new array called <span class="girk">withholdings</span>.

<!-- HINT 2.3 -->
You can create the <span class="girk">withholdings</span> array by multiplying <span class="girk">earnings_array</span> by 0.062.

In [4]:
## TEST 2.3
search = re.search("witholdings", In[5])
if search:
    print("Built-in spelling lesson: withholdings has 2 h's!\n")
try:
    withholdings
    assert isinstance(withholdings, np.ndarray), "Create the withholdings array."
    assert (max(withholdings) == 60.682499999999997 and min(withholdings) == 37.200000000000003) or (min(withholdings) == 34.875 and max(withholdings) ==56.497500000000002), "Double check your math for withholdings."
    print("K44: Excellent.")
except (AssertionError, NameError) as e:
    print(e.args[0])

IndexError: list index out of range

<!-- TITLE 3 -->
## 2D NumPy Arrays

<!-- CONCEPT 3 -->
The "nd" in the type name <span class="girk">ndarray</span> stands for "n-dimensional". So far you've worked only with one-dimensional arrays, but it's possible to create arrays in as many dimensions as you want (though we'll stick to two in this lesson).

You can think of a two-dimensional array as a list of lists, where each sublist represents a row.

Two-dimensional arrays can still contain only a single data type, but you can perform the element-wise operations in the same way you did for one-dimensional arrays.

You can create two-dimensional arrays by inputting a list of lists into the <span class="girk">array()</span> function, or by using the <span class="girk">column_stack()</span> function to add one-dimensional arrays as columns in a two-dimensional array.
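
Both approaches can be sketched as follows. The arrays `grid`, `heights`, and `weights` are made up for illustration and do not come from the exercise.

```python
import numpy as np

# A list of lists becomes a 2D array; each sublist is a row.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)  # (2, 3)

# column_stack() takes a tuple of 1D arrays and uses each as a column.
heights = np.array([150, 160, 170])
weights = np.array([50, 60, 70])
table = np.column_stack((heights, weights))
print(table.shape)     # (3, 2)
print(table.tolist())  # [[150, 50], [160, 60], [170, 70]]
```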

In [76]:
## STARTER CODE for 3
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

(7,)

In [5]:
## SOLUTION CODE for 3
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

hours_array.shape
pay_array.shape

payroll = np.column_stack((hours_array, pay_array))
payroll.shape

my_array = np.array([["Harry Potter", "Ron Weasley"], ["Frodo", "Samwise"], ["Jon Snow", "Sam"]])

<!-- TASK 3.1 -->
Check the dimensions of hours_array and pay_array by viewing the <span class="girk">.shape</span> attribute of each of them.

<!-- HINT 3.1 -->
You can print <span class="girk">pay_array.shape</span> and <span class="girk">hours_array.shape</span> to confirm that each one is one-dimensional.

In [85]:
## TEST 3.1
search1 = re.search("pay_array.shape", In[84])
search2 = re.search("hours_array.shape", In[84])
if not (search1 or search2):
    print("Use the .shape attribute to check the dimensions of pay_array and hours_array.")
if search1 or search2:
    print("K44: Great! You can see that each array contains seven elements, which will correspond to seven rows in a 2d array.")

Use the .shape attribute to check the dimensions of pay_array and hours_array.


<!-- TASK 3.2 -->
The <span class="girk">column_stack()</span> function takes a tuple of arrays as input. Use <span class="girk">column_stack()</span> to create a two-dimensional array composed of <span class="girk">hours_array</span> and <span class="girk">pay_array</span>. Assign the output to the variable <span class="girk">payroll</span>.

<!-- HINT 3.2 -->
You can create the <span class="girk">payroll</span> array like this: <span class="girk">np.column_stack((hours_array, pay_array))</span>. Print it out to check its contents.

In [6]:
## TEST 3.2
try:
    payroll
    assert isinstance(payroll, np.ndarray)
    assert not payroll.shape == (7, 1), "The payroll array must have two columns, not just one."
    assert payroll.shape == (7, 2), "It doesn't look like you created payroll correctly."
    assert not payroll[0][0] == payroll[0][1], "Careful! Don't stack the same column twice."
    assert not payroll[0][0] == 18. and not payroll[0][0] == 19.5, "Change the order of pay_array and hours_array in the input tuple for the column_stack() function."
    print("K44: Excellent work!")
except (AssertionError, NameError) as e:
    print(e.args[0])

K44: Excellent work!


<!-- TASK 3.3 -->
Check the shape of the <span class="girk">payroll</span> array.

<!-- HINT 3.3 -->
Use the <span class="girk">.shape</span> attribute.

In [94]:
## TEST 3.3
search = re.search("payroll.shape", In[93])
if not search:
    print("Check the shape of the payroll array.")
else:
    print("K44: Good. You can see that the array has 7 rows and 2 columns.")

K44: Good. You can see that the array has 7 rows and 2 columns.


<!-- TASK 3.4 -->
Create a new two-dimensional array by inputting a list of lists to the <span class="girk">array()</span> function.
The contents of each list can be whatever you like, but remember that an array can only contain one data type. Call the array <span class="girk">my_array</span>.

<!-- HINT 3.4 -->
You can create an array like this: <span class="girk">my_array = np.array([[1,2,3], [4,5,6], [7,8,9]])</span>. Each sublist will become a row.

In [16]:
## TEST 3.4
try:
    my_array
    assert isinstance(my_array, np.ndarray), "Create a NumPy array called my_array."
    assert not len(my_array.shape) == 1, "Make my_array have at least two dimensions."
    print("K44: Cool! What a lovely array.")
except (AssertionError, NameError) as e:
    print(e.args[0])

K44: Cool! What a lovely array.


<!-- TASK 3.5 -->
Print the shape of my_array to confirm that it is not one-dimensional.

<!-- HINT 3.5 -->
Use <span class="girk">my_array.shape</span>.

In [7]:
## TEST 3.5
search = re.search("my_array.shape", In[4])
if not search:
    print("Check the shape of my_array.")
else:
    print("K44: That's right!")

Check the shape of my_array.


<!-- TITLE for 4 -->
## Basic Indexing for Arrays

<!-- CONCEPT for 4 -->
You've probably used square brackets (<span class="girk">[]</span>) to select individual elements by index in a list. Fortunately, indexing of NumPy arrays works basically the same way. You can select elements or ranges of elements from a one-dimensional array as if it were a list.

However, to select a particular element of a two-dimensional array, you'll need two sets of brackets. You select a particular row in the first pair of brackets, and then the index of an element within that row (essentially, the column), using a second pair. For instance, <span class="girk">my_array[0][2]</span> will produce the third element of the first row of an array. Remember that the indices are always specified row first, then column. Alternatively, you could access that same element like this: <span class="girk">my_array[0,2]</span>, with the row and column indices separated by a comma.
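
The two notations can be compared side by side. This is a small sketch on a hypothetical array named `grid`, not the exercise data.

```python
import numpy as np

# Hypothetical 2x3 array for illustration.
grid = np.array([[10, 20, 30],
                 [40, 50, 60]])

# One index selects a whole row.
print(grid[0].tolist())  # [10, 20, 30]

# Row first, then column: both forms select the same element.
print(grid[0][2])  # 30
print(grid[0, 2])  # 30

# First element of the second row.
print(grid[1, 0])  # 40
```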

In [None]:
## STARTER CODE for 4
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

payroll = np.array([[ 38.5 ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 35.  ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 25.  ,  22.5 ],
                   [ 41.5 ,  18.5 ],
                   [ 45.  ,  20.25]])

In [21]:
## SOLUTION for 4
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

payroll = np.array([[ 38.5 ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 35.  ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 25.  ,  22.5 ],
                   [ 41.5 ,  18.5 ],
                   [ 45.  ,  20.25]])
hours_array[0]
payroll[0]
payroll[0,0]
payroll[2,1]

18.0

<!-- TASK 4.1 -->
Print the first element of the <span class="girk">hours_array</span>.

<!-- HINT 4.1 -->
Remember that arrays are zero-indexed, so you can access the first element like this: <span class="girk">hours_array[0]</span>.

In [17]:
## TEST 4.1
search1 = re.search("hours_array\[\s*?1\s*?\]", In[16])
if search1:
    print("The first element of an array has index 0, not index 1.")

search2 = re.search("pay_array\[", In[16])
if search2:
    print("Get the first element of hours_array, not pay_array.")
    
search3 = re.search("payroll\[", In[16])
if search3:
    print("Get the first element of hours_array, not payroll.")

search4 = re.search("hours_array\[\s*?0\s*?\]", In[16])
if not search4 and not search2 and not search1 and not search3:
    print("Select the first element of hours_array.")
    
if search4 and not search1 and not search2 and not search3:
    print("K44: Great job! Indexing of 1d arrays is the same as for lists.")

Select the first element of hours_array.


<!-- TASK 4.2 -->
Now access the first row of <span class="girk">payroll</span>.

<!-- HINT 4.2 -->
If you think of the <span class="girk">payroll</span> array as a list of lists, the first row is the first element of the master list, so you can view it like this: <span class="girk">payroll[0]</span>.

In [22]:
## TEST 4.2
search1 = re.search("pay_array\[", In[21])
if search1:
    print("Get the first row of payroll, not pay_array.")

search2 = re.search("payroll\[\s*?1\s*?\]", In[21])
if search2:
    print("The first row of payroll has index 0, not index 1.")

search3 = re.search("payroll\[\s*?0\s*?\]\[", In[21])
if search3:
    print("Access the entire first row, not an element in that row.")
    
search4 = re.search("payroll\[\s*?0\s*?\]", In[21])
if not search4:
    print("Get the first row of payroll.")
    
if search4 and not search1 and not search2 and not search3:
    print("K44: You got it!")

K44: You got it!


<!-- TASK 4.3 -->
Now, print just the first element of the first row of <span class="girk">payroll</span>.

<!-- HINT 4.3 -->
You'll need two sets of square brackets. In the first, put the index of the row you want to access, and in the second, the index of the column.

In [20]:
## TEST 4.3
search = re.search("payroll\[\s*?0(.+){1,5}0\s*?\]", In[19])
if not search:
    print("Access the first element of the first row of payroll.")
else:
    print("K44: Very good!")

K44: Very good!


<!-- TASK 4.4 -->
Get the second element of the third row of <span class="girk">payroll</span>.

<!-- HINT 4.4 -->
Remember, zero-indexing means that the second element of the row has index 1 rather than 2, and the third row has index 2, not 3.

In [23]:
## TEST 4.4
search1 = re.search("payroll\[\s*?[013456]\s*?\]\[\s*?1\s*?\]", In[21])
if search1:
    print("The row index doesn't look quite right.")
search2 = re.search("payroll\[\s*?2\s*?\]\[\s*?0\s*?\]", In[21])
if search2:
    print("The column index doesn't look quite right.")
search3 = re.search("payroll\[\s*?2(.){1,5}1\s*?\]", In[21])
if not search3 and not search1 and not search2:
    print("That's not the right element.")
if search3:
    print("K44: Nailed it!")

K44: Nailed it!


<!-- TASK 4.5 -->
Rather than using two sets of square brackets, you can use just a single set, with the row and column values separated by a comma. Select the first element of the fourth row of <span class="girk">payroll</span> using a single set of brackets and a comma.

<!-- HINT 4.5 -->
Try <span class="girk">payroll[3,0]</span>.

In [38]:
## TEST 4.5
search1 = re.search("payroll\[\s*?3\s*?\]", In[36])
if search1:
    print("Select the element using a comma to separate row and column indices.")
search2 = re.search("payroll\[\s*?3\s*?,\s*?0\s*?\]", In[36])
if not search2 and not search1:
    print("That's not the right element.")
if search2 and not search1:
    print("K44: You got it!")

Select the element using a comma to separate row and column indices.


<!-- TITLE for 5 -->
## Selecting a Range of Elements

<!-- CONCEPT for 5 -->
You may be familiar with the method for selecting a range of elements from a list using a colon (<span class="girk">:</span>). The same logic also applies to <span class="girk">NumPy</span> arrays, where the selected range includes the element at the index to the left of the <span class="girk">:</span> and goes up to, but does not include, the element at the index to the right. For instance, <span class="girk">my_array[1:4]</span> includes the elements at indices 1, 2, and 3, but not 4.

In a two-dimensional array, you can combine this notation with the logic for selecting rows or columns to select ranges of rows or ranges of columns. However, be aware that using a comma to separate ranges of rows and columns is often more straightforward than using pairs of square brackets to narrow down your selection to the right subset of elements.
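
A brief sketch of range selection in one and two dimensions follows. The arrays `values` and `grid` are hypothetical and chosen only to keep the output easy to read.

```python
import numpy as np

# 1D: indices 1, 2, and 3 are included; index 4 is not.
values = np.array([5, 6, 7, 8, 9])
print(values[1:4].tolist())  # [6, 7, 8]

# 2D: hypothetical array with four rows and two columns.
grid = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# A range on its own selects whole rows.
print(grid[0:2].tolist())  # [[1, 2], [3, 4]]

# A bare colon means "all rows"; 1 picks the second column.
print(grid[:, 1].tolist())  # [2, 4, 6, 8]

# Rows 1 and 2, first column only.
print(grid[1:3, 0].tolist())  # [3, 5]
```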

In [None]:
## STARTER CODE for 5
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

payroll = np.array([[ 38.5 ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 35.  ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 25.  ,  22.5 ],
                   [ 41.5 ,  18.5 ],
                   [ 45.  ,  20.25]])

In [30]:
## SOLUTION CODE for 5
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

payroll = np.array([[ 38.5 ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 35.  ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 25.  ,  22.5 ],
                   [ 41.5 ,  18.5 ],
                   [ 45.  ,  20.25]])
hours_array[0:3]
first_two_employees = payroll[0:2]
hours = payroll[:,0]
third_fourth_hours = payroll[2:4,0]
third_fourth_hours
pay_only = payroll[:, 1]

In [206]:
pay_only

array([ 18.  ,  19.25,  18.  ,  19.25,  22.5 ,  18.5 ,  20.25])

<!-- TASK 5.1 -->
Print the first three elements of the <span class="girk">hours_array</span> by selecting a range.

<!-- HINT 5.1 -->
To select the first three elements, you'll need the values at indices 0, 1, and 2, but not 3. The range should look like this: <span class="girk">[0:3]</span>.

In [28]:
## TEST 5.1
search1 = re.search("pay_array\[", In[27])
if search1:
    print("Select the first three elements of hours_array, not pay_array.")
search2 = re.search("payroll\[(.){1,2}3", In[27])
if search2:
    print("Select the first three elements of hours_array, not payroll.")
search3 = re.search("hours_array\[\s*?0?\s*?:\s*?3\s*?\]", In[27])
if not search3:
    print("Select the first three elements of hours_array.")
if search3:
    print("K44: That's right!")

K44: That's right!


<!-- TASK 5.2 -->
Select the number of hours worked as well as the pay rate from <span class="girk">payroll</span> for the first two employees. Save the selection to the variable <span class="girk">first_two_employees</span>.

<!-- HINT 5.2 -->
The data for the first two employees are located at row indices 0 and 1, so you'll need to select rows <span class="girk">[0:2]</span>.

In [17]:
## TEST 5.2
try:
    first_two_employees
    assert isinstance(first_two_employees, np.ndarray), "Please create the array first_two_employees."
    assert first_two_employees.shape[0] == 2, "You should select only two rows."
    assert first_two_employees.shape[1] == 2, "You should select both columns."
    assert first_two_employees[0][0] == payroll[0][0], "This is not the correct range."
    print("K44: Great job! You made a 2x2 array containing hours and pay rates for two employees.")
except (AssertionError, NameError) as e:
    print(e.args[0])

name 'first_two_employees' is not defined


<!-- TASK 5.3 -->
Now, select only the hours worked for the third and fourth employees in <span class="girk">payroll</span>. Assign this range to the variable <span class="girk">third_fourth_hours</span>.

<!-- HINT 5.3 -->
The third and fourth employees' data are located in the rows at indices 2 and 3, and since you're only interested in hours worked, the column index of interest is 0. You'll want to use the comma method here: <span class="girk">payroll[2:4,0]</span>.

If you try <span class="girk">payroll[2:4][0]</span>, you'll end up with only the first element (the row containing the third employee's data) of the 2x2 array containing hours and pay for the third and fourth employees.
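
The difference between the two forms is easier to see on a small example. The sketch below uses a hypothetical array `grid` rather than `payroll`, so the variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical array with four rows and two columns.
grid = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Comma form: first column of the rows at indices 2 and 3.
comma_result = grid[2:4, 0]
print(comma_result.tolist())  # [5, 7]

# Chained brackets: grid[2:4] is a 2x2 array, and [0] then
# selects its first row, not its first column.
chained_result = grid[2:4][0]
print(chained_result.tolist())  # [5, 6]
```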

In [18]:
## TEST 5.3
try:
    third_fourth_hours
    assert isinstance(third_fourth_hours, np.ndarray), "Please create the array third_fourth_hours."
    assert not third_fourth_hours.shape == (2,1), "The third_fourth_hours array should only be one-dimensional."
    assert len(third_fourth_hours) == 2, "The third_fourth_hours array should contain 2 elements."
    assert third_fourth_hours[0] == 35.0, "The third_fourth_hours array does not contain the correct values."
    assert third_fourth_hours.shape == (2,), "The third_fourth_hours array does not have the correct dimensions."
    print("K44: Great job! That was tricky!")
except (AssertionError, NameError) as e:
    print(e.args[0])

name 'third_fourth_hours' is not defined


<!-- TASK 5.4 -->
From <span class="girk">payroll</span>, select the pay column for all employees and save the values in an array called <span class="girk">pay_only</span>.

<!-- HINT 5.4 -->
Since you're interested in all rows, the value to the left of the comma can be <span class="girk">:</span>, and the value to the right should be <span class="girk">1</span> because you're only interested in the second column.

In [31]:
## TEST 5.4
try:
    pay_only
    assert isinstance(pay_only, np.ndarray), "Please create the array pay_only."
    assert not pay_only.shape == (7,1), "The pay_only array should have only one dimension."
    assert pay_only.shape[0] == 7, "Select all the rows from payroll."
    assert pay_only.shape == (7,), "The pay_only array does not have the correct dimensions."
    assert pay_only[0] == payroll[0][1], "The pay_only array does not contain the correct data."
    print("K44: Excellent work!")
except (AssertionError, NameError) as e:
    print(e.args[0])

K44: Excellent work!


<!-- TITLE 6 -->
## Boolean Indexing

<!-- CONCEPT 6 -->
Another way to select subsets of an array is with Boolean indexing. Applying a comparison, such as an inequality, to an array evaluates it for each element and returns an array of <span class="girk">True</span> and <span class="girk">False</span> values with the same shape. Wherever an element satisfies the comparison, the Boolean array holds <span class="girk">True</span> at the same index; everywhere else it holds <span class="girk">False</span>. You can use this Boolean array to create a new, shorter array containing only the values of the original that align with <span class="girk">True</span>.

For example, if you have an array of integers, <span class="girk">my_array</span>, you could make a subset of all values less than 10 with the following syntax: <span class="girk">my_array[my_array < 10]</span>.
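
The two steps, building the Boolean array and then using it to subset, can be sketched like this. The array `scores` is a made-up example and `mask` is just an illustrative variable name.

```python
import numpy as np

# Hypothetical integer array.
scores = np.array([4, 12, 7, 30, 9])

# Step 1: the comparison produces one True/False per element.
mask = scores < 10
print(mask.tolist())  # [True, False, True, False, True]

# Step 2: indexing with the Boolean array keeps only the True positions.
small_scores = scores[mask]
print(small_scores.tolist())  # [4, 7, 9]
```

Writing `scores[scores < 10]` does both steps in a single expression.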

In [None]:
## STARTER CODE for 6
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

payroll = np.array([[ 38.5 ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 35.  ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 25.  ,  22.5 ],
                   [ 41.5 ,  18.5 ],
                   [ 45.  ,  20.25]])

In [10]:
## SOLUTION for 6
hours_array = np.array([ 38.5,  40. ,  35. ,  40. ,  25. ,  41.5,  45. ])
pay_array = np.array([18.00, 19.25, 18.00, 19.25, 22.50, 18.50, 20.25])

payroll = np.array([[ 38.5 ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 35.  ,  18.  ],
                   [ 40.  ,  19.25],
                   [ 25.  ,  22.5 ],
                   [ 41.5 ,  18.5 ],
                   [ 45.  ,  20.25]])
less_than_40 = hours_array[hours_array < 40]
greater_than_20 = payroll[payroll[:,1] > 20]
multiple_of_10 = payroll[payroll[:,0] % 10 == 0]

<!-- TASK 6.1 -->
Generate a Boolean array corresponding to all values that are less than 40 in <span class="girk">hours_array</span>. You don't have to save it or assign it to a variable.

<!-- HINT 6.1 -->
You can do this with the inequality <span class="girk">hours_array < 40</span>.

In [17]:
## TEST 6.1
search1 = re.search("hours_array\s*?>\s*?40", In[15])
if search1:
    print("Be careful with the direction of the inequality!")
search2 = re.search("hours_array\s*?<\s*?40", In[15])
search3 = re.search("hours_array\s*?\[\s*?hours_array", In[15])
if search3:
    print("Don't use square brackets to subset the array just yet! Print the Boolean array first.")
if search2 and not search3:
    print("K44: Great job! Notice that the True and False values align with whether the values in hours_array are greater or less than 40.")

Don't use square brackets to subset the array just yet! Print the Boolean array first.


<!-- TASK 6.2 -->
Now that you've created a Boolean array, use it to subset <span class="girk">hours_array</span> and return an array of only those values less than 40. Assign the result to <span class="girk">less_than_40</span>, and view its contents.

<!-- HINT 6.2 -->
You can assign <span class="girk">less_than_40 = hours_array[hours_array < 40]</span>.

In [31]:
## TEST 6.2
search1 = re.search("payroll\[\s*?hours_", In[30])
search2 = re.search("pay_array\[\s*?hours_", In[30])
if search1 or search2:
    print("You should be indexing hours_array.")
try:
    less_than_40
    assert isinstance(less_than_40, np.ndarray), "Please create the array less_than_40."
    assert less_than_40[0] == 38.5, "The values of less_than_40 have not been assigned correctly."
    print("K44: Very good! Notice that the less_than_40 array contains only 3 values, and they're all less than 40!")
except (NameError, AssertionError) as e:
    print(e.args[0])

K44: Very good! Notice that the less_than_40 array contains only 3 values, and they're all less than 40!


<!-- TASK 6.3 -->
Create a subset of <span class="girk">payroll</span> containing only rows for which pay rate is greater than $20.00, and name the resulting array <span class="girk">greater_than_20</span>.

<!-- HINT 6.3 -->
It helps to build the correct inequality first, and then surround it by brackets and use it to subset the original. The right inequality in this case is <span class="girk">payroll[:,1] > 20</span> because you're interested in any row, but only if the second column is greater than 20.

In [48]:
## TEST 6.3
try:
    assert isinstance(greater_than_20, np.ndarray), "Please create the array greater_than_20."
    assert not True in greater_than_20, "greater_than_20 should contain actual values, not Booleans."
    assert greater_than_20.shape[0] == 2, "The array greater_than_20 should contain 2 rows."
    assert greater_than_20.shape[1] == 2, "The array greater_than_20 should contain both columns."
    print("K44: You got it! Check out the second column to see that the values there are both greater than 20.")
except (NameError, AssertionError) as e:
    print(e.args[0])

K44: You got it! Check out the second column to see that the values there are both greater than 20.


<!-- TASK 6.4 -->
Create a subset of <span class="girk">payroll</span> consisting only of rows where the number of hours worked is a multiple of 10 (e.g. 40.0, not 40.5). Name this array <span class="girk">multiple_of_10</span>.

<!-- HINT 6.4 -->
The modulo operator, <span class="girk">%</span>, returns the remainder of a division operation. If a number x is divisible by a number y, x % y will equal 0. You can use this criterion to subset <span class="girk">payroll</span>.
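
As a sketch of the same idea on different data, the example below keeps the rows of a hypothetical array `grid` whose first column is a multiple of 3. The names and values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical array: the first column is tested, the second is carried along.
grid = np.array([[6.0, 1.0],
                 [7.5, 2.0],
                 [9.0, 3.0]])

# Remainder of each first-column value after dividing by 3.
remainders = grid[:, 0] % 3
print(remainders.tolist())  # [0.0, 1.5, 0.0]

# Keep only the rows where the remainder is 0.
multiples_of_3 = grid[remainders == 0]
print(multiples_of_3.tolist())  # [[6.0, 1.0], [9.0, 3.0]]
```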

In [51]:
## TEST 6.4
try:
    multiple_of_10
    assert isinstance(multiple_of_10, np.ndarray), "Please create the array multiple_of_10."
    assert not multiple_of_10.shape[0] < 2, "You haven't selected enough rows."
    assert not multiple_of_10.shape[0] > 2, "You've selected too many rows."
    assert multiple_of_10.shape[1] == 2, "The array multiple_of_10 should contain both columns."
    assert multiple_of_10[0][0] == 40.0 and multiple_of_10[0][1] == 19.25, "The values in multiple_of_10 are not correct."
    print("K44: Great work!")
except (NameError, AssertionError) as e:
    print(e.args[0])

K44: Great work!


manipulate an array to use statistics
mean, median, corrcoef, std, sum, sort
- forcing a single datatype makes it much faster than regular python

