# Lab 2 

# Introduction to NumPy

In this lab, you'll be working through Chapter 2 to get an introduction to the numerical computing package for Python, NumPy. This notebook is made up of two sections.

- Section 1: Work through the code samples in Chapter 2
- Section 2: Exercises

# Section 1: Code Practice

In this section, you will read through the chapter's sections and type out and run the code samples they contain. The purpose is to practice using Jupyter to run Python code and to learn about the functionality available in both IPython and Jupyter.

##### Executing code in Jupyter

When typing and executing code in Jupyter, it is helpful to know the various keyboard shortcuts. You can find the full list by clicking **Help &rarr; Keyboard Shortcuts** in the menu. The most useful ones are:

- `Shift-Enter`: Execute the current cell and advance to the next cell. A new cell is created only if there is no cell below the current one.
- `Alt-Enter`: Execute the current cell and **create** a new cell below.
- `Control-Enter`: Execute the current cell without advancing to the next cell.

When writing your code, you will use these commands to keep the input/output (`In`/`Out`) consistent with what is found in the chapter. If you create a cell by mistake, you can always go to **Edit &rarr; Delete Cells** to remove it.

#### Purpose of Section 1

Your purpose in this section is to:

- **Type out** the code examples from the chapter (do not copy and paste)
- **Run** them
- **Check** to **make sure** you are getting the same results as what is contained in the chapter

---




## Understanding Data Types in Python

#### A Python List is More Than Just a List

In [12]:
import numpy as np
import array

In [13]:
listarray = [1, 2, 3, 4, 5, 
             6, 7]

In [14]:
arrays = np.array(listarray)

In [15]:
squared = arrays ** 2

In [16]:
print(f"List: {listarray}")
print(f"Squared Array: {squared}")

List: [1, 2, 3, 4, 5, 6, 7]
Squared Array: [ 1  4  9 16 25 36 49]
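
The chapter's point is that a Python list can hold elements of different types because each element is a separate Python object, while a NumPy array stores a single fixed type. A minimal sketch contrasting the two; the names `mixed` and `upcast` are illustrative, not from the chapter:

```python
import numpy as np

# A list stores a reference to a separate Python object per element,
# so the element types can differ
mixed = [True, "2", 3.0, 4]
print([type(item).__name__ for item in mixed])  # ['bool', 'str', 'float', 'int']

# An array has one dtype; mixing ints and floats upcasts everything to float
upcast = np.array([1, 2.5, 3])
print(upcast.dtype)  # float64
```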


#### Fixed-Type Arrays in Python

In [17]:
fixateforarraytype = array.array('i', listarray)

In [18]:
np_array = np.array(listarray)
print(f"Fixed Array: {fixateforarraytype}")

Fixed Array: array('i', [1, 2, 3, 4, 5, 6, 7])

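The `'i'` type code is what makes `array.array` fixed-type. The following small sketch shows that an element of another type is rejected. It imports the module as `pyarray`, an illustrative alias, so that it does not clash with the variable named `array` used later in this notebook. The names `ints` and `rejected` are also illustrative:

```python
import array as pyarray

# 'i' fixes the element type to a signed C int
ints = pyarray.array('i', [1, 2, 3])
print(ints.typecode)  # i

# Unlike a list, the fixed-type array refuses a float element
rejected = False
try:
    ints.append(4.5)
except TypeError:
    rejected = True
print(rejected)  # True
```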

#### Creating Arrays from Python Lists

In [19]:
# This rebinds the name array, shadowing the array module imported in In [12]
array = np.array(listarray)

In [20]:
print(f"Array: {array}")

Array: [1 2 3 4 5 6 7]

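`np.array` also accepts a `dtype` argument, and nested lists produce multidimensional arrays. A short sketch with illustrative names:

```python
import numpy as np

# Request a specific element type with dtype
floats = np.array([1, 2, 3, 4], dtype='float32')
print(floats)  # [1. 2. 3. 4.]

# Nested sequences become a multidimensional array; each inner range is a row
grid = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(grid)
# [[2 3 4]
#  [4 5 6]
#  [6 7 8]]
```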

#### Creating Arrays from Scratch

In [21]:
zeroes = np.zeros(10)

In [22]:
ones = np.ones(10)

In [23]:
random = np.random.rand(3, 3)

In [24]:
print("Array of zeros:", zeroes)

Array of zeros: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [25]:
print("Array of ones:", ones)

Array of ones: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [26]:
print("Random array:\n", random)

Random array:
 [[0.80919436 0.2744498  0.98110895]
 [0.83276292 0.21738659 0.77453472]
 [0.11901405 0.39202127 0.28132802]]
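
Besides `zeros`, `ones`, and `rand`, the chapter lists several other constructors. The following sketch shows a few of them. It also seeds a random generator so that the random values repeat on every run, which the unseeded `np.random.rand` cell above does not do. `np.random.default_rng` is the newer Generator interface (NumPy 1.17 or later) and is used here only for illustration. The chapter itself uses the legacy `np.random` functions.

```python
import numpy as np

filled = np.full((2, 3), 3.14)      # every element set to 3.14
stepped = np.arange(0, 20, 5)       # start, stop (exclusive), step
spaced = np.linspace(0, 1, 5)       # 5 evenly spaced values: 0, 0.25, 0.5, 0.75, 1.0
identity = np.eye(3)                # 3x3 identity matrix

print(stepped)  # [ 0  5 10 15]

# Random output changes on every run unless the generator is seeded
rng = np.random.default_rng(seed=42)
reproducible = rng.random((3, 3))
```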


## The Basics of NumPy Arrays

### NumPy Array Attributes

In [27]:
array = np.random.rand(3, 3)

array

array([[0.25092551, 0.03286527, 0.01891979],
       [0.81815289, 0.89962409, 0.29278908],
       [0.2174957 , 0.38758127, 0.6470891 ]])

In [28]:
array.shape

(3, 3)

In [29]:
array.size

9

In [30]:
array.dtype

dtype('float64')

In [31]:
array.itemsize

8
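
Two more attributes are worth checking alongside `shape`, `size`, `dtype`, and `itemsize`: `ndim` and `nbytes`. A sketch on a deterministic array (the name `x` is illustrative), where `nbytes` equals `size * itemsize`:

```python
import numpy as np

x = np.zeros((3, 4, 5))   # float64 by default
print(x.ndim)    # 3   (number of dimensions)
print(x.shape)   # (3, 4, 5)
print(x.size)    # 60  (3 * 4 * 5 elements)
print(x.nbytes)  # 480 (60 elements * 8 bytes each)
```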

### Array Indexing: Accessing Single Elements

In [32]:
first_element_first_row = array[0, 0]  # First element in the first row
last_element_second_row = array[1, 2]  # Last element in the second row
first_element_third_row = array[2, 0]  # First element in the third row

array 

array([[0.25092551, 0.03286527, 0.01891979],
       [0.81815289, 0.89962409, 0.29278908],
       [0.2174957 , 0.38758127, 0.6470891 ]])

In [33]:
last_element_second_row


0.2927890769388749

In [34]:
first_element_third_row


0.21749570105963534

In [35]:
array[1, 1] = 55  

In [36]:
array

array([[2.50925508e-01, 3.28652717e-02, 1.89197932e-02],
       [8.18152891e-01, 5.50000000e+01, 2.92789077e-01],
       [2.17495701e-01, 3.87581274e-01, 6.47089101e-01]])
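
Negative indices count from the end of an axis. The cell above also shows one consequence of fixed types: writing `55` into a float array stores `55.0`. The reverse also holds. Writing a float into an integer array truncates it silently, as this sketch shows:

```python
import numpy as np

x = np.array([5, 0, 3, 3, 7, 9])
print(x[-1], x[-2])  # 9 7  (negative indices count from the end)

# The array keeps its integer dtype, so the assigned float is truncated
x[0] = 3.14159
print(x)  # [3 0 3 3 7 9]
```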

### Array Slicing: Accessing Subarrays

#### One-dimensional subarrays

In [37]:
array = np.random.randint(1, 100, size=10)

array

array([73, 81, 46, 14, 89, 12, 75, 67, 17, 13])

In [38]:
subarray_1 = array[2:6]    
subarray_1

array([46, 14, 89, 12])

In [39]:
subarray_2 = array[:4]     
subarray_2

array([73, 81, 46, 14])

In [40]:
subarray_3 = array[5:]
subarray_3

array([12, 75, 67, 17, 13])

In [41]:
subarray_4 = array[::2]
subarray_4

array([73, 46, 89, 75, 17])
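
A negative step walks through the array backwards, which is a common way to reverse it. A sketch on `np.arange(10)` so the output is predictable:

```python
import numpy as np

x = np.arange(10)
print(x[::-1])    # [9 8 7 6 5 4 3 2 1 0]  (negative step reverses)
print(x[5::-2])   # [5 3 1]  (start at index 5, step backward by 2)
```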

#### Multidimensional subarrays

In [42]:
array = np.random.randint(1, 100, size=(4, 4))

In [43]:
subarray_1 = array[0:2, 1:3]    
subarray_1

array([[11, 74],
       [53, 43]])

In [44]:
subarray_2 = array[2:, 2:]      
subarray_2

array([[40, 90],
       [64, 63]])

In [45]:
subarray_3 = array[:, 1:3]      
subarray_3

array([[11, 74],
       [53, 43],
       [30, 40],
       [23, 64]])

In [46]:
subarray_4 = array[1:3, :]    
subarray_4

array([[ 1, 53, 43, 31],
       [ 1, 30, 40, 90]])
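
Combining a single index with an empty slice `:` selects one row or column, and negative steps work on every axis. A sketch on a deterministic 3x4 array (`grid` is an illustrative name):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
print(grid[:, 0])      # [0 4 8]  first column
print(grid[0])         # [0 1 2 3]  first row; same as grid[0, :]
print(grid[::-1, ::-1])
# [[11 10  9  8]
#  [ 7  6  5  4]
#  [ 3  2  1  0]]
```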

#### Subarrays as no-copy views

In [47]:
array = np.random.randint(1, 100, size=(4, 4))

In [48]:
subarray = array[1:3, 1:3]
subarray

array([[34, 97],
       [69, 49]])

In [49]:
subarray[0, 0] = 999
print(subarray)

[[999  97]
 [ 69  49]]
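
The cell above changes `subarray` but does not display the original, so the no-copy behavior is not visible. The following sketch checks the parent array after the assignment. `np.shares_memory` reports whether two arrays overlap in memory:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
sub = x[1:3, 1:3]           # a view into x, not a copy
sub[0, 0] = 999
print(x[1, 1])              # 999: the change shows up in the original
print(np.shares_memory(x, sub))  # True
```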


#### Creating copies of arrays

In [50]:
array = np.random.randint(1, 100, size=(4, 4))

view_array = array[1:3, 1:3]
view_array

array([[47, 79],
       [90, 55]])

In [51]:
copy = array.copy()
copy

array([[40, 38, 33, 58],
       [87, 47, 79, 83],
       [29, 90, 55, 11],
       [82, 32, 25, 69]])

In [52]:
copy[0, 0] = 999
copy

array([[999,  38,  33,  58],
       [ 87,  47,  79,  83],
       [ 29,  90,  55,  11],
       [ 82,  32,  25,  69]])
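
Likewise, the cells above modify `copy` without showing that the original is unaffected. A sketch that makes the comparison explicit by copying a subarray:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
sub_copy = x[1:3, 1:3].copy()   # independent copy of the subarray
sub_copy[0, 0] = 999
print(x[1, 1])                   # 5: the original is unchanged
print(np.shares_memory(x, sub_copy))  # False
```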

### Reshaping of Arrays

In [53]:
array = np.arange(12)
array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [54]:
array2d = array.reshape((3, 4))
array2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [55]:
array3d = array.reshape((2, 3, 2))
array3d

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])
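
In `reshape`, you can give one dimension as `-1`, and NumPy infers it from the total size. `np.newaxis` inserts a length-1 axis, which turns a 1-D array into a row or column vector. A sketch:

```python
import numpy as np

x = np.arange(12)
print(x.reshape(3, -1).shape)   # (3, 4): -1 lets NumPy infer that dimension

v = np.array([1, 2, 3])
print(v[np.newaxis, :].shape)   # (1, 3): row vector
print(v[:, np.newaxis].shape)   # (3, 1): column vector
```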

### Array Concatenation and Splitting

#### Concatenation of arrays

In [56]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

In [57]:
concatenatedarray = np.concatenate((array1, array2))
concatenatedarray

array([1, 2, 3, 4, 5, 6])
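
For multidimensional arrays, `np.concatenate` takes an `axis` argument, and `np.vstack`/`np.hstack` handle arrays of mixed dimensions. A sketch with an illustrative 2x3 `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=1 joins along columns (side by side)
wide = np.concatenate([grid, grid], axis=1)
# [[1 2 3 1 2 3]
#  [4 5 6 4 5 6]]

# vstack stacks vertically; a 1-D array is treated as one new row
tall = np.vstack([np.array([9, 8, 7]), grid])
# [[9 8 7]
#  [1 2 3]
#  [4 5 6]]
```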

#### Splitting of arrays

In [58]:
splitarray = np.split(concatenatedarray, 3)
splitarray

[array([1, 2]), array([3, 4]), array([5, 6])]
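
Passing an integer to `np.split` requires it to divide the length evenly. Passing a list of indices instead gives the split points, so N points produce N + 1 pieces. A sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
first, middle, last = np.split(x, [3, 5])
print(first, middle, last)  # [1 2 3] [99 99] [3 2 1]

# 8 elements cannot be split into 3 equal parts
split_failed = False
try:
    np.split(x, 3)
except ValueError:
    split_failed = True
print(split_failed)  # True
```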

## Computation on NumPy Arrays: Universal Functions

### The Slowness of Loops

In [59]:
import time

size = 10**6
array1 = np.random.rand(size)
array2 = np.random.rand(size)

start = time.time()

result = np.empty(size)
for i in range(size):
    result[i] = array1[i] + array2[i]

end = time.time()
loop = end - start

print(f"this took: {loop:.5f} seconds")

this took: 0.30680 seconds
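
The loop above is only half of the comparison. The following sketch times the same element-wise addition done by a single ufunc call. It uses `time.perf_counter`, which is better suited to interval timing than `time.time`, and a seeded generator for repeatable inputs. The names `left` and `right` are illustrative. Timings depend on the machine, so no numbers are claimed here:

```python
import time

import numpy as np

size = 10**6
rng = np.random.default_rng(0)
left = rng.random(size)
right = rng.random(size)

# Python-level loop: one interpreted iteration per element
start = time.perf_counter()
loop_result = np.empty(size)
for i in range(size):
    loop_result[i] = left[i] + right[i]
loop_time = time.perf_counter() - start

# Vectorized: + calls the np.add ufunc, which loops in compiled code
start = time.perf_counter()
vector_result = left + right
vector_time = time.perf_counter() - start

print(f"loop: {loop_time:.5f} s, vectorized: {vector_time:.5f} s")
```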


### Introducing UFuncs

In [60]:
array1 = np.random.randint(1, 100, size=(4, 4))
array2 = np.random.randint(1, 100, size=(4, 4))
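
The cell above only creates the inputs. The idea behind ufuncs is that Python's arithmetic operators on arrays are wrappers around them, so `x + 5` calls `np.add`. A sketch on a small deterministic array:

```python
import numpy as np

x = np.arange(4)
print(x + 5)    # [5 6 7 8]
print(x // 2)   # [0 0 1 1]  floor division
print(x % 2)    # [0 1 0 1]
print(-x)       # [ 0 -1 -2 -3]

# Each operator is a thin wrapper around a ufunc
print(np.array_equal(x + 5, np.add(x, 5)))   # True
```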

### Exploring NumPy's UFuncs

#### Array arithmetic

In [68]:
addition = np.add(array1, array2)
addition

array([[100, 196,  45, 102],
       [101, 103, 118, 181],
       [103,  38,  94,  65],
       [149, 139,  92, 135]])

In [69]:
subtraction = np.subtract(array1, array2)
subtraction

array([[-14,   0, -43, -30],
       [-29, -19,  24, -13],
       [ 29,  10,  20,  59],
       [-11, -17, -12,   5]])

In [70]:
multiplication = np.multiply(array1, array2)
multiplication

array([[2451, 9604,   44, 2376],
       [2340, 2562, 3337, 8148],
       [2442,  336, 2109,  186],
       [5520, 4758, 2080, 4550]])

In [71]:
division = np.divide(array1, array2)
division

array([[ 0.75438596,  1.        ,  0.02272727,  0.54545455],
       [ 0.55384615,  0.68852459,  1.5106383 ,  0.86597938],
       [ 1.78378378,  1.71428571,  1.54054054, 20.66666667],
       [ 0.8625    ,  0.78205128,  0.76923077,  1.07692308]])

#### Absolute value

In [73]:
np.abs(subtraction)

array([[14,  0, 43, 30],
       [29, 19, 24, 13],
       [29, 10, 20, 59],
       [11, 17, 12,  5]])
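
For complex input, `np.abs` returns the magnitude rather than just dropping a sign. A sketch with illustrative values:

```python
import numpy as np

# For complex input, np.abs returns sqrt(real**2 + imag**2)
z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
print(np.abs(z))  # [5. 5. 2. 1.]
```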

#### Trigonometric functions

In [74]:
np.sin(array1)

array([[-0.83177474, -0.57338187,  0.84147098, -0.99177885],
       [-0.99177885, -0.91652155,  0.95105465,  0.73319032],
       [-0.02655115, -0.90557836,  0.43616476, -0.7391807 ],
       [-0.11478481, -0.96611777,  0.74511316,  0.77389068]])

In [75]:
np.cos(array1)

array([[ 0.5551133 , -0.81928825,  0.54030231, -0.12796369],
       [-0.12796369, -0.39998531, -0.30902273, -0.6800235 ],
       [-0.99964746,  0.42417901,  0.89986683,  0.67350716],
       [ 0.99339038, -0.25810164, -0.66693806,  0.6333192 ]])

In [76]:
np.tan(array1)

array([[-1.49838734,  0.69985365,  1.55740772,  7.75047091],
       [ 7.75047091,  2.29138799, -3.0776204 , -1.07818381],
       [ 0.02656052, -2.1348967 ,  0.48469923, -1.09750978],
       [-0.11554855,  3.74316794, -1.11721493,  1.22195992]])
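
These ufuncs interpret their input as radians, so the integers in `array1` are treated as angles in radians. The following sketch uses familiar angles and shows the degree conversion and the inverse functions:

```python
import numpy as np

# Trigonometric ufuncs expect radians
theta = np.array([0, np.pi / 2, np.pi])
sines = np.sin(theta)     # approximately [0, 1, 0]; sin(pi) comes out near 1e-16, not exactly 0

# Convert from degrees first when the data are in degrees
print(np.sin(np.radians(90.0)))   # 1.0

# Inverse functions map values back to angles in radians
angles = np.arcsin(np.array([-1.0, 0.0, 1.0]))   # [-pi/2, 0, pi/2]
```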

#### Exponents and logarithms

In [77]:
exponential = np.exp(array2)
exponential

array([[5.68572000e+24, 3.63797095e+42, 1.28516001e+19, 4.60718663e+28],
       [1.69488924e+28, 3.10429794e+26, 2.58131289e+20, 1.33833472e+42],
       [1.17191424e+16, 1.20260428e+06, 1.17191424e+16, 2.00855369e+01],
       [5.54062238e+34, 7.49841700e+33, 3.83100800e+22, 1.69488924e+28]])

In [78]:
logarithm = np.log(array2)
logarithm

array([[4.04305127, 4.58496748, 3.78418963, 4.18965474],
       [4.17438727, 4.11087386, 3.8501476 , 4.57471098],
       [3.61091791, 2.63905733, 3.61091791, 1.09861229],
       [4.38202663, 4.35670883, 3.95124372, 4.17438727]])

In [79]:
sqrt = np.sqrt(array1)
sqrt

array([[6.55743852, 9.89949494, 1.        , 6.        ],
       [6.        , 6.4807407 , 8.42614977, 9.16515139],
       [8.1240384 , 4.89897949, 7.54983444, 7.87400787],
       [8.30662386, 7.81024968, 6.32455532, 8.36660027]])
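
NumPy also provides base-2 and base-10 variants, plus `expm1` and `log1p`. These stay accurate for very small inputs, where `np.exp(x) - 1` loses precision to rounding. A sketch:

```python
import numpy as np

print(np.log2(np.array([1, 2, 4, 8])))    # [0. 1. 2. 3.]
print(np.log10(np.array([1, 10, 100])))   # [0. 1. 2.]
print(np.exp2(np.array([0, 1, 3])))       # [1. 2. 8.]

# For tiny x, expm1 computes exp(x) - 1 without the rounding error of the naive form
tiny = 1e-10
precise = np.expm1(tiny)
naive = np.exp(tiny) - 1
print(precise, naive)
```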

#### Specialized ufuncs

In [80]:
special = np.array([1+2j, 3+4j, 5+6j])

In [81]:
np.log(special)

array([0.80471896+1.10714872j, 1.60943791+0.92729522j,
       2.05543693+0.87605805j])

In [82]:
np.angle(special)

array([1.10714872, 0.92729522, 0.87605805])

### Advanced UFunc Features

#### Specifying output

In [87]:
output = np.empty_like(array1)

In [88]:
np.add(array1, array2, out=output)

array([[100, 196,  45, 102],
       [101, 103, 118, 181],
       [103,  38,  94,  65],
       [149, 139,  92, 135]])

In [89]:
output

array([[100, 196,  45, 102],
       [101, 103, 118, 181],
       [103,  38,  94,  65],
       [149, 139,  92, 135]])
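
`out=` is most useful when the target is a view, for example every other element of a larger array. The results are written directly into that memory without a temporary array. A sketch:

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10)
# Write 2**x into every other slot of y
np.power(2, x, out=y[::2])
print(y)  # [ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]
```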

#### Aggregates

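In [93]:
# np.empty only allocates memory, so its contents are arbitrary until written.
# Fill each buffer with out=; here array is still np.arange(12) from In [53].
sumout = np.empty((1,), dtype=array.dtype)
np.sum(array, keepdims=True, out=sumout)
sumout[0]

66

In [94]:
meanout = np.empty((1,), dtype=np.float64)
np.mean(array, keepdims=True, out=meanout)
meanout[0]

5.5

In [95]:
stdout = np.empty((1,), dtype=np.float64)
np.std(array, keepdims=True, out=stdout)
round(stdout[0], 4)

3.4521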

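The chapter's aggregates subsection covers two ufunc methods. `reduce` applies an operation repeatedly until one value remains, and `accumulate` keeps the intermediate results. A sketch on `np.arange(1, 6)`:

```python
import numpy as np

x = np.arange(1, 6)               # [1 2 3 4 5]
print(np.add.reduce(x))           # 15   (1+2+3+4+5)
print(np.multiply.reduce(x))      # 120  (1*2*3*4*5)
print(np.add.accumulate(x))       # [ 1  3  6 10 15]
print(np.multiply.accumulate(x))  # [  1   2   6  24 120]
```
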
#### Outer products

In [96]:
# np.outer flattens both 4x4 inputs to 16 elements, so the result is 16x16
outer_product = np.outer(array1, array2)
outer_product

array([[2451, 4214, 1892, 2838, 2795, 2623, 2021, 4171, 1591,  602, 1591,
         129, 3440, 3354, 2236, 2795],
       [5586, 9604, 4312, 6468, 6370, 5978, 4606, 9506, 3626, 1372, 3626,
         294, 7840, 7644, 5096, 6370],
       [  57,   98,   44,   66,   65,   61,   47,   97,   37,   14,   37,
           3,   80,   78,   52,   65],
       [2052, 3528, 1584, 2376, 2340, 2196, 1692, 3492, 1332,  504, 1332,
         108, 2880, 2808, 1872, 2340],
       [2052, 3528, 1584, 2376, 2340, 2196, 1692, 3492, 1332,  504, 1332,
         108, 2880, 2808, 1872, 2340],
       [2394, 4116, 1848, 2772, 2730, 2562, 1974, 4074, 1554,  588, 1554,
         126, 3360, 3276, 2184, 2730],
       [4047, 6958, 3124, 4686, 4615, 4331, 3337, 6887, 2627,  994, 2627,
         213, 5680, 5538, 3692, 4615],
       [4788, 8232, 3696, 5544, 5460, 5124, 3948, 8148, 3108, 1176, 3108,
         252, 6720, 6552, 4368, 5460],
       [3762, 6468, 2904, 4356, 4290, 4026, 3102, 6402, 2442,  924, 2442,
         198, 5280, ...

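`np.outer` flattens its inputs, which is why two 4x4 arrays above give a 16x16 result. The ufunc method `outer` also pairs every element of one input with every element of the other, but it keeps their dimensions. A sketch with small inputs:

```python
import numpy as np

x = np.arange(1, 4)   # [1 2 3]
table = np.multiply.outer(x, x)
print(table)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]

# np.outer flattens its inputs first; ufunc.outer keeps the dimensions
a = np.ones((2, 2))
print(np.outer(a, a).shape)           # (4, 4)
print(np.multiply.outer(a, a).shape)  # (2, 2, 2, 2)
```
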
## Aggregations: Min, Max, and Everything In Between

### Summing the Values in an Array

In [97]:
np.sum(array1)

860
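
`np.sum` and Python's built-in `sum` agree on 1-D arrays but not on 2-D arrays, because the built-in iterates over the first axis. A sketch (the name `grid` is illustrative):

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # rows [0 1 2] and [3 4 5]

# Built-in sum iterates over the first axis, adding the rows together
print(sum(grid))      # [3 5 7]
# np.sum (or the method grid.sum()) adds every element
print(np.sum(grid))   # 15
```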

### Minimum and Maximum

In [98]:
minimum = np.min(array1)
minimum

1

In [99]:
maximum = np.max(array1)
maximum

98
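
`min` and `max` are also available as array methods, and `argmin`/`argmax` return the position instead of the value. Ordinary aggregates return `nan` if any element is NaN, while the `nan`-prefixed variants ignore missing values. A sketch:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5])
print(x.min(), x.max())         # 1 5  (method forms of np.min / np.max)
print(x.argmin(), x.argmax())   # 1 4  (index of the first minimum / maximum)

# A single NaN propagates through ordinary aggregates; nanmax skips it
y = np.array([1.0, np.nan, 3.0])
print(np.max(y))      # nan
print(np.nanmax(y))   # 3.0
```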

#### Multidimensional aggregates

In [100]:
array3d = np.array([[[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12]],
                     
                     [[13, 14, 15, 16],
                      [17, 18, 19, 20],
                      [21, 22, 23, 24]]])

In [101]:
sum_total = np.sum(array3d)
sum_total

300

In [102]:
mean_total = np.mean(array3d)
mean_total

12.5

In [103]:
min_total = np.min(array3d)
min_total

1

In [104]:
max_total = np.max(array3d)
max_total

24
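
The cells above collapse the whole array to one number. Passing `axis` aggregates along a single dimension instead, and the named axis is the one that disappears from the result. A sketch that rebuilds the same `array3d` with `np.arange`:

```python
import numpy as np

array3d = np.arange(1, 25).reshape(2, 3, 4)   # same values as In [100]

# axis=0 collapses the first dimension: adds the two 3x4 blocks element-wise
print(array3d.sum(axis=0))
# [[14 16 18 20]
#  [22 24 26 28]
#  [30 32 34 36]]

# axis=2 collapses the last dimension: minimum of each 4-element row
print(array3d.min(axis=2))
# [[ 1  5  9]
#  [13 17 21]]
```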

### Example: What is the Average Height of US Presidents?

For this section, you'll need to execute the following cell first. 

In [70]:
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])

heights = array_from_url('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/8a34a4f653bdbdc01415a94dc20d4e9b97438965/notebooks/data/president_heights.csv','height(cm)')
print(heights)

[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
 174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
 177 185 188 188 182 185]


For this portion, start with the cell labeled `In [15]:`
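
The chapter's example summarizes the heights with a handful of aggregates. The following sketch computes those statistics. So that it runs without network access, it copies the 42 heights from the printed output above instead of calling `array_from_url`:

```python
import numpy as np

# Heights (cm) copied from the printed output of the cell above
heights = np.array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173,
                    175, 178, 183, 193, 178, 173, 174, 183, 183, 168, 170, 178,
                    182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183,
                    177, 185, 188, 188, 182, 185])

print("Mean height:       ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height:    ", heights.min())
print("Maximum height:    ", heights.max())
print("25th percentile:   ", np.percentile(heights, 25))
print("Median:            ", np.median(heights))
print("75th percentile:   ", np.percentile(heights, 75))
```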

---

# Section 2: Exercises

In this section, you will be provided a few exercises to demonstrate your understanding of the chapter contents. Each exercise will have a Markdown section describing the problem, and you will provide cells below the description with code, comments and visual demonstrations of your solution.

---

### Problem 1

Make sure you have the `array_from_url` function defined:

```python
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])
```

Call the `array_from_url` function with the following arguments

- URL: `"https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-areas.csv"`
- column: `"area (sq. mi)"`

to load the NumPy array into a variable `areas`.

Print out the `mean` area for all the states in the US. Use built-in methods and UFuncs where appropriate.

In [72]:
url = "https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-areas.csv"
column = "area (sq. mi)"
areas = array_from_url(url, column)

mean_area = np.mean(areas)
print("Mean area:", mean_area)

Mean area: 72892.28846153847


---

### Problem 2

Using the `areas` array created above, assign the total area of the United States and D.C. to a new variable, `total_area`, by using the `sum` method of `areas`.

In [76]:
# Assign to a new name so that areas still holds the per-state values for Problem 3
total_area = areas.sum()
total_area

3790399

---

### Problem 3

Using NumPy's various UFuncs, create a new array, `area_percentage`, that holds each state's area as a percentage of `total_area`.

For example, Alaska's area is the second element of the array `areas` (i.e., `areas[1]`), so Alaska's share of the total is `areas[1]/total_area`; multiplying by 100 expresses that share as a percentage.

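In [78]:
# Dividing the array by the scalar total_area applies np.divide element-wise;
# np.multiply by 100 then converts each fraction to a percentage
area_percentage = np.multiply(np.divide(areas, total_area), 100)
area_percentage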

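The calculation in Problem 3 relies on broadcasting a scalar across an array: `/` and `*` call the `np.divide` and `np.multiply` ufuncs on every element. The following sketch uses hypothetical toy values, not the real state areas, where the expected result is easy to check by hand:

```python
import numpy as np

# Hypothetical areas for three regions, used only to illustrate the calculation
demo_areas = np.array([50, 25, 25])
demo_total = demo_areas.sum()              # 100

# Each element is divided by the scalar total, then scaled to a percentage
demo_pct = demo_areas / demo_total * 100
print(demo_pct)        # [50. 25. 25.]
print(demo_pct.sum())  # 100.0
```
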
---

### Problem 4

Print out the heights of the American Presidents in feet (rather than cm). Use UFuncs for this. You'll need to look up the formula to convert cm to feet.

In [107]:
import pandas as pd

def array_from_url(url, column):
    data = pd.read_csv(url)
    return np.array(data[column])

heights = array_from_url('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/8a34a4f653bdbdc01415a94dc20d4e9b97438965/notebooks/data/president_heights.csv', 'height(cm)')

# 1 cm ≈ 0.0328084 ft (1 ft = 30.48 cm); the * operator calls the np.multiply ufunc
heightsfeet = heights * 0.0328084
heightsfeet

array([6.2007876, 5.577428 , 6.2007876, 5.3477692, 6.0039372, 5.6102364,
       6.069554 , 5.5118112, 5.6758532, 6.0039372, 5.6758532, 5.6758532,
       5.74147  , 5.8398952, 6.0039372, 6.3320212, 5.8398952, 5.6758532,
       5.7086616, 6.0039372, 6.0039372, 5.5118112, 5.577428 , 5.8398952,
       5.9711288, 5.905512 , 6.0039372, 5.8398952, 5.9711288, 6.1679792,
       5.74147  , 5.8727036, 6.0039372, 6.3320212, 5.9711288, 6.0039372,
       5.8070868, 6.069554 , 6.1679792, 6.1679792, 5.9711288, 6.069554 ])
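
The factor `0.0328084` used above is `1/30.48` rounded, because 1 ft is exactly 30.48 cm, so dividing by 30.48 gives the exact conversion. The following sketch works on the first three heights. It also uses the `np.divmod` ufunc to split a height into whole feet and remaining inches (1 in = 2.54 cm). The names `sample_cm`, `feet`, and `inches` are illustrative:

```python
import numpy as np

# First three heights from the data above, in cm
sample_cm = np.array([189, 170, 163])

# Exact conversion: divide by 30.48 instead of multiplying by a rounded factor
sample_ft = np.divide(sample_cm, 30.48)

# np.divmod returns (quotient, remainder): whole feet and leftover inches
feet, inches = np.divmod(np.divide(sample_cm, 2.54), 12)
print(feet)                 # [6. 5. 5.]
print(np.round(inches, 1))  # [2.4 6.9 4.2]
```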