<img src = "https://img.betapage.co/images/77640967-77641456.png" height=50% width = 50%>

In [2]:
import numpy as np

# Introduction to NumPy

"Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. This library provides you with an array data structure that holds some benefits over Python lists, such as: being more compact, faster access in reading and writing items, being more convenient and more efficient."


# What is a NumPy array?

"The central feature of NumPy is the array object class. Arrays are similar to lists in Python, except that every element of an array must be of the same type, typically a numeric type like float or int. Arrays make operations with large amounts of numeric data very fast and are generally much more efficient than lists."

LINK: https://engineering.ucsb.edu/~shell/che210d/numpy.pdf

<img src = "http://community.datacamp.com.s3.amazonaws.com/community/production/ckeditor_assets/pictures/332/content_arrays-axes.png">

# NumPy Array Syntax
The function np.array takes the list to be converted as its first argument and, optionally, the type of the elements (dtype) as its second. If the type is omitted, NumPy infers it from the list contents.

In [3]:
#List to be converted
lst = [1,2,3,4,5,6,7,8,9]

arr = np.array(lst)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])
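
The optional type argument can be sketched as follows. This is a minimal illustration, assuming you want floats instead of the inferred integer type; the names arr_inferred and arr_float are made up for the example.

```python
import numpy as np

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Without dtype, NumPy infers an integer type from the list contents
arr_inferred = np.array(lst)

# With dtype, every element is converted to the requested type
arr_float = np.array(lst, dtype=float)

print(arr_inferred.dtype.kind)  # 'i' means a signed integer type
print(arr_float.dtype)          # float64
```

The exact integer width (int32 or int64) depends on the platform, which is why the sketch only checks the kind.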

Array elements are accessed, sliced, and manipulated just like lists.

In [4]:
#Index from the 2nd index on
arr[2:]

array([3, 4, 5, 6, 7, 8, 9])

In [5]:
#manipulate item at index 0
arr[0] = 10
arr

array([10,  2,  3,  4,  5,  6,  7,  8,  9])


<b>* Why can't we simply use a Python list for these scientific computations?</b>

# Python List VS NumPy Array

"Arrays and lists are both used in Python to store data, but they don't serve exactly the same purposes. They both can be used to store any data type (real numbers, strings, etc), and they both can be indexed and iterated through, but the similarities between the two don't go much further. The main difference between a list and an array is the functions that you can perform to them. For example, you can divide an array by 3, and each number in the array will be divided by 3 and the result will be printed if you request it. If you try to divide a list by 3, Python will tell you that it can't be done, and an error will be thrown."


In [6]:
lst = [3,6,9,12,15,18,12]
lst/3

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [7]:
arr = np.array([3,6,9,12,15,18,12])
arr/3

array([1., 2., 3., 4., 5., 6., 4.])
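
To get the same result from a plain list you have to loop over it yourself. The sketch below puts the two approaches side by side; lst_divided and arr_divided are illustrative names.

```python
import numpy as np

lst = [3, 6, 9, 12, 15, 18, 12]

# A plain list needs an explicit loop (here, a list comprehension)
lst_divided = [x / 3 for x in lst]

# An array applies the operation to every element at once
arr_divided = np.array(lst) / 3

print(lst_divided)
print(arr_divided)
```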

Arrays can be multidimensional. Unlike lists, different axes are accessed using commas inside bracket notation. Here is an example with a two-dimensional array (i.e., a matrix):

In [8]:
lst1 = [1,2,3,4,5]
lst2 = [5,6,7,8,9]
arr = np.array([lst1,lst2]) # wrap the two lists in an outer list; np.array(lst1, lst2) would treat lst2 as the dtype argument and raise an error
arr

array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])

In [9]:
arr/3

array([[0.33333333, 0.66666667, 1.        , 1.33333333, 1.66666667],
       [1.66666667, 2.        , 2.33333333, 2.66666667, 3.        ]])

In [10]:
lst_lst = [lst1,lst2]
lst_lst

[[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]

In [11]:
lst_lst/3

TypeError: unsupported operand type(s) for /: 'list' and 'int'

# Indexing Arrays VS Lists

In [12]:
arr

array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])

In [13]:
arr[0][1]

2

In [14]:
lst_lst

[[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]

In [15]:
lst_lst[0,1]

TypeError: list indices must be integers or slices, not tuple

In [16]:
lst_lst[0][1]

2
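
For comparison, the comma form that fails on the nested list works on the array. A minimal sketch using the same values as arr above:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [5, 6, 7, 8, 9]])

# Row 0, column 1 in a single indexing operation
single_step = arr[0, 1]

# Same element in two steps: first take row 0, then index into it
two_steps = arr[0][1]

print(single_step == two_steps)  # True
```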

In [17]:
arr[-1]

array([5, 6, 7, 8, 9])

In [18]:
lst_lst[-1]

[5, 6, 7, 8, 9]

In [19]:
arr

array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])

<h3> How to index a multidimensional array? </h3><br>
The individual elements of arrays can be accessed in the same way as for lists.

<img src = "http://www.scipy-lectures.org/_images/numpy_indexing.png" height = 60% width = 60%>

In [20]:
list_2d = [[0,1,2,3,4,5],
           [10,11,12,13,14,15],
           [20,21,22,23,24,25],
           [30,31,32,33,34,35],
           [40,41,42,43,44,45],
           [50,51,52,53,54,55]]

In [21]:
array_2d = np.array(list_2d)
print(array_2d)
array_2d.shape

[[ 0  1  2  3  4  5]
 [10 11 12 13 14 15]
 [20 21 22 23 24 25]
 [30 31 32 33 34 35]
 [40 41 42 43 44 45]
 [50 51 52 53 54 55]]


(6, 6)

In [22]:
print(array_2d[0,3:5])

[3 4]


In [23]:
print(array_2d[4:,4:])

[[44 45]
 [54 55]]


In [24]:
print(array_2d[:,2])

[ 2 12 22 32 42 52]


In [25]:
print(array_2d[2::2,::2]) # step by 2

[[20 22 24]
 [40 42 44]]


In [26]:
# adding new column to numpy array

In [27]:
calc = array_2d[:,5] * 1.05
calc

array([ 5.25, 15.75, 26.25, 36.75, 47.25, 57.75])

In [28]:
np.column_stack((array_2d,calc))

array([[ 0.  ,  1.  ,  2.  ,  3.  ,  4.  ,  5.  ,  5.25],
       [10.  , 11.  , 12.  , 13.  , 14.  , 15.  , 15.75],
       [20.  , 21.  , 22.  , 23.  , 24.  , 25.  , 26.25],
       [30.  , 31.  , 32.  , 33.  , 34.  , 35.  , 36.75],
       [40.  , 41.  , 42.  , 43.  , 44.  , 45.  , 47.25],
       [50.  , 51.  , 52.  , 53.  , 54.  , 55.  , 57.75]])
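
Notice that every value in the stacked result became a float. An array holds a single element type, so stacking an integer array with a float column upcasts the integers. A small sketch with made-up values:

```python
import numpy as np

ints = np.array([[0, 1], [10, 11]])    # integer array
extra = ints[:, 1] * 1.05              # float column

stacked = np.column_stack((ints, extra))

print(ints.dtype.kind)   # 'i'
print(stacked.dtype)     # float64: the integer columns were upcast
print(stacked.shape)     # (2, 3)
```

np.column_stack returns a new array, so ints itself keeps its shape and type.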

# Changing Array to different DataType

In [29]:
arr = arr.tolist()
arr

[[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]

In [30]:
type(arr)

list

In [31]:
arr = np.array(arr)
arr

array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])

In [32]:
# type gives you the data structure type of an object
type(arr)

numpy.ndarray

In [33]:
# dtype is an attribute (not a method) that tells you the data type of the elements inside the array
arr.dtype

dtype('int32')

In [34]:
arr.shape

(2, 5)
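
To change the element type itself rather than the container, one option is the astype method. A minimal sketch, assuming float and string targets purely for illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [5, 6, 7, 8, 9]])

# astype returns a new array; arr itself keeps its integer dtype
arr_float = arr.astype(float)
arr_str = arr.astype(str)

print(arr_float.dtype)   # float64
print(arr_str[0, 0])     # 1, now stored as a string
```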

# Change Array Shape

<img src = "https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/httpatomoreillycomsourceoreillyimages1346880.png" height = 50% width = 30% style = display.left> 

Transposed versions of arrays can also be generated. arr.transpose() returns a view of the array with the order of its axes reversed, so for a two-dimensional array the rows and columns are switched:

In [35]:
arr

array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])

In [36]:
arr.shape

(2, 5)

In [37]:
arr.transpose()

array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8],
       [5, 9]])

In [38]:
arr.transpose().shape

(5, 2)

In [39]:
arr.reshape((5,2))

array([[1, 2],
       [3, 4],
       [5, 5],
       [6, 7],
       [8, 9]])
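
Both results above have shape (5, 2), but the elements land in different places. The sketch below contrasts the two and also shows -1, which asks NumPy to infer one dimension:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [5, 6, 7, 8, 9]])

transposed = arr.transpose()      # swaps axes: element [i, j] moves to [j, i]
reshaped = arr.reshape((5, 2))    # keeps row-by-row order, refills a new shape
inferred = arr.reshape((-1, 2))   # -1 lets NumPy work out the 5

print(transposed[1, 0], reshaped[1, 0])  # 2 3
print(inferred.shape)                    # (5, 2)
```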

Make a multidimensional array into a one-dimensional array:

In [40]:
arr.shape

(2, 5)

In [41]:
arr.flatten()

array([1, 2, 3, 4, 5, 5, 6, 7, 8, 9])

In [42]:
arr.flatten().shape

(10,)
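
flatten always returns a copy. A related method, ravel, returns a view when the memory layout allows it. The sketch below checks this on a small made-up array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()   # always a copy
flat_copy[0] = 99           # does not touch arr

flat_view = arr.ravel()     # a view here, because arr is contiguous

print(arr[0, 0])                          # 1
print(np.shares_memory(arr, flat_copy))   # False
print(np.shares_memory(arr, flat_view))   # True
```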

# Create New Array (Specific)

NumPy also provides many functions to create arrays.

Creates an array of all zeros with a specified shape.

In [43]:
#1-Dimensional
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [44]:
#2-Dimensional
np.zeros((2,2), int)

array([[0, 0],
       [0, 0]])

Creates an array of all ones with a specified shape.

In [45]:
#1-Dimensional
np.ones(10, int)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [46]:
#2-Dimensional
np.ones((2,2))

array([[1., 1.],
       [1., 1.]])

Creates a constant array (specified number) with a specified shape.

In [47]:
#1-Dimensional
np.full(10,7)

array([7, 7, 7, 7, 7, 7, 7, 7, 7, 7])

In [48]:
#2-Dimensional
np.full((2, 2), 7)

array([[7, 7],
       [7, 7]])

Creates an array of a specified shape with random values drawn uniformly from [0, 1).

In [49]:
#1-Dimensional  # np.random(10) will give an error of 'module' object is not callable
np.random.random(10)

array([0.02107211, 0.39107656, 0.25652008, 0.09399946, 0.23916376,
       0.52494928, 0.39130812, 0.18342682, 0.78896763, 0.42157173])

In [50]:
#2-Dimensional
np.random.random((2,2))

array([[0.08616472, 0.09666005],
       [0.88274015, 0.07452481]])

Create an array of a specified length with evenly spaced values.

In [51]:
#1-Dimensional
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
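
Like Python's range, np.arange also accepts a start, a stop (excluded), and a step. A short sketch:

```python
import numpy as np

evens = np.arange(0, 10, 2)       # start, stop (excluded), step
countdown = np.arange(5, 0, -1)   # a negative step counts down

print(evens)       # [0 2 4 6 8]
print(countdown)   # [5 4 3 2 1]
```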

Create an array with a specified "start", "stop", and number of values, evenly spaced.

In [52]:
#1-Dimensional # Evenly spaced numbers from starting value to ending value, inclusive
np.linspace(1, 10, num = 20)

array([ 1.        ,  1.47368421,  1.94736842,  2.42105263,  2.89473684,
        3.36842105,  3.84210526,  4.31578947,  4.78947368,  5.26315789,
        5.73684211,  6.21052632,  6.68421053,  7.15789474,  7.63157895,
        8.10526316,  8.57894737,  9.05263158,  9.52631579, 10.        ])
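
The spacing in the output above follows from the endpoints: 20 points have 19 gaps, so each gap is (10 - 1) / 19, about 0.4737. The sketch below checks this using the retstep option, which returns the step alongside the values:

```python
import numpy as np

start, stop, num = 1, 10, 20
values, step = np.linspace(start, stop, num=num, retstep=True)

# With both endpoints included there are num - 1 gaps between num points
expected_step = (stop - start) / (num - 1)

print(step)
print(np.isclose(step, expected_step))  # True
```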

Creates an identity matrix (array) of a specified size, here 10x10.

An identity matrix is a square matrix having 1s on the main diagonal and 0s everywhere else. These are called identity matrices because, when you multiply them with a compatible matrix, you get back the same matrix.
http://www.sparknotes.com/math/algebra2/matrices/section3.rhtml

In [53]:
#2-Dimensional # eye can be n by m with 1 on the diagonal, it doesn't have to be square
np.eye(10)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

OR

In [54]:
#2-Dimensional # identity gives a square array with 1 on the diagonal
np.identity(10)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
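
The remark that np.eye does not have to be square can be sketched as follows. The second call uses the k argument to shift the diagonal:

```python
import numpy as np

rect = np.eye(2, 3)        # 2 rows, 3 columns
shifted = np.eye(3, k=1)   # ones on the diagonal just above the main one

print(rect)
print(shifted)
```

np.identity has no such options; it takes a single size and always returns a square array.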

In [55]:
# Element-wise multiplication
np.random.random((5, 5)) * np.identity(5)

array([[0.37715072, 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.00069883, 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.00224135, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.23655394, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.41484868]])

In [56]:
# Use np.dot() for matrix multiplication
np.random.random((5, 5)).dot(np.identity(5))

array([[0.47563097, 0.23810778, 0.81911847, 0.69994335, 0.56337358],
       [0.80770716, 0.18217004, 0.92379027, 0.70408446, 0.60244833],
       [0.44306892, 0.48694048, 0.76360406, 0.46943033, 0.62141231],
       [0.57695235, 0.51047635, 0.48479367, 0.96038767, 0.75350201],
       [0.28784526, 0.94987159, 0.92968963, 0.11688747, 0.12137085]])
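
Because the two cells above each draw a fresh random matrix, the outputs cannot be compared directly. The sketch below reuses one matrix for both operations; the seed is an assumption added only to make the example repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
a = rng.random((5, 5))
identity = np.identity(5)

elementwise = a * identity       # keeps only the diagonal of a
matmul = a.dot(identity)         # matrix product: gives a back

print(np.allclose(matmul, a))                          # True
print(np.allclose(elementwise, np.diag(np.diag(a))))   # True
```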

# Math Functions using NumPy

"As such, it probably won’t surprise you that you can just use +, -, *, / or % to add, subtract, multiply, divide or calculate the remainder of two (or more) arrays. However, a big part of why NumPy is so handy, is because it also has functions to do this. The equivalent functions of the operations that you have seen just now are, respectively, np.add(), np.subtract(), np.multiply(), np.divide() and np.remainder()."

https://www.datacamp.com/community/tutorials/python-numpy-tutorial

In [57]:
arr = np.ones((10,10))
arr

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

In [58]:
np.add(arr,2)

array([[3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.]])

In [59]:
#OR
arr + 2

array([[3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.]])

In [60]:
np.multiply(arr,2)

array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]])

In [61]:
#OR
arr*2

array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]])

In [62]:
np.subtract(arr,1)

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [63]:
#OR
arr -1 

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [64]:
np.divide(arr,2)

array([[0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]])

In [65]:
#OR
arr/2

array([[0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]])

In [66]:
np.remainder(arr,1)

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [67]:
#OR
arr % 1

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [68]:
arr.sum()

100.0

In [69]:
arr.min()

1.0

In [70]:
arr.max()

1.0

In [71]:
arr.mean()

1.0
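
These reductions also accept an axis argument, which is easier to see on an array whose values differ. A small sketch with made-up numbers:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.sum())          # 21, over every element
print(arr.sum(axis=0))    # [5 7 9], one total per column
print(arr.sum(axis=1))    # [ 6 15], one total per row
print(arr.mean(axis=1))   # [2. 5.]
```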

# <font color = magenta> NumPy Problem 1 </font>
<font color = magenta>
Create the three arrays displayed in the image below.
</font>

<img src = "https://i.stack.imgur.com/ojnFF.jpg">

In [72]:
#Array 1, create list then convert to np array
array1 = np.array([[4,6,4],[1,1,8],[0,7,5],[5,3,3],[8,9,5]])
array1

array([[4, 6, 4],
       [1, 1, 8],
       [0, 7, 5],
       [5, 3, 3],
       [8, 9, 5]])

In [73]:
#Array 2
array2 = np.array([[8,8,4],[3,4,4],[0,0,9],[3,7,3],[3,4,7]])
array2

array([[8, 8, 4],
       [3, 4, 4],
       [0, 0, 9],
       [3, 7, 3],
       [3, 4, 7]])

In [74]:
#Array 3
array3 = np.array([[9,5,4],[7,7,3],[9,5,9],[8,7,8],[5,8,8]])
array3

array([[9, 5, 4],
       [7, 7, 3],
       [9, 5, 9],
       [8, 7, 8],
       [5, 8, 8]])

# <font color = magenta> NumPy Problem 2 </font>
<font color = magenta>
Create a multidimensional array with dimensions of your choice and fill it with random values (not filled manually).
</font>

In [75]:
ran_array = np.random.random((2,3))
print(ran_array)

[[0.45301542 0.26563688 0.53423446]
 [0.73333001 0.09758066 0.72374289]]


Find the min and max values of your array.

In [76]:
print("Min: " + str(ran_array.min()))
print("Max: " + str(ran_array.max()))

Min: 0.09758066320771497
Max: 0.7333300101437166


# <font color = magenta> NumPy Problem 3 </font>
<font color = magenta>
Create another multidimensional array with dimensions of your choice and fill it with random values (not filled manually). Find the max value of your new array and replace it with your min value. Find the min value and replace it in your array with the max value.
</font>

In [77]:
ran_array2 = np.random.random((3,3))
ran_array2

array([[0.35954133, 0.14680573, 0.87311603],
       [0.39034766, 0.19316619, 0.67200132],
       [0.85479472, 0.70401127, 0.78789604]])

In [78]:
# np.where or np.nonzero can be used to extract the indices by indexing [0] and [1]
# but np.argmin and np.unravel_index is easier to extract the flat index of min, and corresponding tuple index
print(np.where(ran_array2 == ran_array2.min()))
print(np.nonzero(ran_array2 == ran_array2.min()))

(array([0], dtype=int64), array([1], dtype=int64))
(array([0], dtype=int64), array([1], dtype=int64))


In [79]:
# np.argmin gets the flat index of min
min_ix = np.argmin(ran_array2)  
# np.unravel_index(index, array_shape) turns the flat index into an index tuple
min_ix = np.unravel_index(min_ix, ran_array2.shape) 
min_ix

(0, 1)

In [80]:
max_ix= np.argmax(ran_array2)
max_ix = np.unravel_index(max_ix, ran_array2.shape)
max_ix

(0, 2)

In [81]:
ran_array2[min_ix], ran_array2[max_ix] = ran_array2[max_ix], ran_array2[min_ix]  # simultaneous assignment

In [82]:
ran_array2

array([[0.35954133, 0.87311603, 0.14680573],
       [0.39034766, 0.19316619, 0.67200132],
       [0.85479472, 0.70401127, 0.78789604]])

# <font color = magenta> NumPy Problem 4 </font>

Create a random vector of size 10 and sort it.

In [83]:
# sorted(iterable) creates a new list and works on any iterable
# list.sort() mutates the list in place and exists only on lists
# for numpy arrays, np.sort(array) creates a new sorted array, array.sort() sorts in place, and sorted(array) would return a list
ran_array3 = np.random.random(10)
print(ran_array3)
print(np.sort(ran_array3))  # creates a new array, doesn't change original


[0.28001569 0.05008231 0.82557061 0.63516288 0.06406694 0.08258813
 0.33245263 0.18885157 0.77675723 0.25189331]
[0.05008231 0.06406694 0.08258813 0.18885157 0.25189331 0.28001569
 0.33245263 0.63516288 0.77675723 0.82557061]
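
Arrays also have a .sort() method, which sorts in place instead of returning a new array. The sketch below shows both forms, plus one way to get descending order, on a few made-up values:

```python
import numpy as np

values = np.array([0.28, 0.05, 0.83, 0.64])

sorted_copy = np.sort(values)        # new array, values is unchanged
descending = np.sort(values)[::-1]   # reverse the sorted result

values.sort()                        # method form sorts in place, returns None

print(sorted_copy)
print(descending)
print(values)
```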


# <font color = magenta> NumPy Problem 5 </font>

<font color = magenta>
How to swap two rows of an array?
</font>

In [84]:
# i.e. row 1 becomes row 2 and row 2 becomes row 1
ran_array4 = np.random.random((3,3))
print(ran_array4)

# numpy arrays allow you to index multiple rows with a list of row numbers
# tuple-style swapping as with lists (a[1], a[2] = a[2], a[1]) does not work for rows, because row slices are views and one row would be overwritten
ran_array4[[1,2]] = ran_array4[[2,1]]  
print(ran_array4)

[[0.38365102 0.75558365 0.91419139]
 [0.15636999 0.85022064 0.24988512]
 [0.80047298 0.73932565 0.41144519]]
[[0.38365102 0.75558365 0.91419139]
 [0.80047298 0.73932565 0.41144519]
 [0.15636999 0.85022064 0.24988512]]
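
The overwriting problem can be seen on a small made-up array. The sketch below tries both styles of swap:

```python
import numpy as np

a = np.array([[1, 1], [2, 2], [3, 3]])
b = a.copy()

# Tuple-style swap: the right-hand side holds views, so row 1 is
# overwritten before it is read for the second assignment
a[1], a[2] = a[2], a[1]
print(a.tolist())   # [[1, 1], [3, 3], [3, 3]]

# Indexing with a list copies the selected rows first, so the swap works
b[[1, 2]] = b[[2, 1]]
print(b.tolist())   # [[1, 1], [3, 3], [2, 2]]
```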


In [85]:
# numpy arrays allow you to select multiple rows or columns with a list of row/col numbers, with the axes separated by commas
ran_array4[[0,1],0]

array([0.38365102, 0.80047298])

In [86]:
# pandas: test_df[0:2] slices rows by position, then [0] selects the column labeled 0 (test_df.iloc[0:2, 0] is the single-step equivalent)
import pandas as pd
test_df = pd.DataFrame(ran_array4)
print(test_df)
test_df[0:2][0]

          0         1         2
0  0.383651  0.755584  0.914191
1  0.800473  0.739326  0.411445
2  0.156370  0.850221  0.249885


0    0.383651
1    0.800473
Name: 0, dtype: float64

In [87]:
# No easy way to index certain slice in list of lists because it is not structured as a multi-D array or a dataframe
# Best to turn it into a numpy array if slice extraction is needed
test_list = [[4,5,6],[6,8,9], [10,3,5]]
print([row[0] for row in test_list][0:2]) # Extract the 0th column with list comprehension, then index the needed row
print(list(zip(*test_list))[0][0:2])  # Zip the unpacked list to turn it into tuples of columns, then index the needed rows; returns a tuple

# zip wants a bunch of arguments to zip together, but here is a single argument (a list of lists)
# the * operator "unpacks" a list (or other iterable), making each of its elements a separate argument

[4, 6]
(4, 6)


# Numpy with Bay Area housing data set

In [88]:
def read_file_housing(filename):
    # Clean the raw CSV and write the result to fixed-housing-data.csv.
    # The with statement closes both files, so all writes are flushed to disk.
    with open(filename, "r") as file_open, open("fixed-housing-data.csv", "w") as fixed_file:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line = line.rstrip()
            # Ex9 -- fix zip codes that start with 8 instead of 9
            for bad, good in (("84085", "94085"), ("84087", "94087"),
                              ("85014", "95014"), ("85051", "95051")):
                line = line.replace(bad, good)
            line = line.replace("l", "1")  # Ex11 -- Car_Garage: letter "l" in place of digit 1
            line_split = line.split(",")
            school_api = int(line_split[5])
            if school_api < 100:  # Ex10 -- School_API: scale values below 100 by 10
                school_api *= 10
            line_split[5] = str(school_api)
            fixed_file.write(",".join(line_split) + "\n")

In [89]:
read_file_housing("bayarea_home_prices.csv")

In [90]:
import numpy as np

In [91]:
"""
0 = HomeID
1 = HomeAge
2 = HomeSqft
3 = LotSize
4 = BedRooms
5 = HighSchoolAPI
6 = ProxFwy
7 = CarGarage
8 = ZipCode
9 = HomePriceK
"""

'\n0 = HomeID\n1 = HomeAge\n2 = HomeSqft\n3 = LotSize\n4 = BedRooms\n5 = HighSchoolAPI\n6 = ProxFwy\n7 = CarGarage\n8 = ZipCode\n9 = HomePriceK\n'

In [92]:
housing = np.loadtxt("fixed-housing-data.csv",
                          dtype=int,
                          delimiter=",")

In [93]:
print(housing[0:2])

[[    1    24  1757  6056     2   899     3     3 94085   894]
 [    2    10  1563  6085     2   959     4     3 94085   861]]


In [94]:
print(housing.shape)

(100, 10)
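
If the housing file is not at hand, the np.loadtxt call can be tried on an in-memory stand-in. The two rows below are a made-up excerpt with only three columns, not the real data set:

```python
import io
import numpy as np

# Two made-up rows standing in for a comma-separated file
fake_csv = io.StringIO("1,24,1757\n2,10,1563\n")

data = np.loadtxt(fake_csv, dtype=int, delimiter=",")

print(data.shape)   # (2, 3)
print(data[:, 1])   # [24 10]
```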


In [95]:
# home prices
print(housing[:,9])

[ 894  861  831  809  890  867  843  820  874  885  903  912  933  865
  918  950  882  896  942  859  904  912  916  972  908  934  914  949
  919  953  991 1049 1042  994 1030 1019 1044 1038 1024  976 1115 1128
 1071 1059 1000 1185 1015 1114 1138 1068 1068 1097 1074 1114 1075 1130
 1116 1103 1080 1150 1177 1149 1163 1132 1138 1199 1179 1173 1128 1165
 1233 1180 1240 1242 1184 1173 1194 1181 1190 1182 1221 1288 1275 1300
 1272 1294 1219 1282 1256 1205 1252 1294 1269 1335 1267 1307 1336 1284
 1269 1250]


In [96]:
housing[:,9][0].dtype

dtype('int32')

In [97]:
print(housing[:,9] + 10)  # prints +10, doesn't change the housing data

[ 904  871  841  819  900  877  853  830  884  895  913  922  943  875
  928  960  892  906  952  869  914  922  926  982  918  944  924  959
  929  963 1001 1059 1052 1004 1040 1029 1054 1048 1034  986 1125 1138
 1081 1069 1010 1195 1025 1124 1148 1078 1078 1107 1084 1124 1085 1140
 1126 1113 1090 1160 1187 1159 1173 1142 1148 1209 1189 1183 1138 1175
 1243 1190 1250 1252 1194 1183 1204 1191 1200 1192 1231 1298 1285 1310
 1282 1304 1229 1292 1266 1215 1262 1304 1279 1345 1277 1317 1346 1294
 1279 1260]


In [98]:
print(housing.sum(axis=0)) # sums down the rows, giving one total per column (a 1-D array of length 10)

[   5050    1720  161528  784050     271   90443     310     152 9455925
  108099]


In [99]:
print(housing.sum(axis=1)) # sums across the columns, giving one total per row (a 1-D array of length 100)

[103724 103574 103240 103224 103896 103638 104044 103869 103609 104170
 104094 103592 103978 104376 104062 105595 104218 104335 104940 104012
 104075 104583 104684 105583 104958 104446 104876 105593 104503 106006
 105990 106058 106256 105929 106298 105629 106320 106196 106023 106261
 105369 105445 106369 105987 105936 105549 106606 105320 105358 106782
 106365 106479 106225 105461 106713 105795 106138 105594 106827 105847
 105796 106180 105869 105800 106169 107018 106669 106315 106430 106551
 107465 106524 107181 107608 106533 107139 107961 106805 106554 106595
 107843 107864 108077 107983 108182 108227 107735 107879 108478 108106
 107902 108487 107956 108654 108172 108678 108406 108111 108617 108407]


In [100]:
homes_94085 = (housing[:,8] == 94085)

In [101]:
print(homes_94085)

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True False  True  True False  True  True  True  True False
  True  True  True False  True False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False]


In [102]:
data_94085 = housing[homes_94085]  # keep only the rows where the mask is True
# print(data_94085)

In [103]:
sum_price_94085 = data_94085[:,9].sum()

In [104]:
average_94085 = sum_price_94085 / homes_94085.sum()  # divide by the number of True values in the mask (25 here)
print(average_94085)

885.96


In [105]:
housing[housing[:,8] == 94085,9].sum()/len(housing[housing[:,8] == 94085,9])

885.96
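
The masking steps above can be followed on a tiny table. The three rows below are hypothetical and use only two columns (zip code and price):

```python
import numpy as np

# Hypothetical mini table: column 0 = zip code, column 1 = price
homes = np.array([[94085, 894],
                  [94087, 1138],
                  [94085, 861]])

mask = homes[:, 0] == 94085    # one True/False per row
prices = homes[mask, 1]        # rows where mask is True, column 1

print(mask.sum())      # 2, since True counts as 1
print(prices.mean())   # 877.5
```

Using .mean() on the filtered prices avoids hard-coding the number of matching rows.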

# NumPy Problem 6
### Calculate average price in each zip code: 94085, 94087, 95014, 95051
### Calculate minimum and max price in each zip code: 94085, 94087, 95014, 95051
### Calculate standard deviation of price in each zip code: 94085, 94087, 95014, 95051

In [106]:
# Your code here
# print(housing[0:5])
'''
0 = HomeID
1 = HomeAge
2 = HomeSqft
3 = LotSize
4 = BedRooms
5 = HighSchoolAPI
6 = ProxFwy
7 = CarGarage
8 = ZipCode
9 = HomePriceK
'''
def avg_price(zipcode):
    return housing[housing[:,8] == zipcode,9].mean()

def min_max_price(zipcode):
    return [housing[housing[:,8] == zipcode,9].min(), housing[housing[:,8] == zipcode,9].max()]
    
def std_price(zipcode):
    return housing[housing[:,8] == zipcode,9].std()
            
zipcodes = [94085, 94087, 95014, 95051]
#print("Stats for "+', '.join([str(z) for z in zipcodes]))

print("ZipCode: Average, Min, Max, Stdev")
# f-string makes formatting *much easier* by putting variables in {}, and format with {var:format}
for z in zipcodes:
        print(f"{z}  : {avg_price(z)}, {min_max_price(z)[0]}, {min_max_price(z)[1]}, {std_price(z):.2f}")    
        #print("%i  : %.2f, %i, %i, %.2f" % (z, avg_price(z), min_max_price(z)[0], min_max_price(z)[1], std_price(z)))
        #print("{0}  : {1}, {2}, {3}, {4:.2f}".format(z, avg_price(z), min_max_price(z)[0], min_max_price(z)[1], std_price(z)))
  

# unrelated note: instructor's company is Discern Analytics - Financial Deep Learning to uncover not obvious events that may or may not happen in the next week that may affect a stock ticker


ZipCode: Average, Min, Max, Stdev
94085  : 885.96, 809, 934, 33.71
94087  : 1151.48, 1103, 1190, 27.57
95014  : 1263.32, 1194, 1336, 37.74
95051  : 1023.2, 942, 1097, 46.04


In [107]:
print(f"Hello {avg_price(94085)}")
f"Hello {avg_price(94085)}"

Hello 885.96


'Hello 885.96'

# NumPy Problem 7
### Find top-2 listings by School API for all zipcodes

In [108]:
# Your code here, search by zipcode and sort by school api, print first two listings

In [109]:
# argsort() returns the index positions for the values if they were to be sorted, kind of like their order number
h1 = housing[housing[:,5].argsort()] # by school_api ascending
print(h1[0:10])

[[   65    14  1617  8394     2   850     2     0 94087  1138]
 [   73    25  1302  8668     3   850     4     2 95014  1240]
 [   23    15  1828  6956     3   851     4     3 94085   916]
 [   20    13  1358  6819     2   851     3     2 94085   859]
 [   79    17  1373  8953     2   851     2     0 94087  1190]
 [   77    17  1881  8921     3   852     2     0 95014  1194]
 [   19    10  1246  6810     2   853     4     3 95051   942]
 [   32    18  1866  7181     2   854     2     3 95051  1049]
 [   95    13  1582  9339     3   856     3     0 95014  1267]
 [   26    12  1500  7025     2   856     4     2 94085   934]]
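
What argsort returns is easier to see on a few values. In the sketch below, scores and table are made-up stand-ins for the School API column and the housing array:

```python
import numpy as np

scores = np.array([30, 10, 20])
order = scores.argsort()       # positions of the values in ascending order

print(order)           # [1 2 0]
print(scores[order])   # [10 20 30]

# Reordering whole rows by one column, as done with housing above
table = np.array([[1, 30], [2, 10], [3, 20]])
by_score = table[table[:, 1].argsort()]

print(by_score[:, 0])  # [2 3 1]
```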


In [110]:
# .argsort()[::-1] reverse the numpy array of sort indices
h2 = housing[housing[:,5].argsort()[::-1]] # by school_api descending
print(h2[0:10])

[[   38    22  1724  7339     3   975     3     3 95051  1038]
 [   35    12  1943  7249     2   974     2     0 95051  1030]
 [   27    13  1836  7027     2   966     3     3 94085   914]
 [   17    23  1464  6773     3   965     4     2 94085   882]
 [   69    21  1575  8579     2   962     4     3 94087  1128]
 [   37    13  1874  7333     3   960     3     2 95051  1044]
 [   45    15  1249  7609     3   960     2     2 95051  1000]
 [    2    10  1563  6085     2   959     4     3 94085   861]
 [    4    14  1215  6129     3   959     4     2 94085   809]
 [   33    11  1953  7199     3   959     3     2 95051  1042]]


In [111]:
temp = housing[housing[:,8] == 94085]  # first filter by rows of the proper zipcode
print(temp[temp[:,5].argsort()[::-1]][0:2])  # then display the sorted version by SchoolAPI, and show the first 2 rows

[[   27    13  1836  7027     2   966     3     3 94085   914]
 [   17    23  1464  6773     3   965     4     2 94085   882]]


In [112]:
# can use sorted with lambda instead of .argsort(), but the result is a list of arrays instead of a multi-D arrays
# so it's better to use .argsort() above for sorting multi-D arrays
sorted(temp, key=lambda x:x[5], reverse = True)[0:2]  

[array([   27,    13,  1836,  7027,     2,   966,     3,     3, 94085,
          914]),
 array([   17,    23,  1464,  6773,     3,   965,     4,     2, 94085,
          882])]

In [113]:
def top_api_zipcode(zipcode):
    ziprows = housing[housing[:,8] == zipcode]
    return ziprows[ziprows[:,5].argsort()[::-1]][0:2]

for z in zipcodes:
    print(f"Top 2 listings by School API for zipcode {z}:")
    print(top_api_zipcode(z))
    

Top 2 listings by School API for zipcode 94085:
[[   27    13  1836  7027     2   966     3     3 94085   914]
 [   17    23  1464  6773     3   965     4     2 94085   882]]
Top 2 listings by School API for zipcode 94087:
[[   69    21  1575  8579     2   962     4     3 94087  1128]
 [   76    12  1947  8882     3   954     3     2 94087  1173]]
Top 2 listings by School API for zipcode 95014:
[[   97    10  1645  9352     4   942     3     3 95014  1336]
 [   93    25  1298  9309     3   942     3     0 95014  1269]]
Top 2 listings by School API for zipcode 95051:
[[   38    22  1724  7339     3   975     3     3 95051  1038]
 [   35    12  1943  7249     2   974     2     0 95051  1030]]


# NumPy Problem 8
### Prices are expected to go up by 4% next year.
### Add another column with predicted prices

In [114]:
# casting the new float column to object keeps the existing integer columns from being converted to floats
newprices = housing[:,9]*1.04
newpriceadded = np.column_stack((housing, newprices.astype(object)))  # np.object was removed in NumPy 1.24; the builtin object works in all versions

print(newpriceadded)

[[1 24 1757 ... 94085 894 929.76]
 [2 10 1563 ... 94085 861 895.44]
 [3 14 1344 ... 94085 831 864.24]
 ...
 [98 21 1312 ... 95014 1284 1335.3600000000001]
 [99 19 1880 ... 95014 1269 1319.76]
 [100 11 1691 ... 95014 1250 1300.0]]


# NumPy Problem 9
### Sort the matrix based on HomeID. Save the updated numpy matrix with added column in Problem 8 to a file.

In [115]:
# Your code here, HomeID should be already sequential, can try sorting it with SchoolAPI or other cols, write numpy array as a file, google
sort_newhousing = newpriceadded[newpriceadded[:,0].argsort()]
print(sort_newhousing)


[[1 24 1757 ... 94085 894 929.76]
 [2 10 1563 ... 94085 861 895.44]
 [3 14 1344 ... 94085 831 864.24]
 ...
 [98 21 1312 ... 95014 1284 1335.3600000000001]
 [99 19 1880 ... 95014 1269 1319.76]
 [100 11 1691 ... 95014 1250 1300.0]]


In [116]:
import pandas as pd
newhousingdf = pd.DataFrame(sort_newhousing)
newhousingdf.to_csv("newhousingdata.csv")
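
The file can also be written without pandas. The sketch below uses np.savetxt on a small made-up float array and a temporary path; both are assumptions for illustration, and a purely numeric array is used here to keep the format string simple.

```python
import os
import tempfile
import numpy as np

table = np.array([[1, 894, 929.76],
                  [2, 861, 895.44]])

path = os.path.join(tempfile.mkdtemp(), "example.csv")  # throwaway location

# fmt applies to every value; "%.2f" keeps two decimals
np.savetxt(path, table, delimiter=",", fmt="%.2f")

round_trip = np.loadtxt(path, delimiter=",")
print(np.allclose(round_trip, table))   # True
```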


# <font color = magenta> NumPy Problem 10 </font>

Write a function that takes a long string containing multiple words. Print the same string, except with the words in backwards order. 

<i>HINT: Use <b>YOUR_STRING<code>.split()</code></b> function<br></i>

In [131]:
# We do not worry about grade in this class. -> Class this in grade about worry not do we. 
# lowercase, split, reverse, first letter of sentence made uppercase, append period at the end
my_string = "We do not worry about grade in this class."


In [132]:
import string
my_string_translate = str.maketrans('','',string.punctuation)
string_list = my_string.translate(my_string_translate).lower().split()
reversed_string_list = string_list[::-1]
reversed_string_list

['class', 'this', 'in', 'grade', 'about', 'worry', 'not', 'do', 'we']

In [133]:
reversed_string = " ".join(reversed_string_list)
reversed_string[0] 

'c'

In [134]:
result_string = reversed_string[0].upper() + reversed_string[1:] + '.'
result_string

'Class this in grade about worry not do we.'
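
Since the problem asks for a function, the steps above could be wrapped up as in the sketch below. The name reverse_words and the empty-input handling are assumptions, not part of the original exercise.

```python
import string

def reverse_words(sentence):
    """Return the sentence with its words in reverse order."""
    # Strip punctuation, lowercase, and split on whitespace
    table = str.maketrans("", "", string.punctuation)
    words = sentence.translate(table).lower().split()
    if not words:
        return ""
    reversed_sentence = " ".join(reversed(words))
    # Capitalise the first letter and restore a closing period
    return reversed_sentence[0].upper() + reversed_sentence[1:] + "."

print(reverse_words("We do not worry about grade in this class."))
# Class this in grade about worry not do we.
```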