## **TDI DATA SCIENCE TRACK WEEK 6: NumPy**

In [1]:
import numpy as np

### **Section 1: Basic Questions**
**1. NumPy Array Creation and Manipulation**

**a)	Create a 1D array of numbers from 1 to 10.**

In [2]:
basic_array = np.arange(1, 11) # np.arange excludes the stop value, so this generates the numbers from 1 to 10
print(basic_array)

[ 1  2  3  4  5  6  7  8  9 10]


**b)	Reshape the array into a 2 x 5 matrix.**

In [3]:
reshaped = basic_array.reshape(2, 5)
print(reshaped)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


**c)	Find the element located in the second row, third column of the matrix.**

In [4]:
print(reshaped[1, 2])

8


**2. Basic Array Operations**

**a)	Create two NumPy arrays: array_1 = [10, 20, 30] and array_2 = [1, 2, 3].**

In [5]:
array_1 = np.array([10, 20, 30])
array_2 = np.array([1, 2, 3])

print(f"array_1: {array_1}")
print(f"array_2: {array_2}")


array_1: [10 20 30]
array_2: [1 2 3]


**b)	Perform element-wise addition, subtraction, and multiplication of these arrays.**

In [6]:
print(f"Adding the two arrays created above results in a new array: {array_1 + array_2}")
print(f"Subtracting the two arrays results in a new array: {array_1 - array_2}")
print(f"Multiplying the two arrays results in a new array: {array_1 * array_2}")

Adding the two arrays created above results in a new array: [11 22 33]
Subtracting the two arrays results in a new array: [ 9 18 27]
Multiplying the two arrays results in a new array: [10 40 90]


**c)	Calculate the sum, mean, and standard deviation of the resulting array after element-wise multiplication.**

In [7]:
print("Sum of the resulting arrays:")
print(f"> For the addition operation is: {(array_1 + array_2).sum()}")
print(f"> For the subtraction operation is: {(array_1 - array_2).sum()}")
print(f"> For the multiplication operation is: {(array_1 * array_2).sum()}")

Sum of the resulting arrays:
> For the addition operation is: 66
> For the subtraction operation is: 54
> For the multiplication operation is: 140


In [8]:
print("Mean of the resulting arrays:")
print(f"> For the addition operation is: {(array_1 + array_2).mean()}")
print(f"> For the subtraction operation is: {(array_1 - array_2).mean()}")
print(f"> For the multiplication operation is: {round((array_1 * array_2).mean(), 2)}")

Mean of the resulting arrays:
> For the addition operation is: 22.0
> For the subtraction operation is: 18.0
> For the multiplication operation is: 46.67


In [9]:
print("Standard Deviation of the resulting arrays:")
print(f"> For the addition operation is: {round((array_1 + array_2).std(), 2)}")
print(f"> For the subtraction operation is: {round((array_1 - array_2).std(), 2)}")
print(f"> For the multiplication operation is: {round((array_1 * array_2).std(), 2)}")

Standard Deviation of the resulting arrays:
> For the addition operation is: 8.98
> For the subtraction operation is: 7.35
> For the multiplication operation is: 33.0
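
The standard deviation reported above is the population form, which is what `.std()` computes by default (`ddof=0`). As a minimal sketch, assuming the three products are treated as a sample rather than a whole population, passing `ddof=1` gives the sample standard deviation instead. The variable names below are illustrative and not part of the original notebook.

```python
import numpy as np

# Element-wise product of the two arrays from the exercise
products = np.array([10, 20, 30]) * np.array([1, 2, 3])

# Default: population standard deviation (divides by n)
population_std = products.std()

# ddof=1: sample standard deviation (divides by n - 1)
sample_std = products.std(ddof=1)

print(f"population std: {population_std:.2f}")
print(f"sample std:     {sample_std:.2f}")
```

The two values differ noticeably here because the array has only three elements; the gap shrinks as the array grows.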


**3. Broadcasting and Reshaping**

**a)	Create a 1D array of numbers from 0 to 15. Reshape it into a 3x5 matrix.**

In [10]:
my_array = np.arange(15).reshape(3, 5) # a 3x5 matrix holds exactly 15 values, so the range runs from 0 to 14
print(my_array)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


**b)	Add 10 to each element in the matrix using broadcasting.**

In [11]:
my_array2 = my_array + 10
print(my_array2)

[[10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


**c)	Create another 3x5 matrix and multiply the two matrices using broadcasting.**

In [12]:
print(my_array * my_array2)

[[  0  11  24  39  56]
 [ 75  96 119 144 171]
 [200 231 264 299 336]]
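
The multiplication above pairs two matrices of identical shape, so NumPy has nothing to stretch. Broadcasting does more work when the shapes differ. The sketch below, which uses hypothetical weight arrays that are not part of the original exercise, multiplies a 3x5 matrix by a length-5 row and by a 3x1 column.

```python
import numpy as np

matrix = np.arange(15).reshape(3, 5)           # shape (3, 5)

# Hypothetical operands chosen only to illustrate broadcasting
row_weights = np.array([1, 2, 3, 4, 5])        # shape (5,)
col_weights = np.array([[1], [10], [100]])     # shape (3, 1)

# The row is repeated down the 3 rows; the column is repeated across the 5 columns
scaled_by_row = matrix * row_weights
scaled_by_col = matrix * col_weights

print(scaled_by_row)
print(scaled_by_col)
```

In both cases the result keeps the shape (3, 5), because NumPy stretches the dimension of size 1 (or the missing leading dimension) to match the larger operand.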


**4. Logical Operations and Boolean Indexing**

**a)	Create an array with values: [15, 22, 33, 41, 50, 65, 72].**

In [13]:
values = np.array([15, 22, 33, 41, 50, 65, 72])
print(values)

[15 22 33 41 50 65 72]


**b)	Use boolean indexing to filter out values that are greater than 40.**


In [14]:
print("These are the values greater than 40 in the array:")
print(values[values > 40])

These are the values greater than 40 in the array:
[41 50 65 72]


**c)	Replace all values greater than 40 with the value 0 in the original array.**

In [15]:
values[values > 40] = 0
print(values)

[15 22 33  0  0  0  0]
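
Assigning through a boolean mask changes the original array in place. If the original values are still needed, one possible alternative is to build new arrays from the mask instead. The sketch below uses illustrative variable names and leaves `values` untouched.

```python
import numpy as np

values = np.array([15, 22, 33, 41, 50, 65, 72])
mask = values > 40                       # boolean array, True where the condition holds

kept = values[~mask]                     # ~ inverts the mask: values of 40 or less
zeroed = np.where(mask, 0, values)       # new array; the original is not modified

print(kept)
print(zeroed)
print(values)
```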


### **Section 2: Advanced Questions**
**1. Matrix Operations**

**a)	Create a 3x3 matrix using the values [[1, 2, 3], [4, 5, 6], [7, 8, 9]].**


In [16]:
advanced = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(advanced)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


**b)	Find the transpose of the matrix.**

In [17]:
transposed = advanced.T
print(transposed)

[[1 4 7]
 [2 5 8]
 [3 6 9]]


**c)	Perform matrix multiplication with another 3x3 matrix of your choice using np.dot().**

In [18]:
using_dot = np.dot(advanced, transposed)
print(using_dot)

[[ 14  32  50]
 [ 32  77 122]
 [ 50 122 194]]


**d)	Calculate the inverse of the resulting matrix, if possible.**

In [19]:
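# The ^ operator is bitwise XOR, not inversion; a matrix has an inverse only if it has full rank
rank = np.linalg.matrix_rank(using_dot)
is_invertible = rank == using_dot.shape[0]
print(np.linalg.inv(using_dot) if is_invertible else f"No inverse: the matrix is singular (rank {rank} < {using_dot.shape[0]})")

No inverse: the matrix is singular (rank 2 < 3)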
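
The product matrix in this exercise is singular: its first and third rows add up to twice the second row, so no inverse exists. As a minimal sketch of what inversion looks like when it is possible, the example below uses a hypothetical invertible matrix (not part of the original exercise) and checks the result against the identity matrix.

```python
import numpy as np

# The matrix from the exercise: rows are linearly dependent
gram = np.array([[14, 32, 50], [32, 77, 122], [50, 122, 194]])
rows_dependent = bool((gram[0] + gram[2] == 2 * gram[1]).all())
print(f"rows dependent: {rows_dependent}, rank: {np.linalg.matrix_rank(gram)}")

# Hypothetical invertible matrix, chosen only for illustration (determinant 18)
invertible = np.array([[2, 1, 0], [1, 3, 1], [0, 1, 4]])
inverse = np.linalg.inv(invertible)

# A matrix times its inverse should give the identity, up to rounding error
is_identity = np.allclose(invertible @ inverse, np.eye(3))
print(f"inverse verified: {is_identity}")
```

Checking the rank (or wrapping `np.linalg.inv()` in a `try`/`except np.linalg.LinAlgError`) before inverting avoids relying on results that are dominated by rounding error.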


**2. Statistical Analysis**

**a)	Create two arrays: array_x = [1, 3, 5, 7, 9] and array_y = [2, 4, 6, 8, 10].**

In [20]:
array_x = np.array([1, 3, 5, 7, 9])
array_y = np.array([2, 4, 6, 8, 10])

**b)	Compute the correlation coefficient between the two arrays using np.corrcoef().**

In [21]:
print(np.corrcoef(array_x, array_y))

[[1. 1.]
 [1. 1.]]
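
`np.corrcoef()` returns a 2x2 correlation matrix: the diagonal holds each array's correlation with itself, and the off-diagonal entries hold the correlation between the two arrays. A short sketch of pulling out the single coefficient (the name `r` is illustrative):

```python
import numpy as np

array_x = np.array([1, 3, 5, 7, 9])
array_y = np.array([2, 4, 6, 8, 10])

# Entry [0, 1] (equal to entry [1, 0]) is the correlation between x and y
r = np.corrcoef(array_x, array_y)[0, 1]
print(f"correlation coefficient: {r:.2f}")
```

The coefficient is 1 here because `array_y` is exactly `array_x + 1`, a perfect linear relationship.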


**c)	Generate 1000 random numbers from a normal distribution and find the 95th percentile.**

In [22]:
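generate = np.random.normal(loc=0, scale=1, size=1000) # draw from a normal distribution; randint() draws uniform integers
print(np.percentile(generate, 95)) # np.percentile expects q on a 0-100 scale, so the 95th percentile is 95, not 0.95
# the printed value varies between runs; for a standard normal distribution it is close to the theoretical 1.645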
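
For a reproducible version of this step, one option is NumPy's `Generator` API with a fixed seed. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and an arbitrary seed of 0; neither is specified by the exercise.

```python
import numpy as np

rng = np.random.default_rng(seed=0)                  # seed chosen arbitrarily
samples = rng.normal(loc=0.0, scale=1.0, size=1000)  # assumed standard normal

p95 = np.percentile(samples, 95)                     # q is on a 0-100 scale

# Sanity check: about 95% of the samples should fall at or below the percentile
share_below = (samples <= p95).mean()

print(f"95th percentile: {p95:.3f}")
print(f"share of samples at or below it: {share_below:.3f}")
```

The sample percentile should land near 1.645, the theoretical 95th percentile of the standard normal distribution, with some variation from the finite sample size.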


**3. Solving Linear Equations** <br>
a) Solve the following system of equations using NumPy:<br>
   2x + y = 10 <br>
   x - y = 2 <br>
Use matrix form to solve this using np.linalg.solve().

In [23]:
A = np.array([[2, 1], [1, -1]])
B = np.array([10, 2]) # a 1D right-hand side makes solve() return a flat array of scalars
solution = np.linalg.solve(A, B)
print("The solution of the system of equations:")
print(f"x={solution[0]}, y={solution[1]}")

The solution of the system of equations:
x=4.0, y=2.0
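
A quick way to confirm a solution from `np.linalg.solve()` is to substitute it back into the system. The sketch below repeats the setup so that it runs on its own; the lowercase `b` and the `residual_ok` name are illustrative.

```python
import numpy as np

# 2x + y = 10 and x - y = 2 written as A @ [x, y] = b
A = np.array([[2, 1], [1, -1]])
b = np.array([10, 2])

x, y = np.linalg.solve(A, b)

# Substituting the solution back should reproduce the right-hand side
residual_ok = np.allclose(A @ np.array([x, y]), b)
print(f"x={x}, y={y}, verified={residual_ok}")
```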


**4. Fancy Indexing and Conditional Selection**

**a)	Create an array: array = [25, 45, 15, 75, 35, 55, 85].**

In [24]:
fancy = np.array([25, 45, 15, 75, 35, 55, 85])
print(fancy)

[25 45 15 75 35 55 85]


**b)	Use fancy indexing to rearrange the elements in descending order.**

In [25]:
print(fancy[[-1, 3, -2, 1, 4, 0, 2]]) # indices listed from the largest value to the smallest

[85 75 55 45 35 25 15]
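
Writing the index list by hand works for seven elements but does not scale. One possible alternative is to let `np.argsort()` compute the indices and then reverse them for descending order. The variable names below are illustrative.

```python
import numpy as np

fancy = np.array([25, 45, 15, 75, 35, 55, 85])

# argsort returns the indices that sort the array in ascending order;
# reversing them gives the indices for descending order
order = np.argsort(fancy)[::-1]
descending = fancy[order]        # fancy indexing with the computed index array

print(order)
print(descending)
```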


**c)	Use np.where() to replace all values greater than 50 with 100 and values less than or equal to 50 with 0.**


In [26]:
print(np.where(fancy > 50, 100, 0))

[  0   0   0 100   0 100 100]


**5. Performance Comparison: NumPy vs. Python Lists**

**a)	Create a Python list with 1 million elements and a NumPy array with the same number of elements.**

In [27]:
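# the list and the array are drawn independently, so they hold different values and their sums differ
python_list = np.random.randint(5, 200, 1000000).tolist()
np_array = np.random.randint(5, 200, 1000000) # randint() already returns a NumPy array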

**b)	Write a Python function that computes the sum of all elements in both the list and the NumPy array.**

In [28]:
def compute_sum(elements): # named compute_sum so that Python's built-in sum() is not shadowed
    return np.sum(elements)

# call the function
print(f"Sum of Python list: {compute_sum(python_list)}")
print(f"Sum of NumPy array: {compute_sum(np_array)}")

Sum of Python list: 102068479
Sum of NumPy array: 101919995


**c)	Use the timeit module to compare the time taken to sum the elements in both the list and the NumPy array. Report the time difference.**


In [29]:
import timeit

In [30]:
# define the code so it can be timed
# note: np.sum() converts the list to an array on every call, and that conversion is part of this timing
def wrapper():
    compute_sum(python_list)


python_time = timeit.timeit(stmt = wrapper, number = 100)
print(f"The time taken to calculate the sum of the Python list is: {python_time} seconds")

The time taken to calculate the sum of the Python list is: 37.65937789995223 seconds


In [31]:
# define the code so it can be timed
def wrapper():
    compute_sum(np_array)


numpy_time = timeit.timeit(stmt = wrapper, number = 100)
print(f"The time taken to calculate the sum of the NumPy array is: {numpy_time} seconds")

The time taken to calculate the sum of the NumPy array is: 0.8927232000278309 seconds


In [32]:
print(f"The difference in processing the calculation 100 times for both data types is {python_time - numpy_time} seconds.")
print("This comparison shows why NumPy arrays are preferred to Python lists for numerical work in data science.")

The difference in processing the calculation 100 times for both data types is 36.7666546999244 seconds.
This comparison shows why NumPy arrays are preferred to Python lists for numerical work in data science.
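
The list timing above includes the cost of converting the list to an array inside `np.sum()` on every call. A sketch of a more like-for-like comparison is shown below, assuming the same data is stored in both containers and that Python's built-in `sum()` is the natural way to total a list. The seed, the repeat count, and the variable names are illustrative choices, and actual timings depend on the machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(seed=0)               # seed chosen arbitrarily
np_data = rng.integers(5, 200, size=1_000_000)    # NumPy array of 1 million integers
list_data = np_data.tolist()                      # the same values as plain Python ints

repeats = 5  # kept small so the sketch runs quickly

# Three ways to total the same data
builtin_on_list = timeit.timeit(lambda: sum(list_data), number=repeats)
numpy_on_list = timeit.timeit(lambda: np.sum(list_data), number=repeats)
numpy_on_array = timeit.timeit(lambda: np.sum(np_data), number=repeats)

print(f"built-in sum() on list: {builtin_on_list:.4f} s")
print(f"np.sum() on list:       {numpy_on_list:.4f} s")
print(f"np.sum() on array:      {numpy_on_array:.4f} s")
```

Separating the three cases makes it easier to see how much of the gap comes from the list-to-array conversion and how much from NumPy's vectorized summation.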
