# NumPy

## 1. NumPy Array Basics
***What is NumPy:***<br>
NumPy, which stands for Numerical Python, is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional array and matrix data structures, along with a large collection of high-level mathematical functions to operate on these arrays.

***What is a NumPy Array:***<br>
A NumPy array (`ndarray`) is the central data structure of the NumPy library. Unlike a Python list, an array stores elements of a single data type, which makes it more compact and considerably faster for mathematical operations. This is particularly useful when handling large datasets and performing numerical computations.
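To make the contrast with lists concrete, here is a minimal sketch (the names `py_list` and `np_array` are illustrative only) that applies the same operators to a list and to an array.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# On a list, * repeats the sequence and + concatenates
print(py_list * 2)      # [1, 2, 3, 1, 2, 3]
print(py_list + [10])   # [1, 2, 3, 10]

# On an array, the same operators act element-wise
print(np_array * 2)     # [2 4 6]
print(np_array + 10)    # [11 12 13]
```

The array version is what makes numerical code short: no loop or list comprehension is needed to transform every element.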

In [14]:
import numpy as np

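# 1-D array built from a Python list
example_array = np.array([1, 2, 3, 4, 5])
print('this is an array:', example_array)

# 2-D array built from a list of lists (2 rows, 3 columns)
multi_dim_array = np.array([[1, 2, 3], [4, 5, 6]])
print('this is a multi dim array', multi_dim_array)

# .shape gives the length of each dimension
print(example_array.shape)
print(multi_dim_array.shape)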

zeros_array = np.zeros((2, 3))  # 2x3 array of zeros
ones_array = np.ones((3, 2))   # 3x2 array of ones
print(f"array of zeros {zeros_array}")
print(f"array of ones {ones_array}")



this is an array: [1 2 3 4 5]
this is a multi dim array [[1 2 3]
 [4 5 6]]
(5,)
(2, 3)
array of zeros [[0. 0. 0.]
 [0. 0. 0.]]
array of ones [[1. 1.]
 [1. 1.]
 [1. 1.]]


## 2. Array Inspection
Techniques to inspect the size, shape, memory consumption, and data types of arrays.

In [15]:
example_array = np.array([[1, 2, 3], [4, 5, 6]])
print("Number of Dimensions:", example_array.ndim)
print("Shape of Array:", example_array.shape)
print("Size of Array:", example_array.size)
# The default integer dtype is platform-dependent (for example int32 or int64)
print("Data Type of Array:", example_array.dtype)


Number of Dimensions: 2
Shape of Array: (2, 3)
Size of Array: 6
Data Type of Array: int32
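Memory consumption can be read from the `itemsize` and `nbytes` attributes. The sketch below sets the dtype explicitly (an assumption made here so that the byte counts do not vary between platforms); `inspect_array` is an illustrative name.

```python
import numpy as np

# dtype is fixed explicitly so the numbers below are the same on every platform
inspect_array = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print("Bytes per element:", inspect_array.itemsize)  # 8
print("Total bytes:", inspect_array.nbytes)          # 48 = 6 elements * 8 bytes
```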


## 3. Array Operations

NumPy arrays support basic arithmetic directly. These operations are element-wise: the operation is applied to each element individually, without writing an explicit Python loop.

In [16]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
print(array1 + 5)  # Broadcasting a scalar to each element of the array

[6 7 8]
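Element-wise arithmetic also works between two arrays of the same shape. A short sketch reusing the same two arrays:

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

print(array1 + array2)  # element-wise sum: 5, 7, 9
print(array1 * array2)  # element-wise product: 4, 10, 18
print(array2 / array1)  # true division returns floats: 4.0, 2.5, 2.0
print(array1 ** 2)      # element-wise power: 1, 4, 9
```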


In [17]:
# Dot product
dot_product = np.dot(array1, array2)
print(dot_product)

# Matrix multiplication
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
matrix_product = np.matmul(matrix1, matrix2)
print(matrix_product)

32
[[19 22]
 [43 50]]
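For 2-D arrays the `@` operator is the infix form of `np.matmul`. It is easy to confuse with `*`, which stays element-wise. A small sketch of the difference:

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# @ performs matrix multiplication, same result as np.matmul here
print(matrix1 @ matrix2)

# * multiplies matching positions only
print(matrix1 * matrix2)
```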


## 4. Working with NumPy Arrays
Indexing, slicing, iterating, and reshaping arrays.

In [22]:
array = np.array([1, 2, 3, 4, 5])
print(array[0])  # Accessing the first element of the array
print(array[-1])  # Accessing the last element of the array

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[0, 1])  # Accessing element in the first row and second column


1
5
2


In [23]:
# Slicing: the stop index is excluded
print(array[1:4])  # Elements at indices 1 to 3
print(matrix[0:2, 1:3])  # First two rows and the second and third columns

[2 3 4]
[[2 3]
 [5 6]]


In [28]:
array = np.arange(10)
reshaped_array1 = array.reshape(2, 5)  # Reshape to 2 rows and 5 columns
reshaped_array2 = array.reshape(5, -1)  # Reshape to 5 rows; -1 lets NumPy infer the column count (2)

print(array)
print(reshaped_array1)
print(reshaped_array2)



[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
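Iteration is listed in this section's summary but not shown in the cells. The following is a minimal sketch of three common ways to iterate, using the same 3x3 `matrix` as in the indexing cell.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Iterating over a 2-D array yields one row at a time
for row in matrix:
    print(row)

# .flat visits every element in row-major order
print([int(x) for x in matrix.flat])

# np.ndenumerate yields (index, value) pairs
for (i, j), value in np.ndenumerate(matrix[:2, :2]):
    print(i, j, value)
```

Explicit loops like these run at Python speed, so they are best kept for cases that cannot be expressed as whole-array operations.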


## 5. NumPy for Data Cleaning

Data cleaning is an essential step in the data analysis process. NumPy offers several functions and techniques to handle missing data and clean data sets, ensuring they are well-prepared for analysis.

In [29]:
# np.isnan() to identify missing values in an array.
data = np.array([1, np.nan, 3, 4, np.nan])
print(np.isnan(data)) 

[False  True False False  True]


In [32]:

# Removing Missing Values:
data = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])
clean_data = data[~np.isnan(data).any(axis=1)]  # Removes rows with any NaN values
print(clean_data)


[[4. 5. 6.]]


In [None]:
# Replacing NaN with the mean of all non-NaN values (np.nanmean ignores NaN)
mean_val = np.nanmean(data)
filled_data = np.nan_to_num(data, nan=mean_val)
print(filled_data)
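Filling with a single overall mean treats every column alike. When columns represent different features, a per-column mean is often preferred. The sketch below is one possible variant, not part of the original notebook; `col_means` and `filled_by_column` are illustrative names.

```python
import numpy as np

data = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])

# Mean of each column, ignoring NaN values: 2.5, 5.0, 7.5
col_means = np.nanmean(data, axis=0)

# Where a value is NaN, take its column mean (col_means broadcasts across rows)
filled_by_column = np.where(np.isnan(data), col_means, data)
print(filled_by_column)
```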


In [41]:
array_with_nan = np.array([1, np.nan, 3, 4, 5, 2, 10, 22, 1000, 1000])
# By default np.nan_to_num replaces NaN with 0.0
clean_array = np.nan_to_num(array_with_nan)
print('Clean Array:', clean_array)

Clean Array: [   1.    0.    3.    4.    5.    2.   10.   22. 1000. 1000.]


In [44]:
# Min-max normalization: rescale the values to the range [0, 1]
normalized_data = (clean_array - np.min(clean_array)) / (np.max(clean_array) - np.min(clean_array))
print(np.mean(clean_array), np.std(clean_array))
print(np.mean(normalized_data), np.std(normalized_data))

204.7 397.69562481878023
0.20469999999999997 0.39769562481878024
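The min-max formula divides by `max - min`, which is zero when all values are equal. Below is a minimal sketch of a reusable version with a guard for that case; `min_max_scale` is a hypothetical helper written for this example, not a NumPy function, and returning zeros for constant input is an assumed convention.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values to [0, 1]. Hypothetical helper, not part of NumPy."""
    values = np.asarray(values, dtype=float)
    value_range = np.ptp(values)  # max - min
    if value_range == 0:
        # All values are equal: return zeros instead of dividing by zero
        return np.zeros_like(values)
    return (values - values.min()) / value_range

print(min_max_scale([1, 0, 3, 4, 5, 2, 10, 22, 1000, 1000]))
print(min_max_scale([7, 7, 7]))
```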


In [46]:
# Converting data types: astype(int) drops the fractional part of each float
int_array = clean_array.astype(int)

## 6. NumPy for Statistical Analysis

NumPy provides a comprehensive set of functions for statistical analysis on arrays. These functions calculate the measures that describe the distribution, variability, and central tendency of your data.

In [48]:
scores = np.array([88, 72, 93, 94, 89, 78, 99, 100, 73, 85])

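mean_score = np.mean(scores)         # Arithmetic mean
median_score = np.median(scores)     # Middle value of the sorted scores
variance_score = np.var(scores)      # Population variance (ddof=0 by default)
std_deviation_score = np.std(scores) # Population standard deviation
min_score = np.min(scores)
max_score = np.max(scores)
range_scores = np.ptp(scores)        # Range (Max - Min)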

# Printing the results
print("Mean Score:", mean_score)
print("Median Score:", median_score)
print("Variance:", variance_score)
print("Standard Deviation:", std_deviation_score)
print("Minimum Score:", min_score)
print("Maximum Score:", max_score)
print("Range of Scores:", range_scores)


Mean Score: 87.1
Median Score: 88.5
Variance: 90.88999999999999
Standard Deviation: 9.53362470417207
Minimum Score: 72
Maximum Score: 100
Range of Scores: 28
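`np.var` and `np.std` divide by `n` by default, which gives the population variance and standard deviation. When the scores are a sample from a larger group, the sample versions (dividing by `n - 1`) are usually wanted. Both functions accept a `ddof` argument for this, as sketched here:

```python
import numpy as np

scores = np.array([88, 72, 93, 94, 89, 78, 99, 100, 73, 85])

population_var = np.var(scores)      # divides by n (ddof=0, the default)
sample_var = np.var(scores, ddof=1)  # divides by n - 1
sample_std = np.std(scores, ddof=1)

print(population_var, sample_var, sample_std)
```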


## 7. Advanced NumPy Techniques
Advanced topics such as vectorization and broadcasting.

In [50]:
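# np.vectorize is a convenience wrapper (essentially a Python loop), not a speed-up;
# for simple arithmetic, example_array * 2 gives the same result and is faster
vectorized_array = np.vectorize(lambda x: x * 2)(example_array)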
print('Vectorized Array:', vectorized_array)

set1 = np.array([[1, 2], [3, 4], [5, 6]])
set2 = np.array([[7, 8], [9, 10], [11, 12]])

# Vectorized Euclidean distance between corresponding rows of set1 and set2
distances = np.sqrt(np.sum((set1 - set2) ** 2, axis=1))
print("Euclidean Distances:", distances)

Vectorized Array: [[ 2  4  6]
 [ 8 10 12]]
Euclidean Distances: [8.48528137 8.48528137 8.48528137]
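The cell above compares each row of `set1` only with the matching row of `set2`. Broadcasting extends this to every pair of rows. The following is a sketch under that assumption; `diff` and `pairwise` are illustrative names.

```python
import numpy as np

set1 = np.array([[1, 2], [3, 4], [5, 6]])
set2 = np.array([[7, 8], [9, 10], [11, 12]])

# Shapes (3, 1, 2) and (1, 3, 2) broadcast to (3, 3, 2):
# diff[i, j] is set1[i] - set2[j]
diff = set1[:, np.newaxis, :] - set2[np.newaxis, :, :]

# Sum the squared differences over the coordinate axis
pairwise = np.sqrt(np.sum(diff ** 2, axis=2))

print(pairwise.shape)  # (3, 3)
print(pairwise)
```

The diagonal of `pairwise` holds the row-by-row distances computed in the cell above.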
