# Data Manipulation with Python

## Introduction

Data manipulation is a crucial aspect of data science and analysis. In this notebook, we'll explore three powerful libraries in Python: NumPy, Pandas, and Matplotlib. These libraries provide tools for handling, analyzing, and visualizing data.


## NumPy: Numerical Python

### Introduction to NumPy

NumPy is a fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.

### NumPy Basics

#### Array Creation

In [4]:
import numpy as np

# Create a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])

# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Create an array with zeros
zeros_array = np.zeros((3, 3))

# Create an array with ones
ones_array = np.ones((2, 4))

print(arr_1d)
print(arr_2d)
print(zeros_array)
print(ones_array)

[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


#### Array Operations

In [5]:
# Arithmetic operations
result = arr_1d + 10

# Element-wise multiplication
result2 = arr_2d * 2

# Matrix multiplication
result3 = np.dot(arr_2d, np.ones((3, 1)))

print(result)
print(result2)
print(result3)

[11 12 13 14 15]
[[ 2  4  6]
 [ 8 10 12]]
[[ 6.]
 [15.]]
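
As a small sketch alongside the cell above: for 2-D arrays the `@` operator computes the same matrix product as `np.dot`, and adding a scalar or a smaller array relies on broadcasting. The names `a`, `col`, and `row_offsets` are illustrative and do not appear in the original notebook.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
col = np.ones((3, 1))

# For 2-D arrays, `@` and np.dot compute the same matrix product.
prod_dot = np.dot(a, col)
prod_at = a @ col

# Broadcasting: a (2, 1) column is stretched across the 3 columns of `a`.
row_offsets = np.array([[10], [20]])
shifted = a + row_offsets

print(prod_at)
print(shifted)
```

Element-wise `*` and the matrix product `@` are different operations, so it helps to check the resulting shapes when in doubt.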


#### Array Indexing

In [7]:
# Array Indexing
print("Element at index 2:", result[2])

Element at index 2: 13


#### Array Slicing

In [9]:
# Array Slicing
print("Sliced array:", result[1:4])

Sliced array: [12 13 14]


#### Data Types

In [10]:
arr_float = np.array([1, 2, 3], dtype=float)
print("Array with float data type:", arr_float)

Array with float data type: [1. 2. 3.]


#### Copy vs View

In [16]:
arr_copy = result.copy()
arr_view = result.view()
result[0] = 10
print("Original Array:", result)
print("Copied Array:", arr_copy)
print("Viewed Array:", arr_view)

Original Array: [10 12 13 14 15]
Copied Array: [11 12 13 14 15]
Viewed Array: [10 12 13 14 15]
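
The output above shows that the view follows the original while the copy does not. One way to check this relationship directly, sketched here with illustrative names (`original`, `copied`, `viewed`), is to inspect the `.base` attribute and `np.shares_memory`.

```python
import numpy as np

original = np.array([11, 12, 13, 14, 15])
copied = original.copy()   # independent buffer
viewed = original.view()   # new array object, same buffer

# A view keeps a reference to the array that owns its data in `.base`;
# an array that owns its data has `.base` set to None.
print(copied.base is None)
print(viewed.base is original)
print(np.shares_memory(original, viewed))

# Writing through the view changes the original, but not the copy.
viewed[0] = 10
print(original[0], copied[0])
```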


#### Array Shape

In [15]:
print("Shape of Array:", result.shape)

Shape of Array: (5,)


#### Array Reshape

In [19]:
arr_reshape = result.reshape(1, 5)
print("Reshaped Array:", arr_reshape)

Reshaped Array: [[10 12 13 14 15]]
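
A short sketch of two reshape details that the cell above does not show: passing `-1` lets NumPy infer one dimension, and the total number of elements must not change. The array `values` is an illustrative stand-in.

```python
import numpy as np

values = np.arange(6)          # [0 1 2 3 4 5]
grid = values.reshape(2, -1)   # -1 lets NumPy infer the second dimension (3)
flat = grid.reshape(-1)        # back to one dimension

# The element count must stay the same, otherwise reshape raises ValueError.
try:
    values.reshape(4, 2)
except ValueError as exc:
    print("reshape failed:", exc)

print(grid.shape, flat.shape)
```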


#### Array Iterating

In [20]:
for element in result:
    print(element)

10
12
13
14
15


#### Array Join

In [21]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr_join = np.concatenate((arr1, arr2))
print("Joined Array:", arr_join)

Joined Array: [1 2 3 4 5 6]


#### Array Split

In [22]:
arr_split = np.array_split(arr_join, 2)
print("Split Arrays:", arr_split)

Split Arrays: [array([1, 2, 3]), array([4, 5, 6])]
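
The example above splits six elements evenly. As a sketch of the uneven case, `np.array_split` tolerates a length that is not divisible by the number of parts, whereas `np.split` raises an error. The array `data` is illustrative.

```python
import numpy as np

data = np.arange(1, 8)             # 7 elements: [1 2 3 4 5 6 7]
parts = np.array_split(data, 3)    # earlier parts receive the extra element
sizes = [len(p) for p in parts]
print(sizes)

# np.split insists on equal-sized pieces and raises otherwise.
try:
    np.split(data, 3)
except ValueError as exc:
    print("np.split failed:", exc)
```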


#### Array Search

In [23]:
index = np.where(arr_join == 4)
print("Index of 4:", index)

Index of 4: (array([3], dtype=int64),)
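
The output above is a tuple because `np.where` with a single argument returns one index array per dimension. A minimal sketch of unpacking that tuple, plus the three-argument form that selects between two values, using an illustrative array `arr`:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# One argument: a tuple with one index array per dimension.
(indices,) = np.where(arr == 4)
first_index = int(indices[0])

# Three arguments: pick element-wise between two alternatives.
labels = np.where(arr % 2 == 0, "even", "odd")

print(first_index)
print(labels)
```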


#### Array Sort

In [27]:
arr_sort = np.sort(arr_join)
print("Sorted Array:", arr_sort)

Sorted Array: [1 2 3 4 5 6]


#### Array Filter

In [25]:
arr_filter = arr_join[arr_join > 3]
print("Filtered Array:", arr_filter)

Filtered Array: [4 5 6]
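
Filtering works because `arr_join > 3` produces a boolean mask. As a sketch, masks can be combined with `&` (and), `|` (or), and `~` (not); the names `arr`, `mask`, and `selected` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Parentheses are required around each comparison because `&`
# binds more tightly than `>` and `==`.
mask = (arr > 2) & (arr % 2 == 0)
selected = arr[mask]

print(mask)
print(selected)
```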


## Random

NumPy provides a variety of functions for generating random numbers and arrays. Here are some of the most common:

#### np.random.rand 
Generate random numbers from a uniform distribution over [0, 1).

In [26]:
random_numbers = np.random.rand(3, 2)  # 3x2 array of random numbers
print(random_numbers)

[[0.07225681 0.18417801]
 [0.05048088 0.63659361]
 [0.79947358 0.75929477]]


#### np.random.randn 
Generate random numbers from a standard normal distribution.

In [28]:
random_numbers_std_normal = np.random.randn(3, 2)  # 3x2 array of standard normal distribution numbers
print(random_numbers_std_normal)

[[ 0.19794525 -0.71724616]
 [ 1.05347964 -1.35449645]
 [-2.40356582 -0.3437388 ]]


#### np.random.randint
Generate random integers from `low` (inclusive) to `high` (exclusive).

In [29]:
random_integers = np.random.randint(1, 10, size=(3, 2))  # 3x2 array of random integers from 1 to 9 (10 is excluded)
print(random_integers)

[[2 3]
 [9 3]
 [1 5]]
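
To illustrate the half-open interval, the sketch below draws many integers and inspects the extremes. It uses a local `np.random.RandomState` (an assumption made here so the global random state of the notebook is left untouched); `rng` and `draws` are illustrative names.

```python
import numpy as np

# A local generator with a fixed seed; it offers the same randint method.
rng = np.random.RandomState(0)
draws = rng.randint(1, 10, size=10_000)

# The lower bound can occur, the upper bound never does.
print(draws.min(), draws.max())
```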


#### np.random.random_sample or np.random.random
Generate random floats in the half-open interval [0.0, 1.0).

In [30]:
random_floats = np.random.random_sample((3, 2))  # 3x2 array of random floats
print(random_floats)

[[0.21383953 0.12183887]
 [0.16265151 0.9278005 ]
 [0.98291903 0.90679948]]


#### np.random.choice
Generates a random sample from a given 1-D array (with replacement by default).

In [31]:
choices = np.array([1, 2, 3, 4, 5])
random_choice = np.random.choice(choices, size=(3, 2))  # 3x2 array of random choices from the array
print(random_choice)

[[1 5]
 [1 3]
 [2 3]]


#### np.random.shuffle
Shuffle an array in-place.

arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print(arr)

#### np.random.permutation
Randomly permute a sequence or return a permuted range.

In [33]:
permuted_arr = np.random.permutation(arr)
print(permuted_arr)

[5 1 2 3 4]
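
The difference between the two functions is easy to miss: `np.random.shuffle` reorders its argument in place and returns `None`, while `np.random.permutation` returns a new array. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

returned = np.random.shuffle(arr)      # reorders `arr` itself, returns None
permuted = np.random.permutation(arr)  # returns a shuffled copy, `arr` is unchanged

print(returned)
print(np.shares_memory(arr, permuted))
```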


#### np.random.seed
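Seed the generator for reproducibility. `np.random.seed` sets the global random state and returns `None`, which is why the cell below prints `None`.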

In [36]:
seed=np.random.seed(42)
print(seed)

None
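
A sketch of what seeding buys in practice: reseeding with the same value reproduces the same numbers. The second half shows `np.random.default_rng`, the newer Generator interface in NumPy (1.17 and later), as an alternative that avoids global state; it is not used elsewhere in this notebook.

```python
import numpy as np

# Same seed, same sequence.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))

# Alternative: an independent Generator object with its own seed.
rng = np.random.default_rng(42)
sample = rng.random(3)
print(sample.shape)
```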


## Probability Distributions
NumPy's random module provides functions for generating random numbers from various probability distributions. Here are some common probability distribution functions in NumPy:

#### Uniform Distribution (np.random.uniform):

Generates random samples from a uniform distribution over the half-open interval [low, high).

In [37]:
uniform_distribution = np.random.uniform(low=0.0, high=1.0, size=(3, 2))
print(uniform_distribution)

[[0.37454012 0.95071431]
 [0.73199394 0.59865848]
 [0.15601864 0.15599452]]


#### Normal Distribution (np.random.normal):

Generates random samples from a normal (Gaussian) distribution.

In [38]:
normal_distribution = np.random.normal(loc=0.0, scale=1.0, size=(3, 2))
print(normal_distribution)

[[ 1.57921282  0.76743473]
 [-0.46947439  0.54256004]
 [-0.46341769 -0.46572975]]


#### Binomial Distribution (np.random.binomial):

Generates random samples from a binomial distribution.

In [39]:
binomial_distribution = np.random.binomial(n=10, p=0.5, size=(3, 2))
print(binomial_distribution)

[[4 5]
 [5 4]
 [5 3]]
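
As a rough sanity check on the parameters, a binomial distribution has mean n·p and variance n·p·(1−p). The sketch below compares these with a large sample; it assumes a local `np.random.RandomState` with an arbitrary seed, and `samples`, `sample_mean`, and `sample_var` are illustrative names.

```python
import numpy as np

rng = np.random.RandomState(0)   # local generator, arbitrary seed
n, p = 10, 0.5
samples = rng.binomial(n=n, p=p, size=100_000)

sample_mean = samples.mean()
sample_var = samples.var()

print("mean:", sample_mean, "expected:", n * p)
print("variance:", sample_var, "expected:", n * p * (1 - p))
```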


#### Poisson Distribution (np.random.poisson):

Generates random samples from a Poisson distribution.

In [40]:
poisson_distribution = np.random.poisson(lam=5, size=(3, 2))
print(poisson_distribution)

[[5 3]
 [5 4]
 [6 7]]


#### Exponential Distribution (np.random.exponential):

Generates random samples from an exponential distribution.

In [41]:
exponential_distribution = np.random.exponential(scale=1.0, size=(3, 2))
print(exponential_distribution)

[[0.04628197 0.39353209]
 [0.49213029 0.31656044]
 [1.76455787 0.441227  ]]


#### Logistic Distribution (np.random.logistic):

Generates random samples from a logistic distribution.

In [42]:
logistic_distribution = np.random.logistic(loc=0.0, scale=1.0, size=(3, 2))
print(logistic_distribution)

[[-0.93983086  0.17120127]
 [-1.8076348   1.40008251]
 [-2.51880073  4.32094654]]


#### Chi-Square Distribution (np.random.chisquare):

Generates random samples from a chi-square distribution.

In [43]:
chi_square_distribution = np.random.chisquare(df=3, size=(3, 2))
print(chi_square_distribution)

[[1.15522492 3.91983193]
 [5.34434366 4.97870708]
 [0.94940659 1.72707227]]


#### Gamma Distribution (np.random.gamma):

Generates random samples from a gamma distribution.

In [44]:
gamma_distribution = np.random.gamma(shape=2, scale=1, size=(3, 2))
print(gamma_distribution)

[[1.12143497 1.43828812]
 [2.95108852 4.10226246]
 [2.17848714 0.96484462]]


#### Beta Distribution (np.random.beta):

Generates random samples from a beta distribution.

In [45]:
beta_distribution = np.random.beta(a=2, b=5, size=(3, 2))
print(beta_distribution)

[[0.15364547 0.30550235]
 [0.20331388 0.18387688]
 [0.36782409 0.20209677]]


#### Laplace Distribution (np.random.laplace):

Generates random samples from a Laplace distribution.

In [46]:
laplace_distribution = np.random.laplace(loc=0.0, scale=1.0, size=(3, 2))
print(laplace_distribution)

[[-0.45254579 -1.5136558 ]
 [-0.78554688 -0.15757168]
 [ 1.01068255  1.27819779]]


## Universal Functions

Universal functions (ufuncs) in NumPy are functions that operate element-wise on arrays. Because the loop over elements runs in compiled code rather than in Python, they are the key to NumPy's fast array operations. Here are some common universal functions in NumPy:
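
To make "element-wise" concrete, the sketch below compares a ufunc call with an explicit Python loop and with the matching operator. The arrays `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# A ufunc applies the operation to each pair of elements.
vectorized = np.add(a, b)
looped = np.array([x + y for x, y in zip(a, b)])
print(np.array_equal(vectorized, looped))

# Arithmetic operators on arrays dispatch to the same ufuncs.
print(np.array_equal(a + b, np.add(a, b)))

# Ufuncs also broadcast, so a scalar is applied to every element.
scaled = np.multiply(a, 10)
print(scaled)
```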

### Mathematical Operations:

#### np.add
Add corresponding elements of two arrays.

In [47]:
result_add = np.add(arr1, arr2)
print(result_add)

[5 7 9]


#### np.subtract: 
Subtract elements of the second array from the first array.

In [48]:
result_subtract = np.subtract(arr1, arr2)
print(result_subtract)

[-3 -3 -3]


#### np.multiply: 
Multiply corresponding elements of two arrays.

In [49]:
result_multiply = np.multiply(arr1, arr2)
print(result_multiply)

[ 4 10 18]


#### np.divide:
Divide elements of the first array by the corresponding elements of the second array.

In [50]:
result_divide = np.divide(arr1, arr2)
print(result_divide)

[0.25 0.4  0.5 ]


#### np.power: 
Raise elements of the first array to the power of the corresponding elements of the second array.

In [51]:
result_power = np.power(arr1, arr2)
print(result_power)

[  1  32 729]


#### np.sqrt: 
Compute the square root of each element.

In [52]:
result_sqrt = np.sqrt(arr)
print(result_sqrt)

[1.41421356 2.         1.         2.23606798 1.73205081]


## Trigonometric Functions:
#### np.sin, np.cos, np.tan:
Compute trigonometric functions of each element, with angles given in radians.

In [53]:
result_sin = np.sin(arr)
result_cos = np.cos(arr)
result_tan = np.tan(arr)
print(result_sin)
print(result_cos)
print(result_tan)

[ 0.90929743 -0.7568025   0.84147098 -0.95892427  0.14112001]
[-0.41614684 -0.65364362  0.54030231  0.28366219 -0.9899925 ]
[-2.18503986  1.15782128  1.55740772 -3.38051501 -0.14254654]


#### np.arcsin, np.arccos, np.arctan:
Compute inverse trigonometric functions.

In [59]:
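# Clamp values to [-1, 1], the domain of arcsin and arccos (np.clip limits values, it does not filter them).
# Every element of arr is >= 1, so all of them become 1 and each output row is constant.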
valid_values = np.clip(arr, -1, 1)
result_arcsin = np.arcsin(valid_values)
result_arccos = np.arccos(valid_values)
result_arctan = np.arctan(valid_values)
print(result_arcsin)
print(result_arccos)
print(result_arctan)

[1.57079633 1.57079633 1.57079633 1.57079633 1.57079633]
[0. 0. 0. 0. 0.]
[0.78539816 0.78539816 0.78539816 0.78539816 0.78539816]
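
Since the clipped input above collapses to a single value, a sketch with inputs spread across [-1, 1] may be more informative. It checks that `np.sin` undoes `np.arcsin` on this domain and shows what happens outside it; `x`, `angles`, and `outside` are illustrative names.

```python
import numpy as np

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
angles = np.arcsin(x)        # results are in radians
roundtrip = np.sin(angles)
print(np.allclose(roundtrip, x))

# Outside [-1, 1], arcsin is undefined for real numbers and yields nan.
# np.errstate suppresses the RuntimeWarning NumPy would otherwise emit.
with np.errstate(invalid="ignore"):
    outside = np.arcsin(np.array([2.0]))
print(np.isnan(outside))
```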


## Exponential and Logarithmic Functions:
#### np.exp: 
Compute the exponential of each element.

In [55]:
result_exp = np.exp(arr)
print(result_exp)

[  7.3890561   54.59815003   2.71828183 148.4131591   20.08553692]


#### np.log, np.log2, np.log10: 
Compute logarithmic functions.

In [62]:
result_log = np.log(arr)
result_log2 = np.log2(arr)
result_log10 = np.log10(arr)
print(result_log)
print(result_log2)
print(result_log10)

[0.69314718 1.38629436 0.         1.60943791 1.09861229]
[1.         2.         0.         2.32192809 1.5849625 ]
[0.30103    0.60205999 0.         0.69897    0.47712125]


## Rounding and Absolute Value:

#### np.round: 
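Round elements to a given number of decimals (0 by default, which rounds to the nearest integer). The array here already holds integers, so the values are unchanged.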

In [63]:
result_round = np.round(arr)
print(result_round)

[2 4 1 5 3]
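
Because the integer array above is unaffected by rounding, here is a sketch with floating-point input. It shows the `decimals` argument and NumPy's handling of exact halves, which are rounded to the nearest even value. The array `values` is illustrative.

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 3.14159])

rounded = np.round(values)          # exact halves go to the nearest even value
two_decimals = np.round(values, 2)  # keep two decimal places

print(rounded)
print(two_decimals)
```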


#### np.abs: 
Compute the absolute value of each element.

In [64]:
result_abs = np.abs(arr)
print(result_abs)

[2 4 1 5 3]


## Statistical Functions:

#### np.mean, np.median, np.std: 
Compute statistical measures. By default, `np.std` returns the population standard deviation (`ddof=0`).

In [65]:
mean_value = np.mean(arr)
median_value = np.median(arr)
std_dev = np.std(arr)
print(mean_value)
print(median_value)
print(std_dev)

3.0
3.0
1.4142135623730951
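
The value above is the population standard deviation. As a sketch, passing `ddof=1` switches the divisor from N to N − 1, giving the sample standard deviation; `data` is an illustrative array with the same values as `arr`.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

population_std = np.std(data)        # divides by N
sample_std = np.std(data, ddof=1)    # divides by N - 1

print(population_std)
print(sample_std)
```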


#### np.min, np.max: 
Find the minimum and maximum values.

In [66]:
min_value = np.min(arr)
max_value = np.max(arr)
print(min_value)
print(max_value)

1
5
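
For multi-dimensional arrays, these reductions accept an `axis` argument. The sketch below uses an illustrative 2-D array `m` to show column-wise and row-wise results, along with `np.argmax`, which returns the position of the maximum rather than its value.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

col_min = np.min(m, axis=0)   # minimum of each column
row_max = np.max(m, axis=1)   # maximum of each row
flat_pos = np.argmax(m)       # index of the maximum in the flattened array

print(col_min)
print(row_max)
print(flat_pos)
```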
