# Data Manipulation with Python

## Introduction

Data manipulation is a crucial aspect of data science and analysis. In this notebook, we'll explore three widely used Python libraries: NumPy, pandas, and Matplotlib. Together, these libraries provide tools for handling, analyzing, and visualizing data.


## NumPy: Numerical Python

### Introduction to NumPy

NumPy is a fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.

### NumPy Basics

#### Array Creation

In [4]:
import numpy as np

# Create a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])

# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Create an array with zeros
zeros_array = np.zeros((3, 3))

# Create an array with ones
ones_array = np.ones((2, 4))

print(arr_1d)
print(arr_2d)
print(zeros_array)
print(ones_array)

[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


#### Array Operations

In [5]:
# Arithmetic operations
result = arr_1d + 10

# Element-wise multiplication
result2 = arr_2d * 2

# Matrix multiplication
result3 = np.dot(arr_2d, np.ones((3, 1)))

print(result)
print(result2)
print(result3)

[11 12 13 14 15]
[[ 2  4  6]
 [ 8 10 12]]
[[ 6.]
 [15.]]


#### Array Indexing

In [7]:
# Array Indexing
print("Element at index 2:", result[2])

Element at index 2: 13


#### Array Slicing

In [9]:
# Array Slicing
print("Sliced array:", result[1:4])

Sliced array: [12 13 14]


#### Data Types

In [10]:
arr_float = np.array([1, 2, 3], dtype=float)
print("Array with float data type:", arr_float)

Array with float data type: [1. 2. 3.]


#### Copy vs View

In [16]:
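arr_copy = result.copy()  # independent copy with its own data
arr_view = result.view()  # new array object that shares data with result
result[0] = 10            # the change is visible through the view, not the copy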
print("Original Array:", result)
print("Copied Array:", arr_copy)
print("Viewed Array:", arr_view)

Original Array: [10 12 13 14 15]
Copied Array: [11 12 13 14 15]
Viewed Array: [10 12 13 14 15]
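
The same distinction applies beyond .view(). As a minimal sketch, assuming the usual NumPy rule that basic slicing returns a view, the example below uses np.shares_memory to check which arrays share a data buffer with the original. The variable names are illustrative and are not part of the cells above.

```python
import numpy as np

base = np.array([11, 12, 13, 14, 15])

copied = base.copy()   # independent data buffer
viewed = base.view()   # new array object over the same buffer
sliced = base[1:4]     # basic slicing also returns a view

base[1] = 99           # modify the original in place

print(copied[1])   # 12, the copy is unaffected
print(viewed[1])   # 99, the view sees the change
print(sliced[0])   # 99, the slice sees the change too

print(np.shares_memory(base, copied))  # False
print(np.shares_memory(base, sliced))  # True
```

When a result must stay independent of later changes to the source array, calling .copy() explicitly is the safer choice.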


#### Array Shape

In [15]:
print("Shape of Array:", result.shape)

Shape of Array: (5,)


#### Array Reshape

In [19]:
arr_reshape = result.reshape(1, 5)
print("Reshaped Array:", arr_reshape)

Reshaped Array: [[10 12 13 14 15]]
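
reshape only changes how the same elements are laid out, so the new shape must hold exactly as many elements as the original. The short sketch below, with illustrative names, shows that passing -1 lets NumPy infer one dimension and that an incompatible shape raises ValueError.

```python
import numpy as np

values = np.arange(6)              # [0 1 2 3 4 5]

grid = values.reshape(2, 3)        # 2 rows, 3 columns
inferred = values.reshape(-1, 2)   # -1 is inferred as 3, giving shape (3, 2)

print(grid.shape)      # (2, 3)
print(inferred.shape)  # (3, 2)

try:
    values.reshape(4, 2)           # 8 slots cannot hold 6 elements
except ValueError as exc:
    reshape_error = str(exc)
    print("reshape failed:", reshape_error)
```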


#### Array Iterating

In [20]:
for element in result:
    print(element)

10
12
13
14
15


#### Array Join

In [21]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr_join = np.concatenate((arr1, arr2))
print("Joined Array:", arr_join)

Joined Array: [1 2 3 4 5 6]


#### Array Split

In [22]:
arr_split = np.array_split(arr_join, 2)
print("Split Arrays:", arr_split)

Split Arrays: [array([1, 2, 3]), array([4, 5, 6])]
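
The split above divides evenly, which hides the main reason to prefer np.array_split: it also accepts lengths that do not divide evenly, whereas np.split raises ValueError in that case. A small sketch with illustrative names:

```python
import numpy as np

data = np.arange(7)                # 7 elements, not divisible by 3

parts = np.array_split(data, 3)    # earlier pieces absorb the remainder
sizes = [len(part) for part in parts]
print(sizes)                       # [3, 2, 2]

try:
    np.split(data, 3)              # requires an equal division
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)                # True
```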


#### Array Search

In [23]:
index = np.where(arr_join == 4)
print("Index of 4:", index)

Index of 4: (array([3], dtype=int64),)
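
The output above is a tuple because np.where returns one index array per dimension. For a 1D array, the positions are in the first element of that tuple. The sketch below, with illustrative names, unpacks it and shows np.flatnonzero as a shorter alternative for the 1D case.

```python
import numpy as np

values = np.array([4, 1, 4, 2, 4])

matches = np.where(values == 4)    # tuple with one index array per dimension
positions = matches[0]             # index array for the only dimension
print(positions)                   # [0 2 4]

flat_positions = np.flatnonzero(values == 4)  # same indices, no tuple
print(flat_positions)              # [0 2 4]
```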


#### Array Sort

In [27]:
arr_sort = np.sort(arr_join)
print("Sorted Array:", arr_sort)

Sorted Array: [1 2 3 4 5 6]


#### Array Filter

In [25]:
arr_filter = arr_join[arr_join > 3]
print("Filtered Array:", arr_filter)

Filtered Array: [4 5 6]


## Random

NumPy provides a variety of functions for generating random numbers and arrays. Some of the most common ones are listed below.

#### np.random.rand 
Generate random numbers from a uniform distribution over [0, 1).

In [26]:
random_numbers = np.random.rand(3, 2)  # 3x2 array of random numbers
print(random_numbers)

[[0.07225681 0.18417801]
 [0.05048088 0.63659361]
 [0.79947358 0.75929477]]


#### np.random.randn 
Generate random numbers from a standard normal distribution.

In [28]:
random_numbers_std_normal = np.random.randn(3, 2)  # 3x2 array of standard normal samples
print(random_numbers_std_normal)

[[ 0.19794525 -0.71724616]
 [ 1.05347964 -1.35449645]
 [-2.40356582 -0.3437388 ]]


#### np.random.randint
Generate random integers from low (inclusive) to high (exclusive).

In [29]:
random_integers = np.random.randint(1, 10, size=(3, 2))  # 3x2 array of random integers from 1 to 9 (10 is excluded)
print(random_integers)

[[2 3]
 [9 3]
 [1 5]]
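
To make the half-open interval concrete, the sketch below draws many integers and inspects the range that was actually produced. The seed value and the sample size are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

samples = np.random.randint(1, 10, size=10_000)

# low is included, high is excluded: every value lies in 1..9
print(samples.min() >= 1)
print(samples.max() <= 9)
print(10 in samples)  # False, the upper bound is never drawn
```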


#### np.random.random_sample or np.random.random
Generate random floats in the half-open interval [0.0, 1.0).

In [30]:
random_floats = np.random.random_sample((3, 2))  # 3x2 array of random floats
print(random_floats)

[[0.21383953 0.12183887]
 [0.16265151 0.9278005 ]
 [0.98291903 0.90679948]]


#### np.random.choice
Generates a random sample from a given 1-D array.

In [31]:
choices = np.array([1, 2, 3, 4, 5])
random_choice = np.random.choice(choices, size=(3, 2))  # 3x2 array of random choices from the array
print(random_choice)

[[1 5]
 [1 3]
 [2 3]]
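
Repeated values appear in the output above because np.random.choice samples with replacement by default. The sketch below uses two of its documented parameters: replace=False to draw distinct elements, and p to weight the elements. The weights and the seed are made-up values for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

choices = np.array([1, 2, 3, 4, 5])

# Without replacement: each element can be drawn at most once
unique_draw = np.random.choice(choices, size=3, replace=False)
print(unique_draw)

# Weighted sampling: p must have one probability per element and sum to 1
weights = [0.5, 0.2, 0.1, 0.1, 0.1]
weighted_draw = np.random.choice(choices, size=1000, p=weights)
share_of_ones = np.mean(weighted_draw == 1)
print(share_of_ones)  # close to 0.5
```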


#### np.random.shuffle
Shuffle an array in-place.

arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print(arr)

#### np.random.permutation
Randomly permute a sequence or return a permuted range.

In [33]:
permuted_arr = np.random.permutation(arr)
print(permuted_arr)

[5 1 2 3 4]
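
The practical difference between the two functions is where the result ends up: np.random.shuffle reorders its argument in place and returns None, while np.random.permutation leaves its argument alone and returns a new array. A small sketch with illustrative names:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

original = np.arange(5)

permuted = np.random.permutation(original)   # new array
untouched = original.tolist() == [0, 1, 2, 3, 4]
print(untouched)                             # True, original is unchanged

returned = np.random.shuffle(original)       # reorders original in place
print(returned)                              # None
print(sorted(original.tolist()))             # [0, 1, 2, 3, 4], same elements
```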


#### np.random.seed
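Seed the global random number generator so that subsequent draws are reproducible. np.random.seed sets the generator state in place and returns None, which is why the cell below prints None.

In [36]:
seed = np.random.seed(42)  # returns None; the state is set as a side effect
print(seed)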

None
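
The effect of seeding shows up in the draws that follow it, not in the return value. The sketch below reseeds with the same value and compares two runs. It also shows np.random.default_rng, the Generator-based interface that NumPy recommends for new code, as an alternative to the global seed. The variable names are illustrative.

```python
import numpy as np

# Same seed, same sequence of draws
np.random.seed(42)
first_run = np.random.rand(3)

np.random.seed(42)
second_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))  # True

# Generator objects carry their own state instead of using the global one
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.random(3)
draw_b = rng_b.random(3)

print(np.array_equal(draw_a, draw_b))  # True
```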


## Probability Distributions
NumPy's random module provides functions for generating random numbers from various probability distributions. Here are some common probability distribution functions in NumPy:

#### Uniform Distribution (np.random.uniform):

Generates random samples from a uniform distribution over the half-open interval [low, high).

In [37]:
uniform_distribution = np.random.uniform(low=0.0, high=1.0, size=(3, 2))
print(uniform_distribution)

[[0.37454012 0.95071431]
 [0.73199394 0.59865848]
 [0.15601864 0.15599452]]


#### Normal Distribution (np.random.normal):

Generates random samples from a normal (Gaussian) distribution.

In [38]:
normal_distribution = np.random.normal(loc=0.0, scale=1.0, size=(3, 2))
print(normal_distribution)

[[ 1.57921282  0.76743473]
 [-0.46947439  0.54256004]
 [-0.46341769 -0.46572975]]
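
In np.random.normal, loc is the mean and scale is the standard deviation. One way to check this is to draw a large sample and compare its statistics with the parameters, as in the sketch below. The parameter values, seed, and sample size are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

samples = np.random.normal(loc=5.0, scale=2.0, size=100_000)

sample_mean = samples.mean()
sample_std = samples.std()

print(round(sample_mean, 1))  # close to loc = 5.0
print(round(sample_std, 1))   # close to scale = 2.0
```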


#### Binomial Distribution (np.random.binomial):

Generates random samples from a binomial distribution.

In [39]:
binomial_distribution = np.random.binomial(n=10, p=0.5, size=(3, 2))
print(binomial_distribution)

[[4 5]
 [5 4]
 [5 3]]


#### Poisson Distribution (np.random.poisson):

Generates random samples from a Poisson distribution.

In [40]:
poisson_distribution = np.random.poisson(lam=5, size=(3, 2))
print(poisson_distribution)

[[5 3]
 [5 4]
 [6 7]]


#### Exponential Distribution (np.random.exponential):

Generates random samples from an exponential distribution.

In [41]:
exponential_distribution = np.random.exponential(scale=1.0, size=(3, 2))
print(exponential_distribution)

[[0.04628197 0.39353209]
 [0.49213029 0.31656044]
 [1.76455787 0.441227  ]]
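
Note that np.random.exponential is parameterized by scale, which is the mean of the distribution and the inverse of the rate that many textbooks use. The sketch below converts a hypothetical rate of 2 events per unit time into a scale and checks the sample mean. The rate, seed, and sample size are made-up values.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

rate = 2.0            # hypothetical rate (lambda)
scale = 1.0 / rate    # NumPy expects scale = 1 / lambda

samples = np.random.exponential(scale=scale, size=100_000)
sample_mean = samples.mean()

print(round(sample_mean, 2))  # close to scale = 0.5
print(samples.min() >= 0)     # True, exponential samples are non-negative
```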


#### Logistic Distribution (np.random.logistic):

Generates random samples from a logistic distribution.

In [42]:
logistic_distribution = np.random.logistic(loc=0.0, scale=1.0, size=(3, 2))
print(logistic_distribution)

[[-0.93983086  0.17120127]
 [-1.8076348   1.40008251]
 [-2.51880073  4.32094654]]


#### Chi-Square Distribution (np.random.chisquare):

Generates random samples from a chi-square distribution.

In [43]:
chi_square_distribution = np.random.chisquare(df=3, size=(3, 2))
print(chi_square_distribution)

[[1.15522492 3.91983193]
 [5.34434366 4.97870708]
 [0.94940659 1.72707227]]


#### Gamma Distribution (np.random.gamma):

Generates random samples from a gamma distribution.

In [44]:
gamma_distribution = np.random.gamma(shape=2, scale=1, size=(3, 2))
print(gamma_distribution)

[[1.12143497 1.43828812]
 [2.95108852 4.10226246]
 [2.17848714 0.96484462]]


#### Beta Distribution (np.random.beta):

Generates random samples from a beta distribution.

In [45]:
beta_distribution = np.random.beta(a=2, b=5, size=(3, 2))
print(beta_distribution)

[[0.15364547 0.30550235]
 [0.20331388 0.18387688]
 [0.36782409 0.20209677]]


#### Laplace Distribution (np.random.laplace):

Generates random samples from a Laplace distribution.

In [46]:
laplace_distribution = np.random.laplace(loc=0.0, scale=1.0, size=(3, 2))
print(laplace_distribution)

[[-0.45254579 -1.5136558 ]
 [-0.78554688 -0.15757168]
 [ 1.01068255  1.27819779]]
