<h1 align="center" style="background-color:#001f3f;
           color:white;
           border-radius:8px;
           font-family:'Times New Roman', Times, serif;
           padding:20px;
           display:inline-block;">
Data Manipulation
</h1>

<hr>

<p align="center"><b>Author:</b> Muhammad Usman</p>
<p align="center"><b>Dated:</b> 6/Oct/2025</p>

`Data Manipulation` in NumPy means `reshaping`, `combining`, `splitting`, and `transforming` *arrays* to make them suitable for **analysis, visualization, and model input**.

It’s a crucial step in ***Data Engineering***, Machine Learning preprocessing, and Scientific Computing.

> #### Let's have a look at Data Manipulation Functions:

* ## **Shape and Reshape Operations**

Let's create an array and apply functions to it.

In [74]:
# Import Library first
import numpy as np

In [75]:
arr1 = np.array(

    [
        [50,55,66,77,33,22],
        [4,5,6,5.5,4.4,3.3]
    ]
)
print(arr1)
print(f"Dimension: {arr1.ndim}")

[[50.  55.  66.  77.  33.  22. ]
 [ 4.   5.   6.   5.5  4.4  3.3]]
Dimension: 2


### **1. Change the Structure of an Array**

In [76]:
# Check the shape of the array

print(f"The shape of an array is: {arr1.shape}")

# reshape() returns a new array with the requested shape
print(arr1.reshape(3,4))

The shape of an array is: (2, 6)
[[50.  55.  66.  77. ]
 [33.  22.   4.   5. ]
 [ 6.   5.5  4.4  3.3]]


`.reshape()` changes the shape (rows × columns) of an array without changing its data. The new shape must hold the same number of elements as the original.
- Shortcut with -1

In [77]:
# If you don’t want to calculate one of the dimensions manually, use -1:

arr1.reshape(4, -1)

array([[50. , 55. , 66. ],
       [77. , 33. , 22. ],
       [ 4. ,  5. ,  6. ],
       [ 5.5,  4.4,  3.3]])
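The behaviour of `-1` can be illustrated with a small sketch. It uses a made-up array `data` (not part of the notebook above) and shows three things: NumPy infers the missing dimension, the reshaped result usually shares memory with the source, and an incompatible shape raises `ValueError`.

```python
import numpy as np

data = np.arange(12)              # 12 elements: 0..11 (illustrative data)

# -1 asks NumPy to infer that dimension from the total size
grid = data.reshape(3, -1)        # inferred shape: (3, 4)
print(grid.shape)

# For a contiguous array like this one, reshape returns a view,
# so writing through `grid` also changes `data`
grid[0, 0] = 99
print(data[0])

# A shape that cannot hold exactly 12 elements raises ValueError
reshape_failed = False
try:
    data.reshape(5, -1)
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```

Only one dimension may be given as `-1`, because NumPy can solve for just one unknown.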

### **2. Convert 2D Array into 1D Array**
`ravel()` and `flatten()` convert multi-dimensional arrays into 1D. Both return a 1D result and leave the shape of the original (`arr2`) array unchanged.

In [69]:
arr2 = arr1.copy()               # work on a copy so that arr1 stays unchanged

print(f"Convert into 1D through Ravel: {arr2.ravel()}")
print(f"Convert into 1D through Flatten: {arr2.flatten()}")

print(f"Dimension: {arr2.ndim}")
print("")

# To actually change the dimension, assign the result back to the variable

arr2 = arr2.ravel()    
print(arr2)
print(f"Dimension: {arr2.ndim}")

Convert into 1D through Ravel: [50.  55.  66.  77.  33.  22.   4.   5.   6.   5.5  4.4  3.3]
Convert into 1D through Flatten: [50.  55.  66.  77.  33.  22.   4.   5.   6.   5.5  4.4  3.3]
Dimension: 2

[50.  55.  66.  77.  33.  22.   4.   5.   6.   5.5  4.4  3.3]
Dimension: 1
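One difference between the two functions is worth a short sketch: `ravel()` returns a view when the memory layout allows it, while `flatten()` always returns a copy. The array `matrix` below is illustrative and not part of the notebook.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

flat_view = matrix.ravel()     # a view here, because matrix is contiguous
flat_copy = matrix.flatten()   # always an independent copy

flat_view[0] = 100             # also changes matrix[0, 0]
flat_copy[1] = -1              # matrix is not affected

print(matrix)
print(np.shares_memory(matrix, flat_view))
print(np.shares_memory(matrix, flat_copy))
```

Use `flatten()` when the 1D result will be modified and the original must stay intact.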


### **3. Resize and Transpose**

In [70]:
arr3 = arr1.copy()
arr3

array([[50. , 55. , 66. , 77. , 33. , 22. ],
       [ 4. ,  5. ,  6. ,  5.5,  4.4,  3.3]])

In [71]:
# resize() changes the array in place instead of returning a new one

arr3.resize((6,2))
arr3

array([[50. , 55. ],
       [66. , 77. ],
       [33. , 22. ],
       [ 4. ,  5. ],
       [ 6. ,  5.5],
       [ 4.4,  3.3]])

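**Note: The total number of elements should stay the same. `arr1` has shape (2, 6), i.e. 2 × 6 = 12 elements, so the new shape of `arr3` should also hold exactly 12 elements. `reshape()` enforces this rule; `resize()` can change the size, but then values are dropped or padded.**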
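As a rough illustration of how the size rule differs between these functions, the sketch below uses a small made-up array `base`. Passing `refcheck=False` is an assumption made here so the in-place call does not fail in interactive sessions that keep extra references to the array.

```python
import numpy as np

base = np.arange(6)                      # [0 1 2 3 4 5]

# reshape must keep the element count: 6 elements fit (2, 3)
print(base.reshape(2, 3))

# np.resize returns a new array and repeats the data to fill extra slots
repeated = np.resize(base, (2, 4))
print(repeated)

# ndarray.resize works in place and pads extra slots with zeros
padded = base.copy()
padded.resize((2, 4), refcheck=False)
print(padded)
```

The two results differ only in how the two extra slots are filled: repeated values in `repeated`, zeros in `padded`.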

In [72]:
arr3.T        # -----> swaps rows and columns (transpose)

# Transposing is widely used in image processing

array([[50. , 66. , 33. ,  4. ,  6. ,  4.4],
       [55. , 77. , 22. ,  5. ,  5.5,  3.3]])
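To make the image-processing remark concrete, here is a minimal sketch of reordering axes on an image-like array. The array `image_hwc` and its (height, width, channels) layout are assumptions chosen for illustration, not data from the notebook.

```python
import numpy as np

# Hypothetical tiny image: 2 rows (height), 3 columns (width), 3 colour channels
image_hwc = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Move the channel axis to the front: (H, W, C) -> (C, H, W)
image_chw = np.transpose(image_hwc, (2, 0, 1))

print(image_hwc.shape)
print(image_chw.shape)

# Transposing only changes how the data is indexed; no data is copied
print(np.shares_memory(image_hwc, image_chw))
```

For a 2D array, `np.transpose(a)` without an axes argument is equivalent to `a.T`.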

### **4. Swap or Reorder Columns**

In [73]:
print("Original Array:")
print(arr1)

# Swap column index 0 and column index 2 (the 1st and 3rd columns)
swapped = arr1[:, [2, 1, 0, 3,4,5]]     # new column order

print("\nSwapped Columns (Col 1 , Col 3):")
print(swapped)

Original Array:
[[50.  55.  66.  77.  33.  22. ]
 [ 4.   5.   6.   5.5  4.4  3.3]]

Swapped Columns (Col 1 , Col 3):
[[66.  55.  50.  77.  33.  22. ]
 [ 6.   5.   4.   5.5  4.4  3.3]]
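The indexing approach above builds a new array. If the swap should happen inside the existing array instead, one possible sketch is the following; `table` is an illustrative array, not one from the notebook.

```python
import numpy as np

table = np.array([[50, 55, 66],
                  [ 4,  5,  6]])

# The right-hand side is evaluated into a temporary copy first,
# so the two columns can be exchanged safely in one statement
table[:, [0, 2]] = table[:, [2, 0]]
print(table)
```

This avoids listing every column index, which is convenient when the array has many columns.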


* ## **Stacking and Combining Arrays**

### **1. Horizontal Stack**

Joins arrays side by side (column-wise).

In [85]:
# create an array

aa = np.array([[1,2,3]])
bb = np.array([[4,5,6]])

print(np.hstack((aa,bb)))


[[1 2 3 4 5 6]]
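For comparison, the same two arrays can also be joined row-wise. The sketch below uses `np.vstack` and the more general `np.concatenate`; the result variable names are chosen here for illustration.

```python
import numpy as np

aa = np.array([[1, 2, 3]])
bb = np.array([[4, 5, 6]])

stacked_rows = np.vstack((aa, bb))               # rows on top of each other: shape (2, 3)
joined_cols = np.concatenate((aa, bb), axis=1)   # same result as np.hstack here
joined_rows = np.concatenate((aa, bb), axis=0)   # same result as np.vstack here

print(stacked_rows)
print(joined_cols)
```

With `np.concatenate`, all dimensions except the one named by `axis` must match.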


### **2. Splitting the Array**
Splitting is very useful for dividing data into training, validation, and testing sets in ML.

In [89]:
array = np.arange(10)
print("Original Array:", array)

result = np.split(array, 2)            # split into 2 equal parts
print(result)


Original Array: [0 1 2 3 4 5 6 7 8 9]
[array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]


In [92]:
# You can also split at specific indices (here: before index 3 and before index 7)

print(np.split(array, [3, 7]))

[array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])]
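The ML use case mentioned above could look roughly like the sketch below. The 60/20/20 proportions, the seed, and the variable names are assumptions for illustration; real projects often use a dedicated helper from an ML library instead.

```python
import numpy as np

rng = np.random.default_rng(seed=0)     # fixed seed so the shuffle is repeatable
samples = np.arange(10)                 # stand-in for 10 data rows
shuffled = rng.permutation(samples)     # shuffled copy; samples is unchanged

# Hypothetical 60/20/20 split: cut after 6 elements and again after 8
train_set, val_set, test_set = np.split(shuffled, [6, 8])

print(len(train_set), len(val_set), len(test_set))
```

`np.split` with an integer raises an error when the array cannot be divided equally. `np.array_split` accepts uneven divisions, for example `np.array_split(np.arange(10), 3)` yields parts of length 4, 3, and 3.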


**Official Documentation Reference**

For more stacking and array manipulation functions, refer to the official NumPy documentation:

🔗 [NumPy vstack() – Official Docs](https://numpy.org/doc/stable/reference/generated/numpy.vstack.html)

* ## **Adding, Removing, and Repeating Elements of an array**

**Key Functions:**
`append`, `insert`, `delete`, `unique`, `tile`, etc.

In [95]:
arr4 = arr1.copy()
print(arr4)

[[50.  55.  66.  77.  33.  22. ]
 [ 4.   5.   6.   5.5  4.4  3.3]]


In [100]:
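# Without an axis argument, these functions work on the flattened array
print(np.append(arr4, [4,5]))      # add 4 and 5 at the end
print("")

print(np.insert(arr4, 1, 99))      # insert 99 at flat index 1
print("")

print(np.delete(arr4, 5))          # remove the element at flat index 5 (22.0)
print("")

print(np.unique(arr4))             # sorted unique values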


[50.  55.  66.  77.  33.  22.   4.   5.   6.   5.5  4.4  3.3  4.   5. ]

[50.  99.  55.  66.  77.  33.  22.   4.   5.   6.   5.5  4.4  3.3]

[50.  55.  66.  77.  33.   4.   5.   6.   5.5  4.4  3.3]

[ 3.3  4.   4.4  5.   5.5  6.  22.  33.  50.  55.  66.  77. ]
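The calls above all return 1D results because no `axis` was given. A minimal sketch of the same functions with an explicit `axis`, plus `np.tile` from the key-function list, might look like this; `grid` is an illustrative array.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# With axis=0 the new values are added as a row, so they must be 2D with 3 columns
with_row = np.append(grid, [[7, 8, 9]], axis=0)      # shape (3, 3)

# Insert a column of zeros before column index 1
with_col = np.insert(grid, 1, 0, axis=1)             # shape (2, 4)

# Delete the first row
without_row = np.delete(grid, 0, axis=0)             # shape (1, 3)

# Repeat the whole array twice along the column direction
tiled = np.tile(grid, (1, 2))                        # shape (2, 6)

print(with_row, with_col, without_row, tiled, sep="\n\n")
```

None of these functions modify `grid`; each returns a new array.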


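* ## **Broadcasting**
`Broadcasting` lets NumPy apply operations between arrays of `different shapes` `automatically`, for example adding, subtracting, or multiplying a small array across a larger one, without writing loops. The shapes must be compatible: compared from the last dimension backwards, each pair of dimensions must be equal or one of them must be 1.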

In [101]:
# creating an array

sales = np.array([
    [100, 120, 130],
    [90, 110, 115],
    [105, 125, 140],
    [95, 100, 120]
])

# Here, each row is a month
# and each column is a product
sales

array([[100, 120, 130],
       [ 90, 110, 115],
       [105, 125, 140],
       [ 95, 100, 120]])

**Moving deeper**
<br>
Can we name columns and rows?

Not in a plain NumPy array: NumPy focuses on speed and numerical computation, not on metadata such as column or row names.

In [107]:
# The recommended way is to create a pandas DataFrame

import pandas as pd

# Discounts per product
discount = np.array([5, 10, 15])

df = pd.DataFrame(sales,

    columns=['Prod_A', 'Prod_B', 'Prod_C'],
    index=['Jan', 'Feb', 'Mar', 'Apr']
)

print("Original Sales Data:\n")
print(df)

print("\n Discounts per Product:", discount)


Original Sales Data:

     Prod_A  Prod_B  Prod_C
Jan     100     120     130
Feb      90     110     115
Mar     105     125     140
Apr      95     100     120

 Discounts per Product: [ 5 10 15]


Now apply the discount to the sales data.

In [108]:
# Copy the DataFrame to store the results (the values are overwritten below)
manual_df = df.copy()

# Manual subtraction (row by row, column by column)
for i in range(len(df)):
    for j in range(len(df.columns)):
        manual_df.iat[i, j] = df.iat[i, j] - discount[j]

print("\n Manual Discount Applied (No Broadcasting):\n")
print(manual_df)



 Manual Discount Applied (No Broadcasting):

     Prod_A  Prod_B  Prod_C
Jan      95     110     115
Feb      85     100     100
Mar     100     115     125
Apr      90      90     105


In [109]:
# Automatic broadcasting using NumPy logic
broadcasted_df = df - discount

print("\n Automatic Broadcasting Result:\n")
print(broadcasted_df)



 Automatic Broadcasting Result:

     Prod_A  Prod_B  Prod_C
Jan      95     110     115
Feb      85     100     100
Mar     100     115     125
Apr      90      90     105
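The same subtraction works on plain NumPy arrays, which makes the shape rule easier to see. In the sketch below, `monthly_bonus` is a hypothetical per-month adjustment added only to show broadcasting along the other axis.

```python
import numpy as np

sales = np.array([[100, 120, 130],
                  [ 90, 110, 115],
                  [105, 125, 140],
                  [ 95, 100, 120]])              # shape (4, 3)

discount = np.array([5, 10, 15])                 # shape (3,): one value per product
discounted = sales - discount                    # (4, 3) - (3,) -> (4, 3)

# Hypothetical per-month value: one number per row, stored as a column
monthly_bonus = np.array([[1], [2], [3], [4]])   # shape (4, 1)
adjusted = sales + monthly_bonus                 # (4, 3) + (4, 1) -> (4, 3)

print(discounted)
print(adjusted)
```

A 1D array of length 4 would not broadcast against `sales`, because its last dimension (4) matches neither 3 nor 1; reshaping it to `(4, 1)` as above resolves that.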


* ## **Sorting and Searching in Data Manipulation**

In data manipulation, `sorting and searching` are essential when you need to `organize data` or `find specific elements` within arrays.

- To search for a value, sort a dataset, or find the position of elements, you can use NumPy's built-in functions such as:

  - `np.sort()` → Returns a sorted copy in ascending order (reverse the result, e.g. with `[::-1]`, for descending order)
  - `np.argsort()` → Returns the indices that would sort an array
  - `np.argmax()` / `np.argmin()` → Finds the index of the maximum/minimum value
  - `np.searchsorted()` → Finds the insertion point for a value in a sorted array
  - `np.unique()` → Returns the sorted unique values in an array
  - `np.where()` → Returns the positions where a condition is true, or chooses between two values element by element

These functions help in data cleaning, analysis, and preprocessing, making them a vital part of data manipulation workflows.
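A short sketch of these functions on a made-up array `scores` is shown below. The values and variable names are illustrative only.

```python
import numpy as np

scores = np.array([42, 7, 19, 7, 88, 19])

ascending = np.sort(scores)                  # sorted copy: [ 7  7 19 19 42 88]
descending = np.sort(scores)[::-1]           # reversed for descending order
order = np.argsort(scores)                   # indices that would sort the array
top_index = np.argmax(scores)                # 4, the position of 88
low_index = np.argmin(scores)                # 1, the first occurrence of 7
insert_at = np.searchsorted(ascending, 20)   # 4, where 20 would keep the order
distinct = np.unique(scores)                 # [ 7 19 42 88]
above_18 = np.where(scores > 18)[0]          # positions [0 2 4 5]
capped = np.where(scores > 50, 50, scores)   # values above 50 replaced by 50

print(ascending, descending, scores[order], sep="\n")
print(top_index, low_index, insert_at)
print(distinct, above_18, capped, sep="\n")
```

`np.searchsorted` assumes its first argument is already sorted, which is why it is applied to `ascending` rather than to `scores`.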

**📘 Official Documentation Reference**

For detailed explanation, parameters, and examples, refer to the official NumPy documentation:

🔗 [NumPy Sorting – Official Docs](https://numpy.org/doc/stable/reference/routines.sort.html)