# Data Manipulation with Python

## Introduction

Data manipulation is a crucial aspect of data science and analysis. In this notebook, we'll explore three powerful libraries in Python: NumPy, Pandas, and Matplotlib. These libraries provide tools for handling, analyzing, and visualizing data.


## NumPy: Numerical Python

### Introduction to NumPy

NumPy is a fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.

### NumPy Basics

#### Array Creation

In [1]:
import numpy as np

# Create a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])

# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Create an array with zeros
zeros_array = np.zeros((3, 3))

# Create an array with ones
ones_array = np.ones((2, 4))

print(arr_1d)
print(arr_2d)
print(zeros_array)
print(ones_array)

[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


#### Array Operations

In [2]:
# Arithmetic operations
result = arr_1d + 10

# Element-wise multiplication
result2 = arr_2d * 2

# Matrix multiplication
result3 = np.dot(arr_2d, np.ones((3, 1)))

print(result)
print(result2)
print(result3)

[11 12 13 14 15]
[[ 2  4  6]
 [ 8 10 12]]
[[ 6.]
 [15.]]
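
The matrix product above can also be written with the `@` operator, and the scalar arithmetic is a special case of broadcasting. The sketch below reuses the same `arr_2d` values; `row_offsets` is an illustrative name that does not appear in the original notebook.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# `@` is the matrix-multiplication operator; for 2D arrays it matches np.dot
product_dot = np.dot(arr_2d, np.ones((3, 1)))
product_at = arr_2d @ np.ones((3, 1))
print(np.array_equal(product_dot, product_at))

# Broadcasting: the length-3 array is applied to each row of the 2x3 array
row_offsets = np.array([10, 20, 30])  # hypothetical values for illustration
shifted = arr_2d + row_offsets
print(shifted)
```

Broadcasting works here because the trailing dimension of both arrays is 3; a length-2 array would raise a `ValueError` instead.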


#### Array Indexing

In [3]:
# Array Indexing
print("Element at index 2:", result[2])

Element at index 2: 13


#### Array Slicing

In [4]:
# Array Slicing
print("Sliced array:", result[1:4])

Sliced array: [12 13 14]


#### Data Types

In [5]:
arr_float = np.array([1, 2, 3], dtype=float)
print("Array with float data type:", arr_float)

Array with float data type: [1. 2. 3.]


#### Copy vs View

In [6]:
arr_copy = result.copy()
arr_view = result.view()
result[0] = 10
print("Original Array:", result)
print("Copied Array:", arr_copy)
print("Viewed Array:", arr_view)

Original Array: [10 12 13 14 15]
Copied Array: [11 12 13 14 15]
Viewed Array: [10 12 13 14 15]
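
To check whether two arrays share data without mutating them, `np.shares_memory` and the `.base` attribute can help. This is a minimal sketch with its own array (named `original` here for illustration); it also shows that basic slicing returns a view.

```python
import numpy as np

original = np.array([11, 12, 13, 14, 15])
copied = original.copy()
viewed = original.view()
sliced = original[1:4]  # basic slicing also returns a view

# A view shares the original's memory; a copy owns its own buffer
print(np.shares_memory(original, viewed))
print(np.shares_memory(original, copied))
print(sliced.base is original)

# Writing through the slice changes the original array as well
sliced[0] = 99
print(original)
```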


#### Array Shape

In [7]:
print("Shape of Array:", result.shape)

Shape of Array: (5,)


#### Array Reshape

In [8]:
arr_reshape = result.reshape(1, 5)
print("Reshaped Array:", arr_reshape)

Reshaped Array: [[10 12 13 14 15]]


#### Array Iterating

In [9]:
for element in result:
    print(element)

10
12
13
14
15


#### Array Join

In [10]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr_join = np.concatenate((arr1, arr2))
print("Joined Array:", arr_join)

Joined Array: [1 2 3 4 5 6]


#### Array Split

In [11]:
arr_split = np.array_split(arr_join, 2)
print("Split Arrays:", arr_split)

Split Arrays: [array([1, 2, 3]), array([4, 5, 6])]
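
The example above divides evenly. When it does not, `np.array_split` still works and makes the leading pieces one element longer, whereas `np.split` raises an error. A small sketch with a seven-element array (`values` is an illustrative name):

```python
import numpy as np

values = np.arange(1, 8)  # seven elements: 1..7

# array_split tolerates uneven division
parts = np.array_split(values, 3)
print([part.tolist() for part in parts])

# np.split insists on equal-sized pieces
try:
    np.split(values, 3)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("np.split failed:", exc)
```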


#### Array Search

In [12]:
index = np.where(arr_join == 4)
print("Index of 4:", index)

Index of 4: (array([3], dtype=int64),)


#### Array Sort

In [13]:
arr_sort = np.sort(arr_join)
print("Sorted Array:", arr_sort)

Sorted Array: [1 2 3 4 5 6]


#### Array Filter

In [14]:
arr_filter = arr_join[arr_join > 3]
print("Filtered Array:", arr_filter)

Filtered Array: [4 5 6]


## Random

NumPy provides a variety of functions for generating random numbers and arrays. Here are some of the most common ones:

#### np.random.rand 
Generate random numbers from a uniform distribution over [0, 1).

In [15]:
random_numbers = np.random.rand(3, 2)  # 3x2 array of random numbers
print(random_numbers )

[[0.9213901  0.350397  ]
 [0.08483354 0.86725345]
 [0.33436114 0.49988585]]


#### np.random.randn 
Generate random numbers from a standard normal distribution.

In [16]:
random_numbers_std_normal = np.random.randn(3, 2)  # 3x2 array of standard normal distribution numbers
print(random_numbers_std_normal )

[[-1.4281745   0.0722784 ]
 [ 0.34200444  0.2533777 ]
 [-1.96684441  0.01909037]]


#### np.random.randint
Generate random integers from low (inclusive) to high (exclusive).

In [17]:
random_integers = np.random.randint(1, 10, size=(3, 2))  # 3x2 array of random integers from 1 to 9 (10 is excluded)
print(random_integers)

[[5 9]
 [1 4]
 [7 6]]


#### np.random.random_sample or np.random.random
Generate random floats in the half-open interval [0.0, 1.0).

In [18]:
random_floats = np.random.random_sample((3, 2))  # 3x2 array of random floats
print(random_floats)

[[0.2347914  0.99258368]
 [0.71104656 0.61237905]
 [0.93973209 0.89573567]]


#### np.random.choice
Generates a random sample from a given 1-D array.

In [19]:
choices = np.array([1, 2, 3, 4, 5])
random_choice = np.random.choice(choices, size=(3, 2))  # 3x2 array of random choices from the array
print(random_choice)

[[5 3]
 [1 5]
 [2 2]]


#### np.random.shuffle
Shuffle an array in-place.

# shuffle reorders arr in place and returns None, so print the array itself
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print(arr)

#### np.random.permutation
Randomly permute a sequence or return a permuted range.

In [20]:
permuted_arr = np.random.permutation(arr1)
print(permuted_arr)

[3 2 1]


#### np.random.seed
Seed the generator for reproducibility.

In [21]:
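# np.random.seed sets the state of the global generator and returns None,
# so there is nothing useful to store; the print below just demonstrates that
seed = np.random.seed(42)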
print(seed)

None
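
The functions in this section use NumPy's global random state. NumPy also offers `np.random.default_rng`, which returns a self-contained `Generator` object and is the interface its documentation suggests for new code. The sketch below is an alternative to `np.random.seed`, not what the notebook itself uses; the names `rng_a` and `rng_b` are illustrative.

```python
import numpy as np

# Two generators built from the same seed produce identical streams
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.integers(1, 10, size=(3, 2))  # high is exclusive, like randint
sample_b = rng_b.integers(1, 10, size=(3, 2))
print(np.array_equal(sample_a, sample_b))

# Generator methods mirror the legacy functions: random, normal, choice, ...
floats = rng_a.random(3)
print(floats)
```

Because each generator carries its own state, seeding one does not affect other code that draws random numbers.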


## Probability Distributions
NumPy's random module provides functions for generating random numbers from various probability distributions. Here are some common probability distribution functions in NumPy:

#### Uniform Distribution (np.random.uniform):

Generates random samples from a uniform distribution over a specified interval.

In [22]:
uniform_distribution = np.random.uniform(low=0.0, high=1.0, size=(3, 2))
print(uniform_distribution)

[[0.37454012 0.95071431]
 [0.73199394 0.59865848]
 [0.15601864 0.15599452]]


#### Normal Distribution (np.random.normal):

Generates random samples from a normal (Gaussian) distribution.

In [23]:
normal_distribution = np.random.normal(loc=0.0, scale=1.0, size=(3, 2))
print(normal_distribution)

[[ 1.57921282  0.76743473]
 [-0.46947439  0.54256004]
 [-0.46341769 -0.46572975]]


#### Binomial Distribution (np.random.binomial):

Generates random samples from a binomial distribution.

In [24]:
binomial_distribution = np.random.binomial(n=10, p=0.5, size=(3, 2))
print(binomial_distribution)

[[4 5]
 [5 4]
 [5 3]]


#### Poisson Distribution (np.random.poisson):

Generates random samples from a Poisson distribution.

In [25]:
poisson_distribution = np.random.poisson(lam=5, size=(3, 2))
print(poisson_distribution)

[[5 3]
 [5 4]
 [6 7]]


#### Exponential Distribution (np.random.exponential):

Generates random samples from an exponential distribution.

In [26]:
exponential_distribution = np.random.exponential(scale=1.0, size=(3, 2))
print(exponential_distribution)

[[0.04628197 0.39353209]
 [0.49213029 0.31656044]
 [1.76455787 0.441227  ]]


#### Logistic Distribution (np.random.logistic):

Generates random samples from a logistic distribution.

In [27]:
logistic_distribution = np.random.logistic(loc=0.0, scale=1.0, size=(3, 2))
print(logistic_distribution)

[[-0.93983086  0.17120127]
 [-1.8076348   1.40008251]
 [-2.51880073  4.32094654]]


#### Chi-Square Distribution (np.random.chisquare):

Generates random samples from a chi-square distribution.

In [28]:
chi_square_distribution = np.random.chisquare(df=3, size=(3, 2))
print(chi_square_distribution)

[[1.15522492 3.91983193]
 [5.34434366 4.97870708]
 [0.94940659 1.72707227]]


#### Gamma Distribution (np.random.gamma):

Generates random samples from a gamma distribution.

In [29]:
gamma_distribution = np.random.gamma(shape=2, scale=1, size=(3, 2))
print(gamma_distribution)

[[1.12143497 1.43828812]
 [2.95108852 4.10226246]
 [2.17848714 0.96484462]]


#### Beta Distribution (np.random.beta):

Generates random samples from a beta distribution.

In [30]:
beta_distribution = np.random.beta(a=2, b=5, size=(3, 2))
print(beta_distribution)

[[0.15364547 0.30550235]
 [0.20331388 0.18387688]
 [0.36782409 0.20209677]]


#### Laplace Distribution (np.random.laplace):

Generates random samples from a Laplace distribution.

In [31]:
laplace_distribution = np.random.laplace(loc=0.0, scale=1.0, size=(3, 2))
print(laplace_distribution )

[[-0.45254579 -1.5136558 ]
 [-0.78554688 -0.15757168]
 [ 1.01068255  1.27819779]]


## Universal Functions

Universal functions (ufuncs) are NumPy functions that operate element-wise on arrays. Because the loop over the elements runs in compiled code rather than in Python, they are the key to NumPy's fast array operations. Here are some common universal functions in NumPy:

### Mathematical Operations:

#### np.add
Add corresponding elements of two arrays.

In [32]:
result_add = np.add(arr1, arr2)
print(result_add)

[5 7 9]


#### np.subtract: 
Subtract elements of the second array from the first array.

In [33]:
result_subtract = np.subtract(arr1, arr2)
print(result_subtract)

[-3 -3 -3]


#### np.multiply: 
Multiply corresponding elements of two arrays.

In [34]:
result_multiply = np.multiply(arr1, arr2)
print(result_multiply)

[ 4 10 18]


#### np.divide:
Divide elements of the first array by the corresponding elements of the second array.

In [35]:
result_divide = np.divide(arr1, arr2)
print(result_divide)

[0.25 0.4  0.5 ]


#### np.power: 
Raise elements of the first array to the power of the corresponding elements of the second array.

In [36]:
result_power = np.power(arr1, arr2)
print(result_power)

[  1  32 729]


#### np.sqrt: 
Compute the square root of each element.

In [37]:
result_sqrt = np.sqrt(arr1)
print(result_sqrt )

[1.         1.41421356 1.73205081]
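
Beyond being called directly, ufuncs back the arithmetic operators and expose methods such as `reduce` and `accumulate`. A short sketch using the same `arr1` and `arr2` values; `buffer` is an illustrative name.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# The + operator dispatches to the same ufunc as np.add
same_result = np.array_equal(arr1 + arr2, np.add(arr1, arr2))
print(same_result)

# reduce folds the array into one value; accumulate keeps the running results
total = np.add.reduce(arr2)
running_product = np.multiply.accumulate(arr2)
print(total)
print(running_product)

# out= writes the result into a preallocated array instead of creating a new one
buffer = np.empty(3, dtype=float)
np.divide(arr1, arr2, out=buffer)
print(buffer)
```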


### Trigonometric Functions:
#### np.sin, np.cos, np.tan:
Compute trigonometric functions (inputs are in radians).

In [38]:
result_sin = np.sin(arr1)
result_cos = np.cos(arr1)
result_tan = np.tan(arr1)
print(result_sin)
print(result_cos)
print(result_tan)

[0.84147098 0.90929743 0.14112001]
[ 0.54030231 -0.41614684 -0.9899925 ]
[ 1.55740772 -2.18503986 -0.14254654]


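#### np.arcsin, np.arccos, np.arctan:
Compute inverse trigonometric functions.

In [39]:
# np.arcsin and np.arccos are only defined on [-1, 1], so np.clip clamps the inputs
# into that range (it does not filter them out); np.arctan accepts any real value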
valid_values = np.clip(arr1, -1, 1)
result_arcsin = np.arcsin(valid_values)
result_arccos = np.arccos(valid_values)
result_arctan = np.arctan(valid_values)
print(result_arcsin)
print(result_arccos)
print(result_arctan)

[1.57079633 1.57079633 1.57079633]
[0. 0. 0.]
[0.78539816 0.78539816 0.78539816]


### Exponential and Logarithmic Functions:
#### np.exp: 
Compute the exponential of each element.

In [40]:
result_exp = np.exp(arr1)
print(result_exp)

[ 2.71828183  7.3890561  20.08553692]


#### np.log, np.log2, np.log10: 
Compute logarithmic functions.

In [41]:
result_log = np.log(arr1)
result_log2 = np.log2(arr1)
result_log10 = np.log10(arr1)
print(result_log)
print(result_log2)
print(result_log10 )

[0.         0.69314718 1.09861229]
[0.        1.        1.5849625]
[0.         0.30103    0.47712125]


### Rounding and Absolute Value:

#### np.round: 
Round elements to the nearest integer.

In [42]:
result_round = np.round(arr1)
print(result_round)

[1 2 3]


#### np.abs: 
Compute the absolute value of each element.

In [43]:
result_abs = np.abs(arr1)
print(result_abs)

[1 2 3]


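### Statistical Functions:

#### np.mean, np.median, np.std: 
Compute statistical measures. Unlike the element-wise functions above, these are aggregations: each reduces the whole array to a single value.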

In [44]:
mean_value = np.mean(arr1)
median_value = np.median(arr1)
std_dev = np.std(arr1)
print(mean_value)
print(median_value)
print(std_dev)

2.0
2.0
0.816496580927726
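
By default `np.std` computes the population standard deviation (it divides by N). Passing `ddof=1` gives the sample standard deviation (it divides by N - 1), which is also the default used by pandas. A minimal sketch with the same `arr1` values:

```python
import numpy as np

arr1 = np.array([1, 2, 3])

population_std = np.std(arr1)       # divides by N
sample_std = np.std(arr1, ddof=1)   # divides by N - 1

print(population_std)
print(sample_std)
```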


#### np.min, np.max: 
Find the minimum and maximum values.

In [45]:
min_value = np.min(arr1)
max_value = np.max(arr1)
print(min_value)
print(max_value)

1
3


## Pandas: Python Data Analysis Library

### Introduction to Pandas

Pandas is a fast, powerful, and flexible open-source data analysis and manipulation library for Python.

### Pandas Basics

#### Series and DataFrame

In [46]:
import pandas as pd

# Create a Series
series = pd.Series([1, 3, 5, np.nan, 6, 8])

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, 5, 6]})
df

     A  B
0  1.0  4
1  2.0  5
2  NaN  6


#### Data Cleaning

In [47]:
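# Handling missing data: both methods return a new DataFrame and leave df unchanged.
# A cell displays only its last expression, so the dropna result is assigned to keep it.
df_dropped = df.dropna()  # rows containing NaN removed
df.fillna(0)              # NaN replaced with 0

     A  B
0  1.0  4
1  2.0  5
2  0.0  6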
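
Filling with a constant is only one option. The sketch below rebuilds the same small frame under the illustrative name `df_missing` and fills the gap in column A with that column's mean instead.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, 5, 6]})

# dropna and fillna both return new DataFrames; df_missing keeps its NaN
dropped = df_missing.dropna()
filled_mean = df_missing.fillna({'A': df_missing['A'].mean()})

print(dropped)
print(filled_mean)
print(df_missing['A'].isna().sum())
```

Passing a dictionary to `fillna` lets each column use its own fill value.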


#### Creating a DataFrame from a dictionary:

In [48]:
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
df

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


#### Creating a DataFrame from a list of dictionaries:

In [49]:
data_list = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
             {'Name': 'Bob', 'Age': 30, 'City': 'San Francisco'},
             {'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}]
df_from_list = pd.DataFrame(data_list)
df_from_list

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


### Reading and Writing Data:

#### Reading from CSV:

In [50]:
df_csv = pd.read_csv('example.txt')
df_csv

  Ths is an example file.
0           Python is fun


#### Writing to CSV:

In [51]:
df.to_csv('output_filename.csv', index=False)
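
The `index=False` argument matters when the file is read back. The sketch below round-trips a small frame through an in-memory buffer (`io.StringIO`) rather than a file on disk; `people` and the other names are illustrative.

```python
import io
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write without the index, then read the text back
buffer = io.StringIO()
people.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)

# Writing the index adds an unnamed first column, which read_csv
# labels 'Unnamed: 0' when the text is loaded again
with_index = pd.read_csv(io.StringIO(people.to_csv()))
print(with_index.columns.tolist())
```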


### Data Indexing and Selection

#### Selecting a column:

In [52]:
name_column = df['Name']
name_column

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

#### Selecting multiple columns:

In [53]:
selected_columns = df[['Name', 'Age']]
selected_columns

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


#### Selecting rows based on conditions:

In [54]:
young_people = df[df['Age'] < 30]
young_people

    Name  Age      City
0  Alice   25  New York
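
Several conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on a rebuilt copy of the same data, named `people` for illustration:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 35],
                       'City': ['New York', 'San Francisco', 'Los Angeles']})

# Boolean masks combined element-wise
over_25_not_la = people[(people['Age'] > 25) & (people['City'] != 'Los Angeles')]
print(over_25_not_la)

# The same filter written as a query string
same_with_query = people.query("Age > 25 and City != 'Los Angeles'")
print(same_with_query)
```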


### Handling Missing Data:

#### Checking for missing values:

In [55]:
missing_values = df.isnull().sum()
missing_values

Name    0
Age     0
City    0
dtype: int64

#### Dropping missing values:

In [56]:
df_no_missing = df.dropna()
df_no_missing

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


#### Filling missing values:

In [57]:
df_filled = df.fillna(value=0)
df_filled

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


### Grouping and Aggregation:

#### Grouping by a column:

In [58]:
grouped_by_city = df.groupby('City')
grouped_by_city

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001B5871B0FD0>

#### Aggregating with mean:

In [59]:
mean_age_by_city = grouped_by_city['Age'].mean()
mean_age_by_city

City
Los Angeles      35.0
New York         25.0
San Francisco    30.0
Name: Age, dtype: float64
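
A GroupBy object produces values only once an aggregation is applied to it. The sketch below uses a small made-up frame (`visits`, with repeated cities so the groups have more than one row) and computes several statistics in one call with `agg`.

```python
import pandas as pd

# Hypothetical data for illustration: cities repeat so each group has several rows
visits = pd.DataFrame({'City': ['New York', 'New York', 'Miami', 'Miami', 'Miami'],
                       'Age': [25, 35, 40, 20, 30]})

summary = visits.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

The result has one row per city and one column per aggregation.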

### Merging and Concatenating:

#### Concatenating DataFrames vertically:

In [60]:
# Define DataFrame df1
data1 = {'Name': ['Alice', 'Bob', 'Charlie'],
         'Age': [25, 30, 35],
         'City': ['New York', 'San Francisco', 'Los Angeles']}
df1 = pd.DataFrame(data1)

# Define DataFrame df2
data2 = {'Name': ['David', 'Eva', 'Frank'],
         'Age': [28, 32, 40],
         'City': ['Chicago', 'Seattle', 'Miami']}
df2 = pd.DataFrame(data2)

# Concatenate DataFrames vertically
df_concatenated = pd.concat([df1, df2], axis=0)

# Display the concatenated DataFrame
df_concatenated

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
0    David   28        Chicago
1      Eva   32        Seattle
2    Frank   40          Miami


#### Merging DataFrames:

In [61]:
# Sample DataFrame df3
data1 = {'ID': [1, 2, 3],
         'Name': ['Alice', 'Bob', 'Charlie'],
         'Score1': [85, 90, 75]}
df3 = pd.DataFrame(data1)

# Sample DataFrame df4
data2 = {'ID': [1, 2, 3],
         'Score2': [92, 88, 95]}
df4 = pd.DataFrame(data2)

# Merging on the 'ID' column
merged_df = pd.merge(df3, df4, on='ID')
merged_df

   ID     Name  Score1  Score2
0   1    Alice      85      92
1   2      Bob      90      88
2   3  Charlie      75      95
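
In the example above every ID appears in both frames, so the join type makes no difference. The sketch below uses made-up frames (`left` and `right`) whose IDs only partly overlap, comparing the default inner join with an outer join. Setting `indicator=True` adds a `_merge` column that records where each row came from.

```python
import pandas as pd

# Hypothetical frames for illustration: ID 1 is only on the left, ID 4 only on the right
left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [2, 3, 4], 'Score2': [88, 95, 70]})

inner = pd.merge(left, right, on='ID')  # default how='inner': matching IDs only
outer = pd.merge(left, right, on='ID', how='outer', indicator=True)

print(inner)
print(outer)
```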


### Sorting DataFrames:

#### Sorting by a column:

In [62]:
sorted_df = df.sort_values(by='Age', ascending=False)
sorted_df

      Name  Age           City
2  Charlie   35    Los Angeles
1      Bob   30  San Francisco
0    Alice   25       New York


#### Applying a function to a column:

In [63]:
df['Age'] = df['Age'].apply(lambda x: x + 1)
df['Age']

0    26
1    31
2    36
Name: Age, dtype: int64

### Statistical Summary:

#### Descriptive statistics:

In [64]:
summary_stats = df.describe()
summary_stats

        Age
count   3.0
mean   31.0
std     5.0
min    26.0
25%    28.5
50%    31.0
75%    33.5
max    36.0


### Renaming Columns:

#### Renaming specific columns:

In [65]:
# inplace=True modifies df directly instead of returning a renamed copy
df.rename(columns={'Name': 'Name1'}, inplace=True)
df

     Name1  Age           City
0    Alice   26       New York
1      Bob   31  San Francisco
2  Charlie   36    Los Angeles


In [66]:
# Assigning to df.columns renames all columns at once; the list must match the column count
df.columns = ['Name', 'Age2', 'City3']
df

      Name  Age2          City3
0    Alice    26       New York
1      Bob    31  San Francisco
2  Charlie    36    Los Angeles


### Dropping Columns and Rows:

#### Dropping columns:

In [67]:
df_dropped_columns = df.drop(['City3'], axis=1)
df_dropped_columns

      Name  Age2
0    Alice    26
1      Bob    31
2  Charlie    36


#### Dropping rows based on conditions:

In [68]:
# Keep rows where Age2 > 25; rows that fail the condition are dropped (none do here)
df_filtered_rows = df[df['Age2'] > 25]
df_filtered_rows

      Name  Age2          City3
0    Alice    26       New York
1      Bob    31  San Francisco
2  Charlie    36    Los Angeles


### Applying Functions Row-wise or Column-wise:

#### Applying a function to each column:

In [69]:
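# apply returns a new object and leaves df unchanged, so keep and display the result:
# the maximum of each column (strings are compared alphabetically)
column_max = df.apply(lambda x: x.max())
column_max

Name           Charlie
Age2                36
City3    San Francisco
dtype: object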


#### Applying a function to each row:

In [70]:
# axis=1 passes each row to the function; the result is a new Series and df is unchanged
doubled_age = df.apply(lambda row: row['Age2'] * 2, axis=1)
doubled_age

0    52
1    62
2    72
dtype: int64


### Handling Categorical Data:

#### Converting a column to categorical:

In [71]:
df['Category'] = pd.Categorical(df['City3'])
df['Category']

0         New York
1    San Francisco
2      Los Angeles
Name: Category, dtype: category
Categories (3, object): ['Los Angeles', 'New York', 'San Francisco']

#### Encoding categorical variables:

In [72]:
df_encoded = pd.get_dummies(df, columns=['Category'])
df_encoded

      Name  Age2          City3  Category_Los Angeles  Category_New York  Category_San Francisco
0    Alice    26       New York                 False               True                   False
1      Bob    31  San Francisco                 False              False                    True
2  Charlie    36    Los Angeles                  True              False                   False


### Pivoting and Melting:

#### Pivoting a DataFrame:

In [73]:

df2

    Name  Age     City
0  David   28  Chicago
1    Eva   32  Seattle
2  Frank   40    Miami


In [74]:
# Note: this overwrites the df2 defined in In [60] with a copy of df (minus 'Category')
df2 = df.drop(['Category'], axis=1)
df2.columns = ['Name', 'Age', 'City']
# pivot: one row per Name, one column per City, cell values taken from Age
df_pivoted = df2.pivot(index='Name', columns='City', values='Age')
df_pivoted

City     Los Angeles  New York  San Francisco
Name
Alice            NaN      26.0            NaN
Bob              NaN       NaN           31.0
Charlie         36.0       NaN            NaN


#### Melting a DataFrame:

In [75]:
df_melted = pd.melt(df2, id_vars=['Name'], value_vars=['Age', 'City'])
df_melted

      Name variable          value
0    Alice      Age             26
1      Bob      Age             31
2  Charlie      Age             36
3    Alice     City       New York
4      Bob     City  San Francisco
5  Charlie     City    Los Angeles


### DateTime Operations:

#### Converting a column to datetime:

In [76]:
# Create a DataFrame with Date as strings
data = {
    'DateString': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Value1': [10, 15, 20, 25],
    'Value2': [1.2, 2.3, 3.5, 4.1]
}

df = pd.DataFrame(data)

# Display the DataFrame before conversion
print("DataFrame Before Conversion:")
print(df)

# Convert the 'DateString' column to datetime
df['Date'] = pd.to_datetime(df['DateString'])

# Drop the original 'DateString' column
df = df.drop(columns=['DateString'])

# Display the DataFrame after conversion
print("\nDataFrame After Conversion:")
print(df)

DataFrame Before Conversion:
   DateString  Value1  Value2
0  2023-01-01      10     1.2
1  2023-01-02      15     2.3
2  2023-01-03      20     3.5
3  2023-01-04      25     4.1

DataFrame After Conversion:
   Value1  Value2       Date
0      10     1.2 2023-01-01
1      15     2.3 2023-01-02
2      20     3.5 2023-01-03
3      25     4.1 2023-01-04


#### Extracting components of a datetime column:

In [77]:
df['Year'] = df['Date'].dt.year
df['Year'] 

0    2023
1    2023
2    2023
3    2023
Name: Year, dtype: int32
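
The `.dt` accessor exposes more than the year. The sketch below builds a standalone Series with the same four dates (`dates` is an illustrative name) and extracts the month, the weekday name, and the gap in days between consecutive rows.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2023-01-01', '2023-01-02',
                                  '2023-01-03', '2023-01-04']))

months = dates.dt.month
weekdays = dates.dt.day_name()
day_gap = dates.diff().dt.days  # first entry is NaN: there is no previous row

print(months.tolist())
print(weekdays.tolist())
print(day_gap.tolist())
```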

### Filtering with isin:

#### Selecting rows where a column value is in a list:

In [78]:
# Create a date range
date_rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')

# Create a DataFrame with Date, Value1, Value2, and Category columns
data = {
    'Date': date_rng,
    'Value1': np.random.randint(1, 100, size=(len(date_rng))),
    'Value2': np.random.randn(len(date_rng)),
    'Category': np.random.choice(['A', 'B', 'C'], size=(len(date_rng)))
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Select rows where 'Category' is 'A' or 'B'
selected_rows = df[df['Category'].isin(['A', 'B'])]
print(selected_rows)

        Date  Value1    Value2 Category
0 2023-01-01      70 -0.161286        B
1 2023-01-02      72  0.404051        C
2 2023-01-03      27  1.886186        A
3 2023-01-04       9  0.174578        A
4 2023-01-05      62  0.257550        A
5 2023-01-06      37 -0.074446        C
6 2023-01-07      97 -1.918771        B
7 2023-01-08      51 -0.026514        B
8 2023-01-09      44  0.060230        A
9 2023-01-10      24  2.463242        B
        Date  Value1    Value2 Category
0 2023-01-01      70 -0.161286        B
2 2023-01-03      27  1.886186        A
3 2023-01-04       9  0.174578        A
4 2023-01-05      62  0.257550        A
6 2023-01-07      97 -1.918771        B
7 2023-01-08      51 -0.026514        B
8 2023-01-09      44  0.060230        A
9 2023-01-10      24  2.463242        B


### Combining DataFrames:

#### Combining DataFrames vertically:

In [79]:
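# df2 was reassigned in In [74], so it now holds the updated Alice/Bob/Charlie rows.
# ignore_index=True renumbers the result 0..5 instead of repeating 0, 1, 2.
df_combined = pd.concat([df1, df2], ignore_index=True)
df_combined

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    Alice   26       New York
4      Bob   31  San Francisco
5  Charlie   36    Los Angeles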


#### Combining DataFrames horizontally:

In [80]:
# axis=1 places the frames side by side, aligned on the index; column names are repeated
df_combined_horizontal = pd.concat([df1, df2], axis=1)
df_combined_horizontal

      Name  Age           City     Name  Age           City
0    Alice   25       New York    Alice   26       New York
1      Bob   30  San Francisco      Bob   31  San Francisco
2  Charlie   35    Los Angeles  Charlie   36    Los Angeles


### Handling Duplicate Data:

#### Removing duplicate rows:

In [81]:
df_no_duplicates = df.drop_duplicates()
df_no_duplicates

        Date  Value1    Value2 Category
0 2023-01-01      70 -0.161286        B
1 2023-01-02      72  0.404051        C
2 2023-01-03      27  1.886186        A
3 2023-01-04       9  0.174578        A
4 2023-01-05      62  0.257550        A
5 2023-01-06      37 -0.074446        C
6 2023-01-07      97 -1.918771        B
7 2023-01-08      51 -0.026514        B
8 2023-01-09      44  0.060230        A
9 2023-01-10      24  2.463242        B


#### Identifying and keeping only duplicates:

In [82]:
df_duplicates_only = df[df.duplicated()]
df_duplicates_only

Empty DataFrame
Columns: [Date, Value1, Value2, Category]
Index: []
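
The DataFrame above contains no repeated rows, so both calls leave it unchanged or return nothing. The sketch below uses a small made-up frame (`orders`) that does contain a duplicate, and shows the `keep` and `subset` options.

```python
import pandas as pd

# Hypothetical data for illustration: rows 0 and 2 are identical
orders = pd.DataFrame({'Customer': ['Ann', 'Ben', 'Ann', 'Cal', 'Ben'],
                       'Item': ['pen', 'ink', 'pen', 'pad', 'cap']})

# duplicated() marks every repeat after the first occurrence
flags = orders.duplicated()
print(flags.tolist())

# keep=False marks all members of a duplicate group, including the first
all_copies = orders[orders.duplicated(keep=False)]
print(all_copies)

# subset= compares only the listed columns; keep='last' retains the final occurrence
last_per_customer = orders.drop_duplicates(subset='Customer', keep='last')
print(last_per_customer)
```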


### Resetting Index:

#### Resetting the DataFrame index:

In [83]:
df_reset_index = df.reset_index(drop=True)
df_reset_index

        Date  Value1    Value2 Category
0 2023-01-01      70 -0.161286        B
1 2023-01-02      72  0.404051        C
2 2023-01-03      27  1.886186        A
3 2023-01-04       9  0.174578        A
4 2023-01-05      62  0.257550        A
5 2023-01-06      37 -0.074446        C
6 2023-01-07      97 -1.918771        B
7 2023-01-08      51 -0.026514        B
8 2023-01-09      44  0.060230        A
9 2023-01-10      24  2.463242        B


#### Head and Tail Methods

In [84]:
df_reset_index.head(4)

        Date  Value1    Value2 Category
0 2023-01-01      70 -0.161286        B
1 2023-01-02      72  0.404051        C
2 2023-01-03      27  1.886186        A
3 2023-01-04       9  0.174578        A


In [85]:
df_reset_index.tail()

        Date  Value1    Value2 Category
5 2023-01-06      37 -0.074446        C
6 2023-01-07      97 -1.918771        B
7 2023-01-08      51 -0.026514        B
8 2023-01-09      44  0.060230        A
9 2023-01-10      24  2.463242        B


#### loc and iloc methods

In [86]:
df_reset_index.iloc[0:5,0:3]

        Date  Value1    Value2
0 2023-01-01      70 -0.161286
1 2023-01-02      72  0.404051
2 2023-01-03      27  1.886186
3 2023-01-04       9  0.174578
4 2023-01-05      62  0.257550
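
The cell above shows only `iloc`, which selects by integer position. `loc` selects by label instead, and its slices include the end label. The sketch below uses a made-up frame (`scores`) with a non-default index so the two behave visibly differently.

```python
import pandas as pd

# Hypothetical frame for illustration: index labels are 10..40, not 0..3
scores = pd.DataFrame({'Value1': [70, 72, 27, 9],
                       'Category': ['B', 'C', 'A', 'A']},
                      index=[10, 20, 30, 40])

by_position = scores.iloc[0:2, 0:1]        # positions 0 and 1; the end is exclusive
by_label = scores.loc[10:30, ['Value1']]   # labels 10 through 30; the end is inclusive
by_mask = scores.loc[scores['Category'] == 'A', 'Value1']  # boolean mask plus column label

print(by_position)
print(by_label)
print(by_mask)
```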
