## Exercise 1: Creating and Modifying Series

Create a Pandas Series from a dictionary where keys are ['a', 'b', 'c'] and values are [100, 200, 300].

In [1]:
import pandas as pd

# Create a dictionary with keys and values
data = {'a': 100, 'b': 200, 'c': 300}

# Create a Pandas Series from the dictionary
series = pd.Series(data)

print(series)

a    100
b    200
c    300
dtype: int64
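
The exercise title mentions modifying a Series, but the solution only creates one. A minimal sketch of common modifications follows; the new label 'd' and the replacement values are illustrative assumptions, not part of the exercise.

```python
import pandas as pd

series = pd.Series({'a': 100, 'b': 200, 'c': 300})

# Overwrite an existing value by its label
series['b'] = 250

# Assigning to a label that does not exist yet appends a new entry
series['d'] = 400

# Arithmetic is applied element-wise and returns a new Series
doubled = series * 2

print(series)
print(doubled)
```

Label-based assignment changes the Series in place, while arithmetic such as `series * 2` leaves the original untouched.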


## Exercise 2: Creating DataFrames

Create a DataFrame from the following data:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

Modify the code to add a new column D with values [10, 11, 12].

Drop column B from the DataFrame and display the result.

In [5]:
import pandas as pd

# Create the DataFrame
data = {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]}
df = pd.DataFrame(data)
print(df)
print('\n')

# Add a new column 'D'
df['D'] = [10, 11, 12]
print(df) 
print('\n')

# Drop column 'B'
df = df.drop('B', axis=1)

# Display the modified DataFrame
print(df)

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


   A  B  C   D
0  1  2  3  10
1  4  5  6  11
2  7  8  9  12


   A  C   D
0  1  3  10
1  4  6  11
2  7  9  12
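
As an alternative to modifying `df` step by step, the same result can be sketched as a single chained expression. This assumes you want to keep the original DataFrame unchanged; `assign` and `drop(columns=...)` each return a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})

# assign() adds column D, drop(columns=...) removes column B;
# neither call modifies df itself
result = df.assign(D=[10, 11, 12]).drop(columns='B')

print(result)
```

`drop(columns='B')` is equivalent to `drop('B', axis=1)` and can be easier to read.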


## Exercise 3: DataFrame Indexing and Selection

Select column B from the following DataFrame:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

Modify the code to select both columns A and C.

Select the row with index label 1 using the .loc indexer.

In [None]:
import pandas as pd

# Create the DataFrame
data = {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]}
df = pd.DataFrame(data)

# Select column B
column_b = df['B']
print("Column B:\n", column_b)

# Select columns A and C
columns_ac = df[['A', 'C']]
print("\nColumns A and C:\n", columns_ac)

# Select row with index 1 using .loc
row_1 = df.loc[1]
print("\nRow with index 1:\n", row_1)


Column B:
 0    2
1    5
2    8
Name: B, dtype: int64

Columns A and C:
    A  C
0  1  3
1  4  6
2  7  9

Row with index 1:
 A    4
B    5
C    6
Name: 1, dtype: int64
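
With the default integer index, label 1 and position 1 refer to the same row, which hides the difference between `.loc` and `.iloc`. The sketch below uses string labels ('x', 'y', 'z', chosen only for illustration) to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]},
    index=['x', 'y', 'z'],  # illustrative labels, not part of the exercise
)

# .loc selects by label, .iloc selects by integer position
by_label = df.loc['y']
by_position = df.iloc[1]

# .loc also accepts lists of row labels and column labels together
subset = df.loc[['x', 'z'], ['A', 'C']]

print(by_label)
print(subset)
```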


## Exercise 4: Adding and Removing DataFrame Elements

Add a new column Sum to the DataFrame which is the sum of columns A, B, and C.

Remove the column Sum from the DataFrame.

Add a column Random with random numbers generated using numpy.

In [7]:
import pandas as pd
import numpy as np

# Create the DataFrame
data = {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]}
df = pd.DataFrame(data)

# Add a new column 'Sum'
df['Sum'] = df['A'] + df['B'] + df['C']
print("DataFrame with 'Sum' column:\n", df)

# Remove the 'Sum' column
df = df.drop('Sum', axis=1)
print("\nDataFrame after removing 'Sum' column:\n", df)

# Add a column 'Random' with random numbers
df['Random'] = np.random.rand(len(df))
print("\nDataFrame with 'Random' column:\n", df)

DataFrame with 'Sum' column:
    A  B  C  Sum
0  1  2  3    6
1  4  5  6   15
2  7  8  9   24

DataFrame after removing 'Sum' column:
    A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

DataFrame with 'Random' column:
    A  B  C    Random
0  1  2  3  0.412644
1  4  5  6  0.140522
2  7  8  9  0.459053
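
The Random column differs on every run because the generator is not seeded. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` with an arbitrary seed of 0, is shown here together with a row-wise `sum` that avoids listing each column in an addition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})

# Row-wise sum over the selected columns (axis=1 sums across columns)
df['Sum'] = df[['A', 'B', 'C']].sum(axis=1)

# A seeded generator yields the same values on every run; the seed is arbitrary
rng = np.random.default_rng(seed=0)
df['Random'] = rng.random(len(df))

print(df)
```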


## Exercise 5: Merging DataFrames

Merge the following two DataFrames on the key column:

left:
   key   A   B
0    1  A1  B1
1    2  A2  B2
2    3  A3  B3

right:
   key   C   D
0    1  C1  D1
1    2  C2  D2
2    3  C3  D3

Modify the merge to use an outer join instead of an inner join.

Add a new column E to the right DataFrame and update the merge to include this new column.

In [3]:
import pandas as pd

# Create the left DataFrame
left = pd.DataFrame({'key': [1, 2, 3], 'A': ['A1', 'A2', 'A3'], 'B': ['B1', 'B2', 'B3']})

# Create the right DataFrame
right = pd.DataFrame({'key': [1, 2, 3], 'C': ['C1', 'C2', 'C3'], 'D': ['D1', 'D2', 'D3']})

# Merge DataFrames using inner join (default)
merged_inner = pd.merge(left, right, on='key')
print("Inner Merge:\n", merged_inner)

# Merge DataFrames using outer join
merged_outer = pd.merge(left, right, on='key', how='outer')
print("\nOuter Merge:\n", merged_outer)

# Add a new column 'E' to the right DataFrame
right['E'] = ['E1', 'E2', 'E3']

# Merge DataFrames with the updated right DataFrame
merged_outer_with_e = pd.merge(left, right, on='key', how='outer')
print("\nOuter Merge with 'E' column:\n", merged_outer_with_e)

Inner Merge:
    key   A   B   C   D
0    1  A1  B1  C1  D1
1    2  A2  B2  C2  D2
2    3  A3  B3  C3  D3

Outer Merge:
    key   A   B   C   D
0    1  A1  B1  C1  D1
1    2  A2  B2  C2  D2
2    3  A3  B3  C3  D3

Outer Merge with 'E' column:
    key   A   B   C   D   E
0    1  A1  B1  C1  D1  E1
1    2  A2  B2  C2  D2  E2
2    3  A3  B3  C3  D3  E3
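
The inner and outer merges above produce identical rows because both DataFrames contain exactly the same keys. The sketch below uses partially overlapping keys (made up for illustration) to show where the two join types differ; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Illustrative data: key 1 exists only on the left, key 4 only on the right
left = pd.DataFrame({'key': [1, 2, 3], 'A': ['A1', 'A2', 'A3']})
right = pd.DataFrame({'key': [2, 3, 4], 'C': ['C2', 'C3', 'C4']})

# Inner join keeps only keys present in both DataFrames
inner = pd.merge(left, right, on='key', how='inner')

# Outer join keeps every key and fills the missing side with NaN
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

print(inner)
print(outer)
```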


## Exercise 6: Data Cleaning

Replace all NaN values in the following DataFrame with the value 0:

     A    B    C
0  1.0  NaN  3.0
1  NaN  5.0  6.0
2  7.0  8.0  NaN

Modify the code to replace NaN values with the mean of the column.

Drop rows where any value is NaN.

In [2]:
import numpy as np
import pandas as pd

# Create the DataFrame with NaN values
data = {'A': [1.0, np.nan, 7.0], 'B': [np.nan, 5.0, 8.0], 'C': [3.0, 6.0, np.nan]}
df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Replace NaN with 0
df_filled_zero = df.fillna(0)
print("\nDataFrame with NaN replaced by 0:\n", df_filled_zero)

# Replace NaN with the mean of the column
df_filled_mean = df.fillna(df.mean())
print("\nDataFrame with NaN replaced by column mean:\n", df_filled_mean)

# Drop rows with NaN values
df_dropped_na = df.dropna()
print("\nDataFrame with NaN rows dropped:\n", df_dropped_na)

Original DataFrame:
      A    B    C
0  1.0  NaN  3.0
1  NaN  5.0  6.0
2  7.0  8.0  NaN

DataFrame with NaN replaced by 0:
      A    B    C
0  1.0  0.0  3.0
1  0.0  5.0  6.0
2  7.0  8.0  0.0

DataFrame with NaN replaced by column mean:
      A    B    C
0  1.0  6.5  3.0
1  4.0  5.0  6.0
2  7.0  8.0  4.5

DataFrame with NaN rows dropped:
 Empty DataFrame
Columns: [A, B, C]
Index: []
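
Every row of this DataFrame contains one NaN, so a plain `dropna()` removes all of them. A short sketch of the parameters that narrow what gets dropped follows; which one is appropriate depends on the data, so treat these as options rather than a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 7.0],
                   'B': [np.nan, 5.0, 8.0],
                   'C': [3.0, 6.0, np.nan]})

# Drop a row only when column A is missing
dropped_subset = df.dropna(subset=['A'])

# Drop a row only when every value in it is missing (none here)
dropped_all = df.dropna(how='all')

# Drop columns instead of rows; every column has a NaN, so none remain
dropped_cols = df.dropna(axis=1)

print(dropped_subset)
print(dropped_all)
print(dropped_cols)
```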


## Exercise 7: Grouping and Aggregation

Group the following DataFrame by column Category and calculate the mean of column Value:

   Category  Value
0         A      1
1         B      2
2         A      3
3         B      4
4         A      5
5         B      6

Modify the code to calculate the sum instead of the mean.

Group by Category and count the number of entries in each group.

In [4]:
import pandas as pd

# Create the DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)

# Group by Category and calculate the mean of Value
grouped_mean = df.groupby('Category')['Value'].mean()
print("Grouped Mean:\n", grouped_mean)

# Group by Category and calculate the sum of Value
grouped_sum = df.groupby('Category')['Value'].sum()
print("\nGrouped Sum:\n", grouped_sum)

# Group by Category and count the number of entries
grouped_count = df.groupby('Category')['Value'].count()
print("\nGrouped Count:\n", grouped_count)

Grouped Mean:
 Category
A    3.0
B    4.0
Name: Value, dtype: float64

Grouped Sum:
 Category
A     9
B    12
Name: Value, dtype: int64

Grouped Count:
 Category
A    3
B    3
Name: Value, dtype: int64
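
The three aggregations above can also be computed in a single pass with `agg`. This is a sketch of one possible shortcut, not a replacement for the separate steps the exercise asks for.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Value': [1, 2, 3, 4, 5, 6]})

# One column per aggregation, one row per category
summary = df.groupby('Category')['Value'].agg(['mean', 'sum', 'count'])

# size() counts rows per group, including rows where Value is NaN,
# whereas count() only counts non-missing values
group_sizes = df.groupby('Category').size()

print(summary)
print(group_sizes)
```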


## Exercise 8: Pivot Tables

Create a pivot table from the following DataFrame, showing the mean Value for each Category and Type:

   Category  Type  Value
0         A     X      1
1         A     Y      2
2         A     X      3
3         B     Y      4
4         B     X      5
5         B     Y      6

Modify the pivot table to show the sum of Value instead of the mean.

Add margins to the pivot table to show the total mean for each Category and Type.

In [5]:
import pandas as pd

# Create the DataFrame
data = {'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
        'Type': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
        'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)

# Create a pivot table showing the mean of Value
# (string names such as 'mean' avoid the FutureWarning recent pandas emits for np.mean / np.sum)
pivot_mean = pd.pivot_table(df, values='Value', index='Category', columns='Type', aggfunc='mean')
print("Pivot Table (Mean):\n", pivot_mean)

# Create a pivot table showing the sum of Value
pivot_sum = pd.pivot_table(df, values='Value', index='Category', columns='Type', aggfunc='sum')
print("\nPivot Table (Sum):\n", pivot_sum)

# Create a pivot table with margins showing the total mean
pivot_mean_margins = pd.pivot_table(df, values='Value', index='Category', columns='Type',
                                    aggfunc='mean', margins=True, margins_name='Total')
print("\nPivot Table (Mean with Margins):\n", pivot_mean_margins)

Pivot Table (Mean):
 Type      X  Y
Category      
A         2  2
B         5  5

Pivot Table (Sum):
 Type      X   Y
Category       
A         4   2
B         5  10

Pivot Table (Mean with Margins):
 Type      X  Y  Total
Category             
A         2  2    2.0
B         5  5    5.0
Total     3  4    3.5
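
A pivot table is closely related to a two-key `groupby` followed by `unstack`. The sketch below builds the mean table both ways to make that relationship concrete; it is offered as an illustration of the idea, not as a description of how pandas implements `pivot_table` internally.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Type': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
                   'Value': [1, 2, 3, 4, 5, 6]})

# Pivot table: one row per Category, one column per Type
pivot = df.pivot_table(values='Value', index='Category',
                       columns='Type', aggfunc='mean')

# Same numbers: group by both keys, then move Type into the columns
via_groupby = df.groupby(['Category', 'Type'])['Value'].mean().unstack()

print(pivot)
print(via_groupby)
```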


## Exercise 9: Time Series Data

Create a time series DataFrame with a date range starting from '2023-01-01' for 6 periods and random values.

Set the date column as the index of the DataFrame.

Resample the data to calculate the sum for each 2-day period.

In [6]:
import numpy as np
import pandas as pd

# Create a date range
dates = pd.date_range(start='2023-01-01', periods=6)

# Create a DataFrame with random values, using the dates directly as its index
df = pd.DataFrame({'Value': np.random.rand(6)}, index=dates)

# Resample the data to calculate the sum for each 2-day period
df_resampled = df.resample('2D').sum()

print(df_resampled)

               Value
2023-01-01  1.709937
2023-01-03  0.900738
2023-01-05  0.103323
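
The solution passes the dates as the index at construction time, so the "set the date column as the index" step never appears explicitly. A minimal sketch of that step with `set_index` follows; the column name 'Date' and the seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=6)
rng = np.random.default_rng(seed=0)  # arbitrary seed for repeatable values

# Start with the dates as an ordinary column
df = pd.DataFrame({'Date': dates, 'Value': rng.random(6)})

# Promote the Date column to the index; resample() needs a datetime-like index
df = df.set_index('Date')

# Sum the values in consecutive 2-day bins
resampled = df.resample('2D').sum()

print(resampled)
```

Six daily rows collapse into three bins, labeled by the first day of each bin.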


## Exercise 10: Handling Missing Data

Interpolate missing values in the following DataFrame:

     A    B    C
0  1.0  NaN  3.0
1  2.0  5.0  NaN
2  NaN  8.0  9.0

Drop rows with any NaN values instead of interpolating.

In [9]:
import numpy as np
import pandas as pd

# Create the DataFrame with NaN values
data = {'A': [1.0, 2.0, np.nan], 'B': [np.nan, 5.0, 8.0], 'C': [3.0, np.nan, 9.0]}
df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Interpolate missing values
df_interpolated = df.interpolate()
print("\nDataFrame with interpolated NaN values:\n", df_interpolated)

# Drop rows with NaN values
df_dropped_na = df.dropna()
print("\nDataFrame with NaN rows dropped:\n", df_dropped_na)

Original DataFrame:
      A    B    C
0  1.0  NaN  3.0
1  2.0  5.0  NaN
2  NaN  8.0  9.0

DataFrame with interpolated NaN values:
      A    B    C
0  1.0  NaN  3.0
1  2.0  5.0  6.0
2  2.0  8.0  9.0

DataFrame with NaN rows dropped:
 Empty DataFrame
Columns: [A, B, C]
Index: []
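
The default `interpolate()` call leaves the leading NaN in column B untouched because it only fills forward. One way to fill it as well, sketched here under the assumption that repeating the nearest valid value is acceptable for your data, is `limit_direction='both'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, np.nan],
                   'B': [np.nan, 5.0, 8.0],
                   'C': [3.0, np.nan, 9.0]})

# Fill gaps in both directions; values at the edges take the nearest
# valid value instead of being extrapolated
filled = df.interpolate(limit_direction='both')

print(filled)
```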


## Exercise 11: DataFrame Operations

Calculate the cumulative sum of the following DataFrame:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

Calculate the cumulative product of the DataFrame.

Apply a function to subtract 1 from all elements in the DataFrame.

In [8]:
import pandas as pd

# Create the DataFrame
data = {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]}
df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Calculate the cumulative sum
cumulative_sum = df.cumsum()
print("\nCumulative Sum:\n", cumulative_sum)

# Calculate the cumulative product
cumulative_product = df.cumprod()
print("\nCumulative Product:\n", cumulative_product)

# Apply a function to subtract 1 from all elements
def subtract_one(x):
    return x - 1

# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map
df_subtracted = df.applymap(subtract_one)
print("\nDataFrame with 1 subtracted:\n", df_subtracted)

Original DataFrame:
    A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

Cumulative Sum:
     A   B   C
0   1   2   3
1   5   7   9
2  12  15  18

Cumulative Product:
     A   B    C
0   1   2    3
1   4  10   18
2  28  80  162

DataFrame with 1 subtracted:
    A  B  C
0  0  1  2
1  3  4  5
2  6  7  8
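
The cumulative operations above run down each column, which is the default. The sketch below shows the row-wise variant with `axis=1`, plus a vectorized subtraction that gives the same result as applying `subtract_one` element by element.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})

# axis=1 accumulates across the columns of each row
row_cumsum = df.cumsum(axis=1)

# Plain arithmetic is applied to every element without an explicit function
subtracted = df - 1

print(row_cumsum)
print(subtracted)
```

For simple arithmetic, the vectorized form is usually shorter and faster than an element-wise function call.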
