Q1. List any five functions of the pandas library with execution.

Here are five commonly used pandas functions, each with a short example:

1. **`read_csv()`:** This function is used to read data from a CSV file into a DataFrame.

   import pandas as pd
   
   # Read data from a CSV file into a DataFrame
   df = pd.read_csv('data.csv')

2. **`head()`:** This function is used to display the first few rows of a DataFrame.

   # Display the first 5 rows of the DataFrame
   print(df.head())

3. **`describe()`:** This function generates descriptive statistics summarizing the central tendency, dispersion, and shape of the distribution of numerical columns.

   # Generate descriptive statistics for numerical columns
   print(df.describe())

4. **`groupby()`:** This function is used to group data based on one or more columns and perform aggregate functions on them.

   # Group data by 'Category' column and calculate the mean of 'Value' column for each group
   grouped_data = df.groupby('Category')['Value'].mean()
   print(grouped_data)

5. **`to_csv()`:** This function is used to write the contents of a DataFrame to a CSV file.

   # Write the DataFrame to a CSV file
   df.to_csv('output.csv', index=False)

These are just a few examples of functions from the pandas library. Pandas provides a wide range of functions for data manipulation, analysis, and visualization.
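The examples above depend on a `data.csv` file that has 'Category' and 'Value' columns. The sketch below is self-contained, so it can actually be executed. It runs the same five functions on a small, made-up CSV held in memory through `io.StringIO`. The data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv'
csv_text = """Category,Value
a,10
b,20
a,30
b,40
"""

# 1. read_csv accepts any file-like object, so an in-memory buffer works
df = pd.read_csv(io.StringIO(csv_text))

# 2. head(n) returns the first n rows (default n=5)
print(df.head(2))

# 3. describe() summarizes numeric columns: count, mean, std, min, quartiles, max
print(df.describe())

# 4. Group by 'Category' and average 'Value' within each group
grouped = df.groupby('Category')['Value'].mean()
print(grouped)  # a -> 20.0, b -> 30.0

# 5. With no path argument, to_csv returns the CSV text as a string
csv_out = df.to_csv(index=False)
print(csv_out)
```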

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

You can achieve this by setting the index of the DataFrame to a new index that starts from 1 and increments by 2 for each row. Here's a Python function to do that:

import pandas as pd

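def reindex_dataframe(df):
    # Build the new labels 1, 3, 5, ... (exactly one label per row)
    new_index = pd.Index(range(1, len(df) * 2, 2))

    # Relabel the existing rows; set_axis returns a new DataFrame.
    # (df.reindex(new_index) would instead look these labels up in the OLD
    # index, so rows whose labels are missing there would be filled with NaN.)
    return df.set_axis(new_index, axis=0)

# Example usage with a small DataFrame that has columns 'A', 'B', and 'C'
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# Re-index the DataFrame
df = reindex_dataframe(df)

print(df)

Output:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90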

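This function uses `range()` to build a new index that starts at 1 and increases by 2 for each row. It attaches that index to the rows with `set_axis()` and returns the result. `reindex()` is not suitable here. It aligns the data to the new labels instead of renaming the rows, so any label that did not already exist (3, 5, ...) would produce an all-NaN row.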
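Relabeling and reindexing are easy to confuse. Here is a small comparison sketch on a hypothetical 3-row DataFrame that shows what each call actually does with the labels 1, 3, 5:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
new_labels = range(1, 2 * len(df), 2)  # 1, 3, 5

# reindex() looks the new labels up among the OLD labels (0, 1, 2):
# only label 1 exists, so labels 3 and 5 become all-NaN rows and rows 0 and 2 are dropped
aligned = df.reindex(new_labels)

# set_axis() keeps every row in order and just replaces the labels
relabeled = df.set_axis(new_labels, axis=0)

print(aligned)
print(relabeled)
```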

Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

You can achieve this by directly accessing the values in the 'Values' column of the DataFrame and summing the first three values. Here's a Python function to do that:

import pandas as pd

def calculate_sum_of_first_three(df):
    # Extract the 'Values' column from the DataFrame
    values_column = df['Values']

    # Take the first three values by position (.iloc ignores the index labels) and add them
    sum_of_first_three = values_column.iloc[:3].sum()

    # Print the sum to the console
    print("Sum of the first three values:", sum_of_first_three)

# Example usage with the values from the question
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Call the function; prints "Sum of the first three values: 60"
calculate_sum_of_first_three(df)

This function selects the 'Values' column with `df['Values']`. It takes the first three values by position with `.iloc[:3]`, adds them with the Series `.sum()` method, and prints the result. Using `.iloc` keeps the slice positional even if the DataFrame does not have the default 0, 1, 2, ... index.
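The question asks the function to *iterate* over the DataFrame. The following is a minimal sketch of an explicit-loop version. The function name `sum_first_three_by_iteration` is hypothetical. It walks the 'Values' column one value at a time and stops after three:

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    # Iterating over a Series yields its values in order
    total = 0
    for count, value in enumerate(df['Values']):
        if count == 3:  # stop once three values have been added
            break
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(df)  # prints: Sum of the first three values: 60
```

If the column has fewer than three values, the loop simply adds all of them.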

Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

You can achieve this by applying a function to each element in the 'Text' column that calculates the word count and then assigning the result to a new column 'Word_Count'. Here's a Python function to do that:

import pandas as pd

def add_word_count_column(df):
    # Define a function to calculate the word count of a text
    def calculate_word_count(text):
        # Split the text into words and count them
        return len(text.split())

    # Apply the function to each element in the 'Text' column
    df['Word_Count'] = df['Text'].apply(calculate_word_count)

    return df

# Example usage with a small DataFrame that has a 'Text' column
df = pd.DataFrame({'Text': ['Hello world', 'Pandas makes data analysis easy', 'Hi']})

# Call the function; 'Word_Count' becomes 2, 5, 1
df = add_word_count_column(df)
print(df)

This function defines an inner function `calculate_word_count(text)` that calculates the word count of a given text by splitting it into words using `.split()` and then counting the number of words using `len()`. Then, it applies this function to each element in the 'Text' column using `.apply()` and assigns the result to a new column 'Word_Count'. Finally, it returns the DataFrame with the new column added.
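The same count can be obtained without a Python-level helper by using pandas' vectorized string methods. Here is a sketch on hypothetical data. `.str.split()` with no argument splits on runs of whitespace, and `.str.len()` counts the resulting pieces. Unlike `text.split()` inside `apply()`, missing values produce NaN instead of raising an error:

```python
import pandas as pd

df = pd.DataFrame({'Text': ['hello world', 'one  two   three', '', None]})

# Split each string on whitespace, then count the words in each list
df['Word_Count'] = df['Text'].str.split().str.len()
print(df)
```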

Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` are both attributes (properties) of a pandas DataFrame, not methods. They are therefore accessed without parentheses (`df.size`, `df.shape`), and they serve different purposes:

1. **`DataFrame.size`:**
   - `DataFrame.size` returns the total number of elements (cells) in the DataFrame, calculated as `number of rows * number of columns`.
   - Missing values (NaN) are counted as well.
   - The result is a single integer.

   Example:
   import pandas as pd
   
   # Create a DataFrame
   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   
   # Get the total number of elements in the DataFrame
   total_elements = df.size
   print("Total elements in the DataFrame:", total_elements)

2. **`DataFrame.shape`:**
   - `DataFrame.shape` returns a tuple representing the dimensions of the DataFrame, where the first element is the number of rows and the second element is the number of columns.
   - It gives the structure of the DataFrame in terms of its rows and columns.
   - The result is a tuple `(number of rows, number of columns)`.

   Example:
   import pandas as pd
   
   # Create a DataFrame
   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   
   # Get the dimensions of the DataFrame
   dimensions = df.shape
   print("Dimensions of the DataFrame:", dimensions)

Q6. Which function of pandas do we use to read an excel file?

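In pandas, we use `pd.read_excel()` to read data from an Excel file into a DataFrame. It supports both .xlsx and legacy .xls files, but it relies on optional engine packages: `openpyxl` for .xlsx and `xlrd` for .xls. By default it reads only the first sheet; the `sheet_name` parameter selects a different sheet.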

Here's an example of how you can use `pd.read_excel()`:

import pandas as pd

# Read data from an Excel file into a DataFrame
df = pd.read_excel('example.xlsx')

# Display the DataFrame
print(df)
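A workbook often has several sheets. Passing `sheet_name=None` makes `pd.read_excel()` return a dictionary that maps each sheet name to its own DataFrame. The sketch below first writes a small, hypothetical two-sheet workbook to a temporary file and then reads it back. The helper name `read_all_sheets` is made up. The round trip runs only if the optional `openpyxl` package is installed:

```python
import importlib.util
import os
import tempfile

import pandas as pd

def read_all_sheets(path):
    # sheet_name=None -> dict of {sheet name: DataFrame}, in workbook order
    return pd.read_excel(path, sheet_name=None)

sheets = {}
if importlib.util.find_spec("openpyxl") is not None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.xlsx")
        # Write a hypothetical workbook with two sheets
        with pd.ExcelWriter(path) as writer:
            pd.DataFrame({"A": [1, 2]}).to_excel(writer, sheet_name="First", index=False)
            pd.DataFrame({"B": [3, 4]}).to_excel(writer, sheet_name="Second", index=False)
        sheets = read_all_sheets(path)
    print(list(sheets))
    print(sheets["Second"])
```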

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

You can achieve this by using the `str.split()` method to split each email address at the '@' symbol and then accessing the first part, which represents the username. Here's a Python function to do that:

import pandas as pd

def extract_username(df):
    # Extract the username from each email address
    df['Username'] = df['Email'].str.split('@').str[0]
    
    return df

# Example usage with a small DataFrame that has an 'Email' column
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane_smith@test.org']})

# Call the function; 'Username' becomes 'john.doe' and 'jane_smith'
df = extract_username(df)
print(df)

This function uses the `str.split('@')` method to split each email address at the '@' symbol, resulting in a list containing two parts: the username and the domain. Then, `str[0]` is used to access the first part (the username) of each split email address, and the result is stored in the new 'Username' column. Finally, the function returns the DataFrame with the new column added.
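With `str.split('@').str[0]`, a malformed value that has no '@' is copied through unchanged (for example, 'not-an-email' stays 'not-an-email'). The variant below uses a regular expression with `str.extract()` instead. It keeps the text before the first '@' and returns NaN when no '@' is present. This is a sketch on hypothetical data, not the only reasonable rule for validating emails:

```python
import pandas as pd

df = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email', None]})

# One capture group + expand=False returns a Series;
# rows that do not match the pattern (no '@') or are missing become NaN
df['Username'] = df['Email'].str.extract(r'^([^@]+)@', expand=False)
print(df)
```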

Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2
Your function should select the following rows:
   A  B  C
1  8  2  7
4  9  1  2
The function should return a new DataFrame that contains only the selected rows.


We can achieve this by using boolean indexing with conditions applied to the DataFrame columns 'A' and 'B'. Here's a Python function to select rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10:

import pandas as pd

def select_rows(df):
    # Apply boolean indexing to select rows based on conditions
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return selected_rows

# Example DataFrame
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the function
selected_df = select_rows(df)
print(selected_df)

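Output:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

This function applies boolean indexing to the DataFrame `df` with the condition `(df['A'] > 5) & (df['B'] < 10)`. That condition selects rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The resulting DataFrame `selected_df` contains only the rows that meet both conditions. Row 2 (A = 6, B = 9) also satisfies both conditions, so it appears in the result even though the expected output in the question lists only rows 1 and 4. Each condition must be wrapped in parentheses because `&` binds more tightly than `>` and `<` in Python.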
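The same filter can also be written as a string expression with `DataFrame.query()`, which some readers find easier to scan than chained boolean masks. Here is a sketch using the example data from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

# 'and' inside query() plays the role of & in a boolean mask
selected_df = df.query('A > 5 and B < 10')
print(selected_df)
```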

Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

You can achieve this with the `mean()`, `median()`, and `std()` methods of a pandas Series (here, the 'Values' column). Here's a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column:

import pandas as pd

def calculate_stats(df):
    # Calculate mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()
    
    return mean_value, median_value, std_value

# Example DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function
mean_value, median_value, std_value = calculate_stats(df)

print("Mean:", mean_value)
print("Median:", median_value)
print("Standard Deviation:", std_value)

Output:
Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896

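This function calculates the mean, median, and standard deviation of the 'Values' column with the Series methods `mean()`, `median()`, and `std()`, and returns them as a tuple. pandas' `std()` computes the sample standard deviation (`ddof=1`, dividing by n - 1). Here the squared deviations from the mean sum to 1000, so the result is sqrt(1000 / 4) ≈ 15.81. NumPy's `np.std()` defaults to the population version (`ddof=0`), which would give sqrt(1000 / 5) ≈ 14.14.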
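All three statistics can also be computed in one call with `Series.agg()`, and `ddof` can be passed explicitly when the population standard deviation is wanted. Here is a short sketch with the same example values:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50], name='Values')

# agg() with a list of names returns a Series indexed by those names
stats = values.agg(['mean', 'median', 'std'])
print(stats)

# Population standard deviation (divide by n instead of n - 1)
population_std = values.std(ddof=0)
print(population_std)
```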

Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

You can achieve this using the `rolling()` function in pandas to create a rolling window and then calculate the mean within that window. Here's a Python function to create a new column 'MovingAverage' containing the moving average of sales for the past 7 days for each row:

import pandas as pd

def calculate_moving_average(df):
    # Sort by 'Date'; sort_values returns a new DataFrame, so the caller's df is not modified
    df = df.sort_values(by='Date')

    # Rolling window of 7 rows (the current row plus the 6 before it);
    # min_periods=1 lets the first rows average over the rows available so far
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example DataFrame
data = {'Date': pd.date_range(start='2022-01-01', periods=10),
        'Sales': [100, 120, 130, 140, 110, 150, 160, 170, 180, 190]}
df = pd.DataFrame(data)

# Call the function
df = calculate_moving_average(df)

print(df)

Output:
        Date  Sales  MovingAverage
0 2022-01-01    100     100.000000
1 2022-01-02    120     110.000000
2 2022-01-03    130     116.666667
3 2022-01-04    140     122.500000
4 2022-01-05    110     120.000000
5 2022-01-06    150     125.000000
6 2022-01-07    160     130.000000
7 2022-01-08    170     140.000000
8 2022-01-09    180     148.571429
9 2022-01-10    190     157.142857

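This function first sorts the DataFrame by the 'Date' column so the window moves forward in time. It then takes the mean over a rolling window of 7 rows with `rolling(window=7, min_periods=1).mean()` and assigns the result to the new 'MovingAverage' column. Because of `min_periods=1`, the first six rows average over however many rows exist so far; for example, row 2 is (100 + 120 + 130) / 3 ≈ 116.67. From row 6 onward each value covers exactly seven rows; for example, row 7 is (120 + 130 + 140 + 110 + 150 + 160 + 170) / 7 = 140. A window of 7 rows equals 7 days only when there is exactly one row per calendar day with no gaps.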
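If some days are missing from the data, a row-based window reaches further back than 7 days. pandas also supports time-based windows given as an offset string such as `'7D'` on a DatetimeIndex. Such a window covers (date - 7 days, date], that is, the current day and the 6 days before it. Here is a sketch with a hypothetical gap in the dates; the helper name `add_time_based_moving_average` is made up:

```python
import pandas as pd

def add_time_based_moving_average(df):
    # Time-based windows need the dates in increasing order
    out = df.sort_values('Date').reset_index(drop=True)
    # '7D' window on a DatetimeIndex: current day plus the previous 6 days
    rolled = out.set_index('Date')['Sales'].rolling('7D').mean()
    # Rows are in the same order as `out`, so assign by position
    out['MovingAverage'] = rolled.to_numpy()
    return out

df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-10']),
    'Sales': [100, 200, 300],
})
result = add_time_based_moving_average(df)
print(result)
```

For 2022-01-10 only that day falls in the window, so the average is 300.0. A 7-row window would have averaged all three rows (200.0).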

Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
Your function should create the following DataFrame:
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
The function should return the modified DataFrame.

You can achieve this with the `dt.day_name()` method of the Series `.dt` accessor, which returns the weekday name for each date in the 'Date' column. (The older `dt.weekday_name` attribute was removed in pandas 1.0.) Here's a Python function to create a new column 'Weekday' containing the weekday names for each date:

import pandas as pd

def add_weekday_column(df):
    # Convert the 'Date' column to datetime if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Extract the weekday names from the 'Date' column
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

# Example DataFrame
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Call the function
df = add_weekday_column(df)

print(df)

Output:
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday

This function first converts the 'Date' column to datetime using `pd.to_datetime()` if it's not already in datetime format. Then, it extracts the weekday names from the 'Date' column using `dt.day_name()` and assigns the result to a new column 'Weekday' in the DataFrame. Finally, it returns the modified DataFrame.
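Sorting by the 'Weekday' names orders them alphabetically (Friday before Monday). The `.dt.dayofweek` attribute gives a numeric weekday (Monday = 0, ..., Sunday = 6) that sorts in calendar order. Here is a short sketch on the example dates; the 'WeekdayNumber' column name is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(
    ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])})

df['Weekday'] = df['Date'].dt.day_name()
# Monday=0 ... Sunday=6, useful for ordering rows by day of the week
df['WeekdayNumber'] = df['Date'].dt.dayofweek
print(df.sort_values('WeekdayNumber'))
```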

Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

You can achieve this by first converting the 'Date' column to datetime format and then using boolean indexing to select rows where the date falls within the specified range. Here's a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31':

import pandas as pd

def select_rows_between_dates(df):
    # Convert 'Date' column to datetime if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
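    # Range boundaries: use an exclusive upper bound of 2023-02-01 so that
    # timestamps later in the day on 2023-01-31 are still included
    start_date = pd.Timestamp('2023-01-01')
    end_date_exclusive = pd.Timestamp('2023-02-01')

    # Use boolean indexing to select rows in [start_date, end_date_exclusive)
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] < end_date_exclusive)]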
    
    return selected_rows

# Example DataFrame
data = {'Date': ['2023-01-01', '2023-01-15', '2023-01-30', '2023-02-05']}
df = pd.DataFrame(data)

# Call the function
selected_df = select_rows_between_dates(df)

print(selected_df)

Output:
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-30

This function first converts the 'Date' column to datetime format with `pd.to_datetime()`. It then uses boolean indexing to keep rows on or after 2023-01-01 and strictly before 2023-02-01, and returns the DataFrame containing only those rows. The exclusive upper bound matters because the column holds timestamps. A condition like `df['Date'] <= '2023-01-31'` treats the bound as midnight and would drop rows such as 2023-01-31 15:30.
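Another option is to move 'Date' into a sorted DatetimeIndex and use pandas' partial string slicing. There, a day-level end such as `'2023-01-31'` covers the whole day, including times after midnight. Here is a sketch with hypothetical timestamps and a made-up 'Sales' column:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01 08:00', '2023-01-31 15:30', '2023-02-01 00:00']),
    'Sales': [1, 2, 3],
})

# Slicing a sorted DatetimeIndex with date strings is inclusive of the entire end day
january = df.set_index('Date').sort_index().loc['2023-01-01':'2023-01-31']
print(january)
```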

Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

The library you need to import first is `pandas` itself, conventionally under the alias `pd`:

import pandas as pd

Here, `pd` is an alias commonly used for pandas to make it easier to reference pandas functions and objects in your code. With this import statement, you can access all the functionalities provided by the pandas library to work with data in Python.