Q1. List any five functions of the pandas library with execution.


Here are five commonly used pandas functions, each with an example of its execution:

1. **`read_csv()`**: This function reads a comma-separated values (CSV) file into a DataFrame.

```python
import pandas as pd

# Example: Reading a CSV file
df = pd.read_csv('sample_data.csv')
print(df.head())
```

2. **`head()`**: This function returns the first `n` rows of a DataFrame (5 by default).

```python
# Example: Displaying the first 5 rows of the DataFrame
print(df.head())
```

3. **`describe()`**: This function generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.

```python
# Example: Getting summary statistics of the DataFrame
print(df.describe())
```

4. **`groupby()`**: This function is used for splitting the data into groups based on some criteria.

```python
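# Example: Grouping by a column and calculating the mean of each numeric column
# (assumes df has a 'Category' column; numeric_only=True skips text columns)
grouped = df.groupby('Category').mean(numeric_only=True)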
print(grouped)
```

5. **`plot()`**: This function is used for plotting data from a DataFrame using matplotlib.

```python
import matplotlib.pyplot as plt

# Example: Plotting a line chart
df['Column1'].plot()
plt.show()
```
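
The snippets above depend on a `sample_data.csv` file and on `Category` and `Column1` columns that are not shown. As a minimal self-contained sketch, the same calls can be tried on an in-memory CSV. The column names and values below are hypothetical, and `plot()` is left out so that matplotlib is not required.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory, so no file on disk is needed
csv_text = """Category,Column1,Column2
A,10,1.5
B,20,2.5
A,30,3.5
B,40,4.5
"""

df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(2)      # first 2 rows
summary = df.describe()      # statistics for the numeric columns
group_means = df.groupby('Category').mean(numeric_only=True)

print(first_rows)
print(summary)
print(group_means)
```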

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [1]:
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = pd.RangeIndex(start=1, stop=2*len(df)+1, step=2)
    
    # Set the new index to the DataFrame
    df_reindexed = df.set_index(new_index)
    
    return df_reindexed

# Example DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)

# Reindex the DataFrame
df_reindexed = reindex_dataframe(df)
print(df_reindexed)


    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

In [2]:
import pandas as pd

def sum_first_three_values(df):
    # Ensure the DataFrame has at least three rows
    if len(df) < 3:
        print("The DataFrame does not have enough rows.")
        return
    
    # Initialize the sum
    sum_values = 0
    
    # Iterate over the first three values in the 'Values' column
    for i in range(3):
        sum_values += df['Values'].iloc[i]
    
    # Print the sum
    print("The sum of the first three values is:", sum_values)

# Example DataFrame
data = {
    'Values': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Call the function with the example DataFrame
sum_first_three_values(df)


The sum of the first three values is: 60
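
The question asks for explicit iteration, which the loop above provides. For comparison, here is a sketch of a loop-free version using `head(3)` and `sum()`. It also returns a partial sum when the column has fewer than three rows, which may or may not be the behaviour you want.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# head(3) selects up to the first three rows; sum() adds them without a loop
total = df['Values'].head(3).sum()
print("The sum of the first three values is:", total)
```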


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.


In [3]:
import pandas as pd

def add_word_count_column(df):
    # Define a function to count words in a text
    def count_words(text):
        # Split the text by whitespace and count the number of resulting parts
        return len(text.split())

    # Apply the count_words function to each row in the 'Text' column and create a new 'Word_Count' column
    df['Word_Count'] = df['Text'].apply(count_words)
    
    return df

# Example DataFrame
data = {
    'Text': ["Hello world", "This is a test", "Pandas is great", "Word count example"]
}
df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
df = add_word_count_column(df)
print(df)


                 Text  Word_Count
0         Hello world           2
1      This is a test           4
2     Pandas is great           3
3  Word count example           3
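
The `apply`-based version above raises an `AttributeError` if the 'Text' column contains missing values, because `None` and `NaN` have no `split` method. A possible alternative, sketched here with a hypothetical DataFrame that includes one missing entry, uses the vectorized `.str` accessor, which passes missing values through.

```python
import pandas as pd

df = pd.DataFrame({'Text': ["Hello world", "This is a test", None]})

# .str.split() splits on whitespace; .str.len() counts the resulting parts.
# Missing values stay missing instead of raising an error.
df['Word_Count'] = df['Text'].str.split().str.len()
print(df)
```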


Q5. How are DataFrame.size() and DataFrame.shape() different?


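`DataFrame.size` and `DataFrame.shape` are attributes of a Pandas DataFrame, not methods, so they are accessed without parentheses (calling `df.size()` or `df.shape()` raises a `TypeError`). They provide different types of information about the DataFrame. Here’s how they differ: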

### `DataFrame.shape`

- **Definition**: The `shape` attribute returns a tuple representing the dimensionality of the DataFrame.
- **Output**: It gives the number of rows and columns in the DataFrame.
- **Usage**: It is useful when you want to know the structure of the DataFrame in terms of its dimensions.

### Example
```python
import pandas as pd

# Example DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)

# Get the shape of the DataFrame
print(df.shape)
```

**Output**:
```
(3, 3)
```
This output indicates that the DataFrame has 3 rows and 3 columns.

### `DataFrame.size`

- **Definition**: The `size` attribute returns an integer representing the number of elements in the DataFrame.
- **Output**: It gives the total number of cells (rows × columns) in the DataFrame.
- **Usage**: It is useful when you want to know the total number of elements in the DataFrame.

### Example
```python
# Get the size of the DataFrame
print(df.size)
```

**Output**:
```
9
```
This output indicates that the DataFrame has a total of 9 elements (3 rows * 3 columns).

### Key Differences

- **DataFrame.shape**:
  - Returns a tuple `(number_of_rows, number_of_columns)`.
  - Example output for a DataFrame with 3 rows and 3 columns: `(3, 3)`.

- **DataFrame.size**:
  - Returns a single integer `number_of_rows * number_of_columns`.
  - Example output for a DataFrame with 3 rows and 3 columns: `9`.

### Summary

- Use `DataFrame.shape` to get the dimensions of the DataFrame.
- Use `DataFrame.size` to get the total number of elements in the DataFrame.

Q6. Which function of pandas do we use to read an excel file?


In pandas, the function used to read an Excel file is `pd.read_excel()`. It reads data from an Excel sheet (the first sheet by default) into a pandas DataFrame. Reading `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

### Basic Usage

Here’s an example of how to use `pd.read_excel()`:

In [None]:
import pandas as pd

# Read the Excel file into a DataFrame
df = pd.read_excel('example.xlsx')

# Display the first few rows of the DataFrame
print(df.head())

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.



In [5]:
import pandas as pd

def extract_username(df):
    # Define a function to extract the username from an email address
    def get_username(email):
        # Split the email by '@' and take the first part
        return email.split('@')[0]
    
    # Apply the get_username function to each row in the 'Email' column and create a new 'Username' column
    df['Username'] = df['Email'].apply(get_username)
    
    return df

# Example DataFrame
data = {
    'Email': ['john.doe@example.com', 'jane.smith@domain.com', 'foo.bar@test.com']
}
df = pd.DataFrame(data)

# Call the function to add the 'Username' column
df = extract_username(df)
print(df)


                   Email    Username
0   john.doe@example.com    john.doe
1  jane.smith@domain.com  jane.smith
2       foo.bar@test.com     foo.bar
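
As an alternative to `apply` with a helper function, the same extraction can be sketched with the vectorized `.str` accessor. The two-row DataFrame below is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@domain.com']})

# Split each address once at '@' and keep the first piece
df['Username'] = df['Email'].str.split('@', n=1).str[0]
print(df)
```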


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.

For example, if df contains the following values:

       A  B  C
    0  3  5  1
    1  8  2  7
    2  6  9  4
    3  2  3  5
    4  9  1  2

Your function should select the following rows (rows 1, 2, and 4 all satisfy both conditions):

       A  B  C
    1  8  2  7
    2  6  9  4
    4  9  1  2

The function should return a new DataFrame that contains only the selected rows.

In [8]:
import pandas as pd

def filter_dataframe(df):
    # Apply the conditions to filter rows
    filtered_df = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return filtered_df

# Example DataFrame
data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}
df = pd.DataFrame(data)

# Call the function to filter the DataFrame
filtered_df = filter_dataframe(df)
print(filtered_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
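
Row 2 is part of the result because 6 > 5 and 9 < 10. The same filter can also be written with `DataFrame.query`, shown here as a sketch on the same example data. Some readers find the string expression easier to scan than chained boolean masks.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2],
})

# query() evaluates the expression against the column names
filtered_df = df.query('A > 5 and B < 10')
print(filtered_df)
```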


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.


In [9]:
import pandas as pd

def calculate_statistics(df):
    # Calculate the mean of the 'Values' column
    mean_value = df['Values'].mean()
    
    # Calculate the median of the 'Values' column
    median_value = df['Values'].median()
    
    # Calculate the standard deviation of the 'Values' column
    std_dev_value = df['Values'].std()
    
    # Print the calculated statistics
    print(f"Mean: {mean_value}")
    print(f"Median: {median_value}")
    print(f"Standard Deviation: {std_dev_value}")

# Example DataFrame
data = {
    'Values': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Call the function to calculate and print the statistics
calculate_statistics(df)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
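
The value 15.81 is the sample standard deviation: `Series.std()` uses `ddof=1` by default and divides by n - 1. If the population standard deviation (dividing by n) is what you need, `ddof=0` can be passed, as in this short sketch.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # ddof=1 by default (divides by n - 1)
population_std = values.std(ddof=0)  # divides by n

print(sample_std)
print(population_std)
```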


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.


In [10]:
import pandas as pd

def add_moving_average(df):
    # Ensure the 'Date' column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Sort the DataFrame by date to ensure the moving average is calculated correctly
    df = df.sort_values(by='Date')
    
    # Average the current row and the 6 preceding rows (this equals 7 days only if
    # there is exactly one row per day); min_periods=1 lets early rows use fewer values
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df

# Example DataFrame
data = {
    'Date': ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04', '2023-07-05', '2023-07-06', '2023-07-07', '2023-07-08', '2023-07-09', '2023-07-10'],
    'Sales': [100, 150, 200, 250, 300, 350, 400, 450, 500, 550]
}
df = pd.DataFrame(data)

# Call the function to add the 'MovingAverage' column
df = add_moving_average(df)
print(df)


        Date  Sales  MovingAverage
0 2023-07-01    100          100.0
1 2023-07-02    150          125.0
2 2023-07-03    200          150.0
3 2023-07-04    250          175.0
4 2023-07-05    300          200.0
5 2023-07-06    350          225.0
6 2023-07-07    400          250.0
7 2023-07-08    450          300.0
8 2023-07-09    500          350.0
9 2023-07-10    550          400.0
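
The function above counts rows, so it matches "the past 7 days" only when there is one row per calendar day. If dates can be missing, a time-based window is one option. The sketch below uses a hypothetical three-row DataFrame with a gap and a `'7D'` offset on a datetime index, so each average covers the 7 calendar days ending on the current date.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-07-01', '2023-07-02', '2023-07-10'],
    'Sales': [100, 200, 300],
})
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')

# Rolling over a DatetimeIndex with a '7D' offset averages only the rows whose
# dates fall within the 7-day span ending at (and including) each row's date
rolled = df.set_index('Date')['Sales'].rolling('7D').mean()

# df is already sorted by Date, so the values line up row by row
df['MovingAverage'] = rolled.to_numpy()
print(df)
```

Here the last row averages only itself, because no other sale falls within the 7 days up to 2023-07-10. A row-based window of 7 would have averaged all three rows.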


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.

For example, if df contains the following values:

             Date
    0  2023-01-01
    1  2023-01-02
    2  2023-01-03
    3  2023-01-04
    4  2023-01-05

Your function should create the following DataFrame:

             Date    Weekday
    0  2023-01-01     Sunday
    1  2023-01-02     Monday
    2  2023-01-03    Tuesday
    3  2023-01-04  Wednesday
    4  2023-01-05   Thursday

The function should return the modified DataFrame.

In [11]:
import pandas as pd

def add_weekday_column(df):
    # Ensure the 'Date' column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Create the 'Weekday' column by extracting the weekday name from the 'Date' column
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

# Example DataFrame
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
}
df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df = add_weekday_column(df)
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.


In [12]:
import pandas as pd

def select_rows_between_dates(df):
    # Ensure the 'Date' column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Define the start and end dates
    start_date = '2023-01-01'
    end_date = '2023-01-31'
    
    # Select rows where the 'Date' is between the start and end dates
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    filtered_df = df[mask]
    
    return filtered_df

# Example DataFrame
data = {
    'Date': ['2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01'],
    'Value': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Call the function to filter the DataFrame
filtered_df = select_rows_between_dates(df)
print(filtered_df)


        Date  Value
1 2023-01-01     20
2 2023-01-15     30
3 2023-01-31     40
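
One caveat when the column holds full timestamps rather than bare dates: `'2023-01-31'` is interpreted as midnight at the start of that day, so a `<=` comparison drops rows later on 31 January. A possible fix, sketched here with hypothetical timestamps, is a half-open interval that ends just before the following day.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01 00:00', '2023-01-31 18:30', '2023-02-01 00:00']),
    'Value': [1, 2, 3],
})

# Half-open interval [start, end_exclusive) keeps the whole of 31 January
start = pd.Timestamp('2023-01-01')
end_exclusive = pd.Timestamp('2023-01-31') + pd.Timedelta(days=1)

january = df[(df['Date'] >= start) & (df['Date'] < end_exclusive)]
print(january)
```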


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

To use the basic functions of pandas, the first library you need to import is `pandas` itself. This is done using the following import statement:

```python
import pandas as pd
```

### Explanation:

- **`import pandas as pd`**:
  - This statement imports the pandas library and gives it the alias `pd`.
  - Using `pd` as an alias is a common convention and makes it easier to reference pandas functions and classes in your code.

### Example:

Here’s a simple example that demonstrates how to import pandas and use some basic functions:

```python
import pandas as pd

# Create a simple DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Calculate the mean of column 'A'
mean_a = df['A'].mean()
print(f"The mean of column A is: {mean_a}")
```

In this example:
- We import pandas as `pd`.
- We create a DataFrame `df`.
- We print the DataFrame.
- We calculate and print the mean of column 'A'.