# Data Manipulation (Pandas) - Problem Set

## 1. **DataFrame Creation and Manipulation**  
   Create a `DataFrame` representing a portfolio of stocks with the following data:
   - `Stock`: ['AAPL', 'GOOGL', 'MSFT', 'AMZN']
   - `Shares`: [50, 30, 100, 10]
   - `Price`: [150.0, 2800.5, 299.0, 3500.75]
   - `Sector`: ['Tech', 'Tech', 'Tech', 'Retail']
   
   Add a new column `Total_Value` representing the total value of each stock holding (`Shares * Price`).
   

In [None]:
# Your code
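
One possible approach is sketched below, using the data from the exercise. The variable name `portfolio` is an illustrative choice, not part of the problem statement.

```python
import pandas as pd

# Portfolio data taken from the exercise statement
portfolio = pd.DataFrame({
    "Stock": ["AAPL", "GOOGL", "MSFT", "AMZN"],
    "Shares": [50, 30, 100, 10],
    "Price": [150.0, 2800.5, 299.0, 3500.75],
    "Sector": ["Tech", "Tech", "Tech", "Retail"],
})

# Multiplying two columns works element-wise, so no explicit loop is needed
portfolio["Total_Value"] = portfolio["Shares"] * portfolio["Price"]
print(portfolio)
```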

## 2. **Advanced Selection**  
   Given a `DataFrame` of loan transactions with the following data:
   - `Loan_Amount`: [8000, 15000, 20000, 5000]
   - `Interest_Rate`: [0.06, 0.045, 0.03, 0.07]

   Select all transactions where the loan amount is greater than $10,000 and the interest rate is below 5%.

In [None]:
# Your code
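
A minimal sketch of boolean-mask selection for this exercise follows. The names `loans`, `mask`, and `selected` are assumptions made for illustration.

```python
import pandas as pd

# Loan data taken from the exercise statement
loans = pd.DataFrame({
    "Loan_Amount": [8000, 15000, 20000, 5000],
    "Interest_Rate": [0.06, 0.045, 0.03, 0.07],
})

# Each condition needs its own parentheses; `&` is the element-wise "and"
mask = (loans["Loan_Amount"] > 10_000) & (loans["Interest_Rate"] < 0.05)
selected = loans[mask]
print(selected)
```

With this data, the rows at index 1 and 2 satisfy both conditions.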

## 3. **Missing Data Handling**  
   Create a `DataFrame` with the following data:
   - `Price`: [100, None, 250, None]
   - `Stock`: [10, 20, None, 5]

   Replace all missing values in `Price` with the average price, and drop rows where `Stock` is missing.

In [None]:
# Your code
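
The two steps can be sketched as follows. The names `inventory`, `mean_price`, and `cleaned` are illustrative, and the sketch assumes the average should be computed from the non-missing prices only.

```python
import pandas as pd

# Data taken from the exercise statement; None becomes NaN in numeric columns
inventory = pd.DataFrame({
    "Price": [100, None, 250, None],
    "Stock": [10, 20, None, 5],
})

# mean() skips NaN by default, so the average uses only 100 and 250
mean_price = inventory["Price"].mean()
inventory["Price"] = inventory["Price"].fillna(mean_price)

# Drop only the rows whose Stock value is missing
cleaned = inventory.dropna(subset=["Stock"])
print(cleaned)
```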

## 4. **GroupBy and Aggregation**  
   You are given sales data for a company with the following data:
   - `Region`: ['North', 'South', 'North', 'South']
   - `Salesperson`: ['John', 'Anna', 'John', 'Anna']
   - `Product`: ['A', 'B', 'C', 'A']
   - `Revenue`: [500, 700, 800, 600]

   Use the `groupby()` method to calculate the total revenue per region and per salesperson.

In [None]:
# Your code
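
The exercise can be read as two separate groupings or as one combined grouping, so this sketch shows both. The variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Sales data taken from the exercise statement
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Salesperson": ["John", "Anna", "John", "Anna"],
    "Product": ["A", "B", "C", "A"],
    "Revenue": [500, 700, 800, 600],
})

# Reading 1: one total per region, and separately one total per salesperson
revenue_by_region = sales.groupby("Region")["Revenue"].sum()
revenue_by_salesperson = sales.groupby("Salesperson")["Revenue"].sum()

# Reading 2: one total per (region, salesperson) pair
revenue_by_both = sales.groupby(["Region", "Salesperson"])["Revenue"].sum()

print(revenue_by_region)
print(revenue_by_salesperson)
print(revenue_by_both)
```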

## 5. **Custom Functions with Apply**  
   Create a `DataFrame` of customers with the following data:
   - `Customer_ID`: ['C001', 'C002', 'C003']
   - `Transaction_Amount`: [500, 2000, 1500]
   - `Age`: [22, 30, 45]

   Use the `.apply()` method to create a new column `Eligibility` that assigns 'Eligible' if the `Transaction_Amount` is greater than 1000 and the `Age` is above 25, and a label of your choice (for example, 'Not Eligible') otherwise.

In [None]:
# Your code
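
A row-wise `.apply()` could look like the sketch below. The helper `label_eligibility` and the fallback label 'Not Eligible' are assumptions; the exercise only names the 'Eligible' case.

```python
import pandas as pd

# Customer data taken from the exercise statement
customers = pd.DataFrame({
    "Customer_ID": ["C001", "C002", "C003"],
    "Transaction_Amount": [500, 2000, 1500],
    "Age": [22, 30, 45],
})


def label_eligibility(row):
    """Return 'Eligible' when both thresholds from the exercise are met."""
    if row["Transaction_Amount"] > 1000 and row["Age"] > 25:
        return "Eligible"
    return "Not Eligible"  # assumed fallback label, not given in the exercise


# axis=1 passes each row (as a Series) to the function
customers["Eligibility"] = customers.apply(label_eligibility, axis=1)
print(customers)
```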

## 6. **Concatenation**  
   You have two `DataFrames` representing different departments’ sales figures:
   - `df1`: {'Department': ['Sales', 'HR'], 'Revenue': [1000, 500]}
   - `df2`: {'Department': ['IT', 'Finance'], 'Revenue': [1200, 800]}

   Concatenate the two `DataFrames` to create a combined dataset, ensuring no index duplication.

In [None]:
# Your code
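
One way to avoid duplicated index labels is `ignore_index=True`, as in this sketch. The name `combined` is an illustrative choice.

```python
import pandas as pd

# Department data taken from the exercise statement
df1 = pd.DataFrame({"Department": ["Sales", "HR"], "Revenue": [1000, 500]})
df2 = pd.DataFrame({"Department": ["IT", "Finance"], "Revenue": [1200, 800]})

# Without ignore_index=True the result would keep the labels 0, 1, 0, 1
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```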

## 7. **Merging DataFrames**  
   You are provided two `DataFrames`, one with customer details:
   - `Customer_ID`: ['C001', 'C002', 'C003']
   - `Name`: ['John', 'Anna', 'Paul']
   - `Location`: ['NY', 'LA', 'SF']

   And another with purchase details:
   - `Customer_ID`: ['C001', 'C003']
   - `Purchase_Amount`: [500, 700]

   Merge the two `DataFrames` on `Customer_ID` to analyze customer purchases.

In [2]:
# Your code
import pandas as pd

customer_data = {
    'Customer_ID': ['C001', 'C002', 'C003'],
    'Name': ['John', 'Anna', 'Paul'],
    'Location': ['NY', 'LA', 'SF']
}

purchase_data = {
    'Customer_ID': ['C001', 'C003'],
    'Purchase_Amount': [500, 700]
}

In [6]:
customer_df = pd.DataFrame(customer_data)
purchase_df = pd.DataFrame(purchase_data)

In [8]:
pd.merge(customer_df, purchase_df, on='Customer_ID', how='inner')

|   | Customer_ID | Name | Location | Purchase_Amount |
|---|-------------|------|----------|-----------------|
| 0 | C001        | John | NY       | 500             |
| 1 | C003        | Paul | SF       | 700             |


In [7]:
pd.merge(customer_df, purchase_df, on='Customer_ID', how='outer')

|   | Customer_ID | Name | Location | Purchase_Amount |
|---|-------------|------|----------|-----------------|
| 0 | C001        | John | NY       | 500.0           |
| 1 | C002        | Anna | LA       | NaN             |
| 2 | C003        | Paul | SF       | 700.0           |


## 8. **DateTime Manipulation**  
   Create a `DataFrame` with the following sales transaction data:
   - `Date`: ['2023-01-01', '2023-01-15', '2023-02-01']
   - `Sales`: [500, 700, 900]

   Extract the month and year from the `Date` column, and then group the sales data by month to calculate total monthly sales.

In [None]:
# Your code
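
A possible sketch follows. It assumes that grouping on both year and month is acceptable, which keeps the same month in different years from being merged; the column names `Year` and `Month` are illustrative.

```python
import pandas as pd

# Sales data taken from the exercise statement
sales = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-15", "2023-02-01"],
    "Sales": [500, 700, 900],
})

# Convert the strings to datetime64 so the .dt accessor becomes available
sales["Date"] = pd.to_datetime(sales["Date"])
sales["Year"] = sales["Date"].dt.year
sales["Month"] = sales["Date"].dt.month

# Total sales per (year, month) pair
monthly_sales = sales.groupby(["Year", "Month"])["Sales"].sum()
print(monthly_sales)
```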

## 9. **String Operations with Series**  
   Given a `Series` of customer feedback comments:
   - ['The service was excellent!', 'Poor response', 'Excellent work']

   Identify how many comments contain the word "excellent" (case-insensitive), and replace the word "poor" with "unsatisfactory".

In [None]:
# Your code
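
The vectorized `.str` methods cover both parts, as sketched below. The sketch assumes the replacement should also ignore case, because the only occurrence in the data is the capitalized "Poor"; the variable names are illustrative.

```python
import pandas as pd

# Feedback comments taken from the exercise statement
comments = pd.Series([
    "The service was excellent!",
    "Poor response",
    "Excellent work",
])

# contains() returns a boolean Series; summing it counts the True values
excellent_count = comments.str.contains("excellent", case=False).sum()

# case=False ignores capitalization, so "Poor" is matched as well
updated = comments.str.replace("poor", "unsatisfactory", case=False, regex=True)

print(excellent_count)
print(updated)
```

Note that the replacement text is inserted as written, so the second comment becomes "unsatisfactory response" in lowercase.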

## 10. **Advanced Sorting**  
   Create a `DataFrame` with the following product data:
   - `Product`: ['A', 'B', 'C']
   - `Price`: [10, 50, 30]
   - `Stock`: [100, 50, 200]

   Sort the `DataFrame` first by `Stock` in descending order and then by `Price` in ascending order. Among the products with the highest stock, which is the most expensive?

In [None]:
# Your code
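
A multi-key sort can be sketched with `sort_values`, passing one direction per key. The names `products`, `sorted_products`, and `top` are assumptions for illustration.

```python
import pandas as pd

# Product data taken from the exercise statement
products = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 50, 30],
    "Stock": [100, 50, 200],
})

# Stock descending first; Price ascending only breaks ties in Stock
sorted_products = products.sort_values(
    by=["Stock", "Price"], ascending=[False, True]
)

# The first row after sorting has the highest stock
top = sorted_products.iloc[0]
print(sorted_products)
print(top["Product"], top["Price"])
```

Because every `Stock` value in this data is unique, the `Price` key never comes into play here, and product C (stock 200, price 30) ends up first.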