# 📍 Problem Title: Group Sold Products by the Date

🔗 [LeetCode Problem 1484 – Group Sold Products by the Date](https://leetcode.com/problems/group-sold-products-by-the-date/description/?envType=study-plan-v2&envId=30-days-of-pandas)

---

## 📝 Problem Description
We are given a table `Activities` with the following schema:

| Column     | Type |
|------------|------|
| sell_date  | date |
| product    | str  |

- Each row represents a product sold on a particular date.
- We need to group products by `sell_date` and report two things:
  - **num_sold** → the number of distinct products sold on that date.
  - **products** → the list of distinct products sold on that date, ordered lexicographically and joined by commas.

Return the result table ordered by `sell_date`.

---

## 🧾 Example

**Input:**

| sell_date  | product   |
|------------|-----------|
| 2020-05-30 | Headphone |
| 2020-05-30 | Pencil    |
| 2020-06-01 | Pencil    |
| 2020-06-01 | Mask      |
| 2020-06-01 | Bible     |
| 2020-06-02 | Mask      |

**Output:**

| sell_date  | num_sold | products              |
|------------|----------|-----------------------|
| 2020-05-30 | 2        | Headphone,Pencil      |
| 2020-06-01 | 3        | Bible,Mask,Pencil     |
| 2020-06-02 | 1        | Mask                  |

---

## 🧠 Key Concepts
- **Grouping by date** → use `groupby("sell_date")` to collect products sold on the same day.
- **Distinct count** → use `.nunique()` to count how many unique products were sold.
- **Sorted product list** →
  - Convert the group to a set to remove duplicates.
  - Sort products alphabetically.
  - Join into a comma-separated string.
- **Ordering** → sort the final DataFrame by `sell_date`.
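
A minimal sketch of the distinct-count and sorted-list steps, applied to a single day's products. The values in `day_products` are made up for illustration and are not taken from the problem's data.

```python
import pandas as pd

# Hypothetical products sold on one day, with a duplicate entry.
day_products = pd.Series(["Pencil", "Mask", "Pencil", "Bible"])

# Distinct count: duplicates are counted once.
num_sold = day_products.nunique()

# Sorted product list: set() drops duplicates, sorted() orders the
# names lexicographically, and ",".join() builds the final string.
products = ",".join(sorted(set(day_products)))

print(num_sold)  # 3
print(products)  # Bible,Mask,Pencil
```

Inside a `groupby`, each group's `product` column arrives as a Series like `day_products`, so the same expressions can be used per date.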

---

## 🐼 Pandas Outline
1. Group the DataFrame by `sell_date`.
2. Aggregate with two operations:
   - `num_sold` = number of unique products (the built-in `nunique`).
   - `products` = sorted, comma-separated string of unique products (a custom lambda).
3. Reset the index to flatten the grouped result.
4. Sort by `sell_date`.

---

✅ This approach ensures we report **both the count of distinct products** and the **sorted product list** for each day.

In [31]:
import pandas as pd

In [32]:
data = [['2020-05-30', 'Headphone'], ['2020-06-01', 'Pencil'], ['2020-06-02', 'Mask'], ['2020-05-30', 'Basketball'],
        ['2020-06-01', 'Bible'], ['2020-06-02', 'Mask'], ['2020-05-30', 'T-Shirt']]
activities = pd.DataFrame(data, columns=['sell_date', 'product']).astype(
    {'sell_date': 'datetime64[ns]', 'product': 'object'})

In [33]:
activities

|   | sell_date  | product    |
|---|------------|------------|
| 0 | 2020-05-30 | Headphone  |
| 1 | 2020-06-01 | Pencil     |
| 2 | 2020-06-02 | Mask       |
| 3 | 2020-05-30 | Basketball |
| 4 | 2020-06-01 | Bible      |
| 5 | 2020-06-02 | Mask       |
| 6 | 2020-05-30 | T-Shirt    |


In [34]:
def categorize_product(activities: pd.DataFrame) -> pd.DataFrame:
    # Group the rows by sell_date so each group holds one day's products.
    groups = activities.groupby("sell_date")

    # Named aggregation: each keyword becomes an output column.
    stats = groups.agg(
        # Count the distinct products sold on each date.
        num_sold=("product", "nunique"),
        # set() removes duplicates, sorted() puts the names in ascending
        # lexicographic order, and ",".join() concatenates them into one string.
        products=("product", lambda x: ",".join(sorted(set(x)))),
    ).reset_index()

    # Order by date. groupby already sorts its keys by default, so this is
    # a safeguard rather than a required step.
    return stats.sort_values("sell_date")


In [35]:
categorize_product(activities)

|   | sell_date  | num_sold | products                     |
|---|------------|----------|------------------------------|
| 0 | 2020-05-30 | 3        | Basketball,Headphone,T-Shirt |
| 1 | 2020-06-01 | 2        | Bible,Pencil                 |
| 2 | 2020-06-02 | 1        | Mask                         |
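
As a sanity check, the same aggregation can be run on the example input from the problem statement. This is only a sketch: it restates the function compactly so the block runs on its own, and the names `example` and `result` are introduced here for illustration.

```python
import pandas as pd


def categorize_product(activities: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the function above, restated so this block is self-contained.
    return (
        activities.groupby("sell_date")
        .agg(
            num_sold=("product", "nunique"),
            products=("product", lambda x: ",".join(sorted(set(x)))),
        )
        .reset_index()
        .sort_values("sell_date")
    )


# Example input from the problem statement.
example = pd.DataFrame(
    {
        "sell_date": pd.to_datetime(
            ["2020-05-30", "2020-05-30", "2020-06-01",
             "2020-06-01", "2020-06-01", "2020-06-02"]
        ),
        "product": ["Headphone", "Pencil", "Pencil", "Mask", "Bible", "Mask"],
    }
)

result = categorize_product(example)
print(result)
```

The `num_sold` and `products` columns of `result` should match the expected output table in the Example section: 2, 3, and 1 products for the three dates, with each product list in alphabetical order.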
