# Week 4 - Data Wrangling and Group-Based Aggregations

In this notebook we will practice data cleaning and group-based aggregations using a *messy* version of the German Credit Risk dataset.

Dataset reference: üîó https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data

Topics covered:
- Concatenating DataFrames
- Preprocessing
    - Categorical x Numerical Data
    - Fixing column types
    - Standardizing Categorical Values
    - Missing Values (Identification and Imputation)
- Group-Based Aggregations

## SETUP

In [None]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)

## 1. Load the Dataset

> When loading the German Credit dataset from the UCI repository, you will notice
> that the data is split into **two separate DataFrames**:
>
> - `X` contains all feature columns (`Attribute1` … `Attribute20`)
> - `y` contains the target variable (`class`)
>
> This separation is common in machine learning libraries because it clearly
> distinguishes:
>
> - **independent variables** → used to make predictions  
> - **dependent variable** → the value we want to predict
>
> However, for **Exploratory Data Analysis (EDA)**, it is usually more convenient
> to work with a **single unified table**.
>
> Having both features and the target in the same DataFrame simplifies:
>
> - inspecting the overall structure  
> - checking distributions  
> - computing correlations  
> - detecting missing values  
> - visualizing relationships between variables
>
> To prepare for EDA, we will **concatenate** the two parts into one unified table.
>
> ### Concatenating DataFrames
>
> The simplest way to combine `X` and `y` is with `pd.concat`, which allows us to
> join DataFrames **side-by-side** using `axis=1`:
>
> - `pd.concat([...])` → specifies the DataFrames to combine  
> - `axis=1` or `axis='columns'` → concatenate **column-wise**, placing the
>   target column next to the features  
>
> **Example:**
>
> ```python
> df = pd.concat([df_1, df_2], axis=1)
>
> # or equivalently
>
> df = pd.concat([df_1, df_2], axis="columns")
> ```
>
> ### What about `axis=0` or `axis='rows'`?
>
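> - This stacks DataFrames **row-wise**, one on top of the other.  
> - Columns are aligned **by name**: any column missing from one DataFrame is filled with `NaN` in that DataFrame's rows.  
> - Stacking `X` (20 feature columns) and `y` (a single `class` column) this way would produce a table that is mostly `NaN`, so it is *not* appropriate for joining `X` and `y`.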
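A minimal sketch that contrasts the two directions, using a made-up three-row stand-in for `X` and `y` (not the real dataset). It relies on `pd.concat` aligning rows on the index for `axis=1` and columns on their names for `axis=0`:

```python
import pandas as pd

# Toy stand-ins for the features (X) and the target (y); values are made up
X = pd.DataFrame({"Attribute1": ["A11", "A12", "A14"], "Attribute2": [6, 48, 12]})
y = pd.DataFrame({"class": [1, 2, 1]})

# axis=1: place y to the right of X, matching rows by their shared index
side_by_side = pd.concat([X, y], axis=1)
print(side_by_side.shape)  # (3, 3)

# axis=0: stack y below X; columns are matched by name and gaps become NaN
stacked = pd.concat([X, y], axis=0)
print(stacked.shape)                 # (6, 3)
print(stacked.isna().sum().sum())    # 9 missing cells
```

The row-wise result has twice as many rows and nine empty cells, which is why `axis=1` is the right choice for joining features and target.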

---

### Q1.1 Load the features and the target into separate DataFrames `X` and `y`, then concatenate them into one DataFrame using `pd.concat`.


In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here



## Data Dictionary
>
>Below is the official data dictionary for the German Credit dataset.  
>
>Notice how the variables are originally labeled as `Attribute1`, `Attribute2`, … `Attribute20`.  
>
>Although this scheme preserves the order of the variables, it is **not descriptive**, which makes the dataset hard to read during analysis.
>
>| Variable Name | Role    | Type         | Demographic     | Description                                             | Units |
>|---------------|---------|--------------|-----------------|---------------------------------------------------------|-------|
>| Attribute1    | Feature | Categorical  |                 | Status of existing checking account                     |       |
>| Attribute2    | Feature | Integer      |                 | Duration                                                | months|
>| Attribute3    | Feature | Categorical  |                 | Credit history                                          |       |
>| Attribute4    | Feature | Categorical  |                 | Purpose                                                 |       |
>| Attribute5    | Feature | Integer      |                 | Credit amount                                           |       |
>| Attribute6    | Feature | Categorical  |                 | Savings account/bonds                                   |       |
>| Attribute7    | Feature | Categorical  | Other           | Present employment since                                |       |
>| Attribute8    | Feature | Integer      |                 | Installment rate as % of disposable income              |       |
>| Attribute9    | Feature | Categorical  | Marital Status  | Personal status and sex                                 |       |
>| Attribute10   | Feature | Categorical  |                 | Other debtors / guarantors                              |       |
>| Attribute11   | Feature | Integer      |                 | Present residence since                                 |       |
>| Attribute12   | Feature | Categorical  |                 | Property owned                                          |       |
>| Attribute13   | Feature | Integer      | Age             | Age                                                     | years |
>| Attribute14   | Feature | Categorical  |                 | Other installment plans                                 |       |
>| Attribute15   | Feature | Categorical  | Other           | Housing                                                 |       |
>| Attribute16   | Feature | Integer      |                 | Number of existing credits at this bank                 |       |
>| Attribute17   | Feature | Categorical  | Occupation      | Job                                                     |       |
>| Attribute18   | Feature | Integer      |                 | Number of dependents                                    |       |
>| Attribute19   | Feature | Binary       |                 | Telephone                                               |       |
>| Attribute20   | Feature | Binary       | Other           | Foreign worker                                          |       |
>| class         | Target  | Binary       |                 | 1 = Good, 2 = Bad                                       |       |
---
## 2. Clean the Data

> You may have noticed that the column names are mostly **impractical for quick or direct analysis**.
>
> Labels like `Attribute3` or `Attribute14` do not convey meaning and force us to constantly consult the data dictionary.
>
> Before doing any EDA, it is important to assign **clear, consistent, and descriptive** column names.
> This improves:
>
> - readability  
> - visualization and plotting  
> - correlation analysis  
> - interpretability of models later on

---

### Q2 Use the dictionary below to rename all columns to meaningful, standardized names.
 
- Apply it using `.rename(columns=...)` right after concatenating `X` and `y`.

```python
rename_dict = {
    "Attribute1":  "checking_status",
    "Attribute2":  "duration_months",
    "Attribute3":  "credit_history",
    "Attribute4":  "purpose",
    "Attribute5":  "credit_amount",
    "Attribute6":  "savings_account",
    "Attribute7":  "employment_since",
    "Attribute8":  "installment_rate",
    "Attribute9":  "personal_status_sex",
    "Attribute10": "other_debtors",
    "Attribute11": "residence_since",
    "Attribute12": "property",
    "Attribute13": "age",
    "Attribute14": "other_installment_plans",
    "Attribute15": "housing",
    "Attribute16": "existing_credits",
    "Attribute17": "job",
    "Attribute18": "dependents",
    "Attribute19": "telephone",
    "Attribute20": "foreign_worker",
    "class":       "credit_risk"
}
```


In [None]:
rename_dict = {
    "Attribute1":  "checking_status",
    "Attribute2":  "duration_months",
    "Attribute3":  "credit_history",
    "Attribute4":  "purpose",
    "Attribute5":  "credit_amount",
    "Attribute6":  "savings_account",
    "Attribute7":  "employment_since",
    "Attribute8":  "installment_rate",
    "Attribute9":  "personal_status_sex",
    "Attribute10": "other_debtors",
    "Attribute11": "residence_since",
    "Attribute12": "property",
    "Attribute13": "age",
    "Attribute14": "other_installment_plans",
    "Attribute15": "housing",
    "Attribute16": "existing_credits",
    "Attribute17": "job",
    "Attribute18": "dependents",
    "Attribute19": "telephone",
    "Attribute20": "foreign_worker",
    "Attribute21": "months",
    "Attribute22": "postal_area",
    "class":       "credit_risk"
    
}

# your code here


### Q2.1 Inspect the dataset with `.info()`

>Investigate the datatypes of each column. Are they appropriate?

In [None]:
# your code here


### Q2.2 Obtain descriptive statistics using `.describe()`

In [None]:
# your code here


### Q2.3 Investigate how many missing values are in each column

In [None]:
# your code here


### Q2.4 Create `numerical` and `categorical` lists
- you can check whether a column's dtype is `'O'` (object)
- alternatively, use `df.select_dtypes(include=object)` for categorical columns and `df.select_dtypes(include=np.number)` for numerical columns
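A minimal sketch of both approaches on a made-up three-column table. The text column is created with `dtype=object` explicitly; this is an assumption to keep the demo stable, since recent pandas versions may infer a dedicated string dtype for text instead of `object`:

```python
import numpy as np
import pandas as pd

# Made-up mini table mixing text and numbers
df_demo = pd.DataFrame({
    "purpose": pd.Series(["A43", "A40", "A46"], dtype=object),
    "credit_amount": [1169, 5951, 2096],
    "age": [67, 22, 49],
})

# Approach 1: select columns by dtype family
categorical_demo = df_demo.select_dtypes(include=object).columns.tolist()
numerical_demo = df_demo.select_dtypes(include=np.number).columns.tolist()

# Approach 2: compare each column's dtype with the object code 'O'
categorical_by_kind = [c for c in df_demo.columns if df_demo[c].dtype == "O"]

print(categorical_demo)  # ['purpose']
print(numerical_demo)    # ['credit_amount', 'age']
```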

In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


### Q2.5 Standardize categorical columns

- where possible, define strategies that can be reused across columns with the same problem
- if columns have different problems, create lists grouping the columns that share each problem

In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


### Q2.6 Check the percentage of missing values in each `categorical` column:
- if it's below `5%`, impute the `mode` (this may not be the best approach, but we are cleaning as best we can with what we have learned so far)
- if it's above `40%`, drop the column
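A minimal sketch of this threshold logic on a made-up 25-row table (not the real data). Columns with between 5% and 40% missing values are not covered by the rules above, so the sketch leaves them unchanged:

```python
import numpy as np
import pandas as pd

# Made-up categorical columns, 25 rows each
df_demo = pd.DataFrame({
    "housing": ["own"] * 15 + ["rent"] * 9 + [np.nan],   # 1/25 = 4% missing
    "telephone": [np.nan] * 12 + ["yes"] * 13,            # 12/25 = 48% missing
}, dtype=object)

# Percentage of missing values per column
missing_pct = df_demo.isna().mean() * 100
print(missing_pct)

for col, pct in missing_pct.items():
    if pct > 40:
        # Too sparse to impute sensibly: drop the column
        df_demo = df_demo.drop(columns=col)
    elif pct < 5:
        # Few gaps: fill them with the most frequent value (the mode)
        df_demo[col] = df_demo[col].fillna(df_demo[col].mode()[0])
```

After the loop, `telephone` is gone and the single gap in `housing` holds `"own"`, its most frequent value.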

In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


### Q2.7 Check the percentage of missing values in each `numerical` column
- if it's above `40%`, drop the column
- otherwise, impute the `mean`, `median` or `mode`; decide for yourself which one to use

In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


### Q2.8 Check whether there are **outliers** in `numerical` columns using the `IQR method`

> To detect outliers in a numerical column, we can use the **Interquartile Range (IQR) method**.
> The IQR represents the spread of the middle 50% of the data.
>
> The formula works as follows:
>
> - Compute the 1st quartile (Q1) → 25th percentile  
> - Compute the 3rd quartile (Q3) → 75th percentile  
> - Compute the **IQR**:
>
> $$
> \text{IQR} = Q3 - Q1
> $$
>
> Outliers are any observations outside the following bounds:
>
> $$
> \text{Lower Bound} = Q1 - 1.5 \times \text{IQR}
> $$
> $$
> \text{Upper Bound} = Q3 + 1.5 \times \text{IQR}
> $$
>
> Values smaller than the lower bound or greater than the upper bound are considered **outliers**.
>
> Now, define a function that verifies whether a column contains outliers:
>
>```python
>def verify_outliers(df: pd.DataFrame, col: str) -> bool:
>    q1 = df[col].quantile(0.25)
>    q3 = df[col].quantile(0.75)
>    iqr = q3 - q1
>    # continue from here
>```
>
>The function should return a `bool`; apply it to every column in `numerical`.
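As a worked example of the formulas above, here is the computation on a tiny made-up series. Note that `Series.quantile` uses linear interpolation by default, so quartiles can fall between observed values:

```python
import pandas as pd

# Made-up values with one obvious extreme (100)
s = pd.Series([1, 2, 3, 4, 100])

q1 = s.quantile(0.25)   # 2.0
q3 = s.quantile(0.75)   # 4.0
iqr = q3 - q1           # 2.0

lower = q1 - 1.5 * iqr  # -1.0
upper = q3 + 1.5 * iqr  # 7.0

# Boolean mask of values outside the bounds
outlier_mask = (s < lower) | (s > upper)
print(s[outlier_mask].tolist())  # [100]
```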


In [None]:
def verify_outliers(df: pd.DataFrame, col: str) -> bool:
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    
    # continue from here


In [None]:
# your code here


<details>
<summary><h3>Can we also use the IQR method to remove outliers?</h3></summary>

> Yes, the same mathematical rule used to *detect* outliers can also be used
> to *remove* them.  
>
> Once we compute the lower and upper bounds:
>
> $$
> \text{Lower Bound} = Q1 - 1.5 \times \text{IQR}
> $$
> $$
> \text{Upper Bound} = Q3 + 1.5 \times \text{IQR}
> $$
>
> We can simply filter the DataFrame to keep only the values **within these limits**.
>
> This is known as **IQR-based outlier removal** and is one of the most common
> preprocessing techniques in data cleaning, especially for algorithms that are 
> sensitive to extreme values.
>
> Example function to *remove* outliers from a column:
>
> ```python
> def remove_outliers_iqr(df: pd.DataFrame, col: str) -> pd.DataFrame:
>     q1 = df[col].quantile(0.25)
>     q3 = df[col].quantile(0.75)
>     iqr = q3 - q1
>
>     lower = q1 - 1.5 * iqr
>     upper = q3 + 1.5 * iqr
>
>     return df[(df[col] >= lower) & (df[col] <= upper)]
> ```
>
___
<div style="background-color:#f2f2f2; padding:12px; border-left:4px solid #d9534f; border-radius:4px; margin:10px 0;">
<strong>⚠️ NOTE:</strong> REMOVING OUTLIERS IS NOT ALWAYS RECOMMENDED.<br>
It depends on the context and whether extreme values are real observations or measurement errors.<br>
In credit scoring datasets like this one, outliers may represent important patterns of risk.
</div>


</details>

### Q2.9 Any other problematic columns?
- Check the dtypes and look for duplicated information 😉
- Convert these columns to appropriate types and drop the duplicated information

In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


In [None]:
# your code here


### Q2.10 Export the dataset to a `csv` file as `cleaned_credit_risk_dataset.csv`

In [None]:
# your code here


## Exploratory Data Analysis

Before we continue with groupby-based exploration, it is important to notice that  
many columns in the German Credit dataset contain *coded categorical values* such as:

- `A11`, `A12`, `A13`, …
- `A30`, `A31`, …
- `A40`, `A41`, …
- `A171`, `A172`, …

These codes make the dataset harder to read and interpret during analysis.

> This is extremely common in real datasets:
> - data may come encoded for storage efficiency  
> - documentation may be separate from the data  
> - variables may need mapping tables to become understandable  

To make our exploratory analysis clearer (and to avoid constantly checking the data dictionary),  
we will now apply an explicit **mapping** from coded values to descriptive labels.

The mappings below were created based on the official dataset documentation provided in:

🔗 https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data
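Pandas offers two common ways to apply such a mapping, `Series.replace` and `Series.map`; they differ in how they treat codes that are missing from the dictionary. A minimal sketch on a made-up series, where `A999` is a deliberately invalid code:

```python
import pandas as pd

codes = pd.Series(["A151", "A152", "A153", "A999"])  # "A999" is made up and unmapped
map_housing = {"A151": "rent", "A152": "own", "A153": "for free"}

# .replace() swaps only the codes found in the dictionary; others are kept as-is
replaced = codes.replace(map_housing)
print(replaced.tolist())  # ['rent', 'own', 'for free', 'A999']

# .map() looks up every value; codes missing from the dictionary become NaN
mapped = codes.map(map_housing)
print(mapped.tolist())    # ['rent', 'own', 'for free', nan]
```

`replace` is forgiving when a mapping is incomplete, while `map` makes gaps visible as missing values.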

---

>**Install the library.**

In [None]:
!pip install ucimlrepo

>**Load the original dataset using the `ucimlrepo` library**

In [None]:
from ucimlrepo import fetch_ucirepo 

# fetch dataset 
statlog_german_credit_data = fetch_ucirepo(id=144) 
  
# data (as pandas dataframes) 
X = statlog_german_credit_data.data.features 
y = statlog_german_credit_data.data.targets 
  
# metadata 
print(statlog_german_credit_data.metadata) 
  
# variable information 
#print(statlog_german_credit_data.variables) 

>**Renaming the columns with human-readable names.**

In [None]:
rename_dict = {
    "Attribute1":  "checking_status",
    "Attribute2":  "duration_months",
    "Attribute3":  "credit_history",
    "Attribute4":  "purpose",
    "Attribute5":  "credit_amount",
    "Attribute6":  "savings_account",
    "Attribute7":  "employment_since",
    "Attribute8":  "installment_rate",
    "Attribute9":  "personal_status_sex",
    "Attribute10": "other_debtors",
    "Attribute11": "residence_since",
    "Attribute12": "property",
    "Attribute13": "age",
    "Attribute14": "other_installment_plans",
    "Attribute15": "housing",
    "Attribute16": "existing_credits",
    "Attribute17": "job",
    "Attribute18": "dependents",
    "Attribute19": "telephone",
    "Attribute20": "foreign_worker",
    "Attribute21": "months",
    "Attribute22": "postal_area",
    "class":       "credit_risk"
    
}

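# Build a single DataFrame from the freshly loaded features and target
df = pd.concat([X, y], axis=1)

df.rename(columns=rename_dict, inplace=True)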
df.head()

>**Run the following block to replace the coded categorical values with human-readable descriptions.**

In [None]:
# -----------------------------------------
# SAVE ORIGINAL PERSONAL_STATUS_SEX CODES
# (needed later to extract 'sex' and clean personal status)
# -----------------------------------------
df["personal_status_sex_code"] = df["personal_status_sex"].copy()

# -----------------------------------------
# MAPPINGS FOR QUALITATIVE VARIABLES
# -----------------------------------------

map_status = {
    "A11": "< 0 DM",
    "A12": "0<=X<200 DM",
    "A13": ">=200 DM / salary assignments ≥ 1 year",
    "A14": "no checking account"
}

map_history = {
    "A30": "no credits taken / all paid back duly",
    "A31": "all credits at this bank paid back duly",
    "A32": "existing credits paid back duly till now",
    "A33": "delay in paying off in the past",
    "A34": "critical account / other credits elsewhere"
}

map_purpose = {
    "A40": "car (new)",
    "A41": "car (used)",
    "A42": "furniture/equipment",
    "A43": "radio/television",
    "A44": "domestic appliances",
    "A45": "repairs",
    "A46": "education",
    # A47 does not exist in the original dataset
    "A48": "retraining",
    "A49": "business",
    "A410": "others"
}

map_savings = {
    "A61": "<100 DM",
    "A62": "100<=X<500 DM",
    "A63": "500<=X<1000 DM",
    "A64": ">=1000 DM",
    "A65": "unknown/no savings"
}

map_employment = {
    "A71": "unemployed",
    "A72": "<1 year",
    "A73": "1–4 years",
    "A74": "4–7 years",
    "A75": ">=7 years"
}

# Combined personal status + sex text
map_personal_status_sex = {
    "A91": "male: divorced/separated",
    "A92": "female: divorced/separated/married",
    "A93": "male: single",
    "A94": "male: married/widowed",
    "A95": "female: single"
}

map_debtors = {
    "A101": "none",
    "A102": "co-applicant",
    "A103": "guarantor"
}

map_property = {
    "A121": "real estate",
    "A122": "building society savings/life insurance",
    "A123": "car or other (not in savings)",
    "A124": "unknown/no property"
}

map_installment_plans = {
    "A141": "bank",
    "A142": "stores",
    "A143": "none"
}

map_housing = {
    "A151": "rent",
    "A152": "own",
    "A153": "for free"
}

map_job = {
    "A171": "unemployed/unskilled – non-resident",
    "A172": "unskilled – resident",
    "A173": "skilled employee/official",
    "A174": "management/self-employed/highly qualified"
}

map_telephone = {
    "A191": "none",
    "A192": "yes, registered"
}

map_foreign = {
    "A201": "yes",
    "A202": "no"
}

# -----------------------------------------
# APPLY MAPPINGS TO THE DATAFRAME
# -----------------------------------------

# Keys must match the renamed column names, otherwise the mapping is not applied
df = df.replace({
    "checking_status": map_status,
    "credit_history": map_history,
    "purpose": map_purpose,
    "savings_account": map_savings,
    "employment_since": map_employment,
    "personal_status_sex": map_personal_status_sex,  # human-readable combined field
    "other_debtors": map_debtors,
    "property": map_property,
    "other_installment_plans": map_installment_plans,
    "housing": map_housing,
    "job": map_job,
    "telephone": map_telephone,
    "foreign_worker": map_foreign
})

# -----------------------------------------
# SPLIT personal_status_sex INTO 'sex' AND CLEAN 'personal_status'
# (using the original codes saved in personal_status_sex_code)
# -----------------------------------------

# Mapping to extract sex only
map_sex = {
    "A91": "male",
    "A92": "female",
    "A93": "male",
    "A94": "male",
    "A95": "female"
}

# Mapping to extract civil/marital status only
map_personal_status_clean = {
    "A91": "divorced/separated",
    "A92": "divorced/separated/married",
    "A93": "single",
    "A94": "married/widowed",
    "A95": "single"
}

# Create the new 'sex' column
df["sex"] = df["personal_status_sex_code"].map(map_sex)

# Create a new 'personal_status' column with only civil status
df["personal_status"] = df["personal_status_sex_code"].map(map_personal_status_clean)

# Drop the temporary code column
df.drop(columns=["personal_status_sex_code"], inplace=True)


## 3. Introduction to `groupby()` for Exploratory Analysis

> Until now, we have used methods such as `value_counts()`, `mean()`, or `describe()`  
> to inspect columns individually.
>
> However, real datasets often have **subgroups** that behave differently, and we may want
> to understand how a variable behaves *inside* each subgroup.
>
> For this, pandas provides the method:
>
> `df.groupby("column")`
>
> which splits the dataset into smaller groups based on the values of a column (or of several columns, when given a list).
>
> Each group can then be inspected separately.
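To see what "splitting" means in practice, here is a minimal sketch on a made-up table: iterating over a `GroupBy` object yields `(key, sub-DataFrame)` pairs, and calling a statistic on it computes that statistic once per group:

```python
import pandas as pd

# Made-up mini dataset
df_demo = pd.DataFrame({
    "housing": ["own", "rent", "own", "for free", "rent", "own"],
    "credit_amount": [1000, 3000, 2000, 5000, 1000, 3000],
})

grouped = df_demo.groupby("housing")

# Each group is a smaller DataFrame holding the rows that share a key
for key, group in grouped:
    print(key, len(group))

# A statistic applied to the grouped column is computed per group
means = grouped["credit_amount"].mean()
print(means)
```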

---

### 3.1 Counting Values Inside Groups: `.groupby().value_counts()`

> This shows **how a categorical variable is distributed inside each subgroup**.
>
>**Example:**
>
>```python
>df.groupby("housing")["checking_status"].value_counts()
>```
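On a made-up five-row table, the same pattern returns a Series indexed by (group, category). Two optional extras are shown as well: `normalize=True` returns within-group proportions instead of counts, and `.unstack()` pivots the result into a table that is easier to read:

```python
import pandas as pd

# Made-up rows reusing a few labels from the mapping cell
df_demo = pd.DataFrame({
    "housing": ["own", "own", "own", "rent", "rent"],
    "checking_status": ["< 0 DM", "no checking account", "no checking account",
                        "< 0 DM", "< 0 DM"],
})

# Count of each checking_status category within each housing group
counts = df_demo.groupby("housing")["checking_status"].value_counts()
print(counts)

# Proportions within each group instead of raw counts
shares = df_demo.groupby("housing")["checking_status"].value_counts(normalize=True)

# Inner index level becomes columns; absent combinations are filled with 0
table = counts.unstack(fill_value=0)
print(table)
```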

### Q3.1 Inspect Categorical Distributions Inside Groups

- Using `.groupby('col1')['related_col'].value_counts()`, compute how the column
`personal_status` is distributed inside each `credit_risk` group.

- Your output should show, **for each value** of `credit_risk`,
**the count of each category** in `personal_status`.


In [None]:
# your code here



### Q3.2 Inspect Categorical Distributions Inside Sub-Groups

- Using `.groupby(['col1', 'col2'])['related_col'].value_counts()`, compute how the column
`personal_status` is distributed across each `sex` category within each `credit_risk` group.

- The output should display, for every value of `credit_risk`, the count of each category in `personal_status`, separated by `sex`.

>**Keep in mind that as we add more grouping columns, the resulting output becomes less intuitive to read.**


In [None]:
# your code here


### Q3.3 Compare the distribution of `housing` inside each `credit_risk` group

In [None]:
# your code here


### Q3.4 Which job category has the highest average credit amount?

- For each `job` category, compute the mean of `credit_amount`.

In [None]:
# your code here


### Q3.5 Inspect `age` statistics inside each `credit_risk` group
- Compute descriptive statistics (`.describe()`) for the column `age` inside each `credit_risk` group.

In [None]:
# your code here


### Q3.6 Create an `age` **binning column** (`age_group`) to explore group statistics


>**Remember that we can create bins with `pd.cut`:**
>
>```python
>bins = [0, 25, 40, 60, 120]                # interval limits
>labels = ["<25", "25–40", "40–60", "60+"]  # names of the age groups
>
>df["age_bin"] = pd.cut(df["col"], bins=bins, labels=labels)
>```
>
>**Or use `pd.qcut` to split a column into `n` quantile-based groups with (roughly) equal numbers of rows:**
>
>```python
>df["age_bin_q"] = pd.qcut(df["col"], q=n, labels=[f"Q{i}" for i in range(1, n + 1)])
>```
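One detail worth checking when reusing these bins: `pd.cut` builds right-closed intervals by default (`right=True`), so with `bins=[0, 25, ...]` an age of exactly 25 falls into the `<25` bin. A small sketch on made-up ages comparing the default with `right=False`:

```python
import pandas as pd

ages = pd.Series([20, 25, 40, 61])  # made-up ages, chosen to hit the bin edges
bins = [0, 25, 40, 60, 120]
labels = ["<25", "25–40", "40–60", "60+"]

# Default right=True: intervals are (0, 25], (25, 40], (40, 60], (60, 120]
default_bins = pd.cut(ages, bins=bins, labels=labels)
print(default_bins.tolist())  # ['<25', '<25', '25–40', '60+']

# right=False: intervals are [0, 25), [25, 40), [40, 60), [60, 120)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)
print(left_closed.tolist())   # ['<25', '25–40', '40–60', '60+']
```

With `right=False`, the `<25` label means exactly "under 25".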

- **We want meaningful age groups (e.g., `<25`, `25–40`, `40–60`, `60+`).**


In [None]:
# your code here


### Q3.7 Compare credit risk across age groups

- Using the `age_group` created in the previous question, analyze how `credit_risk` is distributed inside each age group.

In [None]:
# your code here


### Q3.8 Compute the percentage of bad credit risk (`credit_risk == 2`) per age group

In [None]:
# your code here


### Q3.9 Based on your answer to the previous question, are younger or older customers more likely to have bad credit risk?

In [None]:
# your answer here


### Q3.10 Compare the average credit amount across age groups

- Compute the **mean** value of `credit_amount` for each age group.
- Which age group tends to request the highest credit amounts?


In [None]:
# your code here


### Q3.11 Compare Employment Duration Across Age Groups

- Compute the count of each `employment_since` category inside each `age_group`.

In [None]:
# your code here


### Q3.12 Explore Purpose of Credit Within Age Groups

- For each `age_group`, compute how many people requested credit for each type of `purpose`.

In [None]:
# your code here


### Q3.13 Number of Existing Credits by Age Group
- Determine which age group tends to have more existing credit lines (for example, by comparing the mean of `existing_credits`).

In [None]:
# your code here


### Q3.14 Cross-Analyze Age Groups and Housing

- For each `age_group`, compute how many people fall into each `housing` category.

In [None]:
# your code here


### 3.2 Aggregating Multiple Statistics with `.agg()`

> Until now, we have summarized groups with dedicated methods:
>
> - `.mean()`
> - `.size()`
> - `.value_counts()`
> - `.describe()`
>
> These methods are useful, but each returns a **fixed** result: most compute a single metric, and `.describe()` always returns the same preset bundle of statistics.
>
> The real power of `groupby()` comes when we want to calculate **several statistics at once**,  
> either for:
>
> - the **same column**, or  
> - **multiple columns** with different metrics.
>
> For this, Pandas provides the method:
>
> ```python
> df.groupby("column").agg({...})
> ```
>
> which allows us to define exactly **which statistics** to compute.
---
>
>**Example 1: Multiple Statistics for One Column**
>
>```python
>df.groupby("age_group")["credit_amount"].agg(["mean", "median", "max"])
>```
___
>**Example 2: Different Statistics for Different Columns**
>```python
>df.groupby("age_group").agg({
>    "credit_amount": ["mean", "std"],
>    "duration_months": ["mean", "max"]
>})
>```
___
> **Example 3: Using Custom Functions Inside `.agg()`**
>
> You can also define your own functions and use them directly inside `.agg()`.
>
> This is extremely useful when the standard statistics (`mean`, `median`, etc.) are not enough for your analysis.
>
> ```python
> # Custom function: range = max - min
> def value_range(series):
>     return series.max() - series.min()
>
> df.groupby("age_group")["credit_amount"].agg([
>     "mean",
>     "median",
>     value_range,     # custom function
> ])
> ```
>
> **This will return a table containing:**
> - the mean  
> - the median  
> - and your custom-defined "range" metric  
>
> computed separately **for each age group**.
___
>**This approach is very common because it allows you to summarize multiple variables at once, grouped by a meaningful category**
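A self-contained sketch of Examples 1 to 3 on a made-up five-row table (the column names mirror the notebook's; the values do not come from the dataset). When a custom function is passed in the list, its result column is labeled with the function's name:

```python
import pandas as pd

# Made-up mini dataset, values chosen for easy mental arithmetic
df_demo = pd.DataFrame({
    "age_group": ["<25", "<25", "25–40", "25–40", "25–40"],
    "credit_amount": [1000, 3000, 2000, 4000, 9000],
    "duration_months": [12, 24, 6, 36, 48],
})

def value_range(series):
    # range = max - min
    return series.max() - series.min()

# One column, several statistics (built-in names mixed with a custom function)
summary = df_demo.groupby("age_group")["credit_amount"].agg(["mean", "median", value_range])

# Different statistics per column; the result has two-level column labels
per_column = df_demo.groupby("age_group").agg({
    "credit_amount": ["mean", "std"],
    "duration_months": ["mean", "max"],
})

print(summary)
print(per_column)
```

Individual cells of the two-level result can be read with tuples, e.g. `per_column.loc["<25", ("duration_months", "max")]`.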

### Q3.15 Multiple Statistics for `credit_amount` per age group

- Using `.groupby("age_group")` and `.agg()`, compute the following statistics for `credit_amount` inside each `age_group`:

    - mean  
    - median
    - maximum value


In [None]:
# your code here


### Q3.16 Aggregate Two Numerical Columns at Once

- Using `.groupby("age_group")`, compute:

    - mean and standard deviation of `credit_amount`

    - mean and max of `duration_months`

In [None]:
# your code here


### Q3.17 Define your own function that computes the range of a numeric variable.
- Using `.groupby("age_group")["credit_amount"].agg([...])`, compute:

    - mean
    - median
    - your custom range function

>**Example**
>```python
># Custom function: range = max - min
>def value_range(series: pd.Series):
>    ...  # your code here
>```

In [None]:
# Custom function: range = max - min
def value_range(series):
    pass
    # continue from here removing the pass


In [None]:
# your tears here 😊

In [None]:
# your tears here 😊

In [None]:
# your tears here 😊

In [None]:
# your tears here 😊