# Week 5 In-Class Python Lab
## Introduction to Pandas and Exploratory Data Analysis (EDA)

In this lab, you will work with a real-world dataset using **pandas**. You will not be told exactly which commands to use. Instead, you’ll explore the data by answering questions that mirror how analysts work with unfamiliar datasets.

The goal is to understand the data before trying to predict outcomes.

## Objectives

By the end of this lab, you should be able to:
- Load a CSV file into pandas
- Describe the shape and structure of a dataset
- Identify missing values and extreme values
- Select and filter subsets of data
- Perform basic exploratory and univariate analysis
- Reason about appropriate data cleaning decisions


In [None]:
# START HERE
# Import pandas and numpy here, using their conventional aliases (pd and np).
# The pd.set_option calls below raise a NameError until pandas is imported as pd.

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

def _check(name, condition, success_msg="Looks good!", fail_msg="Check your work and try again."):
    if condition:
        print(f"{name}: {success_msg}")
    else:
        raise AssertionError(f"{name}: {fail_msg}")

print("Setup complete.")

## Dataset

You will use the **E-Commerce Product Performance Dataset** from Kaggle.

Download the dataset and place the CSV file in the same folder as this notebook.

## Activity 1: Load the Data

Your first task is to load the dataset into a DataFrame named `df`.

Think about:
- Which pandas function reads CSV files?
- What information does it need?
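
As a warm-up, here is a minimal sketch of reading CSV data with `pd.read_csv`. It assumes a tiny made-up table (the column names `product_id`, `price`, and `in_stock` are hypothetical, not the lab dataset's) and reads from an in-memory buffer rather than a file, so it runs without any download. For the lab you would pass the name of your CSV file instead.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "product_id,price,in_stock\n1,9.99,1\n2,24.50,0\n3,5.00,1\n"

# read_csv accepts a file path or a file-like object; here it gets a text buffer.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)  # (number of rows, number of columns)
print(toy.head())  # first few rows, useful as a sanity check
```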

In [None]:
# YOUR CODE HERE

In [None]:
_check(
    "Check 1",
    'df' in globals() and df.shape[0] > 0 and df.shape[1] > 0,
    success_msg="Data loaded successfully"
)

## Activity 2: Understand the Structure

Explore the dataset to answer the following questions:
- How many rows and columns are there?
- What are the column names?
- Which columns appear numerical?
- Which columns appear categorical?
- Are there missing values?
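
The questions above map onto a handful of DataFrame attributes and methods. The sketch below shows one possible approach on a small made-up table (all column names and values are assumptions for illustration); adapting it to `df` is left to you.

```python
import numpy as np
import pandas as pd

# Made-up data: two numeric columns and one text column, with gaps.
toy = pd.DataFrame({
    "price": [9.99, 24.50, np.nan, 5.00],
    "category": ["toys", "books", "toys", None],
    "units_sold": [120, 45, 300, 8],
})

n_rows, n_cols = toy.shape                      # rows and columns
column_names = toy.columns.tolist()             # column names as a list
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()
has_missing = bool(toy.isna().any().any())      # True if any cell is missing

toy.info()  # prints dtypes and non-null counts in one view
```

Note that a column's dtype is only a hint: a numeric code such as a product ID may behave like a category, so it is worth looking at the values as well.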

In [None]:
# YOUR CODE HERE

## Activity 3: Exploring Numerical Values

Generate summary information for the numerical columns and store the result in a variable named `summary_stats` (the check below looks for that name).

As you review the output, think about:
- What seems typical?
- What seems unusually large or small?
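
To illustrate how a summary table can flag unusual values, the sketch below calls `describe()` on made-up data (the columns and numbers are assumptions) in which one price is far larger than the rest. Comparing the mean with the median is one simple way to notice such a value.

```python
import pandas as pd

# Made-up data: one price (500.0) is far larger than the others.
toy = pd.DataFrame({
    "price": [5.0, 10.0, 15.0, 20.0, 500.0],
    "units_sold": [10, 20, 30, 40, 50],
})

# describe() reports count, mean, std, min, quartiles, and max per numeric column.
toy_summary = toy.describe()
print(toy_summary)

mean_price = toy["price"].mean()      # pulled upward by the extreme value
median_price = toy["price"].median()  # much less affected by it
print(mean_price, median_price)
```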

In [None]:
# YOUR CODE HERE


In [None]:
_check(
    "Check 3",
    'summary_stats' in globals(),
    success_msg="Summary statistics created"
)

## Activity 4: Selecting and Filtering Data

Analysts rarely work with an entire dataset at once.

Practice:
- Selecting a single column
- Selecting multiple columns
- Selecting rows by position
- Filtering rows based on a condition you choose
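
Each of the four practice items has a common pandas idiom. The following is a minimal sketch on made-up data (column names and the price threshold are assumptions), not the only way to do it.

```python
import pandas as pd

# Made-up product table.
toy = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [9.99, 24.50, 5.00, 40.00],
    "units_sold": [120, 45, 300, 8],
})

one_col = toy["price"]                 # single column -> Series
two_cols = toy[["product", "price"]]   # list of names -> DataFrame
first_two_rows = toy.iloc[0:2]         # rows at positions 0 and 1
expensive = toy[toy["price"] > 20]     # boolean mask keeps matching rows

print(expensive)
```

The double brackets in `toy[["product", "price"]]` matter: the inner pair builds a Python list of column names, and the outer pair does the selection.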

In [None]:
# YOUR CODE HERE

## Activity 5: Missing Values and Cleaning

Real datasets often contain missing values. In this activity, you will **discover where missing data occurs** and think critically about how (or whether) it should be handled.

### Task 5A: Discover Missing Data Patterns

Explore the dataset to answer:
- Which columns contain missing values?
- How many missing values does each column have?
- Are missing values concentrated in a few columns or spread out?

Pay special attention to columns that represent **availability, counts, or yes/no indicators** (for example, stock availability).
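
Counting missing values per column is usually the first step. The sketch below uses a made-up table (the `in_stock` column is a hypothetical yes/no indicator stored as 1.0 and 0.0) and shows both absolute counts and the share of rows affected.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in two of the three columns.
toy = pd.DataFrame({
    "price": [9.99, np.nan, 5.00, 40.00],
    "in_stock": [1.0, np.nan, np.nan, 0.0],
    "units_sold": [120, 45, 300, 8],
})

missing_counts = toy.isna().sum()    # number of missing cells per column
missing_share = toy.isna().mean()    # fraction of rows missing per column
cols_with_missing = missing_counts[missing_counts > 0].index.tolist()

print(missing_counts)
print(cols_with_missing)
```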

In [None]:
# YOUR CODE HERE

### Task 5B: Reason About Handling Missing Data

Choose **one column with missing values** and answer the following in comments:
- What does this column represent?
- Is it numerical, categorical, or binary?
- What might a missing value mean in this context?

Now decide how you would handle the missing values and explain **why**:
- Does replacing missing values with an average make sense?
- Would filling missing values change the meaning of the data?
- Would dropping rows be more appropriate?

For example, if a column represents a **binary concept like stock availability**, consider whether averaging values would produce meaningful results.

Apply your chosen strategy.
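
To see why averaging can be a poor fit for a binary column, consider this sketch. It assumes a hypothetical `in_stock` indicator coded as 1.0 (yes) and 0.0 (no) and compares three options; which one suits your column depends on what a missing value means there.

```python
import numpy as np
import pandas as pd

# Hypothetical yes/no indicator with one missing entry.
in_stock = pd.Series([1.0, 1.0, 0.0, np.nan, 1.0], name="in_stock")

# Option 1: fill with the mean. The mean of [1, 1, 0, 1] is 0.75,
# which is neither "yes" nor "no".
mean_filled = in_stock.fillna(in_stock.mean())

# Option 2: fill with the most frequent value, which stays a valid category.
mode_filled = in_stock.fillna(in_stock.mode().iloc[0])

# Option 3: drop the rows where the value is missing.
dropped = in_stock.dropna()

print(mean_filled.tolist())
print(mode_filled.tolist())
print(len(dropped))
```

Filling with the most frequent value keeps the column valid but still asserts something you did not observe, so it is worth recording that choice in your comments.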

In [None]:
# YOUR CODE HERE

## Activity 6: Univariate Analysis

Univariate analysis focuses on understanding **one variable at a time**.

Choose one numerical column and explore:
- Typical values
- Spread
- Extreme values

Use summaries and (if you know how) a simple visualization.
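
One common way to describe typical values, spread, and extremes together is the median with the interquartile range (IQR). The sketch below applies the 1.5 × IQR rule of thumb to a made-up price column; the rule is a convention for flagging values to inspect, not proof that a value is wrong.

```python
import pandas as pd

# Made-up prices with one extreme value.
price = pd.Series([5.0, 10.0, 15.0, 20.0, 25.0, 500.0], name="price")

median = price.median()        # typical value
q1 = price.quantile(0.25)      # lower quartile
q3 = price.quantile(0.75)      # upper quartile
iqr = q3 - q1                  # spread of the middle half of the data

# Rule-of-thumb fences: values outside them are worth a closer look.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = price[(price < lower) | (price > upper)]

print(median, iqr)
print(outliers)
```

If matplotlib is installed, `price.plot(kind="box")` or `price.plot(kind="hist")` would give a visual version of the same picture.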

In [None]:
# YOUR CODE HERE

## Reflection

In a Markdown cell, answer:
1. What surprised you about the dataset?
2. What did univariate analysis reveal?
3. How did exploration help you decide what to clean or keep?
4. How might this analysis contribute to a data story?