# Exercise 1b - Pandas Basics

### Getting Started with Pandas and DataFrames
Pandas is a powerful Python library for data manipulation and analysis. Its core data structures are Series (1D) and DataFrame (2D).
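To make the 1D/2D distinction concrete, here is a minimal sketch. The variable names `scores` and `frame` and the sample values are illustrative only. It shows that a Series has one dimension, a DataFrame has two, and selecting a single DataFrame column gives back a Series.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Charlie'], name='Score')

# A DataFrame is a two-dimensional table; each column is itself a Series
frame = pd.DataFrame(
    {'Score': [85, 92, 78], 'Passed': [True, True, False]},
    index=['Alice', 'Bob', 'Charlie'],
)

print(scores.ndim, scores.shape)   # 1 (3,)
print(frame.ndim, frame.shape)     # 2 (3, 2)
print(type(frame['Score']))        # <class 'pandas.core.series.Series'>
```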

**Why use Pandas?**
- Easy data exploration and cleaning.
- Integration with NumPy and Matplotlib.
- Tools for reading, writing, filtering, and transforming data.

**DataFrame Capabilities:**
- Intuitive access to rows/columns by label or index.
- Built-in functions for statistics and transformation.
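A short sketch of label-based versus position-based access, plus one built-in statistic. The string index `'a'`, `'b'`, `'c'` is an assumption chosen so that labels and positions are visibly different; with the default integer index they would coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 92, 78]},
    index=['a', 'b', 'c'],  # string labels, so labels and positions differ
)

# Label-based access with .loc: row label 'b', column 'Score'
print(df.loc['b', 'Score'])        # 92

# Position-based access with .iloc: first row, second column
print(df.iloc[0, 1])               # 85

# Select a column by name, then apply a built-in statistic
print(df['Score'].mean())          # 85.0

# Summary statistics for all numeric columns
print(df.describe())
```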

In this exercise, you'll create a simple DataFrame, explore it, and apply filters.


In [29]:
import pandas as pd
import numpy as np

# Create a sample dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Score': [85, 92, 78, 90, np.nan],
    'Passed': [True, True, False, True, True]
}

df = pd.DataFrame(data)

# Show structure and data
print(df.head())
print(df.describe())

# Filter students with Score > 80
print("Students scoring above 80:")
print(df[df['Score'] > 80])


      Name  Score  Passed
0    Alice   85.0    True
1      Bob   92.0    True
2  Charlie   78.0   False
3    David   90.0    True
4   Edward    NaN    True
           Score
count   4.000000
mean   86.250000
std     6.238322
min    78.000000
25%    83.250000
50%    87.500000
75%    90.500000
max    92.000000
Students scoring above 80:
    Name  Score  Passed
0  Alice   85.0    True
1    Bob   92.0    True
3  David   90.0    True
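Edward is missing from the filtered output because any comparison with `NaN` evaluates to `False`. The sketch below rebuilds the same sample data to show that, and how several conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because `&` and `|` bind more tightly than `>` or `<`. The particular conditions are illustrative choices, not part of the exercise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Score': [85, 92, 78, 90, np.nan],
    'Passed': [True, True, False, True, True],
})

# Comparisons with NaN evaluate to False, so Edward is never selected by Score > 80
above_80 = df['Score'] > 80
print(above_80.tolist())  # [True, True, False, True, False]

# Combine conditions with & (and); wrap each comparison in parentheses
high_and_passed = df[(df['Score'] > 80) & df['Passed']]
print(high_and_passed['Name'].tolist())  # ['Alice', 'Bob', 'David']

# Combine conditions with | (or): score below 80 OR score missing
low_or_missing = df[(df['Score'] < 80) | df['Score'].isna()]
print(low_or_missing['Name'].tolist())  # ['Charlie', 'Edward']
```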


### Practice
Apply what you've learned by modifying the code or writing your own version.

- Try creating a different shape of ndarray.
- Filter DataFrame rows using a different condition.
- Fill missing values using the median instead of the mean.
- Create a new column based on existing ones.
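For the last bullet, here is one possible approach, sketched on the sample data without the missing score. The column names `Percent`, `Grade`, and `Honors` and the cutoff of 90 are hypothetical, made up for illustration. The point is that column arithmetic and `np.where` work element-wise, so no explicit loop is needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [85, 92, 78, 90],
    'Passed': [True, True, False, True],
})

# Arithmetic on a column is vectorized: no loop or apply() needed
df['Percent'] = df['Score'] / 100

# np.where picks between two values element-wise (hypothetical cutoff of 90)
df['Grade'] = np.where(df['Score'] >= 90, 'A', 'B')

# Combine two existing columns into a new boolean column
df['Honors'] = df['Passed'] & (df['Score'] >= 90)

print(df)
```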

In [3]:
# Rows where the student did not pass (~ negates a boolean Series)
df[~df['Passed']]

      Name  Score  Passed
2  Charlie     78   False


In [30]:
df[df['Score'] > 0]

      Name  Score  Passed
0    Alice   85.0    True
1      Bob   92.0    True
2  Charlie   78.0   False
3    David   90.0    True
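The `Score > 0` filter drops Edward only indirectly, because `NaN > 0` is `False`. A minimal sketch of the explicit alternative, `notna()`, on the same sample data. The two filters agree here only because no known score is zero or negative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Score': [85, 92, 78, 90, np.nan],
})

# Explicitly keep rows where Score is present
has_score = df[df['Score'].notna()]

# Same rows here, but only because no score is 0 or negative
positive = df[df['Score'] > 0]

print(has_score.equals(positive))  # True
```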


In [31]:
# Fill the missing score with the median of the known scores
med = df['Score'].median()
print(med)
df['Score'] = df['Score'].fillna(med)
df

87.5


      Name  Score  Passed
0    Alice   85.0    True
1      Bob   92.0    True
2  Charlie   78.0   False
3    David   90.0    True
4   Edward   87.5    True
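Calling `fillna()` on the whole DataFrame with a single value would fill gaps in every column with that value. A minimal sketch of a more targeted variant: passing a dict restricts filling to the listed columns, and the original frame is left unchanged because no `inplace` is used. The variable name `filled` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Score': [85, 92, 78, 90, np.nan],
    'Passed': [True, True, False, True, True],
})

# median() skips NaN by default, so this is the median of the four known scores
med = df['Score'].median()
print(med)  # 87.5

# A dict maps column -> fill value; columns not listed are left untouched
filled = df.fillna({'Score': med})
print(filled.loc[4, 'Score'])  # 87.5
```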


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    5 non-null      object 
 1   Score   5 non-null      float64
 2   Passed  5 non-null      bool   
dtypes: bool(1), float64(1), object(1)
memory usage: 217.0+ bytes


In [34]:
df.head()

      Name  Score  Passed
0    Alice   85.0    True
1      Bob   92.0    True
2  Charlie   78.0   False
3    David   90.0    True
4   Edward   87.5    True


In [32]:
df.describe()

           Score
count   5.000000
mean   86.500000
std     5.431390
min    78.000000
25%    85.000000
50%    87.500000
75%    90.000000
max    92.000000


In [27]:
# Rescale scores to the 0-1 range; column arithmetic is vectorized, so apply() is unnecessary
df['Score'] = df['Score'] / 100
df

      Name  Score  Passed
0    Alice  0.850    True
1      Bob  0.920    True
2  Charlie  0.780   False
3    David  0.900    True
4   Edward  0.875    True


### Handling Missing Data with Pandas
Missing data is common in real-world datasets. Pandas provides methods to detect and fill or remove missing values.

**Handling Strategies:**
- Remove missing values (`dropna()`).
- Replace missing values with a fixed value or statistic (`fillna()`).
- Analyze how missing values, and the strategy chosen to handle them, affect summary statistics.
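A minimal sketch of the detect, remove, and fill-with-a-fixed-value strategies, using a small frame shaped like the one in the exercise. The fill value `0` is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [np.nan, 2, 3, 4]})

# Detect: isna() marks missing cells; sum() counts them per column
print(df.isna().sum())

# Remove: dropna() drops every row that contains a NaN (rows 0 and 2 here)
print(df.dropna().index.tolist())  # [1, 3]

# Remove only rows where a specific column is missing
print(df.dropna(subset=['A']).index.tolist())  # [0, 1, 3]

# Replace: fill every missing cell with a fixed value
print(df.fillna(0))
```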

In this exercise, you’ll practice these techniques using a DataFrame with missing entries.


In [2]:
import pandas as pd
import numpy as np

# Create a DataFrame with NaNs
data = {
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Fill missing values with column mean
df_filled = df.fillna(df.mean(numeric_only=True))

print("DataFrame after filling missing values:")
print(df_filled)


Original DataFrame:
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0
DataFrame after filling missing values:
          A    B
0  1.000000  3.0
1  2.000000  2.0
2  2.333333  3.0
3  4.000000  4.0
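One way to analyze the impact of mean imputation is to compare summary statistics before and after filling. In this sketch, which recreates the same frame, each column's mean is unchanged, but the standard deviation shrinks because the filled values sit exactly at the mean and add no spread. This is a general property of mean imputation rather than something specific to this dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [np.nan, 2, 3, 4]})
df_filled = df.fillna(df.mean(numeric_only=True))

# Mean imputation leaves each column's mean unchanged...
print(df.mean().round(4).tolist())         # [2.3333, 3.0]
print(df_filled.mean().round(4).tolist())  # [2.3333, 3.0]

# ...but shrinks the spread (std uses ddof=1; NaN is skipped in the original)
print(df.std().round(4).tolist())          # [1.5275, 1.0]
print(df_filled.std().round(4).tolist())   # [1.2472, 0.8165]
```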
