# House Prices Dataset - Exploratory Data Analysis (EDA)

In this notebook, we will explore the House Prices dataset. The objectives of this EDA are to:

- **Understand the Data Structure:**  
  Examine the columns, data types, and general summary statistics.
  
- **Identify Missing Values:**  
  Determine which features have missing values and how much of each feature is missing.

- **Visualize Distributions:**  
  Plot the distributions of key variables, especially the target variable `SalePrice`, and examine how the target relates to other features.

- **Detect Outliers:**  
  Identify any outliers or unusual data points that might need further investigation.

By doing this, we will lay the foundation for effective data cleaning and feature engineering in later steps.
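
As a rough illustration of the missing-value and outlier objectives above, the sketch below summarizes missing data per column and flags outliers with the common 1.5 × IQR rule. The helper names `missing_summary` and `iqr_outliers` are hypothetical, the IQR rule is only one of several possible outlier criteria, and the small `toy` frame uses made-up values so the example runs without the real CSV.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only the columns that actually have gaps
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy stand-in for the real data; all values are made up for illustration.
toy = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, np.nan, 60.0, 84.0, np.nan],
    "LotArea": [8450, 9600, 11250, 9550, 14260, 215000],
    "SalePrice": [200000, 180000, 220000, 140000, 250000, 375000],
})

print(missing_summary(toy))
print(toy[iqr_outliers(toy["LotArea"])])
```

On the real dataset, the same helpers could be called with `df` in place of `toy` once the training file has been loaded.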

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn

# Enable inline plotting
%matplotlib inline

# Load the training dataset from the project's 'data/raw' folder
# (absolute path; adjust it for your machine)
df = pd.read_csv('C:/Users/ycarvalho/OneDrive - EDENRED/Documentos/Data_Analysis_MLE_Kaggle/House_Prices_Dataset/data/raw/train.csv')

print("First 5 rows of the dataset:")
print(df.head())

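# Display basic information about the dataset.
# df.info() prints its report itself and returns None,
# so wrapping it in print() would add a stray "None" line.
print("\nDataset Information:")
df.info()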

# Show summary statistics of the dataset
print("\nSummary Statistics:")
print(df.describe())

# Check for missing values
print("\nMissing Values in Each Column:")
print(df.isnull().sum())

First 5 rows of the dataset:
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg   
1   2          20       RL         80.0     9600   Pave   NaN      Reg   
2   3          60       RL         68.0    11250   Pave   NaN      IR1   
3   4          70       RL         60.0     9550   Pave   NaN      IR1   
4   5          60       RL         84.0    14260   Pave   NaN      IR1   

  LandContour Utilities  ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold  \
0         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
1         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      5   
2         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      9   
3         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
4         Lvl    AllPub  ...        0    NaN   NaN         NaN       0     12   

  YrSold  SaleType  SaleCondition  Sale