## a) Key Stages to Build and Deploy a Deep Learning Model for Property Price Prediction
## Problem Definition
### Objective: We would clearly articulate the goal of predicting property prices based on various features. 
### This includes identifying the target audience and the specific requirements of the prediction task, such as accuracy 
### and interpretability.
## Data Collection
### Objective: We would gather relevant datasets that include features such as location, size, number of bedrooms, 
### and historical prices. High-quality data is crucial for training an effective model.
## Data Preprocessing
### Objective: We would clean and prepare the data for analysis. This involves handling missing values, encoding 
### categorical variables, and normalizing numerical features to ensure the data is suitable for deep learning.
## Exploratory Data Analysis
### Objective: We would analyze the dataset to understand feature distributions and relationships. This helps us 
### identify trends, correlations, and potential outliers that could affect model performance.
## Feature Engineering
### Objective: We would create new features or transform existing ones to enhance model performance. This may 
### involve generating interaction terms or aggregating features that better capture the underlying patterns in the data.
## Model Selection
### Objective: We would choose an appropriate deep learning architecture based on the nature of the data and task 
### complexity.
## Model Training
### Objective: We would train the selected model using the training dataset. This involves feeding the data through 
### the model and adjusting weights based on the loss function.
## Model Evaluation
### Objective: We would assess the model's performance using metrics such as Mean Absolute Error and R-squared on a 
### separate validation dataset to ensure it generalizes well.
## Model Tuning
### Objective: We would optimize hyperparameters and model architecture to enhance performance. This includes 
### adjusting learning rates, batch sizes, and the number of layers or neurons.
## Deployment
### Objective: We would deploy the trained model for real-world use, making it accessible through a web application or API. This allows end-users to input features and receive price predictions.
## Monitoring and Maintenance
### Objective: We would continuously monitor the model's performance after deployment, ensuring it remains accurate 
### as new data comes in. This may involve regular retraining with updated datasets.
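
To make the Model Evaluation stage concrete, the two metrics it names can be computed by hand. The following is a minimal sketch using NumPy; the function names `manual_mae` and `manual_r2` and the price values are hypothetical and only illustrate the arithmetic. In practice `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.r2_score` would normally be used.

```python
import numpy as np

def manual_mae(y_true, y_pred):
    """Mean Absolute Error: average absolute gap between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def manual_r2(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical validation prices (illustrative values, not from Housing.csv)
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]

mae = manual_mae(actual, predicted)
r2 = manual_r2(actual, predicted)
print(f"MAE: {mae:.2f}, R-squared: {r2:.2f}")
```

MAE is expressed in the same units as the price, which makes it easy to explain to stakeholders, while R-squared describes the share of price variance the model accounts for.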


## b) Critical Hyperparameters to Tune
## 1. Learning Rate
### The learning rate controls how much the weights change in response to the estimated error each time we 
### update the model. A learning rate that is too high can make training unstable, so the loss oscillates or 
### diverges, or the model settles too quickly on a suboptimal solution. A rate that is too low makes training 
### slow and can leave the model stuck before it reaches a good solution.
### Proper tuning of the learning rate is essential to ensure effective training, allowing the model to learn 
### efficiently without overshooting the optimal solution.
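
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional loss f(w) = w², whose minimum is at w = 0. The function name `gradient_descent`, the starting point, and the three learning rates are assumptions chosen for illustration; a real network's loss surface is far less regular.

```python
def gradient_descent(learning_rate, start=5.0, steps=20):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w              # derivative of w**2
        w = w - learning_rate * grad
    return w

w_too_low = gradient_descent(0.01)   # barely moves towards 0 in 20 steps
w_moderate = gradient_descent(0.4)   # converges quickly to the minimum
w_too_high = gradient_descent(1.1)   # each step overshoots and the error grows

print(f"lr=0.01 -> w={w_too_low:.3f}")
print(f"lr=0.4  -> w={w_moderate:.3e}")
print(f"lr=1.1  -> w={w_too_high:.1f}")
```

With the low rate the weight is still far from the minimum after 20 steps, with the moderate rate it is essentially at the minimum, and with the high rate it moves further away on every step.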
## 2. Batch Size
### The batch size determines how many samples we process before updating the model's internal parameters. 
### Smaller batch sizes give noisier but more frequent gradient updates, while larger batch sizes give smoother 
### gradient estimates and can speed up each epoch.
### Tuning batch size can impact convergence speed and model performance, influencing both the stability of the 
### training process and the final accuracy of the model.
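
The trade-off can be illustrated numerically. The sketch below assumes a hypothetical training set of 500 samples and treats random numbers as stand-ins for per-sample gradients; it reports how many weight updates one epoch contains and how much the mini-batch average fluctuates for each batch size. The names `updates_per_epoch` and `batch_mean_std` are illustrative, not part of any library.

```python
import math
import numpy as np

N_SAMPLES = 500                      # hypothetical training-set size
rng = np.random.default_rng(0)
# Stand-in for per-sample gradients of one weight (illustrative random values)
per_sample_grads = rng.normal(loc=0.0, scale=1.0, size=N_SAMPLES)

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)

def batch_mean_std(grads, batch_size, trials=1000):
    """Spread of the mini-batch gradient estimate across random batches."""
    means = [rng.choice(grads, size=batch_size, replace=False).mean()
             for _ in range(trials)]
    return float(np.std(means))

results = {}
for bs in (16, 64, 256):
    results[bs] = (updates_per_epoch(N_SAMPLES, bs),
                   batch_mean_std(per_sample_grads, bs))
    print(f"batch_size={bs:3d}  updates/epoch={results[bs][0]:2d}  "
          f"gradient-estimate std={results[bs][1]:.3f}")
```

Small batches yield many noisy updates per epoch, and large batches yield few but more stable ones, which is why batch size and learning rate are usually tuned together.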
## 3. Number of Layers and Neurons
### The architecture of the neural network, including the number of layers and neurons in each layer, defines the model's capacity to learn complex patterns. More layers can capture more intricate relationships but may also lead to overfitting.
### Adjusting the depth and width of the model is critical in balancing underfitting and overfitting, ultimately impacting the model's ability to generalize to unseen data.
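
One way to reason about capacity is to count trainable parameters. For a fully connected layer the count is (inputs + 1) × outputs, where the +1 accounts for the bias. The sketch below applies this to two hypothetical architectures; the input width of 12 features and the layer sizes are assumptions for illustration, not values taken from the dataset.

```python
def count_dense_params(n_inputs, hidden_layers, n_outputs=1):
    """Total weights and biases in a fully connected network."""
    total = 0
    previous = n_inputs
    for units in list(hidden_layers) + [n_outputs]:
        total += (previous + 1) * units   # weights plus one bias per unit
        previous = units
    return total

small_net = count_dense_params(12, [64, 32])        # two hidden layers
large_net = count_dense_params(12, [128, 64, 32])   # deeper and wider

print(f"Small network: {small_net} parameters")
print(f"Large network: {large_net} parameters")
```

Adding one wider layer roughly quadruples the parameter count here, so on a small tabular dataset the larger network has much more room to memorise noise.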


## c) Challenges and Strategies
## 1. Data Quality and Quantity
### We might face issues with insufficient or poor-quality data, leading to inaccurate predictions and a lack of 
### model robustness. Real estate data can be noisy and may contain missing or inconsistent entries.
### Strategy: We would implement robust data cleaning and preprocessing techniques to handle missing values and outliers. Additionally, we might augment the dataset by sourcing more data from multiple platforms or using synthetic data generation methods.
## 2. Model Interpretability
### Deep learning models are often seen as black boxes, making it difficult for us to interpret how predictions 
### are made. This can be a barrier in industries like real estate, where stakeholders need to understand 
### the reasoning behind price predictions.
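### Strategy: We could use techniques such as SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable 
### Model-agnostic Explanations) to estimate how much each feature contributes to a prediction. 
### This can enhance trust and transparency in the model's predictions.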
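
SHAP and LIME need their own libraries, so the sketch below uses permutation importance instead, a simpler model-agnostic technique with a similar aim: shuffle one feature at a time and measure how much the error grows. It is a stand-in, not an implementation of SHAP or LIME. The function `permutation_importance_sketch`, the toy model, and the random data are all hypothetical.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MAE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(np.abs(y - predict(X)))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            increases.append(np.mean(np.abs(y - predict(X_perm))) - baseline)
        importances[j] = np.mean(increases)
    return importances

def toy_model(X):
    """Hypothetical 'trained model': relies heavily on feature 0, ignores feature 2."""
    return 5.0 * X[:, 0] + 1.0 * X[:, 1]

demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 3))
y_demo = toy_model(X_demo)

importances = permutation_importance_sketch(toy_model, X_demo, y_demo)
for idx, value in enumerate(importances):
    print(f"feature {idx}: MAE increase when shuffled = {value:.3f}")
```

Any object with a prediction function can be passed in place of `toy_model`, for example `lambda X: model.predict(X).ravel()` for a trained Keras model.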


In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    accuracy_score, precision_score, recall_score, f1_score,
)
from sklearn.ensemble import RandomForestClassifier
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# KerasTuner is imported as `keras_tuner`; the old `kerastuner` name is deprecated
from keras_tuner import HyperModel, RandomSearch

In [None]:
# Step 1: Load the dataset
data = pd.read_csv('Housing.csv')

In [None]:
#Extracting all variable names
data.columns.tolist()

In [None]:
data

In [None]:
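# Preview the first rows, then the column dtypes and non-null counts
# (only the last expression in a cell is displayed, so head() is printed explicitly)
print(data.head())
data.info()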

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
print("Missing Values:\n", missing_values)

In [None]:
# Fill numeric columns with their mean
numeric_columns = data.select_dtypes(include=['number']).columns
data[numeric_columns] = data[numeric_columns].fillna(data[numeric_columns].mean())

# Fill categorical columns with their mode
categorical_columns = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea', 'furnishingstatus']
for col in categorical_columns:
    # Assign back instead of using inplace=True on a column selection,
    # which newer pandas versions warn about and may not apply
    data[col] = data[col].fillna(data[col].mode()[0])


In [None]:
# Check for duplicates, report the count, then drop them
duplicates = data.duplicated().sum()
print(f'Duplicate Rows: {duplicates}')
data.drop_duplicates(inplace=True)

In [None]:
# Convert relevant columns to categorical
for col in categorical_columns:
    data[col] = data[col].astype('category')

# One-hot encode categorical variables
data = pd.get_dummies(data, drop_first=True)


In [None]:
# Step 2: Handle Outliers using IQR
Q1 = data['price'].quantile(0.25)
Q3 = data['price'].quantile(0.75)
IQR = Q3 - Q1

# Define outlier bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

In [None]:
# Remove outliers
data = data[(data['price'] >= lower_bound) & (data['price'] <= upper_bound)]

In [None]:
# Visualize After Removing Outliers
plt.figure(figsize=(10, 6))
sns.boxplot(y=data['price'])
plt.title('Box Plot of House Prices (After Outlier Removal)')
plt.ylabel('Price')
plt.show()


In [None]:
# Step 3: Data Visualization
# Distribution of house prices
plt.figure(figsize=(10, 6))
sns.histplot(data['price'], bins=30, kde=True)
plt.title('Distribution of House Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

In [None]:
# Setting figure size
plt.figure(figsize=(10, 5))

# Plotting the countplot for the number of bedrooms
ax = sns.countplot(x='bedrooms', data=data)
plt.xlabel("Number of Bedrooms")
plt.ylabel("Count")
plt.title("Data Distribution of Number of Bedrooms")

# Adding labels to each bar
for container in ax.containers:
    ax.bar_label(container)

# Display the plot
plt.show()

In [None]:
# Function to create a count plot for a given feature
def plot_count_distribution(data, feature):
    plt.figure(figsize=(10, 5))
    ax = sns.countplot(x=feature, data=data)
    plt.xlabel(f"Number of {feature.capitalize()}")  
    plt.ylabel("Count")
    plt.title(f"Data Distribution of Number of {feature.capitalize()}")

    # Adding labels to each bar
    for i in ax.containers:
        ax.bar_label(i,)
        
    plt.show()

# Plot for number of bedrooms
plot_count_distribution(data, 'bedrooms')

# Plot for number of bathrooms
plot_count_distribution(data, 'bathrooms')

# Plot for number of stories
plot_count_distribution(data, 'stories')

In [None]:
# Variant of the count plot without the "Number of" prefix, for features
# such as parking where that wording reads poorly. It has its own name so
# it does not overwrite plot_count_distribution defined above.
def plot_feature_distribution(data, feature):
    plt.figure(figsize=(10, 5))
    ax = sns.countplot(x=feature, data=data)
    plt.xlabel(feature.capitalize())
    plt.ylabel("Count")
    plt.title(f"Data Distribution of {feature.capitalize()}")

    # Adding labels to each bar
    for container in ax.containers:
        ax.bar_label(container)

    plt.show()

# Plot for parking
plot_feature_distribution(data, 'parking')


In [None]:
# Heatmap for correlation matrix
plt.figure(figsize=(12, 8))
correlation_matrix = data.corr()
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', square=True, cbar_kws={"shrink": .8})
plt.title('Correlation Heatmap')
plt.show()

In [None]:
# Confirm that no missing values remain after cleaning
print("\nMissing values after cleaning:")
print(data.isnull().sum())

In [None]:
data.info()

In [None]:
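# Encode 'furnishingstatus' only if it is still a raw column. The earlier
# pd.get_dummies(data, drop_first=True) call already one-hot encodes it,
# and encoding a column that no longer exists would raise a KeyError.
if 'furnishingstatus' in data.columns:
    data = pd.get_dummies(data, columns=['furnishingstatus'], drop_first=True)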

# Split data
X = data.drop('price', axis=1)
y = data['price']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)