<a href="https://colab.research.google.com/github/tirtha2016/Ml-Classification-_Mobile_Price_Range_Prediction/blob/main/Mobile_Price_Range_Prediction(Individual_copy).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - ***MOBILE PRICE RANGE PREDICTION :-***

##### **Project Type**    - Classification
##### **Contribution**    - Individual
##### **Team Member 1** - ***TIRTHA BOSE***

# **Project Summary -**

The mobile phone industry is fiercely competitive, and the price of a mobile phone is determined by multiple factors such as battery power, Bluetooth, camera quality, and screen size. This study investigates the factors that influence the price range of mobile phones. It uses a dataset of 21 variables (20 features plus the target) to predict the price range of a phone, which is categorized as low, medium, high, or very high.

Initially, the analysis focused on data wrangling, which involved handling missing values and checking unique values. During this stage, I discovered that two mobile phones had a pixel resolution height of 0 and 180 phones had a screen width of 0 cm. Since a screen width or pixel height of 0 is not physically meaningful, I replaced these 0 values with the respective column means. After this step, these columns contained no missing or invalid zero values.

After data wrangling, I performed exploratory data analysis (EDA). This showed that the four price ranges were equally represented in the dataset. Furthermore, I found a positive correlation between the battery capacity of a phone and its price range. The distribution of battery capacity also shifted gradually upward as the price range increased, implying that consumers may be willing to pay more for a mobile phone with a higher battery capacity. In terms of Bluetooth, I found that almost half of the devices had it, while the other half did not.

 From the scatter plot, it was evident that there was a positive correlation between RAM and price range. The majority of the data points were clustered towards the upper right corner, indicating that as the price range increased, so did the amount of RAM in the device. The study also discovered that the count of devices with dual sim was increasing for the highest price range. Furthermore, the distribution of primary camera megapixels across various target categories remained consistent, suggesting that this feature may not have a significant impact on the price range of mobile phones.

 Based on the analysis of screen size distribution among different target categories, it was observed that there was not a significant difference in the distribution. This suggests that screen size alone may not be the primary factor in determining target categories. However, this consistency in distribution can be beneficial for predictive modeling, as it indicates that screen size may not play a significant role in distinguishing between different target categories, enabling other features to have a more significant impact in determining the target categories. Additionally, the study revealed that mobile phones with higher price ranges were generally lighter in weight than those with lower price ranges.

 Following the exploratory data analysis (EDA), the study conducted hypothesis testing on three statements while handling outliers. During this process, the study identified that RAM, battery power, and pixel quality were the most significant factors influencing the price range of mobile phones. Afterward, the study engaged in feature engineering and utilized various machine learning models, such as

 1) Logistic regression,

 2) Random forest, and

 3) XGBoost.

 After conducting experiments, the study found that logistic regression and XGBoost algorithms with hyperparameter tuning delivered the most accurate results in predicting the price range of mobile phones.

In summary, the study discovered that the mobile phones in the dataset were separated into four distinct price ranges, each containing an equivalent number of elements. Roughly half of the devices in the dataset had Bluetooth, while the other half did not. Additionally, the study observed that as the price range increased, there was a gradual rise in battery power, and the amount of RAM in the device exhibited continuous growth from low-cost to very high-cost phones. Moreover, the study identified that expensive phones generally tended to be lighter than their lower-priced counterparts.

# **GitHub Link -**

https://github.com/tirtha2016/Ml-Classification-_Mobile_Price_Range_Prediction

# **Problem Statement**


**In the competitive mobile phone market, companies want to understand sales data of mobile phones and the factors that drive prices. The objective is to find relationships between the features of a mobile phone (e.g., RAM, internal memory) and its selling price. In this problem, we do not have to predict the actual price but a price range indicating how high the price is.**

Data Overview

* *Battery_power* - Total energy a battery can store at one time, measured in mAh

* *Blue* - Has bluetooth or not

* *Clock_speed* - speed at which microprocessor executes instructions

* *Dual_sim* - Has dual sim support or not

* *Fc* - Front Camera mega pixels

* *Four_g* - Has 4G or not

* *Int_memory* - Internal Memory in Gigabytes

* *M_dep* - Mobile Depth in cm

* *Mobile_wt* - Weight of mobile phone

* *N_cores* - Number of cores of processor

* *Pc* - Primary Camera mega pixels

* *Px_height* - Pixel Resolution Height

* *Px_width* - Pixel Resolution Width

* *Ram* - Random Access Memory in Mega Bytes

* *Sc_h* - Screen Height of mobile in cm

* *Sc_w* - Screen Width of mobile in cm

* *Talk_time* - longest time that a single battery charge will last when you are online

* *Three_g* - Has 3G or not

* *Touch_screen* - Has touch screen or not

* *Wifi* - Has wifi or not

* *Price_range* - This is the target variable, with values 0 (low cost), 1 (medium cost), 2 (high cost), and 3 (very high cost).

* Because the target variable has four categories, this is a multiclass classification problem.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import StackingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV

import warnings
warnings.filterwarnings("ignore")

### Dataset Loading

In [None]:
# Load Dataset
# Mounting Google drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Loading Mobile Price Range Dataset
mp_df = pd.read_csv('/content/drive/MyDrive/ML PROJECT/Ml  Classification Project/data_mobile_price_range.csv')
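The guidelines above mention exception handling, and the hard-coded Drive path is a likely failure point when the notebook runs elsewhere. The sketch below shows one possible way to fail early with a clear message. The helper name `load_price_data` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_price_data(csv_path):
    """Read the mobile price CSV, failing early with a clear message.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    path = Path(csv_path)
    # Check the path first so the error names the missing file explicitly
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    df = pd.read_csv(path)
    # An empty table would make every later step meaningless
    if df.empty:
        raise ValueError(f"Dataset at {path} contains no rows")
    return df
```

In the notebook, the call would take the same Drive path that is passed to `pd.read_csv` above.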

### Dataset First View

In [None]:
# Dataset First Look
# First 7 rows of the dataset
mp_df.head(7)

In [None]:
#Seven rows of the dataset from the bottom
mp_df.tail(7)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
mp_df.shape

### Dataset Information

In [None]:
# Dataset Info
mp_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# duplicated() flags every row that repeats an earlier row; summing counts them
duplicate_values_count = mp_df.duplicated().sum()

print("Number of duplicate rows:", duplicate_values_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
mp_df.isnull().sum()

In [None]:
# Visualizing the missing values
msno.bar(mp_df,
         fontsize=10,
         figsize=(7,4),
         color='magenta')
plt.title('Missing values')
plt.show()

### What did you know about your dataset?

**From the above analysis, we have learned the following about our dataset so far:**

*   Our dataset consists of 2000 rows and 21 columns.

*  It has no null or empty values.

*  It has no duplicate rows.

*  It contains two data types: floats and integers.
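The checks above can be gathered into a single summary. The following is a minimal sketch, assuming a hypothetical helper named `dataset_overview` and a small made-up frame standing in for `mp_df`.

```python
import pandas as pd


def dataset_overview(df):
    """Collect shape, missing values, duplicates and dtypes in one dictionary."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": sorted(str(t) for t in df.dtypes.unique()),
    }


# Toy frame standing in for mp_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "ram": [256, 512, 512, 1024],
    "clock_speed": [0.5, 1.2, 1.2, 2.0],
    "price_range": [0, 1, 1, 3],
})
overview = dataset_overview(toy_df)
print(overview)
```

Calling `dataset_overview(mp_df)` in the notebook would report the same facts listed in the bullets above.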

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
mp_df.columns

In [None]:
# Length of the columns
print(f'There are {len(mp_df.columns)} columns in this mobile price range dataset')

In [None]:
# Dataset Describe
# Summary statistics for every column, transposed for better visibility and analysis.
# Only the last expression in a cell is displayed, so a single call is used.
mp_df.describe(include='all').T

### Variables Description

1) Battery_power: Total energy a battery can store at one time, measured in mAh.

2) Blue: Has bluetooth or not.

3) Clock_speed: Speed at which microprocessor executes instructions.

4) Dual_sim: Has dual sim support or not.

5) Fc: Front Camera Mega Pixels.

6) Four_g: Has 4G or not.

7) Int_memory: Internal Memory in Gigabytes.

8) M_dep: Mobile Depth in cm.

9) Mobile_wt: Weight of mobile phone.

10) N_cores: Number of cores of processor.

11) Pc: Primary Camera Mega Pixels.

12) Px_height: Pixel Resolution Height.

13) Px_width: Pixel Resolution Width.

14) Ram: Random Access Memory in Megabytes.

15) Sc_h: Screen Height of mobile in cm.

16) Sc_w: Screen Width of mobile in cm.

17) Talk_time: longest time that a single battery charge will last when you are online.

18) Three_g: Has 3G or not.

19) Touch_screen: Has touch screen or not.

20) Wifi: Has wifi or not.

21) Price_range: This is the target variable with value of 0 (low cost), 1 (medium cost), 2 (high cost), 3 (very high cost).

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in mp_df.columns:
    unique_values = mp_df[column].unique()
    print(f"The Unique values for variable [{column}] are: {unique_values}")

In [None]:
# Checking the total number of Unique Values for each variable
mp_df.nunique()
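The unique-value counts can also be used to separate the yes/no flag columns from the remaining features, which helps when choosing chart types later. This is a sketch only: the helper name `split_by_cardinality` and the toy frame are assumptions made for illustration.

```python
import pandas as pd


def split_by_cardinality(df, target="price_range"):
    """Separate two-valued flag columns from the other feature columns."""
    n_unique = df.drop(columns=[target]).nunique()
    binary_cols = n_unique[n_unique == 2].index.tolist()
    other_cols = n_unique[n_unique != 2].index.tolist()
    return binary_cols, other_cols


# Toy frame standing in for mp_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "blue": [0, 1, 0, 1],
    "wifi": [1, 1, 0, 0],
    "ram": [256, 512, 1024, 2048],
    "price_range": [0, 1, 2, 3],
})
binary_cols, other_cols = split_by_cardinality(toy_df)
print("Binary columns:", binary_cols)
print("Other columns:", other_cols)
```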

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# It is not logical for a phone screen width or pixel height to have a value of 0, so we need to make sure to verify and address such instances to prevent any complications in our analysis
# Count of phones with sc_w = 0
sc_w_zero_count = (mp_df.sc_w == 0).sum()
print(f"Number of phones with sc_w = 0: {sc_w_zero_count}")

# Count of phones with px_height = 0
px_height_zero_count = (mp_df.px_height == 0).sum()
print(f"Number of phones with px_height = 0: {px_height_zero_count}")

In [None]:
# Replacing 0 values with the mean value
sc_w_mean = mp_df.sc_w.mean()
px_height_mean = mp_df.px_height.mean()

mp_df.sc_w = np.where(mp_df.sc_w == 0, sc_w_mean, mp_df.sc_w)
mp_df.px_height = np.where(mp_df.px_height == 0, px_height_mean, mp_df.px_height)

# Printing the updated dataframe
print(mp_df)

In [None]:
# Checking for the 0 values in the sc_w and px_height columns after the data wrangling

# Count of phones with sc_w = 0
sc_w_zero_count = (mp_df.sc_w == 0).sum()
print(f"Number of phones with sc_w = 0: {sc_w_zero_count}")

# Count of phones with px_height = 0
px_height_zero_count = (mp_df.px_height == 0).sum()
print(f"Number of phones with px_height = 0: {px_height_zero_count}")

## Duplicate Values

In [None]:
# Checking whether there are duplicates or not
print(f'There are {len(mp_df[mp_df.duplicated()])} duplicate values in the mobile price range data set')

## Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
mp_df.isnull().sum()

### What all manipulations have you done and insights you found?

## We observed the following insights:

i) I discovered that there are 2 phones in the dataset with a pixel height value of 0, and 180 phones with a screen width value of 0.


ii) It is illogical for a phone screen width or pixel height to have a value of 0, so it is necessary to identify and address these instances properly to prevent any potential problems in our analysis.


iii) The 0 values in these two columns have been replaced with their respective column mean values, so the table no longer contains these invalid entries. Therefore, our data is now prepared for data analysis.
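One caveat with the replacement code above is that each column mean is computed with the zeros still included, which pulls the replacement value slightly downward. A possible variant, sketched here on made-up toy values, takes the mean over the non-zero entries only. The helper name `replace_zeros_with_nonzero_mean` is hypothetical.

```python
import pandas as pd


def replace_zeros_with_nonzero_mean(df, columns):
    """Return a copy where zeros in `columns` become the mean of the non-zero values."""
    out = df.copy()
    for col in columns:
        is_valid = out[col] != 0
        # Mean over valid (non-zero) entries only, so the zeros do not bias it
        nonzero_mean = out.loc[is_valid, col].mean()
        out[col] = out[col].astype(float).where(is_valid, nonzero_mean)
    return out


# Toy frame standing in for mp_df (values are made up for illustration)
toy_df = pd.DataFrame({"sc_w": [0, 2, 4, 6], "px_height": [0, 100, 300, 500]})
cleaned = replace_zeros_with_nonzero_mean(toy_df, ["sc_w", "px_height"])
print(cleaned)
```

Working on a copy keeps the original frame unchanged, which makes it easy to compare the two imputation choices side by side.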

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### **UNIVARIATE ANALYSIS:-**

#### Chart - 1

## What is the distribution of battery power of different mobile phones?

In [None]:
# Chart - 1 visualization code
# displot is a figure-level function that creates its own figure,
# so the size is set through height/aspect instead of plt.figure()
sns.displot(mp_df["battery_power"], color='blue', edgecolor='black', linewidth=1,
            bins=20, height=5, aspect=1)
plt.xlabel('Battery Power')
plt.ylabel('Frequency')
plt.title('Distribution of Battery Power')
plt.show()

##### 1. Why did you pick the specific chart?

A displot is used here because it shows the univariate distribution of a variable, in this case as a histogram of how often each range of battery power occurs.

##### 2. What is/are the insight(s) found from the chart?

The plot illustrates the distribution of battery capacity in the dataset, measured in milliampere-hours (mAh). The distribution is almost uniform, with a slightly higher frequency in the lower battery power range. This suggests that phones with lower battery capacity are slightly more common in this dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The analysis of the graph indicates that there is a slight skew towards lower end phones in terms of frequency. This suggests that lower end phone models are produced more frequently. If a mobile phone manufacturer is able to create phones with higher battery capacity that are competitively priced, they may be able to attract more customers and generate more revenue. This information could also be used to guide marketing and advertising strategies, as companies can focus on promoting the battery capacity of their phones as a key selling point to potential customers.

#### Chart - 2

## What is the percentage of different classes of mobile price range?

In [None]:
# Chart - 2 visualization code
# Classes of Mobile Price Range
price_counts = mp_df['price_range'].value_counts()
plt.pie(price_counts, labels = price_counts.index, autopct='%1.1f%%', shadow=True, startangle=180, explode=(0.05,0.05,0.05,0.05),
       wedgeprops={"edgecolor":"0",'linewidth': 1,'linestyle': 'solid', 'antialiased': True})
plt.title('Price Range Distribution')
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart is used here because it shows each category's share of the whole at a single point in time. Unlike line graphs, pie charts do not show changes over time.

##### 2. What is/are the insight(s) found from the chart?

The four price range categories each account for an equal share (25%) of the dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

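From these insights, we can see that every price category is equally represented in the dataset. This balance is convenient for modeling, since no class dominates. It may also hint that demand is similar across categories, although the dataset alone cannot confirm that.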
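The class balance read off the pie chart can also be checked numerically. The snippet below is a small sketch on a made-up target column; in the notebook the same calls would be applied to `mp_df['price_range']`. The 1% tolerance is an arbitrary assumption.

```python
import pandas as pd

# Toy target column standing in for mp_df['price_range'] (made-up values)
toy_target = pd.Series([0, 1, 2, 3, 0, 1, 2, 3], name="price_range")

# Share of each class, ordered by class label
class_share = toy_target.value_counts(normalize=True).sort_index()
print(class_share)

# Treat the classes as balanced if the shares differ by less than 1 percentage point
is_balanced = (class_share.max() - class_share.min()) < 0.01
print("Balanced:", is_balanced)
```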

#### Chart - 3

## Is Bluetooth available or not?

In [None]:
# Chart - 3 visualization code
fig = plt.figure(1, figsize=(8,8))
# blue == 1 means the phone has Bluetooth; blue == 0 means it does not.
# The counts are listed in the same order as the labels below.
blue_data = [len(mp_df[mp_df.blue == 1]), len(mp_df[mp_df.blue == 0])]
blue_keys = ["Bluetooth_Available", "Bluetooth_Not_Available"]
explode = [0, 0.1]
palette_color = sns.color_palette('rocket_r')
plt.pie(blue_data, labels=blue_keys, colors=palette_color, explode=explode, autopct='%.0f%%', textprops={'fontsize': 12})
plt.title('Bluetooth Available OR Not Available')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is used here because it shows the share of phones with and without Bluetooth as percentages.


##### 2. What is/are the insight(s) found from the chart?

So we can see half the devices have Bluetooth, and half don’t.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Because the distribution of the Bluetooth feature is almost the same across all price ranges, it may not be helpful in making predictions.
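The claim that Bluetooth is spread similarly across price ranges can be inspected with a row-normalized cross-tabulation. This is a minimal sketch on made-up toy values; the column labels assigned at the end are chosen here for readability only.

```python
import pandas as pd

# Toy frame standing in for mp_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "blue":        [0, 1, 0, 1, 0, 1, 0, 1],
    "price_range": [0, 0, 1, 1, 2, 2, 3, 3],
})

# normalize="index" makes every row (price range) sum to 1
blue_share = pd.crosstab(toy_df["price_range"], toy_df["blue"], normalize="index")
blue_share.columns = ["No Bluetooth", "Bluetooth"]
print(blue_share)
```

If the `Bluetooth` column is nearly constant from row to row, the feature carries little information about the price range.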

#### Chart - 4  BIVARIATE ANALYSIS

## 3G And 4G Connectivity

In [None]:
# Chart - 4 visualization code
binary_features = ['four_g', 'three_g']
for feature in binary_features:
    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(8, 8))

    # Pie chart: overall share of phones with and without the feature.
    # sort_index() fixes the slice order to 0, 1 so the legend labels always match.
    mp_df[feature].value_counts().sort_index().plot.pie(autopct='%1.1f%%', ax=ax1, colors=palette_color,
                                                        shadow=True, labeldistance=None)
    ax1.set_title(f'Overall {feature} support')
    ax1.set_ylabel('')
    ax1.legend(['Does not Support', 'Support'])

    # Count plot: feature support split by price range
    sns.countplot(x=feature, hue='price_range', data=mp_df, ax=ax2, color='red')
    ax2.set_title('Distribution by price range')
    ax2.set_xlabel(feature)
    ax2.legend(['Low Cost', 'Medium Cost', 'High Cost', 'Very High Cost'])
    ax2.set_xticklabels(['Does not Support', 'Support'])
    plt.show()


##### 1. Why did you pick the specific chart?

Here I have used a pie chart and a bar graph to examine 3G and 4G support across the phones.

##### 2. What is/are the insight(s) found from the chart?

The distribution of price ranges is almost the same for phones with and without 4G support, so this feature is unlikely to be useful for prediction.
The feature 'three_g' appears to play a more important role in price prediction.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it will help us create a positive business impact.

#### Chart - 5

## Relationship between RAM and price range

In [None]:
# Chart - 5 visualization code
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Defining the colors for each price range
colors = ['cyan', 'magenta', 'yellow', 'black']

# Creating a colormap using the colors
cmap = mcolors.ListedColormap(colors)

# Creating the scatter plot
plt.scatter(mp_df['price_range'], mp_df['ram'], c = mp_df['price_range'], cmap = cmap)
plt.xlabel('Price Range')
plt.ylabel('RAM')
plt.xticks([0, 1, 2, 3])
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is commonly used to visualize the relationship between two continuous variables. It is particularly useful for understanding the distribution and patterns of data points and identifying any potential correlations or trends. Here the x-axis is the ordinal price range, so the points fall into four vertical bands, one per class.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot reveals a noticeable positive correlation between RAM and price range, as most of the data points gather towards the upper right corner. This implies that as the price range rises, there is a tendency for the device's RAM to also increase.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The observations derived from the scatter plot, such as the positive correlation between RAM and price range, hold significance for businesses. This information can be utilized by companies to strategize their product development and marketing efforts. For instance, they can leverage this insight to create and promote smartphones with higher RAM capacities, catering to customers who are willing to invest more, which may result in augmented revenue and profitability.
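The trend seen in the scatter plot can be backed by a per-class summary. The sketch below uses made-up toy values and a hypothetical helper, `summarize_by_price_range`; it is one possible way to quantify the pattern, not the notebook's own method.

```python
import pandas as pd


def summarize_by_price_range(df, feature, target="price_range"):
    """Minimum, median and maximum of `feature` within each price range."""
    return df.groupby(target)[feature].agg(["min", "median", "max"])


# Toy frame standing in for mp_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "ram":         [300, 700, 1200, 1600, 2100, 2600, 3200, 3900],
    "price_range": [0, 0, 1, 1, 2, 2, 3, 3],
})
ram_summary = summarize_by_price_range(toy_df, "ram")
print(ram_summary)

# True when the median rises from each price range to the next
median_increases = ram_summary["median"].is_monotonic_increasing
print("Median RAM rises with price range:", median_increases)
```

The same helper could be applied to other numeric columns, such as `battery_power` or `mobile_wt`.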

#### Chart - 6

## Relationship between pixel width / pixel height and price range

In [None]:
# Chart - 6 visualization code
# Setting up the figure and axes
fig, axs = plt.subplots(1, 2, figsize = (15, 5))

# Creating a kernel density estimate plot for the pixel width distribution for each price range
sns.kdeplot(data = mp_df, x = 'px_width', hue = 'price_range', fill = True, common_norm = False, palette = 'coolwarm', ax = axs[0])
axs[0].set_xlabel('Pixel Width')
axs[0].set_ylabel('Density')
axs[0].set_title('Pixel Width Distribution by Price Range')

# Creating a box plot of pixel width for each price range
sns.boxplot(data = mp_df, x = 'price_range', y = 'px_width', palette = 'coolwarm', ax = axs[1])
axs[1].set_xlabel('Price Range')
axs[1].set_ylabel('Pixel Width')
axs[1].set_title('Pixel Width by Price Range')

# Adjusting the layout and spacing
plt.tight_layout()

# Plotting the graph
plt.show()


## Pixel_height

In [None]:
# Setting up the figure and axes
fig, axs = plt.subplots(1, 2, figsize = (15, 5))

# Creating a kernel density estimate plot for the pixel height distribution for each price range
sns.kdeplot(data = mp_df, x = 'px_height', hue = 'price_range', fill = True, common_norm = False, palette = 'coolwarm', ax = axs[0])
axs[0].set_xlabel('Pixel Height')
axs[0].set_ylabel('Density')
axs[0].set_title('Pixel Height Distribution by Price Range')

# Creating a box plot of pixel height for each price range
sns.boxplot(data = mp_df, x = 'price_range', y = 'px_height', palette = 'coolwarm', ax = axs[1])
axs[1].set_xlabel('Price Range')
axs[1].set_ylabel('Pixel Height')
axs[1].set_title('Pixel Height by Price Range')

# Adjusting the layout and spacing
plt.tight_layout()

# Plotting the graph
plt.show()


##### 1. Why did you pick the specific chart?

A KDE plot estimates the probability density function of a continuous variable, here pixel width and pixel height. It provides a smooth curve that represents the distribution of each dimension within each price range.

A box plot summarizes the distribution of a numerical variable, showcasing key statistics such as the median, quartiles, and any outliers present.

##### 2. What is/are the insight(s) found from the chart?

The analysis of the pixel width distribution across different price ranges reveals that the relationship between pixel width and cost is not a linear progression. Specifically, mobile phones in the medium and high price ranges exhibit similar pixel widths, suggesting that pixel width alone may not be the sole determining factor in pricing mobile phones. Other factors, such as processor performance, camera quality, storage capacity, and brand reputation, likely influence the price range. Therefore, taking a comprehensive approach that considers multiple features is necessary to accurately determine the pricing and positioning of mobile phones in the market. Similarly, there is only minor variation in pixel height as we move from low-cost to high-cost devices, further supporting the notion that factors beyond pixel dimensions contribute to price differentiation.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The analysis of pixel height distribution across various price ranges offers valuable insights that can have a positive impact on businesses, particularly mobile phone manufacturers and marketers. These insights provide valuable information that manufacturers can use to enhance their product design and pricing strategies, aligning them with market demands and ultimately boosting sales. Similarly, marketers can leverage this knowledge to create targeted advertising campaigns and promotions that cater to the specific preferences of different consumer segments. By adapting their approaches based on the relationship between pixel height and price range, businesses can optimize their operations and achieve favorable outcomes in the competitive mobile phone market.

However, the limited variation in pixel height as we move across different price ranges can present a challenge for manufacturers and marketers. Since pixel height may not play a significant role in determining the price range of mobile phones, it becomes crucial for manufacturers and marketers to emphasize other distinguishing features such as processor performance, camera quality, storage capacity, and brand value. Focusing solely on pixel height to determine pricing could lead to stagnant growth and a lack of differentiation in a highly competitive market. Therefore, a comprehensive approach that considers multiple factors is necessary for accurate pricing and effective positioning of mobile phones, ensuring they meet the preferences and expectations of the target market.

#### Chart - 7

## Relationship between Wifi and price range

In [None]:
# Chart - 7 visualization code
# NOTE: the values below are hard-coded placeholders; nothing in this cell reads from mp_df.

# Illustrative price bands (not used by the plot below)
price_ranges = {
    'low': (0, 50),
    'medium': (51, 100),
    'high': (101, 200),
    'premium': (201, float('inf'))
}

# Hard-coded (simulated) WiFi availability for each price band
wifi_availabilities = {
    'low': True,
    'medium': True,
    'high': False,
    'premium': True
}

# Counting how many price bands are marked as having WiFi available or not
wifi_counts = {
    'available': sum(wifi_availabilities.values()),
    'unavailable': len(wifi_availabilities) - sum(wifi_availabilities.values())
}

# Visualizing the result as a pie chart
labels = ['WiFi available', 'WiFi unavailable']
sizes = list(wifi_counts.values())
colors = ['green', 'yellow']

fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=90, explode=(0.05,0.05), wedgeprops={"edgecolor":"0",'linewidth': 1,'linestyle': 'solid', 'antialiased': True})
ax.axis('equal')
plt.title('WiFi availability by price range')
plt.show()


##### 1. Why did you pick the specific chart?

The pie chart allows for a clear visualization of the distribution of WiFi availability by price range, making it suitable for conveying this particular type of data and comparison.

##### 2. What is/are the insight(s) found from the chart?

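Because the chart is built from hard-coded availability flags, with one of the four price ranges marked as lacking WiFi, it shows 25% unavailable and 75% available. These percentages reflect the placeholder values in the code rather than the `wifi` column of the dataset, so they should not be read as the actual WiFi availability.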
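To relate WiFi to price range using the data itself, the share of WiFi-equipped phones can be computed per price range. The snippet below is a sketch on made-up toy values; in the notebook the same `groupby` would be applied to `mp_df`.

```python
import pandas as pd

# Toy frame standing in for mp_df (values are made up for illustration)
toy_df = pd.DataFrame({
    "wifi":        [1, 0, 1, 1, 0, 0, 1, 1],
    "price_range": [0, 0, 1, 1, 2, 2, 3, 3],
})

# wifi is a 0/1 flag, so its mean within a group is the share of phones with WiFi
wifi_share = toy_df.groupby("price_range")["wifi"].mean()
print(wifi_share)
```

The resulting series has one value per price range and could be drawn as a bar chart to compare the ranges directly.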

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights derived from the visualization can have a positive impact on business by providing valuable information regarding WiFi availability in different price ranges. This information can guide companies in making informed decisions to enhance their competitiveness. For instance, if the analysis reveals that WiFi is lacking in a particular price range, the company can prioritize incorporating WiFi into their devices within that range to meet customer expectations and improve market positioning.

However, if the analysis indicates that WiFi is unavailable in the majority of price ranges, it could potentially result in negative growth. Customers may consider WiFi as an essential feature and opt for competitors' devices that offer WiFi connectivity. Hence, it is crucial to carefully consider market demand and customer preferences before making business decisions based on the insights obtained from the visualization.

#### Chart - 8

## Relationship between mobile weight and price range

In [None]:
# Chart - 8 visualization code
# Creating the figure and axes
fig, axs = plt.subplots(1, 2, figsize=(15, 5))

# Plot 1: Kernel density estimation plot
sns.kdeplot(data=mp_df, x='mobile_wt', hue='price_range', ax=axs[0])
axs[0].set_title('Distribution of Mobile Weight by Price Range')
axs[0].set(xlabel='Mobile Weight', ylabel='Density')

# Plot 2: Box plot
sns.boxplot(data=mp_df, x='price_range', y='mobile_wt', ax=axs[1])
axs[1].set_title('Mobile Weight Box Plot by Price Range')
axs[1].set(xlabel='Price Range', ylabel='Mobile Weight')

# Adjusting the spacing between subplots
plt.tight_layout()

# Showing the plot
plt.show()

##### 1. Why did you pick the specific chart?

By including both the KDE plot and the box plot side by side, we can gain a comprehensive understanding of the relationship between mobile weight and price range. The KDE plot offers a smooth representation of the overall distribution, while the box plot provides a concise summary and highlights any variations or outliers within each price range. Together, these visualizations provide insights into the distribution and characteristics of mobile weight across different price ranges, aiding in analyzing the relationship between the two variables.

##### 2. What is/are the insight(s) found from the chart?

An observation can be made that mobile phones with higher price ranges generally exhibit a lighter weight in comparison to mobile phones with lower price ranges.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis can have a positive impact on business by guiding product positioning and pricing strategies. By identifying the features that strongly influence the price range of mobile phones, businesses can prioritize and emphasize those aspects in their product design and marketing efforts. For instance, in the given observation where higher-priced phones tend to be lighter, a company can focus on lightweight designs for their high-end models.

However, it is important to note that relying excessively on a single feature to determine pricing may have limitations and potentially hinder growth. By solely focusing on one aspect, businesses may overlook the diverse preferences of customers and fail to address other important factors like brand value or customer service. To ensure sustainable growth and competitiveness, it is crucial to consider multiple factors and strike a balance in decision-making, incorporating a holistic approach that considers various aspects of the product and customer experience.

#### Chart - 9 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Checking for multi-collinearity
# Calculating the correlation matrix
correlation = mp_df.corr()

# Creating a heatmap of the correlation matrix
plt.figure(figsize=[20, 15])
sns.heatmap(correlation, cmap='viridis', annot=True, annot_kws={'fontsize': 10})
plt.title('Correlation Heatmap',fontsize=20)
plt.show()

##### 1. Why did you pick the specific chart?

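A correlation heatmap shows the pairwise correlation between all features at a glance, which makes it a convenient way to assess the presence of multicollinearity.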

##### 2. What is/are the insight(s) found from the chart?

The strong correlation between RAM and price_range is a positive indication for businesses, as it suggests that RAM plays a significant role in determining the price range of mobile phones.

However, there are instances of collinearity present in the data. Specifically, there is a correlation between the feature pairs ('pc', 'fc') and ('px_width', 'px_height'). These correlations are logical since a phone with a high-quality front camera is likely to have a high-quality primary camera, and an increase in pixel height generally corresponds to an increase in pixel width.
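
Reading correlated pairs off a 21 x 21 heatmap by eye is error-prone, so the pairs can also be listed programmatically. The snippet below is a minimal sketch, not the code used in this notebook: the helper `correlated_pairs`, the 0.5 threshold, and the small `demo` frame are assumptions made for illustration. With the project data, `mp_df` would be passed in place of `demo`.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is reported once
    # and the diagonal (self-correlation of 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (masked cells) never pass the comparison, so they are dropped here.
    return pairs[pairs > threshold].sort_values(ascending=False)


# Tiny synthetic frame with hypothetical values, not the project data
demo = pd.DataFrame({
    "fc": [1, 2, 3, 4, 5],
    "pc": [2, 4, 6, 8, 11],
    "wt": [5, 1, 4, 2, 3],
})
high_corr = correlated_pairs(demo, threshold=0.5)
print(high_corr)
```

The threshold is a judgment call; a lower value surfaces more pairs, and each one should be checked against domain knowledge before any feature is dropped or merged.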

To address this collinearity, one possible approach is to consider replacing the 'px_height' and 'px_width' features with a single feature representing the total number of pixels in the screen. However, it is essential to retain the separate 'fc' and 'pc' features, as they represent distinct aspects of the camera capabilities (front camera megapixels vs. primary camera megapixels) of the phone.
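
The pixel-count idea above could be sketched as follows. This is an illustrative assumption rather than the notebook's actual feature-engineering step: the helper `add_pixel_count`, the new column name `px_total`, and the sample values in `demo` are all hypothetical.

```python
import pandas as pd


def add_pixel_count(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where px_height and px_width are replaced by their product."""
    out = df.copy()
    # Total screen pixels; "px_total" is a hypothetical column name
    out["px_total"] = out["px_height"] * out["px_width"]
    # Drop the two source columns so the correlated pair no longer appears
    return out.drop(columns=["px_height", "px_width"])


# Small synthetic frame with made-up values, not the project data
demo = pd.DataFrame({
    "px_height": [20, 905, 1263],
    "px_width": [756, 1988, 1716],
    "fc": [1, 0, 2],
    "pc": [2, 6, 6],
})
demo_fe = add_pixel_count(demo)
print(demo_fe)
```

The 'fc' and 'pc' columns pass through untouched, in line with the decision to keep the two camera features separate.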

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words that contain digits

In [None]:
# Remove URLs & remove words that contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalance dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ML model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Load the saved model file again and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***