<img src="the_leagueAI_Logo.png" alt="the_leagueAI_Logo" width="200"/>

## 1. Introduction
Smartphones are an integral part of daily life. When deciding which phone to purchase, buyers take many factors into account, such as the display, processor, memory, camera, thickness, battery and connectivity. With so many options on the market, customers find it increasingly difficult to estimate whether the product they are purchasing is worth the cost.

Today you will attempt to solve this problem by building a machine learning model that predicts the cost of a smartphone from historical data on the key features of smartphones and their cost. A model capable of predicting the value of a new smartphone can be a powerful tool for customers who are looking to make the best investment when buying a new phone.

<img src="phones_image.jpeg" alt="phones_image" width="1000"/>

### Approach
A reminder of the general approach to working on a machine learning project:

1. Start by loading and viewing the dataset. Make sure you understand how the data looks, including the data types and value ranges.
2. Prepare the data: make sure that no values are missing and that the data is ready for your ML model to make predictions.
3. Build some intuition about your data by exploring the features.
4. Finally, build a machine learning model that can predict the price range of a smartphone.


### Notes
- The data is stored in a CSV file named train_mobil_data.csv
- The features present in this data are:

    * id: ID
    * battery_power: Total energy a battery can store at one time, measured in mAh
    * blue: Has Bluetooth or not
    * clock_speed: Speed at which the microprocessor executes instructions
    * dual_sim: Has dual SIM support or not
    * fc: Front camera megapixels
    * four_g: Has 4G or not
    * int_memory: Internal memory in gigabytes
    * m_dep: Mobile depth in cm
    * mobile_wt: Weight of mobile phone
    * n_cores: Number of processor cores
    * pc: Primary camera megapixels
    * px_height: Pixel resolution height
    * px_width: Pixel resolution width
    * ram: Random access memory in megabytes
    * sc_h: Screen height of mobile in cm
    * sc_w: Screen width of mobile in cm
    * talk_time: longest time that a single battery charge will last when you are
    * three_g: Has 3G or not
    * touch_screen: Has touch screen or not
    * wifi: Has Wi-Fi or not



## Data Load
Load the data from the file provided and inspect it.

In [None]:
# Import pandas
import pandas as pd

# Load dataset
dataset=pd.read_csv('../input/train.csv')

# Inspect data
dataset.head()

## Data Exploration
This is the process where you look at and understand your data using statistical and visualization methods. This step helps you identify patterns and problems in the dataset, and decide which model or algorithm to use in subsequent steps.

The steps you should consider in this stage include:

- Identify the input (features) and output (target) variables in your data
- Identify the types of data
- Identify categorical vs. continuous variables
- Understand the statistical properties of your variables
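
As a rough illustration of separating categorical from continuous variables, the sketch below treats any column with only a few distinct values as categorical. The helper `split_variable_types`, its `max_categories` threshold and the toy frame are hypothetical and only stand in for `dataset`; the right threshold depends on your data.

```python
import pandas as pd

def split_variable_types(df, max_categories=10):
    """Split column names into (categorical, continuous) by number of distinct values."""
    categorical = [col for col in df.columns if df[col].nunique() <= max_categories]
    continuous = [col for col in df.columns if col not in categorical]
    return categorical, continuous

# Toy frame mimicking a few columns of the dataset (values are illustrative only)
toy = pd.DataFrame({
    "blue": [0, 1, 0, 1, 1, 0],
    "ram": [512, 1024, 2048, 3072, 4096, 256],
    "price_range": [0, 1, 2, 3, 3, 0],
})

categorical_cols, continuous_cols = split_variable_types(toy, max_categories=4)
print("categorical:", categorical_cols)
print("continuous:", continuous_cols)
```

On the real data you would call the same helper on `dataset` and then sanity-check the result against the feature descriptions in the Notes section.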

In [None]:
# Code for data exploration
dataset.info()
dataset.describe()

## Data Visualization
We now have a basic idea of the data. Next, we extend that understanding with some visualizations.

We are going to look at two types of plots:

- Histograms, to get an idea of the distribution of each variable
- Scatter plots, to find correlations between variables

In [None]:
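# Code for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise relationships between all variables, coloured by price range
sns.pairplot(dataset,hue='price_range')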

# how is price affected by ram?
sns.jointplot(x='ram',y='price_range',data=dataset,color='red',kind='kde');

# how is price affected by internal memory?
sns.pointplot(y="int_memory", x="price_range", data=dataset)

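# Percentage of phones which support 3G
labels = ["3G-supported",'Not supported']
# Order the counts explicitly (1 = supported, 0 = not supported) so they match the labels
values=dataset['three_g'].value_counts().reindex([1, 0], fill_value=0).values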
fig1, ax1 = plt.subplots()
ax1.pie(values, labels=labels, autopct='%1.1f%%',shadow=True,startangle=90)
plt.show()

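# Percentage of phones that support 4G
labels4g = ["4G-supported",'Not supported']
# Order the counts explicitly (1 = supported, 0 = not supported) so they match the labels
values4g = dataset['four_g'].value_counts().reindex([1, 0], fill_value=0).values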
fig1, ax1 = plt.subplots()
ax1.pie(values4g, labels=labels4g, autopct='%1.1f%%',shadow=True,startangle=90)
plt.show()

# How is price affected by battery power
sns.boxplot(x="price_range", y="battery_power", data=dataset)

# Number of phones vs camera megapixels of front and primary camera
plt.figure(figsize=(10,6))
dataset['fc'].hist(alpha=0.5,color='blue',label='Front camera')
dataset['pc'].hist(alpha=0.5,color='red',label='Primary camera')
plt.legend()
plt.xlabel('MegaPixels')

# Talk time vs price range
sns.pointplot(y="talk_time", x="price_range", data=dataset)

## Data Preparation
You must now begin transforming the raw data so that it is ready to be run through your ML model.

- Identify features and target variables
- Modify the data types of each feature (if needed)
- Look for missing values, replace or remove
- Modify skewed variables
- Remove outliers
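
The missing-value and outlier steps above could be sketched as follows. This is one possible approach, not the required one: it assumes median imputation and the common 1.5 × IQR rule, and the helpers `missing_report` and `drop_iqr_outliers` as well as the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(df):
    """Return the number of missing values per column, for columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]

def drop_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value in `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Toy frame with one missing value and one extreme value (illustrative only)
toy = pd.DataFrame({
    "fc": [0, 2, 3, 4, 5, 40],
    "ram": [512, np.nan, 2048, 3072, 4096, 1024],
})

print(missing_report(toy))

# Replace the missing value with the column median
toy = toy.fillna({"ram": toy["ram"].median()})

# Remove rows where `fc` is an outlier under the IQR rule
cleaned = drop_iqr_outliers(toy, "fc")
print(len(toy), "rows before,", len(cleaned), "rows after")
```

Whether to drop or keep an outlier is a judgment call: an extreme value may be a data-entry error or a genuinely unusual phone.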

In [None]:
# Separate features and target variables
features = dataset.drop('price_range',axis=1)
target = dataset['price_range']

In [None]:
#

## Feature Engineering
You must now begin the process of extracting more information from existing data. You are not adding any new data here, but you are actually making the data you already have more useful.


The steps you should consider in this stage include:

- Developing new features apart from those already generated

- Selecting a set of features to remove

- Creating features using existing data through mathematical operations 

- Applying feature scaling

- Applying label encoding

- Understanding correlation between features and target
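
A minimal sketch of three of these steps on a toy frame: creating a feature through a mathematical operation, applying feature scaling, and checking correlation with the target. The derived column name `px_area` and the toy values are hypothetical; `StandardScaler` is just one of several scalers scikit-learn offers.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame mimicking a few columns of the dataset (values are illustrative only)
toy = pd.DataFrame({
    "px_height": [200, 400, 800, 1200, 1600],
    "px_width": [400, 800, 1000, 1600, 1900],
    "ram": [300, 900, 1800, 2900, 3900],
    "price_range": [0, 1, 2, 3, 3],
})

# New feature built from existing columns: total pixel count
toy["px_area"] = toy["px_height"] * toy["px_width"]

feature_cols = ["px_height", "px_width", "ram", "px_area"]

# Feature scaling: zero mean and unit variance per column
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(toy[feature_cols]),
    columns=feature_cols,
    index=toy.index,
)

# Pearson correlation of each feature with the target
correlations = toy[feature_cols].corrwith(toy["price_range"]).sort_values(ascending=False)
print(correlations)
```

In a real workflow the scaler is usually fitted on the training split only and then applied to the validation and test splits, so that no information leaks from unseen data.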




In [None]:
# Code for feature engineering

In [None]:
# Code for feature engineering

## Build Train and Test
To train a model, we first split the data into three sections: ‘training data’, ‘validation data’ and ‘testing data’.
You train the classifier using the ‘training data set’, tune the parameters using the ‘validation set’, and then test the performance of your classifier on the unseen ‘test data set’.

- Note: while training the classifier, only the training and/or validation set is available. The test data set must not be used during training; it is only used when testing the classifier.
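
One way to obtain all three sets is to call `train_test_split` twice: first to hold out the test set, then to carve a validation set out of what remains. The helper `three_way_split`, the 60/20/20 proportions and the synthetic arrays below are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(X, y, val_size=0.2, test_size=0.2, random_state=101):
    """Split X, y into train, validation and test sets (sizes are fractions of the whole)."""
    # First hold out the test set
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # Then take the validation set from the remainder, rescaling its fraction
    relative_val = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=relative_val, random_state=random_state, stratify=y_rest
    )
    return X_train, X_val, X_test, y_train, y_val, y_test

# Synthetic stand-in data: 100 samples, 2 features, 4 balanced classes
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.tile([0, 1, 2, 3], 25)

X_tr, X_val, X_te, y_tr, y_val, y_te = three_way_split(X_demo, y_demo)
print(len(X_tr), len(X_val), len(X_te))
```

Passing `stratify` keeps the class proportions similar in every split, which matters when some classes are rarer than others.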

In [None]:
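# Import train_test_split
from sklearn.model_selection import train_test_split

# Split the features and target defined earlier into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.33, random_state=101)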

## Building the Model

Now that the data has been processed, it is time to determine which model will be used to make our predictions.

Consider the following points before choosing a model:

- The type of prediction this project requires (classification/regression)
- How well you understand the model you want to use
- Previous performance of the model you choose on similar data
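
If you have no prior results to go on, cross-validation gives a quick comparison of candidate models. The sketch below compares the two classifiers used later in this notebook on synthetic data; the data from `make_classification`, the four classes and the variable names are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class problem standing in for the real training data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=4, random_state=101
)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=101),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=101),
}

# Mean accuracy over 5 cross-validation folds for each candidate
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")
```

On the real data you would pass `X_train` and `y_train` instead of the synthetic arrays, keeping the test set untouched.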



In [None]:
# Code to build the model
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train,y_train)
dtree.score(X_test,y_test)

In [None]:
feature_names=['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g',
       'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height',
       'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
       'touch_screen', 'wifi']

In [None]:
# For tree visualization: Kaggle doesn't support pydotplus, so install pydotplus from your system's conda terminal
'''
import pydotplus as pydot

from IPython.display import Image

# sklearn.externals.six was removed from scikit-learn; use the standard library instead
from io import StringIO

from sklearn import tree

dot_data = StringIO()

tree.export_graphviz(dtree, out_file=dot_data,feature_names=feature_names)

graph = pydot.graph_from_dot_data(dot_data.getvalue())

Image(graph.create_png())'''

In [None]:
#Another way
'''from IPython.display import Image  
from io import StringIO  
from sklearn.tree import export_graphviz
import pydot 
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
dot_data = StringIO()  
export_graphviz(dtree, out_file=dot_data,feature_names=feature_names,filled=True)

graph = pydot.graph_from_dot_data(dot_data.getvalue())  
Image(graph[0].create_png())'''  

In [None]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=200)
rfc.fit(X_train, y_train)

## Evaluating and Accuracy Metrics
<p>But how well does our model perform? </p>


In [None]:
# Data Accuracy metrics code

rfc.score(X_test,y_test)

In [None]:
pred = rfc.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix

In [None]:
print(classification_report(y_test,pred))

In [None]:
matrix=confusion_matrix(y_test,pred)
print(matrix)

In [None]:
plt.figure(figsize = (10,7))
sns.heatmap(matrix,annot=True)