# **Hands-on lab: Exploratory Data Analysis - Laptops Pricing dataset**


In this lab, you will use the skills you have acquired so far to explore how different features affect the price of laptops.


# Objectives

After completing this lab you will be able to:

 - Visualize individual feature patterns
 - Run descriptive statistical analysis on the dataset
 - Use groups and pivot tables to find the effect of categorical variables on price
 - Use Pearson Correlation to measure the interdependence between variables


# Setup


For this lab, we will be using the following libraries:

*   `skillsnetwork` for downloading the data.
*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.
*   [`scipy`](https://docs.scipy.org/doc/scipy/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for statistical operations.
*   [`seaborn`](https://seaborn.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for visualizing the data.
*   [`matplotlib`](https://matplotlib.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for additional plotting tools.


# Install Required Libraries

You can install the required libraries by running the `pip install` command with a `%` sign before it. In this environment, the `seaborn` library needs to be installed.


In [7]:
!python -m pip install --upgrade pip



### Importing Required Libraries


In [13]:
%pip install seaborn

Note: you may need to restart the kernel to use updated packages.


In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
%matplotlib inline

# Import the dataset

Use the modified version of the data set from the last module.
Run the following code blocks to read the CSV file from its URL into a pandas data frame.


In [19]:
filepath="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv"

In [20]:
df = pd.read_csv(filepath, header=0)


Print the first 5 entries of the dataset to confirm loading.


In [21]:
df.head(5)

|Unnamed: 0.2|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|Screen_Size_inch|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_pounds|Price|Price-binned|Screen-Full_HD|Screen-IPS_panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|0|Acer|4|2|1|5|14.0|0.551724|8|256|3.528|978|Low|0|1|
|1|1|1|Dell|3|1|1|3|15.6|0.689655|4|256|4.851|634|Low|1|0|
|2|2|2|Dell|3|1|1|7|15.6|0.931034|8|256|4.851|946|Low|1|0|
|3|3|3|Dell|4|2|1|5|13.3|0.551724|8|128|2.6901|1244|Low|0|1|
|4|4|4|HP|4|2|1|7|15.6|0.62069|8|256|4.21155|837|Low|1|0|


# Task 1 - Visualize individual feature patterns

### Continuous valued features
Generate regression plots for each of the parameters "CPU_frequency", "Screen_Size_inch" and "Weight_pounds" against "Price". Also, print the value of correlation of each feature with "Price".


In [None]:
# Write your code below and press Shift+Enter to execute
# CPU_frequency plot


In [None]:
# Write your code below and press Shift+Enter to execute
# Screen_Size_inch plot


In [None]:
# Write your code below and press Shift+Enter to execute
# Weight_pounds plot


In [None]:
# Correlation values of the three attributes with Price


Interpretation: "CPU_frequency" has a positive correlation of about 0.36 with the price of the laptops. The other two parameters are only weakly correlated with price.
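
As a rough illustration of this task, the sketch below draws one regression plot per feature with `sns.regplot` and collects each feature's correlation with "Price" using `DataFrame.corr()`. The helper name `plot_against_price` and the small `demo` frame are assumptions made for this example: the values are made up so the block runs on its own, and they will not reproduce the correlations of the real dataset. In the lab, pass the loaded `df` instead. In a notebook with `%matplotlib inline`, the figure renders when the cell finishes.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Tiny stand-in frame with made-up values (assumption, not the lab data).
# In the lab, use the `df` loaded earlier instead of `demo`.
demo = pd.DataFrame({
    "CPU_frequency": [0.55, 0.69, 0.93, 0.55, 0.62, 1.00],
    "Screen_Size_inch": [14.0, 15.6, 15.6, 13.3, 15.6, 14.0],
    "Weight_pounds": [3.53, 4.85, 4.85, 2.69, 4.21, 3.10],
    "Price": [978, 634, 946, 1244, 837, 1500],
})


def plot_against_price(data, features, target="Price"):
    """Draw one regression plot per feature (hypothetical helper).

    Returns a Series holding the Pearson correlation of each feature
    with the target column.
    """
    fig, axes = plt.subplots(
        1, len(features), figsize=(5 * len(features), 4), squeeze=False
    )
    for ax, feature in zip(axes[0], features):
        sns.regplot(x=feature, y=target, data=data, ax=ax)
        ax.set_ylim(bottom=0)  # start the price axis at zero
    fig.tight_layout()
    # Correlation matrix restricted to the target column, minus the target itself
    return data[features + [target]].corr()[target].drop(target)


price_corr = plot_against_price(
    demo, ["CPU_frequency", "Screen_Size_inch", "Weight_pounds"]
)
print(price_corr)
```

A value close to 1 or -1 points to a strong linear relationship, while a value near 0 suggests the feature alone says little about price.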


### Categorical features
Generate box plots for the features that hold categorical values: "Category", "GPU", "OS", "CPU_core", "RAM_GB" and "Storage_GB_SSD".


In [None]:
# Write your code below and press Shift+Enter to execute
# Category Box plot


In [None]:
# Write your code below and press Shift+Enter to execute
# GPU Box plot


In [None]:
# Write your code below and press Shift+Enter to execute
# OS Box plot

In [None]:
# Write your code below and press Shift+Enter to execute
# CPU_core Box plot

In [None]:
# Write your code below and press Shift+Enter to execute
# RAM_GB Box plot

In [None]:
# Write your code below and press Shift+Enter to execute
# Storage_GB_SSD Box plot
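
One possible way to approach the box plots is sketched below with `sns.boxplot`, looping over the categorical columns rather than writing one cell per feature. The helper name `box_plots` and the `demo` frame are assumptions for this example, and the values are made up so the block runs on its own. In the lab, call the helper on the loaded `df` with all six feature names.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in frame with made-up values (assumption, not the lab data).
demo = pd.DataFrame({
    "Category": [4, 3, 3, 4, 4, 3, 5, 5],
    "GPU": [2, 1, 1, 2, 2, 3, 3, 1],
    "Price": [978, 634, 946, 1244, 837, 1500, 1800, 1100],
})


def box_plots(data, features, target="Price"):
    """Draw one box plot of the target per categorical feature (hypothetical helper)."""
    fig, axes = plt.subplots(
        1, len(features), figsize=(5 * len(features), 4), squeeze=False
    )
    for ax, feature in zip(axes[0], features):
        # Each distinct value of the feature gets its own box
        sns.boxplot(x=feature, y=target, data=data, ax=ax)
    fig.tight_layout()
    return axes[0]


axes = box_plots(demo, ["Category", "GPU"])
```

Boxes whose price ranges barely overlap across categories hint that the feature may help separate cheap laptops from expensive ones.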

# Task 2 - Descriptive Statistical Analysis


Generate the statistical description of all the features being used in the data set. Include "object" data types as well.


In [None]:
# Write your code below and press Shift+Enter to execute
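
A minimal sketch of this step, assuming a small made-up `demo` frame in place of the lab's `df`: `describe()` alone summarizes only numeric columns, while `describe(include="all")` also covers text columns such as "Manufacturer". Passing `include=["object"]` is another common way to summarize only the text columns.

```python
import pandas as pd

# Stand-in frame with made-up values (assumption, not the lab data).
demo = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "Dell", "Dell", "HP"],
    "RAM_GB": [8, 4, 8, 8, 8],
    "Price": [978, 634, 946, 1244, 837],
})

# Numeric columns only: count, mean, std, min, quartiles, max
numeric_summary = demo.describe()

# All columns: adds unique, top (most frequent value) and freq for text columns
full_summary = demo.describe(include="all")

print(numeric_summary)
print(full_summary)
```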

# Task 3 - GroupBy and Pivot Tables

Group the parameters "GPU", "CPU_core" and "Price" to make a pivot table, and visualize this connection using a pcolor plot.


In [None]:
# Write your code below and press Shift+Enter to execute
# Create the group

In [None]:
# Write your code below and press Shift+Enter to execute
# Create the Pivot table

In [None]:
# Write your code below and press Shift+Enter to execute
# Create the Plot
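
The three steps can be sketched roughly as follows. This example assumes the mean is the aggregate of interest, which the task text does not state, and it uses a made-up `demo` frame so the block runs on its own. In the lab, start from the loaded `df`. Combinations of "GPU" and "CPU_core" that never occur in the real data would show up as `NaN` cells in the pivot table.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in frame with made-up values (assumption, not the lab data).
demo = pd.DataFrame({
    "GPU": [1, 1, 2, 2, 1, 2],
    "CPU_core": [5, 7, 5, 7, 5, 7],
    "Price": [800, 1200, 1000, 1500, 900, 1700],
})

# Step 1: group by both categorical features and average the price
grouped = demo.groupby(["GPU", "CPU_core"], as_index=False)["Price"].mean()

# Step 2: reshape so that rows are GPU values and columns are CPU_core values
pivot = grouped.pivot(index="GPU", columns="CPU_core", values="Price")
print(pivot)

# Step 3: draw the pivot table as a colored grid
fig, ax = plt.subplots()
mesh = ax.pcolor(pivot.to_numpy(), cmap="RdBu")

# Center the tick labels on each cell
ax.set_xticks(np.arange(pivot.shape[1]) + 0.5)
ax.set_xticklabels(list(pivot.columns))
ax.set_yticks(np.arange(pivot.shape[0]) + 0.5)
ax.set_yticklabels(list(pivot.index))
ax.set_xlabel("CPU_core")
ax.set_ylabel("GPU")
fig.colorbar(mesh, ax=ax, label="Mean Price")
```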

# Task 4 - Pearson Correlation and p-values

Use the `scipy.stats.pearsonr()` function to evaluate the Pearson Coefficient and the p-values for each parameter tested above. This will help you determine the parameters most likely to have a strong effect on the price of the laptops.


In [None]:
# Write your code below and press Shift+Enter to execute
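
A possible shape for this step is sketched below: loop over the features, call `stats.pearsonr` on each one against "Price", and collect the coefficient and p-value in a small table. The helper name `pearson_table` and the made-up `demo` frame are assumptions for this example. In the lab, pass the loaded `df` and the full list of features tested in Task 1.

```python
import pandas as pd
from scipy import stats

# Stand-in frame with made-up values (assumption, not the lab data).
demo = pd.DataFrame({
    "RAM_GB": [8, 4, 8, 16, 8, 16],
    "CPU_frequency": [0.55, 0.69, 0.93, 0.55, 0.62, 1.00],
    "Price": [978, 634, 946, 1244, 837, 1500],
})


def pearson_table(data, features, target="Price"):
    """Return the Pearson coefficient and p-value of each feature vs. the target (hypothetical helper)."""
    rows = []
    for feature in features:
        coef, p_value = stats.pearsonr(data[feature], data[target])
        rows.append(
            {"feature": feature, "pearson_r": float(coef), "p_value": float(p_value)}
        )
    return pd.DataFrame(rows).set_index("feature")


result = pearson_table(demo, ["RAM_GB", "CPU_frequency"])
print(result)
```

A small p-value suggests the observed correlation is unlikely to be due to chance alone. The coefficient itself indicates how strong and in which direction the linear relationship is.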

# END

## Thank you for listening


<!--
## Change Log

|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-09-15|0.1|Abhishek Gagneja|Initial Version Created|
|2023-09-18|0.2|Vicky Kuo|Reviewed and Revised|
-->
