# AeroFit - Treadmill Buyer Profile

## Table of Contents
* [Assignment](#Assignment)
* [Data Description](#Data-Description)
* [Data Exploration](#Data-Exploration)

## Assignment

The market research team at AeroFit wants to identify the characteristics of the target audience for each type of treadmill the company offers, so that it can recommend the right treadmill to new customers. The team decides to investigate whether customer characteristics differ across the products.

Perform descriptive analytics to create a customer profile for each AeroFit treadmill product by developing appropriate tables and charts. For each AeroFit treadmill product, construct two-way contingency tables and compute all conditional and marginal probabilities along with their insights/impact on the business.

Product Portfolio:

* The KP281 is an entry-level treadmill that sells for `$1,500`;
* The KP481 is for mid-level runners and sells for `$1,750`;
* The KP781 has advanced features and sells for `$2,500`.

## Data Description

The company collected data on individuals who purchased a treadmill from AeroFit stores during the prior three months. The dataset in `aerofit_treadmill_data.csv` has the following features:

* `Product` - product purchased: KP281, KP481, or KP781


* `Age` - in years


* `Gender` - male/female


* `Education` - in years


* `MaritalStatus` - single or partnered


* `Usage` - the average number of times the customer plans to use the treadmill each week


* `Fitness` - self-rated fitness on a 1-5 scale, where 1 is poor shape and 5 is excellent shape


* `Income` - annual income in US dollars


* `Miles` - the average number of miles the customer expects to walk/run each week

## Data Exploration

In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import warnings
warnings.filterwarnings("ignore")

In [2]:
aerofit_treadmill_data = pd.read_csv('aerofit_treadmill_data.csv')
aerofit_treadmill_data.head(3)

Unnamed: 0,Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
0,KP281,18,Male,14,Single,3,4,29562,112
1,KP281,19,Male,15,Single,2,3,31836,75
2,KP281,19,Female,14,Partnered,4,3,30699,66


In [4]:
aerofit_treadmill_data.shape

(180, 9)

In [8]:
aerofit_treadmill_data.dtypes

Product          object
Age               int64
Gender           object
Education         int64
MaritalStatus    object
Usage             int64
Fitness           int64
Income            int64
Miles             int64
dtype: object

In [9]:
aerofit_treadmill_data['Product'] = aerofit_treadmill_data['Product'].astype('category')

aerofit_treadmill_data['Gender'] = aerofit_treadmill_data['Gender'].astype('category')

aerofit_treadmill_data['MaritalStatus'] = aerofit_treadmill_data['MaritalStatus'].astype('category')

In [10]:
aerofit_treadmill_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   Product        180 non-null    category
 1   Age            180 non-null    int64   
 2   Gender         180 non-null    category
 3   Education      180 non-null    int64   
 4   MaritalStatus  180 non-null    category
 5   Usage          180 non-null    int64   
 6   Fitness        180 non-null    int64   
 7   Income         180 non-null    int64   
 8   Miles          180 non-null    int64   
dtypes: category(3), int64(6)
memory usage: 9.5 KB


Skewness is a statistical measure of the asymmetry of a data distribution. Its sign gives the direction of the longer tail and its magnitude gives the extent of the departure from symmetry. A normal distribution is symmetric and has a skewness of 0.

* If the skewness is less than 0, the data is said to be left-skewed, meaning that the left tail is longer or fatter than the right tail.


* If the skewness is greater than 0, the data is right-skewed, meaning that the right tail is longer or fatter than the left tail.


* If the skewness is close to 0, the distribution is roughly symmetrical, although that alone does not imply that it follows a normal distribution.
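The three cases above can be illustrated with a small, dependency-free sketch. The function `sample_skewness` and the toy samples are hypothetical names and values introduced only for this illustration; the formula is the bias-corrected (adjusted Fisher-Pearson) estimator, which, to the best of my understanding, is also what pandas reports by default.

```python
import math


def sample_skewness(values):
    """Bias-corrected (adjusted Fisher-Pearson) sample skewness; needs n >= 3."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Hypothetical toy samples, not taken from the AeroFit data.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # one large value stretches the right tail
left_skewed = [-v for v in right_skewed]    # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name}: {sample_skewness(sample):+.3f}")
```

Mirroring a sample flips the sign of its skewness without changing the magnitude, which is why the left-skewed sample is built as the negation of the right-skewed one.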

In [13]:
# Restrict to numeric columns; the category columns have no skewness.
aerofit_treadmill_data.skew(numeric_only=True)

Age          0.982161
Education    0.622294
Usage        0.739494
Fitness      0.454800
Income       1.291785
Miles        1.724497
dtype: float64

In [14]:
aerofit_treadmill_data.describe(include = 'all')

Unnamed: 0,Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
count,180,180.0,180,180.0,180,180.0,180.0,180.0,180.0
unique,3,,2,,2,,,,
top,KP281,,Male,,Partnered,,,,
freq,80,,104,,107,,,,
mean,,28.788889,,15.572222,,3.455556,3.311111,53719.577778,103.194444
std,,6.943498,,1.617055,,1.084797,0.958869,16506.684226,51.863605
min,,18.0,,12.0,,2.0,1.0,29562.0,21.0
25%,,24.0,,14.0,,3.0,3.0,44058.75,66.0
50%,,26.0,,16.0,,3.0,3.0,50596.5,94.0
75%,,33.0,,16.0,,4.0,4.0,58668.0,114.75


In [15]:
aerofit_treadmill_data.isna().sum()

Product          0
Age              0
Gender           0
Education        0
MaritalStatus    0
Usage            0
Fitness          0
Income           0
Miles            0
dtype: int64

In [17]:
aerofit_treadmill_data.duplicated(subset=None, keep='first').sum()

0

#### Observations:

* There are no missing values in the data.


* There are no duplicated rows in the data.


* There are 3 unique products in the dataset.


* KP281 is the most frequent product.


* Customer age ranges from 18 to 50 with a mean of 28.79, and 75% of customers are 33 or younger.


* The median and the 75th percentile of `Education` are both 16 years, so at least 75% of customers have 16 or fewer years of education.


* Of the 180 customers, 104 are male and the remaining 76 are female.


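* The standard deviations of `Income` (about 16,507) and `Miles` (about 51.9) are large relative to their means, and both variables are right-skewed (skewness 1.29 and 1.72), so they might contain outliers.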
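One common way to follow up on the outlier suspicion is the 1.5 × IQR rule (Tukey fences). The sketch below is an assumption about how such a check could look, not part of the original analysis: the helper `tukey_fences` is a hypothetical name, and the quartiles are copied from the `describe()` output above.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as potential outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 25% and 75% quartiles taken from the describe() output.
quartiles = {
    "Income": (44058.75, 58668.0),
    "Miles": (66.0, 114.75),
}

fences = {name: tukey_fences(q1, q3) for name, (q1, q3) in quartiles.items()}

for name, (low, high) in fences.items():
    print(f"{name}: flag values below {low:.3f} or above {high:.3f}")
```

The fences only define the thresholds. Whether any rows actually fall outside them has to be checked against the data itself.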

In [20]:
aerofit_treadmill_data['Product'].value_counts()

KP281    80
KP481    60
KP781    40
Name: Product, dtype: int64

In [21]:
aerofit_treadmill_data['MaritalStatus'].value_counts()

Partnered    107
Single        73
Name: MaritalStatus, dtype: int64

In [22]:
aerofit_treadmill_data['Gender'].value_counts()

Male      104
Female     76
Name: Gender, dtype: int64