<a href="https://colab.research.google.com/github/zuhahassen/KWK-2021/blob/main/Descriptive_Statistics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📈 **Descriptive Statistics** 📈

First, go to `File > Save a copy in Drive` to make your own copy of this notebook. Next, open your copy and go to `Edit > Clear all outputs`, so that no earlier output is shown before you run the code yourself.


### **Importing Packages**

In [None]:
# importing packages 
import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

## 📏 **Measures of Central Tendency**

#### **Mean**



Suppose 12 people went to a restaurant and all ordered at least one item. Below is a dataset that describes the number of items each person ordered and the total price of their order.





In [None]:
# creating our dataframe
order = pd.DataFrame({'item_price':[2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25, 4.45, 8.75],
                   'items_ordered':[1,2,2,1,4,3,1,4,3,3,2,3 ]})

# looking at our data
order.head(12)

index,item_price,items_ordered
0,2.39,1
1,3.39,2
2,3.39,2
3,2.39,1
4,16.98,4
5,10.98,3
6,1.69,1
7,11.75,4
8,9.25,3
9,9.25,3
10,4.45,2
11,8.75,3


In [None]:
# calculating the mean 
order["item_price"].mean()

7.055000000000001
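
As a quick cross-check, the mean is just the sum of the values divided by how many values there are. The sketch below recomputes it in plain Python; the names `prices` and `manual_mean` are only for illustration and are not part of the notebook above.

```python
# Same prices as the item_price column, copied into a plain list (illustrative name)
prices = [2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25, 4.45, 8.75]

# mean = sum of all values / number of values
manual_mean = sum(prices) / len(prices)

print(round(manual_mean, 3))
```

Rounded to three decimals, this should agree with the result of `order["item_price"].mean()`.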

#### **Median**





In [None]:
# calculating the median 
order["item_price"].median()

6.6
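
To see where this number comes from, here is a minimal sketch that finds the median by hand: sort the values, then take the middle one (or the average of the two middle ones when the count is even). The variable names are assumptions made for this example.

```python
# Same prices as the item_price column (illustrative name)
prices = [2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25, 4.45, 8.75]

sorted_prices = sorted(prices)
n = len(sorted_prices)
mid = n // 2

if n % 2 == 0:
    # even count: average the two middle values
    manual_median = (sorted_prices[mid - 1] + sorted_prices[mid]) / 2
else:
    # odd count: take the single middle value
    manual_median = sorted_prices[mid]

print(manual_median)
```

With 12 values, the two middle values are 4.45 and 8.75, so the median lands halfway between them.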

#### **Mode**

In [None]:
# calculating the mode 
order["item_price"].mode()

0    2.39
1    3.39
2    9.25
Name: item_price, dtype: float64
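
The result has three rows because three prices are tied for most frequent (each appears twice). As a sketch of the same idea outside pandas, the standard library's `statistics.multimode` (available in Python 3.8 and later) also returns every tied value; the name `prices` is only for illustration.

```python
import statistics

# Same prices as the item_price column (illustrative name)
prices = [2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25, 4.45, 8.75]

# multimode returns a list of all values that share the highest count
modes = statistics.multimode(prices)

print(sorted(modes))
```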

## 📐 **Measures of Spread** 


### ☯️ **Variance**


**Calculating variance using `np.var()`** 

In [None]:
np.var(order["item_price"], ddof=1) # ddof=1 calculates the sample variance (divides by n - 1); the default ddof=0 gives the population variance

23.129227272727277
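
The `ddof` argument ("delta degrees of freedom") controls the divisor: NumPy divides the sum of squared deviations by `n - ddof`. The sketch below compares both settings on the same prices; the variable names are assumptions for this example.

```python
import numpy as np

# Same prices as the item_price column (illustrative name)
prices = np.array([2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25, 4.45, 8.75])

# ddof=1 divides by n - 1 (sample variance)
sample_var = np.var(prices, ddof=1)

# ddof=0 (NumPy's default) divides by n (population variance)
population_var = np.var(prices, ddof=0)

print(sample_var, population_var)
```

The population variance is always a little smaller than the sample variance because it divides by a larger number. Note that pandas uses `ddof=1` by default in `.var()` and `.std()`, while NumPy uses `ddof=0`.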

### 📊 **Standard Deviation**


In [None]:
# calculated by taking the square root of the variance 
np.sqrt(np.var(order['item_price'], ddof=1))

4.809285526221881

We can also get the same result using the `np.std` function:

In [None]:
# calculated using np.std function 
np.std(order['item_price'], ddof=1)

4.809285526221881

### 💯 **Percentiles**



#### **Quantiles**



In [None]:
np.quantile(order['item_price'], 0.5)

6.6

#### **Quartiles**



In [None]:
np.quantile(order["item_price"], [0,0.25,0.5,0.75,1])

array([ 1.69  ,  3.14  ,  6.6   ,  9.6825, 16.98  ])
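
Quantiles and percentiles describe the same cut points on different scales: the 0.25 quantile is the 25th percentile. As a small sketch, `np.percentile` takes values from 0 to 100 where `np.quantile` takes values from 0 to 1; the variable names here are only for illustration.

```python
import numpy as np

# Same prices as the item_price column (illustrative name)
prices = np.array([2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25, 4.45, 8.75])

# quantile levels run from 0 to 1
q = np.quantile(prices, [0.25, 0.5, 0.75])

# percentile levels run from 0 to 100
p = np.percentile(prices, [25, 50, 75])

print(q)
print(p)
```

Both calls should print the same three values, which are the first quartile, the median, and the third quartile.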

### ↔️ **Ranges**



#### **Interquartile Range**

In [None]:
np.quantile(order["item_price"], 0.75) - np.quantile(order["item_price"], 0.25)

6.5425

We can also get the same result with the `iqr` function from the `scipy.stats` module. You can learn more about it [here](https://docs.scipy.org/doc/scipy/reference/stats.html).

In [None]:
from scipy.stats import iqr

iqr(order['item_price'])

6.5425

This means that the middle 50% of the item prices span a range of about 6.54, from the first quartile (3.14) to the third quartile (9.6825).

## 👩‍💻 **A Shortcut**

In [None]:
order.describe()

statistic,item_price,items_ordered
count,12.0,12.0
mean,7.055,2.416667
std,4.809286,1.083625
min,1.69,1.0
25%,3.14,1.75
50%,6.6,2.5
75%,9.6825,3.0
max,16.98,4.0
