## Descriptive Statistics


Import **NumPy**, **SciPy**, and **Pandas**

In [3]:
import numpy as np
from scipy import stats
import pandas as pd 

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`.

In [4]:
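# loc = mean, scale = standard deviation, size = number of draws
# (no seed is set, so the values below change on every run)
samples = np.random.normal(loc=100, scale=15, size=1000)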
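Because `np.random.normal()` draws from NumPy's global random state without a seed, the outputs in this notebook will differ on every run. A minimal sketch of a reproducible alternative, assuming NumPy 1.17 or later for the `Generator` API; the seed value 42 is arbitrary:

```python
import numpy as np

# A seeded Generator returns the same draws on every run (seed 42 is arbitrary)
rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

# The sample statistics should land close to the population parameters
print(samples.shape)  # (1000,)
print(round(samples.mean(), 1))
print(round(samples.std(), 1))
```

Re-creating the generator with the same seed reproduces the same 1,000 values, which keeps the printed statistics stable between runs.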

Compute the **mean**, **median**, and **mode**

In [5]:
mean = np.mean(samples)
mean

100.45919120509596

In [6]:
median = np.median(samples)
median

100.88857261949369

In [7]:
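# In continuous data every value is (almost surely) unique, so each value has
# a count of 1 and stats.mode returns the smallest one, i.e. the minimum
mode = stats.mode(samples)
mode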

ModeResult(mode=array([52.17726022]), count=array([1]))
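The mode only becomes informative when values repeat. A small illustrative example on made-up discrete data shows all three measures side by side, and how a single large value pulls the mean but leaves the median and mode alone:

```python
import numpy as np
from scipy import stats

# Illustrative discrete data with one large value
data = np.array([1, 2, 2, 3, 10])

mean = np.mean(data)            # (1 + 2 + 2 + 3 + 10) / 5 = 3.6
median = np.median(data)        # middle of the sorted values = 2.0
mode_result = stats.mode(data)  # 2 is the most frequent value (appears twice)

print(mean, median)  # 3.6 2.0
print(mode_result)
```

The exact printed form of `mode_result` depends on the SciPy version (older releases wrap the values in arrays, newer ones return scalars).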

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [8]:
np.min(samples)

52.17726022162607

In [9]:
np.max(samples)

152.98799050974134

In [10]:
# First quartile (Q1, 25th percentile)
np.percentile(samples, 25)

89.78817371254117

In [11]:
# Third quartile (Q3, 75th percentile)
np.percentile(samples, 75)

110.14209447542869

In [12]:
iqr = np.percentile(samples, 75) - np.percentile(samples, 25)
iqr

20.353920762887526
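SciPy also provides `scipy.stats.iqr()`, which by default uses the 25th to 75th percentile range with linear interpolation, so it should agree with the percentile difference above. A quick check on freshly drawn example data (seed 0 is arbitrary), also showing that `np.percentile()` accepts a list of percentiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(100, 15, 1000)

# Both quartiles in a single call
q1, q3 = np.percentile(samples, [25, 75])
iqr_manual = q3 - q1

# SciPy's helper computes the same 25th-75th percentile range by default
iqr_scipy = stats.iqr(samples)

print(np.isclose(iqr_manual, iqr_scipy))  # True
```

For a normal distribution the IQR is about 1.349 standard deviations, roughly 20.2 when the standard deviation is 15, which is consistent with the value computed above.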

Compute the **variance** and **standard deviation**. By default NumPy computes the population versions, which divide by *n*.

In [13]:
np.var(samples)

213.68347136100522

In [14]:
np.std(samples)

14.617916108700488
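`np.var()` and `np.std()` use `ddof=0` by default, treating the data as the whole population. For a sample estimate, pass `ddof=1` to divide by *n* - 1 instead (Bessel's correction). Pandas' `Series.var()` and `Series.std()` default to `ddof=1`, so NumPy and Pandas can report slightly different values for the same data. A small example that can be checked by hand:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # mean = 5, sum of squared deviations = 32

pop_var = np.var(data)              # 32 / 8 = 4.0
sample_var = np.var(data, ddof=1)   # 32 / 7, about 4.571
pop_std = np.std(data)              # sqrt(4.0) = 2.0
sample_std = np.std(data, ddof=1)   # sqrt(32 / 7), about 2.138

print(pop_var, pop_std)  # 4.0 2.0
```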

Compute the **skewness** and **kurtosis**

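You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html). By default, `kurtosis()` returns the *excess* (Fisher) kurtosis, so normally distributed data should give a value close to 0.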

In [15]:
# `stats` was already imported above; only the two functions are new
from scipy.stats import kurtosis, skew

In [16]:
kurtosis(samples)

-0.08389779151987664

In [17]:
skew(samples)

0.05089832305359444

In [18]:
# Same function, accessed through the stats namespace
skewness = stats.skew(samples)
skewness

0.05089832305359444

In [19]:
# Named `kurt` so the imported kurtosis() function is not shadowed
kurt = stats.kurtosis(samples)
kurt

-0.08389779151987664
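With their default settings (`bias=True` for both, and `fisher=True` for `kurtosis()`), these functions return the standardized third and fourth central moments, with 3 subtracted from the latter so that a normal distribution scores 0. A sketch that computes both from the moment formulas and compares them with SciPy, using freshly drawn example data (seed 1 is arbitrary):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(1)
samples = rng.normal(100, 15, 1000)

# k-th central moment: m_k = mean((x - mean(x)) ** k)
dev = samples - samples.mean()
m2 = np.mean(dev**2)
m3 = np.mean(dev**3)
m4 = np.mean(dev**4)

skew_manual = m3 / m2**1.5      # g1 = m3 / m2^(3/2)
kurt_manual = m4 / m2**2 - 3    # g2 = m4 / m2^2 - 3 (excess kurtosis)

print(np.isclose(skew_manual, skew(samples)))      # True
print(np.isclose(kurt_manual, kurtosis(samples)))  # True
```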

## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [21]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

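Then create a second array `y` containing 10 arbitrary integers. You can list them in `np.array()`, or, as here, draw them at random with `np.random.randint()`, which already returns a NumPy array.

In [22]:
# 10 random integers from 0 (inclusive) to 100 (exclusive)
y = np.random.randint(0, 100, 10)
y
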
array([10, 29, 36, 42, 56, 47, 87, 50, 54, 46])

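Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`. It returns the 2×2 correlation matrix: the diagonal entries are each array's correlation with itself (always 1), and the off-diagonal entry `r[0, 1]` is Pearson's *r*.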

In [23]:
r = np.corrcoef(x, y)
r

array([[1.        , 0.64102472],
       [0.64102472, 1.        ]])
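Pearson's *r* is the sum of the products of the deviations from the mean, divided by the square root of the product of the sums of squared deviations: r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √(Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²). A minimal sketch that computes it directly and checks it against `np.corrcoef()`, using the `y` values printed above as fixed numbers instead of a new random draw:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([10, 29, 36, 42, 56, 47, 87, 50, 54, 46])  # the values printed above

# Deviations from each array's mean
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Off-diagonal entry of the 2x2 correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(round(r_manual, 4), round(r_numpy, 4))  # 0.641 0.641
```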

## Pandas Correlation Calculation

Run the code below to create two Pandas Series, `x` and `y`.

In [24]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Calculate Pearson's *r*. SciPy's `pearsonr()` returns the coefficient together with its p-value; the Pandas method `Series.corr()` returns the coefficient alone.

In [25]:
from scipy.stats import pearsonr

In [27]:
r = pearsonr(x, y)
r

(0.758640289091187, 0.010964341301680813)

In [30]:
# Pandas equivalent (Pearson is the default method)
r = x.corr(y)
r

0.7586402890911867

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [None]:
from scipy.stats import spearmanr

In [None]:
rho = spearmanr(x, y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)

In [None]:
# Pandas equivalent
rho = x.corr(y, method="spearman")
rho

0.9757575757575757
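Spearman's rho is Pearson's *r* applied to the ranks of the data. When neither variable has ties it reduces to ρ = 1 - 6Σdᵢ² / (n(n² - 1)), where dᵢ is the difference between the two ranks of observation *i*. A sketch using `scipy.stats.rankdata()` on the same `x` and `y`:

```python
import numpy as np
from scipy.stats import rankdata

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

rx = rankdata(x)  # 1, 2, ..., 10
ry = rankdata(y)  # 2, 1, 3, 4, 5, 6, 7, 8, 10, 9

# Shortcut formula, valid here because neither variable has ties
d = rx - ry
n = len(x)
rho_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))  # 1 - 6*4/990

# General definition: Pearson's r computed on the ranks
rho_ranks = np.corrcoef(rx, ry)[0, 1]

print(round(rho_formula, 6), round(rho_ranks, 6))  # 0.975758 0.975758
```

Because ranks ignore how far 96 sits from its neighbours, rho (about 0.976) is much higher than Pearson's *r* (about 0.759) for this data.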

## Seaborn Tips Dataset

Import the **Seaborn** library.

In [None]:
import seaborn as sns

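Load the `tips` dataset from Seaborn. `load_dataset()` downloads it from the online seaborn-data repository, so the first call needs an internet connection.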

In [None]:
tips = sns.load_dataset("tips")
tips

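|     | total_bill | tip  | sex    | smoker | day | time   | size |
|-----|-----------:|-----:|--------|--------|-----|--------|-----:|
| 0   | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ... | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat | Dinner | 2    |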


Generate descriptive statistics that summarize the central tendency and dispersion of each numeric column.

In [None]:
tips.describe()

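|       | total_bill | tip      | size     |
|-------|-----------:|---------:|---------:|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |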

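By default `describe()` summarizes only the numeric columns, which is why `sex`, `smoker`, `day`, and `time` do not appear above. Passing `include="all"` adds count, unique, top, and freq rows for the non-numeric columns. A sketch on the first four rows shown earlier, typed in by hand so the example runs without downloading the dataset:

```python
import pandas as pd

# First four rows of the tips data, entered manually
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "sex": ["Female", "Male", "Male", "Male"],
})

numeric_summary = df.describe()            # total_bill and tip only
full_summary = df.describe(include="all")  # adds sex: count, unique, top, freq

print(full_summary)
```

Statistics that do not apply to a column (for example `mean` for `sex`) show up as `NaN` in the combined summary.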

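Call `DataFrame.corr()` to compute the pairwise Pearson correlations between the numeric columns.

In [None]:
# numeric_only=True skips the categorical columns (sex, smoker, day, time);
# pandas 2.0 and later raise an error without it
tips.corr(numeric_only=True)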

|            | total_bill | tip      | size     |
|------------|-----------:|---------:|---------:|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |

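`corr()` also accepts `method="spearman"` or `method="kendall"`. A small sketch on the first five rows shown earlier, typed in by hand and including a text column to show why `numeric_only=True` is needed (this assumes pandas 1.5 or later, where `numeric_only` was added to `corr()`):

```python
import pandas as pd

# First five rows of the tips data, entered manually
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# numeric_only=True drops the text column before correlating
pearson = df.corr(numeric_only=True)
spearman = df.corr(method="spearman", numeric_only=True)

print(spearman)
```

Either matrix is symmetric with ones on the diagonal. For `total_bill` and `tip`, the ranks differ by 1, -1, -1, 1, 0, so Spearman's rho is 1 - 6·4 / (5·24) = 0.8.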

In [None]:
# Correlation between just two columns, using Series.corr()
tip = tips["tip"]
size = tips["size"]
r = tip.corr(size)
r