## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [None]:
import numpy as np
import pandas as pd
from scipy import stats

Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`

In [None]:
samples = np.random.normal(100,15,1000)
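Because `np.random.normal()` draws fresh values on every run, the outputs shown in this notebook will not reproduce exactly. A minimal sketch of a reproducible variant follows, assuming NumPy's `Generator` API is available (NumPy 1.17 or later). The seed value 42 and the names `rng`, `seeded_samples`, and `repeat` are arbitrary choices for illustration, not part of the original exercise.

```python
import numpy as np

# Seeded generator: the same seed always yields the same draws.
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed reproduces the array exactly.
repeat = np.random.default_rng(42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, repeat))  # prints True
```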

Compute the **mean**, **median**, and **mode**

In [None]:
mean = samples.mean()
median = np.median(samples)
mode = stats.mode(samples)

In [None]:
print(' Mean: {}\n Median: {}\n Mode: {}'.format(mean,median,mode))

 Mean: 100.17053026777474
 Median: 99.56218050984377
 Mode: ModeResult(mode=array([60.83365074]), count=array([1]))
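
With 1,000 draws from a continuous distribution, every value is almost surely unique, which is why the output reports `count=array([1])` and a mode equal to the smallest sample. One common workaround is to bin the data before looking for the most frequent value. The sketch below rounds to whole numbers first; the rounding step and the names `values`, `rounded`, and `binned_mode` are assumptions for illustration, not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
values = rng.normal(100, 15, 1000)

# Round to whole numbers so that repeated values can actually occur.
rounded = np.round(values).astype(int)

# np.unique returns the distinct values and how often each one appears.
distinct, counts = np.unique(rounded, return_counts=True)
binned_mode = distinct[counts.argmax()]
print(binned_mode, counts.max())
```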


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
samples.max()

148.06219700430694

In [None]:
# Use min_val / max_val so the built-in min() and max() are not shadowed.
min_val = samples.min()
max_val = samples.max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1

In [None]:
print('Min: ',min_val,'\nMax: ',max_val,'\nq1: ',q1,'\nq3: ',q3,'\niqr: ',iqr)

Min:  60.83365074284536 
Max:  148.06219700430694 
q1:  90.11359170674825 
q3:  110.30296829741211 
iqr:  20.189376590663855
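
The quartiles can also be requested in a single call, and SciPy offers `stats.iqr()` as a cross-check on the manual subtraction. This is a sketch on freshly generated data, so the numbers will not match the output above; the names `values`, `q1_v`, and `q3_v` are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
values = rng.normal(100, 15, 1000)

# Passing a list of percentiles returns both quartiles at once.
q1_v, q3_v = np.percentile(values, [25, 75])
iqr_manual = q3_v - q1_v

# stats.iqr uses the 25th and 75th percentiles by default.
iqr_scipy = stats.iqr(values)
print(np.isclose(iqr_manual, iqr_scipy))  # prints True
```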


Compute the **variance** and **standard deviation**

In [None]:
variance = np.var(samples)
std_dev = np.std(samples)

In [None]:
print('Variance: {}\nStd: {}'.format(variance,std_dev))

Variance: 222.88890092469
Std: 14.929464187461317
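
By default, `np.var()` and `np.std()` divide by n (the population formulas). When the data are a sample and an unbiased variance estimate is wanted, pass `ddof=1` to divide by n - 1 instead. The sketch below compares the two on freshly generated data; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
values = rng.normal(100, 15, 1000)
n = values.size

pop_var = np.var(values)             # ddof=0 (default): divide by n
sample_var = np.var(values, ddof=1)  # divide by n - 1
sample_std = np.std(values, ddof=1)

# The two variances differ only by the factor n / (n - 1).
print(np.isclose(sample_var, pop_var * n / (n - 1)))  # prints True
```

With 1,000 samples the difference is small, but it grows as the sample size shrinks.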


Compute the **skewness** and **kurtosis**

In [None]:
skewness = stats.skew(samples)

kurtosis = stats.kurtosis(samples)

In [None]:
print('Skewness: {}\nKurtosis: {}'.format(skewness,kurtosis))

Skewness: 0.12085234994614222
Kurtosis: -0.048618516300255266
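
Note that `stats.kurtosis()` returns *excess* kurtosis by default (Fisher's definition), for which a normal distribution scores 0; that is why the value above is close to zero rather than 3. Passing `fisher=False` gives Pearson's definition instead. A short sketch on freshly generated data, with illustrative variable names:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
values = rng.normal(100, 15, 1000)

excess = stats.kurtosis(values)                 # Fisher: normal -> about 0
pearson = stats.kurtosis(values, fisher=False)  # Pearson: normal -> about 3

# The two definitions differ by exactly 3.
print(np.isclose(pearson - excess, 3.0))  # prints True
```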


## NumPy Correlation Calculation

Create an array x of integers between 11 (inclusive) and 21 (exclusive) using `np.arange()`

In [None]:
x = np.arange(11,21)
x

array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

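Then use `np.random.randint()` to create a second array y containing 10 random integers between 1 (inclusive) and 21 (exclusive). Because the values are random, your output will differ from run to run.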

In [None]:
y = np.random.randint(1,21,10)
y

array([12, 19,  6,  5,  1,  3,  3,  6, 13, 14])

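Once you have two arrays of the same length, you can measure how x and y vary together. `np.cov()` returns the covariance matrix; dividing the off-diagonal covariance by the product of the two standard deviations gives the **correlation coefficient**, which `np.corrcoef()` computes directly.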

In [None]:
np.cov(x, y)

array([[  9.16666667,  68.16666667],
       [ 68.16666667, 880.76666667]])
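
To go from covariance to correlation, a sketch like the following may help. It reuses the x from `np.arange(11, 21)` and hard-codes the y values printed earlier so that the example is repeatable; the names `cov`, `r_from_cov`, and `r` are illustrative.

```python
import numpy as np

x = np.arange(11, 21)
y = np.array([12, 19, 6, 5, 1, 3, 3, 6, 13, 14])  # the y printed earlier

# Covariance matrix: variances on the diagonal, covariance off-diagonal.
cov = np.cov(x, y)

# Pearson's r = cov(x, y) / (std(x) * std(y)).
r_from_cov = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# np.corrcoef performs the same normalization in one call.
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r, r_from_cov))  # prints True
```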

## Pandas Correlation Calculation

Run the code below

In [None]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant function to calculate Pearson's r correlation. `stats.pearsonr()` returns the coefficient together with a p-value.

In [None]:
r, p = stats.pearsonr(x, y)
r

0.758640289091187
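
Since x and y are Pandas Series, the same coefficient is also available through the `Series.corr()` method, whose default method is Pearson. A minimal sketch comparing it with the SciPy result (the names `r_pandas`, `r_scipy`, and `p_value` are illustrative):

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_pandas = x.corr(y)                     # method='pearson' is the default
r_scipy, p_value = stats.pearsonr(x, y)  # also returns a p-value

print(abs(r_pandas - r_scipy) < 1e-12)  # prints True
```

The Pandas method returns only the coefficient, so SciPy remains the choice when the p-value is needed.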

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [None]:
rho = stats.spearmanr(x,y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
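
Spearman's rho is Pearson's r applied to the ranks of the data, which is why it is much higher here than Pearson's r: y increases almost monotonically with x even though the relationship is far from linear. A sketch using the Pandas API, with illustrative variable names:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho_pandas = x.corr(y, method="spearman")

# Equivalent: rank both series, then take the ordinary Pearson correlation.
rho_ranks = x.rank().corr(y.rank())

print(abs(rho_pandas - rho_ranks) < 1e-12)  # prints True
```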

## Seaborn Dataset Tips

Import Seaborn Library

In [None]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [None]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion

In [None]:
tips.describe(include='all')

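|        | total_bill | tip      | sex  | smoker | day | time   | size     |
|--------|------------|----------|------|--------|-----|--------|----------|
| count  | 244.0      | 244.0    | 244  | 244    | 244 | 244    | 244.0    |
| unique | NaN        | NaN      | 2    | 2      | 4   | 2      | NaN      |
| top    | NaN        | NaN      | Male | No     | Sat | Dinner | NaN      |
| freq   | NaN        | NaN      | 157  | 151    | 87  | 176    | NaN      |
| mean   | 19.785943  | 2.998279 | NaN  | NaN    | NaN | NaN    | 2.569672 |
| std    | 8.902412   | 1.383638 | NaN  | NaN    | NaN | NaN    | 0.9511   |
| min    | 3.07       | 1.0      | NaN  | NaN    | NaN | NaN    | 1.0      |
| 25%    | 13.3475    | 2.0      | NaN  | NaN    | NaN | NaN    | 2.0      |
| 50%    | 17.795     | 2.9      | NaN  | NaN    | NaN | NaN    | 2.0      |
| 75%    | 24.1275    | 3.5625   | NaN  | NaN    | NaN | NaN    | 3.0      |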


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [None]:
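# Restrict to numeric columns: recent pandas versions raise an error
# when corr() meets the categorical columns (sex, smoker, day, time).
tips.select_dtypes(include="number").corr()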

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
