## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [3]:
import numpy as np
from scipy import stats
import pandas as pd

 Randomly generate 1,000 samples from the normal distribution using `np.random.normal()`(mean = 100, standard deviation = 15)

In [19]:
samples = np.random.normal(size=1000, loc = 100, scale =100)
samples

array([-9.68796191e+01,  1.33935243e+01,  7.33590399e+01,  2.77954277e+01,
       -3.62905409e+01,  2.72609986e+02,  1.51228454e+02,  3.20080773e+02,
        2.95376503e+01, -9.64656296e+01,  7.09816866e+01,  8.48981165e+01,
        1.22038764e+02,  1.27602679e+02,  2.23172091e+02,  3.10847875e+02,
        1.73199920e+02,  5.70121822e+01, -3.79082045e+01, -9.63898386e-01,
        6.46494047e+01,  1.51836582e+02,  2.35122995e+02,  1.15164941e+02,
        3.09576717e+01, -6.24065506e+01,  5.71564046e+01,  1.55644896e+02,
        1.18147926e+02,  2.01248595e+02,  1.98823882e+02,  1.08451929e+02,
       -3.77727742e+01, -8.40538988e+01,  1.73707303e+02,  9.34700694e+01,
        1.51701556e+02,  8.76357998e+01, -3.47491957e+01,  8.67403784e+00,
        2.55881798e+02,  1.74437548e+02,  1.15111125e+01, -4.28883554e+01,
        1.04320398e+02,  1.04050834e+02,  9.41260252e+01, -2.79451040e+01,
        5.42866459e+01,  3.18282422e+01,  1.17250440e+02, -6.01046079e+00,
        1.46888734e+02, -

Compute the **mean**, **median**, and **mode**

In [45]:
mean = np.mean(samples)
median =np.median(samples)
mode = stats.mode(samples)



In [41]:
mean

100.25036291737538

In [42]:
median

99.70680171550421

In [44]:
mode

ModeResult(mode=array([-261.09013467]), count=array([1]))

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [33]:
min = np.min(samples)
max = np.max(samples)
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = stats.iqr(samples)

In [36]:
min

-261.090134672497

In [35]:
max

479.7156196382672

In [37]:
q1

28.406423617769168

In [39]:
q3

170.59812658126128

In [40]:
iqr

142.1917029634921

Compute the **variance** and **standard deviation**

In [46]:
variance = np.var(samples)
std_dev = np.std(samples)

In [47]:
variance

10603.955040820587

In [48]:
std_dev

102.97550699472465

Compute the **skewness** and **kurtosis**

In [50]:
skewness =stats.skew(samples)
kurtosis =stats.kurtosis(samples)

In [51]:
skewness

0.08472544974371264

In [52]:
kurtosis

0.03672467295014448

## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [63]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [70]:
y = np.array([1,2,3,4,5,6,7,8,9,18])
y

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 18])

In [72]:
numer = len(x)*sum(x*y) - sum(x)*sum(y)
denom = np.sqrt((len(x)*sum(x**2)-sum(x)**2)*(len(y)*sum(y**2)-sum(y)**2))

In [65]:
numer

825

In [66]:
denom

825.0

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [73]:
r = numer/denom
r

0.8958205931813593

## Pandas Correlation Calculation

Run the code below

In [74]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method  to calculate Pearson's r correlation.

In [76]:
r = stats.pearsonr(x, y)
r

(0.7586402890911869, 0.010964341301680832)

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [78]:
rho = stats.spearmanr(x, y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)

## Seaborn Dataset Tips

Import Seaborn Library

In [79]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [81]:
tips = sns.load_dataset("tips")
tips

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


Generate descriptive statistics include those that summarize the central tendency, dispersion

Call the relevant method to calculate pairwise Pearson's r correlation of columns