# Percentiles
In statistics, a percentile is the value that a given percentage of the values are at or below.
Let us explain it with some examples, using Average_Pulse.

The 25th percentile of Average_Pulse means that 25% of all of the training sessions have an average pulse of 100 beats per minute or lower. If we flip the statement, it means that 75% of all of the training sessions have an average pulse of 100 beats per minute or higher.

The 75th percentile of Average_Pulse means that 75% of all of the training sessions have an average pulse of 111 beats per minute or lower. If we flip the statement, it means that 25% of all of the training sessions have an average pulse of 111 beats per minute or higher.
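
The reading of a percentile described above can be checked on a small array. The following is a minimal sketch, assuming a hypothetical `pulse` array (these are not the values in data.json): it computes the 25th and 75th percentiles and then measures what share of the readings lie at or below each one.

```python
import numpy as np

# Hypothetical average-pulse readings, NOT the values stored in data.json.
pulse = np.arange(80, 181)  # 80, 81, ..., 180 beats per minute

# 25th and 75th percentiles (NumPy's default linear interpolation).
p25 = np.percentile(pulse, 25)
p75 = np.percentile(pulse, 75)

# Share of readings that are at or below each percentile.
share_below_p25 = np.mean(pulse <= p25)
share_below_p75 = np.mean(pulse <= p75)

print(p25, p75)
print(round(share_below_p25, 2), round(share_below_p75, 2))
```

With a finite sample the shares only come close to 25% and 75%, because a whole number of readings has to fall on one side of the percentile.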


In [1]:
import numpy as np
import pandas as pd

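# Load the training-session data and select the Max_Pulse column.
health_data = pd.read_json('data.json')
max_pulse = health_data["Max_Pulse"]
# 10th percentile: the value that 10% of the Max_Pulse readings are at or below.
per10 = np.percentile(max_pulse, 10)
print(per10)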

120.0


# Standard Deviation

In [2]:
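# Population standard deviation (ddof=0) of each column; axis=0 keeps the result per column.
std = np.std(health_data, axis=0)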
print(std)

Duration           15.370426
Average_Pulse      14.361407
Max_Pulse          10.770330
Calorie_Burnage    28.722813
Hours_Work          3.440930
Hours_Sleep         0.500000
dtype: float64




In [3]:
print(np.mean(health_data["Duration"]))

52.5


# Coefficient of Variation
The coefficient of variation expresses the standard deviation relative to the mean, which gives an idea of how large the standard deviation is.

In [4]:
# Standard deviation divided by the mean, computed per column (axis=0).
cv = np.std(health_data, axis=0) / np.mean(health_data, axis=0)
print(cv)

Duration           0.156019
Average_Pulse      0.145776
Max_Pulse          0.109325
Calorie_Burnage    0.291553
Hours_Work         0.034927
Hours_Sleep        0.005075
dtype: float64
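
To see why dividing by the mean is useful, here is a small sketch on two hypothetical lists (`low_mean` and `high_mean` are made up for illustration, and `coefficient_of_variation` is a helper introduced here, not a NumPy function). Both lists have the same standard deviation, but the spread is much larger relative to the mean in the first one.

```python
import numpy as np

def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return np.std(values) / np.mean(values)

# Hypothetical readings: identical spread, different means.
low_mean = [8, 10, 12]
high_mean = [98, 100, 102]

cv_low = coefficient_of_variation(low_mean)
cv_high = coefficient_of_variation(high_mean)

print(np.std(low_mean), np.std(high_mean))  # the two standard deviations
print(cv_low, cv_high)                      # the two coefficients of variation
```

The coefficient of variation is only meaningful when the mean is not close to zero.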




# Variance
Variance is another number that indicates how spread out the values are.

In fact, if you take the square root of the variance, you get the standard deviation. Or the other way around, if you multiply the standard deviation by itself, you get the variance!
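
The relationship between variance and standard deviation can be verified numerically. This is a minimal sketch on an assumed `pulse` array (illustrative values, not read from data.json).

```python
import numpy as np

# Illustrative pulse values (assumed, not read from data.json).
pulse = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])

variance = np.var(pulse)  # population variance
std = np.std(pulse)       # population standard deviation

# The square root of the variance is the standard deviation ...
print(np.isclose(np.sqrt(variance), std))
# ... and the standard deviation times itself is the variance.
print(np.isclose(std * std, variance))
```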

## Step 1 to Calculate the Variance: Find the Mean

In [5]:
mean_avg_pulse = np.mean(health_data["Average_Pulse"])

## Step 2: For Each Value - Find the Difference From the Mean

In [9]:
difference = health_data["Average_Pulse"] - mean_avg_pulse
print(difference)

0   -22.5
1   -17.5
2   -12.5
3    -7.5
4    -2.5
5     2.5
6     7.5
7    12.5
8    17.5
9    22.5
Name: Average_Pulse, dtype: float64


## Step 3: For Each Difference - Find the Square Value

In [8]:
diff_sqr = difference**2
print(diff_sqr)

0    506.25
1    306.25
2    156.25
3     56.25
4      6.25
5      6.25
6     56.25
7    156.25
8    306.25
9    506.25
Name: Average_Pulse, dtype: float64


## Step 4: The Variance Is the Average of These Squared Values

In [None]:
sqr_mean = np.mean(diff_sqr)
print(sqr_mean)

# the variance of the Average_Pulse column is 206.25

206.25
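
The four steps can also be collected into one function. The sketch below is one possible way to do that; `variance_by_steps` is a hypothetical helper name, and the `pulse` list holds assumed values rather than the contents of data.json. The result is compared with `np.var` as a sanity check.

```python
import numpy as np

def variance_by_steps(values):
    """Population variance computed with the four steps described above."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()        # Step 1: find the mean
    difference = values - mean  # Step 2: difference of each value from the mean
    diff_sqr = difference ** 2  # Step 3: square each difference
    return diff_sqr.mean()      # Step 4: average the squared differences

# Assumed values for illustration only.
pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

result = variance_by_steps(pulse)
print(result)
print(np.isclose(result, np.var(pulse)))  # agrees with NumPy's built-in variance
```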


## Or Compute the Variance Directly With np.var

In [11]:
# Population variance (ddof=0) of each column; axis=0 keeps the result per column.
var = np.var(health_data, axis=0)
print(var)

Duration           236.25
Average_Pulse      206.25
Max_Pulse          116.00
Calorie_Burnage    825.00
Hours_Work          11.84
Hours_Sleep          0.25
dtype: float64
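
One detail worth keeping in mind: `np.var` divides by the number of values by default (`ddof=0`, the population variance), which is what the step-by-step calculation does as well. Passing `ddof=1` divides by one less than the number of values (the sample variance), which is also the default of the pandas `.var()` method. A short sketch on assumed values (not read from data.json) shows the difference.

```python
import numpy as np

# Assumed values for illustration only.
pulse = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
n = len(pulse)

population_var = np.var(pulse)      # ddof=0: divide by n
sample_var = np.var(pulse, ddof=1)  # ddof=1: divide by n - 1

print(population_var, sample_var)
# The two differ only by the factor n / (n - 1).
print(np.isclose(sample_var, population_var * n / (n - 1)))
```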


