# Blood Pressure Project

The objective of this project is to determine whether relationships exist between blood pressure and parameters such as age, weight, body surface area, duration of hypertension, pulse rate and/or stress level of patients.

I used a dataset of researchers' observations on 20 individuals with high blood pressure. The dataset contains:
<ul>
<li>blood pressure (BP in mm Hg)</li>
<li>age (Age in years)</li>
<li>weight (Weight in kg)</li>
<li>body surface area (BSA in square meters)</li>
<li>duration of hypertension (Dur in years)</li>
<li>basal pulse (Pulse in beats per minute)</li>
<li>stress index (Stress)</li>
</ul>

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('data/Blood_Pressure.csv')
dataset.head(3)

Unnamed: 0,Col1,Col2
0,Blood pressure,Age
1,Low,Under 30
2,Low,Under 30


In [2]:
# Building matrix of features X and dependent variable y
X = dataset.iloc[:, 2:8].values 
y = dataset.iloc[:, 1].values
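
Positional slices such as `iloc[:, 2:8]` depend on the exact column order of the CSV. A minimal sketch of the same selection by column name follows. It assumes the file uses the abbreviations from the list above (BP, Age, Weight, BSA, Dur, Pulse, Stress) as headers; the `Pt` identifier column, the names `demo`, `X_named` and `y_named`, and all values are hypothetical and made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: column names are assumed from the
# variable list above, and every value is made up for illustration.
demo = pd.DataFrame({
    "Pt": [1, 2, 3],
    "BP": [100, 120, 110],
    "Age": [40, 55, 48],
    "Weight": [80.0, 95.0, 88.0],
    "BSA": [1.8, 2.1, 1.9],
    "Dur": [2.0, 6.0, 4.0],
    "Pulse": [60, 72, 66],
    "Stress": [20, 50, 35],
})

feature_names = ["Age", "Weight", "BSA", "Dur", "Pulse", "Stress"]
X_named = demo[feature_names].values   # matrix of features, shape (n_rows, 6)
y_named = demo["BP"].values            # dependent variable, shape (n_rows,)

# With this assumed column order, the positional slices pick the same data.
assert (X_named == demo.iloc[:, 2:8].values).all()
assert (y_named == demo.iloc[:, 1].values).all()
```

Selecting by name fails loudly if a column is missing, whereas a positional slice silently picks whatever happens to be in that position.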

In [3]:
# Splitting the dataset into the Training set and Test set
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split is in sklearn.model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [5]:
# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [6]:
# Predicting the Test set results
y_pred = regressor.predict(X_test)
print(y_pred)

[ 111.23941728  115.19878669  121.86682795  110.18685104]


In [7]:
# Compare predicted values to test values
print(y_test)

[110 115 122 110]
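
To put a number on how close the predictions are, the two printed arrays can be compared directly. This is a small sketch using only NumPy, with the values copied from the outputs above; the names `errors`, `mae`, `rmse` and `max_abs` are introduced here for illustration.

```python
import numpy as np

# Values copied from the printed outputs of the two cells above.
y_pred = np.array([111.23941728, 115.19878669, 121.86682795, 110.18685104])
y_test = np.array([110, 115, 122, 110])

errors = y_pred - y_test
mae = np.mean(np.abs(errors))          # mean absolute error, in mm Hg
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error, in mm Hg
max_abs = np.max(np.abs(errors))       # largest single error, in mm Hg

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  max |error|={max_abs:.3f}")
```

With only four test observations, these figures give a rough indication rather than a reliable estimate of the error on new patients.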


The model I built works quite well: the four predictions are within about 1.3 mm Hg of the observed values. Next I will identify which parameters are significant and which are not.
To do that, I will use the Backward Elimination process.

In [9]:
# Building the optimal model using Backward Elimination with a 5% significance level
import statsmodels.api as sm  # OLS is exposed by statsmodels.api
# Prepend a column of ones for the intercept; run this cell only once, since rerunning it prepends another column
X = np.append(arr = np.ones((len(X), 1)).astype(int), values = X, axis = 1)
X_opt = X[:, [0, 1, 2, 3, 4, 5, 6]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
# x4 (duration of hypertension) has the highest P-value, so I eliminate it
X_opt = X[:, [0, 1, 2, 3, 5, 6]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
# x4 (now basal pulse) has the highest P-value, so I eliminate it
X_opt = X[:, [0, 1, 2, 3, 6]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
# x4 (now stress index) has the highest P-value, so I eliminate it
X_opt = X[:, [0, 1, 2, 3]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
# All P-values are under 5%

Dep. Variable:     y                  R-squared:            0.991
Model:             OLS                Adj. R-squared:       0.99
Method:            Least Squares      F-statistic:          978.2
Date:              Sun, 28 Jan 2018   Prob (F-statistic):   2.81e-18
Time:              17:27:12           Log-Likelihood:       -14.157
No. Observations:  20                 AIC:                  34.31
Df Residuals:      17                 BIC:                  37.3
Df Model:          2
Covariance Type:   nonrobust

         coef      std err   t        P>|t|   [0.025    0.975]
const    -8.2897   1.504     -5.513   0.000   -11.462   -5.117
x1       -8.2897   1.504     -5.513   0.000   -11.462   -5.117
x2       0.7083    0.054     13.235   0.000   0.595     0.821
x3       1.0330    0.031     33.154   0.000   0.967     1.099

Omnibus:         0.989   Durbin-Watson:      1.688
Prob(Omnibus):   0.61    Jarque-Bera (JB):   0.768
Skew:            0.101   Prob(JB):           0.681
Kurtosis:        2.061   Cond. No.           2.67e+17


At the end of the backward elimination process we obtain 3 main criteria: Age, Weight and Body Surface Area.
So blood pressure is very strongly related to these three variables (R-squared = 0.991 for the final model).
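
In the summary above, the `const` and `x1` rows are identical, `Df Model` is 2 although `X_opt` has four columns, and the condition number is very large (2.67e+17). These are typical signs of linearly dependent columns in the design matrix, for example a column of ones prepended twice when a cell is rerun. Whether that happened here cannot be told from the output alone, so the mapping of `x1`, `x2` and `x3` onto variable names is worth verifying. Below is a hedged sketch of a quick check, using made-up data and a hypothetical helper `check_design_matrix`.

```python
import numpy as np

def check_design_matrix(X):
    """Return (rank, number of columns, number of constant columns)."""
    X = np.asarray(X, dtype=float)
    rank = np.linalg.matrix_rank(X)
    n_const = int(np.sum(np.ptp(X, axis=0) == 0))  # columns whose values never change
    return rank, X.shape[1], n_const

rng = np.random.default_rng(1)
features = rng.normal(size=(20, 3))    # made-up predictors, not the real dataset
ones = np.ones((20, 1))

X_once = np.hstack([ones, features])   # intercept column added once
X_twice = np.hstack([ones, X_once])    # intercept column added twice, as after a rerun

print(check_design_matrix(X_once))     # rank equals the number of columns
print(check_design_matrix(X_twice))    # rank is one less than the number of columns
```

Running the same helper on the notebook's `X_opt` would show whether its rank matches its number of columns.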