# Diabetes prediction using Machine Learning

Importing all the required libraries

In [1]:
import numpy as np
from sklearn.datasets import load_diabetes

# Load the dataset as a Bunch (dictionary-like object)
diabetes = load_diabetes()


Listing all the keys available in the diabetes dataset

In [2]:
print(list(diabetes.keys()))

['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename', 'target_filename', 'data_module']


Printing the description of the loaded dataset

In [3]:
print(diabetes['DESCR'])

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, total serum cholesterol
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, total cholesterol / HDL
      - s5      ltg, possibly log of serum triglycerides level
      - s6      glu, blood sugar level

Note: Each of these 1

Dividing the dataset into two parts: inputs (X) and output (y)


In [4]:
diabetes_X ,diabetes_y = load_diabetes(return_X_y = True)

Printing the first five examples

In [5]:
diabetes_X[0:5]


array([[ 0.03807591,  0.05068012,  0.06169621,  0.02187235, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990842, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, -0.02632783, -0.00844872,
        -0.01916334,  0.07441156, -0.03949338, -0.06832974, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, -0.00567061, -0.04559945,
        -0.03419447, -0.03235593, -0.00259226,  0.00286377, -0.02593034],
       [-0.08906294, -0.04464164, -0.01159501, -0.03665645,  0.01219057,
         0.02499059, -0.03603757,  0.03430886,  0.02269202, -0.00936191],
       [ 0.00538306, -0.04464164, -0.03638469,  0.02187235,  0.00393485,
         0.01559614,  0.00814208, -0.00259226, -0.03199144, -0.04664087]])

In [6]:
diabetes_y[0:5]

array([151.,  75., 141., 206., 135.])

Checking the shape of X and y

In [7]:
diabetes_X.shape

(442, 10)

In [8]:
diabetes_y.shape

(442,)

In [9]:
diabetes_y

array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 28

Keeping only one feature (column index 2, bmi) as a single-column 2-D array, since scikit-learn estimators expect a 2-D input


In [10]:
diabetes_X = diabetes_X[:,np.newaxis,2]
diabetes_X.shape

(442, 1)
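The indexing in the cell above can be easier to follow on a small array. The following is a minimal sketch using a hypothetical toy matrix (not the diabetes data) that shows how `np.newaxis` keeps the selected column two-dimensional, and that `X[:, [2]]` gives the same result.

```python
import numpy as np

# Toy stand-in for a feature matrix: 4 samples, 3 features (hypothetical values)
X = np.arange(12, dtype=float).reshape(4, 3)

col_1d = X[:, 2]               # plain column selection: shape (4,)
col_2d = X[:, np.newaxis, 2]   # same column, kept as a 2-D array: shape (4, 1)
col_2d_alt = X[:, [2]]         # list indexing also keeps the column axis

print(col_1d.shape, col_2d.shape, col_2d_alt.shape)
```

The 2-D form is the one that `fit` and `predict` accept as the feature matrix.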

Holding out the last two samples as the test set and training on the rest

In [11]:
diabetes_X_train = diabetes_X[:-2]
diabetes_X_test = diabetes_X[-2:]

In [12]:
diabetes_X_train.shape

(440, 1)

In [13]:
diabetes_X_test.shape

(2, 1)

In [14]:
diabetes_y_train = diabetes_y[:-2]
diabetes_y_test = diabetes_y[-2:]

In [15]:
diabetes_y_train.shape

(440,)

In [16]:
diabetes_y_test.shape

(2,)
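A test set of two samples is very small, so any score computed on it will be noisy. As an alternative (not what this notebook does), the sketch below uses scikit-learn's `train_test_split` to hold out a larger, shuffled portion of the data. The 20% test size and `random_state=0` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 2]  # keep only the bmi column, as in the notebook

# Shuffle and hold out 20% of the samples (assumed values, chosen for illustration)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```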

Importing LinearRegression from scikit-learn

In [17]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

Fitting the model to the training data

In [18]:
model.fit(diabetes_X_train,diabetes_y_train)

LinearRegression()
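With a single feature, `LinearRegression` fits a straight line by ordinary least squares. The sketch below shows that calculation by hand on hypothetical toy data (not the diabetes data), using the closed-form slope and intercept for one predictor.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Closed-form least-squares estimates for a single predictor
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# The fit recovers the slope (2) and intercept (1) used to build the data
print(slope, intercept)
```

On a fitted scikit-learn model, the corresponding values are available as `model.coef_` and `model.intercept_`.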

Predicting on the test set

In [19]:
y_predict = model.predict(diabetes_X_test)

Prediction output (predicted disease progression one year after baseline)

In [20]:
print(y_predict)

[189.08801164  83.01176831]


Score of the model (coefficient of determination, R², on the two test samples)

In [21]:
model.score(diabetes_X_test,diabetes_y_test)

0.877137783507265
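For a regressor, `score` returns the coefficient of determination R², not a classification accuracy. The sketch below repeats the notebook's pipeline in one self-contained block and recomputes R² by hand as 1 - SS_res / SS_tot, alongside the root mean squared error. The variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X, y = load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 2]               # bmi column only
X_train, X_test = X[:-2], X[-2:]      # last two samples held out
y_train, y_test = y[:-2], y[-2:]

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # error in target units

print(r2_manual, r2_sklearn, rmse)
```

Because SS_tot is computed from only two test targets here, the resulting R² is sensitive to which two samples are held out and should be read with caution.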