**Parkinson's disease detection**

**Task details:**

This dataset is composed of a range of biomedical voice measurements from
31 people, 23 with Parkinson's disease (PD). Each column in the table is a
particular voice measure, and each row corresponds to one of 195 voice
recordings from these individuals (identified in the "name" column). The main aim
is to discriminate healthy people from those with PD using the "status" column,
which is set to 0 for healthy and 1 for PD.

**IMPORTING DATASET**

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
dataset = pd.read_csv('/content/Parkinsson disease_mod.csv')
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              195 non-null    object 
 1   MDVP:Fo(Hz)       195 non-null    float64
 2   MDVP:Fhi(Hz)      195 non-null    float64
 3   MDVP:Flo(Hz)      195 non-null    float64
 4   MDVP:Jitter(%)    195 non-null    float64
 5   MDVP:Jitter(Abs)  195 non-null    float64
 6   MDVP:RAP          195 non-null    float64
 7   MDVP:PPQ          195 non-null    float64
 8   Jitter:DDP        195 non-null    float64
 9   MDVP:Shimmer      195 non-null    float64
 10  MDVP:Shimmer(dB)  195 non-null    float64
 11  Shimmer:APQ3      195 non-null    float64
 12  Shimmer:APQ5      195 non-null    float64
 13  MDVP:APQ          195 non-null    float64
 14  Shimmer:DDA       195 non-null    float64
 15  NHR               195 non-null    float64
 16  HNR               195 non-null    float64
 1

In [5]:
name=dataset['name']

In [6]:
dataset.drop(['name'], axis=1, inplace=True)

In [7]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 23 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   MDVP:Fo(Hz)       195 non-null    float64
 1   MDVP:Fhi(Hz)      195 non-null    float64
 2   MDVP:Flo(Hz)      195 non-null    float64
 3   MDVP:Jitter(%)    195 non-null    float64
 4   MDVP:Jitter(Abs)  195 non-null    float64
 5   MDVP:RAP          195 non-null    float64
 6   MDVP:PPQ          195 non-null    float64
 7   Jitter:DDP        195 non-null    float64
 8   MDVP:Shimmer      195 non-null    float64
 9   MDVP:Shimmer(dB)  195 non-null    float64
 10  Shimmer:APQ3      195 non-null    float64
 11  Shimmer:APQ5      195 non-null    float64
 12  MDVP:APQ          195 non-null    float64
 13  Shimmer:DDA       195 non-null    float64
 14  NHR               195 non-null    float64
 15  HNR               195 non-null    float64
 16  RPDE              195 non-null    float64
 1

**SPLITTING INTO TRAINING AND TEST SETS**

In [8]:
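from sklearn.model_selection import train_test_split

# Features are every column except the last; the last column holds the 0/1 label
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# Hold out 25% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train)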

[[1.482720e+02 1.649890e+02 1.422990e+02 ... 8.784000e-02 2.344336e+00
  1.864890e-01]
 [1.162860e+02 1.772910e+02 9.698300e+01 ... 1.339170e-01 2.058658e+00
  2.143460e-01]
 [1.636560e+02 2.008410e+02 7.677900e+01 ... 2.208900e-01 2.692176e+00
  2.159610e-01]
 ...
 [1.707560e+02 4.502470e+02 7.903200e+01 ... 3.721140e-01 2.975889e+00
  2.827800e-01]
 [2.524550e+02 2.614870e+02 1.827860e+02 ... 2.008730e-01 2.028612e+00
  8.639800e-02]
 [1.107390e+02 1.135970e+02 1.001390e+02 ... 1.923750e-01 1.889002e+00
  1.741520e-01]]


In [9]:
y_test

array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0,
       1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1])
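
The task description states that the 195 recordings come from only 31 people, so a row-wise random split can place recordings of the same person in both the training and the test set. A minimal sketch of a subject-wise split with scikit-learn's `GroupShuffleSplit` follows. The data here are synthetic, and the `groups` array is hypothetical: the notebook does not show how subject IDs would be derived from the "name" column.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Synthetic stand-in data: 31 hypothetical subjects with 6 or 7 recordings each
# (real subject IDs would have to be derived from the "name" column).
recordings_per_subject = [7] * 9 + [6] * 22           # sums to 195
groups = np.repeat(np.arange(31), recordings_per_subject)
X_demo = rng.normal(size=(len(groups), 22))            # 22 placeholder features
y_demo = (groups < 23).astype(int)                     # 23 subjects labelled 1

# Hold out roughly 25% of the subjects (not 25% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X_demo, y_demo, groups=groups))

train_subjects = set(groups[train_idx])
test_subjects = set(groups[test_idx])
print(len(train_subjects), "training subjects,", len(test_subjects), "test subjects")
print("shared subjects:", train_subjects & test_subjects)
```

With a split like this, test accuracy measures how well the model generalizes to people it has never heard, which is usually the question of interest for a diagnostic tool.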

**DATA PREPROCESSING**

In [10]:
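from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Learn the mean and standard deviation from the training data only,
# then apply that same transformation to the test data
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)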
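
To make the difference between `fit_transform` and `transform` concrete, here is a small pure-Python sketch of the same idea for a single feature. The helper names `fit_scaler` and `transform` and the toy values are illustrative assumptions, not part of the notebook. `StandardScaler` uses the population standard deviation (ddof=0), which `statistics.pstdev` matches.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Return (mean, std) of a training column; std is the population std (ddof=0)."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

# Toy values for one hypothetical feature, not taken from the dataset
train_col = [100.0, 120.0, 140.0, 160.0]
test_col = [130.0, 200.0]

mean, std = fit_scaler(train_col)                # learned from training data only
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)     # reuses the training mean and std

print(mean, std)
print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and standard deviation 1. The test column generally does not, because it is measured against the training statistics.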

**TRAINING MODEL**

In [14]:
from xgboost import XGBClassifier
# 'logloss' is the binary-classification metric ('mlogloss' is its multiclass variant)
model = XGBClassifier(eval_metric='logloss')
xgb = model.fit(X_train, y_train)

**INFORMATION ON THE MODEL**

In [16]:
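# Caution: X is the full, unscaled feature matrix (training rows included), while the
# model was fitted on standardized features, so this value is not a valid accuracy estimate.
xgb.score(X,y)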

0.7538461538461538
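
Because the classifier was fitted on standardized features, every later call to `score` or `predict` needs inputs passed through the same scaler. One way to make that automatic is to bundle both steps in a scikit-learn pipeline. The sketch below runs on synthetic data and uses `GradientBoostingClassifier` as a stand-in so that it depends on scikit-learn only. `XGBClassifier` follows the same estimator interface and could presumably be swapped in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 195 x 22 feature matrix (not the notebook's data)
X_demo, y_demo = make_classification(n_samples=195, n_features=22, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# The pipeline scales inside fit/predict/score, so raw features can be passed everywhere.
# GradientBoostingClassifier is a stand-in for the notebook's XGBClassifier.
pipe = make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=0))
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)                    # held-out rows only
cv_scores = cross_val_score(pipe, X_demo, y_demo, cv=5)   # scaler is refit in each fold

print("test accuracy:", test_accuracy)
print("5-fold accuracies:", cv_scores)
```

With only 49 test rows, a single hold-out score is noisy. The cross-validated scores give a rough idea of how much the estimate varies between splits.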

In [None]:
y_pred = model.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

In [18]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
pd.DataFrame(cm,columns=['Predicted Healthy', 'Predicted Parkinsons'], index=['True Healthy', 'True Parkinsons'])

                 Predicted Healthy  Predicted Parkinsons
True Healthy                    11                     0
True Parkinsons                  2                    36


In [19]:
accuracy_score(y_test, y_pred)

0.9591836734693877
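
Accuracy alone hides how the errors are distributed between the two classes. As a sketch, the other common metrics can be derived directly from the four counts in the confusion matrix above. The variable names are illustrative, and the baseline is the accuracy of a trivial classifier that labels every test recording as PD.

```python
# Counts copied from the confusion matrix above (rows = true class, columns = predicted)
tn, fp = 11, 0    # true healthy
fn, tp = 2, 36    # true Parkinson's

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)      # recall for the PD class
specificity = tn / (tn + fp)      # recall for the healthy class
precision = tp / (tp + fp)
f1 = 2 * tp / (2 * tp + fp + fn)

# Accuracy of a trivial classifier that always predicts PD on this test set
majority_baseline = (tp + fn) / total

print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} precision={precision:.4f} f1={f1:.4f}")
print(f"always-PD baseline accuracy={majority_baseline:.4f}")
```

Both errors are PD recordings classified as healthy (false negatives), which is typically the more costly kind of mistake in a screening setting.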

**PREDICTING A SINGLE SAMPLE**

In [21]:
# One recording's 22 feature values, scaled with the fitted scaler before prediction
sample = [[197.076, 206.896, 192.055, 0.00289, 0.00001, 0.00166, 0.00168, 0.00498,
           0.01098, 0.097, 0.00563, 0.0068, 0.00802, 0.01689, 0.00339, 26.775,
           0.422229, 0.741367, -7.3483, 0.177551, 1.743867, 0.085569]]
print(model.predict(sc.transform(sample)))

[0]


The predicted class, 0 (healthy), is the same as the status given for this sample in the dataset.

In [22]:
df = pd.DataFrame({ 'Actual': y_test, 'Predicted': y_pred})  

In [23]:
# Flag the rows where the prediction matches the true label
df["equal"] = df["Actual"] == df["Predicted"]
print(df)

    Actual  Predicted  equal
0        1          1   True
1        1          1   True
2        0          0   True
3        1          1   True
4        0          0   True
5        1          1   True
6        1          0  False
7        1          1   True
8        1          1   True
9        1          1   True
10       1          0  False
11       1          1   True
12       1          1   True
13       1          1   True
14       0          0   True
15       1          1   True
16       1          1   True
17       0          0   True
18       0          0   True
19       1          1   True
20       1          1   True
21       0          0   True
22       1          1   True
23       1          1   True
24       0          0   True
25       1          1   True
26       1          1   True
27       0          0   True
28       0          0   True
29       0          0   True
30       1          1   True
31       1          1   True
32       1          1   True
33       1    

In [24]:
df.value_counts()

Actual  Predicted  equal
1       1          True     36
0       0          True     11
1       0          False     2
dtype: int64
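
The same tally can be reproduced without pandas, which doubles as a consistency check against the confusion matrix. In this sketch the true labels are copied from the `y_test` output earlier in the notebook. The predictions are reconstructed under an assumption: they equal the true labels except at rows 6 and 10, the two mismatches visible in the printed comparison table, which agrees with the two `False` rows counted above.

```python
from collections import Counter

# True labels copied from the y_test output earlier in the notebook
actual = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0,
          1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          0, 1, 1, 1, 1]

# Assumption: predictions match the true labels except at rows 6 and 10,
# the two mismatches visible in the printed comparison table.
predicted = list(actual)
for row in (6, 10):
    predicted[row] = 0

# Count each (actual, predicted) pair, most frequent first
pair_counts = Counter(zip(actual, predicted))
for (a, p), n in pair_counts.most_common():
    print(f"Actual={a} Predicted={p} equal={a == p} count={n}")
```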