# A brief description of Parkinson’s disease

Parkinson’s disease is a chronic, neurodegenerative disorder of the central nervous system that affects movement, causing tremors and stiffness. It damages the dopamine-producing neurons in the brain, progresses through five stages and has no cure yet. It affects more than 1 million individuals every year in India.

# Why use the XGBoost algorithm?

XGBoost is a machine learning algorithm and library designed for speed and performance. XGBoost stands for eXtreme Gradient Boosting, and it builds ensembles of decision trees. In this project, we will import XGBClassifier from the xgboost library. XGBClassifier is XGBoost's classifier with a scikit-learn-compatible interface (`fit`, `predict`).
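To make "gradient boosting on decision trees" concrete, here is a minimal sketch of plain first-order gradient boosting for 0/1 labels, built from scikit-learn regression trees on synthetic data. It illustrates the idea only and is not how XGBoost is implemented: XGBoost additionally uses second-order (Hessian) information, regularizes leaf weights and has its own optimized tree builder. The helper names `fit_boosted_trees` and `boosted_proba` and all settings (50 rounds, learning rate 0.3, depth 3) are hypothetical choices for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.3, max_depth=3):
    """Plain first-order gradient boosting for 0/1 labels under log loss (hypothetical helper)."""
    # Start from a constant raw score: the log-odds of the positive class.
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    base = np.log(p0 / (1 - p0))
    raw = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        # The negative gradient of log loss w.r.t. the raw score is (y - p).
        residual = y - sigmoid(raw)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        # Each new tree moves the running score a small step toward lower loss.
        raw += learning_rate * tree.predict(X)
        trees.append(tree)
    return {"base": base, "trees": trees, "learning_rate": learning_rate}


def boosted_proba(booster, X):
    """Add every tree's scaled output to the base score, then map to a probability."""
    raw = booster["base"] + booster["learning_rate"] * sum(t.predict(X) for t in booster["trees"])
    return sigmoid(raw)


X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
booster = fit_boosted_trees(X_demo, y_demo)
train_acc = np.mean((boosted_proba(booster, X_demo) >= 0.5) == y_demo)
print(f"training accuracy: {train_acc:.3f}")
```

Each round fits a small tree to what the current ensemble still gets wrong, and the final prediction is the sum of all trees' contributions.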

## Objective

To build a model that accurately detects the presence of Parkinson’s disease in an individual.

## Project Summary

In this project, we will use the Python libraries pandas, numpy, xgboost and scikit-learn, and build a model with XGBClassifier. We will first load the data, get the features and labels, scale the features (the labels need no scaling) and split the dataset, then build an XGBClassifier. Finally, we will calculate the accuracy of the model.

### Dataset for ML Project

We'll use the UCI ML Parkinson's dataset, which has 24 columns and 195 rows (records).

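```python
# Run in a Jupyter notebook cell. The PyPI package is "scikit-learn"; "sklearn" is a deprecated alias.
%pip install numpy pandas scikit-learn xgboost
```

Note: you may need to restart the kernel to use updated packages.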


```python
# Prerequisites: install the libraries above with pip, if not done already.
# Import the necessary Python libraries.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

```python
import types

import pandas as pd
from botocore.client import Config
import ibm_boto3

# Stub attached to the COS streaming body below so pandas treats it as file-like.
def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage.
# Replace the placeholder with your own API key, and never share a notebook that contains it.
cos_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_CLOUD_API_KEY>',  # placeholder, not a real key
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.private.us.cloud-object-storage.appdomain.cloud')

bucket = 'identifyparkinson39sdisease-donotdelete-pr-pf7fxmtfebi3c0'
object_key = 'parkinsons.csv'

body = cos_client.get_object(Bucket=bucket, Key=object_key)['Body']
# Add the missing __iter__ method, so pandas accepts body as a file-like object.
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType(__iter__, body)

df_data_1 = pd.read_csv(body)
df_data_1.head()
```


| Unnamed: 0 | name | MDVP:Fo(Hz) | MDVP:Fhi(Hz) | MDVP:Flo(Hz) | MDVP:Jitter(%) | MDVP:Jitter(Abs) | MDVP:RAP | MDVP:PPQ | Jitter:DDP | MDVP:Shimmer | ... | Shimmer:DDA | NHR | HNR | status | RPDE | DFA | spread1 | spread2 | D2 | PPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | phon_R01_S01_1 | 119.992 | 157.302 | 74.997 | 0.00784 | 7e-05 | 0.0037 | 0.00554 | 0.01109 | 0.04374 | ... | 0.06545 | 0.02211 | 21.033 | 1 | 0.414783 | 0.815285 | -4.813031 | 0.266482 | 2.301442 | 0.284654 |
| 1 | phon_R01_S01_2 | 122.4 | 148.65 | 113.819 | 0.00968 | 8e-05 | 0.00465 | 0.00696 | 0.01394 | 0.06134 | ... | 0.09403 | 0.01929 | 19.085 | 1 | 0.458359 | 0.819521 | -4.075192 | 0.33559 | 2.486855 | 0.368674 |
| 2 | phon_R01_S01_3 | 116.682 | 131.111 | 111.555 | 0.0105 | 9e-05 | 0.00544 | 0.00781 | 0.01633 | 0.05233 | ... | 0.0827 | 0.01309 | 20.651 | 1 | 0.429895 | 0.825288 | -4.443179 | 0.311173 | 2.342259 | 0.332634 |
| 3 | phon_R01_S01_4 | 116.676 | 137.871 | 111.366 | 0.00997 | 9e-05 | 0.00502 | 0.00698 | 0.01505 | 0.05492 | ... | 0.08771 | 0.01353 | 20.644 | 1 | 0.434969 | 0.819235 | -4.117501 | 0.334147 | 2.405554 | 0.368975 |
| 4 | phon_R01_S01_5 | 116.014 | 141.781 | 110.655 | 0.01284 | 0.00011 | 0.00655 | 0.00908 | 0.01966 | 0.06425 | ... | 0.1047 | 0.01767 | 19.649 | 1 | 0.417356 | 0.823484 | -3.747787 | 0.234513 | 2.33218 | 0.410335 |
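If you are not working in IBM Watson Studio, the same data can be read from a local file. This is a minimal sketch assuming a local copy of the UCI file saved as `parkinsons.csv` (the same name as the object key above) with the original column headers, including `name` and `status`. The helper `load_parkinsons` is hypothetical; it also drops any stray index column such as `Unnamed: 0`, since that column carries no voice measurement.

```python
import pandas as pd


def load_parkinsons(path="parkinsons.csv"):
    """Read the UCI Parkinson's CSV and return (features, labels). Hypothetical helper.

    Assumes a 'name' column (recording ID string), a 'status' column
    (1 = Parkinson's, 0 = healthy) and numeric voice measurements otherwise.
    """
    df = pd.read_csv(path)
    # Drop the label, the string ID, and any stray index column such as 'Unnamed: 0'.
    drop_cols = ["name", "status"] + [c for c in df.columns if str(c).startswith("Unnamed")]
    features = df.drop(columns=drop_cols).to_numpy(dtype=float)
    labels = df["status"].to_numpy()
    return features, labels
```

Usage: `features, labels = load_parkinsons("parkinsons.csv")` returns arrays in the same form as the feature and label cell below.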


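```python
# Get the features and labels from the DataFrame (dataset).
# The labels are the values in the 'status' column. The features are the remaining
# numeric columns: 'name' (a recording ID string) and any stray index column such as
# 'Unnamed: 0' are excluded, since neither is a measurement.
drop_cols = ['name', 'status'] + [c for c in df_data_1.columns if str(c).startswith('Unnamed')]
features = df_data_1.drop(columns=drop_cols).values
labels = df_data_1.loc[:, 'status'].values
```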

```python
# The 'status' column has values 0 and 1 as labels;
# count how many labels there are of each (1 first, then 0).
print(labels[labels == 1].shape[0], labels[labels == 0].shape[0])
```

147 48


We have 147 ones (Parkinson’s) and 48 zeros (healthy) in the status column of our dataset.
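Because the classes are imbalanced, it helps to know the majority-class baseline: a model that always predicts "Parkinson’s" would be right 147 times out of 195, about 75.4%. The sketch below computes that from the counts above; the array `demo_labels` is a stand-in rebuilt from those counts (named so it does not overwrite the notebook's `labels`).

```python
import numpy as np

# Stand-in labels with the counts reported above: 147 ones and 48 zeros.
demo_labels = np.array([1] * 147 + [0] * 48)

counts = np.bincount(demo_labels)          # counts[0] = healthy, counts[1] = Parkinson's
majority_share = counts.max() / counts.sum()
print(counts, f"majority-class baseline accuracy: {majority_share:.2%}")
```

In the notebook itself, `np.bincount(labels)` gives the same counts directly.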

```python
# Initialize a MinMaxScaler and scale the features to the range -1 to 1 to normalize them.
# MinMaxScaler transforms features by scaling each one to a given range;
# fit_transform() fits to the data and then transforms it. We don't need to scale the labels.
scaler = MinMaxScaler(feature_range=(-1, 1))
x = scaler.fit_transform(features)
y = labels
```
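For a target range `(a, b)`, MinMaxScaler rescales each column as `x_scaled = (x - min) / (max - min) * (b - a) + a`, using that column's own minimum and maximum. This small sketch checks the formula against scikit-learn on a toy matrix built from a few MDVP:Fo(Hz) and MDVP:Jitter(%) values in the preview above; the toy matrix is only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: two columns on very different scales.
X_toy = np.array([[119.992, 0.00784],
                  [122.400, 0.00968],
                  [116.014, 0.01284]])

a, b = -1.0, 1.0
col_min, col_max = X_toy.min(axis=0), X_toy.max(axis=0)
# Map each column linearly so its minimum becomes a and its maximum becomes b.
manual = (X_toy - col_min) / (col_max - col_min) * (b - a) + a

scaled = MinMaxScaler(feature_range=(a, b)).fit_transform(X_toy)
print(np.allclose(manual, scaled))  # True
```

Scaling puts features measured in hertz and features measured as small ratios on a comparable range.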

```python
# Split the dataset into training and testing sets, keeping 20% of the data for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
```
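With imbalanced classes and only 39 test records (20% of 195), a purely random split can shift the 0/1 ratio between the training and test sets. A possible variant, sketched here on stand-in labels with the dataset's counts, passes `stratify` to `train_test_split` so both splits keep roughly the same class proportions. This is a swapped-in option, not what the notebook ran, so it would select a different test set and the reported accuracy would not necessarily be reproduced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the dataset's class counts (147 ones, 48 zeros).
y_demo = np.array([1] * 147 + [0] * 48)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the 0/1 ratio (approximately) equal in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo)

print(len(y_te), np.bincount(y_te))
```

In the notebook, the equivalent change is adding `stratify=y` to the existing `train_test_split` call.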

```python
# Initialize an XGBClassifier and train the model.
# XGBClassifier classifies with eXtreme Gradient Boosting, an ensemble learning method:
# it trains many decision trees in sequence and combines them into one stronger model.
model = XGBClassifier()
model.fit(x_train, y_train)
```

```python
# Find the accuracy of the model:
# generate y_pred (predicted values for x_test) and print the accuracy as a percentage.
y_pred = model.predict(x_test)
print(accuracy_score(y_test, y_pred) * 100)
```

94.87179487179486
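The printed value corresponds to 37 of the 39 test records being classified correctly (37 / 39 ≈ 0.9487). Because the classes are imbalanced, a per-class view shows whether the errors are missed Parkinson’s cases or false alarms. The helper `summarize` below is a hypothetical sketch using scikit-learn's metrics; the toy arrays in the example are made up purely to show the output shape and are not results from this dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score


def summarize(y_true, y_pred):
    """Accuracy plus per-class metrics; label 1 = Parkinson's, 0 = healthy (hypothetical helper)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Rows are true labels [0, 1], columns are predicted labels [0, 1].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "sensitivity": recall_score(y_true, y_pred, pos_label=1),  # Parkinson's cases caught
        "specificity": recall_score(y_true, y_pred, pos_label=0),  # healthy cases cleared
    }


# Made-up toy labels, only to illustrate the output format.
toy_report = summarize(np.array([1, 1, 0, 0, 1]), np.array([1, 0, 0, 1, 1]))
print(toy_report)
```

In the notebook, `summarize(y_test, y_pred)` would report the same metrics for the real test set.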


## Conclusion

In this Python machine learning project, we learned to detect the presence of Parkinson’s disease in individuals from a set of voice measurements. We used an XGBClassifier for the model and the sklearn library to prepare the dataset. The model reached an accuracy of 94.87% (37 of the 39 test records classified correctly), which is strong for such a short script.