# Goal of the Competition

The goal of this competition is to predict MDS-UPDRS scores, which measure progression in patients with Parkinson's disease. The Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is a comprehensive assessment of both motor and non-motor symptoms associated with Parkinson's. You will develop a model trained on data of protein and peptide levels over time in subjects with Parkinson’s disease versus normal age-matched control subjects.

Your work could help provide important breakthrough information about which molecules change as Parkinson’s disease progresses.



# Context


Parkinson’s disease (PD) is a disabling brain disorder that affects movement, cognition, sleep, and other normal functions. Unfortunately, there is no current cure, and the disease worsens over time. It's estimated that by 2037, 1.6 million people in the U.S. will have Parkinson’s disease, at an economic cost approaching $80 billion. Research indicates that protein or peptide abnormalities play a key role in the onset and worsening of this disease. Gaining a better understanding of these abnormalities, with the help of data science, could provide important clues for the development of new pharmacotherapies to slow the progression of, or cure, Parkinson’s disease.

Current efforts have resulted in complex clinical and neurobiological data on over 10,000 subjects for broad sharing with the research community. A number of important findings have been published using this data, but clear biomarkers or cures are still lacking.

The competition host, the Accelerating Medicines Partnership® Parkinson’s Disease (AMP®PD), is a public-private partnership between government, industry, and nonprofits that is managed through the Foundation for the National Institutes of Health (FNIH). The Partnership created the AMP PD Knowledge Platform, which includes a deep molecular characterization and longitudinal clinical profiling of Parkinson’s disease patients, with the goal of identifying and validating diagnostic, prognostic, and/or disease progression biomarkers for Parkinson’s disease.

Your work could help in the search for a cure for Parkinson’s disease, which would alleviate the substantial suffering and medical care costs of patients with this disease.

# Project Structure

1. **Data collection**: Gather data on protein and peptide levels over time in subjects with Parkinson’s disease versus normal age-matched control subjects from the AMP PD Knowledge Platform.

2. **Data preprocessing**: Preprocess the data to handle any missing values, outliers, or inconsistencies. This might involve techniques such as imputation, normalization, and feature scaling.

3. **Feature engineering**: Extract relevant features from the data that could help predict MDS-UPDRS scores. This might involve techniques such as principal component analysis, feature selection, and feature transformation.

4. **Model selection**: Choose appropriate models that could accurately predict MDS-UPDRS scores. This might involve techniques such as linear regression, decision trees, random forests, or neural networks.

5. **Model training**: Train the chosen models on the preprocessed data, using techniques such as cross-validation to optimize hyperparameters and avoid overfitting.

6. **Model evaluation**: Evaluate the performance of the trained models on a separate test set, using metrics such as mean squared error, mean absolute error, or R-squared.

7. **Model interpretation**: Interpret the trained models to gain insights into the relationship between protein and peptide levels and MDS-UPDRS scores. This might involve techniques such as feature importance analysis or partial dependence plots.
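
Steps 2 through 7 can be sketched end to end on synthetic data. This is only an illustration under stated assumptions, not the notebook's actual pipeline: the feature names (`pep_a`, `pep_b`, `pep_c`) and the target are made up, the model is plain least-squares linear regression, and a single hold-out split stands in for cross-validation to keep the example short.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 200 visits, 3 peptide-like features.
n = 200
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["pep_a", "pep_b", "pep_c"])
y = 2.0 * X["pep_a"] - 1.0 * X["pep_b"] + rng.normal(scale=0.1, size=n)
X.iloc[::10, 2] = np.nan  # imitate missing measurements in pep_c

# Hold out the last 50 rows as a test set.
is_train = np.arange(n) < 150
X_train, X_test = X[is_train], X[~is_train]
y_train, y_test = y[is_train].to_numpy(), y[~is_train].to_numpy()

# Step 2: impute with training medians, then standardize with training statistics
# so that no information from the test set leaks into preprocessing.
medians = X_train.median()
X_train, X_test = X_train.fillna(medians), X_test.fillna(medians)
mean, std = X_train.mean(), X_train.std()
Z_train = ((X_train - mean) / std).to_numpy()
Z_test = ((X_test - mean) / std).to_numpy()


def add_intercept(Z: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the model can learn an offset."""
    return np.column_stack([np.ones(len(Z)), Z])


# Steps 4-5: fit ordinary least squares on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(Z_train), y_train, rcond=None)

# Step 6: evaluate on the held-out rows.
pred = add_intercept(Z_test) @ coef
mse = float(np.mean((y_test - pred) ** 2))
mae = float(np.mean(np.abs(y_test - pred)))
r2 = float(1 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2))
print(f"MSE={mse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")

# Step 7: coefficients on standardized features give a rough importance ranking.
importance = dict(zip(X.columns, np.abs(coef[1:])))
print(importance)
```

Fitting the imputation and scaling statistics on the training rows only is the design choice worth carrying over to the real data, whichever model is eventually chosen.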


# Import Libraries

In [3]:
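import os
# pandasgui reads the Windows-only APPDATA variable on import;
# define it (empty) first so the import below does not fail on macOS/Linux.
os.environ['APPDATA'] = ""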

In [4]:
import pandas as pd
import pandasgui

This process is not trusted! Input event monitoring will not be possible until it is added to accessibility clients.


# Data collection

In [None]:
## get data path
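
One way to fill in this step is to build the file paths from a single base directory rather than repeating an absolute path in every `read_csv` call. The sketch below assumes a hypothetical `DATA_DIR` and a hypothetical helper `data_path`; neither appears in the original notebook, and the directory should point at wherever the competition files actually live.

```python
from pathlib import Path

# Hypothetical location of the competition files; adjust to your own setup.
DATA_DIR = Path("amp-parkinsons-disease-progression-prediction")


def data_path(filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the path of a competition file inside the data directory."""
    return data_dir / filename


peptides_path = data_path("train_peptides.csv")
clinical_path = data_path("train_clinical_data.csv")
print(peptides_path)
print(clinical_path)
```

The resulting `Path` objects can be passed straight to `pd.read_csv`.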

In [8]:
train_peptides = pd.read_csv('/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/train_peptides.csv')

In [None]:
pandasgui.show(train_peptides)

PandasGUI INFO — pandasgui.gui — Opening PandasGUI


In [6]:
train = pd.read_csv('/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/train_clinical_data.csv')
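
With both tables loaded, a common next step is to reshape the peptide measurements to one row per visit and join them to the clinical scores. The sketch below uses a few made-up rows instead of the real files. The column names (`visit_id`, `Peptide`, `PeptideAbundance`, `updrs_1`) are assumptions about the competition files and are not shown anywhere in this notebook, so check them against `train_peptides.columns` and `train.columns` first.

```python
import pandas as pd

# Made-up rows in long format: one row per (visit, peptide) measurement.
peptides = pd.DataFrame({
    "visit_id": ["55_0", "55_0", "55_6", "55_6"],
    "Peptide": ["PEP_A", "PEP_B", "PEP_A", "PEP_B"],
    "PeptideAbundance": [10.0, 20.0, 30.0, 40.0],
})

# Made-up clinical rows: one row per visit with a score column.
clinical = pd.DataFrame({
    "visit_id": ["55_0", "55_6", "55_12"],
    "updrs_1": [10, 8, 9],
})

# Long -> wide: one row per visit, one column per peptide.
wide = peptides.pivot_table(
    index="visit_id",
    columns="Peptide",
    values="PeptideAbundance",
    aggfunc="mean",  # averages duplicates if a peptide appears twice in a visit
).reset_index()

# Left join keeps every clinical visit, even those without peptide data.
merged = clinical.merge(wide, on="visit_id", how="left")
print(merged)
```

Visits without any peptide measurements (here `55_12`) end up with missing values in the peptide columns, which is where the imputation mentioned in the preprocessing step would come in.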

In [7]:
pandasgui.show(train)

PandasGUI INFO — pandasgui.gui — Opening PandasGUI
qt.qpa.fonts: Populating font family aliases took 433 ms. Replace uses of missing font family "Consolas" with one that exists to avoid this cost. 


<pandasgui.gui.PandasGui at 0x13e1f1fc0>

PandasGUI ERROR — pandasgui.store — name 'v' is not defined
Traceback (most recent call last):
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/computation/scope.py", line 233, in resolve
    return self.resolvers[key]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py", line 982, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py", line 974, in __missing__
    raise KeyError(key)
KeyError: 'v'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/computation/scope.py", line 244, in resolve
    return s

PandasGUI ERROR — pandasgui.store — "None of [Index(['55_0', '55_3', '55_6', '55_9', '55_12', '55_18', '55_24', '55_30',\n       '55_36', '55_42',\n       ...\n       '65043_18', '65043_24', '65043_30', '65043_36', '65043_42', '65043_48',\n       '65043_54', '65043_60', '65043_72', '65043_84'],\n      dtype='object', length=2615)] are in the [index]"
Traceback (most recent call last):
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandasgui/store.py", line 621, in apply_filters
    df = df.query(filt.expr, engine='python')
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4452, in query
    result = self.loc[res]
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/indexing.py", line 1103, in __geti

PandasGUI ERROR — pandasgui.store — '>' not supported between instances of 'str' and 'int'
Traceback (most recent call last):
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandasgui/store.py", line 621, in apply_filters
    df = df.query(filt.expr, engine='python')
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4449, in query
    res = self.eval(expr, **kwargs)
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4575, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/computation/eval.py", line 336, in eval
    parsed_expr = Exp

PandasGUI ERROR — pandasgui.store — "None of [Index([  nan,   nan,   nan,   nan,   nan,   nan,   nan,   nan, 'Off', 'Off',\n       ...\n         nan,   nan,   nan,  'On', 'Off',   nan, 'Off', 'Off', 'Off', 'Off'],\n      dtype='object', length=983)] are in the [index]"
Traceback (most recent call last):
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandasgui/store.py", line 621, in apply_filters
    df = df.query(filt.expr, engine='python')
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4452, in query
    result = self.loc[res]
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/indexing.py", line 1103, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/ziadN

PandasGUI ERROR — pandasgui.store — unterminated string literal (detected at line 1) (<unknown>, line 1)
Traceback (most recent call last):
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandasgui/store.py", line 621, in apply_filters
    df = df.query(filt.expr, engine='python')
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4449, in query
    res = self.eval(expr, **kwargs)
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4575, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/Users/ziadNader/Desktop/Personal Projects/amp-parkinsons-disease-progression-prediction/amp_env/lib/python3.10/site-packages/pandas/core/computation/eval.py", line 336, in eval
    par
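
The tracebacks above show PandasGUI passing each filter to `DataFrame.query`, which expects an expression that evaluates to a boolean mask. The sketch below uses a made-up frame with hypothetical columns `month_text` and `state` to show two filter expressions that do evaluate cleanly. It also illustrates one possible cause of the `'>' not supported between instances of 'str' and 'int'` message, a numeric comparison against a text column. Which columns were actually involved in the logged errors is not known from the output.

```python
import pandas as pd

# Made-up data: a numeric-looking column stored as text, and a categorical column.
df = pd.DataFrame({
    "month_text": ["0", "6", "12", "24"],
    "state": ["Off", None, "On", "Off"],
})

# Comparing text with an integer fails, so convert the column to numbers first.
# errors="coerce" turns unparseable entries into NaN instead of raising.
df["month"] = pd.to_numeric(df["month_text"], errors="coerce")

# A filter must be a comparison (boolean mask), not just a column name,
# and string literals inside the expression need matching quotes.
late_visits = df.query("month > 6", engine="python")
off_state = df.query("state == 'Off'", engine="python")

print(late_visits)
print(off_state)
```

Checking `df.dtypes` before writing a filter is a quick way to see whether a column holds text or numbers.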