# Welcome!


In this session we will tackle the problem of **Clinical Trial Matching**!


# Questions

## What are Clinical Trials?

Clinical trials are research studies performed in people that evaluate a medical, surgical, or behavioral intervention. They are the primary way researchers find out whether a new treatment, such as a new drug, diet, or medical device, is safe and effective in people.

## What is Clinical Trial Matching?

Clinical trial matching facilitates patient enrollment in clinical trials by checking each patient against a trial's set of requirements.

----

Understood? Let's begin!

<img src="https://www.memecreator.org/static/images/memes/5045204.jpg">

# Unimportant settings

In [1]:
# This setting allows the notebook to show all 
# outputs instead of only the last one. It's just a QoL thing.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Import packages

This is where we import the **Python tools** that will help us tackle this problem.

In [13]:
import pandas as pd # to read and analyse data

# Import the data

In [3]:
clinical_trials_data = pd.read_csv('../data/sample_collection.csv', index_col=0)
patients_data = pd.read_csv('../data/patients_sample.csv', index_col=0)

## Clinical Trials

So now we have our clinical trials below 👀

In [4]:
clinical_trials_data.head()
clinical_trials_data.shape

| id | title | summary | gender | min_age | max_age |
|---|---|---|---|---|---|
| NCT00000408 | Low Back Pain Patient Education Evaluation | \n Back pain is one of the most common of... | Both | 18 Years | |
| NCT00000492 | Beta-Blocker Heart Attack Trial (BHAT) | \n To determine whether the regular admin... | Both | 30 Years | 69 Years |
| NCT00000501 | Hypertension Prevention Trial (HPT) Feasibilit... | \n To test the feasibility and the effica... | Both | 25 Years | 49 Years |
| NCT00001853 | Diabetes and Heart Disease Risk in Blacks | \n It is unknown if obesity contributes t... | Both | 18 Years | 65 Years |
| NCT00004727 | Antiplatelet Therapy to Prevent Stroke in Afri... | \n The African-American Antiplatelet Stro... | Both | 29 Years | 85 Years |


(3170, 5)

It seems we have 3170 clinical trials here!

Each clinical trial consists of:
- an **identifier**
- a **title**
- a **summary**
- and some explicit requirements:
    - **gender**
    - **min_age**
    - **max_age**

For a patient to be eligible to join a trial, they must satisfy both the **explicit requirements** and the **summary's requirements**!
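The explicit requirements can be checked mechanically. Here is a minimal sketch, assuming ages are always written as `"<number> Years"` (as in the rows shown above), that an empty age means "no limit", and that the `gender` column holds `Both` or a label such as `Male`/`Female` (only `Both` appears in the sample, so the other labels are an assumption). The function names are hypothetical helpers, not part of the dataset or of pandas.

```python
import math


def parse_age_years(value):
    """Turn a string such as '18 Years' into 18.0; return None when no limit is given."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # Assumption: the first token is the number and the unit is always years.
    return float(str(value).split()[0])


def meets_explicit_requirements(trial, patient_age, patient_gender):
    """Check a patient's age and gender against a trial's explicit requirements."""
    min_age = parse_age_years(trial["min_age"])
    max_age = parse_age_years(trial["max_age"])
    if min_age is not None and patient_age < min_age:
        return False
    if max_age is not None and patient_age > max_age:
        return False
    # 'Both' accepts everyone; otherwise the labels must match exactly.
    return trial["gender"] in ("Both", patient_gender)


# Explicit requirements of NCT00000492 (BHAT), copied from the table above.
bhat_trial = {"gender": "Both", "min_age": "30 Years", "max_age": "69 Years"}

print(meets_explicit_requirements(bhat_trial, 58, "Female"))  # True
print(meets_explicit_requirements(bhat_trial, 72, "Male"))    # False: older than 69
```

This only covers the structured columns; the requirements buried in the summary text still need to be handled separately.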

In [7]:
clinical_trials_data.iloc[10]

title      Influence of Amlodipine on the Mortality of Pa...
summary    \n      Patients with end-stage renal failure ...
gender                                                  Both
min_age                                             18 Years
max_age                                             90 Years
Name: NCT00124969, dtype: object

The explicit requirements seem pretty easy to understand. What additional information does the **summary** contain?

In [8]:
print(clinical_trials_data.iloc[10].summary)


      Patients with end-stage renal failure have a markedly higher mortality because of
      cardiovascular events in comparison with the normal population. Disorders in the calcium
      metabolism, such as calcification of the vessel walls, occur very frequently. There are
      indications that calcium channel blockers are capable of lowering the cardiovascular
      mortality in patients with end-stage renal failure.

      It is intended to carry out a prospective, randomized, double-blind, placebo-controlled,
      multicenter study in order to find out if the calcium channel blocker amlodipine is able to
      reduce the mortality of patients with end-stage renal failure.

      The investigation will be carried out after suitable explanation and written informed
      consent in 356 patients aged between 18 and 90 years with end-stage renal failure and
      chronic haemodialysis treatment. The patients will be randomized to either treatment with
      amlodipine 10 mg/day or

We can now see that the summary text adds further requirements:
- the patient must have "end-stage renal failure"
- the patient must currently be under "chronic haemodialysis treatment"
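To get a feel for why these text requirements are harder than the explicit ones, here is a deliberately naive sketch that checks whether a piece of text mentions every required phrase. The phrase list is hand-written from the summary above, and `mentions_all_phrases` is a hypothetical helper, not an existing library function.

```python
def mentions_all_phrases(text, phrases):
    """Return True if every phrase occurs in the text, ignoring letter case."""
    lowered = text.lower()
    return all(phrase.lower() in lowered for phrase in phrases)


# Hand-picked from the trial summary; an assumption about what "counts" as a requirement.
required_phrases = ["end-stage renal failure", "chronic haemodialysis"]

matching_note = "Patient with end-stage renal failure on chronic haemodialysis for 2 years."
us_spelling_note = "Patient with end-stage renal failure on chronic hemodialysis for 2 years."

print(mentions_all_phrases(matching_note, required_phrases))     # True
print(mentions_all_phrases(us_spelling_note, required_phrases))  # False: spelling variant
```

Exact phrase matching misses spelling variants, synonyms, and negations ("no history of renal failure"), so it is only a starting point.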

## Patients

In [9]:
patients_data.head(3)
patients_data.shape

| patient_id | description |
|---|---|
| 20141 | A 58-year-old African-American woman presents ... |
| 201410 | A physician is called to see a 67-year-old wom... |
| 201411 | A 40-year-old woman with no past medical histo... |


(51, 1)

We have 51 patients. Each patient's condition is described by the column **description**. Let's look at one of them:

In [10]:
print(patients_data.iloc[19].description)

A 72-year-old man complains of increasing calf pain when walking uphill. The symptoms have gradually increased over the past 3 months. The patient had an uncomplicated myocardial infarction 2 years earlier and a transient ischemic attack 6 months ago. Over the past month, his blood pressure has worsened despite previous control with diltiazem, hydrochlorothiazide, and propranolol. His is currently taking isosorbide dinitrate, hydrochlorothiazide, and aspirin. On physical examination, his blood pressure is 151/91 mm Hg, and his pulse is 67/min. There is a right carotid bruit. His lower extremities are slightly cool to the touch and have diminished pulses at the dorsalis pedis.
        


Patient information is stored as **written notes** by healthcare professionals (e.g., doctors and nurses).
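Because the notes are free text, even basic facts such as age and gender have to be pulled out of the prose. As a minimal sketch, assuming descriptions follow the "N-year-old man/woman" wording seen in the samples above, a regular expression can recover them. The function name `extract_age_and_gender` and the `Male`/`Female` labels are hypothetical choices for illustration, not part of the dataset.

```python
import re

# Matches phrases such as "72-year-old".
AGE_PATTERN = re.compile(r"(\d+)-year-old")
# Matches the first standalone gender word; \b stops "man" matching inside "woman".
GENDER_PATTERN = re.compile(r"\b(man|woman|male|female)\b", re.IGNORECASE)
GENDER_LABELS = {"man": "Male", "male": "Male", "woman": "Female", "female": "Female"}


def extract_age_and_gender(description):
    """Return (age, gender) found in a description; either value is None if not found."""
    age_match = AGE_PATTERN.search(description)
    gender_match = GENDER_PATTERN.search(description)
    age = int(age_match.group(1)) if age_match else None
    gender = GENDER_LABELS[gender_match.group(1).lower()] if gender_match else None
    return age, gender


print(extract_age_and_gender("A 72-year-old man complains of increasing calf pain when walking uphill."))
# (72, 'Male')
print(extract_age_and_gender("A 58-year-old African-American woman presents"))
# (58, 'Female')
```

A pattern like this breaks as soon as the wording changes (for example "aged 72" or "gentleman"), and it says nothing about the patient's conditions or treatments.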

<img src="https://memegenerator.net/img/instances/75581336.jpg" width="500">

> Q1: How can we make a computer understand text data well enough to match patients with trials?