# Welcome

Manually searching for clinical trials that fit a patient is very resource-intensive, so the process should be automated.

## Problem
Manually checking each patient description and assigning the patient to a clinical trial is very costly in human resources.

## Objective

Create an algorithm that automatically assigns candidate patients to clinical trials.

In [1]:
import pandas as pd

# Data

In [2]:
!ls ../data

patients_sample.csv  qrels_sample.csv  sample_collection.csv


In [3]:
clinical_trials = pd.read_csv('../data/sample_collection.csv')
patients = pd.read_csv('../data/patients_sample.csv')
relevance = pd.read_csv('../data/qrels_sample.csv')

In [4]:
clinical_trials.head(2)
clinical_trials.shape

patients.head(2)
patients.shape

|   | id | title | summary | gender | min_age | max_age |
|---|----|-------|---------|--------|---------|---------|
| 0 | NCT00000408 | Low Back Pain Patient Education Evaluation | \n Back pain is one of the most common of... | Both | 18 Years | |
| 1 | NCT00000492 | Beta-Blocker Heart Attack Trial (BHAT) | \n To determine whether the regular admin... | Both | 30 Years | 69 Years |


(3170, 6)

|   | patient_id | description |
|---|------------|-------------|
| 0 | 20141 | A 58-year-old African-American woman presents ... |
| 1 | 201410 | A physician is called to see a 67-year-old wom... |


(51, 2)

# Let's go easy


Main objective: match patients to trials based on **gender** requirements.

## Part 1: Divide and conquer

We have a big objective, so to solve it let's divide it into smaller objectives that are easier to complete.

<br>

**Sub-objectives**:
- **A**: Detect gender from patient description
- **B**: Go through each trial and assign patients based on their gender

In [5]:
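def naive_man_detector(text):
    """Return True if the start of `text` contains a male reference.

    Hard-coded rule: only exact, space-separated tokens are matched, so
    'man,' (with punctuation) is missed and 'woman' is never matched.
    """
    possible_male_references = ['man', 'male', 'm']

    # convert everything to lower case (possible exercise!)
    text = text.lower()

    # gender is usually stated in the first sentence,
    # so only the first ~100 characters are searched
    first_part = text[:100]

    words = first_part.split(" ")

    # True as soon as any token is an exact male reference
    return any(word in possible_male_references for word in words)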

Note: use these hard-coded rules to explain the advantage of applying ML to this task.
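One limitation of the hard-coded rules in `naive_man_detector` is that splitting on spaces misses tokens with attached punctuation (for example `man,`), and every patient not detected as male is implicitly treated as female. The sketch below is one possible refinement, not the notebook's method: it uses regular expressions with word boundaries and returns `None` when no gender term is found. The function name `detect_gender` and both term lists are hypothetical and would need extending for real data.

```python
import re

# Hypothetical term lists (assumptions, not taken from the dataset).
MALE_TERMS = ("man", "male", "boy", "gentleman")
FEMALE_TERMS = ("woman", "female", "girl", "lady")


def _compile(terms):
    # \b matches on word boundaries, so "man" is not found inside "woman"
    # and "male" is not found inside "female".
    return re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)


MALE_RE = _compile(MALE_TERMS)
FEMALE_RE = _compile(FEMALE_TERMS)


def detect_gender(text, window=100):
    """Return 'male', 'female', or None if no gender term is found."""
    first_part = text[:window]
    male = MALE_RE.search(first_part)
    female = FEMALE_RE.search(first_part)
    if male and female:
        # Both kinds of term appear: trust whichever is mentioned first.
        return "male" if male.start() < female.start() else "female"
    if male:
        return "male"
    if female:
        return "female"
    return None


print(detect_gender("A 58-year-old African-American woman presents to the ER"))
print(detect_gender("A 45-year-old man, previously healthy, presents with fever"))
print(detect_gender("Patient reports chest pain"))
```

Even this version is a list of hand-written rules: every new phrasing needs a new rule, which is the kind of maintenance cost a learned classifier is meant to reduce.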

In [6]:
patients['is_male'] = patients.description.apply(naive_man_detector)

Now that we can classify each patient by gender, let's assign patients to clinical trials.
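The matching rule itself can be sketched on toy records before running it on the real data. The sample rows shown earlier use `Both` as a gender value, so this sketch assumes that `Both` and `All` both mean the trial is open to any gender; that reading is an assumption, as are the toy records and the helper name `trial_accepts`.

```python
# Toy records standing in for rows of the trial and patient tables.
toy_trials = [
    {"id": "T1", "gender": "Both"},
    {"id": "T2", "gender": "Male"},
    {"id": "T3", "gender": "Female"},
    {"id": "T4", "gender": "All"},
]
toy_patients = [
    {"patient_id": 1, "is_male": True},
    {"patient_id": 2, "is_male": False},
]

# Assumption: these values mean the trial accepts any gender.
OPEN_TO_EVERYONE = {"both", "all"}


def trial_accepts(trial_gender, is_male):
    """Return True if a trial's gender requirement admits the patient."""
    required = str(trial_gender).strip().lower()
    if required in OPEN_TO_EVERYONE:
        return True
    return required == ("male" if is_male else "female")


# Map each patient id to the ids of the trials that accept the patient.
toy_matches = {
    p["patient_id"]: [
        t["id"] for t in toy_trials if trial_accepts(t["gender"], p["is_male"])
    ]
    for p in toy_patients
}
print(toy_matches)
```

Normalising the requirement to lower case before comparing keeps the rule tolerant of inconsistent capitalisation or stray whitespace in the source column.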

In [7]:
patient2trials = {}

for patient in patients.itertuples(index=False):

    patient_id = patient.patient_id
    gender = 'male' if patient.is_male else 'female'

    # gender values a trial must declare to accept this patient
    # NOTE: the sample rows shown earlier have gender == 'Both', which
    # neither list contains, so those trials are skipped here
    accepted = ['All', 'Male'] if patient.is_male else ['All', 'Female']

    patient2trials[patient_id] = []
    for trial in clinical_trials.itertuples(index=False):
        if trial.gender in accepted:
            patient2trials[patient_id].append(trial)

    print(f'Patient {patient_id} is believed to be a {gender} and was matched to {len(patient2trials[patient_id])} trials!')

Patient 20141 is believed to be a female and was matched to 317 trials!
Patient 201410 is believed to be a female and was matched to 317 trials!
Patient 201411 is believed to be a female and was matched to 317 trials!
Patient 201412 is believed to be a female and was matched to 317 trials!
Patient 201413 is believed to be a female and was matched to 317 trials!
Patient 201414 is believed to be a male and was matched to 70 trials!
Patient 201415 is believed to be a female and was matched to 317 trials!
Patient 201416 is believed to be a female and was matched to 317 trials!
Patient 201417 is believed to be a male and was matched to 70 trials!
Patient 201418 is believed to be a male and was matched to 70 trials!
Patient 20142 is believed to be a male and was matched to 70 trials!
Patient 201421 is believed to be a female and was matched to 317 trials!
Patient 201422 is believed to be a female and was matched to 317 trials!
Patient 201423 is believed to be a male and was matched to 70 tri