<h1 style="border:2px solid Purple;text-align:center">RANZCR CLiP - Catheter and Line Position Challenge</h1>

<h1 style="border:2px solid Purple;text-align:center">The Competition</h1>

Serious complications can occur as a result of malpositioned lines and tubes in patients. Doctors and nurses frequently use checklists for placement of lifesaving equipment to ensure they follow protocol in managing patients. Yet, these steps can be time consuming and are still prone to human error, especially in stressful situations when hospitals are at capacity.

Hospital patients can have catheters and lines inserted during the course of their admission and serious complications can arise if they are positioned incorrectly. Nasogastric tube malpositioning into the airways has been reported in up to 3% of cases, with up to 40% of these cases demonstrating complications [1-3]. Airway tube malposition in adult patients intubated outside the operating room is seen in up to 25% of cases [4,5]. The likelihood of complication is directly related to both the experience level and specialty of the proceduralist. Early recognition of malpositioned tubes is the key to preventing risky complications (even death), even more so now that millions of COVID-19 patients are in need of these tubes and lines.

The gold standard for confirming line and tube positions is the chest radiograph. However, a physician or radiologist must manually check these chest x-rays to verify that the lines and tubes are in the optimal position. Not only does this leave room for human error, but delays are also common, as radiologists can be busy reporting other scans. Deep learning algorithms may be able to automatically detect malpositioned catheters and lines. Once alerted, clinicians can reposition or remove them to avoid life-threatening complications.

The Royal Australian and New Zealand College of Radiologists (RANZCR) is a not-for-profit professional organisation for clinical radiologists and radiation oncologists in Australia, New Zealand, and Singapore. The group is one of many medical organisations around the world (including the NHS) that recognizes malpositioned tubes and lines as preventable. RANZCR is helping design safety systems where such errors will be caught.

<h1 style="border:2px solid Purple;text-align:center">Objective</h1>

In this competition, you’ll detect the presence and position of catheters and lines on chest x-rays. Use machine learning to train and test your model on 40,000 images to categorize a tube that is poorly placed.


<h1 style="border:2px solid Purple;text-align:center">Dataset</h1>

The dataset has been labelled with a set of definitions to keep the labelling consistent. The normal category includes lines that were appropriately positioned and did not require repositioning. The borderline category includes lines that would ideally require some repositioning but would in most cases still function adequately in their current position. The abnormal category includes lines that required immediate repositioning.

<h1 style="border:2px solid Purple;text-align:center">Importing Necessary Libraries</h1>

In [None]:
import pandas as pd
import numpy as np
import cv2
from ast import literal_eval
from tqdm import tqdm
import os

import plotly.express as px  # bundled with plotly; the standalone plotly_express package is deprecated
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt

from plotly.offline import init_notebook_mode
init_notebook_mode()

<h1 style="border:2px solid Purple;text-align:center">Training Dataset</h1>

In [None]:
BASE_DIR = "../input/ranzcr-clip-catheter-line-classification/"
df = pd.read_csv(BASE_DIR + 'train.csv')
df.head()

In [None]:
print(f"The size of the dataset is {df.shape[0]} and it contains {df['PatientID'].nunique()} patient ids")

### Columns

* StudyInstanceUID - unique ID for each image
* ETT - Abnormal - endotracheal tube placement abnormal
* ETT - Borderline - endotracheal tube placement borderline abnormal
* ETT - Normal - endotracheal tube placement normal
* NGT - Abnormal - nasogastric tube placement abnormal
* NGT - Borderline - nasogastric tube placement borderline abnormal
* NGT - Incompletely Imaged - nasogastric tube placement inconclusive due to imaging
* NGT - Normal - nasogastric tube placement normal
* CVC - Abnormal - central venous catheter placement abnormal
* CVC - Borderline - central venous catheter placement borderline abnormal
* CVC - Normal - central venous catheter placement normal
* Swan Ganz Catheter Present - a Swan-Ganz catheter is present in the image
* PatientID - unique ID for each patient in the dataset
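
Each label is stored as its own binary column, so a single image can carry several labels at once. The sketch below counts positive labels per image on a small made-up frame; the three rows and their values are invented for illustration and are not taken from `train.csv`.

```python
import pandas as pd

# Toy frame with a subset of the label columns; values are invented for illustration.
toy = pd.DataFrame({
    "StudyInstanceUID": ["a", "b", "c"],
    "ETT - Normal": [1, 0, 0],
    "NGT - Normal": [1, 0, 0],
    "CVC - Normal": [1, 1, 0],
})
label_cols = [c for c in toy.columns if c != "StudyInstanceUID"]

# A row-wise sum gives the number of positive labels for each image.
toy["n_labels"] = toy[label_cols].sum(axis=1)
print(toy[["StudyInstanceUID", "n_labels"]])
```

The same row-wise sum should work on the real `df` with the full list of label columns, assuming every label column holds only 0 or 1.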

In [None]:
idCounts = df['PatientID'].value_counts().reset_index()
idCounts.columns = ['PatientID', 'Number of Observations']
idCounts = idCounts.sort_values(by = 'Number of Observations', ascending = False)
idCounts.head()

In [None]:
fig = px.histogram(idCounts, 'Number of Observations', title = 'Distribution of Number of Observations per PatientID', template = 'ggplot2')
fig.show()

Most PatientIDs have only one observation in the training dataset, but some have more than 100 records.
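
Because some patients contribute many images, a random row-wise split for validation could place images of the same patient on both sides. One possible way to avoid that, sketched here under the assumption that a patient-wise split is wanted, is to assign whole patients to folds. The helper `assign_patient_folds` is hypothetical and not part of this notebook; it uses a simple greedy balancing rule on toy IDs.

```python
from collections import defaultdict

def assign_patient_folds(patient_ids, n_folds=5):
    """Return one fold index per row so that a patient never spans two folds."""
    # Count how many rows each patient contributes.
    counts = defaultdict(int)
    for pid in patient_ids:
        counts[pid] += 1

    # Greedy balancing: place the largest patients first, each into the
    # fold that currently holds the fewest rows.
    fold_sizes = [0] * n_folds
    fold_of = {}
    for pid, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[pid] = target
        fold_sizes[target] += n

    return [fold_of[pid] for pid in patient_ids]

# Toy patient IDs, invented for illustration.
toy_ids = ["p1", "p1", "p1", "p2", "p3", "p3", "p4"]
folds = assign_patient_folds(toy_ids, n_folds=2)
print(list(zip(toy_ids, folds)))
```

On the real data this could be called as `assign_patient_folds(df['PatientID'].to_list())`. The sketch balances only row counts, not label frequencies.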

In [None]:
categories = ['ETT - Abnormal', 'ETT - Borderline',
       'ETT - Normal', 'NGT - Abnormal', 'NGT - Borderline',
       'NGT - Incompletely Imaged', 'NGT - Normal', 'CVC - Abnormal',
       'CVC - Borderline', 'CVC - Normal','Swan Ganz Catheter Present']
categoryCounts = df[categories].sum(axis = 0).reset_index()
categoryCounts.columns = ['Malpositions', 'Number of Observations']

In [None]:
fig = px.bar(categoryCounts, y = 'Malpositions', x = 'Number of Observations', template = 'seaborn', text = 'Number of Observations', title = 'Line and tube positions')
fig.show()

In [None]:
countsDf = {'Type' : [], 'Malposition'  : [], 'Num of Observations' : []}
for Type in ['Normal','Abnormal','Borderline']:
    for malposition in ['ETT','NGT','CVC']:
        colName = f'{malposition} - {Type}'
        countsDf['Type'].append(Type)
        countsDf['Malposition'].append(malposition)
        val = df[colName].sum(axis = 0)
        countsDf['Num of Observations'].append(val)
countsDf = pd.DataFrame(countsDf)

In [None]:
fig = px.bar(countsDf, x = 'Num of Observations', y = 'Type', color = 'Malposition', barmode = 'stack', 
             color_discrete_map={'ETT' : '#a2885e', 'NGT' : '#e9cf87', 'CVC' : '#f1efd9'}, template = 'plotly_dark')
fig.show()

<h1 style="border:2px solid Purple;text-align:center">Image Annotations</h1>

In [None]:
annotations = pd.read_csv(BASE_DIR + 'train_annotations.csv')
annotations['data'] = annotations['data'].apply(literal_eval)
annotations.head()
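
The `data` column is read from the CSV as text, which is why `literal_eval` is applied to it. A minimal sketch of that conversion on an invented string, assuming each entry is a list of `[x, y]` pairs as the plotting code further down expects:

```python
from ast import literal_eval

# Invented example of how one 'data' cell might look as raw text.
raw = "[[10, 20], [15, 40], [30, 80]]"

# literal_eval safely parses the text into nested Python lists.
points = literal_eval(raw)

# Split the pairs into separate x and y coordinate lists for plotting.
x_loc = [p[0] for p in points]
y_loc = [p[1] for p in points]
print(type(points), x_loc, y_loc)
```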

In [None]:
IMAGE_DIR_TRAIN = BASE_DIR + 'train/'
IMAGE_DIR_TEST = BASE_DIR + 'test/'

In [None]:
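def plot_image_and_annotations(image_uid, title):
    """Show a training image next to the same image with its first annotation drawn on top."""
    image_path = IMAGE_DIR_TRAIN + image_uid + '.jpg'
    data = annotations[annotations['StudyInstanceUID'] == image_uid]['data']
    # Not every training image has an annotation; skip plotting when none exists
    if len(data) == 0:
        print(f"No annotations found for {image_uid} ({title})")
        return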
    plt.figure(figsize=(10,6))

    plt.subplot(1, 2,1)
    img = plt.imread(image_path)
    
    print(f"Image dimensions:  {img.shape[0],img.shape[1]}")
    print(f"Maximum pixel value : {img.max():.1f} ; Minimum pixel value:{img.min():.1f}")
    print(f"Mean value of the pixels : {img.mean():.1f} ; Standard deviation : {img.std():.1f}")
    
    plt.imshow(img, cmap='gray')
    plt.title('Actual Image')
    plt.axis('off')

    
    # Only the first annotation row for this image is drawn
    data = data.values[0]
    x_loc = [x[0] for x in data]
    y_loc = [x[1] for x in data]
    
    plt.subplot(1,2,2)
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    plt.plot(x_loc, y_loc, linewidth = 5.0)
    plt.tight_layout()
    
    plt.title('Annotated Image')
    
    plt.suptitle(title)
    plt.show()
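
The function above draws only the first annotation row of an image. If an image has several annotation rows, they could be collected as in the sketch below. The helper `polylines_for_image` is hypothetical, and the `label` field is an assumption about the annotation file's columns; the toy rows are invented.

```python
def polylines_for_image(rows, image_uid):
    """Collect (label, xs, ys) for every annotation row of one image."""
    lines = []
    for row in rows:
        if row["StudyInstanceUID"] != image_uid:
            continue
        xs = [p[0] for p in row["data"]]
        ys = [p[1] for p in row["data"]]
        lines.append((row["label"], xs, ys))
    return lines

# Toy annotation rows, invented for illustration.
toy_rows = [
    {"StudyInstanceUID": "img1", "label": "CVC - Normal", "data": [[1, 2], [3, 4]]},
    {"StudyInstanceUID": "img1", "label": "NGT - Normal", "data": [[5, 6], [7, 8]]},
    {"StudyInstanceUID": "img2", "label": "ETT - Normal", "data": [[9, 9]]},
]
lines = polylines_for_image(toy_rows, "img1")
print(lines)
```

With the real data, `rows` could come from `annotations.to_dict('records')`, and each tuple could then be drawn with `plt.plot(xs, ys, label=label)`.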

In [None]:
imageIds = {}
for cat in categories:
    imageIds[cat] = df[df[cat] == 1]['StudyInstanceUID'].to_list()

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - ETT - Abnormal</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[0]][0], categories[0])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - ETT - Borderline</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[1]][2], categories[1])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - ETT - Normal</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[2]][-1], categories[2])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - NGT - Abnormal</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[3]][-1], categories[3])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - NGT - Borderline</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[4]][-1], categories[4])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - NGT - Incompletely Imaged</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[5]][-1], categories[5])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - NGT - Normal</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[6]][7], categories[6])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - CVC - Abnormal</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[7]][-1], categories[7])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - CVC - Borderline</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[8]][0], categories[8])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - CVC - Normal</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[9]][-1], categories[9])

<h1 style="border:2px solid Purple;text-align:center">Image Annotations - Swan Ganz Catheter Present</h1>

In [None]:
plot_image_and_annotations(imageIds[categories[10]][5], categories[10])

In [None]:
image_width = []
image_height = []
for image_uid in tqdm(df['StudyInstanceUID'].to_list()):
    image_path = IMAGE_DIR_TRAIN + image_uid + '.jpg'
    img = cv2.imread(image_path,cv2.IMREAD_GRAYSCALE)
    image_height.append(img.shape[0])
    image_width.append(img.shape[1])
    
df['Image Height'] = image_height
df['Image Width'] = image_width
df['Total Pixels'] = df['Image Height'] * df['Image Width']

In [None]:
test_image_width = []
test_image_height = []
for image_name in tqdm(os.listdir(IMAGE_DIR_TEST)):
    image_path = IMAGE_DIR_TEST + image_name
    img = cv2.imread(image_path,cv2.IMREAD_GRAYSCALE)
    test_image_height.append(img.shape[0])
    test_image_width.append(img.shape[1])

In [None]:
trace0 = go.Violin(x = image_height, name = 'Train Height')
trace1 = go.Violin(x = image_width, name = 'Train Width')

# Test traces: heights and widths were swapped relative to their names
trace2 = go.Violin(x = test_image_height, name = 'Test Height')
trace3 = go.Violin(x = test_image_width, name = 'Test Width')

fig = go.Figure([trace0,trace2, trace1, trace3])
fig.update_layout(title = 'Image Height and Widths', template = 'ggplot2')
fig.show()

## Work in Progress