# 🧠 Thyroid Cancer Recurrence Prediction using Keras

In this notebook, we explore key clinical questions about **thyroid cancer recurrence** using exploratory plots and a neural network built with **Keras** (TensorFlow backend).

### 🔍 Objectives

We'll focus on answering the following questions:

1. **Are thyroid cancer recurrences more common in men or women?**
2. **How does age affect recurrence risk?**
3. **Can we predict recurrence based on tumor staging and pathology?**
4. **What is the relationship between treatment response and recurrence?**

The questions are explored with **interactive visualizations** in `Plotly`, and a **neural network** is trained to predict recurrence from all clinical variables in the dataset.

---

### 📦 Libraries Used

Make sure the following libraries are installed:

```python
!pip install pandas numpy plotly scikit-learn tensorflow
```

```python
# 📚 Library Imports

# Data manipulation
import pandas as pd
import numpy as np

# Preprocessing and model utilities
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Deep learning - Keras with TensorFlow backend
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Plotting
import plotly.express as px
from plotly.subplots import make_subplots
```

```python
# 📥 Load the dataset
df = pd.read_csv('data/filtered_thyroid_data.csv')

# 🔍 Preview the first few rows
print(df.head(3))

# 📊 Check class distribution for recurrence
print("\nTarget variable distribution:")
print(df['Recurred'].value_counts())
```

```
   Age Gender Hx Radiothreapy Adenopathy       Pathology   Focality Risk    T  \
0   27      F              No         No  Micropapillary  Uni-Focal  Low  T1a   
1   34      F              No         No  Micropapillary  Uni-Focal  Low  T1a   
2   30      F              No         No  Micropapillary  Uni-Focal  Low  T1a   

    N   M Stage       Response Recurred  
0  N0  M0     I  Indeterminate       No  
1  N0  M0     I      Excellent       No  
2  N0  M0     I      Excellent       No  

Target variable distribution:
Recurred
No     275
Yes    108
Name: count, dtype: int64
```

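The target is imbalanced: 275 patients did not recur and 108 did. A quick sanity check is the accuracy of a trivial classifier that always predicts "No", which any useful model has to beat. The sketch below hard-codes the counts printed above. It also computes "balanced" class weights with the heuristic `n_samples / (n_classes * count)` (the formula scikit-learn documents for `class_weight='balanced'`); passing them to Keras is an optional idea, not something this notebook does.

```python
# Counts copied from the value_counts() output above
counts = {"No": 275, "Yes": 108}
n_samples = sum(counts.values())  # 383 patients in total

# Accuracy of always predicting the majority class ("No")
baseline_acc = max(counts.values()) / n_samples
print(f"Majority-class baseline accuracy: {baseline_acc:.2%}")

# Optional "balanced" class weights: n_samples / (n_classes * count).
# Keys follow the LabelEncoder codes used in this notebook (No -> 0, Yes -> 1).
n_classes = len(counts)
class_weight = {
    0: n_samples / (n_classes * counts["No"]),
    1: n_samples / (n_classes * counts["Yes"]),
}
print({k: round(v, 3) for k, v in class_weight.items()})
```

This prints a baseline of 71.80%, which is the number the test accuracy of 93.51% reported later should be compared against. The weights (about 0.696 for "No" and 1.773 for "Yes") could be passed as `model.fit(..., class_weight=class_weight)`; whether that helps on this dataset is not evaluated here.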

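```python
# 🔧 Data Preprocessing

# Work on a copy so the original, human-readable dataframe stays intact for plotting
df_enc = df.copy()

# Encode each categorical column with its own LabelEncoder.
# One encoder per column lets us map codes back to labels later;
# a single shared encoder would only remember the last column it was fitted on.
encoders = {}
for col in df_enc.select_dtypes(include='object').columns:
    encoders[col] = LabelEncoder()
    df_enc[col] = encoders[col].fit_transform(df_enc[col])

# Split features (X) and target (y); 'Recurred' is encoded alphabetically as No -> 0, Yes -> 1
X = df_enc.drop('Recurred', axis=1)
y = df_enc['Recurred']

# 📊 Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% for testing
    random_state=42,  # Reproducible split
    stratify=y        # Keep the No/Yes ratio the same in both sets
)
```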

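`LabelEncoder` assigns integer codes in alphabetical order, so for a nominal column such as `Response` the codes imply an ordering that has no clinical meaning. A common alternative, sketched below on a small hypothetical frame, is to one-hot encode nominal columns with `pd.get_dummies` and give genuinely ordered columns such as `Risk` an explicit mapping. Apart from `Low`, `Excellent` and `Indeterminate` (visible in the preview above), the category labels used here are assumptions about the dataset.

```python
import pandas as pd

# Hypothetical toy rows; the real category labels in the dataset may differ
toy = pd.DataFrame({
    "Risk": ["Low", "Intermediate", "High", "Low"],
    "Response": ["Excellent", "Indeterminate", "Structural Incomplete", "Excellent"],
})

# Ordinal column: an explicit mapping preserves the order Low < Intermediate < High
risk_order = {"Low": 0, "Intermediate": 1, "High": 2}
toy["Risk"] = toy["Risk"].map(risk_order)

# Nominal column: one 0/1 indicator per category, so no artificial order is implied
toy_encoded = pd.get_dummies(toy, columns=["Response"], dtype=int)
print(toy_encoded)
```

Because the model's input layer reads its size from `X_train.shape[1]`, switching to this encoding would not require changing the network definition, only the preprocessing.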
```python
# 🔧 Feature Scaling

# Fit the scaler on the training data only, then reuse the same mean/std for the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 🧠 Building the Neural Network Model
model = Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),  # one input per clinical feature
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')  # probability that the cancer recurred
])

# 📊 Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# 🏋️‍♂️ Train the model; 20% of the training set is held out for validation,
# and training stops early if val_loss does not improve for 5 epochs
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=16,
    validation_split=0.2,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        )
    ],
    verbose=1
)

# 🧪 Evaluate the model on the held-out test set
loss, acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {acc:.2%}')
```
```
Epoch 1/50
...
Epoch 50/50
Test accuracy: 93.51%
```

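The `EarlyStopping` callback above watches `val_loss` and stops training once it has gone `patience=5` epochs without beating its best value so far. The plain-Python sketch below is a simplified model of that rule (it ignores options such as `min_delta` and `baseline`) applied to a made-up loss curve; it is not Keras's implementation. In the run above all 50 epochs were logged, which suggests the rule never fired before the epoch limit.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Simplified patience rule: return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None if training would run through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation-loss curve that improves, plateaus, then rises
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50]
print(early_stopping_epoch(losses, patience=5))
```

On this toy curve training would stop after epoch 11, and with `restore_best_weights=True` Keras would keep the weights from epoch 6 (the lowest validation loss) instead of those from the last epoch.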

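Accuracy alone can hide how many recurrences the model misses, which matters clinically and given the 275/108 class imbalance. The sketch below turns predicted probabilities into labels at a 0.5 threshold and computes the confusion-matrix counts, precision, and recall for the "Yes" class with NumPy. The arrays are hypothetical placeholders, not results from this notebook; in practice you would pass `y_test` and `model.predict(X_test).ravel()`.

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    """Confusion-matrix counts plus precision/recall for the positive class (1 = recurred)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp,
            "precision": precision, "recall": recall}


# Hypothetical labels and predicted probabilities
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.2, 0.1, 0.9, 0.8, 0.4, 0.7]
print(binary_report(y_true, y_prob))
print(binary_report(y_true, y_prob, threshold=0.35))
```

Lowering the threshold from 0.5 to 0.35 in this toy example raises recall for "Yes" from 0.75 to 1.0 at the cost of precision (0.75 to 0.8 here only because the toy data is tiny; in general precision tends to drop). scikit-learn's `confusion_matrix` and `classification_report` in `sklearn.metrics` compute the same quantities.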
```python
# 🔍 Data Visualization for Insights

# 2x2 grid; the sunburst (bottom-left) needs a "domain" subplot instead of x/y axes
fig = make_subplots(
    rows=2, cols=2,
    specs=[[{}, {}],
           [{"type": "domain"}, {}]],
    subplot_titles=(
        '1️⃣ Recurrence by Gender',
        '2️⃣ Age vs Recurrence',
        '3️⃣ Stage and Pathology vs Recurrence',
        '4️⃣ Treatment Response vs Recurrence'
    )
)

# 1️⃣ Gender-based Recurrence (grouped counts)
fig1 = px.histogram(df, x='Gender', color='Recurred', barmode='group', text_auto=True)
for trace in fig1.data:
    fig.add_trace(trace, row=1, col=1)

# 2️⃣ Age-based Recurrence Distribution (Boxplot)
fig2 = px.box(df, x='Recurred', y='Age', color='Recurred')
for trace in fig2.data:
    trace.showlegend = False  # "No"/"Yes" already appear in the legend from panel 1
    fig.add_trace(trace, row=1, col=2)

# 3️⃣ Sunburst Chart: Stage and Pathology vs Recurrence
fig3 = px.sunburst(df, path=['Stage', 'Pathology', 'Recurred'])
fig.add_trace(fig3.data[0], row=2, col=1)

# 4️⃣ Treatment Response vs Recurrence (grouped counts)
fig4 = px.histogram(df, x='Response', color='Recurred', barmode='group', text_auto=True)
for trace in fig4.data:
    trace.showlegend = False  # avoid duplicate legend entries
    fig.add_trace(trace, row=2, col=2)

# Update layout for final figure appearance
fig.update_layout(
    height=800,
    width=1000,
    title_text='Thyroid Cancer Recurrence Analysis',
    legend_title_text='Recurred',
    template='plotly_dark'
)

# Show the final figure
fig.show()
```
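The grouped histograms show counts. If one group is much larger than another in the cohort, raw counts can make recurrence look more common in that group even when its recurrence *rate* is lower, so comparing rates per group answers questions 1, 2 and 4 more directly. The sketch below uses a small hypothetical frame with the notebook's column names; the age bins are an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
toy = pd.DataFrame({
    "Gender":   ["F", "F", "F", "F", "F", "F", "M", "M"],
    "Age":      [25, 34, 41, 58, 63, 72, 45, 67],
    "Response": ["Excellent", "Excellent", "Indeterminate", "Excellent",
                 "Structural Incomplete", "Excellent", "Structural Incomplete", "Excellent"],
    "Recurred": ["No", "No", "No", "Yes", "Yes", "No", "Yes", "No"],
})
toy["recurred_flag"] = (toy["Recurred"] == "Yes").astype(int)

# Q1: recurrence rate (not count) per gender
rate_by_gender = toy.groupby("Gender")["recurred_flag"].mean()

# Q2: recurrence rate per age band (bin edges are an arbitrary choice)
toy["age_band"] = pd.cut(toy["Age"], bins=[0, 40, 60, 120], labels=["<=40", "41-60", ">60"])
rate_by_age = toy.groupby("age_band", observed=False)["recurred_flag"].mean()

# Q4: share of each Recurred outcome within each treatment response
response_table = pd.crosstab(toy["Response"], toy["Recurred"], normalize="index")

print(rate_by_gender, rate_by_age, response_table, sep="\n\n")
```

In this toy frame women account for more recurrences by count (2 vs 1) but have the lower rate (about 33% vs 50%). Replacing `toy` with `df` applies the same comparison to the real data; small groups make such rates noisy, so they are best read alongside the group sizes.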