## Clinical Trial and Experimental Bias: Detection and Attrition

### Author: Victoria McCray

In [1]:
import datetime

current_date = datetime.date.today()

print("Last Updated:", current_date)

Last Updated: 2025-02-25


## Detection Bias

Detection bias occurs when outcomes are measured or assessed in systematically different ways between groups. For example, if outcomes in one group are measured more often or more thoroughly than in the other, the results for that group can be exaggerated.

### Example
A study measuring cognitive function in patients with neurodegenerative diseases may be biased if some participants undergo more frequent or detailed cognitive testing than others.
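
To make this concrete, the sketch below simulates detection bias under assumed numbers. Both hypothetical groups share the same true score distribution, but one group is tested once while the other is tested three times and its best score is recorded. The group size, score distribution, noise level, and "best of three" rule are illustrative assumptions, not taken from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical participants per group

# Both groups share the same true ability distribution (no real difference).
true_a = rng.normal(75, 10, n)
true_b = rng.normal(75, 10, n)

noise_sd = 5  # assumed measurement noise on a single test

# Group A: tested once.
measured_a = true_a + rng.normal(0, noise_sd, n)

# Group B: tested three times; the best score is recorded.
attempts_b = true_b[:, None] + rng.normal(0, noise_sd, (n, 3))
measured_b = attempts_b.max(axis=1)

# Any gap here comes from how outcomes were measured, not from a true effect.
apparent_effect = measured_b.mean() - measured_a.mean()
print(f"Apparent difference caused only by measurement: {apparent_effect:.2f}")
```

Because the best of several noisy measurements tends to be higher than a single measurement, the more intensively tested group appears to score better even though the groups do not truly differ.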



### Mitigation
Implement *blinding* and randomization during data collection and analysis:

- **Blinding**: The experimenters, including those who measure outcomes and those who analyze the data, should be blinded to the group assignments.
- **Randomization**: Randomly assign participants to treatment or control groups to minimize selection bias, which could otherwise confound the results.
- **Blinding Protocol**: The protocol should define in advance the stage at which the blinding codes will be broken, ideally only after all data have been analyzed, so that bias is minimized throughout the study.
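
One way to picture the blinding protocol is with coded group labels. The sketch below is a minimal illustration, not a prescribed procedure: assessors and analysts see only opaque codes (`A`/`B`), while the key that maps codes to arms is held separately. The helper `unblind` and the `analysis_locked` flag are hypothetical names introduced here for illustration.

```python
import random

rng = random.Random(42)
participants = [f"Participant_{i+1}" for i in range(10)]

# Balanced random allocation to the two arms.
allocation = ['Treatment'] * 5 + ['Control'] * 5
rng.shuffle(allocation)

# Randomly map the real arms to opaque codes so "A"/"B" reveal nothing.
codes = ['A', 'B']
rng.shuffle(codes)
unblinding_key = dict(zip(['Treatment', 'Control'], codes))  # kept away from assessors

# What outcome assessors and analysts are allowed to see.
blinded_view = {p: unblinding_key[g] for p, g in zip(participants, allocation)}


def unblind(blinded, key, analysis_locked):
    """Reveal the real arms, but only once the analysis has been locked."""
    if not analysis_locked:
        raise PermissionError("Blinding codes may only be broken after the analysis is locked.")
    reverse = {code: arm for arm, code in key.items()}
    return {p: reverse[c] for p, c in blinded.items()}


print(list(blinded_view.items())[:3])
```

Calling `unblind(blinded_view, unblinding_key, analysis_locked=True)` recovers the original allocation, while passing `analysis_locked=False` raises an error, mirroring a protocol that fixes when the codes may be broken.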

In [2]:
import pandas as pd
import numpy as np
import random
import seaborn as sns
import matplotlib.pyplot as plt

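# Seed both Python's `random` (used for shuffling) and NumPy (used for the
# simulated outcomes) for reproducibility
random.seed(42)
np.random.seed(42)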

# Number of participants
n_participants = 100

# Create a list of participant IDs
participants = [f"Participant_{i+1}" for i in range(n_participants)]

# Step 1: Randomly assign participants to Treatment or Control
# Randomly shuffle a list of treatment and control labels
groups = ['Treatment'] * (n_participants // 2) + ['Control'] * (n_participants // 2)
random.shuffle(groups)



In [3]:
# Step 2: Create DataFrame
df = pd.DataFrame({
    'Participant': participants,
    'Group': groups
})

# Step 3: Simulate outcomes for each participant (e.g., cognitive scores)
# Simulate some baseline scores for treatment and control
treatment_outcome = np.random.normal(80, 10, n_participants // 2)  # Treatment group scores
control_outcome = np.random.normal(75, 10, n_participants // 2)    # Control group scores

# Assign scores to DataFrame
df.loc[df['Group'] == 'Treatment', 'Outcome'] = treatment_outcome
df.loc[df['Group'] == 'Control', 'Outcome'] = control_outcome

print("Step 4: Blind the experimenter (they cannot know who belongs to which group)")
# We simulate the "blinded" experimenter by creating a new column where the group label is hidden.
# This represents the experimenter's perspective during outcome measurement.
df['Blinded_Group'] = 'Unknown'  # Every participant gets the same masked label
# Display the first few rows of the dataframe
blind = df.head()
blind

Step 4: Blind the experimenter (they cannot know who belongs to which group)


| | Participant | Group | Outcome | Blinded_Group |
|---|---|---|---|---|
| 0 | Participant_1 | Treatment | 76.088347 | Unknown |
| 1 | Participant_2 | Treatment | 82.645521 | Unknown |
| 2 | Participant_3 | Control | 83.8184 | Unknown |
| 3 | Participant_4 | Treatment | 78.468526 | Unknown |
| 4 | Participant_5 | Control | 80.42008 | Unknown |


In [4]:
print("Step 5: Break the blind (only after the blinded analysis is complete)")
# Once outcome measurement and analysis are finished, the group labels can be revealed.
df['Blinded_Group'] = df['Group']  # "Breaking the blind" after analysis

# Display the first few rows of the broken blind dataframe
broken_blind = df.head()
broken_blind


Step 5: Break the blind (only after the blinded analysis is complete)


| | Participant | Group | Outcome | Blinded_Group |
|---|---|---|---|---|
| 0 | Participant_1 | Treatment | 76.088347 | Treatment |
| 1 | Participant_2 | Treatment | 82.645521 | Treatment |
| 2 | Participant_3 | Control | 83.8184 | Control |
| 3 | Participant_4 | Treatment | 78.468526 | Treatment |
| 4 | Participant_5 | Control | 80.42008 | Control |
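
Once the blind is broken, outcomes can be compared between groups. The sketch below shows one possible analysis, not one prescribed by this notebook: it builds a small hypothetical DataFrame `demo` with the same `Group` and `Outcome` columns, summarizes each group, and computes Welch's t statistic by hand with NumPy. The seed, group sizes, and distribution parameters are assumptions chosen to mirror the simulation above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical unblinded data with the same columns as the notebook's df.
demo = pd.DataFrame({
    'Group': ['Treatment'] * 50 + ['Control'] * 50,
    'Outcome': np.concatenate([rng.normal(80, 10, 50), rng.normal(75, 10, 50)]),
})

# Per-group sample size, mean, and standard deviation.
summary = demo.groupby('Group')['Outcome'].agg(['count', 'mean', 'std'])
print(summary)

treat = demo.loc[demo['Group'] == 'Treatment', 'Outcome']
ctrl = demo.loc[demo['Group'] == 'Control', 'Outcome']

# Welch's t statistic: difference in means over its estimated standard error,
# without assuming equal variances in the two groups.
welch_t = (treat.mean() - ctrl.mean()) / np.sqrt(
    treat.var(ddof=1) / len(treat) + ctrl.var(ddof=1) / len(ctrl)
)
print(f"Welch t statistic: {welch_t:.2f}")
```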


## Attrition Bias

Attrition bias occurs when participants who drop out of a study differ systematically from those who remain.


### Simulating Attrition Bias in a Study

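In this section, we simulate a clinical trial with 150 participants, split evenly into two groups: Treatment and Control. We then simulate baseline outcome scores for each group and simulate attrition by randomly removing 20% of the participants. Random dropout of this kind mainly shrinks the sample; the examples that follow describe how dropout that depends on group or outcome biases the results.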

1. Simulating Participant Data and Randomization
We first define the number of participants, generate random baseline outcome scores, and assign participants to either the Treatment or Control group.

### Example: Internal Validity
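Suppose you run a study of a program that encourages responsible alcohol use among college students (Bhandari, 2021). In your study, many more participants drop out of the treatment group than the control group.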

For the participants who stay, the treatment is more successful than the control protocol in encouraging responsible alcohol use.

But it’s hard to form a conclusion, because you don’t know what the outcomes were for the participants who left the treatment group.
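
The internal validity problem can be illustrated with a small simulation. In this sketch, both groups are drawn from the same distribution, so the true treatment effect is zero, but treatment participants with low scores are more likely to drop out. The sample size, the cutoff of 70, and the dropout probabilities are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000  # hypothetical participants per group

# No true treatment effect: both groups come from the same distribution.
treatment = rng.normal(75, 10, n)
control = rng.normal(75, 10, n)

# Assumed dropout rule: treatment participants scoring below 70 drop out
# with probability 0.6, everyone else with probability 0.1.
p_drop_treat = np.where(treatment < 70, 0.6, 0.1)
stay_treat = rng.random(n) > p_drop_treat
stay_ctrl = rng.random(n) > 0.1

true_diff = treatment.mean() - control.mean()
observed_diff = treatment[stay_treat].mean() - control[stay_ctrl].mean()

print(f"Difference using everyone enrolled: {true_diff:.2f}")
print(f"Difference using completers only:   {observed_diff:.2f}")
```

Among completers the treatment looks beneficial, even though the only thing that differs between the groups is who stayed in the study.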

### Example: External Validity

Most of the participants who drop out from your study consume much more alcohol than those who stay. They leave because they aren’t satisfied with the program.
Your final sample is skewed towards college students who consume low-to-moderate amounts of alcohol.

Your findings aren’t applicable to all college students, because your sample underrepresents those who drink large amounts of alcohol.
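
The same idea can be sketched for external validity. Below, a hypothetical enrolled sample has a right-skewed distribution of weekly drinks, and heavier drinkers are assumed to drop out more often. The gamma distribution, the threshold of 14 drinks per week, and the dropout probabilities are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000  # hypothetical enrolled students

# Hypothetical weekly drinks, right-skewed.
drinks = rng.gamma(shape=2.0, scale=4.0, size=n)
heavy = drinks > 14  # assumed cutoff for "drinks large amounts"

# Assumed dropout probabilities: 50% for heavy drinkers, 10% otherwise.
p_drop = np.where(heavy, 0.5, 0.1)
stayed = rng.random(n) > p_drop

share_heavy_enrolled = heavy.mean()
share_heavy_final = heavy[stayed].mean()

print(f"Heavy drinkers among enrolled:     {share_heavy_enrolled:.1%}")
print(f"Heavy drinkers in final sample:    {share_heavy_final:.1%}")
print(f"Mean weekly drinks, enrolled:      {drinks.mean():.2f}")
print(f"Mean weekly drinks, final sample:  {drinks[stayed].mean():.2f}")
```

The final sample underrepresents heavy drinkers, so conclusions drawn from it may not generalize to the population that was originally enrolled.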

### Preventing Attrition Bias
- Increase accessibility of participation
- Conduct outreach tailored for protected populations
- Increase benefits, compensation, and motivational offerings to participants when feasible

In [5]:
import pandas as pd
import numpy as np
import random
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer

In [6]:
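# Seed both Python's `random` (used for shuffling) and NumPy (used for the
# simulated outcomes and for pandas sampling) for reproducibility
random.seed(42)
np.random.seed(42)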

# Number of participants
n_participants = 150

# Create a list of participant IDs
participants = [f"Participant_{i+1}" for i in range(n_participants)]

# Randomly assign participants to Treatment or Control
groups = ['Treatment'] * (n_participants // 2) + ['Control'] * (n_participants // 2)
random.shuffle(groups)

# Simulate baseline outcome scores (before attrition occurs)
treatment_outcome = np.random.normal(80, 10, n_participants // 2)  # Treatment group scores
control_outcome = np.random.normal(75, 10, n_participants // 2)    # Control group scores

# Create DataFrame
df = pd.DataFrame({
    'Participant': participants,
    'Group': groups
})

# Assign outcomes
df.loc[df['Group'] == 'Treatment', 'Outcome'] = treatment_outcome
df.loc[df['Group'] == 'Control', 'Outcome'] = control_outcome

In [7]:
# initial row count (number of participants)
len(df)

150

In [8]:
# Initialize dataset before simulated drop out
before_df = df

# Randomly sample 20% of the DataFrame's index
drop_indices = df.sample(frac=0.2).index

# Drop the sampled indices from the DataFrame
after_df = df.drop(drop_indices)

# The DataFrame now has 20% fewer rows
print(len(after_df))

120


In [9]:
print("Row Count before Simulated Drop Out:", len(before_df))
before_df

Row Count before Simulated Drop Out: 150


| | Participant | Group | Outcome |
|---|---|---|---|
| 0 | Participant_1 | Treatment | 61.077011 |
| 1 | Participant_2 | Control | 89.345540 |
| 2 | Participant_3 | Control | 74.700179 |
| 3 | Participant_4 | Control | 55.779368 |
| 4 | Participant_5 | Control | 67.398284 |
| ... | ... | ... | ... |
| 145 | Participant_146 | Treatment | 87.699242 |
| 146 | Participant_147 | Treatment | 99.450725 |
| 147 | Participant_148 | Treatment | 73.285100 |
| 148 | Participant_149 | Treatment | 73.503686 |


In [10]:
print("Row Count after Simulated Drop Out:", len(after_df))
after_df

Row Count after Simulated Drop Out: 120


| | Participant | Group | Outcome |
|---|---|---|---|
| 0 | Participant_1 | Treatment | 61.077011 |
| 1 | Participant_2 | Control | 89.345540 |
| 3 | Participant_4 | Control | 55.779368 |
| 4 | Participant_5 | Control | 67.398284 |
| 6 | Participant_7 | Control | 79.859062 |
| ... | ... | ... | ... |
| 142 | Participant_143 | Control | 57.503796 |
| 143 | Participant_144 | Treatment | 88.072925 |
| 147 | Participant_148 | Treatment | 73.285100 |
| 148 | Participant_149 | Treatment | 73.503686 |


Dropping these participants discards information and can introduce bias. Instead, we can keep every participant and impute the missing outcomes using a measure of central tendency, such as the mean or median.

In [11]:
# Step 1: Select the participants who drop out
drop_indices = df.sample(frac=0.2, random_state=42).index  # Randomly select 20% of participants

# Step 2: Create a new DataFrame after attrition (still includes all participants)
after_df = before_df.copy()  # Work on a copy so the pre-attrition data stays intact
after_df.loc[drop_indices, 'Outcome'] = np.nan  # Assign NaN to 'Outcome' instead of dropping

# Print results
print(f"Total participants in after_df: {len(after_df)}")  # Should still be 150
print(f"Number of missing outcomes (dropped out participants): {after_df['Outcome'].isna().sum()}")

Total participants in after_df: 150
Number of missing outcomes (dropped out participants): 30


In [12]:
# Impute missing data using SimpleImputer (median imputation for simplicity)
imputer = SimpleImputer(strategy='median')
# fit_transform returns a 2-D array; flatten it to a single column
after_df['Imputed_Outcome'] = imputer.fit_transform(after_df[['Outcome']]).ravel()

# Compare original and imputed outcomes
after_df[['Outcome','Imputed_Outcome']]
# Inspect the unique outcome values after attrition (NaN marks dropped-out participants)
after_df['Outcome'].unique()

array([ 61.07701096,  89.34554019,  74.70017942,  55.77936797,
        67.39828409,  82.15132966,  79.85906216,  71.57046762,
        74.38125542,          nan,  66.73157125,  55.12388429,
        66.99197435,  74.57895176,  80.01339067,  73.33451791,
        80.97938629,  59.90036197,  79.37598137,  72.42189306,
        76.99045799,  75.44915337,  64.1368751 ,  75.96609569,
        91.08504681,  68.58407373,  79.90393517,  78.79293884,
        95.3045008 ,  81.48100707,  78.28804266,  80.74360111,
        81.20513366,  69.0057795 ,  63.39130012,  83.24919007,
        58.78621189,  83.40917602,  58.48344606,  69.37109339,
       100.53373799,  83.60284835,  71.16363928,  57.63513294,
        89.75432349,  93.33062466,  73.7545119 ,  75.87044019,
        98.21787808,  75.23139931,  86.79657973,  71.91247852,
        74.94179569,  79.51962716,  78.07193479,  72.59745968,
        70.70058685,  54.09677145,  74.48312375,  74.28618999,
        80.89630968,  81.72963674,  52.70177404,  62.69
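
Median imputation keeps every participant, but filling all gaps with one value shrinks the spread of the data and ignores group membership. The sketch below compares overall median imputation with a group-wise alternative on a small hypothetical DataFrame `demo`. It uses pandas `fillna` rather than `SimpleImputer`, and the column names `Imputed_Overall` and `Imputed_By_Group` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical trial data mirroring the simulation above.
demo = pd.DataFrame({
    'Group': ['Treatment'] * 75 + ['Control'] * 75,
    'Outcome': np.concatenate([rng.normal(80, 10, 75), rng.normal(75, 10, 75)]),
})

# Simulate attrition: 20% of outcomes go missing at random.
missing_idx = demo.sample(frac=0.2, random_state=42).index
demo.loc[missing_idx, 'Outcome'] = np.nan

# Option 1: fill every gap with the overall median of the observed outcomes.
demo['Imputed_Overall'] = demo['Outcome'].fillna(demo['Outcome'].median())

# Option 2: fill each gap with the median of that participant's own group.
group_median = demo.groupby('Group')['Outcome'].transform('median')
demo['Imputed_By_Group'] = demo['Outcome'].fillna(group_median)

# Standard deviations: imputing a constant pulls values toward the center.
print(demo[['Outcome', 'Imputed_Overall', 'Imputed_By_Group']].std())
```

The imputed columns have no missing values, but their standard deviation is smaller than that of the observed outcomes, which is worth keeping in mind when interpreting any analysis run on imputed data.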

## Works Cited
Bhandari, P. (2021, November 1). Attrition bias in research. Scribbr. https://www.scribbr.com/research-bias/attrition-bias/

## Additional Resources

Attrition Bias: https://www.scribbr.com/research-bias/attrition-bias/

An Easy Introduction to Statistical Significance (With Examples): https://www.scribbr.com/statistics/statistical-significance/