# Defining the cohort for real-world KEYNOTE-361 and IMvigor130

**The goal of this notebook is to identify patients with advanced or metastatic urothelial carcinoma who received first-line atezolizumab, pembrolizumab, or chemotherapy, mirroring the monotherapy and chemotherapy arms of the KEYNOTE-361 and IMvigor130 clinical trials.**

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Function that returns number of rows and count of unique PatientIDs for a dataframe. 
def row_ID(dataframe):
    row = dataframe.shape[0]
    ID = dataframe['PatientID'].nunique()
    return row, ID
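To see why `row_ID` reports two numbers, here is a toy example with hypothetical patient IDs. A patient with several lines of therapy contributes several rows but only one unique ID, so the two counts match only when every patient appears exactly once.

```python
import pandas as pd

def row_ID(dataframe):
    """Return (number of rows, number of unique PatientIDs)."""
    return dataframe.shape[0], dataframe['PatientID'].nunique()

# Hypothetical toy data: patient 'P1' has two lines of therapy, 'P2' has one.
toy = pd.DataFrame({
    'PatientID': ['P1', 'P1', 'P2'],
    'LineNumber': [1, 2, 1],
})

print(row_ID(toy))                           # (3, 2): P1 contributes two rows
print(row_ID(toy.query('LineNumber == 1')))  # (2, 2): one first-line row per patient
```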

## 1. Identify cohort receiving first-line atezolizumab or pembrolizumab

In [3]:
therapy = pd.read_csv('../data/LineOfTherapy.csv')

In [4]:
therapy.head(5)

| | PatientID | LineName | LineNumber | LineSetting | RegimenClass | IsMaintenanceTherapy | EnhancedCohort | StartDate | EndDate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | F5AAF96C85477 | Pembrolizumab | 1 | | | False | BLADDER | 2021-07-08 | 2021-09-14 |
| 1 | F43136CF07859 | Carboplatin,Paclitaxel | 1 | | | False | BLADDER | 2018-05-04 | 2018-08-29 |
| 2 | F43136CF07859 | Clinical Study Drug | 3 | | | False | BLADDER | 2019-04-04 | 2019-06-13 |
| 3 | F43136CF07859 | Pembrolizumab | 2 | | | False | BLADDER | 2018-08-30 | 2019-04-03 |
| 4 | F6FAD468C5AE0 | Nivolumab,Pembrolizumab | 1 | | | False | BLADDER | 2018-05-17 | 2019-01-07 |
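The `head()` output shows that rows are not stored in line order (patient F43136CF07859 has line 3 listed before line 2) and that dates are read in as strings. A minimal sketch of a sanity check, assuming that within a patient `StartDate` should not decrease as `LineNumber` increases, using the three rows for that patient shown above:

```python
import pandas as pd

# Rows for one patient copied from the head() output; line 3 precedes line 2 in the file.
rows = pd.DataFrame({
    'PatientID': ['F43136CF07859'] * 3,
    'LineName': ['Carboplatin,Paclitaxel', 'Clinical Study Drug', 'Pembrolizumab'],
    'LineNumber': [1, 3, 2],
    'StartDate': ['2018-05-04', '2019-04-04', '2018-08-30'],
})

# Parse dates so comparisons are chronological rather than string-based.
rows['StartDate'] = pd.to_datetime(rows['StartDate'])

# Put each patient's lines in LineNumber order.
ordered = rows.sort_values(['PatientID', 'LineNumber'])

# True per patient if start dates never go backwards as the line number increases.
monotonic = ordered.groupby('PatientID')['StartDate'].apply(lambda s: s.is_monotonic_increasing)

print(ordered[['LineNumber', 'LineName', 'StartDate']])
print(monotonic.all())
```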


In [5]:
therapy.query('LineNumber == 1').LineName.value_counts().head(20)

LineName
Carboplatin,Gemcitabine                  2226
Cisplatin,Gemcitabine                    2068
Pembrolizumab                            1486
Atezolizumab                              681
Avelumab                                  410
Carboplatin,Paclitaxel                    320
Nivolumab                                 301
Gemcitabine                               278
MVAC                                      261
Clinical Study Drug                       252
Cisplatin                                 137
Paclitaxel                                 98
Enfortumab Vedotin-Ejfv,Pembrolizumab      88
Enfortumab Vedotin-Ejfv                    78
Carboplatin                                71
Gemcitabine,Paclitaxel                     60
Carboplatin,Etoposide                      53
Fluorouracil,Mitomycin                     50
Pemetrexed                                 46
Docetaxel                                  36
Name: count, dtype: int64

In [6]:
therapy.query('LineNumber == 1').query('LineName == "Pembrolizumab" or LineName == "Atezolizumab"').LineName.value_counts()

LineName
Pembrolizumab    1486
Atezolizumab      681
Name: count, dtype: int64
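The `==` comparison is an exact string match on `LineName`, so combination regimens that contain pembrolizumab (for example `Enfortumab Vedotin-Ejfv,Pembrolizumab` or `Nivolumab,Pembrolizumab`) are excluded and only monotherapy lines are kept. The toy example below, built from regimen names in the counts above with hypothetical patient IDs, shows this, an equivalent `isin` filter, and what a substring search would pick up instead:

```python
import pandas as pd

toy = pd.DataFrame({
    'PatientID': ['A', 'B', 'C', 'D'],
    'LineNumber': [1, 1, 1, 1],
    'LineName': ['Pembrolizumab', 'Atezolizumab',
                 'Enfortumab Vedotin-Ejfv,Pembrolizumab', 'Nivolumab,Pembrolizumab'],
})

# Exact matching, as in the notebook: only monotherapy lines survive.
via_query = toy.query('LineNumber == 1 and (LineName == "Pembrolizumab" or LineName == "Atezolizumab")')

# Equivalent filter using isin with an explicit list of regimen names.
monotherapy = ['Pembrolizumab', 'Atezolizumab']
via_isin = toy[(toy['LineNumber'] == 1) & toy['LineName'].isin(monotherapy)]

print(via_query['PatientID'].tolist())  # ['A', 'B']
print(via_query.equals(via_isin))       # True

# A substring search would also pull in the combination regimens.
contains = toy[toy['LineName'].str.contains('Pembrolizumab')]
print(contains['PatientID'].tolist())   # ['A', 'C', 'D']
```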

In [7]:
checkpoint_df = (
    therapy
    .query('LineNumber == 1')
    .query('LineName == "Pembrolizumab" or LineName == "Atezolizumab"')
    [['PatientID', 'LineName', 'StartDate']])

In [8]:
checkpoint_df.sample(3)

| | PatientID | LineName | StartDate |
|---|---|---|---|
| 6506 | F95B8B499E9FA | Atezolizumab | 2017-03-31 |
| 14124 | F554FF907590A | Pembrolizumab | 2022-02-07 |
| 12525 | F50E1B5D5B44C | Pembrolizumab | 2020-03-10 |


In [9]:
row_ID(checkpoint_df)

(2167, 2167)
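Because the row count equals the number of unique `PatientID`s, each patient contributes exactly one first-line checkpoint record. A minimal sketch of turning that observation into an explicit check; the helper `assert_one_row_per_patient` and the toy frames are hypothetical:

```python
import pandas as pd

def assert_one_row_per_patient(df: pd.DataFrame) -> None:
    """Raise ValueError listing any PatientID that appears more than once."""
    dupes = df.loc[df['PatientID'].duplicated(keep=False), 'PatientID']
    if not dupes.empty:
        raise ValueError(f"Duplicate PatientIDs: {sorted(dupes.unique())}")

ok = pd.DataFrame({'PatientID': ['A', 'B'], 'LineName': ['Pembrolizumab', 'Atezolizumab']})
assert_one_row_per_patient(ok)  # passes silently

bad = pd.DataFrame({'PatientID': ['A', 'A'], 'LineName': ['Pembrolizumab', 'Atezolizumab']})
msg = None
try:
    assert_one_row_per_patient(bad)
except ValueError as err:
    msg = str(err)
print(msg)  # Duplicate PatientIDs: ['A']
```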

## 2. Identify cohort receiving first-line chemotherapy
**FDA-approved first-line chemotherapy regimens for advanced or metastatic urothelial carcinoma include:** 
- **Gemcitabine + Carboplatin/Cisplatin**
- **MVAC (methotrexate, vinblastine, doxorubicin, cisplatin)**
- **PGC (paclitaxel, gemcitabine, and cisplatin)**

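**Patients who received maintenance avelumab after first-line chemotherapy are included. To mirror the chemotherapy arms of KEYNOTE-361 and IMvigor130 (gemcitabine plus cisplatin or carboplatin), only the two gemcitabine-platinum doublets are kept in the chemotherapy cohort.**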
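The regimen list can be written as a lookup from a readable label to the `LineName` strings that appear in `LineOfTherapy.csv`, as seen in this notebook's value counts. The dictionary `CHEMO_REGIMENS`, its labels, the helper `regimen_names`, and the toy data are all illustrative. Selecting only the gemcitabine-platinum entry then reproduces the trial-like filter:

```python
import pandas as pd

# Illustrative mapping from regimen label to LineName strings (hypothetical structure).
CHEMO_REGIMENS = {
    'gemcitabine + platinum': ['Carboplatin,Gemcitabine', 'Cisplatin,Gemcitabine'],
    'MVAC': ['MVAC'],
    'paclitaxel + gemcitabine + platinum': ['Carboplatin,Gemcitabine,Paclitaxel',
                                            'Cisplatin,Gemcitabine,Paclitaxel'],
}

def regimen_names(*labels):
    """Flatten the LineName strings for the requested regimen labels."""
    return [name for label in labels for name in CHEMO_REGIMENS[label]]

toy = pd.DataFrame({
    'PatientID': ['A', 'B', 'C', 'D'],
    'LineNumber': [1, 1, 1, 2],
    'LineName': ['Cisplatin,Gemcitabine', 'MVAC',
                 'Carboplatin,Gemcitabine', 'Carboplatin,Gemcitabine'],
})

# Keep first-line gemcitabine-platinum only; D is excluded because it is line 2.
trial_like = regimen_names('gemcitabine + platinum')
selected = toy[(toy['LineNumber'] == 1) & toy['LineName'].isin(trial_like)]
print(selected['PatientID'].tolist())  # ['A', 'C']
```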

In [10]:
chemo_regimens = [
    'Carboplatin,Gemcitabine',
    'Cisplatin,Gemcitabine',
    'MVAC',
    'Carboplatin,Gemcitabine,Paclitaxel',
    'Cisplatin,Gemcitabine,Paclitaxel']

(therapy
 .query('LineNumber == 1 and LineName in @chemo_regimens')
 .LineName.value_counts())

LineName
Carboplatin,Gemcitabine               2226
Cisplatin,Gemcitabine                 2068
MVAC                                   261
Carboplatin,Gemcitabine,Paclitaxel      33
Cisplatin,Gemcitabine,Paclitaxel        20
Name: count, dtype: int64

In [11]:
chemo_df = (
    therapy
    .query('LineNumber == 1 and LineName in ["Carboplatin,Gemcitabine", "Cisplatin,Gemcitabine"]')
    [['PatientID', 'LineName', 'StartDate']]
    .assign(LineName = 'chemo'))
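`.assign(LineName = 'chemo')` overwrites the regimen name with a single label, so carboplatin- and cisplatin-based lines are pooled into one comparator group. If the platinum agent is needed later, for example to look at cisplatin-treated patients separately, one option is to record it in its own column before overwriting the name. A sketch on toy data; the column name `platinum` is hypothetical:

```python
import pandas as pd

lines = pd.DataFrame({
    'PatientID': ['A', 'B'],
    'LineName': ['Carboplatin,Gemcitabine', 'Cisplatin,Gemcitabine'],
    'StartDate': ['2019-01-02', '2020-05-06'],
})

chemo = lines.assign(
    # Pull out the platinum agent from the original regimen string.
    platinum=lines['LineName'].str.extract(r'(Carboplatin|Cisplatin)', expand=False).str.lower(),
    # Pool both doublets under a single treatment label.
    LineName='chemo',
)
print(chemo)
```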

In [12]:
chemo_df.sample(3)

| | PatientID | LineName | StartDate |
|---|---|---|---|
| 13520 | FD10443FE12B8 | chemo | 2014-06-25 |
| 5365 | F9DB617947C3C | chemo | 2021-06-21 |
| 9286 | FDC7252667272 | chemo | 2017-02-08 |


In [13]:
row_ID(chemo_df)

(4294, 4294)

## 3. Combine dataframes and export to CSV

In [14]:
full_cohort = pd.concat([checkpoint_df, chemo_df], axis = 0)

In [15]:
full_cohort.sample(5)

| | PatientID | LineName | StartDate |
|---|---|---|---|
| 8779 | F37553C494927 | chemo | 2020-03-20 |
| 1918 | F4800A812B339 | chemo | 2018-12-18 |
| 5242 | F61FA2C30CD96 | chemo | 2013-09-11 |
| 9491 | F12600412495E | chemo | 2022-04-29 |
| 14514 | F187F567846EE | chemo | 2017-03-27 |


In [16]:
row_ID(full_cohort)

(6461, 6461)
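Both counts equal 2167 + 4294 = 6461, so no patient appears in both the checkpoint and the chemotherapy cohort. That is expected, since each patient has a single first line. A sketch of making the overlap check explicit, using hypothetical toy frames in place of `checkpoint_df` and `chemo_df`:

```python
import pandas as pd

checkpoint = pd.DataFrame({'PatientID': ['A', 'B'], 'LineName': ['Pembrolizumab', 'Atezolizumab']})
chemo = pd.DataFrame({'PatientID': ['C', 'D'], 'LineName': ['chemo', 'chemo']})

# Patients present in both cohorts would be double-counted after concatenation.
overlap = set(checkpoint['PatientID']) & set(chemo['PatientID'])
print(overlap)  # set()

combined = pd.concat([checkpoint, chemo], axis=0)
print(combined['PatientID'].is_unique)  # True
```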

In [17]:
full_cohort.to_csv('full_cohort.csv', index = False)
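CSV does not store column types, so `StartDate` comes back as text when `full_cohort.csv` is read in a later notebook. A sketch of the round trip that restores the date type with `parse_dates`, using an in-memory buffer and two rows from the samples above instead of the real file:

```python
import io
import pandas as pd

full_cohort = pd.DataFrame({
    'PatientID': ['F37553C494927', 'F95B8B499E9FA'],
    'LineName': ['chemo', 'Atezolizumab'],
    'StartDate': ['2020-03-20', '2017-03-31'],
})

buffer = io.StringIO()
full_cohort.to_csv(buffer, index=False)  # same call as above, written to memory
buffer.seek(0)

# parse_dates converts StartDate to datetime64 when reading back.
reloaded = pd.read_csv(buffer, parse_dates=['StartDate'])
print(reloaded.dtypes)
```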