# Defining cohort for real-world KEYNOTE-361 and IMvigor130

**The goal of this notebook is to identify patients with advanced or metastatic urothelial carcinoma who received first-line treatment with atezolizumab, pembrolizumab, or chemotherapy, following protocols consistent with the KEYNOTE-361 and IMvigor130 clinical trials.**

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Return the number of rows and the number of unique PatientIDs in a dataframe.
# Equal values mean every patient appears on exactly one row.
def row_ID(dataframe):
    row = dataframe.shape[0]
    ID = dataframe['PatientID'].nunique()
    return row, ID

## 1. Identify cohort receiving first-line atezolizumab or pembrolizumab

In [3]:
therapy = pd.read_csv('../data/LineOfTherapy.csv')

In [4]:
therapy.head(5)

Unnamed: 0,PatientID,LineName,LineNumber,LineSetting,RegimenClass,IsMaintenanceTherapy,EnhancedCohort,StartDate,EndDate
0,F5AAF96C85477,Pembrolizumab,1,,,False,BLADDER,2021-07-08,2021-09-14
1,F43136CF07859,"Carboplatin,Paclitaxel",1,,,False,BLADDER,2018-05-04,2018-08-29
2,F43136CF07859,Clinical Study Drug,3,,,False,BLADDER,2019-04-04,2019-06-13
3,F43136CF07859,Pembrolizumab,2,,,False,BLADDER,2018-08-30,2019-04-03
4,F6FAD468C5AE0,"Nivolumab,Pembrolizumab",1,,,False,BLADDER,2018-05-17,2019-01-07


In [5]:
therapy.query('LineNumber == 1').LineName.value_counts().head(20)

LineName
Carboplatin,Gemcitabine                  2226
Cisplatin,Gemcitabine                    2068
Pembrolizumab                            1486
Atezolizumab                              681
Avelumab                                  410
Carboplatin,Paclitaxel                    320
Nivolumab                                 301
Gemcitabine                               278
MVAC                                      261
Clinical Study Drug                       252
Cisplatin                                 137
Paclitaxel                                 98
Enfortumab Vedotin-Ejfv,Pembrolizumab      88
Enfortumab Vedotin-Ejfv                    78
Carboplatin                                71
Gemcitabine,Paclitaxel                     60
Carboplatin,Etoposide                      53
Fluorouracil,Mitomycin                     50
Pemetrexed                                 46
Docetaxel                                  36
Name: count, dtype: int64

In [6]:
therapy.query('LineNumber == 1 and LineName in ["Pembrolizumab", "Atezolizumab"]').LineName.value_counts()

LineName
Pembrolizumab    1486
Atezolizumab      681
Name: count, dtype: int64

In [7]:
# First-line single-agent pembrolizumab or atezolizumab; keep only the columns needed downstream.
checkpoint_df = (
    therapy
    .query('LineNumber == 1 and LineName in ["Pembrolizumab", "Atezolizumab"]')
    [['PatientID', 'LineName', 'StartDate']])

In [8]:
checkpoint_df.sample(3)

Unnamed: 0,PatientID,LineName,StartDate
7686,F5525BAFBEECB,Pembrolizumab,2021-04-02
1092,FBE58CF8C10B2,Pembrolizumab,2022-06-24
9583,F84214198A790,Pembrolizumab,2019-06-26


In [9]:
row_ID(checkpoint_df)

(2167, 2167)
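The equal row and ID counts above indicate that each patient appears once in `checkpoint_df`. Below is a minimal sketch of turning that visual check into an explicit one. The helper name `assert_one_row_per_patient` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd


def assert_one_row_per_patient(dataframe, id_col="PatientID"):
    """Return the dataframe unchanged, or raise if any patient ID repeats.

    Hypothetical helper, not part of the original notebook.
    """
    repeated = dataframe.loc[dataframe[id_col].duplicated(keep=False), id_col]
    if not repeated.empty:
        raise ValueError(
            f"{repeated.nunique()} patient(s) appear on more than one row"
        )
    return dataframe


# Toy data with made-up IDs, for illustration only.
toy = pd.DataFrame({
    "PatientID": ["P001", "P002", "P002"],
    "LineName": ["Pembrolizumab", "Atezolizumab", "Atezolizumab"],
})

try:
    assert_one_row_per_patient(toy)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True  # P002 appears twice in the toy data
```

Because the helper returns its input on success, it could also be chained with `.pipe(assert_one_row_per_patient)` at the end of a selection.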

## 2. Identify cohort receiving first-line chemotherapy
**FDA-approved first-line chemotherapy regimens for advanced or metastatic urothelial carcinoma include:**
- **Gemcitabine + Carboplatin/Cisplatin**
- **MVAC (methotrexate, vinblastine, doxorubicin, cisplatin)**
- **PGC (paclitaxel, gemcitabine, and cisplatin)**

**Patients receiving maintenance avelumab are included.**
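The regimens listed above can be matched without depending on the order in which drugs are written. The sketch below assumes `LineName` is a comma-separated list of drug names (as in the outputs above) and that MVAC is recorded as a single label. The names `FIRST_LINE_CHEMO` and `is_first_line_chemo` and the toy rows are illustrative only.

```python
import pandas as pd

# Regimens from the list above, stored as order-independent sets of drug names.
# MVAC is kept as one token because it appears as a single label in LineName.
FIRST_LINE_CHEMO = {
    frozenset({"Carboplatin", "Gemcitabine"}),
    frozenset({"Cisplatin", "Gemcitabine"}),
    frozenset({"MVAC"}),
    frozenset({"Cisplatin", "Gemcitabine", "Paclitaxel"}),  # PGC
}


def is_first_line_chemo(line_name):
    """Return True if the comma-separated regimen matches a listed regimen."""
    drugs = frozenset(part.strip() for part in line_name.split(","))
    return drugs in FIRST_LINE_CHEMO


# Toy rows for illustration only; the first one reverses the usual drug order.
toy = pd.DataFrame({
    "LineName": [
        "Gemcitabine,Cisplatin",
        "MVAC",
        "Pembrolizumab",
        "Carboplatin,Paclitaxel",
    ]
})
toy["is_chemo"] = toy["LineName"].map(is_first_line_chemo)
```

Exact string comparison, as used in the cells below, is sufficient when regimen names are already standardized; the set-based version only guards against ordering or whitespace differences.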

In [10]:
chemo_regimens = [
    "Carboplatin,Gemcitabine",
    "Cisplatin,Gemcitabine",
    "MVAC",
    "Carboplatin,Gemcitabine,Paclitaxel",
    "Cisplatin,Gemcitabine,Paclitaxel",
]
(therapy
 .query('LineNumber == 1 and LineName in @chemo_regimens')
 .LineName.value_counts())

LineName
Carboplatin,Gemcitabine               2226
Cisplatin,Gemcitabine                 2068
MVAC                                   261
Carboplatin,Gemcitabine,Paclitaxel      33
Cisplatin,Gemcitabine,Paclitaxel        20
Name: count, dtype: int64

In [11]:
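# Keep gemcitabine plus platinum (carboplatin or cisplatin) only; MVAC and the
# paclitaxel triplets counted above are not selected. Relabel all as 'chemo'.
chemo_df = (
    therapy
    .query('LineNumber == 1 and LineName in ["Carboplatin,Gemcitabine", "Cisplatin,Gemcitabine"]')
    [['PatientID', 'LineName', 'StartDate']]
    .assign(LineName = 'chemo'))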

In [12]:
chemo_df.sample(3)

Unnamed: 0,PatientID,LineName,StartDate
16463,F8FBEB386233C,chemo,2020-09-14
5030,FA9151D7C535A,chemo,2021-05-04
5798,F262AF767AE4D,chemo,2021-09-23


In [13]:
row_ID(chemo_df)

(4294, 4294)

## 3. Combine dataframes and export to CSV

In [14]:
full_cohort = pd.concat([checkpoint_df, chemo_df], axis = 0)

In [15]:
full_cohort.sample(5)

Unnamed: 0,PatientID,LineName,StartDate
14392,F542C1C8C501D,chemo,2016-04-04
7087,F8E50C7F52CE6,chemo,2018-09-20
3189,F2322A7A14F34,chemo,2014-05-15
12423,FC472877031B8,Pembrolizumab,2021-11-03
10889,F2763D709DA71,chemo,2013-09-23


In [16]:
row_ID(full_cohort)

(6461, 6461)
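The matching counts above imply that no patient falls into both treatment groups. A minimal sketch of checking this directly and summarizing group sizes follows. It uses toy stand-ins (`checkpoint_toy`, `chemo_toy`) with made-up IDs and dates rather than the real dataframes.

```python
import pandas as pd

# Toy stand-ins for checkpoint_df and chemo_df (made-up IDs and dates).
checkpoint_toy = pd.DataFrame({
    "PatientID": ["P001", "P002"],
    "LineName": ["Pembrolizumab", "Atezolizumab"],
    "StartDate": ["2020-01-15", "2019-06-01"],
})
chemo_toy = pd.DataFrame({
    "PatientID": ["P003", "P004", "P005"],
    "LineName": ["chemo", "chemo", "chemo"],
    "StartDate": ["2018-03-10", "2021-11-20", "2017-08-05"],
})

# Patients present in both groups; expected to be empty.
overlap = set(checkpoint_toy["PatientID"]) & set(chemo_toy["PatientID"])

combined_toy = pd.concat([checkpoint_toy, chemo_toy], axis=0, ignore_index=True)

# Number of patients per treatment label after combining.
arm_counts = combined_toy["LineName"].value_counts()
```

Passing `ignore_index=True` is a design choice in this sketch: it gives the combined frame a fresh 0..n-1 index, whereas the notebook keeps the original row labels from `therapy`.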

In [17]:
full_cohort.to_csv('../outputs/full_cohort.csv', index = False)
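When the exported file is read back in a later notebook, `StartDate` will load as plain strings unless it is parsed explicitly. The sketch below illustrates one way to handle this with `parse_dates`. It writes a toy frame to a temporary directory, and the file name `toy_cohort.csv` and all values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy frame with the same columns as full_cohort (made-up values).
toy = pd.DataFrame({
    "PatientID": ["P001", "P002"],
    "LineName": ["Pembrolizumab", "chemo"],
    "StartDate": ["2020-01-15", "2018-03-10"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_cohort.csv")  # hypothetical file name
    toy.to_csv(path, index=False)
    # parse_dates converts the named column to datetime64 on load.
    reloaded = pd.read_csv(path, parse_dates=["StartDate"])

start_is_datetime = pd.api.types.is_datetime64_any_dtype(reloaded["StartDate"])
```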