# Cancer Test Results

This section uses a simulated dataset of patients' cancer test results and whether each patient actually has cancer. Explore `cancer_test_data.csv` in the Jupyter notebook to answer the following questions.

- How many patients are there in total?
- How many patients have cancer?
- How many patients do not have cancer?
- What proportion of patients have cancer?
- What proportion of patients don't have cancer?
- What proportion of patients with cancer test positive?
- What proportion of patients with cancer test negative?
- What proportion of patients without cancer test positive?
- What proportion of patients without cancer test negative?

In [1]:
import pandas as pd
import numpy as np

# load dataset
df = pd.read_csv('cancer_test_data.csv')
df.head()

|     | patient_id | test_result | has_cancer |
| :-- | :--------- | :---------- | :--------- |
| 0   | 79452      | Negative    | False      |
| 1   | 81667      | Positive    | True       |
| 2   | 76297      | Negative    | False      |
| 3   | 36593      | Negative    | False      |
| 4   | 53717      | Negative    | False      |


In [4]:
# number of patients
print(df.patient_id.nunique())
print(df.shape)

2914
(2914, 3)


In [8]:
# number of patients with cancer
df.query('has_cancer==True').patient_id.nunique()

306

In [10]:
# number of patients without cancer
df.query('has_cancer==False').patient_id.nunique()

2608

In [15]:
# proportion of patients with cancer
df.has_cancer.mean()

0.10501029512697323

In [16]:
# proportion of patients without cancer
1 - df.has_cancer.mean()

0.89498970487302676

In [20]:
# proportion of patients with cancer who test positive
(df.query('has_cancer==True').test_result == "Positive").mean()

0.90522875816993464

In [21]:
# proportion of patients with cancer who test negative
(df.query('has_cancer==True').test_result == "Negative").mean()

0.094771241830065356

In [22]:
# proportion of patients without cancer who test positive
(df.query('has_cancer==False').test_result == "Positive").mean()

0.2036042944785276

In [23]:
# proportion of patients without cancer who test negative
(df.query('has_cancer==False').test_result == "Negative").mean()

0.79639570552147243

In [24]:
# sanity check: each positive/negative pair sums to 1, so all four proportions sum to 2
0.90522875816993464 + 0.094771241830065356 + 0.2036042944785276 + 0.79639570552147243

2.0

In [None]:
# sanity check: the two proportions for patients with cancer should sum to 1
0.90522875816993464 + 0.094771241830065356
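
The two sums above are a consistency check on the conditional proportions. A minimal sketch of the same check, split by group, is shown here. It assumes every patient's `test_result` is either `Positive` or `Negative` (which the total of 2.0 suggests), and the variable names are illustrative rather than taken from the notebook.

```python
import math

# Proportions copied from the cell outputs above
p_pos_given_cancer = 0.90522875816993464
p_neg_given_cancer = 0.094771241830065356
p_pos_given_no_cancer = 0.2036042944785276
p_neg_given_no_cancer = 0.79639570552147243

# Assumption: within each group a patient tests either positive or negative,
# so each pair of conditional proportions should sum to 1.
cancer_total = p_pos_given_cancer + p_neg_given_cancer
no_cancer_total = p_pos_given_no_cancer + p_neg_given_no_cancer

# math.isclose tolerates small floating-point rounding differences
cancer_ok = math.isclose(cancer_total, 1.0)
no_cancer_ok = math.isclose(no_cancer_total, 1.0)
print(cancer_ok, no_cancer_ok)
```

Checking each group separately makes it easier to see which group is off if one of the proportions was computed against the wrong subset.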

In the previous section, you found the following proportions from the cancer results dataset.

- Patients with cancer: 0.105
- Patients without cancer: 0.895
- Patients with cancer who tested positive: 0.905
- Patients with cancer who tested negative: 0.095
- Patients without cancer who tested positive: 0.204
- Patients without cancer who tested negative: 0.796

| Probability | Meaning |
| :--------------------------- | :-------------------------------------------------- | 
| P(cancer) = 0.105            | Probability a patient has cancer                    |
| P(~cancer) = 0.895           | Probability a patient does not have cancer          |
| P(positive\|cancer) = 0.905  | Probability a patient with cancer tests positive    |
| P(negative\|cancer) = 0.095  | Probability a patient with cancer tests negative    |
| P(positive\|~cancer) = 0.204 | Probability a patient without cancer tests positive |
| P(negative\|~cancer) = 0.796 | Probability a patient without cancer tests negative |
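
The marginal and conditional proportions in the table can also be computed in two calls instead of one query per cell. The sketch below is one possible approach, not the notebook's method. It runs on a small hypothetical `toy` DataFrame that only mimics the `has_cancer` and `test_result` columns, so its numbers differ from the real dataset.

```python
import pandas as pd

# Hypothetical toy data, NOT the contents of cancer_test_data.csv
toy = pd.DataFrame({
    "has_cancer": [True, True, True, True,
                   False, False, False, False, False, False],
    "test_result": ["Positive", "Positive", "Positive", "Negative",
                    "Positive", "Negative", "Negative",
                    "Negative", "Negative", "Negative"],
})

# Marginal proportions: P(cancer) and P(~cancer)
marginals = toy["has_cancer"].value_counts(normalize=True)

# Conditional proportions: normalize="index" makes each row sum to 1,
# so the True row holds P(result | cancer) and the False row P(result | ~cancer)
conditionals = pd.crosstab(toy["has_cancer"], toy["test_result"],
                           normalize="index")

print(marginals)
print(conditionals)
```

To apply this to the real data, replace `toy` with the `df` loaded from `cancer_test_data.csv`. The row-normalized crosstab should then match the four conditional values in the table above.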