# Drawing Conclusions Example
Use descriptive statistics to answer a question about the `cancer_data_edited.csv` dataset.  
The question: is the size of a tumor related to whether it is malignant? We'll create a layered histogram to compare the area distributions of malignant and benign diagnoses.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# load the dataset
df = pd.read_csv('cancer_data_edited.csv')
df.head()

In [None]:
# Example of a boolean mask: True for rows with a malignant diagnosis, False otherwise
mask = df['diagnosis'] == 'M'
print(mask)
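
To make the mask concrete, here is a minimal sketch on a tiny made-up DataFrame. The `toy` frame and its values are hypothetical and are not taken from `cancer_data_edited.csv`. The comparison produces one True/False value per row, and indexing with that Series keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not the real dataset)
toy = pd.DataFrame({
    'diagnosis': ['M', 'B', 'B', 'M'],
    'area': [1000.0, 400.0, 500.0, 1200.0],
})

# Comparing a column to a value gives a boolean Series, one entry per row
toy_mask = toy['diagnosis'] == 'M'
print(toy_mask.tolist())

# Indexing the DataFrame with the mask keeps only the rows where it is True
toy_m = toy[toy_mask]
print(toy_m['area'].tolist())
```

The same pattern applies to `df` in this notebook: `df[mask]` selects the malignant rows.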

In [None]:
# Create a subset containing only the malignant diagnoses
df_m = df[df['diagnosis'] == 'M']
df_m.head()

In [None]:
# Summary statistics for tumor area in the malignant group; note the mean
df_m['area'].describe()

In [None]:
# Create a subset containing only the benign diagnoses
df_b = df[df['diagnosis'] == 'B']
# Compute the same summary statistics for comparison with the malignant group
df_b['area'].describe()
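
As an alternative to building two subsets and calling `.describe()` on each, the groups can be summarized side by side with `groupby`. This is a minimal sketch on hypothetical toy values, not the real dataset; the `toy` frame and the choice of `count`, `mean`, and `median` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not the real dataset)
toy = pd.DataFrame({
    'diagnosis': ['M', 'B', 'B', 'M'],
    'area': [1000.0, 400.0, 500.0, 1200.0],
})

# Group rows by diagnosis, then compute a few statistics of 'area' per group
summary = toy.groupby('diagnosis')['area'].agg(['count', 'mean', 'median'])
print(summary)
```

The result has one row per diagnosis, which makes the difference in means easy to read at a glance.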

In [None]:
# Create a histogram plot
# .hist() returns a matplotlib subplot (an Axes object)
# alpha changes its transparency
# figsize changes the figure size
ax = df_b['area'].hist(alpha=0.5, figsize=(8, 6), label='benign');
# Layer a new histogram on the same subplot that was returned as 'ax'
# (figsize is not repeated here because the figure was already sized above)
df_m['area'].hist(alpha=0.5, label='malignant', ax=ax);
# Label the subplot with a title, axis labels, and a legend
ax.set_title('Distributions of Benign and Malignant Tumor Areas');
ax.set_xlabel('Area');
ax.set_ylabel('Count');
ax.legend(loc='upper right');
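
One caveat when layering histograms: each `.hist()` call picks its own 10 default bins from its own data range, so the benign and malignant bars can have different widths. A minimal sketch of computing shared bin edges is below. It uses hypothetical toy values rather than the real dataset, and the choice of 4 equal-width bins is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only (not the real dataset)
toy = pd.DataFrame({
    'diagnosis': ['B', 'B', 'B', 'B', 'M', 'M', 'M', 'M'],
    'area': [400.0, 500.0, 450.0, 600.0, 1000.0, 1200.0, 800.0, 900.0],
})

# Build one set of bin edges spanning the full range of both groups
# (5 edges give 4 equal-width bins)
edges = np.linspace(toy['area'].min(), toy['area'].max(), 5)

# Count each group against the same edges so the bins line up
counts_b, _ = np.histogram(toy.loc[toy['diagnosis'] == 'B', 'area'], bins=edges)
counts_m, _ = np.histogram(toy.loc[toy['diagnosis'] == 'M', 'area'], bins=edges)

print(edges)
print(counts_b)
print(counts_m)
```

The same `edges` array can be passed as `bins=edges` to both `.hist()` calls so that the two layers share identical bins.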