## Brain Tumor Prediction and Treatment Analysis

## Introduction 
The National Brain Tumor Society reports that about 1 million Americans are living with a brain tumor, and an estimated 90,000 more are expected to receive a primary brain tumor diagnosis in the following year. There are more than 100 distinct types of primary brain tumors, each with its own spectrum of presentations, treatments, and outcomes. More than any other cancer, brain tumors can have lasting and life-altering physical, cognitive, and psychological impacts on a patient’s life.

Our study addresses three research questions about brain tumor diagnosis and treatment. The goal is to reveal patterns in tumor presentation, if any exist, that could support diagnosis and treatment, and to describe the frequency and distribution of brain tumors within the studied population.

1. Does tumor size differ significantly between benign and malignant tumors?
2. What is the relationship between treatment modalities (radiation, surgery, chemotherapy) and patient survival rates?
3. Does the first symptom presented predict the stage of the tumor at its discovery?

The dataset used in this analysis was retrieved from Kaggle: https://www.kaggle.com/datasets/miadul/brain-tumor-dataset. It consists of 20,000 synthetic patient records. While the data is simulated, its breadth reflects a variety of medical scenarios, incorporating diverse patient demographics, tumor attributes, and clinical outcomes.

## Data wrangling: Refining, Grouping, and Shaping for Analysis
To prepare the data for analysis, the following steps were performed:
- Round the "Tumor_Size" column to two decimal places, for ease of mathematical manipulation
- Create a smaller dataframe "brain_tumor_dataset_3" containing the variables of interest
- Convert MRI results to 0 and 1 (Negative and Positive) 
- Convert "Stage" to integer values (1, 2, 3, 4) 
- Convert "Gender" to 1 and 2 (Male and Female)
- Convert the three treatment columns (Radiation_Treatment, Surgery_Performed, and Chemotherapy) to 0 and 1 (No/Yes)

In [2]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import geopandas as gpd
import statistics
import datetime
import seaborn as sns
%matplotlib inline

brain_tumor_dataset = pd.read_csv("brain_tumor_dataset.csv")
brain_tumor_dataset.head()

# Work on a copy so the raw data stays untouched
brain_tumor_dataset_2 = brain_tumor_dataset.copy()

# Keep only the variables of interest; .copy() gives an independent dataframe,
# so the column assignments below do not trigger SettingWithCopyWarning
brain_tumor_dataset_3 = brain_tumor_dataset_2[["Tumor_Size", "Tumor_Type", "Symptom_1", "Stage", "Radiation_Treatment", "Surgery_Performed", "Chemotherapy", "Survival_Rate", "Gender", "MRI_Result"]].copy()

# Round tumor size to two decimal places
brain_tumor_dataset_3["Tumor_Size"] = brain_tumor_dataset_3["Tumor_Size"].round(2)

# Encode stage and gender as integers
brain_tumor_dataset_3["Stage"] = brain_tumor_dataset_3["Stage"].replace({"I": 1, "II": 2, "III": 3, "IV": 4})
brain_tumor_dataset_3["Gender"] = brain_tumor_dataset_3["Gender"].replace({"Male": 1, "Female": 2})

# Encode the MRI result and the three treatment columns as 0/1
brain_tumor_dataset_3["MRI_Result"] = brain_tumor_dataset_3["MRI_Result"].replace({"Negative": 0, "Positive": 1})
brain_tumor_dataset_3["Radiation_Treatment"] = brain_tumor_dataset_3["Radiation_Treatment"].replace({"No": 0, "Yes": 1})
brain_tumor_dataset_3["Surgery_Performed"] = brain_tumor_dataset_3["Surgery_Performed"].replace({"No": 0, "Yes": 1})
brain_tumor_dataset_3["Chemotherapy"] = brain_tumor_dataset_3["Chemotherapy"].replace({"No": 0, "Yes": 1})

display(brain_tumor_dataset_3)

## Visualize the data: Change the subtitle here to describe what you are plotting etc.
 

Create visualizations of your data and findings. Describe the plots, what they show, and how they give insight into the question you are addressing. Include the code to create these plots in the Python sections below. You can also discuss the plots after the code. Finally, be sure to make plots as clear as possible by having clear axis labels, legends, captions etc., so that it is easy for the reader to quickly understand the central information being conveyed.
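As one possible starting point for the first research question, the sketch below draws a box plot of tumor size by tumor type. It runs on a small made-up dataframe so that it is self-contained. The `Benign` and `Malignant` labels and all numeric values are assumptions for illustration, not values taken from the Kaggle dataset, and `toy` is a hypothetical stand-in for `brain_tumor_dataset_3`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for brain_tumor_dataset_3 (values invented for illustration)
toy = pd.DataFrame({
    "Tumor_Type": ["Benign", "Malignant", "Benign", "Malignant", "Benign", "Malignant"],
    "Tumor_Size": [2.1, 5.4, 3.0, 6.2, 2.7, 4.9],
})

# Collect one array of tumor sizes per tumor type, in a fixed label order
labels = sorted(toy["Tumor_Type"].unique())
sizes_by_type = [
    toy.loc[toy["Tumor_Type"] == label, "Tumor_Size"].to_numpy()
    for label in labels
]

# Draw one box per tumor type and label the axes explicitly
fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(sizes_by_type)
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_xlabel("Tumor type")
ax.set_ylabel("Tumor size")
ax.set_title("Tumor size by tumor type (toy data)")
fig.tight_layout()
```

A box plot is only one reasonable choice here; a histogram or violin plot per tumor type would show the same comparison with more detail about the shape of each distribution.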



In [None]:
# Show the analyses you did here

# Breaking this up to multiple cells with multiple descriptions of what you did is probably a good idea



## Analyses: subtitle about the analyses/models you are using 

Include other analyses here, including extracting insights using pandas and also potentially including hypothesis tests and machine learning methods in the final version of your project once we have discussed these methods in class.
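The sketch below shows one way the three research questions could be approached with pandas and SciPy. The choice of tests (Welch's t-test for tumor size, a grouped mean for survival, a chi-square test of independence for symptom versus stage) is an assumption made for illustration, not a method established by this report. The dataframe `toy`, its category labels, and all of its values are invented so that the block runs on its own.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for brain_tumor_dataset_3 (all values invented)
toy = pd.DataFrame({
    "Tumor_Type": ["Benign", "Malignant"] * 6,
    "Tumor_Size": [2.1, 5.4, 3.0, 6.2, 2.7, 4.9, 1.8, 5.9, 3.3, 6.5, 2.4, 5.1],
    "Symptom_1": ["Headache", "Seizures", "Nausea"] * 4,
    "Stage": [1, 2, 3, 4] * 3,
    "Surgery_Performed": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    "Survival_Rate": [62.0, 71.5, 80.0, 55.0, 77.0, 60.5,
                      58.0, 74.0, 82.5, 51.0, 79.0, 63.0],
})

# Question 1: compare tumor size between the two tumor types.
# Welch's t-test (equal_var=False) does not assume equal variances.
benign = toy.loc[toy["Tumor_Type"] == "Benign", "Tumor_Size"]
malignant = toy.loc[toy["Tumor_Type"] == "Malignant", "Tumor_Size"]
t_result = stats.ttest_ind(benign, malignant, equal_var=False)
print(f"Welch t = {t_result.statistic:.2f}, p = {t_result.pvalue:.4f}")

# Question 2: mean survival rate with and without surgery (0 = No, 1 = Yes).
# The same pattern applies to Radiation_Treatment and Chemotherapy.
mean_survival_by_surgery = toy.groupby("Surgery_Performed")["Survival_Rate"].mean()
print(mean_survival_by_surgery)

# Question 3: test whether first symptom and stage are independent.
# Counts in this toy table are far too small for a trustworthy chi-square
# test; the real 20,000-row dataset would not have that problem.
symptom_by_stage = pd.crosstab(toy["Symptom_1"], toy["Stage"])
chi2, p_value, dof, expected = stats.chi2_contingency(symptom_by_stage)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Note that a grouped mean only describes an association. Because treatment is not randomly assigned, a difference in mean survival between treated and untreated patients should not be read as the effect of the treatment.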


In [None]:
# Show the analyses you did here

# Breaking this up to multiple cells with multiple descriptions of what you did is a good idea



## Conclusions

Write a few paragraphs summarizing what you found, how the findings address your question of interest, and possible future directions. Please make sure to describe your conclusions in an intuitive way, and make sure that your argument is strong and backed by solid evidence from your data.


## Reflection
  

Write a few paragraphs describing what went well with this project and what was more difficult. Also describe any additional analyses you tried that you did not end up including in this report, and approximately how much time you spent working on the project.

Finally, please go to Canvas and answer a few questions related to how the project went.



## Appendix (optional)

If there is additional code you would like to include (in order to keep your project report 10 pages or less) you can include it here. Additionally, you could create a GitHub page that has all the working code and data for your analyses (this could be beneficial later as well if you want to show this to future employers, etc.).  

