# The World of Sporia
---

%%html
<img src='MushroomWorld.png'>

Recent advances in astronomical research have led to the discovery of a new planet called Sporia, a lush and vibrant world notorious for its chaotic mushroom inhabitants. Your colleague, Steve Enoki, has travelled there and is currently performing field studies and analyses based out of the Port-of-Bello research centre (see image above). Results from Steve's analyses are streamed back to IBM's botanical analytics hub located in Calgary, Alberta.

Mysteriously, Steve's incoming updates have come to a halt. His absence from the daily SCRUM calls has the team growing increasingly worried. Luckily, the camaraderie at the IBM Calgary office is strong. To decide the best course of action, the botanical analytics group held a meeting. The lead researcher delivered a motivating speech stating *No human should be left behind*. Feeling inspired, you bravely step forward and volunteer to save Mr. Enoki. In a week's time you are scheduled to travel to Sporia.

Before travelling, you begin to devise a plan. You realize that the data Steve has sent over is highly conducive to wilderness survival in Sporia. Using the data attributes and your wide breadth of data science knowledge, you set out to develop a data-driven survival system that will deter you from eating or touching any poisonous fungi. 

## Your Challenge...
---
Your challenge, should you choose to accept it, is to develop a set of data-driven guidelines for the purpose of wilderness survival in Sporia. There are two options for this mini data science project:
1. Perform Exploratory Data Analyses that expose important criteria regarding the toxicity of mushrooms
2. Develop a machine learning model that predicts the likelihood of a mushroom's toxicity

The following subsection will elaborate further on these options.

**Note:** The student only needs to do one of the two options. YOU ARE NOT REQUIRED TO DO BOTH!!! However, it is important to note that the student will be asked questions about the option they chose not to do. These questions will largely be conceptual, asking about the general approach to the problem.

### Option A: Exploratory Data Analysis
---
The student may conduct a set of analyses, which may include data visualizations, statistical tests, etc., that expose insights regarding the toxicity of a mushroom. This exercise is purposefully left open-ended so that any results may be discussed during the formal interview with IBM. The student should be able to explain their results and defend any assumptions made during their analysis. Ultimately, the set of analyses should be capable of identifying key attributes that indicate whether a mushroom is poisonous or not.

As a deliverable, the IBM team expects the following:
1. 4 or more data analyses
2. A brief written discussion that explains why each analysis was conducted
3. A written set of guidelines derived from the analyses that explain some key criteria for surviving the toxic jungle of Sporia

### Option B: Machine Learning Model
---
Alternatively, the student may choose to develop a machine learning model that can predict the likelihood of a mushroom's toxicity. The student may choose any model formulation they believe is best for the problem. They must be able to explain why they chose this model! The student may assume that the data is perfectly clean (i.e. there is no need to worry about missing or corrupt values).

As a deliverable, the IBM team expects the following:
1. A model that can predict the toxicity of a mushroom
2. An evaluation metric for the model
3. A written section that explains the choice of model, the advantages & limitations of the model, suggestions for improvement of the model and a summary of the model's performance.



## Keep in Mind
---

The student should be able to state and defend any assumptions they make when solving a problem. 
For the most part, there are no right or wrong answers. This exercise was designed to be open-ended allowing the student to lead their own mini data science project with the expectation of completing a final deliverable. This is analogous to any real-world data science consulting scenario where you are expected to develop and present a final solution that is satisfactory to the client's standards. Keep in mind that any work provided may be questioned or brought up for discussion during the formal interview. The following are some important criteria to consider while you code and prepare for the interview:
1. State any assumptions made when solving the problem
2. Write clean and well documented code
3. Be able to explain your solution in a concise manner
4. Analyze and interpret results that are conducive to the problem being solved

**Note:** Again, the student will be asked questions pertaining to both option A & option B. Questions pertaining to the student's selected option will be more programming/technically oriented, while questions on the unchosen option will largely be conceptual/theoretical.

# Import Data and Python Libraries
---
Please list all Python modules used for your analyses in the code block below. As a general piece of advice, we recommend the following packages, which are standard to any data science project. Links are provided to their home pages. These packages are very well documented and should provide answers to any questions you may have. By no means are you restricted to the use of these packages only!

See links below:

**Pandas Data Analysis Library:** https://pandas.pydata.org/

**Numpy Scientific Computing Package:** https://www.numpy.org/

**Sklearn - Machine Learning Library:** https://scikit-learn.org/stable/
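As a rough illustration of the import-and-load step, the sketch below reads a small inline CSV into a pandas DataFrame. The column names and values are invented stand-ins (the real Sporia data file and its schema are not described here), so treat every name in it as an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file: the column names and values
# below are invented for illustration and are NOT the actual Sporia schema.
SAMPLE_CSV = """toxicity,cap_colour,odour
poisonous,red,foul
edible,brown,earthy
edible,white,almond
poisonous,red,foul
"""

# With a real file you would pass its path to pd.read_csv instead.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few records, useful as a sanity check
```

A DataFrame is only one possible choice of data structure; whichever you pick, be ready to justify it in the discussion below.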

In [None]:
### Import Libraries and Data Here ###


## Discussion
---
**Why did you choose these libraries? Please explain.**

*Explain here*

**What Data Structure did you choose for this project? Why?**

*Explain here*


# Option A: Exploratory Data Analysis (EDA)
---
Each analysis should have a preceding markdown block that describes the analysis process and any results. The description can be very brief. Discussion about results can be saved for the end of this section. We provide an example template in the markdown and code cells below. At minimum, we expect 4 analyses.

As mentioned before, we expect the following as a deliverable:
1. 4 or more data analyses
2. A brief written discussion that explains why each analysis was conducted
3. A written set of guidelines derived from the analyses that explain some key criteria for surviving the toxic jungle of Sporia
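To give a sense of what a single analysis might look like, here is a minimal sketch that cross-tabulates one categorical attribute against toxicity. The data and the column names (`odour`, `toxicity`) are invented for illustration and may not match the real dataset; this is one possible kind of analysis, not a prescribed one.

```python
import pandas as pd

# Invented toy data; the real attribute names and categories may differ.
df = pd.DataFrame({
    "odour":    ["foul", "foul", "almond", "almond", "earthy", "earthy"],
    "toxicity": ["poisonous", "poisonous", "edible", "edible", "edible", "poisonous"],
})

# Rows: odour categories. Columns: toxicity labels.
# normalize="index" turns counts into the share of each label within a row.
share = pd.crosstab(df["odour"], df["toxicity"], normalize="index")
print(share)
```

A table like this can suggest a candidate survival guideline (for instance, avoiding a category that is always poisonous in the data), which you would then state and defend in the discussion section.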

## Analysis 1
---
*Briefly explain analysis here*

In [None]:
### Code for Analysis 1 here ###


## Analysis 2
---
*Briefly explain analysis here*

In [None]:
### Code for Analysis 2 here ###


## Analysis 3
---
*Briefly explain analysis here*

In [None]:
### Code for Analysis 3 here ###


## Analysis 4
---
*Briefly explain analysis here*

In [None]:
### Code for Analysis 4 here ###


## EDA Results & Discussion
---

**Discuss Results**

*Provide discussion here*


**What are some key criteria for surviving the Sporia Jungle?**

*Provide list of important criteria here*

# Option B: Machine Learning Model
---
Here, the student may choose a machine learning model to predict the toxicity of a mushroom. This machine learning formulation will be a binary prediction of edible (1) or toxic (0). Since all of the features are categorical, we highly recommend one-hot encoding the data prior to modelling. It is up to you to choose your own evaluation criteria for your model.
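A minimal sketch of the one-hot encoding step, using `pandas.get_dummies`, is shown below. The DataFrame contents and column names are hypothetical placeholders rather than the real Sporia attributes; the target follows the edible (1) / toxic (0) convention described above.

```python
import pandas as pd

# Invented toy data; column names and categories are assumptions.
df = pd.DataFrame({
    "toxicity":   ["poisonous", "edible", "edible", "poisonous"],
    "cap_colour": ["red", "brown", "white", "red"],
    "odour":      ["foul", "earthy", "almond", "foul"],
})

# Binary target: edible -> 1, toxic -> 0.
y = (df["toxicity"] == "edible").astype(int)

# One-hot encode the categorical feature columns: each category becomes
# its own 0/1 indicator column named "<feature>_<category>".
features = df.drop(columns="toxicity")
X = pd.get_dummies(features, columns=["cap_colour", "odour"], dtype=int)

print(X.columns.tolist())
print(y.tolist())
```

The resulting `X` and `y` can be passed to most scikit-learn estimators; which estimator to use is left to you.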

As mentioned before, we expect the following as a deliverable:
1. A model that can predict the toxicity of a mushroom
2. An evaluation metric for the model
3. A written section that explains the choice of model, the advantages & limitations of the model, suggestions for improvement of the model and a summary of the model's performance.
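Since the choice of evaluation criteria is left to you, the sketch below shows one way to look beyond plain accuracy. The labels are invented, and the particular metrics are only a suggestion: in a survival setting, a toxic mushroom predicted as edible is arguably the costliest error, so it is counted separately.

```python
import numpy as np

# Invented labels for illustration: 1 = edible, 0 = toxic.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# Fraction of all predictions that match the true label.
accuracy = float((y_true == y_pred).mean())

# The most dangerous mistake: a toxic mushroom (0) predicted as edible (1).
toxic_called_edible = int(((y_true == 0) & (y_pred == 1)).sum())

# Of everything the model calls edible, what fraction really is edible?
predicted_edible = (y_pred == 1)
edible_precision = float((y_true[predicted_edible] == 1).mean())

print(accuracy, toxic_called_edible, edible_precision)
```

Whatever metric you settle on, be prepared to explain why it suits the problem.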

In [None]:
### All relevant code here ###
# You may choose to split the code into separate blocks if you wish


## Model Results & Discussion
---
*Please provide your writeup here*