### Paul Callaghan
### Fundamentals of Data Analysis Winter 2023/24 Assessment
***
> #### Project
> - The project is to create a notebook investigating the variables and data points within the well-known iris flower data set associated with Ronald A Fisher.
>
> - In the notebook, you should discuss the classification of each variable within the data set according to common variable types and scales of measurement in mathematics, statistics, and Python.
>
> - Select, demonstrate, and explain the most appropriate summary statistics to describe each variable.
>
> - Select, demonstrate, and explain the most appropriate plot(s) for each variable.
> 
> - The notebook should follow a cohesive narrative about the data set.
***

> ### Academic References
> - GeeksforGeeks. (2020, July 15). NumPy Random Choice() in Python. Retrieved from: https://www.geeksforgeeks.org/numpy-random-choice-in-python/.
>
> - Molin, S. (2019). Hands-On Data Analysis with Pandas.
>
> ![Image](https://m.media-amazon.com/images/I/71fe9zjm3yL._AC_UF350,350_QL50_.jpg)
>
> - [McKenna, H., Chang, L., & Brinkerhoff, R. (2023). Numeracy: A Quantitative Reasoning Approach.](https://uen.pressbooks.pub/uvumqr/)
>
> ![Image](https://uen.pressbooks.pub/app/uploads/sites/202/2023/01/Numeracy-Front-Cover-2-350x453.png)

> ### Technical References
> - IBM. (n.d.). Markdown and HTML quick reference. Retrieved from: https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet.
>
> - Seaborn. (n.d.). seaborn.load_dataset. Retrieved from: https://seaborn.pydata.org/generated/seaborn.load_dataset.html.
>
***

I will use the [**seaborn.load_dataset**](https://github.com/mwaskom/seaborn-data/blob/master/iris.csv) function to load the Iris dataset.

In [7]:
# Library Imports
import seaborn as sns

iris = sns.load_dataset('iris')

# Print the head and tail of the dataset to check that it has loaded.
# sep="\n\n" puts a blank line between the two tables.
print(iris.head(), iris.tail(), sep="\n\n")

# Print the data type of each column
print(iris.dtypes)

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

     sepal_length  sepal_width  petal_length  petal_width    species
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object
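
The dtypes listed above can also be used in code to separate the numeric columns from the non-numeric one. The following is a minimal sketch, assuming a small hand-typed sample (the first two and last two rows printed above) rather than the full dataset, so that it runs without a download. The names `sample`, `numeric_cols`, and `other_cols` are illustrative only.

```python
import pandas as pd

# Hand-typed sample: rows 0, 1, 148 and 149 from the printed output above.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.2, 5.9],
    "sepal_width": [3.5, 3.0, 3.4, 3.0],
    "petal_length": [1.4, 1.4, 5.4, 5.1],
    "petal_width": [0.2, 0.2, 2.3, 1.8],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})

# Split the column names by whether pandas stores them as numbers.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Non-numeric columns:", other_cols)
```

On this sample, the four measurement columns fall into the numeric group and *species* is the only non-numeric column, which matches the dtypes output.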


#### Classification of Each Variable
There are five variables within Fisher's Iris dataset. These are: 

> - Sepal Length (*sepal_length*)
>
> - Sepal Width (*sepal_width*)
>
> - Petal Length (*petal_length*)
>
> - Petal Width (*petal_width*)
>
> - Species (*species*)
>
##### Classification
[Laerd Statistics](https://statistics.laerd.com/statistical-guides/types-of-variable.php) has a useful guide to classifying data, which splits variables into two basic types: *categorical* and *continuous*.

**Categorical (Qualitative):**
- "Nominal variables are variables that have two or more categories, but which do not have an intrinsic order" (Laerd Statistics).

McKenna et al. (2023) describe nominal data as data classified "purely by labelling or naming values".

The only nominal variable in the dataset is *species*.

**Continuous (Quantitative):**
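
In pandas, a nominal variable is often represented as an unordered `category`. The sketch below is one possible way to do this, not the only one. It assumes a short hand-typed list of species labels instead of the full column, and the names `species`, `species_cat`, `counts`, and `most_common` are illustrative.

```python
import pandas as pd

# Hand-typed labels standing in for the species column (assumed sample).
species = pd.Series(
    ["setosa", "setosa", "setosa", "virginica", "virginica"],
    name="species",
)

# Convert the text labels to a categorical type; by default it is unordered,
# which matches the definition of a nominal variable.
species_cat = species.astype("category")
print("Ordered:", species_cat.cat.ordered)

# Counts and the mode are meaningful summaries for nominal data.
counts = species_cat.value_counts()
most_common = species_cat.mode()[0]
print(counts)
print("Most common label:", most_common)
```

Because the categories have no intrinsic order, frequency counts and the mode are suitable here, while statistics such as the mean or median do not apply.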
