# Loading data for machine learning

In this notebook, we will review how to load data before any machine learning takes place.

<a href="https://colab.research.google.com/github/thomasjpfan/ml-workshop-intro/blob/master/notebooks/01-loading-data.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

In [None]:
# Install dependencies when running on Google Colab
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %pip install -r https://raw.githubusercontent.com/thomasjpfan/ml-workshop-intro/master/requirements.txt

In [None]:
import sklearn
assert sklearn.__version__.startswith("1.0"), "Please install scikit-learn 1.0"

## Generated datasets

## Regression

In [None]:
from sklearn.datasets import make_regression

X, y = make_regression()

In [None]:
X

In [None]:
y
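
Called with no arguments, `make_regression` returns 100 samples with 100 features each, and the values change on every run. The following is a minimal sketch of a smaller, reproducible dataset. The values chosen for `n_samples`, `n_features`, `noise`, and `random_state` are illustrative assumptions, not settings from this workshop.

```python
from sklearn.datasets import make_regression

# Illustrative values (assumptions, not from the original notebook)
X_small, y_small = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)

print(X_small.shape)  # (200, 5): one row per sample, one column per feature
print(y_small.shape)  # (200,): one continuous target value per sample
print(y_small.dtype)  # float64
```

Fixing `random_state` makes the generated data identical across runs, which helps when comparing models.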

## Classification

In [None]:
from sklearn.datasets import make_classification

X, y = make_classification()

In [None]:
X

In [None]:
y
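
By default, `make_classification` generates 100 samples with 20 features and 2 classes. As a rough sketch, the call below asks for three classes instead and counts how many samples fall in each. All parameter values here are assumptions picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Illustrative values (assumptions, not from the original notebook)
X_clf, y_clf = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

print(X_clf.shape)  # (300, 4)

# The target holds integer class labels rather than continuous values
labels, counts = np.unique(y_clf, return_counts=True)
print(labels)  # [0 1 2]
print(counts)
```

Unlike the regression target, `y_clf` contains a small set of discrete labels, which is what distinguishes a classification problem.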

## Sample datasets

In [None]:
from sklearn.datasets import fetch_openml

In [None]:
iris = fetch_openml(data_id=61, as_frame=True)

In [None]:
print(iris.DESCR)

In [None]:
X = iris.data
X

In [None]:
y = iris.target
y

In [None]:
import matplotlib.pyplot as plt

# Color each point by the integer code of its class
plt.scatter(X['sepallength'], X['sepalwidth'], c=y.cat.codes)
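
The scatter plot passes `y.cat.codes` as colors because the target is a pandas categorical, and `.cat.codes` maps each category to an integer. Here is a small standalone sketch of that mapping. The label strings are hypothetical stand-ins for whatever the loaded target actually contains.

```python
import pandas as pd

# Hypothetical labels standing in for the real target column
species = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    dtype="category",
)

# Categories inferred from strings are stored in sorted order
print(list(species.cat.categories))

# Each value is replaced by the position of its category
print(species.cat.codes.tolist())  # [0, 1, 2, 0]
```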

In [None]:
iris_df = iris.frame

In [None]:
iris_df.head()
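
The `frame` attribute holds the features and the target together in one DataFrame. The sketch below illustrates this with `load_iris`, the copy of iris bundled with scikit-learn, so that it runs without a download. Note that this is a different loader from `fetch_openml`, and its column names differ from the ones used elsewhere in this notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Bundled copy of iris; used here only to avoid a network download
bundled = load_iris(as_frame=True)

# Rebuild the combined frame from its two parts
rebuilt = pd.concat([bundled.data, bundled.target], axis=1)

print(bundled.frame.shape)  # (150, 5): four features plus the target
print(list(rebuilt.columns) == list(bundled.frame.columns))
print(bundled.frame.columns[-1])  # target
```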

In [None]:
import seaborn as sns
sns.set_theme(font_scale=1.5)

In [None]:
iris_df.columns

In [None]:
sns.relplot(data=iris_df, x='sepallength', y='sepalwidth',
            hue='class', height=6);

In [None]:
sns.displot(data=iris_df, x='sepallength', hue='class', kind='kde', aspect=2);

In [None]:
sns.jointplot(data=iris_df, x="sepallength", y="sepalwidth", height=10, hue='class');

# Exercise 1

1. Load the wine dataset from the `sklearn.datasets` module using the `load_wine` function.
2. Print the description of the dataset.
3. What is the number of classes and features in this dataset?
4. Is this a classification or a regression problem? Hint: The target column is called `target`.
5. Use `sns.jointplot` to explore the relationship between the `alcohol` and `hue` features.

**If you are running locally**, you can uncomment the following cell to load the solution into the cell. On **Google Colab**, [see solution here](https://github.com/thomasjpfan/ml-workshop-intro/blob/master/notebooks/solutions/01-ex1-solution.py). 

In [None]:
# %load solutions/01-ex1-solution.py