# Packages for automating EDA

Load basic packages and data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data, convert column Year from `int` to `datetime`, and preview dataframe
df = pd.read_csv('data/athlete_events.csv')
df['Year'] = pd.to_datetime(df['Year'], format='%Y')
df
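
The `format='%Y'` argument in the cell above matters because `Year` is stored as an integer. As a minimal sketch (the `years` series is made-up illustration data, not taken from `athlete_events.csv`), compare parsing with and without the format string:

```python
import pandas as pd

# Hypothetical sample of integer years, standing in for df['Year']
years = pd.Series([1992, 2012, 2016], name="Year")

# With an explicit format, each integer is read as a four-digit year
parsed = pd.to_datetime(years, format="%Y")

# Without a format, integers are treated as nanoseconds since 1970-01-01
naive = pd.to_datetime(years)

print(parsed.dt.year.tolist())          # [1992, 2012, 2016]
print(naive.dt.year.unique().tolist())  # [1970]
```

Each parsed value falls on 1 January of its year, so the column can be used as a time axis in later plots.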

## Mito package

[Mito](https://www.trymito.io/) lets you do EDA in a spreadsheet interface inside Python. You can open Mito in your Jupyter environment, and each edit you make in the Mito spreadsheet generates the equivalent Python code in the cell below.

### Installing the Mito package

In Anaconda Prompt (run as administrator), execute:
1. `python -m pip install mitoinstaller`
2. `python -m mitoinstaller install`

If installed correctly, you will see:

"""
 
Mito has finished installing

Please shut down the currently running JupyterLab and relaunch it to enable Mito

Then render a mitosheet following the instructions here: https://docs.trymito.io/how-to/creating-a-mitosheet
"""

### Running Mito

Open an empty Mito sheet to import data files through the interface:

In [None]:
import mitosheet
mitosheet.sheet()

If a dataframe is already loaded, you can pass it in when opening the Mito interface:

In [None]:
import mitosheet
mitosheet.sheet(df, view_df=True)

## Lux package

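Lux can be installed with pip:

`pip install lux-api`

Once Lux is imported, displaying a dataframe in a notebook cell adds a toggle that switches between the usual pandas table and Lux's recommended visualizations.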

In [None]:
import lux

## Sweetviz

Sweetviz describes itself as *an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code.*

Sweetviz can be installed with pip:

`pip install sweetviz`

We can generate an HTML report with Sweetviz:

In [None]:
import sweetviz as sv
my_report = sv.analyze(df)
my_report.show_html()

## Pandas Profiling

Pandas Profiling summarizes a dataframe's data types, missing and unique values, and correlations between variables. It can also generate interactive HTML reports.

Pandas Profiling can be installed via pip:

`pip install pandas-profiling`

Generating a report is simple. Let's import Pandas Profiling and create a report:

In [None]:
from pandas_profiling import ProfileReport

# generate report with pandas profiling
profile = ProfileReport(df, title='Worker Report')
profile
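
To get a sense of what such a report condenses, the same categories of information (data types, missing values, unique counts, correlations) can be pulled by hand with plain pandas. The sketch below is an illustration only, not how Pandas Profiling computes its report, and the `toy` dataframe is made-up data rather than rows from `athlete_events.csv`:

```python
import pandas as pd

# Hypothetical toy data with a few missing values
toy = pd.DataFrame({
    "Age": [24.0, None, 31.0, 27.0],
    "Height": [180.0, 170.0, None, 175.0],
    "Sport": ["Judo", "Judo", "Rowing", "Sailing"],
})

# One row per column: data type, number of missing values, number of distinct values
summary = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing": toy.isna().sum(),
    "unique": toy.nunique(),
})
print(summary)

# Pairwise Pearson correlations between the numeric columns only
correlations = toy.select_dtypes("number").corr()
print(correlations)
```

A profiling report adds distributions, warnings, and interactivity on top of these basics, which is where the automation saves time.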

## DataPrep

DataPrep bills itself as *the fastest and the easiest EDA tool in Python. It allows data scientists to understand a Pandas/Dask DataFrame with a few lines of code in seconds.*

It is indeed very easy to use and can reduce the EDA process to just a few lines of code.

DataPrep is easy to install with pip:

`pip install dataprep`

We can use the `plot()` function to visualize the entire dataframe and surface key insights:

In [None]:
from dataprep.eda import plot

# Use dataprep's plot function to get insights on each variable
plot(df)