<a href="https://colab.research.google.com/github/ua-datalab/Workshops/blob/main/Statistical_Inference/IntroLowCodeEDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Low-Code Exploratory Data Analysis Tools
Created: 03/15/2023<br>
Updated: 03/17/2023

**Note**: Re-run this notebook in Google Colab to display all outputs.

This Google Colab Jupyter Notebook shows EDA examples using a set of diverse low-code exploratory data tools:

* [ydata-profiling](https://github.com/ydataai/ydata-profiling), formerly known as `pandas-profiling`. Its primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience that is consistent and fast.
* [sweetviz](https://github.com/fbdesignpro/sweetviz). Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application.
* [Lux API](https://github.com/lux-org/lux). Lux is a Python library that facilitates fast and easy data exploration by automating the visualization and data analysis process. When a dataframe is printed in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset.
* [DataPrep](https://github.com/sfu-db/dataprep). DataPrep.EDA is described by its authors as a fast and easy-to-use EDA (Exploratory Data Analysis) tool for Python. It lets you explore a Pandas/Dask DataFrame with a few lines of code.
* [AutoViz](https://github.com/AutoViML/AutoViz). AutoViz performs automatic visualization of any dataset with one line of code. Give it any input file (CSV, txt or json format) of any size and AutoViz will visualize it.
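
For comparison, the kind of per-column summary these tools automate can be put together by hand in plain pandas. The following is only a minimal sketch: the `quick_overview` helper and the toy values are made up for illustration (the column names mirror ones used later in this notebook), and the real tools report far more than this.

```python
import pandas as pd

# Toy data with made-up values; not taken from the penguins dataset.
toy = pd.DataFrame(
    {
        "flipper_length_mm": [181.0, 186.0, None, 210.0],
        "body_mass_g": [3750.0, None, 5000.0, 3500.0],
        "sex": ["MALE", "FEMALE", None, "FEMALE"],
    }
)


def quick_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, distinct values."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "n_missing": frame.isna().sum(),
            "n_unique": frame.nunique(),  # NaN is not counted as a value
        }
    )


overview = quick_overview(toy)
print(overview)
```

The low-code tools below produce this information, plus histograms, correlations and interaction plots, from a single call.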


## ydata-profiling example

In [None]:
#!pip install visions==0.7.4 --quiet

In [None]:
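# ydata-profiling (formerly pandas-profiling) requirements

import sys
# Quote the package name so the shell does not interpret the square brackets
!{sys.executable} -m pip install -U "ydata-profiling[notebook]" --quiet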
!jupyter nbextension enable --py widgetsnbextension


In [None]:
import numpy as np
import pandas as pd
from ydata_profiling import ProfileReport


In [None]:
# Read Penguins dataset

pdataset = "https://raw.githubusercontent.com/clizarraga-UAD7/Datasets/main/penguins/penguins_size.csv"
df = pd.read_csv(pdataset)


In [None]:
# General dataframe info
df.info()


In [None]:
# Drop rows with missing values
df = df.dropna()
df.info()


In [None]:
# Pandas df.describe() summary
df.describe()


In [None]:
# To generate the standard profiling report

profile = ProfileReport(df, title="Profiling Report")


In [None]:
# Display the report as a set of widgets

profile.to_widgets()


In [None]:
# The HTML report can be directly embedded in a cell

profile.to_notebook_iframe()


## Sweetviz example

In [None]:
# Sweetviz install
!pip install sweetviz --quiet

In [None]:
import sweetviz as sv

# Analyze a copy so the original dataframe stays untouched
my_dataframe = df.copy()

my_report = sv.analyze(my_dataframe)
my_report.show_html()  # Default arguments write the report to "SWEETVIZ_REPORT.html"

**Note**: Google Colab cannot open the generated HTML file directly. Download the file to your local computer and open it with a browser.
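
One possible way to trigger that download from a cell is Colab's `google.colab.files.download` helper. The sketch below assumes the report was written to the default `SWEETVIZ_REPORT.html` file name; the `download_report` wrapper is a hypothetical convenience function, not part of Sweetviz, and it does nothing outside Colab.

```python
from pathlib import Path


def download_report(path="SWEETVIZ_REPORT.html"):
    """Ask the browser to download the report; return True if a download was started."""
    report = Path(path)
    if not report.exists():
        # Nothing to download, e.g. show_html() has not been run yet
        return False
    try:
        # Only importable inside a Google Colab runtime
        from google.colab import files
    except ImportError:
        return False
    files.download(str(report))
    return True


started = download_report()
print("Download started:", started)
```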

In [None]:
# To open results in notebook.
my_report.show_notebook()

## Lux API example

In [None]:
# Lux installation

!pip install lux-api --quiet


In [None]:
# Activate Jupyter notebook extension

!jupyter nbextension install --py luxwidget
!jupyter nbextension enable --py luxwidget


In [None]:
# Load Lux and enable custom widgets in Google Colab

import lux
import pandas as pd

from google.colab import output
output.enable_custom_widget_manager()


In [None]:
# Read dataset

pdataset = "https://raw.githubusercontent.com/clizarraga-UAD7/Datasets/main/penguins/penguins_size.csv"
df = pd.read_csv(pdataset)

# Drop rows with missing values
df = df.dropna()


In [None]:
# Display df; a toggle button appears for switching between the pandas and Lux views

df

## DataPrep example

In [None]:
!pip install -U git+https://github.com/sfu-db/dataprep.git@develop --quiet

In [None]:
# Import only the functions used below (a wildcard import is not needed)
from dataprep.eda import plot, plot_correlation, plot_missing, plot_diff, create_report

In [None]:
# Read dataset

pdataset = "https://raw.githubusercontent.com/clizarraga-UAD7/Datasets/main/penguins/penguins_size.csv"
df = pd.read_csv(pdataset)

# Rows with missing values are kept here so that plot_missing() has something to show
#df = df.dropna()

In [None]:
plot(df)

In [None]:
# Plot missing values

plot_missing(df)

In [None]:
# Impact of the missing values in "sex" on the other columns

plot_missing(df, "sex")
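
As a rough illustration of the question this plot addresses (how much a column's distribution shifts once the rows with a missing `sex` value are dropped), the same comparison can be sketched numerically in pandas. The `missing_impact` helper and the toy values are made up for illustration; this is not how DataPrep computes its plots.

```python
import pandas as pd

# Toy data with made-up values; "sex" is missing in two of the four rows.
toy = pd.DataFrame(
    {
        "body_mass_g": [3000.0, 4000.0, 5000.0, 6000.0],
        "sex": ["FEMALE", None, "MALE", None],
    }
)


def missing_impact(frame, missing_col, value_col):
    """Compare summary statistics of value_col before and after dropping
    the rows where missing_col is missing."""
    before = frame[value_col].describe()
    after = frame.dropna(subset=[missing_col])[value_col].describe()
    return pd.DataFrame({"all_rows": before, "rows_with_" + missing_col: after})


impact = missing_impact(toy, "sex", "body_mass_g")
print(impact)
```

A large gap between the two columns suggests the values are not missing at random, which is worth knowing before calling `dropna()`.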

In [None]:
# Correlation overview

plot_correlation(df)

In [None]:
# Show how the other columns correlate with a given column

plot_correlation(df, "body_mass_g")
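
A plain-pandas counterpart to this call is sketched below, assuming Pearson correlation only. The `correlations_with` helper and the toy values are hypothetical and serve only to show the idea of ranking numeric columns by their correlation with a target column.

```python
import pandas as pd

# Toy data with made-up values; column names mirror the ones used above.
toy = pd.DataFrame(
    {
        "flipper_length_mm": [181.0, 186.0, 195.0, 210.0, 220.0],
        "body_mass_g": [3750.0, 3800.0, 4100.0, 4800.0, 5400.0],
        "sex": ["MALE", "FEMALE", "FEMALE", "MALE", "MALE"],
    }
)


def correlations_with(frame, target):
    """Pearson correlation of every other numeric column with target,
    sorted by absolute strength."""
    numeric = frame.select_dtypes(include="number")  # drops "sex"
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)


ranked = correlations_with(toy, "body_mass_g")
print(ranked)
```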

In [None]:
# Numerical column overview

plot(df, "body_mass_g")

In [None]:
# Categorical column overview

plot(df, "sex")

In [None]:
# Numerical vs. Numerical column relationship

plot(df, "flipper_length_mm", "body_mass_g")

In [None]:
# Numerical vs. Categorical column relationship

plot(df, "body_mass_g", "sex")

In [None]:
# Create a full report

create_report(df)

## AutoViz example

In [None]:
# Install AutoViz

!pip install autoviz --quiet

In [None]:
# Load AutoViz libraries

from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()

# AutoViz does not display plots automatically. The next line is needed.
%matplotlib inline

In [None]:
# Generate visualizations
pdataset = "https://raw.githubusercontent.com/clizarraga-UAD7/Datasets/main/penguins/penguins_size.csv"

AV.AutoViz(pdataset)
