While there are plenty of Python libraries that can help create beautiful and complex visualizations, I often find myself starting with the most simplistic analyses: count plot, histogram, scatter plot, boxplot, etc. This initial EDA workflow is very similar for each new data set. But unfortunately, they are tedious. Converting to correct data types, selecting the right variable type for the right plot, iterate through all possible variable combinations, adjusting plot aesthetics and labels, etc. These are the tasks I would love to do... once. As someone that does not find great joy in completing repetitive tasks, I set out to build a tool that allows me to be as lazy as possible.

Description

Majora is a python library that automates common tasks in your exploratory data analysis. This includes missing values visualization, missing values handling, variable types handling, predictive modeling, and a variety of univariate and bivariate graphs. The goal is to provide a fast and effective tool for discovering insights, so you can quickly move on to the machine learning model.

Features

Smart data type conversion
Automatic graph discovery
Simple missing values identificaiton and handling
CART model with cross-validation and tree visualization

report.tree_model(max_depth = 4)

Classification Report on 25% of Testing Data:
              precision    recall  f1-score   support

 has disease       0.85      0.85      0.85        41
  no disease       0.83      0.83      0.83        35

    accuracy                           0.84        76
   macro avg       0.84      0.84      0.84        76
weighted avg       0.84      0.84      0.84        76

Bar chart of relative feature importance

Decision tree visualization with Dtreeviz

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
.ipynb_checkpoints		.ipynb_checkpoints
.vscode		.vscode
__pycache__		__pycache__
build/lib/majora		build/lib/majora
datasets		datasets
dist		dist
images		images
majora.egg-info		majora.egg-info
majora		majora
.DS_Store		.DS_Store
.gitattributes		.gitattributes
Example Notebook.ipynb		Example Notebook.ipynb
README.md		README.md
Test.ipynb		Test.ipynb
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Description

Features

Table of Contents

Installation

Usage

Dataset Overview

Missing Values

Identify Missing Values

Handling missing values

Variable Types

Identify Data Types

Handle Suggested Type Conversions:

Visualization

Univariate Plots

Histogram

Count Plots

Word Cloud

Bivariate Plots

Correlation Plots

Principal Component Analysis

Box Plots

Relative Frequency Plots

Correspondence Analysis

Trend Plot

Statistical Modeling

About

Releases

Packages

Languages

taocao/Auto-EDA

Folders and files

Latest commit

History

Repository files navigation

Introduction

Description

Features

Table of Contents

Installation

Usage

Dataset Overview

Missing Values

Identify Missing Values

Handling missing values

Variable Types

Identify Data Types

Handle Suggested Type Conversions:

Visualization

Univariate Plots

Histogram

Count Plots

Word Cloud

Bivariate Plots

Correlation Plots

Principal Component Analysis

Box Plots

Relative Frequency Plots

Correspondence Analysis

Trend Plot

Statistical Modeling

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages