This is a package for quick, easy and customizable data analysis with pandas and seaborn. We automate all the routine so you can focus on the data. Clear and cuztomizable workflow is proposed. Check my talk about EDA good practices and the concepts of this library.
EDAwesome is generally compatible with standard Anaconda environment in therms of dependencies. So, you can just install it in you environment with pip:
pip install edawesome
You can also install the dependencies, using requirements.txt
:
pip install -r requirements.txt
If you use Poetry, just include the depedencies in your pyproject.toml
:
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
seaborn = "^0.12.1"
kaggle = "^1.5.12"
ipython = "^8.5.0"
transitions = "^0.9.0"
patool = "^1.12"
pyspark = "^3.3.1"
pandas = "^1.5.2"
statsmodels = "^0.13.5"
scikit-learn = ">=1.2.0"
scipy = "~1.8.0"
This package is designed to be used in Jupyter Notebook. You can use step-by-step workflow or just import the functions you need. Below is the example of the step-by-step workflow:
from edawesome.eda import EDA
eda = EDA(
data_dir_path='/home/dreamtim/Desktop/Coding/turing-ds/MachineLearning/tiryko-ML1.4/data',
archives=['/home/dreamtim//Downloads/home-credit-default-risk.zip'],
use_pyspark=True,
pandas_mem_limit=1024**2,
pyspark_mem_limit='4g'
)
This will create the EDA
object. Now you can load the data into your EDA:
eda.load_data()
This will display the dataframes and their shapes. You can also use eda.dataframes
to see the dataframes. Now you can go to the next step:
eda.next()
eda.clean_check()
Let us say, that we don't want to do any cleaning in this case. So, we just go to the next step:
eda.next()
eda.categorize()
Now you can compare some numerical column by category just in one line:
eda.compare_distributions('application_train', 'ext_source_3', 'target')
Full notebook which was used for the examples above can be found in one of my real ML projects.
You can find the documentation here.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.