☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Updated Jan 12, 2025 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
A tool to read CSV files with CSVW metadata and transform them into other formats.
Easy ETL
DEPRECATED: YAML-based data transformations
A working ETL framework that uses user data from the Spotify API: ➲ Python for Extraction and Transformation ➲ SQL for Data Loading and Staging ➲ Airflow for Data Orchestration and Monitoring ➲ PowerBI for Reporting
fast-resource is a data transformation layer that sits between the database and the application's users, enabling quick data retrieval. It further enhances performance by caching data using Redis and Memcached.
GUI and library made to flatten HUGE JSON files. A library and utility for exploring, analyzing, and flattening JSON files of any size (LARGE - GBs) into CSVs, along with CSV transformations, dynamic CSV filtering, and all with low memory utilization.
Reusable Python classes that extend open source PySpark capabilities. Implementation examples are available in the notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform
This project focuses on analyzing and visualizing the insurance portfolio of an anonymous company that implemented an aggressive growth plan in 2021 across the counties of Florida using Python and Power BI
A Holistic Platform for Automating Data Preparation
Python scripts to process and analyze log files using PySpark.
This project is a powerful Streamlit application designed to provide users with seamless access and analysis of data from multiple YouTube channels. This intuitive tool leverages the Google API to retrieve a comprehensive range of information, including channel details, video statistics, and viewer engagement metrics.
This project scrapes data on Japanese Whiskey from the Whiskey Exchange website, applies the necessary transformations to the scraped data, and then analyzes and visualizes it using Jupyter Notebook and Power BI.
Python library to transfer and convert vertical profile time series data
ETL Redshift-based workflow automated with AWS Step Functions.
NuScenes, Lyft, Waymo and a2d2 datasets parser.