☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Updated Nov 3, 2024 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
A tool to read CSV files with CSVW metadata and transform them into other formats.
Easy ETL
DEPRECATED: YAML-based data transformations
Reusable Python classes that extend open source PySpark capabilities. Example implementations are available in the notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform
fast-resource is a data transformation layer that sits between the database and the application's users, enabling quick data retrieval. It further enhances performance by caching data using Redis and Memcached.
This repository is a working ETL framework that uses user data from the Spotify API: Python for extraction and transformation, SQL for data loading and staging, Airflow for data orchestration and monitoring, and Power BI for reporting.
A Holistic Platform for Automating Data Preparation
GUI and library made to flatten HUGE JSON files. A library and utility for exploring, analyzing, and flattening JSON files of any size (LARGE - GBs) into CSVs, along with CSV transformations, dynamic CSV filtering, and all with low memory utilization.
This project focuses on analyzing and visualizing the insurance portfolio of an anonymous company that implemented an aggressive growth plan in 2021 across the counties of Florida using Python and Power BI
This project is a powerful Streamlit application designed to provide users with seamless access and analysis of data from multiple YouTube channels. This intuitive tool leverages the Google API to retrieve a comprehensive range of information, including channel details, video statistics, and viewer engagement metrics.
Redshift-based ETL workflow automated with AWS Step Functions.
Python scripts to process and analyze log files using PySpark.
Python library to transfer and convert vertical profile time series data
This project focuses on scraping data on Japanese whisky from The Whisky Exchange website, transforming the scraped data, and then analyzing and visualizing it using Jupyter Notebook and Power BI.
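Several of the projects listed above flatten or restructure nested data. As a rough illustration of that idea, here is a minimal standard-library sketch that flattens nested JSON-like records into dotted column names and writes them out as CSV. It is not the implementation or API of any listed project; the names `flatten` and `to_csv` and the sample records are hypothetical.

```python
import csv
import io


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts and lists into one level with dotted keys.

    Hypothetical helper for illustration only.
    """
    if isinstance(record, dict):
        items = record.items()
    elif isinstance(record, list):
        items = enumerate(record)
    else:
        # Scalars have nothing to flatten.
        return {parent_key: record}

    flat = {}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # Recurse into non-empty containers.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_csv(records):
    """Flatten each record and render all of them as one CSV string."""
    rows = [flatten(r) for r in records]
    # Column order: union of all keys, in first-seen order.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data.
records = [
    {"id": 1, "user": {"name": "Ada", "tags": ["a", "b"]}},
    {"id": 2, "user": {"name": "Lin"}},
]

if __name__ == "__main__":
    print(to_csv(records))
```

This sketch holds every row in memory, so it only suits small inputs; a tool aimed at multi-gigabyte files would need a streaming parser instead.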