Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Uses Pandas to create multiple DataFrames from CSV files containing Disneyland reviews and chocolate reviews. The DataFrames are cleaned and then loaded into PostgreSQL to build a relational database in which the data can be joined.
Customisable ETL utility to validate, filter and merge CSV files. Out of the box, it merges files from the Google COVID-19 repository while checking the input data for errors and inconsistencies.
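
To illustrate the validate, filter and merge pattern that the last description refers to, here is a minimal sketch using only the Python standard library. It is not taken from any of the listed projects: the column names (`date`, `region`, `cases`), the file names, and the helpers `validate_row` and `merge_csv` are all hypothetical, chosen only for the example.

```python
import csv
import io

# Hypothetical schema, assumed for this example only.
REQUIRED = ("date", "region", "cases")


def validate_row(row):
    """Return a list of problems found in one row (empty if the row is valid)."""
    problems = []
    for col in REQUIRED:
        if not (row.get(col) or "").strip():
            problems.append(f"missing {col}")
    cases = (row.get("cases") or "").strip()
    if cases and not cases.isdigit():
        problems.append(f"non-numeric cases: {cases!r}")
    return problems


def merge_csv(sources, keep=lambda row: True):
    """Validate, filter and merge rows from several CSV file objects.

    `sources` maps a display name to an open text file object.
    Invalid rows are reported in `errors` rather than merged.
    """
    merged, errors = [], []
    for name, handle in sources.items():
        # Data rows start at line 2 because line 1 is the header.
        for line_no, row in enumerate(csv.DictReader(handle), start=2):
            problems = validate_row(row)
            if problems:
                errors.append((name, line_no, problems))
            elif keep(row):
                merged.append({col: row[col].strip() for col in REQUIRED})
    return merged, errors


# In-memory stand-ins for real CSV files.
sources = {
    "a.csv": io.StringIO("date,region,cases\n2020-03-01,US,10\n2020-03-02,US,abc\n"),
    "b.csv": io.StringIO("date,region,cases\n2020-03-01,FR,5\n2020-03-02,,7\n"),
}
merged, errors = merge_csv(sources, keep=lambda r: int(r["cases"]) >= 5)
```

Validation runs before filtering, so the `keep` predicate can assume well-formed values. A rejected row is recorded with its source name and line number instead of being silently dropped.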