Data Modeling using CassandraQL for a Music Industry Project (Sparkify). Course project for the Udacity Data Engineering Nanodegree.
Updated May 23, 2020 - Python
Built a data warehouse that picks up data from AWS S3, transforms it, and loads it into AWS Redshift.
A data pipeline that conducts ETL processes to AWS Redshift, utilizing Spark and coordinated by Apache Airflow.
This project shows the data model of the NBA shots dataset for the 2022/2023 season.
A program to create RDF and RDF* knowledge graphs from the clinical database MIMIC-III, where a hospital layout has been created and patients move within the hospital.
Assorted tools for data modeling, ETL, and web scraping.
This project demonstrates the process of data preprocessing, model training, evaluation, and tuning in building a predictive model for a classic dataset. The Random Forest model, with its ability to handle complex relationships and interactions between features, proved to be the most effective in this case.
A repository for practices of Django and web development
In this project, we apply Data Modeling with Postgres and build an ETL pipeline using Python.
Deloitte's Virtual Experience (VE) Program offers students a platform to explore and develop skills in cyber technology, forensic investigations, data analytics, platform engineering, and coding. The program empowers participants to shape their own career paths and provides a glimpse into the exciting opportunities available at Deloitte.
This is a tiny machine learning demo that covers the necessary steps, including raw dataset loading, data preprocessing, targeting params calculation, etc.
Note-taking app built with Python Flask and connected to PostgreSQL. Includes data modeling, a RESTful API, CRUD operations, login, authentication, and session management.
JP Morgan Cognizant Artificial Virtual internship
A full web store developed in Python using Django
This pipeline retrieves Yelp Fusion API restaurant data as JSON files, then loads it into Google Cloud Storage. The data is transformed while being moved into BigQuery and refined further with dbt. Finally, Looker Studio visualizes the processed data. Prefect orchestrates the entire workflow.
Batch and streaming data pipelines built on Databricks with PySpark, modeling Formula 1 racing data from multiple sources and APIs into a star schema for analysis in Power BI.
Twitter message sentiment analysis
This repository contains a project to create a data warehouse and ETL pipeline on AWS for a music streaming app. It defines fact and dimension tables and inserts data into the new tables. At the end, two analytical queries report the top ten artists and songs.
Abstract Data Structures over neo4j