awsglue

Here are 15 public repositories matching this topic...

VivekaAryan / Reddit-Data-Pipeline

This project offers a robust data pipeline that efficiently extracts, transforms, and loads (ETL) Reddit data into a Redshift data warehouse, combining industry-standard tools and services to handle data processing and integration.

aws airflow athena aws-s3 postgresql reddit-api celery redshift-database awsglue

Updated Jun 19, 2024
Jupyter Notebook
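
As a minimal sketch of the transform step a pipeline like this might perform, the Python below keeps a few fields from Reddit submission objects, converts the Unix `created_utc` timestamp to ISO 8601, and serializes the rows to CSV, one common staging format in S3 before a Redshift `COPY`. The field list, the `transform_posts`/`to_csv` names, and the CSV staging choice are assumptions for illustration, not the project's actual code.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical subset of Reddit submission fields kept for the warehouse table.
FIELDS = ["id", "title", "author", "score", "num_comments", "created_utc"]


def transform_posts(posts):
    """Keep selected fields and convert the Unix timestamp to ISO 8601 (UTC)."""
    rows = []
    for post in posts:
        row = {field: post.get(field) for field in FIELDS}
        row["created_utc"] = datetime.fromtimestamp(
            post["created_utc"], tz=timezone.utc
        ).isoformat()
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows to CSV text, e.g. for staging in S3 before loading."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with a made-up post (the extract step against the Reddit API is omitted).
example = [{"id": "abc123", "title": "Hello", "author": "u1",
            "score": 42, "num_comments": 3, "created_utc": 0}]
print(to_csv(transform_posts(example)))
```

Keeping the transform as plain functions over dictionaries makes it easy to call from an Airflow task while testing it without any AWS access.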

bhavanachitragar / Superstore-Data-Analysis-using-AWS

This project builds a pipeline that analyzes Superstore sales data on AWS. It transforms the data to prepare it for exploration, queries the transformed data with SQL to uncover trends and patterns, analyzes the results, and creates easy-to-understand visualizations that give clear insight into Superstore sales performance.

aws athena s3-bucket quicksight-dashboard awsglue

Updated May 27, 2024
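
The SQL analysis step can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in for Athena, and the `orders` table, its columns, and the toy rows are hypothetical; the project's real schema and queries are not shown here. A standard `GROUP BY` aggregation like this one is also valid in Athena's SQL dialect.

```python
import sqlite3

# Hypothetical columns loosely modeled on Superstore-style sales data,
# with made-up toy rows for illustration only.
ROWS = [
    ("West", "Technology", 1200.0, 300.0),
    ("West", "Furniture", 800.0, -50.0),
    ("East", "Technology", 950.0, 120.0),
    ("East", "Office Supplies", 400.0, 60.0),
]


def sales_by_region(rows):
    """Aggregate total sales and profit per region, largest sales first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (region TEXT, category TEXT, sales REAL, profit REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT region, SUM(sales) AS total_sales, SUM(profit) AS total_profit
        FROM orders
        GROUP BY region
        ORDER BY total_sales DESC
        """
    ).fetchall()
    conn.close()
    return result


print(sales_by_region(ROWS))
```

A result set like this (one row per region) is the kind of summary a QuickSight dashboard would then visualize.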

TanishkaMarrott / Real-Time-Streaming-Analytics-with-Kinesis-Flink-and-OpenSearch

This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture covers the complete data lifecycle, from ingestion to actionable insights.

opensearch dataengineering cloudcomputing awslambda kinesisdatastreams apacheflink awsglue realtimeanalytics

Updated May 9, 2024
Java

prestodb / prestorials

Tutorials and examples of how to deploy Presto and connect it to different data sources

docker aws data tutorial sql mongodb presto example glue walkthrough datalake prestodb presto-connector prestosql lakehouse awsglue

Updated May 7, 2024

nazish555 / AWS-Data_Engineering-Spotify_Data

This project showcases a data transformation pipeline that uses AWS Glue and Amazon Athena to process Spotify data from CSV files. It loads and transforms the data, stores it in an S3-based data warehouse, and makes it queryable through Amazon Athena.

aws sql athena s3 etl-pipeline awsglue

Updated Mar 28, 2024
Python
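
One transformation a Glue job like this might apply is sketched below in plain Python: cast types, drop incomplete rows, and group rows under Hive-style `key=value` partition paths, which the Glue Data Catalog and Athena can use as partition columns. The column names, the `release_year` partition key, and `partition_tracks` are hypothetical. A real Glue job would more likely do this with Spark, for example via `DataFrameWriter.partitionBy`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV columns; the project's actual Spotify files may differ.
SAMPLE_CSV = """track_id,track_name,popularity,release_date
t1,Song A,71,2019-05-01
t2,Song B,,2020-01-17
t3,Song C,55,2020-11-02
"""


def partition_tracks(csv_text):
    """Cast types, drop rows missing popularity, and group rows by a
    Hive-style partition path (release_year=YYYY)."""
    partitions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["popularity"]:
            continue  # skip incomplete records
        row["popularity"] = int(row["popularity"])
        year = row["release_date"][:4]
        partitions[f"release_year={year}"].append(row)
    return dict(partitions)


for path, rows in partition_tracks(SAMPLE_CSV).items():
    print(path, [r["track_id"] for r in rows])
```

Each key would become an S3 prefix such as `.../release_year=2020/`, so Athena queries filtering on `release_year` can skip unrelated files.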

parth2050 / aws-data-pipeline

An end-to-end data pipeline on AWS, from a website source to an analytical dashboard, built with Python Flask, ML models, DynamoDB, and other AWS services.

aws python3 ec2-instance datapipeline awslambda cloud-watch aws-quicksight aws-sns-sqs awsglue

Updated Mar 7, 2024
HTML

vanibhat02 / Big-Data

Big data and cloud deployment

aws big-data athena etl aws-s3 tableau aws-cloudformation awscli sagemaker-deployment iam-authentication awsglue

Updated Jan 15, 2024
Jupyter Notebook

shreyask1406 / Financial-Market-AWS-Data-Pipeline

AWS data pipeline

aws athena aws-s3 tableau awsglue

Updated Aug 29, 2023

Undisputed-jay / SpotifyAPI-Data-Engineering-Project

This project uses an ETL (Extract, Transform, Load) pipeline to extract data from Spotify through its API and load it for querying in AWS Athena. The entire pipeline is built on Amazon Web Services (AWS).

aws sql aws-lambda aws-s3 python3 aws-cloudformation aws-athena awsglue

Updated Jul 8, 2023
Jupyter Notebook
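
The extract step typically returns nested JSON that must be flattened before Athena can query it. Assuming responses shaped like the Spotify Web API's playlist-items endpoint (an `items` list whose entries hold a `track` object with `album` and `artists`), the hypothetical helper `flatten_playlist_items` below turns them into flat rows. Authentication and the HTTP request are omitted, and the output column names are illustrative.

```python
# Assumes a response shaped like the Spotify Web API playlist-items endpoint;
# only a few fields are kept, and the output column names are hypothetical.


def flatten_playlist_items(response):
    """Turn nested playlist items into flat song rows."""
    songs = []
    for item in response.get("items", []):
        track = item.get("track")
        if not track:
            continue  # skip entries without a track object
        songs.append(
            {
                "song_id": track["id"],
                "song_name": track["name"],
                "popularity": track.get("popularity"),
                "album_name": track["album"]["name"],
                "release_date": track["album"].get("release_date"),
                # Multiple artists are joined into one string column.
                "artists": ", ".join(a["name"] for a in track["artists"]),
            }
        )
    return songs
```

In a Lambda-based extract, rows like these could then be written to S3 and registered by a Glue crawler so Athena can query them.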

iqrabismii / Big-Data-Projects-

Big data projects using PySpark and AWS

ecommerce airflow athena aws-s3 pyspark tableau pyspark-mllib customer-products awsglue

Updated Apr 28, 2023
Jupyter Notebook

riship1095 / YouTube-ETL

Transformed YouTube’s raw JSON data to Parquet and loaded it into an S3 bucket, used the Glue Data Catalog to store metadata, and used Athena to query the cleaned data. Developed an ETL process as a Lambda function that is triggered when raw data is loaded into an S3 bucket; it processes the data and stores the result in an S3 bucket for analysis.

aws aws-lambda etl aws-s3 data-engineering aws-athena awsglue
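
The event-driven part of this design can be sketched as an AWS Lambda handler for S3 `ObjectCreated` notifications. The event fields read here (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follow the S3 notification format, in which object keys arrive URL-encoded. The `cleaned/` prefix, the `{"items": [{"id", "snippet": {"title"}}]}` input shape, and the helper names are assumptions; reading the object and writing Parquet are left as comments so the sketch runs without AWS credentials.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical output prefix; the project's real bucket and prefix names are not given.
CLEANED_PREFIX = "cleaned/"


def target_key(raw_key):
    """Map a raw JSON object key to a Parquet key under the cleaned prefix."""
    name = raw_key.rsplit("/", 1)[-1]
    if name.endswith(".json"):
        name = name[: -len(".json")]
    return f"{CLEANED_PREFIX}{name}.parquet"


def flatten_items(raw_json):
    """Flatten an assumed {"items": [{"id": ..., "snippet": {"title": ...}}]}
    document into rows."""
    data = json.loads(raw_json)
    return [
        {"id": item["id"], "title": item["snippet"]["title"]}
        for item in data.get("items", [])
    ]


def lambda_handler(event, context):
    """Entry point for an S3 ObjectCreated notification."""
    jobs = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append({"bucket": bucket, "source": key, "target": target_key(key)})
        # A real handler would read the object (e.g. boto3 get_object),
        # call flatten_items(...), and write Parquet (e.g. awswrangler.s3.to_parquet).
    return jobs
```

Separating key mapping and flattening from the handler keeps the AWS-specific I/O thin, so the transformation logic can be tested locally.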