Web scraping project for politician stock market activity
Updated Jun 23, 2024 - Python
Data Lakehouse solution for data produced by STEDI Step Trainer sensors and the mobile app so that it can train the machine learning module.
A toolkit that provides an object-oriented interface for working with Parquet datasets on AWS
Implementation of an ETL data pipeline to load data from S3 to Snowflake and refresh a Tableau data source in AWS
Intro to streaming data with Kafka, Spark and AWS Glue
Get the dataset into an S3 bucket, use AWS Glue to transform the dataset, write a Lambda script to clean the dataset, query the dataset via AWS Athena, then build a dashboard using AWS QuickSight.
This project repo 📺 offers a robust solution meticulously crafted to efficiently manage, process, and analyze YouTube video data leveraging the power of AWS services. Whether you're diving into structured statistics or exploring the nuances of trending key metrics, this pipeline is engineered to handle it all with finesse.
A small walkthrough of how to create an AWS Glue Job Pipeline with AWS CDK
AWS Athena can query structured data stored in S3, while DynamoDB is a managed NoSQL database. To query DynamoDB data with Athena, the unstructured data must first be converted to structured data. The Python code in this repo performs that conversion.
This is a sample project to demonstrate how to update DynamoDB with AWS Glue
An end-to-end solution for managing and analyzing YouTube video data from Kaggle, leveraging AWS services and visualized through QuickSight and Tableau
An end-to-end data ingestion pipeline for airline data, utilizing various AWS services to process and store flight information efficiently.
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
Based on Zack Wilson's Data Engineering Bootcamp
Streaming data pipeline on AWS. Tech session repository for hist
Deployed an OpenSearch domain to index bank transactions and created a Glue ETL job to process transactions generated by upstream on-premises applications into the domain.