Big data training material (Python; updated Jun 29, 2023)
This repository contains practice assignments for Udacity's Intro to Hadoop and MapReduce course.
Code for Data Science 101, an online course for beginners.
Shows how to send data with a producer and receive it with a consumer using Python.
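The entry does not say which messaging system the repository uses. The sketch below shows the producer/consumer pattern in a single process. It assumes Python's standard `queue.Queue` stands in for a real broker. One thread produces records, and another consumes them until it receives a sentinel value.

```python
import queue
import threading

SENTINEL = None  # tells the consumer that no more records are coming


def producer(q: queue.Queue, records: list) -> None:
    """Put each record on the queue, then the sentinel."""
    for record in records:
        q.put(record)
    q.put(SENTINEL)


def consumer(q: queue.Queue, received: list) -> None:
    """Take records off the queue until the sentinel arrives."""
    while True:
        record = q.get()
        if record is SENTINEL:
            break
        received.append(record)


def run(records: list) -> list:
    # Bounded queue: the producer blocks whenever the consumer falls behind.
    q: queue.Queue = queue.Queue(maxsize=10)
    received: list = []
    t_cons = threading.Thread(target=consumer, args=(q, received))
    t_prod = threading.Thread(target=producer, args=(q, records))
    t_cons.start()
    t_prod.start()
    t_prod.join()
    t_cons.join()
    return received


if __name__ == "__main__":
    print(run([{"id": i} for i in range(3)]))
```

A broker such as Kafka would replace the in-memory queue with a topic. The two threads would become separate producer and consumer processes.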
We'll create a simple Java application using Spark that integrates with the Kafka topic we created earlier. The application reads the messages as they are posted and counts the frequency of words in each message.
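The per-message counting step does not depend on Spark or Kafka, so it can be shown on its own. This minimal Python sketch assumes messages arrive as plain strings. It also assumes words are lowercased and split on whitespace; the entry does not state this tokenization.

```python
from collections import Counter


def word_frequencies(message: str) -> Counter:
    """Count how often each lowercased, whitespace-separated word occurs in one message."""
    return Counter(message.lower().split())


def process_stream(messages):
    """Yield a (message, counts) pair for every incoming message, in arrival order."""
    for message in messages:
        yield message, word_frequencies(message)


if __name__ == "__main__":
    for msg, counts in process_stream(["to be or not to be", "Spark and Kafka"]):
        print(msg, dict(counts))
```

In Spark, this logic is typically a split of each message into words followed by a count per word. Here, `Counter` plays that role for a single message.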
Ingests CSV files into S3, uploads a Spark script to S3, and runs it on an EMR cluster. The job pulls the raw UberEats data from S3, cleans it, and loads it back to S3 in the proper schema. The whole pipeline is orchestrated with Airflow.
Academic project from UCSM university: we built a multilayer neural network to predict home sale prices in Argentina and Uruguay.
This project proposes the Entity Component System (ECS) architecture for Big Data and AI pipelines and evaluates its performance.
A real-time election voting system built with Python, Kafka, Spark Streaming, Postgres, and Streamlit. Docker Compose launches the required services in Docker containers.
Intel BigDL for high energy physics
Containerized Elasticsearch + Kibana
Loads the Spotify Million Playlist data into MongoDB, queries the existing PostgreSQL database, and optimizes the relational database queries.
Completed the SQL Basics for Data Science Specialization from the University of California, Davis, gaining proficiency in Data Analysis, SQL, Apache Spark, and Delta Lake.
Imports raw JSON into Elasticsearch using multiple threads.
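The entry does not say how the threads are organized. This minimal sketch assumes newline-delimited JSON input. It splits the documents into batches and sends them concurrently through a thread pool from Python's `concurrent.futures`. `send_batch` is a hypothetical placeholder for the actual Elasticsearch bulk request, which the source does not show.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def read_ndjson(lines):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def batches(docs, size):
    """Split a list of documents into consecutive batches of at most `size` items."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def send_batch(batch):
    """Hypothetical stand-in for an Elasticsearch bulk request.

    Returns the number of documents it would have indexed.
    """
    return len(batch)


def import_json(lines, batch_size=500, workers=4):
    """Parse the input, then send batches in parallel; returns the total documents sent."""
    docs = read_ndjson(lines)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs send_batch on worker threads and returns results in batch order
        results = list(pool.map(send_batch, batches(docs, batch_size)))
    return sum(results)


if __name__ == "__main__":
    sample = ['{"a": 1}', '', '{"a": 2}', '{"a": 3}']
    print(import_json(sample, batch_size=2, workers=2))
```

Threads fit this job because bulk indexing is mostly I/O-bound. While requests wait on the network, the GIL is not the bottleneck.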