A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Updated Aug 26, 2022 - Python
Advance Image Downloader/Extractor (Job) is a Python-Flask web app that helps users download any kind of image from the internet at any date and time. The images are downloaded as a job, and the user is then notified by an email containing a link to the downloaded images.
Robot Framework library to execute CQL statements in Cassandra Database
An API to alter the data in MySQL, Cassandra, and MongoDB databases
Apollo is a backup tool for Cassandra databases that takes snapshots and incremental backups and stores them in AWS S3.
Scan an Apache Cassandra database and dump some data
Creating a Video Membership app using FastAPI & NoSQL
LlamaIndex RAG using Cassandra DB powered vectorstore with a knowledge graph
Analysing live tweets from Twitter by building a big data pipeline and scheduling it with Airflow (also using Kafka for tweet ingestion, Cassandra for storing parsed tweets, and Spark for analysis)
Synchronizer for ElasticSearch <-> Cassandra
A recommender for the Apt/Housing category of Craigslist.
The Financial DataWarehouse project aims to provide an efficient solution for storing and managing financial and commodities data using Cassandra. This project includes a REST API built with FastAPI for easy access and manipulation of data. Using Docker Compose, it deploys a multi-node Cassandra cluster to ensure data redundancy and fault tolerance.
In this project, I build a real-time data streaming pipeline, covering each phase from data ingestion to processing and finally storage. It uses a stack of tools and technologies including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra, all containerised using Docker.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Proof-of-concept for secure data storage and retrieval in distributed and cloud environments using the Ethereum blockchain.
This is the Apollo ORM library for ScyllaDB/Cassandra, a Python library that provides an Object-Relational Mapping (ORM) interface to ScyllaDB.
Simple App with Django and Cassandra DB
This repo contains some simple queries for Cassandra.
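As a rough illustration of the kind of simple Cassandra queries the entries above deal with, here is a minimal sketch that only builds CQL statement strings in plain Python and does not connect to a cluster. The keyspace name `demo`, the table `users`, and both helper function names are hypothetical and are not taken from any of the listed repositories.

```python
def create_keyspace_cql(keyspace: str, replication_factor: int = 3) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy replication.

    The keyspace name is interpolated directly, so it must come from
    trusted code, not from user input.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


def select_by_key_cql(keyspace: str, table: str, key_column: str) -> str:
    """Build a SELECT with a %s placeholder for the key value.

    %s is the placeholder style for non-prepared statements in the
    DataStax Python driver; the value itself is supplied separately.
    """
    return f"SELECT * FROM {keyspace}.{table} WHERE {key_column} = %s;"


if __name__ == "__main__":
    print(create_keyspace_cql("demo"))
    print(select_by_key_cql("demo", "users", "id"))
```

With a running cluster and the `cassandra-driver` package installed, strings like these could be passed to `Session.execute`, with the key value given as a separate parameter tuple.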