Udacity Data Engineering Nanodegree Program
This project is about data modelling with Apache Cassandra, a NoSQL database. Cassandra scales linearly as nodes are added to the cluster and provides high availability by replicating data across nodes. Compared with many relational (SQL) databases, it offers fast reads and writes with low latency across a distributed cluster.
This project has two main parts:
1. An ETL pipeline that preprocesses the CSV files.
2. Insertion of the preprocessed data into Apache Cassandra.
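The two parts can be sketched as follows. This is a minimal illustration, not the project's actual code. The CSV column names (`COLUMNS`), the rule that drops rows with an empty `artist` field, the table name `song_by_session`, its primary key, and the function names `preprocess_csv_files` and `load_into_cassandra` are all hypothetical assumptions, because the README does not describe the CSV schema or the tables. The only real API relied on is `execute(query, parameters)` on a `cassandra-driver` `Session`, which accepts `%s` placeholders.

```python
import csv
import glob
import os

# Hypothetical CSV columns kept by the ETL step (the real schema is not given).
COLUMNS = ["artist", "session_id", "item_in_session", "song", "length"]

# Hypothetical query-driven table: partition key session_id,
# clustering column item_in_session (orders rows inside a partition).
CREATE_TABLE_QUERY = (
    "CREATE TABLE IF NOT EXISTS song_by_session ("
    "session_id int, item_in_session int, artist text, song text, length float, "
    "PRIMARY KEY ((session_id), item_in_session))"
)

INSERT_QUERY = (
    "INSERT INTO song_by_session "
    "(session_id, item_in_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def preprocess_csv_files(input_dir, output_path):
    """Part 1 (ETL): merge every *.csv file in input_dir into one CSV,
    keeping only COLUMNS and skipping rows whose artist is empty
    (an assumed cleaning rule). Returns the number of rows written."""
    rows = []
    for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                if not record.get("artist"):
                    continue  # drop incomplete events
                rows.append([record[c] for c in COLUMNS])
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)


def load_into_cassandra(session, csv_path):
    """Part 2: create the (hypothetical) table, then insert each preprocessed
    row. `session` must expose execute(query, parameters), as a
    cassandra-driver Session does. Returns the number of rows inserted."""
    session.execute(CREATE_TABLE_QUERY)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            session.execute(
                INSERT_QUERY,
                (
                    int(record["session_id"]),
                    int(record["item_in_session"]),
                    record["artist"],
                    record["song"],
                    float(record["length"]),
                ),
            )
            count += 1
    return count
```

With `cassandra-driver` installed and a local node running, a session could be obtained with `Cluster(["127.0.0.1"]).connect("my_keyspace")` (the keyspace name is a placeholder and must already exist). The table is keyed by `session_id` because Cassandra tables are designed around the query they serve: all rows for one session land in one partition and are ordered by `item_in_session`.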
Build environment: Apache Cassandra v3.11.6, Python 3.7