The projects were conducted as part of the Big Data course in the MSc in Data Science at the University of Amsterdam.
-
The assignments implement a MapReduce engine and a machine learning algorithm with Apache Hadoop.
-
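As a rough illustration of the MapReduce model used in the assignments, here is a minimal in-memory sketch in plain Python. It is not the code from the assignments and does not use Hadoop; the names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a single-process map, shuffle, and reduce pass over records."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1

def sum_reducer(key, values):
    # Sum the per-word counts emitted by the mapper.
    return sum(values)

counts = map_reduce(["big data", "big clusters"], word_count_mapper, sum_reducer)
print(counts)
```

A real Hadoop job distributes the map and reduce phases across nodes and performs the shuffle over the network; the sketch only shows the data flow.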
The final project compares the read performance of a relational SQL database (PostgreSQL) and a distributed database (Cassandra) under increasing read loads on a public retail dataset. Our results show that Cassandra answers read queries faster than PostgreSQL answers the equivalent queries, but Cassandra cannot efficiently perform aggregations or joins on the data.
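A comparison of this kind can be sketched with a small timing harness. The example below is an assumption about how such a benchmark might be structured, not the project's actual code: `time_reads` and `run_read` are hypothetical names, and an in-memory dictionary stands in for the database so that the sketch runs without a PostgreSQL or Cassandra server.

```python
import time

def time_reads(run_read, loads):
    """Return total wall-clock seconds spent on each load size.

    run_read is a hypothetical callable that performs one read query;
    in a real benchmark it would wrap a PostgreSQL or Cassandra client call.
    """
    timings = {}
    for n in loads:
        start = time.perf_counter()
        for i in range(n):
            run_read(i)
        timings[n] = time.perf_counter() - start
    return timings

# Stand-in data store (assumption): a dict lookup replaces a real read query.
store = {i: f"row-{i}" for i in range(1000)}
timings = time_reads(lambda i: store[i % 1000], loads=[10, 100, 1000])

for n, seconds in timings.items():
    print(f"{n} reads: {seconds:.6f} s")
```

Running the same harness against both databases with equivalent queries gives comparable timings per load size; a fair comparison would also need warm-up runs and repeated trials.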