Process Big Data with distributed data systems

whchien/big-data

Big Data

These projects were conducted for the Big Data course, part of the MSc in Data Science at the University of Amsterdam.

  • The assignments implement a MapReduce engine and a machine learning algorithm with Apache Hadoop.

  • The final project compares the performance of a SQL database (PostgreSQL) and a distributed database (Cassandra) under increasing read loads on a public retail dataset. Our results show that Cassandra read queries are faster than the equivalent PostgreSQL read queries, but Cassandra cannot efficiently perform aggregations or joins on the data.
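
For readers unfamiliar with the MapReduce model mentioned in the assignments, the following is a minimal single-process sketch of its map, shuffle, and reduce phases. It is not the code from this repository and does not use Hadoop; the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle, and reduce in memory (illustrative only, no distribution)."""
    groups: Dict[str, List[int]] = defaultdict(list)

    # Map phase: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by key.
            groups[key].append(value)

    # Reduce phase: each key's values are collapsed into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the per-occurrence counts for a single word.
    return sum(counts)


counts = map_reduce(
    ["big data", "big clusters"], word_count_mapper, word_count_reducer
)
```

In a real Hadoop job the map and reduce functions run on different machines and the shuffle moves data across the network; here everything happens in one dictionary, which keeps the structure of the model visible.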