Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In the assignment we build an application that calculates, for every pair of words in the Google n-gram corpus, the probability of each word that can follow that pair.
In this project, we try to reproduce the paper 'Comparing Measures of Semantic Similarity' by Nikola Ljubešić et al., which compares different methods for automatically extracting semantic similarity measures from a corpus.
A map-reduce implementation in Apache Hadoop (AWS EMR) for calculating the probabilities of trigrams in the Hebrew language. This project utilizes the deleted estimation two-way cross validation method to calculate trigram probabilities. The Google Hebrew Trigram database serves as this project's corpus.
Built a distributed system that completes several objectives on the given data to generate loan reports, using Amazon Web Services, Apache Spark, Java, and Python.