This is my fourth project for the Udacity Data Engineering Nanodegree. In this project, I created an EMR cluster with Spark, extracted the data from S3 into Spark, transformed it, and wrote the results to Parquet files.
Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In this assignment we build an application that calculates the probability of each word following a pair of words, for every pair of words in the Google n-gram corpus.
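To make the quantity concrete, here is a minimal sketch of the probability of a word following a pair of words. It assumes a plain maximum-likelihood estimate, P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2 *), computed in memory over a toy token list. The description does not say which estimator or smoothing the assignment uses, and the real application runs as a distributed job over the n-gram corpus, so the function name `trigram_probabilities` and the sample sentence are purely illustrative.

```python
from collections import Counter, defaultdict


def trigram_probabilities(tokens):
    """Return {(w1, w2): {w3: P(w3 | w1, w2)}} using maximum-likelihood estimates.

    Assumption: unsmoothed relative frequencies; the actual assignment
    may use a different estimator.
    """
    # Count every consecutive triple of words.
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))

    # Total number of times each leading pair is followed by any word.
    pair_counts = Counter()
    for (w1, w2, _), count in trigram_counts.items():
        pair_counts[(w1, w2)] += count

    # Normalize each trigram count by the total for its leading pair.
    probabilities = defaultdict(dict)
    for (w1, w2, w3), count in trigram_counts.items():
        probabilities[(w1, w2)][w3] = count / pair_counts[(w1, w2)]
    return dict(probabilities)


# Toy corpus, made up for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
probs = trigram_probabilities(tokens)
print(probs[("the", "cat")])
```

In this toy corpus the pair ("the", "cat") is followed once by "sat" and once by "ran", so each continuation gets half of the probability mass. In a distributed setting the two counting passes would typically become separate aggregation steps keyed by trigram and by leading pair.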