trajanov/BigDataAnalytics

 
 

MET CS 777 Big Data Analytics Code Repository

This repository collects code examples, Jupyter notebooks, and tutorials that support students and practitioners learning big data analytics, with a focus on Apache Spark and Python. Each folder and its contents are described below.

Folder Structure and Contents

Contains a wide range of Jupyter notebooks illustrating the use of PySpark for data processing, machine learning, and advanced analytics. Notebooks range from basic RDD/DataFrame operations to advanced MLlib and Spark NLP examples. Subfolders include:

Contains practical examples of data analytics on well-known datasets using Spark RDDs and DataFrames. Subfolders include:

  • Spark-Example-FlightsData/: Analysis of flight data, including scripts and notebooks for exploring flight details, delays, and airport statistics.
  • Spark-Example-Social-Media/: Examples and notebooks for analyzing social media datasets (e.g., Facebook posts, tweets).
  • Spark-Example-TPCH/: Benchmarks and analytics on the TPC-H dataset, including schema explanations and example queries.
  • Spark-Example-Word-Count/: Classic word count examples using Spark, with scripts and sample data.
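For orientation, a classic RDD word count can be sketched roughly as follows. This is a minimal illustration and not the code shipped in Spark-Example-Word-Count/; the helper names (`tokenize`, `word_count`, `demo`), the tokenization rule, and the sample sentences are assumptions made for this example.

```python
import re
from operator import add


def tokenize(line):
    """Lowercase a line and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())


def word_count(spark, lines):
    """Count word occurrences in an iterable of strings with an RDD pipeline.

    `spark` is an existing SparkSession; the result is a plain dict.
    """
    rdd = spark.sparkContext.parallelize(list(lines))
    counts = (
        rdd.flatMap(tokenize)            # one record per word
           .map(lambda word: (word, 1))  # pair each word with a count of 1
           .reduceByKey(add)             # sum the counts per word
    )
    return dict(counts.collect())


def demo():
    """Run the sketch on two hypothetical sample sentences in local mode."""
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("WordCountSketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        return word_count(spark, ["to be or not to be", "that is the question"])
    finally:
        spark.stop()
```

Calling `demo()` requires a working PySpark installation; in the returned dictionary, `to` and `be` each map to 2. For real data, `spark.sparkContext.textFile(path)` would typically replace `parallelize`.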

Contains Python scripts and notebooks implementing machine learning algorithms and statistical methods.

Step-by-step guides and tutorials for installing and configuring tools required for big data analytics, including:

  • How to install Apache Spark on Windows and macOS.
  • How to install Git and Java JDK.
  • How to set up and use AWS EMR clusters and Google Cloud Dataproc.
  • How to run Jupyter and PySpark on Google Colab and cloud platforms.
  • Screenshots and markdown guides for various installation and setup tasks.
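After following the installation guides, a quick sanity check of the local environment can help. The sketch below uses only the Python standard library and assumes the conventional `JAVA_HOME` and `SPARK_HOME` environment variables; the function name `check_spark_prerequisites` is hypothetical, and the guides in this repository may configure things differently.

```python
import os
import shutil


def check_spark_prerequisites(env=None):
    """Report which commonly needed tools and variables are visible.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try against a different configuration.
    """
    env = os.environ if env is None else env
    search_path = env.get("PATH", "")
    return {
        # True if an executable with this name is found on the search path
        "java_on_path": shutil.which("java", path=search_path) is not None,
        "git_on_path": shutil.which("git", path=search_path) is not None,
        # Conventional variables (an assumption); None means "not set"
        "JAVA_HOME": env.get("JAVA_HOME"),
        "SPARK_HOME": env.get("SPARK_HOME"),
    }


for name, value in check_spark_prerequisites().items():
    print(f"{name}: {value}")
```

A value of `False` or `None` does not necessarily mean the installation failed: Spark installed through `pip install pyspark`, for example, does not need `SPARK_HOME` to be set.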

The materials range from foundational to advanced and are intended for hands-on practice and experimentation with big data analytics in Spark and Python.

About

Big Data Analytics - Spark and Machine Learning Examples
