big-data

Here are 53 public repositories matching this topic...

Data science Python notebooks: deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command-line tools.

  • Updated Mar 20, 2024
  • Python

Build a movie recommendation data pipeline using Azure services for efficient data ingestion, transformation, and orchestration. Utilize Azure Blob Storage, Azure Databricks, and Azure Data Factory to implement collaborative filtering and PySpark ML for accurate movie recommendations.

  • Updated Sep 30, 2023
  • Jupyter Notebook

This is the final project required to complete my Big Data Expert Program at U-TAD in September 2017. It uses the following technologies: Apache Spark v2.2.0, Python v2.7.3, Jupyter Notebook (PySpark), HDFS, Hive, Cloudera Impala, Cloudera HUE, and Tableau.

  • Updated May 4, 2018
  • Jupyter Notebook
