A Python repository for storing exercise code from the Data Analysis and Big Data course. It also contains the course project, built with Jupyter Notebook.
Updated May 27, 2024 - Jupyter Notebook
Big Data, Docker, Data Science, Spark, Spark3, Hadoop, HDFS, Scala, Python, Artificial Intelligence, Machine Learning, Jupyter Lab Notebook
Data science encompasses a wide range of areas, topics, and sub-domains such as Big Data, Machine & Deep learning (ETL, TensorFlow, Keras), Data Mining/Visualization (EDA), BI, Predictive Analytics, Statistical Analytics, etc.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
This is my final project for PSTAT 135, Big Data Analytics, using PySpark to conduct county-wide voter turnout regression analysis by demographic. This project was done in collaboration with Tyler Kim and Erasmo Rivas. The GCP storage bucket linked below contains the full project, while the Jupyter notebook and exported PDF are included here.
Build a movie recommendation data pipeline using Azure services for efficient data ingestion, transformation, and orchestration. Utilize Azure Blob Storage, Azure Databricks, and Azure Data Factory to implement collaborative filtering and PySpark ML for accurate movie recommendations.
Collection of research notebooks done by Eonian
Project developed for the Big Data Computing exam. The code includes a set of Python notebooks that implement different approaches to movie shot classification and clustering.
Uses Python and Jupyter Notebook to analyze electrical appliance data, then uses that data to predict appliance sales.
The notebook shows how tools of the PySpark SQL module work in practice.
In this Python notebook, we explore how K-Means can be used for customer segmentation to gain a competitive advantage and improve a business's bottom line.
Notebooks for Python and Spark for Big Data
Big data analysis of a Bundesliga football league dataset using PySpark, Spark SQL, and NumPy, carried out in a Jupyter Notebook.
This repository contains a dashboard to visualize the US flights data and notebooks for some ML tasks on the same data