eng-joaoelias / AnaliseDadosBigData

A Python repository for storing exercise code from the Data Analysis and Big Data course. It also contains the course project, done in Jupyter Notebook.

python big-data pandas-dataframe linear-regression pandas python3 data-analysis dataframe predictive-analytics descriptive-statistics pandas-python descriptive-analysis

Updated May 27, 2024
Jupyter Notebook

apache / zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

javascript java scala database big-data spark nosql flink zeppelin

Updated May 17, 2024
Java

alvertogit / bigdata_docker

Big Data, Docker, Data Science, Spark, Spark3, Hadoop, HDFS, Scala, Python, Artificial Intelligence, Machine Learning, Jupyter Lab, Notebook

python docker data-science machine-learning scala big-data spark jupyter-notebook jupyter-lab spark3

Updated May 12, 2024
Python

In this repository I have performed some basic queries using Python and SQL on a large dataset (Big Data). Please download the HTML file to view the notebook, as GitHub has trouble rendering HTML files.

python sql big-data

Updated May 9, 2024
HTML

I2DSR / data-science-ipython-notebooks

Data science encompasses a wide range of areas, topics, and sub-domains such as Big Data, Machine & Deep learning (ETL, TensorFlow, Keras), Data Mining/Visualization (EDA), BI, Predictive Analytics, Statistical Analytics, etc.

python data-science machine-learning data-mining r big-data deep-learning etl tensorflow exploratory-data-analysis keras data-visualization statistical-analysis business-intelligence predictive-analytics big-data-analytics

Updated May 3, 2024

jongyoul / zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

javascript java scala database big-data spark nosql flink zeppelin

Updated Apr 10, 2024
Java

donnemartin / data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science machine-learning caffe theano big-data spark deep-learning hadoop tensorflow numpy scikit-learn keras pandas kaggle scipy matplotlib mapreduce

Updated Mar 20, 2024
Python

jadianes / spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

python data-science machine-learning big-data spark notebook ipython bigdata ipython-notebook pyspark mllib data-analysis

Updated Mar 16, 2024
Jupyter Notebook

starryjay / PSTAT135Final

This is my final project for PSTAT 135, Big Data Analytics, using PySpark to conduct county-wide voter turnout regression analysis by demographic. This project was done in collaboration with Tyler Kim and Erasmo Rivas. The GCP storage bucket linked below contains the full project, while the Jupyter notebook and exported PDF are included here.

python big-data apache-spark exploratory-data-analysis pyspark big-data-analytics

Updated Feb 21, 2024
Jupyter Notebook

DucAnhNTT / movie-recom-pipeline-azure

Build a movie recommendation data pipeline using Azure services for efficient data ingestion, transformation, and orchestration. Utilize Azure Blob Storage, Azure Databricks, and Azure Data Factory to implement collaborative filtering and PySpark ML for accurate movie recommendations.

python big-data analytics notebook azure pyspark adf azure-storage databricks azuredatabricks azure-pipelines azurelogicapp

Updated Sep 30, 2023
Jupyter Notebook

eonian-core / research

Collection of research notebooks done by Eonian

crypto big-data analytics jupyter-notebook defi

Updated Aug 15, 2023
Jupyter Notebook

bensalem14 / Hadoop-MapReduce

A simple notebook illustrating a Hadoop MapReduce example on a large file for big data.

big-data hadoop mapreduce

Updated Aug 15, 2023
Jupyter Notebook

rafmc98 / Shot-Cafe-movie-shot-analysis

Project developed for the Big Data Computing exam. The code includes a set of Python notebooks that implement different approaches to movie shot classification and clustering tasks.

python machine-learning big-data computer-vision deep-learning pyspark

Updated Aug 5, 2023
Jupyter Notebook

akromnassir / Data-Science-Python-stuffs-

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science big-data keras ml pandas matplotlib dl

Updated Jul 26, 2023
Python

cworld1 / business-data

Uses Python and Jupyter Notebook to analyze electrical appliance data, then uses that data to predict appliance sales.

python big-data jupyter-notebook

Updated Jul 17, 2023
Jupyter Notebook

data42lana / learning_big_data_tools

The notebook shows how tools of the PySpark SQL module work in practice.

big-data jupyter-notebook python3 pyspark pyspark-sql

Updated Jun 16, 2023
Jupyter Notebook

aibysalman / CustomerSegmentationViaK-means

In this Python notebook, we explore how K-Means can be used for customer segmentation to gain a competitive advantage and improve a business's bottom line.

marketing machine-learning big-data k-means k-means-clustering customer-segmentation profitability k-means-algorithm