Skip to content
#

spark-mllib

Here are 32 public repositories matching this topic...

This project demonstrates a complete ETL pipeline for Formula 1 racing data using Azure Databricks, Delta Lake, and Azure Data Factory. It covers data ingestion, transformation with PySpark and Spark SQL, data governance with Unity Catalog, and visualization through Power BI. Designed to showcase real-world data engineering workflows in Azure.

  • Updated Nov 14, 2024
  • Python

Implemented an auto-clustering tool with seed and number of clusters finder. Optimizing algorithms: Silhouette, Elbow. Clustering algorithms: k-Means, Bisecting k-Means, Gaussian Mixture. Module includes micro-macro pivoting, and dashboards displaying radius, centroids, and inertia of clusters. Used: Python, Pyspark, Matplotlib, Spark MLlib.

  • Updated Mar 26, 2022
  • Python

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph p…

  • Updated Mar 17, 2019
  • Python

Improve this page

Add a description, image, and links to the spark-mllib topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the spark-mllib topic, visit your repo's landing page and select "manage topics."

Learn more