XD-DENG/Spark-ML-Intro

Spark Machine Learning Introduction

NOTE: the methods introduced here are all based on the RDD-based API. As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package. I strongly suggest NOT using this repo for your learning anymore (please refer to https://spark.apache.org/docs/2.1.0/ml-guide.html instead).

This repo introduces some basic machine learning usage of PySpark. The content is quite simple, but it may help some readers, since it covers questions I ran into myself as someone used to more "conventional" ML settings (such as the R language).

For basic PySpark operations (Transformations and Actions), you may refer to another GitHub repo of mine, Spark Practice.

Some of the examples come from the official Spark examples, but I explain them in more detail.

License

Please note that this repository is under the Creative Commons Attribution-ShareAlike License [https://creativecommons.org/licenses/by-sa/3.0/].
