In development: one-stop data mining tools for Python Spark

Treers/spark-learn

spark-learn

spark-learn is a data mining library built on top of PySpark, the Python API for Apache Spark.

LICENSE

MIT. See the LICENSE file.

Environment Configuration

  • Apache Spark - Installation

  • How to add the remote PySpark module as an external library in a local PyCharm installation

    # Example: check that PySpark can be imported in the configured environment
    
    import os
    import sys
    
    # Point Python at the Spark installation before importing pyspark.
    os.environ['SPARK_HOME'] = '/usr/local/spark'
    sys.path.append('/usr/local/spark/python')
    
    try:
        from pyspark import SparkContext
        from pyspark import SparkConf
    
        print("Successfully imported Spark Modules")
    
    except ImportError as e:
        # Exit with a non-zero status so the failure is visible to the caller.
        print("Cannot import Spark Modules:", e)
        sys.exit(1)
    
    
    # After running the script, the output is as follows:
    
    ssh://hadoop@106.12.30.59:22/usr/bin/python3 -u /tmp/pycharm_project_192/test.py
    Successfully imported Spark Modules
    
    Process finished with exit code 0
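
The path setup in the example above can be wrapped in a small helper, which is handy when several scripts share the same remote interpreter. The sketch below is not part of spark-learn: `spark_python_paths` and `configure_spark` are hypothetical names. It assumes the usual Spark distribution layout, where the Python sources live in `$SPARK_HOME/python` and a bundled Py4J archive sits in `$SPARK_HOME/python/lib`. If Py4J is already installed in the interpreter (for example via pip), the extra archive entry is harmless.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the sys.path entries PySpark typically needs (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: Spark bundles Py4J as python/lib/py4j-*-src.zip.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def configure_spark(spark_home="/usr/local/spark"):
    """Set SPARK_HOME and extend sys.path, skipping entries already present."""
    os.environ["SPARK_HOME"] = spark_home
    added = []
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added
```

Calling `configure_spark()` once at the top of a script, before `from pyspark import SparkContext`, would replace the two hard-coded lines in the example. The function returns the entries it added, so a second call returns an empty list.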
    
