AutoML template for PredictionIO

PredictionIO AutoML Engine Template

This is an Apache PredictionIO engine template that offers AutoML capability using TransmogrifAI.

You can launch a prediction WebAPI service without any coding.

Prerequisites

  • Apache PredictionIO 0.14.0

  • Apache Spark 2.3.2

  • Java 1.8

  • TransmogrifAI 0.6.0

  • Scala 2.11.12

  • Make sure you compile PredictionIO with the correct Scala and Spark versions (check out detailed instructions):

    $ ./make-distribution.sh -Dscala.version=2.11.12 -Dspark.version=2.3.2

NOTE: If the compilation fails due to cache problems, you may want to manually remove the ~/.ivy2 folder and try again.

Run Titanic example

Create an application.

$ pio app new MyAutoMLApp1
[INFO] [App$] Initialized Event Store for this app ID: 4.
[INFO] [Pio$] Created a new app:
[INFO] [Pio$]       Name: MyAutoMLApp1
[INFO] [Pio$]         ID: 1
[INFO] [Pio$] Access Key: xxxxxxxxxxxxxxxx

Set the access key as an environment variable.

$ export ACCESS_KEY=xxxxxxxxxxxxxxxx

Run the event server.

$ pio eventserver &

Import data to the event server.

$ python ./data/import_titanic.py --file ./data/titanic.csv --access_key $ACCESS_KEY
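
For orientation, the import step amounts to turning each CSV row into an event and posting it to the event server. The sketch below is not the repository's import_titanic.py. It assumes the standard PredictionIO Event API (POST to /events.json on port 7070 with an accessKey query parameter); the "$set" event, the "passenger" entity type, and all function names are illustrative assumptions.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

EVENT_SERVER_URL = "http://localhost:7070"  # default event server port


def row_to_event(row, entity_id):
    """Turn one CSV row (a dict) into an Event API payload.

    The "$set" event and "passenger" entity type are assumptions made for
    illustration; the template's own import script may use other names.
    Values are left as strings here; a real script may convert types.
    """
    # Drop empty cells so that missing values stay absent from the event.
    properties = {k: v for k, v in row.items() if v != ""}
    return {
        "event": "$set",
        "entityType": "passenger",
        "entityId": str(entity_id),
        "properties": properties,
    }


def build_request(event, access_key, base_url=EVENT_SERVER_URL):
    """Build (but do not send) the HTTP request for a single event."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        base_url + "/events.json?" + query,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def events_from_csv(text):
    """Yield one event per CSV data row, numbering entities from 1."""
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        yield row_to_event(row, i)
```

To actually send an event, pass the result of build_request to urllib.request.urlopen while the event server is running.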

Build the app.

$ pio build --verbose

Train a model. It can take a long time to find the best model.

$ pio train

Deploy the trained model as a Web API.

$ pio deploy

Test the Web API.

$ curl -H "Content-Type: application/json" -d '{ "pClass": "2", "name": "Wheadon, Mr. Edward H", "sex": "male", "age": 66, "sibSp": 0, "parCh": 0, "ticket": "C.A 24579", "fare": 10.5, "cabin": "", "embarked": "S" }' http://localhost:8000/queries.json
{"survived":0.0}

$ curl -H "Content-Type: application/json" -d '{ "pClass": "2", "name": "Nicola-Yarred, Miss. Jamila", "sex": "female", "age": 14, "sibSp": 1, "parCh": 0, "ticket": "2651", "fare": 11.2417, "cabin": "", "embarked": "C" }' http://localhost:8000/queries.json
{"survived":1.0}

Customize

To customize this template, you only need to modify the algorithm parameters in engine.json.

"algorithms": [
  {
    "name": "algo",
    "params": {
      "target" : "survived",
      "schema" : [
        {
          "field": "survived",
          "type": "double",
          "nullable": false
        },
        {
          "field": "pClass",
          "type": "string",
          "nullable": true
        },
        ...
      ]
    }
  }
]

Define the schema according to your data, and set target to the field that the prediction Web API should return in its response. Note that, for now, the target field must be of type double.
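
Before training, it can help to sanity-check the parameters against the two constraints stated here: the target must appear in the schema, and its type must be double. The helper below is a hypothetical sketch, not part of the template; it only inspects the "target" and "schema" keys shown in the engine.json excerpt.

```python
import json


def check_algorithm_params(params):
    """Return a list of problems found in one algorithm's "params" block."""
    problems = []
    # Index the schema entries by field name for quick lookup.
    fields = {entry["field"]: entry for entry in params.get("schema", [])}
    target = params.get("target")
    if target not in fields:
        problems.append("target %r is not listed in schema" % (target,))
    elif fields[target]["type"] != "double":
        problems.append("target %r must have type double" % (target,))
    return problems


# Only the two schema entries shown in the README excerpt are included here.
example_params = json.loads("""
{
  "target": "survived",
  "schema": [
    {"field": "survived", "type": "double", "nullable": false},
    {"field": "pClass", "type": "string", "nullable": true}
  ]
}
""")

problems = check_algorithm_params(example_params)
```

An empty list means both constraints hold; to check a real file, load engine.json with json.load and pass each entry's "params" from "algorithms" to the function.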
