This repository creates a production-ready docker image that uses R and the keras and plumber R packages to create a neural network powered REST API. The package keras provides the ability to create neural networks, while plumber allows it to run R as a web service. The docker container is designed to be:
- R first - designed so that R users can comfortably create their own neural network powered services without having to learn other languages.
- Production ready - as small of an image size as possible while still being maintainable and understandable. At our last measure, it was 1.86gb, with around 800mb from Python and the keras backend and 900mb from R and R libraries.
- TensorFlow compatible - the container works with models created using the R keras package without any additional Python configuration. Note: this uses the cpu version of Tensorflow since it is designed for running models (not training them).
- HTTPS enabled - unfortunately, the R plumber library does not support SSL encrypted traffic. Since encryption is likely required for enterprise use, we have an optional dockerfile which uses an Apache 2 server. The server redirects HTTPS traffic as HTTP to plumber, and then take the plumber response and re-encrypts it back to HTTPS.
Check out our blog post on the T-Mobile open source website.
How to use this repository
To use this repository, first download and install docker. Once docker is installed and the repository is downloaded, go to the repository folder and open a terminal, then run the following commands:
docker build -t r-tensorflow-api . docker run --rm -p 80:80 r-tensorflow-api
The first command builds the docker image (which may take up to 30 minutes to compile the R libraries), and the second runs the container.
If the image successfully built, you should be able to go to http://127.0.0.1/name in your browser and see a random pet name generated from a neural network. The names are based on pet licenses from the City of Seattle. You can also select if you want a cat or dog name, and the first few characters of the pet name by editing the query string of the url:
Creating your own API
To make your own API, you only need to modify the R code in the
src folder. The code in there is split into several files:
main.R[required] - this is what gets run by the API, and only contains code to start plumber. You should not need to modify this.
rest_controller.R[required] - this file specifies the endpoints for your web service. You should modify this heavily as you create your own endpoints.
setup.R[required] - this script is run when building the image to install the R libraries. If any libraries fail to install, the build aborts. You should modify this to include the libraries you need. If you want to install packages from GitHub, you can install devtools and then call it from within this script.
runtime_functions.R- this file contains any additional functions that are needed when you run the service. It's a good place to store functions to keep the
parameters.R- this file contains parameters needed for the model. It's a good place to put values that are need for both building the model and during runtime.
train_model.R- this script trains the model and saves it. It is not run as part of the container, it's only here as a reference. The repository already includes the output of the script (the model as an hdf5 object), but if you wanted to rebuild them you only need to run this file.
The dockerfile does the following steps:
- Load from the
rocker/r-verdocker image - This image was chosen as a balance of ease of maintenance (compared to a pure debian or alpine base) and size of the image (compared to
rocker/tidyversewhich also includes RStudio). Unfortunately this means many R packages have to be manually installed. If you are finding the images builds too slowly for your tastes, switch to using
rocker/tidyverseas the base image.
- Install the necessary linux libraries. These libraries should be sufficient for most tidyverse libraries, however if you need more check out the libraries loaded by the
- Install miniconda and the appropriate Python libraries for keras. This image uses a Miniconda version based on Python 3.6. Miniconda Python was chosen instead of Anaconda Python to decrease the size of the image by 3gb. An environmental variable is also set so R knows to use the correct Python version.
- Install the necessary R packages. Any package that you would want to install to run your service (that you'd normally use
install.packages()for) you need to list in the
setup.Rfile. Instead of using an R script to install the libraries, you could do it directly from within the dockerfile using
install2.r. We chose to split this out so that data scientists have fewer reasons to touch the dockerfile.
- Copy the R code over.
- At runtime, use Rscript to start the
main.Rscript, which has the plumber web server code.
To run the HTTPS version, use these commands:
docker build -f Dockerfile.https -t r-tensorflow-api-https . docker run --rm -p 443:443 r-tensorflow-api-https
If you try to test it by going to https://127.0.0.1/name you'll get a warning that the certificates are invalid. You'll want to replace the https/server.cert and https/server.key file with your valid certificates before deploying.
The dockerfile has several differences from the HTTP one:
- It sets up an Apache 2 server to reroute HTTPS traffic (port 443) to HTTP traffic (port 80) which gets received by plumber. This includes transferring a config file (
000-default.conf) and setting Apache 2 to only listen to port 443 and not port 80.
- It transfers the certificate and key file to use for the HTTPS encryption.
- It has a
run-r-and-redirect.shscript which is executed when the image is run. This script first starts the apache2 server then runs the
main.RR script. Thus, the docker container has two programs running simultaneously: R and Apache 2.
- The Rocker project for maintaining the R docker images these build from.
- The plumber maintainers for creating a way to use R as a web service.
- The RStudio developers for creating the keras interface to python keras.
- Rbloggers author gluc, for writing the blog post that informed our solution for https redirecting.
- The City of Seattle for making the pet license data available for public use.
Terms and conditions
From the City of Seattle on the pet license data:
The data made available here has been modified for use from its original source, which is the City of Seattle. Neither the City of Seattle nor the Office of the Chief Technology Officer (OCTO) makes any claims as to the completeness, timeliness, accuracy or content of any data contained in this application; makes any representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a particular use; nor are any such warranties to be implied or inferred with respect to the information or data furnished herein. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the web feed is being used at one's own risk.
From RStudio for using some of the R Keras example code as a framework for our model:
the keras library is copyright 2017: RStudio, Inc; Google, Inc; François Chollet; Yuan Tang