# RunInference


In this notebook, we walk through the use of the RunInference transform.
The transform and its accompanying [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) classes handle the following tasks:


*   Optimizing model loading from popular frameworks.
*   Batching examples in a scalable fashion.
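
As a rough picture of the batching task, the plain-Python sketch below groups a stream of examples into batches so that a model can be called once per batch. The `batch_elements` helper and the fixed batch size of 3 are assumptions made for illustration only; RunInference manages batching itself, and this is not its implementation.

```python
from typing import Iterable, Iterator, List


def batch_elements(examples: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield lists of at most batch_size examples (hypothetical helper)."""
    batch: List[float] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


batches = list(batch_elements([20.0, 40.0, 60.0, 90.0], batch_size=3))
print(batches)
```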


This notebook illustrates common RunInference patterns such as the following:
*   Generating predictions using both Pytorch and Scikit-learn.
*   Post processing results after RunInference.
*   Inference with multiple models in the same pipeline.

The linear regression models used in these samples are trained on data that corresponds to the 5 and 10 times tables; that is, `y = 5x` and `y = 10x` respectively.
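
To make the target concrete, the sketch below fits a line to 5 times table data with the closed-form least-squares formula in plain Python. This is a different technique from the gradient-based training used in this notebook; it only illustrates the slope and intercept that a well-trained model should approach. The variable names are illustrative.

```python
# 5 times table data: x in 0..99 and y = 5x.
xs = [float(i) for i in range(100)]
ys = [5.0 * x for x in xs]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Closed-form simple linear regression: slope = cov(x, y) / var(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(round(slope, 6), round(intercept, 6))
```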

**NOTE**: This notebook visualizes outputs as a Pandas DataFrame using the `interactive_beam.collect()` method.

### Dependencies

The RunInference library is available in Apache Beam version <b>2.40</b> or later.

The Pytorch module is required to use the Pytorch RunInference API. Use `pip` to install Pytorch.

In [1]:
%pip install torch

Note: you may need to restart the kernel to use updated packages.


In [None]:
import argparse
import json
import os
import torch
from typing import Tuple

import apache_beam as beam
import apache_beam.runners.interactive.interactive_beam as ib
import numpy
from apache_beam.io.gcp.bigquery import ReadFromBigQuery
from apache_beam.ml.inference.base import KeyedModelHandler
from apache_beam.ml.inference.base import PredictionResult
from apache_beam.ml.inference.base import RunInference
from apache_beam.dataframe.convert import to_pcollection
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.interactive import interactive_runner


In [None]:
# Constants
import google.auth
_, project = google.auth.default()
bucket = "gs://<your-bucket>"

save_model_dir_multiply_five = 'five_times_table_torch.pt'
save_model_dir_multiply_ten = 'ten_times_table_torch.pt'

## Create data and Pytorch models for RunInference transform

### Linear regression model in Pytorch.

In [None]:
class LinearRegression(torch.nn.Module):
    def __init__(self, input_dim=1, output_dim=1):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)  
    def forward(self, x):
        out = self.linear(x)
        return out

### Prepare train and test data to train a 5 times model.
* `x` contains values in the range from 0 to 99.
* `y` is a list of 5 * `x`. 
* `value_to_predict` includes values outside of the training data.

In [None]:
x = numpy.arange(0, 100, dtype=numpy.float32).reshape(-1, 1)
y = (x * 5).reshape(-1, 1)
value_to_predict = numpy.array([20, 40, 60, 90], dtype=numpy.float32).reshape(-1, 1)

### Train the linear regression model on 5 times data.

In [None]:
five_times_model = LinearRegression()
optimizer = torch.optim.Adam(five_times_model.parameters())
loss_fn = torch.nn.L1Loss()

# Train the five_times_model.
epochs = 10000
tensor_x = torch.from_numpy(x)
tensor_y = torch.from_numpy(y)
for epoch in range(epochs):
    y_pred = five_times_model(tensor_x)
    loss = loss_fn(y_pred, tensor_y)
    five_times_model.zero_grad()
    loss.backward()
    optimizer.step()

Save the model using `torch.save()` and verify that the saved model file exists.

In [None]:
torch.save(five_times_model.state_dict(), save_model_dir_multiply_five)
print(os.path.exists(save_model_dir_multiply_five)) # verify if the model is saved

### Prepare train and test data to train a 10 times model.
* `x` contains values in the range from 0 to 99.
* `y` is a list of 10 * `x`. 

In [None]:
x = numpy.arange(0, 100, dtype=numpy.float32).reshape(-1, 1)
y = (x * 10).reshape(-1, 1)

### Train the linear regression model on 10 times data.

In [None]:
ten_times_model = LinearRegression()
optimizer = torch.optim.Adam(ten_times_model.parameters())
loss_fn = torch.nn.L1Loss()

epochs = 10000
tensor_x = torch.from_numpy(x)
tensor_y = torch.from_numpy(y)
for epoch in range(epochs):
    y_pred = ten_times_model(tensor_x)
    loss = loss_fn(y_pred, tensor_y)
    ten_times_model.zero_grad()
    loss.backward()
    optimizer.step()

Save the model using `torch.save()` and verify that the saved model file exists.

In [None]:
torch.save(ten_times_model.state_dict(), save_model_dir_multiply_ten)
print(os.path.exists(save_model_dir_multiply_ten)) # verify if the model is saved

# Pattern 1: RunInference for predictions.

### Step 1 - Use RunInference within the pipeline.

1. Create a Pytorch model handler object by passing the required arguments, such as `state_dict_path`, `model_class`, and `model_params`, to the `PytorchModelHandlerTensor` class.
2. Pass the `PytorchModelHandlerTensor` object to the RunInference transform to perform prediction on unkeyed data.

In [None]:
torch_five_times_model_handler = PytorchModelHandlerTensor(
    state_dict_path=save_model_dir_multiply_five,
    model_class=LinearRegression,
    model_params={'input_dim': 1,
                  'output_dim': 1}
                  )
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner())

inference_result = (
    pipeline 
    | "ReadInputData" >> beam.Create(value_to_predict)
    | "ConvertNumpyToTensor" >> beam.Map(torch.Tensor)
    | "RunInferenceTorch" >> RunInference(torch_five_times_model_handler)
    )
ib.collect(inference_result)

# Pattern 2: Post-process RunInference results.
Add a `PredictionProcessor` to the pipeline after `RunInference`. `PredictionProcessor` processes the output of the `RunInference` transform.
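
The post-processing idea can be pictured without Beam. In the sketch below, `StandInPredictionResult` is a hypothetical stand-in that carries only the two fields this notebook reads from `PredictionResult` (`example` and `inference`), and `format_prediction` is an illustrative generator that mimics the shape of a `DoFn.process` method. The numbers are made-up sample values, not model output.

```python
from typing import Any, Iterator, NamedTuple


class StandInPredictionResult(NamedTuple):
    """Hypothetical stand-in with the two fields the notebook reads."""
    example: Any
    inference: Any


def format_prediction(element: StandInPredictionResult) -> Iterator[str]:
    # Like DoFn.process, yield zero or more outputs per input element.
    yield f"input is {element.example} output is {element.inference}"


results = [
    StandInPredictionResult(example=20.0, inference=100.0),
    StandInPredictionResult(example=40.0, inference=200.0),
]
formatted = [line for result in results for line in format_prediction(result)]
print(formatted)
```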

In [None]:
class PredictionProcessor(beam.DoFn):
  """
  A processor to format the output of the RunInference transform.
  """
  def process(
      self,
      element: PredictionResult):
    input_value = element.example
    output_value = element.inference
    yield (f"input is {input_value.item()} output is {output_value.item()}")

pipeline = beam.Pipeline(interactive_runner.InteractiveRunner())
inference_result = (
        pipeline
        | "ReadInputData" >> beam.Create(value_to_predict)
        | "ConvertNumpyToTensor" >> beam.Map(torch.Tensor)
        | "RunInferenceTorch" >> RunInference(torch_five_times_model_handler)
        | "PostProcessPredictions" >> beam.ParDo(PredictionProcessor())
        )
ib.collect(inference_result)

# Pattern 3: Attach a key

## Step 1 - Create a source with attached key.
* Modify the pipeline to read from sources like CSV files and BigQuery.

## Step 2 - Modify model handler and post processor.

In this step we:

* Wrap the `PytorchModelHandlerTensor` object in a `KeyedModelHandler` to handle keyed data.
* Add a map transform, which converts a table row into `Tuple[str, float]`.
* Add a map transform, which converts `Tuple[str, float]` to `Tuple[str, torch.Tensor]`.
* Modify the post inference processor to output results along with the key.
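
Conceptually, handling keyed data means setting the keys aside, running the unkeyed model on the values, and re-attaching each key to its prediction. The plain-Python sketch below shows that idea only; `run_keyed_inference` and `five_times` are hypothetical names, and this is not how `KeyedModelHandler` is implemented.

```python
from typing import Callable, List, Sequence, Tuple


def run_keyed_inference(
    keyed_batch: Sequence[Tuple[str, float]],
    predict_batch: Callable[[List[float]], List[float]],
) -> List[Tuple[str, float]]:
    """Strip keys, run the unkeyed model, then re-attach the keys in order."""
    keys = [key for key, _ in keyed_batch]
    values = [value for _, value in keyed_batch]
    predictions = predict_batch(values)
    return list(zip(keys, predictions))


def five_times(values: List[float]) -> List[float]:
    # Stand-in for a trained model; the notebook uses a Pytorch model instead.
    return [5.0 * value for value in values]


keyed_results = run_keyed_inference(
    [("first_question", 105.0), ("second_question", 108.0)], five_times)
print(keyed_results)
```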

In [None]:
class PredictionWithKeyProcessor(beam.DoFn):
    def __init__(self):
        beam.DoFn.__init__(self)

    def process(
          self,
          element: Tuple[str, PredictionResult]):
        key = element[0]
        input_value = element[1].example
        output_value = element[1].inference
        yield (f"key: {key}, input: {input_value.item()} output: {output_value.item()}" )

#### Use BigQuery as the source.

Install the Google Cloud BigQuery client library using `pip`.

In [None]:
%pip install --upgrade google-cloud-bigquery --quiet

Use the snippet below to create a BigQuery table with two columns: one holds the key and the other holds the test value. To use BigQuery, you need a Google Cloud account with the BigQuery API enabled.

In [None]:
from google.cloud import bigquery

client = bigquery.Client()

# Make sure the dataset_id is unique in your project.
dataset_id = '{project}.maths'.format(project=project)
dataset = bigquery.Dataset(dataset_id)

# Modify the location based on your project configuration.
dataset.location = 'US'
dataset = client.create_dataset(dataset, exists_ok=True, timeout=30)

# Table name in the BigQuery dataset.
table_name = 'maths_problems_2'

query = """
    CREATE OR REPLACE TABLE
      {project}.maths.{table} ( key STRING OPTIONS(description="A unique key for the maths problem"),
    value FLOAT64 OPTIONS(description="Our maths problem" ) );
    INSERT INTO maths.{table}
    VALUES
      ("first_question", 105.00),
      ("second_question", 108.00),
      ("third_question", 1000.00),
      ("fourth_question", 1013.00)
""".format(project=project, table=table_name)

create_job = client.query(query)
create_job.result()

Use `BigQuery` as the source in the pipeline to read keyed data.

In [None]:
pipeline_options = PipelineOptions().from_dictionary({'temp_location':f'{bucket}/tmp',
                                                      })
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

keyed_torch_five_times_model_handler = KeyedModelHandler(torch_five_times_model_handler)

table_spec = f'{project}:maths.{table_name}'
inference_result = (   
    pipeline
    | "ReadFromBQ" >> beam.io.ReadFromBigQuery(table=table_spec) 
    | "PreprocessData" >> beam.Map(lambda x: (x['key'], x['value']))
    | "ConvertNumpyToTensor" >> beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
    | "RunInferenceTorch" >> RunInference(keyed_torch_five_times_model_handler)
    | "PostProcessPredictions" >> beam.ParDo(PredictionWithKeyProcessor())
    )

In [None]:
ib.collect(inference_result)

### Using CSV file as the source.

Create a CSV file with two columns: one named `key` that holds the keys, and a second named `value` that holds the test values.
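
One way to produce such a file is with the standard `csv` module, as sketched below. The output location (a temporary directory) and the row values (copied from the BigQuery sample table) are assumptions for illustration; the notebook itself reads a file from `../assets/run_inference/`.

```python
import csv
import os
import tempfile

# Sample rows: (key, value). Values reuse the BigQuery sample data.
rows = [
    ("first_question", 105.00),
    ("second_question", 108.00),
    ("third_question", 1000.00),
    ("fourth_question", 1013.00),
]

# Hypothetical output path inside a fresh temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), "maths_problem.csv")

with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["key", "value"])  # header row with the two column names
    writer.writerows(rows)

# Read the file back to confirm its structure.
with open(csv_path, newline="") as f:
    loaded = list(csv.DictReader(f))
print(loaded[0])
```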

In [None]:
input_csv_file = "../assets/run_inference/maths_problem.csv"
pipeline_options = PipelineOptions().from_dictionary({'temp_location':f'{bucket}/tmp',
                                                      })
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

keyed_torch_five_times_model_handler = KeyedModelHandler(torch_five_times_model_handler)

df = pipeline | beam.dataframe.io.read_csv(input_csv_file)
pc = to_pcollection(df)

inference_result = (   
    pc
    | "ConvertNumpyToTensor" >> beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
    | "RunInferenceTorch" >> RunInference(keyed_torch_five_times_model_handler)
    | "PostProcessPredictions" >> beam.ParDo(PredictionWithKeyProcessor())
    )

In [None]:
ib.collect(inference_result)

# Pattern 4: Inference with multiple models in the same pipeline.

## Inference with multiple models in parallel. 

Create a torch model handler for the 10 times model using `PytorchModelHandlerTensor`.

In [None]:
torch_ten_times_model_handler = PytorchModelHandlerTensor(state_dict_path=save_model_dir_multiply_ten,
                                        model_class=LinearRegression,
                                        model_params={'input_dim': 1,
                                                      'output_dim': 1}
                                        )
keyed_torch_ten_times_model_handler = KeyedModelHandler(torch_ten_times_model_handler)

In this section, the same data is run through two different models: the one that we’ve been using to multiply by 5
and a second model that was trained to multiply by 10.
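
The branching idea can be sketched in plain Python: each branch tags the key with the model it uses, applies its own model, and the branch outputs are then concatenated, which is roughly what a flatten step does. The lambda-free arithmetic below stands in for the trained models, and the sample inputs are assumptions for illustration.

```python
# Sample keyed inputs (illustrative values).
inputs = [("first_question", 105.0), ("second_question", 108.0)]

# Branch 1: tag the key and apply a stand-in for the 5 times model.
times_five = [(f"{key} * 5", 5.0 * value) for key, value in inputs]

# Branch 2: tag the key and apply a stand-in for the 10 times model.
times_ten = [(f"{key} * 10", 10.0 * value) for key, value in inputs]

# Merge both branches into a single collection.
flattened = times_five + times_ten
print(flattened)
```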

In [None]:
pipeline_options = PipelineOptions().from_dictionary(
                                      {'temp_location':f'{bucket}/tmp'})

pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

read_from_bq = beam.io.ReadFromBigQuery(table=table_spec)
multiply_five = (
    pipeline 
    # Give each application of the read transform a unique label.
    | "ReadFromBQForFive" >> read_from_bq
    | "CreateMultiplyFiveTuple" >> beam.Map(lambda x: ('{} {}'.format(x['key'], '* 5'), x['value']))
    | "ConvertNumpyToTensorFiveTuple" >> beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
    # This branch uses the 5 times model handler.
    | "RunInferenceTorchFiveTuple" >> RunInference(keyed_torch_five_times_model_handler)
)
multiply_ten = (
    pipeline 
    | "ReadFromBQForTen" >> read_from_bq
    | "CreateMultiplyTenTuple" >> beam.Map(lambda x: ('{} {}'.format(x['key'], '* 10'), x['value']))
    | "ConvertNumpyToTensorTenTuple" >> beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
    # This branch uses the 10 times model handler.
    | "RunInferenceTorchTenTuple" >> RunInference(keyed_torch_ten_times_model_handler)
)

inference_result = ((multiply_five, multiply_ten) | beam.Flatten() 
                                 | beam.ParDo(PredictionWithKeyProcessor()))

In [None]:
ib.collect(inference_result)

## Inference with multiple models in sequence 

In a sequential pattern, data is sent to one or more models in sequence, 
with the output from each model chaining to the next model.

1. Read the data from BigQuery.
2. Map the data.
3. Run inference with the multiply by 5 model.
4. Process the results.
5. Run inference with the multiply by 10 model.
6. Process the results.
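
The chaining in these steps can be sketched in plain Python: the output of the first model becomes the input of the second, so an input `x` ends up as `10 * (5 * x)`. The `apply_model` helper, the lambdas standing in for trained models, and the sample inputs are all assumptions for illustration.

```python
from typing import Callable, List, Tuple

KeyedValue = Tuple[str, float]


def apply_model(
    model: Callable[[float], float], elements: List[KeyedValue]
) -> List[KeyedValue]:
    """Apply a stand-in model to each value while keeping its key."""
    return [(key, model(value)) for key, value in elements]


inputs = [("first_question", 105.0), ("second_question", 108.0)]

# First stage: stand-in for the 5 times model.
after_five = apply_model(lambda v: 5.0 * v, inputs)
# Second stage: feed the first stage's predictions to the 10 times stand-in.
after_ten = apply_model(lambda v: 10.0 * v, after_five)
print(after_ten)
```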


In [None]:
def process_interim_inference(element):
    key, prediction_result = element
    input_value = prediction_result.example
    inference = prediction_result.inference
    formatted_input_value = 'original input is `{} {}`'.format(key, input_value)
    return formatted_input_value, inference


pipeline_options = PipelineOptions().from_dictionary(
                                      {'temp_location':f'{bucket}/tmp'})
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

multiply_five = (
    pipeline 
    | beam.io.ReadFromBigQuery(table=table_spec) 
    | "CreateMultiplyFiveTuple" >> beam.Map(lambda x: (x['key'], x['value']))
    | "ConvertNumpyToTensorFiveTuple" >> beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
    | "RunInferenceTorchFiveTuple" >> RunInference(keyed_torch_five_times_model_handler)
)

inference_result = (
  multiply_five 
    | "ExtractResult" >> beam.Map(process_interim_inference) 
    | "RunInferenceTorchTenTuple" >> RunInference(keyed_torch_ten_times_model_handler)
    | beam.ParDo(PredictionWithKeyProcessor())
  )

In [None]:
ib.collect(inference_result)

# Sklearn implementation of RunInference API.

Here, we showcase the Sklearn implementation of the RunInference API with the unkeyed data and the keyed data.

Sklearn is a build-dependency of Apache Beam. If you need a different version of sklearn, install it with `%pip install scikit-learn==<version>`.

In [None]:
import pickle
from sklearn import linear_model

import numpy as np
from apache_beam.ml.inference.sklearn_inference import ModelFileType
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

## Create the data and the Sklearn model.
In this cell, we do the following:
1. Create the data to train the Sklearn linear regression model.
2. Train the linear regression model.
3. Save the Sklearn model using `pickle`.

In [None]:
# Input data to train the sklearn model.
x = numpy.arange(0, 100, dtype=numpy.float32).reshape(-1, 1)
y = (x * 5).reshape(-1, 1)

regression = linear_model.LinearRegression()
regression.fit(x,y)

sklearn_model_filename = 'sklearn_5x_model.pkl'
with open(sklearn_model_filename, 'wb') as f:
    pickle.dump(regression, f)

### Scikit-learn RunInference pipeline.

1. Define the Sklearn model handler that accepts an array-like object as input.
2. Read the data from BigQuery.
3. Use the Sklearn trained model and the Sklearn RunInference transform on unkeyed data.

In [None]:
# SklearnModelHandlerNumpy accepts only unkeyed examples.
sklearn_model_handler = SklearnModelHandlerNumpy(model_uri=sklearn_model_filename,
                                                 model_file_type=ModelFileType.PICKLE) # Use ModelFileType.JOBLIB if the model is serialized using joblib.


pipeline_options = PipelineOptions().from_dictionary(
                                      {'temp_location':f'{bucket}/tmp'})
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

inference_result = (
    pipeline 
    | "ReadFromBQ" >> beam.io.ReadFromBigQuery(table=table_spec)
    | "ExtractInputs" >> beam.Map(lambda x: [x['value']]) 
    | "RunInferenceSklearn" >> RunInference(model_handler=sklearn_model_handler)
)

In [None]:
ib.collect(inference_result)

### Sklearn RunInference on keyed inputs.
1. Wrap the `SklearnModelHandlerNumpy` object in a `KeyedModelHandler` to handle keyed data.
2. Read the data from BigQuery.
3. Use the Sklearn trained model and the Sklearn RunInference transform on keyed data.

In [None]:
sklearn_model_handler = SklearnModelHandlerNumpy(model_uri=sklearn_model_filename,
                                                 model_file_type=ModelFileType.PICKLE) # Use ModelFileType.JOBLIB if the model is serialized using joblib.

keyed_sklearn_model_handler = KeyedModelHandler(sklearn_model_handler)

pipeline_options = PipelineOptions().from_dictionary(
                                      {'temp_location':f'{bucket}/tmp'})
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

inference_result = (
  pipeline 
  | "ReadFromBQ" >> beam.io.ReadFromBigQuery(table=table_spec)
  | "ExtractInputs" >> beam.Map(lambda x: (x['key'], [x['value']])) 
  | RunInference(model_handler=keyed_sklearn_model_handler)
  )

In [None]:
ib.collect(inference_result)

# Cross framework transforms in a single pipeline

In this pipeline, RunInference transforms of different frameworks are used in a single pipeline sequentially. 

In the cells below, we perform the following actions:
1. Create `KeyedModelHandler` for Sklearn and Pytorch. 
2. Run inference on Sklearn and perform intermediate processing using `process_interim_inference`.
3. Take the intermediate result from Sklearn RunInference transform and run that through Pytorch RunInference transform.

In [None]:
def process_interim_inference(element: Tuple[str, PredictionResult]):
    """
    Returns the key and the prediction to the next RunInference transform.
    """
    key, prediction_result = element
    prediction = prediction_result.inference
    return key, prediction

class PredictionProcessor(beam.DoFn):
    def process(self, element: Tuple[str, PredictionResult]):
        key, prediction_result = element
        input_from_upstream = prediction_result.example
        prediction = prediction_result.inference
        yield (key, prediction.item())

In [None]:
pipeline_options = PipelineOptions().from_dictionary(
                                      {'temp_location':f'{bucket}/tmp'})
pipeline = beam.Pipeline(interactive_runner.InteractiveRunner(), options=pipeline_options)

read_from_bq = beam.io.ReadFromBigQuery(table=table_spec)
keyed_inputs = "ExtractKeyedInputs" >> beam.Map(lambda x: (x['key'], [x['value']]))

keyed_sklearn_model_handler = KeyedModelHandler(SklearnModelHandlerNumpy(
    model_uri=sklearn_model_filename,
    model_file_type=ModelFileType.PICKLE))

# Handler for the 5 times Pytorch model, used in the second stage below.
keyed_torch_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(
    state_dict_path=save_model_dir_multiply_five,
    model_class=LinearRegression,
    model_params={'input_dim': 1,
                  'output_dim': 1}))


sklearn_inference_result = (
    pipeline
    | read_from_bq
    | keyed_inputs
    | "RunInferenceSklearn" >> RunInference(model_handler=keyed_sklearn_model_handler)
    | "ExtractOutputs" >> beam.Map(process_interim_inference)
)

torch_inference_result = (
    sklearn_inference_result
    | "ConvertNumpyToTensorFiveTuple" >> beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
    | "RunInferenceTorchFiveTuple" >> RunInference(keyed_torch_model_handler)
    | "ProcessPredictions" >> beam.ParDo(PredictionProcessor())
)


Inspect Sklearn pipeline result

In [None]:
ib.collect(sklearn_inference_result)

Inspect Pytorch pipeline result

In [None]:
ib.collect(torch_inference_result)