# Comparing execution times: `SurrealML` vs `ONNX` vs `PyTorch` 

## Table of contents

1.  <a href="#1">General dependencies and helpers</a>
2.  <a href="#2">Some words about SurrealML</a>
3.  <a href="#3">Problem refinement</a>
4.  <a href="#4">A toy neural network</a>
5.  <a href="#5">Loading the model to SurrealDB</a>
6.  <a href="#6">Generating and uploading fake data to SurrealDB</a>

<h2 id="1"></h2>

## General dependencies and helpers

We start by importing the tools we will use for timing and for working with SurrealDB/SurrealML...

In [52]:
import os
import subprocess
import time
from functools import wraps

import torch
import torch.nn as nn

from surrealml import SurMlFile, Engine
from helpers import remove_existing_container_if_exists

from IPython.core.magic import register_cell_magic

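def chronometer(foo):
    """Decorate `foo` so that calling it returns the elapsed time in seconds."""

    @wraps(foo)
    def wrapper(*args, **kwargs):
        # perf_counter is a monotonic, high-resolution clock meant for measuring intervals
        start = time.perf_counter()
        foo(*args, **kwargs)  # the wrapped function's return value is discarded
        return time.perf_counter() - start

    return wrapper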

@register_cell_magic
def skip(line, cell):
    return
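
To make the behaviour of `chronometer` concrete, here is a minimal, self-contained sketch. It repeats the decorator so that it runs on its own, and it uses `time.perf_counter()`, a monotonic clock suited to measuring intervals. The function `busy_wait` is a hypothetical workload introduced only for illustration.

```python
import time
from functools import wraps


def chronometer(foo):
    @wraps(foo)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        foo(*args, **kwargs)  # the return value of foo is discarded
        return time.perf_counter() - start

    return wrapper


@chronometer
def busy_wait(seconds):
    # hypothetical workload, only here to have something to time
    time.sleep(seconds)
    return "this value is dropped by the wrapper"


# the decorated function now returns the elapsed time in seconds, not the string
elapsed = busy_wait(0.05)
print(f"busy_wait took {elapsed:.4f} seconds")
```

Note that the decorated function no longer returns its own result, so anything that has to be inspected (for example a query result) must be checked inside the timed function itself.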

<h2 id="2"></h2>

## Some words about SurrealML

According to the [official docs](https://surrealdb.com/docs/surrealml):
    
```text
SurrealML is an engine that seeks to do one thing, and one thing well: store and execute trained ML models. SurrealML does not intrude on the training frameworks that are already out there, instead works with them to ease the storage, loading, and execution of models. Someone using SurrealML will be able to train their model in a chosen framework in Python, save their model, and load and execute the model in either Python or Rust.
```

In short, the workflow is to develop and train models with `PyTorch`, `scikit-learn`, `TensorFlow`, or `linfa`, and then load them into SurrealDB.

Inside SurrealDB, a model is represented in the [.surml format](https://surrealdb.com/docs/surrealml/storage#the-anatomy-of-a-surml-file). Schematically, from top to bottom, a .surml file looks roughly like this:

`.surml file` =  `4-byte integer` + `metadata [variable size, given by the 4-byte integer]` + `model parameters [ONNX format]`

A .surml file is loaded by first reading the 4-byte integer, which gives the length of the model metadata. Once the metadata has been loaded, the loader assumes that the rest of the file is an ONNX protobuf and parses it accordingly.
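
The loading procedure described above can be sketched in a few lines of Python. This is only an illustration of the layout, not the actual SurrealML loader: the function `split_surml` is hypothetical, the big-endian unsigned encoding of the length prefix is an assumption, and the metadata and ONNX payloads are placeholder bytes.

```python
import struct
from typing import Tuple

LENGTH_PREFIX_SIZE = 4  # the leading 4-byte integer


def split_surml(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a .surml-like blob into (metadata bytes, ONNX bytes)."""
    if len(blob) < LENGTH_PREFIX_SIZE:
        raise ValueError("blob is too short to contain the length prefix")
    # ASSUMPTION: the prefix is a big-endian unsigned 32-bit integer
    (metadata_length,) = struct.unpack(">I", blob[:LENGTH_PREFIX_SIZE])
    metadata_end = LENGTH_PREFIX_SIZE + metadata_length
    if metadata_end > len(blob):
        raise ValueError("declared metadata length exceeds the blob size")
    # everything after the metadata is treated as the ONNX protobuf
    return blob[LENGTH_PREFIX_SIZE:metadata_end], blob[metadata_end:]


# build a synthetic blob with placeholder payloads and split it again
fake_metadata = b"placeholder-metadata"
fake_onnx = b"placeholder-onnx-bytes"
blob = struct.pack(">I", len(fake_metadata)) + fake_metadata + fake_onnx

metadata, onnx_bytes = split_surml(blob)
print(len(metadata), len(onnx_bytes))
```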

At the time of writing, in the [source code](https://github.com/surrealdb/surrealml/blob/main/surrealml/engine/__init__.py) of the `Engine` enum, we have the following docstring:

_Attributes_:
- **PYTORCH**: The PyTorch engine which will be PyTorch and ONNX.
- **NATIVE**: The native engine which will be native Rust and Linfa.
- **SKLEARN**: The scikit-learn engine which will be scikit-learn and ONNX.
- **TENSORFLOW**: The TensorFlow engine which will be TensorFlow and ONNX.
- **ONNX**: The ONNX engine which bypasses the conversion to ONNX.

Thus, we may infer that, for the purpose of comparing `SurrealML`, `ONNX`, and `PyTorch` on the same model, `Engine.PYTORCH`, `Engine.SKLEARN`, and `Engine.TENSORFLOW` should behave equivalently: irrespective of the framework used, the model is first exported to ONNX.

<a id="3"></a>

## Problem refinement

We single out three cases that may be encountered in practice, namely:

1. **Execute with SurrealML[inside SurrealDB] && fetch data from SurrealDB [optional]**: predicting with the model in .surml format _inside_ the SurrealDB, and then _optionally_ fetching the prediction from SurrealDB.
2. **Fetch data from SurrealDB && execute with PyTorch**: fetching the data from SurrealDB and _externally_ predicting with the PyTorch model.
3. **Fetch data from SurrealDB && execute with ONNX runtime**: fetching the data from SurrealDB and _externally_ predicting with the ONNX model.

Given the three scenarios above, one may deduce the following benefits of using SurrealML:

- **Reduced Database Transactions**
  - No need to fetch data from SurrealDB if predictions are not consumed immediately.
  - Eliminates at least **2 database transactions** when fetching the input data and inserting the computed predictions are no longer needed.

- **Improved Security**
  - Operates on the prediction inputs and on the computed predictions without retrieving either from the database, which enhances security.


However, one may be curious about the `performance` of SurrealML, so we will provide an implementation of an experiment to measure just this.

<h2 id="4"></h2>

## A toy neural network

In the following, we define `ToyNet`, which is a two-layer feedforward neural network with ReLU activation. It consists of an input layer with 10 features, a hidden layer of 5 neurons (`fc1`), and an output layer of 1 neuron (`fc2`).
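
As a quick sanity check on the sizes just described, the number of trainable parameters can be worked out by hand: a fully connected layer with `n_in` inputs and `n_out` outputs holds `n_in * n_out` weights plus `n_out` biases. The helper `linear_param_count` below is a hypothetical name used only for this arithmetic.

```python
def linear_param_count(n_in: int, n_out: int) -> int:
    # weights (n_in * n_out) plus one bias per output neuron
    return n_in * n_out + n_out


fc1_params = linear_param_count(10, 5)  # 10 * 5 + 5 = 55
fc2_params = linear_param_count(5, 1)   # 5 * 1 + 1 = 6
total_params = fc1_params + fc2_params  # 55 + 6 = 61

print(f"fc1: {fc1_params}, fc2: {fc2_params}, total: {total_params}")
```

With only 61 parameters, the forward pass itself is very cheap, so it is reasonable to expect the timings to be influenced heavily by data transfer and query overhead rather than by the arithmetic of the network.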

In [53]:
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

    def __str__(self) -> str:
        return self.__class__.__name__

Next, we instantiate the model and load a persisted copy of its randomly initialized parameters, saved during a previous run:

In [54]:
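model = ToyNet()
# save the randomly initialized parameters only on the first run, so that later runs reuse them
if not os.path.exists("./params.pth"):
    torch.save(model.state_dict(), "./params.pth")
model.load_state_dict(torch.load("./params.pth"))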

<All keys matched successfully>

<h2 id="5"></h2>

## Loading the model to SurrealDB

As we know from the `Engine` docstring, under the hood SurrealML converts any PyTorch/scikit-learn/TensorFlow model to the ONNX format. Since the exported model will only be used for prediction, we first switch it to evaluation mode:

In [55]:
model.eval()

ToyNet(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (fc2): Linear(in_features=5, out_features=1, bias=True)
)

The `SurMlFile` object comes in handy to save our model in the `.surml` format. As our model was developed using PyTorch, we select the `Engine.PYTORCH` option:

In [56]:
# Owing to the fact that the SURML format builds on the ONNX format, we have to specify an example input
example_input = torch.rand(1, 10)
surml_file = SurMlFile(
    model=model, name=str(model), inputs=example_input, engine=Engine.PYTORCH
)

# we also choose a local path where to save the model, as well as a version of the model
path_surml = "./model.surml"
surml_file.add_version("0.0.1")
surml_file.save(path_surml)



It is now time to start a SurrealDB instance, reachable on port `8000` of `localhost`. Since SurrealDB organizes data in two tiers, we have to specify a namespace, which we name `comparison_test`, and a database, for which we choose `surrealml_vs_onnx_vs_pytorch`.

In [57]:
URL = "http://0.0.0.0:8000"
NS = "comparison_test"
DB = "surrealml_vs_onnx_vs_pytorch"

__NOTE__: From here on, readers who wish to follow along need `Docker Engine` installed, as well as SurrealDB __v1.5.4 or v1.5.5__.

Next, we use the SurrealDB CLI programmatically to upload the model to SurrealDB:

In [58]:
local_model_directory = os.getcwd()
model_file = "model.surml"
container_name = "surrealdb"

remove_existing_container_if_exists(container_name)
print("*" * 200, "\n" * 2, "*" * 200)

# starting the Docker container
docker_run_command = [
    "docker",
    "run",
    "--name", 
    container_name,
    "--user",
    "root",
    "--rm",
    "--pull", "always",
    "-p", "8000:8000",
    "-v", f"{local_model_directory}:/models", #we mount a volume, as the CLI tools are inside the container
    "-e", "RUST_BACKTRACE=full", #include Rust backtrace for debugging
    "surrealdb/surrealdb:v1.5.5", #we go for SurrealDB 1.5.5
    "start",
]

container_process = subprocess.Popen(docker_run_command)
time.sleep(4)
print("*" * 200, "\n" * 2, "*" * 200)

# now we use surreal ml to import the model
try:
    command_inside_container = [
        "docker",
        "exec",
        container_name,
        "/surreal",
        "ml",
        "import",
        "--endpoint",
        URL,  # reuse the constants defined above instead of repeating the literals
        "--ns",
        NS,
        "--db",
        DB,
        "/models/" + model_file,
    ]
    output = subprocess.check_output(command_inside_container, stderr=subprocess.STDOUT)
    print(output.decode("utf-8"))
except subprocess.CalledProcessError as e:
    print(f"Command '{e.cmd}' failed with return code {e.returncode}.")
    print("Output:\n", e.output.decode("utf-8"))

print("*" * 200, "\n" * 2, "*" * 200)

2024-10-15T23:26:26.907632Z  INFO surrealdb::net: SIGTERM received. Waiting for graceful shutdown... A second signal will force an immediate shutdown
2024-10-15T23:26:26.907760Z  INFO surrealdb::net: Web server stopped. Bye!
2024-10-15T23:26:26.907794Z ERROR surreal::cli::start: Failed to send shutdown signal to task: sending on a closed channel


surrealdb
No existing container named 'surrealdb' found or could not be removed.
******************************************************************************************************************************************************************************************************** 

 ********************************************************************************************************************************************************************************************************


Error response from daemon: removal of container surrealdb is already in progress
v1.5.5: Pulling from surrealdb/surrealdb
Digest: sha256:b52fcd0b7ce8802fe376e220508a2cbcc082c4f9443e81775f7111437273d4ad
Status: Image is up to date for surrealdb/surrealdb:v1.5.5



 .d8888b.                                             888 8888888b.  888888b.
d88P  Y88b                                            888 888  'Y88b 888  '88b
Y88b.                                                 888 888    888 888  .88P
 'Y888b.   888  888 888d888 888d888  .d88b.   8888b.  888 888    888 8888888K.
    'Y88b. 888  888 888P'   888P'   d8P  Y8b     '88b 888 888    888 888  'Y88b
      '888 888  888 888     888     88888888 .d888888 888 888    888 888    888
Y88b  d88P Y88b 888 888     888     Y8b.     888  888 888 888  .d88P 888   d88P
 'Y8888P'   'Y88888 888     888      'Y8888  'Y888888 888 8888888P'  8888888P'




2024-10-15T23:26:29.205887Z  INFO surreal::env: Running 1.5.5 for linux on x86_64
2024-10-15T23:26:29.205908Z  WARN surreal::dbs: ❌🔒 IMPORTANT: Authentication is disabled. This is not recommended for production use. 🔒❌
2024-10-15T23:26:29.205922Z  INFO surrealdb_core::kvs::ds: Starting kvs store in memory
2024-10-15T23:26:29.205925Z  INFO surrealdb_core::kvs::ds: Started kvs store in memory
2024-10-15T23:26:29.206221Z  INFO surrealdb::net: Started web server on 0.0.0.0:8000


******************************************************************************************************************************************************************************************************** 

 ********************************************************************************************************************************************************************************************************
2024-10-15T23:26:31.613906Z  INFO surreal::cli::ml::import: The SurrealML file was imported successfully

******************************************************************************************************************************************************************************************************** 

 ********************************************************************************************************************************************************************************************************
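
The cell above waits a fixed `time.sleep(4)` for the server to come up, which may be too short on a slow machine and longer than necessary on a fast one. A possible alternative, sketched below with the Python standard library only, is to poll the TCP port until it accepts connections. The helper `wait_for_port` is a hypothetical name, and the demonstration uses a throwaway local listener so that the sketch runs without Docker.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Return True as soon as host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # nothing is listening yet (or the attempt timed out): wait and retry
            time.sleep(interval)
    return False


# demonstration against a throwaway local listener on an OS-chosen free port
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

port_was_open = wait_for_port("127.0.0.1", demo_port, timeout=2.0)
listener.close()
print(f"port {demo_port} reachable: {port_was_open}")
```

In this notebook, a call such as `wait_for_port("127.0.0.1", 8000)` could stand in for the fixed sleep before running `surreal ml import`. An open port only shows that the server accepts connections, so this remains a heuristic rather than a full health check.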


According to the last log line, produced by running the `surreal ml` command inside the `surrealdb` Docker container, the .surml file should now be in SurrealDB. By running the [surreal sql](https://surrealdb.com/docs/surrealdb/cli/sql#sql-command) command inside the same container, I can see that the model was uploaded correctly:

![image.png](attachment:image.png)

According to the documentation, a `SurMlFile` instance has an equivalent `.upload()` method; however, I could not figure out what I did wrong...

In [68]:
%%capture
"""
###################################################################################################
--> The other way to upload a model to SurrealDB. Could not make it work by the time of writing :( 
###################################################################################################

surml_file.upload(
    path="./model.surml",
    url=URL,
    chunk_size=36864,
    namespace=NS,
    database=DB,
    username=USR,
    password=PASS)
"""

<h2 id="6"></h2>

## Generating and uploading fake data to SurrealDB

In [69]:
"""
The code that was used to generate the test data, and load it in SurrealDB
"""

from surrealist import Surreal

surreal = Surreal(
    url=URL,
    namespace=NS,
    database=DB,
    credentials=CRD,
    log_level="ERROR",
    timeout=10**4,
)

max_test_size = 10**4
# the test inputs are created in memory chunk by chunk to avoid running out of memory;
# chunk_size should divide max_test_size
chunk_size = 10**2
number_chunks = max_test_size // chunk_size

# likewise, test_step should divide max_test_size
test_step = 10**3
number_steps = max_test_size // test_step

with surreal.connect() as connect:
    for _ in range(number_chunks):
        test_inputs = torch.rand(chunk_size, 10).tolist()
        # one CREATE statement per input vector, each with a generated ULID as record id
        for test_input in test_inputs:
            connect.query(f"CREATE inputs:ulid() SET value = {test_input};")

TypeError: Surreal.__init__() got an unexpected keyword argument 'log_level'
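
The chunking above relies on `chunk_size` dividing `max_test_size` exactly; otherwise the integer conversion silently drops the remainder and fewer rows than intended are created. A small sketch of the arithmetic, using a hypothetical helper `chunk_sizes` that also covers the non-divisible case:

```python
from typing import List


def chunk_sizes(total: int, chunk_size: int) -> List[int]:
    """Sizes of the consecutive chunks needed to cover `total` items."""
    if total < 0 or chunk_size <= 0:
        raise ValueError("total must be non-negative and chunk_size positive")
    full_chunks, remainder = divmod(total, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        # a final, smaller chunk picks up whatever is left over
        sizes.append(remainder)
    return sizes


# the values used in this notebook: 10**4 inputs in chunks of 10**2
sizes = chunk_sizes(10**4, 10**2)
print(len(sizes), sum(sizes))

# a non-divisible case, for comparison
uneven = chunk_sizes(250, 100)
print(uneven)
```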

In [None]:
surreal_times = []
try:
    with surreal.connect() as connect:
        for increment in range(number_steps):
            test_size = (increment + 1) * test_step

            # in a prior run, the query result was tested with assert query_result["status"] == "OK"
            # also do print(query_result) to check further
            # the model was saved under name=str(model), i.e. "ToyNet", with version 0.0.1
            @chronometer
            def evaluate_with_surrealdb():
                _ = connect.query(
                    f"SELECT VALUE ml::ToyNet<0.0.1>(value) FROM inputs LIMIT {test_size};"
                ).to_dict()["result"]

            elapsed_time = evaluate_with_surrealdb()
            print(f"For {test_size} datapoints, it took {elapsed_time} seconds")
            surreal_times.append(elapsed_time)
except Exception as e:
    print(e)

In [None]:
pytorch_times = []
try:
    with surreal.connect() as connect:
        for increment in range(number_steps):
            test_size = (increment + 1) * test_step

            @chronometer
            def evaluate_pytorch():
                inputs = connect.query(
                    f"SELECT VALUE value FROM inputs LIMIT {test_size}"
                ).to_dict()["result"]

                with torch.no_grad():
                    # call the module itself rather than .forward(), so that hooks are honoured
                    _ = model(torch.tensor(inputs))

            elapsed_time = evaluate_pytorch()
            print(f"For {test_size} datapoints, it took {elapsed_time} seconds")
            pytorch_times.append(elapsed_time)
except Exception as e:
    print(e)