# Proof splitting

Here we showcase how to split a larger circuit into multiple smaller circuits, each proved separately. This is useful if you want to prove over multiple machines, or if you want to reduce the memory requirements of proving.

We showcase how to do this in the case where:
- intermediate calculations can be public (i.e. they do not need to be kept secret), so we can stitch the circuits together using instances
- intermediate calculations need to be kept secret, so we stitch the circuits together using the low-overhead KZG commitment scheme detailed [here](https://blog.ezkl.xyz/post/commits/)


First we import the necessary dependencies and set up logging to be as informative as possible. 

In [4]:
# check if notebook is in colab
try:
    import google.colab
    import subprocess
    import sys
    # install ezkl and onnx
    subprocess.check_call([sys.executable, "-m", "pip", "install", "ezkl"])
    subprocess.check_call([sys.executable, "-m", "pip", "install", "onnx"])

# rely on local installation of ezkl if the notebook is not in colab
except ImportError:
    pass

from torch import nn
import ezkl
import os
import json
import logging

# descriptive logging; comment out the lines below for quieter output
FORMAT = '%(levelname)s %(name)s %(asctime)-15s %(filename)s:%(lineno)d %(message)s'
logging.basicConfig(format=FORMAT)
logging.getLogger().setLevel(logging.INFO)


Now we define our model. It is a humble model with but a conv layer and a $ReLU$ non-linearity, but it is a model nonetheless.

In [6]:
import torch
# Defines the model
# we got convs, we got relu, 
# What else could one want ????

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=5, stride=4)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)

        return x


circuit = MyModel()

# this is where you'd train your model




We omit training for purposes of this demonstration. We've marked where training would happen in the cell above. 
Now we export the model to onnx and create a corresponding (randomly generated) input file.

You can replace the random `x` with real data if you so wish. 

In [7]:
x = torch.rand(1,*[3, 8, 8], requires_grad=True)

# Flips the neural net into inference mode
circuit.eval()

# Export the model
torch.onnx.export(circuit,                   # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "network.onnx",            # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},    # variable length axes
                                'output': {0: 'batch_size'}})

data_array = x.detach().numpy().reshape([-1]).tolist()

data = dict(input_data=[data_array])

# Serialize data into file (the context manager closes the file handle):
with open("input.json", 'w') as f:
    json.dump(data, f)



Now we split the model into two parts. The first part is the first conv layer and the second part is the rest of the model.

In [32]:
import onnx

input_path = "network.onnx"
output_path = "network_split_0.onnx"
input_names = ["input"]
output_names = ["/conv1/Conv_output_0"]
# first model
onnx.utils.extract_model(input_path, output_path, input_names, output_names)

KeyError: '/conv1/Conv_output_0/0'

In [28]:
import onnx

input_path = "network.onnx"
output_path = "network_split_1.onnx"
input_names = ["/conv1/Conv_output_0"]
output_names = ["output"]
# second model
onnx.utils.extract_model(input_path, output_path, input_names, output_names)
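
The idea behind the split can be sketched without ONNX at all. The toy functions below are hypothetical stand-ins (a linear stage in place of the conv layer, followed by a ReLU), not the real model; they only illustrate that running the two halves back to back on the intermediate value reproduces the full model.

```python
# Toy stand-ins for the two halves of the model (NOT the real conv / ReLU layers).
def first_half(xs, weight=2.0, bias=-1.0):
    """Hypothetical linear stage playing the role of the conv layer."""
    return [weight * x + bias for x in xs]


def second_half(hs):
    """ReLU, playing the role of the rest of the model."""
    return [max(0.0, h) for h in hs]


def full_model(xs):
    """The unsplit model: both stages applied in sequence."""
    return second_half(first_half(xs))


inputs = [-1.0, 0.25, 0.5, 3.0]
intermediate = first_half(inputs)      # what the first split would output
stitched = second_half(intermediate)   # what the second split would output

# Splitting must not change the end result.
assert stitched == full_model(inputs)
print(intermediate, stitched)
```

The `intermediate` list is the value that has to be shared between the two proofs, either publicly or behind a commitment.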

### Public intermediate calculations

This is where the magic happens. We define our `PyRunArgs` object, which contains the visibility parameters for our model.
- `input_visibility` defines the visibility of the model inputs
- `param_visibility` defines the visibility of the model weights, constants, and parameters
- `output_visibility` defines the visibility of the model outputs

There are currently 5 visibility settings:
- `public`: known to both the verifier and prover (a subtle nuance is that this may not be the case for model parameters, but until we have more rigorous theoretical results we don't want to make strong claims about this).
- `private`: known only to the prover.
- `hashed`: the hash pre-image is known to the prover; the prover and verifier both know the hash. The prover proves that they know the pre-image of the hash.
- `encrypted`: the non-encrypted element and the secret key used for decryption are known to the prover. The prover and the verifier know the encrypted element, the public key used to encrypt, and the hash of the decryption key. The prover proves that they know the pre-image of the hashed decryption key and that this key can in fact decrypt the encrypted message.
- `kzgcommit`: unblinded advice column which generates a KZG commitment. This doesn't appear in the instances of the circuit and must instead be modified directly within the proof bytes.
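
Because the visibility settings are passed as plain strings, a typo is easy to miss. As a minimal sketch, the hypothetical helper below (`check_visibility` is not part of ezkl) rejects any value that is not one of the strings listed above before it reaches `PyRunArgs`.

```python
# The visibility strings described in the list above.
VISIBILITIES = {"public", "private", "hashed", "encrypted", "kzgcommit"}


def check_visibility(**settings):
    """Hypothetical helper: raise if any value is not a known visibility string."""
    for name, value in settings.items():
        if value not in VISIBILITIES:
            raise ValueError(f"{name}: unknown visibility {value!r}")
    return settings


config = check_visibility(
    input_visibility="public",
    param_visibility="public",
    output_visibility="public",
)
print(config)
```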

Here we create the following setup:
- `input_visibility`: "public"
- `param_visibility`: "public"
- `output_visibility`: "public"


In [30]:
import ezkl

model_path = os.path.join('network_split_0.onnx')
compiled_model_path = os.path.join('network_split_0.compiled')
pk_path = os.path.join('test_split_0.pk')
vk_path = os.path.join('test_split_0.vk')
settings_path = os.path.join('settings_split_0.json')
srs_path = os.path.join('kzg.srs')
data_path = os.path.join('input.json')

run_args = ezkl.PyRunArgs()
run_args.input_visibility = "public"
run_args.param_visibility = "public"
run_args.output_visibility = "public"
run_args.variables = [("batch_size", 1)]


Now we generate a settings file. This file instantiates the parameters that determine the circuit's shape, size, etc. Because of the way we represent nonlinearities in the circuit (using Halo2's [lookup tables](https://zcash.github.io/halo2/design/proving-system/lookup.html)), it is often best to _calibrate_ this settings file, as some data can fall out of range of these lookups.

You can pass a dataset for calibration that will be representative of real inputs you might find if and when you deploy the prover. Here we create a dummy calibration dataset for demonstration purposes. 
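
To see why out-of-range data matters, here is a minimal sketch of fixed-point quantization. It assumes values are quantized as `round(x * 2**scale)` and that a lookup table covers a fixed integer range; the function names and the range `(-128, 128)` are illustrative assumptions, not ezkl's API or its actual defaults.

```python
def quantize(x: float, scale: int) -> int:
    """Fixed-point quantization: round x * 2**scale to the nearest integer."""
    return round(x * 2 ** scale)


def in_lookup_range(q: int, lookup_range: tuple) -> bool:
    """True if the quantized value q can be found in the lookup table."""
    low, high = lookup_range
    return low <= q <= high


scale = 7                    # illustrative scale
lookup_range = (-128, 128)   # hypothetical table bounds
samples = [0.1, 0.9, 1.5]

quantized = [quantize(s, scale) for s in samples]
flags = [in_lookup_range(q, lookup_range) for q in quantized]
print(quantized, flags)
```

Under these assumptions the last sample quantizes to a value outside the table, which is the kind of failure calibration on representative data is meant to catch.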

In [16]:
!RUST_LOG=trace
# TODO: Dictionary outputs
res = ezkl.gen_settings(model_path, settings_path, py_run_args=run_args)
assert res == True

INFO ezkl.graph.model 2023-10-22 14:31:58,018 model.rs:728 set batch_size to 1
INFO ezkl.graph.model 2023-10-22 14:31:58,021 model.rs:431 model has 2 instances
INFO ezkl.graph.model 2023-10-22 14:31:58,021 model.rs:1286 calculating num of constraints using dummy model layout...
INFO ezkl.graph.model 2023-10-22 14:31:58,023 model.rs:445 model generates 462 constraints (excluding modules)


In [17]:
# generate a bunch of dummy calibration data
cal_data = {
    "input_data": [torch.cat((x, torch.rand(10, *[3, 8, 8]))).flatten().tolist()],
}

cal_path = os.path.join('val_data.json')
# save as json file
with open(cal_path, "w") as f:
    json.dump(cal_data, f)

res = await ezkl.calibrate_settings(cal_path, model_path, settings_path, "resources")

INFO ezkl.graph.model 2023-10-22 14:32:01,922 model.rs:728 set batch_size to 1
INFO ezkl.execute 2023-10-22 14:32:01,924 execute.rs:588 num of calibration batches: 11
INFO ezkl.graph.model 2023-10-22 14:32:02,030 model.rs:728 set batch_size to 1
INFO ezkl.graph.model 2023-10-22 14:32:02,031 model.rs:431 model has 2 instances
INFO ezkl.graph.model 2023-10-22 14:32:02,032 model.rs:1286 calculating num of constraints using dummy model layout...
INFO ezkl.graph.model 2023-10-22 14:32:02,033 model.rs:445 model generates 462 constraints (excluding modules)
INFO ezkl.graph.model 2023-10-22 14:32:02,033 model.rs:728 set batch_size to 1
INFO ezkl.graph.model 2023-10-22 14:32:02,034 model.rs:431 model has 2 instances
INFO ezkl.graph.model 2023-10-22 14:32:02,034 model.rs:1286 calculating num of constraints using dummy model layout...
INFO ezkl.graph.model 2023-10-22 14:32:02,035 model.rs:445 model generates

In [18]:
res = ezkl.compile_circuit(model_path, compiled_model_path, settings_path)
assert res == True

INFO ezkl.graph.model 2023-10-22 14:32:26,203 model.rs:728 set batch_size to 1


We now need to generate the (partial) circuit witness. These are the model outputs (and any hashes) that are generated when feeding the previously generated `input.json` through the circuit / model. 

In [20]:
!export RUST_BACKTRACE=1

witness_path = "witness_split_0.json"

res = ezkl.gen_witness(data_path, compiled_model_path, witness_path)

INFO ezkl.graph 2023-10-22 14:33:34,577 mod.rs:641 input scales: [7]


We now build the input to the second circuit. This is the output of the first circuit.

In [31]:
with open(witness_path, 'r') as f:
    witness = json.load(f)
data = dict(input_data=witness['outputs'])
# Serialize data into file:
data_path = os.path.join('input_1.json')
with open(data_path, 'w') as f:
    json.dump(data, f)

# now for the second model: its input scale must match the first model's output scale
with open(settings_path, 'r') as f:
    settings = json.load(f)
run_args.input_scale = settings['model_output_scales'][0]

model_path = os.path.join('network_split_1.onnx')
settings_path = os.path.join('settings_split_1.json')
res = ezkl.gen_settings(model_path, settings_path, py_run_args=run_args)
assert res == True

compiled_model_path = os.path.join('network_split_1.compiled')
res = ezkl.compile_circuit(model_path, compiled_model_path, settings_path)
assert res == True

witness_path = "witness_split_1.json"
# feed the first model's outputs (input_1.json), not the original input.json
res = ezkl.gen_witness(data_path, compiled_model_path, witness_path)


RuntimeError: Failed to generate settings: Translating node #0 "/conv1/Conv_output_0" Source ToTypedTranslator
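
When the intermediate values are public, stitching amounts to checking that the outputs of the first circuit equal the inputs of the second. The sketch below assumes a witness is a dict with `outputs` (as used above) and `inputs` keys; the `inputs` key, the `stitches` helper, and the toy values are assumptions for illustration only.

```python
def stitches(witness_a: dict, witness_b: dict) -> bool:
    """True if the outputs of the first witness equal the inputs of the second."""
    return witness_a["outputs"] == witness_b["inputs"]


# Toy witnesses with made-up values; real witness files hold field elements.
w0 = {"inputs": [["01", "02"]], "outputs": [["0a", "0b"]]}
w1 = {"inputs": [["0a", "0b"]], "outputs": [["ff"]]}
w_bad = {"inputs": [["0a", "0c"]], "outputs": [["ff"]]}

print(stitches(w0, w1))     # the two halves agree on the intermediate value
print(stitches(w0, w_bad))  # the second half was fed something else
```

A verifier would run the same comparison on the public instances of the two proofs, so that the second proof cannot be generated for an unrelated intermediate value.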

As we use Halo2 with KZG commitments, we need a structured reference string (SRS) from (preferably) a multi-party trusted setup ceremony. For an overview of the procedures for such a ceremony, check out [this page](https://blog.ethereum.org/2023/01/16/announcing-kzg-ceremony). The `get_srs` command retrieves a correctly sized SRS, given the calibrated settings file, from [here](https://github.com/han0110/halo2-kzg-srs).

These SRS were generated with [this](https://github.com/privacy-scaling-explorations/perpetualpowersoftau) ceremony. 
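
"Correctly sized" can be made concrete with a small sketch. It assumes the settings file records the circuit size as `run_args.logrows` (the base-2 logarithm of the number of rows); that key layout and the value `17` are assumptions for illustration, so check them against your own settings file.

```python
import json

# Hypothetical excerpt of a settings file; only `logrows` matters here.
settings_text = '{"run_args": {"logrows": 17}}'
settings = json.loads(settings_text)

logrows = settings["run_args"]["logrows"]
num_rows = 2 ** logrows  # number of rows the SRS has to support

print(f"logrows={logrows} -> {num_rows} rows")
```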

In [19]:
res = ezkl.get_srs(srs_path, settings_path)


INFO ezkl.execute 2023-10-22 14:32:30,404 execute.rs:457 SRS downloaded


Here we set up verifying and proving keys for the circuit. As the name suggests, the proving key is needed for ... proving and the verifying key is needed for ... verifying.

In [None]:
# HERE WE SETUP THE CIRCUIT PARAMS
# WE GOT KEYS
# WE GOT CIRCUIT PARAMETERS
# EVERYTHING ANYONE HAS EVER NEEDED FOR ZK
res = ezkl.setup(
        compiled_model_path,
        vk_path,
        pk_path,
        srs_path,
    )

assert res == True
assert os.path.isfile(vk_path)
assert os.path.isfile(pk_path)
assert os.path.isfile(settings_path)




As a sanity check you can "mock prove" (i.e. check that all the constraints of the circuit are satisfied without generating a full proof).

In [None]:

res = ezkl.mock(witness_path, compiled_model_path)

Now we generate a full proof. 

In [None]:
# GENERATE A PROOF

proof_path = os.path.join('test.pf')

res = ezkl.prove(
        witness_path,
        compiled_model_path,
        pk_path,
        proof_path,
        srs_path,
        "single",
    )

print(res)
assert os.path.isfile(proof_path)

Now we need to swap out the public commitments inside the corresponding proof bytes.

In [None]:
res = ezkl.swap_proof_commitments(proof_path, witness_path)

And verify it as a sanity check. 

In [None]:
# VERIFY IT

res = ezkl.verify(
        proof_path,
        settings_path,
        vk_path,
        srs_path,
    )

assert res == True
print("verified")

We can now create an EVM / `.sol` verifier that can be deployed on chain to verify submitted proofs using a view function.

In [None]:

abi_path = 'test.abi'
sol_code_path = 'test.sol'

res = ezkl.create_evm_verifier(
        vk_path,
        srs_path,
        settings_path,
        sol_code_path,
        abi_path,
    )
assert res == True


## Verify on the EVM

In [None]:
# Make sure anvil is running locally first
# run with $ anvil -p 3030
# we use the default anvil node here
import json

address_path = os.path.join("address.json")

res = ezkl.deploy_evm(
    address_path,
    sol_code_path,
    'http://127.0.0.1:3030'
)

assert res == True

with open(address_path, 'r') as file:
    addr = file.read().rstrip()

In [None]:
# make sure anvil is running locally
# $ anvil -p 3030

res = ezkl.verify_evm(
    proof_path,
    addr,
    "http://127.0.0.1:3030"
)
assert res == True