# data-attest-ezkl hashed

Here's an example leveraging EZKL in which the hashes of the model's outputs are read from an on-chain source and attested to.

In this setup:
- the hashes of outputs are publicly known to the prover and verifier


First we import the necessary dependencies and set up logging to be as informative as possible. 

In [None]:
# check if notebook is in colab
try:
    import google.colab
    import subprocess
    import sys
    # install ezkl and onnx
    subprocess.check_call([sys.executable, "-m", "pip", "install", "ezkl"])
    subprocess.check_call([sys.executable, "-m", "pip", "install", "onnx"])

# rely on local installation of ezkl if the notebook is not in colab
except ImportError:
    pass


from torch import nn
import ezkl
import os
import json
import logging

# uncomment for more descriptive logging 
# FORMAT = '%(levelname)s %(name)s %(asctime)-15s %(filename)s:%(lineno)d %(message)s'
# logging.basicConfig(format=FORMAT)
# logging.getLogger().setLevel(logging.DEBUG)


Now we define our model. It is a very simple PyTorch model that has just one layer, an average pooling 2D layer. 

In [None]:
import torch
# Defines the model

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer = nn.AvgPool2d(2, 1, (1, 1))

    def forward(self, x):
        return self.layer(x)[0]


circuit = MyModel()

# this is where you'd train your model




We omit training for the purposes of this demonstration; the cell above marks where it would happen.
Now we export the model to ONNX and create a corresponding (randomly generated) input. In this example it is the hash of the model's output, rather than the input itself, that will eventually be stored on chain and read according to the `call_data` field in the graph data file.

You can replace the random `x` with real data if you so wish. 

In [None]:
x = 0.1*torch.rand(1,*[3, 2, 2], requires_grad=True)

# Flips the neural net into inference mode
circuit.eval()

# Export the model
torch.onnx.export(circuit,                   # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "network.onnx",            # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

# Flatten the input tensor into a plain list
data_array = ((x).detach().numpy()).reshape([-1]).tolist()

data = dict(input_data = [data_array])

# Serialize data into file:
with open("input.json", 'w') as f:
    json.dump(data, f)
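
The `input.json` written above holds one flattened list per model input. A minimal sketch of that layout using only the standard library is shown here; the nested list is a hypothetical stand-in for the tensor `x`, and the `flatten` helper is illustrative rather than part of EZKL.

```python
import json

def flatten(nested):
    """Recursively flatten nested lists into a single flat list of numbers."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat

# Hypothetical stand-in for a tensor of shape (1, 3, 2, 2)
example_tensor = [[[[0.01 * (4 * c + 2 * r + k) for k in range(2)]
                    for r in range(2)]
                   for c in range(3)]]

# One flat list per model input, wrapped in an outer list
example_payload = {"input_data": [flatten(example_tensor)]}
serialized = json.dumps(example_payload)
```

The outer list has one entry because the model takes a single input; a model with several inputs would presumably carry one flat list per input.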



We now define a function that starts a new anvil instance, to which we will deploy our test contract. This contract will hold in its storage the data that we read from and attest to. In production you would not need to set up a local anvil instance. Instead, you would replace RPC_URL with the RPC endpoint of the chain you are deploying your verifiers to, and read the data from that chain.

In [None]:
import subprocess
import time
import threading

# make sure anvil is running locally
# $ anvil -p 3030

RPC_URL = "http://localhost:3030"

# Save process globally
anvil_process = None

def start_anvil():
    global anvil_process
    if anvil_process is None:
        anvil_process = subprocess.Popen(["anvil", "-p", "3030", "--code-size-limit=41943040"])
        # give anvil a moment to boot, then check that it has not already exited
        time.sleep(3)
        if anvil_process.poll() is not None:
            raise Exception("failed to start anvil process")

def stop_anvil():
    global anvil_process
    if anvil_process is not None:
        anvil_process.terminate()
        anvil_process = None
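
A fixed sleep is a simple way to wait for anvil, but it can be too short on a slow machine. As a hedged alternative, the sketch below polls the TCP port until it accepts a connection. The helper name `wait_for_port` and its parameters are hypothetical and not part of EZKL or anvil.

```python
import socket
import time

def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused); wait briefly and retry
            time.sleep(interval)
    return False
```

One possible use would be calling `wait_for_port("localhost", 3030)` after launching anvil and raising an error if it returns `False`.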


We define our `PyRunArgs` object, which contains the visibility parameters for our model.
- `input_visibility` defines the visibility of the model inputs
- `param_visibility` defines the visibility of the model weights, constants, and other parameters
- `output_visibility` defines the visibility of the model outputs

Here we create the following setup:
- `input_visibility`: "private"
- `param_visibility`: "private"
- `output_visibility`: "hashed"


In [None]:
import ezkl

model_path = os.path.join('network.onnx')
compiled_model_path = os.path.join('network.compiled')
pk_path = os.path.join('test.pk')
vk_path = os.path.join('test.vk')
settings_path = os.path.join('settings.json')
srs_path = os.path.join('kzg.srs')
data_path = os.path.join('input.json')

run_args = ezkl.PyRunArgs()
run_args.input_visibility = "private"
run_args.param_visibility = "private"
run_args.output_visibility = "hashed"
run_args.variables = [("batch_size", 1)]
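
With a hashed output, the verifier sees only a hash of the outputs rather than the outputs themselves. The idea can be illustrated outside the circuit. The sketch below uses SHA-256 from the standard library purely as a stand-in for the Poseidon hash that EZKL reports in the witness; the `commit` function and the integer encoding are illustrative assumptions.

```python
import hashlib

def commit(values):
    """Hash a list of non-negative integers into a single hex digest."""
    h = hashlib.sha256()
    for v in values:
        # Fixed-width encoding so that [12, 733] and [127, 33] hash differently
        h.update(v.to_bytes(32, "big"))
    return h.hexdigest()

# What gets published (here: the digest of some made-up outputs)
published = commit([12, 7, 33])

# The same outputs reproduce the commitment; any change is detected
matches = commit([12, 7, 33]) == published
tampered = commit([12, 7, 34]) == published
```

Here `matches` is `True` and `tampered` is `False`, which is the property the on-chain attestation relies on.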




Now we generate a settings file. This file instantiates the parameters that determine the circuit's shape, size, and so on. Because of the way we represent nonlinearities in the circuit (using Halo2's [lookup tables](https://zcash.github.io/halo2/design/proving-system/lookup.html)), it is often best to _calibrate_ this settings file, as some data can fall outside the range of these lookups.

You can pass a dataset for calibration that will be representative of real inputs you might find if and when you deploy the prover. Here we create a dummy calibration dataset for demonstration purposes. 

In [None]:
!RUST_LOG=trace
# TODO: Dictionary outputs
res = ezkl.gen_settings(model_path, settings_path, py_run_args=run_args)
assert res == True

In [None]:
# generate a bunch of dummy calibration data
cal_data = {
    "input_data": [(0.1*torch.rand(2, *[3, 2, 2])).flatten().tolist()],
}

cal_path = os.path.join('val_data.json')
# save as json file
with open(cal_path, "w") as f:
    json.dump(cal_data, f)

res = await ezkl.calibrate_settings(cal_path, model_path, settings_path, "resources")
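
To see why calibration matters, consider how a floating-point value becomes a fixed-point integer before it enters a circuit. The sketch below is a simplified illustration, not EZKL's implementation: the `scale` and `lookup_range` values are hypothetical and chosen so that one value falls outside the table.

```python
def quantize(x, scale):
    """Map a float to a fixed-point integer with `scale` fractional bits."""
    return round(x * (1 << scale))

def in_lookup_range(q, lookup_range):
    """Check whether a quantized value falls inside an inclusive lookup range."""
    lo, hi = lookup_range
    return lo <= q <= hi

scale = 7                    # hypothetical: 7 fractional bits
lookup_range = (-128, 128)   # hypothetical table bounds

small = quantize(0.05, scale)   # 0.05 * 128 = 6.4, rounds to 6
large = quantize(2.5, scale)    # 2.5 * 128 = 320

small_ok = in_lookup_range(small, lookup_range)   # True
large_ok = in_lookup_range(large, lookup_range)   # False: outside the table
```

Calibrating on representative data is a way to pick settings under which values like `large` do not occur, or under which the range is wide enough to hold them.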

In [None]:
res = ezkl.compile_circuit(model_path, compiled_model_path, settings_path)
assert res == True

As we use Halo2 with KZG commitments, we need a structured reference string (SRS) from (preferably) a multi-party trusted setup ceremony. For an overview of the procedures for such a ceremony, check out [this page](https://blog.ethereum.org/2023/01/16/announcing-kzg-ceremony). The `get_srs` command retrieves a correctly sized SRS, given the calibrated settings file, from [here](https://github.com/han0110/halo2-kzg-srs). 

These SRS were generated with [this](https://github.com/privacy-scaling-explorations/perpetualpowersoftau) ceremony. 

In [None]:
res = await ezkl.get_srs(settings_path)


We now need to generate the circuit witness. It contains the model outputs (and any hashes) produced by feeding the previously generated `input.json` through the circuit / model. 

In [None]:
# enable Rust backtraces for this kernel process
os.environ["RUST_BACKTRACE"] = "1"

witness_path = "witness.json"

res = await ezkl.gen_witness(data_path, compiled_model_path, witness_path)



In [None]:
print(ezkl.felt_to_big_endian(res['processed_outputs']['poseidon_hash'][0]))
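
The printed value is a hex string. Assuming it carries a `0x` prefix (which the `int(..., 0)` call in the next cell implies), turning it into an integer that fits a Solidity `uint256` could look like the sketch below. The helper name and the sample string are made up for illustration.

```python
def hex_to_uint256(hex_str):
    """Parse a 0x-prefixed hex string and check that it fits in 256 bits."""
    value = int(hex_str, 0)   # base 0 infers the base from the "0x" prefix
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in a uint256")
    return value

sample = "0x" + "00" * 30 + "01ff"      # hypothetical 32-byte value
as_int = hex_to_uint256(sample)         # 0x01ff == 511
as_bytes = as_int.to_bytes(32, "big")   # big-endian, matching ABI encoding of uint256
```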

We now post the hashes of the outputs to the chain. This is the data that will be read from and attested to.

In [None]:
from web3 import Web3, HTTPProvider
from solcx import compile_standard
from decimal import Decimal
import json
import os
import torch


# setup web3 instance
w3 = Web3(HTTPProvider(RPC_URL))

def test_on_chain_data(res):
    print(f'poseidon_hash: {res["processed_outputs"]["poseidon_hash"]}')
    # Step 1: Convert the Poseidon hash of the outputs (a field element) to an integer
    data = [int(ezkl.felt_to_big_endian(res['processed_outputs']['poseidon_hash'][0]), 0)]

    # Step 2: Prepare and compile the contract.
    # We are using a test contract here but in production you would
    # use whatever contract you are fetching data from.
    contract_source_code = '''
    // SPDX-License-Identifier: UNLICENSED
    pragma solidity ^0.8.17;

    contract TestReads {

        uint[] public arr;
        constructor(uint256[] memory _numbers) {
            for(uint256 i = 0; i < _numbers.length; i++) {
                arr.push(_numbers[i]);
            }
        }
        function getArr() public view returns (uint[] memory) {
            return arr;
        }
    }
    '''

    compiled_sol = compile_standard({
        "language": "Solidity",
        "sources": {"testreads.sol": {"content": contract_source_code}},
        "settings": {"outputSelection": {"*": {"*": ["metadata", "evm.bytecode", "abi"]}}}
    })

    # Get bytecode
    bytecode = compiled_sol['contracts']['testreads.sol']['TestReads']['evm']['bytecode']['object']

    # Get ABI
    # In production if you are reading from really large contracts you can just use
    # a stripped down version of the ABI of the contract you are calling, containing only the view functions you will fetch data from.
    abi = json.loads(compiled_sol['contracts']['testreads.sol']['TestReads']['metadata'])['output']['abi']

    # Step 3: Deploy the contract
    TestReads = w3.eth.contract(abi=abi, bytecode=bytecode)
    tx_hash = TestReads.constructor(data).transact()
    tx_receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
    # If you are deploying to production you can skip the 3 lines of code above and just instantiate the contract like this,
    # passing the address and abi of the contract you are fetching data from.
    contract = w3.eth.contract(address=tx_receipt['contractAddress'], abi=abi)

    # Step 4: Interact with the contract
    calldata = contract.functions.getArr().build_transaction()['data'][2:]

    # Prepare the calls_to_account object
    # If you were calling view functions across multiple contracts,
    # you would have multiple entries in the calls_to_account array,
    # one for each contract.
    decimals = [0] * len(data)
    call_to_account = {
        'call_data': calldata,
        'decimals': decimals,
        'address': contract.address[2:], # remove the '0x' prefix
    }

    print(f'call_to_account: {call_to_account}')

    return call_to_account

# Now let's start the Anvil process. You don't need to do this if you are deploying to a non-local chain.
start_anvil()

# Now let's call our function, passing in the witness result we generated 2 cells above.
call_to_account = test_on_chain_data(res)

data = dict(input_data = [data_array], output_data =  {'rpc': RPC_URL, 'call': call_to_account })

# Serialize on-chain data into file:
with open("input.json", 'w') as f:
    json.dump(data, f)
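
The `call_data` above invokes `getArr()`, whose return value is an ABI-encoded `uint256[]`. As a rough sketch of what a reader of that data has to decode, the dynamic array is laid out as an offset word, a length word, and then one 32-byte word per element. This is a hand-rolled illustration using only the standard library, not the code EZKL runs, and both helper names are hypothetical.

```python
WORD = 32  # ABI words are 32 bytes

def encode_uint256_array(values):
    """ABI-encode a uint256[] as a sole return value: offset, length, elements."""
    words = [WORD, len(values)] + list(values)
    return b"".join(w.to_bytes(WORD, "big") for w in words)

def decode_uint256_array(data):
    """Decode the ABI encoding of a sole uint256[] return value."""
    # The first word gives the byte offset at which the array data starts
    offset = int.from_bytes(data[:WORD], "big")
    # At that offset sits the element count, followed by the elements
    length = int.from_bytes(data[offset:offset + WORD], "big")
    start = offset + WORD
    return [
        int.from_bytes(data[start + i * WORD:start + (i + 1) * WORD], "big")
        for i in range(length)
    ]

encoded = encode_uint256_array([5, 7])
decoded = decode_uint256_array(encoded)   # round-trips back to [5, 7]
```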



Here we set up the verifying and proving keys for the circuit. As the names suggest, the proving key is needed for ... proving and the verifying key is needed for ... verifying. 

In [None]:
# HERE WE SETUP THE CIRCUIT PARAMS
# WE GOT KEYS
# WE GOT CIRCUIT PARAMETERS
# EVERYTHING ANYONE HAS EVER NEEDED FOR ZK
res = ezkl.setup(
        compiled_model_path,
        vk_path,
        pk_path,
    )

assert res == True
assert os.path.isfile(vk_path)
assert os.path.isfile(pk_path)
assert os.path.isfile(settings_path)

Now we generate a full proof. 

In [None]:
# GENERATE A PROOF

proof_path = os.path.join('test.pf')

res = ezkl.prove(
        witness_path,
        compiled_model_path,
        pk_path,
        proof_path,
        "single",
    )

print(res)
assert os.path.isfile(proof_path)

And verify it as a sanity check. 

In [None]:
# VERIFY IT

res = ezkl.verify(
        proof_path,
        settings_path,
        vk_path,
    )

assert res == True
print("verified")

We can now create and then deploy a vanilla EVM verifier.

In [None]:
abi_path = 'test.abi'
sol_code_path = 'test.sol'

res = await ezkl.create_evm_verifier(
        vk_path,
        settings_path,
        sol_code_path,
        abi_path,
    )
assert res == True

In [None]:
import json

addr_path_verifier = "addr_verifier.txt"

res = await ezkl.deploy_evm(
    addr_path_verifier,
    sol_code_path,
    RPC_URL
)

assert res == True

With the vanilla verifier deployed, we can now create the data attestation contract, which will read in the instances from the calldata to the verifier, attest to them, call the verifier and then return the result. 

In [None]:

abi_path = 'test.abi'
sol_code_path = 'test.sol'
input_path = 'input.json'

res = await ezkl.create_evm_data_attestation(
        input_path,
        settings_path,
        sol_code_path,
        abi_path,
    )

Now we can deploy the data attestation verifier contract. For security reasons, this binding will only deploy to a local anvil instance, using accounts generated by anvil, so it should only be used for testing purposes.

In [None]:
addr_path_da = "addr_da.txt"

res = await ezkl.deploy_da_evm(
        addr_path_da,
        input_path,
        settings_path,
        sol_code_path,
        RPC_URL,
    )


Call the view-only verify method on the contract to verify the proof. Because it is a view function, it is safe to use in production: you do not have to pass your private key.

In [None]:
# read the verifier address
addr_verifier = None
with open(addr_path_verifier, 'r') as f:
    addr_verifier = f.read()
# read the data attestation address
addr_da = None
with open(addr_path_da, 'r') as f:
    addr_da = f.read()

res = await ezkl.verify_evm(
    addr_verifier,
    proof_path,
    RPC_URL,
    addr_da,
)
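
When reading addresses back from text files, it can help to strip whitespace and check the format before passing them on. The sketch below is a hypothetical helper, not part of EZKL, and it assumes the file holds a `0x`-prefixed 20-byte hex address, possibly followed by a newline.

```python
import re

# Assumption: addresses are written as "0x" followed by 40 hex characters
ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")

def read_address(path):
    """Read an address from a text file, stripping surrounding whitespace."""
    with open(path, "r") as f:
        addr = f.read().strip()
    if not ADDRESS_RE.match(addr):
        raise ValueError(f"{path} does not contain a 20-byte hex address: {addr!r}")
    return addr
```

Under that assumption, `read_address(addr_path_verifier)` and `read_address(addr_path_da)` could replace the two `open(...)` blocks above.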