Skip to content

feat(tests): ckzg functions for peerdas tests #1614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

felix314159
Copy link
Collaborator

🗒️ Description

Added an eest wrapper for ckzg functions we will probably use in our tests.
A few open questions:

  • Their code relies on trusted_setup.txt, I just import in the functions that need it and it assumes it is in the same folder. Is this compatible with our pytest framework or do we need to change how this file is accessed
  • I currently don't understand why I can only create blobs that hold values 0x73 or lower, otherwise calling compute_cells complains
  • It seems to me that every cell holds the exact same commitment. So you commit to your blob once and then copy paste the 48 byte commitment into every cell? Seems odd/wasteful to me, but could be the case

Next steps:

  • Use the functions in here to prepare a few static examples of blob/commitment/cells/proofs and store them as JSON (yml and me don't talk no more) in our repo. Then we can re-use these static examples in our tests instead of having to repeatedly create them at runtime.

🔗 Related Issues

✅ Checklist

  • All: Set appropriate labels for the changes.
  • All: Considered squashing commits to improve commit history.
  • All: Added an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • Tests: All converted JSON/YML tests from ethereum/tests have been added to converted-ethereum-tests.txt.
  • Tests: A PR with removal of converted JSON/YML tests from ethereum/tests have been opened.
  • Tests: Included the type and version of evm t8n tool used to locally execute test cases: e.g., ref with commit hash or geth 1.13.1-stable-3f40e65.
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.

@kevaundray
Copy link
Contributor

I currently don't understand why I can only create blobs that hold values 0x73 or lower, otherwise calling compute_cells complains

Blobs are not entirely opaque sequence of bytes. You can think of a blob as being 32 byte chunks where each 32 byte must represent an integer less than q, where q is the bls12-381 scalar field.

The "easier" way to always have it pass is to generate a 31 byte integer and then serialize that into 32 bytes. This is because every 31 byte integer will be less than q since q is roughly a 32 byte integer.

It seems to me that every cell holds the exact same commitment. So you commit to your blob once and then copy paste the 48 byte commitment into every cell? Seems odd/wasteful to me, but could be the case

Yeah, in the original API it was deduplicated, but then you need another index to index the deduplicated blob because the verify function can take in multiple commitments -- I found this to be more confusing though

Copy link
Member

@jtraglia jtraglia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @felix314159, nice work. I would suggest only loading the trusted setup once and removing the wrapper functions. I don't believe these really add anything, right? Ckzg will do these checks too.

Comment on lines 28 to 31
# below are mine, not sure if they are defined in specs somewhere
KZG_PROOF_LENGTH = 48
KZG_COMMITMENT_LENGTH = 48
CELL_LENGTH = 2048
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are constants for these in the specs! Please see:

These are not exposed to the bindings, but maybe they should be. I can do this later if you want.

Comment on lines 31 to 36
if int(hex_byte, 16) > 115:
# TODO: figure out why this happens
print("For some reason bytes larger than this will lead to errors later in compute_cells(), so it's not allowed for now.") # noqa: E501
return None

hex_byte = hex_byte.lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is because of BLS_MODULUS. The uint256 value of each field element must be less than this. Given that Python supports large integers, I think it would be best if this function took a seed value, then generated FIELD_ELEMENTS_PER_BLOB random field elements and joined them together to make the blob. Something like:

import random

seed = 42
random.seed(seed)
BLS_MODULUS = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001
ints = [random.randrange(BLS_MODULUS) for _ in range(FIELD_ELEMENTS_PER_BLOB)]
encoded = [i.to_bytes(BYTES_PER_FIELD_ELEMENT, "big") for i in ints]
blob = b"".join(encoded)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, great suggestion, and the seed can be an int to replace the hex_byte: str in the original function.

Comment on lines 55 to 56
expected_blob_length = 2**int(math.log2(Spec.BYTES_PER_BLOB))
assert len(blob) == expected_blob_length, f"Expected blob of length {expected_blob_length} but got blob of length {len(blob)}" # noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a strange. Why don't we just do assert len(blob) == Spec.BYTES_PER_BLOB?

assert len(blob) == expected_blob_length, f"Expected blob of length {expected_blob_length} but got blob of length {len(blob)}" # noqa: E501

# calculate commitment
ts = ckzg.load_trusted_setup("trusted_setup.txt", 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slow operation. It would be best if this were only done once. Maybe in some setup function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add the trusted setup as a module constant at the beginning of the file:

from os.path import realpath
from pathlib import Path

TRUSTED_SETUP_FILE_NAME = "trusted_setup.txt"
TRUSTED_SETUP_PATH = Path(realpath(__file__)).parent / TRUSTED_SETUP_FILE_NAME
TRUSTED_SETUP = ckzg.load_trusted_setup(str(TRUSTED_SETUP_PATH), 0)

Comment on lines 67 to 75
def eest_compute_cells(blob: bytes) -> list[int]:
"""Take a blob and returns a list of cells that are derived from this blob."""
ts = ckzg.load_trusted_setup("trusted_setup.txt", 0)

cells = ckzg.compute_cells(blob, ts)

assert len(cells) == 128

return cells
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several of these wrapper functions don't appear to be very useful. Could we not just call ckzg directly?

return commitment


def eest_compute_cells(blob: bytes) -> list[int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this return type isn't correct. It doesn't return a list of ints. It returns a list of bytes.

Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent, thanks for putting this together!

Just a couple of comments to make some parts of it more pythonic, just overall we should try to avoid to use strings when other more appropriate primitive python types can be used.

Comment on lines 31 to 36
if int(hex_byte, 16) > 115:
# TODO: figure out why this happens
print("For some reason bytes larger than this will lead to errors later in compute_cells(), so it's not allowed for now.") # noqa: E501
return None

hex_byte = hex_byte.lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, great suggestion, and the seed can be an int to replace the hex_byte: str in the original function.

class PersistentBlobGenerator:
# PersistentBlobGenerator("4a") creates blob_4a.json in cwd and contains blob, commitment, cells and proofs
encoding: str = "utf-8"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic can be of excellent help here:

from typing import List
from pydantic import BaseModel
from ethereum_test_base_types import Bytes

class Blob(BaseModel):
    name: str
    blob: Bytes
    commitments: List[Bytes]
    cells: List[Bytes]
    proofs: List[Bytes]

And this can be then converted to json using model_dump or model_dump_json, and the to_json and from_json methods can be removed.

Comment on lines 147 to 163
def __init__(self, singular_byte: str):
# safely derive blob from input
blob: bytes | None = generate_blob_from_hex_byte(singular_byte)
assert blob is not None, f"PersistentBlobGenerator received invalid input: {singular_byte}"

if "0x" in singular_byte:
singular_byte = singular_byte.replace("0x", "", 1)

commitment: bytes = eest_blob_to_kzg_commitment(blob)
cells, proofs = eest_compute_cells_and_kzg_proofs(blob)

# populate instance
self.name: str = "blob_" + singular_byte
self.blob: bytes = blob
self.commitments: list[bytes] = [commitment] * len(cells)
self.cells: list[bytes] = cells
self.proofs: list[bytes] = proofs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change this class to pydantic, the only other change is that this init function needs to be a class method instead.

@classmethod
def from_index(cls, index: int):
  ...
  return cls(blob, proofs, ...)

or similar.

uv.lock Outdated
@@ -1,10 +1,6 @@
version = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @felix314159, as we discussed, can you please revert the updates to all packages and only modify the required packages.

Instructions available here:
https://eest.ethereum.org/main/dev/deps_and_packaging/

@danceratopz danceratopz changed the title test(add): ckzg functions for peerdas tests feat(tests): ckzg functions for peerdas tests May 19, 2025
@danceratopz danceratopz added scope:tests Scope: Changes EL client test cases in `./tests` type:feat type: Feature fork:osaka Osaka hardfork labels May 19, 2025
Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a lot of comments but I think these are necessary to future-proof the new code introduced 👍

Thanks!


TRUSTED_SETUP_FILE_NAME = "blob_trusted_setup.txt"
TRUSTED_SETUP_PATH = Path(realpath(__file__)).parent / TRUSTED_SETUP_FILE_NAME
TRUSTED_SETUP = ckzg.load_trusted_setup(str(TRUSTED_SETUP_PATH), 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to ClassVar inside of Blob that is loaded upon first use:

class Blob(CamelModel):
    ...
    _trusted_setup: ClassVar[Any | None] = None
    ...
    @classmethod
    def trusted_setup(cls):
        ...
        if cls._trusted_setup is None:
            # load trusted setup
        return cls._trusted_setup

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea but I will make this function void, no need to return the trusted setup if it is already accessible

return would_be_static_blob_path

@staticmethod
def NewBlob(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": # noqa: N802
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def NewBlob(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": # noqa: N802
def from_seed(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob":

I think this name is a bit more accurate and follows python method naming better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should use snake case I agree. However, I strongly disagree calling it from_seed as the seed is just an optional parameter. You probably wanted to name it from_fork.

Comment on lines 103 to 106
fork_str: str = fork.name().lower()

# if this blob already exists then load from file
blob_location: Path = Blob.get_filepath(fork_str, seed, timestamp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the timestamp is a good idea, but we could improve it a bit further here since I think using the timestamp to get the filename could result in too many duplicates:

Suggested change
fork_str: str = fork.name().lower()
# if this blob already exists then load from file
blob_location: Path = Blob.get_filepath(fork_str, seed, timestamp)
fork = fork.fork_at(timestamp=timestamp) # This resolves to the correct fork in fork transition tests, but it's no-op in normal forks
fork_str: str = fork.name().lower()
# if this blob already exists then load from file
blob_location: Path = Blob.get_filepath(fork_str, seed)

And then we just remove the timestamp parameter from get_filepath.

def generate_blob_data(rng_seed: int = 0) -> Bytes:
"""Calculate blob data deterministically via provided seed."""
# apply RNG seed
random.seed(rng_seed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change this to use a local thread-safe instance:

Suggested change
random.seed(rng_seed)
rng = random.Random(rng_seed)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch, thanks

Comment on lines 99 to 101
assert fork.engine_get_blobs_version() is not None, (
f"You provided fork {fork.name()} but it does not support blobs!"
) # TODO: why does mypy say '"type[BaseFork]" has no attribute "engine_get_blobs_version"'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can instead use supports_blobs here:

Suggested change
assert fork.engine_get_blobs_version() is not None, (
f"You provided fork {fork.name()} but it does not support blobs!"
) # TODO: why does mypy say '"type[BaseFork]" has no attribute "engine_get_blobs_version"'?
assert fork.supports_blobs(), (
f"Provided fork {fork.name()} but it does not support blobs!"
)

Comment on lines 213 to 224
detected_fork = parts[1]
detected_seed = parts[2]
detected_timestamp = parts[3].removesuffix(".json")
assert detected_seed.isdigit(), (
f"Failed to extract seed from filename. Ended up with seed {detected_seed} given filename {file_path}" # noqa: E501
)
assert detected_timestamp.isdigit(), (
f"Failed to extract timestamp from filename. Ended up with timestamp {detected_timestamp} given filename {file_path}" # noqa: E501
)
file_path = Blob.get_filepath(
detected_fork, int(detected_seed), int(detected_timestamp)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels unnecessary, is it just for debugging purposes?

Copy link
Collaborator Author

@felix314159 felix314159 May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test writer tries to import an existing blob via filename this code ensures that the name of the blob file is valid (as in matches the expected format). I would prefer to keep this, if someone tries to import an invalidly named blob this way we at least get a descriptive error message

Edit: it was removed because entire function was reworked due to suggested filename change

Comment on lines 250 to 252
assert self.fork in ["osaka"], (
f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert self.fork in ["osaka"], (
f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}"
)
assert self.fork.blob_kzg_cell_proofs() > 0, (
f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}"
)

Comment on lines 289 to 291
assert len(self.cells) == 128, (
f"You are supposed to pass a full cell list with 128 elements to this function, but got list of length {len(self.cells)}" # noqa: E501
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert len(self.cells) == 128, (
f"You are supposed to pass a full cell list with 128 elements to this function, but got list of length {len(self.cells)}" # noqa: E501
)
assert len(self.cells) == self.fork.blob_kzg_cell_proofs(), (
f"You are supposed to pass a full cell list with {self.fork.blob_kzg_cell_proofs()} elements to this function, but got list of length {len(self.cells)}" # noqa: E501
)

Same for the rest of the function, it can be adapted in the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think this rename is necessary, were you getting an error somehow with the original name? If so, could you share so we can help debug pls?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are shadowing the stdlib "types" when we have types.py. This means any import that implictly requires stdlib types will fail with ImportError: cannot import name 'GenericAlias' from partially initialized module 'types' (most likely due to a circular import). Try the following once with types.py and then again with eest_types.py:

echo 'from enum import Enum' > ./src/ethereum_test_types/abcd.py

uv run python ./src/ethereum_test_types/abcd.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch :D I think this is particular to using uv run python ./src/... but still I think it should be ok to finally split types.py into different files.

But I think that one deserves its own PR, wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we take the local cache approach, we should remove these static blob files from the branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's remove them

Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taking shape and I think it's closer to being done! 🚀

I've left a couple more minor comments.

Thanks for putting the effort into this so far! Let's try to get it through the finish line :)

Comment on lines 3 to 4
from typing import Literal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unnecessary.

Suggested change
from typing import Literal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is no longer necessary.

@@ -22,6 +23,8 @@
Transaction,
TransactionException,
)
from ethereum_test_types import Blob
from tests.cancun.eip4844_blobs.common import blobs_to_transaction_input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from tests.cancun.eip4844_blobs.common import blobs_to_transaction_input
from .common import blobs_to_transaction_input

whitelist.txt Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file got sorted at some point? Could you try to reset the file to the version in the base branch (should be main) and then just add the lines that are required for this PR?

kzg_proof=INF_POINT,
),
# you can provide the blob via file name
Blob.from_file("blob_cancun_0_0"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only use the more-abstract from_fork method in tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit-pick but could we rename this file to "kzg_trusted_setup.txt"?


versioned_hash: Hash
name: str # blob_<fork>_<seed>
fork: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fork: str
fork: Fork

After #1686, and can also delete all str<->Fork conversions everywhere in this file.

Copy link
Collaborator Author

@felix314159 felix314159 Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side-effect of this change is that pydantic now struggles to serialize and deserialize Blob object (when we sticked to basic python types like fork: str sth like Blob.model_validate_json(json_str) and self.model_dump_json() 'just worked' but now we have to add hard-to-read logic to tell pydantic how to handle the Fork type at serialization and deserialization:

    @field_validator("fork", mode="before")
    @classmethod
    def validate_fork(cls, v):
        """
        When reading JSON file and trying to go back from fork string to fork object we must
        tell pydantic how to do this.
        """
        if isinstance(v, str):
            return fork_string_to_object(v)
        return v

    @field_serializer("fork")
    def serialize_fork(self, fork: Fork) -> str:
        """
        When trying to serialize a Blob object into a JSON file we must
        tell pydantic how to do this.
        """
        return fork.name()

@felix314159
Copy link
Collaborator Author

felix314159 commented Jun 2, 2025

@marioevz Please look at the latest commit, I added fork_at() like you described.

Edit: I think everything is working now, can be reviewed & merged

@felix314159 felix314159 requested a review from marioevz June 4, 2025 10:20
Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Just a couple of comments but we can merge after those 👍

kzg_commitment=INF_POINT,
kzg_proof=INF_POINT,
),
Blob.from_fork(Cancun),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Blob.from_fork(Cancun),
Blob.from_fork(fork),

We can just pass the fork here to avoid constraining the test to a hard-coded fork.

kzg_commitment=INF_POINT,
kzg_proof=INF_POINT,
)
Blob.from_fork(Cancun),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Blob.from_fork(Cancun),
Blob.from_fork(fork),

Same here.

kzg_commitment=INF_POINT,
kzg_proof=INF_POINT,
)
Blob.from_fork(Cancun),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Blob.from_fork(Cancun),
Blob.from_fork(fork),

from ethereum_test_forks import Fork
from ethereum_test_forks.forks.forks import Cancun
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from ethereum_test_forks.forks.forks import Cancun

If the rest of the comments are applied.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@felix314159 felix314159 requested a review from marioevz June 13, 2025 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fork:osaka Osaka hardfork scope:tests Scope: Changes EL client test cases in `./tests` type:feat type: Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants