-
Notifications
You must be signed in to change notification settings - Fork 147
feat(tests): ckzg functions for peerdas tests #1614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Blobs are not entirely opaque sequence of bytes. You can think of a blob as being 32 byte chunks where each 32 byte must represent an integer less than q, where q is the bls12-381 scalar field. The "easier" way to always have it pass is to generate a 31 byte integer and then serialize that into 32 bytes. This is because every 31 byte integer will be less than q since q is roughly a 32 byte integer.
Yeah, in the original API it was deduplicated, but then you need another index to index the deduplicated blob because the verify function can take in multiple commitments -- I found this to be more confusing though |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @felix314159, nice work. I would suggest only loading the trusted setup once and removing the wrapper functions. I don't believe these really add anything, right? Ckzg will do these checks too.
tests/osaka/eip7594_peerdas/spec.py
Outdated
# below are mine, not sure if they are defined in specs somewhere | ||
KZG_PROOF_LENGTH = 48 | ||
KZG_COMMITMENT_LENGTH = 48 | ||
CELL_LENGTH = 2048 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are constants for these in the specs! Please see:
These are not exposed to the bindings, but maybe they should be. I can do this later if you want.
if int(hex_byte, 16) > 115: | ||
# TODO: figure out why this happens | ||
print("For some reason bytes larger than this will lead to errors later in compute_cells(), so it's not allowed for now.") # noqa: E501 | ||
return None | ||
|
||
hex_byte = hex_byte.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is because of BLS_MODULUS
. The uint256 value of each field element must be less than this. Given that Python supports large integers, I think it would be best if this function took a seed value, then generated FIELD_ELEMENTS_PER_BLOB
random field elements and joined them together to make the blob. Something like:
import random
seed = 42
random.seed(seed)
BLS_MODULUS = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001
ints = [random.randrange(BLS_MODULUS) for _ in range(FIELD_ELEMENTS_PER_BLOB)]
encoded = [i.to_bytes(BYTES_PER_FIELD_ELEMENT, "big") for i in ints]
blob = b"".join(encoded)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, great suggestion, and the seed
can be an int to replace the hex_byte: str
in the original function.
expected_blob_length = 2**int(math.log2(Spec.BYTES_PER_BLOB)) | ||
assert len(blob) == expected_blob_length, f"Expected blob of length {expected_blob_length} but got blob of length {len(blob)}" # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a strange. Why don't we just do assert len(blob) == Spec.BYTES_PER_BLOB
?
assert len(blob) == expected_blob_length, f"Expected blob of length {expected_blob_length} but got blob of length {len(blob)}" # noqa: E501 | ||
|
||
# calculate commitment | ||
ts = ckzg.load_trusted_setup("trusted_setup.txt", 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a slow operation. It would be best if this were only done once. Maybe in some setup function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add the trusted setup as a module constant at the beginning of the file:
from os.path import realpath
from pathlib import Path
TRUSTED_SETUP_FILE_NAME = "trusted_setup.txt"
TRUSTED_SETUP_PATH = Path(realpath(__file__)).parent / TRUSTED_SETUP_FILE_NAME
TRUSTED_SETUP = ckzg.load_trusted_setup(str(TRUSTED_SETUP_PATH), 0)
def eest_compute_cells(blob: bytes) -> list[int]: | ||
"""Take a blob and returns a list of cells that are derived from this blob.""" | ||
ts = ckzg.load_trusted_setup("trusted_setup.txt", 0) | ||
|
||
cells = ckzg.compute_cells(blob, ts) | ||
|
||
assert len(cells) == 128 | ||
|
||
return cells |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several of these wrapper functions don't appear to be very useful. Could we not just call ckzg directly?
return commitment | ||
|
||
|
||
def eest_compute_cells(blob: bytes) -> list[int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this return type isn't correct. It doesn't return a list of ints. It returns a list of bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent, thanks for putting this together!
Just a couple of comments to make some parts of it more pythonic, just overall we should try to avoid to use strings when other more appropriate primitive python types can be used.
if int(hex_byte, 16) > 115: | ||
# TODO: figure out why this happens | ||
print("For some reason bytes larger than this will lead to errors later in compute_cells(), so it's not allowed for now.") # noqa: E501 | ||
return None | ||
|
||
hex_byte = hex_byte.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, great suggestion, and the seed
can be an int to replace the hex_byte: str
in the original function.
class PersistentBlobGenerator: | ||
# PersistentBlobGenerator("4a") creates blob_4a.json in cwd and contains blob, commitment, cells and proofs | ||
encoding: str = "utf-8" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pydantic can be of excellent help here:
from typing import List
from pydantic import BaseModel
from ethereum_test_base_types import Bytes
class Blob(BaseModel):
name: str
blob: Bytes
commitments: List[Bytes]
cells: List[Bytes]
proofs: List[Bytes]
And this can be then converted to json using model_dump
or model_dump_json
, and the to_json
and from_json
methods can be removed.
def __init__(self, singular_byte: str): | ||
# safely derive blob from input | ||
blob: bytes | None = generate_blob_from_hex_byte(singular_byte) | ||
assert blob is not None, f"PersistentBlobGenerator received invalid input: {singular_byte}" | ||
|
||
if "0x" in singular_byte: | ||
singular_byte = singular_byte.replace("0x", "", 1) | ||
|
||
commitment: bytes = eest_blob_to_kzg_commitment(blob) | ||
cells, proofs = eest_compute_cells_and_kzg_proofs(blob) | ||
|
||
# populate instance | ||
self.name: str = "blob_" + singular_byte | ||
self.blob: bytes = blob | ||
self.commitments: list[bytes] = [commitment] * len(cells) | ||
self.cells: list[bytes] = cells | ||
self.proofs: list[bytes] = proofs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change this class to pydantic, the only other change is that this init function needs to be a class method instead.
@classmethod
def from_index(cls, index: int):
...
return cls(blob, proofs, ...)
or similar.
uv.lock
Outdated
@@ -1,10 +1,6 @@ | |||
version = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @felix314159, as we discussed, can you please revert the updates to all packages and only modify the required packages.
Instructions available here:
https://eest.ethereum.org/main/dev/deps_and_packaging/
d33b90d
to
ccfedf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a lot of comments but I think these are necessary to future-proof the new code introduced 👍
Thanks!
src/ethereum_test_types/blob.py
Outdated
|
||
TRUSTED_SETUP_FILE_NAME = "blob_trusted_setup.txt" | ||
TRUSTED_SETUP_PATH = Path(realpath(__file__)).parent / TRUSTED_SETUP_FILE_NAME | ||
TRUSTED_SETUP = ckzg.load_trusted_setup(str(TRUSTED_SETUP_PATH), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to ClassVar
inside of Blob
that is loaded upon first use:
class Blob(CamelModel):
...
_trusted_setup: ClassVar[Any | None] = None
...
@classmethod
def trusted_setup(cls):
...
if cls._trusted_setup is None:
# load trusted setup
return cls._trusted_setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea but I will make this function void, no need to return the trusted setup if it is already accessible
src/ethereum_test_types/blob.py
Outdated
return would_be_static_blob_path | ||
|
||
@staticmethod | ||
def NewBlob(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": # noqa: N802 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def NewBlob(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": # noqa: N802 | |
def from_seed(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": |
I think this name is a bit more accurate and follows python method naming better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should use snake case I agree. However, I strongly disagree calling it from_seed
as the seed is just an optional parameter. You probably wanted to name it from_fork
.
src/ethereum_test_types/blob.py
Outdated
fork_str: str = fork.name().lower() | ||
|
||
# if this blob already exists then load from file | ||
blob_location: Path = Blob.get_filepath(fork_str, seed, timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing the timestamp is a good idea, but we could improve it a bit further here since I think using the timestamp to get the filename could result in too many duplicates:
fork_str: str = fork.name().lower() | |
# if this blob already exists then load from file | |
blob_location: Path = Blob.get_filepath(fork_str, seed, timestamp) | |
fork = fork.fork_at(timestamp=timestamp) # This resolves to the correct fork in fork transition tests, but it's no-op in normal forks | |
fork_str: str = fork.name().lower() | |
# if this blob already exists then load from file | |
blob_location: Path = Blob.get_filepath(fork_str, seed) |
And then we just remove the timestamp
parameter from get_filepath
.
src/ethereum_test_types/blob.py
Outdated
def generate_blob_data(rng_seed: int = 0) -> Bytes: | ||
"""Calculate blob data deterministically via provided seed.""" | ||
# apply RNG seed | ||
random.seed(rng_seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change this to use a local thread-safe instance:
random.seed(rng_seed) | |
rng = random.Random(rng_seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good catch, thanks
src/ethereum_test_types/blob.py
Outdated
assert fork.engine_get_blobs_version() is not None, ( | ||
f"You provided fork {fork.name()} but it does not support blobs!" | ||
) # TODO: why does mypy say '"type[BaseFork]" has no attribute "engine_get_blobs_version"'? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can instead use supports_blobs
here:
assert fork.engine_get_blobs_version() is not None, ( | |
f"You provided fork {fork.name()} but it does not support blobs!" | |
) # TODO: why does mypy say '"type[BaseFork]" has no attribute "engine_get_blobs_version"'? | |
assert fork.supports_blobs(), ( | |
f"Provided fork {fork.name()} but it does not support blobs!" | |
) |
src/ethereum_test_types/blob.py
Outdated
detected_fork = parts[1] | ||
detected_seed = parts[2] | ||
detected_timestamp = parts[3].removesuffix(".json") | ||
assert detected_seed.isdigit(), ( | ||
f"Failed to extract seed from filename. Ended up with seed {detected_seed} given filename {file_path}" # noqa: E501 | ||
) | ||
assert detected_timestamp.isdigit(), ( | ||
f"Failed to extract timestamp from filename. Ended up with timestamp {detected_timestamp} given filename {file_path}" # noqa: E501 | ||
) | ||
file_path = Blob.get_filepath( | ||
detected_fork, int(detected_seed), int(detected_timestamp) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels unnecessary, is it just for debugging purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the test writer tries to import an existing blob via filename this code ensures that the name of the blob file is valid (as in matches the expected format). I would prefer to keep this, if someone tries to import an invalidly named blob this way we at least get a descriptive error message
Edit: it was removed because entire function was reworked due to suggested filename change
src/ethereum_test_types/blob.py
Outdated
assert self.fork in ["osaka"], ( | ||
f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert self.fork in ["osaka"], ( | |
f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}" | |
) | |
assert self.fork.blob_kzg_cell_proofs() > 0, ( | |
f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}" | |
) |
src/ethereum_test_types/blob.py
Outdated
assert len(self.cells) == 128, ( | ||
f"You are supposed to pass a full cell list with 128 elements to this function, but got list of length {len(self.cells)}" # noqa: E501 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert len(self.cells) == 128, ( | |
f"You are supposed to pass a full cell list with 128 elements to this function, but got list of length {len(self.cells)}" # noqa: E501 | |
) | |
assert len(self.cells) == self.fork.blob_kzg_cell_proofs(), ( | |
f"You are supposed to pass a full cell list with {self.fork.blob_kzg_cell_proofs()} elements to this function, but got list of length {len(self.cells)}" # noqa: E501 | |
) |
Same for the rest of the function, it can be adapted in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't think this rename is necessary, were you getting an error somehow with the original name? If so, could you share so we can help debug pls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are shadowing the stdlib "types" when we have types.py
. This means any import that implictly requires stdlib types will fail with ImportError: cannot import name 'GenericAlias' from partially initialized module 'types' (most likely due to a circular import)
. Try the following once with types.py
and then again with eest_types.py
:
echo 'from enum import Enum' > ./src/ethereum_test_types/abcd.py
uv run python ./src/ethereum_test_types/abcd.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch :D I think this is particular to using uv run python ./src/...
but still I think it should be ok to finally split types.py into different files.
But I think that one deserves its own PR, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we take the local cache approach, we should remove these static blob files from the branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's remove them
a7861a5
to
35ffb69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is taking shape and I think it's closer to being done! 🚀
I've left a couple more minor comments.
Thanks for putting the effort into this so far! Let's try to get it through the finish line :)
src/ethereum_test_forks/__init__.py
Outdated
from typing import Literal | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary.
from typing import Literal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File is no longer necessary.
@@ -22,6 +23,8 @@ | |||
Transaction, | |||
TransactionException, | |||
) | |||
from ethereum_test_types import Blob | |||
from tests.cancun.eip4844_blobs.common import blobs_to_transaction_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from tests.cancun.eip4844_blobs.common import blobs_to_transaction_input | |
from .common import blobs_to_transaction_input |
whitelist.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file got sorted at some point? Could you try to reset the file to the version in the base branch (should be main) and then just add the lines that are required for this PR?
kzg_proof=INF_POINT, | ||
), | ||
# you can provide the blob via file name | ||
Blob.from_file("blob_cancun_0_0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only use the more-abstract from_fork method in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit-pick but could we rename this file to "kzg_trusted_setup.txt"?
|
||
versioned_hash: Hash | ||
name: str # blob_<fork>_<seed> | ||
fork: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fork: str | |
fork: Fork |
After #1686, and can also delete all str
<->Fork
conversions everywhere in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side-effect of this change is that pydantic now struggles to serialize and deserialize Blob object (when we sticked to basic python types like fork: str
sth like Blob.model_validate_json(json_str)
and self.model_dump_json()
'just worked' but now we have to add hard-to-read logic to tell pydantic how to handle the Fork type at serialization and deserialization:
@field_validator("fork", mode="before")
@classmethod
def validate_fork(cls, v):
"""
When reading JSON file and trying to go back from fork string to fork object we must
tell pydantic how to do this.
"""
if isinstance(v, str):
return fork_string_to_object(v)
return v
@field_serializer("fork")
def serialize_fork(self, fork: Fork) -> str:
"""
When trying to serialize a Blob object into a JSON file we must
tell pydantic how to do this.
"""
return fork.name()
@marioevz Please look at the latest commit, I added Edit: I think everything is working now, can be reviewed & merged |
4bf0f25
to
f07306e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Just a couple of comments but we can merge after those 👍
kzg_commitment=INF_POINT, | ||
kzg_proof=INF_POINT, | ||
), | ||
Blob.from_fork(Cancun), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blob.from_fork(Cancun), | |
Blob.from_fork(fork), |
We can just pass the fork here to avoid constraining the test to a hard-coded fork.
kzg_commitment=INF_POINT, | ||
kzg_proof=INF_POINT, | ||
) | ||
Blob.from_fork(Cancun), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blob.from_fork(Cancun), | |
Blob.from_fork(fork), |
Same here.
kzg_commitment=INF_POINT, | ||
kzg_proof=INF_POINT, | ||
) | ||
Blob.from_fork(Cancun), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blob.from_fork(Cancun), | |
Blob.from_fork(fork), |
from ethereum_test_forks import Fork | ||
from ethereum_test_forks.forks.forks import Cancun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from ethereum_test_forks.forks.forks import Cancun |
If the rest of the comments are applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
96fb60d
to
31a541c
Compare
4a23956
to
1e350f2
Compare
841673b
to
329e19f
Compare
…many blobs (7vs9, geth)
18393da
to
abab88c
Compare
🗒️ Description
Added an eest wrapper for ckzg functions we will probably use in our tests.
A few open questions:
trusted_setup.txt
, I just import in the functions that need it and it assumes it is in the same folder. Is this compatible with our pytest framework or do we need to change how this file is accessedcompute_cells
complainsNext steps:
🔗 Related Issues
✅ Checklist
mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.