feat(chain): add BatchParser #134
Conversation
Please write a unit test for the new parser function before merging.
Actually, CI does not pass now.
```python
class FakeCodec:
    def decode(self, payload: bytes, _: Any) -> Mapping[str, int]:
        if payload == b'{"a": 1}':
            return {"a": 1}
        else:
            return {"b": 2}
```
Instead of returning a fake codec here, I would suggest using the Codec class from sentry-kafka-schemas directly, just to exercise more code. If there is a breaking change in sentry-kafka-schemas (intentional or not), we're more likely to catch it with this test.
Same for FakeMessage: it seems feasible to me to create a real message?
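For illustration, here is a minimal stdlib-only stand-in for such a codec. Note this `JsonCodec` class and its `required_keys` parameter are a sketch for this discussion, not the actual sentry-kafka-schemas `Codec` API:

```python
import json
from typing import Any, Mapping


class JsonCodec:
    """Stdlib-only sketch of a schema codec; NOT the real sentry-kafka-schemas Codec."""

    def __init__(self, required_keys: tuple = ()) -> None:
        self.required_keys = required_keys

    def decode(self, payload: bytes, validate: bool = True) -> Mapping[str, Any]:
        # Parse the JSON payload, then optionally check it against the
        # (very simplified) "schema": a set of keys that must be present.
        data = json.loads(payload)
        if validate:
            missing = [k for k in self.required_keys if k not in data]
            if missing:
                raise ValueError(f"payload missing required keys: {missing}")
        return data


codec = JsonCodec(required_keys=("a",))
assert codec.decode(b'{"a": 1}') == {"a": 1}
```

The design point under debate is only where the codec comes from (a fake class vs. the real library); the decode-then-validate shape is the same either way.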
used real classes
Please address the comments
```python
monkeypatch.setattr(
    "sentry_streams.pipeline.msg_parser._get_codec_from_msg",
    lambda _: JsonCodec(json_schema={}),
)
```
Why do you need to monkeypatch? Please consider using a real schema like ingest-metrics.
While in unit tests you want to keep the test restricted in scope and fast, mocking the schema does not buy you anything here: there is no dependency on external systems.
If we remove the schema, or even just alter it, we'll have to change this. I personally regret using real schemas/datasets in some unit tests in Snuba: when I changed/removed a schema, I suddenly had to change random, irrelevant tests that fully relied on it.
I can hardcode a real schema instead of importing it; then the tests won't be affected if the schema changes. But IMO that makes no difference compared to using a fake schema, since they're both hardcoded.
done ^
> if we remove the schema or even just alter it, we'll have to change this. i personally regret using real schemas/datasets in some unit tests in snuba, as i changed/removed some schema and suddenly have to change random irrelevant tests that fully rely on it.

Don't worry, sentry-kafka-schemas will not be there for long; we need to decouple that repo from this one (even the licenses are incompatible). So feel free to use a real schema.
@victoria-yining-huang My issue is having to hijack the behavior of the parser by monkeypatching the codec, not whether the schema is the real one. Any schema is fine; I'd try to avoid monkeypatching because it makes the test cover less code.
I used a real example and schema imported from the sentry_kafka_schema lib for the test now.
removed monkeypatch
Added test cases for no schema or wrong schema. This covers the _get_codec_from_msg function.
Force-pushed from d674ad7 to e0337ac.
Force-pushed from 453763d to 6bda28f.
Why does the BatchParser output another batch? Should it use a FlatMap instead?
@evanh addressed the question "Why does the BatchParser output another batch?" in the weekly meeting. The design choice for batching is batch in, batch out: batches stay intact throughout every single step.

Re "Should it use a FlatMap instead?": I know FlatMap is not fully implemented yet (sentry_streams/sentry_streams/adapters/arroyo/rust_arroyo.py, lines 238 to 242 in 68500b1).

> No. The BatchParser is just syntactic sugar over a Batch -> Custom map.
This `BatchParser` is to be used after a `Batch` step, not to be confused with the `Parser` step, which is to be used before a `Batch` step.

Also updated the example in `Batching.py` to use this `BatchParser`; tested successfully locally.
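The batch-in, batch-out behavior described above can be sketched as follows. This `batch_parse` function is a standalone illustration of the design choice, not the actual `BatchParser` implementation:

```python
import json
from typing import Any, Callable, Mapping, Sequence


def batch_parse(
    batch: Sequence[bytes],
    decode: Callable[[bytes], Mapping[str, Any]] = json.loads,
) -> list:
    """Map a batch of raw payloads to a batch of parsed messages.

    The batch stays intact: one batch in, one batch of the same length out.
    This is why such a step is just syntactic sugar over applying a map
    inside an existing Batch, rather than a FlatMap that reshapes it.
    """
    return [decode(payload) for payload in batch]


parsed = batch_parse([b'{"a": 1}', b'{"b": 2}'])
assert parsed == [{"a": 1}, {"b": 2}]
assert len(parsed) == 2  # same batch size out as in
```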