feat(chain): add BatchParser #134


Open · wants to merge 3 commits into main

Conversation

@victoria-yining-huang (Contributor) commented Jun 12, 2025

This BatchParser is meant to be used after a Batch step; it should not be confused with the Parser step, which is used before a Batch step.

I also updated the example in Batching.py to use this BatchParser, and tested it successfully locally.
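For context, the batch-in/batch-out contract described above can be sketched in plain Python. This is an illustrative sketch, not the real sentry_streams API; `batch_parse` and its signature are hypothetical:

```python
import json
from typing import Any, Callable, List, Sequence

# Illustrative sketch of the batch-in/batch-out contract: the parser
# receives a whole batch of raw payloads and returns a whole batch of
# parsed messages, so batch boundaries survive the step.
def batch_parse(
    payloads: Sequence[bytes],
    decode: Callable[[bytes], Any] = json.loads,
) -> List[Any]:
    # One decoded message per input payload; the batch stays one unit.
    return [decode(p) for p in payloads]

batch = [b'{"a": 1}', b'{"b": 2}']
parsed = batch_parse(batch)  # one output batch per input batch
```

The key property is that the step never splits the batch: a downstream step still sees one list per upstream batch.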

@fpacifici (Collaborator) left a comment

Please write a unit test for the new parser function before merging.

@fpacifici (Collaborator) left a comment

Actually, CI does not pass now.

Comment on lines 44 to 49:

```python
class FakeCodec:
    def decode(self, payload: bytes, _: Any) -> Mapping[str, int]:
        if payload == b'{"a": 1}':
            return {"a": 1}
        else:
            return {"b": 2}
```
Member:

Instead of returning a fake codec here, I would suggest using the Codec class from kafka schemas directly, just to exercise more code. If there is a breaking change in sentry-kafka-schemas (intentional or not), we're more likely to catch it with this test.

Same for FakeMessage: it seems feasible to me to create a real message?

Contributor Author:

Used real classes.

@fpacifici (Collaborator) left a comment
Please address the comments

Comment on lines 39 to 42:

```python
monkeypatch.setattr(
    "sentry_streams.pipeline.msg_parser._get_codec_from_msg",
    lambda _: JsonCodec(json_schema={}),
)
```
Collaborator:

Why do you need to monkeypatch? Please consider using a real schema like ingest-metrics.
While in unit tests you want to keep your test restricted in scope and fast, mocking the schema does not buy you anything here: there is no dependency on external systems.
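The alternative the reviewer is pointing at can be sketched as follows. This is an illustrative sketch with hypothetical names (`SimpleJsonCodec`, `parse_message`), not the real sentry_streams or sentry-kafka-schemas API: when the codec is supplied to the parse path instead of being patched into it, the test exercises the real decode logic without any monkeypatching.

```python
import json
from typing import Any, Mapping

class SimpleJsonCodec:
    # Stand-in for a schema-aware codec; a real JsonCodec would also
    # validate the payload against its JSON schema.
    def decode(self, payload: bytes) -> Mapping[str, Any]:
        return json.loads(payload)

def parse_message(payload: bytes, codec: SimpleJsonCodec) -> Mapping[str, Any]:
    # The codec arrives as an argument, so the test needs no
    # monkeypatching and the decode path runs for real.
    return codec.decode(payload)

result = parse_message(b'{"a": 1}', SimpleJsonCodec())
```

The design point is dependency injection: patching `_get_codec_from_msg` hides the codec lookup from the test, while passing a codec (real or otherwise) keeps that code under test.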

Member:

If we remove the schema, or even just alter it, we'll have to change this. I personally regret using real schemas/datasets in some unit tests in Snuba: I changed or removed a schema and suddenly had to change random, irrelevant tests that fully relied on it.

Contributor Author:

I can hardcode a real schema instead of importing it; then the tests won't be affected if the schema changes. But that makes no difference compared to using a fake schema, IMO, since they're both hardcoded.

Contributor Author:

done ^

Collaborator:

> If we remove the schema, or even just alter it, we'll have to change this. I personally regret using real schemas/datasets in some unit tests in Snuba: I changed or removed a schema and suddenly had to change random, irrelevant tests that fully relied on it.

Don't worry, sentry-kafka-schemas will not be there for long; we need to decouple that repo from this one (even the licenses are incompatible). So feel free to use a real schema.

Collaborator:

@victoria-yining-huang my issue is having to hijack the behavior of the parser by monkeypatching the codec, not whether the schema is the real one. Any schema is fine; I'd avoid monkeypatching because it makes the test cover less code.

Contributor Author:

I used a real example and a real schema imported from the sentry_kafka_schema lib for the test now.

Contributor Author:

Removed the monkeypatch.

Contributor Author:

Added test cases for a missing schema and a wrong schema. This covers the _get_codec_from_msg function.

Commits:

- add all draft
- remove prints
- frmt
- move exception block
- frmt
- typing
- typing
- input output type
- typing
- add generic tout
- push for riya
- type fixes
- remove comment
- add unit test
- add unit test
- use real classes
- dont cast
- fix module path
- hardcode a real codec
- use real things from sentry kafka schema
@evanh (Member) left a comment

Why does the BatchParser output another batch? Should it use a FlatMap instead?

@victoria-yining-huang (Contributor Author) commented Jun 18, 2025

@evanh I addressed the question "Why does the BatchParser output another batch?" in the weekly meeting. The design choice for batching is batch in, batch out: batches stay intact throughout every single step.

Re "Should it use a FlatMap instead?": FlatMap is not fully implemented yet (

```python
def flat_map(self, step: FlatMap, stream: Route) -> Route:
    """
    Builds a flat-map operator for the platform the adapter supports.
    """
    raise NotImplementedError
```

) so that will not work. I'm not sure whether it should eventually use FlatMap, though. @fpacifici?

@victoria-yining-huang victoria-yining-huang changed the title add BatchParser feat(chain): add BatchParser Jun 18, 2025
@fpacifici (Collaborator):

> I'm not sure if it eventually should use FlatMap or not tho. @fpacifici ?

No. The BatchParser is just syntactic sugar over a Batch -> custom Map.
If you want to expand the batch after parsing, you would plug in a FlatMap.
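That composition can be sketched as follows. This is illustrative code, not the sentry_streams API (`parse_batch` and `flatten` are hypothetical names): the BatchParser behaves like a Map applied to each whole batch, and a FlatMap afterwards would expand parsed batches back into individual messages.

```python
import json
from typing import Any, List, Sequence

def parse_batch(batch: Sequence[bytes]) -> List[Any]:
    # The "Map" half of the sugar: one parsed batch out per batch in.
    return [json.loads(p) for p in batch]

def flatten(batches: Sequence[Sequence[Any]]) -> List[Any]:
    # What a FlatMap would do: expand batches back into single messages.
    return [msg for b in batches for msg in b]

parsed = parse_batch([b'{"a": 1}', b'{"b": 2}'])
messages = flatten([parsed])
```

This mirrors the reviewer's point: parsing keeps the batch intact, and only an explicit FlatMap step changes the batch boundaary back to per-message granularity.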

5 participants