What happened?
Beam Version: 2.62
Scala Version: 2.12
Java Version: 11
Python Version: 3.12
In a Scala pipeline, I produce a `PCollection[Row]` and pass it to `PythonExternalTransform` via something like:

```scala
pcollTuple
  .apply(SqlTransform.query("..."))
  .apply(PythonExternalTransform.from("my_package.MyTransform", "localhost:65001"): PythonExternalTransform[PCollection[Row], PCollection[Row]])
```
and the corresponding Python code:

```python
import apache_beam as beam
from apache_beam.pvalue import Row

class MyTransform(beam.PTransform):
    def expand(self, pcoll: beam.PCollection[Row]) -> beam.PCollection[Row]:
        return pcoll | ...
```
I end up with the following error:

```
apache_beam.typehints.decorators.TypeCheckError: Input type hint violation at MyTransform: expected <class 'apache_beam.pvalue.Row'>, got <class 'apache_beam.typehints.schemas.BeamSchema_6a7b1e90_8aac_43f6_8caa_aadca7856423'>
```
Alternatives I've tried to work around the issue are:

- Creating a `typing.NamedTuple` class to match the expected inbound schema, and changing the `pcoll` input type hint to match
- Using `beam.row_type.RowTypeConstraint.from_fields` from #25749

Both methods gave similar errors about not being able to validate the type hint; a rough sketch of both attempts is below.
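For reference, this is roughly the shape of the two workarounds; the schema fields (`id`, `name`) are placeholders used only for illustration, since the real fields come from the `SqlTransform` output:

```python
import typing

import apache_beam as beam
from apache_beam.typehints.row_type import RowTypeConstraint

# Placeholder schema -- the real fields mirror the schema produced by SqlTransform.
class MyRow(typing.NamedTuple):
    id: int
    name: str

beam.coders.registry.register_coder(MyRow, beam.coders.RowCoder)


# Attempt 1: NamedTuple-based type hints on expand().
class MyTransform(beam.PTransform):
    def expand(self, pcoll: beam.PCollection[MyRow]) -> beam.PCollection[MyRow]:
        return pcoll | beam.Map(lambda row: row)


# Attempt 2: a row type constraint built from explicit fields
# (per RowTypeConstraint.from_fields, see #25749).
ROW_TYPE = RowTypeConstraint.from_fields([('id', int), ('name', str)])


class MyTransformWithConstraint(beam.PTransform):
    def expand(self, pcoll):
        # Apply the constraint to the inner transform's type hints.
        return (
            pcoll
            | beam.Map(lambda row: row)
                .with_input_types(ROW_TYPE)
                .with_output_types(ROW_TYPE))
```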
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner