Conversation
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Merging this PR will degrade performance by 21.4%
|
| canonical_encoder: RwLock<Option<ArrowEncoderRef>>, | ||
| /// Fallback decoder used after the user chain has declined. | ||
| default_decoder: RwLock<Option<ArrowDecoderRef>>, | ||
| /// Fallback dtype reader used after the user chain has declined. | ||
| default_dtype_reader: RwLock<Option<ArrowDTypeReaderRef>>, |
There was a problem hiding this comment.
A RwLock still has contention if all accessors and readers, maybe an ArcSwap. I think we might want a wrapper for this pattern, not needed here however!
| fn to_arrow_array( | ||
| &self, | ||
| array: ArrayRef, | ||
| target: &DataType, |
There was a problem hiding this comment.
I think we want this to be an optional so if we don't mind we can return encoding specific values
| target: &DataType, | |
| target: Option<&DataType>, |
Sorry if I missed a comment regarding this.
There was a problem hiding this comment.
the idea was that you must pass in an explicit target physical type, whether that's derived from user or by calling preferred_arrow_type() first
| /// Returning [`Ok(None)`] passes the request to the next reader in the chain. | ||
| pub trait ArrowDTypeReader: 'static + Send + Sync + Debug { | ||
| /// Try to read a Vortex [`DType`] from an Arrow [`Field`]. | ||
| /// | ||
| /// Implementations typically inspect [`Field::metadata`] for the `ARROW:extension:name` | ||
| /// key and dispatch on it. | ||
| fn try_read_dtype(&self, field: &Field) -> VortexResult<Option<DType>>; | ||
| } |
There was a problem hiding this comment.
why have both fields and data_types? I guess nullability?
There was a problem hiding this comment.
yea both is kinda funny. i think if you want to cover extension dtypes then you need the Field b/c that has metadata. and also nullability yea.
| fn preferred_arrow_type( | ||
| &self, | ||
| array: &ArrayRef, | ||
| session: &ArrowSession, | ||
| ) -> VortexResult<Option<DataType>>; |
There was a problem hiding this comment.
two passes might be expensive?
There was a problem hiding this comment.
we need to make sure this doens't regress any benchmarks
| pub fn register_encoder_for_extension( | ||
| &self, | ||
| key: impl Into<Id>, | ||
| plugin: impl Into<ArrowEncoderRef>, |
There was a problem hiding this comment.
I think we should make these ArrowEncoderRef I would think we end up using the same for for many ids so we don't want a different instance for each?
Using might may lead to that?
|
looks like 3 people started doing the same thing in parallel 😄 #7726 |
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
|
It's a hot topic |
Summary
Closes: #000
Testing