Per-connection decoders with PoC `citext` and `hstore` support #89

bjeanes · 2017-03-25T14:35:50Z

This is a follow-up to my thoughts in #88.

Summary

Reading from a result set picks a decoder via a connection-local decoder map instead of the global list. Misses on the local map fall back to the existing global map of handlers.

When a connection is first initialised, it is immediately used to interrogate pg_type and expand on the statically-defined global decoder map.

This mechanism is used to implement citext support.
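To make the lookup order concrete, here is a minimal sketch of a connection-local map that falls back to the global one. The class names, the `decoder_for` method and the OID `16_385` are hypothetical stand-ins for illustration, not the shard's actual API; only OID 25 for `text` is a real built-in value.

```crystal
# Hypothetical stand-ins for the shard's decoder types.
abstract class Decoder
  abstract def name : String
end

class TextDecoder < Decoder
  def name : String
    "text"
  end
end

class CitextDecoder < Decoder
  def name : String
    "citext"
  end
end

# Static, process-wide map keyed by well-known OIDs (25 is `text`).
GLOBAL_DECODERS = {25 => TextDecoder.new.as(Decoder)}

class Connection
  # Per-connection map for types whose OIDs differ between databases.
  getter local_decoders = Hash(Int32, Decoder).new

  # Local map first, then the global map; nil if neither knows the OID.
  def decoder_for(oid : Int32) : Decoder?
    local_decoders[oid]? || GLOBAL_DECODERS[oid]?
  end
end

conn = Connection.new
# Assumed OID, as it might be discovered from pg_type on one database.
conn.local_decoders[16_385] = CitextDecoder.new

puts conn.decoder_for(16_385).try(&.name) # served by the local map
puts conn.decoder_for(25).try(&.name)     # local miss, served by the global map
```

Keeping the fallback inside a single lookup method means callers never need to know which map answered.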

Notes

This will cause types to be determined multiple times in a connection pooling scenario. It may instead be preferable to do this once per DB. However, given that connections may or may not be pooled, the current approach seemed the most prudent, at least from a proof-of-concept standpoint.
Changes to the types during a connection won't be visible (e.g. CREATE EXTENSION). There are a few options here, but they don't seem deeply important to address now. It does mean the test for citext has to jump through some hoops, though.
The statically-defined global map of decoder handlers is depended upon to interpret the result set from querying pg_type.
I've left more comments and commented-out code than I usually would as the point of this PR is foremost to communicate the idea. For example, a small snippet of code shows how a specialised HstoreDecoder could be wired up, too.

No-op for now but it introduces some indirection to experiment with decoders for known types whose OIDs differ from database-to-database, such as `hstore` or `citext`.

Probably this could do with its own error type or even fail gracefully to the built-in static list.

Duh... `oid` is a hidden column that I can just query. No need to monkey around to figure it out.

This refactors the existing behaviour to have a shared interface (which could be an abstract module) for registering decoders:

Decoders.register_decoder(decoder : Decoders::Decoder, oid)
PG::Connection#register_decoder(decoder : Decoders::Decoder, oid)

While the `Decoders` module maintains responsibility for discovering types, it adds them to the connection's map via this interface, instead of returning a new `DecoderMap` for the `PG::Connection` to assign.

This also means the connection's `DecoderMap` is never nil and isn't swapped out. However, it is mutated. This only happens automatically before the connection is available for use by the caller but the `#register_decoder` method could be called later.
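A rough sketch of how such a shared registration interface might look. The `Registry` module name, the `decoders` getter and the shape of the `types` argument are assumptions made for illustration; only the two `register_decoder` signatures and `register_connection_decoders` come from this PR.

```crystal
module Decoders
  abstract class Decoder
  end

  class CitextDecoder < Decoder
  end

  # Hypothetical name for the shared interface mentioned above.
  module Registry
    abstract def register_decoder(decoder : Decoder, oid : Int32) : Nil
  end

  @@global = Hash(Int32, Decoder).new

  # Global registration, mirroring Decoders.register_decoder.
  def self.register_decoder(decoder : Decoder, oid : Int32) : Nil
    @@global[oid] = decoder
  end

  # Discovery pushes into the target's map instead of returning a new map.
  # `types` stands in for the {oid, name} rows read from pg_type.
  def self.register_connection_decoders(target : Registry, types : Array({Int32, String})) : Nil
    types.each do |oid, name|
      target.register_decoder(CitextDecoder.new, oid) if name == "citext"
    end
  end
end

# Stand-in for PG::Connection.
class Connection
  include Decoders::Registry

  # Never nil and never swapped out, only mutated.
  getter decoders = Hash(Int32, Decoders::Decoder).new

  def register_decoder(decoder : Decoders::Decoder, oid : Int32) : Nil
    decoders[oid] = decoder
  end
end

conn = Connection.new
# 16_385 is an assumed OID; the real value varies per database.
Decoders.register_connection_decoders(conn, [{16_385, "citext"}, {25, "text"}])
```

Because the connection owns its map and only exposes `register_decoder`, callers can add their own decoders later through the same entry point.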

I just reverse engineered this by looking at the output of the `ByteaDecoder` on these test values so this may not handle some other scenarios I haven't accounted for.

I think this could probably take most maps (e.g. Map(Symbol, T)) by always `#to_s`ing everything except `nil`. This encoder will need some stronger test cases to ensure values are escaped properly (I don't think `#inspect` will necessarily cut it).
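For reference, a minimal sketch of the escaping involved, assuming the textual hstore literal form (`"key"=>"value"` pairs, with an unquoted `NULL` for missing values). The helper names are hypothetical and this is not the encoder in this PR.

```crystal
# Quote one hstore key or value: backslash-escape `\` and `"`,
# then wrap the result in double quotes.
def hstore_quote(text : String) : String
  escaped = text.gsub(/[\\"]/) { |match| "\\#{match}" }
  %("#{escaped}")
end

# Encode any hash by calling #to_s on everything except nil values,
# which become an unquoted NULL.
def encode_hstore(pairs : Hash) : String
  parts = pairs.map do |key, value|
    encoded = value.nil? ? "NULL" : hstore_quote(value.to_s)
    "#{hstore_quote(key.to_s)}=>#{encoded}"
  end
  parts.join(", ")
end

puts encode_hstore({"a" => "1", "b" => nil})
puts encode_hstore({:quote => "say \"hi\""})
```

Escaping explicitly, rather than relying on `#inspect`, avoids Crystal-specific escapes such as `\n` or `\u{...}` ending up in the literal.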

bjeanes · 2017-03-26T00:19:10Z

Added an HstoreDecoder and param handling but it's probably pretty naïve.
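As a point of comparison, here is a small sketch of decoding the binary hstore wire format. It assumes the layout, as I understand it, is a big-endian Int32 pair count followed, for each pair, by a length-prefixed key and a length-prefixed value, with a length of -1 marking NULL. The function name is hypothetical and this is not the decoder added in this PR.

```crystal
# Assumed layout: Int32 count, then per pair:
#   Int32 key length, key bytes, Int32 value length (-1 for NULL), value bytes.
def decode_hstore(io : IO) : Hash(String, String?)
  result = Hash(String, String?).new
  count = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
  count.times do
    key_size = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
    key = io.read_string(key_size)
    value_size = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
    # A negative length means the value is NULL and no bytes follow.
    result[key] = value_size < 0 ? nil : io.read_string(value_size)
  end
  result
end
```

Reading lengths explicitly, rather than inferring structure from sample output, should also cover empty strings and NULL values.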

bjeanes · 2017-03-26T21:35:43Z

Further thoughts:

A lot of types will use the same decoder and, as more types are supported, the number of these decoder instances will grow with the connections. They aren't stateful, so perhaps they could be made singletons.
This first pass just demonstrates loading a fallback decoder based on the type category and loading one based on the type name. However, for other DOMAINs actually looking up the decoder based on the base type's OID would be good.
Array decoding doesn't (yet) benefit from this approach. pg_type does contain enough information that we could add extra array decoders. Though I don't yet know enough about Crystal to see how this dynamic approach could support the recursive type definition employed now.
Is there any reason read_array and read approaches can't be unified? Array entries in pg_type include OID for the elements' type, which could be used to pass in the element decoder.
This code still needs a bit of factoring (I don't think this new code all belongs in Decoders directly; it probably belongs in a child namespace).

will · 2017-03-26T20:36:43Z

spec/pg/encoder_spec.cr

-    PG_DB.exec "drop table if exists test_table"
-    PG_DB.exec "create table test_table (v #{datatype})"
+    begin
+      PG_DB.exec "create extension \"#{extension}\"" if extension


Maybe add `if not exists` here and `if exists` in the `drop extension` later on, to avoid errors if the test database isn't 100% clean.

will · 2017-03-29T21:35:34Z

I need to think more on your questions. I'm at a postgres conf this week (and giving a talk including the protocol parts of this project 😀), so I'll dive in next week. Thanks for putting this together!

bjeanes · 2017-03-29T21:37:52Z

No worries! I actually have another approach that may generalise better and handle more cases, so I may have another similar PR next week anyway (if I can spare the time; otherwise I'll leave a comment describing it instead).

will · 2017-04-15T00:57:52Z

Hey @bjeanes did you have any progress on the other approach?

bjeanes · 2017-04-18T01:35:40Z

@will ah no I got sidetracked with other things.

I was basically wanting to turn the DecoderMap, a Hash(OID, Decoder), into a TypeMap, a Hash(OID, PG::Type). PG::Type is a struct which includes a (singleton) reference to a decoder but also more information from the pg_type table. I don't know if this is actually valuable but I have a hunch that it could make array decoding less of a special case and allow for some sane assumptions for newly encountered types based on the category (geometric, numeric, etc).
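A rough sketch of what such a type map might look like. `PGType`, `TypeMap` and `element_type` are hypothetical names, and the fields mirror the `oid`, `typname`, `typcategory` and `typelem` columns of pg_type; OIDs 23 and 1007 are the built-in `int4` and `_int4` types.

```crystal
abstract class Decoder
end

class Int4Decoder < Decoder
  # Decoders are stateless, so a single shared instance is enough.
  INSTANCE = new
end

# One entry per row of pg_type; element_oid is 0 for non-array types.
record PGType,
  oid : Int32,
  name : String,
  category : Char,
  element_oid : Int32,
  decoder : Decoder?

alias TypeMap = Hash(Int32, PGType)

# For array types (category 'A'), follow element_oid to the element's entry,
# so the element decoder can be found without a hard-coded array decoder.
def element_type(types : TypeMap, oid : Int32) : PGType?
  if entry = types[oid]?
    return types[entry.element_oid]? if entry.category == 'A'
  end
  nil
end

types = TypeMap.new
types[23] = PGType.new(23, "int4", 'N', 0, Int4Decoder::INSTANCE)
types[1007] = PGType.new(1007, "_int4", 'A', 23, nil)

puts element_type(types, 1007).try(&.name) # element type of the int4 array
```

With the category stored alongside the decoder, an unknown OID could also be given a fallback decoder per category, as described above.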

If not that approach, I think my actual PoC here could be factored to be less ad-hoc as I kinda just dumped some code in the existing decoder code.

I dunno... what do you think? Did you have any imagined direction when you've contemplated per-connection type support? Anything you'd want to see to have this or a derivative of it merged?

bjeanes · 2017-04-18T01:37:14Z

Oh I also think there's more opportunity for composite or higher-order decoders to leverage the primitive decoders and perhaps for some symmetry between encoding/decoding, which might make it a bit clearer when adding support for future types.

bjeanes · 2017-04-18T01:46:30Z

src/pg/decoder.cr

+    def self.register_connection_decoders(connection : PG::Connection) : Void
+      types = connection.query_all(TYPE_SQL, as: {UInt32, String, Char})
+      types.each do |oid, name, category|
+        oid = oid.to_i32 # Query execution if I read straight to Int32 :\


Oops this comment should read "query execution hangs if...".

That was a reminder to me to ask @will why that is? I was fighting a lot of mysterious hangs if I didn't get things right. It can be hard to debug sometimes. Any advice?

bjeanes · 2018-02-18T02:02:31Z

I've been thinking about picking back up my side project that needed this functionality and drove this PR. @will are you still working on this lib and if so what do you see as the path forward for hstore/citext support?

will · 2018-02-19T02:49:47Z

Yeah, this is still a project. I think any sort of extension data type where the oid changes from database to database will require a per-connection mapping. If you're interested in working on this more, I'm happy to help if you run into anything.

bjeanes added 8 commits March 25, 2017 23:21

Refactor decoder lookup to go via connection

d2fd80f

No-op for now but it introduces some indirection to experiment with decoders for known types whose OIDs differ from database-to-database, such as `hstore` or `citext`.

Add citext support as POC for other types

11810c0

Don't mask decoder loading err as connection err

d9eb61c

Probably this could do with its own error type or even fail gracefully to the built-in static list.

Fix pg_type introspection for earlier versions

a1b702e

Duh... `oid` is a hidden column that I can just query. No need to monkey around to figure it out.

Add rudimentary HstoreDecoder

73488b3

I just reverse engineered this by looking at the output of the `ByteaDecoder` on these test values so this may not handle some other scenarios I haven't accounted for.

Add encoder spec for citext

add941d

bjeanes mentioned this pull request Mar 26, 2017

Decoding types with unstable OIDs #88

Closed

Detect NULLs in a way consisent with rest of code

2dc4bda

bjeanes force-pushed the per-connection-decoders branch from aa24fb6 to 2dc4bda Compare March 26, 2017 05:02

bjeanes changed the title ~~Per-connection decoders with PoC citext support~~ Per-connection decoders with PoC citext and hstore support Mar 26, 2017

will reviewed Mar 29, 2017

Only create/drop extensions in tests if needed

58e1211

bjeanes force-pushed the per-connection-decoders branch from d5ffde1 to 58e1211 Compare April 18, 2017 01:42

bjeanes commented Apr 18, 2017

thomasnal mentioned this pull request Aug 29, 2017

How to read postgres hstore column? imdrasil/jennifer.cr#25

Closed

will mentioned this pull request Aug 5, 2019

add register decoder with oid #180

Closed

bjeanes mentioned this pull request Jun 3, 2020

[PostgreSQL] citext is not compatable with text launchbadge/sqlx#295

Closed

matthewmcgarvey mentioned this pull request Dec 31, 2020

Read Slice(UInt8) as String if given as type #221

Merged

jgaskins mentioned this pull request Apr 20, 2022

Add support for extensions like citext #250

Draft

bjeanes closed this Nov 22, 2022
