Annotation-driven SQL for UDTs. #59

jcflack · 2015-10-19T05:52:28Z

The 2013 work on the SQL generator didn't add annotations for setting up PL/Java User Defined Types ... an unhappy omission because hand-coding the SQL for UDTs (base type UDTs, anyway) is fiddly and tedious. So here are two new annotations:

@BaseUDT class foo implements SQLData { ... }

as long as the class implements the interface and also has the necessary parse method and no-arg constructor, will generate all the SQL (shell type, four function declarations, complete type redeclaration) for a new base type.

@MappedUDT(structure={"x float8", "y float8"}) class cplx implements SQLData ...

will emit SQL to declare a new composite/structured type with the specified structure, and call sqlj.add_type_mapping to associate it with the class.

If PostgreSQL says "composite" and the SQL standard says "structured", why did I say "mapped"? Well, because you can use the same annotation to do this (even though it's arguably not a UDT at all then):

@MappedUDT class point implements SQLData ...

If no structure is specified, no type declaration is actually emitted, only the call to sqlj.add_type_mapping to associate the class with some existing PostgreSQL type.

Includes updates to the ComplexScalar, ComplexTuple, and Point examples to make them annotation-driven, and remove the associated SQL from the hand-maintained examples.ddr as it now appears by magic in the generated pljava.ddr.

In the examples, I still use complexType on return values and @SQLType on parameter values to indicate the SQL type corresponding to the class ... one would think the generator could learn that much from seeing the annotation saying the type is the class ... but that will take a bit more rework of the generator's type mapper than appeals to me just now, and complexType/SQLType are not that hard to use, so it can go on the enhancement queue for later.

I had not run these examples (specifically, their logAndReturn methods) before, and trying them out seems to uncover a couple areas for attention. ComplexScalar works great (logs and returns the correct values); ComplexTuple produces equally good results but also a TupleDesc reference leak warning, and Point succeeds and returns the correct values, but emits a log message with different values (that is, after adding a toString method to be able to see the values). As for Point to work it has to intimately know the PostgreSQL internal point layout, maybe these bogus values mean some detail has changed there. I see the same behavior on the master branch.

Turns out the ComplexTuple reference leak warning goes away if the logAndReturn function is declared IMMUTABLE STRICT as it was in the hand-maintained examples.ddr; the generator makes it VOLATILE by default unless annotated otherwise (and that produces the same warning even on the master branch), so I'll adjust the annotation. Maybe it is worth exploring why marking such a simple function VOLATILE would lead to a reference leak, though.

- Add the annotation, have DDRProcessor recognize it. - Check the annotated class for the required properties and members. - The snippets map formerly allowed only one type of annotation on a given element. That happened to work, but not now when a class could have both a UDT and a SQLAction annotation, for example. Adapt the map to key by element and snippet class, so snippets of different classes can be hung on an element and selectively retrieved. - An old comment in populateAnnotationImpl suggested it would reduce boilerplate setter code if, failing to find a setter method, the field itself could be reflectively looked up and stored. That's done now, so setter methods are needed only when something more special has to be done. - Function declarations will be synthesized automagically for the four mandatory (in, out, recv, send) methods, but to allow their properties to be individually adjusted, they can still accept Function annotations. That's done by hanging a new subclass of FunctionImpl on those elements, that generates the right special form of declaration, and has setters that refuse changes to certain properties where that wouldn't make sense. - Treat the default implementor-tag (PostgreSQL if not changed with new ddr.implementor command line property) specially. Now that everything is getting wrapped with an implementor tag by default, the implied requires="implementor-tag" was holding everything back until released by the cycle-breaker, sometimes in puzzling order. The implementor tag that happens to be the default one should /not/ be treated as a pending requirement. - Move ComplexScalar to annotation subpackage, in preparation for revamping it as an annotation example. So git won't lose track of it, this is a breaking change, to be fixed in the next commit with edits corresponding to the move.

Make the ComplexScalar example an annotation example, and remove the corresponding code from the hand-maintained examples.ddr.

It is much slimmer when able to use a UDT annotation. :)

... to clear the way for also having MappedUDT. So git doesn't lose it, no edits in the file yet, so it won't compile until the following commit.

Supporting MappedUDT is much simpler. It emits a CREATE TYPE foo AS ( ... structure ... ) if a structure is provided, or not, if it isn't, followed by a SELECT sqlj.add_type_mapping( ... ), and that's it. Much javadoc also added. This commit ends with pure renames of the Point and ComplexTuple examples in preparation to make them annotation examples ... breaking compilation until they are fixed up in the next commit, but git can see where they went.

Completes the work on MappedUDT, with Point and ComplexTuple now annotation examples, and the corresponding code removed from the old hand-maintained examples.ddr. In a couple last this-really-has-to-be-it review passes... - some msg() calls had the Element parameter in the wrong place (and, being variadic and expecting extra parameters, didn't warn about it). - whether to include length/alignment/storage in the output when 'like' is used should explicitly depend on whether they were set explicitly (is that explicit enough?); otherwise, in some cases it wouldn't be possible to override the values copied by like. - removed a goofy constant in the VarlenaUDTTest left over from very early sanity checking. - Shortened unnecessary full-qualification on some enums actually in scope.

That's how it was declared in the hand-maintained examples.ddr, and letting it default to VOLATILE makes it produce TupleDesc reference leak warnings for some reason, so I've added IMMUTABLE to the annotation. At the same time, onNullInput=RETURNS_NULL is clearly needed, just by glancing at the function, which does no null checking. That's also true of Point, so that's added both places. This demonstrates an advantage of the SQL generation ... the place where you put the annotation is exactly where you're looking when you're looking at the code.

Javadoc for 'implementor' in BaseUDT and MappedUDT showed it had been copy/pasted from Function. package-info.java did not document new ddr.implementor. It also turned out that ddr.implementor can't use an empty string to mean "don't wrap" because javac passes such an option to the processor as if it were null or not specified ... so -Addr.implementor=- has been reserved for that purpose.

Annotation-driven SQL for UDTs.

Addresses issue #98. There really hasn't been a document that just lays out in detail how PL/Java's data coercions are chosen and especially how the raw-chunk-based versions of SQLInput/SQLOutput work. Discovered, while documenting that, the reason Point was showing bogus values (noted in pull request #59 comments) ... byte order ... leading to a last-minute rework of SQLInputFromChunk and SQLOutputToChunk allowing byte order to be selected. (It has to be selectable, not just fixed, in case PL/Java gets updated at a site that has a bunch of data stored under the old byte order. A switch allows dumping with the old order, then reloading with the new. In fact, by choosing a different byte order for each conversion direction, it is possible ... carefully ... to update in place.) A flag is passed to SQLInputFromChunk_create and SQLOutputFromChunk_create to indicate whether the type is a 'java-based scalar' (otherwise, it is a non-composite mirror), and this allows byte order to be specified separately for scalars and mirrors: historically, it was always bigendian regardless of architecture. Presumably that has never mattered much for java-based scalars (because PostgreSQL itself doesn't have any independent way of seeing into them, so no inconsistency was visible), but for a mirror type it is definitely broken if the hardware is not bigendian, with PostgreSQL and Java not seeing the values the same way. Therefore, this change leaves the scalar default (for 1.5.0 at least) at bigendian, in case some sites may have used Java-based scalars in the past and have tables that contain them. Also, the current UDT implementation gives every scalar type a binary send/receive/COPY format that can't (yet) be decoupled from its internal stored form, and the PostgreSQL docs specify big-endian for those binary transfer formats, so it would be premature to change the scalar byte-order default before it is also possible to have a separate transfer format. For mirror types, on the other hand, the default is here immediately changed to native ... it is less likely that sites were heavily using mirror UDTs if they were seeing bogus values, and this will make them work right by default. Java properties are used (set with -D in the pljava.vmoptions GUC): org.postgresql.pljava.udt.byteorder to set everything the same, org.postgresql.pljava.udt.byteorder.scalar org.postgresql.pljava.udt.byteorder.mirror to set them separately. The allowable values for each property are big_endian, little_endian, and native. Even finer-grained control is available with: org.postgresql.pljava.udt.byteorder.scalar.p2j org.postgresql.pljava.udt.byteorder.scalar.j2p org.postgresql.pljava.udt.byteorder.mirror.p2j org.postgresql.pljava.udt.byteorder.mirror.j2p for a very special purpose, should someone wish to attempt an UPDATE where within a session, a column with values written in one order gets rewritten in another, a fiddly and somewhat unnerving process that turns out to actually work, and has a full new page documenting it now.

jcflack added 8 commits October 18, 2015 22:58

(Unbreaking) complete the ComplexScalar rename.

43072b6

Make the ComplexScalar example an annotation example, and remove the corresponding code from the hand-maintained examples.ddr.

Get that issue 52 test into the tree.

fe2297e

It is much slimmer when able to use a UDT annotation. :)

Rename UDT to BaseUDT (breaking change).

b9a9d6d

... to clear the way for also having MappedUDT. So git doesn't lose it, no edits in the file yet, so it won't compile until the following commit.

thallgren added a commit that referenced this pull request Oct 20, 2015

Merge pull request #59 from jcflack/feature/master/udtsqlgen

0de8b77

Annotation-driven SQL for UDTs.

thallgren merged commit 0de8b77 into tada:master Oct 20, 2015

jcflack mentioned this pull request Mar 11, 2016

Bogus values on little-endian hardware when a UDT mirrors a PostgreSQL type #98

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Annotation-driven SQL for UDTs. #59

Annotation-driven SQL for UDTs. #59

jcflack commented Oct 19, 2015

Annotation-driven SQL for UDTs. #59

Annotation-driven SQL for UDTs. #59

Conversation

jcflack commented Oct 19, 2015