Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotation-driven SQL for UDTs. #59

Merged
merged 8 commits into from
Oct 20, 2015

Conversation

jcflack
Copy link
Contributor

@jcflack jcflack commented Oct 19, 2015

The 2013 work on the SQL generator didn't add annotations for setting up PL/Java User Defined Types ... an unhappy omission because hand-coding the SQL for UDTs (base type UDTs, anyway) is fiddly and tedious. So here are two new annotations:

@BaseUDT class foo implements SQLData { ... }

as long as the class implements the interface and also has the necessary parse method and no-arg constructor, will generate all the SQL (shell type, four function declarations, complete type redeclaration) for a new base type.

@MappedUDT(structure={"x float8", "y float8"}) class cplx implements SQLData ...

will emit SQL to declare a new composite/structured type with the specified structure, and call sqlj.add_type_mapping to associate it with the class.

If PostgreSQL says "composite" and the SQL standard says "structured", why did I say "mapped"? Well, because you can use the same annotation to do this (even though it's arguably not a UDT at all then):

@MappedUDT class point implements SQLData ...

If no structure is specified, no type declaration is actually emitted, only the call to sqlj.add_type_mapping to associate the class with some existing PostgreSQL type.

Includes updates to the ComplexScalar, ComplexTuple, and Point examples to make them annotation-driven, and remove the associated SQL from the hand-maintained examples.ddr as it now appears by magic in the generated pljava.ddr.

In the examples, I still use complexType on return values and @SQLType on parameter values to indicate the SQL type corresponding to the class ... one would think the generator could learn that much from seeing the annotation saying the type is the class ... but that will take a bit more rework of the generator's type mapper than appeals to me just now, and complexType/SQLType are not that hard to use, so it can go on the enhancement queue for later.

I had not run these examples (specifically, their logAndReturn methods) before, and trying them out seems to uncover a couple areas for attention. ComplexScalar works great (logs and returns the correct values); ComplexTuple produces equally good results but also a TupleDesc reference leak warning, and Point succeeds and returns the correct values, but emits a log message with different values (that is, after adding a toString method to be able to see the values). As for Point to work it has to intimately know the PostgreSQL internal point layout, maybe these bogus values mean some detail has changed there. I see the same behavior on the master branch.

Turns out the ComplexTuple reference leak warning goes away if the logAndReturn function is declared IMMUTABLE STRICT as it was in the hand-maintained examples.ddr; the generator makes it VOLATILE by default unless annotated otherwise (and that produces the same warning even on the master branch), so I'll adjust the annotation. Maybe it is worth exploring why marking such a simple function VOLATILE would lead to a reference leak, though.

- Add the annotation, have DDRProcessor recognize it.
- Check the annotated class for the required properties and members.
- The snippets map formerly allowed only one type of annotation on
  a given element. That happened to work, but not now when a class
  could have both a UDT and a SQLAction annotation, for example.
  Adapt the map to key by element and snippet class, so snippets
  of different classes can be hung on an element and selectively
  retrieved.
- An old comment in populateAnnotationImpl suggested it would reduce
  boilerplate setter code if, failing to find a setter method, the
  field itself could be reflectively looked up and stored. That's done
  now, so setter methods are needed only when something more special
  has to be done.
- Function declarations will be synthesized automagically for the four
  mandatory (in, out, recv, send) methods, but to allow their properties
  to be individually adjusted, they can still accept Function
  annotations. That's done by hanging a new subclass of FunctionImpl on
  those elements, that generates the right special form of declaration,
  and has setters that refuse changes to certain properties where that
  wouldn't make sense.
- Treat the default implementor-tag (PostgreSQL if not changed with new
  ddr.implementor command line property) specially. Now that everything
  is getting wrapped with an implementor tag by default, the implied
  requires="implementor-tag" was holding everything back until released
  by the cycle-breaker, sometimes in puzzling order. The implementor tag
  that happens to be the default one should /not/ be treated as a
  pending requirement.
- Move ComplexScalar to annotation subpackage, in preparation for
  revamping it as an annotation example. So git won't lose track of it,
  this is a breaking change, to be fixed in the next commit with edits
  corresponding to the move.
Make the ComplexScalar example an annotation example, and remove the
corresponding code from the hand-maintained examples.ddr.
It is much slimmer when able to use a UDT annotation. :)
... to clear the way for also having MappedUDT.

So git doesn't lose it, no edits in the file yet, so it won't compile
until the following commit.
Supporting MappedUDT is much simpler. It emits a
CREATE TYPE foo AS ( ... structure ... )
if a structure is provided, or not, if it isn't, followed by a
SELECT sqlj.add_type_mapping( ... ), and that's it.

Much javadoc also added.

This commit ends with pure renames of the Point and ComplexTuple
examples in preparation to make them annotation examples ... breaking
compilation until they are fixed up in the next commit, but git can
see where they went.
Completes the work on MappedUDT, with Point and ComplexTuple now
annotation examples, and the corresponding code removed from the
old hand-maintained examples.ddr.

In a couple last this-really-has-to-be-it review passes...

- some msg() calls had the Element parameter in the wrong place (and,
  being variadic and expecting extra parameters, didn't warn about it).
- whether to include length/alignment/storage in the output when 'like'
  is used should explicitly depend on whether they were set explicitly
  (is that explicit enough?); otherwise, in some cases it wouldn't be
  possible to override the values copied by like.
- removed a goofy constant in the VarlenaUDTTest left over from very
  early sanity checking.
- Shortened unnecessary full-qualification on some enums actually in
  scope.
That's how it was declared in the hand-maintained examples.ddr, and
letting it default to VOLATILE makes it produce TupleDesc reference
leak warnings for some reason, so I've added IMMUTABLE to the annotation.

At the same time, onNullInput=RETURNS_NULL is clearly needed, just by
glancing at the function, which does no null checking. That's also true
of Point, so that's added both places. This demonstrates an advantage of
the SQL generation ... the place where you put the annotation is exactly
where you're looking when you're looking at the code.
Javadoc for 'implementor' in BaseUDT and MappedUDT showed it had been
copy/pasted from Function. package-info.java did not document new
ddr.implementor. It also turned out that ddr.implementor can't use an
empty string to mean "don't wrap" because javac passes such an option
to the processor as if it were null or not specified ... so
-Addr.implementor=- has been reserved for that purpose.
thallgren added a commit that referenced this pull request Oct 20, 2015
@thallgren thallgren merged commit 0de8b77 into tada:master Oct 20, 2015
jcflack added a commit that referenced this pull request Mar 19, 2016
Addresses issue #98. There really hasn't been a document that just lays
out in detail how PL/Java's data coercions are chosen and especially how
the raw-chunk-based versions of SQLInput/SQLOutput work. Discovered,
while documenting that, the reason Point was showing bogus values (noted
in pull request #59 comments) ... byte order ... leading to a
last-minute rework of SQLInputFromChunk and SQLOutputToChunk allowing
byte order to be selected.

(It has to be selectable, not just fixed, in case PL/Java gets updated
at a site that has a bunch of data stored under the old byte order.
A switch allows dumping with the old order, then reloading with the
new. In fact, by choosing a different byte order for each conversion
direction, it is possible ... carefully ... to update in place.)

A flag is passed to SQLInputFromChunk_create and
SQLOutputFromChunk_create to indicate whether the type is
a 'java-based scalar' (otherwise, it is a non-composite mirror),
and this allows byte order to be specified separately for scalars
and mirrors: historically, it was always bigendian regardless of
architecture. Presumably that has never mattered much for java-based
scalars (because PostgreSQL itself doesn't have any independent way
of seeing into them, so no inconsistency was visible), but for a
mirror type it is definitely broken if the hardware is not bigendian,
with PostgreSQL and Java not seeing the values the same way.

Therefore, this change leaves the scalar default (for 1.5.0 at least)
at bigendian, in case some sites may have used Java-based scalars in
the past and have tables that contain them. Also, the current UDT
implementation gives every scalar type a binary send/receive/COPY
format that can't (yet) be decoupled from its internal stored form, and
the PostgreSQL docs specify big-endian for those binary transfer formats,
so it would be premature to change the scalar byte-order default before
it is also possible to have a separate transfer format.

For mirror types, on the other hand, the default is here immediately
changed to native ... it is less likely that sites were heavily
using mirror UDTs if they were seeing bogus values, and this will
make them work right by default.

Java properties are used (set with -D in the pljava.vmoptions GUC):
org.postgresql.pljava.udt.byteorder  to set everything the same,

org.postgresql.pljava.udt.byteorder.scalar
org.postgresql.pljava.udt.byteorder.mirror  to set them separately.

The allowable values for each property are big_endian, little_endian,
and native.

Even finer-grained control is available with:
org.postgresql.pljava.udt.byteorder.scalar.p2j
org.postgresql.pljava.udt.byteorder.scalar.j2p
org.postgresql.pljava.udt.byteorder.mirror.p2j
org.postgresql.pljava.udt.byteorder.mirror.j2p

for a very special purpose, should someone wish to attempt an
UPDATE where within a session, a column with values written
in one order gets rewritten in another, a fiddly and somewhat
unnerving process that turns out to actually work, and has a
full new page documenting it now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants