Soft automatic schema reload #200

nicktorwald · 2019-06-18T19:59:43Z

Now client keeps actual schema metadata and sends schemaId header to be
checked against current Tarantool schema version. If client version
mismatches DB version client does schema reloading in the background.

Client operation interface was reworked in scope of support not only
number identifiers for spaces and indexes but also their string names.

Closes: #7, #137

coveralls · 2019-06-19T17:21:24Z

Coverage increased (+2.06%) to 77.66% when pulling dd1f29f on nicktorwald/gh-7-soft-schema-reload into 56c28a3 on master.

Totktonada

Thank you! My comments are below.

README.md

src/main/java/org/tarantool/AbstractTarantoolOps.java

src/main/java/org/tarantool/schema/TarantoolIndexMeta.java

src/main/java/org/tarantool/TarantoolClientImpl.java

src/test/java/org/tarantool/ClientAsyncOperationsIT.java

src/test/java/org/tarantool/schema/ClientSchemaIT.java

src/test/java/org/tarantool/ClientAsyncOperationsIT.java

Totktonada · 2019-08-02T18:18:47Z

Sorry for the late reply. Here are my thoughts about the patch.

Re approach

It is possible that a schema is changed between selects from _vspace and
from _vindex. If those select responses contain different schema_id, then
we should retry both selects. Let's consider several cases:
- an index is removed during fetching a space;
- a new field is added to a space format during fetching a space;
- a new space is filled, old one is dropped and the new one is renamed to the
  name of the old one.
  I think it worth to have corresponding test cases if it is possible.
I guess we need to handle the case when another instance is started in a
place on previous one and so schema_id check becomes unreliable. Maybe we
should drop a schema after successful authentificate (or successful
(re)connect as guest). Need to test this case.
- Related: tarantool/tarantool-php@42a63b4
I think that testing with sync ops have a way less chances to catch a race
comparing with async ops. Or we need to run it in much more threads I think.
What will be going on if we'll request a non-exists space by name? As I see
it will just reschedule such request again and again instead of reporting
an error. Need to test it.

A schema is updated at the following points:

Just after connect and (if needed) auth.
- We cannot guarantee that an instance is the same after a reconnect, so I
  think we should drop a schema cache each time we connect to an instance.
When a schema version error is received in as a response to our request.
When a request is against a non-existent space / index name.
- It worth to send ping request and fetch a schema only in case of a wrong
  schema error, because a schema can be relatively big.
- We should make sure that we ask to update a schema with a ping request
  only once for a certain request (if a space/index still does not exist),
  then report a corresponsing error. (If we'll agree to add a request object
  we can track this inside it; however we'll need to ensure only one
  connection proceed with the request object; or track it in TarantoolOp?)

It worth to describe how the schema reload works (in README): I would add the
update conditions above and guaranties the connector provides for a user. AFAIU
the guarantee is the following: a request will be evaluated against a schema
actual at a moment when the request is added or against a newer schema.

AFAIU cluster client should work well if we'll drop a schema cache between
reconnects. However I would add a test that will model the following situation.
There is a space on different instances with the same space name, buf different
space id. Schema id is the same on both instances. Connect to a first instance,
select from the space by name, then connect to the second one and select from
the space by name. The connector should use a space id from the second instance
in this case.

Re code

When I start to look into the code there were difficulties to undestand some
points. I don't sure whether it is due to me or due to the code.

It was not obvious that 'resolve' means: whether it can lead to a schema
fetch or just cache lookup.
It was not obvious whether hasUnresolvedArgument() checks for arguments that
were initially added as unresolved or lead to a schema cache lookup.

I think it worth to think whether we able to improve the terminology: are we
should use abstract term 'resolve' that does not give enough context? Maybe it
worth to incapsulate some logic inside a request / a statement object and pass
this object instead of 'args' from AbstractTarantoolOps to TarantoolClientImpl.
This object can have methods like isBinarySerializable() and getBinaryPacket().
Also here we can store whether we already had a try to sync a schema id to
don't run into a loop in case of non-existing name (see above). I recently
proposed in #212 to add some kind of such object as part of the public API to
provide easier way to construct a request. Within scope of this issue it can be
added as the private thing.

Don't sure about all points within this section. I don't sure that they will
improve readability of the code. Please, consider this section as 'yet another
possible way to solve this task'. If you think it worth to try to follow this,
let's do; if not, then skip it (but, please, leave a comment for me).

BTW, I like the approach where space / index names are resolved using a schema
cache just before write a request.

Quite minor points

Don't sure whether there is a real need in this, but we don't support
(Integer space, String index) requests.
- Related: tarantool/tarantool-php@0b2db7f
"<...>i.e. after DML operation <...>" Typo: DML -> DDL.
Typo: two test cases are named 'fetched a space with its index'.
The commit message is duplicated.

Totktonada

Sorry, I possibly directed you in a wrong way with all those thoughts about
ordering. I thought that it would be cheap to give some simple guarantees on
ordering, but it does not look so for me now. Please, consider comments below.

Key points:

Extract the DSL into separate PR.
- It will be possible to discuss it separately.
- It seems we can form a request object w/o using the DSL internally.
Don't try to give requests in some order.
- Dependent actions should either performed using a server-side procedure or
  performed sequencial (next one will wait a result of previous one). There is
  no general way (streams, interactive transactions or so) to ensure a server
  will execute request in an order when we send them in a batch.
- We use two buffers to write and choose one of them depending (AFAIU) on a
  packet size and a fullness of a shared buffer, so we have no guarantee
  that we'll write requests in the order it was written by a user.
- Tarantool starts to execute a request in the order them were written. But
  this does not guarantee that they will be finished in the same order if
  there are yields inside. It is possible for calls, evals and in case of
  vinyl.
- As I see the current implementation retries requests in the order of
  ER_SCHEMA responses. If we would want to provide some ordering guarantee
  we will be obligated to lean on, say, monotonically increasing request id
  (sync id).
- The question here sounds for me so: how many guarantees about ordering we
  can give keeping the code considerably simple and keeping description of
  those guarantees considerably simple. When I said guarantees I mean
  guarantees of ordering for memtx/vinyl/call(eval) requests (or at least
  order of starting of execution) -- and they should become the same even if
  a schema error was received -- and guarantees about a point when we
  resolve names (this is about ordering too). As I see it is not worth to
  give some partial and hard in understanding solution here. It is better to
  come to this question when streams / transactions will be supported on
  tarantool side.
- So I propose to treat each request separately (in context of retries), but
  as optimization we can postpone those requests that will definitely
  receive a schema error if will be written immediately.

BTW, it means that while we have no some kind of streams
support on the server
side, we are unable to implement batching (say, for jdbc) using async
connector. Or maybe we can in theory (after significant changes around points
above) and for some specific set of requests: memtx only, no yields inside
(each request either is a signle operation of several ones in a transaction).
It does not look worthful.

src/main/java/org/tarantool/TarantoolRequest.java

src/test/java/org/tarantool/ClientAsyncOperationsIT.java

src/main/java/org/tarantool/TarantoolClientImpl.java

Totktonada · 2019-09-13T12:43:01Z

I assumed that relaxing of guarantees may allow us to simplify the code. That is why I bother about this.

Totktonada

Sorry for a big delay.

I looked into the schema reload part of the PR (1st commit). I think that the second one should be developed in a separate PR. However it is helpful to have a draft for a request DSL to evaluate how good we designed a future achitecture. (Now the DSL lacks documentation and tests, that is why I said 'draft': maybe the implementation is mature enough.)

Now I looked into a behaviour part (and I mostly comfortable with it). I need to look into schema/ classes and tests; but stops here to give some comments sooner.

README.md

src/main/java/org/tarantool/RequestArguments.java

src/main/java/org/tarantool/TarantoolClientImpl.java

src/main/java/org/tarantool/TarantoolRequest.java

src/main/java/org/tarantool/TarantoolClientImpl.java

src/main/java/org/tarantool/TarantoolConnection.java

src/main/java/org/tarantool/schema/TarantoolSchemaException.java

src/main/java/org/tarantool/MsgPackLite.java

Totktonada

A bit more comments.

src/main/java/org/tarantool/schema/TarantoolIndexMeta.java

src/main/java/org/tarantool/schema/TarantoolSpaceMeta.java

src/test/java/org/tarantool/ClientAsyncOperationsIT.java

Totktonada · 2019-11-16T01:46:05Z

Thank you for the great work!

Now I have no objections, except API compitibility: template parameters of TarantoolClientOps was changed and it seems that it will require a user to update its code. Can we eliminate this by using a first template parameter for numerical space/index IDs as before, but use 'hardcoded' String type for string space/index names?

I have the question about a died stage and the proposal re make some tests more durable: please, update me on those points.

I propose to extract DSL implementation into its own PR. I looked over it too and this looks adorable. It needs tests and a documentation. Also I would name TarantoolRequestConvertible as TarantoolRequestSpec, just for readability.

Now client keeps actual schema metadata and sends schemaId header to be checked against current Tarantool schema version. If client version mismatches DB version client does schema reloading in the background. Client operation interface was reworked in scope of support not only number identifiers for spaces and indexes but also their string names. Closes: #7, #137

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch 3 times, most recently from eeefdca to 178337c Compare June 19, 2019 15:52

Totktonada reviewed Jun 26, 2019

View reviewed changes

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch 3 times, most recently from 258f1d4 to 672a687 Compare July 5, 2019 16:29

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch 2 times, most recently from 8ef4616 to e99d0dd Compare September 4, 2019 17:42

Totktonada mentioned this pull request Sep 4, 2019

Consider support of space / index names in the binary protocol tarantool/tarantool#4479

Closed

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch 7 times, most recently from 8aed0a1 to 456385e Compare September 8, 2019 22:01

Totktonada reviewed Sep 13, 2019

View reviewed changes

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch 3 times, most recently from 5ffc288 to 244a493 Compare September 19, 2019 07:52

nicktorwald mentioned this pull request Sep 24, 2019

Internal logging support #205

Merged

Totktonada reviewed Oct 18, 2019

View reviewed changes

src/main/java/org/tarantool/schema/TarantoolIndexMeta.java Outdated Show resolved Hide resolved

src/main/java/org/tarantool/schema/TarantoolSpaceMeta.java Outdated Show resolved Hide resolved

src/test/java/org/tarantool/ClientAsyncOperationsIT.java Outdated Show resolved Hide resolved

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch from 244a493 to d5ceeb2 Compare October 25, 2019 18:04

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch from d5ceeb2 to 567851c Compare November 20, 2019 17:33

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch 2 times, most recently from 4056cb2 to de7faf9 Compare November 21, 2019 17:15

nicktorwald force-pushed the nicktorwald/gh-7-soft-schema-reload branch from de7faf9 to dd1f29f Compare November 21, 2019 18:38

Totktonada merged commit f32bafa into master Nov 26, 2019

Totktonada deleted the nicktorwald/gh-7-soft-schema-reload branch November 26, 2019 11:04

This was referenced Nov 27, 2019

Perform DQL/DML using a space name #137

Closed

More flexible schema declaration #10

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Soft automatic schema reload #200

Soft automatic schema reload #200

nicktorwald commented Jun 18, 2019

coveralls commented Jun 19, 2019 •

edited

Totktonada left a comment •

edited

Totktonada commented Aug 2, 2019 •

edited

Totktonada left a comment

Totktonada commented Sep 13, 2019

Totktonada left a comment

Totktonada left a comment

Totktonada commented Nov 16, 2019

Soft automatic schema reload #200

Soft automatic schema reload #200

Conversation

nicktorwald commented Jun 18, 2019

coveralls commented Jun 19, 2019 • edited

Totktonada left a comment • edited

Choose a reason for hiding this comment

Totktonada commented Aug 2, 2019 • edited

Re approach

Re code

Quite minor points

Totktonada left a comment

Choose a reason for hiding this comment

Totktonada commented Sep 13, 2019

Totktonada left a comment

Choose a reason for hiding this comment

Totktonada left a comment

Choose a reason for hiding this comment

Totktonada commented Nov 16, 2019

coveralls commented Jun 19, 2019 •

edited

Totktonada left a comment •

edited

Totktonada commented Aug 2, 2019 •

edited