
Recursive updates via triggers cause a document_missing_exception #622

Closed
mathroc opened this issue Dec 29, 2020 · 24 comments

@mathroc (Contributor) commented Dec 29, 2020

ZomboDB version: alpha3 (8ffc83d)
Postgres version: 12.4
Elasticsearch version: 7.10.1

Problem Description:

I have a script that migrates a database to a new one. It first inserts a bunch of rows to be indexed and commits that.
Later it inserts a bunch of rows somewhere else, which triggers an update on the ZomboDB-indexed rows (because the indexed function uses rows from this other table to build the final document to index in Elasticsearch). When that transaction is committed, ZomboDB tries to update the documents in Elasticsearch, but it fails with this error:

Error Message (if any):

ERROR:  code=Some(200), {"error":null,"errors":true,"items":[{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2091649073195]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"3","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2091649073197]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"2","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2091649073199]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"1","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2091649073201]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"2","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040449]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"4","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040451]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"4","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040453]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040455]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040457]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040459]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040461]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040463]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040465]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"4","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040467]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"0","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040469]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"4","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040471]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"2","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040473]: document missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"2","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040475]: document 
missing","index_uuid":"k8UtZOewTDKfgv1B3KeAug","shard":"4","index":"190027.2200.190121.194391"}}},{"update":{"error":{"type":"document_missing_exception","reason":"[_doc][2095944040477]: document missing"

There are a few hundred document_missing_exception errors, but I truncated the message here. I think none of the documents were updated.

Table Schema/Index Definition:

I don’t have a small enough schema at the moment; I’ll update the issue if I can reproduce it with something smaller. However, I don’t yet know how to reproduce the issue by doing what I expect to be the equivalent manually in Postgres (instead of from my PHP script).

If I try to query one of the missing documents with select zdb.request('public.idx_es_freelance', '_doc/1859720839210')::jsonb; it is indeed missing:

ERROR: ElasticsearchError(Some(404), "{"_index":"21479.2200.21573.25843","_type":"_doc","_id":"2370821947396","found":false}")
CONTEXT: src/elasticsearch/mod.rs:507:19

The index in ES does contain as many documents as the table in Postgres, and fetching those by id works fine, so I don’t know where all those ids without a corresponding ES document are coming from.

I restarted with a fresh Elasticsearch instance, so I only have one index on it, to make sure there wouldn’t be any confusion on my part, or on ZDB’s, about which index to use.

I was thinking that maybe it was caused by a delay related to the ES wait_for parameter, but the docs say that should be handled properly, and when I counted the documents before the update with select * from zdb.cat_indices everything was indeed fine. Adding a 30-second delay did not help either.

I’ll continue investigating this for a while, but I’m open to suggestions on how to debug this further 🙏

@eeeebbbbrrrr (Collaborator)

This sounds like a duplicate of issue #618.

But I gotta tell ya, after our discussions in #621, I'm not sure which "version" of ZDB you're really using.

@mathroc (Contributor, Author) commented Dec 29, 2020

It looks like #618 is fixed in the version I’m running.


I built the version at 8ffc83d, which is currently the latest commit on develop.

@eeeebbbbrrrr (Collaborator)

Was this database created and indexed before I fixed #618?

If you do a VACUUM FULL on all the tables with ZDB indexes, do the errors go away?

It’s kinda unfair to ask me to try and diagnose problems against the develop branch. The only time the develop branch is in a state to be used is the moment I merge it into master.

@mathroc (Contributor, Author) commented Dec 29, 2020

Was this database created and indexed before I fixed #618?

No, I’m working on a test environment that I destroy and recreate from scratch every time I start the migration script, so the database was created today.

If you do a VACUUM FULL on all the tables with ZDB indexes, do the errors go away?

No, same result.

It’s kinda unfair to ask me to try and diagnose problems against the develop branch. The only time the develop branch is in a state to be used is the moment I merge it into master.

Oh, I’m sorry! No worries, I don’t expect you to prioritize this over your own stuff. Take your time; I’ll continue investigating and will update the issue if I find something worthwhile. Or, if you’d rather not be distracted, I’ll wait until I can work against master.

@eeeebbbbrrrr (Collaborator)

Can you post your entire database schema? Or can you email me a pg_dump of the database?

@eeeebbbbrrrr (Collaborator) commented Dec 30, 2020

Okay, with the database dump you emailed me, I can re-create this.

It seems to happen whenever the es_index_freelance(integer) (non-trigger) function gets called.

I think what's happening is you've got recursive updates happening on the freelance table, and as such ZDB is seeing the top-of-stack updates first, which want to update an index tuple that hasn't been created yet because the stack hasn't fully unwound yet.

I don't think there's anything I can do about this. You might should look at doing this in a way where you're just changing the columns in NEW instead of issuing another UPDATE.

I'll think on this a bit more tho...
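
For illustration, the "just change the columns in NEW" approach suggested above could look roughly like the sketch below. This is only a sketch: the table name freelance and the columns search_text, first_name, and last_name are assumptions for the example, not the reporter's actual schema.

-- Sketch only: a BEFORE UPDATE trigger rewrites the incoming row in place
-- instead of issuing a second UPDATE against the ZDB-indexed table,
-- so no nested UPDATE (and no extra row version) is generated.
CREATE OR REPLACE FUNCTION freelance_refresh_doc() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- hypothetical derived column, recomputed from the incoming row
    NEW.search_text := coalesce(NEW.first_name, '') || ' ' || coalesce(NEW.last_name, '');
    RETURN NEW;  -- the modified tuple is what gets stored and indexed
END
$$;

CREATE TRIGGER freelance_refresh_doc
    BEFORE UPDATE ON freelance
    FOR EACH ROW EXECUTE FUNCTION freelance_refresh_doc();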

@eeeebbbbrrrr (Collaborator)

More commentary...

ZDB installs its own ON UPDATE FOR EACH ROW trigger (same for DELETE), which gets fired before Postgres wants to insert the NEW, updated tuple in the index. ZDB's trigger marks some xact tracking information for the OLD tuple. But that tuple doesn't exist yet because triggers haven't finished firing yet for that row.

I might could maybe fix this by queueing the ES update commands to happen after triggers have finished firing, but a) I don't know how to do that, and b) that would require that ZDB track these things in memory, and that's probably a terrible idea if large UPDATEs are happening.

@mathroc (Contributor, Author) commented Dec 30, 2020

Again, thank you so much for looking into this so quickly.

I think what's happening is you've got recursive updates happening on the freelance table, and as such ZDB is seeing the top-of-stack updates first, which want to update an index tuple that hasn't been created yet because the stack hasn't fully unwound yet.

I don't think there's anything I can do about this. You might should look at doing this in a way where you're just changing the columns in NEW instead of issuing another UPDATE.

Unfortunately I can’t always change NEW, because most of the time NEW is a row from another table, not from the table with the ZDB index. I’ve worked around that by offloading the index update to an asynchronous service instead of doing it in the same transaction; a rough sketch of that idea follows. That’s OK in this case because I don’t mind if there is a small delay before the change is visible in Elasticsearch.
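
(For context, a minimal sketch of that kind of deferred-reindex queue. The reindex_queue table, the reindex channel, and the freelance_id column are made up for the example; the worker that drains the queue and issues the actual UPDATE is not shown.)

-- Sketch only: the trigger on the other table just records which indexed
-- row is affected and notifies a worker; the worker later issues the UPDATE
-- that refreshes the ZDB-indexed row, outside the original transaction.
CREATE TABLE reindex_queue (freelance_id bigint PRIMARY KEY);

CREATE OR REPLACE FUNCTION queue_reindex() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO reindex_queue (freelance_id)
    VALUES (NEW.freelance_id)
    ON CONFLICT DO NOTHING;
    PERFORM pg_notify('reindex', NEW.freelance_id::text);  -- delivered at commit
    RETURN NEW;
END
$$;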

ZDB installs its own ON UPDATE FOR EACH ROW trigger (same for DELETE), which gets fired before Postgres wants to insert the NEW, updated tuple in the index. ZDB's trigger marks some xact tracking information for the OLD tuple. But that tuple doesn't exist yet because triggers haven't finished firing yet for that row.

I might could maybe fix this by queueing the ES update commands to happen after triggers have finished firing, but a) I don't know how to do that, and b) that would require that ZDB track these things in memory, and that's probably a terrible idea if large UPDATEs are happening.

I think I understand what’s happening. I didn’t know before that a row’s ctid changes each time it’s updated.

I guess oid wasn’t used so as to avoid having to create tables WITH OIDS?

Supposing oid had been used, would that have avoided this issue?

@eeeebbbbrrrr (Collaborator)

I don’t mind if there is a small delay before the change is visible in Elasticsearch

I do. I absolutely do. ZDB's #1 design goal is to have results always be MVCC-correct, even if the query is wholly answered by Elasticsearch (ie, aggregate queries like zdb.terms()).

I didn’t know before that a row’s ctid changes each time it’s updated.

I'm gonna put together a standalone test case here in a few minutes, but yeah, I think that's what's happening. In the 5+ years ZDB has been around, this is the first time such a thing has come up and I honestly have never put any thought into this.

I guess oid wasn’t used so as to avoid having to create tables WITH OIDS?
Supposing oid had been used, would that have avoided this issue?

oid doesn't scale beyond an unsigned 32-bit integer, and WITH OIDS support was removed in PG12+ anyways.

I'm still thinking about how to handle this. It's necessary for ZDB to update the xmin/xmax/cmin/cmax values in the ES index on the OLD tuple prior to inserting the NEW tuple. If the order is reversed, concurrent transactions would see both tuples for the time between those operations, breaking MVCC visibility rules.
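
(For readers following along, the ctid behavior under discussion is easy to see in a scratch table; the session below is purely illustrative.)

-- ctid is the physical location of a row version; every UPDATE writes a
-- new version, so the ctid changes even when no column value changes.
CREATE TABLE ctid_demo (id int);
INSERT INTO ctid_demo VALUES (1);

SELECT ctid, id FROM ctid_demo;   -- e.g. (0,1) | 1
UPDATE ctid_demo SET id = id;
SELECT ctid, id FROM ctid_demo;   -- e.g. (0,2) | 1  -- same logical row, new ctid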

@mathroc (Contributor, Author) commented Dec 30, 2020

Ah, I didn’t know oids were removed. I don’t know if this can be solved with ctids.

I suppose it would be hard to use the primary key because (1) the type is unknown and potentially spread across multiple columns, or (2) it may be absent altogether.

If (1) is solvable, I think requiring a primary key is a reasonable requirement.

I wouldn’t mind having to add a column just for that, e.g. zdb_id double

@eeeebbbbrrrr (Collaborator)

I wouldn’t mind having to add a column just for that, e.g. zdb_id double

I'm not doing that. ;) It's never been a requirement for ZDB before and it's not going to be now.

@eeeebbbbrrrr (Collaborator)

Got an update here, which I think is positive, although I don't know what to do about it yet!

ZomboDB indexes documents using background threads, fed by the main Postgres thread using a simple channel (queue).

When the triggers recurse, what looks like is happening is that, say, Thread1 will get the "update" message for OLD and Thread2 will get the "insert" message for NEW, and then due to random chance Thread1 will hit ES first. It fails because the document it wants to update (which would be from depth > 1) doesn't exist yet in ES, because it's still being processed by Thread2.

If I hack ZDB to only use a single thread then everything works just fine, as ordering is properly preserved.

So I'd say this is my bug -- now I just need to figure out what to do about it.

I could just blindly say "if the table has any (non-ZDB) triggers, we run in single-thread mode". That would be the quick fix. I don't know any way to predict whether the triggers are going to recurse, and otherwise batching everything in memory and re-ordering would essentially make it a single-threaded process anyways.

@eeeebbbbrrrr eeeebbbbrrrr self-assigned this Dec 30, 2020
@eeeebbbbrrrr (Collaborator)

I guess I could also defer queueing the "update" command to be sent along with its corresponding "insert" command. As long as those two commands are in the same _bulk ES request (in the proper order), they'd be fine. I wonder what that would look like, code wise...

@eeeebbbbrrrr (Collaborator)

This will re-create the problem. The key here is having enough rows in the table (and updating enough rows) to cause ZDB to spin up more than 1 indexing thread.

CREATE TABLE issue622 (id serial8, oid bigint, by_trigger bool);
CREATE OR REPLACE FUNCTION update_trigger () RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    IF (round(random())::int)::boolean THEN
        -- recurse.  There's a chance we could end up doing this forever, but seems unlikely
        -- don't feel like making this smarter
        UPDATE issue622 SET by_trigger = true WHERE id = OLD.id;
    END IF;
    RETURN NEW;
END
$$;

CREATE TRIGGER update_trigger AFTER UPDATE ON issue622 FOR EACH ROW EXECUTE FUNCTION update_trigger();

INSERT INTO issue622 (oid) SELECT gs FROM generate_series(1, 30000) gs;
CREATE INDEX idxissue622 ON issue622 USING zombodb ((issue622.*));

UPDATE issue622 SET id = id;
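
(As a quick sanity check while reproducing, the document count reported for the ES index can be compared against the table, using the same zdb.cat_indices view mentioned earlier in the thread; its exact output columns may vary by version.)

-- Rough check: the live-document count for the index should track the
-- number of rows visible in the table.
SELECT * FROM zdb.cat_indices;
SELECT count(*) FROM issue622;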

@mathroc (Contributor, Author) commented Dec 30, 2020

This will re-create the problem. The key here is having enough rows in the table (and updating enough rows) to cause ZDB to spin up more than 1 indexing thread.

So that’s why I could not reproduce it manually. Race-condition problems are always fun... :/

I could just blindly say "if the table has any (non-ZDB) triggers, we run in single-thread mode". That would be the quick fix. I don't know any way to predict whether the triggers are going to recurse, and otherwise batching everything in memory and re-ordering would essentially make it a single-threaded process anyways.

Could that behavior be the default, with an option in the index creation, e.g. WITH (use_multiple_threads = true), so that if you know your triggers are parallel-safe you can go back to using multiple threads? It would probably be a good idea to test how much of a performance cost single-threading has in the first place, to decide whether it’s worth it.

@eeeebbbbrrrr (Collaborator)

Could that behavior be the default, with an option in the index creation, e.g. WITH (use_multiple_threads = true)

I think I'm going to be able to keep the "update" and "insert" commands together in the same package, and do them at the time of "insert". Then ZDB will be able to scale out threads automatically like it does now. I've almost got this implemented already.

@eeeebbbbrrrr (Collaborator) commented Dec 30, 2020

An UPDATE in Postgres is essentially a DELETE + INSERT, so that same sort of thing happens with ZDB+ES. ZDB will update the old doc's xmax/cmax values, then insert a brand new row. I've now got these operations grouped together so they'll always happen in the same _bulk ES request.

However, let's say our table is:

  ctid |  id 
--------------
(0, 1) | 1

When we update that row, we'll want to have two versions in ES that look (essentially) like this:

  ctid |  id 
--------------
(0, 1) | 1
(0, 2) | 1

And now those rows are updated/inserted together.

But when the trigger recurses we end up with:

  ctid |  id 
--------------
(0, 1) | 1
(0, 2) | 1
(0, 3) | 1

And what we'll have going on concurrently with ZDB is an update/insert pair for (0, 1) and (0, 2), along with an update/insert pair for (0, 2) and (0, 3).

If that second pair hits ES, where it wants to update (0, 2) before the first pair hits ES where it inserts (0, 2), then we get the "document_missing" error.

This is the problem.

Grouping these operations together is a good change for ZDB -- it halves the number of things that run through ZDB's queue, but it doesn't actually fix the problem.

I'm still thinking...

eeeebbbbrrrr added a commit that referenced this issue Dec 30, 2020
…ng ErrorKind::Interrupted was causing things to retry. So use ErrorKind::BrokenPipe instead.

This was discovered while debugging issue #622.
@eeeebbbbrrrr (Collaborator) commented Dec 30, 2020

I think I've got this working. Basically, ZDB now tracks "in-flight" ctids (ctids that aren't known to have finished their _bulk request yet), and if we see we want to "update" a ctid (in ES) that's in-flight, we queue it up to the end of the entire bulk process.

And then after the bulk indexing process has completed, we re-process the queued items, which keeps working recursively until we have nothing else queued.

With some debug output, this is what that looks like:

[v12.3][79378] contrib_regression=# UPDATE issue622 SET id = id;
INFO:  requeueing 15573 commands
INFO:  requeueing 7774 commands
INFO:  requeueing 3852 commands
INFO:  requeueing 1898 commands
INFO:  requeueing 947 commands
INFO:  requeueing 479 commands
INFO:  requeueing 239 commands
INFO:  requeueing 112 commands
INFO:  requeueing 49 commands
INFO:  requeueing 19 commands
INFO:  requeueing 6 commands
INFO:  requeueing 2 commands
INFO:  requeueing 0 commands
UPDATE 30000
Time: 7972.490 ms (00:07.972)

I think this is the correct way to handle it with minimal code weirdness. I'm gonna think on this another day before I push it.

@eeeebbbbrrrr (Collaborator)

(and I need to learn to spell requeuing, but whatever).

@eeeebbbbrrrr (Collaborator)

And all these additional requeuing batches run in parallel themselves, as much as they can, going through the same in-flight detection process.

@eeeebbbbrrrr (Collaborator) commented Dec 31, 2020

and this is what it looks like against your database:

=# update freelance set id = id;
INFO:  requeuing 1400 commands
UPDATE 23105
Time: 81291.085 ms (01:21.291)

Only one additional bulk process, but your triggers are more deterministic than mine.

I couldn't do that yesterday. So that's great. I'm pretty jazzed!

I'm not sure if I like the performance here or not, but I think that's just a factor of ZDB being ZDB. ZDB isn't known for its UPDATE perf relative to not having a ZDB index on your table.

@eeeebbbbrrrr (Collaborator)

oh, in --release mode, it's fine:

[v12.3][82463] mathroc=# update freelance set id = id;
INFO:  requeuing 1276 commands
UPDATE 23105
Time: 47299.639 ms (00:47.300)
[v12.3][82463] mathroc=# 

@mathroc (Contributor, Author) commented Dec 31, 2020

Oh, that looks promising!

I’m not too worried about the performance; this is kind of an edge case, and such large-scale updates won’t happen often once the migration is over.

And the time it takes to reindex is actually acceptable.

If you commit this on a branch I can test it too (not today, but soon).

@eeeebbbbrrrr eeeebbbbrrrr changed the title Error on batch update: document_missing_exception Recursive updates via triggers cause a document_missing_exception Dec 31, 2020
@eeeebbbbrrrr (Collaborator)

I feel pretty good about the changes here. They're really not that intrusive. I'll just include this as part of -alpha3, which I think might actually happen today!

eeeebbbbrrrr added a commit that referenced this issue Dec 31, 2020
There are two primary changes here.

The first is that during an UPDATE, we postpone queuing the ES BulkRequestCommand until we see the corresponding insert command via the index AM API.

The second change is that we now track a set of "in flight" ctids (these are ctids that are either on their way to ES or being waited on for ES to finish indexing), and if we see an update command for one of these, we defer it (and its corresponding index command) and play those in after all the other documents have successfully finished.  This essentially happens recursively until all deferred commands have been processed.

The latter adds a small bit of per-tuple overhead in all cases (including CREATE INDEX), but it seems worth the price to ensure things actually work!