DROP COLUMN doesn't remove field from mapping #322

esatterwhite · 2018-12-17T21:33:36Z

ZomboDB version: 10-1.0.0b9
Postgres version: 10.5
Elasticsearch version: 6

Problem Description:

When removing a column from a table, the related field is left in the index mapping in elastic search.
This means you can't drop a column and add it back with a different column type. The only thing that seems to get a round the problem is a full REINDEX.

or add a suffix to the column names.

Is this expected?

The text was updated successfully, but these errors were encountered:

eeeebbbbrrrr · 2018-12-17T21:36:22Z

Yep, this is expected. Elasticsearch doesn't provide a way to remove a field from the index mapping, so there's not really anything ZDB can do.

I suppose this could be better documented by ZDB, but that's about it.

eeeebbbbrrrr · 2018-12-17T21:43:57Z

Also worth nothing that ES doesn't provide a way to rename a field in a mapping either. :(

esatterwhite · 2018-12-17T22:11:32Z

Is it possible to control the order of operations at the ES level?

right now it looks like it drops the index, makes a new one and populates it.
is it possible to make a new index in ES, populate it, and then drop the old one?

eeeebbbbrrrr · 2018-12-17T22:14:46Z

You mean during a REINDEX?

The problem there is that ES doesn't provide a way to rename an index, either. They want you to do that using aliases, and ZDB uses aliases for other reasons.

What's the actual problem we're trying to solve here? Do you have a production workflow where you drop/re-add columns with the same name but different types?

eeeebbbbrrrr · 2018-12-17T22:16:46Z

The problem there is that ES doesn't provide a way to rename an index, either. They want you to do that using aliases, and ZDB uses aliases for other reasons.

Hmm, I guess maybe you could use ES' "reindex" api to do this. I haven't actually used that, but I think it would necessitate a full rewrite of the index.

esatterwhite · 2018-12-17T22:39:55Z

it is basically a user defined schema allowing companies to define a custom table to store data + search through it with full operator support for the appropriate data type. Indexing that is pretty difficult. operator support for date / numeric ranges in json columns is mostly non-existent. But ES makes this pretty easy.

The problem is people tend to change their mind. Or someone tries to add a field that were previously deleted and it is of a different type.

someone add a foo column of text type
removes it a few weeks later
someone else comes a long and adds a foo column with a date type

a full index write under the hood is fine. more the problem is that nothing is really going to work while that is happening because the index is just gone. If zombo stood up a new index before removing the old one - I think we'd be in business.

I can probably get around the type conflicts in general by using a type suffix like foo_text instead of just foo but at some point we'll need to do some kind of maintenance on these, just trying to figure out how that would be possible

You mean during a REINDEX?

yes during a reindex. our manual process in the past for reindexing ES / fixing mappings in the past has been

stand up a new index
apply current template mapping
populate the new index with documents
tag it with the same alias
drop old index

applications queries the alias obviously. our template mapping had a template property of foobar*
and we always queried foobar.

This is basically a REINDEX CONCURRENTLY, which postgres doesn't really support - but is possible with elastic search.

eeeebbbbrrrr · 2018-12-17T23:01:31Z

Lets look at this example:

create table foo (id text);
insert into foo (id) values (1), (2), (3);
create index idxfoo on foo using zombodb ((foo.*));
alter table foo alter column id type bigint using id::bigint; -- rewrites index

begin;
alter table foo alter column id type text using id::text; -- takes exclusive lock, blocks concurrent reads
commit; -- lock released

This uses ALTER TABLE ALTER COLUMN to change the datatype. In doing so, Postgres is going to rewrite any index over that column, which includes full, covering USING zombodb indices. It does take an AccessExclusiveLock, which blocks concurrent writes, but that's a PG thing, not a ZDB thing.

So maybe you can look at doing this pattern for your case of "user decides to change field type". You just need proper conversion functions/casts from type a to type b.

That said, in the case of ALTER TABLE DROP COLUMN, I could teach ZDB to intercept that statement and do the necessary Elasticsearch _reindex API calls to create a new index without that field in the mapping.

ALTER TABLE DROP COLUMN also takes an AccessExclusiveLock, blocking concurrent reads, but right now it's a fairly fast operation. With ZDB intercepting it to rewrite the backing ES index, it would take quite a bit longer.

The same, I suppose, could be done for ALTER TABLE RENAME COLUMN too.

If you're looking for some kind of "REINDEX CONCURRENTLY" feature, that ain't gunna happen from ZDB until it's an actual feature that PG supports first.

Thoughts?

esatterwhite · 2018-12-18T21:43:58Z

Ya that is an idea. There isn't always a clean mapping like going for a number to a timestamp or a timestamp to a boolean for example. That would make sense to end users anyway

After thinking about it, I think I could just use an internal representation for the column names for zombodb to index and use column aliases in the query.

this would minimize the likely hood of conflicts.

I could teach ZDB to intercept that statement and do the necessary Elasticsearch _reindex API calls to create a new index without that field in the mapping

that would be fantastic.

eeeebbbbrrrr · 2018-12-18T21:48:13Z

After thinking about it, I think I could just use an internal representation for the column names for zombodb to index and use column aliases in the query.

Yeah, that's a good idea. Maybe you could use COMMENT ON to store the alias, so at least you can keep that user-defined metadata as part of the actual schema. Then again, it's not like it'd be difficult to track that stuff elsewhere. Just a thought.

I'll think about the work to intercept ALTER TABLE DROP/RENAME COLUMN and get started on it sometime in January. Holidays are quickly taking over...

esatterwhite · 2018-12-18T21:52:51Z

ya, there is a table that tracks what is user facing ( type, name, id ) and a trigger on that table that manages a partition ( add / remove columns ) for that company with a zombo index on it. so I think this is pretty feasible.

Thanks!

eeeebbbbrrrr · 2018-12-18T23:08:08Z

And maybe another general benefit is that you can programmatically create sensical Postgres column names that are all lowercase, no spaces, no unicode, etc, but let users just name them whatever they want. How could that possibly cause a problem?!

esatterwhite · 2018-12-19T18:01:28Z

actually now that I understand what is going a little better and have a better approach to it I'm not sure I would want every drop column to always trigger a reindex.

could it be a setting of some type?
Sorry to keep flip-flopping on you. feel free to back-burner this one. I Think I'm ok for the time being.

eeeebbbbrrrr · 2018-12-20T00:14:26Z

From ZDB's perspective, I like the idea of it leaving the index in a state that exactly matches the table schema, and doesn't require manual a REINDEX to "fix problems".

I'll think on it more. I've got things to sort out like:

BEGIN;
ALTER TABLE foo DROP COLUMN bar; -- ZDB rewrites the index and mapping
ALTER TABLE foo ADD COLUMN bar text; -- ZDB adds "bar" to index mapping
ABORT; -- now what?

I've gotta be able to properly handle transaction rollbacks with this.

esatterwhite · 2021-08-09T19:49:47Z

Is this closed fixed or won't do?

eeeebbbbrrrr · 2021-08-09T20:37:43Z

Won’t do. I just don’t see it as overly valuable relative to the implementation completely plus the operational overhead of reindexing the entire index.

eeeebbbbrrrr added the question label Dec 17, 2018

eeeebbbbrrrr added the enhancement label Dec 18, 2018

eeeebbbbrrrr self-assigned this Dec 18, 2018

eeeebbbbrrrr closed this as completed Aug 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DROP COLUMN doesn't remove field from mapping #322

DROP COLUMN doesn't remove field from mapping #322

esatterwhite commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018

esatterwhite commented Dec 17, 2018 •

edited

eeeebbbbrrrr commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018

esatterwhite commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018 •

edited

esatterwhite commented Dec 18, 2018

eeeebbbbrrrr commented Dec 18, 2018

esatterwhite commented Dec 18, 2018

eeeebbbbrrrr commented Dec 18, 2018

esatterwhite commented Dec 19, 2018

eeeebbbbrrrr commented Dec 20, 2018

esatterwhite commented Aug 9, 2021

eeeebbbbrrrr commented Aug 9, 2021

DROP COLUMN doesn't remove field from mapping #322

DROP COLUMN doesn't remove field from mapping #322

Comments

esatterwhite commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018

esatterwhite commented Dec 17, 2018 • edited

eeeebbbbrrrr commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018

esatterwhite commented Dec 17, 2018

eeeebbbbrrrr commented Dec 17, 2018 • edited

esatterwhite commented Dec 18, 2018

eeeebbbbrrrr commented Dec 18, 2018

esatterwhite commented Dec 18, 2018

eeeebbbbrrrr commented Dec 18, 2018

esatterwhite commented Dec 19, 2018

eeeebbbbrrrr commented Dec 20, 2018

esatterwhite commented Aug 9, 2021

eeeebbbbrrrr commented Aug 9, 2021

esatterwhite commented Dec 17, 2018 •

edited

eeeebbbbrrrr commented Dec 17, 2018 •

edited