Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DROP COLUMN doesn't remove field from mapping #322

Closed
esatterwhite opened this issue Dec 17, 2018 · 15 comments
Closed

DROP COLUMN doesn't remove field from mapping #322

esatterwhite opened this issue Dec 17, 2018 · 15 comments

Comments

@esatterwhite
Copy link

ZomboDB version: 10-1.0.0b9
Postgres version: 10.5
Elasticsearch version: 6

Problem Description:

When removing a column from a table, the related field is left in the index mapping in elastic search.
This means you can't drop a column and add it back with a different column type. The only thing that seems to get a round the problem is a full REINDEX.

or add a suffix to the column names.

Is this expected?

@eeeebbbbrrrr
Copy link
Collaborator

Yep, this is expected. Elasticsearch doesn't provide a way to remove a field from the index mapping, so there's not really anything ZDB can do.

I suppose this could be better documented by ZDB, but that's about it.

@eeeebbbbrrrr
Copy link
Collaborator

Also worth nothing that ES doesn't provide a way to rename a field in a mapping either. :(

@esatterwhite
Copy link
Author

esatterwhite commented Dec 17, 2018

Is it possible to control the order of operations at the ES level?

right now it looks like it drops the index, makes a new one and populates it.
is it possible to make a new index in ES, populate it, and then drop the old one?

@eeeebbbbrrrr
Copy link
Collaborator

You mean during a REINDEX?

The problem there is that ES doesn't provide a way to rename an index, either. They want you to do that using aliases, and ZDB uses aliases for other reasons.

What's the actual problem we're trying to solve here? Do you have a production workflow where you drop/re-add columns with the same name but different types?

@eeeebbbbrrrr
Copy link
Collaborator

The problem there is that ES doesn't provide a way to rename an index, either. They want you to do that using aliases, and ZDB uses aliases for other reasons.

Hmm, I guess maybe you could use ES' "reindex" api to do this. I haven't actually used that, but I think it would necessitate a full rewrite of the index.

@esatterwhite
Copy link
Author

it is basically a user defined schema allowing companies to define a custom table to store data + search through it with full operator support for the appropriate data type. Indexing that is pretty difficult. operator support for date / numeric ranges in json columns is mostly non-existent. But ES makes this pretty easy.

The problem is people tend to change their mind. Or someone tries to add a field that were previously deleted and it is of a different type.

  • someone add a foo column of text type
  • removes it a few weeks later
  • someone else comes a long and adds a foo column with a date type

a full index write under the hood is fine. more the problem is that nothing is really going to work while that is happening because the index is just gone. If zombo stood up a new index before removing the old one - I think we'd be in business.

I can probably get around the type conflicts in general by using a type suffix like foo_text instead of just foo but at some point we'll need to do some kind of maintenance on these, just trying to figure out how that would be possible

You mean during a REINDEX?

yes during a reindex. our manual process in the past for reindexing ES / fixing mappings in the past has been

  • stand up a new index
  • apply current template mapping
  • populate the new index with documents
  • tag it with the same alias
  • drop old index

applications queries the alias obviously. our template mapping had a template property of foobar*
and we always queried foobar.

This is basically a REINDEX CONCURRENTLY, which postgres doesn't really support - but is possible with elastic search.

@eeeebbbbrrrr
Copy link
Collaborator

eeeebbbbrrrr commented Dec 17, 2018

Lets look at this example:

create table foo (id text);
insert into foo (id) values (1), (2), (3);
create index idxfoo on foo using zombodb ((foo.*));
alter table foo alter column id type bigint using id::bigint; -- rewrites index

begin;
alter table foo alter column id type text using id::text; -- takes exclusive lock, blocks concurrent reads
commit; -- lock released

This uses ALTER TABLE ALTER COLUMN to change the datatype. In doing so, Postgres is going to rewrite any index over that column, which includes full, covering USING zombodb indices. It does take an AccessExclusiveLock, which blocks concurrent writes, but that's a PG thing, not a ZDB thing.

So maybe you can look at doing this pattern for your case of "user decides to change field type". You just need proper conversion functions/casts from type a to type b.

That said, in the case of ALTER TABLE DROP COLUMN, I could teach ZDB to intercept that statement and do the necessary Elasticsearch _reindex API calls to create a new index without that field in the mapping.

ALTER TABLE DROP COLUMN also takes an AccessExclusiveLock, blocking concurrent reads, but right now it's a fairly fast operation. With ZDB intercepting it to rewrite the backing ES index, it would take quite a bit longer.

The same, I suppose, could be done for ALTER TABLE RENAME COLUMN too.

If you're looking for some kind of "REINDEX CONCURRENTLY" feature, that ain't gunna happen from ZDB until it's an actual feature that PG supports first.

Thoughts?

@esatterwhite
Copy link
Author

Ya that is an idea. There isn't always a clean mapping like going for a number to a timestamp or a timestamp to a boolean for example. That would make sense to end users anyway

After thinking about it, I think I could just use an internal representation for the column names for zombodb to index and use column aliases in the query.

this would minimize the likely hood of conflicts.

I could teach ZDB to intercept that statement and do the necessary Elasticsearch _reindex API calls to create a new index without that field in the mapping

that would be fantastic.

@eeeebbbbrrrr
Copy link
Collaborator

After thinking about it, I think I could just use an internal representation for the column names for zombodb to index and use column aliases in the query.

Yeah, that's a good idea. Maybe you could use COMMENT ON to store the alias, so at least you can keep that user-defined metadata as part of the actual schema. Then again, it's not like it'd be difficult to track that stuff elsewhere. Just a thought.

I'll think about the work to intercept ALTER TABLE DROP/RENAME COLUMN and get started on it sometime in January. Holidays are quickly taking over...

@esatterwhite
Copy link
Author

ya, there is a table that tracks what is user facing ( type, name, id ) and a trigger on that table that manages a partition ( add / remove columns ) for that company with a zombo index on it. so I think this is pretty feasible.

Thanks!

@eeeebbbbrrrr
Copy link
Collaborator

And maybe another general benefit is that you can programmatically create sensical Postgres column names that are all lowercase, no spaces, no unicode, etc, but let users just name them whatever they want. How could that possibly cause a problem?!

@esatterwhite
Copy link
Author

actually now that I understand what is going a little better and have a better approach to it I'm not sure I would want every drop column to always trigger a reindex.

could it be a setting of some type?
Sorry to keep flip-flopping on you. feel free to back-burner this one. I Think I'm ok for the time being.

@eeeebbbbrrrr
Copy link
Collaborator

From ZDB's perspective, I like the idea of it leaving the index in a state that exactly matches the table schema, and doesn't require manual a REINDEX to "fix problems".

I'll think on it more. I've got things to sort out like:

BEGIN;
ALTER TABLE foo DROP COLUMN bar; -- ZDB rewrites the index and mapping
ALTER TABLE foo ADD COLUMN bar text; -- ZDB adds "bar" to index mapping
ABORT; -- now what?

I've gotta be able to properly handle transaction rollbacks with this.

@esatterwhite
Copy link
Author

Is this closed fixed or won't do?

@eeeebbbbrrrr
Copy link
Collaborator

Won’t do. I just don’t see it as overly valuable relative to the implementation completely plus the operational overhead of reindexing the entire index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants