Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept multiple --id options #19

Merged
merged 4 commits into from
Dec 8, 2022
Merged

Accept multiple --id options #19

merged 4 commits into from
Dec 8, 2022

Conversation

allenap
Copy link
Contributor

@allenap allenap commented Oct 25, 2022

It's useful to be able to get a database slice for multiple identifiers in a single pass.

@allenap allenap requested a review from jelder October 25, 2022 16:31
jelder
jelder previously approved these changes Oct 25, 2022
@allenap
Copy link
Contributor Author

allenap commented Oct 26, 2022

This is a breaking change because custom copy-out queries will need to change conditions from blah = :id to blah in :ids. How do we want to handle this? Ordinarily this would be a major version bump, but we're pre-1.0 so I assume we release with a bump to the minor version.

@jelder jelder dismissed their stale review October 26, 2022 12:23

Breaking change

@jelder
Copy link
Contributor

jelder commented Oct 26, 2022

@allenap Great point. I wonder if we could just run each user-defined query once for each :id without breaking anything, or making existing feats impossible.

@allenap
Copy link
Contributor Author

allenap commented Oct 26, 2022

I wonder if we could just run each user-defined query once for each :id without breaking anything, or making existing feats impossible.

This will break for a query like:

select * from example_table
 where id_column = :id
    or id_column is null

because we'll fetch those id_column is null rows multiple times.

In custom copy-out queries we could instead replace :id with ANY (ARRAY[...the ids...]), so that the query above expands to:

select * from example_table
 where id_column = ANY (ARRAY[...])
    or id_column is null

I've read that PostgreSQL treats IN (...) and = ANY (array) the same, i.e. one de-sugars to the other.

Queries like:

select * from example_table
 where id_column in (:id, 'some-fixed-value')

will not work, but they won't work with the :ids expansion in this branch either.

I'll try it out.

@allenap
Copy link
Contributor Author

allenap commented Oct 26, 2022

I tried out id_column = ANY (ARRAY[...]) and it breaks for an unexpected reason: there's no type coercion with the = ANY (ARRAY[...]) form:

image

I'll try adding that cast.

@allenap
Copy link
Contributor Author

allenap commented Oct 26, 2022

@jelder I've added a cast based on the column's udt_name and it works... but custom copy-out queries still need to be updated. For example, we expand the following:

select * from example_table
 where uuid_column = :id

into:

select * from example_table
 where uuid_column = any (array[...])

and PostgreSQL complains "operator does not exist: uuid = text" about that.

This is a confusing error message. I think we're better off sticking with the original breaking change. What do you think?

@JQuezada0
Copy link
Contributor

@allenap I'm somewhat confused, after your change to cast the array it works? or it still fails?

@allenap
Copy link
Contributor Author

allenap commented Nov 7, 2022

@allenap I'm somewhat confused, after your change to cast the array it works? or it still fails?

It works with the explicit type casting that I added, e.g.:

select * from example_table
 where uuid_column = any (array[...] :: uuid[])

but it feels overcomplicated. I think it's better to take the incompatibility hit of changing :id to :ids than to go with this.

There's another way:

@allenap Great point. I wonder if we could just run each user-defined query once for each :id without breaking anything, or making existing feats impossible.

We could use union, e.g. given the query:

select * from example_table
 where column = :id

and the IDs "foo" and "bar", we would generate the following query:

select * from example_table
 where column = 'foo'
union
select * from example_table
 where column = 'bar'

Upside: simple, it works (probably).
Downsides:

  • Lots of IDs means huge queries.
  • union means that PostgreSQL has to deduplicate rows which could be expensive. We can't use union all to get around this because it may break uniqueness constraints when loading.

My conclusion remains as before: switch from :id to :ids and take the one-time hit.

@JQuezada0
Copy link
Contributor

I agree w/ @allenap, idCol IN (...) seems like the simplest path, even though that'll cause a breaking change with earlier versions

@JQuezada0
Copy link
Contributor

What do you think @jelder ?

@jelder
Copy link
Contributor

jelder commented Nov 16, 2022

It's a breaking change, so we need to reflect that in the version when we release it.

Also, the animated gif would have to be regenerated (it's in code).

Could we continue to support --id (singular, but used >0 times)?

@allenap
Copy link
Contributor Author

allenap commented Nov 16, 2022

It's a breaking change, so we need to reflect that in the version when we release it.

👍

Also, the animated gif would have to be regenerated (it's in code).

Good point. I will do it as a separate commit/merge to main, so that it shows the correct release version information.

Could we continue to support --id (singular, but used >0 times)?

Yes; this PR does that.

@allenap
Copy link
Contributor Author

allenap commented Nov 16, 2022

Updated to use IN :ids and tested with an example database and two --id flags. Seems to work fine.

@jelder jelder merged commit 0987b1d into main Dec 8, 2022
@jelder jelder deleted the gpanella/multiple-ids branch December 8, 2022 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants