Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for FETCH/OFFSET syntax #1

Closed
martint opened this issue Jan 21, 2019 · 11 comments
Closed

Add support for FETCH/OFFSET syntax #1

martint opened this issue Jan 21, 2019 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@martint
Copy link
Member

martint commented Jan 21, 2019

In terms of syntax, this is what the spec says:

A <query expression> can also optionally contain a <result offset clause>, 
which may limit the cardinality of the derived table by removing a specified 
number of rows from the beginning of the derived table. If a <query expression> 
contains both an <order by clause> and a <result offset clause>, then the rows 
in the derived table are first sorted according to the <order by clause> and then
 limited by dropping the number of rows specified in the <result offset clause> 
from the beginning of the result produced by the <query expression>. If the 
cardinality of the result of an evaluation of a <query expression> is less than
 the offset value specified by a <result offset clause>, then the derived table is empty.

And

<query expression> ::=
  [ <with clause> ] <query expression body>
      [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

<result offset clause> ::=
  OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::=
  FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES }

<fetch first quantity> ::=
    <fetch first row count>
  | <fetch first percentage>

<offset row count> ::=
  <simple value specification>

<fetch first row count> ::=
  <simple value specification>

<fetch first percentage> ::=
  <simple value specification> PERCENT
@martint martint added the enhancement New feature or request label Jan 21, 2019
@findepi
Copy link
Member

findepi commented Jan 29, 2019

Certainly, we want to support standard syntax.
Do we want to support simplified syntax OFFSET n LIMIT m as well?
Do we want to support LIMIT m OFFSET n version too? It is misleading (it looks as if LIMIT was applied before OFFSET). But this form is available in eg PostgreSQL and MySQL.
@martint ?

@martint
Copy link
Member Author

martint commented Apr 22, 2019

Do we want to support simplified syntax OFFSET n LIMIT m as well?

Yes, that would fall out naturally from having LIMIT be an alternate form of FETCH FIRST/NEXT.

Do we want to support LIMIT m OFFSET n version too. It is misleading [...]

Agreed. That's confusing. Let's not add it for now. We can always expand later.

@martint
Copy link
Member Author

martint commented Apr 22, 2019

Some additional details from the spec:

29) The declared type of <offset row count> shall be an exact numeric with scale 0 (zero).
30) The declared type of <fetch first row count> shall be an exact numeric with scale 0 (zero).
31) The declared type of the <simple value specification> simply contained in <fetch first percentage> shall be numeric.
32) If a <query expression> simply contains a <fetch first clause> that simply contains WITH TIES, then the 
    <query expression> shall simply contain an <order by clause>. If a <query primary> simply contains a <fetch first clause>
     that simply contains WITH TIES, then the <query primary> shall simply contain an <order by clause>.
6) Let OCT be the cardinality of T.
  a) Case:
    i) If <result offset clause> is not specified, then let RORC be 0 (zero).
    ii) Otherwise, let RORC be the value of <offset row count>.
  b) If RORC is less than 0 (zero), then an exception condition is raised: data exception — invalid row
     count in result offset clause.
  c) Case:
    i) If RORC is greater than or equal to OCT, then all rows are removed from T and the cardinality of T is 0 (zero).
    ii) Otherwise, the first RORC rows in order as specified by General Rule 5) are removed from T and the cardinality 
       of T is (OCT – RORC).
7) If <fetch first clause> is specified, then:
  a) Let OCT2 be the cardinality of T.
        NOTE 332 — OCT2 is the cardinality of T after the removal of any rows from T by the application of the prior General Rule.
  b) Case:
    i) If <fetch first row count> is specified, then let FFRC be the value of <fetch first row count>.
    ii) If <fetch first percentage> is specified, then let FFP be the <simple value specification> simply contained in 
        <fetch first percentage>, and let LOCT be a <literal> whose value is OCT. Let FFRC be the value of CEILING ( FFP * LOCT / 100.0E0 )
            NOTE 333 — The percentage is computed using the number of rows before removing the rows specified by <offset row count>.
    iii) Otherwise, let FFRC be 1 (one).
  c) If FFRC is less than 1 (one), then an exception condition is raised: data exception — invalid row count in fetch first clause.
  d) If FFRC is less than OCT2, then Case:
    i) If WITH TIES is specified, then rows other than the first FFRC rows in the order specified by General Rule 6) of 
       this Subclause and their peers as defined in Subclause 10.10, “<sort specification list>”, are removed from T, 
       and the cardinality of T is the number of rows remaining in T.
    ii) Otherwise, rows other than the first FFRC rows in order as specified by General Rule 6) of this Subclause are 
        removed from T and the cardinality of T is FFRC.

@kasiafi
Copy link
Member

kasiafi commented Apr 23, 2019

There are two major differences between LIMIT and FETCH:

  1. LIMIT value must be specified to either number or ALL, while FETCH value defaults to 1 if not specified,
  2. according to spec, FETCH value cannot be lower than 1, while LIMIT 0 is accepted.

The former difference doesn't bother me, but I find the latter difference hard to justify.
@martint do you think I could disobey the spec and allow FETCH zero rows?

@martint
Copy link
Member Author

martint commented Apr 23, 2019

Let's start with strict compliance. We can always relax later if necessary.

@dain
Copy link
Member

dain commented May 8, 2019

Fetch was added in 310. Offset still remains to be implemented.

@findepi
Copy link
Member

findepi commented May 10, 2019

Offset was added in #732, will be available in Presto 311.

@findepi
Copy link
Member

findepi commented Jun 11, 2019

WITH TIES was added in #832, will be available in Presto 315.

@arminmaurice1990
Copy link

arminmaurice1990 commented Jan 14, 2020

@martint @findepi Would it be okay if we made a pr with the LIMIT then OFFSET ordering being allowed? We run queries over several datastore not supported by Presto which expect this ordering, and it seems to be somewhat standard for many sql (and sql-ish) databases.

@martint
Copy link
Member Author

martint commented Jan 14, 2020

We run queries over several datastore not supported by Presto which expect this ordering, and it seems to be somewhat standard for many sql (and sql-ish) databases.

Can you elaborate? It's unlikely that queries that run in other systems will run out of the box in Presto unless they are very simple, anyway. There are subtle differences in dialects, support for certain SQL features and available functions.

@arminmaurice1990
Copy link

arminmaurice1990 commented Jan 14, 2020

https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
As above, the syntax is almost fully interchangeable for non-trivial queries, save for the ordering of limit and offset (offset being part of the limit clause, same with Postgres syntax https://www.postgresql.org/docs/8.1/queries-limit.html which I know presto already translates to).

edit: seems as though this is standard with impala as well
http://impala.apache.org/docs/build/html/topics/impala_offset.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

5 participants