Skip to content

Commit

Permalink
Update V3HighLevelDesign.md
Browse files Browse the repository at this point in the history
  • Loading branch information
sougou committed Jan 11, 2016
1 parent c1d550c commit a9f5b5c
Showing 1 changed file with 38 additions and 54 deletions.
92 changes: 38 additions & 54 deletions doc/V3HighLevelDesign.md
Original file line number Diff line number Diff line change
Expand Up @@ -226,17 +226,11 @@ limit 10
The steps performed by the SQL engine are as follows:

1. Scan t (FROM)

2. Filter rows where c=10 (WHERE)

3. Remove all columns other than a & b, and compute a+1 (SELECT)

4. Count and group unique a+1, b values (GROUP BY)

5. Filter counts that are > 10 (HAVING)

6. order the rows by descending count (ORDER BY)

7. Return the top 10 results (LIMIT)

The one thing that’s not obvious in a SELECT is that the select list is not the first clause that’s executed. It’s the FROM.
Expand All @@ -245,25 +239,15 @@ The one thing that’s not obvious in a SELECT is that the select list is not th

If a SELECT statement was used just to fetch results, things would be very simple. However, it can be used in other parts of a query, and can produce 5 kinds of values:

1. A simple value:
1. A simple value: `select * from t where id = (select 1 from dual)`

`select * from t where id = (``_select 1 from dual_``)`
2. A value tuple: multiple values of the same type: `select * from t1 where id in (select val from t2)`

2. A value tuple: multiple values of the same type:
3. A row tuple: multiple values of different types. A row tuple can also be used as a value: `select * from t1 where (a,b) = (select b,c from t2 where pk=1)`

`select * from t1 where id in (``_select val from t2_``)`
4. Rows: results of a regular query. Note that rows can be treated as a value tuple of row tuples. For example, this statement is valid: `select * from t1 where (a,b) in (select b,c from t2)`

3. A row tuple: multiple values of different types. A row tuple can also be used as a value:

`select * from t1 where (a,b) = (``_select b,c from t2 where pk=1_``)`

4. Rows: results of a regular query. Note that rows can be treated as a value tuple of row tuples. For example, this statement is valid:

`select * from t1 where (a,b) in (``_select b,c from t2_``)`

5. A virtual table: If you don’t strip out the field info from the original query, then a select can act as a table, where the field names act as column names. For example, this statement is valid:

`select * from (``_select a, b from t2_``) as t where t.a=1`
5. A virtual table: If you don’t strip out the field info from the original query, then a select can act as a table, where the field names act as column names. For example, this statement is valid: `select * from (select a, b from t2) as t where t.a=1`

This opens the door to some powerful expressibility, the other edge of the sword being indefinite complexity.

Expand Down Expand Up @@ -586,7 +570,7 @@ Now that all operators are defined, we have to see if they’re complete. Let’
```
01: (select count(*), ta1.col+1, tb1.col
02: from ta1 join tb1 on ta1.id=tb1.id
03: where ta1.``c``ol2=10 and ta1.col2 > any
03: where ta1.col2=10 and ta1.col2 > any
04: (select c from tc1 where tc1.id=ta1.id)
05: group by ta1.col+1, tb1.col
06: having count(*) > 10
Expand All @@ -610,7 +594,7 @@ The above query can be represented as a single expression. But let’s start wit
04: SubqueryCScanResult = Select(
Filter(
Scan(tc1),
tc1.id=``JoinResult1.ta1.id``,
tc1.id=JoinResult1.ta1.id,
),
tc1.c)
03: FilterResult1 = Filter(
Expand All @@ -628,7 +612,7 @@ The above query can be represented as a single expression. But let’s start wit
10: SelectResult2 = Select(FilterResult2, (1, ta2.col+1, tb2.col))
13: Result2 = Aggregate(SelectResult2, count(‘1’))
// union
09: UnionResult = ``Aggregate``(Merge(Result1, Result2), NULL)
09: UnionResult = Aggregate(Merge(Result1, Result2), NULL)
14: OrderResult = Sort(UnionResult, (‘1’ desc))
15: Result = Limit(OrderResult, 0, 5)
```
Expand All @@ -646,7 +630,7 @@ Result = Limit(
Aggregate(
Select(
Filter(
``JoinResult1`` = InnerJoin(
JoinResult1 = InnerJoin(
Scan(ta1),
Scan(tb1),
ta1.id=tb1.id,
Expand All @@ -655,7 +639,7 @@ Result = Limit(
Select(
Filter(
Scan(tc1),
tc1.id=``JoinResult1``.ta1.id,
tc1.id=JoinResult1.ta1.id,
),
tc1.c,
),
Expand Down Expand Up @@ -1061,16 +1045,16 @@ Also, the inner query cannot access any columns of the outer t2 because it has i
A corollary: since subquery nesting can have many levels, an inner query can still reference results from any of the outer scopes. For example, this is a valid query:

```
select`` ``t1.a``, t1.b
select t1.a, t1.b
from t1
where t1.id in (
``select`` t2.id
select t2.id
from t2
where t2.id in (
select t3.id
from t3
where t3.id = ``t1.a
``)
where t3.id = t1.a
)
)
```

Expand All @@ -1086,7 +1070,7 @@ The result of such a subquery will be a new virtual table with its own vindex co
select sq.a, t2.b
from
(select a from t1 where id = :id) as sq
join t2` on`` t2.b = sq.a
join t2 on t2.b = sq.a
```

The main symbol table will look like this:
Expand Down Expand Up @@ -1120,11 +1104,11 @@ The symbol table for the `‘from t1’` subquery will be in its own scope. Its
*Conceptually speaking, there’s no strong reason to prevent the FROM clause from referencing columns of an outer query. MySQL probably disallows this because it leads to confusing scoping rules. Let’s take a look at this example:*

```
select id from ``a`` where id in (
select id from a where id in (
select b.id, a.id
from
(select id from b where b.id=``a.id``) b
join ``d as a`` on ``a.id``=b.id)
(select id from b where b.id=a.id) b
join d as a on a.id=b.id)
```

*If we allowed an outer scope, then b.id=a.id becomes ambiguous. We don’t know if it should refer to the outer ‘a’, or the one that is yet to be created as part of the current scope. Needless to say, one shouldn’t be writing such unreadable queries.*
Expand All @@ -1133,11 +1117,11 @@ select id from ``a`` where id in (

By the fact that the ON clause is part of a FROM clause, you cannot refer to an element of an outer query. This is an invalid query:

`select id from a ``where id in (select b.id from b join c on b.id=``a.id``)`
`select id from a where id in (select b.id from b join c on b.id=a.id)`

But this is valid:

`select id from a where id in (select b.id from b join c where b.id=``a.id``)`
`select id from a where id in (select b.id from b join c where b.id=a.id)`

Furthermore, you can only refer to elements that participate in the current JOIN. But the following query is still valid:

Expand All @@ -1147,7 +1131,7 @@ Because of associativity rules, c is actually joined with ‘a join b’.

But this query is not valid:

`select * from a join ``(b join c on c.id = a.id)`
`select * from a join (b join c on c.id = a.id)`

The ON clause participants are only b and c because of the parenthesis. So, you’re not allowed to reference a.id in that join. In other words, analysis of the ON clause requires a list of participants. The analyzer will need to check with this list before looking inside the symbol table, or we could exploit Go interfaces here by sending a subset symbol table that has the same interface as the original one.

Expand Down Expand Up @@ -1229,7 +1213,7 @@ For the sake of simplicity, we’ll not allow dependencies to use non-standard c

This should instead be rewritten as:

`select * from (select a, count(*) ``c`` from t1) t where ``t.c`` > 0`
`select * from (select a, count(*) c from t1) t where t.c > 0`

This way, we can build a bind variable like `‘:_t_c’` for the outer query if we have split the above query into two.

Expand Down Expand Up @@ -1421,11 +1405,11 @@ But left joins are slightly different, because the join condition is applied *to

is same as:

`a LeftJoin (b LeftJoin c where c.id=b.id) ``where c.id=a.id``)`
`a LeftJoin (b LeftJoin c where c.id=b.id) where c.id=a.id)`

But it’s not the same as:

`a LeftJoin (b left join c where c.id=b.id ``and c.id=a.id``)`
`a LeftJoin (b left join c where c.id=b.id and c.id=a.id)`

If the above joins were normal joins, then all three would be equivalent.

Expand Down Expand Up @@ -1525,16 +1509,16 @@ At the end of the analysis, the output is a symbol table and a reduced tree repr

Here’s an example:

`a join b on b.id=a.id ``join c on c.id=b.id2 join d on d.id2=c.id2 ``join e on e.id=a.id`
`a join b on b.id=a.id join c on c.id=b.id2 join d on d.id2=c.id2 join e on e.id=a.id`

will be represented as:

```
``J2
J2
/ \
``J1`` e ``_where e.id=a.id_
J1 e where e.id=a.id
/ \
a join b on b.id=a.id ``c join d on d.id2=c.id2 ``_where c.id=b.id2_
a join b on b.id=a.id c join d on d.id2=c.id2 where c.id=b.id2
```

### WHERE clause
Expand All @@ -1559,17 +1543,17 @@ the ON clause is processed, which is not until `LeftJoin` returns the results.

*There are some practical use cases where you’d want to add conditions in the WHERE clause, typically for NULL checks. To support this in the future, we can use this technique that converts a LEFT JOIN into a normal JOIN:*

`_from a left join b on b.id=a.id where b.id is NULL_`
`from a left join b on b.id=a.id where b.id is NULL`

*After split, the query on b should have become:*

`_from b where b.id=a.id_`
`from b where b.id=a.id`

*We cannot push down the null check in the above case. But we can, if we rewrote it as:*

`_from (select 1) as t left join b on b.id=a.id _``_where b.id is NULL_`
`from (select 1) as t left join b on b.id=a.id where b.id is NULL`

*The (select 1) acts as a fake driver row for the left join, which allows us to tack on the where clause and also change the` `*_LeftJoin_`* to a `*_Join_`*.*
*The (select 1) acts as a fake driver row for the left join, which allows us to tack on the where clause and also change the `LeftJoin` to a `Join`.*

###### Subqueries

Expand All @@ -1588,9 +1572,9 @@ If a, b and c where in different groups, the output would be:
```
J
/ \
J c where (c.id=``b.id``) and ``(cond2(a.col, c.col))
J c where (c.id=b.id) and (cond2(a.col, c.col))
/ \
a b where (b.id=``a.id``) and ``(cond1(a.col, b.col))
a b where (b.id=a.id) and (cond1(a.col, b.col))
```

The cond2 expression gets pushed into the the where clause for table ‘c’ because it’s the right-most group that’s referenced by the condition. External references will be changed to appropriate bind variables by the rewiring phase.
Expand Down Expand Up @@ -1618,9 +1602,9 @@ If an expression references columns from more than one group, then the expressio
*can be resolved with the following join*

```
_select a.col, a.bar from a // produce _a_col & _a_bar_
_Join_
_select :_a_col+b.col from b where b.foo=:_a_bar_
select a.col, a.bar from a // produce _a_col & _a_bar
Join
select :_a_col+b.col from b where b.foo=:_a_bar
```

*However, such queries are not very efficient for VTTablet. It can’t cache field info, or use rowcache. Such push-downs don’t benefit MySQL either. So, it’s better to wait till VTGate implements the ability to evaluate expressions.*
Expand Down Expand Up @@ -1771,7 +1755,7 @@ In order to resolve dependencies correctly, we need the symbol tables that were

1. Change the external reference to the corresponding bind var name.

2. If that column is not already in the SELECT list of the other group, we add it.
2. If that column is not already in the select list of the other group, we add it.

3. Mark the column number to be exported as the bind var name that we chose.

Expand Down

0 comments on commit a9f5b5c

Please sign in to comment.