Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Inconsistent Results When Executing SELECT with MAX() Function #15565

Closed
wangweicugw opened this issue Mar 26, 2024 · 6 comments
Closed

Comments

@wangweicugw
Copy link
Contributor

wangweicugw commented Mar 26, 2024

Overview of the Issue

I've encountered an issue with inconsistent results when executing a query that involves the MAX() aggregate function. The query is as follows:

SELECT empno, MAX(sal) FROM emp;

When I execute this SQL multiple times, the result set of the query is inconsistent, with different empno values being returned for the same MAX(sal) value upon multiple executions.
I expect the query to return a consistent result set each time.

image

Reproduction Steps

1.Deploy the following vschema:

{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    },
  "tables": {
    "emp": {
      "column_vindexes": [
        {
          "column": "deptno",
          "name": "hash"
        }
      ]
    }
  }
}

2.Deploy the following schema:

CREATE TABLE emp (
    empno bigint NOT NULL,
    ename VARCHAR(10),
    job VARCHAR(9),
    mgr bigint,
    hiredate DATE,
    sal bigint,
    comm bigint,
    deptno bigint,
    PRIMARY KEY (empno)
) Engine = InnoDB
  COLLATE = utf8mb4_general_ci;

INSERT INTO emp (empno, ename, job, mgr, hiredate, sal, comm, deptno) VALUES (1, 'John', 'Manager', NULL, '2022-01-01', 5000, 500, 1);

3.Run

SET GLOBAL sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

Execute this SQL multiple times and you may see different results.

mysql> SELECT empno, MAX(sal) FROM emp;
+-------+----------+
| empno | max(sal) |
+-------+----------+
|     1 |     5000 |
+-------+----------+
1 row in set (0.01 sec)

mysql> SELECT empno, MAX(sal) FROM emp;
+-------+----------+
| empno | max(sal) |
+-------+----------+
|  NULL |     5000 |
+-------+----------+
1 row in set (0.01 sec)

mysql> SELECT empno, MAX(sal) FROM emp;
+-------+----------+
| empno | max(sal) |
+-------+----------+
|     1 |     5000 |
+-------+----------+
1 row in set (0.01 sec)

mysql> SELECT empno, MAX(sal) FROM emp;
+-------+----------+
| empno | max(sal) |
+-------+----------+
|     1 |     5000 |
+-------+----------+
1 row in set (0.01 sec)

Binary Version

vtgate version Version: 18.0.3 (Git revision 72528db2b706b7c36bc8236cd765b99b256d247f branch 'HEAD') built on Mon Mar 25 17:21:25 CST 2024 by wangwei@ZBMac-xxxxx using go1.21.8 darwin/arm64

Operating System and Environment details

Darwin 21.1.0
arm64

Log Fragments

No response

@wangweicugw wangweicugw added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Mar 26, 2024
@wangweicugw
Copy link
Contributor Author

wangweicugw commented Mar 26, 2024

I think there might be an issue with the TryExecute method in go/vt/vtgate/engine/scalar_aggregation.go. To illustrate with my current example, consider a situation where there is only one data entry, but suppose it is split across two shards. In this case, one shard would return null, null, like this

mysql> SELECT empno, MAX(sal) FROM emp;
+-------+----------+
| empno | max(sal) |
+-------+----------+
|  NULL |     NULL |
+-------+----------+
1 row in set (0.01 sec)

and the other shard would return 1, 5000. , like this

mysql> SELECT empno, MAX(sal) FROM emp;
+-------+----------+
| empno | max(sal) |
+-------+----------+
|     1 |     5000 |
+-------+----------+
1 row in set (0.01 sec)

If null, null is in the first row of result.Rows, the final result would be null, 5000. Conversely, if 1, 5000 is in the first row of result.Rows, the final result would be 1, 5000.

@mattlord
Copy link
Contributor

mattlord commented Mar 26, 2024

This is not obviously a Vitess bug to me. I say that as the results for the non-aggregated column in the query you ran are not guaranteed in SQL. If you tried to execute the query directly on one of the mysqld instances you should get this error (same type of query used here with the local example's: ./101_initial_cluster.sh):

mysql> select sku, max(price) from product;
ERROR 1140 (42000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'vt_commerce.product.sku'; this is incompatible with sql_mode=only_full_group_by

When I execute this same query in vtgate (v20-dev):

mysql>  select sku, max(price) from product;
ERROR 1140 (42000): target: commerce.0.primary: vttablet: rpc error: code = InvalidArgument desc = In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'vt_commerce.product.sku'; this is incompatible with sql_mode=only_full_group_by (errno 1140) (sqlstate 42000) (CallerID: userData1): Sql: "select sku, max(price) from product", BindVars: {}

Are you explicitly modifying MySQL's sql_mode in your setup to remove the default only_full_group_by flag? I can't try and repeat what you saw as you did not provide a full test case (e.g. using the local examples). Each Vitess cluster is a bit of a snowflake and I have no details on yours.

@mattlord mattlord self-assigned this Mar 26, 2024
@mattlord mattlord added Can't Repeat and removed Needs Triage This issue needs to be correctly labelled and triaged labels Mar 26, 2024
@mattlord
Copy link
Contributor

To be more clear, this is not guaranteed in SQL -- the language itself (not just Vitess or MySQL). SQL is declarative. And the results you declared in your statement are:
select, any sku column value, the maximum value in the price column, from the employee table

This is the kind of statement that MySQL, many, many years ago, would let you execute w/o any warnings or errors even though the results are not deterministic. The results for that are not deterministic as written, thus them not being deterministic in Vitess is not in and of itself a bug — the same is true in a single stand-alone mysqld instance. https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html

@wangweicugw
Copy link
Contributor Author

Yes, I changed the sql_mode. Apologies for not mentioning it earlier. I set it globally as follows:

SET GLOBAL sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

@wangweicugw
Copy link
Contributor Author

wangweicugw commented Mar 26, 2024

Actually, this SQL comes from here; the original SQL was SELECT empno, MAX(sal) FROM emp HAVING COUNT(*) > 3. I discovered that this SQL could not pass in my test case, so I simplified the SQL and found that it still could not pass.

using my current example, MySQL returns a consistent result no matter how many times I execute it.

@mattlord
Copy link
Contributor

If you look at the test executing that query, and the function it uses, you will see that it's confirming in this case that we get the same error when executing it against mysqld and against vtgate:

func (mcmp *MySQLCompare) ExecAllowAndCompareError(query string) (*sqltypes.Result, error) {
	mcmp.t.Helper()
	vtQr, vtErr := mcmp.VtConn.ExecuteFetch(query, 1000, true)
	mysqlQr, mysqlErr := mcmp.MySQLConn.ExecuteFetch(query, 1000, true)
	compareVitessAndMySQLErrors(mcmp.t, vtErr, mysqlErr)

	// Since we allow errors, we don't want to compare results if one of the client failed.
	// Vitess and MySQL should always be agreeing whether the query returns an error or not.
	if vtErr == nil && mysqlErr == nil {
		vtErr = compareVitessAndMySQLResults(mcmp.t, query, mcmp.VtConn, vtQr, mysqlQr, false)
	}
	return vtQr, vtErr
}

And as I showed in my example, they do.

I discovered that this SQL could not pass in my test case, so I simplified the SQL and found that it still could not pass.

I don't know what this means.

using my current example, MySQL returns a consistent result no matter how many times I execute it.

Sure, if the state in that table does not change. Just as in if the rows came back from the shards in the same order every time in vtgate you would get the same results. The point is that the query as you wrote it is non-deterministic so you cannot expect it to be deterministic.

I'm going to close this as there's no Vitess bug here that I can see. If you feel it was closed in error, please add a comment with a clear description of what you feel the bug is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants