Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8627: invalidate table schema cache on DDL #8652

Merged

Conversation

klimov-paul
Copy link
Member

Fix #8627: invalidate table schema cache on DDL

Any DDL statement composed using yii\db\Command methods automatically refreshing related table schema.
Migration::execute() refreshes entire database schema, because raw SQL may contain DDL, which changes several tables.

@klimov-paul
Copy link
Member Author

I am unsure about naming for Command::requireTableSchemaRefreshment() and Command::refreshTableSchema(). Any suggestion is welcome.

@klimov-paul klimov-paul added the type:bug Bug label Jun 4, 2015
@klimov-paul klimov-paul added this to the 2.0.5 milestone Jun 4, 2015
@@ -537,6 +541,7 @@ public function createTable($table, $columns, $options = null)
public function renameTable($table, $newName)
{
$sql = $this->db->getQueryBuilder()->renameTable($table, $newName);
$this->requireTableSchemaRefreshment($table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop $_refreshTableName and simply call $this->db->invalidateTableSchema($table) for the following reasons:

  • The new invalidateTableSchema() can be used by users directly in case they use raw SQL to modify a table.
  • Although methods like renameTable() doesn't change the table (because it doesn't execute the SQL), it is not likely that you would call this method without executing it. Moreover, even if you don't execute the SQL, the only loss is that we incorrectly invalidated the table schema and thus cause an extra table schema reloading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine following:

$db = Yii::$app->db;
$command = $db->createCommand()->dropTable('foo'); // prepare
// some manipulate 'foo' table data
// ...
$command->execute(); // actual execution.

Or:

$db = Yii::$app->db;
$dropCommand = $db->createCommand()->dropTable('tmp');
if ($db->schema->getTableSchema('tmp')) {
    $dropCommand->execute(); // drop 1
}

$db->createCommand()->createTable('tmp', [...]);
// manipulate table 'tmp'

$dropCommand->execute(); // drop 2

Current implementation ensures cache invalidated in all cases, while simplifying it will produce unexpected results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

I think _refreshTableName should be set null when the command is being reused. Otherwise whenever you are reusing the command for other purposes, it will mark the table schema as invalid. Not very common, but should be similar to the use case you gave.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

@qiangxue
Copy link
Member

qiangxue commented Jun 6, 2015

We also have Connection::getTableSchema(). The main idea for exposing Connection::refreshTableSchema() is to make this more visible and accessible. Users are probably much more familiar with Connection than Schema.

Another arguable change is the addition of Schema::refresh() to Migration::execute(). @yiisoft/core-developers What's your opinion?

@klimov-paul
Copy link
Member Author

The main idea for exposing Connection::refreshTableSchema() is to make this more visible and accessible.

This can be done, but in this case Connection::refreshTableSchemaS() (or whatever the name will be) should be added as well for consistency.

Another arguable change is the addition of Schema::refresh() to Migration::execute().

I can't find any other way as raw SQL may content any DDL statement. We may attempt to parse the SQL string, searching for some particular keywords, but DDL syntax may vary from one DBMS to other. I don't think such appraoch will be reliable.

@samdark
Copy link
Member

samdark commented Jun 6, 2015

Refreshing schema at migration start is the safest approach I can think of.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

This can be done, but in this case Connection::refreshTableSchemaS() (or whatever the name will be) should be added as well for consistency.

Well, it's very rare that you would need to refresh all table schemas.

I can't find any other way as raw SQL may content any DDL statement. ...

Our goal is to make sure schema-changing methods are working correctly. For raw SQL execution, we should emphasize by documentation that manually schema refresh is required. As you can see, refreshing schema in Migration::execute() still doesn't cover all cases (e.g. Command::execute()).

Refreshing schema at migration start is the safest approach I can think of.
Show all checks

It doesn't help in this case (e.g., you modify a table schema followed by row insertions in the same migration).

@samdark
Copy link
Member

samdark commented Jun 7, 2015

Right. Then I like the plan and PR. I see nothing wrong in having refreshTableSchemaS. Sometimes you don't care about re-reading schema (in MySQL it's fast) and don't want to name all 10 tables which schema you've affected.

@klimov-paul
Copy link
Member Author

Our goal is to make sure schema-changing methods are working correctly.

Disagreed. The problem, which is exposed at the issue #8627, is about unreliable DB migrations applying. Currently there are many migration written already, which may cause this problem appear at certain circumstances. We must ensure migration code as it mentioned in the docs is reliable. Enforcing developers to recheck and rewrite thier old migrations does not seem right.

For raw SQL execution, we should emphasize by documentation that manually schema refresh is required.

This is true for the DDL code written for specific business logic support inside regular application code. But for migration this will produce bad user experience: why should I manually clear table schema cache inside migration, which exists to change table schemas?

As you can see, refreshing schema in Migration::execute() still doesn't cover all cases (e.g. Command::execute()).

Yes, it does not. Still it is unlikely Command::execute() will be used inside a migration, while Migration::execute() is available. Of course, extra docs should be added, but migration should be reliable, at least in case of common usage, without extra developer work.
I don't see much harm clearing all table schemas on Migration::execute(): inside migration it will not cause sugnificant performance lost.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

@klimov-paul Migration::execute() only deals with Migration::db. If a migration needs to handle multiple DBs, then you will see inconsistencies: when you call $this->execute(), you don't need to refresh schema; while when you call $db2->createCommand(...)->execute(), you have to refresh schema manually.

@samdark When you need to refresh multiple tables, $db->schema->refresh() is already very self-explanatory.

@klimov-paul
Copy link
Member Author

If a migration needs to handle multiple DBs, then you will see inconsistencies

Yes, it will be inconsistencies in complex cases. Sure we need to warn developers about it. But we need to handle the common migration flow automatically.
It is rather populare practice to use plain SQL for DLL inside migrations. For example: working with MySQL, people like to create tables via PHPMyAdmin interface and then create a migration for them using SQL generated by SHOW CREATE TABLE. This is not a good practice, but common one.
If I remove refresh() from Migration::execute() the following migration code will NOT work:

$this->execute('CREATE TABLE status ...');
$this->insert('status', ['name' => 'some']);
// another migration: 
$this->execute('ALTER TABLE status ...');
$this->insert('status', ['title' => 'some']);

To make it work developer have to write it in following way:

$this->execute('CREATE TABLE status ...');
$this->insert('status', ['name' => 'some']);
// another migration: 
$this->execute('ALTER TABLE status ...');
$this->db->refreshTableSchema('status');  // does not look right
$this->insert('status', ['title' => 'some']);

Do not forget, we are talking about migrations, which are already written! Which were promised to be working. We are fixing a bug - not creating new feature.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

My point is that when you are working with multiple DBs, for one DB you don't need to refresh schema while for the other, you will have to. And even for working with a single DB, if you use Command::execute() directly (e.g. you are repeatedly executing SQLs which you don't want to use Migration::execute() to display the messages for), you still need to manually refresh schema.This is kind of hard to understand without knowing the internals. So why not just emphasizing the importance of always refreshing schema when using raw SQLs? This is the only rule that users need to remember.

@klimov-paul
Copy link
Member Author

why not just emphasizing the importance of always refreshing schema when using raw SQLs?

Simply because it enforces to add extra redundant code inside the migration. Migration up() and down() methods should be clean, clear of any code, which does not belong to actual database state change. Placing code like $this->db->refreshTableSchema('status'); all over migration code does not feel right - it seems redundant. It produces bad user experience - people will not accept this.
What we are going to tell to developers, which work on the at least 1 year old project? Should be ask them to review and edit all created so far migrations adding refreshTableSchema() all over them?
Currently you are saying to developer he/she must manually clear schema cache after database schema has been changed. For the migration this is noncence: migration exists for database schema changing, why it can not clear cache on its own?

I can agree on removal of Schema::refresh() from Migration::excute() in exchange of adding it at the begining (and perhaps at the end) of the migration. Migration internal code should remain clean. It may require extra effots for the complex cases, but not while developer uses only migration internal methods.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

Simply because it enforces to add extra redundant code inside the migration.

That's only when you are using raw SQLs. So it really depends on how common it is that you would use raw SQLs to adjust table schemas (and if it is indeed common, then is it common to use raw SQLs to insert rows). Also when the migrations are working with a different DB or using Command::execute() to execute SQLs, this PR still won't solve the problem. It actually even hides the problem because users may think we have solved all problems. This is something I don't like.

I can agree on removal of Schema::refresh() from Migration::excute() in exchange of adding it at the begining (and perhaps at the end) of the migration.

This doesn't solve the problem if you are altering a table and inserting rows into it in the same migration.

@samdark
Copy link
Member

samdark commented Jun 7, 2015

I saw usage of SQL in migrations many times so it's indeed a common practice.

When you need to refresh multiple tables, $db->schema->refresh() is already very self-explanatory.

OK. Fine with me to leave it there.

@samdark
Copy link
Member

samdark commented Jun 7, 2015

Technically schema refresh could be solved by parsing low level requests for ALTER statement but that means doing it for every request made. And, of course, it doesn't solve changing schema by another app.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

I saw usage of SQL in migrations many times so it's indeed a common practice.

What if the SQLs are executed via Command::execute()?

Technically schema refresh could be solved by parsing low level request...

What if it's done via a stored procedure?

@samdark
Copy link
Member

samdark commented Jun 7, 2015

Well, then we can't detect it. I understand what you're talking about: partial solution is worse than no solution in this case but I really don't like doing ->refresh() all the time. Especially in migrations.

@klimov-paul
Copy link
Member Author

This doesn't solve the problem if you are altering a table and inserting rows into it in the same migration.

For common migration - it will!
Usually developer performs all alteations to the table and then insert or update its data.
If at the begining of the migration cache is cleared, no table schema object will be created at plain execute(), which means table alteration will be applied just before insert/update.
For the single migration, while enableSchemaCache if false it works exactly this way. That is why it took so long for the issue #8627 to reveal itself.
If developer forget to clear cache after execute() inside migration and run up(), down(), up() to test it - migration will work fine - nothing will indicate anything is wrong, untill several migrations are run during same script execution.
Calling refresh() (at least) before migration apply ensures any sequence of migrations are run correctly. If extra refresing required during single migration run - developer will realize it during migration testing.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

Thanks for the explanation.

Are we currently disabling schema caching in migrations? I don't see the code there yet.

@klimov-paul
Copy link
Member Author

Disabling enableSchemaCache will not resolve the issue. If we disable - it we only ensures single migration applying works fine. If several migrations are applying in sequence they still may break.

@qiangxue
Copy link
Member

qiangxue commented Jun 7, 2015

I thought we already did this. Okay, I would prefer we refresh schema at the beginning of each migration.

@klimov-paul
Copy link
Member Author

Okay, I would prefer we refresh schema at the beginning of each migration.

refresh() has been moved to Migration::init().
Is there anything else to be changed?

@samdark
Copy link
Member

samdark commented Jun 8, 2015

A changelog entry? Codewise it looks good to me.

@klimov-paul
Copy link
Member Author

A changelog entry already present.

@samdark
Copy link
Member

samdark commented Jun 8, 2015

Right. Then it's complete.

klimov-paul added a commit that referenced this pull request Jun 8, 2015
…a-cache

Fix #8627: invalidate table schema cache on DDL
@klimov-paul klimov-paul merged commit 28877f8 into yiisoft:master Jun 8, 2015
@cebe
Copy link
Member

cebe commented Jun 8, 2015

👍

1 similar comment
@qiangxue
Copy link
Member

qiangxue commented Jun 9, 2015

👍

@creocoder
Copy link
Contributor

Great work 👍

@klimov-paul klimov-paul deleted the 8627-invalidate-table-schema-cache branch April 6, 2016 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants