-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V8: Speed up content migration #6191
Conversation
There is cmsPropertyData and not umbracoPropertyData. Are you talking about the same table ? |
@KDharmindar it's the same table, though with the columns renamed and changed around a bit. It's renamed during the migration to V8. |
@stevemegson my point is that I need to convert from v7 to v8 and the cmsPropertyData contains 2.1 million records. And it is taking days to update that table. I was monitoring the updates in SQL Profiler. Any suggestions to improve things? |
@KDharmindar it sounds like you're stuck on updates to the versionId2 column, which happen before the table is renamed. These changes improve that significantly - that step takes a minute on my site, which is a similar size to yours. I could share a custom build of 8.1.4 with these changes included if you're interested in testing it on your database? It only changes Umbraco.Core.dll. |
15d0eb9
to
114c351
Compare
@stevemegson sure. That would be a big favor if I get able to speed up things with this. |
No problem, here's the modified DLL. If you could share the log file after testing it, that would help me see where other improvements could be made. |
Database migration failed duplicate published document version. This is the error now. |
This is the log file |
Probably best to move to the forum to discuss any unrelated errors you're getting, but I think that error can only happen if there are already duplicate published versions in your database before the migration. This SQL should find the offending pages: SELECT nodeId, count(*)
FROM cmsDocument
WHERE published=1
GROUP BY nodeId
HAVING count(*) > 1 |
Duplicate issue resolved. Now the new one is as follows exec sp_executesql N'delete top (1) JQ This query is being called repeatedly. |
Hi @stevemegson, I've tested your changes on several huge sites, more than 2 million rows in cmsPropertyData (umbracoPropertyData ) and the improvement is great. For some sites there was a need to add aliases otherwise the ToDictionary() call failed, did it thru console app and also now One of the site can't upgrade given the
I've tested all the data in the dataDate and date format is fine. Would you know where to look for the offending data as the SQLProfiler doesn't give the exact location? |
@matijagrcic This SQL should give you all the values that it's trying to parse: select contentNodeId, versionId, propertytypeid, dataNvarchar
from cmsPropertyData
where propertytypeid in (
select id from cmsPropertyType
where Alias in ('umbracoMemberLastLockoutDate', 'umbracoMemberLastLogin', 'umbracoMemberLastPasswordChangeDate')
)
and dataNvarchar is not null |
Thanks @stevemegson but running this query doesn't return any rows. |
@matijagrcic Ah, I'm thinking of AddTypedLabels' behaviour in 8.2.0. In 8.1.x it used the IDs of properties rather than aliases, which causes problems with sites which have been upgraded from older versions and have other properties using those IDs. Try this SQL: select contentNodeId, versionId, propertytypeid, dataNvarchar
from cmsPropertyData
where propertytypeid in (32, 33, 34)
and dataNvarchar is not null |
Thanks 32, 33, 34 are
so certainly not dates, mostly ints, strings and some null values. |
@matijagrcic AddTypedLabels should be fixed in 8.2.0, so the easiest solution would be to upgrade to that. Here's a build of 8.2.0 with the changes from this PR applied: |
Thanks @stevemegson will test this out and let you know the result. |
This passed the AddTypedLabels now but failed when doing so almost there. Seems the index already exist or hasn't been dropped?
Found the node breaking the integrity, will delete it and rerun this again. |
That means that you have data which conflicts with the foreign key it's trying to create - a node in |
If anyone interested found the offending node by running select * from [umbracoNode]
WHERE [parentId] NOT IN
(SELECT [id] from [umbracoNode]) then removed it thru SQL using
@stevemegson Your changes are now tested on several projects where The current official solution was running for 2-4 days, depending on the DB and didn't completed. |
This PR is a lifesaver. Migration was taking multiple days, but got it down to circa 2 hours, with 5 million rows in cmsDataProperty. 👏 |
@leethomascook I'm glad it helped. Would you be able to share the log from the migration? I'm always keen to see where there might still be bottlenecks for large databases. |
Just ran this on an umbraco site with 2 mil+ rows in cmsPropertyData and the migration took only a couple of minutes vs the HOURS it was running before It would be great if this could be merged in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots to try to review here :) I've added some notes/comments inline for a few items. Some other notes here:
Use sp_rename for columns on SQL Server (mainly for umbracoPropertyData.textValue)
I don't see this in this PR?
VariantsMigration: Check for edited property data before copying versions where "published=newest" - we know that the copied versions won't be edited
I think this is the updatedIds
HashSet change? I'm just trying to follow along here. I know the original code was probably super painful to debug and work with so ideally if you can add detailed code comments about what is happening it will be most helpful for reviewers and anyone else that needs to dive into this code in the future.
VariantsMigration: Create extra versions for "published=newest" with a few large queries rather than a foreach loop
Yep this one will take some time to figure out. It's quite difficult to figure out what is going on when reviewing this, as above some code comments would go a long way to help.
VariantsMigration: Unpublished documents should have edited=1 (fixes warnings "Skip item id=x, both draft and published data are null.")
Is this the edited=~published
change?
Ideally, most of these variable names are updated to be meaningful. Sorry you've had to deal with these names like temp
, temp3
, etc... which are totally meaningless :/
{ | ||
var tableDefinition = Persistence.DatabaseModelDefinitions.DefinitionFactory.GetTableDefinition(typeof(PropertyDataDto), SqlSyntax); | ||
Execute.Sql(SqlSyntax.FormatPrimaryKey(tableDefinition)).Do(); | ||
Execute.Sql("CREATE UNIQUE NONCLUSTERED INDEX[IX_umbracoPropertyData_VersionId] ON [umbracoPropertyData]([versionId],[propertyTypeId],[languageId],[segment])").Do(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to rethink these indexes.
There's currently 4x indexes on this table: languageId, propertyTypeId, segment, versionId. It is most likely very true that all of these should be encapsulated under a single index which will significantly improve performance but we need to deal with those indexes first. There are no changes in this PR for the PropertyDataDto
class.
We need this index defined at the Dto level, then I thought the Delete.KeysAndIndexes(Constants.DatabaseSchema.Tables.PropertyData).Do();
in KeysAndIndexes
would do what is required without manually doing this?
Was there a calculated reason behind the decision for the ordering of these columns in this new index or just adding all columns that are used in a query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have made it clear here that this is a temporary index for the benefit of the remaining migrations, since we've got no indexes at this stage. The index is deleted in CreateKeysAndIndexes.cs
before we create the indexes defined by PropertyDataDto
.
If memory serves, I ended up with this particular index for the "set 'edited' to true whenever a 'non-published' property data is != a published one" step in VariantsMigration
, and it covers what's needed for all the Database.Update(propertyDataDto)
calls too.
// 'cos INSERT above has inserted the 'published' document version | ||
// and v8 always has a 'edited' document version too | ||
Database.Execute($@" | ||
INSERT INTO [cmsContentVersion] (nodeId, versionId, versionDate, userId, [current], text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this SQL needs to have non-hard coded table names, there is also a lot of column names with square brackets which should be dealt with with GetQuotedTableName. More ideally we'd use strongly typed SQL but at least we can not hard code table names or sql syntax. Same goes for NEWID(), this needs to be resolved from FormatSystemMethods(SystemMethods.NewGuid)
.
It may be useful for me to recreate this with separate commits that make the individual changes clearer, and add comments as I go. I think I squashed it into one commit to get rid of various dead end changes which I ended up overwriting, but coming back to the code now it does make it hard to see why some of the changes are there, particularly with some blocks of code being reordered.
It's the change to
It's just reordering the steps within that method, so that "need to add extra rows for where published=newest" comes after "set 'edited' to true whenever a 'non-published' property data is != a published one". The HashSet change is to avoid repeatedly setting the
Yes - the code assumed that the existing version will become the published version, and we'll copy it to create the newest version later. If the content is unpublished then we just want to make the existing version the newest version. |
Hi @stevemegson I agree about the individual PRs. It would certainly be easier to review and merge with smaller changes and comments. I've got some time booked out for reviewing this so just let me know here if/when you get other PRs ready and we can close this in favor of those. Thanks! |
I've broken out some of the changes into #7561, #7562, #7563, #7565, #7566. There's more to do in |
Actually the other changes don't lead to quite as much merge misery as I expected, so I've added #7567, #7568, #7569. The remaining change I made is to move the "set 'edited' to true whenever a 'non-published' property data is != a published one" step of |
amazing work @stevemegson ! 🎉 I've reviewed, and merged with a few fixes where necessary. great stuff! Ill close this PR for now but this is the one tagged for being due in 8.7 even though your smaller PRs are in there too. |
Some work on #5803.
Speed improvements
umbracoPropertyData
to speed up updates in later migrationssp_rename
for columns on SQL Server (mainly for umbracoPropertyData.textValue)umbracoPropertyData
,umbracoContentVersion
,umbracoMediaVersion
on SQL ServerOther fixes
edited
=1 (fixes warnings "Skip item id=x, both draft and published data are null.")languageId
andsegment
when checking for edited property datapublished
column after "reduce document to 1 row per content")This item has been added to our backlog AB#4555