Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NuGet.org Bug]: many catalog package details leaves are redundant (duplicate) because of improper deprecation no-oping #9921

Open
joelverhagen opened this issue Apr 17, 2024 · 0 comments
Assignees
Labels

Comments

@joelverhagen
Copy link
Member

Impact

I'm unable to use NuGet.org

Describe the bug

As part of #8873, I was interested in implementing proper no-oping for deprecation. This is to ensure that redundant API calls do not easily cause many unnecessary V3 catalog leaves and therefore load on the DB.

I did a data investigation with NuGet.Insights to see how often duplicate catalog leaves occurred. It turns out... a lot!

Of catalog leaves with deprecation information, over 15% are duplicate, i.e. their mutable metadata (listed, vulnerabilities, and deprecation) are the same as the previous leaf for that same package.

let SameAsPreviousLeaves = NiCatalogLeafItems
| where isnotempty(Deprecation)
| order by Identity asc, CommitTimestamp asc
| project LowerId, Identity, CommitTimestamp, IsListed, Deprecation = tostring(Deprecation), Vulnerabilities = tostring(Vulnerabilities)
| extend SameAsPrevious = Identity == prev(Identity) and Deprecation == prev(Deprecation) and IsListed == prev(IsListed) and Vulnerabilities == prev(Vulnerabilities)
| project-reorder Identity, CommitTimestamp, SameAsPrevious, Deprecation, LowerId;
let LeafCount = toscalar(NiCatalogLeafItems | count);
SameAsPreviousLeaves
| summarize LeavesWithDeprecation = count(), DuplicateLeaves = countif(SameAsPrevious)
| extend TotalLeaves = LeafCount
| extend ['Duplicate % of total'] = round(100.0 * DuplicateLeaves / TotalLeaves, 2)
| extend ['Duplicate % of deprecated'] = round(100.0 * DuplicateLeaves / LeavesWithDeprecation, 2)
LeavesWithDeprecation DuplicateLeaves TotalLeaves Duplicate % of total Duplicate % of deprecated
684887 103619 12725635 0.81 15.13

I believe this is made even worse for users that have access to the deprecation API private preview. It's easy to re-request deprecation of a growing set of versions with the API and cause a growing wave of duplicate deprecations. azure-sdk is at the top of the deprecation list because it is enabled for the deprecation API.

let SameAsPreviousLeaves = NiCatalogLeafItems
| where isnotempty(Deprecation)
| order by Identity asc, CommitTimestamp asc
| project LowerId, Identity, CommitTimestamp, IsListed, Deprecation = tostring(Deprecation), Vulnerabilities = tostring(Vulnerabilities)
| extend SameAsPrevious = Identity == prev(Identity) and Deprecation == prev(Deprecation) and IsListed == prev(IsListed) and Vulnerabilities == prev(Vulnerabilities)
| project-reorder Identity, CommitTimestamp, SameAsPrevious, Deprecation, LowerId;
SameAsPreviousLeaves
| summarize DeprecationCount = count(), DuplicateCount = countif(SameAsPrevious) by Identity, LowerId
| join kind=inner NiPackageOwners on LowerId
| project LowerId, Identity, Owners, DeprecationCount, DuplicateCount
| mv-expand Owner = Owners to typeof(string)
| summarize DeprecationCount = sum(DeprecationCount), DuplicateCount = sum(DuplicateCount) by Owner
| order by DuplicateCount desc
| extend ['Duplicate %'] = round(100.0 * DuplicateCount / DeprecationCount, 2)
| take 10
Owner DeprecationCount DuplicateCount Duplicate %
Microsoft 112632 55381 49.17
azure-sdk 74116 52725 71.14
monk.soul 73453 21262 28.95
chinadotnet 61946 17637 28.47
dotnetchina 61946 17637 28.47
Thinkka 3860 3497 90.6
nugetservicebus 4296 3030 70.53
AppInsightsSdk 4168 2770 66.46
RabbitFoot 15833 1990 12.57
furion.net 6501 1847 28.41

We should implement proper no-oping in PackageDeprecationService. Currently the code updates the LastEdited field on ALL packages in the deprecation batch if there is ANY change (rather than no-oping on a package-by-package basis).

if (_entitiesContext.HasChanges)
{
using (var strategy = new SuspendDbExecutionStrategy())
using (var transaction = _entitiesContext.GetDatabase().BeginTransaction())
{
await _entitiesContext.SaveChangesAsync();
await _packageUpdateService.UpdatePackagesAsync(packages);

Repro Steps

  1. Deprecate version X and Y of a package via the UI.
  2. Deprecate version X, Y, and Z of a package via the UI, carefully selecting the same deprecation settings as the previous step.

Expected Behavior

Version Z should only have it's LastEdited value updated.

Actual: X, Y, and Z have their LastEdited value updated.

Screenshots

No response

Additional Context and logs

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants