Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: decouple user used storage sql function #2069

Merged
merged 1 commit into from
Nov 1, 2022

Conversation

vasco-santos
Copy link
Contributor

@vasco-santos vasco-santos commented Oct 26, 2022

This PR aims to optimize performance for calculating user used storage. It includes:

  • decouple user_used_storage into 2 functions, one for regular uploads and other for psa
  • both regular uploads and psa now ignore Pin status. If it is not Pinned, we won't know the dag size.
    • this should be a big boost given Pin table is by far the largest table we have
    • an accepted side effect by us is that non completed uploads that get stalled will be accounted
  • changes psa sql indexes so that a sequential scan is avoided
    • seq_scan on psa_pin_request is done to check the deleted_at and to join with auth_key table
    • psa_pin_request_search_idx should perform better and includes the 3 columns used
    • other psa indexes were only used in this flow

@vasco-santos vasco-santos force-pushed the perf/decouple-user-used-storage-sql-function branch 20 times, most recently from 736a76f to 25cc8d3 Compare October 27, 2022 11:14
@vasco-santos vasco-santos temporarily deployed to production October 27, 2022 11:20 Inactive
@vasco-santos vasco-santos force-pushed the perf/decouple-user-used-storage-sql-function branch from 25cc8d3 to 01bb640 Compare October 27, 2022 11:34
@vasco-santos vasco-santos temporarily deployed to production October 27, 2022 11:40 Inactive

CREATE OR REPLACE FUNCTION user_used_storage(query_user_id BIGINT)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping this function because users_by_storage_used uses it. So, just refactored to re-use partial functions created

assert.strictEqual(data.usedStorage.uploaded, 32000)
assert.strictEqual(data.usedStorage.psaPinned, 10000)
assert.strictEqual(data.usedStorage.psaPinned, 710000)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed updated as now we are counting fixtures that were not pinned status from https://github.com/web3-storage/web3.storage/blob/main/packages/api/test/fixtures/init-data.sql#L156-L167

Comment on lines -186 to -202
// Create "Failed" Upload. It should not be counted.
const cid3 = await randomCid()
const dagSize3 = 100000
await createUpload(client, user._id, authKey._id, cid3, {
dagSize: dagSize3,
pins: pinsError
})

// Create Upload not pinned yet. It should not be counted yet.
await createUpload(client, user._id, authKey._id, await randomCid(), {
dagSize: 1000000,
pins: initialPinsNotPinned
})

const usedStorageWithFailed = await client.getStorageUsed(user._id)
assert.strictEqual(usedStorageWithFailed.uploaded, dagSize1 + dagSize2, 'used storage should not count unpinned')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would be accounted now, so really no need to add them given there is no exception anymore

@@ -127,37 +127,13 @@ export async function createUserWithFiles (dbClient, options = {}) {
const pinRequests = 3
const dagSize = Math.ceil(((percentStorageUsed / 100) * storageQuota) / (uploads + pinRequests))

// Create a failed upload.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failed + Not pinned are now counted too

Copy link
Contributor

@flea89 flea89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! This should hopefully improve perf substantially!

As discussed offline, it'd be great to make it obvious to users that failed/uncompleted uploads might count against their quota.
Create a ticket for that here?

/**
* @param {Number} num
*/
export function safeNumber (num) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit.

We could call this function from parseTextToNumber.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not parsing anything - just ensuring it's within the safe integer range.

Maybe validateSafeInteger?

Copy link
Contributor

@flea89 flea89 Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I meant is

export function parseTextToNumber (numberText) {
  const num = Number(numberText)
  validateSafeInteger(num)
  return num
}

/**
 * @param {Number} num
 */
export function validateSafeInteger (num) {
  if (!Number.isSafeInteger(num)) {
    throw new Error('Invalid integer number.')
  }
  return num
}

to avoid duplicating the validation in 2 functions and making sure for instance we're raising a consistent Error.

Really minor ;)

.rpc('user_used_storage', { query_user_id: userId })
.single()
const [userUploadedResponse, psaPinnedResponse] = await Promise.all([
this._client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale behind splitting the 2 up?
Is it to better understand where the bottleneck is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goal is to decrease the request time to be less likely of getting to the 30s timeout of heroku dyno. With Alan's account that has a lot of data this was taking so much.

Also, we were talking about having a view per product usage, and we will be able to perform queries specific for each thing.

Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vasco-santos vasco-santos merged commit 2f5d5f8 into main Nov 1, 2022
@vasco-santos vasco-santos deleted the perf/decouple-user-used-storage-sql-function branch November 1, 2022 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Storage calcuation timing out in web3.storage account dashboard
3 participants