Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File load & pagination issues for user with many files #1362

Closed
mbommerez opened this issue May 26, 2022 · 12 comments · Fixed by #1408 or #1739
Closed

File load & pagination issues for user with many files #1362

mbommerez opened this issue May 26, 2022 · 12 comments · Fixed by #1408 or #1739
Assignees
Labels
topic/pagination topic/pot Issues handled by PT.

Comments

@mbommerez
Copy link

mbommerez commented May 26, 2022

User observations

  • Users with lots of uploads in web3.storage experience really slow loading times.
  • Users with > 1000 uploads are not currently able to see them all in the web interface. It stops at 100 pages of 10 files (loads 1000 first ones only).

Acceptance criteria

  • As a user I’m able to navigate through all my uploads
  • Loading uploads should be reasonably quick

Notes

There are 2 parts to this:

  • Have proper paginations to the page ( not loading 1000 items and then paginate in memory)
  • Look into query optimisation
  • This was first raised and looked into for ticket 1147

https://images.zenhubusercontent.com/378893240/704a835d-8695-4f22-853d-85ad199a1058/screen_recording_2022_03_21_at_10_38_42_720p.mp4

@joshJarr
Copy link
Contributor

Done some initial investigation on this, there's a few ways to approach this task and am keen to get some feedback before tackling it!

Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.

If we want to keep all "sort by" and "search" functionality, we would have to replicate this in the API.

Pagination

  • To show the numerical "pages" of the results, we may have to change the API response to also deliver some metadata on total uploads for the frontend to calculate the amount of pages. This change could be avoided if we were to simply show "next" and "previous" instead of numerical total pages.
  • I'd also like to add an offset param to the endpoint that works with the existing size param to chunk results by "page".

Sorting

  • Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.
  • We could use the existing before and after parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date.
    Additionally, since date/time stamps are not unique, an edge case could occur in which two or more uploads with the same upload time stamp (an unlikely, event, unless a user is uploading files in parallel). Paginating by datetime with duplicated timestamps could result in missing uploads listed by the API.

Searching

  • I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?
  • Instead we could add a CID parameter to the GET user/uploads endpoint to only return a single response, or add a new endpoint of /user/uploads/<CID> to fetch a single result.

I could fix this all in a full fat solution in order keep parity with existing functionality, or I could implement an MVP hotfix for pagination using mainly existing API features but resulting in losing the sorting for file size and name.

Would love to hear your thoughts on this @dchoi27, @alanshaw & @orvn!

@orvn
Copy link
Contributor

orvn commented May 27, 2022

Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.

Thanks for this @joshJarr! Yes, I agree, this is far from ideal and was really just implemented as a stopgap. It definitely needs to be replaced because:

  1. Response will be slow for large queries
  2. Once the uploads list is > 1000 objects, the search won't even address the entire collection

Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.

Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?

We could use the existing before and after parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date.

Agree as well, +1 to using offset and size in place of before and after dates.

I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?

Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).

@joshJarr
Copy link
Contributor

joshJarr commented Jun 7, 2022

Thanks @orvn! I've got a WIP PR for these changes: #1408

Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?

That's a great shout. I think we can perhaps make a new ticket for applying this logic to the /pins endpoint. I'd love to see the new split design to see how best to do this

Agree as well, +1 to using offset and size in place of before and after dates.

Cool, I've implemented this in my WIP PR, currently weighing up using an offset vs page, as I think the latter reduces the effort of calculating page & page size on the frontend.

Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).

Cool - this is great to know. I may move adding a search endpoint/filtering by query to a new ticket. This could mean temporarily removing this functionality unless we block this ticket from being released until the search is implemented.

I also may move sorting by size to a new ticket. Currently the file size is not stored on the upload entity, so we may need to add this column via migration as it is currently denormalised on request, so I cannot sort the uploads by this efficiently. Keen to know the importance of sort by size and if this too can temporarily be removed to keep this fix small.

@vasco-santos
Copy link
Member

hey @joshJarr

Thanks for going through these issues. The historical reason for no pages appearing is that this was not possible with FaunaDB (DB we were using before). Now we can add that if it is important.

The offset here will map to use https://supabase.com/docs/reference/dart/range#examples with the page, is it correct? I would call it size and page, given we already name like this in other APIs and prefer to keep things consistent.

@joshJarr
Copy link
Contributor

Hey @vasco-santos, thanks for the context. Yep have it currently mapping to a .range in my PR.
Yep that's a great point around offset vs page, I'll drop offset in favour for page and use page numbers instead of calculating the offset in the client. This approach keeps it more consistent and easier to integrate with.
Thanks!

@JeffLowe
Copy link

@joshJarr @mbommerez Just let me or @drewdelano know once the pagination updates to the api lands so we can update the frontend to use it.

@mbommerez
Copy link
Author

@JeffLowe Thanks, we are reviewing this internally and get a review from PL next and then get this ready for the next API release.

@JeffLowe
Copy link

To clarify, this ticket's scope should be on pagination for the upload api. The utilization of the api by the frontend is ticketed in #1501

@joshJarr joshJarr linked a pull request Jun 28, 2022 that will close this issue
@joshJarr
Copy link
Contributor

joshJarr commented Jun 28, 2022

Thanks, will do @JeffLowe.

@vasco-santos I've requested your review on the PR, would be great to get your input and ensure it's consistent with existing functionality before merging.
Many thanks!

@joshJarr
Copy link
Contributor

Hey @JeffLowe @drewdelano
Just an FYI, the API pagination has been merged and deployed!
There's a minor chore to update the HTTP documentation with the new params, that's awaiting a website deploy.

Pagination
For now, if you add the page=1 param to the GET /user/uploads you should see a paginated response. You can still set page size with the size param.

I've added pagination metadata to the API response headers that you should be able to use to calculate page numbers and such. These are:

  • Count: Number of total uploads.
  • Size: The requested size.
  • Page: The requested page.

Sorting
Also sorting can be achieved by adding sortBy param, this supports 'Date' | 'Name' for now.
sortOrder can be used to order the sort, accepting 'Asc' | 'Desc'.

When running the Web documentation locally, there should be more info there.

Let me know if you have any questions or feedback!

@JeffLowe
Copy link

Thank you so much for this into @joshJarr !!

@mbommerez
Copy link
Author

Reassigning to @joshJarr. PR #1699 needs completing and review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment