Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type, limit, offset, min_id, max_id, account_id to search API #10091

Merged
merged 5 commits into from Feb 26, 2019

Conversation

@Gargron
Copy link
Member

commented Feb 20, 2019

Fix #8939

Note: The search controller was running authorize on each status, but SearchService already runs StatusFilter on each status, which calls StatusPolicy internally, so the same work was done twice.

type allows specifying what kind of results you expect back, to omit the unneeded ones. Now customizable limit (previously: always 5; now: 0-40), as well as offset allows paginating. The following params are only available for statuses search: min_id and max_id allow filtering results by date ranges, and account_id allows narrowing down results by author.

@Gargron Gargron added the api label Feb 20, 2019
@nolanlawson

This comment has been minimized.

Copy link
Collaborator

commented Feb 21, 2019

Does offset introduce potential problems if the search results change? For instance:

  1. I ask for the first 20 rows
  2. The search results change in the underlying DB
  3. Now when I ask for offset 20, I see duplicates and/or missing results.

I'm not sure if Postgres offers a good way to solve this problem, though. That's the only potential issue with this API that I see, LGTM otherwise. 👍

@apetresc

This comment has been minimized.

Copy link

commented Feb 21, 2019

Indeed, limit-offset-based pagination for something as fast-moving as toot search results is virtually guaranteed to run into the exact problem @nolanlawson is describing.

The standard solution is cursor-based pagination. For example, here’s how the Slack API solves a very analogous problem.

@Gargron

This comment has been minimized.

Copy link
Member Author

commented Feb 21, 2019

something as fast-moving as toot search results

Mastodon search is anything but fast-moving. The results are limited to owned/marked content.

Keyset pagination is what we use everywhere else, but there are a few obstacles to doing it here: Search results are not sorted by primary ID, which makes keyset pagination impossible. While ElasticSearch seems to support a search_after param, the other search types do not, plus min_id/max_id are reserved for the date range filter, so it's unclear what param to use to pass that value, not to mention that clients would need to pass an ElasticSearch-internal value which is not exposed through our API in the first place.

I've checked Discord's network traffic and it's using offset/limit for pagination too.

@Gargron

This comment has been minimized.

Copy link
Member Author

commented Feb 24, 2019

Any more comments? Wishlists? I could add more params like to filter by if a post has images or a video, if anyone needs that. Let me know

@BenLubar

This comment has been minimized.

Copy link
Contributor

commented Feb 25, 2019

It would be nice to have a way of searching posts that were recently in the home timeline in addition to posts that I've interacted with.

I frequently find myself unable to reply to a post that linked to something because I can't find it.

@witcheslive

This comment has been minimized.

Copy link

commented Feb 25, 2019

Looking forward to these changes, thank you!

I would like to second a consideration to relax some of the limitations on the search. It would be nice if admins could search their own instance (even if it was limited to a backend interface) or you could search everything in the last day to look for things you might have missed as they flew by but also couldn't mine.

The limitations are good overall but with alternatives like Pleroma having much more complete search (also certain Mastodon modifying their instances to allow unrestricted search) and that it doesn't seem too difficult to go directly to postgres or elasticsearch as an admin and run queries (I honestly haven't even tried but it does not seem hard at all) it might be a good idea to relax some of the limitations to make it more useful and remove a large chunk of the temptation to make some of these more far-reaching modifications.

@Gargron

This comment has been minimized.

Copy link
Member Author

commented Feb 25, 2019

It would be nice to have a way of searching posts that were recently in the home timeline in addition to posts that I've interacted with.

A bit out of scope here. Search permissions are implemented using a searchable_by array of account IDs on each post. For your suggestion, every post would need to have every follower's account ID in that array, which is a lot of data.

@witcheslive

This comment has been minimized.

Copy link

commented Feb 25, 2019

A bit out of scope here. Search permissions are implemented using a searchable_by array of account IDs on each post. For your suggestion, every post would need to have every follower's account ID in that array, which is a lot of data.

I don't know a lot about elasticsearch queries to be fair, but is there no way to make the restrictions on the query be "it's searchable_by OR has a timestamp within the last 24 hours"?

@ykzts
ykzts approved these changes Feb 26, 2019
@Gargron Gargron merged commit e7f20cc into master Feb 26, 2019
11 checks passed
11 checks passed
ci/circleci: build Your tests passed on CircleCI!
Details
ci/circleci: check-i18n Your tests passed on CircleCI!
Details
ci/circleci: install Your tests passed on CircleCI!
Details
ci/circleci: install-ruby2.4 Your tests passed on CircleCI!
Details
ci/circleci: install-ruby2.5 Your tests passed on CircleCI!
Details
ci/circleci: install-ruby2.6 Your tests passed on CircleCI!
Details
ci/circleci: test-ruby2.4 Your tests passed on CircleCI!
Details
ci/circleci: test-ruby2.5 Your tests passed on CircleCI!
Details
ci/circleci: test-ruby2.6 Your tests passed on CircleCI!
Details
ci/circleci: test-webui Your tests passed on CircleCI!
Details
codeclimate All good!
Details
@ykzts ykzts deleted the feature-advanced-search branch Feb 26, 2019
hiyuki2578 added a commit to ProjectMyosotis/mastodon that referenced this pull request Oct 2, 2019
…otsuite#10091)

* Add type, limit, offset, min_id, max_id, account_id to search API

Fix tootsuite#8939

* Make the offset work on accounts and hashtags search as well

* Assure brakeman we are not doing mass assignment here

* Do not allow paginating unless a type is chosen

* Fix search query and index id field on statuses instead of created_at
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.