Unable to distinct on a multi-table search #137
Comments
Hi, just wanted to follow up… @gregmolnar @simi any thoughts on this? I tried a bunch of approaches but I don't have much experience with ActiveRecord (or Postgres, for that matter), so I've struck out so far. |
This problem is present for other searches as well, right? It is not related to web search only. Everything you wrote makes sense, actually: you need to de-duplicate the results. Would grouping by Case id help here? |
I'm not sure, I haven't tried other searches… but I keep running into that syntax error whenever I try to run any kind of select or group command. Could you send me an example of what you'd do? I'll test it out and report back here. |
@tomcardoso you can't use |
@gregmolnar Thanks. My results could easily be in the tens of thousands, if not more (my database grows by between 2,000 and 10,000 entries a day). I'm also using web_search as part of a chain of filters and whatnot, which gets passed to gems like will-paginate, so I want to make sure to keep it in ActiveRecord if at all possible. |
I will have a look at a proper fix in the coming days. |
@tomcardoso I tried to reproduce this and wrote a spec here: e7f7da5 |
@gregmolnar Thanks, took me a minute to figure out what was going on… try doing this instead:
expect(
  WebComicWithSearchableName
    .joins(:characters)
    .web_search({name: "Batman", characters: { name: "Batman" }}, false)
    .distinct
    .size
).to eq(1)
I'm using the |
Ah, I missed that part. If you do that, an OR condition is used, and since we return the rank as well, which has a unique alias for each, distinct can't return a unique row because the aliases differ.
Case
  .joins(:case_events)
  .select("DISTINCT ON (cases.id) cases.*")
  .order("cases.id")
  .web_search({
    case_number: 'cobalt',
    short_title: 'cobalt',
    case_events: {
      party_name: 'cobalt',
      short_title: 'cobalt'
    }
  }, false)
Let me know if that still doesn't work. |
Thanks @gregmolnar. But with this, I'd lose the rank-based sorting, right? Is there any way to retain that? I realize some entries would have two rankings – I think it'd make sense to pick the entry with the higher rank value and drop the other. |
(I did test your code out, and it works! But I assume that I'd lose the rank-based sorting.) |
@tomcardoso indeed, you would lose the rank-based sorting.
I don't know of any solution off the top of my head, but I will think about it. |
Thanks. I tried something dumb like:
Case
  .joins(:case_events)
  .select("DISTINCT ON (cases.id) cases.*")
  .order("cases.rank_web")
  .web_search({ case_number: 'cobalt', short_title: 'cobalt', case_events: { party_name: 'cobalt', short_title: 'cobalt' }}, false, rank_alias = 'rank_web')
But that didn't work:
|
There is a unique alias generated for the ranking, so you can't do it like that. Also, the column you use with DISTINCT ON has to be the first column you order on, so even if you used the correct alias, it would fail with another error. |
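To make the goal concrete, here is a minimal plain-Ruby sketch of the semantics being asked for: one row per case, keeping the highest-ranked match, with the survivors sorted by rank. It works in memory, so it is not a substitute for doing this in ActiveRecord. The `Row` struct, the sample values, and the `dedupe_by_best_rank` helper are all invented for illustration.

```ruby
# Plain-Ruby illustration only; Row and its sample values are made up.
Row = Struct.new(:id, :title, :rank)

# Joined search results: case 592 matched twice (once through the case
# itself, once through a case event), each match carrying its own rank.
rows = [
  Row.new(592, "Cobalt v. Acme", 0.06),
  Row.new(592, "Cobalt v. Acme", 0.12),
  Row.new(601, "Cobalt Holdings", 0.09)
]

# Keep one row per id (the one with the highest rank), then sort the
# remaining rows by rank, best match first.
def dedupe_by_best_rank(rows)
  rows
    .group_by(&:id)
    .map { |_id, group| group.max_by(&:rank) }
    .sort_by { |row| -row.rank }
end

deduped = dedupe_by_best_rank(rows)
```

This is the same idea a `GROUP BY id` combined with `max(rank)` expresses on the database side.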
Perhaps for cases like this, we almost need a special Textacular version of |
I think the query you're looking for is this:
SELECT web_comics.*
FROM "web_comics"
INNER JOIN "characters" ON "characters"."web_comic_id" = "web_comics"."id"
WHERE (to_tsvector('english', "web_comics"."name"::text) @@ websearch_to_tsquery('english', 'Batman'::text) OR to_tsvector('english', "characters"."name"::text) @@ websearch_to_tsquery('english', 'Batman'::text))
GROUP BY "web_comics"."id"
ORDER BY max(COALESCE(ts_rank(to_tsvector('english', "web_comics"."name"::text), websearch_to_tsquery('english', 'Batman'::text)), 0) + COALESCE(ts_rank(to_tsvector('english', "characters"."name"::text), websearch_to_tsquery('english', 'Batman'::text)), 0))
^ Grouping by ID and ordering by max is the key to getting a deduplicated result set. I'm not sure how to get this query out of Textacular directly. This hacky version was enough for me, but I think this deserves a much better API if we would like to get it into master. This is just a POC:
# monkey-patch
module Textacular
  # hacked version of web_search that does not assemble the final query
  def web_search_similarities_and_conditions(query = '', exclusive = true, rank_alias = nil)
    exclusive, query = munge_exclusive_and_query(exclusive, query)
    parsed_query_hash = parse_query_hash(query)
    # the multiple assignment evaluates to [similarities, conditions], which is the return value
    similarities, conditions = web_similarities_and_conditions(parsed_query_hash)
  end
end
# passing spec
it "abc" do
  WebComicWithSearchableName.create(
    name: 'Batman',
    author: 'Bill Finger',
    characters: Character.create([
      { name: "Batman" },
      { name: "Robin" },
    ])
  )
  order, conditions = WebComicWithSearchableName.web_search_similarities_and_conditions({name: "Batman", characters: { name: "Batman" }}, false)
  r = WebComicWithSearchableName
    .joins(:characters)
    .where(conditions.join(' OR ')) # depends on exclusive, could be AND as well
    .order(Arel.sql("max(#{order.join(' + ')})")) # use an aggregate function to pick the highest rank for duplicates
    .group(:id) # group by ID to deduplicate
  expect(r.length).to eq(1)
end |
Looking at this again, for those advanced cases Textacular could just provide the related SQL fragments and let the user use them as needed. Any ideas for an API around this? |
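One possible shape for this, as a rough sketch only: a small helper that takes the rank and condition fragments as arrays of SQL strings and returns the WHERE and ORDER BY strings for a grouped query. `grouped_search_sql` is a hypothetical name and not part of Textacular, the fragment strings below are placeholders, and the `DESC` direction (best match first) is an assumption about the desired ordering.

```ruby
# Hypothetical helper, not part of Textacular: combines SQL fragments
# into WHERE and ORDER BY strings for a grouped, deduplicated search.
def grouped_search_sql(similarities, conditions, exclusive: false)
  # exclusive searches require every condition to match, others any of them
  joiner = exclusive ? " AND " : " OR "
  {
    where: conditions.map { |c| "(#{c})" }.join(joiner),
    # max(...) picks the best combined rank among duplicate joined rows;
    # DESC is an assumption here, listing the best matches first.
    order: "max(#{similarities.join(' + ')}) DESC"
  }
end

# Placeholder strings stand in for the ts_rank(...) and
# to_tsvector(...) @@ websearch_to_tsquery(...) fragments.
sql = grouped_search_sql(["rank_a", "rank_b"], ["cond_a", "cond_b"])
exclusive_sql = grouped_search_sql(["rank_a"], ["cond_a", "cond_b"], exclusive: true)
```

The resulting strings could then be handed to `.where(...)`, `.order(Arel.sql(...))`, and `.group(:id)` in the same way as the spec earlier in the thread.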
Huh! This is all well beyond my comprehension, but if it works, great. I'm happy to test this on our data when it's incorporated into the Textacular API generally (assuming there's an interest in doing so). Could it be as simple as adding an extra param telling Textacular whether to assemble the final query? |
@simi and @gregmolnar, one more question if you don't mind. Your code has worked great, but it's unfortunately broken the
cases
  .joins(:case_events)
  .where(conditions.join(' OR '))
  .group(:id)
  .order(Arel.sql("max(#{order.join(' + ')})"))
  .size
Returns a hash like so:
Where each key is the case ID and each value is the number of case events – but I just want |
Oh – is the answer to just use the results of that query to build a new, final one, like so?
first_query = cases
  .joins(:case_events)
  .where(conditions.join(' OR '))
  .group(:id)
  .order(Arel.sql("max(#{order.join(' + ')})"))
final_query = cases.where(:id => first_query) |
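As noted above, calling `.size` on a grouped relation returns a hash keyed by the grouped column rather than a single number. The plain-Ruby sketch below shows how a total can be read from such a hash. The hash literal is a stand-in with made-up counts, not real query output.

```ruby
# Stand-in for the hash returned by `.size` on a relation grouped by id:
# { case_id => number_of_joined_rows }. The values are made up.
grouped_counts = { 592 => 2, 601 => 1 }

# The number of distinct cases is the number of keys, not the sum of values.
distinct_cases = grouped_counts.size        # number of groups
joined_rows    = grouped_counts.values.sum  # rows before deduplication
```

In ActiveRecord terms this would correspond to calling `.size` on the returned hash, or to using `.length` on the relation as the spec earlier in the thread does, which loads the records first. Note also that wrapping the grouped query in `where(:id => first_query)` turns it into a subquery, so the outer query would need its own ordering if rank order matters.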
Hi there, I'm running into an issue where I'm unable to dedupe my results after using web_search on a model and its relations. Here's what I'm doing:
So far so good. Except, this returns four results (since I have matches in both the parent Case model and its relation CaseEvent), which becomes two results after the distinct:
As you can see, they both have the same id, 592, so I'd like to dedupe further. I try doing a .select("DISTINCT ON (id) *") on that first command up there, but then I get this:
I realize I could just do a uniq, but I need to keep this in ActiveRecord for several reasons. I'm a bit out of my depth here… Any suggestions?