Skip to content

Commit

Permalink
changes related to PR #70
Browse files Browse the repository at this point in the history
Add a 'user_identifier' argument to the zdb_multi_search() functions, and
change the returned result set to aggregate scores and row_data, so that
there's one row returned per table specified.  This should make working
with the results a lot easier for client applications.
  • Loading branch information
eeeebbbbrrrr committed Dec 19, 2015
1 parent e0f2450 commit b3c7f6c
Show file tree
Hide file tree
Showing 5 changed files with 91 additions and 95 deletions.
57 changes: 21 additions & 36 deletions SQL-API.md
Original file line number Diff line number Diff line change
Expand Up @@ -307,65 +307,50 @@ These custom domains are to be used in user tables as data types when you requir
>
>NB: depending on query complexity, the "clause" column can sometimes be incorrect or null
#### `FUNCTION zdb_multi_search(table_names regclass[], query text) RETURNS SETOF zdb_multi_search_response`
#### `FUNCTION zdb_multi_search(table_names regclass[], user_identifiers text[], query text) RETURNS SETOF zdb_multi_search_response`

> `table_name`: The name of a table with a ZomboDB index, or the name of a view on top of a table with a ZomboDB index
> `table_names`: An array of tables (or views) with ZomboDB indexes to search at the same time
> `user_identifiers`: An array of arbitrary identifiers for each table. This could be useful for a client application to distinguish between result rows that use the same underlying table.
> `query`: a full text query
>
> This function searches the array of tables (or views) using the specified full text query, and returns the top 10 documents from each, ordered by score in descending order.
> This function searches the array of tables using the specified full text query, and returns the top 10 documents from each in descending score order.
>
> The scores and document field data are aggregated, such that at most, the returned resultset will only have as many rows as `table_names` has elements. If no results were found for a particular table, it is excluded from the results.
>
> The results of are the type:
> The result is of the type:
>
> ```
> TYPE zdb_multi_search_response AS (
> table_name regclass,
> user_identifier text,
> query text,
> total int8,
> ctid tid,
> score float4,
> score float4[],
> row_data json
> );
> ```
>
> The `table_name` column indicates the table name of the matching row
> The `user_identifier` column indicates the provided value for the current table
> The `query` column indicates which query the row matched
> The `total` column indicates the total number of matching documents
> The `ctid` column indicates the Postgres system column named `ctid` that contains the row in the table on which the underlying index was created
> The `score` column indicates the Elasticsearch-calculated score for the row
> The `row_data` column is a `row_to_json()` of the matching document, excluding any columns of type `fulltext` -- this decision was made for performance reasons
> The `score` column indicates the Elasticsearch-calculated scores for each matching row
> The `row_data` column is a json array of the `row_to_json()` for the top 10 matching documents. Note that columns of type `fulltext` are excluded -- this is for performance reasons
>
> Note that if one of the `table_names` elements is actually a view, the returned `row_data` will be from the table that owns the index, as determined by `zdb_determine_index()`. This means the `row_data` could have fewer properties than you expect.
>
> Example (using the "contrib_regression" database that comes with ZomboDB sources):
>
> ```
> select * from zdb_multi_search(ARRAY['so_posts', 'so_users'], 'java javascript');
table_name | query | total | ctid | score |
------------+-----------------+-------+-----------+---------+----------------------------------------------------------------------------------------------------------------
so_posts | java javascript | 1171 | (3952,2) | 11.0458 | {"accepted_answer_id":70607,"answer_count":2,"closed_date":"2012-01-30 22:51:06.303-05","comment_count":1,"comm
so_posts | java javascript | 1171 | (7728,3) | 10.4871 | {"accepted_answer_id":121413,"answer_count":5,"closed_date":null,"comment_count":8,"community_owned_date":null,
so_posts | java javascript | 1171 | (6969,2) | 10.3758 | {"accepted_answer_id":112258,"answer_count":5,"closed_date":null,"comment_count":3,"community_owned_date":null,
so_posts | java javascript | 1171 | (2545,5) | 9.46877 | {"accepted_answer_id":47691,"answer_count":15,"closed_date":null,"comment_count":8,"community_owned_date":"2013
so_posts | java javascript | 1171 | (15367,5) | 9.37296 | {"accepted_answer_id":null,"answer_count":2,"closed_date":"2013-11-03 20:55:10.65-05","comment_count":3,"commun
so_posts | java javascript | 1171 | (6867,2) | 9.31329 | {"accepted_answer_id":111044,"answer_count":9,"closed_date":null,"comment_count":8,"community_owned_date":null,
so_posts | java javascript | 1171 | (11974,8) | 8.66756 | {"accepted_answer_id":null,"answer_count":null,"closed_date":null,"comment_count":0,"community_owned_date":null
so_posts | java javascript | 1171 | (9177,6) | 8.64198 | {"accepted_answer_id":null,"answer_count":4,"closed_date":null,"comment_count":0,"community_owned_date":null,"c
so_posts | java javascript | 1171 | (15803,7) | 8.26861 | {"accepted_answer_id":223021,"answer_count":1,"closed_date":null,"comment_count":0,"community_owned_date":null,
so_posts | java javascript | 1171 | (12434,3) | 8.25173 | {"accepted_answer_id":180054,"answer_count":11,"closed_date":"2015-02-23 18:45:40.173-05","comment_count":9,"co
so_users | java javascript | 1309 | (1215,7) | 3.32197 | {"account_id":29943,"age":30,"creation_date":"2012-02-22 18:54:08.103-05","display_name":"Nachiket","down_votes
so_users | java javascript | 1309 | (3644,17) | 3.23258 | {"account_id":2068268,"age":25,"creation_date":"2014-12-02 22:49:02.65-05","display_name":"Tim Castelijns","dow
so_users | java javascript | 1309 | (2722,1) | 3.20018 | {"account_id":1756236,"age":20,"creation_date":"2014-02-03 18:10:40.417-05","display_name":"Calebe Oliveira","d
so_users | java javascript | 1309 | (2229,41) | 3.14003 | {"account_id":462203,"age":17,"creation_date":"2013-07-20 03:41:36.817-04","display_name":"Shawn31313","down_vo
so_users | java javascript | 1309 | (2071,22) | 2.99723 | {"account_id":2720548,"age":22,"creation_date":"2013-05-03 12:55:08.443-04","display_name":"Spencer Wieczorek",
so_users | java javascript | 1309 | (2273,22) | 2.96903 | {"account_id":553069,"age":50,"creation_date":"2013-08-09 13:06:29.33-04","display_name":"John DeRegnaucourt","
so_users | java javascript | 1309 | (2320,1) | 2.9636 | {"account_id":286663,"age":null,"creation_date":"2013-09-01 18:16:19.683-04","display_name":"Rahul Desai","down
so_users | java javascript | 1309 | (3866,1) | 2.90212 | {"account_id":1063117,"age":25,"creation_date":"2015-02-09 23:21:19.083-05","display_name":"falsarella","down_v
so_users | java javascript | 1309 | (2535,1) | 2.87772 | {"account_id":2036664,"age":43,"creation_date":"2013-12-06 13:45:56.27-05","display_name":"josedacruz","down_vo
so_users | java javascript | 1309 | (1499,1) | 2.75175 | {"account_id":214928,"age":30,"creation_date":"2012-07-23 14:49:38.917-04","display_name":"davidbuzatto","down_
(20 rows)
> ```
#### `FUNCTION zdb_multi_search(table_names regclass[], queries text[]) RETURNS SETOF zdb_multi_search_response`
> select * from zdb_multi_search(ARRAY['so_posts', 'so_users'], ARRAY['a', 'b'], 'java javascript');
table_name | user_identifier | query | total | score |
------------+-----------------+-----------------+-------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------
so_posts | a | java javascript | 1171 | {11.0463,10.4817,10.3789,9.45525,9.36948,9.31416,8.66982,8.64361,8.26008,8.25053} | [{"accepted_answer_id":70607,"answer_count":2,"closed_date":"2012-01-30 22:51:06.303-05","comm...
so_users | b | java javascript | 1309 | {3.32197,3.23258,3.20018,3.14003,2.99723,2.96903,2.9636,2.90212,2.87772,2.75175} | [{"account_id":29943,"age":30,"creation_date":"2012-02-22 18:54:08.103-05","display_name":"Nac...
(2 rows)
> ```
#### `FUNCTION zdb_multi_search(table_names regclass[], user_identifiers text[], queries text[]) RETURNS SETOF zdb_multi_search_response`
>
> This function is similar to the above function that only takes a single query, except this function requires that the `table_names` and `queries` arguments be of the same length. Each table in the `table_names` array is searched using the corresponding query from the `queries` array.
>
Expand Down
42 changes: 30 additions & 12 deletions postgres/src/main/sql/zombodb--2.5.6--2.5.7.sql
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
CREATE TYPE zdb_multi_search_response AS (table_name regclass, query text, total int8, ctid tid, score float4, row_data json);
CREATE TYPE zdb_multi_search_response AS (table_name regclass, user_identifier text, query text, total int8, score float4[], row_data json);

CREATE OR REPLACE FUNCTION zdb_id_to_ctid(id text) RETURNS tid LANGUAGE sql STRICT IMMUTABLE AS $$
SELECT ('(' || replace(id, '-', ',') || ')')::tid;
Expand All @@ -19,24 +19,42 @@ END;
$$;

CREATE OR REPLACE FUNCTION zdb_internal_multi_search(table_names oid[], queries text[]) RETURNS json LANGUAGE c STRICT IMMUTABLE AS '$libdir/plugins/zombodb';
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], queries text[]) RETURNS SETOF zdb_multi_search_response LANGUAGE plpgsql STRICT IMMUTABLE AS $$
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], user_identifiers text[], queries text[]) RETURNS SETOF zdb_multi_search_response LANGUAGE plpgsql STRICT IMMUTABLE AS $$
DECLARE
response json;
many integer;
BEGIN
IF array_upper(table_names,1) <> array_upper(user_identifiers,1) OR array_upper(table_names,1) <> array_upper(queries,1) THEN
RAISE EXCEPTION 'Arrays of table names, user_identifiers, and queries are not of the same length';
END IF;

response := zdb_internal_multi_search((SELECT array_agg(zdb_determine_index(unnest)) FROM unnest(table_names)), queries)->'responses';
many := array_upper(table_names, 1);

RETURN QUERY SELECT
table_names[gs],
queries[gs],
(json_array_element(response, gs - 1)->'hits'->>'total')::int8,
zdb_id_to_ctid(json_array_elements(json_array_element(response, gs - 1)->'hits'->'hits')->> '_id')::tid,
(json_array_elements(json_array_element(response, gs-1)->'hits'->'hits')->>'_score')::float4,
zdb_extract_table_row(table_names[gs], zdb_id_to_ctid(json_array_elements(json_array_element(response, gs - 1)->'hits'->'hits')->>'_id')::tid)
FROM generate_series(1, many) gs;
RETURN QUERY
SELECT
table_name,
user_identifier,
query,
total,
array_agg(score),
json_agg(row_data)
FROM (
SELECT
table_names[gs] AS table_name,
user_identifiers[gs] AS user_identifier,
queries[gs] AS query,
(json_array_element(response, gs - 1) -> 'hits' ->>'total') :: INT8 AS total,
(json_array_elements(json_array_element(response, gs - 1) -> 'hits' -> 'hits') ->>'_score') :: FLOAT4 AS score,
zdb_extract_table_row(
table_names[gs],
zdb_id_to_ctid(json_array_elements(json_array_element(response, gs - 1) -> 'hits' -> 'hits') ->> '_id') :: tid
) AS row_data
FROM generate_series(1, many) gs
) x
GROUP BY 1, 2, 3, 4;
END;
$$;
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], query text) RETURNS SETOF zdb_multi_search_response LANGUAGE sql STRICT IMMUTABLE AS $$
SELECT * FROM zdb_multi_search($1, (SELECT array_agg($2) FROM unnest(table_names)));
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], user_identifier text[], query text) RETURNS SETOF zdb_multi_search_response LANGUAGE sql STRICT IMMUTABLE AS $$
SELECT * FROM zdb_multi_search($1, $2, (SELECT array_agg($3) FROM unnest(table_names)));
$$;
45 changes: 33 additions & 12 deletions postgres/src/main/sql/zombodb.sql
Original file line number Diff line number Diff line change
Expand Up @@ -417,10 +417,12 @@ $$;



CREATE TYPE zdb_multi_search_response AS (table_name regclass, query text, total int8, ctid tid, score float4, row_data json);
CREATE TYPE zdb_multi_search_response AS (table_name regclass, user_identifier text, query text, total int8, score float4[], row_data json);

CREATE OR REPLACE FUNCTION zdb_id_to_ctid(id text) RETURNS tid LANGUAGE sql STRICT IMMUTABLE AS $$
SELECT ('(' || replace(id, '-', ',') || ')')::tid;
$$;

CREATE OR REPLACE FUNCTION zdb_extract_table_row(table_name regclass, row_ctid tid) RETURNS json LANGUAGE plpgsql STRICT IMMUTABLE AS $$
DECLARE
real_table_name regclass;
Expand All @@ -434,27 +436,46 @@ BEGIN
RETURN row_data;
END;
$$;

CREATE OR REPLACE FUNCTION zdb_internal_multi_search(table_names oid[], queries text[]) RETURNS json LANGUAGE c STRICT IMMUTABLE AS '$libdir/plugins/zombodb';
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], queries text[]) RETURNS SETOF zdb_multi_search_response LANGUAGE plpgsql STRICT IMMUTABLE AS $$
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], user_identifiers text[], queries text[]) RETURNS SETOF zdb_multi_search_response LANGUAGE plpgsql STRICT IMMUTABLE AS $$
DECLARE
response json;
many integer;
BEGIN
IF array_upper(table_names,1) <> array_upper(user_identifiers,1) OR array_upper(table_names,1) <> array_upper(queries,1) THEN
RAISE EXCEPTION 'Arrays of table names, user_identifiers, and queries are not of the same length';
END IF;

response := zdb_internal_multi_search((SELECT array_agg(zdb_determine_index(unnest)) FROM unnest(table_names)), queries)->'responses';
many := array_upper(table_names, 1);

RETURN QUERY SELECT
table_names[gs],
queries[gs],
(json_array_element(response, gs - 1)->'hits'->>'total')::int8,
zdb_id_to_ctid(json_array_elements(json_array_element(response, gs - 1)->'hits'->'hits')->> '_id')::tid,
(json_array_elements(json_array_element(response, gs-1)->'hits'->'hits')->>'_score')::float4,
zdb_extract_table_row(table_names[gs], zdb_id_to_ctid(json_array_elements(json_array_element(response, gs - 1)->'hits'->'hits')->>'_id')::tid)
FROM generate_series(1, many) gs;
RETURN QUERY
SELECT
table_name,
user_identifier,
query,
total,
array_agg(score),
json_agg(row_data)
FROM (
SELECT
table_names[gs] AS table_name,
user_identifiers[gs] AS user_identifier,
queries[gs] AS query,
(json_array_element(response, gs - 1) -> 'hits' ->>'total') :: INT8 AS total,
(json_array_elements(json_array_element(response, gs - 1) -> 'hits' -> 'hits') ->>'_score') :: FLOAT4 AS score,
zdb_extract_table_row(
table_names[gs],
zdb_id_to_ctid(json_array_elements(json_array_element(response, gs - 1) -> 'hits' -> 'hits') ->> '_id') :: tid
) AS row_data
FROM generate_series(1, many) gs
) x
GROUP BY 1, 2, 3, 4;
END;
$$;
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], query text) RETURNS SETOF zdb_multi_search_response LANGUAGE sql STRICT IMMUTABLE AS $$
SELECT * FROM zdb_multi_search($1, (SELECT array_agg($2) FROM unnest(table_names)));
CREATE OR REPLACE FUNCTION zdb_multi_search(table_names regclass[], user_identifier text[], query text) RETURNS SETOF zdb_multi_search_response LANGUAGE sql STRICT IMMUTABLE AS $$
SELECT * FROM zdb_multi_search($1, $2, (SELECT array_agg($3) FROM unnest(table_names)));
$$;


Expand Down
40 changes: 6 additions & 34 deletions postgres/src/test/expected/multi-search.out
Original file line number Diff line number Diff line change
@@ -1,35 +1,7 @@
select table_name, query, total from zdb_multi_search(ARRAY['so_posts', 'so_users', 'words'], 'cheese');
table_name | query | total
------------+--------+-------
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_posts | cheese | 48
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
so_users | cheese | 19
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
words | cheese | 54
(30 rows)
select table_name, user_identifier, query, total from zdb_multi_search(ARRAY['so_posts', 'so_users'], ARRAY['a', 'b'], 'java javascript');
table_name | user_identifier | query | total
------------+-----------------+-----------------+-------
so_posts | a | java javascript | 1171
so_users | b | java javascript | 1309
(2 rows)

2 changes: 1 addition & 1 deletion postgres/src/test/sql/multi-search.sql
Original file line number Diff line number Diff line change
@@ -1 +1 @@
select table_name, query, total from zdb_multi_search(ARRAY['so_posts', 'so_users', 'words'], 'cheese');
select table_name, user_identifier, query, total from zdb_multi_search(ARRAY['so_posts', 'so_users'], ARRAY['a', 'b'], 'java javascript');

0 comments on commit b3c7f6c

Please sign in to comment.