
Commit

issue #27: update tutorial and other docs to talk about how to use aggregates
eeeebbbbrrrr committed Aug 31, 2015
1 parent 7591040 commit 14e0fa4
Showing 5 changed files with 273 additions and 54 deletions.
2 changes: 1 addition & 1 deletion INSTALL.md
@@ -48,7 +48,7 @@ A large cluster configuration is likely to have a number of dedicated "data", "m
 # sudo bin/plugin -i zombodb -u file:///path/to/zombodb-plugin-X.X.X.zip
 ```

-There's a few configuration settings that **must** to be set in `elasticsearch.yml`:
+There are a few configuration settings that **must** be set in `elasticsearch.yml`:

 ```
 script.disable_dynamic: false
32 changes: 17 additions & 15 deletions README.md
@@ -41,9 +41,9 @@ Index management happens using standard Postgres SQL commands such as ```CREATE
 - extremely fast indexing
 - record count estimation
 - high-performance hit highlighting
-- access to Elasticsearch's full set of aggregations
+- access to many of Elasticsearch's aggregations, including the ability to nest aggregations
 - use whatever method you currently use for talking to Postgres (JDBC, DBI, libpq, etc)
-- fairly extensive test suite (NB: in progress of being converted from closed-source version)
+- fairly extensive test suite

 Not to suggest that these things are impossible, but there's a small set of non-features too:

@@ -80,31 +80,33 @@ CREATE EXTENSION zombodb;
 Create a table:

 ```
-CREATE TABLE books (
-book_id serial8 NOT NULL PRIMARY KEY,
-author varchar(128),
-publication_date date,
-title phrase, -- 'phrase' is a DOMAIN provided by ZomboDB
-content fulltext -- 'fulltext' is a DOMAIN provided by ZomboDB
+CREATE TABLE products (
+id SERIAL8 NOT NULL PRIMARY KEY,
+name text NOT NULL,
+keywords varchar(64)[],
+short_summary phrase,
+long_description fulltext,
+price bigint,
+inventory_count integer,
+discontinued boolean default false,
+availability_date date
 );
--- insert some data
 ```

 Index it:

 ```
-CREATE INDEX idxbooks ON books
-USING zombodb (zdb(books))
-WITH (url='http://localhost:9200', shards=5, replicas=1);
+CREATE INDEX idx_zdb_products
+ON products
+USING zombodb(zdb(products))
+WITH (url='http://localhost:9200/', shards=5, replicas=1);
 ```

 Query it:

 ```
-SELECT * FROM books WHERE zdb(books) ==> 'title:(catcher w/3 rye)
-and content:"Ossenburger Memorial Wing"
-or author:Salinger*';
+SELECT * FROM products WHERE zdb(products) ==> 'keywords:(sports,box) or long_description:(wooden w/5 away) and price < 100000';
 ```


47 changes: 36 additions & 11 deletions SQL-API.md
@@ -60,24 +60,49 @@ These custom domains are to be used in user tables as data types when you requir
 #### ```FUNCTION zdb_arbitrary_aggregate(table_name regclass, aggregate_query json, query text) RETURNS json```

 > ```table_name```: The name of a table with a ZomboDB index, or the name of a view on top of a table with a ZomboDB index
-> ```aggregate_query```: an Elasticsearch-compatible aggregate query, in JSON form
+> ```aggregate_query```: specialized ZomboDB-specific syntax to chain together one or more ZomboDB-supported aggregation types (terms, significant terms, suggestions, extended statistics)
 > ```query```: a full text query
 >
 > returns the Elasticsearch-created JSON results. The data returned is MVCC-safe.
 >
+> This function is primarily used for building and returning nested aggregation queries. Currently, only the aggregation types ZomboDB supports, listed below, can be used.
+>
+> The syntax for the `aggregate_query` argument follows the form:
+>
+> ```
+> #tally(fieldname, stem, max_terms, term_order [, another aggregate])
+> ```
+>
+> or
+>
+> ```
+> #significant_terms(fieldname, stem, max_terms [, another aggregate])
+> ```
+>
+> or
+>
+> ```
+> #extended_stats(fieldname)
+> ```
+>
+> or
+>
+> ```
+> #suggest(fieldname, base_term, max_terms)
+> ```
+>
+> These can then be chained together to form complex, nested aggregations. For example, using the `products` table from the [TUTORIAL](TUTORIAL.md), this breaks down the products by availability month and keyword:
+>
 > Example:
 >
 > ```
-> SELECT * FROM zdb_arbitrary_aggregate('table', '{
-> "aggregations": {
-> "my_agg": {
-> "terms": {
-> "field": "text"
-> }
-> }
-> }
->}', 'beer,wine,cheese');
+> tutorial=# SELECT * FROM zdb_arbitrary_aggregate('products', '#tally(availability_date, month, 5000, term, #tally(keywords, ''^.*'', 5000, term))', '');
+> zdb_arbitrary_aggregate
+> ------------------------------------------------------------
+> {"missing":{"doc_count":0},"availability_date":{"buckets":[{"key_as_string":"2015-07","key":1435708800000,"doc_count":1,"keywords":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"box","doc_count":1},{"key":"negative space","doc_count":1},{"key":"square","doc_count":1},{"key":"wooden","doc_count":1}]}},{"key_as_string":"2015-08","key":1438387200000,"doc_count":3,"keywords":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"alexander graham bell","doc_count":1},{"key":"baseball","doc_count":1},{"key":"communication","doc_count":1},{"key":"magical","doc_count":1},{"key":"primitive","doc_count":1},{"key":"round","doc_count":2},{"key":"sports","doc_count":1},{"key":"widget","doc_count":1}]}}]}}
 >```
+>
+> The response is a JSON blob because it's difficult to project an arbitrary nested structure into an SQL result set. Decoding the response is left to the application.
 #### ```FUNCTION zdb_describe_nested_object(table_name regclass, fieldname text) RETURNS json```

@@ -257,7 +282,7 @@ These custom domains are to be used in user tables as data types when you requir
 >
 > This function provides direct access to Elasticsearch's ["significant terms"](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html) aggregation. The results are MVCC-safe. Returned terms are forced to upper-case.
 >
-> Note, fields of type ```phrase```, ```phrase_array```, and ```fulltext``` are not supported.
+> Note: Fields of type ```phrase```, ```phrase_array```, and ```fulltext``` are not supported.
 >
 > Example:
 >
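The new `zdb_arbitrary_aggregate()` documentation above describes a chainable `#tally(...)` syntax and notes that decoding the JSON response is left to the application. As a minimal sketch (not part of ZomboDB), the Python below composes the nested query string used in the example and flattens a two-level `#tally` response into `(outer_key, inner_key, doc_count)` rows. The helper names `tally` and `flatten_tally` are hypothetical, and the response shape (an outer field name holding `buckets`, each bucket carrying the inner field name with its own `buckets`) is assumed from the single example response shown, not from a specification.

```python
import json


def tally(field, stem, max_terms, term_order, nested=None):
    """Build a #tally(...) aggregate string; `nested` is another aggregate string.

    Hypothetical helper: it only joins the arguments in the documented order
    #tally(fieldname, stem, max_terms, term_order [, another aggregate]).
    """
    args = [field, stem, str(max_terms), term_order]
    if nested is not None:
        args.append(nested)
    return "#tally(" + ", ".join(args) + ")"


def flatten_tally(response, outer, inner):
    """Flatten a two-level #tally response into (outer_key, inner_key, doc_count) tuples.

    `response` may be the raw JSON text or an already-decoded dict. Top-level
    entries other than `outer` (such as "missing") are ignored.
    """
    if isinstance(response, str):
        response = json.loads(response)
    rows = []
    for outer_bucket in response[outer]["buckets"]:
        # Date buckets carry a readable "key_as_string"; term buckets only have "key".
        outer_key = outer_bucket.get("key_as_string", outer_bucket["key"])
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append((outer_key, inner_bucket["key"], inner_bucket["doc_count"]))
    return rows


if __name__ == "__main__":
    query = tally("availability_date", "month", 5000, "term",
                  tally("keywords", "'^.*'", 5000, "term"))
    print(query)

    # Excerpt of the example response, trimmed to one keyword bucket per month.
    excerpt = """{"missing":{"doc_count":0},"availability_date":{"buckets":[
      {"key_as_string":"2015-07","key":1435708800000,"doc_count":1,
       "keywords":{"buckets":[{"key":"box","doc_count":1}]}},
      {"key_as_string":"2015-08","key":1438387200000,"doc_count":3,
       "keywords":{"buckets":[{"key":"round","doc_count":2}]}}]}}"""
    for row in flatten_tally(excerpt, "availability_date", "keywords"):
        print(row)
```

If the query string is sent through a driver as a bind parameter, it is passed exactly as `tally()` returns it. The doubled quotes (`''^.*''`) in the SQL example are only needed because the string is written inline as an SQL string literal.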
8 changes: 4 additions & 4 deletions TUTORIAL-data.dmp
@@ -1,4 +1,4 @@
-1 Magical Widget {magical,widget} A widget that is quite magical Magical Widgets come from the land of Magicville and are capable of things you can't imagine 9900 42 f
-2 Baseball {baseball,sports} It's a baseball Throw it at a person with a big wooden stick and hope they don't hit it 1249 2 f
-3 Telephone {communication,primitive,"alexander graham bell"} A device to enable long-distance communications Use this to call your friends and family and be annoyed by telemarketers. Long-distance charges may apply 1899 200 f
-4 Box {wooden,box,"negative space"} Just an empty box made of wood A wooden container that will eventually rot away. Put stuff it in (but not a cat). 17000 0 t
+1 Magical Widget {magical,widget,round} A widget that is quite magical Magical Widgets come from the land of Magicville and are capable of things you can't imagine 9900 42 f 2015-08-31
+2 Baseball {baseball,sports,round} It's a baseball Throw it at a person with a big wooden stick and hope they don't hit it 1249 2 f 2015-08-21
+3 Telephone {communication,primitive,"alexander graham bell"} A device to enable long-distance communications Use this to call your friends and family and be annoyed by telemarketers. Long-distance charges may apply 1899 200 f 2015-08-11
+4 Box {wooden,box,"negative space",square} Just an empty box made of wood A wooden container that will eventually rot away. Put stuff it in (but not a cat). 17000 0 t 2015-07-01
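The updated tutorial rows add an `availability_date` value as the last column, and these dates are what the month-level `#tally` in the `zdb_arbitrary_aggregate()` example groups on. Judging only from that example response, each month bucket's `key` appears to be the first instant of the month in UTC as epoch milliseconds, with `key_as_string` formatted as `YYYY-MM`. The sketch below, using a hypothetical `month_bucket()` helper, reproduces that bucketing for the four dates in the dump as a sanity check; it is an assumption-based illustration, not how ZomboDB or Elasticsearch compute buckets internally.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def month_bucket(d):
    """Return (key_as_string, key) for the UTC calendar month containing date `d`.

    Hypothetical helper: `key` is assumed to be the month's first instant as
    epoch milliseconds, matching the keys seen in the example response.
    """
    start = datetime(d.year, d.month, 1, tzinfo=timezone.utc)
    # timedelta // timedelta yields an exact integer number of milliseconds.
    return start.strftime("%Y-%m"), (start - EPOCH) // timedelta(milliseconds=1)


# availability_date values from the four updated rows of TUTORIAL-data.dmp, keyed by id.
AVAILABILITY = {
    1: date(2015, 8, 31),  # Magical Widget
    2: date(2015, 8, 21),  # Baseball
    3: date(2015, 8, 11),  # Telephone
    4: date(2015, 7, 1),   # Box
}

if __name__ == "__main__":
    counts = Counter(month_bucket(d) for d in AVAILABILITY.values())
    for (label, key), n in sorted(counts.items()):
        print(label, key, n)
```

The resulting counts (one product in 2015-07, three in 2015-08) and keys match the `doc_count` and `key` values of the two month buckets in the example response.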

0 comments on commit 14e0fa4
