From f6ba5bd61376b276f7a7094b815313702d744b73 Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Tue, 7 Oct 2025 15:52:46 +0200 Subject: [PATCH 1/7] chore: pg-textsearch, first draft. --- _partials/_early_access_11_25.md | 1 + use-timescale/extensions/pg-textsearch.md | 313 ++++++++++++++++++++++ use-timescale/page-index/page-index.js | 5 + 3 files changed, 319 insertions(+) create mode 100644 _partials/_early_access_11_25.md create mode 100644 use-timescale/extensions/pg-textsearch.md diff --git a/_partials/_early_access_11_25.md b/_partials/_early_access_11_25.md new file mode 100644 index 0000000000..1ac4003e4a --- /dev/null +++ b/_partials/_early_access_11_25.md @@ -0,0 +1 @@ +Early access: October 2025 diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md new file mode 100644 index 0000000000..d304086b63 --- /dev/null +++ b/use-timescale/extensions/pg-textsearch.md @@ -0,0 +1,313 @@ +--- +title: Optimize full text search with BM25 +excerpt: Set up and optimize BM25-based full-text search using pg_textsearch extension for efficient ranked text searching +keywords: [pg_textsearch, BM25, full-text search, text search, ranking, hybrid search] +tags: [search, indexing, performance, BM25] +--- + +import EA1125 from "versionContent/_partials/_early_access_11_25.mdx"; + + +# Optimize full text search with BM25 + +$PG full-text search at scale consistently hits a wall where performance degrades catastrophically. +$COMPANY's [pg_textsearch][pg_textsearch-repo] brings modern BM25-based full-text search directly into $PG, using a memtable +architecture for efficient indexing and ranking. pg_textsearch integrates seamlessly with SQL and provides better search +quality and performance than the $PG built-in full-text search. + +This guide shows you how to install pg_textsearch and configure BM25 indexes, then optimize your search capabilities. 
+ + + +## Prerequisites + +To use pg_textsearch you need: + +* A Tiger Cloud service (available on free tier) +* $PG 17 or later +* Tables with text columns you want to search + +## Install pg_textsearch on Tiger Cloud + +pg_textsearch is available to all Tiger Cloud customers, including those on the free plan. This is a preview release +designed for development and staging environments. + + + +1. **Enable the extension on your Tiger Cloud service** + + For new services, simply enable the extension: + ```sql + CREATE EXTENSION pg_textsearch; + ``` + +1. **For existing services, update your instance** + + The extension may not be available until after your next scheduled maintenance window. You can manually pause and restart your service to pick up the update immediately. + +1. **Verify installation** + + ```sql + SELECT * FROM pg_extension WHERE extname = 'pg_textsearch'; + ``` + + + +You have installed pg_textsearch on Tiger Cloud. + +## Create and configure BM25 indexes + +BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus statistics and better algorithmic design. + + + +1. **Create a table with text content** + + ```sql + CREATE TABLE products ( + id serial PRIMARY KEY, + name text, + description text, + category text, + price numeric + ); + ``` + +1. **Insert sample data** + + ```sql + INSERT INTO products (name, description, category, price) VALUES + ('Mechanical Keyboard', 'Durable mechanical switches with RGB backlighting for gaming and productivity', 'Electronics', 149.99), + ('Ergonomic Mouse', 'Wireless mouse with ergonomic design to reduce wrist strain during long work sessions', 'Electronics', 79.99), + ('Standing Desk', 'Adjustable height desk for better posture and productivity throughout the workday', 'Furniture', 599.99); + ``` + +1. 
**Create a BM25 index** + + ```sql + CREATE INDEX products_search_idx ON products + USING pg_textsearch(description) + WITH (text_config='english'); + ``` + +1. **Configure memory limit if needed** + + The size of the memtable depends primarily on the number of distinct terms in your corpus. The Timescale docs dataset produces a roughly 10MB index. For comparison, a corpus with longer documents or more varied vocabulary will require more memory per document. + ```sql + -- Set memory limit per index (default 64MB) + SET pg_textsearch.index_memory_limit = '128MB'; + ``` + + + +You have created a BM25 index for full-text search. + +## Optimize search queries for performance + +Use efficient query patterns to leverage BM25 ranking and optimize search performance. + + + +1. **Perform ranked searches using the distance operator** + + ```sql + SELECT name, description, + description <@> to_tpquery('ergonomic work', 'products_search_idx') as score + FROM products + ORDER BY description <@> to_tpquery('ergonomic work', 'products_search_idx') + LIMIT 3; + ``` + +1. **Filter results by score threshold** + + ```sql + SELECT name, + description <@> to_tpquery('wireless', 'products_search_idx') as score + FROM products + WHERE description <@> to_tpquery('wireless', 'products_search_idx') < -2.0; + ``` + +1. **Combine with standard SQL operations** + + ```sql + SELECT category, name, + description <@> to_tpquery('ergonomic', 'products_search_idx') as score + FROM products + WHERE price < 500 + AND description <@> to_tpquery('ergonomic', 'products_search_idx') < -1.0 + ORDER BY description <@> to_tpquery('ergonomic', 'products_search_idx') + LIMIT 5; + ``` + +1. **Verify index usage with EXPLAIN** + + ```sql + EXPLAIN SELECT * FROM products + ORDER BY description <@> to_tpquery('wireless keyboard', 'products_search_idx') + LIMIT 5; + ``` + + + +You have optimized your search queries for BM25 ranking. 
+ +## Build hybrid search with semantic and keyword search + +Combine pg_textsearch with pgvector to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search. + + + +1. **Create a table with both text content and vector embeddings** + + ```sql + CREATE TABLE articles ( + id serial PRIMARY KEY, + title text, + content text, + embedding vector(1536) -- OpenAI ada-002 embedding dimension + ); + ``` + +1. **Create indexes for both search types** + + ```sql + -- Vector index for semantic search + CREATE INDEX articles_embedding_idx ON articles + USING hnsw (embedding vector_cosine_ops); + + -- Keyword index for BM25 search + CREATE INDEX articles_content_idx ON articles + USING pg_textsearch(content) + WITH (text_config='english'); + ``` + +1. **Perform hybrid search using Reciprocal Rank Fusion** + + ```sql + WITH vector_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector) AS rank + FROM articles + ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector + LIMIT 20 + ), + keyword_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY content <@> to_tpquery('query performance', 'articles_content_idx')) AS rank + FROM articles + ORDER BY content <@> to_tpquery('query performance', 'articles_content_idx') + LIMIT 20 + ) + SELECT + a.id, + a.title, + COALESCE(1.0 / (60 + v.rank), 0.0) + COALESCE(1.0 / (60 + k.rank), 0.0) AS combined_score + FROM articles a + LEFT JOIN vector_search v ON a.id = v.id + LEFT JOIN keyword_search k ON a.id = k.id + WHERE v.id IS NOT NULL OR k.id IS NOT NULL + ORDER BY combined_score DESC + LIMIT 10; + ``` + +1. 
**Adjust relative weights for different search types** + + ```sql + SELECT + a.id, + a.title, + 0.7 * COALESCE(1.0 / (60 + v.rank), 0.0) + -- 70% weight to vectors + 0.3 * COALESCE(1.0 / (60 + k.rank), 0.0) -- 30% weight to keywords + AS combined_score + FROM articles a + LEFT JOIN vector_search v ON a.id = v.id + LEFT JOIN keyword_search k ON a.id = k.id + WHERE v.id IS NOT NULL OR k.id IS NOT NULL + ORDER BY combined_score DESC + LIMIT 10; + ``` + + + +You have implemented hybrid search combining semantic and keyword search. + +## Configuration options + +Customize pg_textsearch behavior for your specific use case and data characteristics. + + + +1. **Configure language-specific text processing** + + ```sql + -- French language configuration + CREATE INDEX products_fr_idx ON products_fr + USING pg_textsearch(description) + WITH (text_config='french'); + + -- Simple tokenization without stemming + CREATE INDEX products_simple_idx ON products + USING pg_textsearch(description) + WITH (text_config='simple'); + ``` + +1. **Tune BM25 parameters** + + ```sql + -- Adjust term frequency saturation (k1) and length normalization (b) + CREATE INDEX products_custom_idx ON products + USING pg_textsearch(description) + WITH (text_config='english', k1=1.5, b=0.8); + ``` + +1. **Monitor index usage and memory consumption** + + ```sql + -- Check index usage statistics + SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read + FROM pg_stat_user_indexes + WHERE indexrelid::regclass::text ~ 'pg_textsearch'; + + -- View detailed index information + SELECT tp_debug_dump_index('products_search_idx'); + ``` + + + +You have configured pg_textsearch for optimal performance. + +## Understanding BM25 scoring + +BM25 scores in pg_textsearch are returned as negative values, where lower (more negative) numbers indicate better matches. 
+ +Key concepts: + +* **Corpus-aware ranking**: BM25 uses inverse document frequency to weight rare terms higher +* **Term frequency saturation**: Prevents documents with excessive term repetition from dominating results +* **Length normalization**: Adjusts scores based on document length relative to corpus average +* **Relative ranking**: Focus on rank order rather than absolute score values + +## Current limitations + +The preview release (v0.0.1) focuses on core BM25 functionality: + +* **Memory-only storage**: Indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB) +* **Single-column indexes**: Cannot index multiple columns in one index +* **No phrase queries**: Cannot search for exact multi-word phrases yet + +These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities. + +## Best practices + +Follow these practices for optimal pg_textsearch performance: + +* **Memory planning**: Size your `index_memory_limit` based on corpus vocabulary and document count +* **Language configuration**: Choose appropriate text search configurations for your data language +* **Hybrid search**: Combine with pgvector for applications requiring both semantic and keyword search +* **Query optimization**: Use score thresholds to filter low-relevance results +* **Index monitoring**: Regularly check index usage and memory consumption + +For production applications, consider implementing result caching and pagination to improve user experience with large result sets. 
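The scoring concepts described above (corpus-aware IDF weighting, term-frequency saturation, length normalization, and negative scores where lower is better) can be illustrated with a small standalone sketch of the textbook Okapi BM25 formula. This is a hedged illustration, not pg_textsearch's implementation. It assumes simple lowercase tokenization without the stemming that `text_config` applies, the IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)), and the common literature defaults k1 = 1.2 and b = 0.75, which may differ from pg_textsearch's own defaults. It is written in JavaScript only so it can run without a database.

```javascript
// Minimal Okapi BM25 sketch for illustration; this is NOT pg_textsearch's code.
// Assumptions: lowercase tokenization on non-word characters, no stemming or
// stop words, IDF = ln(1 + (N - n + 0.5) / (n + 0.5)), defaults k1 = 1.2, b = 0.75.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function bm25Scorer(docs, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgdl = tokenized.reduce((sum, d) => sum + d.length, 0) / N;

  // Document frequency: number of documents containing each term.
  const df = new Map();
  for (const doc of tokenized) {
    for (const term of new Set(doc)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  // Corpus-aware ranking: terms that appear in fewer documents get a larger weight.
  const idf = (term) => {
    const n = df.get(term) ?? 0;
    return Math.log(1 + (N - n + 0.5) / (n + 0.5));
  };

  return function score(query, docIndex) {
    const doc = tokenized[docIndex];
    // Length normalization: b controls how much a longer-than-average document is penalized.
    const norm = k1 * (1 - b + (b * doc.length) / avgdl);
    let total = 0;
    for (const term of tokenize(query)) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Term-frequency saturation: tf * (k1 + 1) / (tf + norm) levels off as tf grows.
      total += idf(term) * ((tf * (k1 + 1)) / (tf + norm));
    }
    // Negate to follow the documented convention: lower (more negative) is a better match.
    return -total;
  };
}

// The three product descriptions from the sample data on this page.
const descriptions = [
  "Durable mechanical switches with RGB backlighting for gaming and productivity",
  "Wireless mouse with ergonomic design to reduce wrist strain during long work sessions",
  "Adjustable height desk for better posture and productivity throughout the workday",
];
const score = bm25Scorer(descriptions);
const ranked = descriptions
  .map((_, i) => ({ i, score: score("ergonomic work", i) }))
  .sort((x, y) => x.score - y.score); // ascending, like ORDER BY ... <@> ...
console.log(ranked);
```

Only the second description contains both query terms, so it sorts first with a negative score. The other two contain neither term (`workday` is a different token from `work`), so they score 0.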
+ + +[pg_textsearch-repo]: https://github.com/timescale/tapir \ No newline at end of file diff --git a/use-timescale/page-index/page-index.js b/use-timescale/page-index/page-index.js index 871bc94722..8b58c0ef58 100644 --- a/use-timescale/page-index/page-index.js +++ b/use-timescale/page-index/page-index.js @@ -583,6 +583,11 @@ module.exports = [ href: "extensions", excerpt: "The Postgres extensions installed in each Tiger Cloud service", children: [ + { + title: "Optimize full text search with BM25", + href: "pg-textsearch", + excerpt: "Set up and optimize BM25-based full-text search for efficient ranked text searching", + }, { title: "Create a chatbot using pgvector", href: "pgvector", From 64633b1f425e0e47ffd3c4e697a3dead9a25616b Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Tue, 7 Oct 2025 17:43:57 +0200 Subject: [PATCH 2/7] chore: tapir, first draft. --- use-timescale/extensions/index.md | 17 +-- use-timescale/extensions/pg-textsearch.md | 127 +++++++++++----------- 2 files changed, 76 insertions(+), 68 deletions(-) diff --git a/use-timescale/extensions/index.md b/use-timescale/extensions/index.md index 9c93529404..e8fcd201d2 100644 --- a/use-timescale/extensions/index.md +++ b/use-timescale/extensions/index.md @@ -16,13 +16,14 @@ The following $PG extensions are installed with each $SERVICE_LONG: ## $COMPANY extensions -| Extension | Description | Enabled by default | -|--------------------------------------------|------------------------------------|-----------------------------------------------------| -| [pgai][pgai] | Helper functions for AI workflows | For [AI-focused][services] $SERVICE_SHORTs | -| [pgvector][pgvector] | Vector similarity search for $PG | For [AI-focused][services] $SERVICE_SHORTs | -| [pgvectorscale][pgvectorscale] | Advanced indexing for vector data | For [AI-focused][services] $SERVICE_SHORTs | -| [timescaledb_toolkit][timescaledb-toolkit] | TimescaleDB Toolkit | For [Real-time analytics][services] $SERVICE_SHORTs | -| 
[timescaledb][timescaledb] | TimescaleDB | For all $SERVICE_SHORTs | +| Extension | Description | Enabled by default | +|---------------------------------------------|--------------------------------------------|-----------------------------------------------------------------------| +| [pgai][pgai] | Helper functions for AI workflows | For [AI-focused][services] $SERVICE_SHORTs | +| [pg_textsearch][pg_textsearch] | [BM25][bm25-wiki]-based full-text search | Currently early access. For development and staging environments only | +| [pgvector][pgvector] | Vector similarity search for $PG | For [AI-focused][services] $SERVICE_SHORTs | +| [pgvectorscale][pgvectorscale] | Advanced indexing for vector data | For [AI-focused][services] $SERVICE_SHORTs | +| [timescaledb_toolkit][timescaledb-toolkit] | TimescaleDB Toolkit | For [Real-time analytics][services] $SERVICE_SHORTs | +| [timescaledb][timescaledb] | TimescaleDB | For all $SERVICE_SHORTs | ## $PG built-in extensions @@ -138,6 +139,7 @@ The following $PG extensions are installed with each $SERVICE_LONG: [refint]: https://www.postgresql.org/docs/current/contrib-spi.html [seg]: https://www.postgresql.org/docs/current/seg.html [pgcrypto]: /use-timescale/:currentVersion:/extensions/pgcrypto/ +[pg_textsearch]: /use-timescale/:currentVersion:/extensions/pg-textsearch/ [sslinfo]: https://www.postgresql.org/docs/current/sslinfo.html [tablefunc]: https://www.postgresql.org/docs/current/tablefunc.html [tcn]: https://www.postgresql.org/docs/current/tcn.html @@ -153,3 +155,4 @@ The following $PG extensions are installed with each $SERVICE_LONG: [timescale-extensions]: #timescale-extensions [third-party]: #third-party-extensions [services]: /getting-started/:currentVersion:/ +[bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25 \ No newline at end of file diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index d304086b63..56c3ef934c 100644 --- 
a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -6,46 +6,63 @@ tags: [search, indexing, performance, BM25] --- import EA1125 from "versionContent/_partials/_early_access_11_25.mdx"; - +import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx"; # Optimize full text search with BM25 $PG full-text search at scale consistently hits a wall where performance degrades catastrophically. -$COMPANY's [pg_textsearch][pg_textsearch-repo] brings modern BM25-based full-text search directly into $PG, using a memtable -architecture for efficient indexing and ranking. pg_textsearch integrates seamlessly with SQL and provides better search -quality and performance than the $PG built-in full-text search. +$COMPANY's [pg_textsearch][pg_textsearch-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, +with a memtable architecture for efficient indexing and ranking. pg_textsearch integrates seamlessly with SQL and +provides better search quality and performance than the $PG built-in full-text search. + +BM25 scores in pg_textsearch are returned as negative values, where lower (more negative) numbers indicate better +matches. pg_textsearch implements the following: -This guide shows you how to install pg_textsearch and configure BM25 indexes, then optimize your search capabilities. 
+* **Corpus-aware ranking**: BM25 uses inverse document frequency to weight rare terms higher +* **Term frequency saturation**: prevents documents with excessive term repetition from dominating results +* **Length normalization**: adjusts scores based on document length relative to corpus average +* **Relative ranking**: focuses on rank order rather than absolute score values - +This page shows you how to install `pg_textsearch`, configure BM25 indexes, and optimize your search capabilities using +the following best practices: + +* **Memory planning**: Size your `index_memory_limit` based on corpus vocabulary and document count +* **Language configuration**: Choose appropriate text search configurations for your data language +* **Hybrid search**: Combine with pgvector for applications requiring both semantic and keyword search +* **Query optimization**: Use score thresholds to filter low-relevance results +* **Index monitoring**: Regularly check index usage and memory consumption + + this preview release is designed for development and staging environments. It is not recommended for use with hypertables. ## Prerequisites -To use pg_textsearch you need: + -* A Tiger Cloud service (available on free tier) -* $PG 17 or later * Tables with text columns you want to search -## Install pg_textsearch on Tiger Cloud +## Install pg_textsearch -pg_textsearch is available to all Tiger Cloud customers, including those on the free plan. This is a preview release -designed for development and staging environments. +To install this $PG extension: -1. **Enable the extension on your Tiger Cloud service** +1. **Connect to your $SERVICE_LONG** - For new services, simply enable the extension: - ```sql - CREATE EXTENSION pg_textsearch; - ``` + In [$CONSOLE][services-portal], open an [SQL editor][in-console-editors]. You can also connect to your $SERVICE_SHORT using [psql][connect-using-psql]. + +1. **Enable the extension on your $SERVICE_LONG** -1.
**For existing services, update your instance** + - For new services, simply enable the extension: + ```sql + CREATE EXTENSION pg_textsearch; + ``` + + - For existing services, update your instance, then enable the extension: - The extension may not be available until after your next scheduled maintenance window. You can manually pause and restart your service to pick up the update immediately. + The extension may not be available until after your next scheduled maintenance window. To pick up the update + immediately, manually pause and restart your service. -1. **Verify installation** +1. **Verify the installation** ```sql SELECT * FROM pg_extension WHERE extname = 'pg_textsearch'; @@ -53,11 +70,14 @@ designed for development and staging environments. -You have installed pg_textsearch on Tiger Cloud. +You have installed pg_textsearch on $CLOUD_LONG. + +## Create BM25 indexes on your data -## Create and configure BM25 indexes +BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus +statistics and better algorithmic design. -BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus statistics and better algorithmic design. +To create a BM25 with pg_textsearch: @@ -90,14 +110,6 @@ BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts WITH (text_config='english'); ``` -1. **Configure memory limit if needed** - - The size of the memtable depends primarily on the number of distinct terms in your corpus. The Timescale docs dataset produces a roughly 10MB index. For comparison, a corpus with longer documents or more varied vocabulary will require more memory per document. - ```sql - -- Set memory limit per index (default 64MB) - SET pg_textsearch.index_memory_limit = '128MB'; - ``` - You have created a BM25 index for full-text search. 
@@ -181,7 +193,7 @@ Combine pg_textsearch with pgvector to build powerful hybrid search systems that WITH (text_config='english'); ``` -1. **Perform hybrid search using Reciprocal Rank Fusion** +1. **Perform hybrid search using [reciprocal rank fusion][recip-rank-fusion]** ```sql WITH vector_search AS ( @@ -237,6 +249,16 @@ Customize pg_textsearch behavior for your specific use case and data characteris +1. **Configure the memory limit** + + The size of the memtable depends primarily on the number of distinct terms in your corpus. A corpus with longer + documents or more varied vocabulary requires more memory per document. + ```sql + -- Set memory limit per index (default 64MB) + SET pg_textsearch.index_memory_limit = '128MB'; + ``` + + 1. **Configure language-specific text processing** ```sql @@ -274,40 +296,23 @@ Customize pg_textsearch behavior for your specific use case and data characteris -You have configured pg_textsearch for optimal performance. - -## Understanding BM25 scoring - -BM25 scores in pg_textsearch are returned as negative values, where lower (more negative) numbers indicate better matches. - -Key concepts: - -* **Corpus-aware ranking**: BM25 uses inverse document frequency to weight rare terms higher -* **Term frequency saturation**: Prevents documents with excessive term repetition from dominating results -* **Length normalization**: Adjusts scores based on document length relative to corpus average -* **Relative ranking**: Focus on rank order rather than absolute score values +You have configured pg_textsearch for optimal performance. For production applications, consider implementing result +caching and pagination to improve user experience with large result sets. ## Current limitations -The preview release (v0.0.1) focuses on core BM25 functionality: +This preview release focuses on core BM25 functionality. 
It has the following limitations: -* **Memory-only storage**: Indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB) -* **Single-column indexes**: Cannot index multiple columns in one index -* **No phrase queries**: Cannot search for exact multi-word phrases yet +* **Memory-only storage**: indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB) +* **Single-column indexes**: cannot index multiple columns in one index +* **No phrase queries**: cannot search for exact multi-word phrases yet These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities. -## Best practices - -Follow these practices for optimal pg_textsearch performance: - -* **Memory planning**: Size your `index_memory_limit` based on corpus vocabulary and document count -* **Language configuration**: Choose appropriate text search configurations for your data language -* **Hybrid search**: Combine with pgvector for applications requiring both semantic and keyword search -* **Query optimization**: Use score thresholds to filter low-relevance results -* **Index monitoring**: Regularly check index usage and memory consumption - -For production applications, consider implementing result caching and pagination to improve user experience with large result sets. 
- -[pg_textsearch-repo]: https://github.com/timescale/tapir \ No newline at end of file +[bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25 +[pg_textsearch-repo]: https://github.com/timescale/tapir +[in-console-editors]: /getting-started/:currentVersion:/run-queries-from-console/ +[services-portal]: https://console.cloud.timescale.com/dashboard/services +[connect-using-psql]: /integrations/:currentVersion:/psql/#connect-to-your-service +[recip-rank-fusion]: https://en.wikipedia.org/wiki/Mean_reciprocal_rank \ No newline at end of file From 0f02ae5d19b2a48262d30714d8622e64ba64bf4b Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Wed, 8 Oct 2025 12:18:30 +0200 Subject: [PATCH 3/7] chore: updates on review. --- use-timescale/extensions/pg-textsearch.md | 31 ++++++++++++----------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 56c3ef934c..9f95abd732 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -28,7 +28,7 @@ the following best practice: * **Memory planning**: Size your `index_memory_limit` based on corpus vocabulary and document count * **Language configuration**: Choose appropriate text search configurations for your data language -* **Hybrid search**: Combine with pgvector for applications requiring both semantic and keyword search +* **Hybrid search**: Combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search * **Query optimization**: Use score thresholds to filter low-relevance results * **Index monitoring**: Regularly check index usage and memory consumption @@ -75,9 +75,9 @@ You have installed pg_textsearch on $CLOUD_LONG. ## Create BM25 indexes on your data BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus -statistics and better algorithmic design. 
+statistics and better algorithmic design. -To create a BM25 with pg_textsearch: +To create a BM25 index with pg_textsearch: @@ -109,6 +109,8 @@ To create a BM25 with pg_textsearch: USING pg_textsearch(description) WITH (text_config='english'); ``` + + pg_textsearch supports single-column indexes only. @@ -124,9 +126,9 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor ```sql SELECT name, description, - description <@> to_tpquery('ergonomic work', 'products_search_idx') as score + description <@> to_bm25query('ergonomic work', 'products_search_idx') as score FROM products - ORDER BY description <@> to_tpquery('ergonomic work', 'products_search_idx') + ORDER BY description <@> to_bm25query('ergonomic work', 'products_search_idx') LIMIT 3; ``` @@ -134,20 +136,20 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor ```sql SELECT name, - description <@> to_tpquery('wireless', 'products_search_idx') as score + description <@> to_bm25query('wireless', 'products_search_idx') as score FROM products - WHERE description <@> to_tpquery('wireless', 'products_search_idx') < -2.0; + WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -2.0; ``` 1. 
**Combine with standard SQL operations** ```sql SELECT category, name, - description <@> to_tpquery('ergonomic', 'products_search_idx') as score + description <@> to_bm25query('ergonomic', 'products_search_idx') as score FROM products WHERE price < 500 - AND description <@> to_tpquery('ergonomic', 'products_search_idx') < -1.0 - ORDER BY description <@> to_tpquery('ergonomic', 'products_search_idx') + AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -1.0 + ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx') LIMIT 5; ``` @@ -155,7 +157,7 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor ```sql EXPLAIN SELECT * FROM products - ORDER BY description <@> to_tpquery('wireless keyboard', 'products_search_idx') + ORDER BY description <@> to_bm25query('wireless keyboard', 'products_search_idx') LIMIT 5; ``` @@ -165,7 +167,7 @@ You have optimized your search queries for BM25 ranking. ## Build hybrid search with semantic and keyword search -Combine pg_textsearch with pgvector to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search. +Combine pg_textsearch with pgvector or pgvectorscale to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search. 
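The hybrid search queries on this page merge the vector and keyword result lists with reciprocal rank fusion: each article scores 1 / (60 + rank) for every list it appears in, optionally scaled by a per-list weight. The standalone JavaScript sketch below reproduces that arithmetic outside the database, which can help when reasoning about how the constant and the weights change the final order. The article ids are hypothetical; `k = 60` and the 0.7/0.3 weights are taken from the SQL examples.

```javascript
// Reciprocal rank fusion (RRF) over ranked id lists, mirroring the SQL on this page:
//   w_v * COALESCE(1.0 / (60 + v.rank), 0.0) + w_k * COALESCE(1.0 / (60 + k.rank), 0.0)
// Ranks are 1-based, like ROW_NUMBER(); an id absent from a list adds nothing for that list.
function reciprocalRankFusion(rankedLists, { k = 60, weights } = {}) {
  const w = weights ?? rankedLists.map(() => 1);
  const scores = new Map();
  rankedLists.forEach((ids, listIndex) => {
    ids.forEach((id, position) => {
      const rank = position + 1;
      scores.set(id, (scores.get(id) ?? 0) + w[listIndex] / (k + rank));
    });
  });
  // Highest combined score first, like ORDER BY combined_score DESC.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical article ids from each search, best match first.
const vectorIds = [3, 1, 7];
const keywordIds = [1, 9, 3];

console.log(reciprocalRankFusion([vectorIds, keywordIds]));
console.log(reciprocalRankFusion([vectorIds, keywordIds], { weights: [0.7, 0.3] }));
```

With equal weights, article 1 comes first because it ranks near the top of both lists; with the 0.7/0.3 weighting, article 3, the best vector match, moves ahead of it. The constant 60 flattens the gap between neighboring ranks, so appearing in both lists generally counts for more than topping only one.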
@@ -205,9 +207,9 @@ Combine pg_textsearch with pgvector to build powerful hybrid search systems that ), keyword_search AS ( SELECT id, - ROW_NUMBER() OVER (ORDER BY content <@> to_tpquery('query performance', 'articles_content_idx')) AS rank + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank FROM articles - ORDER BY content <@> to_tpquery('query performance', 'articles_content_idx') + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') LIMIT 20 ) SELECT @@ -258,7 +260,6 @@ Customize pg_textsearch behavior for your specific use case and data characteris SET pg_textsearch.index_memory_limit = '128MB'; ``` - 1. **Configure language-specific text processing** ```sql From 64de230ce975b1d31f736f2cb0e1e6777246a016 Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Wed, 8 Oct 2025 12:20:42 +0200 Subject: [PATCH 4/7] chore: updates on review. --- use-timescale/extensions/pg-textsearch.md | 1 - 1 file changed, 1 deletion(-) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 9f95abd732..004b472414 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -305,7 +305,6 @@ caching and pagination to improve user experience with large result sets. This preview release focuses on core BM25 functionality. It has the following limitations: * **Memory-only storage**: indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB) -* **Single-column indexes**: cannot index multiple columns in one index * **No phrase queries**: cannot search for exact multi-word phrases yet These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities. From 64633a7734feee210dafd58b5bfff2f0847becf8 Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Thu, 16 Oct 2025 11:09:01 +0200 Subject: [PATCH 5/7] chore: test and update bm25 doc. 
--- use-timescale/extensions/pg-textsearch.md | 80 ++++++++++++++--------- 1 file changed, 50 insertions(+), 30 deletions(-) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 004b472414..aae9082971 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -106,11 +106,11 @@ To create a BM25 index with pg_textsearch: ```sql CREATE INDEX products_search_idx ON products - USING pg_textsearch(description) + USING bm25(description) WITH (text_config='english'); ``` - - pg_textsearch supports single-column indexes only. + + bm25 supports single-column indexes only. @@ -171,6 +171,10 @@ Combine pg_textsearch with pgvector or pgvectorscale to build powerful hybrid se +1. **Enable the [vectorscale][pg-vectorscale] extension on your $SERVICE_LONG** + ```sql + CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE; + ``` 1. **Create a table with both text content and vector embeddings** ```sql @@ -191,7 +195,7 @@ Combine pg_textsearch with pgvector or pgvectorscale to build powerful hybrid se -- Keyword index for BM25 search CREATE INDEX articles_content_idx ON articles - USING pg_textsearch(content) + USING bm25(content) WITH (text_config='english'); ``` @@ -199,34 +203,47 @@ Combine pg_textsearch with pgvector or pgvectorscale to build powerful hybrid se ```sql WITH vector_search AS ( - SELECT id, - ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector) AS rank - FROM articles - ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector - LIMIT 20 + SELECT id, + ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector) AS rank + FROM articles + ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector + LIMIT 20 ), keyword_search AS ( - SELECT id, - ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank - FROM articles - ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') - LIMIT 20 + SELECT 
id, + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank + FROM articles + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') + LIMIT 20 ) - SELECT - a.id, - a.title, - COALESCE(1.0 / (60 + v.rank), 0.0) + COALESCE(1.0 / (60 + k.rank), 0.0) AS combined_score + SELECT a.id, + a.title, + COALESCE(1.0 / (60 + v.rank), 0.0) + COALESCE(1.0 / (60 + k.rank), 0.0) AS combined_score FROM articles a LEFT JOIN vector_search v ON a.id = v.id LEFT JOIN keyword_search k ON a.id = k.id WHERE v.id IS NOT NULL OR k.id IS NOT NULL ORDER BY combined_score DESC - LIMIT 10; + LIMIT 10; ``` 1. **Adjust relative weights for different search types** ```sql + WITH vector_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector) AS rank + FROM articles + ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector + LIMIT 20 + ), + keyword_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank + FROM articles + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') + LIMIT 20 + ) SELECT a.id, a.title, @@ -279,21 +296,23 @@ Customize pg_textsearch behavior for your specific use case and data characteris ```sql -- Adjust term frequency saturation (k1) and length normalization (b) CREATE INDEX products_custom_idx ON products - USING pg_textsearch(description) + USING bm25(description) WITH (text_config='english', k1=1.5, b=0.8); ``` -1. **Monitor index usage and memory consumption** + 1. 
**Monitor index usage and memory consumption** - ```sql - -- Check index usage statistics - SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read - FROM pg_stat_user_indexes - WHERE indexrelid::regclass::text ~ 'pg_textsearch'; + - Check index usage statistics + ```sql + SELECT s.schemaname, s.relname, s.indexrelname, s.idx_scan, s.idx_tup_read + FROM pg_stat_user_indexes s JOIN pg_class c ON c.oid = s.indexrelid + WHERE c.relam = (SELECT oid FROM pg_am WHERE amname = 'bm25'); + ``` - -- View detailed index information - SELECT tp_debug_dump_index('products_search_idx'); - ``` + - View detailed index information + ```sql + SELECT bm25_debug_dump_index('products_search_idx'); + ``` @@ -315,4 +334,5 @@ These limitations will be addressed in upcoming releases with disk-based segment [in-console-editors]: /getting-started/:currentVersion:/run-queries-from-console/ [services-portal]: https://console.cloud.timescale.com/dashboard/services [connect-using-psql]: /integrations/:currentVersion:/psql/#connect-to-your-service -[recip-rank-fusion]: https://en.wikipedia.org/wiki/Mean_reciprocal_rank \ No newline at end of file +[recip-rank-fusion]: https://en.wikipedia.org/wiki/Mean_reciprocal_rank +[pg-vectorscale]: /ai/:currentVersion:/sql-interface-for-pgvector-and-timescale-vector/#installing-the-pgvector-and-pgvectorscale-extensions \ No newline at end of file From a315ce1dc9a90266239e0aaf407157463e4ed473 Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Thu, 16 Oct 2025 13:02:01 +0200 Subject: [PATCH 6/7] chore: updates on review --- _partials/_integration-prereqs-cloud-only.md | 2 +- _partials/_integration-prereqs.md | 2 +- use-timescale/extensions/pg-textsearch.md | 32 ++++++++++---------- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/_partials/_integration-prereqs-cloud-only.md b/_partials/_integration-prereqs-cloud-only.md index e2be24287e..22e7fd8e3c 100644 --- a/_partials/_integration-prereqs-cloud-only.md +++ b/_partials/_integration-prereqs-cloud-only.md @@ -1,7 +1,7 @@ To follow the steps on this page: -*
Create a target [$SERVICE_LONG][create-service] with time-series and analytics enabled. +* Create a target [$SERVICE_LONG][create-service] with the Real-time analytics capability. You need your [connection details][connection-info]. diff --git a/_partials/_integration-prereqs.md b/_partials/_integration-prereqs.md index 2dd9da6482..86fb5409a7 100644 --- a/_partials/_integration-prereqs.md +++ b/_partials/_integration-prereqs.md @@ -1,6 +1,6 @@ To follow the steps on this page: -* Create a target [$SERVICE_LONG][create-service] with time-series and analytics enabled. +* Create a target [$SERVICE_LONG][create-service] with the Real-time analytics capability. You need [your connection details][connection-info]. This procedure also works for [$SELF_LONG][enable-timescaledb]. diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index aae9082971..67057e0acd 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -1,6 +1,6 @@ --- title: Optimize full text search with BM25 -excerpt: Set up and optimize BM25-based full-text search using pg_textsearch extension for efficient ranked text searching +excerpt: Set up and optimize BM25-based full-text search using the pg_textsearch extension keywords: [pg_textsearch, BM25, full-text search, text search, ranking, hybrid search] tags: [search, indexing, performance, BM25] --- @@ -12,11 +12,11 @@ import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.md $PG full-text search at scale consistently hits a wall where performance degrades catastrophically. $COMPANY's [pg_textsearch][pg_textsearch-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, -with a memtable architecture for efficient indexing and ranking. pg_textsearch integrates seamlessly with SQL and +with a memtable architecture for efficient indexing and ranking. 
`pg_textsearch` integrates seamlessly with SQL and provides better search quality and performance than the $PG built-in full-text search. -BM25 scores in pg_textsearch are returned as negative values, where lower (more negative) numbers indicate better -matches. pg_textsearch implements the following: +BM25 scores in `pg_textsearch` are returned as negative values, where lower (more negative) numbers indicate better +matches. `pg_textsearch` implements the following: * **Corpus-aware ranking**: BM25 uses inverse document frequency to weight rare terms higher * **Term frequency saturation**: prevents documents with excessive term repetition from dominating results @@ -26,19 +26,19 @@ matches. pg_textsearch implements the following: This page shows you how to install `pg_textsearch`, configure BM25 indexes, and optimize your search capabilities using the following best practice: -* **Memory planning**: Size your `index_memory_limit` based on corpus vocabulary and document count -* **Language configuration**: Choose appropriate text search configurations for your data language -* **Hybrid search**: Combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search -* **Query optimization**: Use score thresholds to filter low-relevance results -* **Index monitoring**: Regularly check index usage and memory consumption +* **Memory planning**: size your `index_memory_limit` based on corpus vocabulary and document count +* **Language configuration**: choose appropriate text search configurations for your data language +* **Hybrid search**: combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search +* **Query optimization**: use score thresholds to filter low-relevance results +* **Index monitoring**: regularly check index usage and memory consumption - this preview release is designed for development and staging environments. 
It is not recommended for use with hypertables + this preview release is designed for development and staging environments. It is not recommended for use with hypertables. ## Prerequisites -* Tables with text columns you want to search +* Search tables with text columns ## Install pg_textsearch @@ -70,7 +70,7 @@ To install this $PG extension: -You have installed pg_textsearch on $CLOUD_LONG. +You have installed `pg_textsearch` on $CLOUD_LONG. ## Create BM25 indexes on your data @@ -110,7 +110,7 @@ To create a BM25 index with pg_textsearch: WITH (text_config='english'); ``` - bm25 supports single-column indexes only. + BM25 supports single-column indexes only. @@ -167,7 +167,7 @@ You have optimized your search queries for BM25 ranking. ## Build hybrid search with semantic and keyword search -Combine pg_textsearch with pgvector or pgvectorscale to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search. +Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search. @@ -264,7 +264,7 @@ You have implemented hybrid search combining semantic and keyword search. ## Configuration options -Customize pg_textsearch behavior for your specific use case and data characteristics. +Customize `pg_textsearch` behavior for your specific use case and data characteristics. @@ -316,7 +316,7 @@ Customize pg_textsearch behavior for your specific use case and data characteris -You have configured pg_textsearch for optimal performance. For production applications, consider implementing result +You have configured `pg_textsearch` for optimal performance. For production applications, consider implementing result caching and pagination to improve user experience with large result sets. 
## Current limitations From 58f13b49c6165be1cddfc73f864b83eaa5fbcd02 Mon Sep 17 00:00:00 2001 From: Iain Cox Date: Thu, 16 Oct 2025 17:15:33 +0200 Subject: [PATCH 7/7] Update pg-textsearch.md Signed-off-by: Iain Cox --- use-timescale/extensions/pg-textsearch.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 67057e0acd..fbf7cb63f0 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -38,8 +38,6 @@ the following best practice: -* Search tables with text columns - ## Install pg_textsearch To install this $PG extension: @@ -335,4 +333,4 @@ These limitations will be addressed in upcoming releases with disk-based segment [services-portal]: https://console.cloud.timescale.com/dashboard/services [connect-using-psql]: /integrations/:currentVersion:/psql/#connect-to-your-service [recip-rank-fusion]: https://en.wikipedia.org/wiki/Mean_reciprocal_rank -[pg-vectorscale]: /ai/:currentVersion:/sql-interface-for-pgvector-and-timescale-vector/#installing-the-pgvector-and-pgvectorscale-extensions \ No newline at end of file +[pg-vectorscale]: /ai/:currentVersion:/sql-interface-for-pgvector-and-timescale-vector/#installing-the-pgvector-and-pgvectorscale-extensions
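The hybrid search query added in this series fuses the vector and keyword result lists with reciprocal rank fusion. Each row scores `1.0 / (60 + rank)` for every list it appears in, and `COALESCE(..., 0.0)` contributes nothing for a list it is missing from. The following minimal Python sketch reproduces that arithmetic, which can help when checking expected orderings outside the database. The function name `reciprocal_rank_fusion` and the `vector_weight`/`keyword_weight` parameters are hypothetical and are not part of `pg_textsearch`. The weights only illustrate the idea behind the "Adjust relative weights" step, whose exact SQL is not shown in this excerpt.

```python
from typing import Dict, List, Optional, Tuple


def reciprocal_rank_fusion(
    vector_ids: List[int],
    keyword_ids: List[int],
    k: int = 60,
    vector_weight: float = 1.0,
    keyword_weight: float = 1.0,
    limit: Optional[int] = 10,
) -> List[Tuple[int, float]]:
    """Combine two best-first id lists into (id, score) pairs, highest score first.

    Position 0 in a list is rank 1, matching ROW_NUMBER() in the SQL query.
    An id absent from a list gets 0.0 from that list, matching COALESCE(..., 0.0).
    The weights are a hypothetical extension; 1.0 reproduces the unweighted query.
    """
    scores: Dict[int, float] = {}
    for weight, ids in ((vector_weight, vector_ids), (keyword_weight, keyword_ids)):
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Sort by score descending; ties are broken by id here only to keep the
    # output deterministic (the SQL query leaves tie order unspecified).
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if limit is None else fused[:limit]


if __name__ == "__main__":
    # Hypothetical ids: vector order as from `embedding <=> ...`,
    # keyword order as from `content <@> to_bm25query(...)`.
    vector_ranked = [3, 1, 2]
    keyword_ranked = [1, 4]
    for doc_id, score in reciprocal_rank_fusion(vector_ranked, keyword_ranked):
        print(doc_id, round(score, 5))
```

With these sample lists the fused order is 1, 3, 4, 2. Id `1` is the only row that both searches return, so its two reciprocal-rank terms add up and place it first.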