A Rails application benchmarking substring search strategies for product inventory systems, comparing PostgreSQL with pg_trgm against MongoDB with regex.
This project explores how to implement fast, accent-tolerant, case-insensitive substring search for a product autocomplete feature — the kind used in inventory or e-commerce systems where a user types "com g" and expects to find "Água com Gás 500ml".
| Strategy | Real time / query (500k records) | Index used |
|---|---|---|
PostgreSQL + pg_trgm (GIN) |
~0.57ms | ✅ Bitmap Index Scan |
MongoDB + B-tree on normalized_name |
~1.63ms | ❌ Ignored by planner |
PostgreSQL with pg_trgm is ~3x faster for substring search at scale.
- Ruby on Rails 7.1
- PostgreSQL with
pg_trgmextension - MongoDB via Mongoid
- RSpec + DatabaseCleaner
- Benchmark (stdlib)
Every product stores a normalized_name column — a lowercase, accent-stripped version of the name generated automatically on save:
def self.normalize_string(str)
str
.unicode_normalize(:nfd) # decompose accents: "á" → "a" + combining mark
.gsub(/\p{Mn}/, "") # strip combining marks
.downcase
.strip
end"Água com Gás 500ml" → "agua com gas 500ml"
This makes searches accent-tolerant without any special DB configuration.
Queries are split into tokens and applied as individual AND filters:
def self.search(query)
tokens = normalize_string(query).split.reject(&:blank?)
return none if tokens.empty?
scope = all
tokens.each { |token| scope = apply_token_filter(scope, token) }
scope.limit(20)
endSearching "agua gas" finds any product containing both words in any order — much more natural for autocomplete than an exact substring match.
scope.where("normalized_name LIKE ?", "%#{token}%")scope.where(normalized_name: /#{Regexp.escape(token)}/i)- Ruby 3.x
- PostgreSQL 14+
- MongoDB 6+
git clone <repo>
cd estoque
bundle install# PostgreSQL
rails db:create db:migrate
# MongoDB (indexes)
rails runner "ProductMongo.create_indexes"The migration enables the extension and creates the GIN index automatically:
rails db:migrateIf you need to run it manually in psql:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX index_products_on_normalized_name_trigram
ON products USING gin (normalized_name gin_trgm_ops);rspec spec/models/product_benchmark_spec.rb --format documentationThis will:
- Insert 500,000 products into both PostgreSQL and MongoDB using bulk inserts
- Run 100 search queries against each backend
- Print Benchmark results side by side
- Clean up all records after
Expected output:
user system total real
Product.search('com g'): 0.039609 0.007008 0.046617 ( 0.057242)
user system total real
ProductMongo.search('com g'): 0.128274 0.011873 0.140147 ( 0.162563)
| Column | What it measures |
|---|---|
user |
CPU time spent by the Ruby process |
system |
CPU time spent by the kernel (I/O, syscalls) |
total |
user + system |
real |
Wall-clock time — what the user actually waits |
real is the most relevant column for perceived performance.
rspec spec/models/product_spec.rb --format documentationCovers:
- Substring search with accented input
- Case-insensitive search
- Multi-token search
- Empty result cases
- Result limit (max 20)
A standard B-tree index only accelerates LIKE 'prefix%' queries (anchored at the start). For LIKE '%substring%', it is ignored and the database falls back to a sequential scan.
pg_trgm solves this by indexing every trigram (group of 3 characters) in the string. A GIN index on trigrams allows PostgreSQL to use an index scan even for LIKE '%...%' patterns.
MongoDB's B-tree index faces the same limitation with leading-wildcard regex (/substring/). The planner ignores the index and performs a collection scan. The MongoDB equivalent of pg_trgm is Atlas Search (Lucene-based), but it is only available on MongoDB Atlas (cloud).
Conclusion: for a self-hosted inventory system with autocomplete, PostgreSQL + pg_trgm is the pragmatic and performant choice.
app/
models/
concerns/
product_searchable.rb # shared search logic (tokens, normalization)
product.rb # ActiveRecord model (PostgreSQL)
product_mongo.rb # Mongoid model (MongoDB)
spec/
models/
product_spec.rb # unit tests (correctness)
product_benchmark_spec.rb # performance benchmark
support/
shared_examples/
product_search.rb # shared RSpec examples for both backends
db/
migrate/
..._create_products.rb
..._add_trigram_index_to_products.rb
For production autocomplete, wrap search with Rails cache to avoid hitting the database on repeated queries:
def self.search(query, limit: 20)
normalized_query = normalize_string(query)
Rails.cache.fetch("product_search/#{normalized_query}/#{limit}", expires_in: 5.minutes) do
# ... search logic
end
endInvalidate on record changes:
after_commit { Rails.cache.delete_matched("product_search/*") }