Skip to content

trinaldi/pg-mongo-benchmark

Repository files navigation

Product Search Benchmark — PostgreSQL vs MongoDB

A Rails application benchmarking substring search strategies for product inventory systems, comparing PostgreSQL with pg_trgm against MongoDB with regex.


Overview

This project explores how to implement fast, accent-tolerant, case-insensitive substring search for a product autocomplete feature — the kind used in inventory or e-commerce systems where a user types "com g" and expects to find "Água com Gás 500ml".

Key findings

Strategy Real time / query (500k records) Index used
PostgreSQL + pg_trgm (GIN) ~0.57ms ✅ Bitmap Index Scan
MongoDB + B-tree on normalized_name ~1.63ms ❌ Ignored by planner

PostgreSQL with pg_trgm is ~3x faster for substring search at scale.


Stack

  • Ruby on Rails 7.1
  • PostgreSQL with pg_trgm extension
  • MongoDB via Mongoid
  • RSpec + DatabaseCleaner
  • Benchmark (stdlib)

How it works

Normalized name

Every product stores a normalized_name column — a lowercase, accent-stripped version of the name generated automatically on save:

def self.normalize_string(str)
  str
    .unicode_normalize(:nfd)  # decompose accents: "á" → "a" + combining mark
    .gsub(/\p{Mn}/, "")       # strip combining marks
    .downcase
    .strip
end

"Água com Gás 500ml""agua com gas 500ml"

This makes searches accent-tolerant without any special DB configuration.

Token-based search

Queries are split into tokens and applied as individual AND filters:

def self.search(query)
  tokens = normalize_string(query).split.reject(&:blank?)
  return none if tokens.empty?

  scope = all
  tokens.each { |token| scope = apply_token_filter(scope, token) }
  scope.limit(20)
end

Searching "agua gas" finds any product containing both words in any order — much more natural for autocomplete than an exact substring match.

PostgreSQL filter

scope.where("normalized_name LIKE ?", "%#{token}%")

MongoDB filter

scope.where(normalized_name: /#{Regexp.escape(token)}/i)

Setup

Requirements

  • Ruby 3.x
  • PostgreSQL 14+
  • MongoDB 6+

Install

git clone <repo>
cd estoque
bundle install

Database setup

# PostgreSQL
rails db:create db:migrate

# MongoDB (indexes)
rails runner "ProductMongo.create_indexes"

Enable pg_trgm

The migration enables the extension and creates the GIN index automatically:

rails db:migrate

If you need to run it manually in psql:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX index_products_on_normalized_name_trigram
  ON products USING gin (normalized_name gin_trgm_ops);

Running the benchmark

rspec spec/models/product_benchmark_spec.rb --format documentation

This will:

  1. Insert 500,000 products into both PostgreSQL and MongoDB using bulk inserts
  2. Run 100 search queries against each backend
  3. Print Benchmark results side by side
  4. Clean up all records after

Expected output:

                                           user     system      total        real
Product.search('com g'):               0.039609   0.007008   0.046617 (  0.057242)

                                           user     system      total        real
ProductMongo.search('com g'):          0.128274   0.011873   0.140147 (  0.162563)

Reading the benchmark output

Column What it measures
user CPU time spent by the Ruby process
system CPU time spent by the kernel (I/O, syscalls)
total user + system
real Wall-clock time — what the user actually waits

real is the most relevant column for perceived performance.


Running unit tests

rspec spec/models/product_spec.rb --format documentation

Covers:

  • Substring search with accented input
  • Case-insensitive search
  • Multi-token search
  • Empty result cases
  • Result limit (max 20)

Why pg_trgm wins for substring search

A standard B-tree index only accelerates LIKE 'prefix%' queries (anchored at the start). For LIKE '%substring%', it is ignored and the database falls back to a sequential scan.

pg_trgm solves this by indexing every trigram (group of 3 characters) in the string. A GIN index on trigrams allows PostgreSQL to use an index scan even for LIKE '%...%' patterns.

MongoDB's B-tree index faces the same limitation with leading-wildcard regex (/substring/). The planner ignores the index and performs a collection scan. The MongoDB equivalent of pg_trgm is Atlas Search (Lucene-based), but it is only available on MongoDB Atlas (cloud).

Conclusion: for a self-hosted inventory system with autocomplete, PostgreSQL + pg_trgm is the pragmatic and performant choice.


Project structure

app/
  models/
    concerns/
      product_searchable.rb   # shared search logic (tokens, normalization)
    product.rb                # ActiveRecord model (PostgreSQL)
    product_mongo.rb          # Mongoid model (MongoDB)
spec/
  models/
    product_spec.rb           # unit tests (correctness)
    product_benchmark_spec.rb # performance benchmark
  support/
    shared_examples/
      product_search.rb       # shared RSpec examples for both backends
db/
  migrate/
    ..._create_products.rb
    ..._add_trigram_index_to_products.rb

Caching (optional)

For production autocomplete, wrap search with Rails cache to avoid hitting the database on repeated queries:

def self.search(query, limit: 20)
  normalized_query = normalize_string(query)

  Rails.cache.fetch("product_search/#{normalized_query}/#{limit}", expires_in: 5.minutes) do
    # ... search logic
  end
end

Invalidate on record changes:

after_commit { Rails.cache.delete_matched("product_search/*") }

About

A Rails application benchmarking substring search strategies for product inventory systems, comparing PostgreSQL with pg_trgm against MongoDB with regex.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors