Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #35689 - Use better fact search query #9724

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

adamruzicka
Copy link
Contributor

No description provided.

@theforeman-bot
Copy link
Member

Issues: #35689

@sbernhard
Copy link
Contributor

Can you use Benchmark.measure to determine which implementation is faster? It would also be interesting to run "EXPLAIN ANALYZE SELECT ....` on the DB to see which one is better.

@adamruzicka
Copy link
Contributor Author

I don't have a box with lots of facts right now and it will take me a bit to get one so I can't really benchmark it, however here are the explains:

Before the changes

Foreman

> Host.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" WHERE "hosts"."type" = $1 AND ((hosts.id in (SELECT "fact_values"."host_id" FROM "fact_values" INNER JOIN "fact_names" ON "fact_names"."id" = "fact_values"."fact_name_id" WHERE (fact_names.name = 'architecture') AND (fact_values.value = 'x86_64')))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27266.33..27266.50 rows=67 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop  (cost=27236.52..27264.30 rows=67 width=3199)
         ->  HashAggregate  (cost=27236.23..27236.90 rows=67 width=4)
               Group Key: fact_values.host_id
               ->  Nested Loop  (cost=231.73..27236.06 rows=67 width=4)
                     ->  Index Scan using index_fact_names_on_name_and_type on fact_names  (cost=0.28..8.29 rows=1 width=4)
                           Index Cond: ((name)::text = 'architecture'::text)
                     ->  Bitmap Heap Scan on fact_values  (cost=231.45..27226.78 rows=99 width=8)
                           Recheck Cond: (fact_name_id = fact_names.id)
                           Filter: (value = 'x86_64'::text)
                           ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                 Index Cond: (fact_name_id = fact_names.id)
         ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.41 rows=1 width=3199)
               Index Cond: (id = fact_values.host_id)
               Filter: ((type)::text = 'Host::Managed'::text)
(16 rows)

Discovery

> Host::Discovered.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" INNER JOIN "fact_values" fact_values_634325 ON (hosts.id = fact_values_634325.host_id)
 INNER JOIN "fact_names" fact_names_634325 ON (fact_names_634325.id =  fact_values_634325.fact_name_id) WHERE "hosts"."type" = $1 AND ((fact_names_634325."name" = 'architecture') AND ("hosts"."id" IN (SELECT "host_id" FROM "fact_values" WHERE "fact_values_634325"."value" = 'x86_64' ))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27336.40..27336.40 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Hash Join  (cost=240.27..27336.39 rows=1 width=3199)
         Hash Cond: (fact_values_634325.host_id = hosts.id)
         Join Filter: (SubPlan 1)
         ->  Nested Loop  (cost=234.20..27312.54 rows=6774 width=24)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_634325  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Bitmap Heap Scan on fact_values fact_values_634325  (cost=233.92..27204.26 rows=9999 width=28)
                     Recheck Cond: (fact_name_id = fact_names_634325.id)
                     ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                           Index Cond: (fact_name_id = fact_names_634325.id)
         ->  Hash  (cost=6.05..6.05 rows=1 width=3199)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                     Index Cond: ((type)::text = 'Host::Discovered'::text)
         SubPlan 1
           ->  Result  (cost=0.00..150863.99 rows=8358799 width=4)
                 One-Time Filter: (fact_values_634325.value = 'x86_64'::text)
                 ->  Seq Scan on fact_values  (cost=0.00..150863.99 rows=8358799 width=4)
(19 rows)

After the changes

Foreman

> Host.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_128255 ON fact_values_128255.host_id = hosts.id LEFT JOIN fact_names AS fact_names_128255 ON fact_names_128255.id = fact_values_128255.fact_name_id WHERE "hosts"."type" = $1 AND ((fact_names_128255.name = 'architecture' AND (fact_values_128255.value = 'x86_64'))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27259.29..27259.46 rows=67 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop  (cost=232.01..27257.26 rows=67 width=3199)
         ->  Nested Loop  (cost=231.73..27236.06 rows=67 width=4)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_128255  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Bitmap Heap Scan on fact_values fact_values_128255  (cost=231.45..27226.78 rows=99 width=8)
                     Recheck Cond: (fact_name_id = fact_names_128255.id)
                     Filter: (value = 'x86_64'::text)
                     ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                           Index Cond: (fact_name_id = fact_names_128255.id)
         ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.32 rows=1 width=3199)
               Index Cond: (id = fact_values_128255.host_id)
               Filter: ((type)::text = 'Host::Managed'::text)
(14 rows)

Discovery

> Host::Discovered.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_a84d6e ON fact_values_a84d6e.host_id = hosts.id LEFT JOIN fact_names AS fact_names_a84d6e ON fact_names_a84d6e.id = fact_values_a84d6e.fact_name_id WHERE "hosts"."type" = $1 AND ((fact_names_a84d6e.name = 'architecture' AND (fact_values_a84d6e.value = 'x86_64'))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=22.83..22.84 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Nested Loop  (cost=1.00..22.82 rows=1 width=3199)
         ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_a84d6e  (cost=0.28..8.29 rows=1 width=4)
               Index Cond: ((name)::text = 'architecture'::text)
         ->  Nested Loop  (cost=0.72..14.52 rows=1 width=3203)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                     Index Cond: ((type)::text = 'Host::Discovered'::text)
               ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_a84d6e  (cost=0.43..8.45 rows=1 width=8)
                     Index Cond: ((fact_name_id = fact_names_a84d6e.id) AND (host_id = hosts.id))
                     Filter: (value = 'x86_64'::text)
(11 rows)

@adamruzicka
Copy link
Contributor Author

With selective joins and searching on multiple facts

Before

Foreman

> ::Host.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" WHERE "hosts"."type" = $1 AND (((hosts.id in (SELECT "fact_values"."host_id" FROM "fact_values" INNER JOIN "fact_names" ON "fact_names"."id" = "fact_values"."fact_name_id" WHERE (fact_names.name = 'architecture') AND (fact_values.value = 'x86_64'))) AND (hosts.id in (SELECT "fact_values"."host_id" FROM "fact_values" INNER JOIN "fact_names" ON "fact_names"."id" = "fact_values"."fact_name_id" WHERE (fact_names.name = 'chassistype') AND (fact_values.value = 'Other'))))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27682.51..27682.52 rows=1 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop Semi Join  (cost=27236.51..27682.50 rows=1 width=3199)
         ->  Nested Loop  (cost=27235.80..27246.48 rows=26 width=3203)
               ->  HashAggregate  (cost=27235.51..27235.77 rows=26 width=4)
                     Group Key: fact_values_1.host_id
                     ->  Nested Loop  (cost=231.71..27235.45 rows=26 width=4)
                           ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_1  (cost=0.28..8.29 rows=1 width=4)
                                 Index Cond: ((name)::text = 'chassistype'::text)
                           ->  Bitmap Heap Scan on fact_values fact_values_1  (cost=231.43..27226.76 rows=39 width=8)
                                 Recheck Cond: (fact_name_id = fact_names_1.id)
                                 Filter: (value = 'Other'::text)
                                 ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                       Index Cond: (fact_name_id = fact_names_1.id)
               ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.41 rows=1 width=3199)
                     Index Cond: (id = fact_values_1.host_id)
                     Filter: ((type)::text = 'Host::Managed'::text)
         ->  Nested Loop  (cost=0.71..16.76 rows=1 width=4)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values  (cost=0.43..8.45 rows=1 width=8)
                     Index Cond: ((fact_name_id = fact_names.id) AND (host_id = hosts.id))
                     Filter: (value = 'x86_64'::text)
(23 rows)

Discovery

> ::Host::Discovered.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" INNER JOIN "fact_values" fact_values_253030 ON (hosts.id = fact_values_253030.host_id)
 INNER JOIN "fact_names" fact_names_253030 ON (fact_names_253030.id =  fact_values_253030.fact_name_id) INNER JOIN "fact_values" fact_values_22845 ON (hosts.id = fact_values_22845.host_id)
 INNER JOIN "fact_names" fact_names_22845 ON (fact_names_22845.id =  fact_values_22845.fact_name_id) WHERE "hosts"."type" = $1 AND ((fact_names_253030."name" = 'architecture') AND (fact_names_22845."name" = 'chassistype') AND (("hosts"."id" IN (SELECT "host_id" FROM "fact_values" WHERE "fact_values_253030"."value"
= 'x86_64' )) AND ("hosts"."id" IN (SELECT "host_id" FROM "fact_values" WHERE "fact_values_22845"."value" = 'Other' )))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=54832.55..54832.56 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Hash Join  (cost=27637.48..54832.54 rows=1 width=3199)
         Hash Cond: (fact_values_253030.host_id = hosts.id)
         Join Filter: ((SubPlan 1) AND (SubPlan 2))
         ->  Hash Join  (cost=27631.42..54812.77 rows=5221 width=48)
               Hash Cond: (fact_values_253030.host_id = fact_values_22845.host_id)
               ->  Nested Loop  (cost=234.20..27312.54 rows=6774 width=24)
                     ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_253030  (cost=0.28..8.29 rows=1 width=4)
                           Index Cond: ((name)::text = 'architecture'::text)
                     ->  Bitmap Heap Scan on fact_values fact_values_253030  (cost=233.92..27204.26 rows=9999 width=28)
                           Recheck Cond: (fact_name_id = fact_names_253030.id)
                           ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                 Index Cond: (fact_name_id = fact_names_253030.id)
               ->  Hash  (cost=27312.54..27312.54 rows=6774 width=24)
                     ->  Nested Loop  (cost=234.20..27312.54 rows=6774 width=24)
                           ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_22845  (cost=0.28..8.29 rows=1 width=4)
                                 Index Cond: ((name)::text = 'chassistype'::text)
                           ->  Bitmap Heap Scan on fact_values fact_values_22845  (cost=233.92..27204.26 rows=9999 width=28)
                                 Recheck Cond: (fact_name_id = fact_names_22845.id)
                                 ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                       Index Cond: (fact_name_id = fact_names_22845.id)
         ->  Hash  (cost=6.05..6.05 rows=1 width=3199)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                     Index Cond: ((type)::text = 'Host::Discovered'::text)
         SubPlan 1
           ->  Result  (cost=0.00..150863.99 rows=8358799 width=4)
                 One-Time Filter: (fact_values_253030.value = 'x86_64'::text)
                 ->  Seq Scan on fact_values  (cost=0.00..150863.99 rows=8358799 width=4)
         SubPlan 2
           ->  Result  (cost=0.00..150863.99 rows=8358799 width=4)
                 One-Time Filter: (fact_values_22845.value = 'Other'::text)
                 ->  Seq Scan on fact_values fact_values_1  (cost=0.00..150863.99 rows=8358799 width=4)
(33 rows)

After

Foreman

> ::Host.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_55e851 ON fact_values_55e851.host_id = hosts.id LEFT JOIN fact_names AS fact_names_55e851 ON fact_names_55e851.id = fact_values_55e851.fact_name_id AND fact_names_55e851.name = 'architecture' LEFT JOIN fact_values AS fact_values_76baee ON fact_values_76baee.host_id = hosts.id LEFT JOIN fact_names AS fact_names_76baee ON fact_names_76baee.id = fact_values_76baee.fact_name_id AND fact_names_76baee.name = 'chassistype' WHERE "hosts"."type" = $1 AND (((fact_names_55e851.name = 'architecture' AND (fact_values_55e851.value = 'x86_64')) AND (fact_names_76baee.name = 'chassistype' AND (fact_values_76baee.value = 'Other')))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27465.15..27465.16 rows=1 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop  (cost=232.71..27465.14 rows=1 width=3199)
         ->  Nested Loop  (cost=232.42..27464.82 rows=1 width=8)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_55e851  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Nested Loop  (cost=232.14..27455.54 rows=99 width=12)
                     ->  Nested Loop  (cost=231.71..27235.45 rows=26 width=4)
                           ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_76baee  (cost=0.28..8.29 rows=1 width=4)
                                 Index Cond: ((name)::text = 'chassistype'::text)
                           ->  Bitmap Heap Scan on fact_values fact_values_76baee  (cost=231.43..27226.76 rows=39 width=8)
                                 Recheck Cond: (fact_name_id = fact_names_76baee.id)
                                 Filter: (value = 'Other'::text)
                                 ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                       Index Cond: (fact_name_id = fact_names_76baee.id)
                     ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_55e851  (cost=0.43..8.45 rows=1 width=8)
                           Index Cond: ((fact_name_id = fact_names_55e851.id) AND (host_id = fact_values_76baee.host_id))
                           Filter: (value = 'x86_64'::text)
         ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.32 rows=1 width=3199)
               Index Cond: (id = fact_values_55e851.host_id)
               Filter: ((type)::text = 'Host::Managed'::text)
(21 rows)

Discovery

> ::Host::Discovered.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_9dcff5 ON fact_values_9dcff5.host_id = hosts.id LEFT JOIN fact_names AS fact_names_9dcff5 ON fact_names_9dcff5.id = fact_values_9dcff5.fact_name_id AND fact_names_9dcff5.name = 'architecture' LEFT JOIN fact_values AS fact_values_47b143 ON fact_values_47b143.host_id = hosts.id LEFT JOIN fact_names AS fact_names_47b143 ON fact_names_47b143.id = fact_values_47b143.fact_name_id AND fact_names_47b143.name = 'chassistype' WHERE "hosts"."type" = $1 AND (((fact_names_9dcff5.name = 'architecture' AND (fact_values_9dcff5.value = 'x86_64')) AND (fact_names_47b143.name = 'chassistype' AND (fact_values_47b143.value = 'Other')))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=39.60..39.61 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Nested Loop  (cost=1.71..39.59 rows=1 width=3199)
         ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_47b143  (cost=0.28..8.29 rows=1 width=4)
               Index Cond: ((name)::text = 'chassistype'::text)
         ->  Nested Loop  (cost=1.43..31.29 rows=1 width=3203)
               ->  Nested Loop  (cost=1.00..22.82 rows=1 width=3203)
                     ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_9dcff5  (cost=0.28..8.29 rows=1 width=4)
                           Index Cond: ((name)::text = 'architecture'::text)
                     ->  Nested Loop  (cost=0.72..14.52 rows=1 width=3207)
                           ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                                 Index Cond: ((type)::text = 'Host::Discovered'::text)
                           ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_9dcff5  (cost=0.43..8.45 rows=1 width=8)
                                 Index Cond: ((fact_name_id = fact_names_9dcff5.id) AND (host_id = hosts.id))
                                 Filter: (value = 'x86_64'::text)
               ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_47b143  (cost=0.43..8.45 rows=1 width=8)
                     Index Cond: ((fact_name_id = fact_names_47b143.id) AND (host_id = fact_values_9dcff5.host_id))
                     Filter: (value = 'Other'::text)
(18 rows)

@adamruzicka adamruzicka marked this pull request as ready for review March 19, 2024 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants