A Different Type of SQL Recursion with PostgreSQL #1
Replies: 3 comments 6 replies
-
Given the amount of effort and the way recursion is implemented in solution 2, I'd go for solution 2. It's more elegant and more C language-like and more in-line HW would execute recursion (mechanical sympathy). Though solution 1 is ok, and it's not that convoluted and complex, it's overcomplicated for a simple requirement like this. It just shows how algebra and lambda calculus theory (procedural, functional and OOP paradigms) can solve some problems in a way more elegant way than set theory (SQL-like DBs and Codd's relational model). All major DB vendors have procedural extensions and languages associated with them: Just use them. Portability is not an issue when having similar Turing complete procedural language SQL extensions. |
Beta Was this translation helpful? Give feedback.
-
I also had to traverse the relation graph recursively and used the recursive CTE approach: https://github.com/docteurklein/httpg/blob/main/sql/relation_defs.sql |
Beta Was this translation helpful? Give feedback.
-
This all seems to be a lot more complicated than it needs to be. The following should be sufficient. CREATE VIEW fk_tables (table_schema, table_name, fk_table_schema, fk_table_name) AS
SELECT n1.nspname, c1.relname, n2.nspname, c2.relname
FROM pg_constraint AS con
JOIN pg_class AS c1 ON c1.oid = con.conrelid
JOIN pg_namespace AS n1 ON n1.oid = c1.relnamespace
JOIN pg_class AS c2 ON c2.oid = con.confrelid
JOIN pg_namespace n2 ON n2.oid = c2.relnamespace
WHERE con.contype = 'f'; CREATE FUNCTION public.select_foreign_tables(_table_schema text, _table_name text)
RETURNS TABLE (table_schema text, table_name text, fk_table_schema text, fk_table_name text, level integer)
BEGIN ATOMIC
WITH RECURSIVE
_recursive_cte (table_schema, table_name, fk_table_schema, fk_table_name, level) AS (
SELECT
t.table_schema,
t.table_name,
t.fk_table_schema,
t.fk_table_name,
1
FROM fk_tables AS t
WHERE (t.table_schema, t.table_name) = ($1, $2)
UNION ALL
SELECT
rec.fk_table_schema,
rec.fk_table_name,
t.fk_table_schema,
t.fk_table_name,
rec.level + 1
FROM _recursive_cte AS rec
JOIN fk_tables AS t ON (rec.fk_table_schema, rec.fk_table_name) = (t.table_schema, t.table_name)
)
CYCLE table_schema, table_name SET is_cycle USING cycle_path
SELECT table_schema,
table_name,
fk_table_schema,
fk_table_name,
level
FROM _recursive_cte
WHERE NOT is_cycle;
END; SELECT *
FROM select_foreign_tables('public', 'users')
ORDER BY level; |
Beta Was this translation helpful? Give feedback.
-
A Different Type of SQL Recursion with PostgreSQL
PostgreSQL offers a powerful procedural programming model out of the box (in addition to the standard SQL).
You can combine that with the standard SQL approach to overcome almost any issue and solve any programming task. Sometimes, the good old procedural approach may be a more straightforward way out of the complex data problems than standard SQL.
This article will try to demonstrate that approach to a complex problem example.
The Problem
Let's say we want to write a function that will return all related tables for a table name in a parameter.
We will start with a made-up schema:
As we can see, we have the following tables:
users
- references itself (users
) and tabledepartments
departments
references itself, tableusers
, tabledivisions
, and tabledepartment_status
.divisions
references itself, tableusers
and tabledivision_status
.So, what we want to achieve is to have a function that will return all related tables directly and indirectly, together with the level of relation.
For example, we want to be able to call a function
select_foreign_tables
for tableusers
as a parameter, like this:This should return the following result:
As we can see, each row should also have a level assigned. For example, if table
users
directly references tabledepartments
, that's level 1. But,department_status
is level 2 becauseusers
referencesdepartments
and departments referencesdepartment_status
, and so on...Simple, right? It is just another data-related task. Now, how would we do that?
Let's start with a basic query.
Basic Query
A basic query is the PostgreSQL system query that will query system tables to fetch information about direct references. It's a rather complicated query, and I won't get into it, but for the sake of simplicity, let's create a view out of it:
Fine, it's ugly and dirty; don't even look at it.
Use it as a view, for example:
This will return first-level references for the table
users
:Now, all we have to do is run the same query again for each of these FK tables in this results, and we're done.
That's why it's called recursion.
Solution 1
Luckily for us, PostgreSQL has recursive common-table queries as a standard feature, and we can do the first solution with that.
For a recursion seed, we can use the filter on our
fk_tables
view.In the recursive part, we can return the FK table as the base table and join the
fk_tables
view again to get the real FK tables like this:All together, it should look like this:
Confusing? We have yet to start with the confusing part.
Anyway, this query will return all FK tables for
users
at all levels:However, we still need levels. We can count it manually, but we'd rather have instead the database engine do it.
First naive try it to add level 1 at recursion seed like this:
And have it increased by one in each recursion step:
This is very naive. We have a
union
of seed query and recursion query. Theunion
operator filters out duplicate records by default. From the moment we've introduced a level number that is unique, duplicates won't be filtered out. And sinceusers
referencesdepartments
anddepartments
referencesusers
again, the query will end up in an infinite loop, which potentially could crash the server.So this is reckless and even dangerous.
What we have here is a very well-known data problem - the gap and islands problem.
What we want to do is this:
table_schema
andtable_name
- increase the counter by one.To be able to do this, we first need to add two more calculated fields:
Here is what it looks like:
We can wrap this into another CTE and do the last calculation in the final query:
Field
island_flips
will be one if the previous table is different; otherwise, it is 0. Now, if we do a cumulative sum over those values withsum(island_flips) over (order by row_number) as level
- we will get the correct levels.Here is the final version of the plain SQL function:
Do you think this sounds complicated?
It sounds complicated because it is.
Let's try to solve this problem with a completely different approach.
Solution 2
In this approach, we will use an old-school procedural programming approach - instead of using the recursive SQL query - we will use a normal function recursion as you would with any other language.
So, instead of creating a function of the
sql
type - this new function will be of theplpgsql
type. This allows us to use procedural extensions like "if branching", "loops" and so on... Those are the concepts most developers are familiar with.The idea is to create a temporary table that we will insert in recursive calls and return values from that temporary table on the last call.
The function skeleton will look like this:
What we first need to do is to create a temporary table:
This table will be automatically dropped when transaction commits (or rollbacks). All PostgreSQL functions in procedural language are in transaction by default (notice
begin
andend
).When we call this function again in a recursion - it will still be the same transaction. The new transaction is merged into the existing transaction (PostgreSQL doesn't support nested transactions, as far as I know).
However, since it will be called recursively, that temp table may already exist. That's why we also need to check if it exists
create temp table if not exists
.Next, we must check if we have gone through this step already. Meaning have we processed that table from parameters? If we have, exit immediately:
And now, we are ready to insert the data:
fk_tables
view for table parameters and for each record:fk_table_schema
andfk_table_name
as parameters and increase one level.Here is what that loop looks like:
And finally, the entire function:
Conclusion
Which of these two approaches is better, in your opinion?
Recursive CTE queries are powerful but confusing and limiting. This old-school procedural programming approach may be a much better fit for developers untrained with SQL, and that is, in my opinion, the majority. On the other hand, the procedural approach may be something most developers feel comfortable with.
Leave comments below.
Edit:
A follow-up on this article:
Recursion with PostgreSQL Follow-Up 1 - Perfomances
Beta Was this translation helpful? Give feedback.
All reactions