Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,7 @@ A variable is "bound" in a disjunction if it is bound in every branch of the dis
Variables which are present in & bound by only some of the branches of the disjunction
(and not present outside it) are considered "internal" to that disjunction.

[,typeql]
----
#!test[read, count=2]
match
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
= Functions v/s rules

TypeDB 3 introduces functions to reason over your data.
They replace rules, which were the way to reason in TypeDB 2.
Functions work in a similar way to rules, providing an intuitive abstraction over subqueries. They should be able to replace them in most cases.
This page discusses the similarities, differences and factors to consider when moving from rules to functions.

== Separating data & computation
Whereas rules were a natural fit for predicate-logic based semantics of TypeDB 2,
functions are better suited for TypeDB 3's type-theory based semantics.
The major difference to the user is that there is now a separation between the data and reasoning constructs.
Rules allowed the user to think reason purely in terms of the data-model by silently completing the data by inferring instances of existing types.
Functions force the user to separate reasoning from the data-model, by explicitly creating a new function for any new computation that infers something new.

Although this shift sounds significant, most advanced rules do require the user to think of the computation involved for efficiency reasons.

== Similarities
Although one would ideally think natively in functions (rather than thinking of them in terms of rules) it is easy to see the similarities between rules and functions.

E.g. A rule completing a `reachable` relation transitively, can be seen as a function which computes all pairs of nodes where one is reachable from the other.
Conversely, the function can be seen as special type of relation where the roles are implicitly specified by the position of arguments or returned concepts.

[,typeql]
----
define
rule transitive-reachability:
when {
{
(from: $from, to: $to) isa edge;
} or {
(from: $from, to: $via) isa edge;
(from: $via, to: $to) isa reachable;
};
} then {
(from: $from, to: $to) isa reachable;
};

# query
match
$x isa node; $y isa node;
(from: $x, to: $y) isa reachable;
----

[,typeql]
----
define
fun reachable() -> { node, node }:
match
{
(from: $from, to: $to) isa edge;
} or {
let $from, $via in reachable();
(from: $via, to: $to) isa edge;
};
return { $from, $to };

# query
match
$x isa node; $y isa node;
let $x, $y in reachable();
----

An equally valid way is to see a function as a predicate computing whether one node is a reachable from the other.
[,typeql]
----
define
fun reachable($from: node, $to: node) -> bool:
match
{
(from: $from, to: $to) isa edge;
} or {
(from: $from, to: $via) isa edge;
true == reachable($via, $to);
};
return check;

# query
match
$x isa node; $y isa node;
true == reachable($x, $y);
----

== Arguments & return values
From the example, there are multiple ways of expressing the same rule as a function.
This is understandable given (mathematically) a relation is a collection of tuples,
whereas a function is a mapping from the input domain to the output range.
So there is some ambiguity as to which concepts are arguments and which are to be returned.

The choice depends on what the query needs to compute.
Does it need to enumerate the pairs of nodes for which one is reachable from the other?
Or check whether a node is reachable from the other?
Or even enumerate the nodes reachable from a given node, as the function below does?

[,typeql]
----
define
fun reachable_from($from: node) -> { node }:
match
{
(from: $from, to: $to) isa edge;
} or {
let $via in reachable_from($from);
(from: $via, to: $to) isa edge;
};
return { $to };

# query
match
$x isa node, has id "123";
$y isa node;
let $y in reachable_from($x);
----

Currently, each of these use cases would need a different function to be defined.
The arguments are inputs to the function and must be bound by the rest of the query.
The function will bind the returned values to the variables on the left of `in`.

=== Contextually bound functions
Although the split between arguments and returned concepts is a reasonable consequence of
having the user think about the computation explicitly, it is less declarative than rules -
where the planner would infer which role-players of a relation should be bound when evaluating the rule.
A future version of TypeDB will introduce "contextually bound functions" where the planner may choose which
variables are input to the functions and which are output - allowing a single function definition
to satisfy all the cases discussed above.

== What functions can do
Since functions aren't forced to return relations (or ownerships) which are defined in the schema,
they are more flexible and can also operate on raw values, including structs (when they are implemented).
Since they don't infer new instances, they also avoid the overhead
of making the result of the computation abide by the rules governing the inferred types.
In short, they're a really simple way of doing reasoning.

Functions are still evaluated on demand in a goal-driven way.
This means we don't materialize all the results of the function when the data is updated.
Instead, we only compute the calls relevant to the query being evaluated.
The advantage of a goal-driven approaches over an eagerly materializing one is that they support very large models
(in theory infinitely large models - i.e. an infinite number of inferred concepts).
A simple illustration is this toy function that produces natural numbers.

[,typeql]
----
with fn nat() -> { integer }:
match
{let $x = 0;} or
{ let $x in nat() + 1; };
return { $x };

match let $x in nat();
----

Since TypeDB's grpc endpoint also supports reactive streaming,
this query will stream out natural numbers on demand.
This isn't quite an infinite domain, since we are limited to the domain of the `integer` type which is a signed 64-bit integer.
Understandable, since this is the case with most programming languages -
but it does bring us to the next section.

=== What functions can't do (yet)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mentioning turing completeness somewhere here might be good is a relatable way to have it thought about IMO.

Aren't we turing complete if we can generate nats() though?

Copy link
Member Author

@krishnangovindraj krishnangovindraj Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a push-down automaton might be able to generate nats.

TypeDB in its current state can't do "arbitrarily large" models.

This limitation is unlikely to be practically relevant,
since most uses of functions only need to compute tuples of concepts which exist in the database.
Since this is a combination of finite sets, it is combinatorially large but still finite.

The problem is still (theoretically) interesting for two reasons - (1) TypeDB 2 could do it; (2) A feature complete TypeDB 3 could too.

==== How did TypeDB 2 do it? Nested relations.
In theory, one can infer an arbitrary number of nested relations,
because you can always infer a new relation which relates the one you just inferred.

In particular this makes state-space search on "open" worlds possible.
For games like the Rubik's cube, it is possible to assign each state a number (not necessarily one that fits in 64 bits).
For open worlds such as an arbitrary block's world, this doesn't work since the number of "objects" of each type aren't fixed.
You would instead represent a state as a relation involving the previous state and the action taken.

==== How will TypeDB 3 do it? Lists.
Lists can be arbitrarily long. A state can be represented by specifying the initial state,
and the list of actions which brings us to this state.


[NOTE]
====
Modern computers have finite memory so none of this is truly infinite.
The difference between the finiteness of TypeDB 3's model without and with lists
is that the former is constrained by the limits of TypeQL's expressivity,
while the latter is constrained by the limits of the underlying computer.
====

[NOTE]
====
This is based on ideas & results from the deductive databases field.
A lot of research there is conducted on datalog -
a "syntactic subset" of prolog that bans compound terms.
Lists are central to prolog and are implemented as recursive compound terms.
The exclusion of compound terms has implications on the decidability of datalog.
====
6 changes: 6 additions & 0 deletions reference/modules/ROOT/pages/typeql/functions/index.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -64,4 +64,10 @@ Learn how to work with stream-return functions.
****
Learn how to work with scalar functions.
****

.xref:{page-version}@reference::typeql/functions/functions-vs-rules.adoc[]
[.clickable]
****
How do functions differ from TypeDB 2 rules?
****
--