-
Notifications
You must be signed in to change notification settings - Fork 71
Functions vs rules #982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
krishnangovindraj
merged 5 commits into
typedb:3.x-development
from
krishnangovindraj:document-functions-vs-rules
Sep 11, 2025
Merged
Functions vs rules #982
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
195 changes: 195 additions & 0 deletions
195
reference/modules/ROOT/pages/typeql/functions/functions-vs-rules.adoc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
= Functions v/s rules | ||
|
||
TypeDB 3 introduces functions to reason over your data. | ||
They replace rules, which were the way to reason in TypeDB 2. | ||
Functions work in a similar way to rules, providing an intuitive abstraction over subqueries. They should be able to replace them in most cases. | ||
This page discusses the similarities, differences and factors to consider when moving from rules to functions. | ||
|
||
== Separating data & computation | ||
Whereas rules were a natural fit for predicate-logic based semantics of TypeDB 2, | ||
functions are better suited for TypeDB 3's type-theory based semantics. | ||
The major difference to the user is that there is now a separation between the data and reasoning constructs. | ||
Rules allowed the user to think reason purely in terms of the data-model by silently completing the data by inferring instances of existing types. | ||
Functions force the user to separate reasoning from the data-model, by explicitly creating a new function for any new computation that infers something new. | ||
|
||
Although this shift sounds significant, most advanced rules do require the user to think of the computation involved for efficiency reasons. | ||
|
||
== Similarities | ||
Although one would ideally think natively in functions (rather than thinking of them in terms of rules) it is easy to see the similarities between rules and functions. | ||
|
||
E.g. A rule completing a `reachable` relation transitively, can be seen as a function which computes all pairs of nodes where one is reachable from the other. | ||
Conversely, the function can be seen as special type of relation where the roles are implicitly specified by the position of arguments or returned concepts. | ||
|
||
[,typeql] | ||
---- | ||
define | ||
rule transitive-reachability: | ||
when { | ||
{ | ||
(from: $from, to: $to) isa edge; | ||
} or { | ||
(from: $from, to: $via) isa edge; | ||
(from: $via, to: $to) isa reachable; | ||
}; | ||
} then { | ||
(from: $from, to: $to) isa reachable; | ||
}; | ||
|
||
# query | ||
match | ||
$x isa node; $y isa node; | ||
(from: $x, to: $y) isa reachable; | ||
---- | ||
|
||
[,typeql] | ||
---- | ||
define | ||
fun reachable() -> { node, node }: | ||
match | ||
{ | ||
(from: $from, to: $to) isa edge; | ||
} or { | ||
let $from, $via in reachable(); | ||
(from: $via, to: $to) isa edge; | ||
}; | ||
return { $from, $to }; | ||
|
||
# query | ||
match | ||
$x isa node; $y isa node; | ||
let $x, $y in reachable(); | ||
---- | ||
|
||
An equally valid way is to see a function as a predicate computing whether one node is a reachable from the other. | ||
[,typeql] | ||
---- | ||
define | ||
fun reachable($from: node, $to: node) -> bool: | ||
match | ||
{ | ||
(from: $from, to: $to) isa edge; | ||
} or { | ||
(from: $from, to: $via) isa edge; | ||
true == reachable($via, $to); | ||
}; | ||
return check; | ||
|
||
# query | ||
match | ||
$x isa node; $y isa node; | ||
true == reachable($x, $y); | ||
---- | ||
|
||
== Arguments & return values | ||
From the example, there are multiple ways of expressing the same rule as a function. | ||
This is understandable given (mathematically) a relation is a collection of tuples, | ||
whereas a function is a mapping from the input domain to the output range. | ||
So there is some ambiguity as to which concepts are arguments and which are to be returned. | ||
|
||
The choice depends on what the query needs to compute. | ||
Does it need to enumerate the pairs of nodes for which one is reachable from the other? | ||
Or check whether a node is reachable from the other? | ||
Or even enumerate the nodes reachable from a given node, as the function below does? | ||
|
||
[,typeql] | ||
---- | ||
define | ||
fun reachable_from($from: node) -> { node }: | ||
match | ||
{ | ||
(from: $from, to: $to) isa edge; | ||
} or { | ||
let $via in reachable_from($from); | ||
(from: $via, to: $to) isa edge; | ||
}; | ||
return { $to }; | ||
|
||
# query | ||
match | ||
$x isa node, has id "123"; | ||
$y isa node; | ||
let $y in reachable_from($x); | ||
---- | ||
|
||
Currently, each of these use cases would need a different function to be defined. | ||
The arguments are inputs to the function and must be bound by the rest of the query. | ||
The function will bind the returned values to the variables on the left of `in`. | ||
|
||
=== Contextually bound functions | ||
Although the split between arguments and returned concepts is a reasonable consequence of | ||
having the user think about the computation explicitly, it is less declarative than rules - | ||
where the planner would infer which role-players of a relation should be bound when evaluating the rule. | ||
A future version of TypeDB will introduce "contextually bound functions" where the planner may choose which | ||
variables are input to the functions and which are output - allowing a single function definition | ||
to satisfy all the cases discussed above. | ||
|
||
== What functions can do | ||
Since functions aren't forced to return relations (or ownerships) which are defined in the schema, | ||
they are more flexible and can also operate on raw values, including structs (when they are implemented). | ||
Since they don't infer new instances, they also avoid the overhead | ||
of making the result of the computation abide by the rules governing the inferred types. | ||
In short, they're a really simple way of doing reasoning. | ||
|
||
Functions are still evaluated on demand in a goal-driven way. | ||
This means we don't materialize all the results of the function when the data is updated. | ||
Instead, we only compute the calls relevant to the query being evaluated. | ||
The advantage of a goal-driven approaches over an eagerly materializing one is that they support very large models | ||
(in theory infinitely large models - i.e. an infinite number of inferred concepts). | ||
A simple illustration is this toy function that produces natural numbers. | ||
|
||
[,typeql] | ||
---- | ||
with fn nat() -> { integer }: | ||
match | ||
{let $x = 0;} or | ||
{ let $x in nat() + 1; }; | ||
return { $x }; | ||
|
||
match let $x in nat(); | ||
---- | ||
|
||
Since TypeDB's grpc endpoint also supports reactive streaming, | ||
this query will stream out natural numbers on demand. | ||
This isn't quite an infinite domain, since we are limited to the domain of the `integer` type which is a signed 64-bit integer. | ||
Understandable, since this is the case with most programming languages - | ||
but it does bring us to the next section. | ||
|
||
=== What functions can't do (yet) | ||
TypeDB in its current state can't do "arbitrarily large" models. | ||
|
||
This limitation is unlikely to be practically relevant, | ||
since most uses of functions only need to compute tuples of concepts which exist in the database. | ||
Since this is a combination of finite sets, it is combinatorially large but still finite. | ||
|
||
The problem is still (theoretically) interesting for two reasons - (1) TypeDB 2 could do it; (2) A feature complete TypeDB 3 could too. | ||
|
||
==== How did TypeDB 2 do it? Nested relations. | ||
In theory, one can infer an arbitrary number of nested relations, | ||
because you can always infer a new relation which relates the one you just inferred. | ||
|
||
In particular this makes state-space search on "open" worlds possible. | ||
For games like the Rubik's cube, it is possible to assign each state a number (not necessarily one that fits in 64 bits). | ||
For open worlds such as an arbitrary block's world, this doesn't work since the number of "objects" of each type aren't fixed. | ||
You would instead represent a state as a relation involving the previous state and the action taken. | ||
|
||
==== How will TypeDB 3 do it? Lists. | ||
Lists can be arbitrarily long. A state can be represented by specifying the initial state, | ||
and the list of actions which brings us to this state. | ||
|
||
|
||
[NOTE] | ||
==== | ||
Modern computers have finite memory so none of this is truly infinite. | ||
The difference between the finiteness of TypeDB 3's model without and with lists | ||
is that the former is constrained by the limits of TypeQL's expressivity, | ||
while the latter is constrained by the limits of the underlying computer. | ||
==== | ||
|
||
[NOTE] | ||
==== | ||
This is based on ideas & results from the deductive databases field. | ||
A lot of research there is conducted on datalog - | ||
a "syntactic subset" of prolog that bans compound terms. | ||
Lists are central to prolog and are implemented as recursive compound terms. | ||
The exclusion of compound terms has implications on the decidability of datalog. | ||
==== |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mentioning turing completeness somewhere here might be good is a relatable way to have it thought about IMO.
Aren't we turing complete if we can generate nats() though?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a push-down automaton might be able to generate nats.