Reflect distributed outputs back to DDlog. #1065
Use `::differential_datalog::ddval::DDValue` and `::differential_datalog::program::Weight` to avoid name clashes and in preparation for upcoming library changes. Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
This fixes a regression that caused DDlog-generated crates to recompile (just the top-level crate) even when there were no changes to DDlog code. This was caused by references to files that were moved from the template to the `differential_datalog` crate lingering in `build.rs`. See comments in `build.rs` for an explanation of why this list is needed. Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
We are starting to build an infrastructure to help implement more of the D3log runtime in DDlog. However, the features added here can be useful in other contexts.

This commit exposes the `DDValue` type, a type-erased wrapper around any DDlog value, to DDlog programs. This enables DDlog programs to manipulate collections of heterogeneous values in a type-agnostic manner. For instance, in D3log we want to put all records of all output relations in a single relation, and route them without knowing their exact types.

We cannot expose `DDValue` directly, since for performance reasons some of its operations, namely the comparison methods from the `Eq` and `Ord` traits, are unsafe: comparing `DDValue`s backed by different types leads to undefined behavior. We therefore introduce `DDAny`, a safe wrapper around `DDValue`, and expose it in the DDlog standard library (`ddlog_std`), along with methods to convert DDlog types to and from `DDAny`.

Serialization/Deserialization
-----------------------------

The `Serialize` implementation of `DDAny` simply invokes the `Serialize` implementation of its underlying type. However, since we cannot deserialize a value without knowing its type, `DDAny` does not implement `trait Deserialize`. More precisely, it does have a `Deserialize` implementation, as mandated by DDlog for all types; however this implementation always fails.

We do provide a way to deserialize into `DDAny` provided the caller knows the relation id the type being deserialized belongs to. To this end, we introduce a new trait `DDAnyDeserialize` that maps an input relation id to a deserialization function for records in that relation. We also provide a `DeserializeSeed` implementation that wraps around this function and can be used to implement `Deserialize` for complex types containing `DDAny` as a field.

The `rust_api_test`

Caveats
-------

`DDAny` relies on Rust's TypeId mechanism to order instances with different underlying types. As a result, the ordering may differ across Rust compiler releases.

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
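The relation-id-driven deserialization described in this commit message can be modelled, very roughly, as a lookup from relation id to a parsing function. The sketch below is not the `DDAnyDeserialize` trait from the commit: `RelId`, `Erased`, `DeserializeFn`, `deserializer_for`, and the two relation ids are hypothetical names, and plain string parsing stands in for a real serde deserializer.

```rust
use std::any::Any;

/// Hypothetical relation identifier (an assumption of this sketch).
pub type RelId = usize;

/// Type-erased value; stands in for `DDAny` in this sketch.
pub type Erased = Box<dyn Any>;

/// A function that parses one record of one specific relation.
pub type DeserializeFn = fn(&str) -> Result<Erased, String>;

fn parse_u32(s: &str) -> Result<Erased, String> {
    s.trim()
        .parse::<u32>()
        .map(|v| Box::new(v) as Erased)
        .map_err(|e| e.to_string())
}

fn parse_string(s: &str) -> Result<Erased, String> {
    Ok(Box::new(s.to_string()) as Erased)
}

/// Maps a relation id to the parser for that relation's record type.
/// Relation 0 holding `u32` and relation 1 holding `String` are made up
/// for the example.
pub fn deserializer_for(rel: RelId) -> Option<DeserializeFn> {
    match rel {
        0 => Some(parse_u32 as DeserializeFn),
        1 => Some(parse_string as DeserializeFn),
        _ => None,
    }
}

/// Without a relation id there is no way to pick a parser, so the untyped
/// entry point always fails, mirroring the always-failing `Deserialize`
/// implementation described in the commit message.
pub fn deserialize_untyped(_input: &str) -> Result<Erased, String> {
    Err("cannot deserialize a type-erased value without a relation id".to_string())
}

/// Deserializes `input` as a record of relation `rel`.
pub fn deserialize_for_relation(rel: RelId, input: &str) -> Result<Erased, String> {
    let parse = deserializer_for(rel).ok_or_else(|| format!("unknown relation id {}", rel))?;
    parse(input)
}
```

The caller recovers the concrete value with `downcast_ref`, which only succeeds when the requested type matches the one chosen by the relation id.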
Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
This test was incorrectly deleted along with old distributed ddlog stuff. Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
`stack test --ta '-p tutorial'` is running out of disk space in the GitHub CI pipeline on Windows. We run the simpler `path` test instead. A proper long-term solution is to switch to private GitHub runners. This will also eliminate the need for GitLab CI. Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
This commit is another step towards an infrastructure to implement more of the D3log logic in DDlog. In the current architecture, distributed relations, i.e., relations whose records are exchanged across nodes, are cut into pairs of input/output relations, where the output relation contains records to be shipped to a remote location. These are consumed by the Rust D3log runtime, which is responsible for buffering and routing the records to their destinations.

The stateful part of the D3log runtime must deal with changing cluster membership, and may actually benefit from D3log's declarative, incremental semantics. In order to port this logic to DDlog and keep it generic, we must first convert all distributed output facts to `Any` and collect them in one DDlog relation, so that it can be processed by the application-independent D3log runtime. Since we currently don't have general meta-programming facilities in DDlog, we implement this as a one-off compiler extension.

Specifically, this commit introduces the following changes:

- A new DDlog compiler CLI switch: `--d3log-dev`, which enables the following features. Note that the old `--d3log` flag is still supported, as we don't want to break existing D3log code. Once all of it has been ported, we will remove the old `--d3log` behavior and rename `--d3log-dev` to `--d3log`.

- The `lib/d3log/reflect.dl` library declares the following two tables that will collect all distributed facts:

```
// Rows from distributed relations.
//
// `relname` - relation name
// `fact` - row in the relation
// `destination` - destination specified via @-annotation, if any
relation DistributedRelFacts(
    relname: istring,
    fact: Any,
    destination: Option<D3logLocationId>
)

// Rows from distributed streams.
//
// `relname` - stream name
// `fact` - record in the stream
// `destination` - destination specified via @-annotation, if any
stream DistributedStreamFacts(
    relname: istring,
    fact: Any,
    destination: Option<D3logLocationId>
)
```

  Note that we need two separate tables for stream and non-stream tables (relations and multisets).

- The compiler rewrites rules that write into distributed relations to populate the above tables instead.

- As before (with the `--d3log` switch), distributed tables are converted into input tables with the same signature, so that D3log can feed remote facts to them. One important caveat is that distributed `relation`s are converted to `input multiset`s, as required by the D3log semantics.

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
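To make the effect of the rule rewrite more concrete, here is a rough Rust-side model of what a row of `DistributedRelFacts` carries and how an application-independent runtime could route such rows without knowing their types. This is only a sketch under assumptions: `LocationId` as `u64` stands in for `D3logLocationId` (whose definition is not shown here), `Box<dyn Any>` stands in for DDlog's `Any`, and `reflect` and `route` are hypothetical helper names, not part of the compiler or runtime.

```rust
use std::any::Any;
use std::collections::BTreeMap;

/// Hypothetical stand-in for `D3logLocationId`; `u64` is an assumption.
pub type LocationId = u64;

/// Model of one row of `DistributedRelFacts` / `DistributedStreamFacts`.
pub struct DistributedFact {
    pub relname: String,
    pub fact: Box<dyn Any>,
    pub destination: Option<LocationId>,
}

/// Conceptually what a rewritten rule does: instead of writing a typed row
/// into its own relation, it erases the type and records the relation name
/// and the destination taken from the `@` annotation, if any.
pub fn reflect<T: Any>(relname: &str, row: T, destination: Option<LocationId>) -> DistributedFact {
    DistributedFact {
        relname: relname.to_string(),
        fact: Box::new(row),
        destination,
    }
}

/// Application-independent routing: groups facts by destination without
/// ever inspecting their concrete types.
pub fn route(facts: Vec<DistributedFact>) -> BTreeMap<Option<LocationId>, Vec<DistributedFact>> {
    let mut routed: BTreeMap<Option<LocationId>, Vec<DistributedFact>> = BTreeMap::new();
    for fact in facts {
        routed.entry(fact.destination).or_default().push(fact);
    }
    routed
}
```

Only the receiving side, which knows which relation a fact belongs to, needs to downcast `fact` back to its concrete type.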
So DDlog can now be used as a dynamically-typed language too.
C# added that at some point (a "dynamic" type).
Sort of, but in a very limited way.
Nothing stops you from actually exposing this type to normal programs (except the fact that there aren't many operations you can do on this value except perhaps cast it).
@@ -65,7 +65,11 @@ jobs:
#with:
# path: test/datalog_tests/tutorial_ddlog
# key: ${{ runner.os }}-tutorial
- name: Run tutorial test
- if: ${{ runner.os == 'Windows' }}
why this difference?
This is a separate small commit in the history. We're running out of disk on Windows in github with the more complex test.
@@ -99,6 +100,7 @@ options = [ Option ['h'] ["help"] (NoArg Help)
, Option [] ["rust-flatbuffers"] (NoArg RustFlatBuffers) "Build flatbuffers bindings for Rust"
, Option [] ["nested-ts-32"] (NoArg NestedTS32) "Use 32-bit instead of 16-bit nested timestamps. Supports recursive programs that may perform >65,536 iterations. Slightly increases the memory footprint of the program."
, Option [] ["d3log"] (NoArg D3log) "Compile the input program to execute in the distributed DDlog (D3log) environment."
, Option [] ["d3log-dev"] (NoArg D3logDev) "Compile the input program to execute in the D3log environment with experimental reflective features enabled."
do you want to add a note that this will be deprecated soon?
Hopefully, the "dev" and "experimental" part should make that clear :)
}
}

impl PartialOrd for Any {
Maybe you should say that the result for different types is not fixed across all instances: it is unspecified and not deterministic.
Here is a comment from `ddlog_std.dl`, which I think says this:
Note: `Any` relies on Rust's TypeId mechanism to order instances
with different underlying types. As a result, the ordering may
differ across Rust compiler releases.
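For readers unfamiliar with the idea, a safe ordering of type-erased values along these lines could look roughly like the sketch below. It is not the `Any` implementation under review: `SafeAny` and `ErasedOrd` are hypothetical names, and the wrapper is built on `Box<dyn ...>` rather than on `DDValue`. Values of the same type are compared by their own `Ord`; values of different types fall back to comparing `TypeId`s, which is exactly the part whose result may change between compiler releases.

```rust
use std::any::{Any, TypeId};
use std::cmp::Ordering;

/// Object-safe helper trait: everything `SafeAny` needs from the wrapped value.
trait ErasedOrd {
    fn as_any(&self) -> &dyn Any;
    fn erased_type_id(&self) -> TypeId;
    /// Compares with `other` only if it has the same concrete type.
    fn cmp_same_type(&self, other: &dyn Any) -> Option<Ordering>;
}

impl<T: Any + Ord> ErasedOrd for T {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn erased_type_id(&self) -> TypeId {
        TypeId::of::<T>()
    }
    fn cmp_same_type(&self, other: &dyn Any) -> Option<Ordering> {
        other.downcast_ref::<T>().map(|o| self.cmp(o))
    }
}

/// Hypothetical safe wrapper around a type-erased, ordered value.
pub struct SafeAny(Box<dyn ErasedOrd>);

impl SafeAny {
    pub fn new<T: Any + Ord>(value: T) -> Self {
        SafeAny(Box::new(value))
    }

    /// Returns the wrapped value if it has type `T`.
    pub fn downcast_ref<T: Any>(&self) -> Option<&T> {
        let inner: &dyn ErasedOrd = &*self.0;
        inner.as_any().downcast_ref::<T>()
    }
}

impl Ord for SafeAny {
    fn cmp(&self, other: &Self) -> Ordering {
        let a: &dyn ErasedOrd = &*self.0;
        let b: &dyn ErasedOrd = &*other.0;
        match a.cmp_same_type(b.as_any()) {
            // Same underlying type: use that type's own ordering.
            Some(ord) => ord,
            // Different types: order by TypeId. Consistent within one build,
            // but not guaranteed to be stable across compiler releases.
            None => a.erased_type_id().cmp(&b.erased_type_id()),
        }
    }
}

impl PartialOrd for SafeAny {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for SafeAny {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for SafeAny {}
```

With this in place a `Vec<SafeAny>` holding values of different types can be sorted or stored in an ordered collection; code should rely only on the order being total within one build, not on which type sorts first.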
Couldn't really read the Rust macro, but the rest seems fine (although I had some questions).
mod ddvalue;

pub use any::{Any, AnyDeserializeSeed};
My rust is not good enough; why do you need to include this module if there are no explicit uses of anything in it in the file?
Just re-exporting them from the top-level module, to simplify the namespace for users.
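As a small illustration of the re-export pattern mentioned in this reply: a `pub use` makes items from a private submodule reachable under the parent module's path, even though the parent file never uses them itself. The module name `ddval_sketch` and the empty placeholder structs are assumptions made for the example, not the real crate layout.

```rust
mod ddval_sketch {
    // Private submodule: code outside `ddval_sketch` cannot name
    // `ddval_sketch::any` directly.
    mod any {
        /// Placeholder for the real `Any` type.
        #[derive(Debug, Default, PartialEq)]
        pub struct Any;

        /// Placeholder for the real `AnyDeserializeSeed` type.
        #[derive(Debug, Default, PartialEq)]
        pub struct AnyDeserializeSeed;
    }

    // Re-export: users write `ddval_sketch::Any` instead of having to know
    // about the internal `any` submodule.
    pub use self::any::{Any, AnyDeserializeSeed};
}
```

Users then refer to `ddval_sketch::Any` and `ddval_sketch::AnyDeserializeSeed`, and the internal file layout can change without breaking them.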
@@ -26,10 +26,7 @@ fn libtool() {
}
println!("cargo:rerun-if-changed=src/lib.rs");
println!("cargo:rerun-if-changed=src/main.rs");
println!("cargo:rerun-if-changed=src/api/mod.rs");
println!("cargo:rerun-if-changed=src/api/c_api.rs");
why aren't these needed anymore?
Moved to `differential_datalog` a while ago, but forgotten here, causing excessive re-compilations :(
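One way to guard against this kind of leftover is to keep the tracked files in a single list and check that each of them still exists before emitting the directives. The helpers below are a hypothetical sketch, not code from this `build.rs`; only the `cargo:rerun-if-changed=` directive format is taken from the diff above.

```rust
use std::path::{Path, PathBuf};

/// Formats one `cargo:rerun-if-changed` directive per tracked path.
pub fn rerun_directives(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .map(|p| format!("cargo:rerun-if-changed={}", p))
        .collect()
}

/// Returns the tracked paths that do not exist under `crate_root`.
/// Entries reported here are candidates for removal from the list.
pub fn stale_paths(crate_root: &Path, paths: &[&str]) -> Vec<PathBuf> {
    paths
        .iter()
        .map(|p| crate_root.join(p))
        .filter(|p| !p.exists())
        .collect()
}
```

A build script could print each string returned by `rerun_directives` and emit a warning for every entry returned by `stale_paths`.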
import Language.DifferentialDatalog.NS

tStringInternFunc :: Type
tStringInternFunc = tFunction [ArgType nopos False tString] (tIntern tString)
is this fixing Ben's bug?
What happens if the function requires arguments?
Nope, this is just the signature of the `internment::intern` function applied to `String`. The transformation in this file is performed over the type-checked spec, at which point polymorphic function calls must be type-annotated, so I need to tediously insert these annotations by hand.
else relSemantics rel
rel_in = rel{
relRole = RelInput,
relSemantics = rel_in_sem
so you change the semantics for the inputs?
doesn't this break programs?
Doesn't this imply a distinct after the input?
This doesn't touch any existing input relations. Instead I take internal relations that appear at least once with a location annotation `@` and split them into an input and an output relation, so that D3log can grab output records and deliver them to inputs at a remote (or local) node.
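The split described in this reply can be modelled in a few lines. The sketch below is an illustration in Rust, not the Haskell compiler pass: the `Relation` struct, the `distributed` flag, and the `_out` suffix for the output half are all assumptions. What it takes from the PR is that existing input relations are left alone and that the input half of a distributed `relation` gets multiset semantics.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Role {
    Input,
    Internal,
    Output,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Semantics {
    Set,
    Multiset,
    Stream,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Relation {
    pub name: String,
    pub role: Role,
    pub semantics: Semantics,
    /// True if the relation appears at least once with an `@` location annotation.
    pub distributed: bool,
}

/// Splits a distributed internal relation into an (input, output) pair.
/// Returns `None` for relations that must not be touched, e.g. relations
/// that are already inputs or that never carry a location annotation.
pub fn split_distributed(rel: &Relation) -> Option<(Relation, Relation)> {
    if !rel.distributed || rel.role != Role::Internal {
        return None;
    }
    // The input half of a set-valued relation becomes a multiset;
    // multisets and streams keep their semantics.
    let input_semantics = match rel.semantics {
        Semantics::Set => Semantics::Multiset,
        other => other,
    };
    let input = Relation {
        role: Role::Input,
        semantics: input_semantics,
        ..rel.clone()
    };
    let output = Relation {
        name: format!("{}_out", rel.name), // hypothetical naming scheme
        role: Role::Output,
        ..rel.clone()
    };
    Some((input, output))
}
```

Because the function returns `None` for anything that is not a distributed internal relation, programs that do not use location annotations are unaffected.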
It is exposed to everyone and can be used to, e.g., store objects of different types in a vector.
@convolvatron , @mbudiu-vmw: only the last commit in this PR is new compared to #1060 (github wouldn't let me submit a PR to a PR), so please only review that commit.
This is another step towards an infrastructure to implement more of the D3log logic in DDlog. In the current architecture, distributed relations, i.e., relations whose records are exchanged across nodes, are cut into pairs of input/output relations, where the output relation contains records to be shipped to a remote location. These are consumed by the Rust D3log runtime, which is responsible for buffering and routing the records to their destinations.

The stateful part of the D3log runtime must deal with changing cluster membership, and may actually benefit from D3log's declarative, incremental semantics. In order to port this logic to DDlog and keep it generic, we must first convert all distributed output facts to `Any` and collect them in one DDlog relation, so that it can be processed by the application-independent D3log runtime. Since we currently don't have general meta-programming facilities in DDlog, we implement this as a one-off compiler extension.

Specifically, this commit introduces the following changes:

- A new DDlog compiler CLI switch, `--d3log-dev`, which enables the following features. Note that the old `--d3log` flag is still supported, as we don't want to break existing D3log code. Once all of it has been ported, we will remove the old `--d3log` behavior and rename `--d3log-dev` to `--d3log`.

- The `lib/d3log/reflect.dl` library declares two tables, `DistributedRelFacts` and `DistributedStreamFacts`, that will collect all distributed facts. Note that we need two separate tables for stream and non-stream tables (relations and multisets).

- The compiler rewrites rules that write into distributed relations to populate the above tables instead.

- As before (with the `--d3log` switch), distributed tables are converted into input tables with the same signature, so that D3log can feed remote facts to them. One important caveat is that distributed `relation`s are converted to `input multiset`s, as required by the D3log semantics.