Reflect distributed outputs back to DDlog. #1065

Merged 7 commits on Aug 30, 2021

@ryzhyk ryzhyk commented Aug 27, 2021

@convolvatron , @mbudiu-vmw: only the last commit in this PR is new compared to #1060 (github wouldn't let me submit a PR to a PR), so please only review that commit.


This is another step towards an infrastructure to implement more
of the D3log logic in DDlog. In the current architecture, distributed
relations, i.e., relations whose records are exchanged across nodes, are
cut into pairs of input/output relations, where the output relation
contains records to be shipped to a remote location. These are consumed
by the Rust D3log runtime, which is responsible for buffering and
routing the records to their destinations.

The stateful part of the D3log runtime must deal with changing cluster
membership, and may actually benefit from D3log's declarative,
incremental semantics. In order to port this logic to DDlog and keep it
generic, we must first convert all distributed output facts to Any and
collect them in one DDlog relation, so that they can be processed by
the application-independent D3log runtime. Since we currently don't have
general meta-programming facilities in DDlog, we implement this as a
one-off compiler extension.

Specifically, this commit introduces the following changes:

  • A new DDlog compiler CLI switch: --d3log-dev, which enables the
    following features. Note that the old --d3log flag is still
    supported, as we don't want to break existing D3log code. Once
    all of it has been ported, we will remove the old --d3log behavior
    and rename --d3log-dev to --d3log.

  • lib/d3log/reflect.dl library declares the following two tables that
    will collect all distributed facts:

    // Rows from distributed relations.
    //
    // `relname`     - relation name
    // `fact`        - row in the relation
    // `destination` - destination specified via @-annotation, if any
    relation DistributedRelFacts(
        relname: istring,
        fact: Any,
        destination: Option<D3logLocationId>
    )
    
    // Rows from distributed streams.
    //
    // `relname`     - stream name
    // `fact`        - record in the stream
    // `destination` - destination specified via @-annotation, if any
    stream DistributedStreamFacts(
        relname: istring,
        fact: Any,
        destination: Option<D3logLocationId>
    )
    

    Note that we need two separate tables for stream and non-stream tables
    (relations and multisets).

  • The compiler rewrites rules that write into distributed relations to
    populate the above tables instead.

  • As before (with the --d3log switch), distributed tables are
    converted into input tables with the same signature, so that D3log
    can feed remote facts to them. One important caveat is that
    distributed relations are converted to input multisets,
    as required by the D3log semantics.
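
To illustrate the shape of the reflected tables, here is a minimal Rust sketch. It is not the actual D3log runtime: it assumes that `D3logLocationId` can be modeled as a `u64`, uses `Box<dyn Any>` as a stand-in for DDlog's `Any`, and the names `DistributedFact`, `reflect`, and `route` are hypothetical and introduced only for this example.

```rust
use std::any::Any;
use std::collections::BTreeMap;

/// Assumption: a location id is a plain integer.
pub type LocationId = u64;

/// Mirrors the columns of `DistributedRelFacts` / `DistributedStreamFacts`.
pub struct DistributedFact {
    pub relname: String,
    pub fact: Box<dyn Any>,
    pub destination: Option<LocationId>,
}

/// Erase the type of a row and tag it with its relation name and destination.
pub fn reflect<T: Any>(relname: &str, row: T, destination: Option<LocationId>) -> DistributedFact {
    DistributedFact {
        relname: relname.to_string(),
        fact: Box::new(row),
        destination,
    }
}

/// Group facts by destination without inspecting their concrete types.
/// `None` collects facts that carry no @-annotation.
pub fn route(facts: Vec<DistributedFact>) -> BTreeMap<Option<LocationId>, Vec<DistributedFact>> {
    let mut out = BTreeMap::new();
    for f in facts {
        out.entry(f.destination).or_insert_with(Vec::new).push(f);
    }
    out
}
```

The point of the sketch is that `route` never needs to know the row types; only a consumer that knows the relation behind `relname` would downcast `fact` back to a concrete type.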

Use `::differential_datalog::ddval::DDValue` and
`::differential_datalog::program::Weight` to avoid name clashes and in
preparation for upcoming library changes.

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
This fixes a regression that caused DDlog-generated crates to recompile
(just the top-level crate) even when there were no changes to DDlog
code.  This was caused by stale references in `build.rs` to files that
were moved from the template to the `differential_datalog` crate.  See
comments in `build.rs` for an explanation of why this list is needed.
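
For context, a `cargo:rerun-if-changed=PATH` line tells Cargo to rerun the build script when `PATH` changes, and, as far as I understand Cargo's behavior, a path that does not exist is treated as always changed. The sketch below is a defensive variant and not what this commit does (the commit simply deletes the stale entries); `rerun_directives` is a hypothetical helper.

```rust
use std::path::Path;

/// Build the `cargo:rerun-if-changed` lines for the given paths, skipping
/// paths that no longer exist (hypothetical helper, not DDlog's `build.rs`).
pub fn rerun_directives(paths: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for path in paths {
        // A stale entry pointing at a moved file would otherwise be emitted
        // and keep the crate permanently "dirty".
        if Path::new(path).exists() {
            lines.push(format!("cargo:rerun-if-changed={}", path));
        }
    }
    lines
}
```

A `build.rs` would print each returned line with `println!`. The trade-off is that silently skipping missing paths can also hide a typo in the list, which may be why keeping an explicit, hand-maintained list is preferable.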

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
We are starting to build an infrastructure to help implement more of the
D3log runtime in DDlog.  However, the features added here can be useful
in other contexts.

This commit exposes the `DDValue` type, a type-erased wrapper around
any DDlog value, to DDlog programs.  This enables DDlog programs to
manipulate collections of heterogeneous values in a type-agnostic manner.
For instance, in D3log we want to put all records of all output
relations in a single relation, and route them without knowing their
exact types.

We cannot expose `DDValue` directly, since for performance reasons some
of its operations, namely the comparison methods from the `Eq` and `Ord`
traits, are unsafe: comparing `DDValue`s backed by different types leads
to undefined behavior.  We therefore introduce `DDAny`, a safe wrapper
around `DDValue`, and expose it in the DDlog standard library
(`ddlog_std`), along with methods to convert DDlog types to and from `DDAny`.
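
The idea of a safe wrapper can be sketched in plain Rust as follows. This is not the `DDAny` implementation: `ErasedValue` and `AnyValue` are hypothetical names, and the sketch relies only on `std::any`. The wrapper compares `TypeId`s first and only compares the payloads when both values have the same concrete type, so the typed comparison is never applied to mismatched types.

```rust
use std::any::{Any, TypeId};
use std::cmp::Ordering;

/// Hypothetical object-safe view of a value that can be stored type-erased.
pub trait ErasedValue: Any {
    fn as_any(&self) -> &dyn Any;
    /// Compare with a value the caller has checked to be of the same type.
    fn cmp_same_type(&self, other: &dyn Any) -> Ordering;
}

impl<T: Any + Ord> ErasedValue for T {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn cmp_same_type(&self, other: &dyn Any) -> Ordering {
        let other = other.downcast_ref::<T>().expect("caller checked the TypeId");
        Ord::cmp(self, other)
    }
}

/// Safe wrapper: remembers the concrete type alongside the erased value.
pub struct AnyValue {
    type_id: TypeId,
    value: Box<dyn ErasedValue>,
}

impl AnyValue {
    pub fn new<T: Any + Ord>(v: T) -> Self {
        AnyValue { type_id: TypeId::of::<T>(), value: Box::new(v) }
    }
    /// Convert back to a concrete type; `None` if the type does not match.
    pub fn downcast_ref<T: Any>(&self) -> Option<&T> {
        (*self.value).as_any().downcast_ref::<T>()
    }
}

impl Ord for AnyValue {
    fn cmp(&self, other: &Self) -> Ordering {
        // Different types are ordered by TypeId; that order is not stable
        // across compiler releases.
        match self.type_id.cmp(&other.type_id) {
            Ordering::Equal => (*self.value).cmp_same_type((*other.value).as_any()),
            unequal => unequal,
        }
    }
}
impl PartialOrd for AnyValue {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for AnyValue {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for AnyValue {}
```

The extra `TypeId` check costs one comparison per operation, which is presumably the overhead the unsafe `DDValue` comparisons avoid.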

Serialization/Deserialization
-----------------------------

The `Serialize` implementation of `DDAny` simply invokes the `Serialize`
implementation of its underlying type.

However, since we cannot deserialize a value without knowing its type,
`DDAny` does not implement `trait Deserialize`.  More precisely, it does
have a `Deserialize` implementation, as mandated by DDlog for all types;
however this implementation always fails.

We do provide a way to deserialize into `DDAny`, provided the caller knows
the id of the relation that the type being deserialized belongs to.  To this end, we
introduce a new trait `DDAnyDeserialize` that maps an input relation id to
a deserialization function for records in that relation.  We also
provide a `DeserializeSeed` implementation that wraps around this
function and can be used to implement `Deserialize` for complex types
containing `DDAny` as a field.  The `rust_api_test`
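
The mapping from relation id to deserialization function can be pictured with the following std-only sketch. It deliberately does not use serde, so it is not the `DDAnyDeserialize` trait or the `DeserializeSeed` implementation described above; `Registry`, `RelId`, `DeserializeFn`, `parse_u64`, and `parse_string` are hypothetical names, and plain strings stand in for serialized records.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Assumption: relation ids are small integers.
pub type RelId = usize;
/// Parses one serialized record into a type-erased value.
pub type DeserializeFn = fn(&str) -> Result<Box<dyn Any>, String>;

/// Maps each input relation id to the parser for its record type.
pub struct Registry {
    by_relation: HashMap<RelId, DeserializeFn>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { by_relation: HashMap::new() }
    }
    pub fn register(&mut self, rel: RelId, f: DeserializeFn) {
        self.by_relation.insert(rel, f);
    }
    /// Deserialize `input` as a record of relation `rel`.
    /// Fails if the relation id is unknown or the input does not parse.
    pub fn deserialize(&self, rel: RelId, input: &str) -> Result<Box<dyn Any>, String> {
        let f = self
            .by_relation
            .get(&rel)
            .ok_or_else(|| format!("unknown relation id {}", rel))?;
        f(input)
    }
}

/// Example parser for a relation whose records are `u64`.
pub fn parse_u64(s: &str) -> Result<Box<dyn Any>, String> {
    s.trim()
        .parse::<u64>()
        .map(|v| Box::new(v) as Box<dyn Any>)
        .map_err(|e| e.to_string())
}

/// Example parser for a relation whose records are `String`.
pub fn parse_string(s: &str) -> Result<Box<dyn Any>, String> {
    Ok(Box::new(s.to_string()))
}
```

Without the relation id there is nothing to select a parser with, which matches the reason given above for the plain `Deserialize` implementation always failing.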

Caveats
-------

`DDAny` relies on Rust's TypeId mechanism to order instances
with different underlying types.  As a result, the ordering may
differ across Rust compiler releases.

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
This test was incorrectly deleted along with old distributed ddlog
stuff.

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
`stack test --ta '-p tutorial'` is running out of disk space in the GitHub
CI pipeline on Windows.  We run the simpler `path` test instead.

A proper long-term solution is to switch to private GitHub runners.
This will also eliminate the need for GitLab CI.

Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com>
@mihaibudiu

So DDlog can now be used as a dynamically-typed language too.
C# added that at some point (a "dynamic" type).

@ryzhyk
Contributor Author

ryzhyk commented Aug 27, 2021

So DDlog can now be used as a dynamically-typed language too.
C# added that at some point (a "dynamic" type).

Sort of, but in a very limited way.

@mihaibudiu mihaibudiu left a comment

So DDlog can now be used as a dynamically-typed language too.
C# added that at some point (a "dynamic" type).

Sort of, but in a very limited way.

nothing stops you from actually exposing this type to normal programs (except the fact that there aren't many operations you can do on this value except perhaps cast it).

@@ -65,7 +65,11 @@ jobs:
#with:
# path: test/datalog_tests/tutorial_ddlog
# key: ${{ runner.os }}-tutorial
- name: Run tutorial test
- if: ${{ runner.os == 'Windows' }}

why this difference?

Contributor Author

This is a separate small commit in the history. We're running out of disk on Windows in github with the more complex test.

@@ -99,6 +100,7 @@ options = [ Option ['h'] ["help"] (NoArg Help)
, Option [] ["rust-flatbuffers"] (NoArg RustFlatBuffers) "Build flatbuffers bindings for Rust"
, Option [] ["nested-ts-32"] (NoArg NestedTS32) "Use 32-bit instead of 16-bit nested timestamps. Supports recursive programs that may perform >65,536 iterations. Slightly increases the memory footprint of the program."
, Option [] ["d3log"] (NoArg D3log) "Compile the input program to execute in the distributed DDlog (D3log) environment."
, Option [] ["d3log-dev"] (NoArg D3logDev) "Compile the input program to execute in the D3log environment with experimental reflective features enabled."

do you want to add a note that this will be deprecated soon?

Contributor Author

Hopefully, the "dev" and "experimental" part should make that clear :)

}
}

impl PartialOrd for Any {

maybe you should say that the result for different types is not fixed across all instances, i.e., that it is unspecified and not deterministic.

Contributor Author

Here is a comment from ddlog_std.dl, which I think says this:

Note: `Any` relies on Rust's TypeId mechanism to order instances
with different underlying types.  As a result, the ordering may
differ across Rust compiler releases.

@mihaibudiu mihaibudiu left a comment

Couldn't really read the Rust macro, but the rest seems fine (although I had some questions).

mod ddvalue;

pub use any::{Any, AnyDeserializeSeed};

My Rust is not good enough; why do you need to include this module if there are no explicit uses of anything in it in the file?

Contributor Author

Just re-exporting them from the top-level module, to simplify the namespace for users.
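
A small self-contained sketch of the re-export pattern, with stub types in place of the real ones (the module name `ddval` and the struct bodies here are assumptions for illustration only):

```rust
mod ddval {
    // Private submodules: users cannot name `ddval::any::Any` directly.
    mod any {
        #[derive(Debug, PartialEq)]
        pub struct Any(pub u32);
        pub struct AnyDeserializeSeed;
    }
    mod ddvalue {
        pub struct DDValue;
    }

    // Re-export the public types from the parent module, so that callers
    // write `ddval::Any` without knowing about the internal layout.
    pub use self::any::{Any, AnyDeserializeSeed};
    pub use self::ddvalue::DDValue;
}
```

With this layout, `ddval::Any(5)` and `ddval::DDValue` resolve even though nothing inside `ddval` itself uses the submodule contents.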

@@ -26,10 +26,7 @@ fn libtool() {
}
println!("cargo:rerun-if-changed=src/lib.rs");
println!("cargo:rerun-if-changed=src/main.rs");
println!("cargo:rerun-if-changed=src/api/mod.rs");
println!("cargo:rerun-if-changed=src/api/c_api.rs");

why aren't these needed anymore?

Contributor Author

Moved to differential_datalog a while ago, but forgotten here, causing excessive re-compilations :(

import Language.DifferentialDatalog.NS

tStringInternFunc :: Type
tStringInternFunc = tFunction [ArgType nopos False tString] (tIntern tString)

is this fixing Ben's bug?
What happens if the function requires arguments?

Contributor Author

Nope, this is just the signature of the internment::intern function applied to String. The transformation in this file is performed over the type-checked spec, at which point polymorphic function calls must be type-annotated, so I need to tediously insert these annotations by hand.

else relSemantics rel
rel_in = rel{
relRole = RelInput,
relSemantics = rel_in_sem

so you change the semantics for the inputs?
doesn't this break programs?
Doesn't this imply a distinct after the input?

Contributor Author

This doesn't touch any existing input relations. Instead, I take internal relations that appear at least once with a location annotation @ and split each of them into an input and an output relation, so that D3log can grab output records and deliver them to inputs at a remote (or local) node.

@ryzhyk
Contributor Author

ryzhyk commented Aug 27, 2021

So DDlog can now be used as a dynamically-typed language too.
C# added that at some point (a "dynamic" type).

Sort of, but in a very limited way.

nothing stops you from actually exposing this type to normal programs (except the fact that there aren't many operations you can do on this value except perhaps cast it).

It is exposed to everyone and can be used to, e.g., store objects of different types in a vector.

@ryzhyk ryzhyk merged commit 1c31bbb into vmware:master Aug 30, 2021
@ryzhyk ryzhyk deleted the reflect_d3log_output branch August 30, 2021 20:07