Typed dicts #101

Closed
wants to merge 108 commits into from
108 commits
fee9d28
starting work on typed dictionaries
matko Nov 23, 2022
fde3b53
tfc block entry retrieval
matko Nov 23, 2022
c54eddc
buf implementation for tfc dict entry
matko Nov 23, 2022
1da44ba
remove dummy test
matko Nov 23, 2022
6ad263e
replicate all pfc comparison logic
matko Nov 23, 2022
e97e157
also optimize dict entries
matko Nov 23, 2022
72653d8
move tfc head to the start of the block
matko Nov 24, 2022
169448c
lookup slice in tfc block
matko Nov 24, 2022
1c8c4a2
test for close matches in tfc
matko Nov 24, 2022
0807643
implement tfcdict
matko Nov 24, 2022
dfa4834
look up id of entry in tfcdict
matko Nov 24, 2022
de19515
move block size to start for easier search
matko Nov 24, 2022
07fe95e
renamed TfcDict and related types to SizedDict and related
matko Nov 24, 2022
1124d43
typed dictionary segments
matko Nov 24, 2022
a2df5f3
Add decimals and bigints
GavinMendelGleason Nov 25, 2022
1b91710
Implementaiton of decimal and integer
GavinMendelGleason Nov 25, 2022
b1ddd01
only enable rug features we need
matko Nov 25, 2022
699bfbc
refactor segment building to use one continuous offset array
matko Nov 25, 2022
ab5e12f
write mutliple segments into one go
matko Nov 25, 2022
71df537
Working test, added start offset parameter, Need to fix offsets
GavinMendelGleason Nov 26, 2022
46b9a28
fix offsets
matko Nov 26, 2022
1b83930
typed dict retrieval
matko Nov 26, 2022
593d227
Two significant changes and some formatting.
GavinMendelGleason Nov 26, 2022
4ab91b3
Remove dbgs and add better tests
GavinMendelGleason Nov 26, 2022
d2959e6
Fix bigint naming, add tests
GavinMendelGleason Nov 26, 2022
8dfa62b
make versions of the bitindex generator that work with bufs
matko Nov 28, 2022
4e61d03
precalculate typed dict len
matko Nov 28, 2022
75675eb
block and dict iterators
matko Nov 28, 2022
b342ae8
full iterator over the entire typed dict
matko Nov 28, 2022
73cf214
prereserve a vector with the right size on block iteration
matko Nov 28, 2022
1185e23
refactor entry buf code to reuse code smarter
matko Nov 28, 2022
e23b065
reformat
matko Nov 28, 2022
6411aa3
work
matko Nov 29, 2022
e25f46b
more work
matko Nov 29, 2022
1982937
some builder logic around sizeddict
matko Nov 29, 2022
23608d4
Merge branch 'typed_dicts' into typed_dicts_refactor_work
matko Nov 29, 2022
26d655e
Adding builder, doesn't work because of buf borrow
GavinMendelGleason Nov 29, 2022
b6f8df8
Annoyed about a move
GavinMendelGleason Nov 29, 2022
2651215
Fix borrow issues by adding option
GavinMendelGleason Nov 29, 2022
102900c
Almost working
GavinMendelGleason Nov 30, 2022
724d668
Working
GavinMendelGleason Nov 30, 2022
5174483
Remove debug prints
GavinMendelGleason Nov 30, 2022
df22829
Adding suffixless blocks
GavinMendelGleason Nov 30, 2022
8ed589f
Remove extraneous
GavinMendelGleason Nov 30, 2022
f0a030b
Merge branch 'typed_dicts' into typed_dicts_refactor_work
GavinMendelGleason Nov 30, 2022
c209e25
refactor progress
matko Nov 30, 2022
a3768a3
adding parse of control word logic
GavinMendelGleason Nov 30, 2022
25d7d21
make lexical conversion more generic
matko Nov 30, 2022
db2959c
Merge branch 'typed_dicts' into typed_dicts_refactor_work
matko Nov 30, 2022
46d62b1
back to something that compiles
matko Nov 30, 2022
a09a67e
Builds
GavinMendelGleason Nov 30, 2022
c802006
No warnings
GavinMendelGleason Nov 30, 2022
4837d73
WIP: To avoid data loss, checking in this debugging code
GavinMendelGleason Dec 1, 2022
2fbd860
Some debugging code.
GavinMendelGleason Dec 1, 2022
2c113eb
Add condition for empty slice
GavinMendelGleason Dec 1, 2022
2b07fa6
Adding fixes for phase2
GavinMendelGleason Dec 2, 2022
55f20d5
190 passing
GavinMendelGleason Dec 2, 2022
7750194
only 90 failing
GavinMendelGleason Dec 4, 2022
a51feed
Fewer debug prints
GavinMendelGleason Dec 4, 2022
48750c2
Ready for comparison to refactor branch
GavinMendelGleason Dec 6, 2022
1e66cef
Merge pull request #104 from terminusdb/typed_dicts_fix_offsets
matko Dec 6, 2022
3f63e88
Merge branch 'main' into typed_dicts
matko Dec 6, 2022
65c1634
Merge branch 'typed_dicts' into typed_dicts_refactor_work
matko Dec 6, 2022
49b6c97
make lower level empty dicts work
matko Dec 6, 2022
af9abb4
fix id correction when looking up in parent layers
matko Dec 6, 2022
3380e78
fix id mapping for new id offset
matko Dec 6, 2022
14b71a0
remove some debug expressions
matko Dec 6, 2022
7c3b340
removed loads of dbg! invocations, and started string dict logic
matko Dec 6, 2022
7377423
fixed all tests
matko Dec 6, 2022
b017c5e
Adding multiblock logic
GavinMendelGleason Dec 6, 2022
29c0640
Moving types into their own file
GavinMendelGleason Dec 7, 2022
f4e8407
Make tests pass (imports)
GavinMendelGleason Dec 7, 2022
e486508
Satisfy linter
GavinMendelGleason Dec 7, 2022
f5aba77
Merge pull request #107 from terminusdb/extending_types
matko Dec 7, 2022
641b55c
Made SizedDictEntry slightly more efficient for single byte structs
matko Dec 7, 2022
bcca7cf
implement TypedDictEntry
matko Dec 7, 2022
bdd4bec
reformat everything
matko Dec 7, 2022
a0e9546
split from_lexical to its own trait
matko Dec 7, 2022
6147b2b
put trait bound on TdbDataType ensuring FromLexical and ToLexical are…
matko Dec 7, 2022
4e2b9e8
remove debug print
matko Dec 7, 2022
f6ccdd7
Merge pull request #105 from terminusdb/typed_dicts_refactor_work
matko Dec 7, 2022
1707c33
Merge branch 'main' into typed_dicts
matko Dec 7, 2022
534068a
change interface to allow adding of arbitrary values, not just strings
matko Dec 7, 2022
f741afe
value extraction from typed dict + test
matko Dec 7, 2022
9a31003
Adding datatype accessor function
GavinMendelGleason Dec 7, 2022
8d0b943
Some more data types
GavinMendelGleason Dec 7, 2022
81d2333
Accidentally broke build briefly
GavinMendelGleason Dec 7, 2022
d27ad4e
Remove length test
GavinMendelGleason Dec 8, 2022
c860a0a
add langstring support
matko Dec 8, 2022
d347786
Extending types
GavinMendelGleason Dec 8, 2022
5e6bb66
Adding date times.
GavinMendelGleason Dec 8, 2022
d4cef31
DateTimes (with very little testing)
GavinMendelGleason Dec 9, 2022
56d3310
Remove warning
GavinMendelGleason Dec 9, 2022
5b7ebed
More types
GavinMendelGleason Dec 9, 2022
e3a1424
Adding more types to store
GavinMendelGleason Dec 9, 2022
5976307
Adding gyear, days, etc.
GavinMendelGleason Dec 9, 2022
5d2b356
Make acccessors public
GavinMendelGleason Dec 9, 2022
9dd5f8e
Add some more datatypes
GavinMendelGleason Dec 10, 2022
9ceb73e
Typo bug
GavinMendelGleason Dec 10, 2022
dc6cec6
Typo!
GavinMendelGleason Dec 10, 2022
f1557d8
Add any simple type
GavinMendelGleason Dec 10, 2022
ce4d31e
Fix tests
GavinMendelGleason Dec 10, 2022
1e58262
Create fake f32
GavinMendelGleason Dec 11, 2022
f6a107e
Add back f32s with a cast to f64
GavinMendelGleason Dec 12, 2022
caedc17
Adding durations
GavinMendelGleason Dec 12, 2022
5aa62c8
Adding string casts
GavinMendelGleason Dec 12, 2022
07d4f12
fix test
matko Dec 13, 2022
eb31c89
Fix date tyeps
GavinMendelGleason Dec 13, 2022
7 changes: 7 additions & 0 deletions Cargo.toml
@@ -25,6 +25,13 @@ flate2 = "1.0"
rayon = "1.4"
thiserror = "1.0"
async-trait = "0.1"
itertools = "0.10"
rug = {version="1.16", default-features=false, features=["integer","rational"]}
num-derive = "0.3"
num-traits = "0.2"
chrono = "0.4"
base64 = "0.13"
hex = "0.4"

[dev-dependencies]
tempfile = "3.1"
4 changes: 2 additions & 2 deletions benches/bench.rs
@@ -3,7 +3,7 @@ extern crate test;

use tempfile::tempdir;
use terminus_store;
use terminus_store::layer::StringTriple;
use terminus_store::layer::ValueTriple;
use test::Bencher;

#[bench]
@@ -14,7 +14,7 @@ fn bench_add_string_triple(b: &mut Bencher) {
let mut count = 1;
b.iter(|| {
layer_builder
.add_string_triple(StringTriple::new_value(
.add_value_triple(ValueTriple::new_string_value(
&count.to_string(),
&count.to_string(),
&count.to_string(),
8 changes: 4 additions & 4 deletions benches/builder/data.rs
@@ -1,7 +1,7 @@
use rand::distributions::Alphanumeric;
use rand::prelude::*;
use std::iter;
use terminus_store::layer::StringTriple;
use terminus_store::layer::ValueTriple;

fn random_string<R: Rng>(rand: &mut R, len_min: usize, len_max: usize) -> String {
let len: usize = rand.gen_range(len_min..len_max);
@@ -50,19 +50,19 @@ impl<R: Rng> TestData<R> {
}
}

pub fn random_triple(&mut self) -> StringTriple {
pub fn random_triple(&mut self) -> ValueTriple {
let subject_ix = self.rand.gen_range(0..self.nodes.len());
let predicate_ix = self.rand.gen_range(0..self.predicates.len());
if self.rand.gen() {
let object_ix = self.rand.gen_range(0..self.nodes.len());
StringTriple::new_node(
ValueTriple::new_node(
&self.nodes[subject_ix],
&self.predicates[predicate_ix],
&self.nodes[object_ix],
)
} else {
let object_ix = self.rand.gen_range(0..self.values.len());
StringTriple::new_value(
ValueTriple::new_string_value(
&self.nodes[subject_ix],
&self.predicates[predicate_ix],
&self.values[object_ix],
8 changes: 4 additions & 4 deletions benches/builder/main.rs
@@ -38,7 +38,7 @@ fn build_base_layer_1000(b: &mut Bencher) {
let builder = store.create_base_layer().unwrap();

for triple in triples.iter() {
builder.add_string_triple(triple.clone()).unwrap();
builder.add_value_triple(triple.clone()).unwrap();
}

let _base_layer = builder.commit().unwrap();
@@ -78,7 +78,7 @@ fn build_nonempty_child_layer_on_empty_base_layer(b: &mut Bencher) {
let builder = base_layer.open_write().unwrap();

for triple in triples.iter() {
builder.add_string_triple(triple.clone()).unwrap();
builder.add_value_triple(triple.clone()).unwrap();
}

builder.commit().unwrap();
@@ -97,7 +97,7 @@ fn build_nonempty_child_layer_on_nonempty_base_layer(b: &mut Bencher) {
let builder = store.create_base_layer().unwrap();

for _ in 0..1000 {
builder.add_string_triple(data.random_triple()).unwrap();
builder.add_value_triple(data.random_triple()).unwrap();
}
let base_layer = builder.commit().unwrap();

@@ -110,7 +110,7 @@ fn build_nonempty_child_layer_on_nonempty_base_layer(b: &mut Bencher) {
let builder = base_layer.open_write().unwrap();

for triple in triples.iter() {
builder.add_string_triple(triple.clone()).unwrap();
builder.add_value_triple(triple.clone()).unwrap();
}

builder.commit().unwrap();
5 changes: 3 additions & 2 deletions examples/print_graph.rs
@@ -1,6 +1,7 @@
use std::env;

use std::io;
use terminus_store::structure::TdbDataType;
use terminus_store::*;
use tokio;

@@ -21,15 +22,15 @@ async fn print_graph(store_path: &str, graph: &str) -> io::Result<()> {
.expect("expected id triple to be mapable to string");

println!(
"{}, {}, {} {}",
"{}, {}, {} {:?}",
triple.subject,
triple.predicate,
match triple.object {
ObjectType::Node(_) => "node",
ObjectType::Value(_) => "value",
},
match triple.object {
ObjectType::Node(n) => n,
ObjectType::Node(n) => String::make_entry(&n),
ObjectType::Value(v) => v,
}
);
12 changes: 6 additions & 6 deletions examples/write_to_graph.rs
@@ -7,8 +7,8 @@ use tokio;
use tokio::io::{self, AsyncBufReadExt};

enum Command {
Add(StringTriple),
Remove(StringTriple),
Add(ValueTriple),
Remove(ValueTriple),
}

async fn parse_command(s: &str) -> io::Result<Command> {
@@ -25,8 +25,8 @@ async fn parse_command(s: &str) -> io::Result<Command> {
let object = &matches[5];

let triple = match object_type_name {
"node" => StringTriple::new_node(subject, predicate, object),
"value" => StringTriple::new_value(subject, predicate, object),
"node" => ValueTriple::new_node(subject, predicate, object),
"value" => ValueTriple::new_string_value(subject, predicate, object),
_ => {
return Err(io::Error::new(
io::ErrorKind::InvalidData,
@@ -84,8 +84,8 @@ async fn process_commands(store_path: &str, graph: &str) -> io::Result<()> {
// Since no io is happening, adding triples to the builder is
// not a future.
match command {
Command::Add(triple) => builder.add_string_triple(triple)?,
Command::Remove(triple) => builder.remove_string_triple(triple)?,
Command::Add(triple) => builder.add_value_triple(triple)?,
Command::Remove(triple) => builder.remove_value_triple(triple)?,
}
}
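
The hunks above rename `StringTriple` to `ValueTriple` and replace `new_value` with `new_string_value`, which suggests that an object value now carries a datatype rather than being a bare string. The following is a rough sketch of what such a triple type could look like, not the crate's actual definition: `SketchTriple`, `SketchObject`, `SketchEntry`, and the `datatype` tag are all hypothetical names and assumptions made for illustration.

```rust
/// Hypothetical stand-in for a typed dictionary entry: a datatype tag
/// plus the value's lexical form. The real entry type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchEntry {
    pub datatype: &'static str,
    pub lexical: String,
}

/// The object of a triple is either another node or a typed value.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchObject {
    Node(String),
    Value(SketchEntry),
}

/// Hypothetical triple type mirroring the constructors used in the diff.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchTriple {
    pub subject: String,
    pub predicate: String,
    pub object: SketchObject,
}

impl SketchTriple {
    /// Triple whose object is a node.
    pub fn new_node(subject: &str, predicate: &str, object: &str) -> Self {
        SketchTriple {
            subject: subject.to_owned(),
            predicate: predicate.to_owned(),
            object: SketchObject::Node(object.to_owned()),
        }
    }

    /// Triple whose object is an arbitrary typed value.
    pub fn new_value(subject: &str, predicate: &str, value: SketchEntry) -> Self {
        SketchTriple {
            subject: subject.to_owned(),
            predicate: predicate.to_owned(),
            object: SketchObject::Value(value),
        }
    }

    /// Convenience constructor for the common string case: wraps the
    /// string in a typed entry tagged (by assumption) as "string".
    pub fn new_string_value(subject: &str, predicate: &str, object: &str) -> Self {
        Self::new_value(
            subject,
            predicate,
            SketchEntry {
                datatype: "string",
                lexical: object.to_owned(),
            },
        )
    }
}
```

Under these assumptions, existing callers that only deal with strings switch to `new_string_value`, while the general `new_value` constructor leaves room for other datatypes.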

110 changes: 70 additions & 40 deletions src/layer/builder.rs
@@ -1,5 +1,6 @@
use std::io;

use bytes::{Bytes, BytesMut};
use futures::stream::TryStreamExt;
use rayon::prelude::*;

@@ -8,32 +9,35 @@ use crate::storage::*;
use crate::structure::util;
use crate::structure::*;

pub struct DictionarySetFileBuilder<F: 'static + FileStore> {
node_dictionary_builder: PfcDictFileBuilder<F::Write>,
predicate_dictionary_builder: PfcDictFileBuilder<F::Write>,
value_dictionary_builder: PfcDictFileBuilder<F::Write>,
pub struct DictionarySetFileBuilder<F: 'static + FileLoad + FileStore> {
node_files: DictionaryFiles<F>,
predicate_files: DictionaryFiles<F>,
value_files: TypedDictionaryFiles<F>,
node_dictionary_builder: StringDictBufBuilder<BytesMut, BytesMut>,
predicate_dictionary_builder: StringDictBufBuilder<BytesMut, BytesMut>,
value_dictionary_builder: TypedDictBufBuilder<BytesMut, BytesMut, BytesMut, BytesMut>,
}

impl<F: 'static + FileLoad + FileStore> DictionarySetFileBuilder<F> {
pub async fn from_files(
node_files: DictionaryFiles<F>,
predicate_files: DictionaryFiles<F>,
value_files: DictionaryFiles<F>,
value_files: TypedDictionaryFiles<F>,
) -> io::Result<Self> {
let node_dictionary_builder = PfcDictFileBuilder::new(
node_files.blocks_file.open_write().await?,
node_files.offsets_file.open_write().await?,
);
let predicate_dictionary_builder = PfcDictFileBuilder::new(
predicate_files.blocks_file.open_write().await?,
predicate_files.offsets_file.open_write().await?,
);
let value_dictionary_builder = PfcDictFileBuilder::new(
value_files.blocks_file.open_write().await?,
value_files.offsets_file.open_write().await?,
let node_dictionary_builder = StringDictBufBuilder::new(BytesMut::new(), BytesMut::new());
let predicate_dictionary_builder =
StringDictBufBuilder::new(BytesMut::new(), BytesMut::new());
let value_dictionary_builder = TypedDictBufBuilder::new(
BytesMut::new(),
BytesMut::new(),
BytesMut::new(),
BytesMut::new(),
);

Ok(Self {
node_files,
predicate_files,
value_files,
node_dictionary_builder,
predicate_dictionary_builder,
value_dictionary_builder,
@@ -43,91 +47,117 @@ impl<F: 'static + FileLoad + FileStore> DictionarySetFileBuilder<F> {
/// Add a node string.
///
/// Panics if the given node string is not a lexical successor of the previous node string.
pub async fn add_node(&mut self, node: &str) -> io::Result<u64> {
let id = self.node_dictionary_builder.add(node).await?;
pub fn add_node(&mut self, node: &str) -> u64 {
let id = self
.node_dictionary_builder
.add(Bytes::copy_from_slice(node.as_bytes()));

Ok(id)
id
}

/// Add a predicate string.
///
/// Panics if the given predicate string is not a lexical successor of the previous predicate string.
pub async fn add_predicate(&mut self, predicate: &str) -> io::Result<u64> {
let id = self.predicate_dictionary_builder.add(predicate).await?;
pub fn add_predicate(&mut self, predicate: &str) -> u64 {
let id = self
.predicate_dictionary_builder
.add(Bytes::copy_from_slice(predicate.as_bytes()));

Ok(id)
id
}

/// Add a typed value.
///
/// Panics if the given value is not a lexical successor of the previous value.
pub async fn add_value(&mut self, value: &str) -> io::Result<u64> {
let id = self.value_dictionary_builder.add(value).await?;
pub fn add_value(&mut self, value: TypedDictEntry) -> u64 {
let id = self.value_dictionary_builder.add(value);

Ok(id)
id
}

/// Add nodes from an iterable.
///
/// Panics if the nodes are not in lexical order, or if a previously added node is a lexical successor of any of these nodes.
pub async fn add_nodes<I: 'static + IntoIterator<Item = String> + Unpin + Send + Sync>(
pub fn add_nodes<I: 'static + IntoIterator<Item = String> + Unpin + Send + Sync>(
&mut self,
nodes: I,
) -> io::Result<Vec<u64>>
) -> Vec<u64>
where
<I as std::iter::IntoIterator>::IntoIter: Unpin + Send + Sync,
{
let mut ids = Vec::new();
for node in nodes {
let id = self.add_node(&node).await?;
let id = self.add_node(&node);
ids.push(id);
}

Ok(ids)
ids
}

/// Add predicates from an iterable.
///
/// Panics if the predicates are not in lexical order, or if a previously added predicate is a lexical successor of any of these predicates.
pub async fn add_predicates<I: 'static + IntoIterator<Item = String> + Unpin + Send + Sync>(
pub fn add_predicates<I: 'static + IntoIterator<Item = String> + Unpin + Send + Sync>(
&mut self,
predicates: I,
) -> io::Result<Vec<u64>>
) -> Vec<u64>
where
<I as std::iter::IntoIterator>::IntoIter: Unpin + Send + Sync,
{
let mut ids = Vec::new();
for predicate in predicates {
let id = self.add_predicate(&predicate).await?;
let id = self.add_predicate(&predicate);
ids.push(id);
}

Ok(ids)
ids
}

/// Add values from an iterable.
///
/// Panics if the values are not in lexical order, or if a previously added value is a lexical successor of any of these values.
pub async fn add_values<I: 'static + IntoIterator<Item = String> + Unpin + Send + Sync>(
pub fn add_values<I: 'static + IntoIterator<Item = TypedDictEntry> + Unpin + Send + Sync>(
&mut self,
values: I,
) -> io::Result<Vec<u64>>
) -> Vec<u64>
where
<I as std::iter::IntoIterator>::IntoIter: Unpin + Send + Sync,
{
let mut ids = Vec::new();
for value in values {
let id = self.add_value(&value).await?;
let id = self.add_value(value);
ids.push(id);
}

Ok(ids)
ids
}

pub async fn finalize(self) -> io::Result<()> {
self.node_dictionary_builder.finalize().await?;
self.predicate_dictionary_builder.finalize().await?;
self.value_dictionary_builder.finalize().await?;
let (mut node_offsets_buf, mut node_data_buf) = self.node_dictionary_builder.finalize();
let (mut predicate_offsets_buf, mut predicate_data_buf) =
self.predicate_dictionary_builder.finalize();
let (
mut value_types_present_buf,
mut value_type_offsets_buf,
mut value_offsets_buf,
mut value_data_buf,
) = self.value_dictionary_builder.finalize();

self.node_files
.write_all_from_bufs(&mut node_data_buf, &mut node_offsets_buf)
.await?;
self.predicate_files
.write_all_from_bufs(&mut predicate_data_buf, &mut predicate_offsets_buf)
.await?;

self.value_files
.write_all_from_bufs(
&mut value_types_present_buf,
&mut value_type_offsets_buf,
&mut value_offsets_buf,
&mut value_data_buf,
)
.await?;

Ok(())
}
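
The `builder.rs` changes above replace a builder that wrote each entry to a file as it was added (`async`, returning `io::Result<u64>`) with one that collects entries in memory and only touches files in `finalize`. A minimal standard-library sketch of that pattern follows. It is an illustration under stated assumptions, not the crate's implementation: `SortedBufBuilder` and its fields are hypothetical, `Vec<u8>` stands in for `BytesMut`, the flat offset layout is not the block format the real dictionaries use, and starting ids at 1 is assumed.

```rust
/// Hypothetical in-memory dictionary builder. Entries must arrive in
/// strictly increasing byte order; ids are assumed to start at 1.
pub struct SortedBufBuilder {
    /// Concatenated bytes of all entries added so far.
    data: Vec<u8>,
    /// End offset of each entry within `data`.
    offsets: Vec<u64>,
    /// Copy of the most recently added entry, used for the order check.
    last: Option<Vec<u8>>,
}

impl SortedBufBuilder {
    pub fn new() -> Self {
        SortedBufBuilder {
            data: Vec::new(),
            offsets: Vec::new(),
            last: None,
        }
    }

    /// Adds an entry and returns its id. Nothing is written to disk
    /// here, so the method is synchronous and cannot fail with an
    /// I/O error.
    ///
    /// Panics if `entry` is not strictly greater than the previous entry.
    pub fn add(&mut self, entry: &[u8]) -> u64 {
        if let Some(prev) = &self.last {
            // Slices compare lexicographically, byte by byte.
            assert!(
                entry > prev.as_slice(),
                "entries must be added in sorted order"
            );
        }
        self.data.extend_from_slice(entry);
        self.offsets.push(self.data.len() as u64);
        self.last = Some(entry.to_vec());
        // The id is the 1-based position of the entry.
        self.offsets.len() as u64
    }

    /// Consumes the builder and hands back the buffers, leaving it to
    /// the caller to write them out in a single I/O step.
    pub fn finalize(self) -> (Vec<u64>, Vec<u8>) {
        (self.offsets, self.data)
    }
}
```

With this shape, only the final write needs to be `async` and fallible, which matches the way `add_node`, `add_predicate`, and `add_value` lose their `io::Result` return types in the diff while `finalize` keeps its own.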