Merge pull request #11 from hntd187/commit-watcher
Commit watcher
hntd187 committed Oct 12, 2018
2 parents 4398125 + ce5c30e commit 0648cdc
Showing 18 changed files with 465 additions and 185 deletions.
35 changes: 17 additions & 18 deletions Cargo.toml
@@ -15,24 +15,23 @@ path = "src/lib.rs"
capnpc = "0.9.0"

[dependencies]
-gotham = "0.2"
-gotham_derive = "0.2"
-hyper = "0.11"
-mime = "0.3"
-serde = "1.0"
-serde_derive = "1.0"
-serde_json = "1.0"
-lazy_static = "1.1"
-futures = "0.1"
-tantivy = "0.7"
-tokio = "0.1"
-config = "0.9.0"
-log = "0.4"
-pretty_env_logger = "0.2"
-failure = "0.1.2"
-crossbeam-channel = "0.2"
-capnp = "0.9"
+gotham = "^0.2"
+gotham_derive = "^0.2"
+hyper = "^0.11"
+mime = "^0.3"
+serde = "^1.0"
+serde_derive = "^1.0"
+serde_json = "^1.0"
+lazy_static = "^1.1"
+futures = "^0.1"
+tantivy = "^0.7"
+tokio = "^0.1"
+config = "^0.9.0"
+log = "^0.4"
+pretty_env_logger = "^0.2"
+failure = "^0.1.2"
+crossbeam-channel = "^0.2"
+capnp = "^0.9"

[profile.release]
opt-level = 3
54 changes: 37 additions & 17 deletions README.md
@@ -11,9 +11,9 @@ to be that for ElasticSearch.
Toshi will always target stable rust and will try our best to never make any use of unsafe. While underlying libraries may make some
use of unsafe, Toshi will make a concerted effort to vet these libraries in an effort to be completely free of unsafe Rust usage. The
reason I chose this was because I felt that for this to actually become an attractive option for people to consider it would have to
-be safe, stable and consistent. This was why stable rust was chosen because of the guarentees and safety it provides. I did not want to go down the rabbit hole of using nightly features to then have issues with their stability later on. Since Toshi is not
+be safe, stable and consistent. This was why stable rust was chosen because of the guarantees and safety it provides. I did not want to go down the rabbit hole of using nightly features to then have issues with their stability later on. Since Toshi is not
 meant to be a library I'm perfectly fine with having this requirement because people who would want to use this more than likely will
-take it off the shelf and not modify it. So my motivation was to cater to that usecase when building Toshi.
+take it off the shelf and not modify it. So my motivation was to cater to that use case when building Toshi.

#### Build Requirements
At this current time Toshi should build and work fine on Windows, OSX and Linux. For dependencies, you are going to need Rust >= 1.27 and cargo installed to build.
@@ -22,15 +22,32 @@

There is a default config in config/config.toml

+```toml
+host = "localhost"
+port = 8080
+path = "data/"
+writer_memory = 200000000
+log_level = "debug"
+json_parsing_threads = 4
+bulk_buffer_size = 10000
+auto_commit_duration = 10
+
+[merge_policy]
+kind = "log"
+min_merge_size = 8
+min_layer_size = 10_000
+level_log_size = 0.75
+```

##### Host
`host = "localhost"`

-The local hostname toshi will bind on upon start.
+The local hostname Toshi will bind on upon start.

##### Port
`port = 8080`

-The port toshi will bind to upon start.
+The port Toshi will bind to upon start.

##### Path
`path = "data/"`
@@ -60,14 +77,20 @@
This will control the buffer size for parsing documents into an index. It will control the amount of memory bulk requests can
take up by blocking when the message buffer is filled. If you want to go totally off the rails you can set this to 0 in order to make
the buffer unbounded.
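As an aside, the back-pressure behavior this setting describes can be sketched with the standard library's bounded `sync_channel` (a stand-in for the crossbeam channel Toshi actually uses; the function name here is illustrative):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Pushes `docs` through a bounded channel of the given capacity and
/// returns them in arrival order. `send` blocks whenever the buffer is
/// full, so memory use is capped instead of growing without limit.
fn run_bulk(docs: Vec<String>, capacity: usize) -> Vec<String> {
    let (tx, rx) = sync_channel::<String>(capacity);
    let producer = thread::spawn(move || {
        for doc in docs {
            tx.send(doc).unwrap(); // blocks when the buffer is full
        }
    });
    // Draining on this side unblocks the producer one slot at a time.
    let received: Vec<String> = rx.iter().collect();
    producer.join().unwrap();
    received
}
```

An unbounded channel would correspond to the `bulk_buffer_size = 0` case, where sends never block.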

+##### Auto Commit Duration
+`auto_commit_duration = 10`
+
+This controls how often an index will automatically commit documents if there are docs to be committed. Set this to 0 to disable this feature,
+but you will have to do commits yourself when you submit documents.
+
##### Merge Policy
```toml
[merge_policy]
kind = "log"
```

Tantivy will merge index segments according to the configuration outlined here. There are 2 options for this. "log" which is the default
-segment merge behavior. Log has 3 additional values to it as well. Any of these 3 values can be ommitted to use Tantivy's default value.
+segment merge behavior. Log has 3 additional values to it as well. Any of these 3 values can be omitted to use Tantivy's default value.
The default values are listed below.

```toml
[merge_policy]
kind = "log"
min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75
```
@@ -148,22 +171,24 @@ curl -X PUT \
If everything succeeded we should receive a `201 CREATED` from this request and if you look in the data directory you configured you
should now see a directory for the test_index you just created.
-Now we can add some documents to our Index.
+Now we can add some documents to our Index. The options field can be omitted if a user does not want to commit on every document addition, but
+for completeness it is included here.
```bash
curl -X PUT \
http://localhost:8080/test_index \
-H 'Content-Type: application/json' \
-d '{
-    "fields": [
-      {"field": "test_text", "value": "Babbaboo!" },
-      {"field": "test_u64", "value": 10 },
-      {"field": "test_i64", "value": -10 }
-    ]
+    "options": { "commit": true },
+    "document": {
+        "test_text": "Babbaboo!",
+        "test_u64": 10,
+        "test_i64": -10
+    }
}'
```
-And finally we can retreive all the documents in an index with a simple get call
+And finally we can retrieve all the documents in an index with a simple get call
```bash
curl -X GET http://localhost:8080/test_index -H 'Content-Type: application/json'
```
@@ -173,11 +198,6 @@
`cargo test`
#### Road Map
- 1.0 Single Node Parity with Elastic
- 2.0 Full Implementation of Elastic Search DSL
- 3.0 Cluster Distribution based on Raft
#### What is a Toshi?
Toshi is a three year old Shiba Inu. He is a very good boy and is the official mascot of this project. Toshi personally reviews all code before it is committed to this repository and is dedicated to only accepting the highest quality contributions from his human. He will, however, accept treats for easier code reviews.
6 changes: 5 additions & 1 deletion config/config.toml
@@ -2,9 +2,13 @@ host = "localhost"
port = 8080
path = "data/"
writer_memory = 200000000
-log_level = "info"
+log_level = "debug"
json_parsing_threads = 4
bulk_buffer_size = 10000
+auto_commit_duration = 1

[merge_policy]
kind = "log"
+min_merge_size = 8
+min_layer_size = 10_000
+level_log_size = 0.75
27 changes: 24 additions & 3 deletions src/bin/main.rs
@@ -5,16 +5,37 @@ extern crate toshi;
use std::path::PathBuf;
use std::sync::Arc;
use std::sync::RwLock;
+use toshi::commit::IndexWatcher;
use toshi::index::IndexCatalog;
use toshi::router::router_with_catalog;
use toshi::settings::{HEADER, SETTINGS};

 pub fn main() {
+    let code = runner();
+    std::process::exit(code);
+}
+
+pub fn runner() -> i32 {
     std::env::set_var("RUST_LOG", &SETTINGS.log_level);
     pretty_env_logger::init();
-    println!("{}", HEADER);

-    let catalog = Arc::new(RwLock::new(IndexCatalog::new(PathBuf::from(&SETTINGS.path)).unwrap()));
+    let index_catalog = match IndexCatalog::new(PathBuf::from(&SETTINGS.path)) {
+        Ok(v) => v,
+        Err(e) => {
+            eprintln!("Error Encountered - {}", e.to_string());
+            std::process::exit(1);
+        }
+    };
+    let catalog_arc = Arc::new(RwLock::new(index_catalog));
+
+    if SETTINGS.auto_commit_duration > 0 {
+        let commit_watcher = IndexWatcher::new(Arc::clone(&catalog_arc));
+        commit_watcher.start();
+    }
+
     let addr = format!("{}:{}", &SETTINGS.host, SETTINGS.port);
-    gotham::start(addr, router_with_catalog(&catalog))
+    println!("{}", HEADER);
+    gotham::start(addr, router_with_catalog(&catalog_arc));
+
+    0
 }
105 changes: 105 additions & 0 deletions src/commit.rs
@@ -0,0 +1,105 @@
use index::IndexCatalog;
use settings::SETTINGS;

use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};

use tokio::prelude::*;
use tokio::runtime::{Builder as RtBuilder, Runtime};
use tokio::timer::Interval;

pub struct IndexWatcher {
    catalog: Arc<RwLock<IndexCatalog>>,
    runtime: Runtime,
}

impl IndexWatcher {
    pub fn new(catalog: Arc<RwLock<IndexCatalog>>) -> Self {
        let runtime = RtBuilder::new()
            .core_threads(2)
            .name_prefix("toshi-index-committer")
            .build()
            .unwrap();
        IndexWatcher { catalog, runtime }
    }

    pub fn start(mut self) {
        let catalog = Arc::clone(&self.catalog);

        let task = Interval::new(Instant::now(), Duration::from_secs(SETTINGS.auto_commit_duration))
            .for_each(move |_| {
                if let Ok(mut cat) = catalog.write() {
                    cat.get_mut_collection().iter_mut().for_each(|(key, index)| {
                        let writer = index.get_writer();
                        match writer.lock() {
                            Ok(mut w) => {
                                let current_ops = index.get_opstamp();
                                if current_ops == 0 {
                                    info!("No update to index={}, opstamp={}", key, current_ops);
                                } else {
                                    w.commit().unwrap();
                                    index.set_opstamp(0);
                                }
                            }
                            Err(_) => (),
                        };
                    });
                }
                Ok(())
            })
            .map_err(|e| panic!("Error in commit-watcher={:?}", e));

        self.runtime.spawn(future::lazy(|| task));
        self.runtime.shutdown_on_idle();
    }

    pub fn shutdown(self) { self.runtime.shutdown_now(); }
}

#[cfg(test)]
mod tests {

    use super::*;
    use handlers::search::tests::*;
    use hyper::StatusCode;
    use index::tests::*;
    use std::thread::sleep;
    use std::time::Duration;

    use mime;
    use serde_json;

    #[test]
    pub fn test_auto_commit() {
        let idx = create_test_index();
        let catalog = IndexCatalog::with_index("test_index".to_string(), idx).unwrap();
        let arc = Arc::new(RwLock::new(catalog));
        let test_server = create_test_client(&arc);
        let watcher = IndexWatcher::new(Arc::clone(&arc));
        watcher.start();

        let body = r#"
        {
            "document": {
                "test_text": "Babbaboo!",
                "test_u64": 10,
                "test_i64": -10
            }
        }"#;

        let response = test_server
            .put("http://localhost/test_index", body, mime::APPLICATION_JSON)
            .perform()
            .unwrap();
        assert_eq!(response.status(), StatusCode::Created);
        sleep(Duration::from_secs(1));

        let check_request = create_test_client(&arc)
            .get("http://localhost/test_index?pretty=false")
            .perform()
            .unwrap();
        let results: TestResults = serde_json::from_slice(&check_request.read_body().unwrap()).unwrap();
        assert_eq!(6, results.hits);
    }
}
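The watcher above only commits when the opstamp counter shows pending writes, using a `load` to check and a later `set_opstamp(0)` to reset. That check-and-reset can also be expressed as a single atomic step with `swap`; this is a hedged variant for illustration, not Toshi's actual handle API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Returns the number of pending operations and resets the counter to 0
/// in one atomic step, so a concurrent writer cannot slip between the
/// read and the reset. A return of 0 means the tick can skip committing.
fn take_pending_ops(opstamp: &AtomicUsize) -> usize {
    opstamp.swap(0, Ordering::Relaxed)
}
```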
35 changes: 35 additions & 0 deletions src/handle.rs
@@ -0,0 +1,35 @@
use settings::SETTINGS;
use tantivy::{Index, IndexWriter};

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

pub struct IndexHandle {
    index: Index,
    writer: Arc<Mutex<IndexWriter>>,
    current_opstamp: AtomicUsize,
}

impl IndexHandle {
    pub fn new(index: Index) -> Self {
        let i = index.writer(SETTINGS.writer_memory).unwrap();
        i.set_merge_policy(SETTINGS.get_merge_policy());
        let current_opstamp = AtomicUsize::new(0);
        let writer = Arc::new(Mutex::new(i));
        Self {
            index,
            writer,
            current_opstamp,
        }
    }

    pub fn get_index(&self) -> &Index { &self.index }

    pub fn recreate_writer(self) -> Self { IndexHandle::new(self.index) }

    pub fn get_writer(&self) -> Arc<Mutex<IndexWriter>> { Arc::clone(&self.writer) }

    pub fn get_opstamp(&self) -> usize { self.current_opstamp.load(Ordering::Relaxed) }

    pub fn set_opstamp(&self, opstamp: usize) { self.current_opstamp.store(opstamp, Ordering::Relaxed) }
}
