More documentation work

zgornel · Feb 10, 2020 · dc06b31
1 parent 6adc4ca
commit dc06b31

Showing 6 changed files with 46 additions and 30 deletions.
diff --git a/README.md b/README.md
@@ -1,12 +1,12 @@
 ![Alt text](https://github.com/zgornel/Garamond.jl/blob/master/docs/src/assets/logo.png)
 
-A small, flexible neural and data search engine, written in Julia. Batteries not included.
-
-[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](LICENSE.md) 
+<p align="center">
+[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://zgornel.github.io/Garamond.jl/dev)
 [![Build Status (master)](https://travis-ci.com/zgornel/Garamond.jl.svg?token=8HcgFtAjpxwpdXiu8Fon&branch=master)](https://travis-ci.com/zgornel/Garamond.jl)
 [![Coverage Status](https://coveralls.io/repos/github/zgornel/Garamond.jl/badge.svg?branch=master)](https://coveralls.io/github/zgornel/Garamond.jl?branch=master)
-[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://zgornel.github.io/Garamond.jl/dev)
+</p>
 
+### Garamond is a small, flexible neural and data search engine, written in Julia. Batteries not included.
 
 ## Installation
 

diff --git a/docs/make.jl b/docs/make.jl
@@ -19,7 +19,7 @@ makedocs(
         "Configuration" => "configuration.md",
         "Client/Server" => "clientserver.md",
         "Building" => "build.md",
-        "Notes" => "notes.md",
+        "Remarks" => "remarks.md",
         "API Reference" => "api.md",
     ]
 )

diff --git a/docs/src/clientserver.md b/docs/src/clientserver.md
@@ -1,20 +1,20 @@
 # Search server, clients and REST APIs
 
-Garamond is designed as a [client-server architecture](http://catb.org/~esr/writings/taoup/html/ch11s06.html#id2958899) in which the server receives requests, performs the search, recommendation or ranking operations and returns the response i.e. results back to the client.
+Garamond is designed as a [client-server architecture](http://catb.org/~esr/writings/taoup/html/ch11s06.html#id2958899) in which the server receives requests, performs the search, recommendation or ranking operations and returns a response containing the search results back to the client.
 
 !!! note
 
     - The clients do not depend on the Garamond package and are very lightweight.
     - The preferred way of communicating with the server is through the [REST API](@ref rest-api-specification) using HTTP clients such as [curl](https://curl.haxx.se/), etc.
 
 In the root directory of the package the search server utility and two thin clients can be found:
-- **gars** - starts the search server. The operations performed by the search engine server at this point are indexing data according to a given configuration and serving requests coming from connections to sockets or HTTP ports.
-- **garc** - command line client supporting Unix socket communication. Through it, a single search can be performed and many of the search request parameters can be specified. It supports printing search results in a human-readable way.
-- **garw** - web client supporting Web socket communication (experimental and feature limited). The basic principle is that it starts a HTTP server which serves a page at a given HTTP port. If the web page is not specified, a default one is generated internally and served. The user connects with a web browser of choice at the local address and port (i.e. `127.0.0.1`) and performs the search queries from the page. It naturally supports multiple queries however, the parameters of the search cannot be changed.
+- `gars` - starts the search server. The operations performed by the search engine server at this point are indexing data according to a given configuration and serving requests coming from connections to sockets or HTTP ports.
+- `garc` - command line client supporting Unix socket communication. Through it, a single search can be performed and many of the search request parameters can be specified. It supports printing search results in a human-readable way.
+- `garw` - web client supporting Web socket communication (experimental and feature limited). The basic principle is that it starts an HTTP server which serves a page at a given HTTP port. If the web page is not specified, a default one is generated internally and served. The user connects with a web browser of choice at the local address and port (i.e. `127.0.0.1`) and performs the search queries from the page. It naturally supports multiple queries; however, the parameters of the search cannot be changed.
 
 
 ## Server
-The search server listens on an ip and socket for incoming requests. Once one is received, it is processed and the response sent back to same socket. Looking at the `gars` command line help
+The search server listens on an IP and/or socket for incoming requests. Once one is received, it is processed and the response is sent back to the same socket. Looking at the `gars` command line help
 ```
 $ ./gars --help
 Activating environment at `~/projects/Garamond.jl/Project.toml`
@@ -47,6 +47,7 @@ optional arguments:
   -h, --help            show this help message and exit
 ```
 starting the server becomes quite straightforward.
+
 For example, to start the server listening to a web socket at port 9100 and to a UNIX socket at `/tmp/some/socket`:
 ```
 $ ./gars -d ./search_data_config.json -u /tmp/some/socket -w 9100 --log-level info
@@ -71,10 +72,12 @@ $ ./garc --help
 usage: garc [--log-level LOG-LEVEL] [-u UNIX-SOCKET]
             [--return-fields [RETURN-FIELDS...]] [--pretty]
             [--max-matches MAX-MATCHES]
+            [--response-size RESPONSE-SIZE]
             [--search-method SEARCH-METHOD]
             [--max-suggestions MAX-SUGGESTIONS] [--id-key ID-KEY] [-k]
-            [--update-searcher UPDATE-SEARCHER] [--update-all]
-            [--rank] [-h] [query]
+            [--env-operation ENV-OPERATION ENV-OPERATION]
+            [--ranker RANKER] [--input-parser INPUT-PARSER] [-h]
+            [query]
 
 positional arguments:
   query                 the search query (default: "")
@@ -89,8 +92,11 @@ optional arguments:
                         List of fields to return (ignores wrong names)
   --pretty              output is a pretty print of the results
   --max-matches MAX-MATCHES
-                        maximum results to return (type: Int64,
-                        default: 10)
+                        maximum number of results for internal
+                        neighbor searches (type: Int64, default: 10)
+  --response-size RESPONSE-SIZE
+                        maximum number of results to return (type:
+                        Int64, default: 10)
   --search-method SEARCH-METHOD
                         type of match done during search (type:
                         Symbol, default: :exact)
@@ -101,10 +107,14 @@ optional arguments:
   --id-key ID-KEY       The linear ID key (default:
                         "garamond_linear_id")
   -k, --kill            Kill the search engine server
-  --update-searcher UPDATE-SEARCHER
-                        Update a searcher (default: "")
-  --update-all          Update all searchers
-  --rank                Use ranker (if any)
+  --env-operation ENV-OPERATION ENV-OPERATION
+                        Environment operation
+  --ranker RANKER       The ranker to use; avalilable: noop_ranker
+                        (default: "noop_ranker")
+  --input-parser INPUT-PARSER
+                        The input parser to use; available:
+                        noop_input_parser, base_input_parser (default:
+                        "noop_input_parser")
   -h, --help            show this help message and exit
 ```
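As a rough sketch of how the new `--response-size` option differs from `--max-matches`, the snippet below assembles a `garc` call using only flags listed in the help text above. The socket path is borrowed from the `gars` example; the numeric values and the query string are made up for illustration, and the call itself is left commented out because it needs a running server.

```shell
# Socket path reused from the gars example; adjust to the actual setup.
SOCKET=/tmp/some/socket

# --max-matches: results considered by the internal neighbor searches.
# --response-size: results actually returned to the client.
# The values 50 and 5 are illustrative assumptions, not recommended settings.
GARC_CMD="./garc -u $SOCKET --max-matches 50 --response-size 5 --pretty"

# Hypothetical query string, for illustration only.
QUERY="some query"

# Print the command that would be run.
echo "$GARC_CMD \"$QUERY\""

# With a server listening on $SOCKET, the search itself would be:
# $GARC_CMD "$QUERY"
```

Going by the help text alone, the internal searches would consider up to 50 matches while the response would be capped at 5 results.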
 

diff --git a/docs/src/getting_started.md b/docs/src/getting_started.md
@@ -10,14 +10,19 @@ The engine uses a pluggable approach in which data loaders, parsers, recommender
 !!! tip "Glossary"
 
     Throughout the documentation, certain terms will appear when referring to the internals of the engine. Some of the most frequent ones are:
-    * **config** may refer to several configuration files or objects that the engine uses.
+    * **configuration** may refer to:
+      - searcher configuration, a `SearcherConfig` object which holds the configuration options for individual searchers.
+      - environment configuration, a `NamedTuple` that contains searcher configurations as well as other parameters.
+      - data configuration file, a JSON file which is parsed to generate an environment configuration.
     * **search environment** a `SearchEnv` object that holds the data and searchers among other. It fully describes the state of the engine.
-    * **searcher** - object that is used to perform the actual search. It holds the indexed documents in some vectorial representation.
+    * **searcher**, a `Searcher` object that is used to perform the actual search. It holds the indexed documents in some vectorial representation.
     * **index** - the data structure holding the vector representation of the documents.
-    * **request** - may refer to either a request form an outside system to the engine i.e. HTTP request or its internal representation in the API, of type `InternalRequest`.
+    * **request** - may refer to:
+      - a request from an outside system to the engine, e.g. an HTTP request.
+      - the internal representation of a request, of type `InternalRequest`.
 
-### Engine configuration
-The main configuration of the engine pertains to data loading, parsing and indexing. Its role is to provide all necessary details as well as the internal architecture of the engine. The recommended way for configuring the engine is to create a JSON file with all necessary options. Alternatively, the result of parsing the configuration file i.e. the configuration object can be created manually or programatically however it is - at least at this point - a cumbersome operation.
+## Engine configuration
+The main configuration of the engine pertains to data loading, parsing and indexing. Its role is to provide all necessary details as well as the internal architecture of the engine. The recommended way for configuring the engine is to create a JSON file with all necessary options. Alternatively, the result of parsing the configuration file, i.e. the configuration object, can be created explicitly; however, it is, at least at this point, a cumbersome operation.
 
 ```@repl_index
 using Logging, JSON, JuliaDB, Garamond
@@ -33,14 +38,14 @@ for field in fieldnames(typeof(cfg))
 end
 ```
 
-### The search environment
+## The search environment
 Building the search environment out of the configuration is straightforward. The environment holds the in-memory data in the form of an `IndexedTable` or `NDSparse` object, the searchers as well as other information such as primary db key and configuration paths. 
 
 ```@repl_index
 env = build_search_env(cfg)
 ```
 
-### Engine operations
+## Engine operations
 The internal API is designed to be straightforward and uniform in the way it is called. First, one has to build a request which fully describes the operation to be performed and subsequently, call the operation desired. For example, to perform a search, one request would be:
 ```@repl_index
 request = Garamond.InternalRequest(operation=:search,
@@ -62,7 +67,7 @@ Ranking the results using the ranker specified in the request is done with:
 ranked = rank(env, request, search_results)
 ```
 
-### Results and responses
+## Results and responses
 
 Once results are available, these can be printed
 ```@repl_index

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -8,7 +8,7 @@ CurrentModule=Garamond
 
 # Introduction
 
-Garamond is a small, flexible neural and data search engine. It can be used both as a Julia package i.e. search functionality available through API method calls or as a standalone search server i.e. search functionality accessible through clients that communicate with the server.
+Garamond is a small, flexible neural and data search engine. It can be used both as a Julia package, with search functionality available through API method calls or as a standalone search server, with search functionality accessible through clients that communicate with the server.
 
 Internally, the engine's architecture is that of an ensemble of searchers, with an analytical database as data backend. Each searcher has its own characteristics i.e. ways of embedding documents, searching through the vectors and the search results from all searchers can be combined in a variety of ways. The engine supports runtime loading and use of custom data loaders, recommendation engines and result rankers.
 
@@ -34,17 +34,18 @@ downloads the `master` branch of the repository and adds `Garamond` to the curre
 - Run-time batch re-indexing
 - HTTP(REST)/Web-socket and UNIX socket connectivity
 - Wordvectors support: [Word2Vec](https://en.wikipedia.org/wiki/Word2vec), [ConceptnetNumberbatch](https://github.com/commonsense/conceptnet-numberbatch), [GloVe](https://nlp.stanford.edu/projects/glove/)
-- Classic search based on [term frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Term_frequency_2), [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Term_frequency%E2%80%93Inverse_document_frequency), [bm25](https://en.wikipedia.org/wiki/Okapi_BM25)
 - Compressed vector support for low-memory footprint using [array quantization](https://github.com/zgornel/QuantizedArrays.jl)
+- Classic search based on [term frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Term_frequency_2), [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Term_frequency%E2%80%93Inverse_document_frequency), [bm25](https://en.wikipedia.org/wiki/Okapi_BM25)
 - Suggestion support using [BK Trees](https://en.wikipedia.org/wiki/BK-tree)
 - Many state-of-the-art neural document and sentence embedding methods
 - Multi-threading [supported](https://github.com/zgornel/Garamond.jl/tree/cc-multithreading)
 - Caching mechanisms for fast resume
-- Portable (and statically compilable) to many architectures
+- Portable and statically compilable to many architectures
 
 ## Coming Soon
 - Billion-scale search through [IVFADC](https://github.com/JuliaNeighbors/IVFADC.jl)
 - Run-time indexing
+- Architectural improvements i.e. pool of embedders
 
 ## Longer term plans
 - Image/Video/Audio i.e. generic search

diff --git a/docs/src/notes.md → docs/src/remarks.md b/docs/src/notes.md → docs/src/remarks.md
@@ -1,4 +1,4 @@
-# Notes
+# Remarks
 
 ## Multi-threading
 If one chooses to use multi-threading, for example through the `Threads.@threads` or `Threads.@spawn` macros, export the following: `OPENBLAS_NUM_THREADS=1` and `JULIA_NUM_THREADS=<n>`, where `n` is the number of threads desired.
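A minimal sketch of the exports described in this remark, assuming a POSIX shell. The thread count of 4 is an arbitrary stand-in for `<n>`. The remark does not say why OpenBLAS is pinned to one thread; the usual reason is to avoid oversubscribing cores when Julia runs its own threads. The server command is the one from the `gars` example and is left commented out.

```shell
# Restrict OpenBLAS to a single thread, as the remark asks.
export OPENBLAS_NUM_THREADS=1

# Number of Julia threads; 4 is an example value for <n>.
export JULIA_NUM_THREADS=4

# Confirm what child processes will inherit.
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS JULIA_NUM_THREADS=$JULIA_NUM_THREADS"

# The server would then be started from the same shell, e.g.:
# ./gars -d ./search_data_config.json -u /tmp/some/socket -w 9100 --log-level info
```

Both variables must be set before Julia starts, since they are read at process startup.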