toyEngine is an implementation of a search engine inverted index, with both basic and advanced functionality. Specifically, the program currently provides:
- persisting and loading the lexicon, the last posting unit ID, term-associated information, and added-document information;
- independent persisting and lazy loading of posting lists;
- a two-layer term lock service;
- adding and lazily deleting posting units, and cleaning posting lists;
- document adding and deleting;
- recording the access status of posting lists and deactivating them automatically;
- reloading the inverted index for garbage collection and reallocation of posting unit IDs;
- posting list scanning and simple document scoring models;
- three search algorithms: plain search, MaxScore, and WAND.
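
To make the posting-unit operations above more concrete, here is a minimal sketch of lazy deletion followed by posting list cleaning. It is not toyEngine's actual code: the names `PostingUnit`, `ToyInvertedIndex`, `delete_document`, and `clean` are hypothetical, and the term-frequency scoring is only a stand-in for the simple scoring models mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PostingUnit:
    # Hypothetical structure: one (term, document) entry in a posting list.
    unit_id: int
    doc_id: str
    term_freq: int
    deleted: bool = False  # tombstone flag used for lazy deletion


class ToyInvertedIndex:
    """Illustrative in-memory index; all names here are assumptions."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> list of PostingUnit
        self.last_unit_id = 0              # last posting unit ID handed out

    def add_document(self, doc_id, text):
        # Count term frequencies, then append one posting unit per term.
        counts = {}
        for term in text.lower().split():
            counts[term] = counts.get(term, 0) + 1
        for term, tf in counts.items():
            self.last_unit_id += 1
            self.postings[term].append(PostingUnit(self.last_unit_id, doc_id, tf))

    def delete_document(self, doc_id):
        # Lazy deletion: only mark the units, do not touch the list layout.
        for units in self.postings.values():
            for unit in units:
                if unit.doc_id == doc_id:
                    unit.deleted = True

    def clean(self, term):
        # Cleaning: physically drop tombstoned units; returns how many were removed.
        before = len(self.postings[term])
        self.postings[term] = [u for u in self.postings[term] if not u.deleted]
        return before - len(self.postings[term])

    def search(self, query):
        # Plain search: scan every posting list, skip tombstones, sum term frequencies.
        scores = defaultdict(int)
        for term in query.lower().split():
            for unit in self.postings.get(term, []):
                if not unit.deleted:
                    scores[unit.doc_id] += unit.term_freq
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch a deleted document disappears from search results immediately, while its posting units stay in memory until `clean` is called for the term. That separation is what lets deletion stay cheap and defers the list rewrite to a later pass.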
From a design perspective, the program consists of three main parts:
- the inverted index and its associated operations;
- entities that support the implementation of those operations;
- helper classes like commonly used basic data structures and various utils.
The entities mostly follow a “mainstay and plugins” scheme, in which specific functionality and data structures are provided and maintained by plugins. This makes it convenient to develop additional functionality on top of the current backbone.
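
One way to picture the “mainstay and plugins” scheme is the sketch below. It is an illustration under assumptions, not the project's real class layout: `Mainstay`, `Plugin`, `on_add`, and the two example plugins are hypothetical names chosen for this example.

```python
class Plugin:
    """Hypothetical plugin base: each plugin owns one data structure."""

    name = "plugin"

    def on_add(self, doc_id, text):
        raise NotImplementedError


class DocLengthPlugin(Plugin):
    # Maintains the length (in terms) of every added document.
    name = "doc_length"

    def __init__(self):
        self.lengths = {}

    def on_add(self, doc_id, text):
        self.lengths[doc_id] = len(text.split())


class DocCountPlugin(Plugin):
    # Maintains the total number of added documents.
    name = "doc_count"

    def __init__(self):
        self.count = 0

    def on_add(self, doc_id, text):
        self.count += 1


class Mainstay:
    """Hypothetical backbone: holds no feature data, only dispatches to plugins."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.name] = plugin

    def plugin(self, name):
        return self._plugins[name]

    def add_document(self, doc_id, text):
        # The mainstay forwards the event; each plugin updates its own state.
        for plugin in self._plugins.values():
            plugin.on_add(doc_id, text)
```

Under this layout, adding a new feature means writing another `Plugin` subclass and registering it, while the mainstay itself stays unchanged.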