detail in `packages/common/infra/src/sync/indexer/README.md`
Showing 26 changed files with 3,388 additions and 0 deletions.
@@ -0,0 +1,147 @@
# index

Search engine abstraction layer for AFFiNE.

## Using

1. Define schema

First, define the shape of the data. The following field types are currently available:

- 'Integer'
- 'Boolean'
- 'FullText': for full-text search; the value is tokenized and stemmed.
- 'String': for exact-match search, e.g. tags and ids.

```typescript
const schema = defineSchema({
  title: 'FullText',
  tag: 'String',
  size: 'Integer',
});
```

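As an illustration of why the schema is declared up front, here is a minimal sketch of how a helper like `defineSchema` could let the TypeScript compiler keep track of field names and types. `FieldType`, `defineSchemaSketch`, `sketchSchema`, and `fieldTypeOf` are hypothetical names made up for this example; the real `defineSchema` may be implemented differently.

```typescript
// The four field types listed above.
type FieldType = 'Integer' | 'Boolean' | 'FullText' | 'String';

// Hypothetical stand-in for `defineSchema`: it returns the schema unchanged,
// but the generic parameter lets the compiler remember the exact field names.
function defineSchemaSketch<S extends { [field: string]: FieldType }>(schema: S): S {
  return schema;
}

const sketchSchema = defineSchemaSketch({
  title: 'FullText',
  tag: 'String',
  size: 'Integer',
});

// Only 'title', 'tag', and 'size' are accepted here; other names fail to compile.
type SketchField = keyof typeof sketchSchema;

// Looks up the declared type of a known field.
function fieldTypeOf(field: SketchField): FieldType {
  return sketchSchema[field];
}
```

With this shape, a typo such as `fieldTypeOf('titel')` is rejected at compile time rather than at query time.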
> **Array type**
> All types can contain one or more values, so each field can store an array.
>
> This design conforms to the usual conventions of search engine APIs, such as in Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html

2. Pick a backend

Currently, there are two backends available.

- `MemoryIndex`: in-memory indexer, useful for testing.
- `IndexedDBIndex`: persistent indexer using IndexedDB.

> **Underlying Data Table**
> Some backends need to maintain underlying data tables, including creating and migrating them. This should happen silently the first time the indexer is invoked.
> Callers do not need to worry about these details.

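
The first-use setup described for backends can be pictured with a small sketch. `LazyTableBackend` and all of its members are hypothetical and exist only for this example. The sketch is synchronous to stay short; a real IndexedDB backend would be asynchronous and would more likely cache a setup promise.

```typescript
// Hypothetical backend: none of these names come from AFFiNE.
class LazyTableBackend {
  private setupRuns = 0;
  private tables: { records: { [id: string]: string } } | null = null;

  // Creates the underlying table on first use only; later calls reuse it.
  private ensureTables(): { records: { [id: string]: string } } {
    if (this.tables === null) {
      this.setupRuns += 1; // stands in for table creation and migration
      this.tables = { records: {} };
    }
    return this.tables;
  }

  put(id: string, value: string): void {
    this.ensureTables().records[id] = value;
  }

  get(id: string): string | undefined {
    return this.ensureTables().records[id];
  }

  // Exposed only so the sketch can show that setup ran once.
  setupCount(): number {
    return this.setupRuns;
  }
}

const lazyBackend = new LazyTableBackend();
lazyBackend.put('1', 'hello'); // triggers setup
lazyBackend.put('2', 'world'); // reuses the existing table
```

Every public method goes through `ensureTables()`, so callers never have to run an explicit setup step.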

3. Write data

To write data to the indexer, first start a write transaction with `await index.write()`, then complete the batch write with `await writer.commit()`.

> **Transactional**
> Typically, the indexer does not provide transactional guarantees; reliable locking logic needs to be implemented at a higher level.

```typescript
const index = new IndexedDBIndex(schema);

// Start a write transaction, queue one document, then commit the batch.
const writer = await index.write();
writer.insert(
  Document.from('id', {
    title: 'hello world',
    tag: ['doc', 'page'], // a field may hold several values
    size: '100',
  })
);
await writer.commit();
```

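To make the write-then-commit flow concrete, here is a minimal in-memory sketch. It buffers inserts in the writer and only publishes them on `commit()`, and it stores every field as an array so single values and arrays are handled the same way. `SketchIndex`, `SketchWriter`, and `toArray` are hypothetical names, the methods are synchronous for brevity, and nothing here is meant to describe how `MemoryIndex` or `IndexedDBIndex` work internally.

```typescript
type FieldValue = string | string[];
type StoredFields = { [field: string]: string[] };

// Wraps a single value in an array (and copies arrays) so every field
// is stored the same way.
function toArray(value: FieldValue): string[] {
  return Array.isArray(value) ? value.slice() : [value];
}

// Hypothetical writer: collects documents until commit() is called.
class SketchWriter {
  private pending: { id: string; fields: StoredFields }[] = [];
  private target: { [id: string]: StoredFields };

  constructor(target: { [id: string]: StoredFields }) {
    this.target = target;
  }

  insert(id: string, fields: { [field: string]: FieldValue }): void {
    const normalized: StoredFields = {};
    Object.keys(fields).forEach(name => {
      const value = fields[name];
      if (value !== undefined) normalized[name] = toArray(value);
    });
    this.pending.push({ id, fields: normalized });
  }

  // Publishes all buffered documents, then clears the buffer.
  commit(): void {
    this.pending.forEach(doc => {
      this.target[doc.id] = doc.fields;
    });
    this.pending = [];
  }
}

// Hypothetical index holding only committed documents.
class SketchIndex {
  private committed: { [id: string]: StoredFields } = {};

  write(): SketchWriter {
    return new SketchWriter(this.committed);
  }

  get(id: string): StoredFields | undefined {
    return this.committed[id];
  }
}

const sketchIndex = new SketchIndex();
const sketchWriter = sketchIndex.write();
sketchWriter.insert('id', { title: 'hello world', tag: ['doc', 'page'], size: '100' });
sketchWriter.commit(); // the document becomes visible only now
```

Note that this buffering gives no isolation between concurrent writers, which is consistent with the note above that locking belongs at a higher level.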
4. Search data

To search for content in the indexer, describe what you are looking for in the indexer's **query language**. Here are some examples:

```typescript
// match title == 'hello world'
{
  type: 'match',
  field: 'title',
  match: 'hello world',
}

// match title == 'hello world' && tag == 'doc'
{
  type: 'boolean',
  occur: 'must',
  queries: [
    {
      type: 'match',
      field: 'title',
      match: 'hello world',
    },
    {
      type: 'match',
      field: 'tag',
      match: 'doc',
    },
  ],
}
```

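The query objects above form a small tree, which can be typed as a discriminated union and evaluated recursively. The sketch below covers only the two shapes shown in the examples (`match`, and `boolean` with `occur: 'must'`). `SketchQuery`, `IndexedDoc`, `tokenize`, and `matchesQuery` are hypothetical names, and the matching rule (lower-case, whitespace-split tokens, applied to every field) is a deliberate simplification: it does no stemming and does not treat 'String' fields as exact-match.

```typescript
// Only the query shapes that appear in the examples above.
type SketchQuery =
  | { type: 'match'; field: string; match: string }
  | { type: 'boolean'; occur: 'must'; queries: SketchQuery[] };

interface IndexedDoc {
  id: string;
  fields: { [field: string]: string[] };
}

// Crude stand-in for a tokenizer: lower-case and split on whitespace.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(token => token.length > 0);
}

function matchesQuery(doc: IndexedDoc, query: SketchQuery): boolean {
  if (query.type === 'match') {
    const values = doc.fields[query.field] || [];
    const wanted = tokenize(query.match);
    // A field matches when one of its values contains every query token.
    return values.some(value => {
      const present = tokenize(value);
      return wanted.every(token => present.indexOf(token) !== -1);
    });
  }
  // occur: 'must' means every sub-query has to match.
  return query.queries.every(sub => matchesQuery(doc, sub));
}
```

Because `type` is the discriminant, the compiler narrows `query` in each branch, so `query.match` and `query.queries` are only accessible where they exist.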
There are two ways to perform a search: `index.search()` and `index.aggregate()`.

- **search**: returns each matched node along with pagination information.
- **aggregate**: groups all matched results into buckets by a given field, and returns the count and score of the items in each bucket.

Examples:

```typescript
const result = await index.search({
  type: 'match',
  field: 'title',
  match: 'hello world',
});
// result = {
//   nodes: [
//     {
//       id: '1',
//       score: 1,
//     },
//   ],
//   pagination: {
//     count: 1,
//     hasMore: false,
//     limit: 10,
//     skip: 0,
//   },
// }
```

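The `pagination` object in the result can be read as a window over the full list of matches. The sketch below shows one way such a window could be computed. `paginate` and `Pagination` are hypothetical names, and it assumes that `count` is the total number of matches rather than the size of the current page; the example above does not settle which of the two it is.

```typescript
interface Pagination {
  count: number;
  hasMore: boolean;
  limit: number;
  skip: number;
}

// Cuts one page out of the full list of matches.
// Assumption: `count` reports the total number of matches.
function paginate<T>(
  all: T[],
  skip: number,
  limit: number
): { nodes: T[]; pagination: Pagination } {
  const nodes = all.slice(skip, skip + limit);
  return {
    nodes,
    pagination: {
      count: all.length,
      hasMore: skip + nodes.length < all.length,
      limit,
      skip,
    },
  };
}

// Three matches, pages of two: the first page reports that more remain.
const page = paginate(['1', '2', '3'], 0, 2);
```

A caller would typically request the next page with `skip + limit` for as long as `hasMore` is true.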
```typescript
const result = await index.aggregate(
  {
    type: 'match',
    field: 'title',
    match: 'affine',
  },
  'tag'
);
// result = {
//   buckets: [
//     { key: 'motorcycle', count: 2, score: 1 },
//     { key: 'bike', count: 1, score: 1 },
//     { key: 'airplane', count: 1, score: 1 },
//   ],
//   pagination: {
//     count: 3,
//     hasMore: false,
//     limit: 10,
//     skip: 0,
//   },
// }
```

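Bucketing can be sketched as grouping the matched documents by each value of the chosen field. `MatchedDoc`, `Bucket`, and `aggregateBy` are hypothetical names. How a bucket's `score` is actually derived is not stated above, so this sketch assumes it is the highest score among the bucket's documents, and it returns buckets in first-seen order.

```typescript
interface MatchedDoc {
  id: string;
  score: number;
  fields: { [field: string]: string[] };
}

interface Bucket {
  key: string;
  count: number;
  score: number;
}

// Groups matched documents by every value of `field`.
// Assumption: a bucket's score is the best score of its documents.
function aggregateBy(matched: MatchedDoc[], field: string): Bucket[] {
  const buckets: Bucket[] = [];
  matched.forEach(doc => {
    (doc.fields[field] || []).forEach(key => {
      let bucket: Bucket | undefined = buckets.filter(b => b.key === key)[0];
      if (bucket === undefined) {
        bucket = { key, count: 0, score: 0 };
        buckets.push(bucket);
      }
      bucket.count += 1;
      bucket.score = Math.max(bucket.score, doc.score);
    });
  });
  return buckets;
}

const sketchBuckets = aggregateBy(
  [
    { id: '1', score: 1, fields: { tag: ['motorcycle', 'bike'] } },
    { id: '2', score: 1, fields: { tag: ['motorcycle'] } },
    { id: '3', score: 1, fields: { tag: ['airplane'] } },
  ],
  'tag'
);
```

Because a field can hold several values, a single document may contribute to more than one bucket, as document `'1'` does here.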
More examples:

[black-box.spec.ts](./__tests__/black-box.spec.ts)