v1.7.0 - Array datatypes and two new modules (NER, Spellcheck)
Features
- Array Datatypes (#1611)

  Starting with this release, primitive object properties are no longer limited to individual values; they can also hold lists of primitives. Array types can be stored, filtered, and aggregated in the same way as other primitives. Auto-schema automatically recognizes lists of `string`/`text` and `number`/`int`. You can also specify lists explicitly in the schema using the following data types: `string[]`, `text[]`, `int[]`, `number[]`. A type that is assigned as an array must always stay an array, even if it contains only a single element.
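As a sketch, a class schema using the explicit array types might look like the following (the property names `tags` and `viewCounts` are hypothetical examples, not part of this release):

```json
{
  "class": "Post",
  "properties": [
    {
      "name": "tags",
      "dataType": ["string[]"]
    },
    {
      "name": "viewCounts",
      "dataType": ["int[]"]
    }
  ]
}
```

Once defined this way, `tags` must always hold an array of strings, even when a given object has only one tag.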
- New Module: `text-spellcheck` - Check and auto-correct misspelled search terms (#1606)

  Use the new spellchecker module to verify that user-provided search queries (in existing `nearText` or `ask` functions) are spelled correctly and even suggest alternative, correct spellings. Spell-checking happens at query time.

  There are two ways to use this module:
- It provides a new additional prop which can be used to check (but not alter) the provided queries.

  The following query:

  ```graphql
  {
    Get {
      Post(nearText: {
        concepts: "missspelled text"
      }) {
        content
        _additional {
          spellCheck {
            changes {
              corrected
              original
            }
            didYouMean
            location
            originalText
          }
        }
      }
    }
  }
  ```

  will produce results similar to the following:

  ```json
  "_additional": {
    "spellCheck": [
      {
        "changes": [
          {
            "corrected": "misspelled",
            "original": "missspelled"
          }
        ],
        "didYouMean": "misspelled text",
        "location": "nearText.concepts[0]",
        "originalText": "missspelled text"
      }
    ]
  },
  "content": "..."
  ```
- It extends existing `text2vec` modules with an `autoCorrect` flag, which can be used to automatically correct the query in the background if it is misspelled.
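As a sketch, the second option could look like the query below. The exact flag name, casing, and placement inside `nearText` are assumptions based on the description above; consult the module documentation for the authoritative syntax.

```graphql
{
  Get {
    Post(nearText: {
      concepts: "missspelled text",
      # assumption: flag name and placement may differ in the actual API
      autocorrect: true
    }) {
      content
    }
  }
}
```

With auto-correction enabled, the corrected query is used for the search instead of the misspelled input.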
- New Module: `ner-transformers` - Extract entities from Weaviate using transformers (#1632)

  Use transformer-based models to extract entities from your existing Weaviate objects on the fly. Entity extraction happens at query time. Note that for maximum performance, transformer-based models should run with GPUs. CPUs can be used, but the throughput will be lower.
  To make use of the module's capabilities, simply extend your query with the following new `_additional` property:

  ```graphql
  {
    Get {
      Post {
        content
        _additional {
          tokens(
            properties: ["content"], # required
            limit: 10,               # optional, int
            certainty: 0.8           # optional, float
          ) {
            certainty
            endPosition
            entity
            property
            startPosition
            word
          }
        }
      }
    }
  }
  ```
  It will return results similar to the following:

  ```json
  "_additional": {
    "tokens": [
      {
        "property": "content",
        "entity": "PER",
        "certainty": 0.9894614815711975,
        "word": "Sarah",
        "startPosition": 11,
        "endPosition": 16
      },
      {
        "property": "content",
        "entity": "LOC",
        "certainty": 0.7529033422470093,
        "word": "London",
        "startPosition": 31,
        "endPosition": 37
      }
    ]
  }
  ```
Fixes
- Fix an issue where aggregation could get stuck when aggregating `number` datatypes (#1660)