v1.7.0 - Array datatypes and two new modules (NER, Spellcheck)

@etiennedi etiennedi released this 01 Sep 15:22
de88013

Features

  • Array Datatypes (#1611)

    Starting with this release, primitive object properties are no longer limited to single values, but can also hold lists of primitives. Array types can be stored, filtered and aggregated in the same way as other primitives.

    Auto-schema automatically recognizes lists of string/text and number/int. You can also explicitly specify lists in the schema by using one of the following data types: string[], text[], int[], number[]. A property that is defined as an array must always stay an array, even if it only contains a single element.
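
As an illustration, a class definition using the array types might look like the sketch below. The class name `Post`, the property names, and the helpers `array_properties`, `as_array` and `normalize` are hypothetical and exist only for this example; only the data type names `string[]`, `text[]`, `int[]` and `number[]` come from this release. The helper shows one way to make sure a single value is still sent as a one-element list.

```python
# Hypothetical class definition; only the array data type names
# (string[], text[], int[], number[]) are taken from the release notes.
post_class = {
    "class": "Post",
    "properties": [
        {"name": "title", "dataType": ["string"]},
        {"name": "tags", "dataType": ["string[]"]},
        {"name": "paragraphs", "dataType": ["text[]"]},
        {"name": "pageNumbers", "dataType": ["int[]"]},
        {"name": "scores", "dataType": ["number[]"]},
    ],
}


def array_properties(class_def):
    """Return the names of all properties declared with an array data type."""
    return {
        prop["name"]
        for prop in class_def["properties"]
        if prop["dataType"][0].endswith("[]")
    }


def as_array(value):
    """Wrap a single value in a list; leave existing lists untouched."""
    return value if isinstance(value, list) else [value]


def normalize(obj, class_def):
    """Ensure every array property of obj holds a list, even for one element."""
    arrays = array_properties(class_def)
    return {key: as_array(val) if key in arrays else val for key, val in obj.items()}
```

Calling `normalize({"title": "Hello", "tags": "news"}, post_class)` turns `tags` into `["news"]` and leaves `title` as a plain string.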

  • New Module: text-spellcheck - Check and auto-correct misspelled search terms (#1606)

    Use the new spellchecker module to verify that user-provided search queries (in the existing nearText or ask functions) are spelled correctly, and to suggest alternative, correct spellings. Spell-checking happens at query time.

    There are two ways to use this module:

    1. It provides a new _additional property, spellCheck, which can be used to check (but not alter) the provided queries. The following query:
     {
       Get {
         Post(nearText:{
           concepts: "missspelled text"
         }) {
           content
           _additional{
             spellCheck{
               changes{
                 corrected
                 original
               }
               didYouMean
               location
               originalText
             }
           }
         }
       }
     }

    will produce results similar to the following:

     {
       "_additional": {
         "spellCheck": [
           {
             "changes": [
               {
                 "corrected": "misspelled",
                 "original": "missspelled"
               }
             ],
             "didYouMean": "misspelled text",
             "location": "nearText.concepts[0]",
             "originalText": "missspelled text"
           }
         ]
       },
       "content": "..."
     },
    
    2. It extends existing text2vec modules with an autoCorrect flag, which can be used to correct a misspelled query in the background.
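
For the first option, a client still has to decide what to do with the suggestions. The sketch below shows one possible way to read the spellCheck entries from a single result object; the function name `suggested_queries` is hypothetical, and the example data is copied from the response shown above.

```python
def suggested_queries(result):
    """Map each original query text to its suggested spelling, if it differs."""
    checks = result.get("_additional", {}).get("spellCheck") or []
    return {
        check["originalText"]: check["didYouMean"]
        for check in checks
        if check["didYouMean"] != check["originalText"]
    }


# Example result object, copied from the response shown above.
example = {
    "_additional": {
        "spellCheck": [
            {
                "changes": [
                    {"corrected": "misspelled", "original": "missspelled"}
                ],
                "didYouMean": "misspelled text",
                "location": "nearText.concepts[0]",
                "originalText": "missspelled text",
            }
        ]
    },
    "content": "...",
}

print(suggested_queries(example))  # {'missspelled text': 'misspelled text'}
```

An empty dictionary means that no corrections were suggested, so the application can show a "did you mean" hint only when the dictionary is non-empty.
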
  • New Module: ner-transformers - Extract entities from Weaviate using transformers (#1632)

    Use transformer-based models to extract entities from your existing Weaviate objects on the fly. Entity extraction happens at query time. Note that for maximum performance, transformer-based models should run on GPUs. CPUs can be used, but the throughput will be lower.

    To make use of the module's capabilities, extend your query with the following new _additional property:

    {
      Get {
        Post {
          content
          _additional {
            tokens(
              properties: ["content"],    # is required
              limit: 10,                  # optional, int
              certainty: 0.8              # optional, float
            ) {
              certainty
              endPosition
              entity
              property
              startPosition
              word
            }
          }
        }
      }
    }
    

    It will return results similar to the following:

     "_additional": {
       "tokens": [
         {
           "property": "content",
           "entity": "PER",
           "certainty": 0.9894614815711975,
           "word": "Sarah",
           "startPosition": 11,
           "endPosition": 16
         },
         {
           "property": "content",
           "entity": "LOC",
           "certainty": 0.7529033422470093,
           "word": "London",
           "startPosition": 31,
           "endPosition": 37
         }
       ]
     }
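
On the client side, the returned tokens can be grouped by entity type. The following is a minimal sketch, assuming the response shape shown above; the function name `entities_by_type` and its `min_certainty` parameter are hypothetical and not part of the module.

```python
from collections import defaultdict


def entities_by_type(tokens, min_certainty=0.0):
    """Group the extracted words by entity label, dropping low-certainty tokens."""
    grouped = defaultdict(list)
    for token in tokens:
        if token["certainty"] >= min_certainty:
            grouped[token["entity"]].append(token["word"])
    return dict(grouped)


# Tokens copied from the example response above.
tokens = [
    {
        "property": "content",
        "entity": "PER",
        "certainty": 0.9894614815711975,
        "word": "Sarah",
        "startPosition": 11,
        "endPosition": 16,
    },
    {
        "property": "content",
        "entity": "LOC",
        "certainty": 0.7529033422470093,
        "word": "London",
        "startPosition": 31,
        "endPosition": 37,
    },
]

print(entities_by_type(tokens))       # {'PER': ['Sarah'], 'LOC': ['London']}
print(entities_by_type(tokens, 0.8))  # {'PER': ['Sarah']}
```

The optional certainty argument of the query already filters on the server, so a client-side threshold like this is only useful when the same response should be viewed at several thresholds.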
    

Fixes

  • Aggregation can get stuck when aggregating number datatypes (#1660)