[Question] How to convert code for collection creation from ts client v2 to new client v3 #149

Open
michael-pont opened this issue Jun 13, 2024 · 1 comment

@michael-pont

I am having some trouble converting code from the previous TypeScript client (v2) to the new client (3.0.5).
My confusion revolves around:

  • Will the collection definition be the same? I am not sure what certain properties do.
  • TypeScript is reporting errors about properties that need to be defined (vectorIndex).

Here is the old "class" (v2 client):

import { WeaviateClass } from 'weaviate-ts-client'
import { CustomClassName } from '../types'

const className: CustomClassName = CustomClassName.SCOPED_ACCOUNT
export const ScopedAccountClass: WeaviateClass = {
  class: className,
  description: 'A class holding scoped information about an account',
  multiTenancyConfig: { enabled: true },
  vectorizer: 'text2vec-openai',
  moduleConfig: {
    'text2vec-openai': {
      model: 'text-embedding-3-small',
      dimensions: 1536,
      type: 'text',
      tokenization: 'word',
      vectorizeClassName: false,
    },
  },
  properties: [
    {
      name: 'accountId',
      description: 'The account id', // No cross-reference because querying from tenant to non tenant is not possible
      dataType: ['uuid'],
    },
    {
      name: 'crmAccountId',
      description:
        'CRM account identifier - we use this to link the account to our customers CRM',
      dataType: ['text'],
    },
    {
      name: 'teamId',
      description: 'Team ID - we use this as tenant identifier',
      dataType: ['uuid'],
    },
    {
      name: 'name',
      description:
        'Name of scoped account',
      dataType: ['text'],
    },
    {
      name: 'queryResponses',
      description:
        'The answers to the queries that are part of this enrichment request',
      dataType: ['object[]'],
      nestedProperties: [
        { dataType: ['text'], name: 'query' },
        { dataType: ['text'], name: 'customQueryId' },
        { dataType: ['text'], name: 'answer' },
        { dataType: ['text'], name: 'confidence' },
        { dataType: ['text'], name: 'explanation' },
        {
          name: 'sources',
          description: 'The sources of where the answer is found',
          dataType: ['object[]'],
          nestedProperties: [
            { dataType: ['text'], name: 'title' },
            { dataType: ['text'], name: 'link' },
            { dataType: ['boolean'], name: 'isVisitable' },
          ],
        },
      ],
    },
    {
      name: 'notes',
      description: 'Optional notes for the scoped account',
      dataType: ['text[]'],
      moduleConfig: {
        'text2vec-openai': {
          skip: false,
        },
      },
    },
    {
      name: 'enrichmentRequests',
      description:
        'Enrichment Requests - which enrichment requests are linked to this account',
      dataType: [`${CustomClassName.ENRICHMENT_REQUEST}[]`],
    },
    {
      name: 'customPropertyValues',
      description: 'Custom property values for this scoped account',
      dataType: [`${CustomClassName.CUSTOM_PROPERTY_VALUE}`],
    },
    {
      name: 'score',
      description: 'The score of the account',
      dataType: ['number'],
    },
    {
      name: 'originalUrl',
      description: 'Original input URL',
      dataType: ['text'],
      moduleConfig: {
        'text2vec-openai': {
          skip: true,
        },
      },
    },
  ],
}

Here is the new collection as I have defined it (v3 client):

import { CustomClassName } from '../types'
import { type CollectionConfigCreate } from 'weaviate-client'

const className: CustomClassName = CustomClassName.SCOPED_ACCOUNT
export const ScopedAccountClass: CollectionConfigCreate = {
  name: className,
  description: 'A class holding scoped information about an account',
  multiTenancy: {
    enabled: true,
  },
  vectorizers: [
    {
      name: 'text2vec-openai',
      properties: ['notes'],
      vectorIndex: {
        name: 'hnsw',
        config: {},
      },
      vectorizer: {
        name: 'text2vec-openai',
        config: {
          model: 'text-embedding-3-small',
          dimensions: 1536,
          type: 'text',
        },
      },
    },
  ],
  generative: {
    name: 'generative-openai',
    config: {
      model: 'gpt-3.5-turbo',
      temperatureProperty: 0,
    },
  },
  references: [
    {
      name: 'enrichmentRequests',
      description:
        'Enrichment Requests - which enrichment requests are linked to this account',
      targetCollection: CustomClassName.ENRICHMENT_REQUEST,
    },
    {
      name: 'customPropertyValues',
      description: 'Custom property values for this scoped account',
      targetCollection: CustomClassName.CUSTOM_PROPERTY_VALUE,
    },
  ],
  properties: [
    {
      name: 'accountId',
      description: 'The account id', // No cross-reference because querying from tenant to non tenant is not possible
      dataType: 'uuid',
    },
    {
      name: 'crmAccountId',
      description:
        'CRM account identifier - we use this to link the account to our customers CRM',
      dataType: 'text',
    },
    {
      name: 'teamId',
      description: 'Team ID - we use this as tenant identifier',
      dataType: 'uuid',
    },
    {
      name: 'name',
      description:
        'Name (honestly used for placeholder, as I want notes to be text[] and class creation will throw error)',
      dataType: 'text',
    },
    {
      name: 'queryResponses',
      description:
        'The answers to the queries that are part of this enrichment request',
      dataType: 'object[]',
      nestedProperties: [
        { dataType: 'text', name: 'query' },
        { dataType: 'text', name: 'customQueryId' },
        { dataType: 'text', name: 'answer' },
        { dataType: 'text', name: 'confidence' },
        { dataType: 'text', name: 'explanation' },
        {
          name: 'sources',
          description: 'The sources of where the answer is found',
          dataType: 'object[]',
          nestedProperties: [
            { dataType: 'text', name: 'title' },
            { dataType: 'text', name: 'link' },
            { dataType: 'boolean', name: 'isVisitable' },
          ],
        },
      ],
    },
    {
      name: 'notes',
      description: 'Optional notes for the scoped account',
      dataType: 'text[]',
    },
    {
      name: 'score',
      description: 'The score of the account',
      dataType: 'number',
    },
    {
      name: 'originalUrl',
      description: 'Original input URL',
      dataType: 'text',
      skipVectorization: true,
    },
  ],
}

Questions / Observations I had:

  • Defining the vectorIndex property on the vectorizer has very little type support compared to the vectorizer property. The same goes for the generative field. It's not clear what to write, or what behavior to expect when passing an empty object (does the default config apply?). I've looked through the tests in the 3.0.5 codebase for examples; all I want is the default hnsw vector index config. I ended up defining config as {}.
  • What do the autoTenantActivation and autoTenantCreation properties do in the multiTenancy config? I've read the 1.25 release notes, but I'm not 100% sure what the behavior should be here, especially coming from the v2 client.
  • references is now a separate field. I've removed the two reference properties I had previously defined in the properties field and added them here. Previously I had an array of references, so I defined the data type as <CLASS_NAME>[]; do I still need the [] when defining the targetCollection field? Why is there also a plural targetCollections? Can one property reference multiple other collections? That seems confusing.
  • I've added my custom property notes to the vectorizer config. I assume any property I list there gets skipVectorization: false? If I have multiple text properties in my collection, do I need to explicitly set skip to true or false on each, or can I just list the properties I want vectorized in the vectorizer config?
@michael-pont michael-pont changed the title [Question] How to convert code for collection creation to ts client v2 to new client v3 [Question] How to convert code for collection creation from ts client v2 to new client v3 Jun 13, 2024
@tsmith023
Contributor

Hi @michael-pont, thanks for diving straight into the new client and getting to grips with it!

  • This is perhaps not clearly explained or signposted in the docs (we'll make sure to add it to the migration guide), but we strongly encourage you to use the weaviate.configure object to build the objects that the .create() method expects in the correct format. Indeed, you'll find that the types are not as expressive without it!
  • The autoTenantX options are only relevant to recent 1.25.x releases. In short, autoTenantCreation will automatically create a tenant if you try to insert objects with an associated tenant that didn't previously exist, e.g. collection.withTenant('tenant').data.insert(). autoTenantActivation will automatically activate a tenant that was previously COLD when you try to insert objects into it or query it, so that you don't have to pre-activate it.
  • If your reference was defined as ['Class'] then this translates to targetCollection: 'Class'. You're right that one property referencing multiple collections is confusing, but it is a feature of Weaviate! In the GraphQL API, this was traversable at query time through the ... on MyClass { } fragments syntax.
  • Your final question touches on something we're not happy about in the current vectors syntax, but which is blocked by current server-side capabilities. We would like skipVectorization and vectorizePropertyName to be definable in the vectorizers: field rather than in the properties: field. At the moment, if you set either of these booleans then it will apply to all named vectors. In future this will not be the case, but it is a current limitation.
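
To make the autoTenantCreation and autoTenantActivation description above concrete, here is a toy in-memory model of the behavior as described in this reply. It is only a sketch: ToyCollection, AutoTenantOptions and every method on them are hypothetical names invented for illustration, and none of this is the Weaviate client or server.

```typescript
// Toy model only: NOT the Weaviate client. All names here are hypothetical.
type TenantStatus = 'HOT' | 'COLD'

interface AutoTenantOptions {
  autoTenantCreation?: boolean
  autoTenantActivation?: boolean
}

class ToyCollection {
  private readonly options: AutoTenantOptions
  private readonly tenants = new Map<string, TenantStatus>()
  private readonly objects = new Map<string, unknown[]>()

  constructor(options: AutoTenantOptions = {}) {
    this.options = options
  }

  addTenant(name: string, status: TenantStatus = 'HOT'): void {
    this.tenants.set(name, status)
  }

  statusOf(name: string): TenantStatus | undefined {
    return this.tenants.get(name)
  }

  insert(tenant: string, obj: unknown): void {
    this.ensureUsable(tenant, true)
    const bucket = this.objects.get(tenant) ?? []
    bucket.push(obj)
    this.objects.set(tenant, bucket)
  }

  count(tenant: string): number {
    this.ensureUsable(tenant, false)
    return (this.objects.get(tenant) ?? []).length
  }

  private ensureUsable(tenant: string, isInsert: boolean): void {
    const status = this.tenants.get(tenant)
    if (status === undefined) {
      // As described above, auto-creation is triggered by inserts.
      if (isInsert && this.options.autoTenantCreation) {
        this.tenants.set(tenant, 'HOT')
        return
      }
      throw new Error(`tenant "${tenant}" does not exist`)
    }
    if (status === 'COLD') {
      // Auto-activation applies to both inserts and queries.
      if (this.options.autoTenantActivation) {
        this.tenants.set(tenant, 'HOT')
        return
      }
      throw new Error(`tenant "${tenant}" is not active`)
    }
  }
}

const collection = new ToyCollection({
  autoTenantCreation: true,
  autoTenantActivation: true,
})
collection.insert('team-a', { name: 'Acme' }) // tenant is created on first insert
collection.addTenant('team-b', 'COLD')
collection.count('team-b') // tenant is activated on first query
```

With both flags left unset, the same calls throw in this model, which mirrors having to create and activate tenants yourself.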

To help your migration, here's how I would translate your .create() code to make use of the weaviate.configure object:

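// The reply imported from paths inside the client repository; in an application, import from the published package
import weaviate, { type CollectionConfigCreate } from 'weaviate-client'
import { CustomClassName } from '../types'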

const className: CustomClassName = CustomClassName.SCOPED_ACCOUNT
export const ScopedAccountClass: CollectionConfigCreate = {
  name: className,
  description: 'A class holding scoped information about an account',
  multiTenancy: weaviate.configure.multiTenancy({ enabled: true }),
  vectorizers: weaviate.configure.vectorizer.text2VecOpenAI({
    sourceProperties: ['notes'],
    vectorIndexConfig: weaviate.configure.vectorIndex.hnsw(),
    model: 'text-embedding-3-small',
    dimensions: 1536,
    type: 'text',
  }),
  generative: weaviate.configure.generative.openAI({
    model: 'gpt-3.5-turbo',
    temperature: 0,
  }),
  references: [
    {
      name: 'enrichmentRequests',
      description:
        'Enrichment Requests - which enrichment requests are linked to this account',
      targetCollection: CustomClassName.ENRICHMENT_REQUEST,
    },
    {
      name: 'customPropertyValues',
      description: 'Custom property values for this scoped account',
      targetCollection: CustomClassName.CUSTOM_PROPERTY_VALUE,
    },
  ],
  properties: [
    {
      name: 'accountId',
      description: 'The account id', // No cross-reference because querying from tenant to non tenant is not possible
      dataType: 'uuid',
    },
    {
      name: 'crmAccountId',
      description:
        'CRM account identifier - we use this to link the account to our customers CRM',
      dataType: 'text',
    },
    {
      name: 'teamId',
      description: 'Team ID - we use this as tenant identifier',
      dataType: 'uuid',
    },
    {
      name: 'name',
      description:
        'Name (honestly used for placeholder, as I want notes to be text[] and class creation will throw error)',
      dataType: 'text',
    },
    {
      name: 'queryResponses',
      description:
        'The answers to the queries that are part of this enrichment request',
      dataType: 'object[]',
      nestedProperties: [
        { dataType: 'text', name: 'query' },
        { dataType: 'text', name: 'customQueryId' },
        { dataType: 'text', name: 'answer' },
        { dataType: 'text', name: 'confidence' },
        { dataType: 'text', name: 'explanation' },
        {
          name: 'sources',
          description: 'The sources of where the answer is found',
          dataType: 'object[]',
          nestedProperties: [
            { dataType: 'text', name: 'title' },
            { dataType: 'text', name: 'link' },
            { dataType: 'boolean', name: 'isVisitable' },
          ],
        },
      ],
    },
    {
      name: 'notes',
      description: 'Optional notes for the scoped account',
      dataType: 'text[]',
    },
    {
      name: 'score',
      description: 'The score of the account',
      dataType: 'number',
    },
    {
      name: 'originalUrl',
      description: 'Original input URL',
      dataType: 'text',
      skipVectorization: true,
    },
  ],
}
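
For larger schemas, the move of cross-references out of properties and into references, as done by hand in the translation above, could be scripted. The following is a minimal sketch, not part of either client: V2Property, V3Property, V3Reference and splitV2Properties are simplified, hypothetical shapes and names, and the collection names in the usage example are placeholders. It assumes cross-references can be recognized because Weaviate class names start with an uppercase letter while built-in data types do not, and it strips the trailing [] that the original v2 schema in this thread appended to a class name.

```typescript
// Simplified, hypothetical shapes for illustration; the real client types are richer.
interface V2Property {
  name: string
  description?: string
  dataType: string[]
  nestedProperties?: V2Property[]
}

interface V3Property {
  name: string
  description?: string
  dataType: string
  nestedProperties?: V3Property[]
}

type V3Reference =
  | { name: string; description?: string; targetCollection: string }
  | { name: string; description?: string; targetCollections: string[] }

// Assumption: class names start with an uppercase letter, built-in types do not.
const isCollectionName = (dataType: string): boolean => /^[A-Z]/.test(dataType)

// Drop a trailing "[]" such as the one used on class names in the v2 schema above.
const stripArraySuffix = (dataType: string): string => dataType.replace(/\[\]$/, '')

function toV3Property(prop: V2Property): V3Property {
  const first = prop.dataType[0]
  if (first === undefined) throw new Error(`property "${prop.name}" has no dataType`)
  const out: V3Property = { name: prop.name, dataType: first }
  if (prop.description !== undefined) out.description = prop.description
  if (prop.nestedProperties) out.nestedProperties = prop.nestedProperties.map(toV3Property)
  return out
}

function splitV2Properties(props: V2Property[]): {
  properties: V3Property[]
  references: V3Reference[]
} {
  const properties: V3Property[] = []
  const references: V3Reference[] = []
  for (const prop of props) {
    const isReference = prop.dataType.length > 0 && prop.dataType.every(isCollectionName)
    if (!isReference) {
      properties.push(toV3Property(prop))
      continue
    }
    const targets = prop.dataType.map(stripArraySuffix)
    const base: { name: string; description?: string } = { name: prop.name }
    if (prop.description !== undefined) base.description = prop.description
    const single = targets[0]
    if (targets.length === 1 && single !== undefined) {
      references.push({ ...base, targetCollection: single })
    } else {
      // Several targets map to the plural field discussed in the thread.
      references.push({ ...base, targetCollections: targets })
    }
  }
  return { properties, references }
}

// Usage with placeholder collection names:
const migrated = splitV2Properties([
  { name: 'accountId', dataType: ['uuid'] },
  { name: 'enrichmentRequests', dataType: ['EnrichmentRequest[]'] },
])
```

Here migrated.properties keeps accountId with dataType 'uuid', and migrated.references holds enrichmentRequests with targetCollection 'EnrichmentRequest'. Vectorizer, generative and per-property module settings are deliberately left out of this sketch and still need to be translated by hand.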
