Verida protocol schema research #18

@tahpot

Verida schemas

The Verida protocol supports defining a schema for data stored in Verida databases. These schema'd databases are called datastores. See Verida schema documentation.

Anyone can define a schema and reference it by URL. The Verida protocol defines a collection of base schemas in the schema repo.

All Verida schemas should conform to the JSON Schema specification.

We need a Verida meta-schema

Verida schemas need to provide additional metadata, beyond just "validation rules".

This includes:

  • Information about the database storing this schema (database name, indexes etc.)
  • Display information used when displaying this schema in the Verida Vault (or other applications)

You can see an example of this additional metadata in the draft Verida social/contact schema.
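As a rough sketch of what such metadata could look like, the snippet below embeds two extra top-level keys in an ordinary JSON schema and separates them from the validation rules. The key names `database` and `layouts`, their contents, and the helper `split_schema` are assumptions made for illustration, not the format used in the draft schema.

```python
import json

# Hypothetical schema: the "database" and "layouts" keys and their contents
# are illustrative assumptions, not a published Verida format.
CONTACT_SCHEMA = {
    "title": "Contact",
    "type": "object",
    "database": {"name": "social_contact", "indexes": {"email": ["email"]}},
    "layouts": {"view": ["firstName", "lastName", "email"]},
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["firstName", "email"],
}

# Top-level keys treated as metadata rather than validation rules (assumed).
METADATA_KEYS = ("database", "layouts")


def split_schema(schema):
    """Separate the extra metadata from the plain validation rules."""
    metadata = {k: schema[k] for k in METADATA_KEYS if k in schema}
    validation = {k: v for k, v in schema.items() if k not in METADATA_KEYS}
    return metadata, validation


metadata, validation = split_schema(CONTACT_SCHEMA)
print(json.dumps(metadata, indent=2))
```

A metaschema would then need to describe the shape of these extra keys so that every schema declares them consistently.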

Tasks:

  • Define a JSON structure for storing the above metadata
  • Define and publish a Verida JSON metaschema
  • Update our existing schemas to reference the new Verida JSON metaschema

We need a plan for schema versioning

Schemas will change. We need to have a clearly defined strategy and process to help developers update schemas.

I'm not sure we need to solve this problem now, but we at least need a plan.

Things to consider:

  • What prior art / existing versioning strategies exist? (ask community?)
  • How to migrate data from one schema to an upgraded version?
  • How to ensure all client applications are using the latest schema?
  • How strict should client implementations be to ensure the latest schema is being used?
  • What role can the Verida Vault play to ensure valid schemas are being used?
  • Need to support versioning of the meta schema
  • When data is signed, it includes signing the schema URI. This means "updating" the schema to a new URI will invalidate the signature unless the data is re-signed by the data originator (unlikely).
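The signature concern in the last point can be illustrated with a small sketch. It assumes the signed payload is a canonical JSON form of the record with the schema URI stored in a `schema` field, and it uses a SHA-256 digest as a stand-in for a real signature. The URLs and the function name are hypothetical.

```python
import hashlib
import json


def signing_digest(record):
    """Digest of the canonical JSON form of a record, schema URI included.

    A real implementation would sign this digest with the originator's key;
    the bare SHA-256 digest stands in for the signature here.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical schema URLs, used only for illustration.
record_v1 = {
    "schema": "https://schemas.example.com/social/contact/schema/v1.json",
    "firstName": "Ada",
}
record_v2 = dict(
    record_v1,
    schema="https://schemas.example.com/social/contact/schema/v2.json",
)

original = signing_digest(record_v1)
print(signing_digest(record_v1) == original)  # unchanged record still matches
print(signing_digest(record_v2) == original)  # new schema URI no longer matches
```

Because only the schema URI differs between the two records, any versioning strategy that rewrites the URI in stored data has to account for re-signing or for keeping the original URI alongside the data.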

My initial thoughts:

In phase 1, we support versioning by convention, building the version into the URL, e.g. .../social/contact/schema/v1.json. Since multiple schemas can use the same database, it's possible to have records stored in the same database but using different schema versions.
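A minimal sketch of how a client might read the version out of such a URL and group mixed-version records from one database. The host `schemas.example.com`, the `schema` field name, and both function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches URLs ending in /v<number>.json, following the convention above.
VERSION_PATTERN = re.compile(r"^(?P<base>.+)/v(?P<version>\d+)\.json$")


def parse_schema_url(url):
    """Split a versioned schema URL into (base URL, integer version)."""
    match = VERSION_PATTERN.match(url)
    if match is None:
        raise ValueError(f"URL does not follow the vN.json convention: {url}")
    return match.group("base"), int(match.group("version"))


def group_by_version(records):
    """Group records by the schema version found in their 'schema' field."""
    groups = defaultdict(list)
    for record in records:
        _, version = parse_schema_url(record["schema"])
        groups[version].append(record)
    return dict(groups)


# Hypothetical records sharing one database but using different versions.
BASE = "https://schemas.example.com/social/contact/schema"
records = [
    {"schema": f"{BASE}/v1.json", "firstName": "Ada"},
    {"schema": f"{BASE}/v2.json", "firstName": "Grace"},
    {"schema": f"{BASE}/v1.json", "firstName": "Alan"},
]
print(sorted(group_by_version(records)))  # [1, 2]
```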

In phase 2, we support a "data migration" process, whereby a new schema version can define a data migration schema. The Verida client can then apply the transform described by the data migration schema to convert data from an older schema version to the new one.
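One possible shape for such a data migration schema is a declarative list of renames, removals and defaults that a client applies record by record. Everything in this sketch (the `from`, `to`, `rename`, `remove` and `defaults` keys, the URLs, and `apply_migration`) is a hypothetical design, not an existing Verida API.

```python
# Hypothetical schema URLs, used only for illustration.
V1 = "https://schemas.example.com/social/contact/schema/v1.json"
V2 = "https://schemas.example.com/social/contact/schema/v2.json"

# Hypothetical migration schema describing how to go from v1 to v2.
MIGRATION = {
    "from": V1,
    "to": V2,
    "rename": {"mobile": "phone"},
    "remove": ["fax"],
    "defaults": {"country": "unknown"},
}


def apply_migration(record, migration):
    """Return a migrated copy of the record, or the record itself if it
    does not use the schema version this migration starts from."""
    if record["schema"] != migration["from"]:
        return record
    renames = migration.get("rename", {})
    removed = set(migration.get("remove", []))
    migrated = {}
    for key, value in record.items():
        if key in removed:
            continue  # field dropped in the new version
        migrated[renames.get(key, key)] = value
    for key, value in migration.get("defaults", {}).items():
        migrated.setdefault(key, value)  # fill fields new in this version
    migrated["schema"] = migration["to"]
    return migrated


old = {"schema": V1, "firstName": "Ada", "mobile": "555-0100", "fax": "555-0101"}
new = apply_migration(old, MIGRATION)
print(new)
```

Note that the migrated record carries a new schema URI, so the signature issue raised above applies to any signed data transformed this way.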

Tasks:

  • Research current versioning strategies
  • Document a set of staged recommendations to build into the Verida protocol
  • Sign off
  • Implement first phase of recommendations

Schema security

There's a security risk when a schema is referenced by URL: the schema can later be modified (or the hosting provider hacked) so that the same URL serves different content. For example, an attacker could remove the list of required fields, allowing invalid data to be saved across the network.
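A minimal sketch of the required-fields example. It checks only the JSON Schema `required` keyword with a hand-written helper (`missing_required` is a hypothetical name); a real client would run a full JSON Schema validator.

```python
def missing_required(schema, record):
    """Return the required fields that are absent from the record.

    Only the JSON Schema "required" keyword is checked here.
    """
    return [field for field in schema.get("required", []) if field not in record]


original_schema = {"type": "object", "required": ["firstName", "email"]}
tampered_schema = {"type": "object"}  # "required" list removed by an attacker

record = {"firstName": "Ada"}  # no email

print(missing_required(original_schema, record))  # ['email']
print(missing_required(tampered_schema, record))  # []
```

The same record is rejected under the original schema and accepted under the tampered one, even though the URL a client uses to fetch the schema has not changed.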

I don't think we need to solve this right now, but we need to consider the implications and have a strategy to improve this in the future.

It's possible to use IPFS to store a schema and then reference the content addressable URI within data saved using the Verida protocol.

In a future phase, we could support on-chain "schema hashes" via the Trust Framework. This would allow schemas to be referenced by an on-chain hash instead of an HTTPS URL. The Ceramic network also provides similar capabilities.
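Both the IPFS and on-chain options rest on the same idea: pin the schema to a hash of its content and verify that hash after fetching. The sketch below uses a plain SHA-256 hex digest of canonical JSON as a stand-in; it is not a real IPFS CID or an on-chain lookup, and the function names are hypothetical.

```python
import hashlib
import json


def schema_hash(schema):
    """SHA-256 hex digest of the schema's canonical JSON form."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_schema(fetched_schema, expected_hash):
    """Accept a fetched schema only if its content matches the pinned hash."""
    return schema_hash(fetched_schema) == expected_hash


published = {"type": "object", "required": ["firstName", "email"]}
pinned = schema_hash(published)  # stored with the data, or on-chain

tampered = {"type": "object"}  # what a compromised host might serve later

print(verify_schema(published, pinned))  # True
print(verify_schema(tampered, pinned))   # False
```

Sorting keys before hashing means two schemas that differ only in key order produce the same hash, which avoids false mismatches when a host re-serializes the file.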

Tasks:

  • Document an initial assessment of the security risks
  • Research appropriate mitigation strategies and document recommendations

Community resources

The following community resources exist and seem active: