HASH

This is HASH's public monorepo which contains our public code, docs, and other key resources.

What is HASH?

HASH is a self-building, open-source database which grows, structures, and checks itself. HASH integrates data in (near-)realtime, and provides a powerful set of interfaces so that information can be understood and used in any context. Intelligent, autonomous agents can be deployed to grow, check, and maintain the database, integrating and structuring information from the public internet as well as your own connected private sources. Users, including non-technical ones, can visually browse and manage both entities (data) and types (schemas). HASH acts as a source of truth for critical data, no matter its source, and provides a platform for high-trust, safety-assured decision-making. Read our blog post →

In the future... we plan on growing HASH into an all-in-one workspace, or complete operating system, with AI-generated interfaces known as "blocks" created at the point of need, on top of your strongly-typed data (addressing the data quality and integrity challenges inherent in the current generation of generative AI interfaces).

Getting started

🚀 Quick-start (<5 mins): use the hosted app

Create an account

The only officially supported way of trying HASH right now is by signing up for and using the hosted platform at app.hash.ai.

Create an account to get started.

Sign in

Sign in to access your account.

Skip the queue

When you first create an account you may be placed on a waitlist. To jump the queue, once signed in, follow the instructions shown in your HASH dashboard. All submissions are reviewed by a member of the team.

Running HASH locally

Running HASH locally is not yet officially supported. We plan on publishing a comprehensive guide to running your own instance of HASH shortly (2025Q2). In the meantime, you may try the instructions below.

Experimental instructions

  1. Make sure you have Git, Rust, Docker, and Protobuf installed. Building the Docker containers requires Docker Buildx. Run each of these version commands and make sure the output meets the minimum versions noted below:

    git --version
    ## ≥ 2.17

    rustup --version
    ## ≥ 1.27.1 (required to match the toolchain specified in `rust-toolchain.toml`; lower versions will most likely work as well)

    docker --version
    ## ≥ 20.10

    docker compose version
    ## ≥ 2.17.2

    docker buildx version
    ## ≥ 0.10.4

    If you have difficulties with git --version on macOS, you may need to install the Xcode Command Line Tools first: xcode-select --install.

    If you use Docker for macOS or Windows, go to Preferences → Resources and ensure that Docker can use at least 4GB of RAM (8GB is recommended).

  2. Clone this repository and navigate to the root of the repository folder in your terminal.

  3. We use mise-en-place to manage tool versions consistently across our codebase. We recommend using mise to automatically install and manage the required development tools:

    mise install

    It's also possible to install the tools manually; if you do, use the versions specified in .config/mise.

    After installing mise, you will also need to set it to activate automatically in your shell.
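    For example, to have mise activate automatically in zsh you can add its standard activation hook to your shell config (this is generic mise setup rather than anything HASH-specific; adjust the shell name and rc file if you use bash, fish, etc.):

    echo 'eval "$(mise activate zsh)"' >> ~/.zshrc
    exec zsh   # restart the shell so mise starts managing tool versions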

  4. Install dependencies:

    yarn install
  5. Ensure Docker is running. If you are on Windows or macOS, you should see the Docker app icon in the system tray or the menu bar. Alternatively, you can use this command to check that Docker is running:

    docker run hello-world
  6. If you need to test or develop AI-related features, you will need to create an .env.local file in the repository root with the following values:

    OPENAI_API_KEY=your-open-ai-api-key                                      # required for most AI features
    ANTHROPIC_API_KEY=your-anthropic-api-key                                 # required for most AI features
    HASH_TEMPORAL_WORKER_AI_AWS_ACCESS_KEY_ID=your-aws-access-key-id         # required for most AI features
    HASH_TEMPORAL_WORKER_AI_AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key # required for most AI features
    E2B_API_KEY=your-e2b-api-key                                             # only required for the question-answering flow action

    Note on environment files: .env.local is not committed to the repo – put any secrets that should remain secret here. The default environment variables are taken from .env, extended by .env.development, and finally by .env.local. If you want to overwrite values specified in .env or .env.development, you can add them to .env.local. Do not change any other .env files unless you intend to change the defaults for development or testing.
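    As a small illustration of that precedence, you could raise the log verbosity locally by adding an override to .env.local (LOG_LEVEL is one of the variables documented under "Environment variables" below; any value set here wins over .env and .env.development):

    # .env.local
    LOG_LEVEL=debug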

  7. Launch external services (Postgres, the graph query layer, Kratos, Redis, and OpenSearch) as Docker containers:

    yarn external-services up --wait
    1. You can optionally force a rebuild of the Docker containers by adding the --build argument (this is necessary if changes have been made to the graph query layer). It's recommended to do this whenever updating your branch from upstream.

    2. You can keep external services running between app restarts by adding the --detach argument to run the containers in the background. It is possible to tear down the external services with yarn external-services down.

    3. When using yarn external-services:offline up, the Graph service does not try to connect to https://blockprotocol.org to fetch required schemas. This is useful for development when the internet connection is slow or unreliable.

    4. You can also run the Graph API and AI Temporal worker outside of Docker – this is useful if they are changing frequently and you want to avoid rebuilding the Docker containers. To do so, stop them in Docker and then run yarn dev:graph and yarn workspace @apps/hash-ai-worker-ts dev respectively in separate terminals.
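    For reference, the variants described in the list above look roughly like this in practice (the flags are the ones mentioned above, appended to the base command; combine them as needed):

    yarn external-services up --wait --build     # force a rebuild of the container images
    yarn external-services up --wait --detach    # keep the services running in the background
    yarn external-services:offline up            # skip fetching schemas from blockprotocol.org
    yarn external-services down                  # tear the external services back down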

  8. Launch app services:

    yarn start

    This will start the backend and frontend in a single terminal. Once you see http://localhost:3000 in the output, the frontend is ready to visit at that address. The API is online once you see localhost:5001. Both must be online for the frontend to function.

    You can also launch parts of the app in separate terminals, e.g.:

    yarn start:graph
    yarn start:backend
    yarn start:frontend

    See package.json → scripts for details and more options.
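    If you want a quick sanity check that both services are reachable once those addresses appear, you can probe them from another terminal (these curl calls are just a convenience and not part of the official setup; the exact responses don't matter, only that the ports answer):

    curl -I http://localhost:3000   # frontend
    curl -I http://localhost:5001   # API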

  9. Log in

    When the HASH API is started, three users are automatically seeded for development purposes. Their passwords are all password.

    • alice@example.com, bob@example.com – regular users
    • admin@example.com – an admin

Running the browser plugin

If you need to run the browser plugin locally, see the README.md in the apps/plugin-browser directory.

Resetting the local database

If you need to reset the local database, to clear out test data or because it has become corrupted during development, you have two options:

  1. The slow option – rebuild in Docker

    1. In the Docker UI (or via the CLI, if you prefer), stop and delete the hash-external-services container
    2. In 'Volumes', search 'hash-external-services' and delete the volumes shown
    3. Run yarn external-services up --wait to rebuild the services
  2. The fast option – reset the database via the Graph API

    1. Run the Graph API in test mode by running yarn dev:graph:test-server
    2. Run yarn graph:reset-database to reset the database
    3. If you need to use the frontend, you will also need to delete the rows in the identities table in the dev_kratos database, or sign-in will not work. You can do so via any Postgres UI or CLI (see the sketch below). The database connection and user details are in .env.
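    For example, assuming the default local credentials from .env (user and password kratos, Postgres on port 5432; adjust these to match your own configuration), the deletion could be done with psql:

    psql "postgres://kratos:kratos@localhost:5432/dev_kratos" -c "DELETE FROM identities;"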

External services test mode

The external services can be started in 'test mode' to avoid polluting the development database. This is useful when the database is used by tests that modify it without cleaning up afterwards.

To make use of this test mode, the external services can be started as follows:

yarn external-services:test up

Sending emails

Email-sending in HASH is handled by either Kratos (in the case of authentication-related emails) or through the HASH API Email Transport (for everything else).

To use AwsSesEmailTransporter, set export HASH_EMAIL_TRANSPORTER=aws_ses in your terminal before running the app. Valid AWS credentials are required for this email transporter to work.
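A minimal sketch of what that might look like (the AWS_* variable names are the ones documented under "Environment variables" below; the values shown are placeholders):

export HASH_EMAIL_TRANSPORTER=aws_ses
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
yarn start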

Transactional email templates are located in the following locations:

  • Kratos emails in ./../../apps/hash-external-services/kratos/templates/. This directory contains the following templates:
    • recovery_code - Email templates for the account recovery flow using a code for the UI.
      • When an email belongs to a registered HASH user, it will use the valid template, otherwise the invalid template is used.
    • verification_code - Email verification templates for the account registration flow using a code for the UI.
      • When an email belongs to a registered HASH user, it will use the valid template, otherwise the invalid template is used.
  • HASH emails in ../hash-api/src/email/index.ts

Deploying HASH to the cloud

Support for running HASH in the cloud is coming soon. We plan on publishing a comprehensive guide to deploying HASH on AWS/GCP/Azure in the near future. In the meantime, instructions contained in the root /infra directory might help in getting started.

Examples

Coming soon: we'll be collecting examples in the Awesome HASH repository.

Roadmap

Browse the HASH development roadmap for more information about currently in-flight and upcoming features.

About this repository

Repository structure

This repository's contents are divided across several primary sections:

  • /apps contains the primary code powering our runnable applications
    • The HASH application itself is divided into various different services which can be found in this directory.
  • /blocks contains our public Block Protocol blocks
  • /infra houses deployment scripts, utilities and other infrastructure useful in running our apps
  • /libs contains libraries including npm packages and Rust crates
  • /tests contains end-to-end and integration tests that span across one or more apps, blocks or libs

Environment variables

Here's a list of possible environment variables. Everything that's necessary already has a default value.

You do not need to set any environment variables to run the application.

General API server environment variables

  • NODE_ENV: ("development" or "production") the runtime environment. Controls default logging levels and output formatting.
  • PORT: the port number the API will listen on.

AWS configuration

If you want to use AWS for file uploads or emails, you will need to have it configured:

  • AWS_REGION: The region, e.g. us-east-1
  • AWS_ACCESS_KEY_ID: Your AWS access key
  • AWS_SECRET_ACCESS_KEY: Your AWS secret key
  • AWS_S3_UPLOADS_BUCKET: The name of the bucket to use for file uploads (if you want to use S3 for file uploads), e.g. my_uploads_bucket
  • AWS_S3_UPLOADS_ACCESS_KEY_ID: (optional) the AWS access key ID to use for file uploads. Must be provided along with the secret access key if the API is not otherwise authorized to access the bucket (e.g. via an IAM role).
  • AWS_S3_UPLOADS_SECRET_ACCESS_KEY: (optional) the AWS secret access key to use for file uploads.
  • AWS_S3_UPLOADS_ENDPOINT: (optional) the endpoint to use for S3 operations. If not set, the AWS S3 default for the given region is used. Useful if you are using a different S3-compatible storage provider.
  • AWS_S3_UPLOADS_FORCE_PATH_STYLE: (optional) set true if your S3 setup requires path-style rather than virtual hosted-style S3 requests.

For some in-browser functionality (e.g. document previewing), you must configure an Access-Control-Allow-Origin header on your bucket to be something other than '*'.

File uploads

By default, files are uploaded to the local file system, which is not recommended for production use. It is also possible to upload files to AWS S3 (see the example configuration after the list below).

  • FILE_UPLOAD_PROVIDER: Which type of provider is used for file uploads. Possible values: LOCAL_FILE_SYSTEM or AWS_S3. If choosing S3, you need to configure the AWS_S3_UPLOADS_ variables above.
  • LOCAL_FILE_UPLOAD_PATH: Relative path to store uploaded files if using the local file system storage provider. Default is var/uploads (the var folder is normally used for application data).
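As an illustration, an S3-backed upload configuration might combine the variables above roughly as follows (the bucket name, region, and keys are placeholders; the optional AWS_S3_UPLOADS_ENDPOINT and AWS_S3_UPLOADS_FORCE_PATH_STYLE variables are only needed for S3-compatible providers):

FILE_UPLOAD_PROVIDER=AWS_S3
AWS_REGION=us-east-1
AWS_S3_UPLOADS_BUCKET=my_uploads_bucket
AWS_S3_UPLOADS_ACCESS_KEY_ID=your-access-key-id
AWS_S3_UPLOADS_SECRET_ACCESS_KEY=your-secret-access-key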

Email

During development, the dummy email transporter writes emails to a local folder.

  • HASH_EMAIL_TRANSPORTER: dummy or aws. If set to dummy, the local dummy email transporter will be used during development instead of aws (default: dummy)
  • DUMMY_EMAIL_TRANSPORTER_FILE_PATH: Default is var/api/dummy-email-transporter/email-dumps.yml
  • DUMMY_EMAIL_TRANSPORTER_USE_CLIPBOARD: true or false (default: true)

OpenSearch

NOTE: OpenSearch is currently disabled by default, and is presently unmaintained.

  • HASH_OPENSEARCH_ENABLED: whether OpenSearch is used or not. true or false. (default: false).
  • HASH_OPENSEARCH_HOST: the hostname of the OpenSearch cluster to connect to. (default: localhost)
  • HASH_OPENSEARCH_PASSWORD: the password to use when making the connection. (default: admin)
  • HASH_OPENSEARCH_PORT: the port number that the cluster accepts connections on (default: 9200)
  • HASH_OPENSEARCH_USERNAME: the username to connect to the cluster as. (default: admin)
  • HASH_OPENSEARCH_HTTPS_ENABLED: (optional) set to "1" to connect to the cluster over an HTTPS connection.

Postgres

  • POSTGRES_PORT (default: 5432)

Various services also have their own configuration.

The Postgres superuser is configured through:

  • POSTGRES_USER (default: postgres)
  • POSTGRES_PASSWORD (default: postgres)

The Postgres information for Kratos is configured through:

  • HASH_KRATOS_PG_USER (default: kratos)
  • HASH_KRATOS_PG_PASSWORD (default: kratos)
  • HASH_KRATOS_PG_DATABASE (default: kratos)

The Postgres information for Temporal is configured through:

  • HASH_TEMPORAL_PG_USER (default: temporal)
  • HASH_TEMPORAL_PG_PASSWORD (default: temporal)
  • HASH_TEMPORAL_PG_DATABASE (default: temporal)
  • HASH_TEMPORAL_VISIBILITY_PG_DATABASE (default: temporal_visibility)

The Postgres information for the graph query layer is configured through:

  • HASH_GRAPH_PG_USER (default: graph)
  • HASH_GRAPH_PG_PASSWORD (default: graph)
  • HASH_GRAPH_PG_DATABASE (default: graph)

Redis

  • HASH_REDIS_HOST (default: localhost)
  • HASH_REDIS_PORT (default: 6379)

Statsd

If the service should report metrics to a StatsD server, the following variables must be set.

  • STATSD_ENABLED: Set to "1" if the service should report metrics to a StatsD server.
  • STATSD_HOST: the hostname of the StatsD server.
  • STATSD_PORT: (default: 8125) the port number the StatsD server is listening on.

Snowplow telemetry

  • HASH_TELEMETRY_ENABLED: whether Snowplow is used or not. true or false. (default: false)
  • HASH_TELEMETRY_HTTPS: whether to connect to the Snowplow endpoint over an HTTPS connection. true or false. (default: false)
  • HASH_TELEMETRY_DESTINATION: the hostname of the Snowplow tracker endpoint to connect to. (required)
  • HASH_TELEMETRY_APP_ID: ID used to differentiate the application by. Can be any string. (default: hash-workspace-app)

Others

  • FRONTEND_URL: URL of the frontend website for links (default: http://localhost:3000)
  • NOTIFICATION_POLL_INTERVAL: the interval in milliseconds at which the frontend will poll for new notifications, or 0 for no polling. (default: 10_000)
  • HASH_INTEGRATION_QUEUE_NAME: The name of the Redis queue to which updates to entities are published
  • HASH_REALTIME_PORT: Realtime service listening port. (default: 3333)
  • HASH_SEARCH_LOADER_PORT: (default: 3838)
  • HASH_SEARCH_QUEUE_NAME: The name of the queue to push changes for the search loader service (default: search)
  • API_ORIGIN: The origin that the API service can be reached on (default: http://localhost:5001)
  • SESSION_SECRET: The secret used to sign sessions (default: secret)
  • LOG_LEVEL: the level of runtime logs that should be emitted, either set to debug, info, warn, or error (default: info)
  • BLOCK_PROTOCOL_API_KEY: the API key for fetching blocks from the Þ Hub. Generate a key at https://blockprotocol.org/settings/api-keys.

Contributing

Please see CONTRIBUTING if you're interested in getting involved in the design or development of HASH.

We're also hiring for a number of key roles. We don't accept applications for engineering roles in the way a normal company might; instead, we exclusively headhunt (using HASH as a tool to help us find the best people). Contributing to our public monorepo, even in a small way, is one way of guaranteeing you end up on our radar, as every PR is reviewed by a human as well as by AI.

We also provide repo-specific example configuration files you can use for popular IDEs, including VSCode and Zed.

License

The vast majority of this repository is published as free, open-source software. Please see LICENSE for more information about the specific licenses under which the different parts are available.

Security

Please see SECURITY for instructions around reporting issues, and details of which package versions we actively support.

Contact

Find us on 𝕏 at @hashintel, email hey@hash.ai, create a discussion, or open an issue for quick help and community support.

Project permalink: https://github.com/hashintel/hash