Entity Resolution

Entity resolution (ER) is the task of disambiguating records that correspond to real-world entities across and within datasets.

The problems associated with entity resolution grow with the data: as volume and velocity increase, inference across networks and semantic relationships between entities becomes increasingly difficult. This project attempts to provide a solution using Elasticsearch and a graph database.

Overview

Links: RedisGraph, Data Modeling

Getting Started

Clone

gh repo clone xmlking/entity-resolution

Install dependencies

export GOPRIVATE=github.com/<repo_owner>/*,go.buf.build
go work sync
go mod tidy

Install git hooks

# cog install-hook all
cog install-hook commit-msg
# you can verify if the hooks are installed by running
cat .git/hooks/commit-msg

Maintenance

After publishing protos to the Buf Schema Registry (BSR), update the generated proto code:

export GOPRIVATE=github.com/<repo_owner>/*,go.buf.build
go get go.buf.build/grpc/go/<repo_owner>/entityapis
go work sync

Update outdated Go dependencies interactively

export GOPRIVATE=github.com/<repo_owner>/*,go.buf.build 
GOWORK="off" go-mod-upgrade
# then commit the changes. 

Update deps

go work sync
task mod:outdated
task mod:sync
task mod:verify

Lint Code

Before committing, lint your code with the following commands, in order:

go fmt ./...
#golangci-lint --version
golangci-lint run -c .github/linters/.golangci.yml

Development

Launch Redis

docker compose up
# docker compose up redis
# open Grafana UI and enable redis plugin
open http://localhost:3000/plugins/redis-app/
open http://localhost:3000/dashboards

# open a shell in the grafana container
docker compose exec grafana /bin/bash
cd /etc/grafana/provisioning

# stop
docker compose down
# this will stop redis and remove all volumes
docker compose down -v 

Run

# first generate go code.
go generate ./... 
# run engine
go run ./service/entity/... 
#go run ./cmd/er/...   

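Call the Ingest endpoint from Buf Studio, which targets the local server at http://localhost:8080, using the sample request body that follows the link:

https://studio.buf.build/micro/entityapis/main/er.service.entity.v1.EntityService/Ingest?target=http%3A%2F%2Flocalhost%3A8080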

{
  "externalId": "123e4567-e89b-12d3-a456-426614174000",
  "names": [{"first":"sumo", "last": "demo"}],
  "gender": "GENDER_MALE",
  "emails": {},
  "phones": {},
  "addresses": []
}
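The same call can be prepared from Go code. The sketch below is a minimal illustration, not the project's client: it assumes the server on `localhost:8080` accepts the Connect protocol's unary JSON encoding (a `POST` to `/<service>/<method>` with `Content-Type: application/json`), which the source does not confirm. The service path comes from the Studio link above; the `IngestRequest` and `Name` types and the `NewIngestHTTPRequest` helper are hypothetical and model only part of the sample body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Name and IngestRequest are hypothetical types that mirror a subset of the
// sample JSON body; the generated proto types in the repo may differ.
type Name struct {
	First string `json:"first"`
	Last  string `json:"last"`
}

type IngestRequest struct {
	ExternalID string `json:"externalId"`
	Names      []Name `json:"names"`
	Gender     string `json:"gender"`
}

// ingestPath is the fully qualified service and method taken from the Studio URL.
const ingestPath = "/er.service.entity.v1.EntityService/Ingest"

// NewIngestHTTPRequest builds (but does not send) a unary JSON request.
// Assumption: the server speaks the Connect protocol over HTTP.
func NewIngestHTTPRequest(baseURL string, in IngestRequest) (*http.Request, error) {
	body, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+ingestPath, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewIngestHTTPRequest("http://localhost:8080", IngestRequest{
		ExternalID: "123e4567-e89b-12d3-a456-426614174000",
		Names:      []Name{{First: "sumo", Last: "demo"}},
		Gender:     "GENDER_MALE",
	})
	if err != nil {
		panic(err)
	}
	// Pass req to http.DefaultClient.Do once the service is running.
	fmt.Println(req.Method, req.URL.String())
}
```

Building the request separately from sending it keeps the sketch runnable without a live server.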

To see all configuration options that can be set through environment variables, run:

CONFY_LOG_LEVEL=debug \
CONFY_DEBUG_MODE=true \
CONFY_VERBOSE_MODE=true \
go run ./service/entity/... 

Test

# first generate go code. <-- IMPORTANT
go generate ./... 

go test -v ./service/entity/... 
go test -v ./cmd/er/...   

Build

# first generate go code.
go generate ./... 
go build -o build ./service/entity/... 
go build -o build ./cmd/er/...

Release

The following command bumps the VERSION number and pushes the changes and the new tag to the remote.
The GitHub Action then triggers the GoReleaser process.

NOTE: make sure you commit all changes before running this command.
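The `cog bump --auto` step in this section derives the next version from the commit types since the latest tag. As a rough illustration of that idea, the sketch below applies the usual Conventional Commits to SemVer mapping (breaking change bumps major, `feat` bumps minor, `fix` bumps patch). It is a simplified assumption, not cocogitto's implementation, and the `Version` type, `NextVersion` function, and sample commit messages are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Version is a hypothetical, minimal SemVer triple.
type Version struct{ Major, Minor, Patch int }

func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// header matches a Conventional Commits subject: type, optional scope, optional "!".
var header = regexp.MustCompile(`^(\w+)(\([^)]*\))?(!)?: `)

// bumpLevel returns 3 for a breaking change, 2 for feat, 1 for fix, 0 otherwise.
func bumpLevel(msg string) int {
	m := header.FindStringSubmatch(msg)
	if m == nil {
		return 0
	}
	if m[3] == "!" || strings.Contains(msg, "BREAKING CHANGE:") {
		return 3
	}
	switch m[1] {
	case "feat":
		return 2
	case "fix":
		return 1
	}
	return 0
}

// NextVersion applies the highest bump level found in the commit messages.
func NextVersion(cur Version, commits []string) Version {
	level := 0
	for _, c := range commits {
		if l := bumpLevel(c); l > level {
			level = l
		}
	}
	switch level {
	case 3:
		return Version{cur.Major + 1, 0, 0}
	case 2:
		return Version{cur.Major, cur.Minor + 1, 0}
	case 1:
		return Version{cur.Major, cur.Minor, cur.Patch + 1}
	}
	return cur
}

func main() {
	// Hypothetical commit messages since the latest tag.
	commits := []string{"fix: handle empty names", "docs: update readme"}
	fmt.Println(NextVersion(Version{0, 1, 262}, commits)) // prints v0.1.263
}
```

Taking the highest level across all commits means a single `feat` among many `fix` commits still yields one minor bump rather than several patch bumps.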

```shell
# dry-run: calculate the next version based on the commit types since the latest tag
cog bump --auto --dry-run
# bump to the next version, calculated from the commit types since the latest tag
cog bump --auto
```

Verify

Test multi-platform docker images

```shell
docker run -it --rm --init --platform linux/amd64 ghcr.io/xmlking/entity-resolution/entity:latest
docker run -it --rm --init --platform linux/arm64 ghcr.io/xmlking/entity-resolution/entity:latest
```

Local Docker Build

Multi-platform, multi-stage, multi-module local build:

#VERSION=$(git describe --tags || echo "HEAD")
VERSION=v0.1.262
BUILD_DATE=$(date +%FT%T%Z)
DOCKER_IMAGE=ghcr.io/xmlking/entity-resolution/entity

# build 
docker buildx create --use

docker buildx build --platform linux/arm64,linux/amd64 \
-t $DOCKER_IMAGE:$VERSION \
-t $DOCKER_IMAGE:latest \
--build-arg BUILD_DATE=$BUILD_DATE --build-arg VERSION=$VERSION \
--secret id=BUF_TOKEN,src=buf_token.txt \
-f Dockerfile.local --push .

# inspect
docker buildx imagetools inspect $DOCKER_IMAGE:$VERSION
docker buildx imagetools inspect --raw $DOCKER_IMAGE:$VERSION

# run
docker run -it --rm --platform linux/arm64 $DOCKER_IMAGE:$VERSION
docker run -it --rm --platform linux/amd64 $DOCKER_IMAGE:$VERSION
# Notice `platform` in the build_info logs
# build_info={"branch":"main","build_time":"","commit":"","compiler":"gc","go_version":"go1.19","platform":"linux/arm64","state":"dirty","tag":""}

TODO

Reference