Entity resolution (ER) is the task of disambiguating records that correspond to real-world entities across and within datasets.
The problems associated with entity resolution grow with the volume and velocity of data: inference across networks and semantic relationships between entities becomes increasingly difficult. This project attempts to provide a solution using Elasticsearch and a graph database.
Links: RedisGraph, Data Modeling
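To make the idea of disambiguating records concrete, here is a minimal sketch in Go. It is not this project's matching logic: the `Record` type, the `matchKey` normalization, and the `resolve` helper are all hypothetical, and exact matching on a normalized key is only the simplest possible strategy.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical, simplified person record used only in this sketch.
type Record struct {
	ID    string
	First string
	Last  string
	Email string
}

// matchKey builds a normalized key so that records differing only in
// letter case or surrounding whitespace compare equal.
func matchKey(r Record) string {
	norm := func(s string) string { return strings.ToLower(strings.TrimSpace(s)) }
	return norm(r.First) + "|" + norm(r.Last) + "|" + norm(r.Email)
}

// resolve groups record IDs that share the same normalized key.
// Each group is a candidate set of records describing one real-world entity.
func resolve(records []Record) map[string][]string {
	groups := make(map[string][]string)
	for _, r := range records {
		k := matchKey(r)
		groups[k] = append(groups[k], r.ID)
	}
	return groups
}

func main() {
	records := []Record{
		{ID: "a1", First: "Sumo", Last: "Demo", Email: "sumo@example.com"},
		{ID: "b7", First: " sumo", Last: "DEMO ", Email: "SUMO@example.com"},
		{ID: "c3", First: "Jane", Last: "Doe", Email: "jane@example.com"},
	}
	// Map iteration order is not deterministic, so groups may print in any order.
	for key, ids := range resolve(records) {
		fmt.Println(key, ids)
	}
}
```

Real datasets need fuzzy comparison and relationship-aware inference, which is where a search index and a graph database come in; the sketch only shows the grouping step.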
gh repo clone xmlking/entity-resolution
export GOPRIVATE=github.com/<repo_owner>/*,go.buf.build
go work sync
go mod tidy
# cog install-hook all
cog install-hook commit-msg
# you can verify if the hooks are installed by running
cat .git/hooks/commit-msg
Update the generated proto code from BSR after you publish the proto to BSR:
export GOPRIVATE=github.com/<repo_owner>/*,go.buf.build
go get go.buf.build/grpc/go/<repo_owner>/entityapis
go work sync
Update outdated Go dependencies interactively:
export GOPRIVATE=github.com/<repo_owner>/*,go.buf.build
GOWORK="off" go-mod-upgrade
# then commit the changes.
go work sync
task mod:outdated
task mod:sync
task mod:verify
Before committing, lint your code with the following commands, in order:
go fmt ./...
#golangci-lint --version
golangci-lint run -c .github/linters/.golangci.yml
docker compose up
# docker compose up redis
# open Grafana UI and enable redis plugin
open http://localhost:3000/plugins/redis-app/
open http://localhost:3000/dashboards
# open a shell in the grafana container
docker compose exec grafana /bin/bash
cd /etc/grafana/provisioning
# stop
docker compose down
# this will stop redis and remove all volumes
docker compose down -v
# first generate go code.
go generate ./...
# run engine
go run ./service/entity/...
#go run ./cmd/er/...
{
"externalId": "123e4567-e89b-12d3-a456-426614174000",
"names": [{"first":"sumo", "last": "demo"}],
"gender": "GENDER_MALE",
"emails": {},
"phones": {},
"addresses": []
}
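The sample payload above can be decoded in Go roughly as follows. This is a sketch, not the service's generated proto types: the `Entity` and `Name` structs and the `parseEntity` helper are hypothetical, the `emails`, `phones`, and `addresses` fields are kept as raw JSON because the sample does not show their inner shape, and treating `externalId` as required is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Name mirrors one element of the "names" array in the sample payload.
type Name struct {
	First string `json:"first"`
	Last  string `json:"last"`
}

// Entity is a hypothetical Go view of the sample payload.
type Entity struct {
	ExternalID string          `json:"externalId"`
	Names      []Name          `json:"names"`
	Gender     string          `json:"gender"`
	Emails     json.RawMessage `json:"emails"`    // inner shape not shown in the sample
	Phones     json.RawMessage `json:"phones"`    // inner shape not shown in the sample
	Addresses  json.RawMessage `json:"addresses"` // inner shape not shown in the sample
}

// parseEntity decodes a payload and rejects one without an externalId
// (assumption: the ID is mandatory).
func parseEntity(data []byte) (Entity, error) {
	var e Entity
	if err := json.Unmarshal(data, &e); err != nil {
		return Entity{}, fmt.Errorf("decode entity: %w", err)
	}
	if e.ExternalID == "" {
		return Entity{}, errors.New("externalId is required")
	}
	return e, nil
}

const sample = `{
  "externalId": "123e4567-e89b-12d3-a456-426614174000",
  "names": [{"first": "sumo", "last": "demo"}],
  "gender": "GENDER_MALE",
  "emails": {},
  "phones": {},
  "addresses": []
}`

func main() {
	e, err := parseEntity([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(e.ExternalID, e.Names[0].First, e.Names[0].Last, e.Gender)
}
```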
To see all config environment variable options, run:
CONFY_LOG_LEVEL=debug \
CONFY_DEBUG_MODE=true \
CONFY_VERBOSE_MODE=true \
go run ./service/entity/...
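When debugging configuration, it can help to check which `CONFY_`-prefixed variables the process actually sees. The sketch below uses only the Go standard library; it is not the configuration library's API, and the `prefixedEnv` helper is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// prefixedEnv returns the KEY=VALUE pairs from environ whose key starts
// with prefix, as a map from key to value.
func prefixedEnv(environ []string, prefix string) map[string]string {
	out := make(map[string]string)
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if ok && strings.HasPrefix(key, prefix) {
			out[key] = value
		}
	}
	return out
}

func main() {
	vars := prefixedEnv(os.Environ(), "CONFY_")

	// Sort the keys so the listing is stable between runs.
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, vars[k])
	}
}
```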
# first generate go code. <-- IMPORTANT
go generate ./...
go test -v ./service/entity/...
go test -v ./cmd/er/...
# first generate go code.
go generate ./...
go build -o build ./service/entity/...
go build -o build ./cmd/er/...
The following command bumps the VERSION number, then pushes the changes and the tag to the remote.
A GitHub Action then triggers the GoReleaser process.
NOTE: make sure you commit all changes before running this command.
```shell
# dry-run: calculate the next version based on the commit types since the latest tag
cog bump --auto --dry-run
# bump to the next version, calculated from the commit types since the latest tag
cog bump --auto
```
- check cog docs
- check GoReleaser docs
Test multi-platform docker images
```shell
docker run -it --rm --init --platform linux/amd64 ghcr.io/xmlking/entity-resolution/entity:latest
docker run -it --rm --init --platform linux/arm64 ghcr.io/xmlking/entity-resolution/entity:latest
```
Multi-platform, multi-stage, multi-module local build
#VERSION=$(git describe --tags || echo "HEAD")
VERSION=v0.1.262
BUILD_DATE=$(date +%FT%T%Z)
DOCKER_IMAGE=ghcr.io/xmlking/entity-resolution/entity
# build
docker buildx create --use
docker buildx build --platform linux/arm64,linux/amd64 \
-t $DOCKER_IMAGE:$VERSION \
-t $DOCKER_IMAGE:latest \
--build-arg BUILD_DATE=$BUILD_DATE --build-arg VERSION=$VERSION \
--secret id=BUF_TOKEN,src=buf_token.txt \
-f Dockerfile.local --push .
# inspect
docker buildx imagetools inspect $DOCKER_IMAGE:$VERSION
docker buildx imagetools inspect --raw $DOCKER_IMAGE:$VERSION
# run
docker run -it --rm --platform linux/arm64 $DOCKER_IMAGE:$VERSION
docker run -it --rm --platform linux/amd64 $DOCKER_IMAGE:$VERSION
# Notice `platform` in the build_info logs
# build_info={"branch":"main","build_time":"","commit":"","compiler":"gc","go_version":"go1.19","platform":"linux/arm64","state":"dirty","tag":""}
- Start/Stop ES with docker-compose
- Generate test data
- Batch load test data with benthos
- GoLang