Replication does not work for Transactions #1250

Open
@Xizt

Description

Describe the bug

We run Garnet in cluster mode and also use transactions (https://github.com/microsoft/garnet/blob/main/test/Garnet.test/Extensions/RateLimiterTxn.cs).

We found that keys inserted through transactions are not replicated to replica nodes.

Steps to reproduce the bug

Run two instances of Garnet with transactions enabled. We use a RateLimit transaction similar to the one linked above.
Instance 1 config (Port 7005)
--port 7005 --memory 1g --index 500m --obj-heap-memory 240m --obj-log-memory 10m --obj-index 64m --cluster --clean-cluster-config --aof --aof-commit-freq -1 --gossip-sp 70 --logger-level Trace --logger-freq 5 --fast-commit --main-memory-replication --on-demand-checkpoint --aof-null-device --network-connection-limit 10000

Instance 2 config (Port 7006)
--port 7006 --memory 1g --index 500m --obj-heap-memory 240m --obj-log-memory 10m --obj-index 64m --cluster --clean-cluster-config --aof --aof-commit-freq -1 --gossip-sp 70 --logger-level Trace --logger-freq 5 --fast-commit --main-memory-replication --on-demand-checkpoint --aof-null-device --network-connection-limit 10000
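The two configurations above differ only in the port. As a minimal sketch (the helper name `garnet_args` is hypothetical, and the flags are copied verbatim from this report), the argument lists can be generated from one shared definition so the two instances cannot drift apart:

```python
# Flags copied verbatim from the two instance configs in this report.
# Only --port differs between the instances.
COMMON_FLAGS = (
    "--memory 1g --index 500m --obj-heap-memory 240m --obj-log-memory 10m "
    "--obj-index 64m --cluster --clean-cluster-config --aof --aof-commit-freq -1 "
    "--gossip-sp 70 --logger-level Trace --logger-freq 5 --fast-commit "
    "--main-memory-replication --on-demand-checkpoint --aof-null-device "
    "--network-connection-limit 10000"
)


def garnet_args(port: int) -> list:
    """Return the command-line arguments for one instance (hypothetical helper)."""
    return ["--port", str(port), *COMMON_FLAGS.split()]


if __name__ == "__main__":
    for port in (7005, 7006):
        print(" ".join(garnet_args(port)))
```

The printed lines can be passed to whichever Garnet server executable is in use; the executable name is intentionally left out because it is not stated in this report.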


Make one instance the primary (7005), assign it the full slot range, and make the other instance (7006) its replica.
memurai-cli -p 7005
127.0.0.1:7005> CLUSTER MEET 127.0.0.1 7006
127.0.0.1:7005> CLUSTER ADDSLOTSRANGE 0 16383
OK

memurai-cli -p 7006
127.0.0.1:7006> CLUSTER NODES
ab0cb4531668c2303ea3b8cfff647e712369061e 127.0.0.1:7006@17006,CPC-tekul-BRIOW myself,master - 0 0 0 connected
1737b9e096934c763657fdc589a27e988a275ca8 127.0.0.1:7005@17005,CPC-tekul-BRIOW master - 638853143180999062 638853143180986380 1 connected
127.0.0.1:7006> CLUSTER REPLICATE 1737b9e096934c763657fdc589a27e988a275ca8
OK
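The node ID passed to CLUSTER REPLICATE is taken from the CLUSTER NODES output by hand above. A small sketch of doing that lookup programmatically, assuming the standard whitespace-separated CLUSTER NODES line layout (id, address, flags, ...); the function names are hypothetical and the sample text is the output shown in this report:

```python
def parse_cluster_nodes(text: str) -> list:
    """Parse CLUSTER NODES output into a list of dicts (one per node)."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        # Address looks like ip:port@cport[,hostname]
        host_port = fields[1].split("@", 1)[0]
        port = int(host_port.rsplit(":", 1)[1])
        nodes.append({
            "id": fields[0],
            "port": port,
            "flags": fields[2].split(","),
            "slots": fields[8:],
        })
    return nodes


def node_id_for_port(text: str, port: int) -> str:
    """Return the node ID listening on the given client port."""
    for node in parse_cluster_nodes(text):
        if node["port"] == port:
            return node["id"]
    raise KeyError(f"no node found on port {port}")


SAMPLE = """\
ab0cb4531668c2303ea3b8cfff647e712369061e 127.0.0.1:7006@17006,CPC-tekul-BRIOW myself,master - 0 0 0 connected
1737b9e096934c763657fdc589a27e988a275ca8 127.0.0.1:7005@17005,CPC-tekul-BRIOW master - 638853143180999062 638853143180986380 1 connected
"""

if __name__ == "__main__":
    print("CLUSTER REPLICATE", node_id_for_port(SAMPLE, 7005))
```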


Run a transaction on the primary node. KEYS * then shows that two keys are present. The same can be observed using the SCAN command with TYPE zset.

127.0.0.1:7005> RATELIMIT X 1000000 1000000
ALLOWED 1
(2.98s)
127.0.0.1:7005> RATELIMIT Y 100000 100000
ALLOWED 1
(3.14s)
127.0.0.1:7005> KEYS *
1) "X"
2) "Y"
(0.95s)

But no keys are replicated to the Replica node.
127.0.0.1:7006> KEYS *
(empty array)
(4.00s)

The issue is specific to transactions. If we insert manually into a sorted set with ZADD, the key is replicated as expected.
127.0.0.1:7005> ZADD Z 100000 100000
(integer) 0
127.0.0.1:7005> KEYS *
1) "X"
2) "Z"
3) "Y"

On replica node:
127.0.0.1:7006> KEYS *
1) "Z"
(12.05s)
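The comparison above can be expressed as a simple set difference. This is only an illustrative sketch: `missing_on_replica` is a hypothetical helper, and the key lists are the KEYS * results quoted in this report rather than values fetched from a live server.

```python
def missing_on_replica(primary_keys, replica_keys):
    """Return the keys present on the primary but absent on the replica."""
    return sorted(set(primary_keys) - set(replica_keys))


# KEYS * results copied from the transcripts in this report.
primary_keys = ["X", "Z", "Y"]
replica_keys = ["Z"]

if __name__ == "__main__":
    # Prints ['X', 'Y']: exactly the keys written by the RATELIMIT transaction.
    print(missing_on_replica(primary_keys, replica_keys))
```

In a real check the two lists would come from running KEYS * (or SCAN) against ports 7005 and 7006 with any Redis-compatible client.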


On the replica node, the ReplicaReplayTask fails because clusterSession within respSession is uninitialized when ResetCacheSlotVerificationResult is called.

Stack trace attached:

Image

Image

Expected behavior

Replication happens as expected and replica nodes have all the keys.

Screenshots

No response

Release version

v1.0.70

IDE

No response

OS version

No response

Additional context

No response
