Description
Describe the bug
We run Garnet in cluster mode and use transactions alongside it (see https://github.com/microsoft/garnet/blob/main/test/Garnet.test/Extensions/RateLimiterTxn.cs).
We found that keys inserted by a transaction are not replicated to replica nodes.
Steps to reproduce the bug
Run two instances of Garnet. You also need to enable transactions; we use a RATELIMIT transaction similar to the one linked above.
Instance 1 config (Port 7005)
--port 7005 --memory 1g --index 500m --obj-heap-memory 240m --obj-log-memory 10m --obj-index 64m --cluster --clean-cluster-config --aof --aof-commit-freq -1 --gossip-sp 70 --logger-level Trace --logger-freq 5 --fast-commit --main-memory-replication --on-demand-checkpoint --aof-null-device --network-connection-limit 10000
Instance 2 config (Port 7006)
--port 7006 --memory 1g --index 500m --obj-heap-memory 240m --obj-log-memory 10m --obj-index 64m --cluster --clean-cluster-config --aof --aof-commit-freq -1 --gossip-sp 70 --logger-level Trace --logger-freq 5 --fast-commit --main-memory-replication --on-demand-checkpoint --aof-null-device --network-connection-limit 10000
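The two configurations above differ only in the port. As a minimal sketch for scripting the reproduction, the argument list can be generated from one shared set of flags. The helper name `garnet_args` is hypothetical; the flags themselves are copied verbatim from the configs above, and how the server binary is launched is left out because it depends on the local setup.

```python
# Flags shared by both instances, copied from the two configs above.
SHARED_FLAGS = [
    "--memory", "1g",
    "--index", "500m",
    "--obj-heap-memory", "240m",
    "--obj-log-memory", "10m",
    "--obj-index", "64m",
    "--cluster",
    "--clean-cluster-config",
    "--aof",
    "--aof-commit-freq", "-1",
    "--gossip-sp", "70",
    "--logger-level", "Trace",
    "--logger-freq", "5",
    "--fast-commit",
    "--main-memory-replication",
    "--on-demand-checkpoint",
    "--aof-null-device",
    "--network-connection-limit", "10000",
]


def garnet_args(port: int) -> list[str]:
    """Return the argument list for one instance (hypothetical helper)."""
    return ["--port", str(port)] + SHARED_FLAGS


if __name__ == "__main__":
    # Print one command line per instance: primary (7005) and replica (7006).
    for port in (7005, 7006):
        print(" ".join(garnet_args(port)))
```

Keeping the flags in one place makes it harder for the primary and replica configs to drift apart while bisecting the issue.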
Make one instance the primary (7005), assign it the slot range, and make the other node (7006) its replica.
memurai-cli -p 7005
127.0.0.1:7005> CLUSTER MEET 127.0.0.1 7006
127.0.0.1:7005> CLUSTER ADDSLOTSRANGE 0 16383
OK
memurai-cli -p 7006
127.0.0.1:7006> CLUSTER NODES
ab0cb4531668c2303ea3b8cfff647e712369061e 127.0.0.1:7006@17006,CPC-tekul-BRIOW myself,master - 0 0 0 connected
1737b9e096934c763657fdc589a27e988a275ca8 127.0.0.1:7005@17005,CPC-tekul-BRIOW master - 638853143180999062 638853143180986380 1 connected
127.0.0.1:7006> CLUSTER REPLICATE 1737b9e096934c763657fdc589a27e988a275ca8
OK
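When scripting this step, the node ID passed to CLUSTER REPLICATE has to be read out of the CLUSTER NODES reply. The following is a rough sketch of that parsing, assuming the reply has one line per node with the ID in the first field and the flags in the third, as in the output above. The function name `other_node_id` is hypothetical, and the sample text is the reply from this report.

```python
# CLUSTER NODES reply as seen from 127.0.0.1:7006 in this report.
SAMPLE_CLUSTER_NODES = """\
ab0cb4531668c2303ea3b8cfff647e712369061e 127.0.0.1:7006@17006,CPC-tekul-BRIOW myself,master - 0 0 0 connected
1737b9e096934c763657fdc589a27e988a275ca8 127.0.0.1:7005@17005,CPC-tekul-BRIOW master - 638853143180999062 638853143180986380 1 connected
"""


def other_node_id(cluster_nodes_output: str) -> str:
    """Return the ID of the first node that is not flagged 'myself'."""
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        node_id, flags = fields[0], fields[2].split(",")
        if "myself" not in flags:
            return node_id
    raise ValueError("no remote node found in CLUSTER NODES output")


if __name__ == "__main__":
    print("CLUSTER REPLICATE", other_node_id(SAMPLE_CLUSTER_NODES))
```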
Run a transaction on the primary node. KEYS * shows that two keys are present. The same can be observed using the SCAN command with TYPE zset.
127.0.0.1:7005> RATELIMIT X 1000000 1000000
ALLOWED 1
(2.98s)
127.0.0.1:7005> RATELIMIT Y 100000 100000
ALLOWED 1
(3.14s)
127.0.0.1:7005> KEYS *
1) "X"
2) "Y"
(0.95s)
But no keys are replicated to the Replica node.
127.0.0.1:7006> KEYS *
(empty array)
(4.00s)
The issue is specific to transactions. If we insert into a sorted set manually with ZADD, the key is replicated as expected.
127.0.0.1:7005> ZADD Z 100000 100000
(integer) 0
127.0.0.1:7005> KEYS *
1) "X"
2) "Z"
3) "Y"
On replica node:
127.0.0.1:7006> KEYS *
1) "Z"
(12.05s)
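The discrepancy can be expressed as a simple set difference between the two KEYS * replies. This is only an illustrative sketch: the function name `missing_on_replica` is hypothetical, and the key lists are hard-coded from the outputs above rather than fetched from a live cluster.

```python
def missing_on_replica(primary_keys, replica_keys):
    """Return the keys present on the primary but absent on the replica."""
    return sorted(set(primary_keys) - set(replica_keys))


if __name__ == "__main__":
    primary_keys = ["X", "Z", "Y"]  # KEYS * on 127.0.0.1:7005
    replica_keys = ["Z"]            # KEYS * on 127.0.0.1:7006
    # Only the keys written by the RATELIMIT transaction are missing.
    print(missing_on_replica(primary_keys, replica_keys))
```

A check like this could serve as the assertion in an automated regression test once a fix is available.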
On the replica node, the ReplicaReplayTask fails because clusterSession within respSession is uninitialized during ResetCacheSlotVerificationResult.
Attaching the stacktrace
Expected behavior
Replication happens as expected and replica nodes have all the keys.
Screenshots
No response
Release version
v1.0.70
IDE
No response
OS version
No response
Additional context
No response