
feat: add GetTaskQueueUserData RPC to admin service #9934

Merged
veeral-patel merged 5 commits into temporalio:main from veeral-patel:feature/task-queue-get-user-data-admin-rpc on Apr 15, 2026

Conversation

Contributor

@veeral-patel veeral-patel commented Apr 13, 2026

What changed?

This PR adds a new GetTaskQueueUserData RPC to the admin service.

Given a namespace, task queue name, task queue type, and optional partition ID (default is 0, which is root), it returns the user data currently loaded by that partition.

It wraps the existing GetTaskQueueUserData RPC in the Matching Service. A tdbg command that calls this Admin Service RPC will follow in a separate PR.

Why?

Each task queue family has associated metadata, stored in TaskQueueUserData. Metadata related to worker versioning, queue rate limiting, and fairness is all stored there.

TaskQueueUserData is replicated from the root partition to all other partitions.

However, there is no admin-accessible way to read the user data loaded by a specific partition or compare versions across partitions to diagnose replication lag.
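
To illustrate the kind of diagnosis this RPC is meant to enable, here is a minimal sketch of a cross-partition version comparison. It does not call the real admin client: `versionFetcher` is a hypothetical stand-in for a call to the new RPC, and the sketch assumes the caller knows the partition count and that partitions are numbered contiguously from 0.

```go
package main

import "fmt"

// versionFetcher is a hypothetical stand-in for a call to
// AdminService.GetTaskQueueUserData; it returns the user data version
// currently loaded by the given partition.
type versionFetcher func(partitionID int32) (int64, error)

// laggingPartitions returns the IDs of non-root partitions whose loaded
// version is behind the root partition (partition 0).
func laggingPartitions(fetch versionFetcher, partitionCount int32) ([]int32, error) {
	rootVersion, err := fetch(0)
	if err != nil {
		return nil, fmt.Errorf("fetch root partition: %w", err)
	}
	var lagging []int32
	for id := int32(1); id < partitionCount; id++ {
		v, err := fetch(id)
		if err != nil {
			return nil, fmt.Errorf("fetch partition %d: %w", id, err)
		}
		if v < rootVersion {
			lagging = append(lagging, id)
		}
	}
	return lagging, nil
}

func main() {
	// Fake data: partition 2 has not yet caught up with the root.
	versions := map[int32]int64{0: 2, 1: 2, 2: 1}
	fetch := func(id int32) (int64, error) { return versions[id], nil }

	lagging, err := laggingPartitions(fetch, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println("lagging partitions:", lagging)
}
```

In a real tool the fetcher would wrap the admin RPC with a fixed namespace, task queue, and task queue type.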

Files changed

| File | Change |
|---|---|
| `proto/internal/.../adminservice/v1/request_response.proto` | Added GetTaskQueueUserDataRequest and GetTaskQueueUserDataResponse messages |
| `proto/internal/.../adminservice/v1/service.proto` | Added GetTaskQueueUserData RPC to AdminService |
| `service/frontend/admin_handler.go` | Implemented AdminHandler.GetTaskQueueUserData: validates request, resolves namespace → ID, builds partition RPC name via tqid, calls matching service, returns per-type entry + version |
| `service/frontend/admin_handler_test.go` | Added unit tests |
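
The handler flow listed for `admin_handler.go` can be sketched roughly as follows. This is not the merged implementation: the `request`, `response`, and `deps` types are simplified stand-ins invented for illustration, the error messages are assumptions, and `errNamespaceNotFound` is a hypothetical name. Only the ordering of the steps is taken from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// errRequestNotSet and errNamespaceNotSet are named in the PR's test table;
// their messages and errNamespaceNotFound are assumptions for this sketch.
var (
	errRequestNotSet     = errors.New("request not set")
	errNamespaceNotSet   = errors.New("namespace not set")
	errNamespaceNotFound = errors.New("namespace not found")
)

// request and response are simplified stand-ins for the proto messages.
type request struct {
	Namespace   string
	TaskQueue   string
	PartitionID int32
}

type response struct {
	Version int64
}

// deps holds hypothetical hooks for the collaborators used by the handler.
type deps struct {
	resolveNamespace func(name string) (string, error)
	partitionRPCName func(taskQueue string, partitionID int32) (string, error)
	callMatching     func(namespaceID, rpcName string) (int64, error)
}

// getTaskQueueUserData validates, resolves the namespace, builds the
// partition RPC name, and only then calls matching.
func getTaskQueueUserData(d deps, req *request) (*response, error) {
	if req == nil {
		return nil, errRequestNotSet
	}
	if req.Namespace == "" {
		return nil, errNamespaceNotSet
	}
	nsID, err := d.resolveNamespace(req.Namespace)
	if err != nil {
		return nil, err // matching is never called
	}
	name, err := d.partitionRPCName(req.TaskQueue, req.PartitionID)
	if err != nil {
		return nil, err // matching is never called
	}
	version, err := d.callMatching(nsID, name)
	if err != nil {
		return nil, err
	}
	return &response{Version: version}, nil
}

func main() {
	d := deps{
		resolveNamespace: func(name string) (string, error) {
			if name != "default" {
				return "", errNamespaceNotFound
			}
			return "ns-id-1", nil
		},
		partitionRPCName: func(tq string, _ int32) (string, error) { return tq, nil },
		callMatching:     func(_, _ string) (int64, error) { return 2, nil },
	}
	resp, err := getTaskQueueUserData(d, &request{Namespace: "default", TaskQueue: "my-queue"})
	fmt.Println(resp, err)

	_, err = getTaskQueueUserData(d, &request{Namespace: "nonexistent", TaskQueue: "my-queue"})
	fmt.Println(err)
}
```

Passing the collaborators as function fields keeps the sketch free of server dependencies; the real handler uses the namespace registry, the tqid package, and the matching client.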

How did you test it?

  • built
  • run locally and tested manually
  • added new unit test(s)
  • added new integration test(s) - not applicable, not touching persistence layer
  • added new functional test(s)

Unit tests

100% unit test coverage

| Test case | Input | Expected |
|---|---|---|
| Nil request | request == nil | errRequestNotSet |
| Empty namespace | namespace == "" | errNamespaceNotSet |
| Namespace not found | Namespace registry returns not-found | Error propagated; matching never called |
| Invalid task queue name | task_queue starts with /_sys/ | INVALID_ARGUMENT from tqid.NewTaskQueueFamily; matching never called |
| Root partition | partition_id=0, workflow type | Sends bare name my-queue to matching; returns correct user_data and version |
| Non-root partition | partition_id=1, workflow type | Sends mangled name /_sys/my-queue/1 to matching |
| No per-type data | Matching returns response with empty per_type map | user_data is nil; version still populated |
| Matching error | Matching client returns error | Error propagated to caller |
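
The naming behaviour exercised by the root, non-root, and invalid-name rows can be approximated as below. This is a sketch written from the table alone, not the tqid library's code: `partitionRPCName` and `reservedPrefix` are hypothetical names, and negative partition IDs are deliberately not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedPrefix marks mangled partition names; user-supplied task queue
// names that start with it are rejected.
const reservedPrefix = "/_sys/"

// partitionRPCName returns the bare name for the root partition (0) and a
// mangled name of the form /_sys/<name>/<id> for any other partition.
func partitionRPCName(taskQueue string, partitionID int) (string, error) {
	if strings.HasPrefix(taskQueue, reservedPrefix) {
		return "", fmt.Errorf("invalid task queue name %q: reserved prefix %q", taskQueue, reservedPrefix)
	}
	if partitionID == 0 {
		return taskQueue, nil
	}
	return fmt.Sprintf("%s%s/%d", reservedPrefix, taskQueue, partitionID), nil
}

func main() {
	for _, id := range []int{0, 1} {
		name, err := partitionRPCName("my-queue", id)
		fmt.Println(name, err)
	}
	_, err := partitionRPCName("/_sys/my-queue/1", 0)
	fmt.Println(err)
}
```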

Functional tests

| Test | Setup | What it verifies |
|---|---|---|
| TestAdminGetTaskQueueUserData_RootPartition | Write fairness weight config to a workflow task queue | Admin RPC resolves namespace by name, routes to root partition (partition_id=0), returns version > 0 and non-nil per-type data |
| TestAdminGetTaskQueueUserData_NonRootPartition | Same write, then poll until non-root partition replicates | Admin RPC routes to a non-root partition (partition_id=1) via mangled name, returns the same version as root after replication |
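
The "poll until non-root partition replicates" step in the second test could look something like the helper below. It is an illustrative sketch, not the test code from this PR: `waitForVersion` is a hypothetical name, and `fetch` stands in for a call to the admin RPC against partition 1.

```go
package main

import (
	"fmt"
	"time"
)

// waitForVersion polls fetch until it reports a version of at least want,
// or until timeout has elapsed. fetch is a hypothetical stand-in for
// reading the version loaded by a non-root partition.
func waitForVersion(timeout, interval time.Duration, want int64, fetch func() (int64, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		got, err := fetch()
		if err != nil {
			return err
		}
		if got >= want {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out: partition at version %d, want at least %d", got, want)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated non-root partition that catches up on the third poll.
	polls := 0
	fetch := func() (int64, error) {
		polls++
		if polls < 3 {
			return 1, nil
		}
		return 2, nil
	}
	err := waitForVersion(time.Second, 10*time.Millisecond, 2, fetch)
	fmt.Println("polls:", polls, "err:", err)
}
```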

Manual tests

Setup
  1. Build and start the server: make temporal-server && make start-sqlite
  2. Create namespace: temporal operator namespace create default
  3. Insert assignment rule: temporal task-queue versioning insert-assignment-rule
Case 1 — Root partition, workflow type
grpcurl -plaintext \
  -d '{"namespace":"default","task_queue":"my-queue","task_queue_type":"TASK_QUEUE_TYPE_WORKFLOW"}' \
  localhost:7233 \
  temporal.server.api.adminservice.v1.AdminService/GetTaskQueueUserData
{
  "version": "1"
}
Case 2 — Root partition, activity type
grpcurl -plaintext \
  -d '{"namespace":"default","task_queue":"my-queue","task_queue_type":"TASK_QUEUE_TYPE_ACTIVITY"}' \
  localhost:7233 \
  temporal.server.api.adminservice.v1.AdminService/GetTaskQueueUserData
{
  "version": "1"
}
Case 3 — Non-root partition, workflow type
grpcurl -plaintext \
  -d '{"namespace":"default","task_queue":"my-queue","task_queue_type":"TASK_QUEUE_TYPE_WORKFLOW","partition_id":1}' \
  localhost:7233 \
  temporal.server.api.adminservice.v1.AdminService/GetTaskQueueUserData
{
  "version": "1"
}
Case 4 — Non-root partition, activity type
grpcurl -plaintext \
  -d '{"namespace":"default","task_queue":"my-queue","task_queue_type":"TASK_QUEUE_TYPE_ACTIVITY","partition_id":1}' \
  localhost:7233 \
  temporal.server.api.adminservice.v1.AdminService/GetTaskQueueUserData
{
  "version": "1"
}
Case 5 — Non-root partition, activity type, with user data

Setup: Add rate limit config

grpcurl -plaintext \
  -d '{
    "namespace": "default",
    "task_queue": "my-queue",
    "task_queue_type": "TASK_QUEUE_TYPE_ACTIVITY",
    "update_queue_rate_limit": {
      "rate_limit": {
        "requests_per_second": 50.0
      },
      "reason": "manual test"
    }
  }' \
  localhost:7233 \
  temporal.api.workflowservice.v1.WorkflowService/UpdateTaskQueueConfig
grpcurl -plaintext \
  -d '{
    "namespace": "default",
    "task_queue": "my-queue",
    "task_queue_type": "TASK_QUEUE_TYPE_ACTIVITY"
  }' \
  localhost:7233 \
  temporal.server.api.adminservice.v1.AdminService/GetTaskQueueUserData
{
  "userData": {
    "config": {
      "queueRateLimit": {
        "rateLimit": {
          "requestsPerSecond": 50
        },
        "metadata": {
          "reason": "manual test",
          "updateTime": "2026-04-13T21:40:39.888Z"
        }
      }
    }
  },
  "version": "2"
}
Case 6 — Namespace not found
grpcurl -plaintext \
  -d '{"namespace":"nonexistent","task_queue":"my-queue","task_queue_type":"TASK_QUEUE_TYPE_WORKFLOW"}' \
  localhost:7233 \
  temporal.server.api.adminservice.v1.AdminService/GetTaskQueueUserData
ERROR:
  Code: NotFound
  Message: Namespace nonexistent is not found.

NOT_FOUND from namespace registry; matching never called.

@veeral-patel veeral-patel requested review from a team as code owners April 13, 2026 21:47

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

@veeral-patel veeral-patel force-pushed the feature/task-queue-get-user-data-admin-rpc branch 2 times, most recently from bfb53ca to ba8f70b Compare April 15, 2026 00:54
veeral-patel and others added 4 commits April 14, 2026 17:56
Exposes per-type task queue user data via the admin service, proxying to
the existing matching service RPC. Accepts namespace name + task queue +
type + optional partition_id, resolving the partition to its wire-format
RPC name for consistent-hash routing to the correct matching host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@veeral-patel veeral-patel force-pushed the feature/task-queue-get-user-data-admin-rpc branch from ba8f70b to 347c4e1 Compare April 15, 2026 00:56

// Fetch the user data currently loaded by the target partition.
// LastKnownUserDataVersion=0: no cached version, always return current data.
// LastKnownEphemeralDataVersion=-1: skip ephemeral data; we only need persisted per-type data.
Member

as a potential follow up, I wonder if we would ever need a lens to look at the ephemeral user data given that this data is not replicated between partitions.

maybe a question for david/kannan though and is def not a blocker for this PR

Contributor

yes, I would like to see the ephemeral data returned through this rpc. that'll be useful for debugging. (and it is replicated between partitions)

if len(request.Namespace) == 0 {
	return nil, errNamespaceNotSet
}

Member

I think we should have a check here to see whether the task queue name being passed is of valid length; otherwise, we would make an RPC call to matching and waste that trip

Member

slightly overkill, but to add on, we should also probably check that partition IDs are non-negative, since matching would return an error in those cases.
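
For reference, the two pre-flight checks suggested in this thread might be sketched as below. This is a hypothetical illustration, not code from the PR: `validateTarget` is an invented name and `maxTaskQueueNameLength` is an assumed limit (the server's actual limit is configurable and is not stated in this conversation).

```go
package main

import (
	"errors"
	"fmt"
)

// maxTaskQueueNameLength is an assumed limit used only for this sketch.
const maxTaskQueueNameLength = 1000

// validateTarget rejects requests that matching would reject anyway, so the
// handler can fail fast without making the RPC.
func validateTarget(taskQueue string, partitionID int32) error {
	if taskQueue == "" {
		return errors.New("task queue name is not set")
	}
	if len(taskQueue) > maxTaskQueueNameLength {
		return fmt.Errorf("task queue name is %d bytes, limit is %d", len(taskQueue), maxTaskQueueNameLength)
	}
	if partitionID < 0 {
		return fmt.Errorf("partition_id must be non-negative, got %d", partitionID)
	}
	return nil
}

func main() {
	fmt.Println(validateTarget("my-queue", 0))
	fmt.Println(validateTarget("my-queue", -1))
	fmt.Println(validateTarget("", 0))
}
```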

@veeral-patel veeral-patel merged commit 94556cc into temporalio:main Apr 15, 2026
46 checks passed
veeral-patel added a commit that referenced this pull request Apr 15, 2026
#9935)

## What changed?
This PR adds a `tdbg taskqueue get-user-data` CLI command to `tdbg` to
get the `TaskQueueUserData` for a particular task queue partition.

This PR depends on #9934. See #9934 for more context.
## Files changed
| File | Change |
|---|---|
| `tools/tdbg/tdbg_commands.go` | Registered `get-user-data` subcommand under `taskqueue` with `--namespace`, `--task-queue`, `--task-queue-type`, and `--partition-id` flags |
| `tools/tdbg/task_queue_commands.go` | Implemented `AdminGetTaskQueueUserData`: reads flags, calls `AdminService.GetTaskQueueUserData`, pretty-prints response |
| `tools/tdbg/task_queue_commands_test.go` | Added unit tests |
## How did you test it?                                   

- [x] built
- [x] run locally and tested manually
- [x] added new unit test(s)
- [x] added new integration test(s) - not applicable
- [x] added new functional test(s) - not applicable (CLI change)
   
### Unit tests
                                                            
| Test case | Input | Expected |
|---|---|---|
| Missing namespace | `--task-queue my-queue` only | Error before RPC is called |
| Missing task queue | `--namespace default` only | Error before RPC is called |
| Invalid task queue type | `--task-queue-type INVALID` | `StringToEnum` returns error before RPC is called |
| Unspecified task queue type | `--task-queue-type TASK_QUEUE_TYPE_UNSPECIFIED` | Defaults to `TASK_QUEUE_TYPE_WORKFLOW`, succeeds |
| Root partition (default) | Valid flags, no `--partition-id` | Calls RPC with `partition_id=0`, prints response |
| Non-root partition | `--partition-id 1` | Calls RPC with `partition_id=1`, prints response |
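
The task queue type handling described in the table (invalid value is an error, unspecified falls back to workflow) can be sketched as follows. This is an assumption-labeled illustration rather than the `tdbg` code: `taskQueueType` is a local stand-in for the API enum, listing only the values named in this PR, and `parseTaskQueueType` is a hypothetical helper rather than the `StringToEnum` function mentioned above.

```go
package main

import "fmt"

// taskQueueType is a local stand-in for the API enum.
type taskQueueType int32

const (
	taskQueueTypeUnspecified taskQueueType = iota
	taskQueueTypeWorkflow
	taskQueueTypeActivity
)

var taskQueueTypeByName = map[string]taskQueueType{
	"TASK_QUEUE_TYPE_UNSPECIFIED": taskQueueTypeUnspecified,
	"TASK_QUEUE_TYPE_WORKFLOW":    taskQueueTypeWorkflow,
	"TASK_QUEUE_TYPE_ACTIVITY":    taskQueueTypeActivity,
}

// parseTaskQueueType maps a flag value to the enum. Unknown names are an
// error; the unspecified value defaults to the workflow type.
func parseTaskQueueType(s string) (taskQueueType, error) {
	t, ok := taskQueueTypeByName[s]
	if !ok {
		return taskQueueTypeUnspecified, fmt.Errorf("unknown task queue type %q", s)
	}
	if t == taskQueueTypeUnspecified {
		return taskQueueTypeWorkflow, nil
	}
	return t, nil
}

func main() {
	for _, s := range []string{"TASK_QUEUE_TYPE_ACTIVITY", "TASK_QUEUE_TYPE_UNSPECIFIED", "INVALID"} {
		t, err := parseTaskQueueType(s)
		fmt.Println(s, t, err)
	}
}
```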
   
### Manual tests

**Setup**
```
make start-sqlite
temporal operator namespace create -n default
temporal task-queue versioning insert-assignment-rule --namespace default \
  --task-queue my-queue --build-id "test-build-1" --rule-index 0 --yes
temporal task-queue config set --namespace default --task-queue my-queue \
  --task-queue-type activity --queue-rps-limit 50 \
  --queue-rps-limit-reason "manual test"
```
   
**Root partition, workflow type**
```
$ ./tdbg taskqueue get-user-data --namespace default --task-queue my-queue \
    --task-queue-type TASK_QUEUE_TYPE_WORKFLOW
{
  "version": "2"
}
```
`version=2` reflects writes from the assignment rule; `user_data` absent as expected (versioning rules live in `versioning_data`, not `per_type`).

**Root partition, activity type**
```
$ ./tdbg taskqueue get-user-data --namespace default --task-queue my-queue \
    --task-queue-type TASK_QUEUE_TYPE_ACTIVITY
{
  "version": "2",
  "user_data": {
    "config": {
      "queueRateLimit": {
        "rateLimit": { "requestsPerSecond": 50 },
        "metadata": { "reason": "manual test", "updateTime": "..." }
      }
    }
  }
}
```
`user_data` populated with the rate limit config set via `UpdateTaskQueueConfig`.

**Non-root partition, workflow type**
```
$ ./tdbg taskqueue get-user-data --namespace default --task-queue my-queue \
    --task-queue-type TASK_QUEUE_TYPE_WORKFLOW --partition-id 1
{
  "version": "2"
}
```
Same version as root, so replication is working.

**Non-root partition, activity type**
```
$ ./tdbg taskqueue get-user-data --namespace default --task-queue my-queue \
    --task-queue-type TASK_QUEUE_TYPE_ACTIVITY --partition-id 1
{
  "userData": {
    "config": {
      "queueRateLimit": {
        "rateLimit": {
          "requestsPerSecond": 50
        },
        "metadata": {
          "reason": "manual test",
          "updateTime": "2026-04-13T21:40:39.888Z"
        }
      }
    }
  },
  "version": "2"
}
```

**Unspecified type defaults to workflow**
```
$ ./tdbg taskqueue get-user-data --namespace default --task-queue my-queue \
    --task-queue-type TASK_QUEUE_TYPE_UNSPECIFIED
{
  "version": "2"
}
```
Passing `TASK_QUEUE_TYPE_UNSPECIFIED` defaults to workflow type instead of erroring; output matches the workflow-type test above.

---------

Co-authored-by: Veeral Patel <veeralpatel@Veerals-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Comment on lines +1853 to +1857
// partition_id=0 (root) → bare task queue name, e.g. "my-queue".
// partition_id=N → mangled name, e.g. "/_sys/my-queue/N".
// The matching client uses this name for consistent-hash routing to the correct host,
// and the matching engine parses it to find the right in-memory partition manager.
// namespaceID is passed for correctness even though RpcName() only uses the task queue name.
Contributor

these comments are just repeating information that's already in the tqid library, they're not necessary here

Comment on lines +1878 to +1885
// User data is a family-level map keyed by TaskQueueType (int32).
// Extract only the entry for the requested type and return it alongside the version,
// so callers can compare versions across partitions to check replication lag.
perType := resp.GetUserData().GetData().GetPerType()
return &adminservice.GetTaskQueueUserDataResponse{
	UserData: perType[int32(request.GetTaskQueueType())],
	Version:  resp.GetUserData().GetVersion(),
}, nil
Contributor

I think this rpc should return the full userdata. there's no reason at all to limit things to just the per-type data. the caller can drill down if they want.
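
As an aside on the snippet under review: the per-type lookup is why the PR reports "user_data is nil; version still populated" when no entry exists for the requested type. A minimal sketch of that behaviour, using simplified hypothetical structs (`userData`, `perTypeData`, `response`) in place of the generated proto types and their nil-safe getters:

```go
package main

import "fmt"

// perTypeData, userData, and response are simplified stand-ins for the
// generated proto messages; field names are illustrative only.
type perTypeData struct {
	RateLimitRPS float64
}

type userData struct {
	Version int64
	PerType map[int32]*perTypeData
}

type response struct {
	UserData *perTypeData
	Version  int64
}

// perTypeResponse extracts the entry for one task queue type. Indexing a
// map with a missing key (or a nil map) yields a nil pointer, so the
// version is returned even when there is no per-type data.
func perTypeResponse(ud *userData, taskQueueType int32) response {
	if ud == nil {
		return response{}
	}
	return response{
		UserData: ud.PerType[taskQueueType],
		Version:  ud.Version,
	}
}

func main() {
	ud := &userData{
		Version: 2,
		PerType: map[int32]*perTypeData{2: &perTypeData{RateLimitRPS: 50}},
	}
	fmt.Printf("%+v\n", perTypeResponse(ud, 1))
	fmt.Printf("%+v\n", perTypeResponse(ud, 2).UserData)
}
```

Returning the full user data, as suggested in the comment above, would remove the lookup and leave the drill-down to the caller.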

s.ErrorAs(err, &notFoundErr)
}

func (s *adminHandlerSuite) TestGetTaskQueueUserData() {
Contributor

these test cases are pretty low-value and I don't think we need them. we don't need to test trivial validations, especially on admin handler (which is not accessible to users). we also shouldn't test the rpc name mangling, that's other code's responsibility. we can just have one or two cases for a successful and error call to matching.


5 participants