[DocDB][LRU] Tserver continues to access SST files that were deleted as a result they are not cleaned from the system[2.8.8.0-b2] #13834

Open
kripasreenivasan opened this issue Sep 1, 2022 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug long_running_universe priority/medium Medium priority issue

kripasreenivasan commented Sep 1, 2022

Jira Link: DB-3337

Description

  • Create a universe in 2.8.8.0-b2 with the following gflags

    [image: gflags screenshot]

  • Run a YCQL (Cassandra) batch time-series workload from a load VM with enough keys that it runs for 5-6 days without interruption.
  • After the workload ends, connect to the leader node of the universe and drop the YCQL table created by that workload.

Issue:
  1. Navigate to the UI and reload the Overview and Nodes tabs of this universe. Although the table was dropped successfully and no longer appears in the Tables tab, the disk usage metrics and SST file size are the same before and after the deletion. On any node, the directory for the dropped table still contains its tablets: /mnt/d0/yb-data/tserver/data/rocksdb/table-c07537a0504c463e92b80e1e0c277841/tablet-1a1a2c6d7d6c4af48265a33194d2d20c
  2. The tserver process holds a large number of open files:
[yugabyte@ip-172-151-27-132 master]$ ps aux|grep yb-tserver
yugabyte  7803  217 34.1 18932688 2629692 ?    Sl   Aug25 21158:29 /home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/server.conf
yugabyte 18297  0.0  0.0 112636   984 pts/0    S+   04:57   0:00 grep --color=auto yb-tserver
  3. It looks like the SST files were deleted (unlinked from the filesystem), but the tserver still holds them open, so the filesystem cannot reclaim their space yet. We should not keep files open once we delete them from the filesystem.
[yugabyte@ip-172-151-27-132 fd]$ ls -lha | grep sst | grep deleted | wc -l
312
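
The same check can be scripted. The sketch below is only an illustration, not part of YugabyteDB: the function name `deleted_open_ssts` is hypothetical, and it assumes a Linux /proc layout where each entry under /proc/<pid>/fd is a symlink whose target ends in " (deleted)" once the file has been unlinked. It mirrors the `grep sst | grep deleted` pipeline above.

```python
import os


def deleted_open_ssts(fd_dir):
    """Return the targets of fd symlinks that point at unlinked SST files.

    fd_dir is expected to look like /proc/<pid>/fd (an assumption about
    the Linux /proc layout); the function name is hypothetical.
    """
    hits = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed after listdir, or the entry is not a symlink.
            continue
        # Same filter as `grep sst | grep deleted` on the ls output.
        if target.endswith(" (deleted)") and ".sst" in target:
            hits.append(target)
    return sorted(hits)
```

For the tserver shown in the ps output, `len(deleted_open_ssts("/proc/7803/fd"))` would give the same kind of count as the shell pipeline, provided it runs as a user allowed to read that directory.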

A rolling restart fixes this; however, we should handle it gracefully without a restart.
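
A restart helps because process exit closes every file descriptor, and on POSIX filesystems the space of an unlinked file is released only when the last open descriptor goes away. The sketch below demonstrates that behaviour in isolation with a temporary file. It says nothing about where the tserver keeps these descriptors, and the function name `unlink_while_open` is made up for the example.

```python
import os
import tempfile


def unlink_while_open(payload=b"x" * 4096):
    """Unlink a file that is still open and report what the fd still sees.

    Returns (gone_from_directory, link_count, size_in_bytes).
    """
    fd, path = tempfile.mkstemp(suffix=".sst")
    try:
        os.write(fd, payload)
        os.unlink(path)  # removes the directory entry only
        gone_from_directory = not os.path.exists(path)
        st = os.fstat(fd)  # the open descriptor still reaches the data
        return gone_from_directory, st.st_nlink, st.st_size
    finally:
        os.close(fd)  # only now can the filesystem reclaim the space
```

The file disappears from the directory listing while its data stays allocated, which matches the symptom in the report: the table is gone from the UI, yet disk usage does not change.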

This was observed repeatedly on a long-running universe: http://portal.aws-stable.dev.yugabyte.com/universes/309cdc4f-a785-4013-86e6-3d6555eab86a/overview

Slack thread: https://yugabyte.slack.com/archives/C01CB38CZHU/p1661420772861299

CC: @bmatican , @ragh

@kripasreenivasan kripasreenivasan added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage long_running_universe labels Sep 1, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Sep 1, 2022
@kripasreenivasan kripasreenivasan changed the title [DocDB][LRU] On dropping table after a long running(6 days) workload ends, tablets associated with that table are not getting deleted since tserver is still accessing them[2.8.8.0-b2] [DocDB][LRU] tserver continues to access SST files that were deleted as a result they are not cleaned from the system[2.8.8.0-b2] Sep 1, 2022
@kripasreenivasan kripasreenivasan changed the title [DocDB][LRU] tserver continues to access SST files that were deleted as a result they are not cleaned from the system[2.8.8.0-b2] [DocDB][LRU] Tserver continues to access SST files that were deleted as a result they are not cleaned from the system[2.8.8.0-b2] Sep 1, 2022
@yugabyte-ci yugabyte-ci assigned rthallamko3 and unassigned bmatican Sep 6, 2022
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Sep 6, 2022

4 participants