Skip to content

Commit

Permalink
dm-raid: really frozen sync_thread during suspend
Browse files Browse the repository at this point in the history
[ Upstream commit 16c4770 ]

1) commit f52f5c7 ("md: fix stopping sync thread") remove
   MD_RECOVERY_FROZEN from __md_stop_writes() and doesn't realize that
   dm-raid relies on __md_stop_writes() to frozen sync_thread
   indirectly. Fix this problem by adding MD_RECOVERY_FROZEN in
   md_stop_writes(), and since stop_sync_thread() is only used for
   dm-raid in this case, also move stop_sync_thread() to
   md_stop_writes().
2) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
   it only prevent new sync_thread to start, and it can't stop the
   running sync thread; In order to frozen sync_thread, after seting the
   flag, stop_sync_thread() should be used.
3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
   it as condition for md_stop_writes() in raid_postsuspend() doesn't
   look correct. Consider that reentrant stop_sync_thread() do nothing,
   always call md_stop_writes() in raid_postsuspend().
4) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
   and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
   new sync_thread can start unexpected. Fix this by disallow
   raid_message() to change sync_thread status during suspend.

Note that after commit f52f5c7 ("md: fix stopping sync thread"), the
test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
and with previous fixes, the test won't hang there anymore, however, the
test will still fail and complain that ext4 is corrupted. And with this
patch, the test won't hang due to stop_sync_thread() or fail due to ext4
is corrupted anymore. However, there is still a deadlock related to
dm-raid456 that will be fixed in following patches.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
Fixes: 1af2048 ("dm raid: fix deadlock caused by premature md_stop_writes()")
Fixes: 9dbd1aa ("dm raid: add reshaping support to the target")
Fixes: f52f5c7 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-6-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
  • Loading branch information
Yu Kuai authored and gregkh committed Apr 3, 2024
1 parent 3685ad3 commit af916cb
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 11 deletions.
25 changes: 15 additions & 10 deletions drivers/md/dm-raid.c
Expand Up @@ -3240,11 +3240,12 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
rs->md.ro = 1;
rs->md.in_sync = 1;

/* Keep array frozen until resume. */
set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);

/* Has to be held on running the array */
mddev_suspend_and_lock_nointr(&rs->md);

/* Keep array frozen until resume. */
md_frozen_sync_thread(&rs->md);

r = md_run(&rs->md);
rs->md.in_sync = 0; /* Assume already marked dirty */
if (r) {
Expand Down Expand Up @@ -3722,6 +3723,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;

if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
return -EBUSY;

if (!strcasecmp(argv[0], "frozen"))
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
else
Expand Down Expand Up @@ -3796,10 +3800,11 @@ static void raid_postsuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;

if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
/* Writes have to be stopped before suspending to avoid deadlocks. */
if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
md_stop_writes(&rs->md);

/*
* sync_thread must be stopped during suspend, and writes have
* to be stopped before suspending to avoid deadlocks.
*/
md_stop_writes(&rs->md);
mddev_suspend(&rs->md, false);
}
}
Expand Down Expand Up @@ -4012,8 +4017,6 @@ static int raid_preresume(struct dm_target *ti)
}

/* Check for any resize/reshape on @rs and adjust/initiate */
/* Be prepared for mddev_resume() in raid_resume() */
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
mddev->resync_min = mddev->recovery_cp;
Expand Down Expand Up @@ -4055,10 +4058,12 @@ static void raid_resume(struct dm_target *ti)
if (mddev->delta_disks < 0)
rs_set_capacity(rs);

WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
mddev_lock_nointr(mddev);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
mddev->ro = 0;
mddev->in_sync = 0;
md_unfrozen_sync_thread(mddev);
mddev_unlock_and_resume(mddev);
}
}
Expand Down
3 changes: 2 additions & 1 deletion drivers/md/md.c
Expand Up @@ -6369,7 +6369,6 @@ static void md_clean(struct mddev *mddev)

static void __md_stop_writes(struct mddev *mddev)
{
stop_sync_thread(mddev, true, false);
del_timer_sync(&mddev->safemode_timer);

if (mddev->pers && mddev->pers->quiesce) {
Expand All @@ -6394,6 +6393,8 @@ static void __md_stop_writes(struct mddev *mddev)
void md_stop_writes(struct mddev *mddev)
{
mddev_lock_nointr(mddev);
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
stop_sync_thread(mddev, true, false);
__md_stop_writes(mddev);
mddev_unlock(mddev);
}
Expand Down

0 comments on commit af916cb

Please sign in to comment.