Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async I/O on a zfs file system causes system hang #7335

Closed
ColinIanKing opened this issue Mar 22, 2018 · 7 comments
Closed

async I/O on a zfs file system causes system hang #7335

ColinIanKing opened this issue Mar 22, 2018 · 7 comments
Projects

Comments

@ColinIanKing
Copy link
Contributor

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version Bionic, Artful, Xenial
Linux Kernel 4.15.0-13-generic, 4.13.0-37-generic, 4.4.0-116-generic
Architecture x86-64
ZFS Version 0.7.5-1ubuntu6, 0.6.5.11-1ubuntu3, 0.6.5.6-0ubuntu16
SPL Version 0.7.5-1ubuntu1, 0.6.5.11-1ubuntu1, 0.6.5.6-0ubuntu4

Running the reproducer program shown below will cause a soft lockup. This occurs after the async I/O read occurs. strace on the program shows it hangs on the read io_submit. If the read/write buffer being passed is NOT mmap'd then the hang does not happen.

io_setup(1024, [0x7f0a46d58000]) = 0
openat(AT_FDCWD, "testfile", O_RDWR|O_CREAT, 04130) = 3
ftruncate(3, 512) = 0
brk(NULL) = 0x55645fa55000
brk(0x55645fa76000) = 0x55645fa76000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f0a46d7a000
io_submit(0x7f0a46d58000, 1, [{aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=3, aio_buf="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., aio_nbytes=512, aio_offset=0}]) = 1
io_getevents(0x7f0a46d58000, 0, 1, [{data=0, obj=0x7ffda2080710, res=512, res2=0}], {tv_sec=30, tv_nsec=0}) = 1
io_submit(0x7f0a46d58000, 1, [{aio_lio_opcode=IOCB_CMD_PREAD, aio_fildes=3, aio_buf=0x7f0a46d7a000, aio_nbytes=512, aio_offset=0}]

Mar 22 17:06:39 ubuntu kernel: [  605.143429] INFO: task 5.p:4208 blocked for more than 120 seconds.
Mar 22 17:06:39 ubuntu kernel: [  605.145306]       Tainted: P           O     4.15.0-13-generic #14-Ubuntu
Mar 22 17:06:39 ubuntu kernel: [  605.147111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 22 17:06:39 ubuntu kernel: [  605.149019] 5.p             D    0  4208   4206 0x00000003
Mar 22 17:06:39 ubuntu kernel: [  605.149025] Call Trace:
Mar 22 17:06:39 ubuntu kernel: [  605.149095]  __schedule+0x297/0x8b0
Mar 22 17:06:39 ubuntu kernel: [  605.149112]  schedule+0x2c/0x80
Mar 22 17:06:39 ubuntu kernel: [  605.149137]  io_schedule+0x16/0x40
Mar 22 17:06:39 ubuntu kernel: [  605.149153]  __lock_page+0xff/0x140
Mar 22 17:06:39 ubuntu kernel: [  605.149159]  ? page_cache_tree_insert+0xe0/0xe0
Mar 22 17:06:39 ubuntu kernel: [  605.149163]  filemap_page_mkwrite+0xb3/0xc0
Mar 22 17:06:39 ubuntu kernel: [  605.149171]  do_page_mkwrite+0x35/0x90
Mar 22 17:06:39 ubuntu kernel: [  605.149187]  do_wp_page+0x22b/0x600
Mar 22 17:06:39 ubuntu kernel: [  605.149193]  ? mutex_lock+0x12/0x40
Mar 22 17:06:39 ubuntu kernel: [  605.149197]  handle_pte_fault+0x69d/0xdb0
Mar 22 17:06:39 ubuntu kernel: [  605.149201]  ? mutex_lock+0x12/0x40
Mar 22 17:06:39 ubuntu kernel: [  605.149210]  __handle_mm_fault+0x47b/0x5c0
Mar 22 17:06:39 ubuntu kernel: [  605.149216]  handle_mm_fault+0xb1/0x1f0
Mar 22 17:06:39 ubuntu kernel: [  605.149226]  __do_page_fault+0x250/0x4d0
Mar 22 17:06:39 ubuntu kernel: [  605.149230]  do_page_fault+0x2e/0xe0
Mar 22 17:06:39 ubuntu kernel: [  605.149240]  do_async_page_fault+0x51/0x80
Mar 22 17:06:39 ubuntu kernel: [  605.149245]  async_page_fault+0x25/0x50
Mar 22 17:06:39 ubuntu kernel: [  605.149252] RIP: 0010:copy_user_enhanced_fast_string+0xe/0x20
Mar 22 17:06:39 ubuntu kernel: [  605.149254] RSP: 0018:ffffb1e28848fb80 EFLAGS: 00010206
Mar 22 17:06:39 ubuntu kernel: [  605.149257] RAX: 00007f2e2648c200 RBX: 0000000000000200 RCX: 0000000000000200
Mar 22 17:06:39 ubuntu kernel: [  605.149260] RDX: 0000000000000200 RSI: ffff88ffdf518000 RDI: 00007f2e2648c000
Mar 22 17:06:39 ubuntu kernel: [  605.149261] RBP: ffffb1e28848fb88 R08: ffffb1e288490000 R09: ffffdd05047d4600
Mar 22 17:06:39 ubuntu kernel: [  605.149263] R10: 0000000000000040 R11: ffffb1e28848fb68 R12: ffffb1e28848fc60
Mar 22 17:06:39 ubuntu kernel: [  605.149265] R13: 0000000000000000 R14: ffff88ffdf518000 R15: ffffb1e28848fd70
Mar 22 17:06:39 ubuntu kernel: [  605.149279]  ? _copy_to_user+0x26/0x30
Mar 22 17:06:39 ubuntu kernel: [  605.149291]  uiomove+0x116/0x260 [zcommon]
Mar 22 17:06:39 ubuntu kernel: [  605.149388]  zfs_read+0x1dd/0x3f0 [zfs]
Mar 22 17:06:39 ubuntu kernel: [  605.149465]  zpl_read_common_iovec+0x80/0xd0 [zfs]
Mar 22 17:06:39 ubuntu kernel: [  605.149537]  zpl_iter_read+0x9c/0xe0 [zfs]
Mar 22 17:06:39 ubuntu kernel: [  605.149562]  aio_read+0xee/0x150
Mar 22 17:06:39 ubuntu kernel: [  605.149567]  ? _cond_resched+0x19/0x40
Mar 22 17:06:39 ubuntu kernel: [  605.149572]  ? ptrace_stop+0x1a6/0x260
Mar 22 17:06:39 ubuntu kernel: [  605.149576]  ? _cond_resched+0x19/0x40
Mar 22 17:06:39 ubuntu kernel: [  605.149579]  do_io_submit+0x46a/0x670
Mar 22 17:06:39 ubuntu kernel: [  605.149583]  ? do_io_submit+0x46a/0x670
Mar 22 17:06:39 ubuntu kernel: [  605.149587]  ? ptrace_notify+0x5b/0x90
Mar 22 17:06:39 ubuntu kernel: [  605.149592]  SyS_io_submit+0x10/0x20
Mar 22 17:06:39 ubuntu kernel: [  605.149595]  ? SyS_io_submit+0x10/0x20
Mar 22 17:06:39 ubuntu kernel: [  605.149606]  do_syscall_64+0x73/0x130
Mar 22 17:06:39 ubuntu kernel: [  605.149612]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Describe how to reproduce the problem

Compile the C source listed below and run:

gcc test.c -laio
run on a ZFS file system:

./a.out

..and it will hang and get soft lockup timeouts after 120 seconds on Ubuntu.

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <libaio.h>
#include <fcntl.h>
#include <err.h>

io_context_t io_ctx;

void do_sync_io(struct iocb *iocb)
{
        struct io_event event;
        struct iocb *iocbs[] = { iocb };
        struct timespec ts = { 30, 0 };

        if (io_submit(io_ctx, 1, iocbs) != 1)
                err(1, "io_submit failed");
        if (io_getevents(io_ctx, 0, 1, &event, &ts) != 1)
                err(1, "io_getevents failed");
}

int main(void)
{
        char *buf;
        int rwfd, status = 0, res, page_size = getpagesize();
        struct iocb iocb;

        if (io_queue_init(1024, &io_ctx))
                err(1, "io_queue_init failed");
        rwfd = open("testfile", O_RDWR | O_CREAT);
        if (rwfd < 0)
                err(1, "open failed");
        if (ftruncate(rwfd, 512) < 0)
                err(1, "ftruncate failed");
        buf = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, rwfd, 0);
        if (buf == MAP_FAILED)
                err(1, "mmap failed");

        (void)io_prep_pwrite(&iocb, rwfd, buf, 512, 0);
        do_sync_io(&iocb);

        (void)io_prep_pread(&iocb, rwfd, buf, 512, 0);
        do_sync_io(&iocb);
}
@ColinIanKing
Copy link
Contributor Author

I suspect this happens because the page is being locked in mappedread and again in filemap_page_mkwrite. Since the page is already locked, the locking of the page in filemap_page_mkwrite causes the task to block indefinitely.

@ColinIanKing
Copy link
Contributor Author

Touching the mmap'd buffer (e.g. *buf = 0;) before the read on the final do_sync_io() call causes the page to be faulted and then we don't get the hang

@ColinIanKing
Copy link
Contributor Author

issue occurs with latest (today's) zfs + spl on 4.15

@ColinIanKing
Copy link
Contributor Author

ColinIanKing commented Mar 23, 2018

This helps, but is clearly sub-optimal.

diff --git a/module/zfs/zfs_vnops.c b/module/zfs/zfs_vnops.c
index a2d7d7b24..ad4ce7eb6 100644
--- a/module/zfs/zfs_vnops.c
+++ b/module/zfs/zfs_vnops.c
@@ -391,6 +391,11 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
        int len = nbytes;
        int error = 0;
        void *pb;
+       void *p;
+
+       p = kmalloc(PAGE_SIZE, GFP_KERNEL);
+       if (!p)
+               return ENOMEM;
 
        start = uio->uio_loffset;
        off = start & (PAGE_SIZE-1);
@@ -402,7 +407,7 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
                        ASSERT(PageUptodate(pp));
 
                        pb = kmap(pp);
-                       error = uiomove(pb + off, bytes, UIO_READ, uio);
+                       memcpy(p, pb + off, bytes);
                        kunmap(pp);
 
                        if (mapping_writably_mapped(mp))
@@ -411,6 +416,8 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
                        mark_page_accessed(pp);
                        unlock_page(pp);
                        put_page(pp);
+
+                       error = uiomove(p, bytes, UIO_READ, uio);
                } else {
                        error = dmu_read_uio_dbuf(sa_get_db(zp->z_sa_hdl),
                            uio, bytes);
@@ -421,6 +428,8 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
                if (error)
                        break;
        }
+
+       kfree(p);
        return (error);
 }

@ColinIanKing
Copy link
Contributor Author

ColinIanKing commented Mar 23, 2018

Clearly it's a double-lock on the page causing the problem, if I unlock it before the uiomove we don't get the lockup hang:

diff --git a/module/zfs/zfs_vnops.c b/module/zfs/zfs_vnops.c
index a2d7d7b24..3baa04e7a 100644
--- a/module/zfs/zfs_vnops.c
+++ b/module/zfs/zfs_vnops.c
@@ -402,7 +402,9 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
                        ASSERT(PageUptodate(pp));
 
                        pb = kmap(pp);
-                       error = uiomove(pb + off, bytes, UIO_READ, uio);
+                       unlock_page(pp);
+                       uiomove(pb + off, bytes, UIO_READ, uio);
+                       lock_page(pp);
                        kunmap(pp);
 
                        if (mapping_writably_mapped(mp))

Any ideas of a suitable fix?

behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 24, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

This issue is that uiomove() must be called with the page lock held
in order to safely populate the page date.  If the page needs to be
faulted in by filemap_page_mkwrite() then it will also take the page
lock resulting in a double-lock.

Normally this isn't an issue since the pages are very likely to be
already faulted in.  This patch makes sure that is always the case
by prefaulting in the user space pages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
@behlendorf
Copy link
Contributor

@ColinIanKing thanks for getting to the heart of the matter for putting together such a nice test case! I've opened #7339 with a proposed fix for your review and verification. What I'm suggesting is that we simply always ensure the page is prefaulted.

@ColinIanKing
Copy link
Contributor Author

Thanks for the speedy turnaround on this fix. I'll test it out on Monday with our own Canonical file system tests in the mix to give it a thorough test.

behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 25, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

This issue is that uiomove() must be called with the page lock held
in order to safely populate the page date.  If the page needs to be
faulted in by filemap_page_mkwrite() then it will also take the page
lock resulting in a double-lock.

Normally this isn't an issue since the pages are very likely to be
already faulted in.  This patch makes sure that is always the case
by prefaulting in the user space pages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 26, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

This issue is that uiomove() must be called with the page lock held
in order to safely populate the page date.  If the page needs to be
faulted in by filemap_page_mkwrite() then it will also take the page
lock resulting in a double-lock.

Normally this isn't an issue since the pages are very likely to be
already faulted in.  This patch makes sure that is always the case
by prefaulting in the user space pages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 26, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

This issue is that uiomove() must be called with the page lock held
in order to safely populate the page date.  If the page needs to be
faulted in by filemap_page_mkwrite() then it will also take the page
lock resulting in a double-lock.

Normally this isn't an issue since the pages are very likely to be
already faulted in.  This patch makes sure that is always the case
by prefaulting in the user space pages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 26, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

This issue is that uiomove() must be called with the page lock held
in order to safely populate the page date.  If the page needs to be
faulted in by filemap_page_mkwrite() then it will also take the page
lock resulting in a double-lock.

Normally this isn't an issue since the pages are very likely to be
already faulted in.  This patch makes sure that is always the case
by prefaulting in the user space pages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 27, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

Resolve the issue by only taking a reference on the page when
copying it and not the page lock.  The inode range lock protects
against concurrent updates via zfs_read() and zfs_write().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 27, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

Resolve the issue by only taking a reference on the page when
copying it and not the page lock.  The inode range lock protects
against concurrent updates via zfs_read() and zfs_write().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 27, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

Resolve the issue by only taking a reference on the page when
copying it and not the page lock.  The inode range lock protects
against concurrent updates via zfs_read() and zfs_write().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 27, 2018
Calling uiomove() in mappedread() can result in deadlock if the
user space page needs to be faulted in.

Resolve the issue by only taking a reference on the page when
copying it and not the page lock.  The inode range lock protects
against concurrent updates via zfs_read() and zfs_write().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7335
@behlendorf behlendorf added this to TODO in 0.7.9 Mar 27, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 16, 2018
Calling uiomove() in mappedread() under the page lock can result
in a deadlock if the user space page needs to be faulted in.

Resolve the issue by dropping the page lock before the uiomove().
The inode range lock protects against concurrent updates via
zfs_read() and zfs_write().

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7335
Closes openzfs#7339
@tonyhutter tonyhutter moved this from TODO to In progress in 0.7.9 Apr 30, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue May 4, 2018
Calling uiomove() in mappedread() under the page lock can result
in a deadlock if the user space page needs to be faulted in.

Resolve the issue by dropping the page lock before the uiomove().
The inode range lock protects against concurrent updates via
zfs_read() and zfs_write().

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7335
Closes openzfs#7339
tonyhutter pushed a commit that referenced this issue May 10, 2018
Calling uiomove() in mappedread() under the page lock can result
in a deadlock if the user space page needs to be faulted in.

Resolve the issue by dropping the page lock before the uiomove().
The inode range lock protects against concurrent updates via
zfs_read() and zfs_write().

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7335
Closes #7339
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
0.7.9
  
In progress
Development

No branches or pull requests

2 participants