Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues on big-endian hosts #17

Open
glaubitz opened this issue Nov 27, 2016 · 3 comments
Open

Issues on big-endian hosts #17

glaubitz opened this issue Nov 27, 2016 · 3 comments

Comments

@glaubitz
Copy link
Contributor

glaubitz commented Nov 27, 2016

Hi!

I was recently testing qemu-m68k on a sparc64 machine in the hope to work-around the futex issues we have when running on little-endian hosts (because m68k is big-endian, the futex counter is not passed evaluated properly by the kernel), but to my surprise, qemu-m68k actually crashed:

root@deb4g:~/qemu-m68k# gdb ./m68k-linux-user/qemu-m68k
GNU gdb (Debian 7.11.1-2+b2) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./m68k-linux-user/qemu-m68k...done.
(gdb) run /srv/sid-m68k-sbuild/lib/ld.so.1
Starting program: /root/qemu-m68k/m68k-linux-user/qemu-m68k /srv/sid-m68k-sbuild/lib/ld.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
[New Thread 0xfff80001007ff910 (LWP 141817)]

Thread 1 "qemu-m68k" received signal SIGBUS, Bus error.
0x000000006041e738 in static_code_gen_buffer ()
(gdb) bt
#0 0x000000006041e738 in static_code_gen_buffer ()
#1 0x0000000060006038 in cpu_tb_exec (cpu=0x624b9550, itb=0xfff8000100883940) at /root/qemu-m68k/cpu-exec.c:160
#2 0x0000000060006d1c in cpu_loop_exec_tb (cpu=0x624b9550, tb=0xfff8000100883940, last_tb=0x7feffffea38, tb_exit=0x7feffffea
#3 0x0000000060006f78 in cpu_exec (cpu=0x624b9550) at /root/qemu-m68k/cpu-exec.c:619
#4 0x00000000600469b4 in cpu_loop (env=0x624c1990) at /root/qemu-m68k/linux-user/main.c:3272
#5 0x0000000060049044 in main (argc=2, argv=0x7fefffff5d8, envp=0x7fefffff5f0) at /root/qemu-m68k/linux-user/main.c:5000
(gdb)

This was tested with the 680x0-master-dev branch. Will also now try the m68k-dev branch, hadn't seen that one before.

Adrian

@glaubitz
Copy link
Contributor Author

glaubitz commented Nov 27, 2016

Same crash in the m68k-dev branch:

root@deb4g:/srv/sid-m68k-sbuild# gdb ~/qemu-m68k/m68k-linux-user/qemu-m68k
GNU gdb (Debian 7.11.1-2+b2) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /root/qemu-m68k/m68k-linux-user/qemu-m68k...done.
(gdb) run -cpu m68040 /srv/sid-m68k-sbuild/lib/ld.so.1
Starting program: /root/qemu-m68k/m68k-linux-user/qemu-m68k -cpu m68040 /srv/sid-m68k-sbuild/lib/ld.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
[New Thread 0xfff80001007ff910 (LWP 75080)]

Thread 1 "qemu-m68k" received signal SIGBUS, Bus error.
0x0000000060453368 in static_code_gen_buffer ()
(gdb) bt
#0 0x0000000060453368 in static_code_gen_buffer ()
#1 0x000000006000630c in cpu_tb_exec (cpu=0x624f0170, itb=0xfff8000100883940) at /root/qemu-m68k/cpu-exec.c:164
#2 0x0000000060007140 in cpu_loop_exec_tb (cpu=0x624f0170, tb=0xfff8000100883940, last_tb=0x7feffffea48, tb_exit=0x7feffffea44,
sc=0x7feffffea60) at /root/qemu-m68k/cpu-exec.c:544
#3 0x0000000060007384 in cpu_exec (cpu=0x624f0170) at /root/qemu-m68k/cpu-exec.c:638
#4 0x0000000060050c80 in cpu_loop (env=0x624f85b0) at /root/qemu-m68k/linux-user/main.c:2835
#5 0x00000000600532d8 in main (argc=4, argv=0x7fefffff5e8, envp=0x7fefffff610) at /root/qemu-m68k/linux-user/main.c:4560
(gdb)

@vivier
Copy link
Owner

vivier commented Nov 28, 2016

Did you try to run a qemu-ppc on sparc64 to see if it is specific to m68k or to the generic part?

@glaubitz
Copy link
Contributor Author

Good idea. Just gave it a try, qemu-ppc works fine:

root@deb4g:~/ppc-test/lib/powerpc-linux-gnu# ~/qemu-m68k/ppc-linux-user/qemu-ppc ./ld-2.19.so
Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
You have invoked ld.so', the helper program for shared library executables. This program usually lives in the file /lib/ld.so', and special directives
in executable files using ELF shared libraries tell the system's program
loader to load the helper program from this file. This helper program loads
the shared libraries needed by the program executable, prepares the program
to run, and runs it. You may invoke this helper program directly from the
command line to load and run an ELF executable file; this is like executing
that file itself, but always uses this helper program from the file you
specified, instead of the helper program file specified in the executable
file you run. This is mostly of use for maintainers to test new versions
of this helper program; chances are you did not intend to run this program.

--list list all dependencies and how they are resolved
--verify verify that given object really is a dynamically linked
object we can handle
--inhibit-cache Do not use /etc/ld.so.cache
--library-path PATH use given PATH instead of content of the environment
variable LD_LIBRARY_PATH
--inhibit-rpath LIST ignore RUNPATH and RPATH information in object names
in LIST
--audit LIST use objects named in LIST as auditors
root@deb4g:~/ppc-test/lib/powerpc-linux-gnu#

vivier pushed a commit that referenced this issue Jan 28, 2017
QEMU will crash with the follow backtrace if the new created thread exited before
we call qemu_thread_set_name() for it.

  (gdb) bt
  #0 0x00007f9a68b095d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
  #1 0x00007f9a68b0acc8 in __GI_abort () at abort.c:90
  #2 0x00007f9a69cda389 in PAT_abort () from /usr/lib64/libuvpuserhotfix.so
  #3 0x00007f9a69cdda0d in patchIllInsHandler () from /usr/lib64/libuvpuserhotfix.so
  #4 <signal handler called>
  #5 pthread_setname_np (th=140298470549248, name=name@entry=0x8cc74a "io-task-worker") at ../nptl/sysdeps/unix/sysv/linux/pthread_setname.c:49
  #6 0x00000000007f5f20 in qemu_thread_set_name (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker") at util/qemu_thread_posix.c:459
  #7 0x00000000007f679e in qemu_thread_create (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker",start_routine=start_routine@entry=0x7c1300 <qio_task_thread_worker>, arg=arg@entry=0x7f99b8001720, mode=mode@entry=1) at util/qemu_thread_posix.c:498
  #8 0x00000000007c15b6 in qio_task_run_in_thread (task=task@entry=0x7f99b80033d0, worker=worker@entry=0x7bd920 <qio_channel_socket_connect_worker>, opaque=0x7f99b8003370, destroy=0x7c6220 <qapi_free_SocketAddress>) at io/task.c:133
  #9 0x00000000007bda04 in qio_channel_socket_connect_async (ioc=0x7f99b80014c0, addr=0x37235d0, callback=callback@entry=0x54ad00 <qemu_chr_socket_connected>, opaque=opaque@entry=0x38118b0, destroy=destroy@entry=0x0) at io/channel_socket.c:191
  #10 0x00000000005487f6 in socket_reconnect_timeout (opaque=0x38118b0) at qemu_char.c:4402
  #11 0x00007f9a6a1533b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
  #12 0x00007f9a6a15299a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
  #13 0x0000000000747386 in glib_pollfds_poll () at main_loop.c:227
  #14 0x0000000000747424 in os_host_main_loop_wait (timeout=404000000) at main_loop.c:272
  #15 0x0000000000747575 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:520
  #16 0x0000000000557d31 in main_loop () at vl.c:2170
  #17 0x000000000041c8b7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5083

Let's detach the new thread after calling qemu_thread_set_name().

Signed-off-by: Caoxinhua <caoxinhua@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <1483493521-9604-1-git-send-email-zhang.zhanghailiang@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vivier pushed a commit that referenced this issue Jan 28, 2017
AioHandlers marked ->is_external must be skipped when aio_node_check()
fails.  bdrv_drained_begin() needs this to prevent dataplane from
submitting new I/O requests while another thread accesses the device and
relies on it being quiesced.

This patch fixes the following segfault:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
  2650            bdrv_io_plug(child->bs);
  [Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))]
  (gdb) bt
  #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
  #1  0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:1561
  #2  0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589
  #3  0x00005577f5d4240d in virtio_blk_data_plane_handle_output (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/dataplane/virtio-blk.c:158
  #4  0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at qemu/hw/virtio/virtio.c:1304
  #5  0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134
  #6  0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at qemu/aio-posix.c:493
  #7  0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:569
  #8  0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:601
  #9  0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at qemu/block/io.c:2403
  #10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at qemu/block.c:2322
  #11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at qemu/block.c:2465
  #12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at qemu/block.c:3425
  #13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at qemu/block.c:1361
  #14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:491
  #15 0x00005577f6111b1b in blk_remove_all_bs () at qemu/block/block-backend.c:245
  #16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382
  #17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, envp=0x7ffea6eb8440) at qemu/vl.c:4684

The key thing is that bdrv_close() uses bdrv_drained_begin() and
virtio_queue_host_notifier_aio_poll() must not be called.

Thanks to Fam Zheng <famz@redhat.com> for identifying the root cause of
this crash.

Reported-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170124095350.16679-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
vivier pushed a commit that referenced this issue Apr 28, 2017
During block job completion, nothing is preventing
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which is a trouble, such as in this code path:

    qmp_block_commit
      commit_active_start
        bdrv_reopen
          bdrv_reopen_multiple
            bdrv_reopen_prepare
              bdrv_flush
                aio_poll
                  aio_bh_poll
                    aio_bh_call
                      block_job_defer_to_main_loop_bh
                        stream_complete
                          bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but it is not done because it's in the form of a
main loop BH.

Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well.  To achieve
this, this patch forces draining the BH in BDRV_POLL_WHILE.

As a side effect this fixes a hang in block_job_detach_aio_context
during system_reset when a block job is ready:

    #0  0x0000555555aa79f3 in bdrv_drain_recurse
    #1  0x0000555555aa825d in bdrv_drained_begin
    #2  0x0000555555aa8449 in bdrv_drain
    #3  0x0000555555a9c356 in blk_drain
    #4  0x0000555555aa3cfd in mirror_drain
    #5  0x0000555555a66e11 in block_job_detach_aio_context
    #6  0x0000555555a62f4d in bdrv_detach_aio_context
    #7  0x0000555555a63116 in bdrv_set_aio_context
    #8  0x0000555555a9d326 in blk_set_aio_context
    #9  0x00005555557e38da in virtio_blk_data_plane_stop
    #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
    #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
    #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
    #13 0x00005555559f6a18 in virtio_pci_reset
    #14 0x00005555559139a9 in qdev_reset_one
    #15 0x0000555555916738 in qbus_walk_children
    #16 0x0000555555913318 in qdev_walk_children
    #17 0x0000555555916738 in qbus_walk_children
    #18 0x00005555559168ca in qemu_devices_reset
    #19 0x000055555581fcbb in pc_machine_reset
    #20 0x00005555558a4d96 in qemu_system_reset
    #21 0x000055555577157a in main_loop_should_exit
    #22 0x000055555577157a in main_loop
    #23 0x000055555577157a in main

The rationale is that the loop in block_job_detach_aio_context cannot
make any progress in pausing/completing the job, because bs->in_flight
is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
BH. With this patch, it does.

Reported-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170418143044.12187-3-famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
vivier pushed a commit that referenced this issue May 28, 2017
This commit fixes a bug which causes the guest to hang. The bug was
observed upon a "receive overrun" (bit #6 of the ICR register)
interrupt which could be triggered post migration in a heavy traffic
environment. Even though the "receive overrun" bit (#6) is masked out
by the IMS register (refer to the log below) the driver still receives
an interrupt as the "receive overrun" bit (#6) causes the "Other" -
bit #24 of the ICR register - bit to be set as documented below. The
driver handles the interrupt and clears the "Other" bit (#24) but
doesn't clear the "receive overrun" bit (#6) which leads to an
infinite loop. Apparently the Windows driver expects that the "receive
overrun" bit and other ones - documented below - to be cleared when
the "Other" bit (#24) is cleared.

So to sum that up:
1. Bit #6 of the ICR register is set by heavy traffic
2. As a results of setting bit #6, bit #24 is set
3. The driver receives an interrupt for bit 24 (it doesn't receieve an
   interrupt for bit #6 as it is masked out by IMS)
4. The driver handles and clears the interrupt of bit #24
5. Bit #6 is still set.
6. 2 happens all over again

The Interrupt Cause Read - ICR register:

The ICR has the "Other" bit - bit #24 - that is set when one or more
of the following ICR register's bits are set:

LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit
#17, MNG - bit #18

This bug can occur with any of these bits depending on the driver's
behaviour and the way it configures the device. However, trying to
reproduce it with any bit other than RX0 is challenging and came to
failure as the drivers don't implement most of these bits, trying to
reproduce it with LSC (Link Status Change - bit #2) bit didn't succeed
too as it seems that Windows handles this bit differently.

Log sample of the storm:

27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)
27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)

* This bug behaviour wasn't observed with the Linux driver.

This commit solves:
https://bugzilla.redhat.com/show_bug.cgi?id=1447935
https://bugzilla.redhat.com/show_bug.cgi?id=1449490

Cc: qemu-stable@nongnu.org
Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
vivier pushed a commit that referenced this issue Jan 23, 2018
Fixes leaks such as:

Direct leak of 2 byte(s) in 1 object(s) allocated from:
    #0 0x7eff58beb850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7eff57942f0c in g_malloc ../glib/gmem.c:94
    #2 0x7eff579431cf in g_malloc_n ../glib/gmem.c:331
    #3 0x7eff5795f6eb in g_strdup ../glib/gstrfuncs.c:363
    #4 0x55db720f1d46 in readline_hist_add /home/elmarco/src/qq/util/readline.c:258
    #5 0x55db720f2d34 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:387
    #6 0x55db71539d00 in monitor_read /home/elmarco/src/qq/monitor.c:3896
    #7 0x55db71f9be35 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167
    #8 0x55db71f9bed3 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179
    #9 0x55db71fa013c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    #10 0x55db71fe18a8 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    #11 0x7eff5793a90b in g_main_dispatch ../glib/gmain.c:3182
    #12 0x7eff5793b7ac in g_main_context_dispatch ../glib/gmain.c:3847
    #13 0x55db720af3bd in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214
    #14 0x55db720af505 in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261
    #15 0x55db720af6d6 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515
    #16 0x55db7184e0de in main_loop /home/elmarco/src/qq/vl.c:1995
    #17 0x55db7185e956 in main /home/elmarco/src/qq/vl.c:4914
    #18 0x7eff4ea17039 in __libc_start_main (/lib64/libc.so.6+0x21039)

(while at it, use g_new0(ReadLineState), it's a bit easier to read)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-11-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vivier pushed a commit that referenced this issue Jan 23, 2018
Direct leak of 160 byte(s) in 4 object(s) allocated from:
    #0 0x55ed7678cda8 in calloc (/home/elmarco/src/qq/build/x86_64-softmmu/qemu-system-x86_64+0x797da8)
    #1 0x7f3f5e725f75 in g_malloc0 /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:124
    #2 0x55ed778aa3a7 in query_option_descs /home/elmarco/src/qq/util/qemu-config.c:60:16
    #3 0x55ed778aa307 in get_drive_infolist /home/elmarco/src/qq/util/qemu-config.c:140:19
    #4 0x55ed778a9f40 in qmp_query_command_line_options /home/elmarco/src/qq/util/qemu-config.c:254:36
    #5 0x55ed76d4868c in qmp_marshal_query_command_line_options /home/elmarco/src/qq/build/qmp-marshal.c:3078:14
    #6 0x55ed77855dd5 in do_qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:104:5
    #7 0x55ed778558cc in qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:131:11
    #8 0x55ed768b592f in handle_qmp_command /home/elmarco/src/qq/monitor.c:3840:11
    #9 0x55ed7786ccfe in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105:5
    #10 0x55ed778fe37c in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323:13
    #11 0x55ed778fdde6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373:15
    #12 0x55ed7786cd83 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124:12
    #13 0x55ed768b559e in monitor_qmp_read /home/elmarco/src/qq/monitor.c:3882:5
    #14 0x55ed77714f29 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167:9
    #15 0x55ed77714fde in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179:9
    #16 0x55ed7772ffad in tcp_chr_read /home/elmarco/src/qq/chardev/char-socket.c:440:13
    #17 0x55ed7777113b in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84:12
    #18 0x7f3f5e71d90b in g_main_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3182
    #19 0x7f3f5e71e7ac in g_main_context_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3847
    #20 0x55ed77886ffc in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214:9
    #21 0x55ed778865fd in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261:5
    #22 0x55ed77886222 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515:11
    #23 0x55ed76d2a4df in main_loop /home/elmarco/src/qq/vl.c:1995:9
    #24 0x55ed76d1cb4a in main /home/elmarco/src/qq/vl.c:4914:5
    #25 0x7f3f555f6039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-14-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vivier pushed a commit that referenced this issue Jan 23, 2018
Spotted thanks to ASAN:

==25226==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556715a1f120 at pc 0x556714b6f6b1 bp 0x7ffcdfac1360 sp 0x7ffcdfac1350
READ of size 1 at 0x556715a1f120 thread T0
    #0 0x556714b6f6b0 in init_disasm /home/elmarco/src/qemu/disas/s390.c:219
    #1 0x556714b6fa6a in print_insn_s390 /home/elmarco/src/qemu/disas/s390.c:294
    #2 0x55671484d031 in monitor_disas /home/elmarco/src/qemu/disas.c:635
    #3 0x556714862ec0 in memory_dump /home/elmarco/src/qemu/monitor.c:1324
    #4 0x55671486342a in hmp_memory_dump /home/elmarco/src/qemu/monitor.c:1418
    #5 0x5567148670be in handle_hmp_command /home/elmarco/src/qemu/monitor.c:3109
    #6 0x5567148674ed in qmp_human_monitor_command /home/elmarco/src/qemu/monitor.c:613
    #7 0x556714b00918 in qmp_marshal_human_monitor_command /home/elmarco/src/qemu/build/qmp-marshal.c:1704
    #8 0x556715138a3e in do_qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:104
    #9 0x556715138f83 in qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:131
    #10 0x55671485cf88 in handle_qmp_command /home/elmarco/src/qemu/monitor.c:3839
    #11 0x55671514e80b in json_message_process_token /home/elmarco/src/qemu/qobject/json-streamer.c:105
    #12 0x5567151bf2dc in json_lexer_feed_char /home/elmarco/src/qemu/qobject/json-lexer.c:323
    #13 0x5567151bf827 in json_lexer_feed /home/elmarco/src/qemu/qobject/json-lexer.c:373
    #14 0x55671514ee62 in json_message_parser_feed /home/elmarco/src/qemu/qobject/json-streamer.c:124
    #15 0x556714854b1f in monitor_qmp_read /home/elmarco/src/qemu/monitor.c:3881
    #16 0x556715045440 in qemu_chr_be_write_impl /home/elmarco/src/qemu/chardev/char.c:172
    #17 0x556715047184 in qemu_chr_be_write /home/elmarco/src/qemu/chardev/char.c:184
    #18 0x55671505a8e6 in tcp_chr_read /home/elmarco/src/qemu/chardev/char-socket.c:440
    #19 0x5567150943c3 in qio_channel_fd_source_dispatch /home/elmarco/src/qemu/io/channel-watch.c:84
    #20 0x7fb90292b90b in g_main_dispatch ../glib/gmain.c:3182
    #21 0x7fb90292c7ac in g_main_context_dispatch ../glib/gmain.c:3847
    #22 0x556715162eca in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #23 0x556715163001 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #24 0x5567151631fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #25 0x556714ad6d3b in main_loop /home/elmarco/src/qemu/vl.c:1950
    #26 0x556714ade329 in main /home/elmarco/src/qemu/vl.c:4865
    #27 0x7fb8fe5c9009 in __libc_start_main (/lib64/libc.so.6+0x21009)
    #28 0x5567147af4d9 in _start (/home/elmarco/src/qemu/build/s390x-softmmu/qemu-system-s390x+0xf674d9)

0x556715a1f120 is located 32 bytes to the left of global variable 'char_hci_type_info' defined in '/home/elmarco/src/qemu/hw/bt/hci-csr.c:493:23' (0x556715a1f140) of size 104
0x556715a1f120 is located 8 bytes to the right of global variable 's390_opcodes' defined in '/home/elmarco/src/qemu/disas/s390.c:860:33' (0x556715a15280) of size 40600

This fix is based on Andreas Arnez <arnez@linux.vnet.ibm.com> upstream
commit:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=9ace48f3d7d80ce09c5df60cccb433470410b11b

2014-08-19  Andreas Arnez  <arnez@linux.vnet.ibm.com>

       * s390-dis.c (init_disasm): Simplify initialization of
       opc_index[].  This also fixes an access after the last element
       of s390_opcodes[].

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180104160523.22995-19-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vivier pushed a commit that referenced this issue Mar 21, 2018
Spotted thanks to ASAN:
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/migration-test -p /x86_64/migration/bad_dest

==30302==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f60efba1a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7f60eef3cf75 in g_malloc0 ../glib/gmem.c:124
    #2 0x55ca9094702c in error_copy /home/elmarco/src/qemu/util/error.c:203
    #3 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139
    #4 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150
    #5 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411
    #6 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81
    #7 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85
    #8 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142
    #9 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88
    #10 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552
    #11 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182
    #12 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847
    #13 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #14 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #15 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #16 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942
    #17 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724
    #18 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Indirect leak of 45 byte(s) in 1 object(s) allocated from:
    #0 0x7f60efba1850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f60eef3cf0c in g_malloc ../glib/gmem.c:94
    #2 0x7f60eef3d1cf in g_malloc_n ../glib/gmem.c:331
    #3 0x7f60eef596eb in g_strdup ../glib/gstrfuncs.c:363
    #4 0x55ca90947085 in error_copy /home/elmarco/src/qemu/util/error.c:204
    #5 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139
    #6 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150
    #7 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411
    #8 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81
    #9 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85
    #10 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142
    #11 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88
    #12 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552
    #13 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182
    #14 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847
    #15 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #16 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #17 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #18 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942
    #19 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724
    #20 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180306170959.3921-1-marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
vivier pushed a commit that referenced this issue Jun 10, 2018
Eric Auger reported the problem days ago that OOB broke ARM when running
with libvirt:

http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

The problem was that the monitor dispatcher bottom half was bound to
qemu_aio_context now, which could be polled unexpectedly in block code.
We should keep the dispatchers run in iohandler_ctx just like what we
did before the Out-Of-Band series (chardev uses qio, and qio binds
everything with iohandler_ctx).

If without this change, QMP dispatcher might be run even before reaching
main loop in block IO path, for example, in a stack like (the ARM case,
"cont" command handler run even during machine init phase):

        #0  qmp_cont ()
        #1  0x00000000006bd210 in qmp_marshal_cont ()
        #2  0x0000000000ac05c4 in do_qmp_dispatch ()
        #3  0x0000000000ac07a0 in qmp_dispatch ()
        #4  0x0000000000472d60 in monitor_qmp_dispatch_one ()
        #5  0x000000000047302c in monitor_qmp_bh_dispatcher ()
        #6  0x0000000000acf374 in aio_bh_call ()
        #7  0x0000000000acf428 in aio_bh_poll ()
        #8  0x0000000000ad5110 in aio_poll ()
        #9  0x0000000000a08ab8 in blk_prw ()
        #10 0x0000000000a091c4 in blk_pread ()
        #11 0x0000000000734f94 in pflash_cfi01_realize ()
        #12 0x000000000075a3a4 in device_set_realized ()
        #13 0x00000000009a26cc in property_set_bool ()
        #14 0x00000000009a0a40 in object_property_set ()
        #15 0x00000000009a3a08 in object_property_set_qobject ()
        #16 0x00000000009a0c8c in object_property_set_bool ()
        #17 0x0000000000758f94 in qdev_init_nofail ()
        #18 0x000000000058e190 in create_one_flash ()
        #19 0x000000000058e2f4 in create_flash ()
        #20 0x00000000005902f0 in machvirt_init ()
        #21 0x00000000007635cc in machine_run_board_init ()
        #22 0x00000000006b135c in main ()

Actually the problem is more severe than that.  After we switched to the
qemu AIO handler it means the monitor dispatcher code can even be called
with nested aio_poll(), then it can be an explicit aio_poll() inside
another main loop aio_poll() which could be racy too; breaking code
like TPM and 9p that use nested event loops.

Switch to use the iohandler_ctx for monitor dispatchers.

My sincere thanks to Eric Auger who offered great help during both
debugging and verifying the problem.  The ARM test was carried out by
applying this patch upon QEMU 2.12.0-rc0 and problem is gone after the
patch.

A quick test of mine shows that after this patch applied we can pass all
raw iotests even with OOB on by default.

CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180410044942.17059-1-peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
vivier pushed a commit that referenced this issue Jun 10, 2018
…nnect

When cancel migration during RDMA precopy, the source qemu main thread hangs sometime.

The backtrace is:
    (gdb) bt
    #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
    #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
    #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
    #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
    #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
    #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
    #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
    #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
    #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
    #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
    #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
    #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
        at util/async.c:261
    #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
    #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
    #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
    #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
    #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752

It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime.

According to IB Spec once active side send DREQ message, it should wait for DREP message
and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped
due to network issues.
For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is
dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger.
Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state
and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events.

So it should not invoke rdma_get_cm_event which may hang forever, and the event channel
is also destroyed in qemu_rdma_cleanup.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
vivier pushed a commit that referenced this issue Jun 3, 2019
Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.

This is an example stack trace of a related crash:

 #0  0x00007ffff59da57f in raise () at /lib64/libc.so.6
 #1  0x00007ffff59c4895 in abort () at /lib64/libc.so.6
 #2  0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
 #4  0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
 #5  0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
         at block/io.c:833
 #6  0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
 #7  0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
 #8  0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
 #9  0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
 #10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
 #12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
 #13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
 #14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
 #16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
 #17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
 #18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
 #19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
 #20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
 #21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
 #22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
 #23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776

The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
vivier pushed a commit that referenced this issue Sep 4, 2019
There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
 #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
 #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
 #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
 #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
 #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
 #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
 #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x00005558b819140d in main_loop () at vl.c:1924
 #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

This patch adds a lock to protect tcp_chr_disconnect() and
socket_reconnect_timeout()

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1565625509-404969-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vivier pushed a commit that referenced this issue Feb 13, 2020
Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.

In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.

However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:

  #2  0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
  #3  0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
  #4  0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
  #5  0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
  #6  0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
  #7  0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
  #8  0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
  #9  0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:520
  #10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:607
  #11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:639
  #12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0)
      at /home/mdroth/w/qemu.git/util/aio-wait.c:71
  #13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
  #14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
  #15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
  #16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
  #17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613

I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:

  static bool virtio_queue_host_notifier_aio_poll(void *opaque)
  {
      EventNotifier *n = opaque;
      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
      bool progress;

      if (!vq->vring.desc || virtio_queue_empty(vq)) {
          return false;
      }

      progress = virtio_queue_notify_aio_vq(vq);

namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:

  int virtio_queue_empty(VirtQueue *vq)
  {
      bool empty;
      ...

      if (vq->shadow_avail_idx != vq->last_avail_idx) {
          return 0;
      }

      rcu_read_lock();
      empty = vring_avail_idx(vq) == vq->last_avail_idx;
      rcu_read_unlock();
      return empty;

but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:

  "virtio: zero sized buffers are not allowed"

or

  "virtio-blk missing headers"

and puts the device in an error state.

This patch works around the issue by introducing virtio_set_disabled(),
which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
when bus-mastering is disabled. Since we'd check this flag at all the
same sites as vdev->broken, we replace those checks with an inline
function which checks for either vdev->broken or vdev->disabled.

The 'disabled' flag is only migrated when set, which should be fairly
rare, but to maintain migration compatibility we disable it's use for
older machine types. Users requiring the use of the flag in conjunction
with older machine types can set it explicitly as a virtio-device
option.

NOTES:

 - This leaves some other oddities in play, like the fact that
   DRIVER_OK also gets unset in response to bus-mastering being
   disabled, but not restored (however the device seems to continue
   working)
 - Similarly, we disable the host notifier via
   virtio_bus_stop_ioeventfd(), which seems to move the handling out
   of virtio-blk dataplane and back into the main IO thread, and it
   ends up staying there till a reset (but otherwise continues working
   normally)

Cc: David Gibson <david@gibson.dropbear.id.au>,
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vivier pushed a commit that referenced this issue Feb 13, 2020
start vm with libvirt, when GuestOS running, enter poweroff command using
the xhci keyboard, then ASAN shows memory leak stack:

Direct leak of 80 byte(s) in 5 object(s) allocated from:
    #0 0xfffd1e6431cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
    #1 0xfffd1e107163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
    #2 0xaaad39051367 in qemu_sglist_init /qemu/dma-helpers.c:43
    #3 0xaaad3947c407 in pci_dma_sglist_init /qemu/include/hw/pci/pci.h:842
    #4 0xaaad3947c407 in xhci_xfer_create_sgl /qemu/hw/usb/hcd-xhci.c:1446
    #5 0xaaad3947c407 in xhci_setup_packet /qemu/hw/usb/hcd-xhci.c:1618
    #6 0xaaad3948625f in xhci_submit /qemu/hw/usb/hcd-xhci.c:1827
    #7 0xaaad3948625f in xhci_fire_transfer /qemu/hw/usb/hcd-xhci.c:1839
    #8 0xaaad3948625f in xhci_kick_epctx /qemu/hw/usb/hcd-xhci.c:1991
    #9 0xaaad3948f537 in xhci_doorbell_write /qemu/hw/usb/hcd-xhci.c:3158
    #10 0xaaad38bcbfc7 in memory_region_write_accessor /qemu/memory.c:483
    #11 0xaaad38bc654f in access_with_adjusted_size /qemu/memory.c:544
    #12 0xaaad38bd1877 in memory_region_dispatch_write /qemu/memory.c:1482
    #13 0xaaad38b1c77f in flatview_write_continue /qemu/exec.c:3167
    #14 0xaaad38b1ca83 in flatview_write /qemu/exec.c:3207
    #15 0xaaad38b268db in address_space_write /qemu/exec.c:3297
    #16 0xaaad38bf909b in kvm_cpu_exec /qemu/accel/kvm/kvm-all.c:2383
    #17 0xaaad38bb063f in qemu_kvm_cpu_thread_fn /qemu/cpus.c:1246
    #18 0xaaad39821c93 in qemu_thread_start /qemu/util/qemu-thread-posix.c:519
    #19 0xfffd1c8378bb  (/lib64/libpthread.so.0+0x78bb)
    #20 0xfffd1c77616b  (/lib64/libc.so.6+0xd616b)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-id: 20200110105855.81144-1-kuhn.chenqun@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
vivier pushed a commit that referenced this issue Feb 13, 2020
All paths that lead to bdrv_backup_top_drop(), except for the call
from backup_clean(), imply that the BDS AioContext has already been
acquired, so doing it there too can potentially lead to QEMU hanging
on AIO_WAIT_WHILE().

An easy way to trigger this situation is by issuing a two actions
transaction, with a proper and a bogus blockdev-backup, so the second
one will trigger a rollback. This will trigger a hang with an stack
trace like this one:

 #0  0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>,
     timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
 #1  0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
     at /usr/include/bits/poll2.h:77
 #2  0x000055e743386e09 in qemu_poll_ns
     (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
 #3  0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true)
     at util/aio-posix.c:669
 #4  0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878
 #5  0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017
 #6  0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262
 #7  0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644
 #8  0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273
 #9  0x000055e74331461f in backup_job_create
     (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478
 #10 0x000055e74315bc52 in do_backup_common
     (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at blockdev.c:3580
 #11 0x000055e74315c37c in do_blockdev_backup
     (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492
 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018)
     at blockdev.c:1885
 #13 0x000055e743160152 in qmp_transaction
     (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340
 #14 0x000055e743287ff5 in qmp_marshal_transaction
     (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8)
     at qapi/qapi-commands-transaction.c:44
 #15 0x000055e74333de6c in do_qmp_dispatch
     (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #16 0x000055e74333de6c in qmp_dispatch
     (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>)
     at qapi/qmp-dispatch.c:175
 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>)
     at monitor/qmp.c:145
 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117
 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117
 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459
 #22 0x000055e743385742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176
 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829
 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219
 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828
 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by not acquiring the AioContext there, and ensuring all paths
leading to it have it already acquired (backup_clean()).

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vivier pushed a commit that referenced this issue Feb 13, 2020
Dirty map addition and removal functions are not acquiring to BDS
AioContext, while they may call to code that expects it to be
acquired.

This may trigger a crash with a stack trace like this one:

 #0  0x00007f0ef146370f in __GI_raise (sig=sig@entry=6)
     at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007f0ef144db25 in __GI_abort () at abort.c:79
 #2  0x0000565022294dce in error_exit
     (err=<optimized out>, msg=msg@entry=0x56502243a730 <__func__.16350> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x00005650222950ba in qemu_mutex_unlock_impl
     (mutex=mutex@entry=0x5650244b0240, file=file@entry=0x565022439adf "util/async.c", line=line@entry=526) at util/qemu-thread-posix.c:108
 #4  0x0000565022290029 in aio_context_release
     (ctx=ctx@entry=0x5650244b01e0) at util/async.c:526
 #5  0x000056502221cd08 in bdrv_can_store_new_dirty_bitmap
     (bs=bs@entry=0x5650244dc820, name=name@entry=0x56502481d360 "bitmap1", granularity=granularity@entry=65536, errp=errp@entry=0x7fff22831718)
     at block/dirty-bitmap.c:542
 #6  0x000056502206ae53 in qmp_block_dirty_bitmap_add
     (errp=0x7fff22831718, disabled=false, has_disabled=<optimized out>, persistent=<optimized out>, has_persistent=true, granularity=65536, has_granularity=<optimized out>, name=0x56502481d360 "bitmap1", node=<optimized out>) at blockdev.c:2894
 #7  0x000056502206ae53 in qmp_block_dirty_bitmap_add
     (node=<optimized out>, name=0x56502481d360 "bitmap1", has_granularity=<optimized out>, granularity=<optimized out>, has_persistent=true, persistent=<optimized out>, has_disabled=false, disabled=false, errp=0x7fff22831718) at blockdev.c:2856
 #8  0x00005650221847a3 in qmp_marshal_block_dirty_bitmap_add
     (args=<optimized out>, ret=<optimized out>, errp=0x7fff22831798)
     at qapi/qapi-commands-block-core.c:651
 #9  0x0000565022247e6c in do_qmp_dispatch
     (errp=0x7fff22831790, allow_oob=<optimized out>, request=<optimized out>, cmds=0x565022b32d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #10 0x0000565022247e6c in qmp_dispatch
     (cmds=0x565022b32d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
 #11 0x0000565022166061 in monitor_qmp_dispatch
     (mon=0x56502450faa0, req=<optimized out>) at monitor/qmp.c:145
 #12 0x00005650221666fa in monitor_qmp_bh_dispatcher
     (data=<optimized out>) at monitor/qmp.c:234
 #13 0x000056502228f866 in aio_bh_call (bh=0x56502440eae0)
     at util/async.c:117
 #14 0x000056502228f866 in aio_bh_poll (ctx=ctx@entry=0x56502440d7a0)
     at util/async.c:117
 #15 0x0000565022292c54 in aio_dispatch (ctx=0x56502440d7a0)
     at util/aio-posix.c:459
 #16 0x000056502228f742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #17 0x00007f0ef5ce667d in g_main_dispatch (context=0x56502449aa40)
     at gmain.c:3176
 #18 0x00007f0ef5ce667d in g_main_context_dispatch
     (context=context@entry=0x56502449aa40) at gmain.c:3829
 #19 0x0000565022291d08 in glib_pollfds_poll () at util/main-loop.c:219
 #20 0x0000565022291d08 in os_host_main_loop_wait
     (timeout=<optimized out>) at util/main-loop.c:242
 #21 0x0000565022291d08 in main_loop_wait (nonblocking=<optimized out>)
     at util/main-loop.c:518
 #22 0x00005650220743c1 in main_loop () at vl.c:1828
 #23 0x0000565021f20a72 in main
     (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by acquiring the AioContext at qmp_block_dirty_bitmap_add()
and qmp_block_dirty_bitmap_add().

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782175
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vivier pushed a commit that referenced this issue Feb 13, 2020
external_snapshot_abort() calls to bdrv_set_backing_hd(), which
returns state->old_bs to the main AioContext, as it's intended to be
used then the BDS is going to be released. As that's not the case when
aborting an external snapshot, return it to the AioContext it was
before the call.

This issue can be triggered by issuing a transaction with two actions,
a proper blockdev-snapshot-sync and a bogus one, so the second will
trigger a transaction abort. This results in a crash with an stack
trace like this one:

 #0  0x00007fa1048b28df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
 #1  0x00007fa10489ccf5 in __GI_abort () at abort.c:79
 #2  0x00007fa10489cbc9 in __assert_fail_base
     (fmt=0x7fa104a03300 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=0x557224014d30 "block.c", line=2240, function=<optimized out>) at assert.c:92
 #3  0x00007fa1048aae96 in __GI___assert_fail
     (assertion=assertion@entry=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=file@entry=0x557224014d30 "block.c", line=line@entry=2240, function=function@entry=0x5572240b5d60 <__PRETTY_FUNCTION__.31620> "bdrv_replace_child_noperm") at assert.c:101
 #4  0x0000557223e631f8 in bdrv_replace_child_noperm (child=0x557225b9c980, new_bs=new_bs@entry=0x557225c42e40) at block.c:2240
 #5  0x0000557223e68be7 in bdrv_replace_node (from=0x557226951a60, to=0x557225c42e40, errp=0x5572247d6138 <error_abort>) at block.c:4196
 #6  0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1731
 #7  0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1717
 #8  0x0000557223d09013 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x557225cc7d70, errp=errp@entry=0x7ffe704c0c98) at blockdev.c:2360
 #9  0x0000557223e32085 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffe704c0d08) at qapi/qapi-commands-transaction.c:44
 #10 0x0000557223ee798c in do_qmp_dispatch (errp=0x7ffe704c0d00, allow_oob=<optimized out>, request=<optimized out>, cmds=0x5572247d3cc0 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #11 0x0000557223ee798c in qmp_dispatch (cmds=0x5572247d3cc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
 #12 0x0000557223e06141 in monitor_qmp_dispatch (mon=0x557225c69ff0, req=<optimized out>) at monitor/qmp.c:120
 #13 0x0000557223e0678a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
 #14 0x0000557223f2f366 in aio_bh_call (bh=0x557225b9dc60) at util/async.c:117
 #15 0x0000557223f2f366 in aio_bh_poll (ctx=ctx@entry=0x557225b9c840) at util/async.c:117
 #16 0x0000557223f32754 in aio_dispatch (ctx=0x557225b9c840) at util/aio-posix.c:459
 #17 0x0000557223f2f242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #18 0x00007fa10913467d in g_main_dispatch (context=0x557225c28e80) at gmain.c:3176
 #19 0x00007fa10913467d in g_main_context_dispatch (context=context@entry=0x557225c28e80) at gmain.c:3829
 #20 0x0000557223f31808 in glib_pollfds_poll () at util/main-loop.c:219
 #21 0x0000557223f31808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #22 0x0000557223f31808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #23 0x0000557223d13201 in main_loop () at vl.c:1828
 #24 0x0000557223bbfb82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1779036
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vivier pushed a commit that referenced this issue May 3, 2020
Coverity reports:

  *** CID 1419387:  Memory - illegal accesses  (OVERRUN)
  /hw/hppa/dino.c: 267 in dino_chip_read_with_attrs()
  261             val = s->ilr & s->imr & s->icr;
  262             break;
  263         case DINO_TOC_ADDR:
  264             val = s->toc_addr;
  265             break;
  266         case DINO_GMASK ... DINO_TLTIM:
  >>>     CID 1419387:  Memory - illegal accesses  (OVERRUN)
  >>>     Overrunning array "s->reg800" of 12 4-byte elements at element index 12 (byte offset 48) using index "(addr - 2048UL) / 4UL" (which evaluates to 12).
  267             val = s->reg800[(addr - DINO_GMASK) / 4];
  268             if (addr == DINO_PAMR) {
  269                 val &= ~0x01;  /* LSB is hardwired to 0 */
  270             }
  271             if (addr == DINO_MLTIM) {
  272                 val &= ~0x07;  /* 3 LSB are hardwired to 0 */

  *** CID 1419393:  Memory - corruptions  (OVERRUN)
  /hw/hppa/dino.c: 363 in dino_chip_write_with_attrs()
  357             /* These registers are read-only.  */
  358             break;
  359
  360         case DINO_GMASK ... DINO_TLTIM:
  361             i = (addr - DINO_GMASK) / 4;
  362             val &= reg800_keep_bits[i];
  >>>     CID 1419393:  Memory - corruptions  (OVERRUN)
  >>>     Overrunning array "s->reg800" of 12 4-byte elements at element index 12 (byte offset 48) using index "i" (which evaluates to 12).
  363             s->reg800[i] = val;
  364             break;
  365
  366         default:
  367             /* Controlled by dino_chip_mem_valid above.  */
  368             g_assert_not_reached();

  *** CID 1419394:  Memory - illegal accesses  (OVERRUN)
  /hw/hppa/dino.c: 362 in dino_chip_write_with_attrs()
  356         case DINO_IRR1:
  357             /* These registers are read-only.  */
  358             break;
  359
  360         case DINO_GMASK ... DINO_TLTIM:
  361             i = (addr - DINO_GMASK) / 4;
  >>>     CID 1419394:  Memory - illegal accesses  (OVERRUN)
  >>>     Overrunning array "reg800_keep_bits" of 12 4-byte elements at element index 12 (byte offset 48) using index "i" (which evaluates to 12).
  362             val &= reg800_keep_bits[i];
  363             s->reg800[i] = val;
  364             break;
  365
  366         default:
  367             /* Controlled by dino_chip_mem_valid above.  */

Indeed the array should contain 13 entries, the undocumented
register 0x82c is missing. Fix by increasing the array size
and adding the missing register.

CID 1419387 can be verified with:

  $ echo x 0xfff80830 | hppa-softmmu/qemu-system-hppa -S -monitor stdio -display none
  QEMU 4.2.50 monitor - type 'help' for more information
  (qemu) x 0xfff80830
  qemu/hw/hppa/dino.c:267:15: runtime error: index 12 out of bounds for type 'uint32_t [12]'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/phil/source/qemu/hw/hppa/dino.c:267:15 in
  00000000fff80830: 0x00000000

and CID 1419393/1419394 with:

  $ echo writeb 0xfff80830 0x69 \
    | hppa-softmmu/qemu-system-hppa -S -accel qtest -qtest stdio -display none
  [I 1581634452.654113] OPENED
  [R +4.105415] writeb 0xfff80830 0x69
  qemu/hw/hppa/dino.c:362:16: runtime error: index 12 out of bounds for type 'const uint32_t [12]'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior qemu/hw/hppa/dino.c:362:16 in
  =================================================================
  ==29607==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5577dae32f30 at pc 0x5577d93f2463 bp 0x7ffd97ea11b0 sp 0x7ffd97ea11a8
  READ of size 4 at 0x5577dae32f30 thread T0
      #0 0x5577d93f2462 in dino_chip_write_with_attrs qemu/hw/hppa/dino.c:362:16
      #1 0x5577d9025664 in memory_region_write_with_attrs_accessor qemu/memory.c:503:12
      #2 0x5577d9024920 in access_with_adjusted_size qemu/memory.c:539:18
      #3 0x5577d9023608 in memory_region_dispatch_write qemu/memory.c:1482:13
      #4 0x5577d8e3177a in flatview_write_continue qemu/exec.c:3166:23
      #5 0x5577d8e20357 in flatview_write qemu/exec.c:3206:14
      #6 0x5577d8e1fef4 in address_space_write qemu/exec.c:3296:18
      #7 0x5577d8e20693 in address_space_rw qemu/exec.c:3306:16
      #8 0x5577d9011595 in qtest_process_command qemu/qtest.c:432:13
      #9 0x5577d900d19f in qtest_process_inbuf qemu/qtest.c:705:9
      #10 0x5577d900ca22 in qtest_read qemu/qtest.c:717:5
      #11 0x5577da8c4254 in qemu_chr_be_write_impl qemu/chardev/char.c:183:9
      #12 0x5577da8c430c in qemu_chr_be_write qemu/chardev/char.c:195:9
      #13 0x5577da8cf587 in fd_chr_read qemu/chardev/char-fd.c:68:9
      #14 0x5577da9836cd in qio_channel_fd_source_dispatch qemu/io/channel-watch.c:84:12
      #15 0x7faf44509ecc in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4fecc)
      #16 0x5577dab75f96 in glib_pollfds_poll qemu/util/main-loop.c:219:9
      #17 0x5577dab74797 in os_host_main_loop_wait qemu/util/main-loop.c:242:5
      #18 0x5577dab7435a in main_loop_wait qemu/util/main-loop.c:518:11
      #19 0x5577d9514eb3 in main_loop qemu/vl.c:1682:9
      #20 0x5577d950699d in main qemu/vl.c:4450:5
      #21 0x7faf41a87f42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
      #22 0x5577d8cd4d4d in _start (qemu/build/sanitizer/hppa-softmmu/qemu-system-hppa+0x1256d4d)

  0x5577dae32f30 is located 0 bytes to the right of global variable 'reg800_keep_bits' defined in 'qemu/hw/hppa/dino.c:87:23' (0x5577dae32f00) of size 48
  SUMMARY: AddressSanitizer: global-buffer-overflow qemu/hw/hppa/dino.c:362:16 in dino_chip_write_with_attrs
  Shadow bytes around the buggy address:
    0x0aaf7b5be590: 00 f9 f9 f9 f9 f9 f9 f9 00 02 f9 f9 f9 f9 f9 f9
    0x0aaf7b5be5a0: 07 f9 f9 f9 f9 f9 f9 f9 07 f9 f9 f9 f9 f9 f9 f9
    0x0aaf7b5be5b0: 07 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
    0x0aaf7b5be5c0: 00 00 00 02 f9 f9 f9 f9 00 00 00 00 00 00 00 00
    0x0aaf7b5be5d0: 00 00 00 00 00 00 00 00 00 00 00 03 f9 f9 f9 f9
  =>0x0aaf7b5be5e0: 00 00 00 00 00 00[f9]f9 f9 f9 f9 f9 00 00 00 00
    0x0aaf7b5be5f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0aaf7b5be600: 00 00 01 f9 f9 f9 f9 f9 00 00 00 00 07 f9 f9 f9
    0x0aaf7b5be610: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
    0x0aaf7b5be620: 00 00 00 05 f9 f9 f9 f9 00 00 00 00 07 f9 f9 f9
    0x0aaf7b5be630: f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9 07 f9 f9 f9
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
    Shadow gap:              cc
  ==29607==ABORTING

Fixes: Covertiy CID 1419387 / 1419393 / 1419394 (commit 1809259)
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200218063355.18577-3-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
vivier pushed a commit that referenced this issue May 3, 2020
The virtqueue code sets up MemoryRegionCaches to access the virtqueue
guest RAM data structures.  The code currently assumes that
VRingMemoryRegionCaches is initialized before device emulation code
accesses the virtqueue.  An assertion will fail in
vring_get_region_caches() when this is not true.  Device fuzzing found a
case where this assumption is false (see below).

Virtqueue guest RAM addresses can also be changed from a vCPU thread
while an IOThread is accessing the virtqueue.  This breaks the same
assumption but this time the caches could become invalid partway through
the virtqueue code.  The code fetches the caches RCU pointer multiple
times so we will need to validate the pointer every time it is fetched.

Add checks each time we call vring_get_region_caches() and treat invalid
caches as a nop: memory stores are ignored and memory reads return 0.

The fuzz test failure is as follows:

  $ qemu -M pc -device virtio-blk-pci,id=drv0,drive=drive0,addr=4.0 \
         -drive if=none,id=drive0,file=null-co://,format=raw,auto-read-only=off \
         -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw \
         -display none \
         -qtest stdio
  endianness
  outl 0xcf8 0x80002020
  outl 0xcfc 0xe0000000
  outl 0xcf8 0x80002004
  outw 0xcfc 0x7
  write 0xe0000000 0x24 0x00ffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffab5cffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffab0000000001
  inb 0x4
  writew 0xe000001c 0x1
  write 0xe0000014 0x1 0x0d

The following error message is produced:

  qemu-system-x86_64: /home/stefanha/qemu/hw/virtio/virtio.c:286: vring_get_region_caches: Assertion `caches != NULL' failed.

The backtrace looks like this:

  #0  0x00007ffff5520625 in raise () at /lib64/libc.so.6
  #1  0x00007ffff55098d9 in abort () at /lib64/libc.so.6
  #2  0x00007ffff55097a9 in _nl_load_domain.cold () at /lib64/libc.so.6
  #3  0x00007ffff5518a66 in annobin_assert.c_end () at /lib64/libc.so.6
  #4  0x00005555559073da in vring_get_region_caches (vq=<optimized out>) at qemu/hw/virtio/virtio.c:286
  #5  vring_get_region_caches (vq=<optimized out>) at qemu/hw/virtio/virtio.c:283
  #6  0x000055555590818d in vring_used_flags_set_bit (mask=1, vq=0x5555575ceea0) at qemu/hw/virtio/virtio.c:398
  #7  virtio_queue_split_set_notification (enable=0, vq=0x5555575ceea0) at qemu/hw/virtio/virtio.c:398
  #8  virtio_queue_set_notification (vq=vq@entry=0x5555575ceea0, enable=enable@entry=0) at qemu/hw/virtio/virtio.c:451
  #9  0x0000555555908512 in virtio_queue_set_notification (vq=vq@entry=0x5555575ceea0, enable=enable@entry=0) at qemu/hw/virtio/virtio.c:444
  #10 0x00005555558c697a in virtio_blk_handle_vq (s=0x5555575c57e0, vq=0x5555575ceea0) at qemu/hw/block/virtio-blk.c:775
  #11 0x0000555555907836 in virtio_queue_notify_aio_vq (vq=0x5555575ceea0) at qemu/hw/virtio/virtio.c:2244
  #12 0x0000555555cb5dd7 in aio_dispatch_handlers (ctx=ctx@entry=0x55555671a420) at util/aio-posix.c:429
  #13 0x0000555555cb67a8 in aio_dispatch (ctx=0x55555671a420) at util/aio-posix.c:460
  #14 0x0000555555cb307e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
  #15 0x00007ffff7bbc510 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
  #16 0x0000555555cb5848 in glib_pollfds_poll () at util/main-loop.c:219
  #17 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
  #18 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
  #19 0x00005555559b20c9 in main_loop () at vl.c:1683
  #20 0x0000555555838115 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4441

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Cc: Michael Tsirkin <mst@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200207104619.164892-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vivier pushed a commit that referenced this issue May 3, 2020
'crypto_opts' forgot to free in qcow2_close(), this patch fix the bellow leak stack:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295
    #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49
    #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290
    #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163
    #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148
    #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232
    #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512
    #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792
    #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819
    #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317
    #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575
    #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126
    #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219
    #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397
    #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534
    #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668
    #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485
    #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
vivier pushed a commit that referenced this issue May 3, 2020
There is a use-after-free possible: bdrv_unref_child() leaves
bs->backing freed but not NULL. bdrv_attach_child may produce nested
polling loop due to drain, than access of freed pointer is possible.

I've produced the following crash on 30 iotest with modified code. It
does not reproduce on master, but still seems possible:

    #0  __strcmp_avx2 () at /lib64/libc.so.6
    #1  bdrv_backing_overridden (bs=0x55c9d3cc2060) at block.c:6350
    #2  bdrv_refresh_filename (bs=0x55c9d3cc2060) at block.c:6404
    #3  bdrv_backing_attach (c=0x55c9d48e5520) at block.c:1063
    #4  bdrv_replace_child_noperm
        (child=child@entry=0x55c9d48e5520,
        new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2290
    #5  bdrv_replace_child
        (child=child@entry=0x55c9d48e5520,
        new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2320
    #6  bdrv_root_attach_child
        (child_bs=child_bs@entry=0x55c9d3cc2060,
        child_name=child_name@entry=0x55c9d241d478 "backing",
        child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
        ctx=<optimized out>, perm=<optimized out>, shared_perm=21,
        opaque=0x55c9d3c5a3d0, errp=0x7ffd117108e0) at block.c:2424
    #7  bdrv_attach_child
        (parent_bs=parent_bs@entry=0x55c9d3c5a3d0,
        child_bs=child_bs@entry=0x55c9d3cc2060,
        child_name=child_name@entry=0x55c9d241d478 "backing",
        child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
        errp=errp@entry=0x7ffd117108e0) at block.c:5876
    #8  in bdrv_set_backing_hd
        (bs=bs@entry=0x55c9d3c5a3d0,
        backing_hd=backing_hd@entry=0x55c9d3cc2060,
        errp=errp@entry=0x7ffd117108e0)
        at block.c:2576
    #9  stream_prepare (job=0x55c9d49d84a0) at block/stream.c:150
    #10 job_prepare (job=0x55c9d49d84a0) at job.c:761
    #11 job_txn_apply (txn=<optimized out>, fn=<optimized out>) at
        job.c:145
    #12 job_do_finalize (job=0x55c9d49d84a0) at job.c:778
    #13 job_completed_txn_success (job=0x55c9d49d84a0) at job.c:832
    #14 job_completed (job=0x55c9d49d84a0) at job.c:845
    #15 job_completed (job=0x55c9d49d84a0) at job.c:836
    #16 job_exit (opaque=0x55c9d49d84a0) at job.c:864
    #17 aio_bh_call (bh=0x55c9d471a160) at util/async.c:117
    #18 aio_bh_poll (ctx=ctx@entry=0x55c9d3c46720) at util/async.c:117
    #19 aio_poll (ctx=ctx@entry=0x55c9d3c46720,
        blocking=blocking@entry=true)
        at util/aio-posix.c:728
    #20 bdrv_parent_drained_begin_single (poll=true, c=0x55c9d3d558f0)
        at block/io.c:121
    #21 bdrv_parent_drained_begin_single (c=c@entry=0x55c9d3d558f0,
        poll=poll@entry=true)
        at block/io.c:114
    #22 bdrv_replace_child_noperm
        (child=child@entry=0x55c9d3d558f0,
        new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2258
    #23 bdrv_replace_child
        (child=child@entry=0x55c9d3d558f0,
        new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2320
    #24 bdrv_root_attach_child
        (child_bs=child_bs@entry=0x55c9d3d27300,
        child_name=child_name@entry=0x55c9d241d478 "backing",
        child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
        ctx=<optimized out>, perm=<optimized out>, shared_perm=21,
        opaque=0x55c9d3cc2060, errp=0x7ffd11710c60) at block.c:2424
    #25 bdrv_attach_child
        (parent_bs=parent_bs@entry=0x55c9d3cc2060,
        child_bs=child_bs@entry=0x55c9d3d27300,
        child_name=child_name@entry=0x55c9d241d478 "backing",
        child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
        errp=errp@entry=0x7ffd11710c60) at block.c:5876
    #26 bdrv_set_backing_hd
        (bs=bs@entry=0x55c9d3cc2060,
        backing_hd=backing_hd@entry=0x55c9d3d27300,
        errp=errp@entry=0x7ffd11710c60)
        at block.c:2576
    #27 stream_prepare (job=0x55c9d495ead0) at block/stream.c:150
    ...

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200316060631.30052-2-vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
vivier pushed a commit that referenced this issue May 3, 2020
There is an overflow, the source 'datain.data[2]' is 100 bytes,
 but the 'ss' is 252 bytes.This may cause a security issue because
 we can access a lot of unrelated memory data.

The len for sbp copy data should take the minimum of mx_sb_len and
 sb_len_wr, not the maximum.

If we use iscsi device for VM backend storage, ASAN show stack:

READ of size 252 at 0xfffd149dcfc4 thread T0
    #0 0xaaad433d0d34 in __asan_memcpy (aarch64-softmmu/qemu-system-aarch64+0x2cb0d34)
    #1 0xaaad45f9d6d0 in iscsi_aio_ioctl_cb /qemu/block/iscsi.c:996:9
    #2 0xfffd1af0e2dc  (/usr/lib64/iscsi/libiscsi.so.8+0xe2dc)
    #3 0xfffd1af0d174  (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
    #4 0xfffd1af19fac  (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
    #5 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
    #6 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
    #7 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
    #8 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
    #9 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
    #10 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
    #11 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
    #12 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
    #13 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
    #14 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
    #15 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
    #16 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
    #17 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)

0xfffd149dcfc4 is located 0 bytes to the right of 100-byte region [0xfffd149dcf60,0xfffd149dcfc4)
allocated by thread T0 here:
    #0 0xaaad433d1e70 in __interceptor_malloc (aarch64-softmmu/qemu-system-aarch64+0x2cb1e70)
    #1 0xfffd1af0e254  (/usr/lib64/iscsi/libiscsi.so.8+0xe254)
    #2 0xfffd1af0d174  (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
    #3 0xfffd1af19fac  (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
    #4 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
    #5 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
    #6 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
    #7 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
    #8 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
    #9 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
    #10 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
    #11 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
    #12 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
    #13 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
    #14 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
    #15 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
    #16 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200418062602.10776-1-kuhn.chenqun@huawei.com
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vivier pushed a commit that referenced this issue May 31, 2020
when s->inflight is freed, vhost_dev_free_inflight may try to access
s->inflight->addr, it will retrigger the following issue.

==7309==ERROR: AddressSanitizer: heap-use-after-free on address 0x604001020d18 at pc 0x555555ce948a bp 0x7fffffffb170 sp 0x7fffffffb160
READ of size 8 at 0x604001020d18 thread T0
    #0 0x555555ce9489 in vhost_dev_free_inflight /root/smartx/qemu-el7/qemu-test/hw/virtio/vhost.c:1473
    #1 0x555555cd86eb in virtio_reset /root/smartx/qemu-el7/qemu-test/hw/virtio/virtio.c:1214
    #2 0x5555560d3eff in virtio_pci_reset hw/virtio/virtio-pci.c:1859
    #3 0x555555f2ac53 in device_set_realized hw/core/qdev.c:893
    #4 0x5555561d572c in property_set_bool qom/object.c:1925
    #5 0x5555561de8de in object_property_set_qobject qom/qom-qobject.c:27
    #6 0x5555561d99f4 in object_property_set_bool qom/object.c:1188
    #7 0x555555e50ae7 in qdev_device_add /root/smartx/qemu-el7/qemu-test/qdev-monitor.c:626
    #8 0x555555e51213 in qmp_device_add /root/smartx/qemu-el7/qemu-test/qdev-monitor.c:806
    #9 0x555555e8ff40 in hmp_device_add /root/smartx/qemu-el7/qemu-test/hmp.c:1951
    #10 0x555555be889a in handle_hmp_command /root/smartx/qemu-el7/qemu-test/monitor.c:3404
    #11 0x555555beac8b in monitor_command_cb /root/smartx/qemu-el7/qemu-test/monitor.c:4296
    #12 0x555556433eb7 in readline_handle_byte util/readline.c:393
    #13 0x555555be89ec in monitor_read /root/smartx/qemu-el7/qemu-test/monitor.c:4279
    #14 0x5555563285cc in tcp_chr_read chardev/char-socket.c:470
    #15 0x7ffff670b968 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4a968)
    #16 0x55555640727c in glib_pollfds_poll util/main-loop.c:215
    #17 0x55555640727c in os_host_main_loop_wait util/main-loop.c:238
    #18 0x55555640727c in main_loop_wait util/main-loop.c:497
    #19 0x555555b2d0bf in main_loop /root/smartx/qemu-el7/qemu-test/vl.c:2013
    #20 0x555555b2d0bf in main /root/smartx/qemu-el7/qemu-test/vl.c:4776
    #21 0x7fffdd2eb444 in __libc_start_main (/lib64/libc.so.6+0x22444)
    #22 0x555555b3767a  (/root/smartx/qemu-el7/qemu-test/x86_64-softmmu/qemu-system-x86_64+0x5e367a)

0x604001020d18 is located 8 bytes inside of 40-byte region [0x604001020d10,0x604001020d38)
freed by thread T0 here:
    #0 0x7ffff6f00508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
    #1 0x7ffff671107d in g_free (/lib64/libglib-2.0.so.0+0x5007d)

previously allocated by thread T0 here:
    #0 0x7ffff6f00a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
    #1 0x7ffff6710fc5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x4ffc5)

SUMMARY: AddressSanitizer: heap-use-after-free /root/smartx/qemu-el7/qemu-test/hw/virtio/vhost.c:1473 in vhost_dev_free_inflight
Shadow bytes around the buggy address:
  0x0c08801fc150: fa fa 00 00 00 00 04 fa fa fa fd fd fd fd fd fa
  0x0c08801fc160: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 04 fa
  0x0c08801fc170: fa fa 00 00 00 00 00 01 fa fa 00 00 00 00 04 fa
  0x0c08801fc180: fa fa 00 00 00 00 00 01 fa fa 00 00 00 00 00 01
  0x0c08801fc190: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 04 fa
=>0x0c08801fc1a0: fa fa fd[fd]fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08801fc1b0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08801fc1c0: fa fa 00 00 00 00 00 fa fa fa fd fd fd fd fd fd
  0x0c08801fc1d0: fa fa 00 00 00 00 00 01 fa fa fd fd fd fd fd fa
  0x0c08801fc1e0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fd
  0x0c08801fc1f0: fa fa 00 00 00 00 00 01 fa fa fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==7309==ABORTING

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20200417101707.14467-1-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
vivier pushed a commit that referenced this issue Jul 6, 2020
On restart, we were scheduling a BH to process queued requests, which
would run before starting up the data plane, leading to those requests
being assigned and started on coroutines on the main context.

This could cause requests to be wrongly processed in parallel from
different threads (the main thread and the iothread managing the data
plane), potentially leading to multiple issues.

For example, stopping and resuming a VM multiple times while the guest
is generating I/O on a virtio_blk device can trigger a crash with a
stack tracing looking like this one:

<------>
 Thread 2 (Thread 0x7ff736765700 (LWP 1062503)):
 #0  0x00005567a13b99d6 in iov_memset
     (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
     at util/iov.c:69
 #1  0x00005567a13bab73 in qemu_iovec_memset
     (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
 #2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
 #3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
 #4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
 #5  0x00005567a12f43d9 in qemu_laio_process_completions_and_submit (s=0x7ff7182e8420)
     at block/linux-aio.c:236
 #6  0x00005567a12f44c2 in qemu_laio_poll_cb (opaque=0x7ff7182e8430) at block/linux-aio.c:267
 #7  0x00005567a13aed83 in run_poll_handlers_once (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
     at util/aio-posix.c:520
 #8  0x00005567a13aee9f in run_poll_handlers (ctx=0x5567a2b58c70, max_ns=16000, timeout=0x7ff7367645f8)
     at util/aio-posix.c:562
 #9  0x00005567a13aefde in try_poll_mode (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
     at util/aio-posix.c:597
 #10 0x00005567a13af115 in aio_poll (ctx=0x5567a2b58c70, blocking=true) at util/aio-posix.c:639
 #11 0x00005567a109acca in iothread_run (opaque=0x5567a2b29760) at iothread.c:75
 #12 0x00005567a13b2790 in qemu_thread_start (args=0x5567a2b694c0) at util/qemu-thread-posix.c:519
 #13 0x00007ff73eedf2de in start_thread () at /lib64/libpthread.so.0
 #14 0x00007ff73ec10e83 in clone () at /lib64/libc.so.6

 Thread 1 (Thread 0x7ff743986f00 (LWP 1062500)):
 #0  0x00005567a13b99d6 in iov_memset
     (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
     at util/iov.c:69
 #1  0x00005567a13bab73 in qemu_iovec_memset
     (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
 #2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
 #3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
 #4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
 #5  0x00005567a12f4a2f in laio_do_submit (fd=19, laiocb=0x7ff5f4ff9ae0, offset=472363008, type=2)
     at block/linux-aio.c:375
 #6  0x00005567a12f4af2 in laio_co_submit
     (bs=0x5567a2b8c460, s=0x7ff7182e8420, fd=19, offset=472363008, qiov=0x7ff5f4ff9ca0, type=2)
     at block/linux-aio.c:394
 #7  0x00005567a12f1803 in raw_co_prw
     (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, type=2)
     at block/file-posix.c:1892
 #8  0x00005567a12f1941 in raw_co_pwritev
     (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, flags=0)
     at block/file-posix.c:1925
 #9  0x00005567a12fe3e1 in bdrv_driver_pwritev
     (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, qiov_offset=0, flags=0)
     at block/io.c:1183
 #10 0x00005567a1300340 in bdrv_aligned_pwritev
     (child=0x5567a2b5b070, req=0x7ff5f4ff9db0, offset=472363008, bytes=20480, align=512, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0) at block/io.c:1980
 #11 0x00005567a1300b29 in bdrv_co_pwritev_part
     (child=0x5567a2b5b070, offset=472363008, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0)
     at block/io.c:2137
 #12 0x00005567a12baba1 in qcow2_co_pwritev_task
     (bs=0x5567a2b92740, file_cluster_offset=472317952, offset=487305216, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, l2meta=0x0) at block/qcow2.c:2444
 #13 0x00005567a12bacdb in qcow2_co_pwritev_task_entry (task=0x5567a2b48540) at block/qcow2.c:2475
 #14 0x00005567a13167d8 in aio_task_co (opaque=0x5567a2b48540) at block/aio_task.c:45
 #15 0x00005567a13cf00c in coroutine_trampoline (i0=738245600, i1=32759) at util/coroutine-ucontext.c:115
 #16 0x00007ff73eb622e0 in __start_context () at /lib64/libc.so.6
 #17 0x00007ff6626f1350 in  ()
 #18 0x0000000000000000 in  ()
<------>

This is also known to cause crashes with this message (assertion
failed):

 aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1812765
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20200603093240.40489-3-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vivier pushed a commit that referenced this issue Jul 6, 2020
Since commit d70c996, when enabling the PMU we get:

  $ qemu-system-aarch64 -cpu host,pmu=on -M virt,accel=kvm,gic-version=3
  Segmentation fault (core dumped)

  Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault.
  0x0000aaaaaae356d0 in kvm_ioctl (s=0x0, type=44547) at accel/kvm/kvm-all.c:2588
  2588        ret = ioctl(s->fd, type, arg);
  (gdb) bt
  #0  0x0000aaaaaae356d0 in kvm_ioctl (s=0x0, type=44547) at accel/kvm/kvm-all.c:2588
  #1  0x0000aaaaaae31568 in kvm_check_extension (s=0x0, extension=126) at accel/kvm/kvm-all.c:916
  #2  0x0000aaaaaafce254 in kvm_arm_pmu_supported (cpu=0xaaaaac214ab0) at target/arm/kvm.c:213
  #3  0x0000aaaaaafc0f94 in arm_set_pmu (obj=0xaaaaac214ab0, value=true, errp=0xffffffffe438) at target/arm/cpu.c:1111
  #4  0x0000aaaaab5533ac in property_set_bool (obj=0xaaaaac214ab0, v=0xaaaaac223a80, name=0xaaaaac11a970 "pmu", opaque=0xaaaaac222730, errp=0xffffffffe438) at qom/object.c:2170
  #5  0x0000aaaaab5512f0 in object_property_set (obj=0xaaaaac214ab0, v=0xaaaaac223a80, name=0xaaaaac11a970 "pmu", errp=0xffffffffe438) at qom/object.c:1328
  #6  0x0000aaaaab551e10 in object_property_parse (obj=0xaaaaac214ab0, string=0xaaaaac11b4c0 "on", name=0xaaaaac11a970 "pmu", errp=0xffffffffe438) at qom/object.c:1561
  #7  0x0000aaaaab54ee8c in object_apply_global_props (obj=0xaaaaac214ab0, props=0xaaaaac018e20, errp=0xaaaaabd6fd88 <error_fatal>) at qom/object.c:407
  #8  0x0000aaaaab1dd5a4 in qdev_prop_set_globals (dev=0xaaaaac214ab0) at hw/core/qdev-properties.c:1218
  #9  0x0000aaaaab1d9fac in device_post_init (obj=0xaaaaac214ab0) at hw/core/qdev.c:1050
  ...
  #15 0x0000aaaaab54f310 in object_initialize_with_type (obj=0xaaaaac214ab0, size=52208, type=0xaaaaabe237f0) at qom/object.c:512
  #16 0x0000aaaaab54fa24 in object_new_with_type (type=0xaaaaabe237f0) at qom/object.c:687
  #17 0x0000aaaaab54fa80 in object_new (typename=0xaaaaabe23970 "host-arm-cpu") at qom/object.c:702
  #18 0x0000aaaaaaf04a74 in machvirt_init (machine=0xaaaaac0a8550) at hw/arm/virt.c:1770
  #19 0x0000aaaaab1e8720 in machine_run_board_init (machine=0xaaaaac0a8550) at hw/core/machine.c:1138
  #20 0x0000aaaaaaf95394 in qemu_init (argc=5, argv=0xffffffffea58, envp=0xffffffffea88) at softmmu/vl.c:4348
  #21 0x0000aaaaaada3f74 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at softmmu/main.c:48

This is because in frame #2, cpu->kvm_state is still NULL
(the vCPU is not yet realized).

KVM has a hard requirement of all cores supporting the same
feature set. We only need to check if the accelerator supports
a feature, not each vCPU individually.

Fix by removing the 'CPUState *cpu' argument from the
kvm_arm_<FEATURE>_supported() functions.

Fixes: d70c996 ('Use CPUState::kvm_state in kvm_arm_pmu_supported')
Reported-by: Haibo Xu <haibo.xu@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vivier pushed a commit that referenced this issue Dec 19, 2020
In qobject_type(), NULL is returned when the 'QObject' returned from parse_value() is not of QString type,
and this 'QObject' memory will leaked.
So we need to first cache the 'QObject' returned from parse_value(), and finally
free 'QObject' memory at the end of the function.
Also, we add a testcast about invalid dict key.

The memleak stack is as follows:
Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0xfffe4b3c34fb in __interceptor_malloc (/lib64/libasan.so.4+0xd34fb)
    #1 0xfffe4ae48aa3 in g_malloc (/lib64/libglib-2.0.so.0+0x58aa3)
    #2 0xaaab3557d9f7 in qnum_from_int qemu/qobject/qnum.c:25
    #3 0xaaab35584d23 in parse_literal qemu/qobject/json-parser.c:511
    #4 0xaaab35584d23 in parse_value qemu/qobject/json-parser.c:554
    #5 0xaaab35583d77 in parse_pair qemu/qobject/json-parser.c:270
    #6 0xaaab355845db in parse_object qemu/qobject/json-parser.c:327
    #7 0xaaab355845db in parse_value qemu/qobject/json-parser.c:546
    #8 0xaaab35585b1b in json_parser_parse qemu/qobject/json-parser.c:580
    #9 0xaaab35583703 in json_message_process_token qemu/qobject/json-streamer.c:92
    #10 0xaaab355ddccf in json_lexer_feed_char qemu/qobject/json-lexer.c:313
    #11 0xaaab355de0eb in json_lexer_feed qemu/qobject/json-lexer.c:350
    #12 0xaaab354aff67 in tcp_chr_read qemu/chardev/char-socket.c:525
    #13 0xfffe4ae429db in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x529db)
    #14 0xfffe4ae42d8f  (/lib64/libglib-2.0.so.0+0x52d8f)
    #15 0xfffe4ae430df in g_main_loop_run (/lib64/libglib-2.0.so.0+0x530df)
    #16 0xaaab34d70bff in iothread_run qemu/iothread.c:82
    #17 0xaaab3559d71b in qemu_thread_start qemu/util/qemu-thread-posix.c:519

Fixes: 532fb53 ("qapi: Make more of qobject_to()")
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201113145525.85151-1-alex.chen@huawei.com>
[Commit message tweaked]
vivier pushed a commit that referenced this issue Jan 7, 2021
In host_memory_backend_get_host_nodes, we build host_nodes
list and output it to v (a StringOutputVisitor) but forget
to free the list. This fixes the memory leak.

The memory leak stack:

 Direct leak of 32 byte(s) in 2 object(s) allocated from:
    #0 0xfffda30b3393 in __interceptor_calloc (/usr/lib64/libasan.so.4+0xd3393)
    #1 0xfffda1d28b9b in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x58b9b)
    #2 0xaaab05ca6e43 in host_memory_backend_get_host_nodes backends/hostmem.c:94
    #3 0xaaab061ddf83 in object_property_get_uint16List qom/object.c:1478
    #4 0xaaab05866513 in query_memdev hw/core/machine-qmp-cmds.c:312
    #5 0xaaab061d980b in do_object_child_foreach qom/object.c:1001
    #6 0xaaab0586779b in qmp_query_memdev hw/core/machine-qmp-cmds.c:328
    #7 0xaaab0615ed3f in qmp_marshal_query_memdev qapi/qapi-commands-machine.c:327
    #8 0xaaab0632d647 in do_qmp_dispatch qapi/qmp-dispatch.c:147
    #9 0xaaab0632d647 in qmp_dispatch qapi/qmp-dispatch.c:190
    #10 0xaaab0610f74b in monitor_qmp_dispatch monitor/qmp.c:120
    #11 0xaaab0611074b in monitor_qmp_bh_dispatcher monitor/qmp.c:209
    #12 0xaaab063caefb in aio_bh_poll util/async.c:117
    #13 0xaaab063d30fb in aio_dispatch util/aio-posix.c:459
    #14 0xaaab063cac8f in aio_ctx_dispatch util/async.c:268
    #15 0xfffda1d22a6b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a6b)
    #16 0xaaab063d0e97 in glib_pollfds_poll util/main-loop.c:218
    #17 0xaaab063d0e97 in os_host_main_loop_wait util/main-loop.c:241
    #18 0xaaab063d0e97 in main_loop_wait util/main-loop.c:517
    #19 0xaaab05c8bfa7 in main_loop /root/rpmbuild/BUILD/qemu-4.1.0/vl.c:1791
    #20 0xaaab05713bc3 in main /root/rpmbuild/BUILD/qemu-4.1.0/vl.c:4473
    #21 0xfffda0a83ebf in __libc_start_main (/usr/lib64/libc.so.6+0x23ebf)
    #22 0xaaab0571ed5f  (aarch64-softmmu/qemu-system-aarch64+0x88ed5f)
 SUMMARY: AddressSanitizer: 32 byte(s) leaked in 2 allocation(s).

Fixes: 4cf1b76 (hostmem: add properties for NUMA memory policy)
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Tested-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201210075226.20196-1-zhukeqian1@huawei.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
vivier pushed a commit that referenced this issue Jan 23, 2021
The adc_qom_set function didn't free "response", which caused an indirect
memory leak. So use qobject_unref() to fix it.

ASAN shows memory leak stack:

Indirect leak of 593280 byte(s) in 144 object(s) allocated from:
    #0 0x7f9a5e7e8d4e in __interceptor_calloc (/lib64/libasan.so.5+0x112d4e)
    #1 0x7f9a5e607a50 in g_malloc0 (/lib64/libglib-2.0.so.0+0x55a50)
    #2 0x55b1bebf636b in qdict_new ../qobject/qdict.c:30
    #3 0x55b1bec09699 in parse_object ../qobject/json-parser.c:318
    #4 0x55b1bec0b2df in parse_value ../qobject/json-parser.c:546
    #5 0x55b1bec0b6a9 in json_parser_parse ../qobject/json-parser.c:580
    #6 0x55b1bec060d1 in json_message_process_token ../qobject/json-streamer.c:92
    #7 0x55b1bec16a12 in json_lexer_feed_char ../qobject/json-lexer.c:313
    #8 0x55b1bec16fbd in json_lexer_feed ../qobject/json-lexer.c:350
    #9 0x55b1bec06453 in json_message_parser_feed ../qobject/json-streamer.c:121
    #10 0x55b1bebc2d51 in qmp_fd_receive ../tests/qtest/libqtest.c:614
    #11 0x55b1bebc2f5e in qtest_qmp_receive_dict ../tests/qtest/libqtest.c:636
    #12 0x55b1bebc2e6c in qtest_qmp_receive ../tests/qtest/libqtest.c:624
    #13 0x55b1bebc3340 in qtest_vqmp ../tests/qtest/libqtest.c:715
    #14 0x55b1bebc3942 in qtest_qmp ../tests/qtest/libqtest.c:756
    #15 0x55b1bebbd64a in adc_qom_set ../tests/qtest/npcm7xx_adc-test.c:127
    #16 0x55b1bebbd793 in adc_write_input ../tests/qtest/npcm7xx_adc-test.c:140
    #17 0x55b1bebbdf92 in test_convert_external ../tests/qtest/npcm7xx_adc-test.c:246

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Message-id: 20210118065627.79903-1-ganqixin@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vivier pushed a commit that referenced this issue Jan 23, 2021
The pwm_qom_get function didn't free "response", which caused an indirect
memory leak. So use qobject_unref() to fix it.

ASAN shows memory leak stack:

Indirect leak of 74160000 byte(s) in 18000 object(s) allocated from:
    #0 0x7f96e2f79d4e in __interceptor_calloc (/lib64/libasan.so.5+0x112d4e)
    #1 0x7f96e2d98a50 in g_malloc0 (/lib64/libglib-2.0.so.0+0x55a50)
    #2 0x556313112180 in qdict_new ../qobject/qdict.c:30
    #3 0x556313115bca in parse_object ../qobject/json-parser.c:318
    #4 0x556313117810 in parse_value ../qobject/json-parser.c:546
    #5 0x556313117bda in json_parser_parse ../qobject/json-parser.c:580
    #6 0x55631310fe67 in json_message_process_token ../qobject/json-streamer.c:92
    #7 0x5563131210b7 in json_lexer_feed_char ../qobject/json-lexer.c:313
    #8 0x556313121662 in json_lexer_feed ../qobject/json-lexer.c:350
    #9 0x5563131101e9 in json_message_parser_feed ../qobject/json-streamer.c:121
    #10 0x5563130cb81e in qmp_fd_receive ../tests/qtest/libqtest.c:614
    #11 0x5563130cba2b in qtest_qmp_receive_dict ../tests/qtest/libqtest.c:636
    #12 0x5563130cb939 in qtest_qmp_receive ../tests/qtest/libqtest.c:624
    #13 0x5563130cbe0d in qtest_vqmp ../tests/qtest/libqtest.c:715
    #14 0x5563130cc40f in qtest_qmp ../tests/qtest/libqtest.c:756
    #15 0x5563130c5623 in pwm_qom_get ../tests/qtest/npcm7xx_pwm-test.c:180
    #16 0x5563130c595e in pwm_get_duty ../tests/qtest/npcm7xx_pwm-test.c:210
    #17 0x5563130c7529 in test_toggle ../tests/qtest/npcm7xx_pwm-test.c:447

Reported-by: Euler Robot <euler.robot@huawei.com>
Message-Id: <20210115075634.717909-1-ganqixin@huawei.com>
Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
vivier pushed a commit that referenced this issue Feb 11, 2021
When the reconnect in NBD client is in progress, the iochannel used for
NBD connection doesn't exist.  Therefore an attempt to detach it from
the aio_context of the parent BlockDriverState results in a NULL pointer
dereference.

The problem is triggerable, in particular, when an outgoing migration is
about to finish, and stopping the dataplane tries to move the
BlockDriverState from the iothread aio_context to the main loop.  If the
NBD connection is lost before this point, and the NBD client has entered
the reconnect procedure, QEMU crashes:

  #0  qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
  #1  0x00005618034b1b68 in nbd_client_attach_aio_context_bh (
      opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
  #2  0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
  #3  0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
  #4  aio_bh_poll (ctx=ctx@entry=0x5618056c7580)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
  #5  0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580,
      blocking=blocking@entry=true)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
  #6  0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580,
      cb=<optimized out>, opaque=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
  #7  0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580,
      bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
  #8  bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00,
      new_context=new_context@entry=0x5618056c7580,
      ignore=ignore@entry=0x7f60e1e63780)
      at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
  #9  0x000056180345c969 in bdrv_child_try_set_aio_context (
      bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
      ignore_child=<optimized out>, errp=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
  #10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0,
      new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
      errp=errp@entry=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
  #11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>,
      new_context=<optimized out>, errp=errp@entry=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
  #12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
  #13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
  #14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90,
      running=0, state=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
  #15 0x0000561803208bfd in vm_state_notify (running=running@entry=0,
      state=state@entry=RUN_STATE_FINISH_MIGRATE)
      at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
  #16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
      send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
  #17 0x00005618033e3765 in migration_completion (s=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
  #18 migration_iteration_run (s=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
  #19 migration_thread (opaque=opaque@entry=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
  #20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
  #21 0x00007f61085d06ba in start_thread ()
     from /lib/x86_64-linux-gnu/libpthread.so.0
  #22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6
  #23 0x0000000000000000 in ?? ()

Fix it by checking that the iochannel is non-null before trying to
detach it from the aio_context.  If it is null, no detaching is needed,
and it will get reattached in the proper aio_context once the connection
is reestablished.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
vivier pushed a commit that referenced this issue Feb 11, 2021
When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.

However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine.  As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:

  #0  qio_channel_detach_aio_context (ioc=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452
  #1  0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151
  #2  bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00,
      new_context=new_context@entry=0x562a260c9580,
      ignore=ignore@entry=0x7feeadc9b780)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230
  #3  0x0000562a24282969 in bdrv_child_try_set_aio_context
      (bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580,
      ignore_child=<optimized out>, errp=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332
  #4  0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0,
      new_context=0x562a260c9580,
      update_root_node=update_root_node@entry=true, errp=errp@entry=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989
  #5  0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>,
      new_context=<optimized out>, errp=errp@entry=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010
  #6  0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized
      out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
  #7  0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245
  #8  0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90,
      running=0, state=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220
  #9  0x0000562a2402ebfd in vm_state_notify (running=running@entry=0,
      state=state@entry=RUN_STATE_FINISH_MIGRATE)
      at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275
  #10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
      send_stop=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032
  #11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914
  #12 migration_iteration_run (s=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275
  #13 migration_thread (opaque=opaque@entry=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439
  #14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519
  #15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700)
      at pthread_create.c:333
  #16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908,
      oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0
      <__func__.28102>, newlen=0)
      at ../sysdeps/unix/sysv/linux/sysctl.c:30
  #17 0x0000000000000000 in ?? ()

Fix it by checking that the connection coroutine is non-null before
trying to enter it.  If it is null, no entering is needed, as the
connection is probably going down anyway.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
vivier pushed a commit that referenced this issue Feb 11, 2021
Address space is destroyed without proper removal of its listeners with
current code. They are expected to be removed in
virtio_device_instance_finalize [1], but qemu calls it through
object_deinit, after address_space_destroy call through
device_set_realized [2].

Move it to virtio_device_unrealize, called before device_set_realized
[3] and making it symmetric with memory_listener_register in
virtio_device_realize.

v2: Delete no-op call of virtio_device_instance_finalize.
    Add backtraces.

[1]

 #0  virtio_device_instance_finalize (obj=0x555557de5120)
     at /home/qemu/include/hw/virtio/virtio.h:71
 #1  0x0000555555b703c9 in object_deinit (type=0x555556639860,
      obj=<optimized out>) at ../qom/object.c:671
 #2  object_finalize (data=0x555557de5120) at ../qom/object.c:685
 #3  object_unref (objptr=0x555557de5120) at ../qom/object.c:1184
 #4  0x0000555555b4de9d in bus_free_bus_child (kid=0x555557df0660)
     at ../hw/core/qdev.c:55
 #5  0x0000555555c65003 in call_rcu_thread (opaque=opaque@entry=0x0)
     at ../util/rcu.c:281

Queued by:

 #0  bus_remove_child (bus=0x555557de5098,
     child=child@entry=0x555557de5120) at ../hw/core/qdev.c:60
 #1  0x0000555555b4ee31 in device_unparent (obj=<optimized out>)
     at ../hw/core/qdev.c:984
 #2  0x0000555555b70465 in object_finalize_child_property (
     obj=<optimized out>, name=<optimized out>, opaque=0x555557de5120)
     at ../qom/object.c:1725
 #3  0x0000555555b6fa17 in object_property_del_child (
     child=0x555557de5120, obj=0x555557ddcf90) at ../qom/object.c:645
 #4  object_unparent (obj=0x555557de5120) at ../qom/object.c:664
 #5  0x0000555555b4c071 in bus_unparent (obj=<optimized out>)
     at ../hw/core/bus.c:147
 #6  0x0000555555b70465 in object_finalize_child_property (
     obj=<optimized out>, name=<optimized out>, opaque=0x555557de5098)
     at ../qom/object.c:1725
 #7  0x0000555555b6fa17 in object_property_del_child (
     child=0x555557de5098, obj=0x555557ddcf90) at ../qom/object.c:645
 #8  object_unparent (obj=0x555557de5098) at ../qom/object.c:664
 #9  0x0000555555b4ee19 in device_unparent (obj=<optimized out>)
     at ../hw/core/qdev.c:981
 #10 0x0000555555b70465 in object_finalize_child_property (
     obj=<optimized out>, name=<optimized out>, opaque=0x555557ddcf90)
     at ../qom/object.c:1725
 #11 0x0000555555b6fa17 in object_property_del_child (
     child=0x555557ddcf90, obj=0x55555685da10) at ../qom/object.c:645
 #12 object_unparent (obj=0x555557ddcf90) at ../qom/object.c:664
 #13 0x00005555558dc331 in pci_for_each_device_under_bus (
     opaque=<optimized out>, fn=<optimized out>, bus=<optimized out>)
     at ../hw/pci/pci.c:1654

[2]

Optimizer omits pci_qdev_unrealize, called by device_set_realized, and
do_pci_unregister_device, called by pci_qdev_unrealize and caller of
address_space_destroy.

 #0  address_space_destroy (as=0x555557ddd1b8)
     at ../softmmu/memory.c:2840
 #1  0x0000555555b4fc53 in device_set_realized (obj=0x555557ddcf90,
      value=<optimized out>, errp=0x7fffeea8f1e0)
     at ../hw/core/qdev.c:850
 #2  0x0000555555b6eaa6 in property_set_bool (obj=0x555557ddcf90,
      v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
     errp=0x7fffeea8f1e0) at ../qom/object.c:2255
 #3  0x0000555555b70e07 in object_property_set (
      obj=obj@entry=0x555557ddcf90,
      name=name@entry=0x555555db99df "realized",
      v=v@entry=0x7fffe46b7500,
      errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #4  0x0000555555b73c5f in object_property_set_qobject (
      obj=obj@entry=0x555557ddcf90,
      name=name@entry=0x555555db99df "realized",
      value=value@entry=0x7fffe44f6180,
      errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #5  0x0000555555b71044 in object_property_set_bool (
      obj=0x555557ddcf90, name=0x555555db99df "realized",
      value=<optimized out>, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #6  0x0000555555921cb7 in pcie_unplug_device (bus=<optimized out>,
      dev=0x555557ddcf90,
      opaque=<optimized out>) at /home/qemu/include/hw/qdev-core.h:17
 #7  0x00005555558dc331 in pci_for_each_device_under_bus (
      opaque=<optimized out>, fn=<optimized out>,
      bus=<optimized out>) at ../hw/pci/pci.c:1654

[3]

 #0  virtio_device_unrealize (dev=0x555557de5120)
     at ../hw/virtio/virtio.c:3680
 #1  0x0000555555b4fc63 in device_set_realized (obj=0x555557de5120,
     value=<optimized out>, errp=0x7fffee28df90)
     at ../hw/core/qdev.c:850
 #2  0x0000555555b6eab6 in property_set_bool (obj=0x555557de5120,
     v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
     errp=0x7fffee28df90) at ../qom/object.c:2255
 #3  0x0000555555b70e17 in object_property_set (
     obj=obj@entry=0x555557de5120,
     name=name@entry=0x555555db99ff "realized",
     v=v@entry=0x7ffdd8035040,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #4  0x0000555555b73c6f in object_property_set_qobject (
     obj=obj@entry=0x555557de5120,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=0x7ffdd8035020,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #5  0x0000555555b71054 in object_property_set_bool (
     obj=0x555557de5120, name=name@entry=0x555555db99ff "realized",
     value=value@entry=false, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #6  0x0000555555b4edc5 in qdev_unrealize (dev=<optimized out>)
     at ../hw/core/qdev.c:403
 #7  0x0000555555b4c2a9 in bus_set_realized (obj=<optimized out>,
     value=<optimized out>, errp=<optimized out>)
     at ../hw/core/bus.c:204
 #8  0x0000555555b6eab6 in property_set_bool (obj=0x555557de5098,
     v=<optimized out>, name=<optimized out>, opaque=0x555557df04c0,
     errp=0x7fffee28e0a0) at ../qom/object.c:2255
 #9  0x0000555555b70e17 in object_property_set (
     obj=obj@entry=0x555557de5098,
     name=name@entry=0x555555db99ff "realized",
     v=v@entry=0x7ffdd8034f50,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #10 0x0000555555b73c6f in object_property_set_qobject (
     obj=obj@entry=0x555557de5098,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=0x7ffdd8020630,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #11 0x0000555555b71054 in object_property_set_bool (
     obj=obj@entry=0x555557de5098,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=false, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #12 0x0000555555b4c725 in qbus_unrealize (
     bus=bus@entry=0x555557de5098) at ../hw/core/bus.c:178
 #13 0x0000555555b4fc00 in device_set_realized (obj=0x555557ddcf90,
     value=<optimized out>, errp=0x7fffee28e1e0)
     at ../hw/core/qdev.c:844
 #14 0x0000555555b6eab6 in property_set_bool (obj=0x555557ddcf90,
     v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
     errp=0x7fffee28e1e0) at ../qom/object.c:2255
 #15 0x0000555555b70e17 in object_property_set (
     obj=obj@entry=0x555557ddcf90,
     name=name@entry=0x555555db99ff "realized",
     v=v@entry=0x7ffdd8020560,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #16 0x0000555555b73c6f in object_property_set_qobject (
     obj=obj@entry=0x555557ddcf90,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=0x7ffdd8020540,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #17 0x0000555555b71054 in object_property_set_bool (
     obj=0x555557ddcf90, name=0x555555db99ff "realized",
     value=<optimized out>, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #18 0x0000555555921cb7 in pcie_unplug_device (bus=<optimized out>,
     dev=0x555557ddcf90, opaque=<optimized out>)
     at /home/qemu/include/hw/qdev-core.h:17
 #19 0x00005555558dc331 in pci_for_each_device_under_bus (
     opaque=<optimized out>, fn=<optimized out>, bus=<optimized out>)
     at ../hw/pci/pci.c:1654

Fixes: c611c76 ("virtio: add MemoryListener to cache ring translations")
Buglink: https://bugs.launchpad.net/qemu/+bug/1912846
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20210125192505.390554-1-eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vivier pushed a commit that referenced this issue Jun 25, 2022
the code in pcibus_get_fw_dev_path contained the potential for a
stack buffer overflow of 1 byte, potentially writing to the stack an
extra NUL byte.

This overflow could happen if the PCI slot is >= 0x10000000,
and the PCI function is >= 0x10000000, due to the size parameter
of snprintf being incorrectly calculated in the call:

    if (PCI_FUNC(d->devfn))
        snprintf(path + off, sizeof(path) + off, ",%x", PCI_FUNC(d->devfn));

since the off obtained from a previous call to snprintf is added
instead of subtracted from the total available size of the buffer.

Without the accurate size guard from snprintf, we end up writing in the
worst case:

name (32) + "@" (1) + SLOT (8) + "," (1) + FUNC (8) + term NUL (1) = 51 bytes

In order to provide something more robust, replace all of the code in
pcibus_get_fw_dev_path with a single call to g_strdup_printf,
so there is no need to rely on manual calculations.

Found by compiling QEMU with FORTIFY_SOURCE=3 as the error:

*** buffer overflow detected ***: terminated

Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff642c380 (LWP 121307)]
0x00007ffff71ff55c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
 #0  0x00007ffff71ff55c in __pthread_kill_implementation () at /lib64/libc.so.6
 #1  0x00007ffff71ac6f6 in raise () at /lib64/libc.so.6
 #2  0x00007ffff7195814 in abort () at /lib64/libc.so.6
 #3  0x00007ffff71f279e in __libc_message () at /lib64/libc.so.6
 #4  0x00007ffff729767a in __fortify_fail () at /lib64/libc.so.6
 #5  0x00007ffff7295c36 in  () at /lib64/libc.so.6
 #6  0x00007ffff72957f5 in __snprintf_chk () at /lib64/libc.so.6
 #7  0x0000555555b1c1fd in pcibus_get_fw_dev_path ()
 #8  0x0000555555f2bde4 in qdev_get_fw_dev_path_helper.constprop ()
 #9  0x0000555555f2bd86 in qdev_get_fw_dev_path_helper.constprop ()
 #10 0x00005555559a6e5d in get_boot_device_path ()
 #11 0x00005555559a712c in get_boot_devices_list ()
 #12 0x0000555555b1a3d0 in fw_cfg_machine_reset ()
 #13 0x0000555555bf4c2d in pc_machine_reset ()
 #14 0x0000555555c66988 in qemu_system_reset ()
 #15 0x0000555555a6dff6 in qdev_machine_creation_done ()
 #16 0x0000555555c79186 in qmp_x_exit_preconfig.part ()
 #17 0x0000555555c7b459 in qemu_init ()
 #18 0x0000555555960a29 in main ()

Found-by: Dario Faggioli <Dario Faggioli <dfaggioli@suse.com>
Found-by: Martin Liška <martin.liska@suse.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20220531114707.18830-1-cfontana@suse.de>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
vivier pushed a commit that referenced this issue Jan 27, 2023
This is below memleak detected when to quit the qemu-system-x86_64 (with
vhost-scsi-pci).

(qemu) quit

=================================================================
==15568==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f00aec57917 in __interceptor_calloc (/lib64/libasan.so.6+0xb4917)
    #1 0x7f00ada0d7b5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x517b5)
    #2 0x5648ffd38bac in vhost_scsi_start ../hw/scsi/vhost-scsi.c:92
    #3 0x5648ffd38d52 in vhost_scsi_set_status ../hw/scsi/vhost-scsi.c:131
    #4 0x5648ffda340e in virtio_set_status ../hw/virtio/virtio.c:2036
    #5 0x5648ff8de281 in virtio_ioport_write ../hw/virtio/virtio-pci.c:431
    #6 0x5648ff8deb29 in virtio_pci_config_write ../hw/virtio/virtio-pci.c:576
    #7 0x5648ffe5c0c2 in memory_region_write_accessor ../softmmu/memory.c:493
    #8 0x5648ffe5c424 in access_with_adjusted_size ../softmmu/memory.c:555
    #9 0x5648ffe6428f in memory_region_dispatch_write ../softmmu/memory.c:1515
    #10 0x5648ffe8613d in flatview_write_continue ../softmmu/physmem.c:2825
    #11 0x5648ffe86490 in flatview_write ../softmmu/physmem.c:2867
    #12 0x5648ffe86d9f in address_space_write ../softmmu/physmem.c:2963
    #13 0x5648ffe86e57 in address_space_rw ../softmmu/physmem.c:2973
    #14 0x5648fffbfb3d in kvm_handle_io ../accel/kvm/kvm-all.c:2639
    #15 0x5648fffc0e0d in kvm_cpu_exec ../accel/kvm/kvm-all.c:2890
    #16 0x5648fffc90a7 in kvm_vcpu_thread_fn ../accel/kvm/kvm-accel-ops.c:51
    #17 0x56490042400a in qemu_thread_start ../util/qemu-thread-posix.c:505
    #18 0x7f00ac3b6ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)

Free the vsc->inflight at the 'stop' path.

Fixes: b82526c ("vhost-scsi: support inflight io track")
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Li Feng <fengli@smartx.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20230104160433.21353-1-dongli.zhang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vivier pushed a commit that referenced this issue Apr 5, 2023
Fixes this tsan crash, easy to reproduce with any large enough program:

$ tests/unit/test-qht
1..2
ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=1821568)
    #0 __tsan::CheckUnwind() ../../../../src/libsanitizer/tsan/tsan_rtl.cpp:353 (libtsan.so.2+0x90034)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86 (libtsan.so.2+0xca555)
    #2 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::addLock(unsigned long, unsigned long, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:67 (libtsan.so.2+0xb3616)
    #3 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::addLock(unsigned long, unsigned long, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:59 (libtsan.so.2+0xb3616)
    #4 __sanitizer::DeadlockDetector<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::onLockAfter(__sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >*, unsigned long, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:216 (libtsan.so.2+0xb3616)
    #5 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, __sanitizer::DDMutex*, bool, bool) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector1.cpp:169 (libtsan.so.2+0xb3616)
    #6 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int, int) ../../../../src/libsanitizer/tsan/tsan_rtl_mutex.cpp:200 (libtsan.so.2+0xa3382)
    #7 __tsan_mutex_post_lock ../../../../src/libsanitizer/tsan/tsan_interface_ann.cpp:384 (libtsan.so.2+0x76bc3)
    #8 qemu_spin_lock /home/cota/src/qemu/include/qemu/thread.h:259 (test-qht+0x44a97)
    #9 qht_map_lock_buckets ../util/qht.c:253 (test-qht+0x44a97)
    #10 do_qht_iter ../util/qht.c:809 (test-qht+0x45f33)
    #11 qht_iter ../util/qht.c:821 (test-qht+0x45f33)
    #12 iter_check ../tests/unit/test-qht.c:121 (test-qht+0xe473)
    #13 qht_do_test ../tests/unit/test-qht.c:202 (test-qht+0xe473)
    #14 qht_test ../tests/unit/test-qht.c:240 (test-qht+0xe7c1)
    #15 test_default ../tests/unit/test-qht.c:246 (test-qht+0xe828)
    #16 <null> <null> (libglib-2.0.so.0+0x7daed)
    #17 <null> <null> (libglib-2.0.so.0+0x7d80a)
    #18 <null> <null> (libglib-2.0.so.0+0x7d80a)
    #19 g_test_run_suite <null> (libglib-2.0.so.0+0x7dfe9)
    #20 g_test_run <null> (libglib-2.0.so.0+0x7e055)
    #21 main ../tests/unit/test-qht.c:259 (test-qht+0xd2c6)
    #22 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f)
    #23 __libc_start_main_impl ../csu/libc-start.c:392 (libc.so.6+0x29e3f)
    #24 _start <null> (test-qht+0xdb44)

Signed-off-by: Emilio Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230111151628.320011-5-cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230124180127.1881110-30-alex.bennee@linaro.org>
vivier pushed a commit that referenced this issue Apr 5, 2023
After live migration with virtio block device, qemu crash at:

	#0  0x000055914f46f795 in object_dynamic_cast_assert (obj=0x559151b7b090, typename=0x55914f80fbc4 "qio-channel", file=0x55914f80fb90 "/images/testvfe/sw/qemu.gerrit/include/io/channel.h", line=30, func=0x55914f80fcb8 <__func__.17257> "QIO_CHANNEL") at ../qom/object.c:872
	#1  0x000055914f480d68 in QIO_CHANNEL (obj=0x559151b7b090) at /images/testvfe/sw/qemu.gerrit/include/io/channel.h:29
	#2  0x000055914f4812f8 in qio_net_listener_set_client_func_full (listener=0x559151b7a720, func=0x55914f580b97 <tcp_chr_accept>, data=0x5591519f4ea0, notify=0x0, context=0x0) at ../io/net-listener.c:166
	#3  0x000055914f580059 in tcp_chr_update_read_handler (chr=0x5591519f4ea0) at ../chardev/char-socket.c:637
	#4  0x000055914f583dca in qemu_chr_be_update_read_handlers (s=0x5591519f4ea0, context=0x0) at ../chardev/char.c:226
	#5  0x000055914f57b7c9 in qemu_chr_fe_set_handlers_full (b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false, sync_state=true) at ../chardev/char-fe.c:279
	#6  0x000055914f57b86d in qemu_chr_fe_set_handlers (b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false) at ../chardev/char-fe.c:304
	#7  0x000055914f378caf in vhost_user_async_close (d=0x559152bf21a0, chardev=0x559152bf23a0, vhost=0x559152bf2420, cb=0x55914f2fb8c1 <vhost_user_blk_disconnect>) at ../hw/virtio/vhost-user.c:2725
	#8  0x000055914f2fba40 in vhost_user_blk_event (opaque=0x559152bf21a0, event=CHR_EVENT_CLOSED) at ../hw/block/vhost-user-blk.c:395
	#9  0x000055914f58388c in chr_be_event (s=0x5591519f4ea0, event=CHR_EVENT_CLOSED) at ../chardev/char.c:61
	#10 0x000055914f583905 in qemu_chr_be_event (s=0x5591519f4ea0, event=CHR_EVENT_CLOSED) at ../chardev/char.c:81
	#11 0x000055914f581275 in char_socket_finalize (obj=0x5591519f4ea0) at ../chardev/char-socket.c:1083
	#12 0x000055914f46f073 in object_deinit (obj=0x5591519f4ea0, type=0x5591519055c0) at ../qom/object.c:680
	#13 0x000055914f46f0e5 in object_finalize (data=0x5591519f4ea0) at ../qom/object.c:694
	#14 0x000055914f46ff06 in object_unref (objptr=0x5591519f4ea0) at ../qom/object.c:1202
	#15 0x000055914f4715a4 in object_finalize_child_property (obj=0x559151b76c50, name=0x559151b7b250 "char3", opaque=0x5591519f4ea0) at ../qom/object.c:1747
	#16 0x000055914f46ee86 in object_property_del_all (obj=0x559151b76c50) at ../qom/object.c:632
	#17 0x000055914f46f0d2 in object_finalize (data=0x559151b76c50) at ../qom/object.c:693
	#18 0x000055914f46ff06 in object_unref (objptr=0x559151b76c50) at ../qom/object.c:1202
	#19 0x000055914f4715a4 in object_finalize_child_property (obj=0x559151b6b560, name=0x559151b76630 "chardevs", opaque=0x559151b76c50) at ../qom/object.c:1747
	#20 0x000055914f46ef67 in object_property_del_child (obj=0x559151b6b560, child=0x559151b76c50) at ../qom/object.c:654
	#21 0x000055914f46f042 in object_unparent (obj=0x559151b76c50) at ../qom/object.c:673
	#22 0x000055914f58632a in qemu_chr_cleanup () at ../chardev/char.c:1189
	#23 0x000055914f16c66c in qemu_cleanup () at ../softmmu/runstate.c:830
	#24 0x000055914eee7b9e in qemu_default_main () at ../softmmu/main.c:38
	#25 0x000055914eee7bcc in main (argc=86, argv=0x7ffc97cb8d88) at ../softmmu/main.c:48

In char_socket_finalize after s->listener freed, event callback function
vhost_user_blk_event will be called to handle CHR_EVENT_CLOSED.
vhost_user_blk_event is calling qio_net_listener_set_client_func_full which
is still using s->listener.

Setting s->listener = NULL after object_unref(OBJECT(s->listener)) can
solve this issue.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20230214021430.3638579-1-yajunw@nvidia.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants