Waiting for NAND service to publish #1

Closed
asLody opened this issue Feb 4, 2019 · 2 comments

asLody commented Feb 4, 2019

Following the tutorial, I started QEMU and LLDB, but then the kernel loops printing: Waiting for NAND service to publish.


asLody commented Feb 4, 2019

It looks like this is what the tutorial shows :)

asLody closed this as completed Feb 4, 2019

zhuowei commented Feb 4, 2019

Yeah, the only peripheral currently emulated is the serial port.

zhuowei pushed a commit that referenced this issue Jul 23, 2019
When emulating the irqchip in QEMU, for example with the following command:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -smp 4 -hda /home/test/test.img
-machine kernel-irqchip=off --enable-kvm -vnc :0 -device edu -monitor stdio

We will get a crash with the following asan output:

(qemu) /home/test/qemu5/qemu/hw/intc/ioapic.c:266:27: runtime error: index 35 out of bounds for type 'int [24]'
=================================================================
==113504==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61b000003114 at pc 0x5579e3c7a80f bp 0x7fd004bf8c10 sp 0x7fd004bf8c00
WRITE of size 4 at 0x61b000003114 thread T4
    #0 0x5579e3c7a80e in ioapic_eoi_broadcast /home/test/qemu5/qemu/hw/intc/ioapic.c:266
    #1 0x5579e3c6f480 in apic_eoi /home/test/qemu5/qemu/hw/intc/apic.c:428
    #2 0x5579e3c720a7 in apic_mem_write /home/test/qemu5/qemu/hw/intc/apic.c:802
    qemu#3 0x5579e3b1e31a in memory_region_write_accessor /home/test/qemu5/qemu/memory.c:503
    qemu#4 0x5579e3b1e6a2 in access_with_adjusted_size /home/test/qemu5/qemu/memory.c:569
    qemu#5 0x5579e3b28d77 in memory_region_dispatch_write /home/test/qemu5/qemu/memory.c:1497
    qemu#6 0x5579e3a1b36b in flatview_write_continue /home/test/qemu5/qemu/exec.c:3323
    qemu#7 0x5579e3a1b633 in flatview_write /home/test/qemu5/qemu/exec.c:3362
    qemu#8 0x5579e3a1bcb1 in address_space_write /home/test/qemu5/qemu/exec.c:3452
    qemu#9 0x5579e3a1bd03 in address_space_rw /home/test/qemu5/qemu/exec.c:3463
    qemu#10 0x5579e3b8b979 in kvm_cpu_exec /home/test/qemu5/qemu/accel/kvm/kvm-all.c:2045
    qemu#11 0x5579e3ae4499 in qemu_kvm_cpu_thread_fn /home/test/qemu5/qemu/cpus.c:1287
    qemu#12 0x5579e4cbdb9f in qemu_thread_start util/qemu-thread-posix.c:502
    qemu#13 0x7fd0146376da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    qemu#14 0x7fd01436088e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e

This is because the ioapic_eoi_broadcast function uses 'vector' to
index 's->irq_eoi'. To fix this, we should use the irq number instead.
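
A minimal standalone sketch of the indexing problem and its fix; the constant,
struct, and field names below are assumptions taken from the backtrace, not
the actual QEMU source:

  #include <stdint.h>

  #define IOAPIC_NUM_PINS 24                 /* the 'int [24]' from the report */

  typedef struct {
      uint64_t ioredtbl[IOAPIC_NUM_PINS];    /* redirection table entries */
      int irq_eoi[IOAPIC_NUM_PINS];          /* per-irq EOI counters */
  } FakeIOAPICState;

  /* The EOI broadcast receives an APIC vector (0..255). Indexing irq_eoi[]
   * with the vector overruns the 24-entry array; the pin number n found
   * while scanning the redirection table is the safe index. */
  static void eoi_broadcast_sketch(FakeIOAPICState *s, int vector)
  {
      for (int n = 0; n < IOAPIC_NUM_PINS; n++) {
          if ((s->ioredtbl[n] & 0xff) == (uint64_t)vector) {
              s->irq_eoi[n]++;               /* fixed: was s->irq_eoi[vector]++ */
          }
      }
  }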

Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190622002119.126834-1-liq3ea@163.com>
zhuowei pushed a commit that referenced this issue Jul 23, 2019
The test aarch64 kernel is in an array defined with
 unsigned char aarch64_kernel[] = { [...] }

which means it could be any size; currently it's quite small.
However we write it to a file using init_bootfile(), which
writes exactly 512 bytes to the file. This will break if
we ever end up with a kernel larger than that, and will
read garbage off the end of the array in the current setup
where the kernel is smaller.

Make init_bootfile() take an argument giving the length of
the data to write. This allows us to use it for all architectures
(previously s390 had a special-purpose init_bootfile_s390x
which hardcoded the file to write so it could write the
correct length). We assert that the x86 bootfile really is
exactly 512 bytes as it should be (and as we were previously
just assuming it was).
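
A hedged sketch of what such a length-taking helper might look like (the
upstream signature and error handling may differ):

  #include <assert.h>
  #include <stdio.h>

  /* Write exactly `len` bytes of `content` to `bootpath`, instead of a
   * hard-coded 512, so the helper works for any kernel blob size. */
  static void init_bootfile(const char *bootpath, const void *content, size_t len)
  {
      FILE *bootfile = fopen(bootpath, "wb");

      assert(bootfile != NULL);
      assert(fwrite(content, len, 1, bootfile) == 1);
      fclose(bootfile);
  }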

This was detected by the clang-7 asan:
==15607==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a796f51d20 at pc 0x55a796b89c2f bp 0x7ffc58e89160 sp 0x7ffc58e88908
READ of size 512 at 0x55a796f51d20 thread T0
    #0 0x55a796b89c2e in fwrite (/home/petmay01/linaro/qemu-from-laptop/qemu/build/sanitizers/tests/migration-test+0xb0c2e)
    #1 0x55a796c46492 in init_bootfile /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:99:5
    #2 0x55a796c46492 in test_migrate_start /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:593
    qemu#3 0x55a796c44101 in test_baddest /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:854:9
    qemu#4 0x7f906ffd3cc9  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72cc9)
    qemu#5 0x7f906ffd3bfa  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72bfa)
    qemu#6 0x7f906ffd3bfa  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72bfa)
    qemu#7 0x7f906ffd3ea1 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72ea1)
    qemu#8 0x7f906ffd3ec0 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72ec0)
    qemu#9 0x55a796c43707 in main /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:1187:11
    qemu#10 0x7f906e9abb96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
    qemu#11 0x55a796b6c2d9 in _start (/home/petmay01/linaro/qemu-from-laptop/qemu/build/sanitizers/tests/migration-test+0x932d9)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190702150311.20467-1-peter.maydell@linaro.org
zhuowei pushed a commit that referenced this issue Jul 23, 2019
Reading the RX_DATA register when the RX_FIFO is empty triggers
an abort. This can be easily reproduced:

  $ qemu-system-arm -M emcraft-sf2 -monitor stdio -S
  QEMU 4.0.50 monitor - type 'help' for more information
  (qemu) x 0x40001010
  Aborted (core dumped)

  (gdb) bt
  #1  0x00007f035874f895 in abort () at /lib64/libc.so.6
  #2  0x00005628686591ff in fifo8_pop (fifo=0x56286a9a4c68) at util/fifo8.c:66
  qemu#3  0x00005628683e0b8e in fifo32_pop (fifo=0x56286a9a4c68) at include/qemu/fifo32.h:137
  qemu#4  0x00005628683e0efb in spi_read (opaque=0x56286a9a4850, addr=4, size=4) at hw/ssi/mss-spi.c:168
  qemu#5  0x0000562867f96801 in memory_region_read_accessor (mr=0x56286a9a4b60, addr=16, value=0x7ffeecb0c5c8, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
  qemu#6  0x0000562867f96cdb in access_with_adjusted_size (addr=16, value=0x7ffeecb0c5c8, size=4, access_size_min=1, access_size_max=4, access_fn=0x562867f967c3 <memory_region_read_accessor>, mr=0x56286a9a4b60, attrs=...) at memory.c:569
  qemu#7  0x0000562867f99940 in memory_region_dispatch_read1 (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1420
  qemu#8  0x0000562867f99a08 in memory_region_dispatch_read (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1447
  qemu#9  0x0000562867f38721 in flatview_read_continue (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, addr1=16, l=4, mr=0x56286a9a4b60) at exec.c:3385
  qemu#10 0x0000562867f38874 in flatview_read (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3423
  qemu#11 0x0000562867f388ea in address_space_read_full (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3436
  qemu#12 0x0000562867f389c5 in address_space_rw (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=false) at exec.c:3466
  qemu#13 0x0000562867f3bdd7 in cpu_memory_rw_debug (cpu=0x56286aa19d00, addr=1073745936, buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=0) at exec.c:3976
  qemu#14 0x000056286811ed51 in memory_dump (mon=0x56286a8c32d0, count=1, format=120, wsize=4, addr=1073745936, is_physical=0) at monitor/misc.c:730
  qemu#15 0x000056286811eff1 in hmp_memory_dump (mon=0x56286a8c32d0, qdict=0x56286b15c400) at monitor/misc.c:785
  qemu#16 0x00005628684740ee in handle_hmp_command (mon=0x56286a8c32d0, cmdline=0x56286a8caeb2 "0x40001010") at monitor/hmp.c:1082

From the datasheet "Actel SmartFusion Microcontroller Subsystem
User's Guide" Rev.1, Table 13-3 "SPI Register Summary", this
register has a reset value of 0.

Check that the FIFO is not empty before accessing it; otherwise log an
error message.
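
A hedged sketch of that guard, written against QEMU's fifo32 and logging
helpers; the device state type and its rx_fifo field name are assumptions,
so this is only meant as an illustration that would build inside the QEMU tree:

  static uint32_t spi_rx_data_read_sketch(MSSSpiState *s)
  {
      uint32_t ret = 0;                      /* the documented reset value */

      if (fifo32_is_empty(&s->rx_fifo)) {
          qemu_log_mask(LOG_GUEST_ERROR,
                        "%s: Reading empty RX_FIFO\n", __func__);
      } else {
          ret = fifo32_pop(&s->rx_fifo);
      }
      return ret;
  }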

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20190709113715.7761-3-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
zhuowei pushed a commit that referenced this issue Jul 23, 2019
In the previous commit we fixed a crash when the guest read a
register that pops from an empty FIFO.
By auditing the repository, we found another similar use with
an easy way to reproduce:

  $ qemu-system-aarch64 -M xlnx-zcu102 -monitor stdio -S
  QEMU 4.0.50 monitor - type 'help' for more information
  (qemu) xp/b 0xfd4a0134
  Aborted (core dumped)

  (gdb) bt
  #0  0x00007f6936dea57f in raise () at /lib64/libc.so.6
  #1  0x00007f6936dd4895 in abort () at /lib64/libc.so.6
  #2  0x0000561ad32975ec in xlnx_dp_aux_pop_rx_fifo (s=0x7f692babee70) at hw/display/xlnx_dp.c:431
  qemu#3  0x0000561ad3297dc0 in xlnx_dp_read (opaque=0x7f692babee70, offset=77, size=4) at hw/display/xlnx_dp.c:667
  qemu#4  0x0000561ad321b896 in memory_region_read_accessor (mr=0x7f692babf620, addr=308, value=0x7ffe05c1db88, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
  qemu#5  0x0000561ad321bd70 in access_with_adjusted_size (addr=308, value=0x7ffe05c1db88, size=1, access_size_min=4, access_size_max=4, access_fn=0x561ad321b858 <memory_region_read_accessor>, mr=0x7f692babf620, attrs=...) at memory.c:569
  qemu#6  0x0000561ad321e9d5 in memory_region_dispatch_read1 (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1420
  qemu#7  0x0000561ad321ea9d in memory_region_dispatch_read (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1447
  qemu#8  0x0000561ad31bd742 in flatview_read_continue (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1, addr1=308, l=1, mr=0x7f692babf620) at exec.c:3385
  qemu#9  0x0000561ad31bd895 in flatview_read (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3423
  qemu#10 0x0000561ad31bd90b in address_space_read_full (as=0x561ad5bb3020, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3436
  qemu#11 0x0000561ad33b1c42 in address_space_read (len=1, buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", attrs=..., addr=4249485620, as=0x561ad5bb3020) at include/exec/memory.h:2131
  qemu#12 0x0000561ad33b1c42 in memory_dump (mon=0x561ad59c4530, count=1, format=120, wsize=1, addr=4249485620, is_physical=1) at monitor/misc.c:723
  qemu#13 0x0000561ad33b1fc1 in hmp_physical_memory_dump (mon=0x561ad59c4530, qdict=0x561ad6c6fd00) at monitor/misc.c:795
  qemu#14 0x0000561ad37b4a9f in handle_hmp_command (mon=0x561ad59c4530, cmdline=0x561ad59d0f22 "/b 0x00000000fd4a0134") at monitor/hmp.c:1082

Fix by checking the FIFO is not empty before popping from it.

The datasheet is not clear about the reset value of this register,
so we choose to return '0'.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20190709113715.7761-4-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
zhuowei pushed a commit that referenced this issue Jul 23, 2019
When creating the admin queue in nvme_init(), the variable that
holds the number of queues created is modified before the actual
queue creation. This is a problem because if creating the queue
fails, the variable is left in an inconsistent state. This was
actually observed when I tried to hotplug an nvme disk. The
control got to nvme_file_open(), which called nvme_init(), which
failed, and thus nvme_close() was called, which in turn called
nvme_free_queue_pair() with the queue being NULL. This led to an
instant crash (a sketch of the corrected ordering follows the backtrace):

  #0  0x000055d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at block/nvme.c:164
  #1  0x000055d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729
  #2  0x000055d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781
  qemu#3  0x000055d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, drv=0x55d95109c1e0 <bdrv_nvme>, node_name=0x0, options=0x55d952bb1410, open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291
  qemu#4  0x000055d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551
  qemu#5  0x000055d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, errp=0x7ffd8e19e510) at block.c:3063
  qemu#6  0x000055d950765ae4 in bdrv_open_child_bs (filename=0x0, options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, allow_none=true, errp=0x7ffd8e19e510) at block.c:2712
  qemu#7  0x000055d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, errp=0x7ffd8e19e908) at block.c:3011
  qemu#8  0x000055d950766dba in bdrv_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156
  qemu#9  0x000055d9507cb635 in blk_new_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block/block-backend.c:389
  qemu#10 0x000055d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, errp=0x7ffd8e19e908) at blockdev.c:602
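
A minimal standalone sketch of the ordering described above, using invented
names; only the update-the-counter-after-success pattern matters here:

  #include <stdlib.h>

  typedef struct { int unused; } QueuePair;

  typedef struct {
      QueuePair *queues[4];
      unsigned nr_queues;
  } CtrlState;

  /* Create the queue first and only then bump nr_queues, so a failure
   * leaves the counter consistent with what really exists and the cleanup
   * path never frees a NULL queue. */
  static int init_admin_queue_sketch(CtrlState *s)
  {
      QueuePair *q = calloc(1, sizeof(*q));  /* stand-in for queue creation */

      if (!q) {
          return -1;                         /* nr_queues still says 0 */
      }
      s->queues[0] = q;
      s->nr_queues = 1;                      /* incremented only after success */
      return 0;
  }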

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-id: 927aae40b617ba7d4b6c7ffe74e6d7a2595f8e86.1562770546.git.mprivozn@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 4, 2019
Commit a6f230c moved the blockbackend back to the main AioContext on unplug.
It sets the AioContext of the SCSIDevice to the main AioContext, but s->ctx
is still the iothread AioContext (if the scsi controller is configured with
an iothread). So if there are in-flight requests during unplug, a failing
assertion happens. The backtrace is below:
(gdb) bt
#0  0x0000ffff86aacbd0 in raise () from /lib64/libc.so.6
#1  0x0000ffff86aadf7c in abort () from /lib64/libc.so.6
#2  0x0000ffff86aa6124 in __assert_fail_base () from /lib64/libc.so.6
qemu#3  0x0000ffff86aa61a4 in __assert_fail () from /lib64/libc.so.6
qemu#4  0x0000000000529118 in virtio_scsi_ctx_check (d=<optimized out>, s=<optimized out>, s=<optimized out>) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:246
qemu#5  0x0000000000529ec4 in virtio_scsi_handle_cmd_req_prepare (s=0x2779ec00, req=0xffff740397d0) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:559
qemu#6  0x000000000052a228 in virtio_scsi_handle_cmd_vq (s=0x2779ec00, vq=0xffff7c6d7110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:603
qemu#7  0x000000000052afa8 in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0xffff7c6d7110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi-dataplane.c:59
qemu#8  0x000000000054d94c in virtio_queue_host_notifier_aio_poll (opaque=<optimized out>) at /home/qemu-4.0.0/hw/virtio/virtio.c:2452

assert(blk_get_aio_context(d->conf.blk) == s->ctx) failed.

To avoid the assertion failure, move the "if" after qdev_simple_device_unplug_cb.

In addition, to avoid another qemu crash below, add aio_disable_external before
qdev_simple_device_unplug_cb, which disables further processing of external
clients while qdev_simple_device_unplug_cb runs.
(gdb) bt
#0  scsi_req_unref (req=0xffff6802c6f0) at hw/scsi/scsi-bus.c:1283
#1  0x00000000005294a4 in virtio_scsi_handle_cmd_req_submit (req=<optimized out>,
    s=<optimized out>) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:589
#2  0x000000000052a2a8 in virtio_scsi_handle_cmd_vq (s=s@entry=0x9c90e90,
    vq=vq@entry=0xffff7c05f110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:625
qemu#3  0x000000000052afd8 in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>,
    vq=0xffff7c05f110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi-dataplane.c:60
qemu#4  0x000000000054d97c in virtio_queue_host_notifier_aio_poll (opaque=<optimized out>)
    at /home/qemu-4.0.0/hw/virtio/virtio.c:2447
qemu#5  0x00000000009b204c in run_poll_handlers_once (ctx=ctx@entry=0x6efea40,
    timeout=timeout@entry=0xffff7d7f7308) at util/aio-posix.c:521
qemu#6  0x00000000009b2b64 in run_poll_handlers (ctx=ctx@entry=0x6efea40,
    max_ns=max_ns@entry=4000, timeout=timeout@entry=0xffff7d7f7308) at util/aio-posix.c:559
qemu#7  0x00000000009b2ca0 in try_poll_mode (ctx=ctx@entry=0x6efea40, timeout=0xffff7d7f7308,
    timeout@entry=0xffff7d7f7348) at util/aio-posix.c:594
qemu#8  0x00000000009b31b8 in aio_poll (ctx=0x6efea40, blocking=blocking@entry=true)
    at util/aio-posix.c:636
qemu#9  0x00000000006973cc in iothread_run (opaque=0x6ebd800) at iothread.c:75
qemu#10 0x00000000009b592c in qemu_thread_start (args=0x6efef60) at util/qemu-thread-posix.c:502
qemu#11 0x0000ffff8057f8bc in start_thread () from /lib64/libpthread.so.0
qemu#12 0x0000ffff804e5f8c in thread_start () from /lib64/libc.so.6
(gdb) p bus
$1 = (SCSIBus *) 0x0

Signed-off-by: Zhengui li <lizhengui@huawei.com>
Message-Id: <1563696502-7972-1-git-send-email-lizhengui@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1563829520-17525-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 4, 2019
Commit 4812f26 tried to fix the rollback path of xics_kvm_connect(), but
it isn't enough. If we fail to create the KVM device, the guest fails
to boot later on with:

[    0.010817] pci 0000:00:00.0: Adding to iommu group 0
[    0.010863] irq: unknown-1 didn't like hwirq-0x1200 to VIRQ17 mapping (rc=-22)
[    0.010923] pci 0000:00:01.0: Adding to iommu group 0
[    0.010968] irq: unknown-1 didn't like hwirq-0x1201 to VIRQ17 mapping (rc=-22)
[    0.011543] EEH: No capable adapters found
[    0.011597] irq: unknown-1 didn't like hwirq-0x1000 to VIRQ17 mapping (rc=-22)
[    0.011651] audit: type=2000 audit(1563977526.000:1): state=initialized audit_enabled=0 res=1
[    0.011703] ------------[ cut here ]------------
[    0.011729] event-sources: Unable to allocate interrupt number for /event-sources/epow-events
[    0.011776] WARNING: CPU: 0 PID: 1 at arch/powerpc/platforms/pseries/event_sources.c:34 request_event_sources_irqs+0xbc/0x150
[    0.011828] Modules linked in:
[    0.011850] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.17-300.fc30.ppc64le #1
[    0.011886] NIP:  c0000000000d4fac LR: c0000000000d4fa8 CTR: c0000000018f0000
[    0.011923] REGS: c00000001e4c38d0 TRAP: 0700   Not tainted  (5.1.17-300.fc30.ppc64le)
[    0.011966] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000284  XER: 20040000
[    0.012012] CFAR: c00000000011b42c IRQMASK: 0
[    0.012012] GPR00: c0000000000d4fa8 c00000001e4c3b60 c0000000015fc400 0000000000000051
[    0.012012] GPR04: 0000000000000001 0000000000000000 0000000000000081 772d6576656e7473
[    0.012012] GPR08: 000000001edf0000 c0000000014d4830 c0000000014d4830 6e6576652f20726f
[    0.012012] GPR12: 0000000000000000 c0000000018f0000 c000000000010bf0 0000000000000000
[    0.012012] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.012012] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.012012] GPR24: 0000000000000000 0000000000000000 c000000000ebbf00 c0000000000d5570
[    0.012012] GPR28: c000000000ebc008 c00000001fff8248 0000000000000000 0000000000000000
[    0.012372] NIP [c0000000000d4fac] request_event_sources_irqs+0xbc/0x150
[    0.012409] LR [c0000000000d4fa8] request_event_sources_irqs+0xb8/0x150
[    0.012445] Call Trace:
[    0.012462] [c00000001e4c3b60] [c0000000000d4fa8] request_event_sources_irqs+0xb8/0x150 (unreliable)
[    0.012513] [c00000001e4c3bf0] [c000000001042848] __machine_initcall_pseries_init_ras_IRQ+0xc8/0xf8
[    0.012563] [c00000001e4c3c20] [c000000000010810] do_one_initcall+0x60/0x254
[    0.012611] [c00000001e4c3cf0] [c000000001024538] kernel_init_freeable+0x35c/0x444
[    0.012655] [c00000001e4c3db0] [c000000000010c14] kernel_init+0x2c/0x148
[    0.012693] [c00000001e4c3e20] [c00000000000bdc4] ret_from_kernel_thread+0x5c/0x78
[    0.012736] Instruction dump:
[    0.012759] 38a00000 7c7f1b78 7f64db78 2c1f0000 2fbf0000 78630020 4180002c 409effa8
[    0.012805] 7fa4eb78 7f43d378 48046421 60000000 <0fe00000> 3bde0001 2c1e0010 7fde07b4
[    0.012851] ---[ end trace aa5785707323fad3 ]---

This happens because QEMU fell back on XICS emulation but didn't unregister
the RTAS calls from KVM. The emulated RTAS calls are hence never called and
the KVM ones return an error to the guest since the KVM device is absent.

The sanity checks in xics_kvm_disconnect() are abusive since we're freeing
the KVM device. Simply drop them.

Fixes: 4812f26 "xics/kvm: Add proper rollback to xics_kvm_init()"
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156398744035.546975.7029414194633598474.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
Since commit d70c996, when enabling the PMU we get:

  $ qemu-system-aarch64 -cpu host,pmu=on -M virt,accel=kvm,gic-version=3
  Segmentation fault (core dumped)

  Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault.
  0x0000aaaaaae356d0 in kvm_ioctl (s=0x0, type=44547) at accel/kvm/kvm-all.c:2588
  2588        ret = ioctl(s->fd, type, arg);
  (gdb) bt
  #0  0x0000aaaaaae356d0 in kvm_ioctl (s=0x0, type=44547) at accel/kvm/kvm-all.c:2588
  #1  0x0000aaaaaae31568 in kvm_check_extension (s=0x0, extension=126) at accel/kvm/kvm-all.c:916
  #2  0x0000aaaaaafce254 in kvm_arm_pmu_supported (cpu=0xaaaaac214ab0) at target/arm/kvm.c:213
  qemu#3  0x0000aaaaaafc0f94 in arm_set_pmu (obj=0xaaaaac214ab0, value=true, errp=0xffffffffe438) at target/arm/cpu.c:1111
  qemu#4  0x0000aaaaab5533ac in property_set_bool (obj=0xaaaaac214ab0, v=0xaaaaac223a80, name=0xaaaaac11a970 "pmu", opaque=0xaaaaac222730, errp=0xffffffffe438) at qom/object.c:2170
  qemu#5  0x0000aaaaab5512f0 in object_property_set (obj=0xaaaaac214ab0, v=0xaaaaac223a80, name=0xaaaaac11a970 "pmu", errp=0xffffffffe438) at qom/object.c:1328
  qemu#6  0x0000aaaaab551e10 in object_property_parse (obj=0xaaaaac214ab0, string=0xaaaaac11b4c0 "on", name=0xaaaaac11a970 "pmu", errp=0xffffffffe438) at qom/object.c:1561
  qemu#7  0x0000aaaaab54ee8c in object_apply_global_props (obj=0xaaaaac214ab0, props=0xaaaaac018e20, errp=0xaaaaabd6fd88 <error_fatal>) at qom/object.c:407
  qemu#8  0x0000aaaaab1dd5a4 in qdev_prop_set_globals (dev=0xaaaaac214ab0) at hw/core/qdev-properties.c:1218
  qemu#9  0x0000aaaaab1d9fac in device_post_init (obj=0xaaaaac214ab0) at hw/core/qdev.c:1050
  ...
  qemu#15 0x0000aaaaab54f310 in object_initialize_with_type (obj=0xaaaaac214ab0, size=52208, type=0xaaaaabe237f0) at qom/object.c:512
  qemu#16 0x0000aaaaab54fa24 in object_new_with_type (type=0xaaaaabe237f0) at qom/object.c:687
  qemu#17 0x0000aaaaab54fa80 in object_new (typename=0xaaaaabe23970 "host-arm-cpu") at qom/object.c:702
  qemu#18 0x0000aaaaaaf04a74 in machvirt_init (machine=0xaaaaac0a8550) at hw/arm/virt.c:1770
  qemu#19 0x0000aaaaab1e8720 in machine_run_board_init (machine=0xaaaaac0a8550) at hw/core/machine.c:1138
  qemu#20 0x0000aaaaaaf95394 in qemu_init (argc=5, argv=0xffffffffea58, envp=0xffffffffea88) at softmmu/vl.c:4348
  qemu#21 0x0000aaaaaada3f74 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at softmmu/main.c:48

This is because in frame #2, cpu->kvm_state is still NULL
(the vCPU is not yet realized).

KVM has a hard requirement of all cores supporting the same
feature set. We only need to check if the accelerator supports
a feature, not each vCPU individually.

Fix by removing the 'CPUState *cpu' argument from the
kvm_arm_<FEATURE>_supported() functions.
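
A hedged sketch of the shape of that change (the capability constant and the
exact upstream code may differ):

  /* Query the accelerator's global state instead of an unrealized vCPU. */
  bool kvm_arm_pmu_supported(void)
  {
      return kvm_check_extension(kvm_state, KVM_CAP_ARM_PMU_V3);
  }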

Fixes: d70c996 ('Use CPUState::kvm_state in kvm_arm_pmu_supported')
Reported-by: Haibo Xu <haibo.xu@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
When adding the generic PCA955xClass in commit 736132e, we
forgot to set the class_size field. Fill it now to avoid:

  (gdb) run -machine mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  Starting program: ../../qemu/qemu/arm-softmmu/qemu-system-arm -machine mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  double free or corruption (!prev)
  Thread 1 "qemu-system-arm" received signal SIGABRT, Aborted.
  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  (gdb) where
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffff75d8859 in __GI_abort () at abort.c:79
  #2  0x00007ffff76433ee in __libc_message
      (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff776d285 "%s\n")
      at ../sysdeps/posix/libc_fatal.c:155
  qemu#3  0x00007ffff764b47c in malloc_printerr
      (str=str@entry=0x7ffff776f690 "double free or corruption (!prev)")
      at malloc.c:5347
  qemu#4  0x00007ffff764d12c in _int_free
      (av=0x7ffff779eb80 <main_arena>, p=0x5555567a3990, have_lock=<optimized out>) at malloc.c:4317
  qemu#5  0x0000555555c906c3 in type_initialize_interface
      (ti=ti@entry=0x5555565b8f40, interface_type=0x555556597ad0, parent_type=0x55555662ca10) at qom/object.c:259
  qemu#6  0x0000555555c902da in type_initialize (ti=ti@entry=0x5555565b8f40)
      at qom/object.c:323
  qemu#7  0x0000555555c90d20 in type_initialize (ti=0x5555565b8f40)
      at qom/object.c:1028

  $ valgrind --track-origins=yes qemu-system-arm -M mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  ==77479== Memcheck, a memory error detector
  ==77479== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
  ==77479== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
  ==77479== Command: qemu-system-arm -M mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  ==77479==
  ==77479== Invalid write of size 2
  ==77479==    at 0x6D8322: pca9552_class_init (pca9552.c:424)
  ==77479==    by 0x844D1F: type_initialize (object.c:1029)
  ==77479==    by 0x844D1F: object_class_foreach_tramp (object.c:1016)
  ==77479==    by 0x4AE1057: g_hash_table_foreach (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x8453A4: object_class_foreach (object.c:1038)
  ==77479==    by 0x8453A4: object_class_get_list (object.c:1095)
  ==77479==    by 0x556194: select_machine (vl.c:2416)
  ==77479==    by 0x556194: qemu_init (vl.c:3828)
  ==77479==    by 0x40AF9C: main (main.c:48)
  ==77479==  Address 0x583f108 is 0 bytes after a block of size 200 alloc'd
  ==77479==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==77479==    by 0x4AF8D30: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x844258: type_initialize.part.0 (object.c:306)
  ==77479==    by 0x844D1F: type_initialize (object.c:1029)
  ==77479==    by 0x844D1F: object_class_foreach_tramp (object.c:1016)
  ==77479==    by 0x4AE1057: g_hash_table_foreach (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x8453A4: object_class_foreach (object.c:1038)
  ==77479==    by 0x8453A4: object_class_get_list (object.c:1095)
  ==77479==    by 0x556194: select_machine (vl.c:2416)
  ==77479==    by 0x556194: qemu_init (vl.c:3828)
  ==77479==    by 0x40AF9C: main (main.c:48)
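
A hedged sketch of the kind of one-line fix described above; the type names
and macros are assumptions, only the class_size assignment is the point:

  static const TypeInfo pca955x_info = {
      .name          = TYPE_PCA955X,          /* assumed macro name */
      .parent        = TYPE_I2C_SLAVE,
      .instance_size = sizeof(PCA955xState),
      .class_size    = sizeof(PCA955xClass),  /* the field that was missing */
      .abstract      = true,
  };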

Fixes: 736132e ("hw/misc/pca9552: Add generic PCA955xClass")
Reported-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Message-id: 20200629074704.23028-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
Coverity has problems seeing through __builtin_choose_expr, which results
in it abandoning analysis of later functions that utilize a definition
that used MIN_CONST or MAX_CONST, such as in qemu-file.c:

 50    DECLARE_BITMAP(may_free, MAX_IOV_SIZE);

CID 1429992 (#1 of 1): Unrecoverable parse warning (PARSE_ERROR)1.
expr_not_constant: expression must have a constant value

As has been done in the past (see 07d6667), it's okay to dumb things
down when compiling for static analyzers.  (Of course, now the
syntax-checker has a false positive on our reference to
__COVERITY__...)
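
A hedged sketch of the dumbed-down pattern (not the exact macro from QEMU's
headers):

  #ifdef __COVERITY__
  /* Coverity cannot parse __builtin_choose_expr; give it a plain form. */
  #define MIN_CONST(a, b) ((a) < (b) ? (a) : (b))
  #else
  /* Compile-time-only MIN: only expands for constant arguments. */
  #define MIN_CONST(a, b)                                     \
      __builtin_choose_expr(                                  \
          __builtin_constant_p(a) && __builtin_constant_p(b), \
          (a) < (b) ? (a) : (b),                              \
          ((void)0))
  #endif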

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: CID 1429992, CID 1429995, CID 1429997, CID 1429999
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200629162804.1096180-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
tries to fix a leak detected when building with --enable-sanitizers:
./i386-softmmu/qemu-system-i386
Upon exit:
==13576==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1216 byte(s) in 1 object(s) allocated from:
    #0 0x7f9d2ed5c628 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5)
    #1 0x7f9d2e963500 in g_malloc (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.)
    #2 0x55fa646d25cc in object_new_with_type /tmp/qemu/qom/object.c:686
    qemu#3 0x55fa63dbaa88 in qdev_new /tmp/qemu/hw/core/qdev.c:140
    qemu#4 0x55fa638a533f in pc_pflash_create /tmp/qemu/hw/i386/pc_sysfw.c:88
    qemu#5 0x55fa638a54c4 in pc_system_flash_create /tmp/qemu/hw/i386/pc_sysfw.c:106
    qemu#6 0x55fa646caa1d in object_init_with_type /tmp/qemu/qom/object.c:369
    qemu#7 0x55fa646d20b5 in object_initialize_with_type /tmp/qemu/qom/object.c:511
    qemu#8 0x55fa646d2606 in object_new_with_type /tmp/qemu/qom/object.c:687
    qemu#9 0x55fa639431e9 in qemu_init /tmp/qemu/softmmu/vl.c:3878
    qemu#10 0x55fa6335c1b8 in main /tmp/qemu/softmmu/main.c:48
    qemu#11 0x7f9d2cf06e0a in __libc_start_main ../csu/libc-start.c:308
    qemu#12 0x55fa6335f8e9 in _start (/tmp/qemu/build/i386-softmmu/qemu-system-i386)

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200701145231.19531-1-alxndr@bu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
… staging

* Make checkpatch say 'qemu' instead of 'kernel' (Aleksandar)
* Fix PSE guests with emulated NPT (Alexander B. #1)
* Fix leak (Alexander B. #2)
* HVF fixes (Roman, Cameron)
* New Sapphire Rapids CPUID bits (Cathy)
* cpus.c and softmmu/ cleanups (Claudio)
* TAP driver tweaks (Daniel, Havard)
* object-add bugfix and testcases (Eric A.)
* Fix Coverity MIN_CONST and MAX_CONST (Eric B.)
* "info lapic" improvement (Jan)
* SSE fixes (Joseph)
* "-msg guest-name" option (Mario)
* support for AMD nested live migration (myself)
* Small i386 TCG fixes (myself)
* improved error reporting for Xen (myself)
* fix "-cpu host -overcommit cpu-pm=on" (myself)
* Add accel/Kconfig (Philippe)
* iscsi sense handling fixes (Yongji)
* Misc bugfixes

# gpg: Signature made Sat 11 Jul 2020 00:33:41 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (47 commits)
  linux-headers: update again to 5.8
  apic: Report current_count via 'info lapic'
  scripts: improve message when TAP based tests fail
  target/i386: Enable TSX Suspend Load Address Tracking feature
  target/i386: Add SERIALIZE cpu feature
  softmmu/vl: Remove the check for colons in -accel parameters
  cpu-throttle: new module, extracted from cpus.c
  softmmu: move softmmu only files from root
  pc: fix leak in pc_system_flash_cleanup_unused
  cpus: Move CPU code from exec.c to cpus-common.c
  target/i386: Correct the warning message of Intel PT
  checkpatch: Change occurences of 'kernel' to 'qemu' in user messages
  iscsi: return -EIO when sense fields are meaningless
  iscsi: handle check condition status in retry loop
  target/i386: sev: fail query-sev-capabilities if QEMU cannot use SEV
  target/i386: sev: provide proper error reporting for query-sev-capabilities
  KVM: x86: believe what KVM says about WAITPKG
  target/i386: implement undocumented "smsw r32" behavior
  target/i386: remove gen_io_end
  Makefile: simplify MINIKCONF rules
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
…tart

When the disconnect event is triggered in the connecting stage,
the tcp_chr_disconnect_locked may be called twice.

The first call:
    #0  qemu_chr_socket_restart_timer (chr=0x55555582ee90) at chardev/char-socket.c:120
    #1  0x000055555558e38c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
    #2  0x000055555558e3cd in tcp_chr_disconnect (chr=0x55555582ee90) at chardev/char-socket.c:497
    qemu#3  0x000055555558ea32 in tcp_chr_new_client (chr=chr@entry=0x55555582ee90, sioc=sioc@entry=0x55555582f0b0) at chardev/char-socket.c:892
    qemu#4  0x000055555558eeb8 in qemu_chr_socket_connected (task=0x55555582f300, opaque=<optimized out>) at chardev/char-socket.c:1090
    qemu#5  0x0000555555574352 in qio_task_complete (task=task@entry=0x55555582f300) at io/task.c:196
    qemu#6  0x00005555555745f4 in qio_task_thread_result (opaque=0x55555582f300) at io/task.c:111
    qemu#7  qio_task_wait_thread (task=0x55555582f300) at io/task.c:190
    qemu#8  0x000055555558f17e in tcp_chr_wait_connected (chr=0x55555582ee90, errp=0x555555802a08 <error_abort>) at chardev/char-socket.c:1013
    qemu#9  0x0000555555567cbd in char_socket_client_reconnect_test (opaque=0x5555557fe020 <client8unix>) at tests/test-char.c:1152
The second call:
    #0  0x00007ffff5ac3277 in raise () from /lib64/libc.so.6
    #1  0x00007ffff5ac4968 in abort () from /lib64/libc.so.6
    #2  0x00007ffff5abc096 in __assert_fail_base () from /lib64/libc.so.6
    qemu#3  0x00007ffff5abc142 in __assert_fail () from /lib64/libc.so.6
    qemu#4  0x000055555558d10a in qemu_chr_socket_restart_timer (chr=0x55555582ee90) at chardev/char-socket.c:125
    qemu#5  0x000055555558df0c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
    qemu#6  0x000055555558df4d in tcp_chr_disconnect (chr=0x55555582ee90) at chardev/char-socket.c:497
    qemu#7  0x000055555558e5b2 in tcp_chr_new_client (chr=chr@entry=0x55555582ee90, sioc=sioc@entry=0x55555582f0b0) at chardev/char-socket.c:892
    qemu#8  0x000055555558e93a in tcp_chr_connect_client_sync (chr=chr@entry=0x55555582ee90, errp=errp@entry=0x7fffffffd178) at chardev/char-socket.c:944
    qemu#9  0x000055555558ec78 in tcp_chr_wait_connected (chr=0x55555582ee90, errp=0x555555802a08 <error_abort>) at chardev/char-socket.c:1035
    qemu#10 0x000055555556804b in char_socket_client_test (opaque=0x5555557fe020 <client8unix>) at tests/test-char.c:1023

Run test/test-char to reproduce this issue.

test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.

Signed-off-by: Li Feng <fengli@smartx.com>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200522025554.41063-1-fengli@smartx.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
With a reconnect socket, qemu_char_open() will start a background
thread. It should keep a reference on the chardev.

Fixes invalid read:
READ of size 8 at 0x6040000ac858 thread T7
    #0 0x5555598d37b8 in unix_connect_saddr /home/elmarco/src/qq/util/qemu-sockets.c:954
    #1 0x5555598d4751 in socket_connect /home/elmarco/src/qq/util/qemu-sockets.c:1109
    #2 0x555559707c34 in qio_channel_socket_connect_sync /home/elmarco/src/qq/io/channel-socket.c:145
    qemu#3 0x5555596adebb in tcp_chr_connect_client_task /home/elmarco/src/qq/chardev/char-socket.c:1104
    qemu#4 0x555559723d55 in qio_task_thread_worker /home/elmarco/src/qq/io/task.c:123
    qemu#5 0x5555598a6731 in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:519
    qemu#6 0x7ffff40d4431 in start_thread (/lib64/libpthread.so.0+0x9431)
    qemu#7 0x7ffff40029d2 in __clone (/lib64/libc.so.6+0x1019d2)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200420112012.567284-1-marcandre.lureau@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
"tmp.tls_hostname" and "tmp.tls_creds" allocated by migrate_params_test_apply()
are never freed at the end of qmp_migrate_set_parameters(). Fix that.
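
A hedged sketch of the cleanup, with the field names taken from the leak
report below:

  /* at the end of qmp_migrate_set_parameters() */
  g_free(tmp.tls_hostname);
  g_free(tmp.tls_creds);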

The leak stack:
Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   #1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   #2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   qemu#3 0xaaaac52447fb in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1377)
   qemu#4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   qemu#5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c:165)
   qemu#6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   qemu#7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   qemu#8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   qemu#9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   qemu#10 0xaaaac55cae0f in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   qemu#11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   qemu#12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   qemu#13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   qemu#14 0xaaaac47f45ef(/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   qemu#15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   qemu#16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   #1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   #2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   qemu#3 0xaaaac5244893 in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1382)
   qemu#4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   qemu#5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c)
   qemu#6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   qemu#7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   qemu#8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   qemu#9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   qemu#10 0xaaaac55cae0f in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   qemu#11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   qemu#12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   qemu#13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   qemu#14 0xaaaac47f45ef (/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   qemu#15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   qemu#16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: KeQian Zhu <zhukeqian1@huawei.com>
Reviewed-by: HaiLiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O
request until it has completed. ENOMEDIUM requests are special because
there is no BlockDriverState when the drive has no medium!

Define a .get_aio_context() function for BlkAioEmAIOCB requests so that
bdrv_aio_cancel() can find the AioContext where the completion BH is
pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM
requests!
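
A hedged sketch of what such a .get_aio_context() hook could look like; the
struct layout and field names are assumptions based on block/block-backend.c
as it appears in the trace:

  static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
  {
      BlkAioEmAIOCB *acb = container_of(acb_, BlkAioEmAIOCB, common);

      return blk_get_aio_context(acb->rwco.blk);
  }

  static const AIOCBInfo blk_aio_em_aiocb_info = {
      .aiocb_size      = sizeof(BlkAioEmAIOCB),
      .get_aio_context = blk_aio_em_aiocb_get_aio_context,
  };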

libFuzzer triggered the following assertion:

  cat << EOF | qemu-system-i386 -M pc-q35-5.0 \
    -nographic -monitor none -serial none \
    -qtest stdio -trace ide\*
  outl 0xcf8 0x8000fa24
  outl 0xcfc 0xe106c000
  outl 0xcf8 0x8000fa04
  outw 0xcfc 0x7
  outl 0xcf8 0x8000fb20
  write 0x0 0x3 0x2780e7
  write 0xe106c22c 0xd 0x1130c218021130c218021130c2
  write 0xe106c218 0x15 0x110010110010110010110010110010110010110010
  EOF
  ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7
  ide_reset IDEstate 0x56170a77a340
  Aborted (core dumped)

  (gdb) bt
  #1  0x00007ffff4f93895 in abort () at /lib64/libc.so.6
  #2  0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745
  qemu#3  0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546
  qemu#4  0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318
  qemu#5  0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422
  qemu#6  0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650
  qemu#7  0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360
  qemu#8  0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513
  qemu#9  0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483

Looking at bdrv_aio_cancel:

2728 /* async I/Os */
2729
2730 void bdrv_aio_cancel(BlockAIOCB *acb)
2731 {
2732     qemu_aio_ref(acb);
2733     bdrv_aio_cancel_async(acb);
2734     while (acb->refcnt > 1) {
2735         if (acb->aiocb_info->get_aio_context) {
2736             aio_poll(acb->aiocb_info->get_aio_context(acb), true);
2737         } else if (acb->bs) {
2738             /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
2739              * assert that we're not using an I/O thread.  Thread-safe
2740              * code should use bdrv_aio_cancel_async exclusively.
2741              */
2742             assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context());
2743             aio_poll(bdrv_get_aio_context(acb->bs), true);
2744         } else {
2745             abort();     <===============
2746         }
2747     }
2748     qemu_aio_unref(acb);
2749 }

Fixes: 02c50ef ("block: Add bdrv_aio_cancel_async")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Buglink: https://bugs.launchpad.net/qemu/+bug/1878255
Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200720100141.129739-1-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
To make deallocating partially constructed objects work, the generated
visit_type_STRUCT() functions need to succeed without doing anything when
passed a null object.

Commit cdd2b22 "qapi: Smooth visitor error checking in generated
code" broke that.  To reproduce, run tests/test-qobject-input-visitor
with AddressSanitizer:

    ==4353==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 16 byte(s) in 1 object(s) allocated from:
	#0 0x7f192d0c5d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
	#1 0x7f192cd21b10 in g_malloc0 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x51b10)
	#2 0x556725f6bbee in visit_next_list qapi/qapi-visit-core.c:86
	qemu#3 0x556725f49e15 in visit_type_UserDefOneList tests/test-qapi-visit.c:474
	qemu#4 0x556725f4489b in test_visitor_in_fail_struct_in_list tests/test-qobject-input-visitor.c:1086
	qemu#5 0x7f192cd42f29  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f29)

    SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).

Test case /visitor/input/fail/struct-in-list feeds a list with a bad
element to the QObject input visitor.  Visiting that element duly
fails, and aborts the visit with the list only partially constructed:
the faulty object is null.  Cleaning up the partially constructed list
visits that null object, fails, and aborts the visit before the list
node gets freed.

Fix the generated visit_type_STRUCT() to succeed for null objects.
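
A hedged sketch of the generated shape after the fix (illustrative only; the
real qapi-gen output differs in detail):

  bool visit_type_STRUCT(Visitor *v, const char *name,
                         STRUCT **obj, Error **errp)
  {
      bool ok = false;

      if (!visit_start_struct(v, name, (void **)obj, sizeof(STRUCT), errp)) {
          return false;
      }
      if (!*obj) {
          /* incomplete object being deallocated: nothing to visit */
          ok = true;
          goto out_obj;
      }
      if (!visit_type_STRUCT_members(v, *obj, errp)) {
          goto out_obj;
      }
      ok = visit_check_struct(v, errp);
  out_obj:
      visit_end_struct(v, (void **)obj);
      return ok;
  }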

Fixes: cdd2b22
Reported-by: Li Qiang <liq3ea@163.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200716150617.4027356-1-armbru@redhat.com>
Tested-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
zhuowei pushed a commit that referenced this issue Aug 3, 2020
It should be safe to reenter qio_channel_yield() on the io/channel read/write
path, so it's safe to reduce in_flight and allow attaching a new aio
context. And there is no problem with allowing drain itself: the connection
attempt is not a guest request. Moreover, if the remote server is down, we
can hang in negotiation, blocking the drain section and provoking a deadlock.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done

After some time, qemu-io will hang trying to drain, for example, like
this:

 qemu#3 aio_poll (ctx=0x55f006bdd890, blocking=true) at
    util/aio-posix.c:600
 qemu#4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false,
    parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427
 qemu#5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433
 qemu#6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710
 qemu#7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498
 qemu#8 bdrv_open_inherit (filename=0x7fffba1563bc
    "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0,
    flags=24578, parent=0x0, child_class=0x0, child_role=0,
    errp=0x7fffba154620) at block.c:3491
 qemu#9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block.c:3513
 qemu#10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block/block-backend.c:421

And connection_co stack like this:

 #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918,
    action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302
 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193
 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at
    io/channel.c:472
 qemu#3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:110
 qemu#4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:143
 qemu#5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28
    "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at
    io/channel.c:247
 qemu#6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:365
 qemu#7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:391
 qemu#8 nbd_start_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, structured_reply=true,
    zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904
 qemu#9 nbd_receive_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at
    nbd/client.c:1032
 qemu#10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at
    block/nbd.c:1460
 qemu#11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287
 qemu#12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309
 qemu#13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360
 qemu#14 coroutine_trampoline (i0=113190480, i1=22000) at
    util/coroutine-ucontext.c:173

Note that the hang may be triggered by another bug, so the whole case is
fixed only together with the commit "block/nbd: on shutdown terminate
connection attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>