Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Merge/sound upstream 20230306 (v6.3-rc1) #4223

Closed

Conversation

ujfalusi
Copy link
Collaborator

@ujfalusi ujfalusi commented Mar 6, 2023

a fresh tag of 6.3-rc1, this time I have not tested locally, let's see what CI thinks of this....

Eric Dumazet and others added 30 commits February 27, 2023 11:59
This is a follow up of commit 0a375c8 ("tcp: tcp_rtx_synack()
can be called from process context").

Frederick Lawler reported another "__this_cpu_add() in preemptible"
warning caused by the same reason.

In my former patch I took care of tcp_rtx_synack()
but forgot that tcp_check_req() also contained some SNMP updates.

Note that some parts of tcp_check_req() always run in BH context,
I added a comment to clarify this.

Fixes: 8336886 ("tcp: TCP Fast Open Server - support TFO listeners")
Link: https://lore.kernel.org/netdev/8cd33923-a21d-397c-e46b-2a068c287b03@cloudflare.com/T/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Frederick Lawler <fred@cloudflare.com>
Tested-by: Frederick Lawler <fred@cloudflare.com>
Link: https://lore.kernel.org/r/20230227083336.4153089-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow the new GSS Kerberos encryption type test suites to run
outside of the kunit infrastructure. Replace the assertion that
fires when lookup_enctype() so that the case is skipped instead of
failing outright.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Unable to handle kernel paging request at virtual address 7365742 when execute
[7365742] *pgd=00000000
Internal error: Oops: 80000005 [thesofproject#1] ARM
CPU: 0 PID: 1 Comm: swapper Tainted: G                 N 6.2.0-rc7-00133-g373f26a81164-dirty thesofproject#9
Hardware name: Generic DT based system
PC is at 0x73657420
LR is at kunit_run_tests+0x3e0/0x5f4

On x86 with GCC 12, the missing array terminators did not seem to
matter. Other platforms appear to be more picky.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
…git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  The notable fixes here are the EEE fix which restores boot for many
  embedded platforms (real and QEMU); WiFi warning suppression and the
  ICE Kconfig cleanup.

  Current release - regressions:

   - phy: multiple fixes for EEE rework

   - wifi: wext: warn about usage only once

   - wifi: ath11k: allow system suspend to survive ath11k

  Current release - new code bugs:

   - mlx5: Fix memory leak in IPsec RoCE creation

   - ibmvnic: assign XPS map to correct queue index

  Previous releases - regressions:

   - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces

   - netfilter: ctnetlink: make event listener tracking global

   - nf_tables: allow to fetch set elements when table has an owner

   - mlx5:
      - fix skb leak while fifo resync and push
      - fix possible ptp queue fifo use-after-free

  Previous releases - always broken:

   - sched: fix action bind logic

   - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also
     uses a mutex

   - netfilter: conntrack: fix rmmod double-free race

   - netfilter: xt_length: use skb len to match in length_mt6, avoid
     issues with BIG TCP

  Misc:

   - ice: remove unnecessary CONFIG_ICE_GNSS

   - mlx5e: remove hairpin write debugfs files

   - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"

* tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  tcp: tcp_check_req() can be called from process context
  net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard
  xen-netback: remove unused variables pending_idx and index
  net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy
  net: dsa: ocelot_ext: remove unnecessary phylink.h include
  net: mscc: ocelot: fix duplicate driver name error
  net: dsa: felix: fix internal MDIO controller resource length
  net: dsa: seville: ignore mscc-miim read errors from Lynx PCS
  net/sched: act_sample: fix action bind logic
  net/sched: act_mpls: fix action bind logic
  net/sched: act_pedit: fix action bind logic
  wifi: wext: warn about usage only once
  wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue
  qede: avoid uninitialized entries in coal_entry array
  nfc: fix memory leak of se_io context in nfc_genl_se_io
  ice: remove unnecessary CONFIG_ICE_GNSS
  net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy()
  ibmvnic: Assign XPS map to correct queue index
  docs: net: fix inaccuracies in msg_zerocopy.rst
  tools: net: add __pycache__ to gitignore
  ...
CP0_CMGCRBASE is not always available on CPS enabled system
such as early proAptiv.

For early SMP bring up where we can't safely access memeory,
we patch the entry of CPS NMI vector to inject CMGCR address
directly into register during early core bringup.

For VPE bringup as the core is already coherenct at that point
we just read the variable to obtain the address.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In c0_compare_int_usable we clear compare interrupt by write value
just read out from counter to compare register.

However sometimes if those all instructions are graduated together
then it's possible that at the time compare register is written, the
counter haven't progressed, thus the interrupt is triggered again.

It also applies to QEMU that instructions is executed significantly
faster then counter.

Offset the value used to clear interrupt by one to prevent that happen.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
All MIPS processors on the Ralink SoCs implement the MIPS32 Release 2
Architecture. Remove SYS_HAS_CPU_MIPS32_R1.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Currently, out of every Ralink SoC, only the dt-binding of the MT7621 SoC
uses pinctrl. Because of this, PINCTRL is not selected at all. Make
SOC_MT7621 select PINCTRL.

Remove PINCTRL_MT7621, enabling it for the MT7621 SoC will be handled under
the PINCTRL_MT7621 option.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
KUnit's 'hooks.o' file need to be built-in whenever KUnit is enabled
(even if CONFIG_KUNIT=m).  We'd previously attemtped to do this by
adding 'kunit/hooks.o' to obj-y in lib/Makefile, but this caused hooks.c
to be rebuilt even when it was unchanged.

Instead, always recurse into lib/kunit using obj-y when KUnit is
enabled, and add the hooks there.

Fixes: 7170b7e ("kunit: Add "hooks" to call into KUnit when it's built as a module").
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-kselftest/CAHk-=wiEf7irTKwPJ0jTMOF3CS-13UXmF6Fns3wuWpOZ_wGyZQ@mail.gmail.com/
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Restore the vcs_size() handling in vcs_read() to what
it had been in previous version.

Fixes: 226fae1 ("vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF")
Suggested-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…rnel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've got a huge number of patches that improve code
  readability along with minor bug fixes, while we've mainly fixed some
  critical issues in recently-added per-block age-based extent_cache,
  atomic write support, and some folio cases.

  Enhancements:

   - add sysfs nodes to set last_age_weight and manage
     discard_io_aware_gran

   - show ipu policy in debugfs

   - reduce stack memory cost by using bitfield in struct f2fs_io_info

   - introduce trace_f2fs_replace_atomic_write_block

   - enhance iostat support and adds flush commands

  Bug fixes:

   - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"

   - fix kernel crash on the atomic write abort flow

   - call clear_page_private_reference in .{release,invalid}_folio

   - support .migrate_folio for compressed inode

   - fix cgroup writeback accounting with fs-layer encryption

   - retry to update the inode page given data corruption

   - fix kernel crash due to NULL io->bio

   - fix some bugs in per-block age-based extent_cache:
       - wrong calculation of block age
       - update age extent in f2fs_do_zero_range()
       - update age extent correctly during truncation"

* tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
  f2fs: drop unnecessary arg for f2fs_ioc_*()
  f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  f2fs: synchronize atomic write aborts
  f2fs: fix wrong segment count
  f2fs: replace si->sbi w/ sbi in stat_show()
  f2fs: export ipu policy in debugfs
  f2fs: make kobj_type structures constant
  f2fs: fix to do sanity check on extent cache correctly
  f2fs: add missing description for ipu_policy node
  f2fs: fix to set ipu policy
  f2fs: fix typos in comments
  f2fs: fix kernel crash due to null io->bio
  f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()
  f2fs: add sysfs nodes to set last_age_weight
  f2fs: fix f2fs_show_options to show nogc_merge mount option
  f2fs: fix cgroup writeback accounting with fs-layer encryption
  f2fs: fix wrong calculation of block age
  f2fs: fix to update age extent in f2fs_do_zero_range()
  f2fs: fix to update age extent correctly during truncation
  f2fs: fix to avoid potential memory corruption in __update_iostat_latency()
  ...
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
access(2) remains commonly used, for example on exec:
access("/etc/ld.so.preload", R_OK)

or when running gcc: strace -c gcc empty.c

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
    0.00    0.000000           0        42        26 access

It falls down to do_faccessat without the AT_EACCESS flag, which in turn
results in allocation of new creds in order to modify fsuid/fsgid and
caps.  This is a very expensive process single-threaded and most notably
multi-threaded, with numerous structures getting refed and unrefed on
imminent new cred destruction.

Turns out for typical consumers the resulting creds would be identical
and this can be checked upfront, avoiding the hard work.

An access benchmark plugged into will-it-scale running on Cascade Lake
shows:

    test     proc     before       after
    access1     1    1310582     2908735    (+121%) # distinct files
    access1    24    4716491    63822173   (+1353%) # distinct files
    access2    24    2378041     5370335    (+125%) # same file

The above benchmarks are not integrated into will-it-scale, but can be
found in a pull request:

  https://github.com/antonblanchard/will-it-scale/pull/36/files

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case 4, we are shrinking 'prev' (PPPP in the comment) and expanding
'mid' (NNNN).  So we need to make sure 'mid' clones the anon_vma from
'prev', if it doesn't have any.  After commit 0503ea8 ("mm/mmap:
remove __vma_adjust()") we can fail to do that due to wrong parameters for
dup_anon_vma().  The call is a no-op because res == next, adjust == mid
and mid == next.  Fix it.

Link: https://lkml.kernel.org/r/ad91d62b-37eb-4b73-707a-3c45c9e16256@suse.cz
Fixes: 0503ea8 ("mm/mmap: remove __vma_adjust()")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
damon_get_folio() would always increase folio _refcount and
folio_isolate_lru() would increase folio _refcount if the folio's lru flag
is set.

If an unevictable folio isolated successfully, there will be two more
_refcount.  The one from folio_isolate_lru() will be decreased in
folio_puback_lru(), but the other one from damon_get_folio() will be left
behind.  This causes a pin page.

Whatever the case, the _refcount from damon_get_folio() should be
decreased.

Link: https://lkml.kernel.org/r/20230222064223.6735-1-andrew.yang@mediatek.com
Fixes: 57223ac ("mm/damon/paddr: support the pageout scheme")
Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[5.16.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
…LUSH

DFLTCC deflate with Z_NO_FLUSH might generate a corrupted stream when the
output buffer is not large enough to fit all the deflate output at once. 
The problem takes place on closing the deflate block since flush_pending()
might leave some output bits not written.  Similar problem for software
deflate with Z_BLOCK flush option (not supported by kernel zlib deflate)
has been fixed a while ago in userspace zlib but the fix never got to the
kernel.

Now flush_pending() flushes the bit buffer before copying out the byte
buffer, in order to really flush as much as possible.

Currently there are no users of DFLTCC deflate with Z_NO_FLUSH option in
the kernel so the problem remained hidden for a while.

This commit is based on the old zlib commit:
madler/zlib@0b828b4

Link: https://lkml.kernel.org/r/20230221131617.3369978-2-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After a memory error happens on a clean folio, a process unexpectedly
receives SIGBUS when it accesses the error page.  This SIGBUS killing is
pointless and simply degrades the level of RAS of the system, because the
clean folio can be dropped without any data lost on memory error handling
as we do for a clean pagecache.

When memory_failure() is called on a clean folio, try_to_unmap() is called
twice (one from split_huge_page() and one from hwpoison_user_mappings()). 
The root cause of the issue is that pte conversion to hwpoisoned entry is
now done in the first call of try_to_unmap() because PageHWPoison is
already set at this point, while it's actually expected to be done in the
second call.  This behavior disturbs the error handling operation like
removing pagecache, which results in the malfunction described above.

So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
when we really intend to convert pte to hwpoison entry.  This can prevent
other callers of try_to_unmap() from accidentally converting to hwpoison
entries.

Link: https://lkml.kernel.org/r/20230221085905.1465385-1-naoya.horiguchi@linux.dev
Fixes: a42634a ("readahead: Use a folio in read_pages()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Georgi's old email is still picked up by the likes of get_maintainer.pl
and it keeps bouncing every time one submits an interconnect patch.  Map
it to his current @kernel.org one.

Link: https://lkml.kernel.org/r/20230217203516.826424-1-konrad.dybcio@linaro.org
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
code path:

ocfs2_ioctl_move_extents
 ocfs2_move_extents
  ocfs2_defrag_extent
   __ocfs2_move_extent
    + ocfs2_journal_access_di
    + ocfs2_split_extent  //sub-paths call jbd2_journal_restart
    + ocfs2_journal_dirty //crash by jbs2 ASSERT

crash stacks:

PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
 #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
 thesofproject#1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
 thesofproject#2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
 thesofproject#3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
 thesofproject#4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
 thesofproject#5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
 thesofproject#6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
    [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
    RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
    RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
    R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
    R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 thesofproject#7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
 thesofproject#8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
 thesofproject#9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]

Analysis

This bug has the same root cause of 'commit 7f27ec9 ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in
ocfs2_write_end_nolock()")'.  For this bug, jbd2_journal_restart() is
called by ocfs2_split_extent() during defragmenting.

How to fix

For ocfs2_split_extent() can handle journal operations totally by itself. 
Caller doesn't need to call journal access/dirty pair, and caller only
needs to call journal start/stop pair.  The fix method is to remove
journal access/dirty from __ocfs2_move_extent().

The discussion for this patch:
https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html

Link: https://lkml.kernel.org/r/20230217003717.32469-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This fixes three issues on move extents ioctl without auto defrag:

a) In ocfs2_find_victim_alloc_group(), we have to convert bits to block
   first in case of global bitmap.

b) In ocfs2_probe_alloc_group(), when finding enough bits in block
   group bitmap, we have to back off move_len to start pos as well,
   otherwise it may corrupt filesystem.

c) In ocfs2_ioctl_move_extents(), set me_threshold both for non-auto
   and auto defrag paths.  Otherwise it will set move_max_hop to 0 and
   finally cause unexpectedly ENOSPC error.

Currently there are no tools triggering the above issues since
defragfs.ocfs2 enables auto defrag by default.  Tested with manually
changing defragfs.ocfs2 to run non auto defrag path.

Link: https://lkml.kernel.org/r/20230220050526.22020-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 226fae1 ("vc_screen: move load of struct vc_data pointer in
vcs_read() to avoid UAF") moved the call to vcs_vc() into the loop.

While doing this it also moved the unconditional assignment of

	ret = -ENXIO;

This unconditional assignment was valid outside the loop but within it
it clobbers the actual value of ret.

To avoid this only assign "ret = -ENXIO" when actually needed.

[ Also, the 'goto unlock_out" needs to be just a "break", so that it
  does the right thing when it exits on later iterations when partial
  success has happened - Linus ]

Reported-by: Storm Dragon <stormdragon2976@gmail.com>
Link: https://lore.kernel.org/lkml/Y%2FKS6vdql2pIsCiI@hotmail.com/
Fixes: 226fae1 ("vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/64981d94-d00c-4b31-9063-43ad0a384bde@t-8ch.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The MAINTAINERS entry for VAS (Virtual Accelerator Switchboard) no
longer has any maintainers, it just points to linuxppc-dev, since commit
6049606 ("powerpc: Update MAINTAINERS for ibmvnic and VAS").

So just drop the VAS entry, all the paths are already covered by the
main powerpc entry, ie. the output of get_maintainer.pl is unchanged.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230221101952.2697101-1-mpe@ellerman.id.au
When KASAN/KCSAN are enabled clang generates .text.asan/tsan sections.
Because they are not mentioned in the linker script warnings are
generated, and when orphan handling is set to error that becomes a build
error, eg:

  ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is
  being placed in '.text.tsan.module_ctor' ld.lld: error:
  vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in
  '.text.tsan.module_ctor'

Fix it by adding the sections to our linker script, similar to the
generic change made in 8483788 ("vmlinux.lds.h: Handle clang's
module.{c,d}tor sections").

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
Although powerpc now has objtool mcount support, it's not enabled in all
configurations due to dependencies.

On those configurations, with some linkers (binutils 2.37 at least),
it's still possible to hit the dreaded "recordmcount bug", eg. errors
such as:

    CC      kernel/kexec_file.o
  Cannot find symbol for section 10: .text.unlikely.
  kernel/kexec_file.o: failed
  make[1]: *** [scripts/Makefile.build:287 : kernel/kexec_file.o] Error 1

Those errors are much more prevalent when building with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, because it places every function
in a separate section.

CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is marked experimental and is not
enabled in any powerpc defconfigs or by major distros. Although it does
have at least some users on 32-bit where kernel size tends to be more
important.

Avoid the build errors by blocking CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
when the build is using recordmcount, rather than objtool. In practice
that means for 64-bit big endian builds, or 64-bit clang builds - both
because they lack CONFIG_MPROFILE_KERNEL.

On 32-bit objtool is always used, so
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is still available there.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230221130331.2714199-1-mpe@ellerman.id.au
The attempt to add DMA alignment padding by moving IV to the front
of edesc was completely broken as it didn't change the places where
edesc was freed.

It's also wrong as the IV may still share a cache-line with the
edesc.

Fix this by restoring the original layout and simply reserving
enough memmory so that the IV is on a DMA cache-line by itself.

Reported-by: Meenakshi Aggarwal <meenakshi.aggarwal@nxp.com>
Fixes: 199354d ("crypto: caam - Remove GFP_DMA and add DMA alignment padding")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Clang warns (or errors with CONFIG_WERROR):

  ../drivers/gpu/drm/omapdrm/omap_fbdev.c:235:6: error: variable 'helper' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
          if (!fbdev)
              ^~~~~~
  ../drivers/gpu/drm/omapdrm/omap_fbdev.c:259:26: note: uninitialized use occurs here
          drm_fb_helper_unprepare(helper);
                                  ^~~~~~
  ../drivers/gpu/drm/omapdrm/omap_fbdev.c:235:2: note: remove the 'if' if its condition is always false
          if (!fbdev)
          ^~~~~~~~~~~
  ../drivers/gpu/drm/omapdrm/omap_fbdev.c:228:30: note: initialize the variable 'helper' to silence this warning
          struct drm_fb_helper *helper;
                                      ^
                                       = NULL
  1 error generated.

Return early, as there is nothing for the function to do if memory
cannot be allocated. There is no point in adding another label to just
emit the warning at the end of the function in this case, as memory
allocation failures are already logged.

Fixes: 3fb1f62 ("drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()")
Link: ClangBuiltLinux#1809
Link: https://lore.kernel.org/oe-kbuild-all/202302250058.fYTe9aTP-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230224-omapdrm-wsometimes-uninitialized-v1-1-3fec8906ee3a@kernel.org
The referenced commit added a wrapper for drm_gem_shmem_get_pages_sgt(),
but in the process it accidentally changed the export type from GPL to
non-GPL. Switch it back to GPL.

Reported-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Fixes: ddddeda ("drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt()")
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230227-shmem-export-fix-v1-1-8880b2c25e81@asahilina.net
Florian reports that when building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y,
he sees "Misaligned patch-site" warnings at boot, e.g.

| Misaligned patch-site bcm2836_arm_irqchip_handle_irq+0x0/0x88
| WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/ftrace.c:120 ftrace_call_adjust+0x4c/0x70

This is because GCC will silently ignore `-falign-functions=N` when
passed `-Os`, resulting in functions not being aligned as we expect.
This is a known issue, and to account for this we modified the kernel to
avoid `-Os` generally. Unfortunately we forgot to account for
CONFIG_CC_OPTIMIZE_FOR_SIZE.

Forbid the use of CALL_OPS with CONFIG_CC_OPTIMIZE_FOR_SIZE=y to prevent
this issue. All exising ftrace features will work as before, though
without the performance benefit of CALL_OPS.

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Link: http://lore.kernel.org/linux-arm-kernel/2d9284c3-3805-402b-5423-520ced56d047@gmail.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230227115819.365630-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In the removed code, num_clusters is 0, nothing is done in
exfat_chain_cont_cluster(), so it is unneeded, remove it.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
When allocating a new cluster, exFAT first allocates from the
next cluster of the last cluster of the file. If the last cluster
of the file is the last cluster of the volume, allocate from the
first cluster. This is a normal case, but the following error log
will be printed. It makes users confused, so this commit removes
the error log.

[1960905.181545] exFAT-fs (sdb1): hint_cluster is invalid (262130)

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
@ujfalusi ujfalusi marked this pull request as draft March 9, 2023 11:25
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Mar 9, 2023

Converting to draft and pushed a revert to 3ab18a6.

We need to do a new merge anyways as we got lots of patches in since I have opened this PR.

@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Mar 9, 2023

OK, I would have been really surprised if a fresh -rc1 does not have suspend/resume issue.
The two SoundWire device is failing it: ADLP_RVP_SDW_IPC4ZPH and TGLU_RVP_SDW_IPC4ZPH, for first glance I don't see anything in the kernel log, it must be SDW specific, audio is not resuming after wakeup.

Unfortunately I don't have local device where I can test this. @plbossart are you able to reproduce this locally?

@plbossart
Copy link
Member

Last time we had such FAILs it boiled down to compiler issues... I'll try to reproduce this later today @ujfalusi

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 9, 2023

Last time we had such FAILs it boiled down to compiler issues...

For the record this was #4203 . There were failures with more than one toolchain version. We ignored them and standardized on some more "crowd-pleasing" toolchain.

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 9, 2023

OK, I would have been really surprised if a fresh -rc1 does not have suspend/resume issue.

We've had intermittent but frequent suspend/resume failures pretty much across the board since forever. We can't test production hardware and BIOS because of SOF signing, I bet that plays a very large part.

it must be SDW specific,

Because of the above I would recommend a bit more testing before coming to this conclusion.

@plbossart
Copy link
Member

no issues with TGLU_SKU0A32_SDCA

PASS: tplg-binary; pcm_list; playback-d100l1r1; capture-d100l1r1; playback-d1l100r1; capture_d1l100r1; playback_d1l1r50; capture_d1l1r50; speaker; pause-resume-playback; pause-resume-capture; volume; signal-stop-start-playback; signal-stop-start-capture; multiple-pipeline-playback; multiple-pipeline-capture; multiple-pause-resume; kmod-load-unload; kmod-load-unload-after-playback; suspend-resume; suspend-resume-with-playback; suspend-resume-with-capture;
SKIP: sof-logger; ipc-flood; xrun-injection-playback; xrun-injection-capture; simultaneous-playback-capture; volume_levels;

 test end with 0 failed tests

@plbossart
Copy link
Member

@ujfalusi I wonder if this is a new IPC4 issue, it'd make sense to see if we are able to pass the tests on ADLP_RVP_SDW with 2.2 IPC3 firmware.

Alternatively we could launch a daily test with this PR to cover more devices (daily_stable_v2_2 and daily_IPC4).

@plbossart
Copy link
Member

@ujfalusi I've run hundreds of cycles with suspend+audio, so far nothing bad seen with IPC3. I'll let it run further tonight but I don't think it's a systematic issue with SoundWire, it seems platform specific and possible IPC4 specific. Not an easy one to find.

@ujfalusi
Copy link
Collaborator Author

@plbossart, @ranj063, with a similar change as in ##4230 the suspend /resume tests are passing with 6.3.
I see other sequence changes as well, but let me cross check again.

@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Mar 10, 2023

@plbossart, gash, no, reverting that change and the kernel I build is working fine.

$ gcc --version
gcc (GCC) 12.2.1 20230201

CI is using gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

This starts to sound wrong, something is really off, it does not make any sense that with a kernel we need to match a compiler, this cannot be the case.

@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Mar 10, 2023

@plbossart and to make things more fun: if I download the deb package from the CI build pr4223-3510, install the kernel and modules to my sdw machine, reboot and do the test: I see the same failure and I'm running IPC3 on that laptop.

So: CI built kernel fails suspend/resume on sdw devices with active audio (both IPC4 and IPC3), locally built kernel passes the same test with IPC3 and IPC4.

FYI: @marc-hb, @fredoh9

How cool is this? Not at all... :(

@plbossart
Copy link
Member

Thanks @ujfalusi for testing further.

So: CI built kernel fails suspend/resume on sdw devices with active audio (both IPC4 and IPC3), locally built kernel passes the same test with IPC3 and IPC4.

This points again to a difference in configurations and/or compilers. it's been a recurring issue, and with the CI build system being such a black box it's anyone's guess what it's doing really.

@fredoh9 @marc-hb @greg-intel we'll need your help on this one. If local tests pass and CI tests don't we need the "CI team" on board.

@plbossart
Copy link
Member

plbossart commented Mar 10, 2023

@plbossart, gash, no, reverting that change and the kernel I build is working fine.

$ gcc --version
gcc (GCC) 12.2.1 20230201

CI is using gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

This starts to sound wrong, something is really off, it does not make any sense that with a kernel we need to match a compiler, this cannot be the case.

I am using gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0 with Ubuntu 22.10 on my development device, the target is Fedora-based so no debian plumbing involved

@ujfalusi
Copy link
Collaborator Author

I was wondering if we are setting some custom funky compiler flags in CI when building the kernel? My guess is that we don't, but I don't know it for sure.

@plbossart
Copy link
Member

@ujfalusi possibly a red-herring but there's a difference between what CI did and what I tested

the CI kernel config uses this:
../kconfig/kconfig-sof-default.sh ../kconfig/xrun-debug-defconfig > /dev/null

I only used the first part.

the xrun stuff adds the following options

CONFIG_SND_DEBUG_VERBOSE=y
CONFIG_SND_PCM_XRUN_DEBUG=y

It's quite possible that the first option changes the timing and exposes some sort of race condition, made worse with a compiler change.

@plbossart
Copy link
Member

And no special compiler flags, the command line is this

make bindeb-pkg -j8 LOCALVERSION=$LOCAL_VERSION KDEB_PKGVERSION=$(make kernelversion)-${FORMAT_BRANCH_NAME} KBUILD_OUTPUT=$BUILD_FOLDER > /dev/null

@plbossart
Copy link
Member

The change in the .config has no effect on my local tests, still PASS. Humm...

@ujfalusi
Copy link
Collaborator Author

I have taken the config from the CI built kernel and built at the same commit.
What is stunning is that this is the second time in a row and it is again sdw. Something is not quite right, if a compiler swap fixes it, then it is likely works by luck and I don't buy it that this only affects our CI.
Probably worth a question in sdw list?

To sum up: kernel at the same commit with the same config.
Built by CI: broken sdw suspend/resume
Built locally (by you and me): no issues

Just amazing

@plbossart
Copy link
Member

@ujfalusi I don't know how anyone outside Intel could help. We have non-reproducible results between CI-built and locally build, so the conclusion will be "fix your CI".

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 10, 2023

and with the CI build system being such a black box it's anyone's guess what it's doing really.

I'm usually the first and loudest to complain about CI secrets and CI blackboxes but in this case CI is merely using a regular kconfig script, a plain Ubuntu compiler and regular make bindeb-pkg command.

From sofkernel_prs.gvy:

../kconfig/kconfig-sof-default.sh ../kconfig/xrun-debug-defconfig
# or
../kconfig/kconfig-sof-nocodec.sh ../kconfig/xrun-debug-defconfig

# then
GITSHA=$(cat .gitsha)
PRIDV=$(cat .prid)
make bindeb-pkg LOCALVERSION=-$PRIDV-default KDEB_PKGVERSION=$(make kernelversion)-$GITSHA -j8

And that's it, there is really not much to hide. I think and hope the kernel build is not affected by environment variables or any other hidden setting?

Something is not quite right, if a compiler swap fixes it, then it is likely works by luck and I don't buy it that this only affects our CI.

Best summary IMHO. I think we've known that since #4122 and #4203 when different Ubuntu compilers gave different results but we tried hard to ignore it at the time. Didn't work.

KERNELRELEASE does not need to match the package version in changelog.
Rather, it conventially matches what is called 'ABINAME', which is a
part of the binary package names.

Both are the same by default, but the former might be overridden by
KDEB_PKGVERSION. In this case, the resulting package would not boot
because /lib/modules/$(uname -r) does not point the module directory.

Partially revert 3ab18a6 ("kbuild: deb-pkg: improve the usability
of source package").

Reported-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Fixes: 3ab18a6 ("kbuild: deb-pkg: improve the usability of source package")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
@ujfalusi ujfalusi force-pushed the merge/sound-upstream-20230306 branch from 0185e22 to c7ab720 Compare March 13, 2023 06:35
@ujfalusi
Copy link
Collaborator Author

There is a fix upstream: Testing the proposed upstream fix: https://lore.kernel.org/lkml/20230312200731.599706-3-masahiroy@kernel.org/
But this is not testable with this PR (conflicts). I will close and open a new upstream merge today.

@ujfalusi
Copy link
Collaborator Author

New merge PR created: #4241, closing.

@ujfalusi ujfalusi closed this Mar 13, 2023
@ujfalusi ujfalusi deleted the merge/sound-upstream-20230306 branch March 21, 2023 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.