Commits on Jul 29, 2016
  1. Merge tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
    
    Pull tracing updates from Steven Rostedt:
     "This is mostly clean ups and small fixes.  Some of the more visible
      changes are:
    
       - The function pid code uses the event pid filtering logic
       - [ku]probe events have access to current->comm
       - trace_printk now has sample code
       - PCI devices now trace physical addresses
       - stack tracing has fewer unnecessary functions traced"
    
    * tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
      printk, tracing: Avoiding unneeded blank lines
      tracing: Use __get_str() when manipulating strings
      tracing, RAS: Cleanup on __get_str() usage
      tracing: Use outer () on __get_str() definition
      ftrace: Reduce size of function graph entries
      tracing: Have HIST_TRIGGERS select TRACING
      tracing: Using for_each_set_bit() to simplify trace_pid_write()
      ftrace: Move toplevel init out of ftrace_init_tracefs()
      tracing/function_graph: Fix filters for function_graph threshold
      tracing: Skip more functions when doing stack tracing of events
      tracing: Expose CPU physical addresses (resource values) for PCI devices
      tracing: Show the preempt count of when the event was called
      tracing: Add trace_printk sample code
      tracing: Choose static tp_printk buffer by explicit nesting count
      tracing: expose current->comm to [ku]probe events
      ftrace: Have set_ftrace_pid use the bitmap like events do
      tracing: Move pid_list write processing into its own function
      tracing: Move the pid_list seq_file functions to be global
      tracing: Move filtered_pid helper functions into trace.c
      tracing: Make the pid filtering helper functions global
    committed Jul 28, 2016
  2. Merge tag 'vfio-v4.8-rc1' of git://github.com/awilliam/linux-vfio

    Pull VFIO updates from Alex Williamson:
     - Enable no-iommu mode for platform devices (Peng Fan)
     - Sub-page mmap for exclusive pages (Yongji Xie)
     - Use-after-free fix (Ilya Lesokhin)
     - Support for ACPI-based platform devices (Sinan Kaya)
    
    * tag 'vfio-v4.8-rc1' of git://github.com/awilliam/linux-vfio:
      vfio: platform: check reset call return code during release
      vfio: platform: check reset call return code during open
      vfio, platform: make reset driver a requirement by default
      vfio: platform: call _RST method when using ACPI
      vfio: platform: add extra debug info argument to call reset
      vfio: platform: add support for ACPI probe
      vfio: platform: determine reset capability
      vfio: platform: move reset call to a common function
      vfio: platform: rename reset function
      vfio: fix possible use after free of vfio group
      vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
      vfio: platform: support No-IOMMU mode
    committed Jul 28, 2016
  3. Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
    
    Pull MD updates from Shaohua Li:
     - A bunch of patches from Neil Brown to fix RCU usage
     - Two performance improvement patches from Tomasz Majchrzak
     - Alexey Obitotskiy fixes module refcount issue
     - Arnd Bergmann fixes time granularity
     - Cong Wang fixes a list corruption issue
     - Guoqing Jiang fixes a deadlock in md-cluster
     - A null pointer dereference fix from me
     - Song Liu fixes misuse of raid6 rmw
     - Other trivial/cleanup fixes from Guoqing Jiang and Xiao Ni
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (28 commits)
      MD: fix null pointer dereference
      raid10: improve random reads performance
      md: add missing sysfs_notify on array_state update
      Fix kernel module refcount handling
      md: use seconds granularity for error logging
      md: reduce the number of synchronize_rcu() calls when multiple devices fail.
      md: be extra careful not to take a reference to a Faulty device.
      md/multipath: add rcu protection to rdev access in multipath_status.
      md/raid5: add rcu protection to rdev accesses in raid5_status.
      md/raid5: add rcu protection to rdev accesses in want_replace
      md/raid5: add rcu protection to rdev accesses in handle_failed_sync.
      md/raid1: add rcu protection to rdev in fix_read_error
      md/raid1: small code cleanup in end_sync_write
      md/raid1: small cleanup in raid1_end_read/write_request
      md/raid10: simplify print_conf a little.
      md/raid10: minor code improvement in fix_read_error()
      md/raid10: add rcu protection to rdev access during reshape.
      md/raid10: add rcu protection to rdev access in raid10_sync_request.
      md/raid10: add rcu protection in raid10_status.
      md/raid10: fix refcount imbalance when resyncing an array with a replacement device.
      ...
    committed Jul 28, 2016
  4. Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
    
    Pull libnvdimm updates from Dan Williams:
    
     - Replace pcommit with ADR / directed-flushing.
    
       The pcommit instruction, which has not shipped on any product, is
       deprecated.  Instead, the requirement is that platforms implement
       either ADR, or provide one or more flush addresses per nvdimm.
    
       ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
       to the memory controller on a power-fail event.
    
       Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
       Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
       A flush hint is an mmio address that, when written and fenced, assures
       that all previous posted writes targeting a given dimm have been
       flushed to media (a minimal sketch of such a write follows this list).
    
     - On-demand ARS (address range scrub).
    
       Linux uses the results of the ACPI ARS commands to track bad blocks
       in pmem devices.  When latent errors are detected we re-scrub the
       media to refresh the bad block list; userspace can also request a
       re-scrub at any time.
    
     - Support for the Microsoft DSM (device specific method) command
       format.
    
     - Support for EDK2/OVMF virtual disk device memory ranges.
    
     - Various fixes and cleanups across the subsystem.
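
     As referenced in the flush-hint bullet above, a write to a hint address
     looks roughly like the following sketch.  The flush_hint pointer is a
     hypothetical ioremapped "Flush Hint Address"; the value written is
     irrelevant, only the fenced store matters:

         wmb();                  /* order prior posted writes to the dimm */
         writeq(1, flush_hint);  /* the store itself triggers the flush */
         wmb();                  /* fence the hint write before returning */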
    
    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
      libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
      nfit: do an ARS scrub on hitting a latent media error
      nfit: move to nfit/ sub-directory
      nfit, libnvdimm: allow an ARS scrub to be triggered on demand
      libnvdimm: register nvdimm_bus devices with an nd_bus driver
      pmem: clarify a debug print in pmem_clear_poison
      x86/insn: remove pcommit
      Revert "KVM: x86: add pcommit support"
      nfit, tools/testing/nvdimm/: unify shutdown paths
      libnvdimm: move ->module to struct nvdimm_bus_descriptor
      nfit: cleanup acpi_nfit_init calling convention
      nfit: fix _FIT evaluation memory leak + use after free
      tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
      tools/testing/nvdimm: add virtual ramdisk range
      acpi, nfit: treat virtual ramdisk SPA as pmem region
      pmem: kill __pmem address space
      pmem: kill wmb_pmem()
      libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
      fs/dax: remove wmb_pmem()
      libnvdimm, pmem: flush posted-write queues on shutdown
      ...
    committed Jul 28, 2016
  5. Merge tag 'pinctrl-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
    
    Pull pin control updates from Linus Walleij:
     "This is the bulk of pin control changes for the v4.8 kernel cycle.
    
      Nothing stands out as especially exciting: new drivers, new subdrivers,
      lots of cleanups and incremental features.
    
      Business as usual.
    
      New drivers:
    
       - New driver for Oxnas pin control and GPIO.  This ARM-based chipset
         is used in a few storage (NAS) type devices.
    
       - New driver for the MAX77620/MAX20024 pin controller portions.
    
       - New driver for the Intel Merrifield pin controller.
    
      New subdrivers:
    
       - New subdriver for the Qualcomm MDM9615
    
       - New subdriver for the STM32F746 MCU
    
       - New subdriver for the Broadcom NSP SoC.
    
      Cleanups:
    
       - Demodularization of bool compiled-in drivers.
    
      Apart from this there is just regular incremental improvements to a
      lot of drivers, especially Uniphier and PFC"
    
    * tag 'pinctrl-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (131 commits)
      pinctrl: fix pincontrol definition for marvell
      pinctrl: xway: fix typo
      Revert "pinctrl: amd: make it explicitly non-modular"
      pinctrl: iproc: Add NSP and Stingray GPIO support
      pinctrl: Update iProc GPIO DT bindings
      pinctrl: bcm: add OF dependencies
      pinctrl: ns2: remove redundant dev_err call in ns2_pinmux_probe()
      pinctrl: Add STM32F746 MCU support
      pinctrl: intel: Protect set wake flow by spin lock
      pinctrl: nsp: remove redundant dev_err call in nsp_pinmux_probe()
      pinctrl: uniphier: add Ethernet pin-mux settings
      sh-pfc: Use PTR_ERR_OR_ZERO() to simplify the code
      pinctrl: ns2: fix return value check in ns2_pinmux_probe()
      pinctrl: qcom: update DT bindings with ebi2 groups
      pinctrl: qcom: establish proper EBI2 pin groups
      pinctrl: imx21: Remove the MODULE_DEVICE_TABLE() macro
      Documentation: dt: Add new compatible to STM32 pinctrl driver bindings
      includes: dt-bindings: Add STM32F746 pinctrl DT bindings
      pinctrl: sunxi: fix nand0 function name for sun8i
      pinctrl: uniphier: remove pointless pin-mux settings for PH1-LD11
      ...
    committed Jul 28, 2016
Commits on Jul 28, 2016
  1. Merge branch 'akpm' (patches from Andrew)

    Merge more updates from Andrew Morton:
     "The rest of MM"
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits)
      mm, compaction: simplify contended compaction handling
      mm, compaction: introduce direct compaction priority
      mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
      mm, page_alloc: make THP-specific decisions more generic
      mm, page_alloc: restructure direct compaction handling in slowpath
      mm, page_alloc: don't retry initial attempt in slowpath
      mm, page_alloc: set alloc_flags only once in slowpath
      lib/stackdepot.c: use __GFP_NOWARN for stack allocations
      mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
      mm, kasan: account for object redzone in SLUB's nearest_obj()
      mm: fix use-after-free if memory allocation failed in vma_adjust()
      zsmalloc: Delete an unnecessary check before the function call "iput"
      mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
      mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
      mm: optimize copy_page_to/from_iter_iovec
      mm: add cond_resched() to generic_swapfile_activate()
      Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
      mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
      mm: hwpoison: remove incorrect comments
      make __section_nr() more efficient
      ...
    committed Jul 28, 2016
  2. @tehcaster

    mm, compaction: simplify contended compaction handling

    Async compaction detects contention either due to failing trylock on
    zone->lock or lru_lock, or by need_resched().  Since 1f9efde ("mm,
    compaction: khugepaged should not give up due to need_resched()") the
    code got quite complicated to distinguish these two up to the
    __alloc_pages_slowpath() level, so different decisions could be taken
    for khugepaged allocations.
    
    After the recent changes, khugepaged allocations don't check for
    contended compaction anymore, so we again don't need to distinguish lock
    from sched contention, and can simplify the currently convoluted code a lot.
    
    However, I believe it's also possible to simplify even more and
    completely remove the check for contended compaction after the initial
    async compaction for costly orders, which was originally aimed at THP
    page fault allocations.  There are several reasons why this can be done
    now:
    
    - with the new defaults, THP page faults no longer do reclaim/compaction at
      all, unless the system admin has overridden the default, or the
      application has indicated via madvise that it can benefit from THPs.  In
      both cases, it means that the potential extra latency is expected and
      worth the benefits.
    - even if reclaim/compaction proceeds after this patch where it previously
      wouldn't, the second compaction attempt is still async and will detect
      the contention and back off, if the contention persists
    - there are still heuristics like deferred compaction and pageblock skip
      bits in place that prevent excessive THP page fault latencies
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-9-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  3. @tehcaster

    mm, compaction: introduce direct compaction priority

    In the context of direct compaction, for some types of allocations we
    would like the compaction to either succeed or definitely fail while
    trying as hard as possible.  Current async/sync_light migration mode is
    insufficient, as there are heuristics such as caching scanner positions,
    marking pageblocks as unsuitable or deferring compaction for a zone.  At
    least the final compaction attempt should be able to override these
    heuristics.
    
    To communicate how hard compaction should try, we replace migration mode
    with a new enum compact_priority and change the relevant function
    signatures.  In compact_zone_order() where struct compact_control is
    constructed, the priority is mapped to suitable control flags.  This
    patch itself has no functional change, as the current priority levels
    are mapped back to the same migration modes as before.  Expanding them
    will be done next.
    
    Note that the !CONFIG_COMPACTION variant of try_to_compact_pages() is
    removed, as the only caller exists under CONFIG_COMPACTION.
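
    A plausible shape of the new interface, sketched from the description
    above (the exact priority levels and their mapping in the merged patch
    may differ):

        enum compact_priority {
                COMPACT_PRIO_SYNC_LIGHT,
                DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
                COMPACT_PRIO_ASYNC,
                INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
        };

        /* in compact_zone_order(): map the priority back to a migrate mode */
        cc.mode = (prio == COMPACT_PRIO_ASYNC) ?
                        MIGRATE_ASYNC : MIGRATE_SYNC_LIGHT;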
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-8-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  4. @tehcaster

    mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations

    After the previous patch, we can distinguish costly allocations that
    should be really lightweight, such as THP page faults, with
    __GFP_NORETRY.  This means we don't need to recognize khugepaged
    allocations via PF_KTHREAD anymore.  We can also change THP page faults
    in areas where madvise(MADV_HUGEPAGE) was used to try as hard as
    khugepaged, as the process has indicated that it benefits from THPs and
    is willing to pay some initial latency costs.
    
    We can also make the flags handling less cryptic by distinguishing
    GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
    GFP_TRANSHUGE (only direct reclaim, khugepaged default).  Adding
    __GFP_NORETRY or __GFP_KSWAPD_RECLAIM is done where needed.
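
    The resulting flag definitions look approximately like the following
    sketch (consult include/linux/gfp.h for the authoritative form):

        #define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
                                      __GFP_NOMEMALLOC | __GFP_NOWARN) & \
                                     ~__GFP_RECLAIM)
        #define GFP_TRANSHUGE       (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)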
    
    The patch effectively changes the current GFP_TRANSHUGE users as
    follows:
    
    * get_huge_zero_page() - the zero page lifetime should be relatively
      long and it's shared by multiple users, so it's worth spending some
      effort on it.  We use GFP_TRANSHUGE, and __GFP_NORETRY is not added.
      This also restores direct reclaim to this allocation, which was
      unintentionally removed by commit e4a49efe4e7e ("mm: thp: set THP defrag
      by default to madvise and add a stall-free defrag option")
    
    * alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency
      is not an issue.  So if khugepaged "defrag" is enabled (the default), do
      reclaim via GFP_TRANSHUGE without __GFP_NORETRY.  We can remove the
      PF_KTHREAD check from page alloc.
    
      As a side-effect, khugepaged will now no longer check if the initial
      compaction was deferred or contended.  This is OK, as khugepaged sleep
      times between collapse attempts are long enough to prevent noticeable
      disruption, so we should allow it to spend some effort.
    
    * migrate_misplaced_transhuge_page() - already was masking out
      __GFP_RECLAIM, so just convert to GFP_TRANSHUGE_LIGHT which is
      equivalent.
    
    * alloc_hugepage_direct_gfpmask() - vmas with VM_HUGEPAGE (via madvise)
      are now allocating without __GFP_NORETRY.  Other vmas keep using
      __GFP_NORETRY if direct reclaim/compaction is at all allowed (by default
      it's allowed only for madvised vmas).  The rest is conversion to
      GFP_TRANSHUGE(_LIGHT).
    
    [mhocko@suse.com: suggested GFP_TRANSHUGE_LIGHT]
    Link: http://lkml.kernel.org/r/20160721073614.24395-7-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  5. @tehcaster

    mm, page_alloc: make THP-specific decisions more generic

    Since THP allocations during page faults can be costly, extra decisions
    are employed for them to avoid excessive reclaim and compaction, if the
    initial compaction doesn't look promising.  The detection has never been
    perfect as there is no gfp flag specific to THP allocations.  At this
    moment it checks the whole combination of flags that makes up
    GFP_TRANSHUGE, and hopes that no other users of such combination exist,
    or would mind being treated the same way.  Extra care is also taken to
    separate allocations from khugepaged, where latency doesn't matter that
    much.
    
    It is however possible to distinguish these allocations in a simpler and
    more reliable way.  The key observation is that after the initial
    compaction followed by the first iteration of "standard"
    reclaim/compaction, both __GFP_NORETRY allocations and costly
    allocations without __GFP_REPEAT are declared as failures:
    
            /* Do not loop if specifically requested */
            if (gfp_mask & __GFP_NORETRY)
                    goto nopage;
    
            /*
             * Do not retry costly high order allocations unless they are
             * __GFP_REPEAT
             */
            if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
                    goto nopage;
    
    This means we can further distinguish allocations that are costly order
    *and* additionally include the __GFP_NORETRY flag.  As it happens,
    GFP_TRANSHUGE allocations do already fall into this category.  This will
    also allow other costly allocations with similar high-order benefit vs
    latency considerations to use this semantic.  Furthermore, we can
    distinguish THP allocations that should try a bit harder (such as from
    khugepaged) by removing __GFP_NORETRY, as will be done in the next
    patch.
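
    In other words, the generic detection reduces to a check of this shape
    (a sketch; the real slowpath folds it into its retry logic):

        /* costly order + __GFP_NORETRY now marks a "lightweight",
         * THP-like attempt: one async compaction try, then give up */
        if (order > PAGE_ALLOC_COSTLY_ORDER && (gfp_mask & __GFP_NORETRY))
                goto nopage;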
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-6-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  6. @tehcaster

    mm, page_alloc: restructure direct compaction handling in slowpath

    The retry loop in __alloc_pages_slowpath is supposed to keep trying
    reclaim and compaction (and OOM), until either the allocation succeeds,
    or returns with failure.  Success here is more probable when reclaim
    precedes compaction, as certain watermarks have to be met for compaction
    to even try, and more free pages increase the probability of compaction
    success.  On the other hand, starting with light async compaction (if
    the watermarks allow it) can be more efficient, especially for smaller
    orders, if there's enough free memory which is just fragmented.
    
    Thus, the current code starts with compaction before reclaim, and to
    make sure that the last reclaim is always followed by a final
    compaction, there's another direct compaction call at the end of the
    loop.  This makes the code hard to follow and adds some duplicated
    handling of migration_mode decisions.  It's also somewhat inefficient
    that even if reclaim or compaction decides not to retry, the final
    compaction is still attempted.  Some gfp flag combinations also shortcut
    these retry decisions by "goto noretry;", making it even harder to
    follow.
    
    This patch attempts to restructure the code with only minimal functional
    changes.  The call to the first compaction and THP-specific checks are
    now placed above the retry loop, and the "noretry" direct compaction is
    removed.
    
    The initial compaction is additionally restricted only to costly orders,
    as we can expect smaller orders to be held back by watermarks, and only
    larger orders to suffer primarily from fragmentation.  This better
    matches the checks in reclaim's shrink_zones().
    
    There are two other smaller functional changes.  One is that the upgrade
    from async migration to light sync migration will always occur after the
    initial compaction.  This is how it was until the recent patch "mm,
    oom: protect !costly allocations some more", which introduced upgrading
    the mode based on COMPACT_COMPLETE result, but kept the final compaction
    always upgraded, which made it even more special.  It's better to return
    to the simpler handling for now, as migration modes will be further
    modified later in the series.
    
    The second change is that once both reclaim and compaction declare it's
    not worth retrying the reclaim/compact loop, there is no final
    compaction attempt.  As argued above, this is intentional.  If that
    final compaction were to succeed, it would be due to a wrong retry
    decision, or simply a race with somebody else freeing memory for us.
    
    The main outcome of this patch should be simpler code.  Logically, the
    initial compaction without reclaim is the exceptional case to the
    reclaim/compaction scheme, but prior to the patch, it was the last loop
    iteration that was exceptional.  Now the code matches the logic better.
    The change also enables the following patches.
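
    As a structural sketch (heavily condensed; the real function has more
    states and checks), the reordered slowpath now reads approximately:

        /* one-off: try async compaction first, costly orders only */
        if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER) {
                page = __alloc_pages_direct_compact(gfp_mask, order,
                                alloc_flags, ac, MIGRATE_ASYNC,
                                &compact_result);
                if (page)
                        goto got_pg;
        }
    retry:
        /* reclaim, then sync-light compaction, then the retry decisions */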
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  7. @tehcaster

    mm, page_alloc: don't retry initial attempt in slowpath

    After __alloc_pages_slowpath() sets up new alloc_flags and wakes up
    kswapd, it first tries get_page_from_freelist() with the new
    alloc_flags, as it may succeed e.g. due to using min watermark instead
    of low watermark.  It makes sense to do this attempt before adjusting the
    zonelist based on alloc_flags/gfp_mask, as it's still a relatively fast
    path if we just wake up kswapd and successfully allocate.
    
    This patch therefore moves the initial attempt above the retry label and
    reorganizes the part below the retry label a bit.  We still have to
    attempt get_page_from_freelist() on each retry, as some allocations
    cannot do that as part of direct reclaim or compaction, and yet are not
    allowed to fail (even though they do a WARN_ON_ONCE() and thus should
    not exist).  We can reuse the call meant for ALLOC_NO_WATERMARKS attempt
    and just set alloc_flags to ALLOC_NO_WATERMARKS if the context allows
    it.  As a side-effect, the attempts from direct reclaim/compaction will
    also no longer obey watermarks once this is set, but there's little harm
    in that.
    
    Kswapd wakeups are also done on each retry to be safe from potential
    races resulting in kswapd going to sleep while a process (that may not
    be able to reclaim by itself) is still looping.
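
    Sketched, the entry to the slowpath then becomes something like the
    following (a condensed sketch; as described, the kswapd wakeup is
    repeated on every retry):

        wake_all_kswapds(order, ac);

        /* initial attempt with the adjusted alloc_flags, e.g. min watermark */
        page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
        if (page)
                goto got_pg;
    retry:
        /* ... reclaim/compaction and the retry decisions ... */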
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  8. @tehcaster

    mm, page_alloc: set alloc_flags only once in slowpath

    In __alloc_pages_slowpath(), alloc_flags doesn't change after it's
    initialized, so move the initialization above the retry: label.  Also
    make the comment above the initialization more descriptive.
    
    The only exception to alloc_flags being constant is
    ALLOC_NO_WATERMARKS, which may change due to TIF_MEMDIE being set on the
    allocating thread.  We can fix this, and make the code simpler and a bit
    more effective at the same time, by moving the part that determines
    ALLOC_NO_WATERMARKS from gfp_to_alloc_flags() to gfp_pfmemalloc_allowed().
    
    This means we don't have to mask out ALLOC_NO_WATERMARKS in numerous
    places in __alloc_pages_slowpath() anymore.  The only two tests for the
    flag can instead call gfp_pfmemalloc_allowed().
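
    The moved logic is roughly the following (a sketch of what
    gfp_pfmemalloc_allowed() absorbs; see mainline for the exact predicates):

        bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
        {
                if (gfp_mask & __GFP_NOMEMALLOC)
                        return false;           /* caller opted out */
                if (gfp_mask & __GFP_MEMALLOC)
                        return true;            /* caller opted in */
                if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
                        return true;
                if (!in_interrupt() &&
                    ((current->flags & PF_MEMALLOC) ||
                     unlikely(test_thread_flag(TIF_MEMDIE))))
                        return true;
                return false;
        }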
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    tehcaster authored and torvalds committed on Jul 28, 2016
  9. @kiryl

    lib/stackdepot.c: use __GFP_NOWARN for stack allocations

    This (large, atomic) allocation attempt can fail.  We expect and handle
    that, so avoid the scary warning.
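
    The allocation in depot_save_stack() then carries __GFP_NOWARN, along
    these lines (a sketch; the flag masking is shown for context only):

        alloc_flags &= ~GFP_ZONEMASK;
        alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
        alloc_flags |= __GFP_NOWARN;    /* the fix: failure is expected */
        page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);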
    
    Link: http://lkml.kernel.org/r/20160720151905.GB19146@node.shutemov.name
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed on Jul 28, 2016
  10. @ramosian-glider

    mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB

    For KASAN builds:
     - switch SLUB allocator to using stackdepot instead of storing the
       allocation/deallocation stacks in the objects;
     - change the freelist hook so that parts of the freelist can be put
       into the quarantine.
    
    [aryabinin@virtuozzo.com: fixes]
      Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
    Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <adech.fo@gmail.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Kostya Serebryany <kcc@google.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    ramosian-glider authored and torvalds committed on Jul 28, 2016
  11. @ramosian-glider

    mm, kasan: account for object redzone in SLUB's nearest_obj()

    When looking up the nearest SLUB object for a given address, correctly
    calculate its offset if SLAB_RED_ZONE is enabled for that cache.
    
    Previously, when KASAN had detected an error on an object from a cache
    with SLAB_RED_ZONE set, the actual start address of the object was
    miscalculated, which led to random stack traces being reported.
    
    Fixes: 7ed2f9e ("mm, kasan: SLAB support")
    Link: http://lkml.kernel.org/r/1468347165-41906-2-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <adech.fo@gmail.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Kostya Serebryany <kcc@google.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    ramosian-glider authored and torvalds committed on Jul 28, 2016
  12. mm: fix use-after-free if memory allocation failed in vma_adjust()

    There's one case when vma_adjust() expands the vma, overlapping with
    *two* next vmas.  See case 6 of mprotect, described in the comment to
    vma_merge().

    To handle this (and only this) situation we iterate twice over the main
    part of the function.  See "goto again".
    
    Vegard reported[1] that he sees an out-of-bounds access complaint from
    KASAN if anon_vma_clone() on the *second* iteration fails.
    
    This happens because we free 'next' vma by the end of first iteration
    and don't have a way to undo this if anon_vma_clone() fails on the
    second iteration.
    
    The solution is to do all required allocations upfront, before we touch
    vmas.
    
    The allocation on the second iteration is only required if the first two
    vmas don't have anon_vma, but the third does.  So we need, in total, one
    anon_vma_clone() call.
    
    It's easy to adjust 'exporter' to the third vma in that case.
    
    [1] http://lkml.kernel.org/r/1469514843-23778-1-git-send-email-vegard.nossum@oracle.com
    
    Link: http://lkml.kernel.org/r/1469625255-126641-1-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Kirill A. Shutemov authored and torvalds committed on Jul 28, 2016
  13. @elfring

    zsmalloc: Delete an unnecessary check before the function call "iput"

    iput() tests whether its argument is NULL and then returns immediately.
    Thus the test around the call is not needed.
    
    This issue was detected by using the Coccinelle software.
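
    The change has the classic before/after shape (a representative sketch;
    the actual variable name in zsmalloc may differ):

        -       if (inode)
        -               iput(inode);
        +       iput(inode);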
    
    Link: http://lkml.kernel.org/r/559cf499-4a01-25f9-c87f-24d906626a57@users.sourceforge.net
    Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
    Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    elfring authored and torvalds committed on Jul 28, 2016
  14. mm/memblock.c: fix index adjustment error in __next_mem_range_rev()

    Fix a region index adjustment error when the type_b parameter of
    __next_mem_range_rev() is NULL.
    
    Signed-off-by: zijun_hu <zijun_hu@htc.com>
    Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Cc: Tang Chen <tangchen@cn.fujitsu.com>
    Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
    Cc: Richard Leitner <dev@g0hl1n.net>
    Cc: David Gibson <david@gibson.dropbear.id.au>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    zijun_hu authored and torvalds committed on Jul 28, 2016
  15. mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
    
    If we offline a node, allocate the new page from a nearest neighbor node
    instead of the current node or other remote nodes, because re-migration
    is a waste of time and the distance to remote nodes is often very
    large.
    
    Also use GFP_HIGHUSER_MOVABLE to alloc the new page if the zone is a
    movable or highmem zone.
    
    Link: http://lkml.kernel.org/r/5795E18B.5060302@huawei.com
    Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Xishi Qiu authored and torvalds committed on Jul 28, 2016
  16. mm: optimize copy_page_to/from_iter_iovec

    copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
    to userspace or from userspace.  These functions have a fast path where
    they map a page using kmap_atomic and a slow path where they use kmap.
    
    kmap is slower than kmap_atomic, so the fast path is preferred.
    
    However, on kernels without highmem support, kmap just calls
    page_address, so there is no need to avoid kmap.  On kernels without
    highmem support, the fast path just increases code size (and cache
    footprint) and it doesn't improve copy performance in any way.
    
    This patch enables the fast path only if CONFIG_HIGHMEM is defined.
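
    Condensed, the pattern now looks something like this sketch (the real
    functions also handle partial copies and faulting-in of user pages):

        if (IS_ENABLED(CONFIG_HIGHMEM)) {
                /* fast path: atomic mapping, worthwhile on highmem kernels */
                kaddr = kmap_atomic(page);
                left = __copy_to_user_inatomic(buf, kaddr + offset, bytes);
                kunmap_atomic(kaddr);
                if (!left)
                        return bytes;
        }
        /* slow path: kmap() is just page_address() without CONFIG_HIGHMEM */
        kaddr = kmap(page);
        left = copy_to_user(buf, kaddr + offset, bytes);
        kunmap(page);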
    
    Code size reduced by this patch:
      x86 (without highmem)	  928
      x86-64		  960
      sparc64		  848
      alpha			 1136
      pa-risc		 1200
    
    [akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
    Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Andi Kleen <andi@firstfloor.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Mikulas Patocka authored and torvalds committed on Jul 28, 2016
  17. mm: add cond_resched() to generic_swapfile_activate()

    generic_swapfile_activate() can take quite a long time, as it iterates
    over all blocks of a file, so add a cond_resched() to it.  I observed
    about 1-second stalls when activating a swapfile that was almost
    unfragmented - this patch fixes it.
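
    The fix amounts to dropping a reschedule point into the per-block loop,
    roughly as follows (the loop variables are illustrative, not the exact
    mm/page_io.c code):

        while (probe_block + blocks_per_page <= last_block) {
                cond_resched();  /* iterating every block can take seconds */
                /* ... map the block, verify contiguity, record extent ... */
        }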
    
    Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221710580.4818@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Mikulas Patocka authored and torvalds committed on Jul 28, 2016
  18. Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elem…

    …ents"
    
    This reverts commit f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC
    if there are free elements").
    
    There has been a report about the OOM killer being invoked when swapping
    out to a dm-crypt device.  The primary reason seems to be that the
    swapout IO managed to completely deplete memory reserves.  Ondrej was
    able to bisect and explained the issue by pointing to f9054c7 ("mm,
    mempool: only set __GFP_NOMEMALLOC if there are free elements").
    
    The reason is that the swapout path is not throttled properly because
    the md-raid layer needs to allocate from the generic_make_request path,
    which means it allocates from the PF_MEMALLOC context.  The dm layer uses
    mempool_alloc in order to guarantee forward progress, which used to
    inhibit access to memory reserves when using the page allocator.  This
    changed with f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC if
    there are free elements"), which dropped the __GFP_NOMEMALLOC
    protection when the memory pool is depleted.
    
    If we are running out of memory and the only way forward to free memory
    is to perform swapout, we just keep consuming memory reserves rather than
    throttling the mempool allocations and allowing the pending IO to
    complete, up to the moment when the memory is depleted completely and
    there is no way forward but invoking the OOM killer.  This is less than
    optimal.
    
    The original intention of f9054c7 was to help with OOM
    situations where the oom victim depends on mempool allocation to make
    forward progress.  David has mentioned the following backtrace:
    
      schedule
      schedule_timeout
      io_schedule_timeout
      mempool_alloc
      __split_and_process_bio
      dm_request
      generic_make_request
      submit_bio
      mpage_readpages
      ext4_readpages
      __do_page_cache_readahead
      ra_submit
      filemap_fault
      handle_mm_fault
      __do_page_fault
      do_page_fault
      page_fault
    
    We do not know more about why the mempool is depleted without being
    replenished in time, though.  In any case the dm layer shouldn't depend
    on any allocations outside of the dedicated pools, so forward progress
    should be guaranteed.  If this is not the case then dm should be
    fixed rather than papering over the problem and postponing it
    by accessing more memory reserves.
    
    mempools are a mechanism to maintain dedicated memory reserves to
    guarantee forward progress.  Allowing them unbounded access to the
    page allocator memory reserves goes against the whole purpose of
    this mechanism.
    
    Bisected by Ondrej Kozina.
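
    For reference, the revert restores mempool_alloc()'s unconditional gfp
    masking, which looks roughly like this sketch:

        gfp_mask |= __GFP_NOMEMALLOC;  /* don't allocate emergency reserves */
        gfp_mask |= __GFP_NORETRY;     /* don't loop in __alloc_pages */
        gfp_mask |= __GFP_NOWARN;      /* failures are handled by the pool */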
    
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20160721145309.GR26379@dhcp22.suse.cz
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reported-by: Ondrej Kozina <okozina@redhat.com>
    Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: NeilBrown <neilb@suse.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Mikulas Patocka <mpatocka@redhat.com>
    Cc: Ondrej Kozina <okozina@redhat.com>
    Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Michal Hocko authored and torvalds committed on Jul 28, 2016
  19. mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
    
    At present MIGRATE_SYNC_LIGHT is allowing __isolate_lru_page() to
    isolate a PageWriteback page, which __unmap_and_move() then rejects with
    -EBUSY: of course the writeback might complete in between, but that's
    not what we usually expect, so probably better not to isolate it.
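
    The rejection on the migration side has roughly this shape (a sketch of
    the mainline check; only fully synchronous migration waits for
    writeback):

        if (PageWriteback(page)) {
                if (mode != MIGRATE_SYNC) {
                        rc = -EBUSY;   /* ASYNC and SYNC_LIGHT bail out */
                        goto out_unlock;
                }
                wait_on_page_writeback(page);
        }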
    
    When tested by stress-highalloc from mmtests, this has reduced the
    number of page migrate failures by 60-70%.
    
    Link: http://lkml.kernel.org/r/20160721073614.24395-2-vbabka@suse.cz
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed on Jul 28, 2016
  20. @Naoya-Horiguchi

    mm: hwpoison: remove incorrect comments

    dequeue_hwpoisoned_huge_page() can be called without holding the page
    lock, so remove the incorrect comment.

    The reason why the page lock is not really needed is that
    dequeue_hwpoisoned_huge_page() checks page_huge_active() inside
    hugetlb_lock, which allows us to avoid trying to dequeue a hugepage that
    is just allocated but not yet linked to the active list, even without
    taking the page lock.
    
    Link: http://lkml.kernel.org/r/20160720092901.GA15995@www9186uo.sakura.ne.jp
    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Reported-by: Zhan Chen <zhanc1@andrew.cmu.edu>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Naoya-Horiguchi authored and torvalds committed on Jul 28, 2016
  21. @zhouchengming1

    make __section_nr() more efficient

    When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr() can compute
    the section number directly with a subtraction.
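
    Conceptually (a sketch, not the exact patch): without SPARSEMEM_EXTREME
    the mem_section table is one flat static array, so the section number is
    plain pointer arithmetic instead of a search over the section roots:

        return ms - &mem_section[0][0];  /* offset into the flat array */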
    
    Link: http://lkml.kernel.org/r/1468988310-11560-1-git-send-email-zhouchengming1@huawei.com
    Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Hanjun Guo <guohanjun@huawei.com>
    Cc: Li Bin <huawei.libin@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    zhouchengming1 authored and torvalds committed on Jul 28, 2016
  22. @vegard

    kmemleak: don't hang if user disables scanning early

    If the user tries to disable automatic scanning early in the boot
    process using e.g.:
    
      echo scan=off > /sys/kernel/debug/kmemleak
    
    then this command will hang until SECS_FIRST_SCAN (= 60) seconds have
    elapsed, even though the system is fully initialised.
    
    We can fix this using interruptible sleep and checking if we're supposed
    to stop whenever we wake up (like the rest of the code does).
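
    The shape of the fix, as a sketch (the merged patch may differ in
    detail):

        /* before: ssleep(SECS_FIRST_SCAN) -- uninterruptible for 60s */
        signed long timeout = msecs_to_jiffies(SECS_FIRST_SCAN * 1000);

        while (timeout && !kthread_should_stop())
                timeout = schedule_timeout_interruptible(timeout);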
    
    Link: http://lkml.kernel.org/r/1468835005-2873-1-git-send-email-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    vegard authored and torvalds committed on Jul 28, 2016
  23. @dennisarm

    arm64:acpi: fix the acpi alignment exception when 'mem=' specified

    When booting an ACPI enabled kernel with 'mem=x', there is the
    possibility that ACPI data regions from the firmware will lie above the
    memory limit.  Ordinarily these will be removed by
    memblock_enforce_memory_limit(.).
    
    Unfortunately, this means that these regions will then be mapped by
    acpi_os_ioremap(.) as device memory (instead of normal), and unaligned
    accesses will then provoke alignment faults.
    
    In this patch we adopt memblock_mem_limit_remove_map instead, which
    preserves these ACPI data regions (marked NOMAP), thus ensuring that
    they are not mapped as device memory.
    
    For example, below is an alignment exception observed on an ARM platform
    when booting the kernel with 'acpi=on mem=8G':
    
      ...
      Unable to handle kernel paging request at virtual address ffff0000080521e7
      pgd = ffff000008aa0000
      [ffff0000080521e7] *pgd=000000801fffe003, *pud=000000801fffd003, *pmd=000000801fffc003, *pte=00e80083ff1c1707
      Internal error: Oops: 96000021 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-next-20160616+ #172
      Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016
      task: ffff800001ef0000 ti: ffff800001ef8000 task.ti: ffff800001ef8000
      PC is at acpi_ns_lookup+0x520/0x734
      LR is at acpi_ns_lookup+0x4a4/0x734
      pc : [<ffff0000083b8b10>] lr : [<ffff0000083b8a94>] pstate: 60000045
      sp : ffff800001efb8b0
      x29: ffff800001efb8c0 x28: 000000000000001b
      x27: 0000000000000001 x26: 0000000000000000
      x25: ffff800001efb9e8 x24: ffff000008a10000
      x23: 0000000000000001 x22: 0000000000000001
      x21: ffff000008724000 x20: 000000000000001b
      x19: ffff0000080521e7 x18: 000000000000000d
      x17: 00000000000038ff x16: 0000000000000002
      x15: 0000000000000007 x14: 0000000000007fff
      x13: ffffff0000000000 x12: 0000000000000018
      x11: 000000001fffd200 x10: 00000000ffffff76
      x9 : 000000000000005f x8 : ffff000008725fa8
      x7 : ffff000008a8df70 x6 : ffff000008a8df70
      x5 : ffff000008a8d000 x4 : 0000000000000010
      x3 : 0000000000000010 x2 : 000000000000000c
      x1 : 0000000000000006 x0 : 0000000000000000
      ...
        acpi_ns_lookup+0x520/0x734
        acpi_ds_load1_begin_op+0x174/0x4fc
        acpi_ps_build_named_op+0xf8/0x220
        acpi_ps_create_op+0x208/0x33c
        acpi_ps_parse_loop+0x204/0x838
        acpi_ps_parse_aml+0x1bc/0x42c
        acpi_ns_one_complete_parse+0x1e8/0x22c
        acpi_ns_parse_table+0x8c/0x128
        acpi_ns_load_table+0xc0/0x1e8
        acpi_tb_load_namespace+0xf8/0x2e8
        acpi_load_tables+0x7c/0x110
        acpi_init+0x90/0x2c0
        do_one_initcall+0x38/0x12c
        kernel_init_freeable+0x148/0x1ec
        kernel_init+0x10/0xec
        ret_from_fork+0x10/0x40
      Code: b9009fbc 2a00037b 36380057 3219037b (b9400260)
      ---[ end trace 03381e5eb0a24de4 ]---
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
    
    With 'efi=debug', we can see those ACPI regions loaded by firmware on
    that board as:
    
      efi:   0x0083ff185000-0x0083ff1b4fff [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
      efi:   0x0083ff1b5000-0x0083ff1c2fff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
      efi:   0x0083ff223000-0x0083ff224fff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
    
    Link: http://lkml.kernel.org/r/1468475036-5852-3-git-send-email-dennis.chen@arm.com
    Acked-by: Steve Capper <steve.capper@arm.com>
    Signed-off-by: Dennis Chen <dennis.chen@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Tang Chen <tangchen@cn.fujitsu.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matt Fleming <matt@codeblueprint.co.uk>
    Cc: Kaly Xin <kaly.xin@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    dennisarm authored and torvalds committed on Jul 28, 2016
  24. @dennisarm

    mm/memblock.c: add new infrastructure to address the mem limit issue

    In some cases, memblock is queried by the kernel to determine whether a
    specified address is RAM or not.  For example, the ACPI core needs this
    information to determine which attributes to use when mapping ACPI
    regions (acpi_os_ioremap).  Use of incorrect memory types can result in
    faults, data corruption, or other issues.
    
    Removing memory with memblock_enforce_memory_limit() throws away this
    information, and so a kernel booted with 'mem=' may suffer from the
    issues described above.  To avoid this, we need to keep those NOMAP
    regions instead of removing all above the limit, which preserves the
    information we need while preventing other use of those regions.
    
    This patch adds new infrastructure to retain all NOMAP memblock regions
    while removing others, to cater for this.
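
    The core of the new helper, as a condensed sketch (see mm/memblock.c for
    the full version with range isolation and error handling):

        void __init memblock_mem_limit_remove_map(phys_addr_t limit)
        {
                /* ... compute max_addr, isolate [max_addr, ULLONG_MAX) ... */

                /* drop memory regions above the limit, but keep NOMAP ones */
                for (i = end_rgn - 1; i >= start_rgn; i--)
                        if (!memblock_is_nomap(&type->regions[i]))
                                memblock_remove_region(type, i);

                /* reserved regions above the limit are removed entirely */
                memblock_remove_range(&memblock.reserved, max_addr,
                                      (phys_addr_t)ULLONG_MAX);
        }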
    
    Link: http://lkml.kernel.org/r/1468475036-5852-2-git-send-email-dennis.chen@arm.com
    Signed-off-by: Dennis Chen <dennis.chen@arm.com>
    Acked-by: Steve Capper <steve.capper@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Tang Chen <tangchen@cn.fujitsu.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matt Fleming <matt@codeblueprint.co.uk>
    Cc: Kaly Xin <kaly.xin@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    dennisarm authored and torvalds committed on Jul 28, 2016
  25. printk: when dumping regs, show the stack, not thread_info

    We currently show:
    
      task: <current> ti: <current_thread_info()> task.ti: <task_thread_info(current)>"
    
    "ti" and "task.ti" are redundant, and neither is actually what we want
    to show, which the the base of the thread stack.  Change the display to
    show the stack pointer explicitly.
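
    The new line would look something like this sketch (format details are
    illustrative):

        printk("task: %p task.stack: %p\n", current, current->stack);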
    
    Link: http://lkml.kernel.org/r/543ac5bd66ff94000a57a02e11af7239571a3055.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Andy Lutomirski authored and torvalds committed on Jul 28, 2016
  26. kdb: use task_cpu() instead of task_thread_info()->cpu

    We'll need this cleanup to make the cpu field in thread_info
    optional.
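
    A representative hunk (a sketch; the surrounding format string is
    illustrative):

        -       kdb_printf("%d", task_thread_info(p)->cpu);
        +       kdb_printf("%d", task_cpu(p));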
    
    Link: http://lkml.kernel.org/r/da298328dc77ea494576c2f20a934218e758a6fa.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Jason Wessel <jason.wessel@windriver.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Andy Lutomirski authored and torvalds committed on Jul 28, 2016
  27. mm: fix memcg stack accounting for sub-page stacks

    We should account for stacks regardless of stack size, and we need to
    account in sub-page units if THREAD_SIZE < PAGE_SIZE.  Change the units
    to kilobytes and move it into account_kernel_stack().
    
    Fixes: 12580e4 ("mm: memcontrol: report kernel stack usage in cgroup2 memory.stat")
    Link: http://lkml.kernel.org/r/9b5314e3ee5eda61b0317ec1563768602c1ef438.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Andy Lutomirski authored and torvalds committed on Jul 28, 2016
  28. mm: track NR_KERNEL_STACK in KiB instead of number of stacks

    Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone.
    This only makes sense if each kernel stack exists entirely in one zone,
    and allowing vmapped stacks could break this assumption.
    
    Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack
    allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all
    architectures.  Keep it simple and use KiB.
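
    The accounting then becomes a sketch like the following (the counter
    name comes from the patch title; the call site is account_kernel_stack()):

        mod_zone_page_state(zone, NR_KERNEL_STACK_KB,
                            THREAD_SIZE / 1024 * account);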
    
    Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Andy Lutomirski authored and torvalds committed on Jul 28, 2016
  29. @djbw

    mm: cleanup ifdef guards for vmem_altmap

    Now that ZONE_DEVICE depends on SPARSEMEM_VMEMMAP we can simplify some
    ifdef guards to just ZONE_DEVICE.
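
    The cleanup has this representative shape (a sketch of one such guard):

        -#if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_SPARSEMEM_VMEMMAP)
        +#ifdef CONFIG_ZONE_DEVICE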
    
    Link: http://lkml.kernel.org/r/146687646788.39261.8020536391978771940.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Reported-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Eric Sandeen <sandeen@redhat.com>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    djbw authored and torvalds committed on Jul 28, 2016
  30. @djbw

    mm: CONFIG_ZONE_DEVICE stop depending on CONFIG_EXPERT

    When it was first introduced CONFIG_ZONE_DEVICE depended on disabling
    CONFIG_ZONE_DMA, a configuration choice reserved for "experts".
    However, now that the ZONE_DMA conflict has been eliminated it no longer
    makes sense to require CONFIG_EXPERT.
    
    Link: http://lkml.kernel.org/r/146687646274.39261.14267596518720371009.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Reported-by: Eric Sandeen <sandeen@redhat.com>
    Reported-by: Jeff Moyer <jmoyer@redhat.com>
    Acked-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    djbw authored and torvalds committed on Jul 28, 2016