Commits on Aug 22, 2012
  1. Linux 3.6-rc3

    committed Aug 22, 2012
  2. Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

    Pull drm fixes from Dave Airlie:
     "Intel: edid fixes, power consumption fix, s/r fix, haswell fix
    
      Radeon: BIOS loading fixes for UEFI and Thunderbolt machines, better
      MSAA validation, lockup timeout fixes, modesetting fixes
    
      One udl dpms fix, one vmwgfx fix, a couple of trivial core changes.
    
      There is an export added to ACPI as part of the radeon bios fixes.
    
      I've also included the fbcon flashing cursor vs deinit race fix, that
      seems the simplest place to start"
    
    Trivial conflict in drivers/video/console/fbcon.c due to me having
    already applied the fbcon flashing cursor vs deinit race fix, and Dave
    had added a comment in there too.
    
    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
      fbcon: fix race condition between console lock and cursor timer (v1.1)
      drm: Add missing static storage class specifiers in drm_proc.c file
      drm/udl: dpms off the crtc when disabled.
      drm: Remove two unused fields from struct drm_display_mode
      drm: stop vmgfx driver explosion
      drm/radeon/ss: use num_crtc rather than hardcoded 6
      Revert "drm/radeon: fix bo creation retry path"
      drm/i915: use hsw rps tuning values everywhere on gen6+
      drm/radeon: split ATRM support out from the ATPX handler (v3)
      drm/radeon: convert radeon vfct code to use acpi_get_table_with_size
      ACPI: export symbol acpi_get_table_with_size
      drm/radeon: implement ACPI VFCT vbios fetch (v3)
      drm/radeon/kms: extend the Fujitsu D3003-S2 board connector quirk to cover later silicon stepping
      drm/radeon: fix checking of MSAA renderbuffers on r600-r700
      drm/radeon: allow CMASK and FMASK in the CS checker on r600-r700
      drm/radeon: init lockup timeout on ring init
      drm/radeon: avoid turning off spread spectrum for used pll
      drm/i915: fall back to bit-banging if GMBUS fails in CRT EDID reads
      drm/i915: extract connector update from intel_ddc_get_modes() for reuse
      drm/i915: fix hsw uncached pte
      ...
    committed Aug 22, 2012
  3. Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

    Pull SCSI target fixes from Nicholas Bellinger:
     "The executive summary includes:
    
       - Post-merge review comments for tcm_vhost (MST + nab)
       - Avoid debugging overhead when not debugging for tcm-fc(FCoE) (MDR)
       - Fix NULL pointer dereference bug on alloc_page failure (Yi Zou)
       - Fix REPORT_LUNs regression bug with pSCSI export (AlexE + nab)
       - Fix regression bug with handling of zero-length data CDBs (nab)
       - Fix vhost_scsi_target structure alignment (MST)
    
      Thanks again to everyone who contributed a bugfix patch, gave review
      feedback on tcm_vhost code, and/or reported a bug during their own
      testing over the last weeks.
    
      There is one other outstanding bug reported by Roland recently related
      to SCSI transfer length overflow handling, for which the current
      proposed bugfix has been left in queue pending further testing with
      other non iscsi-target based fabric drivers.
    
      As the patch is verified with loopback (local SGL memory from SCSI
      LLD) + tcm_qla2xxx (TCM allocated SGL memory mapped to PCI HW) fabric
      ports, it will be included into the next 3.6-rc-fixes PULL request."
    
    * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
      target: Remove unused se_cmd.cmd_spdtl
      tcm_fc: rcu_deref outside rcu lock/unlock section
      tcm_vhost: Fix vhost_scsi_target structure alignment
      target: Fix regression bug with handling of zero-length data CDBs
      target/pscsi: Fix bug with REPORT_LUNs handling for SCSI passthrough
      tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char *
      target: fix NULL pointer dereference bug alloc_page() fails to get memory
      tcm_fc: Avoid debug overhead when not debugging
      tcm_vhost: Post-merge review changes requested by MST
      tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl
    committed Aug 22, 2012
  4. Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux

    Pull i2c-embedded fixes from Wolfram Sang:
     "Some bugfixes for the "embedded" part of the I2C subsystem.  The fixes
      affect mostly drivers which have been largely reworked lately and
      where regressions appeared."
    
    * 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
      i2c: tegra: protect suspend/resume callbacks with CONFIG_PM_SLEEP
      i2c: diolan-u2c: Fix master_xfer return code
      I2C: OMAP: xfer: fix runtime PM get/put balance on error
      i2c: nomadik: Add default configuration into the Nomadik I2C driver
    committed Aug 22, 2012
  5. Merge tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm

    Pull pwm fixes from Thierry Reding:
     "These patches fix the Samsung PWM driver and perform some minor
      cleanups like fixing checkpatch and sparse warnings.
    
      Two redundant error messages are removed and the Kconfig help text for
      the PWM subsystem is made more descriptive."
    
    * tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm:
      pwm: Improve Kconfig help text
      pwm: core: Fix coding style issues
      pwm: vt8500: Fix coding style issue
      pwm: Remove a redundant error message when devm_request_and_ioremap fails
      pwm: samsung: add missing device pointer to struct pwm_chip
      pwm: Add missing static storage class specifiers in core.c file
    committed Aug 22, 2012
  6. Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

    Pull ceph fixes from Sage Weil:
     "Jim's fix closes a narrow race introduced with the msgr changes.  One
      fix resolves problems with debugfs initialization that Yan found when
      multiple client instances are created (e.g., two clusters mounted, or
      rbd + cephfs), another one fixes problems with mounting a nonexistent
      server subdirectory, and the last one fixes a divide by zero error
      from unsanitized ioctl input that Dan Carpenter found."
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
      ceph: avoid divide by zero in __validate_layout()
      libceph: avoid truncation due to racing banners
      ceph: tolerate (and warn on) extraneous dentry from mds
      libceph: delay debugfs initialization until we learn global_id
    committed Aug 22, 2012
  7. Merge tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

    Pull NFS client bugfixes from Trond Myklebust:
     - NFSv3 mounts need to fail if the FSINFO rpc call fails
     - Ensure that the NFS commit cache gets torn down when we unload the
       NFS module.
     - Fix memory scribble issues when interrupting a LAYOUTGET rpc call
     - Fix NFSv4 legacy idmapper regressions
     - Fix issues with the NFSv4 getacl command
     - Fix a regression when using the legacy "mount -t nfs4"
    
    * tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
      NFSv3: Ensure that do_proc_get_root() reports errors correctly
      NFSv4: Ensure that nfs4_alloc_client cleans up on error.
      NFS: return -ENOKEY when the upcall fails to map the name
      NFS: Clear key construction data if the idmap upcall fails
      NFSv4: Don't use private xdr_stream fields in decode_getacl
      NFSv4: Fix the acl cache size calculation
      NFSv4: Fix pointer arithmetic in decode_getacl
      NFS: Alias the nfs module to nfs4
      NFS: Fix a regression when loading the NFS v4 module
      NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
      pnfs-obj: Better IO pattern in case of unaligned offset
      NFS41: add pg_layout_private to nfs_pageio_descriptor
      pnfs: nfs4_proc_layoutget returns void
      pnfs: defer release of pages in layoutget
      nfs: tear down caches in nfs_init_writepagecache when allocation fails
    committed Aug 22, 2012
  8. Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

    Pull assorted fixes - mostly vfs - from Al Viro:
     "Assorted fixes, with an unexpected detour into vfio refcounting logics
      (fell out when digging in an analog of eventpoll race in there)."
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      task_work: add a scheduling point in task_work_run()
      fs: fix fs/namei.c kernel-doc warnings
      eventpoll: use-after-possible-free in epoll_create1()
      vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
      vfio: get rid of vfio_device_put()/vfio_group_get_device* races
      vfio: get rid of open-coding kref_put_mutex
      introduce kref_put_mutex()
      vfio: don't dereference after kfree...
      mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
    committed Aug 22, 2012
  9. task_work: add a scheduling point in task_work_run()

    It seems commit 4a9d4b0 (switch fput to task_work_add) reintroduced
    the problem addressed in commit 944be0b (close_files(): add scheduling
    point).
    
    If a server process with a lot of files (say 2 million tcp sockets)
    is killed, we can spend a lot of time in task_work_run() and trigger
    a soft lockup.
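    The shape of the fix can be shown in userspace. This is a minimal
    sketch, not the kernel code: it assumes queued work is a singly linked
    list of callbacks and uses sched_yield() as a stand-in for the kernel's
    cond_resched(); struct work_item, run_work_list() and the count_call()
    demo callback are hypothetical names.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical work item: a callback plus a link to the next item. */
struct work_item {
    struct work_item *next;
    void (*func)(struct work_item *);
};

/*
 * Run every queued callback and return how many ran. Without a scheduling
 * point, a list with millions of entries (one per file being closed) runs
 * without ever giving up the CPU. sched_yield() stands in for cond_resched().
 */
static size_t run_work_list(struct work_item *head)
{
    size_t ran = 0;

    while (head) {
        struct work_item *next = head->next; /* read before func may free it */

        head->func(head);
        ran++;
        sched_yield();                       /* scheduling point per item */
        head = next;
    }
    return ran;
}

/* Demo callback that only counts invocations. */
static int calls;

static void count_call(struct work_item *w)
{
    (void)w;
    calls++;
}
```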
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Eric Dumazet committed with Al Viro Aug 21, 2012
  10. fs: fix fs/namei.c kernel-doc warnings

    Fix kernel-doc warnings in fs/namei.c:
    
    Warning(fs/namei.c:360): No description found for parameter 'inode'
    Warning(fs/namei.c:672): No description found for parameter 'nd'
    
    Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
    Cc:	Alexander Viro <viro@zeniv.linux.org.uk>
    Cc:	linux-fsdevel@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Randy Dunlap committed with Al Viro Aug 19, 2012
  11. eventpoll: use-after-possible-free in epoll_create1()

    As soon as we'd installed the file into descriptor table, it can
    get closed by another thread.  Freeing ep in process...
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 18, 2012
  12. vfio: grab vfio_device reference *before* exposing the sucker via fd_install()

    It's not critical (anymore) since another thread closing the file will block
    on ->device_lock before it gets to dropping the final reference, but it's
    definitely cleaner that way...
    
    Acked-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 18, 2012
  13. vfio: get rid of vfio_device_put()/vfio_group_get_device* races

    we really need to make sure that dropping the last reference happens
    under the group->device_lock; otherwise a loop (under device_lock)
    might find a vfio_device instance that is being freed right now, has
    already dropped the last reference and waits on device_lock to exclude
    the sucker from the list.
    
    Acked-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 18, 2012
  14. vfio: get rid of open-coding kref_put_mutex

    Acked-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 18, 2012
  15. introduce kref_put_mutex()

    equivalent of
    	mutex_lock(mutex);
    	if (!kref_put(kref, release))
    		mutex_unlock(mutex);
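    A userspace model of these semantics, as a sketch under stated
    assumptions: C11 atomics and a pthread mutex stand in for struct kref
    and struct mutex, and struct ref, ref_put_mutex(), demo_lock and
    demo_release() are hypothetical names. The fast path drops a reference
    without touching the mutex when it cannot be the last one; otherwise the
    mutex is taken first, so it is still held when release() runs, and in
    this sketch release() drops it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kref. */
struct ref {
    atomic_int count;
};

/* Decrement *v unless it is 1; returns true if the decrement happened. */
static bool dec_unless_one(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old - 1))
            return true;
    }
    return false;
}

/*
 * Drop a reference. If it was the last one, call release() with the mutex
 * held and return 1; otherwise return 0 with the mutex not held.
 */
static int ref_put_mutex(struct ref *r, void (*release)(struct ref *),
                         pthread_mutex_t *lock)
{
    if (dec_unless_one(&r->count))
        return 0;                    /* fast path: not the last reference */

    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(&r->count, 1) != 1) {
        pthread_mutex_unlock(lock);  /* a new reference appeared meanwhile */
        return 0;
    }
    release(r);                      /* last reference, lock still held */
    return 1;
}

/* Demo release callback: records the call and drops the lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int released;

static void demo_release(struct ref *r)
{
    (void)r;
    released = 1;
    pthread_mutex_unlock(&demo_lock);
}
```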
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 18, 2012
  16. vfio: don't dereference after kfree...

    Acked-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 17, 2012
  17. fbcon: fix race condition between console lock and cursor timer (v1.1)

    So we've had a fair few reports of fbcon handover breakage between
    efi/vesafb and i915 surface recently, so I dedicated a couple of
    days to finding the problem.
    
    Essentially the last thing we saw was the conflicting framebuffer
    message and that was all.
    
    So after much tracing with direct netconsole writes (printks
    under console_lock not so useful), I think I found the race.
    
    Thread A (driver load)    Thread B (timer thread)
      unbind_con_driver ->              |
      bind_con_driver ->                |
      vc->vc_sw->con_deinit ->          |
      fbcon_deinit ->                   |
      console_lock()                    |
          |                             |
          |                       fbcon_flashcursor timer fires
          |                       console_lock() <- blocked for A
          |
          |
    fbcon_del_cursor_timer ->
      del_timer_sync
      (BOOM)
    
    Of course, because all of this happens under the console lock, and
    we have also just unbound the active console, we never see any
    output.
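    The diagram is a lock-order cycle: thread A holds the console lock and
    waits in del_timer_sync() for the timer callback to finish, while the
    callback is blocked waiting for the console lock. The text here does
    not spell out the fix; one general way to break such a cycle is for the
    callback side to try the lock and skip its cosmetic work when the lock
    is busy. A userspace sketch of that pattern, assuming a pthread mutex
    as a stand-in for the console lock and a hypothetical flash_cursor()
    callback:

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the console lock (an assumption of this sketch). */
static pthread_mutex_t console_mutex = PTHREAD_MUTEX_INITIALIZER;
static int flashes;

/*
 * Hypothetical timer callback. Blocking here while the lock holder waits
 * for this callback to complete would deadlock, so it only tries the lock
 * and skips the cursor flash if the lock is busy.
 */
static bool flash_cursor(void)
{
    if (pthread_mutex_trylock(&console_mutex) != 0)
        return false;    /* lock busy: skip this flash, do not block */
    flashes++;           /* the cursor would be toggled here */
    pthread_mutex_unlock(&console_mutex);
    return true;
}
```

    Missing one cursor blink is harmless, which is what makes skipping an
    acceptable trade for never blocking the lock holder.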
    
    Hopefully this fixes the problem for anyone seeing vesafb->kms
    driver handoff.
    
    v1.1: add comment suggestion from Alan.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Dave Airlie committed Aug 21, 2012
  18. Merge branch 'akpm' (Andrew's patch-bomb)

    Merge fixes from Andrew Morton.
    
    Random drivers and some VM fixes.
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
      mm: compaction: Abort async compaction if locks are contended or taking too long
      mm: have order > 0 compaction start near a pageblock with free pages
      rapidio/tsi721: fix unused variable compiler warning
      rapidio/tsi721: fix inbound doorbell interrupt handling
      drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
      mm: correct page->pfmemalloc to fix deactivate_slab regression
      drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
      mm/compaction.c: fix deferring compaction mistake
      drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
      string: do not export memweight() to userspace
      hugetlb: update hugetlbpage.txt
      checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
      mm: hugetlbfs: correctly populate shared pmd
      cciss: fix incorrect scsi status reporting
      Documentation: update mount option in filesystem/vfat.txt
      mm: change nr_ptes BUG_ON to WARN_ON
      cs5535-clockevt: typo, it's MFGPT, not MFPGT
    committed Aug 22, 2012
Commits on Aug 21, 2012
  1. Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

    Pull media fixes from Mauro Carvalho Chehab:
     "For bug fixes, at soc_camera, si470x, uvcvideo, iguanaworks IR driver,
      radio_shark Kbuild fixes, and at the V4L2 core (radio fixes)."
    
    * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
      [media] media: soc_camera: don't clear pix->sizeimage in JPEG mode
      [media] media: mx2_camera: Fix clock handling for i.MX27
      [media] video: mx2_camera: Use clk_prepare_enable/clk_disable_unprepare
      [media] video: mx1_camera: Use clk_prepare_enable/clk_disable_unprepare
      [media] media: mx3_camera: buf_init() add buffer state check
      [media] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set
      [media] radio-shark: Only compile led support when CONFIG_LED_CLASS is set
      [media] radio-shark*: Call cancel_work_sync from disconnect rather then release
      [media] radio-shark*: Remove work-around for dangling pointer in usb intfdata
      [media] Add USB dependency for IguanaWorks USB IR Transceiver
      [media] Add missing logging for rangelow/high of hwseek
      [media] VIDIOC_ENUM_FREQ_BANDS fix
      [media] mem2mem_testdev: fix querycap regression
      [media] si470x: v4l2-compliance fixes
      [media] DocBook: Remove a spurious character
      [media] uvcvideo: Reset the bytesused field when recycling an erroneous buffer
    committed Aug 21, 2012
  2. Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

    Pull networking update from David Miller:
     "A couple weeks of bug fixing in there.  The largest chunk is all the
      broken crap Amerigo Wang found in the netpoll layer."
    
     1) netpoll and its users have several serious bugs:
        a) uses GFP_KERNEL with locks held
        b) interfaces requiring interrupts disabled are called with them
           enabled
        c) and vice versa
        d) VLAN tag demuxing, as per all other RX packet input paths, is not
           applied
    
        All from Amerigo Wang.
    
     2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
        good, from Neal Cardwell.
    
     3) Unlike AF_UNIX, AF_NETLINK sockets don't set default credentials
        when the user doesn't specify any explicitly during sendmsg().
        Instead we attach an empty (zero) SCM credential block which is
        definitely not what we want.  Fix from Eric Dumazet.
    
     4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
        from Ben Hutchings.
    
     5) inet_csk_route_child_sock() checks the wrong inet options pointer,
        fix from Christoph Paasch.
    
     6) When AF_PACKET is used for transmit, packet loopback doesn't behave
        properly when a socket fanout is enabled, from Eric Leblond.
    
     7) On bluetooth l2cap channel create failure, we leak the socket, from
        Jaganath Kanakkassery.
    
     8) Fix all the netprio file handling bugs found by Al Viro, from John
        Fastabend.
    
     9) Several error return and NULL deref bug fixes in networking drivers
        from Julia Lawall.
    
    10) A large smattering of kernel memory leaks to userspace (struct
        padding et al.), found by Mathias Krause.
    
    11) Conntrack expectations in netfilter can access an uninitialized timer,
        fix from Pablo Neira Ayuso.
    
    12) Several netfilter SIP tracker bug fixes from Patrick McHardy.
    
    13) IPSEC ipv6 routes are not initialized correctly all the time,
        resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.
    
    14) Bridging does rcu_dereference() outside of RCU protected area, from
        Stephen Hemminger.
    
    15) Fix routing cache removal performance regression when looking up
        output routes that have a local destination.  From Zheng Yan.
    
    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
      af_netlink: force credentials passing [CVE-2012-3520]
      ipv4: fix ip header ident selection in __ip_make_skb()
      ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
      tcp: fix possible socket refcount problem
      net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
      net/core/dev.c: fix kernel-doc warning
      netconsole: remove a redundant netconsole_target_put()
      net: ipv6: fix oops in inet_putpeer()
      net/stmmac: fix issue of clk_get for Loongson1B.
      caif: Do not dereference NULL in chnl_recv_cb()
      af_packet: don't emit packet on orig fanout group
      drivers/net/irda: fix error return code
      drivers/net/wan/dscc4.c: fix error return code
      drivers/net/wimax/i2400m/fw.c: fix error return code
      smsc75xx: add missing entry to MAINTAINERS
      net: qmi_wwan: new devices: UML290 and K5006-Z
      net: sh_eth: Add eth support for R8A7779 device
      netdev/phy: skip disabled mdio-mux nodes
      dt: introduce for_each_available_child_of_node, of_get_next_available_child
      net: netprio: fix cgrp create and write priomap race
      ...
    committed Aug 21, 2012
  3. mm: compaction: Abort async compaction if locks are contended or taking too long

    Jim Schutt reported a problem that pointed at compaction contending
    heavily on locks.  The workload is straightforward and in his own words:
    
    	The systems in question have 24 SAS drives spread across 3 HBAs,
    	running 24 Ceph OSD instances, one per drive.  FWIW these servers
    	are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
    	Ceph Linux clients doing dd simultaneously to a Ceph file system
    	backed by 12 of these servers.
    
    Early in the test, everything looks fine:
    
      procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
       r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
      31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
      27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
      28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
       6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
      22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
    
    and then it goes to pot
    
      procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
       r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
      163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
      207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
      123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
      123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
      622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
      223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0
    
    Note that system CPU usage is very high and the number of blocks being
    written out has dropped by 42%. He analysed this with perf and found:
    
      perf record -g -a sleep 10
      perf report --sort symbol --call-graph fractal,5
        34.63%  [k] _raw_spin_lock_irqsave
                |
                |--97.30%-- isolate_freepages
                |          compaction_alloc
                |          unmap_and_move
                |          migrate_pages
                |          compact_zone
                |          compact_zone_order
                |          try_to_compact_pages
                |          __alloc_pages_direct_compact
                |          __alloc_pages_slowpath
                |          __alloc_pages_nodemask
                |          alloc_pages_vma
                |          do_huge_pmd_anonymous_page
                |          handle_mm_fault
                |          do_page_fault
                |          page_fault
                |          |
                |          |--87.39%-- skb_copy_datagram_iovec
                |          |          tcp_recvmsg
                |          |          inet_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call
                |          |          __recv
                |          |          |
                |          |           --100.00%-- (nil)
                |          |
                |           --12.61%-- memcpy
                 --2.70%-- [...]
    
    There was other data but primarily it is all showing that compaction is
    contended heavily on the zone->lock and zone->lru_lock.
    
    commit [b2eef8c: mm: compaction: minimise the time IRQs are disabled
    while isolating pages for migration] noted that it was possible for
    migration to hold the lru_lock for an excessive amount of time. Very
    broadly speaking this patch expands the concept.
    
    This patch introduces compact_checklock_irqsave() to check if a lock
    is contended or the process needs to be scheduled. If either condition
    is true then async compaction is aborted and the caller is informed.
    The page allocator will fail a THP allocation if compaction failed due
    to contention. This patch also introduces compact_trylock_irqsave()
    which will acquire the lock only if it is not contended and the process
    does not need to schedule.
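    The abort decision described above can be sketched as a small pure
    function. This is a simplified userspace model, not the kernel code:
    the need_resched() and lock-contention checks are passed in as
    booleans, locking itself is left out, and struct compact_ctl and
    check_contention() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down compaction state. */
struct compact_ctl {
    bool sync;        /* synchronous compaction? */
    bool contended;   /* set when async compaction gives up on contention */
};

/*
 * Return true if compaction may take (or keep) the lock and continue,
 * false if it should abort. Only async compaction aborts, and it records
 * why, so the caller can fail the THP allocation instead of spinning.
 */
static bool check_contention(struct compact_ctl *cc, bool need_resched,
                             bool lock_contended)
{
    if (need_resched || lock_contended) {
        if (!cc->sync) {
            cc->contended = true;
            return false;
        }
        /* sync compaction keeps going; rescheduling details omitted */
    }
    return true;
}
```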
    
    Reported-by: Jim Schutt <jaschut@sandia.gov>
    Tested-by: Jim Schutt <jaschut@sandia.gov>
    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Mel Gorman committed with Aug 21, 2012
  4. mm: have order > 0 compaction start near a pageblock with free pages

    Commit 7db8889 ("mm: have order > 0 compaction start off where it
    left") introduced a caching mechanism to reduce the amount of work the
    free page scanner does in compaction.  However, it has a problem.
    Consider two processes simultaneously scanning free pages:
    
    					    			C
    	Process A		M     S     			F
    			|---------------------------------------|
    	Process B		M 	FS
    
    	C is zone->compact_cached_free_pfn
    	S is cc->start_free_pfn
    	M is cc->migrate_pfn
    	F is cc->free_pfn
    
    In this diagram, Process A has just reached its migrate scanner, wrapped
    around and updated compact_cached_free_pfn accordingly.
    
    Simultaneously, Process B finishes isolating in a block and updates
    compact_cached_free_pfn again to the location of its free scanner.
    
    Process A moves to "end_of_zone - one_pageblock" and runs this check
    
                    if (cc->order > 0 && (!cc->wrapped ||
                                          zone->compact_cached_free_pfn >
                                          cc->start_free_pfn))
                            pfn = min(pfn, zone->compact_cached_free_pfn);
    
    compact_cached_free_pfn is above where it started so the free scanner
    skips almost the entire space it should have scanned.  When there are
    multiple processes compacting, it can end in a situation where the entire
    zone is not being scanned at all.  Further, it is possible for two
    processes to ping-pong updates to compact_cached_free_pfn, leaving its
    value essentially random.
    
    Overall, the end result wrecks allocation success rates.
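    The skip can be reproduced by evaluating the quoted check with concrete
    numbers. The values are hypothetical page frame numbers chosen to match
    the diagram: Process A has wrapped with its start point S at 500,
    Process B has since moved compact_cached_free_pfn (C) to 520, and A's
    free scanner restarts one pageblock below the zone end, at 900. The
    function name free_scan_start() is hypothetical; its parameters replace
    the cc and zone fields.

```c
#include <stdbool.h>

/* Mirrors the quoted condition, with the fields passed in explicitly. */
static unsigned long free_scan_start(unsigned long pfn, int order,
                                     bool wrapped,
                                     unsigned long cached_free_pfn,
                                     unsigned long start_free_pfn)
{
    if (order > 0 && (!wrapped || cached_free_pfn > start_free_pfn))
        pfn = pfn < cached_free_pfn ? pfn : cached_free_pfn; /* min() */
    return pfn;
}
```

    With these numbers A restarts at 520 instead of 900, so the pfns between
    520 and 900, which it should scan after wrapping, are skipped. With C at
    480 (not above S) the check leaves the start at 900.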
    
    There is not an obvious way around this problem without introducing new
    locking and state so this patch takes a different approach.
    
    First, it gets rid of the skip logic because it's not clear that it
    matters if two free scanners happen to be in the same block but with
    racing updates it's too easy for it to skip over blocks it should not.
    
    Second, it updates compact_cached_free_pfn in a more limited set of
    circumstances.
    
    If a scanner has wrapped, it updates compact_cached_free_pfn to the end
    	of the zone. When a wrapped scanner isolates a page, it updates
    	compact_cached_free_pfn to point to the highest pageblock it
    	can isolate pages from.
    
    If a scanner has not wrapped when it has finished isolating pages, it
    	checks if compact_cached_free_pfn is pointing to the end of the
    	zone. If so, the value is updated to point to the highest
    	pageblock that pages were isolated from. This value will not
    	be updated again until a free page scanner wraps and resets
    	compact_cached_free_pfn.
    
    This is not optimal and it can still race but the compact_cached_free_pfn
    will be pointing to or very near a pageblock with free pages.
    
    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Reviewed-by: Rik van Riel <riel@redhat.com>
    Reviewed-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Mel Gorman committed with Aug 21, 2012
  5. rapidio/tsi721: fix unused variable compiler warning

    Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
    option off.
    
    This patch is applicable to kernel versions starting from v3.2.
    
    Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
    Cc: Matt Porter <mporter@kernel.crashing.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    abou9 committed with Aug 21, 2012
  6. rapidio/tsi721: fix inbound doorbell interrupt handling

    Make sure that there are no doorbell messages left behind due to disabled
    interrupts during inbound doorbell processing.
    
    The most common case for this bug is loss of rionet JOIN messages in
    systems with three or more rionet participants and MSI or MSI-X enabled.
    As a result, requests for packet transfers may finish with a "destination
    unreachable" error message.
    
    This patch is applicable to kernel versions starting from v3.2.
    
    Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
    Cc: Matt Porter <mporter@kernel.crashing.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    abou9 committed with Aug 21, 2012
  7. drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode

    Correct the offset by subtracting 20 from tm_hour before taking the
    modulo 12.
    
    [ "Why 20?" I hear you ask. Or at least I did.
    
      Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
      included in the RS5C348_HOURS_MASK define.  So it's really subtracting
      out that bit to get "hour+12".  But then because it does things modulo
      12, it needs to add the 12 in again afterwards anyway.
    
      This code is confused.  It would be much clearer if RS5C348_HOURS_MASK
      just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
      need to do the silly subtract either.
    
      Whatever. It's all just math, the end result is the same.   - Linus ]
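    The arithmetic is easier to follow with the BCD decoding spelled out.
    A sketch, assuming the PM flag is bit 5 (0x20, i.e. 32 as noted above)
    and that the hours mask also covers that bit; HOURS_MASK, BIT_PM and
    decode_hour_12h() are hypothetical names. Because the mask keeps the
    PM bit, BCD decoding turns it into a decimal 20, which is why 20 is
    subtracted before the modulo and 12 is added back afterwards.

```c
/* Assumed register layout for this sketch. */
#define HOURS_MASK 0x3f   /* BCD hours plus the PM flag */
#define BIT_PM     0x20   /* PM flag, bit 5 */

static int bcd2bin(unsigned char v)
{
    return (v & 0x0f) + (v >> 4) * 10;
}

/* Decode a 12-hour register value (hours 1..12 plus PM flag) to 0..23. */
static int decode_hour_12h(unsigned char reg)
{
    int hour = bcd2bin(reg & HOURS_MASK);   /* PM bit decodes as +20 */

    if (reg & BIT_PM) {
        hour -= 20;      /* drop the PM bit: 1..12 */
        hour %= 12;      /* 12 PM -> 0 */
        hour += 12;      /* 12..23 */
    } else {
        hour %= 12;      /* 12 AM -> 0 */
    }
    return hour;
}
```

    For example, 0x21 (1 PM) decodes to 21, becomes 1 after the subtraction
    and ends up as 13.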
    
    Reported-by: James Nute <newten82@gmail.com>
    Tested-by: James Nute <newten82@gmail.com>
    Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Atsushi Nemoto committed with Aug 21, 2012
  8. mm: correct page->pfmemalloc to fix deactivate_slab regression

    Commit cfd19c5 ("mm: only set page->pfmemalloc when
    ALLOC_NO_WATERMARKS was used") tried to narrow down page->pfmemalloc
    setting, but it missed some places the pfmemalloc should be set.
    
    So, in __slab_alloc, the mismatch between page->pfmemalloc and
    ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our
    core2 server:
    
        64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
                         |
                         --- _raw_spin_lock
                            |
                            |---0.34%-- deactivate_slab
                            |          __slab_alloc
                            |          kmem_cache_alloc
                            |          |
    
    That causes a 40% regression in our fio sync write performance.
    
    Move the check into get_page_from_freelist(), which resolves this issue.
    
    Signed-off-by: Alex Shi <alex.shi@intel.com>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Cc: David Miller <davem@davemloft.net>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
    Tested-by: Sage Weil <sage@inktank.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Alex Shi committed with Aug 21, 2012
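A toy model of the tagging rule, for illustration only. It assumes the fix amounts to assigning page->pfmemalloc, true or false, at the one point every page leaves the freelist, so a page can never carry a tag left over from an earlier reserve allocation. `struct page_model`, `take_from_freelist()` and `must_deactivate()` are hypothetical stand-ins for struct page, get_page_from_freelist() and the slab-side check, and the ALLOC_NO_WATERMARKS value here is assumed.

```c
#include <stdbool.h>

#define ALLOC_NO_WATERMARKS 0x04 /* assumed flag value for this sketch */

/* Hypothetical stand-in for the one struct page field that matters here. */
struct page_model {
    bool pfmemalloc;
};

/* Every page handed out passes through here, so the tag is reassigned on
 * every allocation: true only when the watermarks had to be ignored. */
static struct page_model *take_from_freelist(struct page_model *page, int alloc_flags)
{
    if (page)
        page->pfmemalloc = (alloc_flags & ALLOC_NO_WATERMARKS) != 0;
    return page;
}

/* Slab side: a reserve page must not serve a caller that may not use
 * reserves; in that case the slab would be deactivated. */
static bool must_deactivate(const struct page_model *page, bool caller_may_use_reserves)
{
    return page->pfmemalloc && !caller_may_use_reserves;
}
```

If the tag were only ever set and never cleared on the ordinary path, a page recycled from an earlier reserve allocation would still look like a reserve page, and the slab side would deactivate it needlessly.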
  9. drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes

    Dynamically allocated sysfs attributes must be initialized using
    sysfs_attr_init(), otherwise lockdep complains: BUG: key <address> not in
    .data!
    
    Found by Linux Driver Verification project (linuxtesting.org).
    
    Signed-off-by: Ilya Shchepetkov <shchepetkov@ispras.ru>
    Cc: Chris Verges <chrisv@cyberswitching.com>
    Cc: Christian Pellegrin <chripell@fsfe.org>
    Cc: Alessandro Zummo <a.zummo@towertech.it>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Ilya Shchepetkov committed with Aug 21, 2012
  10. mm/compaction.c: fix deferring compaction mistake

    Commit aff6224 ("vmscan: only defer compaction for failed order and
    higher") fixed a bad deferral policy but made a mistake when checking
    compact_order_failed in __compact_pgdat(), so it cannot update
    compact_order_failed with the new order.  This prevents the deferral
    policy from working correctly.  This patch fixes it.
    
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Reviewed-by: Rik van Riel <riel@redhat.com>
    Acked-by: Mel Gorman <mel@csn.ul.ie>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    minchank committed with Aug 21, 2012
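The deferral policy can be modelled in a few lines of userspace C. `struct zone_model` and its helpers are loosely patterned on the kernel's defer_compaction() and compaction_deferred(), but this is a standalone sketch and the constants are assumed. The commit text does not show the diff. `note_compaction_success()` is therefore a hypothetical illustration of one plausible shape of the check: after compaction succeeds at some order, the failure boundary has to move past that order even when the order equals compact_order_failed, a case a strict `>` comparison would miss.

```c
#include <stdbool.h>

#define MAX_DEFER_SHIFT 6  /* assumed cap on the back-off exponent */
#define NO_FAILURE      11 /* assumed "no order has failed yet" sentinel */

struct zone_model {
    unsigned long considered;  /* attempts skipped since the last failure */
    unsigned int  defer_shift; /* skip up to 1 << defer_shift attempts */
    int           order_failed; /* lowest order known to fail */
};

/* Compaction failed at `order`: back off and lower the failure boundary. */
static void defer_compaction(struct zone_model *z, int order)
{
    z->considered = 0;
    if (++z->defer_shift > MAX_DEFER_SHIFT)
        z->defer_shift = MAX_DEFER_SHIFT;
    if (order < z->order_failed)
        z->order_failed = order;
}

/* Only orders at or above the failed order are deferred. */
static bool compaction_deferred(struct zone_model *z, int order)
{
    unsigned long limit = 1UL << z->defer_shift;

    if (order < z->order_failed)
        return false;
    if (++z->considered > limit)
        z->considered = limit;
    return z->considered < limit;
}

/* Compaction at `order` succeeded: that order works again, so the boundary
 * must move above it, including when order == order_failed. */
static void note_compaction_success(struct zone_model *z, int order)
{
    if (order >= z->order_failed)
        z->order_failed = order + 1;
}
```

In the trace below, a failure at order 4 defers order-4 requests but not order 3. A later success at order 4 lifts the boundary to 5, so order 4 is no longer deferred.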
  11. drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources

    On many of our larger systems, CPU 0 has had all of its IRQ resources
    consumed before XPC loads.  Worst cases on machines with multiple 10
    GigE cards and multiple IB cards have depleted the entire first socket
    of IRQs.
    
    This patch makes the node on which IRQs (and all the other GRU Message
    Queue structures) are allocated selectable via a module parameter.  By
    default, it searches all nodes/cpus for available resources.
    
    [akpm@linux-foundation.org: fix build: include cpu.h and module.h]
    Signed-off-by: Robin Holt <holt@sgi.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Robin Holt committed with Aug 21, 2012
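A minimal sketch of the selection policy described above, modelling nodes only rather than individual cpus. The real module parameter name and the per-cpu search are not shown in this log. `pick_irq_node()`, `NODE_ANY` and the `free_irqs` array are hypothetical names used only to illustrate "honour an explicit node if one is given, otherwise search every node".

```c
#define NODE_ANY (-1) /* hypothetical default meaning "search every node" */

/* free_irqs[n] is the number of IRQ vectors still free on node n
 * (hypothetical input). Returns the chosen node, or -1 if none fits. */
static int pick_irq_node(const int *free_irqs, int nr_nodes, int requested)
{
    if (requested != NODE_ANY) {
        /* Explicit module parameter: use exactly that node or fail. */
        if (requested < 0 || requested >= nr_nodes)
            return -1;
        return free_irqs[requested] > 0 ? requested : -1;
    }

    /* Default: walk all nodes instead of assuming CPU 0's node has room. */
    for (int n = 0; n < nr_nodes; n++) {
        if (free_irqs[n] > 0)
            return n;
    }
    return -1;
}
```

Failing outright when an explicitly requested node is exhausted, rather than silently falling back, is a design choice of this sketch. It keeps an administrator's explicit choice predictable.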
  12. string: do not export memweight() to userspace

    Fix the following warning:
    
      usr/include/linux/string.h:8: userspace cannot reference function or variable defined in the kernel
    
    Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
    Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    congwang committed with Aug 21, 2012
  13. hugetlb: update hugetlbpage.txt

    Commit f0f57b2 ("mm: move hugepage test examples to
    tools/testing/selftests/vm") moved map_hugetlb.c, hugepage-shm.c and
    hugepage-mmap.c tests into tools/testing/selftests/vm/ directory, but it
    didn't update hugetlbpage.txt.
    
    Signed-off-by: Zhouping Liu <sanweidaying@gmail.com>
    Acked-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    sanweiying committed with Aug 21, 2012
  14. checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO

    Commit b13edf7 ("checkpatch: add checks for do {} while (0) macro
    misuses") added a test that is overly simplistic for single statement
    macros.
    
    Macros that start with control tests should be enclosed in a do {} while
    (0) loop.
    
    Add the necessary control tests to the check.
    
    Signed-off-by: Joe Perches <joe@perches.com>
    Acked-by: Andy Whitcroft <apw@canonical.com>
    Tested-by: Franz Schrober <franzschrober@yahoo.de>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    JoePerches committed with Aug 21, 2012
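Why a single-statement macro that starts with a control statement still needs do {} while (0) is easiest to see with the classic dangling-else case. This C sketch illustrates the rule the check enforces. It is not checkpatch's own (Perl) test, and `log_event()`, `LOG_IF_BARE` and `LOG_IF_WRAPPED` are hypothetical names. Expect a compiler warning about an ambiguous `else` in `use_bare()`; that ambiguity is exactly the bug being shown.

```c
static int events;

static void log_event(void)
{
    events++;
}

/* A single-statement macro that begins with a control statement. */
#define LOG_IF_BARE(cond)    if (cond) log_event()

/* The same statement wrapped in do {} while (0). */
#define LOG_IF_WRAPPED(cond) do { if (cond) log_event(); } while (0)

/* Intended result for both callers: 0 when flag is set, 1 otherwise. */
static int use_bare(int flag, int other)
{
    int r = 0;

    if (flag)
        LOG_IF_BARE(other);
    else        /* silently binds to the macro's hidden if, not to if (flag) */
        r = 1;
    return r;
}

static int use_wrapped(int flag, int other)
{
    int r = 0;

    if (flag)
        LOG_IF_WRAPPED(other);
    else        /* binds to if (flag), as the code reads */
        r = 1;
    return r;
}
```

With the bare macro, the result is inverted whenever `other` is false. The wrapped version behaves as written.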
  15. mm: hugetlbfs: correctly populate shared pmd

    Each page mapped in a process's address space must be correctly
    accounted for in _mapcount.  Normally the rules for this are
    straightforward but hugetlbfs page table sharing is different.  The page
    table pages at the PMD level are reference counted while the mapcount
    remains the same.
    
    If this accounting is wrong, it causes bugs like this one reported by
    Larry Woodman:
    
      kernel BUG at mm/filemap.c:135!
      invalid opcode: 0000 [#1] SMP
      CPU 22
      Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
      Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
      RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
      Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
      Call Trace:
        delete_from_page_cache+0x40/0x80
        truncate_hugepages+0x115/0x1f0
        hugetlbfs_evict_inode+0x18/0x30
        evict+0x9f/0x1b0
        iput_final+0xe3/0x1e0
        iput+0x3e/0x50
        d_kill+0xf8/0x110
        dput+0xe2/0x1b0
        __fput+0x162/0x240
    
    During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
    shared page tables with the check dst_pte == src_pte.  The logic is if
    the PMD page is the same, they must be shared.  This assumes that the
    sharing is between the parent and child.  However, if the sharing is
    with a different process entirely then this check fails as in this
    diagram:
    
      parent
        |
        ------------>pmd
                     src_pte----------> data page
                                            ^
      other--------->pmd--------------------|
                      ^
      child-----------|
                     dst_pte
    
    For this situation to occur, it must be possible for Parent and Other to
    have faulted and failed to share page tables with each other.  This is
    possible due to the following style of race.
    
      PROC A                                          PROC B
      copy_hugetlb_page_range                         copy_hugetlb_page_range
        src_pte == huge_pte_offset                      src_pte == huge_pte_offset
        !src_pte so no sharing                          !src_pte so no sharing
    
      (time passes)
    
      hugetlb_fault                                   hugetlb_fault
        huge_pte_alloc                                  huge_pte_alloc
          huge_pmd_share                                 huge_pmd_share
            LOCK(i_mmap_mutex)
            find nothing, no sharing
            UNLOCK(i_mmap_mutex)
                                                          LOCK(i_mmap_mutex)
                                                          find nothing, no sharing
                                                          UNLOCK(i_mmap_mutex)
          pmd_alloc                                       pmd_alloc
          LOCK(instantiation_mutex)
          fault
          UNLOCK(instantiation_mutex)
                                                      LOCK(instantiation_mutex)
                                                      fault
                                                      UNLOCK(instantiation_mutex)
    
    These two processes are now pointing to the same data page but are not
    sharing page tables because the opportunity was missed.  When either
    process later forks, the src_pte == dst_pte check is potentially
    insufficient.
    As the check falls through, the wrong PTE information is copied in
    (harmless but wrong) and the mapcount is bumped for a page mapped by a
    shared page table leading to the BUG_ON.
    
    This patch addresses the issue by moving pmd_alloc into huge_pmd_share,
    which guarantees that the shared pud is populated in the same critical
    section as the pmd.  This also means that the huge_pte_offset test in
    huge_pmd_share is now serialized correctly, which in turn makes sharing
    more likely to succeed because racing tasks see the pud and pmd
    populated together.
    
    Race identified and changelog written mostly by Mel Gorman.
    
    [akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
    Reported-by: Larry Woodman <lwoodman@redhat.com>
    Tested-by: Larry Woodman <lwoodman@redhat.com>
    Reviewed-by: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    Reviewed-by: Rik van Riel <riel@redhat.com>
    Cc: David Gibson <david@gibson.dropbear.id.au>
    Cc: Ken Chen <kenchen@google.com>
    Cc: Cong Wang <xiyou.wangcong@gmail.com>
    Cc: Hillf Danton <dhillf@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Michal Hocko committed with Aug 21, 2012
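The race in the diagram, and why the fix closes it, can be replayed deterministically in a single thread. This is a hypothetical model, not kernel code. A pmd is just an integer id, and `published` stands for a page table that another process could find and share. The i_mmap_mutex critical section is represented by a single function call rather than a real lock. `racy_lookup()` and `racy_alloc()` mimic the old split (search, unlock, then pmd_alloc), while `share_or_alloc()` mimics doing both inside one critical section.

```c
/* Hypothetical model of one hugetlbfs mapping's shareable page tables. */
struct mapping_model {
    int published; /* id of a pmd another process could share; 0 = none */
    int next_id;   /* source of fresh pmd ids */
};

/* Old shape, step 1: search for a shareable pmd, then drop the lock. */
static int racy_lookup(const struct mapping_model *m)
{
    return m->published;
}

/* Old shape, step 2: allocate outside the search's critical section. */
static int racy_alloc(struct mapping_model *m, int found)
{
    if (found)
        return found;            /* share what the lookup saw */

    int id = ++m->next_id;       /* nothing seen: allocate a private pmd */
    if (!m->published)
        m->published = id;       /* later lookups can find this one */
    return id;
}

/* Fixed shape: search and allocation happen in one critical section,
 * so a second caller always sees the first caller's pmd. */
static int share_or_alloc(struct mapping_model *m)
{
    if (!m->published)
        m->published = ++m->next_id;
    return m->published;
}
```

Interleaving the old steps as in the PROC A / PROC B timeline (both lookups, then both allocations) leaves the two processes with different pmds for the same data. The combined step always hands out the same one.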
  16. cciss: fix incorrect scsi status reporting

    Delete code that sets the SCSI status incorrectly; the status has
    already been set correctly above it.  The bug was introduced in 2009 by
    commit b0e15f6 ("cciss: fix typo that causes scsi status to be
    lost.")
    
    Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
    Reported-by: Roel van Meer <roel.vanmeer@bokxing.nl>
    Tested-by: Roel van Meer <roel.vanmeer@bokxing.nl>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    smcameron committed with Aug 21, 2012
  17. Documentation: update mount option in filesystem/vfat.txt

    Update two mount options (discard, nfs) in vfat.txt.
    
    Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
    Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    namjaejeon committed with Aug 21, 2012