SPL PANIC when creating a pool on top of a Ceph RBD #241

Closed
tdb opened this Issue May 24, 2013 · 59 comments

tdb commented May 24, 2013

I'm trying to create a pool on top of a Ceph RBD. My setup is:

  • Running on a VMware VM
  • Ubuntu precise using linux-generic-lts-raring kernel (3.8.0.22.21)
  • ZFS/SPL 0.6.1 from ppa:zfs-native/stable
  • Ceph 0.61.2 from ceph.com repository

I can create a pool on top of a local disk without any problems. But when I put it on top of a Ceph RBD (block device) I get the following error:

# rbd ls -l
NAME     SIZE PARENT FMT PROT LOCK
cephzfs 1024G          1
# rbd map cephzfs --pool rbd --name client.admin
# ls -la /dev/rbd/rbd/cephzfs /dev/rbd1
brw-rw---- 1 root disk 251, 0 May 24 16:04 /dev/rbd1
lrwxrwxrwx 1 root root     10 May 24 16:04 /dev/rbd/rbd/cephzfs -> ../../rbd1
# zpool create pool1 /dev/rbd/rbd/cephzfs
cannot open 'pool1': dataset does not exist

And this panic:

[10582.132665] VERIFY(shpp->sh_eof == shpp->sh_pool_create_len) failed
[10582.132816] SPLError: 1746:0:(spa_history.c:276:spa_history_log_sync()) SPL PANIC
[10582.132958] SPL: Showing stack for process 1746
[10582.132962] Pid: 1746, comm: txg_sync Tainted: PF          O 3.8.0-22-generic #33~precise1-Ubuntu
[10582.132963] Call Trace:
[10582.132999]  [] spl_debug_dumpstack+0x27/0x40 [spl]
[10582.133006]  [] spl_debug_bug+0x82/0xe0 [spl]
[10582.133045]  [] spa_history_log_sync+0x428/0x650 [zfs]
[10582.133077]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[10582.133107]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[10582.133140]  [] spa_sync+0x3a8/0xa50 [zfs]
[10582.133160]  [] ? ktime_get_ts+0x4c/0xe0
[10582.133195]  [] txg_sync_thread+0x2df/0x540 [zfs]
[10582.133229]  [] ? txg_init+0x250/0x250 [zfs]
[10582.133238]  [] thread_generic_wrapper+0x78/0x90 [spl]
[10582.133246]  [] ? __thread_create+0x310/0x310 [spl]
[10582.133255]  [] kthread+0xc0/0xd0
[10582.133259]  [] ? flush_kthread_worker+0xb0/0xb0
[10582.133272]  [] ret_from_fork+0x7c/0xb0
[10582.133275]  [] ? flush_kthread_worker+0xb0/0xb0

And then the following repeats after that until I reboot:

[10779.414291] INFO: task txg_sync:1746 blocked for more than 120 seconds.
[10779.414442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10779.414588] txg_sync        D ffff88003737b460     0  1746      2 0x00000000
[10779.414596]  ffff88003c517ad8 0000000000000046 0000000000000013 ffff88003fc13f40
[10779.414601]  ffff88003c517fd8 ffff88003c517fd8 ffff88003c517fd8 0000000000013f40
[10779.414604]  ffff88003b2d9740 ffff88003b08c5c0 ffffffff81c15347 0000000000000000
[10779.414607] Call Trace:
[10779.414624]  [] schedule+0x29/0x70
[10779.414652]  [] spl_debug_bug+0xb5/0xe0 [spl]
[10779.414716]  [] spa_history_log_sync+0x428/0x650 [zfs]
[10779.414751]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[10779.414785]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[10779.414818]  [] spa_sync+0x3a8/0xa50 [zfs]
[10779.414825]  [] ? ktime_get_ts+0x4c/0xe0
[10779.414863]  [] txg_sync_thread+0x2df/0x540 [zfs]
[10779.414897]  [] ? txg_init+0x250/0x250 [zfs]
[10779.414906]  [] thread_generic_wrapper+0x78/0x90 [spl]
[10779.414914]  [] ? __thread_create+0x310/0x310 [spl]
[10779.414919]  [] kthread+0xc0/0xd0
[10779.414922]  [] ? flush_kthread_worker+0xb0/0xb0
[10779.414926]  [] ret_from_fork+0x7c/0xb0
[10779.414929]  [] ? flush_kthread_worker+0xb0/0xb0
[10899.176620] INFO: task txg_sync:1746 blocked for more than 120 seconds.
[10899.176758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10899.176902] txg_sync        D ffff88003737b460     0  1746      2 0x00000000
[10899.176906]  ffff88003c517ad8 0000000000000046 0000000000000013 ffff88003fc13f40
[10899.176910]  ffff88003c517fd8 ffff88003c517fd8 ffff88003c517fd8 0000000000013f40
[10899.176913]  ffff88003b2d9740 ffff88003b08c5c0 ffffffff81c15347 0000000000000000
[10899.176917] Call Trace:
[10899.176926]  [] schedule+0x29/0x70
[10899.176958]  [] spl_debug_bug+0xb5/0xe0 [spl]
[10899.176998]  [] spa_history_log_sync+0x428/0x650 [zfs]
[10899.177030]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[10899.177059]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[10899.177092]  [] spa_sync+0x3a8/0xa50 [zfs]
[10899.177097]  [] ? ktime_get_ts+0x4c/0xe0
[10899.177132]  [] txg_sync_thread+0x2df/0x540 [zfs]
[10899.177166]  [] ? txg_init+0x250/0x250 [zfs]
[10899.177178]  [] thread_generic_wrapper+0x78/0x90 [spl]
[10899.177186]  [] ? __thread_create+0x310/0x310 [spl]
[10899.177191]  [] kthread+0xc0/0xd0
[10899.177194]  [] ? flush_kthread_worker+0xb0/0xb0
[10899.177198]  [] ret_from_fork+0x7c/0xb0
[10899.177202]  [] ? flush_kthread_worker+0xb0/0xb0

I'm happy to provide any further information required or do testing as needed.

Thank you.
Tim.

hvenzke commented May 28, 2013

Use the real physical device name, /dev/rbd1.
No symlinks with ZFS!

tdb commented May 28, 2013

It makes no difference I'm afraid. The panic is identical.

hvenzke commented May 28, 2013

Well, then the bug is basically in Ceph RBD's logic, since that provides the storage.

ZFS on Linux is known to work fine with native DRBD.

Ceph RBD's snapshot features are overkill, as ZFS does that itself.

Can you try making a GFS cluster or Lustre filesystem on it?

tdb commented May 29, 2013

Ceph RBD works fine with other file systems for me, and ZFS works fine with other underlying storage. So it's hard to be precise about where the problem lies. In any case, ZFS shouldn't panic, surely? That's a bug.

Ceph provides a distributed file system which is why I want to use it. ZFS also has some great features for managing multiple file systems within a single pool including snapshots.

Owner

behlendorf commented Jun 7, 2013

@tdb You're hitting a VERIFY in the code while attempting to sync out the history buffer to disk. For some reason the buffer lengths aren't being correctly updated. Since this only happens on top of a Ceph RBD, I suspect their block device is behaving slightly differently than the rest of the Linux block drivers. For the purposes of a test you could try commenting out the VERIFY like this, although my suspicion is you'll likely hit another issue quickly. However, that failure may shed some more light on exactly what's going wrong.

diff --git a/module/zfs/spa_history.c b/module/zfs/spa_history.c
index 9fb75f3..2d45266 100644
--- a/module/zfs/spa_history.c
+++ b/module/zfs/spa_history.c
@@ -272,8 +272,8 @@ spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
            NV_ENCODE_XDR, KM_PUSHPAGE) == 0);

        mutex_enter(&spa->spa_history_lock);
-       if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
-               VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);
+//     if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
+//             VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);

        /* write out the packed length as little endian */
        le_len = LE_64((uint64_t)reclen);

Relatedly, most people think about putting Ceph on top of ZFS, not vice versa. That setup was recently fixed in master, so you might try it. It won't get you features like distributed snapshots, but it will bring many of ZFS's other benefits.

tdb commented Jun 7, 2013

@behlendorf Thanks for the reply. I made the change suggested (against 0.6.1) and saw the following:

# zpool create pool1 /dev/rbd1
cannot open 'pool1': dataset does not exist

So that's the same as before. Checking zpool status afterwards showed a good pool, but zfs status didn't show any filesystems. No panic though.

Then I tried to repeat it. This time I got a panic after creating the pool, and zpool status hung. The panic was:

[  183.924160] divide error: 0000 [#1] SMP
[  183.924349] Modules linked in: coretemp(F) microcode(F) psmouse(F) ppdev(F) vmw_balloon(F) serio_raw(F) i2c_piix4(F) vmwgfx(F) mac_hid(F) ttm(F) shpchp(F) drm(F) parport_pc(F) rbd(F) libceph(F) lp(F) parport(F) zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) floppy(F) e1000(F) mptspi(F) mptscsih(F) mptbase(F) btrfs(F) zlib_deflate(F) libcrc32c(F)
[  183.926033] CPU 0
[  183.926100] Pid: 2019, comm: txg_sync Tainted: PF          O 3.8.0-23-generic #34~precise1-Ubuntu VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[  183.926385] RIP: 0010:[]  [] spa_history_write+0x82/0x1d0 [zfs]
[  183.926631] RSP: 0018:ffff88003c549ab8  EFLAGS: 00010246
[  183.926742] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  183.926878] RDX: 0000000000000000 RSI: 0000000000000020 RDI: 0000000000000000
[  183.927015] RBP: ffff88003c549b28 R08: ffff88003cfb4b40 R09: 0000000000000003
[  183.927151] R10: ffff880037062303 R11: 316462722f766564 R12: ffff88003c496600
[  183.927287] R13: ffff88003be36000 R14: ffff88003cf9a000 R15: 0000000000000008
[  183.927424] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  183.927574] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  183.927690] CR2: 00007f3b12ef0000 CR3: 000000003b141000 CR4: 00000000000007f0
[  183.927924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  183.928132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  183.928312] Process txg_sync (pid: 2019, threadinfo ffff88003c548000, task ffff88003bc8ae80)
[  183.928535] Stack:
[  183.928633]  0000000000000002 ffffffffa01e3360 ffff88003cfb4b40 ffff88003c549ba0
[  183.929007]  ffff88003cf9a000 0000000000000008 ffff88003be36000 0000000068163d54
[  183.929382]  ffff88003b8a2cc0 ffff88003b8a2cc0 ffff88003be36000 ffff88003cfb4b40
[  183.929757] Call Trace:
[  183.929903]  [] spa_history_log_sync+0x221/0x610 [zfs]
[  183.930106]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[  183.930312]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[  183.930507]  [] spa_sync+0x3a8/0xa50 [zfs]
[  183.930667]  [] ? ktime_get_ts+0x4c/0xe0
[  183.930852]  [] txg_sync_thread+0x2df/0x540 [zfs]
[  183.931049]  [] ? txg_init+0x250/0x250 [zfs]
[  183.931219]  [] thread_generic_wrapper+0x78/0x90 [spl]
[  183.931397]  [] ? __thread_create+0x310/0x310 [spl]
[  183.931568]  [] kthread+0xc0/0xd0
[  183.936038]  [] ? flush_kthread_worker+0xb0/0xb0
[  183.936149]  [] ret_from_fork+0x7c/0xb0
[  183.936251]  [] ? flush_kthread_worker+0xb0/0xb0
[  183.936360] Code: 55 b0 48 89 fa 48 29 f2 48 01 c2 48 39 55 b8 0f 82 bc 00 00 00 4c 8b 75 b0 41 bf 08 00 00 00 48 29 c8 31 d2 49 8b b5 70 08 00 00 <48> f7 f7 4c 8d 45 c0 4c 89 f7 48 01 ca 48 29 d3 48 83 fb 08 49
[  183.938433] RIP  [] spa_history_write+0x82/0x1d0 [zfs]
[  183.938599]  RSP 
[  183.938710] ---[ end trace f7a46262c37aea79 ]---

If I had a more concrete idea of what was happening I'd be happy to file a bug with Ceph.

Owner

behlendorf commented Jun 7, 2013

Divide by zero, now that's interesting. Can you dump the exact code for your build as follows? It should look something like this, but the exact line might differ. I want to know where that divide by zero occurred.

[behlendo@rhel-6-2-amd64 zfs]$ gdb module/zfs/zfs.ko
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/behlendo/src/git/zfs/module/zfs/zfs.ko...done.
(gdb) 
(gdb)  list *(spa_history_write+0x82)
0x58a62 is in spa_history_write (/home/behlendo/src/git/zfs/module/zfs/../../module/zfs/spa_history.c:129).
124             int err;
125
126             phys_bof = spa_history_log_to_phys(shpp->sh_bof, shpp);
127             firstread = MIN(sizeof (reclen), shpp->sh_phys_max_off - phys_bof);
128
129             if ((err = dmu_read(mos, spa->spa_history, phys_bof, firstread,
130                 buf, DMU_READ_PREFETCH)) != 0)
131                     return (err);
132             if (firstread != sizeof (reclen)) {
133                     if ((err = dmu_read(mos, spa->spa_history,
(gdb) quit

tdb commented Jun 8, 2013

I've been building the module using dkms, but it appears to be stripping the module or not building it with symbols in the first place. Is there a way to modify that behaviour? Or am I going to need to ditch that and build it myself?

I've tried setting the relevant things in /etc/default/zfs.

Contributor

chrisrd commented Jun 8, 2013

Given previously failing VERIFY:

+//     if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
+//             VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);

...static analysis suggests:

static int
spa_history_write(spa_t *spa, void *buf, uint64_t len, spa_history_phys_t *shpp,
    dmu_tx_t *tx)
{
    ...
        phys_eof = spa_history_log_to_phys(shpp->sh_eof, shpp);
    ...
}

static uint64_t
spa_history_log_to_phys(uint64_t log_off, spa_history_phys_t *shpp)
{
        uint64_t phys_len;

        phys_len = shpp->sh_phys_max_off - shpp->sh_pool_create_len;
        return ((log_off - shpp->sh_pool_create_len) % phys_len      <<<< BOOM!
            + shpp->sh_pool_create_len);
}
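
To make the failure mode concrete: in the divide-error oops above, the faulting bytes `<48> f7 f7` decode to `div %rdi`, and RDI is 0 in the register dump, which is consistent with phys_len being zero at that point. The following is only a userspace sketch of the same calculation with a guard added; the struct is trimmed to the two fields involved, and `log_to_phys_checked` is a made-up name, not a function in the ZFS source.

```c
#include <stdint.h>

/* Only the fields used by the calculation quoted above. */
typedef struct spa_history_phys {
	uint64_t sh_pool_create_len;	/* end of the pool-create record */
	uint64_t sh_phys_max_off;	/* physical end of the history object */
} spa_history_phys_t;

/*
 * Hypothetical guarded variant of spa_history_log_to_phys(); this is
 * not the upstream code. Returns 0 and stores the physical offset on
 * success, or -1 when the header would make the modulus zero or wrap.
 */
int
log_to_phys_checked(uint64_t log_off, const spa_history_phys_t *shpp,
    uint64_t *phys_off)
{
	uint64_t phys_len;

	/* phys_len would be 0 (or underflow): "x % 0" traps on x86. */
	if (shpp->sh_phys_max_off <= shpp->sh_pool_create_len)
		return (-1);

	phys_len = shpp->sh_phys_max_off - shpp->sh_pool_create_len;
	*phys_off = (log_off - shpp->sh_pool_create_len) % phys_len +
	    shpp->sh_pool_create_len;
	return (0);
}
```

With a header whose two fields are equal (for example, both read back as zero), the guarded version returns -1 instead of dividing; with sane values it maps log offsets into the ring exactly as the original expression does.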

Owner

behlendorf commented Jun 18, 2013

@tdb It depends on your kernel and what the default build options are. For example, the Ubuntu kernels will always strip the symbols. It may also not be needed since @chrisrd has likely spotted the offending line here.

It seems likely that we're somehow reading bogus data from the Ceph RBD. It would be useful to see what those values are. If you're still interested in chasing this, can you try the following patch? It will log the offending values to the console before the crash. It would be useful to run it several times to see if the values remain constant or change.

diff --git a/module/zfs/spa_history.c b/module/zfs/spa_history.c
index 9fb75f3..700f364 100644
--- a/module/zfs/spa_history.c
+++ b/module/zfs/spa_history.c
@@ -223,6 +223,13 @@ spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
         */
        VERIFY(0 == dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp));
        shpp = dbp->db_data;
+#ifdef _KERNEL
+       printk("sh_pool_create_len = %llu\n", shpp->sh_pool_create_len);
+       printk("sh_phys_max_off = %llu\n", shpp->sh_phys_max_off);
+       printk("sh_bof = %llu\n", shpp->sh_bof);
+       printk("sh_eof = %llu\n", shpp->sh_eof);
+       printk("sh_records_lost = %llu\n", shpp->sh_records_lost);
+#endif

        dmu_buf_will_dirty(dbp, tx);

tdb commented Jun 18, 2013

@behlendorf It looks like, either through fiddling or other updates, I've managed to move the error:

[  422.936633]  rbd1: unknown partition table
[  422.936705] rbd: rbd1: added with size 0x10000000000
[  441.362250] SPL: using hostid 0x007f0101
[  441.470098] SPLError: 1682:0:(zap_micro.c:301:mze_find()) VERIFY3(mze->mze_cd == (&(zn->zn_zap)->zap_u.zap_micro.zap_phys->mz_chunk[(mze)->mze_chunkid])->mze_cd) failed (0 == 1635019877)
[  441.470418] SPLError: 1682:0:(zap_micro.c:301:mze_find()) SPL PANIC
[  441.470544] SPL: Showing stack for process 1682
[  441.470552] Pid: 1682, comm: txg_sync Tainted: PF          O 3.8.0-25-generic #37~precise1-Ubuntu
[  441.470554] Call Trace:
[  441.470579]  [] spl_debug_dumpstack+0x27/0x40 [spl]
[  441.470589]  [] spl_debug_bug+0x82/0xe0 [spl]
[  441.470636]  [] mze_find+0x13a/0x270 [zfs]
[  441.470677]  [] zap_lookup_norm+0x9e/0x1c0 [zfs]
[  441.470685]  [] ? kmem_free_debug+0x4b/0x150 [spl]
[  441.470725]  [] zap_lookup+0x33/0x40 [zfs]
[  441.470765]  [] spa_feature_is_active+0x8a/0xf0 [zfs]
[  441.470799]  [] dsl_scan_active+0x76/0xc0 [zfs]
[  441.470833]  [] dsl_scan_sync+0x4f/0xe30 [zfs]
[  441.470873]  [] ? zio_wait+0x23d/0x480 [zfs]
[  441.470910]  [] ? bpobj_enqueue_cb+0x20/0x20 [zfs]
[  441.470947]  [] spa_sync+0x417/0xcd0 [zfs]
[  441.470968]  [] ? ktime_get_ts+0x4c/0xe0
[  441.471007]  [] txg_sync_thread+0x30a/0x640 [zfs]
[  441.471016]  [] ? kmem_free_debug+0x4b/0x150 [spl]
[  441.471054]  [] ? txg_quiesce_thread+0x540/0x540 [zfs]
[  441.471062]  [] thread_generic_wrapper+0x78/0x90 [spl]
[  441.471070]  [] ? __thread_create+0x310/0x310 [spl]
[  441.471080]  [] kthread+0xc0/0xd0
[  441.471084]  [] ? flush_kthread_worker+0xb0/0xb0
[  441.471096]  [] ret_from_fork+0x7c/0xb0
[  441.471100]  [] ? flush_kthread_worker+0xb0/0xb0

If that's of no use to you, let me know and I'll try to get the machine back how it was. I notice the kernel version has changed, and I'm fairly sure a ceph update got pulled in too.

Owner

behlendorf commented Jun 18, 2013

@tdb This just looks like garbage data from disk as well. One thing which did catch my eye, however, from the above log was the size of the rbd device. 0x10000000000 is a surprisingly round number for the partition; is this expected? Also, are you creating a partition table for ZFS manually, or allowing it to partition the device?

[  422.936705] rbd: rbd1: added with size 0x10000000000

tdb commented Jun 19, 2013

@behlendorf I noticed that size too. It's a 1TB device (1024G), so the size is actually correct.

# rbd ls -l
NAME     SIZE PARENT FMT PROT LOCK
cephzfs 1024G          1
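
As a quick check of that size: 0x10000000000 is 2^40 bytes, which is 1024 GiB and so matches the 1024G figure from rbd ls. A minimal sketch of the arithmetic follows; the macro and helper names are made up for illustration.

```c
#include <stdint.h>

/* Size the kernel logged for rbd1, in bytes. */
#define	RBD1_SIZE_BYTES	UINT64_C(0x10000000000)

/* Convert a byte count to whole GiB (2^30 bytes each). */
uint64_t
bytes_to_gib(uint64_t bytes)
{
	return (bytes >> 30);
}
```

Calling `bytes_to_gib(RBD1_SIZE_BYTES)` gives 1024, so the round number is just a power-of-two image size rather than a sign of truncation.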

I was giving the raw device to ZFS, rather than creating a partition.

If I use fdisk to put a partition table on the disk, but without adding any partitions, I get the following when creating a pool:

# zpool create pool1 /dev/rbd1
internal error: Invalid argument
Aborted (core dumped)

If I create a partition on it I get the same errors as I mentioned previously (mze_find) when creating a pool on /dev/rbd1p1.

Just for comparison, here's the output creating an ext4 filesystem on the same partition:

root@ubuntu:~# mkfs.ext4 /dev/rbd1p1
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1024 blocks, Stripe width=1024 blocks
67108864 inodes, 268434432 blocks
13421721 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
8192 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
root@ubuntu:~# mount /dev/rbd1p1 /mnt
root@ubuntu:~# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1p1    1008G   72M  957G   1% /mnt
root@ubuntu:~#

Owner

behlendorf commented Jun 19, 2013

Strange. Well the only way these failures make sense is if something odd is happening at the block device layer. My next suggestion would be to use blktrace to grab a trace log for the rbd device. That would allow us to look for something unusual in the way the rbd or zfs is behaving.

http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html

Owner

behlendorf commented Jun 20, 2013

@tdb That's exactly the log I wanted to see, but unfortunately it doesn't really show anything strange. All the I/O looks reasonable and is doing what I'd expect a zpool create to do. It's the right size and it's all within the size of the device. However, what is interesting is that it doesn't show any reads before the crash.

That's got me wondering if the rbd driver might be modifying parts of the pages in the bvecs during the write. That could explain this issue, but we'd need to put a debug patch together to see.
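
One way such a debug check could work is to checksum the buffer before handing it to the device and again after the write completes. The sketch below is a userspace analogue only, not the vdev_disk.c instrumentation itself; `submit_fn_t`, `submit_and_check`, and the two stand-in "drivers" are hypothetical names used purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple Fletcher-style checksum over a byte buffer. */
uint32_t
buf_checksum(const unsigned char *buf, size_t len)
{
	uint32_t a = 0, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % 65535;
		b = (b + a) % 65535;
	}
	return ((b << 16) | a);
}

/* Hypothetical stand-in for "submit this buffer to the block device". */
typedef void (*submit_fn_t)(unsigned char *buf, size_t len);

/*
 * Checksum the buffer, hand it to the submit function, and checksum it
 * again. Returns 0 if the buffer is unchanged, -1 if the callee
 * modified the caller's data during the "write".
 */
int
submit_and_check(submit_fn_t submit, unsigned char *buf, size_t len)
{
	uint32_t before = buf_checksum(buf, len);

	submit(buf, len);
	return (buf_checksum(buf, len) == before ? 0 : -1);
}

/* A well-behaved "driver" that leaves the buffer alone. */
void
submit_readonly(unsigned char *buf, size_t len)
{
	(void) buf;
	(void) len;
}

/* A misbehaving "driver" that scribbles on the first byte. */
void
submit_scribble(unsigned char *buf, size_t len)
{
	if (len > 0)
		buf[0] ^= 0xff;
}
```

`submit_and_check(submit_readonly, buf, len)` returns 0 while `submit_and_check(submit_scribble, buf, len)` returns -1. In the kernel the equivalent comparison would have to happen between bio submission and the completion callback, which is an assumption about where a real patch would hook in.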

Contributor

chrisrd commented Jun 21, 2013

@tdb Based on little more than the mention of modifying bvecs, this commit which touches drivers/block/rbd.c might be relevant:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d74c6d514fe314b8bdab58b487b25992291577ec

block: Add bio_for_each_segment_all()

__bio_for_each_segment() iterates bvecs from the specified index
instead of bio->bv_idx.  Currently, the only usage is to walk all the
bvecs after the bio has been advanced by specifying 0 index.

For immutable bvecs, we need to split these apart;
bio_for_each_segment() is going to have a different implementation.
This will also help document the intent of code that's using it -
bio_for_each_segment_all() is only legal to use for code that owns the
bio.

If your kernel doesn't have that patch already, it could be worthwhile trying a kernel including it. It looks to have been introduced some time between v3.9 and v3.10-rc1. Possibly even worth trying v3.10-rc6, which has pulled in a bunch of rbd.c changes.

tdb commented Jun 21, 2013

@chrisrd Using the Ubuntu mainline kernels I tried v3.9.7, but it behaved the same. I checked and it doesn't contain the commit you mentioned above. So I tried v3.10-rc6 and I get the following build error in spl:

Making all in module
make[2]: Entering directory `/var/lib/dkms/spl/0.6.1/build/module'
make -C /lib/modules/3.10.0-031000rc6-generic/build SUBDIRS=`pwd`  CONFIG_SPL=m modules
make[3]: Entering directory `/usr/src/linux-headers-3.10.0-031000rc6-generic'
  CC [M]  /var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-debug.o
  CC [M]  /var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.o
In file included from /var/lib/dkms/spl/0.6.1/build/include/sys/kmem.h:38:0,
                 from /var/lib/dkms/spl/0.6.1/build/include/sys/kstat.h:32,
                 from /var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:28:
/var/lib/dkms/spl/0.6.1/build/include/sys/vmsystm.h:77:8: error: redefinition of ‘struct vmalloc_info’
include/linux/vmalloc.h:173:8: note: originally defined here
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entry_match’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1126:15: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1129:32: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entry_find’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1137:16: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1137:37: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entries’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1150:16: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1150:37: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘spl_proc_init’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1177:2: error: implicit declaration of function ‘create_proc_entry’ [-Werror=implicit-function-declaration]
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1177:21: warning: assignment makes pointer from integer without a cast [enabled by default]
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1181:27: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entry_match’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1130:1: warning: control reaches end of non-void function [-Wreturn-type]
cc1: some warnings being treated as errors
make[5]: *** [/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.o] Error 1
make[4]: *** [/var/lib/dkms/spl/0.6.1/build/module/spl] Error 2
make[3]: *** [_module_/var/lib/dkms/spl/0.6.1/build/module] Error 2
make[3]: Leaving directory `/usr/src/linux-headers-3.10.0-031000rc6-generic'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/var/lib/dkms/spl/0.6.1/build/module'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/lib/dkms/spl/0.6.1/build'
make: *** [all] Error 2

Have spl/zfs been tested with v3.10 yet?

Owner

behlendorf commented Jun 21, 2013

@tdb There are pull requests open for 3.10 support, but they are still undergoing review before getting merged. They should be safe to use; the only real questions around them are whether they accidentally break builds on older kernels and whether they are as clean as they can be.

@chrisrd I don't think the referenced commit will help, but it wouldn't hurt to try. We'll probably need to instrument the zfs vdev_disk.c code to see exactly what's happening to the bios.

tdb commented Aug 25, 2013

Just a quick update on this. I've tried again with 0.6.2 and the following two kernels:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10.9-saucy/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-rc6-saucy/

Same problem:

Aug 25 00:51:29 ubuntu-12042 kernel: [  142.393672] SPLError: 2851:0:(zap_micro.c:301:mze_find()) VERIFY3(mze->mze_cd == (&(zn->zn_zap)->zap_u.zap_micro.zap_phys->mz_chunk[(mze)->mze_chunkid])->mze_cd) failed (0 == 825307184)
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394034] SPLError: 2851:0:(zap_micro.c:301:mze_find()) SPL PANIC
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394160] SPL: Showing stack for process 2851
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394164] CPU: 0 PID: 2851 Comm: txg_sync Tainted: PF          O 3.11.0-031100rc6-generic #201308181835
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394166] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/22/2012
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394169]  ffff88003c59da00 ffff88003c4ab9c8 ffffffff81720b9b 0000000000000007
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394173]  0000000000000000 ffff88003c4ab9d8 ffffffffa018f4d7 ffff88003c4aba18
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394176]  ffffffffa01907a2 ffffffffa01a4b4d ffff880036998880 ffff88003c59da00
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394179] Call Trace:
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394203]  [] dump_stack+0x46/0x58
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394221]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394246]  [] spl_debug_bug+0x82/0xe0 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394314]  [] mze_find+0x13a/0x270 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394359]  [] zap_lookup_norm+0x9e/0x1c0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394368]  [] ? kmem_free_debug+0x4b/0x150 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394410]  [] zap_lookup+0x33/0x40 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394451]  [] spa_feature_is_active+0x8a/0xf0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394485]  [] dsl_scan_active+0x76/0xc0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394520]  [] dsl_scan_sync+0x4f/0xe30 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394559]  [] ? zio_wait+0x23d/0x4a0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394596]  [] ? bpobj_enqueue_cb+0x20/0x20 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394633]  [] spa_sync+0x48a/0xd60 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394649]  [] ? ktime_get_ts+0x4c/0xe0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394687]  [] txg_sync_thread+0x30a/0x640 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394696]  [] ? kmem_free_debug+0x4b/0x150 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394733]  [] ? txg_quiesce_thread+0x540/0x540 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394742]  [] thread_generic_wrapper+0x78/0x90 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394750]  [] ? __thread_create+0x310/0x310 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394759]  [] kthread+0xc0/0xd0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394763]  [] ? flush_kthread_worker+0xb0/0xb0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394771]  [] ret_from_fork+0x7c/0xb0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394776]  [] ? flush_kthread_worker+0xb0/0xb0
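
A small observation on the values in the failed VERIFY3 checks: 1635019877 (from the earlier mze_find panic) is 0x61746C65 and 825307184 (from this one) is 0x31313030. Read in little-endian memory order, both are four printable ASCII bytes, "elta" and "0011". That would fit the "garbage data" reading, since text appears where a small mze_cd integer was expected, though it says nothing about where the bytes came from. The helper below is a sketch with a made-up name, just to show the decoding.

```c
#include <stdint.h>
#include <ctype.h>

/*
 * Store the four bytes of v, least significant first (the order they
 * occupy in memory on little-endian x86), into out[0..3] and
 * NUL-terminate. Returns 1 if all four bytes are printable ASCII,
 * 0 otherwise.
 */
int
u32_as_le_ascii(uint32_t v, char out[5])
{
	int i, printable = 1;

	for (i = 0; i < 4; i++) {
		unsigned char b = (v >> (8 * i)) & 0xff;

		out[i] = (char)b;
		if (!isprint(b))
			printable = 0;
	}
	out[4] = '\0';
	return (printable);
}
```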

tdb commented Nov 21, 2013

Using 0.6.2 and the linux-image-generic-lts-saucy 3.11.0.13.12 kernel on Ubuntu precise I now get the following:

# zpool create pool2 /dev/rbd1
internal error: Invalid argument
Aborted (core dumped)

The core file contains:

#0  0x00007ffa1abad425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffa1abb0b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffa1b383782 in ?? () from /lib/libzfs.so.2
#3  0x00007ffa1b383b70 in zfs_standard_error_fmt () from /lib/libzfs.so.2
#4  0x00007ffa1b364a1e in zfs_open () from /lib/libzfs.so.2
#5  0x000000000040bc98 in zpool_do_create (argc=, argv=) at ../../cmd/zpool/zpool_main.c:1057
#6  0x0000000000404d26 in main (argc=4, argv=0x7fffecdc5178) at ../../cmd/zpool/zpool_main.c:5709

And this in the log:

Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240529] SPLError: 1688:0:(spa.c:6190:spa_sync()) VERIFY3(bpobj_iterate(defer_bpo, spa_free_sync_cb, zio, tx) == 0) failed (22 == 0)
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240786] SPLError: 1688:0:(spa.c:6190:spa_sync()) SPL PANIC
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240899] SPL: Showing stack for process 1688
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240910] CPU: 0 PID: 1688 Comm: txg_sync Tainted: PF          O 3.11.0-13-generic #20~precise2-Ubuntu
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240912] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240915]  0000000000000005 ffff88003c6f9c48 ffffffff8173a05d 0000000000000007
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240919]  0000000000000000 ffff88003c6f9c58 ffffffffa01794d7 ffff88003c6f9c98
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240922]  ffffffffa017a7a2 ffffffffa018ebed ffff88003b804000 0000000000000005
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240925] Call Trace:
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240943]  [] dump_stack+0x46/0x58
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240971]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240979]  [] spl_debug_bug+0x82/0xe0 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241024]  [] spa_sync+0x9f7/0xdb0 [zfs]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241080]  [] txg_sync_thread+0x364/0x6a0 [zfs]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241122]  [] ? txg_quiesce_thread+0x520/0x520 [zfs]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241131]  [] thread_generic_wrapper+0x78/0x90 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241139]  [] ? __thread_create+0x310/0x310 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241145]  [] kthread+0xc0/0xd0
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241149]  [] ? flush_kthread_worker+0xb0/0xb0
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241158]  [] ret_from_fork+0x7c/0xb0
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241162]  [] ? flush_kthread_worker+0xb0/0xb0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.848936] INFO: task txg_sync:1688 blocked for more than 120 seconds.
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849079] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849220] txg_sync        D ffff880036a5ece0     0  1688      2 0x00000000
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849226]  ffff88003c6f9c48 0000000000000046 ffffffff81ae70b3 ffff88003fc14580
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849230]  ffff88003c6f9fd8 ffff88003c6f9fd8 ffff88003c6f9fd8 0000000000014580
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849233]  ffff88003cd69770 ffff88003cd6aee0 0000000000000000 0000000000000000
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849236] Call Trace:
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849252]  [] schedule+0x29/0x70
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849299]  [] spl_debug_bug+0xb5/0xe0 [spl]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849346]  [] spa_sync+0x9f7/0xdb0 [zfs]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849387]  [] txg_sync_thread+0x364/0x6a0 [zfs]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849427]  [] ? txg_quiesce_thread+0x520/0x520 [zfs]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849445]  [] thread_generic_wrapper+0x78/0x90 [spl]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849454]  [] ? __thread_create+0x310/0x310 [spl]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849460]  [] kthread+0xc0/0xd0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849464]  [] ? flush_kthread_worker+0xb0/0xb0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849468]  [] ret_from_fork+0x7c/0xb0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849471]  [] ? flush_kthread_worker+0xb0/0xb0

Further zpool commands generate the following:

Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182141] SPLError: 2064:0:(zap_micro.c:1292:zap_cursor_retrieve()) VERIFY3(mze->mze_cd == mzep->mze_cd) failed (0 == 1635019877)
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182264] SPLError: 2064:0:(zap_micro.c:1292:zap_cursor_retrieve()) SPL PANIC
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182329] SPL: Showing stack for process 2064
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182335] CPU: 0 PID: 2064 Comm: zpool Tainted: PF          O 3.11.0-13-generic #20~precise2-Ubuntu
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182337] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182338]  ffff88003c25b640 ffff88003bdebac8 ffffffff8173a05d 0000000000000007
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182341]  0000000000000000 ffff88003bdebad8 ffffffffa01794d7 ffff88003bdebb18
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182343]  ffffffffa017a7a2 ffffffffa018ebed ffff88003bdebbf8 ffff88003c25b640
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182345] Call Trace:
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182352]  [] dump_stack+0x46/0x58
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182363]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182367]  [] spl_debug_bug+0x82/0xe0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182400]  [] zap_cursor_retrieve+0x24a/0x480 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182414]  [] ? default_spin_lock_flags+0x9/0x10
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182441]  [] ? zap_unlockdir+0x108/0x1a0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182466]  [] spa_add_feature_stats+0x213/0x440 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182471]  [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182476]  [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182482]  [] ? nvlist_remove_all+0x8f/0xd0 [znvpair]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182506]  [] ? spa_config_held+0xb9/0xd0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182531]  [] ? spa_add_l2cache+0x29/0x3f0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182555]  [] ? spa_add_spares+0x25/0x360 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182579]  [] spa_get_stats+0x10f/0x330 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182584]  [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182610]  [] zfs_ioc_pool_stats+0x31/0x70 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182636]  [] zfsdev_ioctl+0x53b/0x5b0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182646]  [] ? ftrace_raw_event_do_sys_open+0x100/0x110
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182651]  [] do_vfs_ioctl+0x7c/0x2f0
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182653]  [] SyS_ioctl+0x91/0xb0
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182657]  [] system_call_fastpath+0x1a/0x1f
Owner

behlendorf commented Nov 21, 2013

@tdb This was due to an ABI change between the user utilities and kmods. It is unrelated to your original issue. Make sure you rebuild everything such that the utilities exactly match the kmods.

tdb commented Nov 22, 2013

@behlendorf Ah, ok, sorry for the noise. I actually just installed the binary packages from launchpad, which built the kmods with dkms. So I would have expected that to stay in sync. Anyway - as you say, not related to this issue.

rbraddy commented Dec 18, 2013

Is this still an issue or has a resolution been found?

tdb commented Dec 18, 2013

@rbraddy I'm not aware of a resolution yet, but @behlendorf can confirm. Are you seeing the same problem? It'd be good to know it's not just me!

rbraddy commented Dec 18, 2013

Yes, we are seeing the same issue when creating a ZFS storage pool atop a Ceph RBD block device: zpool create fails and the kernel panics.

Creating an ext4 filesystem on RBD works perfectly. RBD is used extensively today by various cloud stacks (e.g., OpenStack, CloudStack and others), so there seems to be no issue with how it presents itself as a block device for those file systems.

Having RBD work well with ZFS is very important, as it addresses one of the major drawbacks of ZFS, a single point of failure on direct-attached storage, and adds the ability to scale out. RADOS is very impressive technology, and combined with ZFS promises to be the most powerful filesystem around. Ceph's filesystem is not ready for prime time, so it just makes sense for these two technologies to work well together and be supported (the way ZFS is supported underneath Ceph OSDs today).

I agree that it's odd that: a) ZFS is the only major file system that is not working atop RBD today, and b) ZFS panics instead of failing gracefully in the face of whatever incompatibility exists.

Having said that, from what I have seen, ZFS does work a bit differently than many other file systems. In our testing, we also encountered strange behavior from parted when trying to delete an existing ext4 partition that we had initially configured atop RBD, in an attempt to create an empty GPT partition in preparation for use with ZFS. ZFS creates its own partitioning scheme from what I have seen, so this may be a clue. We are still investigating, but at this point lack the deep kernel expertise required to reconcile the issue between these two complex systems.

Reading through this thread, I see that about six months ago Brian proposed a next step to gather more information, which does not appear to have happened yet. I'm wondering if it makes sense to pursue that line of analysis next:

From @behlendorf : It seems likely that we're somehow reading bogus data from the ceph rbd. It would be useful to see what those values are. If you're still interested in chasing this can you try the following patch. It will log the offending value to the console before the crash. It would be useful to run it several times to see if the values remain constant or change.

diff --git a/module/zfs/spa_history.c b/module/zfs/spa_history.c
index 9fb75f3..700f364 100644
--- a/module/zfs/spa_history.c
+++ b/module/zfs/spa_history.c
@@ -223,6 +223,13 @@ spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
         */
        VERIFY(0 == dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp));
        shpp = dbp->db_data;
+#ifdef _KERNEL
+       printk("sh_pool_create_len = %llu\n", shpp->sh_pool_create_len);
+       printk("sh_phys_max_off = %llu\n", shpp->sh_phys_max_off);
+       printk("sh_bof = %llu\n", shpp->sh_bof);
+       printk("sh_eof = %llu\n", shpp->sh_eof);
+       printk("sh_records_lost = %llu\n", shpp->sh_records_lost);
+#endif

        dmu_buf_will_dirty(dbp, tx);
Member

dweeezil commented Dec 30, 2013

I just wanted to post a note here to say that I've started actively looking into this problem. I'm occasionally able to reproduce problems similar to the original report, but my general observation is that all sorts of other chaos can result from running ZFS atop RBD. Unfortunately, I got sidetracked while looking into this and burned a ton of time tracking down the problem described in zfsonlinux/zfs#2010. With that out of the way, hopefully I'm back on track now.

Also, I should mention that this should likely be a ZFS issue rather than an SPL issue.

hvenzke commented Dec 30, 2013

@dweeezil Tim, some of the logs I have read about ZFS + Ceph RBD said that the partition table ZFS uses is not supported by parted?!

  1. Exactly what partition type was set BEFORE you tried to make a zpool on the Ceph RBD?
  2. Did you try a sliced (diskP2) setup instead of whole-disk (disk)?
  3. Did you try fdisk on the Ceph RBD disk, using type "bf"?

As far as my ZFS knowledge goes, "bf" is the default; someone is welcome to correct me. :-)

Member

dweeezil commented Dec 30, 2013

I'm still trying to get a grip on the actual problem. So far, I'm fairly certain the problem is not simply that the rbd block device behaves differently than other block devices do.

@remsnet For my current testbed, I'm generally creating my ZFS pool on a single pre-created partition on the rbd device (actually, my preferred testbed is to dd a known good pool on to my rbd and test from there). I'm hoping to narrow down the problem a bit more within the next day or so once I get more time to look at it.

The failures I'm seeing when performing normal filesystem operations are many and varied. I'm concerned that zfs+rbd is exceeding Linux's kernel stack limit but I've not been able to prove it. I do plan on building a 16K stack kernel as part of my further testing to try to rule it out. Using debugfs' stack_trace feature has been very iffy with wild pointer (NULL or close-to-null) dereferences typically occurring in the ftrace_call() function, itself. I also plan on doing some instrumenting of rbd by itself to get a handle on its "base" stack utilization. The failures I'm seeing are typical of those you'd see when memory (the stack in particular) is overwritten.

I'll post more information as I get it.

Owner

behlendorf commented Jan 7, 2014

@dweeezil It's great to see you looking into this. Stack overrun is certainly one possible explanation for this; I could easily believe that the ceph rbd is more stack heavy than other block devices in the kernel. As you said, rebuilding your kernel with 16k stacks would be the easiest way to check for this.

Member

dweeezil commented Jan 7, 2014

I realize a quick update on this may be in order since my last communication has been directly to @rbraddy off-list.

I've pretty much ruled out stack overflow and, in fact, the stack usage seems to be very well controlled with my zpool commands being the greatest consumer but still having over 11K free when using a kernel with 16K stacks.

What I'm seeing when trying simple ZPL operations on a known-good pool (dd'ed to a Ceph RBD and md5-checksummed for verification) is somewhat random memory corruption. I briefly experimented with kmemcheck but I don't think it's going to be too helpful; the kernel runs too slowly and it has a bad habit of finding false positives. Instead, I'm trying to find a way to reliably trip specific types of assertions. My best success has been that it's very easy to corrupt the microZAPs when creating small files. In my last bit of detective work, it looks like they're being clobbered by, of all things, the in-memory debugging output from the various dprintf...() debugging functions.

I've taken a few days off of looking at this mainly due to my discovery of zfsonlinux/zfs#2010. I'll post more information on this issue as it becomes available.

hvenzke commented Jan 7, 2014

The last real kernel stack overrun I heard of was when Linus left the 1.x kernel series behind, going from < 3k to 8k some years ago.
Good to hear that a 16K stack is not required as the default, as it would cause issues on several systems with small RAM.

Contributor

chrisrd commented Jan 7, 2014

@dweeezil What version of kernel rbd are you using? A whole bunch of rbd changes, some of which could conceivably cause corruptions, hit master in v3.12-rc2 and are slated for the next round of stable updates (v3.10.26 etc).

Member

dweeezil commented Jan 8, 2014

@chrisrd Since I'm doing my testing under Ubuntu 13.10 I'm running their stock 3.11.0-14-generic kernel compiled by myself and fetched from their git repo. The last commit in the fs/ceph directory is 494ddd1 from the upstream. I did think about running a new kernel (even a stock upstream) but decided to try to stay as standard as possible for the time being. I did also check out the commit logs going forward until the latest master code in Linus' tree and none of them looked like total show-stoppers to me, but that was just after a cursory inspection. Should I be running something newer?

Contributor

chrisrd commented Jan 9, 2014

@dweeezil Given you're using rbd, the relevant kernel parts would be drivers/block/rbd* net/ceph include/linux/ceph rather than fs/ceph (that's for the ceph file system, not directly related to rbd, except they both use the ceph underpinnings).

But I agree, looking at the logs from torvalds/linux@494ddd1 onwards doesn't show anything that looks particularly suspicious, except perhaps if you're using rbd snapshots:

cd src/linux
git checkout master
git pull
git log 494ddd1..HEAD drivers/block/rbd* net/ceph include/linux/ceph
Member

dweeezil commented Jan 9, 2014

@chrisrd Thanks for the confirmation. Yes, that was a booboo for me to refer to fs/ceph rather than drivers/block/rbd* (which is actually what I was looking at the first time I scanned the commits a week ago). In any case, it seems that I should be safe staying with the kernel I'm working with.

Member

dweeezil commented Jan 13, 2014

It seems the rbd driver will overwrite memory in some cases when passed odd-sized bio requests from ZFS. I gave up trying to track down the memory corruption because I couldn't get anything reproducible so I compared ZFS' block I/O to that of other file systems (ext4 and xfs) and discovered that the others only presented nice evenly-sized requests (usually 8 sectors or multiples of 8) but that ZFS presents "unusual" sizes such as 5, 13 and 15 sectors. Apparently the rbd driver doesn't like some aspect of these requests and the result is memory corruption. I've not yet determined whether the transfer size or the buffer alignment is the problem.

In any case, a trivial workaround is to use an ashift 12 pool. They appear to work just fine on rbd and present it with nice evenly-sized bios. I've asked @rbraddy to do some testing on his Ceph rigs.
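
To make the effect of ashift on request sizes concrete, here is a minimal user-space sketch. It is not ZFS code: the helper names are made up for illustration, and it assumes that device I/O is sized in multiples of 1 << ashift, which is consistent with the observation above but is not taken from the ZFS source.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical helper: round size up to a multiple of (1 << ashift). */
static uint64_t round_up_to_ashift(uint64_t size, unsigned ashift)
{
    uint64_t align;

    assert(ashift < 64);
    align = (uint64_t)1 << ashift;
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: 512-byte sectors in a request after rounding. */
static uint64_t io_sectors(uint64_t size, unsigned ashift)
{
    return round_up_to_ashift(size, ashift) / SECTOR_SIZE;
}

/* True when the request is a whole number of 4 KiB pages (8 sectors). */
static int is_page_multiple(uint64_t sectors)
{
    return sectors % 8 == 0;
}
```

Under these assumptions a 2560-byte write is 5 sectors with ashift=9 but 8 sectors with ashift=12, so an ashift=12 pool only ever hands the driver page-multiple requests.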

Contributor

chrisrd commented Jan 13, 2014

I'm sure the Ceph people would like to know about this: would you like to enter whatever details you have into the ceph bug tracker?

Owner

behlendorf commented Jan 14, 2014

@dweeezil That makes good sense since most, maybe all, Linux filesystems will always submit IO in 4k page sized chunks from the page cache. I could easily see an alignment issue like this in the ceph rbd not being caught.

aieri commented Mar 19, 2014

@dweeezil if anybody who has a fair understanding about what's going on could submit a bug on the CEPH tracker I would be happy to help push things along. We have a support contract with Inktank and could bring this issue to their attention.

Contributor

chrisrd commented Mar 19, 2014

Issue opened on ceph bug tracker: http://tracker.ceph.com/issues/7790

aieri commented Mar 28, 2014

Unfortunately the word from Inktank is that since ZFS on RBD is not supported, they will not handle the bug as part of our support contract, but it will simply be a community effort.
I have tried to reproduce the bug with misaligned partitions and "standard" filesystems but without success. If this bug resurfaced with, say, ext or xfs, then they might reconsider.

Member

dweeezil commented Mar 28, 2014

@aieri I don't think that simply misaligning standard filesystems will make a difference. With ext4, maybe try mkfs.ext4 -b 1024 or with xfs try mkfs.xfs -b 512 (supposedly it supports 512 byte block size). You can then use blktrace to watch the IO requests. You'll find that with ashift=9, ZFS will frequently make small odd-sized (5KiB for example) transfers. When I was doing my testing, I was not able to get either xfs or ext4 to do the same but I never tried creating those filesystems with "-b".

Owner

behlendorf commented Mar 28, 2014

You shouldn't be able to cause this with a filesystem like ext or xfs since they will be strictly page aligned (4k).

We could probably work around it fairly easily on the ZFS side by detecting that it's a ceph rbd and automatically setting the ashift to 12. That said, it would be better to fix this on the ceph side of course.

Member

dweeezil commented Mar 30, 2014

I never did determine whether it was the alignment or the transfer size that caused the problem.

It would be nice to detect Ceph RBD under ZFS and set ashift=12 as a workaround.

Member

tuxoko commented Apr 16, 2014

Hi all:
I tested this issue today and got an interesting result.

At first with spl-master, zfs-master and linux-3.14.1
# zpool create -o ashift=12 p1 /dev/rbd0 -R /mnt
Success
# zpool create -o ashift=9 p1 /dev/rbd0 -R /mnt
Kernel oops

Then I tried again with the following patch:
https://gist.github.com/tuxoko/10832862
# zpool create -o ashift=12 p1 /dev/rbd0 -R /mnt
Success
# zpool create -o ashift=9 p1 /dev/rbd0 -R /mnt
Success!! And I can run bonnie++ without crashing it.

I still haven't figured out why, but it seems that rbd cannot handle a vmalloc'd buffer properly.

Member

dweeezil commented Apr 16, 2014

@tuxoko Very interesting! Have you tried setting "c" to 16 if it were less and continue to let it use kmem_cache_alloc()? How about rounding it up to the next even value? Rounding up to a multiple of 8?

Member

tuxoko commented Apr 17, 2014

Hi @dweeezil
I continued using the above patch with the kmalloc size going from 8192 down to 512, at every power of two.
The results are all successful. So it seems the problem only shows up with 512.

Then, I tried with the below one:
https://gist.github.com/tuxoko/10954458
The idea is that I vmalloc one page and put the buffer at a random 512-aligned position in the page.
The result is also successful.
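
The placement idea can be sketched in user space as follows. This is not the code from the gist (which is not reproduced here); the helper names and the 4 KiB page size are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u  /* assumption: 4 KiB pages */
#define BUF_BYTES   512u  /* the 512-byte buffer size under test */

/* Hypothetical helper: pick a random 512-byte-aligned offset at which a
 * 512-byte buffer still fits entirely inside one page. */
static size_t random_aligned_offset(void)
{
    size_t slots = PAGE_BYTES / BUF_BYTES;   /* 8 possible positions */

    return ((size_t)rand() % slots) * BUF_BYTES;
}

/* Place the buffer somewhere inside a page-sized allocation. */
static unsigned char *place_buffer(unsigned char *page)
{
    return page + random_aligned_offset();
}
```

Every offset this produces is a multiple of 512 and leaves room for the whole buffer, so the test exercises all eight in-page positions without ever crossing a page boundary.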

So I'm now more inclined to believe that there's a subtle bug in the spl slab that happens to show up 100% of the time when using a 512 byte slab with ashift=9 on rbd.

By the way, I get SPL: kmem leaked 39/75659481 bytes (number varies from 1x to 3x) every time when I rmmod.

behlendorf referenced this issue in zfsonlinux/zfs Apr 17, 2014: Fix zfsdev_ioctl() kmem leak warning #2262 (Closed)

Member

tuxoko commented Apr 18, 2014

So I noticed this in zio.c:

                        /*
                         * The smallest buffers (512b) are heavily used and
                         * experience a lot of churn.  The slabs allocated
                         * for them are also relatively small (32K).  Thus
                         * in over to avoid expensive calls to vmalloc() we
                         * make an exception to the usual slab allocation
                         * policy and force these buffers to be kmem backed.
                         */
                        if (size == (1 << SPA_MINBLOCKSHIFT))
                                flags |= KMC_KMEM;

I commented out this part, and the result is also successful.
This explains why only the 512 byte slab is causing trouble.

There's definitely a bug in the spl slab, especially with KMC_KMEM.
@behlendorf Would you happen to know what might have caused this?

Update:
I forced the smaller buffers to use an 8K slab with KMC_KMEM, and the result is a failure.
More update:
I then used ashift=12 with the above setting, and the result is also a failure.
It seems the problem shows up when the KMC_KMEM buffer size >= ashift size.

Member

dweeezil commented Apr 18, 2014

@tuxoko Wondering if there could be a tie-in to this code in bio_map():

                if (kmem_virt(bio_ptr))
                        page = vmalloc_to_page(bio_ptr);
                else
                        page = virt_to_page(bio_ptr);

When I was trying to track this down, I did all sorts of hackery to __vdev_disk_physio() and friends to try to force them to do "nice" alignments, sizes, etc. but it was to no avail. I wasn't aware of the KMC_KMEM override you pointed out above.

Member

tuxoko commented Apr 18, 2014

I'm a bit confused.
I've looked into spl-kmem.c but I didn't see anything suspicious in either the KMC_KMEM or the KMC_VMEM path.
The bio_map() doesn't seem wrong either, but I could be wrong.

The real mystery is that at first I thought there was a problem with vmalloc, so I used kmalloc and it worked. But it turned out that it wasn't using vmalloc in the first place, and changing it to vmalloc also "fixed" it.

Member

tuxoko commented Apr 18, 2014

@dweeezil
I've come up with another theory.
I think there could be some possibility that this problem is caused by other KMC_KMEM slabs.
If you could, could you try to change all other slabs to KMC_VMEM?
My week is over so I have to wait til next Monday to test this idea.
It would be great if you could test it now.

Member

dweeezil commented Apr 18, 2014

@tuxoko I should be able to get to this later this evening. The most interesting clue I can remember from my previous research into the problem is that the data being overwritten is frequently the in-memory SPL log (the one you can view after triggering the /proc/sys/kernel/spl/debug flag) which, I'll note, is allocated as a bunch of 1024 byte kmalloc() allocations.

Owner

behlendorf commented Apr 18, 2014

@tuxoko @dweeezil Sorry I've been slow to comment on this but I think I can shed some light on the code referenced above.

The zio buffers will be a mix of kmem and vmem backed memory depending on the size of the object. We want to minimize the overhead of an allocation so we avoid vmem backed buffers if possible. Ideally they should all be backed by individual pages which would make them simpler to hand off to the block layer which expects individual pages. That's why we need the bio_map() function in the vdev_disk code to determine what type of memory backs this block so we can get the right page. But as you know using pages at the higher ZFS layers is problematic. But if we could use pages all the way through (like other Linux FSs) this code can all be removed.

Now exactly why small slab objects would cause a problem isn't clear to me. The only thing which immediately occurs to me is that they are likely to share pages in the slab. These pages will be mapped in to the bio and perhaps that is leading to an issue.
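
As a rough illustration of the page-sharing point, here is a user-space sketch that maps a slab object index to the page it lands in. It is not SPL code: the names and the 4 KiB page size are assumptions, and slab header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

struct obj_location {
    size_t page_index;   /* which page of the slab holds the object */
    size_t page_offset;  /* byte offset of the object inside that page */
};

/* Hypothetical helper: how many objects of obj_size fit in one page.
 * obj_size must divide the page size evenly. */
static size_t objects_per_page(size_t obj_size)
{
    assert(obj_size > 0 && PAGE_SIZE_BYTES % obj_size == 0);
    return PAGE_SIZE_BYTES / obj_size;
}

/* Hypothetical helper: locate the idx-th object in a physically
 * contiguous slab made of consecutive pages. */
static struct obj_location locate_object(size_t idx, size_t obj_size)
{
    struct obj_location loc;
    size_t byte_off;

    assert(obj_size > 0 && PAGE_SIZE_BYTES % obj_size == 0);
    byte_off = idx * obj_size;
    loc.page_index = byte_off / PAGE_SIZE_BYTES;
    loc.page_offset = byte_off % PAGE_SIZE_BYTES;
    return loc;
}
```

With 512-byte objects, eight neighbours share each page, so a bio that maps one buffer's page covers the other seven as well. In a 32K slab most objects also live in a page other than the first one.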

Member

dweeezil commented Apr 21, 2014

Unfortunately, I've let my ceph testing rig go stale and it's not working right now so I can't easily do any more testing on this issue for the moment.

Member

tuxoko commented Apr 21, 2014

Hi @dweeezil @behlendorf

I built a DEBUG_PAGEALLOC kernel and got the following result with ashift=9 on unpatched master

[ 1256.830668] BUG: unable to handle kernel paging request at ffff88003bdaf600
[ 1256.830971] IP: [<ffffffff8137d827>] memmove+0x37/0x1a0
[ 1256.831210] PGD 1fd0067 PUD 1fd1067 PMD 3fc06067 PTE 800000003bdaf060
[ 1256.831492] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1256.831691] Modules linked in: zfs(POF) zcommon(POF) zunicode(POF) znvpair(POF) zavl(POF) spl(OF) rbd(F) libceph(F) libcrc32c(F) vboxsf(OF) rfcomm(F) bnep(F) bluetooth(F) 6lowpan_iphc(F) snd_intel8x0(F) snd_ac97_codec(F) ac97_bus(F) snd_pcm(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) snd_seq(F) snd_timer(F) joydev(F) ppdev(F) snd_seq_device(F) snd(F) microcode(F) vboxvideo(OF) soundcore(F) drm(F) psmouse(F) lp(F) parport_pc(F) i2c_piix4(F) serio_raw(F) vboxguest(OF) parport(F) mac_hid(F) hid_generic(F) usbhid(F) hid(F) ahci(F) libahci(F) e1000(F) [last unloaded: spl]
[ 1256.833755] CPU: 1 PID: 5282 Comm: z_wr_int/3 Tainted: PF          O 3.14.1-debug-pagealloc #1
[ 1256.834086] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 1256.834400] task: ffff880016064aa0 ti: ffff88003bb20000 task.ti: ffff88003bb20000
[ 1256.834622] RIP: 0010:[<ffffffff8137d827>]  [<ffffffff8137d827>] memmove+0x37/0x1a0
[ 1256.834622] RSP: 0018:ffff88003bb21c80  EFLAGS: 00010206
[ 1256.834622] RAX: ffffc90002091000 RBX: ffffc90000258ac0 RCX: 0000000000000000
[ 1256.834622] RDX: 00000000000001c0 RSI: ffff88003bdaf600 RDI: ffffc90002091000
[ 1256.834622] RBP: ffff88003bb21d30 R08: ffff88003bdaf800 R09: 0000000000000400
[ 1256.834622] R10: 0000000000000000 R11: ffffc90002091000 R12: ffffc900002314a0
[ 1256.834622] R13: ffffc900002314a0 R14: ffffc90000235aa0 R15: ffff880033399708
[ 1256.834622] FS:  0000000000000000(0000) GS:ffff88003fb00000(0000) knlGS:0000000000000000
[ 1256.834622] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1256.834622] CR2: ffff88003bdaf600 CR3: 0000000021993000 CR4: 00000000000006e0
[ 1256.834622] Stack:
[ 1256.834622]  ffffffffa0404b9a ffffffff00020400 ffffffffa04040d0 0000000000000000
[ 1256.834622]  ffff880033399678 ffff880016064aa0 ffff880016064aa0 ffffc90000258ac0
[ 1256.834622]  ffffc90000258ac0 ffff880033399760 ffff8800333997b0 ffffc90002091000
[ 1256.834622] Call Trace:
[ 1256.834622]  [<ffffffffa0404b9a>] ? vdev_queue_io_to_issue+0x84a/0xeb0 [zfs]
[ 1256.834622]  [<ffffffffa04040d0>] ? vdev_queue_timestamp_compare+0x40/0x40 [zfs]
[ 1256.834622]  [<ffffffffa04059f6>] vdev_queue_io_done+0x176/0x3d0 [zfs]
[ 1256.834622]  [<ffffffffa045119a>] ? zio_wait_for_children+0x7a/0x130 [zfs]
[ 1256.834622]  [<ffffffffa0452318>] zio_vdev_io_done+0x88/0x1e0 [zfs]
[ 1256.834622]  [<ffffffffa04530f0>] zio_execute+0xf0/0x310 [zfs]
[ 1256.834622]  [<ffffffffa0297ae7>] taskq_thread+0x267/0x660 [spl]
[ 1256.834622]  [<ffffffff81099498>] ? finish_task_switch+0xf8/0x150
[ 1256.834622]  [<ffffffff8109ee50>] ? try_to_wake_up+0x2c0/0x2c0
[ 1256.834622]  [<ffffffffa0297880>] ? taskq_dispatch_delay+0x390/0x390 [spl]
[ 1256.834622]  [<ffffffff8108dc29>] kthread+0xc9/0xe0
[ 1256.834622]  [<ffffffff8108db60>] ? flush_kthread_worker+0x80/0x80
[ 1256.834622]  [<ffffffff817305bc>] ret_from_fork+0x7c/0xb0
[ 1256.834622]  [<ffffffff8108db60>] ? flush_kthread_worker+0x80/0x80
[ 1256.834622] Code: 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 41 48 83 ea 20 48 83 ea 20 <4c> 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 4c 89 
[ 1256.834622] RIP  [<ffffffff8137d827>] memmove+0x37/0x1a0
[ 1256.834622]  RSP <ffff88003bb21c80>
[ 1256.834622] CR2: ffff88003bdaf600
[ 1256.834622] ---[ end trace 9235dc3b3757c5f7 ]---

This oops can be reproduced very reliably, which is a bit of a surprise since the buffer is slab backed.
And if I change the 512 byte buffers to use __get_free_page, the oops disappears.

Member

tuxoko commented Apr 21, 2014

This is the dump_stack where the page is freed.

[  667.875292] CPU: 0 PID: 0 Comm: swapper/0 Tainted: PF          O 3.14.1-debug-pagealloc+ #4
[  667.875614] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  667.875936]  0000000000000000 ffff88003fa03778 ffffffff8171f389 01ffff0000000000
[  667.876251]  ffffea0000f6e180 ffff88003fa037e8 ffffffff81160fb0 ffff88003fa1c200
[  667.876564]  ffff88003fa13d00 ff0000fffe000000 ff0000ffffffffff ffffea0000f6e180
[  667.876877] Call Trace:
[  667.876978]  <IRQ>  [<ffffffff8171f389>] dump_stack+0x46/0x58
[  667.877223]  [<ffffffff81160fb0>] free_pages_prepare+0x220/0x230
[  667.877461]  [<ffffffff81161000>] free_hot_cold_page+0x40/0x170
[  667.877695]  [<ffffffff81166647>] __put_single_page+0x27/0x30
[  667.877922]  [<ffffffff811671b5>] put_page+0x25/0x40
[  667.878121]  [<ffffffff81627c78>] skb_release_data+0x98/0x130
[  667.878349]  [<ffffffff81627d38>] skb_release_all+0x28/0x30
[  667.878569]  [<ffffffff81627d96>] __kfree_skb+0x16/0xc0
[  667.878778]  [<ffffffff816836db>] tcp_ack+0x60b/0x1050
[  667.878982]  [<ffffffff81684655>] tcp_rcv_established+0x195/0x6c0
[  667.879025]  [<ffffffff8168e655>] tcp_v4_do_rcv+0x1d5/0x4e0
[  667.879025]  [<ffffffff81690853>] tcp_v4_rcv+0x553/0x670
[  667.879025]  [<ffffffff8166b536>] ip_local_deliver_finish+0x66/0x100
[  667.879025]  [<ffffffff8166b768>] ip_local_deliver+0x48/0x80
[  667.879025]  [<ffffffff8166b231>] ip_rcv_finish+0x81/0x320
[  667.879025]  [<ffffffff8166ba42>] ip_rcv+0x2a2/0x3e0
[  667.879025]  [<ffffffff81694721>] ? tcp4_gro_receive+0xe1/0x130
[  667.879025]  [<ffffffff81637002>] __netif_receive_skb_core+0x542/0x730
[  667.879025]  [<ffffffff81637211>] __netif_receive_skb+0x21/0x70
[  667.879025]  [<ffffffff81637413>] netif_receive_skb_internal+0x23/0x90
[  667.879025]  [<ffffffff81637e8d>] napi_gro_receive+0x8d/0x100
[  667.879025]  [<ffffffffa0002496>] e1000_clean_rx_irq+0x2b6/0x570 [e1000]
[  667.879025]  [<ffffffffa0003b82>] e1000_clean+0x282/0x8e0 [e1000]
[  667.879025]  [<ffffffff8109132e>] ? hrtimer_get_next_event+0xce/0xd0
[  667.879025]  [<ffffffff8163777f>] net_rx_action+0xbf/0x1d0
[  667.879025]  [<ffffffff8106e07f>] __do_softirq+0xef/0x2d0
[  667.879025]  [<ffffffff8106e57d>] irq_exit+0x12d/0x140
[  667.879025]  [<ffffffff817329a7>] do_IRQ+0x67/0x110
[  667.879025]  [<ffffffff817277ed>] common_interrupt+0x6d/0x6d
[  667.879025]  <EOI>  [<ffffffff81091238>] ? hrtimer_start+0x18/0x20
[  667.879025]  [<ffffffff81052696>] ? native_safe_halt+0x6/0x10
[  667.879025]  [<ffffffff8101e1df>] default_idle+0x1f/0xc0
[  667.879025]  [<ffffffff8101ea56>] arch_cpu_idle+0x26/0x30
[  667.879025]  [<ffffffff810c6991>] cpu_startup_entry+0xe1/0x280
[  667.879025]  [<ffffffff8170f2f7>] rest_init+0x77/0x80
[  667.879025]  [<ffffffff81d2af96>] start_kernel+0x44e/0x45b
[  667.879025]  [<ffffffff81d2a947>] ? repair_env_string+0x5e/0x5e
[  667.879025]  [<ffffffff81d2a117>] ? early_idt_handlers+0x117/0x120
[  667.879025]  [<ffffffff81d2a5f0>] x86_64_start_reservations+0x2a/0x2c
[  667.879025]  [<ffffffff81d2a733>] x86_64_start_kernel+0x141/0x150

So the rbd/network code calls put_page on a ZFS buffer, which results in this bug.
I'm not familiar with the rbd and net code, so I'm not sure how to fix it.

Member

tuxoko commented Apr 22, 2014

@aieri @tdb
Could you try the following patch?
I haven't figured out exactly why, but this seems to work for me.

diff --git a/module/spl/spl-kmem.c b/module/spl/spl-kmem.c
index 55c467b..b673c29 100644
--- a/module/spl/spl-kmem.c
+++ b/module/spl/spl-kmem.c
@@ -864,7 +864,8 @@ kv_alloc(spl_kmem_cache_t *skc, int size, int flags)
    ASSERT(ISP2(size));

    if (skc->skc_flags & KMC_KMEM)
-       ptr = (void *)__get_free_pages(flags, get_order(size));
+       ptr = (void *)__get_free_pages(flags | __GFP_COMP,
+           get_order(size));
    else
        ptr = __vmalloc(size, flags | __GFP_HIGHMEM, PAGE_KERNEL);
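
The thread does not explain at this point why __GFP_COMP helps. One plausible reading, stated here as an assumption rather than something confirmed in the thread, is that in a plain multi-page allocation only the first page carries a reference, while in a compound allocation reference operations on any page are redirected to the head page. The toy user-space model below (made-up struct and function names, not kernel code) shows the difference:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8  /* e.g. a 32K slab made of eight 4K pages */

struct toy_page {
    int refcount;
    bool freed;
    struct toy_page *head;  /* non-NULL only for compound tail pages */
};

/* Model a multi-page allocation: only the first page gets a reference. */
static void toy_alloc(struct toy_page pages[NPAGES], bool compound)
{
    for (size_t i = 0; i < NPAGES; i++) {
        pages[i].refcount = (i == 0) ? 1 : 0;
        pages[i].freed = false;
        pages[i].head = (compound && i > 0) ? &pages[0] : NULL;
    }
}

/* Reference operations act on the head page when there is one. */
static struct toy_page *toy_target(struct toy_page *p)
{
    return p->head ? p->head : p;
}

static void toy_get_page(struct toy_page *p)
{
    toy_target(p)->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
    struct toy_page *t = toy_target(p);

    if (--t->refcount == 0)
        t->freed = true;  /* the page would go back to the allocator */
}
```

In this model a get/put pair on page 3 of a plain allocation takes its count from 0 to 1 and back to 0, which frees it while the slab still owns it. The same pair on a compound allocation moves the head's count from 1 to 2 and back to 1, and nothing is freed.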

jakew02 added a commit to jakew02/android_kernel_moto_shamu that referenced this issue Feb 2, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:

zfsonlinux/spl#241
http://tracker.ceph.com/issues/7790

The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.

This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.

Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
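
The failure mode and the guard described in this commit message can be modelled in user space. The sketch below uses made-up names and a toy page struct, not the libceph code: a zero-copy path takes and then drops a page reference, which frees a page whose count started at 0, so the sender checks the count first and copies the data instead when no reference is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_page {
    int refcount;
    bool freed;
    char data[16];
};

enum send_path { PATH_ZERO_COPY, PATH_COPY };

/* Zero-copy style: borrow the page by reference, then release it. */
static void toy_send_zero_copy(struct toy_page *p)
{
    p->refcount++;               /* stands in for get_page() */
    /* ... the network stack would transmit from the page here ... */
    if (--p->refcount == 0)      /* stands in for put_page() */
        p->freed = true;         /* the owner's buffer is freed under it */
}

/* Copy style: never touches the page's reference count. */
static void toy_send_copy(const struct toy_page *p, char *out, size_t len)
{
    memcpy(out, p->data, len);
}

/* Guard: only use the zero-copy path when the page holds a reference. */
static enum send_path toy_send(struct toy_page *p, char *out, size_t len)
{
    if (p->refcount >= 1) {
        toy_send_zero_copy(p);
        return PATH_ZERO_COPY;
    }
    toy_send_copy(p, out, len);
    return PATH_COPY;
}
```

The copy path is slower, as the commit message notes, but it leaves the page's count alone, so a buffer handed in with a count of 0 survives the send.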

neomanu added a commit to neomanu/android_kernel_yu_msm8916 that referenced this issue Feb 19, 2015

libceph: fix corruption when using page_count 0 page in rbd

Signed-off-by: neomanu <neuvine@gmail.com>

sandymanu added a commit to sandymanu/manufooty_yu that referenced this issue Feb 21, 2015

libceph: fix corruption when using page_count 0 page in rbd

Signed-off-by: neomanu <neuvine@gmail.com>

surkovalex pushed a commit to surkovalex/linux-amlogic that referenced this issue Mar 5, 2015

libceph: fix corruption when using page_count 0 page in rbd

shminer referenced this issue in shminer/android_kernel_lge_f460 Mar 6, 2015

Shamu: Update to Linux 3.10.42
ftrace/module: Hardcode ftrace_module_init() call into load_module()

commit a949ae560a511fe4e3adf48fa44fefded93e5c2b upstream.

A race exists between module loading and enabling of function tracer.

	CPU 1				CPU 2
	-----				-----
  load_module()
   module->state = MODULE_STATE_COMING

				register_ftrace_function()
				 mutex_lock(&ftrace_lock);
				 ftrace_startup()
				  update_ftrace_function();
				   ftrace_arch_code_modify_prepare()
				    set_all_module_text_rw();
				   <enables-ftrace>
				    ftrace_arch_code_modify_post_process()
				     set_all_module_text_ro();

				[ here all module text is set to RO,
				  including the module that is
				  loading!! ]

   blocking_notifier_call_chain(MODULE_STATE_COMING);
    ftrace_init_module()

     [ tries to modify code, but it's RO, and fails!
       ftrace_bug() is called]

When this race happens, ftrace_bug() will produce a nasty warning and
all of the function tracing features will be disabled until reboot.

The simple solution is to treat module load the same way the core
kernel is treated at boot: hardcode the ftrace function modification
that converts calls to mcount into nops. This is done in init/main.c;
there's no reason it could not be done in load_module(). This gives
better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special; it needs to be
treated as such.

The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip: Gic: Support forced affinity setting

commit ffde1de64012c406dfdda8690918248b472f24e4 upstream.

To support the affinity setting of per cpu timers in the early startup
of a not yet online cpu, implement the force logic, which disables the
cpu online check.

Tagged for stable to allow a simple fix of the affected SoC clock
event drivers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Link: http://lkml.kernel.org/r/20140416143315.916984416@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

genirq: Allow forcing cpu affinity of interrupts

commit 01f8fa4f01d8362358eb90e412bd7ae18a3ec1ad upstream.

The current implementation of irq_set_affinity() refuses rightfully to
route an interrupt to an offline cpu.

But there is a special case, where this is actually desired. Some of
the ARM SoCs have per cpu timers which require setting the affinity
during cpu startup where the cpu is not yet in the online mask.

If we can't do that, then the local timer interrupt for the about to
become online cpu is routed to some random online cpu.

The developers of the affected machines tried to work around that
issue, but that results in a massive mess in that timer code.

We have a yet unused argument in the set_affinity callbacks of the irq
chips, which I added back then for a similar reason. It was never
required so it got not used. But I'm happy that I never removed it.

That allows us to implement a sane handling of the above scenario. So
the affected SoC drivers can add the required force handling to their
interrupt chip, switch the timer code to irq_force_affinity() and
things just work.

This does not affect any existing user of irq_set_affinity().

Tagged for stable to allow a simple fix of the affected SoC clock
event drivers.

Reported-and-tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clocksource: Exynos_mct: Register clock event after request_irq()

commit 8db6e5104b77de5d0b7002b95069da0992a34be9 upstream.

After hotplugging CPU1 the first call of interrupt handler for CPU1
oneshot timer was called on CPU0 because it fired before setting IRQ
affinity. Affected are SoCs where Multi Core Timer interrupts are
shared (SPI), e.g. Exynos 4210.

During setup of the MCT timers the clock event device should be
registered after setting the affinity for interrupt. This will prevent
starting the timer too early.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Link: http://lkml.kernel.org/r/20140416143316.299247848@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pata_at91: fix ata_host_activate() failure handling

commit 27aa64b9d1bd0d23fd692c91763a48309b694311 upstream.

Add missing clk_put() call to ata_host_activate() failure path.

Sergei says,

  "Hm, I have once fixed that (see that *if* (!ret)) but looks like a
   later commit 477c87e90853d136b188c50c0e4a93d01cad872e (ARM:
   at91/pata: use gpio_is_valid to check the gpio) broke it again. :-(
   Would be good if the changelog did mention that..."

Cc: Andrew Victor <linux@maxim.org.za>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

topology: Fix compilation warning when not in SMP

commit 53974e06603977f348ed978d75c426b0532daa67 upstream.

The topology_##name() macro does not use its argument when CONFIG_SMP is not
set, as it ultimately calls the cpu_data() macro.

So we avoid maintaining a possibly unused `cpu' variable, to avoid the
following compilation warning:

  drivers/base/topology.c: In function ‘show_physical_package_id’:
  drivers/base/topology.c:103:118: warning: unused variable ‘cpu’ [-Wunused-variable]
   define_id_show_func(physical_package_id);

  drivers/base/topology.c: In function ‘show_core_id’:
  drivers/base/topology.c:106:106: warning: unused variable ‘cpu’ [-Wunused-variable]
   define_id_show_func(core_id);

This can be seen with e.g. x86 defconfig and CONFIG_SMP not set.

Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: make fixup_user_fault() check the vma access rights too

commit 1b17844b29ae042576bea588164f2f1e9590a8bc upstream.

fixup_user_fault() is used by the futex code when the direct user access
fails, and the futex code wants it to either map in the page in a usable
form or return an error.  It relied on handle_mm_fault() to map the
page, and correctly checked the error return from that, but while that
does map the page, it doesn't actually guarantee that the page will be
mapped with sufficient permissions to be then accessed.

So do the appropriate tests of the vma access rights by hand.

[ Side note: arguably handle_mm_fault() could just do that itself, but
  we have traditionally done it in the caller, because some callers -
  notably get_user_pages() - have been able to access pages even when
  they are mapped with PROT_NONE.  Maybe we should re-visit that design
  decision, but in the meantime this is the minimal patch. ]

Found by Dave Jones running his trinity tool.

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: 8250: Fix thread unsafe __dma_tx_complete function

commit f8fd1b0350d3a4581125f5eda6528f5a2c5f9183 upstream.

__dma_tx_complete is not protected against concurrent
calls of serial8250_tx_dma. This can lead to circular tail
index corruption or a parallel call of serial_tx_dma on the
same data portion.

This patch fixes this issue by holding the port lock.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8250_core: Fix unwanted TX chars write

commit b08c9c317e3f7764a91d522cd031639ba42b98cc upstream.

On transmit-hold-register empty, serial8250_tx_chars
should be called only if we don't use DMA.
DMA has its own tx cycle.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpu: host1x: handle the correct # of syncpt regs

commit 22bbd5d949dc7fdd72a4e78e767fa09d8e54b446 upstream.

BIT_WORD() truncates rather than rounds, so the loops in
syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
rather than < in an attempt to process the correct number of registers
when rounding of the conversion of count of bits to count of words is
necessary. However, when rounding isn't necessary because the value is
already a multiple of the divisor (as is the case for all values of
nb_pts the code actually sees), this causes one too many registers to
be processed.

Solve this by using an explicit DIV_ROUND_UP() call, rather than
BIT_WORD(), and comparing with < rather than <=.

Fixes: 7ede0b0bf3e2 ("gpu: host1x: Add syncpoint wait and interrupts")
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-By: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
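
The off-by-one described in the host1x commit above can be checked with a little arithmetic. A minimal sketch, assuming the register width is passed in as `width`. The two helper names are hypothetical, and each returns how many registers the corresponding loop would visit.

```c
#include <assert.h>

/*
 * Old loop shape: for (i = 0; i <= nb_pts / width; i++)
 * Truncating division plus an inclusive bound.
 */
unsigned int regs_truncate_inclusive(unsigned int nb_pts, unsigned int width)
{
    assert(width > 0);
    return nb_pts / width + 1;
}

/*
 * New loop shape: for (i = 0; i < DIV_ROUND_UP(nb_pts, width); i++)
 * Rounding-up division plus an exclusive bound.
 */
unsigned int regs_round_up(unsigned int nb_pts, unsigned int width)
{
    assert(width > 0);
    return (nb_pts + width - 1) / width;
}
```

With `nb_pts = 32` and `width = 32`, the old shape visits 2 registers and the new one visits 1. With `nb_pts = 33` both visit 2, so the two only disagree when the count is an exact multiple of the width.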

timer: Prevent overflow in apply_slack

commit 98a01e779f3c66b0b11cd7e64d531c0e41c95762 upstream.

On architectures with sizeof(int) < sizeof (long), the
computation of mask inside apply_slack() can be undefined if the
computed bit is > 32.

E.g. with: expires = 0xffffe6f5 and slack = 25, we get:

expires_limit = 0x20000000e
bit = 33
mask = (1 << 33) - 1  /* undefined */

On x86, mask becomes 1 and the slack is not applied properly.
On s390, mask is -1, expires is set to 0 and the timer fires immediately.

Use 1UL << bit to solve that issue.

Suggested-by: Deborah Townsend <dstownse@us.ibm.com>
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Link: http://lkml.kernel.org/r/20140418152310.GA13654@midget.suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
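
The effect of widening the shifted constant can be sketched as follows. The commit uses `1UL`, which is 64 bits wide on the affected architectures. This sketch uses `uint64_t` instead so that it behaves the same on any host, and `low_bits_mask` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a mask with the low `bit` bits set. The constant is 64 bits wide,
 * so shifting by 32 or more is well defined, unlike (1 << bit) with a
 * 32-bit int.
 */
uint64_t low_bits_mask(unsigned int bit)
{
    assert(bit < 64);
    return (UINT64_C(1) << bit) - 1;
}
```

For `bit = 33`, the value from the example in the commit message, this returns `0x1ffffffff`.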

ipmi: Fix a race restarting the timer

commit 48e8ac2979920ffa39117e2d725afa3a749bfe8d upstream.

With recent changes it is possible for the timer handler to detect an
idle interface and not start the timer, but the thread to start an
operation at the same time.  The thread will not start the timer in that
instance, resulting in the timer not running.

Instead, move all timer operations under the lock and start the timer in
the thread if it detects non-idle and the timer is not already running.
Moving under locks allows the last timeout to be set in both the thread
and the timer.  'Timer is not running' means that the timer is not
pending and smi_timeout() is not running.  So we need a flag to detect
this correctly.

Also fix a few other timeout bugs: setting the last timeout when the
interrupt has to be disabled and the timer started, and setting the last
timeout in check_start_timer_thread possibly racing with the timer.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipmi: Reset the KCS timeout when starting error recovery

commit eb6d78ec213e6938559b801421d64714dafcf4b2 upstream.

The OBF timer in KCS was not reset in one situation when error recovery
was started, resulting in an immediate timeout.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: fix suspend vs. authentication race

commit 1a1cb744de160ee70086a77afff605bbc275d291 upstream.

Since Stanislaw's patch removing the quiescing code, mac80211 had
a race regarding suspend vs. authentication: as cfg80211 doesn't
track authentication attempts, it can't abort them. Therefore the
attempts may be kept running while suspending, which can lead to
all kinds of issues, in at least some cases causing an error in
iwlmvm firmware.

Fix this by aborting the authentication attempt when suspending.

Cc: stable@vger.kernel.org
Fixes: 12e7f517029d ("mac80211: cleanup generic suspend/resume procedures")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm, thp: close race between mremap() and split_huge_page()

commit dd18dbc2d42af75fffa60c77e0f02220bc329829 upstream.

It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on rmap walk.  It gets tricky if there's concurrent fork() or
mremap() since usually we copy/move page table entries on dup_mm() or
move_page_tables() without rmap lock taken.  To get it to work we rely on
rmap walk order to not miss any entry.  We expect to see destination VMA
after source one to work correctly.

But after switching rmap implementation to interval tree it's not always
possible to preserve expected walk order.

It works fine for dup_mm() since new VMA has the same vma_start_pgoff()
/ vma_last_pgoff() and explicitly insert dst VMA after src one with
vma_interval_tree_insert_after().

But on move_vma() destination VMA can be merged into adjacent one and as
a result shifted left in interval tree.  Fortunately, we can detect the
situation and prevent race with rmap walk by moving page table entries
under rmap lock.  See commit 38a76013ad80.

Problem is that we miss the lock when we move transhuge PMD.  Most
likely this bug caused the crash[1].

[1] http://thread.gmane.org/gmane.linux.kernel.mm/96473

Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail")

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()

commit 9844f5462392b53824e8b86726e7c33b5ecbb676 upstream.

The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwpoison, hugetlb: lock_page/unlock_page does not match for handling a free hugepage

commit b985194c8c0a130ed155b71662e39f7eaea4876f upstream.

For handling a free hugepage in memory failure, the race will happen if
another thread hwpoisoned this hugepage concurrently.  So we need to
check PageHWPoison instead of !PageHWPoison.

If hwpoison_filter(p) returns true or a race happens, then we need to
unlock_page(hpage).

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: fix on-channel remain-on-channel

commit b4b177a5556a686909e643f1e9b6434c10de079f upstream.

Jouni reported that if a remain-on-channel was active on the
same channel as the current operating channel, then the ROC
would start, but any frames transmitted using mgmt-tx on the
same channel would get delayed until after the ROC.

The reason for this is that the ROC starts, but doesn't have
any handling for "remain on the same channel", so it stops
the interface queues. The later mgmt-tx then puts the frame
on the interface queues (since it's on the current operating
channel) and thus they get delayed until after the ROC.

To fix this, add some logic to handle remaining on the same
channel specially and not stop the queues etc. in this case.
This not only fixes the bug but also improves behaviour in
this case as data frames etc. can continue to flow.

Reported-by: Jouni Malinen <j@w1.fi>
Tested-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (emc1403) fix inverted store_hyst()

commit 17c048fc4bd95efea208a1920f169547d8588f1f upstream.

Attempts to set the hysteresis value to a temperature below the target
limit fails with "write error: Numerical result out of range" due to
an inverted comparison.

Signed-off-by: Josef Gajdusek <atx@atx.name>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
[Guenter Roeck: Updated headline and description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (emc1403) Support full range of known chip revision numbers

commit 3a18e1398fc2dc9c32bbdc50664da3a77959a8d1 upstream.

The datasheet for EMC1413/EMC1414, which is fully compatible to
EMC1403/1404 and uses the same chip identification, references revision
numbers 0x01, 0x03, and 0x04. Accept the full range of revision numbers
from 0x01 to 0x04 to make sure none are missed.

Signed-off-by: Josef Gajdusek <atx@atx.name>
[Guenter Roeck: Updated headline and description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Prevent all reprogramming if hang detected

commit 6c6c0d5a1c949d2e084706f9e5fb1fccc175b265 upstream.

If the last hrtimer interrupt detected a hang it sets hang_detected=1
and programs the clock event device with a delay to let the system
make progress.

If hang_detected == 1, we prevent reprogramming of the clock event
device in hrtimer_reprogram() but not in hrtimer_force_reprogram().

This can lead to the following situation:

hrtimer_interrupt()
   hang_detected = 1;
   program ce device to Xms from now (hang delay)

We have two timers pending:
   T1 expires 50ms from now
   T2 expires 5s from now

Now T1 gets canceled, which causes hrtimer_force_reprogram() to be
invoked, which in turn programs the clock event device to T2 (5
seconds from now).

Any hrtimer_start after that will not reprogram the hardware due to
hang_detected still being set. So we effectively block all timers until
the T2 event fires and cleans up the hang situation.

Add a check for hang_detected to hrtimer_force_reprogram() which
prevents the reprogramming of the hang delay in the hardware
timer. The subsequent hrtimer_interrupt will resolve all outstanding
issues.

[ tglx: Rewrote subject and changelog and fixed up the comment in
  	hrtimer_force_reprogram() ]

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Link: http://lkml.kernel.org/r/53602DC6.2060101@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Prevent remote enqueue of leftmost timers

commit 012a45e3f4af68e86d85cce060c6c2fed56498b2 upstream.

If a cpu is idle and starts an hrtimer which is not pinned on that
same cpu, the nohz code might target the timer to a different cpu.

In the case that we switch the cpu base of the timer we already have a
sanity check in place, which determines whether the timer is earlier
than the current leftmost timer on the target cpu. In that case we
enqueue the timer on the current cpu because we cannot reprogram the
clock event device on the target.

If the timers base is already the target CPU we do not have this
sanity check in place so we enqueue the timer as the leftmost timer in
the target cpus rb tree, but we cannot reprogram the clock event
device on the target cpu. So the timer expires late and subsequently
prevents the reprogramming of the target cpu clock event device until
the previously programmed event fires or a timer with an earlier
expiry time gets enqueued on the target cpu itself.

Add the same target check as we have for the switch base case and
start the timer on the current cpu if it would become the leftmost
timer on the target.

[ tglx: Rewrote subject and changelog ]

Signed-off-by: Leon Ma <xindong.ma@intel.com>
Link: http://lkml.kernel.org/r/1398847391-5994-1-git-send-email-xindong.ma@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Set expiry time before switch_hrtimer_base()

commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.

switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.

But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.

Update expiry time before calling switch_hrtimer_base().

[ tglx: Rewrote changelog once again ]

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: fweisbec@gmail.com
Cc: arvind.chauhan@arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.1399882253.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md: avoid possible spinning md thread at shutdown.

commit 0f62fb220aa4ebabe8547d3a9ce4a16d3c045f21 upstream.

If an md array with externally managed metadata (e.g. DDF or IMSM)
is in use, then we should not set safemode==2 at shutdown because:

1/ this is ineffective: user-space needs to be involved in any 'safemode' handling,
2/ The safemode management code doesn't cope with safemode==2 on external metadata
   and md_check_recover enters an infinite loop.

Even at shutdown, an infinite-looping process can be problematic, so this
could cause shutdown to hang.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Break encoder->crtc link separately in intel_sanitize_crtc()

commit 7f1950fbb989e8fc5463b307e062b4529d51c862 upstream.

Depending on the SDVO output_flags SDVO may have multiple connectors
linking to the same encoder (in intel_connector->encoder->base).
Only one of those connectors should be active (ie link to the encoder
thru drm_connector->encoder).
If intel_connector_break_all_links() is called from intel_sanitize_crtc()
we may break the crtc connection of an encoder thru an inactive connector
in which case intel_connector_break_all_links() will not be called again
for the active connector if this happens to come later in the list due to:
    if (connector->encoder->base.crtc != &crtc->base)
                                 continue;
in intel_sanitize_crtc().
This will however leave the drm_connector->encoder linkage for this
active connector in place. Subsequently this will cause multiple
warnings in intel_connector_check_state() to trigger and the driver
will eventually die in drm_encoder_crtc_ok() (because of crtc == NULL).

To avoid this remove intel_connector_break_all_links() and move its
code to its two calling functions: intel_sanitize_crtc() and
intel_sanitize_encoder().
This allows to implement the link breaking more flexibly matching
the surrounding code: ie. in intel_sanitize_crtc() we can break the
crtc link separately after the links to the encoders have been
broken which avoids above problem.

This regression has been introduced in:

commit 24929352481f085c5f85d4d4cbc919ddf106d381
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jul 2 20:28:59 2012 +0200

    drm/i915: read out the modeset hw state at load and resume time

so goes back to the very beginning of the modeset rework.

v2: This patch takes care of the concerns voiced by Chris Wilson
and Daniel Vetter that only breaking links if the drm_connector
is linked to an encoder may miss some links.
v3: move all encoder handling to encoder loop as suggested by
Daniel Vetter.

Signed-off-by: Egbert Eich <eich@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/radeon: fix ATPX detection on non-VGA GPUs

commit e9a4099a59cc598a44006059dd775c25e422b772 upstream.

Some newer PX laptops have the pci device class
set to DISPLAY_OTHER rather than DISPLAY_VGA.  This
properly detects ATPX on those laptops.

Based on a patch from: Pali Rohár <pali.rohar@gmail.com>

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/radeon: check buffer relocation offset

commit 695daf1a8e731a4b5b89de89a61f32a4d7ad7094 upstream.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/tegra: Remove gratuitous pad field

commit cbfbbabb89b37f6bad05f478d906a385149f288d upstream.

The version of the drm_tegra_submit structure that was merged all the
way back in 3.10 contains a pad field that was originally intended to
properly pad the following __u64 field. Unfortunately it seems like a
different field was dropped during review that caused this padding to
become unnecessary, but the pad field wasn't removed at that time.

One possible side-effect of this is that since the __u64 following the
pad is now no longer properly aligned, the compiler may (or may not)
introduce padding itself, which results in no predictable ABI.

Rectify this by removing the pad field so that all fields are again
naturally aligned. Technically this is breaking existing userspace ABI,
but given that there aren't any (released) userspace drivers that make
use of this yet, the fallout should be minimal.
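
The alignment hazard can be illustrated with a stand-alone sketch. The two
structs below are hypothetical and do not reproduce the real drm_tegra_submit
layout; they only assume an odd number of 32-bit fields ahead of a 64-bit one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a stale pad: 'wide' would naturally land at
 * byte 12, so whether the compiler adds 4 more bytes depends on the ABI. */
struct example_with_pad {
	uint32_t first;
	uint32_t second;
	uint32_t pad;
	uint64_t wide;
};

/* Same fields without the pad: 'wide' is naturally aligned at byte 8. */
struct example_without_pad {
	uint32_t first;
	uint32_t second;
	uint64_t wide;
};

size_t example_offset_with_pad(void)
{
	return offsetof(struct example_with_pad, wide);
}

size_t example_offset_without_pad(void)
{
	return offsetof(struct example_without_pad, wide);
}
```

On ABIs that align 64-bit integers to 8 bytes the first offset is 16, on ABIs
that align them to 4 bytes it is 12; the second offset is 8 in both cases.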

Fixes: d43f81cbaf43 ("drm/tegra: Add gr2d device")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio:imu:mpu6050: Fixed segfault in Invensens MPU driver due to null dereference

commit b9b3a41893c3f1be67b5aacfa525969914bea0e9 upstream.

The driver segfaults when the kernel boots with device tree as the
platform data is then not present and the pointer is dereferenced without
checking it is not null.  This patch introduces such a check avoiding the
crash.
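
A minimal sketch of this kind of guard, outside the kernel. The struct, the
field and the default value are hypothetical and are not taken from the
driver.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's platform data. */
struct example_platform_data {
	int orientation;
};

/* Returns the configured orientation, or a default when no platform
 * data was supplied (as on a device tree boot). */
int example_get_orientation(const struct example_platform_data *pdata)
{
	if (pdata == NULL)
		return 0;	/* assumed default, not the driver's value */
	return pdata->orientation;
}
```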

Signed-off-by: Atilla Filiz <atilla.filiz@essensium.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fsl-usb: do not test for PHY_CLK_VALID bit on controller version 1.6

commit d183c81929beeba842b74422f754446ef2b8b49c upstream.

Per reference manuals of Freescale P1020 and P2020 SoCs, USB controller
present in these SoCs has bit 17 of USBx_CONTROL register marked as
Reserved - there is no PHY_CLK_VALID bit there.

Testing for this bit in ehci_fsl_setup_phy() behaves differently on two
P1020RDB boards available here - on one board test passes and fsl-usb
init succeeds, but on other board test fails, causing fsl-usb init to
fail.

This patch changes ehci_fsl_setup_phy() not to test PHY_CLK_VALID on
controller version 1.6 that (per manual) does not have this bit.
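
The version-dependent check might look roughly like the sketch below. The
enum, the macro name and the function are hypothetical; only the bit
position (17) and the version number come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 of USBx_CONTROL; reserved on controller version 1.6. */
#define EXAMPLE_PHY_CLK_VALID (1u << 17)

enum example_ctrl_version { EXAMPLE_VER_1_6, EXAMPLE_VER_OTHER };

/* Returns true when PHY setup may proceed. On version 1.6 the bit is
 * reserved, so its value is ignored. */
bool example_phy_clock_ok(enum example_ctrl_version ver, uint32_t control)
{
	if (ver == EXAMPLE_VER_1_6)
		return true;
	return (control & EXAMPLE_PHY_CLK_VALID) != 0;
}
```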

Signed-off-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: gadget: at91-udc: fix irq and iomem resource retrieval

commit 886c7c426d465732ec9d1b2bbdda5642fc2e7e05 upstream.

When using dt resources retrieval (interrupts and reg properties) there is
no predefined order for these resources in the platform dev resource
table. Also don't expect the number of resources to always be 2.
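
The idea of looking a resource up by type instead of by position can be
sketched as follows. The types and the helper are hypothetical user-space
stand-ins; in the kernel, platform_get_resource() and platform_get_irq() offer
type-based lookups, though the message does not say which calls the patch
uses.

```c
#include <stddef.h>

enum example_res_type { EXAMPLE_RES_MEM, EXAMPLE_RES_IRQ };

struct example_resource {
	enum example_res_type type;
	unsigned long start;
};

/* Returns the first resource of the wanted type, or NULL if the table
 * holds none; the position of the entry in the table does not matter. */
const struct example_resource *
example_find_resource(const struct example_resource *table, size_t count,
		      enum example_res_type wanted)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (table[i].type == wanted)
			return &table[i];
	return NULL;
}
```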

Signed-off-by: Jean-Jacques Hiblot <jjhiblot@traphandler.com>
Acked-by: Boris BREZILLON <b.brezillon@overkiz.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: OHCI: fix problem with global suspend on ATI controllers

commit c1db30a2a79eb59997b13b8cabf2a50bea9f04e1 upstream.

Some OHCI controllers from ATI/AMD seem to have difficulty with
"global" USB suspend, that is, suspending an entire USB bus without
setting the suspend feature for each port connected to a device.  When
we try to resume the child devices, the controller gives timeout
errors on the unsuspended ports, requiring resets, and can even cause
ohci-hcd to hang; see

	http://marc.info/?l=linux-usb&m=139514332820398&w=2

and the following messages.

This patch fixes the problem by adding a new quirk flag to ohci-hcd.
The flag causes the ohci_rh_suspend() routine to suspend each
unsuspended, enabled port before suspending the root hub.  This
effectively converts the "global" suspend to an ordinary root-hub
suspend.  There is no need to unsuspend these ports when the root hub
is resumed, because the child devices will be resumed anyway in the
course of a normal system resume ("global" suspend is never used for
runtime PM).
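
As a rough illustration of the per-port step, the sketch below walks a port
table and suspends every enabled port that is not yet suspended. The struct
and function are hypothetical and model only the selection logic, not the
ohci-hcd register accesses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state. */
struct example_port {
	bool enabled;
	bool suspended;
};

/* Suspends every enabled port that is not already suspended and
 * returns how many ports were changed. */
size_t example_suspend_ports(struct example_port *ports, size_t count)
{
	size_t i, changed = 0;

	for (i = 0; i < count; i++) {
		if (ports[i].enabled && !ports[i].suspended) {
			ports[i].suspended = true;
			changed++;
		}
	}
	return changed;
}
```

Disabled ports are left untouched, and a second call changes nothing because
every enabled port is already suspended.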

This patch should be applied to all stable kernels which include
commit 0aa2832dd0d9 (USB: use "global suspend" for system sleep on
USB-2 buses) or a backported version thereof.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Peter Münster <pmlists@free.fr>
Tested-by: Peter Münster <pmlists@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: qcserial: add a number of Dell devices

commit 4d7c0136a54f62501f8a34c4d08a5e0258d3d3ca upstream.

Dan writes:

"The Dell drivers use the same configuration for PIDs:

81A2: Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card
81A3: Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card
81A4: Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card
81A8: Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card
81A9: Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card

These devices are all clearly Sierra devices, but are also definitely
Gobi-based.  The A8 might be the MC7700/7710 and A9 is likely a MC7750.

From DellGobi5kSetup.exe from the Dell drivers:

usbif0: serial/firmware loader?
usbif2: nmea
usbif3: modem/ppp
usbif8: net/QMI"

Reported-by: AceLan Kao <acelan.kao@canonical.com>
Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: storage: shuttle_usbat: fix discs being detected twice

commit df602c2d2358f02c6e49cffc5b49b9daa16db033 upstream.

Even if the USB-to-ATAPI converter supported multiple LUNs, this
driver would always detect the same physical device or media because
it doesn't use srb->device->lun in any way.
Tested with a Hewlett-Packard CD-Writer Plus 8200e.

Signed-off-by: Daniele Forsi <dforsi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: Nokia 305 should be treated as unusual dev

commit f0ef5d41792a46a1085dead9dfb0bdb2c574638e upstream.

Signed-off-by: Victor A. Santos <victoraur.santos@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: Nokia 5300 should be treated as unusual dev

commit 6ed07d45d09bc2aa60e27b845543db9972e22a38 upstream.

Signed-off-by: Daniele Forsi <dforsi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rt2x00: fix beaconing on USB

commit 8834d3608cc516f13e2e510f4057c263f3d2ce42 upstream.

When beaconing is disabled we clear the beacon register and never set it
back, which makes us stop sending beacons indefinitely.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: work around corrupted TEAC UD-H01 feedback data

commit 7040b6d1febfdbd9c1595efb751d492cd2503f96 upstream.

The TEAC UD-H01 firmware sends wrong feedback frequency values, thus
causing the PC to send the samples at a wrong rate, which results in
clicks and crackles in the output.

Add a workaround to detect and fix the corruption.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
[mick37@gmx.de: use sender->udh01_fb_quirk rather than
 ep->udh01_fb_quirk in snd_usb_handle_sync_urb()]
Reported-and-tested-by: Mick <mick37@gmx.de>
Reported-and-tested-by: Andrea Messa <andr.messa@tiscali.it>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Fix triggering BR/EDR L2CAP Connect too early

commit 9eb1fbfa0a737fd4d3a6d12d71c5ea9af622b887 upstream.

Commit 1c2e004183178 introduced an event handler for the encryption key
refresh complete event with the intent of fixing some LE/SMP cases.
However, this event is shared with BR/EDR and there we actually want to
act only on the auth_complete event (which comes after the key refresh).

If we do not do this we may trigger an L2CAP Connect Request too early
and cause the remote side to return a security block error.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Fix redundant encryption request for reauthentication

commit 09da1f3463eb81d59685df723b1c5950b7570340 upstream.

When we're performing reauthentication (in order to elevate the
security level from an unauthenticated key to an authenticated one) we
do not need to issue any encryption command once authentication
completes. Since the trigger for the encryption HCI command is the
ENCRYPT_PEND flag this flag should not be set in this scenario.
Instead, the REAUTH_PEND flag takes care of all necessary steps for
reauthentication.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Add support for Lite-on [04ca:3007]

commit 1fb4e09a7e780b915dbd172592ae7e2a4c071065 upstream.

Add support for the AR9462 chip

T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=03 Dev#=  3 Spd=12   MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=04ca ProdID=3007 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms

Signed-off-by: Mohammed Habibulla <moch@chromium.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

posix_acl: handle NULL ACL in posix_acl_equiv_mode

commit 50c6e282bdf5e8dabf8d7cf7b162545a55645fd9 upstream.

Various filesystems don't bother checking for a NULL ACL in
posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
gets passed one. This usually happens from the NFS server, as the ACL tools
never pass a NULL ACL, but instead pass one representing the mode bits.

Instead of adding boilerplate to all filesystems, put this check into one place,
which will allow us to remove the check from other filesystems as well later
on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

omap3isp: Defer probe when the IOMMU is not available

commit 7c0f812a5d65e712618af880dda4a5cc7ed79463 upstream.

When the OMAP3 ISP driver is compiled in the kernel the device can be
probed before the corresponding IOMMU is available. Defer the probe in
that case, and fix a crash in the error path.

Reported-by: Javier Martin <javier.martin@vista-silicon.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: i.MX53: Fix ipu register space size

commit 6d66da89bf4422c0a0693627fb3e25f74af50f92 upstream.

The IPU register space is 128MB, not 2GB.

Fixes: abed9a6bf2bb 'ARM i.MX53: Add IPU support'
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: 8012/1: kdump: Avoid overflow when converting pfn to physaddr

commit 8fad87bca7ac9737e413ba5f1656f1114a8c314d upstream.

When we configure CONFIG_ARM_LPAE=y, pfn << PAGE_SHIFT will
overflow if pfn >= 0x100000 in copy_oldmem_page.
So use __pfn_to_phys for converting.
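
The overflow is easy to reproduce outside the kernel. In the sketch below a
32-bit integer stands in for a 32-bit unsigned long, and a 4 KiB page size is
assumed; the function names are hypothetical. The wide variant widens before
shifting, which is the effect the message attributes to using __pfn_to_phys.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12	/* 4 KiB pages, assumed */

/* Shift done in 32 bits: bits above bit 31 are lost. */
uint32_t example_pfn_to_phys_narrow(uint32_t pfn)
{
	return pfn << EXAMPLE_PAGE_SHIFT;
}

/* Widen first so the result can hold an address above 4 GiB. */
uint64_t example_pfn_to_phys_wide(uint32_t pfn)
{
	return (uint64_t)pfn << EXAMPLE_PAGE_SHIFT;
}
```

For pfn 0x100000 the narrow version wraps to 0, while the wide version yields
0x100000000 (4 GiB).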

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtl8192cu: Fix unbalanced irq enable in error path of rtl92cu_hw_init()

commit 3234f5b06fc3094176a86772cc64baf3decc98fc upstream.

Fixes: a53268be0cb9 ('rtlwifi: rtl8192cu: Fix too long disable of IRQs')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/acpi: allow non-optimus setups to load vbios from acpi

commit a3d0b1218d351c6e6f3cea36abe22236a08cb246 upstream.

There appears to be a crop of new hardware where the vbios is not
available from PROM/PRAMIN, but there is a valid _ROM method in ACPI.
The data read from PCIROM almost invariably contains invalid
instructions (still has the x86 opcodes), which makes this a low-risk
way to try to obtain a valid vbios image.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76475
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/pm/fan: drop the fan lock in fan_update() before rescheduling

commit 61679fe153b2b9ea5b5e2ab93305419e85e99a9d upstream.

This should fix a deadlock that has been reported to us where fan_update()
would hold the fan lock and try to grab the alarm_program_lock to reschedule
an update. On another CPU, the alarm_program_lock would have been taken
before calling fan_update(), leading to a deadlock.

We should Cc: <stable@vger.kernel.org> # 3.9+

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Timothée Ravier <tim@siosm.fr>
Tested-by: Boris Fersing (IRC nick fersingb, no public email address)
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

leds: leds-pwm: properly clean up after probe failure

commit 392369019eb96e914234ea21eda806cb51a1073e upstream.

When probing with DT, we add each LED one at a time.  If we find a LED
without a PWM device (because it is not available yet) we fail the
initialisation, unregister previous LEDs, and then by way of managed
resources, we free the structure.

The problem with this is we may have a scheduled and active work_struct
in this structure, and this results in a nasty kernel oops.

We need to cancel this work_struct properly upon cleanup - and the
cleanup we require is the same cleanup as we do when the LED platform
device is removed.  Rather than writing this same code three times,
move it into a separate function and use it in all three places.

Fixes: c971ff185f64 ("leds: leds-pwm: Defer led_pwm_set() if PWM can sleep")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

brcmsmac: fix deadlock on missing firmware

commit 8fc1e8c240aab968db658b2d8d079b4391207a36 upstream.

When brcm80211 firmware is not installed networking hangs.
A deadlock happens because we call ieee80211_unregister_hw()
from the .start callback of struct ieee80211_ops. When .start
is called we are under rtnl lock and ieee80211_unregister_hw()
tries to take it again.

Function call stack:

dev_change_flags()
	__dev_change_flags()
		__dev_open()
			ASSERT_RTNL() <-- Assert rtnl lock
			ops->ndo_open()

.ndo_open = ieee80211_open,

ieee80211_open()
	ieee80211_do_open()
		drv_start()
			local->ops->start()

.start = brcms_ops_start,

brcms_ops_start()
	brcms_remove()
		ieee80211_unregister_hw()
			rtnl_lock() <-- Here we deadlock

Introduced by:
commit 25b5632fb35ca61b8ae3eee235edcdc2883f7a5e
("brcmsmac: request firmware in .start() callback")

This patch fixes the bug by removing the call to brcms_remove()
and moves the brcms_request_fw() call to the top of the .start
callback to not initiate anything unless firmware is installed.

Signed-off-by: Emil Goode <emilgoode@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Documentation: Update stable address in Chinese and Japanese translations

commit 98b0f811aade1b7c6e7806c86aa0befd5919d65f upstream.

The English and Korean translations were updated, the Chinese and Japanese
weren't.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: crypto_wq - Fix late crypto work queue initialization

commit 130fa5bc81b44b6cc1fbdea3abf6db0da22964e0 upstream.

The crypto algorithm modules utilizing the crypto daemon could
be used early when the system starts up.  Using module_init
does not guarantee that the daemon's work queue is initialized
when the crypto algorithm depending on crypto_wq starts.  It is necessary
to initialize the crypto work queue earlier at the subsystem
init time to make sure that it is initialized
when used.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: vexpress: NULL dereference on error path

commit 6b4ed8b00e93bd31f24a25f59ed8d1b808d0cc00 upstream.

If the allocation fails then we dereference the NULL in the error path.
Just return directly.

Fixes: ed27ff1db869 ('clk: Versatile Express clock generators ("osc") driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: media-device: fix infoleak in ioctl media_enum_entities()

commit e6a623460e5fc960ac3ee9f946d3106233fd28d8 upstream.

This fixes CVE-2014-1739.

Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: i801: Add Device IDs for Intel Wildcat Point-LP PCH

commit afc659241258b40b683998ec801d25d276529f43 upstream.

This patch adds the SMBus Device IDs for the Intel Wildcat Point-LP PCH.

Signed-off-by: James Ralston <james.d.ralston@intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: "Chang, Rebecca Swee Fun" <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: i801: enable Intel BayTrail SMBUS

commit 1b31e9b76ef8c62291e698dfdb973499986a7f68 upstream.

Add Device ID of Intel BayTrail SMBus Controller.

Signed-off-by: Chew, Kean ho <kean.ho.chew@intel.com>
Signed-off-by: Chew, Chiau Ee <chiau.ee.chew@intel.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: "Chang, Rebecca Swee Fun" <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts

commit 03367ef5ea811475187a0732aada068919e14d61 upstream.

Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality
is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the
VMBUS protocol.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

trace: module: Maintain a valid user count

commit 098507ae3ec2331476fb52e85d4040c1cc6d0ef4 upstream.

The replacement of the 'count' variable by two variables 'incs' and
'decs' to resolve some race conditions during module unloading was done
in parallel with some cleanup in the trace subsystem, and was integrated
as a merge.

Unfortunately, the formula for this replacement was wrong in the tracing
code, and the refcount in the traces was not usable as a result.

Use 'count = incs - decs' to compute the user count.
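
Written out as a tiny stand-alone helper, with a hypothetical struct holding
the two counters:

```c
/* Hypothetical pair of counters replacing a single 'count'. */
struct example_refs {
	unsigned long incs;	/* references taken */
	unsigned long decs;	/* references dropped */
};

/* User count is the references taken minus the references dropped. */
unsigned long example_user_count(const struct example_refs *refs)
{
	return refs->incs - refs->decs;
}
```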

Link: http://lkml.kernel.org/p/1393924179-9147-1-git-send-email-romain.izard.pro@gmail.com

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: c1ab9cab7509 "merge conflict resolution"
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: atkbd - fix keyboard not working on some LG laptops

commit 3d725caa9dcc78c3dc9e7ea0c04f626468edd9c9 upstream.

After issuing ATKBD_CMD_RESET_DIS, keyboard on some LG laptops stops
working. The workaround is to stop issuing ATKBD_CMD_RESET_DIS commands.

In order to keep changes in atkbd driver to the minimum we check DMI
signature and only skip ATKBD_CMD_RESET_DIS if we are running on LG
LW25-B7HV or P1-J273B.

Signed-off-by: Sheng-Liang Song <ssl@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: elantech - fix touchpad initialization on Gigabyte U2442

commit 36189cc3cd57ab0f1cd75241f93fe01de928ac06 upstream.

The hw_version 3 Elantech touchpad on the Gigabyte U2442 does not accept
0x0b as initialization value for r10, this stand-alone version of the
driver: http://planet76.com/drivers/elantech/psmouse-elantech-v6.tar.bz2

Uses 0x03 which does work, so this means not setting bit 3 of r10 which
sets: "Enable Real H/W Resolution In Absolute mode"

Which will result in half the x and y resolution we get with that bit set,
so simply not setting it everywhere is not a solution. We've been unable to
find a way to identify touchpads where setting the bit will fail, so this
patch uses a dmi based blacklist for this.
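
The relationship between the two values is a single bit: 0x0b with bit 3
cleared is 0x03. A sketch of the selection, where the macro names and the
'blacklisted' flag (standing in for a DMI match) are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_R10_DEFAULT  0x0b
/* Bit 3: "Enable Real H/W Resolution In Absolute mode" */
#define EXAMPLE_R10_REAL_RES (1u << 3)

/* Picks the r10 initialization value; blacklisted touchpads get the
 * default with bit 3 cleared. */
uint8_t example_r10_value(bool blacklisted)
{
	if (blacklisted)
		return EXAMPLE_R10_DEFAULT & ~EXAMPLE_R10_REAL_RES;
	return EXAMPLE_R10_DEFAULT;
}
```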

https://bugzilla.kernel.org/show_bug.cgi?id=61151

Reported-by: Philipp Wolfer <ph.wolfer@gmail.com>
Tested-by: Philipp Wolfer <ph.wolfer@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: synaptics - add min/max quirk for the ThinkPad W540

commit 0b5fe736fe923f1f5e05413878d5990e92ffbdf5 upstream.

https://bugzilla.redhat.com/show_bug.cgi?id=1096436

Tested-and-reported-by: ajayr@bigfoot.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: synaptics - T540p - unify with other LEN0034 models

commit 6d396ede224dc596d92d7cab433713536e68916c upstream.

The T540p has a touchpad with pnp-id LEN0034, all the models with this
pnp-id have the same min/max values, except the T540p where the values are
slightly off. Fix them to be identical.

This is a preparation patch for simplifying the quirk table.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Fix onboard audio on Intel H97/Z97 chipsets

commit 77f07800cb456bed6e5c345e6e4e83e8eda62437 upstream.

The recent Intel H97/Z97 chipsets need setups similar to other
Intel chipsets for snooping, etc.  In particular, without snooping the
audio playback stutters or gets corrupted.  This fix just adds
the corresponding PCI ID entry with the proper flags.

Reported-and-tested-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSd: Move default initialisers from create_client() to alloc_client()

commit 5694c93e6c4954fa9424c215f75eeb919bddad64 upstream.

Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSd: call rpc_destroy_wait_queue() from free_client()

commit 4cb57e3032d4e4bf5e97780e9907da7282b02b0c upstream.

Mainly to ensure that we don't leave any hanging timers.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Call ->set_acl with a NULL ACL structure if no entries

commit aa07c713ecfc0522916f3cd57ac628ea6127c0ec upstream.

After setting an ACL on a directory, I hit two problems caused
by the cached zero-length default POSIX ACL.

This patch makes sure nfsd4_set_nfs4_acl calls ->set_acl
with a NULL ACL structure if there are no entries.

Thanks to Christoph Hellwig for the advice.

First problem:
............ hang ...........

Second problem:
[ 1610.167668] ------------[ cut here ]------------
[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
[last unloaded: nfsd]
[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G           OE
3.15.0-rc1+ #15
[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
ffff88005a944000
[ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>]  [<ffffffffa034d5ed>]
_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
[ 1610.168320] RSP: 0018:ffff88005a945b00  EFLAGS: 00010293
[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
0000000000000000
[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
ffff880068233300
[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
0000000000000000
[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
ffff880068233300
[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
ffff880068233300
[ 1610.168320] FS:  0000000000000000(0000) GS:ffff880077800000(0000)
knlGS:0000000000000000
[ 1610.168320] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
00000000000006f0
[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1610.168320] Stack:
[ 1610.168320]  ffffffff00000000 0000000b67c83500 000000076700bac0
0000000000000000
[ 1610.168320]  ffff88006700bac0 ffff880068233300 ffff88005a945c08
0000000000000002
[ 1610.168320]  0000000000000000 ffff88005a945b88 ffffffffa034e2d5
000000065a945b68
[ 1610.168320] Call Trace:
[ 1610.168320]  [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
[ 1610.168320]  [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
[ 1610.168320]  [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
[ 1610.168320]  [<ffffffffa0327962>] ?
nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
[ 1610.168320]  [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
[ 1610.168320]  [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
[ 1610.168320]  [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
[nfsd]
[ 1610.168320]  [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
[ 1610.168320]  [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[ 1610.168320]  [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
[ 1610.168320]  [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
[ 1610.168320]  [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
[ 1610.168320]  [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1610.168320]  [<ffffffff810a5202>] kthread+0xd2/0xf0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320]  [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
[ 1610.168320] RIP  [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
[nfsd]
[ 1610.168320]  RSP <ffff88005a945b00>
[ 1610.257313] ---[ end trace 838254e3e352285b ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: warn on finding lockowner without stateid's

commit 27b11428b7de097c42f205beabb1764f4365443b upstream.

The current code assumes a one-to-one lockowner<->lock stateid
correspondence.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: remove lockowner when removing lock stateid

commit a1b8ff4c97b4375d21b6d6c45d75877303f61b3b upstream.

The nfsv4 state code has always assumed a one-to-one correspondence
between lock stateid's and lockowners even if it appears not to in some
places.

We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.

Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

workqueue: fix bugs in wq_update_unbound_numa() failure path

commit 77f300b198f93328c26191b52655ce1b62e202cf upstream.

wq_update_unbound_numa() failure path has the following two bugs.

- alloc_unbound_pwq() is called without holding wq->mutex; however, if
  the allocation fails, it jumps to out_unlock which tries to unlock
  wq->mutex.

- The function should switch to dfl_pwq on failure but didn't do so
  after alloc_unbound_pwq() failure.

Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on
alloc_unbound_pwq() failure.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

workqueue: fix a possible race condition between rescuer and pwq-release

commit 77668c8b559e4fe2acf2a0749c7c83cde49a5025 upstream.

There is a race condition between rescuer_thread() and
pwq_unbound_release_workfn().

Even after a pwq is scheduled for rescue, the associated work items
may be consumed by any worker.  If all of them are consumed before the
rescuer gets to them and the pwq's base ref was put due to attribute
change, the pwq may be released while still being linked on
@wq->maydays list making the rescuer dereference already freed pwq
later.

Make send_mayday() pin the target pwq until the rescuer is done with
it.
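
The pinning idea can be sketched with a plain reference count. This is a
single-threaded model with hypothetical names; it leaves out the locking the
real code needs and only shows why an extra reference keeps the object alive
until the rescuer drops it.

```c
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a pwq. */
struct example_pwq {
	int refcnt;
	bool released;
};

/* Takes an extra reference, e.g. when the pwq is queued for rescue. */
void example_get(struct example_pwq *pwq)
{
	pwq->refcnt++;
}

/* Drops a reference; the last put releases the object. */
void example_put(struct example_pwq *pwq)
{
	if (--pwq->refcnt == 0)
		pwq->released = true;	/* stands in for freeing */
}
```

With the pin taken first, dropping the base reference leaves the count at one,
and the object is released only by the rescuer's final put.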

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

workqueue: make rescuer_thread() empty wq->maydays list before exiting

commit 4d595b866d2c653dc90a492b9973a834eabfa354 upstream.

After a @pwq is scheduled for emergency execution, other workers may
consume the affected work items before the rescuer gets to them.  This
means that a workqueue may have pwqs queued on @wq->maydays list
while not having any work item pending or in-flight.  If
destroy_workqueue() executes in such condition, the rescuer may exit
without emptying @wq->maydays.

This currently doesn't cause any actual harm.  destroy_workqueue() can
safely destroy all the involved data structures whether @wq->maydays
is populated or not as nobody accesses the list once the rescuer exits.

However, this is nasty and makes future development difficult.  Let's
update rescuer_thread() so that it empties @wq->maydays after seeing
should_stop to guarantee that the list is empty on rescuer exit.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bus: mvebu-mbus: allow several windows with the same target/attribute

commit b566e782be32145664d96ada3e389f17d32742e5 upstream.

Having multiple windows with the same target and attribute is actually
legal, and can be useful for PCIe windows, when PCIe BARs have a size
that isn't a power of two, and we therefore need to create several
MBus windows to cover the PCIe BAR for a given PCIe interface.

Fixes: fddddb52a6c4 ('bus: introduce an Marvell EBU MBus driver')
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: https://lkml.kernel.org/r/1397823593-1932-7-git-send-email-thomas.petazzoni@free-electrons.com
Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()

commit 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da upstream.

pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)

It could hardly ever be bigger than PAGE_SIZE, even on large-scale machines,
but for consistency with its counterpart pcpu_mem_zalloc(),
use pcpu_mem_free() instead.

Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use
pcpu_mem_free() instead of kfree()") addressed this problem, but
missed this one.

tj: commit message updated

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen-blkfront: revoke foreign access for grants not mapped by the backend

commit fbe363c476afe8ec992d3baf682670a4bd1b6ce6 upstream.

There's no need to keep the foreign access in a grant if it is not
persistently mapped by the backend. This allows us to free grants that
are not mapped by the backend, thus preventing blkfront from hoarding
all grants.

The main effect of this is that blkfront will only persistently map
the same grants as the backend, and it will always try to use grants
that are already mapped by the backend. Also the number of persistent
grants in blkfront is the same as in blkback (and is controlled by the
value in blkback).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen-blkfront: restore the non-persistent data path

commit bfe11d6de1c416cea4f3f0f35f864162063ce3fa upstream.

When persistent grants were added they were always used, even if the
backend doesn't have this feature (there's no harm in always using the
same set of pages). This restores the old data path when the backend
doesn't have persistent grants, removing the burden of doing a memcpy
when it is not actually needed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix up whitespace issues]
Tested-by: Felipe Franciosi <felipe@paradoxo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: wm8962: Update register CLASS_D_CONTROL_1 to be non-volatile

commit 44330ab516c15dda8a1e660eeaf0003f84e43e3f upstream.

The register CLASS_D_CONTROL_1 is marked as volatile because it contains
a bit, DAC_MUTE, which is also mirrored in the ADC_DAC_CONTROL_1
register. This causes problems for the "Speaker Switch" control, which
will report an error if the CODEC is suspended because it relies on a
volatile register.

To resolve this issue mark CLASS_D_CONTROL_1 as non-volatile and
manually keep the register cache in sync by updating both bits when
changing the mute status.

Reported-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

metag: fix memory barriers

commit 2425ce84026c385b73ae72039f90d042d49e0394 upstream.

Volatile access doesn't really imply the compiler barrier. Volatile access
is only ordered with respect to other volatile accesses, it isn't ordered
with respect to general memory accesses. Gcc may reorder memory accesses
around volatile acc…

shminer referenced this issue in shminer/android_kernel_lge_f460 Mar 6, 2015

Shamu: Update to Linux 3.10.42

ftrace/module: Hardcode ftrace_module_init() call into load_module()

commit a949ae560a511fe4e3adf48fa44fefded93e5c2b upstream.

A race exists between module loading and enabling of function tracer.

	CPU 1				CPU 2
	-----				-----
  load_module()
   module->state = MODULE_STATE_COMING

				register_ftrace_function()
				 mutex_lock(&ftrace_lock);
				 ftrace_startup()
				  update_ftrace_function();
				   ftrace_arch_code_modify_prepare()
				    set_all_module_text_rw();
				   <enables-ftrace>
				    ftrace_arch_code_modify_post_process()
				     set_all_module_text_ro();

				[ here all module text is set to RO,
				  including the module that is
				  loading!! ]

   blocking_notifier_call_chain(MODULE_STATE_COMING);
    ftrace_init_module()

     [ tries to modify code, but it's RO, and fails!
       ftrace_bug() is called]

When this race happens, ftrace_bug() produces a nasty warning and
all of the function tracing features are disabled until reboot.

The simple solution is to treat module load the same way the core
kernel is treated at boot: hardcode the ftrace function modification
that converts calls to mcount into nops. This is done in init/main.c,
and there's no reason it could not be done in load_module(). This gives
better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.

This works because ftrace_module_init() is called while the module is
in MODULE_STATE_UNFORMED, a state that is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip: Gic: Support forced affinity setting

commit ffde1de64012c406dfdda8690918248b472f24e4 upstream.

To support the affinity setting of per cpu timers in the early startup
of a not yet online cpu, implement the force logic, which disables the
cpu online check.

Tagged for stable to allow a simple fix of the affected SoC clock
event drivers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Link: http://lkml.kernel.org/r/20140416143315.916984416@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

genirq: Allow forcing cpu affinity of interrupts

commit 01f8fa4f01d8362358eb90e412bd7ae18a3ec1ad upstream.

The current implementation of irq_set_affinity() refuses rightfully to
route an interrupt to an offline cpu.

But there is a special case, where this is actually desired. Some of
the ARM SoCs have per cpu timers which require setting the affinity
during cpu startup where the cpu is not yet in the online mask.

If we can't do that, then the local timer interrupt for the about to
become online cpu is routed to some random online cpu.

The developers of the affected machines tried to work around that
issue, but that results in a massive mess in that timer code.

We have a yet unused argument in the set_affinity callbacks of the irq
chips, which I added back then for a similar reason. It was never
required, so it went unused. But I'm happy that I never removed it.

That allows us to implement a sane handling of the above scenario. So
the affected SoC drivers can add the required force handling to their
interrupt chip, switch the timer code to irq_force_affinity() and
things just work.

This does not affect any existing user of irq_set_affinity().

Tagged for stable to allow a simple fix of the affected SoC clock
event drivers.

Reported-and-tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clocksource: Exynos_mct: Register clock event after request_irq()

commit 8db6e5104b77de5d0b7002b95069da0992a34be9 upstream.

After hotplugging CPU1 the first call of interrupt handler for CPU1
oneshot timer was called on CPU0 because it fired before setting IRQ
affinity. Affected are SoCs where Multi Core Timer interrupts are
shared (SPI), e.g. Exynos 4210.

During setup of the MCT timers the clock event device should be
registered after setting the affinity for interrupt. This will prevent
starting the timer too early.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Link: http://lkml.kernel.org/r/20140416143316.299247848@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pata_at91: fix ata_host_activate() failure handling

commit 27aa64b9d1bd0d23fd692c91763a48309b694311 upstream.

Add missing clk_put() call to ata_host_activate() failure path.

Sergei says,

  "Hm, I have once fixed that (see that *if* (!ret)) but looks like a
   later commit 477c87e90853d136b188c50c0e4a93d01cad872e (ARM:
   at91/pata: use gpio_is_valid to check the gpio) broke it again. :-(
   Would be good if the changelog did mention that..."

Cc: Andrew Victor <linux@maxim.org.za>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

topology: Fix compilation warning when not in SMP

commit 53974e06603977f348ed978d75c426b0532daa67 upstream.

The topology_##name() macro does not use its argument when CONFIG_SMP is not
set, as it ultimately calls the cpu_data() macro.

So we avoid maintaining a possibly unused `cpu' variable, to avoid the
following compilation warning:

  drivers/base/topology.c: In function ‘show_physical_package_id’:
  drivers/base/topology.c:103:118: warning: unused variable ‘cpu’ [-Wunused-variable]
   define_id_show_func(physical_package_id);

  drivers/base/topology.c: In function ‘show_core_id’:
  drivers/base/topology.c:106:106: warning: unused variable ‘cpu’ [-Wunused-variable]
   define_id_show_func(core_id);

This can be seen with e.g. x86 defconfig and CONFIG_SMP not set.

Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: make fixup_user_fault() check the vma access rights too

commit 1b17844b29ae042576bea588164f2f1e9590a8bc upstream.

fixup_user_fault() is used by the futex code when the direct user access
fails, and the futex code wants it to either map in the page in a usable
form or return an error.  It relied on handle_mm_fault() to map the
page, and correctly checked the error return from that, but while that
does map the page, it doesn't actually guarantee that the page will be
mapped with sufficient permissions to be then accessed.

So do the appropriate tests of the vma access rights by hand.
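
The check described above can be sketched in plain C. This is a minimal illustration, not the kernel patch itself: the `VM_READ`, `VM_WRITE` and `FAULT_FLAG_WRITE` values are stand-ins defined locally, and `vma_permits_fault()` is a hypothetical helper name.

```c
#include <errno.h>

/* Stand-in flag values for illustration only; the kernel defines its own. */
#define VM_READ          0x1UL
#define VM_WRITE         0x2UL
#define FAULT_FLAG_WRITE 0x1U

/*
 * Hypothetical helper: return 0 if a mapping with the given vm_flags
 * permits the requested access, -EFAULT otherwise. A write fault needs
 * VM_WRITE; any other fault needs VM_READ.
 */
static int vma_permits_fault(unsigned long vma_vm_flags,
                             unsigned int fault_flags)
{
    unsigned long needed =
        (fault_flags & FAULT_FLAG_WRITE) ? VM_WRITE : VM_READ;

    if (!(needed & vma_vm_flags))
        return -EFAULT;
    return 0;
}
```

With this shape, a write fault against a read-only mapping is rejected up front instead of being reported as handled.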

[ Side note: arguably handle_mm_fault() could just do that itself, but
  we have traditionally done it in the caller, because some callers -
  notably get_user_pages() - have been able to access pages even when
  they are mapped with PROT_NONE.  Maybe we should re-visit that design
  decision, but in the meantime this is the minimal patch. ]

Found by Dave Jones running his trinity tool.

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

serial: 8250: Fix thread unsafe __dma_tx_complete function

commit f8fd1b0350d3a4581125f5eda6528f5a2c5f9183 upstream.

__dma_tx_complete is not protected against concurrent
calls of serial8250_tx_dma. This can lead to circular tail
index corruption or parallel call of serial_tx_dma on the
same data portion.

This patch fixes this issue by holding the port lock.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8250_core: Fix unwanted TX chars write

commit b08c9c317e3f7764a91d522cd031639ba42b98cc upstream.

On transmit-hold-register empty, serial8250_tx_chars
should be called only if we don't use DMA.
DMA has its own tx cycle.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gpu: host1x: handle the correct # of syncpt regs

commit 22bbd5d949dc7fdd72a4e78e767fa09d8e54b446 upstream.

BIT_WORD() truncates rather than rounds, so the loops in
syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
rather than < in an attempt to process the correct number of registers
when rounding of the conversion of count of bits to count of words is
necessary. However, when rounding isn't necessary because the value is
already a multiple of the divisor (as is the case for all values of
nb_pts the code actually sees), this causes one too many registers to
be processed.

Solve this by using an explicit DIV_ROUND_UP() call, rather than
BIT_WORD(), and comparing with < rather than <=.
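
The off-by-one can be reproduced with a small counting sketch. It assumes 32 syncpoint bits per register; `WORD_INDEX()` is a local stand-in for the truncating `BIT_WORD()`-style division, and the two `regs_visited_*()` helpers are hypothetical names that only count loop iterations.

```c
/* Assumed register width for this sketch: 32 syncpoint bits per register. */
#define BITS_PER_REG 32U

/* Truncating division, in the style of BIT_WORD(). */
#define WORD_INDEX(nr) ((nr) / BITS_PER_REG)

/* Rounding-up division, in the style of DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Iterations of the old loop shape: i <= WORD_INDEX(nb_pts). */
static unsigned int regs_visited_old(unsigned int nb_pts)
{
    unsigned int i, count = 0;

    for (i = 0; i <= WORD_INDEX(nb_pts); i++)
        count++;
    return count;
}

/* Iterations of the fixed loop shape: i < DIV_ROUND_UP(nb_pts, 32). */
static unsigned int regs_visited_new(unsigned int nb_pts)
{
    unsigned int i, count = 0;

    for (i = 0; i < DIV_ROUND_UP(nb_pts, BITS_PER_REG); i++)
        count++;
    return count;
}
```

For a count that is a multiple of 32, such as 32 or 64, the old shape visits one register more than the new one; for a count such as 33, both visit two registers.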

Fixes: 7ede0b0bf3e2 ("gpu: host1x: Add syncpoint wait and interrupts")
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-By: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

timer: Prevent overflow in apply_slack

commit 98a01e779f3c66b0b11cd7e64d531c0e41c95762 upstream.

On architectures with sizeof(int) < sizeof (long), the
computation of mask inside apply_slack() can be undefined if the
computed bit is > 32.

E.g. with: expires = 0xffffe6f5 and slack = 25, we get:

expires_limit = 0x20000000e
bit = 33
mask = (1 << 33) - 1  /* undefined */

On x86, mask becomes 1 and the slack is not applied properly.
On s390, mask is -1, expires is set to 0 and the timer fires immediately.

Use 1UL << bit to solve that issue.
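
The difference comes from the type of the constant being shifted. The sketch below is an illustration rather than the kernel code: it uses `uint64_t` as a stand-in for a 64-bit `unsigned long`, and `low_bits_mask()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask with the low `bit` bits set.
 * The constant is 64 bits wide, so shifting by 33 is well defined.
 * Writing (1 << bit) instead would shift a plain int, which is
 * undefined once bit reaches the width of int.
 */
static uint64_t low_bits_mask(unsigned int bit)
{
    assert(bit < 64);
    return (UINT64_C(1) << bit) - 1;
}
```

For bit 33 this yields 0x1ffffffff, the mask the slack rounding needs.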

Suggested-by: Deborah Townsend <dstownse@us.ibm.com>
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Link: http://lkml.kernel.org/r/20140418152310.GA13654@midget.suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipmi: Fix a race restarting the timer

commit 48e8ac2979920ffa39117e2d725afa3a749bfe8d upstream.

With recent changes it is possible for the timer handler to detect an
idle interface and not start the timer, but the thread to start an
operation at the same time.  The thread will not start the timer in that
instance, resulting in the timer not running.

Instead, move all timer operations under the lock and start the timer in
the thread if it detects non-idle and the timer is not already running.
Moving under locks allows the last timeout to be set in both the thread
and the timer.  'Timer is not running' means that the timer is not
pending and smi_timeout() is not running.  So we need a flag to detect
this correctly.

Also fix a few other timeout bugs: setting the last timeout when the
interrupt has to be disabled and the timer started, and setting the last
timeout in check_start_timer_thread possibly racing with the timer.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipmi: Reset the KCS timeout when starting error recovery

commit eb6d78ec213e6938559b801421d64714dafcf4b2 upstream.

The OBF timer in KCS was not reset in one situation when error recovery
was started, resulting in an immediate timeout.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: fix suspend vs. authentication race

commit 1a1cb744de160ee70086a77afff605bbc275d291 upstream.

Since Stanislaw's patch removing the quiescing code, mac80211 had
a race regarding suspend vs. authentication: as cfg80211 doesn't
track authentication attempts, it can't abort them. Therefore the
attempts may be kept running while suspending, which can lead to
all kinds of issues, in at least some cases causing an error in
iwlmvm firmware.

Fix this by aborting the authentication attempt when suspending.

Cc: stable@vger.kernel.org
Fixes: 12e7f517029d ("mac80211: cleanup generic suspend/resume procedures")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm, thp: close race between mremap() and split_huge_page()

commit dd18dbc2d42af75fffa60c77e0f02220bc329829 upstream.

It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on rmap walk.  It gets tricky if there's concurrent fork() or
mremap() since usually we copy/move page table entries on dup_mm() or
move_page_tables() without rmap lock taken.  To make it work we rely on
rmap walk order to not miss any entry.  We expect to see destination VMA
after source one to work correctly.

But after switching rmap implementation to interval tree it's not always
possible to preserve expected walk order.

It works fine for dup_mm() since the new VMA has the same vma_start_pgoff()
/ vma_last_pgoff() and we explicitly insert the dst VMA after the src one
with vma_interval_tree_insert_after().

But on move_vma() the destination VMA can be merged into an adjacent one
and, as a result, shifted left in the interval tree.  Fortunately, we can
detect the
situation and prevent race with rmap walk by moving page table entries
under rmap lock.  See commit 38a76013ad80.

Problem is that we miss the lock when we move transhuge PMD.  Most
likely this bug caused the crash[1].

[1] http://thread.gmane.org/gmane.linux.kernel.mm/96473

Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail")

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()

commit 9844f5462392b53824e8b86726e7c33b5ecbb676 upstream.

The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwpoison, hugetlb: lock_page/unlock_page does not match for handling a free hugepage

commit b985194c8c0a130ed155b71662e39f7eaea4876f upstream.

For handling a free hugepage in memory failure, the race will happen if
another thread hwpoisoned this hugepage concurrently.  So we need to
check PageHWPoison instead of !PageHWPoison.

If hwpoison_filter(p) returns true or a race happens, then we need to
unlock_page(hpage).

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: fix on-channel remain-on-channel

commit b4b177a5556a686909e643f1e9b6434c10de079f upstream.

Jouni reported that if a remain-on-channel was active on the
same channel as the current operating channel, then the ROC
would start, but any frames transmitted using mgmt-tx on the
same channel would get delayed until after the ROC.

The reason for this is that the ROC starts, but doesn't have
any handling for "remain on the same channel", so it stops
the interface queues. The later mgmt-tx then puts the frame
on the interface queues (since it's on the current operating
channel) and thus they get delayed until after the ROC.

To fix this, add some logic to handle remaining on the same
channel specially and not stop the queues etc. in this case.
This not only fixes the bug but also improves behaviour in
this case as data frames etc. can continue to flow.

Reported-by: Jouni Malinen <j@w1.fi>
Tested-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (emc1403) fix inverted store_hyst()

commit 17c048fc4bd95efea208a1920f169547d8588f1f upstream.

Attempts to set the hysteresis value to a temperature below the target
limit fail with "write error: Numerical result out of range" due to
an inverted comparison.

Signed-off-by: Josef Gajdusek <atx@atx.name>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
[Guenter Roeck: Updated headline and description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (emc1403) Support full range of known chip revision numbers

commit 3a18e1398fc2dc9c32bbdc50664da3a77959a8d1 upstream.

The datasheet for EMC1413/EMC1414, which is fully compatible to
EMC1403/1404 and uses the same chip identification, references revision
numbers 0x01, 0x03, and 0x04. Accept the full range of revision numbers
from 0x01 to 0x04 to make sure none are missed.

Signed-off-by: Josef Gajdusek <atx@atx.name>
[Guenter Roeck: Updated headline and description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Prevent all reprogramming if hang detected

commit 6c6c0d5a1c949d2e084706f9e5fb1fccc175b265 upstream.

If the last hrtimer interrupt detected a hang it sets hang_detected=1
and programs the clock event device with a delay to let the system
make progress.

If hang_detected == 1, we prevent reprogramming of the clock event
device in hrtimer_reprogram() but not in hrtimer_force_reprogram().

This can lead to the following situation:

hrtimer_interrupt()
   hang_detected = 1;
   program ce device to Xms from now (hang delay)

We have two timers pending:
   T1 expires 50ms from now
   T2 expires 5s from now

Now T1 gets canceled, which causes hrtimer_force_reprogram() to be
invoked, which in turn programs the clock event device to T2 (5
seconds from now).

Any hrtimer_start after that will not reprogram the hardware due to
hang_detected still being set. So we effectively block all timers until
the T2 event fires and cleans up the hang situation.

Add a check for hang_detected to hrtimer_force_reprogram() which
prevents the reprogramming of the hang delay in the hardware
timer. The subsequent hrtimer_interrupt will resolve all outstanding
issues.
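
A minimal model of the added check, under stated assumptions: `struct cpu_base_sketch` and `force_reprogram_sketch()` are hypothetical names, and the single `programmed_ns` field stands in for the clock event device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-cpu hrtimer state. */
struct cpu_base_sketch {
    bool hang_detected;     /* set by the interrupt that saw a hang */
    uint64_t programmed_ns; /* what the clock event device is set to */
};

/*
 * Sketch of the forced reprogramming path. While a hang delay is
 * active the hardware is left alone, so the delay cannot be replaced
 * by a far-away expiry. Returns true if the device was reprogrammed.
 */
static bool force_reprogram_sketch(struct cpu_base_sketch *base,
                                   uint64_t next_expiry_ns)
{
    if (base->hang_detected)
        return false;

    base->programmed_ns = next_expiry_ns;
    return true;
}
```

In the T1/T2 scenario, canceling T1 while `hang_detected` is set leaves the hang delay programmed, and the next interrupt picks up T2 as usual.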

[ tglx: Rewrote subject and changelog and fixed up the comment in
  	hrtimer_force_reprogram() ]

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Link: http://lkml.kernel.org/r/53602DC6.2060101@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Prevent remote enqueue of leftmost timers

commit 012a45e3f4af68e86d85cce060c6c2fed56498b2 upstream.

If a cpu is idle and starts an hrtimer which is not pinned on that
same cpu, the nohz code might target the timer to a different cpu.

In the case that we switch the cpu base of the timer we already have a
sanity check in place, which determines whether the timer is earlier
than the current leftmost timer on the target cpu. In that case we
enqueue the timer on the current cpu because we cannot reprogram the
clock event device on the target.

If the timer's base is already the target CPU we do not have this
sanity check in place so we enqueue the timer as the leftmost timer in
the target cpu's rb tree, but we cannot reprogram the clock event
device on the target cpu. So the timer expires late and subsequently
prevents the reprogramming of the target cpu clock event device until
the previously programmed event fires or a timer with an earlier
expiry time gets enqueued on the target cpu itself.

Add the same target check as we have for the switch base case and
start the timer on the current cpu if it would become the leftmost
timer on the target.

[ tglx: Rewrote subject and changelog ]

Signed-off-by: Leon Ma <xindong.ma@intel.com>
Link: http://lkml.kernel.org/r/1398847391-5994-1-git-send-email-xindong.ma@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Set expiry time before switch_hrtimer_base()

commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.

switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.

But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.

Update expiry time before calling switch_hrtimer_base().

[ tglx: Rewrote changelog once again ]

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: fweisbec@gmail.com
Cc: arvind.chauhan@arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.1399882253.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md: avoid possible spinning md thread at shutdown.

commit 0f62fb220aa4ebabe8547d3a9ce4a16d3c045f21 upstream.

If an md array with externally managed metadata (e.g. DDF or IMSM)
is in use, then we should not set safemode==2 at shutdown because:

1/ this is ineffective: user-space needs to be involved in any 'safemode' handling,
2/ The safemode management code doesn't cope with safemode==2 on external metadata
   and md_check_recover enters an infinite loop.

Even at shutdown, an infinite-looping process can be problematic, so this
could cause shutdown to hang.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Break encoder->crtc link separately in intel_sanitize_crtc()

commit 7f1950fbb989e8fc5463b307e062b4529d51c862 upstream.

Depending on the SDVO output_flags SDVO may have multiple connectors
linking to the same encoder (in intel_connector->encoder->base).
Only one of those connectors should be active (ie link to the encoder
thru drm_connector->encoder).
If intel_connector_break_all_links() is called from intel_sanitize_crtc()
we may break the crtc connection of an encoder thru an inactive connector
in which case intel_connector_break_all_links() will not be called again
for the active connector if this happens to come later in the list due to:
    if (connector->encoder->base.crtc != &crtc->base)
                                 continue;
in intel_sanitize_crtc().
This will however leave the drm_connector->encoder linkage for this
active connector in place. Subsequently this will cause multiple
warnings in intel_connector_check_state() to trigger and the driver
will eventually die in drm_encoder_crtc_ok() (because of crtc == NULL).

To avoid this remove intel_connector_break_all_links() and move its
code to its two calling functions: intel_sanitize_crtc() and
intel_sanitize_encoder().
This allows the link breaking to be implemented more flexibly, matching
the surrounding code: i.e. in intel_sanitize_crtc() we can break the
crtc link separately after the links to the encoders have been
broken, which avoids the above problem.

This regression has been introduced in:

commit 24929352481f085c5f85d4d4cbc919ddf106d381
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jul 2 20:28:59 2012 +0200

    drm/i915: read out the modeset hw state at load and resume time

so goes back to the very beginning of the modeset rework.

v2: This patch takes care of the concerns voiced by Chris Wilson
and Daniel Vetter that only breaking links if the drm_connector
is linked to an encoder may miss some links.
v3: move all encoder handling to encoder loop as suggested by
Daniel Vetter.

Signed-off-by: Egbert Eich <eich@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/radeon: fix ATPX detection on non-VGA GPUs

commit e9a4099a59cc598a44006059dd775c25e422b772 upstream.

Some newer PX laptops have the pci device class
set to DISPLAY_OTHER rather than DISPLAY_VGA.  This
properly detects ATPX on those laptops.

Based on a patch from: Pali Rohár <pali.rohar@gmail.com>

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/radeon: check buffer relocation offset

commit 695daf1a8e731a4b5b89de89a61f32a4d7ad7094 upstream.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/tegra: Remove gratuitous pad field

commit cbfbbabb89b37f6bad05f478d906a385149f288d upstream.

The version of the drm_tegra_submit structure that was merged all the
way back in 3.10 contains a pad field that was originally intended to
properly pad the following __u64 field. Unfortunately it seems like a
different field was dropped during review that caused this padding to
become unnecessary, but the pad field wasn't removed at that time.

One possible side-effect of this is that since the __u64 following the
pad is now no longer properly aligned, the compiler may (or may not)
introduce padding itself, which results in no predictable ABI.

Rectify this by removing the pad field so that all fields are again
naturally aligned. Technically this is breaking existing userspace ABI,
but given that there aren't any (released) userspace drivers that make
use of this yet, the fallout should be minimal.
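
As a rough illustration of the alignment issue described above, the following C sketch uses two hypothetical structures. They are not the real drm_tegra_submit layout; the field names a to d and relocs are made up for the example. With the leftover pad in place, the 64-bit member would start at offset 20, so its final position depends on how the ABI aligns 64-bit integers. Without the pad it sits at offset 16 regardless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a leftover pad field (not the real UAPI struct). */
struct demo_submit_with_pad {
	uint32_t a, b, c, d;   /* 16 bytes of 32-bit fields */
	uint32_t pad;          /* leftover pad: next field would start at 20 */
	uint64_t relocs;       /* ends up at 20 or 24, depending on the ABI */
};

/* Same layout with the pad removed: every field is naturally aligned. */
struct demo_submit_fixed {
	uint32_t a, b, c, d;
	uint64_t relocs;       /* always at offset 16 */
};

static_assert(offsetof(struct demo_submit_fixed, relocs) == 16,
	      "without the pad the 64-bit field is naturally aligned");

static size_t demo_relocs_offset_with_pad(void)
{
	return offsetof(struct demo_submit_with_pad, relocs);
}

static size_t demo_relocs_offset_fixed(void)
{
	return offsetof(struct demo_submit_fixed, relocs);
}
```

ABIs that align 64-bit integers to eight bytes insert four hidden bytes before relocs in the first structure, while ABIs that align them to four bytes insert none, which is the kind of unpredictability the commit message refers to.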

Fixes: d43f81cbaf43 ("drm/tegra: Add gr2d device")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio:imu:mpu6050: Fixed segfault in Invensens MPU driver due to null dereference

commit b9b3a41893c3f1be67b5aacfa525969914bea0e9 upstream.

The driver segfaults when the kernel boots with device tree as the
platform data is then not present and the pointer is dereferenced without
checking it is not null.  This patch introduces such a check avoiding the
crash.

Signed-off-by: Atilla Filiz <atilla.filiz@essensium.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fsl-usb: do not test for PHY_CLK_VALID bit on controller version 1.6

commit d183c81929beeba842b74422f754446ef2b8b49c upstream.

Per reference manuals of Freescale P1020 and P2020 SoCs, USB controller
present in these SoCs has bit 17 of USBx_CONTROL register marked as
Reserved - there is no PHY_CLK_VALID bit there.

Testing for this bit in ehci_fsl_setup_phy() behaves differently on two
P1020RDB boards available here - on one board test passes and fsl-usb
init succeeds, but on other board test fails, causing fsl-usb init to
fail.

This patch changes ehci_fsl_setup_phy() not to test PHY_CLK_VALID on
controller version 1.6 that (per manual) does not have this bit.
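
A minimal sketch of the version-dependent check, assuming a simplified two-value version enum. The names demo_phy_clock_ok, DEMO_VER_1_6 and DEMO_VER_OTHER are hypothetical and are not taken from the driver; only the bit position comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 of USBx_CONTROL, as described in the commit message. */
#define DEMO_PHY_CLK_VALID (1u << 17)

static_assert(DEMO_PHY_CLK_VALID == 0x20000u, "bit 17");

/* Hypothetical version identifiers, for illustration only. */
enum demo_usb_ver { DEMO_VER_1_6, DEMO_VER_OTHER };

/* Decide whether PHY setup may proceed, given the control register value. */
static bool demo_phy_clock_ok(enum demo_usb_ver ver, uint32_t control)
{
	/* On version 1.6 the bit is reserved, so its value means nothing. */
	if (ver == DEMO_VER_1_6)
		return true;

	return (control & DEMO_PHY_CLK_VALID) != 0;
}
```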

Signed-off-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: gadget: at91-udc: fix irq and iomem resource retrieval

commit 886c7c426d465732ec9d1b2bbdda5642fc2e7e05 upstream.

When using dt resources retrieval (interrupts and reg properties) there is
no predefined order for these resources in the platform dev resource
table. Also don't expect the number of resources to always be 2.
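
The idea of looking resources up by type instead of by position can be sketched in plain C as follows. The table type and the helper demo_get_resource() are hypothetical stand-ins written for this example; in the kernel, platform_get_resource() provides a lookup by type and index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's resource types. */
enum demo_res_type { DEMO_RES_MEM, DEMO_RES_IRQ };

struct demo_resource {
	enum demo_res_type type;
	unsigned long start;
};

/*
 * Return the n-th resource of the requested type, or NULL if there is
 * none. The position of the entry in the table does not matter.
 */
static const struct demo_resource *
demo_get_resource(const struct demo_resource *tbl, size_t count,
		  enum demo_res_type type, unsigned int n)
{
	assert(tbl != NULL);

	for (size_t i = 0; i < count; i++) {
		if (tbl[i].type == type && n-- == 0)
			return &tbl[i];
	}
	return NULL;
}
```

Because the lookup is keyed on the type, it keeps working when the interrupt happens to be listed before the register range, or when the table holds more than two entries.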

Signed-off-by: Jean-Jacques Hiblot <jjhiblot@traphandler.com>
Acked-by: Boris BREZILLON <b.brezillon@overkiz.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: OHCI: fix problem with global suspend on ATI controllers

commit c1db30a2a79eb59997b13b8cabf2a50bea9f04e1 upstream.

Some OHCI controllers from ATI/AMD seem to have difficulty with
"global" USB suspend, that is, suspending an entire USB bus without
setting the suspend feature for each port connected to a device.  When
we try to resume the child devices, the controller gives timeout
errors on the unsuspended ports, requiring resets, and can even cause
ohci-hcd to hang; see

	http://marc.info/?l=linux-usb&m=139514332820398&w=2

and the following messages.

This patch fixes the problem by adding a new quirk flag to ohci-hcd.
The flag causes the ohci_rh_suspend() routine to suspend each
unsuspended, enabled port before suspending the root hub.  This
effectively converts the "global" suspend to an ordinary root-hub
suspend.  There is no need to unsuspend these ports when the root hub
is resumed, because the child devices will be resumed anyway in the
course of a normal system resume ("global" suspend is never used for
runtime PM).

This patch should be applied to all stable kernels which include
commit 0aa2832dd0d9 (USB: use "global suspend" for system sleep on
USB-2 buses) or a backported version thereof.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Peter Münster <pmlists@free.fr>
Tested-by: Peter Münster <pmlists@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: qcserial: add a number of Dell devices

commit 4d7c0136a54f62501f8a34c4d08a5e0258d3d3ca upstream.

Dan writes:

"The Dell drivers use the same configuration for PIDs:

81A2: Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card
81A3: Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card
81A4: Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card
81A8: Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card
81A9: Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card

These devices are all clearly Sierra devices, but are also definitely
Gobi-based.  The A8 might be the MC7700/7710 and A9 is likely a MC7750.

From DellGobi5kSetup.exe from the Dell drivers:

usbif0: serial/firmware loader?
usbif2: nmea
usbif3: modem/ppp
usbif8: net/QMI"

Reported-by: AceLan Kao <acelan.kao@canonical.com>
Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: storage: shuttle_usbat: fix discs being detected twice

commit df602c2d2358f02c6e49cffc5b49b9daa16db033 upstream.

Even if the USB-to-ATAPI converter supported multiple LUNs, this
driver would always detect the same physical device or media because
it doesn't use srb->device->lun in any way.
Tested with a Hewlett-Packard CD-Writer Plus 8200e.

Signed-off-by: Daniele Forsi <dforsi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: Nokia 305 should be treated as unusual dev

commit f0ef5d41792a46a1085dead9dfb0bdb2c574638e upstream.

Signed-off-by: Victor A. Santos <victoraur.santos@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: Nokia 5300 should be treated as unusual dev

commit 6ed07d45d09bc2aa60e27b845543db9972e22a38 upstream.

Signed-off-by: Daniele Forsi <dforsi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rt2x00: fix beaconing on USB

commit 8834d3608cc516f13e2e510f4057c263f3d2ce42 upstream.

When disabling beaconing we clear the register with the beacon and never
set it back, which makes us stop sending beacons indefinitely.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: work around corrupted TEAC UD-H01 feedback data

commit 7040b6d1febfdbd9c1595efb751d492cd2503f96 upstream.

The TEAC UD-H01 firmware sends wrong feedback frequency values, thus
causing the PC to send the samples at a wrong rate, which results in
clicks and crackles in the output.

Add a workaround to detect and fix the corruption.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
[mick37@gmx.de: use sender->udh01_fb_quirk rather than
 ep->udh01_fb_quirk in snd_usb_handle_sync_urb()]
Reported-and-tested-by: Mick <mick37@gmx.de>
Reported-and-tested-by: Andrea Messa <andr.messa@tiscali.it>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Fix triggering BR/EDR L2CAP Connect too early

commit 9eb1fbfa0a737fd4d3a6d12d71c5ea9af622b887 upstream.

Commit 1c2e004183178 introduced an event handler for the encryption key
refresh complete event with the intent of fixing some LE/SMP cases.
However, this event is shared with BR/EDR and there we actually want to
act only on the auth_complete event (which comes after the key refresh).

If we do not do this we may trigger an L2CAP Connect Request too early
and cause the remote side to return a security block error.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Fix redundant encryption request for reauthentication

commit 09da1f3463eb81d59685df723b1c5950b7570340 upstream.

When we're performing reauthentication (in order to elevate the
security level from an unauthenticated key to an authenticated one) we
do not need to issue any encryption command once authentication
completes. Since the trigger for the encryption HCI command is the
ENCRYPT_PEND flag this flag should not be set in this scenario.
Instead, the REAUTH_PEND flag takes care of all necessary steps for
reauthentication.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Add support for Lite-on [04ca:3007]

commit 1fb4e09a7e780b915dbd172592ae7e2a4c071065 upstream.

Add support for the AR9462 chip

T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=03 Dev#=  3 Spd=12   MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=04ca ProdID=3007 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms

Signed-off-by: Mohammed Habibulla <moch@chromium.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

posix_acl: handle NULL ACL in posix_acl_equiv_mode

commit 50c6e282bdf5e8dabf8d7cf7b162545a55645fd9 upstream.

Various filesystems don't bother checking for a NULL ACL in
posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
gets passed one. This usually happens from the NFS server, as the ACL tools
never pass a NULL ACL, but instead one representing the mode bits.

Instead of adding boilerplate to all filesystems, put this check into one place,
which will allow us to remove the check from other filesystems as well later
on.
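
A simplified sketch of placing the NULL check in the shared helper rather than in every caller. The structure and function below are hypothetical and much smaller than the real posix_acl code. The sketch assumes that a valid ACL carries at least the three base entries, and that a return value of 0 means the ACL can be expressed by mode bits alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified ACL: only the entry count is kept. */
struct demo_acl {
	unsigned int count;
};

/*
 * Return 0 if the ACL is fully described by the mode bits, 1 if it has
 * extra entries. A NULL ACL is handled here, once, so callers need no
 * check of their own.
 */
static int demo_acl_equiv_mode(const struct demo_acl *acl)
{
	if (!acl)
		return 0;   /* no ACL at all: the mode bits say everything */

	/* Assumption of this sketch: owner, group and other always exist. */
	assert(acl->count >= 3);

	return acl->count > 3 ? 1 : 0;
}
```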

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

omap3isp: Defer probe when the IOMMU is not available

commit 7c0f812a5d65e712618af880dda4a5cc7ed79463 upstream.

When the OMAP3 ISP driver is compiled in the kernel the device can be
probed before the corresponding IOMMU is available. Defer the probe in
that case, and fix a crash in the error path.

Reported-by: Javier Martin <javier.martin@vista-silicon.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: i.MX53: Fix ipu register space size

commit 6d66da89bf4422c0a0693627fb3e25f74af50f92 upstream.

The IPU register space is 128MB, not 2GB.

Fixes: abed9a6bf2bb 'ARM i.MX53: Add IPU support'
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: 8012/1: kdump: Avoid overflow when converting pfn to physaddr

commit 8fad87bca7ac9737e413ba5f1656f1114a8c314d upstream.

When we configure CONFIG_ARM_LPAE=y, pfn << PAGE_SHIFT will
overflow if pfn >= 0x100000 in copy_oldmem_page.
So use __pfn_to_phys for converting.
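
The overflow can be reproduced in user space with fixed-width types. This is only a sketch: DEMO_PAGE_SHIFT assumes 4 KiB pages, and the two helpers are hypothetical. The first one models a shift carried out in 32 bits, and the second widens the page frame number before shifting, which is the general approach of a pfn-to-physical-address conversion.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* 0x100000 pages of 4 KiB is exactly 4 GiB, one past the 32-bit range. */
static_assert(((uint64_t)0x100000 << DEMO_PAGE_SHIFT) == 0x100000000ULL,
	      "first pfn whose address needs more than 32 bits");

/* Shift performed in 32 bits: the result wraps for pfn >= 0x100000. */
static uint64_t demo_pfn_to_phys_narrow(uint32_t pfn)
{
	return (uint32_t)(pfn << DEMO_PAGE_SHIFT);
}

/* Widen to 64 bits first, then shift: no bits are lost. */
static uint64_t demo_pfn_to_phys_wide(uint32_t pfn)
{
	return (uint64_t)pfn << DEMO_PAGE_SHIFT;
}
```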

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtl8192cu: Fix unbalanced irq enable in error path of rtl92cu_hw_init()

commit 3234f5b06fc3094176a86772cc64baf3decc98fc upstream.

Fixes: a53268be0cb9 ('rtlwifi: rtl8192cu: Fix too long disable of IRQs')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/acpi: allow non-optimus setups to load vbios from acpi

commit a3d0b1218d351c6e6f3cea36abe22236a08cb246 upstream.

There appear to be a crop of new hardware where the vbios is not
available from PROM/PRAMIN, but there is a valid _ROM method in ACPI.
The data read from PCIROM almost invariably contains invalid
instructions (still has the x86 opcodes), which makes this a low-risk
way to try to obtain a valid vbios image.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76475
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/nouveau/pm/fan: drop the fan lock in fan_update() before rescheduling

commit 61679fe153b2b9ea5b5e2ab93305419e85e99a9d upstream.

This should fix a deadlock that has been reported to us where fan_update()
would hold the fan lock and try to grab the alarm_program_lock to reschedule
an update. On another CPU, the alarm_program_lock would have been taken
before calling fan_update(), leading to a deadlock.

We should Cc: <stable@vger.kernel.org> # 3.9+

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Timothée Ravier <tim@siosm.fr>
Tested-by: Boris Fersing (IRC nick fersingb, no public email address)
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

leds: leds-pwm: properly clean up after probe failure

commit 392369019eb96e914234ea21eda806cb51a1073e upstream.

When probing with DT, we add each LED one at a time.  If we find a LED
without a PWM device (because it is not available yet) we fail the
initialisation, unregister previous LEDs, and then by way of managed
resources, we free the structure.

The problem with this is we may have a scheduled and active work_struct
in this structure, and this results in a nasty kernel oops.

We need to cancel this work_struct properly upon cleanup - and the
cleanup we require is the same cleanup as we do when the LED platform
device is removed.  Rather than writing this same code three times,
move it into a separate function and use it in all three places.

Fixes: c971ff185f64 ("leds: leds-pwm: Defer led_pwm_set() if PWM can sleep")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

brcmsmac: fix deadlock on missing firmware

commit 8fc1e8c240aab968db658b2d8d079b4391207a36 upstream.

When brcm80211 firmware is not installed networking hangs.
A deadlock happens because we call ieee80211_unregister_hw()
from the .start callback of struct ieee80211_ops. When .start
is called we are under rtnl lock and ieee80211_unregister_hw()
tries to take it again.

Function call stack:

dev_change_flags()
	__dev_change_flags()
		__dev_open()
			ASSERT_RTNL() <-- Assert rtnl lock
			ops->ndo_open()

.ndo_open = ieee80211_open,

ieee80211_open()
	ieee80211_do_open()
		drv_start()
			local->ops->start()

.start = brcms_ops_start,

brcms_ops_start()
	brcms_remove()
		ieee80211_unregister_hw()
			rtnl_lock() <-- Here we deadlock

Introduced by:
commit 25b5632fb35ca61b8ae3eee235edcdc2883f7a5e
("brcmsmac: request firmware in .start() callback")

This patch fixes the bug by removing the call to brcms_remove()
and moves the brcms_request_fw() call to the top of the .start
callback to not initiate anything unless firmware is installed.

Signed-off-by: Emil Goode <emilgoode@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Documentation: Update stable address in Chinese and Japanese translations

commit 98b0f811aade1b7c6e7806c86aa0befd5919d65f upstream.

The English and Korean translations were updated, the Chinese and Japanese
weren't.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: crypto_wq - Fix late crypto work queue initialization

commit 130fa5bc81b44b6cc1fbdea3abf6db0da22964e0 upstream.

The crypto algorithm modules utilizing the crypto daemon could
be used early when the system starts up.  Using module_init
does not guarantee that the daemon's work queue is initialized
when the crypto algorithm depending on crypto_wq starts.  It is necessary
to initialize the crypto work queue earlier, at subsystem
init time, to make sure that it is initialized
when used.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: vexpress: NULL dereference on error path

commit 6b4ed8b00e93bd31f24a25f59ed8d1b808d0cc00 upstream.

If the allocation fails then we dereference the NULL in the error path.
Just return directly.
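
The pattern can be sketched as below, assuming a hypothetical demo_osc structure and a simulate_oom flag that stands in for a failed allocation. Neither is part of the real driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object, for illustration only. */
struct demo_osc {
	int rate;
};

/*
 * Allocate and initialise an object. If the allocation fails there is
 * nothing to clean up yet, so return directly instead of jumping to an
 * error label that would touch the NULL pointer.
 */
static struct demo_osc *demo_osc_create(int rate, int simulate_oom)
{
	struct demo_osc *osc;

	assert(rate > 0);

	osc = simulate_oom ? NULL : malloc(sizeof(*osc));
	if (!osc)
		return NULL;

	osc->rate = rate;
	return osc;
}
```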

Fixes: ed27ff1db869 ('clk: Versatile Express clock generators ("osc") driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: media-device: fix infoleak in ioctl media_enum_entities()

commit e6a623460e5fc960ac3ee9f946d3106233fd28d8 upstream.

This fixes CVE-2014-1739.

Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: i801: Add Device IDs for Intel Wildcat Point-LP PCH

commit afc659241258b40b683998ec801d25d276529f43 upstream.

This patch adds the SMBus Device IDs for the Intel Wildcat Point-LP PCH.

Signed-off-by: James Ralston <james.d.ralston@intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: "Chang, Rebecca Swee Fun" <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: i801: enable Intel BayTrail SMBUS

commit 1b31e9b76ef8c62291e698dfdb973499986a7f68 upstream.

Add Device ID of Intel BayTrail SMBus Controller.

Signed-off-by: Chew, Kean ho <kean.ho.chew@intel.com>
Signed-off-by: Chew, Chiau Ee <chiau.ee.chew@intel.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: "Chang, Rebecca Swee Fun" <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts

commit 03367ef5ea811475187a0732aada068919e14d61 upstream.

Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality
is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the
VMBUS protocol.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

trace: module: Maintain a valid user count

commit 098507ae3ec2331476fb52e85d4040c1cc6d0ef4 upstream.

The replacement of the 'count' variable by two variables 'incs' and
'decs' to resolve some race conditions during module unloading was done
in parallel with some cleanup in the trace subsystem, and was integrated
as a merge.

Unfortunately, the formula for this replacement was wrong in the tracing
code, and the refcount in the traces was not usable as a result.

Use 'count = incs - decs' to compute the user count.
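
In code form, the formula amounts to the following sketch. The structure demo_module_ref and the helper are hypothetical; only the names incs and decs and the formula itself come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of counters replacing a single 'count' field. */
struct demo_module_ref {
	unsigned long incs;   /* number of references taken */
	unsigned long decs;   /* number of references dropped */
};

/* The user count is the difference between the two counters. */
static unsigned long demo_user_count(const struct demo_module_ref *ref)
{
	assert(ref != NULL);
	return ref->incs - ref->decs;
}
```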

Link: http://lkml.kernel.org/p/1393924179-9147-1-git-send-email-romain.izard.pro@gmail.com

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: c1ab9cab7509 "merge conflict resolution"
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: atkbd - fix keyboard not working on some LG laptops

commit 3d725caa9dcc78c3dc9e7ea0c04f626468edd9c9 upstream.

After issuing ATKBD_CMD_RESET_DIS, keyboard on some LG laptops stops
working. The workaround is to stop issuing ATKBD_CMD_RESET_DIS commands.

In order to keep changes in atkbd driver to the minimum we check DMI
signature and only skip ATKBD_CMD_RESET_DIS if we are running on LG
LW25-B7HV or P1-J273B.

Signed-off-by: Sheng-Liang Song <ssl@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: elantech - fix touchpad initialization on Gigabyte U2442

commit 36189cc3cd57ab0f1cd75241f93fe01de928ac06 upstream.

The hw_version 3 Elantech touchpad on the Gigabyte U2442 does not accept
0x0b as initialization value for r10, this stand-alone version of the
driver: http://planet76.com/drivers/elantech/psmouse-elantech-v6.tar.bz2

Uses 0x03 which does work, so this means not setting bit 3 of r10 which
sets: "Enable Real H/W Resolution In Absolute mode"

Which will result in half the x and y resolution we get with that bit set,
so simply not setting it everywhere is not a solution. We've been unable to
find a way to identify touchpads where setting the bit will fail, so this
patch uses a dmi based blacklist for this.
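
The relationship between the two initialization values can be written down as a short sketch. The macro and function names are hypothetical, and the blacklisted flag stands in for the result of a DMI match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of r10: "Enable Real H/W Resolution In Absolute mode". */
#define DEMO_R10_REAL_HW_RES (1u << 3)

/* Clearing bit 3 of the usual value 0x0b yields the fallback value 0x03. */
static_assert((0x0b & ~DEMO_R10_REAL_HW_RES) == 0x03,
	      "0x0b without bit 3 is 0x03");

/* Pick the r10 initialization value; 'blacklisted' models a DMI match. */
static uint8_t demo_r10_init_value(bool blacklisted)
{
	uint8_t val = 0x0b;

	if (blacklisted)
		val &= (uint8_t)~DEMO_R10_REAL_HW_RES;

	return val;
}
```

Machines that are not on the list keep the full resolution, which is why the bit is cleared only for known-bad touchpads rather than everywhere.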

https://bugzilla.kernel.org/show_bug.cgi?id=61151

Reported-by: Philipp Wolfer <ph.wolfer@gmail.com>
Tested-by: Philipp Wolfer <ph.wolfer@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: synaptics - add min/max quirk for the ThinkPad W540

commit 0b5fe736fe923f1f5e05413878d5990e92ffbdf5 upstream.

https://bugzilla.redhat.com/show_bug.cgi?id=1096436

Tested-and-reported-by: ajayr@bigfoot.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: synaptics - T540p - unify with other LEN0034 models

commit 6d396ede224dc596d92d7cab433713536e68916c upstream.

The T540p has a touchpad with pnp-id LEN0034, all the models with this
pnp-id have the same min/max values, except the T540p where the values are
slightly off. Fix them to be identical.

This is a preparation patch for simplifying the quirk table.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Fix onboard audio on Intel H97/Z97 chipsets

commit 77f07800cb456bed6e5c345e6e4e83e8eda62437 upstream.

The recent Intel H97/Z97 chipsets need setups similar to other
Intel chipsets for snooping, etc.  Especially without snooping, the
audio playback stutters or gets corrupted.  This fix patch just adds
the corresponding PCI ID entry with the proper flags.

Reported-and-tested-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSd: Move default initialisers from create_client() to alloc_client()

commit 5694c93e6c4954fa9424c215f75eeb919bddad64 upstream.

Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSd: call rpc_destroy_wait_queue() from free_client()

commit 4cb57e3032d4e4bf5e97780e9907da7282b02b0c upstream.

Mainly to ensure that we don't leave any hanging timers.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Call ->set_acl with a NULL ACL structure if no entries

commit aa07c713ecfc0522916f3cd57ac628ea6127c0ec upstream.

After setting an ACL for a directory, I hit two problems caused
by the cached zero-length default posix acl.

This patch makes sure nfsd4_set_nfs4_acl calls ->set_acl
with a NULL ACL structure if there are no entries.

Thanks to Christoph Hellwig for the advice.

First problem:
............ hang ...........

Second problem:
[ 1610.167668] ------------[ cut here ]------------
[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
[last unloaded: nfsd]
[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G           OE
3.15.0-rc1+ #15
[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
ffff88005a944000
[ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>]  [<ffffffffa034d5ed>]
_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
[ 1610.168320] RSP: 0018:ffff88005a945b00  EFLAGS: 00010293
[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
0000000000000000
[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
ffff880068233300
[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
0000000000000000
[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
ffff880068233300
[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
ffff880068233300
[ 1610.168320] FS:  0000000000000000(0000) GS:ffff880077800000(0000)
knlGS:0000000000000000
[ 1610.168320] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
00000000000006f0
[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1610.168320] Stack:
[ 1610.168320]  ffffffff00000000 0000000b67c83500 000000076700bac0
0000000000000000
[ 1610.168320]  ffff88006700bac0 ffff880068233300 ffff88005a945c08
0000000000000002
[ 1610.168320]  0000000000000000 ffff88005a945b88 ffffffffa034e2d5
000000065a945b68
[ 1610.168320] Call Trace:
[ 1610.168320]  [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
[ 1610.168320]  [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
[ 1610.168320]  [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
[ 1610.168320]  [<ffffffffa0327962>] ?
nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
[ 1610.168320]  [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
[ 1610.168320]  [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
[ 1610.168320]  [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
[nfsd]
[ 1610.168320]  [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
[ 1610.168320]  [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[ 1610.168320]  [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
[ 1610.168320]  [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
[ 1610.168320]  [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
[ 1610.168320]  [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1610.168320]  [<ffffffff810a5202>] kthread+0xd2/0xf0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320]  [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
[ 1610.168320] RIP  [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
[nfsd]
[ 1610.168320]  RSP <ffff88005a945b00>
[ 1610.257313] ---[ end trace 838254e3e352285b ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: warn on finding lockowner without stateid's

commit 27b11428b7de097c42f205beabb1764f4365443b upstream.

The current code assumes a one-to-one lockowner<->lock stateid
correspondence.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: remove lockowner when removing lock stateid

commit a1b8ff4c97b4375d21b6d6c45d75877303f61b3b upstream.

The nfsv4 state code has always assumed a one-to-one correspondence
between lock stateid's and lockowners even if it appears not to in some
places.

We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.

Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

workqueue: fix bugs in wq_update_unbound_numa() failure path

commit 77f300b198f93328c26191b52655ce1b62e202cf upstream.

wq_update_unbound_numa() failure path has the following two bugs.

- alloc_unbound_pwq() is called without holding wq->mutex; however, if
  the allocation fails, it jumps to out_unlock which tries to unlock
  wq->mutex.

- The function should switch to dfl_pwq on failure but didn't do so
  after alloc_unbound_pwq() failure.

Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on
alloc_unbound_pwq() failure.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

workqueue: fix a possible race condition between rescuer and pwq-release

commit 77668c8b559e4fe2acf2a0749c7c83cde49a5025 upstream.

There is a race condition between rescuer_thread() and
pwq_unbound_release_workfn().

Even after a pwq is scheduled for rescue, the associated work items
may be consumed by any worker.  If all of them are consumed before the
rescuer gets to them and the pwq's base ref was put due to attribute
change, the pwq may be released while still being linked on
@wq->maydays list making the rescuer dereference already freed pwq
later.

Make send_mayday() pin the target pwq until the rescuer is done with
it.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
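
The pinning idea in the commit message above can be illustrated with a toy userspace model. This is only a sketch in Python, not kernel code: the `Pwq` class, the `pin` flag and the `scenario` helper are hypothetical names made up for the illustration, and the real fix works on the kernel's own pwq reference counting.

```python
class Pwq:
    """Toy pool_workqueue with a reference count (hypothetical model)."""

    def __init__(self):
        self.refcnt = 1        # the base reference
        self.released = False

    def get(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.released = True   # models the pwq being freed


def send_mayday(pwq, maydays, pin):
    """Queue pwq for rescue; with pin=True, hold a reference for the rescuer."""
    if pin:
        pwq.get()
    maydays.append(pwq)


def rescuer(maydays, pinned):
    """Drain the list; record whether each pwq was already freed when visited."""
    visited_freed = []
    while maydays:
        pwq = maydays.pop(0)
        visited_freed.append(pwq.released)
        if pinned:
            pwq.put()          # drop the reference taken in send_mayday
    return visited_freed


def scenario(pin):
    """Mayday is sent, the base ref is dropped, then the rescuer runs."""
    pwq, maydays = Pwq(), []
    send_mayday(pwq, maydays, pin)
    pwq.put()                  # base ref put, e.g. due to an attribute change
    return rescuer(maydays, pin), pwq.released
```

In this model, `scenario(False)` has the rescuer visiting an already released pwq, while `scenario(True)` keeps it alive until the rescuer drops its reference.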

workqueue: make rescuer_thread() empty wq->maydays list before exiting

commit 4d595b866d2c653dc90a492b9973a834eabfa354 upstream.

After a @pwq is scheduled for emergency execution, other workers may
consume the affected work items before the rescuer gets to them.  This
means that a workqueue may have pwqs queued on @wq->maydays list
while not having any work item pending or in-flight.  If
destroy_workqueue() executes in such a condition, the rescuer may exit
without emptying @wq->maydays.

This currently doesn't cause any actual harm.  destroy_workqueue() can
safely destroy all the involved data structures whether @wq->maydays
is populated or not as nobody accesses the list once the rescuer exits.

However, this is nasty and makes future development difficult.  Let's
update rescuer_thread() so that it empties @wq->maydays after seeing
should_stop to guarantee that the list is empty on rescuer exit.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bus: mvebu-mbus: allow several windows with the same target/attribute

commit b566e782be32145664d96ada3e389f17d32742e5 upstream.

Having multiple windows with the same target and attribute is actually
legal, and can be useful for PCIe windows, when PCIe BARs have a size
that isn't a power of two, and we therefore need to create several
MBus windows to cover the PCIe BAR for a given PCIe interface.

Fixes: fddddb52a6c4 ('bus: introduce an Marvell EBU MBus driver')
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: https://lkml.kernel.org/r/1397823593-1932-7-git-send-email-thomas.petazzoni@free-electrons.com
Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
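
To make the "several MBus windows for one BAR" case concrete, here is a small arithmetic sketch in Python. It is an illustration only, not the driver's code: the function name, the base address and the BAR size are hypothetical, and it assumes the base is aligned to the largest chunk so that each resulting window is naturally aligned.

```python
def split_into_pow2_windows(base, size):
    """Cover [base, base + size) with power-of-two sized windows, largest first."""
    windows = []
    while size:
        chunk = 1 << (size.bit_length() - 1)   # largest power of two <= size
        windows.append((base, chunk))
        base += chunk
        size -= chunk
    return windows


# Hypothetical 24 MiB BAR: needs a 16 MiB window plus an 8 MiB window.
windows = split_into_pow2_windows(0xE0000000, 0x1800000)
```

Both windows in such a split would share the same target and attribute, which is the situation the commit message describes as legal.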

percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()

commit 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da upstream.

pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)

It could hardly ever be bigger than PAGE_SIZE, even on a large-scale machine,
but for consistency with its counterpart pcpu_mem_zalloc(),
use pcpu_mem_free() instead.

Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use
pcpu_mem_free() instead of kfree()") addressed this problem, but
missed this one.

tj: commit message updated

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
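
The size formula quoted in the percpu commit message can be worked through numerically. The sketch below is a Python rendering of that arithmetic under stated assumptions: a 64-bit machine, and a made-up value for sizeof(struct pcpu_chunk), since the real structure size depends on the kernel build.

```python
BITS_PER_LONG = 64      # assumption: 64-bit machine
SIZEOF_LONG = 8         # bytes, matching the assumption above


def bits_to_longs(nbits):
    """Number of longs needed to hold nbits bits (rounds up)."""
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


def pcpu_chunk_struct_size(sizeof_chunk, unit_pages):
    """sizeof(struct pcpu_chunk) + BITS_TO_LONGS(unit_pages) * sizeof(unsigned long)."""
    return sizeof_chunk + bits_to_longs(unit_pages) * SIZEOF_LONG


# Hypothetical numbers: a 128-byte struct and 1024 pages per unit.
size = pcpu_chunk_struct_size(sizeof_chunk=128, unit_pages=1024)
```

With these illustrative inputs the bitmap adds 16 longs (128 bytes), so the total stays far below a 4096-byte page, in line with the commit message's remark.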

xen-blkfront: revoke foreign access for grants not mapped by the backend

commit fbe363c476afe8ec992d3baf682670a4bd1b6ce6 upstream.

There's no need to keep the foreign access in a grant if it is not
persistently mapped by the backend. This allows us to free grants that
are not mapped by the backend, thus preventing blkfront from hoarding
all grants.

The main effect of this is that blkfront will only persistently map
the same grants as the backend, and it will always try to use grants
that are already mapped by the backend. Also the number of persistent
grants in blkfront is the same as in blkback (and is controlled by the
value in blkback).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen-blkfront: restore the non-persistent data path

commit bfe11d6de1c416cea4f3f0f35f864162063ce3fa upstream.

When persistent grants were added they were always used, even if the
backend doesn't have this feature (there's no harm in always using the
same set of pages). This restores the old data path when the backend
doesn't have persistent grants, removing the burden of doing a memcpy
when it is not actually needed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix up whitespace issues]
Tested-by: Felipe Franciosi <felipe@paradoxo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: wm8962: Update register CLASS_D_CONTROL_1 to be non-volatile

commit 44330ab516c15dda8a1e660eeaf0003f84e43e3f upstream.

The register CLASS_D_CONTROL_1 is marked as volatile because it contains
a bit, DAC_MUTE, which is also mirrored in the ADC_DAC_CONTROL_1
register. This causes problems for the "Speaker Switch" control, which
will report an error if the CODEC is suspended because it relies on a
volatile register.

To resolve this issue mark CLASS_D_CONTROL_1 as non-volatile and
manually keep the register cache in sync by updating both bits when
changing the mute status.

Reported-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

metag: fix memory barriers

commit 2425ce84026c385b73ae72039f90d042d49e0394 upstream.

Volatile access doesn't really imply a compiler barrier. Volatile access
is only ordered with respect to other volatile accesses; it isn't ordered
with respect to general memory accesses. Gcc may reorder memory accesses
around volatile acc…

placiano pushed a commit to placiano/NBKernel_Lollipop that referenced this issue Mar 8, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:

zfsonlinux/spl#241
http://tracker.ceph.com/issues/7790

The reason is that ZFS will send pages with page_count 0 into rbd, which in
turn sends them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.

This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix addresses the issue by falling
back to the slower sendmsg when a page with page_count 0 is detected.

Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
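
The failure mode and the fallback described in this commit message can be sketched with a toy reference-count model. This is a userspace Python illustration, not the libceph code: the `Page` class and the `sendpage_zero_copy` and `send` helpers are hypothetical, and only the get_page/put_page behaviour and the "fall back to sendmsg when page_count is 0" rule are taken from the text above.

```python
class Page:
    """Toy stand-in for a kernel page with a reference count."""

    def __init__(self, refcount):
        self.refcount = refcount
        self.freed = False


def get_page(page):
    page.refcount += 1


def put_page(page):
    page.refcount -= 1
    if page.refcount == 0:
        page.freed = True      # last reference dropped: page is released


def sendpage_zero_copy(page):
    """Model of a zero-copy send that takes and drops a page reference."""
    get_page(page)
    put_page(page)


def send(page):
    """Choose the path the fix describes: copy when page_count is 0."""
    if page.refcount == 0:
        return "sendmsg"       # slower copying path, refcount untouched
    sendpage_zero_copy(page)
    return "sendpage"


# Without the check, a page that arrives with count 0 is freed under its owner.
unchecked = Page(refcount=0)
sendpage_zero_copy(unchecked)

# With the check, the same kind of page takes the copying path and survives.
checked = Page(refcount=0)
path = send(checked)
```

A page that arrives with a normal, non-zero count still takes the zero-copy path in this model and ends with its count unchanged.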

shminer added a commit to shminer/android_kernel_lge_f460 that referenced this issue Mar 9, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Pafcholini added a commit to Pafcholini/Nadia-kernel-LL-N910F-EUR-LL-OpenSource that referenced this issue Mar 11, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Pafcholini added a commit to Pafcholini/Nadia-kernel-LL--Update-Linux-N910F-EUR-LL-OpenSource that referenced this issue Mar 11, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

SkrilaxCZ added a commit to razrqcom-dev-team/android_kernel_motorola_apq8084 that referenced this issue Mar 20, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

javilonas added a commit to javilonas/Lonas_KL-SM-G901F that referenced this issue May 15, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Signed-off-by: placiano <placiano80@gmail.com>
Signed-off-by: Pafcholini <nadyaivanova14@gmail.com>

javilonas added a commit to javilonas/Lonas_KL-SM-G901F that referenced this issue May 17, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

cm-gerrit pushed a commit to CyanogenMod/android_kernel_motorola_apq8084 that referenced this issue May 19, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

hohoxu pushed a commit to hohoxu/n5kernel that referenced this issue Jul 15, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.

KangDroid pushed a commit to Hexa-Project-Device/android_kernel_moto_shamu_Nougat that referenced this issue Aug 18, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.

crimsonthunder pushed a commit to crimsonthunder/kernel_samsung_trlte that referenced this issue Aug 29, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Signed-off-by: placiano <placiano80@gmail.com>
Signed-off-by: Pafcholini <nadyaivanova14@gmail.com>

Pafcholini added a commit to Pafcholini/Beta_TW that referenced this issue Sep 5, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Signed-off-by: placiano <placiano80@gmail.com>
Signed-off-by: Pafcholini <nadyaivanova14@gmail.com>

gchild320 added a commit to gchild320/flounder-lp that referenced this issue Sep 29, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

crimsonthunder pushed a commit to crimsonthunder/kernel_samsung_trlte_5.1.1 that referenced this issue Oct 3, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Signed-off-by: placiano <placiano80@gmail.com>
Signed-off-by: Pafcholini <nadyaivanova14@gmail.com>

crimsonthunder pushed a commit to crimsonthunder/kernel_samsung_trlte_5.1.1 that referenced this issue Oct 13, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

Signed-off-by: placiano <placiano80@gmail.com>
Signed-off-by: Pafcholini <nadyaivanova14@gmail.com>

gchild320 added a commit to gchild320/flounder that referenced this issue Oct 24, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

nychitman1 added a commit to nychitman1/android_kernel_htc_flounder that referenced this issue Nov 7, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

nychitman1 added a commit to nychitman1/android_kernel_htc_flounder that referenced this issue Nov 8, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

u9621071 pushed a commit to u9621071/kernel-uek-UEK3 that referenced this issue Nov 16, 2015

libceph: fix corruption when using page_count 0 page in rbd
Orabug: 19573338

commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

(cherry picked from commit a757a4e215574f2c92fc990275fa5e02159771e1)

Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

Conflicts:

	net/ceph/messenger.c
(cherry picked from commit 274466491645964c76ce54a41c8869eab25cf84c)

Signed-off-by: Dan Duval <dan.duval@oracle.com>

artefvck added a commit to artefvck/artefvck that referenced this issue Dec 24, 2015

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

aow1980 added a commit to TeamTwisted/hells-Core-N6 that referenced this issue Jan 24, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

dfuse06 added a commit to dfuse06/android_kernel_htc_flounder_old that referenced this issue Feb 22, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

dfuse06 added a commit to dfuse06/android_kernel_htc_flounder_old that referenced this issue Feb 23, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

dfuse06 added a commit to dfuse06/android_kernel_htc_flounder_old that referenced this issue Feb 23, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


sashalevin added a commit to sashalevin/linux-stable-security that referenced this issue Apr 29, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

sashalevin added a commit to sashalevin/linux-stable-security that referenced this issue Apr 29, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.


Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

sashalevin added a commit to sashalevin/linux-stable-security that referenced this issue Apr 29, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

sashalevin added a commit to sashalevin/linux-stable-security that referenced this issue Apr 29, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.


Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

mattgorski added a commit to Jetson-TK1-AndroidTV/android_kernel_nvidia_jetson_l4t_21.4 that referenced this issue Apr 30, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


nikhil18 added a commit to nikhil18/lightning-kernel-bacon that referenced this issue May 16, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.


crimsonthunder pushed a commit to crimsonthunder/cm_kernel that referenced this issue Jun 20, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


StefanescuCristian added a commit to StefanescuCristian/shamu that referenced this issue Jul 25, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


USA-RedDragon added a commit to USA-RedDragon/Werewolf-trltetmo that referenced this issue Sep 5, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


muhviehstah added a commit to muhviehstah/N915FY-MM-Kernel that referenced this issue Nov 21, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


IonKiwi added a commit to IonKiwi/android_kernel_samsung_kccat6 that referenced this issue Nov 28, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


IonKiwi added a commit to IonKiwi/android_kernel_samsung_kccat6 that referenced this issue Dec 30, 2016

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


crimsonthunder pushed a commit to crimsonthunder/TRLTE_AOSP_Kernel that referenced this issue Jan 9, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


IonKiwi added a commit to IonKiwi/android_kernel_samsung_kccat6 that referenced this issue Jan 28, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


EfranDev added a commit to TeamAlto45/android_kernel_alcatel_msm8916 that referenced this issue Feb 12, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


IonKiwi added a commit to IonKiwi/android_kernel_samsung_kccat6 that referenced this issue Feb 26, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


lineageos-gerrit pushed a commit to LineageOS/android_kernel_samsung_apq8084 that referenced this issue Apr 24, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


Signed-off-by: Corinna Vinschen <xda@vinschen.de>

rockinroyle added a commit to aospdk/shamu that referenced this issue Jun 29, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


rockinroyle added a commit to aospdk/shamu that referenced this issue Jul 1, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


github-cygwin added a commit to github-cygwin/android_kernel_samsung_apq8084 that referenced this issue Jul 6, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


TheRingMaster added a commit to GZR-Kernels/kernel_moto_shamu that referenced this issue Aug 20, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


kevintm78 added a commit to kevintm78/Back_Alley that referenced this issue Sep 5, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


engine95 added a commit to engine95/S2-710-2DQCL-Nougat that referenced this issue Sep 26, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


engine95 added a commit to engine95/S2-815-2CQCL-Nougat that referenced this issue Sep 28, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


engine95 added a commit to engine95/S2-715-2CQCL-Nougat that referenced this issue Sep 28, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


engine95 added a commit to engine95/S2-810-2DQCL-Nougat that referenced this issue Sep 28, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.


elektroschmock added a commit to elektroschmock/android_kernel_moto_shamu that referenced this issue Oct 4, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda2 upstream.


camcory added a commit to camcory/android_kernel_moto_shamu that referenced this issue Oct 6, 2017

libceph: fix corruption when using page_count 0 page in rbd
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.
