Add support for AR5BBU22 [0489:e03c] #17

Closed: WNeZRoS wants to merge 1 commit


WNeZRoS commented May 11, 2012

No description provided.

WNeZRoS closed this May 11, 2012

Owner

torvalds commented May 11, 2012

I don't do github pull requests.

github throws away all the relevant information, like having even a
valid email address for the person asking me to pull. The diffstat is
also deficient and useless.

Git comes with a nice pull-request generation module, but github
instead decided to replace it with their own totally inferior version.
As a result, I consider github useless for these kinds of things. It's
fine for hosting, but the pull requests and the online commit
editing, are just pure garbage.

I've told github people about my concerns, they didn't think they
mattered, so I gave up. Feel free to make a bugreport to github.

                Linus

On Fri, May 11, 2012 at 4:27 AM, Roman <reply@reply.github.com> wrote:

> You can merge this Pull Request by running:
>
>   git pull https://github.com/WNeZRoS/linux master
>
> Or you can view, comment on it, or merge it online at:
>
>   #17
>
> -- Commit Summary --
>
>   • Add support for AR5BBU22 [0489:e03c]
>
> -- File Changes --
>
>   M drivers/bluetooth/btusb.c (3)
>
> -- Patch Links --
>
>   https://github.com/torvalds/linux/pull/17.patch
>   https://github.com/torvalds/linux/pull/17.diff
>
> Reply to this email directly or view it on GitHub:
> #17

How do you feel about merging in things that may include commits downstream that have been pull requested with github? Seems hard to stop that.

Somebody please look at the diff. That's a simple 3-line code addition. I agree with you, @torvalds, but you could have made an exception this time :)

By the way, it's quite funny that github is sending instructions to @torvalds on using git.

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 1:03 PM, orblivion <reply@reply.github.com> wrote:

> How do you feel about merging in things that may include commits downstream that have been pull requested with github? Seems hard to stop that.

Read my email.

I have no problem with people using github as a hosting site.

But in order for me to pull from github, you need to

(a) make a real pull request, not the braindamaged crap that github
does when you ask it to request a pull: real explanation, proper email
addresses, proper shortlog, and proper diffstat.

(b) since github identities are random, I expect the pull request to
be a signed tag, so that I can verify the identity of the person in
question.
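As a rough sketch of requirements (a) and (b), the Python helper below only assembles the git command lines for a signed-tag pull request; it does not run them. The tag name, the `origin/master` base, and pushing the tag straight to the hosting URL are illustrative assumptions, not a workflow prescribed in this thread, and `git tag -s` needs a configured GPG key. `git request-pull` prints the base commit, the URL and tag to pull, the shortlog, and the diffstat; that text is what would then be emailed to the maintainer.

```python
import shlex


def signed_pull_request_commands(tag: str, message: str, base: str, url: str) -> list[list[str]]:
    """Return the git commands (argv lists) for a signed-tag pull request.

    All four arguments are caller-supplied placeholders; nothing is executed here.
    """
    return [
        # Annotated, GPG-signed tag so the maintainer can verify who is asking.
        ["git", "tag", "-s", tag, "-m", message],
        # Publish the tag where the maintainer can fetch it.
        ["git", "push", url, tag],
        # Print the pull request text: base commit, URL + tag, shortlog, diffstat.
        ["git", "request-pull", base, url, tag],
    ]


if __name__ == "__main__":
    # Hypothetical tag name and base; only the repository URL comes from this thread.
    for argv in signed_pull_request_commands(
        "btusb-ar5bbu22",
        "Bluetooth: add support for AR5BBU22 [0489:e03c]",
        "origin/master",
        "https://github.com/WNeZRoS/linux",
    ):
        print(shlex.join(argv))
```

Run in order from a local clone, the last command's output is the pull request to send by email.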

I also refuse to pull commits that have been made with the github web
interface. Again, the reason for that is that the way the github web
interface works, those commits are invariably pure crap. Commits done
on github invariably have totally unreadable descriptions, because the
github commit making thing doesn't do any of the simplest things
that the kernel people expect from a commit message:

  • no "short one-line description in the first line"
  • no sane word-wrap of the long description you type: github commit
    messages tend to be (if they have any description at all) one long
    unreadable line.
  • no sign-offs etc that we require for kernel submissions.

github could make it easy to write good commit messages and enforce
the proper "oneliner for shortlogs and gitk, full explanation for full
logs". But github doesn't. Instead, the github "commit on the web"
interface is one single horrible text-entry field with absolutely no
sane way to write a good-looking message.

Maybe some of this has changed, I haven't checked lately. But in
general, the quality of stuff I have seen from people who use the
github web interfaces has been so low that it's not worth my time.

I'm writing these explanations in the (probably vain) hope that people
who use github will actually take them to heart, and github will
eventually improve. But right now github is a total ghetto of crap
commit messages and unreadable and unusable pull requests.

And the fact that other projects apparently have so low expectations
of commit messages that these things get used is just sad. People
should try to compare the quality of the kernel git logs with some
other projects, and cry themselves to sleep.

               Linus
Owner

torvalds commented May 11, 2012

Btw, Joseph, you're a quality example of why I detest the github
interface. For some reason, github has attracted people who have zero
taste, don't care about commit logs, and can't be bothered.

The fact that I have higher standards then makes people like you make
snarky comments, thinking that you are cool.

You're a moron.

               Linus

skalnik commented May 11, 2012

@torvalds The GitHub commit UI provides a text area for commit messages. This supports new lines and makes it easy to do nicely formatted commit messages :)

jedahan commented May 11, 2012

@skalnik would be nice if it had an 80-character line to help format things nicely.

Every time another Pull Request fiasco happens on one of Linus's repos it makes me sad, especially because I want someone whose work I greatly respect to have a good experience on GitHub; instead he gets dozens of troll comments.

An OS kernel very rightfully demands a very disciplined approach to development that is in many ways not compatible with the goals of GitHub, which is to get as many people of all skill levels involved in Free / Open Source Software. We can certainly make improvements though, and I appreciate that Linus has taken some time to detail exactly why he doesn't use PRs, even if it's a bit harsh.

tubbo commented May 11, 2012

> - no sane word-wrap of the long description you type: github commit
> messages tend to be (if they have any description at all) one long
> unreadable line.

I think this is only because people who are new to Git are using GitHub and not understanding about Git-style committing. Remember, a lot of these newbies are fresh from years of using SVN. I bet a lot of them don't even realize that running git commit without "-m" just opens COMMIT_EDITMSG in your editor. Newbies often aren't even aware of the 50-char title rule and the 72-char rule for every other line of a commit message.

> github *could* make it easy to write good commit messages and enforce
> the proper "oneliner for shortlogs and gitk, full explanation for full
> logs". But github doesn't. Instead, the github "commit on the web"
> interface is one single horrible text-entry field with absolutely no
> sane way to write a good-looking message.

I have to agree with you there. Commit message viewing on Github sucks and I hope they change it soon.

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 1:29 PM, Mike Skalnik <reply@reply.github.com> wrote:

> @torvalds The GitHub commit UI provides a text area for commit messages. This supports new lines and makes it easy to do nicely formatted commit messages :)

No it doesn't.

What it supports is writing long lines that you have not a f*cking
clue how long they are. The text area does not do line breaks for you,
and you have no way to judge where the line breaks would go.

In other words, it makes it very hard indeed to do "nicely formatted
commit messages". It also doesn't enforce the trivial "oneliner for
shortlog" model, so the commit messages often end up looking like
total crap in shortlogs and in gitk.

So the github commit UI should have

  • separate "shortlog" one-liner text window, so that people cannot
    screw that up.
  • some way to actually do sane word-wrap at the standard 72-column mark.
  • reminders about sign-offs etc that some projects need for
    project-specific or even legal reasons.

It didn't do any of those last time I checked.

              Linus
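A minimal Python sketch of the three suggestions above: the summary is a separate single-line argument so it cannot be mangled, each prose paragraph is re-flowed at the 72-column mark with `textwrap.fill`, and a `Signed-off-by:` trailer is appended when one is given (with a printed reminder when it is not). The function name and interface are hypothetical; this is not an existing GitHub or git feature.

```python
import textwrap
from typing import Optional


def compose_commit_message(summary: str, paragraphs: list[str],
                           signoff: Optional[str] = None, width: int = 72) -> str:
    """Build a commit message: one-line summary, wrapped body, optional sign-off."""
    # The summary is its own field, so it can be checked for being one line.
    if not summary.strip() or "\n" in summary:
        raise ValueError("summary must be a single non-empty line")
    parts = [summary.strip()]
    for para in paragraphs:
        if para.strip():
            # Collapse the paragraph's whitespace and re-flow it at `width` columns.
            parts.append(textwrap.fill(" ".join(para.split()), width=width))
    if signoff is None:
        # A reminder rather than an error: not every project requires sign-offs.
        print("note: no Signed-off-by: line added")
    else:
        parts.append(f"Signed-off-by: {signoff}")
    # Blank lines separate the summary, each paragraph, and the trailer.
    return "\n\n".join(parts) + "\n"
```

Keeping the summary as a separate parameter is the point of the design: a single free-form text area cannot stop the first line from running on.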

jedahan commented May 11, 2012

I always thought of the title of a pull request as the one-liner ...

jrep commented May 11, 2012

Newbie question I know, but can someone point me to this "nice pull-request generation module" Linus mentions? My google fu, documentation fu, and command-line-help fu all failed.

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 1:40 PM, Tom Scott <reply@reply.github.com> wrote:

>   • no sane word-wrap of the long description you type: github commit
>     messages tend to be (if they have any description at all) one long
>     unreadable line.
>
> I think this is only because people who are new to Git are using GitHub and not understanding about Git-style committing.

The thing is, even if you do understand about git-style committing,
it's actually really hard to do that with the github web interface.

The best way to do it is literally to open up another text editor
for the commit message, and then cut-and-paste the end result into the
web interface text tool.

Yes, commit messages should have proper word-wrap, with empty lines in
between paragraphs, and at the same time sometimes you need a long
line without word-wrap (compiler error messages or other "non-prose"
explanation).

And yes, that would almost require some kind of "markup" format with
quoting markers etc. And yes, it would be a more complex model of
writing commit messages. But if the default is "word-wrap at 72
characters, put empty lines in between paragraphs", then people who
don't know about the markup would still on average get better results
(even if the word-wrap would then occasionally be the wrong thing to
do)

Right now, github simply seems to default to "broken horrible
messages", and make it really really hard to do a good job.

And I think it should default to "nice readable messages" with some
effort needed for special things.

            Linus
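The default described above (wrap prose at 72 columns, keep empty lines between paragraphs, leave "non-prose" material alone) can be approximated in a few lines of Python. The markup is an assumption made for illustration: a paragraph whose first line is indented is treated as quoted material, such as a compiler error or oops report, and is copied verbatim.

```python
import textwrap


def wrap_commit_body(text: str, width: int = 72) -> str:
    """Re-flow prose paragraphs at `width` columns; keep indented blocks verbatim."""
    out = []
    for para in text.strip("\n").split("\n\n"):
        para = para.strip("\n")
        if not para.strip():
            continue  # skip empty paragraphs left by extra blank lines
        if para[0] in " \t":
            # Assumed markup: an indented paragraph is non-prose (compiler
            # output, oops reports), so its long lines are kept as-is.
            out.append(para)
        else:
            # Prose: collapse whitespace and wrap; never split long words
            # such as URLs, even if that leaves a line over `width`.
            out.append(textwrap.fill(" ".join(para.split()), width=width,
                                     break_long_words=False,
                                     break_on_hyphens=False))
    return "\n\n".join(out) + "\n"
```

As Linus notes, a default like this is occasionally wrong: an unindented paragraph that should have kept its line breaks gets re-flowed anyway.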

@jrep: I believe he's referring to git-request-pull.

nugend commented May 11, 2012

I'm not sure I understand why the commit message itself should be hard word-wrapped. Naively, it seems like that should be a display property of the editor used to write the commit message or the tool used to display the commit message.

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 1:48 PM, Dominik Dabrowski <reply@reply.github.com> wrote:

> You might have fun raging on the internet, but I think your goals would be better served if you expressed your thoughts in a clear (maybe even polite) manner that doesn't embarrass the people whose actions you're trying to influence.

Umm. I think I've been able to reach my goals on the internet better
than most people.

The fact that I'm very clear about my opinions is probably part of it.

If people get offended by accurate portrayals of the current state of
github pull requests, that's their problem.

I hate that whole "victim philosophy". The truth shouldn't be sugarcoated.

                    Linus

scomma commented May 11, 2012

While I do have great respect for you @torvalds and your work, and it's totally valid for the repository of Linux to have rather rigorous standards, have you considered the possibility there could be a lot of GitHub users who don't really need nor care about any of those "features" you try to portray as objectively superior?

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 1:49 PM, Daniel Nugent <reply@reply.github.com> wrote:

> I'm not sure I understand why the commit message itself should be hard word-wrapped. Naively, it seems like that should be a display property of the editor used to write the commit message or the tool used to display the commit message.

No it shouldn't.

Word-wrapping is a property of the text. And the tool you use to
visualize things cannot know. End result: you do word-wrapping at the
only stage where you can do it, namely when writing it. Not when
showing it.

Some things should not be word-wrapped. They may be some kind of
quoted text - long compiler error messages, oops reports, whatever.
Things that have a certain specific format.

The tool displaying the thing can't know. The person writing the
commit message can. End result: you'd better do word-wrapping at
commit time, because that's the only time you know the difference.

Sure, the alternative would be to have commit messages be some
non-pure-textual format (html or similar). But no, that's not how git
does things. Sure, technically it could, but realistically the rule is
simple: we use 72-character columns for word-wrapping, except for
quoted material that has a specific line format.

(And the rule is not 80 characters, because you do want to allow the
standard indentation from git log, and you do want to leave some room
for quoting).

Anyway, you are obviously free to do your commit messages any way you
want. However, these are the rules we try to follow in the kernel, and
in git itself.

And quite frankly, anybody who thinks they have better rules had
better prove their point by showing a project with better commit
messages. Quite frankly, I've seen a lot of open-source projects, and
I have yet to see any project that does a better job of doing good
commit messages than the kernel or git. And I've seen a lot of
projects that do much worse.

So I would suggest taking the cue for good log messages from projects
that have proven that they really can do good log messages. Linux and
git are both good examples of that.

             Linus
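The column arithmetic behind "72, not 80" is easy to check. This Python sketch assumes the default `git log` format, which indents message lines by four spaces, and email-style quoting with a two-character `"> "` prefix per level; under those assumptions a full 72-column line still fits an 80-column terminal after two levels of quoting.

```python
WRAP_COLUMN = 72      # kernel/git convention for commit message bodies
GIT_LOG_INDENT = 4    # default `git log` output indents each message line
QUOTE_PREFIX = "> "   # one level of email-style quoting


def displayed_width(line: str, quote_levels: int = 0) -> int:
    """Columns used by `line` after git log indentation and `quote_levels` quotes."""
    return len(QUOTE_PREFIX) * quote_levels + GIT_LOG_INDENT + len(line)


full_line = "x" * WRAP_COLUMN
for levels in range(3):
    print(levels, displayed_width(full_line, levels))
# prints:
# 0 76
# 1 78
# 2 80
```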

If you add .patch onto this URL you'll get a git-am style patch.

(Github is very silly for not exposing this in the interface, and for not even really mentioning this feature.)
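Assuming the `.patch` URL returns the same mbox-style text as `git format-patch` (the format `git am` consumes), the author, date, and subject can be recovered with Python's standard `email` package. This is an illustrative sketch and does not fetch anything over the network.

```python
from email import message_from_string
from email.utils import parseaddr


def patch_metadata(patch_text: str) -> dict:
    """Extract author, date, subject and body from a git-am style patch."""
    # The leading "From <sha> <date>" mbox line is parsed as the Unix-from line.
    msg = message_from_string(patch_text)
    name, address = parseaddr(msg["From"])
    return {
        "author": name,
        "email": address,
        "date": msg["Date"],
        "subject": msg["Subject"],  # still carries the "[PATCH]" prefix
        "body": msg.get_payload(),  # commit message, "---", diffstat, then the diff
    }
```

Long subjects may be folded across header lines in such files, so a real tool would also unfold them before display.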

I agree with you on the messages, I wish the text areas were at least monospaced.

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 2:01 PM, Prathan Thananart <reply@reply.github.com> wrote:

> While I do have great respect for you @torvalds and your work, and it's totally valid for the repository of Linux to have rather rigorous standards, have you considered the possibility there could be a lot of GitHub users who don't really need nor care about any of those "features" you try to portray as objectively superior?

Sure.

And when those people with lower standards try to get their commits
included in the kernel, I will ridicule them and point out how broken
their commit messages or pull requests are.

Agreed?

Btw, the commit message rules we use in the kernel really are
objectively better. The fact that some other projects don't care that
much is fine. But just compare kernel message logs to other projects,
and I think you'll find that no, it's not just "my opinion". We do
have standards, and the standards are there to make for better logs.

               Linus
Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 2:03 PM, Mahmut Bulut <reply@reply.github.com> wrote:

> So, if you can be "impolite", dear @torvalds, can we ask why the "linux kernel" is here?

.. because I think github does some things very well.

So sure, you may think I hate github. I don't. I hate very specific
parts of github that I think are done badly.

But other parts are done really really well.

I think github does a stellar job at the actual hosting part. I
really do. There is no question in my mind that github is one of the
absolute best places to host a project. It's fast, it's efficient, it
works, and it's available to anybody.

That's wonderful. I think github is absolutely lovely in many respects.

And that then makes me really annoyed at the places where I think
github does a subpar job: pull requests and committing changes using
the web interface.

            Linus

> Word-wrapping is a property of the text. And the tool you use to
> visualize things cannot know. End result: you do word-wrapping at the
> only stage where you can do it, namely when writing it. Not when
> showing it.

Just curious - why is it that the tool used to visualize things cannot know how to wrap text it displays? And if it is the case, isn't that a problem with the viewer itself, rather than a reason to hard wrap?

Commit messages must be limited to 140 characters, like tweets. Right in git's core.

(See what I did there? What's “pure garbage” for you is just perfect for a lot of people.)

@torvalds Thank you for your rational and good opinion. I appreciate you.

Do you guys not understand that this is Linus' blessed repository and he can accept and reject whomever and whichever request he likes? He has specific and pertinent rules when it comes to merging that he's learned over 20 years of maintaining the Linux kernel. He developed git - in case you forgot, he was the initial developer - with features specifically for gpg signoffs, shortlogs, etc. - things he and other intelligent computer scientists find useful for maintaining repositories.

I've maintained small projects with three developers plus myself, and as soon as you become loose with your merging criteria, the entire repository goes to hell. If he wants gpg signoffs, then he'll get gpg signoffs. Try maintaining 20 million lines of code and merging requests from 2,000 developers, and then you can give Linus advice.

I think @torvalds is a pretty cool guy. eh scolds githubs and doesnt afraid of anything.

Contributor

MostAwesomeDude commented May 11, 2012

> While I do have great respect for you @torvalds and your work, and it's totally valid for the repository of Linux to have rather rigorous standards, have you considered the possibility there could be a lot of GitHub users who don't really need nor care about any of those "features" you try to portray as objectively superior?

"GitHub is the best place to share code with friends, co-workers,
classmates, and complete strangers." As long as GH actually, genuinely
cares about making this statement true, they should be providing these
features.

Roman, in the future, you should follow the kernel's guide for
submitting patches. I believe that drivers/bluetooth is covered by the
list at linux-bluetooth@vger.kernel.org and you can submit your patch
to them, with a proper Signed-off-by tag.

FWIW, Reviewed-by: Corbin Simpson MostAwesomeDude@gmail.com, but
there's no way to confirm that since GH is going to hide my email
address and I can't easily sign this message.

(As an example of broken UI, while writing this message, I split my
screen between Firefox and vim, vertically. Linus' messages, being
wrapped, were perfectly readable, but because Github has a massive
minimum width, I had to scroll back and forth in order to read everybody
else's messages.)

Contributor

ivyl commented May 11, 2012

@mmorris-gc
Sure, tools can do that, but at what cost?
Messages are mostly read in a terminal, not via a web interface.

How do you distinguish the parts that should be wrapped from the ones
that shouldn't? Add extra tags?

Commit logs are mostly viewed in terminals, which tend to use
monospace fonts.

What about quoting? ">" markers are clean and indicate the
level of quoting.

These ideas have been used in email for years, and guess what?
They work!
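To illustrate how little a tool needs to understand `>` quoting, here is a small Python sketch that recovers the quoting level and the unquoted text of one line. Treating each `>` (with optional surrounding spaces) as one level is an assumption; mail clients differ in the details.

```python
def quote_depth(line: str) -> tuple[int, str]:
    """Return (quoting level, text without the quote markers) for one line."""
    depth = 0
    rest = line
    while rest.lstrip(" ").startswith(">"):
        depth += 1
        rest = rest.lstrip(" ")[1:]   # drop one '>' marker
        if rest.startswith(" "):
            rest = rest[1:]           # and the single space that usually follows it
    return depth, rest


for sample in ["plain text", "> reply", "> > older reply", ">>terse"]:
    print(quote_depth(sample))
# (0, 'plain text')
# (1, 'reply')
# (2, 'older reply')
# (2, 'terse')
```

Only one space is removed after each marker, so indentation inside quoted material (a pasted log, for example) survives.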

@mmorris-gc It's open source. Fork it and write a custom viewer for yourself. Problem solved.

mephux commented May 11, 2012

Amen for the "victim philosophy" comment. If you want to commit or suggest features, get ready for feedback. People need to seriously stop crying when others are blunt with them; it's pathetic. (Not everyone has time to consider the infinite ways you may interpret something.)

I'd have to say I fully agree with @torvalds. I've worked under very strict commit standards and under very loose ones, and by far my experience was a lot better with well-formatted, standard commit messages. Github does not handle this at all.

Some say that "people don't care"; that's mostly because they don't know what they are missing. If it were more convenient to use good standards, everyone would use them.

jite commented May 11, 2012

Sometimes I wonder if the ones who like a massive one-liner as commit message are Windows users...

@ivyl

> Sure, tools can do that, but at what cost?

I don't know what the cost is, but I'd be interested to know! That's why I was asking what prevents the tool from doing this rather than requiring that the user handle it.

@factormystic Not sure what this has to do with my question. I was just wondering if there was a reason that the viewer couldn't handle it; I wasn't complaining or asking someone to fix it for me.

jnavila commented May 11, 2012

Sad that there is no option to disable pull requests via github

skalnik commented May 11, 2012

@torvalds It is indeed a text area.
On top of this, vim/emacs/$EDITOR does not usually enforce the commit format either. In both cases it's up to the end user to write a well styled commit message.

That being said, I agree it could be better. Perhaps if it was more like the commit form that the GitHub application has.

Since this seems so important, perhaps git should enforce this style by rejecting any commits with a message that does not adhere to your specification?
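Git can already reject such commits locally through a `commit-msg` hook: git runs the executable `.git/hooks/commit-msg` with the path of the proposed message file as its only argument, a non-zero exit status aborts the commit, and `git commit --no-verify` bypasses it. The Python hook below is a sketch. Its thresholds (50-column summary, 72-column body) and the rule that indented lines may be long are assumptions taken from this thread, not anything git enforces by default.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook that rejects badly formatted commit messages."""
import sys


def message_errors(text: str, summary_max: int = 50, width: int = 72) -> list[str]:
    # Ignore comment lines from the editor template (assumes the default '#').
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    while lines and not lines[-1].strip():
        lines.pop()  # trailing blank lines do not count
    if not lines or not lines[0].strip():
        return ["empty summary line"]
    errors = []
    if len(lines[0]) > summary_max:
        errors.append(f"summary is longer than {summary_max} characters")
    if len(lines) > 1 and lines[1].strip():
        errors.append("the line after the summary must be blank")
    for number, ln in enumerate(lines[2:], start=3):
        # Indented lines are treated as quoted output and may exceed the width.
        if len(ln) > width and not ln[:1].isspace():
            errors.append(f"line {number} is longer than {width} characters")
    return errors


def main(message_file: str) -> int:
    with open(message_file, encoding="utf-8") as f:
        errors = message_errors(f.read())
    for error in errors:
        print(f"commit-msg: {error}", file=sys.stderr)
    return 1 if errors else 0


# Git passes exactly one argument: the path of the message file.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

One caveat: with `git commit -v` the diff is appended below a scissors line, and this sketch would check those lines too; a real hook would stop reading there.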

camdez commented May 11, 2012

> why is it that the tool used to visualize things cannot know how to wrap text it displays?

@mmorris-gc That was actually covered by @torvalds above when he said:

> Some things should not be word-wrapped. They may be some kind of
> quoted text - long compiler error messages, oops reports, whatever.

Not only would it be a tremendous burden for every viewing tool to try and determine which items meet the above definition (and do so correctly), but many of the tools we use are generic, whereas the formatting rules might depend on what domain the material came from, making it literally impossible to display things correctly under all conditions.

@camdez Interesting. Still seems like a problem that could be solved by better tooling, but I appreciate you taking the time to point that out. Thanks!

@jnavila there's a way to disable pull requests on GitHub: they call it private repos.

So sad seeing someone who made a great system raging like a child because no one and no system can be like him or how he wants.

antirez commented May 11, 2012

@torvalds other than "form" of pull requests what I'm even more worried about is that this new model of contributing code bypasses the former interaction that there is in a mailing list. If the hub of a project is the ML there are better chances that things are discussed before turning into code that will be refused. Even when the approach starts with a patch, it gets publicly discussed by interested parties, and a long term trace remains in the ML archive. It's a pretty different way of doing this, that was used to build a lot of code with success, and one that works better for a project where patches and new ideas are scrutinized in depth before being accepted.

@torvalds I would like to take this opportunity to say thanks for Linux and git. Without both of those, this great coding community wouldn't have had a chance.

I'd also like to point out something else GitHub does really well. This. What we are doing right now. Socially coding in an open environment. Talking about things, being connected. Hell, when I was growing up I never thought I'd get a chance to say something that Linus effing Torvalds would read and possibly comment on, and now here I am, able to put in my two cents (in a flood of thousands of pennies). So thank you. Thank you, Linus, for making git and Linux, and thank you, GitHub, for making coding social.

jnavila commented May 11, 2012

@leobalter No: disabling pull requests does not mean making a repo private. Like many other open-source projects, the Linux kernel has its own workflow, so why not follow it? GitHub is aware of it; they even mention it in the Pro Git book.

And before "raging like a child" about his comments, read them again: he just does not care or bother.

My own preferred solution would be if GitHub kept to one commit message box but live previewed how it would appear below with 72 character wrap. Then you could see clearly what the short and long messages would look and could adjust accordingly (this is done in Stack Overflow and is very helpful).

The last issue is that monospace is required to view / wrap correctly. A natural way to handle this is to use the markdown four space indent syntax, but since this could get annoying it might be better to have an input type pulldown (text vs markdown) in the same way editing GitHub wikis allows.

@jnavila github has its pull requests as they are. Maybe not everyone follows Linus's "high standards", but it's great in my workflow.

My point is: raging like a child is unnecessary. Turn off pull request notifications and don't answer.

If github pull requests ruin your day, start thinking about using other code hosting.

The community doesn't need to be blamed for not following such lofty standards; we just need people collaborating, because it's open, and many different visions are still valuable in any project.

drogus commented May 11, 2012

I'm not sure why this topic is about pull requests not the feature of editing files online. Most of the people create pull requests out of branches prepared locally, I've prepared tons of pull requests and I've used online editor only once.

@leobalter, you're missing the point, this isn't about downplaying the current workings of github, it's about suggesting better workings for github. Just because you are fine with having pull requests on doesn't mean there shouldn't be an option to turn them off.

@leobalter He's not blaming 'the community', he's pointing out what he thinks needs improving in GH. Raging like a child? If you don't like his 'childish' opinion (read: high standards), don't open a pull request. I'm quite happy to see the conversation that's followed as a result.

I work at a financial institution where a single line code change can be backed with 50 page specs, 200 lines of test code, 2 weeks of testing, etc. Asking for a decent commit message on your own repo isn't that big of a deal.

nugend commented May 11, 2012

@camdez Are we talking about only the situation where some text shouldn't be word wrapped though? Are there other wrapping related formatting concerns with plain text?

ghost commented May 11, 2012

I agree, especially the identity verification via confirmed email addresses, digital signatures, or a mix.

@torvalds I think you missed my point. I'm not just talking about people using Github to host. You don't merge everything in Linux yourself, you defer 90% of that through a trust hierarchy (as you eloquently described in your Google talk about Git). Unless you somehow enforce that everybody under you also refuses Github pull requests, your logs could still get soiled.

@antirez How is the discussion of a pull request on GitHub different than the discussion of a patch on a mailing list? Is it that you end up with two different places to discuss things - mailing list for things without patches, GitHub for things with patches? Or is it that subscribing to see pull requests for a project is not as elegant as subscribing to a mailing list?

My company has had quite a bit of success having in depth discussions about both experimental and more straightforward patches on pull requests, and treating them as the long term trace of discussion, much like you're suggesting - what would we gain from using a mailing list instead?

Owner

torvalds commented May 11, 2012

On Fri, May 11, 2012 at 4:12 PM, orblivion <reply@reply.github.com> wrote:

> Unless you somehow enforce that everybody under you also refuses Github pull requests, your logs could still get soiled.

I'm not a "rules over everything else" kind of black-and-white person.

I'm basically describing what my requirements are. Not all Linux
sub-maintainers are necessarily as critical as I am, and yes, there
are ugly commit messages in the kernel too (and some of them are very
much about lacking proper word wrapping, for example).

So things slip through occasionally. I'm not German - rules are good,
and they set a standard that people should really try to strive for
(and quite frankly, hopefully exceed: the "formatting rules" should
preferably go with "really good and readable message that really
explains what is going on"), but rules are not some kind of absolute
thing that have to be 100% guaranteed.

In the kernel, see commit cb8722d, for example. That's a case of
"oops, that's one long line". It happens, and I got it through David
Miller, who usually doesn't have those kinds of issues. I suspect the
patch came from somebody who used an annoying editor or MUA that has
problems wrapping lines properly (sometimes, you have to disable
word-wrap in the MUA for it to not corrupt patches, but then some
MUA's have a horrible editor that doesn't help you wrap lines when you
want to!).

So I don't worry about "still get soiled". Crap happens, we try to minimize it.

What I dislike about the github thing is that it's not "crap happens,
we'll try to minimize it", it's "crap absolutely WILL happen".
Instead of trying to minimize it, the commit message editor actively
revels in it, and makes it hard not to make a crappy message.

Similarly, the pull request interface of github makes it literally
impossible to make a good pull request. You literally cannot make a
good pull request using the github web interface.

So right now, I encourage people to use github as a hosting site, but
as a hosting site only. Don't create commits there, and don't use
github for pull requests. Do your commits on your own machine, push
them to github, and then when you're ready for a pull request, again
do it on your own machine and email it to the maintainer that way.

So I'm really not trying to hate on github. I only despise a few of
the small details of github.

Github as a hosting site for open-source (or closed, for that matter)
projects is wonderful.

Github as a place to generate commits and pull requests? Not so much.

                    Linus
Contributor

dysoco commented May 11, 2012

@johnmetta Oh, you must be new to the internet, or to @torvalds rants :P

braneed commented May 12, 2012

Linus, I love your rants and your code. @torvalds.

I like how @torvalds rants at a high level ;)

nice read, and I have to agree (tho the "moron" comment really wasn't necessary)

Did you see that you can add .patch to the end of the pull request URL, like so: https://github.com/torvalds/linux/pull/17.patch

I'm no git expert, but doesn't that have all the information?

Not sure what all this fuss is about. @torvalds points out that due to definite weaknesses in GitHub's UI he won't accept pull requests, and the world starts whaling on him. It's simple: if you want him to pull your changes in, don't use GitHub to generate the request. This would probably be easier than trying to change his mind.

braph pushed a commit to braph/dillo that referenced this pull request Dec 16, 2016

limit size when copying strings to find character references
torvalds/linux#17 has a five-megabyte title
attribute, which is just a bit excessive. Since it has tons of < and
>, dillo couldn't cope with it. Over five minutes to parse as much
of it as it got before the connection broke. With this change, it's
about fifty seconds (on this old computer) to get/show the full 24 megs,
which is an improvement, at least.
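The dillo fix above addresses a classic quadratic trap. The Python sketch below is not dillo's code (dillo is not written in Python, and the commit message does not show its implementation); it only illustrates the general idea of copying a short, bounded window after each `&` instead of the whole remainder of a multi-megabyte attribute. `MAX_REF_LEN` and `char_ref_at` are hypothetical names.

```python
from typing import Optional

MAX_REF_LEN = 32  # hypothetical bound, longer than any reference we expect


def char_ref_at(text: str, pos: int) -> Optional[str]:
    """Return the name of a reference like '&lt;' or '&#60;' starting at `pos`."""
    if not text.startswith("&", pos):
        return None
    # Slicing text[pos:] would copy the rest of the string for every '&',
    # making a scan over n characters O(n^2); a fixed window keeps it O(n).
    window = text[pos + 1:pos + 1 + MAX_REF_LEN]
    end = window.find(";")
    if end <= 0:
        return None  # no terminating ';' close enough: treat '&' as literal
    name = window[:end]
    if name.isalnum() or (name.startswith("#") and name[1:].isalnum()):
        return name
    return None
```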

lenovouser referenced this pull request Jan 1, 2017

Closed

Update rt2800usb.c #368

tombriden pushed a commit to tombriden/linux that referenced this pull request Jan 6, 2017

xfs: fix up xfs_swap_extent_forks inline extent handling
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.
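To make the difference between the two counts concrete, here is a heavily simplified sketch. The structures are hypothetical stand-ins rather than the real XFS definitions; it assumes a 16-byte in-core extent record and an inline array of two records (consistent with the if_inline_ext[1] index mentioned above). di_nextents reflects only the on-disk extent count, while if_bytes covers every in-core record, including speculative preallocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XFS structures; sizes are illustrative. */
struct extent_rec { uint64_t l0, l1; };  /* one packed extent record */
#define INLINE_EXTS 2                     /* capacity of the inline array */

struct inode_fork {
    size_t if_bytes;                      /* bytes of in-core extent records */
};

struct dinode_core {
    uint32_t di_nextents;                 /* on-disk extent count */
};

/* Buggy check: extents that exist only in core (for example from
 * speculative preallocation) are not counted. */
bool fits_inline_buggy(const struct dinode_core *dic)
{
    return dic->di_nextents <= INLINE_EXTS;
}

/* Fixed check: derive the extent count from the in-core fork size. */
bool fits_inline_fixed(const struct inode_fork *ifp)
{
    size_t nextents = ifp->if_bytes / sizeof(struct extent_rec);

    return nextents <= INLINE_EXTS;
}
```

With one extent on disk but three records in core, the buggy check wrongly reports that the extents fit inline. Indexing the two-slot inline array with that count then overruns it, which is the kind of corruption described above.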

Thanks to dchinner for looking at this with me and spotting the
root cause.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rsalveti pushed a commit to rsalveti/linux that referenced this pull request Jan 12, 2017

xfs: fix up xfs_swap_extent_forks inline extent handling
BugLink: http://bugs.launchpad.net/bugs/1655082

commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

demurgos added a commit to demurgos/notes that referenced this pull request Jan 20, 2017

Add commit messages reference
This commit adds a link to Tim Pope's article about git commit messages.
This article was approved by Linus Torvalds in the following comment:
torvalds/linux#17 (comment)

Noltari pushed a commit to Noltari/linux that referenced this pull request Feb 6, 2017

x86/quirks: Add early quirk to reset Apple AirPort card
commit abb2baf upstream.

The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.

The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.

The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, at
subsys initcall level. In between, a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.

When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56

This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
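Returning a device from D3hot to D0 means clearing the PowerState field (bits 1:0) of the PM control/status register, which sits 4 bytes after the PM capability header (PCI_PM_CTRL and PCI_PM_CTRL_STATE_MASK in <linux/pci_regs.h>). The sketch below shows only that step, against a config space modelled as a plain byte array. The accessor helpers are hypothetical, and the real early quirk uses x86's direct config-space accessors instead.

```c
#include <stdint.h>

/* PM control/status register layout per the PCI PM specification. */
#define PM_CTRL_OFFSET     4        /* PMCSR offset from the PM capability */
#define PM_CTRL_STATE_MASK 0x0003u  /* bits 1:0: 0 = D0 ... 3 = D3hot */
#define PM_STATE_D0        0x0000u

/* Hypothetical little-endian accessors for a config space held in memory. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned off, uint16_t val)
{
    cfg[off] = val & 0xff;
    cfg[off + 1] = val >> 8;
}

/* Put the device into D0 so its MMIO space becomes accessible again,
 * leaving the other PMCSR bits untouched. */
void set_d0(uint8_t *cfg, unsigned pm_cap)
{
    uint16_t pmcsr = cfg_read16(cfg, pm_cap + PM_CTRL_OFFSET);

    if ((pmcsr & PM_CTRL_STATE_MASK) != PM_STATE_D0) {
        pmcsr &= ~PM_CTRL_STATE_MASK;
        cfg_write16(cfg, pm_cap + PM_CTRL_OFFSET, pmcsr);
        /* The PM spec requires a 10 ms recovery delay after D3hot -> D0
         * before the device is accessed; real code must wait here. */
    }
}
```

Only after this transition can the reset bit in the card's MMIO space be written.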

Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability are identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.

Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.

The following should be a comprehensive list of affected models:
    iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
    iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
    Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
    Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]

For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):

    irq 17: nobody cared (try booting with the "irqpoll" option)
    handlers:
    [<ffffffff81374370>] pcie_isr
    [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
    [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
    Disabling IRQ #17

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Kwiboo pushed a commit to Kwiboo/linux that referenced this pull request Feb 9, 2017

Merge pull request #17 from Kwiboo/aml-audio
sound/soc/aml/m8: update channel mapping

lukenels pushed a commit to lukenels/linux that referenced this pull request Feb 12, 2017

IB/core: correctly handle rdma_rw_init_mrs() failure
BugLink: http://bugs.launchpad.net/bugs/1637520

commit b6bc1c7 upstream.

Function ib_create_qp() was failing to return an error when
rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp()
when trying to dereference the qp pointer, which was actually a negative
errno.
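In the kernel, a function returning a pointer reports failure by encoding a negative errno as an "error pointer" (ERR_PTR()), which callers must test with IS_ERR() before dereferencing. In the register dump below, RBX holds fffffffffffffff4, which is -12 (-ENOMEM) as a 64-bit value, consistent with such an encoded error being used as a pointer. A minimal user-space sketch of the pattern the fix restores; the helpers are simplified re-creations of the kernel ones, and init_mrs()/create_qp() are hypothetical stand-ins, not the RDMA API:

```c
#include <errno.h>
#include <stdint.h>

/* Simplified user-space versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct qp { int id; };

/* Hypothetical stand-in for rdma_rw_init_mrs(): 0 on success or -errno. */
static int init_mrs(int fail) { return fail ? -ENOMEM : 0; }

/* Fixed pattern: return the error pointer immediately instead of carrying
 * on and dereferencing it later in the same function. */
struct qp *create_qp(struct qp *storage, int fail)
{
    int ret = init_mrs(fail);

    if (ret)
        return ERR_PTR(ret);
    storage->id = 1;
    return storage;
}
```

Callers then check IS_ERR() on the result and propagate PTR_ERR() instead of touching the object.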

The crash:

crash> log|grep BUG
[  136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
crash> bt
PID: 3736   TASK: ffff8808543215c0  CPU: 2   COMMAND: "kworker/u64:2"
 #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0
 #1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758
 #2 [ffff88084d323480] crash_kexec at ffffffff8111682d
 #3 [ffff88084d3234b0] oops_end at ffffffff81032bd6
 #4 [ffff88084d3234e0] no_context at ffffffff8106e431
 #5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610
 #6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4
 #7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc
 #8 [ffff88084d323620] do_page_fault at ffffffff8106f057
 #9 [ffff88084d323660] page_fault at ffffffff816e3148
    [exception RIP: ib_create_qp+427]
    RIP: ffffffffa02554fb  RSP: ffff88084d323718  RFLAGS: 00010246
    RAX: 0000000000000004  RBX: fffffffffffffff4  RCX: 000000018020001f
    RDX: ffff880830997fc0  RSI: 0000000000000001  RDI: ffff88085f407200
    RBP: ffff88084d323778   R8: 0000000000000001   R9: ffffea0020bae210
    R10: ffffea0020bae218  R11: 0000000000000001  R12: ffff88084d3237c8
    R13: 00000000fffffff4  R14: ffff880859fa5000  R15: ffff88082eb89800
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm]
#11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma]
#12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma]
#13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma]
#14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma]
#15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm]
#16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm]
#17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm]
#18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm]
#19 [ffff88084d323cb0] process_one_work at ffffffff810a1483
#20 [ffff88084d323d90] worker_thread at ffffffff810a211d
#21 [ffff88084d323ec0] kthread at ffffffff810a6c5c
#22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf

Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

fengguang added a commit to 0day-ci/linux that referenced this pull request Mar 10, 2017

dccp/tcp: fix routing redirect race
We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found being accessed here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry will be added for that host, creating
more dst_entry objects with lower reference counts and making the race more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267
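The LockDroppedIcmps value above was read from vmcores. On a running system the same counter is exported in the TcpExt section of /proc/net/netstat, where each section is a line of counter names followed by a line of values with the same prefix. A small sketch that looks up one counter in text of that shape; netstat_lookup() and next_token() are hypothetical helpers, and the test input is illustrative rather than captured from a real machine:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy the next space-separated token on the current line into out.
 * Returns 0 when the end of the line (or string) is reached. */
static int next_token(const char **p, char *out, size_t outsz)
{
    const char *s = *p;
    size_t n = 0;

    while (*s == ' ')
        s++;
    if (*s == '\0' || *s == '\n')
        return 0;
    while (*s != '\0' && *s != ' ' && *s != '\n') {
        if (n + 1 < outsz)
            out[n++] = *s;
        s++;
    }
    out[n] = '\0';
    *p = s;
    return 1;
}

/* Find counter `name` in the section whose lines start with `prefix`
 * (e.g. "TcpExt:"). Returns -1 if the counter is not present. */
long long netstat_lookup(const char *text, const char *prefix, const char *name)
{
    size_t plen = strlen(prefix);
    const char *line = text;

    while (line && *line) {
        const char *next = strchr(line, '\n');

        /* A section is a header line plus a value line sharing the prefix. */
        if (next && strncmp(line, prefix, plen) == 0 &&
            strncmp(next + 1, prefix, plen) == 0) {
            const char *h = line + plen, *v = next + 1 + plen;
            char key[64], val[32];

            while (next_token(&h, key, sizeof key) &&
                   next_token(&v, val, sizeof val))
                if (strcmp(key, name) == 0)
                    return strtoll(val, NULL, 10);
            return -1;
        }
        line = next ? next + 1 : NULL;
    }
    return -1;
}
```

Names and values are walked in lockstep, so the lookup does not depend on the counter's position, which varies between kernel versions.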

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead,
let the redirect take place later when user space does not have the socket
locked.

The dccp code is very similar in this respect, so fixing it there too.
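The essence of the fix is a guard in the ICMP error path: if user space currently owns the socket lock, the redirect is not processed, so the cached route cannot be released underneath the lock owner. The sketch below is a single-threaded, heavily simplified model of that guard; the structures and functions are hypothetical stand-ins for struct sock, dst_entry, __sk_dst_check() and do_redirect(), and the model does not reproduce the concurrency of the real race.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a cached route with a refcount, and a socket that
 * may currently be owned (locked) by user space. */
struct dst { int refcnt; bool obsolete; };
struct sock { bool owned_by_user; struct dst *dst_cache; };

/* Stand-in for __sk_dst_check(): drop the cached route if it is obsolete. */
static void sk_dst_check(struct sock *sk)
{
    struct dst *dst = sk->dst_cache;

    if (dst && dst->obsolete) {
        sk->dst_cache = NULL;
        dst->refcnt--;                      /* models dst_release() */
    }
}

/* Redirect handling with the fix applied: leave the cached route alone
 * while user space holds the socket lock. */
void handle_redirect(struct sock *sk)
{
    if (sk->owned_by_user)
        return;
    if (sk->dst_cache)
        sk->dst_cache->obsolete = true;     /* redirect invalidates the route */
    sk_dst_check(sk);
}
```

When the socket is owned, the route and its reference count are left untouched; otherwise the reference is released exactly once.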

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>

torvalds pushed a commit that referenced this pull request Mar 15, 2017

dccp/tcp: fix routing redirect race
As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 TCP/IPv6 DCCP patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found being accessed here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry will be added for that host, creating
more dst_entry objects with lower reference counts and making the race more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead,
let the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Noltari pushed a commit to Noltari/linux that referenced this pull request Mar 16, 2017

xfs: fix up xfs_swap_extent_forks inline extent handling
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

sunny256 pushed a commit to sunny256/linux that referenced this pull request Mar 17, 2017

x86/quirks: Add early quirk to reset Apple AirPort card
commit abb2baf upstream.

The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.

The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.

The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, at
subsys initcall level. In between, a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.

When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56

This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.

Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability are identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.

Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.

The following should be a comprehensive list of affected models:
    iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
    iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
    Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
    Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
    Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
    MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
    MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]

For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):

    irq 17: nobody cared (try booting with the "irqpoll" option)
    handlers:
    [<ffffffff81374370>] pcie_isr
    [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
    [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
    Disabling IRQ #17

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

sunny256 pushed a commit to sunny256/linux that referenced this pull request Mar 17, 2017

xfs: fix up xfs_swap_extent_forks inline extent handling
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges. This became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp: they had been memmoved
due to incorrect array indexing, and the original locations were
then zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
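
To make the if_bytes versus di_nextents distinction concrete, here is a minimal C sketch. The struct, the function names, bmbt_rec_t and INLINE_EXTS are hypothetical simplifications, not the actual XFS definitions. The 16-byte record size and the two inline slots are assumptions chosen to match the if_inline_ext[1] indexing mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed record size: one extent record is two 64-bit words. */
typedef struct { uint64_t l0, l1; } bmbt_rec_t;

/* Assumed number of extent slots that fit inline in the fork. */
#define INLINE_EXTS 2

/* Hypothetical, heavily simplified inode: only the two counts that matter. */
struct sketch_inode {
	uint32_t di_nextents;	/* extent count from the on-disk inode core */
	size_t   if_bytes;	/* bytes of extent records held in core */
};

/* Old check: misses in-core extents from speculative preallocation. */
static bool sketch_fits_inline_old(const struct sketch_inode *ip)
{
	return ip->di_nextents <= INLINE_EXTS;
}

/* New check: derive the extent count from the in-core byte count. */
static bool sketch_fits_inline_new(const struct sketch_inode *ip)
{
	size_t nextents = ip->if_bytes / sizeof(bmbt_rec_t);

	return nextents <= INLINE_EXTS;
}
```

Consider an inode with di_nextents == 2 but three in-core records (one extra from preallocation). The old check says the extents fit inline; the new one correctly says they do not.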

Noltari pushed a commit to Noltari/linux that referenced this pull request Mar 22, 2017

dccp/tcp: fix routing redirect race
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 TCP and IPv6 DCCP patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timing are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found to be accessed here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed two significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry is added for each such host, which
creates more dst_entry objects with lower reference counts and makes the
race more likely.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect() -> __sk_dst_check() -> dst_release().

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead,
let the redirect take place later, when user space does not have the socket
locked.

The DCCP/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL
and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
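
The shape of the fix can be sketched in C as below. This is a single-threaded model with hypothetical names (sketch_dst, sketch_sock, sketch_handle_redirect). It does not reproduce the race itself, only the guard: while the socket is owned by user context, the redirect handler leaves the cached route and its reference count alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for dst_entry and sock. */
struct sketch_dst {
	int refcnt;
};

struct sketch_sock {
	struct sketch_dst *dst_cache;	/* cached route, holds one reference */
	bool owned_by_user;		/* user context holds the socket lock */
};

static void sketch_dst_release(struct sketch_dst *dst)
{
	dst->refcnt--;
}

/* Models do_redirect(): drop the cached route so it is looked up again. */
static void sketch_do_redirect(struct sketch_sock *sk)
{
	struct sketch_dst *dst = sk->dst_cache;

	if (dst) {
		sk->dst_cache = NULL;
		sketch_dst_release(dst);
	}
}

/* Models the fixed ICMP redirect path: skip the redirect while user
 * space owns the socket. Returns true if the redirect was applied. */
static bool sketch_handle_redirect(struct sketch_sock *sk)
{
	if (sk->owned_by_user)
		return false;
	sketch_do_redirect(sk);
	return true;
}
```

While the socket is owned, a redirect leaves both the cached pointer and the reference count untouched. Once ownership is released, the same call drops the reference exactly once.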

shenki pushed a commit to shenki/linux that referenced this pull request Mar 22, 2017

dccp/tcp: fix routing redirect race
[ Upstream commit 45caeaa ]

fengguang pushed a commit to 0day-ci/linux that referenced this pull request Mar 23, 2017

origin
GIT d528ae0d3dfedea553812c957a6ed1e87feeed8a

commit 15c9e10d9ad4d41d076148bbff1de7f659f68852
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date:   Thu Mar 16 16:40:33 2017 -0700

    drivers core: remove assert_held_device_hotplug()
    
    The last caller of assert_held_device_hotplug() is gone, so remove it again.
    
    Link: http://lkml.kernel.org/r/20170314125226.16779-3-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    Acked-by: Dan Williams <dan.j.williams@intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Ben Hutchings <ben@decadent.org.uk>
    Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit 55adc1d05dca9e949cdf46c747cb1e91c0e9143d
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date:   Thu Mar 16 16:40:30 2017 -0700

    mm: add private lock to serialize memory hotplug operations
    
    Commit bfc8c90139eb ("mem-hotplug: implement get/put_online_mems")
    introduced new functions get/put_online_mems() and mem_hotplug_begin/end()
    in order to allow similar semantics for memory hotplug like for cpu
    hotplug.
    
    The corresponding functions for cpu hotplug are get/put_online_cpus()
    and cpu_hotplug_begin/done().
    
    The commit, however, did not introduce functions that would serialize
    memory hotplug operations the way cpu_maps_update_begin/done() does
    for cpu hotplug.
    
    This basically leaves mem_hotplug.active_writer unprotected and allows
    concurrent writers to modify it, which may lead to problems as outlined
    by commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use
    mem_hotplug_{begin, done}").
    
    That commit was extended again with commit b5d24fda9c3d ("mm,
    devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin,
    done}") which serializes memory hotplug operations for some call sites
    by using the device_hotplug lock.
    
    In addition, commit 3fc21924100b ("mm: validate device_hotplug is held
    for memory hotplug") added a sanity check to mem_hotplug_begin() to
    verify that the device_hotplug lock is held.
    
    This in turn triggers the following warning on s390:
    
    WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 assert_held_device_hotplug+0x4a/0x58
     Call Trace:
      assert_held_device_hotplug+0x40/0x58)
      mem_hotplug_begin+0x34/0xc8
      add_memory_resource+0x7e/0x1f8
      add_memory+0xda/0x130
      add_memory_merged+0x15c/0x178
      sclp_detect_standby_memory+0x2ae/0x2f8
      do_one_initcall+0xa2/0x150
      kernel_init_freeable+0x228/0x2d8
      kernel_init+0x2a/0x140
      kernel_thread_starter+0x6/0xc
    
    One possible fix would be to add more lock_device_hotplug() and
    unlock_device_hotplug() calls around each call site of
    mem_hotplug_begin/end().  But that would give the device_hotplug lock
    additional semantics it should not have (serializing memory hotplug
    operations).
    
    Instead, add a new memory_add_remove_lock with semantics similar to
    cpu_add_remove_lock for cpu hotplug.
    
    To hopefully keep things a bit simpler, the lock is taken and released
    within the mem_hotplug_begin/end() functions.
    
    Link: http://lkml.kernel.org/r/20170314125226.16779-2-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
    Acked-by: Dan Williams <dan.j.williams@intel.com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Ben Hutchings <ben@decadent.org.uk>
    Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
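
Here is a minimal sketch of the locking scheme described above. It uses a C11 atomic_flag spinlock as a stand-in for the kernel's lock, and all names are hypothetical. Reader-side handling (get/put_online_mems()) is omitted. The point is only the ordering: take the private lock first, and only then publish active_writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mutex: a simple spinlock. */
typedef struct { atomic_flag held; } sketch_lock_t;
#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

static void sketch_lock(sketch_lock_t *l)
{
	while (atomic_flag_test_and_set(&l->held))
		;	/* spin until the current holder releases it */
}

static void sketch_unlock(sketch_lock_t *l)
{
	atomic_flag_clear(&l->held);
}

/* Private lock serializing writers, analogous to cpu_add_remove_lock. */
static sketch_lock_t memory_add_remove_lock = SKETCH_LOCK_INIT;

/* Only written while memory_add_remove_lock is held. */
static const void *active_writer;

static void sketch_mem_hotplug_begin(const void *writer)
{
	sketch_lock(&memory_add_remove_lock);
	active_writer = writer;
}

static void sketch_mem_hotplug_done(void)
{
	active_writer = NULL;
	sketch_unlock(&memory_add_remove_lock);
}
```

Because the lock is taken inside begin() and released inside done(), callers need no extra device_hotplug locking just to serialize each other.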

commit 171012f561274784160f666f8398af8b42216e1f
Author: Dmitry Vyukov <dvyukov@google.com>
Date:   Thu Mar 16 16:40:27 2017 -0700

    mm: don't warn when vmalloc() fails due to a fatal signal
    
    When vmalloc() fails, it prints a very lengthy message with all the
    details about memory consumption, assuming that the failure was due to OOM.
    
    However, vmalloc() can also fail due to a pending fatal signal.  In that
    case the message is quite confusing, because it suggests OOM while the
    numbers suggest otherwise.  The messages can also pollute the
    console considerably.
    
    Don't warn when vmalloc() fails due to a pending fatal signal.
    
    Link: http://lkml.kernel.org/r/20170313114425.72724-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
    Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit d0f33ac9ae7b2a727fb678235ae37baf1d0608d5
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Mar 16 16:40:24 2017 -0700

    mm, x86: fix native_pud_clear build error
    
    We still get a build error in random configurations, after this has been
    modified a few times:
    
      In file included from include/linux/mm.h:68:0,
                       from include/linux/suspend.h:8,
                       from arch/x86/kernel/asm-offsets.c:12:
      arch/x86/include/asm/pgtable.h:66:26: error: redefinition of 'native_pud_clear'
       #define pud_clear(pud)   native_pud_clear(pud)
    
    My interpretation is that the build error comes from a typo in
    __PAGETABLE_PUD_FOLDED, so fix that typo now, and remove the incorrect
    #ifdef around the native_pud_clear definition.
    
    Fixes: 3e761a42e19c ("mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()")
    Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
    Link: http://lkml.kernel.org/r/20170314121330.182155-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Dave Jiang <dave.jiang@intel.com>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Thomas Garnier <thgarnie@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Borislav Petkov <bp@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
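
The commit does not quote the misspelled line, so the sketch below only illustrates the general failure mode, using hypothetical macro names. The preprocessor treats a misspelled name in #ifdef/#ifndef as simply undefined and gives no diagnostic, so the guard silently selects the wrong branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical config: the "folded" level is defined... */
#define SKETCH_PUD_FOLDED 1

/* ...but this guard misspells it, so the preprocessor sees an undefined
 * name and the #ifndef branch is compiled unconditionally. */
#ifndef SKETCH_PUD_FODLED
#define SKETCH_BUGGY_GUARD_TAKEN true
#else
#define SKETCH_BUGGY_GUARD_TAKEN false
#endif

/* With the correct spelling, the guard behaves as intended. */
#ifndef SKETCH_PUD_FOLDED
#define SKETCH_FIXED_GUARD_TAKEN true
#else
#define SKETCH_FIXED_GUARD_TAKEN false
#endif
```

If the wrongly taken branch contains a definition that another header also provides, the result is a redefinition error like the one quoted above, and only in configurations where both branches end up compiled.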

commit 5be9b730b09c45c358bbfe7f51d254e306cccc07
Author: Masami Hiramatsu <mhiramat@kernel.org>
Date:   Thu Mar 16 16:40:21 2017 -0700

    kasan: add a prototype of task_struct to avoid warning
    
    Add a forward declaration (prototype) of struct task_struct to fix the warning below on arm64.
    
      In file included from arch/arm64/kernel/probes/kprobes.c:19:0:
      include/linux/kasan.h:81:132: error: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
       static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
    
    As is already done for other types (kmem_cache, page, and vm_struct), this
    adds a forward declaration of struct task_struct at the top of kasan.h.
    
    [arnd] A related warning was fixed before, but now appears in a
    different line in the same file in v4.11-rc2.  The patch from Masami
    Hiramatsu still seems appropriate, so let's take his version.
    
    Fixes: 71af2ed5eeea ("kasan, sched/headers: Remove <linux/sched.h> from <linux/kasan.h>")
    Link: https://patchwork.kernel.org/patch/9569839/
    Link: http://lkml.kernel.org/r/20170313141517.3397802-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Acked-by: Alexander Potapenko <glider@google.com>
    Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
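
Here is a small C sketch of why the forward declaration helps. The function and the struct body are hypothetical stand-ins. Declaring struct task_struct at file scope before any prototype mentions it means every later prototype refers to the same type. Without it, the first mention inside a parameter list declares a new type scoped to that prototype alone, which is what GCC warns about.

```c
#include <assert.h>

/* File-scope forward declaration: later prototypes now name this type. */
struct task_struct;

/* Hypothetical no-op stub, shaped like the empty one quoted above. */
static inline void sketch_unpoison_task_stack(struct task_struct *task)
{
	(void)task;
}

/* Hypothetical full definition, visible later in the translation unit. */
struct task_struct {
	int pid;
};
```

Deleting the `struct task_struct;` line brings back the "declared inside parameter list" warning. A call passing a pointer to the later-defined struct then also becomes an incompatible-pointer-type diagnostic.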

commit 271df90e4e530c17f237b27034d6341cb2c2f536
Author: Vitaly Wool <vitalywool@gmail.com>
Date:   Thu Mar 16 16:40:19 2017 -0700

    z3fold: fix spinlock unlocking in page reclaim
    
    Commit 5a27aa822029 ("z3fold: add kref refcounting") introduced a bug
    in z3fold_reclaim_page(): one of its exit paths may leave the pool->lock
    spinlock held.  This is the trivial fix.
    
    Fixes: 5a27aa822029 ("z3fold: add kref refcounting")
    Link: http://lkml.kernel.org/r/20170311222239.7b83d8e7ef1914e05497649f@gmail.com
    Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
    Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
    Cc: Dan Streetman <ddstreet@ieee.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
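
The commit does not show which exit path was affected, so this is only a generic C sketch of the bug class. The names are hypothetical, and the lock is modeled as a boolean so the test can observe it. An early return after taking a lock skips the unlock. Funneling every exit through a single unlock label, a common kernel idiom, avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool whose lock is modeled as a flag. */
struct sketch_pool {
	bool lock_held;
	int  nr_pages;
};

static void pool_lock(struct sketch_pool *p)   { assert(!p->lock_held); p->lock_held = true; }
static void pool_unlock(struct sketch_pool *p) { assert(p->lock_held);  p->lock_held = false; }

/* Buggy shape: the early return leaves the lock held. */
static int sketch_reclaim_buggy(struct sketch_pool *pool)
{
	pool_lock(pool);
	if (pool->nr_pages == 0)
		return -1;		/* BUG: lock still held */
	pool->nr_pages--;
	pool_unlock(pool);
	return 0;
}

/* Fixed shape: every exit goes through the single unlock. */
static int sketch_reclaim_fixed(struct sketch_pool *pool)
{
	int ret = -1;

	pool_lock(pool);
	if (pool->nr_pages == 0)
		goto out;
	pool->nr_pages--;
	ret = 0;
out:
	pool_unlock(pool);
	return ret;
}
```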

commit 28ea06c46fbcab63fd9a55531387b7928a18a590
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date:   Mon Mar 6 12:58:42 2017 -0500

    gfs2: Avoid alignment hole in struct lm_lockname
    
    Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over
    the entire struct lm_lockname instead of its individual fields.  On some
    architectures, struct lm_lockname contains a hole of uninitialized
    memory due to alignment rules, which now leads to incorrect hash values.
    Get rid of that hole.
    
    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    CC: <stable@vger.kernel.org> #v4.3+
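
Here is a minimal C sketch of the problem, using hypothetical struct names rather than the real struct lm_lockname. When a hash function reads a struct's raw bytes, any alignment padding is hashed too, and padding is not guaranteed to be initialized. One way to get rid of the hole is to order members from largest to smallest and make the remaining bytes an explicit, zero-initialized field. This is not necessarily the layout the commit chose. The static assertion assumes a common ABI where uint64_t needs at most 8-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key with an alignment hole after ln_type on common ABIs. */
struct key_with_hole {
	uint8_t  ln_type;
	uint64_t ln_number;
};

/* Same information with no implicit padding: largest member first,
 * remaining bytes spelled out and zeroed by initialization. */
struct key_no_hole {
	uint64_t ln_number;
	uint8_t  ln_type;
	uint8_t  ln_pad[7];
};

_Static_assert(sizeof(struct key_no_hole) == 16, "no implicit padding");

/* 32-bit FNV-1a over raw bytes, standing in for a whole-struct hash. */
static uint32_t sketch_hash_bytes(const void *p, size_t len)
{
	const unsigned char *b = p;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= b[i];
		h *= 16777619u;
	}
	return h;
}
```

Two key_with_hole values with equal fields can still hash differently over sizeof bytes if their padding bytes differ. Two key_no_hole values initialized with the same fields always hash the same.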

commit 630a04e79dd41ff746b545d4fc052e0abb836120
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Wed Mar 15 00:24:25 2017 -0700

    xfs: verify inline directory data forks
    
    When we're reading or writing the data fork of an inline directory,
    check the contents to make sure we're not overflowing buffers or eating
    garbage data.  xfs/348 corrupts an inline symlink into an inline
    directory, triggering a buffer overflow bug.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    ---
    v2: add more checks consistent with _dir2_sf_check and make the verifier
    usable from anywhere.

commit 6b116b1d6a521a1907b3c18cb7a8592a655f660c
Author: Mintz, Yuval <Yuval.Mintz@cavium.com>
Date:   Tue Mar 14 15:26:04 2017 +0200

    qed: Enable iSCSI Out-of-Order
    
    Due to an omission in the initial submission, qed fails to propagate
    qedi's request to enable OOO to the firmware.
    
    Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI")
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit db31d330e8b0ad8926725eb2af6f6422c07ca7ab
Author: Mintz, Yuval <Yuval.Mintz@cavium.com>
Date:   Tue Mar 14 15:26:03 2017 +0200

    qed: Correct out-of-bound access in OOO history
    
    The number of entries in the database needs to be set; otherwise the
    logic would quickly run past the end of the array.
    
    Fixes: 1d6cff4fca43 ("qed: Add iSCSI out of order packet handling")
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1df2adedcce17ad4a39fba74f0e2b611f797fe10
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Tue Mar 14 15:26:02 2017 +0200

    qed: Fix interrupt flags on Rx LL2
    
    Before iterating over the LL2 Rx ring, the ring's
    spinlock is taken via spin_lock_irqsave().
    The actual processing of the packet [including handling
    by the protocol driver] is done without said lock,
    so qed releases the spinlock and reacquires it afterwards.
    
    The problem is that the final spin_unlock_irqrestore() at the end
    of the iteration uses the original flags saved by the
    initial irqsave() instead of the flags from the most recent
    irqsave(). So it's possible that the interrupt status would
    be incorrect at the end of the processing.
    
    Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support")
    CC: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
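
The pairing rule, where each restore must use the flags from its matching save, can be modeled in plain C. Everything here is hypothetical: interrupts are a single boolean and there is no real spinlock. The handler that leaves interrupts disabled exists only to make the difference between the two variants observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the local interrupt-enable state. */
static bool irqs_on = true;

/* Models the save half of spin_lock_irqsave(): remember, then disable. */
static unsigned long sketch_irq_save(void)
{
	unsigned long flags = irqs_on;

	irqs_on = false;
	return flags;
}

/* Models the restore half of spin_unlock_irqrestore(). */
static void sketch_irq_restore(unsigned long flags)
{
	irqs_on = flags != 0;
}

/* Walks nr_packets entries, dropping the "lock" around each handler call.
 * use_latest selects which saved flags the final restore uses. */
static void sketch_process_ring(int nr_packets, void (*handle)(void), bool use_latest)
{
	unsigned long first = sketch_irq_save();
	unsigned long latest = first;

	for (int i = 0; i < nr_packets; i++) {
		sketch_irq_restore(latest);	/* release around the handler */
		handle();
		latest = sketch_irq_save();	/* re-acquire */
	}
	/* Buggy variant pairs this with the first save; fixed uses the latest. */
	sketch_irq_restore(use_latest ? latest : first);
}

/* Hypothetical handler that returns with interrupts disabled. */
static void sketch_handler_disables_irqs(void)
{
	irqs_on = false;
}
```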

commit 4621ceb279d065151eb940ce8a4728b10c0646c7
Author: Mintz, Yuval <Yuval.Mintz@cavium.com>
Date:   Tue Mar 14 15:26:01 2017 +0200

    qed: Free previous connections when releasing iSCSI
    
    Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI")
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 752ecb2da11124a948567076b60767dc8034cfa5
Author: Mintz, Yuval <Yuval.Mintz@cavium.com>
Date:   Tue Mar 14 15:26:00 2017 +0200

    qed: Fix mapping leak on LL2 rx flow
    
    When receiving an Rx LL2 packet, qed fails to unmap the previous buffer.
    
    Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support")
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 3ef310a7d99216e0fbdff29f0cb13bc54180373a
Author: Tomer Tayar <Tomer.Tayar@cavium.com>
Date:   Tue Mar 14 15:25:59 2017 +0200

    qed: Prevent creation of too-big u32-chains
    
    The current logic would allow the creation of a chain with U32_MAX + 1
    elements, when the actual maximum supported by the driver infrastructure
    is U32_MAX.
    
    Fixes: a91eb52abb50 ("qed: Revisit chain implementation")
    Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
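
One general way to enforce the U32_MAX limit without overflow is to compute the element count in 64 bits, since in 32-bit arithmetic U32_MAX + 1 wraps to 0. The sketch below uses hypothetical parameters (elements per page, page count). It is not qed's actual check, which the commit message does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit check: a u32-indexed chain may hold at most
 * UINT32_MAX elements. The product of two u32 values always fits in
 * a u64, so the comparison cannot be fooled by wraparound. */
static bool sketch_chain_size_ok(uint32_t elems_per_page, uint32_t page_cnt)
{
	uint64_t total = (uint64_t)elems_per_page * page_cnt;

	return total <= UINT32_MAX;
}
```

For example, 65535 x 65537 = 2^32 - 1 is accepted, while 65536 x 65536 = 2^32 (that is, U32_MAX + 1) is rejected.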

commit f3e48119b97f56fb09310c95d49da122a27003d7
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Tue Mar 14 15:25:58 2017 +0200

    qed: Align CIDs according to DORQ requirement
    
    The Doorbell HW block can be configured at a granularity
    of 16 x CIDs, so we need to make sure that the actual number
    of CIDs configured is a multiple of 16.
    
    Today, when RoCE is enabled, the number is unaligned, so
    doorbells to the higher CIDs would fail to reach the firmware and
    would eventually time out.
    
    Fixes: dbb799c39717 ("qed: Initialize hardware for new protocols")
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
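
Rounding a CID count up to a multiple of 16 is a one-line calculation. A hedged sketch follows; the helper name is hypothetical, and the granularity of 16 comes from the commit text. In kernel code the same rounding is usually written with the roundup() or ALIGN() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Granularity from the text: doorbells are configured per 16 CIDs. */
#define SKETCH_CID_GRANULARITY 16u

/* Hypothetical helper: round a CID count up to the next multiple of the
 * granularity (unchanged if already aligned). Assumes n + 15 does not
 * overflow a uint32_t. */
static uint32_t sketch_align_cids(uint32_t n)
{
	const uint32_t g = SKETCH_CID_GRANULARITY;

	return (n + g - 1) / g * g;
}
```

For example, a request for 100 CIDs becomes 112, and 16 stays 16.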

commit e9093b1183bbac462d2caef3eac165778c0b1bf1
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Tue Mar 14 14:00:01 2017 +0100

    mlxsw: reg: Fix SPVMLR max record count
    
    The num_rec field is 8 bits wide, so the maximum record count is 255.
    This fixes VLAN learning not being enabled for ranges wider than 255.
    
    Fixes: a4feea74cd7a ("mlxsw: reg: Add Switch Port VLAN MAC Learning register definition")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f004ec065b4879d6bc9ba0211af2169b3ce3097f
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Tue Mar 14 14:00:00 2017 +0100

    mlxsw: reg: Fix SPVM max record count
    
    The num_rec field is 8 bits wide, so the maximum record count is 255. This
    fixes VLANs not being enabled for ranges wider than 255.
    
    Fixes: b2e345f9a454 ("mlxsw: reg: Add Switch Port VID and Switch Port VLAN Membership registers definitions")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
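
Since one register write can carry at most 255 records, a wider VLAN range presumably has to be split across several writes. Here is a minimal sketch of that chunking with hypothetical helper names. It is not the mlxsw code, which the commit message does not show.

```c
#include <assert.h>
#include <stdint.h>

/* num_rec is an 8-bit field: one register write covers at most 255 records. */
#define SKETCH_MAX_REC 255u

/* Records in the write starting at vid, for the inclusive range ending
 * at vid_end (caller guarantees vid <= vid_end). */
static uint8_t sketch_chunk_len(uint32_t vid, uint32_t vid_end)
{
	uint32_t left = vid_end - vid + 1;

	return (uint8_t)(left > SKETCH_MAX_REC ? SKETCH_MAX_REC : left);
}

/* Number of register writes needed for [vid_begin, vid_end]; each write
 * advances the start VID by its chunk length. */
static unsigned int sketch_nr_writes(uint32_t vid_begin, uint32_t vid_end)
{
	unsigned int nr = 0;

	for (uint32_t vid = vid_begin; vid <= vid_end; vid += sketch_chunk_len(vid, vid_end))
		nr++;
	return nr;
}
```

Covering VIDs 1 to 4094 takes 17 writes: 16 full writes of 255 records and a final one of 14.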

commit 37c343b4f4e70e9dc328ab04903c0ec8d154c1a4
Author: Vlad Yasevich <vyasevich@gmail.com>
Date:   Tue Mar 14 08:58:08 2017 -0400

    net: Resend IGMP memberships upon peer notification.
    
    When we notify peers of potential changes, it's also good to update
    IGMP memberships.  For example, during VM migration, updating IGMP
    memberships will redirect existing multicast streams to the VM at the
    new location.
    
    Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 11353b9d10392e79e32603d2178e75feb25eaf0d
Author: Zhilong Liu <zlliu@suse.com>
Date:   Tue Mar 14 15:52:26 2017 +0800

    md/raid1: fix a trivial typo in comments
    
    raid1.c: fix a trivial typo in comments of freeze_array().
    
    Cc: Jack Wang <jack.wang.usish@gmail.com>
    Cc: Guoqing Jiang <gqjiang@suse.com>
    Cc: John Stoffel <john@stoffel.org>
    Acked-by: Coly Li <colyli@suse.de>
    Signed-off-by: Zhilong Liu <zlliu@suse.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 9c62110454b088b4914ffe375c2dbc19643eac34
Author: Jens Axboe <axboe@fb.com>
Date:   Tue Mar 14 11:51:59 2017 -0600

    blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
    
    If we have scheduling enabled, we jump directly to insert-and-run.
    That's fine, but we run the queue async and we don't pass in information
    on whether we can block from this context or not. Fix up both of
    these cases.
    
    Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
    Reviewed-by: Omar Sandoval <osandov@fb.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 0977762f6d15f13caccc20d71a5dec47d098907d
Author: Song Liu <songliubraving@fb.com>
Date:   Mon Mar 13 13:44:35 2017 -0700

    md/r5cache: fix set_syndrome_sources() for data in cache
    
    Before this patch, a device with R5_InJournal set is included in prexor
    (SYNDROME_SRC_WANT_DRAIN) but not in reconstruct (SYNDROME_SRC_WRITTEN),
    which breaks the parity calculation. With srctype == SYNDROME_SRC_WRITTEN,
    we need to include both devices with non-null ->written and devices with
    R5_InJournal. This fixes logic in 1e6d690 ("md/r5cache: caching phase of r5cache").
    
    Cc: stable@vger.kernel.org (v4.10+)
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Mon Mar 13 00:01:30 2017 +0100

    dccp: fix memory leak during tear-down of unsuccessful connection request
    
    This patch fixes a memory leak that happens if the connection request
    is not fulfilled between parsing the DCCP options and handling the SYN
    (e.g. because the backlog is full): we forgot to free the
    list of ack vectors.
    
    Reported-by: Jianwen Ji <jiji@redhat.com>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit b20e2d54789c6acbf6bd0efdbec2cf5fa4d90ef1
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Mon Mar 13 00:00:26 2017 +0100

    tun: fix premature POLLOUT notification on tun devices
    
    aszlig observed failing ssh tunnels (-w) during initialization since
    commit cc9da6cc4f56e0 ("ipv6: addrconf: use stable address generator for
    ARPHRD_NONE"). We already had reports that the mentioned commit breaks
    Juniper VPN connections. I can't say for certain that the Juniper VPN
    client has the same problem, but it is worth pointing those reports to this patch.
    
    Because of the early generation of link local addresses, the kernel now
    can start asking for routers on the local subnet much earlier than usual.
    Those router solicitation packets arrive inside the ssh channels and
    should be transmitted to the tun fd before the configuration scripts
    might have upped the interface and made it ready for transmission.
    
    ssh polls on the interface and receives back a POLLOUT. It tries to send
    the early router solicitation packet to the tun interface.  Unfortunately,
    the interface hasn't been brought up yet by the config scripts, so the
    write fails with -EIO. ssh doesn't retry and considers the tun interface
    broken forever.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
    Fixes: cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
    Cc: Bjørn Mork <bjorn@mork.no>
    Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
    Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
    Reported-by: Jonas Lippuner <jonas@lippuner.ca>
    Cc: Jonas Lippuner <jonas@lippuner.ca>
    Reported-by: aszlig <aszlig@redmoonstudios.org>
    Cc: aszlig <aszlig@redmoonstudios.org>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0
Author: Jon Maxwell <jmaxwell37@gmail.com>
Date:   Fri Mar 10 16:40:33 2017 +1100

    dccp/tcp: fix routing redirect race
    
    As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
    v2: Contains the IPv6 tcp/IPv6 dccp patches as well.
    
    We have seen a few incidents lately where a dst_entry has been freed
    with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
    dst_entry. If the conditions/timings are right, a crash then ensues when the
    freed dst_entry is referenced later on. A common crashing backtrace is:
    
     #8 [] page_fault at ffffffff8163e648
        [exception RIP: __tcp_ack_snd_check+74]
    .
    .
     #9 [] tcp_rcv_established at ffffffff81580b64
    #10 [] tcp_v4_do_rcv at ffffffff8158b54a
    #11 [] tcp_v4_rcv at ffffffff8158cd02
    #12 [] ip_local_deliver_finish at ffffffff815668f4
    #13 [] ip_local_deliver at ffffffff81566bd9
    #14 [] ip_rcv_finish at ffffffff8156656d
    #15 [] ip_rcv at ffffffff81566f06
    #16 [] __netif_receive_skb_core at ffffffff8152b3a2
    #17 [] __netif_receive_skb at ffffffff8152b608
    #18 [] netif_receive_skb at ffffffff8152b690
    #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
    #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
    #21 [] net_rx_action at ffffffff8152bac2
    #22 [] __do_softirq at ffffffff81084b4f
    #23 [] call_softirq at ffffffff8164845c
    #24 [] do_softirq at ffffffff81016fc5
    #25 [] irq_exit at ffffffff81084ee5
    #26 [] do_IRQ at ffffffff81648ff8
    
    Of course it may happen with other NIC drivers as well.
    
    The freed dst_entry was found here:
    
    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);
    
            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }
    
    But there are other backtraces attributed to the same freed dst_entry in
    netfilter code as well.
    
    All the vmcores showed 2 significant clues:
    
    - Remote hosts behind the default gateway had always been redirected to a
    different gateway. An rtable/dst_entry is added for each such host, which
    creates more dst_entry objects with lower reference counts and makes the
    race more likely.
    
    - All vmcores showed a positive LockDroppedIcmps value, e.g.:
    
    LockDroppedIcmps                  267
    
    A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
    regardless of whether user space has the socket locked. This can result in a
    race condition where the same dst_entry cached in sk->sk_dst_cache can be
    decremented twice for the same socket via:
    
    do_redirect()->__sk_dst_check()-> dst_release().
    
    This leads to the dst_entry being prematurely freed while another socket
    still points to it via sk->sk_dst_cache, and to a subsequent crash.
    
    To fix this, skip do_redirect() if user space has the socket locked, and
    instead let the redirect take place later, when user space does not have
    the socket locked.
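    A minimal userspace sketch of the guard described above, not the kernel
    code: fake_sock, fake_dst, sk_dst_check() and handle_redirect() are
    hypothetical stand-ins for the socket, its cached dst_entry,
    __sk_dst_check() and the ICMP redirect path. It is single-threaded, so it
    only shows the rule (no redirect processing while the socket is owned by
    user space), not the race itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dst_entry and struct sock. */
struct fake_dst {
    int refcnt;      /* reference count; the entry is "freed" at 0 */
    bool obsolete;   /* set when the route has been invalidated */
};

struct fake_sock {
    bool owned_by_user;          /* models sock_owned_by_user(sk) */
    struct fake_dst *dst_cache;  /* models sk->sk_dst_cache */
};

/* Models __sk_dst_check(): drop the cached entry if it is obsolete. */
static void sk_dst_check(struct fake_sock *sk)
{
    struct fake_dst *dst = sk->dst_cache;

    if (dst && dst->obsolete) {
        sk->dst_cache = NULL;
        dst->refcnt--;           /* models dst_release() */
    }
}

/* ICMP redirect handler with the fix: do nothing while user space holds
 * the socket lock, so the user-context path is the only one that can
 * release the cached entry. Returns true if the redirect was processed. */
static bool handle_redirect(struct fake_sock *sk)
{
    if (sk->owned_by_user)
        return false;            /* skip; a later redirect can be handled */
    sk_dst_check(sk);
    return true;
}
```

    Clearing the cached pointer in the same step as dropping the reference
    means a second call finds nothing to release.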
    
    The dccp/IPv6 code is very similar in this respect, so fix it there too.
    
    As Eric Garver pointed out, the following commit now invalidates routes,
    which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL
    and triggers the dst_release().
    
    Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
    Cc: Eric Garver <egarver@redhat.com>
    Cc: Hannes Sowa <hsowa@redhat.com>
    Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 02bb56ddc6711639c549d81c7b9f6d845da243a9
Author: Zhao Qiang <qiang.zhao@nxp.com>
Date:   Tue Mar 14 09:38:33 2017 +0800

    ucc/hdlc: fix two little issue
    
    1. Change bd_status from u32 to u16 in hdlc_rx_done(), because the
    bd_status register is 16 bits wide.
    2. Write the bd_length register before writing the bd_status register.
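    A hedged sketch of both points, using a hypothetical in-memory buffer
    descriptor rather than the real QE registers: struct buf_desc, BD_READY
    and the helper names are assumptions. Real MMIO would go through the
    driver's big-endian accessors and may need a write barrier between the
    two stores; volatile here only keeps the compiler from reordering them.

```c
#include <assert.h>
#include <stdint.h>

#define BD_READY 0x8000u  /* hypothetical "buffer ready" bit in status */

/* Hypothetical buffer descriptor: both registers are 16 bits wide. */
struct buf_desc {
    volatile uint16_t status;
    volatile uint16_t length;
};

/* Hand a buffer to the controller: the length must already be valid when
 * the ready bit in status becomes visible, so it is written first. */
static void bd_submit(struct buf_desc *bd, uint16_t len)
{
    bd->length = len;
    bd->status = (uint16_t)(bd->status | BD_READY);
}

/* Read status into a 16-bit variable that matches the register width. */
static int bd_is_ready(const struct buf_desc *bd)
{
    uint16_t status = bd->status;

    return (status & BD_READY) != 0;
}
```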
    
    Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c80498e36d4ef3e24599d363c622fbf22a1293cc
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date:   Mon Mar 13 16:24:03 2017 +0100

    vxlan: fix ovs support
    
    The required changes in the function vxlan_dev_create() were missing
    in commit 8bcdc4f3a20b.
    The vxlan device is not registered anymore after that commit, and the
    error path causes a stack dump:
     WARNING: CPU: 3 PID: 1498 at net/core/dev.c:6713 rollback_registered_many+0x9d/0x3f0
    
    Fixes: 8bcdc4f3a20b ("vxlan: add changelink support")
    CC: Roopa Prabhu <roopa@cumulusnetworks.com>
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 91864f5852f9996210fad400cf70fb85af091243
Author: Andrey Vagin <avagin@openvz.org>
Date:   Sun Mar 12 21:36:18 2017 -0700

    net: use net->count to check whether a netns is alive or not
    
    The previous idea was to check whether a net namespace is in
    net_exit_list or not. It doesn't work, because net->exit_list is used in
    __register_pernet_operations and __unregister_pernet_operations, where
    all namespaces are added to a temporary list for cleanup in an error
    case, so list_empty(&net->exit_list) always returns false.
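    A minimal sketch of the liveness test the subject line describes,
    assuming a namespace counts as alive while its reference count is
    non-zero. struct fake_net and net_alive() are hypothetical; C11 atomics
    stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net: only the reference count matters. */
struct fake_net {
    atomic_int count;  /* models net->count; drops to 0 when teardown starts */
};

/* Derive liveness from the reference count rather than from list
 * membership, since the exit list is also reused as a temporary list. */
static bool net_alive(struct fake_net *net)
{
    return atomic_load(&net->count) != 0;
}
```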
    
    Reported-by: Mantas Mikulėnas <grawity@gmail.com>
    Fixes: 002d8a1a6c11 ("net: skip genenerating uevents for network namespaces that are exiting")
    Signed-off-by: Andrei Vagin <avagin@openvz.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit a13b2082ece95247779b9995c4e91b4246bed023
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Mar 13 17:38:17 2017 +0100

    bridge: drop netfilter fake rtable unconditionally
    
    Andreas reports kernel oops during rmmod of the br_netfilter module.
    Hannes debugged the oops down to a NULL rt6info->rt6i_indev.
    
    The problem is that br_netfilter has the nasty concept of adding a fake
    rtable to skb->dst; this happens in a br_netfilter prerouting hook.
    
    A second hook (in bridge LOCAL_IN) is supposed to remove these again
    before the skb is handed up the stack.
    
    However, on module unload the hooks get unregistered, which means an
    skb could traverse the prerouting hook that attaches the fake_rtable,
    while the 'fake rtable remove' hook gets removed from the hooklist
    immediately after.
    
    Fixes: 34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core")
    Reported-by: Andreas Karis <akaris@redhat.com>
    Debugged-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 79e49503efe53a8c51d8b695bedc8a346c5e4a87
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Mar 13 16:24:28 2017 +0100

    ipv6: avoid write to a possibly cloned skb
    
    ip6_fragment, in case the skb has a fraglist, checks if the
    skb is cloned. If it is, it moves to the 'slow path' and allocates
    new skbs for each fragment.
    
    However, right before entering the slowpath loop, it updates the
    nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
    to account for the fragment header that will be inserted in the new
    ipv6-fragment skbs.
    
    If the original skb is cloned, this munges the nexthdr value of another
    skb. Avoid this by doing the nexthdr update for each of the new fragment
    skbs separately.
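    A hedged illustration of the rule, not ip6_fragment() itself: the
    original header may be shared with a clone, so it is only read, and
    nexthdr is patched in each fragment's private copy. struct frag,
    HDR_LEN and build_frags() are hypothetical, and byte 0 merely plays the
    role of the last extension header's nexthdr field. 44 is the IPv6
    Fragment header number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NEXTHDR_FRAGMENT 44  /* IPv6 Fragment header protocol number */
#define HDR_LEN 8            /* hypothetical fixed header size */

/* Hypothetical fragment with its own private copy of the header. */
struct frag {
    uint8_t hdr[HDR_LEN];
};

/* Slow path: copy the (possibly shared) original header into every new
 * fragment and update nexthdr only in the copy. orig_hdr is never written,
 * so a clone that shares it keeps seeing the original value. */
static void build_frags(const uint8_t *orig_hdr, struct frag *frags, int n)
{
    for (int i = 0; i < n; i++) {
        memcpy(frags[i].hdr, orig_hdr, HDR_LEN);
        frags[i].hdr[0] = NEXTHDR_FRAGMENT;
    }
}
```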
    
    This was observed with tcpdump on a bridge device where netfilter ipv6
    reassembly is active: tcpdump shows malformed fragment headers as
    the l4 header (icmpv6, tcp, etc.) is decoded as a fragment header.
    
    Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Reported-by: Andreas Karis <akaris@redhat.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 6e526fdff7be4f13b24f929a04c0e9ae6761291e
Author: Johan Hovold <johan@kernel.org>
Date:   Mon Mar 13 13:42:03 2017 +0100

    net: wimax/i2400m: fix NULL-deref at probe
    
    Make sure to check the number of endpoints to avoid dereferencing a
    NULL-pointer or accessing memory beyond the endpoint array should a
    malicious device lack the expected endpoints.
    
    The endpoints are specifically dereferenced in the i2400m_bootrom_init
    path during probe (e.g. in i2400mu_tx_bulk_out).
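    A minimal sketch of the probe-time check, using a hypothetical
    simplified interface instead of the USB core structures: fake_iface,
    NEEDED_ENDPOINTS and fake_probe() are assumptions. In the real drivers
    the count comes from the interface descriptor's bNumEndpoints.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of a USB interface. */
struct fake_iface {
    int num_endpoints;
    const int *endpoints;  /* endpoint addresses; NULL if there are none */
};

#define NEEDED_ENDPOINTS 2  /* assumption: the driver uses endpoints 0 and 1 */

/* Reject devices that expose too few endpoints before indexing the array,
 * instead of trusting a possibly malicious device to match the layout. */
static int fake_probe(const struct fake_iface *iface, int *ep_out)
{
    if (iface->num_endpoints < NEEDED_ENDPOINTS)
        return -ENODEV;
    *ep_out = iface->endpoints[1];
    return 0;
}
```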
    
    Fixes: f398e4240fce ("i2400m/USB: probe/disconnect, dev init/shutdown and reset backends")
    Cc: Inaky Perez-Gonzalez <inaky@linux.intel.com>
    
    Signed-off-by: Johan Hovold <johan@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 68c32f9c2a36d410aa242e661506e5b2c2764179
Author: Johan Hovold <johan@kernel.org>
Date:   Mon Mar 13 13:39:01 2017 +0100

    isdn/gigaset: fix NULL-deref at probe
    
    Make sure to check the number of endpoints to avoid dereferencing a
    NULL-pointer should a malicious device lack endpoints.
    
    Fixes: cf7776dc05b8 ("[PATCH] isdn4linux: Siemens Gigaset drivers - direct USB connection")
    Cc: stable <stable@vger.kernel.org>	# 2.6.17
    Cc: Hansjoerg Lipp <hjlipp@web.de>
    
    Signed-off-by: Johan Hovold <johan@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 67e194007be08d071294456274dd53e0a04fdf90
Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Mon Mar 13 13:28:09 2017 +0100

    ipv6: make ECMP route replacement less greedy
    
    Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a
    loop that removes all siblings of an ECMP route that is being
    replaced. However, this loop doesn't stop once it has removed the
    siblings, and keeps removing other routes with a higher metric.
    We also end up triggering the WARN_ON after the loop, because
    nsiblings is then negative.
    
    Instead, stop the loop when we have taken care of all routes with the
    same metric as the route being replaced.
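    A hedged sketch of the stopping rule on a hypothetical singly linked
    list sorted by metric (not the fib6 data structures): struct route and
    remove_siblings() are assumptions. Removal stops at the first route
    whose metric differs from the one being replaced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route node in a list kept sorted by metric. */
struct route {
    int metric;
    struct route *next;
};

/* Unlink every route with the given metric starting at *pos, stopping at
 * the first route with a different metric. Returns how many were unlinked
 * (memory stays owned by the caller). Without the metric test the loop
 * would run to the end of the list and also drop higher-metric routes. */
static int remove_siblings(struct route **pos, int metric)
{
    int removed = 0;

    while (*pos && (*pos)->metric == metric) {
        *pos = (*pos)->next;
        removed++;
    }
    return removed;
}
```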
    
      Reproducer:
      ===========
        #!/bin/sh
    
        ip netns add ns1
        ip netns add ns2
        ip -net ns1 link set lo up
    
        for x in 0 1 2 ; do
            ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
            ip -net ns1 link set eth$x up
            ip -net ns2 link set veth$x up
        done
    
        ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
                nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
        ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
        ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048
    
        echo "before replace, 3 routes"
        ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
        echo
    
        ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
                nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2
    
        echo "after replace, only 2 routes, metric 2048 is gone"
        ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
    
    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Reviewed-by: Xin Long <lucien.xin@gmail.com>
    Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ce70df089143c49385b4f32f39d41fb50fbf6a7c
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Mon Mar 13 08:22:13 2017 +0300

    mm, gup: fix typo in gup_p4d_range()
    
    gup_p4d_range() should call gup_pud_range(), not itself.
    
    [ This was not noticed on x86: this is the HAVE_GENERIC_RCU_GUP code
      used by arm[64] and powerpc    - Linus ]
    
    Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
    Reported-by: Anton Blanchard <anton@samba.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit 4a3a485b1ed0e109718cc8c9d094fa0f552de9b2
Author: Tahsin Erdogan <tahsin@google.com>
Date:   Fri Mar 10 12:09:49 2017 -0800

    writeback: fix memory leak in wb_queue_work()
    
    When the WB_registered flag is not set, wb_queue_work() skips queuing the
    work, but does not perform the necessary cleanup. In particular, if
    work->auto_free is true, it should free the memory.
    
    The leak can be reproduced by following these steps:
    
       mount /dev/sdb /mnt/sdb
       /* In qemu console: device_del sdb */
       umount /dev/sdb
    
    The above results in a wb_queue_work() call on an unregistered wb and
    thus leaks memory.
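    A minimal sketch of the missing cleanup, not fs/fs-writeback.c: struct
    wb_work, the `queued` counter and queue_work() are hypothetical. The
    point is that the early-return path must free an auto_free item, since
    nothing else will.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical work item: auto_free means the queueing side owns it. */
struct wb_work {
    bool auto_free;
};

static int queued;  /* stands in for the real work list */

/* Queue work on a writeback context. If the context is not registered the
 * work is dropped, and an auto_free item is freed here because no worker
 * will ever see it. Returns true if the work was queued. */
static bool queue_work(bool registered, struct wb_work *work)
{
    if (!registered) {
        if (work->auto_free)
            free(work);
        return false;
    }
    queued++;  /* real code would add the work to a list and wake a worker */
    return true;
}
```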
    
    Reported-by: John Sperbeck <jsperbeck@google.com>
    Signed-off-by: Tahsin Erdogan <tahsin@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 0067d4b020ea07a58540acb2c5fcd3364bf326e0
Author: Sagi Grimberg <sagi@grimberg.me>
Date:   Mon Mar 13 16:10:11 2017 +0200

    blk-mq: Fix tagset reinit in the presence of cpu hot-unplug
    
    If a CPU was unplugged, we must not assume that the tags for that
    CPU are still allocated, so check for NULL tags when reinitializing a
    tagset.
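    A hedged sketch of the NULL check, with a hypothetical per-queue tags
    array in place of struct blk_mq_tag_set: struct tags and
    reinit_tagset() are assumptions, and `generation` just marks that a
    slot was reinitialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-hardware-queue tags; a slot is NULL when the tags were
 * freed, e.g. after its CPUs were hot-unplugged. */
struct tags {
    int generation;
};

/* Reinitialize every allocated entry, skipping holes instead of assuming
 * every slot is populated. Returns how many entries were reset. */
static int reinit_tagset(struct tags **tags, int nr_hw_queues)
{
    int done = 0;

    for (int i = 0; i < nr_hw_queues; i++) {
        if (!tags[i])
            continue;
        tags[i]->generation++;
        done++;
    }
    return done;
}
```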
    
    Reported-by: Yi Zhang <yizhan@redhat.com>
    Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 65869a47f3488253f5fd88cc4f14e0a4e2601a55
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Sat Mar 11 16:55:49 2017 +0100

    bpf: improve read-only handling
    
    Improve bpf_{prog,jit_binary}_{un,}lock_ro() by throwing a
    one-time warning in case of an error when the image couldn't
    be set read-only, and also mark struct bpf_prog as locked when
    bpf_prog_lock_ro() was called.
    
    Reason for the latter is that bpf_prog_unlock_ro() is called from
    various places including error paths, and we shouldn't mess with
    page attributes when really not needed.
    
    For bpf_jit_binary_unlock_ro() this is not needed as jited flag
    implicitly indicates this, thus for archs with ARCH_HAS_SET_MEMORY
    we're guaranteed to have a previously locked image. Overall, this
    should also help us to identify any further potential issues with
    set_memory_*() helpers.
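    A userspace sketch of the two ideas (a locked flag and a one-time
    warning), not the bpf core code: struct prog, fake_set_memory(),
    prog_lock_ro() and prog_unlock_ro() are hypothetical. fake_set_memory()
    stands in for set_memory_ro()/set_memory_rw() and can be told to fail so
    the warning path runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical program image: `locked` records that prog_lock_ro() ran,
 * `read_only` models the page attribute of the image. */
struct prog {
    bool locked;
    bool read_only;
};

static int ro_failures;  /* number of failed read-only attempts */

/* Returns 0 on success; `fail` simulates an error from the real helper. */
static int fake_set_memory(struct prog *p, bool ro, bool fail)
{
    if (fail)
        return -1;
    p->read_only = ro;
    return 0;
}

static void prog_lock_ro(struct prog *p, bool fail)
{
    p->locked = true;
    /* Print only on the first failure, in the spirit of WARN_ON_ONCE(). */
    if (fake_set_memory(p, true, fail) != 0 && ro_failures++ == 0)
        fprintf(stderr, "warning: could not set image read-only\n");
}

/* Touch page attributes only if the image was actually locked; this can be
 * reached from error paths that never called prog_lock_ro(). */
static void prog_unlock_ro(struct prog *p)
{
    if (p->locked)
        fake_set_memory(p, false, false);
}
```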
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1da8ac7c49fb2879ba95006d8bd1095e6870ea1a
Author: Alexei Starovoitov <ast@fb.com>
Date:   Fri Mar 10 22:05:55 2017 -0800

    selftests/bpf: fix broken build
    
    Recent merge of 'linux-kselftest-4.11-rc1' tree broke bpf test build.
    None of the tests were building and test_verifier.c had tons of compiler errors.
    Fix it and add #ifdef CAP_IS_SUPPORTED to support old versions of libcap.
    Tested on CentOS 6.8 and 7.
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 79099aab38c8f5c746748b066ae74ba984fe2cc8
Author: David Ahern <dsa@cumulusnetworks.com>
Date:   Fri Mar 10 14:11:39 2017 -0800

    mpls: Do not decrement alive counter for unregister events
    
    Multipath routes can be rendered useless when a device in one of the
    paths is deleted. For example:
    
    $ ip -f mpls ro ls
    100
    	nexthop as to 200 via inet 172.16.2.2  dev virt12
    	nexthop as to 300 via inet 172.16.3.2  dev br0
    101
    	nexthop as to 201 via inet6 2000:2::2  dev virt12
    	nexthop as to 301 via inet6 2000:3::2  dev br0
    
    $ ip li del br0
    
    When br0 is deleted the other hop is not considered in
    mpls_select_multipath because of the alive check -- rt_nhn_alive
    is 0.
    
    rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
    down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
    a 2 hop route, deleting one device drops the alive count to 0. Since
    devices are taken down before unregistering, the decrement on
    NETDEV_UNREGISTER is redundant.
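    A minimal sketch of the accounting rule, not mpls_ifdown() itself:
    enum dev_event, struct mp_route and on_dev_event() are hypothetical.
    Because a device always goes down before it is unregistered, only the
    DOWN event lowers the alive count.

```c
#include <assert.h>

/* Hypothetical subset of netdevice notifier events. */
enum dev_event { EV_DOWN, EV_UNREGISTER };

/* Hypothetical multipath route: counts next hops whose device is usable. */
struct mp_route {
    int nhn;        /* number of next hops */
    int nhn_alive;  /* next hops still considered alive */
};

/* Decrement once per device, on DOWN only. Also counting UNREGISTER would
 * decrement twice for the same device and, for a two-hop route, leave no
 * alive next hop after deleting a single device. */
static void on_dev_event(struct mp_route *rt, enum dev_event ev)
{
    if (ev == EV_DOWN && rt->nhn_alive > 0)
        rt->nhn_alive--;
}
```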
    
    Fixes: c89359a42e2a4 ("mpls: support for dead routes")
    Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit b17075d5c1988b83f840d272c795ac17d57ce804
Author: Igor Druzhinin <igor.druzhinin@citrix.com>
Date:   Fri Mar 10 21:36:22 2017 +0000

    xen-netback: fix race condition on XenBus disconnect
    
    In some cases during XenBus disconnect event handling and subsequent
    queue resource release there may be some TX handlers active on
    other processors. Use RCU in order to synchronize with them.
    
    Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e37791ec1ad785b59022ae211f63a16189bacebf
Author: David Ahern <dsa@cumulusnetworks.com>
Date:   Fri Mar 10 09:46:15 2017 -0800

    mpls: Send route delete notifications when router module is unloaded
    
    When the mpls_router module is unloaded, mpls routes are deleted but
    notifications are not sent to userspace leaving userspace caches
    out of sync. Add the call to mpls_notify_route in mpls_net_exit as
    routes are freed.
    
    Fixes: 0189197f44160 ("mpls: Basic routing support")
    Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 52491c7607c5527138095edf44c53169dc1ddb82
Author: Etienne Noss <etienne.noss@wifirst.fr>
Date:   Fri Mar 10 16:55:32 2017 +0100

    act_connmark: avoid crashing on malformed nlattrs with null parms
    
    tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
    is set, resulting in a null pointer dereference when trying to access it.
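    A hedged sketch of the missing validation, using a hypothetical parsed
    attribute table where a slot is NULL if the attribute was absent (as
    after nested netlink parsing). The enum, struct connmark_parms and
    connmark_init() are assumptions, not the act_connmark definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute indices; tb[ATTR_PARMS] is NULL when absent. */
enum { ATTR_UNSPEC, ATTR_PARMS, ATTR_MAX };

struct connmark_parms {
    unsigned short zone;
};

/* Check that the mandatory attribute exists before dereferencing it:
 * a malformed request may simply omit it. */
static int connmark_init(const void *tb[], unsigned short *zone_out)
{
    const struct connmark_parms *parm;

    if (!tb[ATTR_PARMS])
        return -EINVAL;
    parm = tb[ATTR_PARMS];
    *zone_out = parm->zone;
    return 0;
}
```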
    
    [501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
    [501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
    ...
    [501099.044334] Call Trace:
    [501099.044345]  [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
    [501099.044363]  [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
    [501099.044380]  [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
    [501099.044398]  [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
    [501099.044417]  [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
    [501099.044436]  [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
    [501099.044454]  [<ffffffffa44a23a1>] ? security_capable+0x41/0x60
    [501099.044471]  [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
    [501099.044490]  [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
    [501099.044507]  [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
    [501099.044524]  [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
    [501099.044541]  [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230
    [501099.044558]  [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
    [501099.044576]  [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
    [501099.044592]  [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
    [501099.044608]  [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
    [501099.044626]  [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b
    
    Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action")
    Signed-off-by: Étienne Noss <etienne.noss@wifirst.fr>
    Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
    Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 88a7cddce2506b0b6c06a9f6e51379d0d275353b
Author: Neil Jerram <neil@tigera.io>
Date:   Fri Mar 10 12:24:57 2017 +0000

    Make IP 'forwarding' doc more precise
    
    It wasn't clear if the 'forwarding' setting needs to be enabled on the
    interface that packets are received from, or on the interface that
    packets are forwarded to, or both.
    
    In fact (according to my code reading) the setting is relevant on the
    interface that packets are received from, so this change updates the doc
    to say that.
    
    Signed-off-by: Neil Jerram <neil@tigera.io>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 7ce101246655935b014b11d81f815342921f5654
Author: stephen hemminger <stephen@networkplumber.org>
Date:   Thu Mar 9 14:58:29 2017 -0800

    netvsc: handle select_queue when device is being removed
    
    Move the send indirection table from the inner device (netvsc)
    to the network device context.
    
    It is possible that netvsc_device is not present (remove in progress).
    This solves potential use-after-free issues when a packet is being
    created during MTU changes, shutdown, or queue count changes.
    
    Fixes: d8e18ee0fa96 ("netvsc: enhance transmit select_queue")
    Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ecd052250b51c8cee4d9720adea05690dfc77c64
Author: David Arcari <darcari@redhat.com>
Date:   Thu Mar 9 13:28:33 2017 -0500

    net: ethernet: aquantia: call set_irq_affinity_hint before free_irq
    
    When a network interface controlled by the aquantia ethernet driver is brought
    down a warning is output in dmesg (see below).
    
    The problem is that aq_pci_func_free_irqs() calls free_irq() before it
    calls irq_set_affinity_hint().
    
    WARNING: CPU: 4 PID: 10068 at kernel/irq/manage.c:1503 __free_irq+0x24d/0x2b0
    <snip>
    Call Trace:
     dump_stack+0x63/0x87
     __warn+0xd1/0xf0
     warn_slowpath_null+0x1d/0x20
     __free_irq+0x24d/0x2b0
     free_irq+0x39/0x90
     aq_pci_func_free_irqs+0x52/0xa0 [atlantic]
     aq_nic_stop+0xca/0xd0 [atlantic]
     aq_ndev_close+0x1d/0x40 [atlantic]
     __dev_close_many+0x99/0x100
     __dev_close+0x67/0xb0
    <snip>
    
    Fixes: 36a4a50f4048 ("net: ethernet: aquantia: switch to pci_alloc_irq_vectors")
    
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Pavel Belous <pavel.belous@aquantia.com>
    Signed-off-by: David Arcari <darcari@redhat.com>
    Tested-by: Pavel Belous <pavel.belous@aquantia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit d1c4e9bf73e739b937ddd9dc4cf0f6de2e6117da
Author: João Paulo Rechi Vita <jprvita@gmail.com>
Date:   Mon Feb 20 14:50:23 2017 -0500

    platform/x86: asus-wmi: Remove quirk_no_rfkill
    
    With the detection introduced in the previous patches, we don't need
    these static DMI-based quirks anymore.
    
    This reverts the following commits:
    56a37a72002b "asus-wmi: Add quirk_no_rfkill_wapf4 for the Asus X456UA"
    a961a285b479 "asus-wmi: Add quirk_no_rfkill_wapf4 for the Asus X456UF"
    6b7ff2af5286 "asus-wmi: Add quirk_no_rfkill for the Asus Z550MA"
    02db9ff7af18 "asus-wmi: Add quirk_no_rfkill for the Asus U303LB"
    2d735244b798 "asus-wmi: Add quirk_no_rfkill for the Asus N552VW"
    a977e59c0c67 "asus-wmi: Create quirk for airplane_mode LED"
    
    Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    [dvhart: minor commit message corrections]
    Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>

commit f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f
Author: NeilBrown <neilb@suse.com>
Date:   Fri Mar 10 17:00:47 2017 +1100

    blk: Ensure users for current->bio_list can see the full list.
    
    Commit 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
    changed current->bio_list so that it did not contain *all* of the
    queued bios, but only those submitted by the currently running
    make_request_fn.
    
    There are two places which walk the list and requeue selected bios,
    and others that check if the list is empty.  These are no longer
    correct.
    
    So redefine current->bio_list to point to an array of two lists, which
    contain all queued bios, and adjust various code to test or walk both
    lists.
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    Fixes: 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 1345921393ba23b60d3fcf15933e699232ad25ae
Author: Jason Yan <yanaijie@huawei.com>
Date:   Fri Mar 10 11:49:12 2017 +0800

    md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
    
    The sb->layout is of type __le32, so we should use le32_to_cpu.
    
    Signed-off-by: Jason Yan <yanaijie@huawei.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 3fb632e40d7667d8bedfabc28850ac06d5493f54
Author: Jason Yan <yanaijie@huawei.com>
Date:   Fri Mar 10 11:27:23 2017 +0800

    md: fix super_offset endianness in super_1_rdev_size_change
    
    The sb->super_offset should be little-endian, but rdev->sb_start is in
    host byte order, so fix this by adding cpu_to_le64.
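    A portable sketch of what the two md fixes rely on: on-disk superblock
    fields are little-endian bytes regardless of host byte order, so values
    must be converted when stored and when loaded. put_le64() and get_le32()
    are hypothetical byte-wise stand-ins for cpu_to_le64() and
    le32_to_cpu(); they are not the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Store a host-order 64-bit value as little-endian bytes (least
 * significant byte first), as cpu_to_le64() would before writing. */
static void put_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Load a little-endian 32-bit field into host order, like le32_to_cpu(). */
static uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] | (uint32_t)in[1] << 8 |
           (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
}
```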
    
    Signed-off-by: Jason Yan <yanaijie@huawei.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 0e4c0e6ea7d4a988a5ae2791c7cb5769b5256dad
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Fri Feb 17 15:25:08 2017 +0100

    arm64: kernel: Update kerneldoc for cpu_suspend() rename
    
    Commit af391b15f7b56ce1 ("arm64: kernel: rename __cpu_suspend to keep it
    aligned with arm") renamed cpu_suspend() to arm_cpuidle_suspend(), but
    forgot to update the kerneldoc header.
    
    Fixes: af391b15f7b56ce1 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm")
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit ea29bd304d7b522f6162a36f394e690c579b5a63
Author: Eugenia Emantayev <eugenia@mellanox.com>
Date:   Fri Mar 10 14:33:05 2017 +0200

    net/mlx5e: Fix loopback selftest
    
    Change packet type handler to ETH_P_IP instead of ETH_P_ALL
    since we are already expecting an IP packet.
    
    Also, using ETH_P_ALL will cause the loopback test packet type handler
    to be called on all outgoing packets, especially our own self loopback
    test SKB, which will be validated on xmit as well, and we don't want that.
    
    Tested with:
    ethtool -t ethX
    validated that the loopback test passes.
    
    Fixes: 0952da791c97 ('net/mlx5e: Add support for loopback selftest')
    Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 65ba8fb7d5c6803ec236bb8d6650465fed7f9769
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date:   Fri Mar 10 14:33:04 2017 +0200

    net/mlx5e: Avoid wrong identification of rules on deletion
    
    When deleting offloaded TC flows, we must correctly identify E-switch
    rules. The current check could misidentify rules set on the PF: for
    example, it is possible to set NIC rules on the PF, switch to SRIOV
    offloads mode, and then attempt to delete a NIC rule.
    
    To solve that, we add a flags field to offloaded rules, set it at
    creation time, and use it throughout the code where currently needed.
    
    Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Reviewed-by: Roi Dayan <roid@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 33e21c59526e9147d7c68913995298f10c35cd6f
Author: Huy Nguyen <huyn@mellanox.com>
Date:   Fri Mar 10 14:33:03 2017 +0200

    net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode
    
    Currently, the function setdcbx fails if the requested DCBX mode
    is either IEEE or CEE. We remove the IEEE/CEE mode check because
    we support both IEEE and CEE interfaces.
    
    Fixes: 3a6a931dfb8e ("net/mlx5e: Support DCBX CEE API")
    Signed-off-by: Huy Nguyen <huyn@mellanox.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 5d47f6c89d568ab61712d8c40676fbb020b68752
Author: Daniel Jurgens <danielj@mellanox.com>
Date:   Fri Mar 10 14:33:02 2017 +0200

    net/mlx5: Don't save PCI state when PCI error is detected
    
    When a PCI error is detected, the PCI state could be corrupt, so don't
    save it in that flow. Save the state after initialization instead. After
    restoring the PCI state during slot reset, save it again, because
    restoring the state destroys the previously saved state info.
    
    Fixes: 05ac2c0b7438 ('net/mlx5: Fix race between PCI error handlers and health work')
    Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
    
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit af36370569eb37420e1e78a2e60c277b781fcd00
Author: Paul Blakey <paulb@mellanox.com>
Date:   Fri Mar 10 14:33:01 2017 +0200

    net/mlx5: Fix create autogroup prev initializer
    
    The autogroups list is a list of non-overlapping group boundaries
    sorted by their start index. If the autogroups list wasn't empty
    and an empty group slot was found at the start of the list,
    the new group was added to the end of the list instead of the
    beginning, as the prev initializer was incorrect.
    When this was repeated, it caused multiple groups to have
    overlapping boundaries.
    
    Fixed that by correctly initializing the prev pointer to the
    start of the list.
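    A hedged sketch of a first-fit insert into a list sorted by start index,
    where the "prev" cursor begins at the list head so that a gap at the
    very start yields an insert at the front. struct group and
    insert_group() are hypothetical; this is not the mlx5 fs_core code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical autogroup covering slots [start, start + size). */
struct group {
    int start, size;
    struct group *next;
};

/* Find the first gap of `size` slots and link g in at that position,
 * keeping the list sorted and non-overlapping. Returns -1 if no gap fits
 * within `max` slots. */
static int insert_group(struct group **head, struct group *g, int size, int max)
{
    struct group **prev = head;  /* start at the list head, not the tail */
    int start = 0;

    while (*prev) {
        if ((*prev)->start - start >= size)
            break;               /* the gap before *prev is large enough */
        start = (*prev)->start + (*prev)->size;
        prev = &(*prev)->next;
    }
    if (start + size > max)
        return -1;
    g->start = start;
    g->size = size;
    g->next = *prev;
    *prev = g;
    return 0;
}
```

    Using a pointer-to-pointer cursor means inserting at the head and in the
    middle of the list are handled by the same two assignments.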
    
    Fixes: eccec8da3b4e ('net/mlx5: Keep autogroups list ordered')
    Signed-off-by: Paul Blakey <paulb@mellanox.com>
    Reviewed-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 14088540ad63c648e5cdf490412033f792d16b6b
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Fri Mar 10 17:44:18 2017 +0000

    arm64: use const cap for system_uses_ttbr0_pan()
    
    Since commit 4b65a5db362783ab ("arm64: Introduce
    uaccess_{disable,enable} functionality based on TTBR0_EL1"),
    system_uses_ttbr0_pan() has used cpus_have_cap() to determine whether
    PAN is present.
    
    Since commit a4023f682739439b ("arm64: Add hypervisor safe helper for
    checking constant capabilities"), which was introduced around the same
    time, cpus_have_cap() doesn't try to use a static key, and must always
    perform a load, test, and conditional branch (likely a tbnz for the
    latter two).
    
    Elsewhere, we moved to using cpus_have_const_cap(), which can use a
    static key (i.e. a non-conditional branch), which is patched at runtime
    when the feature is detected.
    
    This patch makes system_uses_ttbr0_pan() use cpus_have_const_cap(). The
    static key is likely a win for hot-paths like the uaccess primitives,
    and this makes our usage consistent regardless.
    
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit 5c2a625937ba49bc691089370638223d310cda9a
Author: Eric Biggers <ebiggers@google.com>
Date:   Wed Mar 8 16:27:04 2017 -0800

    arm64: support keyctl() system call in 32-bit mode
    
    As is the case for a number of other architectures that have a 32-bit
    compat mode, enable KEYS_COMPAT if both COMPAT and KEYS are enabled.
    This allows AArch32 programs to use the keyctl() system call when
    running on an AArch64 kernel.
    
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit b0de0ccc8b9edd8846828e0ecdc35deacdf186b0
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Mon Mar 6 19:06:40 2017 +0000

    arm64: kasan: avoid bad virt_to_pfn()
    
    Booting a v4.11-rc1 kernel with DEBUG_VIRTUAL and KASAN enabled produces
    the following splat (trimmed for brevity):
    
    [    0.000000] virt_to_phys used for non-linear address: ffff200008080000 (0xffff200008080000)
    [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:14 __virt_to_phys+0x48/0x70
    [    0.000000] PC is at __virt_to_phys+0x48/0x70
    [    0.000000] LR is at __virt_to_phys+0x48/0x70
    [    0.000000] Call trace:
    [    0.000000] [<ffff2000080b1ac0>] __virt_to_phys+0x48/0x70
    [    0.000000] [<ffff20000a03b86c>] kasan_init+0x1c0/0x498
    [    0.000000] [<ffff20000a034018>] setup_arch+0x2fc/0x948
    [    0.000000] [<ffff20000a030c68>] start_kernel+0xb8/0x570
    [    0.000000] [<ffff20000a0301e8>] __primary_switched+0x6c/0x74
    
    This is because we use virt_to_pfn() on a kernel image address when
    trying to figure out its nid, so that we can allocate its shadow from
    the same node.
    
    As with other recent changes, this patch uses lm_alias() to solve this.
    
    We could instead use NUMA_NO_NODE, as x86 does for all shadow
    allocations, though we'll likely want the "real" memory shadow to be
    backed from its corresponding nid anyway, so we may as well be
    consistent and find the nid for the image shadow.
    
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Acked-by: Laura Abbott <labbott@redhat.com>
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit cb6950b7152fb3760942f9cb16bd2a35e5a1bfd1
Author: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Date:   Tue Mar 7 00:34:57 2017 +0530

    arm64: kprobes: remove kprobe_exceptions_notify
    
    Commit fc62d0207ae0 ("kprobes: Introduce weak variant of
    kprobe_exceptions_notify()") introduces a generic empty version of the
    function for architectures that don't need special handling, like arm64.
    As such, remove the arch/arm64/ specific handler.
    
    Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit 702f2ac87a9a8da23bf8506466bc70175fc970b2
Author: David Howells <dhowells@redhat.com>
Date:   Fri Mar 10 07:48:49 2017 +0000

    rxrpc: Wake up the transmitter if Rx window size increases on the peer
    
    The RxRPC ACK packet may contain an extension that includes the peer's
    current Rx window size for this call.  We adjust the local Tx window size
    to match.  However, the transmitter can stall if the receive window is
    reduced to 0 by the peer and then reopened.
    
    This is because the normal way that the transmitter is re-energised is by
    dropping something out of our Tx queue and thus making space.  When a
    single gap is made, the transmitter is woken up.  However, because there's
    nothing in the Tx queue at this point, this doesn't happen.
    
    To fix this, perform a wake_up() any time we see the peer's Rx window size
    increasing.
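A minimal sketch of the wake-on-growth rule, assuming a simplified per-call state. `struct tx_state`, `apply_peer_rwind()` and the counter standing in for `wake_up()` are hypothetical, not rxrpc's code. The wake-up is keyed on the window growing, not on space being freed in the Tx queue:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a call's transmit state; names are illustrative. */
struct tx_state {
	unsigned int tx_winsize;   /* current send window, in packets */
	unsigned int wakeups;      /* counts calls to the stubbed waker */
};

static void wake_transmitter(struct tx_state *s)
{
	s->wakeups++;   /* stands in for wake_up() on the transmitter's waitqueue */
}

/* Apply the peer's advertised Rx window from an ACK. */
static void apply_peer_rwind(struct tx_state *s, unsigned int rwind)
{
	bool grew = rwind > s->tx_winsize;

	s->tx_winsize = rwind;
	/* If the window reopened, the transmitter may be asleep with an empty
	 * Tx queue, so nothing else would wake it: do it here. */
	if (grew)
		wake_transmitter(s);
}
```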
    
    The observable symptom is that calls start failing on ETIMEDOUT and the
    following:
    
    	kAFS: SERVER DEAD state=-62
    
    appears in dmesg.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 296739839fa2851e6badc77dcfc45050094cb102
Author: Andrew Lunn <andrew@lunn.ch>
Date:   Thu Mar 9 20:53:31 2017 +0100

    net: phy: marvell: Fix double free of hwmon device
    
    The hwmon temperature sensor device is registered using a devm_hwmon
    API call.  The marvell_release() would then manually free the device,
    not using a devm_hwmon API, resulting in the device being removed
    twice, leading to a crash in kernfs_find_ns() during the second
    removal.
    
    Remove the manual removal, which makes marvell_release() empty, so
    remove it as well.
    
    Signed-off-by: Andrew Lunn <andrew@lunn.ch>
    Fixes: 0b04680fdae4 ("phy: marvell: Add support for temperature sensor")
    Acked-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 7b9f71f974a12740e79e918cfd58c2fce0b5b580
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Tue Feb 28 12:00:48 2017 +1000

    powerpc/64s: POWER9 machine check handler
    
    Add POWER9 machine check handler. There are several new types of errors
    added, so logging messages for those are also added.
    
    This doesn't attempt to reuse any of the P7/8 defines or functions,
    because that becomes too complex. The better option in future is to use
    a table driven approach.
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

commit c1bbf387d6191e6e18f3adc4db45b922822c2ba4
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Tue Feb 28 12:00:47 2017 +1000

    powerpc/64s: allow machine check handler to set severity and initiator
    
    Currently severity and initiator are always set to MCE_SEV_ERROR_SYNC and
    MCE_INITIATOR_CPU in the core mce code. Allow them to be set by the
    machine specific mce handlers.
    
    No functional change for existing handlers.
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

commit 1363875bdb6317a2d0798284d7aaf320f0782f6d
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Tue Feb 28 12:00:46 2017 +1000

    powerpc/64s: fix handling of non-synchronous machine checks
    
    A synchronous machine check is an exception raised by the attempt to
    execute the current instruction. If the error can't be corrected, it
    can make sense to SIGBUS the currently running process.
    
    In other cases, the error condition is not related to the current
    instruction, so killing the current process is not the right thing to
    do.
    
    Today, all machine checks are MCE_SEV_ERROR_SYNC, so this makes no
    practical difference. It will be used to handle POWER9 asynchronous
    machine checks.
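A sketch of the policy distinction, using hypothetical severity names rather than the powerpc enum: only an unrecovered synchronous error is attributed to the running process.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity classes for illustration; not the powerpc enum. */
enum mc_severity {
	MC_SEV_NO_ERROR,
	MC_SEV_ERROR_SYNC,    /* caused by the instruction being executed */
	MC_SEV_ERROR_ASYNC,   /* unrelated to the current instruction */
};

/* Only an uncorrected synchronous error is caused by the current process;
 * signalling it for anything else would punish a bystander. */
static bool should_sigbus_current(enum mc_severity sev, bool recovered)
{
	return sev == MC_SEV_ERROR_SYNC && !recovered;
}
```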
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

commit 46f401c4297a2232a037ad8801b6c83c90414cf7
Author: Larry Finger <Larry.Finger@lwfinger.net>
Date:   Thu Mar 9 20:33:51 2017 -0600

    powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops
    
    Commit 5657933dbb6e ("treewide: Move dma_ops from struct dev_archdata
    into struct device") introduced a crash for macio devices, an example
    backtrace being:
    
      kernel BUG at ./include/linux/dma-mapping.h:465!
      Oops: Exception in kernel mode, sig: 5 [#1]
      ...
      NIP [c031ddb0] dmam_alloc_coherent+0x74/0x140
      LR [c031de70] dmam_alloc_coherent+0x134/0x140
      Call Trace:
       dmam_alloc_coherent+0x134/0x140 (unreliable)
       pata_macio_port_start+0x3c/0x8c
       ata_host_start.part.5+0xfc/0x208
       ata_host_activate+0x128/0x154
       pata_macio_common_init+0x2f0/0x538
       pata_macio_attach+0xd8/0x180
       macio_device_probe+0x5c/0xec
       driver_probe_device+0x21c/0x314
       __driver_attach+0xcc/0xd0
       bus_for_each_dev+0x68/0xb4
       bus_add_driver+0x1dc/0x244
       driver_register+0x88/0x130
       pata_macio_init+0x5c/0x88
       do_one_initcall+0x40/0x170
       kernel_init_freeable+0x134/0x1d0
       kernel_init+0x18/0x110
       ret_from_kernel_thread+0x5c/0x64
    
    This was caused by the device having NULL dma_ops, triggering the
    BUG_ON(). Previously the device inherited its dma_ops via the assignment
    to dev->ofdev.dev.archdata. However after commit 5657933dbb6e the
    dma_ops are moved into dev->ofdev.dev, and so they need to be explicitly
    copied.
    
    Fixes: 5657933dbb6e ("treewide: Move dma_ops from struct dev_archdata into struct device")
    Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
    Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    [mpe: Rewrite change log, add backtrace]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

commit 6d22fe14005ce66d1a120495ac16499b944feb95
Author: Doug Berger <opendmb@gmail.com>
Date:   Thu Mar 9 16:58:50 2017 -0800

    net: bcmgenet: decouple flow control from bcmgenet_tx_reclaim
    
    The bcmgenet_tx_reclaim() function is used to reclaim transmit
    resources in different places within the driver.  Most of them
    should not affect the state of the transmit flow control.
    
    This commit relocates the logic for waking tx queues based on
    freed resources to the napi polling function where it is more
    appropriate.
    
    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Doug Berger <opendmb@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 89316fa34ab8afac8d693f41a5bc268673f1da15
Author: Edwin Chan <edwin.chan@broadcom.com>
Date:   Thu Mar 9 16:58:49 2017 -0800

    net: bcmgenet: add begin/complete ethtool ops
    
    Make sure the clock is enabled for ethtool ops.
    
    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Edwin Chan <edwin.chan@broadcom.com>
    Signed-off-by: Doug Berger <opendmb@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 6be371b053dc86f11465cc1abce2e99bda0a0574
Author: Doug Berger <opendmb@gmail.com>
Date:   Thu Mar 9 16:58:48 2017 -0800

    net: bcmgenet: Power up the internal PHY before probing the MII
    
    When using the internal PHY, it must be powered up when the MII is probed
    or the PHY will not be detected.  Since the PHY is powered up at reset,
    this has not been a problem.  However, when the kernel is restarted with
    kexec, the PHY will likely be powered down when the kernel starts, so it
    will not be detected and the Ethernet link will not be established.
    
    This commit explicitly powers up the internal PHY when the GENET driver
    is probed to correct this behavior.
    
    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Doug Berger <opendmb@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 07c52d6a0b955a8a28834f9354793cfc4b81d0e9
Author: Doug Berger <opendmb@gmail.com>
Date:   Thu Mar 9 16:58:47 2017 -0800

    net: bcmgenet: synchronize irq0 status between the isr and task
    
    Add a spinlock to ensure that irq0_stat is not unintentionally altered
    as the result of preemption.  Also remove unserviced irq0 interrupts
    and remove irq1_stat, since there is no bottom-half service for those
    interrupts.
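A userspace sketch of the locking pattern, assuming a simple test-and-set spinlock built on C11 `atomic_flag`. `struct irq_ctx`, `isr_record()` and `task_take()` are hypothetical names. In the kernel, the task side would also have to disable local interrupts (for example with `spin_lock_irqsave()`), which this model cannot show:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: a minimal spinlock guarding status bits shared by an
 * interrupt handler and a deferred task. */
struct irq_ctx {
	atomic_flag lock;
	uint32_t irq0_stat;   /* bits raised by the ISR, not yet serviced */
};

static void ctx_lock(struct irq_ctx *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;   /* spin until the holder releases the lock */
}

static void ctx_unlock(struct irq_ctx *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* ISR side: accumulate newly raised bits. */
static void isr_record(struct irq_ctx *c, uint32_t bits)
{
	ctx_lock(c);
	c->irq0_stat |= bits;
	ctx_unlock(c);
}

/* Task side: snapshot and clear under the same lock, so bits the ISR sets
 * between the read and the clear cannot be lost. */
static uint32_t task_take(struct irq_ctx *c)
{
	uint32_t status;

	ctx_lock(c);
	status = c->irq0_stat;
	c->irq0_stat = 0;
	ctx_unlock(c);
	return status;
}
```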
    
    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Doug Berger <opendmb@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 7627409cc4970e8c8b9de6945ad86a575290a94e
Author: Doug Berger <opendmb@gmail.com>
Date:   Thu Mar 9 16:58:46 2017 -0800

    net: bcmgenet: power down internal phy if open or resume fails
    
    Since the internal PHY is powered up during the open and resume
    functions it should be powered back down if the functions fail.
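The usual kernel idiom for this is an error path that unwinds the earlier steps in reverse order. A minimal sketch with hypothetical stand-ins (`struct fake_dev`, `phy_power_up()`, `hw_init()` and `dev_open()` are not bcmgenet functions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; names are illustrative only. */
struct fake_dev {
	bool phy_powered;
	bool fail_init;   /* force the later init step to fail */
};

static void phy_power_up(struct fake_dev *d)   { d->phy_powered = true; }
static void phy_power_down(struct fake_dev *d) { d->phy_powered = false; }
static int hw_init(struct fake_dev *d)         { return d->fail_init ? -1 : 0; }

/* open/resume-style path: undo the power-up if a later step fails. */
static int dev_open(struct fake_dev *d)
{
	int ret;

	phy_power_up(d);
	ret = hw_init(d);
	if (ret)
		goto err_power_down;
	return 0;

err_power_down:
	phy_power_down(d);   /* leave the PHY as we found it */
	return ret;
}
```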
    
    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Doug Berger <opendmb@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit eca4bad73409aedc6ff22f823c18b67a4f08c851
Author: Doug Berger <opendmb@gmail.com>
Date:   Thu Mar 9 16:58:45 2017 -0800

    net: bcmgenet: reserved phy revisions must be checked first
    
    The reserved gphy_rev value of 0x01ff must be tested before the old
    or new scheme for GPHY major versioning are tested, otherwise it will
    be treated as 0xff00 according to the old schem…
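A sketch of why the ordering matters, under assumed bit layouts (old scheme: major version in bits 7:4; new scheme: bits 15:8) that are illustrative rather than taken from the driver; both decoders are hypothetical. Because 0x01ff has bits 7:4 set, a decoder that tries the old scheme first shifts it into 0xff00:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GPHY_REV_RESERVED 0x01ffu   /* reserved value named in the text */

/* Assumed layouts, normalized to major-version-in-bits-15:8.
 * Returns false for values that are not a usable revision. */
static bool decode_naive(uint16_t raw, uint16_t *out)
{
	if (raw & 0x00f0)
		*out = (uint16_t)(raw << 8);   /* 0x01ff lands here as 0xff00 */
	else if (raw & 0xff00)
		*out = raw;
	else
		return false;
	return true;
}

static bool decode_fixed(uint16_t raw, uint16_t *out)
{
	/* Test the reserved value before either scheme can claim it. */
	if (raw == GPHY_REV_RESERVED)
		return false;
	return decode_naive(raw, out);
}
```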

Noltari pushed a commit to Noltari/linux that referenced this pull request Mar 30, 2017

xfs: fix up xfs_swap_extent_forks inline extent handling
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.
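A sketch of the two tests, assuming a 16-byte extent record and an inline capacity of two records; `struct extent_rec`, `INLINE_EXTS` and both helpers are hypothetical, not the XFS definitions. The count derived from the in-core byte size includes extents that the on-disk counter does not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a two-word extent record and an inline
 * capacity of two records. */
struct extent_rec { uint64_t l0, l1; };
#define INLINE_EXTS 2

/* Buggy test: trusts the on-disk extent count, which misses extents that
 * exist only in core (e.g. speculative preallocation). */
static bool fits_inline_by_disk_count(unsigned int di_nextents)
{
	return di_nextents <= INLINE_EXTS;
}

/* Fixed test: derive the count from the bytes held in the in-core fork. */
static bool fits_inline_by_bytes(size_t if_bytes)
{
	return if_bytes / sizeof(struct extent_rec) <= INLINE_EXTS;
}
```

With two extents on disk but three in core, the first test wrongly answers "fits inline", which is the condition that later leads to the out-of-range memmove/memset.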

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
--
 fs/xfs/xfs_bmap_util.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)


lgeek added a commit to lgeek/linux-okreader that referenced this pull request Apr 8, 2017

ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0, causing
register/memory corruption.
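A portable C sketch of the contract the assembly was violating: `memset()` must return its first argument, and a caller (or the compiler, as in the listing above) may use that return value in place of the original pointer. `my_memset()` and `struct waiter_like` are hypothetical names chosen to avoid clashing with libc and the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise memset honouring the C standard contract: it returns s. */
static void *my_memset(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;   /* callers and compilers may rely on this */
}

struct waiter_like {
	void *magic;
	int pad[3];
};

/* Same shape as debug_mutex_lock_common(): the returned pointer is used
 * instead of the original argument. */
static void init_waiter(struct waiter_like *w)
{
	struct waiter_like *same = my_memset(w, 0x11, sizeof(*w));

	same->magic = same;
}
```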

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(cherry picked from commit 455bd4c)

fuzeman pushed a commit to fuzeman/linux-ubuntu-zesty that referenced this pull request Apr 11, 2017

dccp/tcp: fix routing redirect race
BugLink: http://bugs.launchpad.net/bugs/1675032

[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

static bool tcp_in_quickack_mode(struct sock *sk)
{
	const struct inet_connection_sock *icsk = inet_csk(sk);
	const struct dst_entry *dst = __sk_dst_get(sk);

	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
		(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry is added for each such host, creating
more dst_entries with lower reference counts and making the race more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()->dst_release().

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked, and
instead let the redirect take place later, when user space no longer has
the socket locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.
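A minimal model of the fix, assuming a simplified socket with an ownership flag and a refcounted cached route. `struct sock_model`, `handle_redirect()` and the plain integer refcount are hypothetical stand-ins for `sock_owned_by_user()`, `do_redirect()` and the real dst reference counting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a cached route and a socket user space may own. */
struct route { int refcnt; };
struct sock_model {
	bool owned_by_user;       /* stands in for sock_owned_by_user(sk) */
	struct route *dst_cache;  /* stands in for sk->sk_dst_cache */
};

/* Drop the socket's cached route; this touches state the lock owner
 * may also be using, so it must not race with it. */
static void redirect_drop_cache(struct sock_model *sk)
{
	if (sk->dst_cache) {
		sk->dst_cache->refcnt--;
		sk->dst_cache = NULL;
	}
}

/* ICMP error path: act on a redirect only when user space does not hold
 * the socket; otherwise skip it and let it be handled later. */
static bool handle_redirect(struct sock_model *sk)
{
	if (sk->owned_by_user)
		return false;
	redirect_drop_cache(sk);
	return true;
}
```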

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Noltari pushed a commit to Noltari/linux that referenced this pull request Apr 13, 2017

dccp/tcp: fix routing redirect race
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

static bool tcp_in_quickack_mode(struct sock *sk)
{
	const struct inet_connection_sock *icsk = inet_csk(sk);
	const struct dst_entry *dst = __sk_dst_get(sk);

	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
		(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry is added for each such host, creating
more dst_entries with lower reference counts and making the race more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()->dst_release().

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked, and
instead let the redirect take place later, when user space no longer has
the socket locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>

rickgaiser pushed a commit to rickgaiser/linux that referenced this pull request Apr 21, 2017

ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0, causing
register/memory corruption.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Change-Id: I79a0d6897572b693d50f8ea8a94aa331bfcc59f8
Git-Commit: 455bd4c
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>

fengguang added a commit to 0day-ci/linux that referenced this pull request May 2, 2017

vmscan: scan pages until it finds eligible pages
Oops, forgot to add lkml and linux-mm.
Sorry for that.
Sending it again.

From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 2 May 2017 12:34:05 +0900
Subject: [PATCH] vmscan: scan pages until it finds eligible pages

On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
Premature OOMs are happening. Although there is plenty of free swap and
there are plenty of pages on the anonymous LRU lists of eligible zones,
OOM happened.

Investigation showed that the page skipping in isolate_lru_pages() makes
reclaim ineffective: it easily returns zero nr_taken, so LRU shrinking
does effectively nothing and just increases the priority aggressively.
Eventually, OOM happens.

This patch makes isolate_lru_pages() keep scanning until it encounters
pages from an eligible zone or too many pages have been scanned (i.e.,
the node's LRU size).
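A simplified sketch of the scanning change, assuming each LRU entry is represented only by its zone index. Both functions are hypothetical and ignore isolation failures, batching and the other details of `isolate_lru_pages()`. Skipped pages no longer consume the `nr_to_scan` budget, but the walk is still bounded by the LRU size:

```c
#include <assert.h>
#include <stddef.h>

/* Old behaviour (simplified): skipped pages consume the scan budget. */
static size_t isolate_old(const int *zone_of, size_t lru_size,
			  size_t nr_to_scan, int max_zone)
{
	size_t taken = 0;

	for (size_t i = 0; i < nr_to_scan && i < lru_size; i++)
		if (zone_of[i] <= max_zone)
			taken++;
	return taken;
}

/* Patched behaviour (simplified): only eligible pages count toward
 * nr_to_scan, and the walk never goes past the whole LRU. */
static size_t isolate_new(const int *zone_of, size_t lru_size,
			  size_t nr_to_scan, int max_zone)
{
	size_t taken = 0;

	for (size_t total = 0; taken < nr_to_scan && total < lru_size; total++) {
		if (zone_of[total] > max_zone)
			continue;   /* ineligible zone: skip without using budget */
		taken++;
	}
	return taken;
}
```

When the tail of the LRU is full of pages from ineligible zones, the old loop returns zero and reclaim makes no progress, while the new loop walks past them to the eligible pages.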

balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x65/0x87
 dump_header.isra.19+0x8f/0x20f
 ? preempt_count_add+0x9e/0xb0
 ? _raw_spin_unlock_irqrestore+0x24/0x40
 oom_kill_process+0x21d/0x3f0
 ? has_capability_noaudit+0x17/0x20
 out_of_memory+0xd8/0x390
 __alloc_pages_slowpath+0xbc1/0xc50
 ? anon_vma_interval_tree_insert+0x84/0x90
 __alloc_pages_nodemask+0x1a5/0x1c0
 pte_alloc_one+0x20/0x50
 __pte_alloc+0x1e/0x110
 __handle_mm_fault+0x919/0x960
 handle_mm_fault+0x77/0x120
 __do_page_fault+0x27a/0x550
 trace_do_page_fault+0x43/0x150
 do_async_page_fault+0x2c/0x90
 async_page_fault+0x28/0x30
RIP: 0033:0x7fc4636bacb8
RSP: 002b:00007fff97c9c4c0 EFLAGS: 00010202
RAX: 00007fc3e818d000 RBX: 00007fc4639f8760 RCX: 00007fc46372e9ca
RDX: 0000000000101002 RSI: 0000000000101000 RDI: 0000000000000000
RBP: 0000000000100010 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 00000000000a3901 R12: 00007fc3e818d010
R13: 0000000000101000 R14: 00007fc4639f87b8 R15: 00007fc4639f87b8
Mem-Info:
active_anon:424716 inactive_anon:65314 isolated_anon:0
 active_file:52 inactive_file:46 isolated_file:0
 unevictable:0 dirty:27 writeback:0 unstable:0
 slab_reclaimable:3967 slab_unreclaimable:4125
 mapped:133 shmem:43 pagetables:1674 bounce:0
 free:4637 free_pcp:225 free_cma:0
Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 992 992 1952
DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 959
Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
378 total pagecache pages
17 pages in swap cache
Swap cache stats: add 17325, delete 17302, find 0/27
Free swap  = 978940kB
Total swap = 1048572kB
524157 pages RAM
0 pages HighMem/MovableOnly
12629 pages reserved
0 pages cma reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
[  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd
...

Signed-off-by: Minchan Kim <minchan@kernel.org>

sunny256 pushed a commit to sunny256/linux that referenced this pull request May 5, 2017

dccp/tcp: fix routing redirect race
commit 45caeaa upstream.

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 TCP/IPv6 DCCP patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry is accessed here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed two significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway, so an rtable/dst_entry is added for each such host. This
creates more dst_entry objects with lower reference counts, which makes the
race more likely.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() runs
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache is
decremented twice for the same socket via:

do_redirect() -> __sk_dst_check() -> dst_release()

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked, and
instead let the redirect take place later, when user space no longer holds
the socket lock.

The dccp/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL
and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sunny256 pushed a commit to sunny256/linux that referenced this pull request May 5, 2017

xfs: fix up xfs_swap_extent_forks inline extent handling
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fengguang added a commit to 0day-ci/linux that referenced this pull request May 7, 2017

block/mq: fix potential deadlock during cpu hotplug
This can be triggered by hot-unplugging one CPU.

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0+ #17 Not tainted
 -------------------------------------------------------
 step_after_susp/2640 is trying to acquire lock:
  (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110

 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        get_online_cpus+0x64/0x80
        blk_mq_init_allocated_queue+0x3a0/0x4e0
        blk_mq_init_queue+0x3a/0x60
        loop_add+0xe5/0x280
        loop_init+0x124/0x177
        do_one_initcall+0x53/0x1c0
        kernel_init_freeable+0x1e3/0x27f
        kernel_init+0xe/0x100
        ret_from_fork+0x31/0x40

 -> #0 (all_q_mutex){+.+...}:
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        do_syscall_64+0x8f/0x710
        return_from_SYSCALL_64+0x0/0x7a

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(all_q_mutex);
                                lock(cpu_hotplug.lock);
   lock(all_q_mutex);

  *** DEADLOCK ***

 8 locks held by step_after_susp/2640:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
  #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
  #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 stack backtrace:
 CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
 Call Trace:
  dump_stack+0x99/0xce
  print_circular_bug+0x1fa/0x270
  __lock_acquire+0x189a/0x18a0
  lock_acquire+0x11c/0x230
  ? lock_acquire+0x11c/0x230
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? blk_mq_queue_reinit_work+0x18/0x110
  __mutex_lock+0x92/0x990
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? kmem_cache_free+0x2cb/0x330
  ? anon_transport_class_unregister+0x20/0x20
  ? blk_mq_queue_reinit_work+0x110/0x110
  mutex_lock_nested+0x1b/0x20
  ? mutex_lock_nested+0x1b/0x20
  blk_mq_queue_reinit_work+0x18/0x110
  blk_mq_queue_reinit_dead+0x1c/0x20
  cpuhp_invoke_callback+0x1f2/0x810
  ? __flow_cache_shrink+0x160/0x160
  cpuhp_down_callbacks+0x42/0x80
  _cpu_down+0xb2/0xe0
  freeze_secondary_cpus+0xb6/0x390
  suspend_devices_and_enter+0x3b3/0xa40
  ? rcu_read_lock_sched_held+0x79/0x80
  pm_suspend+0x129/0x490
  state_store+0x82/0xf0
  kobj_attr_store+0xf/0x20
  sysfs_kf_write+0x45/0x60
  kernfs_fop_write+0x135/0x1c0
  __vfs_write+0x37/0x160
  ? rcu_read_lock_sched_held+0x79/0x80
  ? rcu_sync_lockdep_assert+0x2f/0x60
  ? __sb_start_write+0xd9/0x1c0
  ? vfs_write+0x1ad/0x1d0
  vfs_write+0xcd/0x1d0
  SyS_write+0x58/0xc0
  ? rcu_read_lock_sched_held+0x79/0x80
  do_syscall_64+0x8f/0x710
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

The CPU hotplug path holds cpu_hotplug.lock and then reinitializes all existing blk-mq queues
under all_q_mutex, whereas blk_mq_init_allocated_queue() acquires these two locks in the
opposite order. This inversion comes from commit eabe065 (blk/mq: Cure cpu hotplug lock inversion),
which fixed a CPU hotplug lock inversion caused by the hotplug rework. However, that rework is
still a work in progress that lives in a -tip branch, and mainline cannot yet trigger that splat.
The commit breaks Linus's tree in the merge window, so this patch reverts the lock order and
avoids the splat in Linus's tree.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>

fengguang added a commit to 0day-ci/linux that referenced this pull request May 7, 2017

block/mq: fix potential deadlock during cpu hotplug
This can be triggered by hot-unplugging one CPU.

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0+ #17 Not tainted
 -------------------------------------------------------
 step_after_susp/2640 is trying to acquire lock:
  (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110

 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        get_online_cpus+0x64/0x80
        blk_mq_init_allocated_queue+0x3a0/0x4e0
        blk_mq_init_queue+0x3a/0x60
        loop_add+0xe5/0x280
        loop_init+0x124/0x177
        do_one_initcall+0x53/0x1c0
        kernel_init_freeable+0x1e3/0x27f
        kernel_init+0xe/0x100
        ret_from_fork+0x31/0x40

 -> #0 (all_q_mutex){+.+...}:
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        do_syscall_64+0x8f/0x710
        return_from_SYSCALL_64+0x0/0x7a

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(all_q_mutex);
                                lock(cpu_hotplug.lock);
   lock(all_q_mutex);

  *** DEADLOCK ***

 8 locks held by step_after_susp/2640:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
  #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
  #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 stack backtrace:
 CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
 Call Trace:
  dump_stack+0x99/0xce
  print_circular_bug+0x1fa/0x270
  __lock_acquire+0x189a/0x18a0
  lock_acquire+0x11c/0x230
  ? lock_acquire+0x11c/0x230
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? blk_mq_queue_reinit_work+0x18/0x110
  __mutex_lock+0x92/0x990
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? kmem_cache_free+0x2cb/0x330
  ? anon_transport_class_unregister+0x20/0x20
  ? blk_mq_queue_reinit_work+0x110/0x110
  mutex_lock_nested+0x1b/0x20
  ? mutex_lock_nested+0x1b/0x20
  blk_mq_queue_reinit_work+0x18/0x110
  blk_mq_queue_reinit_dead+0x1c/0x20
  cpuhp_invoke_callback+0x1f2/0x810
  ? __flow_cache_shrink+0x160/0x160
  cpuhp_down_callbacks+0x42/0x80
  _cpu_down+0xb2/0xe0
  freeze_secondary_cpus+0xb6/0x390
  suspend_devices_and_enter+0x3b3/0xa40
  ? rcu_read_lock_sched_held+0x79/0x80
  pm_suspend+0x129/0x490
  state_store+0x82/0xf0
  kobj_attr_store+0xf/0x20
  sysfs_kf_write+0x45/0x60
  kernfs_fop_write+0x135/0x1c0
  __vfs_write+0x37/0x160
  ? rcu_read_lock_sched_held+0x79/0x80
  ? rcu_sync_lockdep_assert+0x2f/0x60
  ? __sb_start_write+0xd9/0x1c0
  ? vfs_write+0x1ad/0x1d0
  vfs_write+0xcd/0x1d0
  SyS_write+0x58/0xc0
  ? rcu_read_lock_sched_held+0x79/0x80
  do_syscall_64+0x8f/0x710
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

The CPU hotplug path holds cpu_hotplug.lock and then reinitializes all existing
blk-mq queues under all_q_mutex, whereas blk_mq_init_allocated_queue() acquires
these two locks in the opposite order. This inversion comes from commit eabe065
(blk/mq: Cure cpu hotplug lock inversion), which fixed a CPU hotplug lock inversion
caused by the hotplug rework. However, that rework is still a work in progress
that lives in a -tip branch, and mainline cannot yet trigger that splat. The commit
breaks Linus's tree in the merge window, so this patch reverts the lock order
and avoids the splat in Linus's tree.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>

faxiang1230 pushed a commit to faxiang1230/linux that referenced this pull request May 10, 2017

dccp/tcp: fix routing redirect race
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 TCP/IPv6 DCCP patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry is accessed here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed two significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway, so an rtable/dst_entry is added for each such host. This
creates more dst_entry objects with lower reference counts, which makes the
race more likely.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() runs
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache is
decremented twice for the same socket via:

do_redirect() -> __sk_dst_check() -> dst_release()

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked, and
instead let the redirect take place later, when user space no longer holds
the socket lock.

The dccp/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL
and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Reichl pushed a commit to Reichl/linux-odroid that referenced this pull request May 10, 2017

block/mq: fix potential deadlock during cpu hotplug
This can be triggered by hot-unplugging one CPU.

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0+ #17 Not tainted
 -------------------------------------------------------
 step_after_susp/2640 is trying to acquire lock:
  (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110

 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        get_online_cpus+0x64/0x80
        blk_mq_init_allocated_queue+0x3a0/0x4e0
        blk_mq_init_queue+0x3a/0x60
        loop_add+0xe5/0x280
        loop_init+0x124/0x177
        do_one_initcall+0x53/0x1c0
        kernel_init_freeable+0x1e3/0x27f
        kernel_init+0xe/0x100
        ret_from_fork+0x31/0x40

 -> #0 (all_q_mutex){+.+...}:
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        do_syscall_64+0x8f/0x710
        return_from_SYSCALL_64+0x0/0x7a

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(all_q_mutex);
                                lock(cpu_hotplug.lock);
   lock(all_q_mutex);

  *** DEADLOCK ***

 8 locks held by step_after_susp/2640:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
  #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
  #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 stack backtrace:
 CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
 Call Trace:
  dump_stack+0x99/0xce
  print_circular_bug+0x1fa/0x270
  __lock_acquire+0x189a/0x18a0
  lock_acquire+0x11c/0x230
  ? lock_acquire+0x11c/0x230
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? blk_mq_queue_reinit_work+0x18/0x110
  __mutex_lock+0x92/0x990
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? kmem_cache_free+0x2cb/0x330
  ? anon_transport_class_unregister+0x20/0x20
  ? blk_mq_queue_reinit_work+0x110/0x110
  mutex_lock_nested+0x1b/0x20
  ? mutex_lock_nested+0x1b/0x20
  blk_mq_queue_reinit_work+0x18/0x110
  blk_mq_queue_reinit_dead+0x1c/0x20
  cpuhp_invoke_callback+0x1f2/0x810
  ? __flow_cache_shrink+0x160/0x160
  cpuhp_down_callbacks+0x42/0x80
  _cpu_down+0xb2/0xe0
  freeze_secondary_cpus+0xb6/0x390
  suspend_devices_and_enter+0x3b3/0xa40
  ? rcu_read_lock_sched_held+0x79/0x80
  pm_suspend+0x129/0x490
  state_store+0x82/0xf0
  kobj_attr_store+0xf/0x20
  sysfs_kf_write+0x45/0x60
  kernfs_fop_write+0x135/0x1c0
  __vfs_write+0x37/0x160
  ? rcu_read_lock_sched_held+0x79/0x80
  ? rcu_sync_lockdep_assert+0x2f/0x60
  ? __sb_start_write+0xd9/0x1c0
  ? vfs_write+0x1ad/0x1d0
  vfs_write+0xcd/0x1d0
  SyS_write+0x58/0xc0
  ? rcu_read_lock_sched_held+0x79/0x80
  do_syscall_64+0x8f/0x710
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

The CPU hotplug path holds cpu_hotplug.lock and then reinitializes all existing
blk-mq queues under all_q_mutex, whereas blk_mq_init_allocated_queue() acquires
these two locks in the opposite order. This inversion comes from commit eabe065
(blk/mq: Cure cpu hotplug lock inversion), which fixed a CPU hotplug lock inversion
caused by the hotplug rework. However, that rework is still a work in progress
that lives in a -tip branch, and mainline cannot yet trigger that splat. The commit
breaks Linus's tree in the merge window, so this patch reverts the lock order
and avoids the splat in Linus's tree.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

fengguang added a commit to 0day-ci/linux that referenced this pull request May 10, 2017

mm: vmscan: scan until it founds eligible pages
Although there was plenty of free swap and plenty of anonymous LRU pages
in eligible zones, OOM happened.

balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x65/0x87
 dump_header.isra.19+0x8f/0x20f
 ? preempt_count_add+0x9e/0xb0
 ? _raw_spin_unlock_irqrestore+0x24/0x40
 oom_kill_process+0x21d/0x3f0
 ? has_capability_noaudit+0x17/0x20
 out_of_memory+0xd8/0x390
 __alloc_pages_slowpath+0xbc1/0xc50
 ? anon_vma_interval_tree_insert+0x84/0x90
 __alloc_pages_nodemask+0x1a5/0x1c0
 pte_alloc_one+0x20/0x50
 __pte_alloc+0x1e/0x110
 __handle_mm_fault+0x919/0x960
 handle_mm_fault+0x77/0x120
 __do_page_fault+0x27a/0x550
 trace_do_page_fault+0x43/0x150
 do_async_page_fault+0x2c/0x90
 async_page_fault+0x28/0x30
RIP: 0033:0x7fc4636bacb8
RSP: 002b:00007fff97c9c4c0 EFLAGS: 00010202
RAX: 00007fc3e818d000 RBX: 00007fc4639f8760 RCX: 00007fc46372e9ca
RDX: 0000000000101002 RSI: 0000000000101000 RDI: 0000000000000000
RBP: 0000000000100010 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 00000000000a3901 R12: 00007fc3e818d010
R13: 0000000000101000 R14: 00007fc4639f87b8 R15: 00007fc4639f87b8
Mem-Info:
active_anon:424716 inactive_anon:65314 isolated_anon:0
 active_file:52 inactive_file:46 isolated_file:0
 unevictable:0 dirty:27 writeback:0 unstable:0
 slab_reclaimable:3967 slab_unreclaimable:4125
 mapped:133 shmem:43 pagetables:1674 bounce:0
 free:4637 free_pcp:225 free_cma:0
Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 992 992 1952
DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 959
Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
378 total pagecache pages
17 pages in swap cache
Swap cache stats: add 17325, delete 17302, find 0/27
Free swap  = 978940kB
Total swap = 1048572kB
524157 pages RAM
0 pages HighMem/MovableOnly
12629 pages reserved
0 pages cma reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
[  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

Investigation showed that skipping pages in isolate_lru_pages makes
reclaim ineffective: it easily returns zero nr_taken, so LRU shrinking
accomplishes nothing and only escalates the reclaim priority
aggressively. Eventually, OOM happens.

The problem is that get_scan_count determines nr_to_scan from the
eligible zones only, so even when priority drops to zero, reclaim
cannot free any pages if the LRU contains mostly ineligible pages.

get_scan_count:

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	size = size >> sc->priority;

Assume sc->priority is 0 and the LRU list is as follows.

	N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(i.e., a small number of eligible pages sit at the head of the LRU,
 and the rest are almost all ineligible pages)

In that case, size becomes 4, so the VM wants to scan 4 pages, but
the 4 pages at the tail of the LRU are not eligible.
If skipped pages count against the scan budget from get_scan_count,
none of the remaining pages is reclaimed after those 4 are scanned,
so the system ends up in OOM.
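The arithmetic can be checked with a small model of the scan target. This sketch assumes reclaim starts at priority 12 (the kernel's DEF_PRIORITY) and steps down to 0; `scan_target()` and `total_scan_target()` are hypothetical helpers that only reproduce the shift shown in get_scan_count, with the size already restricted to eligible zones.

```c
#include <assert.h>

/* Assumption: mirrors the kernel's DEF_PRIORITY, where reclaim starts. */
#define SKETCH_DEF_PRIORITY 12

/* Models `size = size >> sc->priority;` where size has already been
 * limited to eligible zones (the reclaim_idx argument above). */
static unsigned long scan_target(unsigned long eligible_size, int priority)
{
	assert(priority >= 0 && priority <= SKETCH_DEF_PRIORITY);
	return eligible_size >> priority;
}

/* Sum of the per-pass targets in this model, from SKETCH_DEF_PRIORITY
 * down to 0. */
static unsigned long total_scan_target(unsigned long eligible_size)
{
	unsigned long total = 0;

	for (int prio = SKETCH_DEF_PRIORITY; prio >= 0; prio--)
		total += scan_target(eligible_size, prio);
	return total;
}
```

With 4 eligible pages, every pass from priority 12 down to 3 asks for 0 pages, and the passes at priorities 2, 1 and 0 ask for 1, 2 and 4, for 7 in total. If the tail of the LRU holds only ineligible pages, a budget that small is spent entirely on skipping them.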

This patch makes isolate_lru_pages keep scanning until it
encounters pages from eligible zones.
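The difference between charging skipped pages against the scan budget and not charging them can be shown with a toy model. It assumes the LRU is a string written head to tail ('N' for a page in an eligible zone, 'H' for an ineligible one, the same shape as the diagram above) and that reclaim walks from the tail; both function names are hypothetical, and the real isolate_lru_pages() does considerably more bookkeeping than this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pre-fix behaviour (modelled): every page looked at, skipped or not,
 * consumes the nr_to_scan budget. Returns the number of pages taken. */
static size_t isolate_counting_skipped(const char *lru, size_t nr_to_scan)
{
	assert(lru != NULL);
	size_t len = strlen(lru), scanned = 0, taken = 0;

	for (size_t i = len; i > 0 && scanned < nr_to_scan; i--) {
		scanned++;              /* ineligible pages are charged too */
		if (lru[i - 1] == 'N')
			taken++;
	}
	return taken;
}

/* Patched behaviour (modelled): ineligible pages are skipped without
 * being charged, so the walk continues until it reaches eligible pages. */
static size_t isolate_until_eligible(const char *lru, size_t nr_to_scan)
{
	assert(lru != NULL);
	size_t len = strlen(lru), scanned = 0, taken = 0;

	for (size_t i = len; i > 0 && scanned < nr_to_scan; i--) {
		if (lru[i - 1] != 'N')
			continue;       /* skip, but do not consume budget */
		scanned++;
		taken++;
	}
	return taken;
}
```

For "NNNNHHHHHHHHHHHHHHHH" with a budget of 4, `isolate_counting_skipped()` spends the whole budget on the four tail 'H' pages and takes nothing, while `isolate_until_eligible()` walks past all the 'H' pages and takes the 4 eligible ones. If the list holds no eligible pages at all, the second loop still terminates at the head of the list.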

Signed-off-by: Minchan Kim <minchan@kernel.org>

fengguang pushed a commit to 0day-ci/linux that referenced this pull request May 13, 2017

mm: vmscan: scan until it finds eligible pages
Although there were a ton of free swap and anonymous LRU pages in eligible
zones, OOM happened.

balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x65/0x87
 dump_header.isra.19+0x8f/0x20f
 ? preempt_count_add+0x9e/0xb0
 ? _raw_spin_unlock_irqrestore+0x24/0x40
 oom_kill_process+0x21d/0x3f0
 ? has_capability_noaudit+0x17/0x20
 out_of_memory+0xd8/0x390
 __alloc_pages_slowpath+0xbc1/0xc50
 ? anon_vma_interval_tree_insert+0x84/0x90
 __alloc_pages_nodemask+0x1a5/0x1c0
 pte_alloc_one+0x20/0x50
 __pte_alloc+0x1e/0x110
 __handle_mm_fault+0x919/0x960
 handle_mm_fault+0x77/0x120
 __do_page_fault+0x27a/0x550
 trace_do_page_fault+0x43/0x150
 do_async_page_fault+0x2c/0x90
 async_page_fault+0x28/0x30
RIP: 0033:0x7fc4636bacb8
RSP: 002b:00007fff97c9c4c0 EFLAGS: 00010202
RAX: 00007fc3e818d000 RBX: 00007fc4639f8760 RCX: 00007fc46372e9ca
RDX: 0000000000101002 RSI: 0000000000101000 RDI: 0000000000000000
RBP: 0000000000100010 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 00000000000a3901 R12: 00007fc3e818d010
R13: 0000000000101000 R14: 00007fc4639f87b8 R15: 00007fc4639f87b8
Mem-Info:
active_anon:424716 inactive_anon:65314 isolated_anon:0
 active_file:52 inactive_file:46 isolated_file:0
 unevictable:0 dirty:27 writeback:0 unstable:0
 slab_reclaimable:3967 slab_unreclaimable:4125
 mapped:133 shmem:43 pagetables:1674 bounce:0
 free:4637 free_pcp:225 free_cma:0
Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 992 992 1952
DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 959
Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
378 total pagecache pages
17 pages in swap cache
Swap cache stats: add 17325, delete 17302, find 0/27
Free swap  = 978940kB
Total swap = 1048572kB
524157 pages RAM
0 pages HighMem/MovableOnly
12629 pages reserved
0 pages cma reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
[  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

Investigation showed that skipping pages in isolate_lru_pages makes reclaim
ineffective: it easily returns zero nr_taken, so LRU shrinking accomplishes
nothing and only escalates the reclaim priority aggressively.  Eventually,
OOM happens.

The problem is that get_scan_count determines nr_to_scan from the eligible
zones only, so even when priority drops to zero, reclaim cannot free any
pages if the LRU contains mostly ineligible pages.

get_scan_count:

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	size = size >> sc->priority;

Assume sc->priority is 0 and the LRU list is as follows.

	N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(i.e., a small number of eligible pages sit at the head of the LRU,
 and the rest are almost all ineligible pages)

In that case, size becomes 4, so the VM wants to scan 4 pages, but
the 4 pages at the tail of the LRU are not eligible.
If skipped pages count against the scan budget from get_scan_count,
none of the remaining pages is reclaimed after those 4 are scanned,
so the system ends up in OOM.

This patch makes isolate_lru_pages keep scanning until it
encounters pages from eligible zones.

[akpm@linux-foundation.org: clean up mind-bending `for' statement.  Tweak comment text]
Fixes: 3db6581 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

torvalds added a commit that referenced this pull request May 13, 2017

mm: vmscan: scan until it finds eligible pages
Although there were a ton of free swap and anonymous LRU pages in eligible
zones, OOM happened.

  balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
  CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
   oom_kill_process+0x21d/0x3f0
   out_of_memory+0xd8/0x390
   __alloc_pages_slowpath+0xbc1/0xc50
   __alloc_pages_nodemask+0x1a5/0x1c0
   pte_alloc_one+0x20/0x50
   __pte_alloc+0x1e/0x110
   __handle_mm_fault+0x919/0x960
   handle_mm_fault+0x77/0x120
   __do_page_fault+0x27a/0x550
   trace_do_page_fault+0x43/0x150
   do_async_page_fault+0x2c/0x90
   async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:424716 inactive_anon:65314 isolated_anon:0
   active_file:52 inactive_file:46 isolated_file:0
   unevictable:0 dirty:27 writeback:0 unstable:0
   slab_reclaimable:3967 slab_unreclaimable:4125
   mapped:133 shmem:43 pagetables:1674 bounce:0
   free:4637 free_pcp:225 free_cma:0
  Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
  DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 992 992 1952
  DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 959
  Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
  DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
  Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
  378 total pagecache pages
  17 pages in swap cache
  Swap cache stats: add 17325, delete 17302, find 0/27
  Free swap  = 978940kB
  Total swap = 1048572kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  12629 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned
  [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
  [  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
  [  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

Investigation showed that skipping pages in isolate_lru_pages makes
reclaim ineffective: it easily returns zero nr_taken, so LRU shrinking
accomplishes nothing and only escalates the reclaim priority
aggressively.  Eventually, OOM happens.

The problem is that get_scan_count determines nr_to_scan from the eligible
zones only, so even when priority drops to zero, reclaim cannot free any
pages if the LRU contains mostly ineligible pages.

get_scan_count:

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	size = size >> sc->priority;

Assume sc->priority is 0 and the LRU list is as follows.

	N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(i.e., a small number of eligible pages sit at the head of the LRU,
 and the rest are almost all ineligible pages)

In that case, size becomes 4, so the VM wants to scan 4 pages, but the
4 pages at the tail of the LRU are not eligible.  If skipped pages count
against the scan budget from get_scan_count, none of the remaining pages
is reclaimed after those 4 are scanned, so the system ends up in OOM.

This patch makes isolate_lru_pages keep scanning until it encounters
pages from eligible zones.

[akpm@linux-foundation.org: clean up mind-bending `for' statement.  Tweak comment text]
Fixes: 3db6581 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sean-jc pushed a commit to sean-jc/linux that referenced this pull request Jun 2, 2017

mm: vmscan: scan until it finds eligible pages

Noltari pushed a commit to Noltari/linux that referenced this pull request Jun 15, 2017

dccp/tcp: fix routing redirect race
commit 45caeaa upstream.

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 TCP/IPv6 DCCP patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crash backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course, it may happen with other NIC drivers as well.

The freed dst_entry was found here:

	static bool tcp_in_quickack_mode(struct sock *sk)
	{
		const struct inet_connection_sock *icsk = inet_csk(sk);
		const struct dst_entry *dst = __sk_dst_get(sk);

		return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
			(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
	}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry is added for each such host, creating
more dst_entrys with lower reference counts and making the race more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() runs
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache has its
reference count decremented twice for the same socket via:

do_redirect() -> __sk_dst_check() -> dst_release().

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, followed by a crash.

To fix this, skip do_redirect() if user space has the socket locked, and
instead let the redirect take place later, when user space no longer holds
the socket lock.
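The race and the guard can be replayed deterministically in a single thread. This is a toy model, not kernel code: `struct toy_dst`, `struct toy_sock`, `toy_drop_cached_dst()` (standing in for the __sk_dst_check() -> dst_release() step) and `toy_icmp_redirect()` are hypothetical, and the interleaving in `replay_race()` is hard-coded so that both contexts load the cached pointer before either clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for dst_entry and sock. */
struct toy_dst {
	int refcnt;
};

struct toy_sock {
	struct toy_dst *dst_cache;  /* models sk->sk_dst_cache */
	bool owned_by_user;         /* models "user space has the socket locked" */
};

/* Models __sk_dst_check() -> dst_release(): drop the reference the
 * caller loaded from the cache and clear the cache. */
static void toy_drop_cached_dst(struct toy_sock *sk, struct toy_dst *seen)
{
	if (seen) {
		seen->refcnt--;
		sk->dst_cache = NULL;
	}
}

/* ICMP redirect handling with the guard from the fix: do nothing while
 * user space holds the socket lock. Returns true if the drop ran. */
static bool toy_icmp_redirect(struct toy_sock *sk, struct toy_dst *seen)
{
	if (sk->owned_by_user)
		return false;
	toy_drop_cached_dst(sk, seen);
	return true;
}

/* Replays one bad interleaving and returns the final refcount:
 * 0 is correct, a negative value means the reference was dropped twice. */
static int replay_race(bool guarded)
{
	struct toy_dst dst = { .refcnt = 1 };  /* the socket's single reference */
	struct toy_sock sk = { .dst_cache = &dst, .owned_by_user = true };

	/* Both contexts read the cached pointer before either clears it. */
	struct toy_dst *seen_by_user = sk.dst_cache;
	struct toy_dst *seen_by_icmp = sk.dst_cache;

	if (guarded)
		toy_icmp_redirect(&sk, seen_by_icmp);   /* skipped: socket owned */
	else
		toy_drop_cached_dst(&sk, seen_by_icmp); /* pre-fix: runs anyway */

	toy_drop_cached_dst(&sk, seen_by_user);     /* user context drops it too */
	return dst.refcnt;
}
```

`replay_race(false)` ends with a refcount of -1, meaning the one reference was dropped twice (in the kernel the entry would already have been freed at 0); `replay_race(true)` skips the ICMP-side drop because the socket is owned by user space and ends at 0.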

The DCCP/IPv6 code is very similar in this respect, so it is fixed there too.

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

fengguang added a commit to 0day-ci/linux that referenced this pull request Jul 21, 2017

drm/i2c: tda998x: Fix lockdep warning about possible circular dependency
When enabling lockdep debugging on the Juno platform with HDLCD and TDA998x,
I get the following warning from the system:

[   25.990733] ======================================================
[   25.998637] WARNING: possible circular locking dependency detected
[   26.006531] 4.13.0-rc1-00284-g28c0a682ecbf-dirty #17 Not tainted
[   26.014246] ------------------------------------------------------
[   26.022142] kworker/1:2/140 is trying to acquire lock:
[   26.029001]  (&priv->audio_mutex){+.+.+.}, at: [<ffff000000d0319c>] tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.041100]
[   26.041100] but task is already holding lock:
[   26.050436]  (crtc_ww_class_mutex){+.+.+.}, at: [<ffff000000eaefe4>] drm_modeset_lock+0x64/0xf8 [drm]
[   26.061531]
[   26.061531] which lock already depends on the new lock.
[   26.061531]
[   26.075063]
[   26.075063] the existing dependency chain (in reverse order) is:
[   26.086031]
[   26.086031] -> #2 (crtc_ww_class_mutex){+.+.+.}:
[   26.095657]        __lock_acquire+0x18a0/0x19b8
[   26.101918]        lock_acquire+0xd0/0x2b0
[   26.107731]        __ww_mutex_lock.constprop.3+0x90/0xe78
[   26.114817]        ww_mutex_lock+0x54/0xe0
[   26.120672]        drm_modeset_lock+0x64/0xf8 [drm]
[   26.127253]        drm_helper_probe_single_connector_modes+0x7c/0x6b8 [drm_kms_helper]
[   26.136829]        tda998x_connector_fill_modes+0x44/0xa8 [tda998x]
[   26.144797]        drm_setup_crtcs+0x19c/0xba0 [drm_kms_helper]
[   26.152429]        drm_fb_helper_initial_config+0x70/0x440 [drm_kms_helper]
[   26.161097]        drm_fbdev_cma_init_with_funcs+0x94/0x168 [drm_kms_helper]
[   26.169857]        drm_fbdev_cma_init+0x38/0x50 [drm_kms_helper]
[   26.177559]        hdlcd_drm_bind+0x1f8/0x4a8 [hdlcd]
[   26.184310]        try_to_bring_up_master+0x180/0x1e0
[   26.191043]        component_master_add_with_match+0xb0/0x108
[   26.198458]        hdlcd_probe+0x58/0x80 [hdlcd]
[   26.204735]        platform_drv_probe+0x60/0xc0
[   26.210913]        driver_probe_device+0x23c/0x2e8
[   26.217350]        __driver_attach+0xd4/0xd8
[   26.223256]        bus_for_each_dev+0x5c/0xa8
[   26.229232]        driver_attach+0x30/0x40
[   26.234917]        bus_add_driver+0x1d8/0x248
[   26.240831]        driver_register+0x6c/0x118
[   26.246715]        __platform_driver_register+0x54/0x60
[   26.253461]        0xffff000000e1b018
[   26.258644]        do_one_initcall+0x44/0x138
[   26.264503]        do_init_module+0x64/0x1d4
[   26.270238]        load_module+0x1f90/0x2590
[   26.275957]        SyS_finit_module+0xb0/0xc8
[   26.281765]        __sys_trace_return+0x0/0x4
[   26.281767]
[   26.281767] -> #1 (crtc_ww_class_acquire){+.+.+.}:
[   26.281778]        __lock_acquire+0x18a0/0x19b8
[   26.281782]        lock_acquire+0xd0/0x2b0
[   26.281877]        drm_modeset_acquire_init+0xa8/0xe0 [drm]
[   26.281921]        drm_helper_probe_single_connector_modes+0x48/0x6b8 [drm_kms_helper]
[   26.281929]        tda998x_connector_fill_modes+0x44/0xa8 [tda998x]
[   26.281970]        drm_setup_crtcs+0x19c/0xba0 [drm_kms_helper]
[   26.282009]        drm_fb_helper_initial_config+0x70/0x440 [drm_kms_helper]
[   26.282049]        drm_fbdev_cma_init_with_funcs+0x94/0x168 [drm_kms_helper]
[   26.282088]        drm_fbdev_cma_init+0x38/0x50 [drm_kms_helper]
[   26.282095]        hdlcd_drm_bind+0x1f8/0x4a8 [hdlcd]
[   26.282099]        try_to_bring_up_master+0x180/0x1e0
[   26.282104]        component_master_add_with_match+0xb0/0x108
[   26.282110]        hdlcd_probe+0x58/0x80 [hdlcd]
[   26.282114]        platform_drv_probe+0x60/0xc0
[   26.282117]        driver_probe_device+0x23c/0x2e8
[   26.282121]        __driver_attach+0xd4/0xd8
[   26.282124]        bus_for_each_dev+0x5c/0xa8
[   26.282127]        driver_attach+0x30/0x40
[   26.282130]        bus_add_driver+0x1d8/0x248
[   26.282134]        driver_register+0x6c/0x118
[   26.282138]        __platform_driver_register+0x54/0x60
[   26.282141]        0xffff000000e1b018
[   26.282145]        do_one_initcall+0x44/0x138
[   26.282149]        do_init_module+0x64/0x1d4
[   26.282152]        load_module+0x1f90/0x2590
[   26.282156]        SyS_finit_module+0xb0/0xc8
[   26.282159]        __sys_trace_return+0x0/0x4
[   26.282161]
[   26.282161] -> #0 (&priv->audio_mutex){+.+.+.}:
[   26.282172]        print_circular_bug+0x80/0x2e0
[   26.282176]        __lock_acquire+0x15a8/0x19b8
[   26.282180]        lock_acquire+0xd0/0x2b0
[   26.282184]        __mutex_lock+0x78/0x8e0
[   26.282188]        mutex_lock_nested+0x3c/0x50
[   26.282196]        tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.282237]        drm_atomic_helper_commit_modeset_disables+0x328/0x3a0 [drm_kms_helper]
[   26.282251]        malidp_atomic_commit_tail+0x44/0x6b0 [mali_dp]
[   26.282292]        commit_tail+0x4c/0x80 [drm_kms_helper]
[   26.282333]        drm_atomic_helper_commit+0xe8/0x180 [drm_kms_helper]
[   26.282427]        drm_atomic_commit+0x54/0x70 [drm]
[   26.282467]        restore_fbdev_mode_atomic+0x1f0/0x220 [drm_kms_helper]
[   26.282507]        restore_fbdev_mode+0x38/0x188 [drm_kms_helper]
[   26.282547]        drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xd0 [drm_kms_helper]
[   26.282586]        drm_fb_helper_set_par+0x34/0x80 [drm_kms_helper]
[   26.282625]        drm_fb_helper_hotplug_event.part.19+0x94/0xb0 [drm_kms_helper]
[   26.282665]        drm_fb_helper_hotplug_event+0x2c/0x48 [drm_kms_helper]
[   26.282704]        drm_fbdev_cma_hotplug_event+0x24/0x30 [drm_kms_helper]
[   26.282716]        malidp_output_poll_changed+0x24/0x30 [mali_dp]
[   26.282757]        drm_kms_helper_hotplug_event+0x34/0x40 [drm_kms_helper]
[   26.282797]        output_poll_execute+0x1a0/0x1f0 [drm_kms_helper]
[   26.282803]        process_one_work+0x280/0x790
[   26.282808]        worker_thread+0x48/0x450
[   26.282812]        kthread+0x138/0x140
[   26.282815]        ret_from_fork+0x10/0x40
[   26.282817]
[   26.282817] other info that might help us debug this:
[   26.282817]
[   26.282819] Chain exists of:
[   26.282819]   &priv->audio_mutex --> crtc_ww_class_acquire --> crtc_ww_class_mutex
[   26.282819]
[   26.282830]  Possible unsafe locking scenario:
[   26.282830]
[   26.282832]        CPU0                    CPU1
[   26.282834]        ----                    ----
[   26.282835]   lock(crtc_ww_class_mutex);
[   26.282840]                                lock(crtc_ww_class_acquire);
[   26.282845]                                lock(crtc_ww_class_mutex);
[   26.282850]   lock(&priv->audio_mutex);
[   26.282854]
[   26.282854]  *** DEADLOCK ***
[   26.282854]
[   26.282858] 5 locks held by kworker/1:2/140:
[   26.282859]  #0:  ("events"){.+.+.+}, at: [<ffff0000080f8500>] process_one_work+0x1d8/0x790
[   26.282871]  #1:  ((&(&dev->mode_config.output_poll_work)->work)){+.+.+.}, at: [<ffff0000080f8500>] process_one_work+0x1d8/0x790
[   26.282883]  #2:  (&helper->lock){+.+.+.}, at: [<ffff000000c0631c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x3c/0xd0 [drm_kms_helper]
[   26.282929]  #3:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffff000000c02d80>] restore_fbdev_mode_atomic+0x38/0x220 [drm_kms_helper]
[   26.282976]  #4:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffff000000eaefe4>] drm_modeset_lock+0x64/0xf8 [drm]
[   26.283077]
[   26.283077] stack backtrace:
[   26.283082] CPU: 1 PID: 140 Comm: kworker/1:2 Not tainted 4.13.0-rc1-00284-g28c0a682ecbf-dirty #17
[   26.283084] Hardware name: ARM Juno development board (r0) (DT)
[   26.283127] Workqueue: events output_poll_execute [drm_kms_helper]
[   26.283131] Call trace:
[   26.283137] [<ffff00000808a778>] dump_backtrace+0x0/0x268
[   26.283142] [<ffff00000808aabc>] show_stack+0x24/0x30
[   26.283146] [<ffff000008aa36a8>] dump_stack+0xbc/0xf4
[   26.283151] [<ffff00000812f454>] print_circular_bug+0x1d4/0x2e0
[   26.283155] [<ffff000008132480>] __lock_acquire+0x15a8/0x19b8
[   26.283159] [<ffff000008133008>] lock_acquire+0xd0/0x2b0
[   26.283163] [<ffff000008aba060>] __mutex_lock+0x78/0x8e0
[   26.283168] [<ffff000008aba904>] mutex_lock_nested+0x3c/0x50
[   26.283176] [<ffff000000d0319c>] tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.283217] [<ffff000000c00050>] drm_atomic_helper_commit_modeset_disables+0x328/0x3a0 [drm_kms_helper]
[   26.283230] [<ffff000000f1bd0c>] malidp_atomic_commit_tail+0x44/0x6b0 [mali_dp]
[   26.283271] [<ffff000000c0045c>] commit_tail+0x4c/0x80 [drm_kms_helper]
[   26.283312] [<ffff000000c00630>] drm_atomic_helper_commit+0xe8/0x180 [drm_kms_helper]
[   26.283406] [<ffff000000eb1604>] drm_atomic_commit+0x54/0x70 [drm]
[   26.283447] [<ffff000000c02f38>] restore_fbdev_mode_atomic+0x1f0/0x220 [drm_kms_helper]
[   26.283487] [<ffff000000c03cf0>] restore_fbdev_mode+0x38/0x188 [drm_kms_helper]
[   26.283526] [<ffff000000c06324>] drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xd0 [drm_kms_helper]
[   26.283566] [<ffff000000c0619c>] drm_fb_helper_set_par+0x34/0x80 [drm_kms_helper]
[   26.283606] [<ffff000000c0627c>] drm_fb_helper_hotplug_event.part.19+0x94/0xb0 [drm_kms_helper]
[   26.283645] [<ffff000000c062c4>] drm_fb_helper_hotplug_event+0x2c/0x48 [drm_kms_helper]
[   26.283685] [<ffff000000c07124>] drm_fbdev_cma_hotplug_event+0x24/0x30 [drm_kms_helper]
[   26.283697] [<ffff000000f1b44c>] malidp_output_poll_changed+0x24/0x30 [mali_dp]
[   26.283738] [<ffff000000bf5264>] drm_kms_helper_hotplug_event+0x34/0x40 [drm_kms_helper]
[   26.283779] [<ffff000000bf5480>] output_poll_execute+0x1a0/0x1f0 [drm_kms_helper]
[   26.283784] [<ffff0000080f85a8>] process_one_work+0x280/0x790
[   26.283788] [<ffff0000080f8b00>] worker_thread+0x48/0x450
[   26.283792] [<ffff000008100430>] kthread+0x138/0x140
[   26.283796] [<ffff000008083710>] ret_from_fork+0x10/0x40

This looks like it has been introduced by 'commit 02efac0 ("drm/i2c:
tda998x: remove complexity from tda998x_audio_get_eld()")'.

Fix the warning by dropping the local audio_mutex and instead taking the
modeset connection_mutex in tda998x_audio_get_eld(), which avoids races
with drm_helper_probe_single_connector_modes() updating the ELD data.
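Below is a minimal userspace sketch of the locking idea. It assumes a pthread mutex can stand in for the modeset connection_mutex. The probe path rewrites the ELD while holding that lock, and the audio path copies the ELD under the same lock, so no driver-private audio_mutex enters the lock ordering. struct connector_model, probe_update_eld() and audio_get_eld() are hypothetical names, not tda998x or DRM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: connection_mutex stands in for the modeset
 * connection_mutex; there is deliberately no separate audio_mutex. */
struct connector_model {
	pthread_mutex_t connection_mutex;
	unsigned char eld[128];
};

/* Probe side: rewrite the ELD while holding connection_mutex, as the
 * probe helper does when it refreshes the connector's ELD. */
static void probe_update_eld(struct connector_model *c,
			     const unsigned char *new_eld, size_t len)
{
	size_t n = len < sizeof(c->eld) ? len : sizeof(c->eld);

	pthread_mutex_lock(&c->connection_mutex);
	memset(c->eld, 0, sizeof(c->eld));
	memcpy(c->eld, new_eld, n);
	pthread_mutex_unlock(&c->connection_mutex);
}

/* Audio side: copy the ELD under the same lock, so a reader never sees a
 * half-written ELD and no extra lock class is introduced. */
static size_t audio_get_eld(struct connector_model *c,
			    unsigned char *buf, size_t len)
{
	size_t n = len < sizeof(c->eld) ? len : sizeof(c->eld);

	pthread_mutex_lock(&c->connection_mutex);
	memcpy(buf, c->eld, n);
	pthread_mutex_unlock(&c->connection_mutex);
	return n;
}
```

Calling probe_update_eld() and then audio_get_eld() returns the freshly written bytes. Both functions contend only on connection_mutex.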

v2: Take the modeset connection_mutex rather than changing when the
audio_mutex lock is taken, as suggested by Russell King.
v3: Drop the bespoke drm_modeset_acquire_ctx, as it is not needed
(suggested by Daniel Vetter).

Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: 02efac0 ("drm/i2c: tda998x: remove complexity from tda998x_audio_get_eld()")

fengguang added a commit to 0day-ci/linux that referenced this pull request Jul 21, 2017

drm/i2c: tda998x: Fix lockdep warning about possible circular dependency
When enabling lockdep debugging on the Juno platform with HDLCD and TDA998x,
I get the following warning from the system:

[   25.990733] ======================================================
[   25.998637] WARNING: possible circular locking dependency detected
[   26.006531] 4.13.0-rc1-00284-g28c0a682ecbf-dirty #17 Not tainted
[   26.014246] ------------------------------------------------------
[   26.022142] kworker/1:2/140 is trying to acquire lock:
[   26.029001]  (&priv->audio_mutex){+.+.+.}, at: [<ffff000000d0319c>] tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.041100]
[   26.041100] but task is already holding lock:
[   26.050436]  (crtc_ww_class_mutex){+.+.+.}, at: [<ffff000000eaefe4>] drm_modeset_lock+0x64/0xf8 [drm]
[   26.061531]
[   26.061531] which lock already depends on the new lock.
[   26.061531]
[   26.075063]
[   26.075063] the existing dependency chain (in reverse order) is:
[   26.086031]
[   26.086031] -> #2 (crtc_ww_class_mutex){+.+.+.}:
[   26.095657]        __lock_acquire+0x18a0/0x19b8
[   26.101918]        lock_acquire+0xd0/0x2b0
[   26.107731]        __ww_mutex_lock.constprop.3+0x90/0xe78
[   26.114817]        ww_mutex_lock+0x54/0xe0
[   26.120672]        drm_modeset_lock+0x64/0xf8 [drm]
[   26.127253]        drm_helper_probe_single_connector_modes+0x7c/0x6b8 [drm_kms_helper]
[   26.136829]        tda998x_connector_fill_modes+0x44/0xa8 [tda998x]
[   26.144797]        drm_setup_crtcs+0x19c/0xba0 [drm_kms_helper]
[   26.152429]        drm_fb_helper_initial_config+0x70/0x440 [drm_kms_helper]
[   26.161097]        drm_fbdev_cma_init_with_funcs+0x94/0x168 [drm_kms_helper]
[   26.169857]        drm_fbdev_cma_init+0x38/0x50 [drm_kms_helper]
[   26.177559]        hdlcd_drm_bind+0x1f8/0x4a8 [hdlcd]
[   26.184310]        try_to_bring_up_master+0x180/0x1e0
[   26.191043]        component_master_add_with_match+0xb0/0x108
[   26.198458]        hdlcd_probe+0x58/0x80 [hdlcd]
[   26.204735]        platform_drv_probe+0x60/0xc0
[   26.210913]        driver_probe_device+0x23c/0x2e8
[   26.217350]        __driver_attach+0xd4/0xd8
[   26.223256]        bus_for_each_dev+0x5c/0xa8
[   26.229232]        driver_attach+0x30/0x40
[   26.234917]        bus_add_driver+0x1d8/0x248
[   26.240831]        driver_register+0x6c/0x118
[   26.246715]        __platform_driver_register+0x54/0x60
[   26.253461]        0xffff000000e1b018
[   26.258644]        do_one_initcall+0x44/0x138
[   26.264503]        do_init_module+0x64/0x1d4
[   26.270238]        load_module+0x1f90/0x2590
[   26.275957]        SyS_finit_module+0xb0/0xc8
[   26.281765]        __sys_trace_return+0x0/0x4
[   26.281767]
[   26.281767] -> #1 (crtc_ww_class_acquire){+.+.+.}:
[   26.281778]        __lock_acquire+0x18a0/0x19b8
[   26.281782]        lock_acquire+0xd0/0x2b0
[   26.281877]        drm_modeset_acquire_init+0xa8/0xe0 [drm]
[   26.281921]        drm_helper_probe_single_connector_modes+0x48/0x6b8 [drm_kms_helper]
[   26.281929]        tda998x_connector_fill_modes+0x44/0xa8 [tda998x]
[   26.281970]        drm_setup_crtcs+0x19c/0xba0 [drm_kms_helper]
[   26.282009]        drm_fb_helper_initial_config+0x70/0x440 [drm_kms_helper]
[   26.282049]        drm_fbdev_cma_init_with_funcs+0x94/0x168 [drm_kms_helper]
[   26.282088]        drm_fbdev_cma_init+0x38/0x50 [drm_kms_helper]
[   26.282095]        hdlcd_drm_bind+0x1f8/0x4a8 [hdlcd]
[   26.282099]        try_to_bring_up_master+0x180/0x1e0
[   26.282104]        component_master_add_with_match+0xb0/0x108
[   26.282110]        hdlcd_probe+0x58/0x80 [hdlcd]
[   26.282114]        platform_drv_probe+0x60/0xc0
[   26.282117]        driver_probe_device+0x23c/0x2e8
[   26.282121]        __driver_attach+0xd4/0xd8
[   26.282124]        bus_for_each_dev+0x5c/0xa8
[   26.282127]        driver_attach+0x30/0x40
[   26.282130]        bus_add_driver+0x1d8/0x248
[   26.282134]        driver_register+0x6c/0x118
[   26.282138]        __platform_driver_register+0x54/0x60
[   26.282141]        0xffff000000e1b018
[   26.282145]        do_one_initcall+0x44/0x138
[   26.282149]        do_init_module+0x64/0x1d4
[   26.282152]        load_module+0x1f90/0x2590
[   26.282156]        SyS_finit_module+0xb0/0xc8
[   26.282159]        __sys_trace_return+0x0/0x4
[   26.282161]
[   26.282161] -> #0 (&priv->audio_mutex){+.+.+.}:
[   26.282172]        print_circular_bug+0x80/0x2e0
[   26.282176]        __lock_acquire+0x15a8/0x19b8
[   26.282180]        lock_acquire+0xd0/0x2b0
[   26.282184]        __mutex_lock+0x78/0x8e0
[   26.282188]        mutex_lock_nested+0x3c/0x50
[   26.282196]        tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.282237]        drm_atomic_helper_commit_modeset_disables+0x328/0x3a0 [drm_kms_helper]
[   26.282251]        malidp_atomic_commit_tail+0x44/0x6b0 [mali_dp]
[   26.282292]        commit_tail+0x4c/0x80 [drm_kms_helper]
[   26.282333]        drm_atomic_helper_commit+0xe8/0x180 [drm_kms_helper]
[   26.282427]        drm_atomic_commit+0x54/0x70 [drm]
[   26.282467]        restore_fbdev_mode_atomic+0x1f0/0x220 [drm_kms_helper]
[   26.282507]        restore_fbdev_mode+0x38/0x188 [drm_kms_helper]
[   26.282547]        drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xd0 [drm_kms_helper]
[   26.282586]        drm_fb_helper_set_par+0x34/0x80 [drm_kms_helper]
[   26.282625]        drm_fb_helper_hotplug_event.part.19+0x94/0xb0 [drm_kms_helper]
[   26.282665]        drm_fb_helper_hotplug_event+0x2c/0x48 [drm_kms_helper]
[   26.282704]        drm_fbdev_cma_hotplug_event+0x24/0x30 [drm_kms_helper]
[   26.282716]        malidp_output_poll_changed+0x24/0x30 [mali_dp]
[   26.282757]        drm_kms_helper_hotplug_event+0x34/0x40 [drm_kms_helper]
[   26.282797]        output_poll_execute+0x1a0/0x1f0 [drm_kms_helper]
[   26.282803]        process_one_work+0x280/0x790
[   26.282808]        worker_thread+0x48/0x450
[   26.282812]        kthread+0x138/0x140
[   26.282815]        ret_from_fork+0x10/0x40
[   26.282817]
[   26.282817] other info that might help us debug this:
[   26.282817]
[   26.282819] Chain exists of:
[   26.282819]   &priv->audio_mutex --> crtc_ww_class_acquire --> crtc_ww_class_mutex
[   26.282819]
[   26.282830]  Possible unsafe locking scenario:
[   26.282830]
[   26.282832]        CPU0                    CPU1
[   26.282834]        ----                    ----
[   26.282835]   lock(crtc_ww_class_mutex);
[   26.282840]                                lock(crtc_ww_class_acquire);
[   26.282845]                                lock(crtc_ww_class_mutex);
[   26.282850]   lock(&priv->audio_mutex);
[   26.282854]
[   26.282854]  *** DEADLOCK ***
[   26.282854]
[   26.282858] 5 locks held by kworker/1:2/140:
[   26.282859]  #0:  ("events"){.+.+.+}, at: [<ffff0000080f8500>] process_one_work+0x1d8/0x790
[   26.282871]  #1:  ((&(&dev->mode_config.output_poll_work)->work)){+.+.+.}, at: [<ffff0000080f8500>] process_one_work+0x1d8/0x790
[   26.282883]  #2:  (&helper->lock){+.+.+.}, at: [<ffff000000c0631c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x3c/0xd0 [drm_kms_helper]
[   26.282929]  #3:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffff000000c02d80>] restore_fbdev_mode_atomic+0x38/0x220 [drm_kms_helper]
[   26.282976]  #4:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffff000000eaefe4>] drm_modeset_lock+0x64/0xf8 [drm]
[   26.283077]
[   26.283077] stack backtrace:
[   26.283082] CPU: 1 PID: 140 Comm: kworker/1:2 Not tainted 4.13.0-rc1-00284-g28c0a682ecbf-dirty #17
[   26.283084] Hardware name: ARM Juno development board (r0) (DT)
[   26.283127] Workqueue: events output_poll_execute [drm_kms_helper]
[   26.283131] Call trace:
[   26.283137] [<ffff00000808a778>] dump_backtrace+0x0/0x268
[   26.283142] [<ffff00000808aabc>] show_stack+0x24/0x30
[   26.283146] [<ffff000008aa36a8>] dump_stack+0xbc/0xf4
[   26.283151] [<ffff00000812f454>] print_circular_bug+0x1d4/0x2e0
[   26.283155] [<ffff000008132480>] __lock_acquire+0x15a8/0x19b8
[   26.283159] [<ffff000008133008>] lock_acquire+0xd0/0x2b0
[   26.283163] [<ffff000008aba060>] __mutex_lock+0x78/0x8e0
[   26.283168] [<ffff000008aba904>] mutex_lock_nested+0x3c/0x50
[   26.283176] [<ffff000000d0319c>] tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.283217] [<ffff000000c00050>] drm_atomic_helper_commit_modeset_disables+0x328/0x3a0 [drm_kms_helper]
[   26.283230] [<ffff000000f1bd0c>] malidp_atomic_commit_tail+0x44/0x6b0 [mali_dp]
[   26.283271] [<ffff000000c0045c>] commit_tail+0x4c/0x80 [drm_kms_helper]
[   26.283312] [<ffff000000c00630>] drm_atomic_helper_commit+0xe8/0x180 [drm_kms_helper]
[   26.283406] [<ffff000000eb1604>] drm_atomic_commit+0x54/0x70 [drm]
[   26.283447] [<ffff000000c02f38>] restore_fbdev_mode_atomic+0x1f0/0x220 [drm_kms_helper]
[   26.283487] [<ffff000000c03cf0>] restore_fbdev_mode+0x38/0x188 [drm_kms_helper]
[   26.283526] [<ffff000000c06324>] drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xd0 [drm_kms_helper]
[   26.283566] [<ffff000000c0619c>] drm_fb_helper_set_par+0x34/0x80 [drm_kms_helper]
[   26.283606] [<ffff000000c0627c>] drm_fb_helper_hotplug_event.part.19+0x94/0xb0 [drm_kms_helper]
[   26.283645] [<ffff000000c062c4>] drm_fb_helper_hotplug_event+0x2c/0x48 [drm_kms_helper]
[   26.283685] [<ffff000000c07124>] drm_fbdev_cma_hotplug_event+0x24/0x30 [drm_kms_helper]
[   26.283697] [<ffff000000f1b44c>] malidp_output_poll_changed+0x24/0x30 [mali_dp]
[   26.283738] [<ffff000000bf5264>] drm_kms_helper_hotplug_event+0x34/0x40 [drm_kms_helper]
[   26.283779] [<ffff000000bf5480>] output_poll_execute+0x1a0/0x1f0 [drm_kms_helper]
[   26.283784] [<ffff0000080f85a8>] process_one_work+0x280/0x790
[   26.283788] [<ffff0000080f8b00>] worker_thread+0x48/0x450
[   26.283792] [<ffff000008100430>] kthread+0x138/0x140
[   26.283796] [<ffff000008083710>] ret_from_fork+0x10/0x40

Fix the warning by moving the acquisition of priv->audio_mutex in
tda998x_connector_fill_modes() to after the call to
drm_helper_probe_single_connector_modes().
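The following toy lock-order graph shows why the reordering removes the warning. It is written in the spirit of lockdep but is not lockdep's implementation. An edge A -> B records that B was taken while A was held, and an acquisition is reported as a possible deadlock when it would close a cycle. The three lock classes are the ones from the report above. struct order_graph, record() and would_deadlock() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes named in the lockdep report. */
enum lock_class { AUDIO_MUTEX, WW_ACQUIRE, WW_MUTEX, NR_CLASSES };

/* edge[a][b] is true once some path acquired b while holding a. */
struct order_graph {
	bool edge[NR_CLASSES][NR_CLASSES];
};

static void record(struct order_graph *g, enum lock_class held,
		   enum lock_class taken)
{
	g->edge[held][taken] = true;
}

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reaches(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_CLASSES; n++)
		if (g->edge[from][n] && !seen[n] && reaches(g, n, to, seen))
			return true;
	return false;
}

/* Taking 'taken' while holding 'held' adds held -> taken; that closes a
 * cycle if 'held' is already reachable from 'taken'. */
static bool would_deadlock(const struct order_graph *g, enum lock_class held,
			   enum lock_class taken)
{
	bool seen[NR_CLASSES] = { false };

	return reaches(g, taken, held, seen);
}
```

Holding audio_mutex across the probe records audio_mutex -> crtc_ww_class_acquire -> crtc_ww_class_mutex. The later crtc_ww_class_mutex -> audio_mutex acquisition in tda998x_encoder_mode_set() then closes the cycle. If audio_mutex is taken only after the probe returns, the first edge is never recorded.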

Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Fixes: 02efac0 ("drm/i2c: tda998x: remove complexity from tda998x_audio_get_eld()")

fengguang added a commit to 0day-ci/linux that referenced this pull request Jul 22, 2017

drm/i2c: tda998x: Fix lockdep warning about possible circular dependency
When enabling lockdep debugging on the Juno platform with HDLCD and TDA998x,
I get the following warning from the system:

[   25.990733] ======================================================
[   25.998637] WARNING: possible circular locking dependency detected
[   26.006531] 4.13.0-rc1-00284-g28c0a682ecbf-dirty #17 Not tainted
[   26.014246] ------------------------------------------------------
[   26.022142] kworker/1:2/140 is trying to acquire lock:
[   26.029001]  (&priv->audio_mutex){+.+.+.}, at: [<ffff000000d0319c>] tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.041100]
[   26.041100] but task is already holding lock:
[   26.050436]  (crtc_ww_class_mutex){+.+.+.}, at: [<ffff000000eaefe4>] drm_modeset_lock+0x64/0xf8 [drm]
[   26.061531]
[   26.061531] which lock already depends on the new lock.
[   26.061531]
[   26.075063]
[   26.075063] the existing dependency chain (in reverse order) is:
[   26.086031]
[   26.086031] -> #2 (crtc_ww_class_mutex){+.+.+.}:
[   26.095657]        __lock_acquire+0x18a0/0x19b8
[   26.101918]        lock_acquire+0xd0/0x2b0
[   26.107731]        __ww_mutex_lock.constprop.3+0x90/0xe78
[   26.114817]        ww_mutex_lock+0x54/0xe0
[   26.120672]        drm_modeset_lock+0x64/0xf8 [drm]
[   26.127253]        drm_helper_probe_single_connector_modes+0x7c/0x6b8 [drm_kms_helper]
[   26.136829]        tda998x_connector_fill_modes+0x44/0xa8 [tda998x]
[   26.144797]        drm_setup_crtcs+0x19c/0xba0 [drm_kms_helper]
[   26.152429]        drm_fb_helper_initial_config+0x70/0x440 [drm_kms_helper]
[   26.161097]        drm_fbdev_cma_init_with_funcs+0x94/0x168 [drm_kms_helper]
[   26.169857]        drm_fbdev_cma_init+0x38/0x50 [drm_kms_helper]
[   26.177559]        hdlcd_drm_bind+0x1f8/0x4a8 [hdlcd]
[   26.184310]        try_to_bring_up_master+0x180/0x1e0
[   26.191043]        component_master_add_with_match+0xb0/0x108
[   26.198458]        hdlcd_probe+0x58/0x80 [hdlcd]
[   26.204735]        platform_drv_probe+0x60/0xc0
[   26.210913]        driver_probe_device+0x23c/0x2e8
[   26.217350]        __driver_attach+0xd4/0xd8
[   26.223256]        bus_for_each_dev+0x5c/0xa8
[   26.229232]        driver_attach+0x30/0x40
[   26.234917]        bus_add_driver+0x1d8/0x248
[   26.240831]        driver_register+0x6c/0x118
[   26.246715]        __platform_driver_register+0x54/0x60
[   26.253461]        0xffff000000e1b018
[   26.258644]        do_one_initcall+0x44/0x138
[   26.264503]        do_init_module+0x64/0x1d4
[   26.270238]        load_module+0x1f90/0x2590
[   26.275957]        SyS_finit_module+0xb0/0xc8
[   26.281765]        __sys_trace_return+0x0/0x4
[   26.281767]
[   26.281767] -> #1 (crtc_ww_class_acquire){+.+.+.}:
[   26.281778]        __lock_acquire+0x18a0/0x19b8
[   26.281782]        lock_acquire+0xd0/0x2b0
[   26.281877]        drm_modeset_acquire_init+0xa8/0xe0 [drm]
[   26.281921]        drm_helper_probe_single_connector_modes+0x48/0x6b8 [drm_kms_helper]
[   26.281929]        tda998x_connector_fill_modes+0x44/0xa8 [tda998x]
[   26.281970]        drm_setup_crtcs+0x19c/0xba0 [drm_kms_helper]
[   26.282009]        drm_fb_helper_initial_config+0x70/0x440 [drm_kms_helper]
[   26.282049]        drm_fbdev_cma_init_with_funcs+0x94/0x168 [drm_kms_helper]
[   26.282088]        drm_fbdev_cma_init+0x38/0x50 [drm_kms_helper]
[   26.282095]        hdlcd_drm_bind+0x1f8/0x4a8 [hdlcd]
[   26.282099]        try_to_bring_up_master+0x180/0x1e0
[   26.282104]        component_master_add_with_match+0xb0/0x108
[   26.282110]        hdlcd_probe+0x58/0x80 [hdlcd]
[   26.282114]        platform_drv_probe+0x60/0xc0
[   26.282117]        driver_probe_device+0x23c/0x2e8
[   26.282121]        __driver_attach+0xd4/0xd8
[   26.282124]        bus_for_each_dev+0x5c/0xa8
[   26.282127]        driver_attach+0x30/0x40
[   26.282130]        bus_add_driver+0x1d8/0x248
[   26.282134]        driver_register+0x6c/0x118
[   26.282138]        __platform_driver_register+0x54/0x60
[   26.282141]        0xffff000000e1b018
[   26.282145]        do_one_initcall+0x44/0x138
[   26.282149]        do_init_module+0x64/0x1d4
[   26.282152]        load_module+0x1f90/0x2590
[   26.282156]        SyS_finit_module+0xb0/0xc8
[   26.282159]        __sys_trace_return+0x0/0x4
[   26.282161]
[   26.282161] -> #0 (&priv->audio_mutex){+.+.+.}:
[   26.282172]        print_circular_bug+0x80/0x2e0
[   26.282176]        __lock_acquire+0x15a8/0x19b8
[   26.282180]        lock_acquire+0xd0/0x2b0
[   26.282184]        __mutex_lock+0x78/0x8e0
[   26.282188]        mutex_lock_nested+0x3c/0x50
[   26.282196]        tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.282237]        drm_atomic_helper_commit_modeset_disables+0x328/0x3a0 [drm_kms_helper]
[   26.282251]        malidp_atomic_commit_tail+0x44/0x6b0 [mali_dp]
[   26.282292]        commit_tail+0x4c/0x80 [drm_kms_helper]
[   26.282333]        drm_atomic_helper_commit+0xe8/0x180 [drm_kms_helper]
[   26.282427]        drm_atomic_commit+0x54/0x70 [drm]
[   26.282467]        restore_fbdev_mode_atomic+0x1f0/0x220 [drm_kms_helper]
[   26.282507]        restore_fbdev_mode+0x38/0x188 [drm_kms_helper]
[   26.282547]        drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xd0 [drm_kms_helper]
[   26.282586]        drm_fb_helper_set_par+0x34/0x80 [drm_kms_helper]
[   26.282625]        drm_fb_helper_hotplug_event.part.19+0x94/0xb0 [drm_kms_helper]
[   26.282665]        drm_fb_helper_hotplug_event+0x2c/0x48 [drm_kms_helper]
[   26.282704]        drm_fbdev_cma_hotplug_event+0x24/0x30 [drm_kms_helper]
[   26.282716]        malidp_output_poll_changed+0x24/0x30 [mali_dp]
[   26.282757]        drm_kms_helper_hotplug_event+0x34/0x40 [drm_kms_helper]
[   26.282797]        output_poll_execute+0x1a0/0x1f0 [drm_kms_helper]
[   26.282803]        process_one_work+0x280/0x790
[   26.282808]        worker_thread+0x48/0x450
[   26.282812]        kthread+0x138/0x140
[   26.282815]        ret_from_fork+0x10/0x40
[   26.282817]
[   26.282817] other info that might help us debug this:
[   26.282817]
[   26.282819] Chain exists of:
[   26.282819]   &priv->audio_mutex --> crtc_ww_class_acquire --> crtc_ww_class_mutex
[   26.282819]
[   26.282830]  Possible unsafe locking scenario:
[   26.282830]
[   26.282832]        CPU0                    CPU1
[   26.282834]        ----                    ----
[   26.282835]   lock(crtc_ww_class_mutex);
[   26.282840]                                lock(crtc_ww_class_acquire);
[   26.282845]                                lock(crtc_ww_class_mutex);
[   26.282850]   lock(&priv->audio_mutex);
[   26.282854]
[   26.282854]  *** DEADLOCK ***
[   26.282854]
[   26.282858] 5 locks held by kworker/1:2/140:
[   26.282859]  #0:  ("events"){.+.+.+}, at: [<ffff0000080f8500>] process_one_work+0x1d8/0x790
[   26.282871]  #1:  ((&(&dev->mode_config.output_poll_work)->work)){+.+.+.}, at: [<ffff0000080f8500>] process_one_work+0x1d8/0x790
[   26.282883]  #2:  (&helper->lock){+.+.+.}, at: [<ffff000000c0631c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x3c/0xd0 [drm_kms_helper]
[   26.282929]  #3:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffff000000c02d80>] restore_fbdev_mode_atomic+0x38/0x220 [drm_kms_helper]
[   26.282976]  #4:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffff000000eaefe4>] drm_modeset_lock+0x64/0xf8 [drm]
[   26.283077]
[   26.283077] stack backtrace:
[   26.283082] CPU: 1 PID: 140 Comm: kworker/1:2 Not tainted 4.13.0-rc1-00284-g28c0a682ecbf-dirty #17
[   26.283084] Hardware name: ARM Juno development board (r0) (DT)
[   26.283127] Workqueue: events output_poll_execute [drm_kms_helper]
[   26.283131] Call trace:
[   26.283137] [<ffff00000808a778>] dump_backtrace+0x0/0x268
[   26.283142] [<ffff00000808aabc>] show_stack+0x24/0x30
[   26.283146] [<ffff000008aa36a8>] dump_stack+0xbc/0xf4
[   26.283151] [<ffff00000812f454>] print_circular_bug+0x1d4/0x2e0
[   26.283155] [<ffff000008132480>] __lock_acquire+0x15a8/0x19b8
[   26.283159] [<ffff000008133008>] lock_acquire+0xd0/0x2b0
[   26.283163] [<ffff000008aba060>] __mutex_lock+0x78/0x8e0
[   26.283168] [<ffff000008aba904>] mutex_lock_nested+0x3c/0x50
[   26.283176] [<ffff000000d0319c>] tda998x_encoder_mode_set+0x12c/0x5a0 [tda998x]
[   26.283217] [<ffff000000c00050>] drm_atomic_helper_commit_modeset_disables+0x328/0x3a0 [drm_kms_helper]
[   26.283230] [<ffff000000f1bd0c>] malidp_atomic_commit_tail+0x44/0x6b0 [mali_dp]
[   26.283271] [<ffff000000c0045c>] commit_tail+0x4c/0x80 [drm_kms_helper]
[   26.283312] [<ffff000000c00630>] drm_atomic_helper_commit+0xe8/0x180 [drm_kms_helper]
[   26.283406] [<ffff000000eb1604>] drm_atomic_commit+0x54/0x70 [drm]
[   26.283447] [<ffff000000c02f38>] restore_fbdev_mode_atomic+0x1f0/0x220 [drm_kms_helper]
[   26.283487] [<ffff000000c03cf0>] restore_fbdev_mode+0x38/0x188 [drm_kms_helper]
[   26.283526] [<ffff000000c06324>] drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xd0 [drm_kms_helper]
[   26.283566] [<ffff000000c0619c>] drm_fb_helper_set_par+0x34/0x80 [drm_kms_helper]
[   26.283606] [<ffff000000c0627c>] drm_fb_helper_hotplug_event.part.19+0x94/0xb0 [drm_kms_helper]
[   26.283645] [<ffff000000c062c4>] drm_fb_helper_hotplug_event+0x2c/0x48 [drm_kms_helper]
[   26.283685] [<ffff000000c07124>] drm_fbdev_cma_hotplug_event+0x24/0x30 [drm_kms_helper]
[   26.283697] [<ffff000000f1b44c>] malidp_output_poll_changed+0x24/0x30 [mali_dp]
[   26.283738] [<ffff000000bf5264>] drm_kms_helper_hotplug_event+0x34/0x40 [drm_kms_helper]
[   26.283779] [<ffff000000bf5480>] output_poll_execute+0x1a0/0x1f0 [drm_kms_helper]
[   26.283784] [<ffff0000080f85a8>] process_one_work+0x280/0x790
[   26.283788] [<ffff0000080f8b00>] worker_thread+0x48/0x450
[   26.283792] [<ffff000008100430>] kthread+0x138/0x140
[   26.283796] [<ffff000008083710>] ret_from_fork+0x10/0x40

This looks like it has been introduced by 'commit 02efac0 ("drm/i2c:
tda998x: remove complexity from tda998x_audio_get_eld()")'.

Fix the warning by dropping the local audio_mutex and instead taking the
modeset connection_mutex in tda998x_audio_get_eld(), which avoids races
with drm_helper_probe_single_connector_modes() updating the ELD data.

v2: Take the modeset connection_mutex rather than changing when the
audio_mutex lock is taken, as suggested by Russell King.

Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Fixes: 02efac0 ("drm/i2c: tda998x: remove complexity from tda998x_audio_get_eld()")

codykrieger pushed a commit to bw-oss/linux that referenced this pull request Jul 28, 2017

dccp/tcp: fix routing redirect race
commit 45caeaa upstream.

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 TCP/IPv6 DCCP patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

	static bool tcp_in_quickack_mode(struct sock *sk)
	{
		const struct inet_connection_sock *icsk = inet_csk(sk);
		const struct dst_entry *dst = __sk_dst_get(sk);

		return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
			(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
	}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. An rtable/dst_entry is added for each such host, creating
more dst_entrys with lower reference counts and making the race more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()->dst_release().

This leads to the dst_entry being prematurely freed while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked, and
instead let the redirect take place later, when user space no longer has
the socket locked.
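The interleaving can be replayed single-threaded with a small model. This is a sketch under stated assumptions. struct dst_model and struct sock_model are hypothetical stand-ins for struct dst_entry and struct sock. redirect_model() and user_check() each compress do_redirect() and __sk_dst_check() into a single step. The user-context path loads the cached pointer, the softirq redirect runs, and then the user path finishes its check. Without the ownership test, the same reference is dropped twice. The deferred redirect itself is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's dst_entry or sock. */
struct dst_model {
	int refcnt;
	bool obsolete;
};

struct sock_model {
	struct dst_model *dst_cache;
	bool owned_by_user;
};

/* Softirq side: the redirect marks the cached route obsolete and drops the
 * socket's reference. With 'guarded' set it first checks socket ownership,
 * as the fix does. Returns true if it acted now. */
static bool redirect_model(struct sock_model *sk, bool guarded)
{
	struct dst_model *d = sk->dst_cache;

	if (guarded && sk->owned_by_user)
		return false;		/* leave the cache to the lock owner */
	if (d) {
		d->obsolete = true;
		sk->dst_cache = NULL;
		d->refcnt--;		/* drop the reference held by the cache */
	}
	return true;
}

/* Process-context side, split so an interrupt can land in between. */
static struct dst_model *user_load(struct sock_model *sk)
{
	return sk->dst_cache;
}

static void user_check(struct sock_model *sk, struct dst_model *d)
{
	if (d && d->obsolete) {
		sk->dst_cache = NULL;
		d->refcnt--;		/* second drop if the softirq already did it */
	}
}

/* Replay: user loads, softirq redirect runs, user checks.
 * Starts with one reference from the routing side and one from the socket. */
static int replay_race(bool guarded)
{
	struct dst_model route = { .refcnt = 2, .obsolete = false };
	struct sock_model sk = { .dst_cache = &route, .owned_by_user = true };
	struct dst_model *d = user_load(&sk);

	redirect_model(&sk, guarded);
	user_check(&sk, d);
	return route.refcnt;
}
```

Unguarded, the count reaches zero even though the routing side still holds its reference, which is the premature free. Guarded, both references survive.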

The DCCP/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes,
which can set the dst->obsolete flag so that ipv4_dst_check() returns NULL
and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

torvalds pushed a commit that referenced this pull request Sep 29, 2017

KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
------------[ cut here ]------------
 WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0+ #17
 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 Call Trace:
  ? emulator_read_emulated+0x15/0x20 [kvm]
  ? segmented_read+0xae/0xf0 [kvm]
  vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  x86_emulate_instruction+0x733/0x810 [kvm]
  vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
  kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2

A nested #PF is triggered while L0 is emulating an instruction for L2.
However, the code does not take into account that L1's vmlaunch/vmresume
must not be broken. This patch fixes it by queuing the #PF exception instead,
requesting an immediate VM exit from L2 and keeping the exception for L1
pending for a subsequent nested VM exit.

This should actually work all the time, making vmx_inject_page_fault_nested()
totally unnecessary. However, that is not working yet, so this patch works
around the issue in the meantime.
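Here is a toy model of the two strategies. It assumes, as the commit title and the WARN in nested_vmx_vmexit() above suggest, that the bug is synthesizing a nested VM exit while L1's vmlaunch/vmresume is still pending. struct vcpu_model and its fields are hypothetical and do not correspond to KVM's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vCPU state; none of these fields are KVM's real ones. */
struct vcpu_model {
	bool nested_run_pending;	/* L1's VM entry has not completed */
	bool pf_pending;		/* #PF queued for L1 */
	bool immediate_exit_requested;	/* leave L2 as soon as it is entered */
	int nested_vmexits;		/* synthesized L2 -> L1 exits */
	bool warned;			/* stands in for the WARN in the trace */
};

static void nested_vmexit_model(struct vcpu_model *v)
{
	if (v->nested_run_pending)
		v->warned = true;	/* cancelling L1's VM entry is a bug */
	v->nested_vmexits++;
}

/* Old approach: reflect the #PF to L1 right away. */
static void inject_pf_now(struct vcpu_model *v)
{
	nested_vmexit_model(v);
}

/* Approach described in the commit: queue the #PF, request an immediate exit. */
static void inject_pf_queued(struct vcpu_model *v)
{
	v->pf_pending = true;
	v->immediate_exit_requested = true;
}

/* After L2 is entered and exits at once, deliver the pending #PF. */
static void on_immediate_exit(struct vcpu_model *v)
{
	v->nested_run_pending = false;
	v->immediate_exit_requested = false;
	if (v->pf_pending) {
		v->pf_pending = false;
		nested_vmexit_model(v);
	}
}
```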

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fxn added a commit to fxn/dotfiles that referenced this pull request Oct 13, 2017

fengguang pushed a commit to 0day-ci/linux that referenced this pull request Oct 23, 2017

timer: Provide wrappers safe for use with LOCKDEP
Under LOCKDEP, the timer lock_class_key (set up in __setup_timer) needs
to be tied to the caller's context, so an inline for timer_setup()
won't work. We do, however, want to keep the inline version around for
argument type checking, so this provides macro wrappers in the
LOCKDEP case.

This fixes the case of different timers sharing the same LOCKDEP instance
and producing a false-positive warning:

[  580.840858] ======================================================
[  580.842299] WARNING: possible circular locking dependency detected
[  580.843684] 4.14.0-rc4+ #17 Not tainted
[  580.844554] ------------------------------------------------------
[  580.845945] swapper/9/0 is trying to acquire lock:
[  580.847024]  (slock-AF_INET){+.-.}, at: [<ffffffff84ea4c34>] tcp_write_timer+0x24/0xd0
[  580.848834]
               but task is already holding lock:
[  580.850107]  ((timer)#2){+.-.}, at: [<ffffffff846df7c0>] call_timer_fn+0x0/0x300
[  580.851663]
               which lock already depends on the new lock.

[  580.853439]
               the existing dependency chain (in reverse order) is:
[  580.855311]
               -> #1 ((timer)#2){+.-.}:
[  580.856538]        __lock_acquire+0x114d/0x11a0
[  580.857506]        lock_acquire+0xb0/0x1d0
[  580.858373]        del_timer_sync+0x3c/0xb0
[  580.859260]        inet_csk_reqsk_queue_drop+0x7f/0x1b0
...
               -> #0 (slock-AF_INET){+.-.}:
[  580.884980]        check_prev_add+0x666/0x700
[  580.885790]        __lock_acquire+0x114d/0x11a0
[  580.886575]        lock_acquire+0xb0/0x1d0
[  580.887289]        _raw_spin_lock+0x2c/0x40
[  580.888021]        tcp_write_timer+0x24/0xd0
...
[  580.900055]  Possible unsafe locking scenario:

[  580.901043]        CPU0                    CPU1
[  580.901797]        ----                    ----
[  580.902540]   lock((timer)#2);
[  580.903046]                                lock(slock-AF_INET);
[  580.904006]                                lock((timer)#2);
[  580.904915]   lock(slock-AF_INET);
[  580.905502]

In this report, del_timer_sync() is from:

	inet_csk_reqsk_queue_drop()
		reqsk_queue_unlink()
			del_timer_sync(&req->rsk_timer)

but tcp_write_timer()'s timer is attached to icsk_retransmit_timer. Both
had the same lock_class_key, since they were using timer_setup(). Switching
to a macro allows for a separate context, avoiding the false positive.
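The benefit of a macro can be shown in plain C. A static variable inside a static inline function is a single object shared by every caller. A static declared inside a macro's do { } while (0) body is a separate object at each expansion. This sketch uses a hypothetical struct lock_class_key_model with no behaviour of its own. It mirrors the per-call-site static key pattern the commit describes rather than reproducing the kernel's timer.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lock_class_key: only its address matters. */
struct lock_class_key_model {
	char unused;
};

struct timer_model {
	void (*function)(struct timer_model *);
	const struct lock_class_key_model *key;	/* class seen by the validator */
};

static void init_timer_key_model(struct timer_model *t,
				 void (*fn)(struct timer_model *),
				 const struct lock_class_key_model *key)
{
	t->function = fn;
	t->key = key;
}

/* Inline version: one static key, shared by every caller. */
static inline void timer_setup_inline(struct timer_model *t,
				      void (*fn)(struct timer_model *))
{
	static struct lock_class_key_model shared_key;

	init_timer_key_model(t, fn, &shared_key);
}

/* Macro version: each expansion declares its own static key,
 * so each call site gets its own lock class. */
#define timer_setup_macro(t, fn)					\
	do {								\
		static struct lock_class_key_model site_key;		\
		init_timer_key_model((t), (fn), &site_key);		\
	} while (0)
```

With the inline version, req->rsk_timer and icsk_retransmit_timer would end up in one class, as in the report. With the macro, each setup site is distinct.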

Fixes: 686fef9 ("timer: Prepare to change timer callback argument type")
Reported-by: Craig Gallek <cgallek@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20171019202838.GA43223@beast

borkmann pushed a commit to cilium/linux that referenced this pull request Nov 1, 2017

Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.
Commit dec2c92 ("Bluetooth: hci_ldisc: Use rwlocking to avoid closing
proto races") introduced locks in hci_ldisc that are held while calling
the proto functions. These locks are rwlocks, and hence do not allow
sleeping while they are held.
However, the proto functions that hci_bcm registers use mutexes and
hence need to be able to sleep.

In more detail: hci_uart_tty_receive() and hci_uart_dequeue() both
acquire the rwlock, after which they call proto->recv() and
proto->dequeue(), respectively. In the case of hci_bcm these point to
bcm_recv() and bcm_dequeue(), both of which acquire the
bcm_device_lock, which is a mutex, so doing so results in a call to
might_sleep(). But since we're holding a rwlock in hci_ldisc, that
results in the following BUG (this is for the dequeue case; a similar
one for the receive case is omitted for brevity):

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c
  in_atomic(): 1, irqs_disabled(): 0, pid: 7303, name: kworker/7:3
  INFO: lockdep is turned off.
  CPU: 7 PID: 7303 Comm: kworker/7:3 Tainted: G        W  OE   4.13.2+ #17
  Hardware name: Apple Inc. MacBookPro13,3/Mac-A5C67F76ED83108C, BIOS MBP133.8
  Workqueue: events hci_uart_write_work [hci_uart]
  Call Trace:
   dump_stack+0x8e/0xd6
   ___might_sleep+0x164/0x250
   __might_sleep+0x4a/0x80
   __mutex_lock+0x59/0xa00
   ? lock_acquire+0xa3/0x1f0
   ? lock_acquire+0xa3/0x1f0
   ? hci_uart_write_work+0xd3/0x160 [hci_uart]
   mutex_lock_nested+0x1b/0x20
   ? mutex_lock_nested+0x1b/0x20
   bcm_dequeue+0x21/0xc0 [hci_uart]
   hci_uart_write_work+0xe6/0x160 [hci_uart]
   process_one_work+0x253/0x6a0
   worker_thread+0x4d/0x3b0
   kthread+0x133/0x150
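The failure can be modeled roughly as follows, assuming the usual kernel behavior that taking a spinning lock such as rwlock_t disables preemption, so in_atomic() reports nonzero and might_sleep() complains. The counter and all demo_* functions are hypothetical simplifications, not the real kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-CPU preempt count. */
static int demo_preempt_count;

/* Spinning read lock: disables preemption while held. */
static void demo_read_lock(void)   { demo_preempt_count++; }
static void demo_read_unlock(void)
{
	assert(demo_preempt_count > 0);
	demo_preempt_count--;
}

/* Rough analogue of in_atomic(): nonzero preempt count. */
static bool demo_in_atomic(void) { return demo_preempt_count != 0; }

/* Rough analogue of might_sleep(): false means "BUG: sleeping
 * function called from invalid context". */
static bool demo_might_sleep(void) { return !demo_in_atomic(); }

/* Stand-in for bcm_dequeue(): it takes a mutex, so it may sleep. */
static bool demo_bcm_dequeue(void) { return demo_might_sleep(); }

/* Stand-in for hci_uart_dequeue() with the old rwlock: the proto
 * callback runs with the lock held, i.e. in atomic context. */
static bool demo_dequeue_under_rwlock(void)
{
	bool ok;

	demo_read_lock();
	ok = demo_bcm_dequeue();
	demo_read_unlock();
	return ok;
}
```

Called on its own, demo_bcm_dequeue() passes the check; called under the spinning read lock, it fails, mirroring the trace above.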

We can't replace the mutex in hci_bcm, because there are other calls
there that might sleep. Therefore this replaces the rwlocks in
hci_ldisc with rw_semaphores (which allow sleeping). This is a safer
approach anyway as it reduces the restrictions on the proto callbacks.
Also, because acquiring the write lock is very rare compared to acquiring
the read lock, the percpu variant of rw_semaphore is used.

Lastly, because hci_uart_tx_wakeup() may be called from an IRQ context,
we can't block (sleep) while trying to acquire the read lock there, so we
use the trylock variant.

Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

roadrunner2 added a commit to roadrunner2/linux that referenced this pull request Nov 8, 2017

Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.
Commit dec2c92 ("Bluetooth: hci_ldisc: Use rwlocking to avoid closing
proto races") introduced locks in hci_ldisc that are held while calling
the proto functions. These locks are rwlocks, and hence do not allow
sleeping while they are held.
However, the proto functions that hci_bcm registers use mutexes and
hence need to be able to sleep.

In more detail: hci_uart_tty_receive() and hci_uart_dequeue() both
acquire the rwlock, after which they call proto->recv() and
proto->dequeue(), respectively. In the case of hci_bcm these point to
bcm_recv() and bcm_dequeue(), both of which acquire the
bcm_device_lock, which is a mutex, so doing so results in a call to
might_sleep(). But since we're holding a rwlock in hci_ldisc, that
results in the following BUG (this is for the dequeue case; a similar
one for the receive case is omitted for brevity):

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c
  in_atomic(): 1, irqs_disabled(): 0, pid: 7303, name: kworker/7:3
  INFO: lockdep is turned off.
  CPU: 7 PID: 7303 Comm: kworker/7:3 Tainted: G        W  OE   4.13.2+ #17
  Hardware name: Apple Inc. MacBookPro13,3/Mac-A5C67F76ED83108C, BIOS MBP133.8
  Workqueue: events hci_uart_write_work [hci_uart]
  Call Trace:
   dump_stack+0x8e/0xd6
   ___might_sleep+0x164/0x250
   __might_sleep+0x4a/0x80
   __mutex_lock+0x59/0xa00
   ? lock_acquire+0xa3/0x1f0
   ? lock_acquire+0xa3/0x1f0
   ? hci_uart_write_work+0xd3/0x160 [hci_uart]
   mutex_lock_nested+0x1b/0x20
   ? mutex_lock_nested+0x1b/0x20
   bcm_dequeue+0x21/0xc0 [hci_uart]
   hci_uart_write_work+0xe6/0x160 [hci_uart]
   process_one_work+0x253/0x6a0
   worker_thread+0x4d/0x3b0
   kthread+0x133/0x150

We can't replace the mutex in hci_bcm, because there are other calls
there that might sleep. Therefore this replaces the rwlocks in
hci_ldisc with rw_semaphores (which allow sleeping). This is a safer
approach anyway as it reduces the restrictions on the proto callbacks.
Also, because acquiring the write lock is very rare compared to acquiring
the read lock, the percpu variant of rw_semaphore is used.

Lastly, because hci_uart_tx_wakeup() may be called from an IRQ context,
we can't block (sleep) while trying to acquire the read lock there, so we
use the trylock variant.

Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Dean Jenkins <Dean_Jenkins@mentor.com>

Flex1911 added a commit to Flex1911/linux that referenced this pull request Nov 14, 2017

f2fs: fix to update dirty page count correctly
If we fail to merge inline data into the inode page while flushing an
inline inode, we skip invoking inode_dec_dirty_pages, which leaves the
dirty page count incorrect and results in a panic in ->evict_inode. Fix it.

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:336!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 10004 Comm: umount Tainted: G           O    4.6.0-rc5+ #17
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f0c33000 ti: c5212000 task.ti: c5212000
EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_evict_inode+0x85/0x490 [f2fs]
EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000
ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: c5213e78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0
Stack:
 c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac
 f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64
 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0
Call Trace:
 [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50
 [<c1204a68>] evict+0xa8/0x170
 [<c1204b64>] dispose_list+0x34/0x50
 [<c120588d>] evict_inodes+0x10d/0x130
 [<c11ea941>] generic_shutdown_super+0x41/0xe0
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c11eac52>] kill_block_super+0x22/0x70
 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs]
 [<c11eae1d>] deactivate_locked_super+0x3d/0x70
 [<c11eb383>] deactivate_super+0x43/0x60
 [<c1208ec9>] cleanup_mnt+0x39/0x80
 [<c1208f50>] __cleanup_mnt+0x10/0x20
 [<c107d091>] task_work_run+0x71/0x90
 [<c105725a>] exit_to_usermode_loop+0x72/0x9e
 [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0
 [<c176dd48>] sysenter_past_esp+0x45/0x74
EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78
---[ end trace d30536330b7fdc58 ]---

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
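The accounting bug can be illustrated with a deliberately simplified model: a per-inode dirty-page counter, a flush step that can fail, and an eviction check that expects the counter to be zero. This is not f2fs code and not the actual patch; every demo_* name is hypothetical, and the model ignores details such as re-dirtying a page after a failed write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an inode's dirty-page counter. */
struct demo_inode { int dirty_pages; };

/* Stand-in for merging inline data into the inode page; may fail. */
static bool demo_write_inline(bool fail) { return !fail; }

/* Buggy shape: the decrement only happens on the success path. */
static bool demo_flush_buggy(struct demo_inode *inode, bool fail)
{
	assert(inode->dirty_pages > 0);
	if (!demo_write_inline(fail))
		return false; /* early exit skips the decrement */
	inode->dirty_pages--;
	return true;
}

/* Fixed shape: the counter is decremented on every path and the
 * error is still reported to the caller. */
static bool demo_flush_fixed(struct demo_inode *inode, bool fail)
{
	bool ok;

	assert(inode->dirty_pages > 0);
	ok = demo_write_inline(fail);
	inode->dirty_pages--;
	return ok;
}

/* Stand-in for the eviction-time check that fired as a BUG. */
static bool demo_evict_ok(const struct demo_inode *inode)
{
	return inode->dirty_pages == 0;
}
```

With a failing write, the buggy shape leaves the counter at 1, so the eviction check fails, much as the kernel BUG above does; the fixed shape brings it back to 0.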

Flex1911 added a commit to Flex1911/linux that referenced this pull request Nov 15, 2017

f2fs: fix to update dirty page count correctly
If we fail to merge inline data into the inode page while flushing an
inline inode, we skip invoking inode_dec_dirty_pages, which leaves the
dirty page count incorrect and results in a panic in ->evict_inode. Fix it.

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:336!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 10004 Comm: umount Tainted: G           O    4.6.0-rc5+ #17
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f0c33000 ti: c5212000 task.ti: c5212000
EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_evict_inode+0x85/0x490 [f2fs]
EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000
ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: c5213e78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0
Stack:
 c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac
 f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64
 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0
Call Trace:
 [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50
 [<c1204a68>] evict+0xa8/0x170
 [<c1204b64>] dispose_list+0x34/0x50
 [<c120588d>] evict_inodes+0x10d/0x130
 [<c11ea941>] generic_shutdown_super+0x41/0xe0
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c11eac52>] kill_block_super+0x22/0x70
 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs]
 [<c11eae1d>] deactivate_locked_super+0x3d/0x70
 [<c11eb383>] deactivate_super+0x43/0x60
 [<c1208ec9>] cleanup_mnt+0x39/0x80
 [<c1208f50>] __cleanup_mnt+0x10/0x20
 [<c107d091>] task_work_run+0x71/0x90
 [<c105725a>] exit_to_usermode_loop+0x72/0x9e
 [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0
 [<c176dd48>] sysenter_past_esp+0x45/0x74
EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78
---[ end trace d30536330b7fdc58 ]---

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Flex1911 added a commit to Flex1911/linux that referenced this pull request Nov 16, 2017

f2fs: fix to update dirty page count correctly
If we fail to merge inline data into the inode page while flushing an
inline inode, we skip invoking inode_dec_dirty_pages, which leaves the
dirty page count incorrect and results in a panic in ->evict_inode. Fix it.

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:336!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 10004 Comm: umount Tainted: G           O    4.6.0-rc5+ #17
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f0c33000 ti: c5212000 task.ti: c5212000
EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_evict_inode+0x85/0x490 [f2fs]
EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000
ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: c5213e78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0
Stack:
 c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac
 f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64
 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0
Call Trace:
 [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50
 [<c1204a68>] evict+0xa8/0x170
 [<c1204b64>] dispose_list+0x34/0x50
 [<c120588d>] evict_inodes+0x10d/0x130
 [<c11ea941>] generic_shutdown_super+0x41/0xe0
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c11eac52>] kill_block_super+0x22/0x70
 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs]
 [<c11eae1d>] deactivate_locked_super+0x3d/0x70
 [<c11eb383>] deactivate_super+0x43/0x60
 [<c1208ec9>] cleanup_mnt+0x39/0x80
 [<c1208f50>] __cleanup_mnt+0x10/0x20
 [<c107d091>] task_work_run+0x71/0x90
 [<c105725a>] exit_to_usermode_loop+0x72/0x9e
 [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0
 [<c176dd48>] sysenter_past_esp+0x45/0x74
EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78
---[ end trace d30536330b7fdc58 ]---

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

erikarn pushed a commit to bw-oss/linux that referenced this pull request Nov 22, 2017

Merge pull request #17 from bowerswilkins/bw/bcm7444s-4.1-plushulafile
Import Hulafile and other goodies from hula-pkgs.git (BCM7444s)