Add support for AR5BBU22 [0489:e03c] #17

Closed
wants to merge 1 commit into from
@WNeZRoS
WNeZRoS commented May 11, 2012

No description provided.

@WNeZRoS WNeZRoS closed this May 11, 2012
@torvalds
Owner
@orblivion

How do you feel about merging in things that may include commits downstream that have been pull requested with github? Seems hard to stop that.

@jaseemabid

Somebody please look at the diff. That's a simple 3-line code addition. I agree with you, @torvalds, but you could have excused it this time :)

@jaseemabid

By the way, it's quite funny that github is sending instructions to @torvalds on using git.

@torvalds
Owner
@torvalds
Owner
@skalnik
skalnik commented May 11, 2012

@torvalds The GitHub commit UI provides a text area for commit messages. This supports new lines and makes it easy to do nicely formatted commit messages :)

@jedahan
jedahan commented May 11, 2012

@skalnik would be nice if it had an 80-character line to help format things nicely.

@paulcbetts

Every time another Pull Request fiasco happens on one of Linus's repos it makes me sad, especially because I want someone whose work I greatly respect to have a good experience on GitHub. Instead he gets dozens of troll comments.

An OS kernel very rightfully demands a very disciplined approach to development that is in many ways not compatible with the goal of GitHub, which is to get as many people of all skill levels involved in Free / Open Source Software. We can certainly make improvements though, and I appreciate that Linus has taken some time to detail exactly why he doesn't use PRs, even if it's a bit harsh.

@tubbo
tubbo commented May 11, 2012
 - no sane word-wrap of the long description you type: github commit
messages tend to be (if they have any description at all) one long
unreadable line.

I think this is only because people who are new to Git are using GitHub without understanding Git-style committing. Remember, a lot of these newbies are just out of the gate after using SVN for years. I bet a lot of them don't even realize that git commit with the "-m" omitted just opens up COMMIT_EDITMSG in your editor. The 50-char title rule and the 72-char rule for every other line of a commit message aren't very apparent to newbies either.

github *could* make it easy to write good commit messages and enforce
the proper "oneliner for shortlogs and gitk, full explanation for full
logs". But github doesn't. Instead, the github "commit on the web"
interface is one single horrible text-entry field with absolutely no
sane way to write a good-looking message.

I have to agree with you there. Commit message viewing on Github sucks and I hope they change it soon.

@torvalds
Owner
@jedahan
jedahan commented May 11, 2012

I always thought of the title of a pull request as the one-liner ...

@jrep
jrep commented May 11, 2012

Newbie question I know, but can someone point me to this "nice pull-request generation module" Linus mentions? My google fu, documentation fu, and command-line-help fu all failed.

@torvalds
Owner
@technoweenie

@jrep: I believe he's referring to git-request-pull.
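
For readers who have not used it, here is a minimal sketch of driving that command from Python. The command form `git request-pull <start> <URL> [<end>]` is git's own; the helper names are hypothetical, and the tag, URL, and branch in the usage note are made-up placeholders.

```python
import subprocess


def request_pull_command(start, url, end=None):
    """Build the argv for `git request-pull <start> <URL> [<end>]`."""
    command = ["git", "request-pull", start, url]
    if end is not None:
        command.append(end)
    return command


def request_pull(start, url, end=None):
    """Run the command in the current repository and return the summary text.

    Raises subprocess.CalledProcessError if git exits with an error.
    """
    result = subprocess.run(
        request_pull_command(start, url, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `request_pull("v3.4-rc6", "git://example.org/linux.git", "for-linus")` (all three values are placeholders) would return the plain-text summary that is normally pasted into an email to the maintainer.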

@nugend
nugend commented May 11, 2012

I'm not sure I understand why the commit message itself should be hard word-wrapped. Naively, it seems like that should be a display property of the editor used to write the commit message or the tool used to display the commit message.

@torvalds
Owner
@scomma
scomma commented May 11, 2012

While I do have great respect for you @torvalds and your work, and it's totally valid for the repository of Linux to have rather rigorous standards, have you considered the possibility there could be a lot of GitHub users who don't really need nor care about any of those "features" you try to portray as objectively superior?

@torvalds
Owner
@tylermenezes

If you add .patch onto this URL you'll get a git-am style patch.

(Github is very silly for not exposing this in the interface, and for not even really mentioning this feature.)

I agree with you on the messages, I wish the text areas were at least monospaced.

@torvalds
Owner
@torvalds
Owner
@mmorris-gc

Word-wrapping is a property of the text. And the tool you use to
visualize things cannot know. End result: you do word-wrapping at the
only stage where you can do it, namely when writing it. Not when
showing it.

Just curious - why is it that the tool used to visualize things cannot know how to wrap text it displays? And if it is the case, isn't that a problem with the viewer itself, rather than a reason to hard wrap?

@myfreeweb

Commit messages must be limited to 140 characters, like tweets. Right in git's core.

(See what I did there? What's “pure garbage” for you is just perfect for a lot of people.)

@vertexclique

@torvalds Thank you for your rational and good opinion. I appreciate you.

@brettalton

Do you guys not understand that this is Linus' blessed repository and he can accept and reject whomever and whichever request he likes? He has specific and pertinent rules when it comes to merging that he's learned over 20 years of maintaining the Linux kernel. He developed git - in case you forgot, he was the initial developer - with features specifically for gpg signoffs, shortlogs, etc. - things he and other intelligent computer scientists find useful for maintaining repositories.

I've maintained small projects with three developers plus myself and as soon as you become loose with your merging criteria, the entire repository goes to hell. If he wants gpg signoffs, then he'll get gpg signoffs. Try maintaining 20 million lines of code and merge requests from 2,000 developers, and then you can give Linus advice.

@dustalov

I think @torvalds is a pretty cool guy. eh scolds githubs and doesnt afraid of anything.

@MostAwesomeDude

While I do have great respect for you @torvalds and your work, and it's totally valid for the repository of Linux to have rather rigorous standards, have you considered the possibility there could be a lot of GitHub users who don't really need nor care about any of those "features" you try to portray as objectively superior?

"GitHub is the best place to share code with friends, co-workers,
classmates, and complete strangers." As long as GH actually, genuinely
cares about making this statement true, they should be providing these
features.

Roman, in the future, you should follow the kernel's guide for
submitting patches. I believe that drivers/bluetooth is covered by the
list at linux-bluetooth@vger.kernel.org and you can submit your patch
to them, with a proper Signed-off-by tag.

FWIW, Reviewed-by: Corbin Simpson MostAwesomeDude@gmail.com, but
there's no way to confirm that since GH is going to hide my email
address and I can't easily sign this message.

(As an example of broken UI, while writing this message, I split my
screen between Firefox and vim, vertically. Linus' messages, being
wrapped, were perfectly readable, but because Github has a massive
minimum width, I had to scroll back and forth in order to read everybody
else's messages.)

@ivyl
ivyl commented May 11, 2012

@mmorris-gc
Sure, tools can do that, but at what cost?
Mostly messages are read in terminal, not via web interface.

How do you distinguish parts which should be wrapped from ones that
shouldn't? Add extra tags?

Commit logs are mostly viewed in terminals, which tend to use
monospace fonts.

What about quoting? ">" markers are clean and indicate the
level of quoting.

These ideas have been used for years in email, and guess what?
They work!

@factormystic

@mmorris-gc It's open source. Fork it and write a custom viewer for yourself. Problem solved.

@mephux
mephux commented May 11, 2012

Amen for the "victim philosophy" comment. If you want to commit or suggest features get ready for feedback. People need to seriously stop crying when others are blunt with them; It's pathetic. (not everyone has time to consider the infinite ways you may interpret something)

@KorvinSzanto

I'd have to say I fully agree with @torvalds, I've worked in very strict commit standards, and in very loose standards, and by far my entire experience was a lot better with well formatted standard commit messages. Github does not handle this at all.

Some say that "people don't care", but that's mostly because they don't know what they are missing. If it were more convenient to use good standards, everyone would use them.

@jite
jite commented May 11, 2012

Sometimes I wonder if the ones who like a massive one-liner as commit message are Windows users...

@mmorris-gc

@ivyl

Sure, tools can do that, but at what cost?

I don't know what the cost is, but I'd be interested to know! That's why I was asking what prevents the tool from doing this rather than requiring that the user handle it.

@factormystic Not sure what this has to do with my question. I was just wondering if there was a reason that the viewer couldn't handle it; I wasn't complaining or asking someone to fix it for me.

@jnavila
jnavila commented May 11, 2012

Sad that there is no option to disable pull requests via github

@skalnik
skalnik commented May 11, 2012

@torvalds It is indeed a text area.
On top of this, vim/emacs/$EDITOR does not usually enforce the commit format either. In both cases it's up to the end user to write a well styled commit message.

That being said, I agree it could be better. Perhaps if it was more like the commit form that the GitHub application has.

Since this seems so important, perhaps git should enforce this style by rejecting any commits with a message that does not adhere to your specification?
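
Git does offer a client-side mechanism along these lines: an executable at `.git/hooks/commit-msg` receives the path of the message file as its first argument, and a non-zero exit aborts the commit. Below is a rough sketch of such a hook, assuming the 50-character subject and 72-character body limits mentioned earlier in the thread. The function names, and the choice to exempt indented lines as quoted output, are assumptions for illustration, not anything git prescribes.

```python
import sys

SUBJECT_MAX = 50  # conventional limit for the one-line summary
BODY_MAX = 72     # conventional limit for lines in the full explanation


def check_message(text):
    """Return a list of human-readable problems; an empty list means OK."""
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    if len(lines[0]) > SUBJECT_MAX:
        problems.append(f"subject is {len(lines[0])} chars (limit {SUBJECT_MAX})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    for number, line in enumerate(lines[2:], start=3):
        # Assumption: indented lines are quoted output (traces, compiler
        # errors) that the author deliberately left unwrapped.
        if line[:1] in (" ", "\t"):
            continue
        if len(line) > BODY_MAX:
            problems.append(f"line {number} is {len(line)} chars (limit {BODY_MAX})")
    return problems


def run_hook(path):
    """Check the message file at `path`; return the hook's exit status."""
    with open(path, encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(run_hook(sys.argv[1]))
```

A hook like this only runs in clones where it has been installed, and `git commit --no-verify` skips it, so it nudges rather than enforces.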

@camdez
camdez commented May 11, 2012

why is it that the tool used to visualize things cannot know how to wrap text it displays?

@mmorris-gc That was actually covered by @torvalds above when he said:

Some things should not be word-wrapped. They may be some kind of
quoted text - long compiler error messages, oops reports, whatever.

Not only would it be a tremendous burden for every viewing tool to try and determine which items meet the above definition (and do so correctly), many of the tools we use are generic whereas the formatting rules might depend what domain the material came from, making it literally impossible to display things correctly under all conditions.
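
A small illustration of the point, using Python's standard `textwrap` module as a stand-in for a generic viewer. The function name and the sample message are made up for this sketch; the two quoted lines are borrowed from the kind of oops output kernel commit messages often carry.

```python
import textwrap


def naive_rewrap(message, width=72):
    """Re-flow every blank-line-separated paragraph to `width` columns,
    the way a viewer with no knowledge of the content would."""
    paragraphs = message.strip("\n").split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


message = (
    "Fix crash when scanning into a memory hole.\n"
    "\n"
    "BUG: unable to handle kernel paging request at 01c00008\n"
    "IP: [<c0522399>] isolate_migratepages+0x119/0x390\n"
)

# The prose paragraph survives, but the two log lines are fused into one
# run of text and then broken wherever the width limit happens to fall.
print(naive_rewrap(message))
```

The viewer has no way to tell that the second paragraph is quoted output rather than prose, so it treats both the same.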

@mmorris-gc

@camdez Interesting. Still seems like a problem that could be solved by better tooling, but I appreciate you taking the time to point that out. Thanks!

@leobalter

@jnavila there's a way to disable pull requests in Github: they call it private repos.

So sad seeing someone who made a great system raging like a child because no one and no system can be like him or how he wants.

@antirez
antirez commented May 11, 2012

@torvalds other than "form" of pull requests what I'm even more worried about is that this new model of contributing code bypasses the former interaction that there is in a mailing list. If the hub of a project is the ML there are better chances that things are discussed before turning into code that will be refused. Even when the approach starts with a patch, it gets publicly discussed by interested parties, and a long term trace remains in the ML archive. It's a pretty different way of doing this, that was used to build a lot of code with success, and one that works better for a project where patches and new ideas are scrutinized in depth before being accepted.

@JeremyARussell

@torvalds I would like to take this opportunity to say thanks for Linux and git. Without both of those, this great coding community wouldn't have had a chance.

I'd also like to point out something else GitHub does do really well. This. What we are doing right now. Socially coding in an open environment. Talking about things, being connected. Hell, when I was growing up I never thought I'd get a chance to say something that Linus effing Torvalds would get to read and possibly comment on, and now here I am, able to put in my two cents (in a flood of thousands of pennies). So thank you. Thank you Linus for making git and Linux, and thank you GitHub for making coding social.

@jnavila
jnavila commented May 11, 2012

@leobalter No: disabling pull requests does not mean making a repo private. Like many other open source projects, the linux kernel has its own workflow, so why not follow it? At GH, they are aware of it; they even mention it in the Pro Git book.

And before "raging like a child" about his comments, read them again: he just does not care or bother.

@evanmoran

My own preferred solution would be if GitHub kept to one commit message box but live previewed how it would appear below with 72 character wrap. Then you could see clearly what the short and long messages would look and could adjust accordingly (this is done in Stack Overflow and is very helpful).

The last issue is that monospace is required to view / wrap correctly. A natural way to handle this is to use the markdown four space indent syntax, but since this could get annoying it might be better to have an input type pulldown (text vs markdown) in the same way editing GitHub wikis allows.

@leobalter

@jnavila github has its pull requests as they are. Maybe no one follows Linus's "high standards", but they work great in my workflow.

My point is: raging like a child is unnecessary. Turn off pull request notifications and don't answer.

If these github pull requests mess up your day, start thinking about using other code hosting.

The community doesn't need to be blamed for not following such high standards. We just need people collaborating, because it's open and many visions are still great for any project.

@drogus
drogus commented May 11, 2012

I'm not sure why this topic is about pull requests not the feature of editing files online. Most of the people create pull requests out of branches prepared locally, I've prepared tons of pull requests and I've used online editor only once.

@KorvinSzanto

@leobalter, you're missing the point, this isn't about downplaying the current workings of github, it's about suggesting better workings for github. Just because you are fine with having pull requests on doesn't mean there shouldn't be an option to turn them off.

@denhamcoote

@leobalter He's not blaming 'the community', he's pointing out what he thinks needs improving in GH. Raging like a child? If you don't like his 'childish' opinion (read: high standards), don't open a pull request. I'm quite happy to see the conversation that's followed as a result.

I work at a financial institution where a single line code change can be backed with 50 page specs, 200 lines of test code, 2 weeks of testing, etc. Asking for a decent commit message on your own repo isn't that big of a deal.

@nugend
nugend commented May 11, 2012

@camdez Are we talking about only the situation where some text shouldn't be word wrapped though? Are there other wrapping related formatting concerns with plain text?

@SixArm
SixArm commented May 11, 2012

I agree, especially about identity verification via confirmed email addresses, digital signatures, or a mix.

@orblivion

@torvalds I think you missed my point. I'm not just talking about people using Github to host. You don't merge everything in Linux yourself, you defer 90% of that through a trust hierarchy (as you eloquently described in your Google talk about Git). Unless you somehow enforce that everybody under you also refuses Github pull requests, your logs could still get soiled.

@jsanders

@antirez How is the discussion of a pull request on GitHub different than the discussion of a patch on a mailing list? Is it that you end up with two different places to discuss things - mailing list for things without patches, GitHub for things with patches? Or is it that subscribing to see pull requests for a project is not as elegant as subscribing to a mailing list?

My company has had quite a bit of success having in depth discussions about both experimental and more straightforward patches on pull requests, and treating them as the long term trace of discussion, much like you're suggesting - what would we gain from using a mailing list instead?

@torvalds
Owner
@dysoco
dysoco commented May 11, 2012

@johnmetta Oh, you must be new to the internet, or to @torvalds rants :P

@braneed
braneed commented May 12, 2012

Linus, I love your rants and your code. @torvalds.

@SkaveRat

I like how @torvalds rants on a high niveau ;)

nice read, and I have to agree (tho the "moron" comment really wasn't necessary)

@sirlancelot

Did you see about adding .patch to the end of the pull request URL like so: https://github.com/torvalds/linux/pull/17.patch

I'm no git-expert, but doesn't that have all the information?
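
As a tiny sketch of that tip in code: the only behaviour assumed here is what the thread itself describes, namely that appending `.patch` to a pull request URL yields a git-am style patch. The helper names are hypothetical, and `fetch_patch` needs network access.

```python
from urllib.request import urlopen


def patch_url(pull_request_url):
    """Return the URL of the git-am style patch for a pull request page."""
    return pull_request_url.rstrip("/") + ".patch"


def fetch_patch(pull_request_url):
    """Download the patch text (not called here; requires network access)."""
    with urlopen(patch_url(pull_request_url)) as response:
        return response.read().decode("utf-8", errors="replace")
```

The text returned by `fetch_patch` could then be saved to a file and applied locally with `git am`.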

@holdenweb

Not sure what all this fuss is about. @Torvalds points out that due to definite weaknesses in GitHub's UI he won't accept pull requests, and the world starts whaling on him. It's simple: if you want him to pull your changes in, don't use GitHub to generate the request. This would probably be easier than trying to change his mind.

@torvalds
Owner
@torvalds
Owner
@javajosh

If the rules for writing good commit messages are that mechanistic, then @skalnik made a good suggestion: provide a way for maintainers to specify a validation function on commit messages. Could be a nice feature.

@fogleman

Why is Linus word wrapping his comments in this thread? Looks silly.

@SkaveRat
@reinaldons

I fully agree with @torvalds. GitHub's UI replaces an important feature with an inferior version, and there is no excuse for that.

@sp4ke
sp4ke commented May 12, 2012

I think github devs should really take @torvalds remarks seriously for two main reasons.

First, Git was built for the kernel and designed by @torvalds, so even if it might not seem important to people who are new to git and github, there is a reason for git commit messages and pull requests to respect some rules which might not be evident, and Github has a great responsibility in teaching these rules to newcomers.

Second, GH might be the best tool/platform to start using Git, so with all the possibilities given by a modern web service like Github and how easily the UI can be tweaked, it seems like a waste not to build on top of best practices. Seriously, how hard is it to make a text field validator that respects git commit message conventions?

@rtomayko

I just want to get on the record as one of the original pull request developers that we've been aware of these issues for a long time and certainly take them seriously. There are a number of problems we need to address that would make maintainers' lives a lot easier.

There's no question Linus's feedback is warranted. I could add considerably to his critique, even. (Mail headers anyone?) Nobody hates these issues more than we do.

@luckydev

@torvalds is very clear on what Linux needs if contributors want to send him pull requests. I think Github should just look into this and fix the problems.

@AlekseyKorzun

Github was made for 'easy & fast' code management, perhaps this is not the right tool for this job.

I don't agree that there should be rigid pull request standards in place, it works fine for 99% of the smaller projects that are hosted here.

The ticketing system, on the other hand... is another story.

@petdance

(not everyone has time to consider the infinite ways you may interpret something)

I can think of only one way to interpret Linus telling someone "You're a moron." There is no subtle nuance there.

@Bilge
Bilge commented May 12, 2012

Sure is my way or the highway in here.

@pirtlj
pirtlj commented May 12, 2012

My comments keep getting deleted lol

@pirtlj
pirtlj commented May 12, 2012

I hate that whole "victim philosophy". The truth shouldn't be sugarcoated.

By truth Linus is of course referring to his own opinion.

@holdenweb

No doubt his manner is abrupt. Possibly curt. Probably rude. It's fairly obvious @torvalds doesn't "suffer fools gladly". For all I know, this may be a necessary strategy, though it's certainly not one I would find productive. But luckily for him, I'm not him. I do know prominent open sourcerers who are, I wouldn't say harassed, but certainly imposed upon mercilessly. The "nice guys" end up conscientiously dealing with at least some of the traffic, which takes up time that could be spent working or with their families and friends.

But the most important points have almost been lost in the noise: a) @torvalds made explicit complaints about the github pull request, with cogent reasons why it was unsatisfactory; b) Github responded (nice to know they watch their logs) explaining that they are aware of the shortcomings, and others not mentioned, and are working towards fixing it.

It's pointless to argue and bicker
Linus doesn't respond to a clicker
So just make a note
He's a crabby old goat
And then we'll all get along quicker

The Miss Manners conversation can now continue :)

@ghost
ghost commented May 12, 2012

Why is it possible to commit using the web interface anyway? I agree that it's difficult to write decent commit messages using the web interface, but it's even more difficult to write decent changes using the web interface.

People often either don't have knowledge of the organization of the project, or they are half asleep when writing the changes using the web interface.

Anyway, this discussion about wraps is getting me hungry.

@n3storm
n3storm commented May 12, 2012

I started reading this thread with no opinion. Then I realised I started reading
@torvalds nice newspaper column like posts with ease and skipping
non wrapped texts, and said "uhm, that's the point!"
So now I do have an opinion, all comments should be line-wrapped.
Thanks for the lesson :)

@sitaramc

@torvalds You have a lot more patience than I have. I rarely even log in to github (website); once in a while I'll go in and just blindly delete all the pending pull requests unless I recognise the name of the person.

I've long had a policy of "no pull requests, no issues, no comments on code via github; everything on email only" and if people don't know that it's their problem.

[edited to change "tolerance" to "patience" in first line]

@JonDum
JonDum commented May 12, 2012

@sitaramc Maybe the solution is for Github to convert actions that were created with the web interface into an email friendly format and send it out like it does for comments. That way all parties are satisfied regardless of which interface they prefer (web or terminal).

@jammycakes

In defence of Linus's attitude here:

There is one thing you need to bear in mind about the Linux kernel. It is an operating system kernel -- the most fundamental, critical software component of your entire computer. If it goes wrong, everything goes wrong. On top of that, it is probably the most widely deployed OS kernel in the world, being used in everything from transport to logistics to medicine to the military to aerospace. Many of these are applications where people could be killed if things went wrong.

A system of that nature requires much more care and attention to detail than your average vim setup or pet weekend IOC container. If that extends to issues as seemingly trivial (to some people) as word wrapping on check-in comments, then so be it. And if the lead developer of a project such as that does get sharp with people, it's not unfriendliness and political incorrectness, but simply due care and attention with regards to the bigger picture and the stakes being so much higher.

@JackieJ
JackieJ commented May 12, 2012

@AlekseyKorzun, "easy and fast" development is based upon code that's easy to manage. Loose pull requests really hurt efficiency in code management, whether the project is big or small. A rigid pull request standard would make development easier and faster, especially for projects involving multiple contributors :)

@sitaramc

@JonDum sounds like a nice idea but consider this sequence. I get an email from the website. I reply to it, cc-ing someone outside github. The original requestor (who is interacting only via github's web interface) sees my reply and replies to that. At this point I believe the guy I added in my CC does not get cc-d and is out of the loop.

At least that's my recollection of this; maybe they fixed it...

It's not hard to fix; qa.debian.org does it ok I think. So does bugzilla, IIRC and probably many other such systems.

@robermorales

I think that if @torvalds does not like github, he can move "his" project to another site. I like the github web interface. We are in 2012, not in 1980. Linus probably uses an 80-char green-on-black display. Most real people do not.

Furthermore, @torvalds cannot say "you are a moron" while his minions clap. It is hateful.

@jaseemabid

@robermorales He explained already why he hosted "his" project here in a very sensible manner. Read comments.
There are quite a lot of advantages to using the 80 chars convention even in 2012.
He is the only reason why we at least have a sensible "80-char green-on-black display", respect him for that.

@richo
richo commented May 12, 2012

@robermorales move "his" project?

You're here because you use git right.. who specifically do you think wrote git?

@AnthonyAkentiev

Linus wrote: "For some reason, github has attracted people who have zero
taste, don't care about commit logs, and can't be bothered."

It seems like github is written using C++ :-))

@yobert
yobert commented May 12, 2012

I'm fascinated by how many comments say things along the lines of "why not enforce the commit message formatting" or "github should add validation to the commit message tool". The point is that you can't validate or enforce good formatting, since only the author knows which parts of the text should be wrapped nicely and which parts shouldn't.

On a side note, doing a good text editor in a web browser that looks nice and works well is very very challenging.

@greg0ire

@yobert no, you can't (or shouldn't) enforce the commit message formatting,
but you sure could validate it. Look at how vim does this with colors when you
use it as your commit editor.

@bootc bootc pushed a commit to bootc/linux that referenced this pull request May 12, 2012
Mel Gorman mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9da11af
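
The idea in the commit message above can be shown with a toy model. This is not the kernel code: the block size of 8, the `holes` set, and both function names are assumptions made for this sketch. It only shows why validity is re-checked each time the scan crosses into a new `MAX_ORDER_NR_PAGES` block, and it assumes the starting PFN itself has already been validated.

```python
MAX_ORDER_NR_PAGES = 8  # toy value; the real constant is much larger


def pfn_valid(pfn, holes):
    """Toy stand-in: a PFN is valid unless its whole block is a memory hole."""
    return pfn // MAX_ORDER_NR_PAGES not in holes


def scan(low_pfn, end_pfn, holes):
    """Return the PFNs a scan would touch, skipping blocks that are holes."""
    visited = []
    pfn = low_pfn
    while pfn < end_pfn:
        # The start may be unaligned, so re-check validity whenever the
        # scan enters a new MAX_ORDER_NR_PAGES block.
        if pfn % MAX_ORDER_NR_PAGES == 0 and not pfn_valid(pfn, holes):
            pfn += MAX_ORDER_NR_PAGES  # jump over the entire hole
            continue
        visited.append(pfn)
        pfn += 1
    return visited
```

With block 1 (PFNs 8 to 15) marked as a hole, `scan(6, 20, {1})` touches PFNs 6 and 7, skips the hole, and resumes at 16.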
@MattKetmo

@n3storm @SkaveRat : I don't share your opinion about auto-wrapped email.

First, I feel sick reading those auto-wrapped comments in this thread because I always make a "stop breath" after each end of line. It's not natural, especially with non-monospace font. Imagine reading a book where text is not justified and text width is 3/4 of the page. That'd be weird.

It's not your responsibility to format the text displayed in a web page. If you feel uncomfortable reading all the other comments, maybe the designers at GitHub should change the font-size or the .discussion-timeline div width. The only exception is if you read that thread in your terminal... which leads to my second point.

My second point is: like most people, I don't read my emails on a terminal! I hate it when somebody sends me (wrapped) text-only emails that are very hard to read on a smartphone (you need to scroll horizontally) and often look weird in the web interface, as @fogleman noticed here.

That could be compared to a width-fixed content like PDF which is not adaptable (/responsive) for all supports, whereas HTML is.

On the other hand, most people use a terminal or a monospaced text editor/IDE to code and use git. That's why I totally agree on wrapping commit messages, but not emails or comments.

So my opinion is: don't use a web UI as your main tool to make commits -- most people work locally anyway, this is just here to provide a quick (and crappy, ok) solution when you don't have your usual working environment. But Github pull requests are really awesome and are much more user-friendly than emails.

@dysoco
dysoco commented May 12, 2012

Wait a minute, I'm going to invite Tanenbaum to this conversation.

@poke
poke commented May 12, 2012

@torvalds

Look here for a good example of a recent valid pull request:

http://groups.google.com/group/linux.kernel/browse_thread/thread/c3de7bbe9bb73cf5/1d61f01ea9ec3c67?show_docid=1d61f01ea9ec3c67&pli=1

To be fair, pull requests on GitHub are not that different to that. Pull requests (and issues, which are very related) here are a replacement for mailing lists. It’s where the discussion is going on. All the data you mention is available in a pull request as well, just not that visible. Instead you have to look at the commits appended to the request, or the diff view that’s next to the discussion tab.

Obviously that’s not how you do things. You are an email person, using mailing lists as the main (if not only) way to discuss and propose changes to your projects. And I think that is perfectly fine, especially looking at how well it works with your projects.

But I don’t think that makes pull requests on GitHub inferior. They are different, yes, they require a different workflow, but that workflow works extremely well for many projects, especially those that are not using other means for communication (like mailing lists).

A bad style for commit messages, or poor reasoning behind commits and pull requests, does not come automatically with pull requests on GitHub, just as a good style does not come automatically with mailing lists. I’ve seen many perfectly described commits in well-reasoned pull requests on GitHub, and I’ve seen just as many bad requests on mailing lists. You can do both good and bad things with either (or any) system, and I personally think GitHub offers a great system for projects that are not as busy as the kernel or Git itself.

@kbarber
kbarber commented May 12, 2012

GitHub should supply a mechanism for disabling pull requests from the Admin interface, so these conversations aren't required and people like @torvalds can make their own decisions on how they want to receive commits. At the moment, pull request capabilities are always on, so one needs to constantly close them, explain to people why, rinse & repeat.

@n3storm
n3storm commented May 12, 2012

@MattKetmo, just to give you a clue, ever wondered why you prefer reading books
on a tablet or pad and not on a 29'' screen?
Even in the year 2030, human eyes will have the same comfortable scanning and
skimming range, proportional to font size, of course.

@benatkin

Just add a feature to disable pull requests on a per-repo basis, GitHub. Since it's an option, it doesn't need a majority of users to want it, to justify adding it.

It's also a good first step to take in fixing pull requests.

@MattKetmo

@n3storm Sure, I agree with you about having a "comfortable eye scanning and skimming range".

I'm just saying that breaking lines at 80 chars for that kind of content doesn't solve the problem (except in a terminal). Text should be displayed at full width on a pad, and in a column of at most "xxx" px on a 24" or 42" screen. Line breaks won't fall in the same place on every device, so manually breaking lines can make reading harder.

@osteslag

Maybe @github could add a per-repo option to enforce the @torvalds recommended commit messaging style?
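
As a rough idea of what such an enforcement option might check, here is a small sketch of a commit-message linter. The name `check_commit_message` is hypothetical, and the 50-column subject and 72-column body limits are assumptions taken from the convention mentioned earlier in this thread; GitHub offers no such setting as far as this discussion shows.

```python
from typing import List


def check_commit_message(
    message: str, subject_max: int = 50, body_max: int = 72
) -> List[str]:
    """Return a list of style problems found in a commit message.

    An empty list means the message passes these (assumed) rules:
    a short subject line, a blank second line, and wrapped body lines.
    """
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]

    problems = []
    if len(lines[0]) > subject_max:
        problems.append(
            f"subject is {len(lines[0])} chars (max {subject_max})"
        )
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_max:
            problems.append(
                f"line {number} is {len(line)} chars (max {body_max})"
            )
    return problems
```

Locally, a check like this could be called from a git `commit-msg` hook, which receives the path of the message file as its first argument and rejects the commit when it exits with a non-zero status.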

@SteveJones

I can't believe this whole discussion has gone on with no mention of format=flowed, probably the best thing Apple ever did. Not that it really applies to the question of how git commit messages should be formatted (or does it?), but you shouldn't be commenting on the formatting of plain text emails unless you've read that RFC.
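
For readers who have not met format=flowed (RFC 3676): the sender still hard-wraps the text, but marks each soft line break with a trailing space, so a receiving client can rejoin the lines and re-wrap them to its own width. The sketch below shows only the receiving side in a deliberately simplified form. The function name `unflow` is made up for this example, and it assumes `DelSp=no` and ignores quoted (`>`) lines entirely.

```python
def unflow(text: str) -> str:
    """Rejoin soft-wrapped lines of a format=flowed body (simplified).

    Rules applied, following RFC 3676 in spirit:
      * one leading space is "space-stuffing" and is removed;
      * a line ending in a space is a soft break, so the paragraph
        continues on the next line;
      * the signature separator "-- " is never treated as a soft break.
    Quote handling and DelSp=yes are NOT implemented here.
    """
    paragraphs = []
    current = ""
    for line in text.split("\n"):
        if line.startswith(" "):
            line = line[1:]                 # undo space-stuffing
        if line.endswith(" ") and line != "-- ":
            current += line                 # soft break: keep accumulating
        else:
            paragraphs.append(current + line)
            current = ""
    if current:                             # text ended on a soft break
        paragraphs.append(current.rstrip(" "))
    return "\n".join(paragraphs)
```

Each returned line is one logical paragraph, which the reader's client is then free to display at whatever width suits the screen.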

@shepik
shepik commented May 12, 2012

I love it that you can easily know just by
looking at word wraps of the comment
whether a person is supporting Linus or not

@teamaqua

For future reference, here are the HN discussions on this topic:

http://news.ycombinator.com/item?id=3960876
http://news.ycombinator.com/item?id=3964252

@larshp larshp referenced this pull request in larshp/abapGit Jul 6, 2015
Open

better commit message editor #94

@kernelOfTruth kernelOfTruth added a commit to kernelOfTruth/linux that referenced this pull request Jul 15, 2015
Michal Hocko [PATCH] mm, vmscan: Do not wait for page writeback for GFP_NOFS
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:
PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
 #0 [ffff88177374ac60] __schedule at ffffffff815ab152
 #1 [ffff88177374acb0] schedule at ffffffff815ab76e
 #2 [ffff88177374acd0] schedule_timeout at ffffffff815ae5e5
 #3 [ffff88177374ad70] io_schedule_timeout at ffffffff815aad6a
 #4 [ffff88177374ada0] bit_wait_io at ffffffff815abfc6
 #5 [ffff88177374adb0] __wait_on_bit at ffffffff815abda5
 #6 [ffff88177374ae00] wait_on_page_bit at ffffffff8111fd4f
 #7 [ffff88177374ae50] shrink_page_list at ffffffff81135445
 #8 [ffff88177374af50] shrink_inactive_list at ffffffff81135845
 #9 [ffff88177374b060] shrink_lruvec at ffffffff81135ead
 #10 [ffff88177374b150] shrink_zone at ffffffff811360c3
 #11 [ffff88177374b220] shrink_zones at ffffffff81136eff
 #12 [ffff88177374b2a0] do_try_to_free_pages at ffffffff8113712f
 #13 [ffff88177374b300] try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 [ffff88177374b380] try_charge at ffffffff81189423
 #15 [ffff88177374b430] mem_cgroup_try_charge at ffffffff8118c6f5
 #16 [ffff88177374b470] __add_to_page_cache_locked at ffffffff8112137d
 #17 [ffff88177374b4e0] add_to_page_cache_lru at ffffffff81121618
 #18 [ffff88177374b510] pagecache_get_page at ffffffff8112170b
 #19 [ffff88177374b560] grow_dev_page at ffffffff811c8297
 #20 [ffff88177374b5c0] __getblk_slow at ffffffff811c91d6
 #21 [ffff88177374b600] __getblk_gfp at ffffffff811c92c1
 #22 [ffff88177374b630] ext4_ext_grow_indepth at ffffffff8124565c
 #23 [ffff88177374b690] ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 [ffff88177374b6e0] ext4_ext_insert_extent at ffffffff81246f09
 #25 [ffff88177374b750] ext4_ext_map_blocks at ffffffff8124a848
 #26 [ffff88177374b870] ext4_map_blocks at ffffffff8121a5b7
 #27 [ffff88177374b910] mpage_map_one_extent at ffffffff8121b1fa
 #28 [ffff88177374b950] mpage_map_and_submit_extent at ffffffff8121f07b
 #29 [ffff88177374b9b0] ext4_writepages at ffffffff8121f6d5
 #30 [ffff88177374bb20] do_writepages at ffffffff8112c490
 #31 [ffff88177374bb30] __filemap_fdatawrite_range at ffffffff81120199
 #32 [ffff88177374bb80] filemap_flush at ffffffff8112041c
 #33 [ffff88177374bb90] ext4_alloc_da_blocks at ffffffff81219da1
 #34 [ffff88177374bbb0] ext4_rename at ffffffff81229b91
 #35 [ffff88177374bcd0] ext4_rename2 at ffffffff81229e32
 #36 [ffff88177374bce0] vfs_rename at ffffffff811a08a5
 #37 [ffff88177374bd60] SYSC_renameat2 at ffffffff811a3ffc
 #38 [ffff88177374bf60] sys_renameat2 at ffffffff811a408e
 #39 [ffff88177374bf70] sys_rename at ffffffff8119e51e
 #40 [ffff88177374bf80] system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away. The heuristic was introduced by e62e384
("memcg: prevent OOM with too many dirty pages") and it was applied
only when may_enter_fs was specified. The code has been changed by
c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
which has removed the __GFP_FS restriction with a reasoning that we
do not get into the fs code. But this is not sufficient apparently
because the fs doesn't necessarily submit pages marked PG_writeback
for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio. Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by __GFP_FS check (for case 2)
before we go to wait on the writeback. The page fault path, which is the
only path that triggers memcg oom killer since 3.12, shouldn't require
GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue
which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem. Moreover he notes:
: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable # 3.6+
[tytso@mit.edu: check for __GFP_FS rather than __GFP_IO]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmscan.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)
042e907
@martinezjavier martinezjavier pushed a commit to martinezjavier/linux that referenced this pull request Jul 30, 2015
Michal Hocko mm, vmscan: do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:
PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
 #0 [ffff88177374ac60] __schedule at ffffffff815ab152
 #1 [ffff88177374acb0] schedule at ffffffff815ab76e
 #2 [ffff88177374acd0] schedule_timeout at ffffffff815ae5e5
 #3 [ffff88177374ad70] io_schedule_timeout at ffffffff815aad6a
 #4 [ffff88177374ada0] bit_wait_io at ffffffff815abfc6
 #5 [ffff88177374adb0] __wait_on_bit at ffffffff815abda5
 #6 [ffff88177374ae00] wait_on_page_bit at ffffffff8111fd4f
 #7 [ffff88177374ae50] shrink_page_list at ffffffff81135445
 #8 [ffff88177374af50] shrink_inactive_list at ffffffff81135845
 #9 [ffff88177374b060] shrink_lruvec at ffffffff81135ead
 #10 [ffff88177374b150] shrink_zone at ffffffff811360c3
 #11 [ffff88177374b220] shrink_zones at ffffffff81136eff
 #12 [ffff88177374b2a0] do_try_to_free_pages at ffffffff8113712f
 #13 [ffff88177374b300] try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 [ffff88177374b380] try_charge at ffffffff81189423
 #15 [ffff88177374b430] mem_cgroup_try_charge at ffffffff8118c6f5
 #16 [ffff88177374b470] __add_to_page_cache_locked at ffffffff8112137d
 #17 [ffff88177374b4e0] add_to_page_cache_lru at ffffffff81121618
 #18 [ffff88177374b510] pagecache_get_page at ffffffff8112170b
 #19 [ffff88177374b560] grow_dev_page at ffffffff811c8297
 #20 [ffff88177374b5c0] __getblk_slow at ffffffff811c91d6
 #21 [ffff88177374b600] __getblk_gfp at ffffffff811c92c1
 #22 [ffff88177374b630] ext4_ext_grow_indepth at ffffffff8124565c
 #23 [ffff88177374b690] ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 [ffff88177374b6e0] ext4_ext_insert_extent at ffffffff81246f09
 #25 [ffff88177374b750] ext4_ext_map_blocks at ffffffff8124a848
 #26 [ffff88177374b870] ext4_map_blocks at ffffffff8121a5b7
 #27 [ffff88177374b910] mpage_map_one_extent at ffffffff8121b1fa
 #28 [ffff88177374b950] mpage_map_and_submit_extent at ffffffff8121f07b
 #29 [ffff88177374b9b0] ext4_writepages at ffffffff8121f6d5
 #30 [ffff88177374bb20] do_writepages at ffffffff8112c490
 #31 [ffff88177374bb30] __filemap_fdatawrite_range at ffffffff81120199
 #32 [ffff88177374bb80] filemap_flush at ffffffff8112041c
 #33 [ffff88177374bb90] ext4_alloc_da_blocks at ffffffff81219da1
 #34 [ffff88177374bbb0] ext4_rename at ffffffff81229b91
 #35 [ffff88177374bcd0] ext4_rename2 at ffffffff81229e32
 #36 [ffff88177374bce0] vfs_rename at ffffffff811a08a5
 #37 [ffff88177374bd60] SYSC_renameat2 at ffffffff811a3ffc
 #38 [ffff88177374bf60] sys_renameat2 at ffffffff811a408e
 #39 [ffff88177374bf70] sys_rename at ffffffff8119e51e
 #40 [ffff88177374bf80] system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away. The heuristic was introduced by e62e384
("memcg: prevent OOM with too many dirty pages") and it was applied
only when may_enter_fs was specified. The code has been changed by
c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
which has removed the __GFP_FS restriction with a reasoning that we
do not get into the fs code. But this is not sufficient apparently
because the fs doesn't necessarily submit pages marked PG_writeback
for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio. Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by __GFP_FS check (for case 2)
before we go to wait on the writeback. The page fault path, which is the
only path that triggers memcg oom killer since 3.12, shouldn't require
GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue
which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem. Moreover he notes:
: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
[tytso@mit.edu: check for __GFP_FS rather than __GFP_IO]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marian Marinov <mm@1h.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
b2f3b4a
@ddstreet ddstreet pushed a commit to ddstreet/linux that referenced this pull request Jul 31, 2015
Michal Hocko mm, vmscan: do not wait for page writeback for GFP_NOFS allocations
b94cce9
@kingkongmaster kingkongmaster referenced this pull request in amoffat/masquerade Aug 4, 2015
@torvalds Enjoy! 9b05625
@torvalds torvalds added a commit that referenced this pull request Aug 5, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ecf5fc6
@ddstreet ddstreet pushed a commit to ddstreet/linux that referenced this pull request Aug 6, 2015
mmotm auto import origin
GIT 4469942

commit fc1a812
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Tue Aug 4 10:58:26 2015 -0600

    KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
    
    The patch was munged on commit to re-order these tests resulting in
    excessive warnings when trying to do device assignment.  Return to
    original ordering: https://lkml.org/lkml/2015/7/15/769
    
    Fixes: 3e5d2fd ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit ecf5fc6
Author: Michal Hocko <mhocko@suse.cz>
Date:   Tue Aug 4 14:36:58 2015 -0700

    mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
    
    Nikolay has reported a hang when a memcg reclaim got stuck with the
    following backtrace:
    
    PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
      #0 __schedule at ffffffff815ab152
      #1 schedule at ffffffff815ab76e
      #2 schedule_timeout at ffffffff815ae5e5
      #3 io_schedule_timeout at ffffffff815aad6a
      #4 bit_wait_io at ffffffff815abfc6
      #5 __wait_on_bit at ffffffff815abda5
      #6 wait_on_page_bit at ffffffff8111fd4f
      #7 shrink_page_list at ffffffff81135445
      #8 shrink_inactive_list at ffffffff81135845
      #9 shrink_lruvec at ffffffff81135ead
     #10 shrink_zone at ffffffff811360c3
     #11 shrink_zones at ffffffff81136eff
     #12 do_try_to_free_pages at ffffffff8113712f
     #13 try_to_free_mem_cgroup_pages at ffffffff811372be
     #14 try_charge at ffffffff81189423
     #15 mem_cgroup_try_charge at ffffffff8118c6f5
     #16 __add_to_page_cache_locked at ffffffff8112137d
     #17 add_to_page_cache_lru at ffffffff81121618
     #18 pagecache_get_page at ffffffff8112170b
     #19 grow_dev_page at ffffffff811c8297
     #20 __getblk_slow at ffffffff811c91d6
     #21 __getblk_gfp at ffffffff811c92c1
     #22 ext4_ext_grow_indepth at ffffffff8124565c
     #23 ext4_ext_create_new_leaf at ffffffff81246ca8
     #24 ext4_ext_insert_extent at ffffffff81246f09
     #25 ext4_ext_map_blocks at ffffffff8124a848
     #26 ext4_map_blocks at ffffffff8121a5b7
     #27 mpage_map_one_extent at ffffffff8121b1fa
     #28 mpage_map_and_submit_extent at ffffffff8121f07b
     #29 ext4_writepages at ffffffff8121f6d5
     #30 do_writepages at ffffffff8112c490
     #31 __filemap_fdatawrite_range at ffffffff81120199
     #32 filemap_flush at ffffffff8112041c
     #33 ext4_alloc_da_blocks at ffffffff81219da1
     #34 ext4_rename at ffffffff81229b91
     #35 ext4_rename2 at ffffffff81229e32
     #36 vfs_rename at ffffffff811a08a5
     #37 SYSC_renameat2 at ffffffff811a3ffc
     #38 sys_renameat2 at ffffffff811a408e
     #39 sys_rename at ffffffff8119e51e
     #40 system_call_fastpath at ffffffff815afa89
    
    Dave Chinner has properly pointed out that this is a deadlock in the
    reclaim code because ext4 doesn't submit pages which are marked by
    PG_writeback right away.
    
    The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
    with too many dirty pages") and it was applied only when may_enter_fs
    was specified.  The code has been changed by c3b94f4 ("memcg:
    further prevent OOM with too many dirty pages") which has removed the
    __GFP_FS restriction with a reasoning that we do not get into the fs
    code.  But this is not sufficient apparently because the fs doesn't
    necessarily submit pages marked PG_writeback for IO right away.
    
    ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
    submit the bio.  Instead it tries to map more pages into the bio and
    mpage_map_one_extent might trigger memcg charge which might end up
    waiting on a page which is marked PG_writeback but hasn't been submitted
    yet so we would end up waiting for something that never finishes.
    
    Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
    before we go to wait on the writeback.  The page fault path, which is
    the only path that triggers memcg oom killer since 3.12, shouldn't
    require GFP_NOFS and so we shouldn't reintroduce the premature OOM
    killer issue which was originally addressed by the heuristic.
    
    As per David Chinner the xfs is doing similar thing since 2.6.15 already
    so ext4 is not the only affected filesystem.  Moreover he notes:
    
    : For example: IO completion might require unwritten extent conversion
    : which executes filesystem transactions and GFP_NOFS allocations. The
    : writeback flag on the pages can not be cleared until unwritten
    : extent conversion completes. Hence memory reclaim cannot wait on
    : page writeback to complete in GFP_NOFS context because it is not
    : safe to do so, memcg reclaim or otherwise.
    
    Cc: stable@vger.kernel.org # 3.9+
    [tytso@mit.edu: corrected the control flow]
    Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
    Reported-by: Nikolay Borisov <kernel@kyup.com>
    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit fcdf31a
Author: Ross Lagerwall <ross.lagerwall@citrix.com>
Date:   Fri Jul 31 14:30:42 2015 +0100

    xen/events/fifo: Handle linked events when closing a port
    
    An event channel bound to a CPU that was offlined may still be linked
    on that CPU's queue.  If this event channel is closed and reused,
    subsequent events will be lost because the event channel is never
    unlinked and thus cannot be linked onto the correct queue.
    
    When a channel is closed and the event is still linked into a queue,
    ensure that it is unlinked before completing.
    
    If the CPU to which the event channel bound is online, spin until the
    event is handled by that CPU. If that CPU is offline, it can't handle
    the event, so clear the event queue during the close, dropping the
    events.
    
    This fixes the missing interrupts (and subsequent disk stalls etc.)
    when offlining a CPU.
    
    Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: David Vrabel <david.vrabel@citrix.com>

commit 6ea76f3
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Aug 3 17:24:11 2015 +0200

    drm/atomic-helpers: Make encoder picking more robust
    
    We've had a few issues with atomic where subtle bugs in the encoder
    picking logic lead to accidental self-stealing of the encoder,
    resulting in a NULL connector_state->crtc in update_connector_routing
    and subsequent.
    
    Linus applied some duct-tape for an mst regression in
    
    commit 27667f4
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date:   Wed Jul 29 22:18:16 2015 -0700
    
        i915: temporary fix for DP MST docking station NULL pointer dereference
    
    But that was incomplete (the code will still oops when debuggin is
    enabled) and mangled the state even further. So instead WARN and bail
    out as the more future-proof option.
    
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Reviewed-by: Thierry Reding <treding@nvidia.com>
    Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

commit 42639ba
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Aug 3 17:24:10 2015 +0200

    drm/dp-mst: Remove debug WARN_ON
    
    Apparently been in there since forever and fairly easy to hit when
    hotplugging really fast. I can do that since my mst hub has a manual
    button to flick the hpd line for reprobing. The resulting WARNING spam
    isn't pretty.
    
    Cc: Dave Airlie <airlied@gmail.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Thierry Reding <treding@nvidia.com>
    Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

commit 459485a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Aug 3 17:24:09 2015 +0200

    drm/i915: Fixup dp mst encoder selection
    
    In
    
    commit 8c7b5cc
    Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
    Date:   Tue Apr 21 17:13:19 2015 +0300
    
        drm/i915: Use atomic helpers for computing changed flags
    
    we've switched over to the atomic version to compute the
    crtc->encoder->connector routing from the i915 variant. That one
    relies upon the ->best_encoder callback, but the i915-private version
    relied upon intel_find_encoder. Which didn't matter except for dp mst,
    where the encoder depends upon the selected crtc.
    
    Fix this functional bug by implemented a correct atomic-state based
    encoder selector for dp mst.
    
    Note that we can't get rid of the legacy best_encoder callback since
    the fbdev emulation uses that still. That means it's incorrect there
    still, but that's been the case ever since i915 dp mst support was
    merged so not a regression. Best to fix that by converting fbdev over
    to atomic too.
    
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

commit 3b8a684
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Aug 3 17:24:08 2015 +0200

    drm/atomic-helper: Add an atomic best_encoder callback
    
    With legacy helpers all the routing was already set up when calling
    best_encoder and so could be inspected. But with atomic it's staged,
    hence we need a new atomic compliant callback for drivers which need
    to inspect the requested state and can't just decide the best encoder
    statically.
    
    This is needed to fix up i915 dp mst where we need to pick the right
    encoder depending upon the requested CRTC for the connector.
    
    v2: Don't forget to amend the kerneldoc
    
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Acked-by: Thierry Reding <treding@nvidia.com>
    Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

commit 5413fcd
Author: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Date:   Mon Aug 3 12:40:51 2015 +0200

    Adding YAMA hooks also when YAMA is not stacked.
    
    Without this patch YAMA will not work at all if it is chosen
    as the primary LSM instead of being "stacked".
    
    Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
    Acked-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: James Morris <james.l.morris@oracle.com>

commit 49895bc
Author: NeilBrown <neilb@suse.com>
Date:   Mon Aug 3 17:09:57 2015 +1000

    md/raid5: don't let shrink_slab shrink too far.
    
    I have a report of drop_one_stripe() called from
    raid5_cache_scan() apparently finding ->max_nr_stripes == 0.
    
    This should not be allowed.
    
    So add a test to keep max_nr_stripes above min_nr_stripes.
    
    Also use a 'mask' rather than a 'mod' in drop_one_stripe
    to ensure 'hash' is valid even if max_nr_stripes does reach zero.
    
    
    Fixes: edbe83a ("md/raid5: allow the stripe_cache to grow and shrink.")
    Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569)
    Reported-by: Tomas Papan <tomas.papan@gmail.com>
    Signed-off-by: NeilBrown <neilb@suse.com>
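
The 'mask' versus 'mod' point in the message above can be illustrated with a small userspace sketch. The bucket count `NR_HASH_BUCKETS`, the mask and both helper names are assumptions made for illustration, not the md/raid5 definitions. The sketch only shows that in C the remainder of a negative operand is negative, whereas masking with a power of two minus one always yields an in-range index.

```c
/* Hypothetical bucket count; it must be a power of two for the mask to work. */
#define NR_HASH_BUCKETS 8
#define HASH_MASK (NR_HASH_BUCKETS - 1)

/* 'mod' variant: for max_nr_stripes == 0 this computes -1 % 8, which is -1. */
static int hash_by_mod(int max_nr_stripes)
{
	return (max_nr_stripes - 1) % NR_HASH_BUCKETS;
}

/* 'mask' variant: -1 & 7 is 7, so the result always lies in [0, 7]. */
static int hash_by_mask(int max_nr_stripes)
{
	return (max_nr_stripes - 1) & HASH_MASK;
}
```

For any positive count the two helpers agree; they differ only in the zero case that the report describes.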

commit b6878d9
Author: Benjamin Randazzo <benjamin@randazzo.fr>
Date:   Sat Jul 25 16:36:50 2015 +0200

    md: use kzalloc() when bitmap is disabled
    
    In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
    mdu_bitmap_file_t called "file".
    
    5769         file = kmalloc(sizeof(*file), GFP_NOIO);
    5770         if (!file)
    5771                 return -ENOMEM;
    
    This structure is copied to user space at the end of the function.
    
    5786         if (err == 0 &&
    5787             copy_to_user(arg, file, sizeof(*file)))
    5788                 err = -EFAULT;
    
    But if bitmap is disabled only the first byte of "file" is initialized
    with zero, so it's possible to read some bytes (up to 4095) of kernel
    space memory from user space. This is an information leak.
    
    5775         /* bitmap disabled, zero the first byte and copy out */
    5776         if (!mddev->bitmap_info.file)
    5777                 file->pathname[0] = '\0';
    
    Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
    Signed-off-by: NeilBrown <neilb@suse.com>
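
The leak described above can be reproduced in miniature in user space. This is a sketch under stated assumptions: `struct bitmap_file` stands in for mdu_bitmap_file_t with a 4096-byte pathname (implied by the "up to 4095" figure), `malloc()` plus a 0xAA fill imitates stale memory returned by kmalloc(), and `calloc()` plays the role of kzalloc(). The helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for mdu_bitmap_file_t; the size is an assumption (see lead-in). */
struct bitmap_file {
	char pathname[4096];
};

/*
 * Build the reply for the "bitmap disabled" case. When 'zeroed' is 0 the
 * buffer is filled with 0xAA first to imitate stale heap contents; when it
 * is 1 the buffer comes from calloc(), which zeroes it like kzalloc().
 */
static struct bitmap_file *make_reply(int zeroed)
{
	struct bitmap_file *file;

	if (zeroed) {
		file = calloc(1, sizeof(*file));
	} else {
		file = malloc(sizeof(*file));
		if (file)
			memset(file, 0xAA, sizeof(*file));
	}
	if (!file)
		return NULL;

	file->pathname[0] = '\0';	/* only the first byte is cleared */
	return file;
}

/* Count the bytes that would be copied out with non-zero (stale) content. */
static size_t leaked_bytes(const struct bitmap_file *file)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(file->pathname); i++)
		if (file->pathname[i] != '\0')
			n++;
	return n;
}
```

With the non-zeroed allocation every byte after the first is stale, while the zeroed allocation exposes nothing, which is the effect of switching to kzalloc().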

commit 423f04d
Author: NeilBrown <neilb@suse.com>
Date:   Mon Jul 27 11:48:52 2015 +1000

    md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
    
    raid1_end_read_request() assumes that the In_sync bits are consistent
    with the ->degraded count.
    raid1_spare_active() updates the In_sync bit before the ->degraded count
    and so exposes an inconsistency, as does error().
    So extend the spinlock in raid1_spare_active() and error() to hide those
    inconsistencies.
    
    This should probably be part of
      Commit: 34cab6f ("md/raid1: fix test for 'was read error from
      last working device'.")
    as it addresses the same issue.  It fixes the same bug and should go
    to -stable for same reasons.
    
    Fixes: 7607305 ("md/raid1: clean up read_balance.")
    Cc: stable@vger.kernel.org (v3.0+)
    Signed-off-by: NeilBrown <neilb@suse.com>

commit e331146
Author: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Date:   Mon Jul 27 17:30:48 2015 +0300

    i2c: fix leaked device refcount on of_find_i2c_* error path
    
    If of_find_i2c_device_by_node() or of_find_i2c_adapter_by_node() find
    a device by node, but its type does not match, a reference to that
    device is still held. This change fixes the problem.
    
    Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
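
The error path described above follows a common pattern: a lookup takes a reference, so a type mismatch has to drop it before returning. The following is a minimal userspace sketch of that pattern, not the i2c core code. The simplified `struct device`, the integer reference count and the `find_typed()` helper are hypothetical, and `get_device()`/`put_device()` here are local stand-ins that merely borrow the kernel names.

```c
#include <stddef.h>

/* Hypothetical, simplified device with a plain integer reference count. */
enum dev_type { DEV_I2C_CLIENT, DEV_I2C_ADAPTER };

struct device {
	enum dev_type type;
	int refcount;
};

static struct device *get_device(struct device *dev)
{
	dev->refcount++;
	return dev;
}

static void put_device(struct device *dev)
{
	dev->refcount--;
}

/*
 * Look up 'dev' and return it only if it has the wanted type. The lookup
 * takes a reference, so the mismatch path drops it again before returning
 * NULL; without that put the reference would be leaked.
 */
static struct device *find_typed(struct device *dev, enum dev_type wanted)
{
	struct device *found = get_device(dev);	/* stands in for the bus lookup */

	if (found->type != wanted) {
		put_device(found);
		return NULL;
	}
	return found;
}
```

On a mismatch the count returns to its previous value; on a match the caller owns the extra reference and is expected to release it later.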

commit 8fcd461
Author: Jeff Layton <jlayton@poochiereds.net>
Date:   Thu Jul 30 06:57:46 2015 -0400

    nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid
    
    Currently, preprocess_stateid_op calls nfs4_check_olstateid which
    verifies that the open stateid corresponds to the current filehandle in the
    call by calling nfs4_check_fh.
    
    If the stateid is a NFS4_DELEG_STID however, then no such check is done.
    This could cause incorrect enforcement of permissions, because the
    nfsd_permission() call in nfs4_check_file uses the current
    filehandle, but any subsequent IO operation will use the file descriptor
    in the stateid.
    
    Move the call to nfs4_check_fh into nfs4_check_file instead so that it
    can be done for all stateid types.
    
    Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    Cc: stable@vger.kernel.org
    [bfields: moved fh check to avoid NULL deref in special stateid case]
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

commit e952849
Author: Masanari Iida <standby24x7@gmail.com>
Date:   Tue Jul 28 20:11:23 2015 +0900

    i2c: Fix typo in i2c-bfin-twi.c
    
    This patch fixes some typos found in a printk message and
    MODULE_DESCRIPTION.
    
    Signed-off-by: Masanari Iida <standby24x7@gmail.com>
    Acked-by: Sonic Zhang <sonic.zhang@analog.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit 828e66c
Author: Jan Luebbe <jlu@pengutronix.de>
Date:   Wed Jul 8 16:35:27 2015 +0200

    i2c: omap: fix bus recovery setup
    
    At least on the AM335x, enabling OMAP_I2C_SYSTEST_ST_EN is not enough to
    allow direct access to the SCL and SDA pins. In addition to ST_EN, we
    need to set the TMODE to 0b11 (Loop back & SDA/SCL IO mode select).
    Also, as the reset values of SCL_O and SDA_O are 0 (which means "drive
    low level"), we need to set them to 1 (which means "high-impedance") to
    avoid unwanted changes on the pins.
    
    As a precaution, reset all these bits to their default values after
    recovery is complete.
    
    Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
    Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
    Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit 8b06260
Author: Jan Luebbe <jlu@pengutronix.de>
Date:   Wed Jul 8 16:35:06 2015 +0200

    i2c: core: only use set_scl for bus recovery after calling prepare_recovery
    
    Using set_scl may be ineffective before calling the driver specific
    prepare_recovery callback, which might change into a test mode. So
    instead of setting SCL in i2c_generic_scl_recovery, move it to
    i2c_generic_recovery (after the optional prepare_recovery).
    
    Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
    Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
    Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit d12c0aa
Author: Vladimir Zapolskiy <vz@mleia.com>
Date:   Mon Jul 27 00:18:51 2015 +0300

    misc: eeprom: at24: clean up at24_bin_write()
    
    The change removes redundant sysfs binary file boundary check, since
    this task is already done on caller side in fs/sysfs/file.c
    
    Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit 1f02329
Author: Vladimir Zapolskiy <vz@mleia.com>
Date:   Mon Jul 27 00:16:31 2015 +0300

    i2c: slave eeprom: clean up sysfs bin attribute read()/write()
    
    The change removes redundant sysfs binary file boundary checks,
    since this task is already done on caller side in fs/sysfs/file.c
    
    Note, on file size overflow read() now returns 0, and this is a
    correct and expected EOF notification according to POSIX.
    
    Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit 2761713
Author: Ilya Dryomov <idryomov@gmail.com>
Date:   Thu Jul 16 17:36:11 2015 +0300

    rbd: fix copyup completion race
    
    For write/discard obj_requests that involved a copyup method call, the
    opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
    rbd_img_obj_copyup_callback().  The latter frees copyup pages, sets
    ->xferred and delegates to rbd_img_obj_callback(), the "normal" image
    object callback, for reporting to block layer and putting refs.
    
    rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
    which means obj_request is marked done in rbd_osd_trivial_callback(),
    *before* ->callback is invoked and rbd_img_obj_copyup_callback() has
    a chance to run.  Marking obj_request done essentially means giving
    rbd_img_obj_callback() a license to end it at any moment, so if another
    obj_request from the same img_request is being completed concurrently,
    rbd_img_obj_end_request() may very well be called on such prematurely
    marked done request:
    
    <obj_request-1/2 reply>
    handle_reply()
      rbd_osd_req_callback()
        rbd_osd_trivial_callback()
        rbd_obj_request_complete()
        rbd_img_obj_copyup_callback()
        rbd_img_obj_callback()
                                        <obj_request-2/2 reply>
                                        handle_reply()
                                          rbd_osd_req_callback()
                                            rbd_osd_trivial_callback()
          for_each_obj_request(obj_request->img_request) {
            rbd_img_obj_end_request(obj_request-1/2)
            rbd_img_obj_end_request(obj_request-2/2) <--
          }
    
    Calling rbd_img_obj_end_request() on such a request leads to trouble,
    in particular because its ->xferred is 0.  We report 0 to the block
    layer with blk_update_request(), get back 1 for "this request has more
    data in flight" and then trip on
    
        rbd_assert(more ^ (which == img_request->obj_request_count));
    
    with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
    been called for both requests and lhs (more) being 1 because we haven't
    got a chance to set ->xferred in rbd_img_obj_copyup_callback() yet.
    
    To fix this, leverage that rbd wants to call class methods in only two
    cases: one is a generic method call wrapper (obj_request is standalone)
    and the other is a copyup (obj_request is part of an img_request).  So
    make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
    rbd_img_obj_copyup_callback() from it if obj_request is part of an
    img_request, similar to how CEPH_OSD_OP_READ handler invokes
    rbd_img_obj_request_read_callback().
    
    Since rbd_img_obj_copyup_callback() is now being called from the OSD
    request callback (only), it is renamed to rbd_osd_copyup_callback().
    
    Cc: Alex Elder <elder@linaro.org>
    Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Reviewed-by: Alex Elder <elder@linaro.org>
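
The assertion quoted in the message above encodes an exclusive-or invariant: either the block layer still expects more data, or this was the last object request, but never both and never neither. A small sketch of that check follows; the function name and the standalone parameters are assumptions for illustration, not the rbd code.

```c
#include <stdbool.h>

/*
 * Returns true when the completion state is consistent: exactly one of
 * "more data in flight" and "all object requests have been ended" holds.
 */
static bool completion_invariant_holds(bool more, unsigned int which,
				       unsigned int obj_request_count)
{
	return more ^ (which == obj_request_count);
}
```

In the race described, both sides are 1 (more data is reported because ->xferred is still 0, yet both requests have been ended), so the exclusive-or is 0 and the assertion trips.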

commit fc927cd
Author: Yan, Zheng <zyan@redhat.com>
Date:   Mon Jul 20 09:50:58 2015 +0800

    ceph: always re-send cap flushes when MDS recovers
    
    commit e548e9b makes the kclient
    only re-send cap flush once during MDS failover. If the kclient sends
    a cap flush after the MDS enters the reconnect stage but before the MDS
    recovers, the kclient will skip re-sending the same cap flush when the
    MDS recovers.
    
    This causes a problem for newly created inodes. The MDS handles cap
    flushes before replaying unsafe requests, so it's possible that the MDS
    finds the corresponding inode missing when handling a cap flush. The fix
    is to revert to the old behaviour: always re-send when the MDS recovers.
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

commit f6762cb
Author: Yan, Zheng <zyan@redhat.com>
Date:   Tue Jul 7 16:18:46 2015 +0800

    ceph: fix ceph_encode_locks_to_buffer()
    
    posix locks should be in ctx->flc_posix list
    
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

commit 586b7cc
Author: Christian Borntraeger <borntraeger@de.ibm.com>
Date:   Tue Jul 28 15:03:05 2015 +0200

    KVM: s390: Fix VCPU hang/loop regression
    
    commit 785dbef ("KVM: s390: optimize round trip time in request
    handling") introduced a regression. This regression was seen with
    CPU hotplug in the guest and switching between 1 or 2 CPUs. This will
    set/reset the IBS control via synced request.
    
    Whenever we make a synced request, we first set the vcpu->requests
    bit and then block the vcpu. The handler, on the other hand, unblocks
    itself, processes vcpu->requests (by clearing them) and unblocks itself
    once again.
    
    Now, if the requester sleeps between setting of vcpu->requests and
    blocking, the handler will clear the vcpu->requests bit and try to
    unblock itself (although no bit is set). When the requester wakes up,
    it blocks the VCPU and we have a blocked VCPU without requests.
    
    Solution is to always unset the block bit.
    
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
    Fixes: 785dbef ("KVM: s390: optimize round trip time in request handling")

commit fe0d34d
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed Jul 29 05:52:14 2015 +0930

    module: weaken locking assertion for oops path.
    
    We don't actually hold the module_mutex when calling find_module_all
    from module_kallsyms_lookup_name: that's because it's used by the oops
    code and we don't want to deadlock.
    
    However, access to the list read-only is safe if preempt is disabled,
    so we can weaken the assertion.  Keep a strong version for external
    callers though.
    
    Fixes: 0be964b ("module: Sanitize RCU usage and locking")
    Reported-by: He Kuang <hekuang@huawei.com>
    Cc: stable@kernel.org
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit 17fb874
Author: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date:   Fri Jul 24 13:13:30 2015 +0200

    hwrng: core - correct error check of kthread_run call
    
    The kthread_run() function can return two different error values
    but the hwrng core only checks for -ENOMEM. If the other error
    value -EINTR is returned it is assigned to hwrng_fill and later
    used on a kthread_stop() call which naturally crashes.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
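
The difference between checking for one specific error value and checking for any encoded error can be sketched in user space. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention (error codes stored in the top 4095 pointer values); they are lower-case re-creations written for this example, and `failed_narrow()`/`failed_broad()` are hypothetical names, not hwrng core functions.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Narrow check from the description: catches only -ENOMEM. */
static int failed_narrow(const void *task)
{
	return task == err_ptr(-ENOMEM);
}

/* Broad check: catches every encoded errno, including -EINTR. */
static int failed_broad(const void *task)
{
	return is_err(task);
}
```

With the narrow check an -EINTR result passes as a valid pointer and would later be handed to a stop routine, which is the crash scenario the message describes.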

commit f898c52
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Wed Jul 22 18:05:35 2015 +0800

    crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer
    
    This patch removes a bogus BUG_ON in the ablkcipher path that
    triggers when the destination buffer is different from the source
    buffer and is scattered.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit 6f043b5
Author: Tadeusz Struk <tadeusz.struk@intel.com>
Date:   Tue Jul 21 22:07:47 2015 -0700

    crypto: qat - Fix invalid synchronization between register/unregister sym algs
    
    The synchronization method using atomics was bogus.
    Use a proper synchronization with mutex.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit 3d1450d
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Tue Jul 7 20:26:07 2015 +0200

    Makefile: Force gzip and xz on module install
    
    Running `make modules_install` ordinarily will overwrite existing
    modules. This is the desired behavior, and is how pretty much every
    other `make install` target works.
    
    However, if CONFIG_MODULE_COMPRESS is enabled, modules are passed
    through gzip and xz which then do the file writing. Both gzip and xz
    will error out if the file already exists, unless -f is passed.
    
    This patch adds -f so that the behavior is uniform.
    
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: Michal Marek <mmarek@suse.com>

commit 6dd3f13
Author: Michal Marek <mmarek@suse.com>
Date:   Thu Jul 16 18:23:53 2015 +0200

    kbuild: Do not pick up ARCH_{CPP,A,C}FLAGS from the environment
    
    Initialize the ARCH_* overrides before including the arch Makefile, to
    avoid picking up the values from the environment. The variables can
    still be overridden on the make command line, but this won't happen
    by accident.
    
    Signed-off-by: Michal Marek <mmarek@suse.com>

commit 1ca4b88
Author: Kinglong Mee <kinglongmee@gmail.com>
Date:   Thu Jul 9 17:38:26 2015 +0800

    nfsd: Fix a file leak on nfsd4_layout_setlease failure
    
    If nfsd4_layout_setlease fails, nfsd will not put ls->ls_file.
    
    Fix commit c5c707f "nfsd: implement pNFS layout recalls".
    
    Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

commit c2227a3
Author: Kinglong Mee <kinglongmee@gmail.com>
Date:   Tue Jul 7 10:16:37 2015 +0800

    nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem
    
    On an absent filesystem (one served by another server), we need to be
    able to handle requests for certain attributes (like fs_locations, so
    the client can find out which server does have the filesystem), but
    others we can't.
    
    We forgot to take that into account when adding another attribute
    bitmask word for the SECURITY_LABEL attribute.
    
    There, an export entry with the "refer" option can result in:
    
    [   88.414272] kernel BUG at fs/nfsd/nfs4xdr.c:2249!
    [   88.414828] invalid opcode: 0000 [#1] SMP
    [   88.415368] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iosf_mbi ppdev btrfs coretemp crct10dif_pclmul crc32_pclmul crc32c_intel xor ghash_clmulni_intel raid6_pq vmw_balloon parport_pc parport i2c_piix4 shpchp vmw_vmci acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi mptscsih serio_raw mptbase e1000 scsi_transport_spi ata_generic pata_acpi [last unloaded: nfsd]
    [   88.417827] CPU: 0 PID: 2116 Comm: nfsd Not tainted 4.0.7-300.fc22.x86_64 #1
    [   88.418448] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
    [   88.419093] task: ffff880079146d50 ti: ffff8800785d8000 task.ti: ffff8800785d8000
    [   88.419729] RIP: 0010:[<ffffffffa04b3c10>]  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
    [   88.420376] RSP: 0000:ffff8800785db998  EFLAGS: 00010206
    [   88.421027] RAX: 0000000000000001 RBX: 000000000018091a RCX: ffff88006668b980
    [   88.421676] RDX: 00000000fffef7fc RSI: 0000000000000000 RDI: ffff880078d05000
    [   88.422315] RBP: ffff8800785dbb58 R08: ffff880078d043f8 R09: ffff880078d4a000
    [   88.422968] R10: 0000000000010000 R11: 0000000000000002 R12: 0000000000b0a23a
    [   88.423612] R13: ffff880078d05000 R14: ffff880078683100 R15: ffff88006668b980
    [   88.424295] FS:  0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
    [   88.424944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   88.425597] CR2: 00007f40bc370f90 CR3: 0000000035af5000 CR4: 00000000001407f0
    [   88.426285] Stack:
    [   88.426921]  ffff8800785dbaa8 ffffffffa049e4af ffff8800785dba08 ffffffff813298f0
    [   88.427585]  ffff880078683300 ffff8800769b0de8 0000089d00000001 0000000087f805e0
    [   88.428228]  ffff880000000000 ffff880079434a00 0000000000000000 ffff88006668b980
    [   88.428877] Call Trace:
    [   88.429527]  [<ffffffffa049e4af>] ? exp_get_by_name+0x7f/0xb0 [nfsd]
    [   88.430168]  [<ffffffff813298f0>] ? inode_doinit_with_dentry+0x210/0x6a0
    [   88.430807]  [<ffffffff8123833e>] ? d_lookup+0x2e/0x60
    [   88.431449]  [<ffffffff81236133>] ? dput+0x33/0x230
    [   88.432097]  [<ffffffff8123f214>] ? mntput+0x24/0x40
    [   88.432719]  [<ffffffff812272b2>] ? path_put+0x22/0x30
    [   88.433340]  [<ffffffffa049ac87>] ? nfsd_cross_mnt+0xb7/0x1c0 [nfsd]
    [   88.433954]  [<ffffffffa04b54e0>] nfsd4_encode_dirent+0x1b0/0x3d0 [nfsd]
    [   88.434601]  [<ffffffffa04b5330>] ? nfsd4_encode_getattr+0x40/0x40 [nfsd]
    [   88.435172]  [<ffffffffa049c991>] nfsd_readdir+0x1c1/0x2a0 [nfsd]
    [   88.435710]  [<ffffffffa049a530>] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
    [   88.436447]  [<ffffffffa04abf30>] nfsd4_encode_readdir+0x120/0x220 [nfsd]
    [   88.437011]  [<ffffffffa04b58cd>] nfsd4_encode_operation+0x7d/0x190 [nfsd]
    [   88.437566]  [<ffffffffa04aa6dd>] nfsd4_proc_compound+0x24d/0x6f0 [nfsd]
    [   88.438157]  [<ffffffffa0496103>] nfsd_dispatch+0xc3/0x220 [nfsd]
    [   88.438680]  [<ffffffffa006f0cb>] svc_process_common+0x43b/0x690 [sunrpc]
    [   88.439192]  [<ffffffffa0070493>] svc_process+0x103/0x1b0 [sunrpc]
    [   88.439694]  [<ffffffffa0495a57>] nfsd+0x117/0x190 [nfsd]
    [   88.440194]  [<ffffffffa0495940>] ? nfsd_destroy+0x90/0x90 [nfsd]
    [   88.440697]  [<ffffffff810bb728>] kthread+0xd8/0xf0
    [   88.441260]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
    [   88.441762]  [<ffffffff81789e58>] ret_from_fork+0x58/0x90
    [   88.442322]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
    [   88.442879] Code: 0f 84 93 05 00 00 83 f8 ea c7 85 a0 fe ff ff 00 00 27 30 0f 84 ba fe ff ff 85 c0 0f 85 a5 fe ff ff e9 e3 f9 ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 be 04 00 00 00 4c 89 ef 4c 89 8d 68 fe
    [   88.444052] RIP  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
    [   88.444658]  RSP <ffff8800785db998>
    [   88.445232] ---[ end trace 6cb9d0487d94a29f ]---
    
    Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

commit 929423f
Author: Juergen Gross <jgross@suse.com>
Date:   Mon Jul 20 13:49:39 2015 +0200

    xen: release lock occasionally during ballooning
    
    When dom0 is being ballooned balloon_process() will hold the balloon
    mutex until it is finished. This will block e.g. creation of new
    domains as the device backends for the new domain need some
    autoballooned pages for the ring buffers.
    
    Avoid this by releasing the balloon mutex from time to time during
    ballooning. Adjust the comment above balloon_process() regarding
    multiple instances of balloon_process().
    
    Instead of open coding it, just use cond_resched().
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: David Vrabel <david.vrabel@citrix.com>

commit c9ddbac
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Tue Jul 14 18:27:46 2015 -0500

    PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
    
    09a2c73 ("PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition")
    removed PCI_MSIX_FLAGS_BIRMASK from an exported header because it was
    unused in the kernel.  But that breaks user programs that were using it
    (QEMU in particular).
    
    Restore the PCI_MSIX_FLAGS_BIRMASK definition.
    
    [bhelgaas: changelog]
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    CC: stable@vger.kernel.org	# v3.13+
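
For context on why user programs want such a mask: in the MSI-X capability, the Table Offset/BIR register keeps the BAR indicator (BIR) in its low three bits and the table offset in the remaining bits. The sketch below shows that split. `BIR_MASK` and both helper names are assumptions for illustration; the value 0x7 reflects the register layout, and it is not taken from the kernel header.

```c
#include <stdint.h>

/* Assumed value: the low three bits of the Table Offset/BIR register. */
#define BIR_MASK 0x7u

/* Which BAR holds the MSI-X table. */
static unsigned int msix_table_bir(uint32_t reg)
{
	return reg & BIR_MASK;
}

/* Offset of the MSI-X table within that BAR. */
static uint32_t msix_table_offset(uint32_t reg)
{
	return reg & ~BIR_MASK;
}
```

A register value of 0x2003 would therefore decode to BAR 3 at offset 0x2000.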

commit 30b03d0
Author: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Date:   Fri Jun 26 03:28:24 2015 +0200

    xen/gntdev: Fix race condition in gntdev_release()
    
    While gntdev_release() is called the MMU notifier is still registered
    and can traverse priv->maps list even if no pages are mapped (which is
    the case -- gntdev_release() is called after all). But
    gntdev_release() will clear that list, so make sure that only one of
    those things happens at the same time.
    
    Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4d39819
@0xAX 0xAX pushed a commit to 0xAX/linux that referenced this pull request Aug 15, 2015
Michal Hocko mm, vmscan: do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:
PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
 #0 [ffff88177374ac60] __schedule at ffffffff815ab152
 #1 [ffff88177374acb0] schedule at ffffffff815ab76e
 #2 [ffff88177374acd0] schedule_timeout at ffffffff815ae5e5
 #3 [ffff88177374ad70] io_schedule_timeout at ffffffff815aad6a
 #4 [ffff88177374ada0] bit_wait_io at ffffffff815abfc6
 #5 [ffff88177374adb0] __wait_on_bit at ffffffff815abda5
 #6 [ffff88177374ae00] wait_on_page_bit at ffffffff8111fd4f
 #7 [ffff88177374ae50] shrink_page_list at ffffffff81135445
 #8 [ffff88177374af50] shrink_inactive_list at ffffffff81135845
 #9 [ffff88177374b060] shrink_lruvec at ffffffff81135ead
 #10 [ffff88177374b150] shrink_zone at ffffffff811360c3
 #11 [ffff88177374b220] shrink_zones at ffffffff81136eff
 #12 [ffff88177374b2a0] do_try_to_free_pages at ffffffff8113712f
 #13 [ffff88177374b300] try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 [ffff88177374b380] try_charge at ffffffff81189423
 #15 [ffff88177374b430] mem_cgroup_try_charge at ffffffff8118c6f5
 #16 [ffff88177374b470] __add_to_page_cache_locked at ffffffff8112137d
 #17 [ffff88177374b4e0] add_to_page_cache_lru at ffffffff81121618
 #18 [ffff88177374b510] pagecache_get_page at ffffffff8112170b
 #19 [ffff88177374b560] grow_dev_page at ffffffff811c8297
 #20 [ffff88177374b5c0] __getblk_slow at ffffffff811c91d6
 #21 [ffff88177374b600] __getblk_gfp at ffffffff811c92c1
 #22 [ffff88177374b630] ext4_ext_grow_indepth at ffffffff8124565c
 #23 [ffff88177374b690] ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 [ffff88177374b6e0] ext4_ext_insert_extent at ffffffff81246f09
 #25 [ffff88177374b750] ext4_ext_map_blocks at ffffffff8124a848
 #26 [ffff88177374b870] ext4_map_blocks at ffffffff8121a5b7
 #27 [ffff88177374b910] mpage_map_one_extent at ffffffff8121b1fa
 #28 [ffff88177374b950] mpage_map_and_submit_extent at ffffffff8121f07b
 #29 [ffff88177374b9b0] ext4_writepages at ffffffff8121f6d5
 #30 [ffff88177374bb20] do_writepages at ffffffff8112c490
 #31 [ffff88177374bb30] __filemap_fdatawrite_range at ffffffff81120199
 #32 [ffff88177374bb80] filemap_flush at ffffffff8112041c
 #33 [ffff88177374bb90] ext4_alloc_da_blocks at ffffffff81219da1
 #34 [ffff88177374bbb0] ext4_rename at ffffffff81229b91
 #35 [ffff88177374bcd0] ext4_rename2 at ffffffff81229e32
 #36 [ffff88177374bce0] vfs_rename at ffffffff811a08a5
 #37 [ffff88177374bd60] SYSC_renameat2 at ffffffff811a3ffc
 #38 [ffff88177374bf60] sys_renameat2 at ffffffff811a408e
 #39 [ffff88177374bf70] sys_rename at ffffffff8119e51e
 #40 [ffff88177374bf80] system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away. The heuristic was introduced by e62e384
("memcg: prevent OOM with too many dirty pages") and it was applied
only when may_enter_fs was specified. The code has been changed by
c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
which has removed the __GFP_FS restriction with a reasoning that we
do not get into the fs code. But this is not sufficient apparently
because the fs doesn't necessarily submit pages marked PG_writeback
for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio. Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by __GFP_FS check (for case 2)
before we go to wait on the writeback. The page fault path, which is the
only path that triggers memcg oom killer since 3.12, shouldn't require
GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue
which was originally addressed by the heuristic.

As per Dave Chinner, xfs has been doing a similar thing since 2.6.15 already
so ext4 is not the only affected filesystem. Moreover he notes:
: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
[tytso@mit.edu: check for __GFP_FS rather than __GFP_IO]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marian Marinov <mm@1h.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
794510a
@0xAX 0xAX pushed a commit to 0xAX/linux that referenced this pull request Aug 15, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.
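
The check described above can be sketched as a small decision function. This is a simplified model, not the code in mm/vmscan.c: `struct reclaim_ctx`, `enum writeback_action`, and `writeback_decision()` are hypothetical names chosen for illustration, and the three outcomes are assumed to correspond to the three cases the message refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the reclaim state; not kernel API. */
enum writeback_action {
	WB_SKIP_IMMEDIATE,	/* case 1: kswapd notes the page and moves on */
	WB_MARK_AND_SKIP,	/* case 2: tag the page for reclaim, do not wait */
	WB_WAIT			/* case 3: block until writeback completes */
};

struct reclaim_ctx {
	bool is_kswapd;		/* reclaim runs in the kswapd thread */
	bool zone_congested;	/* zone already flagged as writeback-heavy */
	bool global_reclaim;	/* true when this is not memcg reclaim */
	bool may_enter_fs;	/* allocation context allows __GFP_FS */
};

/* Decide what to do with a page that is currently under writeback. */
enum writeback_action
writeback_decision(const struct reclaim_ctx *ctx, bool page_reclaim_flag)
{
	assert(ctx != NULL);

	if (ctx->is_kswapd && page_reclaim_flag && ctx->zone_congested)
		return WB_SKIP_IMMEDIATE;

	/*
	 * The fix as described: a context that may not enter the
	 * filesystem (GFP_NOFS) never reaches the waiting branch.
	 */
	if (ctx->global_reclaim || !page_reclaim_flag || !ctx->may_enter_fs)
		return WB_MARK_AND_SKIP;

	return WB_WAIT;
}
```

In this model a memcg reclaim running under GFP_NOFS always takes the mark-and-skip branch, so it cannot block on a page whose bio has not been submitted yet.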

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit ecf5fc6)
7cfd164
@quinte17 quinte17 pushed a commit to quinte17/linux-stable that referenced this pull request Aug 17, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7f488aa
@otavio otavio pushed a commit to Freescale/linux-fslc that referenced this pull request Aug 21, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0fa4301
@shelt shelt referenced this pull request Aug 23, 2015
Closed

Update README #200

@sunny256 sunny256 pushed a commit to sunny256/linux that referenced this pull request Aug 26, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
022d35a
@JackKnightme

Manual wrapping on the web is not a good idea, because it is not responsive, and is not portable.

The application that is viewing the plaintext is what should be formatting it. If you need to distinguish wrapping from non-wrapping text, you should use delimiters that your editor, viewer, or terminal can interpret.

In the modern world of responsiveness and portability, stored text should never have line breaks or formatting added to it.

This is why Markdown was invented.

Strictly enforcing word-wrap, etc, probably makes sense if you use a terminal for most of your work. If you want to use that same text for the web, or anything other than a proper terminal, you need format-less text.
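
The approach described here, storing text without line breaks and letting the viewer wrap it, can be sketched in C. This is only an illustration under stated assumptions: `wrap_text()` is a hypothetical helper, the stored text is a single paragraph with words separated by spaces, and the width (for example 72 columns) is chosen by whatever displays the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Greedy word wrap performed at display time.
 *
 * Copies `text` into `out`, joining words with a single space and
 * starting a new line whenever the next word would push the current
 * line past `width` columns. Words longer than `width` are left
 * unbroken. `out` must hold at least strlen(text) + 1 bytes.
 * Returns the number of lines produced.
 */
size_t wrap_text(const char *text, size_t width, char *out)
{
	size_t col = 0;		/* columns used on the current line */
	size_t lines = 1;
	size_t o = 0;		/* write position in out */
	const char *p = text;

	assert(width > 0);

	while (*p) {
		size_t len;

		while (*p == ' ')	/* skip spaces between words */
			p++;
		if (!*p)
			break;

		len = strcspn(p, " ");	/* length of the next word */
		if (col > 0) {
			if (col + 1 + len > width) {
				out[o++] = '\n';
				col = 0;
				lines++;
			} else {
				out[o++] = ' ';
				col++;
			}
		}
		memcpy(out + o, p, len);
		o += len;
		col += len;
		p += len;
	}
	out[o] = '\0';
	return lines;
}
```

A terminal viewer could call `wrap_text(msg, 72, buf)` while a web view passes its own width, so the same stored text serves both.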

@rumpelsepp

if you use a terminal for most of your work.

That's the case :smiling_imp:

@JackKnightme

if you use a terminal for most of your work.

That's the case :smiling_imp:

Yes, but the issue was with GitHub's way of doing things. Not everyone uses the terminal exclusively, and with GitHub's model a web-centric setup is necessary. This means there are two essential viewpoints to consider, and that neither is inherently moronic. ^_^

@sunny256 sunny256 pushed a commit to sunny256/linux that referenced this pull request Sep 7, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
[ Upstream commit ecf5fc6 ]

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
1f020ce
@sunny256 sunny256 pushed a commit to sunny256/linux that referenced this pull request Sep 7, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
56c6503
@lightshire

Im a bitchass dog. Woof Woof!

@marctmiller
@adam-lee adam-lee added a commit to adam-lee/linux that referenced this pull request Sep 30, 2015
@idjelic idjelic ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0, which
causes register/memory corruption.
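
The contract GCC relies on can be illustrated with a short C sketch. `ref_memset()`, `struct waiter`, and `init_waiter()` are hypothetical names used only for this illustration and are not the kernel's ARM implementation; the point is that the fill routine must hand back its first argument, because the caller may keep using the returned pointer instead of its own copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-wise fill, named ref_memset to avoid clashing with libc. */
void *ref_memset(void *dest, int c, size_t n)
{
	unsigned char *p = dest;

	assert(dest != NULL || n == 0);

	while (n--)
		*p++ = (unsigned char)c;

	return dest;	/* the return value the compiler is allowed to reuse */
}

/* Hypothetical structure standing in for struct mutex_waiter. */
struct waiter {
	struct waiter *magic;
};

/*
 * Mirrors what the disassembly above does: after the call, the
 * returned pointer (r0 on ARM) is used in place of the original one.
 */
void init_waiter(struct waiter *w)
{
	struct waiter *r = ref_memset(w, 0x11, sizeof(*w));

	r->magic = r;
}
```

If `ref_memset()` returned anything other than `dest`, `init_waiter()` would write through a bogus pointer, which is the kind of corruption the commit message describes.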

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
c7418d4
@nkeck720
nkeck720 commented Oct 5, 2015

Guys, remember that GitHub was built with both beginners and advanced programmers like Linus in mind. The interface may seem crappy to an advanced programmer, but it needs to be beginner friendly, and since these people may have little or no experience with Git, they made the interface the way it is. I can understand where Linus is coming from, being a developer of Git and also working on the Linux kernel (which has strict standards, by the way), but the goals of GitHub (being beginner friendly and general purpose) and Linus (creating and maintaining an OS kernel with strict standards on pulls and pushes, with official commit messages) are different. GitHub just isn't suited to the stricter standards behind development of a kernel. I think Linus may be a tad unreasonable for not accepting PRs when some of the people making requests on this site are amazing programmers who are beginners to Git and use GitHub only because it is the one interface they are used to. That is all.

@squeedee
squeedee commented Oct 9, 2015

I don't expect to change @torvalds's opinion on this matter, but perhaps I can help those who are swayed by his delivery here.

It's perfectly fine for him to state expectations in his repository. Of course it is.

The line where he steps into ridicule is not helpful. He can, if he so chooses, state his requirements (he doesn't need to repeat himself, I'm sure there's a contributor's guide someplace) and stop completely short of calling someone a moron.

It is far more effective to stop short of ridicule.

This is not the victim philosophy.

Sugarcoating is at the other end of a continuum from ridicule. In the middle is stated facts and/or requirements. No one deserves ridicule. Not even @torvalds. Even if he chooses to do it himself.

@nkeck720

I never said I was sugarcoating, I was simply stating the fact that there are very skilled programmers who only know how to use the GitHub web interface and thus are being ignored by Linus. I think that is simply unreasonable. Linus is denying help he very well could have but since they are new to Git and started on GitHub, he won't accept their work.

@oelmekki

@nkeck720 It's a bit hard to imagine that someone who is a great
programmer, can write drivers without specs or kernel-level code, and is
willing to contribute to the Linux kernel is not able and willing to
learn how to use git itself extensively to achieve this :)

Also, the linux kernel is probably way past the point where maximizing
contributors is an issue.

@conradjones
@nkeck720
@nemobis
@kaber
kaber commented Oct 10, 2015
@alias-mac

@kaber, you are receiving updates on this because your commit in:
kaber@4533703
has (in the commit message) #17, so github relates that automatically and therefore you are automatically subscribed to the same thread you referenced (in this case this one).

I complained to github about a very similar issue: I was receiving notifications from things happening on the forked repo that I don't care about, since I'm not maintaining them.

You need to manually click the unsubscribe button (in the bottom right) on the #17 page.

@0day-ci 0day-ci pushed a commit that referenced this pull request Oct 12, 2015
Yunzhi Li usb: dwc2: gadget: fix a memory use-after-free bug
By the time dwc2_hsotg_handle_unaligned_buf_complete() has run,
hs_req->req.buf is already destroyed, yet dwc2_hsotg_unmap_dma()
then touches hs_req->req.dma again. dwc2_hsotg_unmap_dma() should
therefore be called before
dwc2_hsotg_handle_unaligned_buf_complete(). Otherwise, a bad_page
BUG is triggered the next time this memory page is allocated.

This bug led to the following crash:

BUG: Bad page state in process swapper/0  pfn:2bdbc
[   26.820440] page:eed76780 count:0 mapcount:0 mapping:  (null) index:0x0
[   26.854710] page flags: 0x200(arch_1)
[   26.885836] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[   26.919179] bad because of flags:
[   26.948917] page flags: 0x200(arch_1)
[   26.979100] Modules linked in:
[   27.008401] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W3.14.0 #17
[   27.041816] [<c010e1f8>] (unwind_backtrace) from [<c010a704>] (show_stack+0x20/0x24)
[   27.076108] [<c010a704>] (show_stack) from [<c087eea8>] (dump_stack+0x70/0x8c)
[   27.110246] [<c087eea8>] (dump_stack) from [<c01ce0b8>] (bad_page+0xfc/0x12c)
[   27.143958] [<c01ce0b8>] (bad_page) from [<c01ce65c>] (get_page_from_freelist+0x3e4/0x50c)
[   27.179298] [<c01ce65c>] (get_page_from_freelist) from [<c01ce9a0>] (__alloc_pages_nodemask)
[   27.216296] [<c01ce9a0>] (__alloc_pages_nodemask) from [<c01cf00c>] (__get_free_pages+0x20/)
[   27.252326] [<c01cf00c>] (__get_free_pages) from [<c01e5bec>] (kmalloc_order_trace+0x34/0xa)
[   27.288295] [<c01e5bec>] (kmalloc_order_trace) from [<c0203304>] (__kmalloc+0x40/0x1ac)
[   27.323751] [<c0203304>] (__kmalloc) from [<c052abc0>] (dwc2_hsotg_ep_queue.isra.12+0x7c/0x1)
[   27.359937] [<c052abc0>] (dwc2_hsotg_ep_queue.isra.12) from [<c052af88>] (dwc2_hsotg_ep_queue)
[   27.397478] [<c052af88>] (dwc2_hsotg_ep_queue_lock) from [<c0554110>] (rx_submit+0xfc/0x164)
[   27.433619] [<c0554110>] (rx_submit) from [<c05546e8>] (rx_complete+0x22c/0x230)
[   27.468872] [<c05546e8>] (rx_complete) from [<c052b528>] (dwc2_hsotg_complete_request+0xfc/0)
[   27.506240] [<c052b528>] (dwc2_hsotg_complete_request) from [<c052bba0>] (dwc2_hsotg_handle_o)
[   27.545401] [<c052bba0>] (dwc2_hsotg_handle_outdone) from [<c052be70>] (dwc2_hsotg_epint+0x2c)
[   27.583689] [<c052be70>] (dwc2_hsotg_epint) from [<c052c750>] (dwc2_hsotg_irq+0x1dc/0x4ac)
[   27.621041] [<c052c750>] (dwc2_hsotg_irq) from [<c01682e0>] (handle_irq_event_percpu+0x70/0x)
[   27.659066] [<c01682e0>] (handle_irq_event_percpu) from [<c01684ec>] (handle_irq_event+0x4c)
[   27.697322] [<c01684ec>] (handle_irq_event) from [<c016bae0>] (handle_fasteoi_irq+0xc8/0x11)
[   27.735451] [<c016bae0>] (handle_fasteoi_irq) from [<c0167b8c>] (generic_handle_irq+0x30/0x)
[   27.773918] [<c0167b8c>] (generic_handle_irq) from [<c0167ca4>] (__handle_domain_irq+0x84/0)
[   27.812018] [<c0167ca4>] (__handle_domain_irq) from [<c01003b0>] (gic_handle_irq+0x48/0x6c)
[   27.849695] [<c01003b0>] (gic_handle_irq) from [<c010b340>] (__irq_svc+0x40/0x50)
[   27.886907] Exception stack(0xc0d01ee0 to 0xc0d01f28)

Acked-by: John Youn <johnyoun@synopsys.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Yunzhi Li <lyz@rock-chips.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
44583fe
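
The ordering problem described in the commit message above can be illustrated with a small userspace sketch. Everything here (`struct fake_req`, `fake_unmap_dma()`, `fake_release_bounce_buf()`, `fake_complete()`, `fake_req_new()`) is hypothetical and only mimics the shape of the bug; it is not the dwc2 driver code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a USB request; not the real dwc2 structures. */
struct fake_req {
	void *buf;               /* data buffer (a bounce buffer in this sketch) */
	int dma_mapped;          /* 1 while the buffer is mapped for DMA */
	int unmap_saw_valid_buf; /* records what the unmap step observed */
};

/* Allocate a request whose buffer is considered DMA-mapped. */
struct fake_req fake_req_new(size_t len)
{
	struct fake_req req = { malloc(len), 1, 0 };
	return req;
}

/* Unmapping needs the buffer to still be alive. */
void fake_unmap_dma(struct fake_req *req)
{
	req->unmap_saw_valid_buf = (req->buf != NULL);
	req->dma_mapped = 0;
}

/* Releasing the bounce buffer destroys req->buf. */
void fake_release_bounce_buf(struct fake_req *req)
{
	free(req->buf);
	req->buf = NULL;
}

/* Safe order: unmap first, release second. */
void fake_complete(struct fake_req *req)
{
	assert(req != NULL);
	fake_unmap_dma(req);
	fake_release_bounce_buf(req);
}
```

Swapping the two calls in `fake_complete()` would make the unmap step observe a buffer that has already been released, which is the pattern the commit message warns about.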
@quant67 quant67 referenced this pull request in MopTym/gojuon Oct 14, 2015
Merged

description: make description appropriate #2

@0day-ci 0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Oct 21, 2015
Tsutomu Itoh Btrfs: add a check of whether fs_info->fs_root is NULL in btrfs_async…
…_reclaim_metadata_space()

A kernel panic occurred due to a NULL pointer dereference in
can_overcommit(), because btrfs_async_reclaim_metadata_space() passed
a NULL pointer to btrfs_calc_reclaim_metadata_size().

============================================================
[ 3756.152833] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f0
[ 3756.152882] IP: [<ffffffffa01d9c21>] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.152936] PGD 0
[ 3756.152949] Oops: 0000 [#1] SMP
[ 3756.152969] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_broute bridge stp llc ebtable_nat
ebtables ip6table_mangle ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables
iptable_mangle iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack coretemp kvm_intel kvm crc32
_pclmul iTCO_wdt iTCO_vendor_support microcode ipmi_si lpc_ich mfd_core pcspkr acpi_power_meter ipmi_msghandler i2c_i801 i7core_edac shpchp edac_core
nfsd acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel btrfs xor raid6_pq usb_storage mgag200 drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops ttm drm igb ptp ata_generic pps_core pata_acpi crc32c_intel
[ 3756.153397]  dca megaraid_sas i2c_algo_bit ata_piix i2c_core
[ 3756.153433] CPU: 3 PID: 3004 Comm: kworker/u25:4 Tainted: G          I     4.3.0-rc6 #1
[ 3756.153469] Hardware name: FUJITSU-SV                       PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.09.2619.N1           12/13/2010
[ 3756.153537] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 3756.153571] task: ffff88023581a400 ti: ffff880234648000 task.ti: ffff880234648000
[ 3756.153604] RIP: 0010:[<ffffffffa01d9c21>]  [<ffffffffa01d9c21>] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.153655] RSP: 0018:ffff88023464bda8  EFLAGS: 00010282
[ 3756.153679] RAX: 0000000001000000 RBX: ffff880431f68c00 RCX: 0000000000000002
[ 3756.153711] RDX: 0000000000c00000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3756.153742] RBP: ffff88023464bde0 R08: 0000000000000101 R09: 000000000000000c
[ 3756.153773] R10: ffffffff81d10060 R11: ffffffff81d10050 R12: ffff880431f68c00
[ 3756.153804] R13: 0000000000000000 R14: ffff880035f67070 R15: 0000000000c00000
[ 3756.153836] FS:  0000000000000000(0000) GS:ffff880237cc0000(0000) knlGS:0000000000000000
[ 3756.153871] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3756.153897] CR2: 00000000000001f0 CR3: 0000000001c08000 CR4: 00000000000006e0
[ 3756.153929] Stack:
[ 3756.153940]  ffff880200000000 ffff880237cd2940 ffff880431f68c00 0000000000000000
[ 3756.153979]  0000000000c00000 ffff880035f67070 0000000000000000 ffff88023464be20
[ 3756.154016]  ffffffffa01e5404 ffff880431f68c80 ffff880234482240 ffff8802378a1800
[ 3756.154054] Call Trace:
[ 3756.154081]  [<ffffffffa01e5404>] btrfs_async_reclaim_metadata_space+0xb4/0x210 [btrfs]
[ 3756.154119]  [<ffffffff8109158e>] process_one_work+0x19e/0x3d0
[ 3756.154146]  [<ffffffff8109180e>] worker_thread+0x4e/0x450
[ 3756.154174]  [<ffffffff816914c9>] ? __schedule+0x2b9/0x930
[ 3756.154199]  [<ffffffff810917c0>] ? process_one_work+0x3d0/0x3d0
[ 3756.154227]  [<ffffffff810917c0>] ? process_one_work+0x3d0/0x3d0
[ 3756.154255]  [<ffffffff81096e59>] kthread+0xc9/0xe0
[ 3756.154279]  [<ffffffff81096d90>] ? kthread_worker_fn+0x160/0x160
[ 3756.154307]  [<ffffffff816956cf>] ret_from_fork+0x3f/0x70
[ 3756.154333]  [<ffffffff81096d90>] ? kthread_worker_fn+0x160/0x160
[ 3756.154361] Code: a5 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 31 f6 49 89 fd 49 89 d7 48 83 ec 10
<4c> 8b b7 f0 01 00 00 89 4d cc 49 3b 7e 30 40 0f 95 c6 48 8d 74
[ 3756.156802] RIP  [<ffffffffa01d9c21>] can_overcommit+0x21/0xf0 [btrfs]
[ 3756.157995]  RSP <ffff88023464bda8>
[ 3756.159162] CR2: 00000000000001f0
============================================================

fs_info->fs_root is referenced in btrfs_async_reclaim_metadata_space()
when mount kicks the kworker (btrfs_async_reclaim_metadata_space).

But at this time, fs_info->fs_root has not been initialized yet,
so a NULL pointer is passed to btrfs_calc_reclaim_metadata_size().

============================================================
PID: 3045   TASK: ffff8800bb06b000  CPU: 2   COMMAND: "mount"
    [exception RIP: queued_spin_lock_slowpath+350]
    RIP: ffffffff810be2de  RSP: ffff8800b9fdb738  RFLAGS: 00000202
    RAX: 0000000000000101  RBX: ffff880431f68c00  RCX: 0000000000000001
    RDX: 0000000000000101  RSI: 0000000000000001  RDI: ffff880431f68c00
    RBP: ffff8800b9fdb738   R8: 0000000000000101   R9: 0000000000000000
    R10: 0000000000004000  R11: 0000000000018e58  R12: 0000000000000001
    R13: ffff8800b9fdb7c0  R14: ffff8800bb06b000  R15: 0000000000000001
    CS: 0010  SS: 0018
 #0 [ffff8800b9fdb740] _raw_spin_lock at ffffffff81694ff0
 #1 [ffff8800b9fdb750] reserve_metadata_bytes at ffffffffa01e55cc [btrfs]
 #2 [ffff8800b9fdb800] btrfs_block_rsv_add at ffffffffa01e5a93 [btrfs]
 #3 [ffff8800b9fdb828] btrfs_truncate_inode_items at ffffffffa0202779 [btrfs]
 #4 [ffff8800b9fdb920] btrfs_evict_inode at ffffffffa02040ec [btrfs]
 #5 [ffff8800b9fdb990] evict at ffffffff811ed6ea
 #6 [ffff8800b9fdb9b8] iput at ffffffff811ed996
 #7 [ffff8800b9fdb9e8] btrfs_orphan_cleanup at ffffffffa0204c57 [btrfs]
 #8 [ffff8800b9fdba60] btrfs_recover_relocation at ffffffffa0247a8e [btrfs]
 #9 [ffff8800b9fdbaf0] open_ctree at ffffffffa01f576b [btrfs]
#10 [ffff8800b9fdbbc8] btrfs_mount at ffffffffa01cc6a9 [btrfs]
#11 [ffff8800b9fdbc90] mount_fs at ffffffff811d7ab8
#12 [ffff8800b9fdbcd8] vfs_kern_mount at ffffffff811f1be7
#13 [ffff8800b9fdbd10] btrfs_mount at ffffffffa01cbf57 [btrfs]
#14 [ffff8800b9fdbdd8] mount_fs at ffffffff811d7ab8
#15 [ffff8800b9fdbe20] vfs_kern_mount at ffffffff811f1be7
#16 [ffff8800b9fdbe58] do_mount at ffffffff811f3f8d
#17 [ffff8800b9fdbf08] sys_mount at ffffffff811f4e1c
#18 [ffff8800b9fdbf50] entry_SYSCALL_64_fastpath at ffffffff8169536e
    RIP: 00007f6250fc733a  RSP: 00007ffdcd40ba88  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f6251b2f42a  RCX: 00007f6250fc733a
    RDX: 0000564b58511070  RSI: 0000564b5850e290  RDI: 0000564b5850e270
    RBP: 0000564b5850e150   R8: 0000000000000000   R9: 0000000000000014
    R10: 00000000c0ed0000  R11: 0000000000000246  R12: 00007f6251d3f1dc
    R13: 00007ffdcd40bd88  R14: 0000000000000000  R15: 00000000ffffffff
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
============================================================

Therefore, add a check of whether fs_info->fs_root is NULL to
btrfs_async_reclaim_metadata_space().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
a5a4ec2
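
The kind of guard the commit message above describes can be sketched in plain C. The types and the function below (`struct fake_root`, `struct fake_fs_info`, `fake_async_reclaim()`) are hypothetical stand-ins, not the Btrfs code; the sketch only shows a worker bailing out early when the root it depends on has not been set up yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real Btrfs structures. */
struct fake_root {
	unsigned long reclaim_size;
};

struct fake_fs_info {
	struct fake_root *fs_root; /* NULL until mount has finished setting it up */
};

/* Returns the number of bytes to reclaim, or 0 if fs_root is not ready. */
unsigned long fake_async_reclaim(const struct fake_fs_info *fs_info)
{
	assert(fs_info != NULL);

	/* The worker may run before mount has initialized fs_root. */
	if (fs_info->fs_root == NULL)
		return 0;

	return fs_info->fs_root->reclaim_size;
}
```

Without the early return, the last line would dereference a NULL pointer whenever the worker runs ahead of initialization.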
@0day-ci 0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Oct 26, 2015
Jiri Olsa reporting stuck on s390 in dso__split_kallsyms_for_kcore
hi,
The buildid-list command gets stuck for me on s390.

It seems the kcore code gets stuck inserting
into the rbtree while iterating it.

I was able to fix it with the patch below, but I'm not sure it's the
correct fix because the kcore maps magic is beyond me so far ;-)

please check attached backtrace and patch

(gdb) r buildid-list -i perf.data --with-hits
Starting program: /root/linux/tools/perf/./perf buildid-list -i perf.data --with-hits
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Detaching after fork from child process 56723.
^C
Program received signal SIGINT, Interrupt.
rb_next (node=0x81573930, node@entry=0x81551840) at ../lib/rbtree.c:451
451                             node=node->rb_left;
Missing separate debuginfos, use: debuginfo-install audit-libs-2.4.1-5.el7.s390x bzip2-libs-1.0.6-13.el7.s390x elfutils-libelf-0.163-3.el7.s390x elfutils-libs-0.163-3.el7.s390x glibc-2.17-105.el7.s390x nss-softokn-freebl-3.16.2.3-13.el7_1.s390x perl-libs-5.16.3-286.el7.s390x python-libs-2.7.5-34.el7.s390x slang-2.2.4-11.el7.s390x xz-libs-5.1.2-12alpha.el7.s390x zlib-1.2.7-15.el7.s390x
(gdb) bt
#0  rb_next (node=0x81573930, node@entry=0x81551840) at ../lib/rbtree.c:451
#1  0x00000000800a7da0 in dso__split_kallsyms_for_kcore (dso=dso@entry=0x812e69c0, map=map@entry=0x812e6fb0, filter=filter@entry=0x0)
    at util/symbol.c:668
#2  0x00000000800a9ac6 in dso__load_kallsyms (dso=dso@entry=0x812e69c0, filename=filename@entry=0x81353030 "/proc/kallsyms",
    map=map@entry=0x812e6fb0, filter=filter@entry=0x0) at util/symbol.c:1289
#3  0x00000000800aa0c4 in dso__load_kernel_sym (dso=dso@entry=0x812e69c0, map=map@entry=0x812e6fb0, filter=filter@entry=0x0)
    at util/symbol.c:1783
#4  0x00000000800aa208 in dso__load (dso=0x812e69c0, map=map@entry=0x812e6fb0, filter=filter@entry=0x0) at util/symbol.c:1420
#5  0x00000000800bb2aa in map__load (map=0x812e6fb0, filter=0x0) at util/map.c:289
#6  0x0000000080087c9e in thread__find_addr_map (thread=<optimized out>, cpumode=cpumode@entry=1 '\001', type=type@entry=MAP__FUNCTION,
    addr=<optimized out>, al=al@entry=0x3ffffffe118) at util/event.c:969
#7  0x0000000080081908 in build_id__mark_dso_hit (tool=<optimized out>, event=0x3fffd5aa568, sample=0x3ffffffe410, evsel=<optimized out>,
    machine=<optimized out>) at util/build-id.c:41
#8  0x00000000800bdfec in perf_evlist__deliver_sample (evlist=evlist@entry=0x812e5d60,
    tool=tool@entry=0x8024b240 <build_id__mark_dso_hit_ops>, event=event@entry=0x3fffd5aa568, sample=sample@entry=0x3ffffffe410,
    evsel=evsel@entry=0x812e6770, machine=0x812e5ab8) at util/session.c:1039
#9  0x00000000800be156 in machines__deliver_event (machines=machines@entry=0x812e5ab8, evlist=0x812e5d60,
    event=event@entry=0x3fffd5aa568, sample=sample@entry=0x3ffffffe410, tool=tool@entry=0x8024b240 <build_id__mark_dso_hit_ops>,
    file_offset=136552) at util/session.c:1076
#10 0x00000000800be3f2 in perf_session__deliver_event (session=session@entry=0x812e59e0, event=event@entry=0x3fffd5aa568,
    sample=sample@entry=0x3ffffffe410, tool=tool@entry=0x8024b240 <build_id__mark_dso_hit_ops>, file_offset=<optimized out>)
    at util/session.c:1133
#11 0x00000000800c0146 in perf_session__process_event (session=session@entry=0x812e59e0, event=event@entry=0x3fffd5aa568,
    file_offset=file_offset@entry=136552) at util/session.c:1298
#12 0x00000000800c0746 in __perf_session__process_events (session=session@entry=0x812e59e0, data_offset=<optimized out>,
    data_size=<optimized out>, file_size=1164232, file_size@entry=1166128) at util/session.c:1633
#13 0x00000000800c0c2c in perf_session__process_events (session=session@entry=0x812e59e0) at util/session.c:1683
#14 0x000000008002ae5e in perf_session__list_build_ids (force=<optimized out>, with_hits=true) at builtin-buildid-list.c:82
#15 0x000000008002b078 in cmd_buildid_list (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>)
    at builtin-buildid-list.c:115
#16 0x000000008007c71a in run_builtin (p=p@entry=0x8024aab8 <commands+24>, argc=argc@entry=4, argv=0x3fffffff090) at perf.c:385
#17 0x000000008007c95e in handle_internal_command (argc=<optimized out>, argv=<optimized out>) at perf.c:445
#18 0x000000008007c9e4 in run_argv (argcp=argcp@entry=0x3ffffffedd4, argv=argv@entry=0x3ffffffedc8) at perf.c:489
#19 0x000000008007cca2 in main (argc=4, argv=0x3fffffff090) at perf.c:606

thanks,
jirka
aa5b596
@0day-ci 0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Oct 26, 2015
Adrian Hunter reporting stuck on s390 in dso__split_kallsyms_for_kcore
On 26/10/15 15:01, Jiri Olsa wrote:
> hi,
> The buildid-list command gets stuck for me on s390.
>
> It seems the kcore code gets stuck inserting
> into the rbtree while iterating it.
>
> I was able to fix it with the patch below, but I'm not sure it's the
> correct fix because the kcore maps magic is beyond me so far ;-)
>
> please check attached backtrace and patch

Your fix looks correct to me.

>
>
> (gdb) r buildid-list -i perf.data --with-hits

By the way, there are problems with buildid-list. I have attached the
patches but haven't had a chance to send them.  If you find you are
missing buildids, you might want to try them.

> Starting program: /root/linux/tools/perf/./perf buildid-list -i perf.data --with-hits
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Detaching after fork from child process 56723.
> ^C
> Program received signal SIGINT, Interrupt.
> rb_next (node=0x81573930, node@entry=0x81551840) at ../lib/rbtree.c:451
> 451                             node=node->rb_left;
> Missing separate debuginfos, use: debuginfo-install audit-libs-2.4.1-5.el7.s390x bzip2-libs-1.0.6-13.el7.s390x elfutils-libelf-0.163-3.el7.s390x elfutils-libs-0.163-3.el7.s390x glibc-2.17-105.el7.s390x nss-softokn-freebl-3.16.2.3-13.el7_1.s390x perl-libs-5.16.3-286.el7.s390x python-libs-2.7.5-34.el7.s390x slang-2.2.4-11.el7.s390x xz-libs-5.1.2-12alpha.el7.s390x zlib-1.2.7-15.el7.s390x
> (gdb) bt
> #0  rb_next (node=0x81573930, node@entry=0x81551840) at ../lib/rbtree.c:451
> #1  0x00000000800a7da0 in dso__split_kallsyms_for_kcore (dso=dso@entry=0x812e69c0, map=map@entry=0x812e6fb0, filter=filter@entry=0x0)
>     at util/symbol.c:668
> #2  0x00000000800a9ac6 in dso__load_kallsyms (dso=dso@entry=0x812e69c0, filename=filename@entry=0x81353030 "/proc/kallsyms",
>     map=map@entry=0x812e6fb0, filter=filter@entry=0x0) at util/symbol.c:1289
> #3  0x00000000800aa0c4 in dso__load_kernel_sym (dso=dso@entry=0x812e69c0, map=map@entry=0x812e6fb0, filter=filter@entry=0x0)
>     at util/symbol.c:1783
> #4  0x00000000800aa208 in dso__load (dso=0x812e69c0, map=map@entry=0x812e6fb0, filter=filter@entry=0x0) at util/symbol.c:1420
> #5  0x00000000800bb2aa in map__load (map=0x812e6fb0, filter=0x0) at util/map.c:289
> #6  0x0000000080087c9e in thread__find_addr_map (thread=<optimized out>, cpumode=cpumode@entry=1 '\001', type=type@entry=MAP__FUNCTION,
>     addr=<optimized out>, al=al@entry=0x3ffffffe118) at util/event.c:969
> #7  0x0000000080081908 in build_id__mark_dso_hit (tool=<optimized out>, event=0x3fffd5aa568, sample=0x3ffffffe410, evsel=<optimized out>,
>     machine=<optimized out>) at util/build-id.c:41
> #8  0x00000000800bdfec in perf_evlist__deliver_sample (evlist=evlist@entry=0x812e5d60,
>     tool=tool@entry=0x8024b240 <build_id__mark_dso_hit_ops>, event=event@entry=0x3fffd5aa568, sample=sample@entry=0x3ffffffe410,
>     evsel=evsel@entry=0x812e6770, machine=0x812e5ab8) at util/session.c:1039
> #9  0x00000000800be156 in machines__deliver_event (machines=machines@entry=0x812e5ab8, evlist=0x812e5d60,
>     event=event@entry=0x3fffd5aa568, sample=sample@entry=0x3ffffffe410, tool=tool@entry=0x8024b240 <build_id__mark_dso_hit_ops>,
>     file_offset=136552) at util/session.c:1076
> #10 0x00000000800be3f2 in perf_session__deliver_event (session=session@entry=0x812e59e0, event=event@entry=0x3fffd5aa568,
>     sample=sample@entry=0x3ffffffe410, tool=tool@entry=0x8024b240 <build_id__mark_dso_hit_ops>, file_offset=<optimized out>)
>     at util/session.c:1133
> #11 0x00000000800c0146 in perf_session__process_event (session=session@entry=0x812e59e0, event=event@entry=0x3fffd5aa568,
>     file_offset=file_offset@entry=136552) at util/session.c:1298
> #12 0x00000000800c0746 in __perf_session__process_events (session=session@entry=0x812e59e0, data_offset=<optimized out>,
>     data_size=<optimized out>, file_size=1164232, file_size@entry=1166128) at util/session.c:1633
> #13 0x00000000800c0c2c in perf_session__process_events (session=session@entry=0x812e59e0) at util/session.c:1683
> #14 0x000000008002ae5e in perf_session__list_build_ids (force=<optimized out>, with_hits=true) at builtin-buildid-list.c:82
> #15 0x000000008002b078 in cmd_buildid_list (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>)
>     at builtin-buildid-list.c:115
> #16 0x000000008007c71a in run_builtin (p=p@entry=0x8024aab8 <commands+24>, argc=argc@entry=4, argv=0x3fffffff090) at perf.c:385
> #17 0x000000008007c95e in handle_internal_command (argc=<optimized out>, argv=<optimized out>) at perf.c:445
> #18 0x000000008007c9e4 in run_argv (argcp=argcp@entry=0x3ffffffedd4, argv=argv@entry=0x3ffffffedc8) at perf.c:489
> #19 0x000000008007cca2 in main (argc=4, argv=0x3fffffff090) at perf.c:606
>
>
> thanks,
> jirka
>
>
> ---
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index e7bf0c4..b0d2fb2 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -680,7 +680,7 @@ static int dso__split_kallsyms_for_kcore(struct dso *dso, struct map *map,
>  			pos->start -= curr_map->start - curr_map->pgoff;
>  			if (pos->end)
>  				pos->end -= curr_map->start - curr_map->pgoff;
> -			if (curr_map != map) {
> +			if (curr_map->dso != map->dso) {
>  				rb_erase_init(&pos->rb_node, root);
>  				symbols__insert(
>  					&curr_map->dso->symbols[curr_map->type],
>

From 23033db810aa8416069e68b0389eac0a33b6bb3d Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter@intel.com>
Date: Sat, 24 Oct 2015 15:00:45 +0300
Subject: [PATCH 10/13] perf tools: Fix dso lookup by long name and missing
 buildids

Commit 4598a0a ("perf symbols: Improve DSO long names lookup speed with rbtree")
added a tree to look up dsos by long name.  That tree gets corrupted
whenever a dso long name is changed because the tree is not updated.

One effect of that is buildid-list does not work with the 'with-hits'
option because dso lookup fails and results in two structs for the same
dso.  The first has the buildid but no hits, the second has hits but no
buildid. e.g.

Before:
	$ tools/perf/perf record ls
	arch   certs    CREDITS  Documentation  firmware  include  ipc     Kconfig  lib          Makefile  net     REPORTING-BUGS  scripts   sound  usr
	block  COPYING  crypto   drivers        fs        init     Kbuild  kernel   MAINTAINERS  mm        README  samples         security  tools  virt
	[ perf record: Woken up 1 times to write data ]
	[ perf record: Captured and wrote 0.012 MB perf.data (11 samples) ]
	$ tools/perf/perf buildid-list
	574da826c66538a8d9060d393a8866289bd06005 [kernel.kallsyms]
	30c94dc66a1fe95180c3d68d2b89e576d5ae213c /lib/x86_64-linux-gnu/libc-2.19.so
	$ tools/perf/perf buildid-list -H
	574da826c66538a8d9060d393a8866289bd06005 [kernel.kallsyms]
	0000000000000000000000000000000000000000 /lib/x86_64-linux-gnu/libc-2.19.so
After:
	$ tools/perf/perf buildid-list -H
	574da826c66538a8d9060d393a8866289bd06005 [kernel.kallsyms]
	30c94dc66a1fe95180c3d68d2b89e576d5ae213c /lib/x86_64-linux-gnu/libc-2.19.so

The fix is to record the root of the tree on the dso so that
dso__set_long_name() can update the tree when the long name changes.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
5fe286e
@jhofstee jhofstee added a commit to victronenergy/linux that referenced this pull request Nov 10, 2015
@idjelic idjelic ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) …
…optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0, which
causes register/memory corruption.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
a982c99
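
The contract at stake in the commit message above is that the C standard's `memset` returns its first argument, so a compiler may reuse the return value instead of keeping its own copy of the pointer. A minimal C sketch of a fill routine that honors this contract follows; `sketch_memset()` is a hypothetical name, and this byte-wise loop is not the optimized ARM assembly from the patch. It mirrors Step 1 of the breakdown only in spirit: the routine advances a copy of the pointer and leaves the original untouched for the return.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise fill that keeps the memset contract: return the original dst. */
void *sketch_memset(void *dst, int c, size_t n)
{
	unsigned char *p = dst; /* advance a copy so dst stays intact */

	assert(dst != NULL || n == 0);
	while (n--)
		*p++ = (unsigned char)c;

	return dst; /* callers and compilers may rely on this value */
}
```

An implementation that returned the advanced pointer (or left an arbitrary value in the return register) would break any caller that uses the result as the buffer address, which is what the disassembly above shows GCC doing.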
@0day-ci 0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Nov 11, 2015
@masoncl masoncl xfs: give all workqueues rescuer threads
We're consistently hitting deadlocks here with XFS on recent kernels.
After some digging through the crash files, it looks like everyone in
the system is waiting for XFS to reclaim memory.

Something like this:

PID: 2733434  TASK: ffff8808cd242800  CPU: 19  COMMAND: "java"
 #0 [ffff880019c53588] __schedule at ffffffff818c4df2
 #1 [ffff880019c535d8] schedule at ffffffff818c5517
 #2 [ffff880019c535f8] _xfs_log_force_lsn at ffffffff81316348
 #3 [ffff880019c53688] xfs_log_force_lsn at ffffffff813164fb
 #4 [ffff880019c536b8] xfs_iunpin_wait at ffffffff8130835e
 #5 [ffff880019c53728] xfs_reclaim_inode at ffffffff812fd453
 #6 [ffff880019c53778] xfs_reclaim_inodes_ag at ffffffff812fd8c7
 #7 [ffff880019c53928] xfs_reclaim_inodes_nr at ffffffff812fe433
 #8 [ffff880019c53958] xfs_fs_free_cached_objects at ffffffff8130d3b9
 #9 [ffff880019c53968] super_cache_scan at ffffffff811a6f73
#10 [ffff880019c539c8] shrink_slab at ffffffff811460e6
#11 [ffff880019c53aa8] shrink_zone at ffffffff8114a53f
#12 [ffff880019c53b48] do_try_to_free_pages at ffffffff8114a8ba
#13 [ffff880019c53be8] try_to_free_pages at ffffffff8114ad5a
#14 [ffff880019c53c78] __alloc_pages_nodemask at ffffffff8113e1b8
#15 [ffff880019c53d88] alloc_kmem_pages_node at ffffffff8113e671
#16 [ffff880019c53dd8] copy_process at ffffffff8104f781
#17 [ffff880019c53ec8] do_fork at ffffffff8105129c
#18 [ffff880019c53f38] sys_clone at ffffffff810515b6
#19 [ffff880019c53f48] stub_clone at ffffffff818c8e4d

xfs_log_force_lsn is waiting for logs to get cleaned, which is waiting
for IO, which is waiting for workers to complete the IO which is waiting
for worker threads that don't exist yet:

PID: 2752451  TASK: ffff880bd6bdda00  CPU: 37  COMMAND: "kworker/37:1"
 #0 [ffff8808d20abbb0] __schedule at ffffffff818c4df2
 #1 [ffff8808d20abc00] schedule at ffffffff818c5517
 #2 [ffff8808d20abc20] schedule_timeout at ffffffff818c7c6c
 #3 [ffff8808d20abcc0] wait_for_completion_killable at ffffffff818c6495
 #4 [ffff8808d20abd30] kthread_create_on_node at ffffffff8106ec82
 #5 [ffff8808d20abdf0] create_worker at ffffffff8106752f
 #6 [ffff8808d20abe40] worker_thread at ffffffff810699be
 #7 [ffff8808d20abec0] kthread at ffffffff8106ef59
 #8 [ffff8808d20abf50] ret_from_fork at ffffffff818c8ac8

I think we should be using WQ_MEM_RECLAIM to make sure this thread
pool makes progress when we're not able to allocate new workers.

[dchinner: make all workqueues WQ_MEM_RECLAIM]

Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7a29ac4
@joshix joshix referenced this pull request in coreos/docs Dec 28, 2015
Merged

README: change pull request description #708

@Ismael-VC

> And when those people with lower standards try to get their commits included in the kernel, I will ridicule them and point out how broken their commit messages or pull requests are.

Wow, really!? I respect @torvalds' work, but I can't see how anybody could respect him. I guess that as long as his arrogance and his genius are not mutually exclusive, lots of people don't care at all and even idolize him for that. :worried:

Even you, Torvalds, were a n00b at some point, and your rudeness and willingness to publicly ridicule other newbies is totally unnecessary. If you are taking the trouble to answer, why not just point out to them how broken their commit messages or pull requests are and link them to your standards guide (which I agree is a good standard), or else just ignore them?

@pixelrebel

> ...his arrogance...lots of people don't care at all and even idolize him for that.

I, for one, would be honored if Linus took the time to call me a moron.

> why not just point them how broken their commit messages or pull requests are and link them to your standards guide...if you are taking the trouble to answer

Then golden threads like this one wouldn't exist. :)

@Ismael-VC

> I, for one, would be honored if Linus took the time to call me a moron.

Good for you, but the point is that not everyone would be willing to take insults with pride, as you and some others might.

Insulting over the internet is super easy, and insulting back is also super easy. I wonder if he would still be at least that rude with each and every one of those people with lower standards who try to get their commits included in the kernel in person ...at fist range?

So much for a benevolent dictator for life... if such a thing even exists.

@mucamaca
mucamaca commented Feb 1, 2016

I am pretty sure that you wouldn't tell him all this in person..

@leonklingele

Can you please stop spamming my inbox with this useless junk? Thank you

@Ismael-VC

Sure, why not? He is just a man like you and me. Are you implying that you would also be OK if he insulted you? And it doesn't mean that I would be disrespecting him. Not everyone likes to be disrespectful, and in that case I would also first ask him whether he is still like that and why (you know ...talk).

I'm just wondering, because I cannot truly believe that each and every one he insults just lets him be, either because they idolize him or fear him?

I would expect at least one of them to fight back, that's all. I'm done with my comments here; I think I've made my opinion pretty clear.

@alias-mac

@Ismael-VC, I think you are seeing things from the wrong perspective.
It seems to me that you are assuming the wrong things; if you read carefully above, Linus explained that in the first comment. He got upset when other people jumped into the thread and said "make an exception", and I must agree that it makes no sense to make an exception just because it is 3 lines of code...

There are standards that the Linux project follows, and all of that is already well documented, so statements/comments like those are just a waste of Linus's and other people's time. If the maintainers of the project (and the community) have already spent their own time writing those standards and sharing them with the community (giving them away for free), I strongly believe that all we (including n00bs) should do is: RTFM!
I often see n00bs on forums asking things that could clearly be found by researching a little bit more. It is a "my time vs. your time" problem.

It seems that you made the same mistake and judged the entire thread.
I would probably have burst too and had the same reaction on reading this comment:
#17 (comment)

Don't you think that was being disrespectful? I do :smile:

@leonklingele, please unsubscribe from the thread (it is easier to press that button than it is to write a comment here).

@Bengt Bengt referenced this pull request in jayphelps/git-blame-someone-else Feb 9, 2016
@torvalds Pretend to be Linus e5cfe4b
@nemobis
nemobis commented Feb 22, 2016

I'm surprised there isn't a link yet, so here it goes: https://mako.cc/writing/hill-free_tools.html

@torvalds torvalds pushed a commit that referenced this pull request Feb 26, 2016
Mark Rutland KVM: arm/arm64: vgic: Ensure bitmaps are long enough
When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb57 ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
236cf17
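
To make the sizing problem in the commit message above concrete, here is a minimal userspace sketch in C. It is not the kernel patch: `bitmap_bytes_naive`, `bitmap_bytes_rounded` and `WORD_BITS` are hypothetical names chosen for illustration, and the only assumption taken from the commit text is that bitmap helpers read memory one `unsigned long` at a time. The kernel's own `BITS_TO_LONGS` macro expresses the same round-up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one bitmap word; bitmap helpers touch memory in units of this size. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Sizing as described in the commit message: bits / 8, with no word rounding. */
static size_t bitmap_bytes_naive(size_t nbits)
{
    return nbits / 8;
}

/* Rounded sizing: always a whole number of unsigned longs. */
static size_t bitmap_bytes_rounded(size_t nbits)
{
    size_t words = (nbits + WORD_BITS - 1) / WORD_BITS;

    /* If the round-up addition wrapped around, words would be too small. */
    assert(words >= nbits / WORD_BITS);
    return words * sizeof(unsigned long);
}
```

For a 32-bit bitmap the naive helper yields 4 bytes, so on a machine where `unsigned long` is 8 bytes a single word-sized load already runs past the allocation, whereas the rounded helper always returns a multiple of `sizeof(unsigned long)`.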
@mdjurfeldt mdjurfeldt added a commit to mdjurfeldt/linux that referenced this pull request Feb 28, 2016
Mark Rutland KVM: arm/arm64: vgic: Ensure bitmaps are long enough
When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb57 ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
fe1d5e3
@Noltari Noltari pushed a commit to Noltari/linux that referenced this pull request Mar 4, 2016
Mark Rutland KVM: arm/arm64: vgic: Ensure bitmaps are long enough
commit 236cf17 upstream.

When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb57 ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d62cca1
@fedux fedux pushed a commit to fedux/linux that referenced this pull request Mar 19, 2016
Mark Rutland KVM: arm/arm64: vgic: Ensure bitmaps are long enough
[ Upstream commit 236cf17 ]

When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb57 ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
e720b3b
@Noltari Noltari pushed a commit to Noltari/linux that referenced this pull request Mar 20, 2016
Mark Rutland KVM: arm/arm64: vgic: Ensure bitmaps are long enough
[ Upstream commit 236cf17 ]

When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[<ffffffc00008e770>] dump_backtrace+0x0/0x280
[<ffffffc00008ea04>] show_stack+0x14/0x20
[<ffffffc000726360>] dump_stack+0x100/0x188
[<ffffffc00030d324>] print_trailer+0xfc/0x168
[<ffffffc000312294>] object_err+0x3c/0x50
[<ffffffc0003140fc>] kasan_report_error+0x244/0x558
[<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
[<ffffffc000745688>] __bitmap_or+0xc0/0xc8
[<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
[<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
[<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
[<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
[<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb57 ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
b29de09
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
a3a32bd
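
The flag-based fix described in the commit message above can be modelled in a few lines of plain C. This is only a sketch under stated assumptions, not the kernel code: `struct task`, `TASK_FSTRANS`, `may_block_on_commit` and `with_fstrans` are hypothetical stand-ins for the task structure, `PF_FSTRANS`, the check in the release-page path, and the code that wraps the socket allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_FSTRANS task flag. */
#define TASK_FSTRANS 0x1u

struct task {
    unsigned int flags;
};

/*
 * Models the decision in a releasepage-style callback: a blocking commit
 * is only allowed when the task is not already doing filesystem work.
 */
static bool may_block_on_commit(const struct task *t)
{
    return (t->flags & TASK_FSTRANS) == 0;
}

/*
 * Runs fn with the flag set and restores the previous flag state afterwards,
 * the way an allocation on the RPC path would be bracketed.
 */
static bool with_fstrans(struct task *t, bool (*fn)(const struct task *))
{
    assert(t && fn);

    unsigned int saved = t->flags & TASK_FSTRANS;

    t->flags |= TASK_FSTRANS;
    bool result = fn(t);
    t->flags = (t->flags & ~TASK_FSTRANS) | saved;
    return result;
}
```

Calling `with_fstrans(&t, may_block_on_commit)` returns false, which is the point of the fix: memory reclaim triggered from inside the bracketed region sees the flag and skips the blocking commit instead of waiting on a socket that does not exist yet.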
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
f10731a
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Harshula Jayasuriya nfsd: nfsd_open: when dentry_open returns an error do not propagate a…
…s struct file

commit e4daf1f upstream.

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[xr: Backported to 3.4: adjust context]
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
e7bac3a
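
The failure mode in the commit message above comes from an error code encoded as a pointer (`0xffffffffffffffe6` is -26, i.e. -ETXTBSY) being stored in a slot that is later only compared against NULL. A minimal userspace sketch of the safer pattern follows. The helpers `err_ptr`, `is_err` and `ptr_err` imitate the kernel's `ERR_PTR`, `IS_ERR` and `PTR_ERR`; `fake_dentry_open` and `open_into_slot` are hypothetical names, and this is an illustration of the idea rather than the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static bool is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct file {
    int unused;
};

/* Hypothetical opener: fails with -ETXTBSY when the file is busy. */
static struct file *fake_dentry_open(struct file *backing, bool busy)
{
    return busy ? err_ptr(-ETXTBSY) : backing;
}

/*
 * Stores into *slot only on success, so a later "slot != NULL" check
 * can never mistake an encoded error for a usable struct file.
 */
static int open_into_slot(struct file **slot, struct file *backing, bool busy)
{
    struct file *f = fake_dentry_open(backing, busy);

    if (is_err(f))
        return (int)ptr_err(f); /* *slot is left untouched */
    *slot = f;
    return 0;
}
```

With this shape, a failed open leaves the slot NULL, so the second pass through the caller retries the open instead of bumping an access counter for a file that was never opened.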
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check
(for case 2) before we go to wait on the writeback.  The page fault
path, which is the only path that triggers the memcg OOM killer since
3.12, shouldn't require GFP_NOFS, so we shouldn't reintroduce the
premature OOM killer issue which was originally addressed by the
heuristic.
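
The change in the wait condition can be illustrated with a small userspace sketch. This is not the kernel code: the flag values, the struct, and every `sketch_` name are assumptions made up for illustration, and only the memcg reclaim case is modelled. It shows why a GFP_NOFS caller used to wait on a writeback page (it carries the IO bit) and no longer does once the decision depends on may_enter_fs.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real ones live in the kernel headers. */
#define SKETCH_GFP_IO     0x1u
#define SKETCH_GFP_FS     0x2u
#define SKETCH_GFP_NOFS   (SKETCH_GFP_IO)
#define SKETCH_GFP_KERNEL (SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Minimal stand-in for the page state the heuristic looks at. */
struct sketch_page {
    bool writeback;  /* models PG_writeback */
    bool reclaim;    /* models PG_reclaim */
    bool swapcache;  /* models a swap cache page */
};

enum sketch_action { SKETCH_SKIP, SKETCH_WAIT };

/* Reclaim may call into the fs if __GFP_FS is set, or for swap cache
 * pages if __GFP_IO is set. */
static bool sketch_may_enter_fs(unsigned gfp_mask, const struct sketch_page *p)
{
    return (gfp_mask & SKETCH_GFP_FS) ||
           (p->swapcache && (gfp_mask & SKETCH_GFP_IO));
}

/* Before the fix: wait on writeback whenever the IO bit is set. */
static enum sketch_action sketch_old_policy(unsigned gfp_mask,
                                            const struct sketch_page *p)
{
    if (!p->writeback || !p->reclaim || !(gfp_mask & SKETCH_GFP_IO))
        return SKETCH_SKIP;
    return SKETCH_WAIT;
}

/* After the fix: wait only if reclaim is allowed to enter the fs. */
static enum sketch_action sketch_new_policy(unsigned gfp_mask,
                                            const struct sketch_page *p)
{
    if (!p->writeback || !p->reclaim || !sketch_may_enter_fs(gfp_mask, p))
        return SKETCH_SKIP;
    return SKETCH_WAIT;
}
```

With a file page under writeback and a GFP_NOFS mask, the old policy returns SKETCH_WAIT (the deadlock-prone path in the backtrace), while the new policy returns SKETCH_SKIP; a GFP_KERNEL caller still waits under both.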

As per David Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
9b9aea2
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check
(for case 2) before we go to wait on the writeback.  The page fault
path, which is the only path that triggers the memcg OOM killer since
3.12, shouldn't require GFP_NOFS, so we shouldn't reintroduce the
premature OOM killer issue which was originally addressed by the
heuristic.

As per David Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
80a1788
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check
(for case 2) before we go to wait on the writeback.  The page fault
path, which is the only path that triggers the memcg OOM killer since
3.12, shouldn't require GFP_NOFS, so we shouldn't reintroduce the
premature OOM killer issue which was originally addressed by the
heuristic.

As per David Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
9a2f0d2
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Harshula Jayasuriya nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file

commit e4daf1f upstream.

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called, it finds
fi_access[O_WRONLY] to be non-zero, decrements it, and calls
nfs4_file_put_fd(), which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------
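
The three steps above can be replayed in a small userspace sketch. It is not the nfsd code: every `sketch_` name is hypothetical, the error-pointer helpers only imitate the kernel's ERR_PTR convention, and the open call is hard-wired to fail with -ETXTBSY. The "fixed" variant assumes the remedy is to leave the slot NULL when the open fails, which is what the commit subject suggests.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_file { int unused; };

/* Userspace imitations of the kernel's error-pointer convention. */
static struct sketch_file *sketch_err_ptr(long err)
{
    return (struct sketch_file *)(intptr_t)err;
}

static int sketch_is_err(const struct sketch_file *p)
{
    return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Stand-in for dentry_open(): always fails with -ETXTBSY here. */
static struct sketch_file *sketch_dentry_open(void)
{
    return sketch_err_ptr(-ETXTBSY);
}

/* Buggy open: reports the error but leaves the error value in *filp. */
static int sketch_open_buggy(struct sketch_file **filp)
{
    *filp = sketch_dentry_open();
    if (sketch_is_err(*filp))
        return -ETXTBSY;
    return 0;
}

/* Fixed open: never hands an error value back as a struct file. */
static int sketch_open_fixed(struct sketch_file **filp)
{
    *filp = sketch_dentry_open();
    if (sketch_is_err(*filp)) {
        *filp = NULL;
        return -ETXTBSY;
    }
    return 0;
}

/* One fd slot and one access counter, modelling the O_WRONLY entries. */
struct sketch_nfs4_file {
    struct sketch_file *fd;
    int access;
};

/* Caller logic from steps 1 and 2: open only when the slot is NULL,
 * and count the access only when no error was returned. */
static int sketch_get_vfs_file(struct sketch_nfs4_file *fp,
                               int (*open_fn)(struct sketch_file **))
{
    if (fp->fd == NULL) {
        int err = open_fn(&fp->fd);
        if (err)
            return err;
    }
    fp->access++;
    return 0;
}
```

Calling sketch_get_vfs_file() twice with sketch_open_buggy fails the first time and then "succeeds" the second time with the counter at 1 and the slot still holding the error value, which is the landmine from step 2. With sketch_open_fixed both calls fail and the counter stays at 0.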

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
f63540d
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check
(for case 2) before we go to wait on the writeback.  The page fault
path, which is the only path that triggers the memcg OOM killer since
3.12, shouldn't require GFP_NOFS, so we shouldn't reintroduce the
premature OOM killer issue which was originally addressed by the
heuristic.

As per David Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
dd4d4b5
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.
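
The flag-based guard described above might look roughly like the following userspace sketch. The flag value, the task struct, and all `sketch_` names are assumptions for illustration only; the real change touches the sunrpc socket setup, rpc_async_schedule, and nfs_release_page.

```c
#include <stdbool.h>

/* Hypothetical value; stands in for the kernel's PF_FSTRANS task flag. */
#define SKETCH_PF_FSTRANS 0x1u

struct sketch_task {
    unsigned flags;
};

/* Models the decision in the release-page path: a blocking commit is
 * attempted only if the allocation may block and the task has not
 * marked itself as allocating on behalf of the filesystem. */
static bool sketch_may_commit(const struct sketch_task *t, bool gfp_may_block)
{
    if (t->flags & SKETCH_PF_FSTRANS)
        return false;
    return gfp_may_block;
}

/* Models the socket setup: set the flag around the allocation so that
 * any reclaim triggered by it skips the commit. Returns whether a
 * commit would have been attempted during the allocation. */
static bool sketch_socket_setup_may_commit(struct sketch_task *t)
{
    unsigned saved = t->flags;
    bool would_commit;

    t->flags |= SKETCH_PF_FSTRANS;
    would_commit = sketch_may_commit(t, true);
    t->flags = saved;  /* restore the caller's flags afterwards */
    return would_commit;
}
```

An ordinary task with no flag set still gets the commit, while the socket setup path reports that no commit would be attempted, so it cannot end up waiting on the connection it is trying to create.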

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
5fd842a
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
[ Upstream commit ecf5fc6 ]

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check
(for case 2) before we go to wait on the writeback.  The page fault
path, which is the only path that triggers the memcg OOM killer since
3.12, shouldn't require GFP_NOFS, so we shouldn't reintroduce the
premature OOM killer issue which was originally addressed by the
heuristic.

As per David Chinner, xfs has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
02b550b
@sashalevin sashalevin added a commit that referenced this pull request Apr 11, 2016
Harshula Jayasuriya nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file

commit e4daf1f upstream.

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called, it finds
fi_access[O_WRONLY] to be non-zero, decrements it, and calls
nfs4_file_put_fd(), which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
e327508
@sashalevin sashalevin added a commit that referenced this pull request Apr 12, 2016
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670