Revive KQueue Reactor #1918
Comments
I don't like platform.usingKQueue. Also, I'd like to see all tests pass on FreeBSD, or else an explanation as to why they don't. |
python-hpio.net has been down for a while now; can you provide the latest tarball? The best option would probably be to ship it with Twisted, if the license is compatible. |
The latest source is here: As far as I know, the thing on python-hpio was unchanged from this. License is BSD. |
Also, there is no way that kqueue will work with PTYs (or any "device" kind of fd) on osx < 10.5 (which hasn't been released yet). This is a limitation in the OS. Poll doesn't work for device fds either, btw. Just to calibrate expectations here... |
This patch no longer applies cleanly to trunk. |
I have started some things in the kqreactor-1918 branch. |
(In [23105]) Branching to 'kqreactor-1918-2' |
While trying to port dialtone's inotify support to Mac, it seems I need to use kqueue(). The kqreactor in Twisted uses kqsyscall. The Python module for it seems only to be maintained with Python 1.5. Also, it does not compile on my Leopard machine, even with the patch provided by Itamar at http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.kqreactor.html. Meanwhile, Python 2.6 is bringing kqueue() support in its select module. A backport of it for 2.5 is available at: http://pypi.python.org/pypi/select26. Using that module and patching kqreactor.py with the attached diff, I was able to run: kqreactor.install(). I would really appreciate it if somebody could look into the patch and let me know whether this is the right way to proceed. I need folderwatcher support for Mac at my work, so I am willing to spend time to get this thing working; any thoughts or comments are welcome. |
One thing to note: following the example at http://www.bpaste.net/show/25/, the poller only returns one event, and I am not sure where I am going wrong. It looks like it might be related to the bug at http://bugs.python.org/issue5910, but I tried to apply that patch to the current select26 without success. I will try to contact the original author, cheimes, to see if he can help me out. |
Hi. Here is my first attempt to use the Python 2.6 select module and Christian Heimes' backport to update the kqueue reactor. Initially, I had some issues with test cases related to timerservice, but exarkun pointed out on IRC that the timeout should be in seconds rather than milliseconds. I am still seeing errors when I run the test cases, though; they seem to be grouped primarily in processes and pty/tty, which matches dialtone's and jknight's observations. I am attaching a patch file which applies to trunk at revision r28802. Let me know what else I can do. |
Are you testing on OSX? What version? I guess all the tests which use devices should be marked as known failures on OSX/kqueuereactor, since apple seems in no hurry to fix it... Is the patch for select26 in your patch only required for the select26 backport? That is: does the reactor work properly against released python 2.6's select module? |
Wow. That was quick. Yes. I am testing on Mac OS X v10.5 now. I have a friend who has 10.6, so I can ask him to test it on Snow Leopard also. I guess I can install FreeBSD on my VirtualBox and run the test cases. In addition, I am also attaching the error log of the complete test run. The following tests seem to be taking longer (when using kqueue rather than the default select) but eventually they succeeded. test_outputWithErrorIgnored ... |
Replying to jknight:
This works on both. It first tries to import select26 module and if it fails tries to import the select module. |
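The import order described above can be sketched roughly as follows. This is an illustration rather than the code from the attached patch; select26 is the PyPI backport mentioned earlier in the thread, and the HAS_KQUEUE name is made up for this example.

```python
# Prefer the select26 backport (Python 2.5) and fall back to the
# standard library select module (Python 2.6 and later).
try:
    import select26 as select
except ImportError:
    import select

# kqueue is only exposed on BSD-family platforms, so check for it
# explicitly instead of assuming the import alone is enough.
HAS_KQUEUE = hasattr(select, "kqueue")
```

A reactor module could refuse to install itself when HAS_KQUEUE is false instead of failing later with an AttributeError.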
"This works on both" -- well, it doesn't work on an unpatched select26 -- it requires a patched copy to work properly. So I was wondering if it worked on an out-of-the-box python 2.6. Anyways, I looked at the source for selectmodule.c in python 2.6: it looks like only Python 2.6.5 or later will work, because that's the first version to include the fix for: On earlier versions of Python 2.6, I suppose users will need to install the patched select26 module also. kqueuereactor should probably check that the version of python it's running against is acceptable before letting you use it. As for the failing tests, the test_process and test_stdio failures are just because the kqueue syscall in OSX is broken for ptys: those failures are expected. But do retest on FreeBSD to make sure it works there. I don't think the half-close tests in test_tcp should be failing, however. That /probably/ indicates an actual bug in kqueuereactor. Also, as a note on the process here: if you think there are still things you need to work on, removing the review keyword yourself is an entirely appropriate thing to do. |
My bad about the interpretation of your question. Yes, you need to apply the patch for select26 module. Is there any standard way to check versions when you code in Twisted? I can put those checks. I guess the next step is to look into the non test_process/test_stdio test cases to make sure there are no errors in kqreactor. I will also try to install FreeBSD on my VirtualBox to run the testcases. |
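One plain-Python way to express the version requirement discussed above is a tuple comparison against sys.version_info. This is only a sketch under the assumption that 2.6.5 is the first fixed release, as stated in the previous comment; the names MINIMUM_KQUEUE_PYTHON and python_has_fixed_kqueue are hypothetical and not part of Twisted.

```python
import sys

# Assumption taken from the discussion: select.kqueue is only
# reliable from Python 2.6.5 onwards.
MINIMUM_KQUEUE_PYTHON = (2, 6, 5)


def python_has_fixed_kqueue(version_info=None):
    """Return True if the given (or running) Python is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element-wise.
    return tuple(version_info[:3]) >= MINIMUM_KQUEUE_PYTHON
```

Users on an older 2.6 release would still need the patched select26 module, so a real check would have to account for that case as well.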
Alright. I am kind of stuck now. An extra eye would be much appreciated. http://www.bpaste.net/show/5236/ - I have pasted the output of a trial run with -e and the resulting test.log. Basically, in the test cases testCloseWriteCloser and testWriteCloseNotification, the readConnectionLost method is never called. |
Replying to psykidellic:
In the future, could you attach logs like this to tickets as attachments? Pastebins tend to expire things pretty quickly, so by the time someone goes to diagnose the problem, the paste may be gone.
You mentioned on IRC that this was run under Leopard. Any chance you could test it on a more recent MacOS, or a recent FreeBSD, and see if the results differ? Thanks! |
We have a Snow Leopard slave and a FreeBSD slave. Someone should put the code in a branch and run it on our buildbot. |
So I tried select.kqueue on python on my mac running OSX 10.6.2. It still doesn't support PTYs (or /dev/null), and furthermore, it crashed my mac with a kernel panic when I exited the python process. I had run something like this at an interactive python prompt:
and it returned a KQ_EV_ERROR with EINVAL error code, as it always has...and kernel panic'd on control-D. I didn't just run the above, though, it was some interactive fiddling, so I don't know exactly what set of things caused the panic. I haven't tried (and am not going to try) to reproduce it. It's nearly certainly kqueue's fault, though. Anyhow, given that behavior, it doesn't seem like we should try to support kqueue reactor on OSX (nor even attempt to run it on the buildslave). It's clear Apple doesn't really test kqueue at all, nor care if it works (it's been broken since it was added, in 10.3 or so). They're probably too busy making sure that people can't run software on the iPhone to get around to fixing kernel panics in OSX. And BTW, poll on OSX is broken just as badly as kqueue...but apparently python on OSX is built without poll support, so you can't even try that. :) Reviving kqueuereactor for freebsd to use is still a good idea, though. |
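The snippet from that session is not preserved here. A minimal sketch of that kind of check, assuming Python's select.kqueue API, could look like the following; the function name and return convention are invented for illustration, and this is not a reconstruction of what was actually typed.

```python
import os
import select


def kqueue_read_registration_error(path):
    """Try to register `path` for read events with kqueue.

    Returns None when select.kqueue is unavailable on this platform,
    0 when registration succeeded, or the errno reported otherwise.
    """
    if not hasattr(select, "kqueue"):
        return None
    fd = os.open(path, os.O_RDONLY)
    kq = select.kqueue()
    try:
        change = select.kevent(fd, filter=select.KQ_FILTER_READ,
                               flags=select.KQ_EV_ADD)
        try:
            # Ask for up to one event with a zero timeout; a failed
            # registration comes back as an event flagged KQ_EV_ERROR.
            events = kq.control([change], 1, 0)
        except OSError as e:
            return e.errno
        for event in events:
            if event.flags & select.KQ_EV_ERROR:
                return event.data  # errno value, e.g. errno.EINVAL
        return 0
    finally:
        kq.close()
        os.close(fd)
```

Calling kqueue_read_registration_error(os.devnull) on an affected system would be expected to return errno.EINVAL, matching the behaviour described above.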
Replying to jknight:
Unfortunately I can't reproduce this particular panic :-. Maybe this would be helpful, if you experience it again? How to log a kernel panic.
If you can reproduce this kernel panic, report it at bugreport.apple.com. I assure you someone will be interested :). Calendar Server uses kqueue, albeit not with /dev/null or PTYs, so I'm personally motivated to help out with this where I can. |
Wow. So many replies. I thought I would be notified if I was on the CC list. Anyway, we had a mid-release coming up, so I had to divert resources to office work and could not reply here. Status update: I was able to sneak in a couple of hours over the weekdays to do a couple of test runs. Due to my oversight, I completely missed that branch 1918-2 already had the patch from dialtone. After a correction from exarkun, I ran the tests again using Python 2.5 and the select26 backport on my laptop. Then I went ahead and ported the changes to trunk, but unfortunately some extra test cases (related to processes) fail there, even though the same test cases don't fail in the 1918-2 branch. I am not sure what the reason might be. If somebody with more knowledge can help me out, it would be much appreciated. I see the same comment from dreid, so I am not sure what changes broke those tests. As for FreeBSD, I had VirtualBox, and after some fiddling around and Googling, it seems VirtualBox does not really support FreeBSD out of the box. Thankfully, though, I was able to get one of the extra licenses for VMWare Fusion from the office. According to the docs at http://www.freebsd.org/doc/en/books/handbook/virtualization-guest.html, VMWare Fusion should support FreeBSD. I am installing it as I write this message. Regarding the kernel panic, I was unable to reproduce the error either. I had a kernel panic only once, when I was trying to run the test suite over SSH, but that was probably a bad mixup of code/branches, so I am not even sure of the steps to reproduce it. I will upload my test report soon, as it is on my laptop. |
(In [29762]) Branching to 'kqreactor-1918-3' |
One trick for this might be to just make spawnProcess raise a bogus exception when someone tries to use a pty on os x. This should make all such tests fail quickly, and then we can decide how to handle the resulting list of tests (hopefully they'll appear in a few clusters so they don't all need to be handled individually).
OS X is the only BSD-like platform we have a slave for. We have an offer of some hosting resources for a FreeBSD slave, but someone with FreeBSD experience needs to set it up and perhaps offer a minimal level of maintenance (making sure it stays running, fixing any platform-specific issues that might arise that interfere with the slave itself (although helping out with platform-specific issues in the test suite is also very helpful :), etc). Am I right in guessing that if you're not too interested in OS X, you are interested in FreeBSD instead? An alternative to figuring out all the OS X/PTY test failures would be to only set up the FreeBSD slave and only claim kqueue is supported on FreeBSD. Someone else with more interest in OS X can deal with getting a clean test suite run on OS X later. |
Replying to oberstet:
I care about OS X, but I don't care a whole lot about OSX+kqueue+spawnProcess; there are enough other ways to get code which requires some subset of those features to work. Ideally we could just add a slave to our existing builder but have some conditional test skips for PTY-based tests. |
(oops. I meant OSX+kqueue+spawnProcess+PTYs. Actually i do care about OSX+kqueue+spawnProcess ;-)) |
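The conditional skips mentioned above could be expressed along these lines. This sketch uses the standard library's unittest.skipIf rather than trial's own skip mechanism, and the REACTOR_NAME constant and pty_unsupported helper are hypothetical names chosen for the example.

```python
import os
import sys
import unittest


def pty_unsupported(platform_name, reactor_name):
    """True for the combination reported as broken: kqueue on OS X."""
    return platform_name == "darwin" and "kqueue" in reactor_name.lower()


# Hypothetical: a real suite would ask the installed reactor for its name.
REACTOR_NAME = "KQueueReactor"


class PTYTests(unittest.TestCase):
    @unittest.skipIf(pty_unsupported(sys.platform, REACTOR_NAME),
                     "kqueue cannot watch PTYs on OS X")
    def test_slaveIsATTY(self):
        import pty  # POSIX only, so imported lazily
        master, slave = pty.openpty()
        try:
            self.assertTrue(os.isatty(slave))
        finally:
            os.close(master)
            os.close(slave)
```

Keeping the condition in one helper means the list of affected tests can be found later by searching for a single name.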
Replying to glyph:
Do I get this right? :
As said, I would help with the FreeBSD slave. I don't wanna sound silly/frustrated, but my motivation of doing 3. approaches zero ... P.S.: any docs for builder slaves? |
Replying to oberstet:
There is an OS X slave, but it isn't testing kqueue reactor.
Since OS X mostly supports kqueue, yes - ideally, at some point, a kqueue-based reactor would be available on both OS X and FreeBSD (who knows, perhaps even on some other BSDs as well :).
Yes. As long as tests are failing (consistently), we don't consider a platform supported. Relatedly, we don't introduce changes that make tests fail (consistently) on a supported platform. We do testing on unsupported platforms, but they get less attention.
OS X issues can absolutely be addressed in another ticket. Just adding kqueue support for FreeBSD is a major feature addition, and is worthwhile independent of any OS X issues.
I understand. Thanks very much for sticking with it for this long. :)
Yep - http://twistedmatrix.com/trac/wiki/ContinuousIntegration I'm trying to dig up credentials for an existing FreeBSD machine we should have access to. If I succeed, and you're interested in setting up/maintaining a slave on it, I can email you the details. |
Replying to exarkun:
Ah, ok. BuildBot. Used that before ..
Thanks! Either that, or I can host a virtualized one on a machine in a datacenter. So, more important for me: can you give me (tobias.oberstein@...) a slave name/password and configure that on the build master? |
ok, full email: tobias.oberstein at tavendo.de. Another thing: I would set up a slave running FreeBSD 8.2 i386, so the slave name could be something like freebsd82_i386. |
Replying to exarkun:
Just to reinforce this, if no tests fail on the existing OS X builder I'm perfectly happy for that work to happen on a different ticket. I'm sorry if my earlier comment gave the impression otherwise. Also, I'd be happy to take this on, so if you wouldn't mind describing things in detail on that ticket (maybe copy some of comment 63, to remind me) and assign it to me, please do so.
Ditto. This has been a significant chunk of work, and a feature I'm personally happy to see land, so your perseverance is noted and very much appreciated. |
Replying to glyph:
I am trying to verify if "no tests fail on the existing OS X builder". When I do the following on OS X (using trunk with the patch applied)
I get 11 errors in total. Here is the breakdown: E1: twisted.trial.test.test_loader => Tries to create a directory under the Twisted installation directory .. permission denied. This is unrelated to the patch. E2: twisted.test.test_enterprise => "from twisted.enterprise.adbapi import _safe" fails .. "cannot import name _safe" This is unrelated to the patch. E3: 1 failed test case: TCPClientTestsBuilder_KQueueReactor
E4: 7 failed test cases: So as far as I can see, only E3 and E4 are kqueue related. However, I am wondering why those cases are even run for kqueue although I started trial with the option "-r select". How does a BuildBot slave run trial? Similar to how I did it manually? Anyway, would you agree that fixing E3 and E4 (that is, making those tests be skipped on OS X) should make the patch finally acceptable? PS: I'll create a new ticket ("Skip PTY tests on OS X when running kqueue") when this patch is done and it's clear what is left to do for OS X. |
For completeness, here is the breakdown for FreeBSD when running the same
E1 and E2 are the same. Further, the following cases fail/err: twisted.internet.test.test_posixprocess.FileDescriptorTests.test_expectedFDs This is due to platform restrictions (related to listing the open FDs of a process), but unrelated to the patch. To make FreeBSD a "supported" platform, I'd create a separate ticket for those and fix them. |
Replying to oberstet:
This looks suspiciously similar to #4881. Are you using current trunk? |
Replying to oberstet:
There's no such module in current trunk: http://twistedmatrix.com/trac/browser/trunk/twisted/test/test_enterprise.py Did you fail to clean up PYCs maybe?
This looks suspicious, like it might actually be related to the patch. If this is going to be run by the existing buildbot it might need fixing.
These are the reactor-builder tests. They test all the reactors using tests which build the reactor, then run very specific tests against that created reactor, then clean it up. Generally, the reactor builder tests are a lot better, as they demonstrate more isolated failures, and we are moving to test all reactors in this way as much as possible.
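To illustrate why a name like TCPClientTestsBuilder_KQueueReactor shows up even when trial itself runs on select, here is a toy version of the pattern: one test mixin, stamped out once per reactor class. The reactor classes below are trivial stand-ins, not Twisted reactors, and make_test_case_classes is a hypothetical helper rather than Twisted's actual builder API.

```python
import unittest


class SelectLikeReactor(object):
    """Stand-in reactor: queues calls and runs them in order."""

    def __init__(self):
        self.calls = []

    def callLater(self, delay, f):
        self.calls.append(f)

    def run(self):
        while self.calls:
            self.calls.pop(0)()

    def close(self):
        self.calls = []


class KQueueLikeReactor(SelectLikeReactor):
    """Second stand-in, so that two test classes get generated."""


class CallLaterTestsMixin(object):
    reactorFactory = None

    def setUp(self):
        # Build a fresh reactor for every test ...
        self.reactor = self.reactorFactory()

    def tearDown(self):
        # ... and clean it up afterwards.
        self.reactor.close()

    def test_callRuns(self):
        seen = []
        self.reactor.callLater(0, lambda: seen.append(1))
        self.reactor.run()
        self.assertEqual(seen, [1])


def make_test_case_classes(mixin, factories):
    """Create one TestCase subclass per reactor factory."""
    classes = {}
    for factory in factories:
        name = "%s_%s" % (mixin.__name__, factory.__name__)
        classes[name] = type(name, (mixin, unittest.TestCase),
                             {"reactorFactory": factory})
    return classes


generated = make_test_case_classes(
    CallLaterTestsMixin, [SelectLikeReactor, KQueueLikeReactor])
globals().update(generated)
```

Because each generated class creates its own reactor in setUp, the tests run regardless of which reactor the test runner was started with.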
Pretty close, but if you want to know exactly, here's the builder: http://buildbot.twistedmatrix.com/builders/osx10.6-py2.6-select
As long as there's an accompanying ticket to fix it later :). E3 looks like a potentially real problem, but given that this is new functionality, skipping it on OS X for now and fixing it separately is OK.
Sounds good, although any failing reactor builder tests need to be skipped before landing in order to not break the existing builder, as noted above. |
Replying to glyph:
Yep, I did use trunk. And yes, the ticket is exactly about this issue. In the meantime (with the help of exarkun), I've set up a FreeBSD build slave. After mounting fdescfs with "mount -t fdescfs null /dev/fd", those cases also succeed: http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/3 Is #4881 already merged to trunk? If yes, then the skipping of the open FD tests when "fdescfs" is NOT mounted doesn't seem to work. But I will automount fdescfs on the slave anyway, now that I know how to. |
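Whether /dev/fd actually reflects every open descriptor (the property that mounting fdescfs is meant to provide here) can be probed at runtime. The following is a rough sketch, not taken from the Twisted test suite; the function name is hypothetical.

```python
import os


def dev_fd_lists_open_fds(dev_fd="/dev/fd"):
    """Return True if a freshly opened descriptor shows up in dev_fd."""
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        try:
            listed = set(os.listdir(dev_fd))
        except OSError:
            # The directory is missing or unreadable on this platform.
            return False
        # A static /dev/fd would not include the descriptor just opened.
        return str(fd) in listed
    finally:
        os.close(fd)
```

A test that depends on listing open descriptors could be skipped when this returns False instead of failing.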
Replying to oberstet:
Make sure to update, clean up PYCs, etc, to deal with any local configuration issues.
OK, great.
It was merged to trunk here: http://twistedmatrix.com/trac/ticket/4881#comment:24 It sounds like you're saying the fix didn't actually work. When fdescfs is mounted, even earlier versions of Twisted would pass tests on FreeBSD.
That does seem like a reasonable solution as far as this ticket is concerned, but possibly you should re-open #4881 for further investigation. |
Ok, the new branch "branches/kqueue-1918" now contains the above "kqueue.7.patch". I'd like to discuss the results of running trial against the branch on OS X and FreeBSD, and I need some help ;) OS X: http://buildbot.twistedmatrix.com/builders/osx10.6-py2.6-select/builds/1791 It has 8 errors. 7 of those are PTY issues. As already discussed, those need to be skipped on OS X. I will open a new ticket for that when the branch is merged. The 1 other error is in: twisted.internet.test.test_tcp.TCPClientTestsBuilder_CFReactor.test_protocolGarbageAfterLostConnection I assume CF = Core Foundation? How is that related to kqueue, and what is happening there? I'm a bit puzzled by that one. FreeBSD: http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/5 It has 8 errors and 1 failure. Those are issues I have not seen in my testing before, I guess because the buildslave I've set up has everything (but GTK) installed, and thus skips less than my earlier manual testing. All 9 issues are in twisted.test.test_stdio.StandardInputOutputTestCase and, as far as I could analyze, they are related to the handling of "temp" files produced during the test cases. I have no clue why those pop up when running kqueue but not when running select. Any hints on what is going on there would be great too. |
A simple way to do this for now might be to remove kqueue reactor from the
CF is Core Foundation, yes. I'm not sure what's going on here. Possibly state from one of the failing KQueue reactor tests is interfering. I would try to get rid of the rest of the failures and see if this one remains. If this one stops failing, it would then also be interesting to investigate how the KQueue failures were causing this one to fail (but that doesn't need to block this ticket).
These may be environment setup issues, caused by a bug in the unit tests, resulting in the wrong version of Twisted being used in the child process. I would investigate what PYTHONPATH ends up being set to in the launched process and see if that leads anywhere. I suspect this is the problem because:
|
Replying to exarkun:
Thanks! You were right .. problem is module search path is not correct. The PYTHONPATH of the spawned process does point to the locally checked out code tree (and thus, to the new kqueue), but the PYTHONPATH ends up behind the one pointing to the system installed Twisted within sys.path. I can get the cases working by doing something like this:
Is it acceptable to do that for all 9 files? Or do you prefer something different? |
As discussed on IRC with exarkun, I did:
and added
to the twisted/test/stdio_test_* files. This solves the problem. FreeBSD now runs the whole trial without any error/unexpected failure: http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/8 I want to note that at least 2 more files contain code roughly similar to _preamble.py: ./twisted/test/process_twisted.py and ./twisted/conch/test/test_conch.py. a) The code is not the same, and it is inferior (i.e. one will only work when the local Twisted base directory is actually called "Twisted"). b) The code is replicated. Thus, I think we should clean that up as well, but probably not in this ticket, which is already touching a couple of places. Agreed? |
Ok, the OS X platform also succeeds, after doing as suggested by exarkun in #7401. I have run the full set of builders using force-builds.py: http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/kqueue-1918 Is there anything more to do for the final merge? |
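The general idea behind such a preamble (make the checked-out tree win over a system-installed Twisted in sys.path) can be sketched as below. This is an assumption about the approach, not the code that was committed; prepend_checkout and its parameters are hypothetical names.

```python
import os
import sys


def prepend_checkout(start, package="twisted", path_list=None):
    """Walk up from `start` to the directory that contains `package`
    and put that directory first on the path list.

    Returns the directory that was inserted, or None if none was found.
    """
    if path_list is None:
        path_list = sys.path
    path = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(path, package, "__init__.py")):
            # Index 0 takes priority over site-packages entries.
            path_list.insert(0, path)
            return path
        parent = os.path.dirname(path)
        if parent == path:
            # Reached the filesystem root without finding the package.
            return None
        path = parent
```

A child script would call something like prepend_checkout(sys.argv[0]) before its first Twisted import. Searching for the package directory, rather than for a directory called "Twisted", avoids the naming dependency noted in point a).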
I made a few further changes, very minor; one is linked above, the others were r33479 and r33480. Latest build results look fairly good (I think that's as good as Windows gets at the moment). This looks good to me to merge now. Since this is your first merge, maybe it would be good if you found someone on IRC to walk you through it, or at least confirm your ideas of how it works if you have some already. :) Thanks very much for your work on this! |
Replying to zectbumo: |
(In [33481]) Merge kqueue-1918: Revive kqueue reactor Author: oberstet Reviewer: exarkun, glyph Fixes: #1918 A kqueue()/kevent() based implementation of the Twisted main loop. This implementation depends on Python 2.6 or higher which has kqueue support built in the select module. Note, that you should use Python 2.6.5 or higher, since previous implementations of select.kqueue had U{http://bugs.python.org/issue5910} not yet fixed. Note, that on OS X, PTYs will (currently) NOT work due to platform restrictions. |
I can't get this to work with Python 2.6.5. I'm using 64-bit Linux with Python 2.6.5. The following script causes my CPUs to go to 100% after 10 seconds. If I use gtk.main() instead of reactor.run(), there is no problem, so I wonder whether it is caused by the new version of pykqueue or by a bug in Twisted? #!/usr/bin/python |
Please use the mailing list or the IRC channel for support. |
Replying to building team:
For posterity, I just want to make it clear that this question doesn't really make sense, since kqueue is a feature of BSD-family OSes (FreeBSD, NetBSD, Mac OS X) and can't be used on Linux. Also, the kqueue reactor wouldn't be used in conjunction with the GTK reactor; they're different event-loop APIs. As exarkun already suggested, if you don't know what your issue actually relates to, ask the question on the mailing list or IRC first and we'll attempt to steer you to the correct bug (or a new bug). |
foom has been working on a new version of pykqueue that works better than version 1.3 and is distributed and maintained by arg here: www.python-hpio.net.
Under these new circumstances, reviving the kqueue reactor should be a good goal to work towards.