Slow Terminal Action #20
Comments
In reverse order: :ls % isn't giving you output because :ls is a file that you need to pull from ~zod. If you can't connect to ~zod for the initial update, very few commands work. Your ship's probably out of sync. Did you delete and recreate your destroyers as outlined in Curtis' post to the mailing list? https://groups.google.com/d/msg/urbit-dev/FQqNyi15lu0/GB7Fgyk_wVQJ ~zod is up right now, so once you get reconnected, let's see if the terminal slowness is still there. Keep in mind, everything's inherently slow right now. It's not atrocious for an alpha version, but things will be much, much better once we hit continuity (and thus beta) next week. |
I hadn't got the daily digest yet. I read the initial warning that this was coming before Oct 4, but I didn't think that would mean "tomorrow" :) I should have checked the mailing list before I tweeted. Thanks! |
I am back on the network and I appear to have pulled down all the "binaries" from zod. It is still just as slow. I don't know what these ships are doing when I load them up; if I have a submarine and two destroyers, will they work together on their own (?) -- is it necessary to run them all at the same time on different hosts? I'm familiar with zookeeper enough to know that you need at least 3 zookeepers to have a working cluster (not two, and certainly not one, but always an odd number) Does it work like that? EDIT: OK so maybe not "just as slow" but there is a sizeable delay |
It's probably a tolerable speed now, with just one submarine. Thanks for making a neat thing! |
We've got a big update coming, even before continuity, that should help the network performance. Just one of the innumerable things we've got to get done. |
~zod/~doznec was down. Alas, it happens. The error on 'ls' is that the ~ /cx/~divsem-misdef/try/~2013.9.26..12.18.19..6428/bin/ls/hoon The "state D" thing is very interesting. The thing is, we shouldn't be Unfortunately, yebyen, this makes you a valuable resource. If you can On Thu, Sep 26, 2013 at 5:52 AM, yebyen notifications@github.com wrote:
|
I have the same issue, using Fedora 19 (64-bit, if that matters). On the other hand, playing with Arvo/Urbit in a VMware client (Windows 7) with Debian freshly installed, I did not have that issue. It must be system specific. Currently I'm running the latest from GitHub |
Oh, ok! It doesn't have to be unfortunate if I am a valuable resource :) I have done this before using... strace. Unfortunately the output of strace is of absolutely ridiculous proportions even in trivial programs, and this is not a trivial program... I see a lot of fdatasync, epoll_wait, clock_gettime when things settle down. Performance has got a lot better trying it right now, but I don't have to touch a lot of keys to see it enter D state. I can try and just capture all of the output of strace, does that help? Is there something you'd rather get? |
I'm going to bet that it's something docker does with cgroups and namespaces, not something that VirtualBox or Vagrant is doing, based on nothing but intuition. Let me know if there's an IRC channel I should join, or another way you'd prefer to continue the debugging dialog. |
Ooh! That makes me so mad! Cruiser to anyone who can find this. It's On Thu, Sep 26, 2013 at 9:59 AM, Vasile Rotaru notifications@github.com wrote:
|
You know, I really don't want to create an IRC channel, because Urbit is The sigsegvs are normal - they're part of how we do checkpointing. Above The question is simply what it's doing when it's in D mode. Somewhere, On Thu, Sep 26, 2013 at 10:35 AM, yebyen notifications@github.com wrote:
|
So this is what I get from
when I'm doing a
|
Alas, this doesn't tell me anything - as you see, the number of seconds I wonder about fsync() and fdatasync() - it's a little surprising how often On Thu, Sep 26, 2013 at 12:01 PM, Vasile Rotaru notifications@github.com wrote:
|
Nope, it's still holding the terminal after a certain amount of keyboard input. It seems like the amount has gotten larger since I started testing this earlier. I'm not sure that commenting those lines was enough, I've got strace -p running in a different terminal now, and it is calling fdatasync and fsync repeatedly while the D state is hanging. |
Ditto for me with the delay, I'm also not able to recreate my destroyers but I predict I've done something else wrong along the way.
Also, this is what appears (a portion) just before the output of %ls
And many repetitions of this:
|
John, did you try recreating your destroyers from a new submarine? The mprotect() and sigsegvs are totally normal - that's part of the I think it's clear that there's some blocking I/O going on somewhere. I On Thu, Sep 26, 2013 at 12:38 PM, John notifications@github.com wrote:
|
Docker is all about boxed-up virtual machines. I could send you mine. Let me check to make sure I haven't left any password hashes anywhere. |
If you get Docker, you can pull
If by chance it's an artifact of the kernel (3.10.10) or other system bits (it's CoreOS, which has been having a bit of an update problem in the last few days, so it might not be the very latest CoreOS) you won't see the problem. It's all pushed. Have a look see. The image is close to 3GB (sorry) because it includes some testnet bitcoin altchain I've been hashing on and an example ChicagoBoss app from the PDF tutorial I've been following along with. (I can also export to a tar file but docker is built for doing this...) |
Hm, I get a username/password prompt when I try to: oxford:~/Documents/src; git pull https://github.com/yebyen/cgyarvin.git I do suspect it might be a kernel thing, though. Well, if so, that'll On Thu, Sep 26, 2013 at 12:50 PM, yebyen notifications@github.com wrote:
|
No, it's not on Git, you want this one: https://index.docker.io/u/yebyen/cgyarvin/ Unfortunately not sure how you can download from the docker index without docker. I'll start exporting to tar. |
Well, I need Docker to run it! Point me to some stupid person docker doc On Thu, Sep 26, 2013 at 12:58 PM, yebyen notifications@github.com wrote:
|
If you're in Ubuntu Linux, you're 2/3 of the way there... http://www.docker.io/gettingstarted/#h_installation You want:
before doing
I don't actually install docker though, so I start here: If you are using CoreOS (and possibly if you're not) you'll have to do the |
I'm in OS X - so don't expect immediate results :-) On Thu, Sep 26, 2013 at 1:02 PM, yebyen notifications@github.com wrote:
|
OK :) The coreos/vagrant path is the quickest to an identical setup. I'm using Windows 7, so ^_^ |
Cool. One day maybe I'll rock as much as docker. Anyway I have something On Thu, Sep 26, 2013 at 1:17 PM, yebyen notifications@github.com wrote:
|
@cgyarvin still no love:
|
See latest urbit-dev reply. On Thu, Sep 26, 2013 at 2:06 PM, John notifications@github.com wrote:
|
My idea, when posting the strace output, was to compare it with the same output on a box without those problems, and below is the same output (during Well, there is quite a difference. Unfortunately, it tells me very little about what the problem could be.
|
Ohp... I've just accidentally pushed all of my destroyers to the docker cloud. Seems there is no way to push a repo and keep certain tags private at the same time. |
Okay I think I get it. It's just a crazy level of difference in OS performance. On some systems it is quite tolerable, those include mine. Why is it blocking? It's a bug, or rather a prototype in production. What it should actually do is block, but block in a forked process. Assuming decent COW this should not delay the parent, which can continue past the snapshot point. This change might be a fun way to learn vere internals... Sent from my iPhone On Sep 29, 2013, at 11:31 AM, yebyen notifications@github.com wrote:
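
Editor's sketch of the "block, but block in a forked process" idea described above. This is Python rather than vere's C, and it is only an illustration under stated assumptions: `checkpoint_in_child`, `reap`, the pickled dictionary standing in for the loom, and the file path are all hypothetical names, not anything from the Urbit source. The point is just that the child sees a copy-on-write image of memory as of the `fork()`, so the slow `fsync()` no longer delays the parent.

```python
import os
import pickle
import tempfile


def checkpoint_in_child(state, path):
    """Fork; the child writes and fsyncs a snapshot, the parent returns at once.

    Returns the child's pid so the caller can reap it later.
    (Hypothetical helper for illustration; not vere's actual checkpoint code.)
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: continue past the snapshot point immediately

    # Child: it sees a copy-on-write image of the parent's memory as of fork().
    status = 1
    try:
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # the slow, blocking part happens here
        os.replace(tmp, path)     # atomically publish the finished snapshot
        status = 0
    finally:
        os._exit(status)          # never return into the parent's code path


def reap(pid):
    """Block until the checkpoint child exits; True if it succeeded."""
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    state = {"events": list(range(1000))}
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.bin")
    pid = checkpoint_in_child(state, path)
    # Changes made after the fork are not part of this snapshot.
    state["events"].append("typed while the child was syncing")
    print("checkpoint ok:", reap(pid))
```

In an event loop the parent would presumably reap the child without blocking (for example from a SIGCHLD handler) rather than calling `reap` synchronously as the demo does.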
|
Doh! We'll send you some new ones. Sent from my iPhone On Sep 29, 2013, at 12:18 PM, yebyen notifications@github.com wrote:
|
Thanks :) I don't think anyone got them, but who knows if Docker index will keep them around for the NSA or not. There is a much smaller image you can pull now, from
Everything is in
This time it's only 309.3MB, 123MB of which you will have already downloaded if you have pulled |
Nifty I'm sure we'll get way into Docker as time goes on. Sent from my iPhone On Sep 29, 2013, at 1:01 PM, yebyen notifications@github.com wrote:
|
I got the extra destroyers, thanks 💃 |
Crap I'll get online and fix that. Sent from my iPhone On Sep 29, 2013, at 3:10 PM, yebyen notifications@github.com wrote:
|
I think I've confirmed, it's not the absence or presence of a login shell with tty that causes this problem. In docker, you need to run -privileged in order to have an sshd running that allows users to log in and get a tty. Unfortunately I couldn't get a tty any other way. Turns out I still don't need one, other than for screen... |
I like your explanation that it's an abysmal COW that causes this to be noticeable... you're not the first person I've heard say (not that you so much said) that AUFS is a bad idea, and it should be killed. I like everything else about CoreOS though, so I'm probably going to comment that line in |
AUFS is a great idea but it has to actually be as fast as a non-union You can't really live in Urbit without checkpointing - every time you quit So, it's a performance bug that someone has to fix. I'll fix it myself if On Sun, Sep 29, 2013 at 3:55 PM, yebyen notifications@github.com wrote:
|
Well, I run vim with I have the whole Dockerfile/Vagrantfile philosophy, where experiments should really be quickly repeatable with whatever input data, without a lot of resources, and images in general should be disposable, not depended on (except for basically all the rest of the time when they are actually part of a production process.) I see the Of course, on my platform even I guess checkpoints are really not the same thing as snapshots... |
I had an idea, at least for CoreOS/docker users, how to test this theory of what is wrong and whether it can be fixed without architectural changes or a patch upstream. It appears that ~zod is down, but possibly hosting urbit on a volume will alleviate some of the terminal slowness? At first attempt it doesn't appear so. I don't think volumes are copy-on-write, so back to not knowing what causes the slowdown. |
I just put ~zod back up. You know, actually, the spawning sounds scary, On Tue, Oct 1, 2013 at 7:05 AM, yebyen notifications@github.com wrote:
|
On Sep 27, 2013 1:08 AM, "Curtis Yarvin" curtis.yarvin@gmail.com wrote:
Given the continued stability problems, I am going to start hanging out in |
I'm also seeing the same laggy terminal input issue and also practically constant lseek/write 16384 bytes/sync cycles in strace (on fd 4 -- This is on an actual iron machine (my NAS box, heh), btw, no virtualization. |
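
Editor's sketch for anyone who wants to compare machines: a rough way to time the seek / write 16384 bytes / sync cycle reported above on a given filesystem. It assumes nothing about vere itself; `time_synced_writes` and its parameters are hypothetical names, and the numbers it prints only say how expensive a synced 16 KiB write is in the directory you point it at.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=50, size=16384):
    """Write `count` blocks of `size` bytes, syncing after each one.

    Returns a list with the wall-clock seconds taken by each cycle.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    block = b"\0" * size
    # os.fdatasync is not available on every platform (e.g. macOS);
    # fall back to os.fsync there.
    sync = getattr(os, "fdatasync", os.fsync)
    timings = []
    try:
        for i in range(count):
            start = time.perf_counter()
            os.lseek(fd, i * size, os.SEEK_SET)
            os.write(fd, block)
            sync(fd)
            timings.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return timings


if __name__ == "__main__":
    t = time_synced_writes(".")
    print(f"mean {1000 * sum(t) / len(t):.2f} ms, "
          f"worst {1000 * max(t):.2f} ms per cycle")
```

Running it once in the directory that holds the pier and once on a RAM-backed filesystem should show whether the sync itself accounts for the lag on a particular box.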
I want to clarify this bug once again. The "bug" is that checkpointing is acceptably fast on some machines, but In any case, the solution is to fork() and checkpoint in a different On Thu, Oct 3, 2013 at 1:06 PM, Aarni Koskela notifications@github.com wrote:
|
Well, heh. I'll be the first to admit my NAS box isn't exactly... well, erm, is far from the most performant one out there -- I built it to be reliable and cheap, and those goals it fulfills admirably. Anyway, I copied my URB_HOME to |
Aarni, that's what I'd expect, but I also do like a bit of disk under my On Thu, Oct 3, 2013 at 1:29 PM, Aarni Koskela notifications@github.com wrote:
|
Likewise. :) But at least it proves (as if it needed to be proven any further) that it's an IO/FS performance issue. (Aside: Luckily I'm in the process of upgrading my non-NAS system, so I can then hopefully happily run Urbit in a VM on non-old iron, backed by an SSD to boot. Yay!) |
You've both given some pretty big hints in here. I am a little stumped what you mean by "libuv losing my SIGCHLD" -- I come from a CS major program where they teach you threads and forking in java, and I have an appreciation for literature like Dave Thomas -- Pipelines using Fibers in Ruby 1.9. Of course nearly all of the programs I use from day to day are written in C, and don't follow these methods at all. I read the thread on news where they told you how to do a proper rebase/pull request and I was mystified. While everything sounds technically correct and seeing the solution explained I can read and follow, I can say with certainty I have literally never done anything like that. I will take a stab at this one issue if you want to engage a branch where we can discuss the problem, but I think my git-fu is more on your level than theirs, and I know my C chops don't measure up to yours! I'm hoping you have already solved this in private and you're waiting to see if someone else comes up with the solution. I'll pull now and see if I can find your changes around |
Kingdon, I apologize for misconstruing your background! Of course there's nothing I will fix this bug - it's just that I have 10 million other top On Fri, Oct 4, 2013 at 7:54 PM, yebyen notifications@github.com wrote:
|
Well, you do see that fsync() and in fact fdatasync() is being called down I am inclined to regard your change as good. I think I've basically been The checkpoint logic definitely doesn't pretend to be constantly consistent
On Mon, Oct 14, 2013 at 12:06 PM, Aaron Sokoloski
|
Ah, ok, will do. Sorry, I deleted my comment after re-reading some previous ones, thinking that the issue had already been covered. |
Yeah, fsync() definitely works quite differently on different platforms. I http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html Philip Monk On Mon, Oct 14, 2013 at 12:59 PM, Aaron Sokoloski
|
I can confirm that this completely resolves the issue for me. Thank you so (incidentally, for some reason I had to do a Philip Monk On Mon, Oct 14, 2013 at 1:08 PM, Philip Monk pcmonk@asu.edu wrote:
|
So, I should amend my previous statement -- this mitigates the problem greatly, but if I try pasting into the terminal, there's still quite a noticeable lag as I think u2_loom_save is being called after every character input. But at least it's much more usable now. |
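
Editor's sketch of one possible mitigation for the paste case, assuming the guess above is right that a save runs after every character. The idea is to coalesce saves: mark the state dirty on each input, but only run the expensive save when a minimum interval has passed, and flush once when input goes quiet. `CoalescingSaver` and its `save` callback are hypothetical; this is not how `u2_loom_save` is scheduled today, and it trades some durability (inputs inside the window are lost on a crash) for responsiveness.

```python
import time


class CoalescingSaver:
    """Run an expensive save at most once per `min_interval` seconds."""

    def __init__(self, save, min_interval=0.25, clock=time.monotonic):
        self._save = save              # hypothetical stand-in for the real save
        self._min_interval = min_interval
        self._clock = clock
        self._dirty = False
        self._last = None              # time of the most recent save, if any

    def mark_dirty(self):
        """Call after every input event; saves only if enough time has passed."""
        self._dirty = True
        return self.maybe_save()

    def maybe_save(self):
        now = self._clock()
        due = self._last is None or now - self._last >= self._min_interval
        if self._dirty and due:
            self._save()
            self._dirty = False
            self._last = now
            return True
        return False

    def flush(self):
        """Call when input goes idle or on shutdown so nothing stays unsaved."""
        if self._dirty:
            self._save()
            self._dirty = False
            self._last = self._clock()


if __name__ == "__main__":
    saver = CoalescingSaver(lambda: print("save"), min_interval=0.25)
    for ch in "a long pasted line":
        saver.mark_dirty()   # most of these return without saving
    saver.flush()            # one final save once the paste is done
```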
Yeah, I can confirm that after this pull request is merged, on my laptop (different laptop than previous report, but practically the same software underneath) the issue is only a problem for extreme button mashers or big copy pasters. As fast as I can intelligibly type (and I am a fast typer), the key-presses are reflected on the screen. The last laptop was an SSD and this one is an SSD/Hybrid, I have not tested their comparative write performance but with both machines running some variant of Windows (7 or 8), deploying CoreOS to VirtualBox (with Vagrant), Debian Jessie (testing) and Urbit master underneath, performance is good enough for me that I would close this issue. I closed it before though, and it was reopened for good reason, so I'll just leave that $0.02. If anyone wants to try on different hardware, I would be more careful with a destroyer, but if I'm helping someone try out Urbit, I might want to start by giving them a submarine, I just need to know if I'm going to start to have problems if I gave everyone the same submarine (at a 30-minute talk in November, for instance.) |
From the time I type ":infinite" to the time it shows up on my screen is at least a full 10 seconds. There is nothing else eating up cycles during this time. The vere process shows in top as state D (blocking I/O) until the typing is finally echoed to the screen.
Today it tells me ~zod |Tianming| not responding still trying, it hasn't said that before, but I've always had the same slow typing results. Same result with submarines or destroyers. It does not have to be ":infinite" but typing anything, with or without newline. Today I also don't get output from
:ls %
My environment is Ubuntu Linux under Docker (CoreOS/Vagrant)
The host system is Windows 7
It's an SSD-bearing laptop with four cores, so there are no local barriers that I know of that should cause blocking for that long. Is it reaching out to the internet to act on these commands? Should it be blocking I/O?
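
Editor's sketch for reproducing the "state D" observation without watching top. On Linux the third field of `/proc/<pid>/stat` is the process state, and `D` means uninterruptible sleep, usually waiting on I/O. The helper names below are hypothetical and the script is Linux-only; pass it the pid of the running vere process while typing in the other terminal.

```python
import sys
import time


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def process_state(pid):
    """Read the current state letter of `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())


def sample_states(pid, samples=100, interval=0.01):
    """Poll the state `samples` times and count how often each letter appears."""
    counts = {}
    for _ in range(samples):
        state = process_state(pid)
        counts[state] = counts.get(state, 0) + 1
        time.sleep(interval)
    return counts


if __name__ == "__main__":
    print(sample_states(int(sys.argv[1])))
```

A high share of `D` samples while keys are being pressed is consistent with the process blocking on disk rather than on the network.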