-
Notifications
You must be signed in to change notification settings - Fork 91
two 3000+ task taskgroup inspectors hang nightly #271
Comments
Here's the code for the loading of tasks, which is a recursive call: taskcluster-tools/src/views/UnifiedInspector/Inspector.js Lines 134 to 158 in 65f7f12
Upon looking at this, I do notice a couple problems:
There is probably more we could do, too. |
Filed on the Fx side as bug 1399229 . |
I've been noticing such an issue for about a month, but I hadn't thought this is because of Task Group inspector, until last Friday. I'm on Linux, I thought this was a system issue, until I realize a text editor didn't show the symptom below: SymptomsType some in text in a text field of whatever page and notice the input stops from time to times. From times to times, the input hangs. Sometimes because of this hang, the key gets repeated more than 10 times (as in "buggggggggggggggggggg"). Data setI ran a profile, and I notice every spike happen in the same content process. This content process points to tools.taskcluster.net, on task-group-inspector like: https://tools.taskcluster.net/groups/NXi31S58R6OEAtWdahm9LQ (it contains more than 4400 tasks). Please note the whole task group had been loaded several minutes ago. Interpretation@julienw from the profiler team helped me to understand the issue. He narrowed down to this profile https://perfht.ml/2uLosxM. We see 3 issues:
General rendering timeThere are several issues here:
componentWillReceiveProps() too longSorting tasks by names takes > 200 ms. We may want to generate the sorted (by name) array as soon as we build in it (in the reduce call, for instance) The reduce call itselffindIndex() has a complexity of n*m/2, n being > 4400 total tasks and m being 200 fetched tasks. This may be avoided by using a map, indexed by the slugId. Another optimization (we don't know if that's already in place) could be to strip the task data to what's needed for the group details, i.e.: ImpactSuch a run happens every 2 seconds and causes a hang in the content process of approximately 800ms (each). Therefore, if somebody has like 10 taskgroups open (which may happen in releng), we may see more frequent hangs. To sum up@julienw and I investigated these potential explanation. We're not too familiar with this code, though. Feel free to ask us anything that isn't clear :) What do you think @eliperelman ? |
Thanks for the detailed information! You're right that a lot of this comes down to fetching and rendering too much data. While we could spend a lot of time trying to optimize this for displaying all the data we are fetching, we are actually going to tackle this another way: fetching less data. We are iterating a new version of the web app [1] that speaks with our own gateway to the Taskcluster APIs via GraphQL [2], instead of talking to the APIs directly. Right now the APIs return an abundance of data, much more than the UI needs, and so we will wrap these APIs in a GraphQL layer that will cut this down significantly. My initial back-of-the-napkin estimates shows a data reduction of about 90%. Second, this new implementation will stop loading the entire task group in a single pass. Instead, we are going to start paginating this data. We are also going to stop loading the task group when only viewing the task. We currently use react-virtualized for the logviewer [3] implementation, but I don't think we will need to use it for the task group once we start paginating. Feel free to follow along our progress in the links below, and let us know if you have any further advice or questions! [1] https://github.com/taskcluster/taskcluster-web |
Hey @eliperelman ! Clearly rendering less data will make the problem less present. May I suggest to at least implement the change in
I think you'd need to extract this part into a PureComponent too: taskcluster-tools/src/views/UnifiedInspector/GroupDetails.jsx Lines 104 to 120 in e8b6d45
Then checking that the status actually changed before changing the task would IMO help a lot with the rendering, as no Task would actually be rendered if the "task" object doesn't change and the (new) TaskInspector component is pure. |
I certainly think that's reasonable! @helfi92 thoughts? |
For
I believe it was done purposely so that we can keep the ordering of tasks whenever the listener calls |
@helfi92 I'm not opposing doing it when the status actually changes :) The problem is that we always change it, even when not needed. |
I'm hoping to be able to open 2-3 active nightly graphs without having a hung browser. It's possible this is a nightly issue, and possible it's a tools.tc.net issue, and certainly possible that both could use improvements.
We just unified the two views (task inspector / taskgroup inspector) which is an overall win for UX. Hanging when I'm viewing a single task + a second taskgroup isn't ideal, though; I'm wondering if there's anything we can do on the tools.tc.net side to make this better.
(I do like the current unified UX quite a bit: the task completed, and I was easily able to then easily look at the current running tasks that had been unscheduled because they depended on the first task. It's the hang that I'm trying to address, so if we can address that without affecting the UI, that's the ideal solution for me.)
The text was updated successfully, but these errors were encountered: