Voyager struggling under large rooms #84

Closed
turt2live opened this issue Apr 19, 2017 · 1 comment

Comments

@turt2live
Owner

turt2live commented Apr 19, 2017

Voyager was asked to join #ubuntu on freenode, and it is now exhausting all available resources.

It stopped processing sync updates because of the large room join. Eventually the homeserver failed under the load, and the failures propagated to voyager:

/sync error Unknown error code: Unknown message
{ [Unknown error code: Unknown message]
  errcode: undefined,
  name: 'Unknown error code',
  message: 'Unknown message',
  data: '<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body bgcolor="white">\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx/1.10.0 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n',
  httpStatus: 504 }
Starting keep-alive
info VoyagerBot Sync state: SYNCING -> RECONNECTING
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Sync state: RECONNECTING -> ERROR
ERR! VoyagerBot { error: 
ERR! VoyagerBot    { [ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response]
ERR! VoyagerBot      errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      name: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      message: 'Locally timed out waiting for a response',
ERR! VoyagerBot      data: 
ERR! VoyagerBot       { error: 'Locally timed out waiting for a response',
ERR! VoyagerBot         errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot         timeout: 15000 } } }
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Sync state: ERROR -> ERROR
ERR! VoyagerBot { error: 
ERR! VoyagerBot    { [ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response]
ERR! VoyagerBot      errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      name: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      message: 'Locally timed out waiting for a response',
ERR! VoyagerBot      data: 
ERR! VoyagerBot       { error: 'Locally timed out waiting for a response',
ERR! VoyagerBot         errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot         timeout: 15000 } } }
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Sync state: ERROR -> ERROR
ERR! VoyagerBot { error: 
ERR! VoyagerBot    { [ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response]
ERR! VoyagerBot      errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      name: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      message: 'Locally timed out waiting for a response',
ERR! VoyagerBot      data: 
ERR! VoyagerBot       { error: 'Locally timed out waiting for a response',
ERR! VoyagerBot         errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot         timeout: 15000 } } }

This is voyager's monitoring:
image

This is the homeserver's basic monitoring:
image

Prometheus metrics are not available for this incident. However, voyager did run up against the resource limits of its Toronto node:
image

After voyager recovered, it queued an update of ~105,359 nodes (because it tries to cache member information). This took quite a while to process and generated significant database traffic, monopolizing the connection pool.
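
For illustration only, here is a minimal sketch of one way a queue of that size could be drained without tying up every database connection: split it into fixed-size batches and let a small, fixed number of workers write them. This is not Voyager's actual code. The names `toBatches`, `drainQueue`, and `writeBatch`, and the batch size and connection cap in the usage note, are assumptions made up for the example.

```typescript
// Hypothetical helper: split a queue into fixed-size batches so each
// database round trip stays small.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical helper: drain the queue with at most `maxConnections`
// concurrent writers. `writeBatch` stands in for whatever persists one
// batch; it is an assumption, not a real Voyager function.
async function drainQueue<T>(
  queue: readonly T[],
  batchSize: number,
  maxConnections: number,
  writeBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  if (!Number.isInteger(maxConnections) || maxConnections < 1) {
    throw new RangeError("maxConnections must be a positive integer");
  }
  const batches = toBatches(queue, batchSize);
  let next = 0; // index of the next unclaimed batch, shared by all workers
  let written = 0; // number of items persisted so far

  const worker = async (): Promise<void> => {
    while (next < batches.length) {
      const batch = batches[next];
      next += 1;
      await writeBatch(batch);
      written += batch.length;
    }
  };

  // Never start more workers than there are batches to write.
  const workerCount = Math.min(maxConnections, batches.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return written;
}
```

With an assumed batch size of 500, a queue of 105,359 updates becomes 211 batches. Calling `drainQueue(updates, 500, 2, writeBatch)` would then hold at most two connections at a time, leaving the rest of the pool free for other queries.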

Voyager is currently stable (as of writing this); however, it is still chugging through ~80k node updates.

turt2live added this to the v1.1.0 milestone Apr 19, 2017
turt2live added a commit that referenced this issue May 31, 2017
turt2live modified the milestones: v1.1.0, v1.0.0 Dec 12, 2017
@turt2live
Owner Author

Improved performance in the TypeScript version should fix this.
