
Performance bug when uploading large amounts of data #7525

Closed
1 task
hbgl opened this issue Jun 29, 2023 · 5 comments · Fixed by #8084
Labels
- P4: important Violate documented behavior or significantly impacts performance (priority)

Comments

@hbgl
Contributor

hbgl commented Jun 29, 2023

What version of astro are you using?

2.7.2

Are you using an SSR adapter? If so, which one?

Node

What package manager are you using?

npm

What operating system are you using?

Windows

What browser are you using?

Chrome

Describe the Bug

There exists a performance bug when Astro has to handle large request bodies. The problem is caused by this piece of code:

let body = Buffer.from([]);
let reqBodyComplete = new Promise((resolve, reject) => {
    req.on('data', (d) => {
        body = Buffer.concat([body, d]);
    });
    req.on('end', () => {
        resolve(body);
    });
    req.on('error', (err) => {
        reject(err);
    });
});

In the snippet above, Buffer.concat is called for every chunk of the request body. Because concat returns a newly allocated buffer on each call, this puts a lot of pressure on the GC and spikes memory usage. With small payloads the effect is not noticeable, but with payloads in the megabytes it adds significant overhead. The chunk size on my machine is 64 KiB, so a lot of buffers are created and destroyed when uploading anything large.

I want my Astro app to handle file uploads so this is a real problem for me.

What's the expected result?

For me at least, the expected result would be to not buffer the body into memory at all. The Request constructor also accepts a ReadableStream for the body. That would be my preferred solution because for file uploads that will be saved on disk, there is no reason to load them into memory first.

Alternatively, instead of allocating a new Buffer for every chunk, use a "growing" buffer, e.g. double in size when full.
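The growing-buffer alternative could be sketched roughly like this (a minimal illustration, not Astro's actual code; makeAccumulator and its shape are made up for this example):

```typescript
// Sketch: accumulate chunks in a buffer that doubles in size when full,
// so allocations are amortized instead of happening on every chunk.
function makeAccumulator(initialSize = 64 * 1024) {
    let buf = Buffer.alloc(initialSize);
    let length = 0;
    return {
        push(chunk: Buffer) {
            // Grow (by doubling) until the chunk fits.
            while (length + chunk.length > buf.length) {
                const grown = Buffer.alloc(buf.length * 2);
                buf.copy(grown, 0, 0, length);
                buf = grown;
            }
            chunk.copy(buf, length);
            length += chunk.length;
        },
        result(): Buffer {
            // View of the written bytes only; no extra copy.
            return buf.subarray(0, length);
        },
    };
}
```

With this, each req.on('data') handler would call push(d) instead of Buffer.concat, and the 'end' handler would resolve with result().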

Link to Minimal Reproducible Example

https://stackblitz.com/edit/github-mx5ncy-g1ui5u

Participation

  • I am willing to submit a pull request for this issue.
@natemoo-re natemoo-re added the - P3: minor bug An edge case that only affects very specific usage (priority) label Jun 29, 2023
@natemoo-re
Member

Thanks for opening an issue! Since you've already identified the culprit, any interest in submitting a PR to fix this? Switching to a ReadableStream seems like a great solution.

@hbgl
Contributor Author

hbgl commented Jun 29, 2023

I'm not sure if I can find the time to do a proper implementation for this central infrastructure feature. It's sadly not that straightforward to convert a Node Readable to a ReadableStream. A suitable function is available since Node v17 as an experimental feature. Astro still supports Node v16, so that would need to get polyfilled (node source).
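For reference, the Node 17+ conversion mentioned above looks roughly like this (a sketch; toWebStream is a hypothetical wrapper, and Readable.toWeb was still experimental at the time of this issue):

```typescript
import { Readable } from 'node:stream';

// Sketch: on Node 17+, stream.Readable.toWeb converts a Node Readable
// (such as an http.IncomingMessage) into a web ReadableStream.
// On Node 16 this function does not exist and would need a polyfill.
function toWebStream(readable: Readable): ReadableStream<Uint8Array> {
    return Readable.toWeb(readable) as ReadableStream<Uint8Array>;
}
```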

@natemoo-re
Member

Totally understandable, I appreciate the links and context! We'll try to get to work on this as soon as we can.

@natemoo-re natemoo-re added - P4: important Violate documented behavior or significantly impacts performance (priority) and removed - P3: minor bug An edge case that only affects very specific usage (priority) labels Jun 30, 2023
@hbgl
Contributor Author

hbgl commented Jul 5, 2023

It's sadly not that straightforward to convert a Node Readable to a ReadableStream. A suitable function is available since Node v17 as an experimental feature. Astro still supports Node v16, so that would need to get polyfilled (node source).

@natemoo-re I want to make a correction here. Node uses the undici library for the implementation of the Request class (see node source). Undici's Request body also accepts async iterables, even though this is non-standard (see remark).

So in practice, you do not need an adapter to convert from an http.IncomingMessage to a ReadableStream, since http.IncomingMessage is also a Readable, which is in turn an async iterable.

This should just work:

import { IncomingMessage } from 'node:http';

function makeRequest(req: IncomingMessage) {
    return new Request('http://example.com', {
        method: 'post',
        body: req,
        duplex: 'half',
    });
}

@hbgl
Contributor Author

hbgl commented Jul 7, 2023

I was trying to write a fix (see commit) but in order to make it PR-ready, it would also involve refactoring the tests. The tests currently use node-mocks-http to create mock requests. However, the mocked request does not support the async iterable interface that the real request has.
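A minimal stand-in for such a mock could be built on Readable.from, which does support async iteration (illustrative only; mockBody is not part of the actual test suite or of node-mocks-http):

```typescript
import { Readable } from 'node:stream';

// Sketch: Readable.from yields a Node Readable that implements the
// async-iterable interface a real http.IncomingMessage has, unlike
// the node-mocks-http mock described above.
function mockBody(chunks: string[]): Readable {
    return Readable.from(chunks.map((c) => Buffer.from(c)));
}
```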
