High memory usage when uploading to Azure Storage #1031
Here's a heap dump of me running it locally:
From the heap dump, it is apparent that there are a 128MB and a 64MB buffer allocated at
It does indeed seem like azurestore is copying the entire upload data into a memory buffer before uploading it to Azure: tusd/pkg/azurestore/azurestore.go Lines 143 to 148 in 7970961
A new buffer is created and the entire request body is written into the buffer before the final size is examined and the data is uploaded. No data is streamed. I am not sure if that is possible with Azure. @omBratteng, do you know if there is a better way to avoid these allocations? For the s3store we save chunks of data on disk to avoid in-memory buffers, for example. @jspath-ankored Does the memory get properly released once the upload is completed? Or is the memory kept alive after the upload, meaning we would have a memory leak?
It's been a while since I worked with this and I have since gotten a new job. But if I recall correctly, it only keeps the chunks in memory before it uploads them to blob storage. We have uploaded files of ~10TiB without any issues (though we didn't really have any memory constraints in our cluster). That said, there shouldn't be any issues with saving chunks of data to disk.
It looks like it's not a memory leak, and the memory does eventually get released, although it takes around five minutes. It would still be preferable if the memory usage didn't grow like this ... I uploaded a 221MB file, and after the 2nd upload, we were using 922.8MB of memory. This makes me concerned about using this in production. If it matters, I am measuring the memory using the Live Charts extension for Docker Desktop, so maybe I'm just measuring the memory that the Docker container is using. I know you can stream to Azure Blob Storage in .NET ... not sure about Go.
I am no expert at Go's garbage collector, but Go might not release memory back to the OS immediately. The memory might be flagged internally as unused, but releasing it to the OS is another step that might only be taken when the memory is actually needed by another process. But please don't cite me on this :)
If you want more insights into the memory usage as seen from the Go internals, you can use the metrics exposed by tusd (e.g. http://tusd.tusdemo.net/metrics). They contain many different memory-related metrics and might help you understand how much memory Go thinks is in use right now. However, I agree that it would be ideal to avoid this in-memory buffering altogether. Would you be interested in trying to stream the data directly to Azure (or stream it in chunks)?
Yes. We are not using tusd in production yet, and won't be for a while, so I am open to trying out streaming. Are you asking me to implement it, or just try out some changes?
Yes, improving the memory usage of the Azure store would require some implementation changes. I don't use Azure myself and don't have access to it, so I'm not the best person to tackle this.
We noticed that the body is read into memory when reviewing the code before settling on tusd, and decided to use 5 MiB PATCH chunks in the clients for now. I'd like to have taken a stab at improving it but didn't have time for it :(
Does this also occur if you use Azurite? If so, I can try to reproduce it locally and debug it. |
Yes, the same issue happens using Azurite locally: |
I implemented the buffering using a temporary file on disk instead of in-memory. Please have a look at #1070, test it out, and let me know if it behaves as expected, especially in terms of memory usage. In my local tests, no memory growth was visible over time.
Describe the bug
I see very high memory usage when uploading large files to Azure Storage. For example, a 221MB file upload results in using over 221MB of memory. I see this behavior when uploading to actual Azure Storage as well as uploading to Azurite locally.
It seems like something in the tusd Azure Storage code may be retaining all of the file info in memory, instead of streaming it to Azure Storage or an Azure Storage emulator, like Azurite.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect the memory usage to stay relatively low.
Setup details
Please provide the following details, if applicable to your situation:
tusproject/tusd:v2.0
--azure-storage uploads --azure-endpoint https://AZURE-ACCOUNT.blob.core.windows.net --hooks-http MYWEBHOOKURL --hooks-http-forward-headers Authorization --hooks-enabled-events pre-create,pre-finish --cors-expose-headers X-Upload-Properties-Set,X-Upload-File-Path --behind-proxy --expose-pprof
--behind-proxy
on Azure only, not locally. But I see the problem both locally and on Azure.