Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption due to race condition is possible #3

Closed
troggy opened this issue May 5, 2022 · 5 comments
Closed

Data corruption due to race condition is possible #3

troggy opened this issue May 5, 2022 · 5 comments
Labels
question Further information is requested

Comments

@troggy
Copy link
Contributor

troggy commented May 5, 2022

It is more of a theoretical issue, since I haven't reproduced it yet, but from the code it seems possible.

Many of the functions like chunkCompleted or cancelOne change the state of upload module, e.g. they update state variables likecompletedChunks and pendingChunkUploads. Yet these functions could be executed in parallel at the same time (as uploadChunk function is executed asynchronously for multiple chunks at once). This could corrupt the state of uploadModule.

For instance, imagine chunkCompleted is called at the same time for two different chunks. Both of them will operate on the same pendingChunkUploads list and they wouldn't know of each other. So the first chunkCompleted will remove the completed chunk from the list, but second chunkCompleted will remove another completed chunk and keep the one removed by the first call (because, again, they share the exact copies of the pendingChunkUploads list).

Described example could lead to the situation when upload won't be marked as complete even when all the chunks are uploaded, because const isUploadComplete = uploadCompletedChunks.length === upload.partsCount condition for the very last chunk may return false in this case.

Just for your consideration

@troggy
Copy link
Contributor Author

troggy commented May 5, 2022

Once I have some free time, I will try to make a demo example (should be possible to do with many parallel chunk uploads)

@toniopelo
Copy link
Owner

Thanks a lot for your time @troggy.
On this one I am not sure that this scenario can happen though, and it would be interesting to check by tweaking the code a little bit to force calling the function twice at the "same time" and verify whether I'm right or not.

My way of understanding this is that
1- Node.js is monothreaded and that's why we never need mutex or semaphores like some other languages dealing with async (like C for example). Node actually push chunks of code to be executed into the so called "event loop" and execute each code chunk proceduraly. I'm not exactly sure how these chunks are split but calling a function might be one of these point where a new chunk is pushed to the execution queue. IMO this ensure that the two different chunks of the chunkCompleted function call are going to be executed one after the other, preventing the concurrent data access/mutation conflict.
2- Everything in node is handle with pointers and thus the state variables like completedChunks should be the same everywhere it's accessed with the same pointer.

Might be simple enough to check that. If I find the time for it I'll try to demonstrate what I just said.

@troggy
Copy link
Contributor Author

troggy commented May 6, 2022

1- Node.js is monothreaded and that's why we never need mutex or semaphores like some other languages dealing with async (like C for example). Node actually push chunks of code to be executed into the so called "event loop" and execute each code chunk proceduraly. I'm not exactly sure how these chunks are split but calling a function might be one of these point where a new chunk is pushed to the execution queue. IMO this ensure that the two different chunks of the chunkCompleted function call are going to be executed one after the other, preventing the concurrent data access/mutation conflict.

Would be interesting to explore that. I admit I lack some fundamental knowledge on node.js internals here.

Btw, the stuff I'm talking about are in s3-uploads-client and thus are running in browser. AFAIK Node.js uses V8 as well, so may be there is no difference for our topic.

Everything in node is handle with pointers and thus the state variables like completedChunks should be the same everywhere it's accessed with the same pointer.

This is a good point. However, the functions don't mutate the object, they create a new one and thus create new pointer.

For example

this.completedChunks = this.completedChunks.filter(
      (uploadedChunk) => uploadedChunk.uploadId !== uploadId,
    )

will create a new array and this.completedChunks will be changed to point to this new array. However, concurrent call may (probably?) still have the old pointer. Unless the engine is setting the function call context immediatelly before the execution and not when the function is put on call stack.

@toniopelo
Copy link
Owner

Would be interesting to explore that. I admit I lack some fundamental knowledge on node.js internals here.
Btw, the stuff I'm talking about are in s3-uploads-client and thus are running in browser. AFAIK Node.js uses V8 as well, so may be there is no difference for our topic.

You're right and as they both use the same JS interpreter I would say that there is no difference at all.

This is a good point. However, the functions don't mutate the object, they create a new one and thus create new pointer.

Yeah, you're right my first answer to this point was not relevant then! But I would say the same for the context.
The upload manager being a class, when instanciated with the new keyword, it lives in memory somewhere. For each of the members function calls, the context assigned to the function is this, meaning the object instance.
I would then say that when you update this.completedChunks pointer, it does replace the pointer on the attribute completedChunks of the object and when you access it you read from the new pointer. Because this context passed to the function is a pointer itself, pointing to the place in memory where the instanciated class object lives.
⚠️ I'm not sure about that, this is just my guess of the way this works internally but I might as well be wrong

@toniopelo toniopelo added the question Further information is requested label May 17, 2022
@troggy
Copy link
Contributor Author

troggy commented Feb 19, 2023

running the code in production for many months now, the issue didn't happen. Closing

@troggy troggy closed this as completed Feb 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants