-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make DEFAULT_SESSION use be thread-safe #228
Conversation
I have an implementation that uses threads to distribute calls to S3 resources. When using several threads at once, lots of errors emerged. I figured out that it was because of this: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html#multithreading-or-multiprocessing-with-sessions Meaning that Sessions can't be reused by different threads. To solve this what I did was to change the way we use the global variable DEFAULT_SESSION to make it possible to have one default session for each thread.
+1 for this one! |
Considering this is an async project, is there a specific reason for using aioboto3 in multiple threads over multiple tasks in the event loop, or using boto3 in threads? I don't at the moment see the use case. |
Hey @terrycain , Thanks for your answer. Let me try to explain my case. Suppose you have a task that is both: IO-bound and CPU-bound at the same time. For example, a task that gets lots of data from S3, organizes it in a table and executes a classification algorithm that uses some vector operations on all this data. In a kind of pyseudocode it could be like: async def task(data_chunk):
data = await get_data_from_s3_io_bound()
return execute_complex_vector_operation_cpu_bound(data) Now, think that you have lots of tasks like this to execute and that they can be carried on in parallel. In this case, you are going to have something like this in your pyseudocode: import concurrent
def execute_tasks_in_parallel():
chunks = range(1000)
with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
tasks = [ pool.submit(task, chunk) for chunk in chunks ]
for task_completed in concurrent.futures.as_completed(tasks):
do_something_with_task_result(task_completed.result()) Thus, we are going to have different threads trying to access the same DEFAULT_SESSION variable under the hood, and it crashes the execution sometimes, concluding as a scenario where we need a different session for each thread, otherwise, it's going to crash. I don't know if anyone else is experiencing this something like this, because it can be too specific, but I think thread-safety characteristic is something this library should have as its property. Feel free to discord at any point or even to clarify whatever you want to, I may be missing some points here because it is the first time I work with aioboto3. Thanks for the patience in analyzing this PR. |
Hey @terrycain , can you take a look at my answer here? |
Ok so these changes have the capability to store infinite copies of the session, especially long after a thread has ended which isn't great for memory usage. Is there anything stopping you from creating a One of the things I've been contemplating recently is actually dropping the entire global session as it can present problems for frameworks that stop the event loop and create new ones (e.g. aws chalice) |
Hey, I think that the idea of dropping the whole global session is a better
one, but we could have the solution I gave merged in the meantime. It is up
to you.
…On Thu, May 20, 2021 at 4:19 PM Terry Cain ***@***.***> wrote:
Ok so these changes have the capability to store infinite copies of the
session, especially long after a thread has ended which isn't great for
memory usage.
Is there anything stopping you from creating a Session() object yourself
and using that instead of relying on the global one.
One of the things I've been contemplating recently is actually dropping
the entire global session as it can present problems for frameworks that
stop the event loop and create new ones (e.g. aws chalice)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#228 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABS7Q7BHCRHG6MYFY3KVX3TOVOCZANCNFSM42TFHLDQ>
.
|
9.0.0 is out, there is no default session anymore. |
Awesome, thanks!
…On Sun, Jun 27, 2021, 3:44 PM Terry Cain ***@***.***> wrote:
9.0.0 is out, there is no default session anymore.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#228 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABS7Q5CROHVRSX4YBAASHLTU5WPHANCNFSM42TFHLDQ>
.
|
I have an implementation that uses threads to distribute calls to S3 resources. When using several threads at once, lots of errors emerged.
I figured out that it was because of this:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html#multithreading-or-multiprocessing-with-sessions
Meaning that Sessions can't be reused by different threads.
To solve this what I did was to change the way we use the global variable DEFAULT_SESSION to make it possible to have one default session for each thread.