Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stores #86

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Improve stores #86

wants to merge 17 commits into from

Conversation

hbcarlos
Copy link
Collaborator

Changes the stores to have one instance handling multiple documents instead of instantiating one store per document.

@hbcarlos hbcarlos marked this pull request as draft September 29, 2023 12:31
@davidbrochart
Copy link
Collaborator

Can you explain why this is needed?

@hbcarlos
Copy link
Collaborator Author

We need more control over the documents that are stored.
For example, check if the document exists before trying to load it from the store, remove documents if every client leaves the room or we need to reset the content of the room, list the documents, and get their updates to create a history and revert to that point, etc.

I don't think the stores should be making decisions about what to do with the document, the room or the server should make those decisions.

@davidbrochart
Copy link
Collaborator

I may be missing something, but I don't understand why stores shouldn't be independent. Actually, I think they should even live in a separate package. That way we could use them for other transport layers than WebSockets.

check if the document exists before trying to load it from the store

The store is created from a document, so an existing store ensures the document exists.

remove documents if every client leaves the room

We could add a method to the current YStores, to remove all updates of a document.

list the documents

That should be at the WebSocket server IMO, not the stores.

get their updates to create a history and revert to that point

I don't see why it's not currently possible.

@fcollonval
Copy link
Collaborator

list the documents

That should be at the WebSocket server IMO, not the stores.

That kind of feature is useful for maintenance or debug tooling. It makes sense to provide an API to introspect stored documents outside of the collaboration context.


I'm trying to compare the two approaches to get a balance of pros and cons; here is a starting point:

Manager of YStores

Current code encapsulated in a store manager.

class StoreManager(Mapping[str, YStore]):

     def __init__(self, store_factory):
         self._factory = store_factory

     def list(self):
         return self.keys()
  
     def __get(self, path):
         # Create store if it does not exist
         # start get called here I guess?
         return store;

     def delete(self, path):
         # Destroy/stop the store ?
         self.__get(path).delete()
        
     def write(self, path, data):
         self.__get(path).write(data)

     def read(self):
         return self.__get(path).read(data)
         

class BaseYStore:

     @abstractmethod
     def write(self, data):
         pass
    
     @abstractmethod
     def read(self):
         pass

      # Option for start and stop who should be responsible for this?

# Create storage manager
def factory(...):
     return MyStore(...)

manager = StoreManager(store_factory=factory)

To avoid a risk of inconsistency the document store should not be accessed directly as otherwise the caller can temper with the lifecycle (start and stop) of each document store. And it is harder to control that lifecycle as caller can keep reference of a store they should not.

Question: Are YStore stateful? If not - what I think it should -, what is the advantage of keeping in memory a YStore per document that is a stateless actor to carry out IO operations?

YStoreManager

This proposal

class BaseStoreManager:

     async def list(self):
         return self.keys()

     async def delete(self, path):
         # Destroy/stop the store

     @abstractmethod
     async def write(self, path, data):
         pass
     
     @abstractmethod
     async def read(self, path):
         pass

     # Start and stop could be handled by the
     # object initiating the store manager or 
     # within the object.

# Create storage manager
manager = MyStoreManager(...)

One advantage I see with this proposal is a simpler API.

@davidbrochart
Copy link
Collaborator

The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore and SQLiteYStore), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.

@JohanMabille
Copy link

JohanMabille commented Oct 5, 2023

To avoid a risk of inconsistency the document store should not be accessed directly

This constraint leads to duplicating the Store API in the StoreManager, therefore I agree that it complicates the design. Unless we need to use different stores in the same StoreManager, or we relax this constraint, the second solution looks better (the Store hierarchy class becomes an implementation detail for the end user, so let's keep it simple).

@fcollonval
Copy link
Collaborator

The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore and SQLiteYStore), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.

You could easily achieve it with a multiplexer manager as done for example by some people for the jupyter server content manager.

@davidbrochart
Copy link
Collaborator

I agree that it's better to do simple things easily, and more complicated things with more effort, so let's go with the "YStoreManager" solution.
Thinking more about it, this idea of a YStore managing multiple documents was there at the beginning anyway. That's why for instance a FileYStore is not useful on its own, only when used e.g. in a TempFileYStore to write files in a common directory. An SQLiteYStore also uses a common backend (a DB).

@hbcarlos hbcarlos self-assigned this Oct 5, 2023
@hbcarlos hbcarlos added the enhancement New feature or request label Oct 5, 2023
@hbcarlos hbcarlos marked this pull request as ready for review October 5, 2023 14:34
ypy_websocket/stores/sqlite_store.py Outdated Show resolved Hide resolved
ypy_websocket/stores/file_store.py Show resolved Hide resolved
ypy_websocket/stores/sqlite_store.py Show resolved Hide resolved
@davidbrochart
Copy link
Collaborator

Thanks, I'll take a closer look soon.

Copy link
Collaborator

@fcollonval fcollonval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hbcarlos

I have one question. Otherwise code looks good for me.

ypy_websocket/stores/base_store.py Show resolved Hide resolved
ypy_websocket/stores/file_store.py Show resolved Hide resolved
ypy_websocket/stores/file_store.py Show resolved Hide resolved
ypy_websocket/stores/sqlite_store.py Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
ypy_websocket/stores/file_store.py Outdated Show resolved Hide resolved
tests/test_file_store.py Outdated Show resolved Hide resolved
tests/test_file_store.py Outdated Show resolved Hide resolved
tests/test_file_store.py Outdated Show resolved Hide resolved
ypy_websocket/stores/utils.py Outdated Show resolved Hide resolved
path = tmp_path / "tmp"
store = FileYStore(str(path))
await store.start()
await store.initialize()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what is the difference between start and initialize?
From what I can see, start now does nothing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialize is used to create or ensure all the resources needed are available before using the store. I moved it out of start because the entity that calls it should be the one deciding whether to call it and forget about it or wait until it finishes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the new initialize is the old start. What is start used for now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To create the task group because I saw that some other classes are adding tasks there.

@@ -120,9 +120,6 @@ def on_message(self, value: Callable[[bytes], Awaitable[bool] | bool] | None):
self._on_message = value

async def _broadcast_updates(self):
if self.ystore is not None and not self.ystore.started.is_set():
self._task_group.start_soon(self.ystore.start)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how the store is started?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One store contains multiple documents now, so it is no longer the responsibility of the room to start the store.
The store should be initialized by the same entity that instantiates it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, and in ypy-websocket where is it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not in ypy-websocket. Where is the store instantiated in ypy-websocket?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The store is instantiated outside, but I think it should be the responsibility of the WebSocket server to start and stop the store, since the store lifetime is tied to the server's.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other entities can access the store by starting and stopping it as they wish

Yes, but starting and stopping don't give you fine-grain control over what to start/stop.
We are designing a store that multiple rooms are going to access at the same time I can not just cancel everything I will be cancelling tasks from other rooms!

But it is better for users of ypy-websocket to not care about starting and stopping the store

No is not, less control is never better

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put aside the question of who has the responsibility to start/stop the store for now.
My point is that you basically reverted the use of AnyIO in the way a store is started. A store should create a root task group in which every other tasks are run. Cancelling the root task group cancels all the tasks that were launched in it. It's the whole point of using AnyIO. It ensures that no task is running when the store is stopped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A store should create a root task group in which every other tasks are run. Cancelling the root task group cancels all the tasks that were launched in it. It's the whole point of using AnyIO. It ensures that no task is running when the store is stopped

No, it is not. The point of AnyIO is that every task should have a parent (a task group that handles its life cycle), but It doesn't have to be the same parent for every task. At the same time, the parent doesn't have to be in the class that implements the logic of a task.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is the decision I made for ypy-websocket in general, and stores in particular. I want a store to be self-contained, as far as the tasks it launches. No task should escape from it. See #86 (comment).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with that decision, the store shouldn't be self-contained because the store is not the one launching those tasks.

ypy_websocket/stores/base_store.py Outdated Show resolved Hide resolved
Comment on lines 68 to 71
self._task_group = create_task_group()
self.started.set()
self._starting = False
task_status.started()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the starting logic that was launching the initialization task in a task group. This start method now seems useless. It seems that you moved the logic to the initialize method, but without the benefit of launching it in a task group, which was the whole point of using AnyIO.
Correct me if I'm wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the initialize because the entity that calls it should decide to wait until it is done or forget about it.

Looking at the code now, I don't think we should use AnyIO in the stores. We should use it in the rooms or the server to organize the different tasks, but not here.

For example, if a room is writing to the store while also reading, then the room should have a task group called write_tasks, and before cleaning that room, we should wait for that task group to finish, but we can cancel the reading task.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we can cancel the reading task.

You can cancel tasks with AnyIO.

The goal of the start method was to launch an initialization task in the background, but read/write operations must wait until initialization is done. This way starting the WebSocket server can be quick, if no access to the store is done yet. You should restore this behavior.

Copy link
Collaborator Author

@hbcarlos hbcarlos Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can accomplish the same behavior from outside and have more control over the different tasks running.
Creating a task group and adding every task there. This is not a good practice. We should differentiate between tasks., and the lifecycle of a task should be handled by the entity that launches the task not by the store.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a task group and adding every task there is not a good practice.

I disagree, it's the whole point of structured concurrency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are misunderstanding the idea of structured concurrency.

Where does it say every task should be in the same group? Where does it say every task should have the same direct parent?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In structured concurrency, you have to launch a task in a task group. You can of course nest task groups, but cancelling the root task group cancels all the (sub)tasks.
In ypy-websocket, I made the choice for stores to have a root task group that makes it easy to start and stop them. All tasks a store creates are contained in it, they cannot leak outside.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tasks a store creates are contained in it, they cannot leak outside.

The problem is that the store doesn't create those tasks. It is not the store that calls self.write() or self.read().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because stores have not been completely separated out of ypy-websocket (see #19), but you can see that e.g. write() is launched in a task group. Ideally, we should use the store's task group instead of ypy-websocket's, but the point is that no task is launched in the wild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants