Improve stores #86

hbcarlos · 2023-09-29T12:31:48Z

Changes the stores to have one instance handling multiple documents instead of instantiating one store per document.

davidbrochart · 2023-09-29T13:09:07Z

Can you explain why this is needed?

hbcarlos · 2023-09-29T13:27:43Z

We need more control over the documents that are stored.
For example: check whether a document exists before trying to load it from the store; remove documents when every client leaves the room or when we need to reset the room's content; list the documents; and get their updates to build a history and revert to a given point.

I don't think the stores should be making decisions about what to do with the document, the room or the server should make those decisions.
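For illustration, here is a minimal in-memory sketch of the operations listed above. Every name in it (`DocumentLog`, `exists`, `append`, `history`, `remove`) is hypothetical and not part of the ypy-websocket API. It assumes that reverting can be done by replaying the first N stored updates into a fresh document.

```python
class DocumentLog:
    """Hypothetical multi-document store keeping ordered updates per path."""

    def __init__(self):
        self._updates = {}  # path -> list of raw updates, oldest first

    def exists(self, path):
        # Lets the room or server check before trying to load a document.
        return path in self._updates

    def append(self, path, update):
        updates = self._updates.setdefault(path, [])
        updates.append(update)
        return len(updates)  # position of this update in the history

    def list(self):
        return sorted(self._updates)

    def history(self, path, until=None):
        # Updates up to (not including) index `until`; replaying them into
        # a fresh document is assumed to restore that point in time.
        if not self.exists(path):
            raise KeyError(path)
        return list(self._updates[path][:until])

    def remove(self, path):
        # Called by the room or server, e.g. when every client has left.
        self._updates.pop(path, None)


log = DocumentLog()
for update in (b"u1", b"u2", b"u3"):
    log.append("notebook.ipynb", update)
print(log.history("notebook.ipynb", until=2))
```

The store only answers questions here; deciding when to remove or revert stays with the caller.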

davidbrochart · 2023-09-29T13:49:52Z

I may be missing something, but I don't understand why stores shouldn't be independent. Actually, I think they should even live in a separate package. That way we could use them for other transport layers than WebSockets.

check if the document exists before trying to load it from the store

The store is created from a document, so an existing store ensures the document exists.

remove documents if every client leaves the room

We could add a method to the current YStores, to remove all updates of a document.

list the documents

That should be at the WebSocket server IMO, not the stores.

get their updates to create a history and revert to that point

I don't see why it's not currently possible.

fcollonval · 2023-10-04T13:54:28Z

list the documents

That should be at the WebSocket server IMO, not the stores.

That kind of feature is useful for maintenance or debug tooling. It makes sense to provide an API to introspect stored documents outside of the collaboration context.

I'm trying to compare the two approaches to get a balance of pros and cons; here is a starting point:

Manager of YStores

Current code encapsulated in a store manager.

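from abc import ABC, abstractmethod
from collections.abc import Mapping


class BaseYStore(ABC):

    @abstractmethod
    def write(self, data):
        pass

    @abstractmethod
    def read(self):
        pass

    @abstractmethod
    def delete(self):
        # Needed because StoreManager.delete() forwards to it
        pass

    # Option for start and stop: who should be responsible for this?


class StoreManager(Mapping[str, BaseYStore]):

    def __init__(self, store_factory):
        self._factory = store_factory
        self._stores = {}

    # Mapping interface over the stores created so far
    def __getitem__(self, path):
        return self._stores[path]

    def __iter__(self):
        return iter(self._stores)

    def __len__(self):
        return len(self._stores)

    def list(self):
        return self.keys()

    def _get(self, path):
        # Create the store if it does not exist
        # (assumes the factory takes the document path).
        # start gets called here, I guess?
        if path not in self._stores:
            self._stores[path] = self._factory(path)
        return self._stores[path]

    def delete(self, path):
        # Destroy/stop the store?
        self._get(path).delete()

    def write(self, path, data):
        self._get(path).write(data)

    def read(self, path):
        return self._get(path).read()


# Create storage manager
def factory(...):
    return MyStore(...)

manager = StoreManager(store_factory=factory)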

To avoid a risk of inconsistency, the document store should not be accessed directly; otherwise the caller can tamper with the lifecycle (start and stop) of each document store. That lifecycle is also harder to control, because a caller can keep a reference to a store it should no longer use.

Question: Is a YStore stateful? If not (which I think should be the case), what is the advantage of keeping one YStore per document in memory, when it is a stateless actor that only carries out IO operations?
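To make the lifecycle concern concrete, here is a small runnable sketch. `MemoryStore`, its `started` flag and `stop()` are hypothetical stand-ins, not the real YStore classes. It shows a caller that keeps a direct reference to a store after the manager has deleted it.

```python
class MemoryStore:
    """Hypothetical per-document store with a simple lifecycle flag."""

    def __init__(self, path):
        self.path = path
        self.started = True
        self.updates = []

    def write(self, data):
        if not self.started:
            raise RuntimeError(f"store for {self.path!r} is stopped")
        self.updates.append(data)

    def stop(self):
        self.started = False


class StoreManager:
    """Owns the stores and exposes only path-based operations."""

    def __init__(self, store_factory):
        self._factory = store_factory
        self._stores = {}

    def _get(self, path):
        if path not in self._stores:
            self._stores[path] = self._factory(path)
        return self._stores[path]

    def write(self, path, data):
        self._get(path).write(data)

    def delete(self, path):
        store = self._stores.pop(path, None)
        if store is not None:
            store.stop()


manager = StoreManager(MemoryStore)
manager.write("doc", b"u1")
leaked = manager._get("doc")   # a caller holding a direct reference
manager.delete("doc")          # the manager stops and forgets the store
manager.write("doc", b"u2")    # fine: the manager creates a fresh store

try:
    leaked.write(b"u3")        # the stale reference is now out of sync
    stale_write_failed = False
except RuntimeError:
    stale_write_failed = True
```

Going through the manager keeps working after the delete, while the leaked reference points at a stopped store. That is the inconsistency the manager is meant to rule out.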

YStoreManager

This proposal

from abc import ABC, abstractmethod


class BaseStoreManager(ABC):

    @abstractmethod
    async def list(self):
        # Return the paths of the stored documents
        pass

    @abstractmethod
    async def delete(self, path):
        # Destroy/stop the store
        pass

    @abstractmethod
    async def write(self, path, data):
        pass

    @abstractmethod
    async def read(self, path):
        pass

    # Start and stop could be handled by the
    # object initiating the store manager or
    # within the object.

# Create storage manager
manager = MyStoreManager(...)

One advantage I see with this proposal is a simpler API.
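As a rough illustration of that simpler API, the sketch below implements the four methods with an in-memory dictionary. `InMemoryStoreManager` is a hypothetical name and the storage is a stand-in for a real backend such as files or SQLite.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseStoreManager(ABC):
    @abstractmethod
    async def list(self): ...

    @abstractmethod
    async def delete(self, path): ...

    @abstractmethod
    async def write(self, path, data): ...

    @abstractmethod
    async def read(self, path): ...


class InMemoryStoreManager(BaseStoreManager):
    """Hypothetical manager: one instance handles every document."""

    def __init__(self):
        self._updates = {}  # path -> list of updates

    async def list(self):
        return sorted(self._updates)

    async def delete(self, path):
        self._updates.pop(path, None)

    async def write(self, path, data):
        self._updates.setdefault(path, []).append(data)

    async def read(self, path):
        if path not in self._updates:
            raise KeyError(path)
        return list(self._updates[path])


async def demo():
    manager = InMemoryStoreManager()
    await manager.write("a.ipynb", b"1")
    await manager.write("a.ipynb", b"2")
    await manager.write("b.ipynb", b"3")
    return await manager.list(), await manager.read("a.ipynb")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Callers never see a per-document store object, so there is no second API to keep in sync.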

davidbrochart · 2023-10-04T14:24:39Z

The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore and SQLiteYStore), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.

JohanMabille · 2023-10-05T07:18:36Z

To avoid a risk of inconsistency the document store should not be accessed directly

This constraint leads to duplicating the Store API in the StoreManager, therefore I agree that it complicates the design. Unless we need to use different stores in the same StoreManager, or we relax this constraint, the second solution looks better (the Store hierarchy class becomes an implementation detail for the end user, so let's keep it simple).

fcollonval · 2023-10-05T07:22:10Z

The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore and SQLiteYStore), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.

You could easily achieve it with a multiplexer manager as done for example by some people for the jupyter server content manager.
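A multiplexer of that kind could look roughly like the sketch below. It assumes routing by path prefix, which is only one possible policy. `MultiplexerManager` and `InMemoryManager` are hypothetical names, and the in-memory backends stand in for file-based or SQLite-based managers.

```python
import asyncio


class InMemoryManager:
    """Hypothetical backend keeping updates per path in memory."""

    def __init__(self):
        self._docs = {}

    async def write(self, path, data):
        self._docs.setdefault(path, []).append(data)

    async def read(self, path):
        return list(self._docs.get(path, []))

    async def list(self):
        return sorted(self._docs)


class MultiplexerManager:
    """Forwards each call to the backend whose prefix matches the path."""

    def __init__(self, routes, default):
        self._routes = routes    # prefix -> backend manager
        self._default = default  # used when no prefix matches

    def _backend(self, path):
        for prefix, manager in self._routes.items():
            if path.startswith(prefix):
                return manager
        return self._default

    async def write(self, path, data):
        await self._backend(path).write(path, data)

    async def read(self, path):
        return await self._backend(path).read(path)

    async def list(self):
        paths = []
        for manager in [*self._routes.values(), self._default]:
            paths.extend(await manager.list())
        return paths


async def demo():
    db, files = InMemoryManager(), InMemoryManager()
    mux = MultiplexerManager({"notebooks/": db}, default=files)
    await mux.write("notebooks/a.ipynb", b"1")
    await mux.write("readme.md", b"2")
    return await db.list(), await files.list(), await mux.list()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The multiplexer exposes the same interface as its backends, so heterogeneous storage stays an opt-in layer on top of the simple manager.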

davidbrochart · 2023-10-05T07:42:58Z

I agree that it's better to do simple things easily, and more complicated things with more effort, so let's go with the "YStoreManager" solution.
Thinking more about it, this idea of a YStore managing multiple documents was there at the beginning anyway. That's why for instance a FileYStore is not useful on its own, only when used e.g. in a TempFileYStore to write files in a common directory. An SQLiteYStore also uses a common backend (a DB).


davidbrochart · 2023-10-06T08:33:33Z

Thanks, I'll take a closer look soon.

fcollonval

Thanks @hbcarlos

I have one question. Otherwise the code looks good to me.


davidbrochart · 2023-10-09T12:54:55Z

tests/test_file_store.py

+    path = tmp_path / "tmp"
+    store = FileYStore(str(path))
+    await store.start()
+    await store.initialize()


Can you explain what is the difference between start and initialize?
From what I can see, start now does nothing?

The initialize is used to create or ensure all the resources needed are available before using the store. I moved it out of start because the entity that calls it should be the one deciding whether to call it and forget about it or wait until it finishes.

It seems to me that the new initialize is the old start. What is start used for now?

To create the task group because I saw that some other classes are adding tasks there.

davidbrochart · 2023-10-09T12:57:06Z

ypy_websocket/yroom.py

@@ -120,9 +120,6 @@ def on_message(self, value: Callable[[bytes], Awaitable[bool] | bool] | None):
        self._on_message = value

    async def _broadcast_updates(self):
-        if self.ystore is not None and not self.ystore.started.is_set():
-            self._task_group.start_soon(self.ystore.start)
-


Can you explain how the store is started?

One store contains multiple documents now, so it is no longer the responsibility of the room to start the store.
The store should be initialized by the same entity that instantiates it.

OK, and in ypy-websocket where is it?

It is not in ypy-websocket. Where is the store instantiated in ypy-websocket?

The store is instantiated outside, but I think it should be the responsibility of the WebSocket server to start and stop the store, since the store lifetime is tied to the server's.

Other entities can access the store by starting and stopping it as they wish

Yes, but starting and stopping don't give you fine-grained control over what to start/stop.
We are designing a store that multiple rooms are going to access at the same time. I cannot just cancel everything: I would be cancelling tasks from other rooms!

But it is better for users of ypy-websocket to not care about starting and stopping the store

No, it is not; less control is never better.

Let's put aside the question of who has the responsibility to start/stop the store for now.
My point is that you basically reverted the use of AnyIO in the way a store is started. A store should create a root task group in which all other tasks are run. Cancelling the root task group cancels all the tasks that were launched in it. It's the whole point of using AnyIO. It ensures that no task is running when the store is stopped.

A store should create a root task group in which all other tasks are run. Cancelling the root task group cancels all the tasks that were launched in it. It's the whole point of using AnyIO. It ensures that no task is running when the store is stopped.

No, it is not. The point of AnyIO is that every task should have a parent (a task group that handles its life cycle), but it doesn't have to be the same parent for every task. At the same time, the parent doesn't have to be in the class that implements the logic of a task.

But this is the decision I made for ypy-websocket in general, and stores in particular. I want a store to be self-contained, as far as the tasks it launches. No task should escape from it. See #86 (comment).

I disagree with that decision, the store shouldn't be self-contained because the store is not the one launching those tasks.


davidbrochart · 2023-10-09T13:08:21Z

ypy_websocket/stores/sqlite_store.py

+        self._task_group = create_task_group()
+        self.started.set()
+        self._starting = False
+        task_status.started()


You removed the starting logic that was launching the initialization task in a task group. This start method now seems useless. It seems that you moved the logic to the initialize method, but without the benefit of launching it in a task group, which was the whole point of using AnyIO.
Correct me if I'm wrong?

I moved the initialize because the entity that calls it should decide to wait until it is done or forget about it.

Looking at the code now, I don't think we should use AnyIO in the stores. We should use it in the rooms or the server to organize the different tasks, but not here.

For example, if a room is writing to the store while also reading, then the room should have a task group called write_tasks, and before cleaning that room, we should wait for that task group to finish, but we can cancel the reading task.
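A rough sketch of that room-side policy follows. It uses plain `asyncio` from the standard library as a stand-in for AnyIO, and all names (`write`, `read_forever`, `close_room`) are hypothetical. Pending writes are awaited before the room is cleaned up, while the reading task is simply cancelled.

```python
import asyncio


async def write(store, data):
    await asyncio.sleep(0.01)  # stand-in for real IO
    store.append(data)


async def read_forever(store):
    while True:
        await asyncio.sleep(0.01)  # stand-in for polling the store


async def close_room():
    store = []
    # The room, not the store, owns and tracks these tasks.
    write_tasks = [asyncio.create_task(write(store, d)) for d in (b"a", b"b")]
    read_task = asyncio.create_task(read_forever(store))

    # Writes must not be lost: wait for all of them to finish.
    await asyncio.gather(*write_tasks)

    # Reading can be abandoned: cancel it and wait for it to unwind.
    read_task.cancel()
    try:
        await read_task
    except asyncio.CancelledError:
        pass
    return sorted(store), read_task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(close_room()))
```

The two kinds of tasks get different treatment only because the caller keeps them apart, which is the control being argued for here.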

but we can cancel the reading task.

You can cancel tasks with AnyIO.

The goal of the start method was to launch an initialization task in the background, but read/write operations must wait until initialization is done. This way starting the WebSocket server can be quick, if no access to the store is done yet. You should restore this behavior.
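The behavior described above can be sketched as follows. This is not the ypy-websocket implementation: `LazyStore` is a hypothetical name, and plain `asyncio` with `asyncio.create_task` is used only to keep the example dependency-free, whereas the original code launched the initialization in the store's AnyIO task group.

```python
import asyncio


class LazyStore:
    """Hypothetical store whose start() returns before initialization ends."""

    def __init__(self):
        self.initialized = asyncio.Event()
        self.updates = []
        self._init_task = None

    async def _initialize(self):
        await asyncio.sleep(0.01)  # stand-in for creating files or tables
        self.initialized.set()

    def start(self):
        # Returns immediately; initialization continues in the background.
        self._init_task = asyncio.create_task(self._initialize())

    async def write(self, data):
        await self.initialized.wait()  # block until the store is ready
        self.updates.append(data)

    async def read(self):
        await self.initialized.wait()
        return list(self.updates)


async def main():
    store = LazyStore()
    store.start()
    ready_right_after_start = store.initialized.is_set()
    await store.write(b"update")  # transparently waits for initialization
    return ready_right_after_start, await store.read()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Starting is cheap because nothing waits for the backend until the first read or write.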

You can accomplish the same behavior from outside and have more control over the different tasks running.
Creating a task group and adding every task there is not a good practice. We should differentiate between tasks, and the lifecycle of a task should be handled by the entity that launches the task, not by the store.

Creating a task group and adding every task there is not a good practice.

I disagree, it's the whole point of structured concurrency.

I think you are misunderstanding the idea of structured concurrency.

Where does it say every task should be in the same group? Where does it say every task should have the same direct parent?

In structured concurrency, you have to launch a task in a task group. You can of course nest task groups, but cancelling the root task group cancels all the (sub)tasks.
In ypy-websocket, I made the choice for stores to have a root task group that makes it easy to start and stop them. All tasks a store creates are contained in it, they cannot leak outside.

All tasks a store creates are contained in it, they cannot leak outside.

The problem is that the store doesn't create those tasks. It is not the store that calls self.write() or self.read().

This is because stores have not been completely separated out of ypy-websocket (see #19), but you can see that e.g. write() is launched in a task group. Ideally, we should use the store's task group instead of ypy-websocket's, but the point is that no task is launched in the wild.

hbcarlos marked this pull request as draft September 29, 2023 12:31

hbcarlos mentioned this pull request Oct 4, 2023

Fix empty YNotebook jupyter-server/jupyter_ydoc#189

Open

hbcarlos added 9 commits October 5, 2023 15:55

Split stores into different files

8bc7d9c

Removes the old stores

64c4ad3

Creates a global store and improves the API

a87c0bb

Updates the SQLite store

bdd4f9e

Updates the file store

3fddf19

Include updates when requesting a document

57b2d70

Fixes type errors

2eb2f17

Adds a flag to include the updates when retrieving a doc

ecb8680

Update version

6f84e6f

hbcarlos force-pushed the improve_stores branch from 5a1adb7 to 6f84e6f Compare October 5, 2023 14:03

Ignore mypy error

32c9063

hbcarlos self-assigned this Oct 5, 2023

hbcarlos added the enhancement New feature or request label Oct 5, 2023

hbcarlos marked this pull request as ready for review October 5, 2023 14:34

davidbrochart reviewed Oct 5, 2023


hbcarlos mentioned this pull request Oct 5, 2023

Fixes the initialization of rooms jupyterlab/jupyter-collaboration#198

Draft

hbcarlos added 2 commits October 6, 2023 10:05

Review

2515a79

Fixes types

746a62b

fcollonval reviewed Oct 6, 2023


Review

abe39d9

davidbrochart requested changes Oct 9, 2023


hbcarlos added 4 commits October 9, 2023 18:09

Review

50e03b4

Removes start/stop

dd72571

pre-commit

e876a80

Fix windows

999701d

davidbrochart mentioned this pull request Dec 12, 2023

State of the 2.x branch? jupyterlab/jupyter-collaboration#222

Open


