DataMigrate with a detached container #1

tyler-cai-microsoft · 2023-08-09T22:08:50Z

I used runtime factory chaining
Did not implement quorum proposals - the solution will need it and understand what the challenges there are
Did not bulletproof clients from submitting ops - quorum proposal pausing. There is a partial solution implemented.
Did not close all containers
Transitions a shared cell to a shared map
Need to consider the whole running summarizer locked action as double submitting a summary will result in the summary not being uploaded properly.
Using a readonly context (detached == readonly)
Need to consider vamping the whole summarizer system from the summarizer class to allow summarizing with a specific summary
Need a way to access the summarizer when it gets created, I hacked my way around it.

vladsud · 2023-08-10T14:58:20Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+	return readonlyContext;
+}
+
+class ContainerRuntimeFactoryWithDataMigration extends RuntimeFactoryHelper {


might be worth adding comments above key classes to clarify their purpose.
If I got it right, this is an CRF adapter, that makes a call on what CRF instance to use depending on version and detached/attached state.

vladsud · 2023-08-10T14:59:39Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+	}
+}
+
+class SharedCellToSharedMapFactory implements IChannelFactory {


If I got right, this is essentially on-the-fly transformer for node in the graph (dds), if using this kind of language.

vladsud · 2023-08-10T15:16:35Z

packages/test/test-utils/src/testObjectProvider.ts

@@ -381,8 +387,11 @@ export class TestObjectProvider implements ITestObjectProvider {
 		entryPoint: fluidEntryPoint,
 		loaderProps?: Partial<ILoaderProps>,
 		requestHeader?: IRequestHeader,
+		packageEntries: Iterable<[IFluidCodeDetails, fluidEntryPoint]> = [


That's fine for a hack, but it's not great to have an argument that is not used if further arguments are present.
Not sure how many implementations of this interface are out there, but I'd consider adding new API instead on it

This was part of my attempt to do some mock of quorum proposals, feel free to ignore.

vladsud · 2023-08-10T15:18:58Z

packages/runtime/container-runtime/src/containerRuntime.ts

@@ -2825,6 +2891,16 @@ export class ContainerRuntime
 		);
 	}

+	public submitTransition() {


maybe startProposal() or submitTransitionProposal()?

vladsud · 2023-08-10T15:21:48Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+
+	const loadV2Container = async (summaryVersion: string): Promise<IContainer> =>
+		provider.loadContainer(
+			dataMigrationRuntimeFactoryV2,


code is a bit confusing to read - if you keep current hack around, I'd suggest passing undefined here (as this arg is not used) - it's a bit easier to read it that way (without need to remind constantly that first arg is not used)

vladsud · 2023-08-10T15:24:59Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+		runtimeFactoryV1,
+		runtimeFactoryV1ToV2,
+		runtimeFactoryV2,
+		deferred,


I'd suggest instead have a function on ContainerRuntimeFactoryWithDataMigration that returns this promise. Under cover (inside class) we can use same deferred approach, but having a function with a name and a call many lines below will be more readable as one could glimpse at least some context from the name - when I see where this differed is used 40 lines down, I have no context where it came from, and what are the semantics for it - need to drill into implementation of this class to get that.

vladsud · 2023-08-10T15:29:47Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+		summarizer.on("connected", () => {
+			waitForSummarizerRuntimeConnection.resolve();
+		});
+		await waitForSummarizerRuntimeConnection.promise;


I think even in your PR we use this pattern of setting up listener to resolve differed multiple times. And it's somewhat wrong as you do not use once(), you use on() :) I suggest creating a general-purpose helper function that takes arbitrary EventEmitter, arbitrary event name and returns a promise when such event name is called. Implementation will create deferred and do what you do here, but for callers it would be a single line call - await waitEvent(emitter, "connected").

vladsud · 2023-08-10T15:32:15Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+		await waitForSummarizerRuntimeConnection.promise;
+		const migrateAck = await migrate(summarizer, readonlyRuntime);
+		const summaryVersion = migrateAck.handle;
+		transitionContainer.close();


Consider using same prefix for different types of variables. For example, I think readonlyRuntime corresponds here to transitionContainer, if so - it would be great for both variables have either "transition" or "readonly" or both in their names. Same prefixes make it easier to follow relationships. Sometimes numbers are better, i.e. runtime1, runtime2, ... with some header at the top describing what 1/2/3 actually represent.

vladsud · 2023-08-10T16:00:30Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+		assert(codeDetails !== undefined, "get code details failed");
+
+		this._currentRuntimeFactory = codeDetails.package === "v1" ? this.v1 : this.v2;
+		if (this.clientDetailsType === detachedClientType) {


Something worth confirming (I think I made this mistake initially in my thinking, and I think you follow it):
I think CRF is supposed to be stateless object, and used to load many different files (i.e. there is one-many relationships for CFR-CR). If that's true, then I think we are in trouble with this approach.

If you look at IRuntimeFactory, it's simple - it has only one method. I think using RuntimeFactoryHelper here hurts more than it helps - if it's a single method, then that means all this code could be written in linear fashion, and then most of the state likely will be on stack, not as class remembers.

That said, I think the whole RuntimeFactoryHelper pattern is wrong. I'd break it into two classes (probably worth opening a task to do this work in the future, as it's outside of this work):

Class A, that implements IRuntimeFactory, but also takes a factory to be able to create instances of class B for each instantiation.

Class B, that has all the methods around pre-initializaiton, post-initializaiton, etc. And it's stateful, i.e. it's safe to have state because it's used to load only one instance of a runtime.

I'd kill "provide" nonsense from it, and make it not implement IRuntimeFactory, but implement all the other methods it has today. And in places where we need IRuntimeFactory, I'd use different class implementation that would create

I'm not in love with this implementation either. Glad this clarified that part.

vladsud · 2023-08-10T16:02:49Z

packages/runtime/container-runtime/src/containerRuntime.ts

+			this.submitSignalFn = () => {};
+		}
+
+		this.loadDetachedAndTransitionFn = async () => {


This feels a bit convoluted. On one hand I can see why you might want to put this code here - "/_detachedContainerRuntime" string seems to be ContainerRuntime business only. But otherwise this block seems like does not belong here - it belongs where loadDetachedAndTransitionFn() is called.

Maybe one reason why it's not there - this class has access to loader, which (if I got right) is relative loader (i.e. it operates only on current URI namespace). But do we need loader at all? Can we have detached context and runtime attached to it, and that's about it? I.e. could we avoid touching loader layer?

Your approach with having to use loader to create container object is fine as well, it's just so hard to follow the network of connections / code flow on how things are created and what is their state as functionality is spread out.

vladsud · 2023-08-10T16:04:17Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+	return newTree;
+}
+
+async function migrate(summarizerRuntime: ContainerRuntime, readonlyRuntime: ContainerRuntime) {


To be honest, I got lost here on what these runtimes are. Trying to follow the code, it feels like readonlyRuntime should be a detached runtime, but if so, I'm not sure why we call startSummary() on it, so I think I got it wrong (and thus got lost)

So if we call startSummary, we can call summarize and potentially get an incremental summary that is submittable.

If we don't call startSummary, we can't summarize because the summarizerNode system would break down. All clients utilize the summarizerNode system from what I understand. This may change in the future, I'm not sure.

vladsud · 2023-08-10T16:07:10Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+}
+
+async function migrate(summarizerRuntime: ContainerRuntime, readonlyRuntime: ContainerRuntime) {
+	(readonlyRuntime as any).summarizerNode.startSummary(


couple notes:

I think we need to do something here to ensure that summarizer does not produce a summary on its own, with V2 quorum state (as that's the state of it when we get here) but with V1 content.

In general, we should consider each object in the system to know what to do in response to various events (like quorum proposals), i.e. instead of poking at objects from outside and tell them what to do, I think we benefit a lot (where it's possible to accomplish) if objects (like summarizer) know what is expected from them in response to observing transition ops. This solves first point, but also reduces number of weird APIs we need to add to various layers to be able to pick through layers.

I agree with the notes.

vladsud · 2023-08-10T16:09:19Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+	const neverCancel = new Deferred<SummarizerStopReason>();
+	const runningSummarizer = (summarizerRuntime as any).summarizer.runningSummarizer;
+	await runningSummarizer.summarizingLock;
+	await runningSummarizer.lockedSummaryAction(


See comment above - I think this kind of locking should happen immediately in response to V2 acceptance op, such that summarizer can't produce one more summary with ref seq number of V2 acceptance op. That's fine as prototype - this is mostly invitation for discussion on how to organize code in the future to have proper correctness gurantees.

Yeah, the locking should happen on receival of the op.

vladsud · 2023-08-10T16:16:06Z

packages/test/test-end-to-end-tests/src/test/dataMigration.spec.ts

+		readonlyRuntime.deltaManager.lastSequenceNumber,
+		new TelemetryNullLogger(),
+	);
+	const appSummary = await readonlyRuntime.summarize({ fullTree: true });


I'd think you would use createSummary() call instead - I think it's more suitable here.
That said, I think I see the problem with my original thinking - createSummary() will use refSeqNumber = 0, and I believe you are trying to fix it. I wonder if we could generalize it, i.e. can we change our code to be able to create detached containers from arbitrary seq number? I'm sure it's not easy - I'm sure zero is encoded in other places (like driver), though I remember I did some work to remove such code and rely strictly on what Container provides.
I think we need to start collecting notes like that into some readme.md - things that are hard / we have to hack around. As they are negatives for this approach, and having too many of these could outweigh the benefits of this approach over some other approach. I.e. we should always think about it as an experiment, in search for best solution.

That all said, do we care what is the seq number? The app tree should not care, no? Where does it show up?

There's some metadata field that I had to modify in the submit summary function on the summarizer client. Look at the containerRuntime submit summary function.

vladsud · 2023-08-10T16:40:10Z

packages/runtime/container-runtime/src/containerRuntime.ts

@@ -212,6 +212,9 @@ export enum ContainerMessageType {
 	 * See the [IdCompressor README](./id-compressor/README.md) for more details.


This comment is for whole review (putting it here to allow proper discussion thread).

I believe what you have here is not export / import flow, where arbitrary transformation over file is possible. Instead, you are essentially doing conversion on the fly, in-place, i.e., the fact of loading container with special container runtime results in transformation of content and a new state. And the only transformations allowed are those that transform individual notes irrespective of the rest of container (where node is DDS or data store).

There are couple directions of exploration from here:

If developer needs arbitrary transformation (let's say merge 2 DDSs into one instance), how would we go about it?

Here, we might say - we do not care about incrementality / preservation of IDs. It's a full rewrite where we do not care if most of the content is the same or not.

If we are fine with only node-to-node transformations, can we do it somehow in-place inside of summarizer client, such that it could produce incremental summary?

the most complex - both of above, i.e. relatively arbitrary transformations inside of container runtime that result in incremental summary

we need to be careful though with "arbitrary" part here. I think we should allow arbitrary transformations, but I believe they could be constrained to - replace a node (with all of its children) to another node. Here node could be DDS, DataStore, or ContainerRuntime (in later case it means - wipe out all data stores and start from scratch creating any arbitrary set of data stores). I think the key here - developer has choices, but small incremental changes are possible.

I do not know if it adds clarify, but here is an example of something similar outside of our project:

https://www.inkandswitch.com/cambria/

https://github.com/stripe/dagon

The problem of changing data schema is very common in industry, and there is a bunch of interesting work out there. I'm not sure work itself is very applicable, but ideas are.

Here is my understanding how it works usually:

There is a framework that allows walking / enumerating through DAG of objects (in our case it's CR/DS (Datastores)/DDSs

a set of callbacks (or better - classes with methods, such that they can be stateful) that are called for nodes with ability to replace node with some other node.

Such callbacks/classes

could replace a node with another node. For example, data store with another data store instance.

There could be multiple passes over the graph, such that these classes could collect info on first pass, and act on this info on second pass. I.e. imagine that on first pass it collects info about all Directory instances in container, and on second pass it created one SharedTree instance and patches all other DDSs / Datastores that had reference to Directory instance.

GC subsystem is invoked at the end to collect nodes that are not referenced any more.

That said, all such systems usually work on DAG. If we count handles, we deal with arbitrary graph, not DAG. And it's not clear if we want to support such arbitrary transformations that could leave dangling handles. We might want to say initially - it should be possible, and we will explore it later, but for now we allow only simplistic transformation - DDS node to DDS node.

I have no way how to do it, this is mostly discussion allowed to see if we can see some interesting next steps on that way - the way to more generalized system in the future but leveraging principles of that system even today to implement in-place incremental substitution of DDS instance.

Let's chat more :)

Here is my poor-man's implementation suggestion: imagine that all parent nodes (container runtime, data store) provide a method to enumerate children and replace child with another (new) child, or patch child in-place (like make changes inside of DDS through normal APIs, but this does not result in ops - though we do not need it now, so defer for later implementation of it). These methods are not exposed on any interfaces, only ContainerRuntime (and CRF) have access to them. For now, we will use only leaf node replacement (i.e. replace DataStore child) for our needs, and will expose transformer that walks the tree and asks transformer if a node (DDS) needs to be replaced. Our transformer will look only for specific factory type (Legacy Shared Tree) and will involve abstract method to replace it with another DDS instance.

Something like that.

DataMigrate with a detached container

779f4b0

github-actions bot added community-contribution base: main area: contributor experience area: driver area: runtime area: server area: tests public api change labels Aug 9, 2023

tyler-cai-microsoft added 3 commits August 9, 2023 15:41

Cleanup

841c09e

Remove quorum code

f37c839

Add back gc

0c2df40

vladsud reviewed Aug 10, 2023

View reviewed changes

tyler-cai-microsoft added 3 commits August 14, 2023 16:13

Use summarizer client without detached container

fe86064

Verify that the deltamanager does not process any ops

ce6ef42

Add comments and simplify code

74b98ed

tyler-cai-microsoft and others added 6 commits August 15, 2023 14:29

More code simplification and readability

55cf482

Add preservation of Id

9ab6c61

code cleanup

097f298

Use generic dataObject for modification

13f785b

Merge branch 'main' into prototype-summarize-detached

c6a1e9b

other merge main changes

acce783

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataMigrate with a detached container #1

DataMigrate with a detached container #1

tyler-cai-microsoft commented Aug 9, 2023 •

edited

Loading

vladsud Aug 10, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

tyler-cai-microsoft Aug 15, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

vladsud Aug 10, 2023

		@@ -212,6 +212,9 @@ export enum ContainerMessageType {
		* See the [IdCompressor README](./id-compressor/README.md) for more details.

DataMigrate with a detached container #1

Are you sure you want to change the base?

DataMigrate with a detached container #1

Conversation

tyler-cai-microsoft commented Aug 9, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tyler-cai-microsoft commented Aug 9, 2023 •

edited

Loading