Skip to content

(tree) Renamed some shape encoding entities in chunked forest #24777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 11, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -43,13 +43,14 @@ export type BufferFormat<TEncodedShape> = (
)[];

/**
* Replace shapes and identifiers in buffer and any nested arrays.
* Takes already encoded content in `buffer` and deeply searches, replacing each `IdentifierToken` and `Shape`
* with references to their encodings in the returned `EncodedFieldBatchGeneric`.
*
* This looks inside nested arrays (including transitively) but not inside objects.
*
* Note that this modifies `buffer` to avoid having to copy it.
*/
export function handleShapesAndIdentifiers<TEncodedShape>(
export function updateShapesAndIdentifiersEncoding<TEncodedShape>(
version: number,
buffer: BufferFormat<TEncodedShape>[],
identifierFilter: CounterFilter<string> = jsonMinimizingFilter,
@@ -68,6 +69,9 @@ export function handleShapesAndIdentifiers<TEncodedShape>(
}
};

// A collection of all arrays transitively reachable (via arrays) from buffer.
// These are all the arrays whose contents are searched for shapes and identifiers which are then replaced
// with references. This uses a breadth first traversal, but the actual order is not important.
const arrays: BufferFormat<TEncodedShape>[] = [buffer];
for (const array of arrays) {
for (const item of array) {
@@ -93,9 +97,9 @@ export function handleShapesAndIdentifiers<TEncodedShape>(

// Traverse shape graph, discovering and counting all shape to shape and shape to identifier references.
{
let item: Shape<TEncodedShape> | undefined;
while ((item = shapeToCount.pop()) !== undefined) {
item.count(identifiers, shapeDiscovered);
let shape: Shape<TEncodedShape> | undefined;
while ((shape = shapeToCount.pop()) !== undefined) {
shape.countReferencedShapesAndIdentifiers(identifiers, shapeDiscovered);
}
}

@@ -143,27 +147,28 @@ export function handleShapesAndIdentifiers<TEncodedShape>(
* Comparison by content would be difficult due to shape containing references to other shapes.
*
* @privateRemarks
* Unlike with identifiers, conversion from the initial form (this class / IdentifierToken) is done by the `encodeShape` method, not by general purpose logic in `handleShapesAndIdentifiers`.
* For `handleShapesAndIdentifiers` to do the conversion without help from `encodeShape`,
* Unlike with identifiers, conversion from the initial form (this class / IdentifierToken) is done by the `encodeShape` method, not by general purpose logic in `updateShapesAndIdentifiersEncoding`.
* For `updateShapesAndIdentifiersEncoding` to do the conversion without help from `encodeShape`,
* instances of this Shape class would have to either be or output an object that is identical to the `TEncodedShape` format except with all shape references as object references instead of indexes.
* Those objects would have to be deeply traversed looking for shape objects to replace with reference indexes.
* This is possible, but making it type safe would involve generating derived types from the `TEncodedShape` deeply replacing any shape references, as well as requiring deep traversal of all objects in the encoded output.
* Such an approach seemed less maintainable and readable than the design taken here which avoids the need for those derived types.
*/
export abstract class Shape<TEncodedShape> {
/**
* Count this shape's contents.
* Enumerate all contents of this shape that can get replaced with references (for better compression/deduplication) during encoding.
* This currently includes shapes and identifiers.
*
* Used to discover referenced shapes (to ensure they are included in the `shapes` passed to `encodeShape`),
* as well as count usages of shapes and identifiers for more efficient dictionary encoding. See {@link Counter}.
*
* @param shapes - must be invoked with each directly referenced shape (which must provided to `encodeShape`).
* @param shapeDiscovered - must be invoked with each directly referenced shape (which must be provided to `encodeShape`).
* Can be invoked multiple times if a shape is referenced more than once for more efficient dictionary encoding.
* Should not be invoked with `this` unless this shape references itself.
*/
public abstract count(
public abstract countReferencedShapesAndIdentifiers(
identifiers: Counter<string>,
shapes: (shape: Shape<TEncodedShape>) => void,
shapeDiscovered: (shape: Shape<TEncodedShape>) => void,
): void;

/**
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ import type { Counter, DeduplicationTable } from "./chunkCodecUtilities.js";
import {
type BufferFormat as BufferFormatGeneric,
Shape as ShapeGeneric,
handleShapesAndIdentifiers,
updateShapesAndIdentifiersEncoding,
} from "./chunkEncodingGeneric.js";
import type { FieldBatch } from "./fieldBatch.js";
import {
@@ -55,7 +55,7 @@ export function compressedEncode(
anyFieldEncoder.encodeField(cursor, cache, buffer);
batchBuffer.push(buffer);
}
return handleShapesAndIdentifiers(version, batchBuffer);
return updateShapesAndIdentifiersEncoding(version, batchBuffer);
}

export type BufferFormat = BufferFormatGeneric<EncodedChunkShape>;
@@ -175,7 +175,10 @@ export class AnyShape extends ShapeGeneric<EncodedChunkShape> {
return { d: encodedAnyShape };
}

public count(identifiers: Counter<string>, shapes: (shape: Shape) => void): void {}
public countReferencedShapesAndIdentifiers(
identifiers: Counter<string>,
shapeDiscovered: (shape: Shape) => void,
): void {}

public static encodeField(
cursor: ITreeCursorSynchronous,
@@ -329,8 +332,11 @@ export class InlineArrayShape
};
}

public count(identifiers: Counter<string>, shapes: (shape: Shape) => void): void {
shapes(this.inner.shape);
public countReferencedShapesAndIdentifiers(
identifiers: Counter<string>,
shapeDiscovered: (shape: Shape) => void,
): void {
shapeDiscovered(this.inner.shape);
}

public get shape(): this {
@@ -387,8 +393,11 @@ export class NestedArrayShape extends ShapeGeneric<EncodedChunkShape> implements
};
}

public count(identifiers: Counter<string>, shapes: (shape: Shape) => void): void {
shapes(this.inner.shape);
public countReferencedShapesAndIdentifiers(
identifiers: Counter<string>,
shapeDiscovered: (shape: Shape) => void,
): void {
shapeDiscovered(this.inner.shape);
}
}

Original file line number Diff line number Diff line change
@@ -121,21 +121,21 @@ export class NodeShape extends Shape<EncodedChunkShape> implements NodeEncoder {
};
}

public count(
public countReferencedShapesAndIdentifiers(
identifiers: Counter<string>,
shapes: (shape: Shape<EncodedChunkShape>) => void,
shapeDiscovered: (shape: Shape<EncodedChunkShape>) => void,
): void {
if (this.type !== undefined) {
identifiers.add(this.type);
}

for (const fieldEncoder of this.specializedFieldEncoders) {
identifiers.add(fieldEncoder.key);
shapes(fieldEncoder.encoder.shape);
shapeDiscovered(fieldEncoder.encoder.shape);
}

if (this.otherFieldsEncoder !== undefined) {
shapes(this.otherFieldsEncoder.shape);
shapeDiscovered(this.otherFieldsEncoder.shape);
}
}

Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@ import type { CounterFilter } from "../../../../feature-libraries/chunked-forest
// eslint-disable-next-line import/no-internal-modules
import { decode } from "../../../../feature-libraries/chunked-forest/codec/chunkDecoding.js";
// eslint-disable-next-line import/no-internal-modules
import { handleShapesAndIdentifiers } from "../../../../feature-libraries/chunked-forest/codec/chunkEncodingGeneric.js";
import { updateShapesAndIdentifiersEncoding } from "../../../../feature-libraries/chunked-forest/codec/chunkEncodingGeneric.js";
import type {
BufferFormat,
EncoderCache,
@@ -75,7 +75,7 @@ function checkDecode(
}

/**
* Clones anything handleShapesAndIdentifiers might modify in-place.
* Clones anything updateShapesAndIdentifiersEncoding might modify in-place.
*/
function cloneArrays<T>(data: readonly T[]): T[] {
return data.map((item) => (Array.isArray(item) ? cloneArrays(item) : item)) as T[];
@@ -87,7 +87,11 @@ function testDecode(
identifierFilter: CounterFilter<string>,
idCompressor?: IIdCompressor,
): EncodedFieldBatch {
const chunk = handleShapesAndIdentifiers(version, cloneArrays(buffer), identifierFilter);
const chunk = updateShapesAndIdentifiersEncoding(
version,
cloneArrays(buffer),
identifierFilter,
);

// TODO: check chunk matches schema

Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ import type {
import {
IdentifierToken,
Shape,
handleShapesAndIdentifiers,
updateShapesAndIdentifiersEncoding,
// eslint-disable-next-line import/no-internal-modules
} from "../../../../feature-libraries/chunked-forest/codec/chunkEncodingGeneric.js";

@@ -41,9 +41,9 @@ type EncodedChunkShape = Static<typeof EncodedChunkShape>;
class TestShape extends Shape<EncodedChunkShape> {
public constructor(
public readonly data: string,
public readonly count: (
public readonly countReferencedShapesAndIdentifiers: (
identifiers: Counter<string>,
shapes: (shape: Shape<EncodedChunkShape>) => void,
shapeDiscovered: (shape: Shape<EncodedChunkShape>) => void,
) => void = () => {},
) {
super();
@@ -62,9 +62,9 @@ class TestConstantShape extends Shape<EncodedChunkShape> {
super();
}

public count(
public countReferencedShapesAndIdentifiers(
identifiers: Counter<string>,
shapes: (shape: Shape<EncodedChunkShape>) => void,
shapeDiscovered: (shape: Shape<EncodedChunkShape>) => void,
): void {}

public encodeShape(
@@ -78,9 +78,9 @@ class TestConstantShape extends Shape<EncodedChunkShape> {
const testConstantShape = new TestConstantShape();

describe("chunkEncodingGeneric", () => {
describe("handleShapesAndIdentifiers", () => {
describe("updateShapesAndIdentifiersEncoding", () => {
it("Empty", () => {
assert.deepEqual(handleShapesAndIdentifiers(version, []), {
assert.deepEqual(updateShapesAndIdentifiersEncoding(version, []), {
version,
identifiers: [],
shapes: [],
@@ -89,32 +89,35 @@ describe("chunkEncodingGeneric", () => {
});
it("data", () => {
const input = [["x", 1, [1, 2], { a: 1, b: 2 }]];
assert.deepEqual(handleShapesAndIdentifiers(version, input), {
assert.deepEqual(updateShapesAndIdentifiersEncoding(version, input), {
version,
identifiers: [],
shapes: [],
data: input,
});
});
it("identifier: inline", () => {
assert.deepEqual(handleShapesAndIdentifiers(version, [[new IdentifierToken("x")]]), {
version,
identifiers: [],
shapes: [],
data: [["x"]],
});
assert.deepEqual(
updateShapesAndIdentifiersEncoding(version, [[new IdentifierToken("x")]]),
{
version,
identifiers: [],
shapes: [],
data: [["x"]],
},
);
});
it("identifier: deduplicated", () => {
assert.deepEqual(
handleShapesAndIdentifiers(version, [
updateShapesAndIdentifiersEncoding(version, [
[new IdentifierToken("long string"), new IdentifierToken("long string")],
]),
{ version, identifiers: ["long string"], shapes: [], data: [[0, 0]] },
);
});
it("identifier: mixed", () => {
assert.deepEqual(
handleShapesAndIdentifiers(version, [
updateShapesAndIdentifiersEncoding(version, [
[
new IdentifierToken("long string"),
5,
@@ -132,19 +135,22 @@ describe("chunkEncodingGeneric", () => {
);
});
it("shape: minimal", () => {
assert.deepEqual(handleShapesAndIdentifiers(version, [[new TestShape("shape data")]]), {
version,
identifiers: [],
shapes: [{ b: "shape data" }],
data: [[0]],
});
assert.deepEqual(
updateShapesAndIdentifiersEncoding(version, [[new TestShape("shape data")]]),
{
version,
identifiers: [],
shapes: [{ b: "shape data" }],
data: [[0]],
},
);
});
it("shape: counted", () => {
const shape1 = new TestShape("1");
const shape2 = new TestShape("2");
const shape3 = new TestShape("3");
assert.deepEqual(
handleShapesAndIdentifiers(version, [
updateShapesAndIdentifiersEncoding(version, [
[shape1, shape3, shape3, shape2, shape3, shape2],
]),
{
@@ -168,7 +174,7 @@ describe("chunkEncodingGeneric", () => {
countShape(shape2);
countShape(shape3); // cycle
});
assert.deepEqual(handleShapesAndIdentifiers(version, [[shape3, shape3]]), {
assert.deepEqual(updateShapesAndIdentifiersEncoding(version, [[shape3, shape3]]), {
version,
identifiers: ["deduplicated-id"],
// Ensure shapes are sorted by most frequent first
@@ -179,7 +185,7 @@ describe("chunkEncodingGeneric", () => {

it("nested arrays", () => {
assert.deepEqual(
handleShapesAndIdentifiers(version, [
updateShapesAndIdentifiersEncoding(version, [
[[[new IdentifierToken("long string"), new IdentifierToken("long string")]]],
]),
{ version, identifiers: ["long string"], shapes: [], data: [[[[0, 0]]]] },
Original file line number Diff line number Diff line change
@@ -27,7 +27,7 @@ import {
import {
type BufferFormat,
IdentifierToken,
handleShapesAndIdentifiers,
updateShapesAndIdentifiersEncoding,
// eslint-disable-next-line import/no-internal-modules
} from "../../../../feature-libraries/chunked-forest/codec/chunkEncodingGeneric.js";
import {
@@ -159,7 +159,7 @@ describe("compressedEncode", () => {
const buffer: BufferFormat<EncodedChunkShape> = [];
encodeValue(value, shape, buffer);
assert.deepEqual(buffer, encoded);
const processed = handleShapesAndIdentifiers(version, [buffer]);
const processed = updateShapesAndIdentifiersEncoding(version, [buffer]);
assert(processed.data.length === 1);
const stream = { data: processed.data[0], offset: 0 };
const decoded = readValue(stream, shape, {
Original file line number Diff line number Diff line change
@@ -34,7 +34,7 @@ describe("nodeShape", () => {
it("empty node", () => {
const shape = new NodeShape(undefined, false, [], undefined);
const identifierCounter = new Counter<string>();
shape.count(identifierCounter, () => fail());
shape.countReferencedShapesAndIdentifiers(identifierCounter, () => fail());
assert(identifierCounter.buildTable().indexToValue.length === 0);

const cache = new EncoderCache(
@@ -54,7 +54,7 @@ describe("nodeShape", () => {
const shape = new NodeShape(brand("foo"), true, [], undefined);

const identifierCounter = new Counter<string>();
shape.count(identifierCounter, () => fail());
shape.countReferencedShapesAndIdentifiers(identifierCounter, () => fail());
const cache = new EncoderCache(
() => fail(),
() => fail(),
Loading
Oops, something went wrong.