
default arrayLengthFormat: Q or L? #769

@petrelharp


The documentation says that `Q` is the default array length format (`arrayLengthFormat`) in struct-codec metadata schemas, but in the code it appears to be `L`. I'd propose a fix, but I'm not sure whether there was a rationale for choosing one over the other. (However, I'd vote for `I`, maybe, since it's shorter: we don't expect long arrays of metadata.)
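To see what is at stake, Python's `struct.calcsize` reports the width of each candidate code. This is a sketch that assumes the length prefix is packed little-endian with standard sizes (the `<` prefix), as the `encode` function in this issue does. Under that assumption `I` and `L` are both four bytes and `Q` is eight, so a default of `Q` costs four extra bytes per array compared with either four-byte code. `row_size` is a hypothetical helper, not part of tskit.

```python
import struct

# Width in bytes of each candidate length-prefix code, assuming the "<"
# prefix (little-endian, standard sizes, no padding).
PREFIX_SIZES = {code: struct.calcsize("<" + code) for code in ("I", "L", "Q")}
print(PREFIX_SIZES)  # {'I': 4, 'L': 4, 'Q': 8}


def row_size(n, length_format):
    # Hypothetical helper: bytes for one row of the example schema,
    # i.e. a double, the length prefix, and n doubles.
    return struct.calcsize("<d" + length_format + "d" * n)


print({code: row_size(3, code) for code in PREFIX_SIZES})  # {'I': 36, 'L': 36, 'Q': 40}
```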

Also: it appears that data encoded with `i` decodes fine as either of the other four-byte integer formats, `I` or `L`. Does anyone know whether this is guaranteed to work?

Here's some code that verifies this is the case.

```python
import struct

import tskit

# Struct-codec schema: a double "a" followed by a variable-length array "b"
# of doubles. The array is preceded by a length prefix whose format is set
# by "arrayLengthFormat".
msd = {
    "codec": "struct",
    "type": "object",
    "properties": {
        "a": {
            "type": "integer",
            "binaryFormat": "d",
            "index": 1,
        },
        "b": {
            "type": "array",
            "index": 2,
            "items": {
                "type": "integer",
                "binaryFormat": "d",
            },
        },
    },
}

# Example rows with arrays of length 0 to 3.
exl = [{"a": 1, "b": list(range(k))} for k in range(4)]


def encode(d):
    # Hand-rolled encoding: double, signed 4-byte length ("i"), then n doubles.
    n = len(d["b"])
    struct_string = "<di" + "d" * n
    return struct.pack(struct_string, d["a"], n, *d["b"])


def decode(md):
    # 12 header bytes (8 for the double, 4 for the length); the rest is doubles.
    n = (len(md) - 12) // 8
    struct_string = "<di" + "d" * n
    x = struct.unpack(struct_string, md)
    return {"a": int(x[0]), "b": list(map(int, x[2:]))}


def test_format(msd, x):
    # x is the arrayLengthFormat to test; None means use tskit's default.
    if x is None:
        msd["properties"]["b"].pop("arrayLengthFormat", None)
    else:
        msd["properties"]["b"]["arrayLengthFormat"] = x
    ms = tskit.MetadataSchema(msd)

    for ex in exl:
        pyen = encode(ex)
        py = decode(pyen)
        tsken = ms.validate_and_encode_row(ex)
        tsk = ms.decode_row(tsken)
        assert ex == py
        assert ex == tsk
        if x is None or x in ["I", "L"]:
            # Four-byte length prefix: bytes match the hand-rolled "i" encoding,
            # and each decoder can read the other's output.
            assert pyen == tsken
            assert ex == decode(tsken)
            assert ex == ms.decode_row(pyen)
        else:
            # "Q" uses an eight-byte length prefix, so the bytes differ.
            assert pyen != tsken


for x in [None, "I", "L", "Q"]:
    test_format(msd, x)
```
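On whether `i` reliably decodes as `I` or `L`: under the `<` prefix all three are four bytes, and for a count n with 0 <= n < 2**31 the signed (two's-complement) and unsigned encodings are the same bytes. They only diverge for negative values, which `I` and `L` cannot represent, and for counts of 2**31 or more, which `i` cannot pack. The sketch below checks this with Python's `struct` module alone, so it shows what holds for these format codes, not anything tskit documents as guaranteed. `encode_row` and `decode_prefixed` are hypothetical helpers; the latter reads the length prefix with a chosen format code instead of inferring the count from the buffer length, as `decode` above does.

```python
import struct


def encode_row(a, b):
    # Same layout as the issue's encoder: a double, the array length as a
    # signed 4-byte int ("i"), then the items as doubles.
    return struct.pack("<di" + "d" * len(b), a, len(b), *b)


def decode_prefixed(md, length_format):
    # Hypothetical decoder: read the count from its prefix using the given
    # format code, rather than inferring it from len(md).
    (a,) = struct.unpack_from("<d", md, 0)
    offset = struct.calcsize("<d")
    (n,) = struct.unpack_from("<" + length_format, md, offset)
    offset += struct.calcsize("<" + length_format)
    # struct.unpack requires the remaining bytes to be exactly n doubles.
    items = struct.unpack("<" + "d" * n, md[offset:])
    return a, list(items)


# For 0 <= n < 2**31 the signed and unsigned 4-byte encodings are
# byte-for-byte identical, which is why "i" round-trips through "I" or "L".
for n in (0, 1, 255, 2**31 - 1):
    assert struct.pack("<i", n) == struct.pack("<I", n) == struct.pack("<L", n)

md = encode_row(1, [0, 1, 2])
print(decode_prefixed(md, "I"))  # (1.0, [0.0, 1.0, 2.0])
print(decode_prefixed(md, "L"))  # (1.0, [0.0, 1.0, 2.0])
```

Passing `"Q"` to `decode_prefixed` instead reads eight bytes for the count, taking in four bytes that belong to the items, and for this layout it always raises `struct.error`: a buffer of 12 + 8n bytes can never hold an 8-byte double, an 8-byte count and a whole number of doubles.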
