Closed
Description
In the documentation it says that `Q` is the default for the array length format in metadata schemas, but in the code it appears to be `L`. I'd propose a fix, but I'm not sure if there was a rationale behind choosing one over the other. (However, I'd vote for `I`, maybe, as it's shorter than `Q` - we don't expect long arrays of metadata.)
Also: it appears that encoding with `i` will decode fine with either of the other four-byte integer formats, `I` or `L`. Does anyone know if this is guaranteed to work?
Here's some code that verifies this is the case.
import tskit
import struct

# Struct-codec schema: a scalar "a" followed by a variable-length array "b".
msd = {
    "codec": "struct",
    "type": "object",
    "properties": {
        "a": {
            "type": "integer",
            "binaryFormat": "d",
            "index": 1,
        },
        "b": {
            "type": "array",
            "index": 2,
            "items": {
                "type": "integer",
                "binaryFormat": "d",
            },
        },
    },
}

# Example rows with arrays of length 0 to 3.
exl = [{"a": 1, "b": list(range(k))} for k in range(4)]


def encode(d):
    # Hand-rolled encoding: one double, a four-byte array length, then the doubles.
    n = len(d["b"])
    struct_string = "<di" + "d" * n
    md = struct.pack(struct_string, d["a"], n, *d["b"])
    return md


def decode(md):
    # 12 bytes of header (8 for "a", 4 for the length), then 8 bytes per item.
    n = (len(md) - 12) // 8
    struct_string = "<di" + "d" * n
    x = struct.unpack(struct_string, md)
    return {"a": int(x[0]), "b": list(map(int, x[2:]))}


def test_format(msd, x):
    # x is the arrayLengthFormat to test; None means use the codec's default.
    if x is None:
        if "arrayLengthFormat" in msd["properties"]["b"]:
            del msd["properties"]["b"]["arrayLengthFormat"]
    else:
        msd["properties"]["b"]["arrayLengthFormat"] = x
    ms = tskit.MetadataSchema(msd)
    for ex in exl:
        pyen = encode(ex)
        py = decode(pyen)
        tsken = ms.validate_and_encode_row(ex)
        tsk = ms.decode_row(tsken)
        assert ex == py
        assert ex == tsk
        if x is None or x in ["I", "L"]:
            # Four-byte length formats: the bytes match and decode interchangeably.
            assert pyen == tsken
            tskpy = decode(tsken)
            pytsk = ms.decode_row(pyen)
            assert ex == tskpy
            assert ex == pytsk
        else:
            # "Q" uses an eight-byte length, so the encodings differ.
            assert pyen != tsken


for x in [None, "I", "L", "Q"]:
    test_format(msd, x)
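
The question of whether `i`, `I` and `L` are interchangeable can also be checked with the `struct` module alone. The following is a minimal sketch, assuming the codec packs in standard-size mode with the `<` prefix, as the hand-written `encode` above does; the helper name `same_bytes` is made up for illustration. In that mode the sizes come from the `struct` module rather than the platform, and non-negative values below 2**31 should pack to the same bytes under all three formats.

```python
import struct

# Standard-size mode ("<" prefix): sizes are fixed by the struct module,
# not by the platform's C compiler.
sizes = {fmt: struct.calcsize("<" + fmt) for fmt in "iIlLqQ"}


def same_bytes(value, formats=("i", "I", "L")):
    """Return True if `value` packs to identical bytes under every format."""
    packed = {struct.pack("<" + fmt, value) for fmt in formats}
    return len(packed) == 1


# Array lengths are non-negative, so small ones lie in the range
# [0, 2**31) that signed and unsigned four-byte integers share.
lengths_agree = all(same_bytes(n) for n in (0, 1, 3, 2**31 - 1))

# Outside that shared range, the value changes when read back as signed.
wrapped = struct.unpack("<i", struct.pack("<I", 2**31))[0]
```

So under this assumption the formats would only disagree for array lengths of 2**31 or more, which is far beyond any metadata array expected here.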