Backporting BJData features to UBJSON - ND array, u/m/M/h markers and handling of NaN #109
Some reasons why I don't like your ND array spec:
1. The way you use # is ambiguous with the count specifier.
2. The way you use [ seems to add an extra character where it's not needed.
3. It explicitly disallows multidimensional arrays of non-fixed or structured type (which is an unusual but reasonable use case that I think UBJ should support).
4. It seems to violate the TLV 'convention' we use somewhat.

But yes, I think we should work together on merging the specs if possible.
On Wed, May 13, 2020 at 10:20 AM, Qianqian Fang wrote:

> @Steve132, re your comments at Steve132/ubj#8 and #108:
>
> I think the right step forward for the project would be to agree on an array spec, agree on other changes, and move forward to a 1.0 spec, along with reference implementations in as many languages as possible and a test suite.
>
> Again, I am happy to assist in backporting the new syntax to UBJSON so we do not have to maintain two separate specs. The main additions made in BJData Draft 1 are:
>
> 1. An optimized ND array container (via # followed by a 1D array object): https://github.com/OpenJData/bjdata/blob/Draft_1/Binary_JData_Specification.md#optimized-n-dimensional-array-of-uniform-type
> 2. New dedicated markers for unsigned integers and half-precision floating-point numbers (NeuroJSON/bjdata@95544dc)
> 3. Instead of converting NaN/+-infinity to null, BJData keeps their respective IEEE 754 binary forms (NeuroJSON/bjdata@f3d338f#diff-d94a316d99f805938f1320db036a84a5L174)
>
> I will explain the rationale for each of my additions, and we can discuss/approve/disapprove in this thread.
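The NaN/infinity point in item 3 is easy to see in a few lines of Python: keeping the raw IEEE 754 payload (as BJData does) lets non-finite values round-trip, whereas mapping them to null (as UBJSON Draft 12 does) is lossy. A minimal sketch, using UBJSON's 'D' (float64) marker and big-endian numeric byte order:

```python
import math
import struct

def encode_float64(value: float) -> bytes:
    # 'D' is the UBJSON float64 marker; numeric payloads are big-endian.
    # Keeping the raw IEEE 754 bytes means NaN and +/-inf survive intact.
    return b"D" + struct.pack(">d", value)

def decode_float64(buf: bytes) -> float:
    assert buf[0:1] == b"D"
    return struct.unpack(">d", buf[1:9])[0]

# Non-finite values round-trip exactly; a null-based encoding cannot do this.
for v in (math.nan, math.inf, -math.inf, 3.25):
    decoded = decode_float64(encode_float64(v))
    assert decoded == v or (math.isnan(v) and math.isnan(decoded))
```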
|
For the ND array construct, I am aware that @Steve132 has a similar proposal using the below construct:

[[] [$] [type] [@] [1-byte Ndim] [Nx type] [Nx] [Ny type] [Ny] ... [serialized ND array in column-major order]

The ND array optimized container used in BJData was originally proposed in this mailing list post back in 2013. It is done via # followed by a UBJSON 1D integer array object, then the row-major serialized array elements, for example:

[[] [$] [type] [#] [[] [$] [Nx type] [#] [Ndim type] [Ndim] [Nx Ny Nz ...] [Nx*Ny*Nz*...*sizeof(type)]
or
[[] [$] [type] [#] [[] [Nx type] [Nx] [Ny type] [Ny] [Nz type] [Nz] ... []] [Nx*Ny*Nz*...*sizeof(type)]

The main reasons that led to my preference for the #+[ construct are:

1. It does not need to add handling for a new marker to an existing parser; instead, one can just call the array parsing function to read out the dimensions when #+[ is encountered. You can see this in the below examples:
   - for JSONLab: https://github.com/fangq/jsonlab/blob/master/loadubjson.m#L135-L138
   - for py-ubjson: NeuroJSON/pybj@b90e140#diff-fae7cbc0e5df9b9d3d69d5cc4453b6feR226-R229
   - for ubj: fangq/ubj@cca51f6#diff-572570f413b5480c5985c6a5c9ede5b2R359
   In my opinion, this requires the least amount of work for parser updates.
2. It does not limit the maximum dimension count to 255 as the @ construct does.
3. The row-major byte order aligns with the conventions of the majority of programming languages, such as C, C++, Python, JavaScript typed arrays, etc.; therefore, the data do not require a transpose when saving/reading.
4. It feels intuitive, because for an ND array the count is not a single number but a 1D vector.
5. This construct also aligns with the general template construct I am about to propose - allowing $+{ or #+{ to represent a payload-less object template, similar to what I proposed for messagepack: msgpack/msgpack#267 (comment)
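To make the #+[ layout concrete, here is a sketch of a header writer. It is illustrative only: the choice of 'U' (uint8) for the dimension count and 'l' (int32, big-endian per UBJSON) for the dimension values is this example's assumption, not something the spec mandates.

```python
import struct

def nd_array_header(type_marker: bytes, dims: list[int]) -> bytes:
    # Under the #+[ construct, the count that follows '#' is itself a
    # strongly typed 1-D integer array listing the dimensions; the
    # row-major element payload then follows the header.
    dim_payload = b"".join(struct.pack(">i", d) for d in dims)  # 'l' = int32
    dims_array = b"[$l#U" + bytes([len(dims)]) + dim_payload
    return b"[$" + type_marker + b"#" + dims_array

# A 4 x 10 x 3 float32 ('d') array would start with this header:
header = nd_array_header(b"d", [4, 10, 3])
```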
|
I don't think so - in Draft 12, # is supposed to be followed by one of the integer markers.
For extending to template objects, I reserved $+{ or #+{ for this purpose.
|
> it does not need to add handling to a new marker in an existing parser, instead, one can just call the array parsing function to read out the dimensions when #+[ is encountered. You can see this in the below examples:

That's actually a good argument *against* doing this. Parsing the array as an array, like you've done here in code, is a bug, because it actually allows people to use ND-arrays (and arrays of strings or objects or floats) as the dimension specifier. We don't *want* parsers to be encouraged to re-use the existing array parsing code, because if they do, that's incorrect and leads to bugs. Really, what your spec says is that you need to either implement most of the array parser twice (once for the general version, and again for the constrained version) OR add a 'constrain to 1D array' flag to the code.

> it does not limit the maximum dimension to 255 as the @ construct

I'd say that, in practice, this limitation isn't an actual problem. I've never seen a 256-dimensional array anywhere in actual usage for any problem domain. But sure.

> the row-major byte order aligns with the conventions for the majority of programming languages, such as C, C++, Python, JavaScript typed arrays, etc., therefore, the data do not require transpose when saving/reading.

I actually completely agree with this.

> it feels intuitive because for an ND array, its count is not a single number but a 1D vector.

I understand your argument here. One concern that I have: in UBJ I'd want to relax the restriction that an ND-array must have a type. An array can have a count but not a type in UBJSON. To me it seems reasonable to have an ND array of unknown things.

> This construct also aligns with the general template construct I am about to propose - by allowing $+{ or #+{ to represent a payload-less object template

$+{ already has a meaning in UBJSON. It's a fixed array where each item is an object. The only available place for extension is #{.

I think we can come together to find a middle-ground spec here that doesn't break backwards compatibility for you in 99% of cases but fixes some of these problems.
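The "two parsers" point can be made concrete. Below is a sketch of the constrained dimension-header parser being described: unlike a generic array parser, it accepts only a strongly typed 1-D integer array, so an array of objects or floats is rejected. The marker set and the uint8-only count are simplifications for this example, not any spec's normative layout.

```python
import struct

# UBJSON integer markers mapped to (struct format, byte size); big-endian.
INT_MARKERS = {b"i": ("b", 1), b"U": ("B", 1),
               b"I": (">h", 2), b"l": (">i", 4), b"L": (">q", 8)}

def parse_nd_dims(buf: bytes, pos: int) -> tuple[list[int], int]:
    # Constrained parser: only a strongly typed 1-D integer array with a
    # count is a valid dimension header; anything else is an error, even
    # though it may be a perfectly valid array elsewhere in the document.
    if buf[pos:pos + 1] != b"[":
        raise ValueError("dimension header must be an array")
    pos += 1
    if buf[pos:pos + 1] != b"$":
        raise ValueError("dimension array must be strongly typed")
    marker = buf[pos + 1:pos + 2]
    if marker not in INT_MARKERS:
        raise ValueError("dimension array must hold integers")
    fmt, size = INT_MARKERS[marker]
    pos += 2
    if buf[pos:pos + 1] != b"#" or buf[pos + 1:pos + 2] != b"U":
        raise ValueError("expected a uint8 dimension count (simplified)")
    ndim = buf[pos + 2]
    pos += 3
    dims = []
    for _ in range(ndim):
        dims.append(struct.unpack(fmt, buf[pos:pos + size])[0])
        pos += size
    return dims, pos
```

A generic array parser reused unmodified would happily accept `[${#U\x03}}}` (an array of 3 empty objects) as a "dimension vector"; this constrained version raises instead, which is exactly the duplicated logic being objected to.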
|
> For extending to template objects, I reserved $+{ or #+{ for this purpose

${ already has a meaning in UBJSON (it's an array of objects). Keep in mind that an array of objects is NOT always going to be a template.
An ND-array of dynamic type does not always have to be typed, and it doesn't always have to be a template. The template use case is a separate case.
|
I think I can draw up a proposed ND-array spec for this that doesn't break backwards compatibility with you 99% of the time but fixes these issues. Give me a day or so.
|
For the record, I implemented my own proposal for N-dimensional arrays that does not require extra syntax. It just adds the notion that, in a strongly typed array, all the bytes that come before the first data field form the array's type specifier. Type specifiers can then be parsed recursively in a very natural way:

- [[][$][d][#][i][2] = array of type float32 and length 2
- [[][$] [[][$][d][#][i][2] [#][I][3200] = array of type (array of type float32 and length 2) and length 3200
This is efficient for the kind of data I deal with, which contains lots of very long and narrow matrices of float32, just like in this example. Binary JData would be about as efficient for me, at the cost of the extra complexity that comes with adding new syntax. The current UBJSON specification does not permit storing this efficiently. |
@edgar-bonet
We sort of went over this at the time, but can you clarify *semantically* what makes your proposal different from either of the two we've already sort of proposed here? It seems like your proposal, the way you are using it, provides the same *semantics* as the other two proposals, but it's much more complicated to parse (you have to look ahead to complete the entire array template type recursively and then pop back up off the stack to get the resulting N-D dimensions).
It also (worryingly, to me at least) seems to imply that it's a more generic 'schema' that would allow people to insert templated arrays of arrays into the stack structure, but that's not the case. For example:
[ [ ] [$] [ [ ][#][[][i][2] [#][3200] = array of type (array of type (dynamic array) with length 2) with length 3200
|
Not entirely sure what the main concern is against using recursive calls. In a way, we are already doing it in almost all existing parsers, because JSON/UBJSON array and object constructs can be embedded in one another. If you are worried about recursion depth and stack-overflow risks, then I think the same concerns should apply to regular array/object parsing, and not specifically to this size construct. The only way to avoid stack overflow entirely is to change the whole parser structure, using an iterator of some sort to convert recursion into loops.
If I just want to prevent it from being abused via deep recursion, adding a simple flag, like what I just added to jsonlab (fangq/jsonlab@86efe89), will likely eliminate the risk, and the required change is quite minimal and local. |
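The depth-flag idea described above is straightforward to sketch: thread a counter through the recursive parser and fail fast instead of overflowing the call stack on maliciously nested input. The MAX_DEPTH value and the two-marker grammar below are illustrative assumptions, not taken from any of the implementations discussed.

```python
MAX_DEPTH = 64  # illustrative limit, analogous to the flag added to jsonlab

def parse_value(buf: bytes, pos: int, depth: int = 0):
    # Depth guard: reject input nested deeper than MAX_DEPTH containers
    # before the host language's call stack can overflow.
    if depth > MAX_DEPTH:
        raise RecursionError("input exceeds maximum container nesting depth")
    marker = buf[pos:pos + 1]
    if marker == b"[":  # untyped array: recurse per element until ']'
        pos += 1
        items = []
        while buf[pos:pos + 1] != b"]":
            item, pos = parse_value(buf, pos, depth + 1)
            items.append(item)
        return items, pos + 1
    if marker == b"Z":  # null
        return None, pos + 1
    raise ValueError(f"unhandled marker {marker!r} (sketch handles only [ and Z)")
```

The change is local, exactly as claimed: only the guard and the `depth + 1` argument differ from a plain recursive parser.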
@Steve132 wrote:

> We sort of went over this at the time

Indeed. I am commenting mostly for @fangq, who was not there at the time.

> It seems like your proposal the way you are using it provides the same *semantics* as the other two proposals

Of course. This is all about providing a representation of the most common forms of multidimensional arrays that is more efficient without altering their semantics.

> it's much more complicated to parse

I disagree. Recursion makes it easy. Recursion is what makes the JSON data model both simple and powerful: containers hold values which can themselves be containers which hold values which...

> you have to lookahead

Yes, but never more than one byte. But note that this has nothing to do with my proposal, which does not introduce any extra lookahead requirements.

I can't visually parse your example.
I don't have a problem with recursion in general. Recursion in general is great, especially for a parser.

I have two separate arguments about two separate proposals. The first argument is about this claim of your proposal: "the implementer can re-use the same parser." My claim is that they cannot.

[[] [$] [{] [#] [U] [3] [{][}][{][}][{][}]

This is a valid 1-D array specifier (in particular, it's a 1-D array of 3 empty objects). It is NOT a valid N-D array dimension header according to your spec (the type is object, not int). Therefore you *can't* simply use the standard array parser unmodified to parse the N-D array header. You have to have two array parsers: one that works generically, and another that is constrained to the N-D array header spec (e.g., only integers, only one-dimensional, only TCV).

But an implementer who re-uses the 1-D array parser code unmodified to parse your N-D array header would produce a broken implementation, because it would accept the above as a valid ND array header (in fact, your patch to ubjr actually has this bug). Fixing the bug requires implementing a special-case version of the array parser that *only* reads a valid N-D array header and not any array. So it's avoidable. But there's no code reuse here. Reusing the code actually creates a bug. So since you can't reuse code, code reuse isn't a feature of your design. What's more, it encourages implementers to make the bug that you made (because it superficially looks like it works).

The second argument is about @edgar-bonet's implementation, which basically boils down to: "What's easier to parse and understand: 'floats, 3 dimensions: 4 rows 10 columns 3 channels' or '( ( ( ( float ) 3 ) 10 ) 4'?" They both work, but to me the first one is a lot easier to parse and validate.
|
@edgar-bonet If I want to read my ND array *as* an ND array, under your proposal I have to parse the entire header recursively all the way down to get the type, then come all the way back up and populate my array dimensions in reverse order.

( ( ( ( float ) 3 ) 10 ) 4 ) vs "float, 3 dimensions, 4, 10, 3"
|
To be clear, I don't necessarily think the bug caused by people wanting to re-use the array code is so severe as to veto using the JData standard for the syntax. As long as we warn implementers in the spec not to do it it's easy enough to avoid. But the JData standard does use more characters as a result as well...so it's worth considering. |
@Steve132 wrote:

> [[] [$] [{] [#] [U] [3] [{][}][{][}][{][}] is a valid 1-D array of 3 empty objects

My understanding is that, under the current spec, it is not. Either the array is untyped and the values have to be spelled out in full:

[[] [#][U][3] [{][}][{][}][{][}]

or the array is typed and the type (a single byte) has to be removed from the values:

[[] [$][{] [#][U][3] [}][}][}]

I personally find “((float 3) 10) 4” easier, but that may be just me. I can understand that humans may find “floats, 3 dimensions: 4 rows 10 columns 3 channels” easier. :-) |
> or the array is typed and the type (a single byte) has to be removed from the values

You're right. My bad. The point remains that

[[] [#][U][3] [{][}][{][}][{][}]

is a valid array but is not a valid N-D array integer dimension header, so the code cannot be reused unmodified to parse both.

> I personally find “((float 3) 10) 4” easier, but that may be just me. I can understand that humans may find “floats, 3 dimensions: 4 rows 10 columns 3 channels” easier. :-)

That's basically my argument. I'm a mere human, and I don't know anyone who prefers your way (no offense). It certainly works, but its meaning is obscure to me. Parsing it properly *requires* stack-style recursion, but mine and @fangq's do not.
|
@edgar-bonet, thanks for your proposal - what this tells us is that we DO need to include a native ND array header in the spec, because it is such an important data type to support.

@Steve132 - re your argument on whether it is appropriate to reuse an array parser: you can certainly do it in more than one way. As you suggested, one can write a dedicated 1D array parser, which just needs a few extra lines (involving a loop), or one can handle both cases. I agree that my ubj patch does not "validate" the array size vector, but that can certainly be added.

@edgar-bonet, I think your proposal is one of many ways to add this, but the downside is the need to use the stack for each additional dimension, which comes with a bigger risk of stack overflow and slower performance. |
@Steve132, for some reason I feel this discussion has steered towards what the best practice is for writing a parser, which is not really the focus of this thread - we just define a format and let parser writers come up with parsers that work with it. Whether a parser is 100% compliant, has a robust implementation, or follows best practices is really up to the parser writers. A good thing about JSON and UBJSON is that the format is so simple that writing a parser is not that difficult; that gives people space to write their own flavors with different trade-offs (complexity vs utility).

I am looking forward to hearing your proposal for a BJData-compatible array construct. I don't mind if you list both constructs as valid UBJSON formats, as long as these new features (which really come from the research needs of my community) are supported, and subsequently supported by the related parsers.

On a related note - I really like the Markdown version of the UBJSON spec that the py-ubjson team wrote, which I used as the starting point for writing the BJData spec. It takes advantage of the convenient formatting supported by GitHub and allows people to collaboratively develop a document. Compared with the HTML version of the UBJSON spec, I feel the Markdown version is much more efficient, easier to read, and easier to revise. I really wish UBJSON could take this new form for its continued development and future discussions (especially given that the ubjson.org website is not functional). |
Part of a standard is making sure that it's easy to implement without bugs. It does matter.
I probably won't include multiple constructs that do the same thing in UBJSON. But I do think that a lot of the other issues you've brought up are important and worth looking into. A big open question is how much we can avoid fragmentation in this space: are you willing to standardize on UBJSON to avoid fragmentation if it ends up being modified to sufficiently meet your needs (e.g., eliminating the BJData spec)?
I agree 100%. Another big issue I'm currently working on is versioning. I think we need to make an "alpha" version (spec 12), a "beta" version (spec 13), and a "1.0" version (including whatever we end up deciding).
@Steve132, any update on this?
Yeah I'm organizing my notes into a document.
@Steve132, @rkalla, and other UBJSON contributors and maintainers - I am glad to share an update: I just received 5-year funding support from the NIH (National Institutes of Health) to develop a new-generation neuroimaging data exchange platform named NeuroJSON (http://neurojson.org, https://github.com/NeuroJSON). The goal of the project is to develop a series of specifications that map major neuroimaging data files, standardized by the BIDS project (https://bids.neuroimaging.io/), to a unified JSON/BJData-based exchange format, gaining human readability, easy parsing, reuse, and integration with NoSQL databases (such as MongoDB and CouchDB).

As I mentioned above, the BJData (Binary JData) spec was largely based upon UBJSON, with the extensions described in this ticket. I see the updated UBJSON website now has a placeholder for UBJSON 2.0 - it would be wonderful if my proposed extensions above could be reviewed and incorporated into the UBJSON spec. It will be mutually beneficial to join forces so that our user communities can enjoy the work from both initiatives.

By the way, we are recruiting a full-time programmer to drive the NeuroJSON project, develop new parsers/libraries/specifications, and incorporate them into prominent neuroimaging software such as FieldTrip, SPM, FreeSurfer, and BIDS. If any UBJSON developers/contributors are interested in joining this effort, please check out our job posting: https://careersmanager.pageuppeople.com/879/ci/en-us/job/507821/scientific-programmer
Qianqian, congratulations!!

I started to take a stab at rethinking 2.0 while on a sabbatical between careers. It's a huge breaking refinement, however, and will go slowly, so I'm not sure that would work with your timelines.

We will support however we can - keep us posted!

Riyad
I think number 2 is a sensible extension to UBJSON, since the missing unsigned types can introduce incompatibilities in some programming languages or platforms. Relatedly, I also don't see which binary format signed integers use in the spec (I am assuming two's complement; this should definitely be added to the spec), which is technically not a given for all platforms, although true in practice.

I don't think the extensions in 1 and 3 should be added, though. UBJSON should remain universal and as compatible with JSON itself as possible. One of UBJSON's main advantages over other binary formats is that it can be a drop-in replacement for JSON, because they are so similar, and this should be kept, with the same limitations that JSON has: you gain smaller transfers and faster parsing, but keep compatible representations of data.

This is why I don't think an extension for N-D arrays makes sense: it is too specific, thus not universal, and not really compatible with JSON. An N-D array should know how to serialize itself to UBJSON/JSON and deserialize and validate itself; this should not be part of the serialization format. It imposes a kind of schema on a schemaless format. It is also limiting and not useful in a general sense: small N-D arrays can be serialized in a linear fashion, but for big matrices, for instance, a better approach would probably be to store and serialize them sparsely, and N-D arrays don't help there.

Although UBJSON is a binary format, I think it is more important to keep compatibility with JSON, since JSON is more widespread. As such, in my opinion, UBJSON should also not be extended to enable the serialization of specific binary data at all, since JSON itself is not suitable for that. UBJSON makes it more convenient to store binary data than JSON, but that should not be the reason to use it; there are better binary formats for that.
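To sketch that application-level view: an N-D array can be flattened to plain JSON-compatible structures, or stored sparsely, without any format-level support. The key names (`shape`, `data`, `triplets`) are hypothetical conventions chosen for illustration, not part of any spec.

```python
import json

def encode_dense(nested):
    """Flatten a rectangular nested list into shape + row-major data,
    a schema-level convention rather than a format-level construct."""
    shape, x = [], nested
    while isinstance(x, list):          # walk down one edge to read the shape
        shape.append(len(x))
        x = x[0]
    flat = nested
    for _ in range(len(shape) - 1):     # flatten one nesting level per pass
        flat = [v for row in flat for v in row]
    return {"shape": shape, "data": flat}

def encode_sparse(matrix):
    """Store only the non-zero entries of a 2-D matrix as (row, col, value)."""
    triplets = [(i, j, v)
                for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v != 0]
    return {"shape": [len(matrix), len(matrix[0])], "triplets": triplets}

m = [[0, 0, 5], [7, 0, 0]]
print(json.dumps(encode_dense(m)))   # dense: 6 stored values
print(json.dumps(encode_sparse(m)))  # sparse: 2 stored triplets
```

Both encodings round-trip through any JSON-compatible serializer, which is the commenter's point: the choice of dense vs. sparse layout belongs to the application, not the wire format.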
@Steve132, Re your comments at Steve132/ubj#8 and #108
Again, I am happy to assist on backporting the new syntax to UBJSON so we do not have to maintain two separate specs.
The main additions made in BJData Draft 1 include:

1. An optimized N-D array container (via `#` followed by a 1D array object): https://github.com/OpenJData/bjdata/blob/Draft_1/Binary_JData_Specification.md#optimized-n-dimensional-array-of-uniform-type
2. New dedicated markers for unsigned integers and half-precision floating-point numbers: NeuroJSON/bjdata@95544dc
3. Instead of converting NaN/±infinity to null, BJData keeps their respective IEEE 754 binary forms: NeuroJSON/bjdata@f3d338f#diff-d94a316d99f805938f1320db036a84a5L174
I will explain the rationales for each of my additions, and we can discuss/approve/disapprove in this thread.
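As a rough sketch of what additions 2 and 3 look like on the wire, using Python's `struct` module (format `e` is IEEE 754 binary16): the `h`/`d`/`D` marker bytes here follow the BJData draft's naming and are an assumption for illustration, not normative encoding.

```python
import math
import struct

# Addition 2: a half-precision value packed as two payload bytes
# (struct format 'e' is the IEEE 754 binary16 layout).
half = b'h' + struct.pack('<e', 1.5)

# Addition 3: NaN and infinities keep their native IEEE 754 bit patterns
# instead of being rewritten as the 'Z' (null) marker.
nan_f32 = b'd' + struct.pack('<f', math.nan)
inf_f64 = b'D' + struct.pack('<d', math.inf)

# Round-tripping recovers the special values exactly.
assert math.isnan(struct.unpack('<f', nan_f32[1:])[0])
assert struct.unpack('<d', inf_f64[1:])[0] == math.inf
assert struct.unpack('<e', half[1:])[0] == 1.5
print(len(half), len(nan_f32), len(inf_f64))  # 3 5 9
```

The half-precision marker saves two bytes per value over float32, and preserving the IEEE 754 bit patterns makes the encoding lossless for the full floating-point value set rather than only the JSON-representable subset.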