Define conversions across types #319

Open
andreubotella opened this issue Jul 20, 2020 · 6 comments

Comments

@andreubotella
Member

As per whatwg/encoding#215 (comment), we might want to enable Infra types to define conversions from and to other types.

@annevk
Member

annevk commented Jul 22, 2020

So what do we need here?

  • Byte sequence as list.
  • String as list. Code unit and code point, presumably? Encoding needs code point, but I suspect in other places we would want to do code units, if anything.

And the reverse?

Also, should we make it implicit so you can write <a for=list>For each</a> <var>byte</var> of <var>bytes</var> or do we want <var>bytes</var> to be explicitly converted to a list first?

Or even further, do we want to say that byte sequences and strings are fundamentally lists? (I guess that doesn't work for strings due to the code unit/code point stuff.)
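
To illustrate why a single "string as list" view is ambiguous, here is a minimal Python sketch. It is not taken from Infra; the helper names `to_code_points`, `to_code_units`, and `to_byte_list` are hypothetical, and UTF-16 code units are assumed, as in Infra strings.

```python
def to_code_points(s: str) -> list:
    """View a string as a list of code points."""
    return [ord(ch) for ch in s]


def to_code_units(s: str) -> list:
    """View a string as a list of 16-bit code units (UTF-16, assumed)."""
    # "surrogatepass" lets lone surrogates through, since a code unit
    # sequence is allowed to contain them.
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def to_byte_list(b: bytes) -> list:
    """View a byte sequence as a list of bytes (integers 0..255)."""
    return list(b)


sample = "a\U0001F600"  # "a" followed by one code point outside the BMP
code_points = to_code_points(sample)  # two items
code_units = to_code_units(sample)    # three items: the second code point is a surrogate pair
```

The same string yields lists of different lengths depending on the view, which is why the byte sequence case looks unambiguous while the string case needs the unit to be named.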

@andreubotella
Member Author

andreubotella commented Sep 14, 2020

I'm looking at the various usages of the Encoding hooks across several standards, and they seem to be called almost every time with a byte sequence (respectively, with a string), with the return value being used as a string (resp. byte sequence). Note that this already relied on implicit conversions before whatwg/encoding#215.

I suppose it might be fine to make a conversion implicit if it's on an algorithm boundary with well-defined types. For example, if "decode" is called with a byte sequence, it's clear that it has to be converted into an I/O queue of bytes. Likewise, if inside the steps for "decode", a string was returned, it'd be clear that it'd have to be converted into an I/O queue of scalar values inside the decode operation. But from outside the decode algorithm, the output type of the conversion is not necessarily clear, and since the range of possible types might be open-ended, the conversion would have to be explicit:

Let string be the result of UTF-8 decoding byteSeq, converted to a string.
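
A rough Python sketch of that boundary, under stated assumptions: `END_OF_QUEUE`, `to_io_queue`, `from_io_queue`, and `utf8_decode_to_string` are hypothetical names, and `utf8_decode` is only a stand-in built on Python's codec (it does no BOM handling), not the Encoding Standard's algorithm.

```python
from collections import deque

END_OF_QUEUE = object()  # hypothetical sentinel standing in for end-of-queue


def to_io_queue(items):
    """Convert a list, byte sequence, or other iterable into an I/O queue."""
    queue = deque(items)
    queue.append(END_OF_QUEUE)
    return queue


def from_io_queue(queue):
    """Convert an I/O queue back into a plain list, dropping end-of-queue."""
    out = []
    for item in queue:
        if item is END_OF_QUEUE:
            break
        out.append(item)
    return out


def utf8_decode(io_queue_of_bytes):
    """Stand-in decoder: I/O queue of bytes in, I/O queue of scalar values out."""
    data = bytes(from_io_queue(io_queue_of_bytes))
    text = data.decode("utf-8", errors="replace")
    return to_io_queue(ord(ch) for ch in text)


def utf8_decode_to_string(byte_seq: bytes) -> str:
    # On the way in, the target type is fixed by the algorithm's signature,
    # so this conversion could be left implicit in spec prose.
    queue = utf8_decode(to_io_queue(byte_seq))
    # On the way out, the caller picks the target type (string here),
    # so this conversion is the one that has to be spelled out.
    return "".join(chr(value) for value in from_io_queue(queue))
```

Here `utf8_decode_to_string` plays the role of "the result of UTF-8 decoding byteSeq, converted to a string".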

@annevk
Member

annevk commented Sep 14, 2020

That, or we define an I/O queue of scalar values that contains end-of-queue as being interchangeable with a scalar value string. That might also address the "for each" problem, although I guess you'd not want end-of-queue to show up there... Or we define a string-returning version of the frequently invoked decoding algorithms.

@andreubotella
Member Author

I think we might want to define that types which are a wrapper over some other type should by default have conversions to/from that wrapped type, but we might want to define additional conversions and/or override the default ones.

For example, let's say that string was defined as a list of code units (which it probably should be). Then there'd be a conversion string → list of code units and a conversion list of code units → string by default. But we could additionally define a conversion string ↔ list of code points, and we could in turn use that conversion to define code point length, scalar value string, collect a sequence of code points...

Now, for some types which add additional semantics to their wrapped types, such as set, we could define an explicit algorithmic conversion list → set which maintains the invariants. And we could use that same thing to handle end-of-queue on I/O queues.
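
As a sketch of what an invariant-preserving conversion could look like, assuming the set in question behaves like Infra's ordered set (no duplicate items, insertion order kept). The `OrderedSet` class and its method names are hypothetical, chosen only for illustration.

```python
class OrderedSet:
    """Hypothetical wrapper over a list that never holds duplicate items."""

    def __init__(self):
        self._items = []

    @classmethod
    def from_list(cls, items):
        """Algorithmic conversion list -> set: keep the first occurrence of each item."""
        result = cls()
        for item in items:
            if item not in result._items:
                result._items.append(item)
        return result

    def to_list(self):
        """Default conversion set -> list: every set is already a valid list."""
        return list(self._items)


deduplicated = OrderedSet.from_list([1, 2, 1, 3, 2]).to_list()
```

The set → list direction needs no work because the wrapped value already satisfies the weaker type, while list → set has to run steps to restore the invariant. An end-of-queue rule for I/O queues could hang off the same kind of explicit conversion.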

@domenic
Member

domenic commented Apr 15, 2021

It's pretty weird that you cannot (or can no longer?) apply UTF-8 decode to a byte sequence, but instead have to apply UTF-8 decode to the result of converting the byte sequence into an I/O queue.

@annevk
Member

annevk commented Apr 16, 2021

Not sure if this came up in the context of writing new specification text, but I think we should continue to write text as if that is possible and eventually fix the plumbing.

andreubotella pushed a commit to andreubotella/multipart-form-data that referenced this issue Apr 22, 2021