New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.base64 should support reporting errors and ignoring non alphabet characters #611

Closed
jfo opened this Issue Nov 14, 2017 · 5 comments

Comments

Projects
None yet
3 participants
@jfo
Member

jfo commented Nov 14, 2017

Such as it is, the spec

Relevant passage:

Implementations MUST reject the encoded data if it contains
   characters outside the base alphabet when interpreting base-encoded
   data, unless the specification referring to this document explicitly
   states otherwise.  Such specifications may instead state, as MIME
   does, that characters outside the base encoding alphabet should
   simply be ignored when interpreting data ("be liberal in what you
   accept")

Currently, encode accepts all incoming data.

For reference:

VGltZSBwcmVzZW50IGFuZCB0aW1lIHBhc3QKQXJlIGJvdGggcGVyaGFwcyBw
cmVzZW50IGluIHRpbWUgZnV0dXJlLApBbmQgdGltZSBmdXR1cmUgY29udGFp
bmVkIGluIHRpbWUgcGFzdC4KSWYgYWxsIHRpbWUgaXMgZXRlcm5hbGx5IHBy
ZXNlbnQKQWxsIHRpbWUgaXMgdW5yZWRlZW1hYmxlLgpXaGF0IG1pZ2h0IGhh
dmUgYmVlbiBpcyBhbiBhYnN0cmFjdGlvbgpSZW1haW5pbmcgYSBwZXJwZXR1
YWwgcG9zc2liaWxpdHkKT25seSBpbiBhIHdvcmxkIG9mIHNwZWN1bGF0aW9u
Lgo=

when decoded with zig yields:

Time present and time past
Are both perhaps p�É�Í�¹Ð�¥¸�Ñ¥µ���ÕÑÕÉ�°)�¹��Ñ¥µ���ÕÑÕÉ���½¹Ñ�¤�æVB��â�F�ÖR���7Bà¤�b��ÆÂ�F�ÖR��2�WFW&æ�ÆÇ���0�\Ù[���[����[YH�\È�[��Y�Y[XX��K��Ú�]��ZYÚ����@ve been is an abstraction
Remaining a perpetu��°�Á½ÍÍ¥�¥±¥Ñä)=¹±ä�¥¸���ݽɱ��½��ÍÁ��Õ±�Ñ¥½¼�à¤�

It makes sense to either validate and reject nonconforming input or ignore it. Bsd's base64 utility seems to ignore whitespace and return an error on other characters.

@andrewrk andrewrk added this to the 0.3.0 milestone Nov 14, 2017

@thejoshwolfe

This comment has been minimized.

Member

thejoshwolfe commented Nov 18, 2017

The spec actually says to reject whitespace, not ignore it, "unless the specification referring to this document explicitly states otherwise".

But we definitely need an error for invalid characters when decoding data. The existing undefined behavior is a security hazard.

I'm working on this.

@thejoshwolfe thejoshwolfe changed the title from ignore whitespace on std.base64.decode? to invalid characters should cause std.base64.decode to return an error Nov 18, 2017

@andrewrk

This comment has been minimized.

Member

andrewrk commented Nov 18, 2017

It's not undefined behavior, so it's not a security hazard. The encoder is ultimately responsible for the bytes they want to be decoded, and that doesn't change with the status quo implementation.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Nov 18, 2017

There are 3 use cases here:

  • Rejecting invalid base64. Ideally we could even point out the index of the first error.
  • Decoding a chunk of text which has garbage that is not part of the encoded data, such as from an email or an RSA key that has extra whitespace (or maybe even other bytes)
  • Optimally decoding known-to-be-valid base64 (this is the only currently supported use case)

These are all valid use cases, and this issue is to support the other 2.

@jfo

This comment has been minimized.

Member

jfo commented Nov 18, 2017

It's worth considering supporting different modes. Many languages seem to do this (I checked ruby and go, but I'm sure others do.) Where strict aborts on any invalid character and "regular" ignores them or at least ignores whitespace. Note that = is invalid when not in a terminal position.

@thejoshwolfe thanks for looking! I was hoping to do some more research and submit a PR for this but I'm starting a new job next week so probably won't have too much time in the near term.

@andrewrk andrewrk changed the title from invalid characters should cause std.base64.decode to return an error to std.base64 should support reporting errors and ignoring non alphabet characters Nov 18, 2017

@thejoshwolfe

This comment has been minimized.

Member

thejoshwolfe commented Nov 18, 2017

It's not undefined behavior, so it's not a security hazard.

Sorry, it's undefined values, not undefined behavior. I misread the doc comment.

There are 3 use cases here:

You're right.

strict aborts on any invalid character and "regular" ignores them or at least ignores whitespace.

Yeah. I'm planning to support every usecase, but whatever you want to ignore has to be explicit. That fits with the zen of zig.

thejoshwolfe added a commit that referenced this issue Nov 18, 2017

rework std.base64 api
* rename decode to decodeExactUnsafe.
* add decodeExact, which checks for invalid chars and padding.
* add decodeWithIgnore, which also allows ignoring chars.
* alphabets are supplied to the decoders with their
  char-to-index mapping already built, which enables it to be
  done at comptime.
* all decode/encode apis except decodeWithIgnore require dest
  to be the exactly correct length. This is calculated by a
  calc function corresponding to each api. These apis no longer
  return the dest parameter.
* for decodeWithIgnore, an exact size cannot be known a priori.
  Instead, a calc function gives an upperbound, and a runtime
  error is returned in case of overflow. decodeWithIgnore
  returns the number of bytes written to dest.

closes #611

thejoshwolfe added a commit that referenced this issue Nov 21, 2017

rework std.base64 api
* rename decode to decodeExactUnsafe.
* add decodeExact, which checks for invalid chars and padding.
* add decodeWithIgnore, which also allows ignoring chars.
* alphabets are supplied to the decoders with their
  char-to-index mapping already built, which enables it to be
  done at comptime.
* all decode/encode apis except decodeWithIgnore require dest
  to be the exactly correct length. This is calculated by a
  calc function corresponding to each api. These apis no longer
  return the dest parameter.
* for decodeWithIgnore, an exact size cannot be known a priori.
  Instead, a calc function gives an upperbound, and a runtime
  error is returned in case of overflow. decodeWithIgnore
  returns the number of bytes written to dest.

closes #611

thejoshwolfe added a commit that referenced this issue Nov 21, 2017

rework std.base64 api
* rename decode to decodeExactUnsafe.
* add decodeExact, which checks for invalid chars and padding.
* add decodeWithIgnore, which also allows ignoring chars.
* alphabets are supplied to the decoders with their
  char-to-index mapping already built, which enables it to be
  done at comptime.
* all decode/encode apis except decodeWithIgnore require dest
  to be the exactly correct length. This is calculated by a
  calc function corresponding to each api. These apis no longer
  return the dest parameter.
* for decodeWithIgnore, an exact size cannot be known a priori.
  Instead, a calc function gives an upperbound, and a runtime
  error is returned in case of overflow. decodeWithIgnore
  returns the number of bytes written to dest.

closes #611
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment