overhaul std.fmt formatting api #1358

Open
tiehuis opened this Issue Aug 9, 2018 · 4 comments

@tiehuis
Member

tiehuis commented Aug 9, 2018

This is a proposal for the formatting interface exposed via std.fmt and which
shows up in most printing functions (e.g. std.debug.warn).

This is largely based on Rust's std::fmt (which in turn is similar to Python3) so see that for a more in-depth reference for certain parts.

Formatting Options

We do take the following formatting options from Rust:

  • positional parameters ("{0}").
  • alignment ("{:<} {:<5} {:0^10}")
  • width ("{:5}")
  • precision ("{:.5} {:.0}")
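For reference, here is how the four retained options behave in Rust's `format!`, which this proposal uses as its model. This is a Rust sketch, not Zig, and the helper name `demo_options` is made up for the illustration.

```rust
// Rust's std::fmt behaviour for the four options the proposal keeps.
// `demo_options` is a hypothetical helper, not part of any library.
fn demo_options() -> Vec<String> {
    vec![
        // positional parameters: arguments are selected by index
        format!("{1} {0}", "a", "b"),
        // alignment with an explicit width; the default fill is a space
        format!("[{:<5}] [{:>5}] [{:^5}]", "ab", "ab", "ab"),
        // a custom fill character precedes the alignment character
        format!("{:0^10}", "mid"),
        // width alone: numbers are right-aligned by default
        format!("[{:5}]", 42),
        // precision: number of digits after the decimal point
        format!("{:.5} {:.0}", 3.14159265, 3.0),
    ]
}

fn main() {
    for line in demo_options() {
        println!("{}", line);
    }
}
```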

We do not take the following:

  • # alternate printing forms
  • +, -, 0 sign flags (NOTE: may actually want these)
  • named parameters (format!("{arg1}", arg1 = "example"))
  • runtime specified precision (format!("{:.*}", 3, 5.0923412)) (NOTE: could add this in if reasonable demand)
  • numbered argument specified precision (format!("{0:.1$}", 5.0923412, 3))
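To make the excluded features concrete, this is roughly what they look like on the Rust side. It is a comparison sketch only; `demo_excluded` is a hypothetical helper name. Note that in Rust a precision taken from a positional argument is written `.1$`, while a bare `1$` selects the width.

```rust
// Rust features the proposal leaves out, shown for comparison only.
// `demo_excluded` is a hypothetical helper name.
fn demo_excluded() -> Vec<String> {
    let arg1 = "example";
    vec![
        // `#` alternate form: adds the 0x prefix for hex
        format!("{:#x}", 255),
        // `+` sign flag and `0` zero-padding flag
        format!("{:+} {:05}", 7, 42),
        // named parameter
        format!("{arg1}", arg1 = arg1),
        // runtime precision via `.*`: the precision argument comes first
        format!("{:.*}", 3, 5.0923412),
        // precision taken from positional argument 1
        format!("{0:.1$}", 5.0923412, 3),
    ]
}

fn main() {
    for line in demo_excluded() {
        println!("{}", line);
    }
}
```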

Format Specifiers

These are largely unchanged, but a few differ:

  • {} (primitives) print the default primitive representation (if it exists)
  • {c} (int): print as an ascii character
  • {b} (int): print as binary
  • {x} (int): print as lowercase hex
  • {X} (int): print as uppercase hex
  • {o} (int): print as octal
  • {e} (float): print in exponent form
  • {d} (int/float): print in base10/decimal form
  • {s} ([]u8/*u8): print as null-terminated string
  • {*} (any): print as a pointer (hex) (NOTE: does & make more sense here?)
  • {?} (any): print full debug representation (e.g. traverse structs etc to primitive fields)
  • {#} (any): print raw bytes of the value (hex) (NOTE: do we need this? how often is it used?)
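Several of these specifiers have close counterparts in Rust, which may help when comparing outputs. The sketch below shows the Rust equivalents of {c}, {b}, {x}, {X}, {o}, {e} and {?}; the `Point` type and `demo_specifiers` are hypothetical names, and the exact output of the Zig versions may differ.

```rust
// Rust counterparts of some proposed specifiers, for comparison only.
// `Point` and `demo_specifiers` are hypothetical names.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn demo_specifiers() -> Vec<String> {
    vec![
        // like {c}: print an integer as an ASCII character
        format!("{}", 65u8 as char),
        // like {b}: binary
        format!("{:b}", 10),
        // like {x} and {X}: lowercase and uppercase hex
        format!("{:x} {:X}", 255, 255),
        // like {o}: octal
        format!("{:o}", 8),
        // like {e}: exponent form
        format!("{:e}", 1500.0),
        // like {?}: debug representation that walks struct fields
        format!("{:?}", Point { x: 1, y: 2 }),
    ]
}

fn main() {
    for line in demo_specifiers() {
        println!("{}", line);
    }
}
```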

These format specifiers are removed from the current implementation:

  • {.} (float): was to specify decimal float, now {d} replaces this
  • {e10} (float): precision was attached to format specifier. The new format
    specifier type would replace this.
  • {B} (any): printed raw bytes of value, replaced by {#}. This is to
    ensure it cannot be shadowed by a user defined function.

User-defined functions

Alongside this I propose a change in the way format functions are defined.

The current function to implement is of the form:

pub fn format(
    self: *SelfType,
    comptime fmt: []const u8,
    context: var,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!void;

I instead propose changing this to be of the form:

pub fn format(
    self: *SelfType,
    comptime fmt_spec: ?u8,
    context: var,
    comptime Errors: type,
    output: fn (@typeOf(context), []const u8) Errors!void,
) Errors!?void {
    // This is enforced within `std.fmt`.
    std.debug.assert(fmt_spec == null or
        ('a' <= fmt_spec.? and fmt_spec.? <= 'z') or
        ('A' <= fmt_spec.? and fmt_spec.? <= 'Z')
    );
}

Format specifiers should be simple, and restricting them to a single character
enforces consistency and keeps format strings short. It also makes switching on
the specifier much easier for an implementation and avoids some easy edge
cases.

fmt_spec is null for the {} case.

If the function does not handle the format specifier, it can return null and
std.fmt will produce an appropriate message.

Old Example
const Vec2 = struct {
    x: f32, y: f32,

    pub fn format(
        self: *Vec2,
        comptime fmt: []const u8,
        context: var,
        comptime Errors: type,
        output: fn (@typeOf(context), []const u8) Errors!void,
    ) Errors!void {
        if (fmt.len > 0) {
            if (fmt.len > 1) {
                unreachable;
            }

            switch (fmt[0]) {
                // point format
                'p' => return std.fmt.format(context, Errors, output, "({.3},{.3})", self.x, self.y),
                // dimension format
                'd' => return std.fmt.format(context, Errors, output, "{.3}x{.3}", self.x, self.y),
                else => unreachable,
            }
        }
        return std.fmt.format(context, Errors, output, "({.3},{.3})", self.x, self.y);
    }
};
New Example
const Vec2 = struct {
    x: f32, y: f32,

    pub fn format(
        self: *Vec2,
        comptime fmt_spec: ?u8,
        context: var,
        comptime Errors: type,
        output: fn (@typeOf(context), []const u8) Errors!void,
    ) Errors!?void {
        switch (fmt_spec) {
            // point format
            null, 'p' => return std.fmt.format(context, Errors, output, "({:.3},{:.3})", self.x, self.y),
            // dimension format
            'd' => return std.fmt.format(context, Errors, output, "{:.3}x{:.3}", self.x, self.y),
            // unhandled format
            else => return null,
        }
    }
};
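
The dispatch shape of the new example can also be sketched in Rust, with `Option<char>` standing in for `?u8` and a `None` return standing in for `return null`. This only models the control flow; it says nothing about how std.fmt would wire up the output callback.

```rust
// A Rust model of the proposed single-character dispatch.
// Option<char> plays the role of `?u8`; returning None plays the role
// of `return null` for an unhandled specifier.
struct Vec2 {
    x: f32,
    y: f32,
}

impl Vec2 {
    fn format(&self, fmt_spec: Option<char>) -> Option<String> {
        match fmt_spec {
            // point format, also used for the bare {} case
            None | Some('p') => Some(format!("({:.3},{:.3})", self.x, self.y)),
            // dimension format
            Some('d') => Some(format!("{:.3}x{:.3}", self.x, self.y)),
            // unhandled format: let the caller report it
            _ => None,
        }
    }
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.5 };
    println!("{:?}", v.format(None));
    println!("{:?}", v.format(Some('d')));
    println!("{:?}", v.format(Some('q')));
}
```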

One open question is whether user-defined format functions should receive the
parsed formatting options (fill, alignment, width, precision) as arguments.

An example use-case would be reading the precision field and printing the
vector components with that precision instead of hardcoding it. One concern is
that nothing forces a format function to honour those options, so an
implementation could ignore or misuse them. This is minor, though.
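
Rust already exposes the options this way: a `Display` implementation can read `Formatter::precision()` and apply it to its components. A minimal sketch of the Vec2 use-case follows; the fallback of 3 digits is an assumption chosen to match the examples above.

```rust
use std::fmt;

struct Vec2 {
    x: f32,
    y: f32,
}

impl fmt::Display for Vec2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Use the caller's precision if given; the default of 3 is an
        // assumption made for this sketch.
        let p = f.precision().unwrap_or(3);
        // `.*` takes the precision argument first, then the value.
        write!(f, "({:.*},{:.*})", p, self.x, p, self.y)
    }
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.5 };
    println!("{}", v);
    println!("{:.1}", v);
}
```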

Shortcomings/Extras

Left-side format-specifier type

With this proposal {s} becomes {:s}. Is this fine? Since we only accept one
character and don't want named arguments, we could put it on the left side of
the : alongside the positional argument. The common case would then stay the
same as now and remain fairly clean. With a positional parameter this would change from:

"{0:s} {2} {:b}" -> "{0s} {2} {b}"

This is still unambiguous.

Grammar

format-string := <text> (maybe-format <text>)*
maybe-format := "{{" | "}}" | format
format := '{' argument? (':' format-spec)? '}'
argument := integer type-spec? | type-spec

type-spec := [a-zA-Z*#?]

format-spec := (fill? align)? width? ('.' precision)?
fill := character
align := '<' | '^' | '>'
width := integer
precision := integer
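
To check that the grammar is unambiguous, here is a small hand-written parser for the text between `{` and `}`, written in Rust. It is a sketch under two assumptions: the type-spec sits on the left side of the `:`, and it is optional so that `{2}` and `{}` parse. `Placeholder`, `take_int` and `parse_placeholder` are hypothetical names.

```rust
// Parsed form of one placeholder body. All names here are hypothetical.
#[derive(Debug, PartialEq, Default)]
struct Placeholder {
    position: Option<usize>,
    type_spec: Option<char>,
    fill: Option<char>,
    align: Option<char>,
    width: Option<usize>,
    precision: Option<usize>,
}

// Consume a run of ASCII digits starting at *i, if there is one.
fn take_int(s: &[char], i: &mut usize) -> Option<usize> {
    let start = *i;
    while *i < s.len() && s[*i].is_ascii_digit() {
        *i += 1;
    }
    if *i == start {
        None
    } else {
        s[start..*i].iter().collect::<String>().parse().ok()
    }
}

fn is_align(c: char) -> bool {
    matches!(c, '<' | '^' | '>')
}

// Parse the text between '{' and '}'. Returns None on a malformed body.
fn parse_placeholder(body: &str) -> Option<Placeholder> {
    let s: Vec<char> = body.chars().collect();
    let mut i = 0;
    let mut p = Placeholder::default();

    // argument: optional integer, then optional single-char type-spec
    p.position = take_int(&s, &mut i);
    if i < s.len() && (s[i].is_ascii_alphabetic() || "*#?".contains(s[i])) {
        p.type_spec = Some(s[i]);
        i += 1;
    }

    // (':' format-spec)?
    if i < s.len() && s[i] == ':' {
        i += 1;
        // (fill? align)?: a fill is only present if an align follows it
        if i + 1 < s.len() && is_align(s[i + 1]) {
            p.fill = Some(s[i]);
            p.align = Some(s[i + 1]);
            i += 2;
        } else if i < s.len() && is_align(s[i]) {
            p.align = Some(s[i]);
            i += 1;
        }
        p.width = take_int(&s, &mut i);
        // ('.' precision)?: a '.' must be followed by an integer
        if i < s.len() && s[i] == '.' {
            i += 1;
            p.precision = Some(take_int(&s, &mut i)?);
        }
    }

    // Anything left over is a syntax error.
    if i == s.len() {
        Some(p)
    } else {
        None
    }
}

fn main() {
    for body in ["", "0s", "2", "b", ":<5", ":0^10", "d:.3", "ss"].iter() {
        println!("{:?} -> {:?}", body, parse_placeholder(body));
    }
}
```

One lookahead character is enough to separate fill from align, which is the only place the grammar needs it.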

End

Feel free to make any other suggestions and/or highlight any issues. I'd
prefer to keep this as simple as reasonable as long as it covers all the common
use-cases reasonably.

@tiehuis tiehuis added the proposal label Aug 9, 2018

@tiehuis tiehuis added this to the 0.4.0 milestone Aug 9, 2018

@kristate


Contributor

kristate commented Aug 9, 2018

{#} will be useful in writing network applications when we need to debug what was sent to and from the line.

@thejoshwolfe


Member

thejoshwolfe commented Aug 9, 2018

{#} will be useful in writing network applications when we need to debug what was sent to and from the line.

it should be restricted to packed types then.

@andrewrk andrewrk added the accepted label Aug 9, 2018

@andrewrk


Member

andrewrk commented Aug 9, 2018

I'd like to further propose:

  • {s16LE} - decode UTF-16 Little Endian, and print as encoded UTF-8. After #265 it would work for []u16 as well as [*]null u16 types.
  • {s16BE} - decode UTF-16 Big Endian, and print as encoded UTF-8. After #265 it would work for []u16 as well as [*]null u16 types.
  • {s32LE} - decode UTF-32 Little Endian, and print as encoded UTF-8. After #265 it would work for []u32 as well as [*]null u32 types.
  • {s32BE} - decode UTF-32 Big Endian, and print as encoded UTF-8. After #265 it would work for []u32 as well as [*]null u32 types.

{s16LE} would be common for printing Windows "wide character" strings.
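
As a rough model of what {s16LE} would have to do, the Rust sketch below decodes little-endian UTF-16 bytes and re-encodes them as UTF-8. `utf16le_to_utf8` is a hypothetical helper; how std.fmt should report invalid input (here: `None`) is left open.

```rust
// Decode UTF-16LE bytes into a UTF-8 String.
// Returns None for an odd byte count or an unpaired surrogate.
// `utf16le_to_utf8` is a hypothetical helper name.
fn utf16le_to_utf8(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Assemble 16-bit code units from little-endian byte pairs.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // from_utf16 handles surrogate pairs and rejects lone surrogates.
    String::from_utf16(&units).ok()
}

fn main() {
    // "Hi" as a Windows-style wide string, without the terminator
    let wide = [0x48, 0x00, 0x69, 0x00];
    println!("{:?}", utf16le_to_utf8(&wide));
}
```

The big-endian variant would only swap `from_le_bytes` for `from_be_bytes`.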

@thejoshwolfe


Member

thejoshwolfe commented Aug 10, 2018

runtime zfilling is useful. i wanted that feature for this project: https://github.com/thejoshwolfe/hexdump-zip . when that tool was written in javascript, i would determine the digit count for the highest memory address value (which depends on the user-provided input file size), then zfill all memory address representations to that width. the zig implementation of that tool can't easily do that, so i just zfill everything to the maximum conceivable memory address, which is way bulky.
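
For comparison, this is how the hexdump use-case looks with Rust's runtime width (`width$`): compute the digit count of the largest address once, then zero-fill every address to that width. `hex_digits` and `format_addr` are hypothetical helper names, not taken from hexdump-zip.

```rust
// Number of hex digits needed to print `max_addr` (at least 1).
// Both helpers are hypothetical names for this sketch.
fn hex_digits(max_addr: u64) -> usize {
    let mut digits = 1;
    let mut rest = max_addr >> 4;
    while rest != 0 {
        digits += 1;
        rest >>= 4;
    }
    digits
}

// Zero-fill `addr` to a width that is only known at runtime.
fn format_addr(addr: u64, width: usize) -> String {
    format!("{:0width$x}", addr, width = width)
}

fn main() {
    let file_size: u64 = 0x1234;
    let width = hex_digits(file_size);
    for addr in [0u64, 0x10, 0x1230].iter() {
        println!("{}", format_addr(*addr, width));
    }
}
```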

kristate added a commit to kristate/zig that referenced this issue Sep 1, 2018

std/fmt/index.zig: ziglang#1358 allow bytes to be printed-out as hex;
Supports {x} for lowercase and {X} for uppercase;

kristate added a commit to kristate/zig that referenced this issue Sep 1, 2018

andrewrk added a commit that referenced this issue Sep 1, 2018

Merge pull request #1451 from kristate/fmt-hexbytes-issue1358
allow bytes to be printed-out as hex (#1358)