Use u8 and [u8], not char and str for printing methods on Console #161

Closed
wants to merge 1 commit

Conversation

@ftxqxd commented Jun 4, 2015

Console’s printing methods don’t work with Unicode strings or characters, so it doesn’t make sense to use the char (Unicode scalar value) and str (UTF-8 string) types. This changes these methods to use u8 and [u8] instead. The ergonomic impact of this is minimal (although it breaks just about all code, unfortunately), due to the existence of byte string and character literals (b"foo" and b'c'). Additionally, this makes it possible to specify special characters directly inside byte strings: for example, printing b"90\xf8" (0xf8 is chars::GRADE) now displays 90°, whereas it previously displayed 90├© because strings are UTF-8.
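
To illustrate the encoding point (plain Rust, nothing tcod-specific): a byte string gives the single glyph index 0xf8, while a UTF-8 string encodes ° as two bytes, which is why the old behaviour produced two wrong glyphs.

fn main() {
    // Byte string: \xf8 is a single byte (chars::GRADE in libtcod's layout).
    let bytes: &[u8] = b"90\xf8";
    assert_eq!(bytes, &[b'9', b'0', 0xf8]);

    // UTF-8 string: '°' (U+00B0) is encoded as the two bytes 0xc2 0xb0, which a
    // byte-oriented printing routine renders as two unrelated glyphs.
    let utf8 = "90°";
    assert_eq!(utf8.as_bytes(), &[b'9', b'0', 0xc2, 0xb0]);
}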

@tomassedovic (Owner)

The ergonomic impact seems much higher to me, but yeah, this is a problem. I absolutely agree that we should allow b'c' and b"foo", but I'd like to keep the ability to pass in normal chars and strings as well (with the understanding that you mustn't pass in non-ASCII values).

What do you think of accepting something like Into<u8> and Into<[u8]>, potentially with the char and str implementations calling std::ascii::AsciiExt::is_ascii during the conversion?

(we'd have to define a different trait because the types and the Into trait are in a different crate)

That conversion would have a performance hit, but you'd get a runtime error and you could always make it fast by using the ASCII types directly. I.e. easy for people getting started but not costing folks who know what they're doing.
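
For illustration, a minimal sketch of what such a conversion trait could look like (the trait and method names here are invented, not existing tcod-rs API):

// Hypothetical conversion trait, sketched for illustration only.
pub trait ToTcodChar {
    fn to_tcod_char(&self) -> u8;
}

impl ToTcodChar for u8 {
    fn to_tcod_char(&self) -> u8 {
        // Already a libtcod glyph index; pass it straight through.
        *self
    }
}

impl ToTcodChar for char {
    fn to_tcod_char(&self) -> u8 {
        // Runtime check: only plain ASCII maps 1:1 onto libtcod glyphs.
        assert!(self.is_ascii(), "non-ASCII char passed to a printing method");
        *self as u8
    }
}

The string side would follow the same pattern for str and [u8], checking is_ascii and handing over the raw bytes.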

@zsparal (Collaborator) commented Jun 4, 2015

Since libtcod actually does support Unicode printing functions (docs), we could statically dispatch the printing based on what argument it gets. I can't test this right now, but something like this should work (the mutable trait object syntax might be incorrect; I can't remember if you need a second &):

pub trait PrintString { 
    fn print(&self, con: &mut Console);
}

impl PrintString for [u8] {
    fn print(&self, con: &mut Console) {
        // Print ASCII using the ffi
    }
}

impl PrintString for str {
    fn print(&self, con: &mut Console) {
        // Print Unicode using the ffi
    }
}

impl PrintString for String {
    fn print(&self, con: &mut Console) {
        // Print Unicode using the ffi
    }
}

Using this, the definition for the actual print function would look something like this:

fn print<T>(&mut self, x: i32, y: i32, text: &T) where T: PrintString + ?Sized {
    assert!(x >= 0 && y >= 0);
    text.print(self);
}

This could also be repeated for char and u8.

@ftxqxd (Author) commented Jun 4, 2015

@tomassedovic Yeah, I was just thinking about doing that. Potentially (perhaps later in a different PR, though) the char/str → u8/[u8] conversion implementation could also manually map specific Unicode glyphs to their equivalents in libtcod’s character set, and only if no such equivalent exists would it throw a runtime error.
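
Purely to illustrate that mapping idea (the function name is made up; 0xf8 = chars::GRADE comes from the opening post):

// Sketch: map a char to a libtcod glyph index, returning None when no
// equivalent exists so the caller can decide whether that is a runtime error.
fn char_to_glyph(c: char) -> Option<u8> {
    match c {
        c if c.is_ascii() => Some(c as u8),
        '°' => Some(0xf8), // chars::GRADE in the default libtcod layout
        // ...further glyph mappings would go here...
        _ => None,
    }
}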

One question, though: the ASCII control characters display differently in libtcod’s character set. These characters could easily be supported using char/&str, because they have the same representation as a C/byte string, but that’s potentially confusing, because they don’t display the same in Unicode, which is what char and &str are supposed to represent. So should a char/&str → u8/[u8] conversion method allow control characters?

@Gustorn I did try to use the _utf functions from the FFI, but they just didn’t work in my testing (e.g., é displayed as ú), and I had to manually convert &str to UTF-32 anyway, which is an O(n) operation.

@zsparal (Collaborator) commented Jun 4, 2015

@P1start Does libtcod really expect UTF-32 strings? That's a bit curious... Also, are you sure you were using a Unicode font for the Root console?

@ftxqxd (Author) commented Jun 4, 2015

@Gustorn Are Unicode fonts available for libtcod? I was under the impression that only 256-character terminal.png-like fonts were supported. And I’m pretty sure they’re UTF-32 because the C API expects const wchar_t *, which is equivalent to *const i32 in Rust (wchar_t is 32 bits, at least on my platform).

Also, I just realised that a static-dispatch-based API (fn print<T: AsTcodStr> or whatever) isn’t going to work, since that would make Console not object safe, which is a big deal. You could use dynamic dispatch (fn set_char(x: i32, y: i32, glyph: &ToTcodStr) or something similar), but then you’d have to box/reference every string or character, making the ergonomic impact even worse, probably.
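
A minimal illustration of the object-safety problem, with a made-up trait name:

// A trait with a generic method has no single vtable entry for that method,
// so the compiler refuses to build a trait object for it.
trait NotObjectSafe {
    fn print<T: AsRef<str>>(&mut self, text: T);
}

// fn draw_all(consoles: &mut [Box<NotObjectSafe>]) {}
// ^ error: the trait `NotObjectSafe` cannot be made into an object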

@zsparal (Collaborator) commented Jun 4, 2015

Here's a forum topic about an (apparently) full Unicode bitmap font. As for wchar_t and UTF: to hold a decoded Unicode code point you need a type of at least 21 bits (32 in practice), since code points range up to U+10FFFF; in UTF-8 a single code point takes 1–4 bytes (the original spec allowed up to 6, but RFC 3629 caps it at 4). As a side note, the libtcod implementation is actually not that great here, since wchar_t is compiler-specific and can be as small as 8 bits...
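
A quick illustration of the byte counts involved (plain Rust, nothing tcod-specific):

fn main() {
    // UTF-8 lengths for a few code points (1–4 bytes per RFC 3629).
    assert_eq!('A'.len_utf8(), 1);  // U+0041
    assert_eq!('°'.len_utf8(), 2);  // U+00B0
    assert_eq!('€'.len_utf8(), 3);  // U+20AC
    assert_eq!('𝄞'.len_utf8(), 4);  // U+1D11E
    // A decoded code point itself fits comfortably in 32 bits:
    assert_eq!('𝄞' as u32, 0x1D11E);
}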

Object safety is actually a really good point, one I hadn't considered. To be completely honest, I haven't really seen a use case for trait objects here, so object safety doesn't seem like a terribly big deal to me: the only function where Console types are actually interchangeable is blit; other than that I can't see why you'd store a Vec<Box<Console>>, for example.

@ftxqxd (Author) commented Jun 4, 2015

@Gustorn Looks like that ‘only’ supports the Basic Multilingual Plane, but I now see that you can indeed support more than just 256 characters. I guess I now know how I was using the _utf functions incorrectly: the only difference is that they support 2^32 characters instead of just 256; they don’t necessarily map to Unicode values unless your font file is specifically organised that way.

One of the first things I wrote with tcod involved using Console as a trait object, in order to make sure another trait was object safe. It went something like this:

trait Draw {
    fn draw(&self, con: &mut Console);
}

And then there was going to be a Vec<Box<Draw>> somewhere which would be iterated over to draw every drawable object, but I never really got that far. It’s not a super compelling example, but it’s a good idea anyway to keep traits object safe whenever possible.

@zsparal (Collaborator) commented Jun 4, 2015

Yeah, unfortunately that is a valid example, even though I would argue that drawing on Offscreen consoles would be good enough: it makes very simple prototypes harder to write (no direct drawing to Root), but 90% of the time you would draw on Offscreen consoles and blit them to Root anyway. These are the times when I actually miss C++: a templated print wouldn't cause any problems there.

I still think that using u8 as the default is not the greatest idea: it has a significant ergonomic impact (either byte literals or as_bytes (?)), and it's much more natural to work with String and &str. Another option would be to get rid of every function inside the trait (except for con) and define everything as free functions. Since there's no actual UFCS in Rust, this would be pretty inconvenient (even though it would mirror the C API quite faithfully):

root.print(args...); 
// Would become
console::print(root, args...);

@zsparal (Collaborator) commented Jun 4, 2015

So here's some testing I've done with object safety, and I'm happy to say we can actually have the generic print and keep Console object safe. Here's the link to my little test: commit. Unfortunately I don't really have time today, but I think I'll implement it tomorrow.
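
The linked commit isn't reproduced here; the following is just one standard way a generic print can coexist with object safety, using a where Self: Sized bound so the generic method simply isn't available on trait objects (names are illustrative):

trait Console {
    // Object-safe: takes only concrete types, so it gets a vtable entry.
    fn put_char(&mut self, x: i32, y: i32, glyph: u8);

    // Generic, but opted out of trait objects via `Self: Sized`;
    // callers with a concrete console type can still use it.
    fn print<T: AsRef<str>>(&mut self, x: i32, y: i32, text: T)
    where
        Self: Sized,
    {
        for (i, b) in text.as_ref().bytes().enumerate() {
            self.put_char(x + i as i32, y, b);
        }
    }
}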

@tomassedovic (Owner)

I'm sorry, but I've had little time to participate today and I'll be completely off for the next few days. If you come up with something that doesn't require the u8 and [u8] literals, feel free to merge it.

I'll be back on Monday so if it doesn't come in by then, I'll read through this thread again and get this fixed one way or another.

Thanks both of you.

@zsparal (Collaborator) commented Jun 12, 2015

Proposing to close this since #163 was merged.

@tomassedovic (Owner)

Agreed, the unicode concerns should be addressed by #163. Note that we still don't use b'@' and b"foo" directly, but that can be addressed in another pull request.

Thanks @P1start for starting this! If you feel we should re-open this, feel free to comment here.

@zsparal (Collaborator) commented Jun 12, 2015

I'll reopen this since I did a bit of research regarding the literals.

After a bit of experimentation I'm not sure we should accept ASCII literals, and here's why. With the current implementation we choose the printing function based on Rust's is_ascii function, which only returns true if the given characters are part of the original (non-extended) ASCII table. This means that for the example in the opening post (b"90\xf8"), is_ascii would return false, which would result in us trying to use the UTF-8 printing function with an invalid byte (the degree character is represented by two bytes in UTF-8). This is most certainly not the behaviour one would expect.
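
The failure mode described above, in miniature (plain checks, no tcod calls; 0xf8 is chars::GRADE from the opening post):

fn main() {
    let bytes: &[u8] = b"90\xf8";
    // The slice fails the ASCII check...
    assert!(!bytes.is_ascii());
    // ...and it isn't valid UTF-8 either, so routing it to the UTF printing
    // function would hand libtcod an invalid byte sequence.
    assert!(std::str::from_utf8(bytes).is_err());
}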

Now the current runtime dispatch could be combined with a custom trait, but I'm not sure introducing arbitrary rules for the different string types is such a good idea. Let's say we have a trait TcodString that controls whether a value should be converted to UTF-8 or not. A possible implementation for the different string (or string-like) types:

  • &str -> convert if not is_ascii
  • String -> convert if not is_ascii
  • &[u8] -> never convert

This is probably close to the behaviour people would expect: since libtcod doesn't really care about encoding, using ASCII literals, even from the extended ASCII table, should give correct results. The two regular string types are at least already guaranteed to be valid UTF-8, but we can skip the conversion when they are in the original ASCII range (this is what we're doing now).

I can probably implement it in a way that still accepts user-defined string types, provided they implement AsRef<str>. Without negative trait bounds, also accepting AsRef<[u8]> is impossible, I think (it would have to be a concrete implementation for &[u8]).
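
A rough sketch of the rules above, with invented trait and method names (not the eventual tcod-rs API):

trait TcodString {
    /// Returns the raw bytes if they can go to the plain printing function,
    /// or None if the UTF printing path should be used instead.
    fn ascii_bytes(&self) -> Option<&[u8]>;
}

impl TcodString for str {
    fn ascii_bytes(&self) -> Option<&[u8]> {
        // &str: use the UTF path unless it is plain ASCII.
        if self.is_ascii() { Some(self.as_bytes()) } else { None }
    }
}

impl TcodString for String {
    fn ascii_bytes(&self) -> Option<&[u8]> {
        // String: same rule as &str.
        self.as_str().ascii_bytes()
    }
}

impl TcodString for [u8] {
    fn ascii_bytes(&self) -> Option<&[u8]> {
        // &[u8]: never convert; the bytes are libtcod glyph indices as-is,
        // extended-ASCII values like 0xf8 included.
        Some(self)
    }
}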
