Implement Prototype Rust Backend #6

Closed
brendanzab opened this issue Sep 21, 2017 · 9 comments · Fixed by #38
Comments

brendanzab (Member) commented Sep 21, 2017

In order to start making our push to go end-to-end via Allsorts, we need a backend for generating Rust code. This will probably be similar to how LALRPOP does it, but I would like to generate more readable code than it does.

The unsafe slices branch shows what a resulting decoder might look like. Note that this is a very simple case, however, and we will need to do more trickery in cases where we have anonymous structs, unions, and unknown-sized types in the middle of structs.

mikeday (Contributor) commented Sep 21, 2017

I'm questioning whether generating an actual Rust struct is necessary at all, although it could be helpful for relatively simple structs with fixed field layouts. Complex structs could be represented by an abstract slice with custom accessor methods.
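As a rough illustration of that idea (a minimal sketch; the EncodingRecordRef name, field, and offset are hypothetical, not taken from this project), the generated type could be nothing more than an opaque wrapper over a byte slice, with the layout living entirely in accessor methods:

pub struct EncodingRecordRef<'a> {
    buf: &'a [u8],
}

impl<'a> EncodingRecordRef<'a> {
    pub fn new(buf: &'a [u8]) -> EncodingRecordRef<'a> {
        EncodingRecordRef { buf }
    }

    // Decode a big-endian u16 at a fixed offset; a real implementation would
    // bounds-check the buffer rather than risk a panic on short input.
    pub fn platform_id(&self) -> u16 {
        u16::from_be_bytes([self.buf[0], self.buf[1]])
    }
}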

brendanzab (Member, Author) commented:

Yeah, that may indeed be the case. We also want the output Rust API to be reasonably predictable, so changing the specification slightly should not greatly change the resulting code.

mikeday (Contributor) commented Sep 21, 2017

Even simple structs will need accessors anyway for endian swapping if we want zero-copy access.
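A minimal sketch of that point (the Header type here is hypothetical): even when the field layout is fixed, the raw field holds the bytes exactly as they appear in the file, so every read goes through an endian-swapping accessor.

#[repr(C)]
pub struct Header {
    version: u16, // stored big-endian, exactly as read from the file
}

impl Header {
    // Swap to native endianness on every access.
    pub fn version(&self) -> u16 {
        u16::from_be(self.version)
    }
}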

brendanzab (Member, Author) commented Sep 21, 2017

Note that in the cmap example above I do just that. In Rust, struct fields are private by default, so the only way to interact with this API is via the accessors.

mikeday (Contributor) commented Sep 21, 2017

Right, so the public API doesn't hinge on there being an actual struct, just a type with the appropriate name. (Zero copy would require some lifetime trickery to make references live as long as the main buffer, I suppose).
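A small sketch of the lifetime side of this (hypothetical names): by giving the wrapper an explicit 'data lifetime, accessors can hand out slices that borrow from the original buffer rather than from the wrapper itself, and the borrow checker keeps them from outliving that buffer.

pub struct Table<'data> {
    buf: &'data [u8],
}

impl<'data> Table<'data> {
    pub fn new(buf: &'data [u8]) -> Table<'data> {
        Table { buf }
    }

    // The returned slice carries the 'data lifetime of the backing buffer,
    // not the shorter lifetime of &self. The 4-byte header skipped here is
    // purely illustrative.
    pub fn records(&self) -> &'data [u8] {
        &self.buf[4..]
    }
}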

brendanzab (Member, Author) commented Sep 21, 2017

Yup. The CMap type is defined as:

#[repr(C)]
pub struct CMap {
    version: u16,
    num_tables: u16,
    encoding_records: [EncodingRecord],
}

The [EncodingRecord] field means that this struct is 'dynamically sized', i.e. it can only be accessed via a pointer indirection. The only way it can be constructed is via pub fn from_buf(buf: &[u8]) -> Result<&CMap, ()>, so it will be forever forced to be immutable.
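A hypothetical caller (the version accessor is assumed for illustration, not shown above) would only ever see a borrowed, immutable &CMap tied to the lifetime of the input buffer:

fn print_cmap_version(buf: &[u8]) {
    // from_buf validates the buffer up front and reinterprets it as &CMap.
    match CMap::from_buf(buf) {
        Ok(cmap) => println!("cmap version: {}", cmap.version()),
        Err(()) => println!("invalid cmap table"),
    }
}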

brendanzab (Member, Author) commented Sep 21, 2017

Another alternative might be to do something like:

pub struct CMap<'data> {
    data: &'data [u8],
}

pub fn from_buf<'data>(data: &'data [u8]) -> Result<CMap<'data>, ()> {
    ...
}

Would be interested to see how Nom does deserialization, seeing as it claims to be zero-copy too.

Edit: Seems like nom is zero-copy in the sense that it does not allocate copies of data from the input iterator. But it does require you to copy data into new structs in order to make use of the parsed data. For example, the gif decoder: https://github.com/Geal/gif.rs/blob/master/src/parser.rs

brendanzab (Member, Author) commented Nov 8, 2017

Here are some different methods I've been exploring for the by-ref output:

use std::{mem, ptr, slice};

pub struct Bmp {
    w: u64,
    h: u64,
    data: [u8],
}

impl Bmp {
    pub fn new(buf: &[u8]) -> Result<&Bmp, ()> {
        if buf.len() < Bmp::min_size() {
            Err(())
        } else {
            let bmp = unsafe { mem::transmute::<_, &Bmp>(buf) };

            if buf.len() != bmp.exact_size() {
                Err(())
            } else {
                Ok(bmp)
            }
        }
    }

    fn min_size() -> usize {
        mem::size_of::<u64>() + // w
        mem::size_of::<u64>() // h
    }

    fn exact_size(&self) -> usize {
        Bmp::min_size() +
        mem::size_of::<u8>() * self.w() as usize * self.h() as usize // data
    }

    pub fn w(&self) -> u64 {
        u64::from_be(self.w)
    }

    pub fn h(&self) -> u64 {
        u64::from_be(self.h)
    }

    pub fn data(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.data.as_ptr(), self.w() as usize * self.h() as usize) }
    }
}

use std::{mem, ptr, slice};

pub struct Bmp {
    data: [u8],
}

impl Bmp {
    pub fn new(buf: &[u8]) -> Result<&Bmp, ()> {
        if buf.len() < Bmp::min_size() {
            Err(())
        } else {
            let bmp = unsafe { mem::transmute::<_, &Bmp>(buf) };

            if buf.len() != bmp.exact_size() {
                Err(())
            } else {
                Ok(bmp)
            }
        }
    }

    fn min_size() -> usize {
        Bmp::w_size() + Bmp::h_size()
    }

    fn exact_size(&self) -> usize {
        Bmp::min_size() + self.data_size()
    }

    fn w_size() -> usize { mem::size_of::<u64>() }
    fn h_size() -> usize { mem::size_of::<u64>() }
    fn data_size(&self) -> usize { mem::size_of::<u8>() * self.w() as usize * self.h() as usize }

    fn w_offset() -> isize { 0 }
    fn h_offset() -> isize { Bmp::w_offset() + Bmp::w_size() as isize }
    fn data_offset() -> isize { Bmp::h_offset() + Bmp::h_size() as isize }

    pub fn w(&self) -> u64 {
        unsafe {
          let ptr = self.data.as_ptr().offset(Bmp::w_offset()) as *const u64;
          u64::from_be(ptr::read(ptr))
        }
    }

    pub fn h(&self) -> u64 {
        unsafe {
          let ptr = self.data.as_ptr().offset(Bmp::h_offset()) as *const u64;
          u64::from_be(ptr::read(ptr))
        }
    }

    pub fn data(&self) -> &[u8] {
        unsafe {
          let ptr = self.data.as_ptr().offset(Bmp::data_offset()) as *const u8;
          slice::from_raw_parts(ptr, self.data_size())
        }
    }
}

Note that the generated assembly code is identical for both versions. This is very similar to what HarfBuzz is doing.

Advantages:

  • all bounds checking is done up-front
  • no allocation occurs because it reuses the same bytes as the buffer

Disadvantages:

  • requires the full buffer to be persisted in memory
  • requires a great deal of unsafe code under the hood, which could be tricky to audit
  • 'interp' types have to be evaluated on demand, without memoization (e.g. for endian conversions)
  • users might want to persist part of the struct beyond the lifetime of the buffer
  • not compatible with streaming

brendanzab (Member, Author) commented:

I wanted to experiment with the above unsafe APIs because they followed HarfBuzz's technique, but using unsafe in generated code gives me the heebie-jeebies. Don't want to hit something like that Ragel bug that hit Cloudflare…

Here are some more ideas for the 'spidery API', but this time using byte slices as they are intended to be used:

//! Pixel = {
//!     r : u8,
//!     g : u8,
//!     b : u8,
//! };
//!
//! Bmp = struct {
//!     w : u64be,
//!     h : u64be,
//!     data : [Pixel; w * h],
//!     trailer_len : u32be,
//!     trailer_data : [u32be; trailer_len],
//! };

extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt};
use std::io::Cursor;
use std::mem;

pub struct BmpDataRef<'a> {
    buf: &'a [u8],
}

impl<'a> BmpDataRef<'a> {
    pub fn new(buf: &[u8]) -> BmpDataRef {
        BmpDataRef { buf }
    }
}

pub struct BmpRef<'a> {
    buf: &'a [u8],
}

impl<'a> BmpRef<'a> {
    pub fn new(buf: &[u8]) -> BmpRef {
        BmpRef { buf }
    }

    pub fn w(&self) -> u64 {
        let offset = 0;
        Cursor::new(&self.buf[offset..])
            .read_u64::<BigEndian>()
            .unwrap()
    }

    pub fn h(&self) -> u64 {
        let offset = mem::size_of::<u64>();
        Cursor::new(&self.buf[offset..])
            .read_u64::<BigEndian>()
            .unwrap()
    }

    pub fn data(&self) -> BmpDataRef {
        let offset = mem::size_of::<u64>() * 2;
        let size = mem::size_of::<u8>() * 3 * self.w() as usize * self.h() as usize;
        BmpDataRef::new(&self.buf[offset..offset + size])
    }
}
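A hypothetical caller of this first variant; note that nothing is cached, so each accessor re-reads the buffer through a fresh Cursor (and will panic via unwrap if the buffer is too short):

fn bmp_dimensions(buf: &[u8]) -> (u64, u64) {
    let bmp = BmpRef::new(buf);
    // w() and h() each re-parse their field from the underlying bytes.
    (bmp.w(), bmp.h())
}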

The next one is a 'spidery API' with 'staged copying'. The advantage is that it does up-front verification in the constructor and caches the results:

//! Pixel = {
//!     r : u8,
//!     g : u8,
//!     b : u8,
//! };
//!
//! Bmp = struct {
//!     w : u64be,
//!     h : u64be,
//!     data : [Pixel; w * h],
//!     trailer_len : u32be,
//!     trailer_data : [u32be; trailer_len],
//! };

extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt};
use std::io::{self, Cursor};
use std::mem;

#[derive(Copy, Clone)]
pub struct BmpRef<'a> {
    w: u64,
    h: u64,
    data: BmpDataCursor<'a>,
}

impl<'a> BmpRef<'a> {
    pub fn new(buf: &[u8]) -> Result<BmpRef, ()> {
        let w = Cursor::new(&buf.get(0..).ok_or(())?)
            .read_u64::<BigEndian>()
            .map_err(|_| ())?;
        let h = Cursor::new(&buf.get(mem::size_of_val(&w)..).ok_or(())?)
            .read_u64::<BigEndian>()
            .map_err(|_| ())?;

        let data = BmpDataCursor::new(&buf
            .get(mem::size_of_val(&w) + mem::size_of_val(&h)..)
            .ok_or(())?);

        Ok(BmpRef { w, h, data })
    }

    pub fn w(&self) -> u64 {
        self.w
    }

    pub fn h(&self) -> u64 {
        self.h
    }

    pub fn data(&self) -> BmpDataCursor<'a> {
        self.data
    }
}

#[derive(Copy, Clone)]
pub struct BmpDataCursor<'a> {
    buf: &'a [u8],
}

impl<'a> BmpDataCursor<'a> {
    pub fn new(buf: &[u8]) -> BmpDataCursor {
        BmpDataCursor { buf }
    }
    
    pub fn get(&self, index: usize) -> Option<Pixel> {
        unimplemented!()
    }
    
    pub fn iter(&self) -> BmpDataIter {
        unimplemented!()
    }
}

pub struct BmpDataIter {
    // TODO
}

impl Iterator for BmpDataIter {
    type Item = Pixel;
    
    fn next(&mut self) -> Option<Pixel> {
        unimplemented!()
    }
}

pub struct Pixel {
    r: u8,
    g: u8,
    b: u8,
}
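A hypothetical usage sketch for the staged-copying variant (assuming the cursor methods above get filled in): validation happens once in new, after which w and h are just cached copies.

fn bmp_size(buf: &[u8]) -> Option<(u64, u64)> {
    let bmp = BmpRef::new(buf).ok()?;
    // These return the values cached during construction; no re-parsing.
    Some((bmp.w(), bmp.h()))
}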

brendanzab added a commit that referenced this issue Aug 30, 2019