New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Prototype Rust Backend #6
Comments
I'm questioning whether generating an actual Rust struct is necessary at all, although it could be helpful for relatively simple structs with fixed field layouts. Complex structs could be represented by an abstract slice with custom accessor methods. |
Yeah, that may indeed be the case. We also want the output Rust API to be reasonably predicable, so changing the specification slightly should not greatly change the resulting code. |
Even simple structs will need accessors anyway for endian swapping if we want zero-copy access. |
Note that in the cmap example above I do just that. Note that in Rust struct fields are private by default, so the only way to interact with this API is via the accessors. |
Right, so the public API doesn't hinge on there being an actual struct, just a type with the appropriate name. (Zero copy would require some lifetime trickery to make references live as long as the main buffer, I suppose). |
Yup. The #[repr(C)]
pub struct CMap {
version: u16,
num_tables: u16,
encoding_records: [EncodingRecord],
} The |
Another alternative might be to do something like: pub struct CMap<'data> {
data: &'data [u8],
}
pub fn from_buf<'data>(data: &'data [u8]) -> Result<CMap<'data>, ()> {
...
} Would be interested to see how Nom does deserialization, seeing as it claims to be zero-copy too. Edit: Seems like nom is zero copy in the sense of it does not allocate copies of data from the input iterator. But it does require you copy data into new structs in order to make use of the parsed data. For example, the gif decoder: https://github.com/Geal/gif.rs/blob/master/src/parser.rs |
Here are some different methods I've been exploring for the by-ref output: use std::{mem, ptr, slice};
pub struct Bmp {
w: u64,
h: u64,
data: [u8],
}
impl Bmp {
pub fn new(buf: &[u8]) -> Result<&Bmp, ()> {
if buf.len() < Bmp::min_size() {
Err(())
} else {
let bmp = unsafe { mem::transmute::<_, &Bmp>(buf) };
if buf.len() != bmp.exact_size() {
Err(())
} else {
Ok(bmp)
}
}
}
fn min_size() -> usize {
mem::size_of::<u64>() + // w
mem::size_of::<u64>() // h
}
fn exact_size(&self) -> usize {
Bmp::min_size() +
mem::size_of::<u8>() * self.w() as usize * self.h() as usize // data
}
pub fn w(&self) -> u64 {
u64::from_be(self.w)
}
pub fn h(&self) -> u64 {
u64::from_be(self.h)
}
pub fn data(&self) -> &[u8] {
unsafe { slice::from_raw_parts(self.data.as_ptr(), self.w() as usize * self.h() as usize) }
}
} use std::{mem, ptr, slice};
pub struct Bmp {
data: [u8],
}
impl Bmp {
pub fn new(buf: &[u8]) -> Result<&Bmp, ()> {
if buf.len() < Bmp::min_size() {
Err(())
} else {
let bmp = unsafe { mem::transmute::<_, &Bmp>(buf) };
if buf.len() != bmp.exact_size() {
Err(())
} else {
Ok(bmp)
}
}
}
fn min_size() -> usize {
Bmp::w_size() + Bmp::h_size()
}
fn exact_size(&self) -> usize {
Bmp::min_size() + self.data_size()
}
fn w_size() -> usize { mem::size_of::<u64>() }
fn h_size() -> usize { mem::size_of::<u64>() }
fn data_size(&self) -> usize { mem::size_of::<u8>() * self.w() as usize * self.h() as usize }
fn w_offset() -> isize { 0 }
fn h_offset() -> isize { Bmp::w_offset() + Bmp::w_size() as isize }
fn data_offset() -> isize { Bmp::h_offset() + Bmp::h_size() as isize }
pub fn w(&self) -> u64 {
unsafe {
let ptr = self.data.as_ptr().offset(Bmp::w_offset()) as *const u64;
u64::from_be(ptr::read(ptr))
}
}
pub fn h(&self) -> u64 {
unsafe {
let ptr = self.data.as_ptr().offset(Bmp::h_offset()) as *const u64;
u64::from_be(ptr::read(ptr))
}
}
pub fn data(&self) -> &[u8] {
unsafe {
let ptr = self.data.as_ptr().offset(Bmp::data_offset()) as *const u8;
slice::from_raw_parts(ptr, self.data_size())
}
}
} Note that the generated assembly code is identical. This is very similar to what Harfbuzz is doing. Advantages:
Disadvantages:
|
I wanted to experiment with the above unsafe APIs because they followed Harfbuzz's technique, but using Here are some more ideas for the 'spidery api', but this time using byte slices as they are intended top be used: //! Pixel = {
//! r : u8,
//! g : u8,
//! b : u8,
//! };
//!
//! Bmp = struct {
//! w : u64be,
//! h : u64be,
//! data : [Pixel; w * h],
//! trailer_len : u32be,
//! trailer_data : [u32be; trailer_len],
//! };
extern crate byteorder;
use byteorder::{BigEndian, ReadBytesExt};
use std::io::Cursor;
use std::mem;
pub struct BmpDataRef<'a> {
buf: &'a [u8],
}
impl<'a> BmpDataRef<'a> {
pub fn new(buf: &[u8]) -> BmpDataRef {
BmpDataRef { buf }
}
}
pub struct BmpRef<'a> {
buf: &'a [u8],
}
impl<'a> BmpRef<'a> {
pub fn new(buf: &[u8]) -> BmpRef {
BmpRef { buf }
}
pub fn w(&self) -> u64 {
let offset = 0;
Cursor::new(&self.buf[offset..])
.read_u64::<BigEndian>()
.unwrap()
}
pub fn h(&self) -> u64 {
let offset = mem::size_of::<u64>();
Cursor::new(&self.buf[offset..])
.read_u64::<BigEndian>()
.unwrap()
}
pub fn data(&self) -> BmpDataRef {
let offset = mem::size_of::<u64>() * 2;
let size = mem::size_of::<u8>() * 3 * self.w() as usize * self.h() as usize;
BmpDataRef::new(&self.buf[offset..offset + size])
}
} This one is a spidery api with 'staged copying'. Advantage is that it does up-front verification in the constructor, and caches the results: //! Pixel = {
//! r : u8,
//! g : u8,
//! b : u8,
//! };
//!
//! Bmp = struct {
//! w : u64be,
//! h : u64be,
//! data : [Pixel; w * h],
//! trailer_len : u32be,
//! trailer_data : [u32be; trailer_len],
//! };
extern crate byteorder;
use byteorder::{BigEndian, ReadBytesExt};
use std::io::{self, Cursor};
use std::mem;
#[derive(Copy, Clone)]
pub struct BmpRef<'a> {
w: u64,
h: u64,
data: BmpDataCursor<'a>,
}
impl<'a> BmpRef<'a> {
pub fn new(buf: &[u8]) -> Result<BmpRef, ()> {
let w = Cursor::new(&buf.get(0..).ok_or(())?)
.read_u64::<BigEndian>()
.map_err(|_| ())?;
let h = Cursor::new(&buf.get(mem::size_of_val(&w)..).ok_or(())?)
.read_u64::<BigEndian>()
.map_err(|_| ())?;
let data = BmpDataCursor::new(&buf
.get(mem::size_of_val(&w) + mem::size_of_val(&h)..)
.ok_or(())?);
Ok(BmpRef { w, h, data })
}
pub fn w(&self) -> u64 {
self.w
}
pub fn h(&self) -> u64 {
self.h
}
pub fn data(&self) -> BmpDataCursor<'a> {
self.data
}
}
#[derive(Copy, Clone)]
pub struct BmpDataCursor<'a> {
buf: &'a [u8],
}
impl<'a> BmpDataCursor<'a> {
pub fn new(buf: &[u8]) -> BmpDataCursor {
BmpDataCursor { buf }
}
pub fn get(&self, index: usize) -> Option<Pixel> {
unimplemented!()
}
pub fn iter(&self) -> BmpDataIter {
unimplemented!()
}
}
pub struct BmpDataIter {
// TODO
}
impl Iterator for BmpDataIter {
type Item = Pixel;
fn next(&mut self) -> Option<Pixel> {
unimplemented!()
}
}
pub struct Pixel {
r: u8,
g: u8,
b: u8,
} |
In order to start making our push to go end-to-end via Allsorts, we need a backend for generating Rust code. This will probably be similar to how LALRPOP does it, but I would like to generate more readable code than it does.
The unsafe slices branch shows what a resulting decoder might look like note that this is a very simple case however, and we will need to do more trickery in cases where we have anonymous structs, unions, and have unknown-sized types in the middle of structs.
The text was updated successfully, but these errors were encountered: