Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design flaw: Access out of bounds on pointer to array #386

Closed
ofelas opened this issue Jun 6, 2017 · 10 comments · Fixed by #1053
Closed

design flaw: Access out of bounds on pointer to array #386

ofelas opened this issue Jun 6, 2017 · 10 comments · Fixed by #1053
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. enhancement Solving this issue will likely involve adding new logic or components to the codebase.
Milestone

Comments

@ofelas
Copy link

ofelas commented Jun 6, 2017

Unclear behaviour on array access

fn fillArray(src: &const [24]u8, dest: &[24]u8) -> usize {
    var idx: usize = 24;
    // access of of bounds
    dest[1] = src[0];
    // copies everything
    // dest[0] = src[0];

    // error: expected type '[24]u8', found 'u8'
    //dest[1] = u8('a');
    (*dest)[14] = 'b';
    (*dest)[3] = 'c';
    // access out of bounds
    dest[idx] = src[0];
    // segfault
    //(*dest)[idx] = (*src)[0];
    // error: index 24 outside array of size 24
    //(*dest)[24] = 'b';

    return (*dest).len;
}

pub fn main() -> %void {
    var s : [24]u8 = []u8{0} ** 24;
    var ss : [24]u8 = []u8{'a'} ** 24;
    const l = fillArray(&ss, &s);
    if (s[0] == 'a') {
       s[1] = s[0];
    }
}
@andrewrk andrewrk added this to the 0.1.0 milestone Jun 6, 2017
@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior and removed bug Observed behavior contradicts documented or intended behavior labels Jun 6, 2017
@andrewrk
Copy link
Member

andrewrk commented Jun 6, 2017

I don't think there is a compiler bug here, but this reveals an unfortunate design flaw in Zig.

Given dest: &[24]u8, it's natural to expect dest[0] to access the first u8 in the array, in the same way that if you had foo: &Foo you would expect foo.field to access the field. However, dest[0][0] is the first u8 in the array, and dest[0] is the array itself. dest[1] is out of bounds and assigning to it clobbers the stack. It's far too easy to accidentally do this here.

I'm curious to get @thejoshwolfe's thoughts here.

@andrewrk
Copy link
Member

andrewrk commented Jun 7, 2017

Here's a proposal:

  • Add a type that looks like this: [..]T. This is the ArithmeticPointer type. &T is renamed to the Reference type. Consideration: It kind of makes more sense that []T would be the arithmetic pointer type and [..]T would be the slice type. I'm not a fan of that though because slices are so common, I want them to be []T.
  • Reference types can no longer be sliced or indexed like arrays. If your type is &[24]u8 however, then you could slice and index this, and it would slice and index the underlying array, just like you would expect.
  • ArithmeticPointer types support binary + and - operations with integer types.
  • Arithmetic Pointer types can be sliced and indexed. Slicing an arithmetic pointer type returns a slice type.
  • Arithmetic Pointer types do not support *x prefix dereferencing operator. Use x[0] instead.
  • Remove @intToPtr and allow ([..]T)(address) explicit casting from integer types
  • Slices can be converted to ArithmeticPointer types like this: slice.ptr. This returns a [..]T for the first element in the array. @typeOf(&slice[0]) is &T - a reference type of the child type of the slice.
  • ([..]T)(ref) to convert a reference to an arithmetic pointer

@AndreaOrru thoughts? how would this fit into zen?

@raulgrell
Copy link
Contributor

My only suggestion for the proposal is to make the ArithmeticPointer type a #T instead of [..]T as the latter might be too close to a slice.

I don't know if it varies between cultures, but the # has a strong association with numbers: ie you can do arithmetic to it and you should access it with a number subscript/index.

@thejoshwolfe
Copy link
Sponsor Contributor

I'm still a little uneasy about the [..]T syntax as well. I'm not sure i like adding a new type sigil either though. Part of this proposal is an agenda to discourage programmers from using an arithmetic pointer when a reference or a slice would be better (better for debug safety, clarity of intent, etc.). The somewhat-excessive 4-character non-alphanumeric prefix [..] works for this agenda, but i'm concerned it may fail to be readable.

My original idea of an arithmetic pointer was that it would be a slice with no runtime-known .len that you get by slicing something with x[0..]. However, i don't think that "unbounded slice" syntax is very useful given that normal slices have a .ptr field you can use instead.

The important things about an arithmetic pointer are that you can do + and - arithmetic, and you can do [n] subscripting. So how about [+]T. Parsing that is pretty clean, because Zig has no prefix + operator.

@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. enhancement Solving this issue will likely involve adding new logic or components to the codebase. labels Jun 16, 2017
@andrewrk
Copy link
Member

I'm going to move this to 0.2.0. It's an important, breaking change, but status quo is working and well-defined, if a bit unintuitive. I think we can release 0.1.0 before focusing on this change.

@andrewrk andrewrk modified the milestones: 0.2.0, 0.1.0 Jun 17, 2017
@andrewrk andrewrk changed the title Access out of bounds on pointer to array design flaw: Access out of bounds on pointer to array Dec 6, 2017
@skyfex
Copy link

skyfex commented Dec 14, 2017

I stumbled into this design issue today, and can provide some feedback. Pointer arithmetic is definitely very important for some cases. I was able to use indexing instead in my case, but I did stumble on some things that were annoying.

I wanted a pointer to someArray[someArray.len]. (I would then increase another pointer until it became equal to that end-pointer). This would trigger out-of-bounds error. That's actually good though. Safety is important. If we had pointer arithmetic, I could do &someArray[someArray.len-1] + 1. I think that would communicate what I wanted, and that I knew what I was doing, without having to disable bounds checking.

When incrementing a pointer, doing bufferPtr = &bufferPtr[1] does not clearly communicate intention. It just feels like magic incantation to get to what I really want.

I've always found it a bit magic too, that writing somePtr + 1 in C adds something other than 1 to the address. But I don't have a better suggestion, and it's too useful to not have. In some languages it can be common to define functions that wrap numbers in a type to make it clear what you're working with. It's quite nice when you have a dot-syntax. In that case you could have somePtr + 1.unit. But it may be too verbose and it might be hard to find a word that's reasonable to reserve.

I think the proposal looks pretty good. Declaring something as an arithmetic pointer provides very useful information about what you intend to do with it.

There may be interactions with null pointers to think about as well. In my experiment, I wish I could compare a null pointer with a plain pointer. They may not be the same type, but I think nullPtr != normalPtr should be true if nullPtr is null. Possibly nullPtr < normalPtr too.

@andrewrk
Copy link
Member

andrewrk commented Dec 15, 2017

I could do &someArray[someArray.len-1] + 1

I think this is probably not the best way to communicate intent here. If the slice is length 0 this will trigger an out of bounds panic. The -1, +1 is clearly a workaround and not directly communicating what you want.

If you really just want a pointer to someArray[someArray.len] you can get it like this:
&someArray.ptr[someArray.len].

Help me understand though, why do you need a pointer to the address after the valid data?

As for the use case of incrementing a pointer, I want to understand this use case better. If you're incrementing a pointer, that means that there is something that specifies how many items after the base pointer are valid. If that value is known ahead of time, then a slice is appropriate. If it is null terminated (see related issue #265) or otherwise found out lazily, you can use an index to march forward, like the implementation of strlen in std.cstr.len:

pub fn len(ptr: &const u8) -> usize {
    var count: usize = 0;
    while (ptr[count] != 0) : (count += 1) {}
    return count;
}

I wish I could compare a null pointer with a plain pointer.

We can definitely make this work. What we'll do is implement the equality operators for ?&T types, if it's not already implemented. Then we just need the ability to implicitly cast, and peer cast &T to ?&T which I'm pretty sure is already implemented and tested. Then a == b or a != b where a and b are any combination of ?&T and &T will work.

@skyfex
Copy link

skyfex commented Dec 15, 2017

If you really just want a pointer to someArray[someArray.len] you can get it like this:
&someArray.ptr[someArray.len].

That's what I originally tried, but that triggers bounds checking. I don't think that wrapping the code in something that disables bounds checking is nice either.

As for the use case of incrementing a pointer, I want to understand this use case better.

If you have a buffer that you're consuming items from one at a time, you can roughly do it two ways.

1: Have a pointer to the start of the buffer, an index of your current item, and the length or end pointer (3 words)

2: Have a pointer to the next item to consume, and the length or end pointer (2 words)

The second approach saves you a word. You could do it without incrementing a pointer: you could have a pointer to the end, and a counter of how many items are left. But that's a bit silly.

You could elegantly do the second approach with a slice. ptr = ptr[1..] should work maybe? But then you have two store both words back to memory, rather than just updating the start pointer.

It doesn't really have to be easy to do pointer arithmetic, if the standard library includes generic structures that abstracts away super-optimized implementations of various datastructures and algorithms, it's not that necessary.

Look at the code related to 'uartBufferPtr' here to see what I'm doing now. I'm currently using an end-pointer rather than length or slice: https://github.com/skyfex/zig-nrf-demo/blob/master/test.zig

Looking at the code now, I'm not sure I actually mind doing &ptr[1]. Maybe there should only be one way to do things.. Having both indexing and pointer arithmetic is perhaps redundant.

@andrewrk
Copy link
Member

That's what I originally tried, but that triggers bounds checking. I don't think that wrapping the code in something that disables bounds checking is nice either.

Did you see the .ptr? This test passes for me:

const assert = @import("std").debug.assert;

test "get end ptr of slice" {
    var array = []u8{1, 2, 3, 4};
    var slice = array[0..];

    const ptr_to_end = &slice.ptr[slice.len];
    assert(@ptrToInt(ptr_to_end) == @ptrToInt(&slice[0]) + 4);
}

You could elegantly do the second approach with a slice. ptr = ptr[1..] should work maybe? But then you have two store both words back to memory, rather than just updating the start pointer.

I think this is the best way to represent this. You're right that it has this downside of storing both words. Sometimes the optimizer is able to avoid this, sometimes not.

Is this code reasonable?

diff --git a/test.zig b/test.zig
index 464d9f4..0cd7e55 100644
--- a/test.zig
+++ b/test.zig
@@ -130,8 +130,9 @@ fn NVIC_DisableIRQ(IRQn: c_int) {
 // -- UART Driver --
 
 var uartBusy: bool = false;
-var uartBufferPtr: &const u8 = undefined;
-var uartBufferEnd: &const u8 = undefined;
+
+var uartBuffer: []const u8 = undefined;
+var uartBuffer_index: usize = undefined;
 
 fn uartInit() {
     // NVIC_SetPriority(PWM_IRQn, 2);
@@ -152,11 +153,9 @@ export fn UART0_IRQHandler() {
 }
 
 fn uartSendBufferByte() {
-    if (@ptrToInt(uartBufferPtr) <= @ptrToInt(uartBufferEnd)) {
-        nrfUart.TXD = *uartBufferPtr;
-        uartBufferPtr = &uartBufferPtr[1];
-    }
-    else {
+    if (uartBuffer_index < uartBuffer.len) {
+        nrfUart.TXD = uartBuffer[uartBuffer_index];
+    } else {
         uartBusy = false;
         nrfGpio.OUT = nrfGpio.OUT ^ (1<<4);
     }
@@ -165,8 +164,8 @@ fn uartSendBufferByte() {
 fn uartSendString(str: []const u8) {
     while (uartBusy) {}
     uartBusy = true;
-    uartBufferPtr = &str[0];
-    uartBufferEnd = &str[str.len-1];
+    uartBuffer_index = 0;
+    uartBuffer = str;
 
     uartSendBufferByte();
 }

@skyfex
Copy link

skyfex commented Dec 15, 2017

Ah, I missed ".ptr", yes. The alternate code is reasonable yes. If I wasn't playing around, and just wanted clear/simple code, I'd probably go for something like that. But when Zig becomes a decent alternative to C, I know there'll be people who will want to do hardcore hand optimisations, so it's interesting to see how natural that is in Zig.

The safest way should be the easiest though.

@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Feb 28, 2018
andrewrk added a commit that referenced this issue Jun 5, 2018
 * enable slicing for single-item ptr to arrays
 * disable slicing for other single-item pointers
 * enable indexing for single-item ptr to arrays
 * disable indexing for other single-item pointers

see #770
closes #386
andrewrk added a commit that referenced this issue Jun 5, 2018
 * enable slicing for single-item ptr to arrays
 * disable slicing for other single-item pointers
 * enable indexing for single-item ptr to arrays
 * disable indexing for other single-item pointers

see #770
closes #386
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. enhancement Solving this issue will likely involve adding new logic or components to the codebase.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants