
Add SIMD Support #903

Open
tiehuis opened this Issue Apr 7, 2018 · 18 comments

@tiehuis
Member

commented Apr 7, 2018

Current Progress


SIMD is very useful for fast processing of data, and given Zig's goal of going fast, I think we need to look at exposing some way of using these instructions easily and reliably.

Status-Quo

Inline Assembly

It is possible to do SIMD in inline assembly as is. This is a bit cumbersome, though, and I think we should strive to be able to get this kind of performance in the Zig language itself.

Rely on the Optimizer

The optimizer is good, and comptime unrolling helps a lot, but it provides no guarantee that any specific code will be vectorized. You are at the mercy of LLVM, and you don't want to see your code take a huge hit in performance simply due to a compiler upgrade/change.
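
For illustration, a minimal sketch (the function name is hypothetical) of the kind of scalar loop one hopes the optimizer will auto-vectorize; whether it actually becomes SIMD code is entirely up to LLVM:

fn addArrays(dst: []f32, a: []const f32, b: []const f32) void {
    // A plain scalar loop: LLVM's auto-vectorizer may turn this into SIMD
    // instructions in release modes, but nothing in the language guarantees it.
    var i: usize = 0;
    while (i < a.len) : (i += 1) {
        dst[i] = a[i] + b[i];
    }
}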

LLVM Vector Intrinsics

LLVM supports vector types as first-class objects in its IR. These correspond to SIMD instructions. This provides the bulk of the work; we simply need to expose a way to construct these vector types. This would be analogous to the __attribute__((vector_size(N))) attribute found in C compilers.
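
As a hedged illustration of the mapping (written with the @Vector builtin that gets adopted later in this thread), an element-wise add on a 4-lane vector is intended to lower to a single LLVM IR operation on a <4 x i32> value:

fn add4(a: @Vector(4, i32), b: @Vector(4, i32)) @Vector(4, i32) {
    // Intended to lower to one LLVM `add <4 x i32>` instruction,
    // which the backend can then emit as a SIMD add.
    return a +% b;
}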


If anyone has any thoughts on the implementation and/or usage, that would be great, since I'm not very familiar with how these are exposed by LLVM. It would be good to get some discussion going in this area, since I'm sure people would like to be able to match the performance of C in all areas with Zig.

@tiehuis tiehuis added the proposal label Apr 7, 2018

@abique

commented Apr 7, 2018

I think relying on the compiler's vector type is a good solution.
Both LLVM and GCC have it, and if it's not present you can always fall back to a generic "software" implementation.

Syntax: you need a way to describe a vector type; one idea could be:

const value = <[]> f32 {0, 13, 23, 0.4};

So <[ N_ELTS ]> type would be the bracket style for vectors in this example.

Also, vectors are used essentially for arithmetic, so regular arithmetic operators should work.

Important things:

  • be able to extract a single element from a vector
  • needs some kind of shuffle vector: `@shuffle(v1, v2, index0, index1, index2, ...)`
  • you should be able to do addition or multiplication between a scalar and a vector
  • vectors can't be nested

The standard library should also provide SIMD versions of cos, sin, exp, and so on.
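
For reference, a hedged sketch of element extraction and element-wise arithmetic using the @Vector builtin this thread eventually settles on (not the <[N]> syntax proposed above); the test name and values are just for illustration:

const std = @import("std");

test "element extraction and element-wise arithmetic" {
    const v: @Vector(4, f32) = [4]f32{ 0, 13, 23, 0.4 };
    const twos: @Vector(4, f32) = [4]f32{ 2, 2, 2, 2 };
    const doubled = v * twos; // element-wise multiply; the scalar broadcast is done by hand
    std.debug.assert(v[1] == 13); // extract a single element
    std.debug.assert(doubled[2] == 46);
}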

@andrewrk

Member

commented Apr 7, 2018

How about adding operators for arrays? Example:

const std = @import("std");

test "simd" {
    var a = [4]i32{1, 2, 3, 4};
    var b = [4]i32{5, 6, 7, 8};
    var c = a + b;
    std.debug.assert(std.mem.eql(i32, c[0..], [4]i32{6, 8, 10, 12}));
}

This would codegen to using vectors in LLVM.

@abique

commented Apr 8, 2018

I believe you'll find that using arrays as "SIMD vectors" introduces more problems than solutions, and that's why LLVM and GCC went a different way.

The first thing is that they might have different alignment requirements. Plus, those vectors are supposed to end up stored in a single register, so you might want different codegen depending on whether something is a vector or an array.

I also worked on a private DSL where we distinguished vectors from arrays in the type system, and it worked fine as far as I can tell. The vector type also provides useful information during semantic analysis, and you see what you get. Otherwise you have some array magic, which is exactly the kind of thing people want to avoid when switching to a new language, right?
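
As a rough illustration of the alignment point, here is a sketch using the @Vector builtin adopted later in this thread; the exact values are target-dependent:

const std = @import("std");

test "array vs vector alignment" {
    // On an SSE-class x86_64 target one would typically expect the vector type
    // to require 16-byte alignment, while the array only needs @alignOf(f32).
    std.debug.assert(@alignOf(@Vector(4, f32)) >= @alignOf([4]f32));
}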

@andrewrk

Member

commented Apr 8, 2018

I think you're right - the simplest thing for everyone is to introduce a new vector primitive type and have it map exactly to the LLVM type.

@andrewrk andrewrk added the accepted label Apr 8, 2018

@andrewrk andrewrk added this to the 0.4.0 milestone Apr 8, 2018

@BraedonWooding

Contributor

commented Apr 25, 2018

Also keep in mind the rsqrtss instruction and others, which are seriously fast on systems that support them, showing speed increases of 10x. This article demonstrates some of the differences well: http://assemblyrequired.crashworks.org/timing-square-root/
and so does this one: http://adrianboeing.blogspot.com.au/2009/10/timing-square-root-on-gpu.html

We should aim to utilise this set of faster instructions when we can.

@lmb

commented Jul 16, 2018

I just stumbled on this. There is a blog post series by a (former?) Intel engineer who designed a compiler for a vectorized language: http://pharr.org/matt/blog/2018/04/18/ispc-origins.html
At the very least it's an interesting read, but maybe good inspiration as well.

@abique

commented Jul 16, 2018

Dense and interesting articles!

@BarabasGitHub

Contributor

commented Aug 13, 2018

One thing to keep in mind here is that even though you can vectorize scalar code, there are a lot of operations supported by SIMD instructions which you can't express in 'normal' scalar code, such as creating bit masks from floating-point comparisons to later use in bitwise operations (often to avoid branches). Plus there are integer operations which expand to wider integers, and other special stuff.
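
A hedged sketch of that mask-based, branchless pattern, assuming element-wise comparison yields a vector of bools and a select over that mask (Zig later provides @select for this):

fn clampNegativeToZero(v: @Vector(4, f32)) @Vector(4, f32) {
    // Compare all lanes at once, then pick each lane from either `zeroes`
    // or `v` based on the bool mask; no branches involved.
    const zeroes: @Vector(4, f32) = [4]f32{ 0, 0, 0, 0 };
    const mask = v < zeroes; // @Vector(4, bool)
    return @select(f32, mask, zeroes, v);
}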

The series of articles linked by @lmb also shows well what the difference can be between code/compiler that's designed for SIMD and code/compiler that isn't.

andrewrk added a commit that referenced this issue Jan 31, 2019

introduce vector type for SIMD
See #903

 * create with `@vector(len, ElemType)`
 * only wrapping addition is implemented

This feature is far from complete; this is only the beginning.
@andrewrk

Member

commented Jan 31, 2019

In the above commit I introduced the @Vector(len, ElemType) builtin to create vector types, and then I implemented addition (but I didn't make a test yet, hence the box is unchecked). So the effort here is started. Here is what I believe is left to do:

  • builtin @Vector to create vector types, and make it work with @typeInfo
  • compile error test for trying to nest vectors
  • addition (float)
  • addition (wrapping int)
  • int addition with safety checks
  • subtraction (wrapping int)
  • subtraction (float)
  • int subtraction with safety checks
  • negation (float, wrapping int)
  • int negation with safety checks
  • multiplication (wrapping int)
  • multiplication (float)
  • int multiplication with safety checks
  • division (float, wrapping int)
  • int division with safety checks
  • @rem/@mod/% (float, wrapping int)
  • int @rem/@mod/% with safety checks
  • @shlExact with safety checks
  • left shift
  • @shrExact with safety checks
  • right shift
  • bitwise AND
  • bitwise OR
  • bitwise XOR
  • @truncate
  • @floatCast
  • implicit widening casting
  • @floatToInt
  • @intToFloat
  • @ptrToInt
  • @intToPtr
  • @bitCast
  • @bitreverse
  • @bswap
  • @popCount
  • @clz
  • @ctz
  • @addWithOverflow, @mulWithOverflow, @subWithOverflow. This will either require a patch to LLVM or an implementation in Zig
  • field access / pointer access / array access when the vector is of pointer type
  • compile error for trying to get a pointer to a vector element
  • vector comparison (int, float, pointer), which returns a vector of bools
  • element reading with [x] syntax
  • element modification with [x] syntax
  • @shuffle (see http://llvm.org/docs/LangRef.html#shufflevector-instruction)
  • implicit array to vector cast
  • implicit vector to array cast
  • @sqrt should support vectors, and so should std.math.sqrt. Same for all the other math intrinsics, but I think sqrt is the only one we have so far. (See #767)
    • powi
    • sin
    • cos
    • pow
    • exp
    • exp2
    • log
    • log10
    • log2
    • fma
    • fabs
    • minnum
    • maxnum
    • minimum
    • maximum
    • copysign
    • floor
    • ceil
    • trunc
    • rint
    • nearbyint
    • round
    • fshl / fshr
  • once #653 is implemented it will need to work with vectors of pointers
  • once #1284 is implemented it will need vector support
  • @maskedLoad (see http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics)
  • @maskedStore (see http://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics)
  • @maskedGather (see http://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics)
  • @maskedScatter (see http://llvm.org/docs/LangRef.html#llvm-masked-scatter-intrinsics)
  • @maskedExpandLoad (see http://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
  • @maskedCompressStore (see http://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
  • make sure documentation is complete
  • C ABI tests
  • len property on the type
  • @bitCast to an integer with number of bits equal to len * elem_type.bit_count

There will be no mixed vector/scalar support. Instead you will use @splat(N, x) to create a vector of N elements from a scalar value x. The reasoning for this is that it more closely matches the LLVM IR. So, for example, multiplication by a scalar would be:

fn vecMulScalar(v: @Vector(10, i32), x: i32) @Vector(10, i32) {
    return v * @splat(10, x);
}
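
A brief usage sketch of the function above, assuming the implicit array-to-vector and vector-to-array casts from the checklist (and that std is imported):

test "vecMulScalar usage" {
    const v: @Vector(10, i32) = [10]i32{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    const scaled: [10]i32 = vecMulScalar(v, 3);
    std.debug.assert(scaled[0] == 3);
    std.debug.assert(scaled[9] == 30);
}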
@abique

commented Jan 31, 2019

The syntax looks ugly, but if it works as well as the LLVM builtin vectors, then it is fine! ;-)

Thank you, and don't forget the shuffle vector!

@abique

commented Jan 31, 2019

What do you think of v10i32 ?

@andrewrk

Member

commented Feb 1, 2019

What do you think of v10i32 ?

A few things:

  • We need the builtin function anyway (just like we have @IntType, which is planned to be renamed to @Int), so @Vector is a good starting point. If we switch to dedicated syntax later, it will be a very small change in the compiler.
  • If there is syntax for it, it should work for ints, floats, and pointers. I'm not sure how the v10i32 example would work for pointer elements, and, if you don't already know about vectors, @Vector seems more discoverable to me than v10i32.
  • Manually putting const v10i32 = @Vector(10, i32); in a file is not so bad. Let's try it out for a while, and maybe we add syntax later if it seems necessary.

Please do feel free to propose syntax for a vector type. What's been proposed so far:

  • <[N]> type

This syntax hasn't been rejected; I'm simply avoiding the syntax question until the feature is done since it's the easiest thing to change at the very end.

@abique

commented Feb 1, 2019

Why would you want a vector of pointers? Can you do a vector load from that? Would that even be efficient? Do you want people to do vectorized pointer arithmetic? 🐙

I'd go with the v4f32 style! People will really enjoy writing SIMD with that style. But of course it does not work with templates... :) So we might need a more verbose type declaration indeed.

@andrewrk

Member

commented Feb 1, 2019

Why would you want a vector of pointers?

Mainly, because LLVM IR supports it, and they're usually pretty good about representing what hardware generally supports. We don't automatically do everything LLVM does, but it's a good null hypothesis.

Can you do a vector load from that?

Yes, you can, and it yields a vector. So, for example, you could have a vector of 4 pointers to a struct and then obtain a vector of 4 floats from their fields:

const Point = struct {x: f32, y: f32};
fn multiPointMagnitude(points: @Vector(4, *Point)) @Vector(4, f32) {
    return @sqrt(points.x * points.x + points.y * points.y);
}

It's planned for this code to work verbatim once this issue is closed.

Not only can you do vector loads and vector stores from vectors of pointers, you can also do @maskedGather, @maskedScatter, and more. See the LLVM LangRef links in the comment above for explanations.

@travisstaloch

commented Feb 2, 2019

How are we supposed to initialize a vector? I couldn't find an example in the newest code. Or is this not implemented yet?

For example, the following doesn't work:

test "initialize vector" {
    const V4i32 = @Vector(4, i32);
    var v: V4i32 = []i32{ 0, 1, 2, 3 };
}
@andrewrk

Member

commented Feb 2, 2019

Your example is planned to work. That's the checkbox above labeled "implicit array to vector cast".

andrewrk added a commit that referenced this issue Feb 5, 2019

SIMD: array to vector, vector to array, wrapping int add
also vectors and arrays now use the same ConstExprVal representation

See #903
@andrewrk

Member

commented Feb 5, 2019

@travisstaloch the array <-> vector casts work now. Here's the passing test case:

test "implicit array to vector and vector to array" {
const S = struct {
fn doTheTest() void {
var v: @Vector(4, i32) = [4]i32{10, 20, 30, 40};
const x: @Vector(4, i32) = [4]i32{1, 2, 3, 4};
v +%= x;
const result: [4]i32 = v;
assertOrPanic(result[0] == 11);
assertOrPanic(result[1] == 22);
assertOrPanic(result[2] == 33);
assertOrPanic(result[3] == 44);
}
};
S.doTheTest();
comptime S.doTheTest();
}

andrewrk added a commit that referenced this issue Feb 22, 2019

implement vector negation
also fix vector behavior tests, they weren't actually testing
runtime vectors, but now they are.

See #903

@andrewrk andrewrk removed this from the 0.4.0 milestone Mar 22, 2019

@andrewrk andrewrk added this to the 0.5.0 milestone Mar 22, 2019
