Issues running BERT on Windows #166
The error:
The generated HLSL:
To reproduce the error on Windows (note: this uses my personal project), run in PowerShell:
$env:RUST_LOG="debug"
$env:WGPU_BACKEND="dx12"
cargo run --bin quokka-ui --release
It also fails with WGPU_BACKEND=vulkan or WGPU_BACKEND=dx11, but the error message there is not very informative.
WONNX-generated WGSL:
alias Scalar = f32;
alias GemmVec = vec4<f32>;
alias GemmMat = mat4x4<f32>;

struct GemmArrayVector {
    data: array<GemmVec>
};

@group(0) @binding(0)
var<storage, read> input_left: GemmArrayVector;

@group(0) @binding(1)
var<storage, read> input_right: GemmArrayVector;

// Bias
@group(0) @binding(2)
var<storage, read> input_bias: GemmArrayVector;

@group(0) @binding(3)
var<storage, read_write> output_0: GemmArrayVector;

@compute @workgroup_size(1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let y = global_id.x % 192u;
    let x = global_id.x / 192u;
    let stack_index = global_id.y;
    let left_offset = stack_index * 0u;
    let right_offset = stack_index * 0u;
    let output_offset = stack_index * 0u;
    let index = output_offset + (x * 768u) + y;

    let zero_vec = GemmVec(
        Scalar(),
        Scalar(),
        Scalar(),
        Scalar()
    );
    let zero_matrix = GemmMat(
        zero_vec,
        zero_vec,
        zero_vec,
        zero_vec
    );
    var tmpsum = zero_matrix;
    var product = zero_matrix;

    for (var k: u32 = 0u; k < 768u; k = k + 1u) {
        let index_left = left_offset + (x * 3072u) + k;
        let index_right = right_offset + (k * 768u) + y;
        let mat_left = GemmMat(
            input_left.data[index_left + 0u],
            input_left.data[index_left + 768u],
            input_left.data[index_left + 1536u],
            input_left.data[index_left + 2304u]
        );
        let mat_right = GemmMat(
            input_right.data[index_right + (0u)],
            input_right.data[index_right + (192u)],
            input_right.data[index_right + (384u)],
            input_right.data[index_right + (576u)]
        );
        product = mat_right * mat_left;
        for (var index_mat: u32 = 0u; index_mat < 4u; index_mat = index_mat + 1u) {
            tmpsum[index_mat] = tmpsum[index_mat] + product[index_mat];
        }
    }

    let bias_index = y;
    for (var index_mat: u32 = 0u; index_mat < 4u; index_mat = index_mat + 1u) {
        let bias = input_bias.data[bias_index];
        output_0.data[index + (index_mat * 192u)] = tmpsum[index_mat] + bias;
    }
}
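For anyone trying to follow the indexing in the shader above, here is a rough CPU sketch in Rust of what I read it as computing: a row-major GEMM plus bias over vec4-packed buffers, where each invocation produces a 4x4 output tile. The function names (`pack`, `gemm_tiled`, `gemm_naive`, `demo`) and the small dimensions are hypothetical and only for illustration; the real shader hard-codes k = 3072 and n = 768, so the `kk` loop below corresponds to its 768-iteration loop. This is my interpretation, not WONNX code.

```rust
type Vec4 = [f32; 4];

// Pack a flat row-major f32 buffer into vec4 chunks, as the shader's storage buffers are laid out.
fn pack(flat: &[f32]) -> Vec<Vec4> {
    flat.chunks_exact(4).map(|c| [c[0], c[1], c[2], c[3]]).collect()
}

// out = left (m x k) * right (k x n) + bias (n); m, k and n are assumed to be multiples of 4.
fn gemm_tiled(left: &[Vec4], right: &[Vec4], bias: &[Vec4], m: usize, k: usize, n: usize) -> Vec<Vec4> {
    let (kv, nv) = (k / 4, n / 4); // vec4s per row of left, and of right/output
    let mut out = vec![[0.0f32; 4]; m * nv];
    for x in 0..m / 4 {
        for y in 0..nv {
            // One shader invocation: accumulate a 4x4 tile (tmpsum in the WGSL).
            let mut tmpsum = [[0.0f32; 4]; 4];
            for kk in 0..kv {
                for j in 0..4 {
                    let l = left[(x * 4 + j) * kv + kk]; // column j of mat_left
                    for i in 0..4 {
                        let r = right[(kk * 4 + i) * nv + y]; // column i of mat_right
                        for c in 0..4 {
                            tmpsum[j][c] += r[c] * l[i];
                        }
                    }
                }
            }
            for j in 0..4 {
                let b = bias[y];
                for c in 0..4 {
                    out[(x * 4 + j) * nv + y][c] = tmpsum[j][c] + b[c];
                }
            }
        }
    }
    out
}

// Plain scalar reference for comparison.
fn gemm_naive(left: &[f32], right: &[f32], bias: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    for r in 0..m {
        for c in 0..n {
            let mut acc = bias[c];
            for i in 0..k {
                acc += left[r * k + i] * right[i * n + c];
            }
            out[r * n + c] = acc;
        }
    }
    out
}

// Small integer-valued inputs so both summation orders give exactly equal f32 results.
fn demo() -> (Vec<f32>, Vec<f32>) {
    let (m, k, n) = (4, 8, 8);
    let left: Vec<f32> = (0..m * k).map(|v| (v % 5) as f32).collect();
    let right: Vec<f32> = (0..k * n).map(|v| (v % 7) as f32).collect();
    let bias: Vec<f32> = (0..n).map(|v| v as f32).collect();
    let tiled = gemm_tiled(&pack(&left), &pack(&right), &pack(&bias), m, k, n);
    let flat: Vec<f32> = tiled.iter().flat_map(|v| v.iter().copied()).collect();
    (flat, gemm_naive(&left, &right, &bias, m, k, n))
}

fn main() {
    let (tiled, naive) = demo();
    assert_eq!(tiled, naive);
    println!("tiled and naive results match");
}
```

With k = 3072 the outer loop over `kk` has a static trip count of 768, each iteration containing a 4x4 matrix product, which matches the loop seen in the WGSL.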
Seeing a different issue now, when writing the input buffers:
Could be just my Windows laptop...
0x887A0005 = DXGI_ERROR_DEVICE_REMOVED (https://learn.microsoft.com/en-us/windows/win32/direct3ddxgi/dxgi-error)
Not sure why this is happening but apparently it can get triggered by other things.
From gfx-rs/wgpu#1624 (comment): It would be nice to call
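In case it helps when reading logs, here is a small Rust sketch that maps a few DXGI HRESULT values to their names. The function name `dxgi_error_name` is made up for illustration and is not part of wgpu or WONNX; the values are from the Microsoft DXGI error documentation linked above, and the table is deliberately incomplete.

```rust
// Hypothetical helper: look up the symbolic name of a few common DXGI error codes.
fn dxgi_error_name(hr: u32) -> Option<&'static str> {
    match hr {
        0x887A0001 => Some("DXGI_ERROR_INVALID_CALL"),
        0x887A0005 => Some("DXGI_ERROR_DEVICE_REMOVED"),
        0x887A0006 => Some("DXGI_ERROR_DEVICE_HUNG"),
        0x887A0007 => Some("DXGI_ERROR_DEVICE_RESET"),
        0x887A0020 => Some("DXGI_ERROR_DRIVER_INTERNAL_ERROR"),
        _ => None, // not in this (partial) table
    }
}

fn main() {
    let hr = 0x887A0005;
    match dxgi_error_name(hr) {
        Some(name) => println!("{:#010X} = {}", hr, name),
        None => println!("{:#010X} = unknown", hr),
    }
}
```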
On Windows, running BERT models fails with an error about loop unrolling in HLSL. Apparently one of the shaders has a loop whose (static) iteration count is too large. The same model runs fine on Mac/Metal. Will look into this later.