Use SIMD for the graphics compositor code #31

Closed
taisel opened this Issue Jun 9, 2016 · 7 comments

Owner

taisel commented Jun 9, 2016

Of note, it looks like that particular block of code can be rewritten to use SIMD.

Specifically:

  • SIMD.Int32x4.and() for isolating flags.
  • SIMD.Int32x4.greaterThanOrEqual() for comparing flags.
  • SIMD.Int32x4.lessThan() for comparing flags.
  • SIMD.Int32x4.select() for selecting which pixels win "priority" based on prior SIMD comparison ops.
  • SIMD.Int32x4.load() to load from the backing store used by the renderers prior to compositing.
  • SIMD.Int32x4.store() to store into the line buffer.
  • SIMD.Int32x4.splat() to generate some bitmasks that the SIMD AND ops will use.

This means we can process up to 4 pixels in parallel, so the usual loop over the 240 pixels of a line drops to 60 iterations.
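A rough sketch of how the ops listed above might fit together for two layers. The pixel layout (color in the low 15 bits, a priority value in bits 16-17, lower value wins), the function names, and the `simd` parameter are all assumptions made for illustration, not IodineGBA's actual format or code. The SIMD.js namespace is passed in rather than read from the global, since an engine may not expose `SIMD` at all; a scalar version with the same result is included for comparison.

```javascript
// Hypothetical pixel layout, assumed for illustration only:
//   bits 0-14  : BGR555 color
//   bits 16-17 : priority value (lower value wins)
const PRIORITY_MASK = 0x30000;
const LINE_WIDTH = 240;

// SIMD.js version: composites two pre-rendered layers, 4 pixels per iteration.
// `simd` is the SIMD.js namespace (the global `SIMD` where an engine provides it).
function compositeLineSIMD(simd, layerA, layerB, lineBuffer) {
  const I32x4 = simd.Int32x4;
  const mask = I32x4.splat(PRIORITY_MASK);       // same bitmask in all 4 lanes
  for (let x = 0; x < LINE_WIDTH; x += 4) {      // 240 / 4 = 60 iterations
    const a = I32x4.load(layerA, x);             // 4 pixels from each backing store
    const b = I32x4.load(layerB, x);
    // Isolate the priority flags, then compare them lane by lane.
    const aWins = I32x4.lessThan(I32x4.and(a, mask), I32x4.and(b, mask));
    // Per lane: take a where it wins, otherwise b (ties go to layerB here).
    I32x4.store(lineBuffer, x, I32x4.select(aWins, a, b));
  }
}

// Scalar version with the same per-pixel result, one pixel per iteration.
function compositeLineScalar(layerA, layerB, lineBuffer) {
  for (let x = 0; x < LINE_WIDTH; x++) {
    const a = layerA[x];
    const b = layerB[x];
    lineBuffer[x] = (a & PRIORITY_MASK) < (b & PRIORITY_MASK) ? a : b;
  }
}
```

All three buffers are expected to be `Int32Array`s of at least 240 entries. More than two layers would need the compare/select step repeated per layer, which this sketch leaves out.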

There's no reason to nuke the legacy non-SIMD code: a test will run before the functions are built to check whether SIMD is supported, and the legacy path gets patched in if it isn't. That said, the legacy code needs a good cleanup, since it's illegible: it's JS that codegens more JS.
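One way the support test could look, as a minimal sketch. `hasSimdSupport` and `pickCompositor` are hypothetical names, and the global object is passed in as a parameter only to keep the example easy to exercise; the real build step may be structured quite differently.

```javascript
// The Int32x4 ops the SIMD compositor path would rely on.
const REQUIRED_INT32X4_OPS = [
  "and", "greaterThanOrEqual", "lessThan", "select", "load", "store", "splat"
];

// True only if a SIMD namespace exists and exposes every op we need.
function hasSimdSupport(globalObject) {
  const simd = globalObject.SIMD;
  if (!simd || !simd.Int32x4) {
    return false;
  }
  return REQUIRED_INT32X4_OPS.every(
    (name) => typeof simd.Int32x4[name] === "function"
  );
}

// Run once, before the compositor functions are built:
// returns the SIMD path when supported, the legacy path otherwise.
function pickCompositor(globalObject, simdPath, legacyPath) {
  return hasSimdSupport(globalObject) ? simdPath : legacyPath;
}
```

Checking for the individual ops rather than just the `SIMD` global guards against a partial implementation, at the cost of a slightly longer test that only runs once.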

Owner

taisel commented Jun 9, 2016

Still need to figure out how to make the color effects play well with the SIMD-based code, though.

taisel self-assigned this Jun 9, 2016

Owner

taisel commented Jun 9, 2016

For users whose browsers support SharedArrayBuffer, this won't affect performance, since the graphics are already rendered off-thread for you, and IIRC IodineGBA is bound by the CPU thread.

It will help IodineGBA run faster in browsers that manage to support SIMD before SharedArrayBuffer.

Owner

taisel commented Jun 9, 2016

Apparently we already added SIMD to the color blending ops first, for a completely different use: 938ecd2

Owner

taisel commented Jun 12, 2016

Owner

taisel commented Jun 12, 2016

Probably should move the color effects pass out into a separate loop, and quick-store the lower/upper pixel values into a different buffer.
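A sketch of what that split could look like, with two layers for brevity. The priority mask, the buffer and function names, and the choice of alpha blending as the color effect are assumptions for illustration. The blend formula is the usual GBA one, with coefficients `eva`/`evb` in sixteenths and each 5-bit channel clamped to 31.

```javascript
const BLEND_LINE_WIDTH = 240;
const BLEND_PRIORITY_MASK = 0x30000; // hypothetical: priority in bits 16-17, lower wins

// Pass 1: priority selection only. Quick-stores the winning ("upper") and
// losing ("lower") pixel for every position, with no color math in the loop.
function selectPass(layerA, layerB, upper, lower) {
  for (let x = 0; x < BLEND_LINE_WIDTH; x++) {
    const a = layerA[x];
    const b = layerB[x];
    const aWins = (a & BLEND_PRIORITY_MASK) < (b & BLEND_PRIORITY_MASK);
    upper[x] = aWins ? a : b;
    lower[x] = aWins ? b : a;
  }
}

// Blends one 5-bit channel; eva and evb are in 1/16 steps (0..16).
function blendChannel(top, bottom, eva, evb) {
  return Math.min(31, (top * eva + bottom * evb) >> 4);
}

// Pass 2: the color effect, run as its own loop over the stored pairs.
function colorEffectsPass(upper, lower, lineBuffer, eva, evb) {
  for (let x = 0; x < BLEND_LINE_WIDTH; x++) {
    const u = upper[x];
    const l = lower[x];
    const r = blendChannel(u & 0x1F, l & 0x1F, eva, evb);
    const g = blendChannel((u >> 5) & 0x1F, (l >> 5) & 0x1F, eva, evb);
    const b = blendChannel((u >> 10) & 0x1F, (l >> 10) & 0x1F, eva, evb);
    lineBuffer[x] = r | (g << 5) | (b << 10); // BGR555, flags dropped
  }
}
```

Keeping the first loop free of color math is what would leave it as plain compare/select work, while the second loop could stay scalar or be vectorized on its own later.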

Owner

taisel commented Jun 14, 2016

Owner

taisel commented Jun 18, 2016

taisel closed this Jun 18, 2016
