[bandwidth] Bandwidth for typeN and compare with clpeak result #7

Closed
ysh329 opened this issue Sep 25, 2017 · 2 comments

ysh329 commented Sep 25, 2017

Before measuring, set the GPU and CPU to their maximum frequency using the scripts in tools of this repo.

  1. Calculate bandwidth for typeN: intN, floatN, halfN;
  2. Compare with clpeak result.
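
A minimal sketch of how a per-type bandwidth figure can be derived, assuming bandwidth is simply bytes moved divided by kernel time and that 1 GB = 10^9 bytes. The helper names, the buffer size, and the timing are hypothetical and are not taken from this repo.

```python
# Hypothetical helpers; names and numbers are illustrative, not from this repo.
SCALAR_BYTES = {"half": 2, "short": 2, "int": 4, "float": 4, "double": 8}

def buffer_bytes(scalar_type, vector_width, num_vectors):
    """Bytes occupied by num_vectors elements of type <scalar_type><vector_width>."""
    return SCALAR_BYTES[scalar_type] * vector_width * num_vectors

def bandwidth_gb_per_s(bytes_moved, elapsed_seconds):
    """Bandwidth in GB/s, assuming 1 GB = 1e9 bytes."""
    return bytes_moved / elapsed_seconds / 1e9

# Example: 4 * 1024 * 1024 float4 vectors (64 MiB) touched once in 10 ms.
# The 10 ms timing is made up purely for illustration.
nbytes = buffer_bytes("float", 4, 4 * 1024 * 1024)
print(f"{bandwidth_gb_per_s(nbytes, 0.010):.2f} GB/s")
```

In a real measurement the elapsed time would come from the OpenCL event or a host timer around the kernel, and `bytes_moved` has to reflect every buffer the kernel actually touches.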

clpeak:

Platform: ARM Platform
  Device: Mali-T860
    Driver version  : 1.2 (Linux ARM64)
    Compute units   : 4
    Clock frequency : 800 MHz

    Global memory bandwidth (GBPS)
      float   : 3.84
      float2  : 6.00
      float4  : 7.33
      float8  : 6.01
      float16 : 5.78

    Single-precision compute (GFLOPS)
      float   : 22.86
      float2  : 44.68
      float4  : 44.51
      float8  : 41.46
      float16 : 46.16

    half-precision compute (GFLOPS)
      half   : 22.83
      half2  : 46.46
      half4  : 93.96
      half8  : 92.44
      half16 : 69.40

    Double-precision compute (GFLOPS)
      double   : 3.60
      double2  : 3.54
      double4  : 20.92
      double8  : 20.60
      double16 : 20.35

    Integer compute (GIOPS)
      int   : 20.26
      int2  : 49.72
      int4  : 47.51
      int8  : 48.96
      int16 : 41.47

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 4.06
      enqueueReadBuffer          : 2.17
      enqueueMapBuffer(for read) : 2015.28
        memcpy from mapped ptr   : 2.18
      enqueueUnmap(after write)  : 5406.56
        memcpy to mapped ptr     : 2.23

    Kernel launch latency : 78.36 us

ysh329 created this issue from a note in bandwidth (In Progress) Sep 25, 2017

ysh329 commented Sep 25, 2017

My bandwidth results are below (more detailed logs are here):

half1: 5.16 GB/s
half2: 4.71 GB/s
half4: 5.14 GB/s
half8: 5.50 GB/s
half16: 4.98 GB/s
half1-A53: 2.10 GB/s
half1-A72: 3.91 GB/s

short1: 5.29 GB/s
short2: 4.71 GB/s
short4: 5.07 GB/s
short8: 5.52 GB/s
short16: 5.00 GB/s
short1-A53: 2.26 GB/s
short1-A72: 4.51 GB/s

int1: 5.26 GB/s
int2: 5.49 GB/s
int4: 6.13 GB/s
int8: 5.49 GB/s
int16: 5.28 GB/s
int1-A53: 2.25 GB/s
int1-A72: 4.53 GB/s

float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s

double1: 4.49 GB/s
double2: 6.39 GB/s
double4: 5.58 GB/s
double8: 5.40 GB/s
double16: 5.51 GB/s
double1-A53: 2.29 GB/s
double1-A72: 4.58 GB/s


ysh329 commented Sep 26, 2017

The gap between clpeak and my results (clpeak reports higher bandwidth than my code for every float width except scalar float) comes from the kernels: the clpeak kernel performs only read operations, while my kernel performs both read and write operations.
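
To illustrate why the two kinds of kernel are not directly comparable, here is a small sketch of the byte accounting. It assumes an idealized model with no caching, where a read-only kernel touches each buffer element once and a copy-style kernel reads and writes each element once. How either benchmark actually counts bytes is not stated in this thread, so the function name and buffer size are hypothetical.

```python
def traffic_bytes(buffer_bytes, reads_per_element, writes_per_element):
    """Total bytes moved for one kernel launch (idealized, no caching)."""
    return buffer_bytes * (reads_per_element + writes_per_element)

buffer_size = 64 * 1024 * 1024  # 64 MiB, illustrative only

read_only = traffic_bytes(buffer_size, reads_per_element=1, writes_per_element=0)
read_write = traffic_bytes(buffer_size, reads_per_element=1, writes_per_element=1)

# Over the same buffer, a read+write kernel moves twice the data of a read-only one,
# so the reported figure depends on whether one direction or both are counted.
print(read_write / read_only)
```

Under this model, a benchmark that times a read+write kernel but counts only the input buffer would report about half of the traffic actually moved.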

clpeak

Kernel function is here.

    Global memory bandwidth (GBPS)
      float   : 3.84
      float2  : 6.00
      float4  : 7.33
      float8  : 6.01
      float16 : 5.78

my bandwidth

Kernel function is here.

float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s
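
The two float listings above can be put side by side with a short script. This is only a convenience sketch: the GB/s values are copied from the listings, and the variable names are made up for illustration.

```python
# GB/s figures copied from the two float listings; keys are the vector widths.
clpeak = {1: 3.84, 2: 6.00, 4: 7.33, 8: 6.01, 16: 5.78}
mine = {1: 4.83, 2: 4.72, 4: 5.39, 8: 4.72, 16: 4.52}

# Ratio of my measurement to the clpeak measurement for each width.
ratios = {width: mine[width] / clpeak[width] for width in clpeak}

for width, ratio in ratios.items():
    print(f"float{width}: mine/clpeak = {ratio:.2f}")
```

Both measurements peak at float4, and scalar float is the only width where the read+write kernel reports more than clpeak.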

ysh329 closed this as completed Sep 26, 2017
ysh329 moved this from In Progress to Done in bandwidth Sep 26, 2017
ysh329 self-assigned this Jan 13, 2018