Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thread throttling profile for DGEMV on NEOVERSEV1 #5175

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

shubhamsvc
Copy link

This PR introduces thread throttling for DGEMV on Neoverse V1.

Benchmarking results for matrix sizes [2,1024]:
Machine: AWS Graviton3 Processor

- dgemv_n: Geometric mean speedup of 2.2x
image

- dgemv_t: Geometric mean speedup of 2.7x
image

@shubhamsvc
Copy link
Author

Please help with review
cc @martin-frbg @annop-w @michalowski-arm

@shubhamsvc shubhamsvc force-pushed the dgemv_thread_throttling branch from e311258 to d7a2b6b Compare March 13, 2025 05:58
interface/gemv.c Outdated
@@ -89,6 +89,24 @@ static inline int get_gemv_optimal_nthreads_neoversev2(BLASLONG MN, int ncpu) {
}
#endif

//thread throttling for dgemv
#if defined(DYNAMIC_ARCH) || defined(NEOVERSEV1)
static inline int get_dgemv_optimal_nthreads_neoversev1(BLASLONG MN, int ncpu) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of defining a new function, I think it is cleaner to just use get_gemv_optimal_nthreads_<uarch>.
Inside get_gemv_optimal_nthreads_<uarch>, we can #ifdef DOUBLE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please keep this inside the existing function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as suggested

interface/gemv.c Outdated
@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
#endif
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(BFLOAT16)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

interface/gemv.c Outdated
@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
#endif
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(NEOVERSEV1) && !defined(COMPLEX) && defined(DOUBLE) && !defined(BFLOAT16)
return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed as suggested

interface/gemv.c Outdated
@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
#endif
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(NEOVERSEV1) && !defined(COMPLEX) && defined(DOUBLE) && !defined(BFLOAT16)
return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(NEOVERSEV2) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
return get_gemv_optimal_nthreads_neoversev2(MN, ncpu);
#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(BFLOAT16)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as suggested

interface/gemv.c Outdated
if (strcmp(gotoblas_corename(), "neoversev1") == 0) {
return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed as suggested

interface/gemv.c Outdated
: MN < 435600L ? MIN(ncpu, 24)
: MN < 810000L ? MIN(ncpu, 32)
: MN < 1050625 ? MIN(ncpu, 40)
: ncpu;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please move this inside get_gemv_optimal_nthreads_neoversev1 and use #ifdef DOUBLE ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@shubhamsvc shubhamsvc force-pushed the dgemv_thread_throttling branch from d7a2b6b to 8e289ec Compare March 18, 2025 07:54
@annop-w
Copy link
Contributor

annop-w commented Mar 18, 2025

@shubhamsvc Thank you for the changes. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants