
Changes to add user scratch pad for matmul primitive to fix OOM issue in Transformer LT #54381

Merged (4 commits, Mar 14, 2022)

Conversation

jojivk73 (Contributor) commented Feb 14, 2022

Adding changes for the matmul primitive to use a user scratchpad. This reduces the memory footprint of the primitive and fixes an out-of-memory issue when running Transformer LT with multiple instances and a large total thread count. Managing the primitive's scratchpad from the framework fixes the out-of-memory issue, reduces the memory footprint, and does not affect performance. The changes (a minimal sketch of the flow follows the list):
- Create a new struct that holds the Tensor for the scratchpad arg.
- Allocate memory based on the scratchpad size queried from the primitive descriptor.
- Set the user scratchpad in the post ops for the primitive.
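
A minimal, standalone sketch of the user-scratchpad flow these changes rely on, written directly against the oneDNN v3.x C++ API rather than the actual TF helper in core/util/mkl_util.h (function and variable names here are illustrative, not the PR's code):

```c++
#include <unordered_map>
#include <vector>
#include "dnnl.hpp"

// Runs a matmul with a caller-managed scratchpad. The buffer is sized by the
// primitive descriptor and freed as soon as this function returns.
void RunMatmulWithUserScratchpad(const dnnl::engine& eng, dnnl::stream& strm,
                                 const dnnl::memory& src,
                                 const dnnl::memory& weights,
                                 const dnnl::memory& dst) {
  // 1) Ask oneDNN to let the caller manage the scratchpad.
  dnnl::primitive_attr attr;
  attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);

  dnnl::matmul::primitive_desc pd(eng, src.get_desc(), weights.get_desc(),
                                  dst.get_desc(), attr);

  // 2) Allocate exactly the number of bytes the primitive reports it needs.
  std::vector<unsigned char> buf(pd.scratchpad_desc().get_size());
  dnnl::memory scratchpad(pd.scratchpad_desc(), eng, buf.data());

  // 3) Hand the buffer to the primitive at execution time.
  dnnl::matmul(pd).execute(strm, {{DNNL_ARG_SRC, src},
                                  {DNNL_ARG_WEIGHTS, weights},
                                  {DNNL_ARG_DST, dst},
                                  {DNNL_ARG_SCRATCHPAD, scratchpad}});
  strm.wait();
  // buf goes out of scope here: the framework, not the primitive, decides the
  // scratchpad lifetime.
}
```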

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Feb 14, 2022
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Feb 14, 2022
@tilakrayal tilakrayal added the comp:mkl MKL related issues label Feb 15, 2022
@tilakrayal tilakrayal added this to Assigned Reviewer in PR Queue via automation Feb 15, 2022
@tilakrayal tilakrayal self-assigned this Feb 15, 2022
penpornk (Member) left a comment
Thank you for the PR!

> This reduces memory footprint of the primitive.

Could you please help clarify how providing a user-allocated scratchpad saves memory in this case? The user-allocated buffer doesn't seem to be reused (it is only used once per Compute call). Does the primitive over-allocate memory when it is not given a user-allocated scratchpad? The primitive is not cached, i.e., a new one is created each time the op is called, so shouldn't it be able to allocate just the amount it needs?

@penpornk penpornk removed the awaiting review Pull request awaiting review label Mar 9, 2022
gzmkl (Contributor) commented Mar 9, 2022

@penpornk we are working to enable the oneDNN scratchpad "user mode" for conv, inner-product, and matmul ops.
On the TF side, we will continue to reuse primitives with LRU caching, but this will control oneDNN memory consumption (oneDNN will execute an op with the provided scratchpad buffer and won't allocate anything more).

To answer the last question:
(1) All conv / inner-product (matmul fused) / matmul (sgemm) ops will be cached on the TF side.

(2) The scratchpad buffer is allocated within Compute(), just before invoking oneDNN execution.
The buffer size is decided by oneDNN. Please check the "struct UserScratchPad" implementation in core/util/mkl_util.h (a rough sketch of the idea is below).
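
A rough sketch of that Compute()-time allocation, using a hypothetical helper name (the real implementation is the UserScratchPad struct in core/util/mkl_util.h, which this does not reproduce):

```c++
#include <cstdint>
#include "dnnl.hpp"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"

namespace tensorflow {

// Hypothetical helper: allocate a TF-owned scratchpad buffer inside Compute(),
// sized by what the oneDNN primitive descriptor reports.
void AllocateUserScratchpad(OpKernelContext* ctx,
                            const dnnl::matmul::primitive_desc& matmul_pd,
                            Tensor* scratch_tensor, void** scratch_ptr) {
  // oneDNN decides how many bytes the primitive needs; TF owns the buffer.
  const int64_t bytes =
      static_cast<int64_t>(matmul_pd.scratchpad_desc().get_size());
  OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_UINT8, TensorShape({bytes}),
                                         scratch_tensor));
  // The pointer is later passed to execute() as DNNL_ARG_SCRATCHPAD; the temp
  // Tensor (and the buffer) is released after the kernel finishes.
  *scratch_ptr = static_cast<void*>(scratch_tensor->flat<uint8>().data());
}

}  // namespace tensorflow
```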

Thanks

penpornk (Member) commented Mar 10, 2022

> (2) The scratchpad buffer is allocated within Compute(), just before invoking oneDNN execution.
> The buffer size is decided by oneDNN. Please check the "struct UserScratchPad" implementation in core/util/mkl_util.h

@gzmkl Thank you for the quick reply! I understand that the user scratchpad is allocated in Compute(). But I don't understand how this differs from letting oneDNN allocate the buffer itself, since the matmul primitive here is not cached in the LRU and is also created every time Compute() is called (and oneDNN allocates memory at primitive creation time). Both approaches seem to allocate scratchpad memory with a similar lifetime, and the sizes shouldn't be too different since the user-allocated approach also gets the size from oneDNN. So I'm wondering what makes the difference here (in this PR).

jojivk73 (Contributor, Author) commented
@penpornk
When the framework does not use a user scratchpad, the oneDNN primitive allocates its own scratchpad and holds it for as long as the primitive is alive. TF also keeps all primitives alive until the LRU cache is full. When there are many threads, each thread can create primitives for the same dimensions, since the cache is thread-local (this is essential). This results in many (similar) primitives being created across multiple threads. For example, in the case of Transformer (for which the issue was reported), the matmul dimensions vary based on the input. This results in hundreds of primitives getting created in each thread (when the thread count is greater than 1), each holding its scratchpad memory alive. Transformer with more than 4 threads was going out of memory on machines with less memory (<182 GB).

With UserScratchPad, the TF framework controls the scratchpad by creating and releasing the scratchpad memory as needed, i.e., (as of now) it creates the buffer before execution and releases it after the primitive executes. This keeps memory consumption low for the entire model.
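
To illustrate the contrast, here is a sketch against the oneDNN v3.x C++ API (not the PR's actual caching code): with the default scratchpad_mode::library, a cached primitive allocates and keeps its scratchpad internally, while with scratchpad_mode::user the cached primitive owns no scratchpad at all and the framework supplies a short-lived buffer at each execute().

```c++
#include "dnnl.hpp"

// Builds a matmul primitive as it might be stored in a thread-local LRU cache.
// Names here are illustrative only.
dnnl::matmul MakeCacheableMatmul(const dnnl::engine& eng,
                                 const dnnl::memory::desc& src_md,
                                 const dnnl::memory::desc& wei_md,
                                 const dnnl::memory::desc& dst_md,
                                 bool user_scratchpad) {
  dnnl::primitive_attr attr;
  if (user_scratchpad) {
    // The primitive only reports its scratchpad size; the caller must pass a
    // buffer via DNNL_ARG_SCRATCHPAD on every execute().
    attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);
  }
  // With the default scratchpad_mode::library, the primitive allocates the
  // scratchpad internally and holds it for as long as the primitive object
  // lives, i.e., for as long as it stays in the cache.
  dnnl::matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
  return dnnl::matmul(pd);
}
```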

penpornk (Member) left a comment

@jojivk73 Thank you for the detailed reply!

I misread the do_not_cache parameter. I thought that false meant the primitive is not cached (and assumed that the primitive is destroyed along with its auto-allocated scratchpad at the end of the kernel call, hence having a similar lifetime as the user-allocated scratchpad). I understand now that the primitive here is cached and therefore will hold on to the scratchpad memory for as long as it stays in the LRU.

// Create or retrieve matmul primitive from cache.
MklMatMulPrimitive<Tlhs, Trhs, Toutput>* matmul_prim =
    MklMatMulPrimitiveFactory<float, Tlhs, Trhs, Toutput>::Get(
        *params, false /* value for do_not_cache */);

PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Mar 14, 2022
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 14, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 14, 2022
@copybara-service copybara-service bot merged commit b4bd9c3 into tensorflow:master Mar 14, 2022
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Mar 14, 2022
copybara-service bot pushed a commit that referenced this pull request Mar 15, 2022
Rolling back PR #54381 because it broke Windows continuous build.

PiperOrigin-RevId: 434790571