JIT throughput: noway_assert #7709

Closed
BruceForstall opened this issue Mar 23, 2017 · 7 comments
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
enhancement (Product code improvement that does NOT require public API changes/additions)
JitThroughput (CLR JIT issues regarding speed of JIT itself)
tenet-performance (Performance related issue)

Comments

@BruceForstall
Member

The JIT has about 3300 noway_assert instances. These are executed in non-DEBUG (i.e., RELEASE) builds. Some might be executed frequently and thus be costly. Instead of auditing all of them for relevance (i.e., whether they are in an optimization phase that can be backed out of) or apparent importance, we could (conditionally) change the noway_assert macro to count how often each one executes, using a hash table from the preprocessor __FILE__ and __LINE__ to an execution count, dumped at the end of compilation. Then we could convert the worst ones to simple asserts.
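A minimal sketch of what such a counting macro might look like. Every name here (COUNTED_NOWAY_ASSERT, RecordAssertSite, SiteCounts, DumpAssertCounts, CountedAssertFailed) is hypothetical and not taken from the JIT sources. It uses an ordered std::map keyed on (file, line) rather than a true hash table to keep the example short, and it ignores thread safety.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; none of them come from the JIT sources.
typedef std::pair<std::string, int> AssertSite; // (file, line)

struct SiteInfo
{
    std::uint64_t count; // number of times the assertion was executed
    std::string   text;  // stringized assertion condition
    SiteInfo() : count(0) {}
};

// Function-local static so the table exists before the first assertion runs.
// Not thread-safe; a real implementation would need a lock or per-thread tables.
inline std::map<AssertSite, SiteInfo>& SiteCounts()
{
    static std::map<AssertSite, SiteInfo> counts;
    return counts;
}

inline void RecordAssertSite(const char* file, int line, const char* text)
{
    SiteInfo& info = SiteCounts()[AssertSite(file, line)];
    if (info.count == 0)
    {
        info.text = text;
    }
    ++info.count;
}

// Stand-in failure path; the real noway_assert failure handling is different.
inline void CountedAssertFailed(const char* file, int line, const char* text)
{
    std::fprintf(stderr, "%s(%d): assertion failed: %s\n", file, line, text);
    std::abort();
}

// Counts every execution of the assertion, whether or not it fails.
#define COUNTED_NOWAY_ASSERT(cond)                              \
    do                                                          \
    {                                                           \
        RecordAssertSite(__FILE__, __LINE__, #cond);            \
        if (!(cond))                                            \
        {                                                       \
            CountedAssertFailed(__FILE__, __LINE__, #cond);     \
        }                                                       \
    } while (0)

// Writes "count, file, line, text" rows, most frequently executed first.
inline void DumpAssertCounts(std::FILE* out)
{
    assert(out != nullptr);
    typedef std::pair<AssertSite, SiteInfo> Row;
    std::vector<Row> rows(SiteCounts().begin(), SiteCounts().end());
    std::sort(rows.begin(), rows.end(),
              [](const Row& a, const Row& b) { return a.second.count > b.second.count; });
    for (const Row& row : rows)
    {
        std::fprintf(out, "%llu, %s, %d, %s\n",
                     static_cast<unsigned long long>(row.second.count),
                     row.first.first.c_str(), row.first.second, row.second.text.c_str());
    }
}
```

Calling DumpAssertCounts(stdout) at the end of compilation would print one row per assertion site. The map lookup on every execution is itself expensive, so a build like this would only be useful for collecting counts, not for measuring throughput.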

category:throughput
theme:throughput
skill-level:expert
cost:medium

@BruceForstall
Member Author

I implemented this. I ran "a lot" of the test tree, processed the data, and saw these as the top 75 dynamic occurrences of noway_asserts (some might actually get optimized away in release builds, as there are a few static constant asserts here). The columns are: total count, filename, line number, assertion text.

181470679, e:\gh\coreclr2\src\jit\morph.cpp, 14648, tree
180436108, e:\gh\coreclr2\src\jit\morph.cpp, 14649, tree->gtOper != GT_STMT
83096677, e:\gh\coreclr2\src\jit\assertionprop.cpp, 808, assertIndex <= optAssertionCount
83096677, e:\gh\coreclr2\src\jit\assertionprop.cpp, 807, assertIndex != NO_ASSERTION_INDEX
72024047, e:\gh\coreclr2\src\jit\liveness.cpp, 1632, lclNum < lvaCount
43294958, e:\gh\coreclr2\src\jit\lsra.cpp, 3003, count < MaxInternalRegisters
40851620, e:\gh\coreclr2\src\jit\lclvars.cpp, 2510, varNum < lvaCount
35418942, e:\gh\coreclr2\src\jit\morph.cpp, 10632, tree->OperKind() & GTK_SMPOP
33982229, e:\gh\coreclr2\src\jit\flowgraph.cpp, 5228, opcode < CEE_COUNT
32541990, e:\gh\coreclr2\src\jit\morph.cpp, 11463, tree->gtOper != GT_CALL
32533197, e:\gh\coreclr2\src\jit\morph.cpp, 13347, oper == tree->gtOper
30050847, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16328, !(block->bbFlags & BBF_REMOVED)
27085069, e:\gh\coreclr2\src\jit\morph.cpp, 15542, stmt->gtOper == GT_STMT
23724178, e:\gh\coreclr2\src\jit\flowgraph.cpp, 18490, tree->gtOper == GT_STMT
23362852, e:\gh\coreclr2\src\jit\liveness.cpp, 1642, varIndex < lvaTrackedCount
23102961, e:\gh\coreclr2\src\jit\emit.h, 1604, (UNATIVE_OFFSET)distance == distance
22587177, e:\gh\coreclr2\src\jit\morph.cpp, 8401, tree->OperKind() & GTK_LEAF
21688207, e:\gh\coreclr2\src\jit\morph.cpp, 5984, tree->gtOper == GT_LCL_VAR
19590233, e:\gh\coreclr2\src\jit\lclvars.cpp, 3599, (tree->gtOper == GT_LCL_VAR) || (tree->gtOper == GT_LCL_FLD)
19478128, e:\gh\coreclr2\src\jit\lclvars.cpp, 3680, tiVerificationNeeded || varDsc->lvType == TYP_UNDEF || tree->gtType == TYP_UNKNOWN || allowStructs || genActualType(varDsc->TypeGet()) == genActualType(tree->gtType) || (tree->gtType == TYP_BYREF && varDsc->TypeGet() == TYP_I_IMPL) || (tree->gtType == TYP_I_IMPL && varDsc->TypeGet() == TYP_BYREF) || (tree->gtFlags & GTF_VAR_CAST) || varTypeIsFloating(varDsc->TypeGet()) && varTypeIsFloating(tree->gtType)
18224434, e:\gh\coreclr2\src\jit\morph.cpp, 6019, !(tree->gtFlags & GTF_VAR_DEF) || varAddr
15639847, e:\gh\coreclr2\src\jit\gentree.cpp, 7095, argInfo != nullptr
14993689, e:\gh\coreclr2\src\jit\morph.cpp, 8344, tree->OperKind() & GTK_CONST
13800099, e:\gh\coreclr2\src\jit\flowgraph.cpp, 18555, list.gtNext->gtPrev == &list
13209197, e:\gh\coreclr2\src\jit\morph.cpp, 14731, tree != nullptr
12710725, e:\gh\coreclr2\src\jit\emitxarch.cpp, 546, prefix >= 0x40 && prefix <= 0x4F
11966466, e:\gh\coreclr2\src\jit\morph.cpp, 10733, op1
11256849, e:\gh\coreclr2\src\jit\emitxarch.cpp, 1695, (int)offs < 0
10750793, e:\gh\coreclr2\src\jit\importer.cpp, 563, impTreeLast != nullptr
10503327, e:\gh\coreclr2\src\jit\flowgraph.cpp, 889, block
10444252, e:\gh\coreclr2\src\jit\codegenxarch.cpp, 1516, targetType != TYP_STRUCT
10283544, e:\gh\coreclr2\src\jit\flowgraph.cpp, 9626, block->bbNext == bNext
9808620, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16280, !(block->bbFlags & BBF_TRY_BEG)
9808617, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16279, !block->bbCatchTyp
9807880, e:\gh\coreclr2\src\jit\morph.cpp, 15677, fgPtrArgCntCur == 0
9490871, e:\gh\coreclr2\src\jit\valuenum.cpp, 747, attribs == CEA_None
9297809, e:\gh\coreclr2\src\jit\emitxarch.cpp, 3328, emitVerifyEncodable(ins, size, reg)
9049871, e:\gh\coreclr2\src\jit\liveness.cpp, 1879, VarSetOps::IsSubset(this, keepAliveVars, life)
9026196, e:\gh\coreclr2\src\jit\flowgraph.cpp, 890, blockPred
8640521, e:\gh\coreclr2\src\jit\emitxarch.cpp, 4861, emitVerifyEncodable(ins, size, ireg)
8540880, e:\gh\coreclr2\src\jit\codegencommon.cpp, 2423, rv1 || mul != 1
8540880, e:\gh\coreclr2\src\jit\codegencommon.cpp, 2425, FitsIn<INT32>(cns)
7375190, e:\gh\coreclr2\src\jit\morph.cpp, 15754, fgExpandInline == false
7352623, e:\gh\coreclr2\src\jit\emitxarch.cpp, 3735, emitVerifyEncodable(ins, size, reg1, reg2)
7322925, e:\gh\coreclr2\src\jit\morph.cpp, 10675, op1 == tree->gtOp.gtOp1
7282409, e:\gh\coreclr2\src\jit\morph.cpp, 8054, call->gtOper == GT_CALL
7028316, e:\gh\coreclr2\src\jit\lclvars.cpp, 3363, varDsc->lvRefCnt > 0
6854055, e:\gh\coreclr2\src\jit\flowgraph.cpp, 11017, (block->bbFlags & BBF_REMOVED) == 0
6777828, e:\gh\coreclr2\src\jit\inlinepolicy.cpp, 497, smOpcode < SM_COUNT
6777827, e:\gh\coreclr2\src\jit\inlinepolicy.cpp, 498, smOpcode != SM_PREFIX_N
6480615, e:\gh\coreclr2\src\jit\flowgraph.cpp, 17102, blk != nullptr
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13703, GT_SUB == GT_ADD + 1
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13704, GT_MUL == GT_ADD + 2
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13705, GT_DIV == GT_ADD + 3
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13706, GT_MOD == GT_ADD + 4
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13716, GT_RSZ == GT_ADD + 12
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13715, GT_RSH == GT_ADD + 11
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13714, GT_LSH == GT_ADD + 10
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13712, GT_AND == GT_ADD + 9
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13711, GT_XOR == GT_ADD + 8
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13707, GT_UDIV == GT_ADD + 5
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13708, GT_UMOD == GT_ADD + 6
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13710, GT_OR == GT_ADD + 7
6262994, e:\gh\coreclr2\src\jit\liveness.cpp, 3003, compCurBB == block
6062460, e:\gh\coreclr2\src\jit\flowgraph.cpp, 7343, tree->OperGet() == GT_ASG
6028687, e:\gh\coreclr2\src\jit\codegencommon.cpp, 11901, jitGetILoffs(offsx) <= compiler->info.compILCodeSize
5280889, e:\gh\coreclr2\src\jit\regalloc.cpp, 6747, varDsc->lvIsInReg() || varDsc->lvOnFrame || varDsc->lvRefCnt == 0
5280889, e:\gh\coreclr2\src\jit\regalloc.cpp, 6751, !varDsc->lvRegister || !varDsc->lvOnFrame
5280889, e:\gh\coreclr2\src\jit\lclvars.cpp, 4567, !varDsc->lvFramePointerBased || codeGen->doubleAlignOrFramePointerUsed()
5223565, e:\gh\coreclr2\src\jit\flowgraph.cpp, 1713, fgDomsComputed
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 1833, endNode || (startNode == compCurStmt->gtStmt.gtStmtExpr)
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 2945, nextStmt->gtOper == GT_STMT
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 2944, nextStmt
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 1832, compCurStmt->gtOper == GT_STMT
4798987, e:\gh\coreclr2\src\jit\assertionprop.cpp, 3931, !optLocalAssertionProp

@BruceForstall
Member Author

I measured the total impact of noway_assert using instruction counts over SuperPMI collections of the dotnet/coreclr testbed, by removing noway_assert from a release build. I saw a 1.03% overhead from noway_assert with normal optimization, and a 0.74% overhead with MinOpts.

@BruceForstall
Member Author

The top 11 noway_asserts here account for over 50% of the dynamic count. Since noway_assert as a whole costs about 1.03% with normal optimization, I would expect converting these few to simple asserts to yield up to a 0.5% throughput improvement.

@mikedn
Contributor

mikedn commented Jun 7, 2017

What's the story of this noway_assert thing anyway? How was it decided where to use assert and where to use noway_assert?

@BruceForstall
Member Author

As I understand it, assert was converted to noway_assert automatically or semi-automatically. I actually don't know how they determined which were to be converted. Theoretically, we should not have any noway_assert unless a re-compilation with MinOpts would avoid repeating the condition (since that's the main benefit of noway_assert). But that's hard to tell sometimes. So we have to make a call about which ones are worth it, and which are too expensive (now, after the automated conversion was done).
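To illustrate the benefit described above, here is a rough sketch of an assertion that stays active in release builds and triggers a retry with MinOpts when it fails. All names (SKETCH_NOWAY_ASSERT, RecoverableJitError, CompileMethod, CompileWithFallback) are hypothetical, and using a C++ exception to unwind is an assumption made for brevity. It is not a description of how the JIT actually implements this.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical error type used to abandon the current compilation attempt.
struct RecoverableJitError : std::runtime_error
{
    explicit RecoverableJitError(const char* what) : std::runtime_error(what) {}
};

// Unlike assert, this check stays active in release builds.
#define SKETCH_NOWAY_ASSERT(cond)                   \
    do                                              \
    {                                               \
        if (!(cond))                                \
        {                                           \
            throw RecoverableJitError(#cond);       \
        }                                           \
    } while (0)

enum class OptLevel
{
    Full,
    MinOpts
};

// Hypothetical compile step. Only the optimizing path depends on the
// invariant, so MinOpts never reaches the check.
inline void CompileMethod(OptLevel level, bool optimizerInvariantHolds)
{
    if (level == OptLevel::Full)
    {
        SKETCH_NOWAY_ASSERT(optimizerInvariantHolds);
    }
}

// Try full optimization first; if a check fails, retry with MinOpts.
// Returns the level that produced code.
inline OptLevel CompileWithFallback(bool optimizerInvariantHolds)
{
    try
    {
        CompileMethod(OptLevel::Full, optimizerInvariantHolds);
        return OptLevel::Full;
    }
    catch (const RecoverableJitError&)
    {
        CompileMethod(OptLevel::MinOpts, optimizerInvariantHolds);
        return OptLevel::MinOpts;
    }
}
```

In this sketch the check earns its release-build cost only because the MinOpts path skips the code that can violate the invariant. A check that would fail the same way under MinOpts gains nothing from the retry, which is the case where a simple assert would do.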

BruceForstall referenced this issue in BruceForstall/coreclr Jun 7, 2017
With these few changes, I measured a JIT instruction count reduction
of 0.37% of SuperPMI over the tests, and 0.17% for MinOpts.

Related to #10421
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@TIHan
Contributor

TIHan commented Nov 1, 2023

Considering this is for throughput, it would be interesting to investigate this more.

@TIHan TIHan modified the milestones: Future, 9.0.0 Nov 1, 2023
@TIHan TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Nov 1, 2023
@JulieLeeMSFT JulieLeeMSFT modified the milestones: 9.0.0, 10.0.0 Jul 25, 2024
@BruceForstall
Member Author

Going to close this, as I don't expect we'll work on this anytime soon.
