Improve profiler timing on Windows/x86 using RDTSC#629
Improve profiler timing on Windows/x86 using RDTSC#629mvorisek wants to merge 3 commits intoxdebug:masterfrom
Conversation
dd9a1e9 to
6bf31ad
Compare
|
I'm not inclined to merge changes to timings until after Xdebug 3.0 is out, as it needs my full attention, and there is enough scope creep already. PR #628 is still "in draft", so I didn't pay it any attention yet, as I thought it wasn't done yet. |
|
This is architecture specific; it wouldn't work on ARM Win32. |
thanks for your feedback - would you like to see there macro like |
85548a1 to
61ad26b
Compare
1820d76 to
cc71fc8
Compare
cc71fc8 to
b44d7e0
Compare
eaf7a10 to
bda82c6
Compare
bda82c6 to
c55bae7
Compare
0e5402a to
0861545
Compare
0861545 to
8763cdd
Compare
8763cdd to
2711607
Compare
|
@derickr can we merge this PR?
|
|
@derickr can you please get to this PR? |
|
I'm still not keen in adding this, as it's more code to maintain on a platform that I know little about. I haven't heard a good reason why this is needed, or what big benefit it has. |
|
I have been profiling very critical code parts which have a lot of very simple/fast calls. Currently, on Windows, the time resolution is 100 ns. But, as mentioned in the description, the simple/fast calls take about 20 ns and without this PR the profiler result is more or less random :), thus I propose this PR to: I plan then to finish #628 which will remove the xdebug overhead and measure simple/fast calls below 10 ns. In numbers, simple/fast call is currently recorded with 10 ns (minimal/fallback step) or 100 ns (minimal timer step). Even if the times should average to +- the real result, they don't as the time needed to read the current time is currently way larger than the fastest function call itself. Imagine a loop with 1k elements and simple code inside like So with this PR, the measured time will decrease from 70 us to 20 us. With the later PR, it will be 10 us. With more elements/calls it can be millisconds. And if the measured time is 7x greater than in the reality, it confuses the developer so much that he may think some heavy call like DB query is faster. It is not, only 50 ns overhead per 1 ms DB query is unimportant, but for the fastest functions, the results are almost an order of magnitude off. To your point about Windows: it is tested in CI, feel free to break the code to verify (like by speeding the time by multiplying it by some constant) |
|
@derickr would you be ok to add |
|
Previous timing related PRs have given me a lot of trouble, so I won't be accepting this, or anything else related to timing any more. |
|
Thank you for the feedback Derick, can you please be more specific if there was anything I could done better from my side? |
|
Hi @derickr, I am still facing with inaccurate time measurements. Measuring the time in sense of CPU instructions may be a solution, although not all CPU instructions take the same time. What are your thoughts, would you accept a PR for measuring time based on CPU instructions? Or did you try this solution in the past and rejected it for some reason? Thanks. |
This PR further improves #611. The simplest php functions (
is_int,count, ...) take about 20 ns per call, but the best Windows timer has resolution of 100 ns [1] which implies that this timer source is short to simple php functions. When a script/project under test contain a lot of calls of simple php functions, the xdebug profiler output is quite ambiguous (some calls are recorded with 10 ns duration (minimal time step) and some with 100 ns (minimal timer step)).This PR solves this issue by using RDTSC as a clock source, which has resolution of about 0.25 ns and reading it is fast, it takes about 6 ns per call. On modern CPUs (Nehalen or newer), it is incremented at constant rate
However, there is no way to detect the RDTSC speed, therefore it needs a short calibration (2 ms) and this feature
must by enabled explictly byis enabled by default on Windows but only for profile mode. As php process takes about 10 ms to startup on Windows, I belive the calibration delay is acceptable against much more accurate results.xdebug.profiler_tsc_as_clockphp ini option[1] https://stackoverflow.com/questions/63205226/better-than-100ns-resolution-timers-in-windows