Travis Downs edited this page Sep 10, 2022 · 36 revisions

Miscellaneous links that I'll probably want to read again in the future.

Memory

Caches

Cache behavior for all recent Intel CPUs at uops.info.

DRAM

Detailed DDR4 stuff

DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. Describes techniques for reverse engineering (RE) the physical address -> DRAM mapping, with results for many systems including Skylake with DDR4. It uses a timing approach to find pairs of addresses that have a bank/row conflict. Associated GitHub repo with the RE tool.
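Once timing has sorted addresses into same-bank sets, the bank function still has to be solved for. A minimal sketch of that step, assuming the bank index is a set of XOR (parity) functions over physical address bits: brute-force every mask over a small bit range and keep the masks whose parity is constant inside every same-bank set. This is written in the spirit of the paper's approach, not copied from the DRAMA tool; the bit range, mask size limit, and the planted demo function are hypothetical.

```python
from itertools import combinations


def parity(x: int) -> int:
    """XOR of all bits of x (0 or 1)."""
    return bin(x).count("1") & 1


def candidate_masks(low_bit: int, high_bit: int, max_bits: int):
    """Yield every mask selecting 1..max_bits address bits in [low_bit, high_bit]."""
    bits = range(low_bit, high_bit + 1)
    for k in range(1, max_bits + 1):
        for combo in combinations(bits, k):
            yield sum(1 << b for b in combo)


def consistent_masks(same_bank_sets, low_bit=6, high_bit=20, max_bits=2):
    """Keep masks whose parity is identical for all addresses within each
    same-bank set: candidates for the XOR functions of the bank index."""
    return [
        m
        for m in candidate_masks(low_bit, high_bit, max_bits)
        if all(len({parity(a & m) for a in s}) == 1 for s in same_bank_sets)
    ]


if __name__ == "__main__":
    import random

    # Hypothetical ground truth for the demo: bank = (a13^a17, a14^a18).
    planted = [(1 << 13) | (1 << 17), (1 << 14) | (1 << 18)]
    rng = random.Random(1)
    sets = {}
    for _ in range(400):
        a = rng.randrange(1 << 21) & ~63  # cache-line aligned physical address
        bank = tuple(parity(a & m) for m in planted)
        sets.setdefault(bank, []).append(a)
    print([hex(m) for m in consistent_masks(list(sets.values()))])
```

With enough random addresses per set, only masks in the span of the real functions survive, so the demo should recover just the two planted masks. Raising `max_bits` also admits their XOR combinations, which a real solver would then have to reduce to a basis.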

Ulrich Drepper's What Every Programmer Should Know About Memory

Sandy Bridge physical address to DRAM mapping, well described. Note that Ivy Bridge apparently uses a more complex channel mapping.

Method for reverse engineering the physical-DRAM mapping describes how to determine the DRAM bank mappings by searching for bank collisions via timing.
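The timing side of such a search ends with a classification step: pairs that alternate between two rows of the same bank pay a row-buffer conflict and come out slower. A minimal sketch, assuming per-pair latencies (for example, median cycles for alternating uncached accesses) were already measured in native code, which is not shown here. The largest-gap threshold is a crude stand-in chosen for illustration, not necessarily what the linked method uses, and the latency numbers are made up.

```python
def split_threshold(latencies):
    """Threshold halfway across the largest gap between sorted samples:
    a crude two-cluster split (fast pairs vs. row-conflict pairs)."""
    xs = sorted(latencies)
    if len(xs) < 2:
        raise ValueError("need at least two latency samples")
    _, i = max((xs[k + 1] - xs[k], k) for k in range(len(xs) - 1))
    return (xs[i] + xs[i + 1]) / 2


def conflicting_pairs(pair_latency, threshold=None):
    """Return address pairs whose latency exceeds the threshold, i.e. pairs
    that likely sit in the same bank but in different rows."""
    if threshold is None:
        threshold = split_threshold(pair_latency.values())
    return [pair for pair, t in pair_latency.items() if t > threshold]


if __name__ == "__main__":
    # Made-up median latencies (cycles) for alternating accesses to (a, b).
    measured = {(0x0, 0x1000): 210, (0x0, 0x2000): 215, (0x0, 0x3000): 330,
                (0x0, 0x4000): 212, (0x0, 0x5000): 340}
    print(conflicting_pairs(measured))
```

The largest-gap split always finds some gap, so it misbehaves if every measured pair lands in one cluster; a real tool would sanity-check that the two clusters are well separated.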

Physical Address Decoding in Intel Xeon v3/v4 CPUs: A Supplemental Datasheet describes the physical to DRAM mapping down to rank granularity (socket, channel, rank, but nothing finer). They access the same region of DRAM through both a linear DRAM mapping and the normal interleaved mapping, write a token using one mapping and search for it with the other. This allows exact determination of the interleaving function w/o any dependence on timing.
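The token trick can be illustrated with a pure simulation. A minimal sketch, assuming two channels, a 64-byte interleave granularity, and a made-up interleave function (`hypothetical_interleave`) standing in for the undocumented hardware mapping; the `SimulatedMemory` class is hypothetical too. Each line-sized address gets a unique token written through the interleaved view, and scanning the per-channel linear views reveals exactly where it landed, with no timing involved.

```python
LINE = 64  # hypothetical interleave granularity in bytes


def hypothetical_interleave(addr: int) -> tuple:
    """Stand-in for the unknown hardware mapping: system address ->
    (channel, channel-local address). Made up for this example."""
    line, offset = divmod(addr, LINE)
    channel = (line ^ (line >> 3)) & 1
    return channel, (line >> 1) * LINE + offset


class SimulatedMemory:
    """Two channels reachable via an interleaved view and a linear view."""

    def __init__(self, total_bytes: int, interleave=hypothetical_interleave):
        self.interleave = interleave
        self.channels = [bytearray(total_bytes // 2) for _ in range(2)]

    def write_interleaved(self, addr: int, data: bytes) -> None:
        ch, local = self.interleave(addr)
        self.channels[ch][local:local + len(data)] = data


def recover_mapping(mem: SimulatedMemory, total_bytes: int) -> dict:
    """Write a unique token through the interleaved view, then find it by
    scanning each channel's linear view."""
    mapping = {}
    for addr in range(0, total_bytes, LINE):
        token = ((1 << 63) | addr).to_bytes(8, "little")  # unique, never all-zero
        mem.write_interleaved(addr, token)
        hits = [(ch, view.find(token)) for ch, view in enumerate(mem.channels)]
        hits = [(ch, off) for ch, off in hits if off != -1]
        if len(hits) != 1:
            raise RuntimeError(f"token for {addr:#x} found {len(hits)} times")
        mapping[addr] = hits[0]
    return mapping


if __name__ == "__main__":
    mem = SimulatedMemory(4096)
    for addr, where in list(recover_mapping(mem, 4096).items())[:10]:
        print(hex(addr), "->", where)
```

On real hardware the hard part is getting the two views of the same DRAM in the first place; the search itself stays this simple.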

L3/Superqueue

Detailed reverse engineering of the LLC/LC ring bus using contention.

Description of how to map offcore traffic counters to specific locations on the die, and a bit about the types of buses that are involved:

https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/777530

Description of the physical address to slice mapping (see references) and a strategy to have slice-locality in the L3:

https://people.kth.se/~farshin/documents/slice-aware-eurosys19.pdf
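For power-of-two slice counts, the reverse-engineered slice hashes are usually described as one XOR (parity) function over physical address bits per bit of the slice index. A minimal sketch of that shape plus a slice-aware filter that keeps only the cache lines landing in a chosen slice. The `SLICE_MASKS` below are made up for illustration and are NOT the real Intel masks; obtaining physical addresses (for example via Linux pagemap, which needs privileges) is out of scope.

```python
def mask_of(*bits: int) -> int:
    """Build an address mask from a list of bit positions."""
    return sum(1 << b for b in bits)


# Hypothetical masks, one per slice-index bit (NOT the real Intel values).
SLICE_MASKS = (mask_of(6, 10, 14, 18), mask_of(7, 11, 15, 19))


def slice_of(paddr: int, masks=SLICE_MASKS) -> int:
    """Slice index: bit i is the parity of (paddr & masks[i])."""
    s = 0
    for i, m in enumerate(masks):
        s |= (bin(paddr & m).count("1") & 1) << i
    return s


def lines_in_slice(base: int, length: int, target: int, masks=SLICE_MASKS, line=64):
    """Cache-line addresses in [base, base + length) that hash to `target`."""
    start = base - base % line
    return [a for a in range(start, base + length, line)
            if slice_of(a, masks) == target]


if __name__ == "__main__":
    local = lines_in_slice(0, 4096, target=3)
    print(len(local), [hex(a) for a in local[:4]])
```

An allocator built on this idea would hand out only the returned lines, trading memory density for keeping a buffer's data in the slice nearest the core using it.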

Good description of the uncore and QPI links, including the LLC. For Westmere but probably still relevant in many ways:

Intel Technology Journal

TLB/paging

Reverse engineering of the intermediate paging caches: Reverse Engineering Hardware Page Table Caches Using Side-Channel Attacks on the MMU and this similar set of slides: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-van_schaik.pdf.
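The paging-structure caches that these attacks probe are keyed on the upper virtual address bits used at each level of the page walk (the Intel SDM describes PML4, PDPTE and PDE caches). A minimal sketch, assuming x86-64 4-level paging with 48-bit virtual addresses (5-level paging is ignored): split an address into per-level indices and count how many upper levels two addresses share, i.e. how much of a walk a cached upper-level entry could cover for both.

```python
def pt_indices(vaddr: int) -> dict:
    """Split a 48-bit x86-64 virtual address into 4-level paging indices."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,
        "pdpt": (vaddr >> 30) & 0x1FF,
        "pd": (vaddr >> 21) & 0x1FF,
        "pt": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


def shared_upper_levels(a: int, b: int) -> int:
    """How many leading levels (PML4, PDPT, PD) two addresses walk through
    identically: 0..3."""
    va_mask = (1 << 48) - 1
    a &= va_mask
    b &= va_mask
    shared = 0
    for shift in (39, 30, 21):  # boundaries of the PML4, PDPT and PD indices
        if a >> shift != b >> shift:
            break
        shared += 1
    return shared


if __name__ == "__main__":
    print(pt_indices(0x7FFF_FFFF_F000))
    print(shared_upper_levels(0x1000, 0x2000), shared_upper_levels(0, 1 << 30))
```

An attacker choosing probe addresses would vary exactly one level's index at a time, so that timing differences can be attributed to a single paging-structure cache.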

Coherency Etc

John McCalpin's description of how coherency works in KNL and Skylake-SP is excellent:

Topology and Cache Coherence in Knights Landing and Skylake Xeon Processors

A list of all sorts of links to coherency and caching info on Intel processors.

ALU/Core

The thesis Combining static and dynamic approaches to model loop performance in HPC has lots of good stuff in appendices A and B, including methodologies for measuring the little-known load matrix size.

Branch Prediction

BTB and branch throughput measurement on Cloudflare blog

Voltage

Skylake voltage thread on the NotebookReview forums.

Underclock readme, including links to three tools that do the heavy lifting for you.

Power

Power and energy use.

Energy Efficiency Features of the Intel Skylake-SP Processor and Their Impact on Performance shows that 512-bit xor power consumption depends significantly on the number of bits that are flipped, and less strongly on the number of 1 bits in the output.
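When designing operand patterns for an experiment like this, it helps to compute the two quantities the result is stated in. A minimal sketch that, for a stream of back-to-back 512-bit XORs, counts input bit flips, output bit flips, and 1 bits in each result. It only counts bits (Python integers stand in for zmm registers) and does not model power; which flips matter most on the real hardware is the paper's finding, not something this code decides.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def xor_activity(pairs):
    """For back-to-back 512-bit XORs on the (a, b) operand pairs, return
    (input_bit_flips, output_bit_flips, output_ones) for each op after the first."""
    mask = (1 << 512) - 1
    stats = []
    prev_a = prev_b = prev_out = None
    for a, b in pairs:
        a &= mask
        b &= mask
        out = a ^ b
        if prev_out is not None:
            stats.append((popcount(prev_a ^ a) + popcount(prev_b ^ b),
                          popcount(prev_out ^ out),
                          popcount(out)))
        prev_a, prev_b, prev_out = a, b, out
    return stats


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    rand = [(rng.getrandbits(512), rng.getrandbits(512)) for _ in range(4)]
    const = [(0x55, 0xAA)] * 4
    print("random:  ", xor_activity(rand))
    print("constant:", xor_activity(const))
```

Feeding the same operands repeatedly drives all flip counts to zero while leaving the output popcount fixed, which separates the two effects the paper reports.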

DGEMM performance is data-dependent describes how matrix multiplication performance varies based on the element values, with constant values having the lowest energy (possibly due to fewer bit flips in the FMA and associated circuitry).

Timing

John McCalpin on the costs of rdtsc and rdtscp, on the Intel forums.

Another forum post with more rdtsc details and a kernel module to measure performance.

Perf

The only thing I've found with any type of explanation of the PEBS shadow effect.

Misc

Intel, AMD, Graviton cloud CPU share.

AMD

All manuals (see the PPR, Processor Programming Reference, for perf counters)