optimization on matrix-matrix multiplication #172

…at-mat-mul seems to work now

…case in packing.hpp, mat-mat-mul seems to work now

… added a framework for reading cache sizes, used to calculate block sizes (not yet implemented).

…fails

…loads/stores, further bug fixes. Renamed "get_cache_sizes.hpp" to "get_block_sizes.hpp". get_block_sizes() is called every time prod() is invoked, since mr/nr have to be assigned dynamically: they depend on whether float or double entries are processed. The AVX microkernels work for both doubles and floats; the approach taken for float entries differs from that for doubles due to limitations of the AVX instructions.
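Why mr/nr depend on the element type can be sketched as follows. This is only an illustration under the assumption that the micro-tile is sized in whole 256-bit AVX registers; the names `avx_lanes`, `MicroTile` and `micro_tile` and the default factors are hypothetical and not taken from get_block_sizes.hpp.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements of type T that fit in one
// 256-bit (32-byte) AVX register: 4 for double, 8 for float.
template <typename T>
constexpr std::size_t avx_lanes() { return 32 / sizeof(T); }

// Hypothetical micro-tile shape; the actual mr/nr used in this PR may differ.
struct MicroTile { std::size_t mr; std::size_t nr; };

// Assumption: each column of the micro-tile occupies `regs_per_column`
// AVX registers, so mr scales with the lane count while nr does not.
template <typename T>
constexpr MicroTile micro_tile(std::size_t regs_per_column = 2,
                               std::size_t columns = 4) {
    return MicroTile{regs_per_column * avx_lanes<T>(), columns};
}
```

With these assumed defaults, `micro_tile<double>()` gives an 8 x 4 tile and `micro_tile<float>()` a 16 x 4 tile, which is why the values cannot be fixed once for both types.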

…pp), made sure standard-microkernel is working

… accordingly, unfortunately fails due to segfaults on amd-systems, intel-systems untested

The inline assembler in get_cache_sizes() gets a pointer to an array, which is expected to be in %rdi. This was the only way I could get it to work properly, as specifying input/output operands would yield segfaults. Therefore, the inline assembler is in a separate function and relies on the register that holds the first function argument (i.e. %rdi in the System V x86-64 calling convention). I do not know if this could cause problems on other systems => needs to be tested.
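A possible alternative to relying on %rdi is the `<cpuid.h>` helper header shipped with GCC and Clang, which declares the operands and clobbers itself. CPUID overwrites EAX, EBX, ECX and EDX; whether undeclared clobbers were the cause of the segfaults described above is not confirmed by the source. The following is a minimal sketch, with `CpuidRegs` and `query_cpuid` as hypothetical names that are not part of this PR.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header (not available on MSVC)
#endif

// Hypothetical container for the four registers CPUID writes.
struct CpuidRegs { std::uint32_t eax, ebx, ecx, edx; };

// Hypothetical wrapper: returns false if the requested leaf is not
// supported, or if not compiled for x86 with GCC/Clang.
inline bool query_cpuid(std::uint32_t leaf, std::uint32_t subleaf, CpuidRegs& out) {
#if defined(__x86_64__) || defined(__i386__)
    // Highest supported leaf in the basic (0x0...) or extended (0x8...) range.
    const unsigned int max_leaf = __get_cpuid_max(leaf & 0x80000000u, nullptr);
    if (max_leaf == 0 || max_leaf < leaf) return false;
    unsigned int a = 0, b = 0, c = 0, d = 0;
    __cpuid_count(leaf, subleaf, a, b, c, d);  // the macro handles the register constraints
    out = CpuidRegs{a, b, c, d};
    return true;
#else
    (void)leaf; (void)subleaf; (void)out;
    return false;
#endif
}
```

Because no register is assumed implicitly, such a wrapper does not depend on the calling convention, which is the portability concern raised above.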

…cpus fails

…thoroughly tested. CPUID cache info can be obtained through CPUID leaf 2 or CPUID leaf 4 on Intel CPUs; which leaf to use depends on the CPU. Both have been implemented, and leaf 2 works correctly on a Core 2 Quad Q9400. Further thorough testing and double-checking of the huge switch-case for leaf 2 has NOT been done.
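For comparison with the leaf 2 descriptor table, the leaf 4 path needs no switch-case: each subleaf encodes the cache geometry directly in the returned registers. The sketch below decodes one subleaf from raw register values, following the bit layout documented for CPUID leaf 4. `CacheInfo` and `decode_leaf4` are hypothetical names; this is not the PR's get_cache_intel_leaf4().

```cpp
#include <cstdint>

// Hypothetical result type for one CPUID leaf 4 subleaf.
struct CacheInfo {
    unsigned type;             // 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
    unsigned level;            // 1 = L1, 2 = L2, 3 = L3
    std::uint64_t size_bytes;  // total cache size, 0 if type == 0
};

// Decode EAX/EBX/ECX as returned by CPUID leaf 4 for one subleaf.
inline CacheInfo decode_leaf4(std::uint32_t eax, std::uint32_t ebx, std::uint32_t ecx) {
    CacheInfo info;
    info.type  = eax & 0x1Fu;         // EAX[4:0]
    info.level = (eax >> 5) & 0x7u;   // EAX[7:5]
    // All geometry fields are stored as "value minus one".
    const std::uint64_t line_size  = (ebx & 0xFFFu) + 1;          // EBX[11:0]
    const std::uint64_t partitions = ((ebx >> 12) & 0x3FFu) + 1;  // EBX[21:12]
    const std::uint64_t ways       = ((ebx >> 22) & 0x3FFu) + 1;  // EBX[31:22]
    const std::uint64_t sets       = static_cast<std::uint64_t>(ecx) + 1;
    info.size_bytes = (info.type == 0) ? 0 : ways * partitions * line_size * sets;
    return info;
}
```

A caller would iterate over subleaves 0, 1, 2, ... until `type` is 0, keeping the data or unified cache of each level.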

into gemmopt-avx

quick tests did not show any performance impacts

Please enter the commit message for your changes. Lines starting

… of available threads

…riptions

…emory_create() etc., fixed underflows when calculating num_of_blocks.. and num_residue_slivers..
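The source does not say where exactly the underflows occurred. A common pitfall with unsigned sizes is a rounding-up formula such as `(n - 1) / block + 1`, which wraps around when `n` is 0. The following is a minimal sketch of an underflow-free split of one matrix dimension; `Blocking` and `split_dimension` are hypothetical names, not the PR's code.

```cpp
#include <cstddef>

// Hypothetical result: how a dimension of length n splits into blocks.
struct Blocking {
    std::size_t full_blocks;   // blocks of exactly `block` elements
    std::size_t residue;       // leftover elements in the last, partial block
    std::size_t total_blocks;  // full blocks plus one if there is a residue
};

// Uses only division and modulo, so no subtraction can wrap around,
// even for n == 0 or n < block.
inline Blocking split_dimension(std::size_t n, std::size_t block) {
    if (block == 0) return Blocking{0, 0, 0};  // guard against division by zero
    const std::size_t full = n / block;
    const std::size_t rest = n % block;
    return Blocking{full, rest, full + (rest != 0 ? 1 : 0)};
}
```

For example, a dimension of 10 with block size 4 yields two full blocks and a residue of 2, while a dimension of 0 yields all zeros instead of a huge wrapped-around count.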

…ther memory_create()s (buffer_A/B)

…lude in matrix_operations.hpp

…ned L1/2/3_AVX/SSE_DENOMs to quickly change what fraction of cache should be filled with the blocks
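How such a denominator could translate into a block size can be sketched as below. This assumes, purely for illustration, that an mr x kc sliver of A and a kc x nr sliver of B should together fit into the fraction 1/denom of a cache level. `kc_for_cache` and its parameters are hypothetical; the PR's actual formulas are not shown in the source.

```cpp
#include <cstddef>

// Hypothetical: largest kc such that (mr + nr) * kc elements of type T
// fit into cache_bytes / denom bytes.
template <typename T>
std::size_t kc_for_cache(std::size_t cache_bytes, std::size_t denom,
                         std::size_t mr, std::size_t nr) {
    if (denom == 0 || mr + nr == 0) return 0;  // avoid division by zero
    const std::size_t budget = cache_bytes / denom;  // bytes the blocks may occupy
    return budget / ((mr + nr) * sizeof(T));
}
```

Under these assumptions, a 32 KiB L1 cache with a denominator of 2 and an 8 x 4 micro-tile of doubles gives kc = 170; raising the denominator shrinks the block and leaves more of the cache for other data.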

Commits on Oct 12, 2015

fixed get_cache_intel_leaf4()

Max authored and Max committed Oct 12, 2015


Commits on Sep 4, 2015

Commits on Sep 5, 2015

Commits on Sep 11, 2015

Commits on Sep 17, 2015

Commits on Sep 21, 2015

Commits on Sep 24, 2015

Commits on Sep 25, 2015

Commits on Sep 27, 2015

Commits on Sep 29, 2015

Commits on Oct 2, 2015

Commits on Oct 5, 2015

Commits on Oct 12, 2015

Commits on Oct 13, 2015

Commits on Oct 14, 2015

Commits on Nov 1, 2015

Commits on Nov 5, 2015

Commits on Nov 9, 2015

Commits on Dec 2, 2015
