optimization on matrix-matrix multiplication #172
base: master
Commits on Sep 4, 2015
- added product test in ./tests viennacl/, fixed some packaging bugs, mat-mat-mul seems to work now
  Fritzkefit committed Sep 4, 2015 (ab756df)
Commits on Sep 5, 2015
- got rid of the wrappers for A and B, switched offsets for transposed case in packing.hpp, mat-mat-mul seems to work now
  Fritzkefit committed Sep 5, 2015 (a8789e6)
Commits on Sep 11, 2015
- Altered packing and blocking to be ready for BLIS-micro-kernels. Also added a framework for reading cache sizes, used to calculate block sizes (not yet implemented).
  Fritzkefit committed Sep 11, 2015 (2a85254)
Commits on Sep 17, 2015
- Fixed a lot of bugs, still some remaining, "avx_prod_test 1 513 513" fails
  Fritzkefit committed Sep 17, 2015 (e304016)
Commits on Sep 21, 2015
- AVX-microkernel works now for double and float, got rid of unaligned loads/stores, further bug fixes
  Renamed "get_cache_sizes.hpp" to "get_block_sizes.hpp". get_block_sizes() is called every time prod() is invoked, since mr/nr have to be assigned dynamically: they depend on whether float or double entries are processed. The AVX micro-kernels work for double and float, but the approach taken for float entries differs from that for doubles due to limitations of the AVX instructions.
  Fritzkefit committed Sep 21, 2015 (941feeb)
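One reason the register block sizes depend on the entry type is that a 256-bit AVX register holds 4 doubles but 8 floats. The commit does not state the actual mr/nr values, so the following is only a sketch of that relationship; `avx_lanes` is a hypothetical name, not something taken from the pull request.

```cpp
#include <cstddef>

// Hypothetical trait (not from the pull request): number of entries of
// type T that fit into one 256-bit (32-byte) AVX register.
template<typename T>
struct avx_lanes
{
  static const std::size_t value = 32 / sizeof(T);
};
```

A micro-kernel could choose its register block sizes as multiples of `avx_lanes<T>::value`, which is why they cannot be fixed once for both precisions.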
- transferred aligned-buffer-functions to its own file (aligned_buffer.hpp), made sure standard-microkernel is working
  Fritzkefit committed Sep 21, 2015 (7ca4779)
Commits on Sep 24, 2015
- get_block_sizes() now tries to read the cpuid and set the cache sizes accordingly, unfortunately fails due to segfaults on amd-systems, intel-systems untested
  Fritzkefit committed Sep 24, 2015 (61e080d)
Commits on Sep 25, 2015
- cache size information on AMD systems is now read correctly
  The inline assembler in get_cache_sizes() gets a pointer to an array, which should be stored in %rdi. This was the only way I could get it to work properly, as specifying input/output operands would yield segfaults. Therefore, the inline assembler is in a separate function and relies on the standard register for the first function argument (i.e. %rdi). I do not know if this could cause problems on other systems => needs to be tested.
  Fritzkefit committed Sep 25, 2015 (0776b86)
Commits on Sep 27, 2015
- added #ifdef to include correct micro-kernel, reading cache on intel cpus fails
  Fritzkefit committed Sep 27, 2015 (b306a14)
Commits on Sep 29, 2015
- cpuid is read correctly in either one of two ways on intel cpus, not thoroughly tested
  CPUID info can be obtained through cpuid-leaf2 or cpuid-leaf4 on Intel CPUs. Which leaf to use depends on the CPU. Both have been implemented, and leaf2 works correctly on a Core 2 Quad Q9400. Further, thorough testing and double checking of the huge switch-case for leaf2 has NOT been done.
  Fritzkefit committed Sep 29, 2015 (b578b17)
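Unlike leaf 2, which returns descriptor bytes that have to be looked up in a table (hence the large switch-case), leaf 4 reports the cache geometry directly. The sketch below only decodes register values that the caller has already obtained from CPUID leaf 4. `leaf4_cache_bytes` is a hypothetical helper, not the code from this commit; the bit layout follows Intel's documented encoding, in which every field is stored as "value minus one".

```cpp
#include <cstdint>

// Hypothetical helper (not from the pull request): decodes the EBX/ECX
// values returned by CPUID leaf 4 ("deterministic cache parameters")
// into a cache size in bytes.
inline std::uint64_t leaf4_cache_bytes(std::uint32_t ebx, std::uint32_t ecx)
{
  std::uint64_t ways       = ((ebx >> 22) & 0x3FFu) + 1; // EBX[31:22]
  std::uint64_t partitions = ((ebx >> 12) & 0x3FFu) + 1; // EBX[21:12]
  std::uint64_t line_size  = ( ebx        & 0xFFFu) + 1; // EBX[11:0]
  std::uint64_t sets       = static_cast<std::uint64_t>(ecx) + 1;
  return ways * partitions * line_size * sets;
}
```

For example, an 8-way cache with 64-byte lines and 64 sets is encoded as EBX = (7 << 22) | 63 and ECX = 63, which decodes to 32 KB.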
- fixed switch case in set_cache_intel()
  Fritzkefit committed Sep 29, 2015 (fcbfa30)
Commits on Oct 2, 2015
- updated benchmarks and file structure in tests\ viennacl
  Fritzkefit committed Oct 2, 2015 (8ab70e5)
- Fritzkefit committed Oct 2, 2015 (0d71ac1)
Commits on Oct 5, 2015
- Fritzkefit committed Oct 5, 2015 (ad7e436)
- extended MR_D block size for avx_micro_kernel<double>()
  Fritzkefit committed Oct 5, 2015 (f401acd)
- Merge branch 'gemmopt-avx' of https://github.com/Fritzkefit/viennacl-dev
  Fritzkefit committed Oct 5, 2015 (6886625)
Commits on Oct 12, 2015
- Max authored and Max committed Oct 12, 2015 (ed01ae4)
Commits on Oct 13, 2015
- adjusted how block sizes are calculated
  quick tests did not show any performance impacts
  Max authored and Max committed Oct 13, 2015 (7c989c2)
Commits on Oct 14, 2015
- nothing functional changed, switching systems that's why commit/push
  Max authored and Max committed Oct 14, 2015 (17c7469)
- Please enter the commit message for your changes. Lines starting
  Fritzkefit committed Oct 14, 2015 (b2dd9fb)
Commits on Nov 1, 2015
- parallel for around first loop in macro-kernel, nc is at least number of available threads
  Please enter the commit message for your changes. Lines starting
  Fritzkefit committed Nov 1, 2015 (e1d658d)
- deleted comments containing debug-code and added minimal doxygen descriptions
  Fritzkefit committed Nov 1, 2015 (5f97487)
Commits on Nov 5, 2015
- fixed inline assembler nonsense, swapped 'get_aligned_buffer()' for memory_create() etc., fixed underflows when calculating num_of_blocks.. and num_residue_slivers..
  Fritzkefit committed Nov 5, 2015 (7915cc3)
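The commit does not show the faulty expressions. One common way such underflows arise with unsigned sizes is the idiom `(n - 1) / block + 1`, which wraps around for `n == 0`. The sketch below shows a subtraction-free alternative; the function names are hypothetical and only loosely modelled on the identifiers mentioned in the message.

```cpp
#include <cstddef>

// Hypothetical helpers; names are illustrative, not taken from the pull request.

// Number of blocks of size `block` needed to cover `n` entries.
// Written without subtraction, so n == 0 cannot wrap around.
inline std::size_t num_blocks(std::size_t n, std::size_t block)
{
  return n / block + (n % block != 0 ? 1 : 0);
}

// Number of entries in the last, partially filled block
// (0 if `block` divides `n` evenly).
inline std::size_t residue(std::size_t n, std::size_t block)
{
  return n % block;
}
```

Sizes such as 513 with a block size of 8 exercise the residue path: 65 blocks, with a single entry in the last one.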
- removed DEBUG comments and moved memory_create() of buffer_C to the other memory_create()s (buffer_A/B)
  Fritzkefit committed Nov 5, 2015 (60009d8)
- forgot commented free()s after macro-kernel
  Fritzkefit committed Nov 5, 2015 (aa6fd76)
- deleted test files, align_buffer.hpp (not needed anymore) and its include in matrix_operations.hpp
  Fritzkefit committed Nov 5, 2015 (656e842)
Commits on Nov 9, 2015
- added option to use posix_memalign() instead of aligned_alloc(), defined L1/2/3_AVX/SSE_DENOMs to quickly change what fraction of cache should be filled with the blocks
  Fritzkefit committed Nov 9, 2015 (4d8ee48)
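A compile-time switch between the two allocators could look roughly like the sketch below. The macro `EXAMPLE_USE_POSIX_MEMALIGN` and the function `alloc_aligned` are assumptions made for illustration; the names actually used in the pull request are not shown in the commit message. Both calls are standard: `aligned_alloc()` comes from C11, `posix_memalign()` from POSIX.

```cpp
#include <cstddef>
#include <stdlib.h>  // aligned_alloc (C11), posix_memalign (POSIX), free

// Hypothetical wrapper (not from the pull request): returns `bytes` bytes
// aligned to `alignment`, or a null pointer on failure. `alignment` is
// assumed to be a power of two and a multiple of sizeof(void*).
// Release the memory with free().
inline void* alloc_aligned(std::size_t alignment, std::size_t bytes)
{
  // aligned_alloc() requires the size to be a multiple of the alignment,
  // so round up; this is harmless for posix_memalign().
  std::size_t padded = (bytes + alignment - 1) / alignment * alignment;
#ifdef EXAMPLE_USE_POSIX_MEMALIGN
  void* p = 0;
  if (posix_memalign(&p, alignment, padded) != 0)
    return 0;
  return p;
#else
  return aligned_alloc(alignment, padded);
#endif
}
```

With 32-byte alignment, buffers allocated this way can be used with aligned AVX loads and stores.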
- Fritzkefit committed Nov 9, 2015 (36e5d70)
Commits on Dec 2, 2015
- Fritzkefit committed Dec 2, 2015 (8ac081c)
- add min matrix-size for omp loops
  Fritzkefit committed Dec 2, 2015 (949106a)
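One way to impose a minimum size on an OpenMP loop is the standard `if` clause, which runs the loop serially when the condition is false and so avoids thread start-up overhead for small matrices. The commit message does not say how the pull request does this; the threshold value and the function below are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical threshold; the value used in the pull request is not shown.
const std::size_t EXAMPLE_MIN_SIZE_FOR_OMP = 256;

// Scales every entry of a row-major n-by-n matrix by alpha.
// The outer loop is parallelized only if n reaches the threshold.
inline void scale_matrix(std::vector<double>& a, std::size_t n, double alpha)
{
  const long rows = static_cast<long>(n); // signed index for older OpenMP versions
#ifdef _OPENMP
  #pragma omp parallel for if (n >= EXAMPLE_MIN_SIZE_FOR_OMP)
#endif
  for (long i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < n; ++j)
      a[static_cast<std::size_t>(i) * n + j] *= alpha;
}
```

Without OpenMP enabled the pragma is skipped and the function simply runs serially.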