optimization on matrix-matrix multiplication #172

Open
wants to merge 30 commits into
base: master

Commits on Sep 4, 2015

  1. added product test in ./tests viennacl/, fixed some packaging bugs, mat-mat-mul seems to work now

    Fritzkefit committed Sep 4, 2015
    ab756df

Commits on Sep 5, 2015

  1. got rid of the wrappers for A and B, switched offsets for transposed case in packing.hpp, mat-mat-mul seems to work now

    Fritzkefit committed Sep 5, 2015
    a8789e6

Commits on Sep 11, 2015

  1. Altered packing and blocking to be ready for BLIS-micro-kernels. Also added a framework for reading cache sizes, used to calculate block sizes (not yet implemented).

    Fritzkefit committed Sep 11, 2015
    2a85254
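
As a rough illustration of what "packing for BLIS-style micro-kernels" usually means, the sketch below copies an mc x kc block of A into micro-panels of mr rows so that a micro-kernel can read it with unit stride. This is not the code from packing.hpp; the function name `pack_A`, the row-major layout, and the zero-padding of residue rows are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the PR): pack an mc x kc block of a
// row-major matrix A (leading dimension lda) into micro-panels of mr rows.
// Inside a micro-panel the mr entries of one column are stored contiguously.
// Rows beyond mc in the last micro-panel are zero-padded.
std::vector<double> pack_A(const double* A, std::size_t lda,
                           std::size_t mc, std::size_t kc, std::size_t mr)
{
  assert(mr > 0);
  const std::size_t panels = (mc + mr - 1) / mr;     // ceil(mc / mr)
  std::vector<double> buf(panels * mr * kc, 0.0);
  std::size_t out = 0;
  for (std::size_t p = 0; p < panels; ++p)            // micro-panel index
    for (std::size_t k = 0; k < kc; ++k)              // column inside the block
      for (std::size_t i = 0; i < mr; ++i, ++out) {   // row inside the micro-panel
        const std::size_t row = p * mr + i;
        if (row < mc)
          buf[out] = A[row * lda + k];
      }
  return buf;
}
```

Zero-padding the last micro-panel lets the micro-kernel always process full mr-row tiles, at the cost of some wasted arithmetic on the residue.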

Commits on Sep 17, 2015

  1. Fixed a lot of bugs, still some remaining, "avx_prod_test 1 513 513" fails

    Fritzkefit committed Sep 17, 2015
    e304016

Commits on Sep 21, 2015

  1. AVX-microkernel works now for double and float, got rid of unaligned loads/stores, further bug fixes

    Renamed "get_cache_sizes.hpp" to "get_block_sizes.hpp". get_block_sizes() is called every time prod() is invoked, since mr/nr have to be assigned dynamically: they depend on whether float or double entries are processed.
    The AVX-microkernels work for double and float; the approach taken for float entries differs from that for doubles due to limitations of the AVX instructions.
    Fritzkefit committed Sep 21, 2015
    941feeb
  2. transferred aligned-buffer-functions to their own file (aligned_buffer.hpp), made sure standard-microkernel is working

    Fritzkefit committed Sep 21, 2015
    7ca4779
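
The dependence of mr/nr on the element type follows from the register width: a 256-bit AVX register holds 4 doubles or 8 floats. The sketch below shows one way to derive a micro-tile shape from that. The names `simd_lanes`, `micro_tile` and `pick_micro_tile`, and the default of one register per column and four columns, are hypothetical and not the values used in get_block_sizes().

```cpp
#include <cassert>
#include <cstddef>

// Number of elements of type T that fit into one SIMD register.
// 32 bytes corresponds to a 256-bit AVX register.
template <typename T>
constexpr std::size_t simd_lanes(std::size_t register_bytes = 32)
{
  return register_bytes / sizeof(T);
}

struct micro_tile { std::size_t mr, nr; };

// Hypothetical policy: mr is a whole number of registers, nr a fixed
// number of columns. Real kernels tune both to the register file.
template <typename T>
micro_tile pick_micro_tile(std::size_t regs_per_column = 1,
                           std::size_t columns = 4)
{
  assert(regs_per_column > 0 && columns > 0);
  return { regs_per_column * simd_lanes<T>(), columns };
}
```

With these defaults, `pick_micro_tile<double>()` yields a 4 x 4 tile and `pick_micro_tile<float>()` an 8 x 4 tile, which is why the choice has to be made per element type.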

Commits on Sep 24, 2015

  1. get_block_sizes() now tries to read the cpuid and set the cache sizes accordingly, unfortunately fails due to segfaults on AMD systems, Intel systems untested

    Fritzkefit committed Sep 24, 2015
    61e080d

Commits on Sep 25, 2015

  1. cache size information on AMD systems is now read correctly

    The inline assembler in get_cache_sizes() gets a pointer to an array, which should be stored in %rdi.
    This was the only way I could get it to work properly, as specifying input/output operands would yield segfaults.
    Therefore, the inline assembler is in a separate function and relies on the standard register for the first function argument (i.e. %rdi).
    I do not know if this could cause problems on other systems => needs to be tested.
    Fritzkefit committed Sep 25, 2015
    0776b86
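
For comparison, a minimal sketch of calling CPUID through GCC/Clang extended inline assembly with explicit operands is shown below. CPUID overwrites EAX, EBX, ECX and EDX, so all four are declared as outputs; leaving registers undeclared is a common source of crashes, though the commit message does not say whether that was the cause here. The names `cpuid_regs`, `run_cpuid` and `cpu_vendor` are made up for the example, and the code assumes GCC or Clang on x86 (on other targets it simply reports failure).

```cpp
#include <cstdint>
#include <cstring>
#include <string>

struct cpuid_regs { std::uint32_t eax, ebx, ecx, edx; };

// Executes CPUID for the given leaf/subleaf.
// Returns false when not built with GCC/Clang for x86.
inline bool run_cpuid(std::uint32_t leaf, std::uint32_t subleaf, cpuid_regs& r)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  // All four registers written by CPUID are listed as outputs, so the
  // compiler knows they are clobbered and allocates around them.
  __asm__ volatile("cpuid"
                   : "=a"(r.eax), "=b"(r.ebx), "=c"(r.ecx), "=d"(r.edx)
                   : "a"(leaf), "c"(subleaf));
  return true;
#else
  (void)leaf; (void)subleaf; (void)r;
  return false;
#endif
}

// Leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order).
inline std::string cpu_vendor()
{
  cpuid_regs r = {0, 0, 0, 0};
  if (!run_cpuid(0, 0, r))
    return std::string();
  char v[12];
  std::memcpy(v + 0, &r.ebx, 4);
  std::memcpy(v + 4, &r.edx, 4);
  std::memcpy(v + 8, &r.ecx, 4);
  return std::string(v, 12);
}
```

Letting the compiler pick and track the registers avoids depending on the calling convention, which is the portability concern raised in the commit message. On 32-bit position-independent builds with older GCC versions the "=b" constraint may be rejected, so this remains a sketch to be tested per platform.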

Commits on Sep 27, 2015

  1. added #ifdef to include correct micro-kernel, reading cache on intel cpus fails

    Fritzkefit committed Sep 27, 2015
    b306a14

Commits on Sep 29, 2015

  1. cpuid is read correctly in either one of two ways on intel cpus, not thoroughly tested

    CPUID info can be obtained through cpuid-leaf2 or cpuid-leaf4 on Intel CPUs.
    Which leaf to use depends on the CPU. Both have been implemented, and
    leaf2 works correctly on a Core 2 Quad Q9400. Further thorough testing and double-checking of the huge switch-case for leaf2 has NOT been done.
    Fritzkefit committed Sep 29, 2015
    b578b17
  2. fixed switch case in set_cache_intel()

    Fritzkefit committed Sep 29, 2015
    fcbfa30
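
Unlike leaf 2, which returns descriptor bytes that must be looked up in a table (hence the large switch-case), leaf 4 encodes the cache geometry directly in its registers. The sketch below decodes one leaf-4 result into a size in bytes using the documented bit fields. It is not the PR's get_cache_intel_leaf4(); the names `cache_info` and `decode_leaf4` are hypothetical, and the register values are passed in so the function can be tested without executing CPUID.

```cpp
#include <cstdint>

struct cache_info {
  unsigned type;             // 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
  unsigned level;            // cache level (1, 2, 3, ...)
  std::uint64_t size_bytes;  // 0 when type == 0
};

// Decode the EAX/EBX/ECX values returned by CPUID leaf 4 for one subleaf.
// Each geometry field is stored as "value minus one".
inline cache_info decode_leaf4(std::uint32_t eax, std::uint32_t ebx,
                               std::uint32_t ecx)
{
  cache_info c;
  c.type  = eax & 0x1Fu;                                        // EAX[4:0]
  c.level = (eax >> 5) & 0x7u;                                  // EAX[7:5]
  const std::uint64_t line_size  = (ebx & 0xFFFu) + 1;          // EBX[11:0]
  const std::uint64_t partitions = ((ebx >> 12) & 0x3FFu) + 1;  // EBX[21:12]
  const std::uint64_t ways       = ((ebx >> 22) & 0x3FFu) + 1;  // EBX[31:22]
  const std::uint64_t sets       = static_cast<std::uint64_t>(ecx) + 1;
  c.size_bytes = (c.type == 0) ? 0 : ways * partitions * line_size * sets;
  return c;
}
```

For example, an 8-way L1 data cache with 64-byte lines and 64 sets decodes to 8 * 1 * 64 * 64 = 32768 bytes.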

Commits on Oct 2, 2015

  1. 8ab70e5
  2. updated microkernel

    Fritzkefit committed Oct 2, 2015
    0d71ac1

Commits on Oct 5, 2015

  1. added sse kernel

    Fritzkefit committed Oct 5, 2015
    ad7e436
  2. f401acd
  3. Merge branch 'gemmopt-avx' of https://github.com/Fritzkefit/viennacl-dev into gemmopt-avx

    Fritzkefit committed Oct 5, 2015
    6886625

Commits on Oct 12, 2015

  1. fixed get_cache_intel_leaf4()

    Max authored and Max committed Oct 12, 2015
    ed01ae4

Commits on Oct 13, 2015

  1. adjusted how block sizes are calculated

    quick tests did not show any performance impacts
    Max authored and Max committed Oct 13, 2015
    7c989c2

Commits on Oct 14, 2015

  1. nothing functional changed; committed and pushed only because I am switching systems

    Max authored and Max committed Oct 14, 2015
    17c7469
  2. extended the benchmarks

    Fritzkefit committed Oct 14, 2015
    b2dd9fb

Commits on Nov 1, 2015

  1. parallel for around first loop in macro-kernel, nc is at least number of available threads

    Fritzkefit committed Nov 1, 2015
    e1d658d
  2. deleted comments containing debug-code and added minimal doxygen descriptions

    Fritzkefit committed Nov 1, 2015
    5f97487
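
A minimal sketch of putting a parallel for around the outer macro-kernel loop might look as follows. It is an assumption-laden stand-in, not the PR's macro-kernel: the function name `macro_kernel`, the row-major std::vector storage and the plain triple loop in place of a real micro-kernel are all hypothetical, and the adjustment of nc to the thread count is not modelled. Each iteration owns a disjoint column sliver of C, so no synchronisation is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C (m x n) += A (m x k) * B (k x n), all row-major.
// The outer loop runs over column slivers of width nr and is parallelised.
void macro_kernel(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C,
                  std::size_t m, std::size_t k, std::size_t n, std::size_t nr)
{
  assert(nr > 0);
  // A signed loop index keeps the loop valid for older OpenMP versions.
  const long slivers = static_cast<long>((n + nr - 1) / nr);
  #pragma omp parallel for
  for (long s = 0; s < slivers; ++s) {
    const std::size_t j0 = static_cast<std::size_t>(s) * nr;
    const std::size_t j1 = std::min(j0 + nr, n);
    // Stand-in for the micro-kernel: update columns [j0, j1) of C.
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t p = 0; p < k; ++p)
        for (std::size_t j = j0; j < j1; ++j)
          C[i * n + j] += A[i * k + p] * B[p * n + j];
  }
}
```

Without OpenMP enabled at compile time the pragma is ignored and the loop runs serially with the same result. With fewer slivers than threads some threads stay idle, which is presumably the motivation for tying nc to the thread count.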

Commits on Nov 5, 2015

  1. fixed inline assembler nonsense, swapped 'get_aligned_buffer()' for memory_create() etc., fixed underflows when calculating num_of_blocks.. and num_residue_slivers..

    Fritzkefit committed Nov 5, 2015
    7915cc3
  2. removed DEBUG comments and moved memory_create() of buffer_C to the other memory_create()s (buffer_A/B)

    Fritzkefit committed Nov 5, 2015
    60009d8
  3. forgot commented free()s after macro-kernel

    Fritzkefit committed Nov 5, 2015
    aa6fd76
  4. deleted test files, align_buffer.hpp (not needed anymore) and its include in matrix_operations.hpp

    Fritzkefit committed Nov 5, 2015
    656e842
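
The commit does not show where exactly the underflows occurred, but a typical pattern with unsigned sizes is sketched below: computing a block count as (n - 1) / block + 1 wraps around for n == 0, while a quotient-plus-remainder form stays in range. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Underflow-prone form: for n == 0, n - 1 wraps to the largest size_t value.
inline std::size_t num_blocks_wrapping(std::size_t n, std::size_t block)
{
  assert(block > 0);
  return (n - 1) / block + 1;
}

// Safe form: never subtracts, so it cannot wrap.
inline std::size_t num_blocks(std::size_t n, std::size_t block)
{
  assert(block > 0);
  return n / block + (n % block != 0 ? 1 : 0);
}

// Size of the trailing partial block (0 when n is a multiple of block).
inline std::size_t residue(std::size_t n, std::size_t block)
{
  assert(block > 0);
  return n % block;
}
```

Sizes such as 513 with a block size of 256 exercise the residue path (two full blocks plus a residue of 1), which makes them useful test dimensions.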

Commits on Nov 9, 2015

  1. added option to use posix_memalign() instead of aligned_alloc(), defined L1/2/3_AVX/SSE_DENOMs to quickly change what fraction of cache should be filled with the blocks

    Fritzkefit committed Nov 9, 2015
    4d8ee48
  2. few cleanups on comments

    Fritzkefit committed Nov 9, 2015
    36e5d70
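
The two ideas in the first commit above can be sketched as follows, assuming a POSIX system. `alloc_aligned` wraps posix_memalign() with a 32-byte default alignment (the width of an AVX register), and `elements_for_cache` shows how a denominator turns a cache size into a block budget. Both names and the default values are hypothetical; the actual L1/2/3_AVX/SSE_DENOM values are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Allocate `bytes` bytes aligned to `alignment`, which must be a power of
// two and a multiple of sizeof(void*). Returns nullptr on failure.
// The result is released with free().
inline void* alloc_aligned(std::size_t bytes, std::size_t alignment = 32)
{
  void* p = nullptr;
  if (posix_memalign(&p, alignment, bytes) != 0)
    return nullptr;
  return p;
}

// Number of elements of type T that fit into 1/denom of a cache.
template <typename T>
std::size_t elements_for_cache(std::size_t cache_bytes, std::size_t denom)
{
  assert(denom > 0);
  return cache_bytes / denom / sizeof(T);
}
```

For instance, filling half of a 32 KiB L1 cache with doubles gives a budget of 32768 / 2 / 8 = 2048 elements. Unlike aligned_alloc(), posix_memalign() does not require the size to be a multiple of the alignment, which is one reason to offer it as an option.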

Commits on Dec 2, 2015

  1. CMake tests now use AVX/SSE

    Fritzkefit committed Dec 2, 2015
    8ac081c
  2. add min matrix-size for omp loops

    Fritzkefit committed Dec 2, 2015
    949106a
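
One way to express a minimum problem size for an OpenMP loop is the `if` clause, which makes the region run serially when the condition is false and so avoids thread start-up overhead on small inputs. The sketch below uses a simple vector scaling as a stand-in for the matrix loops; the function name `scale` and the threshold of 4096 are assumptions, not values from the PR.

```cpp
#include <cstddef>
#include <vector>

// Multiply every entry of x by a. The loop is only parallelised when the
// vector has at least min_parallel_size entries (hypothetical threshold).
void scale(std::vector<double>& x, double a,
           std::size_t min_parallel_size = 4096)
{
  const long n = static_cast<long>(x.size());
  #pragma omp parallel for if (x.size() >= min_parallel_size)
  for (long i = 0; i < n; ++i)
    x[i] *= a;
}
```

A suitable threshold depends on the machine and the work per iteration, so it is best determined by benchmarking.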