[AE] fix remapping - downmixing algorithm #1390

@Voyager1
Team Kodi member

The basic algorithm failed whenever a given output channel mixes more than 4 sources. I discovered this when playing a 6.1 DTS-ES track on stereo output: the center channel was missing, and it turned out to be the 5th channel in the mix!

So when doing 7 ch -> 2 ch, each output channel takes more than 4 sources (e.g. FR = FR + SR + BC + LFE + FC).

The code optimizes in "blocks of 4" by unrolling the loop, but the index was incremented twice (inside the unrolled loop body and again in the for statement), so it jumped from 3 to 8 instead of from 3 to 4. Channel index 4 was the front center.
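
To make the index jump concrete, here is a minimal sketch that only records which source indices each loop shape would read. The helper names `VisitedOld` and `VisitedFixed` are hypothetical, and `blocks` is assumed to be `srcCount` rounded down to a multiple of 4 (its definition is not shown in this thread).

```cpp
#include <vector>

// Illustrative helper (not from the PR): source indices read by the old loop.
// Assumption: blocks is srcCount rounded down to a multiple of 4.
std::vector<int> VisitedOld(int srcCount)
{
  std::vector<int> visited;
  const int blocks = srcCount & ~0x3;
  int i = 0;
  for (; i < blocks; i += 4)     // the for statement adds 4 ...
  {
    visited.push_back(i), i++;   // ... on top of four increments in the body
    visited.push_back(i), i++;
    visited.push_back(i), i++;
    visited.push_back(i), i++;
  }
  switch (srcCount & 0x3)        // remaining 1-3 sources
  {
    case 3: visited.push_back(i), i++; /* fall through */
    case 2: visited.push_back(i), i++; /* fall through */
    case 1: visited.push_back(i);
  }
  return visited;
}

// Same shape with the index advanced only by the for statement.
std::vector<int> VisitedFixed(int srcCount)
{
  std::vector<int> visited;
  const int blocks = srcCount & ~0x3;
  int i = 0;
  for (; i < blocks; i += 4)
  {
    visited.push_back(i);
    visited.push_back(i + 1);
    visited.push_back(i + 2);
    visited.push_back(i + 3);
  }
  switch (srcCount & 0x3)
  {
    case 3: visited.push_back(i + 2); /* fall through */
    case 2: visited.push_back(i + 1); /* fall through */
    case 1: visited.push_back(i);
  }
  return visited;
}
```

With 5 sources, the old shape visits indices 0, 1, 2, 3 and then 8, so index 4 is never mixed and the last read falls outside the 5 valid entries. The fixed shape visits 0 through 4.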

Tested the change, and voilà, the 6.1 downmix sounds correct now.

Update: in addition, the loop has been restructured for better parallelism.

@Voyager1
Team Kodi member

Note: I updated the commit; the original contained a slight mistake.

@terual

Shouldn't it look like this:

    for (; i < blocks; i += 4)
    {
      *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
      *outOffset += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      *outOffset += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      *outOffset += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level;
    }

@Voyager1
Team Kodi member

That would be OK too. But even better (from a scalar CPU optimization point of view) would be to accumulate the value*level products into four separate floats f1, f2, f3, f4. After the for loop you add them up, like *outOffset += (f1+f2+f3+f4);

@Voyager1
Team Kodi member

Example of how the loop could be better unrolled (more parallel execution because less interdependency):

    /* the compiler has a better chance of optimizing this if it is done in parallel */
    int i = 0;
    float f1 = 0.0, f2 = 0.0, f3 = 0.0, f4 = 0.0;
    for (; i < blocks; i += 4)
    {
      f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
      f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      f4 += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level;
    }

    /* unrolled loop for higher performance */
    switch (info->srcCount & 0x3)
    {
      case 3: f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      case 2: f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      case 1: f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
    }

    *outOffset += (f1+f2+f3+f4);

Voyager-xbmc: fix remapping - downmixing algorithm. Didn't work for 7 ch -> 2 ch, each channel taking more than 4 sources (e.g. FR = FR+ SR+ BC+ LFE+ FC) (582c88a)

@Voyager1
Team Kodi member

Updated commit: better potential for parallel execution, since dependencies are eliminated:

  • the index is no longer changed inside the loop body
  • the sums are accumulated separately and added together after the loop
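
The two points above can be sketched as a self-contained function. This is only an illustration under stated assumptions: `SourceTap` and `MixSample` are hypothetical names standing in for one entry of `info->srcIndex` and for the inner part of the remap loop, and `blocks` is again assumed to be the source count rounded down to a multiple of 4.

```cpp
#include <vector>

// Hypothetical stand-in for one entry of info->srcIndex:
// which input channel to read and the gain to apply to it.
struct SourceTap
{
  int   index;
  float level;
};

// Mixes one output sample from the input frame `in`, using four independent
// accumulators so no addition depends on the one written on the line before it.
float MixSample(const float* in, const std::vector<SourceTap>& taps)
{
  const int srcCount = static_cast<int>(taps.size());
  const int blocks   = srcCount & ~0x3; // assumption: multiple of 4 <= srcCount

  float f1 = 0.0f, f2 = 0.0f, f3 = 0.0f, f4 = 0.0f;
  int i = 0;
  for (; i < blocks; i += 4)            // i changes only here
  {
    f1 += in[taps[i].index]     * taps[i].level;
    f2 += in[taps[i + 1].index] * taps[i + 1].level;
    f3 += in[taps[i + 2].index] * taps[i + 2].level;
    f4 += in[taps[i + 3].index] * taps[i + 3].level;
  }

  switch (srcCount & 0x3)               // remaining 1-3 sources
  {
    case 3: f3 += in[taps[i + 2].index] * taps[i + 2].level; /* fall through */
    case 2: f2 += in[taps[i + 1].index] * taps[i + 1].level; /* fall through */
    case 1: f1 += in[taps[i].index]     * taps[i].level;
  }

  return f1 + f2 + f3 + f4;             // partial sums combined once at the end
}
```

Because the partial sums are added in a different order than a single running total, the result can differ from the serial sum in the last bits for general floating-point input.
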
@Voyager1
Team Kodi member

@DDDamian Can you take a look?

@DDDamian DDDamian merged commit feda1e7 into xbmc:master
@gnif
Team Kodi member

wow, that was a pretty obvious bug, thanks for catching it :)

@Voyager1 Voyager1 deleted the Voyager1:ae-fix-remapping branch
Commits on Sep 8, 2012
  1. fix remapping - downmixing algorithm. Didn't work for 7 ch -> 2 ch, each channel taking more than 4 sources (e.g. FR = FR+ SR+ BC+ LFE+ FC)
     Voyager-xbmc committed
Showing with 10 additions and 7 deletions.
  1. +10 −7 xbmc/cores/AudioEngine/Utils/AERemap.cpp
17 xbmc/cores/AudioEngine/Utils/AERemap.cpp
@@ -344,21 +344,24 @@ void CAERemap::Remap(float * const in, float * const out, const unsigned int fra
/* the compiler has a better chance of optimizing this if it is done in parallel */
int i = 0;
+ float f1 = 0.0, f2 = 0.0, f3 = 0.0, f4 = 0.0;
for (; i < blocks; i += 4)
{
- *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level, i++;
- *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level, i++;
- *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level, i++;
- *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level, i++;
+ f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
+ f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
+ f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
+ f4 += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level;
}
/* unrolled loop for higher performance */
switch (info->srcCount & 0x3)
{
- case 3: *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level, i++;
- case 2: *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level, i++;
- case 1: *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
+ case 3: f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
+ case 2: f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
+ case 1: f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
}
+
+ *outOffset += (f1+f2+f3+f4);
}
}
}