[AE] fix remapping - downmixing algorithm #1390

Merged
merged 1 commit into from Sep 8, 2012

Conversation

Projects
None yet
4 participants
@Voyager1
Member

Voyager1 commented Sep 7, 2012

The basic algorithm failed when a given output channel is mixing more than 4 sources. I discovered this when playing a 6.1 DTS-ES on stereo output. I didn't have the center channel, which turned out to be the 5th channel in the mix!!

So when doing 7 ch -> 2 ch, each channel is taking more than 4 sources (e.g. FR = FR+ SR+ BC+ LFE+ FC)

The code seems to optimize in "blocks of 4", by performing loop unrolling, but there was double index increase (inside the unrolled loop plus in the for statement), thereby jumping from 3 to 8 instead of 3 to 4. Channel index 4 was the front center.

Tested the change, and voilà the 6.1 sounds downmix correctly now.

update: next to this, the loop has been improved for better parallelism.

@Voyager1

This comment has been minimized.

Show comment
Hide comment
@Voyager1

Voyager1 Sep 8, 2012

Member

note: updated commit, originally made a slight mistake.

Member

Voyager1 commented Sep 8, 2012

note: updated commit, originally made a slight mistake.

@terual

This comment has been minimized.

Show comment
Hide comment
@terual

terual Sep 8, 2012

Shouldn't it look like:
for (; i < blocks; i += 4) { *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level; *outOffset += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level; *outOffset += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level; *outOffset += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level; }

terual commented Sep 8, 2012

Shouldn't it look like:
for (; i < blocks; i += 4) { *outOffset += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level; *outOffset += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level; *outOffset += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level; *outOffset += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level; }

@Voyager1

This comment has been minimized.

Show comment
Hide comment
@Voyager1

Voyager1 Sep 8, 2012

Member

that would be ok too. But even better (from an scalar cpu optimization point of view) would be to have four different float numbers f1, f2, f3, f4 in front of the value*level calculation. After the for loop you add them up like *outOffset += (f1+f2+f3+f4);

Member

Voyager1 commented Sep 8, 2012

that would be ok too. But even better (from an scalar cpu optimization point of view) would be to have four different float numbers f1, f2, f3, f4 in front of the value*level calculation. After the for loop you add them up like *outOffset += (f1+f2+f3+f4);

@Voyager1

This comment has been minimized.

Show comment
Hide comment
@Voyager1

Voyager1 Sep 8, 2012

Member

Example of how the loop could be better unrolled (more parallel execution because less interdependency):

    /* the compiler has a better chance of optimizing this if it is done in parallel */
    int i = 0;
    float f1 = 0.0, f2 = 0.0, f3 = 0.0, f4 = 0.0;
    for (; i < blocks; i += 4)
    {
      f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
      f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      f4 += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level;
    }

    /* unrolled loop for higher performance */
    switch (info->srcCount & 0x3)
    {
      case 3: f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      case 2: f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      case 1: f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
    }

    *outOffset += (f1+f2+f3+f4);
Member

Voyager1 commented Sep 8, 2012

Example of how the loop could be better unrolled (more parallel execution because less interdependency):

    /* the compiler has a better chance of optimizing this if it is done in parallel */
    int i = 0;
    float f1 = 0.0, f2 = 0.0, f3 = 0.0, f4 = 0.0;
    for (; i < blocks; i += 4)
    {
      f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
      f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      f4 += inOffset[info->srcIndex[i+3].index] * info->srcIndex[i+3].level;
    }

    /* unrolled loop for higher performance */
    switch (info->srcCount & 0x3)
    {
      case 3: f3 += inOffset[info->srcIndex[i+2].index] * info->srcIndex[i+2].level;
      case 2: f2 += inOffset[info->srcIndex[i+1].index] * info->srcIndex[i+1].level;
      case 1: f1 += inOffset[info->srcIndex[i].index] * info->srcIndex[i].level;
    }

    *outOffset += (f1+f2+f3+f4);
Voyager-xbmc
fix remapping - downmixing algorithm. Didn't work for 7 ch -> 2 ch, e…
…ach channel taking more than 4 sources (e.g. FR = FR+ SR+ BC+ LFE+ FC)
@Voyager1

This comment has been minimized.

Show comment
Hide comment
@Voyager1

Voyager1 Sep 8, 2012

Member

updated commit: better potential for parallel execution optimization since dependencies are eliminated:

  • index is not changed inside loop
  • sums are calculated separately and added after the loop
Member

Voyager1 commented Sep 8, 2012

updated commit: better potential for parallel execution optimization since dependencies are eliminated:

  • index is not changed inside loop
  • sums are calculated separately and added after the loop
@Voyager1

This comment has been minimized.

Show comment
Hide comment
@Voyager1

Voyager1 Sep 8, 2012

Member

@DDDamian Can you take a look?

Member

Voyager1 commented Sep 8, 2012

@DDDamian Can you take a look?

DDDamian added a commit that referenced this pull request Sep 8, 2012

Merge pull request #1390 from Voyager-xbmc/ae-fix-remapping
[AE] fix remapping - downmixing algorithm - good catch Voyager

@DDDamian DDDamian merged commit feda1e7 into xbmc:master Sep 8, 2012

@gnif

This comment has been minimized.

Show comment
Hide comment
@gnif

gnif Sep 27, 2012

Member

wow, that was a pretty obvious bug, thanks for catching it :)

Member

gnif commented on 582c88a Sep 27, 2012

wow, that was a pretty obvious bug, thanks for catching it :)

@Voyager1 Voyager1 deleted the Voyager1:ae-fix-remapping branch May 16, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment