Skip to content

[BUG] Baytrail/CherryTrail: samples are not played in correct order #1215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
plbossart opened this issue Apr 2, 2019 · 35 comments
Closed

[BUG] Baytrail/CherryTrail: samples are not played in correct order #1215

plbossart opened this issue Apr 2, 2019 · 35 comments
Assignees
Labels
bug Something isn't working as expected BYT Applies to Baytrail platform
Milestone

Comments

@plbossart
Copy link
Member

Describe the bug

When playing samples that are incremented by one from a sample to the other, the output shows discontinuities of 96 samples, see
baytrail.xlsx
. With regular sine waves this has the net effect of adding noise.

This clearly points to bad buffer/pointer management or scheduling issues.

To Reproduce

Configure Minnowboard, use kernel topic/sof-dev and firmware v1.3-rc
connect LogicPro on SSP pins (bclk, fs, dac)
unzip and play file
cnt_1024_left.pcm.gz
aplay -Dhw:1,0 -c2 -r48000 -f dat cnt_1024_left.pcm
grab results with LogicPro and check the decoded values (should be similar to excel sheet provided above)

Expected behavior

samples are expected to be played in order, with wrap-around at 1024

Impact

showstopper

Environment

  1. Branch name and commit hash of 3 repositories: sof (firmware), linux (kernel driver) and soft (tools & topology).

kernel f3adfd6
firmware eac4a73
2) Name of the topology file

sof-byt-nocodec.tplg (or any byt topology, doesn't matter)

  1. Name of the platform(s) on which the bug is observed.

All Baytrail and cherrytrail platforms. The measurements are only seen when the connector is exposed, e.g. on Minnowboard.

  1. Reproducibility Rate. 100%
@plbossart plbossart added the bug Something isn't working as expected label Apr 2, 2019
@lgirdwood lgirdwood added this to the 1.3 milestone Apr 3, 2019
@lgirdwood
Copy link
Member

lgirdwood commented Apr 3, 2019

@mengdonglin @RanderWang I'm sure this is dw-dma DMA buffer R/W pointer related. The DMA on BYT does not support HW LLI mode and LLI must be programmed via SW. There have also been some recent commits to the DMA driver that should be bisected.

The issue here is that the DMA HW read period (data that is be sent to SSP) is being written by the FW pipeline at the same time. i.e. we are reading and writing the same period at the same time. Some trace should show this. This is obviously working in HW LLI mode for non BYT platforms.

@tlauda
Copy link
Contributor

tlauda commented Apr 3, 2019

@ranj063 @xiulipan @lgirdwood What has changed since issue #1107 has been fixed? Are you saying there is another regression? The only problem with noise I was aware of was problem with SRC #1161.

@juimonen
Copy link

juimonen commented Apr 3, 2019

@tlauda @lgirdwood I did some bisecting and for me the problems start way back:
so this is minnowboard + rt5651 (and rt5651 topology, sof master cbd3e07)

  1. 22e3cf9 works for me I can play audio and its clean.
  2. 9a6a38a starts problems, this commit doesn't work at all
  3. 407be96 onward until HEAD I can play but the sound is metallic.

@lgirdwood
Copy link
Member

@tlauda not sure but CI was reporting false success.
@juimonen I've ran into this before hence I know the root cause. Can you check the DMA and it's source component R/W pointers as they are both using the same period for IO as explained above.

@mengdonglin mengdonglin added the BYT Applies to Baytrail platform label Apr 4, 2019
@xiulipan
Copy link
Contributor

xiulipan commented Apr 4, 2019

@tlauda @lgirdwood @juimonen
Confirmed with the today's daily build binary. With pause/resume the noise can be randomly recover.
It should be like what @lgirdwood have said

I'm sure this is dw-dma DMA buffer R/W pointer related. The DMA on BYT does not support HW LLI mode and LLI must be programmed via SW. There have also been some recent commits to the DMA driver that should be bisected.

The issue here is that the DMA HW read period (data that is be sent to SSP) is being written by the FW pipeline at the same time. i.e. we are reading and writing the same period at the same time. Some trace should show this. This is obviously working in HW LLI mode for non BYT platforms.

And we have fixed similar issue once. I think this maybe caused by the recent change to the pipeline.

But as 9a6a38a is something related to the mixer component. I will try to use some pass through pipeline to check if it is related to the mixer component first.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@xiulipan 9a6a38a is the first in series of pipeline changes for variable frame size and I don't think the commits work individually. So I had to go up until 407be96 to get the pipeline even working.

@ranj063
Copy link
Collaborator

ranj063 commented Apr 4, 2019

@juimonen to rule out the mixer as a suspect, can you try a simple host->vol->dai pipeline on BYT?

@xiulipan
Copy link
Contributor

xiulipan commented Apr 4, 2019

@juimonen thanks for the hint. It is a set of patches.
@ranj063
I will do that too.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@xiulipan @tomek @ranj063 @lgirdwood I got simplified topology (without mixer) from Ranjani and it plays cleanly with master. The basic rt5651 topology with mixer has metallic sound with master. I have some unreliability in my setup so sometimes I get just low level rumble. Anyway it is quite clear that there's some issues with mixer component (or the whole pipeline including it).

@tlauda
Copy link
Contributor

tlauda commented Apr 4, 2019

@juimonen What is the stream format vs. dai format? Can you get me verbose FW logs? It looks like some kind of mismatch in frame_fmt between mixer and volume components.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@tlauda can someone please tell me how to enable TRACEV?
I'm running ./scripts/docker-run.sh ./scripts/xtensa-build-all.sh -l byt, so how to enable it?

@tlauda
Copy link
Contributor

tlauda commented Apr 4, 2019

@juimonen make menuconfig and select verbose traces.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@tlauda doesn't work with docker so the "CI way"... that's the only setup I have atm. Where is the menuconfig inventing this? can I add it by hand to some file. I cna't add it to .config because they get always regenerated.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@tlauda ok got it working by editing src/lib/Kconfig

here's the log from mixer pipeline with metallic sound:

log_mixer.txt

here's the log with simple pipeline with clean sound:

log_simple.txt

@xiulipan
Copy link
Contributor

xiulipan commented Apr 4, 2019

@juimonen You can do it with docker. You can just run what you would like with docker. Docker used map for the device.
@tlauda @ranj063
I also tried with sof-cht-rt5682.tplg that is a simple host->vol->dai pipeline.
But I found if I pause/resume the playback, the noise will random come.

With the sof-byt-rt5651.tplg that is host->vol->mixer->volume->dai, the noise will come at beginning. But with some pause/resume it will recover.

@lgirdwood
Copy link
Member

lgirdwood commented Apr 4, 2019

@xiulipan @juimonen to restate, the issue is there is a race starting or synchronizing the DW-DMA HW wrt the rest of the pipeline.

Consider the SSP DMA buffer B with two periods p0, p1. On failure, the pipeline is writing to p0 whilst the DMA is simultaneously reading from p0. Normal operation is that pipeline writes to p0 whilst DMA reads p1.

The ordering PCM data from @plbossart is a big hint here as the DMA engine will prefetch p0 or p1 (and copy to the internal DMA FIFO) in multiples of DMA burst size. This will show a mixture of old and new data aligned on burst size multiples.

It is also worth checking the host DMA side too given the scheduling updates.

@juimonen can you resend logs of good and bad using the same simple pipeline topology. Just an excerpt is needed showing copying of about 20 periods.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@lgirdwood here's only bad log. I just can't get the clean thing out of this. Don't know if it is the excessive logging or something, but I booted the device 10 times, always bad.
log-simple-bad.txt

@tlauda
Copy link
Contributor

tlauda commented Apr 4, 2019

@juimonen So now also the simple topology doesn't work without noise?

@juimonen
Copy link

juimonen commented Apr 4, 2019

@tlauda yeah, I can sometimes play simple topology like couple of seconds, but then begins the noise (its like wind noise on microphone) and the music is gone. Don't know if the dma pointers get out of sync etc. I have to go way back to 22e3cf9 to get reliable playback

@juimonen
Copy link

juimonen commented Apr 4, 2019

@tlauda there is something really strange with the system. I'm not sure how I get my device in to this low rumble mode, but for e.g. now I can play simple topology clean with master. I'm sorry about this confusion but I'm also myself confused, is it that our CI scripts are not making clean builds when I jump around in git history or is my device borked... really unreliable results

@ranj063
Copy link
Collaborator

ranj063 commented Apr 4, 2019

@tlauda @juimonen I can confirm that with the commit cbbc6af the audio is clean on my minnowboard with a no mixer topology. but I did see an underrun once during 2 minutes of playback but audio recovered and continued to play without any issues.

With a mixer based topology, the sound is metallic.

@juimonen
Copy link

juimonen commented Apr 4, 2019

@ranj063 if you try what @xiulipan said today in bug scrub: try aplay -i and use space to pause/resume, you can get the mixer topology also to play clean sometimes. I think this is something what @lgirdwood described so host or dai dma is getting confused. Mixer topology is just bigger so it gets to "metallic" more often.

@lgirdwood
Copy link
Member

lgirdwood commented Apr 4, 2019

@juimonen thanks for the log, but it needs verbose trace enabled to see the R/W pointer or they may need manually added to trace. It would be good to see the timing and R/W pointer at trigger start and then when the pipeline is next scheduled (as this could be scheduled too soon).

 0      2          DMA         109184540.364583         2.656250 src/drivers/dw/dma.c:491 	dw-dma: 1 channel 0 start
    0      2          SSP         109184546.354167         5.989583 intel/baytrail/ssp.c:547 	ssp_trigger()
    0      2          SSP         109184548.489583         2.135417 intel/baytrail/ssp.c:496 	ssp_start()
    0      2         PIPE         109184551.302083         2.812500 src/audio/pipeline.c:864 	pipeline_task() sched
    0      2     SCHEDULE         109184553.020833         1.718750 c/lib/edf_schedule.c:369 	schedule_edf_task_complete()
    0      2         WAIT         109184555.625000         2.604167 c/include/sof/wait.h:53  	WFE
    0      2         WAIT         109185377.656250       822.031250 c/include/sof/wait.h:59  	WFX
    0      2     SCHEDULE         109185379.322917         1.666667 c/lib/edf_schedule.c:427 	schedule_edf()
    0      2         WAIT         109185381.458333         2.135417 c/include/sof/wait.h:53  	WFE
    0      2          DMA         109185488.072917       106.614586 src/drivers/dw/dma.c:1445	dw-dma: 1 IRQ status 0x3
    0      2          DAI 1.5     109185491.822917         3.750000      src/audio/dai.c:171 	dai_dma_cb()
    0      2     SCHEDULE         109185493.750000         1.927083 c/lib/edf_schedule.c:278 	_schedule_edf_task()
    0      2     SCHEDULE         109185497.343750         3.593750 c/lib/edf_schedule.c:427 	schedule_edf()
    0      2     SCHEDULE         109185502.291667         4.947917 c/lib/edf_schedule.c:412 	edf_scheduler_run()
    0      2     SCHEDULE         109185503.750000         1.458333 c/lib/edf_schedule.c:194 	sch_edf()
    0      2     SCHEDULE         109185507.343750         3.593750 c/lib/edf_schedule.c:403 	schedule_edf_task_running()
    0      2         PIPE 1.6     109185509.062500         1.718750 src/audio/pipeline.c:843 	pipeline_task()
    0      2          DAI 1.5     109185510.677083         1.614583      src/audio/dai.c:629 	dai_copy()
    0      2          DMA         109185512.187500         1.510417 src/drivers/dw/dma.c:1631	dw-dma: 1 channel 0 get_data_size
    0      2          DAI 1.5     109185514.270833         2.083333      src/audio/dai.c:655 	dai_copy(), copy_bytes = 0x208
    0      2          DMA         109185515.729167         1.458333 src/drivers/dw/dma.c:1297	dw-dma: 1 channel 0 copy
    0      2          DAI 1.5     109185517.291667         1.562500      src/audio/dai.c:171 	dai_dma_cb()
    0      2          DAI 1.5     109185518.906250         1.614583      src/audio/dai.c:101 	dai_buffer_process()

Something stands out that needs checking.

dai_copy(), copy_bytes = 0x208

The period size should be 1ms (SSP DMA interrupt is ~1ms) but we appear to write 0x208 bytes when I would expect 0xc0 bytes.
Also please add R/W pointer dumps to the DMA driver, DAI driver and source component. These all need to align all the time.

@juimonen
Copy link

juimonen commented Apr 8, 2019

I've been trying to log the dai write vs. dma read pointers and at least to me it seems that dma read pointer is always behind the dai writer (so not at least not overlapping with the current write period). I'm checking this in dw_dma_copy where dai has written a period and is updating pointers for dma (sar register value is not inside current_ptr + bytes).

I've also tried to fiddle around cache_writeback because the error sounds like buffer corruption, but without luck.

If someone has time I could really use extra eye balls or good advice...

@tlauda
Copy link
Contributor

tlauda commented Apr 10, 2019

Fixed by #1247.

@mengdonglin
Copy link
Collaborator

@Jiangxinx Please help to check if this issue is fixed by #1247 on Baytrail.

@lgirdwood
Copy link
Member

@mengdonglin @tlauda according to @stevyan this is still an issue for playback but not capture on BYT.

@tlauda
Copy link
Contributor

tlauda commented Apr 17, 2019

@lgirdwood Does @stevyan use the right topology? Note that only Dhw:0,0 works for now, Dhw:0,1 is media pipe and still needs topology changes.

@lgirdwood
Copy link
Member

@tlauda not sure, but I would be happy to do an v3.1-rc2 if Dhw:0,0 is working. @stevyan can you describe analog playback/capture test you are using.

@wenqingfu
Copy link

@juimonen how about testing on your side? Is everything ok with @tlauda 's patch?

@juimonen
Copy link

@wenqingfu yes for me the hw0:0 device is working with @tlauda topology.

@juimonen
Copy link

@tlauda have not been following fw PR's closely, but do you have the PR also for the pause/resume fix? I think that would be very good for rc2...

@tlauda
Copy link
Contributor

tlauda commented Apr 17, 2019

@juimonen It's by default fixed by #1273.

@juimonen
Copy link

@tlauda, cool and thanks!

@juimonen
Copy link

works for me with byt-mb + rt5651!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected BYT Applies to Baytrail platform
Projects
None yet
Development

No branches or pull requests

9 participants