[BUG] pcm_Read error when capturing on TGLU_RVP_NOCODEC #4163

XiaoyunWu6666 · 2021-05-10T09:33:35Z

Describe the bug
Found on May 09 daily test http://sof-ci.sh.intel.com/#/result/planresultdetail/3845

In the daily test, a read error also occurred in check-capture-all-formats on TGLU_RVP_NOCODEC, but it cannot be reproduced manually.

To Reproduce
TPLG=sof-tgl-nocodec-ci.tplg ~/sof-test/test-case/check-capture.sh -d 1 -l 1 -r 50
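
A minimal sketch for reproducing by hand outside the test harness, assuming the same arecord options that appear in the log below. `repeat_capture` is a hypothetical helper, not part of sof-test. It reruns a command and reports the first round that fails, which matters here because the CI failure only showed up at round 19 of 50.

```shell
# repeat_capture ROUNDS CMD...  (hypothetical helper, not part of sof-test)
# Runs CMD up to ROUNDS times and stops at the first non-zero exit status.
repeat_capture() {
    rounds=$1
    shift
    i=1
    while [ "$i" -le "$rounds" ]; do
        if ! "$@"; then
            echo "FAIL at round $i/$rounds"
            return 1
        fi
        i=$((i + 1))
    done
    echo "PASS: $rounds rounds"
}

# Example invocation, with options copied from the log in this report:
#   repeat_capture 50 arecord -Dhw:0,0 -r 48000 -c 2 -f S32_LE -d 1 /dev/null -q
```

The device name `hw:0,0` is taken from the CI log and may differ on other setups.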

Reproduction Rate
100% on May 10, but another issue (see #4164) may mask it, so you may not see the log shown below when trying to reproduce it.

Environment
Kernel Branch: topic/sof-dev
Kernel Commit: 9101539a
SOF Branch: main
SOF Commit: 47d223c
TPLG=sof-tgl-nocodec-ci.tplg
Device: TGLU_RVP_NOCODEC [jf-tglu-rvp-nocodec-1]
Report ID: 3845

Log

2021-05-09 22:01:45 UTC [REMOTE_INFO] ===== Testing: (Round: 19/50) (PCM: Port0 [hw:0,0]) (Loop: 1/1) =====
2021-05-09 22:01:45 UTC [REMOTE_INFO] no file prefix, use /dev/null as dummy capture output
2021-05-09 22:01:45 UTC [REMOTE_COMMAND] arecord   -Dhw:0,0 -r 48000 -c 2 -f S32_LE -d 1 /dev/null -v -q
Hardware PCM card 0 'sof-nocodec' device 0 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S32_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 32
  buffer_size  : 8192
  period_size  : 2048
  period_time  : 42666
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 2048
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 8192
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 0
  hw_ptr       : 0
**arecord: pcm_read:2155: read error: Input/output error
2021-05-09 22:01:45 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2021-05-09 22:01:45 UTC [REMOTE_ERROR]  arecord_opts()  @  /home/ubuntu/sof-test/test-case/../case-lib/lib.sh
2021-05-09 22:01:45 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-capture.sh:98**
2021-05-09 22:01:45 UTC [REMOTE_INFO] Starting /usr/bin/sof-logger  -l /etc/sof/sof-tgl.ldc -o /home/ubuntu/sof-test/logs/check-capture/2021-05-09-22:00:26-23746/etrace.txt
2021-05-09 22:01:46 UTC [REMOTE_INFO] pkill -TERM sof-logger
Terminated
2021-05-09 22:01:48 UTC [REMOTE_INFO] nlines=4 /home/ubuntu/sof-test/logs/check-capture/2021-05-09-22:00:26-23746/etrace.txt
2021-05-09 22:01:48 UTC [REMOTE_INFO] Test Result: FAIL!

Sometimes it fails with this error instead:
arecord: set_params:1407: Unable to install hw params:
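
Because the failure shows up with two different messages, a small filter may help when triaging logs. This is only a sketch: `classify_arecord_error` is a hypothetical helper that matches just the two arecord messages quoted in this report and labels anything else as unknown.

```shell
# classify_arecord_error  (hypothetical helper)
# Reads arecord output on stdin and prints which failure signature it contains.
classify_arecord_error() {
    log=$(cat)
    case $log in
        *"pcm_read:"*"read error"*)
            echo "read-error" ;;
        *"set_params:"*"Unable to install hw params"*)
            echo "hw-params-error" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, piping the stderr of a failing arecord run into `classify_arecord_error` prints `read-error` for the log above.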

soflogger.txt

marc-hb · 2021-05-10T21:31:10Z

> Reproduction Rate 100%

This is very recent, can someone bisect and find which commit introduced this?

keyonjie · 2021-05-11T02:29:02Z

Thanks for reporting @XiaoyunWu6666 .
@marc-hb This looks like a new issue observed by our CI about multi-core, that's why we need the topology change merged as early as possible.

marc-hb · 2021-05-11T02:35:49Z

> Reproduction Rate 100%

Actually, that can't be right because:

> ===== Testing: (Round: 19/50)

> that's why we need the topology change merged as early as possible.

Pretty sure you mean this one : #4153

Based on all the above I'm transferring this from thesofproject/linux to thesofproject/sof

slawblauciak · 2021-05-11T07:37:18Z

Can you please check if #4089 helps here?

plbossart · 2021-05-11T14:51:22Z

> Thanks for reporting @XiaoyunWu6666 .
> @marc-hb This looks like a new issue observed by our CI about multi-core, that's why we need the topology change merged as early as possible.

What topology change are you referring to @keyonjie ? I don't see any and if there is again an issue we probably need to revert the multicore changes again. It seems like the topology change in #4153 was merged too quickly and broke CI twice. We need to be more careful here, CI is not a debug tool when introducing features.

keyonjie · 2021-05-12T00:42:56Z

@plbossart what do you mean by breaking CI? Basic support for running a pipeline on a slave core is already claimed to be supported, so we need to ask for it to be validated.
IMHO, having bugs reported whenever validation of a new feature set is introduced is quite common and expected. We should go and fix them, just as we have been doing for SDW.

plbossart · 2021-05-12T01:16:27Z

@keyonjie I take issue with you merging your own PR before FIRST asking for validation. I routinely ask to run more thorough daily tests before we merge. This is what we do also for changes of the kernel to a new -rc1.

Merging and then doing validation is not right, sorry.

keyonjie · 2021-05-12T01:40:18Z

@plbossart we did run validation before merging #4153, and if you check the May 9th daily report (the first daily after the PR merged) http://sof-ci.sh.intel.com/#/result/planresultdetail/3845, only 3 cases failed and 2 of them are stress cases. That is already no worse than TGLU_VOLT_SDW, no?

EDIT:
On the other hand, there are more errors in the May 10th report http://sof-ci.sh.intel.com/#/result/planresultdetail/3868, and it looks like some regressions were observed. That is the benefit we get from the new coverage, no?

keyonjie · 2021-05-12T02:47:35Z

EDIT
It looks like the kernel was built from the wrong commit in the May 11th build 3904: it is now 5.12-rc7, whereas it was 5.12-rc8 on May 9th. @keqiaozhang @marc-hb @fredoh9 @aiChaoSONG can you check? Without @kv2019i's fix, even basic multi-core support is not available.

marc-hb · 2021-05-12T03:09:50Z

Do you mean it was not built with 9101539a as reported or just that 9101539a is not the latest commit?

EDIT: only daily build 3904 tested the wrong commit. It tested a 1-month old commit by accident

marc-hb · 2021-05-12T04:25:05Z

> (the first daily after the PR merged) http://sof-ci.sh.intel.com/#/result/planresultdetail/3845, only 3 cases failed and 2 of them are stress, it is already not bad than TGLU_VOLT_SDW no?

It simply depends whether these failures were known before merge and relatively easy to reproduce. If they were then they should have been fixed before merge so these tests are not "lost" for other development.

> @plbossart what do you mean by breaking CI?

It's a very simple idea: no regression.

keyonjie · 2021-05-12T04:48:02Z

> Do you mean it was not built with 9101539a as reported or just that 9101539a is not the latest commit?
>
> EDIT: only daily build 3904 tested the wrong commit. It tested a 1-month old commit by accident

You are right: only build 3904 (May 11th) was wrong, 3868 was correct, and the issue filed here based on 3845 is valid. I have just corrected the comments above.
My PR #4089 was added to fix this issue. @XiaoyunWu6666 can you please check whether it helps?

XiaoyunWu6666 · 2021-05-12T05:41:56Z

#4089 is effective against this issue. [tested on ubuntu@sh-tglu-rvp-nocodec-02]
@keyonjie

XiaoyunWu6666 · 2021-05-13T03:12:59Z

This also happens on TGLU_RVP_HDA:
http://sof-ci.sh.intel.com/#/result/planresultdetail/3910?model=TGLU_RVP_HDA&testcase=check-capture-all-formats

marc-hb transferred this issue from thesofproject/linux May 11, 2021

XiaoyunWu6666 closed this as completed May 11, 2021

XiaoyunWu6666 reopened this May 11, 2021

XiaoyunWu6666 changed the title ~~[BUG] pcm_Read error occurs on TGLU_RVP_NOCODEC when check-capture-50rounds~~ [BUG] pcm_Read error on TGLU_RVP_NOCODEC when check-capture-50rounds May 11, 2021

mengdonglin added the bug, TGL, and multicore labels May 11, 2021

mengdonglin added this to the v1.8 milestone May 11, 2021

plbossart mentioned this issue May 11, 2021

topology: sof-tgl-nocodec-ci: change to run smart_amp on core 1 #4153

Merged

XiaoyunWu6666 changed the title ~~[BUG] pcm_Read error on TGLU_RVP_NOCODEC when check-capture-50rounds~~ [BUG] pcm_Read error on TGLU_RVP_NOCODEC/HDA when check-capture-50rounds May 13, 2021

XiaoyunWu6666 changed the title ~~[BUG] pcm_Read error on TGLU_RVP_NOCODEC/HDA when check-capture-50rounds~~ [BUG] pcm_Read error when capturing May 13, 2021

XiaoyunWu6666 changed the title ~~[BUG] pcm_Read error when capturing~~ [BUG] pcm_Read error when capturing on TGLU_RVP_NOCODEC May 13, 2021

XiaoyunWu6666 closed this as completed May 17, 2021
