rtc_cmos rtc_cmos: Alarms can be up to one month in the future after waking from suspend #3011
Comments
@XiaoyunWu6666, please set the RTC time from system time. |
@keqiaozhang I already ran 'sudo hwclock --systohc' on that DUT. |
@plbossart can you confirm that we always have CONFIG_RTC_SYSTOHC enabled? That makes hwclock unnecessary. It's ON on my system. Would we have any script messing with /etc/adjtime? I hope not... |
yes, it's included in the build
|
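For anyone who wants to double-check the option on a given DUT, here is a minimal sketch. The helper name `config_enabled` is hypothetical, and the config file location is an assumption (some distributions ship `/proc/config.gz` instead of a file under `/boot`).

```shell
# Hypothetical helper: succeed if OPTION is set to "y" in the given
# kernel config file. It only reads the file; it changes nothing.
config_enabled() {
    config_file=$1
    option=$2
    grep -q "^${option}=y\$" "$config_file"
}

# Example usage (the path is an assumption, adjust to your distribution):
# config_enabled "/boot/config-$(uname -r)" CONFIG_RTC_SYSTOHC \
#     && echo "CONFIG_RTC_SYSTOHC is enabled"
```

This only confirms the build-time setting; it says nothing about whether another tool also writes to the RTC.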
I searched the source code to understand the error message. It happens when an invalid value is passed to https://lore.kernel.org/patchwork/patch/722795/ Then I looked at the logs and the test absolutely did not try to schedule an alarm more than a month ahead. So I don't know what's going on.
|
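To make the "more than a month ahead" condition concrete, here is a rough sketch that compares a pending wake alarm against the current RTC time. The function name `alarm_within_month` is hypothetical, and approximating "one month" as 31 days is an assumption; the driver's exact limit may differ.

```shell
# Hypothetical check: is ALARM (epoch seconds) at most ~one month after NOW?
# ASSUMPTION: "one month" is approximated as 31 days here.
alarm_within_month() {
    now=$1
    alarm=$2
    limit=$((31 * 24 * 3600))
    delta=$((alarm - now))
    if [ "$delta" -ge 0 ] && [ "$delta" -le "$limit" ]; then
        echo "ok"
    else
        echo "out-of-range"
    fi
}

# Example usage on a DUT, reading the standard RTC sysfs attributes
# (wakealarm is empty when no alarm is pending):
# now=$(cat /sys/class/rtc/rtc0/since_epoch)
# alarm=$(cat /sys/class/rtc/rtc0/wakealarm)
# [ -n "$alarm" ] && alarm_within_month "$now" "$alarm"
```

Running this right after resume could show whether an unexpected alarm is pending, though it would not reveal who set it.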
I checked the logs and found something even more puzzling: this error message came at Jun 23 22:52:44, NOT at the time The message can only be printed by I also compared the logs between successful rtcwake and the failing one and found no obvious difference.
|
Interesting @marc-hb, could it be that something other than sof-test sets an alarm on resume? Also wondering, if the issue is this log message only, what would happen if we ignore it? |
While this one looks spurious I'm reluctant to ignore time-related error messages considering how many time synchronization issues and misconfigurations we had in the past, and we still experience various TIMEOUTs, see links above. In fact is this message really "spurious"? Some rogue script invoking rtcwake could explain some TIMEOUTs. |
I don't disagree, but I don't see how to backtrace who requested the alarm. Could be literally any userspace script or application, no? |
Is this still happening? How often, and on which branch? |
It has not happened recently, but we added a command to set the RTC time from system time after DUT deployment, before the DUTs are rebooted and tests start; see the sof-sh PR above. |
inner daily 5261?model=CML_RVP_SDW&testcase=check-suspend-resume-50 It happened again in today's daily, and was the first reproduction after inner daily test 4834 model=CML_RVP_SDW testcase=check-suspend-resume-50. DUT: jf-cml-rvp-sdw-3 |
I checked the log again and again the error appeared after the system woke up on time just fine as expected. A possible next step could be to compare the BIOS versions with |
@fredoh9, can you help to compare the BIOS version on these 2 devices? |
I will compare but |
Yes, but more importantly: BIOS version differences for bug fixes Also: |
@fredoh9, did you find any difference after comparing BIOS? |
I haven't checked/compared the BIOS settings yet. I plan to come to the office and do that this Wednesday. |
I changed my schedule to come to the office today along with Marc. Thank you @marc-hb. I couldn't compare Scanned test results for a week: this hasn't happened for a week. Running suspend-resume 500 times locally to see if I can reproduce this. Will do the same thing on jf-cml-rvp-sdw-1 whenever it is available. Will do |
jf-cml-rvp-sdw-3 ran fine up to 375/500. I had to stop to check other things. The coin battery seems fine, verified as follows: check BIOS date/time -> remove power -> wait for 2-3 min -> power on -> check BIOS date/time comparing package list. |
diff -b -u shows a lot of differences: one system has packages that the other does not, and one system is more up to date than the other. And of course we can't compare configuration file differences or any other custom scripts. |
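When a plain diff of two package lists is too noisy, one option is to look only at packages installed on one DUT and missing on the other. A minimal sketch, assuming each DUT's list was saved to a file with one package name per line; the helper name `only_in_first` and the file names are hypothetical.

```shell
# Hypothetical helper: print entries found in the first list but not in
# the second. Both lists are sorted first because comm requires sorted input.
only_in_first() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Example usage (file names are placeholders):
# on each DUT:  dpkg-query -W -f='${Package}\n' > "$(hostname).pkgs"
# then:         only_in_first dut-a.pkgs dut-b.pkgs
```

This ignores version differences on purpose; comparing versions would need a second pass over the packages common to both lists.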
This hasn't happened for a long time. No change in BIOS settings. Comparing the dpkg lists wasn't easy. But the clock synchronization had some issues. I removed ntpsec-ntpdate, as I suspected a conflict with systemd-timesyncd, and made sure systemd-timesyncd does the job. After that, many of the weird issues were gone. I think we can close this. If the same or a similar problem happens again, we can re-open. |
Happened again in inner daily 5882 |
What a coincidence! After closing this issue, I saw that failure. After scanning the full test logs for about a month, these are the failures. Will do more investigation.
|
@fredoh9 @XiaoyunWu6666 @marc-hb do we still see this issue? if not can we close? |
Just now after @fredoh9 extended the suspend-resume test from 1 cycle per device to 5 cycles per device. https://sof-ci.01.org/sofpr/PR4866/build10718/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-playback-5 |
This happened on sh-apl-up2-hda-07 |
@marc-hb @XiaoyunWu6666 We seem to have moved to an issue that's no longer specific to CML_RVP_SDW, should we update title and tags for tracking purposes? |
Collecting more Please help review. |
@marc-hb @XiaoyunWu6666 I can't recall: does this problem still exist? |
They may not print this message but we still experience rtcwake freezes on a regular basis so I would prefer to keep this open |
Fresh from the oven: https://sof-ci.01.org/sofpr/PR5875/build778/devicetest/?model=WHL_UPEXT_HDA_ZEPHYR&testcase=check-suspend-resume-5 I checked Interestingly, CI did NOT time out and power-cycle the device because it immediately rebooted itself; it was up again less than 30 seconds later. |
That would need to be tracked as a separate bug. This issue was only related to the 'one month into the future'. |
Description
rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend
happened in inner daily test 4834 model=CML_RVP_SDW testcase=check-suspend-resume-50
[console log]
Reproduction
TPLG=sof-cml-rt700-4ch.tplg ~/sof-test/test-case/check-suspend-resume.sh -l 50
Reproduction rate unknown.
Platform
CML_RVP_SDW