Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rtc_cmos rtc_cmos: Alarms can be up to one month in the future after waking from suspend #3011

Closed
XiaoyunWu6666 opened this issue Jun 25, 2021 · 35 comments
Assignees
Labels
bug Something isn't working suspend resume Issues related to suspend resume (e.g. rtcwake)

Comments

@XiaoyunWu6666
Copy link

XiaoyunWu6666 commented Jun 25, 2021

Description
rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend
happened in inner daily test 4834 model=CML_RVP_SDW testcase=check-suspend-resume-50

[console log]

2021-06-23 22:52:49 UTC [ERROR] Caught kernel log error
===========================>>
[ 4739.061314] kernel: rtc_cmos rtc_cmos: Alarms can be up to one month in the future
<<===========================

Reproduction
TPLG=sof-cml-rt700-4ch.tplg ~/sof-test/test-case/check-suspend-resume.sh -l 50

Reproduction rate unknown.
Platform
CML_RVP_SDW

@XiaoyunWu6666 XiaoyunWu6666 added CML Applies to Comet Lake platform NOT on topic/sof-dev issue that does not happen on branch topic/sof-dev labels Jun 25, 2021
@XiaoyunWu6666 XiaoyunWu6666 changed the title [CML]rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend [CML][NOT ON SOF_DEV]rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend Jun 25, 2021
@keqiaozhang
Copy link
Collaborator

@XiaoyunWu6666 ,please set the RTC time form system time.

@XiaoyunWu6666
Copy link
Author

@keqiaozhang already ' sudo hwclock --systohc' on that DUT .
Will track whether it reproduced in CI later . If not ,will close this issue . thanks

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 30, 2021

@keqiaozhang already ' sudo hwclock --systohc' on that DUT .

@plbossart can you confirm that we have CONFIG RTC_SYSTOHC enabled always? That makes hwclock unnecessary. It's ON on my system.

Would we have any script messing with /etc/adjtime? I hope not...

@plbossart
Copy link
Member

@keqiaozhang already ' sudo hwclock --systohc' on that DUT .

@plbossart can you confirm that we have CONFIG RTC_SYSTOHC enabled always? That makes hwclock unnecessary. It's ON on my system.

yes, it's included in the build

sof-dev-defconfig:CONFIG_RTC_SYSTOHC=y
sof-dev-defconfig:CONFIG_RTC_SYSTOHC_DEVICE="rtc0"

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 30, 2021

I searched the source code to understand the error message. It happens that an invalid value is passed to rtcwake. More specifically, Alarms can be up to one month in the future is supposed to be printed when the user tries to schedule an alarm more than one month ahead but the RTC clock does not support months. See the full code and commit message there:

https://lore.kernel.org/patchwork/patch/722795/
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6a6af3d04435

Then I looked at the logs and the test did absolutely not try to schedule an alarm more than a month ahead. So I don't know what's going on.

2021-06-23 22:52:01 UTC [REMOTE_INFO] ===== Round(24/50) =====
2021-06-23 22:52:01 UTC [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Jun 23 22:52:08 2021
2021-06-23 22:52:08 UTC [REMOTE_COMMAND] sleep for 5
2021-06-23 22:52:13 UTC [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1624488716"
2021-06-23 22:52:13 UTC [REMOTE_INFO] Check for the wakeup_count
2021-06-23 22:52:13 UTC [REMOTE_INFO] ===== Round(25/50) =====
2021-06-23 22:52:13 UTC [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Jun 23 22:52:20 2021
2021-06-23 22:52:20 UTC [REMOTE_COMMAND] sleep for 5
2021-06-23 22:52:25 UTC [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1624488728"
2021-06-23 22:52:25 UTC [REMOTE_INFO] Check for the wakeup_count
2021-06-23 22:52:25 UTC [REMOTE_INFO] ===== Round(26/50) =====
2021-06-23 22:52:25 UTC [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Jun 23 22:52:32 2021
2021-06-23 22:52:32 UTC [REMOTE_COMMAND] sleep for 5
2021-06-23 22:52:37 UTC [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1624488740"
2021-06-23 22:52:37 UTC [REMOTE_INFO] Check for the wakeup_count
2021-06-23 22:52:37 UTC [REMOTE_INFO] ===== Round(27/50) =====
2021-06-23 22:52:37 UTC [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Jun 23 22:52:44 2021
2021-06-23 22:52:44 UTC [REMOTE_COMMAND] sleep for 5
2021-06-23 22:52:49 UTC [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1624488752"
2021-06-23 22:52:49 UTC [ERROR] Caught kernel log error
===========================>>
[ 4739.061314] kernel: rtc_cmos rtc_cmos: Alarms can be up to one month in the future
<<===========================
2021-06-23 22:52:49 UTC [REMOTE_ERROR] Caught error in kernel log
2021-06-23 22:52:49 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2021-06-23 22:52:49 UTC [REMOTE_ERROR]  die()  @  /home/ubuntu/sof-test/test-case/../case-lib/lib.sh
2021-06-23 22:52:49 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-suspend-resume.sh:91
2021-06-23 22:52:49 UTC [REMOTE_INFO] Test Result: FAIL!

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 30, 2021

I checked the logs and found something even more puzzling: this error message came at Jun 23 22:52:44, NOT at the time rtcwake was invoked but 5 seconds later when the system woke up!?

The message can only be printed by cmos_set_alarm(), @plbossart, others any idea what else could be trying to set RTC alarms?

I also compared the logs between successful rtcwake and the failing one and found no obvious difference.

Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: smpboot: Booting Node 0 Processor 10 APIC 0x9
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: ACPI: \_SB_.PR10: Found 3 idle states
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: CPU10 is up
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: smpboot: Booting Node 0 Processor 11 APIC 0xb
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: ACPI: \_SB_.PR11: Found 3 idle states
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: CPU11 is up
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: ACPI: Waking up from system sleep state S3
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: ACPI: EC: interrupt unblocked
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: ACPI: EC: event unblocked
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sd 2:0:0:0: [sda] Starting disk
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: i915 0000:00:02.0: [drm] [ENCODER:94:DDI A/PHY A] is disabled/in DSI mode with an ungated DDI clock, gate it
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: i915 0000:00:02.0: [drm] [ENCODER:102:DDI B/PHY B] is disabled/in DSI mode with an ungated DDI clock, gate it
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: i915 0000:00:02.0: [drm] [ENCODER:117:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI clock, gate it
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: serial 00:06: activated
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: mei_me 0000:00:16.0: hbm: dma setup response: failure = 3 REJECTED
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: rtc_cmos rtc_cmos: Alarms can be up to one month in the future
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 1
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000000 at  00000044
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000040 at  00000048
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIW: 00000000 at  00000048
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000000 at  00000048
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIW: 00000040 at  00000048
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 0
Jun 23 22:52:44 jf-cml-rvp-sdw-3 kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Current DSP power state: D0I0

@plbossart
Copy link
Member

Interesting @marc-hb, could it be that something other that sof-test sets an alarm on resume?

Also wondering if the issue is this log only, what would happen if we ignore it?

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 1, 2021

While this one looks spurious I'm reluctant to ignore time-related error messages considering how many time synchronizations issues and misconfigurations we had in the past and we still experience various TIMEOUTs, see links above.

In fact is this message really "spurious"? Some rogue script invoking rtcwake could explain some TIMEOUTs.

@plbossart
Copy link
Member

I don't disagree, but I don't see how to backtrace who requested the alarm. Could be literally any userspace script or application, no?

@XiaoyunWu6666 XiaoyunWu6666 removed the CML Applies to Comet Lake platform label Jul 5, 2021
@XiaoyunWu6666 XiaoyunWu6666 changed the title [CML][NOT ON SOF_DEV]rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend [NOT ON SOF_DEV]rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend Jul 5, 2021
@mengdonglin mengdonglin added bug Something isn't working CML Applies to Comet Lake platform labels Jul 12, 2021
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 13, 2021

Is this still happening, how often and on which branch?

@XiaoyunWu6666 XiaoyunWu6666 removed the NOT on topic/sof-dev issue that does not happen on branch topic/sof-dev label Jul 13, 2021
@XiaoyunWu6666 XiaoyunWu6666 changed the title [NOT ON SOF_DEV]rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend Jul 13, 2021
@XiaoyunWu6666
Copy link
Author

XiaoyunWu6666 commented Jul 13, 2021

Is this still happening, how often and on which branch?

not happened these days but add a command to set RTC time from system time after DUT deployment , before they are reboot and tests start, see sof-sh PR above.

@XiaoyunWu6666
Copy link
Author

XiaoyunWu6666 commented Jul 14, 2021

inner daily 5261?model=CML_RVP_SDW&testcase=check-suspend-resume-50

It happened again in today's daily, and was the first reproduction after inner daily test 4834 model=CML_RVP_SDW testcase=check-suspend-resume-50.
So maybe this is an issue with low reproduction rate and actually 'hwclock --systohc' cannot save it , according to what we observe in daily testresults

DUT: jf-cml-rvp-sdw-3

image

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 14, 2021

I checked the log again and again the error appeared after the system woke up on time just fine as expected.

A possible next step could be to compare the BIOS versions with jf-cml-rvp-sdw-2 on which this never seems to happen and where the same test has been running fine (I checked the logs)

@keqiaozhang
Copy link
Collaborator

@fredoh9, can you help to compare the BIOS version on these 2 devices?

@fredoh9
Copy link
Collaborator

fredoh9 commented Jul 20, 2021

I will compare but jf-cml-rvp-sdw-2 is out of ci for a long time. We should compare jf-cml-rvp-sdw-1 vs jf-cml-rvp-sdw-3.
Do I need to look for RTC related settings?

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 21, 2021

Do I need to look for RTC related settings?

Yes, but more importantly: BIOS version differences for bug fixes

Also: dpkg -l differences and especially any software that could run in the background.

@keqiaozhang
Copy link
Collaborator

@fredoh9 ,did you find any difference after comparing BIOS?

@fredoh9
Copy link
Collaborator

fredoh9 commented Jul 26, 2021

I haven't checked/compared the BIOS settings yet. I will plan to come to office and do that in this Wednesday.

@fredoh9
Copy link
Collaborator

fredoh9 commented Jul 26, 2021

I changed my schedule to come to office today along with Marc. Thank you @marc-hb. I couldn't compare jf-cml-rvp-sdw-3 with jf-cml-rvp-sdw-1 because daily build need one. But as I suspect I don't find particular setting for it. Found some could be related with RTC. Need to compare them when daily test is done.

Scanned test results for a week, this hasn't been not happening for a week.

Running suspend-resume 500 times locally to see I can reproduce this. Will do same thing on jf-cml-rvp-sdw-1 whenever it is available.

Will do dpkg -l and compare between two devices also

@fredoh9
Copy link
Collaborator

fredoh9 commented Jul 27, 2021

jf-cml-rvp-sdw-3 running fine up to 375/500. I had to stop to check other things.

Coin battery seems fine by checking, check BIOS date/time -> remove power -> wait for 2-3min -> power on -> check BIOS date/time

comparing package list.
cml1_dpkg_list.txt
cml3_dpkg_list.txt

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 27, 2021

comparing package list. cml1_dpkg_list.txt cml3_dpkg_list.txt

diff -b -u shows a lot of differences: one system has packages that the other has not and one system is more up to date than the other. And of course we can't compare configuration files difference and any other custom scripts.

@fredoh9 fredoh9 self-assigned this Aug 16, 2021
@fredoh9
Copy link
Collaborator

fredoh9 commented Aug 16, 2021

This didn't happened for a long time. No change on BIOS setting. Comparing dpkg list wasn't easy.

But the clock synchronization had some issues. Removed ntpsec-ntpdate as I suspect conflict with systemd-timesyncd and made sure systemd-timesyncd does the job. After that many of weird issues was gone. I think we can close this. If same or similar problem is happened again, we can re-open.

@fredoh9 fredoh9 closed this as completed Aug 16, 2021
@XiaoyunWu6666
Copy link
Author

happen in inner daily 5882 again
on cml-rvp-sdw apl-up2-nocodec

@XiaoyunWu6666 XiaoyunWu6666 reopened this Aug 16, 2021
@fredoh9
Copy link
Collaborator

fredoh9 commented Aug 16, 2021

What a coincidence! After closing this issue, I saw that failure.

After scan full test logs for about a month, these are the failures. Will do more investigation.

  • 5261/CML_RVP_SDW/check-suspend-resume-50
  • 5664/ADLP_RVP_NOCODEC/check-suspend-resume-5
  • 5687/ADLP_RVP_NOCODEC/check-suspend-resume-5
  • 5882/CML_RVP_SDW/check-suspend-resume-50
  • 5882/APL_UP2_NOCODEC/check-suspend-resume-50

@plbossart
Copy link
Member

@fredoh9 @XiaoyunWu6666 @marc-hb do we still see this issue? if not can we close?

@marc-hb
Copy link
Collaborator

marc-hb commented Oct 15, 2021

Just now after @fredoh9 extended the suspend-resume test from 1 cycle per device to 5 cycles per device.

https://sof-ci.01.org/sofpr/PR4866/build10718/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-playback-5
https://sof-ci.01.org/sofpr/PR4866/build10718/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-playback-5

@XiaoyunWu6666
Copy link
Author

https://sof-ci.01.org/sofpr/PR5239/build11751/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-5

this happened sh-apl-up2-hda-07
will check whether hw update can solve this or not after CNY :)

@plbossart
Copy link
Member

@marc-hb @XiaoyunWu6666 We seem to have moved to an issue that's no longer specific to CML_RVP_SDW, should we update title and tags for tracking purposes?

@marc-hb marc-hb changed the title rtc_cmos rtc_cmos: Alarms can be up to one month in the future on CML_RVP_SDW after waking from suspend rtc_cmos rtc_cmos: Alarms can be up to one month in the future after waking from suspend Jan 24, 2022
@marc-hb marc-hb removed the CML Applies to Comet Lake platform label Jan 24, 2022
@marc-hb
Copy link
Collaborator

marc-hb commented May 7, 2022

@plbossart
Copy link
Member

@marc-hb @XiaoyunWu6666 i can't recall if this problem still exists?

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 29, 2022

They may not print this message but we still experience rtcwake freezes on a regular basis so I would prefer to keep this open

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 30, 2022

Fresh from the oven: https://sof-ci.01.org/sofpr/PR5875/build778/devicetest/?model=WHL_UPEXT_HDA_ZEPHYR&testcase=check-suspend-resume-5

I checked journalctl on that system and while there was no rtcwake error message there definitely was a hard lockup: absolutely nothing in the logs after the call to rtcwake

Interestingly, CI did NOT timed out and power-cycled the device because it immediately rebooted itself, it was up again less than 30 seconds later.

@marc-hb marc-hb added the suspend resume Issues related to suspend resume (e.g. rtcwake) label Jul 19, 2022
@plbossart
Copy link
Member

They may not print this message but we still experience rtcwake freezes on a regular basis so I would prefer to keep this open

That would need to be tracked as a separate bug. This issue was only related to the 'one month into the future'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working suspend resume Issues related to suspend resume (e.g. rtcwake)
Projects
None yet
Development

No branches or pull requests

6 participants