-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
byt-ipc: refine the IPC DONE mask #2119
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, please explain what you are fixing.
This is aim to harden the robustness to avoid IPC IRQ storm, we only expect to receive ACK/reply IPC from host after we have initiated an IPC to host. @lgirdwood imagine that the host can send ACK IPC(or even storm ACKIPCs) by intentional or accident, we hit similar issue on host side and are doing similar fix on Linux side. |
Ok, please include all this data in your commit message. Do we need to do the same for BDW given the similarities in IPC IP. |
Sure, let me update the commit message, and yes, we need the similar fixes for BDW also, let me add it together. |
db5cb2b
to
fa1e4c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keyonjie "To fix that, reefine the IPCD DONE mask sequence as below:" reefine typo ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @keyonjie you keep sending IPC related patches without clear explanations.
This is critical code, we should not change this without extreme caution.
I am not aware of any rain storm or issues with Baytrail, so let's please follow rational engineering steps
@plbossart I understand and do follow the extreme caution rule here, we have logic explanation in the commit message, experiment explanation in the filed issue thesofproject/linux#1492, we have also run stress test on BYT/CHT/BSW. For the interrupt storm issue, if you run module unload/reload in BSW, you will see it randomly, if the last ack IPC is not handled in the driver before the module is unloaded, the HW will keep generating this interrupt and hang the whole system. Vice versa, in the FW side, my PR here implement the same logic to avoid this error. |
good catch, fixed now. |
This is aim to harden the robustness to avoid IPC IRQ storm, we only expect to receive ACK/reply IPC from host after we have initiated an IPC to host. Imagine that the host can keep sending ACK IPC(or even storm ACK IPCs) intentiionally or by accident, with today's code the FW will keep receiving this ACK IPC interrupts and looks hang, we hit this similar issue on host side and are doing similar fix on Linux side. To fix that, refine the IPCD DONE mask sequence as below: 1. After initialized, mask the IPCD DONE. 2. Enable/unmask IPCD DONE at new message sending to host. 3. Disable/mask IPCD DONE once the FW initiated IPC handling is done. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
This is aim to harden the robustness to avoid IPC IRQ storm, we only expect to receive ACK/reply IPC from host after we have initiated an IPC to host. Imagine that the host can keep sending ACK IPC(or even storm ACK IPCs) intentiionally or by accident, with today's code the FW will keep receiving this ACK IPC interrupts and looks hang, we hit this similar issue on host side and are doing similar fix on Linux side. To fix that, refine the IPCD DONE mask sequence as below: 1. After initialized, mask the IPCD DONE. 2. Enable/unmask IPCD DONE at new message sending to host. 3. Disable/mask IPCD DONE once the FW initiated IPC handling is done. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
I stopped here: you didn't describe how this can happen in the first place... |
@keyonjie I am not also not very convinced about this change too. What is the driver "ACK"ing when the module is removed? |
@ranj063 not the driver "ACK"ing in the module removing, the issue happens with module removing is on host side, where is the FW "ACK"ing to the host side. Here is only a symmetric fix in FW side. |
@keyonjie so this is curent real world problem when we remove modules. Can you detail the real problem in commit messages (adding logs shwoing the issue would be good). |
@lgirdwood @keyonjie I think the problem is that on BYT/CHT the interrupts arent getting disabled properly when the module is removed, which is what is probably leading to the problem that keyon is talking about. This PR will just hide the real problem IMO |
@lgirdwood @ranj063 yes, we still need to disable the interrupt properly, this PR won't cover all the cases, what here it changes only is to enable the specific DONE interrupt when necessary, I think at least there is no harm to do so. |
@ranj063 @plbossart with this change, we can eliminate those "no reply expected" ACK IPCs also. If I am not asking for ACK, all your ACKs(or attacks) will not be passed to me. |
@keyonjie can you fix this in the driver. It seems your are trying to fix a driver issue in the FW here. |
@lgirdwood no actually, the symmetric PR in the driver side is here: thesofproject/linux#1491 let me know if you have any specific concerns about the FW PR here. |
There is no problem on all UEFI machines, the issue with module load/unload only happens on the Cyan model, and we have only tested with the legacy BIOS. |
@keyonjie any update, can this be closed ? |
@lgirdwood yes, let me close it. |
Refine the IPCD DONE mask sequence:
Signed-off-by: Keyon Jie yang.jie@linux.intel.com