Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hack Week] A generic solution for resolving the ANRs caused by accessing local database #20937

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

jarvislin
Copy link
Contributor

@jarvislin jarvislin commented Jun 4, 2024

This is a Hack Week project, this PR also fixes #20850 for demonstrating how to use the helper to update the current codebase.

Why

There are a number of Android ANR errors reported on Sentry, and some of these ANRs are caused by local database access. The likely cause is that some code is accessing the local database without switching to the background thread, which can lead to ANRs in certain situations.

To address this issue, I have created a generic helper that can be used to easily handle thread switching. This helper can be used by anyone who encounters a similar situation in the future.

P.S. This helper is not limited to accessing the database; this helper can also be used for any requirement that needs to be executed on a IO thread and then switched back to the main thread.

Use with Caution

This helper looks convenient, but the goal is not to use it extensively. Good code should have a well-designed architecture and proper implementations, which wouldn't require this helper. It's meant for quick fixes in the code's incorrect implementations. However, in the long run, we should aim for a complete refactor.

Note

While working on #20851, I found that the solution to modify the original blocking operation to thread switching introduced a race condition that caused UI errors. The culprit was a piece of code operating on another thread. This means that the original code needs to be refactored further to avoid this issue. It's crucial to carefully check for potential race conditions when dealing with similar problems.


To Test:

  1. Sign in JP app.
  2. Go to the Reader tab.
  3. Click on a post.
  4. Click on a tag in that post.
  5. Click Subscribe/unsubscribe button on the tag header.
  6. It should work as usual.
  7. Done, thank you!
Post Tag Header
Screenshot_20240604-174237 Screenshot_20240604-174247

Regression Notes

  1. Potential unintended areas of impact

    • reader
  2. What I did to test those areas of impact (or what existing automated tests I relied on)

    • manual
  3. What automated tests I added (or what prevented me from doing so)

    • n/a

PR Submission Checklist:

  • I have completed the Regression Notes.
  • I have considered adding accessibility improvements for my changes.
  • I have considered if this change warrants user-facing release notes and have added them to RELEASE-NOTES.txt if necessary.

Testing Checklist (strike-out the not-applying and unnecessary ones):

skipped

@dangermattic
Copy link
Collaborator

1 Message
📖 This PR is still a Draft: some checks will be skipped.

Generated by 🚫 Danger

Copy link

sonarcloud bot commented Jun 4, 2024

Quality Gate Passed Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@jarvislin jarvislin added [Type] ANR Application Not Responding Reader labels Jun 4, 2024
@jarvislin jarvislin added this to the 25.1 milestone Jun 4, 2024
@jarvislin jarvislin marked this pull request as ready for review June 4, 2024 15:47
@wpmobilebot
Copy link
Contributor

WordPress📲 You can test the changes from this Pull Request in WordPress by scanning the QR code below to install the corresponding build.
App NameWordPress WordPress
FlavorJalapeno
Build TypeDebug
Versionpr20937-032d503
Commit032d503
Direct Downloadwordpress-prototype-build-pr20937-032d503.apk
Note: Google Login is not supported on these builds.

@jarvislin jarvislin requested a review from antonis June 4, 2024 15:53
@wpmobilebot
Copy link
Contributor

Jetpack📲 You can test the changes from this Pull Request in Jetpack by scanning the QR code below to install the corresponding build.
App NameJetpack Jetpack
FlavorJalapeno
Build TypeDebug
Versionpr20937-032d503
Commit032d503
Direct Downloadjetpack-prototype-build-pr20937-032d503.apk
Note: Google Login is not supported on these builds.

Copy link

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 40.93%. Comparing base (56f0686) to head (032d503).
Report is 2 commits behind head on trunk.

Files Patch % Lines
...org/wordpress/android/datasets/AsyncTaskHandler.kt 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            trunk   #20937      +/-   ##
==========================================
- Coverage   40.93%   40.93%   -0.01%     
==========================================
  Files        1518     1519       +1     
  Lines       69550    69554       +4     
  Branches    11473    11473              
==========================================
  Hits        28473    28473              
- Misses      38491    38495       +4     
  Partials     2586     2586              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@antonis antonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on this @jarvislin 🙇
I've tested the specific use case and it works as expected for me 🎉
The introduced AsyncHandler helper makes a lot of sense and adds structure to the code especially when calling from java 🥇

@antonis antonis merged commit a24e5d1 into trunk Jun 5, 2024
23 checks passed
@antonis antonis deleted the hackweek/database-anr branch June 5, 2024 10:47
Copy link
Contributor

@thomashorta thomashorta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this PR has already been merged, but looking at it I felt it could be worth a post-merge Review to maybe rethink if this makes sense to be kept as-is.

I understand this PR is more of a quick solution to some of the ANR problems we have, but I have concerns about its rationale being a bit too simplistic and potentially introducing other issues that degrade user experience and are harder to track (non-crashing/behavioral bugs).

Besides the technical concerns I pointed below in the code comments, I feel like each place in the codebase that currently calls DB/network code from the main thread needs to be investigated individually, and the long-term solution for most of them is likely some refactor to work correctly with async and concurrent calls.

That said it might make sense to still have some quick fix like this for some stuff, but we need to be aware of the risks.

*/
@JvmStatic
fun <T> load(backgroundTask: () -> T, callback: AsyncTaskCallback<T>) {
CoroutineScope(Dispatchers.IO).launch {
Copy link
Contributor

@thomashorta thomashorta Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few concerns about the usage of Coroutines here.

CoroutineScope

The CoroutineScope is being created ad hoc instead of using an existing Scope or Context/Job, which is usually attached to some lifecycle-aware component. This is essentially the same as using the GlobalScope, which should be avoided.

This means that the Job launched here is completely standalone, bypassing one of the biggest advantages introduced by coroutines: structured concurrency.

In practice, this means that even if the caller of the coroutine dies, the callback lambda will still be called (since the Scope is never canceled) which can cause issues if said callback references something that was destroyed.

TBH I'm not sure what would be the best solution here, but using an existing Scope that's close to the component that launches this Job would be preferred, all-in-all, thinking about this helper class, I'd say receiving the CoroutineScope via argument would make more sense so the caller can pass an appropriate Scope.

Hardcoded Dispatchers.IO

This is another thing that should be avoided, not only because it's hard to test, but also because, based on the name of this class (async TASK), it looks like this is supposed to be a generic "offload-to-background" helper.

This means that passing a backgroundTask that runs computational work could be expected, but that kind of work should be run in the Dispatchers.Default, while Dispatchers.IO should be used only for IO/waiting tasks. More info here.

The solution for this should probably be to receive the CoroutineDispatcher as an argument since the caller would be the one who knows what kind of task is being run.

Copy link
Contributor Author

@jarvislin jarvislin Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomashorta Thank you for pointing out the potential issues in the current implementation.

I'd say receiving the CoroutineScope via argument would make more sense so the caller can pass an appropriate Scope.

Yeah this is also the way to handle the lifecycle stuff properly, will update it later.

Copy link
Contributor Author

@jarvislin jarvislin Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Hardcoded Dispatchers.IO, I'm not sure if I should receive the CoroutineDispatcher as an argument since the initial idea was to address issues related to accessing the database, opening it up to all dispatchers seems a bit over-engineered to me.

Therefore, I propose we keep this part as is and handle it through renaming for now. If a future need arises, we can then add the dispatcher argument. What do you think?

P.S. I think this helper is more related to the legacy code, especially in Java. Those codebase write in Kotlin should be able to handle the requirements without using this helper. So the code will use this helper might be less than we expected (and will be less after some refactors).

final boolean isAskingToFollow = !ReaderTagTable.isFollowedTagName(currentTag.getTagSlug());

final String slugForTracking = currentTag.getTagSlug();
AsyncTaskHandler.load(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the usage here was more of an example of how to use this new AsyncTaskHandler but in this case (and I guess many other cases that are currently using DBs/network in the main thread) I feel the legacy code is in a bad state and the real fix should be a complete refactor.

For instance: this is a RecyclerView Adapter and it's dealing with data loading, which is clearly a bad practice, so a more coherent solution would be refactoring how the data is updated in the Adapter, probably moving the data loading to the appropriate ViewModel (which would have access to a proper CoroutineScope and ways of doing async work) and propagating the list state correctly to the Activity/Fragment containing the Adapter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My approach is to first address the root cause with minimal changes and then look for opportunities to refactor. Refactoring involves more extensive changes, and dealing with legacy code introduces even more uncertainty. Therefore, I avoid making major changes right away unless I'm very confident or have plenty of time. It's a bit like a philosophical trade-off, but I agree that these related files need a complete refactor. In the long run, I believe the final outcome will be consistent.

@jarvislin
Copy link
Contributor Author

jarvislin commented Jun 6, 2024

@thomashorta Thank you so much for reviewing this PR. I plan to create a follow-up PR to handle the issues in the current implementation. And I just updated the section of Use with Caution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reader [Type] ANR Application Not Responding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ApplicationNotResponding: ANR
5 participants