Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout starts counting before apk and instrumentation has been installed [android] #1194

Open
msand opened this Issue Mar 5, 2019 · 14 comments

Comments

3 participants
@msand
Copy link
Contributor

msand commented Mar 5, 2019

Description

Timeout starts counting before apk (and instrumentation) has been installed, causing the first attempt at testing e2e android on travis to fail (I haven't got bitwise to work properly with android e2e a single time yet). Running a second time allows the tests to finish on time, at least sometimes, it seems relatively flaky.

Ran into something like 20 different error messages on the way, most of them having a closed issue here on the repo with the conversations locked (not very helpful). In the end I managed to find a way to make it build, by using the debug build of the private travis-ci.com, ssh into the machine, logcat android in one half of the screen and running detox test repeatedly after giving the emulator enough time to start and stabilize. After this I had to find suitable timeouts, which weren't too big for travis to think it had frozen, and allow the second run to finish.

Also, a lot of different errors were avoidable by adding a 60 second sleep after the android-wait-for-emulator command, all kinds of service and permission errors show up even if the wait command finishes successfully.

A history of the 100ish attempts to make this work can be found from the public travis: https://travis-ci.org/msand/react-native-svg-e2e/builds
And another 150ish attempts in bitrise: https://app.bitrise.io/app/1a150b039e9b8fbd#/builds
Would probably be useful to collect the various error messages and resolutions into documentation somewhere.

Testing on iOS works good (available in the primary bitrise workflow), I even have it taking screenshots on bitwise. On travis, iOS works fine with the example project, creating a new project, installing react-native-svg (the system under test), and building (haven't tried making screenshots on travis yet). And another new project stage, using cocoapods, installing, and, building. This issue is specifically for the android workflow in bitrise, and the android "build test for example" on travis.

Also, building, testing and taking screenshots locally works fine with both Android and iOS.

Ideally, detox would wait for adb to install the packages before starting the timeout, or even have a separate timeout for this. And, something which ensures output at least once per 10 minutes while waiting (not to make travis believe it has crashed).

Steps to Reproduce

Fork https://github.com/msand/react-native-svg-e2e
Add it to travis
Add it to bitrise
Trigger build(s)

Detox, Node, Device, Xcode and macOS Versions

  • Detox: 10.0.12
  • React Native: 0.58.6 (most recent stable release)
  • Node: 11.10.1 (most recent node from nvm)
  • Device: android emulator canary https://dl.google.com/android/repository/emulator-linux-5329922.zip (latest stable behaves similar, but canary should be faster/more efficient)
    • Travis:
      • Android linux image with oraclejdk8, system-images;android-21;google_apis;armeabi-v7a
    • Bitrise
      • Android Ubuntu 16.04, system-images;android-28;google_apis;x86

Device and verbose Detox logs

https://travis-ci.org/msand/react-native-svg-e2e/jobs/501950974
https://app.bitrise.io/build/42e2db33ffa6485c

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 5, 2019

I haven't looked much at the code in detox yet. But I'm thinking, perhaps this would be cleanest to solve by splitting the logic for installing the apk and instrumentation into a separate step. Such that when it finishes, one can safely call detox with the --reuse flag. And perhaps a separate command to check that the app and the instrumentation has started correctly and warmed up for testing.

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 5, 2019

Is there any existing example of detox e2e testing working with android on bitrise?

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 6, 2019

Managed to get x86 android on bitrise now as well: https://app.bitrise.io/build/c31c1a40907cd137
Using system-images;android-28;google_apis;x86
Latest android emulator canary with using emulator-headless -no-accel
Using default "Hybrid (Xamarin, Ionic, ...) on macOS, includes Visual Studio for Mac, Stable channel"

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 11, 2019

I'm still having some flakyness with android on travis: https://s3.console.aws.amazon.com/s3/buckets/react-native-svg-e2e/msand/react-native-svg-e2e/117/117.2/android.emu.release.2019-03-11%252015-59-50Z/?region=us-east-1&tab=overview
Relatively common to get "Unfortunately, Google Play Games has stopped." in a modal with an "OK" button. I found I can select the OK button using adb shell input keyevent 61 and click it using adb shell input keyevent 66, so I've added two steps before running detox:

        - adb shell dumpsys window windows | grep -E 'Not Responding' && adb shell input keyevent 61  && adb shell input keyevent 61  && adb shell input keyevent 66 || true
        - adb shell dumpsys window windows | grep -E 'has stopped' && adb shell input keyevent 61 && adb shell input keyevent 66 || true

I've also changed to using travis_wait for the long running install of the apk, and give --reuse to detox:

        - adb push android/app/build/outputs/apk/release/app-release.apk /data/local/tmp
        - travis_wait adb shell pm install /data/local/tmp/app-release.apk
        - adb shell rm /data/local/tmp/app-release.apk
        - adb push android/app/build/outputs/apk/androidTest/release/app-release-androidTest.apk /data/local/tmp
        - adb shell pm install /data/local/tmp/app-release-androidTest.apk
        - adb shell rm /data/local/tmp/app-release-androidTest.apk

But sometimes I still get this in some tests in the middle of the run:

image

So, I've patched the screenshot plugin to run this on start:

        const content = await adb.shell(deviceId, 'dumpsys window windows');
        if (content && content.indexOf('Application Error: com.google.android.play.games') !== -1) {
          await adb._sendKeyEvent(deviceId, 61);
          await adb._sendKeyEvent(deviceId, 66);
        }

msand/detox-patch-build@8a9c562
Update: msand/detox-patch-build@e764d3c

And hoping it would resolve the issue for good. @d4vidi Would you be interested in a PR for this?

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 11, 2019

I guess the assertions are failing because mCurrentFocus is in the Application error window saying that Google Play Games has stopped, thus it thinks that the elements aren't visible. Perhaps there should be some better logic for detecting if system services have failed, and are causing false negatives because of showing modals. Such as querying mCurrentFocus and making sure it's in the expected window before making assertions about visibility.

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 11, 2019

Seems I hadn't tested the logic properly, new patch looks for Application Error: com.google.android.play.games instead of has stopped

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 11, 2019

This seems to have resolved the issue: https://travis-ci.org/msand/react-native-svg-e2e/jobs/504814138
You can see from the logs that the modal was present (and got dismissed), and that the tests passed correctly.

@d4vidi

This comment has been minimized.

Copy link
Contributor

d4vidi commented Mar 12, 2019

@msand first thing's first, thanks for this highly detailed report!
Second, it troubles me to see that the integration process hasn't been smooth for you. I really want this to be better, and so I'd be happy if you better explain which exact timeout you're referring to.

In essence, detox test should take of bringing up the device and installing the app for you, and it should feel smooth. As a side note, I'd say that on CI it's recommended that you'd have a saved snapshot of the emulator set up so it wouldn't have to cold start each time.

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 16, 2019

Well I'm happy I got it working at least. It was quite a ride in debugging CI land 😉 learned a lot in the process. Detox seems very nice otherwise, thanks for your work on it!
Seems we could improve it quite a lot by detecting the case that focus is outside the app, and even more if it could detect various os related errors and report them, or even deal with them.

There are two timeouts: One from the testing system, jest in this case, which times out when installing the apk takes too long. Timeout set here: https://github.com/msand/react-native-svg-e2e/blob/master/e2e/init.js#L12

The other one is from the CI system itself, which was mostly an issue in travis, even when just installing the app apk to the emulator as I do now to avoid the jest timeout, before calling detox with --reuse. My current workaround is the travis_wait command, which is intended for long-running steps which produce no output, but still isn't a sign of hanging. Travis has a default timeout of 10 minutes, the travis_wait makes it 20 by default and can be changed with an optional parameter. This might be good to add to documentation, for using detox with android on ci.

Also, the additional sleep after the wait_for_emulator command, I suspect it's a flaw with the wait command, it should look for something else to decide if the emulator is ready to get commands / have apks installed to it / e2e run on it, or detox could retry if it fails. Alternatively, could perhaps add to the documentation a suggestion to add a sleep command, but doesn't feel completely clean. Or as you suggested now, to use a saved snapshot of the emulator which has had any initial setup needed already done and given enough time to stabilize. And, perhaps even install some trivial/minimal apk to it, to verify that it works in isolation. But, correct detection of emulator readiness would be ideal.

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 17, 2019

I'm thinking something like this might be a better wait_for_emulator command:

adb wait-for-device shell getprop
adb shell '
if [[ $(getprop | grep dev.bootcomplete) ]]
 then while [[ -z $(getprop dev.bootcomplete) ]] do
  echo dev.bootcomplete: $(getprop dev.bootcomplete);
  sleep 1;
 done;
 echo dev.bootcomplete: $(getprop dev.bootcomplete);
else
 echo no dev.bootcomplete property;
fi

if [[ $(getprop | grep sys.boot_completed) ]]
 then while [[ -z $(getprop sys.boot_completed) ]] do
  echo sys.boot_completed: $(getprop sys.boot_completed);
  sleep 1;
 done;
 echo sys.boot_completed: $(getprop sys.boot_completed);
else
 echo no sys.boot_completed property;
fi

if [[ $(getprop | grep service.bootanim.exit) ]]
 then while [[ -z $(getprop service.bootanim.exit) ]] do
  echo service.bootanim.exit: $(getprop service.bootanim.exit);
  sleep 1;
 done;
 echo service.bootanim.exit: $(getprop service.bootanim.exit);
else
 echo no service.bootanim.exit property;
fi

if [[ $(getprop | grep init.svc.bootanim) ]]
 then while [[ $(getprop init.svc.bootanim) != 'stopped' ]] do
  echo init.svc.bootanim: $(getprop init.svc.bootanim);
  sleep 1;
 done;
 echo init.svc.bootanim: $(getprop init.svc.bootanim);
else
 echo no init.svc.bootanim property;
fi
'

This should be sufficient to resolve the issues from the fact that install requires the Android package manager, which is available only after the system is fully booted. I think this should allow removing or at least decreasing the extra sleep I'm currently using after the wait command. Similar commands seem to be used in chromiumos and fastlane
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/350770/2/server/hosts/adb_host.py#519
https://github.com/AzimoLabs/fastlane-plugin-automated-test-emulator-run/blob/master/lib/fastlane/plugin/automated_test_emulator_run/actions/automated_test_emulator_run_action.rb

Also, it seems it might be more reliable not to use -no-boot-anim for the getprop service.bootanim.exit / getprop init.svc.bootanim to be a good indicator of it having started properly:
https://issues.jenkins-ci.org/browse/JENKINS-27702

@d4vidi d4vidi self-assigned this Mar 18, 2019

@d4vidi

This comment has been minimized.

Copy link
Contributor

d4vidi commented Mar 18, 2019

To try to narrow it down, as far as Detox is concerned, it would be better if the jest/mocha timeout would not include the time it takes for the emulator to start and for the app installation to complete. Is that the problem at its core?

@msand

This comment has been minimized.

Copy link
Contributor Author

msand commented Mar 18, 2019

Well, essentially it's three separate issues I've had to work around. One to get the emulator to a state where it's safe to install the apk. One to handle that the install takes too long. And one to handle that the google services crash and takes the focus away from the app.

@noomorph

This comment has been minimized.

Copy link
Contributor

noomorph commented Apr 15, 2019

Good observations. I'll make the issue pinned till we decide what to do about the things you've noticed.

@noomorph noomorph added this to Needs triage in Issues triage Apr 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.