Zalenium issues when no enough CPU resources are available for video recording #146

Closed
MiluchOK opened this Issue Jun 8, 2017 · 11 comments

5 participants

MiluchOK commented Jun 8, 2017

When I run tests in 4 parallel threads (about 5 heavy tests per thread), Zalenium ends up crashing on me.

Eventually, accessing http://localhost:4444/grid/console returns a 502 Bad Gateway error, and the only solution is to restart the Docker container.

What I see in the logs:

Waiting for video to stop recording...
video-rec                        STOPPED   Jun 09 12:28 AM
Done waiting for video recording to stop.
Video recording stopped
--LOG 00:28:53:859378518 -- DEBUG: video-rec-stdout.log ----
    Stream #0:0: Video: rawvideo (BGR[0] / 0x524742), bgr0, 1900x1880, 10 fps, 10 tbr, 1000k tbn, 10 tbc
No pixel format specified, yuv444p for H.264 encoding chosen.
Use -pix_fmt yuv420p for compatibility with outdated media players.
[libx264 @ 0x12280a0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 AVX2 LZCNT BMI2
--LOG 00:28:40:632097174 Trapped SIGTERM or SIGINT so shutting down ffmpeg gracefully...
--LOG 00:28:40:641717826 Will kill VID_TOOL_PID=1654 ...
[libx264 @ 0x12280a0] profile High 4:4:4 Predictive, level 5.0, 4:4:4 8-bit
[libx264 @ 0x12280a0] 264 - core 148 r2643 5c65704 - H.264/MPEG-4 AVC codec - Copyleft 2003-2015 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=0:0:0 analyse=0:0 me=dia subme=0 psy=0 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=0 chroma_qp_offset=0 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=10 scenecut=0 intra_refresh=0 rc=cqp mbtree=0 qp=0
Output #0, matroska, to '/home/seluser/videos/vid_chrome_40008.mkv':
  Metadata:
    encoder         : Lavf56.40.101
    Stream #0:0: Video: h264 (libx264) (H264 / 0x34363248), yuv444p, 1900x1880, q=-1--1, 10 fps, 1k tbn, 10 tbc
    Metadata:
      encoder         : Lavc56.60.100 libx264
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (libx264))
Could not write header for output file #0 (incorrect codec parameters ?): Immediate exit requested
Exiting normally, received signal 2.
--LOG 00:28:41:671063184 Tried to kill -SIGTERM VID_TOOL_PID=1654
--LOG 00:28:41:680362019 Waiting up to 6s for VID_TOOL_PID=1654 to end with SIGTERM...
--LOG 00:28:41:693379829 wait_pid successfully managed to SIGTERM:VID_TOOL_PID=1654 within less than 6s
--LOG 00:28:41:708743447 Will try to fix the videos...
--LOG 00:28:42:788497322 Fixing perms for /home/seluser/videos/vid_chrome_40008.mkv*
--LOG 00:28:43:180531631 Changing video encoding from mkv to mp4...
--LOG 00:28:44:783082418 Conversion from mkv to mp4 FAILED! in within the 20s
--LOG 00:28:45:049847189 Optimizing /home/seluser/videos/vid_chrome_40008.mp4 for HTTP streaming...
Error opening file /home/seluser/videos/vid_chrome_40008.mp4: IsoMedia File is truncated
--LOG 00:28:45:456048840 MP4Box got errors meaning the mp4 video file is corrupted, trying again...
Error opening file /home/seluser/videos/vid_chrome_40008.mp4: IsoMedia File is truncated
--LOG 00:28:46:479030170 MP4Box got errors meaning the mp4 video file is corrupted, trying again...
Error opening file /home/seluser/videos/vid_chrome_40008.mp4: IsoMedia File is truncated
--LOG 00:28:47:609130836 MP4Box got errors meaning the mp4 video file is corrupted, trying again...
Error opening file /home/seluser/videos/vid_chrome_40008.mp4: IsoMedia File is truncated
--LOG 00:28:49:242635890 MP4Box got errors meaning the mp4 video file is corrupted, trying again...
Error opening file /home/seluser/videos/vid_chrome_40008.mp4: IsoMedia File is truncated
--LOG 00:28:50:971258261 MP4Box got errors meaning the mp4 video file is corrupted, trying again...
Error opening file /home/seluser/videos/vid_chrome_40008.mp4: IsoMedia File is truncated
--LOG 00:28:52:042575775 MP4Box got errors meaning the mp4 video file is corrupted, trying again...
--LOG 00:28:53:126241695 Failed! to mp4box_retry.sh within 8s
--LOG 00:28:53:177013880 ffmpeg shutdown complete.
--LOG 00:28:53:883021685 -- DEBUG: video-rec-stderr.log ----
/usr/bin/start-video-rec.sh: line 59: kill: (1654) - No such process
ffmpeg version 2.8.11-0ubuntu0.16.04.1 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
  configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[matroska,webm @ 0xc41320] Format matroska,webm detected only with low score of 1, misdetection possible!
[matroska,webm @ 0xc41320] EBML header parsing failed
/home/seluser/videos/vid_chrome_40008.mkv: Invalid data found when processing input

00:28:54.215 INFO - http://172.17.0.7:40008 Video file copied to: /home/seluser/videos/zalenium_4f552f6e-a765-462f-a7d2-c159e4ed8ece_chrome_LINUX_20170609002854.mp4
00:28:54.578 INFO - http://172.17.0.7:40008 [bash, -c, transfer-logs.sh]
00:28:54.628 INFO - http://172.17.0.7:40008

Note: My tests register a new chromedriver for every test, so there is a lot of container spawning and shutting down.

Author

MiluchOK commented Jun 8, 2017

So, it seems the issue only appears when Zalenium has limited resources to work with; bumping up the amount of CPU and memory that Docker can use helped big time. Still, there should be some kind of detection that not enough resources are available, rather than a simple crash.

Member

elgalu commented Jun 9, 2017

That's correct @MiluchOK, having video recording enabled is CPU intensive, as we are using ffmpeg with compression on the fly.

So your options are:

  1. Disable video recording with capability recordVideo set to false
  2. Increase CPU and RAM availability (which is how you fixed it)
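
As a rough illustration of option 1, the sketch below builds a capabilities dictionary with `recordVideo` set to false. Only the capability name comes from this thread; the helper name `build_capabilities` and the idea of passing the result to a Remote WebDriver pointed at a hub URL such as http://localhost:4444/wd/hub are assumptions for illustration.

```python
def build_capabilities(browser: str = "chrome", record_video: bool = False) -> dict:
    """Return desired capabilities for one test session.

    "recordVideo" is the capability named in this thread. The helper itself
    is hypothetical and only shows where the flag would go.
    """
    return {
        "browserName": browser,
        "recordVideo": record_video,  # False asks the grid not to record video
    }


# The resulting dictionary would be handed to whatever Remote WebDriver
# client the test suite already uses.
caps = build_capabilities()
```

Keeping the flag in one helper makes it easy to switch recording back on for a single failing test while leaving it off for the bulk of a parallel run.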

I've tried to record video without compression on the fly, but that generates a huge avi file. We could make it configurable for people who can handle lots of disk space usage. But for now we are sticking to adjusting CPU/Mem hardware requirements according to the number of parallel tests we want to run.

> there should be detection when there aren't enough resources available

We would love that! If you figure out a way to contribute that via a PR, it would be awesome!

Member

diemol commented Jun 9, 2017

@MiluchOK I also ran into the same issue when I didn't give enough RAM and CPU to Docker on my Mac.

But I think you are right, Zalenium should manage the available resources and decide whether more containers should be created or not. Just crashing is very annoying. I don't have any ideas right now to improve that, but maybe you have some and we can implement them together.

In addition to that, we will also work on #135, which basically aims to reuse created containers and avoid the overhead of creating and destroying containers; that approach is conceptually nice but hurts performance badly.
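
One possible shape for such a check, purely as a sketch and not something Zalenium implements: compare the host's 1-minute load average and available memory against a per-container budget before starting another browser container. The function names and the default budgets (1 CPU and 1024 MiB per container) are assumptions for illustration.

```python
import os


def read_mem_available_mb(meminfo_path: str = "/proc/meminfo") -> float:
    """Return MemAvailable in MiB (Linux only), or 0.0 if it cannot be read."""
    try:
        with open(meminfo_path) as fh:
            for line in fh:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) / 1024  # the value is in kB
    except OSError:
        pass
    return 0.0


def can_start_container(load_1m, cpu_count, mem_available_mb,
                        cpu_per_container=1.0, mem_per_container_mb=1024.0):
    """Decide whether one more browser container fits on this host.

    The budgets are hypothetical defaults; real numbers depend on the
    tests and on whether video recording is enabled.
    """
    free_cpu = cpu_count - load_1m
    return free_cpu >= cpu_per_container and mem_available_mb >= mem_per_container_mb
```

On a Unix host the inputs could come from `os.getloadavg()[0]`, `os.cpu_count()` and `read_mem_available_mb()`. When the check fails, the request could be queued instead of starting a container that is likely to starve ffmpeg.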

Collaborator

pearj commented Jun 9, 2017

#103 will solve the problem of deciding whether enough resources are available, because you can create the container with a resource request. I.e., when you create the pod you tell Kubernetes how many resources it needs to start the pod. So basically, if there aren't enough resources, it won't schedule the pod.

I haven't quite figured out what a good number is yet, because I haven't been able to calculate memory usage during a test. OpenShift graphs memory usage, but only after the container has been running for a bit. Do you guys have any idea how much memory is required while recording and running Chrome? I allocated 1Gi of RAM by default and was going to make it overridable.
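
For reference, a resource request sits under `resources.requests` in the container section of a pod manifest. The sketch below builds such a manifest as a Python dictionary; the 1Gi memory default is the value mentioned above, while the pod name, image, label and the 500m CPU request are placeholders assumed for illustration.

```python
import json


def browser_pod_spec(name, image, cpu_request="500m", memory_request="1Gi"):
    """Build a minimal pod manifest with overridable resource requests.

    The scheduler only places the pod on a node that still has this much
    unreserved CPU and memory; otherwise the pod stays Pending.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "zalenium-browser"}},
        "spec": {
            "containers": [
                {
                    "name": "browser",
                    "image": image,  # placeholder image name
                    "resources": {
                        "requests": {"cpu": cpu_request, "memory": memory_request}
                    },
                }
            ]
        },
    }


spec = browser_pod_spec("chrome-40008", "example/selenium-node:latest")
manifest_json = json.dumps(spec, indent=2)
```

Making both requests function arguments mirrors the "overridable" default described above.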

Collaborator

pearj commented Jun 11, 2017

@elgalu

> I've tried to record video without compression on the fly but that generates a huge avi file

How huge is huge? Potentially you could have a worker thread that compresses the videos on Zalenium one by one after copying them from the node.
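
A minimal sketch of that idea, assuming the uncompressed recordings have already been copied over: a single worker thread takes (source, destination) pairs from a queue and compresses them one at a time, so at most one encode runs at once. The function names are hypothetical, and the ffmpeg invocation is only one plausible choice of options.

```python
import queue
import subprocess
import threading


def ffmpeg_compress(src: str, dst: str) -> None:
    """Compress one recording with ffmpeg (must be installed and on PATH)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-pix_fmt", "yuv420p", dst],
        check=True,
    )


def start_compression_worker(jobs: queue.Queue, compress=ffmpeg_compress) -> threading.Thread:
    """Start one daemon thread that compresses queued videos one by one."""

    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: stop the worker
                    return
                src, dst = job
                try:
                    compress(src, dst)
                except Exception as exc:  # keep the worker alive on a bad file
                    print(f"compression failed for {src}: {exc}")
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

Tests would only enqueue a pair such as `("vid.avi", "vid.mp4")` after the copy finishes, and putting `None` on the queue shuts the worker down.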

Member

elgalu commented Jun 11, 2017

I like this idea:

> have a worker thread that compresses the videos on Zalenium one by one after copying from the node

I think it would only make sense if the worker runs on separate hardware, otherwise we will have the same problem. So this might be the way to go with K8s.

Collaborator

pearj commented Jun 11, 2017

> I think it would only make sense if the worker runs on separate hardware else we will have the same problem. So this might be the way to go with K8s

With K8s you can use pod anti-affinity to say that the video compression can't run on the same node as Selenium tests. Or you could simply have a dedicated node for that purpose that you never schedule Selenium nodes to run on.
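
As a sketch of the anti-affinity option, the dictionary below is the `affinity` section a video-compression pod could carry so that it is never scheduled onto a node already running browser pods. The label `app: zalenium-browser` is a placeholder assumed for illustration; the browser pods would need to carry the same label.

```python
def compression_pod_affinity(browser_label_value="zalenium-browser"):
    """Return an `affinity` block that keeps this pod off browser nodes.

    "kubernetes.io/hostname" as the topology key means "not on the same
    node"; the label value is a hypothetical placeholder.
    """
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": browser_label_value}},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
```

The returned block would go under `spec.affinity` of the compression pod. The dedicated-node alternative needs no affinity on the compression pod, only a rule that keeps browser pods off that node.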

Collaborator

pearj commented Jun 12, 2017

We might not need to care about what runs on what hardware if we set all the resource requests (CPU and memory) appropriately for each pod. For example, if we know a Selenium node is going to do video compression on the fly, we set the CPU request appropriately high and then let the K8s scheduler figure out the appropriate density.

elgalu changed the title from "Zalenium crashes" to "Zalenium issues when no enough CPU resources are available for video recording" on Jul 20, 2017

Member

diemol commented Oct 29, 2017

I've kept this issue open for a while to see if it would attract more attention, and it seems it has not.

So far, we have made efforts to improve Zalenium's performance, and we created this document as a reference.

Generally, Zalenium is used in CI environments where all the resources are shared with the Docker daemon. When it is used on a laptop, it is known that the amount of resources that needs to be shared depends on the number of concurrent tests (as mentioned in the linked document above). On the other hand, in the Kubernetes world, it is possible to specify the allocated resources via env vars.

We could have implemented the flags Docker gives to allocate resources, but so far the benefit does not justify the effort, and we don't have the capacity to implement that. Nevertheless, we are open to receiving PRs for this.
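
For anyone considering such a PR, the relevant `docker run` flags are `--cpus` and `--memory`. The sketch below only assembles the argument list; the helper name, the image name and the default limits are assumptions for illustration, not how Zalenium starts its containers.

```python
def docker_run_args(image, cpus=1.0, memory="1g"):
    """Build a `docker run` argument list with CPU and memory limits.

    --cpus caps the CPU time the container may use and --memory caps its
    RAM. The default values here are hypothetical.
    """
    return ["docker", "run", "--rm", "--cpus", str(cpus), "--memory", memory, image]


args = docker_run_args("example/selenium-node", cpus=1.5, memory="2g")
```

The list can be passed to `subprocess.run(args, check=True)` on a host with Docker installed.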

So I've decided to close this issue since it has had no traction. Hope it is understandable.

diemol closed this Oct 29, 2017


unlikelyzero commented Feb 1, 2018

We ran into this issue and didn't immediately understand why the 502s were being thrown until we found this closed issue. I think we should consider adding a 'Known Issues' section or maybe a simple 'Troubleshooting' section for first-time users.

Member

diemol commented Feb 1, 2018

That's a good point @unlikelyzero. I just created #427; we'll find the time to add this to the revamped docs: https://zalando.github.io/zalenium/
