Laggy FMVs #573

Open
RageXbox opened this issue Nov 26, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@RageXbox
Contributor

Bug Description

In most games with FMVs, both the video and its audio lag, even on very powerful hardware. In Halo 2, this causes the epilogue to skip.

Expected Behavior

Neither FMV video nor FMV audio should lag at all.

xemu Version

v0.6.2-9-g69ceec4446

System Information

Windows 10 (64-bit)
(Intel(R) Xeon(R) CPU E5-2678 v3 @2.50 GHz)
(NVIDIA Quadro P5000)
(NVIDIA 472.47)

Additional Context

FMV.mp4
@RageXbox RageXbox added the bug Something isn't working label Nov 26, 2021
@Blackbird88
Contributor

Blackbird88 commented Nov 27, 2021

In my experience, this affects all games that use the XMV video format. Games such as GTA III and Vice City, which use Bink Video (.BIK), are fine.

@icculus

icculus commented Mar 5, 2022

Just adding some notes here, because I'm seeing this too.

My assumption is that XMV decoding is done by a software library linked into the game, so complex or higher-resolution videos are CPU-bound and thus at the mercy of qemu's efficiency.

(unless, I suppose, the texture upload performance can be improved, but I haven't dug into that at all.)

As a test, I ran xemu under Linux's "perf record" tool. I used "Castlevania - Curse of Darkness", since it has a really long XMV near startup that doesn't keep up, and just raced through the menus to start the game without a save created...then I just let perf collect samples while the video didn't keep up for a minute or two.

These are the biggest CPU hotspots:

+   11.99%     0.00%  xemu          [unknown]                [.] 0x0000000183e5f000
+   10.34%     0.22%  xemu          [unknown]                [.] 0000000000000000
+    7.58%     0.47%  xemu          libc.so.6                [.] clock_gettime@@GLIBC_2.17
+    4.43%     0.00%  xemu          [vdso]                   [.] 0x00007ffd3c1ab6e8
+    4.40%     4.40%  xemu          [vdso]                   [.] 0x00000000000006e5
+    3.66%     3.50%  xemu          xemu                     [.] helper_cvtps2pi
     3.51%     3.38%  xemu          xemu                     [.] helper_psrad_mmx
     2.92%     2.79%  xemu          xemu                     [.] float32_add
+    2.89%     2.71%  xemu          xemu                     [.] float32_mul
     2.83%     2.81%  xemu          xemu                     [.] helper_packuswb_mmx
+    2.79%     1.43%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ee01
     2.41%     2.28%  xemu          xemu                     [.] helper_punpcklwd_mmx
+    2.38%     0.01%  xemu          [unknown]                [k] 0xffffffffb160007c
+    2.34%     0.00%  xemu          [unknown]                [.] 0x0000000000000190
+    2.33%     0.00%  xemu          [unknown]                [.] 0x70632d363833692d
+    2.33%     0.00%  xemu          [unknown]                [.] 0x00007ff0f40c7ab0
+    2.29%     2.25%  xemu          xemu                     [.] cpu_exec
+    2.27%     0.01%  xemu          [unknown]                [k] 0xffffffffb142c331
     2.11%     2.03%  xemu          xemu                     [.] helper_psllq_mmx
+    1.75%     1.62%  xemu          xemu                     [.] helper_lookup_tb_ptr
+    1.74%     0.00%  xemu          xemu                     [.] 0x00005618e0bf77f0
+    1.73%     1.65%  xemu          xemu                     [.] soft_f32_mul
+    1.72%     1.48%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edf3
+    1.55%     0.07%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edf6
     1.50%     1.37%  xemu          xemu                     [.] helper_packssdw_mmx
     1.47%     1.42%  xemu          xemu                     [.] helper_pslld_mmx
+    1.45%     0.01%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ee12
+    1.44%     0.00%  xemu          [unknown]                [.] 0x00000000000000e8
+    1.44%     1.44%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ee0e
+    1.44%     0.00%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ee05
+    1.44%     0.00%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edd0
+    1.43%     1.43%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edce
+    1.43%     1.42%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ede7
+    1.43%     0.00%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0eded
+    1.43%     0.00%  xemu          [unknown]                [.] 0x00007ff0f40c7a00
+    1.43%     1.40%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edda
+    1.43%     0.01%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edc0
+    1.42%     1.42%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ee3b
+    1.41%     0.00%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0eddd
+    1.37%     1.36%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0edfd
+    1.12%     0.00%  xemu          [unknown]                [.] 0x4781270047812700
+    1.12%     0.00%  xemu          [unknown]                [.] 0xc6483000c6483000
+    1.06%     0.00%  xemu          libc.so.6                [.] __GI___ioctl_time64
     1.04%     0.93%  xemu          xemu                     [.] helper_mulps
     1.02%     0.96%  xemu          xemu                     [.] helper_paddw_mmx
+    1.01%     0.91%  xemu          xemu                     [.] parts64_round_to_int_normal.constprop.0
+    1.00%     0.00%  xemu          [unknown]                [.] 0xffffffffb0b3e981

(everything below this is < 1 percent of the CPU time.)
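To make a listing like this easier to read, the rows can be grouped by what kind of code they represent. The following is a minimal sketch, not part of xemu or perf: the category names and the `classify` rules are assumptions made for illustration, and `SAMPLE` contains only a handful of rows copied from the report above. It sums the second ("self") column, since the first ("children") column also includes time spent in callees and would double-count.

```python
import re
from collections import defaultdict

# A few rows copied verbatim from the perf report above.
SAMPLE = """\
+    7.58%     0.47%  xemu          libc.so.6                [.] clock_gettime@@GLIBC_2.17
+    3.66%     3.50%  xemu          xemu                     [.] helper_cvtps2pi
     3.51%     3.38%  xemu          xemu                     [.] helper_psrad_mmx
     2.92%     2.79%  xemu          xemu                     [.] float32_add
+    2.79%     1.43%  xemu          [JIT] tid 2630951        [.] 0x00007ff0b4d0ee01
"""

# children %, self %, command, shared object, [.]/[k] marker, symbol
ROW = re.compile(
    r"^\+?\s*(?P<children>[\d.]+)%\s+(?P<self>[\d.]+)%\s+(?P<comm>\S+)\s+"
    r"(?P<dso>.+?)\s+\[(?P<kind>[.k])\]\s+(?P<symbol>\S+)\s*$"
)


def classify(dso, symbol):
    """Hypothetical grouping rules; adjust to taste."""
    if dso.startswith("[JIT]"):
        return "jit"
    if symbol.startswith("helper_") and symbol.endswith("_mmx"):
        return "mmx helper"
    if symbol.startswith("helper_"):
        return "other helper"
    if symbol.startswith(("float32_", "soft_f32_", "parts64_")):
        return "softfloat"
    return "other"


def self_time_by_category(report_text):
    """Sum the 'self' percentage of every parsable row, per category."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip blank or unparsable lines
        category = classify(match["dso"], match["symbol"])
        totals[category] += float(match["self"])
    return dict(totals)


for category, pct in sorted(self_time_by_category(SAMPLE).items()):
    print(f"{category:>12}: {pct:.2f}%")
```

Feeding the full report through `self_time_by_category` instead of `SAMPLE` would show how much of the self time lands in the MMX and soft-float helpers as a group rather than row by row.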

The things without symbols are probably JIT'd code from qemu, which would make sense if this is just chewing through video data on the CPU.

The most notable thing here is the clock_gettime() call taking over seven percent of the processing time. That is probably worth exploring! If this is just the loop in sdl2_gl_refresh, though, it means we're not spending much CPU time per frame and are just spinning to keep the rendering at 60Hz. In that case the CPU emulation is not the problem, because we're clearly waiting around with nothing to do.
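As an illustration of why a spinning frame limiter would make the clock call prominent in a profile, here is a small sketch. It is not xemu's sdl2_gl_refresh code, which isn't shown in this thread. The names `spin_until`, `sleep_until`, and `FakeClock` are hypothetical, and the cost of 10 microseconds per clock read is an arbitrary assumption chosen only to make the simulation deterministic.

```python
import time


def spin_until(deadline, clock=time.monotonic):
    """Busy-wait: poll the clock until the deadline. Returns the number of clock reads."""
    reads = 0
    while True:
        reads += 1
        if clock() >= deadline:
            return reads


def sleep_until(deadline, clock=time.monotonic, sleep=time.sleep):
    """Sleep for the remaining time instead of polling. Returns the number of clock reads."""
    reads = 0
    while True:
        reads += 1
        remaining = deadline - clock()
        if remaining <= 0:
            return reads
        sleep(remaining)


class FakeClock:
    """Deterministic stand-in for a real clock: every read costs `read_cost` seconds."""

    def __init__(self, read_cost):
        self.now = 0.0
        self.read_cost = read_cost

    def __call__(self):
        self.now += self.read_cost
        return self.now

    def sleep(self, seconds):
        self.now += seconds


FRAME = 1.0 / 60.0  # one frame period at 60 Hz

# Assumed cost per clock read (not measured): 10 microseconds.
spin_clock = FakeClock(read_cost=1e-5)
spin_reads = spin_until(FRAME, clock=spin_clock)

sleep_clock = FakeClock(read_cost=1e-5)
sleep_reads = sleep_until(FRAME, clock=sleep_clock, sleep=sleep_clock.sleep)

print(f"spin: {spin_reads} clock reads, sleep: {sleep_reads} clock reads")
```

With an idle frame, the spinning variant reads the clock on every loop iteration, while the sleeping variant reads it only a couple of times. Whether the clock_gettime samples in the profile actually come from such a loop would need the call stacks to confirm.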

After that, it's worth looking at helper_cvtps2pi, float32_add, etc., and seeing if there's something that will make them faster. Small tweaks there could yield significant improvements, given the likely sheer volume of calls to these functions per frame. Is there some way we can inline these functions (or coerce the JIT to inline them instead of calling them as functions)? Or map them onto actual host SIMD instructions, if they are currently being emulated without them?

I'm also curious whether the decoding library was intended to use the GPU to aid in decoding and is falling back to software decoding because something isn't hooked up. This era of gaming might not have been that savvy about GPU processing (or might not have had enough GPU to do it), but maybe it turns out there's some basic check that is unexpectedly failing and putting us on a slow path that actual consoles never used. I don't know.

Failing all else: maybe we detect this library and use HLE to handle decoding in native code without going through qemu...? Seems like a big lift and a lot of trouble, but I guess it's a possible solution.

Anyhow, I have no answers or patches at the moment, just adding information and ideas to this bug report.

@Triticum0

Triticum0 commented May 18, 2022

This issue is horrible for unskippable cutscenes.
