Unexpected recursive calls to clock_gettime() without proper initialization #327

Closed
ophusky opened this issue Apr 26, 2021 · 9 comments

ophusky commented Apr 26, 2021

I tried to run a skynet program, but something went wrong. How can I solve it?

[root@wenjian x21_server]# uname -a
Linux wenjian.lwj_01 4.15.0-1047-gcp #50-Ubuntu SMP Wed Oct 2 00:50:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@wenjian ~]# echo $LD_PRELOAD
/usr/local/lib/faketime/libfaketime.so.1
[root@wenjian ~]# echo $FAKETIME
+0
[root@wenjian x21_server]# ./bin/tad ctl all_in_one -e cfg_all_in_one start
Start app: [ all_in_one ] with env: [ cfg_all_in_one ].
Using config file: [ /root/.tad/all_in_one.cfg_all_in_one ]
Exec command: [ bin/skynet /root/.tad/all_in_one.cfg_all_in_one ]
libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative.
libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().
libfaketime:  Please check whether any other libraries are in use that clash with libfaketime.
libfaketime:  Returning -1 on clock_gettime() to break recursion now... if that does not work, please check other libraries' error handling.

Owner

wolfcw commented Apr 26, 2021

My guess is that this is related to jemalloc (it's prominently mentioned in skynet's README). As can be seen in libfaketime's #130, this problem is known and still exists. We don't have a solution for it on libfaketime's end, and it probably cannot be fixed unless some changes are also made to jemalloc. Any fresh ideas are highly welcome. :-)


batiati commented Jul 4, 2021

I got the same message running mssql on docker (ubuntu) with the latest libfaketime 0.9.9 (built from source).

/# uname -a
Linux sql-01-67bd5d6dcf-srlvv 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

/# faketime '2008-01-01' /opt/mssql/bin/sqlservr
libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative.
libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().
libfaketime:  Please check whether any other libraries are in use that clash with libfaketime.
libfaketime:  Returning -1 on clock_gettime() to break recursion now... if that does not work, please check other libraries' error handling.

As far as I know, mssql depends on jemalloc too.


wolfcw commented Jul 4, 2021

Perhaps you could try #333? It still needs some fine-tuning, but currently this would be the way ahead regarding jemalloc compatibility.


batiati commented Jul 4, 2021

I tried the PR from #333, adding -DJEMALLOC_COMPAT to the CFLAGS.
Unfortunately, I get the same issue with mssql.


batiati commented Jul 4, 2021

I added some debug messages. Oddly, the recursive calls happen when this part is called: real_calloc = dlsym(RTLD_NEXT, "calloc");

In this line of the PR's code:
https://github.com/ronrother/libfaketime/blob/9bff182a3884d08de2810497653c8669f103bb87/src/libfaketime.c#L2891

It is called from here:
https://github.com/ronrother/libfaketime/blob/9bff182a3884d08de2810497653c8669f103bb87/src/libfaketime.c#L2898

And the first initialization is triggered from here:
https://github.com/ronrother/libfaketime/blob/9bff182a3884d08de2810497653c8669f103bb87/src/libfaketime.c#L2283

I'm not 100% sure, but I think that there are no other threads involved.
Let me know if there is something that I can test to help.
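
The observation above, that the recursion starts inside dlsym(RTLD_NEXT, "calloc"), matches a common pitfall when interposing allocator functions: the symbol lookup may itself allocate. The sketch below shows one usual guard, a small bootstrap pool that serves allocations made while the lookup is still running. It is not the PR's code. All names (guarded_calloc, bootstrap_calloc, resolver_fn, demo_resolver) are hypothetical, and resolver_fn merely stands in for the dlsym() call so the example stays self-contained.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*calloc_fn)(size_t, size_t);

/* Hypothetical hook: in a preload library this would wrap
 * dlsym(RTLD_NEXT, "calloc"), which may itself call calloc(). */
typedef calloc_fn (*resolver_fn)(void);

static _Alignas(16) unsigned char bootstrap_pool[4096];
static size_t bootstrap_used;
static_assert(sizeof bootstrap_pool % 16 == 0, "pool size must be a multiple of 16");

/* Bump allocator, used only while the real calloc is still being resolved. */
static void *bootstrap_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > sizeof bootstrap_pool / size)
        return NULL;                       /* too large; also rules out overflow */
    size_t bytes = (nmemb * size + 15u) & ~(size_t)15u;   /* keep 16-byte alignment */
    if (bytes > sizeof bootstrap_pool - bootstrap_used)
        return NULL;                       /* pool exhausted */
    void *p = bootstrap_pool + bootstrap_used;
    bootstrap_used += bytes;
    return memset(p, 0, bytes);
}

static calloc_fn real_calloc;
static int resolving;   /* not thread-safe; a real library needs per-thread state */

void *guarded_calloc(resolver_fn resolve, size_t nmemb, size_t size)
{
    if (real_calloc == NULL) {
        if (resolving)                     /* re-entered from inside the lookup */
            return bootstrap_calloc(nmemb, size);
        resolving = 1;
        real_calloc = resolve();
        resolving = 0;
        if (real_calloc == NULL)
            return NULL;
    }
    return real_calloc(nmemb, size);
}

/* Pool memory must never be handed to the real free(). */
int from_bootstrap_pool(const void *p)
{
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)bootstrap_pool;
    return a >= lo && a < lo + sizeof bootstrap_pool;
}

/* Demo resolver that allocates while resolving, as dlsym() may do. */
void *demo_scratch;
calloc_fn demo_resolver(void)
{
    demo_scratch = guarded_calloc(demo_resolver, 1, 32);   /* re-enters the wrapper */
    return calloc;
}
```

A guard like this only covers the allocator side. Whether it helps here depends on what jemalloc calls during its own initialization, which this sketch does not model.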


wolfcw commented Jul 5, 2021

Thanks a lot for trying this out! I hope we can fix this as part of #333. Unfortunately, libfaketime currently clashes with anything that uses jemalloc, which is a long-standing issue (cf. #130), and there is no known workaround.


batiati commented Jul 7, 2021

Hi @wolfcw,

As suggested by @ronrother, I tried making the syscall directly, which avoids the memory allocation that dlsym() requires. I started from datefudge's code as a reference, because it follows the same concept but is much simpler than libfaketime for this little proof of concept.

Here is my code (it compiles):

#include <time.h>
#include <sys/time.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <asm/unistd.h>
#include <asm/msr.h>

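/* Issue the clock_gettime system call directly (x86_64 only), so that no
   libc function, and therefore no allocator, is involved. */
static long syscall_gettime(long clock, struct timespec *ts)
{
    long ret;
    /* The syscall instruction clobbers rcx and r11. */
    asm volatile("syscall"
                 : "=a" (ret)
                 : "0" (__NR_clock_gettime), "D" (clock), "S" (ts)
                 : "rcx", "r11", "memory");
    return ret;   /* 0 on success, negative errno on failure */
}

int clock_gettime(clockid_t x, struct timespec *y) {

    long ret = syscall_gettime(x, y);
    /* Only shift successful CLOCK_REALTIME reads. */
    if (ret != 0 || x != CLOCK_REALTIME) return ret;

    /* Ten years into the past, in seconds. */
    const long fake = -1L * 10 * 365 * 24 * 60 * 60;
    y->tv_sec += fake;

    return ret;
}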

Note that the first call uses clockid_t == 1 (CLOCK_MONOTONIC), and SQL Server hangs if I change tv_sec there. The following calls use clockid_t == 0 (CLOCK_REALTIME).

It was also necessary to wait an arbitrary amount of time before altering tv_sec. For some reason, SQL Server hangs if I change it starting from the first call.
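
The comment does not say how the wait was implemented. One way to express "do not shift tv_sec until some time has passed" is sketched below, assuming the caller supplies a CLOCK_MONOTONIC reading. The names grace_period_over and maybe_shift, and the grace_sec parameter, are hypothetical.

```c
#include <assert.h>
#include <time.h>

static struct timespec first_seen;   /* monotonic time of the first call */
static int have_first;

/* Returns 1 once at least grace_sec full seconds have elapsed since the
 * first call, else 0. mono_now is expected to be a CLOCK_MONOTONIC reading. */
int grace_period_over(const struct timespec *mono_now, time_t grace_sec)
{
    assert(mono_now != NULL);
    if (!have_first) {
        first_seen = *mono_now;
        have_first = 1;
    }
    time_t elapsed = mono_now->tv_sec - first_seen.tv_sec;
    if (mono_now->tv_nsec < first_seen.tv_nsec)
        elapsed -= 1;                 /* last second is not complete yet */
    return elapsed >= grace_sec;
}

/* Apply offset_sec to ts only after the grace period is over. */
void maybe_shift(struct timespec *ts, const struct timespec *mono_now,
                 time_t grace_sec, time_t offset_sec)
{
    if (grace_period_over(mono_now, grace_sec))
        ts->tv_sec += offset_sec;
}
```

The static state is not thread-safe, and a suitable grace period would have to be found by experiment.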

Here is a SELECT GETDATE() returning a date 10 years in the past:

[image: result of SELECT GETDATE()]

It's not stable: I still get random crashes after a couple of minutes. It seems that SQL Server depends heavily on timestamps.

Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=702

Thanks a lot


wolfcw commented Jul 7, 2021

If it solves your problem, please feel free to strip anything non-essential from libfaketime to make it fit your purpose.

The approach you outline may, however, be hard to turn into a more generic solution. On the one hand, libfaketime more recently also intercepts syscall(), which just shifts the same observed problem to that function. On the other hand, syscall() is not available on all platforms that libfaketime supports, and that likely holds even more for the platforms jemalloc supports.
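
For comparison with the inline-assembly version earlier in the thread, here is a minimal sketch of the same idea using libc's syscall() wrapper on Linux. Because it goes through the syscall() symbol, a preloaded library that interposes syscall() would see this call, which is the first limitation mentioned above. The helper names raw_clock_gettime and shifted_clock_gettime and the offset_sec parameter are hypothetical, and the code assumes a 64-bit Linux target.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>     /* SYS_clock_gettime */
#include <time.h>
#include <unistd.h>

/* Ask the kernel for the time via libc's generic syscall() wrapper,
 * bypassing the clock_gettime() symbol. Returns 0, or -1 with errno set. */
static int raw_clock_gettime(clockid_t clk, struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, clk, ts);
}

/* Real time shifted by offset_sec; only CLOCK_REALTIME is altered. */
int shifted_clock_gettime(clockid_t clk, struct timespec *ts, time_t offset_sec)
{
    assert(ts != NULL);
    int ret = raw_clock_gettime(clk, ts);
    if (ret == 0 && clk == CLOCK_REALTIME)
        ts->tv_sec += offset_sec;
    return ret;
}
```

Unlike the raw instruction, syscall() reports failure as -1 with errno set, which matches what callers of clock_gettime() expect.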


batiati commented Jul 7, 2021

Yes, I was thinking of making a PR, but I came to the same conclusion.

Maybe it's better to create a new tool dedicated to this specific scenario (running on docker and faking time through a syscall).

Many thanks for your help!

[EDIT]

For anyone who might be interested:

Lib dateoffset: a very simple, stripped-down lib that can fake the date and works with jemalloc on Linux.

mssql-testing: an MS SQL Server docker image with testing convenience tools, including fake dates.
