Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate flamegraph at runtime #5697

Merged
merged 39 commits into from Nov 16, 2019
Merged

Generate flamegraph at runtime #5697

merged 39 commits into from Nov 16, 2019

Conversation

@YangKeao
Copy link
Contributor

YangKeao commented Oct 22, 2019

What have you changed?

  • Include pprof-rs as dependence.
  • Add HTTP interface to start/stop and get report from rsperftools
  • Modify route of jemalloc profile into /debug/pprof/heap

What is the type of the changes?

  • New feature (a change which adds functionality)

How is the PR tested?

They have just been tested manually :(

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No

Does this PR affect tidb-ansible?

No

Refer to a related PR or issue link

#5666

Any examples?

Assume that tikv status_server is listening at localhost:8080

curl "http://localhost:8080/debug/pprof/profile?frequency=50&seconds=10" # this will start cpu sampler at 10/sec frequency and continues 10 secs.
curl "http://localhost:8080/debug/pprof/profile?frequency=50&seconds=10" --header  "Content-Type: application/protobuf" # this will return a pprof protobuf

Notice:

The implementation of rsperftools is not efficient (some performance improvement will land these days). So too high frequency will make it hold lock all the time and may lead to long response time and high cpu usage.

Panic:

Under these situations, rsperftools will crash or deadlock:

  • profiling tick happens when program is propagating thrown exception

flamegraph

flamegraph.svg.gz

@YangKeao YangKeao requested review from lonng, breeswish, sticnarf and 5kbpers and removed request for 5kbpers Oct 22, 2019
@YangKeao

This comment has been minimized.

Copy link
Contributor Author

YangKeao commented Oct 22, 2019

/run-all-tests

@@ -20,6 +20,7 @@ use tikv_alloc::error::ProfError;
use tikv_util::collections::HashMap;
use tikv_util::metrics::dump;
use tikv_util::timer::GLOBAL_TIMER_HANDLE;
use rsperftools;

This comment has been minimized.

Copy link
@YangKeao

YangKeao Oct 22, 2019

Author Contributor

Who can tell me why this line is necessary? Without this line backtrace-rs can only give me the first frame (perf_signal_handler)

This comment has been minimized.

Copy link
@breeswish

This comment has been minimized.

Copy link
@kennytm

kennytm Oct 23, 2019

Contributor

The same reason as rust-lang/rust#64402 (comment) I suppose (to link the crate).

This comment has been minimized.

Copy link
@YangKeao

YangKeao Oct 25, 2019

Author Contributor

Even more wield! If I set rsperftools as a path dependency, I cannot get full stack backtrace even if I use it/extern crate it explicitly. However, both git dependency and normal dependency work well (with use statement)!

@YangKeao YangKeao force-pushed the YangKeao:rsperftools branch 2 times, most recently from b8aeffd to 866be92 Oct 22, 2019
…raph

Signed-off-by: Yang Keao <keao.yang@yahoo.com>
@YangKeao YangKeao force-pushed the YangKeao:rsperftools branch from 866be92 to cf215e4 Oct 22, 2019
@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Oct 22, 2019

Cool! Look forward to a more efficient implementation.

@YangKeao

This comment has been minimized.

Copy link
Contributor Author

YangKeao commented Oct 22, 2019

/run-all-tests

@siddontang

This comment has been minimized.

Copy link
Contributor

siddontang commented Oct 23, 2019

I prefer we follow the Go pprof style, e.g, for CPU profiling, we can access /debug/pprof/profile?seconds=30

@siddontang

This comment has been minimized.

Copy link
Contributor

siddontang commented Oct 23, 2019

also, for heap, we can access /debug/pprof/heap.

refer https://golang.org/pkg/net/http/pprof/

@@ -240,6 +339,9 @@ impl StatusServer {
(Method::GET, "/status") => Box::new(ok(Response::default())),
(Method::GET, "/pprof/profile") => Self::dump_prof_to_resp(req),
(Method::GET, "/config") => Self::config_handler(config.clone()),
(Method::GET, "/rsperf/start") => Self::start_rs_profiler(req),
(Method::GET, "/rsperf/stop") => Self::stop_rs_profiler(),
(Method::GET, "/rsperf/report") => Self::report_from_rs_profiler(),

This comment has been minimized.

Copy link
@overvenus

overvenus Oct 23, 2019

Contributor

How about adding perf duration?

I think golang pprof is good example https://golang.org/pkg/net/http/pprof/

This comment has been minimized.

Copy link
@siddontang

siddontang Oct 23, 2019

Contributor

+1
should not use start and stop manually

This comment has been minimized.

Copy link
@breeswish

This comment has been minimized.

Copy link
@lonng

lonng Oct 23, 2019

Contributor

+3

This comment has been minimized.

Copy link
@zhouqiang-cl

This comment has been minimized.

Copy link
@YangKeao

YangKeao Oct 25, 2019

Author Contributor

Now, the only HTTP interface left was /debug/pprof/profile?frequency=50&seconds=10!

@siddontang

This comment has been minimized.

Copy link
Contributor

siddontang commented Oct 23, 2019

another thing is that can we fold the tikv thread directly when we generate the flamegraph?

see https://github.com/pingcap/tidb-inspect-tools/blob/master/tracing_tools/perf/fold-tikv-threads-perf.pl

@zhouqiang-cl

This comment has been minimized.

Copy link
Member

zhouqiang-cl commented Oct 23, 2019

/test

@YangKeao YangKeao force-pushed the YangKeao:rsperftools branch from 9fa5adf to e2ee97c Oct 24, 2019
…signal and use spin rwlock

Signed-off-by: Yang Keao <keao.yang@yahoo.com>
@YangKeao YangKeao force-pushed the YangKeao:rsperftools branch from e2ee97c to 1f6bdbb Oct 25, 2019
…top profiler

Signed-off-by: Yang Keao <keao.yang@yahoo.com>
@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Oct 25, 2019

Would like to see TiKV threads folded (#5697 (comment)) as well since unfolded TiKV flamegraph is nearly unreadable 🤣BTW in your rsperftools 0.1.3 can we use a very high frequency now?

None => 10,
};

let frequency: i32 = match query_pairs.get("frequency") {

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 25, 2019

Member

We can supply a best default value for this parameter.

This comment has been minimized.

Copy link
@siddontang

siddontang Oct 25, 2019

Contributor

how about the frequency of Go? maybe we can use 99HZ by default.

This comment has been minimized.

Copy link
@siddontang

siddontang Nov 11, 2019

Contributor

From Go

func StartCPUProfile(w io.Writer) error {
	// The runtime routines allow a variable profiling rate,
	// but in practice operating systems cannot trigger signals
	// at more than about 500 Hz, and our processing of the
	// signal is not cheap (mostly getting the stack trace).
	// 100 Hz is a reasonable choice: it is frequent enough to
	// produce useful data, rare enough not to bog down the
	// system, and a nice round number to make it easy to
	// convert sample counts to seconds. Instead of requiring
	// each client to specify the frequency, we hard code it.
	const hz = 100

We know that the maximum HZ maybe 500, so here we must avoid too large frequency.

Another thing is Go uses 100, but I still prefer 99 by default :-)

};

Box::new(
Self::dump_rsprof(seconds, frequency)

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 25, 2019

Member

Will this function get blocked by the lock when there is another perf dump?

This comment has been minimized.

Copy link
@YangKeao

YangKeao Nov 1, 2019

Author Contributor

No. In general, this request will return 500 error.

Cargo.toml Outdated
@@ -126,6 +126,8 @@ engine_traits = { path = "components/engine_traits" }
sst_importer = { path = "components/sst_importer" }
engine_rocks = { path = "components/engine_rocks" }

rsperftools = { version = "0.1.3", features = ["flamegraph"] }

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 25, 2019

Member

Prefer to put it before tipb.

Signed-off-by: Yang Keao <keao.yang@yahoo.com>
@zhouqiang-cl

This comment has been minimized.

Copy link
Member

zhouqiang-cl commented Nov 15, 2019

/merge

@sre-bot sre-bot added the S: CanMerge label Nov 15, 2019
@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 15, 2019

/run-all-tests

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 15, 2019

@YangKeao merge failed.

@YangKeao

This comment has been minimized.

Copy link
Contributor Author

YangKeao commented Nov 15, 2019

/run-all-tests

@lonng
lonng approved these changes Nov 15, 2019
@lonng

This comment has been minimized.

Copy link
Contributor

lonng commented Nov 15, 2019

/merge

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 15, 2019

/run-all-tests

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 15, 2019

@YangKeao merge failed.

@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Nov 15, 2019

/test

@YangKeao YangKeao merged commit c135e22 into tikv:master Nov 16, 2019
6 checks passed
6 checks passed
DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-copr-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details
@YangKeao

This comment has been minimized.

Copy link
Contributor Author

YangKeao commented Nov 16, 2019

/run-cherry-picker

@you06

This comment has been minimized.

Copy link
Contributor

you06 commented Nov 16, 2019

/run-cherry-picker

1 similar comment
@you06

This comment has been minimized.

Copy link
Contributor

you06 commented Nov 16, 2019

/run-cherry-picker

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 16, 2019

cherry pick to release-2.1 failed

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 16, 2019

cherry pick to release-3.0 failed

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Nov 16, 2019

cherry pick to release-3.1 failed

@nrc nrc mentioned this pull request Nov 18, 2019
YangKeao added a commit to YangKeao/tikv that referenced this pull request Nov 19, 2019
Signed-off-by: Yang Keao <keao.yang@yahoo.com>
YangKeao added a commit to YangKeao/tikv that referenced this pull request Nov 19, 2019
Signed-off-by: Yang Keao <keao.yang@yahoo.com>
@lonng lonng mentioned this pull request Nov 19, 2019
26 of 61 tasks complete
sre-bot added a commit that referenced this pull request Nov 29, 2019
)

Signed-off-by: Yang Keao <keao.yang@yahoo.com>
Signed-off-by: Nick Cameron <nrc@ncameron.org>
hawkingrei added a commit to hawkingrei/tikv that referenced this pull request Dec 1, 2019
* server/status_server: support control rsperftools and generate flamegraph

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: update version of rsperftools to ignore queued signal and use spin rwlock

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: use one interface and profiler guard to start/stop profiler

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: update rsperftools to increase performance

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: update rsperftools and modify route path

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: update rsperftools to fix crash bugs

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: migrate related thread together

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: use pthread_getname_np to get thread name

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: fix bugs in TempFdArray

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: set default frequency to 99Hz

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: rename from rsperftools into pprof

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: update pprof

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: support return protobuf according to Content-Type

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: add tests for regex

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: add response function

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: update pprof-rs

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: fix grammer error

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: fix bugs caused by pprof-rs link tests all the time

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: reset handle_fail_points_request

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

* server/status_server: add description for Error response

Signed-off-by: Yang Keao <keao.yang@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.