
[test] Continuous Performance Regression Tests #948

Closed
k-ye opened this issue May 11, 2020 · 14 comments
Labels
enhancement, feature request

Comments

@k-ye
Member

k-ye commented May 11, 2020

Concisely describe the proposed feature

I think it would be great if we could have a CI pipeline that runs some benchmarks as regression tests. This way we could easily detect problems like #937 (comment).

k-ye added the enhancement and feature request labels on May 11, 2020
@archibate
Collaborator

https://www.cnblogs.com/younggun/articles/1814989.html
I thought that's exactly what we do in tests/python?

@k-ye
Member Author

k-ye commented May 11, 2020

By "regression" I mean to detect performance regression (e.g. a new change caused the performance, as measured by our benchmark tests, to drop by 50%).

In contrast, what we have currently in the CI are just unit tests. They are used to verify if the system is not fundamentally broken.

k-ye changed the title from "Continuous Regression Tests" to "Continuous Performance Regression Tests" on May 11, 2020
archibate changed the title from "Continuous Performance Regression Tests" to "[test] Continuous Performance Regression Tests" on May 11, 2020
@archibate
Collaborator

archibate commented May 11, 2020

Thanks for clarifying. So we want to verify not only that functionality isn't broken, but also that performance isn't broken? I'm not sure how Travis CI could do this; currently we can only do it by switching back and forth between commits with git and running the benchmarks by hand.

@k-ye
Member Author

k-ye commented May 11, 2020

but also that performance isn't broken?

Yep

I'm not sure how Travis CI could do this;

I'm no expert on this either. But this kind of regression test is actually quite common, so I guess Travis must have a way to run some command and then produce a few timing numbers.


I suggest we don't worry too much about this issue. We may prioritize this when Taichi is more mature. For now I'm simply creating an issue so that we don't forget :)

@archibate
Collaborator

I'm no expert on this either. But this kind of regression test is actually quite common, so I guess Travis must have a way to run some command and then produce a few timing numbers.

I searched the web and found no info about any relation between Travis and CPRT...

A straightforward attempt could be:
Add a file called last_benchmark.txt that contains the numbers generated for each commit.
Then let the CI (or a human eye) check whether the values in last_benchmark.txt increased or decreased, and report that loudly.
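
A minimal sketch of that comparison step, assuming last_benchmark.txt stores one "name value_ms" pair per line and that a 20% slowdown counts as a regression (both the file format and the threshold are made up for illustration):

```python
# compare_benchmarks.py -- hypothetical helper, not part of the repo.
# Compares a fresh benchmark run against last_benchmark.txt and exits with a
# non-zero status if any case slowed down past a threshold.
import sys

THRESHOLD = 1.20  # flag anything more than 20% slower (placeholder value)

def load(path):
    results = {}
    with open(path) as f:
        for line in f:
            name, ms = line.split()
            results[name] = float(ms)
    return results

def main(old_path, new_path):
    old, new = load(old_path), load(new_path)
    regressions = []
    for name, new_ms in new.items():
        old_ms = old.get(name)
        if old_ms is not None and new_ms > old_ms * THRESHOLD:
            regressions.append(f"{name}: {old_ms:.3f} ms -> {new_ms:.3f} ms")
    if regressions:
        print("Possible performance regressions:")
        print("\n".join(regressions))
        sys.exit(1)
    print("No regressions detected.")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```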

@k-ye
Member Author

k-ye commented May 11, 2020

I searched the web and found no info about any relation between Travis and CPRT...

Ah, the naming could be performance tests, benchmark (BM) tests... I think the terms are pretty confusing here.

Yeah, I think having a file to store the historical BM data is a good way to get things going (usually this would be stored in some database for ease of querying, but obviously we'd then have to pay for that...). I think this can be even simpler -- configure the bot so that it posts the BM data on each PR. For example: pingcap/tidb#17101 (comment)

@archibate
Collaborator

archibate commented May 11, 2020

pingcap/tidb#17101 (comment)

Cool! But I guess we would have to pay money for that; not sure if @yuanming-hu would like this...
It occurs to me that we could upgrade our format server to also offer a [Click to update benchmark] action:

[[Click here for the format server]](http://kun.csail.mit.edu:31415/)

When clicked, it would run ti benchmark and update misc/benchmark.txt in a commit titled [skip ci] update benchmark, just like the existing [skip ci] enforce code format commits.
Or we could trigger this when a user pushes [benchmark] do benchmark for me, like [format] currently does.
Then reviewers could check the Files changed page to see whether the performance increased or decreased.

@k-ye
Member Author

k-ye commented May 11, 2020

But I guess we would have to pay money for that; not sure if @yuanming-hu would like this...

I think it's better to be funded, rather than paying out of our own pockets, even if we are very enthusiastic about this...

When clicked, it would run ti benchmark and update misc/benchmark.txt in a commit titled [skip ci] update benchmark, just like the existing [skip ci] enforce code format commits.

Yeah, I think this can be a good start. The good thing about having a report on the PR is that people can actively look into it, though. But again, these are all fancy features that we don't need urgently.

@archibate
Collaborator

Anyway, before any of these fancy features, we must set up ti benchmark first. @xumingkuan, do you have any ideas on how to implement this? Many thanks :)

@yuanming-hu
Member

yuanming-hu commented May 11, 2020

Great idea. Thanks for proposing this.

I think it's better to be funded, rather than paying out of our own pockets, even if we are very enthusiastic about this...

We can find a computer in our lab for benchmark purposes. We need a machine with consistent hardware, otherwise the performance comparisons won't make much sense. I guess Travis just randomly picks an available VM slot, whose hardware capability fluctuates. Our group also has some free Google Cloud accounts. I'll think about this.

Anyway, before any of these fancy features, we must set up ti benchmark first. @xumingkuan, do you have any ideas on how to implement this? Many thanks :)

We actually already have some basic benchmarks: https://github.com/taichi-dev/taichi/tree/master/benchmarks. https://github.com/taichi-dev/taichi/blob/master/benchmarks/run.py can trigger these benchmarks:

fill_dense:
 * flat_range                        x64        8.852 ms       cuda       0.402 ms
 * flat_struct                       x64        5.688 ms       cuda       0.398 ms
 * nested_range                      x64        8.549 ms       cuda       0.826 ms
 * nested_range_blocked              x64        4.323 ms       cuda       6.724 ms
 * nested_struct                     x64        5.694 ms       cuda       0.324 ms
 * nested_struct_listgen_16x16       x64        5.693 ms       cuda       0.317 ms
 * nested_struct_listgen_8x8         x64        5.763 ms       cuda       0.316 ms
 * root_listgen                      x64        5.685 ms       cuda       0.402 ms
fill_sparse:
 * nested_struct                     x64       11.053 ms       cuda       0.674 ms
 * nested_struct_fill_and_clear      x64       43.212 ms       cuda      22.951 ms
memory_bound:
 * memcpy                            x64       78.917 ms       cuda       8.072 ms
 * memset                            x64       92.055 ms       cuda       5.042 ms
 * saxpy                             x64       97.547 ms       cuda      11.460 ms
 * sscal                             x64       98.836 ms       cuda       7.809 ms
minimal:
 * fill_scalar                       x64        0.002 ms       cuda       0.007 ms
mpm2d:
 * range                             x64        0.793 ms       cuda       0.027 ms
 * struct                            x64        0.773 ms       cuda       0.028 ms

These can be reused. For example, https://github.com/taichi-dev/taichi/blob/master/benchmarks/mpm2d.py should be able to detect the performance issue introduced in #937. How to automatically summarize the benchmark results and display them on GitHub is worth discussing.
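
One possibility, just as a sketch: parse run.py's output (assuming it looks like the lines above) into a markdown table that a bot could post on each PR. The script name and the parsing regex here are assumptions, not existing tooling:

```python
# summarize_benchmarks.py -- hypothetical sketch; assumes benchmark output
# lines shaped like " * flat_range   x64   8.852 ms   cuda   0.402 ms".
import re
import sys

LINE = re.compile(r"\*\s+(\S+)\s+x64\s+([\d.]+)\s+ms\s+cuda\s+([\d.]+)\s+ms")

def to_markdown(text: str) -> str:
    rows = ["| case | x64 (ms) | cuda (ms) |", "| --- | ---: | ---: |"]
    for name, x64_ms, cuda_ms in LINE.findall(text):
        rows.append(f"| {name} | {x64_ms} | {cuda_ms} |")
    return "\n".join(rows)

if __name__ == "__main__":
    # e.g. python benchmarks/run.py | python summarize_benchmarks.py
    print(to_markdown(sys.stdin.read()))
```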

I haven't got a chance to systematically work on performance issues though.

@xumingkuan
Collaborator

Anyway, before any of these fancy features, we must set up ti benchmark first. @xumingkuan, do you have any ideas on how to implement this? Many thanks :)

Currently, I'm just setting TI_PRINT_BENCHMARK_STAT=1, which generates a log file when running each unit test, for my benchmark charts.

it would run ti benchmark and update misc/benchmark.txt

If you want something like this, we can just have this command set print_benchmark_stat = true, run the tests, and then read all the log files and collect the data (the number of statements).
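
A rough sketch of that collection step, assuming each test writes a log file into a directory and the statement count appears on a line like "statements: 123" (both the directory layout and the line format are guesses, not the actual TI_PRINT_BENCHMARK_STAT output):

```python
# collect_stat_logs.py -- hypothetical sketch; the "benchmark_logs" directory
# and the "statements: N" line format are assumptions, not real Taichi output.
import glob
import os
import re

STAT = re.compile(r"statements:\s*(\d+)")

def collect(log_dir="benchmark_logs"):
    totals = {}
    for path in glob.glob(os.path.join(log_dir, "*.log")):
        count = 0
        with open(path) as f:
            for line in f:
                m = STAT.search(line)
                if m:
                    count += int(m.group(1))
        totals[os.path.basename(path)] = count
    return totals

if __name__ == "__main__":
    for test, count in sorted(collect().items()):
        print(f"{test}: {count} statements")
```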

@archibate
Collaborator

archibate commented May 12, 2020

If you want something like this, we can just have this command set print_benchmark_stat = true, run the tests, and then read all the log files and collect the data (the number of statements).

So currently print_benchmark_stat only shows the number of statements? If so, that's not enough. For example:

$1 = const [8]
$2 = pow $0, $1

versus:

$1 = mul $0, $0
$2 = mul $1, $1
$3 = mul $2, $2

Although the second has more statements, it's actually more efficient than the first.


Also consider vector division:

v.x /= k;
v.y /= k;
v.z /= k;

versus:

tmp = 1 / k;
v.x *= tmp;
v.y *= tmp;
v.z *= tmp;

Not to mention loop unrolling.

So what we want is time performance (TP), not just size performance (SP). I think it's good to add SP, but TP is more important for regression tests, since sometimes we want to sacrifice SP for TP, as in #944.
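
For TP, the simplest approach is probably to time each benchmark case directly and compare wall-clock numbers. A minimal sketch (the bench_cases mapping and the repetition count are placeholders, not existing hooks in ti benchmark):

```python
# time_benchmarks.py -- hypothetical sketch of measuring time performance (TP)
# instead of statement counts (SP). `bench_cases` maps case names to callables
# that run one benchmark iteration; the entries below are only placeholders.
import time

def time_case(fn, repeats=10):
    # Take the best of several runs to reduce noise from the machine.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0  # milliseconds

def run_all(bench_cases):
    return {name: time_case(fn) for name, fn in bench_cases.items()}

if __name__ == "__main__":
    demo_cases = {"noop": lambda: None}  # placeholder benchmark case
    for name, ms in run_all(demo_cases).items():
        print(f"{name}: {ms:.3f} ms")
```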

@xumingkuan
Collaborator

I think it's good to add SP, but TP is more important for regression tests, since sometimes we want to sacrifice SP for TP, as in #944.

Yes, but we may need to solve this issue first before adding a regression test for time performance:

We need a machine with consistent hardware, otherwise the performance comparisons won't make much sense. I guess Travis just randomly picks an available VM slot, whose hardware capability fluctuates.

@archibate
Collaborator

archibate commented May 12, 2020

solve this issue

Oh, I see. So we could first set up SP CPRT as a warm-up exercise for TP CPRT before that issue is solved?
