
[test] Continuous Performance Regression Tests #948

Closed
k-ye opened this issue May 11, 2020 · 14 comments
Labels
enhancement, feature request

Comments

@k-ye
Member

k-ye commented May 11, 2020

Concisely describe the proposed feature

I think it would be great if we could have a CI pipeline that runs some benchmarks as regression tests. This way we could easily detect problems like #937 (comment).

k-ye added the enhancement and feature request labels on May 11, 2020
@archibate
Collaborator

https://www.cnblogs.com/younggun/articles/1814989.html
I thought that's exactly what we do in tests/python?

@k-ye
Member Author

k-ye commented May 11, 2020

By "regression" I mean to detect performance regression (e.g. a new change caused the performance, as measured by our benchmark tests, to drop by 50%).

In contrast, what we have currently in the CI are just unit tests. They are used to verify if the system is not fundamentally broken.

k-ye changed the title from "Continuous Regression Tests" to "Continuous Performance Regression Tests" on May 11, 2020
archibate changed the title from "Continuous Performance Regression Tests" to "[test] Continuous Performance Regression Tests" on May 11, 2020
@archibate
Collaborator

archibate commented May 11, 2020

Thanks for clarifying. So we want to verify not only that functionality isn't broken, but also that performance isn't broken? I'm not sure how Travis CI could do this; currently we can only do it by switching back and forth between commits with git and running the benchmarks by hand.

@k-ye
Member Author

k-ye commented May 11, 2020

but also that performance isn't broken?

Yep

I'm not sure how Travis CI could do this;

I'm no expert on this either. But this kind of regression test is actually quite common, so I guess Travis must have a way to run some command and then produce a few timing numbers.


I suggest we don't worry too much about this issue. We may prioritize this when Taichi is more mature. For now I'm simply creating an issue so that we don't forget :)

@archibate
Collaborator

I'm no expert on this either. But this kind of regression test is actually quite common, so I guess Travis must have a way to run some command and then produce a few timing numbers.

I searched the web and found no info about any relation between Travis and CPRT...

A straightforward attempt could be:
Add a file called last_benchmark.txt that contains the numbers generated for each commit.
Then let the CI (or a human eye) check whether the values in last_benchmark.txt increased or decreased, and report that loudly.
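
A minimal sketch of that comparison step, assuming last_benchmark.txt stores one "name value_ms" pair per line and that a 20% slowdown counts as a regression (both the file format and the threshold are made up for illustration):

```python
# compare_benchmarks.py -- hypothetical helper, not part of the repo.
# Compares a fresh benchmark run against last_benchmark.txt and exits with a
# non-zero status if any case slowed down past a threshold.
import sys

THRESHOLD = 1.20  # flag anything more than 20% slower (placeholder value)

def load(path):
    results = {}
    with open(path) as f:
        for line in f:
            name, ms = line.split()
            results[name] = float(ms)
    return results

def main(old_path, new_path):
    old, new = load(old_path), load(new_path)
    regressions = []
    for name, new_ms in new.items():
        old_ms = old.get(name)
        if old_ms is not None and new_ms > old_ms * THRESHOLD:
            regressions.append(f"{name}: {old_ms:.3f} ms -> {new_ms:.3f} ms")
    if regressions:
        print("Possible performance regressions:")
        print("\n".join(regressions))
        sys.exit(1)
    print("No regressions detected.")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```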

@k-ye
Member Author

k-ye commented May 11, 2020

I searched the web and found no info about any relation between Travis and CPRT...

Ah, the naming could be performance tests, benchmark (BM) tests... I think the terms are pretty confusing here.

Yeah, I think having a file to store the historical BM data is a good way to get things going (usually this would be stored in some database for ease of querying, but obviously we'd then have to pay for that...). I think this can be even simpler -- configure the bot so that it posts the BM data on each PR. For example: pingcap/tidb#17101 (comment)

@archibate
Collaborator

archibate commented May 11, 2020

pingcap/tidb#17101 (comment)

Cool! But I guess we would have to pay money for that; not sure if @yuanming-hu would like this...
It occurs to me that we could upgrade our format server to also offer a [Click to update benchmark] action:

[[Click here for the format server]](http://kun.csail.mit.edu:31415/)

When clicked, it would run ti benchmark and update misc/benchmark.txt in a commit titled [skip ci] update benchmark, just like the existing [skip ci] enforce code format commits.
Or we could trigger this when a user pushes [benchmark] do benchmark for me, like [format] currently does.
Then reviewers could check the Files changed page to see whether the performance increased or decreased.

@k-ye
Member Author

k-ye commented May 11, 2020

But I guess we would have to pay money for that; not sure if @yuanming-hu would like this...

I think it's better to be funded, rather than paying out of our own pockets, even if we are very enthusiastic about this...

When clicked, it would run ti benchmark and update misc/benchmark.txt in a commit titled [skip ci] update benchmark, just like the existing [skip ci] enforce code format commits.

Yeah, I think this can be a good start. The good thing about having a report on the PR is that people can actively look into it, though. But again, these are all fancy features that we don't need urgently.

@archibate
Collaborator

Anyway, before any of these fancy features, we must set up ti benchmark first. @xumingkuan, do you have any ideas on how to implement this? Many thanks :)

@yuanming-hu
Member

yuanming-hu commented May 11, 2020

Great idea. Thanks for proposing this.

I think it's better to be funded, rather than paying out of our own pockets, even if we are very enthusiastic about this...

We can find a computer in our lab for benchmark purposes. We need a machine with consistent hardware, otherwise the performance comparisons won't make much sense. I guess Travis just randomly picks an available VM slot, whose hardware capability fluctuates. Our group also has some free Google Cloud accounts. I'll think about this.

Anyway, before any of these fancy features, we must set up ti benchmark first. @xumingkuan, do you have any ideas on how to implement this? Many thanks :)

We actually already have some basic benchmarks: https://github.com/taichi-dev/taichi/tree/master/benchmarks. https://github.com/taichi-dev/taichi/blob/master/benchmarks/run.py can trigger these benchmarks:

fill_dense:
 * flat_range                        x64        8.852 ms       cuda       0.402 ms
 * flat_struct                       x64        5.688 ms       cuda       0.398 ms
 * nested_range                      x64        8.549 ms       cuda       0.826 ms
 * nested_range_blocked              x64        4.323 ms       cuda       6.724 ms
 * nested_struct                     x64        5.694 ms       cuda       0.324 ms
 * nested_struct_listgen_16x16       x64        5.693 ms       cuda       0.317 ms
 * nested_struct_listgen_8x8         x64        5.763 ms       cuda       0.316 ms
 * root_listgen                      x64        5.685 ms       cuda       0.402 ms
fill_sparse:
 * nested_struct                     x64       11.053 ms       cuda       0.674 ms
 * nested_struct_fill_and_clear      x64       43.212 ms       cuda      22.951 ms
memory_bound:
 * memcpy                            x64       78.917 ms       cuda       8.072 ms
 * memset                            x64       92.055 ms       cuda       5.042 ms
 * saxpy                             x64       97.547 ms       cuda      11.460 ms
 * sscal                             x64       98.836 ms       cuda       7.809 ms
minimal:
 * fill_scalar                       x64        0.002 ms       cuda       0.007 ms
mpm2d:
 * range                             x64        0.793 ms       cuda       0.027 ms
 * struct                            x64        0.773 ms       cuda       0.028 ms

These can be reused. For example, https://github.com/taichi-dev/taichi/blob/master/benchmarks/mpm2d.py should be able to detect the performance issue introduced in #937. How to automatically summarize the benchmark results and display them on GitHub is worth discussing.
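
One possibility, just as a sketch: parse run.py's output (assuming it looks like the lines above) into a markdown table that a bot could post on each PR. The script name and the parsing regex here are assumptions, not existing tooling:

```python
# summarize_benchmarks.py -- hypothetical sketch; assumes benchmark output
# lines shaped like " * flat_range   x64   8.852 ms   cuda   0.402 ms".
import re
import sys

LINE = re.compile(r"\*\s+(\S+)\s+x64\s+([\d.]+)\s+ms\s+cuda\s+([\d.]+)\s+ms")

def to_markdown(text: str) -> str:
    rows = ["| case | x64 (ms) | cuda (ms) |", "| --- | ---: | ---: |"]
    for name, x64_ms, cuda_ms in LINE.findall(text):
        rows.append(f"| {name} | {x64_ms} | {cuda_ms} |")
    return "\n".join(rows)

if __name__ == "__main__":
    # e.g. python benchmarks/run.py | python summarize_benchmarks.py
    print(to_markdown(sys.stdin.read()))
```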

I haven't got a chance to systematically work on performance issues though.

@xumingkuan
Collaborator

Anyway, before any of these fancy features, we must set up ti benchmark first. @xumingkuan, do you have any ideas on how to implement this? Many thanks :)

Currently, I'm just setting TI_PRINT_BENCHMARK_STAT=1, which generates a log file when running each unit test, for my benchmark charts.

it would run ti benchmark and update misc/benchmark.txt

If you want something like this, we can just have this command set print_benchmark_stat = true, run the tests, and then read all the log files and collect the data (the number of statements).
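
A rough sketch of that collection step, assuming each test writes a log file into a directory and the statement count appears on a line like "statements: 123" (both the directory layout and the line format are guesses, not the actual TI_PRINT_BENCHMARK_STAT output):

```python
# collect_stat_logs.py -- hypothetical sketch; the "benchmark_logs" directory
# and the "statements: N" line format are assumptions, not real Taichi output.
import glob
import os
import re

STAT = re.compile(r"statements:\s*(\d+)")

def collect(log_dir="benchmark_logs"):
    totals = {}
    for path in glob.glob(os.path.join(log_dir, "*.log")):
        count = 0
        with open(path) as f:
            for line in f:
                m = STAT.search(line)
                if m:
                    count += int(m.group(1))
        totals[os.path.basename(path)] = count
    return totals

if __name__ == "__main__":
    for test, count in sorted(collect().items()):
        print(f"{test}: {count} statements")
```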

@archibate
Collaborator

archibate commented May 12, 2020

If you want something like this, we can just have this command set print_benchmark_stat = true, run the tests, and then read all the log files and collect the data (the number of statements).

So currently print_benchmark_stat only shows the number of statements? If so, that's not enough. For example:

$1 = const [8]
$2 = pow $0, $1

versus:

$1 = mul $0, $0
$2 = mul $1, $1
$3 = mul $2, $2

Although the second has more statements, it's actually more efficient than the first.


Also consider vector division:

v.x /= k;
v.y /= k;
v.z /= k;

versus:

tmp = 1 / k;
v.x *= tmp;
v.y *= tmp;
v.z *= tmp;

Not to mention loop unrolling.

So what we want is time performance (TP), not just size performance (SP). I think it's good to add SP, but TP is more important for regression tests, since sometimes we want to sacrifice SP for TP, as in #944.
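
For TP, the simplest approach is probably to time each benchmark case directly and compare wall-clock numbers. A minimal sketch (the bench_cases mapping and the repetition count are placeholders, not existing hooks in ti benchmark):

```python
# time_benchmarks.py -- hypothetical sketch of measuring time performance (TP)
# instead of statement counts (SP). `bench_cases` maps case names to callables
# that run one benchmark iteration; the entries below are only placeholders.
import time

def time_case(fn, repeats=10):
    # Take the best of several runs to reduce noise from the machine.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0  # milliseconds

def run_all(bench_cases):
    return {name: time_case(fn) for name, fn in bench_cases.items()}

if __name__ == "__main__":
    demo_cases = {"noop": lambda: None}  # placeholder benchmark case
    for name, ms in run_all(demo_cases).items():
        print(f"{name}: {ms:.3f} ms")
```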

@xumingkuan
Collaborator

I think it's good to add SP, but TP is more important for regression tests, since sometimes we want to sacrifice SP for TP, as in #944.

Yes, but we may need to solve this issue first before adding a regression test for time performance:

We need a machine with consistent hardware, otherwise the performance comparisons won't make much sense. I guess Travis just randomly picks an available VM slot, whose hardware capability fluctuates.

@archibate
Collaborator

archibate commented May 12, 2020

solve this issue

Oh, I see. So we could first set up SP CPRT as a warm-up exercise for TP CPRT before that issue is solved?
