
[Opt] Add strength reduction optimizations #1065

Merged (8 commits merged on May 27, 2020)


@xumingkuan (Collaborator) commented May 26, 2020

Related issue = #944

This PR contains:

  • a * 2 -> a + a, 2 * a -> a + a (for all types, cast to the original result type)
  • a / const -> a * (1 / const) (floating point only & fast_math only -- is this necessary?)
  • a ** 1 -> a (for all types)
  • a ** 0 -> 1 (for all types & fast_math only, cast to the original result type)
  • a ** 2 -> a * a (for all types, cast to the original result type)
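As a rough illustration of the rules listed above, here is a minimal C++ sketch over a hypothetical toy representation: a binary op whose right operand is a compile-time constant, with the rewrite returned as a string. `Op` and `strength_reduce` are made-up names, not Taichi's `alg_simp` code; the cast back to the original result type is not modeled, and `2 * a` would be handled by swapping the operands first.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy model, not Taichi's IR: the expression is `lhs op c`,
// where c is a compile-time constant.
enum class Op { Mul, Div, Pow };

// Returns the strength-reduced form of `lhs op c` as a string, or
// std::nullopt when no rule from the PR description applies.
std::optional<std::string> strength_reduce(Op op, const std::string &lhs,
                                           double c, bool is_float,
                                           bool fast_math) {
  switch (op) {
    case Op::Mul:
      // a * 2 -> a + a (all types)
      if (c == 2) return lhs + " + " + lhs;
      break;
    case Op::Div:
      // a / c -> a * (1 / c), with 1 / c evaluated now (float + fast_math only)
      if (is_float && fast_math && c != 0)
        return lhs + " * " + std::to_string(1.0 / c);
      break;
    case Op::Pow:
      if (c == 1) return lhs;                // a ** 1 -> a
      if (c == 0 && fast_math) return "1";   // a ** 0 -> 1 (fast_math only)
      if (c == 2) return lhs + " * " + lhs;  // a ** 2 -> a * a
      break;
  }
  return std::nullopt;  // no rule applies: leave the expression unchanged
}

// Usage: assert(*strength_reduce(Op::Pow, "a", 2, false, false) == "a * a");
```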

Benchmark:
benchmark20200526


@xumingkuan (Collaborator, Author)

What about a ** 0 -> 1 (for all types, fast_math only)?

@yuanming-hu (Member)

> • a / const -> a * (1 / const) (floating point only & fast_math only -- is this necessary?)

Yes. If the RHS is a constant, we should do it under fast math.
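To see why the reciprocal rewrite is gated on fast math, the sketch below (hypothetical helper names) compares `a / c` with `a * (1 / c)` in double precision. Because `1 / c` is usually rounded, the two can differ in the last bit: with c = 10 and a = 3, `3.0 / 10.0` rounds to the double nearest 0.3, while `3.0 * (1.0 / 10.0)` rounds to the next double up. When c is a power of two the reciprocal is exact and the results agree. This assumes compilation without -ffast-math, which may apply the same rewrite itself.

```cpp
#include <cassert>

// True if dividing by c and multiplying by the precomputed reciprocal
// 1 / c give different doubles for this a.
bool reciprocal_differs(double a, double c) {
  const double inv = 1.0 / c;  // what a / c -> a * (1 / c) would precompute
  return a / c != a * inv;
}

// Counts the integers a in [1, n] for which the rewrite changes the result.
int count_mismatches(int n, double c) {
  int count = 0;
  for (int a = 1; a <= n; ++a)
    if (reciprocal_differs(a, c)) ++count;
  return count;
}

// Usage: assert(reciprocal_differs(3.0, 10.0));
```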

> What about a ** 0 -> 1 (for all types, fast_math only)?

Sounds good. I think you can do that even without fast math. In Python, (-3) ** 0 evaluates to 1.

> • a ** 2 -> a * a (for all types, cast to the original result type)

We can actually weaken a ** n for all n <= 32, using exponentiation by squaring. std::pow is too costly.
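A minimal sketch of exponentiation by squaring for a non-negative integer exponent: it uses at most about 2 * log2(n) multiplications instead of a call to `std::pow`. In a compiler pass n is a compile-time constant, so the same loop would presumably run inside the compiler and emit the multiplications as IR statements rather than compute a value; `pow_by_squaring` is a hypothetical name.

```cpp
#include <cassert>

// Computes a^n for n >= 0 by exponentiation by squaring.
template <typename T>
T pow_by_squaring(T a, int n) {
  assert(n >= 0);
  T result = T(1);
  while (true) {
    if (n & 1) result *= a;  // include the current power of a for this bit
    n >>= 1;
    if (n == 0) break;       // skip an unneeded (possibly overflowing) square
    a *= a;                  // a now holds the original a^(2^k)
  }
  return result;
}
```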

@xumingkuan (Collaborator, Author)

> What about a ** 0 -> 1 (for all types, fast_math only)?

> Sounds good. I think you can do that even without fast math. In Python (-3) ** 0 = 1.

What should the type of 1 be, then? And what if a == 0 at runtime?

> • a ** 2 -> a * a (for all types, cast to the original result type)

> We can actually weaken a ** n for all n <= 32, using exponentiation by squaring. std::pow is too costly.

Shall we only optimize this when n is an integer? (Do we want to weaken something like a ** 10.0?)

@yuanming-hu (Member)

> What about a ** 0 -> 1 (for all types, fast_math only)?

> Sounds good. I think you can do that even without fast math. In Python (-3) ** 0 = 1.

> What should be the type of 1 then? And what if a == 0 in runtime?

The type should be the return type of the original pow statement. 0 ** 0 = 1 in most implementations of pow.
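The point about `0 ** 0` can be checked against the C++ standard library: under IEEE 754 arithmetic (C Annex F, which `std::pow` follows on common platforms), `pow(x, ±0)` returns 1 for every x, including 0. The sketch below models "the return type of the original pow statement" with a template parameter; `pow_zero_replacement` and `matches_std_pow` are hypothetical names, not Taichi code.

```cpp
#include <cassert>
#include <cmath>

// a ** 0 -> 1: the replacement constant is created with the result type of
// the original pow (the template parameter Result), whatever the base is.
template <typename Result, typename Base>
Result pow_zero_replacement(Base /*a: its value never matters*/) {
  return static_cast<Result>(1);
}

// Checks that the rewrite agrees with std::pow for this base (expected for
// any x under IEEE 754 / C Annex F semantics, including x == 0).
bool matches_std_pow(double a) {
  return std::pow(a, 0.0) == pow_zero_replacement<double>(a);
}
```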

> Shall we only optimize this when n is an integer? (Do we want to weaken something like a ** 10.0?)

Yes. For integral n with -32 <= n <= 32.
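The agreed condition (integral n with -32 <= n <= 32) could be checked with a helper like the hypothetical `reducible_exponent` below. It assumes the exponent arrives as a double and treats an integral-valued float such as 10.0 as qualifying; whether the PR accepts float exponents this way is not stated in the thread. For negative n, the natural rewrite is 1 / (a ** -n) with a ** -n built by squaring; how integer bases are handled in that case isn't discussed here.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Returns the exponent as an int if a constant RHS qualifies for weakening
// (finite, integral value, -32 <= n <= 32); otherwise std::nullopt, meaning
// the pow is left on the regular (costly) path.
std::optional<int> reducible_exponent(double rhs) {
  if (!std::isfinite(rhs)) return std::nullopt;
  if (rhs != std::trunc(rhs)) return std::nullopt;  // e.g. 2.5
  if (rhs < -32 || rhs > 32) return std::nullopt;   // outside agreed range
  return static_cast<int>(rhs);
}

// Usage: assert(reducible_exponent(10.0) == 10);
```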

@archibate (Collaborator)

Try out ti benchmark -T!

@xumingkuan (Collaborator, Author)

> Try out ti benchmark -T!

Cool, but probably doesn't make much sense on my laptop:

(taichi) C:\Users\xmk\Desktop\taichi>ti regression
[Taichi] mode=development
[Taichi] <dev mode>, supported archs: [cpu only], commit 13fece39, python 3.7.6

 *******************************************
 **     Taichi Programming Language       **
 *******************************************

x64::struct_______________________________________________
time_avg                          1.1 ->     1.1     +3.6%

x64::sscal________________________________________________
time_avg                      1.3e+02 -> 1.3e+02     +2.5%

x64::saxpy________________________________________________
time_avg                      1.9e+02 -> 1.9e+02     +0.3%

x64::root_listgen_________________________________________
time_avg                          6.7 ->     6.7     +0.3%

x64::range________________________________________________
time_avg                          1.1 ->     1.1     +2.6%

x64::nested_struct_listgen_8x8____________________________
time_avg                          6.6 ->     6.6     -0.6%

x64::nested_struct_listgen_16x16__________________________
time_avg                          7.4 ->     7.7     +4.7%

x64::nested_struct_fill_and_clear_________________________
time_avg                      6.2e+01 -> 7.1e+01    +13.4%

x64::nested_struct________________________________________
time_avg                      2.2e+01 -> 2.3e+01     +3.5%

x64::nested_range_blocked_________________________________
time_avg                          6.0 ->     6.0     +0.8%

x64::nested_range_________________________________________
time_avg                      1.2e+01 -> 1.2e+01     +0.0%

x64::memset_______________________________________________
time_avg                        1e+02 -> 1.1e+02     +6.2%

x64::memcpy_______________________________________________
time_avg                      1.4e+02 -> 1.4e+02     +1.1%

x64::flat_struct__________________________________________
time_avg                          6.4 ->     6.4     +0.3%

x64::flat_range___________________________________________
time_avg                          9.5 -> 1.1e+01    +14.1%

x64::fill_scalar__________________________________________
time_avg                        0.005 ->   0.003    -40.0%

>>> Running time: 0.01s

@yuanming-hu (Member) left a comment

LGTM! Thanks!

@@ -437,6 +437,23 @@ float64 TypedConstant::val_float() const {
}
}

TypedConstant TypedConstant::operator-() const {
@yuanming-hu (Member)

This should be replaced by the JIT evaluator in the future.

@xumingkuan (Collaborator, Author)

Yes. I need this operator because alg_simp and constant_fold are not currently run together iteratively.
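One reading of the reply above is that, because alg_simp and constant_fold are not iterated together, alg_simp has to compute a folded constant such as -c itself rather than emit a negation and rely on a later fold. The sketch below shows type-preserving unary minus on a tagged constant; `Constant` and its `std::variant` layout are hypothetical and are not Taichi's `TypedConstant`.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical tagged constant: the active alternative records the type.
using Value = std::variant<int32_t, int64_t, float, double>;

struct Constant {
  Value value;

  // Negates the stored value without changing its type
  // (an int32_t stays int32_t, a float stays float).
  Constant operator-() const {
    return Constant{std::visit(
        [](auto v) -> Value { return static_cast<decltype(v)>(-v); }, value)};
  }
};

// Usage: assert(std::get<float>((-Constant{1.5f}).value) == -1.5f);
```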
