[Opt] Add strength reduction optimizations #1065

xumingkuan · 2020-05-26T20:54:03Z

Related issue = #944

This PR contains:

a * 2 -> a + a, 2 * a -> a + a (for all types, cast to the original result type)
a / const -> a * (1 / const) (floating point only & fast_math only -- is this necessary?)
a ** 1 -> a (for all types)
a ** 0 -> 1 (for all types & fast_math only, cast to the original result type)
a ** 2 -> a * a (for all types, cast to the original result type)
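
The rules above can be illustrated on plain Python numbers. This is only a sketch of the arithmetic identities, not the IR pass added by this PR; the function names are hypothetical. It also suggests why the division rule is restricted to fast_math: multiplying by a precomputed reciprocal can round differently from a true division.

```python
import math

# Hypothetical helpers that mirror each rewrite rule on ordinary Python values.

def mul_by_two(a):
    # a * 2 -> a + a: doubling gives the same result either way.
    return a + a

def div_by_const(a, c):
    # a / c -> a * (1 / c): the reciprocal is rounded once, then the product
    # is rounded again, so the result may differ from a / c in the last bit.
    return a * (1.0 / c)

def pow_one(a):
    # a ** 1 -> a
    return a

def pow_zero(a):
    # a ** 0 -> 1, cast to the type of the original result.
    return type(a)(1)

def pow_two(a):
    # a ** 2 -> a * a
    return a * a

print(mul_by_two(7) == 7 * 2)        # True
print(pow_two(5) == 5 ** 2)          # True
print(3.0 / 10.0)                    # 0.3
print(div_by_const(3.0, 10.0))       # 0.30000000000000004
print(math.isclose(3.0 / 10.0, div_by_const(3.0, 10.0)))  # True
```

The last three lines show a case where the reduced form is close to, but not bit-identical with, the exact division.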


xumingkuan · 2020-05-26T20:57:29Z

What about a ** 0 -> 1 (for all types, fast_math only)?

yuanming-hu · 2020-05-26T21:22:28Z

> a / const -> a * (1 / const) (floating point only & fast_math only -- is this necessary?)

Yes if the RHS is a const, then we should do it under fast math.

> What about a ** 0 -> 1 (for all types, fast_math only)?

Sounds good. I think you can do that even without fast math. In Python (-3) ** 0 = 1.

> a ** 2 -> a * a (for all types, cast to the original result type)

We can actually weaken a ** n for all n <= 32, using exponentiation by squaring. std::pow is too costly.

xumingkuan · 2020-05-26T21:29:10Z

> > What about a ** 0 -> 1 (for all types, fast_math only)?
>
> Sounds good. I think you can do that even without fast math. In Python (-3) ** 0 = 1.

What should be the type of 1 then? And what if a == 0 in runtime?

> > a ** 2 -> a * a (for all types, cast to the original result type)
>
> We can actually weaken a ** n for all n <= 32, using exponentiation by squaring. std::pow is too costly.

Shall we only optimize this when n is an integer? (Do we want to weaken something like a ** 10.0?)

yuanming-hu · 2020-05-26T21:32:07Z

> > > What about a ** 0 -> 1 (for all types, fast_math only)?
> >
> > Sounds good. I think you can do that even without fast math. In Python (-3) ** 0 = 1.
>
> What should be the type of 1 then? And what if a == 0 in runtime?

The type should be the return type of the original pow statement. 0 ** 0 = 1 in most implementations of pow.

> Shall we only optimize this when n is an integer? (Do we want to weaken something like a ** 10.0?)

Yes. For integral n and -32 <= n <= 32.
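
For reference, exponentiation by squaring over the range discussed here (integral n with -32 <= n <= 32) might look like the sketch below. It is an illustration in plain Python, not the code in this PR: `pow_by_squaring` is a hypothetical name, the loop runs at call time whereas a compiler pass would emit the multiplications once at compile time, and the handling of negative exponents (taking the reciprocal of the positive power) is an assumption.

```python
def pow_by_squaring(a, n):
    """Compute a ** n for integral n in [-32, 32] using only multiplications
    (plus one division when n is negative)."""
    if not isinstance(n, int) or not -32 <= n <= 32:
        raise ValueError("only integral exponents in [-32, 32] are reduced")

    one = type(a)(1)      # the constant 1 in the type of the base
    if n == 0:
        return one        # a ** 0 -> 1, including the case a == 0

    result = one
    base = a
    k = abs(n)
    while k > 0:
        if k & 1:         # this bit of the exponent is set
            result = result * base
        base = base * base
        k >>= 1

    # Assumption: a negative exponent is the reciprocal of the positive power.
    return one / result if n < 0 else result


print(pow_by_squaring(3, 5))      # 243
print(pow_by_squaring(2.0, -3))   # 0.125
print(pow_by_squaring(0, 0))      # 1
```

The loop performs one squaring per bit of |n| and one extra multiplication per set bit, so for |n| <= 32 it needs at most a dozen multiplications instead of a call to a general pow routine.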

archibate · 2020-05-27T01:12:42Z

Try out ti benchmark -T!

xumingkuan · 2020-05-27T01:54:54Z

> Try out ti benchmark -T!

Cool, but probably doesn't make much sense on my laptop:

(taichi) C:\Users\xmk\Desktop\taichi>ti regression
[Taichi] mode=development
[Taichi] <dev mode>, supported archs: [cpu only], commit 13fece39, python 3.7.6

 *******************************************
 **     Taichi Programming Language       **
 *******************************************

x64::struct_______________________________________________
time_avg                          1.1 ->     1.1     +3.6%

x64::sscal________________________________________________
time_avg                      1.3e+02 -> 1.3e+02     +2.5%

x64::saxpy________________________________________________
time_avg                      1.9e+02 -> 1.9e+02     +0.3%

x64::root_listgen_________________________________________
time_avg                          6.7 ->     6.7     +0.3%

x64::range________________________________________________
time_avg                          1.1 ->     1.1     +2.6%

x64::nested_struct_listgen_8x8____________________________
time_avg                          6.6 ->     6.6     -0.6%

x64::nested_struct_listgen_16x16__________________________
time_avg                          7.4 ->     7.7     +4.7%

x64::nested_struct_fill_and_clear_________________________
time_avg                      6.2e+01 -> 7.1e+01    +13.4%

x64::nested_struct________________________________________
time_avg                      2.2e+01 -> 2.3e+01     +3.5%

x64::nested_range_blocked_________________________________
time_avg                          6.0 ->     6.0     +0.8%

x64::nested_range_________________________________________
time_avg                      1.2e+01 -> 1.2e+01     +0.0%

x64::memset_______________________________________________
time_avg                        1e+02 -> 1.1e+02     +6.2%

x64::memcpy_______________________________________________
time_avg                      1.4e+02 -> 1.4e+02     +1.1%

x64::flat_struct__________________________________________
time_avg                          6.4 ->     6.4     +0.3%

x64::flat_range___________________________________________
time_avg                          9.5 -> 1.1e+01    +14.1%

x64::fill_scalar__________________________________________
time_avg                        0.005 ->   0.003    -40.0%


>>> Running time: 0.01s

yuanming-hu

LGTM! Thanks!

yuanming-hu · 2020-05-27T02:55:03Z

taichi/lang_util.cpp

@@ -437,6 +437,23 @@ float64 TypedConstant::val_float() const {
  }
 }

+TypedConstant TypedConstant::operator-() const {


This should be replaced by the JIT evaluator in the future.

Yes. I need this operator because we don't do alg_simp and constant_fold iteratively together now.

[Opt] Add strength reduction optimizations

43754e1

xumingkuan added 3 commits May 26, 2020 17:07

[skip ci] code format

e63143c

add a ** 0 -> 1

db31e83

[skip ci] code format

2584b4f

xumingkuan added 4 commits May 26, 2020 20:08

add a ** n

047f124

fix tests

ccce0d6

[skip ci] code format

dbf757e

[skip ci] code format

13fece3

xumingkuan requested a review from yuanming-hu May 27, 2020 01:47

yuanming-hu approved these changes May 27, 2020

xumingkuan merged commit 29eae0b into master May 27, 2020

xumingkuan deleted the strength branch May 27, 2020 19:37

xumingkuan mentioned this pull request May 27, 2020

[Opt] Add a strength reduction pass #944

Open

2 tasks

yuanming-hu mentioned this pull request May 31, 2020

[release] v0.6.7 #1094

Merged

xumingkuan mentioned this pull request Jun 7, 2020

[Opt] Improve strength reduction optimization for "pow" #1170

Merged
