
[opt] Avoid storing constants across offloaded tasks #812

Merged
merged 2 commits into taichi-dev:master on Apr 18, 2020

Conversation

xumingkuan
Collaborator

Related issues: #656, #729

Test case: test_local_atomics.py::test_implicit_local_atomic_or
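For context, here is a plain-Python analogue of what that test computes, reconstructed from the IR dump below (the actual Taichi test in test_local_atomics.py may differ in details):

```python
# Plain-Python analogue of test_implicit_local_atomic_or, reconstructed
# from the IR dump below (the real Taichi test may differ).
a = 0
for i in range(10):
    # Each iteration sets bit i, mirroring `atomic bit_or($14, pow($11, $12))`
    # inside the offloaded range_for task.
    a |= 2 ** i
print(a)  # 1023, i.e. bits 0..9 all set
```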

before:

kernel {
  $0 = offloaded  {
    <i32*x1> $1 = global tmp var (offset = 0 B)
    <i32 x1> $2 = const [0]
    <i32*x1> $3 : global store [$1 <- $2]
    <i32*x1> $4 = global tmp var (offset = 4 B)
    <i32*x1> $5 : global store [$4 <- $2]
    <i32*x1> $6 = global tmp var (offset = 0 B)
    <i32*x1> $7 = global tmp var (offset = 4 B)
    <i32 x1> $8 = global load $7
    <i32*x1> $9 : global store [$6 <- $8]
  }
  $10 = offloaded range_for(0, 10) block_dim=adaptive {
    <i32 x1> $11 = const [2]
    <i32 x1> $12 = loop index 0
    <i32 x1> $13 = pow $11 $12
    <i32*x1> $14 = global tmp var (offset = 0 B)
    <i32 x1> $15 = atomic bit_or($14, $13)
  }
  $16 = offloaded  {
    <i32*x1> $17 = global tmp var (offset = 0 B)
    <i32 x1> $18 = global load $17
    <gen*x1> $19 = get root
    <i32*x1> $20 = global tmp var (offset = 4 B)
    <i32 x1> $21 = global load $20
    <gen*x1> $22 = [S0root][root]::lookup($19, $21) activate = false
    <gen*x1> $23 = get child [S0root->S1dense] $22
    <i32*x1> $24 = global tmp var (offset = 4 B)
    <i32 x1> $25 = global load $24
    <gen*x1> $26 = [S1dense][dense]::lookup($23, $25) activate = false
    <i32*x1> $27 = get child [S1dense->S2place_i32] $26
    <i32*x1> $28 : global store [$27 <- $18]
  }
}

after:

kernel {
  $0 = offloaded  {
    <i32*x1> $1 = global tmp var (offset = 0 B)
    <i32 x1> $2 = const [0]
    <i32*x1> $3 : global store [$1 <- $2]
    <i32*x1> $4 = global tmp var (offset = 0 B)
    <i32*x1> $5 : global store [$4 <- $2]
  }
  $6 = offloaded range_for(0, 10) block_dim=adaptive {
    <i32 x1> $7 = const [2]
    <i32 x1> $8 = loop index 0
    <i32 x1> $9 = pow $7 $8
    <i32*x1> $10 = global tmp var (offset = 0 B)
    <i32 x1> $11 = atomic bit_or($10, $9)
  }
  $12 = offloaded  {
    <i32*x1> $13 = global tmp var (offset = 0 B)
    <i32 x1> $14 = global load $13
    <gen*x1> $15 = get root
    <i32 x1> $16 = const [0]
    <gen*x1> $17 = [S0root][root]::lookup($15, $16) activate = false
    <gen*x1> $18 = get child [S0root->S1dense] $17
    <gen*x1> $19 = [S1dense][dense]::lookup($18, $16) activate = false
    <i32*x1> $20 = get child [S1dense->S2place_i32] $19
    <i32*x1> $21 : global store [$20 <- $14]
  }
}
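The transformation can be modeled with a toy pass in plain Python (a minimal sketch of the idea, not Taichi's actual C++ implementation; the op encoding below is made up): if a global temporary is written exactly once, with a constant, then loads of it in any offloaded task can be replaced by re-materializing the constant locally, and the store becomes dead. This is why the offset-4 temporary disappears above and the final task uses const [0] directly.

```python
# Toy IR: each task is a list of op tuples (hypothetical encoding):
#   ("const", dst, value) | ("store", slot, src) | ("load", dst, slot) | ("use", src)
def propagate_constants_across_tasks(tasks):
    consts, stores = {}, {}  # SSA name -> value; tmp slot -> stored SSA names
    for task in tasks:
        for op in task:
            if op[0] == "const":
                consts[op[1]] = op[2]
            elif op[0] == "store":
                stores.setdefault(op[1], []).append(op[2])
    # Slots written exactly once, with a known constant.
    const_slots = {slot: consts[srcs[0]]
                   for slot, srcs in stores.items()
                   if len(srcs) == 1 and srcs[0] in consts}
    rewritten = []
    for task in tasks:
        new_task = []
        for op in task:
            if op[0] == "load" and op[2] in const_slots:
                # Re-materialize the constant instead of loading the temporary.
                new_task.append(("const", op[1], const_slots[op[2]]))
            elif op[0] == "store" and op[1] in const_slots:
                continue  # all loads were rewritten, so the store is dead
            else:
                new_task.append(op)
        rewritten.append(new_task)
    return rewritten

before = [
    [("const", "c0", 0), ("store", "tmp4", "c0")],  # serial task writes const 0
    [("load", "x", "tmp4"), ("use", "x")],          # later task reads it back
]
after = propagate_constants_across_tasks(before)
# The now-dead ("const", "c0", 0) would be removed by a later dead-code pass.
print(after)
```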


@xumingkuan
Collaborator Author

BTW, I wonder if FixCrossOffloadReferences::visit(AtomicOpStmt *stmt) and visit(LocalLoadStmt *stmt) should consider the stmt_to_offloaded[stmt] != stmt_to_offloaded[op] case where op is stmt->val for AtomicOpStmt or stmt->data for LocalLoadStmt.

@yuanming-hu (Member) left a comment

Thank you and LGTM! Sorry about the force-pushes... (Let me know if I screw up anything.)

@yuanming-hu
Member

yuanming-hu commented Apr 17, 2020

> BTW, I wonder if FixCrossOffloadReferences::visit(AtomicOpStmt *stmt) and visit(LocalLoadStmt *stmt) should consider the stmt_to_offloaded[stmt] != stmt_to_offloaded[op] case where op is stmt->val for AtomicOpStmt or stmt->data for LocalLoadStmt.

I assume you are talking about LocalStoreStmt. I tried this:

import taichi as ti

ti.init()

@ti.kernel
def f():
    a = 1
    b = 0
    for i in range(10):
        b += a

    print(b)

f()

and it indeed crashes ...

[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-we9vtefg
[Taichi] sandbox prepared
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 365cad6e, python 3.6.9
[E 04/17/20 19:42:12.768] [verify.cpp:basic_verify@37] stmt 12 cannot have operand 1.


***********************************
* Taichi Compiler Stack Traceback *
***********************************
/tmp/taichi-we9vtefg/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::IRVerifier::basic_verify(taichi::lang::Stmt*)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::irpass::analysis::verify(taichi::lang::IRNode*)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::irpass::compile_to_offloads(taichi::lang::IRNode*, taichi::lang::CompileConfig const&, bool, bool, bool, bool)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::Kernel::lower()
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::Program::compile(taichi::lang::Kernel&)
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::Kernel::compile()
/tmp/taichi-we9vtefg/taichi_core.so: taichi::lang::Kernel::operator()()
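The verifier failure can be modeled with a toy scoping check (a minimal sketch with a made-up data model, not Taichi's real IRVerifier): after offloading, the statement implementing b += a lands in the loop task while a is defined in the preceding serial task, so the statement references an operand outside its own task, which basic_verify rejects.

```python
# Toy model of the scoping rule behind the `basic_verify` error above
# (hypothetical data model): every statement's operands must be defined
# earlier within the same offloaded task.
def basic_verify(tasks):
    for task in tasks:
        defined = set()
        for stmt, operands in task:
            for op in operands:
                if op not in defined:
                    raise ValueError(f"stmt {stmt} cannot have operand {op}")
            defined.add(stmt)

# `a = 1; b = 0; for i in range(10): b += a` lowered naively: `a` (stmt 1)
# lives in the serial task, but stmt 12 in the loop task still references it.
tasks = [
    [(1, []), (2, [])],         # serial task: a = 1, b = 0
    [(11, []), (12, [1, 11])],  # loop task: i, b += a  -> cross-task operand 1
]
try:
    basic_verify(tasks)
except ValueError as e:
    print(e)  # stmt 12 cannot have operand 1
```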

If you already have an idea to fix this, could you open another PR for this one? I'll just merge this in after CI passes.

@yuanming-hu yuanming-hu merged commit 5fb79eb into taichi-dev:master Apr 18, 2020
@xumingkuan
Collaborator Author

Thanks for providing an example; I'll open another PR to fix it.
