# Classical Loop Transformations

## Set up the AST generation infrastructure

In [1]:
import html

import isl


class CSource():
  """Wraps an isl AST node so the notebook renders it as highlighted C code."""

  def __init__(self, ast):
    self.source = ast

  def _repr_html_(self):
    # Escape the generated C so that operators such as "<" are not parsed as HTML.
    code = html.escape(self.source.to_C_str())
    return "<pre class='code'><code class=\"cpp hljs\">" + code + "</code></pre>"


class CSourceComparer():
  """Renders two CSource objects one after the other, labelled before and after."""

  def __init__(self, before: CSource, after: CSource):
    self.before = before
    self.after = after

  def _repr_html_(self):
    s = "<b>Before Transform:</b>\n"
    s += self.before._repr_html_()
    s += "<b>After Transform:</b>\n"
    s += self.after._repr_html_()
    return s


def print_before_after(domain, schedule_original, schedule_new):
  """Generate C code for both schedules, restricted to `domain`, for comparison."""
  context = isl.set("{ : }")  # universe context: no constraints on the parameters
  build = isl.ast_build.from_context(context)
  # Restrict each schedule to the statement instances that actually execute.
  schedule_original = schedule_original.intersect_domain(domain)
  schedule_new = schedule_new.intersect_domain(domain)
  return CSourceComparer(CSource(build.node_from_schedule_map(schedule_original)),
                         CSource(build.node_from_schedule_map(schedule_new)))


## Loop Reversal

Loop reversal changes the direction in which the iterations of a loop are visited. After loop reversal, the iteration that previously ran first runs last, and the iteration that previously ran last runs first.

**Benefits:**
- Can be used to shorten dependences


In [2]:
domain = isl.union_set("[n] -> {S[i] : 0 <= i < n}")  # original iteration domain
original = isl.union_map("{S[i] -> [i]}")  # original schedule
transformation = isl.union_map("{[i] -> [-i]}")  # negate the schedule dimension

transformed = original.apply_range(transformation)  # apply the transformation
print_before_after(domain, original, transformed)  # schedule values move from [0, n-1] to [-n+1, 0]
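
The effect of the schedule `{[i] -> [-i]}` can also be checked without isl. The following is a minimal plain-Python sketch: it sorts statement instances by their schedule value, which is how a schedule determines execution order. The helper name `execution_order` and the concrete size `n = 5` are assumptions made for illustration and are not part of isl.

```python
def execution_order(domain, schedule):
    """Return the instances of `domain` sorted by their schedule value."""
    return sorted(domain, key=schedule)


n = 5
domain = range(n)                                   # S[i] : 0 <= i < n
original = execution_order(domain, lambda i: i)     # S[i] -> [i]
reversed_ = execution_order(domain, lambda i: -i)   # S[i] -> [-i]

print(original)   # [0, 1, 2, 3, 4]
print(reversed_)  # [4, 3, 2, 1, 0]
```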

## Loop Fusion

After loop fusion, two statements that were previously enumerated by different loops are enumerated by a single loop.

**Benefits:**
- Improves data locality


In [3]:
domain = isl.union_set("[n] -> {S[i] : 0 <= i <= n; T[i] : 0 <= i <= n}")  # two original loops, one for S and one for T
original = isl.union_map("{S[i] -> [0, i]; T[i] -> [1, i]}")  # original schedule: the leading constant orders the two loops
transformation = isl.union_map("{[0, i] -> [i, 0]; [1, i] -> [i, 1]}")  # move the statement order inside a shared loop
transformed = original.apply_range(transformation)  # NOTE: in practice, check first that fusion respects the data dependences
print_before_after(domain, original, transformed)
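
To see how moving the constant from the first to the last schedule dimension interleaves the two statements, the execution order can be enumerated in plain Python. This is only a sketch: the functions `separate` and `fused` are hypothetical names that mirror the two schedules, and `n = 2` is chosen to keep the output short.

```python
n = 2
# All statement instances: S[i] and T[i] for 0 <= i <= n.
instances = [(name, i) for name in ("S", "T") for i in range(n + 1)]


def separate(inst):
    """Original schedule: S[i] -> [0, i]; T[i] -> [1, i]."""
    name, i = inst
    return (0, i) if name == "S" else (1, i)


def fused(inst):
    """Fused schedule: S[i] -> [i, 0]; T[i] -> [i, 1]."""
    name, i = inst
    return (i, 0) if name == "S" else (i, 1)


# Tuples compare lexicographically, just like schedule vectors.
before = sorted(instances, key=separate)
after = sorted(instances, key=fused)

print(before)  # [('S', 0), ('S', 1), ('S', 2), ('T', 0), ('T', 1), ('T', 2)]
print(after)   # [('S', 0), ('T', 0), ('S', 1), ('T', 1), ('S', 2), ('T', 2)]
```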

## Loop Fission (Loop Distribution)

Loop fission takes two statements that were originally executed in the same loop and distributes them to two separate loops.

**Benefits:**
- Reduces register pressure (statements that share a loop body compete for the registers that hold array addresses, indices, and similar values)
- Enables other transformations, e.g. SIMDization when only one of the two statements in the loop body allows parallel execution; that statement can be split off and parallelized on its own

In [4]:
domain = isl.union_set("[n] -> {S[i] : 0 <= i <= n; T[i] : 0 <= i <= n}")
original = isl.union_map("{S[i] -> [i, 0]; T[i] -> [i, 1]}")
transformation = isl.union_map("{[i, 0] -> [0, i]; [i, 1] -> [1, i]}")

transformed = original.apply_range(transformation)
print_before_after(domain, original, transformed)
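
Whether fission is legal depends on the dependences between the two statements. The sketch below uses two hypothetical statement bodies (`a[i] = b[i] + 1` for S and `c[i] = a[i] * 2` for T, neither of which comes from the notebook) where T[i] only reads what S[i] wrote, so running all of S before all of T gives the same result.

```python
def single_loop(b):
    """S and T share one loop body."""
    a = [0] * len(b)
    c = [0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + 1  # S[i]
        c[i] = a[i] * 2  # T[i], reads the value S[i] just wrote
    return a, c


def distributed_loops(b):
    """After fission: one loop for S, then one loop for T."""
    a = [0] * len(b)
    c = [0] * len(b)
    for i in range(len(b)):  # first loop contains only S
        a[i] = b[i] + 1
    for i in range(len(b)):  # second loop contains only T
        c[i] = a[i] * 2
    return a, c


print(single_loop([1, 2, 3]))        # ([2, 3, 4], [4, 6, 8])
print(distributed_loops([1, 2, 3]))  # ([2, 3, 4], [4, 6, 8])
```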

## Loop Interchange

Loop interchange swaps the order of two nested loops.

In [5]:
domain = isl.union_set("[n,m] -> {S[i,j] : 0 <= i <= n and 0 <= j <= m }")
original = isl.union_map("{S[i,j] -> [i, j]}")
transformation = isl.union_map("{[i, j] -> [j, i]}")

transformed = original.apply_range(transformation)
print_before_after(domain, original, transformed)
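
As a quick check of what `{[i, j] -> [j, i]}` does to the visiting order, the instances can be sorted by the swapped schedule tuple in plain Python. The sizes `n = 1` and `m = 2` are assumptions chosen only to keep the lists small.

```python
n, m = 1, 2
# S[i,j] : 0 <= i <= n and 0 <= j <= m
domain = [(i, j) for i in range(n + 1) for j in range(m + 1)]

before = sorted(domain)                             # S[i,j] -> [i, j]
after = sorted(domain, key=lambda p: (p[1], p[0]))  # S[i,j] -> [j, i]

print(before)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(after)   # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Before the interchange `j` varies fastest; afterwards `i` does.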

## Strip Mining

Strip mining partitions a single loop into chunks that are enumerated by two loops. An outer loop enumerates the individual blocks, whereas the inner loop enumerates the individual iterations that belong to each block.

**Benefits:**
- Building block for loop tiling and unroll-and-jam


In [6]:
domain = isl.union_set("{S[i] : 0 <= i < 1024 }")
original = isl.union_map("{S[i] -> [i]}")
transformation = isl.union_map("{[i] -> [floor(i/4), i % 4]}")

transformed = original.apply_range(transformation)
print_before_after(domain, original, transformed)
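
Strip mining on its own does not change the execution order; it only exposes a block loop and a point loop. A plain-Python sketch of the resulting loop nest follows. The function name `strip_mined` and its `block` parameter are hypothetical, and the `min` guard handles trip counts that are not a multiple of the block size (the notebook's 1024 is a multiple of 4, so it would not need one).

```python
def strip_mined(n, block=4):
    """Visit 0..n-1 with an outer block loop and an inner point loop."""
    order = []
    for c0 in range(0, n, block):                  # block loop
        for i in range(c0, min(c0 + block, n)):    # point loop within one block
            order.append(i)
    return order


print(strip_mined(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# The schedule tuple [floor(i/4), i % 4] for the first six iterations:
print([(i // 4, i % 4) for i in range(6)])
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)]
```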

## Loop Tiling

Loop tiling partitions the execution of a multi-dimensional loop into groups, the tiles. First a set of outer loops enumerates all tiles that must be executed, and for each tile a set of inner loops, the point loops, enumerates the individual points of the tile.

**Benefits:**
- Increased data locality
- More coarse-grained parallelism

In [7]:
domain = isl.union_set("{S[i,j] : 0 <= i,j < 1024 }")
original = isl.union_map("{S[i,j] -> [i,j]}")
# Like strip mining, applied to both dimensions; the two tile loops come first, then the two point loops.
transformation = isl.union_map("{[i,j] -> [floor(i/4), floor(j/4), i % 4, j % 4]}")

transformed = original.apply_range(transformation)
print_before_after(domain, original, transformed)
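
The description of tiling puts the tile loops outermost and the point loops innermost. The plain-Python sketch below enumerates that order under stated assumptions: a 4x4 domain and 2x2 tiles instead of the notebook's sizes, and a schedule of the form `[floor(i/T), floor(j/T), i % T, j % T]`. The helper `tiled_order` is a hypothetical name.

```python
def tiled_order(n, tile):
    """Order the points of an n x n domain tile by tile."""
    domain = [(i, j) for i in range(n) for j in range(n)]
    return sorted(
        domain,
        # Tile coordinates first, then the position inside the tile.
        key=lambda p: (p[0] // tile, p[1] // tile, p[0] % tile, p[1] % tile),
    )


order = tiled_order(4, 2)
print(order[:4])   # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(order[4:8])  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

All four points of one tile are visited before the next tile starts, which is where the locality benefit comes from.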

## Unroll-and-jam

Unroll-and-jam combines strip mining of the outer loop into a tile loop and a point loop with an interchange of the new point loop and the innermost loop dimension.

**Benefits:**
- Enables outer loop vectorization


In [8]:
domain = isl.union_set("{S[i,j] : 0 <= i,j < 1024 }")
original = isl.union_map("{S[i,j] -> [i,j]}")
transformation = isl.union_map("{[i,j] -> [floor(i/4), j, i % 4] }")

transformed = original.apply_range(transformation)
print_before_after(domain, original, transformed)
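
The loop nest that corresponds to `[floor(i/4), j, i % 4]` can be written out by hand. This is a plain-Python sketch with a hypothetical helper `unroll_and_jam_order` and a small 4x4 domain with factor 2; in generated code the short innermost loop is the one a compiler would typically unroll or vectorize.

```python
def unroll_and_jam_order(n, factor):
    """Visit an n x n domain with the outer loop's point loop moved innermost."""
    order = []
    for c0 in range(0, n, factor):                     # tile loop over i
        for j in range(n):                             # original inner loop
            for i in range(c0, min(c0 + factor, n)):   # point loop over i, now innermost
                order.append((i, j))
    return order


order = unroll_and_jam_order(4, 2)
print(order[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```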

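## Skewing

Skewing shears the iteration domain by offsetting one schedule dimension by another.

**Benefits:**
- Allows statement instances that have no dependences between them to be executed in parallel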


In [9]:
domain = isl.union_set("[n] -> {S[i,j] : 0 <= i,j < n }")
original = isl.union_map("{S[i,j] -> [i,j]}")
transformation = isl.union_map("{[i,j] -> [i, i + j]}")

transformed = original.apply_range(transformation)
print_before_after(domain, original, transformed)
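
With the schedule `[i, i + j]` the inner loop bounds depend on the outer loop variable, but the execution order is unchanged. The plain-Python sketch below writes out that loop nest and then groups instances by the skewed coordinate `i + j`. The helper `skewed_order` is a hypothetical name, and the remark about parallelism assumes a hypothetical stencil in which S[i,j] depends only on S[i-1,j] and S[i,j-1]; under that assumption, instances with the same `i + j` do not depend on each other.

```python
def skewed_order(n):
    """Enumerate S[i,j] using the skewed coordinates c0 = i, c1 = i + j."""
    order = []
    for c0 in range(n):               # c0 = i
        for c1 in range(c0, c0 + n):  # c1 = i + j, so c0 <= c1 < c0 + n
            order.append((c0, c1 - c0))  # recover (i, j) from (c0, c1)
    return order


n = 3
order = skewed_order(n)
# Skewing alone keeps the original row-by-row order.
print(order == [(i, j) for i in range(n) for j in range(n)])  # True

# Group instances by the skewed coordinate i + j (one group per wavefront).
wavefronts = {}
for i, j in order:
    wavefronts.setdefault(i + j, []).append((i, j))
print(wavefronts[2])  # [(0, 2), (1, 1), (2, 0)]
```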