Skip to content
yyan7223 edited this page Jul 28, 2023 · 27 revisions

From .cpp to DFG

Finite Impulse Response kernel.cpp

LLVM Intermediate Representation

Data Flow Graph (DFG)

Mapping DFG to CGRA

heuristicMap()

Mapper.heuristicMap() is responsible for mapping DFG to CGRA, and its pseudocode is as follows:

II = Max(RecMII,ResMII)
While(1){
    constructMRRG(DFG, CGRA, II) // construct Modulo Routing Resource Graph (MRRG) according to DFG, CGRA, and current II 

    for each node in DFGNodes do
        for each Tile in CGRA do
            [path, timeCost] = caculateCost(node, Tile) // find the shortest path (time cost minimum) that mapping current DFGNode to current Tile
            pathList.append([path, timeCost])
        end for
        optimalPath = getPathWithMinCostAndConstraints(pathList) // find the path with minimum time cost and other constraints
        flag = schedule(optimalPath) // occupy the CGRALink among the optimalPath, allocate the register for the last CGRANode of the optimalPath
    end for

    if flag 
        break // mapping success
    else
        II += 1 // update II, repeat steps above
}

constructMRRG()

Mapper.constructMRRG() is responsible for constructing the Modulo Routing Resource Graph (MRRG) according to DFG, CGRA, and current II Assume that a 2x2 CGRA with II=4, MRRG can be interpreted as the duplicated representation of Tile and Link resources in the CGRA along the time axis, so that the mapper can obtain te occupy status of each Tile and Link at each time cycle, the conduct the mapping process accordingly.

  • The Tile resource includes Tile0, Tile1, Tile2, Tile3.
  • The Link resource, or, the routing fabric, includes the bi-directional interconnect between Tile0 and Tile1, Tile0 and Tile2, Tile3 and Tile1, Tile3 and Tile2.

calculateCost()

Assume that the DFGNode add has been mapped to MRRG Tile1 at Cycle0, now the mapper is trying to determine the placement of its successor DFGNode cmp on MRRG.

As CGRA2x2 only has 4 Tile, so there are generally 4 placement options for DFGNode cmp, including Tile0, Tile1, Tile2, Tile3, whose exact cycle in MRRG is confirmed by dijkstra_search() function:

(a) MRRG Tile1 at Cycle0: use the same Tile as its predecessor, the result of DFGNode 'add' is directly registered in Tile 1, no routing fabric is required, with timeCost = 0

(b) MRRG Tile0 at Cycle1: the result of DFGNode add is transfered from MRRG Tile1 at Cycle0 to MRRG Tile0 at Cycle1, with timeCost = 1

(c) MRRG Tile3 at Cycle1: the result of DFGNode add is transfered from MRRG Tile1 at Cycle0 to MRRG Tile3 at Cycle1, with timeCost = 1

(d) MRRG Tile2 at Cycle2: the result of DFGNode add is transfered from MRRG Tile1 at Cycle0, through MRRG Tile0/Tile3 at Cycle1, to MRRG Tile2 at Cycle2, with timeCost = 2 (there is no routing fabric between Tile1 and Tile2, so an addition Tile0/Tile3 must be adopted to bridge the result of DFGNode add)

dijkstra_search() is used for calculating the shortest path between the source Tile to destination Tile. Take placement option (a) as an example, the source node is MRRG Tile1 at Cycle0, the destination Tile is MRRG Tile1 at any Cycle, so the path might be:

  • MRRG Tile1 at Cycle0->MRRG Tile1 at Cycle1
  • MRRG Tile1 at Cycle0->MRRG Tile0 at Cycle1->MRRG Tile1 at Cycle2
  • MRRG Tile1 at Cycle0->MRRG Tile3 at Cycle1->MRRG Tile3 at Cycle2->MRRG Tile1 at Cycle3
  • MRRG Tile1 at Cycle0->MRRG Tile0 at Cycle1->MRRG Tile2 at Cycle2->MRRG Tile3 at Cycle3->MRRG Tile1 at Cycle4
  • ......

Although there are actually infinite possible options fo the path, it is obvious that MRRG Tile1 at Cycle0->MRRG Tile1 at Cycle1 is the optimal path with the minimum timeCost=0. However in some complex cases, identifying the optimal path is not as easy as the example above, so the dijkstra algorithm is adopted to iteratively search the optimal path.

getPathWithMinCostAndConstraints()

Go back to the pesudo code:

for each Tile in CGRA do
    [path, timeCost] = caculateCost(node, Tile)
    pathList.append([path, timeCost])
end for

This part of code calculate the optimal path of mapping DFGNode cmp to Tile0, Tile1, Tile2, Tile3, respectively. And together with their respective timeCost, are stored in pathList. Mapper.getPathWithMinCostAndConstraints() takes other constraints into account, calculate the overall cost for each path in pathList, finally output the path with the minimum cost, and pass to Mapper.schedule()

There are many constraints that need to be concerned for each path, for example

  • If a non-memory access operation is mapped on a Tile that support the memory access operation, then a negative cost should be applied to this path. For a memory-intensive DFG, the Tile with the memory access capability should be better left for Load/Store operation.

schedule()

For the optimal path output by Mapper.getPathWithMinCostAndConstraints(), for example:

MRRG Tile1 at Cycle0->MRRG Tile3 at Cycle1->MRRG Tile2 at Cycle2

In this path, the mapper uses setDFGNode() to record:

  • DFGNode cmp is mapped to MRRG Tile2 at Cycle2

and, uses setDFGNode() to record:

  • The routing fabric between MRRG Tile1 at Cycle0 and MRRG Tile3 at Cycle1
  • The routing fabric between MRRG Tile3 at Cycle1 and MRRG Tile2 at Cycle2

schedule()

CGRA Architecture & Timing

The details of the CGRA architecture is firstly introduced, followed by the timing analysis of a concrete operation 'multiplication'

CGRA Architecture

CGRA Timing