									Technology Mapping

Outline
   – What is Technology Mapping?
   – Rule-Based Mapping
   – Tree Pattern Matching


Goal
   – Understand technology mapping
   – Understand mapping algorithms
   – Understand mapping issues
   What is Technology Mapping?

• Map optimized logic to
  primitive cell library
• Library capabilities
   – function
   – cell size
   – cell performance
• Library restrictions
   – fan-in
   – fan-out
• Goal
   – delay
   – area
   – power
   – etc.
• Example library cell: AOI33
   – area 3248
   – delay 0.8
   – power 0.08
                      Cell Libraries

• Custom cells
   – synthesize functions as needed
   – e.g. complementary, domino, or CVSL CMOS
   [Figure: transistor-level cell with inputs a and b and output labeled ab]
• Standard cells
   – library of fixed functions
   – use as needed
   – e.g. 3-input AND-OR
• Gate arrays
   – fixed population of gates and gate types
   – must fit design into available gates
• FPGAs
   – programmable function blocks
   – e.g. all functions of 4 inputs
               Custom Cell Synthesis
• Mechanically transform function to custom cell
     – complementary CMOS - cell is guaranteed to work
         » but it might be slow - e.g. 20-input NAND gate
     – avoid problems by limiting functions in synthesis process
         » limit fan-in, fan-out of functions
         » chop up large functions
              ABCDEF => (ABC)(DEF)
              A+B+C+D => (A+B)+(C+D)
• Apply electrical rules
     – size transistors to meet timing goals
     – speed up longest paths in circuit
     – rules are specific to cell technology
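The "chop up large functions" step can be sketched as follows. This is a minimal illustration, not the method of any particular synthesis tool. The function name `chop` and the nested-tuple gate representation are assumptions made for the example, and it applies only to associative operators such as AND and OR.

```python
def chop(op, inputs, max_fanin):
    """Split one wide gate into a tree of gates with at most max_fanin inputs."""
    if max_fanin < 2:
        raise ValueError("max_fanin must be at least 2")
    level = list(inputs)
    while len(level) > max_fanin:
        next_level = []
        for i in range(0, len(level), max_fanin):
            group = level[i:i + max_fanin]
            # A leftover single signal passes through without a gate.
            next_level.append(group[0] if len(group) == 1 else (op, *group))
        level = next_level
    return (op, *level)


# ABCDEF => (ABC)(DEF) with a fan-in limit of 3
print(chop("AND", "ABCDEF", 3))
# A+B+C+D => (A+B)+(C+D) with a fan-in limit of 2
print(chop("OR", "ABCD", 2))
```

Grouping neighbouring inputs keeps the tree balanced, which tends to shorten the longest path. A real tool would also weigh signal arrival times when choosing the groups.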


[Figure: cell with inputs a and b and output labeled ab. The transistors for a and b are in series, so make them fatter; this speeds up the 1->0 transition.]
             Rule-Based Cell Matching
• Library-based transformations
   – rules encode library capabilities and restrictions
   – transformations improve area, delay, power, etc.
   – similar to synthesis via local optimization
• SOCRATES circuit optimizer
   – look ahead several rules
   – test all applicable rules
       » avoid local minima
• Problems
   – rules are not guaranteed to find optimum
   – rules may not be complete


Rule-Based Matching Algorithm
   TryRule3(circuit, rule)
   {
     scan circuit for rule match
     if match, compute cost
     recurse twice more on all rules     /* lookahead of three rules */
     return minimum cost
   }

   do {
     mincost = cost;                     /* cost of the current circuit */
     minrule = NULL;
     for (i = 0; i < MAXRULES; i++) {
        trial = TryRule3(circuit, rule[i]);
        if (trial < mincost) {
          mincost = trial;
          minrule = rule[i];
        }
     }
     if (minrule != NULL)
        cost = ApplyRule(circuit, minrule);
   } while (minrule != NULL);            /* until no rules apply */
  Complexity O((CN)^3) for C circuit elements and N rules
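A runnable sketch of the lookahead loop above, under stated assumptions: the circuit is a single expression written as nested tuples, the three rules and the transistor-count costs are hypothetical stand-ins for a real rule base, and `try_rule` returns the rewritten circuit along with the best reachable cost so the caller can apply it.

```python
COST = {"INV": 2, "NAND": 4, "AND": 6, "AND3": 8}   # assumed transistor counts


def cost(expr):
    if isinstance(expr, str):                        # primary input
        return 0
    return COST[expr[0]] + sum(cost(arg) for arg in expr[1:])


def is_gate(expr, kind):
    return isinstance(expr, tuple) and expr[0] == kind


def double_inv(e):                                   # INV(INV(x)) -> x
    if is_gate(e, "INV") and is_gate(e[1], "INV"):
        return e[1][1]


def inv_nand_to_and(e):                              # INV(NAND(x,y)) -> AND(x,y)
    if is_gate(e, "INV") and is_gate(e[1], "NAND"):
        return ("AND",) + e[1][1:]


def and_and_to_and3(e):                              # AND(AND(x,y),z) -> AND3(x,y,z)
    if is_gate(e, "AND") and len(e) == 3 and is_gate(e[1], "AND") and len(e[1]) == 3:
        return ("AND3",) + e[1][1:] + (e[2],)


RULES = [double_inv, inv_nand_to_and, and_and_to_and3]


def rewrites(expr, rule):
    """Yield every circuit obtained by applying rule at exactly one node."""
    if isinstance(expr, str):
        return
    replaced = rule(expr)
    if replaced is not None:
        yield replaced
    for i in range(1, len(expr)):
        for new_arg in rewrites(expr[i], rule):
            yield expr[:i] + (new_arg,) + expr[i + 1:]


def try_rule(expr, rule, rules, depth):
    """Best (cost, circuit) after applying rule now, looking depth-1 rules ahead."""
    best = None
    for candidate in rewrites(expr, rule):
        reachable = cost(candidate)
        if depth > 1:
            for next_rule in rules:
                ahead = try_rule(candidate, next_rule, rules, depth - 1)
                if ahead is not None:
                    reachable = min(reachable, ahead[0])
        if best is None or reachable < best[0]:
            best = (reachable, candidate)
    return best


def optimize(expr, rules, depth=3):
    while True:
        best = None
        for rule in rules:
            trial = try_rule(expr, rule, rules, depth)
            if trial is not None and trial[0] < cost(expr):
                if best is None or trial[0] < best[0]:
                    best = trial
        if best is None:                             # no rule helps: stop
            return expr
        expr = best[1]


START = ("AND", ("INV", ("NAND", "a", "b")), "c")
print(optimize(START, RULES, depth=3))               # three-rule lookahead
print(optimize(START, RULES, depth=1))               # no lookahead
```

In this example the first rewrite, INV(NAND) to AND, does not change the cost, so a search without lookahead stops at once, while the lookahead version sees the later AND3 merge. Every rule here shrinks the circuit, which guarantees termination; a rule base with reversible rules would need an explicit guard against cycling.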
                     Graph Covering

• Logic Equation Representation
   – directed acyclic graphs (DAGs)
       » 2-input NANDs and inverters in MIS
       » representation is not unique
• Library Cell Representation
   – represent cells as DAGs
       » 2-input NANDs and inverters in MIS
       » all decompositions of a cell function
       » ~N! possibilities for N inputs
   – cell cost (area, delay, power)

[Figure: F = abcd drawn as a DAG of NAND and NOT nodes over inputs a, b, c, d, followed by several alternative decompositions of F with different input orderings]
                   Graph Covering

• Algorithm
  – minimum-cost cover of equation DAGs with library DAGs
  – NP-complete (Bruno and Sethi 1975)
  – same problem as compiler code generation
• Approaches
  – search from primary inputs
  – search from primary outputs
  – try largest cell DAGs first
       » usually smallest area
       » not always fastest
       » similar to "maximal munch" in code generation
  – avoid local minima
       » lookahead
       » several random starting points

[Figure: two alternative covers of F = abcd over inputs a, b, c, d]
                 Tree Pattern Matching

• Partition circuit DAGs into trees
   – split at fan-out nodes
   – make only outputs roots of trees
   – perform splitting incrementally
       » when searching that tree
       » stop at already-mapped nodes
• Algorithm
   – find optimal mapping for each output tree
   – use recursive graph isomorphism tree matching
        » match all cells at root (output)
        » find optimum mapping for each subtree (cell input)
        » cost is cell plus cost of mappings of cell inputs
   – top-down traversal to record cells
   – exponential time in worst case

[Figure: circuit over inputs a, b, c split into trees at a fan-out node; annotation: look at top input to NAND]
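The partitioning step can be sketched as below. This is an illustrative version that splits the whole DAG up front rather than incrementally. The function name, the `fanins` dictionary format (gate name mapped to the names of its inputs) and the example netlist are assumptions made for the sketch.

```python
from collections import Counter


def partition_into_trees(fanins, outputs):
    """Return {root: set of gates in that root's tree}."""
    # Count how many gates read each signal.
    fanout = Counter(src for srcs in fanins.values() for src in srcs)
    # Tree roots are the primary outputs and every gate with fan-out above 1.
    roots = set(outputs) | {gate for gate in fanins if fanout[gate] > 1}
    trees = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            gate = stack.pop()
            members.add(gate)
            for src in fanins[gate]:
                # Stop at primary inputs and at the roots of other trees.
                if src in fanins and src not in roots:
                    stack.append(src)
        trees[root] = members
    return trees


# Hypothetical netlist: g1 drives both g2 and g3, so it becomes its own tree.
FANINS = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "d"],
    "f": ["g2", "g3"],
}
print(partition_into_trees(FANINS, ["f"]))
```

Each tree can then be mapped on its own, with the roots of other trees treated as if they were primary inputs.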
              Tree Pattern Matching
OptimalTree(tree)
{
  mincost = INF;
  for all cells
    if (cell matches at tree.root) {
      cost = cell.cost;
      for all cell inputs i
        /* cell.input[i] is the circuit subtree that feeds input i of the matched cell */
        cost += OptimalTree(cell.input[i]);
      if (cost < mincost) {
        mincost = cost;
        keep tree mapping;
      }
    }
  return(mincost);
}

for each output {
  outputcost = OptimalTree(output);
  scan top-down to get cell mapping;
}

[Figure: two mappings of the same tree, one with cost = 5+2+2+5+2+2 = 18 and one with cost = 4+4 = 8]
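A minimal runnable version of OptimalTree, under several assumptions: circuits and cell patterns are nested tuples of NAND and INV nodes, the three-cell library and its costs are hypothetical, and commutative input orderings are not tried. Unlike the pseudocode, it memoizes results with `functools.lru_cache`, so each distinct subtree is solved only once.

```python
from functools import lru_cache

# Hypothetical library: cell name -> (pattern, cost). Strings are cell inputs.
LIBRARY = {
    "INV": (("INV", "x"), 2),
    "NAND2": (("NAND", "x", "y"), 3),
    "AND2": (("INV", ("NAND", "x", "y")), 4),
}


def match(pattern, tree):
    """Return the circuit subtrees bound to the pattern's inputs, or None."""
    if isinstance(pattern, str):          # a cell input matches any subtree
        return [tree]
    if isinstance(tree, str) or tree[0] != pattern[0] or len(tree) != len(pattern):
        return None
    bound = []
    for p_child, t_child in zip(pattern[1:], tree[1:]):
        sub = match(p_child, t_child)
        if sub is None:
            return None
        bound.extend(sub)
    return bound


@lru_cache(maxsize=None)
def optimal_tree(tree):
    """Return (minimum cost, cells used) for mapping the given tree."""
    if isinstance(tree, str):             # primary input: nothing to map
        return 0, ()
    best = (float("inf"), ())
    for name, (pattern, cell_cost) in LIBRARY.items():
        inputs = match(pattern, tree)
        if inputs is None:
            continue
        total, cells = cell_cost, [name]
        for sub in inputs:                # cost of cell plus cost of its inputs
            sub_cost, sub_cells = optimal_tree(sub)
            total += sub_cost
            cells.extend(sub_cells)
        if total < best[0]:
            best = (total, tuple(cells))
    return best


# F = abcd written with 2-input NANDs and inverters: AND(AND(a,b), AND(c,d))
F = ("INV", ("NAND", ("INV", ("NAND", "a", "b")), ("INV", ("NAND", "c", "d"))))
print(optimal_tree(F))
```

With these costs the best cover uses three AND2 cells (cost 12), beating the cover that starts with a plain inverter at the root (cost 13).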
      Tree Pattern Matching

• Inject inverter pairs at gate outputs
   – increases possible matches
   – add fake inverter pair cell to library
       » removes remaining inverter pairs from circuit




[Figure: the same circuit without inverters and with inverters]
             Tree Matching Issues

• Tree matching is fast
   – simple DFS of circuit and cell trees
• Still many tree representations for a function
• Might miss common subexpressions
   – if cell matching does not line up with fan-out nodes
   – stop by treating fan-outs as cell outputs
[Figure: the same circuit over inputs a, b, c, d mapped two ways]
   – 2-input ANDs, cost: 15 transistors
   – 3-input ANDs, cost: 16 transistors
               Graph Pattern Matching
• Match subgraphs, not trees
   – avoid breaking graph into forest of trees
   – match more gate types
       » XOR, MUX
       » multiple-output gates
• Algorithm
   – find all circuit subgraph to cell graph matchings
        » generate DFS traversal of each cell graph from outputs
        » O(C*N) for C circuit nodes, N total cell library nodes
   – cover graph with minimum-cost cell graphs
        » matrix with row for each circuit node, column for each cell, 1 if cell
          matches circuit, 0 otherwise
        » find least-cost maximum independent set of rows
        » branch-and-bound search algorithm
        » bound is least-cost rows found so far
        » exponential in worst case
Graph Matching Algorithm

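boolean CellMatch(x, y)            /* x: circuit node, y: cell node */
{
    if (y is a cell input) return(1);        /* a cell input matches any circuit node */
    if (y.gatetype != x.gatetype) return(0);
    i = x.firstchild; j = y.firstchild;
    while (i != NULL && j != NULL) {
        if (!CellMatch(i, j)) return(0);
        i = next child of x; j = next child of y;
    }
    if (i == NULL && j == NULL) return(1);   /* all children matched */
    else return(0);
}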

for each node i in circuit graph
    for each cell j in library graphs
        if (CellMatch(i,j)) match[i][j] = 1

find least-cost maximum set of independent rows
  in match array
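The matching step can be sketched in Python as follows. The netlist, the NAND-form XOR2 cell graph and the dictionary layout are assumptions chosen so that the result lines up with the example match matrix in these slides. The `binding` argument, which keeps a shared cell node mapped to a single circuit node, is an addition to the pseudocode. Children are compared in order only, so commutative input swaps are not tried.

```python
# Hypothetical circuit: gate name -> (gate type, input names). p, q, r are primary inputs.
CIRCUIT = {
    "a": ("NOR", ["b", "d"]),
    "b": ("NAND", ["c", "e"]),
    "c": ("NAND", ["p", "f"]),
    "d": ("INV", ["r"]),
    "e": ("NAND", ["f", "q"]),
    "f": ("NAND", ["p", "q"]),
}

# Cell name -> (root node, cell graph). Names missing from a graph are cell inputs.
CELLS = {
    "XOR2": ("1", {"1": ("NAND", ["2", "3"]), "2": ("NAND", ["x", "4"]),
                   "3": ("NAND", ["4", "y"]), "4": ("NAND", ["x", "y"])}),
    "INV": ("1", {"1": ("INV", ["x"])}),
    "NOR2": ("1", {"1": ("NOR", ["x", "y"])}),
    "NAND2": ("1", {"1": ("NAND", ["x", "y"])}),
}


def cell_match(circuit, x, cell, y, binding):
    """Try to match cell node y at circuit node x, extending binding."""
    if y in binding:                  # shared cell node must map to one circuit node
        return binding[y] == x
    if y not in cell:                 # cell input: matches any circuit node
        binding[y] = x
        return True
    if x not in circuit:              # a primary input cannot match a gate
        return False
    y_type, y_children = cell[y]
    x_type, x_children = circuit[x]
    if y_type != x_type or len(y_children) != len(x_children):
        return False
    binding[y] = x
    return all(cell_match(circuit, xc, cell, yc, binding)
               for xc, yc in zip(x_children, y_children))


def build_match_matrix(circuit, cells):
    """match[node][cell] is 1 if the cell matches with its root at node."""
    return {node: {name: int(cell_match(circuit, node, graph, root, {}))
                   for name, (root, graph) in cells.items()}
            for node in circuit}


for node, row in build_match_matrix(CIRCUIT, CELLS).items():
    print(node, row)
```

A production matcher would also check that internal nodes covered by a cell have no fan-out outside the cell; this sketch omits that check.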
    Minimum-Cost Graph Cover

[Figure: XOR Cell DFS, cell graph nodes numbered 1 to 6; Circuit DFS, circuit nodes labeled a to f]

Matching the XOR2 cell at each circuit node:
    a - fails at 1
    b - match at 1
    c - fails at 2
    d - fails at 1
    e - fails at 2
    f - fails at 4

Match matrix:
            XOR2  INV  NOR2  NAND2
    Cost     14    2     4     4
    a         0    0     1     0
    b         1    0     0     1
    c         0    0     0     1
    d         0    1     0     0
    e         0    0     0     1
    f         0    0     0     1

• first solution: a, b, c, d, e, f
   – bound = 4 + 4 + 4 + 2 + 4 + 4 = 22
• least-cost solution: a, b, d
   – cost = 4 + 14 + 2 = 20
   – NOR2 at a
   – XOR2 at b
   – INV at d
• Maximum number of rows - cover most nodes in fewest cells
• Independent - cells do not overlap
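A small branch-and-bound sketch of the covering step for this example. The cell costs and match positions come from the matrix above. The set of circuit nodes each match covers is an assumption (the XOR2 match at b is taken to cover b, c, e and f), as are the function and variable names.

```python
# (cell, root node, cost, circuit nodes covered); covered sets are assumed.
MATCHES = [
    ("NOR2", "a", 4, {"a"}),
    ("XOR2", "b", 14, {"b", "c", "e", "f"}),
    ("NAND2", "b", 4, {"b"}),
    ("NAND2", "c", 4, {"c"}),
    ("INV", "d", 2, {"d"}),
    ("NAND2", "e", 4, {"e"}),
    ("NAND2", "f", 4, {"f"}),
]


def min_cost_cover(nodes, matches):
    """Cover every node exactly once with non-overlapping matches at least cost."""
    best = {"cost": float("inf"), "cells": None}

    def search(uncovered, cost, chosen):
        if cost >= best["cost"]:          # bound: cannot beat the best found so far
            return
        if not uncovered:                 # complete cover: record it
            best["cost"], best["cells"] = cost, list(chosen)
            return
        target = min(uncovered)           # branch on one uncovered node
        for cell, root, cell_cost, covered in matches:
            # The match must cover the target and must not overlap earlier choices.
            if target in covered and covered <= uncovered:
                chosen.append((cell, root))
                search(uncovered - covered, cost + cell_cost, chosen)
                chosen.pop()

    search(frozenset(nodes), 0, [])
    return best["cost"], best["cells"]


print(min_cost_cover("abcdef", MATCHES))
```

The search finds the cost-20 cover first and then prunes the all-simple-gates branch once its running cost reaches that bound.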
                Layout-Driven Mapping


• Goal
   –   minimize chip area
   –   previous approaches focus on cell area
   –   ignore inter-cell routing
   –   example - high fan-in and fan-out
         » minimizes cell area
         » takes a lot of routing
• Solution
   – estimate placement and routing during mapping
       » simple, fast estimates
   – incrementally update during mapping
       » only mapping a few gates at a time
       FPGA Technology Mapping

• Programmable logic blocks
   – multiplexor-based (Actel)
   – lookup table (Xilinx)
• Problem
   –   lookup table of K inputs implements 2^(2^K) possible functions
   –   K = 5 typically
   –   impractical to use library cell matching approach
   –   requires about 4 billion (2^32) variations for each cell pattern/tree/graph
   –   similar problem for mux-based FPGAs
• Solutions
   – clique partitioning
   – bin packing
   – OBDD matching
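The size of the problem follows from simple counting: a K-input lookup table has 2^K entries, and each entry can independently hold 0 or 1. A short check of the arithmetic (the function name is made up for this example):

```python
def lut_function_count(k):
    """Number of distinct Boolean functions of k inputs: 2^(2^k)."""
    return 2 ** (2 ** k)


for k in range(1, 6):
    print(k, lut_function_count(k))
```

For K = 5 this gives 2^32 = 4,294,967,296, which is the "4 billion" figure and the reason a library of one pattern per function is impractical.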
                         FPGAs

• Xilinx
   –   RAM configurable logic blocks (CLB)
   –   RAM programmable wiring
   –   2 functions of 4 variables
   –   1 function of 5 variables
   –   implemented via table lookup RAM
• Actel
   –   fuse configurable logic elements
   –   fuse programmable wiring
   –   all 2 and 3-variable functions
   –   some 4-variable functions
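Table lookup can be illustrated with a software model of a single lookup table. This is only a sketch of the idea and does not model a real Xilinx CLB. The helper names and the AND-OR-INVERT example function are assumptions.

```python
def make_lut(func, k):
    """Build the 2^k-entry truth table for func, one bit per input combination."""
    table = []
    for address in range(2 ** k):
        bits = [(address >> i) & 1 for i in range(k)]   # bits[0] is input 0
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, inputs):
    """Evaluate the function by using the input bits as a RAM address."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# Example: a 4-input AND-OR-INVERT function, NOT(ab + cd), stored in 16 bits.
aoi22 = make_lut(lambda a, b, c, d: 1 - ((a & b) | (c & d)), 4)
print(aoi22)
print(lut_eval(aoi22, [1, 1, 0, 0]), lut_eval(aoi22, [1, 0, 0, 1]))
```

Any function of the same four inputs costs the same 16 bits, which is why mapping to lookup tables is about packing logic into K-input blocks rather than matching library cells.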
            SRAM-based FPGA


• Functions are implemented as MUXes
• Interconnects are implemented as wire segments
• Interconnect connections are pass transistors

• Advantage: CMOS
• Disadvantage: slow
• Other issues:
  – Must be reprogrammed each time power is turned on

								