POSTED ON: 6/14/2012
Technology Mapping

Outline
  – What is Technology Mapping?
  – Rule-Based Mapping
  – Tree Pattern Matching
Goal
  – Understand technology mapping
  – Understand mapping algorithms
  – Understand mapping issues

What is Technology Mapping?
• Map optimized logic to primitive cell library
• Library capabilities
  – function
  – cell size
  – cell performance
• Library restrictions
  – fan-in
  – fan-out
• Goal
  – delay
  – area
  – power
  – etc.
[Figure: Cell AOI33 - area 3248, delay 0.8, power 0.08]

Cell Libraries
• Custom cells
  – synthesize functions as needed
  – e.g. complementary, domino, or CVSL CMOS
• Standard cells
  – library of fixed functions
  – use as needed
  – e.g. 3-input AND-OR
• Gate arrays
  – fixed population of gates and gate types
  – must fit design into available gates
• FPGAs
  – programmable function blocks
  – e.g. all functions of 4 inputs
[Figure: cell schematic with inputs a, b and output ab]

Custom Cell Synthesis
• Mechanically transform function to custom cell
  – complementary CMOS - cell is guaranteed to work
    » but it might be slow - e.g. 20-input NAND gate
  – avoid problems by limiting functions in synthesis process
    » limit fan-in, fan-out of functions
    » chop up large functions
      ABCDEF => (ABC)(DEF)
      A+B+C+D => (A+B)+(C+D)
• Apply electrical rules
  – size transistors to meet timing goals
  – speed up longest paths in circuit
  – rules are specific to cell technology
[Figure: cell schematic with inputs a, b and output ab; annotation: in series, so make fatter; speeds up 1->0 transition]

Rule-Based Cell Matching
• Library-based transformations
  – rules encode library capabilities and restrictions
  – transformations improve area, delay, power, etc.
  – similar to synthesis via local optimization
• SOCRATES circuit optimizer
  – lookahead several rules
  – test all applicable rules
    » avoid local minima
• Problems
  – rules are not guaranteed to find optimum
  – rules may not be complete

Rule-Based Matching Algorithm

    TryRule3(circuit, rule) {
        scan circuit for rule match
        if match, compute cost
        recurse twice more on all rules    /* three levels of lookahead in total */
        return minimum cost
    }

    do {
        mincost = cost;                    /* cost of the current circuit */
        minrule = NULL;                    /* no improving rule found yet */
        for (i = 0; i < MAXRULES; i++) {
            trycost = TryRule3(circuit, rule[i]);
            if (trycost < mincost) {
                mincost = trycost;
                minrule = rule[i];
            }
        }
        if (minrule != NULL)
            cost = ApplyRule(circuit, minrule);
    } until no rules apply                 /* i.e. minrule == NULL */

Complexity: (CN)^3 for C circuit elements and N rules

Graph Covering
• Logic Equation Representation
  – directed acyclic graphs (DAGs)
    » 2-input NANDs and inverters in MIS
    » representation is not unique
• Library Cell Representation
  – represent cells as DAGs
    » 2-input NANDs and inverters in MIS
    » all decompositions of a cell function
    » ~N! possibilities for N inputs
  – cell cost (area, delay, power)
[Figure: DAG for F = abcd built from NOT and NAND nodes over inputs a, b, c, d, with alternative decompositions of F]

Graph Covering
• Algorithm
  – minimum-cost cover of equation DAGs with library DAGs
  – NP-complete (Bruno and Sethi 1975)
  – same problem as compiler code generation
• Approaches
  – search from primary inputs
  – search from primary outputs
  – try largest cell DAGs first
    » usually smallest area
    » not always fastest
    » similar to “maximal munching” in code generation
  – avoid local minima
    » lookahead
    » several random starting points
[Figure: two DAGs for F = abcd over inputs a, b, c, d]

Tree Pattern Matching
• Partition circuit DAGs into trees
  – split at fan-out nodes
  – make only outputs roots of trees
  – perform splitting incrementally
    » when searching that tree
    » stop at already-mapped nodes
• Algorithm
  – find optimal mapping for each output tree
  – use recursive graph isomorphism
[Figure: circuit DAG over inputs a, b, c split into trees; annotation: tree matching, look at top input to NAND]
    » match all cells at root (output)
    » find optimum mapping for each subtree (cell input)
    » cost is
      cell plus cost of mappings of cell inputs
  – top-down traversal to record cells
  – exponential time in worst case

Tree Pattern Matching

    OptimalTree(tree) {
        /* base case (assumed): nothing left to cover below this node */
        if (tree is a primary input or an already-mapped node)
            return(0);
        mincost = INF;
        for all cells
            if (cell matches at tree.root) {
                cost = cell.cost;
                for all cell inputs
                    cost += OptimalTree(cell.input[i]);
                if (cost < mincost) {
                    mincost = cost;
                    keep tree mapping;
                }
            }
        return(mincost);
    }

    for each output {
        outputcost = OptimalTree(output);
        scan top-down to get cell mapping;
    }

[Figure: two mappings of the same tree; cost = 5+2+2+5+2+2 = 18 versus cost = 4+4 = 8]

Tree Pattern Matching
• Inject inverter pairs at gate outputs
  – increases possible matches
  – add fake inverter pair cell to library
    » removes remaining inverter pairs from circuit
[Figure: the same circuit without inverters and with inverters]

Tree Matching Issues
• Tree matching is fast
  – simple DFS of circuit and cell trees
• Still many tree representations for a function
• Might miss common subexpressions
  – if cell matching does not line up with fan-out nodes
  – stop by treating fan-outs as cell outputs
[Figure: a circuit over inputs a, b, c, d built from 2-input ANDs (cost: 15 transistors) and from 3-input ANDs (cost: 16 transistors)]

Graph Pattern Matching
• Match subgraphs, not trees
  – avoid breaking graph into forest of trees
  – match more gate types
    » XOR, MUX
    » multiple-output gates
• Algorithm
  – find all circuit subgraph to cell graph matchings
    » generate DFS traversal of each cell graph from outputs
    » O(C*N) for C circuit nodes, N total cell library nodes
  – cover graph with minimum-cost cell graphs
    » matrix with row for each circuit node, column for each cell, 1 if cell matches circuit, 0 otherwise
    » find least-cost maximum independent set of rows
    » branch-and-bound search algorithm
    » bound is least-cost rows found so far
    » exponential in worst case

Graph Matching Algorithm

    boolean CellMatch(x, y)            /* x: circuit node, y: cell node */
    {
        if (y.gatetype != x.gatetype) return(0);
        i = x.firstchild; j = y.firstchild;
        while (i != NULL && j != NULL) {
            if (!CellMatch(i, j)) return(0);
            i = i.nextchild; j = j.nextchild;
        }
        if (j == NULL) return(1);      /* every child of the cell node was matched */
        else return(0);                /* circuit ran out of children first */
    }

    for each node i in circuit graph
        for each cell j in library graphs
            if (CellMatch(i, j)) match[i][j] = 1
    find least-cost maximum set of independent rows in match array

Minimum-Cost Graph Cover

            XOR2   INV   NOR2   NAND2
    Cost      14     2      4       4
    a          0     0      1       0
    b          1     0      0       1
    c          0     0      0       1
    d          0     1      0       0
    e          0     0      0       1
    f          0     0      0       1

[Figure: XOR cell DFS with nodes numbered 1 to 6; circuit DFS with nodes a to f]

XOR2 cell match at each circuit node:
  a - fails at 1
  b - match at 1
  c - fails at 2
  d - fails at 1
  e - fails at 2
  f - fails at 4

first solution: a, b, c, d, e, f
  bound = 4 + 4 + 4 + 2 + 4 + 4 = 22
least-cost solution: a, b, d
  NOR2 at a, XOR2 at b, INV at d
  cost = 4 + 14 + 2 = 20

Maximum number of rows - cover most nodes in fewest cells
Independent - cells do not overlap

Layout-Driven Mapping
• Goal
  – minimize chip area
  – previous approaches focus on cell area
  – ignore inter-cell routing
  – example - high fan-in and fan-out
    » minimizes cell area
    » takes a lot of routing
• Solution
  – estimate placement and routing during mapping
    » simple, fast estimates
  – incrementally update during mapping
    » only mapping a few gates at a time

FPGA Technology Mapping
• Programmable logic blocks
  – multiplexor-based (Actel)
  – lookup table (Xilinx)
• Problem
  – lookup table of K inputs implements 2^(2^K) possible functions
  – K = 5 typically
  – impractical to use library cell matching approach
  – requires 4 billion variations for each cell pattern/tree/graph
  – similar problem for mux-based FPGAs
• Solutions
  – clique partitioning
  – bin packing
  – OBDD matching

FPGAs
• Xilinx
  – RAM configurable logic blocks (CLB)
  – RAM programmable wiring
  – 2 functions of 4 variables
  – 1 function of 5 variables
  – implemented via table lookup RAM
• Actel
  – fuse configurable logic elements
  – fuse programmable wiring
  – all 2 and 3-variable functions
  – some 4-variable functions
[Figure: CLB]

SRAM-based FPGA
• Functions are implemented as MUXes
• Interconnects are implemented as wire segments
• Interconnect connections are pass transistors
• Advantage: CMOS
• Disadvantage: slow
• Other issues:
  – must be reprogrammed each time power is turned on
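
The function count quoted for a K-input lookup table on the FPGA Technology Mapping slide can be checked with a few lines of Python. This is only a minimal sketch, not any vendor's tool flow: the names lut_function_count, make_lut and lut_eval are hypothetical, and the lookup table is modeled simply as a list of 2^K output bits addressed by the input values.

```python
from itertools import product


def lut_function_count(k):
    """Number of distinct Boolean functions of k inputs: 2**(2**k)."""
    return 2 ** (2 ** k)


def make_lut(func, k):
    """Build the 2**k-entry truth table for func (hypothetical helper).

    Entries are ordered so that the first input is the most significant
    address bit, which matches the order produced by itertools.product.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_eval(table, bits):
    """Evaluate the function by using the input bits as a table address."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return table[address]


# K = 5 gives 2**32 functions, the "4 billion variations" on the slide.
print(lut_function_count(5))  # 4294967296

# A 4-input AND stored as a 16-entry table and read back by lookup.
and4 = make_lut(lambda a, b, c, d: a and b and c and d, 4)
print(lut_eval(and4, (1, 1, 1, 1)), lut_eval(and4, (1, 0, 1, 1)))  # 1 0
```

Because every one of the 2^K table entries can be set independently, enumerating one library cell per function is impractical, which is why the slide points to clique partitioning, bin packing, and OBDD matching instead.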