Understanding the Impacts of 3D Stacked Layouts on ILP
Vivek Venkatesan, Manu Awasthi, Rajeev Balasubramonian
School of Computing, University of Utah

BACKGROUND
Interconnects within a processor pipeline are known to be a
major bottleneck for performance and power in future
processors. Vertical 3D stacking of dies allows reduction of
overall wire lengths and helps alleviate the performance and
power overhead of on-chip wiring. The primary disadvantage is
that it results in increased power densities and on-chip
temperatures.

   Wire latencies are predicted to be in the tens of cycles in future fabrication technologies.
   Studying the impact of wire delays on performance reinforces the need for interconnect optimization techniques.
   Popular belief: multi-threading hides wire delays. Not entirely true!
   Technique to alleviate key wire delays: floor-planning.
   Floor-planning generates arbitrary layouts of micro-architectural blocks in a processor, evaluating each with respect to an objective function.
   Including delay-criticality information in the objective function keeps heavily communicating blocks closer.
   3D floor-planning generates 3D layouts, with more potential for closeness in the vertical dimension.

3D TECHNOLOGY
   [Figure: 3D folding, proposed by Puttaswamy et al. Labels: RegFile, Break and Stack. Drawback: increased thermal density.]
   Observation: wire-delay-limited architectures stand to benefit more from a 3D integration technology.
   [Figure: Proposed solution: 3D stacking. Labels: "Cache Bank (Replicated/Word", Arch 1, Arch 2, Arch 3.]
   Reduced wire delays and better thermal density!

3D BENEFITS
   Best-case 3D (Arch 3/WI) performs 12% better than best-case 2D (replicated cache banks).
   Better thermal profile: best-case 3D (Arch 3/WI) has just a 10 °C increase over 2D, with maximum performance gains.
   [Figure: 2D Performance Comparison. Title: Base Case Performance Comparisons. Y-axis: IPC Improvement (0 to 25). X-axis: Replicated, WI.]
   [Figure: Best Case 3D Performance. Title: Average Performance Improvement wrt 2D Centralized. Y-axis: IPC Improvement (0 to 25). X-axis: Arch 1, Arch 2, Arch 3.]

CONCLUSIONS
   3D technology has the potential to improve processor performance, power and cost.
   3D wafer bonding traps heat, resulting in higher peak and average temperatures.
   Tiled architectures with long inter-cluster wires stand to gain more from 3D stacking.
   Aggressive cooling capabilities may be required to extract the full potential of 3D.
   Other promising applications of 3D technology include “snap-on” analysis engines, fault-correction engines and stacked
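
The floor-planning bullets in the BACKGROUND section describe an objective function that includes delay-criticality information, so that heavily communicating blocks end up closer together. The sketch below is one minimal way to illustrate that idea; it is not the floor-planner used for the results on this poster. The block names, the criticality weights, the slot coordinates and the `VERTICAL_HOP` cost are all hypothetical values chosen for illustration, and the exhaustive search only works for a handful of blocks.

```python
from itertools import permutations

# Hypothetical blocks and pairwise criticality weights (illustrative only).
BLOCKS = ["fetch", "regfile", "alu", "cache"]
CRITICALITY = {
    ("regfile", "alu"): 5.0,
    ("alu", "cache"): 3.0,
    ("fetch", "cache"): 1.0,
}

# Assumption: crossing one die costs far less than one in-plane grid step.
VERTICAL_HOP = 0.1


def distance(a, b):
    """Manhattan distance in the plane plus a small cost per die crossed."""
    (xa, ya, za), (xb, yb, zb) = a, b
    return abs(xa - xb) + abs(ya - yb) + VERTICAL_HOP * abs(za - zb)


def cost(placement):
    """Criticality-weighted wire length of a placement {block: (x, y, layer)}."""
    return sum(
        weight * distance(placement[u], placement[v])
        for (u, v), weight in CRITICALITY.items()
    )


def best_placement(slots):
    """Try every assignment of blocks to slots and return the cheapest one."""
    candidates = (
        dict(zip(BLOCKS, perm)) for perm in permutations(slots, len(BLOCKS))
    )
    return min(candidates, key=cost)


# 2D: four slots in a row on one die. 3D: two slots on each of two stacked dies.
SLOTS_2D = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
SLOTS_3D = [(0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1)]

best_2d = best_placement(SLOTS_2D)
best_3d = best_placement(SLOTS_3D)
print("2D objective:", cost(best_2d))
print("3D objective:", cost(best_3d))
```

Under these assumed numbers the stacked layout reaches a lower objective value than the planar one, because the most critical pair can sit directly above one another. A practical floor-planner would also have to account for block areas and for temperature, which this sketch ignores.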
