Optimization of Mesh Locality for Transparent Vertex Caching



          Hugues Hoppe
        Microsoft Research
            SIGGRAPH 99
Triangle meshes
                 System architecture

   [Figure: block diagram. CPU with L1 and L2 caches; mesh in memory
    (geometry: vertices and faces; texture images); buses (e.g. AGP)
    linking them to the graphics processor (geometric processing,
    texture cache, rasterization, frame buffer). The bus is labelled
    "bottleneck".]
                  Previous work

   [Figure: a compressed geometry stream (v1 c, v2-v1 c, v3-v2 c,
    v4-v3 c c, v5-v4 c c) crosses the bus to a graphics processor
    containing parsing logic, a mesh buffer, and geometric processing.]

   16-entry FIFO buffer        [Deering95, Chow97]

      
   Θ(√n) stack buffer          [BarYehuda-Gotsman96]

   mesh compression            [Taubin-Rossignac98]
                                [Gumhold-Strasser98]
                                …
               Previous work



     Drawbacks:
           only static geometry
           new API
           not backward compatible
           Our approach

   Optimize ordering!

   [Figure: the traditional mesh API (vertex array, indexed strips,
    texture images) feeds the graphics processor over the bus. The
    processor has a vertex cache ahead of geometric processing and a
    texture cache ahead of rasterization.]

   No explicit cache management
        Transparent vertex caching
   application -> traditional mesh API (vertex array, indexed strips) -> graphics system

   Pros:
       animated geometry
       application program unchanged
       backward compatible on legacy hardware
   Cons:
       less compression (but still a factor ~2)
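             Indexed triangle strips

   [Figure: a mesh with seven vertices, numbered 1 to 7.]

   Vertex array:    v1 v2 v3 v4 v5 v6 v7
   Indexed strips:  1 2 3 4 3 5 6
                    2 7 4 5

   Each vertex: position (x y z), normal (nx ny nz), color (rgba),
                texture1 (u v), texture2 (u v)       ~ 32 bytes
   Each index:                                       ~ 2 bytes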
   Cache parameters

   Vertex cache:
       size?                 16 entries
       replacement policy?   FIFO
            Vertex data access

   [Figure: the same mesh drawn as traditional strips and as strips
    with caching. Vertices are marked as cache hit or cache miss, and
    faces are shaded by # misses (0, 1, 2, 3). One region is labelled
    "assume in cache".]

   traditional strips:   transfer ~1.0 vertex/tri
   with caching:         transfer ~0.5 vertex/tri
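
The two transfer rates can be illustrated with a small simulation. The sketch below is not from the talk: it assumes a regular grid of quads rendered as one strip per row, a 16-entry FIFO cache, and that traditional strips send one vertex per index. The names `grid_strips` and `vertices_per_triangle` are hypothetical.

```python
from collections import deque

def grid_strips(rows, cols):
    """One indexed strip per row of quads; vertex (r, c) has id r*(cols+1)+c."""
    strips = []
    for r in range(rows):
        strip = []
        for c in range(cols + 1):
            strip.append(r * (cols + 1) + c)        # vertex on the upper edge
            strip.append((r + 1) * (cols + 1) + c)  # vertex on the lower edge
        strips.append(strip)
    return strips

def vertices_per_triangle(strips, cache_size):
    """Average number of vertices transferred per triangle.

    cache_size=0 models traditional strips (assumption: every index in
    a strip sends its vertex over the bus). Otherwise a FIFO vertex
    cache of the given size is simulated.
    """
    cache = deque(maxlen=cache_size) if cache_size else None
    transfers = 0
    triangles = 0
    for strip in strips:
        triangles += len(strip) - 2
        for v in strip:
            if cache is not None and v in cache:
                continue            # cache hit: nothing crosses the bus
            transfers += 1          # cache miss: the vertex is transferred
            if cache is not None:
                cache.append(v)     # FIFO: the oldest entry drops out when full
    return transfers / triangles

strips = grid_strips(rows=20, cols=7)
no_cache = vertices_per_triangle(strips, cache_size=0)
with_cache = vertices_per_triangle(strips, cache_size=16)
print(no_cache, with_cache)
```

On this toy grid the strips are short enough that each row's upper vertices are still in the cache from the previous row, so only the lower vertices miss. Making `cols` much larger than the cache removes that reuse.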
                       Example

   [Figure: one mesh before optimization and after optimization,
    faces shaded by # misses (0, 1, 2, 3).]

   47% bandwidth reduction
         Optimization problem

Given mesh,
 find strips minimizing bus bandwidth
  ( strips correspond to ordering of faces F )

   minFP ( F ) r F   32  bF   2
             ˆ



        cache miss rate         # vertex indices
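
As a rough sketch of how this cost is evaluated, the function below computes the two terms. It assumes that both the miss rate and the index count are normalized per triangle, so that the result is in bytes per triangle like the figures reported later. The function name and its parameters are hypothetical.

```python
def bytes_per_triangle(miss_rate, indices_per_triangle,
                       vertex_bytes=32, index_bytes=2):
    """Bus bandwidth per triangle: vertex data for misses plus index data.

    miss_rate            -- vertices transferred per triangle (cache misses)
    indices_per_triangle -- vertex indices sent per triangle
    The byte sizes default to the ~32 and ~2 bytes quoted on the slides.
    """
    return miss_rate * vertex_bytes + indices_per_triangle * index_bytes

# Illustrative inputs only: one index per triangle, with the two
# transfer rates quoted on the "Vertex data access" slide.
print(bytes_per_triangle(1.0, 1.0))
print(bytes_per_triangle(0.5, 1.0))
```

Because a vertex is about sixteen times larger than an index, the miss-rate term dominates, which is why the ordering is tuned mainly for cache hits.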
    Two reordering techniques

   Greedy strip-growing
       fast: 40,000 faces/sec


   Local optimization
       improve initial greedy solution
       very slow
         Greedy strip-growing

   Inspired by [Chow97]

   [Figure: strip-growing diagram with labels 1 to 4.]

   To decide when to restart strip,
     perform lookahead cache simulations
        When to restart strip?

   [Figure: cache contents (cache size 4) after each face, for a
    strip of good length and for a strip that is too long.]

   Strip too long: jump in miss rate!
         Lookahead simulations

   [Figure: diagram with labels 1 to 4.]

   Perform s simulations:
     (a) restart immediately, after 0 faces
     (b) restart after i faces, for 0 < i < s
   If (a) is best, restart strip
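
The restart rule can be sketched as follows. This is a simplified illustration, not the talk's implementation: it assumes the face sequence visited by each simulation is already given (`candidates[i]` is a hypothetical input holding the faces visited when the strip restarts after i faces), and it scores each simulation by FIFO cache misses per face, which is an assumed measure of "best".

```python
from collections import deque

def misses_per_face(cache, faces, size):
    """Run faces through a copy of the FIFO cache; return misses per face.

    faces must be non-empty; each face is a tuple of vertex ids.
    """
    fifo = deque(cache, maxlen=size)   # copy, so the real cache is untouched
    misses = 0
    for face in faces:
        for v in face:
            if v not in fifo:
                fifo.append(v)         # oldest entry drops out when full
                misses += 1
    return misses / len(faces)

def should_restart(cache, candidates, size=16):
    """True if simulation (a), candidates[0], beats every other simulation."""
    costs = [misses_per_face(cache, faces, size) for faces in candidates]
    return all(costs[0] < c for c in costs[1:])

# Toy data: restarting now revisits cached vertices, continuing does not.
cache = [5, 6, 7, 8]
restart_now = [(5, 6, 7), (6, 7, 8)]
restart_after_1 = [(7, 8, 9), (1, 2, 3)]
print(should_restart(cache, [restart_now, restart_after_1], size=4))
```

Simulating on a copy keeps the lookahead free of side effects, so the real cache state only advances once a face is actually committed to the ordering.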
                     Result

   [Figure: traditional long strips vs. greedy strip-growing, showing
    the face order within each strip and the strip restarts.]

   before:   45.8 bytes/triangle
   after:    25.5 bytes/triangle
                 Local optimization
Apply perturbations to face ordering if cost is lowered:

   Initial order F:               F_{1..x-1}, F_x, ..., F_y, F_{y+1..m}

   F' = Reflect_{x,y}(F):         F_{1..x-1}, F_{y..x}, F_{y+1..m}

   F' = Insert1_{x,y}(F):         F_{1..x-1}, F_y, F_{x..y-1}, F_{y+1..m}

   F' = Insert2_{x,y}(F):         F_{1..x-1}, F_{y-1..y}, F_{x..y-2}, F_{y+1..m}
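
The three perturbations translate directly into list slicing. The sketch below treats the face ordering as a Python list and keeps the slides' 1-based, inclusive indices x and y. The function names and the toy cost are hypothetical; in the talk the cost would be the simulated bandwidth.

```python
def reflect(F, x, y):
    """Reverse the run F_x..F_y."""
    return F[:x - 1] + F[x - 1:y][::-1] + F[y:]

def insert1(F, x, y):
    """Move the single face F_y in front of F_x."""
    return F[:x - 1] + [F[y - 1]] + F[x - 1:y - 1] + F[y:]

def insert2(F, x, y):
    """Move the pair F_{y-1}, F_y in front of F_x."""
    return F[:x - 1] + F[y - 2:y] + F[x - 1:y - 2] + F[y:]

def accept_if_lower(F, candidate, cost):
    """Keep the perturbed ordering only if it lowers the cost."""
    return candidate if cost(candidate) < cost(F) else F

F = [1, 2, 3, 4, 5, 6, 7]
print(reflect(F, 3, 6))
print(insert1(F, 3, 6))
print(insert2(F, 3, 6))

# Toy cost for demonstration only: total jump between neighbouring labels.
toy_cost = lambda order: sum(abs(a - b) for a, b in zip(order, order[1:]))
scrambled = [1, 2, 6, 5, 4, 3, 7]
print(accept_if_lower(scrambled, reflect(scrambled, 3, 6), toy_cost))
```

Each operator returns a new list, so a rejected perturbation leaves the current ordering unchanged.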
                       Result

   [Figure: the face ordering after greedy strip-growing and after
    local optimization.]

   greedy strip-growing:   25.5 bytes/triangle
   local optimization:     24.2 bytes/triangle
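
As a quick arithmetic check on the reported figures, the snippet below converts the bytes/triangle values from the Result slides into improvement factors and percentage reductions. It assumes the 45.8 bytes/triangle "before" figure is the baseline for both optimized orderings.

```python
baseline = 45.8   # bytes/triangle before optimization
greedy = 25.5     # bytes/triangle after greedy strip-growing
local = 24.2      # bytes/triangle after local optimization

for name, value in [("greedy", greedy), ("local", local)]:
    factor = baseline / value
    reduction = 100 * (1 - value / baseline)
    print(f"{name}: factor {factor:.2f}, reduction {reduction:.1f}%")
```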
                                  Bandwidth Results

   [Figure: bar chart of bytes/triangle (axis 0 to 50) for Original,
    Greedy strips, and Local optimization, on the models fandisk,
    gameguy, bunny2k, bunny4k, bunny, buddha, and schooner.]

   Improvement by factor of 1.6 to 1.9
                              Choice of cache size

   [Figure: plot of cache miss rate r (axis 0 to 3) against cache
    size (axis 0 to 70).]

   Size 16 sufficient for most gain
     Cache replacement policy

   [Figure: cache contents after each face under FIFO and under LRU
    (cache size 4), in two cases labelled "all is OK" and "strips
    twice as long".]
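
To make the difference between the two policies concrete, here is a minimal sketch that counts misses for the same index stream under each. It is a generic illustration, not the talk's simulator, and the two toy streams are made up to show that the policies can disagree in either direction.

```python
from collections import OrderedDict, deque

def fifo_misses(indices, size):
    """FIFO: a hit leaves the eviction order unchanged."""
    cache = deque(maxlen=size)
    misses = 0
    for v in indices:
        if v not in cache:
            cache.append(v)            # oldest entry drops out when full
            misses += 1
    return misses

def lru_misses(indices, size):
    """LRU: a hit moves the entry to the most recently used position."""
    cache = OrderedDict()
    misses = 0
    for v in indices:
        if v in cache:
            cache.move_to_end(v)
        else:
            misses += 1
            cache[v] = True
            if len(cache) > size:
                cache.popitem(last=False)   # evict the least recently used
    return misses

for stream in ([1, 2, 1, 3, 2], [1, 2, 1, 3, 1]):
    print(stream, fifo_misses(stream, 2), lru_misses(stream, 2))
```

Neither policy wins on every stream, so the comparison that matters is how each behaves on strip-ordered index sequences like those in the figure.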
       Comparison

   [Figure: results with FIFO and with LRU, side by side.]
                Summary

   Vertex caching reduces geometry
    bandwidth by factor of 1.6 to 1.9

   Transparent to application:
     simply pre-process the models (fast)

   Still efficient on legacy hardware

   Supports dynamic geometry
                 Future work

   Issue of cache size
       Find face ordering good for all sizes?
       Standardize on size 16?
       Reprocess mesh at load time

   Interaction with texture caching

   Cache efficiency during runtime LOD

				