Portable, mostly-concurrent, mostly-copying GC for multi-processors
Tony Hosking
Secure Software Systems Lab
Purdue University
    Platform assumptions
• Symmetric multi-processor (SMP/CMP)
• Multiple mutator threads
• (Large heaps)
     Desirable properties
• Maximize throughput
• Minimize collector pauses
• Scalability
    Exploiting parallelism
• Avoid contention
• (Mostly-)Concurrent allocation
• (Mostly-)Concurrent collection
     Concurrent allocation
• Use thread-private allocation “pages”
• Threads contend for free pages
• Each thread allocates from its own
  page
  • multiple small objects per page, or
  • multiple pages per large object
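The allocation scheme can be sketched in a few lines of Python. This is a minimal illustration, not the Modula-3 run-time. It assumes a hypothetical fixed `PAGE_SIZE`, addresses modelled as `(page, offset)` pairs, and illustrative class names `PagePool` and `ThreadAllocator`. Only fetching a page takes the shared lock. Bumping the offset within a thread-private page needs no synchronization.

```python
# Illustrative sketch: PAGE_SIZE, PagePool and ThreadAllocator are hypothetical names.
import threading

PAGE_SIZE = 4096  # hypothetical page size in bytes


class PagePool:
    """Shared pool of free pages: the only point of contention between threads."""

    def __init__(self, npages):
        self._lock = threading.Lock()
        self._free = set(range(npages))

    def take(self, count=1):
        """Remove a run of `count` contiguous free pages and return the first."""
        with self._lock:
            free = sorted(self._free)
            for i in range(len(free) - count + 1):
                run = free[i:i + count]
                if run[-1] - run[0] == count - 1:  # pages are consecutive
                    self._free.difference_update(run)
                    return run[0]
            raise MemoryError("no run of %d free pages" % count)


class ThreadAllocator:
    """Bump allocator over a page owned by a single thread."""

    def __init__(self, pool):
        self.pool = pool
        self.page = None
        self.offset = 0

    def alloc(self, size):
        if size > PAGE_SIZE:
            # Large object: it gets multiple contiguous pages to itself.
            npages = -(-size // PAGE_SIZE)  # ceiling division
            return (self.pool.take(npages), 0)
        if self.page is None or self.offset + size > PAGE_SIZE:
            # No private page yet, or it is full: contend for a fresh one.
            self.page = self.pool.take(1)
            self.offset = 0
        addr = (self.page, self.offset)
        self.offset += size  # no lock needed: nobody else allocates on this page
        return addr
```

Each mutator thread would keep its own `ThreadAllocator` (for example in a `threading.local`), so the lock in `PagePool.take` is touched once per page rather than once per object.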
   Concurrent collection:
  The tricolour abstraction
• Black
  • “live”
  • scanned
  • cannot refer to white
• Grey
  • “live” wavefront
  • still to be scanned
  • may refer to any colour
• White
  • hypothetical garbage
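The rule "black cannot refer to white" can be stated as a check over an object graph. The following is a small Python sketch. It assumes the heap is modelled as hypothetical `colour` and `refs` dictionaries keyed by object id.

```python
# Illustrative sketch: the colour/refs dictionaries are a hypothetical heap model.
from enum import Enum


class Colour(Enum):
    WHITE = 0  # hypothetical garbage
    GREY = 1   # live wavefront, still to be scanned
    BLACK = 2  # live and already scanned


def black_to_white_edges(colour, refs):
    """Return every black -> white reference; the tricolour rule says there are none.

    colour: dict mapping object id -> Colour
    refs:   dict mapping object id -> list of referenced object ids
    """
    return [(src, dst)
            for src, targets in refs.items() if colour[src] is Colour.BLACK
            for dst in targets if colour[dst] is Colour.WHITE]
```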
       Garbage collection
• White = whole heap
• Shade root targets grey
• While grey nonempty
  • Shade one grey object black
  • Shade its white children grey
• At end, white objects are garbage
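These steps translate almost line for line into code. The following is a minimal Python sketch of a single, non-incremental marking pass. It assumes the heap is a hypothetical `refs` dictionary mapping each object id to the ids it references.

```python
# Illustrative sketch: `refs` is a hypothetical heap model, not a real GC API.
from collections import deque


def trace(roots, refs):
    """Tricolour marking; returns (black, white) = (live objects, garbage)."""
    white = set(refs)   # white = whole heap
    grey = deque()
    black = set()

    def shade(obj):
        # Shade a white object grey; grey and black objects are left alone.
        if obj in white:
            white.remove(obj)
            grey.append(obj)

    for r in roots:
        shade(r)                 # shade root targets grey
    while grey:                  # while grey is nonempty
        obj = grey.popleft()
        for child in refs[obj]:
            shade(child)         # shade its white children grey
        black.add(obj)           # and the scanned object black
    return black, white          # at end, white objects are garbage
```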
       Copying collection
• Partition white from black by copying
• Reclaim white partition wholesale
• At next GC, “flip” black to white
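The following is a minimal Python sketch of the copying scheme. It uses Cheney's breadth-first scan, which is a standard technique assumed here rather than taken from the slides. Copies before the `scan` index are black and copies after it are grey. Objects that are never copied stay white in from-space, which is then dropped wholesale. `Obj`, `forward` and `copy_collect` are illustrative names.

```python
# Illustrative Cheney-style copier; Obj/forward/copy_collect are hypothetical names.
class Obj:
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = list(fields)  # references to other Obj instances
        self.forward = None         # forwarding pointer once copied


def copy_collect(roots):
    """Copy everything reachable from `roots`; return (new roots, to-space)."""
    to_space = []

    def copy(obj):
        if obj.forward is None:                # white: not yet copied
            clone = Obj(obj.name, obj.fields)  # grey: copied, fields still old
            obj.forward = clone
            to_space.append(clone)
        return obj.forward

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):                # to_space[scan:] is the grey region
        obj = to_space[scan]
        obj.fields = [copy(f) for f in obj.fields]
        scan += 1                              # to_space[:scan] is black
    return new_roots, to_space                 # from-space is reclaimed wholesale
```

At the next collection, the returned to-space serves as from-space. This is the "flip": the previous black partition becomes the new white one.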
      Incremental collection
[Figure: mutator threads]
      Concurrent collection
[Figure: mutator threads and a background GC thread]
     Concurrent mutators
• Mutation changes reachability during GC
• Loss of black/grey reference is safe
  • Non-white object losing its last reference
    will be garbage at next GC
• New reference from black to white
  • New reference may make target live
  • Collector may never see new reference
• Mutations may require compensation
     Compensation options
• Prevent mutator from creating black-to-
  white references
  • write barrier on black
  • read barrier on grey to prevent mutator
    obtaining white refs
• Prevent destruction of any path from a
  grey object to a white object without
  telling GC
  • write barrier on grey
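A toy incremental marker can reproduce the failure and both compensations. In this hypothetical Python sketch, the mutator stores white `C` into black `A` and then deletes the only grey-to-white path from `B`. With no barrier, `C` stays white and would be reclaimed while still reachable. The `"insertion"` option shades the new target when the source is black, which is a Dijkstra-style barrier on black. The `"deletion"` option shades every overwritten target. This is a Yuasa-style barrier that applies to all sources, a conservative superset of the slide's "write barrier on grey". `Heap` and `lost_object_scenario` are illustrative names.

```python
# Illustrative toy marker; Heap/lost_object_scenario are hypothetical names.
WHITE, GREY, BLACK = "white", "grey", "black"


class Heap:
    """Toy incremental tricolour marker with an optional write barrier."""

    def __init__(self, refs, barrier=None):
        self.refs = {k: list(v) for k, v in refs.items()}
        self.colour = {k: WHITE for k in refs}
        self.grey = []
        self.barrier = barrier  # None, "insertion" or "deletion"

    def shade(self, obj):
        if self.colour[obj] == WHITE:
            self.colour[obj] = GREY
            self.grey.append(obj)

    def step(self):
        """Scan one grey object; return False once marking has finished."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in self.refs[obj]:
            if child is not None:
                self.shade(child)
        self.colour[obj] = BLACK
        return True

    def write(self, src, slot, new):
        """Mutator store src.refs[slot] = new, through the chosen barrier."""
        old = self.refs[src][slot]
        if self.barrier == "insertion" and self.colour[src] == BLACK and new is not None:
            self.shade(new)  # never let a black object point at a white one
        if self.barrier == "deletion" and old is not None:
            self.shade(old)  # never silently cut a path to a white object
        self.refs[src][slot] = new


def lost_object_scenario(barrier):
    heap = Heap({"A": [None, "B"], "B": ["C"], "C": []}, barrier)
    heap.shade("A")           # root target
    heap.step()               # A is black, B grey, C still white
    heap.write("A", 0, "C")   # mutator creates a black -> white reference...
    heap.write("B", 0, None)  # ...then destroys the grey -> white path
    while heap.step():
        pass
    return heap.colour["C"]   # colour of C when marking ends
```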
       Mostly-copying GC
          [Bartlett]
• Copying collection with ambiguous roots
  • Uncooperative compilers
  • Untidy references
  • Explicit pinning
• Pin ambiguously-referenced objects
  • Shade their page grey without copying
• Assume heap accuracy
  • Copy remaining heap-referenced objects
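The following toy Python sketch shows the two phases. It assumes each ambiguous root has already been reduced to the page number it appears to point into, and that every page in the hypothetical `heap_pages` map belongs to the heap. Root words that point outside the heap are ignored. Pinned pages keep all their objects in place, including dead ones, which is the price of conservatism. Everything reachable only through accurate heap references is copied. `Obj` and `mostly_copying_gc` are illustrative names.

```python
# Illustrative sketch of Bartlett-style pinning; all names are hypothetical.
class Obj:
    def __init__(self, name, page, fields=()):
        self.name, self.page, self.fields = name, page, list(fields)
        self.forward = None  # forwarding pointer once copied


def mostly_copying_gc(heap_pages, ambiguous_roots):
    """heap_pages: page -> objects; ambiguous_roots: pages that root words hit."""
    pinned = {p for p in ambiguous_roots if p in heap_pages}  # drop non-heap words
    copied = []

    def evacuate(obj):
        if obj.page in pinned:
            return obj                   # pinned page: object stays where it is
        if obj.forward is None:
            obj.forward = Obj(obj.name, "new", obj.fields)
            copied.append(obj.forward)
        return obj.forward

    # Pinned pages are shaded grey without copying: scan every object on them.
    for page in sorted(pinned):
        for obj in heap_pages[page]:
            obj.fields = [evacuate(f) for f in obj.fields]
    # Then scan the copies, assuming heap references are accurate.
    scan = 0
    while scan < len(copied):
        copied[scan].fields = [evacuate(f) for f in copied[scan].fields]
        scan += 1
    return pinned, copied
```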
          Incremental MCGC
             [DeTreville]
• Enforce grey mutator invariant
  – STW greys ambiguously-referenced pages
  – Read barrier on grey using VM page protection
• Read barrier
  –   Stop mutator threads
  –   Unprotect page
  –   Copy white targets to grey
  –   Shade page black
  –   Restart threads
• Atomic system call wrappers unprotect
  parameter targets (otherwise a protection
  trap inside the OS makes the call return an error)
      Concurrent MCGC?
• Stopping all threads at each
  increment is prohibitive on SMP &
  impedes concurrency
• BUT barriers difficult to place on
  ambiguous references with
  uncooperative compilers
• ALSO, preemptive scheduling may
  break wrapper atomicity
  Mostly-concurrent MCGC
• Enforce black mutator invariant
  • STW blackens ambiguously-referenced
    pages
  • Read barrier on load of accurate (tidy) grey
    reference
• Read barrier:
  • Blacken grey references as they are loaded
• No system call wrappers: arguments are
  always black
Read barrier on load of grey
• Object header bit marks grey objects
• Inline fast path checks grey bit in target
  header, calls out to slow path if set
• Out-of-line slow path:
  • Lock heap meta-data
  • For each (grey) source object in target page
     • Copy white targets to grey
     • Clear grey header bit
  • Shade target page black
  • Unlock heap meta-data
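The following minimal Python sketch shows this barrier. It makes simplifying assumptions that are not in the slides: each copied object gets its own to-space page, pages are hypothetical integers, and "white" means "on a from-space page". The fast path only tests the header bit. The slow path re-checks each bit under the heap lock, so a stale grey bit costs a lock acquisition but never gives a wrong result. `Obj`, `Collector`, `load` and `blacken_page` are illustrative names.

```python
# Illustrative sketch of a grey-bit read barrier; all names are hypothetical.
import threading


class Obj:
    def __init__(self, page, fields=()):
        self.page = page
        self.fields = list(fields)
        self.grey = False    # header bit, set only in newly copied objects
        self.forward = None  # forwarding pointer in from-space originals


class Collector:
    def __init__(self, pages, white_pages, first_free_page):
        self.lock = threading.Lock()  # guards heap meta-data
        self.pages = pages            # page -> objects on that page
        self.white_pages = set(white_pages)
        self.black_pages = set()
        self.next_page = first_free_page

    def copy(self, obj):
        """Copy a white object to a fresh grey page (one object per page here)."""
        if obj is None or obj.page not in self.white_pages:
            return obj                # null, or not white: leave as-is
        if obj.forward is None:
            clone = Obj(self.next_page, obj.fields)
            clone.grey = True
            self.pages[self.next_page] = [clone]
            self.next_page += 1
            obj.forward = clone
        return obj.forward

    def blacken_page(self, page):
        """Out-of-line slow path: blacken every grey object on `page`."""
        with self.lock:
            for obj in self.pages[page]:
                if obj.grey:          # re-check under the lock
                    obj.fields = [self.copy(f) for f in obj.fields]
                    obj.grey = False
            self.black_pages.add(page)

    def load(self, src, i):
        """Read barrier on a load of src.fields[i]."""
        target = src.fields[i]
        if target is not None and target.grey:  # inline fast path
            self.blacken_page(target.page)
        return target
```

Calling `blacken_page` on a page whose objects start with the grey bit set can stand in for the STW phase that blackens ambiguously-referenced pages. After that, every reference the mutator loads is black.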
   Coherence for fast path
• STW phase synchronizes mutators’ views of
  heap state
• Grey bits are set only in newly-copied
  objects (i.e., newly-allocated grey pages)
  since the most recent STW
• Mutators can never see a cleared grey
  header unless the page is also black
• Seeing a spurious grey header due to weak
  ordering is benign: slow path will synchronize
          Implementation
• Modula-3:
  • gcc-based compiler back-end
  • No tricky target-specific stack-maps
  • Compiler front-end emits barriers
  • M3 threads map to preemptively-scheduled
    POSIX pthreads
  • Stop/start threads: signals + semaphores, or
    OS primitives if available
  • Simple to port: Darwin (OS X), Linux,
    Solaris, Alpha/OSF
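The following hypothetical Python sketch shows the semaphore half of the stop/start handshake. Python cannot direct a signal at an arbitrary thread, so this sketch replaces the signal with a flag that mutators poll at safepoints. That substitution is an assumption of the sketch, not how the Modula-3 run-time works. `World`, `safepoint`, `stop_all` and `start_all` are illustrative names.

```python
# Illustrative stop/start handshake; polling replaces signals (an assumption).
import threading


class World:
    def __init__(self, nthreads):
        self.n = nthreads
        self.stop_requested = False
        self.acks = threading.Semaphore(0)    # mutator -> collector: "stopped"
        self.resume = threading.Semaphore(0)  # collector -> mutators: "go on"

    def safepoint(self):
        # Polled by mutators; stands in for a stop signal's handler.
        if self.stop_requested:
            self.acks.release()    # report that this thread is parked
            self.resume.acquire()  # block until the collector restarts us

    def stop_all(self):
        self.stop_requested = True
        for _ in range(self.n):
            self.acks.acquire()    # wait until every mutator has parked

    def start_all(self):
        self.stop_requested = False
        for _ in range(self.n):
            self.resume.release()  # wake each parked mutator once
```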
            Experiments
• Parallelized GCOld benchmark to permit
  throughput measurements for multiple
  mutators
• Measures steady-state GC throughput
• 2 platforms:
  • 2 x 2.3GHz PowerPC Macintosh Xserve
    running OS X 10.4.4
  • 8 x 700MHz Intel Pentium 3 SMP running
    Linux 2.6
    Read barriers: STW
[Figure: elapsed time (s) vs. GC ratio (0.1, 0.5, 1, 2, 4, 8), hardware vs. software read barrier; 1 user-level mutator thread, work=1]
    Elapsed time (s)
[Figure: elapsed time (s) vs. GC ratio (0.1, 0.5, 1, 2, 4, 8), STW vs. INC; 1 system-level mutator thread, work=1]
    Heap size
[Figure: maximum heap (MB) vs. GC ratio (0.1, 0.5, 1, 2, 4, 8), STW vs. INC; 1 system-level mutator thread]
    BMU
[Figure: bounded mutator utilization; 1 system-level mutator thread, work=1000, ratio=1]
    Scalability
[Figure: elapsed time (s) vs. number of mutator threads (1 to 16), STW vs. INC; work=1000, ratio=1, 8xP3]
    Java HotSpot server
[Figure: elapsed time (s) vs. number of mutator threads (1 to 8), Serial vs. Concurrent MS collector; work=1000, 8xP3]
              Conclusions
• Mostly-concurrent, mostly-copying collection
  is feasible for multi-processors (proof of
  existence)
• Performance is good (scalable)
• Portable: changes only to compiler front-end
  to introduce barriers, and to GC run-time
  system
• Compiler back-end unchanged: full-blown
  optimizations enabled, no stack-map
  overheads
           Future work
• Convert read barrier to “clean” only
  target object instead of whole page
    Scalability
[Figure: elapsed time (s) vs. number of mutator threads (1 to 16), STW vs. INC; work=10, ratio=1, 8xP3]
    Java HotSpot server
[Figure: elapsed time (s) vs. number of mutator threads (1 to 8), Serial vs. Concurrent MS collector; work=10, 8xP3]

						