Virtual Memory

Lecturer: Dr. Hui Annie Guo
huig@cse.unsw.edu.au
K17-501F (ext. 57136)
COMP3211/9211, 2009S1, wk9_2


Recap: Memory Hierarchy of a Modern Computer System

• By taking advantage of the principle of locality:
    – Present the user with as much memory as is available in the cheapest technology.
    – Provide access at the speed offered by the fastest technology.

[Figure: the memory hierarchy, from the processor (control, datapath, registers, on-chip cache) through the second-level cache (SRAM) and main memory (DRAM) to secondary storage (disk) and tertiary storage (disk).]

  Level                                      Speed (ns)                   Size (bytes)
  Registers                                  1s                           100s
  Caches (on-chip, second-level SRAM)        10s                          Ks
  Main memory (DRAM)                         100s                         Ms
  Secondary storage (disk)                   10,000,000s (10s ms)         Gs
  Tertiary storage (disk)                    10,000,000,000s (10s sec)    Ts




Virtual memory vs Physical memory

• Physical memory
    – Main memory in the hierarchical memory system
    – Its addresses are called physical addresses
• Virtual memory
    – A technique that
        • allows users/programs to reference more memory than actually exists in the computer
        • uses physical memory as a cache for secondary storage
    – The addresses in virtual memory are called virtual addresses.


Virtual Memory: Key Ideas (1/2)

• The virtual address space, divided into pages, is much larger than the physical address space, which is divided into equal-sized blocks known as frames.
• A process running on the processor refers to data using virtual addresses.
    – Why not use physical addresses?
        • synonym / alias problem: two different virtual addresses map to the same physical address
• The virtual addresses must therefore be translated into physical addresses to access actual memory locations.
• Virtual page numbers (the high-order address bits) are translated into physical frame numbers using a page table that is stored in physical memory.
• The data cache is indexed/tagged using physical addresses.
Virtual Memory: Key Ideas (2/2)

• To speed up translation, a translation lookaside buffer (TLB), which is a small associative cache, is used to store recent page-frame translations.
    – More will be covered later
• Hardware must trap to the operating system if a page is not resident in memory so that it can be loaded from disk. This may result in another page having to be written back to disk.


Recall: How is the hierarchy managed?

• registers ↔ memory
    – by compiler/programmer
• cache ↔ memory
    – by the hardware
• memory ↔ disks
    – by the hardware and operating system (virtual memory)
    – by the programmer (files)

[Figure: the memory hierarchy again: processor (control, datapath, registers, on-chip cache), second-level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (removable disk).]




Basic Issues in Virtual Memory System Design

• Size of information blocks that are transferred from secondary (Disk) to main storage ([M]emory)
• A missing item is fetched from secondary memory (e.g. disk) only on the occurrence of a fault → demand load policy
• Which region of M is to hold the new block → placement policy
• When a block of information is brought into M and M is full, some region of M must be released to make room for the new block → replacement policy

[Figure: reg ↔ cache ↔ mem ↔ disk; mem is divided into frames, disk into pages.]

Paging Organization
  virtual and physical address space partitioned into blocks of equal size: pages (virtual) and page frames (physical)


Address Map ([B]lock [I]dentification)

   V = {0, 1, . . . , n - 1}   virtual address space
   M = {0, 1, . . . , m - 1}   physical address space, n > m

   MAP: V → M ∪ {∅}   address mapping function

   MAP(a) = a'   if data at virtual address a is present at physical address a' and a' in M
          = ∅    if data at virtual address a is not present in M

[Figure: the processor issues virtual address a from name space V to the address translation mechanism. If the data is present, the mechanism sends physical address a' to main memory. Otherwise it returns ∅ and raises a missing item fault; the fault handler has the item brought from secondary memory into main memory (the OS performs this transfer).]
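The Address Map and the three policies (demand load, placement, replacement) can be tied together in a small sketch. This is only an illustration under stated assumptions: the class name `DemandPager`, the choice of the lowest-numbered free frame for placement, and FIFO for replacement are all assumptions, since the slides do not name specific policies. `None` stands in for ∅.

```python
from collections import deque


class DemandPager:
    """Toy model of MAP: V -> M ∪ {∅} with demand loading (illustrative only)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.page_to_frame = {}      # resident pages only; a missing key means ∅
        self.load_order = deque()    # FIFO order, an assumed replacement policy
        self.faults = 0

    def map(self, page):
        """Return the frame holding `page`, or None (standing in for ∅)."""
        return self.page_to_frame.get(page)

    def access(self, page):
        """Translate `page`, running the fault handler on a missing item."""
        frame = self.map(page)
        if frame is None:
            frame = self._handle_fault(page)
        return frame

    def _handle_fault(self, page):
        self.faults += 1
        if len(self.page_to_frame) < self.num_frames:
            # Placement policy (assumed): take the lowest-numbered free frame.
            used = set(self.page_to_frame.values())
            frame = next(f for f in range(self.num_frames) if f not in used)
        else:
            # Replacement policy (assumed): evict the page that was loaded first.
            victim = self.load_order.popleft()
            frame = self.page_to_frame.pop(victim)
        self.page_to_frame[page] = frame   # demand load: happens only on a fault
        self.load_order.append(page)
        return frame


pager = DemandPager(num_frames=2)
frames = [pager.access(p) for p in [0, 1, 0, 2, 1]]
```

With two frames, the fourth access finds memory full and evicts page 0, so `pager.map(0)` returns `None` afterwards. A real system would also write a modified victim page back to disk, which this sketch leaves out.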
Paging Organization (BI)

  Physical Memory (P.A.)              Virtual Memory (V.A.)
      0   frame 0   1K                    0   page 0   1K
   1024         1   1K                 1024        1   1K
    ...                                 ...
   7168         7   1K                31744       31   1K

  • The 1K page is the unit of mapping; it is also the unit of transfer from virtual to physical memory.

Address Mapping

  • The VA is split into a page no. and a 10-bit disp.
  • The page no. is an index into the page table, which is located in physical memory and found through the Page Table Base Reg.
  • Each page table entry holds a valid bit (V), access rights and a frame no.
  • The frame no. is combined with the disp to form the physical memory address. The figure shows an adder (+), but actually, concatenation is more likely.
  • The size of the page table depends upon the size of the virtual address space.


Virtual Address and Cache Access

• Page table is implemented in the main memory. It takes an extra memory access to translate VA to PA.
• This makes cache access very expensive, and this is the "innermost loop" that you want to have go as fast as possible!!

[Figure: CPU → (VA) → Translation → (PA) → Cache; on a hit the cache returns data to the CPU, on a miss the request goes to Main Memory.]
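The address mapping on the Paging Organization slide can be sketched in a few lines. This is a minimal illustration assuming 1K pages (a 10-bit displacement), 32 virtual pages and 8 frames as in the figure; the function name `translate`, the `PageFault` exception and the two page table entries are made up for the example, and access rights are left out.

```python
PAGE_BITS = 10                 # 1K pages, so a 10-bit displacement
PAGE_SIZE = 1 << PAGE_BITS


class PageFault(Exception):
    """Raised when the valid bit of the page table entry is clear."""


def translate(va, page_table):
    """Translate a virtual address using a list of (valid, frame_no) entries."""
    page_no = va >> PAGE_BITS          # high-order bits index the page table
    disp = va & (PAGE_SIZE - 1)        # low-order 10 bits pass through unchanged
    valid, frame_no = page_table[page_no]
    if not valid:
        raise PageFault(page_no)
    # Concatenate frame number and displacement. Because disp < PAGE_SIZE this
    # equals frame_no * PAGE_SIZE + disp, so no real adder is needed.
    return (frame_no << PAGE_BITS) | disp


# Hypothetical contents: 32 virtual pages, 8 physical frames.
page_table = [(False, 0)] * 32
page_table[1] = (True, 7)      # page 1  -> frame 7 (made-up mapping)
page_table[31] = (True, 0)     # page 31 -> frame 0 (made-up mapping)

pa = translate(1024 + 5, page_table)   # page 1, disp 5 -> frame 7, disp 5
```

Here virtual address 1029 lands at physical address 7 × 1024 + 5 = 7173, while any address in an unmapped page raises `PageFault`, which is where the operating system would take over.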




Making address translation fast: TLB

• Translation Look-aside Buffer (TLB) provides a cache of recent translations

[Figure: a virtual address space of 12 pages (0 to 11), a physical memory space of 4 frames (0 to 3), and a page table indexed by the page field of the virtual address (page, off). The page table holds four mappings:]

   MAP(2) = 2
   MAP(4) = 0
   MAP(7) = 1
   MAP(9) = 3

   Translation Lookaside Buffer
     page   frame
       2      2
       4      0


Translation Look-Aside Buffers (TLB)

• Just like any other cache, the TLB can be organized as
    – fully associative,
    – set associative, or
    – direct mapped
• TLBs are usually small, typically not more than 128 - 256 entries
    – This permits fully associative lookup on high-end machines.
    – Most mid-range machines use small n-way set associative organizations.
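The TLB figure can be mimicked with a small fully associative cache in front of the page table. The sketch below uses the four mappings from the figure; the class name `TLB`, the two-entry capacity and the LRU replacement policy are assumptions made for illustration, since the slides do not specify a policy. Page faults are not modelled, so only resident pages may be looked up.

```python
from collections import OrderedDict

# Page table from the figure: only these four pages are resident.
PAGE_TABLE = {2: 2, 4: 0, 7: 1, 9: 3}


class TLB:
    """Fully associative TLB with LRU replacement (the policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # page -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, page):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)     # mark as most recently used
            return self.entries[page]
        self.misses += 1
        frame = PAGE_TABLE[page]               # slow path: consult the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        self.entries[page] = frame
        return frame


tlb = TLB(capacity=2)
frames = [tlb.lookup(p) for p in [2, 4, 2, 7, 4]]
```

After the first two lookups the TLB holds exactly the entries drawn in the figure (page 2 → frame 2, page 4 → frame 0). The third lookup hits, and the later references to pages 7 and 4 each evict an older entry.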
Datapath with TLB

• Accessing the TLB requires 1/2 clock cycle (CC), as compared to multiple cycles to access the page table

[Figure: Translation with a TLB. CPU → (VA) → TLB Lookup (1/2 CC) → on a hit, PA → Cache (1 CC) → on a hit, data returns to the CPU; on a cache miss the request goes to Main Memory (10s to 100s of CC). On a TLB miss, the full Translation takes several CC before the PA reaches the cache.]


Further Improvement

• Reduce the effect of address translation on performance
    – Overlapping the cache access with the TLB access
        • Works because high order bits of the VA are used to look in the TLB while low order bits are used as index into cache

MIPS R3000 Pipeline

   Stage         Activity
   Inst Fetch    TLB, I-Cache
   Dcd/ Reg      RF
   ALU / E.A     Operation; E.A., TLB
   Memory        D-Cache
   Write Reg     WB
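To see why the TLB matters for the Datapath with TLB slide, the cycle counts in that figure can be combined into an expected cost per memory reference. The sketch below is a rough model, not a measurement: the hit rates, the 4 CC page table walk (standing in for "several CC") and the 50 CC memory access (standing in for "10s to 100s of CC") are all assumed values, and `average_access_cycles` is a hypothetical helper.

```python
def average_access_cycles(tlb_hit_rate, cache_hit_rate,
                          tlb_cc=0.5, cache_cc=1.0,
                          walk_cc=4.0, memory_cc=50.0):
    """Expected cycles per reference for the serial TLB -> cache path.

    Every reference pays the TLB lookup and the cache access; TLB misses add
    the page table walk and cache misses add the main memory access.
    """
    translation = tlb_cc + (1.0 - tlb_hit_rate) * walk_cc
    data = cache_cc + (1.0 - cache_hit_rate) * memory_cc
    return translation + data


# Assumed hit rates, chosen only to illustrate the calculation.
with_tlb = average_access_cycles(tlb_hit_rate=0.99, cache_hit_rate=0.95)

# No TLB at all: every reference walks the page table first.
without_tlb = average_access_cycles(tlb_hit_rate=0.0, cache_hit_rate=0.95,
                                    tlb_cc=0.0)
```

Under these assumptions the TLB cuts the average from about 7.5 cycles to about 4.04 cycles per reference, and most of what remains is the cache miss penalty rather than translation.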




Overlapped Cache & TLB Access

[Figure: a 32-bit VA is split into a 20-bit page # and a 12-bit disp. The page # goes to an associative lookup in the TLB (labelled 32 in the figure), which yields the PA and a TLB Hit/Miss signal. In parallel, 10 bits of the disp index a cache of 1K entries of 4 bytes each (the low 2 bits, 00, select the byte), which yields a PA tag, Data and a cache Hit/Miss signal. The PA from the TLB is compared (=) with the cache tag.]

        IF cache hit AND (cache tag = PA) THEN deliver data to CPU
        ELSE IF cache miss AND TLB hit THEN
                 access memory with the PA from the TLB
        ELSE do standard VA translation


Summary #1/3:

• The Principle of Locality:
    – Programs are likely to access a relatively small portion of the address space at any instant in time.
        • Temporal Locality: Locality in Time
        • Spatial Locality: Locality in Space
• Three Major Categories of Cache Misses:
    – Compulsory Misses: sad facts of life. Example: cold start misses.
    – Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
    – Capacity Misses: increase cache size
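The bit slicing behind the Overlapped Cache & TLB Access slide can be made concrete. The sketch below uses the widths from that figure (32-bit VA, 12-bit disp, 1K cache entries of 4 bytes); the function names and the dictionaries standing in for the TLB and the cache are hypothetical, and the miss paths of the slide's IF/ELSE logic are collapsed into returning `None`.

```python
DISP_BITS = 12          # page displacement, untouched by translation
INDEX_BITS = 10         # 1K cache entries
BYTE_BITS = 2           # 4-byte entries

# Overlap only works if the cache index lies entirely inside the displacement.
assert INDEX_BITS + BYTE_BITS <= DISP_BITS


def split_va(va):
    """Return (page_no, cache_index, byte_offset) for a 32-bit virtual address."""
    page_no = va >> DISP_BITS
    cache_index = (va >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    byte_offset = va & ((1 << BYTE_BITS) - 1)
    return page_no, cache_index, byte_offset


def overlapped_access(va, tlb, cache):
    """Look up TLB and cache from the same VA, then compare tag with frame.

    tlb:   dict page_no -> frame_no (hypothetical stand-in for the TLB)
    cache: dict cache_index -> (tag_frame_no, data)
    Returns the data on a hit, or None when memory must be accessed.
    """
    page_no, index, _ = split_va(va)
    frame_no = tlb.get(page_no)     # these two lookups are independent, so
    line = cache.get(index)         # hardware can perform them in parallel
    if frame_no is not None and line is not None and line[0] == frame_no:
        return line[1]
    return None


va = (0x12345 << 12) | (0x2A << 2)
tlb = {0x12345: 0x00042}
cache = {0x2A: (0x00042, "payload")}
result = overlapped_access(va, tlb, cache)
```

The module-level assertion captures the design constraint: the index and byte-select bits together use exactly the 12 displacement bits, so the cache can be indexed before translation finishes. A larger direct-mapped cache would need index bits from the page number and could no longer be accessed this way.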
Summary #2/3: The Cache Design Space

• Several interacting dimensions
    – cache size
    – block size
    – associativity
    – replacement policy
    – write-through vs write-back
    – write allocation
• The optimal choice is a compromise
    – depends on access characteristics
        • workload
        • use (I-cache, D-cache, TLB)
    – depends on technology / cost
• Simplicity often wins

[Figure: the cache design space drawn with axes Cache Size, Associativity and Block Size, plus a plot running from Bad to Good against Less to More for Factor A and Factor B.]


Summary #3/3: TLB, Virtual Memory

• Caches, TLBs and Virtual Memory are all understood by examining how they deal with 4 questions:
    1. Where can a block be placed?
    2. How is a block found?
    3. What block is replaced on a miss?
    4. How are writes handled?
• Page tables map virtual addresses to physical addresses
• TLBs are important for fast translation