     Process Management & IPC
  In Multiprocessor Operating Systems

     Presented by Group A1
       Garrick Williamson
       Brad Crabtree
       Alex MacFarlane

          SMU
Process Management & IPC Intro
        (Focus on Solaris)
       Garrick Williamson


               Introduction
• SunOS is the operating system component of
  the Solaris environment.
• It supports Symmetric Multiprocessing (SMP).
  See diagram on next page for an example of an SMP
  system.
• The kernel runs equally on all processors
  within a tightly coupled shared memory
  multiprocessor system.
• All flows of control in the kernel are threads,
  including interrupts.
SMP System Example




       SunOS 5.0 Architecture
• In addition to kernel-level threads, SunOS also
  supports multiple threads of control within a process,
  called lightweight processes (LWPs).
• There is one kernel thread for each LWP. The kernel
  thread is used when the LWP performs system
  functions/calls.
SunOS Architecture Diagram




              Synchronization
• Threads and processes synchronize in a
  variety of ways:
  –   Mutual exclusion locks
  –   Condition variables
  –   Counting semaphores
  –   Multiple-reader, single-writer locks
• The mutual exclusion and writer locks use a
  priority inheritance protocol in order to
  prevent priority inversion.
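
A mutual exclusion lock that requests priority inheritance can be sketched with the POSIX threads API. This is only an illustration: it assumes POSIX pthreads rather than the native Solaris thread library, and the names `run_counter_demo`, `worker`, `struct counter` and `MAX_WORKERS` are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16              /* illustrative upper bound */

struct counter {
    pthread_mutex_t lock;           /* protects value */
    long value;
    int iterations;
};

/* Each worker increments the shared counter under the mutex. */
static void *worker(void *arg)
{
    struct counter *c = arg;
    for (int i = 0; i < c->iterations; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs nthreads workers; returns the final count, or -1 on error. */
long run_counter_demo(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    pthread_mutexattr_t attr;
    struct counter c = { .value = 0, .iterations = iterations };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    /* Request priority inheritance; if the call fails, the attribute
       keeps its default protocol and the demo still works. */
    (void)pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (pthread_mutex_init(&c.lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &c) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Calling `run_counter_demo(4, 10000)` should return 40000, because no increment is lost while the lock is held. Older toolchains may need the `-pthread` compiler flag.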
                 Solaris IPC
• Solaris provides the following mechanisms for
  IPC:
  – Simple, but limited mechanisms include
     • Signals
     • Pipes and named pipes (FIFO)
     • Sockets
  – More versatile mechanisms include
     • Message Queues
     • Shared memory (with memory-mapped files and IPC
       shared memory options)
     • Semaphores
             Simple IPC
• Pipes do not allow unrelated processes to
  communicate; both ends must be inherited
  from a common ancestor.
• Named pipes allow unrelated processes to
  communicate, but are not private
  channels.
• Using the kill function, processes may
  communicate with signals, but the only
  information conveyed is the signal number.
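
The restriction on pipes follows from how they are created: the two descriptors exist only in the creating process and in children that inherit them. A minimal sketch, assuming a POSIX system; the function name `pipe_from_child` is invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes msg into a pipe and the parent reads it into buf.
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_from_child(const char *msg, char *buf, size_t bufsize)
{
    int fds[2];
    pid_t pid;
    ssize_t total = 0, n;

    if (bufsize == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: writer */
        size_t len = strlen(msg);
        ssize_t w;
        close(fds[0]);                   /* child does not read */
        w = write(fds[1], msg, len);
        close(fds[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                       /* parent: reader only */
    while ((size_t)total < bufsize - 1 &&
           (n = read(fds[0], buf + total, bufsize - 1 - (size_t)total)) > 0)
        total += n;                      /* read() returns 0 at end of data */
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);               /* reap the child */
    return total;
}
```

The parent and child can talk only because `fork()` copied the descriptors; a process started independently has no way to obtain them, which is the gap named pipes fill.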
            Complex IPC
• Messaging allows formatted data streams
  to be sent to arbitrary processes.
• Semaphores allow processes to
  synchronize.
• Shared memory allows processes to
  share part of their virtual address space.
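
Sharing part of the address space can be shown with a memory mapping that stays shared across `fork()`. This sketch assumes a system that offers `MAP_ANONYMOUS` (common, but not required by POSIX); the function name `share_int_with_child` is hypothetical.

```c
#define _DEFAULT_SOURCE               /* exposes MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child stores value in a shared page; the parent reads it back.
   Returns the value seen by the parent, or -1 on error. */
int share_int_with_child(int value)
{
    int result;
    pid_t pid;
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                   /* child writes into the shared page */
        *shared = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);            /* child has exited, so its store is complete */
    result = *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```

With an ordinary private variable the parent would still see 0 after the child exits; the shared mapping is what makes the child's store visible.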

IRIX Process Management
        And IPC
      Brad Crabtree


               Outline
• Hardware Background
• Process Management Facilities
• Interprocess Communication Facilities




           Large Scale Computing
             Machines a Reality
•   The Avalon A12.
•   The Cambridge Parallel Processing Gamma II Plus.
•   The Compaq AlphaServer SC.
•   The Fujitsu AP3000.
•   The Fujitsu VPP5000 series.
•   The Hitachi SR8000 system.
•   The HP Exemplar V2600.
•   The IBM RS/6000 SP.
•   The NEC Cenju-4.
•   The NEC SX-5.
•   The Quadrics Apemille.
•   The SGI Origin 2000 series.
•   The Sun E10000 Starfire.
•   The Tera/Cray SV1.
•   The Tera/Cray T3E.
•   The Tera MTA

• June 2001
  – Raytheon installs 1152-processor Origin
    3000 series at NOAA
  – $67M
  – 900 BFLOPS/sec
  – 2 PB Tape Library
     SGI Origin Architecture
• ccNUMA (NUMALink)
  – non-blocking
    crossbar switches as
    an interconnect fabric
  – 1.6GB-per-second
    crossbar switch



Switch versus Bus




   “Cellular IRIX” Scheduler
• Facilities for Improving Scalability and
  Locality
• Job Priorities
        –   Real-Time Jobs
        –   Batch Critical
        –   Time Share
        –   Batch
        –   Weightless
• User-level Scheduler Concept
            Real Time Jobs
• Global Run Queue replaced with Implicit
  Binding Scheme
  – improves cache affinity and scalability
  – binds top N jobs, by priority, to N CPUs
     • A CPU is always available when a real-time job
       comes in, because the currently running job is of
       lower priority
     • Real-Time jobs always go to the same CPU
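
The "top N jobs to N CPUs" idea can be sketched in a few lines. This is not the IRIX implementation: `struct job`, `bind_top_jobs` and the convention that a larger number means higher priority are all assumptions made for illustration, and the sketch does not try to keep a job on the same CPU across successive calls.

```c
#include <stdlib.h>

struct job {
    int id;
    int priority;   /* assumption: larger value = higher priority */
    int cpu;        /* assigned CPU, or -1 if not bound */
};

/* qsort comparator: highest priority first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct job *ja = a, *jb = b;
    return (jb->priority > ja->priority) - (jb->priority < ja->priority);
}

/* Sorts jobs by priority and binds the top ncpus jobs to CPUs 0..ncpus-1.
   Every other job is left unbound (cpu = -1). */
void bind_top_jobs(struct job *jobs, size_t njobs, size_t ncpus)
{
    qsort(jobs, njobs, sizeof jobs[0], by_priority_desc);
    for (size_t i = 0; i < njobs; i++)
        jobs[i].cpu = (i < ncpus) ? (int)i : -1;
}
```

With three jobs of priority 10, 50 and 30 and two CPUs, the jobs with priority 50 and 30 are bound to CPUs 0 and 1 and the third stays unbound. An arriving job with higher priority would displace only the lowest-priority bound job.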
    Hard Real-Time in IRIX
• REACT/PRO Extensions
  – Lock processes and memory to CPUs
  – Disable the IRIX scheduler and replace it with the
    Frame Scheduler, the Deadline Scheduler, or
    none (your own)
  – Direct interrupts away from CPUs
  – Deterministic interrupt latency
    Time Sharing Scheduler
• Degrading Priority replaced with
  Earnings Model
  – Distribution controlled by Virtual
    Multiprocessors (VMPs)
  – at 1 Hz, VMPs balance run queues with
    nearest neighbors and push out extra work
     Parallel Job Scheduling
• Gang Scheduling replaced with
  Nanothreads
  – Space sharing over time sharing
  – Job requests CPUs, gets the number available, and then
    the algorithm is re-blocked
  – When a thread is preempted, its context is saved to
    shared memory and the User Level Scheduler
    re-blocks again
     Replicated Kernel Text
• Wired in 16MB TLB pair into kernel
  virtual memory space
  – One read-only, one read-write
  – TLB miss exception overhead is avoided




        Memory Migration
• Trying to avoid memory hot spots
• Reference counters in hub (local/remote)
• Fast Block Transfer Engine
  – Marks Source Page as Poisoned
     • Lazy TLB Shootdown
• Hysteresis used to manage frequent
  migration
   Types of IPC & Compatibility
Type of IPC     Purpose                                                         Compatibility
Signals         A means of receiving notice of a software or hardware event,    POSIX, SVR4,
                asynchronously.                                                 BSD
Shared memory   A way to create a segment of memory that is mapped into the     POSIX, IRIX,
                address space of two or more processes, each of which can       SVR4
                access and alter the memory contents.
Semaphores      Software objects used to coordinate access to countable         POSIX, IRIX,
                resources.                                                      SVR4
Locks,          Software objects used to ensure exclusive use of single         POSIX, IRIX
Mutexes, and    resources or code sequences.
Condition
Variables
Barriers        Software objects used to ensure that all processes in a group   IRIX
                are ready before any of them proceed.
Message Queues  Software objects used to exchange an ordered sequence of        POSIX, SVR4
                messages.
File Locks      A means of gaining exclusive use of all or part of a file.      SVR4, BSD
Sockets         Virtual data connections between processes that may be in       BSD
                different systems.
                POSIX vs. IRIX Shared
                      Memory
POSIX
Function Name    Purpose and Operation
mmap(2)          Map a file or shared memory object into the address space
shm_open(2)      Create, or gain access to, a shared memory object.
shm_unlink(2)    Destroy a shared memory object when no references to it remain open.


IRIX
Function Name    Purpose and Operation
usconfig(3)      Establish the default size of an arena, the number of concurrent processes that can use it, and
                 the features of IPC objects in it.
usinit(3)        Create an arena or join an existing arena.
usadd(3)         Join an existing arena.
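
The three POSIX calls in the table above are normally used together: `shm_open` to obtain a descriptor, `mmap` to map it, and `shm_unlink` to remove the name. The sketch below is an assumption-laden illustration rather than IRIX-specific code: the function name `shm_roundtrip` is invented, and the two mappings made in one process stand in for mappings made by two separate processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates the shared memory object `name`, stores msg through one mapping
   and copies it out through a second mapping of the same object.
   `out` must hold at least `size` bytes. Returns 0 on success, -1 on error. */
int shm_roundtrip(const char *name, const char *msg, char *out, size_t size)
{
    int rc = -1;
    char *w = MAP_FAILED, *r = MAP_FAILED;
    int fd;

    if (size == 0 || strlen(msg) >= size)
        return -1;
    fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)size) == -1)     /* give the object a size */
        goto out;

    w = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    r = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (w == MAP_FAILED || r == MAP_FAILED)
        goto out;

    strcpy(w, msg);        /* write through the first mapping */
    strcpy(out, r);        /* the second mapping sees the same bytes */
    rc = 0;
out:
    if (w != MAP_FAILED)
        munmap(w, size);
    if (r != MAP_FAILED)
        munmap(r, size);
    close(fd);
    shm_unlink(name);      /* remove the name; memory is freed once unmapped */
    return rc;
}
```

In real use, a second process would call `shm_open` with the same name and map the object itself. Some older C libraries need `-lrt` at link time for `shm_open`.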
                             usconfig options
usconfig() Flag Name   Meaning
CONF_INITSIZE          The initial size of the arena segment. The default is 64 KB. Often you know
                       that more is needed.
CONF_AUTOGROW          Whether or not the arena can grow automatically as more IPC objects or data
                       objects are allocated (default: yes).
CONF_INITUSERS         The largest number of concurrent processes that can use the arena. The default is 8;
                       if more processes than this will use IPC, the limit must be set higher.
CONF_CHMOD             The effective file permissions on arena access. The default is 600, allowing only
                       processes with the effective UID of the creating process to attach the arena.
CONF_ARENATYPE         Establish whether the arena can be attached by general processes or only by members
                       of one program (a share group).
CONF_LOCKTYPE          Whether or not lock objects allocated in the arena collect metering statistics as they are used.
CONF_ATTACHADDR        An explicit memory base address for the next arena to be created.
CONF_HISTON/OFF        Start and stop collecting usage history (more bulky than metering information) for semaphores
                       in a specified arena.
CONF_HISTSIZE          Set the maximum size of semaphore history records.
                IRIX IPC
• Tuned for Multiprocessor Environment
• Utilizes “shared arena” memory
  – memory that can be mapped into the
    address spaces of multiple processes
  – A shared arena is identified with a file that
    acts as the backing store for the arena
    memory
  – shared memory is pinned into physical
    memory, accessible by programs and kernel
           First Touch Rule
• Pages in an arena are allocated via first
  touch
  – places each virtual page in the node that first
    accesses it
• To ensure that processes spread across nodes have local
  access to their most used pages, touch whole
  pages in the arena from the processes which use
  them most
  – otherwise dynamic reallocation will handle it, but more slowly
Linux Process Management

    Alex MacFarlane




                         Threads
• Number of threads is limited only by the size of
  physical memory. By default, thread structures
  may take up to half of it:
  max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2;

• Modifiable at runtime using sysctl() or
  the proc filesystem interface.
• Was limited to 4k in Linux 2.2
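
A worked instance of the formula above may help. The figures used here are assumptions for illustration, not values taken from the slides: a 4 KB page, an 8 KB per-thread structure, and 128 MB of physical memory. The function name `default_max_threads` is also invented.

```c
/* Evaluates the slide's formula with integer division, left to right:
   pages per thread structure first, then half of the resulting count. */
unsigned long default_max_threads(unsigned long mempages,
                                  unsigned long thread_size,
                                  unsigned long page_size)
{
    return mempages / (thread_size / page_size) / 2;
}
```

With 128 MB of memory there are 32768 pages of 4 KB. Each thread structure takes 8192 / 4096 = 2 pages, so `default_max_threads(32768, 8192, 4096)` gives 32768 / 2 / 2 = 8192 threads.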

            Thread Types
• Idle Thread(s)
  – One per CPU in SMP system
  – Created at boot time
• Kernel Threads
• User-space Threads
• Threads created by clone(), an extension
  to fork()
             clone() flags
• CLONE_VM
  – Share data and stack
• CLONE_FS
  – Share filesystem info
• CLONE_FILES
  – Share open files
• CLONE_SIGHAND
  – Share signal handlers
• CLONE_PID
  – Share PID with parent
    Linux Scheduling Policies
• SCHED_OTHER
  – Traditional UNIX scheduling
• SCHED_FIFO
  – Runs until blocking on I/O, explicitly yielding the CPU,
    or being pre-empted by a higher-priority realtime
    task.
• SCHED_RR
  – Same as SCHED_FIFO but limited to a timeslice
• All unprivileged user-space tasks must use SCHED_OTHER
• Static priorities may be assigned using nice()
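
The policies above are visible from user space through the POSIX scheduling calls. The following is a minimal sketch; `struct prio_range`, `policy_range` and `try_fifo` are hypothetical names, and the exact priority ranges depend on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>

struct prio_range {
    int min;
    int max;
};

/* Returns the priority range the system reports for a policy,
   or {-1, -1} if the policy is not recognised. */
struct prio_range policy_range(int policy)
{
    struct prio_range r;

    r.min = sched_get_priority_min(policy);
    r.max = sched_get_priority_max(policy);
    if (r.min == -1 || r.max == -1)
        r.min = r.max = -1;
    return r;
}

/* Tries to move the calling process to SCHED_FIFO at the lowest realtime
   priority. Returns 0 on success and -1 on failure; an unprivileged
   caller normally fails with errno set to EPERM. */
int try_fifo(void)
{
    struct sched_param sp;

    sp.sched_priority = sched_get_priority_min(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

`policy_range(SCHED_FIFO)` and `policy_range(SCHED_RR)` report the realtime range, which POSIX requires to span at least 32 levels. Calling `try_fifo()` from an ordinary account is a quick way to see the privilege requirement in practice.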
      Process Representation
• A collection of struct task_struct structures
• Linked in two ways:
  – A hashtable hashed on pid
  – A circular doubly-linked list
• Find specific task using find_task_by_pid()
• Walk tasks using for_each_task()
• Modifications protected by a read-write
  spinlock.
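
The double linkage can be modelled in user space. This is a simplified sketch, not kernel code: `struct task`, `struct task_table`, `PIDHASH_SZ` and the `table_*` functions are illustrative names, and the read-write spinlock that protects the real structures is left out.

```c
#include <stddef.h>

#define PIDHASH_SZ 64                     /* illustrative bucket count */

struct task {
    int pid;
    struct task *next_task, *prev_task;   /* circular list of all tasks */
    struct task *hash_next;               /* chain within one hash bucket */
};

struct task_table {
    struct task head;                     /* sentinel that closes the circle */
    struct task *hash[PIDHASH_SZ];
};

static unsigned pid_hash(int pid)
{
    return (unsigned)pid % PIDHASH_SZ;
}

void table_init(struct task_table *t)
{
    t->head.pid = 0;
    t->head.next_task = t->head.prev_task = &t->head;
    t->head.hash_next = NULL;
    for (size_t i = 0; i < PIDHASH_SZ; i++)
        t->hash[i] = NULL;
}

/* Links p into both structures: list tail and front of its hash chain. */
void table_add(struct task_table *t, struct task *p)
{
    p->prev_task = t->head.prev_task;
    p->next_task = &t->head;
    t->head.prev_task->next_task = p;
    t->head.prev_task = p;

    p->hash_next = t->hash[pid_hash(p->pid)];
    t->hash[pid_hash(p->pid)] = p;
}

/* Lookup by pid walks only one hash chain. */
struct task *table_find(struct task_table *t, int pid)
{
    for (struct task *p = t->hash[pid_hash(pid)]; p != NULL; p = p->hash_next)
        if (p->pid == pid)
            return p;
    return NULL;
}

/* Walking every task follows the circular list until it returns to the head. */
int table_count(struct task_table *t)
{
    int n = 0;

    for (struct task *p = t->head.next_task; p != &t->head; p = p->next_task)
        n++;
    return n;
}
```

The hash table keeps lookup by pid short, while the circular list gives a simple way to visit every task; keeping both is why each insertion has to update two sets of links.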
                  Process States
•   TASK_RUNNING: the task is in the run queue.
•   TASK_INTERRUPTIBLE: the task is sleeping but can
    be woken up by a signal or by expiry of a timer.
•   TASK_UNINTERRUPTIBLE: same as previous, except it
    cannot be woken up by a signal.
•   TASK_ZOMBIE: task has terminated but has not had its status
    collected (wait()-ed for) by the parent (natural or by adoption).
•   TASK_STOPPED: task was stopped, either due to job control
    signals or due to ptrace().
•   TASK_EXCLUSIVE: this is not a separate state but can be OR-ed
    with either TASK_INTERRUPTIBLE or
    TASK_UNINTERRUPTIBLE. Prevents “thundering herd”.
•   A process’ state may be modified asynchronously.
         Atomic Operations
• Two types
  – Bitmap
  – atomic_t
• Wrapped by bus locking on SMP
• Bitmap operations – for free/allocated bitmaps
  – set_bit(), clear_bit(), change_bit(),
    test_and_set_bit() etc.
• atomic_t operations – for numeric counts
  – atomic_read(), atomic_set(), atomic_add(),
    atomic_inc() etc.
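
The kernel primitives are not callable from user programs, but C11 `<stdatomic.h>` offers close user-space analogues. The sketch below uses those as stand-ins; `counter_inc`, `bitmap_test_and_set` and `bitmap_clear` are hypothetical names chosen to mirror atomic_inc(), test_and_set_bit() and clear_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Numeric count: atomically adds one and returns the new value. */
int counter_inc(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

/* Bitmap: atomically sets bit nr and reports whether it was already set. */
bool bitmap_test_and_set(atomic_ulong *map, unsigned nr)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_or(map, mask) & mask) != 0;
}

/* Bitmap: atomically clears bit nr. */
void bitmap_clear(atomic_ulong *map, unsigned nr)
{
    atomic_fetch_and(map, ~(1UL << nr));
}
```

A free/allocated bitmap uses the test-and-set result to decide ownership: the caller that sees `false` claimed the slot, and any caller that sees `true` must try another bit.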
                              References
•   The SGI Origin software environment and application performance , Whitney, S.;
    McCalpin, J.; Bitar, N.; Richardson, J.L.; Stevens, L., Compcon '97. Proceedings, IEEE ,
    1997, Page(s): 165 -170
•   An Integrated Kernel- and User-Level Paradigm for Efficient Multiprogramming,
    Master’s Thesis, D. Craig, CSRD Technical Report No. 1533, University of Illinois at
    Urbana-Champaign, 1999.
•   Integrated scheduling of multimedia and hard real-time tasks, Kaneko, H.;
    Stankovic, J.A.; Sen, S.; Ramamritham, K., Real-Time Systems Symposium, 1996., 17th
    IEEE , 1996, Page(s): 206 -217
•   An Efficient Kernel-level Scheduling Methodology for Multiprogrammed Shared
    Memory Multiprocessors, Proc. of the First Merged IPPS/SPDP Conference, pp. 392--
    397, Orlando, FL, 1998. 18



•   Topics in IRIX Programming, Chapter 2, Interprocess Communication, Silicon
    Graphics, Inc., 2001
•   Topics in IRIX Programming, Chapter 3, Sharing Memory Between Processes,
    Silicon Graphics, Inc., 2001
                     References
• Phyllis E. Crandall, Eranti V. Sumithasri, and Mark A. Clement.
  Performance comparison of desktop multiprocessing and
  workstation cluster computing. In Proceedings of the Fifth
  International Symposium on High Performance Distributed
  Computing, August 1996.
• www.sun.com
• Kotz, David and Nils Nieuwejaar, Flexibility and Performance of
  Parallel File Systems, ACM Operating Systems Review 30(2),
  ACM Press, April 1996, pp. 63-73.


