Windows Internals

David Solomon
David Solomon Expert Seminars

Mark Russinovich
About the Speaker: David Solomon
1982-1992: VMS operating systems development at
1992-present: Researching, writing, and teaching
Windows operating system internals
Frequent speaker at technical conferences
(Microsoft TechEd, IT Forum, PDCs, …)
Microsoft Most Valuable Professional (1993, 2005)
   Windows Internals, 4th edition
      PDF version ships with Server 2003 Resource Kit
   Inside Windows 2000, 3rd edition
   Inside Windows NT, 2nd edition
   Windows NT for OpenVMS Professionals
Live Classes
   2-5 day classes on Windows Internals,
   Advanced Troubleshooting
Video Training
   12 hour interactive internals tutorial
   Licensed by MS for internal use
About the Speaker: Mark Russinovich
 Co-author of Inside Windows 2000, 3rd
 Edition and Windows Internals, 4th edition
 with David Solomon
 Senior Contributing Editor to Windows IT
 Pro Magazine
    Co-authors Windows Power Tools column
 Author of tools on
 Microsoft Most Valuable Professional
 Co-founder and chief software architect
 of Winternals Software
 Ph.D. in Computer Engineering
 Special thanks to:
   Dave Cutler for initially granting David access to
   the source code in 1993 and reviewing the book
   and presentations
   Rob Short & Jim Allchin for continuing to be our
   “executive sponsors”
 Also thanks to many others in the Windows
 team (past & present) for their support and
   Landy Wang, Neil Clift, Jim Allchin, Mark Lucovsky, Brian
   Andrews, Richard Ward, Steve Wood, Tom Miller, Gary
   Kimura, Darryl Havens, Lou Perazzoli

Purpose of Tutorial
 Give Windows developers a foundation
 understanding of the system’s kernel
   Design better for performance & scalability
   Debug problems more effectively
   Understand system performance issues
 We’re covering a small, but important set of core
   The “plumbing in the boiler room”

System Architecture
 [Diagram: layered system architecture; some box labels were lost in extraction]
 User mode
   System processes: Service Control Mgr., WinLogon, Session Manager
   Services: SvcHost.Exe, WinMgmt.Exe, SpoolSv.Exe, Services.Exe
   Applications: Task Manager, Explorer, user application
   Environment subsystems: POSIX, OS/2, Windows
   Subsystem DLLs
   NTDLL.DLL
 Kernel mode
   System Service Dispatcher (kernel mode callable interfaces)
   Executive components (partly legible): I/O Mgr, Plug and Play Mgr., …
   Windows USER/GDI
   Device & file system drivers, graphics drivers
   Hardware Abstraction Layer (HAL)
 Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)
Tools Used To Dig In
 Many tools available to dig into Windows
 OS internals without requiring source code
   Helps to see internals behavior “in action”
   Many of these tools are used in labs in the
   video and the book
 Several sources of tools
   Support Tools (on Windows OS CD-ROM in
   Resource Kit Tools
   Sysinternals tools
   Windows Debugging Tools
Live Kernel Debugging
 Useful for investigating internal system
 state not available from other tools
   Previously, required 2 computers
   (host and target)
   Target would be halted while host debugger
   in use
 XP & later support live local kernel debugging
   Technically requires system to be booted
   /DEBUG to work correctly
   But, not all commands work
 LiveKd makes more commands work on a
 live system
  Works on NT4, Windows 2000, Windows XP,
  Server 2003, and Vista
  Was originally shipped on Inside Windows 2000
  book CD-ROM – now is free on Sysinternals
  Tricks standard Microsoft kernel debuggers
  into thinking they are looking at a crash dump
  Does not guarantee consistent view of
  system memory
    Thus can loop or fail with access violation
    Just quit and restart
1.   System Architecture
2.   Processes and Thread Internals
3.   Memory Management Internals
4.   Security Internals

System Architecture
 Process Execution Environment
 Kernel Architecture
 Interrupt Handling
 Object Manager
 System Threads
 Process-based code

Processes And Threads
 What is a process?
   Represents an instance of a running program
     You create a process to run a program
     Starting an application creates a process
   Process defined by
     Address space
     Resources (e.g., open handles)
     Security profile (token)
 System call
   Primary argument to CreateProcess is image file name (or command line)
 [Diagram: threads inside a per-process address space, above the system-wide address space]
Processes And Threads
  What is a thread?
    An execution context within a process
    Unit of scheduling (threads run, processes don’t run)
    All threads in a process share the same per-process address space
      Services provided so that threads can synchronize access to shared
      resources (critical sections, mutexes, events, …)
    All threads in the system are scheduled as peers to all others, without
    regard to their “parent” process
  System call:
    Primary argument to CreateThread is a function entry point address
  Linux:
    No threads per-se
    Tasks can act like Windows threads by sharing handle table, PID and
    address space
  [Diagram: threads inside a per-process address space, above the system-wide address space]
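
The two points above (a thread starts at a function entry point, and all threads in a process share one address space, so shared data needs synchronization) can be illustrated with a minimal sketch. It uses Python's threading module rather than the Windows CreateThread call itself; the names worker, counter and run_threads are illustrative assumptions, not part of any Windows API.

```python
import threading

counter = 0                      # shared data: visible to every thread in the process
counter_lock = threading.Lock()  # plays the role of a mutex / critical section


def worker(iterations):
    """Thread entry point: the function each new thread starts executing."""
    global counter
    for _ in range(iterations):
        with counter_lock:       # serialize access to the shared counter
            counter += 1


def run_threads(thread_count, iterations):
    """Create several threads in this process and wait for all of them."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_threads(4, 10_000))
```

Because every thread updates the same variable, the lock is what keeps the final count equal to thread_count times iterations.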
Processes And Threads
 Every process starts with one thread
    First thread executes the program’s “main” function
       Can create other threads in the same process
       Can create additional processes
 Why divide an application into multiple threads?
    Perceived user responsiveness, parallel/background execution
       Examples: Word background print – can continue to edit during print
    Take advantage of multiple processors
       On an MP system with n CPUs, n threads can literally run at the
       same time
       Question: Given a single threaded application, will adding a second
       processor make it run faster?
    Does add complexity
       Scaling well is a different question…
          Number of runnable threads versus number of CPUs
          Having too many runnable threads causes excess context switching
32-bit x86 Address Space
 32-bits = 4 GB
   Default: 2 GB user process space, 2 GB system space
   3 GB user space option: 3 GB user process space, 1 GB system space

64-bit Address Spaces
 64-bits = 17,179,869,184 GB
   x64 today supports 48 bits virtual = 262,144 GB
   IA-64 today supports 50 bits virtual = 1,048,576 GB

                          x64               Itanium
   User process space     8192 GB (8 TB)    7152 GB (7 TB)
   System space           6657 GB           6144 GB
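
The address-space sizes quoted on the two slides above follow directly from the number of address bits. A small sketch of the arithmetic (the function name addressable_gb is just an illustrative choice):

```python
GB = 2 ** 30  # bytes per gigabyte (binary)


def addressable_gb(bits):
    """Return the size, in GB, of an address space with the given number of bits."""
    return (2 ** bits) // GB


if __name__ == "__main__":
    for bits in (32, 48, 50, 64):
        print(f"{bits} bits = {addressable_gb(bits):,} GB")
```

The results match the figures on the slides: 4 GB for 32 bits, 262,144 GB for 48 bits, 1,048,576 GB for 50 bits and 17,179,869,184 GB for 64 bits.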
Memory Protection Model
 No user process can touch another user process address
 space (without first opening a handle to the process,
 which means passing through NT security)
   Separate process page tables prevent this
   “Current” page table changed on context switch from a thread in 1
   process to a thread in another process
 No user process can touch kernel memory
   Page protection in process page tables prevents this
   OS pages only accessible from “kernel mode”
     x86: Ring 0, Itanium: Privilege Level 0
   Threads change from user to kernel mode and back (via a secure
   interface) to execute kernel code
     Does not affect scheduling (not a context switch)

Process Explorer (Sysinternals)
  “Super Task Manager”
   Shows full image path, command line, environment
   variables, parent process, thread details, security access
   token, open handles, loaded DLLs & mapped files


Windows Kernel Evolution
 Basic kernel architecture has remained
 stable while system has evolved
   Windows 2000: major changes in I/O
   subsystem (plug & play, power management,
   WDM), but rest similar to NT4
   Windows XP & Server 2003: modest upgrades
   as compared to the changes from NT4 to
   Windows 2000
 Internal version numbers confirm this:
     Windows 2000 was 5.0
     Windows XP is 5.1
     Windows Server 2003 is 5.2
     Windows Vista is 6.0
Kernel Architecture
 Is Windows NT/2000/XP/2003 a microkernel-based OS?
   No – not using the academic definition (OS components and
   drivers run in their own private address spaces, layered on a
   primitive microkernel)
   All kernel components live in a common shared address space
     Therefore no protection between OS and drivers
   But it does have some attributes of a microkernel OS
     OS personalities running in user space as separate processes
     Kernel-mode components don't reach into one another’s
     data structures
       Use formal interfaces to pass parameters and access and/or modify data
   Therefore the term “modified microkernel”
 Why not pure microkernel?
   Performance – separate address spaces would mean context
   switching to call basic OS services
 Linux has the same monolithic kernel architecture
   So do most Unixes, VMS, …
Invoking a Win32 Kernel API
 User mode
   Windows application: call WriteFile(…)
   WriteFile in Kernel32.Dll: call NtWriteFile, return to caller (Win32-specific)
   NtWriteFile in NtDll.Dll: Int 2E or SYSENTER or SYSCALL, return to caller (used by all subsystems)
 Software interrupt (transition to kernel mode)
 Kernel mode
   KiSystemService in NtosKrnl.Exe: call NtWriteFile, dismiss interrupt
   NtWriteFile in NtosKrnl.Exe: do the operation, return to caller
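
The layering in the call path above can be mimicked with a toy model in which each layer is an ordinary Python function. This is only a sketch of the structure: there is no real privilege transition, the service number and table are made up for illustration, and the function names are hypothetical stand-ins for the routines named on the slide.

```python
# Toy model only: real system calls cross a hardware-enforced privilege
# boundary; here each layer is just a Python function.

def kernel_nt_write_file(handle, data):
    """Stands in for NtWriteFile in NtosKrnl.Exe: does the operation."""
    return len(data)  # pretend every byte was written


# Hypothetical service table: service number -> kernel routine.
NT_WRITE_FILE_SERVICE = 0  # made-up service number for illustration
SERVICE_TABLE = {NT_WRITE_FILE_SERVICE: kernel_nt_write_file}


def ki_system_service(service_number, *args):
    """Stands in for KiSystemService: look up and call the kernel routine."""
    routine = SERVICE_TABLE[service_number]
    return routine(*args)


def ntdll_nt_write_file(handle, data):
    """Stands in for the NtWriteFile stub in NtDll.Dll (the mode change)."""
    return ki_system_service(NT_WRITE_FILE_SERVICE, handle, data)


def kernel32_write_file(handle, data):
    """Stands in for WriteFile in Kernel32.Dll: rearranges the arguments."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    written = ntdll_nt_write_file(handle, data)
    return True, written


if __name__ == "__main__":
    print(kernel32_write_file(1, "hello"))
```

The point of the sketch is that the subsystem-specific wrapper and the native stub are thin; the dispatcher selects the kernel routine by service number.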
API Differences
 Windows DLLs versus NtDll.Dll
    Windows “kernel” APIs exported by Kernel32.Dll are different from the
    “native API” in NtDll.Dll
       Different entry point names
       Arguments are different (but similar)
    Routines in Kernel32.Dll rearrange (“marshal”) the arguments and call
    routines in NtDll.Dll
    NtDll.Dll uses change mode mechanism (INT 2E, SYSCALL) to invoke
    services in NtosKrnl.Exe in kernel mode
 NtDll.Dll versus NtosKrnl.Exe
    1400 exported symbols (285 start with “Nt”)
    Entry point names, arguments, etc., are the same between NtDll.Dll and NtosKrnl.Exe
       I.e., a user-mode routine in the native API can also be called from kernel mode
    The DDK describes many “Zw” routines such as ZwReadFile, callable
    from kernel mode – this is the same location in memory as NtReadFile
    from user mode
    Kernel mode code could also call NtReadFile directly
Symmetric Multiprocessing (SMP)
 No master processor
  All the processors share just one memory space
  Interrupts can be serviced on any processor
  Any processor can cause another processor to reschedule what it’s doing
  [Diagram: processors sharing Memory and I/O]

 Windows Server 2003 supports NUMA (non uniform
 memory architecture) systems
New MP Configurations
Hyperthreading support
  CPU fools OS into thinking there are multiple CPUs
    Example: dual Xeon with hyperthreading can support 4 logical processors (2 per physical CPU)
  XP & Windows Server 2003 are hyperthreading aware
    Logical processors don’t count against physical CPU limits
     E.g. XP Home will use 2 logical processors; XP Pro will use 4
    Scheduling algorithms take into account logical vs physical processors
Dual Core
  Processor licensing is per-socket
NUMA (non uniform memory architecture)
  Groups of physical processors (called “nodes”) that have “local memory”
  Still an SMP system (e.g. any processor can access all of memory)
    But node-local memory is faster
  Scheduling algorithms take this into account
Kernel Synchronization
 Kernel synchronization primitives
   Queued Spinlocks
   Executive Resources
   Fast Mutexes, Guarded Mutexes
   Kernel Dispatcher Mutexes & Semaphores
 Scalability improvements
   Elimination of locks
   Locks held shorter durations
   Scheduling database now per-CPU
Increased System Memory Limits
 Key system memory limits raised in XP and 2003
 Windows 2000 limit of 200 GB of mapped file
 data eliminated
  Previously limited size of files that could be backed up
 Variable system PTEs can now describe 1.3 GB
 (960 MB contiguous)
  Windows 2000 limit was 660 MB (220 MB contiguous)
  Max device driver size was 220 MB, now 960 MB
 Registry limit of 376MB removed
  Was a limit on number of terminal server users
  No longer in paged pool – now a memory-mapped file
    No registry quota any more
  SYSTEM hive limited to 200 MB or ¼ of RAM,
  whichever is lower (max was 12 MB)
Increased Limits in 64-bit
                        IA64       x64        x86
 User Address Space     7152 GB    8192 GB    2-3 GB
 Page file limit        16 TB      16 TB      4095 MB (PAE: 16 TB)
 Max page file space    256 TB     256 TB     ~64 GB
 System PTE Space       128 GB     128 GB     1.2 GB
 System Cache           1 TB       1 TB       960 MB
 Paged pool             128 GB     128 GB     470-650 MB
 Non-paged pool         128 GB     128 GB     256 MB

Many Packages…
1. Windows XP Home Edition
      Licensed for 1 CPU die, 4GB RAM
2. Windows 2000 & XP Professional
      Desktop version (but also is a fully functional server system)
      Licensed for 2 CPU dies, 4GB RAM (128GB for 64-bit edition on x64)
3. Windows Server 2003, Web Server
      Reduced functionality Standard Server (no domain controller)
      Licensed for 2 CPU dies, 2GB RAM
4. Windows Server 2003, Standard Edition (formerly Windows 2000 Server)
      Adds server and networking features (active directory-based domains, host-based
      mirroring and RAID 5, NetWare gateway, DHCP server, WINS, DNS, …)
            Licensed for 4 CPU dies, 4GB RAM (128GB on x64)
5. Windows Server 2003, Enterprise Edition
   (formerly Windows 2000 Advanced Server )
       3GB per-process address space option, Clusters (8 nodes)
       32-bit: 8 CPU dies, 32GB RAM; 64-bit: 64GB
6. Windows 2000 Datacenter Server & Windows 2003 Server, Datacenter Edition
       32-bit: 32 processors, 64GB RAM; 64-bit: 64 processors & 1024GB RAM
   NOTE: this is not an exhaustive list
      XP: Tablet PC edition, Media Center Edition, Starter Edition, N Edition
      Server: Small Business Server, Storage Server, …

...One OS Kernel
 Windows XP & 2003 for x64 (5.2) and all Windows 2000
 versions have identical core operating system
   XP & Server 2003 have different kernel versions (5.1 vs 5.2)
 Registry indicates system type (set at install time)
     ProductType: WinNT=Workstation, ServerNT=Server not a domain
     controller, LanManNT=Server that is a Domain Controller
     ProductSuite: indicates type of Server (Advanced, Datacenter, or for
     Windows NT 4.0: Enterprise Edition, Terminal Server, …)
 Code in the operating system tests these values and
 behaves slightly differently in a few places
   Licensing limits (number of processors, number of inbound network
   connections, etc.)
   Boot-time calculations (mostly in the memory manager)
   Default length of time slice
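
As a rough illustration of how code might branch on the ProductType values listed above, here is a minimal sketch. It works on a plain string and does not read the registry; the function names are hypothetical, and only the three values and their meanings come from the slide.

```python
# Values and meanings as listed on the slide above.
PRODUCT_TYPES = {
    "WinNT": "Workstation",
    "ServerNT": "Server, not a domain controller",
    "LanManNT": "Server that is a domain controller",
}


def describe_product_type(value):
    """Map a ProductType string to a description (case-insensitive)."""
    for name, description in PRODUCT_TYPES.items():
        if name.lower() == value.lower():
            return description
    return "Unknown"


def is_server(value):
    """True for either of the two server ProductType values."""
    return value.lower() in ("servernt", "lanmannt")


if __name__ == "__main__":
    for value in ("WinNT", "ServerNT", "LanManNT"):
        print(value, "->", describe_product_type(value), "| server:", is_server(value))
```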
Core operating system image
  Contains Executive and Kernel
Four retail variations:
      NTOSKRNL.EXE          Uniprocessor
      NTKRNLMP.EXE          Multiprocessor
 32-bit Windows PAE versions (for DEP & >4GB RAM)
      NTKRNLPA.EXE          Uniprocessor w/extended
       addressing support
      NTKRPAMP.EXE          Multiprocessor w/extended
       addressing support

Vista: no uniprocessor kernel
Debug Version
“Checked Build”
Special debug version of system called “Checked Build”
  Provided with MSDN
  Primarily for driver testing, but can be useful for catching timing bugs in multithreaded applications
Built from same source files as “free build” (a.k.a., “retail build”)
  “DBG” compile-time symbol defined which enables:
    Error tests for “can’t happen” conditions in kernel mode (ASSERTs)
    Validity checks on arguments passed from one kernel mode routine to another

  #ifdef DBG
      if (something that should never happen has happened)
          KeBugCheckEx(…);  /* illustrative: stop the system with a bugcheck */
  #endif

  Multiprocessor kernel (of course, runs on UP systems)
Can capture kernel debugger output with Dbgview from
See Knowledge base article 314743 (HOWTO: Enable Verbose Debug
Tracing in Various Drivers and Subsystems)

System Architecture
 [Diagram repeated: the layered system architecture figure from the start of the tutorial, with user-mode system processes, services, applications and environment subsystems above NTDLL.DLL, and the kernel-mode system service dispatcher, executive components, drivers and HAL below]
Original copyright by Microsoft Corporation. Used by permission.
Executive
Upper layer of the operating system
Provides “generic operating system” functions (“services”)
  Process Manager
  Object Manager
  Cache Manager
  LPC (local procedure call) Facility
  Configuration Manager
  Memory Manager
  Security Reference Monitor
  I/O Manager
  Power Manager
  Plug-and-Play Manager
Almost completely portable C code
Runs in kernel (“privileged”, ring 0) mode
Most interfaces to executive services not documented

Kernel
 Lower layers of the operating system
   Implements processor-dependent functions (x86 versus Itanium, etc.)
   Also implements many processor-independent functions that are
   closely associated with processor-dependent functions
 Main services
   Thread waiting, scheduling, and context switching
   Exception and interrupt dispatching
   Operating system synchronization primitives
   (different for MP versus UP)
   A few of these are exposed to user mode
 Not a classic “microkernel”
   shares address space with rest of kernel-mode components

HAL – Hardware Abstraction Layer
 Responsible for a small part of “hardware abstraction”
   Components on the motherboard not handled by drivers
        System timers, Cache coherency, and flushing
        SMP support, Hardware interrupt priorities
 Subroutine library for the kernel and device drivers
   Isolates OS & drivers from platform-specific details
   Presents uniform model of I/O hardware interface to drivers
 Reduced role in Windows 2000
   Bus support moved to bus drivers
   Majority of HALs are vendor-independent

 Exported symbols
   Functions and global variables Microsoft wants visible outside the
   image (e.g., used by device drivers)
   About 1500 symbols exported, of which about 400 are
   documented in the DDK
   Ways to list:
     Dependency Walker (File->Save As)
     Visual C++ “link /dump /exports ntoskrnl.exe”
 Global symbols
   Over 9000 global symbols in XP/2003 (Windows NT 4.0 was
     Many variables contain values related to performance and memory
   Ways to list:
     Visual C++: “dumpbin /symbols /all ntoskrnl.exe” (names only)
     Kernel debugger: “x nt!*”
       Module name of NTOSKRNL is “NT”

Naming Convention For Internal NTOSKRNL Routines
 Two- or three-letter component code in beginning of function name

   Ex       - General executive routine
   Exp      - Executive private (not exported)
   Cc       - Cache manager
   Mm       - Memory management
   Rtl      - Run-Time Library
   FsRtl    - File System Run-Time Lib
   Ob       - Object management
   Io       - I/O subsystem
   Se       - Security
   Ps       - Process structure
   Lsa      - Security Authentication
   Zw       - File access, etc.

   Ke       - Kernel
   Ki       - Kernel internal (not available outside the kernel)

   Hal      - Hardware Abstraction Layer
   READ_, WRITE_ - I/O port and register access
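
The naming convention above lends itself to a simple lookup: given a routine name, find the longest known prefix that is followed by a capital letter. The sketch below is an illustration only; the table is copied from the slide, and the helper name component_for is hypothetical.

```python
# Prefix table copied from the slide above.
PREFIXES = {
    "Ex": "General executive routine",
    "Exp": "Executive private (not exported)",
    "Cc": "Cache manager",
    "Mm": "Memory management",
    "Rtl": "Run-Time Library",
    "FsRtl": "File System Run-Time Lib",
    "Ob": "Object management",
    "Io": "I/O subsystem",
    "Se": "Security",
    "Ps": "Process structure",
    "Lsa": "Security Authentication",
    "Zw": "File access, etc.",
    "Ke": "Kernel",
    "Ki": "Kernel internal",
    "Hal": "Hardware Abstraction Layer",
}


def component_for(routine_name):
    """Return the component for a routine name, using the longest matching prefix."""
    best = None
    for prefix in PREFIXES:
        rest = routine_name[len(prefix):]
        # The prefix must be followed by the capitalized start of the routine name.
        if routine_name.startswith(prefix) and rest[:1].isupper():
            if best is None or len(prefix) > len(best):
                best = prefix
    return PREFIXES.get(best, "Unknown")


if __name__ == "__main__":
    for name in ("KiSystemService", "ExQueueWorkItem", "PsCreateSystemThread"):
        print(name, "->", component_for(name))
```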


Interrupt Dispatching
 Interrupt arrives while running in user or kernel mode; it is handled in kernel mode
 Note, no thread or process context switch!
 Interrupt dispatch routine
   Disable interrupts
   Record machine state (trap frame) to allow resume
   Mask equal- and lower-IRQL interrupts
   Find and call appropriate ISR
   Dismiss interrupt
   Restore machine state (including mode and enabled interrupts)
 Interrupt service routine
   Tell the device to stop interrupting
   Interrogate device state, start next operation on device, etc.
   Request a DPC
   Return to caller

Interrupt Precedence Via IRQLs
 IRQL = Interrupt Request Level
   The “precedence” of the interrupt with respect to other interrupts
   Different interrupt sources have different IRQLs
   Not the same as IRQ
 IRQL is also a state of the processor
   Servicing an interrupt raises processor IRQL to that interrupt’s IRQL
     This masks subsequent interrupts at equal and lower IRQLs
   User mode is limited to IRQL 0
   No waits or page faults at IRQL >= DISPATCH_LEVEL
 x86 IRQL values
   31     High
   30     Power fail
   29     Interprocessor Interrupt
   28     Clock
   ...    Device n … Device 1    (hardware interrupts)
   2      Dispatch/DPC           (deferrable software interrupts)
   1      APC                    (deferrable software interrupts)
   0      Passive                (normal thread execution)
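
The masking rule above (servicing an interrupt raises the processor's IRQL, and interrupts at equal or lower IRQLs stay pending until the IRQL drops) can be sketched as a toy model. This is not kernel code; the Processor class and its method names are assumptions made for illustration, and only the level numbers come from the x86 table.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2  # values from the x86 table


class Processor:
    """Toy model of one CPU's IRQL state (not real kernel code)."""

    def __init__(self):
        self.irql = PASSIVE_LEVEL
        self.pending = []      # IRQLs of interrupts that are currently masked
        self.serviced = []     # IRQLs in the order they were serviced

    def raise_to(self, level):
        """Raise the current IRQL and return the previous value."""
        previous = self.irql
        self.irql = level
        return previous

    def interrupt(self, level):
        """Deliver an interrupt: service it only if it outranks the current IRQL."""
        if level <= self.irql:
            self.pending.append(level)   # masked: equal or lower IRQL
            return False
        previous = self.raise_to(level)  # IRQL is raised while the ISR runs
        self.serviced.append(level)
        self.lower_to(previous)          # ISR done: drop back
        return True

    def lower_to(self, level):
        """Lower the IRQL and service pending interrupts that are now unmasked."""
        self.irql = level
        while True:
            ready = [p for p in self.pending if p > self.irql]
            if not ready:
                break
            highest = max(ready)
            self.pending.remove(highest)
            self.interrupt(highest)


if __name__ == "__main__":
    cpu = Processor()
    cpu.raise_to(28)                # pretend the clock ISR is running
    cpu.interrupt(5)                # a device interrupt is masked for now
    cpu.interrupt(29)               # an interprocessor interrupt preempts
    cpu.lower_to(PASSIVE_LEVEL)     # the device interrupt is serviced now
    print(cpu.serviced)
```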
Deferred Procedure Calls (DPCs)
 Used to defer processing from higher (device) interrupt level to a lower
 (dispatch) level
   Driver (usually ISR) queues request
   One queue per CPU; DPCs are normally queued to the current processor,
   but can be targeted to other CPUs
   Executes specified procedure at dispatch IRQL (or “dispatch level”, also
   “DPC level”) when all higher-IRQL work (interrupts) completed
 Used heavily for driver “after interrupt” functions
   Also used for quantum end and timer expiration
 [Diagram: queue head -> DPC object -> DPC object -> DPC object]
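
A toy model of the DPC mechanism described above: the ISR runs at device IRQL, does minimal work and queues a DPC on the current CPU's queue; the queued routines run later at dispatch level once higher-IRQL work is finished. The Cpu class, the device IRQL of 5 and the routine names are illustrative assumptions, not Windows interfaces.

```python
from collections import deque

DISPATCH_LEVEL = 2


class Cpu:
    """Toy model of one processor with its own DPC queue."""

    def __init__(self):
        self.irql = 0
        self.dpc_queue = deque()   # one queue per CPU
        self.log = []

    def queue_dpc(self, routine, context):
        """Called by an ISR: remember the deferred work, do not run it yet."""
        self.dpc_queue.append((routine, context))

    def device_interrupt(self, device_irql, isr):
        """Run an ISR at device IRQL, then drain DPCs once the IRQL drops."""
        previous = self.irql
        self.irql = device_irql
        isr(self)                         # ISR does minimal work, queues a DPC
        self.irql = previous
        if self.irql < DISPATCH_LEVEL:    # no higher-IRQL work remains
            self.drain_dpcs()

    def drain_dpcs(self):
        """Run queued DPC routines in FIFO order at dispatch level."""
        previous = self.irql
        self.irql = DISPATCH_LEVEL
        while self.dpc_queue:
            routine, context = self.dpc_queue.popleft()
            routine(self, context)
        self.irql = previous


def finish_io(cpu, context):
    """Example DPC routine: the 'after interrupt' work."""
    cpu.log.append(("dpc", context, cpu.irql))


def disk_isr(cpu):
    """Example ISR: record that it ran, then request a DPC."""
    cpu.log.append(("isr", cpu.irql))
    cpu.queue_dpc(finish_io, "disk")


if __name__ == "__main__":
    cpu = Cpu()
    cpu.device_interrupt(5, disk_isr)
    print(cpu.log)
```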

IRQLs on 64-bit Systems
 IRQL   x64                               IA64
 15     High/Profile                      High/Profile/Power
 14     Interprocessor Interrupt/Power    Interprocessor Interrupt
 13     Clock                             Clock
 12     Synch (Srv 2003)                  Synch (MP only)
 ...    Device n                          Device n
 4      .                                 Device 1
 3      Device 1                          Correctable Machine Check
 2      Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
 1      APC                               APC
 0      Passive/Low                       Passive/Low


Object Manager
 Executive component for managing
 system-defined “objects”
   Objects are data structures with optional names
   “Objects” managed here include Windows Kernel
   objects, but not Windows User or GDI objects
   Object manager implements user-mode handles and
   the process handle table
 Object manager is not used for all OS data
   Generally, only those types that need to be shared,
   named, or exported to user mode
   Some data structures are called “objects” but are not
   managed by the object manager (e.g., “DPC objects”)
Object Manager
 In part, a heap manager…
   Allocates memory for data structure from system-wide,
   kernel space heaps (pageable or nonpageable)
 …With a few extra functions
   Assigns name to data structure (optional)
   Allows lookup by name
   Objects can be protected by ACL-based security
   Provides uniform naming, sharing, and protection
     Simplifies C2 security certification by centralizing all object
     protection in one place
   Maintains counts of handles and references (stored
   pointers in kernel space) to each object
     Object cannot be freed back to the heap until all handles and
     references are gone
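
The retention rule above (an object is not freed until both its handle count and its reference count reach zero) can be sketched with a toy object header. The class and method names are hypothetical, and the model is simplified: it treats handles and kernel references as two independent counters.

```python
class KernelObject:
    """Toy object header: tracks handle and pointer (reference) counts."""

    def __init__(self, name=None):
        self.name = name           # names are optional
        self.handle_count = 0
        self.pointer_count = 0
        self.freed = False

    def _maybe_free(self):
        # The object is released only when no handles and no references remain.
        if self.handle_count == 0 and self.pointer_count == 0:
            self.freed = True

    def open_handle(self):
        self.handle_count += 1

    def close_handle(self):
        self.handle_count -= 1
        self._maybe_free()

    def reference(self):
        self.pointer_count += 1

    def dereference(self):
        self.pointer_count -= 1
        self._maybe_free()


if __name__ == "__main__":
    event = KernelObject("MyEvent")
    event.open_handle()      # a user-mode handle
    event.reference()        # a stored pointer in kernel space
    event.close_handle()
    print(event.freed)       # still referenced from kernel space
    event.dereference()
    print(event.freed)
```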
Handles And Security
 Process handle table
   Is unique for each process
   But is in system address space, hence cannot be
   modified from user mode
   Hence, is trusted
 Security checks are made when handle table
 entry is created
   i.e. at CreateXxx time
   Handle table entry indicates the “validated” access
   rights to the object
     Read, Write, Delete, Terminate, etc.
   No need to revalidate on each request
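
A minimal sketch of the idea above: the access check is made once, when the handle table entry is created, and later operations only compare against the granted access stored in that entry. The access bit values, the class and the method names are illustrative assumptions, not the real Windows definitions.

```python
READ, WRITE, DELETE = 0x1, 0x2, 0x4   # illustrative access bits, not real Windows values


class AccessDenied(Exception):
    pass


class HandleTable:
    """Toy per-process handle table kept by the 'kernel', not by the caller."""

    def __init__(self):
        self._entries = {}
        self._next = 4                      # hand out handle values 4, 8, 12, ...

    def open(self, obj_name, allowed_by_acl, desired_access):
        """Security check happens once, here, when the entry is created."""
        if desired_access & ~allowed_by_acl:
            raise AccessDenied(obj_name)
        handle = self._next
        self._next += 4
        self._entries[handle] = (obj_name, desired_access)
        return handle

    def use(self, handle, needed_access):
        """Later requests only compare against the stored granted access."""
        obj_name, granted = self._entries[handle]
        if needed_access & ~granted:
            raise AccessDenied(obj_name)
        return obj_name


if __name__ == "__main__":
    table = HandleTable()
    h = table.open("report.doc", allowed_by_acl=READ | WRITE, desired_access=READ)
    print(h, table.use(h, READ))
```

Because the table lives outside the caller's reach, the stored granted access can be trusted without consulting the ACL again.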
Examining Handles: MS Tools
 Two tools:
   XP & 2003: openfiles /query command
   Resource Kit “oh” (Open Handles) tool
 Both of these require a special NT “global flag”
 registry bit to be set
   Requires reboot to take effect
   See HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\GlobalFlag
     Can view this bitmask with the GFLAGS tool
   Uses 8 bytes extra for each open handle

Examining Open Handles:
Sysinternals Tools
 Process Explorer (GUI version) or Handle
 (character cell version) from
   Uses a device driver to walk handle table, so doesn’t
   need Global Flag set

Viewing Open Handles
 Handle View
  By default, shows named objects
    Click on Options->Show Unnamed Objects

  Solve file locked errors
    Can search to determine what process is holding a file or
    directory open
    Can even close an open file (be careful!)
  Understand resources used by an application
  Detect handle leaks using refresh difference
  View the state of synchronization objects (mutexes,
  semaphores, events)
Viewing Handles With Kernel Debugger
 If looking at a dump, use !handle in Kernel
 Debugger (see help for options)
 lkd> !handle 0 f 9e8 file
 Searching for Process with Cid == 9e8
 Searching for handles of type file
 PROCESS 82ce72d0 SessionId: 0 Cid: 09e8 Peb: 7ffdf000 ParentCid: 06e
   DirBase: 06602000 ObjectTable: e1c879c8 HandleCount: 430.
 0280: Object: 82c5e230 GrantedAccess: 00120089
 Object: 82c5e230 Type: (82fdde70) File
   ObjectHeader: 82c5e218
     HandleCount: 1 PointerCount: 1
     Directory Object: 00000000 Name:
          \slides\ntint\new\4-systemarchitecture.ppt {HarddiskVolume1}
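
The GrantedAccess value in the output above is a bitmask. As a sketch, it can be decoded against the standard and file-specific access right constants from the Windows headers; the helper name decode_file_access is hypothetical.

```python
# Standard and file-specific access rights (values from the Windows SDK headers).
FILE_ACCESS_BITS = {
    0x00000001: "FILE_READ_DATA",
    0x00000002: "FILE_WRITE_DATA",
    0x00000004: "FILE_APPEND_DATA",
    0x00000008: "FILE_READ_EA",
    0x00000010: "FILE_WRITE_EA",
    0x00000020: "FILE_EXECUTE",
    0x00000080: "FILE_READ_ATTRIBUTES",
    0x00000100: "FILE_WRITE_ATTRIBUTES",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}


def decode_file_access(mask):
    """Return the names of the access rights set in a file GrantedAccess mask."""
    names = [name for bit, name in sorted(FILE_ACCESS_BITS.items()) if mask & bit]
    known = 0
    for bit in FILE_ACCESS_BITS:
        known |= bit
    leftover = mask & ~known
    if leftover:
        names.append(hex(leftover))   # bits this table does not name
    return names


if __name__ == "__main__":
    print(decode_file_access(0x00120089))
```

For the handle shown above, 0x00120089 decodes to read data, read extended attributes, read attributes, READ_CONTROL and SYNCHRONIZE, which together make up generic read access to the file.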
Object Manager Namespace
 System and session-wide internal
 View with Winobj from

Object Manager Namespace
  Hierarchical directory structure (based on file system model)
  System-wide (not per-process)
    With Terminal Services, Windows objects are per-session by default
      Vista: console no longer is session 0
    Can override this with “global\” prefix on object names
  Volatile (not preserved across boots)
  Namespace can be extended by secondary object managers
  (e.g., file system)
    Hook mechanism to call external parse routine (method)
  Supports case sensitive or case blind
  Supports symbolic links (used to implement drive letters, etc.)
Lookup done on object creation or access by name
  Not on access by handle
Not all objects managed by the object manager are named
  E.g., file objects are not named
  Un-named objects are not visible in WinObj
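
The hierarchical lookup with symbolic links described above can be sketched as a small resolver. This is a toy model under stated assumptions: the directory layout and object names in the example are illustrative, lookups are case-blind, and there is no hand-off to a secondary object manager's parse routine.

```python
class Directory(dict):
    """Object directory: maps a name to a child object."""


class SymbolicLink:
    def __init__(self, target):
        self.target = target   # absolute path the link points to


def lookup(root, path, max_links=8):
    """Resolve an absolute, backslash-separated path, following symbolic links."""
    parts = [p for p in path.split("\\") if p]
    current = root
    while parts:
        if not isinstance(current, Directory):
            raise KeyError(path)   # cannot descend into a non-directory object
        name = parts.pop(0)
        # Case-blind name comparison in this sketch.
        match = next((k for k in current if k.lower() == name.lower()), None)
        if match is None:
            raise KeyError(path)
        current = current[match]
        if isinstance(current, SymbolicLink):
            if max_links == 0:
                raise RuntimeError("too many symbolic links")
            max_links -= 1
            # Restart from the root with the link target plus the remainder.
            parts = [p for p in current.target.split("\\") if p] + parts
            current = root
    return current


if __name__ == "__main__":
    # Illustrative namespace: a drive letter implemented as a symbolic link.
    root = Directory({
        "Device": Directory({"HarddiskVolume1": "volume-object"}),
        "GLOBAL??": Directory({"C:": SymbolicLink("\\Device\\HarddiskVolume1")}),
    })
    print(lookup(root, "\\GLOBAL??\\C:"))
```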

System Threads
 Functions in OS and some drivers that need to run as real threads
   E.g., need to run concurrently with other system activity, wait on
   timers, perform background “housekeeping” work
   Always run in kernel mode
   Not non-preemptible (unless they raise IRQL to 2 or above)
   For details, see DDK documentation on PsCreateSystemThread
 What process do they appear in?
   “System” process (Windows NT 4.0: PID 2,
   Windows 2000: PID 8, Windows XP: PID 4)
   In Windows 2000 and later, windowing system threads (from
   Win32k.sys) appear in “csrss.exe”
   (Windows subsystem process)

Examples Of System Threads
 Memory Manager
   Modified Page Writer for mapped files
   Modified Page Writer for paging files
   Balance Set Manager
   Swapper (kernel stack, working sets)
   Zero page thread (thread 0, priority 0)
 Security Reference Monitor
   Command Server Thread
 Network
   Redirector and Server Worker Threads
 Threads created by drivers for their exclusive use
   Examples: Floppy driver, parallel port driver
 Pool of Executive Worker Threads
   Used by drivers, file systems, …
   Accessed via ExQueueWorkItem
Identifying System Threads
 If System threads are consuming CPU time,
 need to find out what code is running, since it
 could be any one of a variety of components
   Pieces of OS (Ntoskrnl.exe)
   File server worker threads (Srv.sys)
   Other drivers
 To really understand what’s going on, must find
 which driver a thread “belongs to”

Identifying System Threads
 Process Explorer:
   Double click on System
   Go to Threads tab and
   sort by CPU
     To view call stack, must use
     kernel debugger
 Note: several threads run
 between clock ticks (or at
 high IRQL) and thus don’t
 appear to run
   Watch context switch count

Process-Based Code
 OS components that run in separate executables
 (.exes), in their own processes
   Started by system
   Not tied to a user logon
 Three types
   Environment subsystems (already described)
   System startup processes
     Note: “system startup processes” is not an official Microsoft
     defined name
   Windows Services
 Let’s examine the system process “tree”
   Use Tlist /T or Process Explorer
Process-Based NT Code
System Startup Processes
 First two processes aren’t real processes
    Not running a user mode .EXE
    No user-mode address space
    Different utilities report them with different names
    Data structures for these processes (and their initial threads) are
    “pre-created” in NtosKrnl.Exe and loaded along with the code

 (Idle)       Process id 0
              Part of the loaded system image
              Home for idle thread(s) (not a real process nor real threads)
              Called “System Process” in many displays
 (System)     Process id 2 (8 in Windows 2000; 4 in XP)
              Part of the loaded system image
              Home for kernel-defined threads (not a real process)
              Thread 0 (routine name Phase1Initialization) launches the first
               “real” process, running smss.exe...
              ...and then becomes the zero page thread

Process-Based NT Code
System Startup Processes
  smss.exe       Session Manager
                 The first “created” process
                 Takes parameters from
                 \Control\Session Manager
                 Launches required subsystems (csrss) and then winlogon
  csrss.exe      Windows subsystem
  winlogon.exe   Logon process: Launches services.exe & lsass.exe; presents first
                 login prompt
                 When someone logs in, launches apps in
                 \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit
  services.exe   Service Controller; also, home for many NT-supplied services
                 Starts processes for services not part of services.exe (driven by
                 \Registry\Machine\System\CurrentControlSet\Services )
  lsass.exe      Local Security Authentication Server
  userinit.exe   Started after logon; starts Explorer.exe (see
                 \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell)
                 and exits (hence Explorer appears to be an orphan)
  explorer.exe   and its children are the creators of all interactive apps
Logon Process
 Winlogon sends username/password to Lsass
   Either on local system for local logon, or to Netlogon service on a domain
   controller for a network logon
   Windows XP enhancement: Winlogon doesn’t wait for Workstation
   service to start if
     Account doesn't depend on a roaming profile
     Domain policy that affects logon hasn't changed since last logon
 Creates a process to run
      HKLM\Software\Microsoft\Windows NT
   By default: Userinit.exe
   Runs logon script, restores drive-letter mappings, starts shell
 Userinit creates a process to run
      HKLM\Software\Microsoft\Windows NT
   By default: Explorer.exe
 There are other places in the Registry that control
 programs that start at logon
Processes Started at Logon
 Autoruns (Sysinternals)
   Displays order of processes configured to start at logon time
 Also can use new XP built-in tool called
 “System Configuration Utility” (in \Windows\pchealth\helpctr\binaries)
   To run, click on Start->Help, then “Use Tools…”, then System
   Configuration Utility
   Only shows what’s defined to start, vs. Autoruns, which shows all places
   things CAN be defined to start

Windows Services
 An overloaded generic term
 A process created and managed by the Service
 Control Manager (Services.exe)
   E.g. Solitaire can be configured as a service, but is
   killed shortly after starting
 Similar in concept to Unix daemon processes
   Typically configured to start at boot time (if started
   while logged on, survive logoff)
   Typically do not interact with the desktop
 Note: Prior to Windows 2000 this was one way to
 start a process on a remote machine (now you
 can do it with WMI)
Life Of A Service
 Install time
   Setup application tells Service Controller
   about the service (CreateService)

 System boot/initialization
   SCM reads registry, starts services as directed
   Control panel can start and stop services and
   change startup parameters

Viewing Service Processes
 Process Explorer can highlight
 Service Processes
   Click on Options->Highlight Services

Svchost Mechanism
 Windows 2000 introduced generic Svchost.exe
   Groups services into fewer processes
     Improves system startup time
     Conserves system virtual memory
   Not user-configurable as to which services go in which processes
   3rd parties cannot add services to Svchost.exe processes
 Windows XP/2003 have more Svchost processes due to
 two new less privileged accounts for built-in services
   Fewer rights than SYSTEM account
     Reduces possibility of damage if system compromised
 On XP/2003, four Svchost processes (at least):
   SYSTEM, SYSTEM (2nd instance – for RPC), LOCAL SERVICE,
Mapping Services To Service Processes
 Tlist /S (Debugging Tools) or Tasklist /svc (XP/2003) list internal
 name of services inside service processes
 Process Explorer shows more: external display name and description

Four Contexts For Executing Code
 Full process and thread context
   User applications
   Windows Services
   Environment subsystem processes
   System startup processes
 Have thread context but no “real” process
   Threads in “System” process
 Routines called by other threads/processes
   Subsystem DLLs
   Executive system services (NtReadFile, etc.)
   GDI32 and User32 APIs implemented in Win32K.Sys (and graphics drivers)
 No process or thread context (“arbitrary thread context”)
   Interrupt dispatching
   Device drivers
System Architecture
 [Diagram: layered system architecture]
   User mode: System Processes (Service Control Mgr., WinLogon, Session Manager),
   Services (SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe, Services.Exe),
   Applications (Task Manager, Explorer, user application),
   Environment Subsystems (POSIX, OS/2, Windows)
   All sit on Subsystem DLLs and NTDLL.DLL
   Kernel mode: System Service Dispatcher (kernel mode callable interfaces),
   I/O Mgr, Plug and Play Mgr., device and file system drivers, Windows
   graphics drivers
   Hardware Abstraction Layer (HAL): hardware interfaces (buses, I/O devices,
   interrupts, interval timers, DMA, memory cache control, etc.)
Original copyright by Microsoft Corporation. Used by permission.
1.   System Architecture
2.   Processes and Thread Internals
3.   Memory Management Internals
4.   Security Internals

Processes And Threads
Data Structures
Priority Spectrum
Scheduling Decisions
Priority Adjustments
Multiprocessor Considerations

Processes And Threads
Each process has its own…
 Virtual address space (including program global storage, heap storage,
 threads’ stacks)
   Processes cannot corrupt each other’s address space by mistake
 Working set (physical memory “owned” by the process)
 Access token (includes security identifiers)
 Handle table for Windows kernel objects
 These are common to all threads in the process, but separate and protected
 between processes
Each thread has its own…
 User-mode stack (automatic storage, call frames, etc.)
 Kernel-mode stack
 Scheduling state (Wait, Ready, Running, etc.) and priority
 Current access mode (user mode or kernel mode)
 Saved CPU state if it isn’t running
 Access token (optional – overrides process’s if present)
Process And Thread Identifiers
 Every process and every thread has an identifier
 Generically: “client ID” (debugger shows as “CID”)
   A.k.a., “process ID” and “thread ID”, respectively
   Process IDs and thread IDs are in the same “number space”
   These identify the requesting process or thread to its subsystem
   “server” process, in API calls that need the server’s help
 Visible in PerfMon, Task Manager (for processes),
 Process Viewer (for processes), kernel debugger, etc.
 IDs are unique among all existing processes
 and threads
   But might be reused as soon as a process or thread
   is deleted

Jobs

Kernel object to manage groups
of processes
  Set limits on a process or group of processes
Quotas and restrictions:
  Quotas: total CPU time, # active processes, per-process CPU
  time, memory usage
  Run-time restrictions: priority of all the processes in job;
  processors threads in job can run on
  Security restrictions: limits what processes can do
    Not acquire administrative privileges
    Not accessing windows outside the job, no reading/writing the clipboard
  Scheduling class: number from 0-9 (5 is default) - affects length
  of thread timeslice (or quantum - t.b.d.)
    E.g. can be used to achieve “class scheduling” (partition CPU)
 How do processes become part of a job?
   Job object has to be created
   Then processes are explicitly added
   Processes created by processes in a job automatically are part of the job
     Unless restricted, processes can “break away” from a job
 Only Datacenter Server has a built-in tool to take
 advantage of jobs
   “Process Control Manager” – allows creating definitions for jobs
   and associating processes with them
 Uses of jobs in OS:
   Add/Remove Programs (“ARP Job”)
   WMI provider
   RUNAS service (SecLogon) uses jobs to terminate processes at
   log out
     SU from NT4 ResKit didn’t do this
Demo: WMI Job
Jobs are used by WMI
  Example: run Psinfo (Sysinternals) and pause output

Processes And Threads
Internal Data Structures
 [Diagram: a process object with its access token, handle table (pointing to
 objects), and Virtual Address Space Descriptors (VADs); its thread objects,
 each with an optional access token]
 See kernel debugger commands:
   dt (see next slide)
Dumping Structures With
Kernel Debugger
 !process and !thread show subset of information
 in a process & thread block
 “dt” (“Display Type”) command can format all the
   Syntax: “dt StructureName address –r”
   dt nt!_* - displays all OS structures known to dt
 Process/thread-related structures

Process Block Layout
lkd> dt nt!_EPROCESS
  +0x000 Pcb         : _KPROCESS
  +0x06c ProcessLock         : _EX_PUSH_LOCK
  +0x070 CreateTime        : _LARGE_INTEGER
  +0x078 ExitTime       : _LARGE_INTEGER
  +0x080 RundownProtect : _EX_RUNDOWN_REF
  +0x084 UniqueProcessId : Ptr32 Void
  +0x088 ActiveProcessLinks : _LIST_ENTRY
  +0x090 QuotaUsage         : [3] Uint4B
  +0x09c QuotaPeak         : [3] Uint4B
  +0x0a8 CommitCharge : Uint4B
  +0x0ac PeakVirtualSize : Uint4B
  +0x0b0 VirtualSize     : Uint4B
NOTE: Add “-r” to recurse through substructures
Thread Block (!strct ethread)
lkd> dt nt!_ETHREAD
  +0x000 Tcb        : _KTHREAD
  +0x1c0 CreateTime       : _LARGE_INTEGER
  +0x1c0 NestedFaultCount : Pos 0, 2 Bits
  +0x1c0 ApcNeeded         : Pos 2, 1 Bit
  +0x1c8 ExitTime      : _LARGE_INTEGER
  +0x1c8 LpcReplyChain : _LIST_ENTRY
  +0x1c8 KeyedWaitChain : _LIST_ENTRY
  +0x1d0 ExitStatus     : Int4B
  +0x1d0 OfsChain       : Ptr32 Void
  +0x1d4 PostBlockList : _LIST_ENTRY
  +0x1dc TerminationPort : Ptr32 _TERMINATION_PORT
  +0x1dc ReaperLink       : Ptr32 _ETHREAD

NOTE: Add “-r” to recurse through substructures
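Scheduling Priorities
 [Diagram: priority spectrum, 0 to 31]
   Realtime levels 16-31: Realtime Time Critical (31) down to Realtime Idle (16)
   Dynamic levels 1-15: labels shown include 13, Above Normal, 8, Below Normal,
   and Dynamic Idle at the bottom of the range
   System Idle: 0
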

Thread Scheduling
 Priority driven, preemptive
    UP: highest priority thread always runs
    MP: One of the highest priority runnable threads will be running
    Event-driven; no guaranteed execution period before preemption
 No attempt to share processor(s) “fairly” among processes,
 only among threads
    Time-sliced, round-robin within a priority level
 Order 1 (no scan of all threads)
    Linux 2.4 is Order N (2.6 is O1)
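
The "Order 1" point can be illustrated with a small model. This is only a sketch under assumptions, not the kernel's code: the names ready_db, mark_ready, remove_ready, and highest_ready_priority are hypothetical, and the per-level queues are reduced to counters. The idea is that one summary bit per priority level lets the scheduler find the highest non-empty level in a bounded number of steps (at most 32), regardless of how many threads exist.

```c
#include <assert.h>
#include <stdint.h>

#define PRIORITY_LEVELS 32

/* Hypothetical model of a ready-queue database. */
typedef struct {
    uint32_t ready_summary;             /* bit p set = level p has a ready thread */
    int queue_length[PRIORITY_LEVELS];  /* stand-in for the real per-level queues */
} ready_db;

/* A thread at the given priority became ready. */
void mark_ready(ready_db *db, int priority) {
    assert(priority >= 0 && priority < PRIORITY_LEVELS);
    db->queue_length[priority]++;
    db->ready_summary |= (uint32_t)1 << priority;
}

/* A ready thread at the given priority was removed (ran, exited, ...). */
void remove_ready(ready_db *db, int priority) {
    assert(priority >= 0 && priority < PRIORITY_LEVELS);
    if (db->queue_length[priority] > 0 && --db->queue_length[priority] == 0)
        db->ready_summary &= ~((uint32_t)1 << priority);
}

/* Highest priority with a ready thread, or -1 if nothing is ready.
   The loop is bounded by the number of levels, not the number of threads. */
int highest_ready_priority(const ready_db *db) {
    for (int p = PRIORITY_LEVELS - 1; p >= 0; p--)
        if (db->ready_summary & ((uint32_t)1 << p))
            return p;
    return -1;
}
```

With threads ready at priorities 8 and 13, highest_ready_priority returns 13 without visiting any thread.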

Thread Scheduling
The “code that does scheduling” is not a thread
  i.e. there is no always-instantiated routine called “the scheduler”
Scheduling routines are called whenever events
occur that change the state of a thread
  interval timer interrupts (for quantum end)
  interval timer interrupts (for timed wait completion)
  other hardware interrupts (for I/O wait completion)
  one thread changes the state of a waitable object upon
  which other thread(s) are waiting
  a thread waits on one or more dispatcher objects
  a thread priority is changed

Scheduling Scenarios: Preemption
 Preemption is strictly event-driven
    does not wait for the next clock tick
    no guaranteed execution period before preemption
    threads in kernel mode may be preempted (unless they raise IRQL to >= 2)

 [Diagram: Running and Ready threads; a higher-priority thread arriving
 from the Wait state preempts the running thread]

 A preempted thread goes back to the head of its ready queue
    also, if in real-time priority range, its quantum is reset
Scheduling Scenarios
Ready After Wait Resolution
 If newly-ready thread is not of higher priority than the
 running thread…
 …it is put at the tail of the ready queue for its current priority
    If in real-time priority range, its quantum is reset
 [Diagram: Running and Ready threads; a thread arriving from the Wait state
 joins the tail of its ready queue]
Scheduling Scenarios
Voluntary Switch
 When the running thread gives up the CPU…
 …Schedule the thread at the head of the next non-empty
 “ready” queue
 [Diagram: Running and Ready threads; the running thread moves to the
 Waiting state]
Scheduling Scenarios
Quantum End
 When the running thread exhausts its CPU quantum, it goes to the
 end of its ready queue
   Applies to all threads (even if in kernel mode if IRQL<2)
      Quantums can be disabled for a thread by a kernel function
   Default quantum on Professional is 2 clock ticks, 12 on Server
      standard clock tick is 10 msec; might be 15 msec on some MP Pentium systems
   If no other ready threads at that priority, same thread continues running
   (just gets new quantum)
   If running at boosted priority, priority decays at quantum end (described later)
 [Diagram: Running and Ready threads at quantum end]
Quantum Stretching
 Resulting quantum:
   “Maximum” = 6 ticks
   (middle) = 4 ticks
   “None” = 2 ticks
 Quantum stretching does not happen on
 NT Server
   Quantum on NT Server is 12 ticks
Quantum Selection
  As of Windows 2000, can choose short quantums
  on Server (e.g. for terminal servers)
  [Screenshots: quantum settings dialogs in Windows 2000 and Windows XP]

Controlling Quantum
 If a process is a member of a job, quantum can be
 adjusted by setting the “Scheduling Class”
   Only applies if process is >Idle priority class
   Only applies if system running with fixed
   quantums (the default on Servers)
 Values are 0-9
   5 is default

 Scheduling class   Quantum units
 0                  6
 1                  12
 2                  18
 3                  24
 4                  30
 5                  36
 6                  42
 7                  48
 8                  54
 9                  60
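
The table follows a simple pattern: each scheduling class adds 6 quantum units. A minimal sketch of that mapping is below. The function name quantum_units_for_class is hypothetical, and the formula is inferred from the table values rather than taken from the kernel source.

```c
/* Quantum units for a job scheduling class (0-9), per the table above.
   Returns -1 for a class outside the documented range.
   Assumption: the linear formula is inferred from the ten table rows. */
int quantum_units_for_class(int scheduling_class) {
    if (scheduling_class < 0 || scheduling_class > 9)
        return -1;
    return 6 * (scheduling_class + 1);
}
```

The default class 5 yields 36 units, and class 9 yields 60.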
Thread Scheduling States
 [Diagram: thread state transitions]
   States: Init (0), Ready (1), Running (2), Standby (3), Terminate (4),
   Waiting (5), Transition (6)
   Running to Ready: preemption, quantum end
   Standby to Ready: preempt
   Transition: wait resolved after kernel stack made
 Ready = thread eligible to be scheduled to run
 Standby = thread is selected to run on CPU
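
The state numbers in the diagram are what tools such as the kernel debugger and PerfMon report, so a small decoder can be handy. This is an illustrative sketch: the enum and function names are hypothetical, and only the numeric values come from the slide.

```c
/* Thread scheduling states, numbered as on the slide. */
typedef enum {
    THREAD_INIT = 0,
    THREAD_READY = 1,
    THREAD_RUNNING = 2,
    THREAD_STANDBY = 3,
    THREAD_TERMINATE = 4,
    THREAD_WAITING = 5,
    THREAD_TRANSITION = 6
} thread_state;

/* Map a numeric state to its name; unknown values yield "Unknown". */
const char *thread_state_name(int state) {
    static const char *names[] = {
        "Init", "Ready", "Running", "Standby",
        "Terminate", "Waiting", "Transition"
    };
    if (state < 0 || state > 6)
        return "Unknown";
    return names[state];
}
```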
Priority Adjustments
 Priority boosts are applied to threads in
 “dynamic” classes (1-15)
   No automatic adjustments in “real-time” class (16 or above)
   Can disable with SetThreadPriorityBoost or
 Five types:
   I/O completion
   Wait completion on events or semaphores
   When threads in the foreground process complete a wait
   When GUI threads wake up for windows input
   For CPU starvation avoidance
Priority Boosting
 After an I/O: specified by device driver
   IoCompleteRequest( Irp, PriorityBoost )
   Common boost values (see NTDDK.H)
     1: disk, CD-ROM, parallel, video
     2: serial, network, named pipe, mailslot
     6: keyboard or mouse
     8: sound
 After a wait on executive event or semaphore
   KeSetEvent( Event, PriorityBoost…)
   Boost value of 1 is used for these objects
   Server 2003: setting thread loses boost (lock convoy issue)
 After any wait on a dispatcher object by a thread in the foreground process
   Boost value of 2
   Goal: improve responsiveness of interactive apps
 GUI threads that wake up to process windowing input (e.g. windows
 messages) get a boost of 2
   This is added to the current, not base priority
   Goal: improve responsiveness of interactive apps

Priority Boost And Decay
 Behavior of these boosts:
   Boost is applied to thread’s base priority
     Will not take you above priority 15
   After a boost, you get one quantum
     Then decays 1 level, runs another quantum
       Then decays another level, etc. until back to base priority
 [Diagram: priority over time. Run at base priority, Wait, boost upon wait
 complete, Run, priority decay at quantum end, preempt (before quantum end),
 then round-robin at base priority]
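
The boost-and-decay rules can be expressed as a few lines of arithmetic. The sketch below is a model of the rules as stated on this slide, not the kernel implementation. All function names are hypothetical.

```c
#define MAX_DYNAMIC_PRIORITY 15

/* Priority right after a boost: base plus boost, capped at 15.
   Threads in the real-time range (16 and above) are never boosted. */
int boosted_priority(int base, int boost) {
    if (base > MAX_DYNAMIC_PRIORITY)
        return base;
    int p = base + boost;
    return p > MAX_DYNAMIC_PRIORITY ? MAX_DYNAMIC_PRIORITY : p;
}

/* Priority after one quantum end: decay one level, never below base. */
int decay_at_quantum_end(int current, int base) {
    return current > base ? current - 1 : base;
}

/* Number of quantum ends before the thread is back at base priority. */
int quantums_until_base(int base, int boost) {
    int p = boosted_priority(base, boost);
    int n = 0;
    while (p > base) {
        p = decay_at_quantum_end(p, base);
        n++;
    }
    return n;
}
```

For example, a priority 8 thread given a keyboard boost of 6 runs at 14, then 13, and so on, reaching base again after 6 quantum ends. A priority 12 thread given a boost of 8 is capped at 15.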

CPU Starvation Avoidance
 Balance Set Manager system thread looks for “CPU starved” threads
   Wakes up once per second and examines Ready queues
   Looks for threads that have been Ready for 300 clock ticks
 Such threads get a big boost to 15 and quantum is doubled
   At quantum end, returns to previous priority (no
   gradual decay) and normal quantum
 To minimize overhead:
   Scans up to 16 Ready threads per priority level
   each pass
   Boosts up to 10 Ready threads per pass
 Like all priority boosts, does not apply in the
 real-time range (priority 16 and above)
 [Diagram: threads at priority 12 (Wait), 7 (Run), and 4 (Ready)]
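
One pass of the starvation scan can be modeled as follows. This is a sketch under assumptions: the ready_thread structure, its fields, and starvation_pass are hypothetical; the ready threads are given as a flat array instead of per-priority queues; and a thread already at priority 15 is left alone. The limits (300 ticks, 16 scanned per level, 10 boosts per pass) are the ones listed on the slide.

```c
#include <assert.h>

#define STARVED_TICKS       300
#define MAX_SCAN_PER_LEVEL  16
#define MAX_BOOSTS_PER_PASS 10

/* Hypothetical per-thread bookkeeping for the model. */
typedef struct {
    int priority;            /* current priority */
    int saved_priority;      /* restored at quantum end */
    int ticks_ready;         /* clock ticks spent in the Ready state */
    int quantum_multiplier;  /* 1 = normal quantum, 2 = doubled */
} ready_thread;

/* One pass over the ready threads; returns how many were boosted. */
int starvation_pass(ready_thread *threads, int count) {
    int scanned[16] = {0};   /* threads examined per dynamic level */
    int boosted = 0;
    assert(count >= 0);
    for (int i = 0; i < count && boosted < MAX_BOOSTS_PER_PASS; i++) {
        ready_thread *t = &threads[i];
        if (t->priority < 1 || t->priority > 15)
            continue;        /* real-time range is never boosted */
        if (scanned[t->priority] >= MAX_SCAN_PER_LEVEL)
            continue;        /* per-level scan limit reached */
        scanned[t->priority]++;
        if (t->ticks_ready >= STARVED_TICKS && t->priority < 15) {
            t->saved_priority = t->priority;
            t->priority = 15;
            t->quantum_multiplier = 2;
            boosted++;
        }
    }
    return boosted;
}
```

Because of the per-pass cap, eleven starved threads need two passes before all of them have been boosted.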

Multiprocessor Scheduling
 Fully distributed (no “master processor”)
  Any processor can interrupt another processor to
  schedule a thread
 Scheduling database:
  Pre-Windows Server 2003: single system-wide list of
  ready queues
  Windows Server 2003: per-CPU ready queues
 Threads can run on any CPU, unless specified otherwise
  Tries to keep threads on same CPU (“soft affinity”)
  Setting of which CPUs a thread will run on is called
  “hard affinity”
Hard Processor Affinity
 Threads can run on any CPU, unless affinity specified otherwise
   Affinity specified by a bit mask
   Each bit corresponds to a CPU number
 Can alter with SetThreadAffinityMask or SetProcessAffinityMask or in
 the job object
   Thread affinity mask must be subset of process affinity mask, which in turn
   must be a subset of the active processor mask
 “Hard Affinity” can lead to threads’ getting less CPU time than they
 normally would
   More applicable to large MP systems running dedicated server apps
 Note: OS may in some cases need to run your thread on CPUs other than
 your hard affinity setting
   E.g. flushing DPCs, setting system time
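
The subset rule for affinity masks is plain bit arithmetic. The sketch below is a portable model, not the Win32 API: affinity_mask, is_subset, affinity_is_valid, and cpu_allowed are hypothetical names, the mask is fixed at 32 bits, and rejecting an empty thread mask is an added assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t affinity_mask;   /* bit n set = CPU n is allowed */

/* True if every CPU in 'inner' is also in 'outer'. */
int is_subset(affinity_mask inner, affinity_mask outer) {
    return (inner & ~outer) == 0;
}

/* Thread mask must be a subset of the process mask, which in turn must be
   a subset of the active processor mask.
   Assumption: an empty thread mask is treated as invalid. */
int affinity_is_valid(affinity_mask thread, affinity_mask process,
                      affinity_mask active) {
    return thread != 0 && is_subset(thread, process) &&
           is_subset(process, active);
}

/* True if the mask permits running on the given CPU number. */
int cpu_allowed(affinity_mask mask, int cpu) {
    assert(cpu >= 0 && cpu < 32);
    return (int)((mask >> cpu) & 1u);
}
```

On a four-CPU system (active mask 0xF), a process mask of 0x7 with a thread mask of 0x3 is valid, while a thread mask of 0x8 is not.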

Hard Processor Affinity
 On MP systems, the process affinity mask can be examined and
 changed via Task Manager
 Can also set an image affinity mask
   Imagecfg tool in Windows 2000 Server Resource Kit Supplement 1
   Can also set “uniprocessor only”: sets affinity mask to one
   processor (rotates round robin at each process creation)
Soft Processor Affinity
 Every thread has an “ideal processor”
   System selects ideal processor for first thread in
   process (round robin across CPUs)
   Next thread gets next CPU relative to the process
   Can override with:
   SetThreadIdealProcessor (
    HANDLE hThread,            // handle to thread
    DWORD dwIdealProcessor);   // processor number

   Hard affinity changes update ideal processor settings
   Used in selecting where a thread runs next (see next slide)
Choosing A CPU For A
Ready Thread (Windows 2000)
 When a thread becomes ready to run (e.g. its wait completes, or it is just
 beginning execution), need to choose a processor for it to run on
 First, it sees if any processors are idle that are in the thread’s hard affinity mask
    If its “ideal processor” is idle, it runs there
    Else if the previous processor it ran on is idle, it runs there
    Else if the current processor is idle, it runs there
    Else it picks the highest numbered idle processor in the thread’s affinity mask
 If no processors are idle:
    If the ideal processor is in the thread’s affinity mask, it selects that
    Else if the last processor is in the thread’s affinity mask, it selects that
    Else it picks the highest numbered processor in the thread’s affinity mask
 Finally, it compares the priority of the new thread with the priority of the thread
 running on the processor it selected (if any) to determine whether or not to
 perform a preemption
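
The selection order above maps directly onto a chain of checks. This is a sketch of the steps as listed on the slide, not the Windows 2000 dispatcher code: choose_processor, should_preempt, and the helper names are hypothetical, and CPU sets are modeled as 32-bit masks.

```c
#include <stdint.h>

/* Highest CPU number present in the mask, or -1 if the mask is empty. */
static int highest_cpu(uint32_t mask) {
    for (int cpu = 31; cpu >= 0; cpu--)
        if (mask & ((uint32_t)1 << cpu))
            return cpu;
    return -1;
}

static int has_cpu(uint32_t mask, int cpu) {
    return cpu >= 0 && cpu < 32 && ((mask >> cpu) & 1u);
}

/* Pick a CPU for a newly ready thread.
   affinity: thread's hard affinity mask; idle: currently idle CPUs.
   Returns -1 only if the affinity mask is empty. */
int choose_processor(uint32_t affinity, uint32_t idle,
                     int ideal, int previous, int current) {
    uint32_t idle_ok = idle & affinity;
    if (idle_ok) {
        if (has_cpu(idle_ok, ideal))    return ideal;
        if (has_cpu(idle_ok, previous)) return previous;
        if (has_cpu(idle_ok, current))  return current;
        return highest_cpu(idle_ok);
    }
    if (has_cpu(affinity, ideal))    return ideal;
    if (has_cpu(affinity, previous)) return previous;
    return highest_cpu(affinity);
}

/* Final step: preempt only if the new thread outranks the running one. */
int should_preempt(int new_priority, int running_priority) {
    return new_priority > running_priority;
}
```

Note that an idle CPU outside the thread's hard affinity mask is never considered, which is one way hard affinity can cost a thread CPU time.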

Selecting A Thread To Run
On A CPU (Windows 2000)
System needs to choose a thread to run on a specific CPU at:
   At quantum end
   When a thread enters a wait state
   When a thread removes its current processor from its hard affinity mask
   When a thread exits
Win2000: With dispatcher lock held, starting with the first thread in the highest
priority non-empty ready queue, it scans the queue for the first thread that has
the current processor in its hard affinity mask and:
   Ran last on the current processor, or
   Has its ideal processor equal to the current processor, or
   Has been in its Ready queue for more than 2 quantums, or
   Has a priority >=24
If it cannot find such a candidate, it selects the highest priority thread that can
run on the current CPU (whose hard affinity includes the current CPU)
   Note: this may mean going to a lower priority ready queue to find a candidate
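
The queue scan can be sketched for a single ready queue. This is a model under assumptions, not the dispatcher's code: the candidate structure, its fields, and pick_from_queue are hypothetical. A return value of -1 stands for "nothing in this queue can run here", in which case the caller would move on to the next lower priority queue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a ready thread for this model. */
typedef struct {
    int priority;
    uint32_t affinity;      /* hard affinity mask */
    int last_cpu;           /* CPU the thread last ran on */
    int ideal_cpu;          /* thread's ideal processor */
    int quantums_waited;    /* quantums spent in the ready queue */
} candidate;

/* Scan one ready queue (in queue order) on behalf of 'cpu'.
   Returns the index of the first thread that may run on 'cpu' and meets
   one of the four criteria; otherwise the first thread that may run on
   'cpu' at all; otherwise -1. */
int pick_from_queue(const candidate *queue, int count, int cpu) {
    int first_runnable = -1;
    assert(cpu >= 0 && cpu < 32);
    for (int i = 0; i < count; i++) {
        const candidate *t = &queue[i];
        if (!((t->affinity >> cpu) & 1u))
            continue;                       /* hard affinity excludes this CPU */
        if (first_runnable < 0)
            first_runnable = i;
        if (t->last_cpu == cpu || t->ideal_cpu == cpu ||
            t->quantums_waited > 2 || t->priority >= 24)
            return i;
    }
    return first_runnable;
}
```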

Server 2003 Enhancements
 Idle processor selection further refined to:
   If a NUMA system: if there are idle CPUs in the node
   containing the thread’s ideal processor, reduce to
   that set
   If a hyperthreaded system: if one of the idle processors
   is a physical processor with all logical processors idle,
   reduce to that set
   Then try to eliminate idle CPUs that are sleeping
   If thread ran last on a member of the set, pick
   that CPU
     Else pick lowest numbered CPU in remaining set

Server 2003 Enhancements
 Threads always go into the ready queue of their ideal processor
 Instead of locking the dispatcher database to look for a
 candidate to run, per-CPU ready queue is checked first
 (locks PRCB spinlock)
   If a thread has been selected to run on the CPU, does the context switch
   Else begins scan of other CPUs’ ready queues looking for a thread
   to run
     This scan is done OUTSIDE the dispatcher lock
 Dispatcher lock still acquired to wait or unwait a thread
 and/or change state of a dispatcher object
 Bottom line: dispatcher lock is now held for a MUCH
 shorter time
Memory Management
 Core Memory Management Services
 Working Set Management
 Unassigned Memory
 Page Files

Memory Manager Features
 Demand paged virtual memory
   Pages are read in on demand and written out when
   necessary (to make room for other memory needs)
 Provides flat virtual address space
   32-bit: 4 GB, 64-bit: 16 Exabytes (theoretical)
 Shared memory with copy on write
 Mapped files (fundamental primitive)
   Provides basic support for file system
   cache manager

Virtual Address Space Allocation
 Virtual address space is sparse
   Address spaces contain reserved, committed, and
   unused regions
 Unit of protection and usage is one page
   Page size can vary
   On x86, default page size for applications is 4 KB
   On Itanium, default page size is 8 KB
 Large pages
   If a “large memory system”, large (4 MB on x86; 16MB
   on Itanium) pages are used to map the OS and HAL
      Disables kernel write protection
    New in 2003: applications can VirtualAlloc large pages
    with MEM_LARGE_PAGES flag
Shared Memory
 Like most modern OSs, Windows provides a way for
 processes to share memory
   High speed IPC (used by LPC, which is used by RPC)
   Threads share address space, but applications may be
   divided into multiple processes for stability
 It does this automatically for shareable pages
   E.g., code pages in an .EXE
 Processes can also create shared memory sections
   Called page file backed file mapping objects
   Full Windows security
Mapped Files
 A way to take part of a file and map it to a range of
 virtual addresses
   (Address space is 2 GB, but files can be much larger)
 Called “file mapping objects” in Windows API
 Bytes in the file then correspond one-for-one with
 bytes in the region of virtual address space
   Read from the “memory” fetches data from the file
   Pages are kept in physical memory as needed
   Changes to the memory are eventually written back to the file
   (can request explicit flush)
 Initial mapped files in a process include
   The executable image (EXE)
   One or more Dynamically Linked Libraries (DLLs)
 Processes can map additional files as desired (data
 files or additional DLLs)
     Section Objects
     Mapped files
Called “file mapping objects” in Windows API
Files may be mapped into v.a.s.
// first, do EITHER ...
hMapObj = CreateFileMapping (hFile, security, protection, sizeHigh, sizeLow, mapname);
// … OR …
hMapObj = OpenFileMapping (accessMode, inheritflag, mapname);
// … then, pass the resulting handle to a mapping object (section) to ...
lpvoid = MapViewOfFile (hMapObj, accessMode,
       offsetHigh, offsetLow, cbMap);

Bytes in the file then correspond one-for-one with bytes in the region
of virtual address space
   Read from the “memory” fetches data from the file
   Changes to the memory are written back to the file
   Pages are kept in physical memory as needed
   If desired, can map to only a part of the file at a time
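
The sizeHigh/sizeLow and offsetHigh/offsetLow parameter pairs carry one 64-bit value as two 32-bit halves, which is how a file larger than 4 GB can be described. The helper below is a portable sketch of that split. The names split64, split_offset, and join_offset are hypothetical and not part of the Windows API.

```c
#include <stdint.h>

/* The two 32-bit halves of a 64-bit size or offset. */
typedef struct {
    uint32_t high;
    uint32_t low;
} split64;

/* Split a 64-bit value into the high/low pair the mapping calls expect. */
split64 split_offset(uint64_t value) {
    split64 s;
    s.high = (uint32_t)(value >> 32);
    s.low  = (uint32_t)(value & 0xFFFFFFFFu);
    return s;
}

/* Recombine a high/low pair into the original 64-bit value. */
uint64_t join_offset(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | low;
}
```

For a view starting 5 GB into a file, the offset 0x140000000 splits into high = 1 and low = 0x40000000.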
  Copy-On-Write Pages
Used for sharing between process
address spaces
Pages are originally set up as shared,
read-only, faulted from the common file
  Access violation on write attempt alerts pager
    Pager makes a copy of the page and allocates it privately to
    the process doing the write, backed to the paging file
  So, only need unique copies for the pages in the
  shared region that are actually written (example of
  “lazy evaluation”)
  Original values of data are still shared
    E.g., writeable data initialized with C initializers
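
The copy-on-write behaviour described above can be modelled in a few lines. This is a toy simulation, not how the memory manager is implemented: `SharedSection` and `ProcessView` are hypothetical names, pages are plain byte strings, and the "fault" is just a dictionary lookup.

```python
class SharedSection:
    """Pages faulted from a common file, shared read-only by all mappers."""

    def __init__(self, pages):
        self.pages = [bytes(p) for p in pages]


class ProcessView:
    """One process's view of a section with copy-on-write pages."""

    def __init__(self, section):
        self.section = section
        self.private = {}  # page number -> private copy (paging-file backed)

    def read(self, page_no):
        page = self.private.get(page_no)
        return bytes(page) if page is not None else self.section.pages[page_no]

    def write(self, page_no, offset, value):
        # First write to a shared page: the pager copies the page and
        # allocates the copy privately to the writing process.
        if page_no not in self.private:
            self.private[page_no] = bytearray(self.section.pages[page_no])
        self.private[page_no][offset] = value
```

Only pages that are actually written get a private copy; every other page in the region keeps pointing at the shared original, which is the "lazy evaluation" the slide refers to.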

   How Copy-On-Write Works
 [Figure: before the write. Two process address spaces share the same physical pages (Page 1, Page 2, Page 3), which hold the original data.]

   How Copy-On-Write Works
 [Figure: after the write. One process still sees the original data in Page 1, Page 2, and Page 3; the other process sees the modified data in its own private copy of page 2.]

Physical Memory
 32-bit Windows supports systems with up to 64 GB of
 physical memory
 But, the virtual address space is still 4 GB, so
 how can this memory be used?
 1.   Although each process can only address 2 (or 3) GB,
      many may be in memory at the same time
      (e.g., 5 * 2 GB processes = 10 GB)
 2.   New Address Windowing Extensions allow Win32
      processes to use more than 2 GB of memory
 3.   Files in system cache remain in physical memory
        Although file cache doesn’t know it, memory manager keeps
        unmapped data in physical memory

Address Windowing Extensions
 AWE functions allow
 Win32 processes to
 allocate large amounts of
 physical memory and then
 map “windows” into that
 Applications: Database
 servers can cache large
 Up to programmer to
   Like DOS expanded
   memory (EMS) with more
 64-bit Windows removes
 this need

File System Virtual Block Cache
 Virtual block cache (not logical block)
   Managed in terms of blocks within files, not blocks within partition
   Caching occurs above file system, not below
   Permits access to cached data without translation of file to sector
     Allows maintaining coherency between normal file I/O and memory
     mapped file I/O
   Intelligent read-ahead
     Predicts next read location based on history of last 2 reads
 Shared by all file systems
   Local or remote
   Includes file data and file system metadata (e.g. MFT, file
   attributes, …)
 Write back cache
   Data held in memory and written later by mapped page writer
   system thread
Cache Virtual Structure
Virtual size: 64-960 MB
  In system virtual address space, so
  visible to all processes
  Divided into 256 KB “views”
Cache slots are mapped to 256 KB
segments of cached files
  Uses same services as Win32 memory
  mapped files
But remember, this is virtual, not
  Relies on memory manager to read and
  write actual file data via normal paging
Virtual size of the cache is not related
to amount of cached file data
  Memory manager will still “cache”
  unmapped file data on the standby list
  So larger cache size just reduces # of
Controlling The Cache
Per-file basis
 File open flags affect how cache influences the memory
 manager on what data to keep in RAM
   If nothing specified, automatic asynchronous read-ahead
      Predicts next read location based on history of last 2 reads
      Touches the pages to fault them in
   FILE_FLAG_SEQUENTIAL_SCAN increases size of read-ahead
      And, causes cache to re-use same cache slot (instead of filling
      Also puts unmapped pages at end of standby list
   FILE_FLAG_RANDOM_ACCESS disables read ahead
 Can disable file cache completely on a per-file open basis
   Requires reads/writes to be done on sector boundaries
   Buffers must be aligned in memory on sector boundaries
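
The two sector-boundary rules for non-cached I/O amount to a few modulo checks. The sketch below assumes a 512-byte sector purely for illustration (real code would query the volume), and both helper names are hypothetical.

```python
def is_valid_unbuffered_io(file_offset, length, buffer_address, sector_size=512):
    """True if a transfer satisfies the sector-boundary rules for non-cached I/O."""
    return (
        file_offset % sector_size == 0        # read/write starts on a sector boundary
        and length % sector_size == 0         # and covers whole sectors
        and buffer_address % sector_size == 0  # buffer is sector-aligned in memory
    )


def round_up_to_sector(length, sector_size=512):
    """Smallest multiple of sector_size that is >= length."""
    return -(-length // sector_size) * sector_size
```

A caller that wants 1000 bytes would therefore request `round_up_to_sector(1000)` bytes into an aligned buffer and ignore the tail.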
Memory Management
 Core Memory Management Services
 Working Set Management
 Unassigned Memory
 Page Files

Working Set
 Working set: All the physical pages “owned”
 by a process
   Essentially, all the pages the process can reference without
   incurring a page fault
 Working set limit: The maximum pages the process can
   When limit is reached, a page must be released for every page
   that’s brought in (“working set replacement”)
   Default upper limit on size for each process
   System-wide maximum calculated and stored in
     Approximately RAM minus 512 pages (2 MB on x86) minus min size of
     system working set (1.5 MB on x86)
     Interesting to view (gives you an idea how much memory you’ve “lost”
     to the OS)
   True upper limit: 2 GB minus 64 MB
Birth Of A Working Set
 Pages are brought into memory as a result of page faults
   Prior to Windows XP, no pre-fetching at image startup
   But readahead is performed after a fault
      See MmCodeClusterSize, MmDataClusterSize, MmReadClusterSize
   Can see with Filemon
 If the page is not in memory, the appropriate block in the associated
 file is read in
   Physical page is allocated
   Block is read into the physical page
   Page table entry is filled in
   Exception is dismissed
   Processor re-executes the instruction that caused the page fault (and this
   time, it succeeds)
 The page has now been “faulted into” the process “working set”

Working Set List
 [Figure: the process “WorkingSet” list, ordered from newer pages to older pages]

 A process always starts with an empty
 working set
   It then incurs page faults when referencing a page that
   isn’t in its working set
   Many page faults may be resolved from memory (to be
   described later)
Working Set Replacement
 [Figure: pages leaving the process “WorkingSet” go to the standby or modified page list]
 When working set max reached (or working set trim
 occurs), must give up pages to make room for new pages
 Local page replacement policy (most Unix systems
 implement global replacement)
   E.g. a single process cannot take over all of physical memory
   Page replacement algorithm is least recently accessed
   (pages are aged)
      On UP systems only in Windows 2000 – done on all systems in
      Windows XP/2003
 New VirtualAlloc flag in XP/2003: MEM_WRITE_WATCH
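
Local replacement can be illustrated with a small model of a single working set. Two simplifications are assumed here and are not how Windows behaves: the limit is treated as hard (the deck notes real limits are soft), and strict least-recently-accessed order stands in for page aging. The class name `WorkingSet` is hypothetical.

```python
from collections import OrderedDict


class WorkingSet:
    """One process's working set with a fixed page limit (a simplification)."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.pages = OrderedDict()  # page -> dirty flag, least recently accessed first
        self.faults = 0

    def reference(self, page, write=False):
        """Touch a page. Returns (victim, destination_list) if a page was replaced."""
        if page in self.pages:
            self.pages[page] = self.pages[page] or write
            self.pages.move_to_end(page)  # now the most recently accessed
            return None
        self.faults += 1
        replaced = None
        if len(self.pages) >= self.max_pages:
            # Replacement is local: the victim comes from this process only.
            victim, dirty = self.pages.popitem(last=False)
            replaced = (victim, "modified" if dirty else "standby")
        self.pages[page] = write
        return replaced
```

Because the victim is always taken from the faulting process's own list, one process growing does not directly push pages out of another process's working set.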
Working Set System Services
  Min/Max set on a per-process basis
    Can view with !process in Kernel Debugger
  Can adjust with SetProcessWorkingSetSize –
  but has little effect
    Limits are “soft” (many processes larger than max)
    Memory Manager decides when to grow/shrink
    working sets
  New function in 2003 Server:
    Supports hard working set limits
  Can also self-initiate working set trimming
    Pass -1, -1 as min/max working set size
    (minimizing a window does this for you)
Locking Pages
 Pages may be locked into the process working set
    Pages are guaranteed in physical memory (“resident”) when any thread in
    process is executing

 status = VirtualLock(baseAddress, size);
 status = VirtualUnlock(baseAddress, size);

 Number of lockable pages is a fraction of the maximum working set
    Changed by SetProcessWorkingSetSize
 Pages can be locked into physical memory (by kernel mode code
 Pages are then immune from “outswapping” as well as paging


Process Memory Information
Task Manager, Processes tab

1 “Mem Usage” = physical
  memory used by process
  (working set size, not
  working set limit)
    Note: Shared pages are
    counted in each process
2 “VM Size” = private (not
  shared) committed virtual
  space in processes ==
  potential pagefile usage
3 “Mem Usage” in status bar
  is not total of “Mem
  Usage” column (see later
  slide)

                               Screen snapshot from:
                               Task Manager | Processes
                               tab
Process Memory Information
PerfMon –
Process Object
 “Virtual Bytes” =
 committed + reserved
 virtual space, including
 shared pages
 “Working Set” = working
 set size (not limit)
 “Private Bytes” = private
 virtual space (same as
 “VM Size” from Task
 Manager Processes list)
 Also: In Threads object,
 look for threads in
 Transition state - evidence
 of swapping (usually
 caused by severe memory
                               Screen snapshot from: Performance Monitor
                               counters from Process object
Viewing The Working Set
   Working set size counts shared pages in
   each working set
   Vadump (Resource Kit) can dump the
   breakdown of private, shareable, and
   shared pages
C:\> Vadump -o -p 3968
Module Working Set Contributions in pages
  Total  Private  Shareable  Shared  Module
     14        3         11       0  NOTEPAD.EXE
     46        3          0      43  ntdll.dll
     36        1          0      35  kernel32.dll
      7        2          0       5  comdlg32.dll
     17        2          0      15  SHLWAPI.dll
     44        4          0      40  msvcrt.dll
Prefetch Mechanism
 File activity is traced and used to prefetch data
 the next time
   First 10 seconds are monitored
     Pages referenced & directories opened
   Prefetch “trace file” stored in \Windows\Prefetch
     Name of .EXE-<hash of full path>.pf
 Also applies to system boot
   First 2 minutes of boot process logged
     Stops 30 seconds after the user starts the shell or 60 seconds
     after all services are started
   Boot trace file:

Prefetch Mechanism
 When the application runs again, the system
   Reads in directories referenced
   Reads in code and file data
      Reads are asynchronous
      But waits for all prefetch to complete
 In addition, every 3 days, system automatically
 defrags files involved in each application startup
 Bottom line: Reduces disk head seeks
   This was seen to be the major factor in slow
   application/system startup

Memory Management
 Core Memory Management Services
 Working Set Management
 Unassigned Memory
 Page Files

Managing Physical Memory
 System keeps unassigned physical pages on
 one of several lists
    Free page list
    Modified page list
    Standby page list
    Zero page list
    Bad page list – pages that failed memory test at
    system startup
 Lists are implemented by entries in the “PFN
    Maintained as FIFO lists or queues

   Paging Dynamics
 [Figure: page flow between the working sets and the modified, free, zero, and bad page lists. Labeled transitions: demand zero page faults, page reads from disk, “soft” page faults, “global valid” faults, working set trimming to the modified page list, the modified page writer, the zero page thread, and private pages released at process exit.]
Standby And Modified Page Lists

 Modified pages go to modified (dirty) list
   Avoids writing pages back to disk too soon
 Unmodified pages go to standby
 (clean) list
 They form a system-wide cache of
 “pages likely to be needed again”
   Pages can be faulted back into a process
   from the standby and modified page list
   These are counted as page faults, but not
   page reads
Modified Page Writer
 Moves pages from modified to standby list, and
 copies their contents to disk
   I.e., this is what writes the paging file and updates
   mapped files (including the file system cache)
 Two system threads
   One for mapped files, one for the paging file
 Triggered when
   Memory is over-committed (too few free pages)
   Or modified page threshold is reached
   Does not flush entire modified page list

Free And Zero Page Lists
 Free Page List
   Used for page reads
   Private modified pages go here on process exit
   Pages contain junk in them (e.g., not zeroed)
   On most busy systems, this is empty
 Zero Page List
   Used to satisfy demand zero page faults
     References to private pages that have not been created
   When free page list has 8 or more pages, a priority
   zero thread is awoken to zero them
   On most busy systems, this is empty too
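
The movement of pages between these lists can be followed in a toy model. This is a simulation under stated assumptions, not the PFN database: the class `PageLists` and its method names are hypothetical, each list is a plain FIFO, and the writer is told how many pages to write because the real trigger thresholds are not modelled. The threshold of 8 free pages is the figure quoted above.

```python
class PageLists:
    ZERO_THREAD_THRESHOLD = 8  # zero thread wakes when the free list has 8+ pages

    def __init__(self):
        self.zero, self.free, self.standby, self.modified = [], [], [], []
        self.page_faults = 0
        self.page_reads = 0

    def trim(self, page, dirty):
        """A page leaves a working set: dirty -> modified list, clean -> standby."""
        (self.modified if dirty else self.standby).append(page)

    def soft_fault(self, page):
        """Fault a page back from standby/modified: a page fault, not a page read."""
        for page_list in (self.standby, self.modified):
            if page in page_list:
                page_list.remove(page)
                self.page_faults += 1
                return True
        return False

    def hard_fault(self):
        """Read from disk into a frame taken from the free list (assumes one exists)."""
        self.page_faults += 1
        self.page_reads += 1
        return self.free.pop(0)

    def modified_page_writer(self, count):
        """Write the oldest `count` dirty pages; they become clean standby pages."""
        batch, self.modified = self.modified[:count], self.modified[count:]
        self.standby.extend(batch)
        return len(batch)

    def process_exit(self, private_pages):
        """Private pages of an exiting process go to the free list unzeroed."""
        self.free.extend(private_pages)

    def zero_page_thread(self):
        """Zero the free pages once enough of them have accumulated."""
        if len(self.free) >= self.ZERO_THREAD_THRESHOLD:
            self.zero.extend(self.free)
            self.free.clear()

    def demand_zero_fault(self):
        """Satisfy a demand-zero fault (assumes a zeroed page is available)."""
        self.page_faults += 1
        return self.zero.pop(0)
```

In this model a soft fault raises `page_faults` but leaves `page_reads` unchanged, which mirrors the distinction between page faults and page reads made for the standby and modified lists.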
Memory Management Information
Task Manager
Performance tab

6   “Available” = sum of free,
    standby, and zero page
    lists (physical)
    Majority are likely standby
    “System Cache” = size of
    standby list + size of
    system working set (file
    cache, paged pool,
    pageable OS/driver code
    & data)

                                  Screen snapshot from:
                                  Task Manager | Performance tab
PFN Database
  Only way to get actual size of physical memory lists is to
  use !memusage in Kernel Debugger

lkd> !memusage
 loading PFN database

             Zeroed:      0    (     0 kb)
               Free:      3    (    12 kb)
            Standby:  98248    (392992 kb)
           Modified:    563    (  2252 kb)
    ModifiedNoWrite:      0    (     0 kb)
       Active/Valid:  93437    (373748 kb)
         Transition:      1    (     4 kb)
            Unknown:      0    (     0 kb)
              TOTAL: 192252    (769008 kb)
                                    Screen snapshot from: kernel debugger
                                    !memusage command
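
The page counts in this listing convert to kilobytes at 4 KB per page (the x86 page size), and the Task Manager “Available” figure can be rebuilt from them as free + standby + zeroed. The short calculation below uses only the numbers from the listing; the variable names are made up.

```python
PAGE_KB = 4  # x86 page size in KB

# Page counts copied from the !memusage listing.
memusage_pages = {
    "Zeroed": 0,
    "Free": 3,
    "Standby": 98248,
    "Modified": 563,
    "ModifiedNoWrite": 0,
    "Active/Valid": 93437,
    "Transition": 1,
    "Unknown": 0,
}


def to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_KB


total_pages = sum(memusage_pages.values())
available_kb = to_kb(
    memusage_pages["Free"] + memusage_pages["Standby"] + memusage_pages["Zeroed"]
)
```

Here `total_pages` matches the TOTAL row, and nearly all of `available_kb` comes from the standby list, consistent with the remark that the majority of available memory is likely standby.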
Memory Management
 Core Memory Management Services
 Working Set Management
 Unassigned Memory
 Page Files

Page Files
 What gets sent to the paging file?
    Not code – only modified data (code can be re-read from image
    file anytime)
 When do pages get paged out?
    Only when necessary
    Page file space is only reserved at the time pages are written
    Once a page is written to the paging file, the space is occupied
    until the memory is deleted (e.g., at process exit), even if the
    page is read back from disk
 Can run with no paging file
    Windows NT4/Windows 2000: Zero pagefile size actually
    created a 20MB temporary page file (\temppf.sys)

Sizing The Page File
 Given understanding of page file usage, how big should the total
 paging file space be?
 (Windows supports multiple paging files)
 Size should depend on total private virtual memory used by
 applications and drivers
    Therefore, not related to RAM size (except for taking a full memory
 Worst case: Windows has to page all private data out to make room
 for code pages
    To handle, minimum size should be the maximum of VM usage
    (“Commit Charge Peak”)
       Hard disk space is cheap, so why not double this
    Normally, make maximum size same as minimum
    But, max size could be much larger if there will be infrequent demands
    for large amounts of page file space
       Performance problem: Page file extension will likely be very fragmented
       Extension is deleted on reboot, thus returning to a contiguous page file
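
The sizing rule above reduces to simple arithmetic on the observed commit charge peak. The helper below is only a restatement of that rule of thumb; `suggest_page_file_mb` and its `safety_factor` parameter are hypothetical, and RAM size is deliberately not an input.

```python
def suggest_page_file_mb(commit_charge_peak_mb, safety_factor=2):
    """Return (minimum, maximum) page file size in MB from the commit charge peak.

    safety_factor=2 reflects the "why not double this" suggestion; the maximum
    equals the minimum so the file is not extended into fragmented space.
    """
    minimum = commit_charge_peak_mb * safety_factor
    maximum = minimum
    return minimum, maximum
```

A system expecting rare, very large demands could pass the result through and then raise only the maximum, accepting the fragmentation cost noted above.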

Memory Management Information
Task Manager
Performance tab

3   Total committed private virtual
    memory (total of “VM Size” in
    process tab + Kernel Memory
    Not all of this space has actually
    been used in the paging files; it is
    “how much would be used if it was
    all paged out”
4   “Commit charge limit” = sum of
    physical memory available for
    processes + current total size of
    paging file(s)
    Does not reflect true maximum
    page file sizes (expansion)
    When “total” reaches “limit”, further
    VirtualAlloc attempts by any
    process will fail
                                                    Screen snapshot from:
                                                    Task Manager | Performance tab
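
The relationship between commit charge total and limit can be sketched as a simple accounting model. It ignores page file expansion, as the slide notes the displayed limit does, and the class `CommitAccounting` is a hypothetical stand-in rather than any Windows structure.

```python
class CommitAccounting:
    """Tracks committed private virtual memory against the commit limit (in MB)."""

    def __init__(self, physical_available_mb, page_file_mb):
        # Limit = physical memory available for processes + current page file size.
        self.limit_mb = physical_available_mb + page_file_mb
        self.total_mb = 0

    def commit(self, size_mb):
        """Charge a new private commitment; fail if it would exceed the limit."""
        if self.total_mb + size_mb > self.limit_mb:
            return False
        self.total_mb += size_mb
        return True
```

The charge is made when memory is committed, not when it is touched or paged out, which is why the total measures potential rather than actual page file use.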
When Page Files Are Full
 When page file space runs low
 1.   “System running low on virtual memory”
        First time: Before pagefile expansion
        Second time: When committed bytes reach the commit limit
 2.   “System out of virtual memory”
        Page files are full
 Look for who is consuming pagefile space
      Process memory leak: Check Task Manager, Processes tab, VM
      Size column
        Or Perfmon “private bytes”, same counter
      Paged pool leak: Check paged pool size
        Run poolmon to see what object(s) are filling pool
        Could be a result of processes not closing handles – check process
        “handle count” in Task Manager

1.   System Architecture
2.   Processes and Thread Internals
3.   Memory Management Internals
4.   Security Internals

 Protecting Objects

Windows Security Support
 Microsoft’s goal was to achieve C2, which requires:
   Secure Logon: NT provides this by requiring user name and
   Discretionary Access Control: fine grained protection over
   resources by user/group
   Security Auditing: ability to save a trail of important security
   events, such as access or attempted access of a resource
   Object reuse protection: must initialize physical resources that are
   reused e.g. memory, files
 Certifications achieved:
   Windows NT 3.5 (workstation and server) with SP3 earned C2 in
   July 1995
   In March 1999 Windows NT 4 with SP3 earned an E3 rating from
   UK’s Information Technology Security (ITSEC) – equivalent to C2
   In November 1999 NT4 with SP6a earned C2 in stand-alone and
   networked environments
Windows Security Support
 Windows meets two B-level requirements:
   Trusted Path Functionality: way to prevent trojan
   horses with “secure attention sequence” (SAS) - Ctrl-
   Trusted Facility Management: ability to assign different
   roles to different accounts
      Windows does this through account privileges (described later)

Common Criteria
 Common Criteria (CC) is the new
 standard for software and OS ratings
   Consortium of US, UK, Germany, France, Canada, and the
   Netherlands in 1996
   Became ISO standard 15408 in 1999
   For more information, see
 CC is more flexible than TCSEC trust ratings
   A Protection Profile (PP) collects security requirements
   A Security Target (ST) is a set of security requirements that can be made
   by reference to a PP
 Windows 2000 was certified as compliant with the CC
 Controlled Access Protection Profile (CAPP) in October
   Windows XP and Server 2003 are undergoing evaluation

 Protecting Objects

Security Components
 [Figure: system architecture diagram with the security components highlighted. User mode: LSASS (LSA Server, SAM Server, Policy, Msv1_0.dll, Kerberos.dll), Active Directory, SAM, and the Event Logger. Kernel mode: System Service Dispatcher (kernel mode callable interfaces), NtosKrnl.Exe components including the I/O Mgr and Plug and Play Mgr, device and file system drivers, graphics drivers, and the Hardware Abstraction Layer (HAL), which provides hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.). Original copyright by Microsoft Corporation.]
Security Reference Monitor
 Performs object access checks,
 manipulates privileges, and generates
 audit messages
 Group of functions in Ntoskrnl.exe
   Some documented in DDK
   Exposed to user mode by Windows API calls
 Demo: Open Ntoskrnl.exe with
 Dependency Walker and view functions
 starting with “Se”

Demo: Viewing Security
 Run Process Explorer
 Collapse Explorer process tree and focus
 on upper half (system processes)

Security Components
 Local Security Authority
    User-mode process (\Windows\System32\Lsass.exe)
    that implements policies (e.g. password, logon),
    authentication, and sending audit records to the
    security event log
    LSASS policy database: registry key
 [Figure: MSGINA and LSASS (LSA Server, SAM Server, Policy, Msv1_0.dll, Kerberos.dll), with Active Directory, SAM, and the event logger]

LSASS Components
 SAM Service
   A set of subroutines (\Windows\System32\Samsrv.dll) responsible
   for managing the database that contains the usernames and
   groups defined on the local machine
   SAM database: A database that contains the defined local users
   and groups, along with their passwords and other attributes. This
   database is stored in the registry under HKLM\SAM.
   Password crackers attack the local user account password hashes
   stored in the SAM
 Demo: look at SAM service
   Open Lsass.exe process properties – click on services tab
   Click Find DLL – search for Samsrv.dll

Demo: Looking at the SAM
 Look at HKLM\SAM permissions
   SAM security allows only the local system account to access it
   Run Regedit
   Look at HKLM\SAM - nothing there?
   Check permissions (right click->Permissions)
   Close Regedit
 Look in HKLM\SAM
   Running Regedit in the local system account allows you to view the SAM:
    psexec -s -i -d c:\windows\regedit.exe
    sc create cmdassystem type= own type= interact
             binpath= "cmd /c start cmd /k"
    sc start cmdassystem
   View local usernames under
   Passwords are under Users key above Names

LSASS Components
 Active Directory
   A directory service that contains a database that stores
   information about objects in a domain
   A domain is a collection of computers and their associated
   security groups that are managed as a single entity
   The Active Directory server is implemented as a service,
   \Windows\System32\Ntdsa.dll, that runs in the Lsass process
 Authentication packages
   DLLs that run in the context of the Lsass process and that
   implement Windows authentication policy:
      LanMan: \Windows\System32\Msv1_0.dll
      Kerberos: \Windows\System32\Kerberos.dll
      Negotiate: uses LanMan or Kerberos, depending on which is most

LSASS Components
 Net Logon service (Netlogon)
   A Windows service (\Windows\System32\Netlogon.dll) that runs
   inside Lsass and responds to Microsoft LAN Manager 2 Windows
   NT (pre-Windows 2000) network logon requests
   Authentication is handled as local logons are, by sending them to
   Lsass for verification
   Netlogon also has a locator service built into it for locating
   domain controllers

 Logon process (Winlogon)
    A user-mode process running \Windows\System32\Winlogon.exe
    that is responsible for responding to the SAS and for managing
    interactive logon sessions
 Graphical Identification and Authentication (GINA)
    A user-mode DLL that runs in the Winlogon process and that
    Winlogon uses to obtain a user's name and password or smart
    card PIN
       Default is \Windows\System32\Msgina.dll

 Protecting Objects

What Makes Logon Secure?
 Before anyone logs on, the visible desktop is Winlogon’s
 Winlogon registers CTRL+ALT+DEL, the Secure
 Attention Sequence (SAS), as a standard hotkey
 SAS takes you to the Winlogon desktop
 No application can deregister it because only the thread
 that registers a hotkey can deregister it
 When Windows’ keyboard input processing code sees
 SAS it disables keyboard hooks so that no one can
 intercept it

 After getting security identification (account name,
 password), the GINA sends it to the Local Security
 Authority Subsystem (LSASS)
 LSASS calls an authentication package to verify the logon
    If the logon is local or to a legacy domain, MSV1_0 is the
    authenticator. User name and password are encrypted and
    compared against the Security Accounts Manager (SAM)
       Cached domain logons are also handled by MSV1_0
    If the logon is to an AD domain, the authenticator is Kerberos, which
    communicates with the AD service on a domain controller
 If there is a match, the SIDs of the corresponding user
 account and its groups are retrieved
 Finally, LSASS retrieves account privileges from the
 Security database or from AD
 LSASS creates a token for your logon session
 and Winlogon attaches it to the first process of
 your session
   Tokens are created with the NtCreateToken API
   Every process gets a copy of its parent’s token
 SIDs and privileges cannot be added to a token
 A logon session is active as long as there is at
 least one token associated with the session
   Run “LogonSessions -p” (from Sysinternals) to view
   the active logon sessions on your system

 Protecting Objects

The Access Validation
 Access validation is a security equation that
 takes three inputs:
   Desired Access
   Process Token
      Or Thread’s token if the thread is “impersonating”
   The object’s Security Descriptor, which contains a
   Discretionary Access Control List (DACL)
 The output is access allowed or access denied

 The main components of a token are:
   SID of the user
   SIDs of groups the user account belongs to
   Privileges assigned to the user (described in
   next section)
 [Figure: token layout: Account SID, Group 1 SID ... Group n SID, Privilege 1 ... Privilege n]

Labs: Viewing Access Tokens
 Process Explorer: double click on a
 process and go to Security tab
    Examine groups list
 Use RUNAS to create a CMD
 process running under another
 account (e.g. your domain account)
    Examine groups list
 Viewing tokens with the Kernel
    Run !process 0 0 to find a process
    Run !process <PID> 1 to dump the
    Get the token address and type
    !token -n <token address>
    Type dt _token <token address> to
    see all fields defined in a token

 Lets an application adopt the security profile of another user
   Used by server applications
   Impersonation is implemented at the thread level
       The process token is the “primary token” and is always accessible
       Each thread can be impersonating a different client
 Can impersonate with a number of client/server
 networking APIs – named pipes, RPC, DCOM

 [Figure: a client process, a server process, and the object being accessed]

 Process And Thread Security
 [Figure: a process and its threads (Thread 1, Thread 2, Thread 3), each protected by its own ACL (callouts 1-4). The process access token (callout 5) and the per-thread access tokens (callout 6) each list the User’s SID, Group SIDs, Privileges, Owner SID, Primary Group SID, and Default ACL.]

   Thread tokens (where present) completely supersede
   process token (basis for “security impersonation”)
 Windows uses Security Identifiers (SIDs) to identify security
    Users, Groups of users, Computers, Domains
 SIDs consist of:
    A revision level e.g. 1
    An identifier-authority value e.g. 5 (SECURITY_NT_AUTHORITY)
    One or more subauthority values
 Who assigns SIDs?
    Setup assigns a computer a SID
    Dcpromo assigns a domain a SID
    Users and groups on the local machine are assigned SIDs that are
    rooted with the computer SID, with a Relative Identifier (RID) at the end
        RIDs start at 1000 (built-in account RIDs are pre-defined)
 Some local users and groups have pre-defined SIDs (eg. World = S-

Demo: SIDs
 Example SIDs

 Domain SID:    S-1-5-21-34125455-5125555-1251255
 First account:  S-1-5-21-34125455-5125555-1251255-1000
 Admin account: S-1-5-21-34125455-5125555-1251255-500
 System account: S-1-5-18

 Demo: run PsGetSid (Sysinternals) to view the
 SID of your username and of the computer
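
The structure of these example SIDs (revision, identifier authority, subauthorities, trailing RID) can be pulled apart with a short parser. This is a sketch for the textual form only, and it assumes the authority is written in decimal as in the examples; `Sid`, `parse_sid`, and `is_in_domain` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Sid(NamedTuple):
    revision: int
    authority: int
    subauthorities: Tuple[int, ...]

    @property
    def rid(self):
        """Last subauthority: the Relative Identifier for account SIDs."""
        return self.subauthorities[-1]


def parse_sid(text):
    """Parse 'S-<revision>-<authority>-<subauthority>...' into a Sid."""
    parts = text.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID string: {text!r}")
    revision, authority, *subs = (int(p) for p in parts[1:])
    return Sid(revision, authority, tuple(subs))


def is_in_domain(account, domain):
    """True if the account SID is the domain (or computer) SID plus one RID."""
    return (
        account.revision == domain.revision
        and account.authority == domain.authority
        and account.subauthorities[:-1] == domain.subauthorities
    )
```

Parsing the "First account" example gives a RID of 1000 and the "Admin account" example a RID of 500, and both share every other field with the domain SID.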

Security Descriptors
 Descriptors are associated with objects: e.g.
 files, Registry keys, application-defined
 Descriptors are variable length
 [Figure: security descriptor layout, including the Owner SID and the Primary Group (defined for POSIX)]


DACLs consist of zero or more Access Control
  A security descriptor with no DACL allows all access
  A security descriptor with an empty (0-entry) DACL
  denies everybody all access

An ACE is either “allow” or “deny”
 [Figure: ACE layout: ACE type and access (Read, Write, Delete, ...)]

Demo: Viewing a Security
Descriptor Structure
 Get the address of an EPROCESS block with
 Type !object on that address
 Type “dt _OBJECT_HEADER” on the object
 header address to get the security descriptor
 Type !sd <security descriptor address> & -8 1

Access Check
 The Security Reference Monitor (SRM)
 implements an explicit allow model
 ACEs in the DACL are examined in order
   Does the ACE have a SID matching a SID in the
   If so, do any of the access bits match any remaining
   desired accesses?
   If so, what type of ACE is it?
      Deny: return ACCESS_DENIED
      Allow: grant the specified accesses and if there are no
      remaining accesses to grant, return ACCESS_ALLOWED
   If we get to the end of the DACL and there are
   remaining desired accesses, return
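
The walk through the DACL can be written down as a short function. This is a sketch of the steps listed above, not the SRM's code: SIDs are plain strings, the access bits are invented for the example, and owner rights and privileges (covered elsewhere in the deck) are left out. Under the explicit allow model, running off the end of the DACL with accesses still ungranted is treated as a denial.

```python
from typing import NamedTuple

READ, WRITE, DELETE = 0x1, 0x2, 0x4  # illustrative bits, not real Windows masks


class Ace(NamedTuple):
    kind: str  # "allow" or "deny"
    sid: str
    mask: int


def access_check(token_sids, desired, dacl):
    """Return True for access allowed, False for access denied.

    Assumes `desired` is a non-zero mask. `dacl` is None when the security
    descriptor has no DACL, or a (possibly empty) list of ACEs.
    """
    if dacl is None:  # no DACL: all access is allowed
        return True
    remaining = desired
    for ace in dacl:  # ACEs are examined in order
        if ace.sid not in token_sids:
            continue  # ACE does not apply to this token
        if not (ace.mask & remaining):
            continue  # no overlap with accesses still being sought
        if ace.kind == "deny":
            return False
        remaining &= ~ace.mask  # grant what this allow ACE covers
        if remaining == 0:
            return True
    return False  # end of DACL with accesses still ungranted
```

An empty DACL falls straight through the loop and denies everything, while a missing DACL allows everything, matching the two cases described for security descriptors.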
Access Check Example
 [Figure: an access request made with Mark’s token (Privilege 1 ... Privilege n)]

ACE Ordering
 The order of ACEs is important!
   Low-level security APIs allow the creation of DACLs with ACEs in
   any order
   All security editor interfaces and higher-level APIs order ACEs
   with denies before allows
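 [Figure: the same access request from Mark’s token (Privilege 1 ... Privilege n) checked against two DACLs that hold ACEs for Authors and Mark in different orders, one starting with a Deny ACE and the other with an Allow ACE]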
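
The "denies before allows" ordering that the security editors apply can be expressed as a stable sort. This sketch covers only that one rule; ACEs are modelled as `(kind, sid, access)` tuples and `canonical_order` is a hypothetical helper, not a Windows API.

```python
def canonical_order(dacl):
    """Return the ACEs with all deny entries ahead of all allow entries.

    The sort is stable, so the relative order within each group is kept.
    """
    return sorted(dacl, key=lambda ace: 0 if ace[0] == "deny" else 1)
```

With denies first, an applicable deny ACE is always reached before any allow ACE can grant the same access, so the result no longer depends on the order in which the entries were added.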
Demo: ACE ordering
 Go to an NTFS file
 Add an Everyone deny-all to a file
 Will the Administrator be able to look at the file?
   Verify your answer by checking Effective Permissions

Access Special Cases
 An object’s owner can always open an
 object with WRITE_DACL and
 READ_CONTROL permission
 An account with “take ownership” privilege
 can claim ownership of any object
 An account with backup privilege can open
 any file for reading
 An account with restore privilege can open
 any file for write access

Controllable Inheritance
 In NT 4.0, objects only inherit ACEs from a parent
 container (e.g. Registry key or directory) when
 they are created
   No distinction made between inherited and
   non-inherited ACEs
   No prevention of inheritance
 In Windows 2000 and higher inheritance is
   SetNamedSecurityInfoEx and SetSecurityInfoEx
   Will apply new inheritable ACEs to all child objects
   (subkeys, files)
   Directly applied ACEs take precedence over inherited
   ACEs
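
A toy model of the Windows 2000 behavior described above: new ACEs set on a container are pushed down to its children, each child keeps its directly applied ACEs ahead of the inherited ones, and a child can opt out of inheritance. The `Node` class, its field names, and the `propagate` helper are assumptions for illustration. For simplicity every ACE is treated as inheritable, which real ACEs are not.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    explicit: list = field(default_factory=list)   # directly applied ACEs
    inherited: list = field(default_factory=list)  # ACEs received from the parent
    protected: bool = False                        # True blocks inheritance
    children: list = field(default_factory=list)

    def dacl(self):
        # Directly applied ACEs come first, so they are evaluated first.
        return self.explicit + self.inherited


def propagate(parent):
    """Push the parent's ACEs down the tree, skipping protected children."""
    for child in parent.children:
        child.inherited = [] if child.protected else list(parent.dacl())
        propagate(child)


locked = Node("locked.txt", protected=True)
report = Node("report.txt", explicit=[("deny", "Guests", "read")])
root = Node("docs", explicit=[("allow", "Users", "read")],
            children=[report, locked])

propagate(root)
print(report.dacl())  # [('deny', 'Guests', 'read'), ('allow', 'Users', 'read')]
print(locked.dacl())  # []
```

Because the inherited ACEs are tracked separately from the explicit ones, re-running `propagate` after changing the parent replaces only the inherited part of each child's DACL.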
 Protecting Objects

 Specify which system actions a
 process (or thread) can perform
 Privileges are associated with groups
 and user accounts
    There are sets of pre-defined
    privileges associated with built-in
    groups (e.g. System, Administrators)
 Examples include:
    Take ownership
 Privileges are disabled by default
 and must be programmatically turned
 on with a system call
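
The "held but disabled until enabled" idea can be modeled as follows. On Windows the enabling step is done with the Win32 call `AdjustTokenPrivileges`; the `Token` class and its methods below are hypothetical and only mimic the behavior the slide describes.

```python
class Token:
    """Toy access token: each privilege the account holds starts disabled."""

    def __init__(self, held):
        self.privileges = {name: False for name in held}

    def enable(self, name):
        # Only a privilege the account already holds can be turned on.
        if name not in self.privileges:
            raise PermissionError(f"{name} is not held by this token")
        self.privileges[name] = True

    def is_enabled(self, name):
        return self.privileges.get(name, False)


token = Token(["SeTakeOwnershipPrivilege", "SeSystemtimePrivilege"])
print(token.is_enabled("SeSystemtimePrivilege"))  # False
token.enable("SeSystemtimePrivilege")
print(token.is_enabled("SeSystemtimePrivilege"))  # True
```

Enabling does not grant anything new: a privilege that was never assigned to the account cannot be switched on, which is why `enable` raises for unknown names.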

Demo: Privileges
 Run Secpol.msc and examine full list
      Click on Local Policies->User Rights Assignment
 Process Explorer: double click on a process, go
 to security tab, and examine privileges list
 Watch changes to privilege list:
 1.   Run Process Explorer – put in paused mode
 2.   Open Control Panel applet to change system time
 3.   Go back to Process Explorer & press F5
 4.   Examine privilege list in new process that was created
 5.   Notice in privilege list that system time privilege is
Powerful Privileges
 Several privileges give an account that holds them full
 control of a computer:
    Debug: can open any process, including System processes, to
        Inject code
        Modify code
        Read sensitive data
    Take Ownership: can access any object on the system
        Replace system files
        Change security
    Restore: can replace any file
    Load Driver
        Drivers bypass all security
    Create Token
        Can spoof any user (locally)
        Requires use of undocumented NT API
    Trusted Computing Base (Act as Part of Operating System)
        Can create a new logon session with arbitrary SIDs in the token

Demo: Powerful Privileges
 View the use of the backup privilege:
    Make a directory
    Create a file in the directory
    Use the security editor to remove inherited security and give Everyone full access
    to the file
    Remove all access to the directory (do not propagate)
    Start a command-prompt and do a “dir” of the directory
    Run \Sysint\Solomon\PView and enable the Backup privilege for the command
    Do another “dir” and note the different behavior
 View the use of the Bypass-Traverse Checking privilege (internally called
 “Change Notify”)
    From the same command prompt run notepad to open the file (give the full path)
    in the inaccessible directory
    Extra credit: disable Bypass-Traverse Checking so that you get access denied
    trying to open the file (hint: requires use of secpol.msc and then RUNAS)

The End!
Thanks for coming!
For more information:
  Windows Internals, 4th edition
  5th edition will be updated for Vista (will ship
  when Vista ships)
We’ll stay for questions (we’re not here the
rest of the week)
Or, email us (see slide 1 for addresses)

                                © 2005 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

