GDC 2006
Cross Platform Development Best Practices

Matt Lee, Kev Gee
Microsoft Game Technology Group
 Code Considerations
   CPU Considerations
   GPU Considerations
   IO Considerations
 Content Considerations
   Data Build System
   Geometry Formats
   Texture Formats
   Audio Considerations
Compiler Comparison
 VS 2005 front end used for both platforms
   Preprocessor benefits both platforms
   Debugger experience is the same
   Full 2005 IDE support coming

 Xbox 360 optimizing back end added with
 XDK install
 Single solution / MSBuild file can target both
 Intel Pentium D / AMD Athlon64 X2
 Programming Model
   2 Cores running @ around 3.20 GHz
   12-KB Execution trace cache
   16-KB L1 cache, 1 MB L2 cache
   Deep Branch Prediction
   Dynamic data flow analysis
   Speculative Execution
   Little-endian byte ordering
   SIMD instructions
 Quad Core announced for early 2007
360 Custom CPU
 Custom IBM Processor
   Three 64-bit PowerPC cores
   running at 3.2 GHz
      Two hardware threads per core
      32-KB L1 instruction cache
      & data cache, per core
   Shared 1-MB L2 cache
      128-byte cache lines on all caches
   Big-endian byte ordering
   VMX 128 SIMD
   Lots of Registers
Performance Tools
 Profiling approaches are very similar between
 PC and Xbox 360
 PIX for Xbox 360 & PIX for Windows
   Being developed by the same team now
 Use instrumented tools on Xbox 360
   XbPerfView / Tracedump
 Xbox 360 does not have a sampling profiler yet
 Use PC profiling tools
   Intel VTune / AMD CodeAnalyst / VS Team System
   Attend the Performance Hands on training!
Focus Your Efforts
 Use performance tools to guide work
 Areas where we have seen platform-specific
 efforts reap rewards
   Single Data Pass engine design
   High Frequency Game API Layers
     Use your profiler tools to target the hot spots
   Math Library - Bespoke vs XGMath vs D3DXMath
Impact on Code Design
 Designing Cross platform APIs
   Use of virtual Functions
   Parameter passing mechanisms
     Pass by reference vs. pass by value
   Typedef vector types and intrinsics
     Math Library Design Case Study
   Use of inlining
Use of Virtual Functions
 Be careful when using virtual functions to
 hide platform differences
 Virtual function performance on Xbox 360
   Adds a branch instruction which is always mispredicted
   Compiler is limited in optimizing these
 Make a concrete implementation for Xbox
 Avoid virtual functions in inner loops
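One way to keep virtual dispatch out of an inner loop is to make the virtual call once per batch instead of once per element. The sketch below is only an illustration of that idea; `IParticleUpdater`, `GravityUpdater` and `Simulate` are hypothetical names that do not come from the talk.

```cpp
#include <cstddef>
#include <vector>

struct Particle { float y; float vy; };

// Hypothetical interface: one virtual call per batch, not per particle.
class IParticleUpdater {
public:
    virtual ~IParticleUpdater() {}
    virtual void UpdateBatch(Particle* p, std::size_t count, float dt) = 0;
};

class GravityUpdater : public IParticleUpdater {
public:
    void UpdateBatch(Particle* p, std::size_t count, float dt) override {
        // The inner loop contains no virtual dispatch, so the compiler
        // is free to inline and unroll it.
        for (std::size_t i = 0; i < count; ++i) {
            p[i].vy -= 9.8f * dt;
            p[i].y  += p[i].vy * dt;
        }
    }
};

// Pays for a single indirect branch per frame for the whole array.
inline void Simulate(IParticleUpdater& updater, std::vector<Particle>& particles, float dt) {
    if (!particles.empty())
        updater.UpdateBatch(particles.data(), particles.size(), dt);
}
```

The per-element alternative (a virtual `Update(Particle&)` called inside the loop) would pay the indirect branch on every iteration.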
Cross Platform Render Example
 Base Class
   D3D9: overrides virtual base
   Xbox 360: concrete implementation
   D3D10: overrides virtual base
Cross Platform Render Example (ctd.)
class IRenderSystem
{
public:
#if !defined(_XBOX)
    virtual void Draw() = 0;  // D3D9 & D3D10 implementations subclass for specialization
#else
    void Draw();              // Xbox 360: concrete, non-virtual
#endif
};
#if defined(_XBOX)
void IRenderSystem::Draw()
{
    // 360 Implementation
    ……
}
#endif
Beware Big Constructors
 Ctors can dominate execution time
 Ctors often hidden to casual observer
   Copy ctors add objects to containers
   Arrays of C++ objects are constructed
   Overloaded operators may construct temporaries
 Consider: should ctor init data?
   Example: matrix class zeroing all data
 Prefer array initialization = { … }
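A minimal sketch of the constructor advice above, assuming a hypothetical `Matrix4` type (not from the talk): the default constructor is left trivial so that arrays cost nothing to create, zeroing is an explicit opt-in, and constant data uses `= { … }` initialization.

```cpp
// Hypothetical 4x4 matrix with a trivial default constructor:
// creating an array of these does no per-element work.
struct Matrix4 {
    float m[16];
};

// Explicit helper for callers that actually need zeroed data.
inline Matrix4 ZeroMatrix() {
    Matrix4 r = {};   // aggregate initialization zeroes every element
    return r;
}

// Array initialization with = { ... }: the values are laid out up front
// and no constructor runs per element.
static const Matrix4 kIdentity = { {
    1.0f, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f } };

// Sums the diagonal; used here only to exercise the data.
inline float Trace(const Matrix4& a) {
    return a.m[0] + a.m[5] + a.m[10] + a.m[15];
}
```

With this layout, declaring `Matrix4 pool[64];` runs no constructor code at all; a zeroing constructor would touch 4 KB before the first real write.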
Use of Inlining
  Careful inlining is in general a Good Thing
  Plan to spend time ensuring the compiler
  is inlining the right stuff
    Use Perf Tools such as VTune / Trace recorder
  Try the “inline any suitable” option
  Enable link-time code generation
  Consider profile-guided optimization
  Use __forceinline only where necessary
Consider Passing Native Types by Value
  Xbox 360 has large registers
  64 bit Native PC does too
  Pass and return these types by value
    int, __int64, float
    Consider these types if targeting SSE / VMX
      __m128 / __vector4, XMVECTOR, XMMATRIX
  Pass structs by pointer or reference
  Help the compiler using __restrict
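The parameter-passing guidance can be sketched as follows. `ScaleArray`, `Bounds` and `ExtentX` are hypothetical names used only for illustration. `__restrict` is a compiler extension (accepted by MSVC, GCC and Clang) rather than standard C++.

```cpp
#include <cstddef>

// __restrict promises the compiler that dst and src never alias, so it can
// keep loaded values in registers and schedule loads ahead of stores.
// The promise is not checked; passing overlapping buffers is a bug.
inline void ScaleArray(float* __restrict dst,
                       const float* __restrict src,
                       std::size_t count,
                       float scale)            // native type: passed by value
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i] * scale;
}

struct Bounds { float min[3]; float max[3]; };

// Structs are passed by reference (or pointer) rather than by value.
inline float ExtentX(const Bounds& b) { return b.max[0] - b.min[0]; }
```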
Math Library Header (Xbox 360)
#if defined( _XBOX )

#include <ppcintrinsics.h>
#include <vectorintrinsics.h>

typedef __vector4         XVECTOR;

typedef const XVECTOR     XVECTOR_PARAM;   // pass by value

#define XMATHAPI inline

#endif


Math Library Header (Windows)
#if defined( _WIN32 )

#include <xmmintrin.h>

typedef __m128            XVECTOR;

typedef const XVECTOR&    XVECTOR_PARAM;   // pass by reference

#define XMATHAPI inline

#endif


Math Library Function
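// The function name and first parameter were cut off in the source;
// XVectorAdd and vA's declaration are placeholders.
XMATHAPI XVECTOR XVectorAdd( XVECTOR_PARAM vA,
                             XVECTOR_PARAM vB )
{
#if defined( VMX128_INTRINSICS )

    return __vaddfp( vA, vB );

#elif defined( SSE_INTRINSICS )

    return _mm_add_ps( vA, vB );

#endif
}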
 Why Multithread?
   Necessary to take full advantage of modern CPUs
 Attend the Multi-threading talk later today
   Covers synchronization prims and lockless sync
 See Also:
   Talks from Intel and AMD (GDC2005 / GDC-E)
   OpenMP – C, not C++, useful in limited circumstances
   Concur – C++, see
D3D Architectural Differences
 D3D9 draw call cost is higher on Windows
 than on Xbox 360
   360 is optimized for a Single GPU target
   D3D10 improves draw call cost by design on Windows
 Very important to carefully manage the
 number of batches submitted
   This can have an impact on content creation
   This work will help with 360 performance too
 Wide variety of available Direct3D9 H/W
 CAPs and Shader Models abstract over feature differences
   GPUs with approximately equivalent performance to the
   Xbox 360 GPU
     ATI X1900 / NVIDIA 7800 GTX
     Shader Model 3.0 support

 Direct3D10 Standardizes feature set
   H/W Scales on performance instead
Xbox 360 Custom GPU
 Direct3D 9.0+ compatible
 High-Level Shader Language (HLSL) 3.0+ support
 10 MB Embedded DRAM
   Frame Buffer with 256 GB/sec bandwidth
 Hardware scaling for display resolution matching
 48 shader ALUs shared between pixel and vertex shading
 (unified shaders)
   Up to 8 simultaneous contexts (threads) in-flight at once
   Changing shaders or render state can be cheap, since a new context
   can be started up easily
 Hardware tessellator
   N-patches, triangular patches, and rectangular patches
   For non-continuous / adaptive cases, trade memory for
   this feature on PC systems
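To get a feel for what the 10 MB of embedded DRAM listed above can hold, the arithmetic below estimates the footprint of a color and depth target pair. This is a rough sketch under stated assumptions: 4 bytes per pixel for each of color and depth, MSAA multiplying both surfaces by the sample count, and 10 MB taken as 10 × 1024 × 1024 bytes. The helper names are hypothetical.

```cpp
#include <cstdint>

// Bytes needed for a color + depth render target pair.
// Assumes MSAA multiplies both surfaces by the sample count.
constexpr std::uint64_t RenderTargetBytes(std::uint32_t width, std::uint32_t height,
                                          std::uint32_t bytesColor, std::uint32_t bytesDepth,
                                          std::uint32_t samples) {
    return static_cast<std::uint64_t>(width) * height * (bytesColor + bytesDepth) * samples;
}

// 10 MB of EDRAM, assumed here to mean 10 * 1024 * 1024 bytes.
constexpr std::uint64_t kEdramBytes = 10ull * 1024 * 1024;

constexpr bool FitsInEdram(std::uint64_t bytes) { return bytes <= kEdramBytes; }
```

Under these assumptions a 1280x720 target without MSAA needs 7,372,800 bytes and fits, while the same target with 2x MSAA needs 14,745,600 bytes and does not fit in one piece.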
Explicit Resolve Control
  Copies surface data from EDRAM to a texture in
  system memory
  Required for render-to-texture and presentation
  to the screen
    Can perform MSAA sample averaging or resolve individual samples
    Can perform format conversions and biasing
    Cannot do rescaling or resampling of any kind
  This can impact your Xbox 360 engine design as
  it adds an extra step to common operations.
Use Native File I/O Routines
 Only native routines support key features:
   Asynchronous I/O
   Completion routines
 Prefer CreateFile and ReadFile
   Guaranteed as fast or faster than any other file I/O routines
 Avoid fopen, fread, C++ iostreams
Use Asynchronous File I/O
 File read/write operations block by default
 Async operations allow the game to do
 other interesting work
     Guarantees no intermediate buffering
   Use OVERLAPPED struct to determine when
   operation is complete
 See CreateFile docs for details
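A minimal sketch of an overlapped read with the Win32 calls named above. The wrapper names `BeginAsyncRead`, `FinishAsyncRead` and `AlignUp` are hypothetical. The "no intermediate buffering" bullet is assumed here to refer to `FILE_FLAG_NO_BUFFERING`, which requires read sizes and offsets to be multiples of the volume sector size; `AlignUp` covers that rounding. The Win32 part only compiles on Windows.

```cpp
#include <cstddef>

// Rounds size up to a multiple of alignment (alignment must be a power of two).
// Unbuffered reads need sizes that are multiples of the sector size.
inline std::size_t AlignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

#if defined(_WIN32)
#include <windows.h>

// Starts an overlapped read at file offset 0 and returns immediately.
// The file must have been opened with FILE_FLAG_OVERLAPPED.
inline bool BeginAsyncRead(HANDLE file, void* buffer, DWORD bytes, OVERLAPPED& ov) {
    ZeroMemory(&ov, sizeof(ov));
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);   // manual-reset event
    if (ov.hEvent == NULL) return false;
    if (ReadFile(file, buffer, bytes, NULL, &ov)) return true;  // finished synchronously
    if (GetLastError() == ERROR_IO_PENDING) return true;        // read is in flight
    CloseHandle(ov.hEvent);
    ov.hEvent = NULL;
    return false;
}

// Blocks until the read finishes and reports how many bytes arrived.
// Poll HasOverlappedIoCompleted(&ov) first to avoid blocking.
inline bool FinishAsyncRead(HANDLE file, OVERLAPPED& ov, DWORD& bytesRead) {
    BOOL ok = GetOverlappedResult(file, &ov, &bytesRead, TRUE);
    CloseHandle(ov.hEvent);
    ov.hEvent = NULL;
    return ok != FALSE;
}
#endif
```

A caller would open the file with `CreateFile(..., FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, NULL)`, start the read, run a frame of game work, and only then check for completion.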
Memory Mapped File I/O
 Fastest way to load data on Windows
   However, the 32 bit address space is getting tight
   This is a great 64 bit feature add! 

 Memory Mapped I/O not supported on 360
   No HDD backed Virtual Memory management
Universal Gaming Controller
 XInput is the same API for Xbox 360 and Windows
 The Microsoft universal controller is a reference
 design which can be leveraged by other hardware
 XP Driver available from Windows Update
   Support is built in to Xbox 360 and Windows Vista
Data Build System
 Add a data build / processing phase to your
 production system
   Compile, optimize and compress data according to
   multiple target platform requirements
   Easier and faster to handle endian-ness and other format
   conversions offline
   Data packing process can occur here too
 Invest time in making the build fast
   Artists need to rapidly iterate to make quality content
   Incremental builds can really help reduce the build time
 Try the XNA build tools
   Copies of XNA build CTP are available NOW!
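Handling endianness offline can be as simple as having the build tool write every multi-byte value in the target's byte order. The sketch below is an assumption about how such a step might look, not the XNA build tools' behavior. `ByteSwap32` and `WriteFloatBigEndian` are hypothetical helpers, and `float` is assumed to be a 32-bit IEEE-754 value.

```cpp
#include <cstdint>
#include <cstring>

// Reverses the byte order of a 32-bit value.
inline std::uint32_t ByteSwap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Writes a float in big-endian order (the Xbox 360 layout) regardless of
// the byte order of the machine running the build.
inline void WriteFloatBigEndian(float value, unsigned char out[4]) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));     // reinterpret without aliasing issues
    out[0] = static_cast<unsigned char>((bits >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((bits >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((bits >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(bits & 0xFFu);
}
```

Because the shifts operate on the value rather than on memory, the writer produces the same bytes on a little-endian build PC as it would on a big-endian host, and the console can load the file without any per-value fix-up.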
Geometry Compression
 Offline Compression of Geometry
   Provides wins across all platforms
   Disk I/O wins as well as GPU wins
 The compression approach is likely to be target specific
 PC is usually a superset of the consoles in this area
   D3D9 CAPs / limitations to consider
     16 bit Normals - D3DDECLTYPE_FLOAT16_2
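As an illustration of offline normal compression, the sketch below converts 32-bit floats to 16-bit half floats of the kind stored in a FLOAT16 vertex element. It is a simplified converter written for this note, not the D3DX implementation: it truncates instead of rounding, flushes values too small for a normalized half to zero, and maps NaN and out-of-range inputs to infinity. That is adequate for unit-length normals. `float` is assumed to be IEEE-754 single precision.

```cpp
#include <cstdint>
#include <cstring>

// Converts a float to a 16-bit half (1 sign, 5 exponent, 10 mantissa bits).
inline std::uint16_t FloatToHalf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::int32_t  exp  = static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mant = bits & 0x007FFFFFu;
    if (exp <= 0)  return static_cast<std::uint16_t>(sign);            // too small: zero
    if (exp >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u);  // too large: infinity
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(exp) << 10) | (mant >> 13)); // truncate mantissa
}

// Expands a half back to a float; half denormals are treated as zero.
inline float HalfToFloat(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;
    const std::uint32_t mant = h & 0x03FFu;
    std::uint32_t bits = sign;
    if (exp == 31)     bits = sign | 0x7F800000u | (mant << 13);        // infinity / NaN
    else if (exp != 0) bits = sign | ((exp + 112u) << 23) | (mant << 13);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Storing two components this way halves the size of each stored component compared with 32-bit floats, at a relative precision of about three decimal digits.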
Compressing Textures
 Wide variety of Texture Compression tools
   ATI Compressonator
   DirectX SDK DDS tools
   NVIDIA – Photoshop DDS Export
   Compression tools for 360 (xgraphics.lib)
     Supports endian swap of texture formats

   Build your own too!
     Make them fit your content.
Texture Formats
    BC == Block Compressed
    Standard DXT* formats across all platforms

    2-component format with 8 bits of precision per component
    Great for normal maps
    Single component textures made from a DXT3/DXT5
    alpha block
    4 bits of precision
    Xbox 360 / D3D9 Only
Texture Arrays
 Texture arrays
   generalized version of cube maps
 D3D9: emulate using a texture atlas
 Xbox 360
      Up to 64 surfaces within a texture, optional MIPmaps for each
      Surface is indexed with a [0..1] z coordinate in a 3D texture
 D3D10 supports this as a standard feature
   Up to 512 surfaces within a texture
   Bindable as rendertarget, with per-primitive array index
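The D3D9 atlas emulation comes down to remapping a per-slice texture coordinate into the tile that holds that slice. The sketch below assumes a simple grid atlas of equally sized tiles filled row by row. `Float2` and `AtlasUV` are hypothetical names, and the layout is an assumption rather than something the talk specifies.

```cpp
struct Float2 { float u; float v; };

// Maps a per-slice (u, v) in [0,1] into a grid atlas holding
// slicesX * slicesY equally sized tiles, filled row by row.
inline Float2 AtlasUV(Float2 uv, unsigned slice, unsigned slicesX, unsigned slicesY) {
    const unsigned col = slice % slicesX;   // tile column for this slice
    const unsigned row = slice / slicesX;   // tile row for this slice
    Float2 out;
    out.u = (static_cast<float>(col) + uv.u) / static_cast<float>(slicesX);
    out.v = (static_cast<float>(row) + uv.v) / static_cast<float>(slicesY);
    return out;
}
```

The same remap can be done in the vertex or pixel shader. Filtering and mipmapping can bleed across tile borders in an atlas, so padding between tiles is usually needed; that is one of the costs of emulating arrays this way.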
Custom Vertex Fetch / Vertex Texture

  D3D9 Vertex Texture implementations use

  360 supports explicit instructions for this

  D3D10 supports this as a standard feature
    Load() from buffer (VB, IB, etc.) at any stage
    Sample() from texture at any stage
  D3DX FX and FX Lite co-exist easily
    #define around the texture sampler differences

  Preshaders are not supported on FX Lite
    We advise that these should be optimized to
    native code for D3D9 Effects
HLSL Development
 Set up your engine and tools for rapid
 shader development and iteration
 Compile shaders offline for performance,
   maybe allow run-time recompilation during development
 Be careful with shader generation tools
   Perf needs to be considered
   Schedule / Plan work for this
Cross-Platform HLSL Consideration
  Texture access instruction considerations
  Xbox 360 has native tfetch / getWeights features
      Constant texel offsets (-8.0 to 7.5 in 0.5 increments)
      Independent of texture size

  Direct3D 10 supports integer texture offsets when
    Direct3D 10 supports GetDimensions() natively
      Equivalent to getWeights

  Direct3D 9 can emulate tfetch & getWeights
  behavior using a shader constant for texture size
HLSL Example
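// Reciprocal texture size for a 512x512 texture; update per texture.
float2 g_invTexSize = float2( 1/512.0f, 1/512.0f);

// Fractional position within a texel, usable as bilinear weights.
float2 getWeights2D( float2 texCoord )
{
    return frac( texCoord / g_invTexSize );
}

// Samples t at texCoord shifted by offset, measured in texels.
float4 tex2DOffset( sampler t, float2 texCoord, float2 offset )
{
    texCoord += offset * g_invTexSize;
    return tex2D( t, texCoord );
}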
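To check the emulation math offline, the same two helpers can be mirrored on the CPU. This is a sketch for testing purposes only. `Vec2`, `Frac`, `GetWeights2D` and `OffsetTexCoord` are hypothetical C++ stand-ins for the HLSL above, and the texture fetch itself is left out.

```cpp
#include <cmath>

struct Vec2 { float x; float y; };

// Matches HLSL frac(): the value minus its floor.
inline float Frac(float v) { return v - std::floor(v); }

// CPU mirror of getWeights2D: fractional position inside the texel grid.
inline Vec2 GetWeights2D(Vec2 texCoord, Vec2 invTexSize) {
    Vec2 w = { Frac(texCoord.x / invTexSize.x), Frac(texCoord.y / invTexSize.y) };
    return w;
}

// CPU mirror of the coordinate math in tex2DOffset: shifts a coordinate
// by an offset measured in texels.
inline Vec2 OffsetTexCoord(Vec2 texCoord, Vec2 offsetInTexels, Vec2 invTexSize) {
    Vec2 r = { texCoord.x + offsetInTexels.x * invTexSize.x,
               texCoord.y + offsetInTexels.y * invTexSize.y };
    return r;
}
```

For a 512x512 texture, a coordinate of 0.25 + 0.5/512 lies halfway between two texel columns, so the x weight comes out as 0.5.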
Shader management
 Find a balance between übershaders and specialized
 shader libraries
   Dynamic/static branching versus static compilation
 Small shader libraries can be built and stored inside a single
 Effect file
   One technique per shader configuration

 Larger shader libraries
   Hash table populated with configurations
   Streaming code can load shader groups on demand
   Profile-guided content generation
   Avoid compiling shaders at run time
 Compiled shaders compress very well
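The "hash table populated with configurations" idea might look like the sketch below, where each precompiled shader is keyed by a bitmask of the features it was built with. The feature bits, `CompiledShader` and `ShaderLibrary` are hypothetical and only illustrate the lookup. A real library would store compiled byte code rather than a name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical feature bits that together identify one shader configuration.
enum ShaderFeature : std::uint32_t {
    kSkinning  = 1u << 0,
    kNormalMap = 1u << 1,
    kFog       = 1u << 2
};

struct CompiledShader { std::string name; };   // stand-in for a byte code blob

class ShaderLibrary {
public:
    // Registers an offline-compiled shader for one configuration.
    void Add(std::uint32_t config, const std::string& name) {
        table_[config] = CompiledShader{name};
    }
    // Returns the shader for a configuration, or nullptr if it was never built.
    const CompiledShader* Find(std::uint32_t config) const {
        auto it = table_.find(config);
        return it == table_.end() ? nullptr : &it->second;
    }
    std::size_t Size() const { return table_.size(); }
private:
    std::unordered_map<std::uint32_t, CompiledShader> table_;
};
```

A miss in `Find` is the signal to stream in another shader group or fall back to a simpler configuration, rather than compiling at run time.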
Audio Considerations
 XACT (Microsoft Cross-Platform Audio Creation Tool)
   API and authoring tool parity:
      author once, deploy to both platforms
   Primary difference = wave compression
     ADPCM on Windows vs. Xbox 360 native XMA support
     XMA: controllable quality setting (varies, typically ~6-
     ADPCM: Static ~3.5:1 compression
   Likely need to trade memory for bit rate.
   On Windows, can use hard disk streaming to balance
   lower compression rates if needed
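The memory trade-off can be estimated with simple arithmetic. The sketch below computes the uncompressed PCM size of a clip and divides by a compression ratio; the ~3.5:1 ADPCM figure comes from the slide, while the clip parameters (48 kHz, stereo, 16-bit, 60 seconds) are assumptions chosen for the example. `PcmBytes` and `CompressedBytes` are hypothetical helpers.

```cpp
#include <cstdint>

// Uncompressed PCM size in bytes.
constexpr std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                                 std::uint32_t bytesPerSample, std::uint32_t seconds) {
    return static_cast<std::uint64_t>(sampleRate) * channels * bytesPerSample * seconds;
}

// Approximate size after compression at ratio:1 (for example 3.5 for ADPCM).
inline double CompressedBytes(std::uint64_t pcmBytes, double ratio) {
    return static_cast<double>(pcmBytes) / ratio;
}
```

One minute of 48 kHz 16-bit stereo is 11,520,000 bytes uncompressed, or roughly 3.3 MB at 3.5:1. For XMA, pass whatever ratio the chosen quality setting actually achieves on your content.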
Call To Action!
  Design your games, engines and production
  systems with cross platform development in mind
    (PC / Xbox 360 / Other)

  Invest in making your data build system fast
  Take advantage of each platform's strengths
    Target a D3D10 content design point and fall back to
    D3D9+, D3D9, …

  Provide feedback on how we can make production easier
  Attend the XACT, HLSL, SM4.0 and Performance
  Hands On Labs
