VIEWS: 5 PAGES: 55 POSTED ON: 7/27/2012
Numerical-Precision-Optimized Volume Rendering Ingmar Bitter Neophytos Neophytou Klaus Mueller Arie Kaufman Numerical-Precision-Optimized Volume Rendering Ingmar Bitter Neophytos Neophytou Klaus Mueller Arie Kaufman Outline • Numerical precision - a rendering resource Outline • Numerical precision - a rendering resource • Fixed-point arithmetic Outline • Numerical precision - a rendering resource • Fixed-point arithmetic • Reverse order precision analysis – Compositing, shading, gradients, classification, sampling/splatting, sample/splat location Outline • Numerical precision - a rendering resource • Fixed-point arithmetic • Reverse order precision analysis – Compositing, shading, gradients, classification, sampling/splatting, sample/splat location • Results Outline • Numerical precision - a rendering resource • Fixed-point arithmetic • Reverse order precision analysis – Compositing, shading, gradients, classification, sampling/splatting, sample/splat location • Results • Conclusions Numerical Precision: A Resource • Double precision computation for all – ideal? Numerical Precision: A Resource • Double precision computation for all – ideal? – slower then all other alternatives – not possible on graphics cards (at least for now) – expensive on custom chip implementations – and most importantly: not needed to create best possible images!! Numerical Precision: A Resource • Double precision computation for all – ideal? – slower then all other alternatives – not possible on graphics cards (at least for now) – expensive on custom chip implementations – and most importantly: not needed to create best possible images!! reasons: predominantly 8-bit displays (per channel) limited range intervals throughout Current Status • Stable volume rendering pipeline: both CPU and GPU [LL94, Lev88, MJC02, Wes90, EKE01, RSEB00] • Interpolation before classification, even for splatting [MMC99] • Caching optimized for volume rendering [Kni00, LCCK02, PSL98] • Precision-limited rendering systems: ATI, NVidia, VolumePro [PHK99], VizardII [MKW02], UltraVis [Kni00] • Completely fixed: final output image display bit precision – 8 bits per RGB color channel on CRTs and LCDs – 8 bits max in DVI standard – SGIs 12 bit color displays are nearly extinct – Radiologists’ requirements are not mass market, same analysis applies OpenGL Arithmetic: 12=1? • Representation [0, 255] a = b = 255 • Computation = a[0, 255] × b[0, 255] >> 8; = 254 wrong 1 mult, one shift • Alternatively: tmp = a[0, 255] × b[0, 255] + 128; result = (tmp+(tmp >> 8)) >> 8; = 255, correct [Bli95] 1 mult, 2 adds, 2 shifts OpenGL Arithmetic: 12=1? • Representation: fixed-point I.Fb – I.Fb = I integer bits, F fraction bits • 8 bits 1.7b fixed point number then a = b = 11.7b = 128 • Computation = a1.7b × b1.7b >> 7 = 128 correct 1 mult, one shift one fewer bit of resolution, but OK (we will see) Reverse Order Precision Analysis Ray Casting Splatting • Unified ray casting Sample Location Splat Location and splatting pipelines Sample Splat • Composite creates the final image Classify Gradient Shade Composite Reverse Order Precision Analysis Ray Casting Splatting • Unified ray casting Sample Location Splat Location and splatting pipelines Sample Splat • Composite creates the final image Classify • Precision Gradient requirements propagate Shade backwards Composite Compositing - Math • Pre-(alpha)-multiplied colors: – C = αC = αR, αG, αB • Alpha correction (r samples per unit): – Tcorrected = (1- α)r Compositing - Math • Pre-(alpha)-multiplied colors: – C = αC = αR, αG, αB T/CCompositingBuffer • Alpha correction: Tcorrected, Cfront – Tcorrected = (1- α)r T/CCompositingBuffer • With back-to-front compositing: – CCompositingBuffer ×= Tcorrected += Cfront – TCompositingBuffer ×= Tcorrected; αCompositingBuffer = 1-Tcorrected – perform multiplication N times per pixel correct solution needs N × F × r bits precision Compositing – Precision Theory • 8-bit destination resolution – therefore all partial results can be rounded – drop all bits not contributing to the 8 most significant bits (MSB) • Adding N = 2p samples – allows 8+p bits to influence the 8 MSB • Conversion from αCompositingBufferC to C for display (division) – allows 8+p more bits to influence the 8 MSB • Conversion from αcorrectedC to C for display – allows r times as many bits to influence the 8 MSB • Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α – 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling – 608 bits for 5122×2048 volumes and 16 samples per voxel Compositing – Precision Theory • 8-bit destination resolution – therefore all partial results can be rounded – drop all bits not contributing to the 8 most significant bits (MSB) • Adding N = 2p samples – allows 8+p bits to influence the 8 MSB • Conversion from αCompositingBufferC to C for display (division) – allows 8+p more bits to influence the 8 MSB • Conversion from αcorrectedC to C for display – allows r times as many bits to influence the 8 MSB • Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α – 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling – 608 bits for 5122×2048 volumes and 16 samples per voxel Compositing – Precision Theory • 8-bit destination resolution – therefore all partial results can be rounded – drop all bits not contributing to the 8 most significant bits (MSB) • Adding N = 2p samples – allows 8+p bits to influence the 8 MSB • Conversion from αCompositingBufferC to C for display (division) – allows 8+p more bits to influence the 8 MSB • Conversion from αcorrectedC to C for display – allows r times as many bits to influence the 8 MSB • Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α – 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling – 608 bits for 5122×2048 volumes and 16 samples per voxel Compositing – Precision Theory • 8-bit destination resolution – therefore all partial results can be rounded – drop all bits not contributing to the 8 most significant bits (MSB) • Adding N = 2p samples – allows 8+p bits to influence the 8 MSB • Conversion from αCompositingBufferC to C for display (division) – allows 8+p more bits to influence the 8 MSB • Conversion from αcorrectedC to C for display – allows r times as many bits to influence the 8 MSB • Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α – 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling – 608 bits for 5122×2048 volumes and 16 samples per voxel Compositing – Precision Theory • 8-bit destination resolution – therefore all partial results can be rounded – drop all bits not contributing to the 8 most significant bits (MSB) • Adding N = 2p samples – allows 8+p bits to influence the 8 MSB • Conversion from αCompositingBufferC to C for display (division) – allows 8+p more bits to influence the 8 MSB • Conversion from αcorrectedC to C for display – allows r times as many bits to influence the 8 MSB • Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α – 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling – 608 bits for 5122×2048 volumes and 16 samples per voxel Compositing – Precision Practice • No alpha correction (r = 1): 2 × (8+p) bits • Iso-surface rendering using “old fashioned” OpenGL: – store not αC but C in frame buffer: (8+p) – bright colors: 5+p – at most 8 non-zero samples per ray (p=3): 5+3=8 bits standard 24 bit RGBA frame buffer is adequate • Fog visualization – what matters is the ability to see objects though volumetric fog (substance with low opacity) – visual experiments show 15 fractional bits are sufficient Compositing – Precision Practice • No alpha correction (r = 1): 2 × (8+p) bits • Iso-surface rendering using “old fashioned” OpenGL: – store not αC but C in frame buffer: (8+p) – bright colors: 5+p – at most 8 non-zero samples per ray (p=3): 5+3=8 bits standard 24 bit RGBA frame buffer is adequate • Fog visualization – what matters is the ability to see objects though volumetric fog (substance with low opacity) – visual experiments show 15 fractional bits are sufficient Compositing – Precision Practice • No alpha correction (r = 1): 2 × (8+p) bits • Iso-surface rendering using “old fashioned” OpenGL: – store not αC but C in frame buffer: (8+p) – bright colors: 5+p – at most 8 non-zero samples per ray (p=3): 5+3=8 bits standard 24 bit RGBA frame buffer is adequate • Fog visualization – what matters is the ability to see objects though volumetric fog (substance with low opacity) – visual experiments show 15 fractional bits are sufficient Compositing – Precision Practice • No alpha correction (r = 1): 2 × (8+p) bits • Iso-surface rendering using “old fashioned” OpenGL: – store not αC but C in frame buffer: (8+p) – bright colors: 5+p – at most 8 non-zero samples per ray (p=3): 5+3=8 bits standard 24 bit RGBA frame buffer is adequate • Fog visualization – what matters is the ability to see objects though volumetric fog (substance with low opacity) – visual experiments show 15 fractional bits are sufficient Compositing – Precision Practice • No alpha correction (r = 1): 2 × (8+p) bits • Iso-surface rendering using “old fashioned” OpenGL: – store not αC but C in frame buffer: (8+p) – bright colors: 5+p – at most 8 non-zero samples per ray (p=3): 5+3=8 bits standard 24 bit RGBA frame buffer is adequate • Fog visualization – what matters is the ability to see objects though volumetric fog (substance with low opacity) – visual experiments show 15 fractional bits are sufficient Compositing – Precision Practice • No alpha correction (r = 1): 2 × (8+p) bits • Iso-surface rendering using “old fashioned” OpenGL: – store not αC but C in frame buffer: (8+p) – bright colors: 5+p – at most 8 non-zero samples per ray (p=3): 5+3=8 bits standard 24 bit RGBA frame buffer is adequate • Fog visualization – what matters is the ability to see objects though volumetric fog (substance with low opacity) – visual experiments show 15 fractional bits are sufficient Compositing – Conclusion Least-significant-bit-fog at various bit precisions 8 10 12 14 15 16 5123 dataset • Preferred bit-aware r=2 back-to-front compositing equations: • αC1.15b ×= T1.15bsample += C1.15bsample • T1.15b ×= T1.15bsample Shading - Math • PhongC color = kambient OobjectColor IlightIntensity + kdiffuse O Σi{ Ii (N•Li) } + kspecular Σi{ Ii (R•Li)r } • k є [0,1] kambient + kdiffuse + kspecular =1 • OobjectColor (8 bit) and IlightIntensity є [0,1] • N•Li and R•Li є [-1,1], but є [0,1] after clamping • PhongC color = є [0,1] (possibly clamping Σi) Shading - Analysis • PhongC color needs to be as precise as 1.15b • Use 16.16b for all multiplications [0,1)× [0,1] – sufficient precision and no overflow Shading – New Computation • Replace specular exponentiation with recursive multiplies – repeatedly multiply number with itself – works for all exponents r=2n – when r=26 (16 bit precision), then max error < 0.005% – better results than Knittel’s parabola approximation Shading – New Computation • Replace specular exponentiation with Knittel’s recursive multiplies parabola – repeatedly multiply number with itself – works for all exponents r=2n pow – when r=26 (16 bit precision), then max error < 0.005% – better results than r=2n Knittel’s parabola approximation Shading - Conclusion • Preferred bit-aware Phong shading equation: C16.16b = k16.16bambient O0.8bobjectColor I16.16blight + k16.16bdiffuseO0.8b Σi{ I16.16bi (N16.16b•L16.16bi) } + k16.16bspecular Σi{ I16.16bi (R16.16b•L16.16bi)2^n } Gradients - Math • Gx = 0.5 sample(x+1,y,z) - 0.5 sample(x-1,y,z) • Gy = 0.5 sample(x,y+1,z) - 0.5 sample(x,y-1,z) • Gy = 0.5 sample(x,y,z+1) - 0.5 sample(x,y,z-1) Gradients - Analysis • G = G1.Fb φ • Discrete nearest gradient vector neighbors – sin φ = 1/2F, sin φ ≈ φ → φ ≈ 1/2F • Maximum error for specular intensity, large r – r = 64, 164 != 1, but 164 = (1- 1/2F)64 – error of 22%, 6.1%, 1.6%, 0.4% for F of 8, 10, 12, 14 Gradients - Analysis • 5123-sized spheres with 4 6 8 Phong highlights • 4, 6, 8, 10, 12, 14 bit gradients 10 12 14 • Diffuse artifacts for 4 and 6 bits • Specular artifacts up to 10 bits 10 12 14 Gradients - Conclusion • Thus, 12 bits dynamic range is needed • Now consider normalization: – reduces I.Fb to 1.Fb – up to I bits will be added to the fractional part • Volume samples often have 12 bits • Gx,y,z with 12.12b minimum representation • Gx,y,z with 16.16b preferred representation – leaves room for interpolation bits in normalization Classification – Prelims and Recaps • Use of T instead of α is more efficient in compositing operation • Largest visual precision/quantization error occurs at high transparencies (low opacities) – need more bits for T than for C, just to be sure • Want transfer function lookup table to be cache- friendly – power-of-2 RGBA-tuple alignment • Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02] Classification – Prelims and Recaps • Use of T instead of α is more efficient in compositing operation • Largest visual precision/quantization error occurs at high transparencies (low opacities) – need more bits for T than for C, just to be sure • Want transfer function lookup table to be cache- friendly – power-of-2 RGBA-tuple alignment • Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02] Classification – Prelims and Recaps • Use of T instead of α is more efficient in compositing operation • Largest visual precision/quantization error occurs at high transparencies (low opacities) – need more bits for T than for C, just to be sure • Want transfer function lookup table to be cache- friendly – power-of-2 RGBA-tuple alignment • Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02] Classification – Prelims and Recaps • Use of T instead of α is more efficient in compositing operation • Largest visual precision/quantization error occurs at high transparencies (low opacities) – need more bits for T than for C, just to be sure • Want transfer function lookup table to be cache- friendly – power-of-2 RGBA-tuple alignment • Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02] Classification - Math • Desired lookup table entries: R1.8bG1.8bB1.8bT1.16b 5.5 bytes • Common lookup table entries: R0.8bG0.8bB0.8bα0.8b 4 bytes Classification - Math • Desired lookup table entries: R1.8bG1.8bB1.8bT1.16b 5.5 bytes • Common lookup table entries: R0.8bG0.8bB0.8bα0.8b 4 bytes • Better lookup table entries: R0.8bG0.8bB0.8bsqrt(α)0.8b spreads low α • Computed lookup after T = 1-(sqrt(α)2): R0.8bG0.8bB0.8bT1.16b squaring doubles precision Classification - Conclusion Foot with least-significant-thin-tissue-fog α0.8b sqrt(α)0.8b α0.16b • Preferred bit-aware lookup table entries: R0.8bG0.8bB0.8bsqrt(α)0.8b Sample Interpolation - Math • sample = voxel0 × (1-w) + voxel1 × w • sample = w × (voxel1 - voxel0) + voxel0 • Requirements: – Gx,y,z, derived from samples, need 12 bit dynamic range – samples need 12 bit values for transfer function lookup – cover both low and high dynamic range neighborhoods • Therefore, sample12.12b is a minimum requirement – integer part comes from voxels voxel12.0b – fractional part comes from interpolation w1.12b Sample Interpolation - Conclusion • Preferred bit-aware sample interpolation: sample12.12b = w1.12b × (voxel112.0b - voxel012.0b) + voxel012.0b • Splats start on voxels, need no interpolation: splat12.0b = voxel12.0b Sample Location - Math k • k-th sample location = startPos + Σk Vinc • Perspective rays need to differ enough to allow 1024 rays across 60 degrees, or 0.05◦ – sin φ = (k 1/2F) / k, sin φ ≈ φ → φ ≈ 1/2F φ – F = 6, 12, 16 → φ = 0.9◦, 0.05◦, 0.0009◦ • Also, need to address 2048 slices (integer positions) → 11bits • Thus, need overall 11.12b Sample Location - Conclusion • Preferred bit-aware sample location: – perspective projection: sampleLocation11.12b = startPos11.12b + Σ Vinc1.12b – parallel projection: sampleLocation11.6b (0.9◦ OK) Splat Scan Conversion - Math • Splats project onto image grid → reverse rays • Allow as many as 2048 splat rays across 60 degrees, or 0.025◦ • Hence, twice the ray casting precision – one extra fractional bit F=13 φ • Also address 2048 slices (11bits) • Thus, need overall 11.13b Splat Scan Conversion - Conclusion • Preferred bit-aware splat scan conversion: splatLocation11.13b = startVoxelPos11.13b + Σ Vinc1.13b • Splats are usually pre-transformed and stored in bucket lists (one per sheet-buffer) • Preferred voxel location sheet buffer format x11.13b u8.0b y11.13b v8.0b (64 bits total) – x, y: location on splat plane – u: index into pre-integrated splat table u – v: voxel value (x, y) y) Results • Summary of minimum precision requirements Rendering Stage Input Output Sample locations N/A 11.12b Sample interpolation 12.00b 12.12b Classification 12.00b 4× 0.8b Gradients 12.12b 1.12b Shading 1.12b 1.15b Compositing 1.15b 1.15b Results • Restricted iso-surface rendering: – texture map volume rendering can be done using plain OpenGL or Direct X and 8 bit frame buffers • General volume rendering, all pipeline stages: – 32 bit single precision floating point format – 16.16b fixed point format (up to 4x faster in our tests) Pentium allows 2 simple 32-bit integer ops per clock cycle Conclusions • 8 bits per RGB channel on final display • Analysis of requirements by back propagation • Sufficient precision computations using – either 32 bits single precision floating point format – or 16.16b fixed point format • Voxel location sheet buffer x11.13bu8.0by11.13bv8.0b • Transfer functions stored as R0.8bG0.8bB0.8bsqrt(α)0.8b • Compositing/fragment buffer R1.15bG1.15bB1.15bT1.15b Acknowledgements • Hewlett Packard Laboratories • ONR grant N000140110034 • NSF CAREER grant ACI-0093157 • DOE grant MO-068 • Thanks to Tom Malzbender and Michael Meissner for technical discussions. • Thanks to Ronald Summers for resources.