Author: Leo Watts
Ray Tracing for Games Dr. Jacco Bikker - IGAD/NHTV, Breda, July 4 - 15

Final Lecture

\[ I(x, x') = g(x, x') \left[ \epsilon(x, x') + \int_S \rho(x, x', x'') \, I(x', x'') \, dx'' \right] \]

Today’s Agenda:

• Introduction
• GPU Architecture
• CUDA Primer
• My First GPU Ray Tracer

Introduction

Supercomputing for the Masses*

A GPU offers substantially more compute power than a CPU:

TitanX: 6.6 TFLOPS
Radeon R9 Fury: 8.6 TFLOPS
Xeon D-1540 (8 core): 256 GFLOPS ($680; April 2016)

*: Supercomputing for the Masses, Rob Farber in DrDobbs, 2008: http://www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/207200659

GPUs also have substantially more bandwidth:

AMD R9 Fury: 512 GB/s
NVidia TitanX: 336.5 GB/s
Intel Xeon: 118 GB/s

Ray tracing requires compute power as well as bandwidth.
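To put the peak figures quoted in this introduction in proportion, the GPU-to-CPU ratios work out roughly as follows. This is plain arithmetic on the numbers from the slides, a rough comparison rather than a benchmark:

```latex
\begin{align*}
\frac{6.6\ \text{TFLOPS}}{256\ \text{GFLOPS}} &\approx 25.8 &
\frac{8.6\ \text{TFLOPS}}{256\ \text{GFLOPS}} &\approx 33.6 \\[4pt]
\frac{336.5\ \text{GB/s}}{118\ \text{GB/s}} &\approx 2.9 &
\frac{512\ \text{GB/s}}{118\ \text{GB/s}} &\approx 4.3
\end{align*}
```

The gap in compute is therefore about an order of magnitude larger than the gap in bandwidth.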

GPUs have a unique way of dealing with latencies:

• CPUs rely on caches to reduce average memory access time.
• GPUs rely on massive parallelism to hide latencies.

The CPU approach works well if memory access is somewhat coherent. The GPU approach works well if there is sufficient parallelism (memory access coherence is irrelevant).

And finally, GPUs are optimized for graphics. Although ray tracing is a GPGPU task, we still benefit:

• Texture filtering hardware
• Very fast sin/cos/tan and square root
• Efficient conversion between float and int
• Efficient interpolation and clamping
• Fast interop with OpenGL / DirectX

The GPU is a perfect match for ray tracing… …but it comes with some peculiarities.


GPU Architecture

[Diagram: a CPU with four cores, a cache and RAM, next to a GPU with its own RAM.]

CPU:
• Small number of cores
• Optimized for generic tasks
• Hyperthreading: two threads per core

GPU:
• Small number of cores (‘multiprocessors’)
• Optimized for parallel tasks
• Many threads per core, grouped in warps

Warp: 32 threads, running the same instructions in lock step (SIMT).

• In case of delays, the stalled warp is swapped for another warp.
• A single multiprocessor manages several warps (up to 64).
• Switching between warps is governed by the hardware.
• Each warp (active or not) has its own registers; switching is ‘instant’.
• Each multiprocessor of a modern GPU can execute 4 warps simultaneously (while the other 60 wait).
• A modern GPU can have up to 24 multiprocessors.

Maximum number of resident threads: 24 multiprocessors × 64 warps × 32 threads = 49152.
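The thread count above is the product of three hardware limits. A minimal C++ sketch of that arithmetic, using the figures from the slide as defaults; the struct and function names are hypothetical and exist only for this illustration:

```cpp
#include <cassert>

// Hypothetical description of a GPU; the defaults are the figures quoted on the slide.
struct GpuLimits {
    int multiprocessors = 24;        // multiprocessors per GPU
    int warpsPerMultiprocessor = 64; // warps managed by one multiprocessor
    int threadsPerWarp = 32;         // threads running in lock step
};

// Threads that one multiprocessor keeps resident (executing or waiting).
inline int threadsPerMultiprocessor(const GpuLimits& gpu) {
    assert(gpu.warpsPerMultiprocessor > 0 && gpu.threadsPerWarp > 0);
    return gpu.warpsPerMultiprocessor * gpu.threadsPerWarp;
}

// Threads that the whole GPU keeps resident.
inline int residentThreads(const GpuLimits& gpu) {
    assert(gpu.multiprocessors > 0);
    return gpu.multiprocessors * threadsPerMultiprocessor(gpu);
}
```

With the default values, `threadsPerMultiprocessor` gives 2048 and `residentThreads` gives 49152, so the GPU needs tens of thousands of independent tasks before it is fully occupied.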

Feeding the Beast

How do we feed such a processor sufficient work? We feed it many identical tasks. For a ray tracer:

• One thread per pixel.
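One way to picture the thread-per-pixel mapping is to emulate a 2D launch on the CPU. The sketch below is an illustration in C++, not code from the lecture: all names are hypothetical, the square block size is an assumption, and `blockX`/`threadX` stand in for what CUDA exposes as `blockIdx.x` and `threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pixel { int x, y; };

// Blocks needed to cover `size` pixels with `blockDim` threads per block (rounds up).
inline int blocksFor(int size, int blockDim) {
    assert(size > 0 && blockDim > 0);
    return (size + blockDim - 1) / blockDim;
}

// The pixel handled by one thread of one block.
inline Pixel pixelForThread(int blockX, int blockY, int threadX, int threadY, int blockDim) {
    return { blockX * blockDim + threadX, blockY * blockDim + threadY };
}

// Visits every thread of the emulated launch and counts how often each pixel is touched.
// On a GPU the four loops disappear: every (block, thread) pair runs in parallel.
inline std::vector<int> coverage(int width, int height, int blockDim) {
    std::vector<int> hits(static_cast<std::size_t>(width) * height, 0);
    for (int by = 0; by < blocksFor(height, blockDim); ++by)
        for (int bx = 0; bx < blocksFor(width, blockDim); ++bx)
            for (int ty = 0; ty < blockDim; ++ty)
                for (int tx = 0; tx < blockDim; ++tx) {
                    Pixel p = pixelForThread(bx, by, tx, ty, blockDim);
                    if (p.x >= width || p.y >= height) continue; // thread falls outside the image
                    ++hits[static_cast<std::size_t>(p.y) * width + p.x]; // a kernel would trace a ray here
                }
    return hits;
}
```

Because the number of blocks is rounded up, some threads at the right and bottom edges have no pixel; the bounds check lets them exit early, so every pixel is handled exactly once.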

GPU Memory Model

NVidia Maxwell architecture*:

Memory             Size                        Access time
Registers          65536 per multiprocessor    1 cycle
Shared memory      96KB per multiprocessor     28 cycles
L1/texture cache   24KB per multiprocessor     1 cycle
L2 cache           2MB                         194 cycles
Global memory      ~2GB                        350 cycles

Host memory: ~8GB, reached over PCIe 3.0 (bandwidth: ~15GB/s).

*: http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf
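These latencies tie back to the idea of hiding delays by swapping warps. The C++ sketch below estimates how many warps a multiprocessor needs so that a memory wait is fully covered. The model is a deliberate simplification and an assumption of this example: every warp does a fixed number of cycles of independent work between memory accesses, and only one warp issues at a time. The function names are hypothetical.

```cpp
#include <cassert>

// Limit quoted earlier in the lecture: up to 64 warps per multiprocessor.
const int kMaxWarpsPerMultiprocessor = 64;

// Warps needed to cover `latencyCycles` of waiting when each of the other warps
// contributes `workCycles` of work before it stalls itself.
// The result includes the stalled warp. Simplified model, not a hardware rule.
inline int warpsToHideLatency(int latencyCycles, int workCycles) {
    assert(latencyCycles >= 0 && workCycles > 0);
    int otherWarps = (latencyCycles + workCycles - 1) / workCycles; // ceiling division
    return otherWarps + 1;
}

// True if that many warps can be resident on one multiprocessor.
inline bool fitsOnMultiprocessor(int warps) {
    return warps <= kMaxWarpsPerMultiprocessor;
}
```

Under this model, a 350-cycle global memory access with 10 cycles of work per warp needs 36 warps, which fits within the limit of 64. With only 5 cycles of work per warp it needs 71, which does not, so part of the latency would stay exposed.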

Consequences of the memory model:

1. Caches are either very small (L1) or slow (L2).
2. This is compensated by ‘shared memory’, which we have to manage manually.
3. The memory hierarchy is (at least partially) explicit rather than implicit as on the CPU.
4. We have to trade registers per thread for number of threads: at 2048 threads per multiprocessor, each thread can use only 32 registers (a single float4 is four registers). Beyond this count, ‘register spilling’ occurs.

• It’s probably better to feed the GPU small programs.
• We have to be really careful when spending memory.
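The register trade-off in point 4 is a simple division. A minimal C++ sketch, using the figures from the slides (65536 registers and at most 2048 threads per multiprocessor, 32 threads per warp); the names are hypothetical, and the sketch ignores the allocation granularity of real hardware:

```cpp
#include <cassert>

const int kRegistersPerMultiprocessor = 65536;
const int kMaxThreadsPerMultiprocessor = 2048;
const int kThreadsPerWarp = 32;

// Registers available to each thread when `threads` threads are resident.
inline int registersPerThread(int threads) {
    assert(threads > 0);
    return kRegistersPerMultiprocessor / threads;
}

// Threads that can be resident when each thread needs `registers` registers,
// rounded down to whole warps and capped at the per-multiprocessor maximum.
inline int residentThreadsFor(int registers) {
    assert(registers > 0);
    int threads = kRegistersPerMultiprocessor / registers;
    threads -= threads % kThreadsPerWarp; // only whole warps can be scheduled
    return threads < kMaxThreadsPerMultiprocessor ? threads : kMaxThreadsPerMultiprocessor;
}
```

A kernel that needs 64 registers per thread halves the number of resident threads to 1024, which leaves fewer warps available to hide memory latency. This is one reason small programs tend to suit the GPU better.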


CUDA Primer


TOTAL RECAP

Lecture 1a: Game development; Game architecture; The Template; Tick; Realtime; Actors; World state; Scene graph; Data ownership; Killing a scene graph

Lecture 1b: Rasterization: limitations; Millions of LOC; The attraction of ray tracing; RT state of the art; RT ingredients; Intersections; Basic RT algorithm; Building a ray tracer in a day

Lecture 2: The Art of Optimization; Bottlenecks & scalability; Measure!; The God algorithm; High level optimization; Low level optimization; Data centric & caching; Data locality; Thread level parallelism; Instruction level parallelism

Lecture 3: RT: Optimizing high level; Acceleration: grids; Acceleration: nested grids; BVH; BVH data layout; BVH traversal; Packet traversal; Binned BVH building; Ray / box intersection

Lecture 4: SIMD; SSE; _mm_rsqrt_ps; AoS & SoA; Vectorization; Masking: _mm_cmplt_ps; SIMD in ray tracing

Lecture 5: Top-level BVH; Geometry classification; Static, rigid, deforming; Refitting; Agglomerative clustering; Ray transform; Efficient traversal; Ray coherence; Ray packets for shadows; More optimizations…; Multithreading

Lecture 6: Whitted-style; Ray optics; Physical basis; Snell, Fresnel; Rendering equation; Monte-Carlo integration; Distributed ray tracing; Motion blur, depth of field

Lecture 7: Sampling; Stratification; Explicit light paths; Importance; Cosine-weighted; Probability Density Function; Resampled Importance; Light array

Lecture 8: GPU Architecture; GPGPU; CUDA; My First GPU Ray Tracer; Total Recap
