GPU Hybrid Computation Platform for Visual Effects

Crom - CPU/GPU Hybrid Computation Platform for Visual Effects

Nathan Cournia, Casey Vanover, Bill Spitzak, Hans Rijpkema, Josh Tomlinson, Bradley Smith, Nathan Litke

Rhythm and Hues Studios

Who We Are

Motivation

● Modernize lighting/compositing workflows
● Unify user experience
    ● Workflow evolved across four proprietary packages
● Streamline pipeline

[Figure: workflow stages and their packages: Look Development (Lighthouse), Render (Wren), Light Placement (Voodoo), Scene Lighting (Lighthouse), LightCmp (Icy)]

Requirements

● Rethink software that was designed up to 25 years ago
    ● Multiple cores, multiple GPUs, international locations, cloud
● Decouple interface from computation engines
● Seamless integration with other software:
    ● Pipelines: R+H, Shotgun, etc.
    ● Renderers: R+H, Mantra, etc.
● User extensible:
    ● C++
    ● Python (new nodes, Qt interfaces)
    ● Interface builder / Visual Programming
    ● Easily share networks / interfaces

Main Idea

● Crom is a VFX platform

VFX Platform

● Look Development
● Scene Lighting
● Compositing
● Misc. Tools

General Design

● Core data structure is a dependency graph
● Data passed between dependency graph nodes is strongly typed
● Dependency graph is stateless
● Can hook up anything to anything else

Stateless Nodes

● Multiple threads can traverse the graph in parallel
● "Global" state is passed up the dependency graph in a "Context / Request" object
    ● Multiple frames, tiles, layers, etc. can be concurrently computed
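The stateless-node idea can be illustrated with a minimal Python sketch. The class names (Context, Constant, FrameNumber, Add) and the evaluate() method are hypothetical and are not Crom's API; the sketch only assumes that nodes hold parameters but no per-evaluation state, and that everything request-specific travels in an immutable context object handed to upstream nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Per-request state (hypothetical fields); immutable so threads can share it."""
    frame: int
    tile: int = 0


class Constant:
    """Leaf node: holds only its parameter, never per-evaluation state."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, ctx):
        return self.value


class FrameNumber:
    """Leaf node whose result depends only on the context passed up to it."""
    def evaluate(self, ctx):
        return float(ctx.frame)


class Add:
    """Interior node: pulls on its inputs, forwarding the same context upstream."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, ctx):
        return self.a.evaluate(ctx) + self.b.evaluate(ctx)


graph = Add(FrameNumber(), Constant(0.5))

# evaluate() never mutates a node, so several frames can be in flight at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda f: graph.evaluate(Context(frame=f)), range(1, 5)))

print(results)
```

Because the graph itself is never written to during evaluation, no locking is needed around the traversal; the same pattern would apply to tiles or layers by adding fields to the context.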

Data

● Data passed between nodes is stored in a "property graph"
● Data representation is decoupled from programming interface
    ● An interface, i.e. Adapter/Wrapper, can be placed onto a property graph to define an object
    ● A property graph can be adapted to provide multiple interfaces
● Copy-on-write semantics allow for sharing of data
● Heuristics to place subsets of data into a persistent cache
● Property graph is dynamically user extensible yet strongly typed
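A rough Python sketch of how adapters and copy-on-write could sit on top of a property graph follows. The PropertyGraph, Image and Metadata classes are hypothetical stand-ins and not Crom's actual types; the sketch assumes stored values are immutable, models "strongly typed" as "a key may not change type once set", and leaves out the persistent cache.

```python
class PropertyGraph:
    """Typed key/value store; copies share storage until one of them is written."""

    def __init__(self, _shared=None):
        self._data = _shared if _shared is not None else {}
        self._owns = _shared is None

    def copy(self):
        # O(1): both graphs now reference the same dict and neither owns it.
        self._owns = False
        return PropertyGraph(self._data)

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        old = self._data.get(key)
        if old is not None and type(old) is not type(value):
            raise TypeError(f"{key!r} holds {type(old).__name__}, got {type(value).__name__}")
        if not self._owns:
            self._data = dict(self._data)  # copy on first write
            self._owns = True
        self._data[key] = value


class Image:
    """Adapter that gives a property graph an image-like interface."""
    def __init__(self, props):
        self._props = props

    @property
    def size(self):
        return (self._props.get("width"), self._props.get("height"))


class Metadata:
    """A second adapter over the same data, exposing a different interface."""
    def __init__(self, props):
        self._props = props

    def describe(self):
        p = self._props
        return f"{p.get('name')} ({p.get('width')}x{p.get('height')})"


base = PropertyGraph()
base.set("name", "plate")
base.set("width", 1920)
base.set("height", 1080)

edited = base.copy()       # shares storage with base
edited.set("width", 960)   # triggers the copy; base is untouched

print(Image(base).size, Image(edited).size)
print(Metadata(edited).describe())
```

The adapters carry no data of their own, so any number of them can be placed on the same property graph, and a downstream node that edits one property pays for a copy only at the moment it writes.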

VFX Compositor

● Compositor: assembles multiple images into one or more final images
● Example: Nuke

GPU/CPU Compositor

● Crom implements a hybrid GPU/CPU compositor
● Dependency graph traversal produces two main items in the property graph:
    ● Instruction Tree: Low-level operations to be performed
    ● Data Callbacks: Objects that will be invoked to populate the compositing engine with data from the dependency graph

Example cmp Node Graph

Example Instruction Tree

Callbacks

● ReadImage1 Callback
● ReadImage2 Callback
● RGB1 Callback

Instruction Tree (cont.)

● Generic representation of low-level operations that need to be done.
● When working interactively, converted to GLSL.
● When working on the render farm, converted to OpenCL.

Instruction Tree (GLSL)

    uniform sampler2D ReadImage1;
    uniform sampler2D ReadImage2;
    uniform vec4 RGB1;
    varying vec2 v0000;

    void main(void)
    {
        vec4 t0001 = texture2D(ReadImage1, v0000);
        vec4 t0002 = t0001 + (texture2D(ReadImage2, v0000) * (1 - clamp(t0001.w, 0, 1)));
        gl_FragColor = (vec4(t0002.xyz, clamp(t0002.w, 0, 1)) * RGB1);
    }
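To make the tree-to-shader step concrete, here is a small Python sketch that walks a toy instruction tree and emits a GLSL expression similar in shape to the shader on this slide. The node classes (Sample, Uniform, Over, Multiply) and the emit() function are hypothetical and are not Crom's actual instruction set. The sketch is also simplified: it repeats the foreground sample instead of introducing temporaries such as t0001, and it omits the final alpha clamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Read an input image at the current coordinate."""
    image: str


@dataclass(frozen=True)
class Uniform:
    """A constant that a data callback would supply at run time."""
    name: str


@dataclass(frozen=True)
class Over:
    """Foreground over background, assuming premultiplied alpha."""
    fg: object
    bg: object


@dataclass(frozen=True)
class Multiply:
    a: object
    b: object


def emit(node, coord="v0000"):
    """Return a GLSL expression string for an instruction (sub)tree."""
    if isinstance(node, Sample):
        return f"texture2D({node.image}, {coord})"
    if isinstance(node, Uniform):
        return node.name
    if isinstance(node, Multiply):
        return f"({emit(node.a, coord)} * {emit(node.b, coord)})"
    if isinstance(node, Over):
        fg = emit(node.fg, coord)
        bg = emit(node.bg, coord)
        return f"({fg} + {bg} * (1.0 - clamp({fg}.w, 0.0, 1.0)))"
    raise TypeError(f"unknown instruction: {node!r}")


tree = Multiply(Over(Sample("ReadImage1"), Sample("ReadImage2")), Uniform("RGB1"))
shader_expr = emit(tree)
print("gl_FragColor = " + shader_expr + ";")
```

Keeping the tree generic and pushing all target-specific syntax into the emitter is what would let a second emitter produce OpenCL from the same tree.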

Per-Pixel Expressions

● Instruction tree nodes can be created not only from the dependency graph but also from Crom's expression language
● Allows for fast per-pixel expressions!

    sample(ReadImage1.output, vec2(sin(pos.x), pos.y + cos(pos.x)))

Lazy Programmers

● cmp node library only has around 50 nodes
    ● Define low-level operations (cmp.Add, cmp.Translate, cmp.Crop, cmp.Text)
● Most nodes are user defined via "macro" nodes!

Macro Node (cmp.Gamma)


Macro Nodes

● Benefit of macro nodes is that they produce an instruction tree without the user writing any C++ / Python
● Macro nodes can be just as fast as built-in nodes
● Custom interfaces can be created that are indistinguishable from built-in interfaces via the interface builder or Python
● Macro nodes usually contain other macro nodes
    ● Production scripts contain well over 250k nodes
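One way to picture why a macro node can be as fast as a built-in node is that it expands into primitive instructions before anything is compiled. The Python sketch below is illustrative only: the primitive names (Pow, Divide, Constant, Input), the tuple encoding, and the Gamma definition are all assumptions, since the slides do not show how cmp.Gamma is actually built.

```python
# Hypothetical primitive instruction names; the slides do not list Crom's real ones.
PRIMITIVES = {"Pow", "Divide", "Constant", "Input"}

MACROS = {
    # Assumed definition: Gamma(image, g) = Pow(image, Divide(1, g))
    "Gamma": lambda image, g: ("Pow", image, ("Divide", ("Constant", 1.0), g)),
    # Macros may be built from other macros.
    "Gamma22": lambda image: ("Gamma", image, ("Constant", 2.2)),
}


def expand(node):
    """Recursively rewrite macro nodes until only primitive instructions remain."""
    if not isinstance(node, tuple):
        return node  # literal argument such as a float or an image name
    op, *args = node
    if op in MACROS:
        return expand(MACROS[op](*args))
    if op not in PRIMITIVES:
        raise KeyError(f"unknown node type: {op}")
    return (op, *[expand(a) for a in args])


tree = expand(("Gamma22", ("Input", "ReadImage1")))
print(tree)
```

After expansion the tree contains no trace of the macros, so a code generator sees the same primitives it would see for a hand-written network.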

GPU Saturation

● Dependency graph traversal produces hundreds of GPU API calls
● When scrubbing controls, commands build up in the GPU
● Easy to saturate GPU with tens of thousands of commands with a simple gesture
● GUI quickly becomes unresponsive as GPU tries to process given commands
● A cornerstone of the Crom platform is that sub-tasks can be interrupted/canceled
    ● Allows for fast feedback
● GPU APIs do not support canceling commands

Dispatch Queue

● Crom uses a global GPU dispatch queue
● All compute communication with the GPU happens on a single context/thread pair
● Compute threads locally queue commands
● Locally queued commands are enqueued to global queue in logical batches

Dispatch Queue Observations

● Global queue throttles commands to ensure GPU driver's command buffer is not too deep
● Commands in global dispatch queue can be interrupted
● Easy to support "native kernels" in OpenGL backend
● GPU throughput not optimal. Overall system is more responsive
● Tricky to handle errors in dispatch queue
● Must be careful not to interrupt object creation/population commands that are needed for later commands
● Single context/thread pair helps avoid nasty driver bugs
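The dispatch-queue behavior described on the last two slides can be approximated in a few lines of Python. This is a sketch under stated assumptions, not Crom's implementation: Batch and DispatchQueue are hypothetical names, plain Python callables stand in for GPU commands, and the throttle is modeled as a bound on the global queue, whereas a real backend would limit how many commands are outstanding in the driver.

```python
import queue
import threading


class Batch:
    """A logical group of commands queued locally by one compute thread."""

    def __init__(self):
        self.commands = []                 # (callable, interruptible) pairs
        self.canceled = threading.Event()

    def add(self, fn, interruptible=True):
        self.commands.append((fn, interruptible))

    def cancel(self):
        self.canceled.set()


class DispatchQueue:
    """Single consumer thread standing in for the one context/thread pair."""

    def __init__(self, max_pending=64):
        # A bounded queue makes submit() block when full, which throttles producers.
        self._batches = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, batch):
        self._batches.put(batch)

    def close(self):
        self._batches.put(None)            # sentinel: stop after draining
        self._thread.join()

    def _run(self):
        while True:
            batch = self._batches.get()
            if batch is None:
                return
            for fn, interruptible in batch.commands:
                # Skip canceled work, but never skip setup that later commands need.
                if interruptible and batch.canceled.is_set():
                    continue
                fn()


log = []
dispatch = DispatchQueue()

stale = Batch()
stale.add(lambda: log.append("create texture"), interruptible=False)
stale.add(lambda: log.append("draw stale"))
stale.cancel()                             # e.g. the user kept scrubbing

fresh = Batch()
fresh.add(lambda: log.append("draw fresh"))

dispatch.submit(stale)
dispatch.submit(fresh)
dispatch.close()
print(log)
```

Cancellation happens before commands reach the GPU API, which is the only point where it is possible given that the APIs themselves cannot cancel submitted work. The interruptible flag reflects the caution about object creation and population commands.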

GPU Limitations

● In practice the GPU has several limitations:
    ● Memory
    ● Uniforms
    ● Varyings
    ● Image Units
    ● Instructions

Instruction Tree Splitting

● The instruction tree tells us:
    ● Memory requirements
    ● Uniform requirements
    ● Varying requirements
    ● Number of input images
    ● Estimate of instructions needed
● We break up the instruction tree into smaller sub-trees that "fit" on the GPU
● Use multiple shader/kernel invocations to composite image
● Sub-tree output can be cached
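A simplified Python sketch of the splitting idea follows. It considers only one of the limits listed above (the number of input images per pass) and uses a greedy rule that is an assumption for illustration, not necessarily the heuristic Crom uses. Op, Input and split() are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)


@dataclass
class Input:
    image: str


def split(node, max_images, passes):
    """Return (node, images) where the rewritten node reads at most max_images inputs.

    Sub-trees that had to be cut out are appended to `passes` in execution order.
    """
    if isinstance(node, Input):
        return node, {node.image}

    results = [split(child, max_images, passes) for child in node.children]
    children = [r[0] for r in results]
    needs = [r[1] for r in results]

    # While the combined inputs do not fit, render the hungriest child on its own.
    while len(set().union(*needs)) > max_images:
        i = max(range(len(needs)), key=lambda k: len(needs[k]))
        if len(needs[i]) <= 1:
            raise ValueError("cannot fit: too many direct inputs for one pass")
        name = f"pass{len(passes)}"
        passes.append((name, children[i]))
        children[i] = Input(name)          # later passes read this pass's output
        needs[i] = {name}

    return Op(node.name, children), set().union(*needs)


tree = Op("Add", [
    Op("Over", [Input("A"), Input("B")]),
    Op("Over", [Input("C"), Input("D")]),
])
passes = []
root, images = split(tree, max_images=2, passes=passes)
passes.append(("final", root))

for name, sub in passes:
    print(name, sub)
```

Each entry in passes corresponds to one shader or kernel invocation, and the intermediate images named pass0, pass1 and so on are natural candidates for caching. The other limits (memory, uniforms, varyings, instruction count) could be folded into the same loop by replacing the image-count test with a more general cost check.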

Questions?

Nathan Cournia [email protected]
