Voxel- and mesh-based Monte Carlo methods for 3D photon transport simulations Qianqian Fang, Ph.D. Martinos Center for Biomedical Imaging Massachusetts General Hospital Harvard Medical School Virtual Photonics, 09/10/2010
Software website: http://mcx.sf.net
Outline
Introduction
Pros and Cons of Monte Carlo simulations
MCX: a GPU-accelerated MC software
Algorithm flowchart
Boundary reflections
Atomic vs. non-atomic operations
MMC: a mesh-based MC software
Ray-tracing in the Plücker coordinates
Multi-threading with OpenMP
Validations
Website and resources (http://mcx.sf.net)
Virtual Photonics, 09/10/2010
Modeling photon transport in complex media
Transport equation → Finite element, Finite volume, Monte Carlo
Diffusion → Finite element, finite difference …
Software tools:
Finite element: NIRFAST (Dartmouth), TOAST (UCL), Redbird (MGH) Monte Carlo:
Virtual Photonics, 09/10/2010
MCML: Wang & Jacques (1995), layered media
tMCimg: Boas (2002), 3D complex media
CUDAMCML: Alerstam (2009), layered media+GPU
Pros of MC
Simplicity
General media (low-scattering, high absorption, g)
Discontinuities
Flexible source settings
Ease in time-domain simulations
Highly parallelizable
Virtual Photonics, 09/10/2010
Cons of MC
(Very) slow in speed
Difficulties for modeling complex geometries Low speed
Massively Parallel Computing with GPU
Modeling Complex shapes
Mesh-based Monte Carlo
Virtual Photonics, 09/10/2010
Monte Carlo eXtreme MCX
MMC
Multi-core processors and GPU
Multi-core is becoming the mainstream Graphics hardware have many cores and optimized for processing data-parallel tasks Pixel shaders
Vertex shaders
Virtual Photonics, 09/10/2010
How is a GPU different from a CPU?
Pros:
Many-core processor
High speed internal memory
Parallel pipeline
Highly optimized
Cons:
No recursion,no function pointers, no host functions GPU-CPU memory transfer is slow Portability across hardware
Virtual Photonics, 09/10/2010
NVIDIA CUDA Programming Guide
Project overview Based on visitor history at http://mcx.sf.net 2009
v0.2b
v0.2
v0.5b MMC v0.2 2010 In the past year ~580 total downloads ~90 registered users 2840 unique visits from 65 regions 4 major releases MCX: 200 SVN commits MMC: 110 SVN commits MCX-CL: 35 SVN commits
Virtual Photonics, 09/10/2010
Photon transport in a voxelated space Incident Position and vector
Scattering Angle: θ Accumu. interval
Scattering Length: L θ = RNG(media[i,j].g) Henyey-Greenstein L = RNG(media[i,j].μs) Exp. distribution Virtual Photonics, 09/10/2010
φ = RNG(0~2Pi) θ = RNG(media[i,j].g) L = RNG(media[i,j].μs)
GPU-based MC Algorithm Diagram
Virtual Photonics, 09/10/2010
Recording time-resolved solutions
Accumulate photon packets to the corresponding time window Time window 1
Time window 2
Time window 3
...... Virtual Photonics, 09/10/2010
Random number generators
Mersenne Twister: MT19937 (shared memory) Logistic map and lattices: a coupled chaotic system, floating-point only RNG When r=4, x is a random-variable with PDF:
Coupled Lattice: Virtual Photonics, 09/10/2010
Wagner, 1995 Eric Mills, parallel MT19937 random number generator
Handling of Boundary Reflection
Case 1
Case 2
Detecting the exit point: 1. one intersection: find C1 from Pn and done 2. two intersections: find C1, if not, find C2 from Pn+1, done 3. three intersections: if not C1 and C2, orientation of C3 is uniquely decided
Virtual Photonics, 09/10/2010
Atomic vs. non-atomic operations
Non-atomic operations:
global val;
newval=val+x;
val=newvalue;
Race conditions: when multiple threads read/write a single address, value may become non-consistent. How bad is the race condition?
Virtual Photonics, 09/10/2010
Atomic operations:
atomicAdd(val,x);
read/add/write in one instruction Block other threads to avoid racing
Validation and comparisons
Semi-infinite medium
Virtual Photonics, 09/10/2010
Modeling complex geometry
Collins adult brain atlas
Segmented with FreeSurfer
Virtual Photonics, 09/10/2010
Speed benchmark
Non-atomic: more threads, better speed-up Atomic: peaks around 512~1024 threads, a lot slower
semi-infinite Virtual Photonics, 09/10/2010
Collins atlas
Recent development MCX v0.5
Vanilla MCX
Detective MCX
Cached MCX
Atomic MCX
Saving partial-path-length info at detector sites Using shared-memory cache and atomics to improve accuracy near the source
Native matlab mex files
Automatic thread/block configuration (experim.)
Internal boundary reflection (experimental)
Virtual Photonics, 09/10/2010
Matlab MCX - mcxlab
Sample code:
cfg.nphoton=1e7; cfg.vol=uint8(ones(60,60,60)); cfg.srcpos=[30 30 1]; cfg.srcdir=[0 0 1]; cfg.gpuid=1; cfg.autopilot=1; cfg.prop=[0 0 1 1;0.005 1 1 0]; cfg.tstart=0; cfg.tend=5e-9; cfg.tstep=5e-10; cfg.session='testmcx'; cfgs(1)=cfg; cfgs(2)=cfg; cfgs(2).session='testmcx2'; mcxlab(cfgs);
Virtual Photonics, 09/10/2010
MCX Studio – A GUI front-end
Virtual Photonics, 09/10/2010
MCX for heterogeneous platforms
Heterogeneous computing platform
OpenCL
MCX-CL Now supports ATI and nVidia GPUs, as well as Multi-core CPUs. Device 1
MCX-CL
Virtual Photonics, 09/10/2010
OpenMP multi-threading
Device 2 Device N
Gathering /saving
Plücker-coordinate-based ray-tracing
A 3D ray R can be represented by R(d,m)
Benefit: simplicity in a ray-triangle intersection test: w 1= d⃗R⋅m⃗e1 + d⃗e1⋅m⃗R w 2= d⃗R⋅m⃗e2 + d⃗e2⋅m⃗R w 3= d⃗R⋅m⃗e3 + d⃗e3⋅m⃗R
m=p0x p1
R enters ABC ←if ∀ w i ≥0∧∃w i≠0 Rexits ABC ←if ∀ w i ≤0∧∃w i ≠0 R is coplanar with ABC ←if ∀ wi =0 otherwise , R misses ABC
d=p0- p1 p0
Virtual Photonics, 09/10/2010
p1
Barycentric coordinates u i=wi / ∑ wi
Ray-tracing in a mesh
1. Starting from the element (e0) enclosing the current point p0, determine scattering vector R (p0 → p1)
2. compute the exit point px in e0
3.a if ||R||>||p0px||, set p0 ← px, deposit energy-loss to e0, ||R||←||R||-||p0px||, set e0 to the face-neighbor element, repeat from 2
3.b if ||R||