Voxel- and mesh-based Monte Carlo methods for 3D photon transport simulations

Voxel- and mesh-based Monte Carlo methods for 3D photon transport simulations Qianqian Fang, Ph.D. Martinos Center for Biomedical Imaging Massachusett...
Author: Warren Wells
0 downloads 2 Views 2MB Size
Voxel- and mesh-based Monte Carlo methods for 3D photon transport simulations Qianqian Fang, Ph.D. Martinos Center for Biomedical Imaging Massachusetts General Hospital Harvard Medical School Virtual Photonics, 09/10/2010

Software website: http://mcx.sf.net

Outline 

Introduction 







Pros and Cons of Monte Carlo simulations

MCX: a GPU-accelerated MC software 

Algorithm flowchart



Boundary reflections



Atomic vs. non-atomic operations

MMC: a mesh-based MC software 

Ray-tracing in the Plücker coordinates



Multi-threading with OpenMP



Validations

Website and resources (http:​//mcx.sf.net)

Virtual Photonics, 09/10/2010

Modeling photon transport in complex media 

Transport equation → Finite element, Finite volume, Monte Carlo



Diffusion → Finite element, finite difference …



Software tools: 



Finite element: NIRFAST (Dartmouth), TOAST (UCL), Redbird (MGH) Monte Carlo:

Virtual Photonics, 09/10/2010



MCML: Wang & Jacques (1995), layered media



tMCimg: Boas (2002), 3D complex media



CUDAMCML: Alerstam (2009), layered media+GPU

Pros of MC 

Simplicity



General media (low-scattering, high absorption, g)



Discontinuities



Flexible source settings



Ease in time-domain simulations



Highly parallelizable

Virtual Photonics, 09/10/2010

Cons of MC 

(Very) slow in speed



Difficulties for modeling complex geometries Low speed

Massively Parallel Computing with GPU

Modeling Complex shapes

Mesh-based Monte Carlo

Virtual Photonics, 09/10/2010

Monte Carlo eXtreme MCX

MMC

Multi-core processors and GPU 



Multi-core is becoming the mainstream Graphics hardware have many cores and optimized for processing data-parallel tasks Pixel shaders

Vertex shaders

Virtual Photonics, 09/10/2010

How is a GPU different from a CPU? 



Pros: 

Many-core processor



High speed internal memory



Parallel pipeline



Highly optimized

Cons: 





No recursion,no function pointers, no host functions GPU-CPU memory transfer is slow Portability across hardware

Virtual Photonics, 09/10/2010

NVIDIA CUDA Programming Guide

Project overview Based on visitor history at http://mcx.sf.net 2009

v0.2b

v0.2

v0.5b MMC v0.2 2010 In the past year ~580 total downloads ~90 registered users 2840 unique visits from 65 regions 4 major releases MCX: 200 SVN commits MMC: 110 SVN commits MCX-CL: 35 SVN commits

Virtual Photonics, 09/10/2010

Photon transport in a voxelated space Incident Position and vector

Scattering Angle: θ Accumu. interval

Scattering Length: L θ = RNG(media[i,j].g) Henyey-Greenstein L = RNG(media[i,j].μs) Exp. distribution Virtual Photonics, 09/10/2010

φ = RNG(0~2Pi) θ = RNG(media[i,j].g) L = RNG(media[i,j].μs)

GPU-based MC Algorithm Diagram

Virtual Photonics, 09/10/2010

Recording time-resolved solutions 

Accumulate photon packets to the corresponding time window Time window 1

Time window 2

Time window 3

...... Virtual Photonics, 09/10/2010

Random number generators 



Mersenne Twister: MT19937 (shared memory) Logistic map and lattices: a coupled chaotic system, floating-point only RNG When r=4, x is a random-variable with PDF:

Coupled Lattice: Virtual Photonics, 09/10/2010

Wagner, 1995 Eric Mills, parallel MT19937 random number generator

Handling of Boundary Reflection 

Case 1



Case 2

Detecting the exit point: 1. one intersection: find C1 from Pn and done 2. two intersections: find C1, if not, find C2 from Pn+1, done 3. three intersections: if not C1 and C2, orientation of C3 is uniquely decided

Virtual Photonics, 09/10/2010

Atomic vs. non-atomic operations 





Non-atomic operations: 

global val;



newval=val+x;



val=newvalue;

Race conditions: when multiple threads read/write a single address, value may become non-consistent. How bad is the race condition?

Virtual Photonics, 09/10/2010



Atomic operations: 





atomicAdd(val,x);

read/add/write in one instruction Block other threads to avoid racing

Validation and comparisons 

Semi-infinite medium

Virtual Photonics, 09/10/2010

Modeling complex geometry 

Collins adult brain atlas



Segmented with FreeSurfer

Virtual Photonics, 09/10/2010

Speed benchmark 



Non-atomic: more threads, better speed-up Atomic: peaks around 512~1024 threads, a lot slower

semi-infinite Virtual Photonics, 09/10/2010

Collins atlas

Recent development MCX v0.5

Vanilla MCX 



Detective MCX

Cached MCX

Atomic MCX

Saving partial-path-length info at detector sites Using shared-memory cache and atomics to improve accuracy near the source



Native matlab mex files



Automatic thread/block configuration (experim.)



Internal boundary reflection (experimental)

Virtual Photonics, 09/10/2010

Matlab MCX - mcxlab 

Sample code:

cfg.nphoton=1e7; cfg.vol=uint8(ones(60,60,60)); cfg.srcpos=[30 30 1]; cfg.srcdir=[0 0 1]; cfg.gpuid=1; cfg.autopilot=1; cfg.prop=[0 0 1 1;0.005 1 1 0]; cfg.tstart=0; cfg.tend=5e-9; cfg.tstep=5e-10; cfg.session='testmcx'; cfgs(1)=cfg; cfgs(2)=cfg; cfgs(2).session='testmcx2'; mcxlab(cfgs);

Virtual Photonics, 09/10/2010

MCX Studio – A GUI front-end

Virtual Photonics, 09/10/2010

MCX for heterogeneous platforms 

Heterogeneous computing platform

OpenCL

MCX-CL Now supports ATI and nVidia GPUs, as well as Multi-core CPUs. Device 1

MCX-CL

Virtual Photonics, 09/10/2010

OpenMP multi-threading

Device 2 Device N

Gathering /saving

Plücker-coordinate-based ray-tracing 

A 3D ray R can be represented by R(d,m)



Benefit: simplicity in a ray-triangle intersection test: w 1= d⃗R⋅m⃗e1 + d⃗e1⋅m⃗R w 2= d⃗R⋅m⃗e2 + d⃗e2⋅m⃗R w 3= d⃗R⋅m⃗e3 + d⃗e3⋅m⃗R

m=p0x p1

R enters ABC ←if ∀ w i ≥0∧∃w i≠0 Rexits ABC ←if ∀ w i ≤0∧∃w i ≠0 R is coplanar with ABC ←if ∀ wi =0 otherwise , R misses ABC

d=p0- p1 p0

Virtual Photonics, 09/10/2010

p1

Barycentric coordinates u i=wi / ∑ wi

Ray-tracing in a mesh 

1. Starting from the element (e0) enclosing the current point p0, determine scattering vector R (p0 → p1)



2. compute the exit point px in e0



3.a if ||R||>||p0px||, set p0 ← px, deposit energy-loss to e0, ||R||←||R||-||p0px||, set e0 to the face-neighbor element, repeat from 2



3.b if ||R||

Suggest Documents