SU(3) gluodynamics on GPU V.Demchik
SU(3) gluodynamics on Graphics Processing Units
Vadim Demchik
[email protected]
Dniepropetrovsk National University, Dniepropetrovsk, Ukraine
May 5, 2011
Outline
1 Motivation
2 Lattice formulation
    Kernel of MC procedure
    Lattice QCD software packages
3 GPU implementation
    Hardware
    Program model
    Pseudo-random number generators
    GPU cluster
4 hgpu.org
5 Summary
Physical background
The cost of simulating one configuration is [Polikarpov]:
$$ 4 \cdot 10^{-6} \left(\frac{m_\pi}{m_\rho}\right)^{-6} \left(L[\mathrm{fm}]\right)^{5} \left(a[\mathrm{GeV}]\right)^{-7}\ \mathrm{Teraflops} \times \mathrm{year} \qquad (1) $$
a[GeV] - lattice spacing
L[fm] - lattice size
mπ/mρ - defines the light quark mass (for light quarks mπ → 0 according to chiral perturbation theory)
Typical values now: a ≈ 0.1 fm, L ≈ 2-4 fm, mq ≈ 100 MeV.
QCD on a lattice requires very powerful computing resources.
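The scaling in Eq. (1) can be made concrete with a short sketch. The function name `simulation_cost` and the numerical inputs below are hypothetical and chosen only to illustrate the power laws; the arguments are assumed to be given in the units written in Eq. (1), and no unit conversion is attempted.

```python
def simulation_cost(m_ratio, L, a):
    """Cost estimate of Eq. (1) in Teraflops x year.

    m_ratio -- m_pi / m_rho (dimensionless)
    L       -- lattice size, in the units used in Eq. (1)
    a       -- lattice spacing, in the units used in Eq. (1)
    """
    return 4e-6 * m_ratio ** -6 * L ** 5 * a ** -7


# Illustrative (hypothetical) reference point.
base = simulation_cost(m_ratio=0.6, L=2.0, a=0.5)

# Halving the lattice spacing multiplies the cost by 2**7.
finer = simulation_cost(m_ratio=0.6, L=2.0, a=0.25)

# Doubling the lattice size multiplies the cost by 2**5.
larger = simulation_cost(m_ratio=0.6, L=4.0, a=0.5)

# Halving m_pi/m_rho (lighter quarks) multiplies the cost by 2**6.
lighter = simulation_cost(m_ratio=0.3, L=2.0, a=0.5)

print(finer / base, larger / base, lighter / base)
```

The steep dependence on a and on mπ/mρ is what makes approaching the continuum and chiral limits so expensive.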
Computational resources
Tianhe-1A - NUDT TH MPP, X5670 2.93 GHz 6C, NVidia GPU, FT-1000 8C (China), 186368 cores, Rcore = 25 GFlops, Rpeak = 4.70 PFlops
Jaguar - Cray XT5-HE, Opteron 6-core 2.6 GHz (USA), 224162 cores, Rcore = 10 GFlops, Rpeak = 2.33 PFlops
Nebulae - Dawning TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU (China), 120640 cores, Rcore = 25 GFlops, Rpeak = 2.98 PFlops

Rank  Country  Computer                                Year  Cores   Rmax (GFlops)  Rpeak (GFlops)
1     China    NUDT TH MPP, NVIDIA GPU                 2010  186368  2566000        4701000
2     USA      Cray XT5-HE                             2009  224162  1759000        2331000
3     China    Dawning TC3600 Blade, Tesla C2050 GPU   2010  120640  1271000        2984300
4     Japan    HP ProLiant SL390s, Nvidia GPU          2010  73278   1192000        2287630
5     USA      Cray XE6                                2010  153408  1054000        1288630
17    Russia   T-Platforms T-Blade2                    2009  35360   350100         414419
22    Germany  Supermicro Cluster, ATI Radeon GPU      2010  15120   285200         469728
499   France   xSeries x3650M2 Cluster                 2010  5392    31124.4        57500.3
500   UK       Cluster Platform 3000 BL460c G1         2009  5856    31112.2        58560

Σ Rpeak = 64655.31 TFlops
USA - 274 computers (48.7%), China - 41 (18.0%), Japan - 26 (7.1%), France - 26 (5.6%), Germany - 26 (5.4%), UK - 25 (3.5%), Russia - 11 (1.8%)
http://www.top500.org/ (Nov. 2010)
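The per-core figures quoted for the three leading systems appear to be the peak performance divided by the number of cores. A minimal sketch of that arithmetic follows, assuming this interpretation of Rcore; the helper name `peak_per_core` is hypothetical, and the core counts and Rpeak values (in GFlops) are the Top500 figures quoted on this slide.

```python
# (name, cores, Rpeak in GFlops), as listed for the top three systems.
systems = [
    ("Tianhe-1A", 186368, 4701000),
    ("Jaguar", 224162, 2331000),
    ("Nebulae", 120640, 2984300),
]


def peak_per_core(cores, rpeak_gflops):
    """Peak performance per core in GFlops (assumed meaning of Rcore)."""
    return rpeak_gflops / cores


rcore = {name: peak_per_core(cores, rpeak) for name, cores, rpeak in systems}

for name, value in rcore.items():
    # Rounded to the nearest GFlops for comparison with the quoted Rcore.
    print(f"{name}: Rcore = {value:.1f} GFlops (rounded: {round(value)})")
```

Rounded to the nearest GFlops, these ratios agree with the quoted values of 25, 10 and 25 GFlops.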
Alternative computational resource
AMD Phenom II X6 1055T @ 2.8 GHz, 12 GB, 2 TB RAID-1, ATI Radeon HD5870 + HD6970, Rpeak = 5.4 TFlops
Why GPU?
[Figure: GPU architecture (SIMD vs. scalar operations)]
Bibliography
Books on lattice gauge theory:
C. Gattringer, C.B. Lang, Quantum Chromodynamics on the Lattice, Lect. Notes Phys. 788 (2010) 343p.
T. DeGrand, C. DeTar, Lattice Methods for Quantum Chromodynamics, World Scientific (2006) 345p.
H. Rothe, Lattice Gauge Theories: An Introduction, World Scientific (2005) 3rd ed., 590p.
I. Montvay, G. Münster, Quantum Fields on a Lattice, Cambridge University Press (1996) 491p.
Lattice formulation
We used a hypercubic lattice $L_t \times L_s^3$ with hypertorus geometry.
The standard Wilson action of SU(3) lattice gauge theory (LGT) is used:
$$ S_W = \beta \sum_x \sum_{\mu<\nu} \left(1 - \frac{1}{3}\,\mathrm{Tr}\,U_{\mu\nu}(x)\right) $$
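As an illustration of the hypertorus geometry and the Wilson action, here is a minimal pure-Python sketch. It is not the GPU implementation discussed in the talk. It makes several assumptions that the slide does not state: the plaquette is taken as U_µν(x) = U_µ(x) U_ν(x+µ) U_µ†(x+ν) U_ν†(x), the real part of the trace is used (the usual convention), and all function names and the dictionary layout of the links are hypothetical.

```python
import cmath
from itertools import product

N = 3  # SU(3): 3x3 complex link matrices


def identity():
    return [[1 + 0j if i == j else 0j for j in range(N)] for i in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(N)] for i in range(N)]


def trace(a):
    return sum(a[i][i] for i in range(N))


def sites(dims):
    return product(*(range(d) for d in dims))


def shift(x, mu, dims):
    """Neighbour of site x one step along mu; periodic wrap gives the hypertorus."""
    y = list(x)
    y[mu] = (y[mu] + 1) % dims[mu]
    return tuple(y)


def cold_start(dims):
    """All links set to the unit matrix, stored as {(site, direction): matrix}."""
    return {(x, mu): identity() for x in sites(dims) for mu in range(len(dims))}


def plaquette(U, x, mu, nu, dims):
    # Assumed definition: U_mu(x) U_nu(x+mu) U_mu(x+nu)^dagger U_nu(x)^dagger
    p = matmul(U[(x, mu)], U[(shift(x, mu, dims), nu)])
    p = matmul(p, dagger(U[(shift(x, nu, dims), mu)]))
    return matmul(p, dagger(U[(x, nu)]))


def wilson_action(U, beta, dims):
    """beta * sum over sites and mu<nu of (1 - Re Tr U_mu_nu(x) / 3)."""
    nd = len(dims)
    total = 0.0
    for x in sites(dims):
        for mu in range(nd):
            for nu in range(mu + 1, nd):
                total += 1.0 - trace(plaquette(U, x, mu, nu, dims)).real / N
    return beta * total


dims = (2, 2, 2, 2)  # L_t x L_s^3 with L_t = L_s = 2, for illustration only
beta = 6.0           # illustrative coupling value
U = cold_start(dims)
s_cold = wilson_action(U, beta, dims)

# Multiply one link by the centre element exp(2*pi*i/3) * 1.
z = cmath.exp(2j * cmath.pi / 3)
U[((0, 0, 0, 0), 0)] = [[z * e for e in row] for row in identity()]
s_twisted = wilson_action(U, beta, dims)
print(s_cold, s_twisted)
```

The cold configuration has zero action. Changing a single link affects only the six plaquettes that contain it, which is the locality that a site-by-site Monte Carlo update relies on.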