SU(3) gluodynamics on Graphics Processing Units

SU(3) gluodynamics on GPU V.Demchik

Motivation Lattice formulation MC kernel Packages GPU implementation Hardware Program model PRNG GPU cluster

Vadim Demchik
[email protected]
Dniepropetrovsk National University, Dniepropetrovsk, Ukraine

May 5, 2011

Outline

1. Motivation
2. Lattice formulation
   Kernel of MC procedure
   Lattice QCD software packages
3. GPU implementation
   Hardware
   Program model
   Pseudo-random number generators
   GPU cluster
4. hgpu.org
5. Summary

Physical background

The cost of simulating one configuration is [Polikarpov]:

\[
4 \cdot 10^{-6} \left( \frac{m_\pi}{m_\rho} \right)^{-6} (L[\mathrm{fm}])^{5} \, (a[\mathrm{GeV}])^{-7} \ \text{Teraflops} \times \text{year} \qquad (1)
\]

a[GeV] - lattice spacing
L[fm] - lattice size
mπ/mρ - defines the light quark mass (for light quarks mπ → 0 according to chiral perturbation theory)

Typical values now: a ≈ 0.1 fm, L ≈ 2-4 fm, mq ≈ 100 MeV.

QCD on a lattice requires very powerful computing resources.
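To get a feel for the magnitudes in Eq. (1), the estimate can be evaluated numerically. The sketch below rests on two assumptions that are not stated on the slide: "a[GeV]" is read as the lattice spacing in natural units (GeV^-1), obtained from femtometres via ħc ≈ 0.19733 GeV·fm, and the ratio mπ/mρ = 0.6 is an illustrative value only. The function name `simulation_cost` is hypothetical.

```python
# Conversion constant hbar*c in GeV*fm (used to express a in GeV^-1).
HBARC_GEV_FM = 0.1973269804


def simulation_cost(mpi_over_mrho, L_fm, a_fm):
    """Evaluate Eq. (1): cost of one configuration in Teraflops x year.

    Assumption: a[GeV] means the lattice spacing in GeV^-1,
    so (a[GeV])^-7 equals (1/a)^7 with 1/a in GeV.
    """
    a_inv_gev = HBARC_GEV_FM / a_fm  # inverse lattice spacing in GeV
    return 4e-6 * mpi_over_mrho ** -6 * L_fm ** 5 * a_inv_gev ** 7


# Illustrative inputs: a = 0.1 fm, L = 3 fm, m_pi/m_rho = 0.6 (assumed value).
cost = simulation_cost(0.6, 3.0, 0.1)
print(f"cost = {cost:.2f} Teraflops x year")
```

With these inputs the estimate comes out at a few Teraflops × year per configuration. The steep powers matter most: doubling L multiplies the cost by 2^5 = 32, and halving a multiplies it by 2^7 = 128.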

Computational resources

Tianhe-1A - NUDT TH MPP, X5670 2.93GHz 6C, NVidia GPU, FT-1000 8C (China), 186368 cores, Rcore = 25GFlops, Rpeak = 4.70PFlops

Jaguar - Cray XT5-HE, Opteron 6-core 2.6 GHz (USA), 224162 cores, Rcore = 10GFlops, Rpeak = 2.33PFlops

Nebulae - Dawning TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU (China), 120640 cores, Rcore = 25GFlops, Rpeak = 2.98PFlops

Rank  Country  Computer                               Year  Cores   Rmax [GFlops]  Rpeak [GFlops]
1     China    NUDT TH MPP, NVIDIA GPU                2010  186368  2566000        4701000
2     USA      Cray XT5-HE                            2009  224162  1759000        2331000
3     China    Dawning TC3600 Blade, Tesla C2050 GPU  2010  120640  1271000        2984300
4     Japan    HP ProLiant SL390s, Nvidia GPU         2010  73278   1192000        2287630
5     USA      Cray XE6                               2010  153408  1054000        1288630
17    Russia   T-Platforms T-Blade2                   2009  35360   350100         414419
22    Germany  Supermicro Cluster, ATI Radeon GPU     2010  15120   285200         469728
499   France   xSeries x3650M2 Cluster                2010  5392    31124.4        57500.3
500   UK       Cluster Platform 3000 BL460c G1        2009  5856    31112.2        58560

Σ Rpeak = 64655.31 TFlops

USA - 274 computers (48.7%), China - 41 (18.0%), Japan - 26 (7.1%), France - 26 (5.6%), Germany - 26 (5.4%), UK - 25 (3.5%), Russia - 11 (1.8%)

http://www.top500.org/ (Nov.2010)
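The per-core figures quoted for the three leading machines appear consistent with simply dividing Rpeak by the number of cores. The short check below assumes that definition of Rcore (the slide does not state it) and that the tabulated Rpeak values are in GFlops; the helper name `rcore_gflops` is hypothetical.

```python
def rcore_gflops(rpeak_gflops, cores):
    """Peak performance per core, assuming Rcore = Rpeak / cores."""
    return rpeak_gflops / cores


# (Rpeak in GFlops, number of cores), taken from the TOP500 table above.
machines = {
    "Tianhe-1A": (4701000, 186368),
    "Jaguar": (2331000, 224162),
    "Nebulae": (2984300, 120640),
}

for name, (rpeak, cores) in machines.items():
    print(f"{name}: Rcore = {rcore_gflops(rpeak, cores):.1f} GFlops")
```

Under this assumption the results round to the quoted 25, 10, and 25 GFlops per core.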

Alternative computational resource

AMD Phenom II X6 1055T @ 2.8GHz, 12GB, 2TB RAID-1, ATI Radeon HD5870, HD6970, Rpeak = 5.4TFlops

Why GPU?

[Figure: GPU architecture (SIMD vs. scalar operations)]

Bibliography

Books on lattice gauge theory:

C. Gattringer, C.B. Lang, Quantum Chromodynamics on the Lattice, Lect. Notes Phys. 788 (2010) 343p.
T. DeGrand, C. DeTar, Lattice Methods for Quantum Chromodynamics, World Scientific (2006) 345p.
H. Rothe, Lattice gauge theories: an introduction, World Scientific (2005) 3rd ed., 590p.
I. Montvay, G. Münster, Quantum fields on a Lattice, Cambridge University Press (1996) 491p.

Lattice formulation

We used a hypercubic lattice $L_t \times L_s^3$ with hypertorus geometry.

The standard Wilson action of SU(3) LGT is used:

\[
S_W = \beta \sum_{x} \sum_{\mu<\nu} \left[ 1 - \frac{1}{3} \mathrm{Tr}\, U_{\mu\nu}(x) \right]
\]
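As an illustration of how the Wilson action is assembled from link variables on a hypertorus, here is a minimal pure-Python sketch. It is not the GPU implementation discussed in the talk. Its assumptions: the plaquette is the usual ordered product of four links, the sum runs over planes with µ < ν, the real part of the trace is taken (the common convention that keeps the action real), and periodic boundaries are imposed with a modulo shift. All function names are hypothetical.

```python
import itertools


def identity3():
    """3x3 unit matrix (the 'cold start' value of every link)."""
    return [[1 + 0j if i == j else 0j for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def dagger(a):
    """Hermitian conjugate of a 3x3 matrix."""
    return [[a[j][i].conjugate() for j in range(3)] for i in range(3)]


def shift(site, mu, dims):
    """Neighbouring site in direction mu with periodic (hypertorus) wrap."""
    s = list(site)
    s[mu] = (s[mu] + 1) % dims[mu]
    return tuple(s)


def cold_start(dims):
    """All links set to the unit matrix; links[site][mu] is U_mu(site)."""
    sites = itertools.product(*(range(n) for n in dims))
    return {site: [identity3() for _ in range(4)] for site in sites}


def plaquette(links, site, mu, nu, dims):
    """U_mu(x) U_nu(x+mu) U_mu(x+nu)^dagger U_nu(x)^dagger."""
    u1 = links[site][mu]
    u2 = links[shift(site, mu, dims)][nu]
    u3 = dagger(links[shift(site, nu, dims)][mu])
    u4 = dagger(links[site][nu])
    return matmul(matmul(u1, u2), matmul(u3, u4))


def wilson_action(links, beta, dims):
    """beta * sum over sites and planes mu < nu of (1 - Re Tr U_munu / 3)."""
    total = 0.0
    for site in links:
        for mu in range(4):
            for nu in range(mu + 1, 4):
                p = plaquette(links, site, mu, nu, dims)
                trace = p[0][0] + p[1][1] + p[2][2]
                total += 1.0 - trace.real / 3.0
    return beta * total


dims = (2, 2, 2, 2)          # (L_t, L_s, L_s, L_s), tiny lattice for illustration
links = cold_start(dims)
print(wilson_action(links, 6.0, dims))
```

For the cold start every plaquette is the unit matrix, so the action vanishes. Changing a single link affects exactly the six plaquettes that contain it, which is what makes local Monte Carlo updates cheap.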
