Introduction
The GPU: CUDA Architecture
Real-World Example
Tricks (?)
End
CUDA - A Very Short Intro
Manuel Werlberger
Institute for Computer Graphics and Vision
Graz University of Technology
Freiburg, July 22, 2011
Graz University of Technology
Manuel (ICG, TU-Graz)
CUDA
22.7.2011
1 / 47
Why GPUs?
Resources / Credits
• ‘Best’ introduction: CUDA, Supercomputing for the Masses [Dr. Dobb’s Journal]
• [GP-GPU course @ETHZ]
• NVIDIA Developer Zone [http://developer.nvidia.com]
• The NVIDIA CUDA Toolkit includes several PDFs (Programming Guide, Reference Guide, Best Practices Guide, . . . )
• NVIDIA Guides [http://developer.nvidia.com/nvidia-gpu-computing-documentation]
• Books
  • CUDA by Example: An Introduction to General-Purpose GPU Programming (Sanders et al.)
  • Programming Massively Parallel Processors: A Hands-On Approach (Kirk et al.) [course slides]
• Webinars
History (with NVIDIA subtitles)
• 2007: CUDA 1.0 (Researchers)
• 2008: CUDA 2.0 (Scientists and HPC applications)
• 2009: CUDA 3.0 (Applications)
• 2011: CUDA 4.0 (‘For the masses’)
Be aware . . .
NOT EVERYTHING YOU CAN DO WITH A GPU IS A GOOD IDEA!
Outline
1. Introduction
2. The GPU: CUDA Architecture
   • GPU Architecture
   • Memory Architecture
   • Program Structure
3. Real-World Example
4. Tricks (?)
The reason behind the discrepancy in floating-point capability between the CPU and the GPU is that the GPU is specialized for compute-intensive, highly parallel computation, as schematically illustrated in the original figure (not reproduced here).
Different GPUs (CUDA-enabled GeForce GPUs with compute capability, number of multiprocessors, and number of CUDA cores):
GPU | Compute Capability | Multiprocessors | CUDA Cores
GeForce GTX 560 Ti | 2.1 | 8 | 384
GeForce GTX 460 | 2.1 | 7 | 336
GeForce GTX 470M | 2.1 | 6 | 288
GeForce GTS 450, GTX 460M | 2.1 | 4 | 192
GeForce GT 445M | 2.1 | 3 | 144
GeForce GT 435M, GT 425M, GT 420M | 2.1 | 2 | 96
GeForce GT 415M | 2.1 | 1 | 48
GeForce GTX 580 | 2.0 | 16 | 512
GeForce GTX 570, GTX 480 | 2.0 | 15 | 480
GeForce GTX 470 | 2.0 | 14 | 448
GeForce GTX 465, GTX 480M | 2.0 | 11 | 352
GeForce GTX 295 | 1.3 | 2x30 | 2x240
GeForce GTX 285, GTX 280, GTX 275 | 1.3 | 30 | 240
GeForce GTX 260 | 1.3 | 24 | 192
GeForce 9800 GX2 | 1.1 | 2x16 | 2x128
GeForce GTS 250, GTS 150, 9800 GTX, 9800 GTX+, 8800 GTS 512, GTX 285M, GTX 280M | 1.1 | 16 | 128
GeForce 8800 Ultra, 8800 GTX | 1.0 | 16 | 128
GeForce 9800 GT, 8800 GT | 1.1 | 14 | 112
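The table can be related to what the CUDA runtime reports for an installed card. The sketch below is an illustration under stated assumptions, not part of the original slides: it uses the runtime calls cudaGetDeviceCount and cudaGetDeviceProperties (fields name, major, minor, multiProcessorCount), while the helper coresPerMultiprocessor is hypothetical and hard-codes the cores per multiprocessor for compute capability 1.x (8), 2.0 (32) and 2.1 (48). Those three values reproduce the core counts in the table, e.g. 8 MPs x 48 = 384 for the GTX 560 Ti.

```cuda
// Sketch (not from the original slides): list installed GPUs and estimate
// their CUDA core count. Cores per multiprocessor are hard-coded for
// compute capability 1.x and 2.x only (an assumption of this sketch).
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA cores per multiprocessor for compute capability
// major.minor; returns -1 for architectures not covered by the table above.
int coresPerMultiprocessor(int major, int minor)
{
  if (major == 1) return 8;                       // 1.0 to 1.3
  if (major == 2) return (minor == 0) ? 32 : 48;  // 2.0: 32, 2.1: 48
  return -1;
}

// Prints one line per device and returns the number of devices found
// (0 if no CUDA driver or device is available).
int printDevices()
{
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

  for (int dev = 0; dev < count; ++dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

    const int perMP = coresPerMultiprocessor(prop.major, prop.minor);
    std::printf("Device %d: %s, compute capability %d.%d, %d MPs",
                dev, prop.name, prop.major, prop.minor,
                prop.multiProcessorCount);
    if (perMP > 0)
      std::printf(", %d CUDA cores\n", perMP * prop.multiProcessorCount);
    else
      std::printf(", cores per MP not covered by this sketch\n");
  }
  return count;
}
```

Calling printDevices() from main is enough to compare a local card against the table; newer architectures simply report that their cores per MP are not covered.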
Memory?
[Figure: CUDA memory hierarchy. Each thread has per-thread local memory; each thread block has per-block shared memory; all blocks of all grids (Grid 0, Grid 1) access global memory.]
Local Memory (registers): only accessible from thread level, i.e. private to each thread. Registers are on-chip; variables that spill into local memory actually reside in off-chip device memory.
Shared Memory: on-chip memory of a multiprocessor (MP), shared among the threads of one thread block. Any thread of that block can read and write it.
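A minimal sketch of how these memory spaces appear in kernel code, assuming a block size of 256 threads (a power of two): per-thread values such as tid are typically kept in registers, the __shared__ array is per-block shared memory, and in/blockSums point into global memory. The names blockSum and gpuSum are hypothetical, and the final addition of the per-block partial sums is done on the host purely for simplicity.

```cuda
// Sketch: per-block sum that touches registers, shared and global memory.
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // threads per block; must be a power of two here

// Each block reduces BLOCK_SIZE consecutive elements of `in` and writes one
// partial sum per block to `blockSums`. Both pointers refer to global memory.
__global__ void blockSum(const float* in, float* blockSums, int n)
{
  __shared__ float partial[BLOCK_SIZE];   // per-block shared memory
  const int tid = threadIdx.x;            // per-thread value, typically in a register
  const int i = blockIdx.x * blockDim.x + tid;

  partial[tid] = (i < n) ? in[i] : 0.0f;  // global -> shared (pad with zeros)
  __syncthreads();                        // wait until the whole block has loaded

  // Tree reduction in shared memory: halve the active threads each step.
  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
    if (tid < stride) partial[tid] += partial[tid + stride];
    __syncthreads();
  }
  if (tid == 0) blockSums[blockIdx.x] = partial[0];  // shared -> global
}

// Hypothetical host wrapper: sums `h` on the GPU and adds the per-block
// partial sums on the CPU. Returns false if any CUDA call fails.
bool gpuSum(const std::vector<float>& h, float* result)
{
  *result = 0.0f;
  const int n = static_cast<int>(h.size());
  if (n == 0) return true;

  const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
  std::vector<float> sums(blocks);
  float* dIn = nullptr;
  float* dSums = nullptr;

  bool ok = cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess &&
            cudaMalloc(&dSums, blocks * sizeof(float)) == cudaSuccess &&
            cudaMemcpy(dIn, h.data(), n * sizeof(float),
                       cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    blockSum<<<blocks, BLOCK_SIZE>>>(dIn, dSums, n);
    ok = cudaGetLastError() == cudaSuccess &&
         cudaMemcpy(sums.data(), dSums, blocks * sizeof(float),
                    cudaMemcpyDeviceToHost) == cudaSuccess;  // also waits for the kernel
  }
  cudaFree(dIn);    // freeing a null pointer is a no-op
  cudaFree(dSums);
  if (!ok) return false;

  for (float s : sums) *result += s;
  return true;
}
```

With 1000 input values this launches 4 blocks; the last block has 232 valid elements and pads the remaining 24 shared-memory slots with zeros. Threads of different blocks never see each other's partial array, which is exactly why the per-block results go back to global memory.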
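Global Memory (Device Memory): off-chip DRAM (the SDRAM chips on the graphics board). Any thread of any block can read/write any location in device memory; the host accesses it through the CUDA runtime (e.g. cudaMemcpy).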
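To illustrate that any thread can access any location in global memory, the hypothetical kernel reverseKernel below lets thread i read element n-1-i, which is usually loaded by a thread of a different block. The host side uses the standard runtime calls cudaMalloc, cudaMemcpy and cudaFree; the block size of 128 threads and the wrapper name gpuReverse are assumptions of this sketch.

```cuda
// Sketch: reverse an array through global memory.
#include <vector>
#include <cuda_runtime.h>

// Thread i reads from the opposite end of the array: global memory is
// visible to all threads of all blocks.
__global__ void reverseKernel(const int* in, int* out, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[n - 1 - i];
}

// Hypothetical host wrapper: copies `h` to device memory, reverses it on
// the GPU and copies the result back into `out`. Returns false on error.
bool gpuReverse(const std::vector<int>& h, std::vector<int>& out)
{
  const int n = static_cast<int>(h.size());
  out.assign(n, 0);
  if (n == 0) return true;

  const size_t bytes = n * sizeof(int);
  int* dIn = nullptr;
  int* dOut = nullptr;
  bool ok = cudaMalloc(&dIn, bytes) == cudaSuccess &&
            cudaMalloc(&dOut, bytes) == cudaSuccess &&
            cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    const int threads = 128;                          // assumed block size
    const int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    reverseKernel<<<blocks, threads>>>(dIn, dOut, n);
    ok = cudaGetLastError() == cudaSuccess &&
         cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  }
  cudaFree(dIn);   // no-op for null pointers
  cudaFree(dOut);
  return ok;
}
```

Data lives in global memory between cudaMemcpy calls, so several kernels could operate on dIn/dOut in turn without copying back to the host in between.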