Raja Koduri
And had more too! 16+ years of (sugar) high!
In every GPU generation:
More performance and performance-per-watt
More programmability, precision and features
While maintaining compatibility with 8+ generations of APIs
Basic Equations
Chip Power = Static Power + Dynamic Power
System Power = CPU + GPU + Other
Static power is the leakage of inactive transistors:
$P_{\text{static}} = N \cdot V \cdot e^{-V_t}$
Dynamic power comes from actively switching transistors:
$P_{\text{dynamic}} = A \cdot N \cdot C \cdot F \cdot V^2$
N - Number of transistors, V - Voltage, Vt - Threshold voltage, A - Activity factor, F - Frequency, C - Capacitance per transistor
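The two equations above can be evaluated directly. This is a minimal sketch in arbitrary normalized units, using the slide's simplified leakage term $e^{-V_t}$ as written; the function names are hypothetical. It makes the quadratic dependence of dynamic power on V easy to check.

```python
import math

def static_power(n, v, vt):
    """Leakage of inactive transistors, slide's simplified model: N * V * e^(-Vt)."""
    return n * v * math.exp(-vt)

def dynamic_power(a, n, c, f, v):
    """Switching power of active transistors: A * N * C * F * V^2."""
    return a * n * c * f * v ** 2

def chip_power(n, v, vt, a, c, f):
    """Chip power = static power + dynamic power (normalized units)."""
    return static_power(n, v, vt) + dynamic_power(a, n, c, f, v)

# Doubling V quadruples dynamic power; raising Vt lowers leakage.
print(dynamic_power(1, 1, 1, 1, 2))                  # 4
print(static_power(1, 1, 1) < static_power(1, 1, 0))  # True
```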
Desktops
TDP (Thermal Design Power): the maximum amount of power that the thermal system can sustain
CPUs: prioritize frequency (higher V and Vt); spend N on caches, cores, flexibility and compute quality
GPUs: lower frequency and voltage; spend N on shaders, textures, pixels, etc.
FixedFunction: lowest N, F and V for a given task
Fine print
Toggling power has latency (from a few nanoseconds to a few hundred microseconds). Switching too aggressively may cause performance problems; switching too conservatively may waste power.
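One common way to reason about this trade-off is a break-even idle time: entering a low-power state pays off only when the idle interval is long enough to recover the energy spent on the transition, and the wake-up latency fits the performance budget. The sketch below assumes a fixed transition energy cost and constant idle and gated power; the names and parameters are hypothetical, and any consistent units work.

```python
def break_even_time(transition_energy, idle_power, gated_power):
    """Idle duration beyond which entering the low-power state saves energy."""
    savings_rate = idle_power - gated_power
    if savings_rate <= 0:
        return float("inf")  # gating never pays off
    return transition_energy / savings_rate

def should_gate(expected_idle, transition_energy, idle_power, gated_power,
                wake_latency, latency_budget):
    """Hypothetical gating decision for one idle interval."""
    # Too aggressive: waking up later than the budget allows hurts performance.
    if wake_latency > latency_budget:
        return False
    # Too conservative: skipping long idle intervals wastes idle (leakage) power.
    return expected_idle > break_even_time(transition_energy, idle_power, gated_power)

print(break_even_time(10.0, 2.0, 0.0))             # 5.0
print(should_gate(6.0, 10.0, 2.0, 0.0, 0.1, 1.0))  # True
```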
Fine Print
Switching frequency and voltage (F&V) states can take anywhere from a few milliseconds to a few seconds!
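Because F&V switches are slow, a controller that reacts to every utilization blip would thrash. A toy sketch, assuming a utilization-driven governor with hysteresis thresholds and a minimum dwell time between switches; the class, thresholds and frequency levels are hypothetical and not any particular driver's policy.

```python
from dataclasses import dataclass

@dataclass
class HysteresisGovernor:
    """Toy DVFS governor: steps F&V up or down based on sampled utilization."""
    levels: list                 # available frequency levels, ascending (hypothetical)
    up_threshold: float = 0.85   # step up above this utilization
    down_threshold: float = 0.40 # step down below this utilization
    min_dwell: int = 3           # samples to hold a level before switching again
    index: int = 0
    dwell: int = 0

    def update(self, utilization):
        self.dwell += 1
        # Only consider a switch after holding the current level long enough.
        if self.dwell >= self.min_dwell:
            if utilization > self.up_threshold and self.index < len(self.levels) - 1:
                self.index += 1
                self.dwell = 0
            elif utilization < self.down_threshold and self.index > 0:
                self.index -= 1
                self.dwell = 0
        return self.levels[self.index]

gov = HysteresisGovernor(levels=[300, 600, 900])
print([gov.update(u) for u in [0.95, 0.95, 0.95, 0.95, 0.5, 0.5, 0.1]])
# [300, 300, 600, 600, 600, 600, 300]
```

The gap between the two thresholds keeps mid-range utilization from toggling states, and `min_dwell` stands in for the slow transition cost.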
Activity->Performance->Power
[Figure: Activity/Performance/Power sample illustration. Activity, frequency and power (normalized, 0 to 1.0) plotted over time.]
GPU
Tip 1: Cap the frame rate at the minimum desired rate. Pumping out more frames than the user can see is wasteful anyway.
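A minimal frame-cap sketch, assuming a hypothetical `render_frame` callback: after each frame, the loop sleeps for whatever remains of the frame budget instead of immediately rendering another frame. A real engine would more likely rely on vsync or the platform's presentation interval; `time.sleep` here just illustrates giving the idle time back to the system.

```python
import time

def frame_sleep_time(render_seconds, target_fps):
    """Time to idle after a frame so output does not exceed target_fps."""
    budget = 1.0 / target_fps
    return max(0.0, budget - render_seconds)

def run_capped(render_frame, target_fps, frames):
    """Render `frames` frames, sleeping (not spinning) to hold target_fps."""
    for _ in range(frames):
        start = time.perf_counter()
        render_frame()  # hypothetical per-frame work
        elapsed = time.perf_counter() - start
        time.sleep(frame_sleep_time(elapsed, target_fps))

# A 4 ms frame at a 50 fps cap leaves 16 ms of idle time per frame.
print(frame_sleep_time(0.004, 50))
```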
[Figure: 25 Watt System, 60 fps app]
Tip 2: Minimize frame rendering time. Don't stop optimizing when you hit your minimum frame rate target!
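Why keep optimizing past the target: at a fixed frame rate, a shorter render leaves more of each frame idle. A simple duty-cycle sketch, assuming constant active and idle power that do not change with the optimization (a simplification), and a 16 ms frame as a stand-in for roughly 60 fps; the numbers are illustrative, not measurements.

```python
def average_power(render_ms, frame_ms, active_watts, idle_watts):
    """Average power over one frame period: busy for render_ms, idle for the rest."""
    if not 0 <= render_ms <= frame_ms:
        raise ValueError("render time must fit inside the frame period")
    busy = render_ms / frame_ms
    return busy * active_watts + (1 - busy) * idle_watts

# Both renders meet the frame-rate target, but the faster one draws less on average.
print(average_power(12, 16, 20, 2))  # 15.5
print(average_power(8, 16, 20, 2))   # 11.0
```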
[Figure: Activity vs. time in milliseconds]
[Figure: Activity vs. time in milliseconds]
Tip 3: Don't scatter work across a frame; coalesce it. Scattered work leaves idle intervals too short for power-state reduction.
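The effect of coalescing can be shown by counting only the idle gaps long enough to enter a low-power state. This sketch assumes a hypothetical minimum useful gap, `min_gap_ms`, standing in for the transition overhead; the function name and example timings are illustrative.

```python
def usable_idle(busy_intervals, frame_ms, min_gap_ms):
    """Sum of idle gaps in a frame long enough to enter a low-power state."""
    usable = 0.0
    cursor = 0.0  # end of the latest busy interval seen so far
    for start, end in sorted(busy_intervals):
        gap = start - cursor
        if gap >= min_gap_ms:
            usable += gap
        cursor = max(cursor, end)
    tail = frame_ms - cursor  # idle time after the last burst
    if tail >= min_gap_ms:
        usable += tail
    return usable

scattered = [(0, 2), (4, 6), (8, 10), (12, 14)]  # 8 ms of work in four bursts
coalesced = [(0, 8)]                              # same 8 ms, back to back
print(usable_idle(scattered, 16, 3))  # 0.0
print(usable_idle(coalesced, 16, 3))  # 8.0
```

Both frames contain 8 ms of idle time, but only the coalesced one has gaps long enough to use.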
[Figure: Activity vs. time in milliseconds]
Tip 4: Avoid spin-loops, e.g., the CPU waiting on the GPU. To the CPU, a spin-loop looks like real work.
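The alternative to spinning is to block on an OS synchronization primitive so the core can idle until it is signaled. A sketch using Python's `threading.Event`, with a hypothetical thread standing in for the GPU; in a graphics API the equivalent would be waiting on a fence (for example `vkWaitForFences` in Vulkan) rather than polling a flag.

```python
import threading
import time

def simulated_gpu_work(done, duration_s):
    """Hypothetical stand-in for GPU work; signals `done` when finished."""
    time.sleep(duration_s)
    done.set()

def wait_blocking(done, timeout_s):
    """Block until signaled (or timeout) so the CPU core can idle meanwhile."""
    # Anti-pattern for contrast: `while not done.is_set(): pass` keeps the
    # core fully busy, so power management sees it as real work.
    return done.wait(timeout_s)

done = threading.Event()
worker = threading.Thread(target=simulated_gpu_work, args=(done, 0.01))
worker.start()
finished = wait_blocking(done, timeout_s=1.0)
worker.join()
print(finished)  # True
```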
A complex subject
Dynamic scheduling systems based on power and thermal feedback
Hardware vs. software schedulers
Scheduling CPU and GPU
Many more topics
FixedFunction Revenge
Premature declaration of the death of fixed-function GPU hardware
What new candidates can we move to FixedFunction?
What interface principles should FixedFunction hardware adopt to be friendly to mainstream programmers?
Can we build predictive models to augment current reactive models?
Should there be APIs for apps to influence power states and monitor feedback?
Backup
Reduction Choices
Reduce N, but that reduces capabilities
Reduce V, but that limits performance and is not in your control (process limits)
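The reduction choices can be compared using the dynamic power equation $P_{\text{dynamic}} = A \cdot N \cdot C \cdot F \cdot V^2$. This sketch computes power relative to a baseline; it assumes, as a common simplification, that lowering V also requires lowering F roughly proportionally. The function name is hypothetical.

```python
def relative_dynamic_power(n_scale=1.0, f_scale=1.0, v_scale=1.0,
                           a_scale=1.0, c_scale=1.0):
    """Dynamic power relative to a baseline, from P = A * N * C * F * V^2."""
    return a_scale * n_scale * c_scale * f_scale * v_scale ** 2

# Cutting N by 20% saves 20% but removes capability.
print(relative_dynamic_power(n_scale=0.8))  # 0.8
# Cutting V (and, by assumption, F) by 20% saves roughly half the dynamic power,
# at the cost of performance.
print(relative_dynamic_power(f_scale=0.8, v_scale=0.8))  # ~0.512
```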
[Figure: Activity vs. time in milliseconds]