Update on big.LITTLE on TC2


Morten Rasmussen
Technology Researcher
Agenda
big.LITTLE Software solutions overview
ARM's Test Chip 2 overview
Benchmarking Methodology and Use Cases
IKS status update
big.LITTLE MP status update

big.LITTLE overview
Performance and power efficiency in one system:

Benchmark     Performance                  Energy efficiency
              (Cortex-A15 vs Cortex-A7)    (Cortex-A7 vs Cortex-A15)
Dhrystone     1.9x                         3.5x
FDCT          2.3x                         3.8x
IMDCT         3.0x                         3.0x
MemCopy L1    1.9x                         2.3x
MemCopy L2    1.9x                         3.4x
IKS solution - Basics
In-Kernel Switcher (IKS):
Targeted at first-generation big.LITTLE products.
[Diagram labels: Cortex-A7, Cortex-A15, Kernel, scheduler, IKS, Task 1, Task 2, Logical CPU]
MP solution
[Diagram labels: Cortex-A7, Cortex-A15, Kernel, scheduler, Task 1, Task 2]
ARM's Test Chip 2 (TC2): An Overview

A Versatile Express core tile, publicly available.

Capabilities
2 x A15 (r2p1) @ up to 1.2 GHz
3 x A7 (r0p1) @ up to 1 GHz
CCI/DMC/GIC/ADB (r0p0)
DMA (PL330)
2 GB external DDR2 memory @ 400 MHz
64k internal SRAM
CoreSight debug (including JTAG and ITM trace but no STM)
No GPU
cpufreq support: independent for each cluster, with limited voltage scaling
cpuidle support: cluster power gating
Benchmarking Methodology
Automated system for running user workloads on target device
Choose workload
Choose CPU mode: Cortex-A7, Cortex-A15, Migration (cluster or CPU), or MP
Choose active cores in each cluster (TC2: 1-2 big, 1-3 LITTLE)
Choose DVFS governor: interactive, performance, powersave, ondemand
Extensible, with parameterisation
CSV config: use case, scheduling model, number of cores to use, scaling governors
Configurable: CCI, ftrace, Streamline
Results: performance, power
IKS solution
Targeted at first-generation big.LITTLE products.
[Diagram labels: Cortex-A7, Cortex-A15, Kernel, scheduler, IKS, Task 1, Task 2, Logical CPU]
IKS: CPU Migration
big.LITTLE extends DVFS.
The DVFS algorithm monitors load on each CPU.
When load is low, it can be handled on a LITTLE processor.
When load is high, the context is transferred to a big processor.
The unused processor can be powered down.
When all processors in a cluster are inactive, the cluster and its L2 cache can be powered down.
IKS: OPP mapping to A7 & A15 on TC2
Virtual frequency maps OPPs to big or LITTLE cores

Core   Virtual OPP   Physical OPP A7   Physical OPP A15   Voltage
A7     350000        350000            -                  V1
A7     400000        400000            -                  V1
A7     ...           ...               -                  V1
A7     800000        800000            -                  V1
A7     900000        900000            -                  V2
A7     1000000       1000000           -                  V3
A15    1200000       -                 600000             V1
A15    1400000       -                 700000             V1
A15    ...           -                 ...                V1
A15    2000000       -                 1000000            V1
A15    2200000       -                 1100000            V2
A15    2400000       -                 1200000            V3
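The virtual-to-physical mapping can be sketched as a simple table lookup. This is only an illustration, not the IKS implementation: the function name `resolve_virtual_opp` and the round-up-to-next-listed-OPP policy are assumptions, and only the rows printed on the slide are included (the elided "..." rows are left out).

```python
# Rows as printed on the slide: (virtual_freq, cluster, physical_freq, voltage).
# The elided "..." rows are intentionally not filled in.
OPP_TABLE = [
    (350000, "A7", 350000, "V1"),
    (400000, "A7", 400000, "V1"),
    (800000, "A7", 800000, "V1"),
    (900000, "A7", 900000, "V2"),
    (1000000, "A7", 1000000, "V3"),
    (1200000, "A15", 600000, "V1"),
    (1400000, "A15", 700000, "V1"),
    (2000000, "A15", 1000000, "V1"),
    (2200000, "A15", 1100000, "V2"),
    (2400000, "A15", 1200000, "V3"),
]


def resolve_virtual_opp(virtual_freq):
    """Map a requested virtual frequency to (cluster, physical_freq, voltage).

    Hypothetical policy: pick the lowest listed virtual OPP that is at least
    the request, and clamp requests above the table to the top entry.
    """
    for virt, cluster, phys, volt in OPP_TABLE:
        if virt >= virtual_freq:
            return cluster, phys, volt
    _, cluster, phys, volt = OPP_TABLE[-1]
    return cluster, phys, volt


if __name__ == "__main__":
    # A request just above the A7 range lands on the first A15 OPP.
    print(resolve_virtual_opp(1000000))
    print(resolve_virtual_opp(1100000))
```

With this layout, crossing the top of the A7 range is what moves a logical CPU onto the A15: the A15 virtual frequencies are twice its physical ones.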
IKS: Results for Audio on TC2
Power compared to executing the use case on A15
IKS does not use A15s during Audio run
70% saving
TC2: A15 up to 1.2 GHz, A7 up to 1 GHz
Better results expected on representative silicon.
IKS: Results for BBench + Audio on TC2
Performance is measured from page loading times of BBench
Results are normalised to the power consumed and the performance achieved when the same use case runs on A15 only
[Chart: BBench page + Audio]
IKS: OPPs on TC2
IKS: Interactive governor on TC2
if (cpu_load >= go_hispeed_load) {
    ...
    new_freq = max_freq * cpu_load / 100;
    ...
}
else {
    ...
    new_freq = hispeed_freq * cpu_load / 100;
    ...
}

For A15 on TC2 with go_hispeed_load at 85% (default), this algorithm only uses the overdrive section of the A15.

Approach is to introduce a second point of inflection: hispeed2
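The jump at the threshold can be seen by evaluating the two-branch rule directly. The sketch below is a minimal illustration, not the governor's code: the function name is made up, `max_freq` is taken as the top virtual OPP (2400000), and the `hispeed_freq` value of 1000000 is purely an assumption for the example. The hispeed2 change itself is not modelled, since the slides do not spell it out.

```python
def interactive_target_freq(cpu_load, max_freq, hispeed_freq, go_hispeed_load=85):
    """Two-branch frequency rule as shown on the slide (integer arithmetic)."""
    if cpu_load >= go_hispeed_load:
        return max_freq * cpu_load // 100
    return hispeed_freq * cpu_load // 100


if __name__ == "__main__":
    MAX_FREQ = 2400000      # top virtual OPP (A15)
    HISPEED_FREQ = 1000000  # assumed value, for illustration only
    for load in (50, 84, 85, 100):
        print(load, interactive_target_freq(load, MAX_FREQ, HISPEED_FREQ))
```

Under these assumptions the target goes from 840000 at 84% load straight to 2040000 at 85%, so the lower A15 operating points are never requested.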
IKS: Hispeed2
IKS: Results: BBench + Audio
Power improves with no performance cost
[Chart: BBench page + Audio]
MP solution
[Diagram labels: Cortex-A7, Cortex-A15, Kernel, scheduler, Task 1, Task 2]
MP solution - more details
Scheduler modifications:
Treat big and LITTLE CPUs as separate scheduling domains.
Use PJT's load-tracking patches to track individual task load.
Migrate tasks between the big and the LITTLE domains based on task load.
Patch set available through Linaro.
[Diagram labels: L, B, B, L; Load balance; Load balance; Load-based task migration]
[Chart labels: Task load; Task state (Executing, Sleep); Load decay]
MP: Experimental Implementation
Scheduler modifications:
Apply PJT's load-tracking patch set.
Set up big and little sched_domains with no load-balancing between them.
select_task_rq_fair() checks task load history to select an appropriate target CPU for tasks waking up.
Add a forced migration mechanism to push the currently running task off to a big core, similar to the existing active load balancing mechanism.
Periodically check (run_rebalance_domains()) the current task on little runqueues for tasks that need to be forced to migrate to a big core.
[Diagram labels: L, B, B, L; load_balance; load_balance; select_task_rq_fair(); Forced migration]
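The placement decision described above can be sketched as a threshold test on tracked task load. This is a simplified model rather than the kernel patch set: the function name and both threshold values are hypothetical, as the slides give no numbers.

```python
# Hypothetical thresholds on tracked load (0.0 to 1.0); not taken from the slides.
UP_THRESHOLD = 0.8    # above this, a task on a LITTLE CPU moves to big
DOWN_THRESHOLD = 0.3  # below this, a task on a big CPU moves to LITTLE


def pick_domain(current_domain, task_load):
    """Return the domain ("big" or "LITTLE") a task should run in next."""
    if current_domain == "LITTLE" and task_load > UP_THRESHOLD:
        return "big"
    if current_domain == "big" and task_load < DOWN_THRESHOLD:
        return "LITTLE"
    # Between the thresholds the task stays put, which avoids ping-ponging.
    return current_domain


if __name__ == "__main__":
    print(pick_domain("LITTLE", 0.9))
    print(pick_domain("big", 0.5))
    print(pick_domain("big", 0.1))
```

The same test would apply both at wake-up and during the periodic check for forced migration; only the point at which it is evaluated differs.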
MP: ARM TC2: Audio
Workload: Audio (mp3 playback)
Performance/Energy target: A7 energy
Status:
Audio-related tasks do not use the A15s, but the power consumption is still significantly more than A7 alone.
MP not as power efficient as IKS yet.
Todo:
Target spurious wake-ups on A15. All the extra power comes from the A15s, which shouldn't be used at all.
Energy: A7 30.79%, MP 39.86%
[Chart: Audio energy, axis 0 to 100; bars: A15, A7 2CPU, IKS, MP]
MP: Audio workload analysis
Where is the extra energy spent with MP?
Need a look at why the A15s consume power when they are not necessary.
[Chart: Audio energy breakdown for A7 and MP; series: A15 cluster, A7 cluster; y-axis: Energy, 0 to 1.6]

hrtimer functions    cpu0   cpu1   cpu2   cpu3   cpu4
hrtimer_wakeup       2      2      1212   417    190
tick_sched_timer     404    58     48     507    779

WQ functions         cpu0   cpu1   cpu2   cpu3   cpu4
vmstat_update        0      2      27     25     28
cache_reap           15     2      14     1      14
phy_state_machine    1      0      0      0      0

Enter idle           cpu0   cpu1   cpu2   cpu3   cpu4
0                    6      2      279    260    42
1                    801    807    816    97     9652
Scale invariant load
Load accumulation rate does not scale with available compute capacity (frequency, big/LITTLE CPU).
Currently, there is no link between cpufreq and the scheduler.
Tasks may be migrated away from a CPU at low frequency by the scheduler before cpufreq has increased the frequency to match the CPU load.
Scaling the tracked load accumulation to match the current frequency mitigates this issue.
With this scaling, tasks cannot accumulate enough load at low frequency to trigger migration and must wait for cpufreq to react first.
[Figure panels: Freq = x; Freq = 2x]
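One way to picture the scaling is to weight each period's busy time by the ratio of current to maximum frequency before it is accumulated. The sketch below assumes 1024-unit periods, as in the load expression used in this deck; the function name and the example frequencies are illustrative only and are not taken from the patches.

```python
def scaled_contribution(busy_units, cur_freq, max_freq):
    """Scale one period's busy time (0..1024) by cur_freq / max_freq."""
    return busy_units * cur_freq // max_freq


if __name__ == "__main__":
    # A task that is busy for a whole period at half the maximum frequency
    # contributes only half a period of load.
    print(scaled_contribution(1024, 500000, 1000000))
    print(scaled_contribution(1024, 1000000, 1000000))
```

A fully busy task at half frequency then tops out at half load, so it stays below a high migration threshold until cpufreq has raised the frequency.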
Scale invariant load
[Charts: Original; Frequency invariant; y-axis 0 to 1000]
Load accumulation rate
For some workloads, tracked load saturates too fast and leads to unnecessary task migrations.
Extending the tracked load history reduces tracked load variations due to sudden changes in the load characteristics.
Increasing the y factor in the load expression decreases the load accumulation and decay rates.
\[
\text{load} = \frac{u_0 + u_1 y + u_2 y^2 + \dots + u_n y^n}{1024\,(1 + y + y^2 + \dots + y^n)},
\qquad y < 1,\quad 0 \le u < 1024
\]
[Chart: y = 0.9785; x-axis: Time [ms]; y-axis 0 to 1]
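The load expression can be evaluated directly for a short history. This is a minimal sketch that assumes the normalisation 1024(1 + y + ... + y^n) in the denominator and takes `u[0]` as the most recent period; the function name `tracked_load` is made up for the example.

```python
def tracked_load(u, y):
    """Evaluate the decayed load for per-period contributions u (each 0..1024).

    u[0] is the most recent period and u[i] is weighted by y**i.
    The result is normalised to the range 0.0 to 1.0.
    """
    numerator = sum(u_i * y**i for i, u_i in enumerate(u))
    denominator = 1024 * sum(y**i for i in range(len(u)))
    return numerator / denominator


if __name__ == "__main__":
    busy_then_idle = [0] * 10 + [1024] * 10   # idle recently, busy before that
    idle_then_busy = [1024] * 10 + [0] * 10   # busy recently, idle before that
    # Recent activity weighs more, so the second history yields the higher load.
    print(tracked_load(busy_then_idle, 0.9785))
    print(tracked_load(idle_then_busy, 0.9785))
```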
Load accumulation rate
Increasing y leads to a more conservative tracked load.
Should lead to fewer up/down migrations.
Increases the up/down migration delay for tasks that need to be migrated.
[Chart: Load accumulation rate; series: Task, y = 0.9785, y = 0.9844, y = 0.9922; x-axis: Time [ms]; y-axis: Tracked load]
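The added delay can be estimated by counting how many periods a continuously running task needs before its tracked load crosses a migration threshold. The sketch assumes the task starts with no load history and that history is long enough for the load after k busy periods to be 1 - y^k; the threshold of 0.8 and the function name are hypothetical.

```python
def periods_to_reach(threshold, y):
    """Count busy periods until the tracked load (1 - y**k) reaches threshold.

    Requires 0 < threshold < 1 and 0 < y < 1.
    """
    load, k = 0.0, 0
    while load < threshold:
        k += 1
        load = 1.0 - y**k
    return k


if __name__ == "__main__":
    THRESHOLD = 0.8  # hypothetical up-migration threshold
    for y in (0.9785, 0.9844, 0.9922):
        print(y, periods_to_reach(THRESHOLD, y))
```

Larger y values need more periods to reach the same threshold, which is the trade-off between fewer spurious migrations and slower reaction.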
MP - Top Issues
Spurious wakeups: A15s are woken up by scheduler ticks (mainly), workqueues, timers, RCU
CPU wakeup prioritisation: pick the cheapest target CPU
Global balancing: spread load to A7s when A15s are overloaded
Pack vs. spread
Cluster-aware cpufreq governors
Questions?
