
Case study with atop: memory leakage

Gerlof Langeveld & Jan Christiaan van Winkel


www.atoptool.nl
May 2010
This document describes the analysis of a slow system suffering from a process with a memory leak. Such a process regularly requests more dynamic memory with the subroutine malloc, while the programmer has "forgotten" to free the requested memory again. In this way the process grows virtually as well as physically. Mainly because of the physical growth (the process' resident set size increases), the process inflates like a balloon and pushes other processes out of main memory. Instead of a healthy system where processes reach a proper balance in their memory consumption, the performance of the whole system might degrade because of just one process that leaks memory.
Notice that, by default, the Linux kernel does not limit the physical memory consumption of processes. Every process, whether running under root identity or non-root identity, can grow without limit. The last section of this case study gives some suggestions to decrease the influence of leaking processes on your overall system performance.
To interpret the figures produced by atop, basic knowledge of Linux memory management is required. The next sections describe the utilization of physical memory as well as the impact of virtual memory, before focusing on the details of the case itself.
Introduction to physical memory
The physical memory (RAM) of your system is subdivided into equally-sized portions, called memory pages. The size of a memory page depends on the CPU architecture and the settings chosen by the operating system. Let's assume for this article that the size of a memory page is 4 KiB.
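Since 4 KiB is an assumption, it can be worth checking the page size on the system at hand. The following minimal Python sketch (not part of atop) asks the operating system for its page size via os.sysconf("SC_PAGE_SIZE") and converts a page count into KiB; the helper names page_size and pages_to_kib are hypothetical.

```python
import os

def page_size() -> int:
    """Return the page size of the running system in bytes (POSIX sysconf)."""
    return os.sysconf("SC_PAGE_SIZE")

def pages_to_kib(pages: int, page_bytes: int = 4096) -> int:
    """Convert a number of pages into KiB; 4 KiB pages are assumed by default."""
    return pages * page_bytes // 1024

if __name__ == "__main__":
    print(f"page size on this system: {page_size()} bytes")
    print(f"8 pages of 4 KiB = {pages_to_kib(8)} KiB")  # 32 KiB
```

On most x86 systems the first line reports 4096 bytes.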
When the system is booted, the compressed kernel image known as the file /boot/vmlinuz-.... is loaded and decompressed in memory. This static part of the kernel is loaded somewhere at the beginning of RAM.
However, the running kernel requires more space, e.g. for the administration of processes, open files and network sockets, but also to load dynamically loadable modules. Therefore, the kernel dynamically allocates memory using so-called "slab caches", in short slab. When this dynamically allocated kernel memory is no longer needed (removal of the process administration when a process exits, unloading a loaded module, ...), the kernel might free this memory. This means that the slab space will shrink again. Notice that all pages in use by the kernel are memory resident and will never be swapped.
Apart from the kernel, processes also require physical pages for their text (code), static data and stack. The physical space consumed by a process is called the "Resident Set Size", in short RSS. How a page becomes part of the RSS is discussed in the next section.
The remaining part of the physical memory, after the kernel and processes have taken their share, is mainly used for the page cache. The page cache keeps as much data as possible from the disks (filesystems) in memory to improve the access speed to disk data. The page cache consists of two parts: the part where the data blocks of files are stored and the part where the metadata blocks (superblocks, inodes, bitmaps, ...) of filesystems are stored. The latter part is called the "buffer cache". Most tools (like free, top and atop) show two separate values for these parts, respectively "cached" and "buffer". The sum of these two values is the total size of the page cache.
The size of the page cache varies. If there is plenty of free memory, the page cache will grow, and if memory is scarce, the page cache will shrink again.
Finally, the kernel keeps a pool of free pages to be able to fulfill a request for a new page and deliver it from stock straight away. When the number of free pages drops below a particular threshold, pages that are currently occupied will be freed and added to the free page pool. Such a page can be taken from a process (the current page contents might have to be written to swap space first) or it can be stolen from the page cache (the current page contents might have to be flushed to the filesystem first). In the first case, the RSS of the process concerned shrinks. In the second case, the size of the page cache shrinks.
Even the slab might shrink when free pages are scarce, because the slab also contains data that is only meant to speed up certain mechanisms and can be shrunk under memory pressure. An example is the incore inode cache, which contains the inodes of files that are currently open, but also the inodes of files that have recently been open but are currently closed. That last category is kept in memory just in case such a file is opened again in the near future (which saves another inode retrieval from disk). If needed, however, the incore inodes of closed files can be removed. Another example is the directory name cache (dentry cache), which holds the names of recently accessed files and directories. The dentry cache is meant to speed up pathname resolution by avoiding accesses to disk. Under memory pressure, the least-recently accessed names might be removed to shrink the slab.
In the output of atop, the sizes of the memory components just discussed can be found as follows. The line labeled MEM shows the size of the RAM (tot), the memory that is currently free (free), the size of the page cache (cache + buff) and the size of the dynamically allocated kernel memory (slab).
The RSS of the processes can be found in the process list (the lower part of the screen). The memory details (subcommand 'm') show the current RSS per process in the column RSIZE, and as a percentage of the total installed memory in the column MEM. The physical growth of the process during the last interval is shown in the column RGROW.
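The relation between the RSIZE and MEM columns is a simple ratio. As an illustration, assuming MEM is RSIZE divided by the total installed memory (rounded for display), a process with 2GiB resident on a system with 3.8GiB of RAM would show about 52.6%. The helper mem_percent below is hypothetical.

```python
GIB = 1024 * 1024  # KiB per GiB

def mem_percent(rsize_kib: int, total_kib: int) -> float:
    """Resident size as a percentage of total RAM, comparable to atop's MEM column."""
    return 100.0 * rsize_kib / total_kib

# Example: 2 GiB resident on a system with 3.8 GiB of RAM
print(f"{mem_percent(2 * GIB, int(3.8 * GIB)):.1f}%")  # 52.6%
```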
Introduction to virtual memory
When a new program is activated, the kernel constructs a virtual address space for the new process. This virtual address space describes all memory that the process could possibly use. For a starting process, the size of the virtual space is mainly determined by the text (T) and data (D) pages in the executable file, with a few additional pages for the process' stack. Notice that the process does not yet consume any physical memory during its early startup stage.
The illustration shows an executable file with 32KiB (8 pages) of text and 32KiB (8 pages) of static data. The kernel has just built a virtual address space for the pages of the executable file and 4 additional pages (e.g. for the stack).
After the virtual address space has been built, the kernel fills the program counter register of the CPU with the address of the first instruction to be executed. The CPU tries to fetch this instruction, but notices that the text page concerned is not in memory. Therefore the CPU generates a fault (trap). The fault handling routine of the kernel loads the requested text page from the executable file and restarts the process at the same point. Now the CPU is able to fetch the first instruction and execute it. When a branch is made to an instruction that lies in another text page, that page will be loaded via fault handling as well. The kernel will load a data page as soon as a reference is made to a static variable. In this way, any page can be physically loaded into memory at its first reference. Pages that are not referenced at all will not be loaded into memory.
The illustration shows that the process has a virtual size of 80KiB (20 pages) and a physical size (RSS) of 16KiB (4 pages). Obviously, the physical size is always a subset of the virtual size and can never be larger than that.
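The demand paging described above can be mimicked with a toy model. This Python sketch is purely illustrative and does not interact with the kernel: a page becomes resident on its first reference, which counts as a fault, while the virtual size stays fixed. Page numbers 0-7 stand for text, 8-15 for data and 16-19 for stack, matching the 20-page example; the class name AddressSpace is hypothetical.

```python
from dataclasses import dataclass, field

PAGE_KIB = 4  # page size assumed in this article

@dataclass
class AddressSpace:
    """Toy model of demand paging: pages become resident on their first reference."""
    virtual_pages: int
    resident: set[int] = field(default_factory=set)

    def reference(self, page: int) -> bool:
        """Touch a page; return True if this caused a page fault (first reference)."""
        if not 0 <= page < self.virtual_pages:
            raise ValueError(f"page {page} is outside the virtual address space")
        if page in self.resident:
            return False
        self.resident.add(page)  # the "kernel" loads the page, the instruction is restarted
        return True

    @property
    def vsize_kib(self) -> int:
        return self.virtual_pages * PAGE_KIB

    @property
    def rsize_kib(self) -> int:
        return len(self.resident) * PAGE_KIB

# 8 text + 8 data + 4 stack pages, as in the illustration
proc = AddressSpace(virtual_pages=20)
# Two text pages, one data page, one stack page; repeated references cause no new fault
faults = sum(proc.reference(p) for p in [0, 0, 1, 8, 19, 1])
print(proc.vsize_kib, proc.rsize_kib, faults)  # 80 16 4
```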
When the process allocates memory dynamically (with the subroutine malloc), the requested space initially only extends the process' virtual address space. Only when the process really refers to a page in the dynamic area is that page physically created and filled with binary zeroes.
The illustration shows that the first malloc'ed area has a virtual size of 48KiB (12 pages), while only 16KiB (4 pages) have been physically created by a reference. The pages in the second and third malloc'ed areas have not all been referenced either.
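The difference between virtual and resident growth can be observed on a running system. The sketch below assumes Linux (it reads VmSize and VmRSS from /proc/self/status) and, as a stand-in for malloc, creates an anonymous private mapping with Python's mmap module; glibc's malloc typically serves large requests (by default above 128 KiB) with such a mapping. The 32 MiB size is arbitrary.

```python
import mmap

SIZE = 32 * 1024 * 1024  # arbitrary demo size: 32 MiB
SIZE_KIB = SIZE // 1024

def status_kib(field: str) -> int:
    """Read a field such as VmSize or VmRSS (reported in kB) from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

v0, r0 = status_kib("VmSize"), status_kib("VmRSS")

# Allocate: only the virtual address space grows.
area = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE)
v1, r1 = status_kib("VmSize"), status_kib("VmRSS")

# Reference one byte in every page: the kernel now creates zero-filled pages.
for offset in range(0, SIZE, mmap.PAGESIZE):
    area[offset] = 1
v2, r2 = status_kib("VmSize"), status_kib("VmRSS")

print(f"after allocation: virtual +{v1 - v0} KiB, resident +{r1 - r0} KiB")
print(f"after touching:   virtual +{v2 - v1} KiB, resident +{r2 - r1} KiB")
area.close()
```

Expect the virtual size to jump at allocation time and the resident size only after the loop has touched the pages, the same pattern atop shows in VGROW and RGROW.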
The process list shown by atop also contains information about the virtual address space. The memory details (subcommand 'm') show the current virtual size per process in the column VSIZE. The virtual growth of a process during the last interval is shown in the column VGROW.
For process simpress.bin with pid 3456 a virtual growth of 1628KiB is shown (probably by issuing a malloc), while the resident growth is 1596KiB. Process chrome with pid 29272 has not grown virtually (0KiB), but during the last interval it has referenced pages (60 pages of 4KiB) that were allocated virtually during earlier intervals. Process Xorg with pid 1530 has only grown virtually (6116KiB) but has not referenced any of the new pages (yet).
Process firefox with pid 4680 has freed malloc'ed space with a virtual size of 272KiB. Apparently, this implies the release of 272KiB of resident space.
Case study: A quiet system
The first atop snapshot was taken just after a program that leaks memory, called lekker (Dutch for "leaker"), had been started. In this snapshot we see a system with a net 3.8GiB of physical memory, of which 186MiB is free and more than 1GiB is in the page cache (MEM line: cache + buff). The kernel has dynamically allocated 273MiB of slab. Thus, some 2.3GiB is in use by application processes. Swap space (8GiB) is almost unused (so far...). We can see that the lekker process has grown 156MiB (virtually) during the last interval, and that this space was also made resident by really referencing it. For now, there is no reason to be worried.
Please take notice of the six upload processes. They have allocated 256MiB each (virtual and resident). We can also see multiple chrome processes that may have a large virtual size, of which only a small portion is resident: the cumulative virtual size is 4.1GiB (1+1+0.6+1.5 GiB), of which 229MiB (103+52+44+30 MiB) is resident. The process simpress.bin has made less than 10% of its virtual footprint (1.1GiB) resident (91MiB). Also firefox has only a relatively small portion of its virtual footprint resident. A lot of these virtual sizes will be shared, not only for the same executable file (4 chrome processes share at least the same code), but also for shared library code used by all processes. Until now the system has "promised" 3.4GiB of virtual memory (MEM line, vmcom) of the total limit of 9.9GiB (vmlim, which is the size of swap space plus half of the physical memory).
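The vmlim figure can be reproduced by hand. On Linux the commit limit (CommitLimit in /proc/meminfo) is the swap size plus vm.overcommit_ratio percent of RAM (default 50, huge pages excluded); it is only enforced when vm.overcommit_memory is set to 2. The sketch below redoes that arithmetic, and the chrome totals, with the numbers from this snapshot; the function name is hypothetical.

```python
def commit_limit_gib(swap_gib: float, ram_gib: float, overcommit_ratio: int = 50) -> float:
    """CommitLimit = swap + RAM * overcommit_ratio / 100 (huge pages ignored)."""
    return swap_gib + ram_gib * overcommit_ratio / 100

# Figures from the first snapshot
vmlim = commit_limit_gib(swap_gib=8, ram_gib=3.8)
chrome_virtual = sum([1, 1, 0.6, 1.5])     # GiB
chrome_resident = sum([103, 52, 44, 30])   # MiB

print(f"vmlim = {vmlim:.1f} GiB")                                             # vmlim = 9.9 GiB
print(f"chrome: {chrome_virtual:.1f} GiB virtual, {chrome_resident} MiB resident")  # 4.1 GiB, 229 MiB
```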
Case study: It's getting a bit busy...
One snapshot (twenty seconds) later, lekker has grown another 150MiB (virtual and resident). A large part of that could be claimed from the free space, but not all of it. The number of free pages in stock is getting very low, so the kernel tries to free memory somewhere else. We can see that the first victims are the page cache and the slab: they both have to shrink.
The MEM line is displayed in cyan because the amount of memory that can quickly be claimed (free plus the page cache) is small. The processes are not yet in the danger zone because no pages are swapped out (PAG line, swout). Better yet, firefox has physically referenced another 892KiB (a small amount compared to the 156MiB that lekker got).
We can see that chrome has shrunk by 24KiB (6 pages). We later found out that this was caused by malloc'ed memory pages that were freed, followed by a new malloc of 6 pages without referencing the pages again. Hence the virtual and resident size first shrank by 6 pages, after which only the virtual size grew by 6 pages again. The net effect is that the resident size shrank without a change of the virtual size.
Case study: The kernel gets worried....
Four snapshots (of 20 seconds each) later, we see that lekker has an unsatisfiable hunger: it has grown more than 600MiB (virtual and resident) since the previous screenshot, which is about 150MiB per 20 seconds.
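The per-interval figures atop reports can be derived from successive samples. The sketch below computes RGROW-style differences and the average growth per 20-second interval; the RSIZE samples are hypothetical values chosen to match the roughly 600MiB over four intervals reported above, and the helper names are illustrative only.

```python
INTERVAL_S = 20  # atop sample interval used in this case study

def rgrow(samples_mib: list[int]) -> list[int]:
    """Per-interval resident growth (like atop's RGROW) from successive RSIZE samples."""
    return [later - earlier for earlier, later in zip(samples_mib, samples_mib[1:])]

def mib_per_interval(samples_mib: list[int]) -> float:
    """Average growth per sample interval between the first and last sample."""
    return (samples_mib[-1] - samples_mib[0]) / (len(samples_mib) - 1)

# Hypothetical RSIZE samples (MiB) for five consecutive snapshots: 600 MiB in 4 intervals
samples = [400, 550, 700, 850, 1000]
print(rgrow(samples))                                          # [150, 150, 150, 150]
print(mib_per_interval(samples), "MiB per", INTERVAL_S, "s")   # 150.0 MiB per 20 s
```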
The page cache has been shrunk as well as the slab (e.g. in-core inodes and directory entries). Processes weren't spared either. They didn't shrink virtually, but some of their resident pages were taken away by the kernel (negative RGROW). Because the worried (but not yet desperate) kernel is looking hard for memory to free, more and more pages are checked by the page scanner: 30712 pages were examined to see whether they are candidates to be removed from memory (PAG line, scan). If a modified page has to be removed from memory, that page first has to be saved to the swap space. In this snapshot, 322 pages were written to the swap disk (PAG line, swout). This resulted in 322 writes to the swap space logical volume (vg00-lvswap) that were combined into about 60 writes to the physical disk (sda). Because so many pages were swapped out in a short time, the PAG line is displayed in red.
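To put these numbers in perspective, a quick calculation (assuming 4 KiB pages) shows how much data was swapped out and how large the combined physical writes were on average. Because "about 60" is an approximate count, the result is only indicative.

```python
PAGE_KIB = 4             # page size assumed in this article

swapped_pages = 322      # PAG line, swout
physical_writes = 60     # approximate number of writes to sda

swapped_kib = swapped_pages * PAGE_KIB
avg_write_kib = swapped_kib / physical_writes

print(f"{swapped_kib} KiB swapped out, about {avg_write_kib:.1f} KiB per disk write")
# 1288 KiB swapped out, about 21.5 KiB per disk write
```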
However, processes don't sit still, and some of the pages that were swapped out will be referenced again. These pages have to be read again, which happened 91 times (PAG line, swin). Fortunately, atop itself makes all its pages resident at startup and locks them in memory, which prevents them from being swapped out (which would make the measurements unreliable).
Case study: The kernel gets desperate as well as the users...
The memory-leaking process lekker cannot be stopped. We fast-forward 5 minutes.
By now, lekker has grown to a virtual size of 2.8GiB, of which 2GiB is resident (more than 50% of the physical memory of the system). In the past 20 seconds, lekker has tried to get hold of 128MiB more virtual memory but has only been able to make 34MiB resident. We know from the past that lekker tries to make all its virtual memory resident as soon as it can, so we can conclude that the kernel is very busy swapping out pages. The page cache has already been minimized, as well as the inode cache and directory entry cache (part of the slab). Obviously the processes will also have to "donate" physical memory. One of the upload processes (PID 30926) is even donating 19MiB. We can see for some of the upload processes that they had to give back quite a lot more physical memory (RSIZE): they had 6 times 256MiB, of which they now only have two thirds.
The system is swapping out heavily (32135 pages in the last 20 seconds) but is also swapping in (8440 pages). Because of this, the PAG line is displayed in red. The disk is very busy (the DSK as well as the LVM lines are red). The average service times of requests for the logical volumes that are not related to swapping (lvhome and lvusr) are getting longer, because the requests to those areas are swamped by requests to swap space (lvswap). Although only a relatively small number of requests are related to lvusr and lvhome, these logical volumes are busy 75% and 92% of the time respectively. The system feels extremely slow now. Time to get rid of the leaking process...
Case study: Relief...
Five minutes later, the big spender lekker has finished and is no longer using memory.
However, the effect of lekker as a memory hog can be noticed for a long time. We can see that the upload processes are slowly referencing their swapped-out pages, resulting in resident growth again. Because there is an ocean of free space (1.8 GiB), nothing is swapped out any more and there is hardly any scanning (PAG line, scan).
We see a lot of major page faults (MAJFLT) for processes: references to virtual pages that have to be retrieved from disk. Either they were swapped out and now have to be swapped in, or they are read from the executable file. The minor page faults (MINFLT) are references to virtual pages that can be made resident without loading the page from disk: pages that need to be filled with zeroes (e.g. for malloc's) or pages that were "accidentally" still available in the free page pool.
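Minor faults are easy to provoke. The Unix-specific Python sketch below counts the page faults of its own process with resource.getrusage while touching freshly mapped anonymous memory (mmap standing in for malloc; the 8 MiB size is arbitrary). Because every page is new and zero-filled, no disk access is needed: on Linux the minor count typically rises by up to one per touched page (fewer if transparent huge pages are used), while the major count normally stays the same.

```python
import mmap
import resource

SIZE = 8 * 1024 * 1024  # arbitrary demo size: 8 MiB

def faults() -> tuple[int, int]:
    """Return the (minor, major) page fault counts of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

minor_before, major_before = faults()

# Fresh anonymous memory: each first touch needs a zero-filled page,
# which the kernel can supply without disk I/O (a minor fault).
area = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE)
for offset in range(0, SIZE, mmap.PAGESIZE):
    area[offset] = 1

minor_after, major_after = faults()
area.close()

print("minor faults:", minor_after - minor_before)
print("major faults:", major_after - major_before)
```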
The disk is still very busy retrieving the virtual pages that are referenced again and need to be swapped in (swin is 7384, which corresponds to the reads for logical volume lvswap). Therefore the DSK and some LVM lines are shown in red. The physical disk sda is mainly busy due to the requests of logical volume lvswap. However, this also slows down the requests issued for the other logical volumes. One request to lvtmp even takes 738ms!
Case study: Life's almost good again...
More than seven minutes later, we can see that the system is almost tranquil again.
There is far less disk I/O, and certainly not all disk I/O is related to swapping any more. Processes (like upload) still do not have all their resident memory back, because they simply haven't touched all of their virtual pages since the "storm" has passed. Probably many of these pages were used during their initialization phase and will not even be referenced any more. So a small breeze might help to clean up a dusty memory, but a stormy leaker like lekker is better avoided...
Possible solutions for memory leakage
From the case study, it is clear that a single misbehaving process can cause heavy performance degradation for the entire system. The most obvious solution is to fix the memory leak in the guilty program and make sure that every malloc'ed area is sooner or later freed again. However, in practice this might not be a trivial task, since the leaking program will often be part of a third-party application.
Suppose that a real solution is not possible (for the time being); then it should at least be possible to prevent the leaking process from bothering other processes. Preferably it should only harm its own performance, by limiting the resident memory that the leaking process is allowed to consume. So instead of allowing the balloon to expand without limit (pushing out the others), we put a bowl around it that redirects the superfluous expansion outside the box...
The good news is: there is a standard ulimit value to limit the resident memory of a process.
$ ulimit -a
....
max memory size (kbytes, -m) unlimited
The default value is "unlimited". The command ulimit can be used to set a limit on the resident memory consumption of the shell and the processes started by this shell:
$ ulimit -m 409600
$ lekker &
The bad news, however, is that this method only works with kernel version 2.4, not with kernel version 2.6 (where the value is a dummy).
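At system-call level, the limit that ulimit -m sets is RLIMIT_RSS. The Python sketch below uses the standard resource module to query it and to lower the soft limit to the 409600 KiB used above (setrlimit expects bytes, whereas ulimit -m counts KiB). According to the setrlimit(2) manual page this limit only has an effect on Linux 2.4.x with x < 30, so on a 2.6 kernel the call succeeds but nothing is enforced.

```python
import resource

LIMIT_KIB = 409600  # value used with "ulimit -m" above (KiB)

soft, hard = resource.getrlimit(resource.RLIMIT_RSS)
print("current soft limit:", "unlimited" if soft == resource.RLIM_INFINITY else f"{soft} bytes")

new_soft = LIMIT_KIB * 1024  # setrlimit() takes bytes
if hard != resource.RLIM_INFINITY:
    new_soft = min(new_soft, hard)  # an unprivileged process cannot exceed its hard limit
resource.setrlimit(resource.RLIMIT_RSS, (new_soft, hard))

print("new soft limit:", resource.getrlimit(resource.RLIMIT_RSS)[0], "bytes")
```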
But there is other good news (without related bad news this time):
The current 2.6 kernels introduce a new mechanism called control groups (cgroups). Via cgroups it is possible to partition a set of processes (threads) and specify certain resource limits for such a partition (container). A cgroup can be created for all kinds of resources, including memory. It is beyond the scope of this document to go into detail about cgroups, but a small example can already illustrate the power of this mechanism.
Cgroups are managed via a virtual filesystem, so first of all the cgroup filesystem (with option "memory") should be mounted on an arbitrary directory. This mount only has to be done once after boot, so it is convenient to specify it in your /etc/fstab file. Done by hand, it looks like this:
# mkdir /cgroups/memo
# mount -t cgroup -o memory none /cgroups/memo

To define a new memory cgroup for the leaking process(es):
1. Create a subdirectory below the mount point of the virtual cgroup filesystem:
# mkdir /cgroups/memo/leakers
   As soon as you create a subdirectory, it is magically filled with all kinds of pseudo files
   and subdirectories that can be used to control the properties of this cgroup.
2. One of the "files" in the newly created subdirectory is called memory.limit_in_bytes
   and can be used to set the total memory limit for all processes that will run in this cgroup:
# echo 420M > /cgroups/memo/leakers/memory.limit_in_bytes
3. Another "file" in the newly created directory is called tasks and can be used to specify the
   IDs of the processes/threads that must be part of the cgroup. If you assign a process to a cgroup,
   its descendants (started from then on) will also be part of that cgroup. Suppose that the leaking
   process lekker runs with PID 2627; it can be assigned to the cgroup leakers as follows:
# echo 2627 > /cgroups/memo/leakers/tasks
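The three steps can also be scripted. The Python sketch below is a hypothetical helper (confine is not an existing tool) that performs steps 1 to 3 for the memory cgroup interface used in this document. Against the real mount point (e.g. /cgroups/memo) it has to run as root; the usage example instead points it at a temporary directory as a dry run, where of course the kernel does not create any pseudo files.

```python
import os
import tempfile

def confine(pid: int, limit: str, mountpoint: str, group: str = "leakers") -> str:
    """Create a memory cgroup, set its limit and move a process into it (steps 1-3)."""
    path = os.path.join(mountpoint, group)
    os.makedirs(path, exist_ok=True)                                   # step 1: mkdir
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as f:
        f.write(limit)                                                 # step 2: set the limit
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))                                              # step 3: assign the PID
    return path

# Dry run against a temporary directory instead of the real cgroup mount point
with tempfile.TemporaryDirectory() as fake_mount:
    group_path = confine(2627, "420M", fake_mount)
    with open(os.path.join(group_path, "memory.limit_in_bytes")) as f:
        written_limit = f.read()
    with open(os.path.join(group_path, "tasks")) as f:
        written_tasks = f.read()

print(written_limit, written_tasks)  # 420M 2627
```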
Now the leaking process cannot use more than 420MiB of resident memory. When it runs, atop might show the following output.
The line labeled MEM shows that 1.9GiB of memory is free. On the other hand, the line labeled PAG shows that a lot of pages have been swapped out.
The process lekker has already grown to 3.6GiB of virtual memory (VSIZE), but it only uses 374MiB of resident memory (RSIZE). During the last sample, the process has even grown 132MiB virtually (VGROW), but it has shrunk 26MiB physically (RGROW). And, more importantly, the other processes are no longer harmed by the leaking process: their resident growth is not negative.
The leak is not fixed, but it is under control...
