
Descent into Darkness:

Understanding your system’s binary interface is the only way out.

joe damato
@joedamato
timetobleed.com
About Joe Damato

• ex-VMware, CMU alumnus
• memprof, ltrace libdl/libunwind patchset, REE/MRI thread implementation rewrite
• http://timetobleed.com
• @joedamato
Only have 30 minutes...
welcome to flight school.
No clue why this was accepted.

This talk will have about 5 lines of Ruby code.


Before we get started

I need to introduce you to a good friend of mine...


This talk is about how being evil is totally awesome.
Don’t do any of this,
ever.
The problem
• My ruby process is 700 megabytes. Why?
The problem

• It is very easy to leak references in your Ruby code.
• Leaking a reference to an object causes
that object and all objects it references to
stick around in memory.
The problem
• As long as someone, somewhere is
holding a reference to this instance
of classA, all the objects in this
picture cannot be freed.
• This could add up to a lot of
memory very fast.
• GC will scan each object every run
to see if it is time to free the object.
• This could add up to a lot of CPU
burned.
The problem
• But memory is cheap, who cares?
• Ruby’s GC is a naïve stop-the-world mark-and-sweep collector.
• The more objects that stick around in
memory the longer your GC runs take.
• The longer GC takes, the less time your
app has to run Ruby code.
The problem

• Eliminate leaked references, reduce the length of your GC runs, run more of your Ruby application code.
• Cool. But how can you track down reference leaks?
Problem Requirements
• I don’t want to apply
patches and rebuild
Ruby.
• I want to gem install,
require, and done.
• Anything else is too
much work.
Luckily, we can turn to evil.
Verbiage
• amd64 is a CPU spec that was proposed by AMD as a way to add 64bit support to x86.
• Intel Architecture 64 (IA64) spec is a completely new 64bit instruction set.
• amd64 != IA64
• Intel then decided to adopt AMD’s 64bit spec.
• They called it IA-32e, then EM64T, and finally Intel 64.
• Intel 64 ~= amd64
Verbiage

[diagram: amd64 and Intel64, overlapping]

Compilers generate code that uses the subset of the amd64 spec that both Intel and AMD comply with, usually called x86_64 or amd64.
WTF is an ABI?

• Application Binary Interface: “describes the low-level interface between a program and the operating system or another application.” (wikipedia)
WTF is an ABI?

• alignment
• calling conventions
• object file and library formats
• syscalls (how they work, where they live)
WTF is an ABI?
System V ABI (271 pages)
System V ABI AMD64 Architecture Processor
Supplement (128 pages)
System V ABI Intel386 Architecture Processor
Supplement (377 pages)

MIPS, ARM, PPC, and IA-64 too!


I brought copies of all three for everyone.

We will now read them together.

No. But let’s blaze through the important pieces now.
Evil Devices

• nm - dump symbol table
• objdump - disassemble lots of different objects. Can do lots, lots more.
• readelf - dump information about ELF files
• dwarfdump - dump debugging information
nm
% nm /usr/bin/ruby
000000000048ac90 t Balloc
0000000000491270 T Init_Array
0000000000497520 T Init_Bignum
000000000041dc80 T Init_Binding
000000000049d9b0 T Init_Comparable
000000000049de30 T Init_Dir
00000000004a1080 T Init_Enumerable
00000000004a3720 T Init_Enumerator
00000000004a4f30 T Init_Exception
000000000042c2d0 T Init_File
0000000000434b90 T Init_GC
(left column: symbol “value”; middle letter: symbol type, T = global text (code) symbol, t = local one; right column: symbol names)
objdump
% objdump -D /usr/bin/ruby
(output columns: offsets, opcodes, instructions, helpful metadata)

readelf
% readelf -a /usr/bin/ruby

This is a *tiny* subset of the data available


dwarfdump
% dwarfdump -a /usr/bin/ruby
Some friends
• Registers are important. They are small, fast
pieces of memory on the CPU.
• Some registers have a specific job:
• %rax - holds a return value
• %rip - instruction pointer
• Can refer to pieces of registers.
%rax uncensored
%rax = 64 bits, 8 bytes, 1 quadword
%eax = 32 bits, 4 bytes, 1 dword (lower 32 bits of %rax)
%ax = 16 bits, 2 bytes, 1 word (lower 16 bits of %eax)
%ah = 8 bits, 1 byte (upper 8 bits of %ax)
%al = 8 bits, 1 byte (lower 8 bits of %ax)
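To make the layout concrete, here is a small C sketch (struct rax_view and split_rax are made-up names for illustration) that slices a 64-bit value the same way the CPU names the pieces of %rax. It only does integer arithmetic; it does not read the real register.

```c
#include <assert.h>
#include <stdint.h>

/* The named pieces of a 64-bit %rax value, modeled with plain integers. */
struct rax_view {
    uint64_t rax; /* all 64 bits */
    uint32_t eax; /* lower 32 bits */
    uint16_t ax;  /* lower 16 bits */
    uint8_t  ah;  /* upper 8 bits of %ax */
    uint8_t  al;  /* lower 8 bits of %ax */
};

static struct rax_view split_rax(uint64_t rax)
{
    struct rax_view v;

    v.rax = rax;
    v.eax = (uint32_t)(rax & 0xffffffffu);
    v.ax  = (uint16_t)(rax & 0xffffu);
    v.ah  = (uint8_t)((rax >> 8) & 0xffu);
    v.al  = (uint8_t)(rax & 0xffu);

    /* sanity check: %ah and %al together make up %ax */
    assert(v.ax == (uint16_t)(((unsigned)v.ah << 8) | v.al));
    return v;
}
```

For example, split_rax(0x1122334455667788) gives eax 0x55667788, ax 0x7788, ah 0x77 and al 0x88.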
Some x86_64 asm notes
• Two different syntaxes: gas/att and intel.
• GDB disassembly is gas/att by default.
• set disassembly-flavor intel
• objdump is gas/att by default
• objdump -M intel
• I prefer gas/att.
Unless otherwise noted, everything will be in att/gas syntax.
Moving stuff

mov source, dest


mov $0,%rbx # move immediate (0) to register
movslq %eax,%rax # sign-extend eax into rax (plain mov needs operands of the same size)

source and dest cannot both be memory.
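A related gotcha when moving 32-bit values around, sketched in C under the assumption that you only care about the resulting 64-bit value: on x86_64, writing a 32-bit register zero-extends into the full 64-bit register (so `movl %eax,%eax` clears the upper half of %rax), while `movslq %eax,%rax` copies the sign bit up. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* %rax after `movl %eax,%eax`: a 32-bit register write zero-extends. */
static uint64_t rax_after_movl(uint32_t eax)
{
    return (uint64_t)eax;
}

/* %rax after `movslq %eax,%rax`: bit 31 is copied into bits 32..63. */
static uint64_t rax_after_movslq(uint32_t eax)
{
    uint64_t rax = eax;

    if (eax & 0x80000000u)
        rax |= 0xffffffff00000000ULL;
    return rax;
}
```

With eax = 0xffffffff the first gives 0x00000000ffffffff and the second 0xffffffffffffffff.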


Calling functions
• Lots of different ways to call functions.
• Two ways we care about (there are more):
callq *%rbx # indirect absolute
callq 0xdeadbeef # RIP relative with 32bit displacement
Calling convention (x86_64)
• Function arguments, from left to right, live in:
%rdi, %rsi, %rdx, %rcx, %r8, %r9
• That’s for INTEGER class items.
• Floating-point (SSE class) arguments go in %xmm0-%xmm7; other stuff, including integer arguments past the sixth, gets passed on the stack (like on i386).
• The end of the argument area must be aligned on a 16-byte boundary.
• Registers are either caller-saved or callee-saved.
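The register rule for INTEGER class arguments and the 16-byte alignment rule, as a rough C sketch. int_arg_register and align16 are illustrative names, and the sketch ignores SSE and memory-class arguments entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* INTEGER class arguments, left to right, per the System V AMD64 ABI. */
static const char *const int_arg_regs[] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Register for the n-th (0-based) INTEGER class argument, or NULL once the
   six registers are used up and the argument goes on the stack instead. */
static const char *int_arg_register(size_t n)
{
    size_t count = sizeof int_arg_regs / sizeof int_arg_regs[0];

    return n < count ? int_arg_regs[n] : NULL;
}

/* Round a byte count up to the next multiple of 16, the boundary the end
   of the argument area has to sit on. */
static uint64_t align16(uint64_t bytes)
{
    return (bytes + 15) & ~(uint64_t)15;
}
```

So a seventh integer argument has no register and is passed on the stack, and 24 bytes of stack arguments get padded out to 32.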
[disassembly of again(), intel syntax and att/gas syntax side by side]

int again(int amount)
{
  int ret = 0;
  ret = amount + 150;
  return ret;
}

• Save the old stack frame base pointer.
• Set the base pointer to the current stack pointer.
• *(rbp - 0x14) = amount; /* amount arrives in %edi, the first integer argument register */
• *(rbp - 0x4) = 0;
• eax = *(rbp - 0x14);
• eax = eax + 0x96; /* 0x96 = 150 :P */
• *(rbp - 0x4) = eax; /* not needed */
• eax = *(rbp - 0x4); /* not needed */
• Restore the stack pointer and old base pointer.
• Return from the function.
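One way to check the walkthrough is to transliterate each annotated step back into C. This is a sketch, not compiler output: again_by_hand is a hypothetical name, the frame[] array is a made-up stand-in for the stack memory below %rbp, and it assumes 4-byte ints, which is what the 0x4 and 0x14 offsets imply.

```c
#include <assert.h>
#include <string.h>

/* Hand transliteration of the unoptimized again(), one statement per step.
   frame[] stands in for stack memory; rbp points at its top. */
static int again_by_hand(int amount)
{
    unsigned char frame[0x20];
    unsigned char *rbp = frame + sizeof frame;  /* set the base pointer */
    int eax;
    int zero = 0;

    assert(sizeof(int) == 4);  /* the 0x4 and 0x14 offsets assume 4-byte ints */

    memcpy(rbp - 0x14, &amount, sizeof amount);  /* *(rbp - 0x14) = amount */
    memcpy(rbp - 0x4, &zero, sizeof zero);       /* *(rbp - 0x4) = 0 */
    memcpy(&eax, rbp - 0x14, sizeof eax);        /* eax = *(rbp - 0x14) */
    eax = eax + 0x96;                            /* eax = eax + 150 */
    memcpy(rbp - 0x4, &eax, sizeof eax);         /* *(rbp - 0x4) = eax (not needed) */
    memcpy(&eax, rbp - 0x4, sizeof eax);         /* eax = *(rbp - 0x4) (not needed) */
    return eax;                                  /* the return value travels in %eax */
}
```

again_by_hand(1) returns 151, the same as again(1); the two "not needed" round trips are exactly what an optimizing compiler would drop.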
ELF Objects
• ELF objects have headers
• elf header (describes the elf object)
• program headers (describe segments)
• section headers (describe sections)
• memprof uses libelf to wander the elf object
extracting information.
• the executable and each .so has its own set of data
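memprof walks ELF objects with libelf. As a rough illustration of what the ELF header gives you, here is a sketch that instead uses the raw structures from glibc's <elf.h> (no libelf) to find where the program and section header tables live. read_elf64_header and struct elf_summary are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* The parts of the ELF header that point at the other header tables. */
struct elf_summary {
    uint16_t phnum;  /* number of program headers (segments) */
    uint16_t shnum;  /* number of section headers (sections) */
    uint64_t phoff;  /* file offset of the program header table */
    uint64_t shoff;  /* file offset of the section header table */
};

/* Returns 0 and fills *out if buf starts with a 64-bit ELF header, -1 otherwise. */
static int read_elf64_header(const unsigned char *buf, size_t len,
                             struct elf_summary *out)
{
    Elf64_Ehdr eh;

    assert(buf != NULL && out != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);  /* copy out to avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                /* missing "\177ELF" magic */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                /* 32-bit objects use Elf32_Ehdr instead */

    out->phnum = eh.e_phnum;
    out->shnum = eh.e_shnum;
    out->phoff = eh.e_phoff;
    out->shoff = eh.e_shoff;
    return 0;
}
```

On Linux you could feed it the first bytes of /usr/bin/ruby (or /proc/self/exe) read with fread; each .so would need the same treatment separately.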
Sections that matter to memprof
• .text - code lives here
• .plt - stub code that helps to “resolve”
absolute function addresses.
• .got.plt - absolute function addresses; used
by .plt entries.
plt

• Procedure Linkage Table (plt) is used to find functions in shared libraries at runtime.
• Shared libraries are position independent and can be mapped anywhere in the address space.
Um, what does this have
to do with Ruby?
The ingredients for evil

• we know the x86_64 ABI
• we know how ELF objects work
• we know ruby calls functions in the VM to allocate and free objects (rb_newobj, add_freelist)
You won’t.
Let’s combine all of this knowledge and ...
Rewrite the Ruby VM in memory
while it is running.
Hook rb_newobj
• The Ruby VM calls rb_newobj to allocate a
new object.
• We’ll need to know when this happens so
we can track objects.
• Let’s scan the Ruby binary in memory and
rewrite all function calls to rb_newobj to
call a handler function instead.
Hook rb_newobj
(objdump output)
412d16: e8 c1 36 02 00 callq 4363dc # <rb_newobj>
412d1b: .....

• 412d16: address of this instruction
• e8: call opcode*
• c1 36 02 00: 32bit displacement to the target function from the next instruction.
Hook rb_newobj
(objdump output)
412d16: e8 c1 36 02 00 callq 4363dc # <rb_newobj>
412d1b: .....

x86 is little endian, so the displacement bytes c1 36 02 00 read as 0x000236c1:

412d1b + 000236c1 = 4363dc
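The same arithmetic in C, as a sketch (call_rel32_target is an illustrative name): assemble the four displacement bytes little-endian, sign-extend them, and add them to the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Decodes a 5-byte `call rel32` (opcode 0xe8) at insn_addr and returns the
   absolute target. The displacement is relative to insn_addr + 5. */
static uint64_t call_rel32_target(const unsigned char insn[5], uint64_t insn_addr)
{
    uint32_t raw;
    int64_t disp;

    assert(insn[0] == 0xe8);  /* only handles call rel32 */
    raw = (uint32_t)insn[1]
        | (uint32_t)insn[2] << 8
        | (uint32_t)insn[3] << 16
        | (uint32_t)insn[4] << 24;  /* little endian */
    /* sign-extend: displacements can point backwards */
    disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL : (int64_t)raw;
    return insn_addr + 5 + (uint64_t)disp;
}
```

Feeding it the bytes e8 c1 36 02 00 at 0x412d16 reproduces 0x4363dc.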


Hook rb_newobj
Overwrite the displacement so that all calls
to rb_newobj actually call a different function
instead.

It may look like this:


VALUE other_function(void)
{
  VALUE new_obj = rb_newobj();
  /* set up tracking of new_obj */
  return new_obj;
}
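Rewriting the call is the same arithmetic run backwards. A minimal sketch, assuming the call instruction has already been located and copied into a writable buffer; retarget_call_rel32 is a hypothetical name, and patching live code would additionally require making the page writable (for example with mprotect).

```c
#include <assert.h>
#include <stdint.h>

/* Rewrites the displacement of a 5-byte `call rel32` at insn_addr so it calls
   new_target instead. Returns 0 on success, -1 if new_target is too far away
   to encode in a signed 32-bit displacement. Works on a buffer only. */
static int retarget_call_rel32(unsigned char insn[5], uint64_t insn_addr,
                               uint64_t new_target)
{
    uint64_t next = insn_addr + 5;  /* displacement is relative to this */
    uint32_t raw;

    assert(insn[0] == 0xe8);        /* only handles call rel32 */
    if (new_target >= next) {
        if (new_target - next > (uint64_t)INT32_MAX)
            return -1;
        raw = (uint32_t)(new_target - next);
    } else {
        uint64_t back = next - new_target;
        if (back > (uint64_t)INT32_MAX + 1)
            return -1;
        raw = 0u - (uint32_t)back;  /* two's complement of the backward distance */
    }
    insn[1] = (unsigned char)(raw & 0xff);  /* little endian */
    insn[2] = (unsigned char)((raw >> 8) & 0xff);
    insn[3] = (unsigned char)((raw >> 16) & 0xff);
    insn[4] = (unsigned char)((raw >> 24) & 0xff);
    return 0;
}
```

The range check matters: a handler in a shared library mapped far from the Ruby binary may simply be out of reach of a rel32 call.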
Doesn’t work for all

• That trick only works for Ruby built with --disable-shared (no libruby.so).
• Ruby built with --enable-shared (with libruby.so) doesn’t work like that.
• Code in libruby.so calls rb_newobj via the PLT.
How the plt works
.got.plt entry: 0x7ffff7afd6e6
Initially, the .got.plt entry contains the address of the instruction after the jmp.
How the plt works
.got.plt entry: 0x7ffff7afd6e6
An ID is stored and the rtld (runtime linker) is invoked.
How the plt works
.got.plt entry: 0x7ffff7b34ac0
rtld writes the address of rb_newobj to the .got.plt entry.
Now that .got.plt is filled in, calls to the PLT entry jump immediately to rb_newobj.
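The lazy-binding sequence above can be mimicked with a function pointer. This is a toy model in plain C, not how ld.so is implemented: got_slot plays the .got.plt entry, resolve_newobj plays the rtld, plt_newobj plays the PLT stub, and real_newobj is a hypothetical stand-in for rb_newobj.

```c
#include <assert.h>

typedef long (*newobj_fn)(void);

static long real_newobj_calls;  /* how many times the "real" function ran */
static long resolver_calls;     /* how many times the "rtld" ran */

/* Hypothetical stand-in for rb_newobj. */
static long real_newobj(void)
{
    return ++real_newobj_calls;
}

static long resolve_newobj(void);

/* Plays the .got.plt entry: starts out pointing at the resolver path. */
static newobj_fn got_slot = resolve_newobj;

/* Plays the rtld: write the real address into the slot, then finish the call. */
static long resolve_newobj(void)
{
    resolver_calls++;
    assert(got_slot == resolve_newobj);  /* only reached while unresolved */
    got_slot = real_newobj;
    return got_slot();
}

/* Plays the PLT stub: always an indirect jump through the slot. */
static long plt_newobj(void)
{
    return got_slot();
}
```

The first plt_newobj() call goes through the resolver; every later call jumps straight to real_newobj, which is exactly the slot memprof overwrites.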
Hook the GOT

Redirect execution by overwriting the .got.plt entry for rb_newobj with a handler function instead.
Hook the GOT

VALUE other_function(void)
{
  VALUE new_obj = rb_newobj();
  /* set up tracking of new_obj */
  return new_obj;
}

.got.plt entry: 0xdeadbeef
WAIT... other_function() calls rb_newobj(). Isn’t that an infinite loop?

NO, it isn’t. other_function() lives in memprof.so, so its calls to rb_newobj() use the .plt/.got.plt in memprof.so.

As long as we leave memprof.so unmodified, we’ll avoid an infinite loop.
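A toy model of the GOT hook and of why it does not loop. All names are hypothetical stand-ins: each "shared object" gets its own function-pointer slot, the hook overwrites only libruby's slot, and other_function calls through memprof's untouched slot.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long VALUE;        /* stand-in for Ruby's VALUE */
typedef VALUE (*newobj_fn)(void);

static VALUE next_object = 0x1000;  /* fake object addresses */
static size_t tracked;              /* objects seen by the hook */

/* Hypothetical stand-in for the VM's rb_newobj. */
static VALUE fake_rb_newobj(void)
{
    VALUE obj = next_object;
    next_object += 0x28;
    return obj;
}

/* libruby.so's .got.plt slot for rb_newobj: this is the one we patch. */
static newobj_fn libruby_got_rb_newobj = fake_rb_newobj;

/* memprof.so's own slot: never patched, so the handler cannot re-enter itself. */
static newobj_fn memprof_got_rb_newobj = fake_rb_newobj;

static VALUE other_function(void)
{
    VALUE new_obj = memprof_got_rb_newobj();  /* reaches the real allocator */
    tracked++;                                /* set up tracking of new_obj */
    return new_obj;
}

/* Allocation site inside "libruby.so": always calls through its slot. */
static VALUE vm_allocate(void)
{
    return libruby_got_rb_newobj();
}

static void hook_got(void)
{
    libruby_got_rb_newobj = other_function;   /* overwrite the .got.plt entry */
}
```

If other_function called through libruby_got_rb_newobj instead, it would call itself forever; keeping the two slots separate is the whole trick.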
Hook add_freelist
• We’re now tracking objects at the time of
creation.
• In order to find leaks we need to track when
objects get freed too.
• add_freelist is called in the VM when an
object is freed.
• Why not just overwrite call instructions or
hook the GOT?
Hook add_freelist
• Can’t because add_freelist is inlined:
static inline void
add_freelist(p)
RVALUE *p;
{
p->as.free.flags = 0;
p->as.free.next = freelist;
freelist = p;
}

• The compiler has the option of inserting the instructions of this function directly into the callers.
• If this happens, you won’t see any calls.
So... what now?
• Look carefully at the generated code:
static inline void
add_freelist(p)
RVALUE *p;
{
p->as.free.flags = 0;
p->as.free.next = freelist;
freelist = p;
}

• Notice that freelist gets updated.
• freelist has file level scope.
• hmmmm......
A (stupid) crazy idea
• freelist has file level scope, so it lives at some
static address.
• add_freelist updates freelist, so...
• Why not search the binary for mov instructions that have freelist as the target?
• Overwrite that mov instruction with a call to
our code!
• But... we have a problem.
• The system isn’t ready for a call instruction.
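The search step from this slide could look roughly like this in C. It is a sketch under a strong assumption: that the compiler stored to freelist with `mov %rax, disp32(%rip)` (bytes 48 89 05 followed by a 32-bit displacement). find_rip_store is a hypothetical name, and a real tool would need to handle other registers and addressing modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scans a copy of .text (loaded at text_addr) for `mov %rax, disp32(%rip)`
   and returns the offset of the first one that stores to target_addr, or -1.
   A naive byte scan can also match bytes in the middle of other
   instructions; a real tool would want a disassembler. */
static long find_rip_store(const unsigned char *text, size_t len,
                           uint64_t text_addr, uint64_t target_addr)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i + 7 <= len; i++) {
        uint32_t raw;
        int64_t disp;
        uint64_t next;

        if (text[i] != 0x48 || text[i + 1] != 0x89 || text[i + 2] != 0x05)
            continue;
        raw = (uint32_t)text[i + 3] | (uint32_t)text[i + 4] << 8
            | (uint32_t)text[i + 5] << 16 | (uint32_t)text[i + 6] << 24;
        disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL : (int64_t)raw;
        next = text_addr + i + 7;  /* RIP-relative: measured from the next instruction */
        if (next + (uint64_t)disp == target_addr)
            return (long)i;
    }
    return -1;
}
```

The address of freelist itself would come from the symbol table, which is one reason memprof needs binaries that are not stripped.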
Isn’t ready? What?
• The 64bit ABI says that the stack must be aligned to a 16-byte boundary after any/all arguments have been arranged.
• Since the overwritten instruction is just some random mov, there is no way to guarantee that the stack is aligned.
• If we just plop in a call instruction, we won’t be able to arrange for arguments to get put in the right registers.
• We must save caller-saved registers.
• So now what?
jmp
• Can use a jmp instruction.
• call saves a return address.
• jmp does not.
• Transfer execution to an assembly stub that sets the system up according to the ABI.
• Then do the call to the C handler.
• Don’t forget to jmp back when the handler is done!
This instruction updates the freelist and comes from add_freelist:

We can’t overwrite it with a call instruction because the state of the system is not ready for a function call.

Instead, overwrite it with a jmp to the address of the assembly stub. When the stub is done, it must jump back to the instruction right after the one we overwrote.

The jmp instruction and its offset are 5 bytes wide. We can’t grow or shrink the binary, so we insert 2 one-byte NOPs.
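The patch itself, sketched in C on a buffer. overwrite_with_jmp is an illustrative name, and the 7-byte instruction in the usage below is an assumption that matches the 5 + 2 bytes described above. The stub would jump back to insn_addr + insn_len when the handler is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwrites an insn_len-byte instruction at insn_addr with `jmp rel32`
   (opcode 0xe9, 5 bytes) to stub_addr and pads the rest with one-byte NOPs
   (0x90), so the code neither grows nor shrinks. Returns 0 on success, -1 if
   the instruction is shorter than 5 bytes or the stub is out of rel32 range.
   Buffer only: live code would also need mprotect and care with threads. */
static int overwrite_with_jmp(unsigned char *insn, size_t insn_len,
                              uint64_t insn_addr, uint64_t stub_addr)
{
    uint64_t next = insn_addr + 5;  /* rel32 is relative to the end of the jmp */
    uint32_t raw;
    size_t i;

    assert(insn != NULL);
    if (insn_len < 5)
        return -1;
    if (stub_addr >= next) {
        if (stub_addr - next > (uint64_t)INT32_MAX)
            return -1;
        raw = (uint32_t)(stub_addr - next);
    } else {
        uint64_t back = next - stub_addr;
        if (back > (uint64_t)INT32_MAX + 1)
            return -1;
        raw = 0u - (uint32_t)back;
    }

    insn[0] = 0xe9;                                         /* jmp rel32 */
    for (i = 0; i < 4; i++)
        insn[1 + i] = (unsigned char)((raw >> (8 * i)) & 0xff);  /* little endian */
    for (i = 5; i < insn_len; i++)
        insn[i] = 0x90;                                     /* nop */
    return 0;
}
```

For a 7-byte store at 0x400000 and a stub at 0x400105, this writes e9 00 01 00 00 90 90.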
assembly stub (slightly abbreviated)

void handler(VALUE freed_object)
{
  mark_object_freed(freed_object);
  return;
}
Sample Output
require 'memprof'
Memprof.start
require "stringio"
StringIO.new
Memprof.stats

(columns: object count, then file:line number:class name)
108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
14 test2.rb:3:String
2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
1 test2.rb:4:StringIO
1 test2.rb:4:String
1 test2.rb:3:Array
1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
Or just track a block:
require 'memprof'
Memprof.start
Memprof.track('/tmp/file') {
  do_something
}

Or dump the entire heap as JSON:
require 'memprof'
Memprof.start
do_stuff
Memprof.dump_all('/tmp/file')
Middleware
• Use memprof as middleware
• Get per-request object count information
rails 3, environment.rb:
require 'memprof/middleware'
MyApp::Application.configure do
config.middleware.use Memprof::Middleware
end
569 lib/ruby/1.8/yaml.rb:133:String
528 gems/sequel-3.9.0/lib/sequel/model/base.rb:393:__node__
522 gems/haml-2.2.20/lib/haml/precompiler.rb:545:String
522 gems/haml-2.2.20/lib/haml/helpers.rb:135:String
522 gems/haml-2.2.20/lib/haml/helpers.rb:135:ActiveSupport::SafeBuffer
507 gems/haml-2.2.20/lib/haml/precompiler.rb:317:String
488 gems/sequel-3.9.0/lib/sequel/adapters/mysql.rb:410:String
445 lib/ruby/1.8/yaml.rb:133:YAML::Syck::Node
432 gems/haml-2.2.20/lib/haml/precompiler.rb:566:String
406 gems/sequel-3.9.0/lib/sequel/model/base.rb:392:__node__
memprof.com
memprof limitations
• only works on amd64 Linux and Snow Leopard
• only works with MRI and REE 1.8
• only works on binaries that are NOT STRIPPED.
• OSX System Ruby is NOT supported (yet).
• support for EY rubies is forthcoming - you will
have to install -dbg packages, though.
More evil is brewing
• We have some crazy, scary, stupid ideas that
we think you’ll love.
• Stay tuned to find out what they are.
• 1.9 support is one of the ideas.
Use RVM.
This would have been really hard to test on
all the different Ruby binaries without RVM.
Use it. Donate money. (Not my project).
http://rvm.beginrescueend.com/
Get memprof
• This talk was about the memprof Ruby gem
which is free and provides text output.

• github.com/ice799/memprof
• gem install memprof
• #memprof on irc.freenode.net
• memprof.com is separate and visualizes the
output from the memprof gem.
• memprof.com is in alpha.
Special Thanks
• Aman Gupta (@tmm1) - web ui, json output, and
much more
• Jake Douglas (@jakedouglas) - mach-o layer, bugfixes,
and more.
• Brian Lopez (@brianmario) - because he’s cool.
• Brian Mitchell (@binary42) - for convincing me to do
this by telling me I wouldn’t and was too scared.
Questions?
@joedamato
timetobleed.com
github.com/ice799
