Understanding Process Memory (Win32)
Software Security Assessment
Lecture 0x02
Keith Makan @k3170makan

Before we start...some toolage!


You will need some tools:

Immunity Debugger http://www.immunitysec.com/products-immdbg.shtml
IDA community Edition https://www.hex-rays.com/products/ida/index.shtml
Windows XP 32-bit image (easy to obtain, will drop them at the campus this week)
Oracle VirtualBox (https://www.virtualbox.org/)

Some stuff you will probably need to read after this class:

http://rsquared.sdf.org/gdb/mlats.html
http://www.cs.nyu.edu/courses/fall04/V22.0201-003/ia32_chap_03.pdf
http://insecure.org/stf/smashstack.html
https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/

Why is this important?

It's fun to know how processes really work
You will probably get a lot better at debugging your programs
Needed for successful and meaningful exploitation!

Just like for SQL injection you need to know how an SQL statement works, for memory corruption you need to know how memory works.
Simple work cycle to becoming a memory corruption guru: learn a memory mechanism -> figure out how to corrupt it -> figure out ways to bend it to your will.
You are computer scientists, enough said!

Hang on, why Windows XP 32-bit?

1. Because it's ridiculously easy to exploit.
2. Because it's ridiculously easy to exploit.
3. Because it's ridiculously easy to exploit.
4. Because it's ridiculously easy to exploit.
5. Because it's ridiculously easy to exploit.
6. Because it's ridiculously easy to exploit.
7. Because it's ridiculously easy to exploit.
8. Because it's ridiculously easy to exploit.
9. Because it's ridiculously easy to exploit.
10. Because it's ridiculously easy to exploit.

Basic work cycle of an executable


1. Some idiot writes some code (.c, .cpp, etc.)
2. A compiler generates machine-dependent code (assembly, then object code)
3. A linker resolves references to libraries (.dll files on Windows, .so files on Linux)
4. A PE (Portable Executable) file is produced, or on Linux an ELF (Executable and Linkable Format) file
5. When the PE or ELF is executed, a memory loader maps the sections of the file into memory (.bss, .data, .text, .reloc, etc.)
   a. During this phase the OS makes space for the stack and heap memory
   b. Marks memory segments with the appropriate access rights
6. The operating system transfers control to the program's entry point in the .text section.

This is a gross oversimplification but the important parts are mentioned. It starts with code, then a PE
(ELF) file is created and this is used to construct a memory image.

Some important things to take note of

The compiler can only control certain attributes of an executable file's behaviour, generally speaking
The operating system can only control attributes that influence the process during its execution, generally speaking

The compiler cannot control where an executable is loaded into memory, for example (since it would need to know what is executing and what will execute to some extent), and an operating system cannot influence the contents of a process's code (it would need to be able to predict the outcome of the code without actually running it).

This difference in responsibility and functionality is important to understand since it determines where certain security protections are enforced (at the compiler vs. the operating system)!

A little about process memory

Stores the information needed to run a process.
Memory is read like a file (with random access)
Some parts hold instructions for the CPU to execute, i.e. code and libraries (.text *mostly*)
Other parts hold data targeted by computation (variables, static values, dynamically assigned variables)
Memory segments (areas of memory) are marked with access rights:
READ
EXECUTE
WRITE
or any combination of these (see the following screenshot, taken from Immunity Debugger)
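
As a rough illustration of segments carrying access rights, here is a minimal Python sketch. The segment names, base addresses and sizes in MEMORY_MAP are made-up values for illustration, not taken from a real process, and find_segment / is_allowed are hypothetical helper names.

```python
from enum import Flag, auto


class Access(Flag):
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


# Hypothetical memory map: (name, base address, size, access rights).
MEMORY_MAP = [
    ("stack", 0x0012E000, 0x2000, Access.READ | Access.WRITE),
    (".text", 0x00401000, 0x1000, Access.READ | Access.EXECUTE),
    (".data", 0x00402000, 0x1000, Access.READ | Access.WRITE),
]


def find_segment(address):
    """Return (name, rights) for the segment containing address, or None."""
    for name, base, size, rights in MEMORY_MAP:
        if base <= address < base + size:
            return name, rights
    return None


def is_allowed(address, wanted):
    """True if every right in `wanted` is granted at `address`."""
    segment = find_segment(address)
    return segment is not None and wanted in segment[1]


# Code in .text may be executed but, in this toy map, not written.
print(is_allowed(0x00401234, Access.EXECUTE))
print(is_allowed(0x00401234, Access.WRITE))
```

A real operating system enforces these rights in hardware through page protections; the lookup above only mirrors the idea of the memory map table.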

The practical picture

[Screenshot: Immunity Debugger memory map (loaded, not executing), annotated with the virtual memory offset, size of section, and access rights columns.]

A sample memory map (loaded executing)

[Diagram: process memory layout showing the stack, the program image, and DLLs (imported code).]

*stolen from corelan.be - because corelan is awesome!

.text?

Used to store code that will dictate the process's behaviour
READ ONLY, well, it's intended to be so
Corresponds to the .text area of the executable file (usually)

.text?
[Screenshot: disassembly of the .text section, annotated with the memory offset, raw opcodes, and instructions columns.]

.data?

Holds variables that are given non-zero initial values at compile time. This covers global variables and static local variables initialized this way (ordinary local variables live on the stack).
i.e. int one = 1; the variable one will have an address in the .data segment
i.e. char *string = "this exploit only works on my machine";
The .data section can hold both constant (immutable) values and non-constant variables
i.e. const int one = 1; this variable's value cannot be changed, but it is still initialized with a non-zero value, therefore it goes in the .data section

*some compilers prefer to use the .rdata section for non-mutable initialized data.

.data?

[Screenshot: dump of the .data section, annotated with variable offset addresses, string values initialized at compile time, and literal values encoded in hex.]
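
To get a feel for what such a dump contains, the following Python sketch shows how the two example values would look as raw bytes. It assumes a 32-bit little-endian int (as on x86) and a NUL-terminated C string; the variable names are only illustrative.

```python
import struct

# int one = 1;  stored as a 4-byte little-endian integer on x86
one = struct.pack("<i", 1)

# char *string = "...";  the characters are stored followed by a NUL byte
message = b"this exploit only works on my machine" + b"\x00"

print(one.hex(" "))          # 01 00 00 00
print(message[:8].hex(" "))  # bytes for "this exp"
```

Note that the least significant byte comes first, which is why the value 1 appears as 01 00 00 00 rather than 00 00 00 01 in a debugger's hex view.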

Example reference to .data


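[Screenshot: an instruction referencing the .data section, annotated with the data section contents and their interpretation, the data addresses, the stack addresses, and the values at those addresses.]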

Some good reading sources...


I've left out a few segments. If you're interested in the full story as far as executable formats go, check these links out:

http://www.csn.ul.ie/~caolan/pub/winresdump/winresdump/doc/pefile2.html
https://evilzone.org/tutorials/(paper)-portable-executable-format-and-itsrsrc-section/
http://msdn.microsoft.com/en-us/magazine/cc301805.aspx
http://msdn.microsoft.com/en-us/library/ms809762.aspx

Linux

http://www.skyfree.org/linux/references/ELF_Format.pdf
http://wiki.osdev.org/ELF
http://www.linuxjournal.com/article/1059

.stack?

Used as a scratch pad for local variables and switching execution between functions
Used to set up arguments to pass to called functions
Grows in size dynamically toward address 0x0
Works just like the stacks you learned about in Dodds course
Adapted from the stuff Turing wrote about: a Turing machine needs only a simple control unit and a tape (memory) to be able to compute anything computable => modern computers still rely on this fundamental principle.

The stack must provide a way for functions to call other functions and functions
to return to those that called them!
Have a little think about how you would have this work...
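
One way to picture a stack that grows toward address 0x0 is the toy Python model below. The starting address and the 4-byte slot size are assumptions chosen to resemble 32-bit x86; the Stack class is a hypothetical helper, not part of any tool mentioned here.

```python
class Stack:
    """Toy downward-growing stack with 4-byte slots."""

    def __init__(self, top=0x0012FFC0):
        self.memory = {}   # address -> 32-bit value
        self.esp = top     # address of the current top of the stack

    def push(self, value):
        self.esp -= 4                       # growing means a LOWER address
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4                       # shrinking means a HIGHER address
        return value


stack = Stack()
start = stack.esp
stack.push(0x41414141)
stack.push(0x42424242)
print(hex(start), hex(stack.esp))  # esp is now 8 bytes below where it started
```

The last value pushed is the first one popped, exactly as with any other stack; the only surprise is that "top" means the lowest address in use.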

Using the Stack

ESP (x86)/RSP (x64) register is used to point to the top of a function's stack
EBP (x86)/RBP (x64) register is used to point to the bottom of a function's stack
These registers are used as ways to dereference addresses of variables on the stack.
Facilitates function nesting and recursion
Can also sometimes be used as a place to store dynamic variables (with runtime-dependent size), i.e. emulate a heap!

How the stack works


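function A(){
    B();
}
function B(){
    C();
}
function C(){
    D();
}
main(){
    A();
}

[Diagram: the stacks of functions A, B, C and D laid out in call order. New stacks are added in the Growth direction as calls are made and removed in the Destruction direction as functions return. The Base Pointer marks the bottom and the Stack Pointer marks the top of the stack of the function currently executing.]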

From the code...


How functions set up their own stacks...

push ebp        #save the previous function's EBP

[Diagram: stack after push ebp, higher addresses first]
calling stack (bounded by calling ebp and calling esp)
Return Address
calling EBP
called stack*

*this stack is still to be set up; this diagram does not reflect its actual size but instead the space it will occupy

From the code


How functions set up their own stacks...

push ebp        #save the previous function's EBP
mov  ebp, esp   #copy the current value of ESP into EBP

[Diagram: stack after mov ebp, esp, higher addresses first]
calling stack
Return Address
calling EBP    <- called ebp, called esp*

*esp and ebp are equal, so effectively the called function's stack occupies 0 space at the moment

From the code...


How functions set up their own stacks...

push ebp        #save the previous function's EBP
mov  ebp, esp   #copy the current value of ESP into EBP
push ebx        #save the EBX value to the stack

[Diagram: stack after push ebx, higher addresses first]
calling stack
Return Address
calling EBP    <- called ebp
calling EBX    <- called esp

From the code...


How functions set up their own stacks...

sub  esp, 0Ch   #create space on the stack by subtracting 0xC = 12 bytes

[Diagram: stack after sub esp, 0Ch, higher addresses first]
Return Address
calling EBP    <- called ebp
calling EBX
called stack (0xC bytes)    <- called esp

Some notes

The last diagram indicates a fully set up stack, ready to rock!
Here the EBX on the stack is not important; it's merely saved/preserved as a convention of the specific function being called. For all intents and purposes it has nothing to do with setting up the stack in the classical sense and was lumped in as part of this example as pure happenstance.
We have left out a crucial part of this operation in order to simplify the explanation. If you've noticed this or have some questions, sit tight, we're not done ;)
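
The four set-up instructions walked through above can be replayed on a toy machine to check where everything ends up. This is only a sketch: the Machine class, the starting register values and the return address 0x004010C0 are all made up for illustration.

```python
class Machine:
    """Toy 32-bit machine: a few registers plus word-sized stack memory."""

    def __init__(self):
        self.reg = {"eip": 0, "esp": 0x0012FF80, "ebp": 0x0012FFC0, "ebx": 0}
        self.mem = {}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def prologue(m, local_bytes=0x0C):
    m.push(m.reg["ebp"])           # push ebp
    m.reg["ebp"] = m.reg["esp"]    # mov  ebp, esp
    m.push(m.reg["ebx"])           # push ebx
    m.reg["esp"] -= local_bytes    # sub  esp, 0Ch


m = Machine()
m.reg["ebx"] = 0xDEADBEEF          # some value the caller had in EBX
caller_ebp = m.reg["ebp"]
m.push(0x004010C0)                 # hypothetical return address, already on the stack
prologue(m)

print(hex(m.reg["ebp"]), hex(m.reg["esp"]))
```

After the prologue, the saved EBP sits exactly where the new EBP points, the return address is one slot above it, and the saved EBX is one slot below it.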

From the code... (cont.)


How functions destroy their own stacks...

add  esp, 0Ch   #add back the 12 bytes we allocated on the stack

[Diagram: stack after add esp, 0Ch, higher addresses first]
Return Address
calling EBP    <- called ebp
calling EBX    <- called esp

From the code... (cont.)


How functions destroy their own stacks...

pop  ebx        #restore the ebx value we saved

[Diagram: stack after pop ebx, higher addresses first]
Return Address
calling EBP    <- called ebp, called esp
calling EBX    (value placed in the EBX register)

From the code... (cont.)


How functions destroy their own stacks...

pop  ebp        #restore the caller's EBP value we saved

[Diagram: stack after pop ebp, higher addresses first]
calling stack    (new stack, restored to its original state; ebp again points to its bottom)
Return Address   <- esp
calling EBP      (old stack)
calling EBX      (old stack)

The saved EBP value lets us know where to place the bottom bound of the old stack.

From the code... (cont.)


How functions destroy their own stacks...

retn            #this instruction basically branches execution

[Diagram: stack at retn, higher addresses first]
calling stack    (new stack, restored to its original state)
Return Address   <- esp
calling EBP      (old stack)
calling EBX      (old stack)

The RETN instruction

Literally
pop eip

Used to branch execution after a function is done executing
Branches execution to whatever is on top of the stack! So this means it takes the value saved on top of the stack and places it inside eip.
The processor then expects to find instructions to execute at that address value, i.e. jmp *stack[0]: execute whatever stack+0 points to!
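
The tear-down sequence and the final RETN can be replayed the same way. The sketch below is a toy model under the same assumptions as the rest of this deck's example (12 bytes of locals, EBX saved); the Machine class, register values and the return address 0x004010C0 are hypothetical.

```python
class Machine:
    """Toy 32-bit machine: a few registers plus word-sized stack memory."""

    def __init__(self):
        self.reg = {"eip": 0, "esp": 0x0012FF80, "ebp": 0x0012FFC0, "ebx": 0}
        self.mem = {}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def epilogue(m, local_bytes=0x0C):
    m.reg["esp"] += local_bytes    # add  esp, 0Ch
    m.reg["ebx"] = m.pop()         # pop  ebx
    m.reg["ebp"] = m.pop()         # pop  ebp
    m.reg["eip"] = m.pop()         # retn  (behaves like "pop eip")


m = Machine()
m.reg["ebx"] = 0x11111111
before = dict(m.reg)               # the caller's registers before the call

m.push(0x004010C0)                 # hypothetical return address
m.push(m.reg["ebp"])               # the set-up steps: push ebp
m.reg["ebp"] = m.reg["esp"]        #                   mov ebp, esp
m.push(m.reg["ebx"])               #                   push ebx
m.reg["esp"] -= 0x0C               #                   sub esp, 0Ch
m.reg["ebx"] = 0x22222222          # the called function uses EBX for its own work

epilogue(m)
print(hex(m.reg["eip"]))           # execution continues at the return address
```

Everything the caller cared about (ESP, EBP, EBX) is back to its earlier value, and EIP holds whatever value was sitting in the return address slot.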

Hang on...

How does the processor know where to return to?


Where did this magic return address value come from? Who put it there?
What happens to the stack of the previous function when another one is
called?
All these details are hidden in the CALL opcode.
It's actually a shorthand for a bunch of operations (sort of)
Return is almost the perfect inverse of CALL.

The Call instruction explained...


1. Save the address of the instruction after the CALL (the current EIP plus one instruction) to the stack (so we know where to return to)
2. Load the called location into the EIP
3. Execute as normal...
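
The steps above can be sketched in Python as a push followed by a jump, with RETN as the matching pop. This is a toy model: the addresses are made up, and the default instruction length of 5 bytes assumes a near relative CALL (one opcode byte plus a 4-byte offset).

```python
class Machine:
    """Toy 32-bit machine: instruction pointer, stack pointer, stack memory."""

    def __init__(self, eip, esp=0x0012FF80):
        self.reg = {"eip": eip, "esp": esp}
        self.mem = {}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call(m, target, instruction_length=5):
    return_address = m.reg["eip"] + instruction_length
    m.push(return_address)     # 1. save where to come back to
    m.reg["eip"] = target      # 2. load the called location into EIP


def ret(m):
    m.reg["eip"] = m.pop()     # the inverse: pop the saved address into EIP


m = Machine(eip=0x00401000)    # hypothetical address of the CALL instruction
esp_before = m.reg["esp"]
call(m, 0x00401050)            # hypothetical address of the called function
saved = m.mem[m.reg["esp"]]
ret(m)
print(hex(saved), hex(m.reg["eip"]))
```

RETN trusts whatever value is on top of the stack at that moment; it has no way of knowing whether CALL really put it there.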

Stack after call and setup...


1. call instruction saves EIP (+1 instruction)
2. called function preserves the caller's EBP
3. called function makes space for its stack

Here we have an example where the function Immunity.004010BB has just been called, before the next function's prologue began executing.

[Screenshot: stack view showing the calling stack, the Return Address (saved return pointer pushed onto the stack), the calling EBP, and the called stack.]

practical example (all together now)


Example here shows a function's stack at a random snapshot during its execution.

[Screenshot: stack view annotated with:
Called Stack - set up after the call was made.
Calling EBP value - the EBP of the function that called this one
Return Address]

Some notes

I consider the return address, saved EBP and arguments part of the called function's stack
This doesn't really matter, and you could consider them part of the calling function's stack given that it is the calling function that loads these values
But it makes for an easier story to tell from the perspective of memory corruption, given that it is the called function that corrupts this data.
You could also think of the calling function as preparing the stack by placing these values in the called stack.

What about function arguments?

What happens when a function has arguments?


How is this handled according to the mechanics of the stack?

Well

All arguments passed to a callee (the function being called) need to be accessible locally (via its stack).
The calling function will push these arguments onto its stack in REVERSE order before making the call and branching execution to the callee. *on some architectures registers such as EDI and ESI are used to store pointers to arguments before the stack is used to store them when calling a function (optimization effort)*

And Then

The callee will reference these arguments outside of its stack by using registers (EBP, ESP)
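
A small Python sketch of this argument passing follows. It assumes 4-byte arguments and a callee that sets up EBP in the usual way, so that the first argument ends up at EBP+8 (just above the saved EBP and the return address). The Machine class, addresses and helper names are hypothetical.

```python
class Machine:
    """Toy 32-bit machine: a few registers plus word-sized stack memory."""

    def __init__(self):
        self.reg = {"eip": 0x00401000, "esp": 0x0012FF80, "ebp": 0x0012FFC0}
        self.mem = {}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value


def call_with_args(m, target, args, return_address):
    for value in reversed(args):   # caller pushes arguments in REVERSE order
        m.push(value)
    m.push(return_address)         # CALL saves the return address
    m.reg["eip"] = target          # and branches to the callee


def enter(m):
    m.push(m.reg["ebp"])           # push ebp
    m.reg["ebp"] = m.reg["esp"]    # mov  ebp, esp


def argument(m, index):
    # [ebp] = saved EBP, [ebp+4] = return address, [ebp+8] = first argument
    return m.mem[m.reg["ebp"] + 8 + 4 * index]


m = Machine()
call_with_args(m, 0x00401050, [10, 20, 30], return_address=0x00401005)
enter(m)
print(argument(m, 0), argument(m, 1), argument(m, 2))  # 10 20 30
```

Pushing in reverse order is what makes the first argument land closest to EBP, so the callee can find argument N at a fixed offset without knowing how many arguments were passed in total.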

References and further reading


Make sure to soak these up before the next lecture ;)

Smashing the stack in 2010 http://www.mgraziano.info/docs/stsi2010.pdf
Smashing the stack for fun and profit http://insecure.org/stf/smashstack.html
Exploit Writing tutorial part 1 : Stack Overflows https://www.corelan.be/index.php/2009/07/19/exploit-writing-tutorial-part-1-stack-based-overflows/
Part 1 : Introduction to Exploit Development https://www.fuzzysecurity.com/tutorials/expDev/1.html