An Introduction to Analyzing Malicious Documents
By Alexander Hanel
Malware Analyst & Reverse Engineer
No Certificates
Overview
Introduction
Analysis Environment
Tools
Microsoft Office Documents
Adobe Portable Document Format
Adobe Flash SWF
Introduction
Why is malicious document analysis worth
learning?
Disclosed March 17, 2011
All because an employee saw this..
Images Courtesy of F-Secure's Blog
What is a malicious document?
It is a portable document that is capable of
executing code on behalf of an attacker.
How is the code executed?
Vulnerability
Adobe media.newPlayer() - CVE-2009-4324
Microsoft Word .rtf stack overflow - CVE-2010-3333
Abusing current functionality
Adobe /Launch Action
Help files
Intent?
Drop an executable file or download one
Analysis Environment
When do we set up the lab?
An analysis environment is an important
precursor for analyzing malicious documents.
How?
Many of the steps for setting up a malware lab
overlap with setting up an analysis
environment for malicious documents
Allocate physical or virtual machines
Isolate the environment
Install analysis tools
etc.
See Lenny Zeltser's site for more details.
Install Vulnerable Software
How?...
If we are going to be doing dynamic analysis
the environment will need vulnerable
software. The vulnerable software will be
installed on a Windows VM.
What is the standard vulnerable software
to have installed?
Adobe Reader
Pre-Adobe Reader X
Adobe Flash
Microsoft Office 2003
Microsoft Office 2007
Microsoft Office 2010
Windows Media Player
How?...
But which versions?
Adobe Reader
Fully patched
Never know when you get that 0day.
8.1.1
9.0.0
9.4.3
Adobe Flash
Fully patched
10.3.183.15
10.2.154.27
Microsoft Office 2003
Fully patched
Default Install
Office Service Pack 1
Office Service Pack 2
Office Service Pack 3
If in an enterprise environment, test with the current
patch level.
How?...
But which versions?...
Microsoft Office 2007
Fully patched
Default Install
Office Service Pack 1
Office Service Pack 2
Office Service Pack 3
Microsoft Office 2010
Fully patched
Default Install
Windows Media Player
Fully patched
Default Install with Operating System
VMware Snapshots are very useful when multiple
patches and versions are used for analysis.
Dynamic Analysis?
A fancy way of saying executing code or
simply "double clicking". Which seems to
work for users.
Static Analysis?
The opposite of dynamic analysis. Code is
manually analyzed.
The difference between static and dynamic
analysis is a big factor in how we will set up our
lab environment.
Via Dynamic Analysis, is it bad, yes or no?
Windows Operating System Loaded
Vulnerable Software Installed
Double Click
Something bad? Such as dropped files, network
traffic, a CPU spike, the application opening and closing, etc.
No?
Try another vulnerable version.
Via Static Analysis, is it bad, yes or no?
Linux Operating System
View file in hex-editor or file parser
Search or scan for known indicators or aberrant file
traces
Find anything bad? No, try dynamic analysis with
your current patch level.
Better to err on the side of caution.
Screenshot Dynamic Analysis - Windows XP
Linux Static Analysis Environment - REMnux
Tools
Malicious Documents Tool Types?
Parsers
Gives an easier view to navigate the overall structure
and layout of a document
example - pdf-parser.py
Scanners
Scans documents for indicators previously observed
example - yara or clamav
Viewers
Allow us to view the file to look for aberrant
indicators
example - hexdump -C bad.file
One-off scripts
Used to automate previous analysis
example - rtf.carv.py
Parser - Example Compressed Adobe SWF
CWS - Indicates zlib-compressed data. Due to the compression, no indicators of functionality can be
observed.
Using a parser that understands the compressed SWF file format, we can start to see patterns that hint at
functionality. If we go deeper, we could use a decompiler to view the ActionScript code.
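What the parser does under the hood can be sketched in a few lines of Python. This is a minimal sketch, not the code of any tool named in this deck; the function name inflate_swf is made up for illustration. It relies on the SWF header layout (3-byte signature, 1-byte version, 4-byte little-endian uncompressed length, then zlib data for CWS files).

```python
import struct
import zlib


def inflate_swf(data: bytes) -> bytes:
    """Return an uncompressed (FWS) copy of a zlib-compressed (CWS) SWF."""
    signature = data[:3]
    if signature == b"FWS":
        return data  # already uncompressed, nothing to do
    if signature != b"CWS":
        raise ValueError("not a FWS/CWS SWF file")
    # Bytes 4-7 hold the length of the *uncompressed* file (little-endian).
    declared_length = struct.unpack("<I", data[4:8])[0]
    # decompressobj tolerates trailing bytes after the zlib stream.
    inflater = zlib.decompressobj()
    body = inflater.decompress(data[8:]) + inflater.flush()
    inflated = b"FWS" + data[3:8] + body
    if len(inflated) != declared_length:
        # A mismatch is worth a second look during analysis, not fatal here.
        print("warning: declared %d bytes, got %d" % (declared_length, len(inflated)))
    return inflated
```

Once inflated, strings and tag data that were hidden by the compression become visible to a hex viewer or a scanner.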
Scanner - Example CVE-2012-0158
Yara rule for CVE-2012-0158
Using a scanner we can detect the exploit as CVE-2012-0158 rather than having to manually view it.
Scanners are extremely useful for creating custom detection and alerts.
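In practice this job belongs to yara or clamav, but the idea behind a scanner fits in a short Python sketch. Everything here is illustrative: the INDICATORS table and the scan function are hypothetical names, and the three patterns are simple examples taken from indicators mentioned in this deck, not a rule for CVE-2012-0158.

```python
import re

# Hypothetical indicator table for illustration only. Real rules should be
# built from samples you have actually analyzed.
INDICATORS = {
    # 16 or more ASCII "90" pairs in a row (a NOP slide written as text)
    "ascii_hex_nop_slide": re.compile(rb"(?:90){16,}"),
    # SWF signature followed by a low version byte
    "embedded_swf_header": re.compile(rb"[CF]WS[\x01-\x20]"),
    # PDF /Launch action keyword
    "pdf_launch_action": re.compile(rb"/Launch\b"),
}


def scan(data: bytes) -> dict:
    """Return {indicator name: [offsets]} for every indicator that matched."""
    hits = {}
    for name, pattern in INDICATORS.items():
        offsets = [m.start() for m in pattern.finditer(data)]
        if offsets:
            hits[name] = offsets
    return hits
```

The offsets tell you where to point the hex viewer next. Patterns this loose will produce false positives, which is why a hit is a lead and not a verdict.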
Viewer - Hexdump -C bad.file | less
Text editors are useless here because a binary file is full of non-printable bytes that they cannot display cleanly.
Viewers are more intimate and help with identifying aberrant sections of data. This can help with
identifying potential exploits or embedded areas that could be parsed such as an embedded TIFF.
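If hexdump is not available, a viewer is easy to approximate in Python. The sketch below is an assumption-laden stand-in (the function name hexdump is reused only for familiarity): it prints offset, hex bytes and a printable-ASCII column, similar in spirit to hexdump -C but not byte-for-byte identical to its output format.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render data as lines of: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Replace anything outside printable ASCII with a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append("%08x  %-*s  |%s|" % (offset, width * 3 - 1, hex_part, ascii_part))
    return "\n".join(lines)
```

Usage: print(hexdump(open("bad.file", "rb").read()[:256])) shows the first 256 bytes.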
One-Off Scripts - Example Shellcode Carver
Python, Perl, Ruby or C are your friends. One-off scripts will be needed to write decoders, carve out files,
decompress files, and handle many other one-off tasks that you may never see again. They are handy to
have around for reusing code and for just plain learning.
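As an example of a one-off decoder, here is a sketch that assumes an embedded executable was hidden with a single-byte XOR key. That encoding is an assumption for illustration; real samples vary. The names find_xor_key and xor_decode are hypothetical, and the marker is the standard DOS stub string found in most Windows executables.

```python
DOS_STUB = b"This program cannot be run in DOS mode"


def xor_decode(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_xor_key(data: bytes, marker: bytes = DOS_STUB):
    """Return (key, offset) for the first single-byte key that reveals marker.

    Returns None when no key in 1..255 produces the marker.
    """
    for key in range(1, 256):
        # Encode the marker instead of decoding the whole file 255 times.
        offset = data.find(xor_decode(marker, key))
        if offset != -1:
            return key, offset
    return None
```

With the key in hand, xor_decode on the surrounding bytes recovers the dropped file for further analysis.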
Microsoft Office Documents
Malicious Office Documents Overview
Binary DOC Format or Compound
Document File Format
Specifications released in 1997 then removed in 1999
from online downloads by Microsoft.
Stores data in streams, sectors, OLE (Object Linking
and Embedding)
Office Open XML Format (OOXML)
Developed in 2000; first used in Office XP for Excel
Became default starting with Microsoft Office 2007
OOXML is a ZIP container with XML markup
language (WordprocessingML).
Allows the storing of text, formatting, styles,
document metadata, images, macros, revisions, etc
Does not allow the storing of TrueType Fonts
First Exploit CVE-1999-1259 Office 98 Mac Edition
Microsoft Office Tools
officemalscanner by Frank Boldewin
cmd - officeMalScanner bad.doc scan brute debug
REMnux will need Wine installed
officecat by Sourcefire - last update 2008
cmd - officecat bad.doc
7z for OOXML
cmd - 7z e bad.docx -so > data.out (writes file to one
long data stream, useful for scanning)
hachoir-subfile to identify embedded files
cmd - hachoir-subfile bad.doc
OffVis by Microsoft - last updated 2009
GUI - Run in a VM
Hexdump
hexdump -C bad.doc | less
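Because OOXML is just a ZIP container, the 7z step in the list above can also be done with the Python standard library. A minimal sketch, assuming the whole file fits in memory; ooxml_to_stream is a made-up name for illustration.

```python
import io
import zipfile


def ooxml_to_stream(data: bytes) -> bytes:
    """Concatenate every member of an OOXML (ZIP) container into one byte string."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            # read() returns the decompressed content of each member
            parts.append(archive.read(info))
    return b"".join(parts)
```

Usage: ooxml_to_stream(open("bad.docx", "rb").read()) gives one long stream that can be handed to a scanner or a hex viewer.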
Microsoft Office Low Hanging Indicators
ASCII HEX Shellcode (nop slide)
The ASCII HEX blobs are not always bad but are still good spots to look for shellcode. The blob is read two
characters at a time and each pair is converted to one byte. Example: char '9' + char '0' becomes 0x90 = NOP
Embedded Adobe SWF
It's rare to see a valid use of Flash in a Microsoft Office document. Usually used by employees to hide
Flash games. xxxswf.py -x bad.doc can be used to carve and decompress the SWF from the document.
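The ASCII HEX conversion is easy to automate. The sketch below is a hypothetical helper (carve_hex_blobs and its thresholds are assumptions, not part of any tool in this deck): it finds long runs of hex characters, converts each pair to a byte, and flags blobs that contain a run of 0x90.

```python
import binascii
import re

# Runs of at least 32 hex pairs (64 characters). The threshold is arbitrary.
HEX_BLOB = re.compile(rb"(?:[0-9A-Fa-f]{2}){32,}")


def carve_hex_blobs(data: bytes, min_nops: int = 8):
    """Return a list of (offset, decoded bytes, has_nop_slide) for long hex runs."""
    results = []
    for match in HEX_BLOB.finditer(data):
        # '9' + '0' becomes the single byte 0x90, and so on for each pair.
        decoded = binascii.unhexlify(match.group())
        has_nop_slide = b"\x90" * min_nops in decoded
        results.append((match.start(), decoded, has_nop_slide))
    return results
```

One caveat: if a run starts on an odd character, the pairs are misaligned and the decoded bytes will be garbage, so it can be worth retrying from offset + 1.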
Portable Document Format
Portable Document Format Overview
Adobe Systems created the Portable Document Format
in 1993
PDF became an open standard on July 1, 2008
Adobe borrowed heavily from the PostScript page
description language
Designed to bundle all objects, fonts, images and other
related content into a single portable file format.
Allows for creators to create dynamic elements so the
users can interact with the document
Very rich feature set which supports many different
embedded file types (U3D, TIFF, ...), different compression
algorithms, a scripting language and many other features
PDF specifications are documented but Adobe Reader
isn't very strict when adhering to them
First Exploit outlook.pdfworm.txt aka Peachy August
2001
PDF Tools
pdf-parser.py by Didier Stevens
cmd - pdf-parser.py -f --raw bad.pdf > deflate.pdf
pdfid.py by Didier Stevens
cmd - pdfid.py bad.file
peepdf by Jose Miguel Esparza
cmd - peepdf -f -i bad.pdf
pdftk by pdflabs
cmd - pdftk bad.pdf output out.pdf uncompress
hachoir-subfile to identify embedded files
cmd - hachoir-subfile bad.pdf (best to do on deflate.pdf)
Hexdump
hexdump -C deflate.pdf | less
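To see why working on a deflated copy matters, here is a naive Python sketch that inflates FlateDecode streams. It is not how pdf-parser.py or pdftk work internally; inflate_streams is a hypothetical name, and the regex ignores /Length, object streams and chained filters, all of which real parsers handle.

```python
import re
import zlib

# Naive: everything between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf: bytes):
    """Return a list of (offset, inflated bytes) for streams zlib can decode."""
    inflated = []
    for match in STREAM.finditer(pdf):
        inflater = zlib.decompressobj()  # tolerates the trailing end-of-line bytes
        try:
            data = inflater.decompress(match.group(1)) + inflater.flush()
        except zlib.error:
            continue  # not FlateDecode, or another filter was applied first
        inflated.append((match.start(1), data))
    return inflated
```

Anything that comes back, JavaScript in particular, is worth reading before moving on to dynamic analysis.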
* Not cutting and pasting. Hexdump is your
friend
PDF Really Low Hanging Indicators
JavaScript with the variable name 'sc' *shakes head
sc = shellcode. No explanation needed.
Encoded Shellcode Patterns
Usually contained in a JavaScript function. Random chars inserted and removed to break signature
scanners.
Files named after functionality
PDF Low Hanging Indicators...
Embedded Adobe SWF
Multiple Compressions or Encodings used
Data sets with recurring patterns.
Commonly used with obfuscation.
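The "multiple compressions or encodings" indicator can be checked with a quick script. A minimal sketch, assuming the filters are listed in a plain /Filter [ ... ] array; chained_filters is a made-up name, and hex-escaped names such as /Fl#61teDecode are matched but not normalized.

```python
import re

FILTER_ARRAY = re.compile(rb"/Filter\s*\[([^\]]*)\]")
FILTER_NAME = re.compile(rb"/([A-Za-z0-9#]+)")


def chained_filters(pdf: bytes):
    """Return one list of filter names per object that chains two or more filters."""
    chains = []
    for match in FILTER_ARRAY.finditer(pdf):
        names = [n.decode("ascii") for n in FILTER_NAME.findall(match.group(1))]
        if len(names) > 1:
            chains.append(names)
    return chains
```

A single /FlateDecode is normal. Three or four filters stacked on one small stream is a good reason to look closer.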
Adobe SWF
Adobe Flash File Format - SWF
Created by FutureWave Software in 1996
which was bought by Macromedia in 1996
which was bought by Adobe in 2005.
Designed to display vector graphics, text,
video and sound over the network in simple
binary format.
Uses techniques such as bit-packing and
compression (zlib) to keep the file format
small
Supports Scriptability through the
ActionScript Language
Easy to embed in documents and webpages
Oldest Exploit CVE-2001-0166
SWF Tools
xxxswf.py by me
cmd - xxxswf.py -x bad.doc
Flash Decompiler Trillix by Eltima
GUI - Great ActionScript Decompiler - $
swfdump by Matthias Kramm
cmd - swfdump -aptdu bad.swf
Adobe SWF Investigator
GUI - Early stages of development
Hey Adobe! Make a cmd line version for PDFs. Us
researchers won't hate you so much.
vi for hexdump
vi bad.swf, :%!xxd, revert :%!xxd -r
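Carving a SWF out of a container document comes down to finding the signature and sanity-checking the header. The sketch below is not the logic of xxxswf.py; carve_swf and the max_version cutoff are assumptions for illustration. It keeps only hits whose declared length matches the data that follows.

```python
import re
import struct
import zlib

SWF_HEADER = re.compile(rb"[CF]WS")


def carve_swf(data: bytes, max_version: int = 50):
    """Return a list of (offset, uncompressed SWF bytes) found inside data."""
    carved = []
    for match in SWF_HEADER.finditer(data):
        start = match.start()
        header = data[start:start + 8]
        if len(header) < 8 or not 1 <= header[3] <= max_version:
            continue  # too short, or an implausible version byte
        length = struct.unpack("<I", header[4:8])[0]
        if header[:3] == b"FWS":
            body = data[start + 8:start + length]
        else:
            inflater = zlib.decompressobj()  # ignores bytes after the stream
            try:
                body = inflater.decompress(data[start + 8:]) + inflater.flush()
            except zlib.error:
                continue  # the signature was a coincidence
        if len(body) + 8 != length:
            continue  # declared length does not match, treat as a false hit
        carved.append((start, b"FWS" + header[3:8] + body))
    return carved
```

Usage: for offset, swf in carve_swf(open("bad.doc", "rb").read()), write each swf to disk and feed it to swfdump or a decompiler.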
SWF Low Hanging Indicators
ASCII HEX Shellcode
Functions named after exploitation
ActionScript formatting shellcode
Conclusion
This is a very basic introduction to analyzing
malicious documents. The best steps for getting
more efficient with this type of analysis are the
following:
a. Reading the different document file specifications (at
least the first five chapters)
b. Experimenting with the different tools and
questioning the output
c. Trying to write parsers or scanners to solve or detect
issues that might come up during analysis
Questions?
Contact
alexander.hanel@gmail.com