5/24/2015

Reverse engineering Might and Magic III compression

Reverse engineering & programming blog

May 23, 2015 / ReWolf posted in programming, reverse engineering, source code, tools / No Comments
I'm not quite sure how I ended up deep inside the DOSBox debugger, going through 16-bit assembly and recovering the decompression routine used to handle the MM3.CC file, but it was definitely fun. I got the game from one of the recent Humble Bundles and somehow (this is the part that I'm missing) I found Jeff Ludwig's page. I read about his approach to modding Might and Magic III and the problems related to the compressed/encrypted MM3.CC data file. One of the phrases sounded like an invitation:

    "It turns out that this algorithm has been a particularly tough nut to crack, and no one has come up with a viable way of decrypting the data."

I recommend reading the whole story, as his method of dealing with this problem is also great. In this post I'll describe how I handled it; at the end there is a link to an open-source utility that can not only decompress, but also compress a valid MM3.CC file.

DOS Packer
A quick look at MM3.EXE reveals that it is a compressed DOS executable with an uncompressed overlay that starts with the FBOV magic. I have no idea about exe compressors from the DOS era, but I found that Jeff Ludwig used something called Universal Program Cracker version 1.11, coded by Synopsis. I found version 1.10 of this tool (! Release Date: 06 25 1997 !) and it successfully unpacked a working executable. It was even able to handle the overlay data properly. Even though I had the unpacked executable, I wanted to know the name of the packer used. One of my friends told me to try Detect It Easy and he was right, as it was able to detect:
EXECUTRIX COMPRESSOR()[by Knowledge Dynamics Corp]
Borland TLINK(2.0)[]

If any of you are interested in computer history, there are some old threads on Google Groups mentioning this piece of software (dating back to 1991 and 1995):
https://groups.google.com/forum/#!topic/comp.os.msdos.programmer/QsjHLY6Kb4s
https://groups.google.com/forum/#!topic/comp.compression/IAj2VHbtl4

IDA DOS loader
Having an unpacked executable is a good start, but having a proper disassembly is far more important. Unfortunately, IDA had a hard time loading this executable. It properly detects the overlay, but fails at loading it. After a quick look at the code, it turned out that analysis without the overlay would be rather painful, as there are obviously missing parts in the code (even though the decompression routine is stored inside the executable). I had to take another look at the IDA DOS loader and figure out how to load this overlay. Querying Google for "FBOV dos overlay" brings up the source code of the IDA DOS loader as the first result, which confirms that IDA should be able to load this type of overlay. I recompiled a debug version of the IDA DOS loader and traced it in Visual Studio to see why it fails. To understand where the problem was, I need to describe a few internals of the FBOV structure. The FBOV header is described by the structure below:
#define FB_MAGIC 0x4246
#define OV_MAGIC 0x564F

struct fbov_t
{
    ushort fb;          // = FB_MAGIC
    ushort ov;          // = OV_MAGIC
    uint32 ovrsize;
    uint32 exeinfo;
    int32 segnum;
};

exeinfo is an offset (absolute, counted from the beginning of the MZ header) to the array of structures that describe each segment stored in the overlay. segnum is the number of segments, each of which is described by the structure below:
struct seginfo_t
{
    ushort seg;
    ushort maxoff;
    ushort flags;
    ushort minoff;
};

That's the theory, and the IDA DOS loader implements it in the LoadCppOverlays() function. In the case of this executable the theory fails, but it fails only by a few bytes. During a debugging session I figured out that exeinfo points to the place just after the mentioned array of segments. I added one line to LoadCppOverlays():

fbov.exeinfo -= fbov.segnum * sizeof(seginfo_t);

This simple fix enabled IDA to load the overlay data properly and analyse the whole executable. I haven't found any FBOV documentation (neither official nor unofficial), so I can't confirm whether there are multiple different implementations of FBOV overlays. I'm fairly sure that the IDA DOS loader implements the proper version, as whoever wrote it probably based it on some real-life examples. Maybe the MM3 difference stems from some unpacking quirks, who knows.
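
To make the fix concrete, here is a minimal sketch in C of reading the FBOV header and locating the segment table under the MM3 quirk described above. It is not the IDA loader code: the helper names (rd16, rd32, parse_fbov, segtable_start_mm3) are made up for illustration, and the sketch assumes that the header fields are stored little-endian and tightly packed (16 bytes), as is usual for DOS executables.

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout as given in the post; names of the helpers are hypothetical. */
typedef struct {
    uint16_t fb;       /* 0x4246, bytes "FB" */
    uint16_t ov;       /* 0x564F, bytes "OV" */
    uint32_t ovrsize;
    uint32_t exeinfo;
    int32_t  segnum;
} fbov_t;

#define FB_MAGIC     0x4246u
#define OV_MAGIC     0x564Fu
#define SEGINFO_SIZE 8u        /* seginfo_t holds four 16-bit fields */

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the 16-byte header; returns 0 on success, -1 on short input or bad magic. */
int parse_fbov(const uint8_t *buf, size_t size, fbov_t *out)
{
    if (size < 16)
        return -1;
    out->fb      = rd16(buf);
    out->ov      = rd16(buf + 2);
    out->ovrsize = rd32(buf + 4);
    out->exeinfo = rd32(buf + 8);
    out->segnum  = (int32_t)rd32(buf + 12);
    if (out->fb != FB_MAGIC || out->ov != OV_MAGIC)
        return -1;
    return 0;
}

/* MM3 case: exeinfo points just past the segment table, so step back over it. */
uint32_t segtable_start_mm3(const fbov_t *h)
{
    return h->exeinfo - (uint32_t)h->segnum * SEGINFO_SIZE;
}
```

For a header with exeinfo = 0x2000 and segnum = 3, the table would start at 0x2000 - 3 * 8 = 0x1FE8.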

Locating Decompression
I won't go into much detail regarding the process of locating the decompression routine, as it was neither hard nor interesting. I used the DOSBox debugger and set a breakpoint on int 21h (the standard interrupt for the DOS API); the particularly interesting functions were 3Dh (open file), 3Fh (read file) and 42h (seek file). After a short period of time I was able to identify the decompression routine.

Algorithm Analysis
Nowadays most (de)compression-related reverse engineering work is done with the Hex-Rays Decompiler, so it is nothing particularly hard, although it still requires some patience to get everything right. Unfortunately there is no Hex-Rays for 16-bit x86 assembly and there probably won't be, as it's pretty much a dead platform these days. For me this analysis was like travelling back in time to 2005/2006, when I wrote my first static unpackers. It was the time of IDA 4.9, so there wasn't even an interactive graph view mode, which was first introduced in March 2006 (IDA 5.0 changelog). I'm mentioning interactive graph mode here on purpose, because (at least for me) it was a breakthrough technology that drastically sped up reverse engineering of various algorithms. Below you can find a graph overview of the decompressor:

[Figure: graph overview of the decompressor]

Purple blocks are responsible for algorithm initialization, brown blocks are the main decompression loop, and white ones are related to memory handling and partially to CC file structure parsing. Recovering an algorithm directly from assembly code usually consists of a few phases:
1. Gathering known data: the difficulty of this phase strongly depends on the complexity of the algorithm. One needs to gather information about input and output buffers, all possible temporary buffers, variables used throughout the function (local and global) and, last but not least, constant data that might also be used during some calculations. To simplify things, every CPU register should be assigned a variable with a name similar to the register (for example _ax, _bx, _cx etc.). The type of the variable should correspond to the size of the register; in this case it would be uint16_t. In some cases it is better to represent registers as unions, so that accessing the 8-bit parts of registers isn't too painful. Local and global variables can be tricky, especially at the beginning of the analysis, because one doesn't yet know whether a particular variable is a simple type that occupies 1/2/4 bytes of memory or maybe an array. If there are multiple memory accesses at addresses that are close to each other, it is usually safe to declare such memory as an array, and later split it into separate variables if there are no array-like accesses. This advice is especially useful for local variables, so all esp/ebp-related accesses should be handled through an array (let's call it _stack for simplicity). During this phase it is very important to initialize all known data.
2. Loop identification: this is the most interesting phase (at least for me). IDA's interactive graph view is probably one of the best tools for it. It is good to start from the simplest innermost loops and move up. Each loop can be given a different colour and grouped. Grouping loops simplifies the graph overview, and hiding grouped innermost loops helps with identifying the loops that are one level up, and so on.
3. Code rewriting: probably the most tedious part, rewriting each opcode or group of opcodes into high-level language statements, basic block by basic block. If all loops were properly identified it shouldn't be too hard, but like any other tedious task it is error-prone. The only part of the logic left is the conditional expressions, which are easy to translate to higher-level languages. It is good to mark processed basic blocks with a different colour to avoid later confusion (if, for example, the whole work is split across a few days).
4. Checking correctness: in most cases it will not work on the first run. Mistakes made during any of the first three stages are common and hard to spot. Fixing those problems usually requires long debugging sessions with two debuggers running side by side (original code versus the recovered one).
5. Code beautification: after successfully finishing phase 4 it is good to go through the whole code and finally resolve all uncertainty regarding variables, arrays and constants. It is also good to give proper names to variables and eliminate all unnecessary constructions that mimic assembly. Ideally, after this phase there shouldn't be any variables named after x86 registers or arrays simulating the stack. It should look like legitimate high-level code.
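
As an illustration of phases 1 and 3, the sketch below shows one possible way to model 16-bit registers as unions and to transcribe a few instructions literally. Everything in it is hypothetical: the type names (reg16_t, cpu_state_t), the size of the stack array and the four-instruction fragment are invented for the example and are not taken from MM3.EXE. The union layout assumes a little-endian host, so that the l member aliases the low byte of x. The register variables are named without the leading underscore here, because such identifiers are reserved at file scope in C.

```c
#include <stdint.h>

/* 16-bit register with access to its 8-bit halves.
   Assumes a little-endian host so that b.l is the low byte of x. */
typedef union {
    uint16_t x;
    struct { uint8_t l, h; } b;
} reg16_t;

typedef struct {
    reg16_t  ax, bx, cx, dx;
    uint16_t si, di, bp;
    uint8_t  stack[64];   /* locals reached through bp; the size is a placeholder */
} cpu_state_t;

/* Literal transcription of a made-up basic block:
       mov al, [si]
       xor ah, ah
       shl ax, 1
       add bx, ax
*/
void step_example(cpu_state_t *s, const uint8_t *mem)
{
    s->ax.b.l = mem[s->si];                   /* mov al, [si] */
    s->ax.b.h = 0;                            /* xor ah, ah   */
    s->ax.x   = (uint16_t)(s->ax.x << 1);     /* shl ax, 1    */
    s->bx.x   = (uint16_t)(s->bx.x + s->ax.x);/* add bx, ax   */
}
```

Code in this shape is easy to compare against a debugger register dump after every basic block; the beautification phase would later collapse it into something like `sum += 2 * input[pos]`.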
Back to Might and Magic III: I roughly followed all the steps and at the end I had a working decompressor. To avoid the mistakes mentioned in point 4, I kept a debugger open with the original code and checked whether each rewritten basic block generated the same output as the original one. I went through all the compressed streams from MM3.CC and I couldn't trigger one part of the decompression algorithm, so I temporarily left it empty. Red blocks on the graph below show the part that is never executed for the given game files.

[Figure: decompressor graph with the never-executed blocks marked in red]

At this point I started searching Google for the name of the algorithm. I was looking for hexadecimal constants that can be found in the disassembly, together with the word "decompress":
"0x13A" decompress
"0x4E6" decompress
"0x274" decompress
"0x139" decompress
"0xFC4" decompress    <- this one was a partial success

I found the source code of an unpacker for some old Amiga file format. Besides the 0xFC4 constant, it also contained constant tables that are present in the MM3 decompressor as well. It turned out that MM3 uses the LZHUF algorithm (original implementation). I used this knowledge to further beautify my reverse-engineered code (based on this implementation). I also copied the missing part of the algorithm (the red blocks) from this source code. The MM3 version of LZHUF is identical to the original one with just one small exception: instead of using the default 0x20 value to initialize the dictionary, it uses a value provided as an argument. This 8-bit value is different for every compressed stream stored in the MM3.CC file. I guessed that it might be the value of the most frequently occurring byte in the uncompressed stream, and I was right.
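
The difference can be sketched in a few lines of C. In the commonly circulated LZHUF.C the ring buffer has N = 4096 positions and a look-ahead of F = 60, and the first N - F = 4036 = 0xFC4 positions are pre-filled with a space (0x20), which is the constant that gave the search hit. The function names below (most_frequent_byte, init_dictionary) are illustrative and are not taken from the tool's source; how ties between equally frequent bytes are broken is also an assumption (the lowest byte value wins here).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LZHUF_N = 4096, LZHUF_F = 60 };   /* ring buffer size and look-ahead size */

/* Picks the byte value that occurs most often in the uncompressed data.
   Ties are resolved in favour of the lowest value (an assumption). */
uint8_t most_frequent_byte(const uint8_t *data, size_t size)
{
    size_t counts[256] = {0};
    for (size_t i = 0; i < size; i++)
        counts[data[i]]++;

    uint8_t best = 0;
    for (int v = 1; v < 256; v++)
        if (counts[v] > counts[best])
            best = (uint8_t)v;
    return best;
}

/* Pre-fills the first N - F (0xFC4) ring buffer positions.
   Stock LZHUF always passes ' ' (0x20); MM3 passes a per-stream value.
   ring must have room for at least LZHUF_N + LZHUF_F - 1 bytes. */
void init_dictionary(uint8_t *ring, uint8_t fill)
{
    memset(ring, fill, LZHUF_N - LZHUF_F);
}
```

A packer built this way would call most_frequent_byte() on each file before compressing it and store the result in the block descriptor, so that the decompressor can initialize its dictionary with the same value.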

MM3.CC Packer/Unpacker
Having all this work behind me, I wanted to finish it properly with a working tool that someone else could use in the future. The CC file format is described on the Xeen Wiki, but that description is valid only for CC files from Might and Magic IV and V. MM3.CC has a similar structure to its successors, but there are some differences regarding file name hashing and (of course) compression. The file header and table of contents are exactly the same as described on the Xeen Wiki:
struct FileEntry;

struct FileHeader
{
    uint16_t NumberOfFileEntries;
    FileEntry FileEntries[NumberOfFileEntries];
};

struct FileEntry
{
    uint16_t hash;
    uint16_t offsetLo;
    uint8_t offsetHi;
    uint16_t compressedSize;    // includes 4 bytes header
    uint8_t padding;
};
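
Because FileEntry is 8 bytes with an odd-aligned 16-bit field, reading it through a plain C struct is fragile unless packing is forced. A minimal sketch of decoding one entry byte by byte follows. It assumes the entry has already been decrypted, that multi-byte fields are little-endian, and that offsetHi supplies bits 16..23 of a 24-bit file offset; the names cc_entry_t and cc_parse_entry are made up for this example.

```c
#include <stdint.h>

#define CC_ENTRY_SIZE 8u   /* 2 + 2 + 1 + 2 + 1 bytes, no alignment padding */

typedef struct {
    uint16_t hash;
    uint32_t offset;           /* offsetLo | (offsetHi << 16) */
    uint16_t compressed_size;  /* includes the 4-byte block descriptor */
} cc_entry_t;

/* Decodes one already-decrypted 8-byte table entry (little-endian assumed). */
cc_entry_t cc_parse_entry(const uint8_t *p)
{
    cc_entry_t e;
    e.hash            = (uint16_t)(p[0] | (p[1] << 8));
    e.offset          = (uint32_t)p[2] | ((uint32_t)p[3] << 8) | ((uint32_t)p[4] << 16);
    e.compressed_size = (uint16_t)(p[5] | (p[6] << 8));
    /* p[7] is the padding byte and is ignored */
    return e;
}
```

Entry i of the table would then be decoded with `cc_parse_entry(table + i * CC_ENTRY_SIZE)`, where table points just past the 16-bit entry count.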

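The FileEntries array is encrypted with the algorithm below (the same as on the Xeen Wiki):
void encryptHeader(uint8_t* buf, size_t size)
{
    uint8_t key = 0xAC;
    for (size_t i = 0; i < size; i++)
    {
        // subtract the key, then rotate right (exact inverse of decryptHeader)
        buf[i] = _rotr8(buf[i] - key, 2);
        key += 0x67;
    }
}

void decryptHeader(uint8_t* buf, size_t size)
{
    uint8_t key = 0xAC;
    for (size_t i = 0; i < size; i++)
    {
        // rotate left, then add the running key
        buf[i] = _rotl8(buf[i], 2) + key;
        key += 0x67;
    }
}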
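
_rotr8 and _rotl8 are MSVC intrinsics, so the listing above does not build as-is with other compilers. The following sketch is a portable C variant of the same two routines with hand-written rotate helpers; the function names (rotl8, rotr8, cc_encrypt_table, cc_decrypt_table) are chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable 8-bit rotates standing in for the MSVC intrinsics. */
static uint8_t rotl8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}

static uint8_t rotr8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v >> n) | (v << ((8 - n) & 7)));
}

/* Same scheme as encryptHeader: subtract the running key, rotate right by 2. */
void cc_encrypt_table(uint8_t *buf, size_t size)
{
    uint8_t key = 0xAC;
    for (size_t i = 0; i < size; i++) {
        buf[i] = rotr8((uint8_t)(buf[i] - key), 2);
        key = (uint8_t)(key + 0x67);
    }
}

/* Same scheme as decryptHeader: rotate left by 2, add the running key. */
void cc_decrypt_table(uint8_t *buf, size_t size)
{
    uint8_t key = 0xAC;
    for (size_t i = 0; i < size; i++) {
        buf[i] = (uint8_t)(rotl8(buf[i], 2) + key);
        key = (uint8_t)(key + 0x67);
    }
}
```

Since each byte depends only on its position and the fixed starting key, running cc_encrypt_table followed by cc_decrypt_table on the same buffer restores the original bytes, which makes for a cheap self-check.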

Files inside the CC container are identified by a 16-bit hash (FileEntry.hash):
uint16_t hashFileName(const char* fileName)
{
    uint16_t hash = 0;
    while (0 != *fileName)
    {
        // characters in the 0x60..0x7F range (lowercase letters) are shifted down by 0x20
        uint8_t c = ((*fileName & 0x7F) < 0x60) ? *fileName : *fileName - 0x20;
        hash = _rotl16(hash, 9);    // xchg bl, bh | rol bx, 1
        hash += c;
        fileName++;
    }
    return hash;
}
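
The hash can be tried outside MSVC with a hand-written rotate. The sketch below is a portable C rendering of the routine above; rotl16 and cc_hash_name are names picked for this example. One visible property is that the hash ignores letter case, because lowercase letters are folded to uppercase before being added.

```c
#include <stdint.h>

/* Portable 16-bit rotate standing in for the MSVC _rotl16 intrinsic. */
static uint16_t rotl16(uint16_t v, unsigned n)
{
    n &= 15;
    return (uint16_t)((v << n) | (v >> ((16 - n) & 15)));
}

/* Same computation as hashFileName, written without compiler intrinsics. */
uint16_t cc_hash_name(const char *name)
{
    uint16_t hash = 0;
    for (; *name != 0; name++) {
        uint8_t ch = (uint8_t)*name;
        /* bytes whose low 7 bits fall in 0x60..0x7F are shifted down by 0x20 */
        uint8_t c = ((ch & 0x7F) < 0x60) ? ch : (uint8_t)(ch - 0x20);
        hash = rotl16(hash, 9);
        hash = (uint16_t)(hash + c);
    }
    return hash;
}
```

For example, "AB" hashes as follows: after 'A' the hash is 0x0041, rotating left by 9 gives 0x8200, and adding 'B' (0x42) yields 0x8242; "ab" produces the same value.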

The first two entries are special, uncompressed text files. Those two files have their sizes hardcoded in the MM3 executable, so it is better not to change them too much, as the game will always read the same number of bytes. All other entries are compressed blocks of data with a small descriptor at the beginning of each block:
struct
{
    uint16_t decompressionInitializer;
    uint16_t decompressedSize;
};

decompressionInitializer could be a uint8_t, as it always stores the same 8-bit value in both the high and low 8 bits. I'm not sure why it is stored like this. decompressedSize is stored as a big-endian value, which I find weird as well. Another strange thing is that after recompression with my tool, the MM3.CC file shrank by 33 kB. I've also prepared a list of file names gathered from MM3.EXE to decode proper file names during unpacking (the list isn't complete; I'm missing 15 names out of 556). That's pretty much it. Below you can find a link to the GitHub repository with the MM3.CC file packer/unpacker:
https://github.com/rwfpl/rewolf-mm3-dumper
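
The 4-byte block descriptor described above can be decoded as in the following sketch. It is an illustration rather than code from the repository: cc_block_info_t and cc_parse_block_header are hypothetical names, and rejecting a descriptor whose two initializer bytes differ is an extra sanity check based on the observation that both halves always hold the same value.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  init_byte;          /* value used to pre-fill the LZHUF dictionary */
    uint16_t decompressed_size;  /* stored big-endian in the file */
} cc_block_info_t;

/* Decodes the descriptor at the start of a compressed block.
   Returns 0 on success, -1 if the input is too short or the two
   initializer bytes disagree (sanity check, an assumption of this sketch). */
int cc_parse_block_header(const uint8_t *p, size_t size, cc_block_info_t *out)
{
    if (size < 4 || p[0] != p[1])
        return -1;
    out->init_byte         = p[0];
    out->decompressed_size = (uint16_t)((p[2] << 8) | p[3]);  /* high byte first */
    return 0;
}
```

The compressed LZHUF stream would then start at p + 4 and run for compressedSize - 4 bytes, since compressedSize in the file table includes the descriptor.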
Compression is taken from the standard LZHUF implementation with some small changes; decompression is the result of 2-3 days of reverse engineering, as described earlier. Both the compressor and the decompressor lack buffer security checks, so please don't use them for anything more serious than modding MM3. Usage is very simple:
x:\mm3>mm3_cc_dumper.exe
Might and Magic III CC file packer/unpacker v1.0
Copyrigh (c) 2015 ReWolf
http://blog.rewolf.pl
Usage:
Unpack: mm3_cc_dumper.exe dump input_file.cc
Pack: mm3_cc_dumper.exe pack input_directory output_file.cc

I hope all of you enjoyed this (maybe a bit too long) piece of history.