
Technical Note

Virtual Disk Format 5.0


VMware ESXi and Hosted Products

This document describes the virtual machine disk (VMDK) format and contains the following sections:

Virtual Disks for Virtual Machines
Layout Basics
The Descriptor File
Simple Extents
ESXi Host Sparse Extents
Stream-Optimized Compressed
Glossary

Virtual Disks for Virtual Machines


When a virtual machine's operating system reads and writes to virtual disk, it uses the same interfaces as for
physical disk. VMware designed the VMDK (virtual machine disk) format to mimic the operation of physical
disk. Virtual disks are stored as one or more VMDK files on the host computer or remote storage device, and
appear to the guest operating system as standard disk drives.

VMware platform products all support the VMDK format, with slight variations.

Hosted platform products such as VMware Workstation or VMware Fusion store VMDK files on a file system
provided by an underlying host operating system, either Windows, Linux, or Mac OS X.

Datacenter platform products store VMDK files either on the local storage of an ESXi host, or on a
network-connected storage device. On ESXi hosts, VMDK files are usually stored on VMFS (virtual machine
file system) partitions, optimized for large file storage, but can also be stored on NAS partitions (NFS).

VMFS-3 was introduced in ESX 3.0 and is still supported for ESX/ESXi 4.0 and 4.1. VMFS-4 was never released.
VMware introduced VMFS-5 in vSphere 5, with the enhancements shown in Table 1.

Table 1. Comparison of VMFS-3 and VMFS-5


VMFS-3 | VMFS-5
The largest extent for a disk volume was 2TB. | Extent limit increased to ~60TB, for larger volumes including RDM.
MBR (master boot record) partition type. | GPT (GUID partition table) supports larger extents.
Block size was 1, 2, 4, or 8MB for very large files. | Unified 1MB block size supports very large files > 256GB.
Smallest sub-block was 64KB. | 8KB sub-block so small files consume less space and grow easily.
Maximum file count was 30,720. | Support for > 100,000 files per volume.
Maximum VMDK file size is 2TB. | Same limit.
Maximum number of supported LUNs is 256. | Same limit.
Locking of entire LUN by SCSI reservation. | Per-sector VAAI hardware-assisted locking reduces disk contention.


Information in this technical note applies to virtual disks created on Workstation 5 or later, VMware Fusion,
VMware Server, ESX 3.0 or later, and ESXi 3.5 or later. Earlier products may use formats different from the ones
described here. Topics that are not discussed in this document include the following:

Virtual disks created on ESX 2 hosts or earlier, GSX 3 or earlier, Workstation 4 or earlier, or VMware ACE.
Also virtual disks created in legacy mode on Workstation 5.

Device-backed virtual disks.

Encryption, including encrypted extents and encrypted descriptor files.

Defragmenting, shrinking, and consolidating of virtual disks.

This technical note proceeds with a high-level introduction to the layout of the files that make up a VMware
virtual disk. It then drills down into details of the data structures inside those virtual disk files.

Layout Basics
VMware virtual disks can be described at a high level by looking at two key characteristics:

The virtual disk may use backing storage contained in a single file, or it may use storage that consists of
a collection of smaller files.

All of the disk space needed for a virtual disk's files may be allocated at the time the virtual disk is created,
or the virtual disk may start small and grow as needed to accommodate new data.

A particular virtual disk may have any combination of these two characteristics.

One characteristic of recent-generation VMware virtual disks is that a text descriptor describes the layout of
the data in the virtual disk. This descriptor may be saved as a separate file or may be embedded in a file that
is part of a virtual disk. The section titled The Descriptor File explains the information contained
in the descriptor.

The way a virtual disk uses storage space on a physical disk varies, depending on the type of virtual disk you
select when you create the virtual machine.

Initially, for example, a virtual disk consists of only the base disk. If you take a snapshot of a virtual machine,
its virtual disk includes both the base link and a delta link (referred to in some product documentation as a
redo-log file). When the guest operating system writes to disk, changes since you took the snapshot are stored
in the delta link. It is possible for more than one delta link to be associated with a particular base disk.

You can think of the base disk and the delta links as links in a chain. The virtual disk consists of all the links in
the disk chain.

Figure 1. Links in the chain comprise the virtual disk

Link A Base disk

Link B Delta link 1

Link C Delta link 2

Each link in the chain is made up of one or more extents.

Figure 2. Extents that make up a link

Extent 0 Extent 1 Extent 2 Extent 3

An extent is a region of physical storage, often a file, that is used by the virtual disk.

In the links diagram above, links B and C are necessarily made up of extents that begin small and grow over
time, referred to as sparse extents. Link A can be made up of extents of any kind: sparse, preallocated, or even
backed directly by a physical device.


The Descriptor File


For a more detailed view of how these elements of a virtual disk come together in practice, look at the
following example text descriptor file, called test.vmdk. It describes a link in a virtual disk that is split into
files no larger than 2GB each and that starts small and grows as data is added.

Contents of the descriptor file are not case-sensitive. Lines beginning with # are comments and are ignored by
the VMware program that opens the disk.
% cat test.vmdk
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="twoGbMaxExtentSparse"
# Extent description
RW 4192256 SPARSE "test-s001.vmdk"
RW 4192256 SPARSE "test-s002.vmdk"
RW 2101248 SPARSE "test-s003.vmdk"
# The Disk Data Base
#DDB
ddb.adapterType = "ide"
ddb.geometry.sectors = "63"
ddb.geometry.heads = "16"
ddb.geometry.cylinders = "10402"

The Header
The first section of the descriptor is the header. It provides the following information about the virtual disk:

version: The number following version is the version number of the descriptor. The default value is 1.

CID: This line shows the content ID. It is a random 32-bit value updated the first time the content of the
virtual disk is modified after the virtual disk is opened. Every link header contains both a content ID and
a parent content ID (described below).

If a link has a parent, as is true of links B and C in the diagram of links in a chain, the parent content ID
is the content ID of the parent link.

If a link has no parent, as is true of link A in the diagram of links in a chain, the parent content ID is set
to ffffffff (see parentCID below).

The purpose of the content ID is to check the following:

In the case of a base disk with a delta link, that the parent link has not changed since the time the delta
link was created. If the parent link has changed, the delta link must be invalidated.

That the bottom-most link was not modified between the time the virtual machine was suspended
and the time it was resumed, or between the time you took a snapshot of the virtual machine and the
time you reverted to the snapshot.

parentCID: This line shows the content ID of the parent link (the previous link in the chain) if it exists.
If the link does not have any parent (that is, the link is a base disk), the parentCID is set to ffffffff.

createType: This line describes the type of virtual disk. Flat disk is fully allocated at creation time
(preallocated). Sparse disk is allocated as needed to store data. Not including legacy types of virtual disk,
createType can be one of the following:

custom: descriptor file with arbitrary extents.

monolithicSparse: single sparse extent with embedded descriptor file.

monolithicFlat: single flat extent with separate descriptor file.

twoGbMaxExtentSparse: sparse extents 2GB or smaller to account for file system limits.

twoGbMaxExtentFlat: flat extents 2GB or smaller to account for file system limits.

fullDevice: disk that takes the properties of, and is backed by, physical disk on the host.


partitionedDevice: disk backed by some partitions of physical disk, with other partitions hidden.

vmfsPreallocated: thick (flat) disk on VMFS, with blocks zeroed on first use.

vmfsEagerZeroedThick: preallocated (flat) disk on VMFS, with all blocks zeroed when created.

vmfsThin: thin-provisioned VMFS disks that consume only as much space as needed.

vmfsSparse: sparse disk on VMFS, often a redo log, not to be confused with thin-provisioned disk.

vmfsRDM: virtual compatibility raw device map (RDM), acts like a symbolic link to physical disk.

vmfsRDMP: physical compatibility RDM, similar but sends SCSI commands to underlying hardware.

vmfsRaw: special raw disk for ESXi hosts, pass-through only mode.

streamOptimized: compressed sparse extents with embedded LBA, useful for OVF streaming.

The first seven disk types are for VMware hosted products. Terms that include monolithic indicate that
the virtual disk is contained in a single file. Terms that include twoGbMaxExtent indicate that the virtual disk
consists of a collection of smaller files. Terms that include sparse indicate that a virtual disk starts small
and grows to accommodate data. Terms that include flat indicate that disk space is allocated at creation
time. Product documentation also uses the terms growable and preallocated, respectively.

Terms prefixed by vmfs are used for storage on ESXi hosts. Type vmfsSeSparse is for space-efficient
sparse disk used for new-style redo logs. Type vmfsThick refers to non-zeroed preallocated disk, and is
deprecated. Types vmfsRawDeviceMap and vmfsPassthroughRawDeviceMap are used in headers for
disks that use ESXi raw device mapping.

Types fullDevice, partitionedDevice, and vmfsRaw are used when a virtual machine is configured to
make direct use of a physical disk, or partitions on a physical disk, rather than configured to store data in
files managed by a host operating system or VMFS.

The term streamOptimized is used to describe disks that have been optimized for streaming.

parentFileNameHint: This line, present only if the link is a delta link, contains the path to the parent of
the delta link.
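
The content ID rules above amount to a simple check when a chain is opened. The following is a minimal sketch, not VMware's implementation: the LinkIds structure, the CID_NOPARENT name, and the first_stale_link function are hypothetical names introduced here for illustration. It assumes the CID and parentCID values have already been read from each descriptor, ordered from the base disk to the bottom-most delta link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CID_NOPARENT 0xffffffffu /* parentCID of a link that has no parent */

/* Hypothetical in-memory view of the two ID lines of one descriptor. */
typedef struct LinkIds {
    uint32_t cid;       /* value of the CID= line */
    uint32_t parentCid; /* value of the parentCID= line */
} LinkIds;

/*
 * links[0] is the base disk; links[i] is the child of links[i - 1].
 * Returns the index of the first link whose parentCID does not match
 * the expected value, or count if the whole chain is consistent.
 */
static size_t first_stale_link(const LinkIds *links, size_t count)
{
    assert(links != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* A base disk expects ffffffff; a delta link expects its parent's CID. */
        uint32_t expected = (i == 0) ? CID_NOPARENT : links[i - 1].cid;
        if (links[i].parentCid != expected) {
            return i;
        }
    }
    return count;
}
```

A return value smaller than count identifies a delta link that, by the rule above, must be treated as invalid because its parent changed after the delta link was created.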

The Extents
Each line of the second section describes one extent. The extents are enumerated beginning with the one
accessible at offset 0 from the virtual machine's point of view. The format of the line looks like one of the
following examples:

RW 4192256 SPARSE "test-s001.vmdk"

Fields, in order: access, size in sectors, type of extent, filename.

RW 1048576 FLAT "test-f001.vmdk" 0

Fields, in order: access, size in sectors, type of extent, filename, offset.

The extent descriptions provide the following key information:

Access: may be RW, RDONLY, or NOACCESS

Size in sectors: a sector is 512 bytes

Type of extent: may be FLAT, SPARSE, ZERO, VMFS, VMFSSPARSE, VMFSRDM, or VMFSRAW.

Filename: shows the path to the extent (relative to the location of the descriptor).
If the type of the virtual disk, shown in the header, is fullDevice or partitionedDevice, then the filename
should point to an IDE or SCSI block device. If the type of the virtual disk is vmfsRaw, the filename should
point to a file in /vmfs/devices/disks/.


Offset: the offset value is specified only for flat extents and corresponds to the offset in the file or device
where the guest operating system's data is located. For preallocated virtual disks, this number is zero. For
device-backed virtual disks (physical or raw disks), it may be nonzero.
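
The extent line format described above can be read with a single sscanf call. The sketch below is illustrative only: the ExtentLine structure and parse_extent_line function are hypothetical names, the buffer sizes are arbitrary, and the code assumes the filename is always quoted. Extent lines without a filename are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one extent line; field names are ours. */
typedef struct ExtentLine {
    char access[16];            /* RW, RDONLY, or NOACCESS */
    unsigned long long sectors; /* size in 512-byte sectors */
    char type[16];              /* FLAT, SPARSE, ZERO, ... */
    char filename[256];         /* path relative to the descriptor */
    unsigned long long offset;  /* only present for flat extents */
    int hasOffset;              /* 1 if an offset field was found */
} ExtentLine;

/* Parse one extent line. Returns 1 on success, 0 if the line does not match. */
static int parse_extent_line(const char *line, ExtentLine *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* Width limits keep sscanf from overflowing the fixed-size buffers. */
    int n = sscanf(line, "%15s %llu %15s \"%255[^\"]\" %llu",
                   out->access, &out->sectors, out->type,
                   out->filename, &out->offset);
    if (n < 4) {
        return 0;
    }
    out->hasOffset = (n == 5);
    return 1;
}
```

Summing the sectors field over all extent lines of a descriptor gives the capacity of the link in sectors.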

The Disk Database


Additional information about the virtual disk is stored in the disk database section of the descriptor. Each line
corresponds to one entry. Each entry is formatted as follows:
ddb.<nameOfEntry> = "<value of entry>"

When the virtual disk is created, the disk database is populated with entries like those shown in the example
descriptor. The entry names are self-explanatory and show the following information:

The adapter type can be ide, buslogic, lsilogic, or legacyESX. The buslogic and lsilogic values
are for SCSI disks and show which virtual SCSI adapter is configured for the virtual machine. The
legacyESX value is for older ESX/ESXi hosts when the adapter type used in creating the virtual machine
is not known.

The geometry values for cylinders, heads, and sectors are initialized with the geometry of the disk,
which depends on the adapter type.

There is one descriptor, and thus one disk database, for each link in a chain. Searches for disk database
information begin in the descriptor for the bottom link of the chain (Link C in the illustration of links in the
chain) and work their way up the chain until the information is found.

Layout of the Example Disk


The link described in the example descriptor has three extents, each of which is a file on disk. The following
diagram shows the layout of this link and the filenames of the extents:

test-s001.vmdk test-s002.vmdk test-s003.vmdk

Simple Extents
The simplest kinds of extents are backed by a region of a file or a block device. These include the extent types
shown in the descriptor as FLAT, VMFS, VMFSRDM, or VMFSRAW.

Monolithic or Flat VMDK


A virtual disk described as monolithic and flat consists of two files. One file contains the descriptor. The other
file is the extent used to store virtual machine data.

Consider an extent that is described by the following line in a descriptor file.
RW 1048576 FLAT "test-f001.vmdk" 0

This means that file test-f001.vmdk is 1048576 sectors × 512 bytes/sector = 536870912 bytes = 512MB in size.

In VMware ESXi hosts, each link includes only one extent.

Accessing a Sector in a Flat Extent


Assume you want access to data in a link that is made up of two flat extents. The size of the first extent is C1.
The size of the second extent is C2. You want access to sector x in the virtual disk, and x' is the sector offset in
extent 1 or 2 where x is located.

If x >= C1, the sector is in extent 2. Its relative sector offset is: x' = x - C1

If x < C1, the sector is in extent 1 at offset x: x' = x
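
The two-extent rule generalizes to any number of flat extents by subtracting extent sizes until x falls inside one of them. The following sketch shows this under the assumption that the extent sizes, in sectors, have already been read from the descriptor. The function name locate_flat_sector is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Locate virtual-disk sector x in a link made of flat extents.
 * sizes[i] is the size of extent i in sectors. On success, stores the
 * zero-based extent index and the sector offset x' within that extent
 * and returns 1. Returns 0 if x lies beyond the end of the link.
 */
static int locate_flat_sector(const uint64_t *sizes, size_t count, uint64_t x,
                              size_t *extentIndex, uint64_t *offsetInExtent)
{
    assert(sizes != NULL && extentIndex != NULL && offsetInExtent != NULL);
    for (size_t i = 0; i < count; i++) {
        if (x < sizes[i]) {
            *extentIndex = i;
            *offsetInExtent = x;
            return 1;
        }
        x -= sizes[i]; /* skip this extent: x' = x - Ci */
    }
    return 0;
}
```

For an extent whose descriptor line carries a nonzero offset, that offset would be added to x' to obtain the sector position in the backing file or device.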


Hosted Sparse Extents


In a sparse extent, data storage space is not allocated in advance. Instead, space is allocated as it is needed. A
sparse extent also keeps track of whether or not data is represented in the extent. Delta links made up of sparse
extents use the copy-on-write semantic. Each sparse extent is made up of the following blocks:

Sparse header

Embedded descriptor (Optional)


Redundant grain directory

Redundant grain table #0


...

Redundant grain table #n

Grain directory

Grain table #0
...

Grain table #n

(Padding to grain align)

Grain

Grain
...

Hosted Sparse Extent Header


The following example shows the content of a sparse extent's header from a VMware hosted product, such as
VMware Workstation, VMware Player, VMware Fusion, VMware ACE, or VMware (GSX) Server:
typedef uint64 SectorType;
typedef uint8 Bool;
typedef struct SparseExtentHeader {
    uint32 magicNumber;
    uint32 version;
    uint32 flags;
    SectorType capacity;
    SectorType grainSize;
    SectorType descriptorOffset;
    SectorType descriptorSize;
    uint32 numGTEsPerGT;
    SectorType rgdOffset;
    SectorType gdOffset;
    SectorType overHead;
    Bool uncleanShutdown;
    char singleEndLineChar;
    char nonEndLineChar;
    char doubleEndLineChar1;
    char doubleEndLineChar2;
    uint16 compressAlgorithm;
    uint8 pad[433];
} SparseExtentHeader;

This structure needs to be packed. If you use gcc to compile your application, you must use the attribute
__attribute__((__packed__)).
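
As a sketch of what packing means in practice, the declaration below restates the header with the fixed-width types from <stdint.h> (an assumption made here; the listing above uses uint32, uint64, and similar names) and applies the gcc attribute. The field sizes add up to 512 bytes, so a compile-time check can catch a build in which the compiler inserted padding. The attribute syntax is specific to gcc and compatible compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SectorType;
typedef uint8_t Bool;

typedef struct SparseExtentHeader {
    uint32_t magicNumber;
    uint32_t version;
    uint32_t flags;
    SectorType capacity;
    SectorType grainSize;
    SectorType descriptorOffset;
    SectorType descriptorSize;
    uint32_t numGTEsPerGT;
    SectorType rgdOffset;
    SectorType gdOffset;
    SectorType overHead;
    Bool uncleanShutdown;
    char singleEndLineChar;
    char nonEndLineChar;
    char doubleEndLineChar1;
    char doubleEndLineChar2;
    uint16_t compressAlgorithm;
    uint8_t pad[433];
} __attribute__((__packed__)) SparseExtentHeader;

/* 3*4 + 4*8 + 4 + 3*8 + 1 + 4 + 2 + 433 = 512 bytes, one sector. */
_Static_assert(sizeof(SparseExtentHeader) == 512,
               "packed SparseExtentHeader should fill exactly one sector");

/* Without packing, capacity would be aligned to offset 16 instead of 12. */
_Static_assert(offsetof(SparseExtentHeader, capacity) == 12,
               "capacity should directly follow the three 32-bit fields");
```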

Notes:

All the quantities defined as SectorType are in sector units.

magicNumber is initialized with
#define SPARSE_MAGICNUMBER 0x564d444b /* 'V' 'M' 'D' 'K' */
This magic number is used to verify the validity of each sparse extent when the extent is opened.


version: The version number can be 1 or 2. See Version 2 Hosted Sparse Extents.

SparseExtentHeader is stored on disk in little-endian byte order, so if you examine the first eight bytes
of a VMDK file, you see 'K' 'D' 'M' 'V' 0x01 0x00 0x00 0x00 or 'K' 'D' 'M' 'V' 0x02 0x00 0x00 0x00.

flags contains the following bits of information in the current version of the sparse format:

bit 0: valid new line detection test.

bit 1: redundant grain table will be used.

bit 2: zeroed-grain GTE will be used. See Version 2 Hosted Sparse Extents, below.

bit 16: the grains are compressed. The type of compression is described by compressAlgorithm.

bit 17: there are markers in the virtual disk to identify every block of metadata or data and the
markers for the virtual machine data contain logical block addressing (LBA).

grainSize is the size of a grain in sectors. It must be a power of 2 and must be greater than 8 (4KB).

capacity is the capacity of this extent in sectors. It should be a multiple of the grain size.

descriptorOffset is the offset of the embedded descriptor in the extent. It is expressed in sectors. If the
descriptor is not embedded, all the extents in the link have the descriptor offset field set to 0.

descriptorSize is valid only if descriptorOffset is nonzero. It is expressed in sectors.

numGTEsPerGT is the number of entries in a grain table. The value of this entry for virtual disks is 512.

rgdOffset points to the redundant level 0 of metadata. It is expressed in sectors.

gdOffset points to the level 0 of metadata. It is expressed in sectors.

overHead is the number of sectors occupied by the metadata.

uncleanShutdown is set to FALSE when VMware software closes an extent. After an extent has been
opened, software checks the value of uncleanShutdown. If it is TRUE, the disk is checked for consistency.
After this check, uncleanShutdown is set to TRUE for as long as the extent stays open. Thus, if the software
crashes before the extent is closed, this boolean is found to be set to TRUE the next time the virtual machine
is powered on.

Four entries are used to detect when an extent file has been corrupted by transferring it using FTP in text
mode. The entries should be initialized with the following values:
singleEndLineChar = '\n';
nonEndLineChar = ' ';
doubleEndLineChar1 = '\r';
doubleEndLineChar2 = '\n';

compressAlgorithm designates the algorithm to compress every grain in the virtual disk. If bit 16 of the
flags field is not set, COMPRESSION_NONE is assumed. The deflate algorithm is described in RFC 1951.
#define COMPRESSION_NONE 0
#define COMPRESSION_DEFLATE 1

Version 2 Hosted Sparse Extents


Recent VMware hosted platform products support a new zeroed-grain grain table entry (GTE). The
zeroed-grain GTE returns all zeros on read. In other words, the zeroed-grain GTE indicates that a grain in the
child disk is zero-filled but does not actually occupy space in storage. A sparse extent with zeroed-grain GTE
has the following in its header:

SparseExtentHeader.version = 2
SparseExtentHeader.flags has bit 2 set

Other than the new flag and the possibly zeroed-grain GTE, version 2 sparse extents are identical to version 1.
Also, a zeroed-grain GTE has value 0x1 in the GT table (for details, see Summary). Currently
version 2 hosted sparse extents occur when you shrink a child disk (also called snapshot). They may occur in
other circumstances. When a shrink operation (also called compact) is done on a version 1 child disk, the
version number is upgraded to 2, and the compacted disk takes up less space than it would otherwise.


Releases before Workstation 5 cannot read version 2 sparse disks, but all releases of VMware Fusion can.

Products may (but are not required to) downgrade a version 2 sparse extent to version 1 if the extent no longer
contains a zeroed-grain GTE. This is done by setting version = 1 and setting bit 2 of flags to 0.
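
The optional downgrade can be sketched as follows. The function name and the two macro names are hypothetical, and the code assumes all grain table entries of the extent have been loaded into one array so that they can be scanned for the zeroed-grain value 0x1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPARSE_FLAG_ZEROED_GRAIN_GTE (1u << 2) /* bit 2 of flags */
#define GTE_ZEROED_GRAIN 1u                    /* GTE value of a zeroed grain */

/*
 * Downgrade a version 2 header to version 1 if no GTE marks a zeroed
 * grain. Returns 1 if the header fields were changed, 0 otherwise.
 */
static int try_downgrade_to_v1(uint32_t *version, uint32_t *flags,
                               const uint32_t *gtes, size_t gteCount)
{
    assert(version != NULL && flags != NULL);
    assert(gtes != NULL || gteCount == 0);
    if (*version != 2) {
        return 0;
    }
    for (size_t i = 0; i < gteCount; i++) {
        if (gtes[i] == GTE_ZEROED_GRAIN) {
            return 0; /* still needs version 2 semantics */
        }
    }
    *version = 1;
    *flags &= ~SPARSE_FLAG_ZEROED_GRAIN_GTE;
    return 1;
}
```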

Hosted Sparse Extent Metadata


There are two levels of metadata in a sparse extent from a hosted VMware product. Level 0 metadata is called
a grain directory or a GD. Level 1 metadata is called a grain table or a GT. Each entry in the level 0 metadata
points to a block of level 1 metadata, as shown in the following diagram:

GDE#0 | GDE#1 | GDE#2 | GDE#3 | ... GD: level 0

GTE#0 GTE#0 GTE#0


GTE#1 GTE#1 GTE#1
GTE#2 GTE#2 GTE#2
GTE#3 GTE#3 GTE#3 GTs: level 1
... ... ...
... ... ...

Redundancy
VMware software keeps two copies of the grain directories and grain tables on disk to improve the virtual
disk's resilience to host drive corruption.

Grain Directory
Each entry in a grain directory is called a grain directory entry or GDE. A grain directory entry is the offset in
sectors of a grain table in a sparse extent. The number of grain directory entries per grain directory (the size of
the grain directory) depends on the length of the extent. A grain directory entry is a 32-bit quantity.

Grain Table
Each entry in a grain table is called a grain table entry or GTE. A grain table entry points to the offset of a grain
in the sparse extent. There are always 512 entries in a grain table, and a grain table entry is a 32-bit quantity.
Consequently, each grain table is 2KB.

In a newly created sparse extent, all the grain table entries are initialized to 0, meaning that the grain to which
each grain table entry points is not yet allocated. Once a grain is created, the corresponding grain table entry
is initialized with the offset of the grain in the sparse extent in sectors.

All the grain tables are created when the sparse extent is created, hence the grain directory is technically not
necessary but has been kept for legacy reasons. If you disregard the abstraction provided by the grain
directory, you can redefine grain tables as blocks of grain table entries of arbitrary size. If there were no grain
directories, there would be no need to impose a length of 512 entries.

Grain
A grain is a block of sectors containing data for the virtual disk. The granularity is the size of a grain in sectors.
It is a property of the extent and is specified in the sparse extent header as grainSize. The default is currently
128, thus each grain contains 64KB of virtual machine data. The size of a sparse extent should be a multiple of
grainSize. Each grain starts at an offset that is a multiple of the grain size.
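
The sizes of the metadata follow from the extent capacity. The sketch below works through that arithmetic under two assumptions that go slightly beyond the text: there is exactly one grain directory entry per grain table, and partial tables are rounded up. The SparseLayout structure and sparse_layout function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the metadata needed for one sparse extent. */
typedef struct SparseLayout {
    uint64_t numGrains; /* grains needed to cover the capacity */
    uint64_t numGTs;    /* grain tables, assumed one per GDE */
    uint64_t gtBytes;   /* size of one grain table in bytes */
    uint64_t gdBytes;   /* size of the grain directory in bytes */
} SparseLayout;

/* capacity and grainSize are in sectors; entries are 32-bit quantities. */
static SparseLayout sparse_layout(uint64_t capacity, uint64_t grainSize,
                                  uint32_t numGTEsPerGT)
{
    assert(grainSize != 0 && numGTEsPerGT != 0);
    SparseLayout l;
    l.numGrains = (capacity + grainSize - 1) / grainSize;
    l.numGTs = (l.numGrains + numGTEsPerGT - 1) / numGTEsPerGT;
    l.gtBytes = (uint64_t)numGTEsPerGT * 4;
    l.gdBytes = l.numGTs * 4;
    return l;
}
```

For the first extent of the example descriptor (4192256 sectors, with the default grainSize of 128 and 512 entries per grain table), this gives 32752 grains and 64 grain tables of 2KB each.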

Accessing a Sector in a Hosted Sparse Extent


Assume you want access to data in sector x stored in a link containing a single sparse extent. You need to locate
the grain containing this sector (if it exists) by first looking up the grain directory entry to find the location of
the grain table that records the grain's location.


If grainSize is defined as

grainSize = 2^G sectors

then the area accessible with a single grain table is

gtCoverage = number of GTEs per GT × grainSize
           = 512 × 2^G
           = 2^9 × 2^G
           = 2^(9+G) sectors

If the grainSize is 128 sectors, then:

gtCoverage = 2^(9+7)
           = 2^16 sectors
           = 32MB

To verify that the grain containing the sector has been allocated, you must examine a grain table. To find the
grain table you need, examine the grain directory entry at offset floor(x / gtCoverage) in the grain directory.
GDE = GD [ floor(x / gtCoverage) ]

Function floor is defined as: floor(s) is an integer such that
floor(s) <= s < floor(s) + 1

Using this grain directory entry, you can locate the grain table. The grain you want is pointed to by
GTE = GT [ floor( (x % gtCoverage) / grainSize) ]

If GTE is 0, it means the grain is not yet allocated. All the reads in this grain return sectors of 0s (unless there
is a parent link). The first write allocates a grain. If there is no parent, the grain is initialized with 0s. If there is
a parent link, you need to respect the copy-on-write semantic and initialize the content of the grain by reading
from the parent.

If GTE is 1, that means the grain is all zeros. All the reads in this grain return sectors of zeros, even if there is
a parent link. The first write allocates a grain, which is initialized with zeros.

Summary
GDE = GD [ floor(x / 2^(9+G)) ]

GTE = GT [ floor((x % 2^(9+G)) / 2^G) ]

[ GTE == 0 ] <==> [ grain is not present, thus
                    reads with no parent: return 0s;
                    reads with a parent: read from parent;
                    writes: allocate a grain and write to it ]

[ GTE == 1 ] <==> [ grain is zero; reads: return zeros; writes: allocate a zeroed grain and write to it ]

[ GTE > 1 ] <==> [ grain is present, read from and write to it ]
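
The index arithmetic in the summary can be sketched in C as shown below. The type and function names are hypothetical. The code only computes indices and classifies a grain table entry. Reading the grain directory and grain table from the extent file is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef enum GrainState {
    GRAIN_UNALLOCATED, /* GTE == 0: read zeros, or read from the parent link */
    GRAIN_ZEROED,      /* GTE == 1: read zeros even if there is a parent */
    GRAIN_PRESENT      /* GTE > 1: GTE is the sector offset of the grain */
} GrainState;

/* Hypothetical result of the index computation for one sector. */
typedef struct SectorPath {
    uint64_t gdIndex;       /* index of the GDE in the grain directory */
    uint64_t gtIndex;       /* index of the GTE in that grain table */
    uint64_t sectorInGrain; /* offset of x inside its grain, in sectors */
} SectorPath;

/* Integer division performs the floor() used in the formulas. */
static SectorPath sparse_sector_path(uint64_t x, uint64_t grainSize,
                                     uint32_t numGTEsPerGT)
{
    assert(grainSize != 0 && numGTEsPerGT != 0);
    uint64_t gtCoverage = (uint64_t)numGTEsPerGT * grainSize;
    SectorPath p;
    p.gdIndex = x / gtCoverage;
    p.gtIndex = (x % gtCoverage) / grainSize;
    p.sectorInGrain = x % grainSize;
    return p;
}

static GrainState classify_gte(uint32_t gte)
{
    if (gte == 0) {
        return GRAIN_UNALLOCATED;
    }
    if (gte == 1) {
        return GRAIN_ZEROED;
    }
    return GRAIN_PRESENT;
}
```

When the grain is present, the sector holding the data would be at GTE + sectorInGrain within the extent, since a grain table entry is the sector offset of its grain. For example, with grainSize 128 and 512 entries per table, sector 70000 maps to GDE index 1, GTE index 34, and sector 112 within the grain.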

ESXi Host Sparse Extents


Sparse extents in ESXi hosts have a different layout from those in the hosted products. The sparse extent
header in an ESXi host refers to the sparse extent as a copy-on-write (COW) disk. There are two levels of
metadata in a sparse extent on an ESXi host.

The first level, or the grain directory, refers to the set of grain directory entries (GDEs), where each GDE
covers COW_NUM_LEAF_ENTRIES (=4096) * granularity (=512 bytes) = 2MB of data. The grain directory
is stored after the COWDisk header and is updated when a new GDE is initialized or modified.

The second level in the copy-on-write metadata is a grain table (GT). The grain table is 16KB in size and
covers 4096 data sectors. A new GT is allocated when a new GDE is added and is modified when a new
GTE is allocated.


A GT is followed by the data sectors corresponding to its GTEs. Because delta links (also called redo logs) are
sparse, not all the data sectors are allocated immediately after a GT. The following diagram shows the layout:

COWDisk header
Grain directory
Grain table

Data corresponding
to GTEs

Grain table

Data corresponding
to GTEs
...

ESXi Host Sparse Extent Header


The following example shows the content of a sparse extent's header on an ESXi host:
#define COWDISK_MAX_PARENT_FILELEN 1024
#define COWDISK_MAX_NAME_LEN 60
#define COWDISK_MAX_DESC_LEN 512
typedef struct COWDisk_Header {
    uint32 magicNumber;
    uint32 version;
    uint32 flags;
    uint32 numSectors;
    uint32 grainSize;
    uint32 gdOffset;
    uint32 numGDEntries;
    uint32 freeSector;
    union {
        struct {
            uint32 cylinders;
            uint32 heads;
            uint32 sectors;
        } root;
        struct {
            char parentFileName[COWDISK_MAX_PARENT_FILELEN];
            uint32 parentGeneration;
        } child;
    } u;
    uint32 generation;
    char name[COWDISK_MAX_NAME_LEN];
    char description[COWDISK_MAX_DESC_LEN];
    uint32 savedGeneration;
    char reserved[8];
    uint32 uncleanShutdown;
    char padding[396];
} COWDisk_Header;

Notes:

magicNumber is set to 0x44574f43, which is ASCII COWD.

version: The value of this entry should be 1.

flags is set to 3.

numSectors refers to the total number of sectors on the base disk.

grainSize is the granularity of data stored in delta links. This varies from one sector (the default) to 1MB.

gdOffset is the sector offset of the grain directory. It is 4, because the COWDisk_Header structure takes
four sectors (sectors 0 through 3).

numGDEntries is CEILING(numSectors, gtCoverage), that is, numSectors divided by gtCoverage and
rounded up.


freeSector is the next free data sector. It must be less than the length of the delta link. It is initially set to
gdOffset + numGDSectors;

savedGeneration is used to detect the unclean shutdown of the delta link. It is initially set to 0.

uncleanShutdown is used to trigger the metadata consistency check in case there is an abnormal
termination of the program.

The remaining fields are not used. They are present for compatibility with legacy virtual disk formats.
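
The statement that the header takes four sectors can be checked at compile time, and numGDEntries can be computed from it. The sketch below restates the structure with <stdint.h> types (an assumption made here) and adds a hypothetical helper, cow_num_gd_entries. It takes gtCoverage to be COW_NUM_LEAF_ENTRIES times grainSize, in sectors, which is how the grain directory description above reads.

```c
#include <assert.h>
#include <stdint.h>

#define COWDISK_MAX_PARENT_FILELEN 1024
#define COWDISK_MAX_NAME_LEN 60
#define COWDISK_MAX_DESC_LEN 512
#define COW_NUM_LEAF_ENTRIES 4096
#define SECTOR_BYTES 512

typedef struct COWDisk_Header {
    uint32_t magicNumber;
    uint32_t version;
    uint32_t flags;
    uint32_t numSectors;
    uint32_t grainSize;
    uint32_t gdOffset;
    uint32_t numGDEntries;
    uint32_t freeSector;
    union {
        struct {
            uint32_t cylinders;
            uint32_t heads;
            uint32_t sectors;
        } root;
        struct {
            char parentFileName[COWDISK_MAX_PARENT_FILELEN];
            uint32_t parentGeneration;
        } child;
    } u;
    uint32_t generation;
    char name[COWDISK_MAX_NAME_LEN];
    char description[COWDISK_MAX_DESC_LEN];
    uint32_t savedGeneration;
    char reserved[8];
    uint32_t uncleanShutdown;
    char padding[396];
} COWDisk_Header;

/* 32 + 1028 + 4 + 60 + 512 + 4 + 8 + 4 + 396 = 2048 bytes. */
_Static_assert(sizeof(COWDisk_Header) == 4 * SECTOR_BYTES,
               "COWDisk_Header should span exactly four sectors");

/* CEILING(numSectors, gtCoverage), with gtCoverage in sectors. */
static uint32_t cow_num_gd_entries(uint32_t numSectors, uint32_t grainSize)
{
    assert(grainSize != 0);
    uint64_t gtCoverage = (uint64_t)COW_NUM_LEAF_ENTRIES * grainSize;
    return (uint32_t)((numSectors + gtCoverage - 1) / gtCoverage);
}
```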

ESXi Host Sparse Extent Metadata


The metadata for an ESXi host sparse extent is similar to that for a sparse extent in a hosted VMware product,
as described in Hosted Sparse Extent Metadata, with the following exceptions:

ESXi sparse extents do not include redundant copies of the grain directory.

Grain tables have 4096 entries.

Each grain contains 512 bytes.

Accessing a Sector in an ESXi Host Sparse Extent


The method for accessing a sector in an ESXi host sparse extent is similar to that described in Accessing a
Sector in a Hosted Sparse Extent. Be sure to allow for the differences in metadata described above.

Stream-Optimized Compressed
Stream-optimized compressed extents are meant to be easily streamed over a network link. They are designed
to minimize the memory footprint of the server streaming the virtual disk and also allow for the use of a simple
client application to read the virtual disk data. This virtual disk type is used primarily in the monolithic form,
typically for delivery of OVF virtual appliances.

Each stream-optimized compressed sparse extent is made of the following blocks:

Sparse header

Embedded descriptor

Grain marker

Compressed grain

...

Grain table marker

Grain table

Grain marker

Compressed grain

...

Grain table marker

Grain table

[ ... ]

Grain directory marker

Grain directory

Footer marker

Footer

End-of-stream marker


Each marker and its associated block begin on a sector or 512-byte boundary. Each marker can be seen as a C
structure with the following layout:
struct Marker {
    SectorType val;
    uint32 size;
    union {
        uint32 type;
        uint8 data[0];
    } u;
};

There are five types of markers: compressed grain markers, grain table markers, grain directory markers,
footer markers, and end-of-stream markers. Grain markers are indicated by a nonzero size so there is no type
ID for them.
#define MARKER_EOS 0
#define MARKER_GT 1
#define MARKER_GD 2
#define MARKER_FOOTER 3

Based on the values of val, size, and type, you can distinguish between the various types of markers and
their associated blocks. Additional types may be defined in the future to indicate various metadata elements.

In the following discussion of marker types, m is a pointer to a marker defined by the Marker structure.

Compressed Grain Marker


Pointer m is a marker for a compressed grain if m->size != 0. In this case, the marker and block have the
following layout:
struct GrainMarker {
    SectorType lba;
    uint32 size;
    uint8 data[0];
};

In this structure:

lba is the offset in the virtual disk where the block of compressed data is located

size is the size of the compressed data in bytes

data is the data compressed with RFC 1951
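
Because each marker begins on a sector boundary, a reader that has just parsed a grain marker needs to know how many sectors to skip to reach the next marker. The sketch below derives this from the GrainMarker layout, assuming the 12 bytes of lba and size are immediately followed by the compressed data and that the block is padded up to the next sector boundary. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 512u
/* 8-byte lba plus 4-byte size precede the compressed data. */
#define GRAIN_MARKER_HEADER_BYTES 12u

/*
 * Number of whole sectors occupied by a grain marker and its compressed
 * data, rounded up to the next sector boundary.
 */
static uint64_t grain_block_sectors(uint32_t compressedSize)
{
    assert(compressedSize != 0); /* size 0 denotes a non-grain marker */
    uint64_t bytes = (uint64_t)GRAIN_MARKER_HEADER_BYTES + compressedSize;
    return (bytes + SECTOR_BYTES - 1) / SECTOR_BYTES;
}
```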

End-of-Stream Marker
Pointer m is an end-of-stream marker if m->size == 0 && m->u.type == MARKER_EOS. The end-of-stream
marker signals the end of the virtual disk. Each end-of-stream marker is padded to occupy a sector. The
structure looks like this:
struct EOSMarker {
    SectorType val;
    uint32 size;
    uint32 type;
    uint8 pad[496];
};

In this structure:

val is 0.

size is 0.

type is MARKER_EOS (0).

pad is unused. It must be written as zero and ignored on read.


Metadata Markers
Markers used to signal the blocks containing grain tables, grain directories, or footers have the same layout.

If m->size == 0 && m->u.type == MARKER_GT, m is a marker for a grain table.

If m->size == 0 && m->u.type == MARKER_GD, m is a marker for a grain directory.

If m->size == 0 && m->u.type == MARKER_FOOTER, m is a marker for a footer.

These markers and the blocks of data they signal have the following layout:
struct MetaDataMarker {
    SectorType numSectors;
    uint32 size;
    uint32 type;
    uint8 pad[496];
    uint8 metadata[0];
};

In this structure:

numSectors is the number of sectors occupied by the metadata, excluding the marker itself.

size is 0.

type is one of MARKER_GT (1), MARKER_GD (2), or MARKER_FOOTER (3).

pad is unused. It must be written as zero and ignored on read.

metadata points to a grain table if type is MARKER_GT, a grain directory if type is MARKER_GD, or a footer
if type is MARKER_FOOTER.

Header and Footer


The header and the footer are both described by the same SparseExtentHeader structure shown in Hosted
Sparse Extent Header. The footer takes precedence over the header when it exists. The footer should
be the last block of the disk and immediately followed by the end-of-stream marker so that they together
occupy the last two sectors of the disk.

Stream-optimized compressed sparse disks differ from regular sparse disks in that:

flags has bits 16 and 17 set to indicate that the grains are compressed and that each block of metadata or
data is identified by a marker.

compressAlgorithm is set to COMPRESSION_DEFLATE (1).
This compression algorithm is described in RFC 1951.

The rgdOffset should be ignored because bit 1 of the flags field is not set.

The header and footer differ in that the field gdOffset is set to
#define GD_AT_END 0xffffffffffffffff

in the copy of the header stored at the very beginning of the extent, whereas it is set to the proper value for the
copy of the header (footer) that is stored at the end of the extent.


Glossary
Chain: A collection of disk links that can be accessed as a single entity.

Child disk: A disk link in a disk chain that has a parent link.

Delta link: A link made of one or more sparse extents. It is a difference link, a child of a parent link. It contains
only data that the guest operating system has written to the disk after the creation of the delta link. It allows
software to go back in time and, by simply removing the delta link, restore the content of the disk to its state
immediately before creation of the delta link. Delta links are also called redo-log files.

Descriptor: Data about the disk abstraction, such as total space or an extent list. The descriptor may be in a
separate file or embedded in the header of a sparse extent. An embedded disk descriptor is placed in the first
extent of a disk link rather than in a separate disk descriptor file. An embedded disk descriptor can be used
only when the first extent of a link is sparse.

Disk: A disk chain that appears to the guest operating system as a single physical disk.

Disk database: A name-value pair text database found in the disk descriptor. It contains information that the
disk library does not need for disk function. Examples of these kinds of values are virtual hardware version
and VMware Tools version.

Extent: A region of a disk link backed by a region of a file or device. An extent can be sparse, flat, or device.
An extent does not have notions of disk properties but acts purely as storage of a certain size. A flat extent is
an extent backed by a flat file. Flat extents are also called plain or preallocated. A sparse extent is an extent that
does not allocate its data storage space in advance, but allocates as it goes along, and keeps track of whether
or not data is represented in the extent. Sparse extents are also called growable.

Flat: Space in a VMDK is fully allocated at creation time (preallocated). Contrast with sparse.

Grain: A block of sectors containing data for the virtual machine's disk. Granularity defines the size of a grain.
Each grain table entry points to one grain.

Granularity: The size of a single grain in a sparse extent.

Grain directory: Metadata identifying the locations of grain tables. The grain directory is ignored by recent
VMware programs because the grain table is allocated in advance.

Grain table: Metadata identifying the locations of grains.

Link: A single node in a disk chain. A link consists of one or more extents.

Parent link: A link that has a child. A parent may itself have a parent.

Sparse: Space in a VMDK is allocated only when needed to store data (growable). Contrast with flat.

If you have comments about this documentation, submit your feedback to: docfeedback@vmware.com
VMware, Inc. 3401 Hillview Ave., Palo Alto, CA 94304 www.vmware.com
Copyright 2007, 2011 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are
covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or
other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
Item: EN-000777-00
Updated: 12/20/11
