# 5/9/2016

Mrunal

Submission number: 130164
Submission certificate: JF748747
Submission time: 2016-05-06 20:57:33 PST (GMT-8:00)

Number of questions: 5
Positive points per question: 3.0
Negative points per question: 1.0
Your score: 0

1. Suppose we perform the PCY algorithm to find frequent pairs, with market-basket data meeting the following specifications:

- s, the support threshold, is 10,000.
- There are one million items, which are represented by the integers 0, 1, ..., 999999.
- There are 250,000 frequent items, that is, items that occur 10,000 times or more.
- There are one million pairs that occur 10,000 times or more.
- There are P pairs that occur exactly once and consist of 2 frequent items.
- No other pairs occur at all.
- Integers are always represented by 4 bytes.
- When we hash pairs, they distribute among buckets randomly, but as evenly as possible; i.e., you may assume that each bucket gets exactly its fair share of the P pairs that occur once.

Suppose there are S bytes of main memory. In order to run the PCY algorithm successfully, the number of buckets must be sufficiently large that most buckets are not frequent, and on the second pass there must be enough room to count all the candidate pairs. As a function of S, what is the largest value of P for which we can successfully run the PCY algorithm on this data? Demonstrate that you have the correct formula by indicating which of the following is a value for S and a value for P that is approximately (i.e., to within 10%) the largest possible value of P for that S.
a) S = 500,000,000; P = 3,200,000,000
b) S = 500,000,000; P = 5,000,000,000
c) S = 500,000,000; P = 10,000,000,000
d) S = 300,000,000; P = 3,500,000,000

Here are some hints:

1. The number of once-occurring pairs that hash to each bucket is P divided by the number of buckets.
2. A pair can only be a candidate pair for the second pass if it is in a frequent bucket. For the values of P and S found in this question, that can only occur if the bucket contains one of the 1,000,000 frequent pairs.
3. You must use a hash table to count candidate pairs on the second pass of PCY. This hash table takes 12 bytes per candidate pair.
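
The hints can be turned into a small calculation. The sketch below is one way to do that, not an official solution: it assumes the first-pass bucket array uses essentially all S bytes (so there are about S/4 buckets) and that the bitmap and frequent-item table on the second pass are small enough to ignore. The function names are illustrative only. Under these assumptions the bound is roughly P ≈ S²/48,000,000.

```python
# Sketch of the second-pass memory constraint for PCY.
# Assumptions (not stated in the question): about S/4 buckets on pass 1,
# and the pass-2 bitmap and frequent-item table are ignored.

FREQUENT_PAIRS = 1_000_000   # pairs with support >= s (given)
BYTES_PER_INT = 4            # given
BYTES_PER_CANDIDATE = 12     # hint 3


def candidate_pairs(S: float, P: float) -> float:
    """Approximate number of candidate pairs on the second pass."""
    buckets = S / BYTES_PER_INT
    once_pairs_per_bucket = P / buckets          # hint 1
    # Hint 2: only buckets holding a frequent pair are frequent, so there
    # are at most FREQUENT_PAIRS frequent buckets.
    return FREQUENT_PAIRS * (1 + once_pairs_per_bucket)


def largest_p(S: float) -> float:
    """Largest P whose candidate hash table still fits in S bytes."""
    buckets = S / BYTES_PER_INT
    max_candidates = S / BYTES_PER_CANDIDATE
    return (max_candidates / FREQUENT_PAIRS - 1) * buckets


def within_10_percent(value: float, target: float) -> bool:
    return abs(value - target) <= 0.10 * target


options = {
    "a": (500_000_000, 3_200_000_000),
    "b": (500_000_000, 5_000_000_000),
    "c": (500_000_000, 10_000_000_000),
    "d": (300_000_000, 3_500_000_000),
}
for label, (S, P) in options.items():
    print(label, f"{largest_p(S):.3g}", within_10_percent(P, largest_p(S)))
```

Only option b passes the 10% check in this sketch; accounting for the bitmap would lower the bound slightly without changing that outcome.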

2. Suppose we have transactions that satisfy the following assumptions:

- s, the support threshold, is 10,000.
- There are one million items, which are represented by the integers 0, 1, ..., 999999.
- There are N frequent items, that is, items that occur 10,000 times or more.
- There are one million pairs that occur 10,000 times or more.
- There are 2M pairs that occur exactly once. M of these pairs consist of two frequent items; the other M each have at least one nonfrequent item.
- No other pairs occur at all.
- Integers are always represented by 4 bytes.

Suppose we run the a-priori algorithm to find frequent pairs and can choose on the second pass between the triangular-matrix method for counting candidate pairs (a triangular array count[i][j] that holds an integer count for each pair of items (i, j) where i < j) and a hash table of item-item-count triples. Neglect in the first case the space needed to translate between original item numbers and numbers for the frequent items, and in the second case neglect the space needed for the hash table. Assume that item numbers and counts are always 4-byte integers.

As a function of N and M, what is the minimum number of bytes of main memory needed to execute the a-priori algorithm on this data? Demonstrate that you have the correct formula by selecting, from the choices below, the triple consisting of values for N, M, and the (approximate, i.e., to within 10%) minimum number of bytes of main memory, S, needed for the a-priori algorithm to execute with this data.

a) N = 20,000; M = 80,000,000; S = 1,100,000,000
b) N = 50,000; M = 200,000,000; S = 2,500,000,000
c) N = 10,000; M = 50,000,000; S = 600,000,000
d) N = 100,000; M = 40,000,000; S = 800,000,000

Here's a hint. When considering the hash table for counting pairs of frequent items that actually occur in the dataset, remember that you need 12 bytes per entry, 4 each to store the two item IDs and 4 to store the integer count. The number of 12-byte entries will be the number of pairs that occur in the data and have both items frequent.
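
The comparison described in the question and the hint can be sketched as follows. This is a hedged illustration, not an official solution: it assumes the second pass dominates memory use (the first pass needs only about 4 MB of item counts) and that every frequent pair consists of two frequent items, which follows from monotonicity. The function names are illustrative only.

```python
# Sketch: memory needed on the second pass of a-priori under two layouts.
# Assumption: the second pass is the memory bottleneck.

FREQUENT_PAIRS = 1_000_000   # pairs with support >= s (given)
BYTES_PER_INT = 4            # given


def triangular_bytes(N: int) -> int:
    """One 4-byte count for every pair (i, j), i < j, of the N frequent items."""
    return BYTES_PER_INT * N * (N - 1) // 2


def triples_bytes(M: int) -> int:
    """12 bytes per pair that occurs in the data with both items frequent."""
    occurring_pairs = FREQUENT_PAIRS + M
    return 3 * BYTES_PER_INT * occurring_pairs


def min_memory(N: int, M: int) -> int:
    return min(triangular_bytes(N), triples_bytes(M))


def within_10_percent(value: float, target: float) -> bool:
    return abs(value - target) <= 0.10 * target


options = {
    "a": (20_000, 80_000_000, 1_100_000_000),
    "b": (50_000, 200_000_000, 2_500_000_000),
    "c": (10_000, 50_000_000, 600_000_000),
    "d": (100_000, 40_000_000, 800_000_000),
}
for label, (N, M, S) in options.items():
    needed = min_memory(N, M)
    print(label, needed, within_10_percent(S, needed))
```

For large N the triangular matrix costs about 2N² bytes, so the triples method wins roughly when 12(1,000,000 + M) < 2N².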

3. Suppose we perform the 3-pass multistage algorithm to find frequent pairs, with market-basket data meeting the following specifications:

- s, the support threshold, is 10,000.
- There are one million items, which are represented by the integers 0, 1, ..., 999999.
- All one million items are frequent; that is, they occur at least 10,000 times.
- There are one million pairs that occur 10,000 times or more.
- There are P pairs that occur exactly once.
- Integers are always represented by 4 bytes.
- When we hash pairs, they distribute among buckets randomly, but as evenly as possible; i.e., you may assume that each bucket gets exactly its fair share of the P pairs that occur once.
- The hash functions on the first two passes are completely independent.

Suppose there are S bytes of main memory. As a function of S and P, what is the expected number of candidate pairs on the third pass of the multistage algorithm? Demonstrate the correctness of your formula by identifying which of the following triples of values for S, P, and N has N approximately (i.e., to within 10%) equal to the expected number of candidate pairs for the third pass.
a) S = 300,000,000; P = 100,000,000,000; N = 19,000,000
b) S = 200,000,000; P = 10,000,000,000; N = 3,400,000
c) S = 300,000,000; P = 100,000,000,000; N = 9,300,000
d) S = 500,000,000; P = 5,000,000,000; N = 10,500,000
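
One way to estimate the third-pass candidate count is sketched below. It is an approximation under stated assumptions, not an official solution: each of the first two passes is assumed to have about S/4 buckets (item counts and bitmaps are ignored), a bucket is assumed frequent only when it holds one of the frequent pairs, and each frequent pair is assumed to land in its own bucket. The function names are illustrative only.

```python
# Sketch: expected candidate pairs on pass 3 of the multistage algorithm.
# Assumptions: about S/4 buckets on each of the first two passes, and a
# bucket is frequent only if it contains one of the frequent pairs.

FREQUENT_PAIRS = 1_000_000   # pairs with support >= s (given)
BYTES_PER_INT = 4            # given


def expected_candidates(S: float, P: float) -> float:
    buckets = S / BYTES_PER_INT
    # Fraction of buckets that are frequent on either pass.
    frequent_fraction = FREQUENT_PAIRS / buckets
    # A once-occurring pair survives only if it lands in a frequent bucket
    # on both passes; the hash functions are independent, so the
    # probabilities multiply.
    surviving_once_pairs = P * frequent_fraction ** 2
    return FREQUENT_PAIRS + surviving_once_pairs


def within_10_percent(value: float, target: float) -> bool:
    return abs(value - target) <= 0.10 * target


options = {
    "a": (300_000_000, 100_000_000_000, 19_000_000),
    "b": (200_000_000, 10_000_000_000, 3_400_000),
    "c": (300_000_000, 100_000_000_000, 9_300_000),
    "d": (500_000_000, 5_000_000_000, 10_500_000),
}
for label, (S, P, N) in options.items():
    estimate = expected_candidates(S, P)
    print(label, f"{estimate:.3g}", within_10_percent(N, estimate))
```

In closed form this sketch gives 1,000,000 + P × (4,000,000 / S)².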

4. During a run of Toivonen's Algorithm with set of items {A,B,C,D,E,F,G,H}, a sample is found to have the following maximal frequent itemsets: {A,B}, {A,C}, {A,D}, {B,C}, {E}, {F}. Compute the negative border. Then, identify in the list below the set that is NOT in the negative border.
a) {G}
b) {F,G}
c) {A,B,C}
d) {B,F}

This set is in the negative border because it is not frequent, yet each of its immediate proper subsets, i.e., the empty set only, is frequent. Note that a subset of a maximal frequent itemset, such as the empty set, must itself be frequent.
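
The definition used in the explanation above (not frequent, but every immediate proper subset is frequent) can be checked by brute force. The following is a minimal sketch with illustrative function names; it treats every subset of a maximal frequent itemset, including the empty set, as frequent.

```python
from itertools import combinations

ITEMS = "ABCDEFGH"
MAXIMAL = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}, {"E"}, {"F"}]


def frequent_itemsets(maximal):
    """All subsets of the maximal frequent itemsets, including the empty set."""
    frequent = set()
    for m in maximal:
        for k in range(len(m) + 1):
            for subset in combinations(sorted(m), k):
                frequent.add(frozenset(subset))
    return frequent


def negative_border(items, maximal):
    """Itemsets that are not frequent but whose immediate subsets all are."""
    frequent = frequent_itemsets(maximal)
    # A border set has only frequent immediate subsets, so it can be at most
    # one item larger than the largest frequent itemset.
    max_size = max(len(m) for m in maximal) + 1
    border = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if candidate in frequent:
                continue
            if all(candidate - {x} in frequent for x in candidate):
                border.add(candidate)
    return border


border = negative_border(ITEMS, MAXIMAL)
for option in [{"G"}, {"F", "G"}, {"A", "B", "C"}, {"B", "F"}]:
    print(sorted(option), frozenset(option) in border)
```

In this sketch {F,G} is excluded because its immediate subset {G} is not frequent.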

5. In this problem, assume all integers and pointers occupy 4 bytes. The assumption that we can represent pair counts with triples (i, j, c) for the pair i, j with count c does not account for the space needed to build an efficient data structure to find i-j pairs when we need them. Suppose we use a binary search tree, where each node is a quintuple (i, j, c, leftChild, rightChild). Suppose also that there are I items, and P pairs that actually appear in the data. Under what circumstances does it save space to use the above binary search tree rather than a triangular matrix?
a) I = 500,000; P = 20,000,000,000
b) I = 200,000; P = 5,000,000,000
c) I = 50,000; P = 3,000,000,000
d) I = 1000; P = 120,000
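
The space comparison can be sketched directly from the sizes given in the question. This is a minimal illustration with hypothetical function names; it assumes one tree node per pair that appears in the data and one 4-byte count per unordered pair of items in the triangular matrix.

```python
# Sketch: binary search tree of quintuples vs. triangular matrix.
# Assumptions: one node per occurring pair; one count per pair i < j.

BYTES_PER_FIELD = 4   # integers and pointers (given)


def triangular_matrix_bytes(I: int) -> int:
    return BYTES_PER_FIELD * I * (I - 1) // 2


def bst_bytes(P: int) -> int:
    # Each node holds (i, j, c, leftChild, rightChild): five 4-byte fields.
    return 5 * BYTES_PER_FIELD * P


def tree_saves_space(I: int, P: int) -> bool:
    return bst_bytes(P) < triangular_matrix_bytes(I)


options = {
    "a": (500_000, 20_000_000_000),
    "b": (200_000, 5_000_000_000),
    "c": (50_000, 3_000_000_000),
    "d": (1000, 120_000),
}
for label, (I, P) in options.items():
    print(label, bst_bytes(P), triangular_matrix_bytes(I), tree_saves_space(I, P))
```

For large I the condition reduces to roughly P < I²/10.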
