
Processamento e Recuperação de Informação

IST 2015/2016

Example Exam
Duration: 2h00.
Group 1
A query Q produced the result set R = {d1, d2, ..., d20}. In R, documents di with an odd i were judged relevant, and those
with an even i were judged not relevant. Calculate the following measures and justify your answer.
a) precision@5. (2.5v)
b) The precision at 1/4 of the relevant documents, knowing that there are a total of 12 relevant documents in the overall
collection. (1.5v)
c) The interpolated precision-recall curve. (1v)
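The three measures in Group 1 can be checked mechanically. The sketch below is one possible reading, not an official answer key: it assumes that R is ranked in the order d1, ..., d20, that "precision at 1/4 of the relevant documents" means the precision at the rank where 25% recall is first reached, and that the interpolated curve is reported at the 11 standard recall levels (0.0, 0.1, ..., 1.0). The function names are illustrative.

```python
from fractions import Fraction

# Relevance judgments from the statement: d_i is relevant iff i is odd (ranks 1..20).
relevance = [i % 2 == 1 for i in range(1, 21)]
TOTAL_RELEVANT = 12  # given in question b)

def precision_at(k, rel):
    """Fraction of the top-k results that are relevant."""
    return Fraction(sum(rel[:k]), k)

def precision_at_recall(frac, rel, total_relevant):
    """Precision at the first rank where `frac` of all relevant documents is retrieved."""
    needed = frac * total_relevant
    found = 0
    for rank, is_rel in enumerate(rel, start=1):
        found += is_rel
        if found >= needed:
            return Fraction(found, rank)
    return Fraction(0)  # that recall level is never reached in this ranking

def interpolated_curve(rel, total_relevant):
    """Interpolated precision at the 11 standard recall levels (an assumption)."""
    points, found = [], 0
    for rank, is_rel in enumerate(rel, start=1):
        if is_rel:
            found += 1
            points.append((Fraction(found, total_relevant), Fraction(found, rank)))
    curve = []
    for level in (Fraction(j, 10) for j in range(11)):
        # Interpolated precision: best precision at any recall >= this level.
        reachable = [p for r, p in points if r >= level]
        curve.append((level, max(reachable, default=Fraction(0))))
    return curve

p_at_5 = precision_at(5, relevance)
p_quarter = precision_at_recall(Fraction(1, 4), relevance, TOTAL_RELEVANT)
curve = interpolated_curve(relevance, TOTAL_RELEVANT)

print("precision@5 =", p_at_5)
print("precision at 25% recall =", p_quarter)
for level, precision in curve:
    print(f"recall {float(level):.1f}: interpolated precision {float(precision):.3f}")
```

Under these assumptions both a) and b) evaluate to 3/5, since the third relevant document (d5) sits at rank 5. Only 10 of the 12 relevant documents appear in R, so recall never exceeds 10/12 and the interpolated precision is 0 at recall levels 0.9 and 1.0.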
Group 2
a) In the context of Information Retrieval, classification and clustering techniques are usually used with the same goal: to
organize documents. What is the main difference between these two approaches? Justify your answer. (2.5v)
b) Consider the following set of documents, represented through the vector space model.

      T1   T2   T3   class
D1    1    0    1    A
D2    0    1    1    A
D3    1    0    0    B

Using a binary Naive Bayes model, compute the probability of document D4 = (1,1,0) belonging to each class (A or B).
The probability of a term belonging to a class should be computed using Laplace smoothing. (1.5v)
c) In problems such as the previous one, it is important to avoid probabilities with a value of zero. Explain why
we should use techniques like Laplace smoothing, instead of simply adding a constant value to all probabilities (e.g. 0.1). (1v)
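The computation asked for in b) can be sketched as follows. This is a minimal illustration, assuming that "binary Naive Bayes" means the Bernoulli model (absent terms contribute a factor 1 - P(t|c)), that the documents are D1 = (1,0,1), D2 = (0,1,1) and D3 = (1,0,0) as read from the table, that Laplace smoothing is applied to the term probabilities as (n_ct + 1)/(n_c + 2), and that the class priors are left unsmoothed. Other conventions would give different numbers.

```python
from fractions import Fraction

# Training set from the statement: binary term vectors (T1, T2, T3) with class labels.
training = [((1, 0, 1), "A"), ((0, 1, 1), "A"), ((1, 0, 0), "B")]
d4 = (1, 1, 0)

def bernoulli_nb(training, doc):
    """Return (unnormalized scores, normalized posteriors) for each class."""
    classes = sorted({label for _, label in training})
    scores = {}
    for c in classes:
        members = [vec for vec, label in training if label == c]
        n_c = len(members)
        score = Fraction(n_c, len(training))  # prior P(c), unsmoothed (assumption)
        for t, present in enumerate(doc):
            n_ct = sum(vec[t] for vec in members)  # docs of class c containing term t
            p_t = Fraction(n_ct + 1, n_c + 2)      # Laplace: +1 per outcome, 2 outcomes
            score *= p_t if present else 1 - p_t
        scores[c] = score
    total = sum(scores.values())
    return scores, {c: s / total for c, s in scores.items()}

scores, posteriors = bernoulli_nb(training, d4)
for c in scores:
    print(c, "score:", scores[c], "posterior:", posteriors[c])
```

Under these assumptions the unnormalized scores are 1/24 for class A and 4/81 for class B, which normalize to 27/59 and 32/59. The smoothed estimates remain valid probabilities (each term's "present" and "absent" estimates still sum to 1), which is relevant to question c).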
Group 3
The political authorities of towns and regions often sponsor web sites that promote local tourism or
cultural activities. Suppose you are designing a new business that offers on-line newspapers a fully automated
service to enrich the news they publish with links to relevant books described in Europeana.
a) Draft the functional architecture of the service described above, drawing on your knowledge of Europeana, your
experience from the project course, and the discussions in class. You may provide an informal diagram
representing the solution as an architecture of services and their functional interfaces, complemented by textual
descriptions. Keep in mind that the main purpose is to show that you can make a relevant application of the concepts
and techniques addressed in the course, so focus on that. (3v)
b) Suppose now that you can recommend to the newspapers how to write their news texts so that your automated
solution achieves better relevance. Describe those recommendations, and explain how the proposal in your
answer to the previous question supports them or could be improved for that purpose (describe and use a generic
solution here if you did not answer the previous question). Once again, keep in mind that the main purpose is
to provide evidence that you can make an effective application of the concepts and techniques addressed in the course,
so focus on that. (3v)
Group 4


Consider an organization with an intranet infrastructure in which all employees use personal computers to
access two enterprise applications. One application orchestrates a business process in which each activity has to be executed by
a worker, who is expected to (step 1) read a descriptive text; (step 2) find, in a physical paper archive, documents relevant
to the subject of that text; and (step 3) advance the process flow by adding to it references to the documents found in step
2. However, the organization is performing poorly because step 2 requires too much human effort.
The management believes that the state of the art of the technology available on the market should make it possible to address
this limitation, with a positive impact on the performance of the organization. Provide a proposal for that, considering the
subjects addressed in this course. (4v)
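One possible direction for step 2, sketched below, is to treat the descriptive text as a query against an index of the archive. This is only an illustration under strong assumptions: the paper archive has already been digitized into plain text (for example by scanning and OCR, which is not shown), documents are ranked by TF-IDF cosine similarity, and the document identifiers and texts are invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(archive):
    """Build smoothed IDF weights and TF-IDF vectors for every archive document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in archive.items()}
    n = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log((n + 1) / (d + 1)) + 1 for term, d in df.items()}
    vectors = {doc_id: {t: tf * idf[t] for t, tf in c.items()}
               for doc_id, c in counts.items()}
    return idf, vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_text, idf, vectors):
    """Rank archive documents by similarity to the descriptive text."""
    q_counts = Counter(tokenize(query_text))
    # Terms unseen in the archive carry no weight and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical digitized archive: identifiers and contents are made up.
archive = {
    "box12/contract.txt": "maintenance contract for the office building elevators",
    "box07/invoice.txt": "supplier invoice for printer paper and toner payment due",
    "box03/minutes.txt": "minutes of the board meeting about the annual budget",
}
idf, vectors = build_index(archive)
ranking = rank("Approve payment of the supplier invoice for toner", idf, vectors)
for doc_id, score in ranking:
    print(f"{score:.3f}  {doc_id}")
```

In this toy run the invoice document is ranked first. In a real proposal the top-ranked references could be suggested to the worker in step 3 rather than inserted automatically, so that relevance judgments stay under human control.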
