
Aligning the Warehouse and the Web

We now examine the operation of the M-DE components in detail. Logically, we divide the model into four components, labeled Filter, Modify, Analyze and Format in the diagram.

First, the Filter component takes a query from a user and checks that it is valid and suitable for further action. In some cases the user is directed immediately to existing results, or may request a manual override. The filter may be customized to the needs of the user(s).

Next, the Modify component handles query modification. This step accounts for the search criteria that are specific to each individual search engine. The effect of the modifications is to maximize the success rate of searches across the suite of search engines interrogated.

The modified queries are sent to the search engines, and the Analyze component examines the returned results to determine their structure, content type and viability. At this stage redundant documents are eliminated, and common web file types are handled, including .html, .doc, .pdf, .xml and .ps. Current search engines sometimes produce large volumes of irrelevant results. To tackle this problem we must consider semantic issues as well as structural and syntactic ones. Standard IR techniques are applied to assess the relevance of the documents in the retrieved collection. Many tools exist to help apply both IR and data retrieval techniques to the results obtained from the web (Manning et al, 2007; Zhenyu et al, 2002; Daconta et al, 2003).

Semantic Issues

In the Data Warehouse, much attention is paid to retaining the purity, consistency and integrity of data originating from operational databases. These databases take several steps to codify meaning through careful design, data entry procedures, database triggers, etc. The use of explicit meta-data safeguards the meaning of data in these databases. Unfortunately, on the web the situation is often chaotic. One promising avenue for addressing relevance in a heterogeneous environment is the use of formal knowledge representation constructs known as ontologies. These constructs have recently attracted renewed interest in view of Semantic Web initiatives. In our model we plan to use a domain-specific ontology or taxonomy in the Format module to match the results' terms and hence distinguish relevant from non-relevant results (Ding et al, 2007; Decker et al, 2000; Chakrabarti, 2002; Kalfoglou et al, 2004; Hassell et al, 2006; Holzinger et al, 2006).
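The taxonomy matching planned for the Format module can be illustrated with a minimal sketch. It assumes the taxonomy is a flat mapping from concepts to indicative terms and that a result counts as relevant when its text mentions terms from at least one concept. `TAXONOMY`, its entries, the tokenizer and the helper names (`matched_concepts`, `split_relevant`) are all hypothetical; a real domain ontology would also carry relations between concepts.

```python
import re
from dataclasses import dataclass

# Hypothetical domain taxonomy: concept -> terms that signal that concept.
# In the model this would come from a domain-specific ontology or taxonomy.
TAXONOMY: dict[str, set[str]] = {
    "data_warehouse": {"warehouse", "olap", "etl", "star", "schema"},
    "information_retrieval": {"retrieval", "ranking", "index", "query", "relevance"},
}

@dataclass
class Result:
    url: str
    text: str

def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matched_concepts(result: Result, taxonomy: dict[str, set[str]]) -> set[str]:
    """Return the taxonomy concepts whose terms appear in the result's text."""
    tokens = set(tokenize(result.text))
    return {concept for concept, terms in taxonomy.items() if tokens & terms}

def split_relevant(results: list[Result], taxonomy: dict[str, set[str]],
                   min_concepts: int = 1) -> tuple[list[Result], list[Result]]:
    """Partition results into (relevant, non_relevant) by matched-concept count."""
    relevant, non_relevant = [], []
    for r in results:
        if len(matched_concepts(r, taxonomy)) >= min_concepts:
            relevant.append(r)
        else:
            non_relevant.append(r)
    return relevant, non_relevant

# Usage with two illustrative results.
results = [
    Result("http://example.org/a", "ETL pipelines for a data warehouse star schema"),
    Result("http://example.org/b", "Holiday recipes and travel tips"),
]
rel, non = split_relevant(results, TAXONOMY)
print([r.url for r in rel])  # ['http://example.org/a']
```

Raising `min_concepts` makes the filter stricter; weighting terms by concept importance would be a natural refinement, but that is beyond what the text specifies.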

Figure 2. Meta-data engine operation (diagram labels: Handle Query from D.W.; Filter; Modify; Submit Modified Queries to S.E.; Handle Results from S.E.; Analyse; Format; Provide Results to D.W.; Meta-Data. D.W. = Data Warehouse, S.E. = search engines)
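The four M-DE stages (Filter, Modify, Analyze, Format) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' implementation: the validity rule in `filter_query` (non-empty, at most ten terms), the two per-engine rewriting rules for the hypothetical `engine_a` and `engine_b`, deduplication by exact URL as a stand-in for redundant-document elimination, and the output record layout are all assumptions. `fake_search` replaces real search-engine calls so the sketch runs offline.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse

# File types the Analyze stage handles, as listed in the text.
HANDLED_TYPES = {".html", ".doc", ".pdf", ".xml", ".ps"}

@dataclass(frozen=True)
class Hit:
    engine: str
    url: str
    title: str

def filter_query(query: str, max_terms: int = 10) -> Optional[str]:
    """Filter: reject empty or overly long queries (hypothetical validity rule)."""
    terms = query.split()
    if not terms or len(terms) > max_terms:
        return None
    return " ".join(terms)

# Modify: hypothetical per-engine rewriting rules.
ENGINE_REWRITERS: dict[str, Callable[[str], str]] = {
    "engine_a": lambda q: " AND ".join(q.split()),
    "engine_b": lambda q: " ".join("+" + t for t in q.split()),
}

def modify_query(query: str) -> dict[str, str]:
    """Produce one engine-specific query per search engine."""
    return {engine: rewrite(query) for engine, rewrite in ENGINE_REWRITERS.items()}

def file_type(url: str) -> str:
    """Lower-cased extension of the URL path; '.html' assumed when absent."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext or ".html"

def analyze(hits: list[Hit]) -> list[Hit]:
    """Analyze: drop repeated URLs and file types that are not handled."""
    seen: set[str] = set()
    kept = []
    for h in hits:
        if h.url in seen or file_type(h.url) not in HANDLED_TYPES:
            continue
        seen.add(h.url)
        kept.append(h)
    return kept

def format_results(hits: list[Hit]) -> list[dict]:
    """Format: shape results as records for the warehouse (assumed layout)."""
    return [{"url": h.url, "title": h.title, "type": file_type(h.url),
             "source": h.engine} for h in hits]

def run(query: str, search: Callable[[str, str], list[Hit]]) -> list[dict]:
    """Filter -> Modify -> submit to each engine -> Analyze -> Format."""
    cleaned = filter_query(query)
    if cleaned is None:
        return []
    hits: list[Hit] = []
    for engine, engine_query in modify_query(cleaned).items():
        hits.extend(search(engine, engine_query))
    return format_results(analyze(hits))

def fake_search(engine: str, engine_query: str) -> list[Hit]:
    """Stand-in for real search engines: every engine returns the same canned hits."""
    return [
        Hit(engine, "http://example.org/report.pdf", "Report"),
        Hit(engine, "http://example.org/page.HTML", "Page"),
        Hit(engine, "http://example.org/image.png", "Image"),
    ]

records = run("data warehouse web", fake_search)
print(len(records))  # 2
```

Here the same two documents come back from both engines, so the Analyze stage keeps one copy of each and discards the unhandled .png file.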


