
MASWS Assignment 3: A Review of Picky, a SW Search Engine

Liu qijun s1119879 March 15, 2012

Doing search over semantically organized text?

In the Semantic Web scenario, one of the most important goals is to distribute data across the web while preserving properties such as data consistency. The Semantic Web data modelling methods (URIs and RDF) enable us to do that. In terms of accessing the data, we are no longer confined to the traditional document-based methods. For example, in Information Retrieval, better-organized text offers the potential for refining our search results. Picky is a tool that aims to realize this potential. Picky is a semantic text search engine that does not operate on huge blobs of text, but instead on smaller, highly categorized text amounts. [1]

A full-text search engine often puts every term in the documents into a single index. This keeps the query simple, a single fixed search field holding the one or two keywords we are interested in, but it loses the semantics that the data may have or imply. For example, peter may appear many times in a document, but a user who types peter into the query may only be interested in its use as a surname. Picky is a search engine that incorporates those semantics (often relations) into the existing indexing and responds to the user interactively when it receives an ambiguous query. By doing that, it helps users refine their search to get exactly what they want.
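The contrast between a flat full-text index and a categorized one can be sketched in a few lines of Python. This is only an illustration of the idea, not Picky's implementation (Picky itself is configured in Ruby); the records, the field names `surname` and `title`, and both helper functions are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative only.
BOOKS = [
    {"id": 1, "surname": "Peter", "title": "Gardening Basics"},
    {"id": 2, "surname": "Jones", "title": "Peter and the Harbour"},
    {"id": 3, "surname": "Peter", "title": "Sailing Alone"},
]


def build_flat_index(records):
    """Map each token to record ids, forgetting which field it came from."""
    index = defaultdict(set)
    for record in records:
        for field, text in record.items():
            if field == "id":
                continue
            for token in text.lower().split():
                index[token].add(record["id"])
    return index


def build_categorized_index(records):
    """Map each (field, token) pair to record ids, so the field is preserved."""
    index = defaultdict(set)
    for record in records:
        for field, text in record.items():
            if field == "id":
                continue
            for token in text.lower().split():
                index[(field, token)].add(record["id"])
    return index


flat = build_flat_index(BOOKS)
categorized = build_categorized_index(BOOKS)

# The flat index cannot tell a surname from a word in a title.
print(sorted(flat["peter"]))
# The categorized index can answer "peter as a surname" directly.
print(sorted(categorized[("surname", "peter")]))
```

With the flat index every record mentioning peter comes back, whereas the categorized index can restrict the answer to the records in which peter is a surname.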

A taste of Picky

Following the set-up tutorials provided on its site, I installed and configured a Picky search engine server and client on my computer. I then made several test cases to examine its features, covering both its advantages and my concerns. The data set I tested is the example provided, a simple database of 540 books. Its structure is similar to a relational database, with several fields such as writer, book title and ISBN.

[1] http://florianhanke.com/picky/, http://florianhanke.com/picky/details.html

1. The first example (as shown in Figure 1) illustrates the way Picky responds to a user's query. It retrieves the relevant records (i.e. 3 books in this case) and suggests the fields in which the query word appears. Those fields are additional information that Picky knows about the query, alan. If we are, for example, interested in books written by Alan, we can click on the second suggestion to refine our search.

Figure 1: First example, query for alan

2. Another important feature is that we can specify our interest (pointing to a certain field) in the query. Figure 2 gives the results of such a query: Picky returns the books that have women in their titles.

3. Picky also has other features, such as handling word mismatches and similarity measures, which are commonly seen in regular search engines.
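A field-qualified query such as title:women can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Picky's parser: the category names, the sample records and the functions `parse_query`, `matches` and `search` are all hypothetical, and matching here is plain whole-word comparison.

```python
CATEGORIES = ("author", "title")

# Hypothetical records for illustration only.
BOOKS = [
    {"id": 1, "author": "Alan Grey", "title": "Women of the North"},
    {"id": 2, "author": "Mary Hill", "title": "Letters to Alan"},
    {"id": 3, "author": "Jo Reed", "title": "Mountain Women"},
]


def parse_query(query, categories=CATEGORIES):
    """Split a query into (category, term) pairs; category is None if unqualified."""
    parsed = []
    for part in query.lower().split():
        prefix, sep, term = part.partition(":")
        if sep and term and prefix in categories:
            parsed.append((prefix, term))
        else:
            parsed.append((None, part))
    return parsed


def matches(record, category, term, categories=CATEGORIES):
    """Check one term against a single category, or against all if unqualified."""
    fields = (category,) if category else categories
    return any(term in record[field].lower().split() for field in fields)


def search(records, query, categories=CATEGORIES):
    """Return ids of records that satisfy every term of the query."""
    parsed = parse_query(query, categories)
    return [
        record["id"]
        for record in records
        if all(matches(record, cat, term, categories) for cat, term in parsed)
    ]


print(search(BOOKS, "alan"))         # unqualified: any field may match
print(search(BOOKS, "author:alan"))  # qualified: only the author field
print(search(BOOKS, "title:women"))  # qualified: only the title field
```

The unqualified query alan matches both an author and a title, which is exactly the ambiguity that the qualified forms remove.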

Figure 2: Second example, query for women in the title field

Technical issues

To better understand how it works and, most importantly, how it uses semantically structured text data to give users explicit feedback, I referred to the documentation provided. Generally, the process Picky undertakes can be divided into the following parts:

(a) Configuration. This involves specifying the data source to search through and defining how data is indexed and how it is searched. The data sources Picky supports are CSV files, relational databases and RSS feeds. Basically, everything with an #each is indexable (this depends on the parsers, so theoretically anything we can transform into a relational table form). As for how data is accessed, Picky defines a single way: so-called Categories are used for both indexing and access. Categories are slices of the data. They correspond to fields in a database, dimensions in statistical data and, more or less, properties in RDF. For example, title in a book data set could be defined as a Category. The way data in this field is indexed and the way a query is searched in this field are then specified according to requirements. For example, the way users express their queries for this field could be specified as title: and the field could be searched by Boolean keywords. Some interesting settings, such as partial matching (i.e. matching part of a word counts as a match), are customized here.

(b) Indexing. At this stage Picky tokenizes and indexes the content we are interested in, for each category, from every data source we pointed it to. This is done programmatically. Picky offers a choice of two index types, In-Memory and Redis. The former saves its indexes on disk and reloads them into memory, and the latter saves its indexes in Redis. Indexes in Picky therefore hold categorized data. Note that Picky allows real-time indexing (indexing content while the server is running).

(c) Setting up a web server and responding to queries. The client parses the user's query and formalizes it into an internal search expression according to the user's customization. When results return, it displays them and listens to the user's feedback, which may further modify the results.

The way Picky deals with semantic text is straightforward, namely by taking a slice of the data and indexing against that so-called Category. It processes all semantic data sources in this same uniform way. A typical interaction serves as an example: when a query arrives, the results from every category are returned to the user interface together with their descriptions, and users can then modify their search results by viewing and interacting with them.
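The three stages above (per-category configuration, indexing, and returning results grouped by category) can be approximated with the following Python sketch. It is not Picky's API: the class `CategoryIndex`, the functions `prefixes`, `build` and `allocations`, the `partial` flag and the sample records are assumptions introduced for illustration, and partial matching is modelled here simply as prefix matching.

```python
from collections import defaultdict


def prefixes(token):
    """All leading substrings of a token, e.g. 'wom' -> ['w', 'wo', 'wom']."""
    return [token[:n] for n in range(1, len(token) + 1)]


class CategoryIndex:
    """One slice of the data (a 'category') with its own indexing options."""

    def __init__(self, name, partial=False):
        self.name = name
        self.partial = partial  # if True, prefixes of a word also match
        self.postings = defaultdict(set)

    def add(self, record_id, text):
        """Tokenize and index one value; can be called at any time."""
        for token in text.lower().split():
            keys = prefixes(token) if self.partial else [token]
            for key in keys:
                self.postings[key].add(record_id)

    def lookup(self, term):
        return self.postings.get(term.lower(), set())


def build(records, categories):
    """Index every record into every configured category."""
    for record in records:
        for category in categories:
            category.add(record["id"], record[category.name])


def allocations(categories, term):
    """Return (category name, matching ids) pairs, largest result set first."""
    hits = [(c.name, sorted(c.lookup(term))) for c in categories]
    hits = [hit for hit in hits if hit[1]]
    return sorted(hits, key=lambda hit: (-len(hit[1]), hit[0]))


# Hypothetical data and configuration.
BOOKS = [
    {"id": 1, "author": "Alan Grey", "title": "Women of the North"},
    {"id": 2, "author": "Mary Hill", "title": "Letters to Alan"},
    {"id": 3, "author": "Alan Poole", "title": "Alan Abroad"},
]
categories = [CategoryIndex("author"), CategoryIndex("title", partial=True)]
build(BOOKS, categories)

print(allocations(categories, "alan"))  # hits listed per category
print(allocations(categories, "wom"))   # only the partial-matching category hits
```

Returning the hits per category, rather than as one merged list, is what lets a user interface offer choices such as "alan as author" or "alan in title" for an ambiguous query.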

Alternatives

An alternative tool is Searchy, which can be found at http://jsearchy.sourceforge.net/. Searchy is a distributed search and information integration engine. Its most distinctive feature is its architecture: Searchy is based on the concept of agents working in a distributed environment. Through co-operation between agents, Searchy can perform a complex task with minimum management complexity. Another important difference is that Searchy is designed with Semantic Web technologies in mind. Its core data models are RDF and OWL, so, as stated on its web site, Searchy can be used as an RDF transformation tool. I think Picky does a better job of interacting with users to retrieve the text that interests them, while Searchy focuses more on use within other applications, and its distributed nature allows better data integration.

Comments

Picky has many engineering advantages, such as flexibility in configuring indexing and support for multiple data sources. Besides these, its main contribution is the way users interact with their query results. As for its solution to semantic content (text), I think the design is a simple, ready-to-use one. There are no additional standards or rules we need to refer to when we do our implementation. As long as we understand our data and can make it fit the form that Picky supports, it can work reasonably well. This is an obvious advantage in use. On the other hand, because the designers did not give much thought to the Semantic Web and its way of modelling data, Picky lacks a formal model. It may therefore not be globally recognized or widely adopted.

Conclusion
This work documents my evaluation of Picky, a semantic text search engine, and my comments on it, mainly from a Semantic Web point of view. I first tried out the tool. Then, through its documentation, I looked closely at how it works, especially at how Picky utilizes semantics to achieve more accurate text search. Finally, after briefly contrasting it with an alternative approach, I find that Picky is one of those ready-to-use semantic technology tools. Its advantages as well as its limitations are briefly analysed and stated. Thanks to the MASWS course for the help.
