
Efficient Instant-Fuzzy Search with Proximity Ranking

Abstract
An instant-search system finds answers to a query while the user types in keywords character by character. Fuzzy search improves the user's search experience by also finding relevant answers whose keywords are similar to the query keywords. A main computational challenge in this paradigm is the high-speed requirement, since answers must be computed as the user types. At the same time, we also need good ranking functions that consider the proximity of keywords when computing relevance scores.

Problem Statement & Proposed Solution
Problem Statement:
o Achieving efficient time and space complexities for instant-fuzzy search.

Solution:
o Index phrases with a proper indexing scheme, and
o Develop an incremental-computation algorithm for efficiently segmenting a query into phrases and computing relevant answers.

Result Metrics: An experimental study on real data sets shows the tradeoffs between time, space, and quality of these solutions.

Literature Survey
K. Grabski and T. Scheffer, "Sentence completion," in SIGIR, 2004, pp. 433-439.
A. Nandi and H. V. Jagadish, "Effective phrase prediction," in VLDB, 2007, pp. 219-230.
o These works propose systems for predicting queries. Many such systems do prediction by treating a query with multiple keywords as a single prefix string.
o Therefore, if a relevant suggestion contains the query keywords, but not consecutively, that suggestion cannot be found.

Literature Survey Cont.

H. Bast and I. Weber proposed several indexing and query-processing techniques to support instant search.
M. Hadjieleftheriou and C. Li; K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin; S. Chaudhuri, V. Ganti, and R. Motwani: In this earlier line of work, substrings of the data are used for fuzzy string matching.
S. Ji, G. Li, C. Li, and J. Feng: Their trie-based approach is especially suitable for instant and fuzzy search, since each query is a prefix and a trie supports incremental computation efficiently.

Literature Survey Cont.

R. Fagin, A. Lotem, and M. Naor; F. Zhang, S. Shi, H. Yan, and J.-R. Wen: Studied extensively how to support top-k queries efficiently.
G. Li, J. Wang, C. Li, and J. Feng, "Supporting efficient top-k queries in type-ahead search," in SIGIR, 2012, pp. 355-364.
o Adopted existing top-k algorithms to do instant-fuzzy search.
o Most of these studies reorganize an inverted index to evaluate more relevant documents first.
Persin et al. proposed using inverted lists sorted by decreasing document frequency.
Zhang et al. studied the effect of term-independent features in index reorganization.

General Idea of Instant Search

Architecture

Example Table for Architecture Explanation

This data is structured in an indexed format.

Two types of indices are used to structure this data (a minimal sketch follows below):
1. Trie Indices
2. Forward Indices
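
Below is a minimal sketch of these two index types, assuming a trie whose phrase-ending nodes carry inverted lists of record ids and a forward index from record ids back to their phrases; the class names and sample data are illustrative, not the authors' implementation.

# Illustrative sketch of the two index types (Python); structure and names are assumptions.

class TrieNode:
    def __init__(self):
        self.children = {}        # character -> child TrieNode
        self.inverted_list = []   # ids of records containing the phrase ending here

class TrieIndex:
    """Maps each indexed phrase to the records that contain it."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase, record_id):
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.inverted_list.append(record_id)

    def lookup(self, phrase):
        """Return the trie node reached by an exact phrase, or None if absent."""
        node = self.root
        for ch in phrase:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

# Forward index: record id -> phrases of that record, used to verify candidate
# answers and to compute proximity-based relevance scores.
forward_index = {
    1: ["instant search", "fuzzy matching"],
    2: ["proximity ranking", "instant search"],
}

trie = TrieIndex()
for rid, phrases in forward_index.items():
    for phrase in phrases:
        trie.insert(phrase, rid)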

Basic Blocks In Architecture

Indices:
1. Trie
2. Forward

Basic Blocks In Architecture

Phrase Validator:
o When a search server receives a request, it first identifies all the valid phrases in the query that appear in the dictionary D and intersects their inverted lists.
o The Phrase Validator computes and returns the active nodes for all these terms (see the sketch after this slide).
o If a query keyword appears in multiple valid phrases, the query can be segmented into phrases in more than one way.
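
As a rough illustration of active-node computation, the sketch below finds trie nodes whose prefixes are within a small edit distance of the typed keyword; it reuses the TrieNode/TrieIndex sketch above, and the threshold and traversal are simplifications rather than the authors' algorithm.

# Simplified active-node computation: return (prefix, node) pairs whose prefix is
# within `max_edits` edits of the typed query keyword (illustrative only).

def active_nodes(root, query, max_edits=1):
    results = []
    first_row = list(range(len(query) + 1))    # edit distances from the empty prefix

    def recurse(node, ch, prev_row, prefix):
        # Add one row of the edit-distance table for the prefix extended by `ch`.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,           # insertion
                           prev_row[i] + 1,          # deletion
                           prev_row[i - 1] + cost))  # substitution / match
        if row[-1] <= max_edits:
            results.append((prefix + ch, node))      # this node is active
        if min(row) <= max_edits:                    # prune hopeless subtrees
            for c, child in node.children.items():
                recurse(child, c, row, prefix + ch)

    if first_row[-1] <= max_edits:                   # the empty prefix may itself be active
        results.append(("", root))
    for c, child in root.children.items():
        recurse(child, c, first_row, "")
    return results

# Example: active_nodes(trie.root, "instat") includes the node for the prefix
# "instan", which is one substitution away from the typed keyword.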

Query Plan Builder:
o After the valid phrases are identified, the Query Plan Builder generates a query plan Q, which contains all the possible valid segmentations in a specific order (a segmentation sketch follows below).
o The ranking of the segmentations in Q determines the order in which they will be executed.
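
The segmentation step could look like the sketch below; valid_phrases stands in for the dictionary D, and ranking segmentations by the number of phrases they contain is only an assumed ordering heuristic.

# Illustrative segmentation of a keyword query into valid phrases (not the authors' code).

def segmentations(keywords, valid_phrases):
    """Yield every way to split `keywords` into phrases from `valid_phrases`;
    single keywords are always allowed as one-word phrases."""
    if not keywords:
        yield []
        return
    for i in range(1, len(keywords) + 1):
        phrase = " ".join(keywords[:i])
        if i == 1 or phrase in valid_phrases:
            for rest in segmentations(keywords[i:], valid_phrases):
                yield [phrase] + rest

valid_phrases = {"new york", "new york times"}       # stands in for dictionary D
query = ["new", "york", "times"]

# A simple query plan Q: prefer segmentations with fewer (i.e., longer) phrases.
plan = sorted(segmentations(query, valid_phrases), key=len)
# -> [['new york times'], ['new york', 'times'], ['new', 'york', 'times']]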

Basic Blocks In Architecture Cont.
Index Searcher: After Q is generated, the segmentations are passed to the Index Searcher one by one until the top-k answers are computed or all the segmentations in the plan have been used (see the loop sketched below).
Cache Module:
o The Phrase Validator uses the Cache module to validate a phrase without traversing the trie from scratch,
o while the Index Searcher benefits from the Cache by retrieving the answers to an earlier query, reducing the computational cost.
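
A rough sketch of this loop is shown below; it uses the exact-match TrieIndex.lookup assumed earlier and a plain dictionary as the cache, whereas a fuzzy version would union the inverted lists of all active nodes of each phrase.

# Illustrative Index Searcher loop with a simple answer cache (not the authors' code).

def search_topk(plan, trie, k, cache):
    """Execute segmentations in plan order until k distinct answers are collected."""
    answers, seen = [], set()
    for segmentation in plan:
        key = tuple(segmentation)
        if key in cache:                       # cache hit: reuse an earlier query's answers
            candidates = cache[key]
        else:
            lists = []
            for phrase in segmentation:
                # Exact lookup; a fuzzy variant would union the inverted lists
                # of all active nodes of this phrase.
                node = trie.lookup(phrase)
                lists.append(set(node.inverted_list) if node else set())
            candidates = set.intersection(*lists) if lists else set()
            cache[key] = candidates
        for rid in sorted(candidates):         # deterministic order stands in for ranking
            if rid not in seen:
                seen.add(rid)
                answers.append(rid)
                if len(answers) >= k:
                    return answers
    return answers

# Example with the sample data above:
# search_topk([["instant search"]], trie, k=2, cache={}) -> [1, 2]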

Phrase Validator

Phrase Validator With Cache Module

Own Contributions
Implementing a demand-paging algorithm with an efficient page-replacement strategy would be advantageous for this application.
Previous searches could become part of the next search's history, so we will keep the log/history in a page table and retrieve pages efficiently according to the query keyword requirements.
Different page-replacement strategies could be proposed to give much faster recommendations (a minimal LRU sketch follows below).
An architecture like a Translation Lookaside Buffer (TLB) could be employed to fetch pages from the TLB.
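
A minimal sketch of this paging idea follows, assuming a fixed-capacity "page table" of cached query results with least-recently-used replacement; the capacity, key scheme, and LRU policy are our assumptions, not part of the original system.

from collections import OrderedDict

# Illustrative LRU "page table" for cached query results; capacity is an assumption.
class ResultPageCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.pages = OrderedDict()            # query prefix -> cached result "page"

    def get(self, prefix):
        if prefix not in self.pages:
            return None                       # "page fault": recompute from the indices
        self.pages.move_to_end(prefix)        # mark as most recently used
        return self.pages[prefix]

    def put(self, prefix, results):
        self.pages[prefix] = results
        self.pages.move_to_end(prefix)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used "page"

# Example:
# cache = ResultPageCache(capacity=2)
# cache.put("inst", [1, 2])
# cache.get("inst")  -> [1, 2]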

Proposed Architecture

Conclusion
The previous systems could recommend results based only on the previously typed characters kept in the cache module.
In many cases, the previous search log can be used to make the recommendation system faster.
Relevance to the user's query, along with the user's intentions, could then be mined easily.