ABSTRACT

We present Aardvark, a social search engine. With Aardvark, users ask a question, either by instant message, email, web input, text message, or voice. Aardvark then routes the question to the person in the user's extended social network most likely to be able to answer that question. As compared to a traditional web search engine, where the challenge lies in finding the right document to satisfy a user's information need, the challenge in a social search engine like Aardvark lies in finding the right person to satisfy a user's information need. Further, while trust in a traditional search engine is based on authority, in a social search engine like Aardvark, trust is based on intimacy. We describe how these considerations inform the architecture, algorithms, and user interface of Aardvark, and how they are reflected in the behavior of Aardvark users.

1. INTRODUCTION

1.1 The Library and the Village

Traditionally, the basic paradigm in information retrieval has been the library. Indeed, the field of IR has roots in the library sciences, and Google itself came out of the Stanford Digital Library project [13]. While this paradigm has clearly worked well in several contexts, it ignores another age-old model for knowledge acquisition, which we shall call "the village paradigm". In a village, knowledge dissemination is achieved socially — information is passed from person to person, and the retrieval task consists of finding the right person, rather than the right document, to answer your question.

The differences in how people find information in a library versus a village suggest some useful principles for designing a social search engine. In a library, people use keywords to search, the knowledge base is created by a small number of content publishers before the questions are asked, and trust is based on authority. In a village, by contrast, people use natural language to ask questions, answers are generated in real time by anyone in the community, and trust is based on intimacy. These properties have cascading effects — for example, real-time responses from socially proximal responders tend to elicit (and work well for) highly contextualized and subjective queries. For example, the query "Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I'm looking for somebody that won't let them watch TV." is better answered by a friend than the library. These differences in information retrieval paradigm require that a social search engine have a very different architecture, algorithms, and user interface than a search engine based on the library paradigm.

The fact that the library and the village paradigms of knowledge acquisition complement one another nicely in the offline world suggests a broad opportunity on the web for social information retrieval.

1.2 Aardvark

In this paper, we present Aardvark, a social search engine based on the village paradigm. We describe in detail the architecture, ranking algorithms, and user interfaces in Aardvark, and the design considerations that motivated them. We believe this to be useful to the research community for two reasons. First, the argument made in the original Anatomy paper [3] still holds true — since most search engine development is done in industry rather than academia, the research literature describing end-to-end search engine architecture is sparse. Second, the shift in paradigm opens up a number of interesting research questions in information retrieval, for example around human expertise classification, implicit network construction, and conversation design.

Following the architecture description, we present a statistical analysis of usage patterns in Aardvark. We find that, as compared to traditional search, Aardvark queries tend to be long, highly contextualized, and subjective — in short, they tend to be the types of queries that are not well served by traditional search engines. We also find that the vast majority of questions get answered promptly and satisfactorily, and that users are surprisingly active, both in asking and answering.

Finally, we present example results from the current Aardvark system, and a comparative evaluation experiment. What we find is that Aardvark performs very well on queries that deal with opinion, advice, experience, or recommendations, while traditional corpus-based search engines remain a good choice for queries that are factual or navigational and lack a subjective element.

2. OVERVIEW

2.1 Main Components

The main components of Aardvark are:

1. Crawler and Indexer. To find and label resources that contain information — in this case, users, not documents (Sections 3.2 and 3.3).

2. Query Analyzer. To understand the user's information need (Section 3.4).

3. Ranking Function. To select the best resources to provide the information (Section 3.5).

4. UI. To present the information to the user in an accessible and interactive form (Section 3.6).

[Figure: The Aardvark architecture — gateways (IM, email, Twitter, iPhone, web, SMS), a message transport layer, a conversation manager, a question analyzer with classifiers, and a routing engine.]
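Section 3 describes each component in detail. As a rough illustration of how the four stages compose, here is a minimal Python sketch; every name in it (User, SocialSearchEngine, the keyword-overlap analyzer, the expertise-plus-proximity scoring) is our own illustrative assumption, not Aardvark's actual code or ranking model.

# A minimal sketch of the four-component pipeline (hypothetical names
# and heuristics; Aardvark's real components are described in Section 3).
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    topics: set = field(default_factory=set)    # filled in by the indexer
    friends: set = field(default_factory=set)   # names of social-graph neighbors

class SocialSearchEngine:
    def __init__(self, users):
        self.users = users  # the "corpus" is people, not documents

    def analyze(self, question):
        """Query Analyzer (stub): guess which topics the question is about."""
        words = set(question.lower().split())
        known_topics = {t for u in self.users for t in u.topics}
        return {t for t in known_topics if t in words}

    def rank(self, asker, topics):
        """Ranking Function (stub): topic expertise first, then social proximity."""
        def score(u):
            return (len(topics & u.topics), u.name in asker.friends)
        return sorted((u for u in self.users if u is not asker),
                      key=score, reverse=True)

    def ask(self, asker, question):
        """Conversation/UI layer (stub): route the question to the top candidate."""
        candidates = self.rank(asker, self.analyze(question))
        return f"routing {question!r} to {candidates[0].name}"

alice = User("alice", {"tennis"}, {"bob"})
bob = User("bob", {"tennis", "cooking"})
engine = SocialSearchEngine([alice, bob])
print(engine.ask(alice, "any good tennis courts in manhattan?"))  # routes to bob

The point of the sketch is the inversion it makes explicit: the index stores people with topics rather than documents with terms, and ranking selects an answerer rather than a URL.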
[Figure: An example answering interaction over instant messenger.]

me: Well there is always the Midtown Tennis Club on 8th ave @ 27th if you really want to stay in manhattan -- but the quality isn't great. You'd do just as well to use the public courts in Central Park. Or another good option is to join NYHRC or NYSC in manhattan, and use their courts in other boroughs...

aardvark: Great -- I've sent that to Michael. Thanks for the fast answer! (Type 'Michael:' followed by a message to add something, or 'more' for options.)
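The transcript also shows the conversation manager's lightweight command language: an answerer can type 'Michael:' to send the asker a follow-up, or 'more' to list options. A gateway might dispatch such replies roughly as below; the command set is taken only from the prompt text in the figure, and the function names are hypothetical.

# Hypothetical dispatcher for an answerer's IM replies, covering only the
# two commands visible in the transcript: "<name>: message" and "more".
import re

ADDRESS = re.compile(r"^(\w+):\s*(.*\S)", re.DOTALL)

def dispatch_reply(text: str):
    """Map one incoming IM message to an (action, payload) pair."""
    text = text.strip()
    if text.lower() == "more":
        return ("show_options", None)            # list the other commands
    m = ADDRESS.match(text)
    if m:
        name, message = m.groups()
        return ("append_for_asker", (name, message))
    return ("deliver_as_answer", text)           # default: treat as the answer

print(dispatch_reply("Michael: the courts open at 7am"))
# -> ('append_for_asker', ('Michael', 'the courts open at 7am'))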
[Figure 10: Distribution of questions and number of answers received. x-axis: answers received (0, 1, 2, 3, 4, 5+); y-axis: questions (%).]

[Figure 11: Distribution of percentage of users and number of topics. x-axis: number of topics (1, >2, >4, >8, >16, >32); y-axis: users (%).]
Questions often have a subjective element. A manual tally of 1,000 random questions between March and October of 2009 shows that 64.7% of queries have a subjective element to them (for example, "Do you know of any great delis in Baltimore, MD?" or "What are the things/crafts/toys your children have made that made them really proud of themselves?"). In particular, advice or recommendation queries regarding travel, restaurants, and products are very popular. A large number of queries are locally oriented: about 10% of questions related to local services, and 13% dealt with restaurants and bars. Figure 8 shows the top categories of questions sent to Aardvark. The distribution is not dissimilar to that found with traditional web search engines [2], but with a much smaller proportion of reference, factual, and navigational queries, and a much greater proportion of experience-oriented, recommendation, local, and advice queries.

Questions get answered quickly. 87.7% of questions submitted to Aardvark received at least one answer, and 57.2% received their first answer in less than 10 minutes. On average, a question received 2.08 answers (Figure 10) [fn 6], and the median answering time was 6 minutes and 37 seconds (Figure 9). By contrast, on public question and answer forums such as Yahoo! Answers [7] most questions are not answered within the first 10 minutes, and for questions asked on Facebook, only 15.7% of questions are answered within 15 minutes [12]. (Of course, corpus-based search engines such as Google return results in milliseconds, but many of the types of questions that are asked of Aardvark require extensive browsing and query refinement when asked on corpus-based search engines.)

[fn 6] A question may receive more than one answer when the Routing Policy allows Aardvark to contact more than one candidate answerer in parallel for a given question, or when the asker resubmits their question to request a second opinion.

Answers are high quality. Aardvark answers are both comprehensive and concise. The median answer length was 22.2 words; 22.9% of answers were over 50 words (i.e., the length of a whole paragraph); and 9.1% of answers included hypertext links. 70.4% of the in-line feedback which askers provided on the answers they received rated the answers as 'good', 14.1% rated the answers as 'OK', and 15.5% rated the answers as 'bad'.

There is a broad range of answerers. 78,343 users (86.7% of users) have been contacted by Aardvark with a request to answer a question; of those, 70% have asked to look at the question, and 38.0% have been able to answer. Additionally, 15,301 users (16.9% of all users) have contacted Aardvark of their own initiative to try answering a question (see Section 3.6 for an explanation of these two modes of answering questions). Altogether, 45,160 users (50.0% of the total user base) have answered a question; this is 75% of all users who interacted with Aardvark at all in the period (66,658 users). As a comparison, only 27% of Yahoo! Answers users have ever answered a question [7]. While a smaller portion of the Aardvark user base is much more active in answering questions (approximately 20% of the user base is responsible for 85% of the total number of answers delivered to date), the distribution of answers across the user base is far broader than on a typical user-generated-content site [7].

Social Proximity Matters. Of questions that were routed to somebody in the asker's social network (most commonly a friend of a friend), 76% of the inline feedback rated the answer as 'good', whereas for those answers that came from outside the asker's social network, 68% were rated as 'good'.

People are indexable. 97.7% of the user base has at least 3 topics in their profiles, and the median user has 9 topics in her profile. In sum, Aardvark users added 1,199,323 topics to their profiles; not counting overlapping topics, this yields a total of 174,605 distinct topics in which the current Aardvark user base has expertise. The currently most popular topics in user profiles are "music", "movies", "technology", and "cooking", but in general most topics are as specific as "logo design" and "San Francisco pickup soccer".
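Because the indexed resources are people, "People are indexable" amounts in practice to something like an inverted index from topics to users, analogous to a term index over documents. The sketch below is a minimal illustration under that assumption; the real indexer is described in Sections 3.2 and 3.3, and the profile data here is fabricated.

# Toy inverted index from topic -> users, mirroring how a document
# search engine indexes terms; profile data is made up for illustration.
from collections import defaultdict

profiles = {
    "alice": {"music", "logo design"},
    "bob": {"cooking", "music"},
    "carol": {"san francisco pickup soccer", "cooking"},
}

index = defaultdict(set)
for user, topics in profiles.items():
    for topic in topics:
        index[topic].add(user)

print(sorted(index["cooking"]))   # candidate answerers: ['bob', 'carol']
print(len(index))                 # distinct topics (cf. the 174,605 above): 4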
6. EVALUATION

To evaluate social search compared to web search, we ran a side-by-side experiment with Google on a random sample of Aardvark queries. We inserted a "Tip" into a random sample of active questions on Aardvark that read: "Do you want to help Aardvark run an experiment?" with a link to an instruction page that asked the user to reformulate their question as a keyword query and search on Google. We asked the users to time how long it took to find a satisfactory answer on both Aardvark and Google, and to rate the answers from both on a 1-5 scale. If it took longer than 10 minutes to find a satisfactory answer, we instructed the user to give up. Of the 200 responders in the experiment set, we found that 71.5% of the queries were answered successfully on Aardvark, with a mean rating of 3.93 (σ = 1.23), while 70.5% of the queries were answered successfully on Google, with a mean rating of 3.07 (σ = 1.46). The median time-to-satisfactory-response for Aardvark was 5 minutes (of passive waiting), while the median time-to-satisfactory-response for Google was 2 minutes (of active searching).
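For concreteness, the success rates and rating statistics reported above can be computed from per-question records along the following lines. The ratings below are fabricated placeholders rather than the experiment's data, and we assume a population standard deviation since the paper does not state which convention its σ follows.

# Sketch of the summary statistics reported above; ratings are made-up
# placeholders. None means the user gave up after the 10-minute cutoff.
from statistics import mean, pstdev

aardvark_ratings = [5, 4, 3, None, 4, 5, None, 2]

answered = [r for r in aardvark_ratings if r is not None]
success_rate = len(answered) / len(aardvark_ratings)

print(f"answered successfully: {success_rate:.1%}")                   # 75.0%
print(f"mean rating: {mean(answered):.2f} (sigma = {pstdev(answered):.2f})")
# mean rating: 3.83 (sigma = 1.07)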
Of course, since this evaluation involves reviewing questions which users actually sent to Aardvark, we should expect that Aardvark would perform well — after all, users chose these particular questions to send to Aardvark because of their belief that it would be helpful in these cases. And in many cases, the users in the experiment noted that they sent their question to Aardvark specifically because a previous Google search on the topic was difficult to formulate or did not give a satisfactory result [fn 7].

[fn 7] For example: "Which golf courses in the San Francisco Bay Area have the best drainage / are the most playable during the winter (especially with all of the rain we've been getting)?"

Thus we cannot conclude from this evaluation that social search will be equally successful for all kinds of questions. Further, we would assume that if the experiment were reversed, and we used as our test set a random sample from Google's query stream, the results of the experiment would be quite different. Indeed, for questions such as "What is the train schedule from Middletown, NJ to New York Penn Station?", traditional web search is a preferable option.

However, the questions asked of Aardvark do represent a large and important class of information need: they are typical of the kind of subjective questions for which it is difficult for traditional web search engines to provide satisfying results. The questions include background details and elements of context that specify exactly what the asker is looking for, and it is not obvious how to translate these information needs into keyword searches. Further, there are not always existing web pages that contain exactly the content that is being sought; and in any event, it is difficult for the asker to assess whether any content that is returned is trustworthy or right for them. In these cases, askers are looking for personal opinions, suggestions, recommendations, or advice from someone they feel a connection with and trust. The desire to have a fellow human being understand what you are looking for and respond in a personalized manner in real time is one of the main reasons why social search is an appealing mechanism for information retrieval.

7. ACKNOWLEDGMENTS

We'd like to thank Max Ventilla and the entire Aardvark team for their contributions to the work reported here. We'd especially like to thank Rob Spiro for his assistance on this paper. And we are grateful to the authors of the original "Anatomy of a Large-Scale Hypertextual Web Search Engine" [3], from which we derived the title and the inspiration for this paper.

8. REFERENCES

[1] H. Bechar-Israeli. From <Bonehead> to <cLoNehEAd>: Nicknames, Play, and Identity on Internet Relay Chat. Journal of Computer-Mediated Communication, 1995.
[2] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder. Hourly analysis of a very large topically categorized web query log. In Proceedings of the 27th SIGIR, 2004.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Computer Networks and ISDN Systems, 1998.
[4] T. Condie, S. D. Kamvar, and H. Garcia-Molina. Adaptive peer-to-peer topologies. In Proceedings of the Fourth International Conference on Peer-to-Peer Computing, 2004.
[5] A. R. Dennis and S. T. Kinney. Testing media richness theory in the new media: The effects of cues, feedback, and task equivocality. Information Systems Research, 1998.
[6] J. Donath. Identity and deception in the virtual community. Communities in Cyberspace, 1998.
[7] Z. Gyongyi, G. Koutrika, J. Pedersen, and H. Garcia-Molina. Questioning Yahoo! Answers. In Proceedings of the First WWW Workshop on Question Answering on the Web, 2008.
[8] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd SIGIR, 1999.
[9] B. J. Jansen, A. Spink, and T. Saracevic. Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing and Management, 2000.
[10] M. Kamvar, M. Kellar, R. Patel, and Y. Xu. Computers and iPhones and mobile phones, oh my!: A logs-based comparison of search users on different devices. In Proceedings of the 18th International Conference on World Wide Web, 2009.
[11] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning. Combining heterogeneous classifiers for word-sense disambiguation. In Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation, 2002.
[12] M. R. Morris, J. Teevan, and K. Panovich. What do people ask their social networks, and why? A survey study of status message Q&A behavior. In Proceedings of CHI, 2010.
[13] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Stanford Digital Libraries Working Paper, 1998.
[14] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large Web search engine query log. In SIGIR Forum, 1999.
[15] A. Spink, B. J. Jansen, D. Wolfram, and T. Saracevic. From e-sex to e-commerce: Web search changes. IEEE Computer, 2002.
[16] L. Sproull and S. Kiesler. Computers, networks, and work. In Global Networks: Computers and International Communication. MIT Press, 1993.
[17] M. Wesch. An anthropological introduction to YouTube. Library of Congress, June 2008.