Anda di halaman 1dari 18
282 Chapter 14 Social Bookmarking and Web Search Yusuke Yanbe Kyoto University, Japan Adam Jatowt Kyoto University, Japan Satoshi Nakamura Kyoto University, Japan Katsumi Tanaka Kyoto University, Japan ABSTRACT Social bookmarking is an emerging type of a Web service for reusing, sharing, and discovering resources. By bookmarking, users preserve access points to encountered documents for their future access. On the other hand, the social aspect of bookmarking results from the visibility of bookmarks to other us- ers helping them to discover new, potentially interesting resources. In addition, social bookmarking systems allow for better estimation of the popularity and relevance of documents. In this chapter, we provide an overview of major aspects involved with social bookmarking and investigate their potential ‘for enhancing Web search and for building novel applications. We make a comparative analysis of two popularity measures of Web pages, PageRank and SBRank, where SBRank is defined as an aggregate number of bookmarks that a given page accumulates in a selected social bookmarking system. The results of this analysis reveal the advantages of SBRank when compared to PageRank measure and provide the foundations for utilizing social bookmarking information in order to enhance and improve search in the Web. In the second part of the chapter, we describe an application that combines SBRank and PageRank measures in order to rerank results delivered by Web search engines and that offers several complimentary functions for realizing more effective search. DOL: 10.4018/978-1-60566-384-5.chOl4 ‘Social Bookmarking and Web Search INTRODUCTION Social bookmarking is one of the main trends of anew generation of the Web called Web 2.0. The idea behindsocial bookmarking isto letusersstore URLsto their favoritepages andmake them visible to other users. Fach social bookmark is annotated with tags that describe the bookmarked resource and that were freely chosen by a bookmarker. Del.icio.us’ is currently the most popular social bookmarking service. Ithas been operating since 2003 and currently has about 3 million users that bookmarkedaround 100 million Web documents’ There are also other popular social bookmarking, sites such as Furl" or Simpy’ Non-social bookmarking was proposed first by (Keller, Wolfe, Chen, Labinowitz, & Mathe, 1997) as a way to remember and locally store access points to visited Web documents. In social bookmarking the social aspect of bookmarking allows for discovery of new, potentially relevant resources thanks to the combined effort of many users. This makes italso possible to determine the resources that are both relevant (by the analysis of their tags) and recently popular (by counting their bookmarks) as well as permits to track their popularity and relevance over time. For example, del.icio.us informs users about popular pages that recently obtained many bookmarks and clouda- cio.us® displays historical patterns and trends of bookmarked resources. ‘The incentives of social bookmarkers have been recently categorized by (Marlow, Naaman, Boyd, & Davis, 2006). According to the authors, users decide to bookmark the resources because of the following reasons: future retrieval, con- tribution and sharing, attract attention, play and competition, selfpresentation, opinion expression, In most cases, however, bookmarking are useful forindividual users who wantto externally (hence beyond the limit ofa single PC machine) store ac- ccesspoints to their selected resources, However, in this way,the usershelpalso to manually arrange the ‘Web ina bottom-up fashion since they categorize the ontine resources and enable better estimation of their popularity and, indirectly, quality. Animportant characteristic oftagging in social bookmarking ystems isthe lack of any controlled vocabulary. Usersare free to annotate bookmarked documents as they wish or they can borrow same tags as others used. However, after some time, certain forms of tag agreements emerge forthe sources as demonstrated in (Golder & Huberman, 2006). The process of resource categorization by free taggingiis called folksonomy andisinherently different from a rigid classification usually done by domain experts, for example, by librarians. However, the well-known problems with folk- sonomy result from its advantages, that is, from the uncontrolled, free character of categorizing resources by many users, Forexample, ambiguity, synonymy orpolysemy occuramong tags that can undermine the retrieval process. In this chapter we discuss the social book- ‘marking phenomenon and provide the results of analytical study aimed at analyzing the usefulness of social bookmarks for improving the search in the Web. In particular, we perform a comparative analysis of two popularity measures of Web pages, ‘SBRank and PageRank. PageRank is a popular iterative algorithm that scores Web pages based ontherandom surfer model (Page, Brin, Motwani, & Winograd, 1998). In short, a page has a high PageRank value if it is linked from a relatively large number of other pages that also have high PageRank scores. By finding popular resources both SBRank and PageRank provide means for selectinghigh quality Web pages assuming posi- tive correlation between popularity and quality. In addition, we analyze several other aspects of pages bookmarked in social bookmarking sites. For example, we investigate the dynamics of the both metrics in order to confirm whether mixing them could improve the dynamic characteristics of search results. Next, we also discuss the potentials of social bookmarking systems for providing new, suc- cessful kinds of Web services. For example, a 243 promising business model could be based on exploiting social bookmarks for improving the precision of direct advertising, Weemphasize here the additional aspects of social bookmarks such as the availability of temporal and location-type data that can be leveraged for realizing extended. and efficient search models. The historical pat- tem of social bookmarks accumulation could aid in a better discovery of resources that are in their popularity peaks or that are becoming recently popular among Web users. Additionally, categorizing tags into content-descriptive and sentiment-bearing ones allows for capturing page semantics and estimating user attitudes towards the bookmarked documents. We discuss several potential application examples and explain those characteristics of social bookmarksand, in general, social bookmarking thatare likely tobe key points in creating successful business models. Here, we also demonstrate the application for extended Web search that we have designed based on some of these ideas. RELATED RESEARCH PageRank (Page, Brin, Motwani, & Winograd, 1998) and HITS (Kleinberg, 1999) are the most popular link-based page ranking algorithms for ‘measuring the popularity ofpages. Since automati- cally judging the quality of resources is still not possible to be done by machines, hence the most effective approach is to rely on opinions of large number of Web authors. The calculations of both PageRank and HITS are thus dependent on the number of pages linking to target resources and, basically, favor documents that are commonly refereed to fiom other pages. Nevertheless, the link-based popularity estimation algorithms suffer from certain problems such as link-spamming, difficulties in link creation and poor temporal characteristics. One reason for this is that only Web authors can cast votes to pages by creating links to them, which is in contrast to socially 208 Social Bookmarking and Web Search bookmarking pages, where anyone can “rec- ommend” pages to others. Social bookmarking is thus a more democratic process of resource selection due to its simplicity and accessibility. In addition, since it is necessary for a resource to acquire lange number of in-links in order to ‘become more visible on the Web, usually, certain time delay occurs before the page can obtain high score as calculated by the link-based metrics. The observation that PageRank is biased against new ages as it takes some time until pages become noticed and popular among Web authors has been made by (Bacza-Yates, Castillo, & Saint-Jean, 2004) (Yu, Li, & Liu, 2004). Some researchers, attempted at eliminating this bias to some degree through incorporating the last-modification dates of pages (Baeza-Yates, Castillo, & Saint-Jean, 2004) or adding exponentially decaying factors to PageRank scores (Yu, Li, & Lit, 2004), Lastly, links may have variety of meanings and purposes as discussed in (Mandl, 2006). Previous studies on social bookmarking fo- cused mostly on the issues related to folksonomy (Golder & Huberman, 2006) (Marlow, Naaman, Boyd, & Davis, 2006) (Wu, Zubair, & Maly, 2006) (Wu, Zhang, & Yu, 2006) (Ya, Li, & Liu, 2004) (Zhang, Wu, & Yu, 2006). For example, (Zhang, Wu, & Yu, 2006) introduced hierarchical concept model of folksonomies using HACM -@ hicrarchy-clustering model reportingcertainkinds of hierarchical and conceptual relations between tags. Inanother work, (Golder & Huberman, 2006) investigated the characteristics of tagging and ‘bookmarking and revealed interesting regulari- ties in user activities, tag frequencies, and bursts in popularity of tags in social bookmarks. The authors also analyzed tagging dynamicsas well as classified tags into seven categories depending on the functions they perform for bookmarks. None of the previous studies, however, focused on the comparative analysis of link structure and social bookmarking-derived metres, Perhaps, the closest work to ours is the one done by Heymann et al. (Heymann, Koutrika, & ‘Social Bookmarking and Web Search Garcia-Molina, 2008) who has recently made a large scale analysis of data in Del.icio.us to inves- tigate whether social tagging Websites can be of any use for Web search, Amongother results they found that popular query terms and tags overlap significantly, most tags were deemedrelevantand, objective by usersand tags are present n page text .0f 50% of pages. Our work is, however, unique in its comparison between SBRank and PageRank as well as in its larger focus on temporal charac tcristics of social bookmarks. We also propose ‘merging link-based ranking metries withthemetric that leverages results of collaborative tagging. In addition, we exploitother characteristics of social bookmarking systems such as agglomerated user behavior, sentiment of users towards bookmarked resources. Recently, (Bao, Wu, Fei, Xue, Su, & Yu,2007) introduced an authority-centric search model in social bookmarks that is based on the number of social bookmarks that pages have and the characteristics of users bookmarking them. In (Wu, Zubair, & Maly, 2006) HITS algorithm was adapted for identifying high quality resources and users that provide such resources in the Web. In another work, (Damianos, Griffith, & ‘Cuomo, 2006) proposed using social bookmark- ing for information sharing and management in corporate Intranets. Finally, (Wu, Zhang, & Yu, 2006) described techniques for exploiting social bookmarking for the purpose of fostering the development of Semantic Web. The authors used. probabilistic generative modelto captureemerging semantics of resource ANALYSIS OF SOCIAL BOOKMARKING Dataset Characteristics ‘We have created two datasets for our study. We have chosen del.icio.us as a source of the first dataset, while the second dataset was created. using Hatena Bookmark’, which is the most popular bookmarking service in Japan started in February 2005, The datasets were obtained in the follow- ing way. We have used popular tags, which are sets of the most popular and recently used tags published by del.icio.us and Hatena Bookmark. 140 tags have been retrieved on December 6th, 2006 from del.icio.us and 742 tags on February 16th, 2007 from Hatena Bookmark, Next, we collected popular URLs from these tags. Usually Jess than 25 popular pages were listed for each. tag in both the social bookmarking systems. At this stage, we obtained 2,673 pages for del.icio, usand 18,377 pages for Hatena Bookmark. In the last step, we removed duplicate URLs (i.e. URLs listed under several popular tags). In result, we obtained 1,290and 8,029 unique URLs for del ico. us and for Hatena Bookmark, respectively. Each URL had two attributes: firstDate and SBRank. firstDate indicates the time point when apage was introduced to the social bookmarking system for the first time by having first bookmark created. ‘SBRank, as mentioned above, is the number of accumulated bookmarks of given page obtained at the date of the dataset creation. In order to detect PageRank scores of the URLs, we used Google Toolbar’ - a browser toolbar that allows viewing PageRank scores of visited pages. PageRank scores obtained in this way are approximated on the scale from 0 to 10 (O-means the lowest PageRank score). To sum up, the obtained datasets are the snap- shots of the collections of popular pages in both. social bookmarking systems. Each page has its PageRankand SBRank scores recorded thatithad at the time of the dataset creation. Distribution of PageRank and SBRank Figures I a and | b show the distributions of ‘SBRank scores in the both datasets, We can see that only few pages arc bookmarked by many us- 248, ‘Social Bookmarking and Web Search Figure 1. Distribution of SBRank scores, a) del.icio.us dataset, and b) Hatena Bookmark dataset a = ; » Stak renee Figure 2. Distribution of PageRank scores, a) del.icio.us dataset, and b) Hatena Bookmark dataset ers, while the rest is bookmarked by a relatively Jow number of users. Figures 2 a and 2 b show, in contrast, the distribution of PageRank scores in ‘both datasets. We found that more than a half of pages (56.1%) have PageRank scores equal to 0 inthe del.icio.us dataset, while Hatena Bookmark dataset (81%) contains even more such pages, 246 which is probably due to its mote local scope. ‘These pages have relatively low popularity as ‘measured by the link-based metric. Thus it may be difficult for users to find them through con- ventional search engines. For example, Figure 3 shows the average distribution of PageRank scores in search results delivered from Yahoo! ‘Social Bookmarking and Web Search Figure 3. Average PageRank scores of top 500 pages —qoo dataset —Hatena dataset 25 $2 $ 15 3 1 é Bos s < 0 1 101 201 301 401 Rank of page in top 500 search result search engine® for two sample query sets (see Ap- pendix on how the query sets were created). Nev- ertheless, many social bookmarkers considered the pages to be of high quality by bookmarking them. We think that it would be advantageous if some of these pages could be added into search results| provided they are relevant to issued queries. In general, we think that there may be two possible reasons that caused the occurrence of many pages with low PageRank scores in the datasets, despite their relatively high popularity among social bookmarkers. One is that the pages could have been created rather recently, hence, on average, they did not acquire many in-links and indirectly the high visibility on the Web. Or, in contrast, the pages have been published on the ‘Web since long time, yet their quality cannot be reliably estimated using PageRank measure due to various other reasons. We decided to analyze the temporal characteristics of pages stored in our dataset in order to provide an answer the above question, Figure 4 a and 4b show the distribu- tions of the dates of page additions to the social bookmarking systems (i.e., firstDate). Looking at these figures, we can see that more than a half of the pages appeared in the social bookmark- ing systems within the first three months before the dataset creation. The other half of the pages ‘was bookmarked in the system for the first ime ‘more than three months before the time when the data was crawled, The reason that Hatena Book- ‘mark dataset contains more fresh pages is most likely due to its shorter age than that of del iio. us. From these figures it can be seen that there are many recently added pages in both datasets. This is especially true for pages with PageRank scores equal to 0 as shown in Figure $ a and 5 b, Such pages usually do not have enough time to acquire many in-links, hence, they retain low PageRank scores. From these results we observe one of the useful aspects of SBRank when compared to link-based metrics. The latter is not effective in terms of fresh information retrieval since pages 2a7 ‘Social Bookmarking and Web Search Figure 4, Histogram of firstDate of pages, a) del.icio.us dataset, and b) Hatena Bookmark dataset as 20% a : Ems Sin 1 * @ feaoate os 08 = Ev % ms 108 oO gs Se ease eae Bie PREER2SE2E 222 © feta Figure 5. Histogram of firstDate of pages that have PageRank score equal to 0, a) del.icio.us dataset, and b) Hatena Bookmark dataset 2000/9 ao0a/12 2004/3 2004/6 2a0es9 ao0s12 2009/3 2008/6 2008/9 2005/12 2008/2 2008/6 2006/9 2008/12 @ eatDate require relatively long time in order to acquire large number of in-links. Relatively novel pages may have thus difficulties in reaching top search results in current search engines even if their quality is quite high. We confirm the results of (Baeza-Yates, Castillo, & Saint-Jean, 2004) that PageRank scores of pages are highly correlated with theirage in Figure 6aand 6b, The correlation 208, ‘of eens seBS 2883 Beefe28822228 sass o) coefficients between PageRank and firstDate is 0.85 for del.icio.us and r = ~0.51 for Hatena Bookmark datasets To sum up, the results suggest that many popularpagesin social bookmarkingsystemshave relatively low PageRank scores. In addition, we confirm that SBRank has better dynamics than the traditional link-based page ranking metric, This, ‘Social Bookmarking and Web Search Figure 6, Scatter plot of firstDate and PageRank score, a) del.icio.us dataset, and b) Hatena Bookmark dataset @ is because social bookmarks allow for a more rapid, and unbiased, popularity pages. Complementing PageRank using SBRank hhas thus potential to improve the search process on the Web. timation of ENHANCED WEB SEARCH PROPOSAL In this section, we demonstrate a simple method for enhancing Web search by re-ranking top N resultsretured from conventional searchengines using the information about the number of their social bookmarks and their associated tags, To implement a combined rank estimation measure we propose a linear combination of both ranking metres. Intheequation, SBRant, is thenumber of book- marks ofa page jin del.icio.us, while PageRank, is a PageRank score of the page acquired using Google Toolbar. We normalize both SBRank, and PageRank, scores by dividing them by the maxi- ‘mum scores found for all N pages. a is a mixing parameter with the value ranging from 0 to 1 Below, we discuss the extensions to a query ‘model that are possible thanks to considering the various types of information attached to social bookmarks, Metadata Search Type Tags annotated by users are useful fora so-called. search by metadata (“metadata query”), In our search model a user can issue both a traditional query, which we call “content query” as well as “metadata query”. In such a case, pages that contain the content query in their contents will be re-ranked considering the overlap bet user-issued metadata query and the tags deserib- ing them, For each page we construct atag vector based on accumulated tags of the page and their SBRank, (SBRank} CombinedRank, = a» +f PageRank, -a)s a 249 frequencies. We also construct a metadata query vector in the same way. Then the cosine similar- ity between metadata query and the tag vectors is used for re-ranking search results pages. Temporal Search Type It is also possible to construct temporally con- strained queries thanks to using timestamps of social bookmarks. In consequence, pages could be retrieved according to arbitrary aspects of their popularity patterns. For example, pages that feature quickly rising popularity overtime can be returned on top results. Note that this fmetional- ity cannot be realized using traditional link-based, approaches, as there is no available data on the link evolution of the Web. The temporal extension to the query model that we propose is as follows: First, we propose filtering pages according to their firstDate values, that is, according to the ages of pages inside so- cial bookmarking systems. Users can find those documents that have been recently introduced to the social bookmarking system. Obviously pages may be older than that ‘Next, we allow users to search for pages ac- cording to the popularity variance over time (vari- ance of SBRank over time). For example, search results may contain pages with stable popularity function or pages reflecting large fluctuations in user preferences over time. Lastly, we propose capturing levels of page populatities in certain, specified periods of time through summing the numbers ofbookmarks made to pages during those intervals and re-ranking pages accordingly. Thus, user can request rel- evant pages that were popular within a selected time period Sentiment Search Type Pages are often tagged by subjective or “egoistic” tags such as “shocking”, “funny”, “cool”, Al- though, typically, such tags are viewed negatively 250 Social Bookmarking and Web Search as hindering the retrieval process, we consider them as providing useful information about page quality and user impressions. In our enhanced search model, we thus enable users to issue @ sentiment-like query. Before implementing this feature, we first measured the numberof sentiment tags used by bookmarking users. We have used here the Hatena Bookmark dataset. Tags in this dataset were classified into two groups accord- ing to the taxonomy of tags defined by (Golder & Huberman, 2006) + tags that identify what or who the resource is about + tags that identify qualities or characteris- tics of resources (e.g., “scary”, “funny”, “stupid”) We have manually examined top 1,100 tags from Hatena Bookmark dataset in order to distin- guish between content and sentimenttags. Table | shows the top 10 content and sentiment tags after ‘translation’. In the top 30 tags we have observed the ratio of content tags to sentiment tags to be about 10:1. In Figure 7 we show the distribution of tag frequencies. The results reveal that top 3 sentiment tags are very common, while the other tagsareratherlessused. fier including synonyms ‘we found thatthe most popularsentiment tags are: useful, amazingand awfal. Figure 8 shows the top 54 sentiment tags locatedon the negative-positive scale with their frequencies. The tags that occur more than 3000 times are placed over the dashed line, while those with frequencies less than 100 ‘times are under the horizontal axis. It is easy to notice that, in general, there are more positive sentiment tags than negative ones. Besides, the positive ones are also more frequently used. Only one negative sentiment tag has been used more ‘than 100 times (“it’s awful”), This means that social bookmarkers rarely bookmark resources for which they have negative feelings. For the purpose of utilizing sentiment tags in our search model, we create page sentiment vec« ‘Social Bookmarking and Web Search Table 1. Top 10 content tags (left) and top 10 sentiment tags (right) Tap Name N Tap Name N Web 1633 [ [user oogle 15674 | | Weamaring tcl 14483 [iam jevaneint 11.840 |_| wel @) youtube ass |_| inverting ons ores |_| fay) a6 = sant] [ewe = sien on funny ap 2h sae [ [wet m7 soci vain [ [ise 36s tor from the sentiment tags added by users. The similarity of this vector to the vector built using, userissued sentiment queries is then used for computing sentiment-based scores of pages. Final Re-Ranking Inthis section we discuss the way to integrate the above discussed extensions. At query time the system performs the following operations: Obtain top n pages from search results re- tured by a search engine P={p,,P...5, for query q Obtain SBRank scores for each pi where peP Obtain every bookmark and its associated data for each p, that has SBRank > 0 (i.e., the page has at least one social bookmark) Count the number of occurrences of users and tags to be used for providing “related Figure 7. Frequency distribution of top 20 content and sentiment tags 18000 18000 14000 12000 10000 ‘000 15000 4000 2000 Frequency 251 Social Bookmarking and Web Search Figure 8. Top 54 sentiment tags on the negative-positive sentiment scale Frequently Used Its awtul 3,000 times used, Negative 100 tines used useful(t) ‘ts amazing me funy ma, interesting iteugetu sunny sell fanny@) ovomatte SVE useful(4) vag cad co ea fats pn ceo aa ~ laugh nr nell easing tan wow | PPO cemiaint Janets ign practical GREAT its scan may ig laugh Less Used — *"#PPes Tags” and “related Bookmarks” structures (described later) In order to incorporate the above re-ranking ‘mechanisms, we have applied the ranking formula shown in Figure 9, The original search results returned by the search engine are re-ranked using Rank(p,) function, Here, B(p) represents the popularity estimate of p, using the combination of SBRank(p,) and SearchRank(p), which is the rank of the page in the results returned froma search engine. Not that although, our concepts to combine SBRank with PageRank scores, ithe actual implementation, we have used the ranks of pages returned from search engines as an approximation of pages’ popularity onthe Web. Fp, isthe freshness level ofp; Vip) isa variance measure of the function representing added bookmarks top, sim(ag, tag, is the simi- larity between page tag vector and query vector, 252 while sim(tag, tag) is the similarity between the page sentiment vector and the query vector. Sp tags fag) i8 the proportion of bookmarks of P, Which have been added in the time period /t,. 1,4] 10 the total number of bookmarks added to this page in the bookmarking system. a, f, y and 6 are control parameters. System Interface The interface of the prototype application that we have build is shown in Figure 10. It contains 4 slidesbar controls for adjusting a, f, 7 and 5 pac rameters. “Time span” control was implemented as the combination of two sliding bars used for selecting desired time periods (the time span can be also specified by entering dates in two textual boxes). Radio buttons were added in order to let users select one of the three most popular senti- ‘Social Bookmarking and Web Search Figure 9. Formula used for re-ranking search results Rankp,)=(1+ Ble (+ Fe) MD CTP 14 L0G) S(Phogtat)) where: Be) SBRank p)+ (1a) -SearchRank .) rp poi) -min, Frat) yep )ny-—_HatbaFtratDand py) LasDatdp) OT ar Ware FirstDaidp,)Las.Datdp )) Tong) =a simtagyteg,) P(p,g) = -sim(agtag,) . Ad 001( gst) ‘w)* 4ddBookp,,FirstDatd p ).LasDatdp,)) Searchtankp isarankof p, ‘SBRani{p,)sthenumberofbookmarksot 7 FiraDatdp is thefestdatewhena bookmark vas madetop, ‘astDaid p sthelatdatewhena bookmark vasmadeto p, ag, isa og venorot p 14g": isa sentiment tg vectorof p, SP stg aq tegeerarg ‘Alt the numero bookmarks mado pin ae) site? ag)asimaribctween ig vectra pand- sapeins ‘ment expressions: useful, amazing and awful. As it is difficult to list all potential sentiment tags into the interface, we have grouped the similar sentiment expressions with their corresponding, ‘weights in order to form the three basic sentiment categories, By default the controls are at the positions in which they do not influence search results so that users can perform the usual content-based search, without any additional query features. In addition, the system dynamically gener- ates navigable structures called “related tags and “related bookmarks” according to issued queries for enabling serendipitous discovery (sce the right hand side of Figure 10). “Related tags” is a tag cloud listing 20 most frequently ‘occurring tags for all the N pages returned. The font sizes of tags are determined by the frequen- cies of their occurrences in the results. The tag cloud lets users explore other tags related to returned results and indirectly to the issued queries. Clicking on any tag makes pages list- ing documents assigned to this tag appear from. social bookmarking systems. In addition, next to each tag, there are the “+” signs associated with the tags and clicking on them makes the system issue new search queries with the cor- responding related tag, On the other hand, “related bookmarks” are tuples of social bookmark users and their tags obtained using the returned search results. So- cial bookmarkers have scores assigned depend- 253 ‘Social Bookmarking and Web Search Figure 10. Snapshot of the interface of the enhanced search system 6 Moun I sestaebigensly 9] XPATH 2 BB Beever Orme: SBSearch went ‘Seach Engine Tempera Factor bus Ee et ae iene! Moun 3 eat Senex oh MU EOT Mea ‘Stays yastih Senden Za terrecs boa eae Si 79mg 10 (Sarai comme age cnn 00 Sip Sto ‘usa co avis anne Seen soe i ee eed cae Foe pw, eb go. 01. 8 og Pereotmen on fa cna bat Cnr ing how many of the returned results they have bookmarked. Then, for the top-scored users, the system detects the other most frequent tags that these users used provided they are assigned to the pages retumed in the results, The links to the corresponding pages of these tags in social bookmarking systems are displayed as “related bookmarks”. In Figure 11 we show a modified interface of the above application, Here, upon request, users are able to observe the temporal pattern of social bookmark creation for each page retumed in results. 254 Source Weight seneue/ Haase Teas ‘Sci Booearks User Response oust © Languages ©sspanese On) Process Tune 649sec Related Tags. Viens muy °2cH1-buuetooth aaron aes. gadget dame site seta news intendo »pe ps3 stab Wil /outube .= his aay hte Related Bookmarks 145 RanTaun {2onematane ‘Semel jo2magzcve se auzamene 7h ore 27 Ratan 32 matoame Press 2 aon SUMMARY Inthissection, wesummarize positiveandnegative aspects of social bookmarks from the viewpoint of their usefulness for improving current Web search. Positive Factors Theusefulness of social bookmarks for enhancing, search in the Web has been recently proved by several research efforts (Bao, Wu, Fei, Xue, Su, & Yu,2007) (Heymann, Koutrika, & Garcia-Molina, 2008). Similarly to Web links, social bookmarks ‘Social Bookmarking and Web Search are kinds of votes cast by Web users to resources such as Web pages. Links, however, are usually created by document authors and, thus, average users are rather constrained in making links. On. the other hand, social bookmarks, as being easily ‘generated by users, are, as a consequence, a more democratic means of page quality assessment. As we have previously shown, on average, social bookmarks have better temporal characteristics (ic., are more dynamic) and allow for early de- tecting high quality pages that are often still not popular on the Web when judgingby conventional approaches (e.g, PageRank), In addition, tags at- tached to bookmarks provide information about the topical scopes of bookmarked resources or sometimes even convey attitudes of users to book- marked resources, Next, the timestamps of social bookmarks enable estimation of the Huctuations in their popularity within social bookmarking systems over time, This can be used for various Figure 1. Snapshot of the system interface with displayed temporal patterns of social bookmarks ac- cumulation Ce Me Mls es twee suircowactieormme Be coos) 255 time~ :ntric improvements of search results. In general the information associated with social bookmarks appears to be useful in improving the precision and extending the query model in information retrieval process. Negative Factors There are, however, several obstacles in directly utilizing social bookmarks for Web search, One is a relatively small number of pages having a considerable amount of social bookmarks on the ‘Web. This problem, however, seemsto besolvable inthenear future considering the current popularity increase of social bookmarking systems among Web users (Heymann, Koutrika, & Garcia-Molina, 2008), Another issue isrelatedto the characteristics of social bookmarking that makes it popular and useful for Web search, Namely, social bookmarks havehigh vulnerability to spamming, Since book- marking pages is quite simple, hence, it is also relatively easy to influence the number of social bookmarks a given page has or to purposefully assign wrong tags to the page. The obvious rea- son for such manipulation would be the expected increase in the visibility ofthe page in the system and, indirectly, its visibility on the whole Web. ‘Naturally, certain measureshave been undertaken to cope with this problem. For example, filters are set against automatic creation of user accounts using “captcha”, detection of robot behavior or other preventive methods. Social bookmarking users can also report suspicious accounts or spam. ‘Web pages in several systems. Nevertheless, it is easy to foresce that the above approaches are superficial and cannot be effective in the future, neither scale well on the Web. If social bookmarks ate ever going to be 2 scrious improvement to the Web scale search and, actually, if they are going to be still useful in the future, the problem of spamming has to be appropriately tackled, It is thus apparent that this issue requires much research focus. However, so far, there has been no efficient proposal towards 256 Social Bookmarking and Web Search combating spamming in social bookmarking systems, Lastly, as mentioned before, the lack of controlled vocabulary, misspellings, synonymy, polysemy, ete, canhinderthe information retrieval process that utilizes tags in social bookmarks. OTHER POTENTIAL APPLICATIONS We believe there can be many potential business ‘opportunities related to social bookmarks. For example, social bookmarking in enterprises and intranets has been already proposed and imple- mented (¢.g., (Damianos, Griffith, & Cuomo, 2006), (Millen, Feinberg, & Kerr, 2006)). Since there are already several different so- cial bookmarking services on the Web we think ‘that the combination of data from these could be advantageous to users. Similarly to the concept of meta-search engines this should increase the coverage, freshness and trustworthiness of genet- ated resource ranking. An unsolved issue in such meta-seatch applications is the way to combine popularity estimates drawn from different sources. Such combination should perform normalization score according to various characteristics of dif- ferent services such as their sizes or scopes. We see also potential in improving contextual advertising on the Web using social bookmarks. Currently, representative keywords are automati- cally extracted from pages that are latermatched to the pool of ads. Inan extended advertising model the advertising terms for a given page could be selected among tagsassociated to its bookmarks or could be even retrieved from among related pages commonly bookmarked by the same users. The integration of social bookmarking with social networks is another potential area. Social circles also help improve Web search, (Mislove, Gummadi, & Druschel, 2006) proposed the Web search system that can search pages using web- browsing log between members of given social network as well as pages indexed by conventional ‘Social Bookmarking and Web Search Web search engines. They reported that the system retumed 8% non-indexed but viewed by com ‘munity previously pages. They also destibed the issues of privacy, membership identification of social networks, ranking of search results, scal- ability and so on. In 2007, Google together with, several social network service providers launched OpenSocial". This is a unified platform that lets third party developers to utilize relationships be- ‘tween social network users, in their applications. Social networkingis closely related to socialbook- ‘marking, Social network contains information on ‘human relationships, while social bookmarks are descriptions of user interests. Integrating both systems should enhance the accuracy of recom- mendation, because social networks correspond. to human relationships and the connected users often feature similar interests Lastly, according to our recent study there is a lot of geographical information included in tags and we expect that some sort of location-aware search adaptation could be feasible, State-oF the art geo-tagging approaches rely on extracting location-related information from page content (eg, (Amitay, Har"El, Sivan, & Soffer, 2004)) or from among linked resources. Supplementing, thesemethods fromthe location-based information derived from social bookmarks and their tags may turn out advantageous. The investigation of this potential forms the part of our future work. ‘CONCLUSION In this chapter, we aimed at increasing user's familiarity with social bookmarking by discuss- ing its major characteristics. We described the key aspects related to social bookmarking and its potential to enhance Web search or to be used. forereating novel Web applications. An important part of this description is a comparative analysis of PageRank - a widely-known page popularity ‘measure witha metric derived by the aggregation of social bookmarks (SBRank). This comparative analysis is important inthe view of recent propos- als to incorporate social bookmarks into search ‘mechanisms on the Web. In result of our analysis, we are able to indicate the areas where SBRank is superior to PageRank. To remain objective, we also discuss the shortcoming of the popularity estimation based solely on the amount of social bookmarks. In addition, we propose several ap- plications and business models that could utilize social bookmarks for search in the Web or for other purposes, REFERENCES Amitay, E, Har’Bl, N., Sivan, R., & Soffer, A. (2004), Web-a-whete: Geotagging Web content SIGIR 2004, 273-280. Baeza-Yates, R., Castillo, C., & Saint-Jean, F. (2004). Web dynamics, structure, and page qual- ity. In, Levence & A. Poulovassilis (Eds.), Web dynamics. Springer. Bao, S., Wu, X., Fei, B., Xue, G., Su, Z., & Yu, Y. (2007). Optimizing Web search using social annotations. In Proceedings of the 16% Inter- national World Wide Web Conference, Banff, Alberta, Canada Damianos, L., Griffith, J., & Cuomo, D. (2006) Omni: Social bookmarking on a corporate Inter- net, The MITRE Corporation. Golder, S. A., & Huberman, B, A. (2006). The structure of collaborative tagging systems. Journal of Information Science, 32(2), 198-208. doi:10.1177/0165551506062337 Grosky, W. L, Sreenath, D. V., & Fotouchi, F (2002). Emergent semantics and the multimedia Semantic Web. SIGMOD Record, 31(4), 54-58. doi:10.1145/637411.637420 257 Heymann, P., Koutrika, J., & Garcia-Molina, H. (2008), Can social bookmarking improve Web search? In Proceedings of the I ACM Inter- national Conference of Web Search and Data Mining, Stanford. Keller, R.M., Wolfe, R.R., Chen, J. R., Labinow- itz, J.L., & Mathe,N. (1997). Bookmarking service for organizing and sharing URLs. In Proceedings of the 6° International World Wide Web Confer- ence (pp. 1103-1114), Santa Clara, CA. Kleinberg, J. M. (1999). Alternative sources in a hyperlinked environment. Journal of the ACM, 604-632. doi:10.1145/324133.324140 Mandl, T. (2006). Implementation and evaluation ofa quality-based search engine. In Proceedings of ACM Hypertext 2006 Conference (pp. 73-84). Marlow, C., Naaman, M., Boyd, D., & Davis, M. (2006). HT06, Tagging Paper, Taxonomy, Flickr, Academie Article, To Read, In Proceedings of ACM Hypertext 2006 Conference (pp. 31-40), Mathes, A. (2004). Folksonomics-cooperative classification and communication through shared metadata. Computer Mediated Communication. Millen, D. R., Feinberg, J., & Kerr, B. (2006). Doger: Social bookmarking in the enterprise. In Proceedings ofthe SIGCHI Conference on Human Factors in Computer Systems (pp. 111-120). Mislove, A., Gummadi, K.P, & Druschel, P. (2006), Exploring social network for Internet search. In Proceedings of the 5* Workshop on Hot Topics in Networks. Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The pagerank citation ranking: Bringing onder to the Web. (Tech. Rep.. Stanford Digital Library Technologies Project. Strutz, D. N. (2004). Communal categorization: The folksonomy. [Content representation. ]. Info, 622. 258 Social Bookmarking and Web Search Wu, H., Zubair, M., & Maly, K. (2006). Har vesting social knowledge from folksonomies. In Proceedings of ACM Hypertext 2006 Conference (pp. 111-114) Wu, X., Zhang, L., & Yu, Y. (2006). Exploring social annotations for the Semantic Web. In Pro ceedings of the 154 World Wide Web Conference (pp. 417-426), Yu, P. S., Li, X., & Liu, B, (2004), On temporal dimension of search. In Proceedings of the 13° International World Wide Web Conference on Al- ternate Track Papers & Posters (pp. 448-449). Zhang, L., Wu, X., & Yu, Y. (2006). Emergent semantics from folksonomies: A quantitative study. Journal on Data Semantics, VI, 168-186. doi:10.1007/11803034_8 ENDNOTES 1 DeLicio.us: http//del.icio.us De.icio.us—Wikipedia, the free encyclopedia (http://en. wikipedia org/wiki/Del.icio.us) obtained on March 3, 2008 Furl: http://www. furl.net Simpy: http:fiwww.simpy.com 5 Cloudalicio.us: http://cloudalicio.us, * — Hatena Bookmark: http:/b.hatena.ne.jp 7 Google Toolbar: http://toolbar.google.com/ ‘TAVindex. html Yahoo! Japan: http:/;www.yahoo.co.ip In some cases same tags are listed several times, since there may be several words used to express the same meaning in Japanese. “OpenSocial: http://code.google.com/apis opensocial ‘Social Bookmarking and Web Search APPENDIX We have created two query sets. As a first query set, we collected 50 keywords that gained highest usage on Goo!” on each month from January 2006 to September 2007. Goo! is one of popular search ‘engines in Japan. After removing duplicates (ie. same queries that appeared within 2 or more months) ‘we obtained 806 queries. As the second query set we collected frequent and recent tags that were used ‘on Hatena Bookmark at the same time. We obtained 531 tags 259

Anda mungkin juga menyukai