Internet Watch
Effort, of course, can be reduced via automation. While collaborative filtering is not necessarily effortless, it requires relatively little effort on the part of the user and provides highly individualized recommendations. The collaborative filtering systems that we discuss here each offer a high degree of personalization, but each takes a different approach to automation, seeking the best trade-off between the amount of work users must put into the system and the perceived value and benefits they receive in return.
TAPESTRY
Collaborative filtering research began in the early 1990s at Xerox PARC in response to the overwhelming number of e-mail messages within PARC, which numbered far more than could be easily managed by mailing lists and keyword filtering. The Tapestry system enabled users to add annotations to messages. Two databases stored the incoming stream of documents and the linked annotation records. A sophisticated query system allowed users to browse for messages based on both their content and their annotations. Users could set up standing filter queries that would watch the document stream and annotation records, finding documents that matched the query at any time, present or future. For instance, a user could ask for all messages about collaborative filtering rated "excellent" by a superior; only when a message received such a rating would it be selected and forwarded to the user. Tapestry was the first step in automating recommendation sharing among friends and colleagues. It capitalized on the idea that humans working with computers can be more effective information filters than computers or humans working alone: people understand and judge information in ways that current computer systems cannot, largely because people can more readily assess quality as well as content. Because users needed to know whose recommendations to follow, Tapestry worked best in a small community of people who already knew each other.
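To make the standing-query idea concrete, here is a minimal sketch of how such a query might run against the two stores. The data model and field names are invented for illustration; Tapestry's actual query language and storage were far more sophisticated.

```python
# Toy sketch of a Tapestry-style standing query (data model invented for
# illustration). Documents and annotations live in separate stores; the
# query is re-run whenever new documents or annotations arrive.

documents = [
    {"id": 1, "text": "A survey of collaborative filtering techniques"},
    {"id": 2, "text": "Quarterly budget figures"},
]

annotations = [
    {"doc_id": 1, "author": "manager", "rating": "excellent"},
    {"doc_id": 2, "author": "peer", "rating": "good"},
]

def standing_query(documents, annotations, superiors):
    """All messages about collaborative filtering rated excellent by a superior."""
    endorsed = {a["doc_id"] for a in annotations
                if a["rating"] == "excellent" and a["author"] in superiors}
    return [d for d in documents
            if "collaborative filtering" in d["text"].lower()
            and d["id"] in endorsed]

matches = standing_query(documents, annotations, superiors={"manager"})
# selects document 1 only
```

A real deployment would index both stores and evaluate queries incrementally as records arrive, rather than rescanning everything on each change.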
Humans have always addressed the high cost of finding information by sharing it: inventing oral traditions, written language, and the Web as information-sharing tools. The printing press, broadcast media, and most recently the Internet have all changed the nature of the information problem. Information is no longer scarce. Indeed, there is far too much of it for any one person to review, let alone organize. Instead of being starved for information, we find ourselves overloaded. When information is abundant, the knowledge of which information is useful and valuable matters most. We all use our network of family, friends, and colleagues to recommend movies, books, cars, and news articles. Collaborative filtering technology automates the process of sharing opinions on the relevance and quality of information. Collaborative filtering is one technique among many information filtering techniques that range from unfiltered to personalized and from effortless to laborious, as illustrated in Figure 1. Libraries and the Web are good examples of unfiltered information sources; e-mail directed to one recipient is a good example of a filtered one. A best-seller list requires little effort from the user but provides the same recommendations to all users, so it sits in the upper left of the chart. Filters based on demographics, such as age, sex, or marital status, require some effort from the user in providing the demographics and provide some level of personal filtering, so they fall near the middle of the chart. Collaborative filtering requires relatively little effort from the user and provides individually targeted recommendations, so it sits in the upper right of the chart.

Editor: Ron Vetter, University of North Carolina at Wilmington, Mathematical Sciences Dept., 601 South College Rd., Wilmington, NC 28403; voice (910) 962-3671, fax (910) 962-7107; vetter@cms.uncwil.edu
Computer, April 1998
[Figure 1 chart; axis labels: Manual, Impersonal, Personal]
Figure 1. Information retrieval techniques. The vertical dimension indicates how difficult it is for the end user to access the filtered information, while the horizontal dimension indicates the level of personalization. Filters based on demographics require some effort from the end user and provide some level of personal filtering, so they would be placed near the middle of the chart. Automated collaborative filtering requires relatively little effort from the end user and provides individually targeted recommendations, so it would be placed in the upper right corner of the chart.
GROUPLENS
The GroupLens system recommends Usenet news articles. To predict how interesting an article will be to a particular user, say Jane, it finds a neighborhood of users whose ratings indicate similar interests. The opinions of those neighbors are weighted according to how close each member of the neighborhood is to Jane. GroupLens shows that predictions from an automated recommender system can be meaningful to users. Predictions generated by the GroupLens engine correlate well with user ratings and are more accurate than average ratings. Highly rated articles are more likely to be read and rated, so users have an incentive to keep rating articles, which in turn lets the system better understand their interests.
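The neighborhood weighting described above can be sketched as a correlation-weighted average. This is a generic user-based collaborative filtering formulation, not the exact GroupLens algorithm, and the rating data is invented:

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two users over the items both have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = (sqrt(sum((a[i] - ma) ** 2 for i in common))
           * sqrt(sum((b[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(target, neighbors, item):
    """Predict target's rating for item: the target's mean rating plus a
    correlation-weighted average of each neighbor's deviation from their mean."""
    mean_t = sum(target.values()) / len(target)
    num = den = 0.0
    for ratings in neighbors:
        if item not in ratings:
            continue
        w = pearson(target, ratings)
        mean_n = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_n)
        den += abs(w)
    if den == 0:
        return mean_t
    return mean_t + num / den

jane = {"a": 5, "b": 3, "c": 4}
neighbors = [{"a": 5, "b": 3, "c": 4, "d": 5},   # agrees with Jane
             {"a": 1, "b": 5, "c": 2, "d": 1}]   # disagrees with Jane
score = predict(jane, neighbors, "d")  # lands close to 5
```

Note that the negatively correlated neighbor still contributes usefully: a low rating from someone whose tastes oppose Jane's is evidence she would rate the item highly.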
Their research verified that predictions improve as more ratings are collected. Video Recommender, which makes recommendations on movies, found a middle ground in the trade-off between lots of work for lots of value (the Tapestry model) and no work for little value (ratings by movie critics). In exchange for submitting ratings on a selected set of movies, users receive personalized predictions that are more accurate than critic recommendations: Video Recommender's predictions achieve a 0.62 correlation coefficient with user ratings, while movie critics achieve only 0.22. Both Ringo, a music recommender, and Video Recommender show that collaborative filtering can apply to all media, even domains like music and movies where computer-based content analysis is not yet possible. These systems also showed that collaborative filtering allows serendipity where content-based systems might not. If you've shown interest only in country-western music, for
example, a content filter would only recommend more country-western music. In a collaborative recommender system, however, users whose interests correlate with yours on country-western music might lead you to discover blues albums of interest. Ringo and Video Recommender also extend the virtual community into a real, connected community by allowing users to post comments for others to read and by revealing the e-mail addresses of users who have volunteered to reveal their identities. Users wanted to get to know others who shared their tastes and even requested a Video Recommender singles club. The knowledge derived from such connections made users more confident in the recommendations they received.
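The correlation coefficients cited above for Video Recommender measure how well predictions track users' actual ratings. A minimal sketch of that measure, with invented numbers (these are not the study's data):

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sqrt(sum((x - mx) ** 2 for x in xs))
           * sqrt(sum((y - my) ** 2 for y in ys)))
    return num / den if den else 0.0

actual       = [5, 3, 4, 2, 5]              # one user's held-out ratings
personalized = [4.5, 3.2, 4.1, 2.4, 4.8]    # hypothetical system predictions
critic       = [4, 4, 4, 4, 4]              # one score shown to every user

r_personalized = correlation(personalized, actual)  # about 0.99
r_critic = correlation(critic, actual)              # 0.0: no per-user signal
```

A critic's score cannot vary across users, so it carries no per-user signal; the gap between 0.62 and 0.22 reported above reflects the same effect on real data.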
LOTUS
Lotus developed an active collaborative filtering system that revived the Tapestry model. Lotus researchers believed people could always give more relevant recommendations than any computed function, so they chose to ask for more work from the users in exchange for better predictions about user interests. Built in Lotus Notes, the system made it easy to send pointers to Web pages. Pointers could include hypertext links and annotations explaining the content, context, and relevance of the document. One pointer, for example, might read: "Sally, you should definitely see this page on collaborative filtering. -- Jane." In the Lotus system, pointers could be sent to groups or individuals, or published for all to see.

Lotus found a striking division between those who would provide information and those who would use it. In the system Lotus implemented, one user was responsible for 80 percent of the pointers. Such information mediators can help ensure the quality of information, and other users grow to trust their recommendations. As in Tapestry, in small social workgroups information mediators may generate enough value to spend much of their time mediating information. In large anonymous groups, information mediation may require shared work from a larger community.
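A Lotus-style pointer bundles a link, an annotation, and an audience. The record below is a hypothetical sketch (field names are invented, not the actual Notes schema):

```python
from dataclasses import dataclass, field

@dataclass
class Pointer:
    url: str
    annotation: str      # content, context, and relevance of the document
    sender: str
    recipients: list = field(default_factory=list)  # empty = published for all

    def visible_to(self, user: str) -> bool:
        """Published pointers are visible to everyone; others only to recipients."""
        return not self.recipients or user in self.recipients

p = Pointer("http://example.com/cf-survey",
            "Sally, you should definitely see this page on collaborative filtering.",
            sender="Jane", recipients=["Sally"])
```

The design choice worth noting is that audience is a property of the pointer, not the document, which is what lets one mediator publish for a whole community.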
GroupLens continues to evolve at the University of Minnesota, and we are experimenting with new techniques to help people find information that is of value to them. We've found that time spent reading is a fairly accurate measure of a user's rating for an article. Future GroupLens systems, then, will use time measurements to gather implicit ratings and to build predictions from those ratings. Users can immediately see the benefits of such a system, which requires little extra work on their part to personalize their information streams. Collaborative filtering can also incorporate agent technology through filtering robots. Filterbots can automatically rate new articles as they appear by using different content analysis algorithms. The first human raters to see these articles will already see predictions, personalized by their correlations with the various filterbots. New prediction algorithms will likely help counter the sparsity problem, where users have not rated enough items in common to correlate, and the scalability problem, where huge numbers of users and items require overwhelming computing resources.
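The reading-time idea can be sketched as a simple mapping from observed time to an implicit rating. The thresholds and the assumed reading speed below are made up for illustration; a real system would fit them to observed rating behavior.

```python
def implicit_rating(seconds_read: float, article_length_words: int) -> int:
    """Map time spent reading, normalized by article length, to a 1-5 rating."""
    assumed_words_per_second = 4.0                       # rough skimming speed
    expected = article_length_words / assumed_words_per_second
    ratio = seconds_read / expected if expected else 0.0
    if ratio < 0.1:
        return 1      # barely glanced at the article
    if ratio < 0.3:
        return 2
    if ratio < 0.7:
        return 3
    if ratio < 1.2:
        return 4
    return 5          # read carefully, perhaps reread
```

These implicit ratings can then feed the same prediction machinery as explicit ones, at no extra cost to the user.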
Al Borchers is a visiting faculty member and postdoctoral researcher at the University of Minnesota. He is developing collaborative filtering algorithms to power the next GroupLens system.

Jon Herlocker is a PhD student at the University of Minnesota, researching algorithmic issues in collaborative filtering and ways to measure the effectiveness of recommender systems.

Joseph Konstan is an assistant professor of computer science and engineering at the University of Minnesota. He also serves as consulting scientist for Net Perceptions, a company that he cofounded to commercialize collaborative filtering.

John Riedl is an associate professor of computer science and engineering at the University of Minnesota. He is also chief technical officer of Net Perceptions and the cocreator of GroupLens.

Contact the authors at {borchers,herlocke,konstan,riedl}@cs.umn.edu.
Acknowledgements
The authors gratefully acknowledge the contributions of GroupLens cofounder Paul Resnick, Hack Week participants Dave Maltz and Brad Miller, all the members of the GroupLens Research team, and the support of the National Science Foundation under grant IRI-9613960.