
Ethics Inf Technol (2010) 12:313–325

DOI 10.1007/s10676-010-9227-5

"But the data is already public": on the ethics of research in Facebook
Michael Zimmer

Published online: 4 June 2010


© Springer Science+Business Media B.V. 2010

Abstract In 2008, a group of researchers publicly released profile data collected from the Facebook accounts of an entire cohort of college students from a US university. While good-faith attempts were made to hide the identity of the institution and protect the privacy of the data subjects, the source of the data was quickly identified, placing the privacy of the students at risk. Using this incident as a case study, this paper articulates a set of ethical concerns that must be addressed before embarking on future research in social networking sites, including the nature of consent, properly identifying and respecting expectations of privacy on social network sites, strategies for data anonymization prior to public release, and the relative expertise of institutional review boards when confronted with research projects based on data gleaned from social media.

Keywords Research ethics · Social networks · Facebook · Privacy · Anonymity

M. Zimmer (✉)
School of Information Studies, University of Wisconsin-Milwaukee, 656 Bolton Hall, 3210 N. Maryland Ave, Milwaukee, WI 53211, USA
e-mail: zimmerm@uwm.edu

Introduction

In September 2008, a group of researchers publicly released data collected from the Facebook accounts of an entire cohort of college students. Titled "Tastes, Ties, and Time" (T3), the announcement accompanying the release noted the uniqueness of the data:

The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 [Facebook] profiles by an entire cohort of students at an anonymous, northeastern American university. Profiles were sampled at 1-year intervals, beginning in 2006. This first wave covers first-year profiles, and three additional waves of data will be added over time, one for each year of the cohort's college career.
Though friendships outside the cohort are not part of the data, this snapshot of an entire class over its 4 years in college, including supplementary information about where students lived on campus, makes it possible to pose diverse questions about the relationships between social networks, online and offline. (N.A. 2008)

Recognizing the privacy concerns inherent in the collection and release of social networking data, the T3 research team took various steps in an attempt to protect the identity of the subjects, including the removal of student names and identification numbers from the dataset, a delay in the release of the cultural interests of the subjects, requiring other researchers to agree to a "terms and conditions for use" prohibiting various uses of the data that might compromise student privacy, and undergoing review by their institutional review board (Lewis 2008, pp. 28–29).

Despite these steps, and claims by the T3 researchers that "all identifying information was deleted or encoded" (Lewis 2008, p. 30), the identity of the source of the dataset was quickly discovered. Using only the publicly available codebook for the dataset and other public comments made about the research project, the identity of the "anonymous, northeastern American university" from which the data was drawn was quickly narrowed down to 13 possible universities (Zimmer 2008b), and then surmised to be Harvard College (Zimmer 2008a).
Reminiscent of the ease with which AOL users were re-identified when the search engine thought the release of individuals' search history data was sufficiently anonymized (see Barbaro and Zeller Jr 2006), this re-identification of the source institution of the T3 dataset reveals the fragility of the presumed privacy of the subjects under study.1

1 While no individuals within the T3 dataset were positively identified (indeed, the author did not attempt to re-identify individuals), discovering the source institution makes individual re-identification much easier, perhaps even trivial, as discussed below.

Using the T3 data release and its aftermath as a case study, this paper will reveal numerous conceptual gaps in the researchers' understanding of the privacy risks related to their project, and will articulate a set of ethical concerns that must be addressed before embarking on future research similarly utilizing social network data. These include challenges to the traditional nature of consent, properly identifying and respecting expectations of privacy on social network sites, developing sufficient strategies for data anonymization prior to the public release of personal data, and the relative expertise of institutional review boards when confronted with research projects based on data gleaned from social media.

The "Tastes, Ties, and Time" project

Research in social networks has spanned decades, from Georg Simmel's foundational work in sociology (Simmel and Wolff 1964), to Barry Wellman's analyses of social networks in the emerging networked society of the late twentieth century (Wellman and Berkowitz 1988), to the deep ethnographies of contemporary online social networks by boyd (2008b). Indeed, the explosive popularity of online social networking sites such as MySpace, Twitter, and Facebook has attracted attention from a variety of researchers and disciplines (see boyd and Ellison 2008).2 A primary challenge to fully understanding the nature and dynamics of social networks is obtaining sufficient data. Most existing studies rely on external surveys of social networking participants, ethnographies of smaller subsets of subjects, or the analysis of limited profile information extracted from what subjects chose to make visible. As a result, the available data can often be tainted due to self-reporting biases and errors, have minimal representativeness of the entire population, or fail to reflect the true depth and complexity of the information users submit (and create) on social networking sites.

2 See also the bibliography maintained by danah boyd at http://www.danah.org/SNSResearch.html.

Recognizing the data limitations faced by typical sociological studies of online social network dynamics, a group of researchers from Harvard University and the University of California—Los Angeles set out to construct a more robust dataset that would fully leverage the rich data available on social networking websites.3 Given its popularity, the researchers chose the social network site Facebook as their data source, and located a university that allowed them to download the Facebook profiles of every member of the freshman class:

With permission from Facebook and the university in question, we first accessed Facebook on March 10 and 11, 2006 and downloaded the profile and network data provided by one cohort of college students. This population, the freshman class of 2009 at a diverse private college in the Northeast U.S., has an exceptionally high participation rate on Facebook: of the 1640 freshmen students enrolled at the college, 97.4% maintained Facebook profiles at the time of download and 59.2% of these students had last updated their profile within 5 days. (Lewis et al. 2008, p. 331)

3 The research team includes Harvard University professors Jason Kaufman and Nicholas Christakis, UCLA professor Andreas Wimmer, and Harvard sociology graduate students Kevin Lewis and Marco Gonzalez.

This first wave of data collection took place in 2006, during the spring of the cohort's freshman year, and data collection was repeated annually until 2009, when the vast majority of the study population would have graduated, providing 4 years of data about this collegiate social network. Each student's official housing records were also obtained from the university, allowing the researchers to "connect Internet space to real space" (Kaufman 2008a).

The uniqueness of this dataset is of obvious value for sociologists and Internet researchers. The data was extracted directly from Facebook without direct interaction with the subjects or reliance on self-reporting instruments, either of which could taint the data collected. The dataset includes demographic, relational, and cultural information on each subject, allowing broad analyses beyond more simple profile scraping methods. The inclusion of housing data for each of the 4 years of the study allows for analysis of any connection between "physical proximity, emerging roommate and friendship groups in the real world and the presence of these two types of relationships in their Facebook space" (Kaufman 2008a). Most importantly, the dataset represents nearly a complete cohort of college students, allowing the unique analysis of a "complete social universe" (Kaufman 2008a), and it is longitudinal, providing the ability to study how the social network changes over time.
As a result of its uniqueness, the dataset can be employed for a number of research projects that have heretofore been difficult or impossible to pursue. As one of the "Tastes, Ties, and Time" researchers noted, "We're on the cusp of a new way of doing social science… Our predecessors could only dream of the kind of data we now have" (Nicholas Christakis, qtd in Rosenbloom 2007).

The dataset release

The "Tastes, Ties, and Time" project has been funded, in part, by a grant from the National Science Foundation,4 which mandates certain levels of data sharing as a condition of its grants.5 As a result, the Facebook dataset is being made available for public use in phases, roughly matching the annual frequency of data collection: wave 1 in September 2008, wave 2 in the fall of 2009, wave 3 in the fall of 2010, and wave 4 in the fall of 2011 (Lewis 2008, p. 3). The first wave of data, comprising "machine-readable files of virtually all the information posted on approximately 1700 FB profiles by an entire cohort of students at an anonymous, northeastern American university," was publicly released on September 25, 2008 (N.A. 2008).6 Prospective users of the dataset are required to submit a brief statement detailing how the data will be used, and access is granted at the discretion of the T3 research team. Researchers are also required to agree to a "Terms and Conditions of Use" statement in order to gain access to the dataset, consenting to various licensing, use, and attribution provisions.

4 See "Social Networks and Online Spaces: A Cohort Study of American College Students", Award #0819400, http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0819400.
5 See the relevant National Science Foundation Grant General Conditions (GC-1), section 38, Sharing of Findings, Data, and Other Research Products (http://www.nsf.gov/publications/pub_summ.jsp?ods_key=gc109).
6 The dataset is archived at the IQSS Dataverse Network at Harvard University (http://dvn.iq.harvard.edu/dvn/).

A comprehensive codebook, which included detailed descriptions and frequencies of the various data elements (see Lewis 2008), including gender, race, ethnicity, home state, political views, and college major, was downloadable without the need to submit an application. For example, the codebook revealed that the dataset included 819 male and 821 female subjects, and that there were 1 self-identified Albanian, 2 Armenians, 3 Bulgarians, 9 Canadians, and so on.

The codebook also included an account of the steps taken by the T3 researchers in an attempt to protect subject privacy:

All data were collected with the permission of the college being studied, the college's Committee on the Use of Human Subjects, as well as Facebook.com. Pursuant to the authors' agreement with the Committee on the Use of Human Subjects, a number of precautionary steps were taken to ensure that the identity and privacy of students in this study remain protected. Only those data that were accessible by default by each RA were collected, and no students were contacted for additional information. All identifying information was deleted or encoded immediately after the data were downloaded. The roster of student names and identification numbers is maintained on a secure local server accessible only by the authors of this study. This roster will be destroyed immediately after the last wave of data is processed. The complete set of cultural taste labels provides a kind of "cultural fingerprint" for many students, and so these labels will be released only after a substantial delay in order to ensure that students' identities remain anonymous. Finally, in order to access any part of the dataset, prospective users must read and electronically sign [a] user agreement… (Lewis 2008, p. 29)

These steps taken by the T3 researchers to remove identifying information reveal an acknowledgment of—and sensitivity to—the privacy concerns that will necessarily arise given the public release of such a rich and complete set of Facebook data. Their intent, as expressed by the project's principal investigator, Jason Kaufman, was to ensure that "all the data is cleaned so you can not connect anyone to an identity" (Kaufman 2008a). Unfortunately, the T3 researchers were overly optimistic.

Partial re-identification and withdrawal of dataset

Cognizant of the privacy concerns related to collecting and releasing detailed Facebook profile data from a cohort of college students, the T3 research team—in good faith—took a number of steps in an attempt to protect subject privacy, including review by their institutional review board, the removal of student names and identification numbers from the dataset, a delay in the release of the cultural interests of the subjects, and requiring other researchers to agree to a "terms and conditions for use" that prohibited any attempts to re-identify subjects, to disclose any identities that might be inadvertently re-identified, or otherwise to compromise the privacy of the subjects.
However, despite these efforts, the team's desire to ensure "all the data is cleaned so you can not connect anyone to an identity" fell short. On September 29, 2008, only 4 days after the initial data release, Fred Stutzman, a Ph.D. student at the University of North Carolina at Chapel Hill's School of Information and Library Science, questioned the T3 researchers' faith in the non-identifiability of the dataset:

The "non-identifiability" of such a dataset is up for debate. A friend network can be thought of as a fingerprint; it is likely that no two networks will be exactly similar, meaning individuals may be able to be identified in the dataset post-hoc… Further, the authors of the dataset plan to release student "Favorite" data in 2011, which will provide further information that may lead to identification. (Stutzman 2008)

Commenting on Stutzman's blog post on the subject, Eszter Hargittai, an Associate Professor of Communication Studies at Northwestern University, sounded similar concerns:

I think it's hard to imagine that some of this anonymity wouldn't be breached with some of the participants in the sample. For one thing, some nationalities are only represented by one person. Another issue is that the particular list of majors makes it quite easy to guess which specific school was used to draw the sample. Put those two pieces of information together and I can imagine all sorts of identities becoming rather obvious to at least some people. (Hargittai 2008)

Stutzman and Hargittai share a fear of the possible re-identification of the presumed anonymous Facebook dataset that has been made available to the public. Stutzman's concern over the ability to exploit the uniqueness of one's social graph to identify an individual within a large dataset has proven true in numerous cases (see, for example, Narayanan and Shmatikov 2008, 2009). Hargittai suggests that the uniqueness of some of the data elements makes identifying the source of the data—and therefore some of the individual subjects—quite trivial. Hargittai's fears were correct.

Partial re-identification

Within days of its public release, the source of the T3 dataset was identified as Harvard College (see Zimmer 2008a, b). Most striking about this revelation was that the identification of the source of the Facebook data did not require access to the full dataset itself.

Using only the freely available codebook and referencing various public comments about the research, the source of the data was quickly narrowed down from over 2000 possible colleges and universities to a list of only seven (Zimmer 2008b). An examination of the codebook revealed the source was a private, co-educational institution whose class of 2009 initially had 1640 students in it. Elsewhere, the source was identified as a "New England" school. A search through an online college database7 revealed only seven private, co-ed colleges in New England states (CT, ME, MA, NH, RI, VT) with total undergraduate populations between 5000 and 7500 students (a likely range if there were 1640 in the 2006 freshman class): Tufts University, Suffolk University, Yale University, University of Hartford, Quinnipiac University, Brown University, and Harvard College.

7 College Board, http://www.collegeboard.com.

Upon the public announcement of this initial discovery, and general criticism of the research team's attempts to protect the privacy of the subjects, Jason Kaufman, the principal investigator of the T3 research project, was quick to react, noting that, perhaps in justification for the amount of detail released in the dataset, "We're sociologists, not technologists, so a lot of this is new to us" and "Sociologists generally want to know as much as possible about research subjects" (Kaufman 2008b). He then attempted to defuse some of the implicit privacy concerns with the following comment:

What might hackers want to do with this information, assuming they could crack the data and 'see' these people's Facebook info? Couldn't they do this just as easily via Facebook itself?
Our dataset contains almost no information that isn't on Facebook. (Privacy filters obviously aren't much of an obstacle to those who want to get around them.) (Kaufman 2008b)

And then:

We have not accessed any information not otherwise available on Facebook. We have not interviewed anyone, nor asked them for any information, nor made information about them public (unless, as you all point out, someone goes to the extreme effort of cracking our dataset, which we hope it will be hard to do). (Kaufman 2008c)

However, little "extreme effort" was needed to further "crack" the dataset; it was accomplished a day later, again without ever looking at the data itself (Zimmer 2008a). As Hargittai recognized, the unique majors listed in the codebook allowed for the ultimate identification of the source university. Only Harvard College offers the specific variety of the subjects' majors that are listed in the codebook, such as Near Eastern Languages and Civilizations, Studies of Women, Gender and Sexuality, and Organismic and Evolutionary Biology.
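To see how little effort this kind of narrowing requires, consider the following illustrative sketch. It is not the procedure actually used in 2008; it simply replays the logic described above against a hypothetical CSV export of a college directory, and the file name, column names, and threshold values are assumptions made for illustration only.

```python
import csv

# Hypothetical export from a college directory, one row per institution.
# Assumed columns: name, state, control ("private"/"public"), coed ("yes"/"no"),
# undergrad_enrollment (total undergraduate headcount).
NEW_ENGLAND = {"CT", "ME", "MA", "NH", "RI", "VT"}

def candidate_sources(path="colleges.csv"):
    """Return institutions consistent with the clues published alongside the dataset."""
    matches = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            enrollment = int(row["undergrad_enrollment"])
            if (
                row["state"] in NEW_ENGLAND          # described as a "New England" school
                and row["control"] == "private"       # private institution
                and row["coed"] == "yes"              # co-educational
                and 5000 <= enrollment <= 7500        # plausible total for a class of ~1640
            ):
                matches.append(row["name"])
    return matches

if __name__ == "__main__":
    for name in candidate_sources():
        print(name)
```

Each additional public clue, such as the distinctive list of majors, only shrinks the resulting candidate list further, which is why seemingly innocuous codebook details can defeat institutional anonymity.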
The identification of Harvard College was further confirmed after analysis of a June 2008 video presentation by Kaufman, where he noted that "midway through the freshman year, students have to pick between 1 and 7 best friends" that they will essentially live with for the rest of their undergraduate career (Kaufman 2008a). This describes the unique method for determining undergraduate housing at Harvard: all freshmen who complete the fall term enter into a lottery, where they can designate a "blocking group" of between 2 and 8 students with whom they would like to be housed in close proximity.8

8 This process is described at the Harvard College Office of Residential Life website: http://www.orl.fas.harvard.edu/icb/icb.do?keyword=k11447&tabgroupid=icb.tabgroup17715.

In summary, the source of the T3 dataset was established with reasonable certainty in a relatively short period of time, without needing to download or access the dataset itself. While individual subjects were not identified in this process, the ease of identification of the source places their privacy in jeopardy, given that the dataset contains a relatively small population with many unique individuals. The hopes by the T3 research team that "extreme effort" would be necessary to "crack" the dataset were, unfortunately, overly optimistic.

Withdrawal of the dataset

The announcement of this likely identification of the source of the Facebook dataset did not prompt a public reply by the T3 research team, but within 1 week of the discovery, the access page for the "Tastes, Ties, and Time" dataset displayed the following message, indicating that the dataset was, at least for the moment, no longer publicly available:

Note: As of 10/8/08, prospective users may still submit requests and research statements, but the approval process will be delayed until further notice. We apologize for the inconvenience, and thank you for your patience.9

9 Screenshot of http://dvn.iq.harvard.edu/dvn/dv/t3 taken on October 22, 2008, on file with author.

Then, in March 2009, the page was updated with a new message acknowledging the removal was in response to concerns over student privacy:

UPDATE (3/19/09): Internal revisions are almost complete, and we expect to begin distributing again in the next 2–3 weeks. In the meantime, please DO NOT submit new dataset requests; but please check back frequently at this website for a final release notice. We again apologize for any inconvenience, and thank you for your patience and understanding as we work to ensure that our dataset maintains the highest standards for protecting student privacy.10

10 Screenshot of http://dvn.iq.harvard.edu/dvn/dv/t3 taken on March 27, 2009, on file with author. Webpage remains unchanged as of April 29, 2009.

A full year after the initial release, the dataset remains unavailable, with the following message greeting interested researchers:

UPDATE (10/2/09): The T3 dataset is still offline as we take further steps to ensure the privacy of students in the dataset. Please check back later at this site for additional updates - a notice will be posted when the distribution process has resumed.11

11 Screenshot of http://dvn.iq.harvard.edu/dvn/dv/t3 taken on November 1, 2009, on file with author. As of May 29, 2010, this message remains in place.

These messages noting the restricted access to the Facebook dataset to "ensure that our dataset maintains the highest standards for protecting student privacy" suggest that the re-identification of the source as Harvard College was correct, and that the T3 research team is re-evaluating its processes and procedures in reaction.

The insufficiency of privacy protections in the T3 project

The changing nature—and expectations—of privacy in online social networks are being increasingly debated and explored (see, for example, Gross and Acquisti 2005; Barnes 2006; Lenhart and Madden 2007; Nussbaum 2007; Solove 2007; Albrechtslund 2008; Grimmelmann 2009). The events surrounding the release of the Facebook data in the "Tastes, Ties, and Time" project reveal many of the fault lines within these debates. Critically examining the methods of the T3 research project, and the public release of the dataset, reveals numerous conceptual gaps in the understanding of the nature of privacy and anonymity in the context of social networking sites.

The primary steps taken by the T3 research team to protect subject privacy (quoted above) can be summarized as follows:

1. Only those data that were accessible by default by each RA were collected, and no students were contacted for additional information.
2. All identifying information was deleted or encoded immediately after the data were downloaded.
3. The complete set of cultural taste labels provides a kind of "cultural fingerprint" for many students, and so these labels will be released only after a substantial delay in order to ensure that students' identities remain anonymous.
4. In order to access any part of the dataset, prospective researchers must agree to a "terms and conditions for use" that prohibits any attempts to re-identify subjects, to disclose any identities that might be inadvertently re-identified, or otherwise to compromise the privacy of the subjects.
5. The entire research project, including the above steps, was reviewed and approved by Harvard's Committee on the Use of Human Subjects.

While each of these steps reveals a good-faith effort to protect the privacy of the subjects, each has serious limitations that expose failures by the researchers to fully understand the nature of privacy in online social network spaces, and to design their research methodology accordingly. Each will be considered below, followed by a brief discussion of some of the public comments made by the T3 research team in defense of their methods and the public release of the dataset.

Use of in-network RAs to access subject data

In his defense of releasing subjects' Facebook profile data, Jason Kaufman, the principal investigator of the T3 project, has stated that "our dataset contains almost no information that isn't on Facebook" and that "We have not accessed any information not otherwise available on Facebook" (Kaufman 2008c). Access to this information was granted by Facebook, but only through a manual process. Thus, research assistants (RAs) from the source institution (presumably Harvard) were employed to perform the labor-intensive task of searching for each first-year student's Facebook page and saving the profile information. The dataset's codebook confirms that "Only those data that were accessible by default by each RA were collected, and no students were contacted for additional information" (Lewis 2008, p. 29).

The T3 codebook notes that of the 1,640 students in the cohort, 1,446 were found on Facebook with viewable profiles, 152 had a Facebook profile that was discoverable but not viewable by the RA, and 42 were undiscoverable (either not on Facebook or invisible to those not within their "friend" network) (Lewis 2008, p. 6).12 Importantly, the codebook notes a peculiarity inherent in using in-network RAs to access the Facebook profile data:

It is important to note that both undergraduate and graduate student RAs were employed for downloading data, and that each type of RA may have had a different level of default access based on individual students' privacy settings. In other words, a given student's information should not be considered objectively "public" or "private" (or even "not on Facebook")—it should be considered "public" or "private" (or "not on Facebook") from the perspective of the particular RA that downloaded the given student's data. (Lewis 2008, p. 6)

12 Facebook allows users to control access to their profiles based on variables such as "Friends only", or those in their "Network" (such as the Harvard network), or to "Everyone". Thus, a profile might not be discoverable or viewable to someone outside the boundaries of the access setting.

The T3 researchers concede that one RA might have different access to a student's profile than a different RA, and that being "public" or "private" on Facebook is merely relative to that particular RA's level of access.

What appears to be lost on the researchers is that a subject might have set her privacy settings to be viewable only to other users within her network, but to be inaccessible to those outside that sphere. For example, a Facebook user might decide to share her profile information only with other Harvard students, but want to remain private to the rest of the world. The RAs employed for the project, being from the same network as the subject, would be able to view and download a subject's profile data that was otherwise restricted from outside view. Thus, her profile data—originally meant for only those within the Harvard network—is now included in a dataset released to the public. As a result, it is likely that profile information that a subject explicitly restricted to only "in network" participants in Facebook has been accessed from within that network, but then extracted and shared outside those explicit boundaries.

Given this likelihood, the justification that "we have not accessed any information not otherwise available on Facebook" is true only to a point. While the information was indeed available to the RA, it might have been accessible only due to the fact that the RA was within the same "network" as the subject, and that a privacy setting was explicitly set with the intent to keep that data within the boundaries of that network. Instead, it was included in a dataset released to the general public. This gap in the project's fundamental methodology reveals a troublesome lack of understanding of how users might be using the privacy settings within Facebook to control the flow of their personal information across different spheres, and puts the privacy of those subjects at risk.
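The audience-relative nature of "public" and "private" described above can be made concrete with a small model. The sketch below is purely illustrative: the setting names and the three-level scheme are simplified assumptions rather than Facebook's actual access-control implementation, but they show why a collector inside the subjects' network sees strictly more than an outside observer, and why "visible to the RA" is not the same as "public."

```python
# Simplified model of audience-dependent profile visibility.
# Settings (assumed for illustration): "everyone", "network", "friends".
def visible_to(profile, viewer):
    """Return True if the viewer can see the profile under this toy model."""
    setting = profile["visibility"]
    if setting == "everyone":
        return True
    if setting == "network":
        return viewer["network"] == profile["network"]
    if setting == "friends":
        return viewer["id"] in profile["friends"]
    return False

profiles = [
    {"id": 1, "network": "Harvard", "visibility": "everyone", "friends": set()},
    {"id": 2, "network": "Harvard", "visibility": "network",  "friends": set()},
    {"id": 3, "network": "Harvard", "visibility": "friends",  "friends": {4}},
]

in_network_ra  = {"id": 99, "network": "Harvard"}
outside_viewer = {"id": 98, "network": "Elsewhere"}

ra_view      = sum(visible_to(p, in_network_ra) for p in profiles)   # sees profiles 1 and 2
outside_view = sum(visible_to(p, outside_viewer) for p in profiles)  # sees only profile 1
print(ra_view, outside_view)
```

Any profile counted in the first total but not the second was, in effect, shared with a bounded audience; releasing it publicly moves it across that boundary.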
Removal or encoding of "identifying" information

In an effort to protect the identity of the subjects, the researchers note that "All identifying information was deleted or encoded immediately after the data were downloaded" (Lewis 2008, p. 29), and that "all the data is cleaned so you can not connect anyone to an identity" (Kaufman 2008a). Student names were replaced by "unique identification numbers" and any e-mail addresses or phone numbers that appeared in the Facebook profile data were excluded from the published dataset.

Yet, as the AOL search data release revealed, even if one feels that "all identifying information" has been removed from a dataset, it is often trivial to piece together random bits of information to deduce one's identity (Barbaro and Zeller Jr 2006). The fact that the dataset includes each subject's gender, race, ethnicity, hometown state, and major makes it increasingly possible that individuals could be identified, especially those with a unique set of characteristics. Repeating Hargittai's concern: "I think it's hard to imagine that some of this anonymity would not be breached with some of the participants in the sample" (Hargittai 2008).

For example, the codebook reveals that each of these states has only a single student represented in the dataset: Delaware, Louisiana, Mississippi, Montana, and Wyoming. Similarly, there are only single instances of students identified as Albanian, Hungarian, Iranian, Malaysian, Nepali, Philippino, and Romanian. Their uniqueness very well might have resulted in publicity: it is possible that local media featured their enrollment at Harvard, or that the local alumni organization listed their name in a publicly-accessible newsletter, and so on. If such unique individuals can be personally identified using external sources, and then located within the dataset, one might also learn his/her stated political views or sexual preference, resulting in a significant privacy breach.

This reveals that even when researchers believe they have removed or encoded "all identifying information," there often remains information that could just as easily be used to re-identify individuals.13 The T3 researchers' belief that stripping names alone is sufficient resembles the typical definition of "personally identifiable information" (PII) within the United States legal framework. As defined in California law, for example, PII is typically limited to an individual's name or other personally identifiable elements such as a social security number, a driver's license number, or a credit card number.14 So long as these identifiers are removed from a dataset, it is presumed to be sufficiently anonymous.

13 Simply stripping names from records is rarely a sufficient means to keep a dataset anonymous. For example, Latanya Sweeney has shown that 87 percent of Americans could be identified by records listing solely their birth date, gender and ZIP code (Sweeney 2002).
14 See, for example, California Senate Bill 1386, http://info.sen.ca.gov/pub/01-02/bill/sen/sb_1351-1400/sb_1386_bill_20020926_chaptered.html.

However, others take a much broader stance on what constitutes personally identifiable information. The European Union, for example, defines PII much more broadly to include:

[A]ny information relating to an identified or identifiable natural person…; an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.15

15 European Union Data Protection Directive 95/46/EC, http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:HTML.

Thus, while the T3 researchers might have felt simply removing or coding the subjects' names or other specific identifiers from the dataset was sufficient, had they followed the European Union's guidance, they would have recognized that many of the subjects' "physical, physiological, mental, economic, cultural or social identity" could also be used for re-identification. Even after removing the names of the subjects, since the dataset still includes race, ethnicity, and geographic data, re-identification of individual subjects remains a real possibility.

Delay in release of cultural taste data

Despite the apparent lack of use of the EU's more stringent definition of "personally identifiable information," the T3 researchers do recognize the unique nature of the cultural taste labels they have collected, referring to them as a kind of "cultural fingerprint". To protect subject privacy, the cultural tastes identified by the researchers have been assigned a unique number, and only the numbers will be associated with students for the initial data releases. The entire set of the actual taste labels will only be released in the fall of 2011, corresponding with the release of the wave 4 data.

The T3 researchers are right to recognize how a person's unique set of cultural tastes could easily identify her. Yet merely instituting a "substantial delay" before releasing this personal data does little to mitigate the privacy fears. Rather, it only delays them, and only by 3 years. Researchers routinely rely on datasets for years after their initial collection: some influential studies of search engine behavior rely on nearly 10-year-old data (see, for example, Jansen and Resnick 2005; Jansen and Spink 2005), and these subjects' privacy needs do not suddenly disappear when they graduate from college in 2011.
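Both concerns raised in this section, the single-state or single-nationality subjects and the taste-label "cultural fingerprint", are at bottom claims about how many records share the same combination of attributes. A release-time audit along the following lines could quantify that risk before any data is published. This is a generic sketch, not the T3 team's procedure: the input file, column names, and taste-label encoding are hypothetical, and the quasi-identifier list would need to match the actual codebook.

```python
from collections import Counter
import csv

# Assumed quasi-identifier columns in a hypothetical pre-release extract.
QUASI_IDENTIFIERS = ["gender", "race", "ethnicity", "home_state", "major"]

def uniqueness_audit(path="t3_extract.csv"):
    """Count subjects who are alone in their quasi-identifier group (k = 1)
    and subjects whose set of taste labels is unique in the dataset."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    combo = lambda row: tuple(row[col] for col in QUASI_IDENTIFIERS)
    combo_counts = Counter(combo(row) for row in rows)
    unique_combos = sum(1 for row in rows if combo_counts[combo(row)] == 1)

    # Taste labels assumed to be stored as a semicolon-separated field.
    tastes = lambda row: frozenset(row["taste_labels"].split(";"))
    taste_counts = Counter(tastes(row) for row in rows)
    unique_tastes = sum(1 for row in rows if taste_counts[tastes(row)] == 1)

    print(f"{unique_combos} of {len(rows)} subjects have a unique quasi-identifier combination")
    print(f"{unique_tastes} of {len(rows)} subjects have a unique set of taste labels")

if __name__ == "__main__":
    uniqueness_audit()
```

Any record sitting in a group of size one is exactly the kind of subject Hargittai worried about, and delaying release does nothing to change these counts.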
Most surprisingly, despite the T3 researchers' recognition of the sensitive nature of the cultural data, they will provide immediate access to it on a case-by-case basis. As the codebook reveals:

In the meantime, if prospective users wish to access some subset of the taste labels, special arrangements may be made on a case-by-case basis at the discretion of the authors (send request and detailed justification to t3dataset@gmail.com). (Lewis 2008, p. 20)

No further guidance is provided as to what kinds of arrangements are made and what justifications are needed to make such an exception. If the T3 research team felt strongly enough that it was necessary to encode and delay the release of the subjects' "cultural fingerprints", it does not seem appropriate to announce that exceptions can be made for their release to selected researchers prior to the 3-year delay. If it is potentially privacy-invading content, it simply should not be released.

Terms of use statement

Researchers wanting access to the T3 dataset must (electronically) sign a Terms and Conditions of Use statement. The statement includes various covenants related to protecting the privacy of the subjects in the dataset, including (as numbered in the original):

3. I will use the dataset solely for statistical analysis and reporting of aggregated information, and not for investigation of specific individuals or organizations, except when identification is authorized in writing by the Authors.
4. I will produce no links among the Authors' datasets or among the Authors' data and other datasets that could identify individuals or organizations.
5. I represent that neither I, nor anyone I know, has any prior knowledge of the possible identities of any study participants in any dataset that I am being licensed to use.
6. I will not knowingly divulge any information that could be used to identify individual participants in the study, nor will I attempt to identify or contact any study participant, and I agree to use any precautions necessary to prevent such identification.
7. I will make no use of the identity of any person or establishment discovered inadvertently. If I suspect that I might recognize or know a study participant, I will immediately inform the Authors, and I will not use or retain a copy of data regarding that study participant. If these measures to resolve an identity disclosure are not sufficient, the Authors may terminate my use of the dataset. (reproduced at Lewis 2008, p. 30)

The language within this statement clearly acknowledges the privacy implications of the T3 dataset, and might prove effective in raising awareness among potential researchers. However, studies have shown that users frequently simply "click through" such agreements without fully reading them or recognizing they are entering into a legally binding contract (Gatt 2002), and it is unclear how the T3 researchers specifically intend to monitor or enforce compliance with these terms. While requiring a terms of use is certainly a positive step, without enforcement it might have limited success in deterring any potential privacy-invasive use of the data.

IRB approval

As required of any research project involving human interaction, clearance for the research project and data release was provided by Harvard's institutional review board (IRB), known as the Committee on the Use of Human Subjects in Research.16 As Kaufman commented: "Our IRB helped quite a bit as well. It is their job to insure that subjects' rights are respected, and we think we have accomplished this" (Kaufman 2008c). Elsewhere he has noted that "The university in question allowed us to do this and Harvard was on board because we don't actually talk to students, we just accessed their Facebook information" (Kaufman 2008a).

16 http://www.fas.harvard.edu/~research/hum_sub/.

Just as we can question whether the T3 researchers fully understood the privacy implications of the research, we must critically examine whether Harvard's IRB—a panel of experts in research ethics—also sufficiently understood how the privacy of the subjects in the dataset could be compromised. For example, did the IRB recognize, as noted above, that using an in-network research assistant to pull data could circumvent privacy settings intended to keep that data visible only to other people at Harvard? Or did the IRB understand that individuals with unique characteristics could easily be extracted from the dataset, and perhaps identified? It is unclear whether these concerns were considered and discarded, or whether the IRB did not fully comprehend the complex privacy implications of this particular research project.17 In either case, the potential privacy-invading consequences of the T3 data release suggest a possible lapse of oversight at some point in the IRB review process.

17 Attempts to obtain information about the IRB deliberations with regard to the T3 project have been unsuccessful.

Other public comments
Beyond the shortcomings of the documented efforts to protect the privacy of the T3 dataset subjects, the researchers have made various public comments that reveal additional conceptual gaps in their understanding of the privacy implications of the T3 research project.18

18 This section is intended as an informal analysis of the discourse used when talking about the T3 project. It is meant to reveal gaps in broader understanding of the issues at hand, and is not necessarily directed against a particular speaker.

For example, when confronted with the potential re-identifiability of the dataset, Kaufman responded by pondering "What might hackers want to do with this information, assuming they could crack the data and 'see' these people's Facebook info?" and later acknowledging "Nonetheless, seeing your thought process—how you would attack this dataset—is extremely useful to us" (Kaufman 2008b). Kaufman's mention of "hackers", "attacking" the dataset, and focusing on what someone might "do" with this information exposes a harm-based theory of privacy protection. Such a position supposes that so long as the data can be protected from attack by hackers or others wishing to "do" something harmful once gaining access, the privacy of the subjects can be maintained. It ignores the broader dignity-based theory of privacy (Bloustein 1964), which recognizes that one does not need to be a victim of hacking, or have a tangible harm take place, in order for there to be concerns over the privacy of one's personal information. Rather, merely having one's personal information stripped from the intended sphere of the social networking profile, and amassed into a database for external review, becomes an affront to the subjects' human dignity and their ability to control the flow of their personal information.

The distinction between harm- and dignity-based theories of privacy is understood—and often debated—among privacy scholars, but when asked if they conferred with privacy experts over the course of the research and data release, Kaufman admits that "we did not consult [with] privacy experts on how to do this, but we did think long and hard about what and how this should be done" (Kaufman 2008c). Given the apparent focus on data security as a solution to privacy, it appears the T3 research team would have benefited from broader discussions on the nature of privacy in these environments.19

19 After the T3 research project was funded and well underway, Kaufman became a fellow at the Berkman Center for Internet & Society at Harvard University, an organization dedicated to studying a number of Internet-related issues, including privacy. While Kaufman presented preliminary results of his research to the Berkman community prior to joining the center (Kaufman 2008a), there is no evidence that others at Berkman were consulted prior to the release of the T3 dataset.

The T3 researchers also claim that there should be little concern over the ethics of this research since the Facebook data gathered was already publicly available. As Kaufman argues:

On the issue of the ethics of this kind of research—Would you require that someone sitting in a public square, observing individuals and taking notes on their behavior, would have to ask those individuals' consent in advance? We have not accessed any information not otherwise available on Facebook. We have not interviewed anyone, nor asked them for any information, nor made information about them public… (Kaufman 2008c)

This justification presents a false comparison. The "public square" example depends on random encounters of people who happen to be in the square at the same time as the researcher. Further, the researchers cannot observe everyone simultaneously, and instead must select the individuals on whom to focus their attention, leaving some subjects out of the dataset. Finally, the data gathered is imprecise, and limited to the researchers' ability to discern gender, age, ethnicity, and other physically-observable characteristics. By contrast, the T3 researchers utilized an in-network research assistant to systematically access and download an entire cohort of college students' Facebook profile pages, each year for 4 years. They successfully targeted a specific and known group of students, obtaining a list of names and e-mail addresses of the students from the source university to improve their ability to gather data on the entire population. The data acquired included not only the subjects' self-reported gender and ethnicity, but also their home state, nation of origin, political views, sexual interests, college major, relational data, and cultural interests—data which would be considerably more difficult to obtain through observations in a public square. Suggesting that the two projects are similar and carry similar (and minimal) ethical dilemmas reveals a worrisome gap in the T3 research team's understanding of the privacy and ethical implications of their project.

The ethics of the "Tastes, Ties, and Time" project

The above discussion of the unsatisfactory attempts by the T3 researchers to protect subject privacy illuminates two central ethical concerns with the "Tastes, Ties, and Time" project: the failure to properly mitigate what amounts to violations of the subjects' privacy, and, thus, the failure to adhere to ethical research standards.

Privacy violations

The preceding discussion notes numerous failures of the T3 researchers to properly understand the privacy implications of the research study. To help concretize these concerns, we can gather them into the following four salient dimensions of privacy violations, as organized by Smith et al. (1996) and based on a thorough review of the privacy literature: the amount of personal information collected, improper access to personal information, unauthorized secondary use of personal information, and errors in personal information.20 Viewing the circumstances of the T3 data release through the lens of this privacy violation framework helps to focus the ethical deficiencies of the overall project.

20 I thank an anonymous reviewer for suggesting this organizing framework.
Amount of personal information collected

Privacy violations can occur when "extensive amounts of personally identifiable data are being collected and stored in databases" (Smith et al. 1996, p. 172). Notably, the "Tastes, Ties, and Time" project's very existence is dependent on the extensive collection of personal data. The T3 project systematically, and regularly over a 4-year period, collected a vast amount of personal information on over 1,500 college students. Individual bits of data that might have been added and modified on a subject's Facebook profile page over time were harvested and aggregated into a single database, co-mingled with housing data from an outside source, and then compared across datafiles.

Improper access to personal information

Privacy violations might also occur when information about individuals is readily available to persons not properly or specifically authorized to have access to the data. As described above, subjects within the T3 dataset might have used technological means to restrict access to their profile information to only members of the Harvard community, thus making their data inaccessible to the rest of the world. By using research assistants from within the Harvard community, the T3 researchers—whether intentionally or not—would be able to circumvent those access controls, thereby including these subjects' information among those with more liberal restrictions.

Further, no specific consent was sought or received from the subjects in the study; their profile information was simply considered freely accessible for collection and research, regardless of what the subject might have intended or desired regarding its accessibility for harvesting for research purposes. Combined, these two factors reveal how a privacy violation based on improper access has occurred due to the T3 project.

Unauthorized secondary use

Unauthorized secondary use of personal information is the concern that information collected from individuals for one purpose might be used for another, secondary purpose without authorization from the individual, such that the subject loses control over their information. Within Smith et al.'s (1996) framework, this loss of control over one's personal information is considered a privacy violation. At least two instances of unauthorized secondary use of personal information can be identified in the T3 project. First, the students' housing information and personal email addresses were provided to the T3 researchers to aid in their data collection and processing. These pieces of information were initially collected by the university to facilitate various administrative functions, and not for secondary use to assist researchers looking for students' profiles on Facebook. Second, the very nature of collecting Facebook profile information, aggregating it, and releasing it for others to download invites a multitude of secondary uses of the data not authorized by the students. The data was made available on Facebook for the purpose of social networking among friends and colleagues, not to be used as fodder for academic research. Without specific consent, the collection and release of Facebook data invariably brings about unauthorized secondary uses.

Errors in personal information

Finally, privacy concerns arise due to the impact of possible errors within datasets, which has led to various policies ensuring individuals are granted the ability to view and edit data collected about them to minimize any potential privacy violations.21 In the T3 project, subjects were neither aware of the data collection nor provided any access to view the data to correct for errors or unwanted information.

21 See, for example, the United States Federal Trade Commission's Fair Information Practice Principles (http://www.ftc.gov/reports/privacy3/fairinfo.shtm), which include "Access" as a key provision, providing data subjects the ability to view and contest inaccurate or incomplete data.

Ethical research standards

Viewing the privacy concerns of the T3 data release through the lens of Smith et al.'s (1996) privacy violation framework helps to focus the ethical deficiencies of the overall project. In turn, our critique of the T3 project exposes various breaches of ethical research standards that, if followed, might have mitigated many of the privacy threats.
Ethical issues in human subjects research receive considerable attention, culminating in the scrutiny of research projects by Institutional Review Boards for the Protection of Human Subjects (IRBs), which review research according to federal regulations.22 These regulations focus on research ethics issues such as subject safety, informed consent, and privacy and confidentiality. Others have taken these broad standards and applied them specifically to Internet-based research and data collection. For example, the Association of Internet Researchers has issued a set of recommendations for engaging in ethical research online (see Ess and AoIR ethics working committee 2002), which places considerable focus on informed consent and respecting the ethical expectations within the venue under study.

22 See Part 46 Protection of Human Subjects of Title 45 Public Welfare of the Code of Federal Regulations at http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm.

As noted above, the T3 researchers did not obtain any informed consent from the subjects within the dataset (nor were they asked to do so by their Institutional Review Board). Further, as described in detail, the researchers failed to respect the expectations likely held by the subjects regarding the relative accessibility and purpose of their Facebook profile information. By failing to recognize that users might maintain strong expectations that information shared on Facebook is meant to stay on Facebook, or that only members of the Harvard network would ever have access to the data, the T3 researchers have failed in their duty to engage in ethically-based research.

Conclusion

The events surrounding the release of the Facebook data in the "Tastes, Ties, and Time" project—including its methodology, its IRB approval, the way in which the data was released, and the viewpoints publicly expressed by the researchers—reveal considerable conceptual gaps in the understanding of the privacy implications of research in social networking spaces. As a result, threats to the privacy of the subjects under study persist, despite the good-faith efforts of the T3 research team.

The purpose of this critical analysis of the T3 project is not to place blame or single out these researchers for condemnation, but to use it as a case study to help expose the emerging challenges of engaging in research within online social network settings. These include challenges to the traditional nature of consent, properly identifying and respecting expectations of privacy on social network sites, developing sufficient strategies for data anonymization prior to the public release of personal data, and the relative expertise of institutional review boards when confronted with research projects based on data gleaned from social media.

As made apparent by the position of some of the T3 research team that their data collection methods were unproblematic since the "information was already on Facebook", future researchers must gain a better understanding of the contextual nature of privacy in these spheres (Nissenbaum 1998, 2004, 2009), recognizing that just because personal information is made available in some fashion on a social network does not mean it is fair game for capture and release to all (see, generally, Stutzman 2006; Zimmer 2006; McGeveran 2007; boyd 2008a). Similarly, the notion of what constitutes "consent" within the context of divulging personal information in social networking spaces must be further explored, especially in light of this contextual understanding of norms of information flow within specific spheres. The case of the T3 data release also reveals that we still have not learned the lessons of the AOL data release and similar instances where presumed anonymous datasets have been re-identified. Perhaps most significantly, this case study has uncovered possible shortcomings in the oversight functions of institutional review boards, the very bodies bestowed with the responsibility of protecting the rights of data subjects.

Overcoming these challenges and conceptual muddles is no easy task, but three steps can be taken immediately to guide future research in social media spaces. One, scholars engaging in research similar to the T3 project must recognize their own gaps in understanding the changing nature of privacy and the challenges of anonymizing datasets, and should strive to bring together an interdisciplinary team of collaborators to help ensure the shortcomings of the T3 data release are not repeated. Two, we must evaluate and educate IRBs and related policy makers as to the complexities of engaging in research on social networks.23 And three, we must ensure that our research methods courses, codes of best practices, and research protocols recognize the unique challenges of engaging in research on Internet and social media spaces.24

23 See, for example, the "Internet Research Ethics: Discourse, Inquiry, and Policy" research project directed by Elizabeth Buchanan and Charles Ess (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0646591).
24 An important movement in this direction is the recently funded "Internet Research and Ethics 2.0: The Internet Research Ethics Digital Library, Interactive Resource Center, and Online Ethics Advisory Board" project, also directed by Elizabeth Buchanan and Charles Ess (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0924604 and http://www.internetresearchethics.org/).

The "Tastes, Ties, and Time" research project might very well be ushering in "a new way of doing social science", but it is our responsibility as scholars to ensure our research methods and processes remain rooted in long-standing ethical practices. Concerns over consent, privacy and anonymity do not disappear simply because subjects participate in online social networks; rather, they become even more important.
Acknowledgments The author thanks the participants at the International Conference of Computer Ethics: Philosophical Enquiry in Corfu, Greece, as well as the Internet Research 10: Internet Critical conference in Milwaukee, Wisconsin, for their helpful comments and feedback. Additional thanks to Elizabeth Buchanan, Charles Ess, Alex Halavais, Anthony Hoffmann, Jon Pincus, Adam Shostack, and Fred Stutzman for their valuable insights and conversations, both online and off. The author also thanks the anonymous reviewers for their helpful suggestions and criticisms. This article would not have been possible without the research assistance of Wyatt Ditzler and Renea Drews. Finally, I would like to thank Jason Kaufman and Colin McKay at the Berkman Center for Internet & Society for their valued and continued feedback regarding this work.

References

Albrechtslund, A. (2008). Online social networking as participatory surveillance. First Monday. Retrieved March 3, 2008, from http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2142/1949.
Barbaro, M., & Zeller Jr, T. (2006). A face is exposed for AOL searcher no. 4417749. The New York Times, p. A1.
Barnes, S. (2006). A privacy paradox: Social networking in the United States. First Monday. Retrieved October 12, 2007, from http://www.firstmonday.org/ISSUES/issue11_9/barnes/.
Bloustein, E. (1964). Privacy as an aspect of human dignity: An answer to Dean Prosser. New York University Law Review, 39, 962–1007.
boyd, D. (2008a). Putting privacy settings in the context of use (in Facebook and elsewhere). Apophenia. Retrieved October 22, 2008, from http://www.zephoria.org/thoughts/archives/2008/10/22/putting_privacy.html.
boyd, D. (2008b). Taken out of context: American teen sociality in networked publics. Unpublished dissertation, University of California-Berkeley.
boyd, D., & Ellison, N. (2008). Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1), 210–230.
Ess, C., & AoIR ethics working committee. (2002). Ethical decision-making and Internet research. Retrieved March 12, 2010, from http://www.aoir.org/reports/ethics.pdf.
Gatt, A. (2002). Click-wrap agreements: The enforceability of click-wrap agreements. Computer Law & Security Report, 18(6), 404–410.
Grimmelmann, J. (2009). Facebook and the social dynamics of privacy. Iowa Law Review, 95, 4.
Gross, R., & Acquisti, A. (2005). Information revelation and privacy in online social networks. Paper presented at the 2005 ACM Workshop on Privacy in the Electronic Society, Alexandria, VA.
Jansen, B. J., & Resnick, M. (2005). Examining searcher perceptions of and interactions with sponsored results. Paper presented at the Workshop on Sponsored Search Auctions at the ACM Conference on Electronic Commerce, Vancouver, BC.
Jansen, B. J., & Spink, A. (2005). How are we searching the world wide web? A comparison of nine search engine transaction logs. Information Processing & Management, 42(1), 248–263.
Kaufman, J. (2008a). Considering the sociology of Facebook: Harvard research on collegiate social networking [Video]. Berkman Center for Internet & Society.
Kaufman, J. (2008b). I am the Principal Investigator… [Blog comment]. On the "Anonymity" of the Facebook dataset. Retrieved September 30, 2008, from http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.
Kaufman, J. (2008c). Michael—We did not consult… [Blog comment]. michaelzimmer.org. Retrieved September 30, 2008, from http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.
Lenhart, A., & Madden, M. (2007). Teens, privacy & online social networks. Pew Internet & American Life Project. Retrieved April 20, 2007, from http://www.pewinternet.org/pdfs/PIP_Teens_Privacy_SNS_Report_Final.pdf.
Lewis, K. (2008). Tastes, Ties, and Time: Cumulative codebook. Retrieved September 30, 2008, from http://dvn.iq.harvard.edu/dvn/dv/t3.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4), 330–342.
McGeveran, W. (2007). Facebook, context, and privacy. Info/Law. Retrieved October 3, 2008, from http://blogs.law.harvard.edu/infolaw/2007/09/17/facebook-context/.
N.A. (2008). Tastes, Ties, and Time: Facebook data release. Berkman Center for Internet & Society. Retrieved September 30, 2008, from http://cyber.law.harvard.edu/node/4682.
Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. Paper presented at the IEEE Symposium on Security and Privacy, 2008.
Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. Paper presented at the 30th IEEE Symposium on Security and Privacy.
Nissenbaum, H. (1998). Protecting privacy in an information age: The problem of privacy in public. Law and Philosophy, 17(5), 559–596.
Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79(1), 119–157.
Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford, CA: Stanford University Press.
Nussbaum, E. (2007). Kids, the Internet, and the end of privacy. New York Magazine. Retrieved February 13, 2007, from http://nymag.com/news/features/27341/.
Rosenbloom, S. (2007). On Facebook, scholars link up with data. New York Times. Retrieved September 30, 2008, from http://www.nytimes.com/2007/12/17/style/17facebook.html?ref=us.
Simmel, G., & Wolff, K. H. (1964). The sociology of Georg Simmel. Glencoe, IL: Free Press.
Smith, H. J., Milberg, S. J., & Burke, S. J. (1996). Information privacy: Measuring individuals' concerns about organizational practices. MIS Quarterly, 20(2), 167–196.
Solove, D. (2007). The future of reputation: Gossip, rumor, and privacy on the Internet. New Haven, CT: Yale University Press.
Stutzman, F. (2006). How Facebook broke its culture. Unit Structures. Retrieved October 3, 2008, from http://chimprawk.blogspot.com/2006/09/how-facebook-broke-its-culture.html.
Stutzman, F. (2008). Facebook datasets and private chrome. Unit Structures. Retrieved September 30, 2008, from http://fstutzman.com/2008/09/29/facebook-datasets-and-private-chrome/.
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557–570.
Wellman, B., & Berkowitz, S. D. (1988). Social structures: A network approach. Cambridge: Cambridge University Press.
Zimmer, M. (2006). More on Facebook and the contextual integrity of personal information flows. michaelzimmer.org. Retrieved October 3, 2008, from http://michaelzimmer.org/2006/09/08/more-on-facebook-and-the-contextual-integrity-of-personal-information-flows/.
Zimmer, M. (2008a). More on the "Anonymity" of the Facebook dataset—It's Harvard College. michaelzimmer.org. Retrieved October 3, 2008, from http://michaelzimmer.org/2008/10/03/more-on-the-anonymity-of-the-facebook-dataset-its-harvard-college/.
Zimmer, M. (2008b). On the "Anonymity" of the Facebook dataset. michaelzimmer.org. Retrieved September 30, 2008, from http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.
