
Jeffrey Van Hulten
Re-identification: Revisiting How We Define Personally Identifiable Information
Cyberlaw, 2013

Imagine it's a rainy Sunday afternoon. You have just sat down in your living room, laptop in hand, and turned on a TV equipped with Netflix, ready to resume your latest TV series obsession. While watching the latest episode, you begin surfing the net and decide to finally give in to the urge to buy an item you have wanted for some time. After you place your order, the website kindly asks you to rate several past purchases. You quickly do so and resume watching your show. Shortly after the episode ends, Netflix asks you to rate it. Again, you quickly provide a rating and carry on with the rest of your afternoon.

In the scenario above, aside from spending a relatively relaxing and familiar afternoon, you may have left behind enough information for someone with the right tools to identify who you are, potentially connecting you to some of your most sensitive and personal information. Although this may sound extreme, re-identification from anonymized data has become increasingly feasible as computer science and mathematics continue to advance.[1] These advances have made it possible for hackers, criminals, researchers, and the government alike to take seemingly anonymous information and, with the help of outside sources, begin to unlock the doors that lead to your most personal information, including your actual identity.[2]

The concept of data anonymization is relatively simple and has existed since the advent of digitized information.[3] Although approached in many ways, the concept consists of removing all information that could be seen as personally identifiable.
1. See Latanya Sweeney, Computational Disclosure Control: A Primer on Data Privacy Protection, Massachusetts Institute of Technology (2001), available at http://dspace.mit.edu/handle/1721.1/8589
2. Arvind Narayanan & Vitaly Shmatikov, Privacy and Security: Myths and Fallacies of Personally Identifiable Information, 53 Communications of the ACM (June 2010), available at http://www.cs.utexas.edu/users/shmat/shmat_cacm10.pdf
3. Arvind Narayanan & Vitaly Shmatikov, Robust De-Anonymization of Large Sparse Datasets, in Proc. of the 2008 IEEE Symp. on Security and Privacy 111, 121, available at http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf [hereinafter Netflix Release]


That is, information that could be linked to sensitive information, such as health conditions, or directly linked to your identity, such as your Social Security number or credit card number. The idea then follows that once this information has been removed or redacted, all that remains is harmless information such as age, gender, product ratings, and the like.[4] This information is then analyzed in some fashion and/or shared publicly, with third parties, or internally to enhance professional practices, support academic research, highlight demographic and consumer behavior patterns, or simply provide public disclosure of information.[5] The end result seems to be a well-balanced approach: the privacy of the individuals whose information is being shared is protected while the utility of the information is preserved.

However, while seemingly ideal, this balance operates on an ideological premise that no longer holds in today's technological age: that personally identifiable information is only information that is, or is directly linked to, a person's identity or sensitive information. Advances in mathematics and computer science have created ways to connect seemingly harmless and unrelated information, like the movie ratings or viewing patterns of a particular unidentified person, to harmful information, like a specific person's diagnosis of HIV or mental illness.[6] The privacy implications of such revelations are damning to the current data-anonymizing approach to privacy protection.

4. See Latanya Sweeney, Achieving k-Anonymity Privacy Protection Using Generalization and Suppression, 10 Int'l J. on Uncertainty, Fuzziness & Knowledge-Based Sys. 571 (2002), available at http://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity2.pdf
5. Ali Inan, Murat Kantarcioglu & Elisa Bertino, Using Anonymized Data for Classification, University of Texas at Dallas, available at http://www.utdallas.edu/~muratk/publications/inan-AnonClassification.pdf
6. Jane Bambauer, Tragedy of the Data Commons, 25 Harv. J.L. & Tech. (2011), available at http://ssrn.com/abstract=1789749 or http://dx.doi.org/10.2139/ssrn.1789749


These new advances carry with them serious privacy implications, including the need to reconsider the traditional regulatory approach of categorically defining what constitutes personally identifiable information (PII). This approach examines PII on a spectrum, with personally sensitive information on one end and a person's identity on the other.[7] The current approach to defining and regulating PII is limited to either end of this spectrum, leaving all information that falls between the two ends unregulated and outside the definitional scope of PII.[8]

This emerging gap within the definition of PII is the focus of this paper and will be approached in the following manner. First, Part I will examine the basic history of and approach to data anonymizing and its impact on defining PII in a categorical manner. Part II will show how advances in technology have undermined the current definitional approach, using a case study that demonstrates the definition's pitfalls. Part III will lay out a recent recommendation offering a potential solution to this issue. Finally, Part IV will examine the potential shortcomings of that recommendation and suggest additional considerations in redefining PII and its subsequent regulatory scheme.

I. Data Anonymizing and the Categorical Approach to PII

In the current regulatory scheme regarding PII, data anonymizing has been hailed as the cure-all for protecting privacy while ensuring the utility of information.[9]

7. This is similar to the hallway analogy used by Paul Ohm. See Paul Ohm, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57 UCLA L. Rev. 1701, 1749-50, 1759-60 (2010), available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 [hereinafter Broken Promises]
8. Id.
9. Douglas J. Sylvester & Sharon Lohr, The Security of Our Secrets: A History of Privacy and Confidentiality in Law and Statistical Practice, 83 Denv. U. L. Rev. 147, 195 (2005), available at http://www.law.du.edu/images/uploads/denver-university-lawreview/v83_i1_sylvesterLohr.pdf [hereinafter Security of Our Secrets]


Under the current legal framework, privacy is protected by hiding or removing the identity and/or sensitive information of an individual, while other information about that person is analyzed and shared with little or no regard to privacy protection.[10] Before looking more deeply into how PII gained its traditional definition through this approach, it is useful to first understand when, and for what purposes, data anonymizing is utilized.

Traditionally, information sharing is motivated by research needs, such as health findings and statistics shared among medical academics, or consumer practices and behaviors shared between corporations or industries.[11] Moreover, potentially personal information is often used internally to help demonstrate how particular practices or techniques affect an overall goal.[12] For example, a multi-level marketing company may want to know whether age or gender influences the sales of a particular product, or whether the strategies used by some employees push sales higher or lower. Furthermore, in other cases, and increasingly so, information is released for crowdsourcing purposes.[13] As will be highlighted infra Part II, some corporations, like Netflix, or even entire industries, release information to the general public with the idea that volunteer analysts (e.g., bloggers, freelance researchers) will analyze the information for free or for a prize, rather than the corporation hiring a small group of analysts.

10. Id.
11. Broken Promises, at 1708; see also Posting of Susan Wojcicki, Vice President, Product Management, to The Official Google Blog, Making Ads More Interesting, http://googleblog.blogspot.com/2009/03/making-ads-more-interesting.html
12. Id.; see also Posting of Philip Lenssen to Google Blogoscoped, Google-Internal Data Restrictions, http://blogoscoped.com/archive/2007-0627-n27.html
13. Jeff Howe, The Rise of Crowdsourcing, Wired Magazine Issue 14.06 (June 2006), available at http://sistemas-humanocomputacionais.wdfiles.com/local--files/capitulo%3Aredes-sociais/Howe_The_Rise_of_Crowdsourcing.pdf


Again, these public releases rely heavily on data anonymizing to protect PII (sensitive and/or identity-related information) while providing usable information in terms of an overall purpose.[14]

These uses of data anonymizing have largely influenced the way PII is viewed and defined. More specifically, PII has been approached in a regulatory sense strictly from a categorical perspective.[15] This is largely because the practice of data anonymizing is centered on categories of information that are seen as identifying a person and/or relating to highly sensitive information about a person. This categorical approach has been mainly shaped by what legal scholar Paul Ohm highlights as three main types of data anonymizing: suppression, generalization, and aggregation.[16] Data anonymizing can be done in many forms and tracked by the information supplier in varying degrees. As Paul Ohm points out, the vast majority of information is anonymized and then forgotten, in a privacy sense; thus, the main types of data anonymizing all stem from the same "release-and-forget" approach.[17] The "forget" element refers to the absence of any tracking of what happens to the information once it is released (i.e., who it is shared with, how it is treated, and where it is shared are never tracked).

A deeper look at these three approaches to data anonymizing makes it very clear how PII has come to be defined only by categories of information. For example, suppression is the approach that most people would probably assume anonymizing takes: the PII is simply redacted or removed, leaving all other information intact.[18]
14. National Institutes of Health, HIPAA Privacy Rules for Researchers, http://privacyruleandresearch.nih.gov/faq.asp
15. Security of Our Secrets, at 182.
16. Broken Promises, at 1711-16.
17. Id.
18. Id.


Thus, a company examining the consumer habits of its online shoppers would redact consumers' names, street/mailing addresses, and credit card numbers while leaving the items purchased, any product ratings made, how many items were purchased, when the purchases were made, what products were viewed, the gender and age of the consumer, and even the zip code, to track habits by geographic region. This approach is premised on the idea that identifying information can be singled out and removed.[19]

The suppression method has inherent weaknesses, because while identifying information is removed, many other pieces of information remain that could lead to the re-identification of a person.[20] Thus, in some cases suppression is not an adequate approach. To combat this, another approach, generalization, can provide more anonymity while arguably protecting the utility of the information.[21] In the consumer example above, the zip code, gender, and age of a consumer could very easily narrow down the specific or likely shoppers within a particular geographic region. The online store may therefore want to generalize the information, for example by reporting an age range rather than a specific age for each purchase, or by combining several zip codes to broaden the geographic scope. This approach lowers the risk of re-identification but also lowers the quality and specificity of the findings that can be drawn from the information.[22]

Finally, a third approach can help circumvent the pitfalls of both suppression and generalization, but it too comes at a price. Aggregation is the release of specific information that typically has already been analyzed on some level.[23] For example, if the online store wanted to analyze the gender of purchasers of a particular product, it could supply analysts with just two numbers: the number of men and the number of women who purchased the product. This approach shields the raw data from others, creating more privacy protection while still providing specific information. However, the released information may be too specific, and thus this approach may only work when examining specific criteria rather than taking a holistic approach to information analysis. A minimal sketch of all three techniques appears below.
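To make the three techniques concrete, the following sketch applies suppression, generalization, and aggregation to a toy purchase table. It is purely illustrative and not drawn from any source cited here; the column names, data, and bucketing choices are assumptions.

```python
# Illustrative sketch: suppression, generalization, and aggregation
# applied to a hypothetical online-store purchase table.
import pandas as pd

# Hypothetical raw records, including traditional PII (name, card number).
raw = pd.DataFrame({
    "name":    ["Alice Smith", "Bob Jones", "Carol Lee"],
    "card_no": ["4111-...-0001", "4111-...-0002", "4111-...-0003"],
    "age":     [34, 29, 61],
    "gender":  ["F", "M", "F"],
    "zip":     ["80202", "80203", "80210"],
    "item":    ["blender", "blender", "kettle"],
    "rating":  [5, 3, 4],
})

# 1. Suppression: drop the columns categorically deemed identifying;
#    everything else is released untouched.
suppressed = raw.drop(columns=["name", "card_no"])

# 2. Generalization: coarsen quasi-identifiers instead of releasing
#    exact values -- age becomes a decade range, zip keeps 3 digits.
generalized = suppressed.copy()
generalized["age"] = (generalized["age"] // 10 * 10).astype(str) + "s"
generalized["zip"] = generalized["zip"].str[:3] + "xx"

# 3. Aggregation: release only pre-computed summaries, never row-level
#    data -- e.g., purchaser counts by gender for one product.
aggregated = (
    raw[raw["item"] == "blender"]
    .groupby("gender")
    .size()
    .rename("purchasers")
)

print(suppressed, generalized, aggregated, sep="\n\n")
```

Note how each step trades utility for privacy: the suppressed table still carries quasi-identifiers, the generalized table blurs them, and the aggregate discloses no row-level data at all, mirroring the trade-offs described above.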

19. Broken Promises, at 1713; see also Claudio Bettini et al., The Role of Quasi-Identifiers in k-Anonymity Revisited (DICo Univ. Milan Tech. Rep. RT-11-06, July 2006), available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.564&rep=rep1&type=pdf
20. Id. at 1714; see also Sweeney, supra note 4.
21. Id.
22. Id.
23. Id. at 1715.


The unifying theme across all of these approaches is that data is anonymized by virtue of categories. In all three approaches, the information is assessed for which types of information are too revealing in terms of identity and/or sensitivity. In no way do these approaches account for the interplay of information, or for the interaction with openly available information and how it may re-identify a person.[24] Thus, we see that PII is, and always has been, viewed in a categorical sense, which overlooks the current problem of harmless information providing connections to the harmful invasion of the information that stands protected on either end of the privacy spectrum, namely identity and sensitive information. As Part II will further explain, this definitional gap in what counts as PII creates problems under a legal scheme that has yet to anticipate these new harms or an appropriate way to address them.

II. PII and the Definitional Gap: Netflix as a Case Study

The definitional gap of PII can be slightly difficult to grasp in the abstract, because ratings of online purchases or viewed movies seem rather disconnected from your Social Security number or medical history. This is compounded by the fact that knowing a person's shopping or movie preferences seems rather harmless in comparison to knowing their Social Security number.[25]

24. Id. at 1716.
25. Nancy Messieh, Do We Really Care About Our Online Privacy?, The Next Web, September 13, 2011, available at http://thenextweb.com/insider/2011/09/13/do-we-really-care-about-our-online-privacy/


Thus, it is useful to highlight a case study that demonstrates the slippery slope onto which technology has pushed privacy rights, and how ill-equipped our regulatory scheme currently is to handle such potential harm given its categorical approach to defining PII.[26]

In October of 2006, Netflix announced a contest with a prize of $1 million.[27] The prize would go to the best new mathematical algorithm designed to help Netflix's servers better predict which movies to recommend to customers based upon each customer's viewing history.[28] To facilitate this contest, Netflix released over one hundred million records containing the personal viewing histories of nearly five hundred thousand customers.[29] The sharing of the information was largely motivated by potential financial gain; more specifically, if Netflix could find a better way to predict which movies its customers would actually enjoy, it stood to fortify its repeat-customer rates, resulting in a larger profit margin.[30] To protect the privacy of its customers, Netflix utilized suppression tactics, replacing all customer names with non-identifying ID numbers and redacting all other traditionally defined PII, namely credit card numbers, addresses, and the like.[31]

The public release of information was seen as a great opportunity for researchers in the mathematics and computer science fields, not just for the chance of winning the prize, but for the free access to a large set of data that could be used to test theories and the latest advances in technology.[32]

26. Broken Promises, at 1731-45.
27. Id. at 1720-22; see also The Netflix Prize Rules, http://www.netflixprize.com/rules
28. Id.; see also Netflix Prize: FAQ, http://www.netflixprize.com/faq
29. Id.; see also supra note 27.
30. Id.
31. Id.
32. Id.


The research that resulted from the Netflix release, however, proved more startling, as its implications would force many privacy advocates to reexamine what actually constitutes PII and whether a categorical approach is the best foundation for a regulatory scheme.[33]

Among the research produced, a study showing the ability to re-identify the anonymized data with the identity of the actual viewer was one of the more surprising findings.[34] The study began by showing that if a Netflix customer was included in the data released for the contest, a person knowing very little about that customer could identify which record contained that customer's viewing history, ultimately connecting an identity to a specific set of anonymized data.[35] Of course, this does not initially seem like the death knell of privacy via anonymized data, in that no customer's medical history was thereby exposed. However, the findings sparked further research that revealed a deeper undercurrent: what could potentially be connected to a person's identity or sensitive information once this initial connection was made.[36]

For example, researchers took the Netflix data and used mathematical algorithms to cross-reference the viewing histories released by Netflix with the rating histories of individual users on IMDb.com. IMDb.com holds all of its information publicly; its users have opaque, anonymous ID names or numbers and can rate any movie within the database. An interesting element of this research was that Netflix and IMDb.com did not contain identical lists of movies; each therefore contained information about a viewer's movie preferences that the other did not.

33. Arvind Narayanan & Vitaly Shmatikov, How to Break the Anonymity of the Netflix Prize Dataset, arXiv, Feb. 5, 2008, at 1, http://arxiv.org/pdf/cs/0610105v2.pdf
34. Id.
35. Id.; see also Justin Vastola et al., Statistics for Re-Identification in Network Models, University of Pennsylvania, available at https://opimweb.wharton.upenn.edu/linkservid/1BAB25B2-D765-78AD-322BC102A698C73A/showMeta/0/
36. Id.


The research showed that, from a small subset of IMDb.com users, several were statistical matches for customers' viewing histories found in the Netflix release.[37] More telling still was the complete picture that the two databases, when combined, painted of a viewer's political persuasion and social viewpoints, based upon the ratings of several movies and TV shows. Again, this has yet to connect viewing history with medical history or other sensitive information, but it is exactly the kind of activity that could.

As we reconsider the spectrum of privacy, the connections made among all the pieces of information that fall in the middle of the spectrum lead to an eventual connection between the two ends themselves.[38] For example, a person's viewing history may be linked to their social media profiles, which show the same sets of movies being liked or followed. Using mathematics and computer science, a person can compute the statistical likelihood that a given number of people would have liked the same set of movies, eventually leading to a process of elimination; connecting a social media account identity to a similar viewing history can thus re-identify someone via seemingly anonymized data.[39] Subsequently, once an identity is found and connected to a particular social media account, the person making the connections has gained access to a new cache of information about an individual that can be cross-referenced with other databases, and more connections can potentially be made to the person's identity and sensitive information.[40] This process repeats until a solid connection between actual identity and sensitive information is formed. A simple sketch of this kind of linkage attack appears below.
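To make the linkage idea concrete, here is a minimal, hypothetical sketch in the spirit of the scoring method described by Narayanan and Shmatikov (supra note 33): candidate anonymized records are scored by their weighted overlap with an auxiliary public profile, rare titles are weighted more heavily, and a match is claimed only when the best score stands clearly apart from the runner-up. The data, weights, and threshold are invented for illustration; the actual algorithm also exploits rating values and dates.

```python
# Naive linkage-attack sketch: match a public rating profile against
# anonymized records by weighted overlap. All data here is hypothetical.
from math import log

# Anonymized release: record ID -> set of movies rated.
anon_db = {
    "user_001": {"Movie A", "Movie B", "Cult Film X"},
    "user_002": {"Movie A", "Movie C"},
    "user_003": {"Movie B", "Cult Film X", "Movie D"},
}

# Auxiliary (public) profile, e.g. scraped from a named IMDb account.
aux_profile = {"Movie B", "Cult Film X", "Movie D"}

# Rarer movies carry more identifying weight: w(m) = 1 / log(1 + support).
all_movies = set().union(*anon_db.values())
support = {m: sum(m in rated for rated in anon_db.values())
           for m in all_movies}
weight = {m: 1.0 / log(1 + s) for m, s in support.items()}

# Score each anonymized record by its weighted overlap with the profile.
scores = {rid: sum(weight[m] for m in movies & aux_profile)
          for rid, movies in anon_db.items()}

best, runner_up = sorted(scores, key=scores.get, reverse=True)[:2]
# "Eccentricity" test: only claim a match when the best score stands
# well clear of the second-best, to limit false re-identifications.
if scores[best] > 1.5 * scores[runner_up]:
    print(f"{best} is a likely match for the public profile")
```

Even this crude version captures the paper's point: a handful of obscure titles shared between two databases is often enough to single out one record, and neither database needs to contain a name for the link to be made.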

37. Id.
38. Broken Promises, supra note 7.
39. Narayanan, supra note 33.
40. Broken Promises, supra note 7.


The advances in technology have not only made this process possible but have also made seemingly random anonymized data identifiable and connectable at increasingly faster speeds.[41]

The overarching implication of these findings is detrimental to the categorical approach to PII. In particular, PII is defined by classifying categories of information as personal versus non-personal, and the regulatory scheme enforces privacy protection by requiring that information classified as personal be protected and safeguarded while non-personal information is not.[42] However, as demonstrated above in the Netflix release, all information has the potential to become personal information. Maintaining a categorical approach would therefore require a ban on any information release or sharing, running afoul of several laws, including the Constitution. To combat this issue, some state courts have begun to expand the definition of PII to include other pieces of information traditionally not viewed as PII.[43] However, this still assumes that there are some pieces of information that could not connect a person's identity to or with their sensitive information. As technology continues to advance, this will become less and less true, leaving gaps in the current regulatory scheme as to how to deal with such issues. Part III will examine a specific recommendation for moving past the current approach of categorically defining PII.

III. Paul Ohm and the Five Factor Test

This definitional gap in PII, created by connecting seemingly unrelated pieces of information and tying them back to the identity and/or sensitive information of a person, is problematic for two reasons.

41. Arvind Narayanan et al., Link Prediction by De-Anonymization: How We Won the Kaggle Social Network Challenge, arXiv:1102.4374v1, Feb. 22, 2011, available at http://arxiv.org/pdf/1102.4374v1.pdf
42. Broken Promises, supra note 7, at 1731-44.
43. Robert E. Braun & Craig A. Levine, Client Alert: California Supreme Court Rules That Zip Codes Are Personally Identifiable Information, JMBM (Feb. 16, 2011), available at http://www.jmbm.com/docs/california_supreme_court_rules_that_zip_codes_are_personal_identification_information.pdf


First, it forces the definition of PII to expand to virtually all pieces of information, leaving room for sweeping regulation that could greatly harm the dissemination of information. Second, this issue is taking place under a regulatory scheme that focuses solely on the information itself.[44] The scheme does not consider the collector/sharer or the receiver of the information, or what they are or are not doing to and with the information they obtain.

In reconsidering the definitional gap of PII and these two issues, Paul Ohm has created an approach he suggests would fill the gap and create a better post-anonymization regulatory scheme that rethinks the traditional approach to PII.[45] More specifically, Paul Ohm argues that a system with a comprehensive privacy baseline, plus a more contextualized, sector-specific regulatory arm providing additional above-the-baseline protection, will better fit the privacy reality of today and the future.[46] This two-tiered approach is further supplemented by a Five Factor Test regulators are to apply when considering the second tier of sector-specific regulation. These factors are: 1) Data Handling Techniques, 2) Private versus Public Release, 3) Quantity, 4) Motive, and 5) Trust.[47]

Before briefly examining the details of the Five Factor Test, it is important to first discuss how this approach changes the current categorical approach to defining PII. The new approach does not require that categories become a thing of the past. While unrelated information may now have the potential to be connected, thereby swallowing all categories of information into the operative definition of PII, those categories can still serve as a starting point for a more holistic view of the situation.

44. Broken Promises, supra note 7, at 1731.
45. Id. at 1751.
46. Id. at 1762-64.
47. Id. at 1764-68.


The idea, essentially, is that who has the information, what they are doing with it, and how it is being handled may be approached categorically, based on how serious the information is viewed to be in terms of privacy.[48] It is from this platform that the Five Factor Test launches the major considerations, including categories, that regulators should weigh when rethinking PII and privacy law in general.

The Five Factor Test begins with Data Handling Techniques. More specifically, Paul Ohm suggests that regulators consider how categories of information are treated in terms of privacy and how susceptible each handling technique is to fostering re-identification.[49] The resulting suggestion is to create a rubric that rates handling techniques on some type of scale, which would then allow regulators to set varying standards specific to particular industries and sectors.[50]

The second and third factors are Private versus Public Release and Quantity. Private versus Public Release is just what it sounds like, and is directed at the very practice utilized by Netflix in its 2006 public release of information.[51] The idea behind this factor is that public releases serve very little, if any, true utilitarian purpose; the driving force is often monetary. Therefore, regulators should only allow public releases of anonymized data for exceptional purposes.[52] Although not ideal from an information-utility standpoint, this is privacy at the cost of some utility. The third factor, Quantity, takes a similar approach of making today's common practices more exceptional in the future. Here, Paul Ohm argues for regulation that takes into account the amount of information one database is allowed to hold, noting that "one-stop shopping" for information greatly heightens the likelihood of re-identification.[53]

48. Id. at 1759.
49. Id. at 1764-68.
50. Id.
51. Id.
52. Id.
53. Id.


Moreover, the longer information is retained, the greater this risk becomes. Thus, Paul Ohm suggests that regulators consider regulating both the amount of information held and the duration for which it is kept.[54]

Finally, the fourth and fifth factors are Motive and Trust. Both lend themselves to a more philosophical ideal that the traditional PII definition and regulatory scheme have not possessed.[55] In particular, Motive suggests that the reason for sharing should be a consideration in how relaxed or strict the regulation of a particular category of information should be. This allows the flexibility to regulate purer pursuits, such as academic research, less harshly than monetary motives. Trust, much like Motive, considers the adversarial potential of any given actor: regulation should vary in stringency based upon the level of trust we have in the particular people and institutions that have or will receive the information.[56]

The two-tiered approach and its supplementary Five Factor Test offer a more comprehensive alternative to the otherwise under- or over-inclusive traditional PII definition and regulatory scheme. However, as Part IV will highlight, this approach comes with its own shortcomings and needs additional adjustments in order to more fully fill the definitional gap while also preserving the utility of information.

IV. The Five Factor Test: Analysis and Considerations

The two-tiered approach, with its supplementary Five Factor Test, is premised on a fundamental principle that many can agree upon: the traditional definition of PII is rooted in the idea that information itself can be categorized as either potentially harming one's privacy or being irrelevant to it.[57] As the Netflix release of 2006 demonstrates, this approach is insufficient, and thus the way PII has been defined is ineffective.

54. Id.
55. Id.
56. Id.
57. Id. at 1742-43.


The new system suggested by Paul Ohm correctly shifts the definition of PII away from the information itself and toward how information is treated and by whom it is collected and shared.[58] The definitional gap of PII is thereby filled by allowing any information to be PII if it is treated in a certain way or shared with those who may use it to invade one's privacy, not simply by virtue of what kind of information it generally is. However, this definitional shift requires a new approach to the regulatory scheme that relies upon the traditional PII definition, and while Paul Ohm has crafted a thoughtful approach, it has several shortcomings that need to be considered and remedied before moving forward in recalibrating the regulatory scheme as it currently stands.

First, the two-tiered system with the supplemental Five Factor Test relies heavily upon industry regulation.[59] Moreover, the new system depends on regulation tailored to specific industries in different ways. This is problematic for two reasons. One, this type of approach is susceptible to industry discrimination. Although this is not likely a legal hurdle, it will remain a political one: the new system will likely lead many industry-hired lobbyists to seek legislation easing their industry-specific regulations for any number of reasons. This is not a new political reality, and it makes the approach susceptible to efficiency arguments as well as free-market criticisms.

Two, in the same vein of political reality, theoretical approaches to reform are never best served when examined in a vacuum. Considering the influence of special interests in U.S. politics, this approach may produce strange results when attempted in such a political atmosphere.

58. Id. at 1764-68.
59. Id.


There is the potential for industries to lobby for, and receive, lower standards than they should be given under the theoretical approach, thereby circumventing the overall purpose of the reform. Again, this is highly circumstantial, but it could be the result when the categorical approach shifts from defining information types to defining industry types.[60]

To combat these two potential issues, we must step back to the proposed theoretical approach and redesign the definitional shift of PII. Although a categorical approach to information has its gaps, it should not be abandoned in the sense that industry categorization replaces it as the main definitional determinant. Instead, the approach should factor in information's re-identification potential under a variety of circumstances to shape how PII is redefined. Thus, rather than targeting industries, target industry practices. This would leave industries feeling less singled out and foster more "best practices" conversations. It could also, arguably, thwart some special interests by appearing to the public less industry-focused and more procedure-focused.

Second, the two-tiered/five-factor approach leaves a key player out of consideration: the individual. As noted above, this approach is highly industry-centric. Indeed, it is true that the majority of privacy harms stem from industries and their treatment of information in ways that could make it become PII. However, the same can be said of the individual whose information is being collected. Throughout this paper, subtle references have been made to the fact that outside information is needed to aid those who are attempting to re-identify anonymized information. This outside information often comes from an individual's social networking accounts and other public-facing venues where vast amounts of information unique to that individual are put on display for some or all to see. The currently suggested reform does not account for this fact at all; it places all the responsibility upon industries and their practices.

60. Id.


This is largely problematic because information may be carefully guarded, through all sorts of regulations and mandatory practices, against transforming from non-PII into PII, yet that protection can be undone by the simple actions of an individual who unwittingly causes their own information to become PII.[61]

To combat this, two things should be added to the suggested reforms. First, a mechanism must be put in place whereby industries are not held liable or responsible in instances where seemingly harmless information becomes PII due to the actions of an individual. For example, in the Netflix release, if an individual posts all of their Netflix ratings on their Facebook page with no privacy settings activated, such that the page is open to the public, no industry standard or practice will be able to protect that individual without banning any release or sharing altogether. This runs into the original definitional problem of PII swallowing all information within its definition. Thus, in this case, Netflix should not be held liable for the fact that the release caused this individual's viewing history to become PII. Of course, this assumes the release in this example was done between two private parties or met the "exceptional" requirement under the Private versus Public Release factor. This paper will not address who, the courts or the legislature, is best positioned to determine levels of liability based upon privacy settings and individual releases of information. Suffice it to say, though, that individual responsibility must be factored into the new regulatory scheme.

The second issue regarding individuals is lack of knowledge. Currently, and potentially under the new system, individuals are not made aware of how their information is used and, consequently, how their own actions could undermine their own privacy. Thus, the regulatory scheme must provide some regulation of how industry practices can interact with individuals' practices regarding their own information.

61. Id.


For example, Facebook could display a concise and simple warning, which must be clicked through before posting, reminding individuals that information shared without appropriate privacy settings could lead to the re-identification of anonymized data and thus adversely affect their privacy. It is important that the new regulatory system institute transparency and education for the individual wherever possible.

Finally, the proposed system assumes, or at least fails to address, another unlikely political reality: an educated legislature. Although this final issue is outside the scope of the PII definitional problem, it remains an appropriate inquiry, because who creates and institutes these regulations will effectively shape what the new definition of PII will be. Accurately determining which industries should be regulated, and at which levels, would require a fairly in-depth, if not specialized, level of knowledge. This knowledge would need to include at least an understanding of current information-collecting and information-trading practices, a general understanding of the technology involved, and a sense of how potential advances could alter its capabilities. The regulators in Paul Ohm's approach would need an assumed level of understanding that is both adequate and appropriate to make informed decisions on the specifics of the regulations created.[62] No suggestions are made for what level of knowledge is required to make these regulatory decisions appropriately, or for how to ensure it exists.

To combat this shortcoming, perhaps an additional piece should be added to the two-tiered/five-factor approach: expand the duties of the FCC or create a new regulatory agency. In either case, the regulators would not be the legislators themselves, who may or may not have adequate knowledge to make these decisions, but an agency that can be staffed with those who do.

62. Id.


It serves to note again that this paper does not address whether the federal government should be involved in this regulatory process at all, or whether the courts should be the ones to foster the paradigm shift; it simply considers the issue from a largely legislative perspective. In the end, an agency staffed with people knowledgeable about the many facets of this issue would likely be a better regulator than Congress itself.

CONCLUSION

The concept of data anonymizing has served as a beacon of security and reassurance as we have entered a modern age filled with technological advances that can both enhance and harm our privacy expectations. As these advances march forward, however, they have eroded the very security they once stood to protect. The ability to analyze information, cross-reference it, and re-identify whom it belongs to gives reason for pause, as our regulatory framework was not designed to handle this new conundrum. As we approach rethinking this system and its fundamental foundation in a categorical definition of PII, it is important to maintain this foundation and build from it. As this paper has shown, there are approaches that, with certain enhancements, can begin to form a system that fills the definitional gap of PII, shifting from informational categories to how information is used, based on the interplay of industry and behavioral components: components that can make seemingly irrelevant anonymized information re-identifiable and harmful to an individual's privacy expectations.
