
"How 'Big' Do I Have To Be To Have 'Big Data' Issues?"

Don C. Moody, J.D., M.S.

Don C. Moody, 2012

1.0: Overview/Agenda


1.1: Short presentation (15 mins!); follow-up at lunch or offline

1.2: 'Set the table'
  Framing the issue (perspectives on BD)
  How 'big' is data getting? (fun exercise)
  A few BD tools/technologies
  Is BD a real biz issue or just hype?
  BD's current popularity wave
  Real legal concerns or just Chicken Little?

1.3: Handouts

2.0: Perspectives on Big Data


2.1 This is old hat!! (See the innumerable "beer and diapers" examples from the 1990s, even before Target & pregnancy)

2.2 Framing the threshold question, "Do I have to worry about Big Data legal concerns?", from two points of view:
  Small-to-Medium Enterprises (SMEs) (or lawyers representing them)
  Large Enterprises/Fortune 1000: "I know I'm a big company, I know data, and (if applicable) I know I'm in a heavily privacy-regulated area like health (HIPAA), financial (GLBA), education (FERPA), video history (VPPA) or kids (COPPA), but when do I have to worry about separate BD legal issues?"

BD legal issues typically center on privacy but can also include:
  False/deceptive/unfair business practices
  e-Discovery
  IP


2.3 "Big data" is a relative term (& somewhat misleading) due to the 3 V's:
  Scalability of inexpensive technologies (volume)
  Availability of many unstructured sources (variety)
  Rapid proliferation (velocity)

2.4 Better: "Lotta Data"
  Not limited to large enterprises or large individual file sizes (e.g. trillions of small text entries)
Better still: "Lotta Messy Data"
  Lack of structure is a huge concern (e.g. 80% of data worldwide is unstructured)
  Imposing order on chaos (e.g. pattern recognition) is a key goal of 'big' data analytics

2.5 Perspectives

Small-to-Medium Enterprises (SMEs) (or lawyers representing them):
  Cheap IT and clustering = big (volume & velocity)
  Prevalence of social media = big (variety)

Large Enterprises/Fortune 1000: when data is big enough or detailed enough for:
  Temptation to de-anonymize
  Likelihood of unintended pattern recognition (exceeds reasonable consumer expectations and/or what the privacy policy says) (FTC)

3.0: How "Big" Is Data Getting?

3.1 Measurement scales (with pragmatic examples and some fun facts/tidbits for each)

Can be 'big' (if unstructured or high volume):
  Kilobytes (docs, spreadsheets, GIF/JPEGs)
  Megabytes (MP3s, higher-res images)
  Gigabytes (PC hard drives, HD video)

Getting 'bigger' ('big' but still not expensive!):
  Terabytes (enterprise servers, large HDDs)
    U.S. Library of Congress had over 235 terabytes of data in 2011
    100 terabytes uploaded to Facebook per day
    3-terabyte Seagate HDD available on Amazon for $120 (as of 11/01/2012)
    AT&T claims to have the largest single, unique database (1.9 trillion rows) at 312 terabytes
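The "big but still not expensive" point can be checked with back-of-the-envelope arithmetic on the figures above. The sketch below is illustrative only: it assumes decimal terabytes (10^12 bytes), counts raw drive capacity with no redundancy, servers, or power, and the function names are hypothetical.

```python
import math

TB = 10**12  # assumption: decimal terabyte, as drive vendors count it


def drives_needed(dataset_tb, drive_tb):
    """Whole drives required to hold a dataset (raw capacity only)."""
    return math.ceil(dataset_tb / drive_tb)


def raw_storage_cost(dataset_tb, drive_tb, drive_price):
    """Purchase price of the drives alone, ignoring everything else."""
    return drives_needed(dataset_tb, drive_tb) * drive_price


def bytes_per_row(total_tb, rows):
    """Average size of one database row."""
    return total_tb * TB / rows


# Figures quoted on the slide: 235 TB collection, 3 TB drive at $120.
print("Drives for 235 TB:", drives_needed(235, 3))
print("Raw drive cost: $", raw_storage_cost(235, 3, 120))

# AT&T database: 312 TB spread over 1.9 trillion rows.
print("Average bytes per row:", round(bytes_per_row(312, 1.9e12)))
```

On these assumptions, the 2011 Library of Congress figure fits on fewer than a hundred consumer drives, and the AT&T rows average well under a kilobyte each, which echoes the earlier point that 'big' data is often many small entries.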

Definitely 'big':
  Petabytes (supercomputers, large virtual "drives")
    The total file size of the movie Avatar (incl. encoding for 3-D, IMAX, HD, etc.) constituted over 1 petabyte of data, roughly equivalent to a 32-year-long MP3 song.
    In 2008, eBay, Walmart and BofA were considered data storage leaders with 4 PB, 2.5 PB and 1.5 PB respectively
    Now, however, Facebook reportedly has 30+ petabytes of data in a massive Hadoop cluster
    IBM put together a 120-petabyte (120 million gigabyte) data cluster (virtual drive) using over 200,000 smaller HDDs, equaling 1 trillion files or 2 billion hour-long MP3s
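The unit conversions behind the IBM figures can be sanity-checked in a few lines. This is a rough sketch under two assumptions not stated on the slide: decimal units (1 PB = 10^15 bytes) and a constant 128 kbit/s MP3 bitrate.

```python
PB = 10**15  # assumption: decimal petabyte
GB = 10**9   # assumption: decimal gigabyte
MP3_BITRATE_BPS = 128_000  # assumption: 128 kbit/s constant bitrate


def audio_hours(capacity_bytes, bitrate_bps=MP3_BITRATE_BPS):
    """Hours of audio that fit in a given capacity at a given bitrate."""
    bytes_per_second = bitrate_bps / 8
    return capacity_bytes / bytes_per_second / 3600


ibm_cluster = 120 * PB
print("Gigabytes in 120 PB:", ibm_cluster // GB)
print("Billions of MP3 hours in 120 PB:", audio_hours(ibm_cluster) / 1e9)

# Same bitrate applied to a single petabyte, expressed in years of audio.
years_per_pb = audio_hours(PB) / (24 * 365.25)
print("Years of MP3 audio in 1 PB:", round(years_per_pb))
```

At this bitrate the 120 PB cluster does come out at roughly 2 billion hours of MP3 audio. A single petabyte comes out at roughly 2,000 years, so the 32-year Avatar comparison presumably rests on a much higher bitrate or a different unit.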

Definitely 'big':
  Exabytes (largest individual data sets globally)
  Zettabytes (total data currently on Earth projected to be 2.7 ZB)

Mostly theoretical (for now):
  Yottabytes

Bored geeks at play:
  Reverse alphabet proposal (X, Y, Z)
  Brontobytes

Excellent video demonstration from Univ. of Utah Prof. Chris Johnson @ TEDx Salt Lake City 2011: http://www.youtube.com/watch?v=5UxC9Le1eOY

4.0 A Few Big Data Tools and Technologies

Massively parallel processing (MPP) v. single supercomputer

Open source BD projects/technologies
  Apache Hadoop (application framework facilitating MPP for BD purposes; integrates MapReduce)
  Apache Cassandra (distributed DBMS for BD)
  Helpful intro video on Hadoop: http://www-01.ibm.com/software/data/infosphere/hadoop/

Proprietary platforms (storage management, analytics, DBMS)
  Greenplum (acquired by EMC in 2010)
  IBM's Big Data Platform
  SAP's HANA
  MapR's Drill (Hadoop re-done as proprietary platform with value-adds)
  Google Dremel
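For readers unfamiliar with MapReduce, the pattern can be illustrated without a cluster. The following is a toy, single-process sketch of the map, shuffle and reduce steps applied to word counting. It does not use the Hadoop API, and every function name is hypothetical; in a real deployment the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data is messy"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work splits naturally across many inexpensive machines, which is the contrast with a single supercomputer drawn above.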


5.0 Nailing Down The Real 'Big Data' Business/Legal Issues

5.1 Business: All just hype and marketing spin?
  Yes! CRM/ERP in 1990s = lots of hype/promise + lots of (expensive) flameouts
  No! Internet closing in on 20+ yrs of mass adoption; meaningful patterns in online histories now emerging
    So much info is available now that companies do not have to guess or make assumptions. Now (like Jeopardy) the only hurdle to an answer is knowing which questions to ask.
  Kinda/sorta: Data is still data, and many traditional data mining techniques still apply to 'big' data (e.g. market basket analysis), just thought of in new ways
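Market basket analysis, mentioned above and behind the "beer and diapers" stories, boils down to three ratios: support, confidence and lift. The sketch below computes them on a tiny made-up set of baskets. The data and function names are purely illustrative, and the code assumes the antecedent and consequent each appear in at least one basket.

```python
def support(transactions, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to how common the consequent is anyway (>1 = positive association)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

print("support(diapers & beer):", support(baskets, {"diapers", "beer"}))
print("confidence(diapers -> beer):", confidence(baskets, {"diapers"}, {"beer"}))
print("lift(diapers -> beer):", lift(baskets, {"diapers"}, {"beer"}))
```

The technique is the same at any scale; what changes with 'big' data is the number of baskets and the variety of sources feeding them.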

5.2 BD industry/subsector now getting a lot of press coverage (and money thrown that way):
  IDC estimates: market for BD-related services projected to grow from $3.2B in 2010 to $16.9B in 2015
  Time Magazine write-up on extensive use of data profiling analytics by Obama camp (released yesterday)
  Obama Administration projected to be spending over $200M on Big Data-related projects
  Harvard Business Review October 2012 exposé
  BD legal practice groups being formed: Law Technology News, October 2012

5.3 Legal:
  De-anonymization temptation (e.g. "Database of Ruin")
  (Unintended) pattern recognition (e.g. Target example)
  Risk v. reward analysis (likelihood of occurring v. severity of harm)
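To make the de-anonymization concern concrete, here is a minimal sketch of a linkage attack: records stripped of names are re-identified by matching a few "quasi-identifier" fields against a separate, named dataset. All records, field names and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: these three fields are the quasi-identifiers shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(record):
    """Tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def smallest_group(records):
    """Size of the smallest group sharing the same quasi-identifiers (the k in k-anonymity)."""
    return min(Counter(qi_key(r) for r in records).values())


def reidentify(anonymized, public):
    """Map the index of each anonymized record to a name when exactly one public record matches."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    matches = {}
    for index, record in enumerate(anonymized):
        candidates = names_by_key.get(qi_key(record), [])
        if len(candidates) == 1:
            matches[index] = candidates[0]
    return matches


# Invented "anonymized" purchase log and a separate named list.
purchases = [
    {"zip": "30301", "birth_year": 1975, "sex": "F", "purchase": "prenatal vitamins"},
    {"zip": "30301", "birth_year": 1980, "sex": "M", "purchase": "beer"},
    {"zip": "30302", "birth_year": 1980, "sex": "M", "purchase": "diapers"},
]
voter_list = [
    {"name": "A. Example", "zip": "30301", "birth_year": 1975, "sex": "F"},
    {"name": "B. Example", "zip": "30302", "birth_year": 1980, "sex": "M"},
    {"name": "C. Example", "zip": "30302", "birth_year": 1980, "sex": "M"},
]

print("Smallest group size:", smallest_group(purchases))
print("Re-identified:", reidentify(purchases, voter_list))
```

In this toy data, one purchase record links to exactly one named person even though no name was ever stored with it. The more detailed the data, the smaller the groups become and the more such unique matches appear.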

5.3 Legal: Data source type:
  Search queries
  IP addresses (not anonymous!)
  Log/use data
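One common way to make logged IP addresses less identifying is to truncate them before storage. The sketch below uses Python's standard `ipaddress` module. The prefix lengths (/24 for IPv4, /48 for IPv6) and the function name are illustrative assumptions, not a legal standard, and truncation reduces rather than eliminates the re-identification risk.

```python
import ipaddress


def truncate_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero out the host bits of an address, keeping only the network prefix."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


# Addresses from the ranges reserved for documentation examples.
print(truncate_ip("203.0.113.77"))
print(truncate_ip("2001:db8:abcd:12::1"))
```

Even a truncated address can narrow a user down when combined with search queries or other log data, which is why the slide flags IP addresses as not anonymous.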

5.3 Legal: Data subject matter:
  Financial (GLBA)
  Health care (HIPAA)
  Video/library histories (VPPA)
  Education (FERPA)

CONCLUSION
