The Proceedings of the 14th Annual International Conference on Digital Government Research

Big Data and e-Government: Issues, Policies, and Recommendations

John Carlo Bertot
University of Maryland College Park
College of Information Studies
College Park, MD 20742
301.405.3267
jbertot@umd.edu

Heeyoon Choi
Korea Institute for Science & Technology Information (KISTI)
Seoul, Korea
82.10.4780.2933
hychoi@kisti.re.kr

ABSTRACT
The promises and potential of Big Data in transforming digital government services, governments, and the interaction between governments, citizens, and the business sector are substantial. From “smart” government to transformational government, Big Data can foster collaboration; create real-time solutions to challenges in agriculture, health, transportation, and more; and usher in a new era of policy- and decision-making. There is, however, a range of policy challenges to address regarding Big Data, including access and dissemination; digital asset management, archiving and preservation; privacy; and security. This paper selectively reviews and analyzes the U.S. policy context regarding Big Data and offers recommendations aimed at facilitating Big Data initiatives.

Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Type of systems – e-government applications.

General Terms
Legal aspects, management, measurement, performance.

Keywords
Open government, Big Data.

Acknowledgment
This report was made possible through the generous support of the Korea Institute of Science and Technology Information (KISTI).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
dg.o 2013, June 17 - 20 2013, Quebec City, QC, Canada
Copyright 2013 ACM 978-1-4503-2057-3/13/06…$15.00.

1. INTRODUCTION
Alvin Toffler popularized the phrase “information overload” in his 1970 book Future Shock. In this book, Toffler introduced the idea that too much information can lead to challenges in making decisions [20]. In today’s context, however, we are faced with a deluge of data that, when combined with new technologies and analysis techniques, has the potential to inform decision and policy making in unprecedented ways. Increasingly, the research and scientific communities, governments, and the private sector are generating large-scale data sets on a range of topics, including climate change, traffic patterns, health and disease data, purchasing behavior, and social behavior through social media interactions. As noted in its 2010 report, the President’s Council of Advisors on Science and Technology indicated that network and information technologies (NIT) are [6, p. vii-viii]: “Key drivers of economic competitiveness; crucial to achieving our major national and global priorities in energy and transportation, education and life-long learning, healthcare, and national and homeland security; able to accelerate the pace of discovery in nearly all other fields; and essential to achieving the goals of open government.”

A foundational component to the successful attainment of the NIT aspirations is Big Data. The integration of large-scale data sets with emerging NIT and visualization techniques can extract knowledge and insights to resolve some of the world’s most difficult challenges. In short, our ability to harness Big Data has the potential to reduce information overload, lead to new scientific and research insights, create economic development, and generate new policies that benefit the publics served by governments.

1.1 The Promise of Big Data
There are increasing examples of Big Data initiatives that offer views of the promise of Big Data-driven application and research. These include:
• A partnership between Inrix, Inc. and the New Jersey Department of Transportation (NJDOT). By harvesting signals and data from GPS units in vehicles and cell phones, Inrix collects data about car speed along key roads. In doing so, Inrix can alert the NJDOT in real time of any incidents along any of the major roads, while simultaneously alerting drivers of these incidents by sending traffic alert signals to their GPS units or smartphones [16].
• The Climate Corporation is a weather insurance company that provides policies to make up the difference between federal crop insurance and farmer losses due to weather. Its vast sensor network analyzes and predicts temperature, precipitation, soil moisture, and yields for 20 million U.S. farm fields. By knowing
the number of heat stress days, as well as soil moisture content, the models developed by the Climate Corporation can help determine the amount of weather insurance farmers need, and the payouts that the insurer would need to make [21].
• The New York State Energy Research and Development Authority (NYSERDA) used a range of Big Data techniques to assess implications of climate change on the state of New York, and to provide strategies to deal with climate change in areas such as agriculture, public health, energy, and transportation. This has been expanded by the U.S. Centers for Disease Control, which is working with 10 states and cities through its Climate-Ready States & Cities Initiative (http://www.cdc.gov/climateandhealth/climate_ready.htm) to study and develop responses to climate change, for which Big Data are an underlying key component.
These serve as some examples of Big Data use across a range of sectors, and also serve to highlight the potential benefits of Big Data initiatives.

This paper reviews and discusses key issues and policies within the United States regarding Big Data. In particular, the paper focuses on government interaction and involvement with Big Data, and concludes with recommendations for Big Data initiatives based on the lessons learned from the U.S. experience.

2. METHODOLOGY
The authors engaged in exploratory research that included: 1) Policy analysis of existing U.S. information policies, laws, and initiatives that govern the availability and use of Big Data by government agencies; 2) A literature review of Big Data initiatives in the U.S.; and 3) Interviews with civil society groups and data activists in the U.S. In all, interviews were conducted with five different open government representatives in communities that focus on environmental, health, budget, and overall access to government information and transparency. The interviews, which occurred between December 2012 and the end of January 2013, were informed by the authors’ initial policy analysis and literature review findings.

The research questions that guided the research were: 1) How are U.S. federal government agencies defining Big Data? 2) What approaches to Big Data are government agencies taking, particularly in terms of data release, availability, use, and collaborations? 3) To what extent does the U.S. information policy environment support, contend with, promote, or hinder Big Data initiatives? 4) What policy initiatives, changes, and/or guidance are necessary to overcome the challenges and attain the promise of Big Data? This is a growing space for research, practice, and policy development and, as an exploratory study, this effort serves as a limited and initial review that will require additional study.

3. A BRIEF HISTORY AND DEFINING BIG DATA
In general, we can define Big Data as vast datasets that cannot be analyzed using conventional software and analytic tools [17, 21]. Moreover, Big Data have the following characteristics: 1) They require significant processing power (such as via a supercomputer); 2) They span a range of data types such as text, numeric, image, and video; and 3) They can cross multiple data platforms, such as social media networks, web log files, sensors, location data from smart phones, digitized documents, and photograph and video archives. Big Data are increasingly measured in terabytes, are created through collaborative efforts, and require increased computing power and new analytic tools.

The notion of Big Data in the United States, particularly government data, is not new. Whether in print or electronic form, the U.S. government has collected and released a wide range of data, publications, and content in the name of transparency and openness. Indeed, a core principle of the founding of the U.S. is access to and dissemination of information about government [7]. Over the years, government information and data have evolved, and so too have the methodologies and approaches to collecting and disseminating government data. Some milestones in the U.S. include:
• The use of punch cards and an early version of computing technology to tabulate the 1890 census data (https://www.census.gov/history/www/through_the_decades/overview/1890.html).
• The launching of Social Security as part of the Social Security Act in 1935, which required a large-scale data collection effort to collect data from 26 million workers and 3 million employers. IBM received the contract to undertake this initiative (http://www.ssa.gov/history/briefhistory3.html).
• Cox and Ellsworth [4], both NASA researchers, coined the term “big data” in reference to large-scale datasets regarding simulations of airflow around aircraft – datasets that were large and difficult to analyze and process due to computing technology limitations.
• In 2009, the Obama Administration launched data.gov as part of its open government initiative to make available “high value” datasets to the public (http://www.data.gov/about).
Thus, one can trace the evolution of Big Data for over 100 years in the U.S. What is new, however, is the overall approach and philosophy governing Big Data.

4. DATA PHILOSOPHY
A government’s philosophy towards releasing, disseminating, and fostering the use of government data is often embedded in a policy framework and initiative. In the U.S., Big Data are embedded within the Obama Administration’s Open Government Directive (OGD). On his first day in office, President Obama committed his administration to an “unprecedented level of openness in Government” by adopting three guiding principles of openness [12]: 1) Transparency: Agencies should treat information as a national asset and empower the public with the information needed to hold the government accountable; 2) Participation: Agencies should inform and improve government decision-making by tapping into the citizenry’s collective expertise through proactive engagement; and 3) Collaboration: Agencies should
cooperate among themselves and with nonprofits, businesses, academia, and the public to better accomplish the work of the government.

The Administration operationalized its Open Government Initiative with the release of the ambitious Open Government Directive (OGD) [14]. The OGD sought to deliver greater openness, transparency, and accountability but, more significantly, to provide a mechanism through which to promote institutional transformation, of which Big Data are a key component.

The OGD was incorporated into the international-scale Open Government Partnership (OGP), which seeks to encourage accountability, transparency, and transformation of governments through, among other factors, Big Data initiatives (http://www.opengovpartnership.org/). This open data philosophy is foundational for Big Data to succeed, as it ensures publicly accessible datasets through managed processes.

5. DATA POLICIES
A key issue, however, regarding open government in general and Big Data in particular, is the set of information and data policies that govern the management, use, reuse, and accessibility of government information and data. The U.S. has a complex and evolving set of information policies (laws, regulations, and memoranda) that govern the information lifecycle, from the creation of information through to its disposition and archiving. In addition, there is a range of information policies that create a broader access and dissemination framework. Table 1 identifies some of the key policy areas and instruments, while the ensuing sections discuss some of the key issues within the policy structures that require consideration.

Table 1. Selected Information Policies by Objective.

Policy Objective related to Big Data: Privacy, Security, Accuracy, and Archiving
Selected Relevant Policy Instruments:
• Children’s Online Privacy Protection Act (COPPA)
• Federal Information Security Management Act (FISMA)
• Information Quality Act
• OMB Memo M-03-22 (Guidance for Implementing the Provisions of the E-government Act of 2002)
• OMB Memo M-04-04 (E-Authentication Guidance for Federal Agencies)
• OMB Memo M-05-04 (Policies for Federal Agency Websites)
• Federal Depository Library Program (Title 44 USC)

Policy Objective related to Big Data: Governing and Governance
Selected Relevant Policy Instruments:
• E-government Act of 2002
• OMB Circular A-130 (Management of Federal Information Resources)
• Paperwork Reduction Act
• Various Copyright (Title 17 USC) and Patent & Trademark (Title 35 USC) legislation

Policy Objective related to Big Data: Access and Dissemination
Selected Relevant Policy Instruments:
• Americans with Disabilities Act
• Executive Order 13166 – Improving Access to Services for Persons with Limited English Proficiency
• Individuals with Disabilities Education Act
• Section 504 of the Rehabilitation Act
• Section 508 of the Rehabilitation Act
• Telecommunications Act of 1996
• Depository Library Act of 1962
• Government Printing Office Electronic Information Access Enhancement Act of 1993

5.1 Privacy, Security, Accuracy, and Archiving
Big Data raise a large number of information management issues, primarily in the areas of privacy, security, accuracy, and archiving, spanning major issues such as personally identifiable information, security of government data and information, and the accuracy of publicly available data. By fostering collaborations and economic development through private-public partnerships, government agencies appear to be tacitly endorsing the privacy, security, and other policies employed by those private sector entities.

OMB Memo M-04-04 (E-Authentication Guidance for Federal Agencies) mandates that agencies decide the appropriate levels of openness, moderation, authentication and attribution, balancing access, identity authentication, attribution, and concern for authoritative sourcing in the level of moderation that is needed on an agency website. To accomplish this, agencies must perform a standardized risk assessment on all applications they put online. Similarly, the Federal Information Security Management Act (FISMA) requires a Certification and Accreditation (C&A) process for all federal information technology systems that utilize social media technologies. Further, this C&A activity must be performed by an independent third party auditing team and must conform to the Risk Management Framework of the National Institute of Standards and Technology.

The Information Quality Act (passed into law in 2001) requires agencies to maximize the quality, objectivity, utility, and integrity of information and services provided to the public. The Act came into effect prior to the development and use of prevailing social media technologies. Nonetheless, agencies must ensure reasonable
suitable information and service quality strategies consistent with the level of importance of the information that include clearly identifying the limitations inherent in the information dissemination product (e.g., possibility of errors, degree of reliability, and validity) and taking reasonable steps to remove the limitations inherent in the information.

Under OMB Memo M-03-22 (Guidance for Implementing the Provisions of the E-government Act of 2002), federal public websites are required to conduct privacy impact assessments, post privacy policies in a standardized machine-readable format on each website, and post a “Privacy Act Statement” that describes the Agency’s legal authority for collecting personal data and how the data will be used. Additionally, federal websites are prohibited from using persistent cookies and other web tracking methods unless an Agency head or designated Agency sub-head has approved their use for a compelling need. The Privacy Act did not envision a time in which people would be voluntarily communicating with government agencies in digital “open spaces” like data.gov communities.

When government websites become two-way communities, they open the possibility of viruses and other attack agents being inserted into the government environment, as well as the possibility of unintended release of information. OMB Memo M-05-04 (Policies for Federal Agency Websites) requires that agencies provide adequate security controls to ensure information is resistant to tampering, to preserve accuracy, to maintain confidentiality as necessary, and to ensure that the information or service is available as intended by the Agency and as expected by users.

One of the major thrusts of Big Data is to allow users to take data from one website and combine it with data from another, commonly referred to as “Mashups.” According to OMB Memo M-05-04, agency public websites are required, to the extent practicable and necessary to achieve intended purposes, to provide all data in an open, industry standard format that permits users to aggregate, disaggregate, or otherwise manipulate and analyze the data to meet their needs.

The presence of a related policy, however, does not guarantee the application of a solution to an information management issue raised by Big Data. For over 150 years, the Government Printing Office has served as the lead and coordinating agency in conjunction with the Federal Depository Library Program (FDLP) – a network of nearly 150 full, partial, and regional Depositories. This collaborative network has served as the primary means for providing community access to government information. However, the ability of Big Data initiatives to provide direct constituent and government interactions raises major challenges for the comprehensive collection and dissemination of government information. While responsibilities for archiving Big Data remain unaddressed, much of this information disappears.

This situation also means that there decreasingly exists a permanent and final “document,” upon which nearly all records management and archiving efforts are built [1]. When agencies use third party applications and software that reside on non-governmental information systems, or that are in a continual state of modification and adaptation, data ownership, records schedules, and archiving become significant issues.

5.2 Governing and Governance
U.S. policy instruments also establish parameters regarding governing and governance. These instruments provide broad principles and guidance for agencies, but fail to address the use of Big Data, as nearly all pre-date the development and use of Big Data technologies. Much of this guidance, encapsulated in information management documents issued by the U.S. Office of Management and Budget (OMB), establishes the following principles:
• Agencies are required to disseminate information to the public in a timely, equitable, efficient, and appropriate manner.
• Agencies are required to establish and maintain Information Dissemination Product Inventories.
• Agencies must consider disparities of access and how those without Internet access will have access to important disseminations.
• Agencies should develop alternative strategies to distribute information.
• When using electronic media, the regulations that govern proper management and archiving of records still apply.
• Agencies need to evaluate and determine the most appropriate methods to capture and retain records on both government servers and technologies hosted on non-Federal hosts.
When considering these principles in light of Big Data technologies, a range of issues surface, such as the need for alternative dissemination strategies for access to and dissemination of government information and services, and the need to consider records management, archiving, and preservation.

The E-government Act of 2002 also established several related principles of e-government that inform the creation and use of Big Data by government agencies, such as requirements to develop priorities and schedules for making government information available and accessible to the public; to post inventories on agency websites; to comply with requirements of Section 508 of the Rehabilitation Act in all online activities; and to implement and maintain an Information Dissemination Management System.

5.3 Access and Dissemination
To increase access to government information and services and to successfully facilitate engagement and collaboration, members of the public must be able to access and use Big Data technologies. Several policy instruments are directly related to access and dissemination, including:
• Executive Order 13166 (Improving Access to Services for Persons with Limited English Proficiency) requires that agencies provide appropriate access to persons with limited English proficiency, encompassing all
“federally conducted programs and activities.” This policy objective is meant to address gaps in e-government usage among people who predominantly speak a language other than English.
• The Individuals with Disabilities Education Act requires equal access to all electronic materials used in public education. The Americans with Disabilities Act provides broad prohibitions on the exclusion of persons with disabilities from government services and benefits, including communication with the government. Section 504 of the Rehabilitation Act creates broad standards of equal access to government activities and information for individuals with disabilities, and establishes general rights to accessible information and communication technologies.
• The Telecommunications Act of 1996 promotes the development and implementation of accessible information and communication technologies being used online.
• Section 508 requires that electronic and information technologies purchased, maintained, or used by the federal government meet certain accessibility standards designed to make online information and services fully available to people with disabilities.
• The Government Printing Office Electronic Information Access Enhancement Act of 1993 updated the statutes governing the depository library program to pave the way for access to and dissemination of digital government information, initially through GPOAccess, and now through FDSYS (http://www.gpo.gov/fdsys/).
Though not exhaustive, the above sections demonstrate that there is a substantial information policy framework that has yet to address Big Data throughout the information lifecycle.

6. DATA PLATFORM, QUANTITY, AND FORMATTING
Big Data require three key infrastructure ingredients: 1) a platform for organizing, storing, and making data accessible; 2) computing technology and power that can process large-scale datasets; and 3) data formats that are structured and usable. When it comes to public facing open data systems, governments have relied on data portals, such as data.gov (US) and data.gov.uk (UK). Big Data often involve not megabytes or gigabytes of data, but rather terabytes and even petabytes of data. As a result, governments have also supported the development of supercomputing resources housed in key research institutions to facilitate access to the data processing power needed to run computationally complex modeling systems. These large-scale supercomputer centers are shared resources that will need to increase their capacity and availability in order to satisfy the needs of Big Data.

A key aspect of the U.S. Open Government National Action Plan [13] is to essentially open source the data.gov platform and make it available for replication in countries around the world. As a public-facing platform, this can serve as a tool that fosters collaboration, stores datasets, engages communities, and offers opportunities for participation. In addition, data can be stored and made available in multiple formats (e.g., CSV, XML, Excel) via these platforms. Each format has implications, and can limit and/or promote the use of the data. But if the goal is broad public access and use, then commonly used data formats are essential.

Simultaneously, however, there is a need for data repositories for large-scale scientific and research datasets. These exist at research institutions, universities, and government agencies, but there is a need to create, adopt, and adhere to formal data management standards and practices across entities so as to ensure data compatibility, naming conventions, and organizational schemes. In addition, there is a need for well-defined data documentation and codebooks so as to ensure informed use of the datasets by researchers.

7. DATA QUALITY, AUTHORITY, & GOVERNANCE
The quality, reliability, and authority of Big Data are key issues for governments, the research and scientific communities, and the non-governmental and private sectors. Data that are of poor quality, that are not certified and/or verified, or that are collected using faulty methods can lead to incorrect findings, which can significantly undermine a range of decision and policy making processes.

Key elements of the data policies that govern data.gov include:
• Placing the burden on the government agencies collecting and releasing the data to ensure data accuracy, timeliness, and overall quality (as per the Information Quality Act discussed in the Data Policies section of this report).
• Requiring agencies to maintain version control so as to ensure clear labeling of the datasets.
• Requiring agencies to ensure that no data with national security implications are released through data.gov.
• Requiring agencies to ensure that confidentiality and privacy guidelines are adhered to regarding released data.
A key missing overall requirement is the specification, or even the recommendation, of metadata and data documentation standards. However, differing data communities (discussed below in the Data Communities and Collaborations section of the report) do recommend adherence to metadata and documentation standards. For example, the Ocean community recommends adherence to (http://www.data.gov/communities/node/237/community-of-practice/data-quality): ISO 19115, the international standard for geospatial metadata, which the Federal Geographic Data Committee follows (http://www.fgdc.gov/metadata/geospatial-metadata-standards); the Dublin Core metadata element set for non-geospatial data resources (http://dublincore.org/documents/usageguide/elements.shtml); and data documentation standards so as to provide guidance on the contents and use of the datasets.
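
As an illustration of the Dublin Core recommendation discussed in this section, the following is a minimal Python sketch of a metadata record for a released dataset, serialized to XML. It is not drawn from data.gov or any agency system: the dataset values, the `build_dc_record` and `missing_elements` helpers, and the choice of which elements to treat as required are all assumptions made for illustration. Only the fifteen element names and the namespace URI come from the Dublin Core Metadata Element Set itself.

```python
import xml.etree.ElementTree as ET

# Namespace and the fifteen element names of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Serialize elements with the conventional "dc:" prefix.
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an XML record from a dict of Dublin Core element -> value.

    Hypothetical helper: rejects keys that are not Dublin Core elements.
    """
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    root = ET.Element("metadata")
    for name in DC_ELEMENTS:  # fixed order keeps records comparable
        if name in fields:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = str(fields[name])
    return root


def missing_elements(fields,
                     required=("title", "publisher", "date", "format", "identifier")):
    """Return the required elements that are absent or empty.

    The 'required' tuple is an assumption for this sketch; Dublin Core
    itself makes every element optional.
    """
    return [name for name in required if not fields.get(name)]


if __name__ == "__main__":
    # Hypothetical dataset description; none of these values are real.
    dataset = {
        "title": "Monthly employment estimates",
        "publisher": "Example Statistics Agency",
        "date": "2013-01-31",                      # release date doubles as a version label
        "format": "text/csv",
        "identifier": "example-agency/employment/2013-01",
        "description": "Illustrative record only.",
    }
    print("missing:", missing_elements(dataset))
    print(ET.tostring(build_dc_record(dataset), encoding="unicode"))
```

Keeping the release date and a versioned identifier in the record is one simple way to support the version labeling that the data.gov policies call for; a community could equally adopt ISO 19115 or a richer schema.
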

These are key components to maintaining and making available high quality Big Data datasets for consumption. Another factor regarding data quality is ensuring the veracity of the new datasets formed (usually referred to as “mashups”) when datasets are combined, altered or otherwise manipulated for analysis purposes. Often disparate datasets that were not intended to integrate are combined to create new datasets that have the potential to inform researchers, governments, policymakers, and the public. Often, however, there is not a formal process or authorized entity that validates or verifies the combined data. As stated on the data.gov site, “Once the data have been downloaded from the agency's site, the government cannot vouch for their quality and timeliness. Furthermore, the US Government cannot vouch for any analyses conducted with data retrieved from Data.gov” (http://www.data.gov/data-policy). While this disclaimer serves to limit the liability of the data.gov initiative, the issue of secondary data use remains significant.

8. DIGITAL CURATION, PRESERVATION, & AVAILABILITY
Digital curation “involves maintaining, preserving and adding value to digital research data throughout its lifecycle,” and “curated data in trusted digital repositories may be shared among the wider…research community” (http://www.dcc.ac.uk/digital-curation/what-digital-curation). Importantly, digital curation focuses on managing digital resources throughout a lifecycle, such as conceptual issues regarding digital assets, the creation of digital assets, access and use issues, and appraisal and selection practices.

Embedded within the evolving area of the curation of digital assets is Big Data. There is a need to engage in active data management strategies for Big Data along the entire lifecycle, particularly as new digital data assets continue to grow. Issues to consider regarding Big Data curation include:
• Moving data forward. Over time, older data may be stored in formats that are incompatible with future technologies and analysis tools. There will be a need to ensure the migration of datasets to newer formats to guarantee access and use in the future.
• Technology-embedded data. Big Data, in order to be truly useful, can often be embedded in specialized technologies, models, or proprietary systems (e.g., forecasting models, specialty software). The raw data themselves may not lend themselves to similar findings without the embedded technologies and analysis platforms. The issue becomes one of deciding whether to preserve the data and/or the technologies that used the data to generate research findings.
• Version control. Government data (e.g., employment, productivity, GDP) are often updated. It is essential to engage in version control to ensure correct labeling of datasets for cataloguing and use.
• Continually evolving data. One of the major goals of Big Data initiatives is to engage communities to combine multiple large-scale datasets to create new knowledge. Each of these permutations of the data is a new dataset that requires documentation, management, and preservation.

Though data are increasingly born digital and can be made available instantaneously, governments have a large repository of older datasets that require transformation into more usable and accessible formats for today’s analysis and use. The migration of these data can be costly and requires effort, concerted initiative, and management. And yet making such data publicly accessible can yield tremendous insights into some of our most pressing social and scientific needs.

Finally, Big Data are not necessarily born as Big Data, but rather emerge through the accumulation, modification, incorporation, and manipulation of many smaller datasets. It is therefore of significant importance that there be a “small data” curation and management plan in effect as well.

9. DATA COMMUNITIES AND COLLABORATIONS
There is a distinction between making Big Data available and accessible, and fostering its use. Moreover, there is a distinction between selective community data use (that is, only scientists within a particular domain) and broader interdisciplinary use that cuts across domains and more typical research communities. When coupled with emerging technologies such as social media, it is possible to create broad-based communities that foster collaboration and engagement, co-production, and crowdsourcing solutions and innovations [1, 3, 5, 10, 15, 17].

Though not mutually exclusive, these opportunities offer great promise and pose new challenges in redefining government-community connections and interactions, particularly through Big Data.

9.1 Building Big Data Communities
Though initially simply a repository for datasets, the data.gov initiative saw low uptake and use of released data. To promote and foster data use, data.gov sought to create and grow data communities around key topical areas where there was an interest and need. Examples include:
• Education (http://www.data.gov/communities/education), a community built around national education datasets from various agencies. Using visualizations, classroom instructional modules, and datasets, the community is designed to assess the state of education on all levels.
• Health (http://www.data.gov/communities/health), a community that is a one-stop resource for the growing ecosystem of innovators who are turning data into new applications, services, and insights that can help improve health.
There are similar communities in the areas of business, cities, energy, law, manufacturing, oceans, and safety, to name some (see http://www.data.gov/communities/ for more details).


Another example of a data and information community is science.gov. As an interagency initiative of 17 U.S. government science organizations within 13 Federal Agencies, science.gov provides searchable access to some 55 databases and millions of federal agency science pages, research findings, and downloadable datasets.

One factor in building data communities is the growing need to integrate and manage data from multiple sources and sectors. Already under development are a range of sensor-based technologies such as smart vehicles, buildings, and homes – plus the increasingly ubiquitous smartphones. These technologies enable a constant flow of geo-located data regarding traffic, energy consumption, water use, and more [19]. But significantly, these flows of data need to intersect across governments, private sector corporations, utility companies, devices (e.g., cars, smartphones, home sensors, building sensors) and individuals for them to be truly useful and inform the development of communities and nations.

9.2 Promoting Research Collaborations
At the same time, however, there is a need to promote deeper development within the research and scientific communities to harness the power of Big Data. To that end, the Obama Administration announced its “Big Data Research and Development Initiative” in March 2012 (http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf). Collectively, six U.S. agencies are seeding investments in Big Data initiatives, including:
• National Science Foundation and the National Institutes of Health – Core Techniques and Technologies for Advancing Big Data Science & Engineering. “Big Data” is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets.
• National Science Foundation. In addition to funding the Big Data solicitation, and in keeping with its focus on basic research, NSF is implementing a comprehensive, long-term strategy that includes new methods to derive knowledge from data; infrastructure to manage, curate, and serve data to communities; and new approaches to education and workforce development.
• Department of Defense and the Defense Advanced Research Projects Agency. Big Data investments in research include how to harness and utilize massive data in new ways and bring together sensing, perception and decision support with autonomous systems; improve situational awareness to help warfighters and analysts and provide increased support to operations; and develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic).
• National Institutes of Health. The National Institutes of Health announced that the data produced by the international 1000 Genomes Project (200 terabytes) was made freely available on the Amazon Web Services (AWS) cloud. Researchers will pay only for the computing services that they use.
• Department of Energy – Scientific Discovery Through Advanced Computing. The Department of Energy established the Scalable Data Management, Analysis and Visualization (SDAV) Institute. The SDAV Institute brings together six national laboratories and seven universities to develop new tools to help scientists manage and visualize data using the Department’s supercomputers.
• U.S. Geological Survey – Big Data for Earth System Science. USGS provided grants through its John Wesley Powell Center for Analysis and Synthesis. The Center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects seek to improve our understanding of climate change, earthquake recurrence rates, and ecological indicators.
These efforts serve to foster data communities as well as make an impact on pressing social and scientific challenges. And, as noted by Lane [8] and Braveman [2], funding Big Data initiatives offers the ability to seek relationship linkages in scientific grand challenges by bringing together often-disparate disciplines that might not otherwise collaborate.

In addition to the “Big Data Initiative,” an international coalition created the “Digging into Data” challenge (http://www.diggingintodata.org/) in 2009. Sponsored by the U.S. Institute of Museum and Library Services, the Arts & Humanities Research Council, the National Endowment for the Humanities, the National Science Foundation, the Netherlands Organisation for Scientific Research, the Social Sciences and Humanities Research Council of Canada, and the Joint Information Systems Committee (JISC), the Digging into Data challenge sought to address how Big Data changes the research landscape for the humanities and social sciences.

10. RECOMMENDATIONS
Based on the above discussion regarding the Big Data experience in the U.S., there are a number of suggested recommendations:
• Review and recalibrate information and data policies as necessary. The U.S. has a complex information and data policy structure, and it is increasingly out of date for today’s data and information flows and technologies. There are a number of critical areas in which there is a need to review and update policies that, for example, govern:
  o Privacy. Big Data can contain a range of personally identifiable data at the individual, household, vehicle, or other levels. Privacy laws and policies can conflict with the opportunities in Big Data, but Big Data can simultaneously violate the privacy rights of individuals or communities.
  o Data reuse. Data are often collected by governments or other entities (e.g., utility


    companies, telecommunication carriers) for a particular purpose such as receipt of social or other services. Individual government agencies and/or corporations often have acceptable use and privacy policies that govern data collection and use. However, Big Data increasingly combine datasets from across sectors, governments, and households to create new insights and inform decision- and policy-making. Data use and reuse policies should be reviewed and updated to give individuals clear guidelines and enable them to make informed decisions regarding their data.
  o Data accuracy. As new datasets are created through combining disparate data from different agencies, researchers, scientists, private sector companies (e.g., telecommunications companies, vehicle manufacturers, utility companies) and citizen groups, there is a need to develop and ensure data quality standards. Data collected for a single purpose may not be fully compatible with other datasets, and this can lead to errors and a range of false findings. There is both a need to ensure data quality as well as a verification system that validates reported findings. The disclaimer on data.gov (http://www.data.gov/privacy-policy) places this burden on 1) the agencies releasing the data, and 2) those downloading and using the data. This is an inadequate response to data use that can have significant impact on social, policy, and science programs.
  o Data access. Policies that govern access to data such as Freedom of Information, archives, and preservation need to account for digital data. But also, given that Big Data often combine datasets from different sectors – research, scientific, public, and corporate – there is the question of what policies govern access to and preservation of newly formed datasets that are created with cross-sector data that are neither fully public nor private in terms of their content. In addition, Big Data raises the issue of public access to government datasets. Publicly accessible portals such as data.gov provide baseline access, and there is a need for such platforms to data.
  o Archiving and preservation. There are a number of issues regarding archives and preservation policies regarding Big Data, including the large-scale nature of digital datasets, the embedding of analysis and findings within certain technologies and techniques, and the raw data files. One side of the coin is records management and archival policies, requirements, and practices of government agencies, collaborative partnerships, and archival agencies. The other side of the coin is the long-term preservation and moving datasets forward over time as data and information technologies change. Also, there is a need to consider the archiving and long-term preservation of research datasets created at non-governmental institutions such as universities and research centers funded by government research agencies. These strategies should include overall dataset management so as to ensure the availability of smaller datasets that can become part of Big Data efforts.
• Robust data platform and architecture. The data platform and information architecture of Big Data are significant issues to resolve. There is a need for a robust technology infrastructure for organizing, curating, storing, and making datasets accessible to the research and scientific communities, the private and other sectors, and the public. These platforms need to provide both physical (technology) and intellectual (organizational) access to Big Data, and need to integrate seamlessly with a range of technologies, analysis techniques, and information architectures. In the case of publicly accessible platforms such as data.gov, such public-facing services serve as a generic and publicly available platform. There may also be a need for specialized platforms for very large-scale datasets in particular sectors (e.g., health, environment).
• Processing and computing power. Big Data requires significant computing power to process, analyze, manipulate, and represent the data through visualizations. More to the point, Big Data requires the capacity that supercomputers offer. In the U.S., supercomputer centers are distributed at several universities or university systems across the country (e.g., the Ohio Supercomputer Center, Mississippi State University, the San Diego Supercomputer Center). There may be a need to increase supercomputing capacity as increased numbers of Big Data datasets, applications, visualizations, and other products require more intensive computer processing and time allocations.
• Data standards. Big Data requires interoperability at the technology level, but also at the data level through the adherence to metadata standards. Different domains may have varying metadata standards such as ISO 19115 (the international standard for geospatial metadata); the Dublin Core metadata element set for non-geospatial data resources (http://dublincore.org/documents/usageguide/elements.shtml); the emerging Data Documentation Initiative (DDI) for social and behavioral science (http://www.ddialliance.org/); and others such as Z39.87 (MIX – Metadata for Images in XML; http://www.loc.gov/standards/mix/) for digital images and the Darwin Core (http://rs.tdwg.org/dwc/index.htm) for biodiversity data. Those creating, generating, and disseminating Big Data datasets need to consider appropriate data standards formats to ensure collaborations and data reuse.

  In addition, there need to be documentation standards for public release files that describe the organization of the dataset, data elements, data type (e.g., numeric,


  text), and other descriptive information regarding dataset contents. Also, limitations of the data should be acknowledged and made apparent.
• Data sharing across sectors. As Big Data increasingly involves the passing of data in real time between systems, governments, and sectors, there is a need for a robust data sharing and interoperability framework. The transportation domain provides an illustrative example. Increasingly vehicles are being equipped with sensors that can log and report a range of real-time data such as speed and roads traveled. Such data (under the jurisdiction of the automobile manufacturers) can inform issues such as congestion and traffic flow, usually a function under the purview of local or national government transportation entities. A different example regards public transportation. Sensors on smartphones (under the control of telecommunications carriers) can inform public transit authorities about the use of public transit systems (e.g., buses, trains) to inform decisions about capacity needs. The ability for these data collection and reporting systems to seamlessly integrate and create collaborative analysis techniques is essential to fully realize the benefits of these types of Big Data applications. As noted above, it may be necessary to revise information and data policies to reflect this integrative data context.
• Data curation, archiving, and preservation. The development of a framework for the management and curation of digital assets, which encompass Big Data, is essential. Digital assets go through a lifecycle from conceptualization (which includes establishing metadata standards) through to transformation (forward migration to new formats for future use) – and multiple steps and processes along the way. This includes archival activities and preservation efforts designed specifically for digital data assets across a range of data types – numeric, pictographic, audio, video, and multiple media. Metadata schemes, Big Data technology platforms and information architecture, data sharing processes, and data policies all converge in this space to create a series of curation issues that require significant development, planning, and implementation strategies.
• Foster research and data communities. Funding, creating research priorities, seeding projects, and bringing together stakeholders are critical to the success of Big Data initiatives. There is a need to:
  o Strategically allocate funding to Big Data initiatives to address grand challenges faced in the scientific and research communities (e.g., climate change, weather prediction), but also to address critical challenges faced in communities (e.g., health and wellness, transportation).
  o Provide interdisciplinary and cross-disciplinary funding opportunities that bring together scientific and research communities in ways that promote innovation through Big Data.
  o Facilitate interactions across governments, sectors, and research and science communities through the creation of Big Data domains (e.g., environment, economic, health). If the idea is to promote the creation and use of Big Data, then there is often a need to help build data communities – simply releasing data may be necessary, but not necessarily sufficient, to foster and grow Big Data efforts.
  o Facilitate data exchanges and sharing across sectors. Given the range of “smart” technologies, often geo-located and sensor-based with data coming from multiple inputs (e.g., individuals, vehicles, smartphones), there is a need to engage in large-scale data collaboratives that engage various stakeholder communities, establish data protocols, establish data exchange and reporting mechanisms, and include analysis, decision-making, and policy-making efforts.
  Though innovative Big Data initiatives have begun in a grass-roots way, the potential benefit is now sufficient to warrant a clear set of science, funding, and resource allocation policies, along with seed funds.
The above recommendations are selective and serve to offer a number of considerations regarding the promotion and development of Big Data efforts. They also show that there is a substantial need for a Big Data governance model to more fully address the policies and practices surrounding Big Data.

11. CONCLUSION
Big Data initiatives hold significant promise for policy- and decision-making, for fostering greater understanding of significant scientific and social challenges, for fostering collaboration between governments and citizens and businesses, and for ushering in a new era of digital government services. As this paper showed, however, there are a range of policy issues governing Big Data to consider in a holistic approach that can ultimately lead to a Big Data governance model. There is a need for future research that explores such a model and how best to consider Big Data in that context.

12. REFERENCES
[1] Bertot, J. C., Jaeger, P. T., Munson, S., & Glaisyer, T. (2010). Engaging the public in open government: The policy and government application of social media technology for government transparency. IEEE Computer, 43(11): 53-59.

[2] Braveman, N.S. (2012). Science metrics and the black box of science policy. Research Trends: Special Issue on Big Data, 30(September 2012): 9-10.

[3] Chang, A., & Kannan, P. K. (2008). Leveraging Web 2.0 in government. Washington DC: IBM Center for The Business of Government.

[4] Cox, M., & Ellsworth, D. (1997). Application-controlled demand paging for out-of-core


visualization. In Proceedings of the 8th conference on Visualization '97 (VIS '97), Roni Yagel and Hans Hagen (Eds.). Los Alamitos, CA: IEEE Computer Society Press: 235-ff.

[5] Drapeau, M. & Wells, L. (2009). Social software and national security: An initial net assessment. Center for Technology and National Security Policy, National Defense University. Available: http://www.ndu.edu/ctnsp/Def_Tech/DTP61.

[6] Executive Office of the President, President’s Council of Advisors on Science and Technology. (2010). Designing a digital future: Federally funded research and development in networking and information technology. Washington, DC: Executive Office of the President, President’s Council of Advisors on Science and Technology. Available at: http://www.whitehouse.gov/ostp/pcast.

[7] Jaeger, P.T., Bertot, J.C., & Shuler, J.A. (2010). The Federal Depository Library Program (FDLP), Academic Libraries, and Access to Government Information. The Journal of Academic Librarianship, 36(6): 469–478. http://dx.doi.org/10.1016/j.acalib.2010.08.002.

[8] Lane, J. (2012). Science metrics and the black box of science policy. Research Trends: Special Issue on Big Data, 30(September 2012): 7-8.

[9] New York State Research and Development Authority. (2010). Responding to climate change in New York State. New York, NY: New York State Research and Development Authority. Available at: http://www.nyserda.ny.gov/Page-Sections/Environmental-Research/EMEP/Research/Climate-Change/New-York-State/NYSERDA-Initiatives/Funded-Projects/~/media/Files/EE/EMEP/Climate%20Change/clim-aid-synthesis-draft.ashx.

[10] Noveck, B. E. (2008). Wiki-government. Democracy: A Journal of Ideas, Vol. 7.

[11] Office of Science and Technology Policy. (2011). The Open Government Partnership: National action plan for the United States of America. Washington, DC: Office of Science and Technology Policy. Available at: http://www.whitehouse.gov/sites/default/files/us_national_action_plan_final_2.pdf.

[12] Obama, B.H. (2009, January 21). Transparency and open government. Memorandum for the Heads of Executive Departments and Agencies. Available: http://www.whitehouse.gov/the-press-office/transparency-and-open-government.

[13] Office of Science and Technology Policy. (2011). The Open Government Partnership: National Action Plan for the United States of America. Available: http://www.whitehouse.gov/sites/default/files/us_national_action_plan_final_2.pdf.

[14] Orszag, P. (2009, December 8). Open government directive. Memorandum for the Heads of Executive Departments and Agencies. Available: http://www.whitehouse.gov/open/documents/open-government-directive.

[15] Osimo, D. (2008). Web 2.0 in government: Why and how? Washington DC: Institute for Prospective Technological Studies.

[16] Ovide, S. (2012). Tapping ‘big data’ to fill potholes. Wall Street Journal (June 12). Last accessed October 31, 2012 at http://online.wsj.com/article/SB10001424052702303444204577460552615646874.html.

[17] Snyder, C. (2009). Government agencies make friends with new media. Wired, 25 March. Available: http://blog.wired.com/business/2009/03/government-agen.html.

[18] TechAmerica Foundation. (2012). Demystifying big data: A practical guide to transforming the business of government. Washington, DC: TechAmerica Foundation. Available at: http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf.

[19] The Economist. (2012, October 27). Special report on technology and geography: A sense of place. The Economist, 405(8808): 1-22.

[20] Toffler, A. (1970). Future shock. New York, NY: Random House.

[21] Wohlsen, M. (2012). Big data helps farmers weather drought’s damage. Wired (September 6). Last accessed October 31, 2012 at http://www.wired.com/business/2012/09/big-data-drought/.

[22] Zikopoulos, P., Deutsch, T., Deroos, D., Corrigan, D., Parasuraman, K., & Giles, J. (2012). Harness the Power of Big Data: The IBM Big Data Platform. Armonk, NY: IBM.
