
Architectural Design and Best Practices Project

Final Report and Design Recommendations (A006.1)

Prepared for the Virginia Department of Education

February 28, 2011

Technical Point of Contact:

Louis McDonald | CIO/CTO

louis.mcdonald@cit.org | 703.689.3037

Administrative Point of Contact:

Pat Inman | Contract Manager

pat.inman@cit.org | 703.689.3037

A006.1 Deliverable Final Report and Design Recommendations

Contents
Executive Summary
1 Introduction
1.1 Study Goal
1.2 Project Deliverables
2 Research Process
3 Key Messages
3.1 Stakeholder Management
3.2 Federated Systems Perform Poorly
3.3 Data Governance
3.4 Leveraging Existing Systems
3.5 Use of Commercial Systems
3.6 Multiple Hash Keys
3.7 Requirements and System Architecture
3.8 Clearly Defined Security Policies
4 Architecture Best Practice Case Studies
4.1 Indiana Department of Education
4.2 Iowa Department of Education
4.3 Army Suicide Mitigation Project
4.4 Texas Education Agency
4.5 DLA Data Convergence and Quality Project
4.6 NORC at the University of Chicago Data Enclave
5 Subject Matter Expert Interviews
5.1 Dr. Bhavani Thuraisingham
5.2 Paul Carney
5.3 James Campbell
5.4 Susan Carter
5.5 Raj Ramesh
5.6 Ron Kleinman
5.7 Peter Dobler
5.8 Dr. Laura Haas
5.9 Dr. Thilini Ariyachandra
5.10 Dr. Cynthia Dwork
6 SLDS Architecture Overview

Architectural Design and Best Practices Project | Page 2


6.1 SLDS Seven Functional Components
6.1.1 Portal
6.1.2 Security
6.1.3 Workflow
6.1.4 Reporting
6.1.5 Lexicon
6.1.6 Shaker
6.1.7 Data
7 Physical Infrastructure
7.1 Development Environment
7.2 Test Environment
7.3 Production Environment
Appendix A: Secondary Architecture Best Practice Case Studies
A.1 Illinois State Board of Education
A.2 North Dakota Department of Public Instruction
A.3 Washington Education Research and Data Center
Appendix B: Best Practice Case Studies Interviewee List
B.1 Indiana Department of Education
B.2 Iowa Department of Education
B.3 Data Strategies Army Suicide Mitigation Project
B.4 Texas Education Agency
B.5 Data Strategies DLA Data Convergence and Quality Project
B.6 NORC Data Enclave
B.7 Illinois State Board of Education
B.8 North Dakota Department of Public Instruction
B.9 State of Washington Education Research & Data Center
Appendix C: Materials Sent to Best Practices Interviewees
C.1 Best Practices Interview Template
C.2 Architectural Best Practice, Design & Planning Support Project Overview
Appendix D: Materials Sent to Subject Matter Experts
D.1 Subject Matter Expert Interview Template
D.2 Virginia Statewide Longitudinal Data System - Executive Summary
D.3 Virginia Statewide Longitudinal Data System - Usage



Executive Summary
The educational landscape has changed dramatically since the establishment of Statewide Longitudinal Data Systems (SLDS) throughout the United States. A grant program[1] funded by the U.S. Department of Education, as authorized by the Educational Technical Assistance Act of 2002, has helped to change many states' K-12 data systems significantly and may, in fact, revolutionize the future management and utility of educational data. States that have implemented these data systems now have more accurate and robust data and an enhanced ability to access, analyze, and utilize the data in a manner previously unavailable. These successes, however, have been achieved only through rigorous planning, meticulous design, and assiduous implementation.
In 2010, the Virginia Department of Education (VDOE) was awarded a federal multi-year grant to enhance its statewide data system, and launched the Virginia Longitudinal Data System (VLDS) project. The project team was charged with creating a system that will address the needs of all stakeholders; additionally, the team faced challenging Federal and Virginia security and privacy requirements. To meet these challenges, VDOE commissioned the Center for Innovative Technology (CIT) to identify key success factors that could provide guidance in the development of a fully secure and private, as well as more efficient, SLDS in Virginia.
In order to achieve this complex objective, the CIT project team investigated other large data integration projects in state education agencies, other government agencies, and other industries. The team examined published information from a variety of sources, and a gap analysis was performed on these reports and articles to identify missing information and critical areas where further research must be conducted. In addition, the team interviewed nine SLDS leaders and a number of industry leaders who managed large integration projects. These interviews provided the foundation for the compilation of best practices and key takeaways that were essential to the success of each of these projects.
Information collected during the research and analysis phase was analyzed to identify common themes, and these themes were organized into a set of best practices that were included in a final report. This process revealed several key elements[2] that were crucial to ensuring successful implementations. The primary themes that emerged included, among others, the necessity for detailed project planning and management (the importance of data governance and stakeholder management) as well as the need to conduct comprehensive research and planning before implementing the technology and creating the system architecture (the use of commercial solutions and leveraging existing systems).

[1] The Statewide Longitudinal Data Systems (SLDS) Grant Program.
[2] The final report included challenges and obstacles to be avoided and provided recommendations for a number of preliminary action items. In addition, the report contained a supplement that included the complete reports with detailed findings for each interview and data project researched.



The CIT project team, along with VLDS subject matter experts, took the original conceptual SLDS architecture and incorporated results from the best practices research and subject matter expert interviews to develop an implementation architecture. The refined architecture (Section 6) expanded the details for functional components, security, reporting, and how data requests would be managed given known constraints. The team developed an understanding of the flow of information and of the workflows needed to support the different SLDS deployment scenarios.
All this information allowed the team to develop a physical infrastructure architecture (Section 7). This included the physical hardware, its location, and its functionality.
Following standard lifecycle development practices, three versions of the infrastructure are described: Development, Test, and Production.



1 Introduction
1.1 Study Goal
The goal of the Architectural Design and Best Practices Project was to provide the Virginia Department of Education (VDOE) with an up-to-date and relevant assessment of the best practices related to the design, development, deployment, and operation of a Statewide Longitudinal Data System (SLDS).

1.2 Project Deliverables
Deliverable ID | Description | Delivery Date
A001 | Monthly Status Reports | Monthly
A002.1 | List of all SMEs interested in participating in interview program | October 2010
A002.2 | Interview Schedule and Interview Template | October 2010
A002.3 | Preparation and pre-reading material as required for Interview SMEs | October 2010
A002.4 | Consolidated output from interviews | December 2010
A002.5 | Template for Final Deliverable | November 2010
A003.1 | Monthly Status Reports | Monthly
A003.2 | PMO Support, to include program documentation, i.e., Work plan, Scope, Requirements, Schedule, Risk, and Change Management plans as requested by the Program Office | Ongoing
A004 | Architectural Best Practices Report | December 2010
A005.1 | Workshop Agenda and Presentation material | December 2010
A005.2 | Summary Workshop Findings Report | January 2011
A006.1 | Final Report & Design Recommendations | February 2011

2 Research Process
The CIT project team applied its established CIT Connect research and analysis process to execute this effort. This rigorous best practice methodology, which includes the identification and analysis of information provided by Subject Matter Experts and by comparable SLDS projects or large data integrations from the public and private sectors, provides high-confidence results. Additionally, this method includes analysis and consolidation of feedback received from the SLDS stakeholders.
[Figure 1: CIT Connect Process. Five steps: Project and Requirements Definition; Technology Identification and Research; Technology Assessment and Interview Process; Recommendation Development and Consolidation; Feedback Integration and Report Finalization.]

CIT Connect projects are performed under the control of a well-defined project management approach. This approach provides visibility into project status at all times via regular reviews, status reports, and interim deliverables. The CIT Connect Process, shown in Figure 1, provides a diverse, five-step approach to sourcing innovation, rigorous analysis of alternatives, and a structured engineering methodology for creating final recommendations. The use of this structured multi-step approach maximizes the likelihood of success while reducing risk. Project execution is guided by a project plan developed and maintained by the CIT project management team.
For the VDOE Architectural Design and Best Practices Project, the process began with an analysis of the information and requirements provided by VDOE and the identification of the information and technology areas to be targeted for study. The second process step focused on initial resource sourcing by developing a list of candidate SLDS and large data integration projects and Subject Matter Experts in the information and technology areas defined for the project. This second step produced several deliverables focused on Subject Matter Expert interviews and best practices case studies that were presented to the Virginia Department of Education between October 2010 and December 2010.
The Architectural Best Practices Report (A004) focused on initial data sourcing, both to develop a list of candidate SLDS and large data integration projects to be analyzed and to collect published analyses, reports, and case studies on existing projects. The set of candidate projects, which was documented in Deliverable A002.2, was compiled by accessing CIT's business network and by searching our data resources to identify candidate companies and organizations. This list was delivered to the Virginia Department of Education on October 31, 2010.
The Consolidated Output from Subject Matter Expert Interviews (A002.4) focused on initial resource sourcing by developing a list of candidate Subject Matter Experts in the information and technology areas defined for the project. CIT researchers reached out to CIT's business network in addition to contacting experts identified through literature searches. The set of candidate Subject Matter Experts, which was documented in Deliverable A002.1, was submitted to the Department of Education on October 31, 2010.
For both reports, CIT reviewed the sourced materials and developed a gap analysis of information requirements. The gap analysis was the basis for the creation of survey tools and an analysis framework, which were then used to guide the interview process and the assessment steps that followed. The gap analysis also guided the selection of leaders of projects targeted for direct interviews.[3] The information priorities that emerged from the gap analysis were, therefore, used to classify and prioritize interview candidates via a process based upon criteria that included domain relevance to VDOE, cost, complexity, technical and business maturity, and stakeholder considerations. The survey questions and targeted list of candidates for best practice interviews were provided to the Department of Education as Deliverable A002.2 on October 31, 2010. The targeted list of Subject Matter Experts (Deliverable A002.1), interview schedule, question template (Deliverable A002.2), and pre-reading material (Deliverable A002.3) were provided to the Department of Education on October 31, 2010.
The subsequent phases of the process focused on consulting leaders of similar state and commercial efforts; synthesizing, analyzing, and organizing their feedback; and presenting these best practices to stakeholders. Step three, the longest phase of the project, involved performing interviews and synthesizing and analyzing the information from both the Subject Matter Experts and the Best Practices candidates. During this phase, CIT researchers conducted nine interviews with leaders of large-scale projects representing both the public and private sectors[4] and interviewed ten Subject Matter Experts.
[3] All of the projects had potential relevance to the effort; however, it would have been both infeasible and duplicative to interview the leaders of all of the candidate projects, given this project's short timeline.
In step four, the CIT Connect team organized and categorized key lessons learned from each of the interviews and highlighted common themes and unique insights. The CIT Connect team presented the best practice and Subject Matter Expert interview findings to VDOE stakeholders on December 13, 2010 and integrated feedback and guidance from the stakeholders to generate reports A004 and A002.4.
Step five involved the presentation of the CIT project team's implementation architecture and associated physical infrastructure to the SLDS stakeholders on January 27, 2011. Presentations were made by the CIT project team and VLDS subject matter experts. The purpose of this workshop was to formalize an agreed-upon architecture for the SLDS.

3 Key Messages
The three components of the VDOE Architectural Design and Best Practices project each had a different theme and focus. The Best Practices interviews and analysis centered on the implementation and logistical processes involved in large-scale data integration projects. The Subject Matter Expert interviews focused on technical best practices for an LDS architectural design.
The following topics are the key messages and best practices borne out of the Best Practices and Subject Matter Expert interviews.
Best Practices Interviews
- Stakeholder Management
- Data Governance
- Use of Commercial Solutions
- Leveraging Existing Systems
- Requirements Drive System Architecture
Subject Matter Expert Interviews
- Federated Systems Perform Poorly
- Data Governance
- Use of Commercial Solutions
- Use of Multiple Hash Keys
- Clearly Defined Security Policies

3.1 Stakeholder Management
When embarking upon a systems integration project, numerous stakeholders play a part in the planning, development, implementation, and maintenance of the system. Knowing the stakeholders' requirements, expectations, and resources is essential to a project's success.
During the Army Suicide Mitigation Project, Data Strategies discovered that managing the stakeholders became an overwhelming task when it came to obtaining the memorandums of understanding (MOU) and data sharing agreements needed before a data source could be integrated into the system. Data Strategies also found that clear communication between the project implementation team and the stakeholders, as well as communication among the stakeholders, was best facilitated by the project managers. The DLA Data Convergence and Quality Project managers needed to ensure accurate and timely communication of project status, feedback, and next steps; this created a foundation for a positive collaborative environment among the teams, which contributed to the overall success of the project.
[4] A list of the organizations and companies with whom we spoke may be found in Appendix B.
The Indiana Department of Education attempted to gather stakeholder requirements through large monthly meetings before discovering that meetings with individual stakeholder groups were more effective and led to increased buy-in.
Lastly, as Illinois is currently in the design stage of its SLDS, the Illinois State Board of Education has hired a consulting firm to perform some stakeholder management. The consulting group is in the midst of gathering the technical and program information for each of the 13 data systems that will be integrated into the Illinois SLDS, and this information will have a direct impact on the final SLDS architecture.
(Army Suicide Mitigation, DLA, Illinois State Board of Education, Indiana Department of Education)

3.2 Federated Systems Perform Poorly
Federated systems generally perform worse than a centralized database. Requirements and queries should be planned prior to building the system to maximize performance. Usually, with a distributed database model, data converges into a warehouse for simpler analysis, and a global schema is defined to make that convergence easier. The federated model for the SLDS cannot allow for permanent convergence, nor is it likely that a global schema will be developed that encompasses all data sources. Because the data remain disparate and must be moved across the network when queried, the overall performance of the federated architecture can suffer.
(Ariyachandra, Dobler, Haas)
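The trade-off the experts describe can be illustrated with a toy, in-memory Python sketch. Two hypothetical sources use different field names, and a hypothetical global schema (`MAPPINGS`) maps both onto common names. A federated query must re-read every source each time it runs, whereas a warehouse converges the data once and answers later queries locally. The `fetches` counter stands in for the network round trips that dominate real federated performance. The source names, fields, and the enrollment-rate query are assumptions for illustration only.

```python
# Hypothetical source extracts with source-specific field names.
K12_ROWS = [{"stu_id": "A1", "grad_year": 2009}, {"stu_id": "B2", "grad_year": 2010}]
HIGHER_ED_ROWS = [{"student_key": "A1", "enrolled": True}]

# Hypothetical global schema: each source's fields mapped onto common names.
MAPPINGS = {
    "k12": {"stu_id": "student_id", "grad_year": "hs_grad_year"},
    "higher_ed": {"student_key": "student_id", "enrolled": "college_enrolled"},
}


class Source:
    """A remote data source; each fetch stands in for a network round trip."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.fetches = 0

    def fetch(self):
        self.fetches += 1
        return [dict(row) for row in self.rows]


def to_global(source_name, row):
    """Rename a source row's fields according to the global schema."""
    mapping = MAPPINGS[source_name]
    return {mapping[name]: value for name, value in row.items()}


def merged_view(sources):
    """Fetch every source and join the rows on the global student_id."""
    merged = {}
    for source in sources:
        for row in source.fetch():
            record = to_global(source.name, row)
            merged.setdefault(record["student_id"], {}).update(record)
    return merged


def enrollment_rate(view):
    """Share of high school graduates with a matching college enrollment."""
    graduates = [r for r in view.values() if "hs_grad_year" in r]
    return sum(r.get("college_enrolled", False) for r in graduates) / len(graduates)


def make_sources():
    return [Source("k12", K12_ROWS), Source("higher_ed", HIGHER_ED_ROWS)]


if __name__ == "__main__":
    # Federated: every query re-reads all sources.
    federated = make_sources()
    for _ in range(3):
        enrollment_rate(merged_view(federated))
    print("federated fetches:", [s.fetches for s in federated])

    # Warehouse: converge once, then query the local copy.
    warehoused = make_sources()
    warehouse = merged_view(warehoused)
    for _ in range(3):
        enrollment_rate(warehouse)
    print("warehouse fetches:", [s.fetches for s in warehoused])
```

Three queries cost three fetches per source in the federated case and one per source in the warehouse case. Because the VLDS federated model cannot converge data permanently, the sketch suggests why the report recommends planning queries before the system is built.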

3.3 Data Governance
The importance of data governance was a common message throughout the course of this effort. Data governance is often viewed as a large initial effort for many data integration projects. For systems that continue to expand and to add data sources, however, it will be an ongoing effort, one that, our interviewees noted, is often underestimated. In most cases, an SLDS effort must accommodate a number of disparate stakeholders and sources and, thus, requires a higher than normal level of effort to identify data ownership and oversight to ensure the data's accuracy and security. Because each data source will have its own data governance, creating and managing an overarching governance model adds another layer of complexity. Prior to implementing a statewide system, it is critical for the stakeholders to agree on who owns the data in the system, who will oversee and maintain the system, and who will approve output and requests for access.
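The three agreements the report calls for (who owns the data, who oversees and maintains it, and who approves output and access requests) could be recorded as a simple registry that blocks any access request touching a data element with no agreed entry. The following is a minimal Python sketch under that assumption. The element names, the placeholder agencies, and the functions `unresolved` and `approvers_needed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceEntry:
    owner: str     # agency that owns the data element
    steward: str   # party that oversees and maintains it
    approver: str  # party that approves output and access requests


# Hypothetical registry with placeholder agencies, not actual VLDS agreements.
REGISTRY = {
    "k12_enrollment": GovernanceEntry("Agency A", "Agency A data office", "Agency A"),
    "wage_records": GovernanceEntry("Agency B", "Agency B IT", "Agency B"),
}


def unresolved(elements):
    """Requested data elements with no agreed governance entry, sorted."""
    return sorted(e for e in set(elements) if e not in REGISTRY)


def approvers_needed(elements):
    """Parties who must sign off on a request; refuse if governance is undecided."""
    missing = unresolved(elements)
    if missing:
        raise ValueError("no governance agreement for: " + ", ".join(missing))
    return {REGISTRY[e].approver for e in elements}


if __name__ == "__main__":
    print(approvers_needed(["k12_enrollment", "wage_records"]))
    print(unresolved(["k12_enrollment", "course_grades"]))
```

Under this design, adding a new source begins with adding its registry entries. This matches the report's point that governance is an ongoing effort rather than a one-time task.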
The North Dakota Department of Public Instruction and the Washington Education Research and Data Center are two projects in the early stages of building their data integration systems. Both projects have had difficulty moving forward with implementation because their stakeholders have been unable to agree upon the rules of the LDS data governance.
The Army Suicide Mitigation and the DLA Data Convergence and Quality Projects stressed the importance of data governance during the early planning stages prior to implementation and continue to emphasize these elements as data sources are added. With the addition of a new data source, the project managers must understand the new source's governance and how its integration will affect the LDS's overarching governance.
Appropriate data governance, particularly in a federated model, is key to ensuring that the data, the linkages between data, and the performance of the system are optimized. Once the SLDS architecture is implemented and operational, it is important to monitor the types of queries executing in the system. This monitoring allows the system to be tuned for better performance and shows how the security model is enforcing the rules. It is not uncommon for rules to need adjustment as data governance continues to evolve.
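Monitoring the types of queries executing in the system can start as a simple summary over a query log. The Python sketch below assumes a hypothetical log record (`QueryEvent`) holding the query type, elapsed time, and the security rule that denied the query, if any. The query categories, the rule name `small_cell_suppression`, and the 1,000 ms slow-query threshold are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryEvent:
    query_type: str                  # hypothetical category, e.g. "aggregate_report"
    elapsed_ms: float                # execution time in milliseconds
    denied_by: Optional[str] = None  # security rule that blocked the query, if any


def summarize(events, slow_ms=1000.0):
    """Count queries by type, slow queries by type, and denials by rule."""
    return {
        "by_type": Counter(e.query_type for e in events),
        "slow": Counter(e.query_type for e in events if e.elapsed_ms > slow_ms),
        "denials": Counter(e.denied_by for e in events if e.denied_by),
    }


if __name__ == "__main__":
    log = [
        QueryEvent("aggregate_report", 240.0),
        QueryEvent("aggregate_report", 1850.0),
        QueryEvent("cross_agency_cohort", 4200.0),
        QueryEvent("student_level", 15.0, denied_by="small_cell_suppression"),
    ]
    summary = summarize(log)
    print("slowest query types:", summary["slow"].most_common())
    print("rules denying queries:", summary["denials"].most_common())
```

Query types that are frequently slow are candidates for tuning. Rules that deny many legitimate requests are candidates for review as data governance evolves.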
Although the Virginia SLDS stakeholders have a strong understanding of the system's baseline data governance, a number of factors will influence the need for upfront and ongoing efforts. The Virginia SLDS will add future sources, which will require a rigorous upfront effort to minimize rework and redesign, as well as ongoing efforts whenever existing data sources change or new data sources are added.
(Army Suicide Mitigation, DLA, North Dakota SLDS, Texas Education Agency, Washington Education Research and Data Center)

3.4 Leveraging Existing Systems
Some SLDS projects were able to leverage existing systems as the foundation of the SLDS; this saved time and resources during the design and implementation stages of the project.
Indiana was able to leverage an existing system, Learning Connection, and expand on its capabilities. Initially, Indiana had not planned to expand its Learning Connection portal, as it was built primarily for teacher networking and was not intended to be a working data system for other stakeholders. However, due to political conflicts, Learning Connection evolved into such a system. Indiana's LDS project was begun by the previous administration, and the state's new leadership originally planned to eliminate Learning Connection. The Indiana Department of Education, however, argued that starting over with a new data system for K-12 would not be cost-effective. In the end, Learning Connection was modified to be used as a collaboration site and as a K-12 data management system.
North Dakota also was able to leverage an existing data warehouse to avoid reinventing the wheel.[5] After surveying what systems existed in the state, the North Dakota SLDS team discovered that its legacy K-12 system had the technical capabilities to form the LDS foundation. This K-12 warehouse will be expanded into an LDS and will collect information from other agencies. By building out the K-12 data warehouse into an LDS, North Dakota's team saved time and money, which will enable it to focus on other technical and non-technical issues (such as linkages between other data systems).
(Indiana Department of Education, North Dakota Department of Public Instruction)

3.5 Use of Commercial Systems
Our interviews revealed both positive and negative consequences of using commercial off-the-shelf solutions; these commercial solutions can be a benefit, saving agencies the time of building their own solutions, but they can also limit the versatility and expandability of the system.
⁵ Korsmo, T. (2010, October 26). Telephone Interview with Rona Jobe.



The Indiana Department of Education began their project with an Oracle platform for their data warehouse, but eventually switched to a SQL platform. After a year's effort, the project staff realized that Oracle was not meeting their needs, was overly complicated, was not user-friendly, and was extremely expensive. The team restarted with a new solution and had to perform rework because of the commercial solution they initially chose; however, since their move to the SQL platform, the SLDS has progressed rapidly and has performed well.
The Iowa Department of Education purchased an off-the-shelf data model for their SLDS. This model was adopted prior to Iowa receiving the SLDS award, when the system was focused on the K-12 space. After deciding to expand their efforts to P-16, the SLDS team found that the data model they had purchased did not work as an effective model for the higher education data within the state. In order to integrate the higher education data, the Iowa team is investigating whether to purchase additional commercial data models or to build their own custom data models (both of which will require additional financial and man-hour resources).
The Texas Education Agency purchased commercial off-the-shelf solutions in order to develop a public-facing Web site where users can access the data from the SLDS. The Texas SLDS team found that commercial software provided adequate tools that allowed them to maintain the Web site while minimizing maintenance resources.
There are few off-the-shelf solutions that are able to perform data integration on-the-fly in federated databases, but the space continues to grow. Major database vendors, e.g., IBM and Oracle, have federated database management systems that are able to assist with the integration requirement.
(Indiana Department of Education, Iowa Department of Education, Texas Education Agency, Ariyachandra, Haas, Ramesh)
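The following is a minimal sketch of what "integration on-the-fly" means, under stated assumptions: two in-memory SQLite databases stand in for separate agency systems, and a small hypothetical mediator function (`federated_join`) queries both at request time and joins the results in memory rather than copying them into a central warehouse. The table names, the shared `student_key` column, and the sample rows are invented for illustration. Commercial federated database management systems do this inside their own query engines and do not work like this code.

```python
import sqlite3


def make_source(table, rows):
    """Create an in-memory SQLite database standing in for one agency's system.
    The two-column schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (student_key TEXT PRIMARY KEY, detail TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    return conn


def federated_join(k12_conn, postsec_conn):
    """Query each source at request time and join the results in memory.
    Nothing is copied into a central store; every call re-reads both sources."""
    k12 = dict(k12_conn.execute("SELECT student_key, detail FROM k12_enrollment"))
    post = dict(postsec_conn.execute("SELECT student_key, detail FROM postsec_enrollment"))
    # Inner join on the shared key: only students present in both systems.
    return sorted((key, k12[key], post[key]) for key in k12.keys() & post.keys())


k12_db = make_source("k12_enrollment", [("a1", "Grade 12, 2008"), ("b2", "Grade 12, 2008")])
post_db = make_source("postsec_enrollment", [("a1", "Community college, 2009")])
print(federated_join(k12_db, post_db))  # [('a1', 'Grade 12, 2008', 'Community college, 2009')]
```

Because nothing is materialized centrally, a change in either source shows up on the next call. The trade-off is that every query depends on all sources being reachable and fast enough, which is the performance concern raised about federated systems earlier in this report.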

3.6 Multiple Hash Keys
Protecting personally identifiable information (PII) with one-way hashing was discussed as a method for shielding an individual's identity. Subject Matter Experts mentioned that using various data elements to create hash keys can provide a number of options for better record matching. Techniques can include combining multiple values into a single hash, or creating multiple hashes that can be used for comparison.
(Carney, Carter, Dobler, Kleinman)
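A minimal sketch of both techniques follows. It assumes hypothetical field names (`first_name`, `last_name`, `birth_date`, `sex`), a hypothetical normalization rule, and a shared secret key. It substitutes a keyed hash (HMAC-SHA256 from Python's standard library) for a plain one-way hash, because an unkeyed hash of low-entropy values such as birth dates can be reversed by hashing every candidate value. The SMEs interviewed did not specify an algorithm.

```python
import hashlib
import hmac

# Hypothetical secret held only by the agencies doing the matching. A keyed hash is
# used because unkeyed hashes of low-entropy PII can be reversed by brute force.
SECRET_KEY = b"replace-with-a-managed-secret"


def _norm(value):
    """Normalize a field so trivial formatting differences do not change the hash."""
    return " ".join(str(value).strip().upper().split())


def hash_fields(record, fields):
    """One-way keyed hash over one or more fields joined with a separator."""
    joined = "|".join(_norm(record[f]) for f in fields)
    return hmac.new(SECRET_KEY, joined.encode("utf-8"), hashlib.sha256).hexdigest()


# Technique 1: combine several values into a single hash key.
COMBINED = ("first_name", "last_name", "birth_date")

# Technique 2: several independent hash keys, each built from a different field set.
KEY_SETS = {
    "name_dob": ("first_name", "last_name", "birth_date"),
    "last_dob_sex": ("last_name", "birth_date", "sex"),
    "first_dob_sex": ("first_name", "birth_date", "sex"),
}


def hash_keys(record):
    return {name: hash_fields(record, fields) for name, fields in KEY_SETS.items()}


def match_score(rec_a, rec_b):
    """Count how many of the independent hash keys agree between two records."""
    a, b = hash_keys(rec_a), hash_keys(rec_b)
    return sum(a[k] == b[k] for k in KEY_SETS)


k12 = {"first_name": "Maria", "last_name": "Lopez", "birth_date": "1992-04-17", "sex": "F"}
college = {"first_name": "Marie", "last_name": "lopez ", "birth_date": "1992-04-17", "sex": "F"}

print(hash_fields(k12, COMBINED) == hash_fields(college, COMBINED))  # False: first names differ
print(match_score(k12, college))  # 1 (only last_dob_sex agrees)
```

A single combined hash fails on any one-field discrepancy, as the differing first names show. Several independent hashes let a matching process score partial agreement and route borderline pairs for review without ever exchanging the underlying PII.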

3.7 Requirements and System Architecture
Reporting and usage requirements should determine the type of architecture to be built, and identifying these elements of the system early in the design and development phases will save time and money. Data warehousing specialists at Claraview⁶ emphasized during an interview that knowing how the system should perform and what functions will be required will drive the architecture of the system. In essence, the architecture of an LDS should be determined largely by what an agency wants it to do. The Claraview team cautioned that, as they have witnessed with other state departments of education, failure to identify and address system and stakeholder needs adequately will result in a failed or less-than-optimal LDS.
⁶ Claraview is a business intelligence and data warehousing consulting organization. See http://www.claraview.com/dnn/



The Indiana Department of Education team concurred that design should be dependent upon functions, or how the agency plans to use the system. They reiterated that since all states have differing reporting requirements and needs, no one design solution will be appropriate for all. This requirement should be closely aligned with identifying stakeholder needs.
(Indiana Department of Education)

3.8 Clearly Defined Security Policies
Instituting security policies for the protection of data to prevent the possible identification of a person is critical to the success of the system. SMEs stated that security policies and measures need to be defined clearly. Security policies, in combination with database security, can maximize the protection of sensitive data. To rely only on database security tools would be shortsighted. It is important to review all aspects of security, including operating system hardening practices and network device configurations.
(Dwork, Kleinman)



4 Architecture Best Practice Case Studies
The goal of the Architectural Design and Best Practices Project was to provide the VLDS team with an up-to-date and relevant assessment of the best practices related to the design, development, deployment, and operation of a Statewide Longitudinal Data System.
The Center for Innovative Technology (CIT) was commissioned by VDOE to conduct research on similar longitudinal database development efforts or large data integrations across disparate organizations. Additionally, CIT was tasked to produce best practice recommendations that would include the identification of risks and impediments in building an LDS. Based on the information collected from the nine individual case studies, the project team consolidated themes, lessons learned, and best practices.

4.1 Indiana Department of Education
State/Agency: Indiana Department of Education
Web Site: http://www.doe.in.gov/data/
Address: 151 West Ohio Street, Indianapolis, Indiana 46204
POC: Molly Chamberlin, Director of Data Analysis, Collection and Reporting
POC Phone: 317-234-6849
POC Email: mchamber@doe.in.gov

Case Profile
Student Enrollment: 1,046,147⁷
Teachers: 62,668⁸
LDS Grant: $5,188,260⁹

Background
In 2007, the Indiana Department of Education (IDOE) was awarded approximately $5.2 million to create a comprehensive P-20 data system. For its LDS, IDOE envisioned a system that would allow data integration at all levels and would enable stakeholders to track and to analyze student achievement and attainment from early childhood through higher education and beyond.¹⁰ The main objectives of Indiana's LDS were to improve data quality; to provide longitudinally linked data to be used to drive policy decisions; and to make the data user-friendly for teachers, principals, superintendents, and other stakeholders.

⁷ State education data profiles. (n.d.). Retrieved from http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=18
⁸ Ibid.
⁹ Statewide longitudinal data system grant program grantee state: Indiana. (n.d.). Retrieved from http://nces.ed.gov/programs/slds/state.asp?stateabbr=IN
¹⁰ Indiana P20 Comprehensive Data System. (n.d.). Retrieved from http://nces.ed.gov/programs/slds/pdf/Indianaabstract.pdf
The project involved many different stakeholders, primarily: the Local Education Agencies, the IDOE data warehouse, internal IDOE staff, the Department of Workforce Development, and higher education (e.g., Ivy Tech, Indiana's statewide community college network). Indirectly, the project's stakeholders and consumers were policy makers, legislators, parents, and students.
Initially, IDOE held monthly meetings for the LDS stakeholders. These stakeholders included representatives from every major division in the IDOE (special education, language minority, Title 1, Curriculum and Instruction, Data Reporting, Student Services, and Technology), 40 fellows from different school systems across the state, state administrators, and others. Because of the large size of the group, the meetings became too involved and unproductive.¹¹ An outside evaluator suggested performing a series of interviews with the individual groups instead of holding large stakeholder meetings.
IDOE conducted interviews with each of the stakeholder groups and asked about their vision of an LDS in terms of functionality and design. During this process, IDOE acted as intermediary and a champion for each group. As a result, not only was IDOE able to gather pertinent input from each stakeholder group, but they obtained buy-in from the stakeholders. IDOE synthesized the feedback data and created a small functional committee that assisted in the day-to-day decisions of building and designing the LDS.

Key Takeaway
Input from stakeholders is essential to designing a longitudinal data system. The requirements set forth by the stakeholders help determine the architecture and functionality of the final system.

The IDOE team faced a number of obstacles while building Learning Connection and the data warehouse.¹² First, the team found that their original platform, Oracle, was expensive and time-consuming to learn. Once the platform was switched to SQL, however, the project progressed smoothly. Another difficulty was the change in Indiana's administration. Once the new administration took office, the LDS team was forced to defend the need for an LDS and present an overview of what a data warehouse is, how it should function, and what had been done until that point. In spite of IDOE's presentations, the new administration was still uncertain about what to do with an LDS, particularly Learning Connection.¹³ After several discussions, the IDOE LDS team persuaded the administration to expand Learning Connection to become an LDS tool. The new administration responded and requested additional changes to the evaluation process, including switching from Oracle to SQL. After negotiations, the LDS team was able to retain their evaluation system, but replaced their Oracle platform with SQL. After securing the new administration's consent and approval, along with the agreed-upon changes, the process of linking the two systems, Learning Connection and the warehouse, was relatively short and seamless.
Additionally, IDOE created help desks for Learning Connection and for its public reporting system, DOE Compass; both help desks have email addresses to which users can submit questions.

¹¹ Chamberlin, M. (2010, October 25). Telephone Interview with Rona Jobe, CIT.
¹² Learning Connection was built in approximately 18 months, while the warehouse was built in 15 months.
¹³ The original Learning Connection was an interactive networking site, and the new administration wanted to close it because they did not see its value.

System Design and Architecture
Indiana's LDS is an amalgamation of multiple systems. It comprises three main data warehouses and portals: Learning Connection, IWIS (Indiana Workforce Intelligence System), and the IDOE Data Warehouse. Additionally, a public site, IDOE Compass, shares aggregate reports and data. The IDOE Compass system accesses copied tables and rolled-up data from the IDOE Warehouse, which are shared with the public. Available data sets include the number of students and teachers in Indiana as a whole and in certain districts. Access to certain data sets is restricted to authorized users.
IDOE Data Warehouse
The IDOE Data Warehouse is an internal enterprise data warehouse that is built on a SQL platform. The warehouse project commenced two years ago and employed an Oracle platform and Oracle Business Intelligence tools. This architecture was chosen because of the recommendations of the Indiana Office of Technology, prior to the new administration. After a year, IDOE reevaluated the system's performance and concluded that the Oracle tools did not meet their needs. According to the LDS staff, the Oracle tools were "not user friendly, were overly complicated [and] extremely expensive." Therefore, the team investigated other solutions.
One product they considered as a reporting platform was SharePoint, but they found the software too expensive.
When the new administration took office, the Data Analysis, Collection, and Reporting Office briefed the new CIO on the problems they had encountered in building the data warehouse and, under his guidance, the data warehouse was moved to the SQL platform. After the conversion to the SQL platform, IDOE was able to create data marts in-house. They currently are using Microsoft SSRS and SSAS and, thus far, have not encountered the problems they experienced with Oracle; they noted that these tools are "very, very easy to use."¹⁴
Learning Connection
Originally, Learning Connection was built as an interactive site for Indiana teachers to exchange information on lesson plans, techniques, and resources, similar to a social-networking site. Throughout the course of Indiana's LDS project, Learning Connection evolved into a learning management tool that provides data to stakeholders at the local level. In essence, Learning Connection is a portal where teachers and administrators can access standards-based activities, share lesson plans, and communicate with other teachers. Additionally, Learning Connection allows teachers to access their current students' longitudinal data. Currently, the system can run simple reports, but the IDOE team is working on expanding it to have more complex reporting capabilities. Learning Connection also interacts with the Data Warehouse by pulling copied data from the warehouse. Once the data has been cleaned, checked, verified, and loaded into the warehouse, Learning Connection retrieves reports directly from the warehouse.

¹⁴ Ibid.
IWIS
The Indiana Workforce Intelligence System (IWIS) is a separate data warehouse that is linked to K-12 and post-secondary data. According to the Indiana Workforce Development website, IWIS "began by integrating disparate data sets from within the Department of Workforce Development to then integrating [the] resulting new data with information from the Commission for Higher Education."¹⁵ This data warehouse is outside of the IDOE and is run by Indiana's Department of Workforce Development. IWIS was added to the LDS project after Learning Connection and the IDOE Data Warehouse. Originally, the Indiana SLDS team planned to populate the IDOE Data Warehouse with linked data. However, after discovering that the Department of Workforce Development and members of the higher education community had systems of their own, the SLDS team decided to link them with Indiana's LDS.
Theoretically, IWIS also will pull its data from the data warehouse; however, IDOE currently is struggling with acquiring access to data from the Department of Workforce Development, although Ms. Chamberlin did not elaborate on this point.

Security
Information within the Data Warehouse is identifiable; however, when other systems pull information from the warehouse, the warehouse creates a set of tables from the identified data that is de-identified and aggregated. In essence, systems do not actually access the source data directly. For example, Learning Connection only accesses tables that have been created and copied from the warehouse. Additionally, the warehouse and Learning Connection utilize role-based permissions; educators in Learning Connection have access only to their current students, and administrators have access only to the students currently attending their schools.
Moreover, the source data in the warehouse can be accessed only by certain IDOE personnel who have appropriate permissions: approximately four people. Ms. Chamberlin declined to disclose what other security measures have been implemented.

Data Usage and Reporting
The warehouse houses five years of data, which represents approximately one million public school students' records and 65,000 non-public school students' records. Assessment data are generated once a year; this includes enrollment and other data required to generate the state report card and reports to the federal government.
The public may view prepared aggregated data sets by school and by district, as well as public reports, through the Compass data site. The system accesses copied tables (rolled-up data) to generate these reports and aggregated data.¹⁶ Certain data from the IDOE Compass site are accessible only to registered users. However, other unidentified, aggregate data sets that are not readily available through the site can be requested. Depending upon the size of the request, this data usually can be provided within 2 to 14 business days.
(http://compass.doe.in.gov/Dashboard.aspx?view=STATE&val=0&desc=STATE)

¹⁵ Chamberlin, M. (2010, October 25). Telephone Interview with Rona Jobe, CIT.
¹⁶ The system is not accessing de-identified data, but rather aggregated data.
Researchers also may submit requests for large data sets. IDOE has a number of canned reports for researchers (e.g., enrollment by school in the last five years). However, if researchers request student-level data that is de-identified, the request is processed by the legal department, and the researcher must sign a data sharing agreement. Once the legal department approves the researcher's request, s/he may use the online data request system for IDOE Compass. The request is entered into a queue, and IDOE personnel retrieve and review the de-identified data before it is released.
Currently, the IDOE team is expanding the system's reporting capabilities. They note that had they taken into account the types of reports and departmental requirements from the LDS in the beginning, building and expanding the system would have been easier.

Lessons Learned
Throughout Indiana's LDS project, the LDS team found practices that have helped along the way. First, IDOE discovered the efficacy of building upon legacy systems like Learning Connection because it saved time and money. IDOE also found that prohibiting other linked systems from accessing source data enhances security and preserves a consistent true record. Data cleaning is imperative. Lastly, achieving stakeholder buy-in and gaining feedback is important in building a system. Interviewing different stakeholders and representatives individually provided IDOE substantive information on stakeholder requirements (e.g., types of reports and how the system should perform). Furthermore, IDOE found that gaining stakeholder buy-in is also important for a smooth LDS implementation.

Key Takeaway
Building on legacy systems saves time and money, e.g., turning Learning Connection, initially a teacher networking system, into a data management tool rather than eliminating the system and starting over.



4.2 Iowa Department of Education
State/Agency: Iowa Department of Education
Web Site: http://www.iowa.gov/educate/index.php?option=com_content&view=article&id=1691:edinsight&catid=45:data-collections&Itemid=2490
Address: 400 E 14th St, Des Moines, Iowa 50319
POC: Jay Pennington, Bureau Chief
POC Phone: 515-281-4837
POC Email: jay.pennington@iowa.gov

Case Profile
Student Enrollment: 487,559¹⁷
Teachers: 35,961¹⁸
LDS Grant: $8,777,459¹⁹

Background
In 2008, Iowa initiated a project to create EdInsight, the Iowa Department of Education's (IDE) K-12 centralized data warehouse. EdInsight integrated seven years of historical data from Project EASIER (student-level enrollment and curriculum data), IMS (special education data), and the Iowa Testing Program (student assessment data). The initial budget for the project was $1.22 million, and the total implementation cost was $2.9 million through FY2009. In May 2009, the project was funded by an $8.78 million SLDS grant, which would be used to increase the scope and functionality of EdInsight to be interoperable with postsecondary data systems or to create a consolidated P-16 data system. The LDS team plans to add additional sources of information such as teacher, financial, transcript, workforce, disaster mitigation, and additional assessment data. EdInsight is still in its statewide rollout phase.

System Design and Architecture
IDE decided to use a commercial off-the-shelf (COTS) data model for the EdInsight project because this particular data model was designed specifically for use in the K-12 space. During the design process, the IDE team discovered that some of the data within the system did not fit the COTS model, particularly the post-secondary data. Eventually, however, this post-secondary data was integrated into EdInsight.
¹⁷ State education data profiles. (n.d.). Retrieved from http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=19
¹⁸ Ibid.
¹⁹ Statewide longitudinal data system grant program grantee state: Iowa. (n.d.). Retrieved from http://nces.ed.gov/programs/slds/state.asp?stateabbr=IA



Security
Security is managed through the use of role-based access and training.

Data Usage and Reporting
EdInsight's data is used to conduct analyses and produce reports for education stakeholders, such as IDE staff, who are granted access to data in preformatted reports and advanced data analyses, depending on their role and permissions. Currently, over 150 users have been trained and given access to the system, and more than a dozen preformatted reports have been developed. At this time, there are no plans to allow users to perform ad-hoc queries.
A portion of the SLDS grant will fund the creation of a public portal. This portal will provide aggregate-level data that will be accessible through the Web; however, the portal has not yet been developed.

Lessons Learned
Mr. Pennington stated that gaining buy-in at the regional level was critical to the current success of the project and will continue to be a key factor during its statewide rollout. He observed that the COTS product did not meet IDE's needs and took longer to load and to format the data. To combat this problem, the team is investigating whether to purchase or develop additional modules that will fit the post-secondary and workforce data that IDE intends to integrate into EdInsight.

Key Takeaway
Commercial off-the-shelf solutions must be evaluated carefully and set against the system's current and future requirements in order to attain their cost- and time-saving benefits.



4.3 Army Suicide Mitigation Project
State/Agency: Data Strategies
Web Site: http://www.datastrategiesinc.com
Address: P.O. Box 772, Midlothian, Virginia 23113
POC: Susan Carter, Managing Partner; Kevin Corbett, Managing Partner
POC Phone: 804-965-0003
POC Email: SCarter@DataStrategiesInc.com; KCorbett@DataStrategiesInc.com

Case Profile
# of Records: Unavailable
Project Budget: Unavailable

Background
Due to an increase in suicides, the United States Army hired Data Strategies to design and pilot a prototype data system that would gather information from disparate sources in order to identify predictors of potential suicides. The goal of the pilot project was to utilize this information to establish a means to stem the number of suicides and suicide attempts. In order to achieve this goal, Army leaders realized that they would need an integrated data environment that would provide accurate and reliable data for analysis.
The project team's challenge was to develop a system that would rely on numerous databases,²⁰ both government and private, that had not been linked previously. The system first must analyze historical data of suicide cases from 2001 to 2008 in order to determine if there are any commonalities. These commonalities then will be matched against the records of current soldiers in the hope of identifying those who may be at a high risk for suicide.
In terms of database management systems, although the sample group was relatively small, the size of the records was large. Because of the sensitivity of the topic and the need to ensure the soldiers' privacy and the security of their information, personally identifiable information was removed or de-identified. Further, since the Army had no stringent performance requirements, such as ad-hoc queries into the system, the majority of the analyses were performed on historical data that was static. As a result, the time from query to data delivery could be allowed to take days.

²⁰ Some of the data sources included are Army, financial, and medical. The project team must negotiate HIPAA requirements, which will impact the Army's ability to aggregate the data.

System Design and Architecture
The final design of the system had not yet been determined at the time of the interview, partly because many of the leaders of the planned data sources had not yet signed memoranda of understanding (MOU) or data sharing agreements. Because these data sources were from different industries, they did not follow a shared schema, governance, or, in many cases, data types. At that time, Data Strategies planned to investigate a number of design types that would allow the Army a choice of data types in the future. The project team considered various architectures and schemas, while remaining open to various data types (e.g., Excel, Oracle, and flat file types) to ensure that the system could be flexible and expandable.
A sandbox environment was created as a centralized data warehouse that copied data from its data sources. This warehouse was used by Data Strategies because they were not allowed direct access to the data sources for the prototype development; however, the sandbox allowed Data Strategies to mimic centralized, distributed, or federated database management systems. For the purpose of the prototype, queries were not submitted live across the internet but, instead, were run against the sandbox environment. This meant that the real performance of the system was not measured; this was acceptable, since speed was not a requirement of the system at this stage of development.

Security
In order to meet the security requirements set forth by the program, the Data Strategies team de-identified the records from the various databases, but still had to be able to link the data to perform the analyses. To accomplish this, the team created unique identification (ID) numbers.

They were able to link this unique ID to the records of each of the databases in the following two ways:
1. They searched for an individual's records that contained an existing unique ID and then pushed that unique ID to all the remaining data sources.
2. They found a unique ID contained within each of the data sources and created a table of those IDs at the central data warehouse.
Due to the relatively small subject group, both of these solutions worked.

Data Usage and Reporting
The goal for the system is to have de-identified aggregate data that will allow only authorized users within the Army to analyze the data. The data and reports were not made available to the public or to any participating data sources.

Lessons Learned
Ms. Carter explained that although there were many architectural and technological barriers to the project, the single most complex obstacle to overcome within this project was stakeholder management among the various data sources. Although the various Army agencies were under a mandate by the Secretary of the Army to participate in this pilot program, the external agencies providing information were not. The MOU and data sharing agreements²¹ had yet to be negotiated and signed, and it was necessary that these documents be executed prior to Data Strategies accessing the data source information and integrating it into the Army suicide system.
Due to the number of data sources and the underestimation of resources needed to manage stakeholders and execute these tasks, many of the MOUs and data sharing agreements were not signed during the pilot project. A final implementation of this system would require MOUs and data sharing agreements that could take years to be signed. Ms. Carter recommended that organizations planning to construct a longitudinal database make sure that they plan to commit resources to the development of the MOUs and data sharing agreements, as well as to the management of the various stakeholders, well in advance of the project launch.

Key Takeaway
Data governance and stakeholder management are upfront efforts, but they also require ongoing efforts that should not be overlooked. These two efforts are essential to ensure the reliability and expandability of the system.

²¹ These agreements determined who would participate, what data would be shared, how it was to be utilized, and where it could be stored.



4.4 Texas Education Agency
State/Agency: Texas Education Agency
Web Site: http://www.texaseducationinfo.org/tpeir/
Address: Information Analysis, TPEIR Group, Texas Education Agency, 1701 North Congress Avenue, Austin, Texas 78701
POC: Brian Rawson, Director, Statewide Data Initiatives; Nina Taylor, Director of Information Analysis
POC Phone: 512-463-9437; 512-475-2085
POC Email: Brian.Rawson@tea.state.tx.us; Nina.Taylor@tea.state.tx.us

Case Profile
Student Enrollment: 4,752,148²²
Teachers: 327,905²³
LDS Grant: $18,195,078²⁴

Background
In 2001, the Texas Legislature funded a project to build an integrated data repository for the Texas Education Agency (TEA), the Texas Higher Education Coordinating Board (THECB), and the State Board for Educator Certification (SBEC). The project became known as the Texas PK-16 Public Education Information Resource (TPEIR) Project. Half of TPEIR's original $7 million appropriation for the public access initiative was set aside for FY2002 and FY2003.²⁵ The system that resulted from the project team's work integrates data from disparate data sources at each of the participating agencies. These data include student, educator, and organizational data from as far back as 1989.
TPEIR was designed to give stakeholders within Texas an efficient and effective way to obtain high-quality data, and to link student data from early childhood through postgraduate study so that longitudinal analysis could identify patterns and trends within the Texas public education system. TPEIR data were intended to be available to internal staff as well as to the public.
22 State education data profiles. (n.d.). Retrieved from http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=48
23 Ibid.
24 Statewide longitudinal data system grant program, grantee state: Texas. (n.d.). Retrieved from http://nces.ed.gov/programs/slds/state.asp?stateabbr=TX
25 The final total cost of the project was $6.1 million, with $1.75 million spent in FY2002 and $4.35 million spent in FY2003. In May 2010, Texas won an $18.2 million SLDS grant, the second highest grant amount awarded.

Project management of TPEIR was a complex and ongoing effort. The system was managed by two advisory groups: the Interagency Steering Committee (ISC), composed of the Information Resources Managers of each agency, and the Technical Advisory Group (TAG), composed of the project managers of each agency. The ISC met twice a month to determine policy, review risks, and resolve issues, and the TAG met weekly to determine the technical infrastructure, plan the practical implementation, and resolve technical issues.

System Design and Architecture
TPEIR was designed with two distinct data repositories. One repository houses aggregated data²⁶ that has been de-identified and approved for public release. In order to comply with federal and state standards, the other repository contains confidential, student-level education data that is available only to authorized users.
The development of the system was outsourced to a vendor. The resulting custom system design adopted a combination of the Ralph Kimball (i.e., a conglomerate of data marts) and Bill Inmon (i.e., a single data warehouse) methodologies as the foundation for the data warehouse, an approach similar to that of the TEA K-12 data warehouse.

The data warehouse stores facts/metrics within fact tables and codes within dimension tables. An AIX server was used during the development and testing processes, but the final data collection was moved to a production server.
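Storing facts in fact tables and codes in dimension tables is the dimensional (star-schema) layout associated with the Kimball approach. The following is a minimal sketch using Python's built-in sqlite3 module; the table names, program codes, and counts are hypothetical and are not taken from TPEIR. Fact rows carry the measures plus keys into the dimensions, and a report is a join followed by an aggregate.

```python
import sqlite3

# Hypothetical star schema; table names, codes, and counts are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_school_year (year_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE dim_program (program_code TEXT PRIMARY KEY, description TEXT);
-- The fact table stores the measure plus foreign keys into the dimensions.
CREATE TABLE fact_enrollment (
    year_key INTEGER REFERENCES dim_school_year(year_key),
    program_code TEXT REFERENCES dim_program(program_code),
    student_count INTEGER
);
INSERT INTO dim_school_year VALUES (1, '2008-09'), (2, '2009-10');
INSERT INTO dim_program VALUES ('CTE', 'Career and technical'), ('GEN', 'General');
INSERT INTO fact_enrollment VALUES
    (1, 'CTE', 120), (1, 'GEN', 880), (2, 'CTE', 150), (2, 'GEN', 870);
""")

# A typical report: join the facts to their dimensions, then aggregate the measure.
rows = conn.execute("""
    SELECT y.label, p.description, SUM(f.student_count)
    FROM fact_enrollment AS f
    JOIN dim_school_year AS y ON f.year_key = y.year_key
    JOIN dim_program AS p ON f.program_code = p.program_code
    GROUP BY y.label, p.description
    ORDER BY y.label, p.description
""").fetchall()

for label, program, total in rows:
    print(label, program, total)
```

Because each code's description lives in a single dimension row, renaming a program changes one row rather than every enrollment fact.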
TPEIR currently integrates data from two data sources into a centralized database, but its architecture framework allows new data sources to be added to extend the system's capabilities. Figure 2 illustrates the TPEIR architecture framework.
Figure 2: TPEIR Architecture Chart²⁷

26 This data can be accessed at http://www.texaseducationinfo.org.
27 Texas Education Agency, Information Analysis Division. (2010). Texas PK-16 public education information resource. Retrieved from http://www.texaseducationinfo.org/tpeir/TPEIR_Documentation.pdf

Security
The team protected the system's data by creating dimension tables that used surrogate keys (arbitrary, system-generated values) as unique identifiers. These keys, rather than source identifiers, were used to perform the linkages in the system.
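The surrogate-key approach can be sketched as follows. This is a minimal illustration that assumes a hypothetical state student ID as the source identifier; the class name, field names, and sequential key generator are illustrative choices, not TPEIR's implementation. Warehouse rows and joins use only the meaningless key, while the mapping back to the source identifier is held separately.

```python
from itertools import count


class SurrogateKeyRegistry:
    """Hypothetical crosswalk from source identifiers to arbitrary surrogate keys.

    In practice the crosswalk would sit behind tighter access controls than
    the warehouse tables that use the keys.
    """

    def __init__(self) -> None:
        self._next_key = count(1)              # system-generated, meaningless integers
        self._crosswalk: dict[str, int] = {}   # source identifier -> surrogate key

    def key_for(self, source_id: str) -> int:
        """Return the existing surrogate key, or mint a new one on first sight."""
        if source_id not in self._crosswalk:
            self._crosswalk[source_id] = next(self._next_key)
        return self._crosswalk[source_id]


registry = SurrogateKeyRegistry()

# Two agencies describe the same student with the same (hypothetical) state ID.
k12_rows = [{"state_id": "TX-0001", "hs_grad_year": 2005}]
higher_ed_rows = [{"state_id": "TX-0001", "college_enrolled": True}]

# Warehouse rows carry only the surrogate key, never the source identifier.
k12_facts = [{"student_key": registry.key_for(r["state_id"]),
              "hs_grad_year": r["hs_grad_year"]} for r in k12_rows]
he_facts = [{"student_key": registry.key_for(r["state_id"]),
             "college_enrolled": r["college_enrolled"]} for r in higher_ed_rows]

# Linking the two agencies' records is a join on the surrogate key.
linked = [{**a, **b} for a in k12_facts for b in he_facts
          if a["student_key"] == b["student_key"]]
print(linked)  # [{'student_key': 1, 'hs_grad_year': 2005, 'college_enrolled': True}]
```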

Data Usage and Reporting
The system's reporting component uses a Crystal Reports solution. TPEIR data is available to the public and to authorized stakeholders. The publicly available data are used to report on Texas public high school graduation; Texas college and university admissions, enrollment, and graduation; teacher certification, employment, and retention; and school district employment. A complete list of publicly available reports can be found at http://www.texaseducationinfo.org/.
Authorized TEA staff members use Rapid SQL or SAS to run queries against the data and to generate reports or extract data to files. Results of smaller queries are returned in as little as a few seconds, while larger queries may take several minutes.

Lessons Learned
Integrating data from three different agencies and conducting multiple data collections while preserving the original data was a problem that the TPEIR team faced early in the planning of the SLDS. Preserving the original data was important so that each agency could recreate historical results if necessary. The team conformed data across the agencies and defined standards that would apply to current and future data collections. They held regular meetings of the Interagency Steering Committee (ISC) and Technical Advisory Group (TAG) to exchange information, review changes, resolve issues, and establish consensus.

Key Takeaway
For systems that require the source data to maintain its data integrity, data governance is critical. It is through stringent standards and rule definitions that the data sources are able to share their data for use in the SLDS while preserving the original database systems.
Another problem the project team faced was implementing the public-facing Web site that would allow the public to access data. TEA wanted to minimize the maintenance requirements for this Web site. To accomplish this, the TPEIR team used commercial off-the-shelf software and minimized customization of the software tools used to maintain the Web site. These tools allowed the developers to provide metadata, common educational terminology, online help pages, standard reporting formats, simple navigation, and options to view the data in text and/or graphic formats without large expenditures. As a result of following these best practices, TPEIR's final expenditures were nearly twelve percent under budget.

4.5 DLA Data Convergence and Quality Project
State/Agency: Data Strategies
Web Site: http://www.datastrategiesinc.com
Address: P.O. Box 772, Midlothian, Virginia 23113
POC: Susan Carter
POC Phone: 804-965-0003
POC Email: SCarter@DataStrategiesInc.com

Case Profile
# of Records: 7 million base records (each record had approximately 15 to 20 associated records)
Project Cost: $2.5 million over 5 years

Background
In 2002, the Defense Logistics Agency (DLA) hired Data Strategies to vet its process and systems issues in implementing a Business Systems Modernization (BSM) program. This five-year project was part of a larger system overhaul that DLA implemented over a period of more than 10 years. The BSM program was implemented to upgrade the procurement and financial systems that managed DLA's supply chain management processes. The new process required DLA to deliver accurate information and data for business, profiling standards, business rules, and processes. However, because the three centers had evolved separately over time, it was difficult to merge them.
Originally, DLA began with three main supply centers that performed the same functions on different items. When the project first began 50 years ago, the three centers had identical architecture and business processes. Over time, however, the three centers evolved and began using different methods and business rules. In 2002, DLA initiated a massive merger of the three centers. The goal was to make the centers interoperable and compliant with the new business rules DLA was developing in order to centralize the data. In other words, although the data centers remained physically separate, the data was to be virtually integrated and presented as if located in one central place, since users needed the ability to retrieve procurements that were located in more than one data center.
This integration meant that the data had to appear to users to be in one place so that they could query across the centers. The project had three main stakeholder groups, the owners of each of the three data centers, and a fourth entity, an umbrella record center called the Logistics Information Group (LIG). The LIG approved everything that was done to the system and records, as well as any cleansing. Moreover, the LIG maintained records of what items the Defense Department could purchase and their negotiated prices. In essence, the LIG was the gatekeeper of what items could be procured. The LIG information became known as the "golden record." Although this group owned the golden record, its members were
not users of the system. While the three supply centers had their own functions and governing teams, all data had to comply with and match the golden record.

System Design and Architecture
It became clear that DLA needed to develop a set of uniform business rules that would govern all centers. There was a tremendous amount of resistance from each of the centers, as each group was unwilling to sacrifice control or autonomy. To overcome this resistance, Data Strategies facilitated discussions among the centers to reach consensus on the business rules. Data Strategies worked with each of the disparate data source owners to understand not only how each of the centers and systems captured, processed, and stored data, but also how they would each need to interact once all systems were integrated.
The process involved negotiations between DLA headquarters and the centers on how the data is used, how it should appear, and what specific exceptions to the rules would be required because of each center's unique item/record types and security requirements. In the end, based on the definitions provided by DLA and the feedback from the centers, Data Strategies recommended a single set of business rules, with defined exceptions for each center.
Once the business rules were created, Data Strategies surveyed each data center and assessed its level of data cleanliness against the new set of business rules. During the data cleansing process, the team:

• ensured that the source data and the converged data maintained their independence without losing the context of either;
• analyzed the business rules associated with the data;
• identified the rules and metrics required to validate that data would meet the new business rules;
• created automated routines to run complex queries that analyzed the data and identified records with anomalies, displaying the results in both summary and detailed reports that showed the anomalies and the recommended solutions; and
• applied an approach using extensive heuristics and pattern-matching code to overcome embedded data issues.²⁸
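The rule-based assessment described above (a single set of business rules with defined per-center exceptions, reporting anomalies in both summary and detail) can be sketched as follows. The rule names, field names, and center names are hypothetical; the report does not list DLA's actual rules or describe how the automated routines were implemented.

```python
from collections import Counter
from typing import Callable

# Hypothetical rules and fields; DLA's actual business rules are not given in the report.
Rule = Callable[[dict], bool]

RULES: dict[str, Rule] = {
    "price_positive": lambda r: r.get("unit_price", 0) > 0,
    # National Stock Numbers are 13 digits; assumed stored here without separators.
    "nsn_13_digits": lambda r: str(r.get("nsn", "")).isdigit() and len(str(r["nsn"])) == 13,
}

# "Defined exceptions": rules that a given center is exempt from.
EXCEPTIONS: dict[str, set[str]] = {"center_b": {"price_positive"}}


def assess(records: list[dict]) -> tuple[Counter, list[tuple]]:
    """Return a summary count per (center, rule) and a detailed anomaly list."""
    detail = []
    for rec in records:
        exempt = EXCEPTIONS.get(rec["center"], set())
        for name, rule in RULES.items():
            if name not in exempt and not rule(rec):
                detail.append((rec["center"], rec["id"], name))
    summary = Counter((center, name) for center, _, name in detail)
    return summary, detail


records = [
    {"id": 1, "center": "center_a", "unit_price": 10.0, "nsn": "5305009999999"},
    {"id": 2, "center": "center_a", "unit_price": 0, "nsn": "12345"},
    {"id": 3, "center": "center_b", "unit_price": 0, "nsn": "5305009999999"},
]
summary, detail = assess(records)
print(detail)  # [('center_a', 2, 'price_positive'), ('center_a', 2, 'nsn_13_digits')]
print(summary[("center_a", "nsn_13_digits")])  # 1
```

Record 3 has a zero price but is not flagged, because center_b holds a defined exception for that rule.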
The base population of data was 7 million records, and each record had 15 to 20 associated records. Data Strategies divided them into lots and evaluated each lot every 2 weeks over 18 months. This did not reduce the quantity of data, but it ensured that the data was compliant and collaborative.
The three data centers were integrated in virtual space so that users could access and query data and reports regardless of the source. For end users, Data Strategies created narrow, role-based views of the personal systems so that users could view the data that pertained to them and for which they were cleared. The reports comprised HQ-level statistics and trends that outlined the current status of data quality as a whole (indicating the risk to migration success), along with detailed information at the Source Owner, Table, and Attribute levels to identify where the largest issues were.
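A narrow, role-based view can be thought of as a filter combining the data a role pertains to with the clearance it holds. A minimal sketch, assuming a hypothetical numeric sensitivity scale and role definitions; the report does not describe how DLA's views were implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: which centers it covers and how sensitive a row it may see."""
    name: str
    centers: frozenset[str]
    max_sensitivity: int  # assumed scale: 0 = least sensitive


def role_view(role: Role, records: list[dict]) -> list[dict]:
    """Keep only rows from the role's centers at or below the role's clearance."""
    return [r for r in records
            if r["center"] in role.centers and r["sensitivity"] <= role.max_sensitivity]


records = [
    {"id": 1, "center": "center_a", "sensitivity": 0},
    {"id": 2, "center": "center_a", "sensitivity": 2},
    {"id": 3, "center": "center_b", "sensitivity": 0},
]
buyer = Role("center_a_buyer", frozenset({"center_a"}), max_sensitivity=1)
hq = Role("hq_analyst", frozenset({"center_a", "center_b"}), max_sensitivity=2)

print([r["id"] for r in role_view(buyer, records)])  # [1]
print([r["id"] for r in role_view(hq, records)])     # [1, 2, 3]
```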

28 Data Strategies. (n.d.). DLA Data Convergence and Quality Project.

Ms. Carter credits much of the project's success to the solution's ability to create detailed and dashboard views of the results, as well as its recommendations for problem resolution.

Lessons Learned
For large data integration projects, Ms. Carter recommends several best practices.
Ms. Carter's first recommendation is to mentor the stakeholders involved in the data integration so that they understand the history of what is being done and why. This gives these groups the information they need to make decisions regarding the project. Conversely, project implementers and leaders must solicit feedback from stakeholders in order to understand their requirements as owners and users of the data. This two-way communication will help an organization establish effective data governance rules.
For Data Strategies, informing the three centers' stakeholders of the project's purposes and receiving their feedback was an advantage in bringing the three centers together. As a neutral intermediary, Data Strategies was well received because stakeholders felt they had a voice in the development of the new business rules. Ms. Carter explained, "A neutral approach is important in overcoming political silos. In the end, if the different agencies' needs are not met, the new system will be useless."²⁹
Throughout a project, there must be good communication between parties that is conducive to a collaborative environment. A constant review of the project goals and status is essential to keeping the team on track. The way the architecture is established at the beginning is important because it lays the foundation for the rest of the project. System designers need to understand the purposes of the warehouse before they map and design the architecture. Moreover, a complex project requires an open vendor that will not constrain the design of the data architecture or the project. Finally, VDOE must ensure that the technology chosen can be used by everyone, and must keep the technology simple for longevity.

Key Takeaway
Managing stakeholders, gathering input from them, and being their advocate are imperative in building business rules. Having input from stakeholders and acting as their champion results in less friction among stakeholders.

29 Carter, S. (2010, November 5). Telephone interview with Rona Jobe, CIT.

4.6 NORC at the University of Chicago Data Enclave
State/Agency: NORC at the University of Chicago Data Enclave
Web Site: http://www.norc.org/DataEnclave/
Address: 1155 East 60th Street, Chicago, Illinois 60637
POC: Timothy Mulcahy
POC Phone: 301-634-9330
POC Email: mulcahy-tim@norc.org

Case Profile
# of Records: No actual number given, but at any time, the system processes 40 million records
Project Cost: $750,000 (initially)

Background
The National Opinion Research Center (NORC) Data Enclave is a secure virtual environment for storing and analyzing sensitive microdata. The Enclave provides a confidential, protected environment within which authorized researchers can access sensitive microdata remotely.
A brief summary from the NORC website:
While public use data can be disseminated in a variety of ways, there is a more limited range of options for disseminating sensitive micro-data that have not been fully de-identified for public use. Some data producers have sufficient economies of scale to develop advanced in-house solutions that serve the needs of external researchers, but most lack the resources to archive, curate, and disseminate the datasets they have collected. The NORC Data Enclave provides our partner organizations a secure platform where they can both host and build a research community around their data.³⁰
The NORC Data Enclave³¹ was established in the early 2000s; however, the build-up to the project can be traced back decades. There had been some movement within government agencies and other organizations to give researchers and research organizations access to microdata with sensitive content. In 2002, the Confidential Information Protection and Statistical Efficiency Act was passed; it mandated that all federal statistical agencies develop a plan to provide some level of access to some parts of their microdata. In 2006, the National Institute of Standards and Technology (NIST) released a Request for Proposal (RFP) that described the need to conceptualize and build a secure remote access modality that could provide both on-site and remote access to microdata, as well as direct access to the raw microdata.

30 Data Enclave: NORC at the University of Chicago. (n.d.). Retrieved from http://www.norc.uchicago.edu/DataEnclave/
31 The Enclave's design and development costs were approximately $750,000. The third iteration is planned for February 2011 and may receive an additional $500,000 to $750,000 in funding.
Originally, statisticians, lawyers, and agency leaders were very concerned about allowing researchers access to raw data, and these groups developed plans to perturb the data before allowing researchers access. However, in 2006, there was a significant change in thinking about decades' worth of engineering and science on how to perturb data for researchers. The new model allowed researchers, other government agencies, and private sector organizations to access the actual raw microdata rather than only perturbed data.
This shift in policy occurred when NORC demonstrated³² that research results differed remarkably depending on whether a researcher used perturbed data or raw data. In some cases, research based on perturbed data yielded results opposite to those that would have been obtained had researchers been allowed to access the raw data.³³ This revelation caused leaders in government agencies to question whether previous policies and programs derived from research performed with perturbed data were based upon false assumptions. As a result, the statisticians, lawyers, and agencies who originally opposed the idea of providing raw data to researchers changed their stance on the matter.
The challenge became finding the true results and making the data available in the public domain while protecting the confidentiality of the provider of the data or survey. Shortly thereafter, the project was launched. It is sponsored by the National Institute of Standards and Technology, the Kauffman Foundation, the Department of Agriculture, the National Science Foundation, and the Annie E. Casey Foundation.
The NORC Data Enclave team's goal was to provide a secure remote access modality that was technologically and operationally sophisticated, reasonable in cost, compliant with replication standards, and able to push the risk of a breach as close to zero as possible.³⁴
Additionally, the team aimed to provide remote access. Until the NORC project, gaining access to sensitive data was a cumbersome and time-consuming process for researchers. Researchers had to access and analyze the data on site and were not allowed to leave the building with any data. The process required that researchers be mailed their data and analyses only after an internal statistician had carefully reviewed them. The NORC Data Enclave aimed to relieve researchers of that burden.
In summary, the aim of the new system was to share social science data in a secure manner. NORC plans to promote access to sensitive business microdata; protect confidentiality; archive, index, and curate microdata; and encourage researcher collaboration.

32 Mulcahy, T. (2010, October 28). Telephone interview with Rona Jobe, CIT.
33 Ibid.
34 Mulcahy, T. (2010, October 28). Telephone interview with Rona Jobe, CIT.

System Design and Architecture
The Enclave is constantly being enhanced. It had a soft launch in 2006, with a six-month incubation period (July through December 2006). Researchers and focus groups were consulted in December 2006. NORC staff collected and responded to feedback and then opened the Enclave in March 2007. In designing and building the Enclave, NORC employed a 14- to 16-person team of engineers, researchers, information technologists, and metadata specialists, among others, and began with white-boarding and mock-ups. The initial white-boarding process lasted many months and involved numerous meetings, feedback sessions, and resolutions. The design process was iterative, although no scenarios of how the data would be used were created because the range of possible research questions is effectively unlimited. Thus, the focus during the design process was a convenient, secure remote access system that was virtually impregnable.
The core infrastructure is, essentially, a standard implementation with Citrix security requirements for remote access. Citrix provides layers of security for remote access in general; however, the Enclave team customized the environment by adding specialized tools that researchers would need to perform certain types of analyses (e.g., statistical packages). For data management, NORC has built into the system ways to package the data for researchers, track what researchers are doing with the data, and so on.

Security
Before data is loaded onto the Enclave, it is cleaned so that it is harmonized with the data sets already in the system. Every data set that enters the Enclave must go through a DDI (Data Documentation Initiative) checklist to be DDI-compliant, and time-series data must also be SDMX (Statistical Data and Metadata Exchange)-compliant. NORC also employs a metadata services team and an IHSN microdata management toolkit.
NORC takes a portfolio approach to security by bundling multiple protections. The system uses the Citrix client's built-in security measures for front-end security.

Data Usage and Reporting
To access data in the Enclave, every researcher must first go through a vetting process by each of the sponsors. Each sponsor sets the rules on who is eligible and who will be authorized to access the data. Once researchers pass this vetting process, they must submit proposals and substantiate why they need raw data from the Enclave; in essence, why the public-use data is not adequate for their research. For federal statistics data sets, researchers must substantiate that their research is within the mission of the federal agency and that the data required is for pure research purposes, not for marketing, law enforcement, etc. The proposals must include the planned statistical and dissemination methods and potential outlets. In general, there are several contractual steps before any researcher can be granted access to raw data. Currently, some of the data available within the Enclave include:

• NIST-TIP
o ATP Survey of Joint Ventures (JV)
o ATP Survey of Applicants
o Business Reporting Survey Series (BRS)
• USDA/ERS/NASS
o Agricultural Resource Management Survey (ARMS)
• National Science Foundation
o Survey of Earned Doctorates (SED)
o Survey of Doctorate Recipients (SDR)
• Kauffman Foundation
o Kauffman Firm Survey (KFS)
• Annie E. Casey
o Making Connections Survey (MC)

Access to the Enclave is gained through a Web site portal at https://enclave.norc.org. Users are required to download the Citrix client to their desktops and use the usernames and passwords provided to them. Inside the Enclave are collaboration tools, statistical software packages, discussion forums, and similar resources (see Figure 3: NORC Data Enclave Screenshots for sample screenshots from the NORC Enclave presentation). However, although researchers are able to collaborate with other researchers within the Enclave, they are not allowed to talk to one another or share data.
Figure 3: NORC Data Enclave Screenshots (panel: Data Documentation & Shared Code Libraries)

When data is provided to researchers, certain information is stripped from it, e.g., social security numbers, addresses, birth dates, and other obvious identifiers. The level of data stripping depends on which agency has supplied the data. In some cases, agencies will allow access to the raw microdata, though with some data noise. The Enclave aims to provide as much data granularity to its users as possible, so that their research results are as accurate as possible.
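The per-agency stripping described above can be sketched as a release step that drops the identifiers each supplying agency lists and, where an agency allows raw microdata with some noise, perturbs selected numeric fields. The policy table, field names, and the use of Gaussian noise are assumptions for illustration; the report says only that the level of stripping depends on the agency and that some data may carry noise.

```python
import random

# Hypothetical per-agency release policies; real rules are set by each sponsor.
POLICIES = {
    "agency_a": {"strip": {"name", "ssn", "address", "birth_date"}, "noise_sd": 0.0},
    "agency_b": {"strip": {"name", "ssn"}, "noise_sd": 500.0},
}


def prepare_for_researchers(records, agency, numeric_fields=(), seed=None):
    """Drop the agency's listed identifiers, then add Gaussian noise if the policy asks for it."""
    policy = POLICIES[agency]
    rng = random.Random(seed)  # seeded only so the example is reproducible
    released = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k not in policy["strip"]}
        if policy["noise_sd"] > 0:
            for name in numeric_fields:
                row[name] = row[name] + rng.gauss(0.0, policy["noise_sd"])
        released.append(row)
    return released


source = [{"name": "A. Smith", "ssn": "000-00-0000", "address": "1 Main St",
           "birth_date": "1970-01-01", "income": 50000}]

print(prepare_for_researchers(source, "agency_a", ["income"]))  # [{'income': 50000}]
noisy = prepare_for_researchers(source, "agency_b", ["income"], seed=1)
print(sorted(noisy[0]))  # ['address', 'birth_date', 'income']
```

The source records are left unchanged; each release is a new copy shaped by the supplying agency's policy.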
The system currently has more than 200 researchers across various sponsor areas. The number of users is expected to increase to more than 300 within the next six months and, potentially, to more than 600 in the next two or three years.

Lessons Learned
For Virginia, one solution will not work; its LDS will require a mixture of different solutions.
There are several options that the state could pursue, especially if data will be shared with the public, but security and confidentiality must be preserved. One method of sharing data with the public is through a batch execution job. In essence, a researcher submits a research question or query, an internal staff member runs the analysis and reviews the output for disclosure risk, and the output is then returned to the researcher.
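A batch execution job can be sketched as a staff-side function that runs the requested aggregate and applies a disclosure check before anything is returned. This minimal sketch assumes a minimum cell size rule with a hypothetical threshold of 10; actual disclosure review criteria are set by each data owner and are usually broader than a single threshold.

```python
from collections import defaultdict

MIN_CELL_SIZE = 10  # hypothetical threshold; real disclosure rules are set by the data owner


def run_batch_request(records, group_by, value_field):
    """Staff-side execution: compute group means, then suppress small cells before release."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_by]].append(rec[value_field])

    released = {}
    for key, values in sorted(groups.items()):
        if len(values) < MIN_CELL_SIZE:
            released[key] = None  # suppressed: too few records to release safely
        else:
            released[key] = {"n": len(values), "mean": sum(values) / len(values)}
    return released


# A researcher's submitted request, run internally against confidential records.
records = ([{"region": "north", "score": s} for s in range(10)]
           + [{"region": "south", "score": 5} for _ in range(3)])
result = run_batch_request(records, "region", "score")
print(result)  # {'north': {'n': 10, 'mean': 4.5}, 'south': None}
```

The researcher never touches the records themselves; only the reviewed, suppressed output leaves the secure environment.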
Mr. Mulcahy advises that it is now more efficient for people to build their own systems. Moreover, the different components in a system, e.g., business intelligence tools, should complement one another and not compete with each other. When planning to build, keep your options open.
In the area of project management and building the architecture, ensure that staff and personnel have defined roles and responsibilities. Ensure that the project has a well-defined plan and goals. NORC notes that in such technical projects, the technical issues are the easiest to overcome; project management and stakeholder management are the more difficult tasks.

Key Takeaway
In building an LDS that shares information with the public, keep your options open and do not restrict yourself to one product solution.

5 Subject Matter Expert Interviews
The research team interviewed ten Subject Matter Experts (SMEs) in order to provide the VLDS team with specific feedback on barriers, risks, design issues, and opportunities associated with implementing a large data system. The following individuals were interviewed. Based on the information collected during these interviews, the project team consolidated themes into the key points presented in this section. Specific key points from each interview are provided in Section 3.
SUBJECT MATTER EXPERT | AREA OF EXPERTISE
Dr. Bhavani Thuraisingham | Information Security; Information Management; Data Management, Mining, and Security; Data Mining for Counter-Terrorism
Mr. Paul Carney | Higher Education; Building Internet-Based Services
Mr. James Campbell | Large Data Integration; Technology and Data Flow
Ms. Susan Carter | Data Management; Information Technology; New Technology Research
Mr. Raj Ramesh | Information Technology; Software Product Development; Enterprise Database Systems/eLearning Systems; Data Warehousing; Very Large Database Systems (VLDB); Web-based Collaborative eLearning Systems; Enterprise Architectures, Portals and CRM
Dr. Ron Kleinman | XML; Java
Mr. Peter Dobler | Software Development; Sybase; SQL
Dr. Laura Haas | Computer Engineering; Systems Design; VLDB (Very Large Database Systems)
Thilini Ariyachandra | Information Systems; Business Intelligence; Data Management and Modeling; Impacts of Social Networking
Dr. Cynthia Dwork | Privacy Preserving Data Analysis; Differential Privacy; Cryptography; Distributed Computing

5.1 Dr. Bhavani Thuraisingham
Title: Director, Cyber Security Center; Professor of Computer Science
Organization: University of Texas at Dallas
Phone: 972-883-4738
Email: bhavani.thuraisingham@utdallas.edu

Background
Dr. Thuraisingham received the IEEE Computer Society's prestigious 1997 Technical Achievement Award for outstanding and innovative contributions to secure data management. Her research in information security and information management has resulted in over 60 journal articles, over 200 refereed conference papers, and three US patents. She is the author of seven books on data management, data mining, and data security, including one on data mining for counter-terrorism.

Key Takeaway
• A federated model works best when the domain of questions is not well known.
• It is important that the data governance be well thought out so that the access controls across the data sources are consistent.

Summary
During the course of the interview, Dr. Thuraisingham made two strong points: one related to the performance of a federated model, and the other related to data governance. She stated that if the domain of questions to be answered by the system is not well known, then the best distributed database model would be federated.
Her major concern was in the area of data governance. Dr. Thuraisingham stated that it is important that the data governance be well thought out so that access controls across the data sources are consistent. She further stated: "You need to ask the question: As data moves up the hierarchy (via joins), does the governance model still work?"
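Dr. Thuraisingham's question can be made concrete with a small check. The sketch below assumes, hypothetically, that each source labels its data with one of three sensitivity levels and that a joined result inherits the most restrictive label among its inputs; the level names, dataset names, and functions are illustrative only and are not part of the SLDS design.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, ordered from least to most restrictive.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Dataset:
    name: str
    level: str  # label assigned by the owning source's data governance


def join_level(*inputs: Dataset) -> str:
    """A joined result inherits the most restrictive label among its inputs."""
    return max((d.level for d in inputs), key=LEVELS.__getitem__)


def can_access(clearance: str, *inputs: Dataset) -> bool:
    """Permit a query only if the user's clearance covers the joined label."""
    return LEVELS[clearance] >= LEVELS[join_level(*inputs)]


if __name__ == "__main__":
    enrollment = Dataset("enrollment", "internal")  # hypothetical sources
    wages = Dataset("wage_records", "restricted")
    print(join_level(enrollment, wages))             # restricted
    print(can_access("internal", enrollment))         # True
    print(can_access("internal", enrollment, wages))  # False
```

In the example, a user cleared for each source's "internal" data alone is still refused the joined result, which is the kind of gap her question is meant to surface.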

5.2 Paul Carney
Title: Vice President, Technical Services
Organization: Natural Insight
Email: pcarney@naturalinsight.com

Background
Combining best technical practices with management know-how, Mr. Carney has excelled in the fields of higher education, consulting and Internet-based services. He launched his first Internet business in 1997, and has since helped build additional Internet-based service organizations in both consumer and business environments. More recently, Mr. Carney oversaw the development of the Natural Insight solution, a robust resource for managing and optimizing distributed workforces.

Key Takeaway
• Consider a virtualization compute model to manage processing requirements.
• Consider creating multiple hash values based on various personally identifiable information elements.
• Beware of log files created on a system containing transactional data.
d

Summary
Mr. Carney's experience with large scale distributed systems exposed him to the use of virtualization as a means to manage computing resources. Mr. Carney felt that when the resource utilization is unknown, virtualization should be considered to manage processing requirements for the system.
Some of his customers include financial institutions that require protection of personally identifiable information (PII). He stated that creating multiple hash values based on various PII elements should be considered in the implementation architecture. In Mr. Carney's experience, this technique provided multiple opportunities to find a match across the various data sources. When the discussion moved to privacy and the need to protect the individual, Mr. Carney felt that applying a hash algorithm to the PII was a big step in achieving that requirement.
He expressed concern about log files on computing systems. All log files created on a system need to be evaluated for the type of information they contain. It is possible that some log files (e.g., operating system, application) could contain transactional data that would violate privacy policies.
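Part of the log evaluation Mr. Carney called for can be automated. The sketch below flags log lines that appear to contain PII. The two regular expressions (a U.S. Social Security number format and an email address) are assumptions chosen for illustration. Pattern matching of this kind misses PII in other formats, so it supplements, rather than replaces, a review of what each log actually records.

```python
import re
from typing import Iterable, Iterator

# Illustrative patterns only; the real list would come from data governance.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def scan_log(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line_number, pattern_name) for every suspected PII hit."""
    for number, line in enumerate(lines, start=1):
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                yield number, name


if __name__ == "__main__":
    sample = [
        "2011-02-01 12:00:01 INFO query executed in 35 ms",
        "2011-02-01 12:00:02 DEBUG params: ssn=123-45-6789",
    ]
    for hit in scan_log(sample):
        print(hit)  # (2, 'ssn')
```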

5.3 James Campbell
Title: Implementation Strategist
Organization: SIF Association
Phone: 202-607-5491
Email: jcampbell@sifassociation.org

Background
At SIF, Mr. Campbell is responsible for leading and taking ownership of the value SIF adds for members and potential members of the Association in their planned or ongoing SIF implementation and development. Prior to joining SIF, he was the Integration Team Manager for the Oklahoma State Department of Education. In that role, he managed many state and local projects aimed at improving the technology and data flow across 540 school districts and charter schools.

Key Takeaway
• Virginia's SLDS model is definitely different from what other states have developed.
• Most current SIF development efforts are focusing on internal user portals.
• A federated model requires strong data governance.

Summary
Mr. Campbell has exposure to a number of LDS implementations around the nation. He stated that, of all the implementations he knows of, Virginia's SLDS model is definitely different from what other states have developed. He further stated that SIF would be interested in the Virginia solution, as SIF is concerned about integration across states.
The SLDS portal was described to Mr. Campbell as a public-facing and internal-facing implementation. He responded that most SIF development efforts are focusing on the internal user portal; however, the intent is to have one portal for both internal and external users. The requirements on data access are easier to manage for internal users initially. When a public-facing connection is added, the requirements become more difficult.
Towards the end of the interview, the team discussed data governance with Mr. Campbell. He explained that the federated architecture requires strong data governance, as the model promotes a hierarchy of governance. He advised the team that the State of Washington may be a good model to investigate for data governance ideas.

5.4 Susan Carter
Title: Managing Partner
Organization: Data Strategies, Inc.
Phone: 804-965-0003
Email: SCarter@DataStrategiesInc.com

Background
Susan Carter has over twenty years of experience in the Data Management and Information Technology field. Ms. Carter built a successful Women-Owned Small Business (WOSB) in Data Management and IT consulting. Responsible for the research and integration of new technologies into several large corporations, she has helped organizations such as the Defense Logistics Agency (DLA), MCI (WorldCom), E.I. DuPont, and SmithKline Beecham gain increased efficiencies and market share through innovative uses of new and proven technologies.

Key Takeaway
• Multiple personal data elements can be used in combination to create a hash key that will result in more unique IDs.
• Use of multiple hash keys based on various sets of identifiable information allows different databases with different identifiable information to have a higher probability of forming linkages.
• The inability to store linkages will greatly limit performance.
• Any changes to the data sources will require tight data governance.

Summary
Ms. Carter found Virginia's unique privacy requirements for the SLDS similar to data integration work she performed for the Army. This work combined information from various databases, including military, financial, medical, and psychological data. As with the SLDS, forming these data linkages required the utmost security to ensure the privacy of the individuals.
To address this issue, Ms. Carter suggested creating a hash key not from a single data element but from multiple data elements combined in a known way prior to being hashed. Another technique she suggested was the use of multiple hash keys based on a variety of separate or combined data elements. This technique would increase the possibility of creating matches across databases that do not all share the same data elements but have at least one in common. The other advantage of using multiple hash keys as identifiers is the ability to match records when there are errors or inconsistencies in certain data elements, such as a misspelled name in one particular database. The use of multiple hash key identifiers, together with an algorithm that determines confidence in a positive match, would find matches that a single hash key identifier would overlook. The key to successful matching would be identifying which data elements, or combinations, yield the highest number of positive matches while reducing false matches. Ms. Carter suggested that the intelligence agencies (e.g., Department of Homeland Security, CIA, and FBI) are working intensively in the area of matching multiple identifiable elements and developing algorithms that result in high confidence levels of matching.
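The multiple-hash-key technique Ms. Carter described can be sketched as follows. Each record yields one keyed hash per field combination it actually contains, and a simple weighted score stands in for the match-confidence algorithm she mentioned. The field combinations, weights, secret, and function names are hypothetical; a real system would tune the combinations and weights against known matches, as the paragraph above notes.

```python
import hashlib
import hmac

# Hypothetical secret managed by the SLDS. A keyed hash (HMAC) is used because a
# plain hash of low-entropy PII, such as a birth date, can be reversed by brute force.
SECRET = b"replace-with-a-managed-secret"

# Hypothetical field combinations and match weights.
KEY_SPECS = {
    "name_dob": (("first", "last", "dob"), 0.5),
    "ssn": (("ssn",), 0.6),
    "dob_zip": (("dob", "zip"), 0.3),
}


def normalize(value: str) -> str:
    """Lower-case and drop whitespace so trivial formatting differences still match."""
    return "".join(value.split()).lower()


def hash_keys(record: dict) -> dict:
    """Build one keyed hash per field combination that the record fully contains."""
    keys = {}
    for name, (fields, _weight) in KEY_SPECS.items():
        if all(record.get(f) for f in fields):
            combined = "|".join(normalize(record[f]) for f in fields)  # fixed, known order
            keys[name] = hmac.new(SECRET, combined.encode("utf-8"), hashlib.sha256).hexdigest()
    return keys


def match_confidence(a: dict, b: dict) -> float:
    """Sum the weights of the hash keys two records share, capped at 1.0."""
    ka, kb = hash_keys(a), hash_keys(b)
    shared = [name for name in ka.keys() & kb.keys() if ka[name] == kb[name]]
    return min(sum(KEY_SPECS[name][1] for name in shared), 1.0)


if __name__ == "__main__":
    school = {"first": "Jon", "last": "Smith", "dob": "1993-04-12", "zip": "23219"}
    wages = {"first": "John", "last": "Smith", "dob": "1993-04-12",
             "zip": "23219", "ssn": "123-45-6789"}
    print(round(match_confidence(school, wages), 2))  # 0.3: only dob_zip agrees
```

Here the misspelled first name defeats the name-and-birth-date key, and the school record has no SSN, yet the birth-date-and-ZIP key still links the two records at lower confidence: a match a single hash key would have missed.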
Ms. Carter also spoke briefly on Virginia's inability to store linkages persistently. In her experience, this would greatly limit the performance of a system. Ms. Carter has worked with some systems that were designed not to store linkages due to privacy or other policy requirements; those systems were able to adopt this design because privacy was a high priority and performance a low one.
Lastly, the topic of data governance was touched upon during the interview. Ms. Carter stressed the importance of tight data governance in any system whose data sources change regularly or to which new data sources are added.

5.5 Raj Ramesh
Title: CEO
Organization: CTEC
Phone: 703-766-5774
Email: raj@ctec-corp.com

Background
Mr. Ramesh has over 17 years of management and technical experience in information technology and software product development. Mr. Ramesh's areas of technical expertise include: Enterprise Database Systems/eLearning Systems, Data Warehousing and Very Large Database Systems (VLDB), Web-based Collaborative E-Learning Systems, Enterprise Architectures, Portals and CRM. Mr. Ramesh is a co-inventor of a Patent-Pending Task Share platform for the desktop environment and hand-held wireless devices.

Key Takeaway
• Uses a three-tier design process for projects: presentation, business logic, database.
• In data analysis projects with large datasets, users accept delayed results.
• Development of a global schema helps identify data type discrepancies.
• Data integration on-the-fly is a new technology area that has few products.

Summary
Mr. Ramesh, like Mr. Carney, has an extensive background in large scale system integration. His current projects involve multi-source data integration across disparate organizations (Federal Government). He uses a three-tier design process for projects: presentation, business logic, and database. Mr. Ramesh's current projects consist of large data sets (tens of millions of records). In these data analysis projects, users accept long delays in receiving results. As the projects have moved to production and matured, users are now requesting real-time reporting capability.
When working with large disparate data sets, Mr. Ramesh stated that the development of a global schema helps identify data type discrepancies. There are visual mapping tools that provide good assistance in working out the global schema. The final point he made was that, to his knowledge, data integration on-the-fly is a new technology area that has few products.
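How a global schema exposes data type discrepancies can be sketched in a few lines. The source schemas and the column-to-attribute mapping below are hypothetical (the mapping is the kind of correspondence a visual mapping tool helps produce), and exact string comparison of declared types is a deliberately strict assumption.

```python
from collections import defaultdict

# Hypothetical source schemas: source -> {local column name: declared type}.
SOURCES = {
    "k12": {"student_id": "CHAR(10)", "birth_date": "DATE", "grade": "INTEGER"},
    "higher_ed": {"stu_id": "VARCHAR(12)", "dob": "VARCHAR(8)", "gpa": "DECIMAL(3,2)"},
}

# Hypothetical mapping of (source, local column) to a global-schema attribute.
GLOBAL_MAP = {
    ("k12", "student_id"): "student_id",
    ("higher_ed", "stu_id"): "student_id",
    ("k12", "birth_date"): "birth_date",
    ("higher_ed", "dob"): "birth_date",
}


def type_discrepancies(sources: dict, mapping: dict) -> dict:
    """Return global attributes whose mapped source columns disagree on type."""
    types = defaultdict(dict)
    for (source, column), attribute in mapping.items():
        types[attribute][source] = sources[source][column]
    return {attr: found for attr, found in types.items() if len(set(found.values())) > 1}


if __name__ == "__main__":
    for attribute, found in type_discrepancies(SOURCES, GLOBAL_MAP).items():
        print(attribute, found)
```

A birth date stored as DATE in one source and as an eight-character string in another is exactly the kind of mismatch that must be resolved before records can be compared or joined.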

5.6 Ron Kleinman
Title: CTO
Organization: SIF Association
Phone: 202-607-8526
Email: rkleinman@sifassociation.org

Background
Prior to joining SIF, Mr. Kleinman was the Chief Technical Evangelist for Sun Developer Relations, and served as Sun's representative on multiple industry-wide Java and XML standards committees. He has extensive experience consulting with developers who are trying to "java-tize" their existing applications. He has prepared and delivered numerous presentations on Java technologies both in the U.S. and overseas. His particular areas of expertise include Java on the Server (EJBs and server-side APIs), Jini, Java-based device access control and management, and, more recently, XML.

Key Takeaway
• Granularity of searching can impact security policy implementation.
• Real-time hashing could be a performance anchor.
• Use of federated identity concepts may provide a solution for record mapping across sources.
• Use of a central security authority built into the process could enforce strong data protection in the system.
• Separate the crosswalk table from the rest of the system with a firewall.

Summary
Dr. Kleinman spent the majority of the interview focused on the security implementation for the LDS. He pointed out that the granularity of searching can impact security policy implementation: the finer the search resolution the implementation allows, the more rigorously the security policy needs to be defined. Dr. Kleinman noted that the use of a central security authority could enforce strong data protection in the system.
Dr. Kleinman recommended that the team consider federated identity concepts to provide a solution for record mapping across sources. This federated identity can be used to enforce consistent security policies across the data sources. He felt that security measures provided by operating systems, database management systems, firewalls, and routers can add to the security implementation for the LDS. For example, the linking/crosswalk table in the design could be separated from the rest of the system with a firewall.
A final point made by Dr. Kleinman was the concern that real-time hashing could be a performance anchor for the LDS. Some users may find the delay in results to be unacceptable.

5.7 Peter Dobler
Title: President
Organization: Dobler Consulting
Phone: 813-322-3240
Email: pdobler@doblerconsulting.com

Background
Mr. Dobler started his professional career more than twenty-two years ago in software development. After working many years as a consultant for the three largest Swiss banks, he founded his own consulting business in 1997. Mr. Dobler is a recognized expert in Sybase ASE, Sybase Replication Server and Sybase IQ. He also has many years of Oracle experience, including the latest 11g release. Mr. Dobler also has in-depth knowledge of SQL Server 2000 and 2005.

Key Takeaway
• The problem with a federated database is the performance.
• Create a hash using not only identifiable information but external data as well, such as the source.

Summary
The majority of the discussion with Mr. Dobler focused on the performance of the federated database and how to improve its efficiency. If Virginia were able to filter the data in various ways before it enters the SLDS's data engine, performance could be improved significantly. He suggested off-the-shelf solutions, including Sybase IQ and products from SAP.
Mr. Dobler also weighed in on the use of hash key identifiers and suggested a hash key based on a combination of data element(s) and external data. In one example, he combined PII data within a record with the source database identifier. The use of external data would help to add a layer of protection and information to the hash key.
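Mr. Dobler's suggestion can be sketched as a keyed hash whose input includes the source database identifier alongside the PII. The HMAC construction, the secret, the field separator, and the function name are assumptions for illustration. One consequence worth weighing: because the source identifier is part of the hashed input, the same individual receives a different key in each source, so a key of this form distinguishes records by origin; linking across sources would still rely on a source-independent key.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-managed-secret"  # hypothetical key held by the SLDS


def source_scoped_key(source_id: str, *pii_fields: str) -> str:
    """Keyed hash over the source database identifier plus normalized PII fields."""
    normalized = [field.strip().lower() for field in pii_fields]
    message = "|".join([source_id, *normalized]).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    k1 = source_scoped_key("DB-A", "Smith", "1993-04-12")
    k2 = source_scoped_key("DB-B", "Smith", "1993-04-12")
    print(k1 == k2)  # False: the same person gets a different key in each source
```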

5.8 Dr. Laura Haas
Title: IBM Fellow; Director, Computer Science
Organization: IBM Almaden Research Center
Phone: 408-927-1700
Email: laura@almaden.ibm.com

Background
Dr. Haas is an IBM Distinguished Engineer and Director of Computer Science at Almaden Research Center. Previously, Dr. Haas was a research staff member and manager at Almaden. She is best known for her work on the Starburst query processor (from which DB2 UDB was developed), on Garlic, a system which allowed federation of heterogeneous data sources, and on Clio, the first semi-automatic tool for heterogeneous schema mapping. Dr. Haas is Vice President of the VLDB Board of Trustees, a member of the IBM Academy of Technology, and an ACM Fellow.

Key Takeaway
• When performing joins on-the-fly, it is important to minimize the volume of the data and minimize the trips back and forth.
• A semi-join can be an efficient way to link data on-the-fly and may be better than a linking table/directory.
• A nested-loop join can start from the source most constrained by the query and then join data from the other sources as needed.
• Commercial databases tend not to use join indexes due to the number of look-ups, which degrade performance.

Summary
The interview with Dr. Haas focused on optimizing the desired architecture of Virginia's SLDS. Knowing that Virginia would not be able to store linkages persistently and that the system would need to be a federated database, Dr. Haas shared various techniques that would improve the performance of the system. First and foremost, Virginia would need to minimize both the volume of data being transmitted and the number of round trips between the data sources and the SLDS.
Instead of a linking table, Dr. Haas stated that in-memory joins would likely result in better performance, depending on the types of queries that are conducted. Linking tables or join indexes are not typical of commercial databases, as they could require multiple look-ups, thereby wasting cycles and decreasing efficiency. Dr. Haas did not suggest a linking table for multiple database systems. The SLDS architecture could be designed to maximize performance by identifying the types of queries and the frequency pattern of each type of query. Dr. Haas recommended semi-joins, merge joins, and nested-loop joins as viable options, each with performance benefits depending on the database size, query type, and query frequency patterns.
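A semi-join along the lines Dr. Haas described can be sketched as follows: filter the most constrained source first, send only its distinct join keys to the other source, and bring back only the rows that match. This keeps both the data volume and the number of round trips small (one request carrying keys, one response carrying matches). The two in-memory lists stand in for remote sources, and the source names, columns, and functions are hypothetical; a federated engine would push the key filter into the remote query rather than filter a Python list.

```python
# Hypothetical in-memory stand-ins for two remote data sources keyed by a hashed ID.
K12 = [
    {"sid": "h1", "grad_year": 2008, "district": "A"},
    {"sid": "h2", "grad_year": 2009, "district": "B"},
    {"sid": "h3", "grad_year": 2008, "district": "B"},
]
WAGES = [
    {"sid": "h1", "quarter": "2010Q1", "wages": 8200},
    {"sid": "h3", "quarter": "2010Q1", "wages": 7100},
    {"sid": "h9", "quarter": "2010Q1", "wages": 9900},
]


def semi_join(constrained_rows, key, fetch_matching):
    """Ship only the distinct join keys from the filtered side, fetch the matching
    remote rows, then finish the join locally. Returns (rows, keys_shipped)."""
    keys = {row[key] for row in constrained_rows}
    remote_rows = fetch_matching(keys)  # a single round trip carrying only keys
    by_key = {}
    for row in constrained_rows:
        by_key.setdefault(row[key], []).append(row)
    joined = [{**left, **right}
              for right in remote_rows
              for left in by_key.get(right[key], [])]
    return joined, len(keys)


def wages_source(keys):
    """Simulates the remote source applying the key filter before returning rows."""
    return [row for row in WAGES if row["sid"] in keys]


if __name__ == "__main__":
    class_of_2008 = [r for r in K12 if r["grad_year"] == 2008]  # most constrained side
    rows, shipped = semi_join(class_of_2008, "sid", wages_source)
    print(shipped, rows)
```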


5.9 Dr. Thilini Ariyachandra
Title: Assistant Professor, Management Information Systems
Organization: Xavier University
Phone: 513-745-3379
Email: ariyachandrat@xavier.edu

Background
At Xavier, Dr. Ariyachandra teaches principles of information systems, business intelligence, and data management and modeling, exposing students to BI and database offerings by Teradata, Oracle, Microsoft and MicroStrategy. She has received several awards for scholarly excellence. Her research focuses on the selection, design, and implementation of business intelligence solutions in organizations, information system success, and the impacts of social networking. She has published in high-impact practitioner and academic journals.

Key Takeaway
• In a federated model, small query sets work best, and ad hoc queries can lead to poor performance.
• Trends show distributed database implementations moving towards the federated model.
• In-memory databases are not as scalable as standard DBMS.

Summary
Dr. Ariyachandra's paper comparing different data warehouse architectures provided a starting foundation for the interview. She had actually hoped to talk the team out of the federated model, but soon realized that was not an option. In a federated model, Dr. Ariyachandra noted, small query sets work best (a controlled environment), and ad hoc queries can lead to poor performance. Based on her interpretation of the architecture, the proposed federated architecture does not easily support ad hoc analysis of data and, therefore, may not be affected by this performance limitation.
Dr. Ariyachandra explained that her current consulting engagements show that distributed database implementations are trending towards the federated model. She also noted that in-memory databases are not as scalable as standard DBMS; however, vendors are starting to offer product options.

5.10 Dr. Cynthia Dwork
Title: Distinguished Scientist
Organization: Microsoft Research
Phone: 650-693-3701
Email: dwork@microsoft.com

Background
Dr. Dwork is the world's foremost expert on placing privacy-preserving data analysis on a mathematically rigorous foundation. A cornerstone of this work is differential privacy, a strong privacy guarantee permitting highly accurate data analysis. Dr. Dwork has also made seminal contributions in cryptography and distributed computing, and is a recipient of the Edsger W. Dijkstra Prize, recognizing some of her earliest work establishing the pillars on which every fault-tolerant system has been built for decades.

Key Takeaway
• There's no principled way to sanitize data.
• Re-identification techniques are getting faster and cheaper.
• Differencing attacks are able to re-identify information even with large aggregate data.
• The addition of noise is used to prevent various attacks, including differencing and averaging.
• VDOE should consider whether historical records should be archived after a period of time as a security measure.

Summary
As a privacy expert, Dr. Dwork was interested in the security rules being placed on the architecture of Virginia's SLDS and on the roles of users given access to the database. She was quick to state that re-identification of data was becoming faster and cheaper and, as a result, that there was no principled way to sanitize data. Dr. Dwork stressed the importance of data security and anonymization. She offered her insights on various techniques that are used to re-identify data and on countermeasures to those techniques.
Upon learning that Virginia intended to present aggregate data to the public, Dr. Dwork provided examples of differencing attacks, which use two large data sets with similar information to re-identify individuals. To combat these differencing attacks, noise can be added to the actual data; if done properly, this can effectively defeat attempts to average the results of aggregate queries. The addition of noise should not significantly skew the aggregate data, but it may conflict with certain regulations or policies.
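The differencing attack and the noise countermeasure Dr. Dwork described can be illustrated with a counting query. The sketch below uses the Laplace mechanism from the differential privacy literature: a count changes by at most 1 when one person is added or removed, so noise drawn with scale 1/ε is added to each answer. The records, the ε value, and the function names are hypothetical. Exact counts over two nearly identical groups reveal the excluded individual's value, while the difference of two noisy counts is dominated by noise.

```python
import random

# Hypothetical record-level data: (student_id, has_sensitive_flag).
RECORDS = [(f"s{i}", i % 7 == 0) for i in range(1, 501)]


def exact_count(records):
    return sum(1 for _, flag in records if flag)


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    return exact_count(records) + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    target = "s350"
    all_but_target = [r for r in RECORDS if r[0] != target]
    # Differencing attack on exact aggregates: two large, similar queries expose one person.
    print(exact_count(RECORDS) - exact_count(all_but_target))  # 1, since 350 % 7 == 0
    rng = random.Random(2011)
    print(noisy_count(RECORDS, 0.5, rng) - noisy_count(all_but_target, 0.5, rng))
```

Each noisy answer uses up part of a privacy budget: if the same query can be repeated freely, an attacker can average the answers and recover the exact count, which is why noise is usually paired with limits on how many and how similar the queries may be.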
Another means of protecting against differencing attacks is to limit a user's ability to submit queries, or to limit their ability to run queries that are too similar.
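Limiting queries that are "too similar" can be sketched as a simple overlap control: the auditor remembers the set of records behind each answered query and refuses a new query whose record set differs from an earlier one by only a few records. The class name and the threshold are hypothetical. Auditing of this kind is known to be incomplete on its own, since a determined user can combine several individually acceptable queries, so it complements rather than replaces noise and policy controls.

```python
class QueryAuditor:
    """Rejects a query whose record set differs from any previously answered
    query by fewer than `min_difference` records (a simple overlap control)."""

    def __init__(self, min_difference: int = 10):
        self.min_difference = min_difference
        self.history: list[frozenset] = []

    def allow(self, record_ids) -> bool:
        current = frozenset(record_ids)
        for previous in self.history:
            # Identical sets (difference 0) return the same exact answer, so they
            # reveal nothing new; small nonzero differences isolate individuals.
            if 0 < len(current ^ previous) < self.min_difference:
                return False
        self.history.append(current)
        return True


if __name__ == "__main__":
    auditor = QueryAuditor(min_difference=5)
    everyone = {f"s{i}" for i in range(1, 501)}
    print(auditor.allow(everyone))                          # True
    print(auditor.allow(everyone - {"s350"}))               # False: differs by one record
    print(auditor.allow({f"s{i}" for i in range(1, 251)}))  # True
```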
As a general question, Dr. Dwork asked whether the SLDS would have a limit on its ability to query historical data. By archiving, or limiting the SLDS's access to, data older than a defined period, the SLDS can further limit potential security attacks. Limiting the data in this way would also have the added benefit of reducing the total number of records within the database, which would increase the performance of the system.
When discussing the internal use of the SLDS and the ability of researchers to gain access to record-level data, Dr. Dwork talked about the high risk of privacy breach and the need for limitations and rules to be placed on these users. Security of a system must be implemented as a combination of technological security measures and a security policy that governs the individuals with access to the system; only by having both can a system maintain a high level of security for its information.

6 SLDS Architecture Overview
The SLDS architecture consists of seven functional components. Commercial off-the-shelf (COTS) products will be used where applicable, and shared computing resources will be used in the physical implementation where applicable.
The SLDS architecture can be represented by a bull's-eye signifying the data-centric nature of the architecture. The importance of security is reflected in the representation through the dual rings that surround critical components located inside the portal. For example, a security ring surrounding data indicates tight security of that data component, and the ring surrounding the four tool and task-oriented components illustrates security controls built into the other functional components.
The SLDS Portal provides the key interface into the architecture.

6.1 SLDS Seven Functional Components
6.1.1 Portal
The front door into the Statewide Longitudinal Data System (SLDS) is the SLDS Portal. The SLDS Portal provides both public (anonymous) and private (named) users with a variety of functions and services. Development of the Portal will be performed using a modern application framework, e.g., .NET or Java, and a content management system, e.g., DotNetNuke or Umbraco.
Named users gain access after they have requested an account and their request has been approved by the appropriate agencies. Once approved, the named user account has access to help, training, the Lexicon, requests for data, status of requests, and account maintenance, including password reset.

SLDS Portal Components
The Portal provides access to virtually all of the SLDS components, to include the Shaker, Reports, Lexicon, Data, and a limited amount of Workflow. In addition to the SLDS Components, the Portal provides services such as help files, frequently asked questions (FAQs), hyperlinks to Agency reports, and the ability to request a private (named) account. Figure 4 provides a conceptual representation of the functions and services which will be accessible through the SLDS Portal.

Figure 4: Conceptual SLDS Web Portal

Public (Anonymous) User Functionality
Public users will have access to the following features and functionality:
• Help files for functions which are available to public users.
• Frequently asked questions (FAQs).
• Prebuilt aggregated data reports which have been approved by the governance structure.
• Lexicon elements which have been approved by the governance structure.
• Hyperlinks to Agency reports on other websites.
• Electronic request and workflow for Named User Account requests.

Private (Named) User Functionality
Private users will have access to additional functionality which is not available to public users. Requests for private accounts will be submitted electronically using elements of the SLDS Portal and the Workflow component. Procedures for submitting, approving and denying account requests will be delineated by the governance structure.
Users who have been approved for a private account will be notified by email. Access to private account features will be granted after users have supplied a valid username and password and have been authenticated against either the COV or COV AUTH directory. Private users may, depending on their permissions, have access to the following features and functionality:

• Help/Training files, to include "How To" and instructional videos.
• Reports
  o Ability to view non-suppressed aggregated data reports.
  o Ability to access the Query Building Tool (QBT) for constructing data requests.
• Lexicon
  o Functionality determined by the governance structure.
• Workflow
  o Ability to electronically submit and track data requests.
  o Ability to retrieve data which has been requested and approved.
  o Ability to attach files.
  o Ability to check the status of, modify, or cancel account and/or data requests.
• Password reset
  o Ability to reset the user's password. This capability may be provided through the COV AUTH directory process.

Application Framework and Content Management System Features
The SLDS Portal will be developed using a modern application development framework and content management system (CMS). Use of an application framework, such as Microsoft .NET, and a content management system, such as Umbraco, allows for the development of rich functionality and services with minimal development. Most content management systems include features and services such as web URL control, custom content types and views, revision control, taxonomy, user management, documentation, and established community support.
6.1.2 Security
Security is the foundation component for the SLDS. The sensitivity of the information, and the policies governing who handles the data and how, will be managed through a cohesive security model. The model used for the SLDS incorporates authentication and authorization components.
Authentication is required for all private (named) users, to include researchers as well as agency employees. Researchers and agency employees will be authenticated as a precondition to gaining access to the named user portions of the SLDS portal. Agency employees will be authenticated before gaining access to the Workflow component of the SLDS application.
Figure 5 depicts the interaction of the Workflow component with other SLDS components, as well as the authentication interface for agency employees and researchers.
When requests for accounts and data access are submitted through the SLDS Portal, the Workflow component triggers messages to designated Commonwealth of Virginia (COV) employees for review and action. The action takes the form of approval or denial. For a COV employee to interact with the Workflow component, s/he would need to log in to the COV infrastructure (authenticating through the COV Active Directory). Thereafter, s/he would be able to access the Workflow component to act on the Workflow trigger.
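The request-review cycle described above can be sketched as a small state machine: submitting a request notifies a reviewer, only an authenticated reviewer can approve or deny it, and the requester is then notified of the outcome. The class, the states, and the notify callback (standing in for the Workflow email alerts) are hypothetical and do not model how Microsoft Dynamics CRM implements workflows.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    DENIED = "denied"


class Request:
    """An account or data request routed to a COV reviewer (hypothetical model)."""

    def __init__(self, requester: str, kind: str, notify):
        self.requester = requester
        self.kind = kind
        self.notify = notify  # stand-in for the Workflow email alerts
        self.status = Status.SUBMITTED
        notify("reviewer", f"New {kind} request from {requester}")

    def review(self, reviewer_authenticated: bool, approve: bool) -> None:
        """Record an approval or denial by an authenticated reviewer."""
        if not reviewer_authenticated:
            raise PermissionError("reviewer must first log in to the COV infrastructure")
        if self.status is not Status.SUBMITTED:
            raise ValueError(f"request already {self.status.value}")
        self.status = Status.APPROVED if approve else Status.DENIED
        self.notify(self.requester, f"Your {self.kind} request was {self.status.value}")


if __name__ == "__main__":
    outbox = []
    request = Request("researcher1", "data", lambda to, msg: outbox.append((to, msg)))
    request.review(reviewer_authenticated=True, approve=True)
    print(outbox)
```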

Figure 5: Security Flow

For researchers, authentication would occur through the SLDS Portal using the COV AUTH directory. After an account request is approved, a researcher would be required to log in to their account to make data requests from resources for which they have received approval.
Authorization defines user roles and the permissions associated with those roles. For example, a researcher (role) would have access to view (permission) the Lexicon, while a data administrator (role) would have access to view and modify (permission) the Lexicon. The Workflow component is the hub for managing a user's roles and associated permissions. The SLDS components coordinate with the Workflow to manage requests for services correctly.
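The role and permission model described above can be sketched as a lookup from roles to permission sets. This is a minimal illustration, not the SLDS implementation: the two roles and two Lexicon permissions come from the example in the text, while the names `Permission`, `ROLE_PERMISSIONS`, and `is_authorized` are hypothetical. In the SLDS, role assignments would be sourced from the Workflow component rather than hard-coded.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW_LEXICON = auto()
    MODIFY_LEXICON = auto()


# Hypothetical role table built from the example in the text.
ROLE_PERMISSIONS = {
    "researcher": {Permission.VIEW_LEXICON},
    "data_administrator": {Permission.VIEW_LEXICON, Permission.MODIFY_LEXICON},
}


def is_authorized(user_roles, permission):
    """Return True if any of the (already authenticated) user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


# Usage: a researcher may view but not modify the Lexicon.
print(is_authorized(["researcher"], Permission.VIEW_LEXICON))    # True
print(is_authorized(["researcher"], Permission.MODIFY_LEXICON))  # False
```

Keeping authorization as data (a role table) rather than code is what lets one component administer access centrally while other components only ask "is this allowed?".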
6.1.3 Workflow
The back office of the SLDS is the Workflow component. The SLDS Workflow will be developed using the Microsoft Dynamics Customer Relationship Management (CRM) package. CRM is a solution for automating internal business processes by creating workflow rules that describe routine tasks involving daily business operations. These processes can be designed to ensure that appropriate and timely information is sent to the correct people. To initiate workflows, and to act upon their successful completion, the SLDS Workflow component will need interfaces to the other components of the SLDS Portal.
The main function of the SLDS Workflow component is to manage and define a series of tasks within an organization to produce a final outcome or outcomes. These workflows will allow the partner agencies to work together to control access to their shared data and system. The workflows will handle email alerts and notifications to both the partner agencies and the portal named users. Figure 6 provides a conceptual representation of the architecture and interfaces of the SLDS Workflow Component.
Figure 6: Workflow Component

Interfaces
To initiate workflows, and to act upon their successful completion, the SLDS Workflow component will need interfaces to the other components of the SLDS Portal.

Portal Interfaces
Named users will interact with the SLDS Portal in order to submit data to the Workflow component to initiate the following workflow processes:
User Access Request
User Query Request
Once the workflows have been completed, the result of the process will be communicated back to the named user through the Portal. The Workflow component will need to be able to push the following information back to the SLDS Portal:
Approval/disapproval messages,
Requests for additional data,
Portal user role permissions,
And query result file location.

Shaker Interfaces
When a query request is approved, the Workflow component will interact with the Shaker in order to submit the query for execution. The Shaker will notify the Workflow component of the success or failure of the query execution and, in the event of success, of the resultant file location.


Multiple Component Interfaces
The Workflow component will be used to coordinate access authorization in multiple components of the SLDS system. This will allow user access to be centrally administered but distributed to the individual components based on purpose and need.

Workflows
The SLDS Workflow component manages and defines a series of tasks within an organization to produce a final outcome or outcomes. It will allow the SLDS team to define different workflows for different types of jobs or processes. At each stage in the workflow, one individual or group is responsible for a specific task. Once the task is complete, the workflow software ensures that the individuals responsible for the next task are notified and receive the data they need to execute their stage of the process. It will also automate repetitive tasks and ensure that incomplete tasks are followed up on.

User Account Request
A workflow must exist to review a request for named researcher access to the Portal. The workflow will result in one of the following outcomes:

Approve,
Disapprove,
Or request more data (update account request).

A workflow must exist to review a request for named data owner access to the Portal. The workflow will result in one of the following outcomes:

Approve,
Disapprove,
Or request more data (update account request).
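The three outcomes listed for account requests (approve, disapprove, or request more data) behave like a small state machine. The sketch below is a hypothetical illustration; the class and state names are assumptions, and the report does not specify how Microsoft Dynamics CRM represents these states. The same shape would apply to the ad hoc query request workflow that follows.

```python
from enum import Enum


class RequestState(Enum):
    SUBMITTED = "submitted"
    NEEDS_MORE_DATA = "needs_more_data"
    APPROVED = "approved"
    DISAPPROVED = "disapproved"


class Outcome(Enum):
    APPROVE = "approve"
    DISAPPROVE = "disapprove"
    REQUEST_MORE_DATA = "request_more_data"


# Each reviewer outcome maps to exactly one next state.
_NEXT_STATE = {
    Outcome.APPROVE: RequestState.APPROVED,
    Outcome.DISAPPROVE: RequestState.DISAPPROVED,
    Outcome.REQUEST_MORE_DATA: RequestState.NEEDS_MORE_DATA,
}


class AccountRequest:
    def __init__(self, requester, requested_role):
        self.requester = requester
        self.requested_role = requested_role  # e.g. "researcher" or "data_owner"
        self.state = RequestState.SUBMITTED
        self.history = [self.state]

    def review(self, outcome):
        """Record a COV reviewer's decision; only pending requests can be reviewed."""
        if self.state is not RequestState.SUBMITTED:
            raise ValueError(f"cannot review a request in state {self.state.value}")
        self._move(_NEXT_STATE[outcome])

    def update(self):
        """The requester supplies the additional data and the request returns to review."""
        if self.state is not RequestState.NEEDS_MORE_DATA:
            raise ValueError("only requests awaiting more data can be updated")
        self._move(RequestState.SUBMITTED)

    def _move(self, new_state):
        self.state = new_state
        self.history.append(new_state)
```

Treating "request more data" as a loop back to review, rather than a terminal outcome, keeps the history of every round available for auditing.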

User Ad Hoc Query Request
A workflow must exist to review a query request for a named researcher. The workflow will result in one of the following outcomes:

Approve,
Disapprove,
Or request more data (update query request).

User Ad Hoc Query Result
A function must exist to handle the result of a query request for a named researcher. The function must perform the following tasks:
Receive back status,
Receive file location,
Communicate resultant file location to the named researcher through the portal,
Communicate status to the named researcher through the portal,
Communicate status to the named researcher through an alert or email,
And communicate failed status to the proper administrator.
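The result-handling tasks above can be sketched as one routing function. This is an assumption-laden illustration: `ShakerResult` and the three notification callables are hypothetical stand-ins for the Shaker status message, the Portal, the alert/email channel, and administrator notification, none of which the report specifies at the API level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShakerResult:
    query_id: str
    researcher: str
    succeeded: bool
    file_location: Optional[str] = None  # set only when the query succeeded


def handle_query_result(result, post_to_portal, send_alert, notify_admin):
    """Route a Shaker result to the researcher and, on failure, to an administrator.

    post_to_portal(user, message), send_alert(user, message) and
    notify_admin(message) are hypothetical notification channels.
    """
    status = "completed" if result.succeeded else "failed"
    # Status goes to the researcher through the Portal and through an alert/email.
    post_to_portal(result.researcher, f"Query {result.query_id} {status}")
    send_alert(result.researcher, f"Query {result.query_id} {status}")
    if result.succeeded:
        # Only successful runs have a resultant file location to share.
        post_to_portal(result.researcher, f"Results available at {result.file_location}")
    else:
        notify_admin(f"Query {result.query_id} for {result.researcher} failed")
```

Passing the channels in as functions keeps the routing rules testable without a live Portal or mail server.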


All account and data requests are processed through and managed by the Workflow component. Workflow monitors and triggers actions such as query submission and maintains the status of requests. Workflow is the source of information about roles and permissions for SLDS users. When an account request is submitted, the Workflow component manages the message(s) and notifies designated COV employees about the request. Through the Workflow component, employees can approve or deny the request. Workflow then notifies the submitter of the account request of the final decision.
On a data request, Workflow monitors the request, confirms approval, and submits the query to the Shaker for action. Designated COV employees are notified of the request and asked to approve or deny the query. If the request is denied, Workflow notifies the researcher of the denial status. If it is approved, Workflow submits the request to the Shaker and continues to monitor its status. Upon completion of the transaction, Workflow notifies the researcher that the data set is available for download.
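The data request sequence described above (notify reviewers, record the decision, submit to the Shaker, notify the researcher) can be summarized in a short sketch. All function names and parameters are hypothetical; in particular, `submit_to_shaker` is treated as a blocking call that returns a file location or `None`, standing in for the status monitoring that Workflow performs.

```python
def process_data_request(request_id, researcher, reviewers, decide, submit_to_shaker, notify):
    """Walk one data request through review, execution, and notification.

    decide(request_id) -> bool and submit_to_shaker(request_id) -> str or None
    are hypothetical hooks; notify(recipient, message) sends an alert.
    Returns the final status string.
    """
    for reviewer in reviewers:
        notify(reviewer, f"Data request {request_id} awaits approval")
    if not decide(request_id):
        notify(researcher, f"Data request {request_id} was denied")
        return "denied"  # the Shaker is never contacted for denied requests
    file_location = submit_to_shaker(request_id)  # stands in for status monitoring
    if file_location is None:
        notify(researcher, f"Data request {request_id} could not be completed")
        return "failed"
    notify(researcher, f"Data set for request {request_id} is available for download at {file_location}")
    return "available"
```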
6.1.4 Reporting
The SLDS Business Intelligence (BI) architecture will support two scenarios:

Ad Hoc reports of record-level user data.

Pre-defined (canned) reports of aggregated linked data.

Ad Hoc Record-Level Data BI Architecture
A visual representation of the Ad Hoc record-level BI architecture is presented in Figure 7. This architecture consists of the following major components:

A Lexicon and Shell Database that are built based on the source data and will support the Logi Ad Hoc tool.

Report Creation using the Logi Ad Hoc Business Intelligence platform. This consists of the Logi Ad Hoc Query Building Tool and Ad Hoc Metadata.

A Workflow engine that routes report submissions through an approval process.

The Shaker that will service the query against record level data in the disparate source systems.

The record level Query Results that will be presented to the user.


Figure 7: Ad Hoc Record-Level BI Architecture

Lexicon and Shell Database
The Lexicon will contain information about the data objects that each of the source data systems has made available to the SLDS. This information will be utilized to create a Shell Database. The Shell Database serves two purposes: (1) it is an instantiation of the Lexicon and is needed by Logi Ad Hoc in order to function; (2) it contains sample data that will be used by Logi Ad Hoc to allow researchers to preview their query request. A process will be built that creates and populates the Shell Database based on the information available in the Lexicon.
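A minimal sketch of the Shell Database build process, assuming the Lexicon can be read as a list of (table, column, type) entries. The Lexicon layout, the table and column names, and the fictitious value generator are all assumptions for illustration; SQLite is used only so the example runs on its own, and table and column names are treated as trusted Lexicon content rather than user input.

```python
import random
import sqlite3

# Hypothetical Lexicon entries: (table, column, SQL type).
LEXICON = [
    ("student", "grade_level", "INTEGER"),
    ("student", "gender", "TEXT"),
    ("enrollment", "institution_type", "TEXT"),
]

FICTITIOUS_TEXT = ["ALPHA", "BRAVO", "CHARLIE"]  # placeholder values, never real data


def build_shell_database(lexicon, rows_per_table=5, seed=0):
    """Create tables mirroring the Lexicon and fill them with fictitious rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    tables = {}
    for table, column, sql_type in lexicon:
        tables.setdefault(table, []).append((column, sql_type))
    for table, columns in tables.items():
        # Identifiers come from the Lexicon (trusted), not from user input.
        col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
        conn.execute(f"CREATE TABLE {table} ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        for _ in range(rows_per_table):
            row = [rng.randint(1, 12) if sql_type == "INTEGER" else rng.choice(FICTITIOUS_TEXT)
                   for _, sql_type in columns]
            conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
    conn.commit()
    return conn
```

Because the shell is regenerated from the Lexicon alone, it carries the structure researchers need for query building and previews without containing any source-system records.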

Report Creation
The Report Creation process will be provided through Logi Ad Hoc. Logi Ad Hoc has a self-serve user interface that allows a user to specify a report query. It has facilities for the user to save the query and preview a sample of the report. When the user is finished with the report, they can run the report. During this step, a custom process will intercept the query that has been submitted. The query produced, along with its parameters (including the columns selected and filters specified), will be sent to the Workflow component.
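The intercepted query and its parameters could be packaged for the Workflow component as a structured message. The sketch below assumes a JSON payload; the field names and the `query_request` type tag are hypothetical, and the report does not describe how the custom process hooks into Logi Ad Hoc.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class QuerySubmission:
    researcher: str
    query_text: str                                   # query produced by the query builder
    columns: List[str]                                # columns the researcher selected
    filters: List[str] = field(default_factory=list)  # filter expressions specified


def to_workflow_message(submission):
    """Serialize an intercepted query as a hypothetical JSON message for Workflow."""
    return json.dumps({"type": "query_request", **asdict(submission)}, sort_keys=True)
```

Sending the selected columns and filters alongside the query text gives agency reviewers a readable summary of what is being requested without having to parse the query itself.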

Workflow
The Workflow component routes the query through the appropriate steps to get acceptance by specified agency reviewers.

Shaker
The Shaker interacts with the Lexicon and the source data systems to query and join together the record level data from each of the source data systems. It deposits the joined data set in either a file or a database table.


Query Results
As mentioned previously, the Shaker has the option to place the resulting query data in a database table or a file. If the results are sent to a file, the submitting researcher is notified that the file is available for download from the Portal. If the results are placed in a database table, reports can be created dynamically in Logi Info that reference the data in these tables. These reports will provide the user with limited analysis capabilities such as filtering, sorting, and grouping of the data.

Aggregated Linked Data BI Architecture
A visual representation of the BI architecture for aggregated linked data is presented in Figure 8.

This architecture consists of the following major components:

The Shaker to join record level data from the source data systems.

A repository for the record level linked data.

An ETL (extract, transform, load) tool and process to extract data from the record level linked data store and load it into the aggregated linked data store.

A repository for the aggregated linked data.

The Logi Info Business Intelligence platform that will be used for serving up prebuilt reports.
The SLDS Portal where the Logi Info reports will be embedded.
Figure 8: Aggregate Linked Data BI Architecture

Shaker
The Shaker interacts with the Lexicon and the source data systems to query and join together the record level data. It deposits the joined data set into a database table in the Record Level Linked Data Store.


Record Level Linked Data Store/ETL Process/Aggregate Linked Data Store
An ETL process takes data from the Record Level Linked Data Store, aggregates it to the appropriate level, and loads the aggregated data into the Aggregate Linked Data Store. Once the data has been loaded, the table in the Record Level Linked Data Store will be purged.
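The aggregate, load, and purge steps can be run as a single database transaction so that record-level rows are purged only if the aggregated load succeeds. This is a minimal SQLite sketch under assumed table names (`record_level_linked`, `aggregate_linked`) and an assumed aggregation level (counts by school year and cohort); the actual ETL tool and aggregation levels are not specified in the report.

```python
import sqlite3


def aggregate_and_purge(conn):
    """Aggregate record-level linked rows, load the counts, then purge the source table.

    Table and column names are hypothetical placeholders.
    """
    with conn:  # one transaction: commits both steps on success, rolls back on error
        conn.execute(
            "INSERT INTO aggregate_linked (school_year, cohort, student_count) "
            "SELECT school_year, cohort, COUNT(*) FROM record_level_linked "
            "GROUP BY school_year, cohort"
        )
        conn.execute("DELETE FROM record_level_linked")


# Usage with an in-memory database standing in for the two data stores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_level_linked (link_id TEXT, school_year INTEGER, cohort TEXT)")
conn.execute("CREATE TABLE aggregate_linked (school_year INTEGER, cohort TEXT, student_count INTEGER)")
conn.executemany(
    "INSERT INTO record_level_linked VALUES (?, ?, ?)",
    [("a1", 2010, "HS"), ("b2", 2010, "HS"), ("c3", 2011, "HS")],
)
aggregate_and_purge(conn)
```

Purging inside the same transaction as the load avoids a window in which de-identified record-level rows remain after their aggregates are already published.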

Prebuilt Reports
Prebuilt Reports will be created using Logi Info and will use data in the Aggregate Linked Data Store. The Logi Info product allows for custom design and formatting of reports that pull data from a pre-specified data source. The reports produced can contain tables, charts, maps, or a combination of these. Logi Info has some prebuilt analytical reports that allow an end user to perform limited analysis of the data, including sorting, filtering, and grouping.

SLDS Portal
The Prebuilt Reports will be made available in the SLDS Portal.
6.1.5 Lexicon
The Lexicon is an inventory of every available data field in every available data source, the structure of their storage, the possible values and meanings of the information stored, all possible transformations of each set of field values to another set of field values, methods of data source access, and matching algorithms and how they are to be used in conjunction with possible field value transformations.
The Lexicon (Figure 9) contains no data from any data source. It will be used to manage the Shell Database that users build queries against, as well as to provide the Shaker with the information needed to prepare an optimized query sequence for data requests. The Shell Database will contain fictitious data.
When building a query, a researcher interacts with a set of field names and relationships. The query-building user interface provides a simple view of the Lexicon for easy query construction.
To maintain the accuracy and manage the extensibility of the Lexicon, the component processes all data sources periodically, at a predetermined time or interval, searching for:
Changes in data ranges,
New data fields,
And anything else that would disrupt the probabilistic matching or provide more ways to slice and dice the data.
Anomalies found by the linking module will prompt an alert for an administrator to modify the matching algorithm or add new query choices.
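The periodic scan for changes in data ranges and new data fields can be sketched as a comparison of two snapshots of a source's field profiles. The `FieldProfile` structure and the alert wording are hypothetical; the sketch covers only range and field-set changes, not the matching-related anomalies the linking module would also report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldProfile:
    minimum: float
    maximum: float


def detect_changes(previous, current):
    """Compare two {field_name: FieldProfile} snapshots of one data source."""
    alerts = []
    for name in sorted(current.keys() - previous.keys()):
        alerts.append(f"new field: {name}")
    for name in sorted(previous.keys() - current.keys()):
        alerts.append(f"field removed: {name}")
    for name in sorted(previous.keys() & current.keys()):
        old, new = previous[name], current[name]
        if (old.minimum, old.maximum) != (new.minimum, new.maximum):
            alerts.append(
                f"range changed: {name} {old.minimum}-{old.maximum} -> {new.minimum}-{new.maximum}"
            )
    return alerts
```

Each alert string would then be routed to an administrator, who decides whether the Lexicon, the matching algorithm, or the available query choices need updating.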


Figure 9: A Logical Representation of the Lexicon and its Interactions

6.1.6 Shaker

The Shaker's general function (Figure 10) is to accept an approved query and return a dataset. The query will be broken down into a series of optimized steps, or sub-queries, to retrieve de-identified data from the appropriate data sources in the most efficient manner. In keeping with the intent of the original request, query forms (e.g. inner join, left join, equijoin) and specified final output parameters (e.g. counts of non-matching records by demographic categories) will be taken into consideration. Information from the Lexicon concerning data structure and relationships will be used to produce a dynamic sub-query plan for data retrieval that minimizes processing time and workload on the target data sources.
For each query submitted to the Shaker, a random key is generated. Each sub-query in the data retrieval plan will send this random key to the data source, where it is used to create a secure one-way hashed key for any applicable records. This list of hashed keys is then used by the Shaker to combine records across multiple data sources, so that no identifiable information is ever transferred out of a data source. Any hashed keys used to link records will be removed from the final data set and replaced with yet another random key, which cannot be traced back to any of the original data sources. The resulting combined records are then uploaded to a file or database table for later access by the user.
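The keyed-hash linkage described above can be illustrated with HMAC-SHA256, which is our choice for the "secure one-way hashed key"; the report does not name an algorithm. In this sketch, `hash_identifiers` represents work done inside each data source and `link_and_rekey` represents the Shaker performing an inner join on hashed keys and replacing them with fresh random identifiers. The function names, the `hkey` and `link_id` fields, and the assumption that each source returns at most one record per identifier are all illustrative.

```python
import hashlib
import hmac
import secrets


def hash_identifiers(records, id_field, query_key):
    """Runs inside a data source: swap the identifier for a keyed one-way hash."""
    hashed = []
    for record in records:
        digest = hmac.new(query_key, record[id_field].encode("utf-8"), hashlib.sha256).hexdigest()
        rest = {k: v for k, v in record.items() if k != id_field}
        hashed.append({"hkey": digest, **rest})
    return hashed


def link_and_rekey(left, right):
    """Runs in the Shaker: inner-join on hashed keys, then replace them with random IDs."""
    right_by_key = {rec["hkey"]: rec for rec in right}  # assumes one record per key
    linked = []
    for rec in left:
        match = right_by_key.get(rec["hkey"])
        if match is None:
            continue  # inner join: unmatched records are dropped
        combined = {k: v for k, v in {**rec, **match}.items() if k != "hkey"}
        combined["link_id"] = secrets.token_hex(8)  # untraceable replacement key
        linked.append(combined)
    return linked


# A fresh random key is generated for every query submitted to the Shaker.
query_key = secrets.token_bytes(32)
```

Because the key changes with every query, hashed keys from one query cannot be joined to those from another. Anyone holding the query key could still hash guessed identifiers and compare them, so the key should be treated as a secret shared only between the Shaker and the participating data sources.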


Figure 10: A Logical Representation of the Shaker and its Interactions

6.1.7 Data
The SLDS Data Architecture (Figure 11) includes the Source Data Systems. The Shaker will submit queries to the target data systems and join the resulting data sets. The data will optionally be written to a database that resides in the SLDS environment. The SLDS environment will also contain other databases needed by the SLDS Portal, including a Metadata/Security database, Workflow database, Lexicon, Shell database, and an Aggregate Linked Data database.

SLDS Databases
Several databases will reside in the SLDS environment. These databases will act as a repository for data and metadata needed by various components of the SLDS Portal. Each database and its Portal usage are described below.

Metadata and Security Database
The Metadata and Security database will consist of the Logi Ad Hoc metadata, a logging data repository, an auditing data repository, and any data that needs to be maintained to control security for the portal.

Workflow Database
The Workflow database will contain data needed by the Workflow engine. This will include data needed to track the steps and processes in the workflow. It will also include data required for security to be maintained in the workflow.


Figure 11: Record Level Query BI Architecture

Lexicon
The Lexicon database will provide information about the data objects that have been exposed to the SLDS Portal by each of the source data systems. Interaction with the data store will be through the Lexicon user interface and administration portion of the SLDS Portal.

Shell Database
A Shell Database is needed in order for the Query Building Tool to function. The Shell Database will be built from information contained in the Lexicon.

Shaker/De-identified Record Level Linked Database
The Shaker will optionally write the results of the joined data from the Source Data Systems to the De-identified Record Level Linked Database.

Aggregate Linked Data
The Aggregate Linked Database will be utilized by the prebuilt reports. This database will be populated through ETL processes that will aggregate data from the De-identified Record Level Linked Database. Stored procedures will be used by these reports for data querying and suppression.
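The report assigns suppression to stored procedures but does not define the rule. A common approach is primary small-cell suppression, sketched here in Python rather than as a stored procedure; the threshold of 10 and the `student_count` field are assumptions, and a production rule would typically also need complementary suppression so that masked cells cannot be recovered from totals.

```python
SUPPRESSION_THRESHOLD = 10  # hypothetical minimum reportable cell size


def suppress_small_cells(rows, count_field="student_count", threshold=SUPPRESSION_THRESHOLD):
    """Return copies of aggregate rows with counts below the threshold masked as None.

    Primary suppression only: complementary suppression is not handled here.
    """
    result = []
    for row in rows:
        masked = dict(row)  # leave the caller's rows untouched
        if masked[count_field] < threshold:
            masked[count_field] = None  # the report layer can render this as "*"
        result.append(masked)
    return result
```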


7 Physical Infrastructure
The SLDS application will be developed, tested and deployed in three environments. The Development environment will be hosted at a Florida datacenter operated by HPCHost.com. The Test and Production environments will be hosted at the Commonwealth Enterprise Solutions Center (CESC), physically located in Chesterfield, Virginia.

7.1 Development Environment
The Development environment (Figure 12) will be purchased as a monthly service. The environment will consist of Virtual Machines (VMs) running on HPCHost-managed VMware ESX infrastructure. This monthly service can be increased, decreased or eliminated as necessary.
Figure 12. Development Environment

A request has been submitted to purchase a slice of computing power which consists of a 1 4 Core processor, 16 GB RAM and 300 GB of RAID 6 SAN storage. VITA EAD will manage the Development VMs using the following rules/guidelines:
VITA EAD will build/configure each VM as requested by the developers.
It is estimated that 6-12 VMs will be built for this development effort.
o Portal Server
o Report Server
o Workflow Server
o Shaker Server
o Database Servers (2)
All Microsoft OS based VMs will be joined to the VITA EAD operated EADDEV Domain unless otherwise requested.


VITA EAD will operate a pfSense Firewall that will protect the Development environment.
Developers will access the environment using a VPN or an SSH tunnel.
Non-Developers on the COV network may be allowed web access to the development portal and reports as required.
Developers will be given full administrative control over their respective VMs.
All code will be checked into a Code Repository operated by VITA EAD.
Development VMs will not be backed up unless specifically requested.

7.2 Test Environment
There will be no additional servers specifically ordered for the SLDS Test environment (Figure 13). VITA EAD has ordered new physical servers that will have the capacity to support SLDS testing along with other applications. This methodology was adopted to reduce infrastructure costs by sharing physical infrastructure. VITA EAD will be responsible for all applications residing on these shared servers.
Figure 13. Test Environment

The test web and application servers will have two 12-Core processors, 128 GB RAM and 144 GB of RAID 5 local disk space, while the test database server will have two 4-Core processors, 32 GB RAM and 273 GB of RAID 5 local disk space.
The SLDS application will leverage DOE and SCHEV test databases in their test environments.
The SLDS application will primarily use web services to connect and will use that same method to connect to any future internal or external data source.


The Test environment web servers will be accessible from the Internet. VITA EAD system administrators will access the backend SLDS servers via 2-factor VPN RDP. DOE and SCHEV will control administrative access to/from their test databases. Developers will normally not have access to the SLDS test servers. Code/changes from the SLDS Development environment will be promoted to the SLDS Test environment through a structured promotion process.

7.3 Production Environment
An additional web and application server will be required for the SLDS Production environment (Figure 14). SLDS will also share new VITA EAD production servers. VITA EAD has ordered new physical servers that will have the capacity to support various SLDS components along with other applications. This methodology was adopted to reduce infrastructure costs by sharing physical infrastructure. VITA EAD will be responsible for all applications residing on the SLDS and shared servers.
Figure 14. Production Environment

All SLDS and shared servers will have two 12-Core processors, 128 GB RAM and 144 GB of RAID 5 local disk space. The SLDS application will leverage DOE and SCHEV production databases in their production environments. The SLDS application will primarily use web services to connect and will use that same method to connect to any future internal or external data source.
The Production environment web servers will be accessible from the Internet. VITA EAD system administrators will access the backend SLDS servers via 2-factor VPN RDP. DOE and SCHEV will control administrative access to/from their production databases. Developers will not have access to the SLDS production servers. All test and production servers will be joined to the COV domain. Code/changes from the SLDS Test environment will be promoted to the SLDS Production environment through a structured promotion process.


Appendix A: Secondary Architecture Best Practice Case Studies
A.1 Illinois State Board of Education
State/Agency: Illinois State Board of Education
Web Site: http://www.isbe.state.il.us/ILDS/htmls/project.htm
Address: 100 North First Street, Springfield, Illinois 366104
POC: Michael McKindles
POC Phone: 217-782-03329
POC Email: mmckindll@isbe.net

Case Profile
Student Enrollment: 2,119,707 [35]
Teachers: 135,704 [36]
LDS Grant: $11,869,819 [37]

Background
In July 2009, Illinois Governor Pat Quinn signed into law the P-20 Longitudinal Education Data System Act [38]. The act was a response to the 2009 LDS grant Illinois received from the U.S. Department of Education. The grant funded the Illinois Longitudinal Data System (ILDS), which is being built to establish the technical and management systems necessary for the Illinois State Board of Education (ISBE) and its education partners to manage, link and analyze P-20 education data.
In December 2009, the ISBE created the Illinois Data System Advisory Committee. At its inception, members included the Assistant Superintendent, the Division Administrator, and the ILDS Project Manager.

System Design and Architecture
Prior to its receipt of the federal LDS grant, ISBE implemented a state Student Identification System (SIS) and expanded its use. As of 2009, the ISBE SIS included five years of student enrollment data and program information; updated student demographic information; and four years of assessment results. The various data sources provide data on teacher demographics, teacher certification, LEA and school program participation, LEA financial information, LEA facilities, specialized student programs, LEA compliance and monitoring, and LEA child nutrition services. As a result, ISBE had more than 100 disparate collection systems on a range of technologies.

[35] State educational data profiles. (n.d.). Retrieved from http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=17
[36] Ibid.
[37] Statewide longitudinal data system grant program - grantee state - Illinois. (n.d.). Retrieved from http://nces.ed.gov/programs/slds/state.asp?stateabbr=IL
[38] Illinois Public Act 96-0107. (n.d.). Retrieved from http://ilga.gov/legislation/publicacts/96/096-0107.htm
The ISBE data system currently cannot provide data that can be used effectively in education decision making. Data currently collected by the agency is highly fragmented across various systems and collection vehicles. This fragmentation covers multiple data systems that include student level data, as well as a variety of systems that maintain data from other parts of the ISBE education enterprise (e.g. staff data, LEA and school program participation, and LEA financial information).
In the future, the ISBE team expects its new system to capture and track longitudinal data on students in Illinois schools, from pre-kindergarten to their employment outcomes. With a longitudinal data system in place, ISBE also plans to improve its ability to support the Federal Electronic Data Exchange Network (EDEN)/EDFacts. ISBE's current system supports a series of automated programs that pull data from various source systems to produce the aggregations and calculations for EDEN/EDFacts. According to ISBE, this is a high maintenance process that can be streamlined with the right data architecture and solution set in place [39].

Lessons Learned
In March 2010, the ISBE released a Request for Sealed Proposals (RFSP) to contract with a vendor to develop an enterprise-wide data architecture. The ISBE plan will include data from 13 different systems that currently use a mixture of Access and SQL Server databases. These 13 systems range from 150 to 3,000 data elements and use Web, LAN, and standalone applications.
Currently, the LDS is in the design phase. The state recently hired Public Consulting Group to design the data architecture. Their methodology to date has involved interviews with program and technical resources and alignment with data model work they are performing for CCSSO. The ISBE team anticipates that the design process will take 6 months and that framing the data architecture will take another 6 months.

[39] Illinois State Board of Education. (March 2010). Request for Sealed Proposals (RFSP): Data Architecture Vendor for the Illinois Longitudinal Data System (ILDS) Project.


A.2 North Dakota Department of Public Instruction
State/Agency: North Dakota Department of Public Instruction
Web Site: http://www.dpi.state.nd.us/
Address: 600 E. Boulevard Avenue, Dept 201, Bismarck, North Dakota 58505
POC: Tracy Korsmo
POC Phone: 701-328-41134
POC Email: tkorsmo@nd.gov

Case Profile
Student Enrollment: 94,728 [40]
Teachers: 8,181 [41]
LDS Grant: $6,723,090 [42]

Background
Prior to 2007, North Dakota did not have a Longitudinal Data System; however, state leaders realized the importance and benefits of linking data among the North Dakota Department of Public Instruction (K-12 schools), the North Dakota Department of Commerce, Workforce Division, and the North Dakota State Board of Higher Education, and pushed for an LDS project. To achieve this goal, these leaders realized that foundational components were necessary and, thus, hired Claraview to develop a state-wide LDS strategic roadmap. Shortly after the roadmap project began, North Dakota applied for and received a Statewide Longitudinal Data Systems grant from the U.S. Department of Education. Future funding for the NDLDS will be a combination of these federal funds and state appropriations.

System Design and Architecture
Currently, North Dakota's LDS is still in the design stage. The North Dakota LDS team plans to build the K-12 data and Workforce Department warehouse, while the higher education data warehouse will be expanded by the higher education community's IT staff. The three separate systems are not currently integrated, but will be in the future, possibly in 2011.

[40] National Center for Education Statistics. "State Profiles Home Page." National Center for Education Statistics. U.S. Department of Education, Fall 2009. Web. 15 Dec. 2010. http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=17&s2=38
[41] Ibid.
[42] National Center for Education Statistics. "Statewide Longitudinal Data Systems Grant Program - Grantee State - North Dakota." National Center for Education Statistics (NCES). U.S. Department of Education, May 2010. Web. 15 Dec. 2011. http://nces.ed.gov/programs/slds/state.asp?stateabbr=ND



During the planning stages, the NDLDS team inventoried their existing statewide data sources and discovered that the North Dakota Department of Public Instruction and had their own data warehouses. Therefore, rather than creating an entirely new system, it was a natural fit to build out the K-12 warehouse into an LDS. This approach saved the team time and resources.
North Dakota's K-12, workforce (from the Human Resources Department), and higher education data are each expected to have their own separate warehouse. Since K-12 and workforce data are already being stored in a current warehouse, North Dakota is working on expanding and matching these warehouses to a longitudinal data system based on the K-12 data warehouse. The SLDS, which is being built as an extension of the K-12 warehouse, is to consume data from all the different warehouses. The architecture has not been finalized; issues such as whether it will be separated from the other warehouses, and security, are still being researched and deliberated.
Additionally, how and to what extent the data will be shared outside of internal researchers has not yet been decided. North Dakota may use a separate warehouse and portal to make aggregated data available to non-state-agency users.
According to North Dakota, its Higher Education agency is the most disjointed part of the LDS because the agency is building its own warehouse with its own staff. The linkages between the agencies have not yet been started; however, the state is now developing a process to align student data. Attributes being considered for the linkages are: name, date of birth, gender, graduating high school, and social security number.
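The interview lists candidate linkage attributes but not how they would be combined. As a minimal sketch, assuming exact (deterministic) matching on normalized values, the attributes could form a composite key as below. The record layout, field names, date format, and normalization rules are all hypothetical, not North Dakota's design.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudentRecord:
    """Hypothetical record layout built from the candidate linkage attributes."""
    name: str
    date_of_birth: str          # assumed ISO format, YYYY-MM-DD
    gender: str
    high_school: str            # graduating high school
    ssn: Optional[str] = None   # may be missing in some agency extracts


def normalize(text: str) -> str:
    """Fold case, strip accents, and drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def match_key(rec: StudentRecord) -> tuple:
    """Deterministic key: normalized name, DOB, gender initial, high school, SSN digits."""
    ssn_digits = "".join(ch for ch in (rec.ssn or "") if ch.isdigit())
    return (normalize(rec.name), rec.date_of_birth,
            normalize(rec.gender)[:1], normalize(rec.high_school), ssn_digits)


def is_match(a: StudentRecord, b: StudentRecord) -> bool:
    """Exact match on the non-SSN attributes; SSNs must agree when both are present."""
    ka, kb = match_key(a), match_key(b)
    if ka[4] and kb[4] and ka[4] != kb[4]:
        return False
    return ka[:4] == kb[:4]


k12 = StudentRecord("José  O'Neil", "1993-05-02", "M", "Bismarck High School", "123-45-6789")
ihe = StudentRecord("jose oneil", "1993-05-02", "Male", "BISMARCK HIGH SCHOOL", "123456789")
print(is_match(k12, ihe))  # True
```

Exact matching is simple to audit but brittle against typos and name changes; probabilistic record linkage (for example, the Fellegi-Sunter model) instead weighs partial agreement across attributes.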
North Dakota has data sharing contracts between K-12 and Higher Ed; K-12 and Unemployment Insurance; and Higher Ed and Unemployment Insurance. Another issue the team is currently working to overcome is governance. Currently, part of the statewide LDS is governed by the state's own privacy and sharing rules. Subcommittees are reviewing the state's privacy and sharing laws and trying to decide whether the current legislation is conducive to a properly functioning and streamlined process for an LDS.
North Dakota does not yet have an LDS design. The state is in the process of finalizing the K-12 project plan. By the end of December, North Dakota hopes to have a project plan in place for its LDS. North Dakota plans to use canned reports to share with the public; it does not plan on having ad hoc reporting capabilities in the near future. In terms of funding, the state is currently looking for ways to augment its current funding source. North Dakota is hoping to secure funding to keep its LDS running once it is built. North Dakota's immediate next steps are to work on a proof of concept, implement its K-12 warehouse, and work on an entity resolution accountability plan.

Lessons Learned
Because the North Dakota LDS is in the early stages of planning and design, Ms. Korsmo was unable to provide many recommendations for best practices. However, data governance has emerged as an important issue. Although Ms. Korsmo did not share specifics related to data governance issues, she indicated that the team is facilitating negotiations among the different agencies and that current legislation is being reviewed. Ms. Korsmo concluded with a recommendation that, prior to planning the design of a system, a state determine whether there are existing centralized statewide agency systems upon which to build an LDS. The North Dakota team is optimistic about its plans to expand the K-12 data warehouse and expects this leveraging of the legacy system to reduce project time and costs.


A.3 Washington Education Research and Data Center
State/Agency: State of Washington Education Research and Data Center
Web Site: http://www.erdc.wa.gov/
Address: 210 11th Avenue SW, Room 318, P.O. Box 43113, Olympia, Washington 98504
POC: Dr. Michael Gass
POC Phone: 360-902-0599
POC Email: Michael.Gass@OFM.WA.GOV

Case Profile
Student Enrollment: 1,037,018
Teachers: 54,428
LDS Grant: $17,341,871

Background
The Washington Education Research and Data Center, an agency under Washington's Office of Financial Management, manages research and education data for the state and is leading the state's LDS project. The Center also manages four-year higher education enrollment data systems. The SLDS grant Washington received was based on an inter-agency proposal to support federal projects that included a P-20 data warehouse. Since the beginning of the project in July 2010, Washington has established an executive sponsorship steering committee that is currently working through basic governance issues, defining scope, and establishing the project management framework.

System Design and Architecture
The SLDS project is focused mostly on an internal ability to mine the data. The proposal submitted to NCES specified the LDS type and which agencies would be linked. The SLDS team envisions a P-20 data warehouse that would receive data from the K-12 warehouse that is currently being built. Additionally, the LDS will inherit the data model that the K-12 project builds, though designs for the integration model are not yet in place. Washington's LDS will also integrate data from its stakeholders:
Office of the Superintendent of Public Instruction (OSPI): K-12 data
Department of Early Learning
Higher Education Coordinating Board
State Board of Education
State Board for Technical and Community Colleges
Council of Presidents: a voluntary association of the presidents of Washington State's six public baccalaureate degree granting institutions
Employment Security Department: provides employment and employer data
Professional Educator Standards Board
Independent Colleges of Washington
Workforce Training and Education Coordinating Board

OSPI's decision to contract out its K-12 data system may influence the design of the larger LDS. OSPI contracted with a vendor to customize its existing longitudinal system, which will be built over the next year.
The Washington State team is also investigating the Council of Chief State School Officers (CCSSO) initiative, a P-20 core modeling project to be designed by Public Consulting Group (PCG) and funded by the Gates Foundation. The project involves building a data dictionary and a longitudinal data model. Several states are currently participating in this initiative and may use the resulting system. The Washington SLDS team is considering using this model if the state builds the system internally rather than purchasing an off-the-shelf product.
The Washington LDS team has yet to determine its LDS technical architecture. However, the state will be using SQL Server, as it already uses it in current systems. The greatest issue is whether or not the LDS will be outsourced, because the state will have less control of its system if it is outsourced. Additionally, Washington is considering a complete Microsoft BI package, such as SharePoint.

Data Usage and Reporting
The vision for the system was a normalized data set subscription service, provided at least through an authenticated site, which would offer a number of reports. Currently, the team anticipates that matches among agencies will be made primarily through social security numbers (SSNs), although the state will use a centralized matching system that relies on more than SSNs. High school and post-secondary institution records will be matched through SSNs, but K-12 data will be matched differently. Reports will be provided for both named and anonymous users.

Lessons Learned
Currently, Washington is at a political impasse. The various agencies have not been able to agree on an IT governance model, although Dr. Gass declined to provide specifics. The team hopes to overcome this issue within the next few months, especially since there is still much to be done and federal funding will end in July 2013. The LDS team plans to complete the system design in one year, followed by two years of active development.


Appendix B: Best Practice Case Studies Interviewee List
B.1 Indiana Department of Education
Contact:
Molly Chamberlin
Director, Data Analysis Collection and Reporting
Indiana Department of Education
Email/Telephone/Address:
mchamber@doe.in.gov
317-234-6849
151 West Ohio Street
Indianapolis, Indiana 46204
Links:
http://www.doe.in.gov/data/ - Indiana Department of Education Data Web site

B.2 Iowa Department of Education
Contact:
Jay Pennington
Bureau Chief
Iowa Department of Education
Email/Telephone/Address:
Jay.pennington@iowa.gov
515-281-4837
400 E. 14th Street
Des Moines, Iowa 50319
Links:
http://www.iowa.gov/educate/index.php?option=com_content&task=view&id=1691&Itemid=2490 - EdInsight Web site

B.3 Data Strategies - Army Suicide Mitigation Project
Contact:
Susan Carter
Managing Partner
Data Strategies
Kevin Corbett
Managing Partner
Data Strategies
Email/Telephone/Address:
SCarter@DataStrategiesInc.com

814-965-0003
KCorbett@DataStrategiesInc.com
P.O. Box 772
Midlothian, Virginia 23113

B.4 Texas Education Agency
Contact:
Brian Rawson
Director of Statewide Data Initiatives
Texas Education Agency
Nina Taylor
Director of Information Analysis
Texas Education Agency
Email/Telephone/Address:
brian.rawson@tea.state.tx.us
513-936-2383
Nina.taylor@tea.state.tx.us
512-475-2085
1701 North Congress Avenue
Austin, Texas 78701
Links:
http://www.tea.state.tx.us/ - Texas Education Agency Web site
http://www.texaseducationinfo.org/tpeir/TPEIR_Documentation.pdf - Texas Public Education Information Resource

B.5 Data Strategies - DLA Data Convergence and Quality Project
Contact:
Susan Carter
Managing Partner
Data Strategies
Email/Telephone/Address:
SCarter@DataStrategiesInc.com
814-965-0003
P.O. Box 772
Midlothian, Virginia 23113

B.6 NORC Data Enclave
Contact:
Timothy Mulcahy
Senior Research Scientist


NORC at the University of Chicago
Email/Telephone/Address:
Mulcahy-Tim@norc.org
301-634-9330
University of Chicago
4350 East West Highway
Bethesda, Maryland 20814
Links:
http://www.norc.org/DataEnclave - NORC Data Enclave Web site

B.7 Illinois State Board of Education
Contact:
Michael McKindles
ILDS Project Manager
Illinois State Board of Education
Email/Telephone/Address:
mmckindl@isbe.net
217-782-0329
100 N. 1st Street
Springfield, Illinois 62777
Links:
http://www.isbe.state.il.us/ILDS/htmls/project.htm - Illinois Longitudinal Data System Project Web site

B.8 North Dakota Department of Public Instruction
Contact:
Tracy Korsmo
Business Intelligence Program Manager
North Dakota Department of Public Instruction
Email/Telephone/Address:
tkorsmo@nd.gov
701-328-4134
600 E. Boulevard Ave., Dept. 112
Bismarck, North Dakota 58505
Links:
http://www.dpi.state.nd.us/ - North Dakota Department of Public Instruction
http://www.nd.gov/itd/planning/initiatives/roadmap.pdf - State of North Dakota Longitudinal Data System Strategic Roadmap


B.9 State of Washington Education Research & Data Center
Contact:
Michael Gass
State of Washington Education Research & Data Center
Email/Telephone/Address:
Michael.Gass@ofm.wa.gov
360-902-0599
210 11th Avenue SW, Room 318, P.O. Box 43113
Olympia, Washington 98504
Links:
http://www.erdc.wa.gov/default.asp - Education Research & Data Center website
http://nces.ed.gov/programs/slds/pdf/washingtonabstract2009ARRA.pdf - Project Abstract from Department of Education website


Appendix C: Materials Sent to Best Practices Interviewees
C.1 Best Practices Interview Template
General Project Overview
1. What were the objectives of the project?
2. Do you have a project abstract/overview that you are able to share?
3. Who were the stakeholders?
4. How long did the project take?
5. How much did the project cost?
Database Design/Architecture
6. What steps were involved in designing the database/warehouse?
7. What architecture existed within your organization prior to the design and implementation of the system?
8. Describe the general architecture of your system and its different components. Is there a model that you use (federated, non-federated)?
9. Do you have a visual representation of the system that you are able to share?
10. What products are used for the underlying data management (e.g., DBMS)?
Data
11. How much data flows through the system (e.g., number of records)?
12. Were there disparate data sources?
13. Was sensitive (PII) data contained within the source database?
Security
14. What were the security requirements of this system?
15. Does your system de-identify personal data? If so, please describe your data de-identification process.
16. Does the system contain an authentication process? Is it single or dual factor?
Users
17. Can you describe the different users of the system?
18. Were separate processes (databases) used for anonymous users and named users?
19. What level of help desk support was provided?
Implementation
20. Can you give me a picture of what was involved in the implementation process?
21. What was the level of effort? How many man-years did the implementation take from design to implementation?
22. What were the barriers/challenges of implementing this program?
23. What ongoing efforts and resources are needed after a system is up and running?
24. Knowing what you know now, how would you approach the problem and implementation differently?
Performance & Feedback
25. How is performance of the system measured?
26. How is performance affected by variations within the system?
27. Were any compromises made to the system design in order to achieve acceptable levels of performance?
28. Describe the results of the system and its ongoing use.


Other
29. What additional best practices or lessons learned can you share?
30. Are there additional resources (e.g., people, documents, links, etc.) that would be helpful?
31. What are the next steps?


C.2 Architectural Best Practice, Design & Planning Support Project Overview
The LDS will be a logical data warehouse fed by numerous agencies from around the Commonwealth. The data provided by each agency is a subset of the agency's primary data repository (database), and more than one data source from an agency may be required. The linkage between the agencies is focused on a student as s/he progresses through the school system into the workforce. LDS-level information will be de-identified so that no individual can be uniquely identified from the data. Even though the linkage between the agencies is student-centric, each record in the LDS is managed as that of an unidentifiable person. It is imperative that the design of the LDS protect the individual and not be subject to re-identification.43
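The overview requires that no individual be uniquely identifiable but does not name a test for it. One widely used check, shown here only as an illustration and not as the project's chosen method, is k-anonymity: every combination of quasi-identifying attributes in a released data set must be shared by at least k records. The column names, sample rows, and the value of k below are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def small_groups(rows: Iterable[Mapping[str, object]],
                 quasi_identifiers: Sequence[str],
                 k: int) -> Dict[Tuple[object, ...], int]:
    """Return each quasi-identifier combination shared by fewer than k rows."""
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def is_k_anonymous(rows, quasi_identifiers, k) -> bool:
    """True when every quasi-identifier combination appears at least k times."""
    return not small_groups(rows, quasi_identifiers, k)


# Hypothetical de-identified extract: no names or IDs, but still risky.
rows = [
    {"gender": "M", "birth_year": 1993, "county": "Fairfax", "calculus": "F"},
    {"gender": "M", "birth_year": 1993, "county": "Fairfax", "calculus": "B"},
    {"gender": "F", "birth_year": 1994, "county": "Bath", "calculus": "A"},
]
qi = ["gender", "birth_year", "county"]
print(small_groups(rows, qi, k=2))  # {('F', 1994, 'Bath'): 1}
```

The third record is unique on its quasi-identifiers, so it would need generalization (for example, a coarser birth year or region) or suppression before release. k-anonymity alone does not prevent attribute disclosure when every record in a group shares the same sensitive value; extensions such as l-diversity address that case.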
External query interfaces into the LDS will be made available to the public, partner agencies, and researchers. The level of access to LDS data is managed through an authentication and authorization scheme. Levels may range from selection of pre-canned queries to ad hoc requests.
The objective will be to develop a high-level architecture and development strategy that can then be used to determine sourcing requirements and staffing levels, and that will become the input to a detailed design and staffing phase following rapidly from this project. General design questions regarding the project include:
What components of the solution need to be developed (and what needs to be procured)?
What are the key development tasks?
What actual processing and response speeds are required, and what is acceptable?
What will be the best way to de-identify the data? What is the best way to link de-identified data?
CIT will perform research and analysis of other data warehouse designs, integration efforts, and their implementations at similar government and corporate organizations. Through a series of research activities, interviews, and surveys, CIT hopes to develop best practices and lessons learned from such programs. Participation in CIT's research will help the Virginia Department of Education decide what plan of action to execute.

43 Ochoa et al., Reidentification of Individuals in Chicago's Homicide Database: A Technical and Legal Study, tech. report, Massachusetts Inst. of Technology, 2001. Retrieved from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.7467&rep=rep1&type=pdf


Appendix D: Materials Sent to Subject Matter Experts
Prior to an interview, each Subject Matter Expert was provided a package of pre-reading material to familiarize him/her with the SLDS program. The package consisted of a sample set of questions that might be asked during the interview and a white paper on the grant award, the conceptual architecture, constraints placed on the implementation, and sample use cases for the system.

D.1 Subject Matter Expert Interview Template
Data Modeling
1. What do you consider the key steps to a successful integrated data model when incorporating numerous data sources?
2. Can you describe the level of complexity for a data warehouse data model versus a federated data model?
3. What is a common problem overlooked in data modeling when dealing with multiple data sources?

Security (General)
4. In the area of data governance with numerous data owners, are there techniques that make data access controls easy to manage and implement across data sources?
5. Have you been involved with a data breach and any associated legal issues? Do you have any recommendations on how best to avoid a data breach?
6. Are there enhanced security models used with database management systems that add an additional layer of protection (e.g., an additional firewall in front of the DBMS server)?
7. When implementing an in-memory database (IMDB), are there security/protection techniques that differ from those for a standard disk-based data warehouse or federated system?

Security (De-Identification)
8. Can you describe different techniques/algorithms used for data de-identification?
9. Are you familiar with any known issues in de-identification algorithms that could cause data elements to be re-identified?
10. What issues exist in attempting to update de-identified data from its original source? Are there good techniques for achieving this?
11. Can the de-identification process add substantial processing time to a database transaction/query? If so, are there techniques to help minimize the processing overhead?

Architecture Trade-offs
12. Given the described conceptual design, what different implementation models might work best, and why?
13. For any of the given models, are there known constraints/limitations on the implementation (e.g., one model works best for small data sets)?
14. For a federated model, what can impact the performance of data access?
15. If the database is designed for online analytical processing, does that influence the type of implementation model?
16. Are there best practices for linking data records across a federated data model (when a common unique ID is not present)?
17. Can you describe different techniques/algorithms used for linking data across a federated model?
18. Are in-memory databases trending up or down, and why?
19. What lessons learned are there with using an in-memory database?
20. Are there advantages or disadvantages to using an in-memory database to replace a standard data warehouse?

Query Processing
21. Are there preferred query strategies for optimization when the system is implemented as a singular data warehouse versus federated?
22. In a federated model, is query optimization better handled in a centralized, distributed, or hybrid implementation?
23. What impacts from the network topology can affect the query processing implementation?


D.2 Virginia Statewide Longitudinal Data System - Executive Summary
Collaborative Partnership
The Statewide Longitudinal Data System (SLDS) is a collaborative effort by the Virginia Department of Education (VDOE), Virginia Employment Commission (VEC), State Council of Higher Education for Virginia (SCHEV), Virginia Community College System (VCCS) and Virginia Information Technologies Agency (VITA).
The Commonwealth of Virginia's Department of Education (VDOE) successfully secured a multi-year federal grant for the design, development and operation of a Statewide Longitudinal Data System (SLDS) to integrate student and workforce data in the Commonwealth. Specifically, the SLDS will integrate K-12, higher education and workforce data into a single logical database that can be used for research and analysis. Some elements of the SLDS project will focus on transactional data (e.g., transcripts and student records), which will reduce the cost burden for a number of education stakeholders, including students, parents, administrators, school counselors, registrars, and college admissions officers. Other elements will focus on the integration and delivery of de-identified data via a web portal, and on the management and governance of the Commonwealth's education and workforce data.
In order to establish this comprehensive, longitudinal data system, the SLDS will be developed in phases. The initial phase will create a federated longitudinal data linking and reporting system that links data among state agency data sources, including K-12, higher education, and workforce systems. This phase will create a rubric to document data element definitions, data requirements, and technical requirements for de-identified data sets that can be linked among agencies; build a central linking directory based on data sharing agreements in place or established as part of the grant project; and establish a query process for authorized user access that uses the linking directory to anonymously join individual-level records from multiple data sources.

Stakeholders
The SLDS will serve a variety of stakeholders, to include legislators, policy makers, teachers, school administrators, education program directors, researchers (both inside and outside the state), parents, localities, citizens and the media. Appendix A provides a simplistic view of potential uses by stakeholders.

Future Benefits
When complete, the SLDS will provide researchers, analysts, educators, parents, students, policy makers, and program administrators with the following business benefits:
Establishing kindergarten to college and career data systems that track progress and foster continuous improvement;
Enhancing the Commonwealth's ability to examine student progress and outcomes over time by linking individual-level student data from K-12 education, postsecondary education and the workforce system;
Enabling the exchange of data among agencies and institutions within the State and between States to inform policy and practice;
Linking student data with teachers primarily responsible for providing instruction;


Enabling the matching of teachers with information about their certification and teacher preparation programs;
Enabling data to be easily generated for continuous improvement and decision-making;
Ensuring the quality and integrity of data contained in the system; and
Enhancing the Commonwealth's ability to meet reporting requirements of the U.S. Department of Education.

Technology Challenge
Virginia stands out as a special case study in the difficulties of combining data from multiple agencies. In addition to the standard layers of complexity, Virginia-specific privacy laws and a historical system of locally administered, state-supervised public services create additional challenges. This complex network of technological, regulatory, and structural impediments to the integration of individual-level data makes a traditional approach (consolidation of data in a physical central warehouse) untenable. To successfully combine its set of heterogeneous data sources, Virginia proposed a federated data systems approach.

Federated Data Systems
Virginia's federated data system will interact with multiple data sources on the back end and present itself as a single data source on the front end. The key to successfully linking the different data sources is a central linking apparatus. Generally, this is a database management system that has been set up with access privileges to each data source and that houses a linking table populated with the unique identifiers that will be used to join the tables together into one large data set.
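The summary does not specify how the linking table's identifiers are produced beyond referring to a hashed identifier. A minimal sketch, assuming each agency can supply a shared identifier (such as an SSN) for its records and that only the linking directory holds a secret key used to compute a keyed hash (HMAC-SHA-256), might look like this. The agency names come from this summary; the local record keys, the secret, and the function names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List, Tuple


def link_id(secret_key: bytes, shared_identifier: str) -> str:
    """Keyed hash (HMAC-SHA-256) of a digits-only shared identifier."""
    digits = "".join(ch for ch in shared_identifier if ch.isdigit())
    return hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()


def build_linking_table(secret_key: bytes,
                        agency_extracts: Dict[str, List[Tuple[str, str]]]
                        ) -> Dict[str, Dict[str, str]]:
    """Map each link ID to the local record key held by each agency.

    agency_extracts: {agency: [(local_key, shared_identifier), ...]}
    returns:         {link_id: {agency: local_key}}
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for agency, records in agency_extracts.items():
        for local_key, shared_identifier in records:
            table[link_id(secret_key, shared_identifier)][agency] = local_key
    return dict(table)


key = b"held-only-by-the-linking-directory"  # hypothetical secret
table = build_linking_table(key, {
    "VDOE": [("K12-001", "123-45-6789"), ("K12-002", "987-65-4321")],
    "VCCS": [("CC-77", "123456789")],
})
print(table[link_id(key, "123456789")])  # {'VDOE': 'K12-001', 'VCCS': 'CC-77'}
```

A keyed hash is used rather than a plain SHA-256 digest because SSNs have only about a billion possible values; anyone could hash every candidate and reverse an unkeyed digest, whereas the HMAC cannot be recomputed without the directory's key.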

Using a federated system to merge data across agencies
The special requirements imposed on the establishment of a federated data system between public agencies dictate that individual privacy be maintained. For example:
While it may be acceptable for any system user to retrieve information showing that a certain individual participated in a certain program (e.g., William Smith went to Virginia Tech), it would not be acceptable if that same user could link that person to a particular detail specific to that program (e.g., William Smith received a grade of F in Calculus). However, for the purposes of longitudinal research, it is that latter information that is needed (e.g., we don't care about William Smith, but we do want to know how many males failed Calculus in a particular year).

Therefore, what is needed is a system that permits the linking of data relevant to longitudinal research without allowing personal identification of any of the individuals in the data set. The proposed solution has two distinct processes: one for establishing and maintaining an anonymous linking directory, and one that uses the linking directory to join data sources and return a de-identified data set to the user through a data query process.
Virginia has proposed a cross-agency data linking and reporting system that maintains the confidentiality of individual student/teacher/employee data, can be used for accountability and analytic purposes, and meets the requirements of State and federal privacy laws. In Virginia, state law (§§ 2.2-3800 through 2.2-3816) currently prohibits state agencies from sharing personal information across state agencies except under specific circumstances.
In order to meet SLDS program requirements and the needs of the collaborating state agencies, Virginia proposed a methodology that would permit multiple state agencies to merge de-identified individual-level data using a federated data system model. The methodology, developed in conjunction with Virginia's Office of the Attorney General to link data between K-12 and higher education, will permit the two education systems and the many agencies that house workforce education and training programs to link unit-level records through the use of de-identified data sets.

An Example
Suppose a request is received for a report on college students who entered Virginia's community college system for the first time in 2006, and the variables of interest are participation and outcomes on statewide assessments in high school; credential information as of August 31, 2009; and employment outcomes since the student left high school.
Using data element definitions in the system, the user will define the cohort and the variables to include in the dataset. Then, the system will identify students for study in the central directory and, using the hashed identifier in that directory, join data from participating agency systems. Finally, joining tools will replace the identifier in the central directory with another unique hash value for each individual in the dataset, and deliver the final de-identified data in the format specified by the user.
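The example's steps (define the cohort, join agency data through the directory's hashed identifier, then substitute a new per-individual value before delivery) can be sketched as follows. The directory layout, field names, and sample values are assumptions for illustration. The summary does not say how the replacement hash is generated, so this sketch substitutes a random token per individual per request, which keeps rows from separate requests from being joined back together.

```python
import secrets
from typing import Callable, Dict, Iterable, List

Directory = Dict[str, Dict[str, str]]                  # {link_id: {agency: local_key}}
AgencyData = Dict[str, Dict[str, Dict[str, object]]]   # {agency: {local_key: {field: value}}}


def select_cohort(directory: Directory, data: AgencyData, agency: str,
                  predicate: Callable[[Dict[str, object]], bool]) -> List[str]:
    """Link IDs whose record at `agency` satisfies the cohort definition."""
    cohort = []
    for lid, keys in directory.items():
        record = data.get(agency, {}).get(keys.get(agency, ""))
        if record is not None and predicate(record):
            cohort.append(lid)
    return cohort


def build_dataset(directory: Directory, cohort: Iterable[str],
                  data: AgencyData) -> List[Dict[str, object]]:
    """Join each cohort member's agency records, re-keyed with a fresh random token."""
    rows = []
    for lid in cohort:
        row: Dict[str, object] = {"person": secrets.token_hex(16)}  # lid is not returned
        for agency, local_key in directory.get(lid, {}).items():
            for field, value in data.get(agency, {}).get(local_key, {}).items():
                row[f"{agency}.{field}"] = value
        rows.append(row)
    return rows


# Hypothetical directory entries and agency extracts.
directory = {"h1": {"VDOE": "K12-001", "VCCS": "CC-77"}, "h2": {"VDOE": "K12-002"}}
data = {
    "VDOE": {"K12-001": {"sol_reading": "pass"}, "K12-002": {"sol_reading": "fail"}},
    "VCCS": {"CC-77": {"first_term_year": 2006}},
}
cohort = select_cohort(directory, data, "VCCS", lambda r: r["first_term_year"] == 2006)
print(build_dataset(directory, cohort, data))
```

In a real deployment each agency lookup would be a query against that agency's system rather than an in-memory dictionary, and the delivered rows would still need a disclosure check (such as minimum cell sizes) before release.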

Summary
The objective of the SLDS is to propel Virginia's data collection, reporting, and analytic capabilities far beyond current capacities by merging K-12, higher education and workforce data. By merging de-identified data in a federated system, Virginia will maintain compliance with state and federal privacy laws while meeting critical data reporting requirements and policy development needs.


D.3 Virginia Statewide Longitudinal Data System - Usage
Actors:
Virginia's SLDS will be used by a variety of stakeholders, to include legislators, policy makers, teachers, school administrators, education program directors, researchers (both inside and outside the state), parents, students, localities, citizens and the media.

Potentiaal Use:
The folloowing statem
ments represent potentiaal use of the system by acctors:
• Actors will use a web-based portal to access publicly available data from the K-12, postsecondary education, and workforce agencies.
• Actors will access or create reports that will be available in a variety of formats, depending on the user's preferences. Tables, charts, and graphs will be presented to provide different views of the data. Maps will also be used to provide a geographic perspective. GIS data layers, including county and city boundaries, roads, schools, school districts, and related information such as census counts and income levels, will be integrated with the geographic reports to provide contextual information for the data and for further analysis.
• Actors will develop custom reports by combining data from multiple publicly available datasets, and will be able to request the data in multiple formats.
• Actors will develop custom reports by identifying the cohort and the independent and dependent variables, and by selecting an output method (table, chart, graph, Excel, CSV, etc.).
• Actors will view (or create) a report identifying the number and percentage of teachers and principals rated at each performance rating or level.
• Actors will create (or view) a report showing growth data for current-year and previous-year students and estimates of teacher impact on student achievement.
• Actors will view a report of high school graduates who enroll in state institutions of higher education and complete at least one year's worth of college credit within two years.
• The proposed information system will also include reporting capabilities for teachers and other authorized school division personnel, providing estimates of student growth and of teacher impact on student performance on state assessments in reading and mathematics.
• Actors will create reports that link students to course enrollment, course grades, and the teachers providing instruction in each course.
• Actors will be able to view pre-developed and publicly available reports created by VDOE, SCHEV, and VEC.
• Actors will use the longitudinal data system to complete portions of reports required by lawmakers, such as a 2007 study of high school dropout and graduation rates.
• Actors will receive information about students who have failed statewide assessments for two or more years in a row.
• Actors will use the SCHEV Student Data Warehouse to create standard and ad hoc reports on postsecondary education.
• Actors will view standard reports that are publicly available on VDOE's website, including summary data for required and commonly requested information, such as the numbers of students enrolled, graduated, dropped out, and participating in special education instructional services.
• Actors will develop additional reports that combine data to be collected for the student-teacher information system with information already collected at the student level. These reports will compare end-of-course grades with performance on state assessments and provide additional information, by grade and subject, on students not tested.
• Actors will conduct analyses of specific content standards that, when met, describe the type of work students must be able to do to be ready for postsecondary education.
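Once the data are linked, several of the reports listed above reduce to simple aggregations. The Python sketch below illustrates two of them: the count and percentage of teachers and principals at each performance rating, and students who failed a statewide assessment in two or more consecutive years. The tuple layouts and the function names `rating_distribution` and `consecutive_failures` are hypothetical. Counting consecutive failures per subject, rather than across any subject, is also an assumption. The sketch further assumes one rating per educator and one result per student, subject, and year.

```python
from collections import Counter, defaultdict


def rating_distribution(ratings):
    """Count and percentage of educators at each performance rating.

    ratings: iterable of (educator_id, role, rating) tuples (hypothetical layout),
    where role is e.g. "teacher" or "principal"; one rating per educator assumed.
    Returns {(role, rating): (count, percent_of_role_rounded_to_0.1)}.
    """
    counts = Counter((role, rating) for _, role, rating in ratings)
    role_totals = Counter()
    for (role, _), n in counts.items():
        role_totals[role] += n
    return {
        key: (n, round(100.0 * n / role_totals[key[0]], 1))
        for key, n in sorted(counts.items())
    }


def consecutive_failures(results, min_years=2):
    """Students who failed an assessment subject in >= min_years consecutive years.

    results: iterable of (student_id, subject, year, passed) tuples, assuming
    one result per student, subject, and year.
    Returns {student_id: {subject, ...}} for students meeting the threshold.
    """
    failed_years = defaultdict(set)
    for student, subject, year, passed in results:
        if not passed:
            failed_years[(student, subject)].add(year)

    flagged = defaultdict(set)
    for (student, subject), years in failed_years.items():
        run = best = 0
        previous = None
        # Find the longest run of back-to-back failing years for this subject.
        for year in sorted(years):
            run = run + 1 if previous is not None and year == previous + 1 else 1
            best = max(best, run)
            previous = year
        if best >= min_years:
            flagged[student].add(subject)
    return dict(flagged)
```

Keeping each report as a small function over linked records lets the same logic feed any of the output methods listed above (table, chart, CSV).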
