DBDM DBDM
September 19, 2011 Databases and Data Mining 3 September 19, 2011 Databases and Data Mining 4
1
DBDM
Further Topics
Mining object, spatial, multimedia, text and Web data [R] Evolution
Mining complex data objects
of
Database Technology
Multimedia data mining
Text mining
Web mining
Applications and trends of data mining
Mining business & biological data
Visual data mining
Data mining and society: Privacy-preserving data mining
September 19, 2011 Databases and Data Mining 5 September 19, 2011 Databases and Data Mining 6
1970s:
2000s
Relational data model, relational DBMS implementation Stream data management and mining
Data mining and its applications
1980s: Web technology
RDBMS, advanced data models (extended-relational, OO, XML
deductive, etc.) Data integration
September 19, 2011 Databases and Data Mining 7 September 19, 2011 Databases and Data Mining 8
2
The Future of the Past
The Past and Future of 1997: Database Systems: A “Database Systems:
Textbook Case of Research Paying Off. By: J.N. Gray,
Microsoft 1997 A Textbook Case of Research Paying Off”,
“One Size Fits All”: An Idea Whose Time Has Come and
Gone. By: M. Stonebraker, U. Cetintemel. Proceedings of
The 2005 International Conference on Data Engineering,
April 2005, http://ww.cs.brown.edu/~ugur/fits_all.pdf
September 19, 2011 Databases and Data Mining 9 September 19, 2011 Databases and Data Mining 10
September 19, 2011 Databases and Data Mining 11 September 19, 2011 Databases and Data Mining 12
3
Worldwide Vendor Revenue Estimates from RDBMS Software,
Based on Total Software Revenue, 2006 (Millions of Dollars) Historical Perspective (1960-)
September 19, 2011 Databases and Data Mining 13 September 19, 2011 Databases and Data Mining 14
IF (LW-LOAN-AMT ZERO)
“The use of COBOL cripples the OR
(LW-INT-RATE ZERO)
OR
mind; its teaching should, (LW-NBR-PMTS ZERO)
MOVE 1 TO LW-LOAN-ERROR-FLAG
therefore, be regarded as a GO TO 004000-EXIT.
September 19, 2011 Databases and Data Mining 15 September 19, 2011 Databases and Data Mining 16
4
Historical Perspective (1970’s) Network Model
hierarchical model: a tree of records, with each record having
Transition from handling transactions in daily batches to one parent record and many children
systems that managed an on-line database that could
capture transactions as they happened.
September 19, 2011 Databases and Data Mining 17 September 19, 2011 Databases and Data Mining 18
September 19, 2011 Databases and Data Mining 19 September 19, 2011 Databases and Data Mining 20
5
The "relational" data model success Ingres at UC Berkeley in 1972
Both industry and university research communities Inspired by Codd's work on the relational database model, (Stonebraker,
embraced the relational data model and extended it during Rowe, Wong, and others) a project that resulted in:
the 1970s.
It was shown that a high-level relational database query the design and build of a relational database system
language could give performance comparable to the best the query language (QUEL)
record-oriented database systems. (!) relational optimization techniques
a language binding technique
storage strategies
This research produced a generation of systems and pioneering work on distributed databases
people that formed the basis for IBM's DB2, Ingres,
Sybase, Oracle, Informix and others. The academic system evolved into Ingres from Computer Associates.
Nowadays: PostgreSQL; also the basis for a new object-relational system.
SQL
The SQL relational database language was standardized Further work on:
between 1982 and 1986. distributed databases
database inference
By 1990, virtually all database systems provided an SQL
interface (including network, hierarchical and object- active databases (automatic responsing)
September 19, 2011 Databases and Data Mining 21 September 19, 2011 Databases and Data Mining 22
Codd's ideas were inspired by the problems with the DBTG network
data model and with IBM's product based on this model (IMS).
geographically distributed databases
Codd's relational model was very controversial:
too simplistic
parallel data access.
could never give good performance.
IBM Research chartered a 10-person effort to prototype a relational Theoretical work on distributed databases led to
system => a prototype, System R (evolved into the DB2 product) prototypes which in turn led to products.
Defined the fundamentals on: Note: Today, all the major database systems offer the ability to
distribute and replicate data among nodes of a computer network.
query optimization,
data independence (views),
transactions (logging and locking), and Execution of each of the relational data operators
security (the grant-revoke model). in parallel => hundred-fold and thousand-fold
SQL from System R became more or less the standard. speedups.
Note: The results of this research appear nowadays in the
The System R group further research: products of several major database companies. Especially
distributed databases (project R*) and beneficial for data warehousing, and decision support systems;
object-oriented extensible databases (project Starburst).
effective application in the area of OLTP is challenging.
September 19, 2011 Databases and Data Mining 23 September 19, 2011 Databases and Data Mining 24
6
USA funded database research
period 1970 - 1996: The Future of 1997 (Gray)
compiling queries more efficiently, The availability of very-large-scale (tertiary) storage devices has
prompted the study of models for queries on very slow devices.
7
The Future of 1997 (Gray) The Future of 1996
New datatypes (image, document, drawing) are Silberschatz, M. Stonebraker, J. Ullman Eds.
best viewed as the methods that implement them
rather than the bytes that represent them. SIGMOD Record, Vol. 25, No. 1
pp. 52-63
By adding procedures to the database system, March 1996
one gets active databases, data inference, and
data encapsulation. The object-oriented approach
is an area of active research.
(3/3)
September 19, 2011 Databases and Data Mining 29 September 19, 2011 Databases and Data Mining 30
8
Electronic Commerce Health-Care Information Systems
Authentication records.
Funds transfer. Interfaces to information that are appropriate for use by
all health-care professionals.
September 19, 2011 Databases and Data Mining 33 September 19, 2011 Databases and Data Mining 34
9
Support for Multimedia Objects (1996) New Research Directions (1996)
Tertiary Storage (for petabyte
storage) Problems associated with putting multimedia
Tape silos
Disk juke-boxes objects into DBMSs.
New Data Types Problems involving new paradigms for distribution
The operations available for each
type of multimedia data, and the of information.
resulting implementation tradeoffs.
The integration of data involving
several of these new types.
New uses of databases
Data Mining
Quality of Service
Data Warehouses
timely and realistic presentation of
the data?
gracefully degradation service? Can Repositories
we interpolate or extrapolate some
of the data? Can we reject new
service requests or cancel old ones? New transaction models
Multi-resolution Queries Workflow Management
Content Based Retrieval Alternative Transaction Models
User Interface Support Problems involving ease of use and management
of databases.
September 19, 2011 Databases and Data Mining 37 September 19, 2011 Databases and Data Mining 38
September 19, 2011 Databases and Data Mining 39 September 19, 2011 Databases and Data Mining 40
10
Ease of Use (1996) Conclusions of the Forum (1996)
Electronic physical database design tool for index Explosion of digitized information require the solution to
selection and database schema design significant new research problems:
support for multimedia objects and new data types
distribution of information
new database applications
workflow and transaction management
ease of database management and use
September 19, 2011 Databases and Data Mining 41 September 19, 2011 Databases and Data Mining 42
M. Stonebraker, U. Cetintemel
Proceedings
of
The 2005 International Conference
on Data Engineering
April 2005
http://ww.cs.brown.edu/~ugur/fits_all.pdf
September 19, 2011 Databases and Data Mining 43 September 19, 2011 Databases and Data Mining 44
11
DBDMS Services Overview DBMS: “One size fits all.”
September 19, 2011 Databases and Data Mining 45 September 19, 2011 Databases and Data Mining 46
To avoid these problems, all the major DBMS In the early 1990’s, a new trend appeared: Enterprises
vendors have followed the adage “put all wood wanted to gather together data from multiple operational
behind one arrowhead”. databases into a data warehouse for business intelligence
purposes.
A typical large enterprise has 50 or so operational
systems, each with an on-line user community who expect
In this paper it is argued that this strategy fast response time.
has failed already, and will fail more System administrators were (and still are) reluctant to
dramatically off into the future. allow business-intelligence users onto the same systems,
fearing that the complex ad-hoc queries from these users
will degrade response time for the on-line community.
In addition, business-intelligence users often want to see
historical trends, as well as correlate data from multiple
operational databases. These features are very different
from those required by on-line users.
September 19, 2011 Databases and Data Mining 47 September 19, 2011 Databases and Data Mining 48
12
Data Warehousing Data Warehousing
September 19, 2011 Databases and Data Mining 49 September 19, 2011 Databases and Data Mining 50
in data warehousing, but not in OLTP worlds. schemas and optimizer tactics for
normal (“virtual”) views find acceptance in OLTP star schema queries) and
environments. • OLTP DBMS (B-tree indexes
and a standard cost-based
optimizer), which are united by a
common parser
September 19, 2011 Databases and Data Mining 51 September 19, 2011 Databases and Data Mining 52
13
Emerging Applications Emerging Sensor Based Applications
September 19, 2011 Databases and Data Mining 53 September 19, 2011 Databases and Data Mining 54
September 19, 2011 Databases and Data Mining 55 September 19, 2011 Databases and Data Mining 56
14
Example: An existing application:
financial-feed processing Stream Processing
September 19, 2011 Databases and Data Mining 57 September 19, 2011 Databases and Data Mining 58
15
Inbound Processing Outbound vs Inbound Processing
September 19, 2011 Databases and Data Mining 61 September 19, 2011 Databases and Data Mining 62
September 19, 2011 Databases and Data Mining 63 September 19, 2011 Databases and Data Mining 64
16
Other Issues: Integration of DBMS Processing
and Application Logic Other Issues: High Availability
September 19, 2011 Databases and Data Mining 65 September 19, 2011 Databases and Data Mining 66
September 19, 2011 Databases and Data Mining 67 September 19, 2011 Databases and Data Mining 68
17
One Size Fits All? The Fourth Paradigm
September 19, 2011 Databases and Data Mining 69 September 19, 2011 Databases and Data Mining 70
September 19, 2011 Databases and Data Mining 71 September 19, 2011 Databases and Data Mining 72
18
Gray’s Laws
Database-centric Computing in Science VLDB 2010
How to approach data engineering challenges J. Cho, H. Garcia Molina, Dealing with Web Data:
related to large scale scientific datasets[1]: History and Look ahead, VLDB 2010, Singapore,
2010.
Scientific computing is becoming increasingly D. Srivastava, L. Golab, R. Greer, T. Johnson, J.
data intensive. Seidel, V. Shkapenyuk, O. Spatscheck, J. Yates,
The solution is in a “scale-out” architecture. Enabling Real Time Data Analysis, VLDB 2010,
Singapore, 2010.
Bring computations to the data, rather than data P. Matsudaira, High-End Biological Imaging
to the computations. Generates Very Large 3D+ and Dynamic
Start the design with the “20 queries.” Datasets, VLDB 2010, Singapore, 2010.
Go from “working to working.”
September 19, 2011 Databases and Data Mining 73 September 19, 2011 Databases and Data Mining 74
VLDB 2011
Novel Subjects
Social Networks,
MapReduce, Crowdsourcing, and Mining
Mapreduce and Hadoop
Information Integration and Information Retrieval
Schema Mapping, Data Exchange, Disambiguation of
Named Entities
GPU Based Architecture and Column-store indexing
19