Running head: ON THE EVENT-HORIZON OF A DATA-CENTRIC FUTURE

On the Event-Horizon of a Data-Centric Future


Alexander J. Singleton
The George Washington University via Johnson County Community College
CIS-260-004
Professor Jeremy George
May 12, 2016

The advent of database science dates to 1970, when Dr. E.F. Codd of IBM's San Jose Research
Laboratory published groundbreaking research describing the first relational database model
(Murach, J. (2014)). Nearly a decade later, Relational Software, Incorporated, the precursor to
Larry Ellison's empire known today as Oracle, released the first relational database management
system (RDMS) to use Structured Query Language (SQL) for data organization and retrieval
(Murach, J. (2014)). No material developments followed the original RDMS framework until a
decade into the new millennium, when reality became ever more data-centric, enabled in part by
the cloud and, more specifically, by server-side web applications. According to IBM, as of 2013,
worldwide users generated 2.5 quintillion bytes of data each day (by the cited estimate, the
equivalent of 57.5 billion 32 GB iPads), and 90% of the data then in existence had been created
in the prior two years alone (StorageNewsletter. (2012, July 26); Wall, M. (2014, March 4)).
Telecommunications moved to Internet Protocol; social media emerged as a new medium for
communication; and content spread across myriad channels over the air and under the ground.
The result was a new paradigm, given the moniker NoSQL in 2009 (Non-Structured Query
Language, a.k.a. NDMS, Non-Structured Database Systems): an unstructured alternative that
bypasses the rigidity imposed by RDMS SQL and facilitates rapid data collection on the fly, from
Twitter hashtags to Facebook's email search system (Lith, A., & Mattsson, J. (2010)). A
comprehensive examination of RDMS/SQL versus NDMS/NoSQL, beginning with a comparison of
relative strengths and weaknesses in the context of business cases and applications,
ultimately reveals a data-centric future on the event horizon.
A relational database management system using SQL requires data to be logically structured:
records are cataloged by unique identifiers that relate them to a uniform set of data, so that
two different tables can be zipped together when needed. Microsoft Excel offers a visual analogy
for a SQL-driven data model: worksheets correspond to RDMS tables, and workbooks to RDMS
databases. Excel data retrieval isn't far removed from RDMS interaction. Excel's VLOOKUP
function hinges on a unique identifier to query and retrieve column data between two worksheets
that have relatable rows within a workbook; similarly, an RDMS provides several JOIN operations
in lieu of Excel's VLOOKUP to enable table traversal. Relatable unique identifiers define the
very essence of RDMS: an endless relationship of tables and underlying structures, queried
through SQL to organize and retrieve data.
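
The VLOOKUP-to-JOIN analogy above can be illustrated with a minimal sketch. It uses Python's
built-in sqlite3 module with an in-memory database; the table names, column names, and sample
rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; all table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- the unique identifier
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 2, 'monitor'), (12, 1, 'mouse');
""")

# JOIN matches rows across tables through the shared identifier,
# much as VLOOKUP matches on a key column between two worksheets.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

Each printed row pairs an order with the customer name looked up through customer_id, which is
the role the lookup column plays for VLOOKUP.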
NoSQL database management systems are intentionally devoid of logical structuring and
relational dependencies so that they can dynamically accommodate rapid data collection on the
fly. The NDMS approach is designed to bypass the limitations of strict relationships that would
otherwise restrict dynamic management of data. Unlike traditional relationship-oriented
databases (e.g. Oracle, MySQL, SQLite), NoSQL affords the freedom to group collections of data
together, and each solution creates its own querying methodology, typically built on key-value
pairs (Digital Ocean, Inc. (2014, February 21)-A). Essentially, a NoSQL model is a hash table
defined by key-value pairs, like a thesaurus that stores the synonyms pertaining to a given
word. According to Digital Ocean, NoSQL databases are relation-less, or schema-less; they are
not based on a single model, and each database, depending on its target functionality, adopts a
different one (Digital Ocean, Inc. (2014, February 21)-A). NoSQL proliferated because RDMS
scales inefficiently in the horizontal direction across a distributed system, which is a
collection of independent computers that appears to its users as a single coherent system
(Spredzy. (2011, January 3); Kangasharju, J. (2016, May 8)).
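
The hash-table view of NoSQL and the idea of horizontal scaling can be sketched together. The
following is a toy illustration, not any particular product's design: the class name
ShardedKeyValueStore, its methods, and the sample keys are all hypothetical, and plain Python
dictionaries stand in for the independent machines of a cluster.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for one independent machine in a cluster.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        # No schema: any value shape may be stored under any key.
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = ShardedKeyValueStore(node_count=3)
store.put("#databases", ["tweet 1", "tweet 7"])
store.put("user:42", {"name": "Ada", "followers": 10})

print(store.get("#databases"))
print(store.get("missing-key", "not found"))
```

Callers see a single store even though the data lives on several nodes, and the two stored
values have entirely different shapes, which a fixed relational schema would not allow without
extra modeling.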
With the existing database management paradigms, RDMS and NDMS, defined, both can be scoped
within the context of Eric Brewer's CAP theorem, which holds that it is impossible for a
distributed computer system to simultaneously provide all three of the following guarantees:
consistency, availability, and tolerance to network partitioning. For practical comparison,
these trade-offs may be reduced to the following database attributes: structure, querying,
scalability, reliability, support, and application, as summarized below by Digital Ocean, a
cloud infrastructure provider that provisions virtual servers for software developers (CAP
theorem. (n.d.); Digital Ocean, Inc. (2014, February 21)-B):
1. Structure: SQL/Relational databases require a structure with defined attributes to hold the
data, unlike NoSQL databases, which usually allow free-flow operations (Digital Ocean, Inc.
(2014, February 21)-B).
2. Querying: Regardless of their licenses, relational databases all implement the SQL standard to
a certain degree and thus can be queried using the Structured Query Language (SQL).
NoSQL databases, on the other hand, each implement a unique way to work with the data they
manage (Digital Ocean, Inc. (2014, February 21)-B).
3. Scaling: Both solutions are easy to scale vertically (i.e. by increasing system resources).
However, being more modern (and simpler) applications, NoSQL solutions usually offer many
simple means to scale horizontally (i.e. by creating a cluster of multiple machines) (Digital
Ocean, Inc. (2014, February 21)-B).
4. Reliability: When it comes to data reliability and safe guarantee of performed transactions,
SQL databases are still the better bet (Digital Ocean, Inc. (2014, February 21)-B).
5. Support: Relational database management systems have a decades-long history. They are
extremely popular, and it is very easy to find both free and paid support. If an issue arises, it
is therefore much easier to solve than with recently popular NoSQL databases, especially if said
solution is complex in nature (e.g. MongoDB) (Digital Ocean, Inc. (2014, February 21)-B).
6. Data-Warehousing: By nature, relational databases are the go-to solution for complex
querying and data keeping needs. They are much more efficient and excel in this domain, more
so than NoSQL systems (Digital Ocean, Inc. (2014, February 21)-B).
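
The structure and reliability points in the list above can be made concrete with a small
sketch, again using Python's built-in sqlite3 module. The accounts table, the transfer helper,
and the balances are hypothetical. The schema enforces a rule (no negative balances), and the
transaction ensures that a transfer either applies in full or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema declares the structure and a rule that every row must satisfy.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move funds atomically; return True on success, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet the rollback leaves no trace of the
partial update, which is the kind of transactional guarantee the reliability item refers to.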
Clearly, SQL and NoSQL have relative strengths and weaknesses: one may suit a business or use
case where the other does not, so in essence they can be complementary frameworks. It is fair
to maintain, however, that SQL can perform NoSQL operations, albeit at compromised
performance, whereas NoSQL cannot perform SQL operations, owing to its intentional avoidance
of an RDMS schema. So what is the key difference (pardon the pun) between SQL, a relational
database management system, and NoSQL database management systems? The difference is that
RDMS applications store data in tabular form, while DBMS applications store data as files,
which means tables are an option for a NoSQL DBMS, but there will be no relation between the
tables as there is in an RDMS (Udemy. (2014, February 7)). Observing systems in practice helps
determine suitable applications.
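
The claim in the preceding paragraph that SQL can perform NoSQL-style operations can be
sketched as a key-value store layered on a single relational table. This is one possible
illustration under stated assumptions, not a recommended design: the kv table and the put and
get helpers are hypothetical, and values are serialized as JSON text so that records need not
share a schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table: the key is the primary key, the value is opaque JSON text.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def put(key, value):
    # INSERT OR REPLACE overwrites any existing value for the same key.
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
        (key, json.dumps(value)),
    )


def get(key, default=None):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else default


put("user:1", {"name": "Ada"})
put("user:2", {"name": "Grace", "tags": ["navy", "cobol"]})
put("user:1", {"name": "Ada", "email": "ada@example.com"})  # overwrite with a new shape

print(get("user:1"))
print(get("user:3", "not found"))
```

Every read and write still passes through the SQL engine and a serialization step, which hints
at the compromised performance the paragraph mentions.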
Amazon DynamoDB, advertised as a fast and flexible NoSQL database service for any scale, for
which customers pay only for the throughput and storage they need, uses eventual consistency to
come close to providing all three CAP theorem properties (Spredzy. (2011, January 3)).
According to Werner Vogels, CTO of Amazon, Dynamo is internal technology developed at Amazon
to address the need for an incrementally scalable, highly available key-value storage system,
designed to give its users the ability to trade off cost, consistency and performance while
maintaining high availability (Turner, J. (2011, January 12); Vogels, W., & Amazon. (2007,
October 2)). Dynamo is not directly exposed externally as a web service, but it does power
parts of Amazon Web Services, such as S3, a simple storage service providing developers with
secure, scalable cloud storage (Vogels, W., & Amazon. (2007, October 2); Amazon Simple Storage
Service (S3) - Cloud Storage. (n.d.)).
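
Eventual consistency, mentioned above, can be illustrated with a deliberately simplified
sketch. It is not Dynamo's actual mechanism; it is a toy last-write-wins model in which a write
is acknowledged by one replica immediately and delivered to the others later. The class names,
the single logical clock, and the sync step are all assumptions made for illustration.

```python
class Replica:
    """One copy of the data; keeps the highest-versioned value per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    """Acknowledges writes after one replica; the rest catch up on sync()."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []  # updates not yet delivered to other replicas
        self.clock = 0     # hypothetical single logical clock for versions

    def put(self, key, value):
        self.clock += 1
        first, *others = self.replicas
        first.write(key, value, self.clock)  # stays available: no waiting
        for replica in others:
            self.pending.append((replica, key, value, self.clock))

    def sync(self):
        # Deliver queued updates; afterwards all replicas agree.
        for replica, key, value, version in self.pending:
            replica.write(key, value, version)
        self.pending.clear()


store = EventuallyConsistentStore(["a", "b", "c"])
store.put("cart:42", ["book"])
stale = [replica.read("cart:42") for replica in store.replicas]
store.sync()
converged = [replica.read("cart:42") for replica in store.replicas]
print(stale)
print(converged)
```

Before sync() a reader may see a missing or outdated value on some replicas; after it, every
replica returns the same value. That window of disagreement is the consistency traded away in
exchange for availability.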
In addition to non-RDMS systems, Twitter uses an implementation of RDMS SQL called MySQL. Since
incorporation, MySQL has been one of Twitter's key data storage technologies, storing data in
hundreds of schemas; the largest cluster spans thousands of nodes serving millions of queries
per second (Borghino, P., & Twitter. (2015, April 16)). Twitter uses MySQL replication for
fault tolerance and read scalability, storing a wide variety of data from commerce and ads to
authentication, trends, internal services and more (Borghino, P., & Twitter. (2015, April 16)).
Cloudera offers an enterprise distribution of Hadoop, a non-conformist type of database
management. The underlying technology was invented by Google to index the rich textual and
structural information it was collecting and then present meaningful and actionable results to
users (Turner, J. (2011, January 12)). Google's innovation was rolled up into Nutch, an
open-source project, from which Hadoop was later spun off with major contributions from Yahoo
(Turner, J. (2011, January 12)). According to Mike Olson, Hadoop subscribes to relational
database practices, as it was invented to create cachet around a bunch of different projects,
each of which has different properties and behaves in different ways (Turner, J. (2011,
January 12)).
At the end of the day, the question becomes: what problem are you trying to solve in a given
case? Navy SEAL BUD/S training teaches candidates that the plan of attack is determined by the
target, not the weapon. RDMS may be better suited than non-RDMS for certain applications, and
vice versa. For data collection, organization and retrieval, DBAs currently have three weapons
of choice at their disposal: SQL, NoSQL, and the hybrid, Hadoop. But what about the future of
data collection? The number of options available pales in comparison to the closed- and
open-source frameworks available for web application programming. Is this the age of
post-modern database management, or is database science just getting started? Perhaps
blockchain, the underlying technology powering Bitcoin, may provide a new methodology and a
potentially viable database alternative for modern, high-transaction-volume applications
(Jenkins, J. (2016, May 7)). Given the initial claim about the relative strengths and
weaknesses of RDMS and NDMS, the ability of RDMS to do anything NDMS can but not necessarily
vice versa, and how heavily the hybrid framework Hadoop relies on SQL, it is fair to conclude
that relational databases are far from obsolete and probably more relevant than ever as we
approach the event horizon of a data-centric future.

Works Cited

Amazon Simple Storage Service (S3) - Cloud Storage. (n.d.). Retrieved May 12, 2016, from
https://aws.amazon.com/s3/
Borghino, P., & Twitter. (2015, April 16). Another look at MySQL at Twitter and incubating Mysos
[Web log post]. Retrieved May 8, 2016, from
https://blog.twitter.com/2015/another-look-at-mysql-at-twitter-and-incubating-mysos
CAP theorem. (n.d.). Retrieved May 08, 2016, from https://en.wikipedia.org/wiki/CAP_theorem
Digital Ocean, Inc. (2014, February 21)-A. A Comparison Of NoSQL Database Management
Systems And Models [Web log post]. Retrieved May 8, 2016, from
https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models
Digital Ocean, Inc. (2014, February 21)-B. Understanding SQL and NoSQL Databases and
Different Database Models [Web log post]. Retrieved May 8, 2016, from
https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models
Jenkins, J. (2016, May 7). Blockchain (Bitcoin) as a database? Retrieved May 12, 2016, from
http://dba.stackexchange.com/questions/137791/blockchain-bitcoin-as-a-database?atw=1
Kangasharju, J. (2016, May 8). Distributed Systems: What is a distributed system? Lecture
presented at Chapter 1 Lecture in University of Helsinki, Helsinki.
https://www.cs.helsinki.fi/u/jakangas/Teaching/DistSys/DistSys-08f-1.pdf

Lith, A., & Mattsson, J. (2010). Investigating storage solutions for large data (Master's thesis,
Chalmers University of Technology, 2010) (pp. 14-18). Sweden: Chalmers University of
Technology. Retrieved from http://publications.lib.chalmers.se/records/fulltext/123839.pdf
Murach, J. (2014). Murach's Oracle SQL and PL/SQL for developers (2nd ed.). Fresno, CA:
Mike Murach & Associates.
Spredzy. (2011, January 3). What are the differences between NoSQL and a traditional
RDBMS? Retrieved May 12, 2016, from
http://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms

StorageNewsletter. (2012, July 26). 2.5 Quintillion Bytes Created Each Day, Calculated ViaWest.
Retrieved May 12, 2016, from
http://www.storagenewsletter.com/rubriques/marketreportsresearch/viawest-2-5-quintillion-bytes-each-day/

Turner, J. (2011, January 12). Hadoop: What it is, how it works, and what it can do [Web log
post]. Retrieved May 8, 2016, from https://www.oreilly.com/ideas/what-is-hadoop
Udemy. (2014, February 7). Key Differences Between DBMS and RDBMS [Web log post].
Retrieved May 8, 2016, from https://blog.udemy.com/differences-between-dbms-and-rdbms/
Vogels, W., & Amazon. (2007, October 2). Amazon's Dynamo [Web log post]. Retrieved May 9,
2016, from http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Wall, M. (2014, March 4). Big Data: Are you ready for blast-off? BBC News. Retrieved May 09,
2016, from http://www.bbc.com/news/business-26383058