Gartner Magic Quadrant for Data Quality Tools 2010 Says Open Source no Good until 2012
Vincent McBurney | Jul 8 | Comments (2)

The Magic Quadrant for Data Quality Tools for 2010 shows DataFlux ahead of the pack with
IBM and Informatica neck and neck behind while Oracle and Microsoft do not even qualify.
Every year or so Gartner releases a report on the main data quality software vendors in the
market:
Use this Magic Quadrant to understand the data quality tools market and how Gartner rates the
leading vendors and their packaged products in that market. Draw on this research to evaluate
vendors based on a customized set of objective criteria. Gartner advises organizations against
simply selecting vendors in the Leaders quadrant. All selections are buyer-specific, and vendors
from the Challengers, Niche Players or Visionaries quadrants could be better matches for your
requirements.
The report usually costs a few hundred bucks, but each time a few of the vendors who did
well in it offer it for free as long as you give them all your contact details. Here are the
press releases for this year's quadrant where you can click through to find the report:
• DataFlux Placed in Leaders Quadrant for 2010 Data Quality Tools Magic Quadrant
• Gartner Places Trillium Software in the 'Leaders' Quadrant of the 2010 Data ...
• Informatica Positioned in Leaders Quadrant in 2010 Data Quality Tools Report
Gartner estimates the market for data quality tools in 2009 was $727 million – which is a long
way behind ETL or EAI revenues but still a decent chunk of change.
The Result
For the last few years I have been comparing the current Magic Quadrant to the one that
came before it. The diagram shows each vendor's 2009 position as an orange dot and its
2010 position as a green dot, joined by a green line for vendors who have risen and a red
line for those who have dropped.
You can find similar comparisons in my previous blog posts on this topic:
• DataFlux Still Leads in the Gartner Magic Quadrant for Data Quality Tools 2009
• Talend Gate Crashes the 2009 Gartner Magic Quadrant for Data Integration
• IBM has the strongest vision – Gartner Magic Quadrant for Data Integration Tools 2008
• DataFlux Takes the Lead in Magic Quadrant for Data Quality Tools 2008
• Gartner thinks everyone is looking up in the Magic Quadrant for Data Quality Tools,
2007
• 2006 Gartner Data Integration Tools Magic Quadrant
• Oracle Plunges and no one soars in the 2006 Gartner Data Integration Tools Magic
Quadrant
• Blog Fight - Gartner Versus Open Source Data Integration
The Winners
• DataFlux – the clear winner in the report as they have the best Ability to Execute and the
best Completeness of Vision. Leading this quadrant for the third time in a row.
• Trillium Software – keeps plugging along.
• IBM and Informatica – not doing too badly considering the core business of their data
integration suites is ETL, yet they still come in the top three vendors of both the Data
Integration and Data Quality quadrants.
• Pitney Bowes – are shooting towards the leaders quadrant and would hope to be a leader
next year.
The Losers
• SAP-Business Objects – You have to wonder when the acquisition by SAP will stop putting
them in reverse against the competition on data integration and data quality and start
boosting them. Does SAP value the data integration and data quality assets it acquired
through Business Objects, or was it just after the BI products?
• TIBCO – just spent a bundle of money buying Netrics, the data quality matching vendor
only to find out Netrics has been dumped from the data quality quadrant because their
focus is too narrow. TIBCO may have acquired one pure play data quality software
player but now they have to build on that.
• Oracle and Microsoft – are traditionally weak in data quality software since it doesn’t
have the same size of revenue as databases and other business software. The Gartner
report suggests they are starting to get busy - “While Oracle and Microsoft have both
recently begun to address this market via acquisitions, their market presence is currently
very limited. Oracle has just begun actively selling the acquired technology as a
complementary add-on for its product MDM solution, while Microsoft will be delivering
its technology to customers for the first time as part of the next major release of the SQL
Server database management system (DBMS).”
Open Source Flamed
Usually a Gartner Magic Quadrant simply ignores the open source tools; this one flames them.
Another significant development is related to licensing approaches and delivery models for data
quality tools. The open-source movement has reached the data quality tools market, with a few
small projects rising to the surface. Organizations with a need for data quality capabilities, such
as profiling, cleansing, matching or enrichment, should not expect deep functionality and should
stick with commercial offerings for critical production implementations. All open-source data
quality projects combined will reach just 3% to 5% market penetration (subscribed customers)
by 2012. It will likely be well beyond 2012 before open-source data quality platforms have
broadly caught up with the commercial data quality tools vendors in terms of their capabilities.
The Criteria
These are the defined functions evaluated for the report:
The data quality tools market comprises vendors that offer stand-alone software products to
address the core functional requirements of the data quality discipline:
• Profiling. The analysis of data to capture statistics (metadata) that provide insight
into the quality of the data and help to identify data quality issues.
• Parsing and standardization. The decomposition of text fields into component
parts and the formatting of values into consistent layouts based on industry
standards, local standards (for example, postal authority standards for address
data), user-defined business rules and knowledge bases of values and patterns.
• Generalized "cleansing." The modification of data values to meet domain
restrictions, integrity constraints or other business rules that define when the
quality of data is sufficient for the organization.
• Matching. Identifying, linking or merging related entries within or across sets of
data.
• Monitoring. Deploying controls to ensure that data continues to conform to
business rules that define data quality for the organization.
• Enrichment. Enhancing the value of internally held data by appending related
attributes from external sources (for example, consumer demographic attributes or
geographic descriptors).
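To make a few of these functions concrete, here is a minimal sketch in Python using only the standard library. It is not how any vendor in the report implements them. The function names (`profile_column`, `standardize_phone`, `match_names`), the 10-digit phone rule and the 0.85 similarity threshold are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
import re


def profile_column(values):
    """Profiling: capture simple statistics about one column."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def standardize_phone(raw):
    """Parsing and standardization: keep digits only, format 10-digit numbers."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 10:
        return None  # assumed rule: anything else is flagged for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def match_names(a, b, threshold=0.85):
    """Matching: treat two names as the same entity if they are similar enough."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio() >= threshold
```

A commercial tool would replace each of these with knowledge bases, postal reference data and far more sophisticated matching algorithms, which is where most of the licence cost goes.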
There are a few functions missing that I think are important for major data quality initiatives:
1. Data governance – tracking data stewardship and manual cleansing steps.
2. Business metadata – defining the business concepts, relationships and business rules of
what is being cleansed.
You might be able to think of some additional criteria not covered by the report.
IBM Strengths
The report lists IBM's strengths as the successful embedding of the data quality message into
its Information on Demand marketing, the positioning of Information Analyzer and QualityStage
as enterprise-wide data quality tools, and support for one of the most diverse ranges of data
domains of any vendor in the report.
Customers reported high satisfaction with scalability and performance. I don’t think there is
another data quality profiling or cleansing product on the market that can match it for scalability
and high volume processing since both Information Analyzer and QualityStage run on top of the
DataStage parallel processing engine.
IBM Cautions
The first caution is around mind share – and IBM gets this same caution each time this report
comes out. IBM does a lot of marketing of the entire IOD stack through conferences, product
demos and developerWorks – but it does not have some of the high profile buzz of DataFlux
and Informatica. The DataFlux Community of Experts has data quality gurus blogging on it such
as David Loshin, Joyce Norris-Montanari, Jim Harris from the Obsessive-Compulsive Data
Quality blog, Dylan Jones (from Data Quality Pro), Charles Blyth, Phil Simon, Rich Murnane
and Robert Seiner. Informatica have the same on their Informatica Perspectives blog. They do
not quite have the same data quality credentials as the DataFlux Community but they
Other complaints against IBM are high price points and complexity of the overall system. The
products reviewed were QualityStage and Information Analyzer. This problem is addressed by
the IBM Exeros acquisition that has become InfoSphere Discovery. It is much cheaper than the
other Information Server products and it can be installed and run on a desktop letting you do
quick win data profiling at the start of a project.
One Data Quality to Rule Them All?
IBM has data quality on an enterprise scale as part of a larger data integration play. QualityStage
can scale up massively. Information Analyzer can support an entire team of data analysts and
can store the results and schedule them to keep running for years to come. These are products
that support large teams and large data. This means IBM do not do so well in the small data
quality space.
This is due to change with the introduction of InfoSphere Discovery – the data quality product
that IBM got from the acquisitions of Optim and Exeros. This solves all the cautions in this
report – it is a much lower price than Information Analyzer, it is easier to install, configure and
use. It has a wide range of functions that no one else on the quadrant does – business rule
discovery, source to target automated mapping, unified schema builder and the data quality
exception manager with a stewardship dashboard.
InfoSphere Discovery was not included in this quadrant but if IBM integrates it into their suite
effectively they should move to the front of this quadrant in the next report. The tricky part is
keeping the product small enough to be used by a data analyst but making it integrate with the
much larger Information Server for a holistic approach to data quality and data governance. I
have already seen the sales positioning that puts Discovery and Information Analyzer together
for validation and exception handling and it looks good.
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent
my employer's view in any way.
2 Comments

USER_2086161 | Jul 8
Great post, Vincent. I too was a little surprised at the flaming of open source tools. Although I
understand the report's point about low market penetration based on subscriptions to the
enterprise editions of open source data quality tools, I believe that the significant number of
users with the community edition (GPL license) is difficult to ignore. For example, I believe that
Talend Open Profiler has been remarkably well-received, especially the community edition. Of
course, I also agree with the report that these open source data quality tools do not yet provide
robust data quality capabilities on par with the depth of functionality provided by the market
leaders, but the tremendous environment of collaboration and support provided by the open
source community, definitely helps drive innovative product enhancements. In my opinion, a
company like Talend at least deserves to be mentioned in the Visionaries quadrant, albeit the
lower left corner of it. I too am very glad to see what IBM has done with InfoSphere Discovery
(via the acquisitions of Optim and Exeros), and I share your hope that they can keep the product
small enough (and priced low enough) to be used by data analysts (and small companies) but
also integrate it into the Information Server platform for a holistic approach to data quality and
data governance. Thanks for mentioning my blog and my contributions to the DataFlux
Community of Experts. (As an aside, I wonder if anyone sees the irony in the fact that I am an
IBM Information Champion, but I am helping DataFlux generate high level buzz). Best
Regards, Jim

Vincent McBurney | Jul 8


I think data profiling is the sweet spot for open source data quality - a lot of profiling tools are
just a SQL generator under the covers. Data stewardship screens and forms are another quick
win for a data quality offering. The hard parts of data quality, the parts that I don't think Open
Source can ever build as well as commercial, are advanced matching and address cleansing. Even
the big vendors like IBM and Informatica outsource international address cleansing to a
specialist like Address Doctor and that costs money to license. I think Open Source DQ is going
to have to license some advanced functions to compete.
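As a rough illustration of the "SQL generator under the covers" point, the sketch below builds a single profiling query from a list of column names and runs it against an in-memory SQLite database. It is only a sketch under stated assumptions: `build_profile_sql`, `profile_table`, the `customer` table and its sample rows are hypothetical, and it does not describe how any particular profiling product works.

```python
import sqlite3


def build_profile_sql(table, columns):
    """Generate one SELECT that profiles every listed column.

    Identifiers are interpolated directly, so table and column names
    must come from a trusted source (for example the database catalog).
    """
    parts = ["COUNT(*) AS row_count"]
    for col in columns:
        parts.append(f'COUNT(*) - COUNT("{col}") AS "{col}_nulls"')
        parts.append(f'COUNT(DISTINCT "{col}") AS "{col}_distinct"')
        parts.append(f'MIN("{col}") AS "{col}_min"')
        parts.append(f'MAX("{col}") AS "{col}_max"')
    return f'SELECT {", ".join(parts)} FROM "{table}"'


def profile_table(conn, table, columns):
    """Run the generated SQL and return the statistics as a dict."""
    cursor = conn.execute(build_profile_sql(table, columns))
    names = [d[0] for d in cursor.description]
    return dict(zip(names, cursor.fetchone()))


# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, postcode TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ann", "3000"), ("Bob", None), ("Ann", "3121")],
)
stats = profile_table(conn, "customer", ["name", "postcode"])
```

Here `stats` holds the row count plus null count, distinct count, minimum and maximum for each column. Pushing the work into the database like this is what makes basic profiling cheap to build; matching and address cleansing have no equivalent shortcut.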

Disclaimer: Blog contents express the viewpoints of their independent authors and are not
reviewed for correctness or accuracy by Toolbox for IT. Any opinions, comments, solutions or
other commentary expressed by blog authors are not endorsed or recommended by Toolbox for
IT or any vendor. If you feel a blog entry is inappropriate, click here to notify Toolbox for IT.
