WHITE PAPER
This document contains Confidential, Proprietary and Trade Secret Information ("Confidential Information") of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica. While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice. The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product, as well as the timing of any such release or upgrade, is at the sole discretion of Informatica. Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700. This edition published February 2012.
Table of Contents
Executive Summary
Big Data Getting Bigger, Analytical Complexity Exploding
    Inadequacy of Legacy Monitoring Tools
    Inadequacy of Existing Data Warehouse Management Practices
Introducing Lean Data Warehouses Practices
Implement Lean Data Warehouse Practices and Take Meaningful Action Based on the Analysis to Deliver Immediate and Quantifiable Benefits
Informatica Solutions for Lean Data Warehouses
    Informatica Data Warehouse Advisor
Conclusion
Lean Data Warehouse Practices: Optimize Data Warehouses with Better Visibility into Data Usage
Executive Summary
As technological advances have reshaped business, government, and consumers, business intelligence (BI) applications and data warehouse deployments have grown from departmental to enterprise-wide in recent years. As a result, the appetite for data is insatiable and analytic data volumes are growing exponentially, with data warehouse systems of hundreds of terabytes and even petabytes becoming the norm. With exploding data volumes and increasing analytic complexity, information technology (IT) managers are under siege to respond to business needs while reducing the costs associated with data delivery.
Unfortunately, data managers, application database administrators, data architects, and analytic application managers lack the instrumentation to see what data is used or unused and, more importantly, to decipher how data is being used so that they can retain and optimize the most relevant assets.
Lean Data Warehouses is a best-practices methodology for gaining greater visibility into the data warehouse environment by monitoring business activity and data usage, and managing data growth in data warehouses. Based on this visibility, organizations can reduce data management costs and ensure scalability of both infrastructure and available IT resources. The key objectives of Lean Data Warehouses are to:
- Justify costs, prioritize and invest resources based on business utilization
- Retain and optimize the most relevant data and processes
- Respond faster and ensure scalability and performance
Lean Data Warehouses is one of the three pillars of Lean Data Management best practices and aims to address the challenges of managing Big Data warehouses (see Figure 1). Lean Data Management is adapted from Lean Manufacturing practices that emphasize waste elimination to reduce costs. The other two pillars of Lean Data Management address the challenges of managing Big Applications and Big Application Portfolios.
Figure 1. Lean Data Management: manage big data, reduce costs, meet SLAs. Lean Applications pillar: archive production, subset nonproduction, improve performance, reduce maintenance
To gain tangible benefits, a comprehensive solution for Lean Data Warehouses based on usage monitoring should be combined with best practices to analyze how data is used and take meaningful action. The best practices to leverage usage monitoring to deliver immediate and quantifiable benefits include:
- Developing key performance indicators (KPIs) to expose business utilization and consumption
- Identifying unused and unnecessary data
- Streamlining data loads and archiving data based on identification of unused and infrequently accessed data
- Optimizing and tuning databases based on actual data usage
Effective usage monitoring requires a solution that integrates with the BI, data warehouse, and data integration stacks to provide a complete view into business activity and data usage. Informatica Data Warehouse Advisor is a software solution that monitors how business units and departments use data so that IT organizations can improve operational efficiency, scalability, and performance and control data delivery costs.
Once dormant data is identified, Informatica Data Archive should be used to move the inactive data out of production instances to substantially reduce production data size, costs, and maintenance windows and increase data warehouse availability and performance. To further manage data growth in nonproduction environments, Informatica Lean Data Warehouse practices employ the Test Data Management solution (TDM), based on Informatica Data Subset software. This solution significantly reduces the footprint of nonproduction data warehouses by creating smaller, referentially intact subset copies of production that contain only the most relevant data for the user. Together, Informatica Data Archive, Data Subset, and Data Warehouse Advisor furnish the solutions to support Lean Data Warehouse practices.
In the rest of this white paper, we will discuss:
- The challenges around data growth and the drivers for adopting a Lean Data Warehouses approach
- Inadequacies of legacy data warehouse monitoring tools and data warehouse management practices
- How Informatica Data Warehouse Advisor, Data Archive, and Data Subset provide the right technology
Data Management for BI, Aberdeen Group, December 2010
Unfortunately, most inefficiency in BI and data warehousing environments arises from a lack of understanding of how applications and data are actually used across the organization. Data managers, application database administrators, data architects, and analytical application managers are hampered by a lack of tools designed to provide visibility into what data is used or unused and, more importantly, how data is being used. This information is critical to retain and optimize the most relevant data assets.
Data warehouses experience many of the same data growth issues as transactional systems: higher infrastructure costs to support dormant data in production, large and growing maintenance windows, lower user productivity due to poor performance, the creation of derivative data marts, and the multiplier effect in the many nonproduction copies, with the associated risk of data breaches. These issues are all more significant for data warehouses because data warehouses integrate data from multiple applications, as well as historical information for analytical reporting, so they are much larger and usually grow faster than transactional systems. Instead of a few terabytes, data warehouses typically contain tens to hundreds of terabytes.
According to industry estimates, data warehouses actively use only the first year's worth of data, but maintaining historical data can easily increase the storage requirement to as much as 20 times the current year's data size. Based on this estimate and the massive potential size of data warehouses, the impact on cost could be substantial.
"We have too much of the same information that typical DBA tools provide. But that is not useful to understand what our business users are doing and how data is used."
Director of Data Management, Large Financial Services Organization
Inadequacy of Existing Data Warehouse Management Practices
A data warehouse environment can face many challenges associated with explosive data growth. They include declining performance, lengthening maintenance windows, burgeoning production infrastructure costs, the inability to meet SLAs, and the multiplier effect. In addition, the usual practice of making complete copies of production for nonproduction purposes compounds these issues.
A common practice to address declining performance is data warehouse tuning. If data warehouse DBAs spend a lot of time tuning, they won't have time for more proactive activities (that is, unless you hire more DBAs). The data warehouse can also reach a point where tuning is no longer effective.
Longer maintenance windows (backup, disaster recovery, replication, and upgrades) due to explosive data growth mean maintenance tasks need to be broken up into multiple, shorter windows. This requires more complex planning, or data warehouse availability will be impacted. Because the database is growing, you may find yourself having to address this issue repeatedly, eventually running out of time and cutting into your operational hours.
One of the most common ways to cope with explosive data growth is to purchase hardware to accommodate it, typically additional storage and server upgrades. Because production systems run on the most expensive storage and servers, and much of that capacity may support data that has limited value to the organization, be sure that the hardware you are buying is indeed necessary to support the data residing on it.
Most IT organizations make a complete copy of the production system for their nonproduction environments. Creating full nonproduction copies is inefficient and expensive in terms of maintenance, support, and storage costs (see Figure 2). As development and testing environments get bogged down with unnecessary and obsolete data, system performance suffers and IT teams struggle to meet service-level agreements.
Figure 2. A 10 TB production data warehouse and six full-size 10 TB copies of production
Over time, using full-size copies in this manner can become costly.
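As a rough illustration of this multiplier effect, the sketch below totals the storage consumed by one 10 TB production warehouse and six full-size copies, the configuration depicted in Figure 2, and compares it with a case in which each copy is a smaller subset. The function name and the 20 percent subset ratio are assumptions chosen for illustration, not figures from this paper.

```python
def total_footprint_tb(production_tb, copies, copy_ratio=1.0):
    """Return total storage (TB) for production plus its nonproduction copies.

    copy_ratio is the size of each copy relative to production:
    1.0 models a full copy, smaller values model a subset.
    """
    if production_tb < 0 or copies < 0 or not 0 <= copy_ratio <= 1:
        raise ValueError("invalid sizing inputs")
    return production_tb + copies * production_tb * copy_ratio


# One 10 TB production warehouse with six full-size copies.
full = total_footprint_tb(10, 6)

# Same environment, but each copy is an assumed 20 percent subset.
lean = total_footprint_tb(10, 6, copy_ratio=0.2)

print(f"full copies: {full:.0f} TB, lean subsets: {lean:.0f} TB")
```

Under these assumptions the nonproduction copies, not production itself, account for most of the footprint, which is why the size of each copy is the first lever to examine.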
"I don't want to spend millions of dollars in additional hardware without first figuring out how we can better utilize what we support today. I want the ability to measure who is doing what and what is used and unused to manage our data and associated infrastructure more efficiently."
IT Director, Large Healthcare Services Company
Respond faster and ensure scalability and performance
BI and data warehouse systems are growing in data size and complexity and must be continuously available to support diverse, often global user communities. Business users expect that IT teams proactively discover and respond to issues before critical business services are adversely affected. Instead of managing the BI, data warehouse, and data integration stacks as independent silos, organizations should provide end-to-end visibility to the multifunctional teams responsible for data delivery. By deploying a solution that provides an integrated view into the activity of BI users and applications with a correlated view into data warehouse usage and performance, IT organizations can improve operational efficiency and reduce the time and effort required to diagnose performance bottlenecks.
Implement Lean Data Warehouse Practices and Take Meaningful Action Based on the Analysis to Deliver Immediate and Quantifiable Benefits
Develop key performance indicators (KPIs) to expose business utilization and consumption
Too often, IT organizations rely on intuition and gut instinct to make key decisions relating to hardware and software investments and performance optimizations. By measuring and analyzing usage activity from a business-centric view, IT organizations not only can develop key performance objectives and metrics but can measure the results and variances as well. Delivering actionable information across the IT organization lets the support staff work more efficiently to achieve strategic and tactical objectives. It also lets senior IT managers measure the effectiveness of their investment, drive initiatives that can gain the most business value, and reduce the total cost of ownership.
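One way to picture such business-centric KPIs is a small aggregation over usage records. The sketch below is a minimal example, assuming a hypothetical usage log of (business unit, user, query seconds) tuples; the record layout, the function name, and the sample data are illustrative assumptions and do not represent the output of any Informatica product.

```python
from collections import defaultdict


def usage_kpis(usage_log):
    """Aggregate per-business-unit KPIs from (unit, user, seconds) records."""
    stats = defaultdict(lambda: {"queries": 0, "users": set(), "seconds": 0.0})
    for unit, user, seconds in usage_log:
        entry = stats[unit]
        entry["queries"] += 1
        entry["users"].add(user)
        entry["seconds"] += seconds

    total_seconds = sum(e["seconds"] for e in stats.values())
    return {
        unit: {
            "queries": e["queries"],
            "active_users": len(e["users"]),
            # Fraction of total query time consumed by this unit.
            "share_of_load": e["seconds"] / total_seconds if total_seconds else 0.0,
        }
        for unit, e in stats.items()
    }


# Hypothetical sample records: (business unit, user, query seconds).
log = [
    ("Finance", "ana", 30.0),
    ("Finance", "ben", 10.0),
    ("Finance", "ana", 20.0),
    ("Marketing", "cy", 60.0),
]
kpis = usage_kpis(log)
for unit, values in sorted(kpis.items()):
    print(unit, values)
```

Metrics of this kind give IT managers a baseline against which to measure variances and to justify investment by actual business consumption.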
Identify unused and unnecessary data to streamline data loads and archive inactive data
Because business users are constantly demanding more data, it is imperative for IT managers to assess and identify data that is being unnecessarily loaded every day (and often many times a day) into the warehouse but is not used or required. By identifying unused data (i.e., schemas, tables, and columns), IT can collaborate with the business to develop a more efficient process of sourcing only the necessary data, thereby streamlining data loads and indirectly improving data load times as well.
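The identification step can be sketched as a set difference between what is loaded and what is actually queried. The example below assumes two hypothetical inputs: a catalog of loaded (table, column) pairs and the (table, column) pairs observed in query activity. Both inputs and all names are illustrative assumptions; how such data is collected depends on the monitoring solution in use.

```python
def find_unused(catalog, accessed):
    """Split the catalog into unused columns and wholly unused tables.

    catalog  : iterable of (table, column) pairs that are loaded
    accessed : iterable of (table, column) pairs seen in query activity
    """
    catalog = set(catalog)
    unused_columns = catalog - set(accessed)
    # A table counts as used if at least one of its columns was accessed.
    used_tables = {table for table, _ in catalog - unused_columns}
    unused_tables = {table for table, _ in catalog} - used_tables
    return sorted(unused_columns), sorted(unused_tables)


# Hypothetical catalog and usage data.
catalog = [
    ("sales", "amount"),
    ("sales", "region"),
    ("sales", "fax_number"),
    ("legacy_promo", "code"),
    ("legacy_promo", "label"),
]
accessed = [("sales", "amount"), ("sales", "region")]

columns, tables = find_unused(catalog, accessed)
print("unused columns:", columns)
print("unused tables:", tables)
```

Wholly unused tables are candidates for removal from the load, while unused columns in otherwise active tables are a prompt for a conversation with the business before anything is dropped.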
In addition, by identifying dormant data, or data that's no longer used, IT organizations can develop a plan for archiving historical data in a lower-cost infrastructure (see Figure 3). Informatica Data Archive provides a highly compressed (up to 98 percent), immutable, secure, optimized archive that can be accessed easily and quickly for e-discovery or reporting purposes. The operational benefits of archiving are clear: minimize maintenance windows and substantially reduce the footprint of your data warehouse (which sits on the most expensive infrastructure), while moving inactive data to the highly compressed archive on less expensive infrastructure.
Figure 3. Monitor data usage and identify dormant records for archiving
For example, a large pharmaceutical company has been able to reduce data management costs by more than $500,000 annually by retaining only data that is relevant and used by the business and archiving inactive data. In addition, it continually streamlines its data loads by pruning unnecessary data and has reduced batch load times by 50 percent.
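A simple way to think about selecting archive candidates is to compare each table's last access date against a dormancy threshold and then estimate the archived footprint. The sketch below assumes a hypothetical inventory mapping table names to (last access date, size in GB). The 365-day threshold, the 90 percent compression ratio, and the sample tables are illustrative assumptions; actual compression depends on the data and the archive technology.

```python
from datetime import date


def archive_candidates(tables, as_of, dormant_days=365):
    """Return names of tables not accessed within dormant_days of as_of.

    tables maps table name -> (last_access_date, size_gb).
    """
    return sorted(
        name
        for name, (last_access, _) in tables.items()
        if (as_of - last_access).days > dormant_days
    )


def archived_size_gb(tables, names, compression=0.90):
    """Estimate the archive footprint after compressing the named tables."""
    raw = sum(tables[name][1] for name in names)
    return raw * (1 - compression)


# Hypothetical inventory: name -> (last access date, size in GB).
tables = {
    "orders_2008": (date(2009, 3, 1), 800.0),
    "orders_2011": (date(2012, 1, 15), 500.0),
    "web_clicks_2007": (date(2008, 6, 30), 1200.0),
}

candidates = archive_candidates(tables, as_of=date(2012, 2, 1))
print("archive candidates:", candidates)
print(f"estimated archive size: {archived_size_gb(tables, candidates):.0f} GB")
```

Keeping the dormancy threshold and compression ratio as parameters makes it easy to test how sensitive the projected savings are to each assumption.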
Figure 4: Optimize the data warehouse by monitoring BI reports and used columns
By using smaller, targeted clones of the production data warehouse for new projects, testing teams experience less lag time and save considerable storage costs.
Figure 6: Use Lean Data Warehouse practices to monitor data usage, identify dormant data that can be archived, eliminate unused data from data warehouse loads, and create lean subsets in nonproduction environments to further reduce costs
Conclusion
Data warehouses in large enterprises are routinely growing into the tens and hundreds of terabytes, and the associated data management costs and complexity are growing exponentially. Organizations cannot continue supporting this growth without a prohibitive impact on costs associated with resources and infrastructure. Lean Data Warehouses is composed of best practices and solutions that leverage usage monitoring and data growth management to increase operational efficiencies, ensure scalability of infrastructure and available IT resources, and reduce data growth management costs. Effective usage monitoring requires a solution that integrates with the BI, data warehouse, and data integration stacks to provide a comprehensive view into business activity and data usage. Monitoring data usage is only the first step in the Lean Data Warehouses practice. Once you see how data is used, you need to act upon it by:
- Eliminating unused data from data loads
- Optimizing the data warehouse schema
- Proactively archiving data periodically to reduce the data warehouse size, creating lean subsets of production data in nonproduction environments to further reduce costs
Informatica delivers best-in-class technology and solutions to implement Lean Data Warehouses practices (see Figure 6). With Informatica Lean Data Warehouses solutions, you'll lower the total cost of ownership of your data warehouses and other applications by:
- Reducing storage, server, software, and maintenance costs
- Improving data warehouse performance
- Increasing data warehouse availability
- Supporting compliance with internal, industry, and governmental mandates and regulations
Together, Informatica and your IT organization can align the business value of data in your data warehouses with the most appropriate and cost-effective IT infrastructure to manage it.
About Informatica
Informatica Corporation (NASDAQ: INFA) is the world's number one independent provider of data integration software. Organizations around the world rely on Informatica to gain a competitive advantage with timely, relevant and trustworthy data for their top business imperatives. Worldwide, over 4,630 enterprises depend on Informatica for data integration, data quality and big data solutions to access, integrate and trust their information assets residing on premise and in the Cloud. For more information, call +1 650-385-5000 (1-800-653-3871 in the U.S.), or visit www.informatica.com.
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com
© 2012 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, The Data Integration Company, Ultra Messaging, and RulePoint are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. First Published: December 2011 1887 (02/16/2012)