
Business Intelligence and E-Discovery

By Nadia Brannon
Nadia Brannon is a data analytics and database forensics analyst at LECG. She may be contacted at nbrannon@lecg.com.

Intellectual Property & Technology Law Journal
Volume 22 Number 7 July 2010

In the modern and increasingly complex business environment, business intelligence (BI) platforms are becoming more prevalent, especially in large organizations. Adoption of BI technologies is likely to double in the next five years. Yet in the e-discovery arena there is very little awareness of these massive sources of electronically stored information (ESI), and there are hardly any e-discovery tools that can access the information contained in BI systems. This article explains what BI systems are, how they function, and what BI technologies imply for the e-discovery process.

The Basics: Structured vs. Unstructured Data
Enterprise data in general can be broken down into two broad categories: structured and unstructured data. Structured data typically resides in databases. Such data is organized into tables with columns and rows of defined data types, and the relationships between data fields and tables are clearly defined. The most common databases are relational database management systems (RDBMS) capable of handling large volumes of data, such as:
- Oracle
- IBM DB2
- MS SQL Server
- Sybase
- Teradata

Many organizations have one or more enterprise resource planning (ERP) systems that capture daily transactions (orders, shipments, inventory movement, etc.) and contain business planning, human resources, accounting, and financial reporting components. They might also have a Web server with a database populated by transactions executed via the Web. In fact, most organizations have multiple systems that talk to each other. This is particularly true of large entities that grew through multiple mergers and acquisitions.

The data that resides outside structured databases is called unstructured data. This includes:
- Electronic documents
- PowerPoint presentations
- Spreadsheets
- Email
- Images
- Schedules
- IM logs
- Multimedia files, etc.

This data typically resides on individual computers or on file servers. In some cases, when unstructured data is particularly important to the company and needs to be searchable or requires further analysis, it might be organized into a structured database and made available as part of a business intelligence solution.

There are a number of so-called content management systems designed to organize unstructured data in order to help control and manage content, versioning, and access rights. These systems include:
- Microsoft SharePoint
- Lotus Notes
- IBM FileNet
- EMC Documentum

How do organizations make sense of all this data in order to gain a competitive edge? That is where business intelligence comes in.

What Is Business Intelligence?
Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. BI systems are often described as the successors to decision support systems, and they typically incorporate various kinds of enterprise reporting tools. A typical business intelligence solution includes data sources where transactional data is accumulated, data warehouses/data marts, reporting and visualization tools, and predictive analytics and modeling.

BI Components
Source systems collect the information to be analyzed; these include point-of-sale systems (electronic registers), Web transactional systems, inventory scanners, time card systems, etc. Data captured by the source systems is stored in data aggregations referred to as data sources or transactional databases. Typically, these databases are configured for speed of processing rather than for data analysis.

Information from the data sources goes through a process known as extract, transform, and load (ETL), in which the data is extracted from the source system, transformed (to meet business needs), and loaded into a data warehouse. Many different data sources can be consolidated in a single data warehouse.

Information from the data warehouse is made available to end users in the form of data marts, where the data is organized to answer specific types of business questions (e.g., sales data can be cross-referenced by product, region, time, sales representative, etc.).

Finally, reporting and analytical tools are used to analyze the information in the data marts. These include standard and ad hoc reporting, dashboards, online analytical processing (OLAP), alerts, and statistical and other predictive and optimization models.
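The ETL flow and the data mart view described above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module, with two in-memory databases standing in for a transactional database and a data warehouse. The table names (`pos_sales`, `fact_sales`), the columns, and the cleaning rule (negative amounts are treated as errors) are hypothetical, chosen only for illustration; commercial ETL tools and warehouse schemas are far more elaborate.

```python
import sqlite3

# Hypothetical transactional (source) database holding raw point-of-sale rows.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE pos_sales (sale_id INTEGER, sold_at TEXT, region TEXT, "
    "product TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO pos_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2010-05-03", " east ", "widget", 1250),
        (2, "2010-05-03", "WEST", "gadget", 4000),
        (3, "2010-05-04", "East", "widget", -99),  # erroneous negative amount
        (4, "2010-06-01", "west", "widget", 1250),
    ],
)

# Hypothetical data warehouse with a single fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, month TEXT, "
    "region TEXT, product TEXT, amount REAL)"
)


def extract(conn):
    """Extract: read the raw rows from the source system."""
    return conn.execute(
        "SELECT sale_id, sold_at, region, product, amount_cents FROM pos_sales"
    ).fetchall()


def transform(rows):
    """Transform: normalize values and drop rows that fail a validity rule."""
    for sale_id, sold_at, region, product, cents in rows:
        if cents < 0:  # assumed cleaning rule: negative amounts are errors
            continue
        yield (sale_id, sold_at[:7], region.strip().title(), product, cents / 100)


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse fact table."""
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(source)))

# A data-mart-style view: sales cross-referenced by region and month.
mart = warehouse.execute(
    "SELECT region, month, SUM(amount) FROM fact_sales "
    "GROUP BY region, month ORDER BY region, month"
).fetchall()
print(mart)
# prints [('East', '2010-05', 12.5), ('West', '2010-05', 40.0), ('West', '2010-06', 12.5)]
```

Note that the erroneous row survives in the source table but not in the warehouse, so even this toy pipeline leaves two systems holding different versions of the "same" sales data.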

Why Do Companies Use Business Intelligence Platforms?
In the current environment, many companies offer similar products and use comparable technologies, making business processes the last remaining points of differentiation. Business intelligence platforms help management make better decisions in an exceedingly complex and ever-changing business environment: to out-think the competition, run the business as efficiently as possible, truly understand their customer base, and deliver individualized products and services. Amazon.com, Netflix, Wal-Mart, Procter & Gamble, Capital One, Harrah's, Marriott, and the Red Sox all compete on business intelligence and predictive analytics.

Vendors and Tools
Who are the major software players in the BI market, and what types of solutions do they offer? Gartner identifies a few BI niche players, such as Board International, Targit, arcplan, Actuate, and Panorama Software; challengers, such as QlikTech, Tibco Software, and Tableau; and established market leaders, such as Oracle, SAS, IBM, Microsoft, MicroStrategy, SAP, and Information Builders. Most of the established market leaders offer full suites of BI solutions:

IBM
- IBM Cognos 8 Business Intelligence Solution Suite

SAS
- SAS Business Intelligence Solution Suite

Information Builders
- Information Builders WebFOCUS Enterprise Business Intelligence Suite
- Information Builders FOCUS (host-based reporting mainframe solution)

MicroStrategy
- MicroStrategy 8 Business Intelligence Solution
- MicroStrategy 8 Platform
- Intelligence Server
- Narrowcast Server

Microsoft
Dynamics GP Business Intelligence:
- Microsoft Dynamics add-on
- Microsoft FRx
- Microsoft Office Business Scorecard Manager 2005
- Microsoft SQL Server

Oracle
Business Intelligence Foundation Products:
- Business Intelligence Suite, Enterprise Edition Plus (OBIEE+)
- Business Intelligence Standard Edition One (OBIE)
- Business Intelligence Publisher
- Business Intelligence Answers
- Real-Time Decisions
Oracle Business Intelligence Applications:
- Oracle BI Suite Enterprise Edition
- Oracle Financial Analytics
- Oracle Human Resources Analytics
- Oracle Procurement and Spend Analytics
- Oracle Supply Chain and Order Management Analytics
- Oracle Sales Analytics
- Oracle Service Analytics
- Oracle Contact Center Analytics
- Oracle Marketing Analytics
- Oracle Business Indicators

SAP
- SAP Business Intelligence
- SAP Analytics
- SAP NetWeaver

E-Discovery
Discovery is the term used for the initial phase of litigation in which the parties to a dispute are required to provide each other with relevant information and records, along with all other evidence related to the case. E-discovery simply refers to the discovery of electronically stored information. The e-discovery process includes the identification, collection, preservation, processing, review, analysis, production, and presentation of electronically stored information.

Historically, e-discovery has focused on unstructured data such as email, instant messages, documents, spreadsheets, HTML pages, and images, because the information contained in these files can easily be reviewed and understood by lawyers. Structured data is often avoided for two reasons: (1) the complexity and multiplicity of database systems, and (2) structured data is virtually meaningless without specialized reporting and analytical tools. Therefore, more often than not, structured data is either ignored completely or inordinate amounts of effort and money are spent inefficiently on data production, with little benefit.

Data in Organizations with BI Solutions

Redundant Data
As data is collected in the source systems, accumulated in transactional databases, and stored for analysis in data warehouses, companies end up with multiple copies of the same data across all of these systems. However, the copies will not be identical; they will differ in scope and volume. Typically, source systems retain data for only a limited time, and the data in these systems is in its raw format. Transactional databases might combine data from multiple source systems and could contain look-up tables and additional metadata that make data discovery much easier, though the data is often limited in scope. For example, a transactional database might contain sales records for only the last three months.
Large queries against the transactional database could be extremely burdensome and could significantly degrade the system's performance, bringing the entire business operation to a grinding halt. Data warehouses typically hold data accumulated over long periods but might not include all the fields that exist in the transactional databases. BI tools such as OLAP, as well as static and dynamic reports, provide access to the same data, sometimes in a more user-friendly format.

Clean vs. Dirty Data
Data in source systems and transactional databases is typically "as is," since it most likely has not undergone any review or scrubbing. Data in data warehouses is much cleaner: it has been analyzed for outliers and scrubbed, and erroneous data has been removed or corrected, making it more suitable for further analysis.

Dynamic vs. Static Data
It is important to understand that most transactional data is dynamic in nature. For example, at 3:55 pm a fitness club has 2,399 active members; at 4:00 pm it may still have the same number of members, but they might be different people, because during those five minutes some members may have joined while others cancelled their memberships. Thus, it is extremely important to know when a particular report was run and when data from the transactional databases was loaded into the data warehouse. ETL data transfers do not happen in real time; as a result, certain transactions might exist in the transactional databases but not in the data warehouse, and vice versa.

System Migration and Integration
System migrations pose a particular problem in e-discovery. Business rules and data structures in legacy systems can differ significantly from those in current systems, complicating data compatibility and translation. Similar problems arise when a company has multiple systems that perform identical functions. For example, two companies merge, but their respective customer relationship management (CRM) systems continue to run independently.
The data warehouse that combines data from both CRM systems might be a preferable source of data for e-discovery purposes compared with the CRM systems themselves. Additionally, a data warehouse might contain historical data from the legacy system that was not migrated into the current system.
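Because ETL transfers run on a schedule rather than in real time, and because migrations may leave some records behind, a useful early step can be to reconcile record identifiers between a transactional database and the data warehouse and to note when the last load ran. The minimal sketch below assumes each system can export (record ID, timestamp) pairs; the function `reconcile`, the member IDs, and the load time are all hypothetical.

```python
from datetime import datetime


def reconcile(transactional, warehouse, last_etl_load):
    """Compare record IDs exported from a transactional database and a warehouse.

    transactional, warehouse: dicts mapping a record ID to the time the record
    was created or last changed (a hypothetical export format).
    last_etl_load: when the warehouse was last refreshed by the ETL job.
    """
    only_transactional = set(transactional) - set(warehouse)
    only_warehouse = set(warehouse) - set(transactional)
    # Records written after the last load are expected to be absent from the
    # warehouse until the next ETL run.
    pending_load = {r for r in only_transactional if transactional[r] > last_etl_load}
    # Older records missing from the warehouse need an explanation
    # (e.g., filtered out during cleaning, or a failed load).
    unexplained = only_transactional - pending_load
    return {
        "pending_load": sorted(pending_load),
        "unexplained_missing": sorted(unexplained),
        # e.g., purged from the transactional system after its retention window,
        # or historic data from a legacy system that was never migrated
        "only_in_warehouse": sorted(only_warehouse),
    }


# Hypothetical member records and ETL schedule.
last_load = datetime(2010, 7, 1, 2, 0)
transactional_ids = {
    "M1001": datetime(2010, 6, 15, 9, 0),
    "M1002": datetime(2010, 6, 30, 16, 0),
    "M1003": datetime(2010, 7, 1, 15, 55),  # joined at 3:55 pm, after the load
}
warehouse_ids = {
    "M0950": datetime(2009, 12, 1, 0, 0),
    "M1001": datetime(2010, 6, 15, 9, 0),
}

result = reconcile(transactional_ids, warehouse_ids, last_load)
print(result)
# {'pending_load': ['M1003'], 'unexplained_missing': ['M1002'], 'only_in_warehouse': ['M0950']}
```

Splitting the differences this way separates discrepancies that the ETL timing already explains from those that someone who knows the systems will need to account for.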

What Should One Be Mindful of While Conducting Discovery in the Presence of BI?

Who Owns the System vs. the Data?
Before starting the discovery process, it is important to understand who owns the physical system as opposed to the data in it. Business units (accounting, HR, marketing) are typically the data owners; they understand the business rules and processes and the data that reflects them, but they might not know the technical side of how these rules are implemented in the system or how the data is handled and backed up. A business analyst might identify a report that is right on point or run a query that gives you exactly what you need, but he or she might not be aware that a production database from 10 years ago still exists on a backup tape in the database administrator's drawer. Assemble a team whose members understand not only the legal requirements but also the business and systems requirements; otherwise, key points might be lost in translation.

Beware of Data Redundancy
Identify the systems most likely to contain responsive information in the most convenient and easily accessible format. Queries against the data warehouse and canned reports are often much more efficient, reliable, and cost-effective than full data dumps from a transactional database, even though that database might be the system of record.

Understand Data Scope
Understand the scope of the data in the various systems, and when and how system migrations were performed. Focusing only on a source system could be a mistake: transactional systems might hold data for only a limited period, while the data warehouse may have accumulated data for many years. Duplicate data in multiple systems (resulting from migration or integration) might lead to unexpected and false conclusions. Understand where data overlaps across systems and where systems supplement each other, and beware of discrepancies among the various systems.
Again, a data warehouse might turn out to be a better source of responsive information than the system of record.
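A first practical check on data scope is to establish which dates each system actually covers and where those ranges overlap. The minimal sketch below assumes record dates can be queried from each system; the helper names `coverage` and `overlap`, the system names, and the dates are hypothetical, and a real review would also need to account for migrations and purges within each range.

```python
from datetime import date


def coverage(dates_by_system):
    """Return the (earliest, latest) record date held by each system."""
    return {name: (min(ds), max(ds)) for name, ds in dates_by_system.items() if ds}


def overlap(a, b):
    """Return the date window shared by two coverage ranges, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Hypothetical record dates pulled from three systems.
systems = {
    "transactional": [date(2010, 4, 2), date(2010, 5, 17), date(2010, 6, 30)],
    "warehouse": [date(2003, 1, 6), date(2007, 8, 20), date(2010, 6, 28)],
    "legacy_backup": [date(1999, 3, 1), date(2002, 12, 31)],
}

cov = coverage(systems)
for name, (first, last) in cov.items():
    print(f"{name}: {first} to {last}")

# Records dated inside this window may exist in both systems.
print(overlap(cov["transactional"], cov["warehouse"]))
# The legacy backup and the warehouse do not overlap at all.
print(overlap(cov["legacy_backup"], cov["warehouse"]))
```

In this made-up example only the last few months appear in both the transactional system and the warehouse; counting records from that window in both systems would double-count them, while anything earlier exists only in the warehouse or on the legacy backup.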

Extract the Information in an Appropriate Format
Make sure you understand how the data will be used once it is produced in litigation, for what purposes it will be analyzed, and what format it should be in. Too often, effort and money are wasted converting, scrubbing, and making sense of data produced during discovery when a simple canned report would have sufficed. Sometimes a missing field can make the data useless. For example, in one class action, medical claims data was produced. The database administrator (DBA) extracted the member ID field, which should uniquely identify class members. However, the DBA did not realize that the company had started using the member ID field only in the last few years; as a result, a large portion of the claims data had no identifier linking it to class members, which rendered it virtually useless.

Cloud/SaaS
Cloud computing is becoming more prevalent. This class of computing includes infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS). SaaS creates a particular challenge for e-discovery. Examples of SaaS include:
- The CRM application hosted by Salesforce.com
- Google Docs
- Use of ASPs (application service providers) for computerized billing, invoicing, HR management, etc.

While transactional data might appear to be inaccessible for discovery because it resides with the ASP, the data might be moved in the regular course of business via an ETL process from the cloud into the BI platform for reporting and analysis, making it easily accessible.

Confluence of E-Discovery and Business Intelligence Platforms
Two separate trends (a desire to know everything about a business in order to compete, on the one hand, and a need to know everything about what happened in the face of litigation, on the other) have led to an interesting confluence of technologies: BI and e-discovery tools.

While business intelligence platforms originated in the structured data arena, e-discovery efforts historically have focused primarily on unstructured data. Data mining algorithms such as logistic regression, neural networks, and decision trees have traditionally been applied to structured data, but they are finding their way into the analysis of unstructured data and text mining. One example of the convergence of these technologies is BI search, which could be used effectively in the e-discovery process.

BI search is a term describing the convergence of business intelligence and enterprise search. It gives business users better access to information by enabling natural language searches of BI systems rather than requiring specific structured queries. BI search functions enable users to search BI systems for reports and information much as they search the Web. In some cases, companies have a federated search engine that first indexes the unstructured data (somewhat as Google does with Web pages) and thus makes its text content available for search. In fact, Google plays in this space with the Google Search Appliance (GSA), which combines software and hardware and makes the content of an entire company network available for search by users. Somewhat similar solutions include SAP NetWeaver, IBM WebSphere, and Oracle Stellent Content Server.

The functionality of software in multiple categories, such as e-discovery, records management, email archiving, enterprise content management, information access technology, Web content management, and BI, is starting to converge and overlap. A good example is IBM, which competes in virtually all of these markets and offers business intelligence, data integration, and master data management through its family of InfoSphere products. Similarly, Autonomy spans multiple arenas with its IDOL 7, Interwoven, Zantaz, Meridio, Virage, iManage, and Cardiff products.
EMC, CA, and Symantec also play in multiple areas, such as records management, e-discovery, and archiving.

Conclusion
Business intelligence platforms are becoming more widespread and sophisticated, and an increasing number of companies are integrating their BI and records management solutions. Counsel needs to be aware of the wealth of information contained in these BI systems and take full advantage of it in the discovery process.

Copyright of Intellectual Property & Technology Law Journal is the property of Aspen Publishers Inc. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.
