
Data Warehouse Appliances – Evaluation Criteria

Krishna Manoharan
krishmanoh@gmail.com
http://dsstos.blogspot.com

Data Warehouse Appliances
There are many Appliances in the Market, with differing capabilities.
Prospective Customers commonly compare them by running
Performance Benchmarks.
In a typical real-world deployment, Performance
characteristics are one among many requirements.
This presentation is a customer-centric attempt to outline
other requirements that can influence the outcome of a
deployment in the long run.
Ultimately the underlying technology becomes of academic
interest and hence should not be the sole criterion in
evaluating an appliance.

Grading an Appliance
In the following pages, I have listed typical Customer
Requirements.
The criteria listed are not all-encompassing, but they can
serve as a foundation on which to expand your own
requirements.
It is important that, as a Customer, you rate these criteria
based on your needs,
and then compare and grade Appliances based on their
capabilities as they pertain to these requirements.
As I typically do, I will use a real-world example to
show the process.
Appliance evaluation is not part of this exercise.
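The rate-then-grade process described above could be sketched as a weighted score. This is a minimal illustration and not part of the original presentation: the numeric weights for the Required / Optional / NA ratings, the 0 to 5 capability scale, and the two sample appliances are all assumptions made up for the example.

```python
# Hypothetical weights for the customer's ratings (assumed, not from the slides).
RATING_WEIGHTS = {"Required": 3, "Optional": 1, "NA": 0}
MAX_CAPABILITY = 5  # assumed scale: 0 = capability absent, 5 = excellent


def grade_appliance(ratings, capabilities):
    """Return a weighted score between 0 and 1 for one appliance.

    ratings:      {criterion: "Required" | "Optional" | "NA"}, set by the customer
    capabilities: {criterion: 0..5}, how well the appliance meets each criterion
    """
    earned = 0
    possible = 0
    for criterion, rating in ratings.items():
        weight = RATING_WEIGHTS[rating]
        earned += weight * capabilities.get(criterion, 0)
        possible += weight * MAX_CAPABILITY
    return earned / possible if possible else 0.0


ratings = {
    "ETL Integration capabilities": "Required",
    "Support Indexes": "Optional",
    "DR Capabilities": "NA",
}
appliance_a = {"ETL Integration capabilities": 5, "Support Indexes": 0, "DR Capabilities": 5}
appliance_b = {"ETL Integration capabilities": 3, "Support Indexes": 5, "DR Capabilities": 0}

print(grade_appliance(ratings, appliance_a))  # 0.75
print(grade_appliance(ratings, appliance_b))  # 0.7
```

Because NA criteria carry zero weight, an appliance gains nothing for strengths the customer does not need, which is the point of rating the criteria before grading.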
A Business Case – High Level
A Business Unit within the Organization has a requirement to
analyze and run reports on its transaction data. Currently no
DW exists.
The business model changes rapidly due to the nature of the business. The
DW needs to keep pace with the changes. A traditional
Dimensional Model may not suffice.
Dashboards, Canned and Ad-hoc Reports are needed, under a rigid SLA with
financial penalties.
Reports need to be consistent and available 24x7 (no downtime
during loading of data). No real-time reporting requirements.
90% of the Reports require churning through the last year's worth of
data. Some Reports churn through the entire data set every day.
Growth rate is 2X/year. Data is loaded on a daily basis. ETL should
not affect reporting performance.
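To see what the growth and reporting-window figures above imply for sizing, here is a small worked calculation. It assumes that "2X/year" means the total stored volume doubles every year, and it uses a hypothetical starting volume of 1 TB; neither assumption comes from the original business case.

```python
def projected_volume_tb(initial_tb, years, growth_factor=2.0):
    """Total data volume after `years`, assuming it multiplies by growth_factor yearly."""
    return initial_tb * growth_factor ** years


def last_year_share(growth_factor=2.0):
    """Fraction of the total volume that was loaded during the most recent year.

    If the total multiplies by g each year, last year's data is
    total - total / g, i.e. a share of 1 - 1/g of everything stored.
    """
    return 1 - 1 / growth_factor


for year in range(4):
    print(f"year {year}: total {projected_volume_tb(1.0, year):.0f} TB")

print(last_year_share())  # 0.5
```

Under this reading, the reports that scan the last year of data touch roughly half of everything stored, so the one-year window does not shrink relative to the whole as the warehouse grows.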

Some Relevant Facts in our example
Existing Transaction System – Oracle 10g R2 RAC
DW needs to be built from scratch.
ETL Engine – Informatica
Reporting Engine – Cognos/Business Objects
Development tools – SQL Developer/Toad/Erwin
Backup – Netbackup
Monitoring – BMC Patrol
Admin Skills – Primarily an Oracle shop.
No DR capabilities yet. However, HA is a requirement.
Limited DC space available.
Criteria 1 – Developer Requirements
Area | Requirements | Comments | Rating
Developer Requirements | ETL Integration capabilities | Integrate with Informatica as a source/target without complicated work-arounds. | Required
Developer Requirements | BI Tools Integration | Ability to integrate with Cognos/Business Objects. | Required
Developer Requirements | Be Schema Agnostic | Can reports be run directly against Normalized Transaction Schemas? Or do I need a traditional Dimensional Model with Summary Layers for performance? This will identify development time and flexibility to meet Business requirements rapidly. | Required (Performance Test)
Developer Requirements | Data Consistency During Loading | What methods are available for ensuring Data Consistency during reporting, apart from typical ACID capabilities? For example, Oracle offers Partition Exchange as an option. | Required (Performance Test)
Developer Requirements | Support for Stored Procedures | Support for Stored Procedures | NA
Developer Requirements | Predefined Complex Views | Help with Cognos integration/Reporting by defining Views | Required (Performance Test)
Developer Requirements | Support Analytical/Window Functions at DB Layer | Support analytical/Window functions at the DB layer. | Required (Performance Test)
Developer Requirements | ERWIN + SQL Developer + Toad compatibility | To ensure compatibility with current Data Modelling/Development tools. | Optional
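The "Data Consistency During Loading" requirement above mentions Oracle's Partition Exchange. The general idea behind it, loading into a staging object and then swapping it in so that reports never see a half-loaded data set, can be sketched as follows. This is only an analogy built on Python's standard sqlite3 module and table renames inside one transaction; it is not Oracle's partition exchange syntax, and the table names and rows are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are controlled explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.execute("INSERT INTO sales VALUES ('2009-01-01', 100)")

# 1. Load the refreshed data set into a staging table.
#    Reports keep reading the untouched `sales` table meanwhile.
conn.execute("CREATE TABLE sales_staging (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales_staging VALUES (?, ?)",
    [("2009-01-01", 100), ("2009-01-02", 250)],
)

# 2. Swap the staging table in with two renames inside one transaction,
#    so the switch becomes visible all at once.
conn.execute("BEGIN")
conn.execute("ALTER TABLE sales RENAME TO sales_old")
conn.execute("ALTER TABLE sales_staging RENAME TO sales")
conn.execute("COMMIT")

# 3. Discard the previous copy once the swap has committed.
conn.execute("DROP TABLE sales_old")

rows = conn.execute("SELECT day, amount FROM sales ORDER BY day").fetchall()
print(rows)
```

When evaluating an appliance, the question is whether it offers an equivalent mechanism at warehouse scale, and what that mechanism costs in load time and storage.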

Criteria 1 – Developer Requirements contd.

Area | Requirements | Comments | Rating
Developer Requirements | Partitioning and supported types of partitioning. | Date/Time Based Partitioning or equivalent is preferable due to the nature of reports and potential archiving capabilities. Will consider other partitioning methods if able to achieve performance criteria. | Required (Performance Test)
Developer Requirements | Support Constraints on Database | Helps with reducing ETL Development Time by enforcing constraints (PK/FK/Unique/Not Null) on the DB. Increases Database overhead. Typically enforced through Indexes (more overhead). | Optional
Developer Requirements | Support Indexes | The nature of the Business requires updating older data. Indexes typically speed up such DML operations and the enforcing of constraints. If the capability is absent, this needs to be tested using the Performance Criteria. | Optional (Performance Test)
Developer Requirements | Support various different Datatypes | Support for existing Oracle Datatypes. Currently using RAW columns. | Required
Developer Requirements | Data lifecycle Management (Archival strategies) | Archiving/offloading/compressing seldom used data. | NA
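The partitioning row above prefers Date/Time based partitioning because of the nature of the reports and the archiving potential. A toy model shows why: a date-bounded report only has to read the partitions inside its range, and archiving becomes a matter of dropping whole partitions. The class below is a hypothetical in-memory sketch, assuming monthly partitions; it does not represent how any particular appliance implements partitioning.

```python
from collections import defaultdict
from datetime import date


def partition_key(day):
    """Monthly partition key, e.g. date(2009, 3, 14) -> '2009-03'."""
    return f"{day.year:04d}-{day.month:02d}"


class PartitionedTable:
    def __init__(self):
        self.partitions = defaultdict(list)  # key -> list of (day, row)

    def insert(self, day, row):
        self.partitions[partition_key(day)].append((day, row))

    def scan(self, start, end):
        """Read only the partitions that overlap [start, end] (partition pruning)."""
        lo, hi = partition_key(start), partition_key(end)
        touched = [k for k in sorted(self.partitions) if lo <= k <= hi]
        rows = [r for k in touched for d, r in self.partitions[k] if start <= d <= end]
        return touched, rows

    def archive_before(self, cutoff):
        """Drop whole partitions older than the cutoff month; return their keys."""
        old = [k for k in sorted(self.partitions) if k < partition_key(cutoff)]
        for k in old:
            del self.partitions[k]
        return old


table = PartitionedTable()
table.insert(date(2008, 11, 5), "t1")
table.insert(date(2009, 1, 20), "t2")
table.insert(date(2009, 2, 3), "t3")

print(table.scan(date(2009, 1, 1), date(2009, 2, 28)))  # (['2009-01', '2009-02'], ['t2', 't3'])
print(table.archive_before(date(2009, 1, 1)))           # ['2008-11']
```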

Criteria 3 – Standards, HA etc.

Area | Requirements | Comments | Rating
Standards | Be HW Agnostic | Future proof: no lock-in to proprietary HW. | Optional
Standards | SQL Standard | Support for ANSI SQL and/or Oracle SQL | Required
Operational Data Store | Support Replication (Shareplex/Golden Gate/Etc) | Support replication from Transaction systems using Golden Gate/Shareplex/Informatica CDC etc. | NA
Operational Data Store | Ability to support Transaction Level activities | Function as a hybrid (Single row select/insert/update/delete capabilities) | NA
Availability | DR Capabilities | DR Mechanisms (Sync/Async Replication etc) | NA
Availability | HA Capabilities | HA capabilities (continuously active), with full HW/SW redundancy | Required

Criteria 4 – Performance Management
Area | Requirements | Comments | Rating
Performance Management | Minimal Performance Management of Reports (Ad-hoc/Canned) and Loads | Do loads / reports need to be tuned for performance? How are bad queries handled? | Required (Performance Test)
Performance Management | Aggressive Compression Capabilities and Minimal Compression Restrictions. | Save on Storage. Reduces IO overhead and improves IO performance. Pushes burden to CPU. Compression and related side-effects on DML activities (Insert/Update/Delete) | Required (Performance Test)
Performance Management | Consistent Reporting Performance (even during Loading activities) | Consistent timings for Reports for a fixed number of users. Do ETL activities affect Reporting Performance? | Required (Performance Test)
Performance Management | Database Resource Management | Resource Management capabilities - control CPU/IO per User or other mechanisms. | Required (Performance Test)
Performance Management | Linear and seamless scalability wrt Performance/Capacity Planning | Linear scalability? Does scale-out/up require downtime? | Required (Performance Test)
Performance Management | High Performance Density (Performance/GB + Performance/U) | How small a configuration will meet performance needs? Affects overall price of appliance. | Required (Performance Test)
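The scalability and performance-density rows above can be made concrete with a few ratios. The sketch below is illustrative only: the throughput metric (queries per hour), the configuration figures, and the function names are assumptions, not measurements of any appliance.

```python
def performance_density(queries_per_hour, capacity_gb, rack_units):
    """Return (performance per GB, performance per rack unit)."""
    return queries_per_hour / capacity_gb, queries_per_hour / rack_units


def scaling_efficiency(base_throughput, scaled_throughput, scale_factor):
    """1.0 means perfectly linear scaling when the hardware grows by scale_factor."""
    return scaled_throughput / (base_throughput * scale_factor)


# Hypothetical configuration: 1200 queries/hour on 24,000 GB in 12 rack units.
per_gb, per_u = performance_density(queries_per_hour=1200, capacity_gb=24000, rack_units=12)
print(per_gb, per_u)  # 0.05 100.0

# Doubling the configuration yields 2160 queries/hour instead of the ideal 2400.
print(scaling_efficiency(1200, 2160, 2))  # 0.9
```

Comparing these ratios across candidate configurations gives a rough answer to "how small a configuration will meet performance needs" before pricing is discussed.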

Criteria 5 – Maintenance
Area | Requirements | Comments | Rating
Maintenance | Simplified, accurate and fast Statistics Gathering | Statistics gathering on Large objects can be expensive in terms of resources and time. | Required (Performance Test)
Maintenance | Simplified Partition Maintenance | Partition Maintenance consumes significant time of DBAs. | Required
Maintenance | Simplified Space Management | Space Management consumes significant time of DBAs. | Required
Maintenance | Performance and Database Stats (Not statistics for objects) | Does the appliance report performance and database stats, e.g. Number of concurrent sessions, Memory Utilization and such? | Required
Maintenance | Archiving based on Date Predicates | DWs typically archive/delete older data based on a date predicate. | Optional
Maintenance | Well rounded Admin Console | A well designed Admin interface would make management efficient. | Required
Maintenance | Debug/Trace Capabilities | Troubleshoot performance issues using debug/trace capabilities. | Required
Maintenance | Easy Backup capabilities | Backing up large volumes of data can be challenging. Easy integration into existing Netbackup infrastructure would be preferred. | Required

Criteria 5 – Maintenance contd.
Area | Requirements | Comments | Rating
Maintenance | Easy Restore capabilities | In case of needing to restore data from an earlier period, can it be automated and seamless (without requiring an outage)? | Required
Maintenance | Consistent and Easy Recovery capabilities | Recovery from crash/HW failures without disruptions to running loads/reports. | Required
Maintenance | Easy Upgrades/Patching | Seamless and easy upgrades/patching without downtime. | Required
Maintenance | Integrate with existing Monitoring Infrastructure | Can the appliance be monitored by BMC Patrol? Can alerts be set up? SNMP capabilities? | Required
Maintenance | Easy to learn and Manage Admin Skills | Do we need specialized skills? Is it easily learnable? | Required
Maintenance | Low Complexity | Is it complex in terms of HW/SW/Components etc? | Required
Maintenance | Quick Infrastructure Ramp-up time | Can it be up and running within a day/two? | Required
Maintenance | Responsive Support Model/Economical Licensing | How is it licensed? Is HW/SW supported by a single vendor? | Required

Criteria 6 – Infrastructure Footprint

Area | Requirements | Comments | Rating
Infrastructure Footprint | Moderate Power/Cooling Requirements | Power consumption, Cooling etc (affects long term cost). Current DataCenter has limitations. | Required
Infrastructure Footprint | Minimal Space requirements | Space consumed in the Data Center (affects long term cost). Current DataCenter has limitations. | Required
Infrastructure Footprint | No Appliance Localization | When scaling out by adding an additional appliance unit, does the new unit need to be in close proximity with the existing one? With limited DataCenter Space, this is important. | Required
Infrastructure Footprint | Customer Appliance Modifications | Can the appliance be re-assembled in a Customer Provided Rack? What other modifications can be done by the customer? | Optional

Summarizing
As you can see, there are many aspects to evaluating an
appliance.
No appliance is likely to fit all
your requirements perfectly. It comes down to striking a middle
ground.
If you take these criteria, rate them against your requirements,
and ask each appliance vendor to fill in their comments, it
becomes easy to compare different vendors even before
attempting a benchmark.
Benchmarks are time and resource intensive and require
significant up-front planning. Ideally, this evaluation matrix
should help you narrow down the list significantly.
