
IISWC 2007 Panel

Analyzing Petabytes

Suchi Raman
Netezza Corp.
http://www.netezza.com/

Petabyte Database Workloads

Macro-analytic queries
> Identify trends and patterns
> Very large data volumes
> Query times dominated by disk scan times
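
To give a feel for why disk scan time dominates macro-analytic queries, here is a rough back-of-the-envelope sketch. The 60 MB/s per-disk rate and the disk counts are illustrative assumptions, not figures from the slides, and the model assumes data is spread evenly with all disks read in parallel.

```python
def scan_seconds(data_bytes: float, num_disks: int, mb_per_sec_per_disk: float) -> float:
    """Time to read data_bytes once when it is spread evenly over num_disks."""
    aggregate_bytes_per_sec = num_disks * mb_per_sec_per_disk * 1e6
    return data_bytes / aggregate_bytes_per_sec

PETABYTE = 1e15  # decimal petabyte, in bytes

# Hypothetical hardware: 60 MB/s sustained sequential read per disk.
one_disk = scan_seconds(PETABYTE, 1, 60.0)
thousand_disks = scan_seconds(PETABYTE, 1000, 60.0)

print(f"1 disk:     {one_disk / 86400:.0f} days")
print(f"1000 disks: {thousand_disks / 3600:.1f} hours")
```

Under these assumptions a full scan takes months on one disk and still hours on a thousand, so any reduction in bytes read translates directly into query time.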

Micro-analytic queries
> Short running queries
> Query run once and stored
> Pre-computed summaries

Data management
> ETL load/unload
> Backup/restore

Netezza Confidential

Netezza NPS System

Asymmetric Massively Parallel Processing

[Architecture diagram: Netezza Performance Server System]
> Clients: Solaris, AIX, Tru64, HP-UX, Windows, Linux
> Interfaces: ODBC 3.X, JDBC Type 4, SQL/92
> Source systems, ETL server, 3rd party apps, DBA CLI
> High-speed loader/unloader
> Front end: SMP host, DBOS, SQL compiler, query plan, optimize, admin, execution engine, high performance loader
> Gigabit Ethernet
> Massively parallel intelligent storage: Snippet Processing Units (SPUs), numbered 1 to 1000+, each with processor & streaming DB logic
> High-performance database engine: streaming joins, aggregations, sorts, etc.


Software challenges

Effective disk bandwidth
> Optimal data layouts
> Data compression
> Increased effective disk bandwidth (and reliability!)
> Upgrades and evolution of on-disk formats
> Minimize disk reads (indexes, caches)

Query processing algorithms
> Skew avoidance algorithms
> Scheduling among queries, especially with mixed workloads combining large and small queries

System monitoring/profiling
> System monitoring during busy periods
> Accurate profiling techniques

Data management challenges
> High-speed data path in/out of NPS system
> Efficient/flexible data formats for load/unload
> Infrastructure challenge: fast external devices for sourcing/sinking data
> Custom functions (UDFs/UDAs) implemented within the system
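
The skew problem named under query processing algorithms can be illustrated with a small sketch. It assumes rows are assigned to processing units by hashing a distribution key, which is a common approach in parallel databases but is not stated in the slides; the function names, key choices, and unit count are hypothetical.

```python
import hashlib
from collections import Counter

def distribute(keys, num_units):
    """Count rows per processing unit under hash distribution on the key."""
    loads = Counter({unit: 0 for unit in range(num_units)})
    for key in keys:
        digest = hashlib.sha256(str(key).encode()).digest()
        loads[int.from_bytes(digest[:8], "big") % num_units] += 1
    return loads

def skew(loads):
    """Busiest unit's load divided by the mean load (1.0 means perfectly even)."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean

unique_keys = range(100_000)                 # high-cardinality key, e.g. an order id
hot_keys = [i % 3 for i in range(100_000)]   # low-cardinality key, e.g. a 3-valued status

even = skew(distribute(unique_keys, 16))
uneven = skew(distribute(hot_keys, 16))
print(f"unique key skew: {even:.2f}, 3-valued key skew: {uneven:.2f}")
```

Because a parallel scan finishes only when the busiest unit finishes, a skew factor of 5 means the query runs roughly five times slower than an even layout would allow.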


Hardware challenges
> Increased effective disk bandwidth (and reliability!)
> Multi-core technology
> Balancing CPU-to-disk ratio
> Specialized engines (e.g., FPGA-based filtering)
> Faster internal and external connectivity
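
One way to reason about the CPU-to-disk balance is a simple bottleneck model, sketched below. It assumes compression multiplies the uncompressed data delivered per second of disk reads, and that the processor caps how fast that data can be consumed; the function name and all rates are hypothetical, not measurements from the NPS system.

```python
def effective_scan_mb_s(disk_mb_s: float, compression_ratio: float, cpu_mb_s: float) -> float:
    """Uncompressed MB/s delivered by one disk/processor pair (bottleneck model)."""
    # Each second of disk reads represents compression_ratio times as much raw data.
    from_disk = disk_mb_s * compression_ratio
    # The pair can go no faster than its slower stage.
    return min(from_disk, cpu_mb_s)

# Hypothetical pair: 60 MB/s disk, processor able to consume 200 MB/s.
for ratio in (1.0, 2.0, 4.0):
    rate = effective_scan_mb_s(60.0, ratio, 200.0)
    print(f"compression {ratio:.0f}x -> {rate:.0f} MB/s")
```

In this model, compression helps until the processing stage saturates; past that point, more CPU capacity or offloaded filtering is what raises throughput.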


How can University Researchers contribute?

Explore new applications and data types
> E.g., network traffic analysis
> Geospatial data
> Biological data types

> Skew avoidance/scheduling algorithms
> Applications built on UDFs/UDAs
> Verification methods for optimizer algorithms

Platform improvements
> Disk performance and reliability
> FPGA filtering algorithms
> Faster interconnect networks
> Power and cooling improvements

