1 Copyright 2012 EMC Corporation. All rights reserved.

Greenplum Database
Overview
Michael Crutcher
Greenplum Product Management
Greenplum Unified Analytic Platform
GREENPLUM DATABASE
Industry Leading Database with
Massively Parallel Performance
To Empower your Analytics
Extreme Performance for Analytics
Optimized for BI and analytics
Deep integration with statistical packages
High performance parallel implementations
Simple and automatic
Just load and query like any database
Tables are automatically distributed across nodes
Extremely scalable
MPP shared-nothing architecture
All nodes can scan and process in parallel
Linear scalability by adding nodes
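
The idea of automatic distribution across shared-nothing nodes can be illustrated with a small sketch. This is a toy model only: the CRC32 hash, the four-segment cluster, and the names `segment_for`, `distribute`, and `parallel_count` are assumptions made for illustration, not Greenplum's actual hashing or API.

```python
import zlib

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment index with a stable hash.

    CRC32 is a stand-in; it is not the hash function Greenplum uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment (shared-nothing: no row is shared)."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def parallel_count(segments):
    """Each segment counts its own rows; the master only adds the partial counts."""
    partial_counts = [len(rows) for rows in segments]
    return sum(partial_counts)


rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
segments = distribute(rows, "customer_id")
print([len(s) for s in segments])   # rows held by each segment
print(parallel_count(segments))     # total assembled from per-segment counts
```

Because every row lives on exactly one segment, each segment can scan its share independently, which is the property the slide's scalability claim relies on.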
Performance Through Parallelism
[Figure: architecture diagram. Master Servers (query planning & dispatch), Network Interconnect, Segment Servers (query processing & data storage), and External Sources (loading, streaming, etc.)]
Greenplum Delivers Choice & Flexibility
Greenplum Data Computing Appliance
Choose Greenplum Database and/or Hadoop modules in rack increments
Scale up by adding your choice of additional modules
Minimal time to value
Greenplum Software Solutions
Greenplum Database, Hadoop, & Chorus on your x86 hardware
Flexibility for any workload or environment
Perpetual or subscription licenses
Core Functionality
Component Overview
CLIENT ACCESS & TOOLS
CLIENT ACCESS: ODBC, JDBC, OLEDB, MapReduce, etc.
3RD PARTY TOOLS: BI Tools, ETL Tools, Data Mining, etc.
ADMIN TOOLS: Greenplum Command Center, Greenplum Package Manager
PRODUCT FEATURES
LOADING & EXT. ACCESS: Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access
STORAGE & DATA ACCESS: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning, Indexes (B-tree, Bitmap, etc.), External Table Support
LANGUAGE SUPPORT: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics, Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)
GREENPLUM DATABASE ADAPTIVE SERVICES
Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost), Online System Expansion, Workload Management
CORE MPP ARCHITECTURE
Shared-Nothing MPP, Parallel Query Optimizer, Polymorphic Data Storage, Parallel Dataflow Engine, gNet Software Interconnect, Scatter/Gather Streaming Data Loading
Most Powerful Data Loading Capabilities
Industry leading performance at 10+ TB per hour per rack
Scatter-Gather Streaming provides true linear scaling
Support for both large-batch and continuous real-time loading strategies
Enable complex data transformations in-flight
Transparent interfaces to loading via support for files, applications, and services
Greenplum load rates scale linearly with the number of racks; others do not. For example, two racks load more than 20 TB per hour.
[Figure: SINGLE RACK COMPARISON chart of load rates for Greenplum, Oracle Exadata, Netezza, and Teradata]
Polymorphic Table Storage™
Storage types can be mixed within a table or database
Four table types: heap, row-oriented AO, column-oriented AO, external
Rich compression functionality, definable column by column
Block compression: Gzip (levels 1-9), QuickLZ
Stream compression: RLE (levels 1-4)
Flexible indexing, partitioning, and more
[Figure: TABLE CUSTOMER partitioned by month, Mar 11 through Nov 11. Row-oriented storage for HOT DATA, column-oriented storage for COLD DATA]
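
To see why column orientation pairs well with stream compression such as RLE, consider the sketch below. It is a simplified illustration: the sample rows and the helper names `to_columns`, `rle_encode`, and `rle_decode` are assumptions, and the compression levels mentioned on the slide are not modeled.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def rle_encode(values):
    """Run-length encode: collapse each run of equal values into (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical sample data: older ("cold") rows tend to be read column by column.
rows = [
    {"id": 1, "state": "CA"},
    {"id": 2, "state": "CA"},
    {"id": 3, "state": "CA"},
    {"id": 4, "state": "NY"},
    {"id": 5, "state": "NY"},
]
columns = to_columns(rows)
encoded = rle_encode(columns["state"])
print(encoded)
print(rle_decode(encoded) == columns["state"])
```

Storing a column contiguously puts repeated values next to each other, so long runs collapse into a few pairs; in a row layout the same values are interleaved with other fields and the runs disappear.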
gNet Software Interconnect
A supercomputing-based soft-switch responsible for:
Efficiently pumping streams of data between motion nodes during query-plan execution
Delivering messages, moving data, collecting results, and coordinating work among the segments in the system
Parallel Query Optimizer
Cost-based optimization looks for the most efficient plan
Physical plan contains scans, joins, sorts, aggregations, etc.
Global planning avoids pushing sub-optimal SQL to segments
Directly inserts motion nodes for inter-segment communication
PHYSICAL EXECUTION PLAN FROM SQL OR MAPREDUCE
[Figure: query plan tree with nodes Gather Motion 4:1 (Slice 3), Sort, HashAggregate, HashJoin, Redistribute Motion 4:4 (Slice 1), Hash, Broadcast Motion 4:4 (Slice 2), and Seq Scans on motion, customer, lineitem, and orders]
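
The three motion types named in the plan can be sketched as plain list operations. This is a conceptual toy, not the optimizer or the interconnect: the modulo hash, the four-segment layout, the sample tables, and all function names are assumptions for illustration.

```python
NUM_SEGMENTS = 4  # matches the 4:4 and 4:1 labels in the plan; otherwise arbitrary


def seg_of(key):
    """Toy hash: integer key modulo the segment count (an assumption)."""
    return key % NUM_SEGMENTS


def redistribute(segments, key_field):
    """Redistribute Motion (N:N): re-hash every row on key_field."""
    out = [[] for _ in range(NUM_SEGMENTS)]
    for rows in segments:
        for row in rows:
            out[seg_of(row[key_field])].append(row)
    return out


def broadcast(segments):
    """Broadcast Motion (N:N): every segment receives a full copy of the input."""
    everything = [row for rows in segments for row in rows]
    return [list(everything) for _ in range(NUM_SEGMENTS)]


def gather(segments):
    """Gather Motion (N:1): collect all segment results on the master."""
    return [row for rows in segments for row in rows]


def local_hash_join(left, right, key_field):
    """Hash join run independently on one segment's local rows."""
    table = {}
    for row in right:
        table.setdefault(row[key_field], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[key_field], [])]


# Hypothetical tables: customers stored by cust_id, orders stored by order_id.
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(8)]
orders = [{"order_id": i, "cust_id": i % 5} for i in range(20)]
cust_segs = redistribute([customers], "cust_id")
order_segs = redistribute([orders], "order_id")

# Plan A: redistribute orders on the join key so matching rows are co-located.
moved = redistribute(order_segs, "cust_id")
plan_a = gather([local_hash_join(moved[s], cust_segs[s], "cust_id")
                 for s in range(NUM_SEGMENTS)])

# Plan B: broadcast the smaller table instead and leave orders in place.
copies = broadcast(cust_segs)
plan_b = gather([local_hash_join(order_segs[s], copies[s], "cust_id")
                 for s in range(NUM_SEGMENTS)])

print(len(plan_a), len(plan_b))
```

Both plans return the same join result; choosing between moving the large table once and copying the small table everywhere is the kind of decision a cost-based optimizer makes.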
Analytics Overview
Analytical Capabilities Overview
[Figure: layered diagram with labels ODBC, JDBC; Data Access & Query Layer; SQL, SQL 2003 OLAP, MapReduce, Stored Procedures, In-Database Analytics; Polymorphic Storage; GREENPLUM DATABASE; Greenplum gNet; GREENPLUM HD]
In-Database Analytics: Categories
Partner: SAS/HPA High Performance Analytics, SAS Scoring Accelerator
Open-Source: Open Source Extensions
User-written: User-Written Analytical Algorithms
Embedded: GPDB Embedded Analytics
[Figure: In-Database Analytics within GREENPLUM DATABASE, beneath the Data Access & Query Layer (SQL) with ODBC and JDBC access]
Analytics Highlight: MADlib
Scalable in-database analytics
Data-parallel
Mathematical Algorithms
Statistical Algorithms
Machine learning Algorithms
Supports structured and unstructured data
Open-source software
Source Accessibility
Converge business, academic, and open-source communities
Manageability, Extensions
Easy Manageability for Big Data
Single console for both Database and Hadoop
Administration
Start, Stop Database
Recover, Rebalance Segments
Interactive view of System Metrics
Real-time
Historic (Configurable by time period)
In-depth view for System Health
Hardware health
Software (Database, Hadoop)
Query Monitoring
Search, Prioritize, Cancel Queries
View Query's Execution Plan
Workload Management
Configure Resource Queues
Prioritize Users
Greenplum Package Manager
Easy Extension Installation
Greenplum supports easy deployment of numerous extensions like MADlib, PL/Perl, PL/Java, PostGIS, etc.
[Figure: Master Servers and Segment Servers]
gNet for Hadoop
High Performance gNet for Hadoop: Parallel Query Access
Connect any data set in Hadoop to GPDB's SQL engine
Process Hadoop data in place
Parallelize import/export of data from/to Hadoop thanks to GPDB's market-leading data sharing performance
Supported formats:
Text (compressed and uncompressed)
Binary
Proprietary/user-defined
GP HD 1.x, GP MR 1.x, CDH3u2
High Availability, Backup, Support
High Availability
GPDB cluster
2 Master servers
Multiple Segment servers
Segment servers support multiple database instances
Primary instances that actively process queries
Standby mirror instances
Block level mirroring
Low resource consumption
Differential resync capability for fast recovery
[Figure: set of active segment instances]
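
Block-level mirroring with differential resync can be pictured with the toy model below. It is a sketch under stated assumptions, not how Greenplum implements mirroring: the `MirroredSegment` class, its in-memory block lists, and the dirty-block set are all hypothetical.

```python
class MirroredSegment:
    """Toy primary/mirror pair that tracks blocks changed while the mirror is down."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.mirror_up = True
        self.dirty = set()  # block numbers written while the mirror was unavailable

    def write(self, block_no, data):
        """Write to the primary; mirror immediately if possible, else remember the block."""
        self.primary[block_no] = data
        if self.mirror_up:
            self.mirror[block_no] = data
        else:
            self.dirty.add(block_no)

    def fail_mirror(self):
        """Simulate losing the mirror instance."""
        self.mirror_up = False

    def resync(self):
        """Differential resync: copy only the blocks that changed during the outage."""
        copied = len(self.dirty)
        for block_no in self.dirty:
            self.mirror[block_no] = self.primary[block_no]
        self.dirty.clear()
        self.mirror_up = True
        return copied


seg = MirroredSegment(num_blocks=100)
for i in range(100):
    seg.write(i, b"v1")
seg.fail_mirror()
for i in (3, 17, 42):
    seg.write(i, b"v2")
copied = seg.resync()
print(copied, seg.mirror == seg.primary)
```

Only the blocks touched during the outage are copied, rather than all 100, which is why a differential resync recovers faster than a full copy.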
Backup/Restore with EMC Data Domain
Integration options
NFS: Data Domain device mounted as NFS storage
DD Boost: Native, client-side deduplication. Supported in GPDB 4.2 and higher
Drastic reduction in backup storage requirement
Back up all segment servers in parallel directly to Data Domain
Data Domain integrates seamlessly into standard Greenplum full backup, data export, and data restore procedures
[Figure: full appliance connected to Data Domain (Boost or NFS) over 2 x 10 Gbit IP]
Backup/Restore with EMC Data Domain
Backup and restore between remote and primary sites
Ideal for configurations with RPO and RTO requirements that can be specified in hours
Supports:
Collection replication for DD Boost backup
Directory-level replication for NFS backup
Encryption over the WAN
[Figure: two sites, each with a Greenplum DCA and Data Domain, linked by Data Domain Replication over LAN/WAN]
Customer Support Services
Remote Technical Support
24x7 technical support and remote troubleshooting
Customer-managed case severity level
Four-hour response objective
Onsite Support (DCA Only)
Installation of replacement parts
Replacement parts shipped for next business day arrival
GP SW upgrade included
Proactive Service
Secure remote monitoring for hardware (DCA)
Notification of engineering technical advisories
Built-in tools maximize stability and performance
Secure Self-Help
24x7 access to eService support tools including knowledgebase, forums, and appropriately licensed software updates
Other Relevant Greenplum Sessions
Session | Presenter | Times
Unified Analytics Platform Introduction | Brian Wilson | Tues 10:00-11:00, Thurs 1:00-2:00
Greenplum Hadoop Overview | Susheel Kaushik | Mon 10:00-11:00, Wed 4:15-5:15
Greenplum DCA Overview | Hanxi Chen | Mon 4:00-5:00, Thurs 10:00-11:00
Greenplum Analytics Workbench | Apurva Desai | Wed 8:30-9:30, Thurs 10:00-11:00
Analytics on Hadoop | Don Miner | Tues 11:30-12:30, Thurs 8:30-9:30
Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP (Panel w/4 Customers) | Mike Maxey | Wed 4:15-5:15, Thurs 11:30-12:30
Analytics for Business Value: Collaboration | Josh Klahr | Mon 10:00-11:00, Wed 2:45-3:45
Disruptive Data Science: How Data Science and Big Data are Transforming Business, IT and People | Annika Jimenez, David Dietrich | Tues 4:15-5:15, Thurs 11:30-12:30
Thank You
