Jean-Pierre Dijcks
Oracle
Big Data Product Management
Paul Kent
SAS
VP Big Data
4 Q&A
Integrated Software:
Oracle Linux, Oracle Java VM
Oracle Big Data SQL*
Cloudera Distribution of Apache Hadoop EDH Edition
Cloudera Manager
Oracle R Distribution
Oracle NoSQL Database
Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential Internal/Restricted/Highly Restricted 10
Big Data SQL
SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;
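The join above can be tried locally as a minimal sketch. It uses an in-memory SQLite database purely as a stand-in (it is not Oracle Big Data SQL), and the table contents are made-up sample rows. The column types are assumptions; only the table and column names come from the slide. Note that the string literal is quoted as 'Brazil', which standard SQL requires.

```python
import sqlite3

# SQLite is only a local stand-in here; it is NOT Oracle Big Data SQL.
conn = sqlite3.connect(":memory:")

# Hypothetical sample data; column types are assumptions for illustration.
conn.executescript("""
    CREATE TABLE web_logs (sess_id INTEGER, cust_id INTEGER, source_country TEXT);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO web_logs VALUES (1, 10, 'Brazil'), (2, 11, 'France'), (3, 12, 'Brazil');
    INSERT INTO customers VALUES (10, 'Ana'), (11, 'Luc'), (12, 'Rui');
""")

# Same query shape as on the slide, with the string literal quoted.
# ORDER BY is added only to make the sample output deterministic.
QUERY = """
    SELECT w.sess_id, c.name
    FROM web_logs w, customers c
    WHERE w.source_country = 'Brazil'
      AND w.cust_id = c.customer_id
    ORDER BY w.sess_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

In the slide's scenario one of the two tables would live in Hadoop and the other in the database; the sketch only illustrates the query shape, not where the data is stored.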
No Bottlenecks
Full Stack Install and Upgrades
Simplified Management
Operational Simplicity
Simplify Access to ALL Data
Cluster Growth
Critical Node Migration
Always Highly Available
Always Secure
Very Competitive Price Point
Day 1
[Diagram: nodes (N) in rack RCK_1]
mammoth -e newhost1,...,newhostn
[Diagram: nodes (N) in racks RCK_1 and RCK_2]
mammoth -e newhost1,...,newhostn
This expansion automatically optimizes the HA setup across multiple racks.
Because of uniform nodes and IB networking, no data is moved.
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues
[Diagram: nodes (N) across racks RCK_1 and RCK_2]
Reinstate the Repaired Node with a Single Command:
bdacli admin_cluster reprovision N1
Operational Simplicity
Introducing
SOURCE: http://commons.wikimedia.org/wiki/File:Tamoxifen-3D-vdW.png
Copyright 2012, SAS Institute Inc. All rights reserved.
Agenda
1. SAS & Oracle Partnership
2. Family Stories
   1. Hadoop
   2. Oracle Engineered Systems Family
   3. SAS Software Family
3. Deployment Patterns
Elephant :: 3 Good Ideas !!
1. Never forgets
Diversity. It's a good thing!
Impala Nyala
4 Important Things
[Diagram labels: SAS Server, Hadoop, Hive QL]
[Diagram labels: SAS Server (Client), Hadoop, SAS HPA Procedures]
SAS HPA Procedures: HPBIN, HPCOUNTREG, HPMIXED, HPSEVERITY, HPFOREST, HPSVM, HPDECIDE, HPQLIM
2. Be Familiar
3. Performance
STARTER BDA
SAS Visual Analytics
[Diagram: SAS HPA Root Node, Metadata Server, SAS Compute, SAS Midtier]
STARTER BDA
SAS Visual Analytics
[Diagram: SAS HPA Root Node, Metadata Server, SAS Compute, SAS Midtier]
Consider:
FULL RACK BDA
[Diagram: SAS HPA Root Node, Metadata Server, SAS Compute, SAS Midtier; nodes 17 and 18 each with a LASR Worker and HDFS Data]
FULL RACK BDA ASSEMBLED IN OSC, SYDNEY AUSTRALIA
Read and write tabular files to/from Hive (will confirm Oracle BIGSQL in OSC-SC)
Read and write SAS binary format files to/from HDFS
High degree of parallelism (DOP) reads via map-only jobs
SAS High-Performance Analytics Performance
SAS Format Data (SASHDAT)

                97GB            608GB           Ratio
Variables       1107 var        1107 var
Observations    11.795 Mobs     73.744 Mobs
Per node        5.7GB/node      35.7GB/node     6x
Create          208.79 sec      2284.29 sec     11
Scan/Count      24.60 sec       259.38 sec      10.5
HPCORR          295.20          1410.40         4.7
HPCNTREG        336.79          1547.59         4.6
HPREDUCE (u)    236.55          2467.76         10.4
HPREDUCE (s)    219.50          2037.74         9.3

Table 1: Summation of 5/20/100/200 columns; Baseline: DOP=1 (no parallelism); 120M rows, 400 columns, reg_simtbl_400
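The ratio column above can be recomputed from the two timing columns. The following is a minimal sketch that only redoes the arithmetic from the slide's own numbers; the dictionary layout and the function name `slowdown` are illustrative choices, not anything from SAS or Oracle.

```python
# Timings copied from the slide: (97GB run, 608GB run).
# Units are seconds for Create and Scan/Count; the slide gives no unit for the rest.
timings = {
    "Create": (208.79, 2284.29),
    "Scan/Count": (24.60, 259.38),
    "HPCORR": (295.20, 1410.40),
    "HPCNTREG": (336.79, 1547.59),
    "HPREDUCE (u)": (236.55, 2467.76),
    "HPREDUCE (s)": (219.50, 2037.74),
}

# Growth in data volume, in millions of observations (Mobs on the slide).
data_growth = 73.744 / 11.795


def slowdown(small_run: float, large_run: float) -> float:
    """Return how many times longer the large run took than the small run."""
    return large_run / small_run


ratios = {name: slowdown(*pair) for name, pair in timings.items()}

print(f"data growth  {data_growth:5.2f}x")
for name, ratio in ratios.items():
    print(f"{name:<13}{ratio:5.1f}x")
```

Comparing each ratio with the data growth factor shows which steps scale better than linearly with data size (HPCORR, HPCNTREG) and which scale worse (Create, Scan/Count, HPREDUCE).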
OSC-AU FullRack BDA
408 Threads
600 GB dataset
17 servers
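As a quick sanity check on these figures, the sketch below derives threads per server and data per node. It assumes the threads and the data are spread evenly across the 17 servers, which the slides imply but do not state; the 97 and 608 GB sizes are taken from the SASHDAT performance table.

```python
servers = 17
threads = 408

# Assumption: threads are distributed evenly across the servers.
threads_per_server, leftover_threads = divmod(threads, servers)

# Assumption: each dataset is spread evenly across all servers.
dataset_sizes_gb = (97, 608)
gb_per_node = {size: size / servers for size in dataset_sizes_gb}

print(f"threads per server: {threads_per_server} (leftover: {leftover_threads})")
for size, per_node in gb_per_node.items():
    print(f"{size} GB dataset: {per_node:.2f} GB per node")
```

Under these assumptions the per-node sizes agree with the 5.7GB/node and 35.7GB/node figures quoted in the performance table.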
EXADATA INTEGRATION
SAS EMBEDDED PROCESSING (EP) TO EXADATA LEVERAGING BIG DATA SQL
SAS Visual Analytics
[Diagram: SAS HPA Root Node, Metadata Server, SAS Compute, SAS Midtier; node 18 with a LASR Worker and HDFS Data; SAS EP]
SAS High-Performance Analytics Performance
SAS EP Parallel Data Feeders

            DOP=1       DOP=24      DOP=24 (flash cache)
Add(5)      1.25min     1.5min      .5min
Add(20)     2.5min      1.5min      .5min
Add(100)    13min       1.5min      .6min
Add(200)    16min       ~2min       1.25min (10x)
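The effect of parallel data feeders can be expressed as a speedup over the DOP=1 baseline. The sketch below only redoes the arithmetic on the slide's numbers; the "~2min" entry is treated as exactly 2.0 minutes, which is an assumption, and the variable names are illustrative.

```python
# Minutes per run, copied from the slide: (DOP=1, DOP=24, DOP=24 with flash cache).
# Assumption: the slide's "~2min" for Add(200) is taken as 2.0.
runs = {
    "Add(5)": (1.25, 1.5, 0.5),
    "Add(20)": (2.5, 1.5, 0.5),
    "Add(100)": (13.0, 1.5, 0.6),
    "Add(200)": (16.0, 2.0, 1.25),
}

# Speedup = baseline time / parallel time; values below 1 mean the parallel run was slower.
speedups = {
    name: (baseline / dop24, baseline / dop24_flash)
    for name, (baseline, dop24, dop24_flash) in runs.items()
}

for name, (plain, flash) in speedups.items():
    print(f"{name:<9} DOP=24: {plain:5.2f}x   DOP=24 + flash cache: {flash:5.2f}x")
```

For the smallest workload, DOP=24 without flash cache is slightly slower than the baseline, while the benefit grows with the number of columns summed. For Add(200) the two computed speedups (8x and 12.8x) bracket the slide's rounded "10x" annotation.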
SAS High-Performance Analytics Performance
SAS Format Data (SASHDAT) and Oracle EXADATA

                SASHDAT         EXADATA         SASHDAT
Variables       1107 var        907 var         1107 var
Observations    11.795 Mobs     11.795 Mobs     73.744 Mobs
Size            97GB            79.7GB          608GB
Per node        5.7GB/node      4.7GB/node      35.7GB/node
Create          208.79 sec      931.22 sec      2284.29 sec
Scan/Count      24.60 sec       956.16 sec      259.38 sec
HPCORR          295.20          833.24          1410.40
HPCNTREG        336.79          756.97          1547.59
HPREDUCE (u)    236.55          1055.11         2467.76
HPREDUCE (s)    219.50          1051.93         2037.74

Table 1: Summation of 5/20/100/200 columns; Baseline: DOP=1 (no parallelism); 120M rows, 400 columns, reg_simtbl_400
ORACLE ENGINEERED SYSTEMS FOR
SAS AND ORACLE WORKING TOGETHER TO CREATE CUSTOMER VALUE
Joint R & D development and Product Management teams in Cary and Redwood Shores
Focus on driving SAS technology components to run natively in Oracle database
Joint performance engineering optimizations

Template physical architectures developed based on use-cases
Physically tested and benchmarked together
Reduction in physical effort
Overall reduction in lifecycle costs

Best Practice papers
SAS and Oracle Engineers provide joint "Sizing and Architecture Analysis and Design"
SAS AND ORACLE
BETTER TOGETHER
Paul.Kent @ sas.com
@hornpolish
paulmkent