Jean-Pierre Dijcks
Oracle
Big Data Product Management
Paul Kent
SAS
VP Big Data
Integrated Software:
• Oracle Linux, Oracle Java VM
• Oracle Big Data SQL*
• Cloudera Distribution of Apache Hadoop – EDH Edition
• Cloudera Manager
• Oracle R Distribution
• Oracle NoSQL Database
* Oracle Big Data SQL is separately licensed
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 5
Recap: Standard and Modular
Big Data SQL
SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;
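To make the query's semantics concrete, here is a minimal in-memory sketch in Python of what it computes: a filter on source_country followed by an inner equi-join on cust_id = customer_id. It does not model how Big Data SQL actually executes the query across Hadoop and Oracle Database. The WebLog/Customer record types, the function name brazil_sessions, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class WebLog(NamedTuple):
    # Hypothetical subset of web_logs columns used by the query.
    sess_id: str
    cust_id: int
    source_country: str


class Customer(NamedTuple):
    # Hypothetical subset of customers columns used by the query.
    customer_id: int
    name: str


def brazil_sessions(web_logs: List[WebLog], customers: List[Customer]) -> List[Tuple[str, str]]:
    """Return (sess_id, name) pairs, mirroring the SQL query's semantics."""
    # Hash the customers side on its join key; a list per key keeps the
    # inner-join semantics correct even if customer_id were not unique.
    names_by_id = defaultdict(list)
    for c in customers:
        names_by_id[c.customer_id].append(c.name)

    result = []
    for w in web_logs:
        # WHERE w.source_country = 'Brazil'
        if w.source_country != "Brazil":
            continue
        # AND w.cust_id = c.customer_id: rows without a match are dropped.
        for name in names_by_id.get(w.cust_id, []):
            result.append((w.sess_id, name))
    return result


# Hypothetical sample rows.
logs = [WebLog("s1", 1, "Brazil"), WebLog("s2", 2, "Chile"), WebLog("s3", 3, "Brazil")]
custs = [Customer(1, "Ana"), Customer(2, "Bruno")]
print(brazil_sessions(logs, custs))  # prints [('s1', 'Ana')]
```

Session s2 is removed by the country filter, and s3 is dropped because no customer row matches its cust_id, which is exactly what an inner join does.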
Operational Simplicity
Simplify Access to ALL Data
• No Bottlenecks
• Full Stack Install and Upgrades
• Simplified Management
  – Cluster Growth
  – Critical Node Migration
• Always Highly Available
• Always Secure
• Very Competitive Price Point
Successful Big Data Systems Grow
From Cluster Install with HA to Large Clusters to Dealing with Operational Issues
Day 1 (one rack, RCK_1)
• 12-node BDA for production
• HA and security set up
• Hadoop ready to load data
Cluster growth (RCK_1 → RCK_1 + RCK_2):
mammoth -e newhost1,…,newhostn
This expansion automatically optimizes the HA setup across multiple racks. Because of uniform nodes and InfiniBand (IB) networking, no data is moved.
• Reinstate the repaired node with a single command:
bdacli admin_cluster reprovision N1
Introducing
SOURCE: http://commons.wikimedia.org/wiki/File:Tamoxifen-3D-vdW.png
Copyright © 2012, SAS Institute Inc. All rights reserved.
Agenda
1. SAS & Oracle Partnership
2. Family Stories
   1. Hadoop
   2. Oracle Engineered Systems Family
   3. SAS Software Family
3. Deployment Patterns
Elephant: 3 Good Ideas!
1. Never forgets
2. Is a good (hard) worker
3. Is a Social Animal (teamwork)
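MYFILE.TXT is split into blocks, and each block is stored together with a second copy:
• block1 -> block1, block1 copy2
• block2 -> block2, block2 copy2
• block3 -> block3, block3 copy2
(X in the diagram marks a failed node; each block still has a copy on another node.)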
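The block diagram can be sketched in a few lines of Python. This illustrates the idea only (split a file into blocks, keep each block's copies on different nodes) and is not HDFS code: the block size, node names, round-robin placement, and replication factor of 2 (matching the two copies drawn on the slide; HDFS defaults to 3) are all assumptions, and real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Put `replication` copies of each block on distinct nodes, round-robin.

    Simplification: HDFS itself uses a rack-aware placement policy; this
    sketch only guarantees that copies of one block sit on different nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def live_replicas(placement: Dict[int, List[str]], failed: Set[str]) -> Dict[int, List[str]]:
    """For each block, keep only the copies that sit on nodes still running."""
    return {b: [n for n in copies if n not in failed] for b, copies in placement.items()}


blocks = split_into_blocks(b"contents of MYFILE.TXT", block_size=8)  # 3 blocks
placement = place_replicas(len(blocks), ["node1", "node2", "node3"], replication=2)
after_failure = live_replicas(placement, failed={"node2"})
# Two copies on distinct nodes: losing any one node leaves every block readable.
print(all(after_failure.values()))  # prints True
```

Because every block has two copies on different nodes, any single-node failure still leaves at least one readable copy of each block, which is the point of "Redundancy Wins!".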
Redundancy Wins!
Diversity. It’s a good thing!
Impala Nyala
4 Important Things
[Diagram: HADOOP + SAS SERVER via Hive QL; HADOOP + SAS SERVER running SAS HPA Procedures, with a Client]
• Optionally select a node subset (N, N-1, N-2, N-3, …) to give the SAS Analytic Compute Environment more dedicated resources by shifting Big Data Appliance roles
STARTER BDA
SAS Visual Analytics
[Diagram: SAS HPA Root Node, SAS Metadata Server, SAS Midtier, SAS Compute]
STARTER BDA
SAS Visual Analytics
[Diagram: SAS HPA Root Node, SAS Metadata Server, SAS Midtier, SAS Compute]
Consider:
FULL RACK BDA
[Diagram: SAS Metadata Server, SAS Midtier, SAS HPA Root Node, SAS Compute; LASR Worker and HDFS Data on nodes … 17, 18]
SAS High-Performance Analytics Performance
SAS Format Data (SASHDAT)
Times in seconds.
Metric                 Smaller data set   Larger data set   Ratio (larger/smaller)
Variables              1107               1107
Observations           11.795 Mobs        73.744 Mobs
Size                   97 GB              608 GB
Size per node          5.7 GB/node        35.7 GB/node      6x
Create                 208.79             2284.29           11
Scan/Count             24.60              259.38            10.5
HPCORR                 295.20             1410.40           4.7
HPCNTREG               336.79             1547.59           4.6
HPREDUCE (u)           236.55             2467.76           10.4
HPREDUCE (s)           219.50             2037.74           9.3
Table 1: Summation of 5/20/100/200 columns; Baseline: DOP=1 (no parallelism); 120M rows, 400 columns, reg_simtbl_400
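The ratio column can be re-derived from the timings. A minimal sketch, assuming all times are in seconds and that the ratio is the larger run's time divided by the smaller run's: the computed values (10.9, 10.5, 4.8, 4.6, 10.4, 9.3) agree with the slide to within about 0.1. Dividing each by the roughly 6.3x growth in data size shows that HPCORR and HPCNTREG grew less than proportionally to the data, while Create, Scan/Count, and both HPREDUCE runs grew faster. The names timings, size_ratio, and time_ratios are illustrative.

```python
# Times in seconds (assumed), copied from the SASHDAT table:
# (smaller data set: 11.795 Mobs, larger data set: 73.744 Mobs).
timings = {
    "Create": (208.79, 2284.29),
    "Scan/Count": (24.60, 259.38),
    "HPCORR": (295.20, 1410.40),
    "HPCNTREG": (336.79, 1547.59),
    "HPREDUCE (u)": (236.55, 2467.76),
    "HPREDUCE (s)": (219.50, 2037.74),
}
size_ratio = 608 / 97  # total data size, larger vs. smaller (about 6.3x)


def time_ratios(rows):
    """Larger-run time divided by smaller-run time for each procedure."""
    return {name: large / small for name, (small, large) in rows.items()}


ratios = time_ratios(timings)
for name, r in ratios.items():
    # Above 1.0: run time grew faster than data size; below 1.0: slower.
    print(f"{name:<13} time ratio {r:5.1f}   relative to data growth {r / size_ratio:.2f}")
```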
OSC-AU FullRack BDA
• 408 Threads
• 600 GB dataset
• 17 servers
SAS EMBEDDED PROCESSING (EP) TO EXADATA INTEGRATION LEVERAGING BIG DATA SQL
SAS Visual Analytics
[Diagram: EXADATA with SAS EP; SAS HPA Root Node, SAS Metadata Server, SAS Midtier, SAS Compute; LASR Worker and HDFS Data on nodes … 18]
SAS High-Performance Analytics Performance
SAS EP Parallel Data Feeders
Test        DOP=1      DOP=24     DOP=24 (flash cache)
Add(5)      1.25 min   1.5 min    0.5 min
Add(20)     2.5 min    1.5 min    0.5 min
Add(100)    13 min     1.5 min    0.6 min
Add(200)    16 min     ~2 min     1.25 min (10x)
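Reading the table as run times, the speedup of each DOP=24 configuration over the DOP=1 baseline is a simple ratio. A minimal sketch, assuming every entry is minutes per run and taking "~2min" as 2.0: at Add(5) the plain DOP=24 run is actually slower than the baseline (about 0.83x), while the gains are much larger for Add(100) and Add(200) than for Add(5) and Add(20). The names feeder_minutes and speedups are illustrative.

```python
# Minutes per run, copied from the SAS EP Parallel Data Feeders table.
# Columns: (DOP=1, DOP=24, DOP=24 with flash cache); "~2min" taken as 2.0.
feeder_minutes = {
    "Add(5)": (1.25, 1.5, 0.5),
    "Add(20)": (2.5, 1.5, 0.5),
    "Add(100)": (13.0, 1.5, 0.6),
    "Add(200)": (16.0, 2.0, 1.25),
}


def speedups(rows):
    """Speedup of each DOP=24 configuration over the DOP=1 baseline."""
    return {
        name: (dop1 / dop24, dop1 / dop24_flash)
        for name, (dop1, dop24, dop24_flash) in rows.items()
    }


for name, (plain, flash) in speedups(feeder_minutes).items():
    print(f"{name:<9} DOP=24: {plain:5.2f}x   DOP=24 + flash cache: {flash:5.2f}x")
```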
SAS High-Performance Analytics Performance
SAS Format Data (SASHDAT) and Oracle EXADATA
Times in seconds.
Metric                 SASHDAT            EXADATA            SASHDAT
Variables              1107               907                1107
Observations           11.795 Mobs        11.795 Mobs        73.744 Mobs
Size                   97 GB              79.7 GB            608 GB
Size per node          5.7 GB/node        4.7 GB/node        35.7 GB/node
Create                 208.79             931.22             2284.29
Scan/Count             24.60              956.16             259.38
HPCORR                 295.20             833.24             1410.40
HPCNTREG               336.79             756.97             1547.59
HPREDUCE (u)           236.55             1055.11            2467.76
HPREDUCE (s)           219.50             1051.93            2037.74
Table 1: Summation of 5/20/100/200 columns; Baseline: DOP=1 (no parallelism); 120M rows, 400 columns, reg_simtbl_400
ORACLE ENGINEERED SYSTEMS FOR
SAS AND ORACLE WOR
SAS AND ORACLE
BETTER TOGETHER
Paul.Kent@sas.com
@hornpolish
paulmkent