Catalin Esanu
Data Solution Architect
cesanu@Microsoft.com
Azure SQL Data Warehouse
Elastic scale & performance
• Scales to petabytes of data with MPP processing
• Resize compute nodes in <1 minute
• Faster time to insight than SMP offerings
• Designed for on-demand workloads

End-to-end platform built for the cloud
• Integrated with the Azure platform and other Microsoft services
• Enables hybrid solutions
• Built on SQL Server experience & technology
• Infrastructure, management and support provided – with SLA

Market-leading price/performance
• Simple compute & storage billing
• Pay for what you need
• High performance without rewriting applications
• Low cost for latent data
Quick glance – Azure SQL Data Warehouse
Requires
• A method for scheduling tasks
• A communication plan to maximise efficiency
• A distribution method for exchange of goods
SCALE UP vs OUT

UP – SMP (Symmetric Multi-Processing):
• Diminishing returns
• Non-linear costs at scale
• Parallel execution hard
• Low-to-mid query complexity
• High concurrency
• Shared everything

OUT – MPP (Massively Parallel Processing):
• Linear scale
• Incremental cost
• Parallel execution by default
• Complex queries
• Medium concurrency
• Shared nothing
What will we talk about?
• Supported/Unsupported features (high level)
• High level architecture
• Data distribution
• Concurrency
• Resource classes
• PolyBase
• Best Practices
• Getting started - demo
Supported Features
- Partitions
- Stored Procedures
- Functions
- Indexes (Clustered Columnstore Indexes, Clustered/Nonclustered with secondary indexes)
- DMVs
- PowerShell cmdlets
- Temporary tables
- Variables
- Loops
- Schemas
- Views
- Drivers: ODBC, JDBC, PHP, ADO.NET
- Loading with PolyBase and BCP
- Transparent Data Encryption
- Snapshots and Geo-Backup
- Dynamic SQL
- Pivot/Unpivot
- Analytical Functions
Unsupported Features
- Cursors
- MERGE statement
- Cross-database SELECTs
- Table-valued parameters
- Table variables
- CLR
- CTEs (partial support)
- ANSI joins on UPDATE/DELETE
- Full list here with possible workarounds
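Since MERGE is unsupported (and so are ANSI joins on UPDATE/DELETE), one common workaround is to rebuild the target with CREATE TABLE AS SELECT and swap it in. The sketch below uses illustrative table and column names:

```sql
-- Upsert without MERGE, using CTAS (dbo.Target, dbo.Source, Id, Amount are illustrative).
-- Build the merged result, then swap it in with RENAME OBJECT.
CREATE TABLE dbo.Target_Upsert
WITH (DISTRIBUTION = HASH (Id))
AS
SELECT s.Id, s.Amount          -- new and changed rows from the source
FROM   dbo.Source AS s
UNION ALL
SELECT t.Id, t.Amount          -- unchanged rows from the target
FROM   dbo.Target AS t
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Source AS s WHERE s.Id = t.Id);

RENAME OBJECT dbo.Target TO Target_Old;
RENAME OBJECT dbo.Target_Upsert TO Target;
DROP TABLE dbo.Target_Old;
```

The CTAS-plus-rename pattern also gives a clean rollback point: the old table survives until the final DROP.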
Azure SQL Data Warehouse Architecture

Storage and compute are decoupled, enabling a truly elastic service and separate charging for compute and storage.

• Applications and user connections arrive at the Control Node, which hosts the Massively Parallel Processing (MPP) engine.
• The Data Movement Service (DMS) executes across all database nodes, moving data between the Compute Nodes, each backed by its own SQL database.
• Data loading options: SSIS, REST, OLE DB, ADO.NET, ODBC, WebHDFS, AzCopy, PowerShell.
• Compute: 100 DWU to 2000 DWU. Scale compute up or down when required (SLA <= 60 seconds); Pause, Resume, Stop, Start.
• Storage: RA-GRS storage with petabytes of capacity; ingest data without incurring compute costs.
• Integrates with HDInsight.
HDInsight
Data Distribution Concept

CREATE TABLE myTable (column defs)
WITH (DISTRIBUTION = HASH (id));

Rows are hash-distributed across 60 distributions (D1–D60), which are spread over the compute nodes.
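As a concrete sketch of the distribution options (table and column names are illustrative), a fact table is typically hash-distributed on a high-cardinality key, while a staging table often uses round-robin distribution:

```sql
-- Hash-distribute fact rows on a high-cardinality key, so rows with
-- the same key always land in the same distribution.
CREATE TABLE dbo.FactSales
(
    SaleId      BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    SaleDate    DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH (CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

-- Staging table: round-robin spreads rows evenly with no distribution key.
CREATE TABLE dbo.StageSales
(
    SaleId      BIGINT,
    CustomerId  INT,
    SaleDate    DATE,
    Amount      DECIMAL(18,2)
)
WITH (DISTRIBUTION = ROUND_ROBIN);
```

Choosing the hash column well matters: a column used in joins and aggregations avoids data movement, while a skewed column concentrates rows in a few distributions.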
Why Distribute Data

Each DWU service level determines the maximum concurrent queries, the total concurrency slots, the slots consumed per query by resource class, and the memory per query by resource class (smallrc / mediumrc / largerc / xlargerc):

DWU     | Max concurrent queries | Max concurrency slots | Slots per query (small/med/large/xlarge) | Memory per query, GB (small/med/large/xlarge)
DW100   |  4 |  4 | 1 / 1 / 2 / 4   |  6 /  6 / 12 /  23
DW200   |  8 |  8 | 1 / 2 / 4 / 8   |  6 / 12 / 23 /  47
DW300   | 12 | 12 | 1 / 2 / 4 / 8   |  6 / 12 / 23 /  47
DW400   | 16 | 16 | 1 / 4 / 8 / 16  |  6 / 23 / 47 /  94
DW500   | 20 | 20 | 1 / 4 / 8 / 16  |  6 / 23 / 47 /  94
DW600   | 24 | 24 | 1 / 4 / 8 / 16  |  6 / 23 / 47 /  94
DW1000  | 32 | 40 | 1 / 8 / 16 / 32 |  6 / 47 / 94 / 188
DW1200  | 32 | 48 | 1 / 8 / 16 / 32 |  6 / 47 / 94 / 188
DW1500  | 32 | 60 | 1 / 8 / 16 / 32 |  6 / 47 / 94 / 188
Service capacity limits:
• Data Warehouse Units (DWU): max 6000 DWU for a single SQL Data Warehouse.
• Tempdb: max size is 399 GB per DW100; at DWU1000, tempdb is therefore sized at 3.99 TB.
• Table: bytes per column depend on the column data type; the limit is 8000 for char data types, 4000 for nvarchar, or 2 GB for MAX data types.
• Index: 999 non-clustered indexes per table (applies to rowstore tables only).
PolyBase

Execute T-SQL queries against relational data in SQL Server and semi-structured data in Hadoop or Azure Blob Storage. A query against a PolyBase view is executed on the compute nodes: as a first step, a temp table T is created to hold the external data, and the results are returned from it.
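Before external data can be queried, PolyBase needs an external data source, a file format, and an external table. A minimal sketch over Azure Blob Storage follows; the credential, storage account, container, and all object names are illustrative:

```sql
-- Sketch: expose delimited files in Azure Blob Storage as an external table.
-- The storage account, container, secret, and names below are placeholders.
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL BlobCred
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@myaccount.blob.core.windows.net',
    CREDENTIAL = BlobCred
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- The external table is only metadata; the data stays in blob storage.
CREATE EXTERNAL TABLE dbo.ExtCustomer
(
    c_custkey   INT,
    c_nationkey INT,
    c_acctbal   DECIMAL(18,2)
)
WITH (
    LOCATION = '/customer/',
    DATA_SOURCE = AzureBlob,
    FILE_FORMAT = CsvFormat
);
```

Once defined, the external table can be queried with ordinary T-SQL, exactly like an internal table.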
PolyBase query example #2

-- select and aggregate on external table (data in HDFS)
SELECT AVG(c_acctbal)
FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey;

Execution plan – what happens here?
Step 1: The query optimizer (QO) compiles the predicate into Java and generates a MapReduce (MR) job.
Step 2: The engine submits the MR job to the Hadoop cluster; the filter and aggregate are applied to Customer on Hadoop, and the output is left in hdfsTemp:
<US, $-975.21>
<UK, $-63.52>
<FRA, $-119.13>
PolyBase query example #2

-- select and aggregate on external table (data in HDFS)
SELECT AVG(c_acctbal)
FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey;

Execution plan:
1. Run the MR job on Hadoop: apply the filter and compute the aggregate on Customer; output left in hdfsTemp (<US, $-975.21>, <UK, $-63.52>, <FRA, $-119.13>).
2. CREATE temp table T on the DW compute nodes.
3. IMPORT: read hdfsTemp into T.
4. RETURN OPERATION: SELECT * FROM T.

Notes:
• The predicate and aggregate are pushed into the Hadoop cluster as a MapReduce job.
• The query optimizer makes a cost-based decision on which operators to push.
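A common follow-up to querying external data is importing it into the warehouse with CREATE TABLE AS SELECT, so later queries avoid the round trip to Hadoop. A sketch, assuming an external table dbo.ExtCustomer over the HDFS data (all names illustrative):

```sql
-- Import the filtered external data into a distributed internal table.
-- dbo.ExtCustomer is an assumed external table; names are illustrative.
CREATE TABLE dbo.Customer
WITH
(
    DISTRIBUTION = HASH (c_nationkey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT c_custkey, c_nationkey, c_acctbal
FROM dbo.ExtCustomer
WHERE c_acctbal < 0;
```

Because the CTAS runs on all compute nodes in parallel, the import itself is an MPP operation.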
Summary: PolyBase

Query relational and non-relational data with T-SQL: applications issue a single T-SQL query against data on-premises and in Azure.
Secure
Including connection, authentication, authorization, encryption, and auditing.

Connect to SQL Data Warehouse with the sqlcmd command prompt utility included with SQL Server, and execute sample queries with SSDT/SSMS/sqlcmd to test your connections.

Visualize
Dynamic reporting and visualization using Power BI.
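A minimal sqlcmd connection sketch (the server, database, login, and password are placeholders); the -I flag enables quoted identifiers, which SQL Data Warehouse requires:

```shell
# Connect to a SQL Data Warehouse database and run a smoke-test query.
# Server, database, user, and password below are placeholders.
sqlcmd -S myserver.database.windows.net -d mydwdb \
       -U myadmin -P 'MyP@ssw0rd' -I \
       -Q "SELECT name FROM sys.tables;"
```

The same connection parameters (fully qualified server name, database, SQL login) work in SSDT and SSMS.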
Getting started with Azure SQL Data Warehouse

Migrate
Migrate data and schema from SQL Server and Azure SQL Database. The Data Warehouse Migration Utility (https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-migrate-migration-utility/) is a tool designed to migrate schema and data from SQL Server and Azure SQL Database to SQL Data Warehouse, including these steps:

• Download, install, and launch the Data Warehouse Migration Utility, and connect it to the source and destination databases.
• Click Migrate Schema to generate a schema migration script for the selected tables.
• Click Migrate Data to generate scripts that move the data first to flat files on your server, and then directly into SQL Data Warehouse.

You may also use a tool such as DWUCalculator (http://dwucalculator.azurewebsites.net) for a rough estimate of the required DWU.
Getting started with Azure SQL Data Warehouse

Load Data
Perform ETL/ELT and load data into your SQL Data Warehouse using Microsoft or third-party data loading tools, such as:
• Azure Data Factory
• AzCopy
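BCP loading (listed under supported features) can be sketched as follows; the server, database, credentials, table, and file names are placeholders:

```shell
# Bulk-load a comma-delimited flat file into a staging table with bcp.
# All names and credentials below are placeholders.
# -c = character mode, -t = field terminator, -q = quoted identifiers.
bcp dbo.StageSales in sales.csv \
    -S myserver.database.windows.net -d mydwdb \
    -U myadmin -P 'MyP@ssw0rd' -q -c -t ','
```

For large volumes, PolyBase loads from blob storage generally outperform bcp, since they run in parallel on the compute nodes rather than through the control node.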
Getting started with Azure SQL Data Warehouse

Secure
Establishing security for your SQL Data Warehouse is critical to getting started: configure connection security, authentication, authorization, encryption, and auditing.
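As a sketch of the authorization step (login name, user name, and password are illustrative), you might create a server login and map it to a database user with a role suited to loading data:

```sql
-- In the master database: create a server login (name/password illustrative).
CREATE LOGIN LoaderLogin WITH PASSWORD = 'Str0ng!Passw0rd';

-- In the SQL Data Warehouse database: map a user to the login
-- and grant it a role appropriate for loading data.
CREATE USER LoaderUser FOR LOGIN LoaderLogin;
EXEC sp_addrolemember 'db_datawriter', 'LoaderUser';
```

Keeping loading and reporting under separate logins also makes it easier to assign each workload its own resource class later.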
What’s next?
- Read more about Azure
https://azure.microsoft.com/en-us/solutions/?v=4
https://azure.microsoft.com/en-us/get-started/
https://azure.microsoft.com/en-us/documentation/samples/
- Go to https://azure.microsoft.com/en-us/documentation/services/sql-data-warehouse for more
details about the product.
- Vote on User Voice page for your favorite upcoming features -
https://feedback.azure.com/forums/307516-sql-data-warehouse
- Contact me - cesanu@Microsoft.com
I’d love to hear your thoughts/impressions about the product.