
Mike Ruthruff, Program Manager
Contributor: Prem Mehra
SQL Server Customer Advisory Team, Microsoft Corporation

Works on the largest, most complex SQL Server projects worldwide


US: NASDAQ, USDA, Verizon, Raymond James
Europe: London Stock Exchange, Barclays Capital
Asia/Pacific: Korea Telecom, Western Digital, Japan Railways East
ISVs: SAP, Siebel, SharePoint, GE Healthcare

Drives product requirements back into SQL Server from our customers and ISVs
Shares deep technical content with the SQL Server community

SQLCAT.com
http://blogs.msdn.com/sqlcat
http://blogs.msdn.com/mssqlisv
http://technet.microsoft.com/en-us/sqlserver/bb331794.aspx

Target the Most Challenging and Innovative Applications on SQL Server
Investing in Large Scale, Referenceable SQL Server Projects Across the World
Provide SQLCAT technical & project experience
Conduct architecture and design reviews covering performance, operation, scalability and availability aspects
Offer use of HW lab in Redmond with direct access to the SQL Server development team

Work with Marketing Team Developing Case Studies



Characteristics of SQL Server I/O operations
Best practices
SQL Server Design Practices
Storage Configuration
Common Pitfalls
Monitoring performance of SQL Server on SAN
Emerging Storage Technologies

Lots of additional material in the appendix section (not covered during the session)
How to validate a configuration using I/O load generation tools
General SQL Server I/O characteristics
How to diagnose I/O bottlenecks
Sample Configurations

Generalizing SQL Server I/O patterns is difficult, making sizing storage for a SQL Server deployment non-trivial in some cases

OLTP (Online Transaction Processing)
Typically heavy on random reads/writes (8K most common)
Some amount of read-ahead typical

RDW (Relational Data Warehousing)
Typically 64KB+ sequential reads (table and range scans)
128-256KB sequential writes (bulk load)

Operational Activities
Backup/Restore, Index Rebuild, etc. (see appendix)

Many mixed workloads observed in customer deployments

Analysis Services I/O patterns
Up to 64KB random reads

See appendix for more details on I/O characteristics of certain SQL Server operations

User threads fill log buffers and request the log manager to flush all records up to a certain LSN; the log manager thread writes the buffers
Sequential in nature
Individual write size varies
Dependent on the nature of the transaction
Transaction commit forces the log buffer to be flushed to disk
Up to 60KB in size

Log manager throughput considerations

Version limits (per database)
SQL Server 2000 SP4 & SQL Server 2005 RTM: limit of 8 outstanding log writes
SQL Server 2005 SP1 or later: outstanding log writes limited to 8 (32-bit) or 32 (64-bit); no more than 480K in-flight
SQL Server 2008: outstanding log writes limited to 8 (32-bit) or 32 (64-bit); no more than 3840K in-flight

How do I determine if I am hitting log bottlenecks?


First look for associated wait types (sys.dm_os_wait_stats): WRITELOG & LOGBUFFER

Suboptimal Disk Response Times (most common issue)


Logical Disk counters: Avg. Disk sec/Write
SQL Server:Databases: (Log Flush Wait Time)/(Log Flushes/sec)

Log manager limits (SQL Server:Databases Log Counters)


Amount of in-flight I/O (compare to the limit) = Avg. Bytes per Flush * Current Queue Length
Avg. Bytes per Flush = (Log Bytes Flushed/sec) / (Log Flushes/sec)

Amount of outstanding I/O (compare to the limit) = Current Disk Queue Length, or use sys.dm_io_pending_io_requests
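A minimal T-SQL sketch of these checks (wait types and counter names as listed above; both performance counters are cumulative since startup, so sample twice and diff for a point-in-time value):

    -- Log-related waits since the last restart (or stats clear)
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type IN ('WRITELOG', 'LOGBUFFER');

    -- Average bytes per flush = (Log Bytes Flushed/sec) / (Log Flushes/sec)
    SELECT b.instance_name AS database_name,
           b.cntr_value * 1.0 / NULLIF(f.cntr_value, 0) AS avg_bytes_per_flush
    FROM sys.dm_os_performance_counters AS b
    JOIN sys.dm_os_performance_counters AS f
      ON f.instance_name = b.instance_name
    WHERE b.counter_name LIKE 'Log Bytes Flushed/sec%'
      AND f.counter_name LIKE 'Log Flushes/sec%';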



High rate of inserts
~14,000 inserts/sec
Observed high waits on WRITELOG
During checkpoint activities the log manager encounters periods of 32 outstanding I/Os

Periodically sweeps the buffer pool and flushes dirty buffers to disk
Up to 32 pages in a single I/O request (WriteFileGather)
More random in nature, although it attempts to find adjacent pages

Types of Checkpoints
Background/automatic checkpoints: triggered by log volume or recovery interval and performed by the checkpoint thread
User-initiated checkpoints: initiated by the T-SQL CHECKPOINT command
Reflexive checkpoints: automatic as part of some operations, such as recovery, restore, snapshot creation, etc.

Concurrency
Background/automatic checkpoints take place one at a time, however
Any number of user-initiated or reflexive checkpoints may occur simultaneously as long as they are for different databases
On NUMA systems checkpoints spread work to the lazy writer per node

Checkpoint Throttling
Checkpoint measures I/O latency impact and automatically adjusts checkpoint I/O to keep the overall latency from being unduly affected
CHECKPOINT [checkpoint_duration]
CHECKPOINT now allows an optional numeric argument, which specifies the number of seconds the checkpoint should take
Checkpoint makes a best effort to complete in the time specified
If the specified time is too low, it runs at full speed
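For example, a manual checkpoint with the optional duration argument (SQL Server 2008 syntax as described above; the database name is hypothetical):

    USE SalesDB;   -- hypothetical database
    CHECKPOINT 30; -- spread checkpoint I/O over roughly 30 seconds (best effort)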

Lazy Writer
Background process which attempts to locate buffer pages that can be returned to the free list
LRU-2 algorithm in SQL Server 2005 / 2008: based on the time of the next-to-last reference
Determined by the reference count of the page in SQL Server 2000


Attempts to retrieve data pages that will be used in the immediate future
Single read-ahead request
I/O size determined by logical vs. physical ordering; target size of 64 pages (any multiple of 8K up to 512K)
Standard Edition: limited to 128 pages; Enterprise Edition: up to 512 pages
Cumulative outstanding limit of 5000 pages

Occurs independent of parallel plan selection, however:
A parallel plan may drive I/O harder due to multiple workers (scanner threads)
The parallel page supplier segments the data requests in the case of parallel plans

Until the buffer pool is (nearly) full, all single-page requests bring in the entire 8-page extent (Enterprise only)
Helps the server come up to speed more quickly

Characteristics of SQL Server I/O operations
Best practices
SQL Server Design Practices
Storage Configuration
Common Pitfalls
Monitoring performance of SQL Server on SAN
Emerging Storage Technologies



More data files do not necessarily equal better performance
Performance is determined mainly by hardware capacity & characteristics of access patterns
Data files can be used to maximize the number of spindles used (striping)

Number of data files per FILEGROUP
In the range of 0.25 to 1 per CPU core, depending on the nature of the workload (also consider growth: will the number of CPU cores grow over time?)
Scalability/performance consideration for allocation-intensive workloads; see slide on Diagnosing Allocation Contention

Consider disaster recovery requirements
Will the target environment for a disaster recovery restore accommodate the file sizes?

Best practices (an example follows below):
Align data files with CPU cores (considering access patterns)
Pre-size data/log files
Use equal size for files within a single FILEGROUP
Grow all files in a single FILEGROUP together when possible
Rely on AUTOGROW as little as possible
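As an illustration of these practices, a minimal sketch of a database with equally pre-sized files in one filegroup and modest fixed autogrow (database name, paths, file count, and sizes are hypothetical):

    CREATE DATABASE SalesDB
    ON PRIMARY
        (NAME = SalesDB_sys,   FILENAME = 'D:\Data\SalesDB_sys.mdf',   SIZE = 1GB),
    FILEGROUP DataFG
        (NAME = SalesDB_data1, FILENAME = 'D:\Data\SalesDB_data1.ndf', SIZE = 50GB, FILEGROWTH = 5GB),
        (NAME = SalesDB_data2, FILENAME = 'E:\Data\SalesDB_data2.ndf', SIZE = 50GB, FILEGROWTH = 5GB),
        (NAME = SalesDB_data3, FILENAME = 'F:\Data\SalesDB_data3.ndf', SIZE = 50GB, FILEGROWTH = 5GB),
        (NAME = SalesDB_data4, FILENAME = 'G:\Data\SalesDB_data4.ndf', SIZE = 50GB, FILEGROWTH = 5GB)
    LOG ON
        (NAME = SalesDB_log,   FILENAME = 'H:\Log\SalesDB_log.ldf',    SIZE = 20GB, FILEGROWTH = 2GB);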

Performance
Filegroups can be used to separate tables/indexes, allowing selective placement of these at the disk level (use with caution)
Separate objects requiring more data files due to a high page allocation rate

Administration considerations (primarily)
Backup can be performed at the filegroup or file level (see the sketch below)
Partial availability
Database is available if the primary filegroup is available; other filegroups can be offline
A filegroup is available if all of its files are available
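A sketch of a filegroup-level backup (database, filegroup, and path names are hypothetical):

    BACKUP DATABASE SalesDB
        FILEGROUP = 'DataFG'
    TO DISK = 'X:\Backup\SalesDB_DataFG.bak'
    WITH CHECKSUM;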

Tables and Indexes
Can specify separate filegroups for in-row data and large-object data

Partitioned Tables
Each partition can be in its own filegroup
May provide a better archiving strategy as partitions can be SWITCHed in/out of the table (see the sketch below)
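A sketch of switching a partition out to an archive table (table names and partition number are hypothetical; source and target must have matching schemas and reside on the same filegroup):

    ALTER TABLE dbo.Orders
        SWITCH PARTITION 1 TO dbo.Orders_Archive;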

Tempdb placement (dedicated vs. shared spindles)
In many modern storage scenarios it may be better to place tempdb on common spindles with data files, utilizing more cumulative disks
Depends on how well you know your workload's use of tempdb (e.g., RDW workloads may differ from OLTP)

Understand your own tempdb usage
Many underlying technologies within SQL Server utilize tempdb (index rebuild with sort in tempdb, RCSI, etc.)
More details (Working with tempdb in SQL Server 2005):
http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx

Best practice: 1 data file per CPU core on the host server (example below)
Applies most to allocation-intensive workloads with heavy tempdb utilization
Same practices as data files with respect to sizing and growth
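For example, resizing the existing tempdb file and adding equally sized files (paths, sizes, and the number of files are hypothetical; match the file count to core count and workload):

    ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 8GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 8GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev4, FILENAME = 'T:\TempDB\tempdev4.ndf', SIZE = 8GB, FILEGROWTH = 1GB);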


A high rate of allocations to data files can result in contention on allocation structures
PFS/GAM/SGAM are structures within a data file which manage free space
Especially a consideration on servers with many CPU cores
More data files scale out these structures and reduce the contention potential

Allocation contention is diagnosed by looking for waits on PAGELATCH_UP (example below)
Either in real time in sys.dm_exec_requests or tracked in sys.dm_os_wait_stats
Resource description is in the form DBID:FILEID:PAGEID
Can be cross-referenced with sys.dm_os_buffer_descriptors to determine the type of page
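A sketch of the real-time check described above (wait_resource values such as 2:1:1 or 2:1:3 correspond to the PFS and SGAM pages in tempdb; actual IDs will vary):

    -- Tasks currently waiting on page latches (resource is DBID:FILEID:PAGEID)
    SELECT session_id, wait_type, wait_resource
    FROM sys.dm_exec_requests
    WHERE wait_type LIKE 'PAGELATCH%';

    -- Cumulative view of latch waits since startup
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE 'PAGELATCH%'
    ORDER BY wait_time_ms DESC;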

More details here:
http://sqlcat.com/technicalnotes/archive/2008/03/07/How-many-files-should-a-database-have-part-1-olap-workloads.aspx


Storage technologies are evolving rapidly and traditional best practices may not apply to all configurations
However, many still do apply (specifically physical isolation practices), especially at the high end (high volume OLTP, large scale DW)
There is no one single right way to configure storage for SQL Server on SAN

SAN deployment can be complex
Generally involves multiple organizations to put a configuration in place
Dependent on things such as:
Workload characteristics (OLTP, RDW, mixed)
Deployment scenario (e.g., SQL Server consolidation)
Scale of deployment - database size
Performance requirements
Array architecture
Backup/restore & disaster recovery requirements, etc.


Shared storage environments are becoming more common
Already many shared components (ports/switches, array cache, controllers, etc.)
Spindle sharing is an increasingly common practice
Different considerations on different classes of arrays

Physical design matters
Cache does not solve all performance problems
Think about splitting workloads with very different I/O characteristics at the physical level - yes, there is a benefit
Isolation at the physical level can provide 1) predictability and 2) better performance (in some cases)

Array cache
Best to tune for writes (when possible): low log latency and absorbing checkpoint operations
In shared storage environments cache can be overused across hosts, impacting all users

LUN design should be driven by:
Optimal configuration for the particular storage array
Array architecture varies greatly and may impact LUN design / growth strategy
Management/growth strategy
Windows/SQL Server considerations
Array feature utilization (snapshots, replication, etc.)

LUNs should be dedicated to SQL Server data files
Separating data/log/tempdb at the logical level, even if shared at the physical level, facilitates easier monitoring
The root of any mount point volumes should be dedicated to that purpose
Use a single partition per LUN so you can extend/grow



There are barriers between DBAs and storage administrators
DBAs need to have knowledge of the physical storage configuration
Storage administrators need some understanding of SQL Server I/O patterns

Sizing only on capacity is a common problem
When performance matters, size based on spindle count, not capacity; consider physical isolation at the spindle level
Performance degradation over time as capacity utilization increases (increased seek times)

Shared storage environments
Shared components can impact everyone
Heterogeneous I/O workloads sharing physical spindles can be problematic
Workloads with overlapping periods of heavy I/O lead to unpredictable performance

Poorly tuned queries issuing more I/O than necessary

SQL Server predeployment best practices not followed

Validation of configuration prior to deployment
SQLIO/IOMeter (see appendix)

Proactive monitoring strategy in place and trending of response times

Proper host storage configuration
Queue depth set too low
Multipathing improperly configured
Not using vendor recommended drivers

Volume alignment performed at partition creation time
Disk Partition Alignment: Increase I/O Throughput by up to 10%, 20%, 30% or more - Jimmy May
Disk Partition Alignment Essentials (Cheat Sheet) - Jimmy May

SQL Server Predeployment Best Practices whitepaper:
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx


Characteristics of SQL Server I/O operations
Best practices
SQL Server Design Practices
Storage Configuration
Common Pitfalls
Monitoring performance of SQL Server on SAN
Emerging Storage Technologies



Understand potential throughput of the hardware
Each component in the path has an associated speed/bandwidth
Know where the potential bottlenecks exist

[Diagram: I/O path from host to storage - Host, PCI Bus, HBA, Switch, Fiber Channel Front End Ports, Array Processors/Controllers, Cache, Disks]

Tools available to monitor SQL Server I/O behavior

Tool | Monitors | Granularity
Performance Monitor | Disk counters, Logical or Physical (when necessary): latency, number of I/Os | Volume or LUN
sys.dm_os_wait_stats | PAGEIOLATCH waits | SQL Server instance level
sys.dm_io_virtual_file_stats | Latency, number of I/Os | Database files
sys.dm_exec_query_stats | Number of reads (logical or physical), number of writes (logical) | Query or batch
sys.dm_db_index_usage_stats | Number of I/Os and type of access (seek, scan, lookup, write) | Index or table
sys.dm_db_index_operational_stats | PAGEIOLATCH waits | Index or table
sys.dm_io_pending_io_requests | Pending I/O requests at any given point in time | File (per I/O request)
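A sketch of per-file latency from sys.dm_io_virtual_file_stats, one of the tools in the table above (values are cumulative since startup; sample twice and diff for an interval view):

    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.name AS logical_file,
           vfs.num_of_reads, vfs.num_of_writes,
           vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
    ORDER BY avg_write_ms DESC;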

Windows view of the I/O world

Counter | Description
Avg. Disk sec/Read & Avg. Disk sec/Write | Measures disk latency. Numbers will vary; optimal values for averages over time: 1-5 ms for log (ideally 1 ms or better), 5-20 ms for data (OLTP) (ideally 10 ms or better), <=25-30 ms for data (DSS or RDW) (consider aggregate throughput)
Current Disk Queue Length | Hard to interpret due to virtualization of storage. Consider in combination with response times.
Disk Reads/sec & Disk Writes/sec | Measures the number of I/Os per second. Discuss with your vendor the sizing of spindles of different types and rotational speeds. Impacted by disk head movement (i.e. short stroking the disk will provide more I/O-per-second capacity).
Disk Read Bytes/sec & Disk Write Bytes/sec | Measure of total disk throughput. Ideally larger block scans should be able to heavily utilize connection bandwidth.
Avg. Disk Bytes/Read & Avg. Disk Bytes/Write | Measures the size of the I/Os being issued. Larger I/Os tend to have higher latency (example: BACKUP/RESTORE).

Backend monitoring of the array (array-specific tools)
Only way to get the complete story
Trending over time - generally less granularity (1 min)

Typical components monitored
Front end port usage
Bandwidth utilization, # of concurrent requests on port
Throughput at the LUN level / physical disk level
Physical disk I/O rates
Exposes spindle sharing / undersized spindle count / RAID choice issues
Cache utilization
% of cache utilized
Write pending %
Impacts how aggressively the array flushes writes to physical media
Storage controller(s) utilization
Similar to monitoring CPU utilization on any server

*Terminology may vary by vendor

Characteristics of SQL Server I/O operations
Best practices
SQL Server Design Practices
Storage Configuration
Common Pitfalls
Monitoring performance of SQL Server on SAN
Emerging Storage Technologies



Storage-level replication is more commonplace (SRDF, TrueCopy, Continuous Access, SnapMirror, etc.)
Synchronous & asynchronous
Storage-based replication vs. database mirroring - many of the same considerations apply
Sometimes data outside the database needs to be in a consistent state with the database

Snapshot-based backup/restore technologies
Fully materialized (sometimes referred to as a clone) & those that maintain only deltas (sometimes referred to as snapshots - space efficient)
Requires vendor-provided tools & integration with SQL/Windows (VDI/VSS)

Thin provisioning
Capacity on demand / supports green computing
Requires NTFS quick format & SQL Server instant file initialization

Solid state disks
Flash memory based storage (no moving parts)
Potentially much higher performance, especially for random I/O patterns

Geographically dispersed clusters
Enabling technologies provided by storage for enabling clusters over a geographical distance

Virtualization of heterogeneous storage environments
Ability to manage all storage resources through a single platform
Migration of data transparent to the application

Storage device based on NAND flash
Fits into a regular HDD slot
Utilizes the same command set and interface

Advantages
Performance, weight, power & cooling consumption, more durable

Disadvantages
Cost per GB
Shifting bottleneck
Limited experience for enterprise use
Seeks are free; writes are expensive relative to reads

Most appealing for
IOPs-intensive Tier 0 storage (particularly random I/O)
Mobile devices



EMC DMX4 Array, RAID5 (3+1)
4 physical devices
Log/data on the same physical devices
Database size ~300GB
Random reads and writes for checkpoints / sequential log writes
16-core server completely CPU bound
Sustained 12K IOPs at < 4ms latency

Counter | Average
Avg. Disk sec/Read (total) | 0.004
Disk Reads/sec (total) | 10100
Avg. Disk sec/Write (total) | 0.001
Disk Writes/sec (total) | 1900
Processor Time | 98
Batches/sec | 5200

EMC DMX4 Array, RAID 1+0
34 physical devices for data, 4 physical devices for log
Same workload/database as the SSD configuration (OLTP)
Nearly the same sustained I/Os with ~10x the number of spindles
Higher latency
Slightly lower throughput
Short stroking the spinning media

Counter | Average
Avg. Disk sec/Read (total) | 0.017
Disk Reads/sec (total) | 10259
Avg. Disk sec/Write (total) | 0.002
Disk Writes/sec (total) | 2103
Processor Time | 90
Batches/sec | 4613

PASS Community Summit 2008 DBA-323: SQL Server 2008 on SAN - Best Practices and Lessons Learned

SQL CAT Blog


www.sqlcat.com

SQL Server 2000/2005 I/O Basics on TechNet


http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx
http://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspx

SQL Server PreDeployment Best Practices


http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx

Storage Top 10 Best Practices


http://sqlcat.com/top10lists/archive/2007/11/21/storage-top-10-best-practices.aspx

SQL Server AlwaysOn Partner program


http://www.microsoft.com/sql/alwayson/default.mspx

SQL Server Best Practices Site


http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx

SQL Server 2008 website


http://www.microsoft.com/sqlserver/2008/en/us/default.aspx

Windows Server System Storage Home


http://www.microsoft.com/windowsserversystem/storage/default.mspx

ICE
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032341825&Culture=en-US

SQL Server Case Studies


http://www.microsoft.com/sql/casestudies/default.mspx

Microsoft SQL Server I/O subsystem requirements for the tempdb database
http://support.microsoft.com/kb/917047

FIX: Concurrency enhancements for the tempdb database


http://support.microsoft.com/kb/328551



Visit the Microsoft Technical Learning Center located in the Expo Hall
Microsoft Data Platform ISV Village
Microsoft Ask the Experts Lounge
Microsoft Chalk Talk Theatre Presentations

PASS Community Summit 2008 - November 18-21, 2008, Seattle, WA



Difficult to generalize I/O patterns of SQL Server
SQL Server is a platform on which applications are built, hence I/O patterns may differ significantly from one application to another
Monitoring of I/O is necessary to determine the specifics of each scenario
Understanding the I/O characteristics of common SQL Server operations/scenarios can help determine how to configure storage

General I/O characteristics of common scenarios:

Operation | Random/Sequential | Read/Write | Size Range
OLTP - Log | Sequential | Write | Sector-aligned, up to 60K
OLTP - Log | Sequential | Read | Sector-aligned, up to 120K
OLTP - Data (Index Seeks) | Random | Read | 8K
OLTP - Lazy Writer | Random | Write | Any multiple of 8K up to 256K
OLTP - Checkpoint | Random | Write | Any multiple of 8K up to 256K
Read Ahead (DSS, Index/Table Scans) | Sequential | Read | Any multiple of 8K up to 256K (512K for Enterprise Edition)
Bulk Insert | Sequential | Write | Any multiple of 8K up to 128K

Note: these values may change as optimizations are made to take advantage of modern storage enhancements

Operation | Random/Sequential | Read/Write | Size Range
CREATE DATABASE | Sequential | Write | 512KB (SQL 2000), up to 4MB (SQL 2005); only the log file is initialized in SQL Server 2005
BACKUP | Sequential | Read/Write | Multiple of 64K (up to 4MB)
RESTORE | Sequential | Read/Write | Multiple of 64K (up to 4MB)
DBCC CHECKDB | Sequential | Read | 8K - 64K
ALTER INDEX REBUILD (replaces DBREINDEX) - read phase | Sequential | Read | Any multiple of 8K up to 256K
ALTER INDEX REBUILD (replaces DBREINDEX) - write phase | Sequential | Write | Any multiple of 8K up to 128K
DBCC SHOWCONTIG (deprecated; use sys.dm_db_index_physical_stats) | Sequential | Read | 8K - 64K

Note: these values may change as optimizations are made to take advantage of modern storage enhancements

SAN
Better flexibility provided by virtualization of storage
Speed of deployment (once initial configuration is in place)
Online configuration changes
Better overall utilization of storage resources
More features
Storage replication/disaster recovery, snapshots/clones via VDI/VSS integration, thin provisioning, etc.
May have increased redundancy / reliability
Likely higher cost
May be lower depending on individual components (i.e. SATA vs. SCSI)
May be offset by better overall utilization of storage resources

DAS
Simple and well understood
Likely cheaper for the same performance
Less flexible - better to get it right the first time

Contrary to some common perceptions, SAN does not necessarily equal better performance

Questions to ask when determining whether I/O is a performance problem:
Are my top SQL Server wait types related to I/O?
Are my disk response times healthy? Are they reasonable for my physical configuration? Do I need to investigate the physical level?
What type of I/O operations is SQL Server performing?
Random in nature: focus on I/Os per second and response time
Sequential in nature: focus on aggregate throughput
How large are the I/Os (size will impact latency)?
Which queries are issuing the most I/O and are they properly tuned?
Which data files are incurring the most I/O and the highest response times?


What is the process for diagnosing I/O performance issues?

Logical Disk counters:
First line of defense - see previous slide

Wait types (sys.dm_os_wait_stats): PAGEIOLATCH_SH/EX, WRITELOG
Consider averages for wait statistics
Accumulated since the last server start or flush of stats - consider deltas
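One way to work with deltas (note that the CLEAR option resets the instance-wide counters, so use it with care on production systems):

    -- Snapshot the current I/O-related waits
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type IN ('PAGEIOLATCH_SH', 'PAGEIOLATCH_EX', 'WRITELOG');

    -- Optionally reset accumulated statistics before a measurement window
    DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);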

Virtual file stats: sys.dm_io_virtual_file_stats
File-level statistics allowing for average size, latency, number of I/Os and total amount of I/O

Identify I/O-intensive queries (example below)
sys.dm_exec_query_stats ordered by total_physical_reads
Investigate query plans and index tuning
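A sketch of that query-level check (sys.dm_exec_query_stats covers cached plans only):

    SELECT TOP (10)
           qs.total_physical_reads,
           qs.total_logical_reads,
           qs.execution_count,
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_physical_reads DESC;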


SQLIOSim.exe
Use: Ensure correct functionality of the underlying I/O subsystem. Simulates various patterns of SQL Server I/O. Replacement for SQLIOStress.exe.
http://blogs.msdn.com/sqlserverstorageengine/archive/2006/10/06/SQLIOSim-available-for-download.aspx

SQLIOStress.exe (deprecated - use SQLIOSim)
Use: Ensure correct functionality of the underlying I/O subsystem. Simulates various patterns of SQL Server I/O.

SQLIO.exe
Use: Test throughput of the I/O subsystem or establish a benchmark of I/O subsystem performance
http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19&displaylang=en

IOMeter
Use: Test throughput of the I/O subsystem or establish a benchmark of I/O subsystem performance
Open source tool; allows combinations of I/O types to run concurrently against a test file
No support for mount point volumes
http://www.iometer.org/


SQLIO / IOMeter
Validate storage configurations
SQLIO is an unsupported tool provided by Microsoft that can be used for this
IOMeter is an external tool providing the ability to stress the storage subsystem with a variety of I/O patterns concurrently

Test and validate the performance of each storage configuration before deploying the SQL Server application (skipping this is a common pitfall)
Benchmark performance and shake out hardware/driver/multipathing problems early in the configuration
Share the results with your vendor - a good method for comparing different configurations


Things to consider when running tests
Test a variety of I/O types and sizes
Make sure your test files are significantly larger than the amount of cache on the array
Exception: if you are testing channel throughput, in which case use files that will fit in array cache
To get a true representation of disk performance, use test files of approximately the same size as the planned data files - small test files (even if they are larger than cache) may result in smaller seek times and skew results
Test each I/O path individually and then combinations of the I/O paths
Relatively short tests are okay; however, longer runs may give a more complete understanding of how the storage will perform
Allow time in between tests for the hardware to reset (cache flush)
Keep all of the benchmark data to refer to after the SQL Server implementation has taken place
Maximum throughput (IOPs or MB/s) has been reached when latency continues to increase while throughput stays near constant

Things to consider when looking at results
Consult your storage admin or vendor; they should know if the results are reasonable for the particular storage configuration
Once you reach saturation (i.e., latency increases but throughput does not):
1. Ensure any multipathing is functional and you are not bound by the capacity of a single HBA, switch port, etc.
2. Ensure the queue depth setting on the HBA is set high enough
If too low, it will seem as though the disk is saturated before it actually is (common pitfall)
Default values for queue depth are generally not ideal for SQL Server - consider increasing
If test results vary wildly, check to determine if you are sharing spindles with others on the array or if shared components are an issue
Monitoring at the host and on the array during tests is ideal

RAID Levels

RAID 1+0 preferred for log, tempdb
Also for data files when HA & performance really matter

RAID 5 frequently observed in deployments
There is a write-performance penalty but it may be acceptable for the scenario
Cache may help, but not for sustained write workloads

Other RAID levels are becoming more popular
Example: RAID-DP (NetApp proprietary)

17TB single OLTP database (mixed workload)
Second copy for reporting using transactional replication
Complex storage configuration deployed in production

Achieves disaster recovery through storage-level replication across a distance of ~30km
Average 3-5ms latency

Backup/Restore
Through snapshot/VDI technologies
Used to quickly re-establish replication

Large scale online services
8 petabytes total across the entire organization (not all SQL Server)
Generally simple LUN design; larger LUNs reduce the number of LUNs per host
Deep queues per LUN, proactive monitoring of response time
Sized for IOPs: sustain 8K random / 18K sequential per LUN
Mixed environment OLTP / DW

Mix of EMC Clariion and DMX storage
CX700, CX3-80, CX-400i, DMX 3 - 4500, DMX 4
Using SRDF for DR / using clone technologies to enable scale-out of reporting as well as backup/restore

One example: Business Intelligence Data Warehouse
Storage-based snapshots per day for scale-out reporting
16 servers in an active/active cluster
Each server allocated ~9TB = 144TB in total

Monitoring Strategy
Focus on response time, considered the most important metric
Writes less than 6ms, reads < 25ms
Customers start to notice issues above 50ms disk response time
Utilize vendor-specific tools for monitoring/trending on the backend

SQL Server configuration
Deploy in both shared and dedicated models based on the requirements of the application
Utilize simple LUN design - fewer, larger LUNs to simplify management (clones)

How do they succeed?
Work closely with SQL developers to optimize storage for I/O
Professional respect: "I will really say that it is a team effort"
"A big part of it is learning to speak a common language. Hardware guys speak in I/Os; SQL folks talk in spids and queries. We learned over the years to translate."

Information Security Consolidated Event DW
Internal tool used by the Microsoft Information Security team
Collects inbound and outbound e-mail traffic, login events, and Web browsing into a single database, which is then used to provide forensic evidence
Provides analysis and query capabilities
Gathers data from 85+ sources around the world
Up to 10 concurrent users running ad-hoc queries and fixed reports
SSIS, SSRS and the DW on the same box
Uses table partitioning to load new data into a new partition quickly
Achieved with minimal HW and operation cost

Dedicated storage environment
Single database
4-way single-core 2.2 GHz HP ProLiant DL585 G1, x64, with 8 GB RAM
40 TB across two CX700 arrays, currently @ 30TB
Loading/deleting 500GB-1TB / day (60-day retention period)
SQL data, log, tempdb volumes on RAID 1+0; backup volumes on RAID 5
200GB LUNs backed by 12 spindles

ERP database migrated from an IBM mainframe to HP Integrity rx8640 this year
12 dual-core Itanium CPUs with 192GB RAM as the database server
Database volume around 5TB
6 teamed 4Gb HBAs
Application server layer: 10 x DL380 (2 x Intel 5160 3.0GHz)
Workload during the day created by over 1500 users
Workload during the night created by heavy batch activities
Up to 30,000 random IOPS monitored during high load phases or parallel index creation
