
MICROSOFT SQL SERVER

DATABASE ENGINE I/O

by Bob Dorr, Microsoft SQL Server Principal Escalation Engineer, 1994 - Present
Built: Jan 2008

Areas Covered

Write Ahead Logging (WAL) Protocol


Synchronous vs Asynchronous I/O
Scatter / Gather I/O
Sector alignment, Block Alignment
Latching and a page: A read walk-through
SQL Server I/O Sizes
Data cache maintenance
PAE and AWE
Read Ahead
User Mode and Kernel Mode (SYSTRAP)
Sparse Files and Copy On Write (COW) Pages
Locked Pages
Scribbler(s) and Bit flips
Page Protection and Constant Pages
Checksum vs Torn
Stale Read
Stalled I/O

WAL Protocol

Write Ahead Logging


ACID (Durability Property)
Log records secured before data
Hardened / Stable Media
Log contains parity bit
Commit
Rollback
Trigger
Snapshot
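
The WAL rule above (log records secured before data) can be sketched as a toy in-memory store. This is only an illustration under stated assumptions: the class name, the LSN counter, and the dictionaries standing in for stable media are all hypothetical and are not SQL Server structures.

```python
class WalStore:
    """Toy store: a log record is hardened before its data page is written."""

    def __init__(self):
        self.stable_log = []    # log records already on stable media
        self.log_buffer = []    # log records that exist only in memory
        self.stable_pages = {}  # data pages on stable media
        self.cache = {}         # page_id -> (value, LSN of the last change)
        self.next_lsn = 1

    def update(self, page_id, value):
        # Every change produces a log record and dirties the cached page.
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, page_id, value))
        self.cache[page_id] = (value, lsn)
        return lsn

    def flush_log(self, up_to_lsn):
        # Harden every buffered record whose LSN is <= up_to_lsn.
        remaining = []
        for record in self.log_buffer:
            if record[0] <= up_to_lsn:
                self.stable_log.append(record)
            else:
                remaining.append(record)
        self.log_buffer = remaining

    def commit(self, lsn):
        # Durability: commit completes only after the log is hardened.
        self.flush_log(lsn)

    def write_page(self, page_id):
        value, lsn = self.cache[page_id]
        self.flush_log(lsn)                 # WAL rule: log first
        self.stable_pages[page_id] = value  # then the data page

    def hardened_lsn(self):
        return self.stable_log[-1][0] if self.stable_log else 0
```

Writing page 1 in this sketch forces the log to be flushed through that page's last LSN, while later log records stay buffered until a commit or another page write needs them.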

Synchronous vs Asynchronous I/O

Sync: Wait for Completion


Async: Post and Continue

Overlapped
Event
Completion Port

SQL Server

98% Async Usage


Overlapped and HasOverlappedIoCompleted
Network Layers Use Completion Port

Backup/Restore Use Sync Sequential Patterns

sys.dm_io_pending_io_requests
Overlapped Structure
Async Processing ~= CPU
Package vs Phone
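
The difference between "wait for completion" and "post and continue" can be sketched in a few lines. This is an analogy only, assuming a thread pool in place of Windows overlapped I/O: the Future object stands in for the OVERLAPPED structure, and its done() method plays roughly the role of HasOverlappedIoCompleted. The function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_sync(path):
    # Sync: the caller waits for completion before doing anything else.
    with open(path, "rb") as f:
        return f.read()


def read_async(pool, path):
    # Async: post the request and return at once with a handle to it.
    return pool.submit(read_sync, path)


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 8192)
        path = f.name
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            pending = read_async(pool, path)  # post
            other_work = sum(range(1000))     # continue processing
            data = pending.result()           # collect the completion
        return other_work, len(data)
    finally:
        os.remove(path)
```

The caller keeps the CPU busy with other work while the read is outstanding, which is the point of the async pattern.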

Scatter / Gather I/O

Memory

Consolidates or Distributes
APIs

Scatter

Gather

Disk

ReadFileScatter
WriteFileGather

Increases Efficiency
Used by SQL I/O Paths
Used by Windows Page File
Old Design: 6.x Sorting
AWE Availability
WriteMultiple
# of 8K Pages
Forward and Backward
Buffer Pool Ramp-up
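
Scatter and gather can be sketched with the POSIX calls exposed by Python, os.readv and os.writev. These are used here only as an analog of ReadFileScatter and WriteFileGather, and the sketch assumes a POSIX system; the helper names are hypothetical.

```python
import os
import tempfile

PAGE = 8192  # 8K page size


def gather_write(fd, pages):
    # Gather: one call writes several separate buffers to one contiguous range.
    return os.writev(fd, pages)


def scatter_read(fd, count):
    # Scatter: one call reads a contiguous range into several separate buffers.
    buffers = [bytearray(PAGE) for _ in range(count)]
    os.lseek(fd, 0, os.SEEK_SET)
    nread = os.readv(fd, buffers)
    return nread, buffers


def demo():
    pages = [bytes([i]) * PAGE for i in range(4)]
    fd, path = tempfile.mkstemp()
    try:
        written = gather_write(fd, pages)
        nread, buffers = scatter_read(fd, len(pages))
        return written, nread, [b[0] for b in buffers]
    finally:
        os.close(fd)
        os.remove(path)
```

Four 8K pages that are not adjacent in memory move to or from disk in a single request, rather than four.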

Sector Alignment, Block Alignment

Sector: Log Writes


Block: Performance
Avoid Crossovers
DiskPart/DiskPar Utilities
Discuss with your Vendor

Alignment:
http://support.microsoft.com/kb/929491
To verify that an existing partition is aligned, divide the starting offset of the
partition, in bytes, by the size of the stripe unit; the partition is aligned when the
result is a whole number. Use the following syntax: ((Partition offset) * (Disk sector
size)) / (Stripe unit size)
Example of alignment calculations in bytes for a 256-KB stripe unit size:
(63 * 512) / 262144 = 0.123046875
(64 * 512) / 262144 = 0.125
(128 * 512) / 262144 = 0.25
(256 * 512) / 262144 = 0.5
(512 * 512) / 262144 = 1
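
The calculation above is easy to script. A minimal sketch, assuming the offset is given in sectors and that "aligned" means the byte offset is an exact multiple of the stripe unit size; the function names are illustrative only.

```python
def alignment_ratio(offset_sectors, sector_size, stripe_unit_size):
    # ((Partition offset) * (Disk sector size)) / (Stripe unit size)
    return (offset_sectors * sector_size) / stripe_unit_size


def is_aligned(offset_sectors, sector_size, stripe_unit_size):
    # Aligned when the byte offset is an exact multiple of the stripe unit.
    return (offset_sectors * sector_size) % stripe_unit_size == 0


STRIPE = 256 * 1024  # 256-KB stripe unit, in bytes

for offset in (63, 64, 128, 256, 512):
    ratio = alignment_ratio(offset, 512, STRIPE)
    print(offset, ratio, is_aligned(offset, 512, STRIPE))
```

Of the five offsets in the example, only the 512-sector offset yields a whole number.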

Double Touch
Rewrites
Defragment
4K Sectors

Latch
Memory (Data Pages)

Multiple Readers (SH)


One Writer (EX)
Protects In-Memory Data Page

BUF Array

BUF

Status
Latch
Database*
PageId
Hash *

Latch = Physical Protection


Lock = Logical Protection

User Mode
UMS/SQLOS Aware
Optimized FIFO Ordering

Flushed & Rollback
Latch Timeout
Sub-latch

Reading A Page

Get Free Buffer for Read


Acquire Exclusive (EX) Latch
Is already in-memory/hashed?
Add Entry to Page Hash
Post and Record Asynchronous Read
Continue Processing ...
Check Status (Scheduler Switch)
Complete: Validate I/O and Release Latch
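
The read walk-through above can be sketched roughly as follows. This is a single-threaded illustration, not the engine's implementation: the Buf and BufferPool classes, the dictionary used as the disk, and the length check used as the "validate" step are all assumptions, and a plain lock stands in for an EX latch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 8192


class Buf:
    def __init__(self, page_id):
        self.page_id = page_id
        self.latch = threading.Lock()  # stands in for an EX latch only
        self.data = None
        self.pending = None            # outstanding asynchronous read


class BufferPool:
    def __init__(self, disk):
        self.disk = disk               # hypothetical disk: page_id -> bytes
        self.page_hash = {}
        self.io = ThreadPoolExecutor(max_workers=2)

    def start_read(self, page_id):
        buf = Buf(page_id)                       # get free buffer for read
        buf.latch.acquire()                      # acquire EX latch
        existing = self.page_hash.get(page_id)   # already in memory / hashed?
        if existing is not None:
            buf.latch.release()                  # unused buffer goes back
            return existing
        self.page_hash[page_id] = buf            # add entry to page hash
        # Post and record the asynchronous read, then return to the caller.
        buf.pending = self.io.submit(self.disk.__getitem__, page_id)
        return buf

    def complete_read(self, buf):
        if buf.pending is not None:
            data = buf.pending.result()          # check status / wait
            if len(data) != PAGE_SIZE:           # validate I/O (placeholder)
                raise OSError("read returned a short page")
            buf.data = data
            buf.pending = None
            buf.latch.release()                  # release latch
        return buf.data
```

Because the EX latch is held until the read completes, any other reader that finds the page in the hash would have to wait on the latch rather than see a half-filled buffer.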

0:000> uf ZwWriteFile
mov r10,rcx
mov eax,5
syscall    (Kernel Transition)
ret

Stuck I/O? (kernel transition)
ntdll!ZwWriteFile+0xa
kernel32!WriteFile+0xf6
sqlservr!DiskWriteAsync+0xee

Page Audits
Read retry
Stalled I/O Warnings
Error raised at Acquire
Shared (SH) waiters
PAGE_IO* vs PAGE* Latch
Writing A Page

Myth: Single Worker Per File


Truth: Each Worker Issues I/O
[Figure: workers issuing I/O against the dbTest files on Vol #1 and Vol #2]
Files: Primary = dbTest.MDF, Secondary = dbTest.NDF, Log = dbTest.LDF
Create Database: Workers Assigned by Volume ID
Serial Plan: select * from dbTest.dbo.tblTest
Parallel Plan: select * from dbTest.dbo.tblTest
Workers shown: Worker #1 through Worker #5

Data Cache Maintenance

Memory Pressure: LazyWriter

Per NUMA Node


Time Of Last Access (TLA)

Recovery Interval: Checkpoint

Queue
I/O Targets
.LDF Usage Triggers
Alternate Triggers (Backup, Restore, ...)
Scatter/Gather Usage (WriteMultiple)
Checkpoint
Assignments
By Ordinal Sweep
Stalled I/O LW #0
I/O Queue Depth > 2

PAE and AWE

Physical Address Extensions

/PAE in Boot.ini
Boots Kernel with 36 bit addressing
Physical Memory > 4GB
Virtual Address Unchanged (/2gb or /3GB)
Automatic for Hot Add Memory Computers

Address Windowing Extensions

Windows APIs (AllocateUserPhysicalPages)


Physical Memory Allocations
Un/Mapped in or out of Virtual Address Range

32 Bit Address = 4294967295 (0xFFFFFFFF) 4GB
Interlocked Instruction
lock xadd dword ptr [ecx],eax
36 Bit Address = 68719476735 (0xFFFFFFFFF) 64GB
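
The 32-bit and 36-bit limits are simple powers of two. A small worked check, with illustrative function names:

```python
def max_address(bits):
    # Highest address expressible with the given number of address bits.
    return (1 << bits) - 1


def addressable_gb(bits):
    # Total addressable bytes, expressed in GB (2**30 bytes).
    return (1 << bits) // (1 << 30)


for bits in (32, 36):
    print(bits, max_address(bits), hex(max_address(bits)), addressable_gb(bits))
```

The four extra address bits that PAE provides multiply the addressable physical memory by 16, from 4GB to 64GB.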

Data Pages-Only
Locked Pages
Windows Paging
Windows 2000 Bugs

Read Ahead

128 Pages Standard SKU


1024 Pages Enterprise SKU
Uses ReadFileScatter
Plan Based Decisions
Power of Asynchronous I/O
Read Over Write
Ramp-up

Sparse Files and Copy On Write

Usage

Online DBCC
Snapshot Databases

Buffer Pool:
PrepareToDirty
File Control Block (FCB)
Chaining

Sparse Allocation
FCB Tracking
Windows Limits
New Page Allocations
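
Copy on write for a snapshot can be sketched as below. This is a simplified model under stated assumptions: dictionaries stand in for the source database file and the sparse file, the class and method names are hypothetical (prepare_to_dirty only borrows the PrepareToDirty name from the slide), and new page allocations are not modeled.

```python
class SnapshotDb:
    """Toy snapshot: the sparse store holds only pages changed since creation."""

    def __init__(self, source_pages):
        self.source = source_pages  # live database: page_id -> bytes
        self.sparse = {}            # sparse file: pages copied so far

    def prepare_to_dirty(self, page_id):
        # Before the source page changes, preserve its original image once.
        if page_id not in self.sparse:
            self.sparse[page_id] = self.source[page_id]

    def write_source(self, page_id, new_bytes):
        self.prepare_to_dirty(page_id)
        self.source[page_id] = new_bytes

    def read_snapshot(self, page_id):
        # Snapshot read: use the sparse copy if present, else the source page.
        if page_id in self.sparse:
            return self.sparse[page_id]
        return self.source[page_id]
```

Only the first change to a page costs a copy; later changes to the same page leave the sparse store untouched, and unchanged pages are never copied at all.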

Advanced Protection

What is a Scribbler?
Data Page Audits

None
Torn Bits
Checksum

Log Block Checksum


Constant Page
Backup with Checksum
DBCC Page Audit
Stale Read Check
SQLIOSim
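
A checksum page audit can be sketched as: stamp the page when it is written, recompute on read, and compare. The sketch uses zlib.crc32 purely as a stand-in; it is not SQL Server's checksum algorithm, and the function names are hypothetical.

```python
import zlib

PAGE_SIZE = 8192


def stamp(page):
    # Computed when the page is written; stored with the page.
    return zlib.crc32(page)


def audit(page, stored_checksum):
    # Recomputed when the page is read; a mismatch means the bytes changed.
    return zlib.crc32(page) == stored_checksum


def flip_bit(page, byte_index, bit):
    # Simulate a bit flip somewhere between the write and the read.
    damaged = bytearray(page)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


page = bytes(range(256)) * (PAGE_SIZE // 256)
checksum = stamp(page)
print(audit(page, checksum))                     # intact page passes
print(audit(flip_bit(page, 100, 3), checksum))   # damaged page fails
```

A checksum covers every byte of the page, so a single flipped bit anywhere is caught, which a check limited to a few torn bits per sector cannot promise.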

REFERENCES

Overview

SQL Server Always On


http://www.microsoft.com/sql/alwayson

SQL Server I/O Basics Chapter 1


http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx

SQL Server I/O Basics Chapter 2

Fundamentals and Requirements


KB230785 - SQL Server 7.0, SQL Server 2000 and SQL Server 2005 logging and data storage algorithms extend data reliability
KB917047 - Microsoft SQL Server I/O subsystem requirements for the tempdb database
KB231347 - SQL Server databases not supported

Subsystems

KB917043 - Key factors to consider when evaluating third-party file cache systems with SQL Server
KB234656 - Using disk drive caching with SQL Server
KB46091 - Using hard disk controller caching with SQL Server
KB86903 - Description of caching disk controls in SQL Server
KB304261 - Description of support for network database files in SQL Server
KB910716 (in progress) - Support for third-party Remote Mirroring solutions used with SQL Server 2000 and 2005
KB833770 - Support for SQL Server 2000 on iSCSI technology components (applies to SQL Server 2005)

Design and Configuration


White paper - Physical Database Layout and Design
KB298402 - Understanding How to Set the SQL Server I/O Affinity Option
KB78363 - When Dirty Cache Pages are Flushed to Disk
White paper - Database Mirroring in SQL Server 2005
White paper - Database Mirroring Best Practices and Performance Considerations
KB910378 - Scalable shared databases are supported by SQL Server 2005
MSDN article - Read-Only Filegroups
KB156932 - Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP

Diagnostics
KB826433 - Additional SQL Server Diagnostics Added to Detect Unreported I/O Problems
KB897284 - SQL Server 2000 SP4 diagnostics help detect stalled and stuck I/O operations (applies to SQL Server 2005)
KB828339 - Error message 823 may indicate hardware problems or system problems in SQL Server
KB167711 - Understanding Bufwait and Writelog Timeout Messages
KB815436 - Use Trace Flag 3505 to Control SQL Server Checkpoint Behavior

Certification Policy
KB913945 - Microsoft does not certify that third-party products will work with Microsoft SQL Server
KB841696 - Overview of the Microsoft third-party storage software solutions support policy
KB231619 - How to use the SQLIOStress utility to stress a disk subsystem such as SQL Server

Utilities

Download - SQLIO Disk Subsystem Benchmark Tool
Download - SQLIOStress utility to stress disk subsystem (applies to SQL Server 7.0, 2000, and 2005 - replaced with SQLIOSim and SQL Server 2008 installed in BINN)

Blog Content
SQL Server Urban Legends Discussed
http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx
How It Works: SQL Server Checkpoint (FlushCache) Outstanding I/O Target
http://blogs.msdn.com/psssql/archive/2008/04/11/how-it-works-sql-server-checkpoint-flushcache-outstanding-i-o-target.aspx
How It Works: SQL Server Page Allocations
http://blogs.msdn.com/psssql/archive/2008/04/08/how-it-works-sql-server-page-allocations.aspx
How It Works: Shapshot Database (Replica) Dirty Page Copy Behavior (NewPage)
http://blogs.msdn.com/psssql/archive/2008/03/24/how-it-works-shapshot-database-replica-dirty-page-copy-behavior-newpage.aspx
How It Works: SQL Server 2005 I/O Affinity and NUMA Don't Always Mix
http://blogs.msdn.com/psssql/archive/2008/03/18/how-it-works-sql-server-2005-i-o-affinity-and-numa-don-t-always-mix.aspx
How It Works: Debugging SQL Server Stalled or Stuck I/O Problems - Root Cause
http://blogs.msdn.com/psssql/archive/2008/03/03/how-it-works-debugging-sql-server-stalled-or-stuck-i-o-problems-root-cause.aspx
How It Works: SQL Server 2005 Database Snapshots (Replica)
http://blogs.msdn.com/psssql/archive/2008/02/07/how-it-works-sql-server-2005-database-snapshots-replica.aspx
How It Works: File Stream the Before and After Image of a File
http://blogs.msdn.com/psssql/archive/2008/01/15/how-it-works-file-stream-the-before-and-after-image-of-a-file.aspx
Using SQLIOSim to Diagnose SQL Server Reported Checksum (Error 824/823) Failures
http://blogs.msdn.com/psssql/archive/2008/12/19/using-sqliosim-to-diagnose-sql-server-reported-checksum-error-824-823-failures.aspx

How to use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem
http://support.microsoft.com/kb/231619
Should I run SQLIOSim? - An e-mail follow-up from SQL PASS 2008
http://blogs.msdn.com/psssql/archive/2008/11/24/should-i-run-sqliosim-an-e-mail-follow-up-from-sql-pass-2008.aspx
What do I need to know about SQL Server database engine I/O?
http://blogs.msdn.com/psssql/archive/2006/11/27/what-do-i-need-to-know-about-sql-server-database-engine-i-o.aspx
SQLIOSim is "NOT" an I/O Performance Tuning Tool
http://blogs.msdn.com/psssql/archive/2008/04/05/sqliosim-is-not-an-i-o-performance-tuning-tool.aspx
How It Works: SQLIOSim - Running Average, Target Duration, Discarded Buffers ...
http://blogs.msdn.com/psssql/archive/2008/11/12/how-it-works-sqliosim-running-average-target-duration-discarded-buffers.aspx
How It Works: SQLIOSim [Audit Users] and .INI Control File Sections with User Count Options
http://blogs.msdn.com/psssql/archive/2008/08/19/how-it-works-sqliosim-audit-users-and-ini-control-file-sections-with-user-count-options.aspx

Understanding SQLIOSIM Output

Additional Learning Resources

Inside SQL Server 7.0 and Inside SQL Server 2000
Written by Kalen Delaney. Paul Randal wrote the core DBCC checks for SQL 7.0, 2000 and 2005

The Guru's Guide to SQL Server Architecture and Internals - ISBN 0-201-70047-6
Written by Ken Henderson after he joined Microsoft SQL Server Support
Many chapters reviewed by developers and folks like myself

SQL Server 2005 Practical Troubleshooting - ISBN 0-321-44774-3 - Ken Henderson
Authors of this book were key developers or support team members
Cesar - QP developer and leader of the QP RedZone with Keithelm and Jackli
Sameert - Developer of UMS and SQLOS Scheduler
Santeriv - Developer of the lock manager
Slavao - Developer of the SOS memory managers and engine architect
Wei Xiao - Engine developer
Bart Duncan - long-time SQL EE and now developer of the Microsoft Data Warehouse, performance focused
Bob Ward - SQL Server Support Senior EE

Advanced Windows Debugging - ISBN 0-321-37446
Written by Microsoft developers - excellent resource

Applications for Windows - Jeffrey Richter
Great details about Windows basics