
Table of Contents

Design Guide
Windows Display Driver Model (WDDM) Design Guide
Roadmap for Developing Drivers for the Windows Display Driver Model (WDDM)
What's new for Windows 10 display drivers (WDDM 2.0)
What's new for Windows 8.1 display drivers (WDDM 1.3)
What's new for Windows 8 display drivers (WDDM 1.2)
What's new for Windows 7 display drivers (WDDM 1.1)
WDDM 2.0 and Windows 10
GPU virtual memory in WDDM 2.0
Driver residency in WDDM 2.0
Context monitoring
WDDM 1.2 and Windows 8
WDDM 1.2 features
Advances to the display Infrastructure
Direct3D features and requirements in WDDM 1.2
Graphics INF requirements in WDDM 1.2
WDDM 1.2 installation scenarios
WDDM 1.2 driver enforcement guidelines
Introduction to the Windows Display Driver Model (WDDM)
Windows Display Driver Model (WDDM) Architecture
Benefits of the Windows Display Driver Model (WDDM)
Migrating to the Windows Display Driver Model (WDDM)
Windows Display Driver Model (WDDM) Operation Flow
Installation Requirements for Display Miniport and User-Mode Display Drivers
Setting the Driver Control Flags
Adding Software Registry Settings
Adding User-Mode Display Driver Names to the Registry
Loading a User-Mode Display Driver
Setting the Driver Feature Score
Setting a Copy-File Flag to Support PnP Stop
Setting the Start Type Value
Disabling Interoperability with OpenGL
Appending Information to the Friendly String Names of Graphics Adapters
Omitting LayoutFile and CatalogFile Information
Identifying Source Disks and Files
General x64 INF Information
General Install Information
Overriding Monitor EDIDs
Installation Requirements for Display Drivers Optimized for Windows 7 and Later
Setting the Feature Score for Windows 7 Display Drivers
Appending Information to the Friendly String Names for Windows 7 Display Drivers
Differentiating the SKU for Windows 7 Display Drivers
Encoding Windows 7 Display Driver INF Files in Unicode
Initializing Display Miniport and User-Mode Display Drivers
Plug and Play (PnP) in WDDM 1.2 and later
Providing seamless state transitions in WDDM 1.2 and later
Standby hibernate optimizations
Initializing the Display Miniport Driver
Initializing Communication with the Direct3D User-Mode Display Driver
Initializing Use of Memory Segments
Enumerating GPU engine capabilities
Loading an OpenGL Installable Client Driver
Providing Kernel-Mode Support to the OpenGL Installable Client Driver
WDDM Threading and Synchronization Model
Threading and Synchronization Model of Display Miniport Driver
Threading Model of User-Mode Display Driver
Video Memory Management and GPU Scheduling
Handling Memory Segments
Handling Command and DMA Buffers
GDI Hardware Acceleration
Video memory offer and reclaim
GPU preemption
Direct flip of video memory
Direct3D rendering performance improvements starting in WDDM 1.3
Graphics kernel performance improvements starting in WDDM 1.3
Present overhead improvements starting in WDDM 1.3
User-Mode Display Drivers
Returning Error Codes Received from Runtime Functions
Handling the E_INVALIDARG Return Value
Processing Shader Codes
Converting the Direct3D Fixed-Function State
Copying Depth-Stencil Values
Validating Index Values
Supporting Multiple Processors
Handling Multiple Locks
DirectX Video Acceleration 2.0
Supporting Direct3D Version 10
Supporting Direct3D Version 10.1
Supporting Direct3D Version 11
Processing High-Definition Video
Protecting Video Content
Verifying Overlay Support
Multiplane overlay support
Tiled resource support
Using cross-adapter resources in a hybrid system
Managing Resources for Multiple GPU Scenarios
Supporting OpenGL Enhancements
Monitor Drivers
Monitor Class Function Driver
Monitor Filter Drivers
Multiple Monitors and Video Present Networks
Video Present Network Terminology
Introduction to Video Present Networks
VidPN Objects and Interfaces
Child Devices of the Display Adapter
Enumerating Child Devices of a Display Adapter
Monitor Hot Plug Detection
Enumerating Cofunctional VidPN Source and Target Modes
Determining Whether a VidPN is Supported on a Display Adapter
Indirect Display Driver Model Overview
IddCx Objects
Tasks in the Windows Display Driver Model (WDDM)
Requesting and Using Surface Memory
Specifying Memory Type for a Resource
Locking Memory
Locking Swizzled Allocations
Manipulating 3-D Virtual Textures Directly from Hardware
Registering Hardware Information
Debugging Tips for the Windows Display Driver Model (WDDM)
Installing Checked Binaries
Enabling Debug Output for the Video Memory Manager
Changing the Behavior of the GPU Scheduler for Debugging
Emulating State Blocks
Logging Driver Errors
User-mode driver logging
Disabling Frame Pointer Omission (FPO) optimization
Using GPUView
XPS rasterization on the GPU
Timeout Detection and Recovery
Implementation Tips and Requirements for the Windows Display Driver Model (WDDM)
Hardware support for Direct3D feature levels
Saving Energy with VSync Control
Validating Private Data Sent from User Mode to Kernel Mode
Specifying device state and frame latency starting in WDDM 1.3
Windows Display Driver Model (WDDM) 64-Bit Issues
Changing Floating-Point Control State
Supplying Fence Identifiers
Handling Resource Creation and Destruction
Supporting Video Capture and Other Child Devices
Supporting Rotation
Version Numbers for WDDM Drivers
Supporting Brightness Controls on Integrated Display Panels
Supporting Display Output and ACPI Events
Marking Sources as Removable
Stereoscopic 3D
Supporting Output Protection Manager
Supporting Transient Multi-Monitor Manager
Connecting and Configuring Displays
Wireless displays (Miracast)
Adaptive refresh for playing 24 fps video content
GPU power management of idle states and active power
Windows 2000 Display Driver Model (XDDM) Design Guide
Roadmap for Developing Drivers for the Windows 2000 Display Driver Model (XDDM)
Introduction to Display (Windows 2000 Model)
Windows 2000 Display Architecture
General Design and Implementation Strategies
Accessing the Graphics Adapter
Fast User Switching
Creating Graphics INF Files
Compatibility Testing Requirements for Display and Video Miniport Drivers
Display Drivers (Windows 2000 Model)
Graphics DDI Functions for Display Drivers
Display Driver Requirements
Display Driver Initialization
Synchronization Issues for Display Drivers
Debugging Display Drivers
Desktop Management
Pointer Control
Managing Display Palettes
Bitmaps in Display Drivers
Asynchronous Rendering
Transparency in Display Drivers
Special Effects in Display Drivers
Color Management for Displays
DirectDraw and GDI
Tracking Window Changes
Supporting the DitherOnRealize Flag
Supporting Banked Frame Buffers
Unloading Video Hardware
Using Events in Display Drivers
Multiple-Monitor Support in the Display Driver
Disabling Timeout Recovery for Display Drivers
Mirror Drivers
Display Driver Testing Tools
DirectDraw
About DirectDraw
DirectDraw Driver Fundamentals
DirectDraw Driver Initialization
Video Port Extensions to DirectX
Color Control Initialization
AGP Support
Kernel-Mode Video Transport
Extended Surface Alignment
Extended Surface Capabilities
Compressed Texture Surfaces
Compressed Video Decoding
Direct3D DDI
Cross Platform Direct3D Driver Development
Direct3D Implementation Requirements
Direct3D Driver DDI
Direct3D Driver Initialization
Direct3D Context Management
Direct3D Texture Management
Primitive Drawing and State Changes
FVF (Flexible Vertex Format)
Advanced Direct3D Driver Topics
DirectX 7.0 Release Notes
DirectX 8.0 Release Notes
DirectX 9.0 Release Notes
DirectX Video Acceleration
Introduction to DirectX VA
Video Decoding
Deinterlacing and Frame-Rate Conversion
ProcAmp Control Processing
COPP Processing
Example Code for DirectX VA Devices
DirectX VA Data Flow Management
DirectX VA Operations
Defining Accelerator Capabilities
Video Miniport Drivers in the Windows 2000 Display Driver Model
Video Miniport Driver Header Files (Windows 2000 Model)
Video Miniport Driver Requirements (Windows 2000 Model)
Video Miniport Driver Within the Graphics Architecture (Windows 2000 Model)
Video Miniport Driver Initialization (Windows 2000 Model)
Video Miniport Driver's Device Extension (Windows 2000 Model)
Individually Registered Callback Functions in Video Miniport Drivers
Events in Video Miniport Drivers (Windows 2000 Model)
Processing Video Requests (Windows 2000 Model)
Plug and Play and Power Management in Video Miniport Drivers (Windows 2000 Model)
Video Port Driver Support for AGP
Video Port Driver Support for Bug Check Callbacks
Child Devices of the Display Adapter (Windows 2000 Model)
I2C Bus and Child Devices of the Display Adapter
Interrupts in Video Miniport Drivers
Timers in Video Miniport Drivers
Spin Locks in Video Miniport Drivers
Resetting the Adapter in Video Miniport Drivers
Bus-Master DMA in Video Miniport Drivers
Supporting DualView (Windows 2000 Model)
Enabling DualView
DualView Advanced Implementation Details
TV Connector and Copy Protection Support in Video Miniport Drivers
Mirror Driver Support in Video Miniport Drivers (Windows 2000 Model)
VGA-Compatible Video Miniport Drivers (Windows 2000 Model)
Video Miniport Drivers on Multiple Windows Versions (Windows 2000 Model)
Implementation Tips and Requirements for the Windows 2000 Display Driver Model
Exception Handling When Accessing User-Mode Memory
Version Numbers for Display Drivers
Handling Removable Child Devices
GDI
Graphics System Overview
Using the Graphics DDI
GDI Support for Graphics Drivers
Display Samples
Display Devices Design Guide

This section includes:


Windows Display Driver Model (WDDM) Design Guide
Windows 2000 Display Driver Model (XDDM) Design Guide
Windows Display Driver Model (WDDM) Design Guide

The Windows Display Driver Model (WDDM) is available starting with Windows Vista and is required starting with
Windows 8. This section discusses requirements, specifications, and behavior for WDDM drivers.

Note Windows 2000 Display Driver Model (XDDM) and VGA drivers will not compile on Windows 8 and later
versions. If display hardware is attached to a Windows 8 computer without a driver that is certified to support
WDDM 1.2 or later, the system defaults to running the Microsoft Basic Display Driver.
The following sections describe the Windows Display Driver Model (WDDM):
What's new for Windows 10 display drivers (WDDM 2.0)
What's new for Windows 8.1 display drivers (WDDM 1.3)
What's new for Windows 8 display drivers (WDDM 1.2)
What's new for Windows 7 display drivers (WDDM 1.1)
WDDM 2.0 and Windows 10
WDDM 1.2 and Windows 8
Introduction to the Windows Display Driver Model (WDDM)
Installation Requirements for Display Miniport and User-Mode Display Drivers
Installation Requirements for Display Drivers Optimized for Windows 7 and Later
Initializing Display Miniport and User-Mode Display Drivers
Windows Vista Display Driver Threading and Synchronization Model
Video Memory Management and GPU Scheduling
User-Mode Display Drivers
Monitor Drivers
Multiple Monitors and Video Present Networks
Tasks in the Windows Display Driver Model (WDDM)
Debugging Tips for the Windows Display Driver Model (WDDM)
Implementation Tips and Requirements for the Windows Display Driver Model (WDDM)
Display Samples
Note WDDM drivers do not directly use services of the Windows Graphics Device Interface (GDI) engine;
therefore, the GDI section is not relevant to writing display drivers for the WDDM driver model.
Roadmap for Developing Drivers for the Windows Display Driver Model (WDDM)

The Windows Display Driver Model (WDDM) requires that a graphics hardware vendor supply a
paired user-mode display driver and kernel-mode display driver (or display miniport driver).
To create these display drivers, perform the following steps:
Step 1: Learn about Windows architecture and drivers.
You must understand the fundamentals of how drivers work in Windows operating systems. Knowing the
fundamentals will help you make appropriate design decisions and allow you to streamline your
development process. See Concepts for all driver developers.
Step 2: Learn the fundamentals of WDDM display drivers.
To learn the fundamentals, see Introduction to the Windows Display Driver Model (WDDM), Video Memory
Management and GPU Scheduling, and Threading and Synchronization Model of Display Miniport Driver.
For a description of the major new features in recent Windows releases, see:
What's new for Windows 8.1 display drivers (WDDM 1.3)
What's new for Windows 8 display drivers (WDDM 1.2)
Windows Display Driver Model Enhancements (WDDM 1.2)
Step 3: Learn about user-mode display drivers and issues with display miniport drivers from the User-Mode
Display Drivers and Multiple Monitors and Video Present Networks sections.
Step 4: Learn about the Windows driver build, test, and debug processes and tools.
Building a driver is not the same as building a user-mode application. See Developing, Testing, and
Deploying Drivers for information about Windows driver build, debug, and test processes, driver signing,
and driver verification. See Driver Development Tools for information about building, testing, verifying, and
debugging tools.
Step 5: Make additional display driver design decisions.
For information about making design decisions, see Implementation Tips and Requirements for the
Windows Display Driver Model (WDDM) and Tasks in the Windows Display Driver Model (WDDM).
Step 6: Access and review the display driver samples in the WDK at Display Samples.
Step 7: Develop, build, test, and debug your display drivers.
For information about how to develop display drivers for your graphics adapter, see Initializing Display
Miniport and User-Mode Display Drivers and Windows Display Driver Model (WDDM) Operation Flow. See
Developing, Testing, and Deploying Drivers for information about iterative building, testing, and debugging.
For debugging tips that are specific to display drivers, see Debugging Tips for the Windows Display Driver
Model (WDDM). This process will help ensure that you build a driver that works.
Step 8: Create a driver package for your display drivers.
For more information, see Distributing a driver package. For information about how to install display drivers
for a graphics adapter, see Installation Requirements for Display Miniport and User-Mode Display Drivers.
Step 9: Sign and distribute your display drivers.
The final step is to sign (optional) and distribute the driver. If your driver meets the quality standards that
are defined in the Windows Hardware Certification Kit (formerly Windows Logo Kit or WLK), you can
distribute it through the Microsoft Windows Update program. For more information, see Distributing a
driver package.
These are the basic steps. Additional steps might be necessary based on the needs of your individual driver.
What's new for Windows 10 display drivers (WDDM 2.0)

Memory Management
GPU virtual memory
All physical memory is abstracted into virtual segments that can be managed by the graphics processing unit
(GPU) memory manager.
Each process gets its own GPU virtual address space.
Support for swizzling ranges has been removed.
For more details, see GPU virtual memory in WDDM 2.0.
Driver residency
The video memory manager makes sure that allocations are resident in memory before submitting command
buffers to the driver. To facilitate this functionality, new user mode driver device driver interfaces (DDIs) have
been added (MakeResident, TrimResidency, Evict).
The allocation and patch location list is being phased out because it is not necessary in the new model.
User mode drivers are now responsible for handling allocation tracking and several new DDIs have been added
to enable this.
Drivers are given memory budgets and expected to adapt under memory pressure. This allows Universal
Windows drivers to function across application platforms.
New DDIs have been added for process synchronization and context monitoring.
For more details, see Driver residency in WDDM 2.0.
What's new for Windows 8.1 display drivers (WDDM 1.3)

This topic lists display driver features that are new or updated for Windows 8.1. Windows 8.1 introduces version
1.3 of the Windows Display Driver Model (WDDM).
Enumerating GPU engine capabilities
An interface that's used to query a GPU node's engine capabilities.
Using cross-adapter resources in a hybrid system
Describes how to handle resources that are shared between integrated and discrete GPUs.
YUV format ranges in Windows 8.1
An interface that's used to signal user-mode display drivers that video inputs are either in the studio luminance
range or in the extended range.
Wireless displays (Miracast)
Describes how to enable wireless (Miracast) displays.
Multiplane overlay support
Describes how to implement multiplane overlays.
Tiled resource support
Describes how to support tiled resources.
Adaptive refresh for playing 24 fps video content
Describes how drivers implement 48-Hz adaptive refresh to conserve power on monitors that are normally run at
60 Hz.
Direct3D rendering performance improvements
Describes how drivers can improve rendering performance on Microsoft Direct3D 9 hardware.
Graphics kernel performance improvements
Describes how drivers can manage history buffers to provide accurate timing data about the execution of API calls
in a direct memory access (DMA) buffer.
Present overhead improvements
Describes how drivers must support additional texture formats and a new present device driver interface (DDI).
Specifying device state and frame latency
Describes how a user-mode display driver can pass device status and frame latency info to the display miniport
driver.
Supporting Path-Independent Rotation
Supported starting with Windows 8.1 Update. Describes how a display miniport driver can support cloning
portrait-first displays on landscape-first displays with the greatest possible resolution.
What's new for Windows 8 display drivers (WDDM 1.2)

Windows 8 introduced version 1.2 of the Windows Display Driver Model (WDDM). WDDM 1.2 also supports
Microsoft Direct3D Version 11.1. See these topics for info on features, guidance to independent hardware vendors
(IHVs), and hardware requirements:
WDDM 1.2 and Windows 8
Overview of WDDM 1.2
WDDM 1.2 features
Description of the new features available in WDDM 1.2
Note these requirements that have also been added to the documentation:

Summary of Direct3D support requirements (November 2012)


These topics list the hardware capabilities and formats that user-mode drivers must support for different Direct3D
feature levels:
Hardware support for Direct3D feature levels
Required Direct3D 9 capabilities
Required DXGI formats

Corrections to XR_BIAS conversions (November 2012)


XR and XR_BIAS format requirements have been corrected in these topics:
XR Layout
XR_BIAS Color Channel Conversion Rules
XR_BIAS to Float Conversion Rules
Float to XR_BIAS Conversion Rules
Conversion from BGR8888 to XR_BIAS
What's new for Windows 7 display drivers (WDDM 1.1)

The Windows Driver Kit (WDK) that is released with Windows 7 includes new features for user-mode display
drivers and kernel-mode display miniport drivers. It also includes updates to the requirements for installing display
drivers that are optimized for Windows 7 and information about new Microsoft Win32 APIs that are available in
Windows 7 that control desktop display setup.
New Windows 7 Features for User-Mode Display Drivers
The new Windows 7 features for user-mode display drivers include:
Processing High-Definition Video
Protecting Video Content
Verifying Overlay Support
Supporting Direct3D Version 11
Supporting OpenGL Enhancements
Managing Resources for Multiple GPU Scenarios
Windows 7 also provides extended format awareness to Microsoft Direct3D version 10.1. For more information
about extended format awareness, see Supporting Extended Format Awareness.
Connecting and Configuring Displays
For information about the new Win32 APIs that control desktop display setup, see Connecting and Configuring
Displays.
New Windows 7 Features for Kernel-Mode Display Miniport Drivers
You can develop your kernel-mode display miniport driver to run on Windows 7 with the following capabilities:
Connecting and Configuring Displays - DDIs
GDI Hardware Acceleration
New INF Requirements
The INF files for display drivers that are written to the Windows Vista display driver model and that are optimized
for the model's Windows 7 features require several updates. For information about these updates, see Installing
Display Drivers Optimized for Windows 7 and Later.
GPUView
The release of the Windows 7 operating system also introduces GPUView (GPUView.exe), which is a new
development tool that monitors the performance of the graphics processing unit (GPU). For more information
about GPUView, see Using GPUView.
WDDM 2.0 and Windows 10

This section provides details about new features and enhancements in Windows Display Driver Model (WDDM)
version 2.0, which is available starting with Windows 10.

In this section
GPU virtual memory in WDDM 2.0
This section provides details about GPU virtual memory, including why the changes were made and how drivers will use it. This functionality is available starting with Windows 10.

Driver residency in WDDM 2.0
This section provides details about the driver residency changes for WDDM 2.0. The functionality described is available starting with Windows 10.

Context monitoring
A monitored fence object is an advanced form of fence synchronization which allows either a CPU core or a graphics processing unit (GPU) engine to signal or wait on a particular fence object, allowing for very flexible synchronization between GPU engines, or across CPU cores and GPU engines.


GPU virtual memory in WDDM 2.0

This section provides details about GPU virtual memory, including why the changes were made and how drivers
will use it. This functionality is available starting with Windows 10.

Introduction
Under Windows Display Driver Model (WDDM) v1.x, the device driver interface (DDI) is built such that graphics
processing unit (GPU) engines are expected to reference memory through segment physical addresses. As
segments are shared across applications and overcommitted, resources get relocated over their lifetime and
their assigned physical addresses change. This leads to the need to track memory references inside command
buffers through allocation and patch location lists, and to patch those buffers with the correct physical memory
reference before submission to a GPU engine. This tracking and patching is expensive and essentially imposes a
scheduling model where the video memory manager has to inspect every packet before it can be submitted to an
engine.
As more hardware vendors move toward a hardware based scheduling model, where work is submitted to the
GPU directly from user mode and where the GPU manages the various queues of work itself, it is necessary to
eliminate the need for the video memory manager to inspect and patch every command buffer before submission
to a GPU engine.
To achieve this we are introducing support for GPU virtual addressing in WDDM v2. In this model, each process
gets assigned a unique GPU virtual address space in which every GPU context executes. An allocation, created
or opened by a process, gets assigned a unique GPU virtual address within that process's GPU virtual address space
that remains constant and unique for the lifetime of the allocation. This allows the user mode driver to reference
allocations through their GPU virtual address without having to worry about the underlying physical memory
changing through its lifetime.
Individual engines of a GPU can operate in either physical or virtual mode. In the physical mode, the scheduling
model remains the same as it is with WDDM v1.x. In the physical mode the user mode driver continues to generate
the allocation and patch location lists. These lists are submitted along with a command buffer and are used to patch
the command buffer with actual physical addresses before submission to an engine.
In the virtual mode, an engine references memory through GPU virtual addresses. In this mode the user mode
driver generates command buffers directly from user mode and uses new services to submit those commands to
the kernel. In this mode the user mode driver doesn’t generate allocation or patch location lists, although it is still
responsible for managing the residency of allocations. For more information on driver residency, see Driver
residency in WDDM 2.0.

GPU memory models


WDDM v2 supports two distinct models for GPU virtual addressing, GpuMmu and IoMmu. A driver must opt-in to
support either or both of the models. A single GPU node can support both modes simultaneously.
GpuMmu model
In the GpuMmu model, the video memory manager manages the GPU memory management unit and underlying
page tables, and exposes services to the user mode driver that allow it to manage GPU virtual address mapping to
allocations.
For more information, see GpuMmu model.
IoMmu model
In the IoMmu model, the CPU and GPU share a common address space and page tables.
For more information, see IoMmu model.
GpuMmu model

In the GpuMmu model, the graphics processing unit (GPU) has its own memory management unit (MMU) which
translates per-process GPU virtual addresses to physical addresses.
Each process has separate CPU and GPU virtual address spaces that use distinct page tables. The video memory
manager manages the GPU virtual address space of all processes and is in charge of allocating, growing, updating,
ensuring residency and freeing page tables. The hardware format of the page tables, used by the GPU MMU, is
unknown to the video memory manager and is abstracted through device driver interfaces (DDIs). The abstraction
supports multilevel translation, including fixed-size page tables and a resizable root page table.
Although the video memory manager is responsible for managing the GPU virtual address space and its
underlying page tables, the video memory manager doesn't automatically assign GPU virtual addresses to
allocations. This responsibility falls onto the user mode driver.
The video memory manager offers two sets of services to the user mode driver. First, the user mode driver may
allocate video memory through the existing Allocate callback and free that memory through the existing Deallocate
callback. As before, this returns to the user mode driver a handle to a video memory manager allocation, which
can be operated on by a GPU engine. Such an allocation represents only the physical portion of an allocation and may
be referenced by an engine, operating physically, through an allocation list reference.
For engines running in the virtual mode, a GPU virtual address needs to be explicitly assigned to an allocation
before it may be accessed virtually. For this purpose the video memory manager offers the user mode driver
services to reserve or free GPU virtual addresses and to map specific allocation ranges into the GPU virtual address
space of a process. These services are very flexible and allow the user mode driver fine-grained control over a process's
GPU virtual address space. The user mode driver may decide to either assign a very specific GPU virtual address to
an allocation, or let the video memory manager automatically pick an available one, possibly specifying minimum and
maximum GPU virtual address constraints. A single allocation may have multiple GPU virtual address mappings
associated with it and services are provided to the user mode driver to implement the Tile Resource contract.
Similarly, in a linked display adapter configuration, the user mode driver may explicitly map a GPU virtual address to
specific allocation instances and choose for each mapping whether the mapping should be to self or to a specific
peer GPU. In this model, the CPU and GPU virtual addresses assigned to an allocation are independent. A user
mode driver may decide to keep them the same in both address spaces or keep them independent.
GPU virtual addresses are managed logically at a fixed 4KB page granularity through the DDI interface. GPU virtual
addresses may reference allocations, which are resident in either a memory segment or system memory. System
memory is managed at 4KB physical granularity while memory segments are managed at either 4KB or 64KB at the
driver’s choice. All video memory manager allocations are aligned and sized to be a multiple of the page size
chosen by the driver.
Access to an invalid range of GPU virtual addresses results in an access violation and termination of the context
and/or device that caused the access fault. To recover from such a fault, the video memory manager initiates an
engine reset which gets promoted to an adapter wide timeout detection recovery (TDR) if unsuccessful.
The GpuMmu model is illustrated below:
GPU segments

Graphics processing unit (GPU) access to physical memory is abstracted in the device driver interface (DDI) by a
segmentation model. The kernel mode driver expresses the physical memory resources available to a GPU by
enumerating a set of segments, which are then managed by the video memory manager.
There are three types of segments in Windows Display Driver Model (WDDM) v2:
Memory Segment
A memory segment represents memory, dedicated to a GPU. This may be VRAM on a discrete GPU or
firmware/driver reserved memory on an integrated GPU. There can be multiple memory segments enumerated.
New in WDDM v2, a memory segment is managed as a pool of physical pages which are either 4KB or 64KB in size.
Surface data is copied into and out of a memory segment using Fill/Transfer/Discard/FillVirtual/TransferVirtual
paging operations.
The CPU may access the content of a memory segment in one of two ways. First, a memory segment may be visible
in the physical address space of the CPU, in which case the video memory manager simply maps CPU virtual
addresses directly to allocations within the segment. New in WDDM v2, the video memory manager also supports
accessing the content of a memory segment through a programmable CPU host aperture associated with that
segment.
Aperture Segment
An aperture segment is a global page table used to make discontinuous system memory pages appear contiguous
from the perspective of a GPU engine.
In WDDM v2, a single aperture segment must be reported.
System Memory Segment
The system memory segment is an implicit segment representing system memory references (i.e. a guest physical
address). The system memory segment is not directly enumerated by the kernel mode driver. It is implicitly
enumerated by the video memory manager and always gets assigned SegmentId==0 . To place an allocation in the
system memory segment, the kernel mode driver needs to use the aperture segment ID.

Physical memory reference


In the DDI, physical memory references always take the form of a segment ID-segment offset pair.
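
Conceptually, such a reference can be pictured as the following pair. This is an illustrative struct, not the actual WDK type.

#include <stdint.h>

/* Illustrative shape of a physical memory reference as described above:
   a (segment ID, segment offset) pair. */
typedef struct {
    uint32_t SegmentId;      /* 0 is the implicit system memory segment */
    uint64_t SegmentOffset;  /* byte offset within that segment         */
} PHYSICAL_REFERENCE;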

Accessing allocations by physical address


GPU engines that don't support GPU virtual addressing need to access allocations through their physical
addresses. This has implications for how an allocation gets assigned resources from a segment. Physical references
imply that an allocation must be allocated either contiguously in a memory segment or occupy a contiguous range
in the aperture segment.
To avoid unnecessary and expensive contiguous allocations, the kernel mode driver must explicitly identify
allocations that must be accessed physically by a rendering engine by setting the new
DXGK_ALLOCATIONINFOFLAGS2::AccessedPhysically flag during allocation creation.
Such allocations will be mapped to the aperture segment when resident in system memory. The allocations will be
contiguous when resident in a memory segment. Allocations, created this way, may be referenced through the
allocation list on engines, operating in the physical addressing mode.
Allocations that do not have this flag set will be allocated as a set of pages in a memory segment or a set of
pages in system memory, either of which are accessed through GPU virtual addresses. Allocations created this way
cannot be referenced through the allocation list. Any command buffer submission referencing the allocation that
way will be rejected.
Primary surfaces are understood to be accessed physically by the display controller and will be allocated
contiguously in a memory segment or mapped into the aperture segment when displayed. The kernel mode driver
should only set the AccessedPhysically flag when a rendering engine will access the allocation physically. The
distinction between the implicit physical access to primary surfaces and the explicit flag is when the allocation will
be mapped into the aperture. When the AccessedPhysically flag is set, the allocation will be mapped into the
aperture whenever it is resident. Primary surfaces that do not have this flag set will be mapped into the
aperture only when being displayed. This helps to remove pressure on the aperture segment, as typically there are
only a few primary surfaces actively being displayed, while there may be a very large number of them existing and
being rendered to (i.e. all FlipEx swapchains are created as primary and potentially displayable surfaces in dFlip/iFlip
scenarios).

The resulting placement and access rules are summarized below for each case:

AccessedPhysically==0
Memory Segment: Set of pages. Only GPU virtual access is allowed.
Aperture Segment: Not mapped. System memory pages are mapped only by GPU page tables, not into the aperture segment. Only GPU virtual access is allowed.

AccessedPhysically==1
Memory Segment: Contiguous. GPU physical access is allowed.
Aperture Segment: Mapped when resident. GPU physical access is allowed.

Primary && AccessedPhysically==0
Memory Segment: Contiguous. Only GPU virtual access is allowed by rendering engines.
Aperture Segment: Mapped when displayed. Only GPU virtual access is allowed by rendering engines.
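
The following kernel-mode driver fragment sketches how the flag might be set at allocation creation time. It assumes the WDK display driver headers (such as d3dkmddi.h); the AllocationNeedsPhysicalEngineAccess helper is hypothetical, and the exact field that carries DXGK_ALLOCATIONINFOFLAGS2::AccessedPhysically is an assumption here, so check the DXGK_ALLOCATIONINFO definition in the WDK for the real layout.

NTSTATUS APIENTRY DxgkDdiCreateAllocation(
    CONST HANDLE hAdapter,
    DXGKARG_CREATEALLOCATION *pCreateAllocation)
{
    UNREFERENCED_PARAMETER(hAdapter);

    for (UINT i = 0; i < pCreateAllocation->NumAllocations; i++)
    {
        DXGK_ALLOCATIONINFO *info = &pCreateAllocation->pAllocationInfo[i];

        /* Driver-specific policy (hypothetical helper): does a physically
           addressing engine need to reference this allocation through the
           allocation list? */
        if (AllocationNeedsPhysicalEngineAccess(info))
        {
            /* Request contiguous memory-segment placement, or aperture
               mapping when resident in system memory. Assumed field name. */
            info->Flags2.AccessedPhysically = 1;
        }
    }
    return STATUS_SUCCESS;
}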



GPU virtual address

Graphics processing unit (GPU) virtual addresses are managed in logical 4KB or 64KB pages at the device driver
interface (DDI) level. This allows GPU virtual addresses to reference either system memory, which is always
allocated at a 4KB granularity, or memory segment pages, which may be managed at either 4KB or 64KB.
The video memory manager supports a multilevel virtual address translation scheme, where several levels of page
tables are used to translate a virtual address. The levels are numbered from zero, with level zero assigned to
the leaf level. Translation starts from the root level page table. When the number of page table levels is two, the
root page table can be resized to accommodate a process with variable GPU virtual address space size. Every level
is described by the DXGK_PAGE_TABLE_LEVEL_DESC structure which is filled by the kernel mode driver during a
DxgkDdiQueryAdapterInfo call. The kernel mode driver also fills out the DXGK_GPUMMUCAPS caps structure to
describe the GPU virtual addressing support.
Each process has its own GPU virtual address space. Before a graphics context of a process can be set for execution
the kernel mode driver will get a DxgkDdiSetRootPageTable call which sets the root page table address.
The virtual address translation for the case of two page table levels is shown in the following diagram.

The GPU virtual address has DXGK_GPUMMUCAPS::VirtualAddressBitCount bits.


The low bits [0 – 11] represent an offset in bytes in a page. The next
DXGK_PAGE_TABLE_LEVEL_DESC::PageTableIndexBitCount bits represent the index of a page table entry in a
leaf level page table.
The number of entries in a page table is 2^(DXGK_PAGE_TABLE_LEVEL_DESC::PageTableIndexBitCount), and the page table size is
DXGK_PAGE_TABLE_LEVEL_DESC::PageTableSizeInBytes bytes.
The rest of the bits represent an index to a page table entry in the root page table. The root page table is resizable
for the 2-level translation scheme, and a new DxgkDdiGetRootPageTableSize DDI is introduced to obtain its size.
The DXGK_PTE structure is used through the DDI to represent a page table entry. This structure represents
information about each entry, which the Microsoft DirectX graphics kernel manages. The driver uses this
information to build hardware-specific page table entries.
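
The address decomposition described above can be expressed as follows. This is a standalone illustration; PAGE_TABLE_INDEX_BIT_COUNT stands in for the value the driver reports in DXGK_PAGE_TABLE_LEVEL_DESC::PageTableIndexBitCount and is an example value only.

#include <stdint.h>
#include <stdio.h>

/* Example value: 1024 entries per leaf page table. */
#define PAGE_TABLE_INDEX_BIT_COUNT 10

typedef struct {
    uint64_t PageOffset;   /* byte offset within the 4KB page           */
    uint64_t LeafIndex;    /* index into the level-0 (leaf) page table  */
    uint64_t RootIndex;    /* index into the resizable root page table  */
} GPU_VA_DECODED;

static GPU_VA_DECODED DecodeGpuVa(uint64_t gpuVa)
{
    GPU_VA_DECODED d;
    d.PageOffset = gpuVa & 0xFFF;                                          /* bits [0 - 11] */
    d.LeafIndex  = (gpuVa >> 12) & ((1ull << PAGE_TABLE_INDEX_BIT_COUNT) - 1);
    d.RootIndex  = gpuVa >> (12 + PAGE_TABLE_INDEX_BIT_COUNT);
    return d;
}

int main(void)
{
    GPU_VA_DECODED d = DecodeGpuVa(0x12345678ull);
    printf("root=%llu leaf=%llu offset=%llu\n",
           (unsigned long long)d.RootIndex,
           (unsigned long long)d.LeafIndex,
           (unsigned long long)d.PageOffset);
    return 0;
}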
Creation of page table allocations
Page tables are created as implicit allocations and do not have a user mode driver or a kernel mode driver handle.
To allocate a page table, the video memory manager allocates an allocation of size
DXGK_PAGE_TABLE_LEVEL_DESC::PageTableSizeInBytes from the segment, specified in
DXGK_PAGE_TABLE_LEVEL_DESC::PageTableSegmentId. After creation, the video memory manager initializes
every entry in the page table to invalid using the new UpdatePageTable paging operation. Page tables never
change size, except for the root page table in the 2-level translation scheme.
The video memory manager supports resizing of the root page table in the 2-level translation scheme. When a root
page table, covering a specified amount of address space, is being created, the video memory manager calls the
new DxgkDdiGetRootPageTableSize DDI to determine the required allocation size for it. The video memory manager
then allocates an allocation of that size in the segment, specified by
DXGK_PAGE_TABLE_LEVEL_DESC::PageTableSegmentId for the root level. After creation, the video memory
manager initializes every entry in the page table to invalid using the new UpdatePageTable paging operation. The
root page table can grow or shrink as the amount of video address space required by a process expands and
shrinks. Once the root page table is created, the video memory manager calls the new
DxgkDdiSetRootPageTable DDI to associate the newly created root page table with the various contexts that will
execute within it.
In linked display adapter configurations, root page tables are created as LinkMirrored allocations, which have
identical content and are located at the same physical address on each GPU in the link. Lower level page tables are
allocated as LinkInstanced allocations to reflect the fact that their content may vary between GPUs, typically because
of different peer mappings. The content of page tables is updated separately on all GPUs.

Growing and shrinking a root page table


This section is applicable only for systems with two levels of page tables. When the number of page table levels is
greater than two, the page table size for each level is defined by the virtual addressing caps and is fixed.
When the user mode driver requests GPU virtual addresses, the video memory manager grows the size of the
address space of a process to accommodate the request. This is accomplished by growing the size of the current
root page table (if necessary) as well as allocating new page tables for the new range.
To grow a root page table, the video memory manager creates another root page table allocation, makes it resident,
initializes its entries using UpdatePageTable operations, and destroys the old allocation. The
DxgkDdiGetRootPageTableSize function is used to get the size of the new page table in bytes.
To shrink a root page table, the video memory manager creates a new page table allocation, makes it resident,
copies a portion of the old page table to the new one using the CopyRootPageTable paging operation and destroys
the old allocation.
After the resize operation completes, the video memory manager calls the DxgkDdiSetRootPageTable DDI to
associate the impacted contexts with their new root page table.

Updating page table


As surfaces move around in memory, the video memory manager updates the content of page tables to reflect the
new location of surfaces. This is done through the new UpdatePageTable paging DDIs.

Moving a page table


Page tables may be relocated or evicted by the video memory manager when a device is idle or suspended. When
moving a page table, the video memory manager updates the higher-level page table to reference the new
location of the page table by using the UpdatePageTable DDIs.
When the root page table itself is relocated, the video memory manager calls the DxgkDdiSetRootPageTable DDI to
inform impacted contexts of the new location of their page directory.

Physical page size


As mentioned previously the video memory manager supports two page sizes. System memory is always managed
in 4KB pages, while memory segments may be managed at either 4KB or 64KB granularity as determined by the
kernel mode driver.
When opting for virtual memory to be managed in 64KB pages, all allocations are automatically aligned and sized
to be a multiple of 64KB.
Expanding all allocations to 64KB can have a significant memory impact. It is the responsibility of the user mode
driver to pack small allocations into a larger one so as to avoid wasting memory.
When mapping a GPU virtual address to a large 64KB memory segment page, the video memory manager will map
4KB page table entries to 16 contiguous 4KB pages in the memory segment. Both the virtual address and the
physical address are guaranteed to share the same 64KB alignment (that is, the bottom 16 bits of the virtual address
and the physical address are guaranteed to match).
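
The 64KB case can be illustrated with the following sketch, which expands one 64KB memory segment page into 16 consecutive 4KB page table entries; the raw-address page table entry format is a simplification.

#include <stdint.h>
#include <assert.h>

#define SMALL_PAGE  4096ull
#define LARGE_PAGE  (64 * 1024ull)

/* Fill 16 consecutive leaf page table entries so that a 64KB-aligned GPU
   virtual address maps onto one contiguous 64KB memory segment page. */
static void MapLargePage(uint64_t *leafPte /* 16 entries */,
                         uint64_t gpuVa, uint64_t physBase64K)
{
    /* The low 16 bits of the virtual and physical addresses match. */
    assert((gpuVa % LARGE_PAGE) == (physBase64K % LARGE_PAGE));

    for (int i = 0; i < (int)(LARGE_PAGE / SMALL_PAGE); ++i)    /* 16 entries */
        leafPte[i] = physBase64K + (uint64_t)i * SMALL_PAGE;    /* simplified PTE */
}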
Per-process GPU virtual address spaces

Each process is associated with two graphics processing unit (GPU) virtual address spaces, an application GPU
virtual address space and a privileged virtual address space.

Application GPU virtual address space


The application GPU virtual address space is the address space that command buffers, generated by the user mode
driver, execute within. This address space is managed by the user mode driver using services provided by the video
memory manager. Before an allocation can be accessed by a GPU engine operating in the virtual mode, the user
mode driver must assign a GPU virtual address range to the allocation. For regular allocations, this is done using
the new MapGpuVirtualAddress service, exposed by the video memory manager. MapGpuVirtualAddress allows the
user mode driver to either pick a specific address where it wants the allocation to be mapped or let the video
memory manager pick an available GPU virtual address automatically. Drivers should generally let the video
memory manager pick an address automatically, but in some circumstances the driver may need more control. In
linked display adapter configurations, MapGpuVirtualAddress can also be used to specify whether a mapping is to
the instance of the allocation on the current GPU or on a peer GPU.

MapGpuVirtualAddress queues a request to the video memory manager and returns to the user mode driver
immediately while the request is processed. The request is queued on the device paging queue, and the user mode
driver must ensure that it synchronizes against the returned device paging fence value. FreeGpuVirtualAddress can
be used to unmap an allocation and reclaim its GPU virtual address. All virtual addresses associated with an
allocation are automatically freed when the allocation is destroyed, so the user mode driver doesn't need to
explicitly unmap it.

The video memory manager provides two tile-resource-specific services to the user mode driver.
ReserveGpuVirtualAddress allows the user mode driver to reserve address space for a tile resource, and
UpdateGpuVirtualAddress allows the user mode driver to map and unmap regions of the tile resource to specific
tile pool pages. ReserveGpuVirtualAddress executes against the device paging queue, while
UpdateGpuVirtualAddress executes in a special companion context running within the process' privileged address
space.
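
The synchronization contract around MapGpuVirtualAddress can be sketched as follows. The helper names are hypothetical wrappers, not the literal DDI signatures; only the behavior (the request is queued, a device paging fence value is returned, and zero means the request completed immediately) comes from this topic.

typedef unsigned long long UINT64;

typedef struct {
    UINT64 GpuVirtualAddress;   /* address assigned to the allocation        */
    UINT64 PagingFenceValue;    /* 0 means the request completed immediately */
} MAP_RESULT;

MAP_RESULT MapGpuVa(void *hAllocation, UINT64 sizeInBytes);          /* hypothetical wrapper */
void QueueGpuWaitOnPagingFence(void *hContext, UINT64 fenceValue);   /* hypothetical wrapper */
void ReferenceInCommandBuffer(void *hContext, UINT64 gpuVa);         /* hypothetical wrapper */

void MapAndUse(void *hContext, void *hAllocation)
{
    MAP_RESULT map = MapGpuVa(hAllocation, 64 * 1024);

    /* The map request was only queued; before any command that touches the
       new virtual address executes, the context must wait for the device
       paging fence to reach the returned value. */
    if (map.PagingFenceValue != 0)
        QueueGpuWaitOnPagingFence(hContext, map.PagingFenceValue);

    ReferenceInCommandBuffer(hContext, map.GpuVirtualAddress);
}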

Process privileged virtual address space


Processes using tile resources get a second virtual address space associated with them on the first call to
ReserveGpuVirtualAddress. This address space is used to update the page table of the process synchronously with
rendering. We cover this address space in the Tile resources topic.

Virtual address space on linked display adapters


When physical graphics adapters are linked to a linked display adapter chain, there is still a single GPU virtual
address space per process (except the paging process). But the virtual address space on each physical adapter is
mapped by its own set of page tables.
System paging process

Most paging operations occur in the context of the system paging process. The only exception is the page table
update from the UpdateGpuVirtualAddress callback, which occurs in a special companion context and runs
synchronously with rendering.
The Microsoft DirectX graphics kernel uses the system paging process to perform paging operations, such as:
Transfer allocation between system and local graphics processing unit (GPU) memory
Fill allocations with pattern
Update page tables
Map allocations to the aperture segment
Flush the translation look-aside buffer
The paging process has its own GPU virtual address space, GPU contexts and direct memory access (DMA) buffers
(called paging buffers). It has its own page tables which are pinned in physical memory and evicted only during
power transitions.
The virtual address space for the paging process has a pre-defined layout and is initialized during adapter initialization,
and again every time memory content is lost due to power transitions.

The DirectX graphics kernel initializes enough page tables and page table entries in the root page table to cover the
1 GB virtual address space. The scratch area is used to temporarily map allocations into the paging process virtual
address space during transfer and fill operations. If an allocation does not fit into the virtual address scratch area, the
transfer operation will be done in chunks.
A system root page table allocation is created for the paging process. Its content is set during initialization and
never changes (except after power transitions).
The page tables of the system process are divided into two parts:
A system page table is created that reflects the scratch area page tables into the address space of the system
process. This allows the system process to modify the scratch area page tables and map/unmap memory from the
scratch area as necessary. The content of this page table is set during adapter initialization and never changes.
The scratch area page table entries are used to map allocations into the virtual address space of the paging
process. They are initialized as invalid during initialization and used later for paging operations.
The page tables of the paging process are initialized through UpdatePageTable paging operations during adapter
initialization and on each power-on event. For these operations, the PageTableUpdateMode is forced to
CPU_VIRTUAL and the updates must be completed immediately using the CPU (the paging buffer should not be
used).
Updates of the page table entries for all other processes are done using the PageTableUpdateMode specified by
the driver. These updates are done in the context of the paging process.
Here is how the setup is done:
1. A root page table allocation and lower level page table allocations are created to cover 1 GB of address space.
2. The allocations are committed to a memory segment.
3. Multiple UpdatePageTable paging operations are issued to the driver to initialize the page table entries.
As an example of the paging process virtual address space initialization, let’s consider the case with the
following parameters:
Page size is 4096 bytes
Paging process virtual address space is 1 GB
Page table entry size is 4 bytes
In this case we need a 2-level translation scheme made up of:
One system root page table
One system page table
255 scratch area page tables
The following figure shows how the page tables would be initialized based on the location of the root page table and the
system page table in physical memory. Note that the physical addresses are given only as an illustration. A page table covers 4
MB of the address space, so the system page table covers all of the scratch area page tables. The scratch area starts at the
4 MB virtual address.
Note that the virtual address range from 0 to 4095 will be invalid.
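
The numbers in this example can be checked with a short calculation:

#include <stdio.h>

/* Worked numbers for the paging-process layout example above. */
int main(void)
{
    const unsigned long long pageSize = 4096;            /* bytes           */
    const unsigned long long pteSize  = 4;               /* bytes per entry */
    const unsigned long long vaSpace  = 1ull << 30;      /* 1 GB            */

    unsigned long long entriesPerTable = pageSize / pteSize;           /* 1024 */
    unsigned long long bytesPerTable   = entriesPerTable * pageSize;   /* 4 MB covered per table */
    unsigned long long leafTables      = vaSpace / bytesPerTable;      /* 256  */

    /* One of the 256 leaf tables is the system page table that maps the
       scratch-area page tables themselves, leaving 255 scratch-area page
       tables plus the single root page table. */
    printf("entries/table=%llu coverage/table=%lluMB leaf tables=%llu\n",
           entriesPerTable, bytesPerTable >> 20, leafTables);
    return 0;
}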
Device paging queues

Various services exposed by the video memory manager can take a non-trivial amount of time to finish. For
example, making an allocation resident can possibly involve bringing the allocation content, which hasn’t been
used in a long time, back from the page file. Reserving a graphics processing unit (GPU) virtual address or mapping a
virtual address to an already resident allocation isn't quite as expensive, but still involves an immediate page table
update, which needs to be queued onto the paging engine and may take a little while to finish.
Rather than forcing the thread requesting these services to wait until their completion, the video memory manager
implements these services using an asynchronous queue. This asynchronous queue is called the device paging
queue.
Each graphics device has a dedicated paging queue where various video memory manager requests are queued to
the video memory manager thread pool for servicing. A device paging fence object is associated with the queue
and every operation gets assigned a unique fence value that gets signaled when the video memory manager
completes the operation. An operation that can be done immediately by the video memory manager returns a
device paging fence value of zero.
The device paging fence is a regular monitored fence object and the user mode driver can wait on these video
memory manager services either on the CPU or on the GPU.
Generally the user mode driver wants to push the synchronization as far as possible and will queue a GPU wait into
a context before that context takes a dependency on a requested video memory manager operation. For example,
after reserving the virtual address for a tile resource, the user mode driver must wait until the reserve
operation completes before a GPU engine starts accessing the virtual address range of the tile resource.
To obtain a reference to the device paging fence object, a new GetDevicePagingFenceObjectCb device driver
interface (DDI) is added for the user mode driver. This is illustrated below:
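
A CPU-side wait against the device paging fence might look like the following sketch. GetDevicePagingFenceObjectCb is named above; the fence object layout and the surrounding helper names are assumptions for illustration.

typedef unsigned long long UINT64;

typedef struct {
    volatile UINT64 *FenceValueCpuVa;  /* monitored fence value mapped for CPU reads (assumed layout) */
    void            *CpuEventHandle;   /* event signaled when the fence advances (assumed)            */
} DEVICE_PAGING_FENCE;

DEVICE_PAGING_FENCE *GetDevicePagingFence(void *hDevice);   /* stand-in for GetDevicePagingFenceObjectCb */
void WaitForEvent(void *eventHandle);                       /* hypothetical OS wait                      */

void WaitForPagingOperation(void *hDevice, UINT64 pagingFenceValue)
{
    /* A value of zero means the video memory manager completed the request
       immediately and no wait is needed. */
    if (pagingFenceValue == 0)
        return;

    DEVICE_PAGING_FENCE *fence = GetDevicePagingFence(hDevice);

    /* Wait until the monitored fence reaches the requested value. */
    while (*fence->FenceValueCpuVa < pagingFenceValue)
        WaitForEvent(fence->CpuEventHandle);
}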



Driver protection

Along with every virtual address, the video memory manager allows independent hardware vendors (IHVs) to
define a driver / hardware specific protection (i.e. page table entry encoding) that is associated specifically with that
virtual address. Think about driver protection as extra bits in the page table entry that the video memory manager
doesn’t know about but that the driver must control in order for the graphics processing unit (GPU) to access
memory in an optimum way.
Note Driver protection is optional and can be left at zero on any platform that doesn't require this functionality.
When mapping or reserving a GPU virtual address range the driver may specify a 64-bit driver protection value.
The specified driver protection is used by the video memory manager when initializing the page table entry
corresponding to that specific virtual address. In particular, driver protection is given back to the driver for any
BuildPagingBuffer DXGK_OPERATION_UPDATE_PAGE_TABLE operation corresponding to the specified virtual address.
Multiple virtual addresses may be mapped to a single allocation using different driver protections. Page table
entries for each of these virtual addresses will be updated using the appropriate driver protection.
Driver protection only applies to level 0 page table entries and will be set to zero for any other page table entry
levels.

Paging and unique driver protection


When paging an allocation in or out of a memory segment, the video memory manager assigns a temporary virtual
address from the system device address space for the purpose of transferring the allocation’s content. When
creating this mapping, the driver protection related to the allocation is ambiguous since there could exist multiple
mappings in various process address spaces with different driver protections.
Because of this, the video memory manager will specify a driver protection of zero for any system device mapping
used for paging by default.
A driver can change this behavior by setting the unique bit when specifying the driver protection associated with a
virtual address.
#define D3DGPU_UNIQUE_DRIVER_PROTECTION 0x8000000000000000ULL

When this bit is set, the video memory manager will enforce that any mapping to the same allocation range use the
same driver protection value, or the mapping request will fail with STATUS_INVALID_PARAMETER.
An allocation range, mapped with a unique driver protection value, cannot be mapped again with a different
protection value. The only way to change the protection in this case is to map the range with no access.
An allocation range that is mapped with a non-unique driver protection value can be mapped again with any
protection value.
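
For example, a user mode driver might combine its hardware-specific page table entry bits with the unique bit like this. The IHV bit values and the MapWithDriverProtection helper are illustrative assumptions; only D3DGPU_UNIQUE_DRIVER_PROTECTION comes from this topic.

#include <stdint.h>

#define D3DGPU_UNIQUE_DRIVER_PROTECTION 0x8000000000000000ULL

#define MY_PTE_CACHE_COHERENT 0x1ULL   /* example IHV-defined encoding bit */
#define MY_PTE_COMPRESSED     0x2ULL   /* example IHV-defined encoding bit */

void MapWithDriverProtection(uint64_t gpuVa, uint64_t driverProtection); /* hypothetical */

void MapCompressedSurface(uint64_t gpuVa)
{
    /* Setting the unique bit tells the video memory manager to reject any
       later mapping of the same allocation range that uses a different
       protection value, so paging copies can use this value unambiguously. */
    uint64_t protection = MY_PTE_COMPRESSED
                        | MY_PTE_CACHE_COHERENT
                        | D3DGPU_UNIQUE_DRIVER_PROTECTION;

    MapWithDriverProtection(gpuVa, protection);
}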
When evicting an allocation that has virtual address ranges mapped with driver protection set to unique, the video
memory manager will set up the paging process mapping used for those ranges with the appropriate driver
protection value without ambiguity.
The following figure shows VA mapping for an allocation with different driver protection values.
During paging operations the allocation will be copied in chunks:
1. Copy allocation range [0, A1] with driver protection 0
2. Copy allocation range [A1, A2] with driver protection P1
3. Copy allocation range [A2, A4] with driver protection 0
4. Copy allocation range [A4, A5] with driver protection P4
5. Copy allocation range [A5, Size] with driver protection 0
It is possible that paging process page table entries will be set with one driver protection value when an allocation is
evicted and set to a different value when the allocation is committed. It is assumed that the driver should refresh the
allocation data after the virtual address mapping is updated. For example, consider a case where the current allocation
mapping set is M1 and the user mode driver called UpdateGpuVirtualAddress with mapping set M2. Just before the
mapping set M2 is applied, the allocation can be evicted by the video memory manager. The mapping set M2 is applied
and the allocation is committed back. Now the allocation content in the local memory segment might be different from
the original.

Tiled Resources
For tiled resources, driver protection is specified when reserving a virtual address range. A user mode driver call to
UpdateGpuVirtualAddress will inherit the virtual address's current driver protection.
Tile resources

For tile resources, the asynchronous video memory manager services running on the device paging queue
aren’t sufficient. In particular, for tile resources we want to queue page table updates along with rendering and
ensure that the updates are applied synchronously between draw operations.
For example, given the following API call sequence by an application:
1. Draw #42
2. Update tile mapping
3. Draw #43
We want to ensure that Draw #42 executes with the page tables in their old state while Draw #43 executes with
page tables in their new state. For update tile mapping operations that specify the no-overwrite flag, this
synchronization can be relaxed a bit, but high performance synchronous updates must be supported. In order to
support high performance queued updates, we need the ability to generate paging operations ahead of time and
queue them to a context and wait for them to be executed once the dependent rendering context reaches a certain
point (ex: after Draw #42 above).
This means that the paging operation needs to be queued behind a graphics processing unit (GPU) wait that will be
signaled by a specific rendering context. Because of this, they can’t be queued directly to the shared system
context as this would imply that one application could block execution of paging operations for everyone else in the
system.
In theory, in today’s packet based scheduling we could implement the wait portion of the operation in the
device paging queue, monitor the wait and submit the paging operation to the shared system context after the wait
condition has been satisfied. However, as we move beyond packet based scheduling and onto hardware scheduling
we want to ensure that we can use GPU to GPU synch primitives for the interlocked operations to ensure the best
possible performance.
To solve this problem, we’re introducing the notion of a per-context paging companion context. The paging
companion context is lazily created on the first call to UpdateGpuVirtualAddress and is used for all page table
updates that require interlocked synchronization. UpdateGpuVirtualAddress takes a GPU monitored fence object
and a specific fence value as parameters. The companion context waits on this monitored fence, performs the page
table update, and then increments the monitored fence object and signals it. This allows the rendering context to
tightly synchronize with the companion context.
Page table update using the companion context is illustrated below.
The companion context is lazily created by the video memory manager against an engine chosen by the kernel
mode driver during context creation (DXGKARG_CREATECONTEXT.PagingCompanionNodeId).
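A minimal sketch of where the driver makes that choice is shown below. It assumes the WDK's DXGKARG_CREATECONTEXT structure; apart from the PagingCompanionNodeId member named above, the engine index and the rest of the routine are placeholders chosen for illustration.

/* Sketch only: the kernel mode driver picks the engine (node) that the video
 * memory manager will use for this context's paging companion context.
 * MY_COPY_ENGINE_NODE is a hypothetical driver-defined engine index.
 * Assumes the usual WDK kernel-mode display driver headers (dispmprt.h). */
#define MY_COPY_ENGINE_NODE 1

NTSTATUS MyDdiCreateContext(HANDLE hDevice, DXGKARG_CREATECONTEXT *pCreateContext)
{
    UNREFERENCED_PARAMETER(hDevice);

    /* Route interlocked page table updates to the copy engine so they do not
     * contend with 3-D rendering work on the main engine. */
    pCreateContext->PagingCompanionNodeId = MY_COPY_ENGINE_NODE;

    /* ... existing context setup (DMA buffer sizes, private data, etc.) ... */

    return STATUS_SUCCESS;
}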
The companion context executes in a per-process privileged address space. The address space is privileged because
it is controlled by the kernel and only direct memory access (DMA) buffers generated in the kernel are allowed to
execute within it. Apart from that, this is a normal GPU virtual address space and doesn't require any special
hardware virtual address space privilege support.
We opted for a per-process privileged GPU virtual address space rather than re-using the system paging process
GPU virtual address space for simplicity. Given that mapping and unmapping of tiled resources is common, we need
to have the application page tables mapped permanently in an address space to avoid having to map and unmap
them frequently. Also, as we'll shortly detail, we need to map all of the tile pools themselves in a permanent
fashion as well. Doing these permanent mappings in the system address space would have introduced unnecessary
additional complexity.
The per-process privileged GPU virtual address space is initialized such that the process GPU page tables are visible
through the address space, making it possible for update commands to update the various page table entries using
the GPU. Further, all tile pools created by a process are also mapped into the address space.
The way page table entries are updated by the companion context is a bit special and requires some explaining.
When a map operation is queued for execution on the shared system context, the video memory manager knows
the physical addresses being mapped to and those physical addresses can appear directly in the associated paging
buffer. UpdatePageTable paging operations are used in this case and the video memory manager guarantees that
paging operations on some specific pages will complete before those pages are reused for some other purpose.
However, for synchronous updates of page tables on the companion context, things are more difficult. The video
memory manager knows the physical page of the tile pool being referenced at the time the update operations are
built. However, given that those operations will be queued behind an arbitrarily long GPU wait (the app could even
deadlock and never signal), the video memory manager doesn't know what the physical page of the tile pool will be
at the time the paging operations actually get executed, and it can't keep the tile pool at that location for an
arbitrarily long time.
To solve this problem, we need either to queue the paging operation and patch it up later as the physical address
changes, or to late-bind the actual address used in the update. The video memory manager does the latter.
To do this, the video memory manager does two things. First, it maps a GPU virtual address to all of the
tile pool elements belonging to a process inside the process privileged address space. As tile pools move around in
memory, the video memory manager automatically keeps those GPU virtual addresses pointing to the right location
for the tile pools, using the same simple mechanism it uses for any other allocation type.
To update tiled resource page table entries, the video memory manager introduces a new CopyPageTableEntry
paging operation, which copies page table entries from the tile pool virtual address to the tiled resource virtual
address. Because the video memory manager keeps the tile pool virtual address up to date as the tile pool moves
around in memory, the copy operation is guaranteed to execute against the currently valid physical location of
the tile pool, no matter how much time elapses between the command being generated and the command
actually executing.
Note that as long as there are queued page table updates referencing a particular tile pool, the video memory
manager keeps that tile pool on the residency requirement list for the application, no matter what the user mode
driver or application says, to guarantee that the tile pool virtual addresses are valid when executing the update
operation.
This mechanism is illustrated below:

Update GPU virtual address on GPUs with CPU_VIRTUAL page table update mode

On GPUs that support the DXGK_PAGETABLEUPDATE_CPU_VIRTUAL page table update mode, the
CopyPageTableEntries operation is not used. These are integrated GPUs, which do not use paging buffers.
The video memory manager defers the update operation until the right time and uses UpdatePageTable
operations to set up the page tables.
The disadvantage of this method is that the UpdatePageTable operations do not run in parallel with rendering
operations. The advantage is that the driver does not need to implement support for paging buffers and can
implement UpdatePageTable as an immediate operation.
Linked display adapter

Each physical adapter in a linked display adapter (LDA) link can support GpuMmu or IoMmu or both addressing
modes independently.

IoMmu support
Each physical adapter in a link can support the IoMmu model, the GpuMmu model, or both.
DxgkDdiCreateDevice will be called for logical adapters that support the IoMmu model.

GpuMmu support
All physical adapters in a link share the same process virtual address space, but each graphics processing unit
(GPU) has its own page tables. Generally, the content of the page tables is different on each GPU.

Each physical adapter is allowed to have its own GpuMmu capabilities (page table segment, page table update
node, virtual address layout, the underlying page table format, size, etc.). The only restriction is that all physical
adapters must have the same virtual address size. GpuMmuCaps.VirtualAddressBitCount must be the same for
all adapters. The driver should clamp the address space size to the smallest of the physical GPUs.
The Microsoft DirectX graphics kernel will now query GpuMmu caps for every physical adapter in a link.
DxgkDdiQueryAdapterInfo (DXGKQAITYPE_PAGETABLELEVELDESC) will also be called for each physical adapter.
InputDataSize and pInputData for DxgkDdiQueryAdapterInfo(DXGKQAITYPE_GPUMMUCAPS) will point to
DXGK_GPUMMUCAPSIN.
InputDataSize and pInputData for DxgkDdiQueryAdapterInfo(DXGKQAITYPE_PAGETABLELEVELDESC) will
point to DXGK_PAGETABLELEVELDESCIN.
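Because GpuMmuCaps.VirtualAddressBitCount must match across the link, the driver can simply clamp to the smallest GPU before reporting the caps. A minimal sketch of that clamping follows; it is plain C, and the per-GPU values are assumed to come from the driver's own hardware query.

#include <stdint.h>

/* Returns the VirtualAddressBitCount to report for every physical adapter in
 * the LDA link: the smallest value supported by any GPU in the link. */
uint32_t ClampVirtualAddressBitCount(const uint32_t *PerGpuBitCount, uint32_t GpuCount)
{
    uint32_t bits = PerGpuBitCount[0];
    for (uint32_t i = 1; i < GpuCount; ++i) {
        if (PerGpuBitCount[i] < bits) {
            bits = PerGpuBitCount[i];
        }
    }
    /* Report this value in DxgkDdiQueryAdapterInfo(DXGKQAITYPE_GPUMMUCAPS)
     * for each physical adapter. */
    return bits;
}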

Related topics
DxgkDdiCreateDevice
Resizable BAR support

It is typical today for a discrete graphics processing unit (GPU) to have only a small portion of its frame buffer
exposed over the PCI bus. For compatibility with 32-bit OSes, discrete GPUs typically claim a 256 MB I/O region for
their frame buffers, and this is how typical firmware configures them.
For Windows Display Driver Model (WDDM) v2, Windows renegotiates the size of a GPU BAR after firmware
initialization on GPUs that support resizable BAR; see Resizable BAR Capability in the PCI SIG Specifications Library.
A GPU that supports resizable BAR must ensure that it can keep the display up and showing a static image during
the reprogramming of the BAR. In particular, we don't want the display to go blank and come back during this
process. It is important to have a smooth transition between the firmware-displayed image, the boot loader image,
and the first kernel mode driver generated image. It is guaranteed that no PCI transaction will occur toward the
GPU while the renegotiation is taking place.
For the most part this renegotiation will be invisible to the kernel mode driver. When the renegotiation is
successful, the kernel mode driver will observe that the GPU BAR has been resized to its maximum size to expose
the entire VRAM of the discrete GPU.
Upon successful resizing, the kernel mode driver should expose a single CPUVisible memory segment to the video
memory manager. The video memory manager maps CPU virtual addresses directly to this range when the CPU
needs to access the content of the memory segment.
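The sketch below shows the general shape of that segment description. It is based on the WDK's DXGK_SEGMENTDESCRIPTOR4; treat the member names as assumptions to be verified against d3dkmddi.h, and MyBarBase/VramSize as hypothetical values obtained by the driver.

/* Sketch: describe one memory segment that covers all of VRAM and is fully
 * CPU visible after a successful BAR resize. Member names are assumptions. */
VOID DescribeVramSegment(DXGK_SEGMENTDESCRIPTOR4 *Desc,
                         PHYSICAL_ADDRESS MyBarBase,   /* hypothetical: resized BAR base */
                         SIZE_T VramSize)
{
    RtlZeroMemory(Desc, sizeof(*Desc));
    Desc->BaseAddress.QuadPart = 0;          /* GPU-relative base of the segment */
    Desc->CpuTranslatedAddress = MyBarBase;  /* where the CPU sees the segment   */
    Desc->Size                 = VramSize;   /* the whole frame buffer           */
    Desc->Flags.CpuVisible     = 1;          /* VidMm maps CPU VAs directly      */
}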
CPU host aperture

For a 32-bit OS, for discrete graphics processing units (GPUs) that don't support resizable BAR, or when resizing the
frame buffer BAR fails, Windows Display Driver Model (WDDM) v2 offers an alternative mechanism by which
discrete GPU VRAM can be efficiently accessed. For GPUs that support a programmable BAR address space, new
CPU host aperture functionality is introduced in WDDM v2 to abstract that capability.
When exposing a CPU host aperture, the kernel mode driver fills out a new DXGK_CPUHOSTAPERTURE caps
structure for every segment supporting a CPU host aperture. The structure defines the size of the CPU host
aperture, which allows the driver to reserve some of the BAR for internal purposes. The page size is the same as the
GPU page size of the memory segment.
The kernel mode driver then exposes two new device driver interfaces (DDIs) to manage the BAR address space, in
particular DxgkDdiMapCpuHostAperture and DxgkDdiUnmapCpuHostAperture.
The memory for the page table behind the CPU host aperture is managed by the driver and set up early during
driver initialization. Both DxgkDdiMapCpuHostAperture and DxgkDdiUnmapCpuHostAperture are expected to be
operational immediately after segment enumeration; they are used during video memory manager initialization
to map CPU virtual addresses to the page directory and page tables of the system paging process during adapter
initialization.
When CPU access to a memory segment is required, the video memory manager reserves pages in the CPU Host
Aperture and maps memory segment pages through it. This is illustrated below.
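The page-based behavior can be pictured with a small sketch. This is plain C bookkeeping, not a DDI: ReserveAperturePages scans a hypothetical free bitmap of host aperture pages, and the actual GPU-side mapping would then be requested through DxgkDdiMapCpuHostAperture.

#include <stdint.h>
#include <stdbool.h>

#define HOST_APERTURE_PAGES 65536   /* example: 256 MB aperture with 4 KB pages */

static bool g_PageInUse[HOST_APERTURE_PAGES];   /* hypothetical driver state */

/* Returns the first page index of a free run of NumPages pages, or -1. */
int ReserveAperturePages(uint32_t NumPages)
{
    for (uint32_t start = 0; start + NumPages <= HOST_APERTURE_PAGES; ++start) {
        uint32_t run = 0;
        while (run < NumPages && !g_PageInUse[start + run]) {
            ++run;
        }
        if (run == NumPages) {
            for (uint32_t i = 0; i < NumPages; ++i) {
                g_PageInUse[start + i] = true;
            }
            return (int)start;  /* then map these pages via DxgkDdiMapCpuHostAperture */
        }
        start += run;  /* skip past the page that blocked the run */
    }
    return -1;  /* out of aperture space; the lock falls back to system memory */
}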

In the linked display adapter configuration things look similar, except for the following:
Default or LinkMirrored allocations are always mapped to GPU0.
LinkInstanced allocations have a virtual address range of AllocationSize*NumberOfGPUInLink associated with
them, with various parts of the allocation mapped to different GPUs.
This is illustrated below:
Support for 64KB pages

To support 64 KB pages, Windows Display Driver Model (WDDM) v2 provides two types of leaf page tables: one
whose entries map 4 KB pages and one whose entries map 64 KB pages. Both page table types cover the same
virtual address range, so a page table for 4 KB pages has 16 times the number of entries of the 64 KB page table.
The size of a 64 KB page table is defined by DXGK_GPUMMUCAPS::LeafPageTableSizeFor64KPagesInBytes.
The UpdatePageTable operation has a flag that indicates the type of page table being updated,
DXGK_UPDATEPAGETABLEFLAGS::Use64KBPages.
There are two modes of operation supported by WDDM v2:
1. The page table entries of the level 1 page table point either to a 4 KB page table or to a 64 KB page table.
2. The page table entries of the level 1 page table point to a 4 KB page table and a 64 KB page table at the same
time. This is called "dual PTE" mode.
Dual PTE support is expressed by the DXGK_GPUMMUCAPS::DualPteSupported cap. The video memory
manager chooses the page size based on the allocation alignment, graphics processing unit (GPU) memory
segment properties, and the GPU memory segment type. An allocation is mapped using 64 KB pages if its
alignment and size are multiples of 64 KB and it is resident in a memory segment that supports 64 KB pages.
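That eligibility rule can be restated compactly as code. This is plain C for illustration; Segment64KBCapable stands in for whatever per-segment capability flag the driver tracks.

#include <stdbool.h>
#include <stdint.h>

#define PAGE_4KB  (4u * 1024u)
#define PAGE_64KB (64u * 1024u)

/* True if the allocation can be mapped with 64 KB pages. */
static bool Use64KBPages(uint64_t GpuVa, uint64_t Size, bool Segment64KBCapable)
{
    return Segment64KBCapable &&
           (GpuVa % PAGE_64KB) == 0 &&   /* 64 KB aligned start         */
           (Size  % PAGE_64KB) == 0;     /* size is a multiple of 64 KB */
}

/* A leaf page table covers the same VA range in both modes, so it holds 16x
 * fewer entries when it uses 64 KB pages. */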

Single PTE mode


In this mode the page table entries of the level 1 page table point either to a 4 KB page table or a 64 KB page table.
The DXGK_PTE::PageTablePageSize field is added to DXGK_PTE. It should be used only for page table entries of the
level 1 page table (the page directory in the old terminology). This field tells the kernel mode driver the type of the
corresponding page table (using 64 KB or 4 KB pages).
The video memory manager chooses to use a 64 KB page table for a virtual address range when:
Only 64 KB aligned allocations are mapped to the range.
The memory segments of all allocations mapped to the range support 64 KB pages.
When a virtual address range is mapped by 64 KB pages and the above conditions are no longer valid (for example,
an allocation is committed to the system memory segment), the video memory manager switches from the 64 KB
page table to the 4 KB page table.
When a page table has only 64 KB page table entries and a page table entry needs to point to a 4 KB page (for
example, an allocation is placed in system memory), the page table is converted to use 4 KB page table entries.
The conversion is done as follows:
1. All contexts of the process are suspended.
2. Existing page table entries are updated to point to 4KB pages. The driver will get the UpdatePageTable paging
operation.
3. The level 1 page table entry that points to the page table will be updated to reflect the new page size
(PageTablePageSize = DXGK_PTE_PAGE_TABLE_PAGE_4KB). The driver will get the UpdatePageTable
paging operation.
4. All contexts of the process are resumed.
When a page table has only 4KB page table entries and the number of page table entries that must point to 4KB
pages is zero, the page table will be converted to use 64 KB page table entries.
The conversion is done as follows:
1. All contexts of the process are suspended.
2. Existing page table entries are updated to point to 64KB pages. The driver will get the UpdatePageTable paging
operation.
3. The level 1 page table entry that points to the page table will be updated to reflect the new page size
(PageTablePageSize = DXGK_PTE_PAGE_TABLE_PAGE_64KB). The driver will get the UpdatePageTable
paging operation.
4. All contexts of the process are resumed.
To prevent frequent switches between different page table sizes, the driver should pack small allocations together.

Dual PTE mode


In this mode the page table entries of the level 1 page table might point to a 4 KB page table and a 64 KB page table
at the same time.
Both pointers in an entry of the level 1 page table might have the Valid flag set, but the entries in the level 0 page
tables that cover the same 64 KB virtual address range cannot be valid at the same time.
When an allocation that is covered by a 64 KB page table entry is placed in a memory segment with a 64 KB page
size, the 64 KB page table entry becomes invalid and the corresponding 4 KB page table entries become valid.
In the following diagram, a 4 KB allocation and a 64 KB aligned allocation are in the same virtual address range
covered by a level 0 page table and in a segment that supports 64 KB pages.
Swizzling ranges

Swizzling ranges are no longer supported in Windows Display Driver Model (WDDM) v2.
Context allocation

To allocate memory for the context save area of a context, the kernel mode driver can use context allocations via
DxgkCbCreateContextAllocation. Some new functionality is added to context allocations to make them fit into the
new graphics processing unit (GPU) virtual address model.

AccessedPhysically
A context allocation can specify the AccessedPhysically flags to indicate that the allocation should be allocated
contiguously in a memory segment or mapped into the aperture if accessed from system memory.

Assigning a GPU virtual address to a context allocation


The video memory manager exposes a new DxgkCbMapContextAllocation service to the kernel mode driver to
allocate a GPU virtual address to a context allocation.
Context allocations are mapped into the application GPU virtual address space associated with the specified context.
Note The driver should be careful not to expose privileged information when a context allocation is to be mapped
directly to an application GPU virtual address space.
These services behave like their user mode counterparts.

Updating the content of a context allocation


It may sometimes be necessary for the kernel mode driver to update the content of a context allocation. For example,
a privileged (AccessedPhysically, no GPU virtual mapping) context allocation may contain a reference to the page
directory associated with a particular context. When the kernel mode driver is notified of a page directory
relocation by DxgkDdiSetRootPageTable, it may need to update the content of that context allocation.
For this purpose, a new DxgkCbUpdateContextAllocation device driver interface (DDI) is added. This DDI queues a
request to the video memory manager to initiate an update of the context allocation. The context allocation being
updated is mapped into the scratch area of the video memory manager paging process, then the driver is called
with a new UpdateContextAllocation paging operation to do the actual update of the context allocation. The video
memory manager returns from DxgkCbUpdateContextAllocation after the update is completed.
The kernel mode driver can pass some private driver data between its calls to DxgkCbUpdateContextAllocation and
the resulting UpdateContextAllocation paging operation.
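The overall flow looks roughly like the sketch below. It uses hypothetical wrapper functions around DxgkCbCreateContextAllocation, DxgkCbMapContextAllocation, and DxgkCbUpdateContextAllocation; the real DXGKARGCB_* argument structures are intentionally omitted and should be taken from the WDK headers.

/* Sketch of the lifetime of a context save-area allocation. The wrappers are
 * hypothetical; each would fill the corresponding DXGKARGCB_* structure and
 * invoke the matching DxgkCb* callback. */
typedef void *CTXALLOC_HANDLE;

CTXALLOC_HANDLE CreateContextSaveArea(HANDLE hContext, SIZE_T Size);        /* wraps DxgkCbCreateContextAllocation */
ULONGLONG       MapContextSaveArea(HANDLE hContext, CTXALLOC_HANDLE h);     /* wraps DxgkCbMapContextAllocation    */
void            QueueSaveAreaUpdate(HANDLE hContext, CTXALLOC_HANDLE h,
                                    const void *Data, SIZE_T Bytes);        /* wraps DxgkCbUpdateContextAllocation */

void SetupContextState(HANDLE hContext)
{
    /* 1. Allocate the context save area (kept resident by the video memory manager). */
    CTXALLOC_HANDLE save = CreateContextSaveArea(hContext, 64 * 1024);

    /* 2. Give it a GPU virtual address in this context's address space. */
    ULONGLONG gpuVa = MapContextSaveArea(hContext, save);
    (void)gpuVa;

    /* 3. Later, when DxgkDdiSetRootPageTable reports that the page directory
     *    moved, queue an update of the save area's content; the driver then
     *    receives an UpdateContextAllocation paging operation to do the write. */
    /* QueueSaveAreaUpdate(hContext, save, &newRootPageTableInfo, sizeof(newRootPageTableInfo)); */
}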
GpuMmu Example Scenarios

This topic describes common usage scenarios and the sequence of operations necessary to implement them.
These scenarios include:
Updating page table entries of a process
Transferring allocation content from one location to another
Filling an allocation with a pattern
Making an allocation resident in system memory
Initialization of the memory manager control structures

Updating page table entries of a process


Here is the sequence of operations to update page table entries to map an allocation that belongs to a process (P)
to physical memory. It is assumed that the page table allocations are already resident in a graphics processing unit
(GPU) memory segment.
1. The video memory manager allocates a virtual address range in the paging process context for the root page
table allocation of the process P.
2. The video memory manager allocates a virtual address range in the paging process context for the page table
allocations of the process P.
3. The video memory manager calls DxgkDdiBuildPagingBuffer with the UpdatePageTable command to map the
paging process page table entries to the process P page tables and the page directory.
4. The video memory manager calls DxgkDdiBuildPagingBuffer with the FlushTLB(PagingProcessRootPageTable)
command.
5. The video memory manager calls DxgkDdiBuildPagingBuffer with the UpdatePageTable command to fill the
process page table entries with physical address information.
6. The video memory manager calls DxgkDdiBuildPagingBuffer with the FlushTLB(process P root page table)
command.
7. The paging buffer is submitted for execution in the paging process context.
Transferring allocation content from one location to another
Here is the sequence of operations when transferring an allocation's content from one location to another (for
example, from local memory to system memory).
1. The video memory manager allocates virtual address ranges for the source allocation and the destination
allocation in the paging process virtual address scratch area.
2. The video memory manager calls DxgkDdiBuildPagingBuffer with the UpdatePageTable command. The
command maps the paging process page table entries for the source virtual address range to the allocation
physical address in the local GPU memory.
3. The video memory manager calls DxgkDdiBuildPagingBuffer with UpdatePageTable command. The command
maps the paging process page table entries for the destination virtual address to system memory.
4. The video memory manager calls DxgkDdiBuildPagingBuffer with the FlushTLB(paging process root page table).
5. The video memory manager calls DxgkDdiBuildPagingBuffer with the TransferVirtual command to perform a
transfer operation.
6. The paging buffer is submitted to the GPU for execution in the paging process context.
Filling an allocation with a pattern
Here is the sequence of operations when an allocation needs to be filled with a pattern.
1. The video memory manager allocates a virtual address range for the destination allocation in the paging process
virtual address scratch area.
2. The video memory manager calls DxgkDdiBuildPagingBuffer with the UpdatePageTable command. The
command maps the paging process page table entries for the destination virtual address.
3. The video memory manager calls DxgkDdiBuildPagingBuffer with the FlushTLB(paging process root page table).
4. The video memory manager calls DxgkDdiBuildPagingBuffer with the FillVirtual command to perform the
operation.
5. The paging buffer is submitted to the GPU for execution in the paging process context.
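The paging operations used in these sequences all arrive through DxgkDdiBuildPagingBuffer. A skeleton of that dispatch is sketched below; the operation enum values follow the WDDM v2 commands named in this topic (verify the exact spellings against d3dkmddi.h), and the Write*Command helpers are hypothetical routines that emit the corresponding GPU commands into the paging buffer.

/* Skeleton only: dispatch the WDDM v2 paging operations described above. */
NTSTATUS WritePageTableUpdateCommands(HANDLE hAdapter, DXGKARG_BUILDPAGINGBUFFER *pArgs);
NTSTATUS WriteTlbFlushCommand(HANDLE hAdapter, DXGKARG_BUILDPAGINGBUFFER *pArgs);
NTSTATUS WriteTransferCommand(HANDLE hAdapter, DXGKARG_BUILDPAGINGBUFFER *pArgs);
NTSTATUS WriteFillCommand(HANDLE hAdapter, DXGKARG_BUILDPAGINGBUFFER *pArgs);

NTSTATUS MyDdiBuildPagingBuffer(HANDLE hAdapter, DXGKARG_BUILDPAGINGBUFFER *pArgs)
{
    switch (pArgs->Operation) {
    case DXGK_OPERATION_UPDATE_PAGE_TABLE:
        return WritePageTableUpdateCommands(hAdapter, pArgs);  /* write page table entries */
    case DXGK_OPERATION_FLUSH_TLB:
        return WriteTlbFlushCommand(hAdapter, pArgs);          /* flush the GPU TLB        */
    case DXGK_OPERATION_TRANSFER_VIRTUAL:
        return WriteTransferCommand(hAdapter, pArgs);          /* copy between VA ranges   */
    case DXGK_OPERATION_FILL_VIRTUAL:
        return WriteFillCommand(hAdapter, pArgs);              /* fill a VA range with a pattern */
    default:
        return STATUS_NOT_SUPPORTED;
    }
}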

Making an allocation resident in system memory


The following operations are performed when D3DKMTMakeResident is called to make an allocation resident. It
is assumed that the application process page tables are resident in memory.
In the application thread context:
1. Allocate and pin physical system memory pages for the allocation virtual address range (if the allocation is
resident in system memory).
2. Generate a new paging fence ID for the application device.
3. Submit a MakeResident command to the video memory manager worker thread.
4. Return to the application.
In the video memory manager worker thread context:
1. Update the application process page table entries (see the corresponding section above).
2. If the allocation is resident in a local memory segment, fill the allocation with zeros (see the corresponding
section above).
3. Submit the SignalSynchronizationObject command to the scheduler with the paging fence ID.

Initialization of the memory manager control structures


The paging process initialization
The Microsoft DirectX graphics kernel initializes the paging process virtual address space when the graphics device
is switched to the D0 power state:
1. The paging process is created with DxgkDdiCreateProcess.
2. The system device is created with DxgkDdiCreateDevice. At this point the kernel mode driver can reserve a
virtual address range in the paging process address space.
3. Page table allocations are created for the paging process.
4. The page table allocations are committed to the memory segments that are defined in the virtual addressing
capability structure.
5. UpdatePageTable operations are called to initialize the page tables.
A client process initialization
When a new process is created, the DirectX graphics kernel will:
Create the initial page table allocations.
Initialize the page table allocations when the first allocation from the process is made resident.
IoMmu model

In the IoMmu model each process has a single virtual address space that is shared between the CPU and graphics
processing unit (GPU) and is managed by the OS memory manager.
To access memory, the GPU sends a data request to a compliant IoMmu. The request includes a shared virtual
address and a process address space identifier (PASID). The IoMmu unit performs the address translation using the
shared page table. This is illustrated below:

The kernel mode driver expresses support for the IoMmu model by setting the
DXGK_VIDMMCAPS::IoMmuSupported cap. When this flag is set, the video memory manager automatically
registers any process using the GPU with the IoMmu and obtains a PASID for that process address space. The
PASID is passed to the driver during device creation.
Primary allocations are mapped by the video memory manager into the aperture segment before being displayed,
ensuring that the display controller has physical access to these allocations.
In the IoMmu model, the driver continues to allocate video memory for the GPU using the video memory
manager's Allocate service. This allows the user mode driver to follow the residency model, support the Microsoft
DirectX resource sharing model, and ensure that primary surfaces are visible to the kernel and are mapped into
the aperture before being displayed.
The first level of translation (tiled resource address to shared CPU/GPU address) is managed entirely in user mode
by the user mode driver.
Driver residency in WDDM 2.0

This section provides details about the driver residency changes for Windows Display Driver Model (WDDM) 2.0.
The functionality described is available starting with Windows 10.

In this section
Residency overview: With the introduction of the new residency model, residency is being moved to an explicit list
on the device instead of the per-command buffer list. The video memory manager will ensure that all allocations on
a particular device residency requirement list are resident before any contexts belonging to that device are
scheduled for execution.

Allocation usage tracking: With the allocation list going away, the video memory manager no longer has visibility
into the allocations being referenced in a particular command buffer. As a result, the video memory manager is no
longer in a position to track allocation usage and to handle related synchronization. This responsibility will now fall
to the user mode driver. In particular, the user mode driver will have to handle the synchronization with respect to
direct CPU access to allocations as well as renaming.

Offer and reclaim changes: For WDDM v2, requirements around Offer and Reclaim are being relaxed. User mode
drivers are no longer required to use offer and reclaim on internal allocations. Idle or suspended applications will
get rid of driver internal resources by using the Trim API that was introduced in Microsoft DirectX 11.1.

Access to non-resident allocation: Graphics processing unit (GPU) access to allocations which are not resident is
illegal and will result in a device removed for the application that generated the error. There are two distinct models
of handling such invalid access, depending on whether the faulting engine supports GPU virtual addressing or not.
For engines that don't support GPU virtual addressing and use the allocation and patch location list to patch
memory references, an invalid access occurs when the user mode driver submits an allocation list which references
an allocation which is not resident on the device (i.e. the user mode driver hasn't called MakeResidentCb on that
allocation). When this occurs, the graphics kernel puts the faulty context/device in error. For engines that do
support GPU virtual addressing but access a GPU virtual address that is invalid, either because there is no allocation
behind the virtual address or there is a valid allocation but it hasn't been made resident, the GPU is expected to
raise an unrecoverable page fault in the form of an interrupt. When the page fault interrupt occurs, the kernel mode
driver needs to forward the error to the graphics kernel through a new page fault notification. Upon receiving this
notification, the graphics kernel initiates an engine reset on the faulting engine and puts the faulty context/device in
error. If the engine reset is unsuccessful, the graphics kernel promotes the error to a full adapter-wide timeout
detection and recovery (TDR).

Process residency budgets: In WDDM v2, processes will be assigned budgets for how much memory they can keep
resident. This budget can change over time, but generally will only be imposed when the system is under memory
pressure. Prior to Microsoft Direct3D 12, the budget is handled by the user mode driver in the form of Trim
notifications and MakeResident failures with STATUS_NO_MEMORY. The TrimToBudget notification, Evict, and
failed MakeResident calls all return the latest budget in the form of an integer NumBytesToTrim value that indicates
how much needs to be trimmed in order to fit in the new budget.



Residency overview

Overview
Today the user mode driver builds allocation and patch location list information along with every command buffer
it builds. This information is used by the video memory manager for two purposes:
The allocation list and patch location list are used to patch command buffers with actual segment addresses
before they are submitted to a graphics processing unit (GPU) engine. GPU virtual address support in the
Windows Display Driver Model (WDDM) v2 removes the need for this patching.
The allocation list and patch location list are used by the video memory manager to control residency of
allocation. The video memory manager ensures that any allocations referenced by a command buffer are made
resident before the command buffer is sent to execution for a particular engine.
With the introduction of the new residency model, residency is being moved to an explicit list on the device instead
of the per-command buffer list. The video memory manager will ensure that all allocations on a particular device
residency requirement list are resident before any contexts belonging to that device are scheduled for execution.
To manage residency, the user mode driver will have access to two new device driver interfaces (DDIs),
MakeResident and Evict, as well as be required to implement a new TrimResidency callback. MakeResident will add
one or more allocations to a device residency requirement list. Evict will remove one or more allocations from that
list. The TrimResidency callback will be called by the video memory manager when it needs the user mode driver to
reduce its residency requirement.
MakeResident and Evict have also been updated to keep an internal reference count, meaning multiple calls to
MakeResident will require an equal number of Evict calls to actually evict the allocation.
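The reference-counting behavior can be illustrated with a small sketch; MakeResidentWrapper and EvictWrapper are hypothetical helpers standing in for the driver's calls through the new MakeResident and Evict DDIs.

typedef void *MY_ALLOCATION_HANDLE;                     /* hypothetical handle type */
void MakeResidentWrapper(MY_ALLOCATION_HANDLE hAlloc);  /* wraps the MakeResident DDI */
void EvictWrapper(MY_ALLOCATION_HANDLE hAlloc);         /* wraps the Evict DDI        */

/* The allocation only leaves the device residency requirement list once every
 * MakeResident call has been balanced by an Evict call. */
void ResidencyRefCountExample(MY_ALLOCATION_HANDLE hAllocation)
{
    MakeResidentWrapper(hAllocation);  /* count 1: allocation must stay resident  */
    MakeResidentWrapper(hAllocation);  /* count 2                                 */

    EvictWrapper(hAllocation);         /* count 1: still on the residency list    */
    EvictWrapper(hAllocation);         /* count 0: allocation may now be evicted  */
}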
Under the new residency model, the per-command buffer allocation and patch location list are being slowly phased
out. While these lists will exist in some scenarios, they will no longer have any control over residency.
Important Residency in the WDDM v2 is controlled exclusively by the device residency requirement list. This is
true across all engines of the GPU and for every API.

Phasing out allocation and patch location list


The role of the allocation and patch location list will get significantly reduced with the introduction of the new
residency model and will actually go completely away with the introduction of hardware assisted scheduling.
Under the packet based scheduling model, the allocation list will continue to exist as follows:
For engines which don’t support GPU virtual addressing, the allocation list and patch location list will
continue to exist, however, they will be used purely for patching purposes and will no longer have any control
over residency. The allocation list and patch location list will be provided to both the user mode driver and the
kernel mode driver in the various usual DDIs, but any references to allocations that are not resident will cause
the GPU scheduler to reject the submission and put the device in error (lost). This mode of operation is
considered legacy and we expect all GPU engines to get support for GPU virtual addressing in future hardware
releases. It is expected that this mode of operation will be dropped in future versions of the WDDM.
For engines which do support GPU virtual addressing, a new context creation flag
(DXGK_CONTEXTINFO_NO_PATCHING_REQUIRED) is added to indicate that the particular context
doesn’t require any patching. When this flag is specified, no patch location list will be allocated and only a
very small allocation list (16 entries) will be allocated. The allocation list will be used to keep track of write
references to primary surfaces and for no other purpose. The GPU scheduler needs to know when a particular
command buffer is writing to a primary surface such that it may properly synchronize execution of that buffer
with respect to flips potentially occurring on the primary surface.
Similarly, the allocation list is used in the kernel mode driver Present path today to pass information to the driver
about the source and destination of the Present operation. In this context the allocation list will continue to exist to
pass parameters around, however, the allocation list will not be used for residency. On GPUs requiring patching the
Present allocation list will contain pre-patch information like it does today and the Present packet will be re-patched
before being scheduled if any of the resources move around in memory between the time they are queued to the
scheduler and the time they are scheduled for execution on the GPU.
The table below summarizes when a WDDM v2 driver should expect to receive an allocation and patch location list
in various user mode driver and kernel mode driver DDIs.

No GPU virtual address support (requires patching; default):
Allocation list: Yes, full size, but used purely for patching purposes. Any reference to an allocation that is not
resident will result in the submitting device being put in error (lost) and the submission rejected by the scheduler.
Patch location list: Yes, full size.

GPU virtual address support (DXGK_CONTEXTINFO_NO_PATCHING_REQUIRED flag set):
Allocation list: Yes, 16 entries. References the primary surface, if any, being written to by the command buffer.
Used by the GPU scheduler for synchronization with flips occurring on the display controller. The primary surface
must already be on the device residency requirement list or the reference will be rejected.
Patch location list: No.

GPU virtual address support + hardware scheduling:
Allocation list: No.
Patch location list: No.



Allocation usage tracking

With the allocation list going away, the video memory manager no longer has visibility into the allocations being
referenced in a particular command buffer. As a result of this, the video memory manager is no longer in a position
to track allocation usage and to handle related synchronization. This responsibility will now fall to the user mode
driver. In particular, the user mode driver will have to handle the synchronization with respect to direct CPU access
to allocation as well as renaming.
For allocation destruction, the video memory manager will asynchronously defer these in a safe manner that will
be both non-blocking for the calling thread and very performant. As such a user mode driver doesn’t have to
worry about having to defer allocation destruction. When an allocation destruction request is received, the video
memory manager assumes, by default, that commands queued prior to the destruction request may potentially
access the allocation being destroyed and defers the destruction operation until the queued commands finish. If the
user mode driver knows that pending commands don't access the allocation being destroyed, it can instruct the
video memory manager to process the request without waiting by setting the AssumeNotInUse flag when calling
Deallocate2 or DestroyAllocation2.

Lock2
The user mode driver will be responsible for handling proper synchronization with respect to direct CPU access. In
particular, a user mode driver will be required to support the following:
1. Support no-overwrite and discard lock semantics. This implies that the user mode driver will have to implement
its own renaming scheme.
2. For map operations requiring synchronization (i.e. not the above no-overwrite or discard), the user mode
driver will be required to:
Return WasStillDrawing if an attempt is made to access an allocation which is currently busy and the
caller has requested that the Lock operation not block the calling thread
(D3D11_MAP_FLAG_DO_NOT_WAIT).
Or, if the D3D11_MAP_FLAG_DO_NOT_WAIT flags is not set, wait until an allocation becomes available
for CPU access. The user mode driver will be required to implement a non-polling wait. The user mode
driver will make use of the new context monitoring mechanism.
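A sketch of that synchronization decision in a user mode driver map path follows. It assumes the driver tracks the last GPU fence value that touched the allocation and reads the monitored fence's CPU-visible value; MY_ALLOC and WaitForFenceOnCpu are hypothetical (the latter would wrap WaitForSynchronizationObjectFromCpuCb), and the return code shown is the DXGI DDI value commonly used for this case.

/* Hypothetical per-allocation tracking kept by the user mode driver. */
typedef struct {
    UINT64                 LastSubmittedFenceValue;  /* last GPU work that used the allocation */
    volatile const UINT64 *FenceCpuVa;               /* read-only fence mapping                */
    HANDLE                 hFence;                   /* monitored fence handle                 */
} MY_ALLOC;

void WaitForFenceOnCpu(HANDLE hFence, UINT64 Value);  /* wraps WaitForSynchronizationObjectFromCpuCb */

HRESULT MapSynchronized(MY_ALLOC *Alloc, UINT MapFlags)
{
    UINT64 completed = *Alloc->FenceCpuVa;   /* current fence progress, read by the CPU */

    if (completed < Alloc->LastSubmittedFenceValue) {
        if (MapFlags & D3D11_MAP_FLAG_DO_NOT_WAIT) {
            return DXGI_DDI_ERR_WASSTILLDRAWING;   /* caller asked not to block */
        }
        /* Non-polling wait until the monitored fence reaches the required value. */
        WaitForFenceOnCpu(Alloc->hFence, Alloc->LastSubmittedFenceValue);
    }
    return S_OK;   /* safe to hand out the CPU virtual address */
}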
For now, the user mode driver will continue to need to call LockCb/UnlockCb to ask the video memory manager to
set up an allocation for CPU access. In most cases, the user mode driver will be able to keep the allocation mapped
for its entire lifetime. However, in the future, LockCb and UnlockCb will be deprecated in favor of new Lock2Cb and
Unlock2Cb calls. The goal of these new callbacks is to provide a fresh clean implementation with a fresh set of
arguments and flags.
Swizzling ranges are removed from the Windows Display Driver Model (WDDM) v2 and it is the responsibility of
the driver developer to remove the dependency on swizzling ranges from calls to LockCb as they move towards an
implementation that is based on Lock2Cb.
Lock2Cb is exposed as a simple method for obtaining a virtual address to an allocation. There are a few restrictions
based on the type of allocation as well as the segment it is currently resident in.
The following apply for CPUVisible allocations:
Cached CPUVisible allocations must reside within an aperture segment or not be resident in order to be locked.
We cannot guarantee cache coherency between the CPU and a memory segment on the graphics processing
unit (GPU).
CPUVisible allocations located in a fully CPUVisible memory segment (resized using the resizable BAR) are
guaranteed to be lockable and able to return a virtual address. No special constraints are required in this
scenario.
CPUVisible allocations located within a !CPUVisible memory segment (with or without access to a
CPUHostAperture) can fail to be mapped into a CPU virtual address for various reasons. If the CPUHostAperture
is out of available space or the allocation does not specify an aperture segment, a virtual address is impossible
to obtain. For this reason we require that all CPUVisible allocations in !CPUVisible memory segments must
contain an aperture segment in their supported segment set to guarantee that we will be able to place the
allocation within system memory and provide a virtual address.
CPUVisible allocations already located within system memory (and/or mapped into an aperture segment) are
guaranteed to work.
The following applies for !CPUVisible allocations:
CPUVisible allocations are backed by section objects, which cannot point directly to the GPU's frame buffer. In
order to lock a !CPUVisible allocation, we require that the allocation support an aperture segment in the
supported segment set, or already be in system memory (must not be resident on the device).
If an allocation is successfully locked while the allocation is not resident on the device, but does not support an
aperture segment, the allocation must be guaranteed to not be committed into a memory segment during the
duration of the lock.
Lock2 currently contains no flags, and Reserved flag bits must all be 0.

CPUHostAperture
To better support locking with !CPUVisible memory segments when resizing the BAR fails, a CPUHostAperture is
provided in the PCI aperture. The CPUHostAperture behaves as a page-based manager which can then be mapped
directly to regions of video memory via the DxgkDdiMapCpuHostAperture device driver interface (DDI) function.
This allows us to then map a range of virtual address space directly to a non-contiguous range of the
CPUHostAperture, and have the CPUHostAperture then map to video memory without the need for swizzling
ranges.
The maximum amount of lockable memory that can be referenced by the CPU within !CPUVisible memory
segments is limited to the size of the CPUHostAperture. The details for exposing the CPUHostAperture to the
Microsoft DirectX graphics kernel can be found in the CPU host aperture topic.

I/O coherency
On x86/x64 today, we require that all GPUs support I/O coherency over PCIe in order to allow a GPU to read or
write to a cacheable system memory surface and maintain coherency with the CPU. When a surface is mapped as
cache coherent from the point of view of the GPU, the GPU needs to snoop the CPU caches when accessing the
surface. This form of coherency is typically used for resources that the CPU is expected to read from, such as some
staging surfaces.
On some ARM platforms, I/O coherency is not supported directly in hardware. On these platforms, I/O coherency
needs to be emulated by manually invalidating the CPU cache hierarchy. The video memory manager achieves this
today by tracking operations to an allocation coming from the GPU (allocation list read/write operation) as well as
the CPU (Map operation, read/write) and emitting a cache invalidation when we determine the cache may either
contain data that needs to be written back (CPU write, GPU read) or contain stale data that needs to be invalidated
(GPU write, CPU reads).
On platforms with no I/O coherency, the responsibility to track CPU and GPU access to allocations falls to the user
mode driver. The graphics kernel exposes a new InvalidateCache DDI that the user mode driver may use to write
back and invalidate the virtual address range associated with a cacheable allocation. On platforms which do not
have support for I/O coherency, the user mode driver is required to call this function after a CPU write and
before a GPU read, as well as after a GPU write and before a CPU read. The latter may seem unintuitive at first, but since the
CPU could have speculatively read data prior to the GPU write making it to memory, it is necessary to invalidate all
CPU caches to ensure the CPU re-reads data from RAM.
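A sketch of the required ordering follows. InvalidateCacheRange is a hypothetical wrapper around the new DDI mentioned above, and the other helpers are placeholders for ordinary driver or application work.

void WriteVerticesOnCpu(void *p, SIZE_T n);
void SubmitGpuWork(void);
void WaitForGpuCompletion(void);               /* e.g. wait on a monitored fence */
void ReadResultsOnCpu(void *p, SIZE_T n);
void InvalidateCacheRange(void *p, SIZE_T n);  /* hypothetical wrapper for the InvalidateCache DDI */

/* Sketch only: the two hand-off directions on a platform without I/O coherency. */
void CpuToGpuHandoff(void *CpuVa, SIZE_T Bytes)
{
    WriteVerticesOnCpu(CpuVa, Bytes);     /* CPU writes into the cacheable allocation      */
    InvalidateCacheRange(CpuVa, Bytes);   /* write back dirty lines before the GPU reads   */
    SubmitGpuWork();                      /* GPU now sees the data in RAM                  */
}

void GpuToCpuHandoff(void *CpuVa, SIZE_T Bytes)
{
    SubmitGpuWork();                      /* GPU writes results to the allocation          */
    WaitForGpuCompletion();
    InvalidateCacheRange(CpuVa, Bytes);   /* drop lines the CPU may have read speculatively */
    ReadResultsOnCpu(CpuVa, Bytes);       /* CPU re-reads the data from RAM                 */
}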
Offer and reclaim changes

For Windows Display Driver Model (WDDM) v2, requirements around Offer and Reclaim are being relaxed. User
mode drivers are no longer required to use offer and reclaim on internal allocations. Idle/suspended applications
will get rid of driver internal resources by using the TrimAPI that was introduced in Microsoft DirectX 11.1.
Offer and reclaim will continue to be supported at the API level and the user mode driver is required to forward
application requests to offer or reclaim resources to the kernel. Under WDDM v2, offering allocation is no longer
supported through the allocation list and as a result the user mode driver needs to change the way it implements
offer and reclaim.
Resources being offered by an application should be offered immediately by the user mode driver, by calling
OfferCb, if the resources have no reference in the direct memory access (DMA) buffers currently being built across
all contexts. If the resources have pending references in the DMA buffer being built, the user mode driver should
defer the call to OfferCb until after the dependent DMA buffer has been submitted through RenderCb. The
graphics kernel will take care of deferring the operation, in a non-blocking way, until it is safe to offer the resource
and as such the user mode driver doesn’t need to worry about having to defer the call to OfferCb until the
dependent operation completes on the graphics processing unit (GPU).
Calling reclaim will automatically page in an allocation if it is in the residency requirement list (i.e. the user or driver
has requested the allocation to be resident via a MakeResidentCb call). For ReclaimAllocations2Cb, this operation
is asynchronous, and a paging fence is returned and should be handled the same way as fences returned from
MakeResidentCb. The allocation is guaranteed to be resident and usable on the GPU when the fence is signaled.
Immediately after returning from ReclaimAllocationsCb/ReclaimAllocations2Cb, the backing store of the
allocation is guaranteed to be valid and the allocation may be placed under CPU access via Lock2Cb. The driver
does not need to wait on the paging fence to do so.
Access to non-resident allocation

Graphics processing unit (GPU) access to allocations which are not resident is illegal and will result in a device
removed for the application that generated the error.
There are two distinct models of handling such invalid access dependent on whether the faulting engine supports
GPU virtual addressing or not:
For engines which don’t support GPU virtual addressing and use the allocation and patch location list to patch
memory references, an invalid access occurs when the user mode driver submits an allocation list which
references an allocation which is not resident on the device (i.e. the user mode driver hasn’t called
MakeResidentCb on that allocation). When this occurs, the graphics kernel will put the faulty context/device in
error.
For engines which do support GPU virtual addressing but access a GPU virtual address that is invalid, either
because there is no allocation behind the virtual address or there is a valid allocation but it hasn’t been made
resident, the GPU is expected to raise an unrecoverable page fault in the form of an interrupt. When the page
fault interrupt occurs, the kernel mode driver will need to forward the error to the graphics kernel through a
new page fault notification. Upon receiving this notification, the graphics kernel will initiate an engine reset on
the faulting engine and put the faulty context/device in error. If the engine reset is unsuccessful, the graphics
kernel will promote the error to a full adapter wide timeout detection and recovery (TDR).
Process residency budgets

In Windows Display Driver Model (WDDM) v2, processes will be assigned budgets for how much memory they can
keep resident. This budget can change over time, but generally will only be imposed when the system is under
memory pressure. Prior to Microsoft Direct3D 12, the budget is handled by the user mode driver in the form of
Trim notifications and MakeResident failures with STATUS_NO_MEMORY. TrimToBudget notification, Evict, and
failed MakeResident calls all return the latest budget in the form of an integer NumBytesToTrim value that
indicates how much needs to be trimmed in order to fit in the new budget.
For Direct3D 12 applications, the budget is handled completely by the application. The size of the budget is meant
as a cue to let the application know what to size itself to. By using the budget size as a hint, the application can
decide how many resources to keep resident, what resolution and quality of resources to keep.
To properly manage these budgets, the kernel needs to know what memory should participate in the budget. There
is a new ApplicationTarget bit in DXGK_SEGMENTFLAGS2 structure that needs to be set on segments that the
kernel mode driver wishes to be included in the budgeting logic. For example, on a discrete graphics processing
unit (GPU) with 1 segment of VRAM that’s suitable for application usage, and 1 segment of VRAM that’s
used for special-purpose resources automatically, the driver would likely only mark the primary VRAM segment as
ApplicationTarget. For integrated GPUs, the main aperture segment will usually be the one marked. There is no
limit to how many segments can be marked as ApplicationTarget. The kernel will aggregate these together and
present the application with a unified size.
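For the pre-Direct3D 12 case, a minimal sketch of a user mode driver reacting to the budget is shown below. The bookkeeping helpers are hypothetical; EvictAllocation would wrap the Evict callback, and the allocation list is assumed to be ordered least-recently-used first.

typedef struct MY_TRACKED_ALLOC {
    struct MY_TRACKED_ALLOC *Next;
    UINT64                   Size;
} MY_TRACKED_ALLOC;

typedef struct { MY_TRACKED_ALLOC *LruHead; } MY_DEVICE;   /* hypothetical driver state */

MY_TRACKED_ALLOC *PopLeastRecentlyUsed(MY_DEVICE *Dev);           /* hypothetical helper       */
void EvictAllocation(MY_DEVICE *Dev, MY_TRACKED_ALLOC *Alloc);    /* wraps the Evict callback  */

/* Sketch: release least recently used internal allocations until at least
 * NumBytesToTrim bytes have been removed from the residency requirement. */
void OnTrimToBudget(MY_DEVICE *Dev, UINT64 NumBytesToTrim)
{
    UINT64 trimmed = 0;
    while (trimmed < NumBytesToTrim) {
        MY_TRACKED_ALLOC *victim = PopLeastRecentlyUsed(Dev);
        if (victim == NULL) {
            break;                    /* nothing left to give back */
        }
        trimmed += victim->Size;
        EvictAllocation(Dev, victim); /* removes it from the residency list */
    }
}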
Context monitoring

A monitored fence object is an advanced form of fence synchronization which allows either a CPU core or a
graphics processing unit (GPU) engine to signal or wait on a particular fence object, allowing for very flexible
synchronization between GPU engines, or across CPU cores and GPU engines.

Monitored fence creation


A monitored fence object is created by calling the CreateSynchronizationObjectCb callback with the new
synchronization object type D3DDDI_MONITORED_FENCE.
A monitored fence object is created with the following attributes:
Initial value
Flags (specifying its waiting and signaling behavior)
Upon creation, the graphics kernel returns a fence object composed of the following items:

hSyncObject: Handle to the synchronization object. Used to refer to it in a call to the graphics kernel.

FenceValueCPUVirtualAddress: Read-only mapping of the fence value (64 bits) for the CPU. This address is mapped
write-back (cacheable) from the point of view of the CPU on platforms supporting I/O coherency, and uncached
(UC) on other platforms. It allows the CPU to keep track of the fence progress by simply reading this memory
location. The CPU is not allowed to write to this memory location; to signal the fence, the CPU is required to call
SignalSynchronizationObjectFromCpuCb. Adapters which support IoMmu should use this address for GPU access;
the address is mapped as read-write in this case.

FenceValueGPUVirtualAddress: Read/write mapping of the fence value (64 bits) for the GPU. This address is mapped
as requiring I/O coherency on platforms supporting it. To signal the fence, the GPU is allowed to write directly to
this GPU virtual address. This address should not be used by IoMmu GPUs.

The fence value is a 64-bit value, and the corresponding virtual addresses are aligned on a 64-bit boundary. A GPU
that is not capable of atomically updating a 64-bit value, as seen by the CPU, declares this via the new
DXGK_VIDSCHCAPS::No64BitAtomics flag. If a GPU is capable of only updating 32-bit values atomically, the OS
handles the fence wraparound case automatically; however, it places a restriction that outstanding wait and
signal fence values cannot be more than UINT_MAX/2 away from the last signaled fence value.
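A sketch of creating such an object from the user mode driver follows. It assumes the D3DDDICB_CREATESYNCHRONIZATIONOBJECT2 / D3DDDI_SYNCHRONIZATIONOBJECTINFO2 layout from d3dumddi.h; treat the exact member names as assumptions to be checked against the WDK headers, and MY_FENCE as hypothetical driver state.

/* Hypothetical driver-side record of a monitored fence. */
typedef struct {
    D3DKMT_HANDLE          hSyncObject;   /* handle used in later signal/wait callbacks   */
    const UINT64          *CpuValue;      /* read-only, CPU-visible fence progress        */
    UINT64                 GpuValue;      /* GPU virtual address the GPU writes to signal */
} MY_FENCE;

HRESULT CreateMonitoredFence(const D3DDDI_DEVICECALLBACKS *pCallbacks,
                             HANDLE hDevice, MY_FENCE *Out)
{
    D3DDDICB_CREATESYNCHRONIZATIONOBJECT2 args;
    memset(&args, 0, sizeof(args));
    args.Info.Type = D3DDDI_MONITORED_FENCE;
    args.Info.MonitoredFence.InitialFenceValue = 0;

    HRESULT hr = pCallbacks->pfnCreateSynchronizationObject2Cb(hDevice, &args);
    if (SUCCEEDED(hr)) {
        Out->hSyncObject = args.hSyncObject;
        Out->CpuValue    = (const UINT64 *)args.Info.MonitoredFence.FenceValueCPUVirtualAddress;
        Out->GpuValue    = args.Info.MonitoredFence.FenceValueGPUVirtualAddress;
    }
    return hr;
}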

GPU signal
In case a GPU engine is not capable of writing to a monitored fence using its virtual address, the user mode driver
will use a new SignalSynchronizationObjectFromGpuCb callback that will queue a software signal packet to the
GPU context.
To signal the fence from the GPU, the user mode driver inserts a fence write command in a context command
stream directly, without going through kernel mode. The mechanism by which the kernel monitors fence progress
varies depending on whether a particular GPU engine supports the basic or advanced implementation of the
monitored fence.
When a command buffer completes execution on the GPU, the graphics kernel will go through the list of fence
objects with pending waits that could be signaled for this process, read their current fence value, and determine if
there are any waiters that need to be un-waited.

GPU wait
To wait on a monitored fence on a GPU engine, the user mode driver will first need to flush its pending command
buffer then call WaitForSynchronizationObjectFromGpuCb specifying the fence object (hSyncObject) as well as
the fence value being waited on. The graphics kernel will queue the dependency to its internal database, then
return immediately to the user mode driver so that it may continue to queue work behind the wait operation.
Command buffers submitted after the wait operation will not be scheduled for execution until the wait operation
has been satisfied.

CPU signal
A new SignalSynchronizationObjectFromCpuCb has been added to allow the CPU to signal a monitored fence
object. When a monitored fence object is signaled by the CPU, the graphics kernel will update the fence memory
location with the signaled value so that it becomes immediately visible to any user mode reader as well as
immediately un-wait any satisfied waiters.

CPU wait
A new WaitForSynchronizationObjectFromCpuCb has been added to allow the CPU to wait on a monitored fence
object. Two forms of wait operations are available. In the first form, the WaitForSynchronizationObjectFromCpuCb
callback blocks until the wait has been satisfied. In the second form, WaitForSynchronizationObjectFromCpuCb
takes a handle to a CPU event that will be signaled once the waiting condition has been satisfied.
WDDM 1.2 and Windows 8

This section provides details about new features and enhancements in Windows Display Driver Model (WDDM)
version 1.2, which is available starting with Windows 8. It also describes hardware requirements, implementation
guidelines, and usage scenarios.

In this section
WDDM 1.2 features: This topic describes the WDDM version 1.2 feature set, which includes several new
enhancements that improve performance, reliability, and the overall end-user experience.

Advances to the display Infrastructure: Windows 8 provides enhancements and optimizations to the display
infrastructure to further improve the user experience.

Direct3D features and requirements in WDDM 1.2: Microsoft Direct3D offers a rich collection of 3-D graphics APIs,
which are widely used by software applications for complex visualization and game development. This section
describes feature improvements and Windows 8 Direct3D software and hardware requirements.

Graphics INF requirements in WDDM 1.2: WDDM drivers in Windows 8 require INF changes to the graphics driver.
The most notable change is in the feature score; WDDM 1.2 drivers require a higher feature score than earlier
WDDM drivers. This section describes all relevant INF requirements for Windows 8 graphics drivers.

WDDM 1.2 installation scenarios: The Windows 8 graphics driver installation behavior is designed to ensure that,
whenever possible, our customers get a graphics driver that has been tested and certified for Windows 8. This
behavior is defined by the rules that are described in this section.

WDDM 1.2 driver enforcement guidelines: This section describes WDDM 1.2 driver enforcement guidelines.

Introduction
The WDDM was introduced with Windows Vista as a replacement for the Windows XP and Windows 2000 Display
Driver Model (XDDM). With its introduction in Windows Vista, the WDDM architecture offered functionality to
enable new features such as desktop composition, enhanced fault tolerance, the video memory manager, the GPU
scheduler, cross-process sharing of Direct3D surfaces, and so on. WDDM was specifically designed for modern
graphics devices that supported Microsoft Direct3D 9 with Pixel Shader 2.0 or better and had all the necessary
hardware features to support the WDDM features. WDDM for Windows Vista was referred to as "WDDM 1.0."
Windows 7 made incremental changes to the driver model for supporting Windows 7 features and capabilities and
was referred to as "WDDM 1.1." WDDM 1.1 is a strict superset of WDDM 1.0. WDDM 1.1 introduced support for
Microsoft Direct3D 11, Windows Graphics Device Interface (GDI) hardware acceleration, Connecting and
Configuring Displays, DirectX Video Acceleration (VA) High-Definition (DXVA-HD), and many other features. For
more details on these features, see the Graphics Guide for Windows 7.
Windows 8 introduces an array of new features and capabilities that require graphics driver changes. These
incremental changes benefit end users and developers, and improve system reliability. The WDDM driver model
that enables these Windows 8 features is referred to as "WDDM 1.2." WDDM 1.2 is a superset of WDDM 1.1 and
WDDM 1.0. These changes can be represented in a simplified form, as shown in this table:

Operating system: Windows Vista. Driver models supported: WDDM 1.0 (XDDM on Server and limited UMPC).
Direct3D versions supported: D3D9, D3D10. Features enabled: scheduling, memory management, fault tolerance,
D3D9 and D3D10.

Operating system: Windows Vista SP1 / Windows 7 client pack. Driver models supported: WDDM 1.05 (XDDM on
Server 2008). Direct3D versions supported: D3D9, D3D10, D3D10.1. Features enabled: BGRA support in D3D10,
D3D10.1.

Operating system: Windows 7. Driver models supported: WDDM 1.1 (XDDM on Server 2008 R2). Direct3D versions
supported: D3D9, D3D10, D3D10.1, D3D11. Features enabled: GDI hardware acceleration, DXVA-HD, D3D11.

Operating system: Windows 8. Driver models supported: WDDM 1.2. Direct3D versions supported: D3D9, D3D10,
D3D10.1, D3D11, D3D11.1. Features enabled: smooth rotation, stereoscopic 3-D, D3D11 video, D3D11.1, etc.

Note
With Windows 8 and WDDM 1.2, XDDM is no longer supported, and XDDM drivers do not load on Windows 8
client or server. For the scenarios that are traditionally dependent on XDDM, Windows 8 allows migration to
WDDM as shown in the next table.
Independent hardware vendors (IHVs) and system builders should adopt the alternative WDDM solution that
works best for their customers. This means that a Windows 8 system will always have a WDDM-based driver.

CURRENTLY USING | WDDM SUPPORT FOR XDDM SCENARIOS

XDDM VGA Driver | Microsoft Basic Display Driver
XDDM IHV Driver | System builders need to work with the IHV to get a Display-Only WDDM Driver or a Full Graphics WDDM Driver; alternately, the Microsoft Basic Display Driver
XDDM Virtualization Driver | System builders need to work with the IHV to get a new Display-Only Virtualization Driver
CSM for Int10 support on Unified Extensible Firmware Interface (UEFI) | No longer needed with UEFI Graphics Output Protocol (GOP) support
Remote Desktop Access/Collab | Desktop Duplication API
Remote Session Driver | No change; no support for <32 bpp modes

Note
Microsoft provides a WDDM-based Basic Display Driver that is a replacement for the earlier in-box XDDM
Standard VGA driver and provides basic display functionality and software-based 2-D and 3-D rendering.
WDDM 1.2 introduces new types of graphics drivers, targeting specific scenarios as described below:
WDDM Full Graphics Driver: This is the full version of the WDDM graphics driver that supports hardware
accelerated 2-D and 3-D operations. This driver is fully capable of handling all the render, display, and video
functions. WDDM 1.0 and WDDM 1.1 are full graphics drivers. All Windows 8 client systems must have a full
graphics WDDM 1.2 device as the primary boot device.
WDDM Display Only Driver: This driver is supported only as a WDDM 1.2 driver and enables IHVs to write a
WDDM based kernel-mode driver that is capable of driving display-only devices. Windows handles the 2-D or
3-D rendering by using software-simulated GPU. Display-only devices are not allowed as the primary graphics
device on client systems.
WDDM Render Only Driver: This driver is supported only as a WDDM 1.2 driver and enables IHVs to write a
WDDM driver that supports rendering functionality only. Render-only devices are not allowed as the primary
graphics device on client systems.
This table summarizes driver model versus the supported driver categories:

DRIVER MODEL | FULL GRAPHICS | DISPLAY ONLY | RENDER ONLY

WDDM 1.0 (Windows Vista) | Yes | No | No
WDDM 1.1 (Windows 7) | Yes | No | No
WDDM 1.2 (Windows 8) | Yes | Yes | Yes

This table explains scenario usage for the new driver types:

DRIVER TYPE | CLIENT | SERVER | CLIENT RUNNING IN A VIRTUAL ENVIRONMENT | SERVER VIRTUAL

Full Graphics | Required as boot device | Optional | Optional | Optional
Display-Only | Not allowed | Optional | Optional | Optional
Render-Only | Optional as non-primary adapter | Optional | Optional | Optional
Headless | Not allowed | Optional | N/A | N/A

WDDM 1.2 is required for all systems that are shipped with Windows 8. WDDM 1.0 and WDDM 1.1 will continue
to work on Windows 8. However, the best experience and Windows 8–specific features are enabled only by a
WDDM 1.2 driver.
Send comments about this topic to Microsoft
WDDM 1.2 features

This topic describes the Windows Display Driver Model (WDDM) Version 1.2 feature set, which includes several
new enhancements that improve performance, reliability, and the overall end-user experience.
Each of these features requires special support from third-party WDDM 1.2 and later drivers. This section
elaborates on what constitutes the WDDM 1.2 feature set.
WDDM 1.2 has both mandatory and optional features. The driver must implement all the mandatory features to
claim itself as a "WDDM 1.2 driver," while the driver can implement any combination (or none) of the optional
features. A non-WDDM 1.2 driver must report none of the WDDM 1.2 features.
This table summarizes the WDDM 1.2 feature set. "M" indicates mandatory, "O" indicates optional, and "NA"
indicates not applicable. To read details about each feature, follow the link in the left column.
WDDM 1.2 feature set

WINDOWS 8 FEATURE ENABLED BY WDDM 1.2 | FEATURE BENEFIT | WDDM DRIVER TYPE: FULL GRAPHICS | WDDM DRIVER TYPE: RENDER ONLY | WDDM DRIVER TYPE: DISPLAY ONLY

Video memory offer and reclaim | Enables more efficient usage of video memory | M | M | NA
GPU preemption | Improves desktop responsiveness | M | M | NA
TDR changes in Windows 8 | Improved resiliency to GPU hangs | M | M | NA
Optimized screen rotation support | Screen rotation experience without flicker | M | NA | M
Stereoscopic 3D | Provides a consistent API and DDI platform to enable Stereoscopic 3D scenarios | O | NA | NA
Direct3D 11 video playback improvements | Simplified programming experience for video playback applications | M* | M* | NA
Direct flip of video memory | Improvements in the video playback and composition stack to reduce power consumption | M | NA | NA
Providing seamless state transitions | High resolution is maintained in state transitions and during bug checks | M | NA | M
Plug and Play (PnP) start and stop | Maintain high resolution as display ownership is transitioned between firmware, Windows, and the driver | M | NA | M
Standby hibernate optimizations | Enables optimizations to the graphics stack to improve performance on sleep and resume | O | O | NA
GPU power management of idle states and active power | Provides a standardized infrastructure for fine-grained device power management | O | O | O
XPS rasterization on the GPU | Enables a quality printing experience on Windows with third-party drivers | M** | M** | NA
Container ID support for displays | Helps represent monitor device connectivity and associated state to the user in a user interface similar to the device hub | M | NA | M
Disabling Frame Pointer Omission (FPO) optimization | Improves debugging of performance problems related to FPO in the field | M | M | M
User-mode driver logging | Improves ability to diagnose and investigate memory-related issues by providing a better view into memory usage | M | M | NA
*This feature is mandatory for all WDDM 1.2 drivers with Microsoft Direct3D 10-, 10.1-, 11-, or 11.1-capable
hardware (or later).
**No new device driver interface (DDI) or behavior changes. However, WDDM 1.2 and later drivers must be able
to pass XML Paper Specification (XPS) rasterization conformance tests to ensure a quality printing experience for
hardware-accelerated XPS printing scenarios.
Note
A new set of APIs is available starting with Windows 8 for duplicating the desktop for collaboration scenarios. For
more details, see Desktop duplication.

Additional new features in Windows 8


The following new or updated display driver DDIs are also provided in Windows 8:
Kernel Mode Display-Only Driver (KMDOD) Interface
Provides a limited set of display functions without rendering capability.
Note Refer also to the Kernel mode display-only miniport driver sample in the MSDN hardware sample gallery.
Support for system on a chip (SoC) architecture through the SPB interface
Lets a display miniport driver access bus resources on an SoC system.
Surprise removal of secondary adapter
DxgkDdiNotifySurpriseRemoval
DXGK_SURPRISE_REMOVAL_TYPE
DXGK_DRIVERCAPS
D3DKMT_WDDM_1_2_CAPS
System Firmware Table Interface
Lets the display miniport driver enumerate and read the system firmware table.
Brightness Control Interface V. 2 (Adaptive and Smooth Brightness Control)
Lets a display miniport driver reduce power to the display backlight and still smoothly adapt to changes in
ambient light and user requests to change brightness.
Also see Windows 8 brightness control for integrated displays.
Microsoft DirectX Graphics Infrastructure DDI (DXGI)
Blt1DXGI
DXGI_DDI_ARG_BLT1
DXGI_DDI_BASE_ARGS
DXGI1_2_DDI_BASE_FUNCTIONS
Allocation sharing and enqueuing GPU events
pfnCreateSynchronizationObject2Cb
pfnSignalSynchronizationObject2Cb
pfnWaitForSynchronizationObject2Cb
D3DDDI_DEVICECALLBACKS
D3DDDI_SYNCHRONIZATIONOBJECT_FLAGS
D3DDDICB_CREATESYNCHRONIZATIONOBJECT2
D3DDDICB_SIGNALFLAGS
D3DDDICB_SIGNALSYNCHRONIZATIONOBJECT2
D3DDDICB_WAITFORSYNCHRONIZATIONOBJECT2
D3DKMT_CREATEALLOCATIONFLAGS
D3DKMT_CREATEKEYEDMUTEX2
D3DKMT_CREATEKEYEDMUTEX2_FLAGS
D3DKMT_RELEASEKEYEDMUTEX2
D3DKMTShareObjects
Cancel command interface
DxgkDdiCancelCommand
DXGKARG_CANCELCOMMAND
DXGK_VIDSCHCAPS
Output duplication
D3DKMTOutputDuplPresent
D3DKMTOutputDuplReleaseFrame
D3DKMT_OUTPUTDUPL_RELEASE_FRAME
D3DKMT_OUTPUTDUPL_SNAPSHOT
D3DKMT_OUTPUTDUPLCONTEXTSCOUNT
D3DKMT_OUTPUTDUPLPRESENT
D3DKMT_OUTPUTDUPLPRESENTFLAGS
D3DKMT_PRESENT_RGNS
Windows 8 OpenGL Enhancements
OpenGL installable client drivers (ICDs) can call new functions to control access to resources and to map between
objects and identifiers.
Send comments about this topic to Microsoft
Advances to the display Infrastructure

Windows 8 provides enhancements and optimizations to the display infrastructure to further improve the user
experience.

In this section
TOPIC | DESCRIPTION

Container ID support for displays | This topic describes Container ID support for displays: visual representation of devices that are embedded within a display or monitor device.
Microsoft Basic Display Driver | In Windows 8, the Microsoft Basic Display Driver (MSBDD) is the in-box display driver that replaces the XDDM VGA Save and VGA PnP drivers.
Desktop duplication | Windows 8 introduces a new Microsoft DirectX Graphics Infrastructure (DXGI)-based API to make it easier for independent software vendors (ISVs) to support desktop collaboration and remote desktop access scenarios.
Support for headless systems | Windows 8 supports booting without any graphics hardware. This is accomplished by using a stub display output if no display devices are found. This stub display is implemented as part of the in-box Microsoft Basic Display Driver (MSBDD).

Send comments about this topic to Microsoft


Container ID support for displays

This topic describes Container ID support for displays—visual representation of devices that are embedded within
a display or monitor device.

Minimum Windows Display Driver Model (WDDM) version | 1.2
Minimum Windows version | 8
Driver implementation (Full graphics and Display only) | Mandatory
WHCK requirements and tests | Device.Graphics…ContainerIDSupport

Container ID device driver interface (DDI)


Implement this function and structure in your display miniport driver:
DxgkDdiGetChildContainerId
DXGK_CHILD_CONTAINER_ID

Container ID description
New capabilities in monitor devices can provide a better user experience. In particular, Universal Serial Bus (USB)
hubs are popular connectors on monitors for connecting mouse and keyboard. Also, connectors such as HDMI
support audio, and therefore audio speakers are embedded in monitors as well. Many new display devices support
touch capabilities. This provides a great user experience by reducing wire clutter on user desktops.
It's important to visually represent the connectivity and state of these devices to the user in an intuitive way. The
Devices and Printers page was introduced with Windows 7. As shown here, the Devices and Printers folder
shows the user the installed devices that are connected to the PC, providing a simple way to check on a printer,
music player, camera, mouse, or digital picture frame (to name just a few). At the same time, this page groups
those devices that are contained within the same piece of hardware to make it easier for users to discover all their
drivers.

With Windows 7 Microsoft introduced the concept of a container ID for devices: "a system-supplied device
identification string that uniquely groups the functional devices associated with a single-function or multifunction
device installed in the computer." (See Container IDs.) The devices are grouped if they contain the same container
ID.
For the container ID concept to be successful, all the device classes in Windows must support it, and the entire
ecosystem needs to implement it in hardware. In Windows 7, if multiple monitors that support audio are plugged
in, it isn't easy for the user to determine which display maps to which audio end points. The same difficulty exists
for touch digitizers. In Windows 8, the display device class adds support for container ID. This makes it possible for
all the functions of a display device to report the same container ID and get visually paired in the Windows user
interface and the APIs.

Container ID user scenarios


Consider the following workflow for a monitor that has embedded audio speakers:
1. The user connects the monitor using an HDMI cable.
2. WDDM driver reports the presence of display device to the Windows graphics stack.
3. The Windows graphics stack queries WDDM driver for the Container ID, using the device driver interfaces
(DDIs) introduced with Windows 8.
4. The display driver queries the monitor for the container ID and passes it back to Windows.
5. At the same time, the audio driver must pass the exact same container ID to the Windows audio stack.
6. If viewed in the Devices and Printers control panel, the display and speakers are grouped together.
In some cases, the display device might not contain a container ID. In this case, Windows automatically generates a
unique container ID by using the manufacturer ID, product ID, and serial number obtained from the Extended
Display Identification Data (EDID). Because these values are unique, the container ID is also unique. Windows 8
provides a DDI that passes the same information to the WDDM driver so that it can be passed to the audio driver
to generate the same container ID.
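For orientation, the query in step 3 reaches the display miniport driver through DxgkDdiGetChildContainerId. The following is a minimal sketch only: the exact parameter list, the ContainerId field name inside DXGK_CHILD_CONTAINER_ID, and the ReadContainerIdFromMonitor helper are illustrative assumptions, not the documented DDI contract.

// Minimal sketch. Windows pre-populates the DXGK_CHILD_CONTAINER_ID structure
// (field name assumed here to be ContainerId) with a GUID derived from the EDID;
// the driver keeps that value unless the monitor reports its own container ID.
NTSTATUS APIENTRY DxgkDdiGetChildContainerId(
    PVOID MiniportDeviceContext,          // adapter context from DxgkDdiAddDevice
    ULONG ChildUid,                       // identifies the child (monitor) being queried
    PDXGK_CHILD_CONTAINER_ID ContainerId) // in/out container ID for that child
{
    GUID monitorId;

    // ReadContainerIdFromMonitor is a hypothetical helper that asks the monitor
    // for a vendor-provided container ID, if it exposes one.
    if (ReadContainerIdFromMonitor(MiniportDeviceContext, ChildUid, &monitorId))
    {
        ContainerId->ContainerId = monitorId;   // field name assumed
    }

    // Otherwise, leave the OS-generated (EDID-derived) value untouched.
    return STATUS_SUCCESS;
}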
In a few scenarios, the ownership of driving the display is transitioned between Windows, the WDDM display
driver, and firmware. These transitions are associated with hardware or the software that is being reset or
reconfigured and can cause screen flashes and flickers. Possible transition scenarios and their behaviors are
discussed in Providing seamless state transitions in WDDM 1.2 and later.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…ContainerIDSupport.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Microsoft Basic Display Driver

In Windows 8, the Microsoft Basic Display Driver (MSBDD) is the in-box display driver that replaces the XDDM VGA
Save and VGA PnP drivers.
The key benefits of using MSBDD are as follows:
MSBDD helps to enable a consistent end user and developer experience because it is compatible with DirectX
APIs and technologies such as the Desktop Composition.
Server scenarios can benefit from the higher functionality (specifically, features like reboot-less updates,
dynamic start and stop, and so on) that are provided by the WDDM driver model.
MSBDD supports Unified Extensible Firmware Interface (UEFI) Graphics Output Protocol (GOP).
MSBDD works on both XDDM and WDDM hardware.
MSBDD is the default in-box display driver that is loaded during setup, in safe mode, in the absence of an IHV
graphics driver, or when the inbox installed graphics IHV driver is not working or is disabled. The primary purpose
of this driver is to enable Windows to write to the display controller’s linear frame buffer.
MSBDD can use the video BIOS to manage modes and resolutions on a single monitor. On UEFI platforms, MSBDD
inherits the linear frame buffer that is set during boot; in this case, no mode or resolution changes are possible. As
shown in Figure 1 Scenarios supported by Microsoft Basic Display Driver, MSBDD is used in the following scenarios:
Server: Server configurations that lack WDDM-capable graphics hardware can use MSBDD.
Windows setup: In the early phases of Windows setup, just before the final boot, only the MSBDD is loaded.
For example, a user has an older platform that is currently in working condition although it has no in-box
graphics driver support for Windows 8. The user upgrades to Windows 8 and uses MSBDD for the setup,
installation, and to retrieve an IHV driver if one is available.
Driver installation, in the following cases:
When a user is installing a new WDDM IHV driver, MSBDD is used during the transition (from the point
when the old WDDM IHV driver is uninstalled to the point before the new IHV driver is installed).
When a user encounters problems installing the latest WDDM IHV driver, the user or system can disable
the current graphics driver and fallback to using MSBDD.
Driver upgrade: By using MSBDD, there is no need to go through a system reboot when upgrading to the IHV-
recommended driver.
Safe mode: In this mode, only trusted drivers get loaded; this includes MSBDD.
Figure 1 Scenarios Supported by Microsoft Basic Display Driver
Send comments about this topic to Microsoft
Desktop duplication

Windows 8 introduces a new Microsoft DirectX Graphics Infrastructure (DXGI)-based API to make it easier for
independent software vendors (ISVs) to support desktop collaboration and remote desktop access scenarios.
Such applications are widely used in enterprise and educational scenarios. These applications share a common
requirement: access to the contents of a desktop together with the ability to transport the contents to a remote
location. The Windows 8 Desktop duplication APIs provide access to the desktop contents.
Currently, no Windows API allows an application to seamlessly implement this scenario. Therefore, applications use
mirror drivers, screen scraping, and other proprietary methods to access the contents of the desktop. However,
these methods have the following set of limitations:
It can be challenging to optimize the performance.
These solutions might not support newer graphics-rendering APIs because the APIs are released after the
product ships.
Windows does not always provide rich metadata to assist with the optimization.
Not all solutions are compatible with the desktop composition in Windows Vista and later versions of Windows.
Windows 8 introduces a DXGI-based API called Desktop Duplication API. This API provides access to the contents of
the desktop by using bitmaps and associated metadata for optimizations. This API works with the Aero theme
enabled, and is not dependent on the graphics API that applications use. If a user can view the application on the
local console, then the content can be viewed remotely as well. This means that even full screen DirectX
applications can be duplicated. Note that the API provides protection against accessing protected video content.
The API enables an application to request Windows to provide access to the contents of the desktop along monitor
boundaries. The application can duplicate one or more of the active displays. When an application requests
duplication, the following occurs:
Windows renders the desktop and provides a copy to the application.
Each rendered frame is placed in GPU memory.
Each rendered frame comes with the following metadata:
Dirty region
Screen-to-screen moves
Mouse cursor information
Application is provided access to frame and metadata.
Application is responsible for processing each frame:
Application can choose to optimize based on dirty region.
Application can choose to use hardware acceleration to process move and mouse data.
Application can choose to use hardware acceleration for compression before streaming out.
For detailed documentation and samples, see Desktop Duplication API.
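For orientation, the following is a minimal sketch of the acquire-and-release loop an application might build on the public API. Error handling, frame processing, and the code that obtains the ID3D11Device and IDXGIOutput1 are omitted, and the function name is illustrative.

#include <wrl/client.h>
#include <dxgi1_2.h>
#include <d3d11.h>

using Microsoft::WRL::ComPtr;

// Minimal sketch: duplicate one output and pump desktop frames.
void PumpDesktopFrames(ID3D11Device* device, IDXGIOutput1* output)
{
    ComPtr<IDXGIOutputDuplication> duplication;
    if (FAILED(output->DuplicateOutput(device, &duplication)))
        return;

    for (;;)
    {
        DXGI_OUTDUPL_FRAME_INFO frameInfo = {};
        ComPtr<IDXGIResource> desktopImage;

        // Wait up to 500 ms for the next desktop frame.
        HRESULT hr = duplication->AcquireNextFrame(500, &frameInfo, &desktopImage);
        if (hr == DXGI_ERROR_WAIT_TIMEOUT)
            continue;   // nothing changed on the desktop
        if (FAILED(hr))
            break;      // for example DXGI_ERROR_ACCESS_LOST: recreate the duplication and retry

        // desktopImage is the rendered frame in GPU memory; frameInfo plus
        // GetFrameDirtyRects/GetFrameMoveRects supply the optimization metadata.
        // Process or copy the frame here, then release it promptly.

        duplication->ReleaseFrame();
    }
}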
Send comments about this topic to Microsoft
Support for headless systems

Windows 8 supports booting without any graphics hardware. This is accomplished by using a stub display output if
no display devices are found. This stub display is implemented as part of the in-box Microsoft Basic Display Driver
(MSBDD).
Because the stub display is used when no PnP driver is available, no third-party drivers are required. It works for
both normal operation and for system crashes, so no hardware or firmware support is required to fake a display
device.
On architectures in which VGA has been the norm, MSBDD requires positive confirmation that VGA is not present;
otherwise, it assumes that VGA hardware is available and that the system is not headless. System firmware should
set the VGA Not Present flag in the IAPC_BOOT_ARCH field of FADT and if there is any VBIOS, it should implement
an empty mode list through the VESA BIOS Extensions (VBE). These mechanisms should indicate that VGA is not
present even if the system implements a VBIOS with int 10h mode 12h support for compatibility with previous
versions of Windows. In the absence of VBE support, the Basic Display Driver uses a display that is initialized by the
boot loader, so a headless system should not represent a working display through UEFI GOP.
Send comments about this topic to Microsoft
Direct3D features and requirements in WDDM 1.2

Microsoft Direct3D offers a rich collection of 3-D graphics APIs, which are widely used by software applications for
complex visualization and game development. This section describes feature improvements and Windows 8
Direct3D software and hardware requirements.

In this section
TOPIC | DESCRIPTION

DirectX feature improvements in Windows 8 | Windows 8 includes Microsoft DirectX feature improvements that benefit developers, end users, and system manufacturers.
Direct3D software requirements in Windows 8 | This topic describes software requirements to support Direct3D in Windows 8.
Hardware requirements | This topic describes hardware requirements to support Direct3D in Windows 8.

Depending on the capability of the graphics adapter, Direct3D allows applications to utilize hardware acceleration
for the entire 3-D rendering pipeline or for partial acceleration. Newer versions of the Direct3D APIs such as
Direct3D 9Ex and Microsoft Direct3D 10 are available only starting with Windows Vista because the Windows
Display Driver Model (WDDM) provides the display driver interfaces needed for the functionality. This figure shows
the incremental versions of Direct3D APIs that are supported on the various versions of WDDM:

Direct3D APIs supported on various versions of WDDM


Send comments about this topic to Microsoft
DirectX feature improvements in Windows 8

Windows 8 includes Microsoft DirectX feature improvements that benefit developers, end users and system
manufacturers.
The feature improvements are in the following areas:
Pixel formats (5551, 565, 4444): Higher performance for DirectX applications on lower-power hardware
configurations.
Double-precision shader functionality: High Level Shader model performance improvements that let you do
more on the GPU without involving the CPU.
Target-independent rasterization: Higher performance anti-aliasing path for Direct2D applications.
No overwrite and discard: Higher performance for Microsoft Direct3D 11.1 applications on mobile platforms
and power-constrained devices that use tile-based renderers.
UAVs at every stage: Added capabilities to enable shader debugging at all shader stages on DirectX 11.1
hardware.
Cross-process sharing of texture arrays (for supporting Stereoscopic 3D): Provides a basis to enable
Stereoscopic 3-D.
Unordered access views with multi-sample anti-alias sample access: Enables Direct3D 11 applications to
implement high-quality rendering algorithms without needing to allocate memory for large numbers of
samples.
Logic ops: Improvements to deferred shading techniques.
Improved control of constant buffers: Efficient buffer management for game developers.

Pixel formats (5551, 565, 4444)


To better support graphics in low-power configurations using DirectX, the following DirectX 9 pixel formats from
the DXGI_FORMAT enumeration must be supported in Direct3D for Windows 8:
DXGI_FORMAT_B5G6R5_UNORM
DXGI_FORMAT_B5G5R5A1_UNORM
DXGI_FORMAT_B4G4R4A4_UNORM
These additional formats provide increased performance on lower-power hardware in DirectX applications. These
formats are supported on all GPUs to date. This table describes the required support for these formats, depending
on the hardware feature level.
Required format support depending on hardware feature levels

CAPABILITY | FEATURE LEVEL 9_X | FEATURE LEVEL 10.0 | FEATURE LEVEL 10.1 | FEATURE LEVEL 11+

Typed Buffer | No | Required | Required | Required
Input Assembler Vertex Buffer | No | Optional | Optional | Optional
Texture1D | No | Required | Required | Required
Texture2D | Required | Required | Required | Required
Texture3D | No | Required | Required | Required
TextureCube | Required | Required | Required | Required
Shader ld* | No | Required | Required | Required
Shader sample* (with filtering) | Required | Required | Required | Required
Shader gather4 | No | No | No | Required
Mipmap | Required | Required | Required | Required
Mipmap Auto-Generation | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551
RenderTarget | Required for 565, no for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551
Blendable RenderTarget | Required for 565, no for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551
UAV Typed Store | No | No | No | Optional
CPU Lockable | Required | Required | Required | Required
4x MSAA | Optional | Optional | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551
8x MSAA | Optional | Optional | Optional | Required for 565, optional for 4444, 5551
Other MSAA Sample Count | Optional | Optional | Optional | Optional
Multisample Resolve | Required (if MSAA supported) for 565, no for 4444, 5551 | Required (if MSAA supported) for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551
Multisample Load | No | Required (if MSAA supported) for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551 | Required for 565, optional for 4444, 5551
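Because render-target, blending, and MSAA support for the 4444 and 5551 formats is optional at several feature levels, applications typically probe for it at run time. The following is a minimal sketch using the public Direct3D 11 API; the wrapper function name is illustrative and the device parameter is assumed to be an already-created ID3D11Device.

#include <d3d11.h>

// Check the optional capabilities of DXGI_FORMAT_B4G4R4A4_UNORM at run time.
// Texture2D/TextureCube and filtered sampling are required at every feature
// level, so only the optional bits are interesting here.
void Query4444Support(ID3D11Device* device, bool* renderTarget, bool* blendable)
{
    UINT support = 0;
    *renderTarget = *blendable = false;

    if (SUCCEEDED(device->CheckFormatSupport(DXGI_FORMAT_B4G4R4A4_UNORM, &support)))
    {
        *renderTarget = (support & D3D11_FORMAT_SUPPORT_RENDER_TARGET) != 0;
        *blendable    = (support & D3D11_FORMAT_SUPPORT_BLENDABLE) != 0;
    }
}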

Double-precision shader functionality


In Windows 8, Windows Display Driver Model (WDDM) 1.2 drivers that support double precision must also
support additional double-precision floating-point instructions in High Level Shader model 5 in all shader stages.
The instructions are:
Double-precision reciprocal
Double-precision divide
Double-precision fused multiply-add
Because the runtime can pass these instructions directly to the driver, the implementation can optimize their
performance, or implement them as specialized single instructions in hardware.
Note
To use these features, developers must ensure that they are running with FEATURE_LEVEL_11 or higher with
double-precision support (D3D11_FEATURE_DOUBLES) on a WDDM 1.2 or later driver.
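From the application side, the base and extended double-precision capabilities are reported through CheckFeatureSupport. The following is a minimal sketch of that check (the helper name is illustrative); it mirrors the driver-side requirement but is not the DDI itself.

#include <d3d11_1.h>

// Returns true when base doubles and the extended double instructions
// (reciprocal, divide, fused multiply-add) are both usable.
bool ExtendedDoublesUsable(ID3D11Device* device)
{
    if (device->GetFeatureLevel() < D3D_FEATURE_LEVEL_11_0)
        return false;

    D3D11_FEATURE_DATA_DOUBLES doubles = {};
    D3D11_FEATURE_DATA_D3D11_OPTIONS options = {};

    if (FAILED(device->CheckFeatureSupport(D3D11_FEATURE_DOUBLES, &doubles, sizeof(doubles))) ||
        FAILED(device->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS, &options, sizeof(options))))
        return false;

    return doubles.DoublePrecisionFloatShaderOps && options.ExtendedDoublesShaderInstructions;
}
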
Sum of absolute differences
Image processing is a critical application in modern devices. A common operation is pattern matching or search.
Video-encoding operations typically search for matching square tiles (typically 8x8 or 16x16), and image
recognition algorithms search for more general shapes that are identified by a bit mask. To improve the
performance of these scenarios, a new intrinsic has been added to the Microsoft High Level Shader Language
(HLSL) for Shader Model 5.0 in all shader stages. This intrinsic msad4() corresponds to and generates a group of
masked sum of absolute differences (MSAD) instructions in the shader IL. All WDDM 1.2 and later drivers must
support this instruction either directly in hardware or as a set of other instructions (emulated).
Note
Ideally, the MSAD instruction should be implemented so that overflow results in saturation, not in a wrap behavior.
Be aware that overflow behavior is undefined.
Developers must check to make sure that they are running with FEATURE_LEVEL_11 or higher on a WDDM 1.2 or
later driver to use this feature. Developers must not rely on result accuracy for accumulation values that overflow
(that is, go above 65535).

Target-independent rasterization (TIR)


Target-independent rasterization (TIR) provides a high performance anti-aliasing path for Direct2D usage scenarios
that involve high-quality anti-aliasing of structured graphics. TIR enables Direct2D to move the rasterization step
from the CPU to the GPU while it preserves the Direct2D anti-aliasing semantics and quality. Using this capability,
the software layer can evaluate a large number of sub-pixel sample positions for coverage, yet only allocate the
memory that is required for a smaller number of samples. This provides the performance advantage of using the
GPU to render but retaining the image quality of a CPU-rendered implementation. This allows a single sample to
be broadcast to multiple samples of a multi-sample anti-aliased render target.
SampleCount =1 (Limited TIR on 10, 10.1 & 11)
Direct3D 10.0 - Direct3D 11.0 hardware (and Feature Level 10_0 - 11_0) supports ForcedSampleCount set to 1
(and any sample count for Render Target View) along with the described limitations (for example, no depth/stencil).
For 10_0, 10_1 and 11_0 hardware, when D3D11_1_DDI_RASTERIZER_DESC.ForcedSampleCount is set to 1,
line rendering cannot be configured to 2-triangle (quadrilateral)–based mode (that is, the MultisampleEnable
state cannot be set to true). This limitation isn't present for 11_1 hardware. Note that the naming of the
MultisampleEnable state is misleading because it no longer has anything to do with enabling multisampling;
instead, it is now one of the controls together with AntialiasedLineEnable for selecting line-rendering mode.
This limited form of target-independent rasterization, with ForcedSampleCount = 1, closely matches a mode that
was present in Direct3D 10.0, but became unavailable in Direct3D 10.1 and Direct3D 11.0 (and Feature Levels 10_1 and
11_0) due to API changes. In Direct3D 10.0, this mode was the center-sampled rendering even on a Multiple
Sample Anti Aliasing (MSAA) surface that was available when MultisampleEnable was set to false (and this could
be toggled by toggling MultisampleEnable). In Direct3D 10.1+, MultisampleEnable no longer affects
multisampling (despite the name), and only controls line-rendering behavior.
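On the application side, this path is selected through the ForcedSampleCount member of the Direct3D 11.1 rasterizer state; the driver sees the same value through D3D11_1_DDI_RASTERIZER_DESC.ForcedSampleCount. A minimal sketch (the wrapper name is illustrative):

#include <d3d11_1.h>

// Create a rasterizer state that forces 1-sample (center) rasterization
// regardless of the render target's sample count (limited TIR).
HRESULT CreateForcedSampleCountState(ID3D11Device1* device, ID3D11RasterizerState1** state)
{
    D3D11_RASTERIZER_DESC1 desc = {};
    desc.FillMode          = D3D11_FILL_SOLID;
    desc.CullMode          = D3D11_CULL_BACK;
    desc.DepthClipEnable   = TRUE;
    desc.MultisampleEnable = FALSE;   // required when ForcedSampleCount == 1 on 10_0-11_0 hardware
    desc.ForcedSampleCount = 1;       // 4, 8, or 16 need the higher TIR support described above

    return device->CreateRasterizerState1(&desc, state);
}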

No overwrite and discard


Rendering content on a tile-based deferred-rendering (TBDR) architecture
Render targets in Direct3D 11.1 can now support a discard behavior by using a new set of resource APIs.
Developers must be aware of this capability and call an additional Discard() method to run more efficiently on
TBDR architectures (with no penalty to traditional graphics hardware). This will improve performance on mobile
platforms and other power-constrained devices that use tiled renderers.
Updating resources on a TBDR architecture
Because TBDR architectures complete multiple passes over the same command buffer, you must use special care to
notify the driver when a portion of a sub-resource was not modified during a previous draw call. Having a
NO_OVERWRITE usage on a Direct3D UpdateSubresource function can help the driver to manage resources
where no previous draw calls were made to a region of a texture. This simply requires that you inform the driver of
the application’s intent of either discarding the existing data, or protecting it from overwrite. This enables more
efficient rendering on TBDR architectures and introduces no penalties when it is run on traditional desktop
hardware.
New variants of the Direct3D 11 UpdateSubresource() and CopySubresourceRegions APIs, which both update a
portion of a GPU surface, provide an additional Flags field where NO_OVERWRITE or DISCARD can be specified.
These APIs drive the Direct3D 11.1 device driver interface (DDI) and Direct3D 9 DDIs. New drivers for any DirectX
9+ hardware are required to support revised BLT, BUFBLT, VOLBLT, and TEXBLT DDIs by adding the flags discussed
here.
These are also required to be supported for all Direct3D 10+ hardware with Direct3D 11.1 drivers.
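The following is a minimal sketch of the application-side calls that generate these hints; the discard and copy-flag information reaches the driver through the revised DDIs listed above. The resource arguments and the wrapper name are assumed for illustration.

#include <d3d11_1.h>

// Hint to a tile-based renderer that existing render-target contents are not
// needed, then update a sub-range of a buffer without a monolithic copy.
void DiscardAndPartialUpdate(ID3D11DeviceContext1* context,
                             ID3D11RenderTargetView* rtv,
                             ID3D11Buffer* buffer,
                             const void* data, UINT byteOffset, UINT byteCount)
{
    // Previous contents of the render target will not be read again.
    context->DiscardView(rtv);

    // Only the bytes in [byteOffset, byteOffset + byteCount) are touched;
    // NO_OVERWRITE promises that the GPU is not currently reading that region.
    D3D11_BOX box = { byteOffset, 0, 0, byteOffset + byteCount, 1, 1 };
    context->UpdateSubresource1(buffer, 0, &box, data, 0, 0, D3D11_COPY_NO_OVERWRITE);
}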

UAVs at every stage


In Microsoft Direct3D 11, the number of unordered-access views (UAVs) was limited to eight at the Compute
Shader and to eight combined (render target views (RTVs) + UAVs) at the Pixel Shader. In DirectX 11.1, the number
that can be bound has been increased. For DirectCompute, the limit is now 64, and for graphics the combined total
bound at the output merger is 64 (that is, graphics can have 64 minus the up-to-eight that are potentially used by
RTVs).
Unordered access views can be accessed from any shader stage, but they still count against the combined total for the
graphics pipeline.
Adding UAVs at every shader stage allows you to add debugging information to the pipeline. This ease of
development makes Windows a more desirable platform for writing GPU-accelerated applications.
This requires at least a DirectX 11.1 feature level.

Cross-process sharing of texture arrays (for supporting Stereoscopic 3-D)
Although Stereoscopic 3-D is an optional WDDM 1.2 system feature, there is underlying infrastructure that must be
implemented by all WDDM 1.2 device drivers regardless of whether they support the Stereoscopic 3-D system
feature.
DirectX 10 (or greater)–capable graphics hardware must support cross-process sharing of texture arrays. This
capability provides a basis to enable Stereoscopic 3-D. The WDDM 1.2 Direct3D DDIs require support of arrayed
buffers as render targets independent of hardware feature level.
This requirement ensures that stereo applications won’t have failures in mono modes. For example: even for
cases when stereo is not enabled on the system, applications should be able to create stereo swap chains or
arrayed buffers as render targets and then call Present. In this case, only the left view is displayed (or if the prefer
right Microsoft DirectX Graphics Infrastructure (DXGI) present flag is set, only the right view).
Therefore, WDDM 1.2 drivers (Full Graphics & Render devices) must support Direct3D 11 APIs by adding support
for cross process sharing of texture arrays. In earlier versions, cross-process shared resources could be only single-
layer surfaces. In Windows 8, the maximum size of a shared array is two elements (which is sufficient for stereo).
For more information on this requirement, see Device.Graphics…Stereoscopic3DArraySupport in Windows
Hardware Certification Requirements. Other relevant Microsoft Windows HCK requirements are
Device.Graphics…ProcessingStereoscopicVideoContent and
Device.Display.Monitor.Stereoscopic3DModes.
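From the application's perspective, the two-element array surfaces when a stereo swap chain is created through DXGI 1.2. A minimal sketch follows; the window handle, device, and factory are assumed to already exist, and the wrapper name is illustrative.

#include <dxgi1_2.h>
#include <d3d11.h>

// Create a stereo (two-element texture array) swap chain when windowed stereo
// is available; otherwise Stereo = FALSE yields the usual mono swap chain.
HRESULT CreateStereoSwapChain(IDXGIFactory2* factory, ID3D11Device* device,
                              HWND hwnd, IDXGISwapChain1** swapChain)
{
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Format             = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.Stereo             = factory->IsWindowedStereoEnabled();
    desc.SampleDesc.Count   = 1;
    desc.SampleDesc.Quality = 0;
    desc.BufferUsage        = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount        = 2;
    desc.SwapEffect         = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;  // flip model is required for stereo

    return factory->CreateSwapChainForHwnd(device, hwnd, &desc, nullptr, nullptr, swapChain);
}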

UAVs with multi-sample anti-alias sample access


Direct3D 11 allows rasterization to unordered access views (UAVs) with no render target views (RTVs)/DSVs
bound. Even though UAVs can have arbitrary sizes, the implementation can operate the rasterizer by using the pixel
dimensions of the viewport/scissor rectangle. The sample pattern for DirectX 11 hardware is single sample only.
The DirectX 11.1 hardware specification expands to allow multiple samples. This is a variation of target-
independent rasterization where only UAVs are bound for output.
UAV-only rendering together with multisampling at the rasterizer is now possible by keying off the
ForcedSampleCount state, with the sample patterns limited to 0, 1, 4, and 8 (not 16, which TIR supports). (The UAVs
themselves are not multi-sampled in terms of allocation.) A setting of 0 is equivalent to the setting 1 - single
sample rasterization.
Shaders can request pixel-frequency invocation with UAV-only rendering. However, requesting sample-frequency
invocation is invalid (produces undefined shading results). The SampleMask rasterizer state does not affect
rasterization behavior here at all.
Support for this feature is available on DirectX 11.0+ hardware, including hardware that does not support full 11_1
level of target-independent rasterization with RTVs. The driver can report that it supports UAV-only multi-sample
anti-alias sample access (MSAA) rendering (implying 4 and 8 samples are both supported). All DirectX 11+
hardware supports 1. If the hardware can perform full 11_1 target-independent rasterization with RTVs (which
requires 16-sample support), then UAV-only MSAA rasterization support is required (meaning 4 and 8 samples in
the UAV-only case).
This feature enables applications to implement high quality rendering algorithms such as analytic anti-aliasing
without needing to allocate memory for large numbers of samples.

Logic operations
Allowing for logic operations at the output merger allows you to perform some operations on images that are
currently not possible. For example, you can compute masks much more effectively and easily and also implement
modern deferred-shading techniques for 3-D rendering.
Although this functionality exists in most 3-D hardware, it is not currently as general as the color blending is. As a
result, the configuration of logic ops is constrained in the following ways:
When logic ops are used in the first RT blend desc, IndependentBlendEnable must be set to false, so that the
same logic op applies to all RTs.
When logic ops are used, all RenderTargets bound must have a UINT or SINT format, otherwise the rendering is
undefined.
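A minimal sketch of configuring a logic op through the public Direct3D 11.1 blend state, respecting the two constraints above (the wrapper name is illustrative):

#include <d3d11_1.h>

// Blend state that XORs the pixel shader output into a UINT render target.
// IndependentBlendEnable must stay FALSE, and all bound RTs must be UINT/SINT.
HRESULT CreateXorBlendState(ID3D11Device1* device, ID3D11BlendState1** state)
{
    D3D11_BLEND_DESC1 desc = {};
    desc.IndependentBlendEnable                = FALSE;
    desc.RenderTarget[0].LogicOpEnable         = TRUE;
    desc.RenderTarget[0].LogicOp               = D3D11_LOGIC_OP_XOR;
    desc.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;

    return device->CreateBlendState1(&desc, state);
}
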
Improved control of constant buffers
Partial constant buffer updates
Constant buffers today require monolithic copies from source to destination during updates that clobber the entire
buffer. Where it's desired to update only a portion of the constant buffer, an offset for the writes is ideal. This ability
to random-access write into a constant buffer is requested by game developers and makes constant buffer
management more natural and efficient. These capabilities were already supported for other buffer types, and are
added to constant buffers in WDDM 1.2 drivers.
This feature must be supported for all Direct3D 10+ hardware with Direct3D 11.1 drivers. For the developer, this is
emulated on DirectX 9 hardware so it works on all feature levels.
Note
You must specify either the NO_OVERWRITE or DISCARD flag.
Offsetting constant buffer updates
A common desire for high-performance game engines is to collect a large batch of constant buffer updates for
constants to be referenced by separate Draw\* calls, each needing its own constants, all at once. This is facilitated
by allowing the application to create a large buffer and then pointing individual shaders to regions within it (similar
to a view, but without having to make a whole object to describe the view).
Constant buffers now can be created that have a size larger than the maximum constant buffer size addressable by
an individual shader (at most 4096 16-byte elements, or 64 KB, where each element is one four-component shader
constant). The constant buffer resource size is now limited only by the size of memory allocation that the system is
capable of handling.
When a constant buffer larger than 4096 elements is bound to the pipeline by using *SetShaderConstants APIs
such as VSSetShaderConstants, it appears to the shader as if it is only 4096 elements in size.
A variant of the *SetShaderConstants APIs, *SetShaderConstants1, allows a "FirstConstant" and "ConstantCount"
to be specified together with the binding. When the shader accesses a constant buffer bound this way, it appears as
if it starts at the specified "FirstConstant" offset (where 1 means 16 bytes) and has a size defined by ConstantCount
(number of 16-byte constants). This is basically a lightweight "View" of a region of a larger constant buffer. (Both
FirstConstant and ConstantCount must be a multiple of 16).
This feature must be supported by all WDDM 1.2 drivers for Direct3D 10+ hardware. The Direct3D 11 runtime
emulates the appropriate behavior for Feature Level 9_x.
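A minimal sketch of this pattern using the public Direct3D 11.1 API: one large constant buffer holds the constants for many draw calls, and each draw binds a window into it. The wrapper name is illustrative.

#include <d3d11_1.h>

// Bind a region of a large constant buffer to slot 0 of the vertex shader stage.
// Both values are expressed in 16-byte constants and must be multiples of
// 16 constants (256 bytes), per the restriction described above.
void BindConstantWindow(ID3D11DeviceContext1* context, ID3D11Buffer* bigBuffer,
                        UINT firstConstant, UINT numConstants)
{
    ID3D11Buffer* buffers[] = { bigBuffer };
    context->VSSetConstantBuffers1(0, 1, buffers, &firstConstant, &numConstants);
}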

ClearView
This feature enables the implementation to perform an efficient clear operation on a video memory resource,
clearing multiple rects in a single API/DDI call. The API includes support for rectangles that define a subset of the
resource to be cleared. This capability was supported in the DirectX 9 DDI, and is required for Windows 8 drivers
(WDDM 1.2). This approach results in improved performance for 2-D operations such as those used in imaging and
UI.
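A minimal sketch of the corresponding public API call, which the Direct3D runtime forwards to the driver's ClearView DDI (the wrapper name and rectangle values are illustrative):

#include <d3d11_1.h>

// Clear two sub-rectangles of a render target to opaque black in one call.
void ClearTwoRects(ID3D11DeviceContext1* context, ID3D11RenderTargetView* rtv)
{
    const FLOAT black[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    const D3D11_RECT rects[2] = { { 0, 0, 128, 128 }, { 256, 256, 384, 384 } };
    context->ClearView(rtv, black, rects, 2);
}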

Tileable copy flag


A tileable copy operation allows an application to notify the implementation that the image source and destination
are pixel-aligned and will not participate in cross-pixel exchange of information in a subsequent rendering pass.
This enables significant performance improvements on some implementations that benefit from caching subsets of
the image data during the copy operation. This capability was supported in the DirectX 9 DDI, and is required for
Windows 8 and later drivers (WDDM 1.2).

Same-surface blits
Many UI operations, such as scrolling, require transferring image data from one portion of an image to another.
This feature adds support for a copy operation where both source rectangle and destination rectangle are in the
same image or resource. In the case of overlapping source and destination rectangles, the situation must be
handled correctly by the implementation and driver. This was already required by the DirectX 9 DDI and is required
in WDDM 1.2 for all hardware. This approach results in significant performance improvements of key UI scenarios.

Direct3D 11.1 DDI


These functions and structures are new or updated for Windows 8:
AssignDebugBinary
CalcPrivateBlendStateSize(D3D11_1)
ClearView
DefaultConstantBufferUpdateSubresourceUP(D3D11_1)
ResourceUpdateSubresourceUP(D3D11_1)
VsSetConstantBuffers(D3D11_1)
D3D11_1DDI_D3D11_OPTIONS_DATA
D3DDDI_BLTFLAGS
D3DDDI_COPY_FLAGS
D3DDDIARG_BUFFERBLT1
D3DDDIARG_DISCARD
D3DDDIARG_TEXBLT1
D3DDDIARG_VOLUMEBLT1
D3DDDICAPS_ARCHITECTURE_INFO
D3DDDICAPS_SHADER_MIN_PRECISION
D3DDDICAPS_SHADER_MIN_PRECISION_SUPPORT
D3DDDICAPS_TYPE
Send comments about this topic to Microsoft
Direct3D software requirements in Windows 8

This topic describes software requirements to support Microsoft Direct3D in Windows 8.


For Windows 8, independent hardware vendors must write a Windows Display Driver Model (WDDM) 1.2 driver
that can support the relevant Direct3D feature level user-mode driver (UMD) device driver interfaces (DDIs).
For example, Microsoft Direct3D 9–capable hardware must, at minimum, support the Direct3D version 9 DDI.
These software requirements vary based on the Microsoft DirectX hardware level as specified in this table:
DirectX software requirements

DIRECTX HARDWARE | SOFTWARE REQUIREMENTS

D3D9 | Required: WDDM 1.2; D3D9 UMD DDI
D3D10 | Required: WDDM 1.2; D3D9 UMD DDI; D3D10 UMD DDI; D3D11.1 UMD DDI
D3D10.1 | Required: WDDM 1.2; D3D9 UMD DDI; D3D10 UMD DDI; D3D10.1 UMD DDI; D3D11.1 UMD DDI
D3D11 | Required: WDDM 1.2; D3D9 UMD DDI; D3D10 UMD DDI; D3D10.1 UMD DDI; D3D11 UMD DDI; D3D11.1 UMD DDI
D3D11.1 | Required: WDDM 1.2; D3D9 UMD DDI; D3D10 UMD DDI; D3D10.1 UMD DDI; D3D11 UMD DDI; D3D11.1 UMD DDI

The following tables describe the functionality that's exposed by using user-mode driver (UMD) DDI changes in
Windows 8.
D3D9 - UMD DDI exposes the following new features in Windows 8

REQUIRED? FEATURE

Required No overwrite and discard

Required Tileable copy flag

D3D11.1 - UMD DDI exposes the following new features in Windows 8 across feature levels 10, 10.1, 11,
and 11.1

REQUIRED? FEATURE

Required No overwrite and discard

Required Support for cross-process sharing of texture arrays (including Stereoscopic 3D)

Required Tileable copy flag

Required ClearView

If Implemented Logic ops

Required Pixel formats (5551, 565, 4444) - exact support varies across
feature level

Required Same-surface blits

Required Partial constant buffer updates

Required Offset constant buffer bind

Required Improved resource sharing

Required SampleCount=1 (limited target-independent rasterization (TIR) on 10, 10.1, and 11)

D3D11.1 - UMD DDI exposes the following new features for feature level 11 & 11.1
REQUIRED? FEATURE

Required UAV-MSAA

If Implemented Double-precision shader functionality

Required Masked sum of absolute differences (MSAD)

D3D11.1 - UMD DDI exposes the following new features for feature level 11.1

REQUIRED? FEATURE

Required UAVs at every stage

Required UAV-MSAA (at 16 samples)

Required TIR

Send comments about this topic to Microsoft


Direct3D hardware requirements in Windows 8

This topic describes hardware requirements to support Microsoft Direct3D in Windows 8.


Independent hardware vendors must follow the Windows 8 Direct3D rendering requirements for hardware, as
specified in this table. See also DirectX feature improvements in Windows 8 for specifics.
Direct3D rendering requirements for hardware

MICROSOFT DIRECTX HARDWARE VERSION REQUIRED/OPTIONAL WINDOWS 8 RENDERING REQUIREMENTS

D3D9 Required D3D9 HW Spec

D3D10 Required D3D9 HW Spec

D3D10 Required D3D10 HW Spec

D3D10.1 Required D3D9 HW Spec

D3D10.1 Required D3D10 HW Spec

D3D10.1 Required D3D10.1 HW Spec

D3D11 Required D3D9 HW Spec

D3D11 Required D3D10 HW Spec

D3D11 Required D3D10.1 HW Spec

D3D11 Required D3D11 HW Spec

D3D11.1 Required D3D9 HW Spec

D3D11.1 Required D3D10 HW Spec

D3D11.1 Required D3D10.1 HW Spec

D3D11.1 Required D3D11 HW Spec

D3D11.1 Required D3D11.1 HW Spec

The following tables describe the Direct3D hardware specification updates for Windows 8.
Microsoft Direct3D 10 hardware specification changes for Windows 8

REQUIRED? FEATURE

Required Pixel formats (5551, 565, 4444) *


Required Same-surface blits *

If implemented Logic ops

Direct3D 10.1 hardware specification changes for Windows 8

REQUIRED? FEATURE

Required Pixel formats (5551, 565, 4444) *

Required Same-surface blits *

If implemented Logic ops

Microsoft Direct3D 11 hardware specification changes for Windows 8

REQUIRED? FEATURE

Required Pixel formats (5551, 565, 4444) *

Required Same-surface blits *

If implemented UAV-MSAA

If implemented Threading concurrent creates

If implemented Threading command lists

If implemented Double-precision support

If implemented Logic ops

Direct3D 11.1 hardware specification for Windows 8

REQUIRED? FEATURE

Required Logic ops

Required Pixel formats (5551, 565, 4444) *

Required Same-surface blits *

Required UAVs at every stage

Required UAV-MSAA

Required Target-independent rasterization (TIR)

If implemented Threading concurrent creates


If implemented Threading Command Lists

If implemented Double-precision support

* Already exists in the Microsoft Direct3D 9 hardware specification, but was not previously exposed in Direct3D 10.
Send comments about this topic to Microsoft
Graphics INF requirements in WDDM 1.2

Windows Display Driver Model (WDDM) drivers in Windows 8 require INF changes to the graphics driver. The
most notable change is in the feature score. WDDM 1.2 drivers require a higher feature score than earlier WDDM
drivers. This section describes all relevant INF requirements for Windows 8 graphics drivers.

In this section
TOPIC | DESCRIPTION

Updated feature score directive in Windows 8 | The updated feature score directive is a general installation setting that's required for all Windows 8 drivers that follow the WDDM.
Driver matching criteria | This topic describes the elements that are used to choose the best match on a driver.
Updated friendly name for WDDM 1.2 | This topic describes the updated friendly name for a Graphics INF. This is a localizable string name requirement for all Windows 8 in-box display driver INFs.
SKU differentiation directive | With Windows Server 2008 and Windows Vista SP1, the in-box display driver INFs were modified to include a new value that represented the drivers as Client Only, meaning that the drivers would not install on server SKUs of Windows. This directive is required for all display drivers in Windows 8.
General Unicode requirement in INF files | INF files should be saved and encoded as Unicode; they must not be ANSI.
Installed display drivers directive | The installed display drivers directive is a software device setting that gives the proper name for the UMD that is installed as part of the driver package.
Copy flags to support PnP stop directive | The Plug and Play (PnP) stop directive file section flag is required for the WDDM to support driver upgrades that don't require a reboot.
Driver\services start type directive | The driver\services start type directive is a service installation setting requirement for all display drivers. WDDM drivers are Plug and Play (PnP) and therefore must be demand started, where StartType = 3.
Capability override settings to disable OpenGL | This software device setting for all in-box display INFs ensures that no in-box drivers are exposed to possible interoperability issues with out-of-box OpenGL ICDs.
[Version] section directives | This topic describes [Version] section directives in the INF.
[SourceDiskNames] section directives | On Windows Vista and later, in-box INFs use the [SourceDisksXxx] directives. However, the values of these sections were changed from what had previously typically been noted in an independent hardware vendor (IHV) production driver package.
General x64 directives | This topic describes the changes that are needed to properly decorate the INF for use on 64-bit Windows.
General install section directives | This is a general reminder that all references to out-of-box or production/retail binaries, services, regadd, or delreg sections that are normally part of a retail WHQL driver package are not listed in the Windows in-box driver packages.
[String] section changes for localized strings | This INF requirement ensures that pseudo-localized builds work. The requirement is to delineate localizable versus non-localizable strings within the strings section.
Driver DLL for display adapter or chipset has properly formatted file version | This topic describes the proper formatting for display driver DLLs.

Send comments about this topic to Microsoft


Updated feature score directive in Windows 8

The updated feature score directive is a general installation setting that's required for all Windows 8 drivers that
follow the Windows Display Driver Model (WDDM).
This table shows the values that apply for Windows 8. Key changes are italicized.
Feature scores for WDDM versions

DRIVER MODEL FEATURE SCORE

Windows 8 WHQL E0

Windows 8 Pre-Release Driver E3

Windows 7 WHQL E6

Windows 7 inbox EC

Windows Vista WHQL F6

Windows Vista inbox F8

Microsoft Basic Display Driver FB

XDDM third-party FC (Not used in Windows 8)

XDDM inbox in Windows Vista FD (Not used in Windows 8)

VGA FE (Not used in Windows 8)

Default or No Score FF

Unsigned drivers No feature score = FF

Each operating system release introduces a new feature score value. For Windows 8 this is E3 for in-box and pre-
release drivers, and E0 for WHQL drivers. The feature score is used by Windows to determine which driver to install
when multiple possible drivers exist. A driver with a higher ranked feature score is selected.
All Windows 8 in-box driver devices have a higher ranked feature score than all existing Windows 7 drivers
because the in-box drivers are tested on Windows 8, and existing Windows 7 drivers have not been.
This results in the in-box Windows 8 driver replacing existing Windows 7 drivers. An independent hardware vendor
(IHV) can use the E0 feature score with a Windows 7 driver if the following is true:
The driver has been tested for Windows 8.
The driver has fixes that make it better than the in-box driver.
The driver is intended to be retained on upgrade to Windows 8.
Send comments about this topic to Microsoft
Driver matching criteria

This topic describes the elements that are used to choose the best match on a driver.
The following elements are used to choose the best match on a driver. They are listed in order from most
significant to least significant:
1. Signature
a. Signed
b. Unsigned
2. Scope
a. Specific
b. Basic - DNF_BASIC_DRIVER
3. Signature score
a. Within signed
a. #define SIGNERSCORE_LOGO_PREMIUM 0x0D000001
b. #define SIGNERSCORE_LOGO_STANDARD 0x0D000002
c. #define SIGNERSCORE_INBOX 0x0D000003
d. #define SIGNERSCORE_UNCLASSIFIED 0x0D000004 // UNCLASSIFIED == INBOX == STANDARD
== PREMIUM when the SIGNERSCORE_MASK filter is applied
e. #define SIGNERSCORE_WHQL 0x0D000005 // base WHQL.
f. #define SIGNERSCORE_AUTHENTICODE 0x0F000000
b. Within unsigned
a. #define SIGNERSCORE_UNSIGNED 0x80000000
b. #define SIGNERSCORE_W9X_SUSPECT 0xC0000000
c. #define SIGNERSCORE_UNKNOWN 0xFF000000
4. Feature Score, for display
a. Windows 8 WHQL E0
b. Windows 8 Pre-Release Driver E3
c. Windows 7 WHQL E6
d. Windows 7 Inbox EC
e. Windows Vista WHQL F6
f. Windows Vista Inbox F8
g. Microsoft Basic Display Driver FB
h. XDDM 3rd party FC (Not used in Windows 8)
i. XDDM Inbox in Windows Vista FD (Not used in Windows 8)
j. VGA FE (Not used in Windows 8)
k. Default or No Score FF
l. Unsigned drivers FF
m. No Feature score FF
5. Match type (INF matches are listed under the models section as Description=Install Section, HWID,
CompatID. With 0 or 1 HW IDs and 0 or more CompatIDs)
a. Device HardwareID == INF HardwareID
b. Device HardwareID == INF CompatID
c. Device CompatID == INF HardwareID
d. Device CompatID == INF CompatID
6. Match rank: priority of match within list of matches from device
7. Driver date
8. Driver version number
Send comments about this topic to Microsoft
Updated friendly name for WDDM 1.2

This topic describes the updated friendly name for a Graphics INF. This is a localizable string name requirement for
all Windows 8 in-box display driver INFs.
All Windows 8 in-box drivers must use the E3 feature score, regardless of the friendly name. The friendly name will
reflect the driver model supported by the INF that's described here.
For Windows Display Driver Model (WDDM) 1.2 drivers that were tested on Windows 8 and that are included in the
box in Windows 8, (Microsoft Corporation – WDDM v1.2) must be appended to the device name, as shown in this
example:

;
; Localizable Strings
;
IHV_DeviceName.XXX = "My Device Name (Microsoft Corporation – WDDM v1.2)"

Note
To easily highlight drivers for testing only, that are going to enable Windows 8–specific optional features that are
optimized for Windows 8, we recommend the following input so that users can easily determine that it is not a
standard Windows 8 driver. (This should also make bugs easier to triage).
For example: WDDM 1.2 specific work

IHV_DeviceName.XXX = "My Device Name (Engineering Sample – WDDM v1.2)"

For WDDM 1.1 drivers that were tested on Windows 8 and that are included in the box in Windows 8, (Microsoft
Corporation – WDDM v1.1) must be appended to the device name, as shown in this example:

;
; Localizable Strings
;
IHV_DeviceName.XXX = "My Device Name (Microsoft Corporation – WDDM v1.1)"

For WDDM 1.0 drivers that were tested on Windows 8 and that are included in the box in Windows 8, (Microsoft
Corporation – WDDM v1.0) must be appended to the device name, as shown in this example:

;
; Localizable Strings
;
IHV_DeviceName.XXX = "My Device Name (Microsoft Corporation – WDDM v1.0)"

Send comments about this topic to Microsoft


SKU differentiation directive

With Windows Server 2008 and Windows Vista SP1, the in-box display driver INFs were modified to include a new
value that represented the drivers as Client Only, meaning that the drivers would not install on server SKUs of
Windows. This directive is required for all display drivers in Windows 8.
In Windows Vista before SP1, the following values were used:

X86:
[Manufacturer]
%ATI% = ATI.Mfg

[ATI.Mfg]

In Windows Vista SP1 and Windows Server 2008, the following values were used:


X86:
[Manufacturer]
%ATI% = ATI.Mfg,NTx86...1

[ATI.Mfg.NTx86...1]

X64:
[Manufacturer]
%ATI% = ATI.Mfg,NTamd64...1

[ATI.Mfg.NTamd64...1]

For Windows 8, the same values that were used for Windows Vista SP1 and Windows Server 2008 are used.

SKU differentiation for device drivers


Independent hardware vendors (IHVs) can use ProductType INF values to indicate that a given INF is valid for
server or client platforms only. This works on Windows XP and later operating systems, and the changes are
relatively simple to implement. With this directive in place, even if a client-only driver package exists in the
driver store of a server system, that driver is not installable.
The INF Manufacturer Section topic shows how to add TargetOSVersion to filter device installations based on
various criteria. One of these criteria is ProductType, which can be used to specify a category of SKUs on which the
package can be installed. The following values are defined for ProductType:

0x0000001 (VER_NT_WORKSTATION)
0x0000002 (VER_NT_DOMAIN_CONTROLLER)
0x0000003 (VER_NT_SERVER)

For any given architecture, a typical INF is decorated to install on any SKU in the following way:
[Manufacturer]
%MSFT%=Models,NTamd64

[Models.NTamd64]
<models entries>

To restrict this INF to install on client only, you need to add a ProductType of "1" to the decoration. The
number may be expressed as decimal or hexadecimal; the documentation shows hexadecimal, but the example below
uses decimal for simplicity.

[Manufacturer]
%MSFT%=Models,NTamd64...1

; models section for workstation


[Models.NTamd64...1]
<models entries>

For server, the syntax breaks down into installing on a client and on a plain server. Each of these has its own
product type; unfortunately, the INF syntax requires you to specify both to cover both cases. Thus you need to
duplicate the entire models section to fully cover the server SKU:

[Manufacturer]
%MSFT%=Models,NTamd64...1,NTamd64...3

; models section for client


[Models.NTamd64...1]
IHV_DeviceName.XXX = "Foo Generic Device Name (Microsoft Corporation – WDDM v1.2)"
IHV_DeviceName.YYY = "Foo Enthusiast Device Name (Microsoft Corporation – WDDM v1.2)"
<models entries>

; models section for Server


[Models.NTamd64...3]
IHV_DeviceName.XXX = "Foo Generic Name (Microsoft Corporation – WDDM v1.2)"
IHV_DeviceName.ZZZ = "Foo Datacenter Name (Microsoft Corporation – WDDM v1.2)"
<models entries>

Send comments about this topic to Microsoft


General Unicode requirement in INF files
8/30/2017 • 1 min to read • Edit Online

INF files should be saved and encoded as Unicode (UTF-16); they must not be ANSI or UTF-8.
To check for Unicode in INF files
1. Use Microsoft Notepad to open the INF file.
2. On the File menu, click Save As.
3. If ANSI appears in the Encoding field of the dialog box, change the encoding to Unicode and save the file
under a new name.
This figure shows the Save As dialog box for a file that has ANSI encoding:

The proper default value is shown in this figure:

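In addition to the Notepad check above, the encoding can be verified programmatically. The following is a
minimal sketch in C, under the assumption that the INF was saved with Notepad's Unicode option, which writes a
UTF-16 LE byte-order mark (0xFF 0xFE) at the start of the file; the file name used here is only an illustration.

#include <stdbool.h>
#include <stdio.h>

/* Returns true if the file begins with the UTF-16 LE byte-order mark (0xFF 0xFE),
   which is what Notepad writes when a file is saved with the "Unicode" encoding. */
static bool InfHasUtf16LeBom(const char *path)
{
    unsigned char bom[2];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;

    size_t bytesRead = fread(bom, 1, sizeof(bom), f);
    fclose(f);

    return bytesRead == sizeof(bom) && bom[0] == 0xFF && bom[1] == 0xFE;
}

int main(void)
{
    /* Hypothetical INF name, for illustration only. */
    const char *infPath = "r200.inf";
    printf("%s is %s\n", infPath,
           InfHasUtf16LeBom(infPath) ? "UTF-16 LE (Unicode)" : "not UTF-16 LE");
    return 0;
}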
Send comments about this topic to Microsoft


Installed display drivers directive
4/26/2017 • 1 min to read • Edit Online

The installed display drivers directive is a software device setting that gives the proper name for the UMD that is
installed as part of the driver package.

HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, UserModeDriverName1, UserModeDriverName2, UserModeDriverNameWow1, UserModeDriverNameWow2

For example:
X86:
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, r200umd

X64:
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, r200umd, r200umdva, r200umd64, r200umd64va

Send comments about this topic to Microsoft


Copy flags to support PnP stop directive
4/26/2017 • 1 min to read • Edit Online

The Plug and Play (PnP) stop directive is a file-section copy flag that is required for the Windows Display
Driver Model (WDDM) to support driver upgrades that don't require a reboot.
Note
This is required only for the user-mode driver binaries, not for the kernel-mode driver entry.
For example:

;
; File sections
;

[r200.Miniport]
r200.sys

[r200.Display]
r200umd.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME
r200umd2.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME

Send comments about this topic to Microsoft


Driver\services start type directive
8/23/2017 • 1 min to read • Edit Online

The driver\services start type directive is a service installation setting requirement for all display drivers. Windows
Display Driver Model (WDDM) drivers are Plug and Play (PnP) and therefore must be demand started, where
StartType =3.
For example:

;
; Service Installation Section
;

[R200_Service_Inst]
ServiceType = 1 ; SERVICE_KERNEL_DRIVER
StartType = 3 ; SERVICE_DEMAND_START
ErrorControl = 0 ; SERVICE_ERROR_IGNORE
LoadOrderGroup = Video
ServiceBinary = %12%\r200.sys

Send comments about this topic to Microsoft


Capability override settings to disable OpenGL
4/26/2017 • 1 min to read • Edit Online

This software device setting for all in-box display INFs ensures that no in-box drivers are exposed to possible
interoperability issues with out-of-box OpenGL ICDs.
For example:

[R200_SoftwareDeviceSettings]
HKR,, CapabilityOverride, %REG_DWORD%, 0x8

Send comments about this topic to Microsoft


[Version] section directives
7/27/2017 • 1 min to read • Edit Online

This topic describes [Version] section directives in the INF.


All inbox drivers must not reference the Layout.inf file.
All inbox drivers must not reference any catalog files.
For example:

[Version]
Signature="$Windows NT$"
Provider=%MSFT%
ClassGUID={4D36E968-E325-11CE-BFC1-08002BE10318}
Class=Display
DriverVer=11/22/2004, 6.14.10.7000

Note:
no line item for LayoutFile=layout.inf
no line item for CatalogFile=delta.cat

WHQL display drivers must not reference the Layout.inf file.


For example:

[Version]
Signature="$Windows NT$"
Provider=%IHV%
ClassGUID={4D36E968-E325-11CE-BFC1-08002BE10318}
Class=Display
DriverVer=11/22/2004, 6.14.10.7000

Note:
no line item for LayoutFile=layout.inf

Send comments about this topic to Microsoft


[SourceDiskNames] section directives
7/27/2017 • 1 min to read • Edit Online

On Windows Vista and later, in-box INFs use the [SourceDisksXxx] directives. However, the values of these sections
were changed from what had previously typically been noted in an independent hardware vendor (IHV) production
driver package.

[SourceDisksNames] and [SourceDisksFiles] section directives


For example, for IHV production drivers:


[SourceDisksNames]
1 = %DiskID1%

[SourceDisksFiles]
r200.sys = 1
r200umd.dll = 1

This is the Windows inbox INF requirement:

[SourceDisksNames]
3426=windows cd

[SourceDisksFiles]
IHVKDM.sys = 3426
IHVUMD.dll = 3426
IHVVID.dll = 3426

[SignatureAttributes] section directives


On Windows Vista and later, inbox INFs use the [SignatureAttributes] directives.
There is no need to reference the miniport (.sys) file.
For example:

[SignatureAttributes]
IHVUMD1.dll=SignatureAttributes.PETrust
IHVUMD2.dll=SignatureAttributes.PETrust

[SignatureAttributes.PETrust]
PETrust=true

Send comments about this topic to Microsoft


General x64 directives
4/26/2017 • 1 min to read • Edit Online

This topic describes the changes that are needed to properly decorate the INF for use on 64-bit Windows.
For example:

[DestinationDirs]
DefaultDestDir = 11
R200.Miniport = 12 ; drivers
R200.Display = 11 ; system32
R200.DispWow = 10, SysWow64

[Manufacturer]
%ATI% = ATI.Mfg, NTamd64

[ATI.Mfg.NTamd64]

[R200_RV200]
FeatureScore=F8
CopyFiles=R200.Miniport, R200.Display, R200.DispWow
AddReg = R200_SoftwareDeviceSettings
AddReg = R200_RV200_SoftwareDeviceSettings
DelReg = R200_RemoveDeviceSettings

; File sections
;

[r200.Miniport]
r200.sys

[r200.Display]
r200umd.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME

[R200.DispWow]
r2umd32.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME

Send comments about this topic to Microsoft


General install section directives
6/12/2017 • 1 min to read • Edit Online

This is a general reminder that all references to out-of-box or production/retail binaries, services, and
add-registry or delete-registry sections that are normally part of a retail WHQL driver package are not listed
in the Windows in-box driver packages.
Do not refer to anything that is required by your OpenGL ICDs, OpenCL, Control Panel, Help files, out-of-box
services, polling applications, and so on.
Send comments about this topic to Microsoft
[String] section changes for localized strings
7/27/2017 • 1 min to read • Edit Online

This INF requirement ensures that pseudo-localized builds work. The requirement is to delineate localizable versus
non-localizable strings within the strings section.
The following example has no preface of what is localized or not; this should not be used:

[Strings]

REG_MULTI_SZ = 0x00010000
REG_DWORD = 0x00010001

MSFT = "Microsoft"
IHV = "Contoso, Ltd"

The following example should be used instead; note the new lines:

[Strings]

;Localizable
MSFT = "Microsoft"
IHV = "Contoso, Ltd"

;Non-Localizable
REG_MULTI_SZ = 0x00010000
REG_DWORD = 0x00010001

Send comments about this topic to Microsoft


Driver DLL for display adapter or chipset has
properly formatted file version
4/26/2017 • 2 min to read • Edit Online

This topic describes the proper formatting for display driver DLLs.
The file version of the display driver DLLs must be of the form A.BB.CC.DDDD:
The A field must be set to 9 for WDDM 1.2 drivers on Windows 8.
The A field must be set to 8 for WDDM 1.1 drivers on Windows 7.
The A field must be set to 7 for WDDM 1.0 drivers on Windows Vista.
The A field must be set to 6 for XDDM drivers on Windows Vista.
For Windows 7 and earlier (WDDM 1.1 and earlier) drivers the BB field must be set to the DDI version that the
driver supports:
DirectX 9 drivers (which expose any of the D3DDEVCAPS2_* caps) must set BB to 14.
DirectX 10 drivers must set BB to 15.
Direct3D 11-DDI driver on Direct3D 10 hardware must set BB to 16.
Direct3D 11-DDI driver on Direct3D 11 hardware must set BB to 17.
For Windows 8 (WDDM 1.2) drivers the BB field must be set to the highest DirectX feature level supported by the
driver on the graphics hardware covered by the driver:
A Feature Level 9 driver must set BB to 14.
A Feature Level 10 driver must set BB to 15.
A Feature Level 11 driver must set BB to 17.
A Feature Level 11_1 driver must set BB to 18.
Because BB for WDDM 1.2 drivers reflects the supported feature level irrespective of the hardware DirectX level,
the value 16 is not used; it was specific to the Direct3D 11 DDI on DirectX 10 hardware for WDDM 1.1 drivers.
The CC field can be equal to any value between 01 and 9999.
The DDDD field can be set to any numerical value between 0 and 9999.
For example:
Windows Vista DirectX 9.0–compatible WDDM drivers can use the range 7.14.01.0000 to 7.14.9999.9999.
Windows 7 DirectX 10.0–compatible WDDM 1.1 drivers can use the range 8.15.01.0000 to 8.15.9999.9999.
Windows 8 WDDM 1.2 drivers on DX10 hardware would be 9.15.01.0000 to 9.15.9999.9999.
Recommendation (this will become a requirement in a future release): We highly recommend that the DriverVer
in the display driver .INF file also conform to the above DLL version-numbering requirement, except that for
Windows 8, WDDM 1.2 drivers, the BB field in the INF DriverVer must be set for the highest DirectX feature level
that is supported by the driver on the graphics hardware listed in the INF.
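As a worked illustration (not part of the requirement text above), the following minimal C sketch checks a
Windows 8 WDDM 1.2 file version string against the A.BB.CC.DDDD rules described in this topic; the function name
and the parsing approach are assumptions made only for the example.

#include <stdbool.h>
#include <stdio.h>

/* Checks a WDDM 1.2 (Windows 8) display driver file version of the form A.BB.CC.DDDD:
   A must be 9; BB must be 14, 15, 17, or 18 (feature level 9, 10, 11, or 11_1);
   CC must be 01-9999; DDDD must be 0-9999. */
static bool IsValidWddm12FileVersion(const char *version)
{
    unsigned a, bb, cc, dddd;

    if (sscanf(version, "%u.%u.%u.%u", &a, &bb, &cc, &dddd) != 4)
        return false;

    if (a != 9)                                        /* WDDM 1.2 on Windows 8 */
        return false;

    if (bb != 14 && bb != 15 && bb != 17 && bb != 18)  /* highest supported feature level */
        return false;

    return (cc >= 1 && cc <= 9999) && (dddd <= 9999);
}

int main(void)
{
    printf("%d\n", IsValidWddm12FileVersion("9.17.10.1234"));  /* 1: valid for WDDM 1.2 */
    printf("%d\n", IsValidWddm12FileVersion("8.15.10.1234"));  /* 0: WDDM 1.1-style version */
    return 0;
}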
Send comments about this topic to Microsoft
WDDM 1.2 installation scenarios
4/26/2017 • 1 min to read • Edit Online

The graphics driver behavior during Windows 8 installation is designed to ensure that, whenever possible, our customers
get a graphics driver that has been tested and certified for Windows 8. This behavior is defined by the rules that are
described in this section.

In this section
TOPIC DESCRIPTION

Windows 8 in-box graphics driver preferred: In this scenario, Windows 8 in-box graphics drivers are preferred
over Windows 7 or older graphics drivers.

Windows 8 in-box graphics drivers treated as generic drivers: In this scenario, Windows 8 in-box graphics
drivers, including the MS Basic Display Driver (MSBDD), are all treated like generic drivers by Windows and
Windows Update.

WDDM graphics driver migrated to Windows 8: When there is no Windows 8 in-box coverage for the graphics
hardware in a Windows 8 upgrade installation, a WDDM 1.1 or WDDM 1.0 graphics driver that was used by the
previous version of Windows will be migrated to Windows 8.

XDDM drivers not supported for Windows 8: XDDM drivers are not supported for Windows 8 and will not install or
run on Windows 8.

Send comments about this topic to Microsoft


Windows 8 in-box graphics driver preferred
4/26/2017 • 1 min to read • Edit Online

In this scenario, Windows 8 in-box graphics drivers are preferred over Windows 7 or older graphics drivers.
In a Windows 8 upgrade installation, if the graphics hardware is covered by a Windows 8 in-box driver, the
graphics driver from the previous Windows version is not migrated to Windows 8. This is true even when the
older graphics driver is a 4-part match (graphics hardware + PC model specific), and the Windows 8 in-box
graphics driver is only a 2-part match (graphics hardware specific only). This is because the Windows 8 in-box
driver has a better feature score than any driver that was in-box or logo'd for a previous Windows release. To
understand the driver selection criteria, see Driver matching criteria. If a Windows 8 certified driver was installed on
Windows 7 before the Windows 8 upgrade installation, that driver will migrate.
Note
A Windows 7 in-box graphics driver will never migrate to Windows 8, even if there is no Windows 8 in-box
coverage for the graphics hardware. In this case, Windows 8 uses the Microsoft Basic Display Driver (MSBDD).
Send comments about this topic to Microsoft
Windows 8 in-box graphics drivers treated as generic
drivers
6/12/2017 • 1 min to read • Edit Online

In this scenario, Windows 8 in-box graphics drivers, including the MS Basic Display Driver (MSBDD), are all treated
like generic drivers by Windows and Windows Update.
This means that any newer matching graphics driver package on Windows Update is offered as an Important
update.
If the default settings for Windows 8 Windows Update are being used, an Important driver update downloads and
installs without user intervention, and often without the user noticing that this update occurred.
When Windows 8 ships, the in-box graphics drivers are significantly older than the latest driver updates on
Windows Update. This behavior ensures that the user experiences Windows 8 by using the latest/best graphics
driver available.
Note
The Windows 8 certified OEM graphics drivers that are provided on new computers sold with Windows 8 pre-
installed are not considered generic. A newer, matching graphics driver on Windows Update would be offered as
an Optional update in these cases. The user must actively choose to install an Optional driver update.
Send comments about this topic to Microsoft
WDDM graphics driver migrated to Windows 8
4/26/2017 • 1 min to read • Edit Online

When there is no Windows 8 in-box coverage for the graphics hardware in a Windows 8 upgrade installation, a
WDDM 1.1 or WDDM 1.0 graphics driver that was used by the previous version of Windows will be migrated to
Windows 8.
The Windows 8 installer can block certain problem drivers. Such drivers are not migrated to Windows 8. Drivers
are identified for such blocks based on issues that are reported by using Windows telemetry. In such cases,
Windows 8 uses the MSBDD until a newer driver is installed by Windows Update or by the user from an OEM/IHV
support site.
Send comments about this topic to Microsoft
XDDM drivers not supported for Windows 8
4/26/2017 • 2 min to read • Edit Online

XDDM drivers are not supported for Windows 8 and will not install or run on Windows 8.
If the graphics hardware is not supported by a Windows 8 in-box graphics driver, Windows 8 will run the MSBDD
until a Windows 8 compatible driver is installed from Windows Update or an OEM/IHV site.
Note
The vendor can develop a Windows 8-compatible display-only driver for the hardware if it is a server product.
Table 1 Driver Upgrade Experience in Windows 8 summarizes graphics driver migration behavior during a
Windows 8 upgrade and clean installations. In this table, ITB = in-box, and OTB = out-of-the-box; this is an OEM or
IHV retail driver package or Windows Update package.
Table 1 Driver Upgrade Experience in Windows 8

DRIVER USED IN WINDOWS 7                                         SCENARIO   WINDOWS 8 IN-BOX COVERAGE   RESULTING INITIAL DRIVER IN WINDOWS 8

Win7 OTB Driver / Win7 ITB Driver / No Driver / XDDM Driver      Upgrade    ITB Driver Support          Win8 ITB Driver

Win7 OTB Driver                                                  Upgrade    No ITB Driver Support       Win7 OTB Driver

Win7 ITB Driver / No Driver / Blocked OTB Driver / XDDM Driver   Upgrade    No ITB Driver Support       Win8 MSBDD

N/A                                                              Clean      ITB Driver Support          Win8 ITB Driver

N/A                                                              Clean      No ITB Driver Support       Win8 MSBDD

In cases where the Windows 7 graphics driver itself is not migrated, any IHV or OEM value-add components from
the Windows 7 graphics driver package, such as control panels and OpenGL support libraries, can persist after a
Windows 8 upgrade installation. This happens because the Windows 8 installer cannot know that these value-add
components are associated with the Windows 7 retail or OEM driver package. These value-add components might
not function properly in the absence of the rest of their driver package.
IHVs should harden these value-add components to simply exit in such cases. In the rare cases where the value-add
causes problems, the specific value-add components can be blocked from migrating by the Microsoft compatibility
team. In some cases, an IHV's Windows 8 in-box driver removes the value-add component on upgrade. This is
up to the IHV.
Some retail and OEM Windows 7 graphics drivers are intentionally structured to prevent their installation on
Windows 8. Windows 8 might try to migrate such a driver according to the rules above, but it would fail to install
on Windows 8, resulting in the use of the MSBDD.
An IHV can create a unified driver package that is a WDDM 1.2 driver on Windows 8, but that appears like a WDDM
1.1 or 1.0 driver on previous Windows releases.
Send comments about this topic to Microsoft
WDDM 1.2 driver enforcement guidelines
4/26/2017 • 1 min to read • Edit Online

This section describes WDDM 1.2 driver enforcement guidelines.

In this section
TOPIC DESCRIPTION

WDDM 1.2 driver enforcement: Validation by the Microsoft DirectX graphics kernel subsystem (Dxgkrnl) is
enforced starting with Windows 8 to determine whether the mandatory Windows Display Driver Model (WDDM) 1.2
features are supported by the WDDM 1.2 driver.

WDDM driver and feature caps: This topic describes WDDM driver feature capabilities (caps).

WDDM 1.2 best practices: To deliver the best experience in Windows 8 and later, Windows takes advantage of the
graphics hardware paired with a WDDM 1.2 or later driver. This section summarizes the best practices.

Send comments about this topic to Microsoft


WDDM 1.2 driver enforcement
4/26/2017 • 1 min to read • Edit Online

Validation by the Microsoft DirectX graphics kernel subsystem (Dxgkrnl) is enforced starting with Windows 8 to
determine whether the mandatory Windows Display Driver Model (WDDM) 1.2 features are supported by the
WDDM 1.2 driver.
WDDM 1.2 has both mandatory and optional features. The driver must set all the mandatory feature caps to claim
itself as a WDDM 1.2 driver, while the driver can implement any combination (or none) of the optional features. A
non-WDDM 1.2 driver must report none of the WDDM 1.2 features.

User experience when a driver fails the Dxgkrnl validation


If a driver has wrongly claimed itself as WDDM 1.2 or has implemented only some of the mandatory features, then
it will fail to create an adapter, and the system will fall back to the Microsoft Basic Display Driver (MSBDD).
Send comments about this topic to Microsoft
WDDM driver and feature caps
4/26/2017 • 1 min to read • Edit Online

This topic describes Windows Display Driver Model (WDDM) driver feature capabilities (caps).
This table lists the requirements for a driver to specify to Windows the WDDM driver type and version.
WDDM 1.2 driver requirements

WDDM DRIVER TYPE   DDI REQUIREMENTS

Full Graphics      Implement all the Render-specific and Display-specific required device driver interfaces
                   (DDIs).

Display-Only       Implement all the Display-specific DDIs and return a null pointer for all the
                   Render-specific DDIs.

Render-Only        Implement all the Render-specific DDIs and return a null pointer for all the
                   Display-specific DDIs, or implement all the DDIs for a full WDDM driver but report
                   DISPLAY_ADAPTER_INFO.NumVidPnSources = 0 and DISPLAY_ADAPTER_INFO.NumVidPnTargets = 0.

This table lists all the feature capabilities visible to the Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys)
that WDDM 1.2 drivers are required to set. "M" indicates a mandatory feature, "O" indicates optional, and "NA"
indicates not applicable. To read details about each feature, follow the link in the left column.
WDDM 1.2 feature caps

WDDM version
    Full Graphics driver: M; Render-only driver: M; Display-only driver: M
    Feature caps: DXGK_DRIVERCAPS.WDDMVersion

Plug and Play (PnP) start and stop: Bug check and PnP stop support for non-VGA
    Full Graphics driver: M; Render-only driver: NA; Display-only driver: M
    Feature caps: DXGK_DRIVERCAPS.SupportNonVGA

Optimized screen rotation support
    Full Graphics driver: M; Render-only driver: NA; Display-only driver: M
    Feature caps: DXGK_DRIVERCAPS.SupportSmoothRotation

GPU preemption
    Full Graphics driver: M; Render-only driver: M; Display-only driver: NA
    Feature caps: DXGK_DRIVERCAPS.PreemptionCaps

DXGK_FLIPCAPS.FlipOnVSyncMmIo
    Full Graphics driver: M; Render-only driver: M; Display-only driver: NA
    Feature caps: DXGK_FLIPCAPS.FlipOnVSyncMmIo. FlipOnVSyncMmIo was available starting with Windows Vista; the
    requirement starting with Windows 8 is to set the FlipOnVSyncMmIo cap.

TDR changes in Windows 8
    Full Graphics driver: M; Render-only driver: M; Display-only driver: NA
    Feature caps: DXGK_DRIVERCAPS.SupportPerEngineTDR

Standby hibernate optimizations: Optimizing the graphics stack to improve performance on sleep and resume
    Full Graphics driver: O; Render-only driver: O; Display-only driver: NA
    Feature caps: DXGK_SEGMENTDESCRIPTOR3.Flags

Stereoscopic 3D: New infrastructure to process and present stereoscopic content
    Full Graphics driver: O; Render-only driver: NA; Display-only driver: NA
    Feature caps: D3DKMDT_VIDPN_SOURCE_MODE_TYPE

Direct flip of video memory
    Full Graphics driver: M; Render-only driver: NA; Display-only driver: NA
    Feature caps: DXGK_DRIVERCAPS.SupportDirectFlip

GDI Hardware Acceleration: A required feature starting with WDDM 1.1
    Full Graphics driver: M; Render-only driver: M; Display-only driver: NA
    Feature caps: DXGK_PRESENTATIONCAPS.SupportKernelModeCommandBuffer

GPU power management of idle states and active power
    Full Graphics driver: O; Render-only driver: O; Display-only driver: O
    Feature caps: If this feature is supported, the DxgkDdiSetPowerComponentFState and
    DxgkDdiPowerRuntimeControlRequest functions must be supported.
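The fragment below is a minimal sketch of how a full graphics WDDM 1.2 driver might report several of the caps
from this table in its DxgkDdiQueryAdapterInfo handler. It is not a complete implementation: the preemption
granularity values are assumed examples, and a real driver must report only what its hardware actually supports.

#include <ntddk.h>
#include <dispmprt.h>   // display miniport DDI headers

NTSTATUS APIENTRY DxgkDdiQueryAdapterInfo(
    CONST HANDLE hAdapter,
    CONST DXGKARG_QUERYADAPTERINFO *pQueryAdapterInfo)
{
    UNREFERENCED_PARAMETER(hAdapter);

    if (pQueryAdapterInfo->Type == DXGKQAITYPE_DRIVERCAPS)
    {
        DXGK_DRIVERCAPS *pCaps = (DXGK_DRIVERCAPS *)pQueryAdapterInfo->pOutputData;

        RtlZeroMemory(pCaps, pQueryAdapterInfo->OutputDataSize);

        pCaps->WDDMVersion = DXGKDDI_WDDMv1_2;      // claim WDDM 1.2

        // Mandatory caps for a full graphics driver (see the table above).
        pCaps->SupportNonVGA            = TRUE;     // PnP start/stop for non-VGA
        pCaps->SupportSmoothRotation    = TRUE;     // optimized screen rotation
        pCaps->SupportPerEngineTDR      = TRUE;     // TDR changes in Windows 8
        pCaps->SupportDirectFlip        = TRUE;     // direct flip of video memory
        pCaps->FlipCaps.FlipOnVSyncMmIo = 1;

        // Report the preemption granularity the hardware actually supports;
        // DMA-buffer boundary is shown here only as an assumed example.
        pCaps->PreemptionCaps.GraphicsPreemptionGranularity =
            D3DKMDT_GRAPHICS_PREEMPTION_DMA_BUFFER_BOUNDARY;
        pCaps->PreemptionCaps.ComputePreemptionGranularity =
            D3DKMDT_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY;

        return STATUS_SUCCESS;
    }

    // Other query types (segment information and so on) are omitted from this sketch.
    return STATUS_NOT_SUPPORTED;
}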

Send comments about this topic to Microsoft


WDDM 1.2 best practices
4/26/2017 • 1 min to read • Edit Online

To deliver the best experience in Windows 8 and later, Windows takes advantage of the graphics hardware paired
with a Windows Display Driver Model (WDDM) 1.2 or later driver. This section summarizes the best practices.
System manufacturers:
Ensure the following cases are fully tested and work well with your system configurations:
Compatible with Microsoft Basic Display Driver
Updates on servers that don't need a reboot
Design new servers with WDDM hardware and adopt the relevant WDDM driver type that best suits your
customer's needs.
Work with graphics hardware vendors to get certified WDDM 1.2 drivers for validation.
For headless systems:
System firmware should set the VGA Not Present flag in the IAPC_BOOT_ARCH field of the Fixed ACPI
Description Table (FADT), and if there is any VBIOS, it should implement an empty mode list through the
VESA BIOS Extensions (VBE).
In the absence of VBE support, the headless system should not represent a working display through the
Unified Extensible Firmware Interface (UEFI) Graphics Output Protocol (GOP).
See Windows hardware certification for validation and testing information.
Test a variety of hardware configurations on both desktops and mobile systems to ensure a solid end-user
experience on Windows 8 and later.
Graphics hardware vendors:
Work with Microsoft to develop WDDM 1.2 drivers.
Test pre-release WDDM 1.2 drivers on Windows 8 and later.
Provide updated WDDM 1.x drivers to Microsoft for deployment through Windows Update.
In addition to the Windows certification test suite, validate graphics and gaming performance, application
compatibility, and various self-host scenarios on each ASIC family.
Test WDDM 1.0 and 1.1 drivers on Windows 8 and later.
Make the full retail package for WDDM 1.2 drivers available as early as possible.
Independent software vendors (ISVs):
Test existing and upcoming Microsoft DirectX games with WDDM 1.2 drivers on Windows 8 and later.
Test individual applications on Windows 8 and later.
Take advantage of the Windows 8 DirectX feature improvements.
Send comments about this topic to Microsoft
Introduction to the Windows Display Driver Model
(WDDM)
4/26/2017 • 1 min to read • Edit Online

The following topics introduce the display driver model for Windows Display Driver Model (WDDM) and explain
some benefits of creating display drivers for Windows Vista and later:
Windows Display Driver Model (WDDM) Architecture
Benefits of the Windows Display Driver Model (WDDM)
Migrating to the Windows Display Driver Model (WDDM)
Windows Display Driver Model (WDDM) Operation Flow
Send comments about this topic to Microsoft
Windows Display Driver Model (WDDM) Architecture
4/26/2017 • 1 min to read • Edit Online

The display driver model architecture for the Windows Display Driver Model (WDDM), available starting with
Windows Vista, is composed of user-mode and kernel-mode parts. The following figure shows the architecture
required to support WDDM.

A graphics hardware vendor must supply the user-mode display driver and the display miniport driver. The user-
mode display driver is a dynamic-link library (DLL) that is loaded by the Microsoft Direct3D runtime. The display
miniport driver communicates with the Microsoft DirectX graphics kernel subsystem. For more information about
the user-mode display driver and display miniport driver, see the Windows Display Driver Model (WDDM)
Reference.
Send comments about this topic to Microsoft
Benefits of the Windows Display Driver Model
(WDDM)
4/26/2017 • 1 min to read • Edit Online

Creating display drivers is easier using the Windows Display Driver Model (WDDM), available starting with
Windows Vista, as opposed to using the Windows 2000 Display Driver Model (XDDM), because of the following
enhancements. In addition, WDDM drivers contribute to greater operating system stability and security because
less driver code runs in kernel mode where it can access system address space and possibly cause crashes.
Note XDDM and VGA drivers will not compile on Windows 8 and later versions. If display hardware is attached to
a Windows 8 computer without a driver that is certified to support WDDM 1.2 or later, the system defaults to
running the Microsoft Basic Display Driver.
The Microsoft Direct3D runtime and Microsoft DirectX graphics kernel subsystem perform more of the
display processing (that is, more code is in the runtime and subsystem as opposed to the drivers). This
includes code that manages video memory and schedules direct memory access (DMA) buffers for the GPU.
For more information, see Video Memory Management and GPU Scheduling.
Surface creation requires fewer kernel-mode stages.
Surface creation on operating systems earlier than Windows Vista requires the following successive kernel-
mode calls:
1. DdCanCreateSurface
2. DdCreateSurface
3. D3dCreateSurfaceEx
Surface creation in WDDM requires only the CreateResource user-mode display driver call, which in turn
calls the pfnAllocateCb function. A call to the kernel-mode DxgkDdiCreateAllocation function then
occurs.
Calls that create and destroy surfaces and that lock and unlock resources are more evenly paired.
Video memory, system memory, and managed surfaces are handled identically in WDDM. Operating
systems prior to Windows Vista handled these components in subtly different ways.
Shader translation is performed in the user-mode portion of the display drivers.
This approach eliminates the following complexities that occur when shader translation is performed in
kernel mode:
Hardware models that do not match device driver interface (DDI) abstractions
Complex compiler technology that is used in the translation
Because the shader processing occurs completely per process and hardware access is not required, kernel-
mode shader processing is not required. Therefore, shader translation code can be processed in user mode.
You must write try/except code around user-mode translation code. Translation faults should cause a return
to application processing.
Background translation (that is, translation code that runs in a separate thread from other display-
processing threads) is easier to write for user mode.
Send comments about this topic to Microsoft
Migrating to the Windows Display Driver Model
(WDDM)
4/26/2017 • 2 min to read • Edit Online

Migrating to the Windows Display Driver Model (WDDM) requires driver writers to write completely different
display and video miniport drivers. Similar to the Windows 2000 display driver model (XDDM), WDDM requires a
paired display driver and display miniport driver. However, in WDDM, the display driver runs in user mode. Also,
the model does not use services of the Windows Graphics Device Interface (GDI) engine; the model uses services of
the Microsoft Direct3D runtime and Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys).
WDDM supports display and video miniport drivers written according to XDDM. However, new drivers should be
written as WDDM drivers, whenever possible, to take advantage of software and hardware features available
starting with Windows Vista.
Although driver writers can reuse low-level hardware-dependent code in their WDDM drivers, they should rewrite
new device driver interface (DDI)-related code. When writing WDDM drivers, consider these points:
The display miniport driver must implement a revised set of entry-point functions to interact with the
operating system and the DirectX graphics kernel subsystem. For more information, see DriverEntry of
Display Miniport Driver. The display miniport driver can call any documented kernel function.
The display miniport driver dynamically loads the appropriate DirectX graphics kernel subsystem. The
display miniport driver and the DirectX graphics kernel subsystem call each other through interfaces.
The display miniport driver is no longer required to process most video I/O control codes (IOCTL). In XDDM,
the kernel-mode display driver uses these codes to communicate with the video miniport driver. In WDDM,
the user-mode display driver communicates with the Direct3D runtime; the WDDM graphics kernel
subsystem, in turn, communicates with the display miniport driver. Note The following IOCTLs are still used
in WDDM, and the display miniport driver must process them:
IOCTL_VIDEO_QUERY_COLOR_CAPABILITIES IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS
The user-mode display driver must implement and export an OpenAdapter function, which opens an
instance of the graphics adapter. The user-mode display driver must also implement a CreateDevice
function, which creates representations of display devices that handle collections of rendering state.
The user-mode display driver's CreateResource function, along with the display miniport driver's
DxgkDdiCreateAllocation function, replace the DdCanCreateSurface, DdCreateSurface, and
D3dCreateSurfaceEx functions in XDDM.
Most of the remaining user-mode display driver functions implement the same functionality that the kernel-
mode display driver for XDDM implemented in the following:
The D3dDrawPrimitives2 function and DP2 operation codes
The motion compensation callback functions and DirectX Video Acceleration structures
Send comments about this topic to Microsoft
Windows Display Driver Model (WDDM) Operation
Flow
4/26/2017 • 3 min to read • Edit Online

The following diagram shows the flow of Windows Display Driver Model (WDDM) operations that occur from
when a rendering device is created to when the content is presented to the display. The sequence in the sections
that follow describes the operation flow in more detail.

Creating a Rendering Device

1. After an application requests to create a rendering device, the display miniport driver receives a
   DxgkDdiCreateDevice call. The display miniport driver initializes direct memory access (DMA) by returning a
   pointer to a filled DXGK_DEVICEINFO structure in the pInfo member of the DXGKARG_CREATEDEVICE structure.

2. If the call to the display miniport driver's DxgkDdiCreateDevice succeeds, the Microsoft Direct3D runtime
   calls the user-mode display driver's CreateDevice function.

3. In the CreateDevice call, the user-mode display driver must explicitly call the pfnCreateContextCb function
   to create one or more contexts (GPU threads of execution on the newly created device). The Direct3D runtime
   returns information in the pCommandBuffer and CommandBufferSize members of the D3DDDICB_CREATECONTEXT
   structure to initialize the command buffer.
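The fragment below is a rough sketch of step 3. The MY_DEVICE structure, the stored callback pointer, and the
hRTDevice handle are hypothetical names for state that a real user-mode display driver would keep for the
device it created.

#include <windows.h>
#include <d3d9types.h>
#include <d3dumddi.h>   // user-mode display driver DDI

// Hypothetical per-device state kept by the user-mode display driver.
typedef struct MY_DEVICE
{
    HANDLE                        hRTDevice;          // runtime device handle from CreateDevice
    const D3DDDI_DEVICECALLBACKS *pCallbacks;         // runtime callbacks from CreateDevice
    HANDLE                        hContext;
    VOID                         *pCommandBuffer;
    UINT                          CommandBufferSize;
} MY_DEVICE;

static HRESULT CreateInitialContext(MY_DEVICE *pDevice)
{
    D3DDDICB_CREATECONTEXT createContext = {0};
    HRESULT hr;

    createContext.NodeOrdinal = 0;   // submit to the first (rendering) node

    // Ask the runtime to create a GPU context for this device.
    hr = pDevice->pCallbacks->pfnCreateContextCb(pDevice->hRTDevice, &createContext);
    if (SUCCEEDED(hr))
    {
        // The runtime returns the context handle and the initial command buffer.
        pDevice->hContext          = createContext.hContext;
        pDevice->pCommandBuffer    = createContext.pCommandBuffer;
        pDevice->CommandBufferSize = createContext.CommandBufferSize;
    }
    return hr;
}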

Creating Surfaces for a Device

4. After an application requests to create surfaces for the rendering device, the Direct3D runtime calls the
   user-mode display driver's CreateResource function.

5. The user-mode display driver's CreateResource calls the pfnAllocateCb runtime-supplied function.

6. The display miniport driver receives a DxgkDdiCreateAllocation call, which indicates the number and types of
   allocations to create. DxgkDdiCreateAllocation returns information about the allocations in an array of
   DXGK_ALLOCATIONINFO structures in the pAllocationInfo member of the DXGKARG_CREATEALLOCATION structure.
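Step 6 might look roughly like the following for a single allocation. The MY_ALLOCATION object, the pool tag,
and the size and segment values are assumptions made for this sketch; a real driver derives them from the
private driver data that its user-mode display driver passed through pfnAllocateCb.

#include <ntddk.h>
#include <dispmprt.h>

// Hypothetical per-allocation object kept by the display miniport driver.
typedef struct MY_ALLOCATION
{
    SIZE_T Size;
} MY_ALLOCATION;

NTSTATUS APIENTRY DxgkDdiCreateAllocation(
    CONST HANDLE hAdapter,
    DXGKARG_CREATEALLOCATION *pCreateAllocation)
{
    ULONG i;

    UNREFERENCED_PARAMETER(hAdapter);

    for (i = 0; i < pCreateAllocation->NumAllocations; i++)
    {
        DXGK_ALLOCATIONINFO *pInfo = &pCreateAllocation->pAllocationInfo[i];

        MY_ALLOCATION *pAlloc = (MY_ALLOCATION *)ExAllocatePoolWithTag(
            NonPagedPoolNx, sizeof(MY_ALLOCATION), 'cllA');
        if (pAlloc == NULL)
            return STATUS_NO_MEMORY;

        // In a real driver the size, alignment, and flags come from the user-mode
        // driver's private data (pInfo->pPrivateDriverData).
        pAlloc->Size = 64 * 1024;                        // assumed example size

        pInfo->Size                     = pAlloc->Size;
        pInfo->Alignment                = 0;             // no special alignment
        pInfo->Flags.CpuVisible         = 1;             // allow CPU mapping
        pInfo->PreferredSegment.Value   = 0;             // no segment preference
        pInfo->SupportedReadSegmentSet  = 1;             // segment 1 (assumed)
        pInfo->SupportedWriteSegmentSet = 1;
        pInfo->EvictionSegmentSet       = 1;
        pInfo->hAllocation              = (HANDLE)pAlloc; // driver's allocation handle
    }
    return STATUS_SUCCESS;
}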

Submitting the Command Buffer to Kernel Mode

7. After an application requests to draw to a surface, the Direct3D runtime calls the user-mode display driver
   function related to the drawing operation, for example, DrawPrimitive2.

8. To submit the command buffer to kernel mode, the Direct3D runtime calls either the user-mode display
   driver's Present or Flush function. Also, the user-mode display driver submits the command buffer if the
   command buffer is full.

9. The user-mode display driver calls the pfnPresentCb runtime-supplied function if Present was called, or the
   pfnRenderCb runtime-supplied function if Flush was called or the command buffer is full.

10. The display miniport driver receives a call to the DxgkDdiPresent function if pfnPresentCb was called, or
    the DxgkDdiRender or DxgkDdiRenderKm function if pfnRenderCb was called. The display miniport driver
    validates the command buffer, writes to the DMA buffer in the hardware's format, and produces an allocation
    list that describes the surfaces used.

Submitting the DMA Buffer to Hardware


11. The Microsoft DirectX graphics kernel subsystem calls the display miniport driver's
    DxgkDdiBuildPagingBuffer function to create special purpose DMA buffers, known as paging buffers, that move
    the allocations specified in the allocation list to and from GPU-accessible memory. Note:
    DxgkDdiBuildPagingBuffer is not called for every frame.

12. The DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiSubmitCommand function to
    queue the paging buffers to the GPU execution unit.

13. The DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiPatch function to assign
    physical addresses to the resources in the DMA buffer.

14. The DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiSubmitCommand function to
    queue the DMA buffer to the GPU execution unit. Each DMA buffer submitted to the GPU contains a fence
    identifier, which is a number. After the GPU finishes processing the DMA buffer, the GPU generates an
    interrupt.

15. The display miniport driver is notified of the interrupt in its DxgkDdiInterruptRoutine function. The
    display miniport driver should read, from the GPU, the fence identifier of the DMA buffer that just
    completed.

16. The display miniport driver should call the DxgkCbNotifyInterrupt function to notify the DirectX graphics
    kernel subsystem that the DMA buffer completed. The display miniport driver should also call the
    DxgkCbQueueDpc function to queue a deferred procedure call (DPC).
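Steps 15 and 16 might look roughly like the following. MY_DEVICE_EXTENSION, MyHwIsDmaCompletionInterrupt, and
MyHwReadCompletedFenceId are hypothetical stand-ins for hardware-specific code; DxgkCbNotifyInterrupt and
DxgkCbQueueDpc are reached through the DXGKRNL_INTERFACE that the display miniport driver saved when it started
the device.

#include <ntddk.h>
#include <dispmprt.h>

// Hypothetical device extension holding the interface to Dxgkrnl.
typedef struct MY_DEVICE_EXTENSION
{
    DXGKRNL_INTERFACE DxgkInterface;   // saved from DxgkDdiStartDevice
    // ... hardware register mappings and other per-adapter state ...
} MY_DEVICE_EXTENSION;

// Hardware-specific helpers, assumed for this sketch.
BOOLEAN MyHwIsDmaCompletionInterrupt(MY_DEVICE_EXTENSION *pDevExt);
ULONG   MyHwReadCompletedFenceId(MY_DEVICE_EXTENSION *pDevExt);

BOOLEAN DxgkDdiInterruptRoutine(
    CONST PVOID MiniportDeviceContext,
    ULONG MessageNumber)
{
    MY_DEVICE_EXTENSION *pDevExt = (MY_DEVICE_EXTENSION *)MiniportDeviceContext;
    DXGKARGCB_NOTIFY_INTERRUPT_DATA notifyData = {0};

    UNREFERENCED_PARAMETER(MessageNumber);

    // Step 15: check that the GPU raised this interrupt for a completed DMA buffer.
    if (!MyHwIsDmaCompletionInterrupt(pDevExt))
        return FALSE;   // not this device's interrupt

    notifyData.InterruptType = DXGK_INTERRUPT_DMA_COMPLETED;
    notifyData.DmaCompleted.SubmissionFenceId = MyHwReadCompletedFenceId(pDevExt);
    notifyData.DmaCompleted.NodeOrdinal = 0;
    notifyData.DmaCompleted.EngineOrdinal = 0;

    // Step 16: tell Dxgkrnl which fence completed, then queue the DPC.
    pDevExt->DxgkInterface.DxgkCbNotifyInterrupt(
        pDevExt->DxgkInterface.DeviceHandle, &notifyData);
    pDevExt->DxgkInterface.DxgkCbQueueDpc(pDevExt->DxgkInterface.DeviceHandle);

    return TRUE;
}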

Send comments about this topic to Microsoft


Installation Requirements for Display Miniport and
User-Mode Display Drivers
4/26/2017 • 1 min to read • Edit Online

A display miniport driver for a graphics device is installed on the operating system by using an INF file that is
marked as Class=Display. This INF will be interpreted by the system-supplied display class installer during driver
installation.
The INF file of the graphics device's display miniport driver for Windows Vista and later must store all software
settings under the DDInstall section. Doing so causes the operating system to copy all registry values to the Plug
and Play (PnP) software key in the registry.
To ensure proper installation, the following information must be supplied in the INF file of any display miniport
driver that conforms to the Windows Display Driver Model (WDDM).
Setting the Driver Control Flags
Adding Software Registry Settings
Adding User-Mode Display Driver Names to the Registry
Loading a User-Mode Display Driver
Setting the Driver Feature Score
Setting a Copy-File Flag to Support PnP Stop
Setting the Start Type Value
Disabling Interoperability with OpenGL
Appending Information to the Friendly String Names of Graphics Adapters
Omitting LayoutFile and CatalogFile Information
Identifying Source Disks and Files
General x64 INF Information
General Install Information
Overriding Monitor EDIDs with an INF
You should refer to the Overview of INF Files and INF File Sections and Directives sections for general help in
creating a display miniport driver INF file. For more information about registry root identifiers, such as HKR, see
INF AddReg Directive.
Note There are no INF sections and directives for uninstalling display drivers that are specific to graphic devices.
Send comments about this topic to Microsoft
Setting the Driver Control Flags
4/26/2017 • 1 min to read • Edit Online

The ExcludeFromSelect directive is required for all drivers, except for mirror drivers, that are written to the
Windows Display Driver Model (WDDM).
The following example shows how to add the ExcludeFromSelect directive to a ControlFlags section of the INF
file:

[ControlFlags]
ExcludeFromSelect=*

For more information on driver control flags, see INF ControlFlags Section.
Send comments about this topic to Microsoft
Adding Software Registry Settings
4/26/2017 • 1 min to read • Edit Online

The INF file must add all software registry settings to the Plug and Play (PnP) software key as shown in the
following example:

[Xxx.Mfg]
"RADEON 8500/RADEON 8500LE (R200 LDDM)" = R200_R200, PCI\VEN_1002&DEV_514c&SUBSYS_003a1002

[R200_R200]
Include=msdv.inf
CopyFiles=R200.Miniport, R200.Display
AddReg = R200_SoftwareDeviceSettings
AddReg = R200_R200_SoftwareDeviceSettings
DelReg = R200_RemoveDeviceSettings

Send comments about this topic to Microsoft


Adding User-Mode Display Driver Names to the
Registry
4/26/2017 • 1 min to read • Edit Online

You must set the following entry in an add-registry section of the INF file so that the names of user-mode display
drivers are added to the registry during driver installation:

[Xxx_SoftwareDeviceSettings]
...
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, UserModeDriverName1, UserModeDriverName2,
UserModeDriverNameWow1, UserModeDriverNameWow2

For example, for x86 computers:

[Xxx_SoftwareDeviceSettings]
...
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, r200umd

For example, for x64 computers:

[Xxx_SoftwareDeviceSettings]
...
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, r200umd, r200umdva, r200umd64, r200umd64va

Microsoft Windows Hardware Quality Labs (WHQL) test programs use the list of user-mode display driver names
to validate that the driver binaries remain unchanged over a test run. Other applications might also use the list of
user-mode display driver names, typically through Windows Management Instrumentation (WMI), as the list of files that the
applications determine are part of the driver package.
Send comments about this topic to Microsoft
Loading a User-Mode Display Driver
4/26/2017 • 1 min to read • Edit Online

You must set the following entry in an add-registry section of the INF file so that the user-mode display driver's
DLL name is added to the registry during driver installation and so that the Microsoft Direct3D runtime can
subsequently load the DLL:

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDriverName, %REG_MULTI_SZ%, Xxx.dll

The INF file must contain information to direct the operating system to copy the user-mode display driver into the
system's %systemroot%\system32 directory. For more information, see INF CopyFiles Directive and INF
DestinationDirs Section.
The Direct3D runtime obtains the user-mode display driver's DLL name from the registry in order to load the user-
mode display driver in the runtime's process space.
Send comments about this topic to Microsoft
Setting the Driver Feature Score
4/26/2017 • 1 min to read • Edit Online

The FeatureScore directive is required for all drivers that install and run on Windows Vista and later operating
systems.
Note Applies only to Windows 7 and later versions. The system-supplied display class installer determines
whether to install display drivers based on the presence of the FeatureScore directive and the value that the
FeatureScore directive sets. If you attempt to install display drivers that do not have feature score set, you receive
an error message.
Note A logo test requirement is that drivers that install and run on Windows XP and earlier operating systems and
Windows Server 2003 and earlier operating systems not set the FeatureScore directive.
You must use the FeatureScore directive to set the feature score to the following values, depending on the display
driver model that the driver is written to and how the driver is distributed.
F8 for in-box drivers that are written to the Windows Display Driver Model (WDDM)
F6 for vendor-supplied drivers that are written to WDDM
FC for vendor-supplied drivers that are written to the Windows 2000 display driver model
The following examples show how to add the FeatureScore directive:

[R200_RV200]
FeatureScore=F6
CopyFiles=R200.Miniport, R200.Display
AddReg = R200_SoftwareDeviceSettings
AddReg = R200_RV200_SoftwareDeviceSettings
DelReg = R200_RemoveDeviceSettings

[R200_R200]
FeatureScore=F6
CopyFiles=R200.Miniport, R200.Display
AddReg = R200_SoftwareDeviceSettings
AddReg = R200_R200_SoftwareDeviceSettings
DelReg = R200_RemoveDeviceSettings

[R200_RV250]
FeatureScore=F6
CopyFiles=R200.Miniport, R200.Display
AddReg = R200_SoftwareDeviceSettings
AddReg = R200_RV250_SoftwareDeviceSettings
DelReg = R200_RemoveDeviceSettings

Send comments about this topic to Microsoft


Setting a Copy-File Flag to Support PnP Stop
4/26/2017 • 1 min to read • Edit Online

A new copy-file flag is required for display drivers that are written to the Windows Display Driver Model (WDDM)
in order to properly support Plug and Play (PnP) stop (that is, driver upgrades that don't require a system restart).
Note This flag is required only for user-mode display driver binaries and not for display miniport drivers.
The following example shows the new copy-file flag that is added to just the copy-file section for user-mode
display drivers and not display miniport drivers:

;
; File sections
;

[r200.Miniport]
r200.sys

[r200.Display]
r200umd.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME
r200umd2.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME

For more information about the CopyFiles directive and file sections that are associated with CopyFiles, see INF
CopyFiles Directive.
Send comments about this topic to Microsoft
Setting the Start Type Value
4/26/2017 • 1 min to read • Edit Online

You should set display drivers that are written to the Windows Display Driver Model (WDDM) to start to run on
demand on Windows Vista and later, rather than during operating-system initialization, as was the case with
display drivers that ran on operating systems prior to Windows Vista. This change is due to manifest and image-
based-install functionality that was not present on operating systems prior to Windows Vista. You should set the
value for the StartType entry to SERVICE_DEMAND_START (3) rather than SERVICE_SYSTEM_START (1).
The following example shows a service-install section with the value for the StartType entry set to
SERVICE_DEMAND_START to indicate that the display miniport driver is started on demand:

;
; Service Installation Section
;

[R200_Service_Inst]
ServiceType = 1 ; SERVICE_KERNEL_DRIVER
StartType = 3 ; SERVICE_DEMAND_START
ErrorControl = 0 ; SERVICE_ERROR_IGNORE
LoadOrderGroup = Video
ServiceBinary = %12%\r200.sys

For more information about service-install sections that are associated with the AddService directive, see INF
AddService Directive.
Send comments about this topic to Microsoft
Disabling Interoperability with OpenGL
4/26/2017 • 1 min to read • Edit Online

To ensure that no Microsoft Direct3D display drivers are exposed to possible interoperability issues with OpenGL
installable client drivers (ICDs), you must set the following entry in an add-registry section of the INF:

[Xxx_SoftwareDeviceSettings]
...
HKR,, CapabilityOverride, %REG_DWORD%, 0x8

Send comments about this topic to Microsoft


Appending Information to the Friendly String Names
of Graphics Adapters
4/26/2017 • 1 min to read • Edit Online

You must append information to the string names of graphics adapters. This information depends on the display
driver model that the adapters' drivers are written to:
For the Windows 2000 Display Driver Model, you must append "(Microsoft Corporation)":

XDDM Foo Device Name (Microsoft Corporation)

For the Windows Display Driver Model (WDDM), you must append "(Microsoft Corporation - WDDM)":

New Driver Model Foo Device Name (Microsoft Corporation - WDDM)

For more information about the Strings section and the %strkey% tokens that are specified elsewhere in the INF,
see INF Strings Section.
Send comments about this topic to Microsoft
Omitting LayoutFile and CatalogFile Information
4/26/2017 • 1 min to read • Edit Online

You must not specify any information for the LayoutFile and CatalogFile directives in the Version section. The
following example shows a typical Version section:

[Version]
Signature="$Windows NT$"
Provider=%MSFT%
ClassGUID={4D36E968-E325-11CE-BFC1-08002BE10318}
Class=Display
DriverVer=11/22/2004, 6.14.10.7000

For more information about the Version section and directives that are associated with Version, see INF Version
Section.
Send comments about this topic to Microsoft
Identifying Source Disks and Files
4/26/2017 • 1 min to read • Edit Online

You must specify all in-box display drivers in the SourceDisksNames and SourceDisksFiles sections. The
following example shows how to identify the CD-ROM discs and the names of the source files that are contained
on the disks. The source files are transferred to the target computer during installation.

[SourceDisksNames]
3426=windows cd

[SourceDisksFiles]
DisplayMiniportDriverName.sys = 3426
UserModeDriverName1.dll = 3426
UserModeDriverName2.dll = 3426

For more information about the SourceDisksNames and SourceDisksFiles sections, see INF
SourceDisksNames Section and INF SourceDisksFiles Section.
Send comments about this topic to Microsoft
General x64 INF Information
4/26/2017 • 1 min to read • Edit Online

The following x64-specific information is required for an INF file that loads display drivers that run on 64-bit
Windows Vista and later:

[DestinationDirs]
DefaultDestDir = 11
R200.Miniport = 12 ; drivers
R200.Display = 11 ; system32
R200.DispWow = 10, SysWow64 ; x64-specific

[Manufacturer]
%ATI% = ATI.Mfg, NTamd64 ; Ntamd64 is x64-specific

[ATI.Mfg.NTamd64] ; Ntamd64 is x64-specific

[R200_RV200]
FeatureScore=F8
CopyFiles=R200.Miniport, R200.Display, R200.DispWow ; R200.DispWow is x64-specific
AddReg = R200_SoftwareDeviceSettings
AddReg = R200_RV200_SoftwareDeviceSettings
DelReg = R200_RemoveDeviceSettings

; File sections
;

[r200.Miniport]
r200.sys

[r200.Display]
r200umd.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME

; The following [R200.DispWow] section is x64-specific

[R200.DispWow]
r2umd32.dll,,,0x00004000 ; COPYFLG_IN_USE_TRY_RENAME

Send comments about this topic to Microsoft


General Install Information
4/26/2017 • 1 min to read • Edit Online

Generally, all references to out-of-box or production and retail binaries, services, and add-registry or del-registry
sections that are typically part of your retail Microsoft Windows Hardware Quality Labs (WHQL) driver packages
are not listed in the Windows Vista in-box driver packages.
Examples of these types of references cannot be listed because they vary so much per vendor. However, typically,
you should not refer to anything that is required by your OpenGL installable client drivers' (ICDs) help files, out-of-
box services, polling applications, and so on.
Send comments about this topic to Microsoft
Overriding Monitor EDIDs with an INF
4/26/2017 • 4 min to read • Edit Online

With an INF file you can override the Extended Display Identification Data (EDID) of any monitor. A sample INF file,
Monsamp.inf, that shows how to do this was provided with the Windows Driver Kit (WDK) through Windows 7
(WDK version 7600). Monsamp.inf is reproduced here.
For info on how to use and modify Monsamp.inf, see Monitor INF File Sections.

Approaches to correcting EDIDs


All monitors, analog or digital, must support EDID, which contains info such as the monitor identifier, manufacturer
data, hardware identifier, timing info, and so on. This data is stored in the monitor's EEPROM in a format that is
specified by the Video Electronics Standards Association (VESA).
Monitors provide the EDID to Microsoft Windows components, display drivers, and some user-mode applications.
For example, during initialization the monitor driver queries the Windows Display Driver Model (WDDM) driver for
its brightness query interface and device driver interface (DDI) support, which is in the EDID. Incorrect or invalid
EDID info on the monitor's EEPROM can therefore lead to problems such as setting incorrect display modes.
There are two approaches to correcting EDIDs:
The standard solution is to have the customer send the monitor back to the manufacturer, who reflashes the
EEPROM with the correct EDID and returns the monitor to the customer.
A better solution, described here, is for the manufacturer to implement an INF file that contains the correct EDID
info, and have the customer download it to the computer that's connected to the monitor. Windows extracts the
updated EDID info from the INF and provides it to components instead of the info from the EEPROM EDID,
effectively overriding the EEPROM EDID.
In addition to replacing the EDID info as described here, a vendor can provide an override for the monitor name
and the preferred display resolution. Such an override is frequently made available to customers through Windows
Update or digital media in the shipping box. Such an override receives higher precedence than the EDID override
mentioned here. Guidelines for achieving this can be found in Monitor INF File Sections.

EDID format
EDID data is formatted as one or more 128-byte blocks:
EDID version 1.0 through 1.2 consists of a single block of data, per the VESA specification.
With EDID version 1.3 or enhanced EDID (E-EDID), manufacturers can specify one or more extension blocks in
addition to the primary block.
Each block is numbered, starting with 0 for the initial block. To update EDID info, the manufacturer's INF specifies
the number of the block to be updated and provides 128 bytes of EDID data to replace the original block. The
monitor driver obtains the updated data for the corrected blocks from the registry and uses the EEPROM data for
the remaining blocks.
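Each 128-byte block ends with a checksum byte chosen so that all 128 bytes sum to zero modulo 256; this comes
from the VESA EDID specification rather than from the INF mechanism itself. A block supplied through an EDID
override should satisfy the same rule, and the following small C sketch checks it.

#include <stdbool.h>
#include <stddef.h>

#define EDID_BLOCK_SIZE 128

/* Returns true if the 128-byte EDID block has a valid checksum:
   per the VESA EDID specification, all 128 bytes must sum to 0 modulo 256. */
static bool EdidBlockChecksumIsValid(const unsigned char block[EDID_BLOCK_SIZE])
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < EDID_BLOCK_SIZE; i++)
        sum += block[i];

    return (sum & 0xFF) == 0;
}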

Updating an EDID
To update an EDID by using an INF:
1. The monitor manufacturer implements an INF that contains the updated EDID info and downloads the file to
the user's computer. This can be done through Windows Update or by shipping a CD with the monitor.
2. The monitor class installer extracts the updated EDID info from the INF and stores the info as values under
this registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\DISPLAY

Each EDID override is stored under a separate key. For example:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\DISPLAY\DELA007\
5&1608c50f&0&10000090&01&20\Device Parameters\EDID_Override

3. The monitor driver checks the registry during initialization and uses any EDID info that's stored there
instead of the corresponding info on EEPROM. EDID info that has been added to the registry always takes
precedence over EEPROM EDID info.
4. Windows components and user-mode apps use the updated EDID info.

Overriding an EDID with an INF


To override an EDID, include an AddReg directive in the INF for each block that you want to override, in the
following format:

HKR, EDID_OVERRIDE, BlockNumber, Byte 1, Byte 2, Byte 3, Byte 4,...

The block number is followed by 128 hexadecimal integers that contain the binary EDID data.
Manufacturers must update only those EDID blocks that are incorrect. The system obtains the remaining blocks
from EEPROM. The following example shows the relevant sections of an INF that updates EDID blocks 0, 4, and 5.
The monitor driver obtains blocks 1 - 3 and any extension blocks that follow block 5 from EEPROM:

[ABC.DDInstall.HW]
AddReg=ABC.AddReg
...
[ABC.AddReg]
HKR, EDID_OVERRIDE, 0, 1, 00, FF, ..., 3B
HKR, EDID_OVERRIDE, 4, 1, 1F, 3E, ..., 4E
HKR, EDID_OVERRIDE, 5, 1, 24, 5C, ..., 2D
...

For more info on INFs in general, and AddReg and DDInstall in particular, see Creating an INF File.

; monsamp.INF
;
; Copyright (c) Microsoft Corporation. All rights reserved.
;
; This is a generic INF file for overriding EDIDs
; of any monitors, starting with Windows Vista.
;

[Version]
signature="$WINDOWS NT$"
Class=Monitor
ClassGuid={4D36E96E-E325-11CE-BFC1-08002BE10318}
Provider="MS_EDID_OVERRIDE"
DriverVer=04/18/2006, 1.0.0.0

; Be sure to add the directive below with the proper catalog file after
; WHQL certification.
;CatalogFile=Sample.cat

[DestinationDirs]
DefaultDestDir=23

[SourceDisksNames]
1=%SourceDisksNames%

; Enable the following section to copy a monitor profile.


[SourceDisksFiles]
;profile1.icm=1

[Manufacturer]
%MS_EDID_OVERRIDE%=MS_EDID_OVERRIDE,NTx86,NTamd64

; Modify the hardware ID (MON1234) to match that of the monitor being used.
[MS_EDID_OVERRIDE.NTx86]
%MS_EDID_OVERRIDE-1%=MS_EDID_OVERRIDE-1.Install, MONITOR\MON1234

; Modify the hardware ID (MON1234) to match that of the monitor being used.
[MS_EDID_OVERRIDE.NTamd64]
%MS_EDID_OVERRIDE-1%=MS_EDID_OVERRIDE-1.Install.NTamd64, MONITOR\MON1234

[MS_EDID_OVERRIDE-1.Install.NTx86]
DelReg=DEL_CURRENT_REG
AddReg=MS_EDID_OVERRIDE-1.AddReg, 1024, 1280, DPMS
CopyFiles=MS_EDID_OVERRIDE-1.CopyFiles

[MS_EDID_OVERRIDE-1.Install.NTamd64]
DelReg=DEL_CURRENT_REG
AddReg=MS_EDID_OVERRIDE-1.AddReg, 1024, 1280, DPMS
CopyFiles=MS_EDID_OVERRIDE-1.CopyFiles

[MS_EDID_OVERRIDE-1.Install.NTx86.HW]
AddReg=MS_EDID_OVERRIDE-1_AddReg

[MS_EDID_OVERRIDE-1.Install.NTamd64.HW]
AddReg=MS_EDID_OVERRIDE-1_AddReg

[MS_EDID_OVERRIDE-1_AddReg]
HKR,EDID_OVERRIDE,"0",0x01,0x00,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0x00,0x35,\
0xEE,0x34,0x12,0x01,0x00,0x00,0x00,0x0A,0x0E,0x01,0x03,0x68,0x22,0x1B,\
0x78,0xEA,0xAE,0xA5,0xA6,0x54,0x4C,0x99,0x26,0x14,0x50,0x54,0xA5,0x4B,\
0x00,0x71,0x4F,0x81,0x80,0xA9,0x40,0x01,0x01,0x01,0x01,0x01,0x01,0x01,\
0x01,0x01,0x01,0x30,0x2A,0x00,0x98,0x51,0x00,0x2A,0x40,0x30,0x70,0x13,\
0x00,0x52,0x0E,0x11,0x00,0x00,0x1E,0x00,0x00,0x00,0xFF,0x00,0x41,0x42,\
0x30,0x30,0x30,0x30,0x30,0x30,0x30,0x30,0x30,0x31,0x0A,0x00,0x00,0x00,\
0xFC,0x00,0x4D,0x53,0x20,0x31,0x32,0x33,0x34,0x0A,0x0A,0x0A,0x0A,0x0A,\
0x0A,0x00,0x00,0x00,0xFD,0x00,0x38,0x4C,0x1F,0x50,0x12,0x00,0x0A,0x20,\
0x20,0x20,0x20,0x20,0x20,0x00,0xDB

[DEL_CURRENT_REG]
HKR,MODES
HKR,,MaxResolution
HKR,,DPMS
HKR,,ICMProfile

; Pre-defined AddReg sections. These can be used for default settings


; when a given standard resolution is used.

[1024]
HKR,,MaxResolution,,"1024,768"
[1280]
HKR,,MaxResolution,,"1280,1024"

[DPMS]
HKR,,DPMS,,1
[MS_EDID_OVERRIDE-1.AddReg]
HKR,"MODES\1024,768",Mode1,,"31.0-94.0,55.0-160.0,+,+"
HKR,"MODES\1280,1024",Mode1,,"31.0-94.0,55.0-160.0,+,+"

; Enable the following section to copy a monitor profile.


[MS_EDID_OVERRIDE-1.CopyFiles]
;PROFILE1.ICM

[Strings]
MonitorClassName="Monitor"
SourceDisksNames="MS_EDID_OVERRIDE Monitor EDID Override Installation Disk"

MS_EDID_OVERRIDE="MS_EDID_OVERRIDE"
MS_EDID_OVERRIDE-1="MS EDID Override"



Installation Requirements for Display Drivers
Optimized for Windows 7 and Later

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the
Windows operating system.
Any INF files for display drivers that are written to the Windows Display Driver Model (WDDM), and that are
optimized for Windows 7 and later, must conform to several requirements that are described in Installation
Requirements for Display Miniport and User-Mode Display Drivers. The most notable change is in the
FeatureScore directive.
The following requirements are new for display driver INF files starting with Windows 7:
Setting the Feature Score for Windows 7 Display Drivers
Appending Information to the Friendly String Names for Windows 7 Display Drivers
Differentiating the SKU for Windows 7 Display Drivers
Encoding Windows 7 Display Driver INF Files in Unicode
Setting the Feature Score for Windows 7 Display
Drivers

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the
Windows operating system.
The FeatureScore directive is required for all drivers that install and run on Windows Vista and later operating
systems. The feature score settings that apply for Windows Vista are described in Setting the Driver Feature Score.
The following table shows the feature score settings that apply for Windows 7 and later.

FEATURE SCORE    DRIVER MODEL AND DISTRIBUTION METHOD

E6    Vendor-supplied drivers that are written to the Windows Display Driver
      Model (WDDM), are optimized for the model's Windows 7 features, are
      packaged in a Windows 7 driver package that is qualified by the Windows
      Hardware Quality Labs (WHQL), and are included in the Windows
      Compatibility Center tested products list.

E6    Vendor-supplied drivers that are written to the WDDM, are optimized for
      the model's Windows 7 features, and are packaged with a unified
      Windows 7 and Windows Vista driver package that is certified by using
      the Windows Hardware Certification Kit.

EC    In-box drivers that are written to WDDM, are optimized for the model's
      Windows 7 features, and are packaged with a Windows 7 driver package.

F4    Vendor-supplied drivers that are written to WDDM with a unified
      Windows 7 and Windows Vista driver package, with certification of the
      package by using the Windows Hardware Certification Kit.

F4    In-box drivers that are written to WDDM with a Windows 7 driver
      package.

F6    Vendor-supplied drivers that are written to WDDM with a Windows Vista
      driver package that is qualified by WHQL and included in the Windows
      Vista Compatibility Center tested products list.

F8    In-box drivers that are written to WDDM with a Windows Vista driver
      package.

FC    Vendor-supplied drivers that are written to the Windows 2000 display
      driver model.

FD    In-box drivers in Windows Vista that are written to the Windows 2000
      display driver model.

FE    Video graphics array (VGA) drivers.

For drivers written to WDDM, graphics hardware vendors must place the FeatureScore directive under the
DDInstall section of their INF file and use FeatureScore to apply the feature score to the drivers.
For Windows 2000 Display Driver Model drivers, Microsoft applies the appropriate feature score through the class
installer at the time of driver installation, or in the INF for in-box Windows 2000 Display Driver Model drivers.
Vendors must not use the FeatureScore directive to insert a feature score for drivers written to the Windows 2000
Display Driver Model.
An unsigned driver receives a feature score that is equal to FF. This value is the default and indicates no score.
Appending Information to the Friendly String Names
for Windows 7 Display Drivers

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the
Windows operating system.
A graphics adapter's friendly name is a localizable string name that is required in the INF for every Windows 7 in-
box display driver. The section on Appending Information to the Friendly String Names of Graphics Adapters
describes the information that you must append for the Windows Display Driver Model (WDDM) and the Windows
2000 Display Driver Model. For the WDDM optimized for Windows 7, you must append "(Microsoft Corporation -
WDDM v1.1)":

New Driver Model Foo Device Name (Microsoft Corporation - WDDM v1.1)

The text appended to the graphics adapter's friendly name specifies the WDDM version that the driver uses.
Differentiating the SKU for Windows 7 Display
Drivers

All in-box display driver INF files in Windows Server 2008, Windows Vista SP1, and later versions must include a
value that indicates that the drivers are for Windows Client editions only and that they do not install on Windows
Server SKUs.
In Windows 7, Windows Vista SP1, Windows Server 2008, and Windows Server 2008 R2, the Manufacturer
directive must be followed by a string in the following form:

NT<platform>...1

In this string, platform is x86 or amd64.


The following example shows the Manufacturer directive, and the section that follows it for drivers for x86
systems:

[Manufacturer]
%ATI% = ATI.Mfg,NTx86...1

[ATI.Mfg.NTx86...1]

The following example shows the Manufacturer directive, and the section that follows it for drivers for x64
systems:

[Manufacturer]
%ATI% = ATI.Mfg,NTamd64...1

[ATI.Mfg.NTamd64...1]

For more information about the Manufacturer section, see INF Manufacturer Section.
Encoding Windows 7 Display Driver INF Files in
Unicode

All in-box INF files must be encoded as Unicode; they must not be ANSI.
To encode an INF file as Unicode (or to verify whether the INF file is encoded as Unicode), perform the following
steps:
1. Use Notepad to open the INF file.
2. On the File menu, click Save As.
3. If "ANSI" appears in the Encoding field of the dialog box, change the encoding to "Unicode" and save the file
under a new name.
Initializing Display Miniport and User-Mode Display
Drivers

The following topics describe how the display miniport driver and the user-mode display driver are initialized. Plug
and Play (PnP) cases and initializing use of memory segments are also discussed.
Plug and Play (PnP) start and stop cases (WDDM 1.2 and later)
Providing seamless state transitions in WDDM 1.2 and later
Initializing the Display Miniport Driver
Initializing Communication with the Direct3D User-Mode Display Driver
Initializing Use of Memory Segments
Enumerating GPU engine capabilities
Loading an OpenGL Installable Client Driver
Providing Kernel-Mode Support to the OpenGL Installable Client Driver
Plug and Play (PnP) in WDDM 1.2 and later

All Windows Display Driver Model (WDDM) 1.2 and later display miniport drivers must support the following
behavior in response to start and stop requests by the Plug and Play (PnP) infrastructure. Behavior can differ
depending on whether the driver returns a success or failure code, or whether the system hardware is based on
the basic input/output system (BIOS) or Unified Extensible Firmware Interface (UEFI).

Minimum WDDM version: 1.2
Minimum Windows version: 8
Driver implementation (Full graphics and Display only): Mandatory
WHCK requirements and tests: Device.Graphics.WDDM12.Display.PnpStopStartSupport

Display miniport driver PnP DDI


Starting in Windows 8, the Microsoft DirectX graphics kernel subsystem provides this function that a driver can
call if the display device is started or resumed from hibernation:
DxgkCbAcquirePostDisplayOwnership
These functions and structure are available for the display miniport driver to implement WDDM 1.2 and later PnP
requirements:
DxgkDdiStopDeviceAndReleasePostDisplayOwnership
DxgkDdiSystemDisplayEnable
DxgkDdiSystemDisplayWrite
DXGK_DISPLAY_INFORMATION

PnP start operation


A Plug and Play (PnP) start process on the display device occurs either during boot or during an upgrade from one
display driver to another. In this case the driver must call the DxgkCbAcquirePostDisplayOwnership function to get
information about the frame buffer and to maintain display synchronization. Frame buffer information is provided
either from the firmware or from the previous WDDM 1.2 and later driver that was loaded on the system.
During calls the operating system makes to DxgkDdiSetPowerState function to return to the D0 power state, and
to the DxgkDdiStartDevice function, the WDDM 1.2 and later driver must set source visibility to false
(DXGKARG_SETVIDPNSOURCEVISIBILITY.Visible = FALSE) for all active video present network (VidPN) targets.
In this case the display pipeline hardware must maintain sync signals with the monitor, but the pipeline must
continue to send black pixel data to the monitor no matter what pixel data is present in the surface that's currently
being scanned out. This means that the pixel pipeline is guaranteed to be blanking the monitor with all black
pixels. Later, when the first frame is rendered into the frame buffer, the operating system sets source visibility to
true.
All of these procedures keep the monitor synchronized and ensure that the user doesn't see flashes or flickers on
the screen.
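The following minimal sketch shows one way a driver's DxgkDdiSetVidPnSourceVisibility implementation might
satisfy this blanking requirement. MY_ADAPTER and the HwProgramScanout* helpers are hypothetical placeholders
for vendor-specific hardware programming; this is an illustration, not the required implementation.

// Keep sync signals running, but drive black pixels whenever the source is not visible.
NTSTATUS APIENTRY DxgkDdiSetVidPnSourceVisibility(
    CONST HANDLE hAdapter,
    CONST DXGKARG_SETVIDPNSOURCEVISIBILITY* pSetVidPnSourceVisibility)
{
    MY_ADAPTER* pAdapter = (MY_ADAPTER*)hAdapter;   // hypothetical adapter context

    if (pSetVidPnSourceVisibility->Visible)
    {
        // The operating system has rendered the first frame; scan out real pixels.
        HwProgramScanoutFromSurface(pAdapter, pSetVidPnSourceVisibility->VidPnSourceId);
    }
    else
    {
        // Keep the monitor synchronized but send all-black pixel data,
        // regardless of the contents of the current scan-out surface.
        HwProgramScanoutBlack(pAdapter, pSetVidPnSourceVisibility->VidPnSourceId);
    }

    return STATUS_SUCCESS;
}
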
These are the return codes that the driver should return after a PnP start process.

Success
    Behavior is the same as in Windows 7. For a BIOS-based system, if the
    driver starts successfully, the frame buffer is still active and the
    driver must be ready to set a valid mode.

Failure
    For a BIOS-based system, the driver must leave the system in a
    BIOS-compatible state.
    For a UEFI-based system, the driver must leave the display in the same
    mode that was set by the UEFI Graphics Output Protocol (GOP) so that the
    basic display driver can use the display. The driver must return a valid
    error code. If the driver cannot leave the GOP in a state that can be
    used by the basic display driver, the driver must return the
    STATUS_GRAPHICS_STALE_MODESET error code from Ntstatus.h, and the
    operating system causes a system bugcheck to occur.

PnP stop operation


A Plug and Play (PnP) stop process on the display device typically occurs when a driver is being upgraded to a
new version. In this case the operating system calls the driver's
DxgkDdiStopDeviceAndReleasePostDisplayOwnership function, which requires the driver to provide accurate
frame buffer information.
In the DxgkDdiStopDeviceAndReleasePostDisplayOwnership call the driver must ensure that the source visibility
for the active VidPn targets is true (DXGKARG_SETVIDPNSOURCEVISIBILITY.Visible = TRUE). In addition,
starting in WDDM 1.2 the driver needs to ensure that the surface that the pixel pipeline is programmed to scan out
from is filled with black pixels. The driver should complete filling the surface with black pixels before source
visibility is set to true.
Be sure to also implement DxgkDdiStopDevice in your driver. In some cases the operating system might call
DxgkDdiStopDevice instead of DxgkDdiStopDeviceAndReleasePostDisplayOwnership, or after a call to
DxgkDdiStopDeviceAndReleasePostDisplayOwnership fails.
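The following sketch shows the general shape of a DxgkDdiStopDeviceAndReleasePostDisplayOwnership
implementation that satisfies these requirements. MY_DEVICE and the Hw* helpers are hypothetical placeholders
for vendor-specific code; the DXGK_DISPLAY_INFORMATION members are filled from whatever mode the driver
leaves scanning out.

NTSTATUS APIENTRY DxgkDdiStopDeviceAndReleasePostDisplayOwnership(
    PVOID MiniportDeviceContext,
    D3DDDI_VIDEO_PRESENT_TARGET_ID TargetId,
    DXGK_DISPLAY_INFORMATION* DisplayInfo)
{
    MY_DEVICE* pDevice = (MY_DEVICE*)MiniportDeviceContext;   // hypothetical device context

    // Fill the scan-out surface with black and make the source visible, as
    // required, before handing the display over to the basic display driver.
    HwFillScanoutSurfaceWithBlack(pDevice, TargetId);
    HwSetSourceVisible(pDevice, TargetId, TRUE);

    // Describe the frame buffer that the basic display driver will keep using.
    DisplayInfo->Width        = pDevice->CurrentMode.Width;
    DisplayInfo->Height       = pDevice->CurrentMode.Height;
    DisplayInfo->Pitch        = pDevice->CurrentMode.Pitch;
    DisplayInfo->ColorFormat  = D3DDDIFMT_A8R8G8B8;
    DisplayInfo->PhysicAddress = pDevice->FrameBufferPhysicalAddress;
    DisplayInfo->TargetId     = TargetId;

    // Release hardware resources that the basic display driver does not need.
    HwStopNonDisplayEngines(pDevice);

    return STATUS_SUCCESS;
}
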
These are the return codes that the driver should return after a PnP stop process.

Success, and driver returns mode information
    Before the driver is stopped it must set up a frame buffer, using the
    current resolution, that the basic display driver can use, and the driver
    must return this information when the operating system calls the
    DxgkDdiStopDeviceAndReleasePostDisplayOwnership function. The saved mode
    information doesn't have to be compatible with BIOS, and the basic display
    driver won't offer a BIOS mode until the system is rebooted.
    The operating system guarantees that it won't call DxgkDdiStopDevice if
    DxgkDdiStopDeviceAndReleasePostDisplayOwnership returns STATUS_SUCCESS.

Success, and driver sets the Width and Height members of the
DXGK_DISPLAY_INFORMATION structure to zero
    This scenario is possible only if the system has two graphics cards, no
    monitors are connected to the current power-on self-test (POST) device,
    and the operating system calls the
    DxgkDdiStopDeviceAndReleasePostDisplayOwnership function to stop the POST
    device.
    In this case the current display continues to run on the second graphics
    adapter, and the basic display driver runs in headless mode on the adapter
    that supports the POST device.

Failure
    The operating system calls the Windows 7-style PnP stop driver interface
    through the DxgkDdiStopDevice function.
    For a BIOS-based system, the driver must set the display into a
    BIOS-compatible mode.
    For a UEFI-based system, the basic display driver runs in headless mode on
    the graphics adapter.

For further requirements on PnP and other state transitions, see Providing seamless state transitions in WDDM 1.2
and later.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics.WDDM12.Display.PnpStopStartSupport.
See WDDM 1.2 features for a review of features added with Windows 8.
Providing seamless state transitions in WDDM 1.2
and later

Starting in Windows 8, several features help to minimize or eliminate screen flashes and flickers during the boot
process, during transitions from lower power states, and during transitions back to operating system control in
driver upgrades or system bug checks. In addition, system firmware on Windows 8 and later computers must
detect native resolution and timing of the integrated display panel at the time of power up and hand off this
information to the operating system. Windows Display Driver Model (WDDM) 1.2 and later display miniport
drivers must support this behavior.

Minimum WDDM version: 1.2
Minimum Windows version: 8
Driver implementation (Full graphics and Display only): Mandatory
WHCK requirements and tests: System.Client.Firmware.UEFI.GOP.Display,
Device.Graphics…PnpStopStartSupport, Device.Graphics…DisplayOutputControl

Transition from firmware to operating system


All Windows 8 systems targeted for client SKUs must support the Unified Extensible Firmware Interface (UEFI)
Graphics Output Protocol (GOP). During the boot phase, the GOP sets the native timing and native resolution on
the integrated display panel of the system. When the operating system is ready to take over ownership of the
display, the GOP hands off a frame buffer that can be used to scan out to the display. At this time the operating
system doesn't attempt to reset the display timings or the resolution but simply uses the provided frame buffer,
thereby eliminating one screen flash.
Hardware certification requirements
For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on System.Client.Firmware.UEFI.GOP.Display.

Transition from operating system to driver


When the operating system hands ownership of the display to the WDDM driver after a boot, it initiates a Plug
and Play (PnP) start of the device by calling the DxgkDdiStartDevice function. Alternately, after resuming from
hibernation the operating system starts the device by calling the DxgkDdiSetPowerState function with the
DeviceUid parameter set to DISPLAY_ADAPTER_HW_ID (defined in Video.h). At this time typically the screen is
blanked out (renders as black) while the WDDM graphics driver takes control.
The driver can call the DxgkCbAcquirePostDisplayOwnership function (available starting in Windows 8) to
query the operating system for the exact state of the current frame buffer and the display mode that was set by
the firmware and boot loader. With the information in the DXGK_DISPLAY_INFORMATION structure retrieved
by this function, it's possible for the driver to keep the display controller active and not cause a re-synchronization
of the monitor. Because the driver also has detailed information about the frame buffer, it's possible to perform a
smoother transition.
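A minimal sketch of this query follows, assuming the driver saved the DXGKRNL_INTERFACE that the operating
system passed to DxgkDdiStartDevice in a hypothetical MY_DEVICE context (pDevice); the Hw* helpers are also
hypothetical placeholders.

DXGK_DISPLAY_INFORMATION PostDisplayInfo = {0};

// Ask the OS for the frame buffer and mode left behind by the firmware or the
// previous WDDM driver.
NTSTATUS Status = pDevice->DxgkInterface.DxgkCbAcquirePostDisplayOwnership(
    pDevice->DxgkInterface.DeviceHandle,
    &PostDisplayInfo);

if (NT_SUCCESS(Status) && PostDisplayInfo.Width != 0)
{
    // A valid mode was handed off; keep using its timings and frame buffer so
    // the monitor is not re-synchronized during the transition.
    HwAdoptExistingScanout(pDevice, &PostDisplayInfo);
}
else
{
    // No usable hand-off information; the driver must program a mode itself.
    HwSetDefaultMode(pDevice);
}
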
More details on PnP start are given in Plug and Play (PnP) in WDDM 1.2 and later.

Transition from driver to operating system


The operating system can request a PnP stop of the display device by calling the DxgkDdiStopDevice function. At
this time typically the screen is blanked out (renders as black) while the operating system takes over the display
control. The operating system can call the DxgkDdiStopDeviceAndReleasePostDisplayOwnership function
(available starting in Windows 8) that requires the WDDM driver to set up a frame buffer configured for scan out.
The operating system can render into this frame buffer while it's in control of the display, making it possible to
perform a smooth transition.
More details on PnP stop, including additional scenarios, are given in Plug and Play (PnP) in WDDM 1.2 and later.
Hardware certification requirements
For more info about this handoff, refer to the relevant WHCK documentation on Device.Graphics…
PnpStopStartSupport.

Transition to operating system without disabling driver


Sometimes the operating system experiences an unrecoverable error and has to issue a system bug check. When
this happens, there are certain cases where the operating system has to take control of the display but doesn't
have the ability to stop the WDDM driver. WDDM 1.2 and later drivers are required to implement the
DxgkDdiSystemDisplayEnable and DxgkDdiSystemDisplayWrite functions, which let the operating system
seamlessly transition to a state where it can display the error screen while maintaining the graphical interface at a
high resolution and color depth. This transition eliminates a jarring user experience.
Hardware certification requirements
For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…DisplayOutputControl.

Windows 8 firmware mode changes


These are changes to the firmware's display mode before the firmware hands off control to the operating system:
WDDM 1.2 and later drivers (DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_WIN8)
To further eliminate display flashes, starting with Windows 8, Int10 mode change requests are not called on the
firmware for WDDM 1.2 and later drivers.
In addition, if a mode change occurs while the monitor is turned off, the operating system calls the
DxgkDdiCommitVidPn function only once, with the pCommitVidPnArg parameter set to the value it would have if
the monitor were turned on, and the PathPoweredOff member of pCommitVidPnArg->Flags set to TRUE.
WDDM 1.0 and 1.1 drivers (DXGKDDI_INTERFACE_VERSION < DXGKDDI_INTERFACE_VERSION_WIN8)
For WDDM versions 1.0 and 1.1 drivers running on Windows 8, during the boot process or when resuming from
hibernation, calls into Int10 VGA mode 0x12 are made that set the display resolution to the monitor's native high
resolution. Prior to Windows 8, an Int10 VGA mode 0x12 call set the display resolution to 640 x 480 pixels, at 16
bits per pixel, with no flashing cursor, to show the operating system splash screen image.
However, for WDDM versions 1.0 and 1.1 drivers that indicate they don't support high-resolution mode, starting
in Windows 8 a boot into VGA mode 0x12 sets the display resolution to 640 x 480 pixels, at 16 bits per pixel, with
no flashing cursor. When the system resumes from hibernation, the display resolution will still be set to the
monitor's native high resolution.
In addition, if a mode change occurs while the monitor is turned off, the operating system calls the
DxgkDdiCommitVidPn function as described above for WDDM 1.2 drivers, plus it calls DxgkDdiCommitVidPn a
second time with an empty video present network (VidPN) in pCommitVidPnArg->hFunctionalVidPn, and none
of the flag values set in pCommitVidPnArg->Flags.
This two-part calling sequence also occurs when the system resumes after hibernation and monitor sync
generation is to remain enabled. In this case the driver should take no action when it receives the second call to
DxgkDdiCommitVidPn.
Standby hibernate optimizations

Windows 8 offers optimizations to the graphics stack that your driver can optionally take advantage of to improve
system performance on sleep and resume.

Minimum Windows Display Driver Model (WDDM) version: 1.2
Minimum Windows version: 8
Driver implementation (Full graphics and Render only): Optional
WHCK requirements and tests: Device.Graphics…StandbyHibernateFlags

Standby hibernate device driver interface (DDI)


These structures are new or updated starting with Windows 8 to support standby hibernation.
DXGK_QUERYADAPTERINFOTYPE
DXGK_SEGMENTDESCRIPTOR3
DXGK_SEGMENTFLAGS
Every device that can support this feature should take advantage of these hibernate optimizations. When a WDDM
1.2 or later driver enumerates segment capabilities, it must also set one or more of the standby hibernate flags
PreservedDuringStandby, PreservedDuringHibernate, and PartiallyPreservedDuringHibernate. See
Remarks of the DXGK_SEGMENTFLAGS topic for more details.
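For example, a driver might set these flags in its DXGK_SEGMENTDESCRIPTOR3 array during the
DXGKQAITYPE_QUERYSEGMENT3 query along the following lines. The segment indexes and the choice of flags are
hypothetical and depend on the hardware; pSegmentDescriptors is the array being filled in the second query call.

// An aperture segment backed by system memory keeps its contents across both
// standby and hibernate, so no eviction is needed for it.
DXGK_SEGMENTDESCRIPTOR3* pSegment = &pSegmentDescriptors[APERTURE_SEGMENT_INDEX];
pSegment->Flags.PreservedDuringStandby = 1;
pSegment->Flags.PreservedDuringHibernate = 1;

// A local (VRAM) segment on hardware with self-refreshing memory keeps its
// contents across standby only.
pSegment = &pSegmentDescriptors[LOCAL_SEGMENT_INDEX];
pSegment->Flags.PreservedDuringStandby = 1;
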

Using standby hibernate optimizations


When a PC transitions to sleep or resumes from sleep, several operations occur to make sure that video memory
content is properly preserved and restored. Some of these operations are unnecessary and can be avoided:
An integrated graphics adapter uses system memory as video memory. Because system memory is always
refreshed when a computer goes to sleep, no eviction is necessary. Therefore, the delays that are introduced by
the graphics stack can be brought down to zero delay or to the order of a few milliseconds.
The total time to purge memory on discrete adapters equals the amount of memory that is purged, divided by
the rate of purge. Thus the time can be reduced by reducing the amount of memory to purge.
The goal of these operations is to make sure that the only data that is discarded is data that can be re-created.
WDDM 1.2 drivers can take advantage of these optimizations by specifying which allocations should be preserved
during power state transitions.
Newer generations of discrete graphics adapters can be designed to refresh their memory when in standby (self
refreshing VRAM). These adapters will benefit from these optimizations.
Eviction will still be relevant for discrete graphics adapters that don’t have the self-refreshing VRAM feature. In
these cases, the performance optimization is to minimize the amount of data that is preserved. For example,
unused data in video memory such as offered allocations, discarded allocations, and unused direct memory access
(DMA) buffers can be discarded.
This feature can yield these benefits:
Doing no work: On integrated and discrete graphics adapters (with self-refresh VRAM feature), the delay that is
introduced by the graphics stack can be brought down to zero delay or to the order of a few milliseconds.
Doing less work: On discrete graphics adapters, the performance improvement is mostly dependent on how
much unused data in video memory is discarded.
Reduced memory thrashing: The larger the amount of memory evicted, the greater the effect of memory
thrashing. This has a bigger impact on discrete graphics adapters because they require a large amount of system
memory to evict.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…StandbyHibernateFlags.
See WDDM 1.2 features for a review of features added with Windows 8.
Initializing the Display Miniport Driver

After the operating system has loaded the display miniport driver, the following steps occur to initialize the display
miniport driver:
1. The operating system calls the display miniport driver's DriverEntry function.
2. DriverEntry allocates a DRIVER_INITIALIZATION_DATA structure and populates the Version member of
DRIVER_INITIALIZATION_DATA with DXGKDDI_INTERFACE_VERSION and the remaining members of
DRIVER_INITIALIZATION_DATA with pointers to the display miniport driver's other entry point functions
(that is, the functions that the display miniport driver implements).
3. DriverEntry calls the DxgkInitialize function to load the Microsoft DirectX graphics kernel subsystem
(Dxgkrnl.sys) and to supply the DirectX graphics kernel subsystem with pointers to the display miniport
driver's other entry point functions.
4. After DxgkInitialize returns, DriverEntry propagates the return value of DxgkInitialize back to the
operating system. Display miniport driver writers should make no assumptions about the value that
DxgkInitialize returns.
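A minimal DriverEntry sketch that follows these steps is shown below. The My* entry points stand for the
driver's own implementations and are hypothetical names; a real driver fills in every DDI function it implements.

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    DRIVER_INITIALIZATION_DATA InitData = {0};

    InitData.Version = DXGKDDI_INTERFACE_VERSION;

    // Pointers to the display miniport driver's entry point functions.
    InitData.DxgkDdiAddDevice        = MyDxgkDdiAddDevice;
    InitData.DxgkDdiStartDevice      = MyDxgkDdiStartDevice;
    InitData.DxgkDdiStopDevice       = MyDxgkDdiStopDevice;
    InitData.DxgkDdiRemoveDevice     = MyDxgkDdiRemoveDevice;
    InitData.DxgkDdiQueryAdapterInfo = MyDxgkDdiQueryAdapterInfo;
    // ... all other implemented DDI functions ...

    // Hand the entry points to the DirectX graphics kernel subsystem and
    // propagate its return value back to the operating system unchanged.
    return DxgkInitialize(DriverObject, RegistryPath, &InitData);
}
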
Initializing Communication with the Direct3D User-
Mode Display Driver

To initialize communication with the Microsoft Direct3D user-mode display driver, which is a dynamic-link library
(DLL), the Direct3D runtime first loads the DLL. The Direct3D runtime next calls the user-mode display driver's
OpenAdapter function through the DLL's export table to open an instance of the graphics adapter. The
OpenAdapter function is the DLL's only exported function.
In the call to the driver's OpenAdapter function, the runtime supplies the pfnQueryAdapterInfoCb adapter
callback function in the pAdapterCallbacks member of the D3DDDIARG_OPENADAPTER structure. The runtime
also supplies its version in the Interface and Version members of D3DDDIARG_OPENADAPTER. The user-mode
display driver must verify that it can use this version of the runtime. The user-mode display driver returns a table
of its adapter-specific functions in the pAdapterFuncs member of D3DDDIARG_OPENADAPTER.
The user-mode display driver should call the pfnQueryAdapterInfoCb adapter callback function to query for the
graphics hardware capabilities from the display miniport driver.
The runtime calls the user-mode display driver's CreateDevice function (one of the driver's adapter-specific
functions) to create a display device for handling a collection of render state and to complete the initialization.
When the initialization is complete, the Direct3D runtime can call the display driver-supplied functions, and the
user-mode display driver can call the runtime-supplied functions.
The user-mode display driver's CreateDevice function is called with a D3DDDIARG_CREATEDEVICE structure
whose members are set up in the following manner to initialize the user-mode display driver interface:
The runtime sets Interface to the version of the interface that the runtime requires from the user-mode
display driver.
The runtime sets Version to a number that the driver can use to identify when the runtime was built. For
example, the driver can use the version number to differentiate between a runtime released with Windows
Vista and a runtime released with a subsequent service pack, which might contain a fix that the driver
requires.
The runtime sets hDevice to specify the handle that the driver should use when the driver calls back into
the runtime. The driver generates a unique handle and passes it back to the runtime in hDevice. The
runtime should use the returned hDevice handle in subsequent driver calls.
The runtime supplies a table of its device-specific callback functions in the D3DDDI_DEVICECALLBACKS
structure to which pCallbacks points. The user-mode display driver calls the runtime-supplied callback
functions to access kernel-mode services in the display miniport driver.
The user-mode display driver returns a table of its device-specific functions in the D3DDDI_DEVICEFUNCS
structure to which pDeviceFuncs points.
Note The number of display devices (graphics contexts) that can simultaneously exist is limited only by available
system memory.
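The following sketch outlines an OpenAdapter implementation along these lines. MY_ADAPTER,
SUPPORTED_D3D_UMD_INTERFACE_VERSION, and the My* adapter functions are hypothetical names; a real driver
validates the runtime version it actually supports and fills in all members required by that version.

HRESULT APIENTRY OpenAdapter(D3DDDIARG_OPENADAPTER* pOpenData)
{
    // Reject runtime versions that the driver does not understand.
    if (pOpenData->Interface != SUPPORTED_D3D_UMD_INTERFACE_VERSION)
    {
        return E_FAIL;
    }

    MY_ADAPTER* pAdapter = new (std::nothrow) MY_ADAPTER();
    if (pAdapter == NULL)
    {
        return E_OUTOFMEMORY;
    }

    // Remember the runtime's adapter callbacks (pfnQueryAdapterInfoCb and so on)
    // so they can be used later to query the display miniport driver.
    pAdapter->pAdapterCallbacks = pOpenData->pAdapterCallbacks;

    // Return the driver's adapter handle and its adapter-specific functions.
    pOpenData->hAdapter = (HANDLE)pAdapter;
    pOpenData->pAdapterFuncs->pfnGetCaps      = MyGetCaps;
    pOpenData->pAdapterFuncs->pfnCreateDevice = MyCreateDevice;
    pOpenData->pAdapterFuncs->pfnCloseAdapter = MyCloseAdapter;

    return S_OK;
}
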
Initializing Use of Memory Segments

Memory segments, in the context of the display driver model for Windows Vista and later (WDDM), describe the
graphics processing unit's (GPU) address space to the video memory manager. Memory segments generalize and
virtualize video memory resources. Memory segments are configured according to the memory types that the
hardware supports (for example, frame buffer memory or system memory aperture).
To initialize how it uses memory segments, the Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys) calls the
display miniport driver's DxgkDdiQueryAdapterInfo function. To direct the display miniport driver to return
information about memory segments from the DxgkDdiQueryAdapterInfo call, the graphics subsystem specifies
either the DXGKQAITYPE_QUERYSEGMENT or the DXGKQAITYPE_QUERYSEGMENT3 value in the Type
member of the DXGKARG_QUERYADAPTERINFO structure.
The graphics subsystem calls the display miniport driver's DxgkDdiQueryAdapterInfo function twice for segment
information. The first call to DxgkDdiQueryAdapterInfo retrieves the number of segments supported by the driver,
and the second call retrieves detailed information about each segment. In the calls to DxgkDdiQueryAdapterInfo,
the driver points the pOutputData member of DXGKARG_QUERYADAPTERINFO to populated
DXGK_QUERYSEGMENTOUT structures (for a driver version prior to Windows Display Driver Model (WDDM)
1.2) or to populated DXGK_QUERYSEGMENTOUT3 structures (for a WDDM 1.2 and later driver).
In the first call, the pSegmentDescriptor member of DXGK_QUERYSEGMENTOUT (for a driver version prior to
WDDM 1.2) or DXGK_QUERYSEGMENTOUT3 (for a WDDM 1.2 and later driver) is set to NULL. The driver should
fill only the NbSegment member of DXGK_QUERYSEGMENTOUT or DXGK_QUERYSEGMENTOUT3 with the
number of segment types that it supports. This number also indicates the number of unpopulated
DXGK_SEGMENTDESCRIPTOR (for a driver version prior to WDDM 1.2) or DXGK_SEGMENTDESCRIPTOR3 (for
a WDDM 1.2 and later driver) structures that the driver requires from the second call to DxgkDdiQueryAdapterInfo.
In the second call, the driver should fill all members of DXGK_QUERYSEGMENTOUT or
DXGK_QUERYSEGMENTOUT3. In particular, the driver should populate the pSegmentDescriptor member with an
array of NbSegment DXGK_SEGMENTDESCRIPTOR (for a driver version prior to WDDM 1.2) or
DXGK_SEGMENTDESCRIPTOR3 (for a WDDM 1.2 and later driver) structures that describe the segments that the
driver supports.
In both calls to DxgkDdiQueryAdapterInfo, the pInputData member of DXGKARG_QUERYADAPTERINFO points
to a DXGK_QUERYSEGMENTIN structure that contains information about the location and properties of the AGP
aperture. If no AGP aperture is available, or if one is present but no appropriate GART driver is installed, the
information about the AGP aperture is set to zero. If no AGP aperture is present, the display miniport driver should
not indicate, in the pSegmentDescriptor array of DXGK_QUERYSEGMENTOUT or
DXGK_QUERYSEGMENTOUT3, that it supports an AGP-type aperture segment. If an AGP-type aperture segment
is indicated in such circumstances, the adapter fails to initialize.
During initialization, because memory is plentiful, memory for the paging buffer can be allocated from a specific
segment. The video memory manager allocates memory for the paging buffer from the segment specified in the
PagingBufferSegmentId member of DXGK_QUERYSEGMENTOUT or DXGK_QUERYSEGMENTOUT3. The
driver indicates the identifier of the paging-buffer segment in the second call to DxgkDdiQueryAdapterInfo. The
driver should also specify the size in bytes that should be allocated for the paging buffer in the PagingBufferSize
member of DXGK_QUERYSEGMENTOUT or DXGK_QUERYSEGMENTOUT3.
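A sketch of the two-call pattern, for a hypothetical driver that exposes two segments, might look like the
following. Descriptor details (base addresses, sizes, flags) and the other DXGKQAITYPE_* queries are omitted,
and the DescribeLocalSegment and DescribeApertureSegment helpers are hypothetical.

NTSTATUS APIENTRY DxgkDdiQueryAdapterInfo(
    CONST HANDLE hAdapter,
    CONST DXGKARG_QUERYADAPTERINFO* pQueryAdapterInfo)
{
    if (pQueryAdapterInfo->Type == DXGKQAITYPE_QUERYSEGMENT)
    {
        DXGK_QUERYSEGMENTOUT* pSegmentInfo =
            (DXGK_QUERYSEGMENTOUT*)pQueryAdapterInfo->pOutputData;

        if (pSegmentInfo->pSegmentDescriptor == NULL)
        {
            // First call: report only how many segments the driver supports.
            pSegmentInfo->NbSegment = 2;
        }
        else
        {
            // Second call: describe each segment, then name the paging-buffer
            // segment and the paging-buffer size (example values only).
            DescribeLocalSegment(&pSegmentInfo->pSegmentDescriptor[0]);
            DescribeApertureSegment(&pSegmentInfo->pSegmentDescriptor[1]);
            pSegmentInfo->PagingBufferSegmentId = 1;
            pSegmentInfo->PagingBufferSize = 64 * 1024;
        }
        return STATUS_SUCCESS;
    }

    // Other DXGKQAITYPE_* queries are handled elsewhere in a real driver.
    return STATUS_NOT_SUPPORTED;
}
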
For more information about memory segments and working with paging buffers, see Handling Memory Segments
and Paging Video Memory Resources.
Enumerating GPU engine capabilities

Starting in Windows 8.1, a display miniport driver must implement the DxgkDdiGetNodeMetadata function, which
is used to query the engine capabilities of a GPU node.
This information helps with the evaluation of how workloads are scheduled and distributed among nodes and
improves the ability to debug applications.

Engine capabilities device driver interface (DDI)


This interface provides the engine capabilities of a specified GPU node:
DxgkDdiGetNodeMetadata
DXGKARG_GETNODEMETADATA
DXGK_ENGINE_TYPE
A pointer to the DxgkDdiGetNodeMetadata function is provided by the DxgkDdiGetNodeMetadata member of
the DRIVER_INITIALIZATION_DATA structure.

GPU node architecture


Each display adapter on the system has a number of different engines available to schedule tasks on. Each engine
is assigned to only one node, but each node may contain more than one engine if that node is associated with
multiple adapters—such as in linked display adapter (LDA) configuration, where multiple physical GPUs are linked
to form a single, faster, virtual GPU.

Different nodes represent the asymmetrical processing cores of the GPU, while the engines within each node
represent the symmetrical processing cores across adapters. That is, a 3-D node contains only identical 3-D
engines on several adapters, and never a different engine type.
Because the engines are always grouped together in nodes by engine type, the engine type information can be
queried based on a specified node. The types of engine that the display miniport driver can specify are listed in the
DXGK_ENGINE_TYPE enumeration.

Example implementation of node metadata function


This code shows how a display miniport driver can implement some of the engine types that can be returned by
the DxgkDdiGetNodeMetadata function.
NTSTATUS
IHVGetNodeDescription(
    IN_CONST_HANDLE                 hAdapter,
    UINT                            NodeOrdinal,
    OUT_PDXGKARG_GETNODEMETADATA    pGetNodeMetadata
    )
{
    DDI_FUNCTION();
    PAGED_CODE();

    if(NULL == pGetNodeMetadata)
    {
        return STATUS_INVALID_PARAMETER;
    }

    CAdapter *pAdapter = GetAdapterFromHandle(hAdapter);

    //Invalid handle
    if(NULL == pAdapter)
    {
        return STATUS_INVALID_PARAMETER;
    }

    //Node ordinal is out of bounds. Required to return
    //STATUS_INVALID_PARAMETER
    if(NodeOrdinal >= pAdapter->GetNumNodes())
    {
        return STATUS_INVALID_PARAMETER;
    }

    switch(pAdapter->GetEngineType(NodeOrdinal))
    {
        //This is the adapter's 3-D engine. This engine handles a large number
        //of different workloads, but it also handles the adapter's 3-D
        //workloads. Therefore the 3-D capability is what must be exposed.
        case GPU_ENGINE_3D:
        {
            pGetNodeMetadata->EngineType = DXGK_ENGINE_TYPE_3D;
            break;
        }

        //This is the adapter's video decoding engine
        case GPU_ENGINE_VIDEO_DECODE:
        {
            pGetNodeMetadata->EngineType = DXGK_ENGINE_TYPE_VIDEO_DECODE;
            break;
        }

        //This engine is proprietary and contains no functionality that
        //fits the DXGK_ENGINE_TYPE enumeration
        case GPU_ENGINE_PROPRIETARY_ENGINE_1:
        {
            pGetNodeMetadata->EngineType = DXGK_ENGINE_TYPE_OTHER;

            //Copy over friendly name associated with this engine
            SetFriendlyNameForEngine(pGetNodeMetadata->FriendlyName,
                                     DXGK_MAX_METADATA_NAME_LENGTH,
                                     PROPRIETARY_ENGINE_1_NAME);
            break;
        }
    }

    return STATUS_SUCCESS;
}



Loading an OpenGL Installable Client Driver

The OpenGL runtime accesses the registry to determine which OpenGL installable client driver (ICD) to load. To
load the OpenGL ICD, the OpenGL runtime:
Determines the name, version, and flags that are associated with the OpenGL ICD by calling the
D3DKMTQueryAdapterInfo function with the KMTQAITYPE_UMOPENGLINFO value set in the Type
member of the D3DKMT_QUERYADAPTERINFO structure that the pData parameter points to.
Checks the version number of the OpenGL ICD that D3DKMTQueryAdapterInfo returns to validate the
version of the OpenGL ICD.
Loads the OpenGL ICD by using the name of the OpenGL ICD.
Initializes access to the OpenGL ICD's functions.
Note To obtain a license for the OpenGL ICD Development Kit, contact the OpenGL Issues team.
To locate the name of the OpenGL ICD, D3DKMTQueryAdapterInfo searches the registry in the following key:

HKLM\System\CurrentControlSet\Control\Class\{Adapter GUID}\0000\

This key also contains the names of the Microsoft Direct3D user-mode display drivers. This key contains four
registry entries for 32-bit Windows Vista display drivers that are used on 32-bit Windows Vista and four entries for
32-bit Windows Vista display drivers that are used on 64-bit Windows Vista. The following entries are for 32-bit
Windows Vista display drivers that are used on 32-bit Windows Vista:
UserModeDriverName
REG_SZ
The name of the Direct3D user-mode display driver, which is required for the operation of a Direct3D rendering
device regardless of whether the operating system supports an OpenGL ICD.
OpenGLDriverName
REG_SZ
The name of the OpenGL ICD. For example, if the OpenGL ICD is Mydriver.dll, the value of this entry is
Mydriver.dll.
OpenGLVersion
REG_DWORD
The version number of the OpenGL ICD that the OpenGL runtime uses to validate the version of the OpenGL ICD.
OpenGLFlags
REG_DWORD
A flag bitmask. Currently, bit 0 (0x00000001) is set for compatibility. When bit 1 (0x00000002) is set, the OpenGL
runtime does not call the ICD's finish function before the runtime calls the ICD's swap-buffers function.
The following entries are for 32-bit Windows Vista display drivers that are used on 64-bit Windows Vista:
UserModeDriverNameWow
REG_SZ
The name of the 32-bit Microsoft Direct3D user-mode display driver for 64-bit Windows Vista.
OpenGLDriverNameWow
REG_SZ
The name of the 32-bit OpenGL ICD for 64-bit Windows Vista.
OpenGLVersionWow
REG_DWORD
The version number of the 32-bit OpenGL ICD for 64-bit Windows Vista.
OpenGLFlagsWow
REG_DWORD
A flag bitmask of the 32-bit OpenGL ICD for 64-bit Windows Vista.
Providing Kernel-Mode Support to the OpenGL
Installable Client Driver

The OpenGL installable client driver (ICD) can obtain the same level of support for calling kernel-mode services as
the Direct3D user-mode display driver. However, rather than gaining access to kernel-mode services through
callback functions like the Microsoft Direct3D runtime supplies through the pAdapterCallbacks member of the
D3DDDIARG_OPENADAPTER structure and the pCallbacks member of the D3DDDIARG_CREATEDEVICE
structure, the OpenGL ICD must load Gdi32.dll and initialize use of the OpenGL-kernel-mode-accessing functions
as shown in the following example code. This code does not implement Windows 8 enhancements in OpenGL.
Note To obtain a license for the OpenGL ICD Development Kit, contact the OpenGL Issues team.

#include "d3dkmthk.h"

PFND3DKMT_CREATEALLOCATION pfnKTCreateAllocation = NULL;


PFND3DKMT_DESTROYALLOCATION pfnKTDestroyAllocation = NULL;
PFND3DKMT_SETALLOCATIONPRIORITY pfnKTSetAllocationPriority = NULL;
PFND3DKMT_QUERYALLOCATIONRESIDENCY pfnKTQueryAllocationResidency = NULL;
PFND3DKMT_QUERYRESOURCEINFO pfnKTQueryResourceInfo = NULL;
PFND3DKMT_OPENRESOURCE pfnKTOpenResource = NULL;
PFND3DKMT_CREATEDEVICE pfnKTCreateDevice = NULL;
PFND3DKMT_DESTROYDEVICE pfnKTDestroyDevice = NULL;
PFND3DKMT_QUERYADAPTERINFO pfnKTQueryAdapterInfo = NULL;
PFND3DKMT_LOCK pfnKTLock = NULL;
PFND3DKMT_UNLOCK pfnKTUnlock = NULL;
PFND3DKMT_GETDISPLAYMODELIST pfnKTGetDisplayModeList = NULL;
PFND3DKMT_SETDISPLAYMODE pfnKTSetDisplayMode = NULL;
PFND3DKMT_GETMULTISAMPLEMETHODLIST pfnKTGetMultisampleMethodList = NULL;
PFND3DKMT_PRESENT pfnKTPresent = NULL;
PFND3DKMT_RENDER pfnKTRender = NULL;
PFND3DKMT_OPENADAPTERFROMHDC pfnKTOpenAdapterFromHdc = NULL;
PFND3DKMT_OPENADAPTERFROMDEVICENAME pfnKTOpenAdapterFromDeviceName = NULL;
PFND3DKMT_CLOSEADAPTER pfnKTCloseAdapter = NULL;
PFND3DKMT_GETSHAREDPRIMARYHANDLE pfnKTGetSharedPrimaryHandle = NULL;
PFND3DKMT_ESCAPE pfnKTEscape = NULL;
PFND3DKMT_SETVIDPNSOURCEOWNER pfnKTSetVidPnSourceOwner = NULL;

PFND3DKMT_CREATEOVERLAY pfnKTCreateOverlay = NULL;


PFND3DKMT_UPDATEOVERLAY pfnKTUpdateOverlay = NULL;
PFND3DKMT_FLIPOVERLAY pfnKTFlipOverlay = NULL;
PFND3DKMT_DESTROYOVERLAY pfnKTDestroyOverlay = NULL;
PFND3DKMT_WAITFORVERTICALBLANKEVENT pfnKTWaitForVerticalBlankEvent = NULL;
PFND3DKMT_SETGAMMARAMP pfnKTSetGammaRamp = NULL;
PFND3DKMT_GETDEVICESTATE pfnKTGetDeviceState = NULL;
PFND3DKMT_CREATEDCFROMMEMORY pfnKTCreateDCFromMemory = NULL;
PFND3DKMT_DESTROYDCFROMMEMORY pfnKTDestroyDCFromMemory = NULL;
PFND3DKMT_SETCONTEXTSCHEDULINGPRIORITY pfnKTSetContextSchedulingPriority = NULL;
PFND3DKMT_GETCONTEXTSCHEDULINGPRIORITY pfnKTGetContextSchedulingPriority = NULL;
PFND3DKMT_SETPROCESSSCHEDULINGPRIORITYCLASS pfnKTSetProcessSchedulingPriorityClass = NULL;
PFND3DKMT_GETPROCESSSCHEDULINGPRIORITYCLASS pfnKTGetProcessSchedulingPriorityClass = NULL;
PFND3DKMT_RELEASEPROCESSVIDPNSOURCEOWNERS pfnKTReleaseProcessVidPnSourceOwners = NULL;
PFND3DKMT_GETSCANLINE pfnKTGetScanLine = NULL;
PFND3DKMT_POLLDISPLAYCHILDREN pfnKTPollDisplayChildren = NULL;
PFND3DKMT_SETQUEUEDLIMIT pfnKTSetQueuedLimit = NULL;
PFND3DKMT_INVALIDATEACTIVEVIDPN pfnKTInvalidateActiveVidPn = NULL;
PFND3DKMT_CHECKOCCLUSION pfnKTCheckOcclusion = NULL;
PFND3DKMT_GETPRESENTHISTORY pfnKTGetPresentHistory = NULL;
PFND3DKMT_CREATECONTEXT pfnKTCreateContext = NULL;
PFND3DKMT_DESTROYCONTEXT pfnKTDestroyContext = NULL;
PFND3DKMT_CREATESYNCHRONIZATIONOBJECT pfnKTCreateSynchronizationObject = NULL;
PFND3DKMT_DESTROYSYNCHRONIZATIONOBJECT pfnKTDestroySynchronizationObject = NULL;
PFND3DKMT_WAITFORSYNCHRONIZATIONOBJECT pfnKTWaitForSynchronizationObject = NULL;
PFND3DKMT_SIGNALSYNCHRONIZATIONOBJECT pfnKTSignalSynchronizationObject = NULL;
PFND3DKMT_CHECKMONITORPOWERSTATE pfnKTCheckMonitorPowerState = NULL;
PFND3DKMT_OPENADAPTERFROMGDIDISPLAYNAME pfnKTOpenAdapterFromGDIDisplayName = NULL;
PFND3DKMT_CHECKEXCLUSIVEOWNERSHIP pfnKTCheckExclusiveOwnership = NULL;
PFND3DKMT_SETDISPLAYPRIVATEDRIVERFORMAT pfnKTSetDisplayPrivateDriverFormat = NULL;
PFND3DKMT_SHAREDPRIMARYLOCKNOTIFICATION pfnKTSharedPrimaryLockNotification = NULL;
PFND3DKMT_SHAREDPRIMARYUNLOCKNOTIFICATION pfnKTSharedPrimaryUnLockNotification = NULL;

HRESULT InitKernelThunks()
{
HINSTANCE hInst = NULL;

hInst = LoadLibrary( "gdi32.dll" );


if (hInst == NULL) {
return E_FAIL;
}

pfnKTCreateAllocation = (PFND3DKMT_CREATEALLOCATION)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateAllocation" );

pfnKTQueryResourceInfo = (PFND3DKMT_QUERYRESOURCEINFO)
GetProcAddress((HMODULE)hInst, "D3DKMTQueryResourceInfo" );

pfnKTOpenResource = (PFND3DKMT_OPENRESOURCE)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateAllocation" );

pfnKTDestroyAllocation = (PFND3DKMT_DESTROYALLOCATION)
GetProcAddress((HMODULE)hInst, "D3DKMTDestroyAllocation" );

pfnKTSetAllocationPriority = (PFND3DKMT_SETALLOCATIONPRIORITY)
GetProcAddress((HMODULE)hInst, "D3DKMTSetAllocationPriority" );

pfnKTQueryAllocationResidency = (PFND3DKMT_QUERYALLOCATIONRESIDENCY)
GetProcAddress((HMODULE)hInst, "D3DKMTQueryAllocationResidency" );

pfnKTCreateDevice = (PFND3DKMT_CREATEDEVICE)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateDevice" );

pfnKTDestroyDevice = (PFND3DKMT_DESTROYDEVICE)
GetProcAddress((HMODULE)hInst, "D3DKMTDestroyDevice" );

pfnKTQueryAdapterInfo = (PFND3DKMT_QUERYADAPTERINFO)
GetProcAddress((HMODULE)hInst, "D3DKMTQueryAdapterInfo" );

pfnKTLock = (PFND3DKMT_LOCK)
GetProcAddress((HMODULE)hInst, "D3DKMTLock" );

pfnKTUnlock = (PFND3DKMT_UNLOCK)
GetProcAddress((HMODULE)hInst, "D3DKMTUnlock" );

pfnKTGetDisplayModeList = (PFND3DKMT_GETDISPLAYMODELIST)
GetProcAddress((HMODULE)hInst, "D3DKMTGetDisplayModeList" );

pfnKTSetDisplayMode = (PFND3DKMT_SETDISPLAYMODE)
GetProcAddress((HMODULE)hInst, "D3DKMTSetDisplayMode" );

pfnKTGetMultisampleMethodList = (PFND3DKMT_GETMULTISAMPLEMETHODLIST)
GetProcAddress((HMODULE)hInst, "D3DKMTGetMultisampleMethodList" );

pfnKTPresent = (PFND3DKMT_PRESENT)
GetProcAddress((HMODULE)hInst, "D3DKMTPresent" );

pfnKTRender = (PFND3DKMT_RENDER)
GetProcAddress((HMODULE)hInst, "D3DKMTRender" );
pfnKTOpenAdapterFromHdc = (PFND3DKMT_OPENADAPTERFROMHDC)
GetProcAddress((HMODULE)hInst, "D3DKMTOpenAdapterFromHdc" );

pfnKTOpenAdapterFromDeviceName = (PFND3DKMT_OPENADAPTERFROMDEVICENAME)
GetProcAddress((HMODULE)hInst, "D3DKMTOpenAdapterFromDeviceName" );

pfnKTCloseAdapter = (PFND3DKMT_CLOSEADAPTER)
GetProcAddress((HMODULE)hInst, "D3DKMTCloseAdapter" );

pfnKTGetSharedPrimaryHandle = (PFND3DKMT_GETSHAREDPRIMARYHANDLE)
GetProcAddress((HMODULE)hInst, "D3DKMTGetSharedPrimaryHandle" );

pfnKTEscape = (PFND3DKMT_ESCAPE)
GetProcAddress((HMODULE)hInst, "D3DKMTEscape" );

pfnKTSetVidPnSourceOwner = (PFND3DKMT_SETVIDPNSOURCEOWNER)
GetProcAddress((HMODULE)hInst, "D3DKMTSetVidPnSourceOwner" );

pfnKTReleaseProcessVidPnSourceOwners = (PFND3DKMT_RELEASEPROCESSVIDPNSOURCEOWNERS)
GetProcAddress((HMODULE)hInst, "D3DKMTReleaseProcessVidPnSourceOwners" );

pfnKTCreateOverlay = (PFND3DKMT_CREATEOVERLAY)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateOverlay" );

pfnKTUpdateOverlay = (PFND3DKMT_UPDATEOVERLAY)
GetProcAddress((HMODULE)hInst, "D3DKMTUpdateOverlay" );

pfnKTFlipOverlay = (PFND3DKMT_FLIPOVERLAY)
GetProcAddress((HMODULE)hInst, "D3DKMTFlipOverlay" );

pfnKTDestroyOverlay = (PFND3DKMT_DESTROYOVERLAY)
GetProcAddress((HMODULE)hInst, "D3DKMTDestroyOverlay" );

pfnKTWaitForVerticalBlankEvent = (PFND3DKMT_WAITFORVERTICALBLANKEVENT)
GetProcAddress((HMODULE)hInst, "D3DKMTWaitForVerticalBlankEvent" );

pfnKTSetGammaRamp = (PFND3DKMT_SETGAMMARAMP)
GetProcAddress((HMODULE)hInst, "D3DKMTSetGammaRamp" );

pfnKTGetDeviceState = (PFND3DKMT_GETDEVICESTATE)
GetProcAddress((HMODULE)hInst, "D3DKMTGetDeviceState" );

pfnKTCreateDCFromMemory = (PFND3DKMT_CREATEDCFROMMEMORY)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateDCFromMemory" );

pfnKTDestroyDCFromMemory = (PFND3DKMT_DESTROYDCFROMMEMORY)
GetProcAddress((HMODULE)hInst, "D3DKMTDestroyDCFromMemory" );

pfnKTSetContextSchedulingPriority = (PFND3DKMT_SETCONTEXTSCHEDULINGPRIORITY)
GetProcAddress((HMODULE)hInst, "D3DKMTSetContextSchedulingPriority" );

pfnKTGetContextSchedulingPriority = (PFND3DKMT_GETCONTEXTSCHEDULINGPRIORITY)
GetProcAddress((HMODULE)hInst, "D3DKMTGetContextSchedulingPriority" );

pfnKTSetProcessSchedulingPriorityClass = (PFND3DKMT_SETPROCESSSCHEDULINGPRIORITYCLASS)
GetProcAddress((HMODULE)hInst, "D3DKMTSetProcessSchedulingPriorityClass" );

pfnKTGetProcessSchedulingPriorityClass = (PFND3DKMT_GETPROCESSSCHEDULINGPRIORITYCLASS)
GetProcAddress((HMODULE)hInst, "D3DKMTGetProcessSchedulingPriorityClass" );

pfnKTGetScanLine = (PFND3DKMT_GETSCANLINE)
GetProcAddress((HMODULE)hInst, "D3DKMTGetScanLine" );

pfnKTSetQueuedLimit = (PFND3DKMT_SETQUEUEDLIMIT)
GetProcAddress((HMODULE)hInst, "D3DKMTSetQueuedLimit" );

pfnKTPollDisplayChildren = (PFND3DKMT_POLLDISPLAYCHILDREN)
GetProcAddress((HMODULE)hInst, "D3DKMTPollDisplayChildren" );
pfnKTInvalidateActiveVidPn = (PFND3DKMT_INVALIDATEACTIVEVIDPN)
GetProcAddress((HMODULE)hInst, "D3DKMTInvalidateActiveVidPn" );

pfnKTCheckOcclusion = (PFND3DKMT_CHECKOCCLUSION)
GetProcAddress((HMODULE)hInst, "D3DKMTCheckOcclusion" );

pfnKTGetPresentHistory = (PFND3DKMT_GETPRESENTHISTORY)
GetProcAddress((HMODULE)hInst, "D3DKMTGetPresentHistory" );

pfnKTCreateContext = (PFND3DKMT_CREATECONTEXT)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateContext" );

pfnKTDestroyContext = (PFND3DKMT_DESTROYCONTEXT)
GetProcAddress((HMODULE)hInst, "D3DKMTDestroyContext" );

pfnKTCreateSynchronizationObject = (PFND3DKMT_CREATESYNCHRONIZATIONOBJECT)
GetProcAddress((HMODULE)hInst, "D3DKMTCreateSynchronizationObject" );

pfnKTDestroySynchronizationObject = (PFND3DKMT_DESTROYSYNCHRONIZATIONOBJECT)
GetProcAddress((HMODULE)hInst, "D3DKMTDestroySynchronizationObject" );

pfnKTWaitForSynchronizationObject = (PFND3DKMT_WAITFORSYNCHRONIZATIONOBJECT)
GetProcAddress((HMODULE)hInst, "D3DKMTWaitForSynchronizationObject" );

pfnKTSignalSynchronizationObject = (PFND3DKMT_SIGNALSYNCHRONIZATIONOBJECT)
GetProcAddress((HMODULE)hInst, "D3DKMTSignalSynchronizationObject" );

pfnKTCheckMonitorPowerState = (PFND3DKMT_CHECKMONITORPOWERSTATE)
GetProcAddress((HMODULE)hInst, "D3DKMTCheckMonitorPowerState" );

pfnKTOpenAdapterFromGDIDisplayName = (PFND3DKMT_OPENADAPTERFROMGDIDISPLAYNAME)
GetProcAddress((HMODULE)hInst, "D3DKMTOpenAdapterFromGdiDisplayName" );

pfnKTCheckExclusiveOwnership = (PFND3DKMT_CHECKEXCLUSIVEOWNERSHIP)
GetProcAddress((HMODULE)hInst, "D3DKMTCheckExclusiveOwnership" );

pfnKTSetDisplayPrivateDriverFormat = (PFND3DKMT_SETDISPLAYPRIVATEDRIVERFORMAT)
GetProcAddress((HMODULE)hInst, "D3DKMTSetDisplayPrivateDriverFormat" );

pfnKTSharedPrimaryLockNotification = (PFND3DKMT_SHAREDPRIMARYLOCKNOTIFICATION)
GetProcAddress((HMODULE)hInst, "D3DKMTSharedPrimaryLockNotification" );

pfnKTSharedPrimaryUnLockNotification = (PFND3DKMT_SHAREDPRIMARYUNLOCKNOTIFICATION)
GetProcAddress((HMODULE)hInst, "D3DKMTSharedPrimaryUnLockNotification" );

if ((pfnKTCreateAllocation == NULL) ||
(pfnKTQueryResourceInfo == NULL) ||
(pfnKTOpenResource == NULL) ||
(pfnKTDestroyAllocation == NULL) ||
(pfnKTSetAllocationPriority == NULL) ||
(pfnKTQueryAllocationResidency == NULL) ||
(pfnKTCreateDevice == NULL) ||
(pfnKTDestroyDevice == NULL) ||
(pfnKTQueryAdapterInfo == NULL) ||
(pfnKTLock == NULL) ||
(pfnKTUnlock == NULL) ||
(pfnKTGetDisplayModeList == NULL) ||
(pfnKTSetDisplayMode == NULL) ||
(pfnKTGetMultisampleMethodList == NULL) ||
(pfnKTPresent == NULL) ||
(pfnKTRender == NULL) ||
(pfnKTOpenAdapterFromHdc == NULL) ||
(pfnKTOpenAdapterFromDeviceName == NULL) ||
(pfnKTCloseAdapter == NULL) ||
(pfnKTGetSharedPrimaryHandle == NULL) ||
(pfnKTEscape == NULL) ||
(pfnKTSetVidPnSourceOwner == NULL) ||
(pfnKTCreateOverlay == NULL) ||
(pfnKTUpdateOverlay == NULL) ||
(pfnKTFlipOverlay == NULL) ||
(pfnKTDestroyOverlay == NULL) ||
(pfnKTWaitForVerticalBlankEvent == NULL) ||
(pfnKTSetGammaRamp == NULL) ||
(pfnKTGetDeviceState == NULL) ||
(pfnKTCreateDCFromMemory == NULL) ||
(pfnKTDestroyDCFromMemory == NULL) ||
(pfnKTSetContextSchedulingPriority == NULL) ||
(pfnKTGetContextSchedulingPriority == NULL) ||
(pfnKTSetProcessSchedulingPriorityClass == NULL) ||
(pfnKTGetProcessSchedulingPriorityClass == NULL) ||
(pfnKTReleaseProcessVidPnSourceOwners == NULL) ||
(pfnKTGetScanLine == NULL) ||
(pfnKTSetQueuedLimit == NULL) ||
(pfnKTPollDisplayChildren == NULL) ||
(pfnKTInvalidateActiveVidPn == NULL) ||
(pfnKTCheckOcclusion == NULL) ||
(pfnKTCreateContext == NULL) ||
(pfnKTDestroyContext == NULL) ||
(pfnKTCreateSynchronizationObject == NULL) ||
(pfnKTDestroySynchronizationObject == NULL) ||
(pfnKTWaitForSynchronizationObject == NULL) ||
(pfnKTSignalSynchronizationObject == NULL) ||
(pfnKTCheckMonitorPowerState == NULL) ||
(pfnKTOpenAdapterFromGDIDisplayName == NULL) ||
(pfnKTCheckExclusiveOwnership == NULL) ||
(pfnKTSetDisplayPrivateDriverFormat == NULL) ||
(pfnKTSharedPrimaryLockNotification == NULL) ||
(pfnKTSharedPrimaryUnLockNotification == NULL) ||
(pfnKTGetPresentHistory == NULL))
{
return E_FAIL;
}

return S_OK;
}
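
Once InitKernelThunks has succeeded, the ICD can use the function pointers like any other D3DKMT thunks. The
following fragment, with simplified error handling, sketches opening an adapter from an HDC and issuing an
adapter-information query through the loaded thunks; the private-data buffer and query type are placeholders
only.

HRESULT QueryAdapterThroughThunks(HDC hdc)
{
    if (FAILED(InitKernelThunks())) {
        return E_FAIL;
    }

    // Open the adapter that owns the given display DC.
    D3DKMT_OPENADAPTERFROMHDC OpenAdapterData = {0};
    OpenAdapterData.hDc = hdc;
    if (pfnKTOpenAdapterFromHdc(&OpenAdapterData) != 0) {   // 0 == STATUS_SUCCESS
        return E_FAIL;
    }

    // Issue a driver-private query; the buffer layout is defined by the ICD
    // and its display miniport driver (placeholder buffer here).
    BYTE PrivateData[64] = {0};
    D3DKMT_QUERYADAPTERINFO QueryInfo = {0};
    QueryInfo.hAdapter = OpenAdapterData.hAdapter;
    QueryInfo.Type = KMTQAITYPE_UMDRIVERPRIVATE;
    QueryInfo.pPrivateDriverData = PrivateData;
    QueryInfo.PrivateDriverDataSize = sizeof(PrivateData);
    pfnKTQueryAdapterInfo(&QueryInfo);

    // Close the adapter when it is no longer needed.
    D3DKMT_CLOSEADAPTER CloseAdapterData = {0};
    CloseAdapterData.hAdapter = OpenAdapterData.hAdapter;
    pfnKTCloseAdapter(&CloseAdapterData);
    return S_OK;
}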



WDDM Threading and Synchronization Model

The following topics describe the display driver threading and synchronization model for the Windows Display
Driver Model (WDDM):
Threading and Synchronization Model of Display Miniport Driver
Threading Model of User-Mode Display Driver
Threading and Synchronization Model of Display
Miniport Driver

Multiple threads can be present within the display miniport driver at the same time. That is, in general, the display
miniport driver is reentrant. However, some calls into the display miniport driver should not be reentrant because
they either access graphics hardware or access global cross-thread data structures. Although reentrancy or
nonreentrancy cannot be selected at a per-call level, the Windows Display Driver Model (WDDM) pre-assigns, per
call, the following synchronization levels that define precisely what the driver should expect for the call:
Threading and Synchronization Third Level
Threading and Synchronization Second Level
Threading and Synchronization First Level
Threading and Synchronization Zero Level
Thread Synchronization and TDR
Threading and Synchronization Third Level

The Windows Display Driver Model (WDDM) guarantees that the following calls into the display miniport driver are
made under the third level of threading and synchronization. This ensures that only a single thread (that is, the
calling thread) is within the driver. In addition, the graphics hardware is idle, no direct memory access (DMA)
buffers are currently being processed by the driver or passed through the GPU scheduler, and the video memory is
completely evicted to host CPU memory.
DxgkDdiAddDevice
DxgkDdiQueryChildRelations
DxgkDdiRemoveDevice
DxgkDdiResetFromTimeout
DxgkDdiRestartFromTimeout
DxgkDdiSetPowerState
DxgkDdiStartDevice
DxgkDdiStopDevice
DxgkDdiUnload
Threading and Synchronization Second Level

The second level of threading and synchronization is the same as the third level, except that video memory is not
evicted to host CPU memory. In other words, the Windows Display Driver Model (WDDM) guarantees that only a
single thread (that is, the calling thread) is within the display miniport driver, the graphics hardware is idle, and no
direct memory access (DMA) buffers are currently being processed by the driver or passed through the GPU
scheduler. The following calls into the display miniport driver are made under the second level:
Note In order for some calls to be made under the second level, the HardwareAccess flag must be set within the
D3DDDI_ESCAPEFLAGS structure that is a member of DXGKARG_ESCAPE. If this flag is not set, then the call will
fail.
DxgkDdiCommitVidPn
DxgkDdiControlInterrupt
DxgkDdiDispatchIoRequest
DxgkDdiEscape
DxgkDdiNotifyAcpiEvent
DxgkDdiQueryInterface
DxgkDdiRecommendFunctionalVidPn
DxgkDdiRecommendMonitorModes
DxgkDdiSetPalette
DxgkDdiSetVidPnSourceAddress
DxgkDdiSetVidPnSourceVisibility
DxgkDdiUpdateActiveVidPnPresentPath
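For example, a user-mode component that wants its escape to be processed at this level sets the HardwareAccess
flag when it calls D3DKMTEscape. In the following sketch, hAdapter is assumed to come from a prior
D3DKMTOpenAdapterFromHdc call and MY_ESCAPE_DATA is a driver-defined private structure (hypothetical name):

MY_ESCAPE_DATA EscapeData = {0};                 // driver-defined private payload
D3DKMT_ESCAPE EscapeArgs = {0};
EscapeArgs.hAdapter = hAdapter;                  // adapter handle obtained earlier
EscapeArgs.Type = D3DKMT_ESCAPE_DRIVERPRIVATE;
EscapeArgs.Flags.HardwareAccess = 1;             // required for second-level processing
EscapeArgs.pPrivateDriverData = &EscapeData;
EscapeArgs.PrivateDriverDataSize = sizeof(EscapeData);

NTSTATUS Status = D3DKMTEscape(&EscapeArgs);
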
Threading and Synchronization First Level

The Windows Display Driver Model (WDDM) categorizes calls into the display miniport driver that are made under
the first level of threading and synchronization into the following nonreentrancy classes. No reentrancy is
permitted within a particular class. That is, only one thread can enter the driver within a particular class; however,
calls from multiple classes and zero-level calls can be entered simultaneously.
Note Although two or more threads from different classes and threads from zero-level calls can be running in the
driver at the same time, no two threads can belong to a single process.
Note The child I/O class functions are synchronized per child device (that is, simultaneous calls to multiple child
devices are allowed). However, if internal dependencies exist between child devices, the display miniport driver
must block calls as required.
Pointer Class
GPU Scheduler Class
Swizzling Range Class
Overlay Class
Child I/O Class
Pointer Class

The Windows Display Driver Model (WDDM) does not permit a call into one of the pointer class functions in a
reentrant fashion. That is, at the most, one thread can be running within one of the following functions at a given
time:
DxgkDdiSetPointerPosition
DxgkDdiSetPointerShape
GPU Scheduler Class

The Windows Display Driver Model (WDDM) does not permit a call into one of the GPU scheduler class
functions in a reentrant fashion. That is, at the most, one thread can be running within one of the following
functions at a given time:
DxgkDdiBuildPagingBuffer
DxgkDdiPatch
DxgkDdiPreemptCommand
DxgkDdiSubmitCommand
Send comments about this topic to Microsoft
Swizzling Range Class
4/26/2017 • 1 min to read • Edit Online

The Windows Display Driver Model (WDDM) does not permit a call into one of the swizzling range class functions
in a reentrant fashion. That is, at the most, one thread can be running within one of the following functions at a
given time:
DxgkDdiAcquireSwizzlingRange
DxgkDdiReleaseSwizzlingRange
Send comments about this topic to Microsoft
Overlay Class
4/26/2017 • 1 min to read • Edit Online

The Windows Display Driver Model (WDDM) does not permit a call into one of the overlay class functions in a
reentrant fashion. That is, at the most, one thread can be running within one of the following functions at a given
time:
DxgkDdiCreateOverlay
DxgkDdiDestroyOverlay
DxgkDdiFlipOverlay
DxgkDdiUpdateOverlay
Send comments about this topic to Microsoft
Child I/O Class
4/26/2017 • 1 min to read • Edit Online

The Windows Display Driver Model (WDDM) does not permit a call into one of the child I/O class functions in a
reentrant fashion. That is, at the most, one thread can be running within one of the following functions per child
device at a given time:
DxgkDdiQueryChildStatus
DxgkDdiQueryDeviceDescriptor
DxgkDdiI2CReceiveDataFromDisplay
DxgkDdiI2CTransmitDataToDisplay
DxgkDdiOPMConfigureProtectedOutput
DxgkDdiOPMCreateProtectedOutput
DxgkDdiOPMDestroyProtectedOutput
DxgkDdiOPMGetCertificate
DxgkDdiOPMGetCertificateSize
DxgkDdiOPMGetCOPPCompatibleInformation
DxgkDdiOPMGetInformation
DxgkDdiOPMGetRandomNumber
DxgkDdiOPMSetSigningKeyAndSequenceNumbers
Send comments about this topic to Microsoft
Threading and Synchronization Zero Level
4/26/2017 • 1 min to read • Edit Online

The Windows Display Driver Model (WDDM) permits the following calls into the display miniport driver to be
made in a reentrant fashion. That is, more than one thread can simultaneously enter the driver by calling the
following functions:
Note Although two or more threads can be running in the driver at the same time, no two threads can belong to a
single process.
DxgkDdiCloseAllocation
DxgkDdiCollectDbgInfo
Note DxgkDdiCollectDbgInfo should collect debug information for various failures
and can be called at any time and at high IRQL (that is, the IRQL that DxgkDdiCollectDbgInfo runs at is
generally undefined). In any case, DxgkDdiCollectDbgInfo must verify availability of the required debug
information and proper synchronization. However, if the Reason member of the
DXGKARG_COLLECTDBGINFO structure that the pCollectDbgInfo parameter of DxgkDdiCollectDbgInfo
points to is set to VIDEO_TDR_TIMEOUT_DETECTED or VIDEO_ENGINE_TIMEOUT_DETECTED, the driver
must ensure that DxgkDdiCollectDbgInfo is pageable, runs at IRQL = PASSIVE_LEVEL, and supports
synchronization zero level.
DxgkDdiControlEtwLogging
DxgkDdiCreateAllocation
DxgkDdiCreateContext
DxgkDdiCreateDevice
DxgkDdiDescribeAllocation
DxgkDdiDestroyAllocation
DxgkDdiDestroyContext
DxgkDdiDestroyDevice
DxgkDdiDpcRoutine
DxgkDdiEnumVidPnCofuncModality
DxgkDdiGetScanLine
DxgkDdiGetStandardAllocationDriverData
DxgkDdiInterruptRoutine
DxgkDdiIsSupportedVidPn
DxgkDdiMiracastCreateContext
DxgkDdiMiracastDestroyContext
DxgkDdiMiracastIoControl
DxgkDdiMiracastQueryCaps
DxgkDdiOpenAllocation
DxgkDdiPresent
DxgkDdiQueryAdapterInfo
DxgkDdiQueryCurrentFence
DxgkDdiRecommendFunctionalVidPn
DxgkDdiRecommendVidPnTopology
DxgkDdiRender
DxgkDdiRenderKm
DxgkDdiResetDevice
Send comments about this topic to Microsoft
Thread Synchronization and TDR
4/26/2017 • 1 min to read • Edit Online

The following figure shows how thread synchronization works for the display miniport driver in the Windows
Display Driver Model (WDDM).

If a hardware timeout occurs, the Timeout Detection and Recovery (TDR) process is initiated. The GPU scheduler calls
the driver's DxgkDdiResetFromTimeout function, which resets the GPU. DxgkDdiResetFromTimeout is synchronized
against every other display miniport driver function except the runtime power management functions
DxgkDdiSetPowerComponentFState and DxgkDdiPowerRuntimeControlRequest. That is, no other thread
runs in the driver while the DxgkDdiResetFromTimeout thread runs. The operating system also guarantees that no
application can access the frame buffer during the call to DxgkDdiResetFromTimeout; therefore,
the driver can, for example, safely reset a memory controller phase-locked loop (PLL).
While the recovery thread executes DxgkDdiResetFromTimeout, interrupts and deferred procedure calls (DPCs) can
continue to be called. The KeSynchronizeExecution function can be used to synchronize portions of the reset
procedure with device interrupts.
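In a WDDM display miniport driver, the DxgkCbSynchronizeExecution callback that the driver received in its DXGKRNL_INTERFACE at DxgkDdiStartDevice typically plays this role. The following minimal sketch assumes a hypothetical ADAPTER_CONTEXT structure that stores that interface; HwDisableInterruptsAndResetEngine is a hypothetical hardware helper.

typedef struct _ADAPTER_CONTEXT {     // hypothetical adapter context
    DXGKRNL_INTERFACE DxgkInterface;  // saved from DxgkDdiStartDevice
    // ... hardware state ...
} ADAPTER_CONTEXT;

BOOLEAN HwDisableInterruptsAndResetEngine(ADAPTER_CONTEXT *pAdapter);  // hypothetical

static BOOLEAN ResetUnderInterruptSync(PVOID Context)
{
    // Runs synchronized with DxgkDdiInterruptRoutine at device IRQL.
    return HwDisableInterruptsAndResetEngine((ADAPTER_CONTEXT *)Context);
}

NTSTATUS APIENTRY DxgkDdiResetFromTimeout(const HANDLE hAdapter)
{
    ADAPTER_CONTEXT *pAdapter = (ADAPTER_CONTEXT *)hAdapter;
    BOOLEAN synchronized = FALSE;
    NTSTATUS status;

    // Synchronize the hardware-touching part of the reset with the interrupt
    // handler before doing the rest of the recovery work.
    status = pAdapter->DxgkInterface.DxgkCbSynchronizeExecution(
        pAdapter->DxgkInterface.DeviceHandle,
        ResetUnderInterruptSync,
        pAdapter,
        0,                  // MessageNumber
        &synchronized);

    if (!NT_SUCCESS(status) || !synchronized)
    {
        return STATUS_UNSUCCESSFUL;
    }

    // ... remainder of the reset, performed outside the synchronized region ...
    return STATUS_SUCCESS;
}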
After the driver returns from DxgkDdiResetFromTimeout, most driver functions can again be called, and the
operating system starts to clean up resources that are no longer required. During the cleanup period, the following
driver functions are called for the indicated reasons:
The driver is notified that an allocation is being evicted.
For example, if the allocation was paged in a memory segment, the driver's DxgkDdiBuildPagingBuffer
function is called with the Operation member of the DXGKARG_BUILDPAGINGBUFFER structure set to
DXGK_OPERATION_TRANSFER and with the Transfer.Size member set to zero to inform the driver about
the eviction. Note that no content transfer is involved because the content was lost during the reset.
If the allocation was paged in an aperture segment, the driver's DxgkDdiBuildPagingBuffer function is called
with the Operation member of DXGKARG_BUILDPAGINGBUFFER set to
DXGK_OPERATION_UNMAP_APERTURE_SEGMENT to inform the driver to unmap the allocation from the
aperture.
The driver's DxgkDdiReleaseSwizzlingRange function is called to release unswizzling aperture ranges and
segment aperture ranges.
The driver should not access the GPU during the preceding calls unless absolutely necessary.
After the cleanup period is over, the operating system calls the driver's DxgkDdiRestartFromTimeout function to
inform the driver that cleanup is complete and that the operating system will resume using the adapter for
rendering.
Note TDR functionality has been updated for Windows 8. See TDR changes in Windows 8.
Send comments about this topic to Microsoft
Threading Model of User-Mode Display Driver
4/26/2017 • 1 min to read • Edit Online

The user-mode display driver is not loaded into multiple processes simultaneously; rather, the user-mode display driver
DLL is loaded separately into the address space of each process. Still, multiple threads can run in the user-mode
display driver at the same time. However, each thread that is running in the user-mode display driver must access
a different display device, which is created by a call to the user-mode display driver's CreateDevice function. For
example:
An application that creates two Microsoft Direct3D devices can have two threads that access these devices
independently.
An application can use, on two different threads, a Direct3D device that the Microsoft DirectX 9.0 Direct3D
runtime created along with a Microsoft DirectDraw device that the DirectX 5.0 runtime created.
Note Two or more threads that are using the same display device can never run in the user-mode display driver
simultaneously.
Like the display miniport driver, the user-mode display driver is not required to use any global data structures,
because Direct3D devices are independent and state and resources from each device do not affect the other
devices. If the user-mode display driver must maintain global cross-device data structures (for example, for a custom
system memory heap manager), it must arbitrate access to them by using its own mechanisms; however, such
driver-managed global data structures are strongly discouraged. Because the Direct3D runtime opens an independent "view" of
the shared resource in each user-mode display device that must access the resource, cross-process or cross-device
resources should not be handled differently from resources that a single process or device uses. Lifetime and other
management are handled by the DirectX graphics kernel subsystem (Dxgkrnl.sys).
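If the driver nevertheless keeps such a cross-device structure, a minimal sketch of arbitrating access to it, assuming an ordinary Win32 slim reader/writer lock, might look like the following; the global heap itself is hypothetical.

#include <windows.h>
#include <stdlib.h>

static SRWLOCK g_HeapLock = SRWLOCK_INIT;
static SIZE_T  g_BytesInUse = 0;   // example of shared, cross-device state

void *DriverGlobalAlloc(SIZE_T size)
{
    void *p;

    // Serialize access because threads working on different devices can be
    // inside the user-mode display driver at the same time.
    AcquireSRWLockExclusive(&g_HeapLock);
    p = malloc(size);
    if (p != NULL)
    {
        g_BytesInUse += size;
    }
    ReleaseSRWLockExclusive(&g_HeapLock);

    return p;
}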
On multiple-processor computers, the Direct3D runtime might call a user-mode display driver from a worker
thread instead of from the main application thread. This multiple-processor optimization is transparent to the user-
mode display driver. When the runtime uses multiple-processor optimization, it still ensures that only one thread
that references a particular device runs in the driver at any given time.
Send comments about this topic to Microsoft
Video Memory Management and GPU Scheduling
4/26/2017 • 1 min to read • Edit Online

The following sections describe the video memory management and graphics processing unit (GPU) scheduling
model:
Handling Memory Segments
Handling Command and DMA Buffers
GDI Hardware Acceleration
Video memory offer and reclaim
GPU preemption
Send comments about this topic to Microsoft
Handling Memory Segments
4/26/2017 • 1 min to read • Edit Online

The following topics introduce memory segments and describe how they are used in the display driver model for
Windows Vista:
Using Memory Segments to Describe the GPU Address Space
Configuring Memory Segment Types
Dividing a Memory-Space Segment into Banks
Mapping Virtual Addresses to a Memory Segment
Specifying Segments for DMA Buffers
Specifying Segments When Creating Allocations
Reporting Graphics Memory
Send comments about this topic to Microsoft
Using Memory Segments to Describe the GPU
Address Space
4/26/2017 • 2 min to read • Edit Online

Before the video memory manager can manage the address space of the GPU, the display miniport driver must
describe the GPU's address space to the video memory manager by using memory segments. The display miniport
driver creates memory segments to generalize and virtualize video memory resources. The driver can configure
memory segments according to the memory types that the hardware supports (for example, frame buffer memory
or system memory aperture).
During driver initialization, the driver must return the list of segment types that describe how memory resources
can be managed by the video memory manager. The driver specifies the number of segment types that it supports
and describes each segment type by responding to calls to its DxgkDdiQueryAdapterInfo function. The driver
describes each segment using a DXGK_SEGMENTDESCRIPTOR structure. For more information, see Initializing
Use of Memory Segments.
Thereafter, the number and types of segments remain unchanged. The video memory manager ensures that each
process receives a fair share of the resources in any particular segment. The video memory manager manages all
segments independently, and segments do not overlap. Therefore, the video memory manager allocates a fair
amount of video memory resources from one segment to an application regardless of the amount of resources
that application currently holds from another segment.
The driver assigns a segment identifier to each of its memory segments. Later, when the video memory manager
requests to create allocations for video resources and render those resources, the driver identifies the segments
that support the request and specifies, in order, the segments that the driver prefers the video memory manager
use. For more information, see Specifying Segments When Creating Allocations.
The driver is not required to specify all video memory resources that are available to the GPU in its memory
segments; however, the driver must specify all memory resources that the video memory manager manages
among all processes running on the system. For example, a vertex shader microcode that implements a fixed
function pipeline can reside in the GPU address space, but outside the memory managed by the video memory
manager (that is, not part of a segment) because the microcode is always available to all processes and is never the
source of contention between processes. However, the video memory manager must allocate video memory
resources, such as vertex buffers, textures, render targets, and application-specific shader code, from one of the
driver's memory segments because the resource types must be fairly available to all processes.
The following figure shows how the driver can configure memory segments from the GPU address space.

Note Video memory that is hidden from the video memory manager cannot be mapped into user space or be
made exclusively available to any particular process. To do so breaks the fundamental rules of virtual memory that
require that all processes running on the system have access to all memory.
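The following minimal sketch shows the two-pass DXGKQAITYPE_QUERYSEGMENT handling for a single CPU-visible linear memory-space segment; VIDEO_MEMORY_BASE and VIDEO_MEMORY_SIZE are hypothetical hardware-specific values, and the paging-buffer numbers are placeholders rather than recommendations.

NTSTATUS APIENTRY DxgkDdiQueryAdapterInfo(
    const HANDLE hAdapter,
    const DXGKARG_QUERYADAPTERINFO *pQueryAdapterInfo)
{
    switch (pQueryAdapterInfo->Type)
    {
    case DXGKQAITYPE_QUERYSEGMENT:
    {
        DXGK_QUERYSEGMENTOUT *pSegmentOut =
            (DXGK_QUERYSEGMENTOUT *)pQueryAdapterInfo->pOutputData;

        if (pSegmentOut->pSegmentDescriptor == NULL)
        {
            // First call: report how many segments the driver exposes.
            pSegmentOut->NbSegment = 1;
        }
        else
        {
            // Second call: describe each segment.
            DXGK_SEGMENTDESCRIPTOR *pSegment = &pSegmentOut->pSegmentDescriptor[0];

            RtlZeroMemory(pSegment, sizeof(*pSegment));
            pSegment->BaseAddress.QuadPart          = 0;                 // GPU address of the segment
            pSegment->CpuTranslatedAddress.QuadPart = VIDEO_MEMORY_BASE; // bus-relative address
            pSegment->Size                          = VIDEO_MEMORY_SIZE;
            pSegment->CommitLimit                   = VIDEO_MEMORY_SIZE;
            pSegment->Flags.CpuVisible              = 1;  // linear memory-space segment, CPU-accessible

            // Placeholder paging-buffer choices.
            pSegmentOut->PagingBufferSegmentId       = 0;
            pSegmentOut->PagingBufferSize            = 64 * 1024;
            pSegmentOut->PagingBufferPrivateDataSize = 0;
        }
        return STATUS_SUCCESS;
    }

    default:
        // Other query types (for example, DXGKQAITYPE_DRIVERCAPS) are not shown.
        return STATUS_NOT_SUPPORTED;
    }
}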
Send comments about this topic to Microsoft
Configuring Memory Segment Types
4/26/2017 • 1 min to read • Edit Online

The video memory manager and display hardware only support certain types of memory segments, so the display
miniport driver can only configure segments of those types. The display miniport driver can configure memory-
space and aperture-space segments, which are different in that a memory-space segment consists of a medium
that holds the bits of an allocation while an aperture-space segment is a virtual address space. When a range in a
memory-space segment is allocated, actual memory is allocated. When a range in an aperture-space segment is
allocated, the virtual address space is redirected to physical pages that are allocated independently from either a
video memory pool or system memory.
The display miniport driver can configure the following types of memory segments:
Linear Memory-Space Segments
Linear Aperture-Space Segments
AGP-Type Aperture-Space Segments
Send comments about this topic to Microsoft
Linear Memory-Space Segments
4/26/2017 • 1 min to read • Edit Online

A linear memory-space segment is the classical type of segment that display hardware uses. The linear memory-
space segment conforms to the following model:
Virtualizes video memory located on the graphics adapter.
Is accessed directly by the GPU (that is, without redirection through page mapping).
Is managed linearly in a one-dimensional address space.
The driver sets the Flags member of the DXGK_SEGMENTDESCRIPTOR structure to 0 to specify a linear memory-
space segment. However, the driver can set the following bit-field flags to indicate additional segment support:
CpuVisible to indicate that the segment is CPU-accessible.
UseBanking to indicate that the segment is divided into banks.
The following figure shows a visual representation of a linear memory-space segment.

Send comments about this topic to Microsoft


Linear Aperture-Space Segments
4/26/2017 • 1 min to read • Edit Online

A linear aperture-space segment is similar to a linear memory-space segment; however, the aperture-space
segment is only an address space and cannot hold bits. To hold the bits, system memory pages must be allocated,
and the address-space range must be redirected to refer to those pages. The display miniport driver must
implement the DxgkDdiBuildPagingBuffer function for DXGK_OPERATION_MAP_APERTURE_SEGMENT and
DXGK_OPERATION_UNMAP_APERTURE_SEGMENT operation types to handle the redirection and must expose this
function as described in DriverEntry of Display Miniport Driver. The DxgkDdiBuildPagingBuffer function
receives the range to be redirected and the MDL that references the physical system memory pages that were
allocated.
The display miniport driver typically accomplishes the redirection of the address-space range by programming a
page table, which is unknown to the video memory manager.
The driver must set the Aperture bit-field flag in the Flags member of the DXGK_SEGMENTDESCRIPTOR
structure to specify a linear aperture-space segment. The driver can also set the following bit-field flags to indicate
additional segment support:
CpuVisible to indicate that the segment is CPU-accessible.
CacheCoherent to indicate that the segment maintains cache coherency with the CPU for the pages to
which the segment redirects.
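A minimal sketch of handling the DXGK_OPERATION_MAP_APERTURE_SEGMENT and DXGK_OPERATION_UNMAP_APERTURE_SEGMENT operations described above in DxgkDdiBuildPagingBuffer follows; HwMapAperturePages and HwUnmapAperturePages are hypothetical helpers that program the hardware's aperture page table and are not part of the DDI.

VOID HwMapAperturePages(HANDLE hAdapter, UINT SegmentId, SIZE_T OffsetInPages,
                        SIZE_T NumberOfPages, PMDL pMdl, SIZE_T MdlOffset);    // hypothetical
VOID HwUnmapAperturePages(HANDLE hAdapter, UINT SegmentId, SIZE_T OffsetInPages,
                          SIZE_T NumberOfPages);                               // hypothetical

NTSTATUS APIENTRY DxgkDdiBuildPagingBuffer(
    const HANDLE hAdapter,
    DXGKARG_BUILDPAGINGBUFFER *pBuildPagingBuffer)
{
    switch (pBuildPagingBuffer->Operation)
    {
    case DXGK_OPERATION_MAP_APERTURE_SEGMENT:
        // Redirect the aperture range to the system pages described by the MDL.
        HwMapAperturePages(hAdapter,
                           pBuildPagingBuffer->MapApertureSegment.SegmentId,
                           pBuildPagingBuffer->MapApertureSegment.OffsetInPages,
                           pBuildPagingBuffer->MapApertureSegment.NumberOfPages,
                           pBuildPagingBuffer->MapApertureSegment.pMdl,
                           pBuildPagingBuffer->MapApertureSegment.MdlOffset);
        return STATUS_SUCCESS;

    case DXGK_OPERATION_UNMAP_APERTURE_SEGMENT:
        // Point the aperture range away from the pages that are being released.
        HwUnmapAperturePages(hAdapter,
                             pBuildPagingBuffer->UnmapApertureSegment.SegmentId,
                             pBuildPagingBuffer->UnmapApertureSegment.OffsetInPages,
                             pBuildPagingBuffer->UnmapApertureSegment.NumberOfPages);
        return STATUS_SUCCESS;

    default:
        // A real driver also handles DXGK_OPERATION_TRANSFER and the other
        // operation types; they are omitted from this sketch.
        return STATUS_NOT_SUPPORTED;
    }
}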
The following figure shows a visual representation of a linear aperture-space segment.

Send comments about this topic to Microsoft


AGP-Type Aperture-Space Segments
4/26/2017 • 1 min to read • Edit Online

An AGP-type aperture-space segment is similar to a linear aperture-space segment; however, the display miniport
driver does not expose DXGK_OPERATION_MAP_APERTURE_SEGMENT and
DXGK_OPERATION_UNMAP_APERTURE_SEGMENT operation types of the DxgkDdiBuildPagingBuffer callback
function through the AGP-type aperture-space segment. Instead, the video memory manager uses the GART driver
to map and unmap system pages (that is, the video memory manager does not involve the display miniport driver).
The driver must set the Agp bit-field flag in the Flags member of the DXGK_SEGMENTDESCRIPTOR structure to
specify an AGP-type aperture-space segment.
Send comments about this topic to Microsoft
Dividing a Memory-Space Segment into Banks
4/26/2017 • 1 min to read • Edit Online

The display miniport driver can provide fine-grained hints to the video memory manager about the optimal
placement for allocations of video resources within a linear memory-space segment by dividing the segment into
banked memory (banks). If the driver divides the linear memory-space segment into banks, the driver must set the
UseBanking bit-field flag in the Flags member of the DXGK_SEGMENTDESCRIPTOR structure for the segment.
The driver returns hints about banked memory in the HintedBank member of DXGK_ALLOCATIONINFO
structures for allocations when the video memory manager calls the driver's DxgkDdiCreateAllocation function.
For more information, see Specifying Segments When Creating Allocations.
While an allocation must be entirely contained within a segment, the allocation can cross the boundaries of banks
within a segment.
If banks are used, the driver must cover the entire address space of the segment with banks. The first bank always
starts at offset zero within the segment and the last bank always ends at the end of the segment. Banks are
contiguous and have no free space between them.
Send comments about this topic to Microsoft
Mapping Virtual Addresses to a Memory Segment
4/26/2017 • 4 min to read • Edit Online

The display miniport driver can specify, for each memory-space or aperture-space segment that it defines, whether
CPU virtual addresses can map directly to an allocation located in the segment by setting the CpuVisible bit-field
flag in the Flags member of the DXGK_SEGMENTDESCRIPTOR structure for the segment.
To map a CPU virtual address to a segment, the segment should have linear access through the PCI aperture. In
other words, the offset of any allocation within the segment should be the same as the offset in the PCI aperture.
Therefore, the video memory manager can calculate the bus-relative physical address of any allocation based on
the allocation's offset within the given segment.
The following diagram illustrates how virtual addresses are mapped to a linear memory-space segment.

The following diagram illustrates how virtual addresses are mapped to the underlying pages of a linear aperture-
space segment.

Before mapping a virtual address to a portion of the segment, the video memory manager calls the display
miniport driver's DxgkDdiAcquireSwizzlingRange function so that the driver can set up the aperture that is used
for accessing bits of the allocation that might be swizzled. The driver can change neither the offset into the PCI
aperture where the allocation is accessed nor the amount of space that the allocation takes up in the aperture. If the
driver cannot make the allocation CPU-accessible under these constraints (for example, if the hardware has run
out of unswizzling aperture space), the video memory manager evicts the allocation to system memory and lets the
application access the bits there.
If the content of a previously created allocation is in system memory when the user-mode display driver calls the
pfnLockCb function to request direct access to the memory, the video memory manager returns the system
memory buffer to the user-mode display driver, and the display miniport driver is not involved in accessing the
allocation. Therefore, the content of the allocation is not modified by the display miniport driver and remains in
unswizzled format. This implies that when a CPU-accessible allocation is evicted from video memory, the display
miniport driver must unswizzle the allocation so that the resultant system memory bits can be directly accessed by
the application.
If the GPU resources that are associated with an allocation currently mapped for direct application access are
evicted, the content of the allocation is transferred to system memory so that the application can continue to access
the content at the same virtual address but different physical medium. To set up the transfer, the video memory
manager calls the display miniport driver's DxgkDdiBuildPagingBuffer function to create a paging buffer, and
the GPU scheduler calls the driver's DxgkDdiSubmitCommand function to queue the paging buffer to the GPU
execution unit. The hardware-specific transfer command is in the paging buffer. For more information, see
Submitting a Command Buffer. The video memory manager ensures that the transition of video to system memory
is invisible to the application. However, the driver must ensure that the byte ordering of an allocation through the
PCI aperture exactly matches the byte ordering of the allocation when the allocation is evicted.
For aperture-space segments, the underlying bits of the allocation are already in system memory, so no transfer
(unswizzling) of data during the eviction process is required. Therefore, a CPU-accessible allocation located in an
aperture-space segment cannot be swizzled if it is accessed directly by an application.
If a surface will be directly accessible through the CPU by an application but will be swizzled in an aperture-space
segment, the display drivers should implement the surface as two different allocations. When the user-mode
display driver creates such a surface, it can call the pfnAllocateCb function and can set the NumAllocations
member of the D3DDDICB_ALLOCATE structure to 2 and the pPrivateDriverData members of the
D3DDDI_ALLOCATIONINFO structures in the pAllocationInfo array of D3DDDICB_ALLOCATE to point to private
data about the allocations (such as their swizzled and unswizzled formats). The allocation that will be used by the
GPU contains bits in swizzled format, and the allocation that will be accessed by the application contains the bits in
unswizzled format. The video memory manager calls the display miniport driver's DxgkDdiCreateAllocation
function to create the allocations. The display miniport driver interprets the private data (in the
pPrivateDriverData member of the DXGK_ALLOCATIONINFO structure for each allocation) that is passed from
the user-mode display driver. The video memory manager is unaware of the format of the allocations; it just
allocates blocks of memory of certain sizes and alignments for the allocations. A call to the user-mode display
driver's Lock function to lock the surface for processing causes the following actions:
1. The user-mode display driver calls the pfnRenderCb function to submit the unswizzle operation in the
command buffer to the Direct3D runtime and on to the display miniport driver.
2. The user-mode display driver calls the pfnLockCb function to lock the unswizzled allocation. Note that the
user-mode display driver must not set the D3DDDILOCKCB_DONOTWAIT flag in the Flags member of the
D3DDDICB_LOCK structure.
3. The pfnLockCb function waits until the transfer (unswizzling) between allocations is performed.
4. The pfnLockCb function requests that the display miniport driver obtains a virtual address for the
unswizzled allocation and returns the virtual address to the user-mode display driver in the pData member
of D3DDDICB_LOCK.
5. The user-mode display driver returns the unswizzled allocation's virtual address to the application in the
pSurfData member of D3DDDIARG_LOCK.
Send comments about this topic to Microsoft
Specifying Segments for DMA Buffers
4/26/2017 • 1 min to read • Edit Online

The display miniport driver can specify aperture segments from which DMA buffers can be allocated. DMA buffers
can also be allocated as contiguous locked-down system memory.
The video memory manager allocates and destroys DMA buffers when applications require them. Therefore, the
video memory manager requires a set of segments from which it can allocate DMA buffers. Note that the segment
set might consist of only one segment.
When the Microsoft DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiCreateDevice
function to create a graphics context device, the display miniport driver can specify a segment set from which the
video memory manager can allocate DMA buffers. If the display miniport driver sets the DmaBufferSegmentSet
member of the DXGK_DEVICEINFO structure to 0, then the video memory manager will allocate contiguous
nonpaged memory for DMA buffers; in this case, the display miniport driver must access the memory by using PCI
cycles, and through DMA, must send data directly from the memory's physical address. If the display miniport
driver sets DmaBufferSegmentSet to nonzero, then the video memory manager will allocate pageable memory
and will map the pages to the specified aperture segments. The pages within the aperture segments are revealed to
the display miniport driver in a call to its DxgkDdiSubmitCommand function.
Note that the basic video memory manager model does not support DMA buffers in local video memory.
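The following minimal sketch shows one way DxgkDdiCreateDevice might specify a DMA buffer segment set; MY_DEVICE_CONTEXT is a hypothetical per-device structure, the segment numbering is an assumption (bit 0 standing for segment ID 1), and the sizes are placeholders rather than recommendations.

NTSTATUS APIENTRY DxgkDdiCreateDevice(
    const HANDLE hAdapter,
    DXGKARG_CREATEDEVICE *pCreateDevice)
{
    MY_DEVICE_CONTEXT *pDevice;
    DXGK_DEVICEINFO *pInfo = pCreateDevice->pInfo;

    pDevice = (MY_DEVICE_CONTEXT *)ExAllocatePoolWithTag(
        NonPagedPoolNx, sizeof(MY_DEVICE_CONTEXT), 'vdGX');
    if (pDevice == NULL)
    {
        return STATUS_NO_MEMORY;
    }
    RtlZeroMemory(pDevice, sizeof(*pDevice));

    // Bit 0 set => DMA buffers may be placed in the segment assumed here to be
    // the aperture segment with ID 1. A value of 0 would instead request
    // contiguous nonpaged system memory, as described above.
    pInfo->DmaBufferSegmentSet      = 0x1;
    pInfo->DmaBufferSize            = 64 * 1024;
    pInfo->DmaBufferPrivateDataSize = 0;
    pInfo->AllocationListSize       = 64;
    pInfo->PatchLocationListSize    = 64;

    pCreateDevice->hDevice = (HANDLE)pDevice;
    return STATUS_SUCCESS;
}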
Send comments about this topic to Microsoft
Specifying Segments When Creating Allocations
4/26/2017 • 1 min to read • Edit Online

The display miniport driver specifies and returns information about its memory segments that it prefers the video
memory manager use when the video memory manager calls the driver's DxgkDdiCreateAllocation function. In
the call to DxgkDdiCreateAllocation, the driver creates allocations for video resources. The driver returns identifiers
of supported segments and segment preferences in the DXGK_ALLOCATIONINFO structures that describe the
allocations.
From the returned segment information, the video memory manager determines the appropriate memory
segment to page-in for the given operation.
Send comments about this topic to Microsoft
Reporting Graphics Memory
4/26/2017 • 1 min to read • Edit Online

The video memory manager reports to clients about the memory information that the display miniport driver
supplies.
Operating systems prior to Windows Vista report graphics memory as a single number through the Control Panel
Display application. Display drivers provide this number to the operating system; the operating system then
reports the number to the user through the Display application.
The video memory manager of the Windows Display Driver Model (WDDM) reports an accurate account of each
graphics memory contributor. The following clients use this report:
The Windows System Assessment Tool (WinSAT) checks for the available graphics memory and takes the
action to turn off or turn on the Premium Aero Glass experience based on the amount of available memory.
The Desktop Window Manager (DWM) (Dwm.exe) depends on the exact state of the available graphics
memory on computers with Windows Display Driver Model (WDDM) display drivers.
Microsoft DirectX games and other graphics applications must be able to get accurate values that describe
the state of the graphics memory. An inaccurate graphics memory number could drastically change the
game experience for the user.
The following sections describe how the video memory manager calculates graphics memory numbers and
provide examples of how the memory numbers are reported:
Calculating Graphics Memory
Examples of Graphics Memory Reporting
Retrieving Graphics Memory Numbers
Send comments about this topic to Microsoft
Calculating Graphics Memory
4/26/2017 • 2 min to read • Edit Online

The video memory manager must calculate the total amount of graphics memory before it can report an accurate
account of graphics memory. The following list of items describes how the video memory manager calculates the
graphics memory numbers:
Total system memory
Total amount of system memory that is accessible to the operating system. Memory that the BIOS allocates does
not appear in this total-system-memory number. For example, a computer with a 1 GB DIMM (that is, 1024 MB)
and that also has a BIOS that reserves 1 MB of memory appears to have 1023 MB of system memory.
Total system memory that is available for graphics use
Total amount of system memory that is dedicated or shared to the GPU. This number is calculated as follows:

TotalSystemMemoryAvailableForGraphics = MAX((TotalSystemMemory / 2), 64MB)

Commit limit on aperture segment
The amount of system memory that the video memory manager allows display miniport drivers to pin down (that
is, the amount of system memory that display miniport drivers can memory map through an aperture segment) for
GPU use at any given instant. The total amount of system memory that is allocated for the GPU might exceed the
commit limit greatly; however, the video memory manager ensures that only up to a commit limit amount is
actually resident in an aperture segment at any one time.
By default, the commit limit on a particular aperture segment is the size of that segment. The display miniport
driver can specify a different commit limit in the CommitLimit member of the DXGK_SEGMENTDESCRIPTOR
structure when the driver describes the segment. A commit limit that is specified in such a way applies only to the
particular segment that the driver describes.
In addition to per-segment commit limit, there is a global commit limit on all aperture segments. This global
commit limit is also referred to as shared system memory. This value is computed by the video memory manager.
However, although the display miniport driver can reduce this value to a lower value in the
ApertureSegmentCommitLimit member of the DXGK_DRIVERCAPS structure, we do not recommend this
practice.
The video memory manager does not allow a display miniport driver to violate the per-segment commit limit nor
the global commit limit. If a particular segment has a commit limit of 1 GB but the global commit limit is 256 MB,
the video memory manager does not allow a display miniport driver to map more than 256 MB of system memory
into that segment.
Dedicated video memory
Sum of the size of all memory segments for which the display miniport driver did not specify the
PopulatedFromSystemMemory member in the DXGK_SEGMENTFLAGS structure for each segment.
Dedicated system memory
Sum of the size of all memory segments for which the display miniport driver specifies the
PopulatedFromSystemMemory member in the DXGK_SEGMENTFLAGS structure for each segment. This
number cannot be greater than the total system memory that is available for graphics use
(TotalSystemMemoryAvailableForGraphics).
Shared system memory
The maximum amount of system memory that is shared to the GPU. This number is calculated as follows:
MaxSharedSystemMemory = TotalSystemMemoryAvailableForGraphics - DedicatedSystemMemory

The amount of system memory that is shared to the GPU. This number is calculated as follows:

SharedSystemMemory = MIN(MIN(SumOfCommitLimitOnAllApertureSegment, DXGK_DRIVERCAPS.ApertureSegmentCommitLimit), MaxSharedSystemMemory)

Total video memory
The total amount of video memory. This number is calculated as follows:

TotalVideoMemory = DedicatedVideoMemory + DedicatedSystemMemory + SharedSystemMemory
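
The calculations above can be summarized with the following illustrative helpers, which simply restate the formulas and are not part of any driver interface.

#define MEGABYTE (1024ULL * 1024ULL)

unsigned long long TotalSystemMemoryAvailableForGraphics(unsigned long long TotalSystemMemory)
{
    unsigned long long half = TotalSystemMemory / 2;
    return (half > 64 * MEGABYTE) ? half : 64 * MEGABYTE;   // MAX(TotalSystemMemory / 2, 64MB)
}

unsigned long long SharedSystemMemory(unsigned long long SumOfCommitLimitOnAllApertureSegments,
                                      unsigned long long ApertureSegmentCommitLimit,
                                      unsigned long long MaxSharedSystemMemory)
{
    // MIN(MIN(SumOfCommitLimitOnAllApertureSegments, ApertureSegmentCommitLimit), MaxSharedSystemMemory)
    unsigned long long shared = SumOfCommitLimitOnAllApertureSegments;
    if (shared > ApertureSegmentCommitLimit) shared = ApertureSegmentCommitLimit;
    if (shared > MaxSharedSystemMemory)      shared = MaxSharedSystemMemory;
    return shared;
}

unsigned long long TotalVideoMemory(unsigned long long DedicatedVideoMemory,
                                    unsigned long long DedicatedSystemMemory,
                                    unsigned long long SharedSystemMemoryValue)
{
    return DedicatedVideoMemory + DedicatedSystemMemory + SharedSystemMemoryValue;
}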

Send comments about this topic to Microsoft


Examples of Graphics Memory Reporting
4/26/2017 • 2 min to read • Edit Online

The following examples compare numbers that are reported for different adapters and memory configurations on
Windows Vista versus Windows XP. The examples show the Display application and the WinSAT applet reports of
available memory.
Example 1: 256-MB Dedicated On-board Graphics Memory on a Desktop
The following screen shots show an ATI discrete graphics adapter that has 256 MB of dedicated integrated (on-
board) graphics memory. The ATI discrete graphics adapter also shares system memory (511 MB) for graphics
purposes.
The following screen shot shows a report of available memory through the Display application on Windows Vista.

The following screen shot shows a report of available memory through the WinSAT applet on Windows Vista.

The following screen shot shows a report of available memory through the Display application on Windows XP.
Note The single "Memory Size" number that the preceding screen shot shows is just the dedicated on-board
graphics memory, which is not an accurate representation of the total amount of available graphics memory.
Example 2: 32-MB Dedicated On-Board Graphics Memory on a Mobile Computer
The following screen shots show an NVIDIA TurboCache technology discrete adapter that is present in a mobile
computer. This adapter has some dedicated on-board graphics memory. However, the adapter mostly shares
system memory for graphics purposes.
The following screen shot shows a report of available memory through the Display application on Windows Vista.

The following screen shot shows a report of available memory through the WinSAT applet on Windows Vista.

The following screen shot shows a report of available memory through the Display application on Windows XP.
Note For TurboCache computers like the one shown in the preceding screen shot, the single "Memory Size"
number is a combination, but not a total, of dedicated graphics memory and shared system memory. Again, this is
not an accurate representation of the total amount of available graphics memory.
Example 3: 256-MB Shared Graphics Memory on a Mobile Computer
The following screen shots show an Intel UMA (Unified Memory Architecture) Mobile adapter that has no dedicated
graphics memory on the motherboard. Instead, the adapter shares system memory for all graphics purposes.
The following screen shot shows a report of available memory through the Display application on Windows Vista.

The following screen shot shows a report of available memory through the WinSAT applet on Windows Vista.

The following screen shot shows a report of available memory through the Display application on Windows XP.
Send comments about this topic to Microsoft
Retrieving Graphics Memory Numbers
4/26/2017 • 1 min to read • Edit Online

Software developers who create graphics applications can use the Microsoft DirectX version 10 APIs starting in
Windows Vista to retrieve the accurate set of graphics memory numbers on computers running Windows Display
Driver Model (WDDM) display drivers. The following steps show how to retrieve the graphics memory numbers:
1. Because the new graphics memory reporting is available only on computers running Windows Display
Driver Model (WDDM) display drivers, an application must first call the following function to confirm the
driver model:

#include <windows.h>
#include <d3d9.h>

// Signature of the Direct3DCreate9Ex entry point that d3d9.dll exports.
typedef HRESULT (WINAPI *LPDIRECT3DCREATE9EX)( UINT SDKVersion, IDirect3D9Ex **ppD3D );

bool HasWDDMDriver()
{
    LPDIRECT3DCREATE9EX pD3D9Create9Ex = NULL;
    HMODULE hD3D9 = LoadLibraryW( L"d3d9.dll" );

    if ( NULL == hD3D9 ) {
        return false;
    }

    //
    // Try to locate the Direct3DCreate9Ex entry point, which creates an
    // IDirect3D9Ex interface (also known as a DX9L interface). This interface
    // can only be created if the driver is written according to the Windows
    // Display Driver Model (WDDM).
    //
    pD3D9Create9Ex = (LPDIRECT3DCREATE9EX) GetProcAddress( hD3D9, "Direct3DCreate9Ex" );

    FreeLibrary( hD3D9 );
    return pD3D9Create9Ex != NULL;
}

2. After the application determines that the display driver model is the WDDM, the application can use the new
DirectX version 10 APIs to get the graphics memory numbers. The application gets the graphics memory
numbers from the following DXGI_ADAPTER_DESC data structure, which is present in Dxgi.h and is
included in the DirectX Software Development Kit (SDK).

typedef struct DXGI_ADAPTER_DESC {
    WCHAR  Description[ 128 ];
    UINT   VendorId;
    UINT   DeviceId;
    UINT   SubSysId;
    UINT   Revision;
    SIZE_T DedicatedVideoMemory;
    SIZE_T DedicatedSystemMemory;
    SIZE_T SharedSystemMemory;
    LUID   AdapterLuid;
} DXGI_ADAPTER_DESC;
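
A minimal sketch of reading these values for the first adapter through the DXGI APIs might look like the following; error handling is reduced to the essentials.

#include <dxgi.h>

bool GetGraphicsMemoryNumbers(DXGI_ADAPTER_DESC *pDesc)
{
    IDXGIFactory *pFactory = NULL;
    IDXGIAdapter *pAdapter = NULL;
    bool ok = false;

    if (SUCCEEDED(CreateDXGIFactory(__uuidof(IDXGIFactory), (void **)&pFactory)))
    {
        if (SUCCEEDED(pFactory->EnumAdapters(0, &pAdapter)))
        {
            // On success, DedicatedVideoMemory, DedicatedSystemMemory, and
            // SharedSystemMemory hold the values described in Calculating
            // Graphics Memory.
            ok = SUCCEEDED(pAdapter->GetDesc(pDesc));
            pAdapter->Release();
        }
        pFactory->Release();
    }
    return ok;
}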

Because the Windows Vista and later desktop and DirectX games make extensive use of graphics, software that
runs on Windows Vista and later should be able to accurately determine the amount of available graphics memory.
WDDM itself manages the virtualization of graphics memory and also ensures accurate reporting of the various
graphics memory values. Application developers and software vendors should take advantage of the DirectX
version 10 APIs to retrieve the accurate set of graphics memory values on computers that have WDDM
display drivers.
Send comments about this topic to Microsoft
Handling Command and DMA Buffers
4/26/2017 • 1 min to read • Edit Online

The following topics describe how command and DMA buffers are handled in the display driver model for
Windows Vista:
Introduction to Command and DMA Buffers
Using the Guaranteed Contract DMA Buffer Model
Paging Video Memory Resources
Submitting a Command Buffer
Splitting a DMA Buffer
Requesting to Rename an Allocation
Patching a DMA Buffer
Preparing DMA Buffers
Send comments about this topic to Microsoft
Introduction to Command and DMA Buffers
4/26/2017 • 1 min to read • Edit Online

Command and DMA buffers closely resemble each other. However, a command buffer is used by the user-mode
display driver, and a DMA buffer is used by the display miniport driver.
A command buffer has the following characteristics:
It is never directly accessed by the GPU.
The hardware vendor controls the format.
It is allocated for the user-mode display driver from regular pageable memory in the private address space
of the rendering application.
A DMA buffer has the following characteristics:
It is based on the validated content of a command buffer.
It is allocated by the display miniport driver from kernel pageable memory.
Before the GPU can read from a DMA buffer, the display miniport driver must page-lock the DMA buffer and
map the DMA buffer through an aperture.
Send comments about this topic to Microsoft
Using the Guaranteed Contract DMA Buffer Model
4/26/2017 • 1 min to read • Edit Online

The display driver model for Windows Vista guarantees the size of DMA buffers and patch-location lists for a
rendering device.
In guaranteed contract mode, the user-mode display driver is aware of the exact size of the DMA buffer and patch-
location list that is available for translation when the user-mode display driver fills command buffers and calls
pfnRenderCb to submit them to the display miniport driver. After each call to pfnRenderCb, the user-mode
display driver receives the size of the DMA buffer and patch-location list that is available for the following
translation (that is, the following call to pfnRenderCb).
The video memory manager guarantees not to trim the DMA buffers and patch-location lists for that device until
the next translation is complete. The display miniport driver must be able to translate one command buffer into
exactly one DMA buffer and one patch-location list. If this translation is not possible, the user-mode command
buffer is, by definition, invalid. The display miniport driver cannot return a status that indicates it is out of DMA buffer
space or patch-location list space during the translation; doing so causes the video memory manager to bug check
the system because the requirements of the guaranteed DMA contract were not met.
Send comments about this topic to Microsoft
Paging Video Memory Resources
4/26/2017 • 1 min to read • Edit Online

Unlike the Microsoft Windows 2000 Display Driver Model, the Windows Vista display driver model allows more
video memory resources to be created than the total amount of physical video memory available, which are then
paged in and out of video memory as necessary. In other words, not all video memory resources are in video
memory simultaneously.
The GPU can have multiple DMA buffers in its pipeline. The video memory resources that are referenced by these
active DMA buffers must be in video memory. Other idle video memory resources can be paged out to system
memory.
Before the GPU scheduler can call the display miniport driver's DxgkDdiSubmitCommand function to submit a
DMA buffer to the GPU, the scheduler must ensure that all video memory resources used by the DMA buffer are
actually in the video memory. If some resources are not in video memory, they must be paged in from system
memory. The GPU scheduler must call upon the video memory manager to find space in video memory to transfer
necessary video memory resource data from system memory to video memory. When video memory demand is
high, the GPU scheduler must call upon the video memory manager to transfer idle video memory resource data
to system memory to make room for the required video memory resource data. The special purpose DMA buffers
that contain the commands for transferring data between video and system memory are known as paging buffers.
The video memory manager calls the display miniport driver's DxgkDdiBuildPagingBuffer function to create
paging buffers to which the driver writes hardware-specific data transfer commands.
Send comments about this topic to Microsoft
Submitting a Command Buffer
4/26/2017 • 2 min to read • Edit Online

The following sequence of operations must be performed to pass a command buffer through the Windows Vista
graphics stack:
1. The user-mode display driver initiates a command-buffer submission if the Direct3D runtime calls one of
the following user-mode display driver functions to perform the specified operation:
The Present function to display graphics.
The Flush function to submit hardware commands.
The Lock function to lock a resource, which is used in the current command batch.
Note that the user-mode display driver also always initiates a command-buffer submission whenever the
command buffer is full.
2. The user-mode display driver calls the Direct3D runtime's pfnRenderCb function to submit the command
buffer to the runtime.
3. The DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiRender or
DxgkDdiRenderKm function to validate the command buffer, write a DMA buffer in the hardware's format,
and produce an allocation list that describes the surfaces used. Note that the DMA buffer has not yet been
patched (that is, assigned physical addresses). Note If the runtime initiated the command-buffer submission
by calling the user-mode display driver's Present function, the graphics subsystem calls the display
miniport driver's DxgkDdiPresent function, rather than DxgkDdiRender or DxgkDdiRenderKm.
4. The video memory manager calls the display miniport driver's DxgkDdiBuildPagingBuffer function to
create special purpose DMA buffers, known as paging buffers, that move the allocations specified in the
allocation list that accompanies the DMA buffer to and from GPU-accessible memory. For more information,
see Paging Video Memory Resources.
5. The GPU scheduler calls the display miniport driver's DxgkDdiPatch function to assign physical addresses
to the resources in the DMA buffer. However, the scheduler is not required to call DxgkDdiPatch to assign
physical addresses to the paging buffer because physical addresses for the paging buffer were passed in
and assigned during the DxgkDdiBuildPagingBuffer call.
6. The GPU scheduler calls the display miniport driver's DxgkDdiSubmitCommand function to request that
the driver queue the paging buffer to the GPU execution unit.
7. The GPU scheduler calls the display miniport driver's DxgkDdiSubmitCommand function to request that
the driver queue the DMA buffer to the GPU execution unit. Each DMA buffer submitted to the GPU contains
a fence identifier. After the GPU finishes processing the DMA buffer, the GPU generates an interrupt.
8. The display miniport driver is notified of the interrupt in its DxgkDdiInterruptRoutine function. The
display miniport driver should read, from the GPU, the fence identifier of the DMA buffer that just
completed.
9. The display miniport driver should call the DxgkCbNotifyInterrupt function to notify the GPU scheduler
that the DMA buffer completed.
10. The display miniport driver should call the DxgkCbQueueDpc function to queue a deferred procedure call
(DPC).
11. The display miniport driver's DPC is notified to handle most of the DMA buffer processing.
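The following minimal sketch illustrates steps 8 through 10; HwInterruptIsOurs, ReadFenceIdFromGpu, and the ADAPTER_CONTEXT structure (which stores the DXGKRNL_INTERFACE received at DxgkDdiStartDevice) are hypothetical, and a real driver would also handle other interrupt sources.

typedef struct _ADAPTER_CONTEXT {     // hypothetical adapter context
    DXGKRNL_INTERFACE DxgkInterface;  // saved from DxgkDdiStartDevice
    // ... hardware state ...
} ADAPTER_CONTEXT;

BOOLEAN HwInterruptIsOurs(ADAPTER_CONTEXT *pAdapter);   // hypothetical
UINT    ReadFenceIdFromGpu(ADAPTER_CONTEXT *pAdapter);  // hypothetical

BOOLEAN DxgkDdiInterruptRoutine(const PVOID MiniportDeviceContext, ULONG MessageNumber)
{
    ADAPTER_CONTEXT *pAdapter = (ADAPTER_CONTEXT *)MiniportDeviceContext;
    DXGKARGCB_NOTIFY_INTERRUPT_DATA notify = {0};

    UNREFERENCED_PARAMETER(MessageNumber);

    if (!HwInterruptIsOurs(pAdapter))
    {
        return FALSE;   // not this adapter's interrupt
    }

    // Step 8: read the fence identifier of the DMA buffer that just completed.
    notify.InterruptType                  = DXGK_INTERRUPT_DMA_COMPLETED;
    notify.DmaCompleted.SubmissionFenceId = ReadFenceIdFromGpu(pAdapter);
    notify.DmaCompleted.NodeOrdinal       = 0;
    notify.DmaCompleted.EngineOrdinal     = 0;

    // Step 9: notify the GPU scheduler that the DMA buffer completed.
    pAdapter->DxgkInterface.DxgkCbNotifyInterrupt(pAdapter->DxgkInterface.DeviceHandle, &notify);

    // Step 10: queue the DPC that performs the remaining processing.
    pAdapter->DxgkInterface.DxgkCbQueueDpc(pAdapter->DxgkInterface.DeviceHandle);

    return TRUE;
}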
Send comments about this topic to Microsoft
Splitting a DMA Buffer
4/26/2017 • 4 min to read • Edit Online

Split points are used by the video memory manager to divide a large work item submitted by the display miniport
driver into smaller work items that require less GPU resources to execute. For example, a large DMA buffer might
reference a set of allocations that possibly cannot fit in local video memory or nonlocal memory. The only way to
process such a work item is to divide it into multiple smaller work items that require less GPU resources.
Note DMA buffer splitting and DMA buffer preemption are different independent concepts. A display miniport
driver must always support DMA buffer splitting even on a system with a GPU where DMA buffer preemption is
not possible. On a system with a GPU where context save and restore is not possible, the GPU scheduler schedules
split portions of a DMA buffer back to back, ensuring that the split portions are not interleaved with another DMA buffer
from a different GPU context. However, a paging buffer should be submitted between portions of a split DMA
buffer because paging operations are required between split portions of a DMA buffer. Each split point that the
driver uses to build an application DMA stream is used by the video memory manager. A submitted DMA buffer
should reprogram enough GPU state after each split point to account for a potential paging buffer that might be
inserted at that location.
To specify split points, the display miniport driver specifies values in the SplitOffset and SlotId members of the
D3DDDI_PATCHLOCATIONLIST structure for each allocation that is referenced in the AllocationIndex member
of D3DDDI_PATCHLOCATIONLIST. To track allocation usage within a particular DMA buffer, the video memory
manager creates a resource table (an array) whose dimensions are determined by the MaxAllocationListSlotId member of the
DXGK_DRIVERCAPS structure that the driver provided through a call to its DxgkDdiQueryAdapterInfo function.
This array is initialized to zero and is filled as split-portion entries of the patch-location list are processed. The
SlotId member of D3DDDI_PATCHLOCATIONLIST for the patch location indicates which row of the resource
table must be updated while the SplitOffset member indicates the offset within the DMA buffer where the
allocation is required. The DMA buffer can be run up to the point specified by SplitOffset without the resource
being accessible to the GPU. Similarly, if a new patch-location split portion entry refers to the same SlotId, the
previous allocation is being replaced by the new allocation, and the previous allocation is no longer required (that
is, the previous allocation can be paged-out).
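As a small illustration, a helper along the following lines could fill one patch-location entry when the driver builds the DMA stream (for example, in DxgkDdiRender); only the members named above are set, and the parameter values are ones the driver computes.

VOID SetSplitPoint(
    D3DDDI_PATCHLOCATIONLIST *pPatchLocation,
    UINT AllocationIndex,   // index of the allocation in the allocation list
    UINT SlotId,            // row of the resource table that the allocation occupies
    UINT SplitOffset)       // offset in the DMA buffer where the allocation is first needed
{
    RtlZeroMemory(pPatchLocation, sizeof(*pPatchLocation));
    pPatchLocation->AllocationIndex = AllocationIndex;
    pPatchLocation->SlotId          = SlotId;
    pPatchLocation->SplitOffset     = SplitOffset;
}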
When paging in the resources required by a DMA buffer, the video memory manager processes the patch-location
list by starting with the first element and moving down toward the last element. The
D3DDDI_PATCHLOCATIONLIST elements that are filled by the driver must contain values in their SplitOffset
members; the elements are strictly increasing (that is, allocations must appear in the order in which they are used
in the stream). The video memory manager pages in allocations that are referenced in the patch-location list in the
order that they are provided. When a point is reached where the video memory manager can no longer page-in an
allocation due to a low memory condition, the video memory manager submits the current portion of the DMA
buffer being prepared to the GPU scheduler for execution. The DMA buffer is run from the beginning of the
previous split point up to the SplitOffset value that is specified for an allocation that could not be brought in. Once
submitted, the video memory manager determines the list of required allocations at the current split offset in the
DMA stream by using the resource table. All allocations on the table are kept at their current physical location while
other allocations that are no longer in use might be evicted. The video memory manager then continues to process
the patch-location list, potentially splitting multiple times again.
The driver should specify split points each time an allocation is bound or unbound. To specify that an allocation is
unbound, the driver can specify a NULL allocation handle in the hDeviceSpecificAllocation member of the
DXGK_ALLOCATIONLIST structure with the appropriate value in the SlotId member of the associated
D3DDDI_PATCHLOCATIONLIST. The driver should unbind large resources to increase the chances that the video
memory manager can solve complex memory placement issues.
Similarly, the driver should reprogram large resources at every split point. When taking a split point, the video
memory manager is otherwise forced to keep a previously bound allocation at its current location. This causes
fragmentation of memory that can lead to a failure to solve complex memory placement issues that might have
been solved if not for the previously bound allocation restriction. When calculating the state at a split point, the
video memory manager determines which slot identifier (SlotId) is being reprogrammed at that split point (that is,
each patch-location list element that shares the same SplitOffset value with other elements) and ignores
placement restriction on this split point. For example, if the driver uses a 64-MB texture, reprogramming that
texture at every split point gives the video memory manager the flexibility to move that texture around in memory
between split points if necessary.
Send comments about this topic to Microsoft
Requesting to Rename an Allocation
4/26/2017 • 2 min to read • Edit Online

The user-mode display driver should request that the video memory manager rename an allocation associated
with a surface when an application indicates to discard the content of the surface as part of a request to lock the
surface (for example, a vertex buffer). The Microsoft Direct3D runtime passes the Discard bit-field flag to indicate
that it no longer requires the current content of the surface. The driver can request that the video memory manager
allocate a new allocation to handle the lock request if the current allocation holding the content of the surface is
busy, rather than stalling the application thread until the current allocation becomes idle.
The user-mode display driver requests that the video memory manager rename an allocation when the driver sets
the Discard member of the D3DDDICB_LOCKFLAGS structure in a call to the pfnLockCb function. The video
memory manager determines if it should rename the allocation or should cause the application to stall until the
allocation is idle based on whether the allocation is currently busy and on the current memory condition. For each
allocation being renamed, the video memory manager maintains a list of allocations that are used in succession to
satisfy lock requests. The video memory manager cycles through the list each time the application discards the
content of an allocation. The length of the list is determined by application requirements and memory pressure. The
video memory manager attempts to keep the list long enough to avoid stalling the application thread on a lock
request. However, under memory pressure, the video memory manager can trim the list to avoid causing extra
memory pressure.
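From the user-mode display driver's side, a minimal sketch of such a discard-on-lock request follows; MY_UMD_DEVICE is a hypothetical per-device structure that stores the runtime device handle and the device callback table.

HRESULT LockWithDiscard(MY_UMD_DEVICE *pDevice, D3DKMT_HANDLE hAllocation, void **ppData)
{
    D3DDDICB_LOCK lockCb = {0};
    HRESULT hr;

    lockCb.hAllocation   = hAllocation;
    lockCb.Flags.Discard = 1;   // previous content is no longer needed, so renaming is allowed

    hr = pDevice->pDeviceCallbacks->pfnLockCb(pDevice->hRTDevice, &lockCb);
    if (SUCCEEDED(hr))
    {
        *ppData = lockCb.pData;  // CPU pointer to the (possibly renamed) allocation
    }
    return hr;
}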
To impose a limit on the length of the renaming list for an allocation, the driver sets the
MaximumRenamingListLength member of the DXGK_ALLOCATIONINFO structure when it creates the
allocation. If the driver sets MaximumRenamingListLength to a nonzero value, then the video memory manager
determines the appropriate length of the renaming list without exceeding the limit imposed by the driver. If the
driver sets MaximumRenamingListLength to 0, then the memory manager can increase the size of the renaming
list to whatever size is necessary to improve performance.
Note that when the user-mode display driver sets the Discard member of D3DDDICB_LOCKFLAGS, the video
memory manager does not call the display miniport driver to allocate extra allocations for the original allocation.
The video memory manager creates all extra allocations using the creation parameters of the original allocation.
From the perspective of the display miniport driver, the same allocation is paged in at potentially multiple
simultaneous segment locations.
Send comments about this topic to Microsoft
Patching a DMA Buffer
4/26/2017 • 1 min to read • Edit Online

After the video memory manager is informed where every memory resource for the DMA buffer is located, the
GPU scheduler calls the display miniport driver's DxgkDdiPatch function to patch the resource with a physical
address (that is, assign a physical address to the resource).
Send comments about this topic to Microsoft
Preparing DMA Buffers
4/26/2017 • 1 min to read • Edit Online

The display miniport driver must prepare DMA buffers in a timely manner. While the GPU processes a DMA buffer,
the display miniport driver is typically called upon to prepare the next DMA buffer for submission to the GPU. To
prevent GPU starvation, the display miniport driver must spend less time preparing and submitting subsequent
DMA buffers than the GPU takes to process the current DMA buffer.
Send comments about this topic to Microsoft
GDI Hardware Acceleration
4/26/2017 • 1 min to read • Edit Online

The GDI Hardware Acceleration feature introduced with Windows 7 provides accelerated core graphics device
interface (GDI) operations on a graphics processing unit (GPU).
To indicate that the GPU and the driver support this feature, the display miniport driver must set
DXGKDDI_INTERFACE_VERSION to >= DXGKDDI_INTERFACE_VERSION_WIN7.
The display miniport driver should also set DXGK_PRESENTATIONCAPS->SupportKernelModeCommandBuffer
to TRUE to indicate that it supports GDI Hardware Acceleration
command buffer processing. The driver should report this type of support only if the cache-coherent GPU aperture
segment exists and there is no significant performance penalty when the CPU accesses GPU memory.
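A minimal sketch of reporting this capability from the DXGKQAITYPE_DRIVERCAPS case of DxgkDdiQueryAdapterInfo might look like the following; only the GDI-acceleration-related field is shown.

static VOID ReportGdiAccelerationCaps(const DXGKARG_QUERYADAPTERINFO *pQueryAdapterInfo)
{
    DXGK_DRIVERCAPS *pCaps = (DXGK_DRIVERCAPS *)pQueryAdapterInfo->pOutputData;

    // Report kernel-mode command buffer support only when a cache-coherent GPU
    // aperture segment exists and CPU access to GPU memory carries no
    // significant performance penalty.
    pCaps->PresentationCaps.SupportKernelModeCommandBuffer = 1;
}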
The following reference topics describe how to use this feature:
Driver-Implemented Functions
The following functions must be implemented by display miniport drivers that support GDI Hardware
Acceleration:
DxgkDdiCreateAllocation
DxgkDdiGetStandardAllocationDriverData
DxgkDdiRenderKm
Structures
D3DKM_TRANSPARENTBLTFLAGS
D3DKMDT_GDISURFACEDATA
D3DKMDT_GDISURFACEFLAGS
DRIVER_INITIALIZATION_DATA
DXGK_CREATECONTEXTFLAGS
DXGK_CREATEDEVICEFLAGS
DXGK_GDIARG_ALPHABLEND
DXGK_GDIARG_BITBLT
DXGK_GDIARG_CLEARTYPEBLEND
DXGK_GDIARG_COLORFILL
DXGK_GDIARG_STRETCHBLT
DXGK_GDIARG_TRANSPARENTBLT
DXGK_RENDERKM_COMMAND
DXGK_PRESENTATIONCAPS
DXGKARG_GETSTANDARDALLOCATIONDRIVERDATA
DXGKARG_RENDER
Enumerations
D3DKMDT_STANDARDALLOCATION_TYPE
D3DKMDT_GDISURFACETYPE
DXGK_GDIROP_BITBLT
DXGK_GDIROP_COLORFILL
DXGK_RENDERKM_OPERATION
For more details on how to implement GDI Hardware Acceleration in your display miniport driver, see the
following topics:
Setting the Size and Pitch of the Memory Allocation
Initialization and DMA Buffer Creation
Reporting Optional Support for Rendering Operations
Supporting Kernel-Mode Command Buffers
Specifying GDI Hardware-Accelerated Rendering Operations
Send comments about this topic to Microsoft
Setting the Size and Pitch of the Memory Allocation
4/26/2017 • 1 min to read • Edit Online

A display miniport driver that supports GDI Hardware Acceleration should set the size and pitch of the allocations
of system or video memory when it processes the following allocation calls.
DxgkDdiCreateAllocation
When the driver processes a call to DxgkDdiCreateAllocation, it should set the size, in bytes, of the system or video
memory allocation. The size of the allocation is set through the pCreateAllocation->pAllocationInfo->Size
member. If the allocation is visible to the CPU, the size should include the pitch value, which is the width of the
surface, including padding, in bytes.
Allocations are visible to the CPU if the pGetStandardAllocationDriverData->pCreateGdiSurfaceData->Type
member is set to D3DKMDT_GDISURFACE_STAGING_CPUVISIBLE or D3DKMDT_GDISURFACE_EXISTINGSYSMEM.
For the properties of these surface types, see the descriptions in D3DKMDT_GDISURFACETYPE.
DxgkDdiGetStandardAllocationDriverData
When the driver processes a call to DxgkDdiGetStandardAllocationDriverData for an allocation that is visible to the
CPU, it should:
1. Set the pGetStandardAllocationDriverData->StandardAllocationType member to
D3DKMDT_STANDARDALLOCATION_GDISURFACE.
2. Set the description of a surface that can be used for redirection by GDI Hardware Acceleration and the
Desktop Windows Manager (DWM) through the D3DKMDT_GDISURFACEDATA structure that is pointed to
by the pGetStandardAllocationDriverData->pCreateGdiSurfaceData member. For example, set the
pitch of the allocation through the Pitch member of D3DKMDT_GDISURFACEDATA.
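A minimal sketch of the CPU-visible case follows; the 32-bpp format and the 64-byte pitch alignment are
hypothetical assumptions for illustration, not requirements of the DDI.

NTSTATUS APIENTRY DxgkDdiGetStandardAllocationDriverData(
    const HANDLE hAdapter,
    DXGKARG_GETSTANDARDALLOCATIONDRIVERDATA *pData)
{
    if (pData->StandardAllocationType == D3DKMDT_STANDARDALLOCATION_GDISURFACE) {
        D3DKMDT_GDISURFACEDATA *pGdi = pData->pCreateGdiSurfaceData;

        if (pGdi->Type == D3DKMDT_GDISURFACE_STAGING_CPUVISIBLE ||
            pGdi->Type == D3DKMDT_GDISURFACE_EXISTINGSYSMEM) {
            // Assume a 32-bpp surface format; align the pitch to a hypothetical
            // 64-byte hardware requirement and report it back to the caller.
            const UINT bytesPerPixel = 4;
            pGdi->Pitch = ((pGdi->Width * bytesPerPixel) + 63) & ~63u;
        }
    }
    return STATUS_SUCCESS;
}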
Send comments about this topic to Microsoft
Initialization and DMA Buffer Creation
4/26/2017 • 1 min to read • Edit Online

To indicate that the GPU supports GDI Hardware Acceleration, a display miniport driver's implementation of the
DriverEntry function must fill in the DxgkDdiRenderKm member of the DRIVER_INITIALIZATION_DATA
structure with a pointer to the driver-implemented DxgkDdiRenderKm function.
The DirectX graphics kernel subsystem calls the DxgkDdiRenderKm function to generate a DMA buffer from the
command buffer that is passed by the kernel-mode Canonical Display Driver (CDD) provided by the operating
system.
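A minimal DriverEntry sketch that registers this entry point might look like the following; only the GDI-related
member is shown, the driver's DxgkDdiRenderKm implementation is assumed to be declared elsewhere, and all of
the other required DxgkDdi* members still have to be filled in.

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    DRIVER_INITIALIZATION_DATA initData = {};

    initData.Version = DXGKDDI_INTERFACE_VERSION;
    // ... assign every other required DxgkDdiXxx entry point here ...

    // Registering DxgkDdiRenderKm opts the driver in to GDI Hardware Acceleration.
    initData.DxgkDdiRenderKm = DxgkDdiRenderKm;

    return DxgkInitialize(DriverObject, RegistryPath, &initData);
}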
When the display port driver of the DirectX graphics kernel subsystem (Dxgkrnl.sys) calls the
DxgkDdiCreateContext function, it sets the pCreateContext->Flags->GdiContext member to indicate the
context that is used for GDI Hardware Acceleration.
Similarly, when the display port driver calls the DxgkDdiCreateDevice function, it sets the
pCreateDevice->Flags->GdiDevice member to indicate the device that is used for GDI Hardware Acceleration.
Send comments about this topic to Microsoft
Reporting Optional Support for Rendering
Operations
4/26/2017 • 1 min to read • Edit Online

Beginning with Windows 7, a display miniport driver can set additional members in the
DXGK_PRESENTATIONCAPS structure to indicate certain rendering operations that the driver can or cannot
support.
For further information about available rendering capability settings, see DXGK_PRESENTATIONCAPS.
Send comments about this topic to Microsoft
Supporting Kernel-Mode Command Buffers
4/26/2017 • 1 min to read • Edit Online

The display miniport driver should submit a command buffer in response to a call to the DxgkDdiRenderKm
function as described in Submitting a Command Buffer.
The driver can use the MultipassOffset member of the DXGKARG_RENDER structure to track the progress of
input command buffer processing. For example, the display miniport driver can use the high 16 bits as an offset to
the last processed command, and the low 16 bits to track the processing of the command.
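As a sketch of that split encoding (the exact packing scheme is the driver's choice; it is not mandated by
the DDI):

NTSTATUS APIENTRY DxgkDdiRenderKm(const HANDLE hContext, DXGKARG_RENDER *pRender)
{
    // Resume from the progress recorded on a previous pass over this command buffer.
    UINT commandOffset = pRender->MultipassOffset >> 16;    // last processed command
    UINT subRectIndex  = pRender->MultipassOffset & 0xFFFF; // progress within that command

    // ... translate commands until the DMA buffer or patch-location list fills ...

    // Record progress and ask the graphics kernel to call DxgkDdiRenderKm again
    // for the same input command buffer.
    pRender->MultipassOffset = (commandOffset << 16) | subRectIndex;
    return STATUS_GRAPHICS_INSUFFICIENT_DMA_BUFFER;
}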
Send comments about this topic to Microsoft
Specifying GDI Hardware-Accelerated Rendering
Operations
4/26/2017 • 1 min to read • Edit Online

When the DxgkDdiRenderKm function is called, the operating system specifies the type of GDI hardware-
accelerated rendering operation to perform through the pRenderKmArgs parameter. The display port driver of the
DirectX graphics kernel subsystem (Dxgkrnl.sys) sets the pRenderKmArgs->pCommand member to point to a
command buffer that contains an array of variable-size DXGK_RENDERKM_COMMAND structures. It also sets the
pRenderKmArgs->CommandLength member to the size of the command buffer, in bytes.
The driver must translate the input DXGK_RENDERKM_COMMAND command buffer into DMA buffer commands
and build the patch location list.
DXGK_RENDERKM_COMMAND contains members that specify characteristics of GDI hardware-accelerated
rendering operations, as described in the following table.

Rendering operation | DXGK_RENDERKM_COMMAND member | DXGK_GDIARG_XXX structure | DXGK_RENDERKM_OPERATION value
alpha blend | AlphaBlend | DXGK_GDIARG_ALPHABLEND | DXGK_GDIOP_ALPHABLEND = 3
bit-block transfer with no stretching | BitBlt | DXGK_GDIARG_BITBLT | DXGK_GDIOP_BITBLT = 1
ClearType and antialiased text pixel blend | ClearTypeBlend | DXGK_GDIARG_CLEARTYPEBLEND | DXGK_GDIOP_CLEARTYPEBLEND = 7
color fill | ColorFill | DXGK_GDIARG_COLORFILL | DXGK_GDIOP_COLORFILL = 2
stretched bit-block transfer | StretchBlt | DXGK_GDIARG_STRETCHBLT | DXGK_GDIOP_STRETCHBLT = 4
bit-block transfer with transparency | TransparentBlt | DXGK_GDIARG_TRANSPARENTBLT | DXGK_GDIOP_TRANSPARENTBLT = 6

The operating system uses the OpCode member of DXGK_RENDERKM_COMMAND to indicate the specific GDI
hardware-accelerated rendering operation that the display miniport driver must process. The OpCode member is
of type DXGK_RENDERKM_OPERATION, with values shown in the table.
The operating system will also supply the appropriate value of the DXGK_RENDERKM_COMMAND CommandSize
member, which specifies the size of the current rendering command, in bytes, including the value of OpCode and
the number of sub-rectangles in the command.
Further information about the capability of the display adapter to perform a bit-block transfer with transparency is
provided in the D3DKM_TRANSPARENTBLTFLAGS structure contained in the DXGK_GDIARG_TRANSPARENTBLT->Flags
member.
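A minimal sketch of walking the variable-size command array inside DxgkDdiRenderKm follows;
HwEmitBitBltDma is a hypothetical hardware-specific helper that writes the DMA commands and patch-location
entries for one command.

// Hypothetical hardware-specific helper (not part of the DDI).
static NTSTATUS HwEmitBitBltDma(const DXGKARG_RENDER *pRender,
                                const DXGK_RENDERKM_COMMAND *pCmd);

static NTSTATUS TranslateGdiCommands(const DXGKARG_RENDER *pRenderKmArgs)
{
    const BYTE *pCur = (const BYTE *)pRenderKmArgs->pCommand;
    const BYTE *pEnd = pCur + pRenderKmArgs->CommandLength;

    while (pCur < pEnd) {
        const DXGK_RENDERKM_COMMAND *pCmd = (const DXGK_RENDERKM_COMMAND *)pCur;

        switch (pCmd->OpCode) {
        case DXGK_GDIOP_BITBLT:
            // Emit DMA commands for the DXGK_GDIARG_BITBLT payload of this command.
            HwEmitBitBltDma(pRenderKmArgs, pCmd);
            break;
        // ... handle the other DXGK_RENDERKM_OPERATION values from the table above ...
        default:
            return STATUS_INVALID_PARAMETER;
        }

        // CommandSize covers the OpCode, the argument structure, and any sub-rectangles.
        pCur += pCmd->CommandSize;
    }
    return STATUS_SUCCESS;
}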
Send comments about this topic to Microsoft
Video memory offer and reclaim
4/26/2017 • 1 min to read • Edit Online

Windows Display Driver Model (WDDM) 1.2 and later user-mode display drivers must use the memory offer and
reclaim feature, available starting with Windows 8, to reduce memory overhead needed for temporary surfaces in
local and system memory.

Minimum WDDM version: 1.2
Minimum Windows version: 8
Driver implementation (Full graphics and Render only): Mandatory
WHCK requirements and tests: Device.Graphics…OfferReclaim

Especially in mobile scenarios, graphics-intensive apps that need hardware acceleration can make heavy use of
GPU resources. Also, in many mobile devices the GPU is integrated into the CPU chipset and the GPU uses portions
of system memory as video memory. To ensure reasonable system performance when multiple apps make heavy
use of a GPU that in turn makes heavy demand on system memory, the memory footprint of display drivers
should be minimized. The offer/reclaim device driver interfaces (DDIs) provide a mechanism to do this.
An API is available for apps to offer unneeded memory that the system can later reclaim for other uses, as well as
to reclaim memory that was recently discarded. See the Microsoft DirectX Graphics Infrastructure (DXGI) app
programming topic, DXGI 1.2 Improvements.
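For context, an application-side sketch of that pattern using the documented IDXGIDevice2 methods might look
like the following; the staging resource is illustrative and error handling is omitted.

#include <dxgi1_2.h>

// Offer the memory backing a temporary resource while the app does not need it.
void OfferStagingResource(IDXGIDevice2 *dxgiDevice, IDXGIResource *staging)
{
    IDXGIResource *resources[] = { staging };
    dxgiDevice->OfferResources(1, resources, DXGI_OFFER_RESOURCE_PRIORITY_LOW);
}

// Reclaim the memory before reuse; if it was discarded, the contents must be regenerated.
bool ReclaimStagingResource(IDXGIDevice2 *dxgiDevice, IDXGIResource *staging)
{
    IDXGIResource *resources[] = { staging };
    BOOL discarded = FALSE;
    dxgiDevice->ReclaimResources(1, resources, &discarded);
    return discarded == FALSE;   // FALSE means the previous contents survived
}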

Offer and reclaim DDI


New functions are available starting with Windows 8 for the user-mode driver to offer or reclaim memory.
The driver calls these system-provided functions to offer or reclaim memory allocations:
pfnOfferAllocationsCb
pfnReclaimAllocationsCb
The driver implements these functions if it supports Microsoft Direct3D 10 hardware:
pfnOfferResources
pfnReclaimResources
The driver implements the following functions if it supports Microsoft Direct3D 9 hardware. Also, if apps offer or
reclaim their allocations while using the Direct3D 11 API running on Direct3D 9 hardware, the Direct3D runtime
calls these functions:
OfferResources
ReclaimResources
Use these associated structures and enumerations:
D3DDDI_OFFER_PRIORITY
D3DDDIARG_OFFERRESOURCES
D3DDDIARG_RECLAIMRESOURCES
D3DDDICB_OFFERALLOCATIONS
D3DDDICB_RECLAIMALLOCATIONS
DXGI_DDI_ARG_OFFERRESOURCES
DXGI_DDI_ARG_RECLAIMRESOURCES
DXGI1_2_DDI_BASE_FUNCTIONS
To support the offer/reclaim feature, starting with Windows 8 this structure has two new members:
D3DDDI_ALLOCATIONLIST
You should carefully test that your driver handles this feature correctly because after an allocation is discarded, all
data in it is lost.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…OfferReclaim. Note that these requirements list the scenarios in
which the driver must offer allocations.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
GPU preemption
4/26/2017 • 5 min to read • Edit Online

A new GPU preemption model is available starting with Windows 8. In this model the operating system no longer
allows the preemption of GPU direct memory access (DMA) packets to be disabled, and it guarantees that
preemption requests will be sent to the GPU before a Timeout Detection and Recovery (TDR) process is initiated.

Minimum Windows Display Driver Model (WDDM) version: 1.2
Minimum Windows version: 8
Driver implementation (Full graphics and Render only): Mandatory
WHCK requirements and tests: Device.Graphics…Preemption Test and Device.Graphics…FlipOnVSyncMmIo

If long-running packets cannot be successfully preempted, high-priority GPU work, such as work required by the
Desktop Window Manager (DWM), can be delayed, resulting in glitches during window transitions and animations.
Also, long-running GPU packets that cannot be preempted can cause a TDR process to repeatedly reset the GPU,
and eventually a system bugcheck can occur.
Note
All WDDM 1.2 display miniport drivers must support the Windows 8 preemption model. However, at run time,
WDDM 1.2 drivers can still reject the Windows 8 preemption model and retain the Windows 7 behavior of the
Microsoft DirectX graphics kernel subsystem scheduler.

GPU preemption device driver interfaces (DDIs)


The following device driver interfaces (DDIs) are available for the display miniport driver to implement the
Windows 8 GPU preemption model.
DxgkCbCreateContextAllocation
DxgkCbDestroyContextAllocation
pfnSetPriorityCb
Dxgkrnl Interface
DXGKRNL_INTERFACE
D3DKMDT_COMPUTE_PREEMPTION_GRANULARITY
D3DKMDT_GRAPHICS_PREEMPTION_GRANULARITY
D3DKMDT_PREEMPTION_CAPS
D3DKMT_QUERYADAPTERINFO
DXGK_DRIVERCAPS
DXGK_SUBMITCOMMANDFLAGS
DXGK_VIDSCHCAPS
DXGKARGCB_CREATECONTEXTALLOCATION

Display miniport driver implementation


Follow these general steps to implement the Windows 8 GPU preemption model in your display miniport driver:
1. Compile your driver against headers that have DXGKDDI_INTERFACE_VERSION >=
DXGKDDI_INTERFACE_VERSION_WIN8.
2. Declare support for the Windows 8 GPU preemption model by setting the PreemptionAware and
MultiEngineAware members of the DXGK_VIDSCHCAPS structure to 1. To support the Windows 7
preemption model, set PreemptionAware to zero.
3. Specify the supported level of preemption granularity in the D3DKMDT_PREEMPTION_CAPS structure, which
takes constant values from the D3DKMDT_GRAPHICS_PREEMPTION_GRANULARITY and
D3DKMDT_COMPUTE_PREEMPTION_GRANULARITY enumerations.
4. If the hardware supports lazy context switching, submit a zero-length buffer to the DxgkDdiSubmitCommand
function and set the pSubmitCommand->Flags->ContextSwitch member to 1. Note the discussion under the
ContextSwitch member of the DXGK_SUBMITCOMMANDFLAGS structure.
5. Set GPU context allocations and device context allocations by calling the DxgkCbCreateContextAllocation
function. Note the specific instructions and restrictions given in Remarks for the function.
6. Call the DxgkCbDestroyContextAllocation function to destroy GPU context allocations and device context
allocations that were created with DxgkCbCreateContextAllocation.
7. When preparing the DMA buffer in response to a call to the DxgkDdiBuildPagingBuffer function, initialize the
context resource by filling in the InitContextResource internal structure within the
DXGKARG_BUILDPAGINGBUFFER structure. If context resources are evicted or relocated, the video memory
manager will preserve the content of the context resources.
8. The driver must support memory-mapped I/O flip on the next vertical sync. In Windows 8, the GPU scheduler
attempts to preempt hardware even if flips are pending. Therefore, to prevent tearing and rendering artifacts,
the driver must support the memory-mapped I/O flip model and must set the FlipOnVSyncMmIo member of
the DXGK_FLIPCAPS structure to 1 and support the operations described under FlipOnVSyncMmIo.
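As a sketch of steps 2 and 3 in the list above, the caps reported from the driver's DXGKQAITYPE_DRIVERCAPS
handling in DxgkDdiQueryAdapterInfo might look like this; the granularity values shown are illustrative, not a
recommendation.

static void ReportPreemptionCaps(DXGK_DRIVERCAPS *pCaps)
{
    // Step 2: declare support for the Windows 8 GPU preemption model.
    pCaps->SchedulingCaps.PreemptionAware = 1;
    pCaps->SchedulingCaps.MultiEngineAware = 1;

    // Step 3: report the finest preemption granularity the hardware can honor.
    pCaps->PreemptionCaps.GraphicsPreemptionGranularity =
        D3DKMDT_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY;
    pCaps->PreemptionCaps.ComputePreemptionGranularity =
        D3DKMDT_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY;
}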
Memory mapping considerations in your implementation
Create a robust driver that supports the Windows 8 GPU preemption model and provides a quality user
experience by following this guidance:
Request mid-DMA buffer preemption from the GPU when the DirectX graphics kernel (Dxgkrnl) scheduler
sends a preemption command. Hardware devices that have a finer granularity of mid-DMA buffer preemption
should produce a better customer experience.
Allow paging command fence IDs to be reused: if a preemption request resulted in preempting paging
commands in the hardware queue, the Dxgkrnl scheduler will resubmit preempted paging commands with the
same fence IDs that were originally used for them, and the paging commands will be scheduled prior to any
other commands on that engine. Non-paging commands will be resubmitted with newly assigned fence IDs.
Provide a patch location list for split DMA buffers—see Splitting a DMA Buffer.
A verification mode, called binding leak detection, is available that walks through the patch location list and
rejects packets that do not unbind, or that do not reprogram allocations for each split packet. Some hardware
supports virtual addresses, allowing an extra level of indirection that can make this verification unnecessary. In
such a case, to indicate that the driver opts out of the verification mode, set the NoDmaPatching member of
the DXGK_VIDSCHCAPS structure to 1.
In Windows 7, the Dxgkrnl scheduler guarantees that all split DMA packets that correspond to the same render
command are executed sequentially without switching to another render context. In the Windows 8 preemption
model, the scheduler can execute render packets from a different context between two split packets that
correspond to the same render command. As a consequence, drivers that are aware of preemption should
handle a split/partial DMA packet submission in the same way as a regular full packet submission. In particular,
GPU state must be saved or restored at the boundary for such submissions.
A preemption-aware driver must not change the content of a split DMA buffer when it is broadcast to multiple
adapters in linked display adapter (LDA) mode, where multiple physical GPUs are linked to form a single, faster,
virtual GPU. This is because, in the Windows 8 preemption model, the Dxgkrnl scheduler no longer guarantees
synchronous execution of a split packet sequence without switching to another context. A driver that changed
the content of a split DMA packet would compromise the integrity of the packet's data because if the packet
were executed on another engine, it would operate on the same copy of DMA buffer data.
In the Windows 8 GPU preemption model, the Dxgkrnl scheduler enables preemption for packets that have
associated "signal on submit" synchronization primitives. If a device uses "signal on submit" synchronization
primitives in conjunction with hardware-based wait states, it must support the ability to preempt a wait
instruction before the wait condition is satisfied.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…Preemption Test and Device.Graphics…FlipOnVSyncMmIo.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Direct flip of video memory
4/26/2017 • 1 min to read • Edit Online

The direct flip feature allows for special optimizations to the composition model to reduce power consumption.
The optimizations benefit these scenarios:
To ensure optimal power consumption for video playback and other full screen scenarios, direct flip enables a
minimum of memory bandwidth to display full-screen content and ensure smooth transitions between full-
screen apps, other apps, and the desktop environment.
The user wants to view a video or run an app that covers the entire screen. When the user enters or exits the
app, or notifications appear over the app, no mode change is required, and the experience is smooth.
Furthermore, the user enjoys extended battery life on mobile devices because memory bandwidth
requirements are reduced for full-screen apps such as video.

Minimum Windows Display Driver Model (WDDM) version: 1.2
Minimum Windows version: 8
Driver implementation (Full graphics): Mandatory
WHCK requirements and tests: Device.Graphics…DirectFlip

DirectFlip device driver interface (DDI)


These functions and structures are new or updated for Windows 8:
CheckDirectFlipSupport
CheckDirectFlipSupport(D3D11_1)
DxgkDdiSetVidPnSourceAddress
D3D11_1_DDI_CHECK_DIRECT_FLIP_FLAGS
D3DDDI_CHECK_DIRECT_FLIP_FLAGS
D3DDDIARG_CHECKDIRECTFLIPSUPPORT
D3DKMT_DIRECTFLIP_SUPPORT
D3DKMT_QUERYADAPTERINFO
D3DKMT_WAITFORVERTICALBLANKEVENT2
D3DKMTWaitForVerticalBlankEvent2
DXGK_DRIVERCAPS
DXGK_SEGMENTFLAGS
DXGK_SETVIDPNSOURCEADDRESS_FLAGS

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…DirectFlip.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Direct3D rendering performance improvements
4/26/2017 • 4 min to read • Edit Online

Windows Display Driver Model (WDDM) 1.3 and later drivers can support Microsoft Direct3D rendering
performance improvements that let Direct3D 9 hardware make better use of hardware command buffers and
counters and make efficient copies of system memory to subresources. These capabilities, which mirror some of
the capabilities available for Direct3D Version 10 hardware, are new starting with Windows 8.1.
New Direct3D 11.1 resource trim and map default performance improvements are also available. The map default
scenario is outlined in the Behavior changes section below.

Rendering performance reference


This reference section describes the user-mode device driver interfaces (DDIs):
Direct3D rendering performance functions implemented by the user-mode driver
All functions that user-mode display drivers must implement in order to improve rendering performance for
Direct3D Level 9 hardware.
These user-mode structures and enumerations support rendering performance improvements and are new or
updated for Windows 8.1. All apply to Direct3D Level 9 drivers except for D3D11_1_DDI_FLUSH_FLAGS.
D3DDDI_FLUSH_FLAGS (new)
D3DDDIARG_COPYFLAGS (new)
D3DDDIARG_COUNTER_INFO (new)
D3DDDIARG_UPDATESUBRESOURCEUP (new)
D3DDDICAPS_SIMPLE_INSTANCING_SUPPORT (new)
CreateResource2 (WDDM 1.3 and later Direct3D Level 9 drivers must return the E_INVALIDARG error code if
the CaptureBuffer flag value is set)
D3D11_1_DDI_FLUSH_FLAGS (D3DWDDM1_3DDI_TRIM_MEMORY constant added)
D3DDDI_DEVICEFUNCS (pfnFlush1, pfnCheckCounterInfo, pfnCheckCounter,
pfnUpdateSubresourceUP members added)
D3DDDI_POOL (D3DDDIPOOL_STAGINGMEM constant added)
D3DDDICAPS_TYPE (D3DDDICAPS_GET_SIMPLE_INSTANCING_SUPPORT constant added)
GetCaps (new info in Remarks)

DDI implementation requirements starting with WDDM 1.3


Starting with WDDM 1.3, the following functions are required or optional for user-mode drivers to implement.

Direct3D 9 functions that are optional prior to WDDM 1.3 and are now required:
BufBlt1
CreateResource2
TexBlt1
VolBlt1

Direct3D 9 functions that are available starting with WDDM 1.3; a driver must either implement all of these
functions or none of them:
pfnCheckCounter
pfnCheckCounterInfo
pfnFlush1
pfnPresent1(D3D)
pfnPresent1(DXGI)
pfnUpdateSubresourceUP
pfnSetMarker
pfnSetMarkerMode

When the WDDM 1.3 and later optional functions immediately above are implemented, these functions have
associated behavior changes:
BltDXGI (native staging)
Blt1DXGI (native staging)
CreateResource2 (native staging, large capture textures)
GetCaps (time stamps, simple instancing)
Lock (native staging)
TexBlt1 (native staging)
Unlock (native staging)
VolBlt1 (native staging)
These scenarios apply when GetCaps is called:
If D3DDDICAPS_GETD3DQUERYDATA is set, the driver can optionally report support for time stamps, meaning
that the Direct3D runtime won't mask support.
If D3DDDICAPS_GET_SIMPLE_INSTANCING_SUPPORT is set, the driver can report optional hardware support
for instancing.

These Direct3D 11 functions have associated behavior changes:
CreateResource(D3D11) (buffer map default; see the Behavior changes section below)
pfnFlush1 (resource trim)
ResourceMap (buffer map default; see the Behavior changes section below)
ResourceUnmap (buffer map default; see the Behavior changes section below)

Behavior changes for calls to resource create, map, and unmap functions

For these functions that are implemented by WDDM 1.3 and later drivers, the Direct3D runtime supplies a
restricted set of input values for the map default scenario. These restricted values apply only to drivers that support
feature level 11.1 and later.
CreateResource(D3D11) function—
These input D3D11DDIARG_CREATERESOURCE structure members are restricted:

ResourceDimension and Usage: These behavior changes apply only when the Direct3D runtime supplies type
D3D10DDIRESOURCE_BUFFER for ResourceDimension and type D3D10_DDI_USAGE_DEFAULT for Usage.
BindFlags: The Direct3D runtime sets only the D3D10_DDI_BIND_SHADER_RESOURCE and
D3D11_DDI_BIND_UNORDERED_ACCESS values.
MapFlags: If all the other member requirements listed here are met, the runtime can set D3D10_DDI_MAP_READ,
D3D10_DDI_MAP_WRITE, and D3D10_DDI_MAP_READWRITE values. The driver must support these values. Values
of D3D10_DDI_MAP_WRITE_DISCARD and D3D10_DDI_MAP_WRITE_NOOVERWRITE are invalid.
MiscFlags: The runtime sets only the D3D11_DDI_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS and
D3D11_DDI_RESOURCE_MISC_BUFFER_STRUCTURED values.
Format: The runtime sets only the DXGI_FORMAT_UNKNOWN value.
SampleDesc: The runtime sets the DXGI_SAMPLE_DESC.Count member to 1, and the Quality member to zero.
MipLevels: The runtime sets the value to 1.
ArraySize: The runtime sets the value to 1.
pPrimaryDesc: The runtime sets the value to NULL.

ResourceMap function—
These input parameters to ResourceMap are restricted:

hResource: The Direct3D runtime sets only a D3D10DDIRESOURCE_BUFFER resource when a non-zero value for
MapFlags is set in the creation call to CreateResource(D3D11).
Subresource: The runtime only sets the value to 0.
DDIMap: If all the other member requirements listed here are met, the runtime can set D3D10_DDI_MAP_READ,
D3D10_DDI_MAP_WRITE, or D3D10_DDI_MAP_READWRITE values, matching the MapFlags value set in the
creation call to CreateResource(D3D11).
Flags: Although the input value from the runtime isn't restricted, the driver must be able to support the
D3D10_DDI_MAP_FLAG_DONOTWAIT value.
pMappedSubResource: Although the input value from the runtime isn't restricted, the driver must assign a valid
CPU-cacheable pointer to the D3D10DDI_MAPPED_SUBRESOURCE.pData member and must set the RowPitch and
DepthPitch to match the size of the buffer and the data provided in pData.
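Under those restrictions, a driver's map-default handling could look roughly like the following sketch;
MyBufferResource and its members are hypothetical driver-side bookkeeping, not part of the DDI.

struct MyBufferResource {            // hypothetical driver-side bookkeeping
    void *pCpuCacheableMemory;
    UINT  SizeInBytes;
};

void APIENTRY ResourceMap(D3D10DDI_HDEVICE hDevice, D3D10DDI_HRESOURCE hResource,
                          UINT Subresource, D3D10_DDI_MAP DDIMap, UINT Flags,
                          D3D10DDI_MAPPED_SUBRESOURCE *pMappedSubResource)
{
    MyBufferResource *pRes = (MyBufferResource *)hResource.pDrvPrivate;

    // Map-default buffers are restricted to Subresource 0 and must return a
    // CPU-cacheable pointer with pitches that match the buffer size.
    pMappedSubResource->pData = pRes->pCpuCacheableMemory;
    pMappedSubResource->RowPitch = pRes->SizeInBytes;
    pMappedSubResource->DepthPitch = pRes->SizeInBytes;
}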

ResourceUnmap function—
These input parameters to ResourceUnmap are restricted:

hDevice: Although the input value from the Direct3D runtime isn't restricted, the value must match the hDevice
value from the original ResourceMap call.
hResource: The runtime sets only a D3D10DDIRESOURCE_BUFFER resource when a non-zero value for MapFlags is
set in the creation call to CreateResource(D3D11).
Subresource: The runtime only sets the value to 0.

Send comments about this topic to Microsoft


Graphics kernel performance improvements
4/26/2017 • 1 min to read • Edit Online

To help evaluate graphics hardware performance, Windows Display Driver Model (WDDM) 1.3 and later drivers
can optionally provide accurate timing information for API calls that are processed by the GPU. This capability is
new starting with Windows 8.1.

Kernel performance reference


These reference topics describe how to implement this capability in your display miniport driver and user-mode
display driver:
DxgkDdiCalibrateGpuClock
DxgkDdiFormatHistoryBuffer
DXGK_HISTORY_BUFFER
DXGK_HISTORY_BUFFER_HEADER
DXGKARG_CALIBRATEGPUCLOCK
DXGKARG_FORMATHISTORYBUFFER
DXGKARG_HISTORYBUFFERPRECISION
DRIVER_INITIALIZATION_DATA (new DxgkDdiCalibrateGpuClock and DxgkDdiFormatHistoryBuffer
members)
DXGK_ALLOCATIONINFOFLAGS (new HistoryBuffer member)
DXGK_QUERYADAPTERINFOTYPE (new DXGKQAITYPE_HISTORYBUFFERPRECISION constant value)
DxgkDdiCreateAllocation (see "Allocating history buffers" in Remarks)
Send comments about this topic to Microsoft
Present overhead improvements
4/26/2017 • 1 min to read • Edit Online

Starting with Windows 8.1, the Microsoft Direct3D runtime handles internal swap buffers more efficiently, reducing
the processing load on the GPU. To support this better performance, Windows Display Driver Model (WDDM) 1.3
and later drivers must support a new present device driver interface (DDI) and new texture formats as shared
surfaces:

WDDM 1.3 present DDI


These reference topics describe how to implement this capability in your display miniport driver and user-mode
display driver:
pfnPresent1(D3D)
pfnPresent1(DXGI)
D3DDDIARG_PRESENT1
D3DDDIARG_PRESENTSURFACE
D3DKMT_COMPOSITION_PRESENTHISTORYTOKEN
DXGI_DDI_ARG_PRESENT1
DXGI_DDI_ARG_PRESENTSURFACE
D3DDDI_DEVICEFUNCS (new pfnPresent1 function pointer)
D3DDDIFORMAT (new D3DDDIFMT_G8R8 and D3DDDIFMT_R8 constant values)
D3DKMT_PRESENT_MODEL (new D3DKMT_PM_REDIRECTED_COMPOSITION constant value)
D3DKMT_PRESENTHISTORYTOKEN (new Composition member)
DXGI_DDI_BASE_ARGS (new pDXGIDDIBaseFunctions4 member)
DXGI1_3_DDI_BASE_FUNCTIONS (new pfnPresent1 function pointer)

Texture format support for shared surfaces


Drivers should support both sharing resources and shareable backbuffers for these additional texture formats from
the DXGI_FORMAT enumeration:
DXGI_FORMAT_A8_UNORM
DXGI_FORMAT_R8_UNORM
DXGI_FORMAT_R8G8_UNORM
DXGI_FORMAT_BC1_TYPELESS*
DXGI_FORMAT_BC1_UNORM
DXGI_FORMAT_BC1_UNORM_SRGB
DXGI_FORMAT_BC2_TYPELESS*
DXGI_FORMAT_BC2_UNORM
DXGI_FORMAT_BC2_UNORM_SRGB
DXGI_FORMAT_BC3_TYPELESS*
DXGI_FORMAT_BC3_UNORM
DXGI_FORMAT_BC3_UNORM_SRGB
In addition, drivers should support the DXGI_FORMAT_L8_UNORM placeholder format if they support Microsoft
Direct3D 11 and later on Direct3D feature level 9 hardware. DXGI_FORMAT_L8_UNORM is functionally equivalent
to the D3DDDIFMT_L8 format.
Drivers should also support additional texture formats from the D3DDDIFORMAT enumeration:
D3DDDIFMT_G8R8
D3DDDIFMT_R8
Send comments about this topic to Microsoft
User-Mode Display Drivers
4/26/2017 • 1 min to read • Edit Online

Graphics hardware vendors must write user-mode display drivers for their display adapters. The user-mode
display driver is a dynamic-link library (DLL) that is loaded by the Microsoft Direct3D runtime. A user-mode
display driver must at least support the Direct3D version 9 DDI. User-mode display drivers can also support the
Direct3D version 10 DDI. The user-mode display driver can consist of one DLL that supports both Direct3D version
9 DDI and Direct3D version 10 DDI or it can consist of two separate DLLs, one for version 9 and the other for
version 10 of Direct3D DDI. The following topics discuss various aspects of the user-mode display driver:
Returning Error Codes Received from Runtime Functions
Handling the E_INVALIDARG Return Value
Processing Shader Codes
Converting the Direct3D Fixed-Function State
Copying Depth-Stencil Values
Validating Index Values
Supporting Multiple Processors
Handling Multiple Locks
DirectX Video Acceleration 2.0
Supporting Direct3D Version 10
Supporting Direct3D Version 10.1
Supporting Direct3D Version 11
Processing High-Definition Video
Protecting Video Content
Verifying Overlay Support
Supporting OpenGL Enhancements
Managing Resources for Multiple GPU Scenarios
Send comments about this topic to Microsoft
Returning Error Codes Received from Runtime
Functions
4/26/2017 • 1 min to read • Edit Online

Calls to the Direct3D version 9 user-mode display driver-supplied functions must return error codes that they
receive when they call the Direct3D runtime-supplied kernel-services accessing functions. For example, the runtime
might call a user-mode display driver function, such as the CreateResource function. That driver function, in turn, calls a runtime-
supplied function, such as the pfnAllocateCb function, to perform a specific operation, in this case to allocate
memory for the resource. If the user-mode display driver receives an error code from the call to the runtime-
supplied function, it must return that error code back to the runtime.
Note There is one exception to the rule that a driver must pass a runtime error code back to the runtime. When the
driver calls the pfnAllocateCb runtime-supplied function, to allocate video memory for optional resources when
the video memory is already allocated, the rule does not apply. If pfnAllocateCb fails to allocate this video
memory for optional resources that are only required to optimize performance, the driver should not report the
out-of-memory error (E_OUTOFMEMORY) back to the runtime.
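A sketch of that rule follows; CreateDriverResource is a hypothetical driver helper, and the notion of an
optional, performance-only allocation is represented here by a simple flag for illustration.

HRESULT CreateDriverResource(HANDLE hDevice,
                             const D3DDDI_DEVICECALLBACKS *pDeviceCallbacks,
                             bool allocationIsOptionalOptimization)
{
    D3DDDICB_ALLOCATE allocateArgs = {};
    // ... describe the allocation(s) for the resource being created ...

    HRESULT hr = pDeviceCallbacks->pfnAllocateCb(hDevice, &allocateArgs);
    if (FAILED(hr)) {
        if (hr == E_OUTOFMEMORY && allocationIsOptionalOptimization) {
            // Performance-only allocation: fall back silently rather than
            // reporting E_OUTOFMEMORY to the runtime.
            return S_OK;
        }
        return hr;  // otherwise, hand the runtime's error code straight back
    }
    return S_OK;
}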
Send comments about this topic to Microsoft
Handling the E_INVALIDARG Return Value
7/21/2017 • 1 min to read • Edit Online

Typically, a user-mode display driver cannot fail any of its functions by returning E_INVALIDARG. However, if the
user-mode display driver receives the E_INVALIDARG return value when it calls one of the Microsoft Direct3D
runtime-supplied functions (because of a programming error in the driver or malicious code that runs in the
operating system), the driver must return E_INVALIDARG back to the Direct3D runtime after the runtime calls one
of the driver's functions. Otherwise, the user-mode display driver should never return E_INVALIDARG to the
Direct3D runtime.
Send comments about this topic to Microsoft
Processing Shader Codes
4/26/2017 • 1 min to read • Edit Online

The user-mode display driver uses vertex declarations, and the tokens within each individual pixel and vertex
shader code, to program shader assemblers.
The user-mode display driver receives vertex and pixel shader code when the Microsoft Direct3D runtime calls the
driver's CreateVertexShaderFunc and CreatePixelShader functions, respectively. The user-mode display driver
receives vertex declarations when the runtime calls the driver's CreateVertexShaderDecl function. The vertex
declarations consist of arrays of D3DDDIVERTEXELEMENT structures. The user-mode display driver converts
shader code and vertex shader declarations into a hardware-specific format and associates the shader code and
declarations with shader and declaration handles. The runtime uses the created handles in calls to the
SetVertexShaderDecl, SetVertexShaderFunc, and SetPixelShader functions to set the vertex shader
declaration and the vertex and pixel shaders so that all subsequent drawing operations use them.
For more information about the format of an individual shader code and the tokens that comprise each shader
code, see Direct3D Shader Codes.
Note When an application creates vertex shaders, pixel shaders, and vertex declarations, the shader code and
declaration for each ends with an end token. When the Direct3D runtime, in turn, passes vertex and pixel shader
creation requests to the user-mode display driver, the vertex and pixel shader code that accompanies the requests
ends with end tokens. However, when the runtime passes vertex declaration creation requests, the vertex
declarations that accompany the requests do not end with end tokens.
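A minimal sketch of locating the end of the shader code that accompanies a creation request follows;
0x0000FFFF is the Direct3D 9 shader end token, and because vertex declarations carry no end token they must
instead be sized from the element count that the runtime supplies.

// Returns the number of DWORD tokens in the shader code, excluding the end token.
UINT CountShaderTokens(const UINT *pShaderCode)
{
    UINT tokenCount = 0;
    while (pShaderCode[tokenCount] != 0x0000FFFFu) {
        tokenCount++;
    }
    // pShaderCode[0] encodes the shader version; the remaining tokens are the
    // instruction stream that the driver translates into its hardware format.
    return tokenCount;
}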
Send comments about this topic to Microsoft
Converting the Direct3D Fixed-Function State
4/26/2017 • 2 min to read • Edit Online

The Microsoft Direct3D runtime converts Direct3D fixed-function state to vertex or pixel shader version 2.0 if the
user-mode display driver supports version 2.0 or later for each shader type. However, the runtime does not
convert shader versions. For example, if an application uses vertex or pixel shader version 1.1, then version 1.1 is
passed unconverted to the user-mode display driver regardless of whether the driver supports shader version 2.0
or later. Flexible vertex format (FVF) codes are used with fixed-function processing.
Converter Features for DirectX Versions
How the fixed-function vertex and pixel shader converters work depends on the version of Microsoft DirectX used:
DirectX 9.0
Fixed-function vertex and pixel shader converters can work with the Windows Vista display driver model.
The converters are enabled by default.
When the fixed-function vertex or pixel shader converter is used, the pure device is disabled. When an
application requests the pure device, the Direct3D runtime creates a HAL device.
The runtime supports mixed vertex processing.
Software vertex processing always uses the fixed-function vertex shader converter.
Hardware vertex processing uses the fixed-function vertex shader converter when the driver supports vertex
shader version 2.0 or later.
Hardware vertex processing uses the fixed-function pixel shader converter when the driver supports pixel
shader version 2.0 or later.
In the mixed vertex processing mode when the fixed-function vertex shader converter is enabled for
hardware, the number of float constants is set to what the hardware can support.
DirectX 8.0 and earlier
Fixed-function vertex and pixel shader converters can work with the Windows Vista display driver model
only.
The converters are enabled by default.
The fixed-function vertex shader converter is not supported with software vertex processing.
Hardware vertex processing uses the fixed-function vertex shader converter when the driver supports vertex
shader version 2.0 or later.
Hardware vertex processing uses the fixed-function pixel shader converter when the driver supports pixel
shader version 2.0 or later.
Note For versions of DirectX prior to DirectX 8.0, the fixed-function-to-shader mapping code is implemented
in Ddraw.dll.
Unused User-Mode Display Driver Functions
The following user-mode display driver functions are not called by the Direct3D runtime when the fixed-function
vertex shader converter is enabled:
MultiplyTransform
SetTransform
SetMaterial
SetLight
CreateLight
DestroyLight
Unused Render States
The following render states are not passed by the Direct3D runtime (or, if passed by mistake, can be ignored by the
driver) when the fixed-function vertex shader converter is enabled:
D3DRS_VERTEXBLEND
D3DRS_INDEXEDVERTEXBLENDENABLE
D3DRS_TWEENFACTOR
D3DRS_FOGVERTEXMODE
D3DRS_LIGHTING
D3DRS_AMBIENT
D3DRS_COLORVERTEX
D3DRS_LOCALVIEWER
D3DRS_DIFFUSEMATERIALSOURCE
D3DRS_SPECULARMATERIALSOURCE
D3DRS_AMBIENTMATERIALSOURCE
D3DRS_EMISSIVEMATERIALSOURCE
D3DRS_POINTSCALEENABLE
D3DRS_POINTSCALE_A
D3DRS_POINTSCALE_B
D3DRS_POINTSCALE_C
D3DRS_NORMALIZENORMALS
Ignored Texture Stage States
The Direct3D runtime passes all texture stage states to the driver. The driver should ignore the following texture
stage states when the fixed-function pixel shader converter is enabled:
D3DTSS_COLOROP
D3DTSS_COLORARG1
D3DTSS_COLORARG2
D3DTSS_ALPHAOP
D3DTSS_ALPHAARG1
D3DTSS_ALPHAARG2
D3DTSS_BUMPENVMAT00
D3DTSS_BUMPENVMAT01
D3DTSS_BUMPENVMAT10
D3DTSS_BUMPENVMAT11
D3DTSS_BUMPENVLSCALE
D3DTSS_BUMPENVLOFFSET
D3DTSS_COLORARG0
D3DTSS_ALPHAARG0
D3DTSS_RESULTARG
D3DTSS_CONSTANT
Send comments about this topic to Microsoft
Copying Depth-Stencil Values
4/26/2017 • 1 min to read • Edit Online

The Microsoft Direct3D runtime calls the user-mode display driver's Blt function to copy depth-stencil values from
video memory to system memory, or vice versa. The driver and hardware must perform format conversions from,
or to, all driver-supported opaque depth-stencil formats (that is, all formats defined by the D3DDDIFORMAT
enumeration type except D3DDDIFMT_D*_LOCKABLE) to, or from, any of the following formats:
D3DDDIFMT_D16_LOCKABLE
D3DDDIFMT_D32_LOCKABLE
D3DDDIFMT_D32F_LOCKABLE
D3DDDIFMT_S8_LOCKABLE
The driver discards any channel (depth or stencil) present in the source format but not present in the destination
format. The runtime does not permit copying between depth-stencil surfaces that do not share any common
channel types.
The driver first converts a source depth value to a 32-bit unsigned integer value, and then from the 32-bit
unsigned integer value to the destination representation. The following rules apply for both of these conversions:
If the source depth value is a floating-point value, a clamp to [0,1] is applied and the result is multiplied by
_MAX_UINT.
If the source is integral and the destination is a lower-precision integer, the right-most extra bits are
removed.
If the source is integral and the destination is a higher-precision integer, the rightmost extra bits are
replicated from the left-most significant bits.
If the source is integral and the destination is a floating-point value, then the 32-bit integer is converted to a
floating-point value and the result is divided by _MAX_UINT.
The driver is not required to provide special treatment to nonuniformly distributed depth values.
The driver expands a source stencil value to an 8-bit integer (that is, the driver pads the source stencil value with
zeros on the left). If the destination representation uses lower precision, then the driver should discard the most
significant bits to perform the conversion.
User-mode display drivers must support depth-stencil copies of arbitrary subrectangles. However, drivers are not
required to perform mirror, stretch, or color-key operations during depth-stencil copies. Point sampling is
implicitly required during depth-stencil copies.
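The depth conversion rules above can be summarized in the following sketch; MAX_UINT is 0xFFFFFFFF, and the
widening case that replicates the left-most bits is omitted for brevity.

#include <stdint.h>

// Floating-point source depth -> 32-bit unsigned integer: clamp to [0,1],
// then multiply by MAX_UINT.
static uint32_t DepthFloatToUint32(float d)
{
    if (d < 0.0f) d = 0.0f;
    if (d > 1.0f) d = 1.0f;
    return (uint32_t)(d * 4294967295.0);
}

// 32-bit integer depth -> lower-precision integer: drop the right-most extra bits.
static uint32_t DepthNarrow(uint32_t d32, unsigned destinationBits)
{
    return d32 >> (32u - destinationBits);
}

// 32-bit integer depth -> floating point: divide by MAX_UINT.
static float DepthUint32ToFloat(uint32_t d32)
{
    return (float)((double)d32 / 4294967295.0);
}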
Send comments about this topic to Microsoft
Validating Index Values
4/26/2017 • 1 min to read • Edit Online

A user-mode display driver can pass "Designed for Microsoft Windows" Hardware Logo testing regardless of
whether it performs index validation. However, to ensure that the driver works with Microsoft DirectX applications
that might pass invalid indexes, a user-mode display driver should perform index validation.
You should consider the following items:
DirectX 8.0 and DirectX 9.0 applications can pass a stride value of 0 when they render with a vertex buffer. In
this situation, only vertex 0 should be referenced. The stride value is set in the Stride member of the
D3DDDIARG_SETSTREAMSOURCE structure in a call to the user-mode display driver's SetStreamSource
function.
A call to the driver's SetStreamSourceUM function does not include the size of the vertex data. That is, the
size of the user-memory buffer that supplies the vertex data that the pUMBuffer parameter of
SetStreamSourceUM points to is not specified.
The NumVertices member of the D3DDDIARG_DRAWINDEXEDPRIMITIVE or
D3DDDIARG_DRAWINDEXEDPRIMITIVE2 structure is never set to 0 in a call to the driver's
DrawIndexedPrimitive or DrawIndexedPrimitive2 function. The driver should set the maximum
allowable index to (NumVertices - 1).
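A minimal sketch of those rules for 16-bit indices follows; the parameter names are illustrative and mirror the
draw-call state described in the list above.

#include <stdint.h>

bool IndicesAreValid(const uint16_t *indexBuffer, uint32_t indexCount,
                     uint32_t numVertices, uint32_t streamStride)
{
    // With a stream stride of 0, only vertex 0 may be referenced; otherwise the
    // maximum allowable index is NumVertices - 1.
    const uint32_t maxIndex = (streamStride == 0) ? 0 : (numVertices - 1);

    for (uint32_t i = 0; i < indexCount; ++i) {
        if (indexBuffer[i] > maxIndex) {
            return false;   // the application passed an invalid index
        }
    }
    return true;
}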
Send comments about this topic to Microsoft
Supporting Multiple Processors
4/26/2017 • 2 min to read • Edit Online

User-mode display drivers on multiple-processor computers can let the Microsoft Direct3D runtime handle
multiple-processor optimizations, or the drivers can perform their own multiple-processor optimizations.
Runtime-Handled Multiple-Processor Optimizations
The multiple-processor optimizations that are handled by the Direct3D runtime are enabled only on drivers that
support the LockAsync, UnlockAsync, and Rename functions. These functions enable the multiple-processor
optimizations to work well with applications that frequently lock dynamic resources. The LockAsync and
UnlockAsync functions, along with the GetQueryData function, must be reentrant on drivers that expose a DDI
D3D10DDIARG_OPENADAPTER structure in a call to the driver's OpenAdapter function. When the runtime calls
a driver function in a reentrant manner, one thread can execute inside that function while another thread that
references the same display device executes inside of another driver function.
The Direct3D runtime uses multiple-processor optimizations in some situations to offload work to a separate
processor and improve computer performance. When multiple-processor optimizations are enabled, an additional
software layer is added between the Direct3D runtime and the user-mode display driver. This software layer
intercepts all calls that the Direct3D runtime would otherwise make to the user-mode display driver's functions.
Instead of calling the user-mode display driver directly, the software layer queues commands into batches that a
worker thread asynchronously processes. However, the software layer cannot batch all calls that are made to the
user-mode display driver's functions. In particular, the software layer cannot batch calls to functions that return
information (for example, CreateResource). When the software layer must call one of these types of driver
functions, it flushes all queued commands through the worker thread, and then the software layer calls the driver
function on the main application thread.
Driver-Handled Multiple-Processor Optimizations
If a driver will perform its own multiple-processor optimizations, it must not implement LockAsync,
UnlockAsync, and Rename functions. In this situation, the driver must call the pfnSetAsyncCallbacksCb
function to notify the runtime whether the runtime will start or stop receiving calls to the runtime's callback
functions from a worker thread.
If the driver performs its own multiple-processor optimizations, it should follow the same policy that the Direct3D
runtime uses when it determines to enable multiple-processor optimizations. This policy enables fair sharing of
system resources across all processes. In particular, the driver should disable multiple-processor optimizations in
the following situations:
The application runs in windowed mode.
The computer contains only one processor (or processor core); the driver should disable optimizations on
single-processor computers with hyper-threading.
The application requested that no multiple-processor optimizations be enabled, or the application uses
software-vertex processing; this information is passed to the driver's CreateDevice function.
If vendors want to enable multiple-processor optimizations in one of these situations, they should first contact
Microsoft.
Send comments about this topic to Microsoft
Handling Multiple Locks
4/26/2017 • 1 min to read • Edit Online

With the Direct3D runtime, you can allow vertex and index buffers to have more than one lock outstanding. User-
mode display drivers must handle multiple locks the same way as the runtime in the Windows 2000 Display Driver
Model.
A user-mode display driver must not fail a call to its LockAsync function for a resource that is already locked. That
is, the driver cannot fail any calls to its LockAsync function for a particular resource after the first call to its
LockAsync function succeeds in locking that resource. Similarly, the driver cannot fail any calls to its Lock function
for a particular resource after the first call to its Lock function succeeds in locking that resource. The runtime
matches each call that it makes to the driver's LockAsync function with a call to the driver's UnlockAsync function.
The runtime also matches each call that it makes to the driver's Lock function with a call to the driver's Unlock
function.
The user-mode display driver cannot fail a call to its UnlockAsync function unless the resource that the
D3DDDIARG_UNLOCKASYNC structure describes was not actually locked by a previous call to the driver's
LockAsync function. Similarly, the driver cannot fail a call to its Unlock function unless the resource that the
D3DDDIARG_UNLOCK structure describes was not actually locked by a previous call to the driver's Lock function.
In situations in which the resources were not previously locked, UnlockAsync and Unlock return E_INVALIDARG.
Send comments about this topic to Microsoft
DirectX Video Acceleration 2.0
4/26/2017 • 1 min to read • Edit Online

The following topics discuss Microsoft DirectX Video Acceleration (VA) version 2.0:
Video Decode Acceleration for DirectX VA 2.0
Video Processing for DirectX VA 2.0
Extended Support for DirectX VA 2.0
Send comments about this topic to Microsoft
Video Decode Acceleration for DirectX VA 2.0
4/26/2017 • 1 min to read • Edit Online

The following topics discuss video decoding for DirectX VA 2.0:


Providing Capabilities for Video Decoding
Creating a Video Decode Device
Creating Compressed Buffers and Decode Render Targets
Decoding Video
Synchronizing Video Decode Operations
Send comments about this topic to Microsoft
Providing Capabilities for Video Decoding
4/26/2017 • 2 min to read • Edit Online

When its GetCaps function is called, the user-mode display driver provides the following capabilities for video
decoding based on the request type (which is specified in the Type member of the D3DDDIARG_GETCAPS
structure that the GetCaps function's pData parameter points to):
D3DDDICAPS_GETDECODEGUIDCOUNT and D3DDDICAPS_GETDECODEGUIDS request types
The user-mode display driver returns the number and a list of the following GUIDs that it supports for video
acceleration (VA) decoding. The Microsoft Direct3D runtime first requests the number of GUIDs followed by a
request for the list of supported GUIDs.

DEFINE_GUID(DXVADDI_ModeMPEG2_MoComp, 0xe6a9f44b, 0x61b0, 0x4563,0x9e,0xa4,0x63,0xd2,0xa3,0xc6,0xfe,0x66);


DEFINE_GUID(DXVADDI_ModeMPEG2_IDCT, 0xbf22ad00, 0x03ea, 0x4690,0x80,0x77,0x47,0x33,0x46,0x20,0x9b,0x7e);
DEFINE_GUID(DXVADDI_ModeMPEG2_VLD, 0xee27417f, 0x5e28, 0x4e65,0xbe,0xea,0x1d,0x26,0xb5,0x08,0xad,0xc9);

DEFINE_GUID(DXVADDI_ModeH264_A, 0x1b81be64, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);


DEFINE_GUID(DXVADDI_ModeH264_B, 0x1b81be65, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeH264_C, 0x1b81be66, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeH264_D, 0x1b81be67, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeH264_E, 0x1b81be68, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeH264_F, 0x1b81be69, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);

DEFINE_GUID(DXVADDI_ModeWMV8_A, 0x1b81be80, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);


DEFINE_GUID(DXVADDI_ModeWMV8_B, 0x1b81be81, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);

DEFINE_GUID(DXVADDI_ModeWMV9_A, 0x1b81be90, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);


DEFINE_GUID(DXVADDI_ModeWMV9_B, 0x1b81be91, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeWMV9_C, 0x1b81be94, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);

DEFINE_GUID(DXVADDI_ModeVC1_A, 0x1b81beA0, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);


DEFINE_GUID(DXVADDI_ModeVC1_B, 0x1b81beA1, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeVC1_C, 0x1b81beA2, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);
DEFINE_GUID(DXVADDI_ModeVC1_D, 0x1b81beA3, 0xa0c7, 0x11d3,0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5);

#define DXVADDI_ModeMPEG2_MOCOMP DXVADDI_ModeMPEG2_MoComp

#define DXVADDI_ModeWMV8_PostProc DXVADDI_ModeWMV8_A


#define DXVADDI_ModeWMV8_MoComp DXVADDI_ModeWMV8_B

#define DXVADDI_ModeWMV9_PostProc DXVADDI_ModeWMV9_A


#define DXVADDI_ModeWMV9_MoComp DXVADDI_ModeWMV9_B
#define DXVADDI_ModeWMV9_IDCT DXVADDI_ModeWMV9_C

#define DXVADDI_ModeVC1_PostProc DXVADDI_ModeVC1_A


#define DXVADDI_ModeVC1_MoComp DXVADDI_ModeVC1_B
#define DXVADDI_ModeVC1_IDCT DXVADDI_ModeVC1_C
#define DXVADDI_ModeVC1_VLD DXVADDI_ModeVC1_D

#define DXVADDI_ModeH264_MoComp_NoFGT DXVADDI_ModeH264_A


#define DXVADDI_ModeH264_MoComp_FGT DXVADDI_ModeH264_B
#define DXVADDI_ModeH264_IDCT_NoFGT DXVADDI_ModeH264_C
#define DXVADDI_ModeH264_IDCT_FGT DXVADDI_ModeH264_D
#define DXVADDI_ModeH264_VLD_NoFGT DXVADDI_ModeH264_E
#define DXVADDI_ModeH264_VLD_FGT DXVADDI_ModeH264_F

D3DDDICAPS_GETDECODERTFORMATCOUNT and D3DDDICAPS_GETDECODERTFORMATS request types


The user-mode display driver returns the number and a list of render target formats that it supports for a particular
DirectX VA decode type. The Direct3D runtime specifies the GUID for a particular DirectX VA decode type in a
variable that the pInfo member of D3DDDIARG_GETCAPS points to.
D3DDDICAPS_GETDECODECOMPRESSEDBUFFERINFOCOUNT and
D3DDDICAPS_GETDECODECOMPRESSEDBUFFERINFO request types
The user-mode display driver returns the number of and information about the compressed buffer types that are
required to accelerate the video decode. The Direct3D runtime specifies a DXVADDI_DECODEINPUT structure for
a particular DirectX VA decode type in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The
user-mode display driver returns information about the compressed buffer types in an array of
DXVADDI_DECODEBUFFERINFO structures that the pData member of D3DDDIARG_GETCAPS specifies.
D3DDDICAPS_GETDECODECONFIGURATIONCOUNT and D3DDDICAPS_GETDECODECONFIGURATIONS request
types
The user-mode display driver returns the number and a list of accelerated decode configurations that it supports
for a particular DirectX VA decode type. The Direct3D runtime specifies a DXVADDI_DECODEINPUT structure for a
particular DirectX VA decode type in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The
user-mode display driver returns accelerated decode configurations in an array of
DXVADDI_CONFIGPICTUREDECODE structures that the pData member of D3DDDIARG_GETCAPS specifies.
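For the first two request types, a GetCaps sketch might look like the following; g_DecodeGuids is a hypothetical
table of the decode profiles this driver accelerates, the remaining request types are omitted, and the runtime is
assumed to size pData from the count it queried first.

#include <string.h>

static const GUID g_DecodeGuids[] = {
    DXVADDI_ModeMPEG2_VLD,
    DXVADDI_ModeH264_VLD_NoFGT,
};

HRESULT APIENTRY GetCaps(HANDLE hDevice, const D3DDDIARG_GETCAPS *pGetCaps)
{
    switch (pGetCaps->Type) {
    case D3DDDICAPS_GETDECODEGUIDCOUNT:
        *(UINT *)pGetCaps->pData = sizeof(g_DecodeGuids) / sizeof(g_DecodeGuids[0]);
        return S_OK;

    case D3DDDICAPS_GETDECODEGUIDS:
        // The runtime sizes pData from the count returned above.
        memcpy(pGetCaps->pData, g_DecodeGuids, sizeof(g_DecodeGuids));
        return S_OK;

    default:
        // The other request types described in this topic are omitted from this sketch.
        return E_NOTIMPL;
    }
}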
Send comments about this topic to Microsoft
Creating a Video Decode Device
4/26/2017 • 1 min to read • Edit Online

The Microsoft Direct3D runtime calls the user-mode display driver's CreateDecodeDevice function to create a
decode device for video acceleration (VA). When the Direct3D runtime is finished with the decode device, it calls the
user-mode display driver's DestroyDecodeDevice function.
Send comments about this topic to Microsoft
Creating Compressed Buffers and Decode Render
Targets
4/26/2017 • 1 min to read • Edit Online

The Microsoft Direct3D runtime calls the user-mode display driver's CreateResource function to create
compressed buffers and render targets for decoding.
Each compressed buffer type has its own surface format as well as a special flag that indicates that the surface that
the runtime creates contains compressed buffer information for accelerated video decode. The user-mode display
driver determines to create a compressed buffer if the DecodeCompressedBuffer bit-field flag in the Flags
member of the D3DDDIARG_CREATERESOURCE structure that the pResource parameter of CreateResource
points to is set. The user-mode display driver determines the type of compressed buffer to create by the format
value in the Format member of D3DDDIARG_CREATERESOURCE. The following formats are defined:

D3DDDIFMT_PICTUREPARAMSDATA = 150
D3DDDIFMT_MACROBLOCKDATA = 151
D3DDDIFMT_RESIDUALDIFFERENCEDATA = 152
D3DDDIFMT_DEBLOCKINGDATA = 153
D3DDDIFMT_INVERSEQUANTIZATIONDATA = 154
D3DDDIFMT_SLICECONTROLDATA = 155
D3DDDIFMT_BITSTREAMDATA = 156

The Direct3D runtime creates each decode render target independently in a call to the user-mode display driver's
CreateResource function. Each of the targets is referenced as a subresource index of a single resource. The user-
mode display driver determines to create a decode render target if the DecodeRenderTarget bit-field flag in the
Flags member of D3DDDIARG_CREATERESOURCE is set.
Send comments about this topic to Microsoft
Decoding Video
4/26/2017 • 1 min to read • Edit Online

The Microsoft Direct3D runtime calls the user-mode display driver's DecodeBeginFrame and DecodeEndFrame
functions to indicate a time period between these function calls that the user-mode display driver can decode
video. Before the user-mode display driver can perform any video decode operations, the Microsoft Direct3D
runtime must call the user-mode display driver's SetDecodeRenderTarget function to set the render target
surface for those decode operations. However, the call to SetDecodeRenderTarget can occur only outside the
begin-frame and end-frame time period.
In protected mode and in the call to DecodeBeginFrame, the Direct3D runtime sets or changes a DirectX VA
content key in a variable that the pPVPSetKey member of the D3DDDIARG_DECODEBEGINFRAME structure
points to. The decode device uses this key for protected transfers of the compressed DirectX VA buffers for this and
subsequent frames.
Note The Direct3D runtime sets the pPVPSetKey pointer only to change or set the key. To keep the previously set
key in use, the runtime sets the pointer to NULL, which avoids potentially time-consuming reloading of the same
key. The driver does not eliminate the redundant settings; a decoder application must avoid redundant settings.
After the render target surface for decode operations is set, the user-mode display driver can receive calls to its
DecodeExecute function to perform video decode operations between the begin-frame and end-frame time
period.
In calls to DecodeExecute, not all of the buffer types that are specified in the CompressedBufferType members
of the DXVADDI_DECODEBUFFERDESC structures of the pCompressedBuffers array of the
D3DDDIARG_DECODEEXECUTE structure are used for each decode GUID that the hDecode member of
D3DDDIARG_DECODEEXECUTE specifies. For example, the slice-control (D3DDDIFMT_SLICECONTROLDATA),
inverse-quantization (D3DDDIFMT_INVERSEQUANTIZATIONDATA), and bit-stream
(D3DDDIFMT_BITSTREAMDATA) buffers are required only for variable-length decode (VLD) processing, and the
deblocking-control buffer (D3DDDIFMT_DEBLOCKINGDATA) is not used by MPEG-2 at all.
In protected mode, the buffers that were encrypted for a protected transfer with a content key contain a pointer to
initial counter values in their buffer descriptors (that is, in variables that the pCipherCounter members of the
DXVADDI_DECODEBUFFERDESC structures point to). Each call to the user-mode display driver's
DecodeExecute function must perform a protected transfer of such buffers to local video memory before
DecodeExecute uses the buffers' data in the decode operation. However, no plans exist to encrypt DirectX VA
compressed buffers of types other than residual-difference (D3DDDIFMT_RESIDUALDIFFERENCEDATA) and bit-
stream (D3DDDIFMT_BITSTREAMDATA) types.
Send comments about this topic to Microsoft
Synchronizing Video Decode Operations
4/26/2017 • 1 min to read • Edit Online

The synchronization mechanism for DirectX VA 2.0 is improved from the 1.0 version and is more similar to the
synchronization mechanisms used by Microsoft Direct3D operations.
In DirectX VA 1.0, synchronization is performed mainly by the decoder. Before the decoder can use a compressed
buffer, it calls the DdMoCompQueryStatus function to determine if the buffer is available for use (that is, the
hardware is not accessing the buffer). If the buffer is not available, the decoder must sleep, poll, or perform another
operation.
DirectX VA 2.0 uses the synchronization model that Direct3D already uses on vertex buffers and index buffers. In
DirectX VA 2.0, synchronization is performed by the decoder locking the compressed buffer. If the user-mode
display driver attempts to lock the compressed buffer and the buffer is in use, the driver can either fail the lock or
rename the buffer. The user-mode display driver requests that the video memory manager rename the buffer
when the driver sets the Discard member of the D3DDDICB_LOCKFLAGS structure in a call to the pfnLockCb
function. If the user-mode display driver renames the buffer, the driver returns a pointer to an alternative buffer so
that the decoder can continue without being blocked.
Typically, for DirectX VA 2.0, synchronization is only an issue if the hardware can consume the compressed buffers
directly without additional buffer copies.
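A sketch of the rename request described above follows; hCompressedBufferAllocation and the surrounding
helper are illustrative, and only the Discard path is shown.

HRESULT LockCompressedBuffer(HANDLE hDevice,
                             const D3DDDI_DEVICECALLBACKS *pDeviceCallbacks,
                             D3DKMT_HANDLE hCompressedBufferAllocation,
                             void **ppData)
{
    D3DDDICB_LOCK lockCb = {};
    lockCb.hAllocation = hCompressedBufferAllocation;
    // Ask the video memory manager to rename the allocation instead of blocking
    // if the hardware is still reading the previous contents.
    lockCb.Flags.Discard = 1;

    HRESULT hr = pDeviceCallbacks->pfnLockCb(hDevice, &lockCb);
    if (SUCCEEDED(hr)) {
        *ppData = lockCb.pData;  // possibly a renamed (alternative) buffer
    }
    return hr;
}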
Send comments about this topic to Microsoft
Video Processing for DirectX VA 2.0
4/26/2017 • 1 min to read • Edit Online

The following topics discuss video processing for DirectX VA 2.0:


Providing Capabilities for Video Processing
Creating a Video Processing Device
Creating a Render Target Surface for Video Processing
Processing Video Frames
Send comments about this topic to Microsoft
Providing Capabilities for Video Processing
4/26/2017 • 2 min to read • Edit Online

When its GetCaps function is called, the user-mode display driver provides the following video processing
capabilities based on the request type (which is specified in the Type member of the D3DDDIARG_GETCAPS
structure that the pData parameter points to):
D3DDDICAPS_GETVIDEOPROCESSORDEVICEGUIDCOUNT and
D3DDDICAPS_GETVIDEOPROCESSORDEVICEGUIDS request types
The user-mode display driver returns the number and a list of the following GUIDs that it supports for video
processing. The Microsoft Direct3D runtime specifies the DXVADDI_VIDEODESC structure for a particular video
stream to process in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The runtime first
requests the number of supported GUIDs followed by a request for the list of supported GUIDs.

DEFINE_GUID(DXVADDI_VideoProcProgressiveDevice,
    0x5a54a0c9,0xc7ec,0x4bd9,0x8e,0xde,0xf3,0xc7,0x5d,0xc4,0x39,0x3b);
DEFINE_GUID(DXVADDI_VideoProcBobDevice,
    0x335aa36e,0x7884,0x43a4,0x9c,0x91,0x7f,0x87,0xfa,0xf3,0xe3,0x7e);

D3DDDICAPS_GETVIDEOPROCESSORCAPS request type


Each video-processor mode that the user-mode display driver supports can have unique capabilities. The user-
mode display driver returns those capabilities when the D3DDDICAPS_GETVIDEOPROCESSORCAPS request type is
passed. The Direct3D runtime specifies a DXVADDI_VIDEOPROCESSORINPUT structure for the video-processing
mode to retrieve capabilities for in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The
user-mode display driver returns capabilities for the video-processing mode in a
DXVADDI_VIDEOPROCESSORCAPS structure that the pData member of D3DDDIARG_GETCAPS points to.
D3DDDICAPS_GETPROCAMPRANGE request type
The user-mode display driver returns a pointer to a DXVADDI_VALUERANGE structure that contains the range of
allowed values for a particular ProcAmp control property on a particular video stream. The Direct3D runtime
specifies a DXVADDI_QUERYPROCAMPINPUT structure for the ProcAmp control property on a particular video
stream in a variable that the pInfo member of D3DDDIARG_GETCAPS points to.
D3DDDICAPS_GETVIDEOPROCESSORRTFORMATCOUNT and D3DDDICAPS_GETVIDEOPROCESSORRTFORMATS
request types
The user-mode display driver returns the number and a list of render target formats that it supports for a particular
video processing mode. The Direct3D runtime specifies a DXVADDI_VIDEOPROCESSORINPUT structure for the
video-processor mode in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The user-mode
display driver returns render target formats that it supports in an array of D3DDDIFORMAT-typed values that the
pData member of D3DDDIARG_GETCAPS specifies.
D3DDDICAPS_GETVIDEOPROCESSORRTSUBSTREAMFORMATCOUNT and
D3DDDICAPS_GETVIDEOPROCESSORRTSUBSTREAMFORMATS request types
The user-mode display driver returns the number and a list of sub-stream formats that it supports for a particular
video processing mode. The Direct3D runtime specifies a DXVADDI_VIDEOPROCESSORINPUT structure for the
video-processor mode in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The user-mode
display driver returns sub-stream formats that it supports in an array of D3DDDIFORMAT-typed values that the
pData member of D3DDDIARG_GETCAPS specifies.
D3DDDICAPS_FILTERPROPERTYRANGE request type
The user-mode display driver returns a pointer to a DXVADDI_VALUERANGE structure that contains the range of
allowed values for a particular filter setting on a particular video stream when the
D3DDDICAPS_FILTERPROPERTYRANGE request type is passed. The Direct3D runtime specifies a
DXVADDI_QUERYFILTERPROPERTYRANGEINPUT structure for the filter setting on a particular video stream in a
variable that the pInfo member of D3DDDIARG_GETCAPS points to.
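As a rough illustration, the following sketch shows how a driver's GetCaps implementation might answer the two device-GUID request types described above. The GUID table, the MyGetCaps name, and the fallback return value are driver-specific assumptions; a real driver handles many more request types.

#include <string.h>
#include <d3dumddi.h>

// Hypothetical table of video-processor device GUIDs that this driver supports
// (the value shown is DXVADDI_VideoProcProgressiveDevice from the listing above).
static const GUID g_VideoProcDeviceGuids[] =
{
    { 0x5a54a0c9, 0xc7ec, 0x4bd9, { 0x8e, 0xde, 0xf3, 0xc7, 0x5d, 0xc4, 0x39, 0x3b } },
};

HRESULT APIENTRY MyGetCaps(HANDLE hAdapter, const D3DDDIARG_GETCAPS *pGetCaps)
{
    switch (pGetCaps->Type)
    {
    case D3DDDICAPS_GETVIDEOPROCESSORDEVICEGUIDCOUNT:
        // pInfo points to the DXVADDI_VIDEODESC for the stream (ignored here).
        *(UINT *)pGetCaps->pData = sizeof(g_VideoProcDeviceGuids) / sizeof(GUID);
        return S_OK;

    case D3DDDICAPS_GETVIDEOPROCESSORDEVICEGUIDS:
    {
        // The runtime sized pData based on the count that it queried first.
        UINT cb = sizeof(g_VideoProcDeviceGuids);
        if (cb > pGetCaps->DataSize)
        {
            cb = pGetCaps->DataSize;
        }
        memcpy(pGetCaps->pData, g_VideoProcDeviceGuids, cb);
        return S_OK;
    }

    default:
        // A real driver handles the remaining request types listed in this topic.
        return E_NOTIMPL;
    }
}
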

Creating a Video Processing Device

The Microsoft Direct3D runtime calls the user-mode display driver's CreateVideoProcessDevice function to
create a device for processing a video stream. When the Direct3D runtime is finished with the device, it calls the
user-mode display driver's DestroyVideoProcessDevice function.

Creating a Render Target Surface for Video Processing

The Microsoft Direct3D runtime calls the user-mode display driver's CreateResource function to create render
target surfaces for video processing. The user-mode display driver determines that it should create a render target
surface for video processing from the presence of the VideoProcessRenderTarget bit-field flag in the Flags
member of the D3DDDIARG_CREATERESOURCE structure that the pResource parameter of CreateResource
points to. The user-mode display driver can use this render target for video processing but not necessarily for 3-D.
The user-mode display driver can perform video processing on regular RGB 3-D render target surfaces. However,
the user-mode display driver can often output to YUV formats that the 3-D hardware cannot support as a render
target.
The following are the only surface types that the driver should support as valid render targets for video processing:
RGB or YUV surfaces that are created with the VideoProcessRenderTarget bit-field flag.
RGB surfaces that are created with the RenderTarget bit-field flag.
RGB textures that are created with the RenderTarget and Texture bit-field flags.
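To make the decision concrete, here is a minimal sketch of the flag check inside a driver's CreateResource function. CreateVideoProcessTarget and CreateOrdinaryResource are hypothetical helpers that stand in for the driver's real allocation paths.

#include <d3dumddi.h>

// Hypothetical helpers implemented elsewhere in the driver.
HRESULT CreateVideoProcessTarget(HANDLE hDevice, D3DDDIARG_CREATERESOURCE *pResource);
HRESULT CreateOrdinaryResource(HANDLE hDevice, D3DDDIARG_CREATERESOURCE *pResource);

HRESULT APIENTRY MyCreateResource(HANDLE hDevice, D3DDDIARG_CREATERESOURCE *pResource)
{
    if (pResource->Flags.VideoProcessRenderTarget)
    {
        // The runtime intends to use this surface as the output of VideoProcessBlt.
        // It can be RGB or YUV, and it is not necessarily usable as a 3-D render target.
        return CreateVideoProcessTarget(hDevice, pResource);
    }

    // Normal resource-creation path (render targets, textures, and so on).
    return CreateOrdinaryResource(hDevice, pResource);
}
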

Processing Video Frames

The Microsoft Direct3D runtime calls the user-mode display driver's VideoProcessBeginFrame and
VideoProcessEndFrame functions to delimit a time period within which the user-mode display driver can process
video frames. Before the user-mode display driver can process any video frames, the
Microsoft Direct3D runtime must call the user-mode display driver's SetVideoProcessRenderTarget function to
set the render target surface for video processing. However, the call to SetVideoProcessRenderTarget can occur
only outside the begin-frame and end-frame time period.
After the render target surface for video processing is set, the user-mode display driver can receive calls to its
VideoProcessBlt function to process video frames between the begin-frame and end-frame time period.

Extended Support for DirectX VA 2.0

The following topics discuss how a user-mode display driver can extend DirectX VA 2.0 support:
Providing Capabilities for DirectX VA 2.0 Extension Modes
Creating and Using a DirectX VA 2.0 Extension Device

Providing Capabilities for DirectX VA 2.0 Extension Modes

When its GetCaps function is called, the user-mode display driver provides the following capabilities for DirectX VA
2.0 extension modes based on the request type (which is specified in the Type member of the
D3DDDIARG_GETCAPS structure that the pData parameter points to):
D3DDDICAPS_GETEXTENSIONGUIDCOUNT and D3DDDICAPS_GETEXTENSIONGUIDS request types
The user-mode display driver returns the number and a list of the GUIDs that it supports for extension modes. The
runtime first requests the number of supported GUIDs followed by a request for the list of supported GUIDs.
D3DDDICAPS_GETEXTENSIONCAPS request type
Each extension mode that the user-mode display driver supports can have unique capabilities. The user-mode
display driver returns those capabilities when the D3DDDICAPS_GETEXTENSIONCAPS request type is passed. The
Direct3D runtime specifies a DXVADDI_QUERYEXTENSIONCAPSINPUT structure for the extension GUID to
retrieve capabilities for in a variable that the pInfo member of D3DDDIARG_GETCAPS points to. The user-mode
display driver returns capabilities for the extension GUID in a private structure that the pData member of
D3DDDIARG_GETCAPS points to.

Creating and Using a DirectX VA 2.0 Extension Device

The Microsoft Direct3D runtime calls the user-mode display driver's CreateExtensionDevice function to create an
extension device for DirectX VA 2.0. When the Direct3D runtime is finished with the device, it calls the user-mode
display driver's DestroyExtensionDevice function.
The Direct3D runtime calls the user-mode display driver's DecodeExtensionExecute function to decode video on
a nonstandard decode device between a begin-frame and end-frame time period and on a specific render target
surface. For a general discussion about decoding video, see Decoding Video.
The Direct3D runtime calls the user-mode display driver's ExtensionExecute function to perform nonstandard
DirectX VA 2.0 operations on an extension device.

Supporting Direct3D Version 10

The following sections describe the new features of Direct3D version 10 and how to support and use the Direct3D
version 10 DDI:
Enabling Support for the Direct3D Version 10 DDI
Initializing Communication with the Direct3D Version 10 DDI
Rendering Pipeline
Using the State-Refresh Callback Functions
Using Direct3D Version 10 Handles
Handling Errors
Querying for Information from the GPU
Retroactively Requiring Free-Threaded CalcPrivate DDIs
DirectX Graphics Infrastructure DDI

Enabling Support for the Direct3D Version 10 DDI

To enable support for a user-mode display driver DLL's version 10 DDI, the INF file that installs the display drivers
for a graphics device must list the name of the DLL regardless of whether the Direct3D version 10 DDI exists in the
same DLL as the Direct3D version 9 DDI or in a separate DLL.
The Installation Requirements for Display Miniport and User-Mode Display Drivers section describes how a user-
mode display driver is installed and used according to the Windows Vista display driver model. To also enable
support for the Direct3D version 10 DDI, you must specify the name of the DLL that contains the version 10 DDI as
the second entry in the list of user-mode display driver names even if the version 10 DDI exists in the same DLL as
the version 9 DDI. The following example shows how support for the version 10 DDI is enabled if the version 10
DDI is contained in Umd10.dll (that is, a separate DLL from the version 9 DDI):

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDriverName, %REG_MULTI_SZ%, umd9.dll, umd10.dll
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, umd9, umd10

The following example shows how support for the version 10 DDI is enabled if the version 10 DDI is contained in
Umd.dll (that is, the same DLL as the version 9 DDI):

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDriverName, %REG_MULTI_SZ%, umd.dll, umd.dll
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, umd, umd


Initializing Communication with the Direct3D Version 10 DDI

To initialize communication with the user-mode display driver DLL's version 10 DDI, the Direct3D version 10
runtime first loads the DLL if the DLL is not yet loaded. The Direct3D runtime next calls the user-mode display
driver's OpenAdapter10 function through the DLL's export table to open an instance of the graphics adapter. The
OpenAdapter10 function is the DLL's only exported Direct3D version 10 function.
In the call to the driver's OpenAdapter10 function, the runtime supplies the pfnQueryAdapterInfoCb adapter
callback function in the pAdapterCallbacks member of the D3D10DDIARG_OPENADAPTER structure. The
runtime also supplies its version in the Interface and Version members of D3D10DDIARG_OPENADAPTER. The
user-mode display driver must verify that it can use this version of the runtime. The user-mode display driver must
not fail newer versions of the runtime because newer runtime versions can use previous DDI versions and
therefore can correctly communicate with drivers that implement those previous DDI versions. The user-mode
display driver returns a table of its adapter-specific functions in the pAdapterFuncs member of
D3D10DDIARG_OPENADAPTER.
The user-mode display driver should call the pfnQueryAdapterInfoCb adapter callback function to query for the
graphics hardware capabilities from the display miniport driver.
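The following sketch outlines an OpenAdapter10 implementation under these rules. MY_KMD_ADAPTERINFO and the My* adapter functions are hypothetical driver-defined names, the version validation is reduced to a comment, and the exact prototypes should be checked against d3d10umddi.h.

#include <d3d10umddi.h>

// Hypothetical private data exchanged with the display miniport driver.
typedef struct MY_KMD_ADAPTERINFO
{
    UINT HwCapabilityBits;
} MY_KMD_ADAPTERINFO;

// Hypothetical adapter-specific functions implemented elsewhere in the driver.
SIZE_T APIENTRY MyCalcPrivateDeviceSize(D3D10DDI_HADAPTER hAdapter,
                                        const D3D10DDIARG_CALCPRIVATEDEVICESIZE *pData);
HRESULT APIENTRY MyCreateDevice(D3D10DDI_HADAPTER hAdapter, D3D10DDIARG_CREATEDEVICE *pCreateData);
HRESULT APIENTRY MyCloseAdapter(D3D10DDI_HADAPTER hAdapter);

HRESULT APIENTRY OpenAdapter10(D3D10DDIARG_OPENADAPTER *pOpenData)
{
    // Validate pOpenData->Interface and pOpenData->Version here; as noted above,
    // the driver must not reject a newer runtime that still uses this DDI version.

    // Ask the display miniport driver for the hardware capabilities.
    MY_KMD_ADAPTERINFO info;
    D3DDDICB_QUERYADAPTERINFO query;
    query.pPrivateDriverData = &info;
    query.PrivateDriverDataSize = sizeof(info);
    HRESULT hr = pOpenData->pAdapterCallbacks->pfnQueryAdapterInfoCb(
        pOpenData->hRTAdapter.handle, &query);
    if (FAILED(hr))
    {
        return hr;
    }

    // Hand the runtime the adapter-specific entry points.
    pOpenData->pAdapterFuncs->pfnCalcPrivateDeviceSize = MyCalcPrivateDeviceSize;
    pOpenData->pAdapterFuncs->pfnCreateDevice          = MyCreateDevice;
    pOpenData->pAdapterFuncs->pfnCloseAdapter          = MyCloseAdapter;
    return S_OK;
}
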
The runtime calls the user-mode display driver's CreateDevice(D3D10) function (one of the driver's adapter-
specific functions) to create a display device for handling a collection of render state and to complete the
initialization. When the initialization is complete, the Direct3D version 10 runtime can call the display driver-
supplied Direct3D version 10 functions, and the user-mode display driver can call the runtime-supplied functions.
The user-mode display driver's CreateDevice(D3D10) function is called with a D3D10DDIARG_CREATEDEVICE
structure whose members are set up in the following manner to initialize the user-mode display driver's version 10
DDI:
The runtime sets Interface to the version of the interface that the runtime requires from the user-mode
display driver.
The runtime sets Version to a number that the driver can use to identify when the runtime was built. For
example, the driver can use the version number to differentiate between a runtime released with Windows
Vista and a runtime released with a subsequent service pack, which might contain a fix that the driver
requires.
The runtime sets hRTDevice to specify the handle that the driver should use when the driver calls back into
the runtime.
The runtime sets hDrvDevice to specify the handle that the runtime uses in subsequent driver calls.
The runtime supplies a table of its device-specific callback functions in the D3DDDI_DEVICECALLBACKS
structure to which pKTCallbacks points. The user-mode display driver calls the runtime-supplied callback
functions to access kernel-mode services in the display miniport driver.
The user-mode display driver returns a table of its device-specific functions in the
D3D10DDI_DEVICEFUNCS structure to which pDeviceFuncs points.
The runtime supplies a DXGI_DDI_BASE_ARGS structure to which DXGIBaseDDI points. The runtime and
the user-mode display driver supply their DirectX Graphics Infrastructure DDI to this structure.
The runtime sets hRTCoreLayer to specify the handle that the driver should use when the driver calls back
into the runtime to access core Direct3D 10 functionality (that is, in calls to the functions that the
pUMCallbacks member specifies).
The runtime supplies a table of its core callback functions in the
D3D10DDI_CORELAYER_DEVICECALLBACKS structure to which pUMCallbacks points. The user-mode
display driver calls the runtime-supplied core callback functions to refresh state.
Note The number of display devices (graphics contexts) that can exist simultaneously is limited only by available
system memory.
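The sketch below shows how a driver's CreateDevice(D3D10) function might record these members for later use. MY_DEVICE is a hypothetical private layout that lives in the memory the runtime allocated based on CalcPrivateDeviceSize, and the population of the function tables is only indicated by comments.

#include <d3d10umddi.h>

// Hypothetical per-device private data stored in hDrvDevice.pDrvPrivate.
typedef struct MY_DEVICE
{
    D3D10DDI_HRTDEVICE hRTDevice;                             // handle for pKTCallbacks calls
    D3D10DDI_HRTCORELAYER hRTCoreLayer;                       // handle for pUMCallbacks calls
    const D3DDDI_DEVICECALLBACKS *pKTCallbacks;               // kernel-service callbacks
    const D3D10DDI_CORELAYER_DEVICECALLBACKS *pUMCallbacks;   // core-layer (state-refresh, error) callbacks
} MY_DEVICE;

HRESULT APIENTRY MyCreateDevice(D3D10DDI_HADAPTER hAdapter, D3D10DDIARG_CREATEDEVICE *pCreateData)
{
    MY_DEVICE *pDevice = (MY_DEVICE *)pCreateData->hDrvDevice.pDrvPrivate;

    // Record everything that later driver functions need to call back into the runtime.
    pDevice->hRTDevice    = pCreateData->hRTDevice;
    pDevice->hRTCoreLayer = pCreateData->hRTCoreLayer;
    pDevice->pKTCallbacks = pCreateData->pKTCallbacks;
    pDevice->pUMCallbacks = pCreateData->pUMCallbacks;

    // The driver would now fill pCreateData->pDeviceFuncs with its device-specific
    // entry points and exchange DXGI tables through pCreateData->DXGIBaseDDI
    // (see the DirectX Graphics Infrastructure DDI section); omitted in this sketch.

    (void)hAdapter;
    return S_OK;
}
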

Rendering Pipeline

Graphics hardware that supports Direct3D version 10 can be designed with shared programmable shader cores.
That is, the graphics processing unit (GPU) has programmable shader cores that can be scheduled across the functional blocks
that make up the rendering pipeline. This load balancing means that hardware developers are not required to use
every shader type, but only the ones that are required to perform rendering. This load balancing can then free
resources for shader types that are active. The following figure shows the functional blocks of the rendering
pipeline. The sections that follow the figure describe the blocks in more detail.

Input Assembler
The input assembler stage uses fixed function operations to read vertices out of memory. The input assembler then
forms geometry primitives and creates pipeline work items. Auto-generated vertex identifiers, instance identifiers
(available to the vertex shader), and primitive identifiers (available to the geometry shader or pixel shader) enable
identifier-specific processing. The dotted line in the figure shows the flow of identifier-specific processing.
Vertex Shader
The vertex shader stage takes one vertex as input and outputs one vertex.
Geometry Shader
The geometry shader stage takes one primitive as input and outputs zero, one, or multiple primitives. Output
primitives can contain more data than possible without the geometry shader. The total amount of output data per
operation is (vertex size x vertex count).
Stream Output
The stream output stage concatenates (streams out) primitives that reach the output of the geometry shader to
output buffers. The stream output is associated with the geometry shader and both are programmed together.
Rasterizer
The rasterizer stage clips (including custom clip boundaries) primitives, performs perspective divide on primitives,
implements viewport and scissor selection, performs render-target selection, and performs primitive setup.
Pixel Shader
The pixel shader stage takes one pixel as input and outputs one pixel at the same position or no pixel. The pixel
shader cannot read current render targets.
Output Merger
The output merger stage performs fixed function render-target blend, depth, and stencil operations.

Input Assembler Stage

The input assembler (IA) introduces triangles, lines, or points into the rendering pipeline by pulling source
geometry data out of 1D buffers.
Vertex data can come from multiple buffers, and can be accessed in an array-of-structures fashion from each
buffer. The buffers are each bound to an individual input slot and given a structure stride. The layout of data across
all the buffers is specified by an input declaration, in which each entry defines an element. The element contains an
input slot, a structure offset, a data type, and a target register (for the first active shader in the pipeline).
A given sequence of vertices is constructed out of data that is fetched from buffers. The data is fetched in a traversal
that is directed by a combination of fixed-function state and various Draw*() DDI calls. Various primitive
topologies (for example, point-list, line-list, triangle-list, and triangle-strip) are available to make the sequence of
vertex data represent a sequence of primitives.
Vertex data can be produced in one of two ways. The first way to produce vertex data is non-indexed rendering,
which is the sequential traversal of buffers that contain vertex data. The vertex data originates at a start offset at
each buffer binding. The second way to produce vertex data is indexed rendering, which is sequential traversal of a
single buffer that contains scalar integer indexes. The indexes originate at a start offset into the buffer. Each index
indicates where to fetch data out of a buffer that contains vertex data. The index values are independent of the
characteristics of the buffers that they refer to. Buffers are described by declarations. Non-indexed and indexed
rendering, each in their own way, produce addresses from which to fetch vertex data in memory, and subsequently
assemble the results into vertices and primitives.
Instanced geometry rendering is enabled by allowing the sequential traversal, in either non-indexed or indexed
rendering, to loop over a range within each vertex buffer (non-indexed case) or index buffer (indexed case). Buffer-
bindings can be identified as instance data or vertex data. This identification specifies how to use the bound buffer
while performing instanced rendering. The address that is generated by non-indexed or indexed rendering is used
to fetch vertex data, which also accounts for looping when the runtime performs instanced rendering. Instance data,
on the other hand, is always sequentially traversed starting from a per-buffer offset, at a frequency equal to one
step per instance (for example, one step forward after the number of vertices in an instance are traversed). The step
rate for instance data can also be chosen to be a sub-harmonic of the instance frequency (that is, one step forward
every other instance, every third instance, and so on).
Another special case of the IA is that it can read buffers that the stream output stage wrote to. Such a scenario
enables a new type of draw operation, DrawAuto. DrawAuto allows a dynamic amount of output that was written
to stream-output buffers to be reused, without the CPU involvement, to determine how much data was actually
written.
In addition to producing vertex data from buffers, the IA can auto-generate three scalar counter values: VertexID,
PrimitiveID, and InstanceID, for input to shader stages in the rendering pipeline.
In indexed rendering of strip topologies, such as triangle strips, a mechanism is provided for drawing multiple
strips with a single Draw*() call (that is, the cut command to cut strips).
The Direct3D runtime calls the following driver functions to create, set up, and destroy the IA:
CalcPrivateElementLayoutSize
CreateElementLayout
DestroyElementLayout
IaSetIndexBuffer
IaSetInputLayout
IaSetTopology
IaSetVertexBuffers

Vertex Shader Stage

The vertex shader stage processes vertices by performing operations such as transformations, skinning, and
lighting. Vertex shaders always operate on a single input vertex and produce a single output vertex. This stage of
the rendering pipeline must always be active.
The Direct3D runtime calls the following driver functions to create, set up, and destroy the vertex shader:
CalcPrivateShaderSize
CreateVertexShader(D3D10)
DestroyShader
VsSetConstantBuffers
VsSetSamplers
VsSetShader
VsSetShaderResources

Geometry Shader Stage

The geometry shader (GS) stage runs application-specified shader code with vertices as input and can generate
vertices on output. Unlike vertex shaders, which operate on a single vertex, the geometry shader's inputs are the
vertices for a full primitive (that is, two vertices for lines, three vertices for triangles, or a single vertex for a point)
plus the vertex data for the edge-adjacent primitives (that is, an additional two vertices for a line or an additional
three vertices for a triangle). The following figure shows examples of primitives that are input to a geometry
shader.

Another input to the geometry shader is a primitive ID that is auto-generated by the input assembler (IA). A
primitive ID allows the geometry shader to fetch or compute, if required, per-face data.
The geometry shader stage can output multiple vertices to form a single selected topology. Available GS output
topologies are tristrip, linestrip, and pointlist. The number of primitives that a geometry shader emits can vary,
though the maximum number of vertices that a geometry shader can emit must be declared statically. Strip lengths
that a geometry shader emits can be arbitrary (there is a cut command).
The output of the geometry shader can be sent to the rasterizer and to a vertex buffer in memory. Output that is
sent to memory is expanded to individual point, line, and triangle lists (similarly to how output is passed to the
rasterizer).
The geometry shader stage can implement the following algorithms:
Point Sprite Tessellation: The shader takes in a single vertex and generates four vertices (two output
triangles) that represent the four corners of a quad with arbitrary texcoords, normals, and other attributes.
Wide Line Tessellation: The shader receives two line vertices (LV0 and LV1) and generates four vertices for a
quad that represents a widened line. Additionally, a geometry shader can use the adjacent line vertices (AV0
and AV1) to perform mitering on line endpoints.
Fur/Fin Generation: Rendering multiple offsets potentially with different textures (extruded faces) to simulate
the parallactic effects of fur. Fins are extruded edges that often fade out if the angle is not oblique. Fins are
used to make objects look better at oblique angles.
Shadow Volume Generation: Adjacency information that is used to determine whether to extrude.
Single Pass Rendering to Multiple Texture Cube Faces: Primitives are projected and emitted to a pixel shader
six times. Each primitive is accompanied by a render-target array index, which selects a cube face.
Set up barycentric coordinates as primitive data so the pixel shader can perform custom attribute
interpolation.
A pathological case: An application generates some geometry, then n-patches that geometry, and then
extrudes shadow volumes out of that geometry. For such cases, multi-pass is the solution with the ability to
output vertex and primitive data to a stream and circulate the data back.
Note Because each call to the geometry shader can produce a varying number of outputs, parallel calls to
hardware are more difficult at this stage than when running other pipeline stages (such as vertex or pixel shader
stages) in parallel. While hardware implementations will run geometry shader calls in parallel, the complex
buffering that is required to accomplish parallel geometry shader calls means that applications should not expect
the level of parallelism achievable at the geometry shader stage to match that of other pipeline stages. In other
words, the geometry shader could become a bottleneck in the pipeline depending on the program load that the
geometry shader has. However, the goal is that algorithms that use the geometry shader's capability will still run
more efficiently than if the application had to emulate the behavior on hardware that cannot generate
geometry programmatically.
The Direct3D runtime calls the following driver functions to create, set up, and destroy the geometry shader:
CalcPrivateGeometryShaderWithStreamOutput
CalcPrivateShaderSize
CreateGeometryShader
CreateGeometryShaderWithStreamOutput
DestroyShader
GsSetConstantBuffers
GsSetSamplers
GsSetShader
GsSetShaderResources

Stream Output Stage

The stream output (SO) stage can stream out vertices to memory just before those vertices arrive at the rasterizer.
The stream output operates like a tap in the pipeline. This tap can be turned on even as data continues to flow
down to the rasterizer. Data that is sent out through the stream output is concatenated to buffers. These buffers can
be recirculated on subsequent passes as pipeline inputs.
One constraint about the stream output is that it is tied to the geometry shader, in that they must be created
together (though either can be "NULL"/"off"). However, the particular memory buffers that are streamed out to are
not tied to a particular geometry shader and stream output pair. Only the description of which parts of the vertex
data to feed to a stream output is tied to the geometry shader.
The stream output might be useful for saving ordered pipeline data that will be reused. For example, a batch of
vertices might be "skinned" by passing the vertices into the pipeline as if they are independent points (just to visit
all of them once), applying "skinning" operations on each vertex, and streaming out the results to memory. The
saved out "skinned" vertices are subsequently available for use as input.
Because the amount of output that is written through the stream output is dynamic, a new type of Draw,
DrawAuto, is necessary to allow stream output buffers to be reused with the input assembler, without the CPU
involvement to determine how much data was actually written. In addition, queries are necessary to mitigate
stream output overflow, as well as retrieve how much data was written to the stream output buffers
(D3D10DDI_QUERY_STREAMOVERFLOWPREDICATE and D3D10DDI_QUERY_STREAMOUTPUTSTATS of the
D3D10DDI_QUERY enumeration).
The Direct3D runtime calls the following driver functions to create and set up the stream output:
CalcPrivateGeometryShaderWithStreamOutput
CreateGeometryShaderWithStreamOutput
SoSetTargets

Rasterizer Block

The rasterizer block clips, sets up primitives, and determines how to call the pixel shader stage. The Direct3D
runtime does not view the rasterizer block as a stage in the pipeline. Instead, the Direct3D runtime views the
rasterizer block as an interface between pipeline stages that happens to perform a significant set of fixed function
operations. Many of these fixed function operations can be adjusted by software developers.
The rasterizer always determines that input positions are provided in clip-space, performs clipping and perspective
divide, and applies viewport scale and offset.
The Direct3D runtime calls the following driver functions to create, set up, and destroy the state of the rasterizer:
CalcPrivateRasterizerStateSize
CreateRasterizerState
DestroyRasterizerState
SetRasterizerState
SetScissorRects
SetViewports

Pixel Shader Stage

Input data that is available to the pixel shader stage includes vertex attributes that can be selected, on a per-element
basis, to be interpolated with or without perspective correction, or to be treated as constant per-primitive.
Outputs are one or more 4-vectors of output data for the current pixel location, or no color (if the pixel is
discarded).
The Direct3D runtime calls the following driver functions to create, set up, and destroy the pixel shader:
CalcPrivateShaderSize
CreatePixelShader(D3D10)
DestroyShader
PsSetConstantBuffers
PsSetSamplers
PsSetShader
PsSetShaderResources

Output Merger Stage

The final step in the logical pipeline is visibility determination, through stencil or depth, and writing or blending of
outputs to render targets, which can be one of many resource types. These operations, as well as the binding of
output resources (render targets), are defined at the output merger stage.
The Direct3D runtime calls the following driver functions to create, set up, clear, and destroy the output:
CalcPrivateBlendStateSize
CalcPrivateDepthStencilStateSize
CalcPrivateDepthStencilViewSize
ClearDepthStencilView
ClearRenderTargetView
CreateBlendState
CreateDepthStencilState
CreateDepthStencilView
DestroyBlendState
DestroyDepthStencilState
DestroyDepthStencilView
SetBlendState
SetDepthStencilState
SetPredication
SetRenderTargets
SetTextFilterSize

Using the State-Refresh Callback Functions

The user-mode display driver can use the Direct3D Runtime Version 10 State-Refresh Callback Functions to
achieve a stateless driver or to build up command buffer preamble data.
The Direct3D runtime supplies pointers to its state-refresh callback functions in the
D3D10DDI_CORELAYER_DEVICECALLBACKS structure that the pUMCallbacks member of the
D3D10DDIARG_CREATEDEVICE structure points to in a call to the CreateDevice(D3D10) function.
The user-mode display driver might call, for example, the pfnStateIaIndexBufCb state-refresh callback function,
while the driver is within a call to the driver's IaSetIndexBuffer function. This situation can easily occur because
the user-mode display driver might use the pfnStateIaIndexBufCb callback function to build a preamble, and the
call to IaSetIndexBuffer might exhaust the size of the command buffer and cause a flush. In such a situation, the
call to pfnStateIaIndexBufCb passes the same "new" binding information as the original call to IaSetIndexBuffer,
which results in a better-optimized preamble.

Using Direct3D Version 10 Handles

Direct3D version 10 handles are strongly typed to prevent misusage and to enable the compiler to detect
mismatched handle types. Direct3D version 10 handles have life spans that start with a call to a create-type
function (for example, CreateGeometryShader) and end with a call to a destroy-type function (for example,
DestroyShader). Three categories of handles exist for Direct3D version 10. The first two categories of handles are
driver handles, which the Direct3D runtime uses to communicate with the driver, and runtime handles, which the
driver uses to communicate with the runtime. The third category of handles are kernel handles. The following
sections describe the Direct3D version 10 handles:
Direct3D Version 10 Runtime and Driver Handles
Direct3D Version 10 Kernel Handles

Direct3D Version 10 Runtime and Driver Handles

The Direct3D version 10 runtime and driver handles share the same life span. The Direct3D runtime specifies the
lifetime of an object between calls to create-type functions (for example, CreateResource(D3D10)) and calls to
destroy-type functions (for example, DestroyResource(D3D10)). The runtime provides driver-handle values as
well as runtime-handle values. These handles are essentially pointers that are wrapped with a strong type to
identify the object that is being operated on. The following are examples of runtime and driver handles for
resources:

// Strongly typed handle to identify a resource object to the driver:

typedef struct D3D10DDI_HRESOURCE
{
    void* pDrvPrivate; // Pointer to memory location as large as the driver requested.
} D3D10DDI_HRESOURCE;

// Strongly typed handle to identify a resource object to the runtime:

typedef struct D3D10DDI_HRTRESOURCE
{
    void* handle;
} D3D10DDI_HRTRESOURCE;

All driver handles for a rendering device object and its children objects undergo the following two-pass creation
mechanism:
1. To determine the value of the driver handle pointer, the runtime first calls a CalcPrivateObjTypeSize
function (for example, the CalcPrivateResourceSize function). In this call, the runtime passes in the
creation parameters (for example, a pointer to the D3D10DDIARG_CREATERESOURCE structure). The
runtime also passes in the creation parameters in the call to a CreateObjType function.
The user-mode display driver is generally not required to allocate anything during a call to
CalcPrivateObjTypeSize. However, if the driver does allocate and the allocation fails, or if the driver must
indicate any other type of failure condition, the driver can return SIZE_T( -1 ) to prevent handle creation. The runtime then returns an
E_OUTOFMEMORY error condition to the calling application.
Minimally, the driver should return sizeof( void* ) from a call to CalcPrivateObjTypeSize.
2. If the runtime can allocate enough space to satisfy the size required by the user-mode display driver, the
runtime will then call a CreateObjType function (for example, CreateResource(D3D10)) with the same
creation parameters, along with the new unique value for the driver handle. The pointer value of the driver
handle will be unique and constant for the life span of the handle, as it points to a region of memory the size
of which was returned by CalcPrivateObjTypeSize. The user-mode display driver can use this region of
memory as required. The driver can gain efficiency by placing any frequently accessed data in the region of
memory that the runtime provides, as the following sketch illustrates.
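The following sketch shows the two-pass pattern for a resource object. MY_RESOURCE is a hypothetical private layout, video-memory allocation through the kernel callbacks is omitted, and the exact prototypes should be checked against d3d10umddi.h.

#include <d3d10umddi.h>

// Hypothetical per-resource private data placed in the runtime-provided memory.
typedef struct MY_RESOURCE
{
    D3D10DDI_HRTRESOURCE hRTResource;  // runtime handle for later callbacks
    D3DKMT_HANDLE hKMAllocation;       // kernel allocation backing the resource
    UINT MipLevels;
} MY_RESOURCE;

SIZE_T APIENTRY MyCalcPrivateResourceSize(
    D3D10DDI_HDEVICE hDevice,
    const D3D10DDIARG_CREATERESOURCE *pCreateResource)
{
    // Pass 1: report how much private memory the driver needs for this object.
    // Returning SIZE_T( -1 ) here would abort the creation with E_OUTOFMEMORY.
    (void)hDevice;
    (void)pCreateResource;  // a real driver may size the allocation from the description
    return sizeof(MY_RESOURCE);
}

VOID APIENTRY MyCreateResource(
    D3D10DDI_HDEVICE hDevice,
    const D3D10DDIARG_CREATERESOURCE *pCreateResource,
    D3D10DDI_HRESOURCE hResource,
    D3D10DDI_HRTRESOURCE hRTResource)
{
    // Pass 2: the runtime allocated the requested bytes; treat them as MY_RESOURCE.
    MY_RESOURCE *pRes = (MY_RESOURCE *)hResource.pDrvPrivate;
    pRes->hRTResource = hRTResource;
    pRes->MipLevels   = pCreateResource->MipLevels;
    // Allocation of video memory through pfnAllocateCb is omitted in this sketch.
    (void)hDevice;
}
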

Direct3D Version 10 Kernel Handles

The Direct3D version 10 kernel handle life spans are typically controlled by the user-mode display driver explicitly.
Such handles allow the user-mode display driver to manipulate allocations. Such handles can also allow the user-
mode display driver to perform other interactions with the kernel (including interactions with the display miniport
driver).
The following shows an example of a kernel handle for a resource:

// Strongly typed handle to identify a kernel-mode resource object:

typedef struct D3D10DDI_HKMRESOURCE
{
    D3DKMT_HANDLE handle;
} D3D10DDI_HKMRESOURCE;


Handling Errors

The Direct3D version 10 functions that a user-mode display driver implements typically have VOID for a return
parameter type. The primary exception to this rule is the CalcPrivateObjTypeSize-type function (for example, the
CalcPrivateResourceSize function). This type of function returns a SIZE_T parameter type that indicates the size of
the memory region that the driver requires for creating the particular object type through the CreateObjType-type
function (for example, CreateResource(D3D10)).
Returning VOID prevents the user-mode display driver from notifying the Direct3D runtime of errors in the
conventional way (that is, through a user-mode display driver's function return parameter). Instead, the user-mode
display driver must use the Direct3D runtime's pfnSetErrorCb callback function to pass such information back to
the runtime. The runtime supplies a pointer to its pfnSetErrorCb in the
D3D10DDI_CORELAYER_DEVICECALLBACKS structure that the pUMCallbacks member of the
D3D10DDIARG_CREATEDEVICE structure points to in a call to the CreateDevice(D3D10) function.
The reference page for each user-mode display driver function specifies the errors that the function can pass
through a call to pfnSetErrorCb. This means that if the user-mode display driver calls pfnSetErrorCb with an error
code that is not allowed for the current user-mode display driver function, the runtime determines that the error
condition is critical and acts appropriately. Because the runtime will act appropriately during pfnSetErrorCb, you
should not expect that you can reverse the effects of calling pfnSetErrorCb( E_FAIL ) by calling something like
pfnSetErrorCb( S_OK ). In fact, the runtime determines that S_OK is just as invalid or critical as E_FAIL. The concept
of an S_OK return code is equivalent to the user-mode display driver function not calling pfnSetErrorCb at all.
If the Direct3D runtime determines that an error condition is critical, it will first take action by logging the error with
Dr. Watson--the default post-mortem (just-in-time) debugger. The runtime will then lose the device on purpose,
thereby emulating the scenario of receiving the D3DDDIERR_DEVICEREMOVED error code. By requiring the driver
to call the pfnSetErrorCb callback function, the odds are much greater that every error coming out of the driver
will have a useful call stack associated with it. Having a call stack associated with an error enables quick diagnosis
and accurate Dr. Watson logs.
You should still call pfnSetErrorCb in your driver code when something goes wrong, even though the runtime
treats an error code that it does not allow for the particular driver function as a driver bug. It would be even
worse for the user-mode display driver to absorb critical errors
and continue on. The user-mode display driver should call pfnSetErrorCb as close to the point of the error
detection as possible to provide a useful call stack for post-mortem debugging.
The following are the categories of errors that the Direct3D runtime allows from particular driver functions.

NoErrors
The driver should not encounter any errors, including D3DDDIERR_DEVICEREMOVED. The runtime will determine that any call to pfnSetErrorCb is critical.

AllowDeviceRemoved
The driver should not encounter any errors, except for D3DDDIERR_DEVICEREMOVED. The runtime will determine that any call to pfnSetErrorCb that does not pass D3DDDIERR_DEVICEREMOVED is critical. The driver is not required to return DEVICEREMOVED if the device has been removed. However, the runtime allows the driver to return DEVICEREMOVED, in case DEVICEREMOVED interfered with the driver function, which typically should not happen.

AllowOutOfMemory
The driver can possibly run out of memory. Therefore, the driver can pass E_OUTOFMEMORY and D3DDDIERR_DEVICEREMOVED through pfnSetErrorCb. The runtime will determine that any other error codes are critical.

AllowCounterCreationErrors
The driver can possibly run out of memory. The driver also might be unable to create counters due to the exclusive nature of counters. Therefore, the driver can pass E_OUTOFMEMORY, DXGI_DDI_ERR_NONEXCLUSIVE, and D3DDDIERR_DEVICEREMOVED through pfnSetErrorCb. The runtime will determine that any other error codes are critical.

AllowMapErrors
The driver should check for resource contention. Therefore, the driver can pass DXGI_DDI_ERR_WASSTILLDRAWING through pfnSetErrorCb if the D3D10_DDI_MAP_FLAG_DONOTWAIT flag was passed into the driver's ResourceMap function. The driver can also pass D3DDDIERR_DEVICEREMOVED through pfnSetErrorCb. The runtime will determine that any other error codes are critical.

AllowGetDataErrors
The driver should check for query completion. Therefore, the driver can pass DXGI_DDI_ERR_WASSTILLDRAWING through pfnSetErrorCb if the query has not finished yet. The driver can also pass D3DDDIERR_DEVICEREMOVED through pfnSetErrorCb. The runtime will determine that any other error codes are critical.

AllowWKCheckCounterErrors
The driver's CheckCounter function should indicate whether it supports any runtime-defined counters. Therefore, the driver can pass DXGI_DDI_ERR_UNSUPPORTED through pfnSetErrorCb. The runtime will determine that any other error codes are critical. The driver cannot return D3DDDIERR_DEVICEREMOVED for any check-type function.

AllowDDCheckCounterErrors
The driver should validate the device-dependent counter identifier (counter ID) to ensure that the counter ID is within range and that there is enough room to copy each counter string into the provided buffer. The driver can pass E_INVALIDARG through pfnSetErrorCb when the parameters are incorrect in this way. The driver cannot return D3DDDIERR_DEVICEREMOVED for any check-type function.

Querying for Information from the GPU

The Direct3D runtime might require information from the graphics processing unit (GPU) other than an output
render target or output vertex buffer. Because the GPU executes in parallel with the CPU, the user-mode display
driver should supply functions that expose the asynchronous nature of communication with the GPU efficiently.
The query object is the resource that the runtime and driver use for asynchronous notification. To create a query
object, the runtime first calls the driver's CalcPrivateQuerySize function so that the driver can supply the size of
the memory region that the driver requires for the query object. The runtime then calls the driver's
CreateQuery(D3D10) function to create the query object. In the CalcPrivateQuerySize and CreateQuery(D3D10)
calls, the runtime supplies a query-type value from the D3D10DDI_QUERY enumeration in the Query member of
the D3D10DDIARG_CREATEQUERY structure that the pCreateQuery parameters point to.
Each query object instance exists in one of three states: building, issued, and signaled. The runtime calls the driver's
QueryBegin function to transition the query object to the building state.
Note All query types support QueryBegin except for D3D10DDI_QUERY_EVENT and
D3D10DDI_QUERY_TIMESTAMP. The building concept does not exist for D3D10DDI_QUERY_EVENT and
D3D10DDI_QUERY_TIMESTAMP.
The runtime calls the driver's QueryEnd function to transition the query object to the issued state. Transitions to
the signaled state occur asynchronously some time later. The runtime calls the driver's QueryGetData function to
detect whether the query has transitioned to the signaled state. If the query is in the signaled state, QueryGetData
can pass back data that applies to the query in the memory region that the pData parameter points to.
All query objects of the same type are FIFO (that is, first-in, first-out). For example, all query objects of type
D3D10DDI_QUERY_EVENT complete in FIFO order based on their issued order. However, query objects of different
types can complete or signal in an overlapping order. For example, a query of type D3D10DDI_QUERY_EVENT can
complete before a query of type D3D10DDI_QUERY_OCCLUSION, even if the runtime issued the
D3D10DDI_QUERY_EVENT query after the runtime issued the D3D10DDI_QUERY_OCCLUSION query.
When the runtime no longer requires the query object, the runtime frees the memory region that the runtime
previously allocated for the object and calls the driver's DestroyQuery(D3D10) function to notify the driver that
the driver can no longer access this memory region.
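A minimal sketch of the driver's QueryGetData function follows. MY_DEVICE and MY_QUERY are hypothetical private layouts, and the GPU-side mechanism that sets Signaled is not shown; if the query has not reached the signaled state, the driver reports DXGI_DDI_ERR_WASSTILLDRAWING as described in Handling Errors.

#include <string.h>
#include <d3d10umddi.h>

// Hypothetical per-device private data recorded at CreateDevice(D3D10) time.
typedef struct MY_DEVICE
{
    D3D10DDI_HRTCORELAYER hRTCoreLayer;
    const D3D10DDI_CORELAYER_DEVICECALLBACKS *pUMCallbacks;
} MY_DEVICE;

// Hypothetical per-query private data.
typedef struct MY_QUERY
{
    BOOL   Signaled;   // set when the GPU has written the result
    UINT64 Result;     // for example, an occlusion count
} MY_QUERY;

VOID APIENTRY MyQueryGetData(
    D3D10DDI_HDEVICE hDevice,
    D3D10DDI_HQUERY hQuery,
    VOID *pData,
    UINT DataSize,
    UINT Flags)
{
    MY_DEVICE *pDevice = (MY_DEVICE *)hDevice.pDrvPrivate;
    MY_QUERY  *pQuery  = (MY_QUERY *)hQuery.pDrvPrivate;

    if (!pQuery->Signaled)
    {
        // The query has not transitioned to the signaled state yet.
        pDevice->pUMCallbacks->pfnSetErrorCb(pDevice->hRTCoreLayer,
                                             DXGI_DDI_ERR_WASSTILLDRAWING);
        return;
    }

    if (pData != NULL && DataSize >= sizeof(pQuery->Result))
    {
        memcpy(pData, &pQuery->Result, sizeof(pQuery->Result));
    }
    (void)Flags;
}
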

Retroactively Requiring Free-Threaded CalcPrivate DDIs

Direct3D version 11 retroactively requires that the user-mode display driver functions in the Direct3D version 10
DDI that begin with pfnCalcPrivate be free-threaded. This retroactive requirement matches the behavior of the
Direct3D version 11 DDI, which always requires that the pfnCalcPrivate* and pfnCalcDeferredContextHandleSize
functions be free-threaded even if the driver indicates it does not support DDI threading. For more information about
how the driver indicates threading support, see Supporting Threading, Command Lists, and 3-D Pipeline. The
reason for this retroactive requirement is that such functions are typically very simple as they return an immediate
value for size. The functions that are more complex decide which immediate value to return based on the
parameters that are passed to the function. The requirement for functions that begin with pfnCalcPrivate to
actually write any data to places other than the stack does not exist. The requirement for these functions to read
any data other than parameters is a rarity, and any data that is read does not produce contention issues. This
fact allows the Direct3D version 11 API to take a much-needed optimization and perform expensive
synchronization only once per create (for example, for any call that creates an object, such as a call to
CreateResource(D3D10) or CreateGeometryShader) instead of twice.
A notable exception to this retroactive free-threaded requirement is the CalcPrivateDeviceSize function that is
used to satisfy display device creation. CalcPrivateDeviceSize is located on the adapter function table
(D3D10_2DDI_ADAPTERFUNCS or D3D10DDI_ADAPTERFUNCS). CalcPrivateDeviceSize is not among the functions
to which this retroactive requirement applies, so the driver is not required to make CalcPrivateDeviceSize
free-threaded.

DirectX Graphics Infrastructure DDI

The DirectX Graphics Infrastructure (DXGI) was developed with the realization that some parts of graphics evolve
more slowly than others. DXGI provides a common framework for future graphics components. The first Direct3D
runtime version that takes advantage of DXGI is Direct3D version 10. In previous versions of the Direct3D runtime,
access to low-level tasks was included in the Direct3D runtime. DXGI defines a DDI that manages low-level shared
tasks independently from the Direct3D runtime. The following tasks are now implemented with DXGI, and you can
use the DXGI DDI to handle these tasks:
Presentation
Gamma correction control
Resource residency
Resource priority
The following sections describe how the user-mode display driver supports and uses the DXGI DDI:
Supporting the DXGI DDI
Passing DXGI Information at Resource Creation Time
DXGI Presentation Path
Setting DXGI Information in the Registry

Supporting the DXGI DDI

To support the Microsoft DirectX Graphics Infrastructure (DXGI) device driver interface (DDI), the user-mode display
driver must include the Dxgiddi.h header file. Dxgiddi.h also includes the Dxgitype.h header file, which contains
definitions that are shared with application-level DXGI constructs. Dxgiddi.h defines several user-mode display
driver entry points and a DXGI callback function that the driver can use to communicate with the kernel (including
the display miniport driver).
The Microsoft Direct3D runtime supplies access to the DXGI DDI in the DXGI_DDI_BASE_ARGS structure that the
DXGIBaseDDI member of the D3D10DDIARG_CREATEDEVICE structure points to in a call to the
CreateDevice(D3D10) function. The user-mode display driver supplies pointers to these DXGI functions:
Direct3D Version 10 DXGI Functions
Direct3D Version 11.1 DXGI Functions
Direct3D Version 11.2 DXGI Functions
The driver implements these functions through members of the structures that the pDXGIDDIBaseFunctionsXxx
members of DXGI_DDI_BASE_ARGS point to. The driver should record the pointer to the DXGI callback function
table that the pDXGIBaseCallbacks member of DXGI_DDI_BASE_ARGS points to for later use. The driver should
record the pointer to the DXGI callback function table rather than record the individual pointer to the DXGI callback
function because the Direct3D runtime can change the address of the callback function whenever there is no thread
inside the user-mode display driver. A further DXGI user-mode display driver requirement exists for software
rasterizers. Such a user-mode display driver (more specifically, any driver that does not support hardware that is
shared with the Direct3D version 9 DDI implementation on the graphics adapter) must return the
DXGI_STATUS_NO_REDIRECTION value instead of the S_OK value from its CreateDevice(D3D10) function.
Returning DXGI_STATUS_NO_REDIRECTION indicates to DXGI that it should not use the shared resource
presentation path to effect communication with the Desktop Window Manager (DWM). The shared resource
presentation path is created when calls to shared-resource functions (that is, CreateResource(D3D10) and
OpenResource(D3D10) functions with the D3D10_DDI_RESOURCE_MISC_SHARED flag set) occur. However,
DXGI should instead use techniques relevant to a swapchain whose buffers are available only to the CPU. For
example, DXGI should move rendered data from the back buffer to the desktop by means other than the shared
resource presentation path. In this situation, DXGI actually calls the driver's PresentDXGI function to move
rendered data rather than effect communication with the DWM.
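A minimal sketch of the DXGI table exchange inside CreateDevice(D3D10) follows; MY_DEVICE, SetupDxgiDdi, and the My*DXGI functions are hypothetical names, and only two of the DXGI entry points are shown.

#include <d3d10umddi.h>

// Hypothetical per-device private data.
typedef struct MY_DEVICE
{
    DXGI_DDI_BASE_CALLBACKS *pDXGICallbacks;   // the runtime may repoint individual callbacks later
} MY_DEVICE;

// Hypothetical DXGI entry points implemented elsewhere in the driver.
HRESULT APIENTRY MyPresentDXGI(DXGI_DDI_ARG_PRESENT *pPresentData);
HRESULT APIENTRY MyBltDXGI(DXGI_DDI_ARG_BLT *pBltData);

static void SetupDxgiDdi(MY_DEVICE *pDevice, D3D10DDIARG_CREATEDEVICE *pCreateData)
{
    DXGI_DDI_BASE_ARGS *pDxgi = &pCreateData->DXGIBaseDDI;

    // Record the pointer to the callback table itself, as recommended above,
    // rather than the individual callback pointers.
    pDevice->pDXGICallbacks = pDxgi->pDXGIBaseCallbacks;

    // Supply the driver's DXGI entry points (only two shown in this sketch).
    pDxgi->pDXGIDDIBaseFunctions->pfnPresent = MyPresentDXGI;
    pDxgi->pDXGIDDIBaseFunctions->pfnBlt     = MyBltDXGI;
}
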

Passing DXGI Information at Resource Creation Time

The Direct3D version 10 runtime can pass DXGI-specific information when it calls the user-mode display driver's
CreateResource(D3D10) function to create a resource. The runtime can pass a pointer to a
DXGI_DDI_PRIMARY_DESC structure in the pPrimaryDesc member of the D3D10DDIARG_CREATERESOURCE
structure to specify that the resource can be used as a primary (that is, the resource can be scanned out to the
display). The runtime sets pPrimaryDesc to a non-NULL value only if the runtime also sets the
D3D10_DDI_BIND_PRESENT bit in the BindFlags member of D3D10DDIARG_CREATERESOURCE.
The runtime can specify the DXGI_DDI_PRIMARY_OPTIONAL flag in the Flags member of
DXGI_DDI_PRIMARY_DESC to notify the user-mode display driver that the driver can opt out from using the
resource in a flip-style presentation. To notify the runtime that it should not use the resource in flip-style
presentations, the driver sets the DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT flag in the DriverFlags
member of DXGI_DDI_PRIMARY_DESC.
If the driver returns DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT in the CreateResource(D3D10) call to
create the resource, the runtime will always perform a bit-block transfer (bitblt)-style presentation (instead of a
flip-style presentation) when the resource is the source of the presentation. This functionality is useful if graphics
hardware cannot scan out a particular subset of a given resource type. For example, graphics hardware might or
might not be able to scan out a multisampled back buffer type of resource. In addition, the ability to scan out
multisampled back buffers might further depend on the format of the surface. If the graphics hardware was not
able to scan out a particular multisampled format, the user-mode display driver would set the
DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT flag in the DriverFlags member of DXGI_DDI_PRIMARY_DESC
for the resource with this format.
If the runtime does not set the DXGI_DDI_PRIMARY_OPTIONAL flag in the Flags member of
DXGI_DDI_PRIMARY_DESC to notify the driver about the possibility of opting out of using the resource in a flip-
style presentation, the driver can still return the DXGI_DDI_ERR_UNSUPPORTED error code along with the
DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT flag from a call to CreateResource(D3D10). The driver's
CreateResource(D3D10) passes DXGI_DDI_ERR_UNSUPPORTED in a call to the pfnSetErrorCb function if the
driver cannot scan out such a primary. Returning DXGI_DDI_ERR_UNSUPPORTED along with
DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT causes DXGI to interpose a proxy surface in the presentation
path, between the back buffers and the primary surface. The proxy surface always matches the primary (scanned-
out) surface in terms of size, multisample, and rotation. The first step in this process is for DXGI to determine which
of the multisample or rotation settings cause the driver to refuse to scan out a surface with those settings. DXGI
makes this determination by scaling back and trying to create a primary without rotation, without multisampling,
or without both. After DXGI determines the driver's support for scan-out features, DXGI creates the primary and
proxy surfaces, and the driver should be able to flip between these two surfaces. DXGI will still subsequently satisfy
an application's requests for auto-rotated or multisampled back buffers by calling the driver's BltDXGI function to
perform bitblts from back buffers to the proxy surface. These bitblts request the driver to perform multisample
resolves or rotates. For more information about BltDXGI, see the BltDXGI reference page.
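A rough sketch of the driver-side decision follows. CanScanOut is a hypothetical capability check, the MY_DEVICE bookkeeping is omitted, and the error is reported through pfnSetErrorCb as for any other CreateResource(D3D10) failure; consult dxgiddi.h for the authoritative flag definitions.

#include <d3d10umddi.h>

// Hypothetical hardware capability check implemented elsewhere in the driver.
BOOL CanScanOut(const D3D10DDIARG_CREATERESOURCE *pCreateResource);

static void HandlePrimaryDesc(
    D3D10DDI_HRTCORELAYER hRTCoreLayer,
    const D3D10DDI_CORELAYER_DEVICECALLBACKS *pUMCallbacks,
    D3D10DDIARG_CREATERESOURCE *pCreateResource)
{
    DXGI_DDI_PRIMARY_DESC *pPrimary = pCreateResource->pPrimaryDesc;

    if (pPrimary == NULL || CanScanOut(pCreateResource))
    {
        return; // not a potential primary, or the hardware can scan it out
    }

    // Tell the runtime not to use this resource in flip-style presentations.
    pPrimary->DriverFlags |= DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT;

    if ((pPrimary->Flags & DXGI_DDI_PRIMARY_OPTIONAL) == 0)
    {
        // The primary is not optional and cannot be scanned out, so fail the
        // creation; DXGI reacts by interposing a proxy surface or falling back.
        pUMCallbacks->pfnSetErrorCb(hRTCoreLayer, DXGI_DDI_ERR_UNSUPPORTED);
    }
}
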

DXGI Presentation Path

DXGI provides applications with a presentation methodology that "just works." For example, applications are not
required to perform any special operations to transition between windowed mode and full-screen mode. This
presentation methodology is possible because DXGI and the user-mode display driver work together to preserve
presentation across combinations of Multiple Sample Anti Aliasing (MSAA), monitor rotation, back and front buffer
differences in size and format, and full-screen versus windowed modes. Another advantage of DXGI is that it
accommodates display adapters that have only a limited ability to scan out MSAA and rotated surfaces, because
DXGI provides a "stateless" DDI. In a stateless DDI, the adapter's driver is not required to record data across DDI calls.
The basic task of presentation is to move data from a rendered back buffer to the primary surface for viewing. This
task is performed in the different situations that are described in the following sections.
Windowed mode with DWM on
In the windowed mode with Desktop Windows Manager (DWM)-on case, DXGI communicates with DWM and
opens a view of a shared resource that is a render target for the DXGI producer and a texture for DWM. This shared
resource exists in addition to any back buffers that the application creates. DXGI calls the driver's BltDXGI function
to move data from any of the back buffers to the shared surface. This operation might require stretch, color
conversion, and MSAA resolve. However, this operation never requires source and destination sub-rectangles. In
fact, these sub-rectangles cannot be expressed in the call to BltDXGI. This bit-block transfer (bitblt) always has the
Present flag set in the Flags member of the DXGI_DDI_ARG_BLT structure that the pBltData parameter points to.
Setting the Present flag indicates that the driver should perform the operation atomically. The driver performs the
bitblt operation atomically to minimize the possibility of tearing while the DWM reads the shared resource for
composition.
Windowed mode with DWM off
In the windowed mode with DWM-off case, DXGI calls the driver's PresentDXGI function with the Blt flag set in the
Flags member of the DXGI_DDI_ARG_PRESENT structure that the pPresentData parameter points to. In this
PresentDXGI call, DXGI can specify any of the application-created back buffers in the hSurfaceToPresent and
SrcSubResourceIndex members of DXGI_DDI_ARG_PRESENT. There is no additional shared surface.
Full-screen mode
The full-screen case is more complicated than the windowed mode with DWM either on or off.
When DXGI makes the transition to full-screen mode, it attempts to exploit a flip operation in order to reduce
bandwidth and gain vertical-sync synchronization. The following conditions can prevent the use of a flip operation:
The application did not re-allocate its back buffers in a way that they match the primary surface.
The driver specified that it will not scan-out the back buffer (for example, because the back buffer is rotated
or is MSAA).
The application specified that it cannot accept the Direct3D runtime discarding of the back buffer's contents
and requested only one buffer (total) in the chain. (In this case, DXGI allocates a back surface and a primary
surface; however, DXGI uses the driver's PresentDXGI function with the Blt flag set.)
When one of the preceding conditions prevents a flip operation, and a call to the driver's
PresentDXGI function with the Blt flag set is also not appropriate (because the back buffer does not match the
front buffer exactly), DXGI allocates the proxy surface. This proxy surface matches the front buffer. Therefore, a flip
between the proxy surface and the front buffer becomes possible. If the proxy surface exists, DXGI uses the driver's
BltDXGI function with the Present flag cleared (0) to copy the application's back buffers to the proxy surface. In
this BltDXGI call, DXGI might request converting, stretching, and resolving. DXGI then calls the driver's PresentDXGI
function with the Flip flag set in the Flags member of the DXGI_DDI_ARG_PRESENT structure to move the proxy
surface bits to scan-out.
To let the user-mode display driver opt out of scan-out, the driver receives resource-creation calls for two
classes of scan-out surfaces: optional and non-optional. Optional scan-out surfaces are
designated by the DXGI_DDI_PRIMARY_OPTIONAL flag. Non-optional scan-out surfaces do not have the
DXGI_DDI_PRIMARY_OPTIONAL flag set. For more information about these types of resource-creation calls, see
Passing DXGI Information at Resource Creation Time.
DXGI sets the DXGI_DDI_PRIMARY_OPTIONAL flag to create all back buffer surfaces (that is, optional surfaces) and
does not set the flag for any front buffer or proxy surface (that is, non-optional surface).
If DXGI_DDI_PRIMARY_OPTIONAL is set for a back buffer, the driver can set the
DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT flag. For more information about setting this flag, see Passing
DXGI Information at Resource Creation Time. If the driver sets DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT
for an optional buffer, it has no effect other than to cause DXGI to call the driver's PresentDXGI function with the
Blt flag set instead of with the Flip flag set.
If DXGI_DDI_PRIMARY_OPTIONAL is not set for a front buffer or the proxy surface, the driver can still opt out of
scan-out by failing the resource creation call with error code DXGI_DDI_ERR_UNSUPPORTED and setting
DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT.
Note Failing the create call without setting DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT is reserved for real
failure cases, like out of memory.
DXGI exploits this opt-out methodology when it attempts to create a full-screen presentation chain for an MSAA or
rotated back buffer. If the driver cannot scan out either or both of these surface types, it opts out. DXGI will then
attempt to create a non-rotated surface, a non-MSAA surface, or both until the driver accepts the resource creation.
Therefore, DXGI will fall back progressively until the non-optional surface exactly matches the front buffer format,
sample count, rotation, and size.
If the driver opts out of any optional surface, DXGI must still have a way to move bits from the back buffer to
the primary surface. Consequently, if the driver opts out of scan-out for MSAA and rotation, the driver opts in to
resolving, rotating, or both when DXGI calls the driver's BltDXGI function. When the driver opts out, DXGI will
create a proxy surface and call BltDXGI to move data from the back buffers to that proxy surface. The driver should
have no reason to opt-out of this proxy surface because the proxy exactly matches the front buffer.
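The following sketch shows how a driver's CreateResource(D3D10) function might honor these flags. It is illustrative
only: the primary-description members used here (a pPrimaryDesc member of D3D10DDIARG_CREATERESOURCE that points
to a structure with Flags and DriverFlags members) and the helper functions are assumptions; see Passing DXGI
Information at Resource Creation Time for the actual structures.

// Illustrative sketch only; pPrimaryDesc, Flags, and DriverFlags are assumed here,
// and HwCanScanOut/SetErrorCb are hypothetical helpers.
VOID APIENTRY CreateResource(
    D3D10DDI_HDEVICE hDevice,
    const D3D10DDIARG_CREATERESOURCE* pCreateResource,
    D3D10DDI_HRESOURCE hResource,
    D3D10DDI_HRTRESOURCE hRTResource)
{
    if (pCreateResource->pPrimaryDesc != NULL)
    {
        // A scan-out (primary) surface is being requested.
        BOOL bCanScanOut = HwCanScanOut(hDevice, pCreateResource);  // hypothetical hardware check

        if (!bCanScanOut)
        {
            // Tell DXGI that this surface will not be scanned out.
            pCreateResource->pPrimaryDesc->DriverFlags |= DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT;

            if (!(pCreateResource->pPrimaryDesc->Flags & DXGI_DDI_PRIMARY_OPTIONAL))
            {
                // Non-optional primary: opt out by failing the creation.
                SetErrorCb(hDevice, DXGI_DDI_ERR_UNSUPPORTED);  // hypothetical wrapper for the runtime's error callback
                return;
            }
            // Optional primary: creation succeeds; DXGI falls back to BltDXGI/PresentDXGI with Blt set.
        }
    }
    // ... create the allocation as usual ...
}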
The following unusual situations occur when the application does not re-create its surfaces after a transition either
into or out of full-screen mode:
If the application does not re-create its surfaces when it goes into full-screen mode, DXGI determines that
the back buffers do not match the front buffer, even if they really do match on format, size, rotation, and
sample count. The reason for this determination is that the operating system requires back buffers to be
tagged for scan-out to a particular monitor when those buffers are created. Windowed back buffers cannot
yet be definitively assigned to a particular monitor because the monitor is chosen dynamically when full
screen is entered. Therefore, DXGI must not send these back buffers to the driver for scan-out (through a flip
operation). Applications of this type typically force DXGI to create the proxy surface.
If the application does not re-create its back buffers when it returns to windowed mode, DXGI might call the
driver's BltDXGI or PresentDXGI (with Blt set) to perform a bitblt on a surface that was previously created
for a flip operation. This situation should not be an issue but is mentioned here for completeness. Note that
DXGI always destroys the proxy surface when the application transitions to windowed mode.
Also, note that applications can resize their back buffers dynamically while the applications are in full-screen mode.
This action causes the logic that is described in the preceding situations to occur again. Therefore, the proxy surface
might be created and destroyed, and opting out might or might not be required over time even though the
application remains in full-screen mode. The application can also transfer its output to another monitor
dynamically without leaving full-screen mode. Therefore, the application incurs a switch back to bitblt mode
because the application's back buffers were tagged for a different monitor.
Finally, you should be aware of the situation that occurs with respect to MSAA back buffers if the driver does not
opt out of MSAA scan-out. In this situation, the driver opts in to scanning out MSAA. Therefore, DXGI interchanges
the MSAA back buffer and MSAA front buffer through flip operations, and performs a resolve operation by what is
equivalent to the digital-to-analog converter (DAC). In this situation, the application can resize its back buffers
dynamically while in full-screen mode, which forces DXGI to switch to calling the driver's BltDXGI function.
Because the MSAA characteristics of the back buffer and front buffer still match, DXGI will specify that the driver
perform a non-resolving, possibly color-converting, stretch bitblt. The driver should then replicate, without resolve,
multisamples to the front buffer, which is necessary if a driver chooses to scan-out MSAA.
Setting DXGI Information in the Registry
DXGI and the reference rasterizer use the following registry keys:
DWORD Software\Microsoft\DXGI\DisableFullscreenWatchdog
Set to 1 to disable the watchdog thread.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\FlushOften
Set to 1 to flush often.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\FenceEachEntryPoint
Set to 1 to make each call to a DDI function fence with the GPU. Fencing with the GPU means to flush the command
batch and block until the GPU is idle.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\Debug
Set to 1 to:
Flush often and make each call to a DDI function fence with the GPU.
Run the reference rasterizer (RefRast) single threaded.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\D3D10RefGdiDisplayMask
Each bit in the DWORD mask enables (if set to 1) or disables (if set to 0) the corresponding display monitor that is
controlled by the reference device.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\SingleThreaded
Set to 1 to enable running RefRast single threaded.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\ForceHeapAlloc
Set to 1 to make the reference device create resources by using the regular process heap, versus other allocation
mechanisms.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\AllowAsync
Set to 1 to allow the reference device's second thread to run asynchronously (that is, multiple command buffers are
allowed to be outstanding).
The reference hardware typically runs in a second thread; however, this second thread completes all its work before
the primary thread can continue.
DWORD Software\Microsoft\Direct3D\ReferenceDevice\SimulateInfinitelyFastHW
Set to 1 to make the reference device's simulated hardware process only a few limited commands to give the
appearance that the reference device is really fast (by essentially doing nothing).
The driver can use this key as a performance tool.
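For example, the DisableFullscreenWatchdog value could be set from a small utility like the following. This is only a
sketch: the root hive (HKEY_LOCAL_MACHINE) is an assumption, because it is not stated above, and the key is created
if it does not already exist.

#include <windows.h>

// Sketch: set Software\Microsoft\DXGI\DisableFullscreenWatchdog to 1.
// The HKEY_LOCAL_MACHINE root is an assumption; adjust as appropriate.
int main(void)
{
    HKEY hKey;
    DWORD value = 1;
    LONG status = RegCreateKeyExW(HKEY_LOCAL_MACHINE,
                                  L"Software\\Microsoft\\DXGI",
                                  0, NULL, 0, KEY_SET_VALUE, NULL, &hKey, NULL);
    if (status == ERROR_SUCCESS)
    {
        status = RegSetValueExW(hKey, L"DisableFullscreenWatchdog", 0, REG_DWORD,
                                (const BYTE*)&value, sizeof(value));
        RegCloseKey(hKey);
    }
    return (status == ERROR_SUCCESS) ? 0 : 1;
}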
Supporting Direct3D Version 10.1
The following sections describe features of Direct3D version 10.1:


Version Numbers for Direct3D Version 10.1
Supporting Extended Format Awareness
Version Numbers for Direct3D Version 10.1
Direct3D versions 10.0 and 10.1 supply #defines that the user-mode display driver uses for versioning. The user-
mode display driver must examine the Interface member of the D3D10DDIARG_OPENADAPTER,
D3D10DDIARG_CREATEDEVICE, and D3D10DDIARG_CALCPRIVATEDEVICESIZE structures that the driver
receives in calls to the OpenAdapter10, CreateDevice(D3D10), and CalcPrivateDeviceSize functions to
determine the version of the Direct3D DDI that the Direct3D runtime supports. The most significant 16 bits of the
Interface member is the number of the Direct3D DDI major version. For Direct3D versions 10.0 and 10.1, this
number is 10. The least significant 16 bits of the Interface member is the Direct3D DDI minor version. This minor-
version value is bumped every time a Direct3D DDI breaking change is introduced. This minor-version value can
also be bumped artificially to signify a stronger version change. The following #defines associate a Direct3D DDI
minor version with a released version number (that is, D3D10_0 == x, D3D10_1 == y, where y > x).
The user-mode display driver should only examine the most significant 16 bits of the Version member of the
D3D10DDIARG_OPENADAPTER, D3D10DDIARG_CREATEDEVICE, and
D3D10DDIARG_CALCPRIVATEDEVICESIZE structures to determine when the Direct3D runtime was built. This value
is manually bumped every time there is a non-breaking Direct3D DDI change. The driver might come to depend on
each non-breaking DDI change over time. Therefore, the driver should ensure that the passed-in DDI build version
is greater than or equal to the *_BUILD_VERSION of the current driver and fail if the versions are incompatible
(perhaps while also providing a registry workaround). The least significant 16 bits of the Version member are the
DDI revision version, which is typically used to special-case the driver based on bugs that are present in the
Direct3D API. The driver must succeed creation for all values; however, the driver can change behavior depending
on certain values. You should compare against these values by using >= because the numbers might rise arbitrarily
due to runtime fixes. You should not compare by using "> (previous broken version)" rather than ">= (working
version)", because new revisions might appear that have version numbers between the two known numbers and do
not contain the required fixes. The following #defines are for Direct3D DDI versioning:

#define D3D10_DDI_MAJOR_VERSION 10
#define D3D10_0_DDI_MINOR_VERSION 1
#define D3D10_0_DDI_INTERFACE_VERSION ((D3D10_DDI_MAJOR_VERSION << 16) | D3D10_0_DDI_MINOR_VERSION)
#define D3D10_0_DDI_BUILD_VERSION 4
#define D3D10_0_DDI_VERSION_VISTA_GOLD ( ( 4 << 16 ) | 6000 )
#define D3D10_0_DDI_VERSION_VISTA_GOLD_WITH_LINKED_ADAPTER_QFE ( ( 4 << 16 ) | 6008 )
#define D3D10_0_DDI_IS_LINKED_ADAPTER_QFE_PRESENT(Version) (Version >=
D3D10_0_DDI_VERSION_VISTA_GOLD_WITH_LINKED_ADAPTER_QFE)

#if D3D10DDI_MINOR_HEADER_VERSION >= 1


#define D3D10_1_DDI_MINOR_VERSION 2
#define D3D10_1_DDI_INTERFACE_VERSION ((D3D10_DDI_MAJOR_VERSION << 16) | D3D10_1_DDI_MINOR_VERSION)
#define D3D10_1_DDI_BUILD_VERSION 1
// Note: d3d10_1 doesn't currently ship on Vista gold. This definition is included
// for completeness in the event that it does at some point in the future:
#define D3D10_1_DDI_VERSION_VISTA_GOLD ( ( 1 << 16 ) | 6000 )
#define D3D10_1_DDI_VERSION_VISTA_SP1 ( ( 1 << 16 ) | 6008 )
#define D3D10_1_DDI_IS_LINKED_ADAPTER_QFE_PRESENT(Version) (Version >= D3D10_1_DDI_VERSION_VISTA_SP1)

#define D3D10on9_DDI_MINOR_VERSION 0
#define D3D10on9_DDI_INTERFACE_VERSION ((D3D10_DDI_MAJOR_VERSION << 16) | D3D10on9_DDI_MINOR_VERSION)
#define D3D10on9_DDI_BUILD_VERSION 0
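
Given these #defines, a driver might decode and check the Interface and Version values in its OpenAdapter10
implementation as follows. This is only a sketch: CheckDdiVersion is a hypothetical helper, and a real driver's
acceptance policy (and any registry workaround) can differ.

// Sketch only (CheckDdiVersion is a hypothetical helper): decode the Interface
// and Version members that the runtime passes in D3D10DDIARG_OPENADAPTER.
HRESULT CheckDdiVersion(const D3D10DDIARG_OPENADAPTER* pOpenData)
{
    UINT MajorInterface = pOpenData->Interface >> 16;     // Direct3D DDI major version (10)
    UINT MinorInterface = pOpenData->Interface & 0xFFFF;  // Direct3D DDI minor version
    UINT BuildVersion   = pOpenData->Version >> 16;       // runtime build version

    if (MajorInterface != D3D10_DDI_MAJOR_VERSION ||
        MinorInterface < D3D10_0_DDI_MINOR_VERSION)
    {
        return E_FAIL;   // this driver does not support the requested DDI
    }
    if (BuildVersion < D3D10_0_DDI_BUILD_VERSION)
    {
        return E_FAIL;   // the runtime is older than this driver requires
    }
    // Revision comparisons should always use >=, for example:
    if (D3D10_0_DDI_IS_LINKED_ADAPTER_QFE_PRESENT(pOpenData->Version))
    {
        // The linked-adapter QFE fix is present in this runtime.
    }
    return S_OK;
}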



Supporting Extended Format Awareness
This section applies only to Windows 7 and later operating systems.


Several new formats are defined for the version of Direct3D 10.1 that Windows 7 provides. In addition, the existing
DXGI_FORMAT_R8G8B8A8_TYPELESS format family gains the ability to cast between its members.
Direct3D 10.1 and later versions expose this extended format support through a new version
and hardware capability discovery mechanism. Direct3D 10.0 does not support extended formats even if the
graphics hardware has Direct3D 10.1 capabilities.
The following are new Direct3D 10.1 features to support extended format awareness:
New XR formats for high color scan-out
Re-adding BGR formats that are missing from Direct3D version 10
Enabling creation of differently-formatted views of fully-typed members of the
DXGI_FORMAT_R8G8B8A8_TYPELESS, DXGI_FORMAT_R10G10B10A2_TYPELESS and
DXGI_FORMAT_R16G16B16A16_TYPELESS families, which contain all the Direct3D version 10 scan-out
formats
Scan-out and present support for BGRA and BGRA_SRGB
Windows 7 also provides its version of Direct3D 9 with a new swap-chain flag that permits the XR interpretation of
a 10:10:10:2 back buffer to be communicated to the DWM.
The following sections describe the new features for Direct3D:
Version Discovery Support
Details of the Extended Format
Fully-Typed Back Buffers Casting
BGRA Scan-Out Support
Extended Format Aware Requirements
DDI Changes for Direct3D Version 9 Drivers
Version Discovery Support
This section applies only to Windows 7 and later operating systems.


A user-mode display driver that runs on Windows Vista and later versions and Windows Server 2008 and later
versions must fail adapter creation (that is, fail a call to the driver's OpenAdapter10 function) for DDI versions that
the driver does not explicitly support.
Windows 7 provides a way for Direct3D applications to discover the DDI versions and hardware capabilities that
the driver explicitly supports. This improves version verification. Windows 7 introduces new adapter-specific
functions to improve versioning and to provide the opportunity to optimize API and driver initialization. You must
implement and export the OpenAdapter10_2 function in your Direct3D version 10.1 driver so the Direct3D
runtime can call the driver's new adapter-specific functions. If you instead implement OpenAdapter10 in your
Direct3D version 10.1 driver, the driver can only indicate whether it supports a DDI version by passing or failing the
call to OpenAdapter10.
OpenAdapter10_2 returns a table of the driver's adapter-specific functions in the pAdapterFuncs_2 member of
the D3D10DDIARG_OPENADAPTER structure. pAdapterFuncs_2 points to a D3D10_2DDI_ADAPTERFUNCS
structure. The Direct3D runtime calls the driver's adapter-specific GetSupportedVersions function to query for
the DDI versions and hardware capabilities that the driver supports. GetSupportedVersions returns the DDI
versions and hardware capabilities in an array of 64-bit values. The following code example shows a
GetSupportedVersions implementation:
// Array of 64-bit values that are defined in D3d10umddi.h
const UINT64 c_aSupportedVersions[] = {
D3D10_0_7_DDI_SUPPORTED, // 10.0 on Windows 7
D3D10_0_DDI_SUPPORTED, // 10.0 on Windows Vista
D3D10_1_x_DDI_SUPPORTED, // 10.1 with all extended
// format support (but not
// Windows 7 scheduling)
};

HRESULT APIENTRY GetSupportedVersions(
    D3D10DDI_HADAPTER hAdapter,
    __inout UINT32* puEntries,
    __out_ecount_opt( *puEntries ) UINT64* pSupportedDDIInterfaceVersions
    )
{
    const UINT32 uEntries = ARRAYSIZE( c_aSupportedVersions );
    if (pSupportedDDIInterfaceVersions &&
        *puEntries < uEntries)
    {
        return HRESULT_FROM_WIN32( ERROR_INSUFFICIENT_BUFFER );
    }

    // Determine concise hardware support from kernel, cache with hAdapter.
    // pfnQueryAdapterInfoCb( hAdapter, ... )

    *puEntries = uEntries;
    if (pSupportedDDIInterfaceVersions)
    {
        UINT64* pCurEntry = pSupportedDDIInterfaceVersions;
        memcpy( pCurEntry, c_aSupportedVersions, sizeof( c_aSupportedVersions ) );
        pCurEntry += ARRAYSIZE( c_aSupportedVersions );
        assert( pCurEntry - pSupportedDDIInterfaceVersions == uEntries );
    }
    return S_OK;
}

A Direct3D version 10.1 driver is not required to verify the values that are passed to the Interface and Version
members of D3D10DDIARG_OPENADAPTER in a call to its OpenAdapter10_2 function even though these
values contain DDI version information with which to initialize the driver. The driver can return DDI version and
hardware capabilities through a call to its GetSupportedVersions function.
The Direct3D runtime can pass values to the Interface and Version members of D3D10DDIARG_CREATEDEVICE
in a call to the driver's CreateDevice(D3D10) function that are different than the values that the runtime passed
to OpenAdapter10_2; the runtime passes values to the Interface and Version members of
D3D10DDIARG_CREATEDEVICE that are based on the DDI version and hardware capabilities information that the
driver's GetSupportedVersions returned to the runtime. The driver is not required to validate the values that are
passed to the Interface and Version members of D3D10DDIARG_CREATEDEVICE because the driver already
indicated support of these values through its GetSupportedVersions function.
If you are porting your driver from Direct3D version 10.0 to Direct3D version 10.1, you should convert the driver to
only monitor the Interface and Version members that are passed to CreateDevice(D3D10) instead of
OpenAdapter10_2. You should analyze both CalcPrivateDeviceSize and CreateDevice(D3D10) function
implementations in your ported driver to ensure that there are no assumptions about the values in the Interface
and Version members for CreateDevice(D3D10) matching the values in the Interface and Version members for
OpenAdapter10_2.
Note OpenAdapter10_2 has the same function signature as OpenAdapter10 (that is,
PFND3D10DDI_OPENADAPTER as defined in the D3d10umddi.h header). You can implement both functions in the
same user-mode display driver DLL.
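A minimal OpenAdapter10_2 implementation primarily fills in the adapter-specific function table. The following is a
sketch only: a real driver also validates the arguments, allocates its private adapter object, records the runtime's
adapter callbacks for later use, and fills in every member of D3D10_2DDI_ADAPTERFUNCS. The pfn member names shown
are assumed from the adapter-specific functions discussed in this section.

// Sketch only: the D3D10_2DDI_ADAPTERFUNCS member names are assumed here.
HRESULT APIENTRY OpenAdapter10_2(D3D10DDIARG_OPENADAPTER* pOpenData)
{
    // A real driver validates pOpenData, allocates a private adapter object,
    // and stores pOpenData->pAdapterCallbacks for later queries.

    pOpenData->pAdapterFuncs_2->pfnGetSupportedVersions  = GetSupportedVersions;
    pOpenData->pAdapterFuncs_2->pfnGetCaps               = GetCaps;
    pOpenData->pAdapterFuncs_2->pfnCalcPrivateDeviceSize = CalcPrivateDeviceSize;
    pOpenData->pAdapterFuncs_2->pfnCreateDevice          = CreateDevice;
    pOpenData->pAdapterFuncs_2->pfnCloseAdapter          = CloseAdapter;
    return S_OK;
}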
Details of the Extended Format
This section applies only to Windows 7 and later operating systems.


In the following table, the XR part of a format name can be considered a new shader interpretation of the bits akin
to UNORM or SINT. The XR_BIAS part of a format name is a special case that overloads this interpretation semantic
with additional metadata. This metadata indicates that the format must be explicitly offset and biased in shader
code on transitions into and out of the shader. The driver is not required to perform any of this biasing work; it is
left entirely to the application.
The following table shows resources with particular attributes that use the extended formats if the hardware
supports these extended formats for the resource with those attributes or if extended formats for those resources
are optional.
Formats (DXGI_FORMAT_*), shown here as numbered columns:
(1) B8G8R8A8_TYPELESS
(2) B8G8R8A8_UNORM (existing)
(3) B8G8R8A8_UNORM_SRGB
(4) B8G8R8X8_TYPELESS
(5) B8G8R8X8_UNORM (existing)
(6) B8G8R8X8_UNORM_SRGB
(7) R10G10B10A2_TYPELESS
(8) R10G10B10_XR_BIAS_A2_UNORM

Resource attribute               (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
Buffer                           N/A   R*    N/A   N/A   R*    N/A   N/A   N/A
Input Assembler Vertex Buffer    N/A   R*    N/A   N/A   R*    N/A   N/A   N/A
Texture1D                        R     R*    R     R     R*    R     R     N/A
Texture2D                        R     R*    R     R     R     R     R     R
Texture3D                        R     R*    R     R     R*    R     R     N/A
Texture Cube                     R     R*    R     R     R*    R     R     N/A
Shader ld                        N/A   R     R     N/A   R     R     N/A   N/A
Shader Sample (any filter)       N/A   R     R     N/A   R     R     N/A   N/A
MIP-map textures                 R     R*    R     R     R*    R     R     N/A
MIP-map Auto-Generation          N/A   R*    R     N/A   R*    R     N/A   N/A
Render Target                    N/A   R     R     N/A   R     R     N/A   N/A
Blendable Render Target          N/A   R     R     N/A   R     R     N/A   N/A
CPU Lockable                     R     R     R     R     R     R     R     R
Multi-Sample Render Target       N/A   o     o     N/A   o     o     N/A   N/A
Multi-Sample Resolve             N/A   R*    R     N/A   R*    R     N/A   N/A
Multi-Sample Load                N/A   R     R     N/A   R     R     N/A   N/A
Display Scan Out                 N/A   R*    R     N/A   N/A   N/A   N/A   R
Cast Within Bit Layout           R     R*    R     R     R     R     R     R

"R*" indicates that hardware support is required and that the requirement has changed compared to the format's
original definition.
Note In the preceding table, cell entries have the following meaning:
"R" indicates that hardware support is required
"o" indicates that hardware support is optional
N/A indicates that the resource attribute either is not applicable to the extended format or does not allow
the extended format
Note The DXGI_FORMAT_B8G8R8A8_UNORM and DXGI_FORMAT_B8G8R8X8_UNORM formats already existed in
the DXGI_FORMAT enumeration. However, they are now considered members of the appropriate new family. Their
requirements have changed compared to their original definitions.
Note Rows for the "Input Assembler Index Buffer", "Shader sample_c (comparison filter)", "Shader sample (mono
1-bit filter)", "Shader gather4", and "Depth-Stencil Target" resource attributes are not included in the preceding
table for readability. All meaning for these resource attributes is N/A.
The following sections describe the details of the new extended formats:
XR Layout
XR Format Alpha Content
DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM
Casting Ability of XR Formats
XR_BIAS Color Channel Conversion Rules
Interpretation of X Channel
XR Layout
This section applies only to Windows 7 and later operating systems.


XR is a fixed-point 1.9 format: each color channel has a 1-bit integer part and a 9-bit fractional part. The encoding
is biased and scaled (see the XR_BIAS conversion rules later in this section), which results in a dynamic range of
approximately [-0.753, 1.253].
Each XR element occupies one 32-bit DWORD, which is laid out as shown in the following table regardless of host
CPU endianness.

BITS 31:30 BITS 29:20 BITS 19:10 BITS 9:0

Alpha channel Blue channel Green channel Red channel

Each of the red, green and blue channels is laid out as shown in the following table.

BIT 9 BITS 8:0

1-bit integer part 9-bit fractional part
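
As a sketch, the channels can be extracted from (or packed into) the 32-bit element with simple shifts and masks.
The helper types and functions below are illustrative only.

// Illustrative helpers for the XR element layout shown above.
typedef struct XR_CHANNELS {
    UINT Red;     // bits 9:0
    UINT Green;   // bits 19:10
    UINT Blue;    // bits 29:20
    UINT Alpha;   // bits 31:30 (2-bit UNORM alpha)
} XR_CHANNELS;

XR_CHANNELS UnpackXR(UINT32 Element)
{
    XR_CHANNELS c;
    c.Red   = Element & 0x3FF;
    c.Green = (Element >> 10) & 0x3FF;
    c.Blue  = (Element >> 20) & 0x3FF;
    c.Alpha = (Element >> 30) & 0x3;
    return c;
}

UINT32 PackXR(XR_CHANNELS c)
{
    return (c.Red & 0x3FF) |
           ((c.Green & 0x3FF) << 10) |
           ((c.Blue & 0x3FF) << 20) |
           ((c.Alpha & 0x3) << 30);
}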



XR Format Alpha Content
This section applies only to Windows 7 and later operating systems.


As you can determine from the format names in the table in the Details of the Extended Format section, the alpha
channel is interpreted as a UNORM value that is in the range [0,1] and should be treated identically to the alpha
channel of the existing format DXGI_FORMAT_R10G10B10A2_UNORM.
DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM
This section applies only to Windows 7 and later operating systems.


The DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM format requires the application to be aware of the biased
nature of data that is related to the format. As can be seen from the conversion rules in the following sections, a
shader must be aware of XR_BIAS and must perform its own bias and scale on any data that is read from or written
to the DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM format.
Scan-out hardware must be able to apply the bias and scale.
The DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM format has only the display scan-out, CPU lockable, and
"cast within bit layout" resource attributes. Therefore, to render to a resource, the application typically creates a
render target view of format DXGI_FORMAT_R10G10B10A2_*.
For full functionality, the display miniport driver must support XR_BIAS as a display format. The new
D3DDDIFMT_A2B10G10R10_XR_BIAS value was added to the D3DDDIFORMAT enumeration for XR_BIAS support.
Casting Ability of XR Formats
This section applies only to Windows 7 and later operating systems.


The DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM format is a member of the
DXGI_FORMAT_R10G10B10A2_TYPELESS family. Therefore, an application can cast the
DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM format through the API-level concept of "views" to any other
member of that family. This procedure is the expected way that an application renders to a resource. Specifically,
the Direct3D runtime can only scan out and copy (through the driver's BltDXGI function) a resource of format
XR_BIAS. Therefore, to render to the resource, an application typically creates a view of format
DXGI_FORMAT_R10G10B10A2_UNORM.
XR_BIAS to Float Conversion Rules
This section applies only to Windows 7 and later operating systems.


The following code shows how to convert XR_BIAS to float:

float XRtoFloat( UINT XRComponent )
{
    // The & 0x3ff shows that only 10 bits contribute to the conversion.
    // The cast to int keeps the subtraction signed for encodings below 0x180.
    return (float)( (int)(XRComponent & 0x3ff) - 0x180 ) / 510.f;
}
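
As a quick check of this conversion: XRtoFloat(0x180) is 0.0f, XRtoFloat(0x3FF) is approximately 1.2529f, and
XRtoFloat(0x0) is approximately -0.7529f, which matches the dynamic range described in XR Layout and the clamp
endpoints in the float-to-XR_BIAS rules that follow.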



Float to XR_BIAS Conversion Rules
This section applies only to Windows 7 and later operating systems.


The following rules apply for converting float to XR_BIAS. In these rules, suppose that the starting float value is c.
If c is NaN, the result is 0; otherwise, the following rules apply. NaN stands for "not a number," which means
a symbolic entity that represents a value not otherwise available in floating-point format.
Perform the following operation to convert from float scale to integer scale:
c = c * 510
The preceding operation might induce overflow.
Perform the following operation for bias:
c = c + 384
The preceding operation might induce overflow.
Perform one of the following operations to clamp, depending on the exponent of c:
If, post bias, the exponent of c is greater than or equal to 2 (>= 2 or c is INF), the result is 0x3ff, which is
approximately equivalent to 1.2529.
If, post bias, the exponent of c is less than 0 (< 0 or c is -INF), the result is 0x0, which represents
approximately -0.7529.
Re-interpret the most significant 10 bits of the mantissa of c as the result.
The conversion of float to XR_BIAS is permitted tolerance of 0.6f Unit-Last-Place (ULP) on the XR side. This
tolerance means that after converting from float to XR, any value within 0.6f ULP of a represent-capable target
format value is permitted to map to that value. Note that 1 ULP of the infinitely precise result means that, for
example, an implementation is permitted to truncate results to 32-bit rather than perform round-to-nearest-even,
as that would result in an error of at most one unit in the last (least significant) place that is represented in the
floating-point number.
The standard Direct3D version 10 requirement for inverting data also applies.
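The following is a simplified sketch of a float-to-XR_BIAS conversion. It honors the scale, bias, NaN rule, and clamp
endpoints above, but it simply rounds to nearest, which falls within the stated 0.6 ULP tolerance, rather than
reproducing the exponent and mantissa re-interpretation described in the rules; FloatToXRBias is a hypothetical
helper name.

// Simplified, illustrative float-to-XR_BIAS conversion.
// Applies the scale (510), bias (384), NaN rule, and clamp endpoints above,
// and rounds to nearest, which is within the permitted 0.6 ULP tolerance.
UINT FloatToXRBias(float c)
{
    if (c != c)            // NaN maps to 0
    {
        return 0;
    }
    c = c * 510.0f;        // convert from float scale to integer scale (might overflow)
    c = c + 384.0f;        // apply the bias (might overflow)
    if (c >= 1023.0f)      // clamp high (includes +INF); 0x3FF is approximately 1.2529
    {
        return 0x3FF;
    }
    if (c <= 0.0f)         // clamp low (includes -INF); 0x0 is approximately -0.7529
    {
        return 0x000;
    }
    return (UINT)(c + 0.5f);   // round to the nearest representable value
}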
Conversion from BGR8888 to XR_BIAS
This section applies only to Windows 7 and later operating systems.


The conversion from BGR8888-type formats (for example, DXGI_FORMAT_B8G8R8A8_UNORM) to XR_BIAS is
lossless.
The scale factor of 510 is explicitly chosen to provide a cleanly invertible conversion between a BGR8888-type
format and XR_BIAS without causing the nonlinear jump near 0.5 that would be implied by a scale factor of 511.
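To see why the conversion is lossless, note that an 8-bit UNORM channel value v represents v/255; scaling by 510 and
biasing by 384 therefore yields the exact integer 2*v + 384, which always fits in the 10-bit XR channel. The helper
below is illustrative only.

// Illustrative: convert one 8-bit UNORM channel value to its exact XR_BIAS encoding.
// (v / 255) * 510 + 384 == 2 * v + 384, an integer in the range [384, 894].
UINT Unorm8ToXRBias(UINT v /* 0..255 */)
{
    return 2 * v + 384;
}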
Interpretation of X Channel
This section applies only to Windows 7 and later operating systems.


The user-mode display driver should read the X channel in all formats that include X (for example,
DXGI_FORMAT_B8G8R8X8_UNORM) as 1.0f when such formats are presented to filtering hardware or the blender.
The X channel must be copied unmodified when data is moved outside of the 3-D pipeline (that is, when an
application calls the ID3D10Device::CopyResource, ID3D10Device::CopySubresourceRegion, or
ID3D10Device::UpdateSubResource method). For more information about these methods, see the DirectX SDK
documentation.
Fully-Typed Back Buffers Casting
This section applies only to Windows 7 and later operating systems.


Consider resources that are created through a call to the driver's CreateResource(D3D10) function with the
Format member of the D3D10DDIARG_CREATERESOURCE structure set to a format of family
DXGI_FORMAT_R8G8B8A8_TYPELESS, DXGI_FORMAT_B8G8R8A8_TYPELESS or
DXGI_FORMAT_R10G10B10A2_TYPELESS and with the D3D10_DDI_BIND_PRESENT value set in the BindFlags
member of D3D10DDIARG_CREATERESOURCE. The Direct3D version 10.1 runtime can subsequently create
views (render target or shader resource) on these resources by using any of the fully-typed members of the
appropriate family (for example, DXGI_FORMAT_B8G8R8A8_UNORM_SRGB for the
DXGI_FORMAT_B8G8R8A8_TYPELESS family), even if the original resource is created as fully typed. If
D3D10_DDI_BIND_PRESENT is not set for the resource, this re-casting is not allowed, as is the case for all fully-
typed resources in Direct3D version 10.
This change for Direct3D version 10.1 allows applications to re-view a DXGI_FORMAT_R8G8B8A8_UNORM back
buffer as DXGI_FORMAT_R8G8B8A8_UNORM_SRGB and vice versa. This change also allows applications to cast a
DXGI_FORMAT_B8G8R8A8_UNORM_SRGB back buffer to DXGI_FORMAT_B8G8R8A8_UNORM and to re-view
DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM as DXGI_FORMAT_R10G10B10A2_* for rendering.
BGRA Scan-Out Support
This section applies only to Windows 7 and later operating systems.


The scan-out bit is turned on for the DXGI_FORMAT_B8G8R8A8_UNORM and
DXGI_FORMAT_B8G8R8A8_UNORM_SRGB formats. Therefore, the user-mode display driver should be able to
perform the following operations:
Handle requests for the primary surface that are in these formats.
Handle calls to its SetDisplayMode function for resources that are created with these formats.
Handle calls to its PresentDXGI function to present these formats through both bit-block transfer (bitblt)
and flip operations.
Handle calls to its BltDXGI function to copy these formats through stretch, rotate, and resolve (in fact, all the
bitblt operations that are expected for the RGBA variants).
Extended Format Aware Requirements
This section applies only to Windows 7 and later operating systems.


User-mode display drivers that are extended format aware guarantee to return accurate values from their
CheckFormatSupport entry-point function for every format in the table in the Details of the Extended Format
section. However, drivers do not necessarily support every format.
Extended format aware drivers implicitly guarantee that casting of fully-typed back buffers is supported.
Extended format aware drivers implicitly support all of the BGRX and BGRA formats with capabilities as defined in
the table in the Details of the Extended Format section.
Extended format aware drivers implicitly support BGRA and BGRA_SRGB scan out as described in the BGRA Scan-
Out Support section.
If an extended format aware driver returns any support bits for any of the new formats, it must return all of the bits
that are required in the table in the Details of the Extended Format section. The driver cannot return any bits that
are not required in the table.
Claiming Support under Direct3D Version 10.1
The Direct3D 10.1 and later DDIs are updated to allow the user-mode display driver to claim support for two new
versions. One version corresponds to drivers that want to support feature level 10.0, and the other version
corresponds to drivers that want to support feature level 10.1. The following are the new version definitions:

// D3D10.0 or D3D10.1 with extended format support (but not Windows 7 scheduling)
#define D3D10_0_x_DDI_BUILD_VERSION 10
#define D3D10_0_x_DDI_SUPPORTED ((((UINT64)D3D10_0_DDI_INTERFACE_VERSION) << 32) |
(((UINT64)D3D10_0_x_DDI_BUILD_VERSION) << 16))
#define D3D10_1_x_DDI_BUILD_VERSION 10
#define D3D10_1_x_DDI_SUPPORTED ((((UINT64)D3D10_1_DDI_INTERFACE_VERSION) << 32) |
(((UINT64)D3D10_1_x_DDI_BUILD_VERSION) << 16))

XR_BIAS and PresentDXGI


Drivers are not required to support windowed present of XR_BIAS resources through calls to their PresentDXGI
functions. These cases are restricted at the runtime level. As with all other formats, drivers perform full-screen
present of XR_BIAS through either a flip operation or a bit-block transfer (bitblt) operation with an identical source
and destination resource. No stretch or conversion is necessary.
XR_BIAS and BltDXGI
The Direct3D runtime calls a driver's BltDXGI function to perform only the following operations on XR_BIAS source
resources:
A copy to a destination that is also XR_BIAS
A copy of unmodified source data
A stretch in which point sample is acceptable
A rotation
Because XR_BIAS does not support Multiple Sample Anti Aliasing (MSAA), drivers are not required to resolve
XR_BIAS resources.
DDI Changes for Direct3D Version 9 Drivers
This section applies only to Windows 7 and later operating systems.


XR_BIAS is the only new extended format ability that Windows 7 makes available to user-mode display drivers that
only support the Direct3D version 9 DDI.
Such a user-mode display driver can indicate that it supports the D3DDDIFMT_A2B10G10R10_XR_BIAS format
value from the D3DDDIFORMAT enumeration. The driver indicates such support by creating an entry in the array
of populated FORMATOP structures in the pData member of the D3DDDIARG_GETCAPS structure that the driver
returns from a call to its GetCaps function with the D3DDDICAPS_GETFORMATDATA value set in the Type
member of D3DDDIARG_GETCAPS. This entry should indicate, in the Operations member of FORMATOP, all of
the typical operations that the runtime can perform on surfaces with the D3DDDIFMT_A2B10G10R10_XR_BIAS
format. For example, the driver should set the FORMATOP_*_RENDERTARGET bits in Operations. The driver must
also set the FORMATOP_DISPLAYMODE and FORMATOP_3DACCELERATION bits in Operations.
If the driver returns a FORMATOP entry for the D3DDDIFMT_A2B10G10R10_XR_BIAS format, the driver can
subsequently receive calls to its CreateResource function to create resources with the
D3DDDIFMT_A2B10G10R10_XR_BIAS format set in the Format member of the D3DDDIARG_CREATERESOURCE
structure.
The driver only receives requests to create resources with the D3DDDIFMT_A2B10G10R10_XR_BIAS format for full-
screen flipping chains. The Desktop Window Manager (DWM) handles windowed presentation of XR_BIAS in
shader code. The driver should treat D3DDDIFMT_A2B10G10R10_XR_BIAS-format resources as the
D3DDDIFMT_A2B10G10R10 format in all operations except scan-out. For example, the driver can treat
D3DDDIFMT_A2B10G10R10_XR_BIAS-format resources as the D3DDDIFMT_A2B10G10R10 format for blending,
filtering, and format-conversion operations. The only difference is how XR_BIAS affects scan-out. For more
information about scan-out, see BGRA Scan-Out Support.
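The following sketch shows how such a FORMATOP entry might look. The FORMATOP_DISPLAYMODE and
FORMATOP_3DACCELERATION bits come from the requirement above; the remaining Operations bits are only examples of
"typical operations" and vary with the hardware.

// Illustrative FORMATOP entry for D3DDDIFMT_A2B10G10R10_XR_BIAS, returned from
// GetCaps with D3DDDICAPS_GETFORMATDATA. The exact set of operation bits is per-hardware.
FORMATOP XrBiasFormatOp = {
    D3DDDIFMT_A2B10G10R10_XR_BIAS,        // Format
    FORMATOP_DISPLAYMODE |                // required for XR_BIAS support
    FORMATOP_3DACCELERATION |             // required for XR_BIAS support
    FORMATOP_TEXTURE |                    // example of a typical operation
    FORMATOP_OFFSCREEN_RENDERTARGET,      // example render-target support
    0,                                    // FlipMsTypes
    0,                                    // BltMsTypes
    0                                     // PrivateFormatBitCount
};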
Supporting Direct3D Version 11
This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections describe the new features of Direct3D version 11 and how to support and use the Direct3D
Version 11 DDI:
Enabling Support for the Direct3D Version 11 DDI
Initializing Communication with the Direct3D Version 11 DDI
Pipelines for Direct3D Version 11
Supporting Threading, Command Lists, and 3-D Pipeline
Changes from Direct3D 10
Supporting Deferred Contexts
Supporting Command Lists
Conforming to the DXGI DDI
Enabling Support for the Direct3D Version 11 DDI
This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
To enable support for a user-mode display driver DLL's version 11 DDI, the INF file that installs the display drivers
for a graphics device must list the name of the DLL regardless of whether the Direct3D version 11 DDI exists in the
same DLL as the Direct3D version 9 DDI and Direct3D version 10 DDI or in a separate DLL.
The Installation Requirements for Display Miniport and User-Mode Display Drivers section describes how a user-
mode display driver is installed and used according to the Windows Vista display driver model. To also enable
support for the Direct3D version 11 DDI, you must specify the name of the DLL that contains the version 11 DDI as
the third entry in the list of user-mode display driver names even if the version 11 DDI exists in the same DLL as
the version 9 and 10 DDIs.
You can use the same user-mode display driver DLL name in multiple locations to unify your driver
implementation. In fact, the design of the Direct3D version 10 and version 11 DDIs strongly supports a shared
implementation of Direct3D version 10 and Direct3D version 11 drivers.
The following example shows how support for the version 11 DDI is enabled if the version 11 DDI is contained in
Umd11.dll (that is, a separate DLL from the version 9 and 10 DDIs):

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDriverName, %REG_MULTI_SZ%, umd9.dll, umd10.dll, umd11.dll
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, umd9, umd10, umd11

The following example shows how support for the version 11 DDI is enabled if the version 11 DDI is contained in
Umd.dll (that is, a shared implementation of Direct3D version 9, 10 and 11 drivers):

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDriverName, %REG_MULTI_SZ%, umd.dll, umd.dll, umd.dll
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, umd, umd, umd



Initializing Communication with the Direct3D Version 11 DDI
This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
To initialize communication with the user-mode display driver DLL's version 11 DDI, the Direct3D version 11
runtime first loads the DLL if the DLL is not yet loaded. The Direct3D runtime next calls the user-mode display
driver's OpenAdapter10_2 function through the DLL's export table to open an instance of the graphics adapter.
The OpenAdapter10_2 function is the DLL's only exported function.
Note The OpenAdapter10_2 function is identical to the OpenAdapter10 function except that OpenAdapter10_2
returns a table of the driver's adapter-specific functions in the pAdapterFuncs_2 member of the
D3D10DDIARG_OPENADAPTER structure, and OpenAdapter10 returns a table of the driver's adapter-specific
functions in the pAdapterFuncs member of D3D10DDIARG_OPENADAPTER. pAdapterFuncs_2 points to a
D3D10_2DDI_ADAPTERFUNCS structure; pAdapterFuncs points to a D3D10DDI_ADAPTERFUNCS structure.
OpenAdapter10_2 was designed to make initializing drivers more efficient. You must implement
OpenAdapter10_2 in your Direct3D version 11 drivers. You can also implement OpenAdapter10_2 (rather than
or in addition to OpenAdapter10) in your Direct3D version 10.1 drivers to increase the initialization efficiency of
those drivers. For more information about implementing OpenAdapter10_2 in Direct3D version 10.1 drivers, see
Version Discovery Support. OpenAdapter10_2 handles the exchange of versioning and other information
between the runtime and the driver.
Versioning
OpenAdapter10_2 and the driver's adapter-specific functions change the way versioning between the Direct3D
API and Direct3D DDI is handled from the way that Direct3D 10 handled versioning (for more information about
how Direct3D 10 handles versioning, see Initializing Communication with the Direct3D Version 10 DDI). Instead of
the Direct3D API relying on failure of the driver's OpenAdapter10_2 function to indicate no support for a
particular version (as was the case with OpenAdapter10), the driver must explicitly list the DDI versions it supports. The
Direct3D runtime calls the user-mode display driver's GetSupportedVersions function (one of the driver's
adapter-specific functions) to query for the DDI versions that the driver supports.
There are at least two new DDI versions for the Direct3D 11 DDI functions. Each DDI version distinguishes whether
the DDI runs on Windows Vista or Windows 7. However, support of the Direct3D 11 DDI does not necessarily
indicate full support of the hardware features that are associated with D3D_FEATURE_LEVEL_11. Drivers can
support the new threading features of the Direct3D 11 DDI with hardware that does not support the other features
that are exposed by the Direct3D 11 DDI, like tessellation, and so on. The following code shows how each DDI
version is distinguished:
// D3D11.0 on Vista
#define D3D11_DDI_MAJOR_VERSION 11
#define D3D11_0_DDI_MINOR_VERSION ...
#define D3D11_0_DDI_INTERFACE_VERSION \
((D3D11_DDI_MAJOR_VERSION << 16) | D3D11_0_DDI_MINOR_VERSION)
#define D3D11_0_DDI_BUILD_VERSION ...
#define D3D11_0_DDI_SUPPORTED \
((((UINT64)D3D11_0_DDI_INTERFACE_VERSION) << 32) | \
(((UINT64)D3D11_0_DDI_BUILD_VERSION) << 16))

// D3D11.0 on Windows 7
#define D3D11_0_7_DDI_MINOR_VERSION ...
#define D3D11_0_7_DDI_INTERFACE_VERSION \
((D3D11_DDI_MAJOR_VERSION << 16) | D3D11_0_7_DDI_MINOR_VERSION)
#define D3D11_0_7_DDI_BUILD_VERSION ...
#define D3D11_0_7_DDI_SUPPORTED \
((((UINT64)D3D11_0_7_DDI_INTERFACE_VERSION) << 32) | \
(((UINT64)D3D11_0_7_DDI_BUILD_VERSION) << 16))

#ifndef IS_D3D11_WIN7_INTERFACE_VERSION
#define IS_D3D11_WIN7_INTERFACE_VERSION( i ) (D3D11_0_7_DDI_INTERFACE_VERSION == i)
#endif

Information Exchange
In addition to specifying version information, the driver's OpenAdapter10_2 function also exchanges other
information between the runtime and the driver.
In the call to the driver's OpenAdapter10_2 function, the runtime supplies the pfnQueryAdapterInfoCb adapter
callback function in the pAdapterCallbacks member of the D3D10DDIARG_OPENADAPTER structure. The user-
mode display driver should call the pfnQueryAdapterInfoCb adapter callback function to query for the graphics
hardware capabilities from the display miniport driver.
The runtime calls the user-mode display driver's CreateDevice(D3D10) function (one of the driver's adapter-
specific functions) to create a display device for handling a collection of render state and to complete the
initialization. When the initialization is complete, the Direct3D version 11 runtime can call the display driver-
supplied Direct3D version 11 functions, and the user-mode display driver can call the runtime-supplied functions.
The user-mode display driver's CreateDevice(D3D10) function is called with a D3D10DDIARG_CREATEDEVICE
structure whose members are set up in the following manner to initialize the user-mode display driver's version 11
DDI (a driver-side sketch follows this list):
The runtime sets Interface to the version of the interface that the runtime requires from the user-mode
display driver.
The runtime sets Version to a number that the driver can use to identify when the runtime is built. For
example, the driver can use the version number to differentiate between a runtime released with Windows
Vista and a runtime released with a subsequent service pack, which might contain a fix that the driver
requires.
The runtime sets hRTDevice to specify the handle that the driver should use when the driver calls back into
the runtime.
The runtime sets hDrvDevice to specify the handle that the runtime uses in subsequent driver calls.
The runtime supplies a table of its device-specific callback functions in the D3DDDI_DEVICECALLBACKS
structure to which pKTCallbacks points. The user-mode display driver calls the runtime-supplied callback
functions to access kernel-mode services in the display miniport driver.
The user-mode display driver returns a table of its device-specific functions in the
D3D11DDI_DEVICEFUNCS structure to which p11DeviceFuncs points.
The runtime supplies a DXGI_DDI_BASE_ARGS structure to which DXGIBaseDDI points. The runtime and
the user-mode display driver supply their DirectX Graphics Infrastructure DDI to this structure.
The runtime sets hRTCoreLayer to specify the handle that the driver should use when the driver calls back
into the runtime to access core Direct3D 10 functionality (that is, in calls to the functions that the
p11UMCallbacks member specifies).
The runtime supplies a table of its core callback functions in the
D3D11DDI_CORELAYER_DEVICECALLBACKS structure to which p11UMCallbacks points. The user-mode
display driver calls the runtime-supplied core callback functions to refresh state.
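The following sketch shows the driver side of this exchange. It is illustrative only: the individual
D3D11DDI_DEVICEFUNCS member names are not spelled out in this section, so the two assignments shown (pfnDraw and
pfnFlush) stand in for filling the entire function table, and MyDraw/MyFlush are hypothetical driver functions.

// Sketch only: shows how a driver might consume D3D10DDIARG_CREATEDEVICE when the
// version 11 DDI is requested. The pfn names and MyDraw/MyFlush are illustrative.
HRESULT APIENTRY CreateDevice(D3D10DDI_HADAPTER hAdapter, D3D10DDIARG_CREATEDEVICE* pCreateData)
{
    // Record the runtime's handles and callback tables in the driver's private
    // device object for later use:
    //   pCreateData->hRTDevice, pCreateData->hRTCoreLayer,
    //   pCreateData->pKTCallbacks, pCreateData->p11UMCallbacks

    // Fill in the version 11 device function table that the runtime will call.
    // (Every member of D3D11DDI_DEVICEFUNCS must be populated; two are shown.)
    pCreateData->p11DeviceFuncs->pfnDraw  = MyDraw;
    pCreateData->p11DeviceFuncs->pfnFlush = MyFlush;
    // ...

    // Also supply the driver's DirectX Graphics Infrastructure DDI through
    // pCreateData->DXGIBaseDDI.

    return S_OK;
}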
Pipelines for Direct3D Version 11
This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The graphics rendering pipeline for Direct3D version 11 is expanded from the graphics rendering pipeline for
Direct3D version 10. In addition to the shared programmable shader cores that Direct3D version 10 supported,
Direct3D version 11 also supports hull, domain, and compute shader cores.
Direct3D version 11 actually supports two separate pipelines: the draw pipeline (graphics rendering pipeline) and
the dispatch pipeline (compute shader pipeline). The draw and dispatch pipelines are technically loosely connected
in the sense that you cannot have the same subresource bound for writing in both pipelines simultaneously, or
bound for writing in one pipeline and for reading in the other pipeline.
The following figure shows the functional block of the draw pipeline for Direct3D version 11.

The following figure shows the functional block of the dispatch pipeline for Direct3D version 11.

The following sections describe the new-for-Direct3D 11 blocks that are shown in the preceding figures.
Hull Shader
The hull shader operates once per patch. You can use the hull shader with patches from the input assembler. The
hull shader can transform input control points that make up a patch into output control points. The hull shader can
perform other setup for the fixed-function tessellator stage. For example, the hull shader can output tess factors,
which are numbers that indicate how much to tessellate.
The Direct3D runtime calls the following driver functions to create, set up, and destroy the hull shader:
CalcPrivateShaderSize
CalcPrivateTessellationShaderSize
CreateHullShader
DestroyShader
HsSetShaderResources
HsSetShader
HsSetSamplers
HsSetConstantBuffers
HsSetShaderWithIfaces
Tessellator
The tessellator is a fixed-function unit whose operation is defined by declarations in the hull shader. The tessellator
operates once per patch that is output by the hull shader. The hull shader generates tess factors, which are
numbers that notify the tessellator how much to tessellate (generate geometry and connectivity) over the domain
of the patch.
The Direct3D runtime calls the driver's CalcPrivateTessellationShaderSize function to calculate the size of the
memory region for a hull or domain shader.
Domain Shader
The domain shader is invoked once per vertex that the tessellator generates. Each invocation is identified by its
coordinate on a generic domain. The role of the domain shader is to turn that coordinate into something tangible
(such as a point in 3-D space) for use downstream of the domain shader. Each domain shader invocation for a patch
also has shared access to all of the hull shader output for that patch (such as output control points).
The Direct3D runtime calls the following driver functions to create, set up, and destroy the domain shader:
CalcPrivateTessellationShaderSize
CreateDomainShader
DestroyShader
DsSetShaderResources
DsSetShader
DsSetSamplers
DsSetConstantBuffers
DsSetShaderWithIfaces
Compute Shader
The compute shader allows the GPU to be viewed as a generic grid of data-parallel processors, without any
graphics impediments from the draw pipeline. The compute shader has explicit access to fast shared memory to
facilitate communication between groups of shader invocations. The compute shader also has the ability to
perform scattered reads and writes to memory. The availability of atomic operations enables unique access to
shared memory addresses. The compute shader is not part of the draw pipeline. The compute shader exists on its
own. However, the compute shader exists on the same device as all the other shader stages. The Direct3D runtime
calls the driver's DispatchXxx functions rather than the driver's DrawXxx functions to invoke the compute shader.
The Direct3D runtime calls the following driver functions to create, set up, and destroy the compute shader:
CreateComputeShader
DestroyShader
CsSetShaderResources
CsSetShader
CsSetSamplers
CsSetConstantBuffers
CsSetShaderWithIfaces
CsSetUnorderedAccessViews
Dispatch
DispatchIndirect
Unordered Access Resource Views
Unordered access resource views are read/write resources that you can bind to the compute shader or pixel shader.
The binding of unordered access resource views is similar to how you can bind shader resource views, which are
read-only resources, to any shader stage.
The Direct3D runtime calls the following driver functions to create, set up, and destroy unordered access resource
views:
CalcPrivateUnorderedAccessViewSize
CreateUnorderedAccessView
DestroyUnorderedAccessView
ClearUnorderedAccessViewFLOAT
ClearUnorderedAccessViewUINT
CopyStructureCount
SetRenderTargets(D3D11)
Supporting Threading, Command Lists, and 3-D Pipeline
This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
A user-mode display driver indicates the new Direct3D version 11 capabilities that it supports (for example,
threading, command lists, and 3-D pipeline) when the Direct3D version 11 runtime calls the driver's
GetCaps(D3D10_2) function. GetCaps(D3D10_2) is one of the driver's adapter-specific functions that the driver
provides in the D3D10_2DDI_ADAPTERFUNCS structure that the pAdapterFuncs_2 member of the
D3D10DDIARG_OPENADAPTER structure points to. For more information about providing adapter-specific
functions during driver initialization, see Initializing Communication with the Direct3D Version 11 DDI. When its
GetCaps(D3D10_2) function is called, the user-mode display driver provides new Direct3D version 11 capabilities
based on the request type (which is specified in the Type member of the D3D10_2DDIARG_GETCAPS structure
that the GetCaps(D3D10_2) function's pData parameter points to).
Threading and Command Lists
The Direct3D version 11 API requires a mode of operation where it can synchronize the application threads to
ensure that only one of the threads runs in the DDI at a time. The Direct3D version 11 API also requires a mode of
operation with a software emulation of command lists. These modes of operation are required by and leveraged
on prior-version DDIs (such as, the Direct3D version 10 DDI). Therefore, as a development aid for driver writers,
these same modes of operation are extended to exist on the Direct3D version 11 DDI. Driver writers can decide
which modes of operations they would like their drivers to support for the Direct3D version 11 DDI.
All drivers should eventually fully support all types of threading operations (that is, all drivers should eventually
support all the threading capabilities of the D3D11DDI_THREADING_CAPS structure). However, the driver can
require that the API emulate command lists or enforce a single-threaded mode of operation for the driver. The API
must be aware of the driver's threading capabilities during the creation of an API device, but before the creation of
a DDI device. Therefore, the runtime determines the driver's threading capabilities when it calls the driver's
GetCaps(D3D10_2) adapter-specific function with the Type member of D3D10_2DDIARG_GETCAPS set to
D3D11DDICAPS_THREADING. The driver returns a pointer to a D3D11DDI_THREADING_CAPS structure in the
pData member of D3D10_2DDIARG_GETCAPS that identifies the driver's threading capabilities. The driver must
support free-threaded mode (D3D11DDICAPS_FREETHREADED) if the driver also supports command lists
(D3D11DDICAPS_COMMANDLISTS_BUILD_2) because command lists build on free-threaded mode. The driver
must opt-in to support the free-threaded mode and command lists. The application can determine the support that
the driver indicated through the use of the application-level CheckFeatureSupport function and the
D3D11_FEATURE_THREADING constant; however, some applications might not care due to the support that the
API provides.
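A sketch of how a driver might answer the threading capability query follows. It assumes that
D3D11DDI_THREADING_CAPS exposes its flags through a Caps member; a driver that cannot run free-threaded would
instead report 0 so that the runtime synchronizes calls and emulates command lists on its behalf.

// Sketch only: report threading capabilities from GetCaps(D3D10_2).
// The Caps member name of D3D11DDI_THREADING_CAPS is assumed here.
HRESULT APIENTRY GetCaps(D3D10DDI_HADAPTER hAdapter, const D3D10_2DDIARG_GETCAPS* pGetCaps)
{
    switch (pGetCaps->Type)
    {
    case D3D11DDICAPS_THREADING:
        {
            D3D11DDI_THREADING_CAPS* pCaps = (D3D11DDI_THREADING_CAPS*)pGetCaps->pData;
            // Claim free-threaded operation and command-list support; command
            // lists require free-threading, so the flags must be set together.
            pCaps->Caps = D3D11DDICAPS_FREETHREADED | D3D11DDICAPS_COMMANDLISTS_BUILD_2;
            return S_OK;
        }
    // Other request types (for example, D3D11DDICAPS_3DPIPELINESUPPORT) are
    // handled similarly, as described in the next section.
    default:
        return E_FAIL;   // a real driver reports support for every type that it understands
    }
}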
3-D Pipeline Level
Drivers that support the Direct3D version 11 DDI are not required to support all the hardware features of the
Direct3D version 11 DDI. Drivers can support the new threading model of the Direct3D version 11 DDI on top of
hardware that only supports the Direct3D version 10 DDI. The Direct3D version 11 runtime determines the driver's
maximum hardware level of support when the runtime calls the driver's GetCaps(D3D10_2) function with the
Type member of D3D10_2DDIARG_GETCAPS set to D3D11DDICAPS_3DPIPELINESUPPORT. The driver returns a
pointer to a D3D11DDI_3DPIPELINESUPPORT_CAPS structure in the pData member of
D3D10_2DDIARG_GETCAPS that identifies the maximum hardware level of support.
The API does not use just the DDI version as the primary indicator of API feature level support; the API allows the
driver to feed back into this process. The runtime chooses a D3D11DDI_3DPIPELINELEVEL value and feeds back
the value to the driver during device creation in a call to the driver's CreateDevice(D3D10) function, as part of
the Flags member of the D3D10DDIARG_CREATEDEVICE structure.
If the driver supports hardware levels less than Direct3D version 11 on the Direct3D version 11 DDI, there are
minor ramifications to the operation of the driver. The first is that the Direct3D version 11 runtime might never call
many new Direct3D version 11 DDI functions at all. For example, the Direct3D version 11 runtime does not call any
of the new shader-stage DDI functions (like DsSetShader) if the driver supports a hardware feature level that is
less than Direct3D version 11. Other DDI functions follow the rules of the feature level and ignore the fact that the
Direct3D version 11 DDI might be associated with higher capabilities. For example, even though IAVertexInputSlots
for the Direct3D version 11 API is 32, the Direct3D version 10 feature level only allows 16 and that is what the
driver should expect.
Deprecated or converted features present another interesting aspect. Deprecation is not possible at the Direct3D
version 11 DDI level because deprecation must support the ability to express earlier-version DDI functions. For
example, the Direct3D 11 API version of PIPELINESTATS is always constant; however, it requests different
D3D10_DDI_QUERY_DATA_PIPELINE_STATISTICS with Direct3D 10 feature levels and
D3D11_DDI_QUERY_DATA_PIPELINE_STATISTICS with Direct3D 11 feature levels, and so on. Even though the
API attempted to deprecate the text filter size, it is easier for drivers to deprecate the DDI function table entry, in its
entirety, than to attempt to re-use the function table entry for something else.
Even if a driver that supports the Direct3D version 11 DDI does not support the full Direct3D version 11 feature
level, the driver cannot opt-out of "extended format awareness", as described in Supporting Extended Format
Awareness. Because the driver supports the Direct3D version 11 DDI, the driver should handle the following tasks:
Support BGR formats
Correctly respond to calls to its CheckFormatSupport function to check for XR_BIAS support. The driver
should either claim support or deny support.
Allow casting of fully typed back buffers
The Direct3D version 11 API also informs the driver whether the application uses multiple threads through the
D3D11DDI_CREATEDEVICE_FLAG_SINGLETHREADED flag. If this flag is present in the Flags member of the
D3D10DDIARG_CREATEDEVICE structure when a display device is created through a call to the driver's
CreateDevice(D3D10) function, the driver can determine that no deferred contexts are created and that the driver
is not required to synchronize, as concurrent creates do not occur.
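A driver's CreateDevice(D3D10) implementation might record that decision roughly as follows; MY_DEVICE and its members are hypothetical driver-private pieces, while the flag and structure members come from the DDI.

// Sketch: deciding whether create/destroy paths need internal locking.
MY_DEVICE* pDevice = (MY_DEVICE*)pCreateDevice->hDrvDevice.pDrvPrivate;
if (pCreateDevice->Flags & D3D11DDI_CREATEDEVICE_FLAG_SINGLETHREADED)
{
    // No deferred contexts and no concurrent creates will occur.
    pDevice->bSynchronizeCreates = FALSE;
}
else
{
    // Creates, opens, and destroys may be called concurrently.
    pDevice->bSynchronizeCreates = TRUE;
}
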
Changes from Direct3D 10

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The following sections describe how Direct3D 11 has changed from Direct3D 10.
Driver Callback Functions to Kernel-Mode Services
The device-specific callback functions that the Direct3D version 11 runtime supplies in the
D3DDDI_DEVICECALLBACKS structure when the runtime calls the user-mode display driver's
CreateDevice(D3D10) function isolate the driver from kernel handles and kernel function signatures. The
Direct3D version 11 runtime changes the callback semantics and, therefore, the implementation of the callback
functions to support a free-threaded mode of operation, whereas previous Direct3D version runtimes did not
support a free-threaded mode of operation. The rules for free-threaded mode operation apply after the driver
indicates that it supports free-threaded mode (D3D11DDICAPS_FREETHREADED); otherwise, the previous heavily
restricted rules apply. For information about how the driver indicates support for free-threaded mode, see
Threading and Command Lists. The following restrictions still exist for Direct3D version 11:
Only a single thread can work against an HCONTEXT at a time. Existing callback functions that currently use
an HCONTEXT are pfnPresentCb, pfnRenderCb, pfnEscapeCb, pfnDestroyContextCb,
pfnWaitForSynchronizationObjectCb, and pfnSignalSynchronizationObjectCb. Therefore, if more than one thread calls these callback functions and uses the same HCONTEXT, the driver must synchronize the
calls to the callback functions. Satisfying this requirement is quite natural because these callback functions
are likely to be called only from the thread that manipulates the immediate context.
The driver must call the following callback functions only during calls to the following driver functions by
using the same threads that called those driver functions:
pfnAllocateCb
The driver must call pfnAllocateCb on the thread that called the driver's CreateResource(D3D11) function
when shared resources are created. Regular non-shared allocations with the device are fully free-threaded.
pfnPresentCb
The driver must call pfnPresentCb only during calls to the driver's PresentDXGI function.
pfnSetDisplayModeCb
The driver must call pfnSetDisplayModeCb only during calls to the driver's SetDisplayModeDXGI
function.
pfnRenderCb
The driver must call pfnRenderCb on the thread that called the driver's Flush(D3D10) function. This
restriction is quite natural because of the HCONTEXT restrictions.
The pfnDeallocateCb callback function deserves special mention because the driver is not required to call
pfnDeallocateCb before the driver returns from its DestroyResource(D3D10) function for most resource
types. Because DestroyResource(D3D10) is a free-threaded function, the driver must defer destruction of the
object until the driver can efficiently ensure that no existing immediate context reference remains (that is,
the driver must call pfnRenderCb before pfnDeallocateCb). This restriction applies even to shared
resources or to any other callback function that uses HRESOURCE to complement HRESOURCE usage with
pfnAllocateCb. However, this restriction does not apply to primaries. For more information about primary
exceptions, see Primary Exceptions. Because some applications might require the appearance of
synchronous destruction, the driver must ensure that it calls pfnDeallocateCb for any previously destroyed
shared resources during a call to its Flush(D3D10) function. A driver must also clean up any previously destroyed objects (only those that will not stall the pipeline) during a call to its Flush(D3D10) function; the driver must do so to ensure that Flush(D3D10) works as an official mechanism to clean up deferred destroyed objects for those few applications that might require such a mechanism. For more
information about this mechanism, see Deferred Destruction and Flush(D3D10). The driver must also ensure
that any objects for which destruction was deferred are fully destroyed before the driver's
DestroyDevice(D3D10) function returns during cleanup.
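A minimal sketch of this deferral pattern follows. Only the DDI entry-point shapes and the pfnDeallocateCb rule come from the DDI; MY_DEVICE, MY_RESOURCE, and the queue and submission helpers are hypothetical driver-private pieces.

// Sketch: DestroyResource(D3D10) is free-threaded, so the driver only queues the work.
VOID APIENTRY MyDestroyResource(D3D10DDI_HDEVICE hDevice, D3D10DDI_HRESOURCE hResource)
{
    MY_DEVICE*   pDevice   = (MY_DEVICE*)hDevice.pDrvPrivate;
    MY_RESOURCE* pResource = (MY_RESOURCE*)hResource.pDrvPrivate;
    PushDeferredDeallocation(pDevice, pResource);   // thread-safe queue (hypothetical)
}

// Sketch: Flush(D3D10) submits work (pfnRenderCb) and then drains the queue,
// calling pfnDeallocateCb for each object that was destroyed earlier.
VOID APIENTRY MyFlush(D3D10DDI_HDEVICE hDevice)
{
    MY_DEVICE* pDevice = (MY_DEVICE*)hDevice.pDrvPrivate;
    SubmitCommandBuffer(pDevice);                   // wraps pfnRenderCb (hypothetical)
    MY_RESOURCE* pResource;
    while ((pResource = PopDeferredDeallocation(pDevice)) != NULL)
    {
        // No immediate-context reference can remain at this point.
        CallDeallocateCb(pDevice, pResource);       // wraps pfnDeallocateCb (hypothetical)
        FreeDriverResource(pResource);
    }
}
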
Deprecate Ability to Allow Modification of Free-Threaded DDIs
For Direct3D version 11, the API-level concept of a display device and an immediate context are still bundled
together at the DDI level by the legacy concept of a display device. This bundling of display device and immediate
context maximizes compatibility with prior-version DDIs (such as, the Direct3D version 10 DDI) and reduces driver
churn when supporting multiple versions of APIs through multiple versions of DDIs. However, this bundling of
display device and immediate context results in a more confusing DDI because the threading domains are not
extremely explicit. Instead, to understand the threading requirements of multiple interfaces and the functions
within those interfaces, driver developers must refer to the documentation.
A primary feature of the Direct3D version 11 API is that it allows multiple threads to enter create and destroy
functions simultaneously. Such a feature is incompatible with allowing the driver to swap out the function table
pointers for create and destroy, as the Direct3D version 10 DDI semantics for functions that are specified in
D3D10DDI_DEVICEFUNCS and D3D10_1DDI_DEVICEFUNCS allowed. Therefore, after the driver passes back the
function pointers for creates (CreateDevice(D3D10)), the driver should not attempt to change behavior by
modifying these particular function pointers when the driver runs under the Direct3D version 11 DDI and while the
driver supports DDI threading. This restriction applies to all device functions that start with pfnCreate, pfnOpen,
pfnDestroy, pfnCalcPrivate, and pfnCheck. All the rest of the device functions are strongly associated with the
immediate context. Because a single thread manipulates the immediate context at a time, it is well-defined to
continue to allow the driver to hot-swap immediate context function table entries.
pfnRenderCb Versus pfnPerformAmortizedProcessingCb
The Direct3D version 10 API functions hooked the Direct3D runtime's pfnRenderCb kernel callback function to
perform amortized processing (that is, instead of executing certain operations for every API function call, the driver
performed amortized operations for every so many API function calls). The API typically uses this opportunity to
trim high watermarks and flush out its deferred object destruction queue, among other things.
To allow the kernel callback functions to be as free-threaded as possible for the driver, the Direct3D API no longer
uses pfnRenderCb when the driver supports the Direct3D version 11 DDI. Therefore, drivers that support the
Direct3D version 11 DDI must manually call the pfnPerformAmortizedProcessingCb kernel callback function
from the same thread that entered the driver DDI function after the driver submits a command buffer on the
immediate context (or similar frequency). Because the operation should trim high watermarks, it would be
advantageous to do it before the driver generates command buffer preambles when leveraging the state-refresh
DDI callback functions.
In addition, the driver should be aware of the API amortization issue, and try to balance how often it uses the
pfnPerformAmortizedProcessingCb kernel callback function. On one extreme, the driver might cause over-
processing. For example, if the driver always called pfnPerformAmortizedProcessingCb twice (back-to-back),
possibly due to multiple-engine usage, it would be more efficient for the driver to call
pfnPerformAmortizedProcessingCb only once. On the other extreme, the driver might not allow the Direct3D API to
do any work for a whole frame if the driver never called pfnPerformAmortizedProcessingCb, possibly due to an
alternating frame rendering design. The driver is not required to call pfnPerformAmortizedProcessingCb any more
often than it naturally would, as that is overkill (for example, if the driver did not call
pfnPerformAmortizedProcessingCb in a 1 millisecond timeframe, it must be time to pump the API). The driver is
required to only determine which of the existing pfnRenderCb calls should be accompanied by
pfnPerformAmortizedProcessingCb and, naturally, conform to the threading semantics of the operation.
For drivers that support command lists, those drivers must also call pfnPerformAmortizedProcessingCb from
deferred contexts whenever those drivers run out of room (a similar frequency as every immediate context flush).
The Direct3D version 11 runtime expects to, at least, trim its high-watermarks during such an operation. Because
the threading semantic that is related to pfnRenderCb has been relaxed for Direct3D version 11, concurrency
issues must be solved in order to allow Direct3D version 11 to continue to hook pfnRenderCb, without restriction.
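A sketch of where such a call might sit in the driver is shown below, assuming a hypothetical submission helper; pfnPerformAmortizedProcessingCb itself comes from the core layer callbacks (D3D11DDI_CORELAYER_DEVICECALLBACKS), and its exact signature is shown as the author recalls it.

// Sketch: after submitting a command buffer on the immediate context, give the
// Direct3D API a chance to trim high watermarks and clean up deferred destroys.
static void FlushImmediateContext(MY_CONTEXT* pContext)   // MY_CONTEXT is hypothetical
{
    SubmitCommandBuffer(pContext);   // wraps pfnRenderCb (hypothetical helper)
    pContext->pCoreLayerCallbacks->pfnPerformAmortizedProcessingCb(pContext->hRTCoreLayer);
    EmitStatePreamble(pContext);     // hypothetical: rebuild the command-buffer preamble afterward
}
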
New DDI Error Code
The D3DDDIERR_APPLICATIONERROR error code is created to allow drivers to participate in validation where the
Direct3D version 11 API did not. Previously, if the driver returned the E_INVALIDARG error code, it would cause the
API to raise an exception. The presence of the debug layer would cause debugging output and indicate that the
driver had returned an internal error. The debugging output would suggest to the developer that the driver had a
bug. If the driver returns D3DDDIERR_APPLICATIONERROR, the debug layer determines that the application is at
fault, instead.
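A minimal sketch of reporting such a failure through pfnSetErrorCb follows; the validation check and the private device members are hypothetical, while pfnSetErrorCb and D3DDDIERR_APPLICATIONERROR come from the DDI.

// Sketch: blaming the application rather than the driver for a bad argument.
if (!ValidateCreateArguments(pCreateArgs))   // hypothetical driver-specific check
{
    // The debug layer now attributes the failure to the application.
    pDevice->pUMCallbacks->pfnSetErrorCb(pDevice->hRTCoreLayer, D3DDDIERR_APPLICATIONERROR);
    return;
}
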
Retroactively Requiring Free-Threaded CalcPrivate DDIs
Direct3D version 11 retroactively requires driver functions that begin with pfnCalcPrivate on Direct3D version 10
DDI functions to be free threaded. This retroactive requirement matches the behavior of the Direct3D version 11
DDI to always require pfnCalcPrivate\* and pfnCalcDeferredContextHandleSize functions to be free threaded
even if the driver indicates it does not support DDI threading. For more information about this retroactive
requirement, see Retroactively Requiring Free-Threaded CalcPrivate DDIs.
Deferred Destruction and Flush D3D10
Because all the destroy functions are now free-threaded, the Direct3D runtime cannot flush a command buffer
during destruction. Therefore, the destroy functions must defer the actual destruction of an object until the driver
can ensure that the thread that manipulates the immediate context is no longer dependent on that object to
survive. Each discrete immediate context method cannot efficiently use synchronization to solve this destruction
issue; therefore, the driver should use synchronization only when it flushes a command buffer. The Direct3D
runtime also uses this same design when it must deal with similar issues.
Because destruction is deferred, the Direct3D runtime advocates that applications that cannot tolerate deferred destruction use explicit workaround mechanisms instead. Therefore, the driver must process its
deferred-destruction queue during calls to its Flush(D3D10) function (even if the command buffer is empty) to
ensure that these mechanisms actually work.
Those applications that require a form of synchronous destruction must use one of the following patterns,
depending on how heavyweight a destruction they require:
After the application ensures that all dependencies on that object are released (that is, command lists, views, middleware, and so on), the application uses the following pattern:

Object::Release(); // Final release
ImmediateContext::ClearState(); // Remove all ImmediateContext references as well.
ImmediateContext::Flush(); // Destroy all objects as quickly as possible.

The following pattern is a more heavyweight destruction:

Object::Release(); // Final release
ImmediateContext::ClearState(); // Remove all ImmediateContext references as well.
ImmediateContext::Flush();
ImmediateContext::End( EventQuery );
while( S_FALSE == ImmediateContext::GetData( EventQuery ) ) ;
ImmediateContext::Flush(); // Destroy all objects, completely.

Primary Exceptions
Primaries are resources that the runtime creates in calls to the driver's CreateResource(D3D11) function. The
runtime creates a primary by setting the pPrimaryDesc member of the D3D11DDIARG_CREATERESOURCE
structure to a valid pointer to a DXGI_DDI_PRIMARY_DESC structure. Primaries have the following notable
exceptions in regard to the preceding changes from Direct3D 10 to Direct3D 11:
The driver's CreateResource(D3D11) and DestroyResource(D3D10) functions are not free-threaded for primaries; they share the immediate context threading domain. Concurrency can still exist with
functions that start with pfnCreate and pfnDestroy, which includes CreateResource(D3D11) and
DestroyResource(D3D10). However, concurrency cannot exist with CreateResource(D3D11) and
DestroyResource(D3D10) for primaries. For example, the driver can detect that a call to its
CreateResource(D3D11) or DestroyResource(D3D10) function is for a primary, and thereby determine that it
can safely use or touch immediate context memory for the duration of the function call.
Primary destruction cannot be deferred by the Direct3D runtime, and the driver must call the
pfnDeallocateCb function appropriately within a call to the driver's DestroyResource(D3D10) function.
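A short sketch of how a driver might detect a primary inside CreateResource(D3D11) follows; pPrimaryDesc is the real member of D3D11DDIARG_CREATERESOURCE, and the bookkeeping helper is hypothetical.

// Sketch: primaries are identified by a non-NULL pPrimaryDesc.
BOOL bIsPrimary = (pCreateResource->pPrimaryDesc != NULL);
if (bIsPrimary)
{
    // Not free-threaded for primaries: immediate-context state may be touched here,
    // and pfnDeallocateCb must not be deferred when this resource is destroyed.
    MarkResourceAsPrimary(pResource);   // hypothetical driver bookkeeping
}
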
Supporting Deferred Contexts

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The following sections describe the deferred context feature of Direct3D version 11:
Introduction to Deferred Contexts
Excluding DDI Functions for Deferred Contexts
Mapping on Deferred Contexts
Using Context-Local DDI Handles
Introduction to Deferred Contexts

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
Deferred contexts are used by an application to create command lists. If a user-mode display driver indicates that it
supports command lists through the D3D11DDICAPS_COMMANDLISTS_BUILD_2 flag of the
D3D11DDI_THREADING_CAPS structure, it must also support the ability to create and manipulate deferred
contexts. For more information about how the driver indicates threading capabilities, see Supporting Threading,
Command Lists, and 3-D Pipeline. Deferred contexts differ from the immediate context in that the commands that
the deferred contexts record cannot be executed until the application explicitly requests to execute the commands,
by executing the generated command list. To create and use a deferred context, Direct3D version 11 provides the
following new DDI functions. These functions are a subset of those that are required to create the device/immediate context combination.
AbandonCommandList
CalcPrivateDeferredContextSize
CreateDeferredContext
RecycleCreateDeferredContext
The semantics of the CalcPrivateDeferredContextSize and CreateDeferredContext functions are similar to those of other comparable DDI functions.
The Direct3D runtime passes in a new driver handle and core layer handle for each call to the driver's
CreateDeferredContext function to create each deferred context. The pipeline state of each deferred context must
be equivalent to the pipeline state that the immediate context has after the clear-state operation is performed on it.
The driver must fill members of the D3D11DDI_DEVICEFUNCS structure that the p11ContextFuncs member of
D3D11DDIARG_CREATEDEFERREDCONTEXT structure points to with a subset of the functions from its function
table; the runtime uses each of the corresponding deferred context D3D10DDI_HDEVICE handle values that the
hDrvContext member of D3D11DDIARG_CREATEDEFERREDCONTEXT specifies with this function table.
The driver must continue to provide functions that start with pfnCreate, pfnOpen, and pfnDestroy for the deferred
context. These functions share the same threading semantics as the rest of the deferred context, and they are used
to open and close context-local DDI handles as described in Using Context-Local DDI Handles. Functions that start
with pfnCalcPrivate or pfnCheck are not leveraged for deferred contexts; therefore, the driver can set the members
of D3D11DDI_DEVICEFUNCS for these functions to NULL when the deferred context is created. The majority of
the remaining device functions are leveraged for deferred context support. The driver does not leverage its
QueryGetData function, though. However, the driver leverages its ResourceMap and ResourceUnmap functions.
The driver only supports the ResourceIsStagingBusy function and new DDI functions for Direct3D version 11
resource clamps on the immediate context by using immediate-context handles. For a complete list of the functions
that are not leveraged for deferred contexts, see Excluding DDI Functions for Deferred Contexts.
The driver leverages the core layer callback functions that are provided in the memory block that the
p11UMCallbacks member of D3D11DDIARG_CREATEDEFERREDCONTEXT points to. These core layer callback
functions provide the refresh-state DDI for each deferred context. Most importantly, however, is the addition of the
pfnPerformAmortizedProcessingCb callback function that is described in Changes from Direct3D 10.
The driver should not expect the pfnDisableDeferredStagingResourceDestruction callback function (which the pfnDisableDeferredStagingResourceDestruction member of D3D11DDI_CORELAYER_DEVICECALLBACKS points to) to be valid. The driver should have called
pfnDisableDeferredStagingResourceDestruction within the CreateDevice(D3D10) function for the
device/immediate context; afterward, the driver should never call
pfnDisableDeferredStagingResourceDestruction with the new Direct3D version 11 DDI semantics.
The driver's RecycleCreateDeferredContext function must clear out the pipeline state for the deferred context,
similar to how the driver's CreateDeferredContext clears out the pipeline state for the deferred context. After the
runtime calls the driver's AbandonCommandList, CreateCommandList, or RecycleCreateCommandList, the
runtime can use the deferred context handle with either the driver's DestroyDevice(D3D10) or
RecycleCreateDeferredContext function. For more information about RecycleCreateDeferredContext, see
Optimization for Small Command Lists.
Excluding DDI Functions for Deferred Contexts

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
When the Microsoft Direct3D runtime calls the user-mode display driver's CreateDeferredContext function to
create a deferred context, the driver provides the functions that the runtime can call for that deferred context. The
driver fills members of the D3D11DDI_DEVICEFUNCS structure that the p11ContextFuncs member of the
D3D11DDIARG_CREATEDEFERREDCONTEXT structure points to. The driver provides only a subset of functions
for a deferred context as the driver does for an immediate context.
The driver excludes many functions for deferred contexts by setting the following members of
D3D11DDI_DEVICEFUNCS or D3D11_1DDI_DEVICEFUNCS to NULL:

typedef struct D3D11DDI_DEVICEFUNCS {
...
PFND3D10DDI_RESOURCEMAP pfnStagingResourceMap;
PFND3D10DDI_RESOURCEUNMAP pfnStagingResourceUnmap;
PFND3D10DDI_QUERYGETDATA pfnQueryGetData;
PFND3D10DDI_FLUSH pfnFlush;
PFND3D10DDI_RESOURCEMAP pfnResourceMap;
PFND3D10DDI_RESOURCEUNMAP pfnResourceUnmap;
PFND3D10DDI_RESOURCEISSTAGINGBUSY pfnResourceIsStagingBusy;
PFND3D11DDI_CALCPRIVATERESOURCESIZE pfnCalcPrivateResourceSize;
PFND3D10DDI_CALCPRIVATEOPENEDRESOURCESIZE pfnCalcPrivateOpenedResourceSize;
PFND3D10DDI_OPENRESOURCE pfnOpenResource;
PFND3D11DDI_CALCPRIVATESHADERRESOURCEVIEWSIZE pfnCalcPrivateShaderResourceViewSize;
PFND3D10DDI_CALCPRIVATERENDERTARGETVIEWSIZE pfnCalcPrivateRenderTargetViewSize;
PFND3D11DDI_CALCPRIVATEDEPTHSTENCILVIEWSIZE pfnCalcPrivateDepthStencilViewSize;
PFND3D10DDI_CALCPRIVATEELEMENTLAYOUTSIZE pfnCalcPrivateElementLayoutSize;
PFND3D10_1DDI_CALCPRIVATEBLENDSTATESIZE pfnCalcPrivateBlendStateSize;
PFND3D10DDI_CALCPRIVATEDEPTHSTENCILSTATESIZE pfnCalcPrivateDepthStencilStateSize;
PFND3D10DDI_CALCPRIVATERASTERIZERSTATESIZE pfnCalcPrivateRasterizerStateSize;
PFND3D10DDI_CALCPRIVATESHADERSIZE pfnCalcPrivateShaderSize;
PFND3D11DDI_CALCPRIVATEGEOMETRYSHADERWITHSTREAMOUTPUT pfnCalcPrivateGeometryShaderWithStreamOutput;
PFND3D10DDI_CALCPRIVATESAMPLERSIZE pfnCalcPrivateSamplerSize;
PFND3D10DDI_CALCPRIVATEQUERYSIZE pfnCalcPrivateQuerySize;
PFND3D10DDI_CHECKFORMATSUPPORT pfnCheckFormatSupport;
PFND3D10DDI_CHECKMULTISAMPLEQUALITYLEVELS pfnCheckMultisampleQualityLevels;
PFND3D10DDI_CHECKCOUNTERINFO pfnCheckCounterInfo;
PFND3D10DDI_CHECKCOUNTER pfnCheckCounter;
PFND3D11DDI_CHECKDEFERREDCONTEXTHANDLESIZES pfnCheckDeferredContextHandleSizes;
PFND3D11DDI_CALCDEFERREDCONTEXTHANDLESIZE pfnCalcDeferredContextHandleSize;
PFND3D11DDI_CALCPRIVATEDEFERREDCONTEXTSIZE pfnCalcPrivateDeferredContextSize;
PFND3D11DDI_CREATEDEFERREDCONTEXT pfnCreateDeferredContext;
PFND3D11DDI_CALCPRIVATECOMMANDLISTSIZE pfnCalcPrivateCommandListSize;
PFND3D11DDI_CALCPRIVATETESSELLATIONSHADERSIZE pfnCalcPrivateTessellationShaderSize;
PFND3D11DDI_CALCPRIVATEUNORDEREDACCESSVIEWSIZE pfnCalcPrivateUnorderedAccessViewSize;
PFND3D11DDI_SETRESOURCEMINLOD pfnSetResourceMinLOD;
} D3D11DDI_DEVICEFUNCS;
typedef struct D3D11_1DDI_DEVICEFUNCS {
...
PFND3D10DDI_RESOURCEMAP pfnStagingResourceMap;
PFND3D10DDI_RESOURCEUNMAP pfnStagingResourceUnmap;
PFND3D10DDI_QUERYGETDATA pfnQueryGetData;
PFND3D11_1DDI_FLUSH pfnFlush;
PFND3D10DDI_RESOURCEMAP pfnResourceMap;
PFND3D10DDI_RESOURCEUNMAP pfnResourceUnmap;
PFND3D10DDI_RESOURCEISSTAGINGBUSY pfnResourceIsStagingBusy;
PFND3D11DDI_CALCPRIVATERESOURCESIZE pfnCalcPrivateResourceSize;
PFND3D10DDI_CALCPRIVATEOPENEDRESOURCESIZE pfnCalcPrivateOpenedResourceSize;
PFND3D10DDI_OPENRESOURCE pfnOpenResource;
PFND3D11DDI_CALCPRIVATESHADERRESOURCEVIEWSIZE pfnCalcPrivateShaderResourceViewSize;
PFND3D10DDI_CALCPRIVATERENDERTARGETVIEWSIZE pfnCalcPrivateRenderTargetViewSize;
PFND3D11DDI_CALCPRIVATEDEPTHSTENCILVIEWSIZE pfnCalcPrivateDepthStencilViewSize;
PFND3D10DDI_CALCPRIVATEELEMENTLAYOUTSIZE pfnCalcPrivateElementLayoutSize;
PFND3D11_1DDI_CALCPRIVATEBLENDSTATESIZE pfnCalcPrivateBlendStateSize;
PFND3D10DDI_CALCPRIVATEDEPTHSTENCILSTATESIZE pfnCalcPrivateDepthStencilStateSize;
PFND3D11_1DDI_CALCPRIVATERASTERIZERSTATESIZE pfnCalcPrivateRasterizerStateSize;
PFND3D11_1DDI_CALCPRIVATESHADERSIZE pfnCalcPrivateShaderSize;
PFND3D11_1DDI_CALCPRIVATEGEOMETRYSHADERWITHSTREAMOUTPUT pfnCalcPrivateGeometryShaderWithStreamOutput;
PFND3D10DDI_CALCPRIVATESAMPLERSIZE pfnCalcPrivateSamplerSize;
PFND3D10DDI_CALCPRIVATEQUERYSIZE pfnCalcPrivateQuerySize;
PFND3D10DDI_CHECKFORMATSUPPORT pfnCheckFormatSupport;
PFND3D10DDI_CHECKMULTISAMPLEQUALITYLEVELS pfnCheckMultisampleQualityLevels;
PFND3D10DDI_CHECKCOUNTERINFO pfnCheckCounterInfo;
PFND3D10DDI_CHECKCOUNTER pfnCheckCounter;
PFND3D11DDI_CHECKDEFERREDCONTEXTHANDLESIZES pfnCheckDeferredContextHandleSizes;
PFND3D11DDI_CALCDEFERREDCONTEXTHANDLESIZE pfnCalcDeferredContextHandleSize;
PFND3D11DDI_CALCPRIVATEDEFERREDCONTEXTSIZE pfnCalcPrivateDeferredContextSize;
PFND3D11DDI_CREATEDEFERREDCONTEXT pfnCreateDeferredContext;
PFND3D11DDI_CALCPRIVATECOMMANDLISTSIZE pfnCalcPrivateCommandListSize;
PFND3D11_1DDI_CALCPRIVATETESSELLATIONSHADERSIZE pfnCalcPrivateTessellationShaderSize;
PFND3D11DDI_CALCPRIVATEUNORDEREDACCESSVIEWSIZE pfnCalcPrivateUnorderedAccessViewSize;
PFND3D11DDI_SETRESOURCEMINLOD pfnSetResourceMinLOD;
} D3D11_1DDI_DEVICEFUNCS;

Mapping on Deferred Contexts

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The runtime can map a dynamic resource (through a call to the driver's ResourceMap function) on a deferred
context, as the Direct3D version 11 API ensures that the first use of the mapped dynamic resource is to discard the
previous contents. The best option is to create a new dynamic resource on each discard rather than continually using the
original dynamic resource. The creation of this aliased resource is required to allow operations that are done to the
virtual dynamic resource in the timeline of the deferred context to not affect the operations that are done to the
virtual dynamic resource in the timeline of the immediate context. Remember, that deferred contexts are merely
recording operations that eventually are actualized during a call to the driver's CommandListExecute function.
When a dynamic resource is used, the original intentions of the application are preserved, and write-combined
GPU-accessible memory is provided to the application (that is, the memory is optimized for single-use CPU
upload).
Each resource map can provide the pointers directly to the aliased resource. There is an additional burden on
deferred context recording to implement this type of aliasing. For example, deferred context recording might
require new views to be created for aliased textures. Integrations with driver aliasing are necessary and seem
plausible to do. When a command list is executed, the last context-local created resource (to satisfy the map-discard
calls) must be substituted as the "current" resource that backs the dynamic resource for the immediate context, and
so on.
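The following sketch illustrates that renaming idea for a map-discard on a deferred context. The allocation and bookkeeping helpers are hypothetical, and only the general shape of the ResourceMap entry point and the D3D10_DDI_MAP_WRITE_DISCARD value come from the DDI (treat the exact parameter list as an assumption).

// Sketch: ResourceMap on a deferred context with D3D10_DDI_MAP_WRITE_DISCARD.
// Allocate (or reuse) a context-local alias so recording does not disturb the
// immediate context's copy of the dynamic resource.
VOID APIENTRY MyDeferredResourceMap(
    D3D10DDI_HDEVICE hContext, D3D10DDI_HRESOURCE hResource, UINT Subresource,
    D3D10_DDI_MAP MapType, UINT MapFlags, D3D10DDI_MAPPED_SUBRESOURCE* pMapped)
{
    MY_CONTEXT*  pContext  = (MY_CONTEXT*)hContext.pDrvPrivate;    // hypothetical
    MY_RESOURCE* pResource = (MY_RESOURCE*)hResource.pDrvPrivate;  // hypothetical
    if (MapType == D3D10_DDI_MAP_WRITE_DISCARD)
    {
        MY_ALLOCATION* pAlias = AllocateContextLocalAlias(pContext, pResource); // hypothetical
        pResource->pCurrentRecordingAlias = pAlias;
        pMapped->pData      = pAlias->pCpuAddress;
        pMapped->RowPitch   = pAlias->RowPitch;
        pMapped->DepthPitch = pAlias->DepthPitch;
    }
    // At CommandListExecute time, the last alias becomes the "current" backing
    // allocation of the dynamic resource on the immediate context.
}
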
A call to the driver's ResourceCopy function to copy a resource to a dynamic resource must still be supported
both on the deferred context, after map-discard calls, and on the immediate context after a call to the driver's
CommandListExecute function, where the local deferred context resource is ideally swapped into the immediate
context version of the "current" resource. A call to the driver's ResourceCopy function with dynamic-resource
destinations is not frequently used, so you should use a copy-on-write mechanism. If ResourceCopy is called that
would affect either the dynamic resource on the deferred context after a map-discard call or on the immediate
context that holds a command list local resource as current, a new resource should be conceptually allocated to
provide the new destination of the copy, and the old resource must be copied to the new resource (if the operation
is a ResourceCopyRegion).
Using Context-Local DDI Handles

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
Each object (for example, resource, shader, and so on) has context-local DDI handles.
Suppose an object is used with three deferred contexts. In this situation, four handles refer to the same object (one
handle for each deferred context and another handle for the immediate context). Because each context can be
manipulated by a thread concurrently, a context-local handle ensures that multiple CPU threads do not contend
over similar memory (either intentionally or unintentionally). Context-local handles are also intuitive because the
driver probably must modify much of this data that is logically associated per context anyway (for example, the
object might be bound by the context, and so on).
There is still the distinction of an immediate context handle versus a deferred context handle. In particular, the
immediate context handle is guaranteed to be the first handle that is allocated and the last handle that is destroyed.
The corresponding immediate context handle is provided during "opening" of each deferred context handle to link
them together. There is currently no concept of an object having a per-device DDI handle (that is, a handle that is
created before and destroyed after the immediate context handle, and would only be referenced in order by
context handle creation).
Some handles have dependency relationships with other handles (for example, views have a dependency on their
corresponding resource). The creation and destruction ordering guarantee that exists for the immediate context is
extended to deferred context handles as well (that is, the runtime creates a context-local resource handle before the
runtime creates any context-local view handles to that resource, and the runtime destroys a context-local resource
handle after the runtime destroys all context-local view handles to that resource). When the runtime creates a
context-local handle, the runtime provides the corresponding context-local dependency handles as well.
Driver Data Organization
There are a few concerns about driver data organization that need attention. Like Direct3D version 10, the proper
locality of data can reduce cache misses between the API and driver. The proper locality of data can also prevent
cache thrashing, which occurs when multiple pieces of frequently accessed data all resolve to the same cache index and exhaust the associativity of the cache. The DDI has been designed since Direct3D version 10 to help
avoid such issues from manifesting by the driver informing the API how much memory the driver requires to
satisfy a handle and the API assigning the value of the handle. However, new thread-related concerns impact the
DDI design in the Direct3D version 11 timeframe.
Naturally, context-local handles provide a way to associate object data per-context, which avoids contention issues
between threads. However, since such data is replicated for each deferred context, the size of such data is a major
concern. That provides the natural rationalization to share read-only data between the immediate context handle
and the deferred context handles. During creation of deferred context handles, the immediate context handle is
provided to establish the connection between handles. However, any data that is located off of the deferred context
handles gains locality benefits with API data, and the additional level of indirection to read-only data prevents
locality benefits from extending to the read-only data. Some read-only data can be replicated into each context
handle region if the locality benefits justify the data duplication. However, the memory that backs each deferred
context handle should be considered at such a premium that it might be worthwhile to relocate data that is
nonadjacent from the handle if that data is relatively large and not accessed as frequently as other data. Ideally, the
type of data that is associated with each deferred context handle would be all high-frequency data anyway;
therefore, the data would not be large enough to consider relocation necessary. Naturally, the driver must balance
these conflicting motivations.
In order to make the driver data design efficiently compatible with Direct3D version 10, yet not divergent in
implementation, the read-only data should be located contiguous with (but still segregated from and after) the
immediate context handle data. If the driver uses this design, the driver must be aware that cache-line padding is
required between the immediate context handle data and the read-only data. Because a thread might manipulate
each context handle data frequently (if not concurrently), false-sharing penalties occur between the immediate
context handle data and deferred context handle data if cache-line padding is not used. The driver design must be
cognizant of false-sharing penalties that manifest if pointers are established and traversed regularly between
context handle memory regions.
The Direct3D runtime uses the following Direct3D 11 DDI for deferred context local handles:
The CheckDeferredContextHandleSizes function verifies the sizes of the driver-private memory spaces
that hold the handle data of deferred context handles.
The CalcDeferredContextHandleSize function determines the size of the region of memory for a deferred
context.
For the Direct3D runtime to retrieve the deferred context handle size that is required by the driver, the preceding
DDI functions must be used. Immediately after creation of an object for the immediate context, the runtime calls
CalcDeferredContextHandleSize to query the driver for the amount of storage space that the driver requires to
satisfy deferred context handles to this object. However, the Direct3D API must tune its CLS memory allocator by
determining how many unique handle sizes and their values are accessed; the runtime calls the driver's
CheckDeferredContextHandleSizes function to obtain this information. Therefore, during device instantiation,
the API requests an array of deferred context handle sizes by double polling. The first poll is to request how many
sizes are returned, while the second poll passes in an array to retrieve the value of each size. The driver must
indicate how much memory it requires to satisfy a handle along with which handle type. The driver can return
multiple sizes that are associated with a particular handle type. However, it is undefined for the driver to ever
return a value from CalcDeferredContextHandleSize that was not also correspondingly returned in the
CheckDeferredContextHandleSizes array.
As for creating the DDI handles, the create methods on the deferred context are used. For example, examine the
CreateBlendState(D3D10_1) and DestroyBlendState functions. The HDEVICE naturally points to the appropriate
deferred context (versus the immediate context); other CONST structure pointers are NULL (assuming the object
has no dependencies); and, the D3D10DDI_HRT* handle is a D3D10DDI_H* handle to the corresponding immediate
context object.
For objects that have dependencies (for example, views have a dependency relationship on their corresponding
resource), the structure pointer that provides the dependency handle is not NULL. However, the only valid member
of the structure is the dependency handle; whereas, the rest of the members are filled with zero. As an example, the
D3D11DDIARG_CREATESHADERRESOURCEVIEW pointer in a call to the driver's
CreateShaderResourceView(D3D11) function will not be NULL when the runtime calls this function on a
deferred context. In this CreateShaderResourceView(D3D11) call, the runtime assigns the appropriate context-local
handle for the resource to the hDrvResource member of D3D11DDIARG_CREATESHADERRESOURCEVIEW. The
rest of the members of D3D11DDIARG_CREATESHADERRESOURCEVIEW, though, are filled with zero.
The following example code shows how the Direct3D runtime translates an application's create request and the
first use of deferred context to calls to the user-mode display driver to create immediate versus deferred contexts.
The application's call to ID3D11Device::CreateTexture2D initiates the runtime code in the following "Resource
Create" section. The application's call to ID3D11Device::CopyResource initiates the runtime code in the following
"Deferred Context Resource Usage" section.
// Device Create
IC::pfnCheckDeferredContextHandleSizes( hIC, &u, NULL );
pArray = malloc( u * ... );
IC::pfnCheckDeferredContextHandleSizes( hIC, &u, pArray );

// Resource Create
s = IC::pfnCalcPrivateResourceSize( hIC, &Args );
pICRHandle = malloc( s );
IC::pfnCreateResource( hIC, &Args, pICRHandle, hRTResource );
s2 = IC::pfnCalcDeferredContextHandleSize( hIC, D3D10DDI_HT_RESOURCE, pICRHandle );

// Deferred Context Resource Usage
pDCRHandle = malloc( s2 );
DC::pfnCreateResource( hDC, NULL, pDCRHandle, pICRHandle );

Issues with pfnSetErrorCb


None of the create functions return an error code, even though doing so would have been ideal for the Direct3D version 11 threading model. All of the create functions use pfnSetErrorCb to retrieve error codes back from the driver. To
maximize compatibility with the Direct3D version 10 driver model, new DDI create functions that return error
codes were not introduced. Instead, the driver must continue to use the unified device/immediate context
D3D10DDI_HRTCORELAYER handle with pfnSetErrorCb during the creation functions. When the driver supports
command lists, the driver should use the appropriate pfnSetErrorCb that is associated with the corresponding
context. That is, deferred context errors should go to the particular deferred context call to pfnSetErrorCb with the
corresponding handle, and so on.
Deferred contexts can return E_OUTOFMEMORY through a call to pfnSetErrorCb from DDI functions that
previously only allowed D3DDDIERR_DEVICEREMOVED (like Draw, SetBlendState, and so on), since deferred
context memory demands perpetually grow with each call to a DDI function. The Direct3D API triggers a local
context removal, to assist the driver with such a failure case, which effectively tosses out the partially built
command list. The application continues to determine that it is recording a command list; however, when the
application eventually calls the FinishCommandList function, FinishCommandList returns a failure code of
E_OUTOFMEMORY.
Supporting Command Lists

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows.
The Direct3D runtime uses the following Direct3D 11 DDI for command lists:
The CalcPrivateCommandListSize function determines the size of the user-mode display driver's private
region of memory for a command list.
The CreateCommandList function creates a command list.
The RecycleCommandList function recycles a command list.
The RecycleCreateCommandList function creates a command list and makes a previously unused DDI
handle completely valid again.
The DestroyCommandList function destroys a command list.
The RecycleDestroyCommandList function notifies the driver that lightweight destruction of a command
list is required.
The CommandListExecute function runs a command list.
The semantics for the driver's CommandListExecute, CalcPrivateCommandListSize, CreateCommandList,
and DestroyCommandList functions are mostly self-explanatory, based on other similar DDI functions and the
API documentation for the corresponding DDI.
After the Direct3D runtime successfully calls the driver's CreateCommandList or RecycleCreateCommandList
function on the deferred context that is specified in the hDeferredContext member of the
D3D11DDIARG_CREATECOMMANDLIST structure that the pCreateCommandList parameter points to, the
Direct3D runtime performs the following destruction sequence on the deferred context:
1. The Direct3D runtime "closes" all open deferred object handles. Note that these handles might still appear
bound to the deferred context.
2. The runtime destroys the deferred context.
During the call to CreateCommandList or RecycleCreateCommandList, any calls that the driver makes to the
state-refresh DDI callback functions continue to divulge the current state of the deferred context. However, during
the "closing" and destruction of the deferred context, any calls to the state-refresh DDI reflect that nothing is bound
(that is, immediately after the call to CreateCommandList or RecycleCreateCommandList, everything is implicitly
unbound).
A deferred context can also be abandoned either explicitly by the application or due to an error condition by the
API or the driver. For such cases, the Direct3D runtime performs the following sequence:
1. The Direct3D runtime calls the driver's AbandonCommandList function.
2. The runtime unbinds handles from the deferred context one by one.
3. The runtime "closes" all open deferred object handles.
4. The runtime either recycles or destroys the deferred context.
The preceding sequence is similar to the destruction sequence of an immediate context. The call to the driver's
AbandonCommandList function provides an opportunity for the driver to apply state into whatever the driver
prefers.
During the call to the driver's CommandListExecute function, the driver must transition the state of the deferred
context to make it equivalent to the state when the device was created. This operation is also known as a clear-state
operation. During the call to the driver's CommandListExecute function, however, any calls that the driver makes
to the state-refresh DDI callback functions still reflect the state of what was bound during the last DDI call to a
driver function. During the next DDI call to a driver function, any calls that the driver makes to the state-refresh DDI
callback functions show the current state as completely empty, which reflects the state transition implicit from
CommandListExecute. This fact differs slightly from the typical semantics and behavior of the state-refresh DDI
callback functions. If the driver had called a state-refresh DDI callback function during a call to one of the driver's
SetShader functions, the state-refresh DDI callback function would show as already bound the new shader that is
being bound. This divergence of state-refresh DDI callback behavior provides more flexibility to the driver to reflect
the old state during CommandListExecute.
The Direct3D version 11 API ensures that no query has been both manipulated (that is, had QueryBegin or
QueryEnd called on it) by the command list and been only "begun" by the context that attempts to execute the
command list. The API also ensures that no command list that recorded the map of a dynamic resource is executed
on a context that has the same resource currently mapped. Before an application calls the FinishCommandList
function, the Direct3D runtime calls the driver's QueryEnd and ResourceUnmap DDI functions for any query that is still begun and any dynamic resource that is still mapped, because FinishCommandList implicitly
terminates query ranges and unmaps any mapped resource.
Optimization for Small Command Lists
A memory-recycling optimization for small-memory-amount command lists can be important to reduce
contention among command-list DDI function calls and to reduce the overhead of call processing that is required
for command lists. The processing overhead that is inherent in each command list is significant. This optimization
is meant for command lists where the processing overhead that is required for the command lists dominates the
CPU time and memory space that is required for the command lists. A small-memory-amount command list is, for
example, a single graphics command, like CopyResource. The amount of memory required for CopyResource is
two pointers. However, CopyResource still requires the same amount of command-list call processing as a large-
memory-amount command list. When small-memory-amount command lists are generated at high frequency, the
processing overhead required for the runtime to call the driver's CreateCommandList, DestroyCommandList,
CreateDeferredContext, and DestroyDevice(D3D10) functions (for deferred context) becomes increasingly
important. The memory referred to here is system memory that holds driver data structures, which includes the
memory for DDI handles.
The driver's RecycleCommandList function must notify the driver when driver handles go out-of-use (but are not
yet deleted), and when previously unused driver handles are re-used. This notification applies to both command-
list and deferred-context handles. The only memory the driver must recycle is the memory that the DDI handle
points to. While the objective of RecycleCommandList is to recycle memory that is associated with the handle, for
efficiency the driver has complete flexibility to pick and choose which memory to recycle. The driver cannot change
the size of the region of memory to which the immediate-context command list handle points. This size is the
return value of CalcPrivateCommandListSize. The driver also cannot change the size of the region of memory to
which the context-local command list handle points. This size is the return value of
CalcDeferredContextHandleSize.
The driver's RecycleCreateCommandList and RecycleCreateDeferredContext DDI functions must return out-
of-memory error codes as E_OUTOFMEMORY HRESULT values. These functions do not provide such error codes
through calls to the pfnSetErrorCb function. This driver requirement prevents the runtime from having to use
device-wide synchronization to watch for immediate context errors from these create-type driver functions.
Watching for these errors would be a source of catastrophic contention for small-memory-amount command lists.
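A rough sketch of that contract follows. The HRESULT-returning shape is the point being illustrated; the parameter list mirrors CreateCommandList and is shown as the author recalls it, and the reinitialization helper and private type are hypothetical.

// Sketch: recycle-create reports failure by return value, not via pfnSetErrorCb.
HRESULT APIENTRY MyRecycleCreateCommandList(
    D3D10DDI_HDEVICE hDevice,
    CONST D3D11DDIARG_CREATECOMMANDLIST* pCreateCommandList,
    D3D11DDI_HCOMMANDLIST hCommandList,
    D3D11DDI_HRTCOMMANDLIST hRTCommandList)
{
    MY_COMMANDLIST* pCL = (MY_COMMANDLIST*)hCommandList.pDrvPrivate;   // hypothetical
    if (!ReinitializeCommandListStorage(pCL, pCreateCommandList))      // hypothetical
    {
        return E_OUTOFMEMORY;   // no device-wide synchronization required by the runtime
    }
    return S_OK;
}
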
The distinctions among the driver's RecycleDestroyCommandList, RecycleCommandList, and
RecycleCreateCommandList functions are important. Their features include the following:
RecycleDestroyCommandList
The runtime calls the driver's RecycleDestroyCommandList function to notify the driver that lightweight
destruction is required. That is, the driver should not yet de-allocate the memory for the DDI command-list handle.
The driver's RecycleDestroyCommandList function is free-threaded just like the driver's DestroyCommandList
function.
RecycleCommandList
The driver's RecycleCommandList function informs the driver that the runtime integrated a command-list handle
back into the deferred-context cache. The function then provides the driver with an opportunity to integrate
memory that is associated with the command list back into the deferred-context cache. The runtime calls the
driver's RecycleCommandList function from the deferred-context thread. The RecycleCommandList DDI
function reduces the need for the driver to perform synchronization of its own.
RecycleCreateCommandList
The runtime calls the driver's RecycleCreateCommandList function to make a previously unused DDI handle
completely valid again.
These recycling DDI functions provide optimization opportunities to help recycle resources for small-memory-
amount command lists. The following pseudocode shows the implementation of the runtime through the flow of
function calls from the API to the DDI:

::FinishCommandList()
{
// Empty InterlockedSList, integrating into the cache
Loop { DC::pfnRecycleCommandList }

If (Previously Destroyed CommandList Available)
{ IC::pfnRecycleCreateCommandList }
else
{
IC::pfnCalcPrivateCommandListSize
IC::pfnCreateCommandList
IC::pfnCalcDeferredContextHandleSize(D3D11DDI_HT_COMMANDLIST)
}

Loop { DC::pfnDestroy* (context-local handle destroy) }

IC::pfnRecycleCreateDeferredContext
}
...
Sporadic: DC::pfnCreate* (context-local open during first-bind per CommandList)

CommandList::Destroy()
{
// If DC still alive, almost always recycle:
If (DC still alive)
{ IC::pfnRecycleDestroyCommandList }
Else
{ IC::pfnDestroyCommandList }
// Add to InterlockedSList
}

The following state diagram shows the validity of an immediate-context DDI command-list handle. The green state
represents a handle that can be used with CommandListExecute.
Conforming to the DXGI DDI

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The Direct3D version 11 DDI conforms to the DirectX Graphics Infrastructure (DXGI) DDI's definition for resource
interfaces, device enumeration, and presentation.
Presentation
Because Direct3D version 11 devices must support presentation from any scan-out capable format, user-mode
display drivers will be required to field present operations through their display miniport drivers (kernel-mode
drivers) that call for color conversion from any of the scan-out formats to any other scan-out format and also to
standard GDI scan-out formats. These scan-out formats are known by the following values from the DXGI_FORMAT
enumeration:
DXGI_FORMAT_B5G6R5_UNORM
DXGI_FORMAT_B5G5R5A1_UNORM
DXGI_FORMAT_B8G8R8A8_UNORM
DXGI_FORMAT_B8G8R8X8_UNORM
There are back buffer restrictions with the Direct3D version 11 DDI. If DXGI_USAGE_BACKBUFFER (from the
DXGI_USAGE enumeration) is set, the following are the only other DXGI usages that are allowed:
DXGI_USAGE_SHADERINPUT, which maps to D3D11_BIND_SHADER_RESOURCE
DXGI_USAGE_RENDER_TARGET_OUTPUT, which maps to D3D11_BIND_RENDER_TARGET
Note that no CPU access flags are allowed for back buffers.
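For reference, an application-level swap-chain description that stays within these restrictions might look like the following sketch; hWnd is an assumed existing window handle, and DXGI_USAGE_SHADER_INPUT is the API-level spelling of the DXGI_USAGE_SHADERINPUT usage mentioned above.

// Sketch: back-buffer format and usage limited to the combinations allowed above.
DXGI_SWAP_CHAIN_DESC scd = {};
scd.BufferDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;   // one of the scan-out formats listed above
scd.SampleDesc.Count  = 1;
scd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_SHADER_INPUT;
scd.BufferCount = 2;
scd.OutputWindow = hWnd;
scd.Windowed = TRUE;
scd.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;
// Note: no CPU access usage flags are set on the back buffer.
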
Direct3D 11 video playback improvements

With wider adoption of Microsoft Direct3D 10 technologies in mainstream apps, some app developers want to
treat all content the same. This is challenging to do with video on the Microsoft Direct3D 9 API when all 2-D and 3-
D content is processed through the Direct3D 10 or 11 APIs. Because Windows 8 introduces video on Microsoft
Direct3D 11, applications can use a single API to perform all graphical operations.

Minimum Windows Display Driver Model (WDDM) version: 1.2

Minimum Windows version: 8

Driver implementation—Full graphics and Render only: Mandatory for all WDDM 1.2 drivers with Microsoft Direct3D 10-, 10.1-, 11-, or 11.1-capable hardware (or later)

WHCK requirements and tests: Device.Graphics…DX11 Video Decode FeatureLevel 9; Device.Graphics…DX11 VideoProcessing

These are the key benefits of using Direct3D 11:
Direct3D 11 video simplifies interoperability between Microsoft Media Foundation and Microsoft DirectX
technologies.
Using multiple APIs is harder to program, so using video on Direct3D 11 simplifies the programming experience
and makes the app more efficient. The API provides more flexibility in using decoded and processed video.
The Direct3D 11 API for stereoscopic 3-D video unpacks stereo frames into left- and right-eye images.
It has parity with DirectX Video Acceleration (DXVA) 2.0 and DXVA-HD in decoding and video processing
capabilities.
It works in Session 0 for transcoding scenarios.

Direct3D 11 video device driver interfaces (DDIs)


These device driver interfaces (DDIs) are new or updated for Windows 8:
CalcPrivateCryptoSessionSize
CalcPrivateAuthenticatedChannelSize
CalcPrivateVideoDecoderOutputViewSize
CalcPrivateVideoDecoderSize
CalcPrivateVideoProcessorEnumSize
CalcPrivateVideoProcessorInputViewSize
CalcPrivateVideoProcessorOutputViewSize
CalcPrivateVideoProcessorSize
CheckFormatSupport
CheckVideoDecoderFormat
CheckVideoProcessorFormat
ConfigureAuthenticatedChannel(D3D11_1)
CreateAuthenticatedChannel(D3D11_1)
CreateCryptoSession
CreateResource2
CreateVideoDecoder
CreateVideoDecoderOutputView
CreateVideoProcessor
CreateVideoProcessorEnum
CreateVideoProcessorInputView
CreateVideoProcessorOutputView
CryptoSessionGetHandle
DecryptionBlt(D3D11_1)
DestroyAuthenticatedChannel
DestroyCryptoSession
DestroyVideoDecoder
DestroyVideoDecoderOutputView
DestroyVideoProcessor
DestroyVideoProcessorEnum
DestroyVideoProcessorInputView
DestroyVideoProcessorOutputView
EncryptionBlt(D3D11_1)
FinishSessionKeyRefresh
GetCaptureHandle
GetCertificate
GetCertificateSize
GetContentProtectionCaps
GetCryptoKeyExchangeType
GetEncryptionBltKey
GetVideoDecoderBufferInfo
GetVideoDecoderBufferTypeCount
GetVideoDecoderConfig
GetVideoDecoderConfigCount
GetVideoDecoderProfile
GetVideoDecoderProfileCount
GetVideoProcessorCaps
GetVideoProcessorCustomRate
GetVideoProcessorFilterRange
GetVideoProcessorRateConversionCaps
NegotiateAuthenticatedChannelKeyExchange
NegotiateCryptoSessionKeyExchange
QueryAuthenticatedChannel(D3D11_1)
RetrieveSubObject(D3D11_1)
StartSessionKeyRefresh
VideoDecoderBeginFrame
VideoDecoderEndFrame
VideoDecoderExtension
VideoDecoderGetHandle
VideoDecoderSubmitBuffers
VideoProcessorBlt
VideoProcessorGetOutputExtension
VideoProcessorGetStreamExtension
VideoProcessorInputViewReadAfterWriteHazard
VideoProcessorSetOutputAlphaFillMode
VideoProcessorSetOutputBackgroundColor
VideoProcessorSetOutputColorSpace
VideoProcessorSetOutputConstriction
VideoProcessorSetOutputExtension
VideoProcessorSetOutputStereoMode
VideoProcessorSetOutputTargetRect
VideoProcessorSetStreamAlpha
VideoProcessorSetStreamAutoProcessingMode
VideoProcessorSetStreamColorSpace
VideoProcessorSetStreamDestRect
VideoProcessorSetStreamExtension
VideoProcessorSetStreamFilter
VideoProcessorSetStreamFrameFormat
VideoProcessorSetStreamLumaKey
VideoProcessorSetStreamOutputRate
VideoProcessorSetStreamPalette
VideoProcessorSetStreamPixelAspectRatio
VideoProcessorSetStreamRotation
VideoProcessorSetStreamSourceRect
VideoProcessorSetStreamStereoFormat
D3D10_DDI_RESOURCE_BIND_FLAG
D3D10_DDI_RESOURCE_MISC_FLAG
D3D10DDIARG_CREATEDEVICE
D3D11_1DDI_VIDEO_PROCESSOR_ALPHA_FILL_MODE
D3D11_1DDI_VIDEO_PROCESSOR_AUTO_STREAM_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_COLOR_SPACE
D3D11_1DDI_VIDEO_PROCESSOR_CONTENT_DESC
D3D11_1DDI_VIDEO_PROCESSOR_CONVERSION_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_CUSTOM_RATE
D3D11_1DDI_VIDEO_PROCESSOR_DEVICE_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_FEATURE_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_FILTER
D3D11_1DDI_VIDEO_PROCESSOR_FILTER_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_FILTER_RANGE
D3D11_1DDI_VIDEO_PROCESSOR_FORMAT_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_FORMAT_SUPPORT
D3D11_1DDI_VIDEO_PROCESSOR_ITELECINE_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_OUTPUT_RATE
D3D11_1DDI_VIDEO_PROCESSOR_RATE_CONVERSION_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_ROTATION
D3D11_1DDI_VIDEO_PROCESSOR_STEREO_CAPS
D3D11_1DDI_VIDEO_PROCESSOR_STEREO_FLIP_MODE
D3D11_1DDI_VIDEO_PROCESSOR_STEREO_FORMAT
D3D11_1DDI_VIDEO_PROCESSOR_STREAM
D3D11_1DDI_VIDEO_USAGE
D3D11_1DDI_VIDEODEVICEFUNCS
D3D11_1DDIARG_CREATEAUTHENTICATEDCHANNEL
D3D11_1DDIARG_CREATECRYPTOSESSION
D3D11_1DDIARG_CREATEVIDEODECODER
D3D11_1DDIARG_CREATEVIDEODECODEROUTPUTVIEW
D3D11_1DDIARG_CREATEVIDEOPROCESSOR
D3D11_1DDIARG_CREATEVIDEOPROCESSORENUM
D3D11_1DDIARG_CREATEVIDEOPROCESSORINPUTVIEW
D3D11_1DDIARG_CREATEVIDEOPROCESSOROUTPUTVIEW
D3D11_1DDIARG_SIGNATURE_ENTRY
D3D11_1DDIARG_STAGE_IO_SIGNATURES
D3D11_1DDIARG_TESSELLATION_IO_SIGNATURES
D3D11_1DDIARG_VIDEODECODERBEGINFRAME
D3D11_1DDIARG_VIDEODECODEREXTENSION
D3D11_DDI_SHADER_MIN_PRECISION
D3D11_DDI_SHADER_MIN_PRECISION_SUPPORT_DATA
D3D11_DDI_VIDEO_DECODER_BUFFER_TYPE
D3D11DDI_HANDLETYPE
D3D11DDIARG_CREATEDEFERREDCONTEXT
D3D11DDIARG_CREATERESOURCE
D3DDDI_RESOURCEFLAGS2
D3DDDIARG_CREATERESOURCE2
DXVAHDDDI_ROTATION
DXVAHDDDI_STREAM_STATE
DXVAHDDDI_STREAM_STATE_ROTATION_DATA
DXVAHDDDI_VPDEVCAPS
FORMATOP

Hardware certification requirements


Direct3D 11 API support is required on all Windows 8 hardware.
For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…DX11 Video Decode FeatureLevel 9 and
Device.Graphics…DX11 VideoProcessing.
See WDDM 1.2 features for a review of features added with Windows 8.
Processing High-Definition Video

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
To more efficiently process video, including high-definition video, a user-mode display driver should implement
the DirectX Video Acceleration (VA) High-Definition (DXVA-HD) DDI that ships with Windows 7. The following
sections describe the DXVA-HD DDI and programming considerations that you need to be aware of when you use
the DXVA-HD DDI in your user-mode display driver.
DXVA-HD DDI
DXVA-HD DDI Programming Considerations
YUV format ranges in Windows 8.1
DXVA-HD DDI

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The DXVA-HD DDI is an extension to the Direct3D version 9 DDI to handle the processing of high-definition video.
The DXVA-HD DDI consists of the following entry points:
The following D3DDDICAPS_TYPE values are used by the Direct3D runtime to retrieve information about
the high-definition video processing capabilities that the user-mode display driver supports. The runtime
sets these D3DDDICAPS_TYPE values in the Type member of the D3DDDIARG_GETCAPS structure that
the pData parameter of the driver's GetCaps function points to when the runtime calls GetCaps.
D3DDDICAPS_DXVAHD_GETVPDEVCAPS
The driver provides a pointer to a DXVAHDDDI_VPDEVCAPS structure for the video processor capabilities
that the decode device (which is specified in a DXVAHDDDI_DEVICE_DESC structure that is pointed to by
the pInfo member of D3DDDIARG_GETCAPS) supports.
D3DDDICAPS_DXVAHD_GETVPOUTPUTFORMATS
The driver provides an array of D3DDDIFORMAT enumeration types that represent the output formats for
the decode device (which is specified in a DXVAHDDDI_DEVICE_DESC structure that is pointed to by the
pInfo member of D3DDDIARG_GETCAPS).
D3DDDICAPS_DXVAHD_GETVPINPUTFORMATS
The driver provides an array of D3DDDIFORMAT enumeration types that represent the input formats for
the decode device (which is specified in a DXVAHDDDI_DEVICE_DESC structure that is pointed to by the
pInfo member of D3DDDIARG_GETCAPS).
D3DDDICAPS_DXVAHD_GETVPCAPS
The driver provides an array of DXVAHDDDI_VPCAPS structures for the capabilities for each video
processor that the decode device (which is specified in a DXVAHDDDI_DEVICE_DESC structure that is
pointed to by the pInfo member of D3DDDIARG_GETCAPS) supports.
D3DDDICAPS_DXVAHD_GETVPCUSTOMRATES
The driver provides an array of DXVAHDDDI_CUSTOM_RATE_DATA structures for the custom frame rates
that a video processor (which is specified by a GUID that is pointed to by the pInfo member of
D3DDDIARG_GETCAPS) supports.
D3DDDICAPS_DXVAHD_GETVPFILTERRANGE
The driver provides a pointer to a DXVAHDDDI_FILTER_RANGE_DATA structure for the range that the
filter (which is specified by a DXVAHDDDI_FILTER enumeration value that is pointed to by the pInfo
member of D3DDDIARG_GETCAPS) supports.
The CreateVideoProcessor function creates a video processor that can process high-definition video.
The SetVideoProcessBltState function sets the state of a bit-block transfer (bitblt) for a video processor.
The GetVideoProcessBltStatePrivate function retrieves the state data of a private bitblt for a video
processor.
The SetVideoProcessStreamState function sets the state of a stream for a video processor.
The GetVideoProcessStreamStatePrivate function retrieves the private stream-state data for a video
processor.
The VideoProcessBltHD function processes video input streams and composes to an output surface.
The DestroyVideoProcessor function releases resources for a previously created video processor.
DXVA-HD DDI Programming Considerations

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
When you implement the DXVA-HD DDI in your user-mode display driver, you should consider the following
programming tips:
The driver must set the D3DCAPS3_DXVAHD (0x00000400L) bit in the Caps3 member of the D3DCAPS9
structure to indicate that it supports the DXVA-HD DDI; otherwise, the Direct3D runtime does not call the
CreateVideoProcessor function to create a DXVA-HD device. The D3DCAPS9 structure is described in the
DirectX 9.0 SDK documentation. The driver sets the D3DCAPS3_DXVAHD bit in response to a call to its
GetCaps function in which the D3DDDICAPS_GETD3D9CAPS value is set in the Type member of the
D3DDDIARG_GETCAPS structure that the pData parameter points to. (A sketch of this caps handling appears
after this list.)
The DXVAHD_SURFACE_TYPE_VIDEO_INPUT_PRIVATE value of the application-level
DXVAHD_SURFACE_TYPE enumeration has no corresponding DDI value. An application sets the
DXVAHD_SURFACE_TYPE_VIDEO_INPUT_PRIVATE value for an off-screen plain surface that is allocated in a
different format type for the CPU or for a shader-based video processor plug-in.
The DXVAHD_SURFACE_TYPE_VIDEO_OUTPUT value of the application-level DXVAHD_SURFACE_TYPE
enumeration corresponds to the VideoProcessRenderTarget bit-field flag of the
D3DDDI_RESOURCEFLAGS structure. The Direct3D runtime sets VideoProcessRenderTarget in the Flags
member of the D3DDDIARG_CREATERESOURCE structure when the runtime calls the driver's
CreateResource function to create a video processing render target.
The Direct3D runtime maintains both bit-block transfer (bitblt) and stream states. The runtime returns these
states to the application when the application queries them.
The application-level IDXVAHD_VideoProcessor::GetVideoProcessBltState method has no
corresponding DDI function. However, when an application calls
IDXVAHD_VideoProcessor::GetVideoProcessBltState to retrieve the private bitblt state data for a video
processor, the Direct3D runtime calls the driver's GetVideoProcessBltStatePrivate function.
The application-level IDXVAHD_VideoProcessor::GetVideoProcessStreamState method has no
corresponding DDI function. However, when an application calls
IDXVAHD_VideoProcessor::GetVideoProcessStreamState to retrieve the private stream-state data for a video
processor, the Direct3D runtime calls the driver's GetVideoProcessStreamStatePrivate function.
The DXVAHD_STREAM_STATE_D3DFORMAT value of the application-level DXVAHD_STREAM_STATE
enumeration has no corresponding DDI value in the DXVAHDDDI_STREAM_STATE enumeration. The
video processor plug-in uses the DXVAHD_STREAM_STATE_D3DFORMAT value for a surface that is
allocated with the DXVAHD_SURFACE_TYPE_VIDEO_INPUT_PRIVATE value of the application-level
DXVAHD_SURFACE_TYPE enumeration.
The DXVAHD_DEVICE_TYPE enumeration has no corresponding DDI enumeration (for example, there is no
DXVAHDDDI_DEVICE_TYPE). The first member of the DXVAHDDDI_VPDEVCAPS structure is reserved,
whereas the first member of the application-level DXVAHD_VPDEVCAPS structure is the DeviceType
member, which is set to a DXVAHD_DEVICE_TYPE value. The runtime or the video processor plug-in sets the
DeviceType member and always reports the driver as DXVAHD_DEVICE_TYPE_HARDWARE.
The Multiplier member of the DXVAHDDDI_FILTER_RANGE_DATA structure is a floating-point value. The
driver should use a value that can be represented exactly as a base 2 fraction. For example, 0.25 can be
represented exactly as a base 2 fraction but 0.1 cannot.
Any DXVA-HD DDI function should return S_OK, E_INVALIDARG or E_OUTOFMEMORY.
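The following is a minimal sketch of the GetCaps handling described in the first item of this list. It assumes the
WDK user-mode display DDI headers are available; MyGetCaps stands in for the driver's GetCaps entry point,
and all other caps types and the real capability reporting are omitted.

#include <d3d9caps.h>
#include <d3dumddi.h>

#ifndef D3DCAPS3_DXVAHD
#define D3DCAPS3_DXVAHD 0x00000400L // value cited in this topic
#endif

HRESULT APIENTRY MyGetCaps(HANDLE hAdapter, const D3DDDIARG_GETCAPS* pGetCaps)
{
    switch (pGetCaps->Type)
    {
    case D3DDDICAPS_GETD3D9CAPS:
    {
        // pData points to a D3DCAPS9 structure that the driver fills in.
        D3DCAPS9* pCaps = static_cast<D3DCAPS9*>(pGetCaps->pData);
        // ...fill in the rest of the device's Direct3D 9 caps here...
        pCaps->Caps3 |= D3DCAPS3_DXVAHD; // advertise DXVA-HD so the runtime calls CreateVideoProcessor
        return S_OK;
    }
    default:
        // Handle the remaining caps types that the driver supports.
        return E_INVALIDARG;
    }
}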
YUV format ranges in Windows 8.1

Apps can signal user-mode display drivers to take advantage of extended-range [0, 255] YUV video formats
starting in Windows 8.1, as shown in this table:

YUV RANGE                INPUT DATA RANGE   TYPICAL USAGE                                 STANDARD

extended range           [0, 255]           consumer equipment: webcams and               JFIF standard; the MJPEG video format
                                            point-and-shoot cameras                       uses this range as the default

studio luminance range   [16, 235]          professional cameras and video equipment      ITU BT.601 and BT.709

Most video produced by the content and broadcast industry is in studio range, while video produced by individual
consumers is in extended range. Extended range is also called full luminance range.
Before Windows 8.1, the Microsoft Media Foundation video processing pipeline acted on all input data as if it were
in studio range, which resulted in reduced dynamic range and often harsh contrast when the input data was actually
in extended range.
Starting in Windows 8.1, when video input YUV formats are in extended range, apps can notify drivers of this
higher dynamic range.

Converting extended-range YUV format


These images show how YUV extended-range content that ranges from dark to light values is converted
(interpreted) to RGB format:
The top image shows extended-range content interpreted incorrectly, as if it were studio range.
The bottom image shows extended-range content interpreted correctly.
The incorrect interpretation in the top image shows increased contrast, and highlights become excessively bright
before pure white is reached.

Extended-range YUV interface


Before Windows 8.1, Media Foundation only supported studio luminance range, so interpretations of extended-
range images resulted in increased contrast, as shown in the first image above. Starting with Windows 8.1, the
Media Foundation pipeline uses these structures and enumerations to indicate to Windows Display Driver Model
(WDDM) 1.3 and later user-mode display drivers whether extended-range or studio-range YUV content is being
played or captured:
New enumerations
D3D11_1DDI_VIDEO_PROCESSOR_NOMINAL_RANGE
DXVAHDDDI_NOMINAL_RANGE
Changed structures and enumerations
D3D11_1DDI_VIDEO_PROCESSOR_COLOR_SPACE
D3D11_1DDI_VIDEO_PROCESSOR_DEVICE_CAPS
DXVAHDDDI_BLT_STATE_OUTPUT_COLOR_SPACE_DATA
DXVAHDDDI_STREAM_STATE_INPUT_COLOR_SPACE_DATA
DXVAHDDDI_VPDEVCAPS
Note WDDM 1.3 and greater user-mode display drivers must support all of these new and changed structures and
enumerations.
See YUV-RGB data range conversions for details on how to convert between different input RGB and YUV formats.
YUV-RGB data range conversions

If you want to convert from RGB or YUV inputs to YUV or RGB outputs, the expected behavior depends on the input
data range:

INPUT        INPUT    INPUT RGB   INPUT NOMINAL   OUTPUT RGB   OUTPUT NOMINAL   OUTPUT   OUTPUT       OPERATION
DATA RANGE   FORMAT   RANGE       RANGE           RANGE        RANGE            FORMAT   DATA RANGE

0-255        YUV      N/A         2               N/A          2                YUV      0-255        None
16-235       YUV      N/A         1               N/A          1                YUV      16-235       None
16-235       YUV      N/A         1               N/A          2                YUV      0-255        Scale
0-255        YUV      N/A         2               N/A          1                YUV      16-235       Scale
0-255        RGB      0           N/A             N/A          1                YUV      16-235       RGBtoYUV
0-255        RGB      0           N/A             N/A          2                YUV      0-255        RGBtoYUV
16-235       YUV      N/A         1               0            N/A              RGB      0-255        YUVtoRGB
0-255        YUV      N/A         2               0            N/A              RGB      0-255        YUVtoRGB

In this table, "nominal range" refers to the constant value from the DXVAHDDDI_NOMINAL_RANGE enumeration.
See YUV format ranges in Windows 8.1 for definitions of YUV format ranges.
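As a non-authoritative illustration of the "Scale" operations in the preceding table, the following helpers show
the conventional conversion between studio-range and full-range 8-bit luma samples; the exact rounding and
clamping policy is the driver's choice, and chroma samples are commonly scaled the same way using the
[16, 240] studio excursion.

#include <algorithm>
#include <cmath>
#include <cstdint>

// Studio range -> full range for an 8-bit luma sample: [16, 235] maps onto [0, 255].
uint8_t StudioToFullLuma(uint8_t y)
{
    double scaled = (static_cast<double>(y) - 16.0) * 255.0 / 219.0;
    return static_cast<uint8_t>(std::clamp(std::lround(scaled), 0L, 255L));
}

// Full range -> studio range for an 8-bit luma sample: [0, 255] maps onto [16, 235].
uint8_t FullToStudioLuma(uint8_t y)
{
    double scaled = static_cast<double>(y) * 219.0 / 255.0 + 16.0;
    return static_cast<uint8_t>(std::clamp(std::lround(scaled), 16L, 235L));
}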
Protecting Video Content

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
To protect video content, a user-mode display driver should implement the Content Protection DDI that ships with
Windows 7. The following sections describe the Content Protection DDI.
Content Protection DDI
Using Crypto Session with DirectX Video Accelerator 2.0 Decoder
Content Protection DDI

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The Content Protection DDI is an extension to the Direct3D version 9 DDI to protect video. The Content Protection
DDI consists of the entry points that are described in this section.
Required Content Protection DDI Functions
If content protection is implemented in the user-mode display driver, the driver must support the following
Content Protection DDI functions:
The CreateAuthenticatedChannel function creates a channel that the Direct3D runtime and the driver can
use to set and query protections.
The AuthenticatedChannelKeyExchange function negotiates the session key.
The QueryAuthenticatedChannel function queries an authenticated channel for capability and state
information.
The ConfigureAuthenticatedChannel function sets state within an authenticated channel.
The DestroyAuthenticatedChannel function releases resources for an authenticated channel.
The CreateCryptoSession function creates a crypto session that the Direct3D runtime uses to manage a
session key and to perform crypto operations into and out of protected memory.
The CryptoSessionKeyExchange function negotiates the session key.
The DestroyCryptoSession function releases resources for an encryption session.
Content Protection Capabilities
The user-mode display driver only reports content protection capabilities if it supports each of the preceding
required Content Protection DDI functions. The following D3DDDICAPS_TYPE values are used by the Direct3D
runtime to retrieve information about the content protection capabilities that the user-mode display driver
supports. The runtime sets these D3DDDICAPS_TYPE values in the Type member of the D3DDDIARG_GETCAPS
structure that the pData parameter of the driver's GetCaps function points to when the runtime calls GetCaps.
D3DDDICAPS_GETCONTENTPROTECTIONCAPS
The runtime supplies a pointer to a DDICONTENTPROTECTIONCAPS structure for the specific encryption and
decode combination that the driver should use. The driver returns a pointer to a populated
D3DCONTENTPROTECTIONCAPS structure that describes the driver's content-protection capabilities for the
encryption and decode combination. For more information about D3DCONTENTPROTECTIONCAPS, see the DirectX
SDK documentation.
D3DDDICAPS_GETCERTIFICATESIZE
The driver provides a pointer to a number that specifies the size, in bytes, of the driver's certificate that is used for a
channel or crypto type. The Direct3D runtime then uses this size to allocate a buffer to hold the certificate
information that the runtime receives when the runtime calls GetCaps with D3DDDICAPS_GETCERTIFICATE.
D3DDDICAPS_GETCERTIFICATE
The runtime supplies a pointer to a DDICERTIFICATEINFO structure that describes the certificate that the driver
should retrieve.
For an authenticated channel, the driver uses the existing OPM certificate, which is an X.509 certificate that is root
signed by Microsoft.
An application can query the driver's certificate to determine the following information:
Whether the driver is trusted.
Whether the driver is revoked.
The driver's public key. The application uses the driver's public key to establish a session key for an
authenticated channel that is used for authentication.
A call to GetCaps with D3DDDICAPS_GETCERTIFICATE set fails if called for the Direct3D 9 authenticated channel
because this channel does not support a certificate or authentication.
For a crypto session, the driver returns its certificate for the given crypto type. Depending on the crypto type and
the key exchange that are used, a certificate might or might not be used. It is also possible that different crypto
types can use different certificates.
Optional Content Protection DDI Functions
The driver can optionally support the following Content Protection DDI functions:
The EncryptionBlt function reads encrypted data from a protected surface.
The GetPitch function retrieves the pitch of a protected surface.
The StartSessionKeyRefresh function returns a random number that the decoder/application and the
driver/hardware can subsequently use to perform an exclusive OR operation (XOR) with the session key.
(A sketch of this key refresh appears after this list.)
The FinishSessionKeyRefresh function indicates that all buffers from that point in time will use the
updated session key value.
The GetEncryptionBltKey function returns the key that is used to decrypt the data that the driver's
EncryptionBlt function returns.
The DecryptionBlt function writes data to a protected surface.
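The following is a minimal sketch of the session-key refresh that StartSessionKeyRefresh and
FinishSessionKeyRefresh describe: both sides XOR the previously negotiated session key with the random
value returned by StartSessionKeyRefresh. The key size and buffer layout are assumptions for illustration only.

#include <cstddef>
#include <cstdint>

// XOR the current session key with the random refresh value that StartSessionKeyRefresh returned.
// After FinishSessionKeyRefresh, both the decoder/application and the driver/hardware use the refreshed key.
void RefreshSessionKey(uint8_t* sessionKey, const uint8_t* refreshValue, size_t keySizeInBytes)
{
    for (size_t i = 0; i < keySizeInBytes; ++i)
    {
        sessionKey[i] ^= refreshValue[i];
    }
}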
Content Protected Resources
The following D3DDDI_RESOURCEFLAGS flags are used by the Direct3D runtime for protected content. The
runtime sets these D3DDDI_RESOURCEFLAGS flags in the Flags member of the D3DDDIARG_CREATERESOURCE
structure that the pResource parameter of the driver's CreateResource function points to when the runtime calls
CreateResource.
RestrictedContent
The resource might contain protected content. An application might or might not have explicitly enabled content
protection before the application creates a resource. The driver should ensure that the runtime places the allocation
for the resource in a memory pool that can be protected. The driver should allow the creation of lockable protected
resources. However, the driver should explicitly fail the calls to its Lock function to lock these surfaces while
content protection is enabled.
RestrictSharedAccess
Only specific processes should be allowed access to the shared resource.
The driver should restrict shared access to this resource. The runtime can call the driver's OpenResource
function to open this resource only from display devices (hDevice) within the process that created the resource,
or from devices that were explicitly granted access through the authenticated channel.
Using Crypto Session with DirectX Video Accelerator
2.0 Decoder

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The user-mode display driver can associate a crypto session with a DirectX Video Accelerator (VA) 2.0 decode
device to make the DirectX VA 2.0 decode device use the session key of the crypto session. If the Direct3D runtime
specifies a valid decode GUID in the DecodeProfile member of the D3DDDIARG_CREATECRYPTOSESSION
structure when the runtime calls the driver's CreateCryptoSession function to create the crypto session, the
runtime can subsequently call the driver's ConfigureAuthenticatedChannel function with
D3DAUTHENTICATEDCONFIGURE_CRYPTOSESSION set to configure the crypto session with the DirectX VA 2.0
decode device. Before configuring the crypto session with the DirectX VA 2.0 decode device, the runtime must call
the driver's DecodeExtensionExecute function to retrieve a driver handle for the DirectX VA 2.0 decode device.
The runtime sets the members of the D3DDDIARG_DECODEEXTENSIONEXECUTE structure to the following
values to retrieve the driver handle for the DirectX VA 2.0 decode device:

#define DXVA2_DECODE_GET_DRIVER_HANDLE 0x725


D3DDDIARG_DECODEEXTENSIONEXECUTE.Function = DXVA2_DECODE_GET_DRIVER_HANDLE;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateInput->pData = NULL;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateInput->DataSize = 0;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateOutput->pData = HANDLE*;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateOutput->DataSize = sizeof(HANDLE);

When the runtime calls the driver's CreateDecodeDevice function to create the DirectX VA 2.0 decode device, the
runtime specifies zeros for the decode-encryption GUIDs within the DXVADDI_CONFIGPICTUREDECODE
structure.
After the runtime calls the driver's CreateCryptoSession function with the CryptoType member of the
D3DDDIARG_CREATECRYPTOSESSION structure set to D3DCRYPTOTYPE_AES128_CTR to create the crypto
session, the value of the pPVPSetKey member of the D3DDDIARG_DECODEBEGINFRAME structure in a call to
the driver's DecodeBeginFrame function to decode a frame has the following meanings:
If pPVPSetKey is set to NULL, none of the buffers for the frame contain encrypted data, so no decryption is
required.
If pPVPSetKey points to the NULL_GUID (all zeros), the buffers for the frame are encrypted with the session
key.
If pPVPSetKey points to a content key, it indicates that an application used the session key to encrypt the
content key. The driver should use this content key to decrypt all encrypted buffers that are associated with
this frame.
The initialization vector for each encrypted buffer appears in the pCipherCounter member of the
DXVADDI_DECODEBUFFERDESC structure in a call to the driver's DecodeExecute function. The driver should
fail the call to its DecodeExecute function if it determines that the initialization vector was previously used for the
same content key (or session key if the content key is not used). The application should increment the IV member
of the DXVADDI_PVP_HW_IV structure for each buffer that the application encrypts. Therefore, the driver's
DecodeExecute function can fail if the IV member is less than or equal to the previous IV value that was passed to
DecodeExecute.
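Here is a minimal sketch of that check, assuming the driver tracks the last initialization vector it accepted for the
active content key (or session key); the 64-bit IV width and the per-key tracking structure are illustrative
assumptions, not part of the DDI.

#include <cstdint>

// Per-key state kept by the driver for the active content key (or session key).
struct KeyIvState
{
    uint64_t lastIv  = 0;     // last initialization vector accepted for this key
    bool     anySeen = false;
};

// Returns true if the buffer's IV is acceptable (strictly increasing);
// the driver would fail the DecodeExecute call when this returns false.
bool ValidateBufferIv(KeyIvState& state, uint64_t bufferIv)
{
    if (state.anySeen && bufferIv <= state.lastIv)
    {
        return false; // IV reused or decreased: reject the call
    }
    state.lastIv  = bufferIv;
    state.anySeen = true;
    return true;
}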
If the runtime must partially encrypt the buffers, it calls the driver's DecodeExtensionExecute function and sets
the members of the D3DDDIARG_DECODEEXTENSIONEXECUTE structure to the following values to specify
which blocks the driver should encrypt:

#define DXVA2_DECODE_SPECIFY_ENCRYPTED_BLOCKS 0x724


D3DDDIARG_DECODEEXTENSIONEXECUTE.Function = DXVA2_DECODE_SPECIFY_ENCRYPTED_BLOCKS;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateInput->pData = D3DENCRYPTED_BLOCK_INFO*;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateInput->DataSize = sizeof(D3DENCRYPTED_BLOCK_INFO);
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateOutput->pData = NULL;
D3DDDIARG_DECODEEXTENSIONEXECUTE.pPrivateOutput->DataSize = 0;



Verifying Overlay Support

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
To verify overlay support, a user-mode display driver should implement the new Overlay DDI that ships with
Windows 7. The following sections describe the new Overlay DDI.
Overlay DDI
Overlay DDI Programming Considerations
Overlay DDI

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
The Overlay DDI is an extension to the Direct3D version 9 DDI to verify overlay support. The Overlay DDI consists
of the following entry points:
The D3DDDICAPS_CHECKOVERLAYSUPPORT value from the D3DDDICAPS_TYPE enumeration is used by
the Direct3D runtime to verify whether the display device supports a particular overlay. The runtime sets
D3DDDICAPS_CHECKOVERLAYSUPPORT in the Type member of the D3DDDIARG_GETCAPS structure that
the pData parameter of the driver's GetCaps function points to when the runtime calls GetCaps. The
runtime also sets the pInfo member of D3DDDIARG_GETCAPS to a pointer to a
DDICHECKOVERLAYSUPPORTINPUT structure that describes the overlay. If the driver supports the
overlay, the driver sets the members of a D3DOVERLAYCAPS structure and returns a pointer to this
structure in the pData member of D3DDDIARG_GETCAPS. Otherwise, if the driver does not support the
overlay, the driver fails the call to its GetCaps function with either
D3DDDIERR_UNSUPPORTEDOVERLAYFORMAT (when the overlay format is the reason for the lack of support)
or D3DDDIERR_UNSUPPORTEDOVERLAY (when support is lacking for another reason). D3DOVERLAYCAPS is
described in the DirectX SDK documentation. (A sketch of this caps handling appears after this list.)
The driver sets the MaxOverlayDisplayWidth and MaxOverlayDisplayHeight members of
D3DOVERLAYCAPS to indicate any restrictions that the driver and hardware might have, which involve the
final overlay size (after stretching the overlay data).
The driver sets the D3DOVERLAYCAPS_STRETCHX (0x00000040) and D3DOVERLAYCAPS_STRETCHY
(0x00000080) capability bits in the Caps member of D3DOVERLAYCAPS to indicate that the overlay
hardware is capable of arbitrarily stretching and shrinking the overlay data. Drivers should not attempt to
emulate overlay stretching through the GPU and should only set these caps if the overlay hardware
supports stretching. Less overhead is typically required for the application to perform GPU stretching as a
part of the video processing and composition phase than for the driver to perform a separate pass at the
very end to emulate overlay stretching.
The driver should handle the following new bit-field flags from the D3DDDI_OVERLAYINFOFLAGS
structure. A D3DDDI_OVERLAYINFOFLAGS structure identifies the type of overlay operation to perform. A
D3DDDI_OVERLAYINFOFLAGS structure is specified in the Flags member of the D3DDDI_OVERLAYINFO
structure in a call to either the driver's CreateOverlay or UpdateOverlay function.
LimitedRGB
The overlay is limited range RGB rather than full range RGB. In limited range RGB, the RGB range is
compressed such that 16:16:16 is black and 235:235:235 is white.
YCbCrBT709
The overlay is BT.709, which indicates high-definition TV (HDTV), rather than BT.601.
YCbCrxvYCC
The overlay is extended YCbCr (xvYCC) rather than conventional YCbCr.
When the display format is 64 bits rather than 32 bits (for example, when the Desktop Window Manager
(DWM) uses D3DFMT_A16B16G16R16F for the display mode), the runtime places the lower 32 bits of the
overlay colorkey in the DstColorKeyLow member of the D3DDDI_OVERLAYINFO structure and the upper
32 bits in the DstColorKeyHigh member of D3DDDI_OVERLAYINFO.
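Below is a hedged sketch of the D3DDDICAPS_CHECKOVERLAYSUPPORT handling described in the first item
of this list, written as a helper that a driver's GetCaps switch could call. HwSupportsOverlayFormat and
HwSupportsOverlay are hypothetical driver helpers, and the width and height limits are placeholders only.

#include <d3d9caps.h>
#include <d3dumddi.h>

// Hypothetical hardware-query helpers; real drivers consult their own capability tables.
bool HwSupportsOverlayFormat(const DDICHECKOVERLAYSUPPORTINPUT* pInput);
bool HwSupportsOverlay(const DDICHECKOVERLAYSUPPORTINPUT* pInput);

HRESULT HandleCheckOverlaySupport(const D3DDDIARG_GETCAPS* pGetCaps)
{
    const DDICHECKOVERLAYSUPPORTINPUT* pInput =
        static_cast<const DDICHECKOVERLAYSUPPORTINPUT*>(pGetCaps->pInfo);
    D3DOVERLAYCAPS* pCaps = static_cast<D3DOVERLAYCAPS*>(pGetCaps->pData);

    if (!HwSupportsOverlayFormat(pInput))
    {
        return D3DDDIERR_UNSUPPORTEDOVERLAYFORMAT; // the overlay format is the problem
    }
    if (!HwSupportsOverlay(pInput))
    {
        return D3DDDIERR_UNSUPPORTEDOVERLAY;       // unsupported for another reason
    }

    // Report only what the overlay hardware can really do; do not emulate stretching.
    pCaps->Caps = D3DOVERLAYCAPS_STRETCHX | D3DOVERLAYCAPS_STRETCHY;
    pCaps->MaxOverlayDisplayWidth  = 1920;  // placeholder limit
    pCaps->MaxOverlayDisplayHeight = 1080;  // placeholder limit
    return S_OK;
}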
Overlay DDI Programming Considerations

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
When you implement the Overlay DDI in your user-mode display driver, you should consider the following
programming tips:
If the driver supports the Overlay DDI, it must set the D3DCAPS_OVERLAY bit in the Caps member of
D3DCAPS9 structure. The D3DCAPS9 structure is described in the DirectX 9.0 SDK documentation. The
driver sets the D3DCAPS_OVERLAY bit in response to a call to its GetCaps function in which the
D3DDDICAPS_GETD3D9CAPS value is set in the Type member of the D3DDDIARG_GETCAPS structure that
the pData parameter points to.
When the display format is 64 bits rather than 32 bits (for example, when the DWM uses the
D3DDDIFMT_A16B16G16R16F value from the D3DDDIFORMAT enumeration for the display mode), the
Direct3D runtime places the low 32 bits of the overlay color key in the DstColorKeyLow member of the
D3DDDI_OVERLAYINFO structure and the upper 32 bits in the DstColorKeyHigh member of
D3DDDI_OVERLAYINFO.
Multiplane overlay support

Multiplane overlays can be supported by Windows Display Driver Model (WDDM) 1.3 and later drivers. This
capability is new starting with Windows 8.1.
These sections describe how to implement this capability in your driver:
Multiplane overlay functions called by user-mode display drivers
All user-mode multiplane overlay functions that the operating system implements.
Multiplane overlay functions implemented by the user-mode driver
All functions that a user-mode driver must implement in order to support multiplane overlays.
Multiplane overlay user-mode structures and enumerations
All user-mode structures and enumerations that are used with multiplane overlay device driver interfaces (DDIs).
Multiplane overlay kernel-mode driver-implemented functions
All multiplane overlay functions that the display miniport driver implements.
Multiplane overlay kernel-mode structures
All structures that are used by the display miniport driver.
Multiplane overlay kernel-mode enumerations
All enumerations that are used by the display miniport driver.
This user-mode enumeration constant value supports multiplane overlays and is new for Windows 8.1:
D3DDDICAPS_TYPE (new D3DDDICAPS_GET_MULTIPLANE_OVERLAY_GROUP_CAPS constant value)
Multiplane overlay hardware requirements

Display drivers and hardware are not required to support multiplane overlays. However, to provide multiplane
overlay support, the hardware must meet these requirements:
Hardware must support non-overlapping planes:
One plane can cover one portion of the screen while another plane can cover a different, mutually
exclusive, portion of the screen.
If any portion of the screen is not covered by a plane, the hardware must scan out black for that area. The
hardware can assume that there is a virtual plane at the bottom-most z order that is filled with black.
Hardware must support overlapping planes:
The hardware must be able to enable or disable alpha blending on a per-plane basis. (Alpha blending is a
technique where the color in a source bitmap is combined with that in a destination bitmap to produce a
new destination bitmap.)
Blending between the planes using pre-multiplied alpha must be supported.
When only one output target is active, the active output must support multiplane overlays. In the case of clone
mode, where multiple outputs are simultaneously active, hardware should not report that it supports multiplane
overlays unless all active outputs support multiplane overlays.
The Desktop Window Manager (DWM)’s swapchain (plane 0) must be able to interact with the other overlay
planes.
All planes must be able to be enabled and disabled, including plane 0 (the DWM’s swapchain).
All planes must support source and destination clipping, including plane 0 (the DWM’s swapchain).
At least one plane must support shrinking and stretching, independent from other planes that might be enabled.
Planes that support scaling must support both bilinear filtering and filtering quality that is better than bilinear.
At least one plane must support these YUV formats (for more info, see YUV format ranges in Windows 8.1):
Both ITU BT.601 and BT.709 YUV to RGB matrix conversion for YUV formats.
Both normal (or studio) range YUV luminance (16-235) and extended-range YUV luminance (0-255).
Hardware must handle these register latching scenarios:
All per-plane attributes (buffer address, clipping, scaling, and so on) must atomically post during the
vertical retrace period. When a block of registers is updated, all of them must post atomically. For
example, if the VSync occurs after 10 of 20 registers pertaining to the overlay plane have been written,
none of them post until the next VSync, because they cannot all post on the current VSync.
Each plane can be updated independently from the other planes. For example, if the plane 0 registers
have been updated prior to the VSync and later the plane 1 registers are updated when the VSync occurs,
the plane 1 updates might wait until the next VSync, but the plane 0 updates should occur on time.
When multiple planes are updated during a single present call, the updates should occur atomically. For
example, if a single present call is updating plane 0 and enabling plane 1, the plane 0 registers should not
post on the VSync unless the plane 1 registers also post on the same VSync.
Transformation, scaling, and blending should occur in this order:
1. The source allocation is clipped according to the specified source rectangle. The source rectangle is
guaranteed to be bounded within the size of the source allocation.
2. Apply a horizontal image flip, then a vertical image flip if requested.
3. Apply scaling according to the destination rectangle, apply clipping according to the clip rectangle, and
apply the appropriate filtering when scaling.
4. Blend with allocations at other layers. Blending should be performed from top to bottom (or until an
opaque layer is hit) in z-order. If alpha blending is requested, hardware must honor the per-pixel
alpha, and the color values are pre-multiplied by alpha. The following pseudocode performs a "source
over destination" operation repeatedly from top to bottom, (((Layer[0] over Layer[1]) over Layer[2])
over … Layer[n]). Outside of the destination rectangle, each layer must be treated as transparent
(0,0,0,0). (A C++ sketch of this blend appears after this list.)

Color = Color[0]; // Layer 0 is topmost.
Alpha = Color[0].Alpha;
for (i = 1; Alpha < 1 && i < LayersToBlend; i++)
{
    Color += ((1 - Alpha) * Color[i]);
    Alpha += ((1 - Alpha) * Color[i].Alpha);
}
Output Color;

Hardware can blend from bottom to top as long as the output result is the same. In this case, the
following blend algorithm should be used:

Color = Color[LayersToBlend-1]; // Bottom-most layer.
Alpha = Color[LayersToBlend-1].Alpha;
if (LayersToBlend > 1)
{
    for (i = LayersToBlend - 2; Alpha < 1 && i >= 0; i--)
    {
        Color = Color[i] + ((1 - Color[i].Alpha) * Color);
        Alpha = Color[i].Alpha + ((1 - Color[i].Alpha) * Alpha);
    }
}
Output Color;

5. Black must be displayed in any area that is not covered by a destination rectangle from any layer.
Hardware can assume that there is a conceptual virtual bottom-most black layer that is the size of the
screen.
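For reference, here is the top-to-bottom blend from step 4 expressed as a small, self-contained C++ routine
operating on premultiplied-alpha colors; it mirrors the pseudocode above and is illustrative only.

struct Rgba
{
    float r, g, b, a;   // premultiplied color, each component in [0, 1]
};

// Blends layers[0] (topmost) over layers[1] ... layers[layersToBlend - 1].
// Pixels outside a layer's destination rectangle must be supplied as (0,0,0,0).
Rgba BlendTopDown(const Rgba* layers, int layersToBlend)
{
    Rgba out = layers[0];
    for (int i = 1; out.a < 1.0f && i < layersToBlend; ++i)
    {
        float remaining = 1.0f - out.a;   // contribution left for lower layers
        out.r += remaining * layers[i].r;
        out.g += remaining * layers[i].g;
        out.b += remaining * layers[i].b;
        out.a += remaining * layers[i].a;
    }
    return out;
}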
Multiplane overlay resource creation

When multiplane overlays are used, these requirements apply to allocations that are created within Microsoft
DirectX apps.

DirectX 11 resource creation


When the CreateResource(D3D11) function is called:
The D3D10_DDI_BIND_PRESENT constant value is set in the BindFlags member, and the
D3D10_DDI_RESOURCE_MISC_SHARED constant value is set in the MiscFlags member, of the
D3D11DDIARG_CREATERESOURCE structure, indicating that the allocation can be scanned out.
It's possible that other bind and miscellaneous flags will also be set, such as:
D3D10_DDI_BIND_SHADER_RESOURCE
D3D10_DDI_BIND_RENDER_TARGET
D3D11_DDI_BIND_UNORDERED_ACCESS
D3D11_DDI_BIND_DECODER
D3D11_1DDI_RESOURCE_MISC_RESTRICTED_CONTENT
D3D11_1DDI_RESOURCE_MISC_RESTRICT_SHARED_RESOURCE_DRIVER
When the DXGI_DDI_PRIMARY_DESC structure is passed in the CreateResource(D3D11) call:
DXGI_DDI_PRIMARY_DESC has an appropriate value for the VidPnSourceId member.
DXGI_DDI_PRIMARY_DESC.ModeDesc matches the current mode.
For multiplane overlay resources, the driver must not set the
DXGI_DDI_PRIMARY_DRIVER_FLAG_NO_SCANOUT flag value in the DriverFlags member of
DXGI_DDI_PRIMARY_DESC.

DirectX 9 resource creation


When the CreateResource2 function is called:
The Primary and SharedResource bit-field flags in the Flags member of the
D3DDDIARG_CREATERESOURCE2 structure are set.
It's possible that other bit-field flags in Flags will also be set, such as:
RenderTarget
Texture
DecodeRenderTarget
RestrictedContent
RestrictSharedAccess
The VidPnSourceId member of the D3DDDIARG_CREATERESOURCE2 structure is properly initialized.
The RefreshRate member of the D3DDDIARG_CREATERESOURCE2 structure contains zero.
Multiplane overlay VidPN presentation

When multiplane overlays are used, these requirements apply to functions used to present on multiple surfaces in
video present networks (VidPNs):
DxgkDdiSetVidPnSourceAddressWithMultiPlaneOverlay
If DXGK_MULTIPLANE_OVERLAY_PLANE.Enabled is false, the display miniport driver should disable the
specified plane.
If a plane was enabled in a previous call to DxgkDdiSetVidPnSourceAddressWithMultiPlaneOverlay but is not
present in the current call, the driver should continue to display the plane without flipping it.
It's possible that the driver will receive multiple calls to DxgkDdiSetVidPnSourceAddressWithMultiPlaneOverlay
during the same VSync (one call to flip one plane, and another call to flip a different plane). In this case, the
driver should process both calls.
The data passed should have been validated in user mode by a trusted source. However, the display miniport
driver should still check the data to ensure that it doesn't cause problems. If the data is incorrect, the driver can
fail the call with a STATUS_INVALID_PARAMETER error code, but such a failure might not be handled
gracefully, and it implies a bug either in the operating system or in the user-mode driver.
DxgkDdiSetVidPnSourceVisibility
When DXGKARG_SETVIDPNSOURCEVISIBILITY.Visible is set to FALSE for a given source in a call to this function,
all hardware planes must be disabled, including the plane used for the primary surface. When Visible is set to
TRUE, only the plane used for the primary surface must be enabled, and all other planes must remain disabled.
(A sketch of this behavior appears after this list.)
DxgkDdiSetVidPnSourceAddress
When this function is called, the driver should disable all non-primary overlay planes. The primary surface is flipped
using DxgkDdiSetVidPnSourceAddressWithMultiPlaneOverlay when in multiplane overlay mode.
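The following is a hedged sketch of the DxgkDdiSetVidPnSourceVisibility behavior described above.
GetPlaneCountForSource and HwSetPlaneEnabled are hypothetical driver helpers, plane 0 is assumed to carry the
primary surface, and the usual WDK display miniport headers are assumed to be available.

#include <ntddk.h>
#include <dispmprt.h>

// Hypothetical hardware helpers supplied by the display miniport driver.
ULONG GetPlaneCountForSource(D3DDDI_VIDEO_PRESENT_SOURCE_ID SourceId);
void  HwSetPlaneEnabled(D3DDDI_VIDEO_PRESENT_SOURCE_ID SourceId, ULONG PlaneIndex, BOOLEAN Enable);

NTSTATUS APIENTRY DxgkDdiSetVidPnSourceVisibility(
    CONST HANDLE hAdapter,
    CONST DXGKARG_SETVIDPNSOURCEVISIBILITY* pVisibility)
{
    UNREFERENCED_PARAMETER(hAdapter);

    ULONG planeCount = GetPlaneCountForSource(pVisibility->VidPnSourceId);
    for (ULONG plane = 0; plane < planeCount; ++plane)
    {
        // Visible == FALSE: disable every plane, including the primary (plane 0).
        // Visible == TRUE: enable only the primary plane; leave the others disabled.
        BOOLEAN enable = (pVisibility->Visible && plane == 0) ? TRUE : FALSE;
        HwSetPlaneEnabled(pVisibility->VidPnSourceId, plane, enable);
    }
    return STATUS_SUCCESS;
}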
Tiled resource support

Tiled resources can be supported by Windows Display Driver Model (WDDM) 1.3 and later drivers. This capability
is new starting with Windows 8.1.
These reference topics describe how to implement this capability in your user-mode display driver:
Tiled resource functions implemented by the user-mode driver
D3DWDDM1_3DDI_CHECK_MULTISAMPLE_QUALITY_LEVELS_FLAG
D3DWDDM1_3DDI_D3D11_OPTIONS_DATA1
D3DWDDM1_3DDI_DEVICEFUNCS
D3DWDDM1_3DDI_TILE_COPY_FLAG
D3DWDDM1_3DDI_TILE_MAPPING_FLAG
D3DWDDM1_3DDI_TILE_RANGE_FLAG
D3DWDDM1_3DDI_TILE_REGION_SIZE
D3DWDDM1_3DDI_TILED_RESOURCE_COORDINATE
D3DWDDM1_3DDI_TILED_RESOURCES_SUPPORT_FLAG
D3D10_2DDICAPS_TYPE
(new D3DWDDM1_3DDICAPS_D3D11_OPTIONS1 constant value)
D3D10_DDI_FILTER
(new D3DWDDM1_3DDI_FILTER_XXX constant values)
D3D10_DDI_RESOURCE_MISC_FLAG
(new D3DWDDM1_3DDI_RESOURCE_MISC_TILED and
D3DWDDM1_3DDI_RESOURCE_MISC_TILE_POOL constant values)
D3D10DDIARG_CREATEDEVICE
(new pWDDM1_3DeviceFuncs member)
D3D11DDIARG_CREATEDEFERREDCONTEXT
(new pWDDM1_3ContextFuncs member)
Using cross-adapter resources in a hybrid system

Starting in Windows 8.1, a Windows Display Driver Model (WDDM) driver can support a hybrid system, where
cross-adapter resources are shared between an integrated GPU and a discrete GPU, and an application can be run
on either GPU, depending on the needs of the application. The operating system and driver together determine
which GPU an application should run on.
The display miniport driver should express support for cross-adapter resources by setting the
CrossAdapterResource member of the DXGK_VIDMMCAPS structure.
Drivers get information in different ways depending on the type of allocation. If the allocation is a traditional full-
screen primary, the user-mode display driver gets the information that's usually provided when the primary is
created, such as the primary flag, the video present network (VidPN) source ID, the refresh rate, and rotation
information. However, if the allocation is a direct flip primary, the cross-adapter allocation could be used as a
primary, but the user-mode display driver won't get the usual information that's provided when the primary is
created. Also, in this case the discrete user-mode display driver receives information about the primary but should
not validate it. The integrated driver does not receive information that indicates that it's a primary.
These subsequent topics give more details on driver implementation for hybrid systems:
Validating a hybrid system configuration
Rendering on a discrete GPU using cross-adapter resources
Hybrid system DDI

Definition and properties of a hybrid system:


The system contains a single integrated GPU and a single discrete GPU: The integrated GPU is integrated into
the CPU chipset and outputs to an integrated display panel such as an LCD panel. The discrete GPU is typically
a removable card that connects to a motherboard chipset's north bridge through a bus such as PCI.
The discrete GPU has significantly higher performance than the integrated GPU.
The discrete GPU is a render-only device, and no display outputs are connected to it.
Both GPUs are physically enclosed in the same housing, and the discrete GPU can't be connected or
disconnected while the computer is running.
The operating system detects the configuration of a hybrid system when it runs power-on self-test (POST)
routines, when a new driver is installed, or when a display adapter is enabled or disabled.

Definition and properties of a cross-adapter resource:


A cross-adapter resource is available only starting in Windows 8.1.
It can be paged-in only to the aperture GPU memory segment.
It is allocated as a shared resource.
It has only one allocation, in a linear format.
It has a standard pitch alignment of 128 bytes (defined by the
D3DKMT_CROSS_ADAPTER_RESOURCE_PITCH_ALIGNMENT constant).
It has a standard height alignment of 4 rows (defined by the
D3DKMT_CROSS_ADAPTER_RESOURCE_HEIGHT_ALIGNMENT constant). (A sketch of applying these
alignments appears after this list.)
Its memory start address is aligned to a one-page boundary.
It might be created as a standard allocation from kernel mode by the display miniport driver and then be
opened later by the user-mode display driver.
It might be created by the user-mode display driver.
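The following is a small illustrative helper that applies the pitch and height alignment rules listed above when
sizing a linear cross-adapter surface; the 4-KB page size and the bytes-per-pixel parameter are assumptions made
for the example only.

#include <cstdint>

// Alignment values as documented for cross-adapter resources.
constexpr uint32_t kPitchAlignment  = 128;   // D3DKMT_CROSS_ADAPTER_RESOURCE_PITCH_ALIGNMENT
constexpr uint32_t kHeightAlignment = 4;     // D3DKMT_CROSS_ADAPTER_RESOURCE_HEIGHT_ALIGNMENT
constexpr uint32_t kPageSize        = 4096;  // assumed page size for the start-address alignment

constexpr uint32_t AlignUp(uint32_t value, uint32_t alignment)
{
    return ((value + alignment - 1) / alignment) * alignment;
}

// Computes the pitch, aligned height, and total size of a linear cross-adapter surface.
void ComputeCrossAdapterLayout(uint32_t widthInPixels, uint32_t heightInRows, uint32_t bytesPerPixel,
                               uint32_t* pPitch, uint32_t* pAlignedHeight, uint32_t* pSizeInBytes)
{
    *pPitch         = AlignUp(widthInPixels * bytesPerPixel, kPitchAlignment);
    *pAlignedHeight = AlignUp(heightInRows, kHeightAlignment);
    *pSizeInBytes   = AlignUp(*pPitch * *pAlignedHeight, kPageSize); // allocation starts on a page boundary
}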
Validating a hybrid system configuration

This procedure is used starting in Windows 8.1 to validate the configuration of a hybrid system of display adapters:
1. When the system boots, one of the display adapters is marked as the current POST adapter. If this POST adapter
supports Windows Display Driver Model (WDDM) 1.3 and has an integrated display panel, it's considered an
integrated hybrid adapter.
2. A discrete adapter in a hybrid system is considered a hybrid discrete adapter. It must:
Set the DXGK_DRIVERCAPS.HybridDiscrete member.
Support WDDM 1.3.
Support cross-adapter resources.
Have no display outputs.
3. Only one WDDM hybrid discrete adapter is allowed on the system.
4. When an integrated hybrid adapter is detected:
Any new WDDM 1.3 display adapter (excluding an adapter that matches (2) or (3) or is a basic display or
basic render driver) will not be loaded.
Any loaded WDDM 1.3 display adapter (excluding an adapter that matches (2) or (3) or is a basic display
or basic render driver) that is not a hybrid discrete adapter will be stopped.
5. Drivers that support WDDM versions prior to 1.3 are allowed to load even when an integrated hybrid
adapter is present.
Rendering on a discrete GPU using cross-adapter
resources

Starting in Windows 8.1, a discrete GPU uses a cross-adapter resource as:


A destination for bit-block transfer (bitblt) or present operations, but without stretching or color conversion.
The resource that the operating system requests the user-mode display driver to perform the bitblt or present
operation to and from.
An integrated GPU uses a cross-adapter resource as:
A texture during composition by the Desktop Window Manager (DWM).
A render target for GDI hardware acceleration.
A display primary.
Not as a render target for 3-D operations.
The following sections describe the architecture and processes involved in three possible scenarios where an
application renders on a discrete GPU within a hybrid system.

Redirected bitblt presentation model

1. A cross-adapter resource for a top-level window is created in kernel mode as a standard allocation on the
integrated GPU.
2. When this resource is opened on the discrete GPU, the Microsoft DirectX graphics kernel subsystem
(Dxgkrnl.sys) calls the DxgkDdiGetStandardAllocationDriverData function and creates a new resource on the
discrete GPU using the same backing store as for the integrated GPU.
3. The Microsoft Direct3D runtime instructs the discrete GPU's user-mode display driver to open the cross-adapter
resource using private driver data.
4. A DirectX application renders on the discrete GPU to a back-buffer resource. See the "Render" operation in the
figure.
5. When a DirectX application calls a Present method, the Direct3D runtime calls the PresentDXGI (or pfnPresent)
function of the discrete GPU's user-mode driver to copy the back buffer to the cross-adapter resource. See the
"Present" operation in the figure.
6. When a Windows Graphics Device Interface (GDI) application renders to a top-level window, the DirectX
graphics kernel subsystem calls the DxgkDdiRenderKm function of the integrated GPU's display miniport driver
and indicates that the cross-adapter resource is a render target. See the connection between the GDI application
and the cross-adapter surface in the figure.
7. The DWM process opens the cross-adapter resource in the integrated GPU and uses it during composition as a
source texture. See the "Composition" operation in the figure.

Direct flip presentation model

1. The Direct3D runtime instructs the discrete GPU's user-mode display driver to create a cross-adapter resource
for each swap chain surface.
2. On the discrete GPU, the Direct3D runtime might set the Primary and VidPnSourceId members of the
D3DDDI_ALLOCATIONINFO structure if the Direct Flip mode is available. These member values should be
passed when the pfnAllocateCb function is called.
3. The Direct3D runtime instructs the integrated GPU's user-mode display driver to open a cross-adapter resource
that is to be managed by the DWM.
4. An application renders on the discrete GPU using the render target texture as a destination. See the "Render"
operation in the figure.
5. When an application calls a Present method, the Direct3D runtime calls the BltDXGI (or pfnBlt) function of the
discrete GPU's user-mode driver to perform a copy to the cross-adapter resource. The runtime then calls the
PresentDXGI (or pfnPresent) function of the discrete GPU's user-mode driver, with source set to the cross-
adapter resource and the destination allocation set to NULL. See the "Copy" operation in the figure.
6. The DWM performs its composition using the resource from the integrated GPU. If a Direct Flip operation is
needed (DXGK_SEGMENTFLAGS.DirectFlip is set), DWM instructs the integrated GPU's display miniport driver
to perform a flip operation from one cross-adapter allocation to another. See the "DWM flip" operation in the
figure.

Full-screen model
1. The Direct3D runtime instructs the integrated GPU's user-mode display driver to create a cross-adapter shared
primary allocation for each swap chain surface.
2. The Direct3D runtime instructs the discrete GPU's user-mode display driver to open the cross-adapter
resources.
3. An application renders on the discrete GPU using the render target texture as the destination.
4. When the application calls a Present method, the Direct3D runtime instructs the discrete GPU's user-mode
display driver to perform a copy to a cross-adapter resource.
5. The integrated GPU's user-mode display driver and display miniport driver are instructed to flip to this cross-
adapter resource.
Hybrid system DDI

Starting with Windows 8.1, these user-mode and kernel-mode structures and enumerations of the display device
driver interface (DDI) are updated to handle cross-adapter resources on a hybrid system:
D3D10_DDI_RESOURCE_MISC_FLAG
D3DDDI_RESOURCEFLAGS2
D3DDDI_SYNCHRONIZATIONOBJECT_FLAGS
D3DKMDT_GDISURFACEDATA
D3DKMDT_GDISURFACETYPE
DXGK_DRIVERCAPS
DXGK_VIDMMCAPS
This function, new for Windows 8.1, is implemented by the user-mode display driver:
QueryDListForApplication1
Here's how to set up and register a DLL that exports this function.

Setting up the dList DLL


A dList is a list of applications that need cross-adapter shared surfaces for high-performance rendering on the
discrete GPU. The discrete GPU installs a separate small dList DLL that exports the QueryDListForApplication1
function. The operating system itself doesn't determine which GPU an application should run on. Instead, the
Microsoft Direct3D runtime calls QueryDListForApplication1 at most once during Direct3D initialization.
The driver must query an up-to-date list of process information to determine whether or not the process needs the
enhanced performance of a discrete GPU instead of the integrated GPU.
For best performance, the DLL should be under 200 KB in size, should keep allocations to a minimum, and should
be able to return from the QueryDListForApplication1 function in under 4 ms.

Registering the dList DLL


The user-mode display driver provides the name of the small dList DLL in its INF file under the registry keys
UserModeDListDriverName and UserModeDListDriverNameWow, the latter under the Wow64 registry entry.
Here's example INF code:

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDListDriverName, %REG_MULTI_SZ%, dlistumd.dll
HKR,, UserModeDListDriverNameWow, %REG_MULTI_SZ%, dlistumdwow.dll



Managing Resources for Multiple GPU Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of Windows
operating system.
To appropriately manage resources for multiple GPU scenarios, a user-mode display driver can implement a new
device driver interface (DDI) that ships with Windows 7. Each resource might be divided across memory for
multiple GPUs to render on. The driver can implement this new DDI to re-merge each resource so that the new
resource owner has the merged resource. In this DDI implementation, the driver must flush any partially built
command buffers that might modify the resource. This DDI is provided as extensions to the Direct3D version 9 DDI
and to the Direct3D version 10 DXGI DDI. The driver can implement ResolveSharedResource to support
Microsoft Direct3D feature level 9 and ResolveSharedResourceDXGI to support Direct3D feature levels 10 and
11.
Starting in Windows 8.1, a user-mode driver can support cross-adapter resources that are shared between a
discrete GPU and an integrated GPU. See Using cross-adapter resources in a hybrid system.
Supporting OpenGL Enhancements

Windows 7 Enhancements
This section applies only to Windows 7 and later, Windows Server 2008 R2 and later.
You can implement your OpenGL installable client driver (ICD) to use the following OpenGL enhancements that
ship with Windows 7:
Enhancing Synchronization
You can enhance the synchronization capabilities of your OpenGL ICD by using the following second-generation
OpenGL synchronization functions:
D3DKMTCreateSynchronizationObject2
D3DKMTOpenSynchronizationObject
D3DKMTWaitForSynchronizationObject2
D3DKMTSignalSynchronizationObject2
Controlling Resource Access with Mutexes
You can use the following OpenGL mutex functions to control access to resources:
D3DKMTCreateKeyedMutex
D3DKMTOpenKeyedMutex
D3DKMTDestroyKeyedMutex
D3DKMTAcquireKeyedMutex
D3DKMTReleaseKeyedMutex
Managing Access to Shared Resources
You can use the following OpenGL functions to manage access to a shared resource:
D3DKMTConfigureSharedResource
D3DKMTCheckSharedResourceAccess
Monitoring Present History
You can use the following OpenGL functions to monitor the history of present operations:
D3DKMTPresent with D3DKMT_PRESENTHISTORYTOKEN structures populated in the
PresentHistoryToken member of the D3DKMT_PRESENT structure
D3DKMTGetPresentHistory
Miscellaneous Enhancements
You can use the following OpenGL miscellaneous enhancements:
D3DKMTCheckVidPnExclusiveOwnership
D3DKMTGetOverlayState
D3DKMTSetDisplayMode with the D3DKMT_SETDISPLAYMODE_FLAGS structure populated in the
Flags member of the D3DKMT_SETDISPLAYMODE structure
D3DKMTPollDisplayChildren with new flags set in the D3DKMT_POLLDISPLAYCHILDREN structure

Windows 8 Enhancements
This section applies only to Windows 8 and later, and Windows Server 2012 and later.
You can implement your OpenGL installable client driver (ICD) to use the following OpenGL enhancements that
ship with Windows 8:
Controlling Resource Access with Mutexes
You can use these OpenGL mutex functions and associated structures to control access to resources while
specifying private data to associate with a keyed mutex:
D3DKMTAcquireKeyedMutex2
D3DKMTCreateKeyedMutex2
D3DKMT_ACQUIREKEYEDMUTEX2
D3DKMT_CREATEKEYEDMUTEX2
OpenGL Helper Functions
You can use these functions and their associated structures to access objects and their handles:
D3DKMTGetSharedResourceAdapterLuid
D3DKMTOpenAdapterFromLuid
D3DKMTOpenNtHandleFromName
D3DKMTOpenResourceFromNtHandle
D3DKMTOpenSyncObjectFromNtHandle
D3DKMT_GETSHAREDRESOURCEADAPTERLUID
D3DKMT_OPENADAPTERFROMLUID
D3DKMT_OPENNTHANDLEFROMNAME
D3DKMT_OPENRESOURCEFROMNTHANDLE
D3DKMT_OPENSYNCOBJECTFROMNTHANDLE
Monitor Drivers

Starting with Windows Vista, each monitor has a device stack that includes a Microsoft monitor class function
driver and possibly a vendor-supplied filter driver. The following topics describe the function and filter drivers
associated with monitors:
Monitor Class Function Driver
Monitor Filter Drivers
Monitor Class Function Driver

Each video output on the display adapter that has a monitor connected to it is represented by a device node that is
a child of the display adapter's device node.
Typically, there are only two device objects in the device stack that represent a (video output, monitor) pair: the
physical device object (PDO) and the functional device object (FDO). In some cases, there is a filter DO, associated
with a vendor-supplied filter driver, above the FDO. For integrated monitors, such as the built-in flat panel on a
laptop computer, there might be a filter DO, associated with the Advanced Configuration and Power Interface
(ACPI) driver, above the PDO.
The following table shows the device stack for a video output that has a connected monitor.

DEVICE OBJECT   REQUIRED/OPTIONAL                      DRIVER

Filter DO       Optional, typically not needed         Filter driver supplied by monitor vendor

FDO             Required                               Monitor class function driver (Monitor.sys) supplied by Microsoft

Filter DO       Required only for integrated ACPI      ACPI driver (Acpi.sys) supplied by Microsoft
                display panels

PDO             Required                               Bus driver (display miniport/port pair) supplied by display adapter vendor

User-mode applications use WMI to invoke the services of the monitor class function driver. Those services include
exposing a monitor's identification data and (in the case of an ACPI display) setting the brightness of the display.
A monitor stores its identification and capability information in an Extended Display Identification Data (EDID)
structure. A request, from a user-mode application, to read a monitor's EDID is processed by the function driver
(Monitor.sys) in that monitor's device stack. When the monitor function driver receives a request to retrieve the
monitor's EDID, it sends a request to the display port/miniport driver pair that is represented by the physical device
object (PDO) at the bottom of the monitor's device stack. The display port/miniport driver pair uses the Display
Data Channel (DDC) protocol to read the monitor's EDID over the I²C bus, which is a simple two-wire bus built into
all standard monitor cables.
The EDID can be obtained using the ACPI_METHOD_OUTPUT_DDC method whose alias is defined in Dispmprt.h.
This method is required for integrated LCDs that do not have another standard mechanism for returning EDID
data.
For more information about communication between display adapters and monitors, see the following topics:
I2C Bus and Child Devices of the Display Adapter
I2C Functions
I2C Functions Implemented by the Video Port Driver
For details about EDID structures and the DDC protocol, see the following standards published by the Video
Electronics Standards Association (VESA):
Enhanced Display Data Channel Standard
Enhanced EDID Standard
For details about the I²C bus, see the I²C Bus Specification published by Philips Semiconductors.
Monitor Filter Drivers

Microsoft provides a general-purpose monitor class function driver, Monitor.sys, that handles most monitor-
related tasks. There is no need for a vendor-supplied monitor driver unless the vendor wants to provide services
beyond those provided by the monitor class function driver.
If a monitor vendor chooses to provide a filter driver, that driver is represented by a filter device object that sits
above the functional device object in the monitor's device stack. The filter driver handles requests from user-mode
applications, also provided by the monitor vendor. The interface between the filter driver and the user-mode
applications is private and known only to the monitor vendor.
Note that programmatic control of a monitor through the Display Data Channel Command Interface (DDC/CI) is
not handled by the monitor device stack, so monitor vendors should not write filter drivers for that purpose.
For a representation of a monitor device stack, see Monitor Class Function Driver.
Multiple Monitors and Video Present Networks

This section describes how the video present network (VidPN) manager, the display miniport driver, and display
port driver collaborate to manage the collection of display devices connected to a display adapter.
Video Present Network Terminology
Introduction to Video Present Networks
VidPN Objects and Interfaces
Child Devices of the Display Adapter
Enumerating Child Devices of a Display Adapter
Monitor Hot Plug Detection
Enumerating Cofunctional VidPN Source and Target Modes
Determining Whether a VidPN is Supported on a Display Adapter
Video Present Network Terminology

The VidPN manager uses the notion of a video present network (VidPN) to manage the set of display devices
connected to a display adapter. For more information, see Introduction to Video Present Networks.
The following list gives definitions of the primary terms used to describe VidPNs, display adapters, and devices that
connect to display adapters.
display adapter's presentational subsystem
All the hardware responsible for scanning rendered content from video memory and presenting it on video
outputs.
video present network
A model that relates the video present sources on a display adapter to the video present targets on the adapter and
specifies how those sources and targets are configured. It is an abstraction of the display adapter's presentational
subsystem. A VidPN consists of the following:
video present sources
Independent views (that is, primary surface chains) that the display adapter can present concurrently.
video present targets
Independent physical video outputs, on each of which the display adapter can present a view.
topology
A collection of video present paths, where each video present path is an association between a video present
source and a video present target. A video present path specifies that a particular source is connected to a
particular target. The path also specifies the content transformations that are applied on the presented content.
Connecting a source to a target means that the display adapter scans from the primary surface (chain) of the
source and encodes the scanned content to the video signal format of the target, applying content transformations
(for example, contrast/brightness gains, flicker filter, color transformation) in the process.
source mode sets
Each video present source in the VidPN is associated with a source mode set, which is a list of primary surface
formats (source modes) that are supported on the source in the topology of the VidPN.
target mode sets
Each video present target in the VidPN is associated with a target mode set, which is a list of video signal formats
(target modes) that are supported on the target in the topology of the VidPN.
monitor source mode sets
Each target in the VidPN that is connected to a monitor (or other external display device) is associated with a
monitor source mode set, which is a list of video signal formats supported on the connected monitor.
active VidPN
The VidPN that is currently set on the display adapter.
pinned mode
A mode designated as the desired mode for a particular video present source or target. The mode that is pinned on
a source (target) is not necessarily the mode currently in use by the source (target); rather it is the desired mode for
that source (target) in a given VidPN (possibly to be used as the next active VidPN). A pinned mode must remain
available in its mode set as additional constraints are imposed on the VidPN. That is, no change to the VidPN
is authorized unless all of the pinned modes are still supported in the modified VidPN.
functional VidPN
A VidPN that satisfies all of the following conditions:
It has a topology with at least one video present path.
Every video present source in the topology has a pinned mode.
Every video present target in the topology has a pinned mode.
modality
The collection of mode sets for all the sources and targets in the topology of a VidPN.
cofunctional mode set
The set of modes that are available for a particular source or target, given the constraints (for example, topology,
modes pinned on other sources and targets) of a VidPN.
cofunctional VidPN modality
The collection of cofunctional mode sets for all the sources and targets in the topology of a VidPN.
child device of the display adapter
A device on the display adapter that the display miniport driver enumerates as a child. All child devices of the
display adapter are on-board devices; monitors and other devices that connect to the display adapter are not
considered child devices.
external device
A device that connects to a child device of the display adapter. External devices are not considered child devices of
the display adapter.
video output device
A child device of the display adapter that supplies a video output signal to an external or built-in display device.
video output codec
A hardware encoder/decoder on the display adapter that reads from a primary surface in video memory and
places a video signal representation of that surface on one or more of the display adapter's video outputs.
Introduction to Video Present Networks

The video present network (VidPN) manager, which is a component of the DirectX graphics kernel subsystem
(Dxgkrnl.sys), is responsible for managing the collection of monitors and other display devices that are connected
to a display adapter. The responsibilities of the VidPN manager include the following:
Respond to hot plugging and unplugging of monitors.
Maintain and update a set of available display modes as the set of connected monitors changes.
Manage the association between rendering surfaces and video outputs on the display adapter; for example,
clone views and extension of the desktop to multiple monitors.
Adjust the set of available display devices and display modes when the lid on a laptop computer is opened
or closed.
Adjust the set of available display devices and display modes when a laptop computer is docked or
undocked.
The hardware on a display adapter that is responsible for scanning rendered content from video memory and
presenting it on video outputs is called the display adapter's presentational subsystem. A video present network
(VidPN) is a software model of a display adapter's presentational subsystem.
The key elements of a display adapter's presentational subsystem are the views (primary surface chains) and the
video outputs. In the VidPN model, a view is called a video present source, and a video output is called a video
present target.
A video present path is an association between a video present source and a video present target. A VidPN models
the relationship between sources and targets by maintaining a set of video present paths. The set of paths is called
a VidPN topology.
Note that video present targets are not the monitors (or other external display devices) connected to the display
adapter. The video present targets are the video output connectors themselves.
The following diagram illustrates a VidPN.

The VidPN illustrated in the preceding diagram has three video present targets: a DVI connector, an HD15
connector, and an S-video connector. The VidPN topology is represented by the lines that connect the two sources
to the three targets. The topology specifies that Source 1 is connected to the DVI target and Source 2 is connected
to both the HD15 and S-video targets. The content rendered on Source 2 is presented as a clone view on the
display devices connected to the HD15 and S-video connectors.
Each video present source supports a certain set of surface formats called source modes. To keep track of the
source modes supported by the various video present sources, a VidPN maintains a source mode set for each
video present source. The source mode set for a particular video present source is not static; it changes as the
topology changes, and it changes according to the modes chosen for other video present sources.
The model works similarly for video present targets. Each video present target supports a certain set of video
signal formats called target modes, and a VidPN maintains a target mode set for each video present target. The
target mode set for a particular video present target changes as the topology changes and as modes are chosen
for other video present targets.
The Role of the Display Miniport Driver
A display adapter has one or more video output codecs (for example, a CRTC) that read from video present
sources and place the corresponding video signals on video present targets. At any given time, a video output
codec can read from at most one video present source; however, that codec can supply a video signal to more
than one video present target (clone view). The VidPN manager concerns itself with the associations between video
present sources and video present targets, but does not concern itself with the role of the video output codecs. The
decision about which video output codec reads from a particular video present source is entirely under the
control of the display miniport driver. For example, suppose a display adapter has two video output codecs, and
the VidPN manager asks the display miniport driver to implement the topology shown in the following diagram.

The following diagram shows one way that the display miniport driver could assign video output codecs to video
present sources.

Notice that the clone view (HD15, S-video) in the preceding diagram is handled by a single CRTC. Now suppose
that the HD15 output connected to CRTC 1 is no longer needed. Then the display miniport driver could implement
the clone view by configuring the video output codecs as shown in the following diagram:

Implementing the clone view with two CRTCs has some advantages over implementing it with one CRTC. For
example, with two CRTCs the HD15 and S-video outputs can have different resolutions and refresh rates.
The important point is that the VidPN manager never knows anything about how the video output codecs on a
display adapter are assigned to the video present sources and targets. The VidPN manager knows only the
associations between sources and targets. The underlying composite associations that involve the video output
codecs are known only to the display miniport driver.
VidPN Objects and Interfaces

The video present network (VidPN) manager uses a VidPN object to maintain information about associations
between video present sources, video present targets, and display modes. For more information, see the
Introduction to Video Present Networks topic.
A VidPN object contains the following sub-objects.
Topology
Source mode set
Target mode set
Monitor source mode set
Path
Source
Target
Source mode
Target mode
Monitor source mode
The following diagram illustrates a VidPN object and its sub-objects.

The preceding diagram illustrates whether a particular association is one-to-one, one-to-many, many-to-one, or
many-to-many. For example, the diagram shows that a source can belong to more than one path, but a target can
belong to only one path.
The blue objects in the diagram are accessed through handles and interfaces, and the gray objects are accessed
through structure pointers. An interface in this context is a structure that contains function pointers. For example,
the DXGK_VIDPNTOPOLOGY_INTERFACE structure contains pointers to functions (implemented by the VidPN
manager) that the display miniport driver calls to inspect and alter a topology object. When the display miniport
driver calls any one of those functions, it must supply a handle to a topology object. The following table lists the
handle, interface, and pointer data types used to access a VidPN object and its sub-objects.
OBJECT: ACCESS METHOD AND DATA TYPES
VidPN: accessed through a handle (D3DKMDT_HVIDPN) and an interface (DXGK_VIDPN_INTERFACE).
Topology: accessed through a handle (D3DKMDT_HVIDPNTOPOLOGY) and an interface (DXGK_VIDPNTOPOLOGY_INTERFACE).
Source mode set: accessed through a handle (D3DKMDT_HVIDPNSOURCEMODESET) and an interface (DXGK_VIDPNSOURCEMODESET_INTERFACE).
Target mode set: accessed through a handle (D3DKMDT_HVIDPNTARGETMODESET) and an interface (DXGK_VIDPNTARGETMODESET_INTERFACE).
Monitor source mode set: accessed through a handle (D3DKMDT_HMONITORSOURCEMODESET) and an interface (DXGK_MONITORSOURCEMODESET_INTERFACE).
Path: accessed through a structure pointer (D3DKMDT_VIDPN_PRESENT_PATH).
Source: accessed through a structure pointer (D3DKMDT_VIDEO_PRESENT_SOURCE).
Target: accessed through a structure pointer (D3DKMDT_VIDEO_PRESENT_TARGET).
Source mode: accessed through a structure pointer (D3DKMDT_VIDPN_SOURCE_MODE).
Target mode: accessed through a structure pointer (D3DKMDT_VIDPN_TARGET_MODE).
Monitor source mode: accessed through a structure pointer (D3DKMDT_MONITOR_SOURCE_MODE).

The VidPN manager, which is one of the components of the DirectX graphics kernel subsystem, cooperates with
the display miniport driver to build and maintain VidPNs. The following steps describe how the display miniport
driver obtains a handle and an interface to a VidPN object.
1. During initialization, the DirectX graphics kernel subsystem calls the display miniport driver's
DxgkDdiStartDevice function. That call provides the display miniport driver with a DXGKRNL_INTERFACE
structure, which contains pointers to functions implemented by the DirectX graphics kernel subsystem. One
of those functions is DxgkCbQueryVidPnInterface.
2. At some point, the VidPN manager needs help from the display miniport driver, so it provides the display
miniport driver with a handle to a VidPN object by calling one of the following functions:
DxgkDdiIsSupportedVidPn
DxgkDdiRecommendFunctionalVidPn
DxgkDdiEnumVidPnCofuncModality
3. The display miniport driver passes the handle obtained in Step 2 to DxgkCbQueryVidPnInterface, which
returns a pointer to a DXGK_VIDPN_INTERFACE structure.
After the display miniport driver has a handle and an interface to a VidPN object, it can get handles and interfaces
(as needed) to the primary sub-objects: topology, source mode set, target mode set, and monitor source mode set.
For example, the display miniport driver can call pfnGetTopology (one of the functions in the VidPN interface) to
get a handle to a VidPN topology object and a pointer to a DXGK_VIDPNTOPOLOGY_INTERFACE structure.
The following functions (in the VidPN interface) provide handles and interfaces to the primary sub-objects of a
VidPN object.
pfnGetTopology
pfnAcquireSourceModeSet
pfnAcquireTargetModeSet
Note that two of the functions in the preceding list have corresponding functions that release VidPN sub-objects.
pfnReleaseSourceModeSet
pfnReleaseTargetModeSet
After the display miniport driver obtains a handle and an interface to one of a VidPN's primary sub-objects, it can
call the interface functions to get descriptors of objects related to the sub-object. For example, given a handle and
an interface to a topology object, the display miniport driver could perform the following steps to get descriptors
of all the paths in the topology.
1. VidPN Topology interface
Call the pfnAcquireFirstPathInfo function of the VidPN topology interface to obtain a pointer to a
D3DKMDT_VIDPN_PRESENT_PATH structure that describes the first path in the topology.
2. VidPN Topology interface
Call the pfnAcquireNextPathInfo function repeatedly to obtain pointers to
D3DKMDT_VIDPN_PRESENT_PATH structures that describe the remaining paths in the topology.
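The path-enumeration steps above can be combined into a small routine. The following is a minimal sketch, assuming hVidPn and its DXGK_VIDPN_INTERFACE were already obtained as described earlier; InspectPath is a hypothetical driver helper, error handling is abbreviated, and the exact function signatures should be confirmed against the DDI reference for the targeted WDK version.

// Minimal sketch: walk every path in a VidPN topology. Assumes hVidPn and
// pVidPnInterface were supplied by the VidPN manager (for example, in
// DxgkDdiEnumVidPnCofuncModality).
VOID InspectPath(const D3DKMDT_VIDPN_PRESENT_PATH *pPath);  // hypothetical helper

NTSTATUS WalkVidPnTopology(D3DKMDT_HVIDPN hVidPn,
                           const DXGK_VIDPN_INTERFACE *pVidPnInterface)
{
    D3DKMDT_HVIDPNTOPOLOGY hTopology;
    const DXGK_VIDPNTOPOLOGY_INTERFACE *pTopologyInterface;
    const D3DKMDT_VIDPN_PRESENT_PATH *pPath = NULL;
    NTSTATUS status;

    // Get a handle and an interface for the VidPN's topology object.
    status = pVidPnInterface->pfnGetTopology(hVidPn, &hTopology, &pTopologyInterface);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    // Acquire the first path descriptor, then walk the rest of the paths.
    status = pTopologyInterface->pfnAcquireFirstPathInfo(hTopology, &pPath);
    while (NT_SUCCESS(status) && pPath != NULL)
    {
        const D3DKMDT_VIDPN_PRESENT_PATH *pNextPath = NULL;

        // pPath->VidPnSourceId and pPath->VidPnTargetId identify the source
        // and target that belong to this path.
        InspectPath(pPath);

        status = pTopologyInterface->pfnAcquireNextPathInfo(hTopology, pPath, &pNextPath);
        pTopologyInterface->pfnReleasePathInfo(hTopology, pPath);
        pPath = pNextPath;
    }

    return STATUS_SUCCESS;
}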
Similarly, the display miniport driver can get descriptors of the modes in a mode set by calling the
pfnAcquireFirstModeInfo and pfnAcquireNextModeInfo functions of any of the following mode set interfaces.
DXGK_VIDPNSOURCEMODESET_INTERFACE
DXGK_VIDPNTARGETMODESET_INTERFACE
DXGK_MONITORSOURCEMODESET_INTERFACE
Note that the DXGK_VIDPNSOURCEMODESET_INTERFACE interface has no function for removing a mode from
a source mode set. When the display miniport driver needs to update a source mode set, it does not alter an
existing mode set by adding and removing modes. Instead, it creates a new mode set that replaces the old mode
set. An example of a function that must update mode sets is the display miniport driver's
DxgkDdiEnumVidPnCofuncModality function. The steps involved in updating a source mode set are as follows:
1. VidPN Source Mode Set interface
Call the pfnCreateNewModeInfo function of the DXGK_VIDPNSOURCEMODESET_INTERFACE interface to get a
pointer to a D3DKMDT_VIDPN_SOURCE_MODE structure (allocated by the VidPN manager).
Call pfnAddMode repeatedly to add modes to the source mode set.
2. VidPN interface
Call the pfnAssignSourceModeSet function of the DXGK_VIDPN_INTERFACE to assign the new mode set to
a particular video present source. The new source mode set replaces the source mode set that is currently
assigned to that source.
Updating a target mode set is similar to updating a source mode set. The
DXGK_VIDPNTARGETMODESET_INTERFACE interface has the following functions:
VidPN Target Mode Set interface
A pfnCreateNewModeInfo function for creating a new target mode set and a pfnAddMode function for
adding modes to the set.
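The following sketch illustrates the mode-set replacement sequence just described, using pfnCreateNewSourceModeSet from the VidPN interface together with pfnCreateNewModeInfo, pfnAddMode, and pfnAssignSourceModeSet. It is a simplified illustration under assumed parameter lists; PopulateSourceMode is a hypothetical helper, and the target-mode-set case is analogous.

// Minimal sketch: build a new source mode set and assign it to a source.
VOID PopulateSourceMode(D3DKMDT_VIDPN_SOURCE_MODE *pMode);  // hypothetical helper

NTSTATUS ReplaceSourceModeSet(D3DKMDT_HVIDPN hVidPn,
                              const DXGK_VIDPN_INTERFACE *pVidPnInterface,
                              D3DDDI_VIDEO_PRESENT_SOURCE_ID SourceId)
{
    D3DKMDT_HVIDPNSOURCEMODESET hNewModeSet;
    const DXGK_VIDPNSOURCEMODESET_INTERFACE *pModeSetInterface;
    D3DKMDT_VIDPN_SOURCE_MODE *pMode;
    NTSTATUS status;

    // Create an empty source mode set for this source (VidPN interface).
    status = pVidPnInterface->pfnCreateNewSourceModeSet(hVidPn,
                                                        SourceId,
                                                        &hNewModeSet,
                                                        &pModeSetInterface);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    // Get a mode descriptor allocated by the VidPN manager, fill it in, and
    // add it to the set. Repeat these two calls for each supported mode.
    status = pModeSetInterface->pfnCreateNewModeInfo(hNewModeSet, &pMode);
    if (NT_SUCCESS(status))
    {
        PopulateSourceMode(pMode);
        status = pModeSetInterface->pfnAddMode(hNewModeSet, pMode);
    }

    if (NT_SUCCESS(status))
    {
        // Replace the mode set currently assigned to the source.
        status = pVidPnInterface->pfnAssignSourceModeSet(hVidPn, SourceId, hNewModeSet);
    }

    if (!NT_SUCCESS(status))
    {
        // On failure the driver still owns the new set and must release it.
        pVidPnInterface->pfnReleaseSourceModeSet(hVidPn, hNewModeSet);
    }

    return status;
}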
There is no interface (set of functions) for obtaining the source and target that belong to a particular path. The
display miniport driver can determine which source and target belong to a particular path by inspecting the
VidPnSourceId and VidPnTargetId members of the D3DKMDT_VIDPN_PRESENT_PATH structure that
represents the path.
Child Devices of the Display Adapter

A child device of the display adapter is a device on the display adapter that is enumerated as a child by the display
miniport driver. All child devices of the display adapter are on-board; monitors and other external devices that
connect to the display adapter are not considered child devices.
The display miniport driver's DxgkDdiQueryChildRelations function is responsible for enumerating child
devices of the display adapter. During the enumeration, the display miniport driver assigns each child device a type
and a hot-plug detection (HPD) awareness value. The type is one of the DXGK_CHILD_DEVICE_TYPE enumerators:
TypeVideoOutput
TypeOther
The HPD awareness value is one of the DXGK_CHILD_DEVICE_HPD_AWARENESS enumerators:
HpdAwarenessAlwaysConnected
HpdAwarenessInterruptible
HpdAwarenessPolled
The following table gives some examples of devices that have various types and HPD awareness values.

HPD AWARENESS: EXAMPLE CHILD DEVICES (TypeVideoOutput / TypeOther)
AlwaysConnected: output for an integrated LCD panel on a desktop computer (TypeVideoOutput); TV tuner, cross bar switch, MPEG2 codec (TypeOther).
Interruptible: DVI, HDMI, output for an integrated LCD panel on a portable computer (TypeVideoOutput).
Polled: S-video, HD15 (TypeVideoOutput).

The operating system uses one of several strategies, depending on the HPD awareness value, to determine
whether an external device is connected to a child device. The following table briefly describes how the operating
system determines the connection status of devices with various HPD awareness values.

HPD AWARENESS: HOW THE OPERATING SYSTEM DETERMINES CONNECTION STATUS
AlwaysConnected: The operating system knows the child device is always present. No external device is ever connected to or disconnected from the child device.
Interruptible: The operating system is notified when an external display device is connected to or disconnected from the child device. (The display panel on a portable computer is considered connected when the lid is open and disconnected when the lid is closed.)
Polled: The operating system asks whether an external display device is connected to the child device.

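A minimal sketch of a DxgkDdiQueryChildRelations implementation for a hypothetical adapter with a DVI connector and an integrated panel is shown below. The ChildUid and AcpiUid values are illustrative, and the exact DXGK_CHILD_CAPABILITIES field layout should be confirmed against the dispmprt.h header for the targeted DDI version.

// Minimal sketch of DxgkDdiQueryChildRelations for an adapter with two child
// devices: a DVI connector and the output for an integrated panel on a
// portable computer.
NTSTATUS APIENTRY DxgkDdiQueryChildRelations(
    PVOID MiniportDeviceContext,
    PDXGK_CHILD_DESCRIPTOR ChildRelations,
    ULONG ChildRelationsSize)
{
    UNREFERENCED_PARAMETER(MiniportDeviceContext);

    // ChildRelations is sized by the OS from the NumberOfChildren count that
    // DxgkDdiStartDevice returned.
    if (ChildRelationsSize < 2 * sizeof(DXGK_CHILD_DESCRIPTOR))
    {
        return STATUS_BUFFER_TOO_SMALL;
    }

    // Child 0: a DVI connector; hot plugging is detected by an interrupt.
    ChildRelations[0].ChildDeviceType = TypeVideoOutput;
    ChildRelations[0].ChildCapabilities.Type.VideoOutput.InterfaceTechnology = D3DKMDT_VOT_DVI;
    ChildRelations[0].ChildCapabilities.HpdAwareness = HpdAwarenessInterruptible;
    ChildRelations[0].AcpiUid = 0;
    ChildRelations[0].ChildUid = 0;   // Also used as the video present target identifier.

    // Child 1: the integrated panel; "connected" means the lid is open.
    ChildRelations[1].ChildDeviceType = TypeVideoOutput;
    ChildRelations[1].ChildCapabilities.Type.VideoOutput.InterfaceTechnology = D3DKMDT_VOT_INTERNAL;
    ChildRelations[1].ChildCapabilities.HpdAwareness = HpdAwarenessInterruptible;
    ChildRelations[1].AcpiUid = 0x110;  // Illustrative ACPI identifier.
    ChildRelations[1].ChildUid = 1;

    return STATUS_SUCCESS;
}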
Enumerating Child Devices of a Display Adapter

The following sequence of steps describes how the display port driver, display miniport driver, and video present
network (VidPN) manager collaborate at initialization time to enumerate child devices of a display adapter.
1. The display port driver calls the display miniport driver's DxgkDdiStartDevice function.
DxgkDdiStartDevice returns (in the NumberOfChildren parameter) the number of devices that are (or could
become by docking) children of the display adapter. DxgkDdiStartDevice also returns (in the
NumberOfVideoPresentSources parameter) the number N of video present sources supported by the
display adapter. Those video present sources will subsequently be identified by the numbers 0, 1, ... N -1.
2. The display port driver calls the display miniport driver's DxgkDdiQueryChildRelations function, which
enumerates child devices of the display adapter. DxgkDdiQueryChildRelations fills in an array of
DXGK_CHILD_DESCRIPTOR structures: one for each child device. Note that all child devices of the display
adapter are on-board: monitors and other external devices that connect to the display adapter are not
considered child devices. For more information, see Child Devices of the Display Adapter.
DxgkDdiQueryChildRelations must enumerate potential child devices as well as the child devices that are
physically present at initialization time. For example, if connecting a laptop computer to a docking station
will result in the appearance of a new video output, DxgkDdiQueryChildRelations must enumerate that
video output regardless of whether the computer is docked at initialization time. Also, if connecting a dongle
to a video output connector will allow several monitors to share the connector, DxgkDdiQueryChildRelations
must enumerate a child device for each branch of the dongle, regardless of whether the dongle is connected
at initialization time.
3. For each child device (enumerated as described in Step 2) that has an HPD awareness value of
HpdAwarenessInterruptible or HpdAwarenessPolled, the display port driver calls the display miniport
driver's DxgkDdiQueryChildStatus function to determine whether the child device has an external device
connected to it.
4. The display port driver creates a PDO for each child device that satisfies one of the following conditions:
The child device has an HPD awareness value of HpdAwarenessAlwaysConnected.
The child device has an HPD awareness value of HpdAwarenessPolled or
HpdAwarenessInterruptible, and the operating system knows from a previous query or notification
that the child device has an external device connected.
5. The display port driver calls the display miniport driver's DxgkDdiQueryDeviceDescriptor function for
each child device that satisfies one of the following conditions:
The child device is known to have an external device connected.
The child device is assumed to have an external device connected.
The child device has a type of TypeOther.
DxgkDdiQueryDeviceDescriptor returns an Extended Display Identification Data (EDID) block if the connected
monitor (or other display device) supports EDID descriptors.
Note: During initialization, the display port driver calls DxgkDdiQueryDeviceDescriptor for each monitor to
obtain the first 128-byte block of the monitor's EDID. That gives the display port driver what it needs at
initialization time: PnP hardware ID, instance ID, compatible IDs, and device text. At a later time, the monitor
class function driver (Monitor.sys) calls DxgkDdiQueryDeviceDescriptor for each monitor to obtain the first
128-byte EDID block and additional 128-byte EDID extension blocks. This means that the display miniport
driver will be called twice to provide the first 128-byte block of each monitor's EDID.
6. The VidPN manager obtains identifiers for all of the video present sources and video present targets
supported by the display adapter. The video present sources are identified by the numbers 0, 1, ... N - 1,
where N is the number of sources returned by the display miniport driver's DxgkDdiStartDevice function.
The video present targets have unique integer identifiers that were previously created by the display
miniport driver during DxgkDdiQueryChildRelations. Each child device of type TypeVideoOutput is
associated with a video present target, and the ChildUid member of the child device's
DXGK_CHILD_DESCRIPTOR structure is used as the identifier for the video present target.
7. The VidPN manager uses the following procedure to build an initial VidPN.
If a last known good VidPN is recorded in the registry, use it as the initial VidPN.
Otherwise, call the display miniport driver's DxgkDdiRecommendFunctionalVidPn function to
obtain an initial VidPN.
If DxgkDdiRecommendFunctionalVidPn fails to return a functional VidPN that is acceptable, create a
simple VidPN that contains one video present path; that is, one (source, target) pair. Call the display
miniport driver's DxgkDdiIsSupportedVidPn function to verify that the proposed VidPN will work.
If DxgkDdiIsSupportedVidPn reports that the proposed VidPN will not work, keep trying until a
suitable VidPN is found.
Call the display miniport driver's DxgkDdiEnumVidPnCofuncModality function to determine the
source and target modes that are available for the VidPN.
Monitor Hot Plug Detection

A video output on a display adapter is considered a child device of the display adapter. A monitor or other external
display device that connects to the output is not considered a child device. During initialization, the display
miniport driver's DxgkDdiQueryChildRelations function assigns each child device a type and an HPD awareness
value. The type is one of the DXGK_CHILD_DEVICE_TYPE enumerators:
TypeVideoOutput
TypeOther
The HPD awareness value is one of the DXGK_CHILD_DEVICE_HPD_AWARENESS enumerators:
HpdAwarenessAlwaysConnected
HpdAwarenessInterruptible
HpdAwarenessPolled
A child device that has a type of TypeVideoOutput and any HPD awareness value other than
HpdAwarenessAlwaysConnected is called a video output connector.
If the display miniport driver cannot determine whether a monitor is connected to the video output, the driver
should emulate the behavior of an interruptible device, with the HPD awareness value set to
HpdAwarenessInterruptible. If the display miniport driver needs to indicate that an interruptible monitor should
be connected to the video output, such as when a user enters a keyboard shortcut to switch to a television view, the
driver should call the DxgkCbIndicateChildStatus function with ChildStatus.HotPlug.Connected set to TRUE.
At certain times, the operating system requests that the display miniport driver report the status of all video output
connectors that have an HPD awareness value of HpdAwarenessPolled. There is no regular polling interval;
rather, the request is made when there is a specific need to update the list of available display devices and modes.
For example, when a laptop computer is docked, the operating system needs to know whether a monitor is
connected to the video output on the docking station. The operating system makes the request by calling the
display miniport driver's DxgkDdiQueryChildStatus function for each child device that has an HPD awareness
value of HpdAwarenessPolled.
For video output connectors that have an HPD awareness value of HpdAwarenessInterruptible, the display
miniport driver is responsible for notifying the operating system whenever an external display device is hot
plugged or unplugged. The display miniport driver's interrupt handling code calls the display port driver's
DxgkCbIndicateChildStatus function to report that an external display device has been connected to or
disconnected from a particular video output. When a laptop computer is docked, the display miniport driver's
DxgkDdiNotifyAcpiEvent function must call DxgkCbIndicateChildStatus for each video output on the docking
station that has an HPD awareness value of HpdAwarenessInterruptible.
If a connector with an HPD awareness value of HpdAwarenessPolled is made unavailable (that is, covered up)
when a laptop computer is docked, the display miniport driver's DxgkDdiNotifyAcpiEvent function must call
DxgkCbIndicateChildStatus to report that the connector is disconnected.
The video output associated with an integrated display panel on a portable computer is an unusual case. The
operating system needs to know whether the portable computer's lid is open or closed, so the idea of connected is
used to mean open and the idea of not connected is used to mean closed. The video output associated with an
integrated display on a portable computer has an HPD awareness value of HpdAwarenessInterruptible. That
does not mean, however, that the display adapter generates an interrupt when the lid is opened or closed. Rather,
the ACPI BIOS generates an interrupt when the lid is opened or closed. That interrupt results in a call to the display
miniport driver's DxgkDdiNotifyAcpiEvent function, which calls DxgkCbIndicateChildStatus to report the status
(open or closed) of the lid. The display miniport driver reports the status of the lid by setting the
HotPlug.Connected member of a DXGK_CHILD_STATUS structure to TRUE (open) or FALSE (closed) and
passing the DXGK_CHILD_STATUS structure to DxgkCbIndicateChildStatus.
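The following is a minimal sketch of how a driver might report such a status change from its DPC callback or from DxgkDdiNotifyAcpiEvent. DEVICE_CONTEXT is a hypothetical driver-defined structure, and the sketch assumes the DXGKRNL_INTERFACE callback table was saved during DxgkDdiStartDevice.

// Minimal sketch: report a connect/disconnect event for a video output.
typedef struct _DEVICE_CONTEXT {
    HANDLE DxgkDeviceHandle;            // DeviceHandle from DXGKRNL_INTERFACE
    DXGKRNL_INTERFACE DxgkInterface;    // Callback table saved in DxgkDdiStartDevice
} DEVICE_CONTEXT;

VOID ReportConnectionChange(DEVICE_CONTEXT *pContext,
                            ULONG ChildUid,
                            BOOLEAN Connected)
{
    DXGK_CHILD_STATUS childStatus;

    RtlZeroMemory(&childStatus, sizeof(childStatus));
    childStatus.Type = StatusConnection;        // Reporting connect/disconnect status.
    childStatus.ChildUid = ChildUid;            // Identifies the video output connector.
    childStatus.HotPlug.Connected = Connected;  // TRUE = plugged in (or lid open).

    // Notify the display port driver so it can update the child device's
    // state and the set of available display modes.
    pContext->DxgkInterface.DxgkCbIndicateChildStatus(pContext->DxgkDeviceHandle,
                                                      &childStatus);
}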
The following list describes the steps followed when a monitor is connected to an HD15 connector, assuming that
the connector has an HPD awareness value of HpdAwarenessPolled.
1. A monitor is connected to the HD15 connector on the display adapter. The display adapter does not detect
this as a hot-plug event.
2. At some future time, a user-mode application requests a list of display devices.
3. For each video output connector on the display adapter that has an HPD awareness value of
HpdAwarenessPolled, the VidPN manager calls the display miniport driver's DxgkDdiQueryChildStatus
function to determine whether an external display device is connected. When DxgkDdiQueryChildStatus is
called for the HD15 connector, it reports that an external monitor is indeed connected.
The following list describes the steps followed when a monitor is connected to a DVI connector, assuming that the
connector has an HPD awareness value of HpdAwarenessInterruptible.
1. A flat panel is connected to the DVI connector on the display adapter.
2. The display adapter detects a hot-plug event and generates an interrupt.
3. The interrupt is handled by the display miniport driver's DxgkDdiInterruptRoutine function, which
schedules a deferred procedure call (DPC). Subsequently the display miniport driver's DPC callback function
is called.
4. The DPC callback function passes a DXGK_CHILD_STATUS structure to the display port driver's
DxgkCbIndicateChildStatus function to report the status of the DVI connector. The ChildUid member of
the DXGK_CHILD_STATUS structure identifies the DVI connector, and the HotPlug.Connected member (set
to TRUE in this case) indicates that an external display device is connected.
Suppose a DVI connector supports a dongle that has three branches: DVI, HD15, and S-video. In that case, the
display miniport driver would have previously enumerated three child devices associated with the one physical DVI
connector: DVI-on-DVI, HD15-on-DVI, and S-video-on-DVI. Each of those child devices would have a type of
TypeVideoOutput and an HPD awareness value of HpdAwarenessInterruptible. The following list describes the
steps followed when a monitor is connected to the HD15 branch of the dongle.
1. The display adapter detects a hot-plug event and generates an interrupt.
2. The interrupt is handled by the display miniport driver's DxgkDdiInterruptRoutine function, which
schedules a deferred procedure call (DPC). Subsequently the display miniport driver's DPC callback function
is called.
3. The DPC callback function determines that the hot-plug event was on the HD15 branch of the dongle
(HD15-on-DVI).
4. The DPC callback function passes a DXGK_CHILD_STATUS structure to DxgkCbIndicateChildStatus to
report the status of the HD15-on-DVI video output. The ChildUid member of the DXGK_CHILD_STATUS
structure identifies the video output, and the HotPlug.Connected member (set to TRUE in this case)
indicates that an external display device is connected.
The following list describes the steps followed when the lid is closed on a laptop computer.
1. The lid is closed on a portable computer, which generates an ACPI event. Subsequently, the display miniport
driver's DxgkDdiNotifyAcpiEvent function is called.
2. DxgkDdiNotifyAcpiEvent passes a DXGK_CHILD_STATUS structure to the display port driver's
DxgkCbIndicateChildStatus function to report the status of the child device associated with the built-in
display panel. Specifically, DxgkDdiNotifyAcpiEvent sets the HotPlug.Connected member of the
DXGK_CHILD_STATUS structure to FALSE.
Enumerating Cofunctional VidPN Source and Target
Modes

This topic describes how the video present network (VidPN) manager and the display miniport driver collaborate
to enumerate modes that are available on video present sources and targets. Before reading this material, you
should be familiar with the material in the following topics:
Introduction to Video Present Networks
VidPN Objects and Interfaces
From time to time, the VidPN manager asks the display miniport driver to enumerate the modes that are available
on a display adapter's video present sources and targets. Typically, the request has the following pattern:
1. The VidPN manager creates or obtains a VidPN that has modes pinned on some, but not all, of its sources
and targets.
2. The VidPN manager calls DxgkDdiIsSupportedVidPn to determine whether the VidPN can be extended to
form a functional VidPN that is supported on the display adapter. That is, it asks whether modes can be
pinned on the remaining sources and targets without changing the existing pinned modes.
3. The VidPN manager calls DxgkDdiEnumVidPnCofuncModality to obtain the modes that are available on
the sources and targets that do not yet have pinned modes.
One of the arguments passed to DxgkDdiEnumVidPnCofuncModality is a handle to a VidPN object called the
constraining VidPN.
DxgkDdiEnumVidPnCofuncModality must do the following:
Inspect the constraining VidPN.
For each source and target that does not have a pinned mode, adjust the mode set so that it is the largest
possible mode set that is cofunctional with the constraints.
For each path that does not have a pinned scaling transformation, adjust the scaling support flags so that
they are cofunctional with the constraints.
For each path that does not have a pinned rotation transformation, adjust the rotation support flags so that
they are cofunctional with the constraints.
For each source that has a pinned mode, report the multisampling methods that are available for that
source.
The following paragraphs give details on how to perform each of the tasks in the previous bulleted list.
Inspecting the constraining VidPN
The following properties of the constraining VidPN are the constraints that must be honored by
DxgkDdiEnumVidPnCofuncModality.
Topology (the set of associations between sources and targets)
Pinned modes
Scaling, scaling support, rotation, and rotation support of each path
Target color basis of each path
Target color coefficient dynamic ranges of each path
Content type (graphics or video) of each path
Gamma ramp of each path
To extract the constraints from the constraining VidPN, perform the following steps:
VidPN interface.
Begin by calling the pfnGetTopology function to get a pointer to a VidPN Topology interface that
represents the constraining VidPN's topology.
VidPN Topology interface
Call the pfnAcquireFirstPathInfo and pfnAcquireNextPathInfo functions to get information about each
path in the constraining VidPN's topology. Information about a particular path (source ID, target ID, scaling
transformation, rotation transformation, target color basis, etc.) is contained in a
D3DKMDT_VIDPN_PRESENT_PATH structure.
VidPN interface
For each path, pass the path's source ID to the pfnAcquireSourceModeSet function to get the path's
source mode set.
VidPN Source Mode Set interface
Call the pfnAcquirePinnedModeInfo function to determine which mode (if any) is pinned in the source's
mode set. If the source's mode set has a pinned mode, there is probably no need to examine the remaining
modes in the set. If the mode set does not have a pinned mode, examine the remaining modes in the set by
calling pfnAcquireFirstModeInfo and pfnAcquireNextModeInfo.
Use a similar procedure to examine the target mode sets and to determine which target mode sets have
pinned modes.
Adjusting mode sets
As you inspect the mode sets associated with sources and targets in the constraining VidPN's topology, take note
of which mode sets have pinned modes. If a mode set does not have a pinned mode, determine whether it needs
to be adjusted. A mode set must be adjusted if it contains modes that are not cofunctional with the constraints or if
it lacks available modes that are cofunctional with the constraints.
For video present targets that have connected monitors, you must also consider the set of modes supported by the
monitor. Even if a video present target on the display adapter supports a particular mode (given the constraints),
you should only list that mode in the target's mode set if the connected monitor also supports the mode. To
determine the modes supported by the connected monitor, call the pfnAcquireMonitorSourceModeSet function of
the DXGK_MONITOR interface.
If a mode set needs no adjustment, you can leave it alone. If a mode set needs to be adjusted, you must create a
new mode set and replace the existing mode set with the new one:
VidPN interface
To create a new source mode set, call pfnCreateNewSourceModeSet.
VidPN Source Mode Set interface
To populate the new set, call pfnCreateNewModeInfo and pfnAddMode.
VidPN interface
Finally, call pfnAssignSourceModeSet to replace the existing source mode set with the new one.
Adjusting scaling support flags
For each path in the constraining VidPN's topology, determine whether the path has a pinned scaling
transformation. To make that determination, inspect vpnPath.ContentTransformation.Scaling, where vpnPath is
the D3DKMDT_VIDPN_PRESENT_PATH structure that represents the path. If
vpnPath.ContentTransformation.Scaling is set to D3DKMDT_VPPS_IDENTITY, D3DKMDT_VPPS_CENTERED,
or D3DKMDT_VPPS_STRETCHED, then the scaling transformation for the path is pinned. Otherwise, the scaling
transformation is not pinned.
If the path does not have a pinned scaling transformation, determine whether the path's scaling support flags need
to be adjusted. The support flags must be adjusted if they show support for a type of scaling that is not
cofunctional with the constraints or if they fail to show support for a type of scaling that is cofunctional with the
constraints. To alter the scaling support flags, set the members of the
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT structure that holds the flags.
Adjusting rotation support flags
Adjusting a path's rotation support flags is similar to adjusting a path's scaling support flags. Suppose vpnPath is a
D3DKMDT_VIDPN_PRESENT_PATH structure. If vpnPath.ContentTransformation.Rotation is set to
D3DKMDT_VPPR_IDENTITY, D3DKMDT_VPPR_ROTATE90, D3DKMDT_VPPR_ROTATE180, or
D3DKMDT_VPPR_ROTATE270, then the rotation transformation for the path is pinned. Otherwise, the rotation
transformation is not pinned. The rotation support flags are in
vpnPath.ContentTransformation.RotationSupport.
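These pinned-transformation checks can be expressed directly against the path descriptor, as in the following minimal sketch (assuming the D3DKMDT_VIDPN_PRESENT_PATH was acquired from the topology as described earlier; type and enumerator names follow the WDK headers).

// Minimal sketch: determine whether the scaling and rotation transformations
// of a path are pinned, given the D3DKMDT_VIDPN_PRESENT_PATH that describes it.
BOOLEAN IsScalingPinned(const D3DKMDT_VIDPN_PRESENT_PATH *pPath)
{
    D3DKMDT_VIDPN_PRESENT_PATH_SCALING scaling = pPath->ContentTransformation.Scaling;

    return (scaling == D3DKMDT_VPPS_IDENTITY ||
            scaling == D3DKMDT_VPPS_CENTERED ||
            scaling == D3DKMDT_VPPS_STRETCHED);
}

BOOLEAN IsRotationPinned(const D3DKMDT_VIDPN_PRESENT_PATH *pPath)
{
    D3DKMDT_VIDPN_PRESENT_PATH_ROTATION rotation = pPath->ContentTransformation.Rotation;

    return (rotation == D3DKMDT_VPPR_IDENTITY ||
            rotation == D3DKMDT_VPPR_ROTATE90 ||
            rotation == D3DKMDT_VPPR_ROTATE180 ||
            rotation == D3DKMDT_VPPR_ROTATE270);
}

// When a transformation is not pinned, the driver adjusts the corresponding
// support flags (ContentTransformation.ScalingSupport and
// ContentTransformation.RotationSupport) so that only transformations that
// are cofunctional with the constraints are reported.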
Reporting multisampling methods
If the display adapter has one or more video output codecs that are capable of antialiasing by multisampling, then
you must report the multisampling methods that are available (given the constraints) for each source that has a
pinned mode. To report the available multisampling methods, perform the following steps:
Create an array of D3DDDI_MULTISAMPLINGMETHOD structures
Pass the array to the pfnAssignMultisamplingMethodSet function of the VidPN interface.
The D3DDDI_MULTISAMPLINGMETHOD structure has two members, which you must set, that characterize a
multisampling method. The NumSamples member indicates the number of subpixels that are sampled. The
NumQualityLevels member indicates the number of quality levels at which the method can operate. You can
specify any number of quality levels as long as each increase in level noticeably improves the quality of the
presented image.
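A minimal sketch of reporting a multisampling method set for one source is shown below. The sample counts and quality levels are illustrative only, and the exact pfnAssignMultisamplingMethodSet parameter list should be confirmed against the DDI reference.

// Minimal sketch: report 2x and 4x multisampling for a source that has a
// pinned mode.
NTSTATUS ReportMultisamplingMethods(D3DKMDT_HVIDPN hVidPn,
                                    const DXGK_VIDPN_INTERFACE *pVidPnInterface,
                                    D3DDDI_VIDEO_PRESENT_SOURCE_ID SourceId)
{
    D3DDDI_MULTISAMPLINGMETHOD methods[2];

    methods[0].NumSamples = 2;        // 2 subpixel samples per pixel
    methods[0].NumQualityLevels = 1;  // one quality level for the 2x method

    methods[1].NumSamples = 4;        // 4 subpixel samples per pixel
    methods[1].NumQualityLevels = 2;  // two visually distinct quality levels

    // Hand the array to the VidPN manager for this source.
    return pVidPnInterface->pfnAssignMultisamplingMethodSet(hVidPn,
                                                            SourceId,
                                                            ARRAYSIZE(methods),
                                                            methods);
}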
Enumeration Pivots
As described previously, DxgkDdiEnumVidPnCofuncModality must create mode sets that are cofunctional with the
VidPN passed in its hConstrainingVidPn parameter. In some cases, DxgkDdiEnumVidPnCofuncModality must
augment its behavior according to additional information (an enumeration pivot) passed in the EnumPivotType
and EnumPivot parameters.
The enumeration pivot can be one of the following:
The mode set of a particular video present source
The mode set of a particular video present target
The scaling transformation of a particular VidPN present path
The rotation transformation of a particular VidPN present path
If the enumeration pivot is a mode set, then DxgkDdiEnumVidPnCofuncModality must leave that mode set
unchanged. If the enumeration pivot is the scaling (rotation) transformation of a path, then
DxgkDdiEnumVidPnCofuncModality must not change the scaling (rotation) support flags for that path.
Determining Whether a VidPN is Supported on a
Display Adapter

This topic describes how the display miniport driver determines whether a particular video present network
(VidPN) is supported on a display adapter. Before reading this material, you should be familiar with the material in
the following topics:
Introduction to Video Present Networks
VidPN Objects and Interfaces
A VidPN is functional if it satisfies the following conditions:
It has a topology that has at least one path. (A path is an association between a source and target.)
Each source and target in the topology has a pinned mode.
A VidPN is supported on a display adapter if one of the following conditions is true:
It is functional, and it can be implemented on the display adapter. That is, the video output codecs on the
display adapter can be configured to support the topology and the pinned modes specified by the VidPN.
It has a topology with at least one path, and it can be extended to a functional VidPN that can be
implemented on the display adapter. That is, it would be possible, without changing any modes that have
already been pinned, to pin modes on all the video present sources and targets that don't yet have modes
pinned. Furthermore, it would be possible to implement the resulting functional VidPN on the display
adapter.
It has an empty topology. The idea is that displaying nothing is always supported on a display adapter.
Part of determining whether a VidPN is supported is determining whether the VidPN's topology is valid. In other
words, can the video present sources be connected to the video present targets as specified by the topology? Note
that it is not a requirement that all video present targets in the topology have connected monitors. The topology
can be valid and the VidPN can be supported even if there are no connected monitors.
From time to time, the VidPN manager calls DxgkDdiIsSupportedVidPn to ask the display miniport driver
whether a certain VidPN is supported on a display adapter. One of the arguments passed to
DxgkDdiIsSupportedVidPn is a handle to a VidPN object called the desired VidPN. DxgkDdiIsSupportedVidPn
must inspect the topology of the desired VidPN and must take note of which video present sources and targets in
the desired VidPN already have pinned modes. Then it must return a Boolean value that indicates whether the
desired VidPN is supported (according to the definition given previously in this topic). For information about
inspecting the topology, source mode sets, and target mode sets of a VidPN, see VidPN Objects and Interfaces.
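A skeletal DxgkDdiIsSupportedVidPn is sketched below. It treats an empty topology as always supported and defers the real capability check to a hypothetical helper, CheckTopologyAndPinnedModes; the ADAPTER_CONTEXT type is driver-defined, and the DxgkCbQueryVidPnInterface call follows the pattern described in VidPN Objects and Interfaces (verify the exact signatures against the DDI reference).

// Skeletal sketch of DxgkDdiIsSupportedVidPn.
typedef struct _ADAPTER_CONTEXT {
    HANDLE DxgkDeviceHandle;
    DXGKRNL_INTERFACE DxgkInterface;   // saved in DxgkDdiStartDevice
} ADAPTER_CONTEXT;

BOOLEAN CheckTopologyAndPinnedModes(ADAPTER_CONTEXT *pAdapter,
                                    D3DKMDT_HVIDPN hVidPn,
                                    const DXGK_VIDPN_INTERFACE *pVidPnInterface);

NTSTATUS APIENTRY DxgkDdiIsSupportedVidPn(
    const HANDLE hAdapter,
    DXGKARG_ISSUPPORTEDVIDPN *pIsSupportedVidPn)
{
    ADAPTER_CONTEXT *pAdapter = (ADAPTER_CONTEXT *)hAdapter;
    const DXGK_VIDPN_INTERFACE *pVidPnInterface;
    D3DKMDT_HVIDPNTOPOLOGY hTopology;
    const DXGK_VIDPNTOPOLOGY_INTERFACE *pTopologyInterface;
    SIZE_T numPaths = 0;
    NTSTATUS status;

    pIsSupportedVidPn->IsVidPnSupported = FALSE;

    // Get an interface for the desired VidPN (see VidPN Objects and Interfaces).
    status = pAdapter->DxgkInterface.DxgkCbQueryVidPnInterface(
        pIsSupportedVidPn->hDesiredVidPn,
        DXGK_VIDPN_INTERFACE_VERSION_V1,
        &pVidPnInterface);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    // Inspect the topology of the desired VidPN.
    status = pVidPnInterface->pfnGetTopology(pIsSupportedVidPn->hDesiredVidPn,
                                             &hTopology,
                                             &pTopologyInterface);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    status = pTopologyInterface->pfnGetNumPaths(hTopology, &numPaths);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    if (numPaths == 0)
    {
        // An empty topology (display nothing) is always supported.
        pIsSupportedVidPn->IsVidPnSupported = TRUE;
    }
    else
    {
        // Check the topology and any pinned modes against the hardware's
        // video output codec capabilities.
        pIsSupportedVidPn->IsVidPnSupported = CheckTopologyAndPinnedModes(
            pAdapter, pIsSupportedVidPn->hDesiredVidPn, pVidPnInterface);
    }

    return STATUS_SUCCESS;
}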
Indirect Display Driver Model Overview

The Indirect Display driver model was designed to provide a simple user-mode driver model to support monitors
that are not connected to traditional GPU display outputs; for example, a dongle connected to the PC via USB that
has a regular (VGA, DVI, HDMI, DP, and so on) monitor connected to it.

Driver Implementation
The Indirect Display driver model is implemented as a UMDF class extension. The driver is the UMDF driver for the
device and uses the functionality exposed by the IddCx (Indirect Display Driver Class eXtension) to interface with
the Windows graphics subsystems.

Indirect Display Driver Functionality


As the Indirect Display driver is the UMDF driver, it is responsible for all UMDF functionality like device
communications, power management, plug and play, and so on. The IddCx provides an interface to the Indirect Display
driver to interact with the Windows graphics sub-system in the following ways:
1. Create the graphics adapter representing the Indirect Display device
2. Report monitors being connected and disconnected from the system
3. Provide descriptions of the monitors connected
4. Provide available display modes
5. Support other display functionality, like hardware mouse cursor, gamma, I2C communications and protected
content
6. Process the desktop images to display on the monitor
Because the Indirect Display driver is a UMDF driver running in session 0, it does not have any component
running in the user session, so any driver instability will not affect the stability of the system as a whole.

User Mode Model


The Indirect Display driver is a user-mode-only model with no support for kernel-mode components, so the
driver is able to use any DirectX APIs in order to process the desktop image. In fact, IddCx provides the desktop
image to encode in a DirectX surface.
Note The Indirect Display driver should be built as a Universal Windows driver so it can be used on multiple
Windows platforms.
At build time, the UMDF Indirect Display driver declares the version of IddCx it was built against and the OS
ensures that the correct version of IddCx is loaded when the driver is loaded.
The following sections describe the Indirect Display Driver Model:
IddCx Objects
IddCx Objects

IddCx uses the extensible UMDF object model to represent graphics objects; they are covered in the following
sections. The UMDF object model allows driver-specific storage to be associated with each IddCx (and hence
UMDF) object. See UMDF Object Model for more information.

IDDCX_ADAPTER
This object represents a single logical display adapter created by the driver in a two-stage process. First, the driver
calls the IddCxAdapterInitAsync callback function, and then the OS calls the driver's EvtIddCxAdapterInitFinished
DDI to complete the initialization.
In the simplest case, there is a one-to-one mapping between the UMDF device object created by the plug and play
subsystem for the attached Indirect Display device and the IDDCX_ADAPTER object the driver creates. In more complex
scenarios where a single Indirect Display dongle contains multiple plug and play devices (for example, two USB device
functions), it is the responsibility of the driver to create only a single IDDCX_ADAPTER object for the multiple
UMDF device objects created, one for each PnP device. The driver needs to consider the following in this scenario:
1. The IDDCX_ADAPTER should only be created once all the devices that make up the Indirect Display solution
have been started successfully.
2. The driver has to pass a single WDFDEVICE when creating the adapter, so it requires logic to decide which
UMDF device it will pass.
3. If any of the devices that make up the Indirect Display adapter have a hardware error, the driver should report
all devices that make up the adapter as being in error.
The Indirect Display model does not have an explicit destroy adapter callback. Once the adapter initialization
sequence has been completed successfully, the adapter is valid until the UMDF device passed at initialization time
is stopped. When creating the adapter, the driver provides static adapter information about the Indirect Display
adapter.

IDDCX_MONITOR
This object represents a specific monitor connected to one of the connectors on the Indirect Display adapter.
The driver creates the monitor object in a two-stage process. First, the driver calls the IddCxMonitorCreate
callback to create the IDDCX_MONITOR object, and then calls the IddCxMonitorArrival callback to complete the
monitor arrival. When a monitor is unplugged, the driver calls the IddCxMonitorDeparture callback to report that
the monitor has been unplugged, which will cause the IDDCX_MONITOR object to be destroyed. Even if the same
monitor is unplugged and then reconnected, the IddCxMonitorDeparture/IddCxMonitorArrival sequence needs to
be called again. The IDDCX_MONITOR is a child of the IDDCX_ADAPTER object.
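The arrival/departure sequence might look roughly like the following sketch. The IDARG_IN_MONITORCREATE, IDARG_OUT_MONITORCREATE, and IDARG_OUT_MONITORARRIVAL structure names follow IddCx conventions, but their contents are omitted here and should be filled in per the IddCx headers; treat the details as assumptions rather than a definitive implementation.

// Minimal sketch of the monitor arrival/departure sequence.
NTSTATUS ReportMonitorPlugged(IDDCX_ADAPTER AdapterObject,
                              IDDCX_MONITOR *pMonitorOut)
{
    IDARG_IN_MONITORCREATE monitorCreateIn = { 0 };
    IDARG_OUT_MONITORCREATE monitorCreateOut = { 0 };
    IDARG_OUT_MONITORARRIVAL arrivalOut = { 0 };
    NTSTATUS status;

    // ... fill monitorCreateIn with the connector index and monitor
    // description here (fields per the IddCx headers) ...

    // Stage 1: create the IDDCX_MONITOR object.
    status = IddCxMonitorCreate(AdapterObject, &monitorCreateIn, &monitorCreateOut);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    // Stage 2: report that the monitor is now attached.
    status = IddCxMonitorArrival(monitorCreateOut.MonitorObject, &arrivalOut);
    if (NT_SUCCESS(status))
    {
        *pMonitorOut = monitorCreateOut.MonitorObject;
    }

    return status;
}

VOID ReportMonitorUnplugged(IDDCX_MONITOR MonitorObject)
{
    // Destroys the IDDCX_MONITOR object; reconnecting the same monitor
    // requires the create/arrival sequence to be run again.
    IddCxMonitorDeparture(MonitorObject);
}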

IDDCX_SWAPCHAIN
This object represents a swapchain that will provide desktop images to display on a connected monitor. The
swapchain has multiple buffers to allow the OS to compose the next desktop image in one buffer while the Indirect
Display driver is accessing another buffer. The IDDCX_SWAPCHAIN is a child of the IDDCX_MONITOR, so there
will only be one swapchain assigned to a given monitor at any time. The OS creates and destroys the
IDDCX_SWAPCHAIN objects and assigns/unassigns them to monitors using the
EvtIddCxMonitorAssignSwapChain and EvtIddCxMonitorUnassignSwapChain DDI calls.

IDDCX_OPMCTX
This object represents an active OPM context created for a single application, which the application can use to
control output protection on a single monitor. Multiple OPM contexts can be active on a given monitor at the same
time. The OS calls the driver to create and destroy the OPM contexts using the driver's
EvtIddCxMonitorOPMCreateProtectedOutput and EvtIddCxMonitorOPMDestroyProtectedOutput DDI calls.
Tasks in the Windows Display Driver Model (WDDM)

These topics discuss solutions for tasks you can perform by using the Windows Display Driver Model (WDDM):
Requesting and Using Surface Memory
Specifying Memory Type for a Resource
Locking Memory
Locking Swizzled Allocations
Manipulating 3-D Virtual Textures Directly from Hardware
Registering Hardware Information
Requesting and Using Surface Memory

The user-mode display driver receives calls to its CreateResource function when the Microsoft Direct3D runtime
requires the creation of a list of surfaces. The Direct3D runtime specifies a resource handle to the list of surfaces
that the user-mode display driver uses to call back into the runtime. The user-mode display driver creates a
resource object to represent the list of surfaces, generates a unique handle to this object, and returns the handle
back to the Direct3D runtime. The runtime uses this unique handle in subsequent driver calls to identify the list of
surfaces. The runtime identifies a particular surface by specifying the index of the surface in the array contained in
the pSurfList member of the D3DDDIARG_CREATERESOURCE structure.
Because the user-mode display driver receives the driver-defined resource handle in calls that refer to the
resource, the driver is not required to perform a costly handle lookup in order to locate the driver-defined resource
object. Likewise, so that the runtime is also not required to perform a handle lookup, the user-mode display driver
uses the Direct3D runtime-defined resource handle when the user-mode display driver calls back into the runtime.
The user-mode display driver calls the pfnAllocateCb function to allocate memory for the surfaces. In the
pfnAllocateCb call, the user-mode display driver can pass private data for the list of surfaces and for each
individual surface in the pPrivateDriverData members of the D3DDDICB_ALLOCATE and
D3DDDI_ALLOCATIONINFO structures, respectively. However, the user-mode display driver cannot receive
private data from the pPrivateDriverData members. The user-mode display driver can allocate memory for this
private data and can free the memory after the pfnAllocateCb call returns, or can use stack memory to pass this
private data. The pfnAllocateCb function returns to the user-mode display driver a handle to each allocation for
each allocated surface.
Note The user-mode display driver must call the pfnAllocateCb function once for each shared surface for each
device. For example, if device 1 creates a shared surface that is also used by devices 2, 3, and 4, then devices 2, 3,
and 4 must also call pfnAllocateCb once for the shared surface in order to retrieve the allocation handle.
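The following user-mode sketch shows one way the allocation call might look. MY_SURFACE_PRIVATE_DATA and the parameter names are hypothetical; hRTDevice and hRTResource stand for the runtime-defined device and resource handles described above, and the callback and structure fields shown should be confirmed against d3dumddi.h.

// Minimal sketch: allocate video memory for the surfaces in a resource by
// calling the runtime's pfnAllocateCb.
typedef struct _MY_SURFACE_PRIVATE_DATA {
    UINT Width;
    UINT Height;
    UINT Format;
} MY_SURFACE_PRIVATE_DATA;

HRESULT AllocateSurfaces(HANDLE hRTDevice,
                         const D3DDDI_DEVICECALLBACKS *pCallbacks,
                         HANDLE hRTResource,
                         D3DDDI_ALLOCATIONINFO *pAllocationInfo,   // one entry per surface
                         MY_SURFACE_PRIVATE_DATA *pPrivateData,    // one entry per surface
                         UINT SurfaceCount)
{
    D3DDDICB_ALLOCATE allocate;
    HRESULT hr;
    UINT i;

    memset(&allocate, 0, sizeof(allocate));

    for (i = 0; i < SurfaceCount; i++)
    {
        // Per-surface private data travels to the display miniport driver's
        // DxgkDdiCreateAllocation; it may live on the stack because it is
        // consumed before pfnAllocateCb returns.
        pAllocationInfo[i].pPrivateDriverData = &pPrivateData[i];
        pAllocationInfo[i].PrivateDriverDataSize = sizeof(pPrivateData[i]);
    }

    allocate.hResource = hRTResource;       // runtime handle used for callbacks
    allocate.NumAllocations = SurfaceCount;
    allocate.pAllocationInfo = pAllocationInfo;

    hr = pCallbacks->pfnAllocateCb(hRTDevice, &allocate);
    if (SUCCEEDED(hr))
    {
        // pAllocationInfo[i].hAllocation now holds the allocation handle for
        // surface i; store it in the driver-defined resource object.
    }

    return hr;
}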
The user-mode display driver must track the allocation handle for each surface, typically by maintaining a surface-
to-allocation handle table. The user-mode display driver should store each allocation handle within the driver-
defined resource object.
When the Direct3D runtime performs an operation on a previously allocated surface (for example, in a call to the
user-mode display driver's Blt function), the user-mode display driver receives the handle to the resource, possibly
with a surface index. The user-mode display driver uses this resource handle to retrieve the driver-defined
resource object. The driver obtains the allocation handles that are stored in the resource object and assembles
them in the command buffer. The user-mode display driver uses the allocation handles that correspond to the
surfaces when calling the pfnRenderCb function to submit a command buffer to the display miniport driver. The
display miniport driver can call the DxgkCbGetHandleData function to determine to which surface allocations
the user-mode display driver refers.
Specifying Memory Type for a Resource

The user-mode display driver receives information about the memory type that should be used when it receives a
request to create a resource. The memory type is specified as either system or video memory through the
D3DDDIPOOL_SYSTEMMEM or D3DDDIPOOL_VIDEOMEMORY enumerators, respectively, of the Pool member of
the D3DDDIARG_CREATERESOURCE structure. In addition, the Microsoft Direct3D runtime provides hints to the
driver about the type of video memory to use by specifying one of the following enumerators in the Pool member:
D3DDDIPOOL_LOCALVIDMEM
The runtime recommends that the driver use local video memory.
D3DDDIPOOL_NONLOCALVIDMEM
The runtime recommends that the driver use nonlocal video memory (for example, AGP memory).
The runtime provides hints to the user-mode display driver to improve performance. For example, the runtime
might specify D3DDDIPOOL_NONLOCALVIDMEM if the CPU writes to the surface, which is performed faster using
nonlocal video memory.
The user-mode display driver passes the hints to the display miniport driver through the pPrivateDriverData
members of the D3DDDI_ALLOCATIONINFO and DXGK_ALLOCATIONINFO structures in a vendor-specific way.
The display miniport driver indicates to the video memory manager the appropriate memory segment to use by
returning the identifier of the segment in the HintedSegmentId member of the DXGK_ALLOCATIONINFO
structure from a call to the driver's DxgkDdiCreateAllocation function.
Regardless of the type of video memory that is used to create the resource, the user-mode display driver must not
expose any semantic differences to the runtime. That is, for each video memory type, the driver must render
information identically and must return the same return values.
Locking Memory

To keep the graphics processing unit (GPU) from stalling, a preparation worker thread locks memory while the
GPU is busy with other operations. However, if a large allocation is completely paged to disk, the GPU might stall
while the driver waits for the GPU scheduler to page in the allocation.
Locking Swizzled Allocations

The video memory manager provides special support for direct CPU access to swizzled allocations (that is,
allocations in which the display miniport driver's DxgkDdiCreateAllocation function sets the Swizzled flag in
the Flags member of the DXGK_ALLOCATIONINFO structure).
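For illustration, a fragment of a DxgkDdiCreateAllocation implementation that marks an allocation this way might look like the following minimal sketch; only the fields relevant to this topic are shown, and the remaining DXGK_ALLOCATIONINFO fields are assumed to be set elsewhere.

// Minimal sketch: describe an allocation as a CPU-visible, swizzled surface
// so that the video memory manager tracks its swizzling state.
VOID DescribeSwizzledAllocation(DXGK_ALLOCATIONINFO *pAllocationInfo,
                                SIZE_T AllocationSize)
{
    pAllocationInfo->Size = AllocationSize;   // size of the allocation in bytes
    pAllocationInfo->Flags.CpuVisible = 1;    // the CPU is allowed to lock this allocation
    pAllocationInfo->Flags.Swizzled = 1;      // contents are stored in a swizzled layout
}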
When the video memory manager evicts CPU-accessible allocations that are not marked by the driver as swizzled
from a memory segment, the display miniport driver must always store them in a linear format. Therefore, such
allocations cannot be swizzled while they are located in an aperture segment, and they must always be swizzled or
unswizzled by the driver's DxgkDdiBuildPagingBuffer function.
On the other hand, allocations that are marked as swizzled are not required to always be stored in a linear format
when evicted from a memory segment. For such allocations, the video memory manager tracks the swizzling state
of those allocations and only requires the driver's DxgkDdiBuildPagingBuffer function to unswizzle an allocation
during certain transfer operations.
After the user-mode display driver calls the Microsoft Direct3D runtime's pfnLockCb function, the video memory
manager and the display miniport driver behave in the following ways depending on the state of the allocation:
1. Allocation located in a memory segment
The video memory manager attempts to acquire a CPU aperture to provide linear access to the allocation. If
the video memory manager cannot acquire the aperture, the video memory manager evicts the allocation
back to system memory (unless the driver sets the DonotEvict member of the D3DDDICB_LOCKFLAGS
structure). When the video memory manager calls the display miniport driver's
DxgkDdiBuildPagingBuffer function to transfer the allocation, the display miniport driver should
unswizzle the allocation.
2. Allocation evicted (swizzled) or located in an aperture segment
The allocation must be unswizzled before the CPU can access it. Therefore, the video memory manager first
attempts to page the allocation into a memory segment. After the allocation is located in a memory
segment, the video memory manager and display miniport driver behave as in number 1.
3. Allocation evicted (unswizzled)
If the allocation is already unswizzled to system memory, the video memory manager returns the existing
allocation pointer without further processing.
An allocation that was previously unswizzled must be reswizzled before the GPU can use it again.
Therefore, on a surface fault, the video memory manager and the display miniport
driver behave in the following ways:
Allocation in a memory segment (unswizzled on the fly by the CPU aperture)
The allocation is already in a swizzled format that the GPU can process. Therefore, no further
processing is required by the video memory manager.
Allocation evicted to system memory (unswizzled)
The pages of the allocation contain unswizzled data and cannot be mapped into an aperture segment.
Therefore, the allocation must be paged into a memory segment. When the video memory manager
calls the display miniport driver's DxgkDdiBuildPagingBuffer function to page in the allocation,
the video memory manager requests that the display miniport driver swizzle the allocation.
Note After a swizzled allocation is under CPU access through a CPU aperture, it can still be evicted before the user-
mode display driver terminates the CPU access. This case is handled as in number 2. The eviction is performed in
such a way as to be invisible to the application and user-mode display driver. Also, a no-overwrite lock (that is, a
lock obtained by setting the IgnoreSync member of D3DDDICB_LOCKFLAGS) is not allowed on a swizzled
allocation. Only the CPU or the GPU can access such an allocation at any given time.
Send comments about this topic to Microsoft
Manipulating 3-D Virtual Textures Directly from
Hardware
4/26/2017 • 1 min to read • Edit Online

The user-mode display driver can create an allocation on top of an existing virtual address (for example, the virtual
address for the view of a three-dimensional (3-D) texture file). Creating an allocation on top of an existing virtual
address makes the 3-D texture available to hardware manipulation with a system-memory copy. However, in this
scenario, the user-mode display driver's Lock function must always evict pages from local video memory back to
system memory because the virtual address for the allocation was not allocated by the video memory manager.
Therefore, the video memory manager cannot transparently remap the virtual address for the texture from system
memory to video memory and vice versa. In other words, a virtual address with this property cannot be a mapped
view.
Send comments about this topic to Microsoft
Registering Hardware Information
4/26/2017 • 1 min to read • Edit Online

To display useful information to the user and for assistance in debugging, a display miniport driver must set
certain hardware information in the registry. A display miniport driver must set a chip type, digital-to-analog
converter (DAC) type, memory size (of the adapter), and a string to identify the adapter. This information is shown
by the Display application in Control Panel. Typically, the driver sets this information in its DxgkDdiAddDevice
function.
To set this information, the driver:
1. Calls the IoOpenDeviceRegistryKey function to open and obtain a handle to a software key for storing
driver-specific information. In this call, the driver specifies the PLUGPLAY_REGKEY_DRIVER flag in the
DevInstKeyType parameter and the KEY_SET_VALUE, KEY_WRITE, or KEY_ALL_ACCESS value in the
DesiredAccess parameter.
2. Calls the ZwSetValueKey function several times to set each type of hardware information. In each call, the
driver specifies, in the KeyHandle parameter, the software-key handle that was obtained from
IoOpenDeviceRegistryKey.
The following table describes the information that the driver must register and provides details for the
ValueName and Data parameters of ZwSetValueKey:

INFORMATION FOR ENTRY   VALUENAME PARAMETER                 DATA PARAMETER

Chip type               HardwareInformation.ChipType        Null-terminated string that contains the chip name

DAC type                HardwareInformation.DacType         Null-terminated string that contains the DAC name or identifier (ID)

Memory size             HardwareInformation.MemorySize      ULONG that contains, in megabytes, the amount of video memory on the adapter

Adapter ID              HardwareInformation.AdapterString   Null-terminated string that contains the name of the adapter

BIOS                    HardwareInformation.BiosString      Null-terminated string that contains information about the BIOS
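
The following sketch shows one way a display miniport driver might register this information from its DxgkDdiAddDevice function. The chip name and memory-size values are placeholders, and the error handling is abbreviated.

NTSTATUS RegisterHardwareInformation(PDEVICE_OBJECT PhysicalDeviceObject)
{
    HANDLE hKey = NULL;
    UNICODE_STRING valueName;
    NTSTATUS status;
    static const WCHAR chipName[] = L"Sample Chip";   // placeholder string
    ULONG memorySizeInMB = 256;                       // placeholder value, in megabytes

    // Open the software (driver) key for this device.
    status = IoOpenDeviceRegistryKey(PhysicalDeviceObject,
                                     PLUGPLAY_REGKEY_DRIVER,
                                     KEY_SET_VALUE,
                                     &hKey);
    if (!NT_SUCCESS(status))
    {
        return status;
    }

    // Chip type.
    RtlInitUnicodeString(&valueName, L"HardwareInformation.ChipType");
    status = ZwSetValueKey(hKey, &valueName, 0, REG_SZ, (PVOID)chipName, sizeof(chipName));

    // Memory size, in megabytes.
    if (NT_SUCCESS(status))
    {
        RtlInitUnicodeString(&valueName, L"HardwareInformation.MemorySize");
        status = ZwSetValueKey(hKey, &valueName, 0, REG_DWORD, &memorySizeInMB, sizeof(memorySizeInMB));
    }

    // HardwareInformation.DacType, HardwareInformation.AdapterString, and
    // HardwareInformation.BiosString are set the same way.

    ZwClose(hKey);
    return status;
}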

Send comments about this topic to Microsoft


Debugging Tips for the Windows Display Driver
Model (WDDM)
4/26/2017 • 1 min to read • Edit Online

These topics discuss debugging tips for the Windows Display Driver Model (WDDM):
Installing Checked Binaries
Enabling Debug Output for the Video Memory Manager
Changing the Behavior of the GPU Scheduler for Debugging
Emulating State Blocks
Logging Driver Errors
Using GPUView
Timeout Detection and Recovery (TDR)
Send comments about this topic to Microsoft
Installing Checked Binaries
4/26/2017 • 1 min to read • Edit Online

When developing a new driver for the Windows Display Driver Model (WDDM), you should use checked binaries
of WDDM components. The checked-binary versions of these components have extensive validation and
debugging aids that are not available with the free binaries. However, free binaries should be used for performance
tuning because checked binaries are slower.
Hardware vendors who want to run checked binaries for WDDM can use one of the following approaches:
Install the checked-binary version of Windows Vista or later. For example, install the checked-binary version
of Windows 7 if you are developing a driver for Windows 7 rather than Windows Vista.
This is the most straightforward approach. However, running all checked-binary versions of operating
system components can lead to poor overall performance. Therefore, this is not always an appropriate
choice.
Install checked-binary versions of the WDDM components over a free-binary version of Windows Vista or
later.
This is the recommended way to run binaries for WDDM.
Replace the WDDM binaries in a free-binary installation of Windows Vista or later with their checked-binary
versions. Because these binaries are in use while that installation is running, restart into an alternate
installation of Windows Vista or later to replace the files.
Note The Win32k.sys, Gdi32.dll, Winsrv.dll, and User32.dll WDDM binaries are exceptions to this rule. These
binaries should always match the type of operating system build being installed. Therefore, on a free-binary
version of the operating system, these binaries should also be free binary; on a checked-binary version of
the operating system build, these binaries should be checked binary. Otherwise, hardware vendors can mix
and match free-binary and checked-binary versions of all other WDDM binaries.
Send comments about this topic to Microsoft
Enabling Debug Output for the Video Memory
Manager
4/26/2017 • 1 min to read • Edit Online

The video memory manager has an extensive logging mechanism to help catch and debug issues in a driver during
its development. To enable debugger output for the video memory manager, driver writers must first modify the
debug filter in the kernel debugger. The video memory manager currently uses the same filter as the video port
driver. Therefore, driver writers should submit the following command in the kernel debugger so that debug
messages received from the video memory manager can be displayed in the kernel debugger:

ed nt!Kd_VIDEOPRT_Mask ff

Send comments about this topic to Microsoft


Changing the Behavior of the GPU Scheduler for
Debugging
4/26/2017 • 1 min to read • Edit Online

To help in debugging the driver, the behavior of the graphics processing unit (GPU) scheduler can be changed by
configuring the registry.
You can enable or disable preemption requests from the GPU scheduler (see Timeout Detection and Recovery) by
using the following registry configuration:

KeyPath : HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Scheduler
KeyValue : EnablePreemption
ValueType : REG_DWORD
ValueData : 0 to disable preemption, 1 to enable preemption (default).

Send comments about this topic to Microsoft


Emulating State Blocks
4/26/2017 • 1 min to read • Edit Online

To enable the Microsoft Direct3D runtime to emulate state blocks, the registry can be configured in the following
way:

KeyPath : HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Direct3D
KeyValue : EmulateStateBlocks
ValueType : REG_DWORD
ValueData : 1 for D3D runtime emulation of stateblocks, 0 for driver implementation (default).

Note After the registry is configured to turn on emulation of state blocks by the Direct3D runtime, the runtime
does not call the user-mode display driver's StateSet function to set any state-block information.
Send comments about this topic to Microsoft
Logging Driver Errors
4/26/2017 • 1 min to read • Edit Online

The Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys) records display driver-related errors, assertions,
warnings, and events to a log in memory (in Watchdog.sys).
In addition to recording information to a log, by default, the checked-build version of the DirectX graphics kernel
subsystem breaks into the attached debugger if errors or assertions occur. By default, the free-build version of the
DirectX graphics kernel subsystem only records errors and assertions to the log and does not break into the
debugger if errors or assertions occur. You can change this default behavior by first creating the following
REG_DWORD entries in the registry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Logging\BreakOnAssertion
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Logging\BreakOnError

To make the debugger start if errors or assertions occur, you should set the value of BreakOnError or
BreakOnAssertion to 1 (TRUE) respectively. To make the debugger not start if errors or assertions occur, you
should set the value of BreakOnError or BreakOnAssertion to 0 (FALSE) respectively.
Send comments about this topic to Microsoft
User-mode driver logging
4/26/2017 • 1 min to read • Edit Online

To get a more actionable breakdown of video memory, the Windows Display Driver Model (WDDM) driver must
expose the relationship between Microsoft Direct3D resources and video memory allocations. This is made
possible starting with Windows 8 with the introduction of additional user-mode driver (UMD) logging interfaces.
With this information added to Event Tracing for Windows (ETW) traces, it's possible to see the video memory
allocations from the API perspective.

Minimum WDDM version 1.2

Minimum Windows version 8

Driver implementation—Full graphics and Render only Mandatory

WHCK requirements and tests Device.Graphics…UMDLogging

For developers, UMD logging can clarify memory costs that are currently very hard to see, such as internal
fragmentation or the impact of rapidly discarding surfaces. It enables Microsoft to better work with customers and
partners who provide traces for analysis of performance problems. In particular, this feature can help to overcome
a common blocking point in investigating memory-related performance issues: the application is using too large a
working set, but you cannot determine which API resources or calls are causing the problem.
The driver must expose the relationship between Direct3D resources and video memory allocations by
implementing the UMD ETW interfaces. In addition to the logging events, the driver must be able to report all
existing mappings between resources and allocations at any point in time.

UMD driver allocation logging DDI


The user-mode driver allocation logging device driver interface (DDI) provides events under the Event Tracing for
Windows (ETW) kernel-level tracing facility that show which API resources are associated with which kernel
allocations in the Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys).
You can use the DDI to discover internal memory fragmentation or the impact of surfaces being rapidly discarded,
to provide better trace information for Microsoft to help you identify performance problems, and to help
determine when an app's resources or API calls are causing it to use too large a working set of memory.
Use these functions, enumeration, and structure from the Umdprovider.h header to log events in your user-mode
display driver:
UMDEtwLogMapAllocation function
UMDEtwLogUnmapAllocation function
UMDEtwRegister function
UMDEtwUnregister function
UMDETW_ALLOCATION_SEMANTIC enumeration
UMDETW_ALLOCATION_USAGE structure
Also see the Umdetw.h header.
Hardware certification requirements
For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…UMDLogging.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Disabling Frame Pointer Omission (FPO) optimization
4/26/2017 • 1 min to read • Edit Online

In Windows 7, Windows Display Driver Model (WDDM) 1.1 kernel-mode drivers are required to disable Frame
Pointer Omission (FPO) optimizations to improve the ability to diagnose performance problems. Starting with
Windows 8, the same requirement is applicable for all WDDM 1.2 and later drivers (user-mode and kernel-mode),
thereby making it easier to debug performance issues related to FPO in the field.

Minimum WDDM version 1.2

Minimum Windows version 8

Driver implementation—Full graphics, Render only, and Display only    Mandatory

WHCK requirements and tests    Device.Graphics…WHQL FPO optimization check for kernel video driver(s) (1.1)

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…WHQL FPO optimization check for kernel video driver(s) (1.1).
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Using GPUView
4/26/2017 • 1 min to read • Edit Online

GPUView (GPUView.exe) is a development tool for determining the performance of the graphics processing unit
(GPU) and CPU. It looks at performance with regard to direct memory access (DMA) buffer processing and all
other video processing on the video hardware. GPUView is useful for developing display drivers that comply with
the Windows Vista display driver model. GPUView is introduced with the release of the Windows 7 operating
system.
GPUView and other files that are associated with it are included with the Windows Performance Toolkit (WPT) as
an installable option of the WPT MSI. GPUView binaries are available for x86-based, x64-based, and IA64-based
architectures. For example, Wpt_x86.msi is for an x86 platform. The WPT MSI includes the files that are described in
the following table.

FILES                                                    PURPOSE

EULA.rtf                                                 Legal agreement

GPUView.chm                                              GPUView help file

Readme.txt                                               Any additional information that is not included in the help file

GPUView.exe                                              Program for viewing ETL files with video data

AEplugin.dll, DWMPlugin.dll, MFPlugin.dll, NTPlugin.dll, Plugins to interpret events
DxPlugin.dll, and DxgkPlugin.dll

CoreTPlugin.dll                                          Plugin for Statistical Options dialog

Log.cmd                                                  Script to turn on and off the appropriate information for logging

SymbolSearchPath.txt                                     A text file that sets the symbol path to resolve stackwalk and other events

Send comments about this topic to Microsoft


XPS rasterization on the GPU
4/26/2017 • 1 min to read • Edit Online

XML Paper Specification (XPS) rasterization on the GPU does not require any independent hardware vendor (IHV)
code or behavioral changes in drivers. However, XPS rasterization is a usage pattern that can potentially expose
bugs or improper assumptions in driver code. Windows Display Driver Model (WDDM) 1.2 and later drivers must
be able to pass XPS rasterization display conformance tests in order to ensure high-quality Windows printing.

Minimum WDDM version 1.2

Minimum Windows version 8

Driver implementation—Full graphics and Display only Mandatory

WHCK requirements and tests Device.Graphics…XPSRasterizationConformance

XPS rasterization conformance


The XPS rasterization display conformance requirement determines whether a WDDM GPU driver produces correct
rasterization results when it's used by Direct2D in the context of the XPS rasterizer.
The XPS rasterizer is a system component used heavily by Windows print drivers to rasterize an XPS Print
Descriptor Language (PDL). To determine the correctness of rasterization results, a comparison is performed
between the results that are obtained from the XPS rasterizer when executed on a system with the subject WDDM
GPU driver, and the results obtained from baseline use of the XPS rasterizer.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…XPSRasterizationConformance.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Timeout Detection and Recovery (TDR)
4/26/2017 • 3 min to read • Edit Online

One of the most common stability problems in graphics occurs when a computer "hangs" or appears completely
"frozen" while, in reality, it is processing an end-user command or operation. The end-user typically waits a few
seconds and then decides to reboot the computer. The frozen appearance of the computer typically occurs
because the GPU is busy processing intensive graphical operations, typically during game play. The GPU does not
update the display screen, and the computer appears frozen.
In Windows Vista and later, the operating system attempts to detect situations in which computers appear to be
completely "frozen". The operating system then attempts to dynamically recover from the frozen situations so that
desktops are responsive again. This process of detection and recovery is known as timeout detection and recovery
(TDR). In the TDR process, the operating system's GPU scheduler calls the display miniport driver's
DxgkDdiResetFromTimeout function to reinitialize the driver and reset the GPU. Therefore, end users are not
required to reboot the operating system, which greatly enhances their experience.
The only visible artifact from the hang detection to the recovery is a screen flicker. This screen flicker results when
the operating system resets some portions of the graphics stack, which causes a screen redraw. This flicker is
eliminated if the display miniport driver complies with Windows Display Driver Model (WDDM) 1.2 and later (see
Providing seamless state transitions in WDDM 1.2 and later). Some legacy Microsoft DirectX applications (for
example, those DirectX applications that conform to DirectX versions earlier than 9.0) might render to a black
screen at the end of this recovery. The end user would have to restart these applications.
This sequence briefly describes the TDR process:

Timeout detection in the Windows Display Driver Model (WDDM)


The GPU scheduler, which is part of the DirectX graphics kernel subsystem (Dxgkrnl.sys), detects that the GPU is
taking more than the permitted amount of time to execute a particular task. The GPU scheduler then tries to
preempt this particular task. The preempt operation has a "wait" timeout, which is the actual TDR timeout. This
step is thus the timeout detection phase of the process. The default timeout period in Windows Vista and later
operating systems is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout
period, the operating system diagnoses that the GPU is frozen.
To prevent timeout detection from occurring, hardware vendors should ensure that graphics operations (that is,
direct memory access (DMA) buffer completion) take no more than 2 seconds in end-user scenarios such as
productivity and game play.

Preparation for recovery


The operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to
inform the driver that the operating system detected a timeout. The driver must then reinitialize itself and reset
the GPU. In addition, the driver must stop accessing memory and should not access hardware. The operating
system and the driver collect hardware and other state information that could be useful for post-mortem
diagnosis.

Desktop recovery
The operating system resets the appropriate state of the graphics stack. The video memory manager, which is also
part of Dxgkrnl.sys, purges all allocations from video memory. The display miniport driver resets the GPU
hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As
previously mentioned, some legacy DirectX applications might render just black at the end of this recovery, which
requires the end user to restart these applications. Well-written DirectX 9Ex and DirectX 10 and later applications
that handle Device Remove technology continue to work correctly. An application must release and then re-create
its Microsoft Direct3D device and all of the device's objects. For more information about how DirectX applications
recover, see the Windows SDK.

Related TDR topics


These topics describe the TDR process and registry keys that enable TDR debugging:
Limiting Repetitive GPU Hangs and Recoveries
TDR Error Messaging
TDR Registry Keys
TDR changes in Windows 8
Send comments about this topic to Microsoft
Limiting Repetitive GPU Hangs and Recoveries
4/26/2017 • 1 min to read • Edit Online

Beginning with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008, the user experience has been
improved in situations where the GPU hangs frequently and rapidly. Repetitive GPU hangs indicate that the
graphics hardware has not recovered successfully. In these situations, the end user must shut down and restart the
operating system to fully reset the graphics hardware. If the operating system detects that six or more GPU hangs
and subsequent recoveries occur within 1 minute, the operating system bug-checks the computer on the next GPU
hang.
Send comments about this topic to Microsoft
TDR Error Messaging
4/26/2017 • 1 min to read • Edit Online

Throughout the TDR process (that is, the process of detecting and recovering from situations where a GPU stops
operating), the desktop is unresponsive and thus unavailable to the end user. In the final stages of recovery, a brief
screen flash can occur that is similar to the brief screen flash that occurs when the end user changes the screen
resolution. After the operating system has successfully recovered the desktop, an informational message appears
to the end user indicating that the display driver stopped responding and has successfully recovered.

The operating system also logs the preceding message in the Event Viewer application and collects diagnosis
information in the form of a debug report. If the end user opted in to provide feedback, the operating system
returns this debug report to Microsoft through the Online Crash Analysis (OCA) mechanism.
Send comments about this topic to Microsoft
TDR Registry Keys
4/26/2017 • 2 min to read • Edit Online

You can use the following TDR-related registry keys for testing or debugging purposes only. That is, they should
not be manipulated by any applications outside targeted testing or debugging.
TdrLevel
Specifies the initial level of recovery. The default value is to recover on timeout (TdrLevelRecover).

KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrLevel
ValueType : REG_DWORD
ValueData : TdrLevelOff (0) - Detection disabled
TdrLevelBugcheck (1) - Bug check on detected timeout, for example, no recovery.
TdrLevelRecoverVGA (2) - Recover to VGA (not implemented).
TdrLevelRecover (3) - Recover on timeout. This is the default value.

TdrDelay
Specifies the number of seconds that the GPU can delay the preempt request from the GPU scheduler. This
is effectively the timeout threshold. The default value is 2 seconds.

KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrDelay
ValueType : REG_DWORD
ValueData : Number of seconds to delay. 2 seconds is the default value.

TdrDdiDelay
Specifies the number of seconds that the operating system allows threads to leave the driver. After a
specified time, the operating system bug-checks the computer with the code VIDEO_TDR_FAILURE (0x116).
The default value is 5 seconds.

KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrDdiDelay
ValueType : REG_DWORD
ValueData : Number of seconds to leave the driver. 5 seconds is the default value.

TdrTestMode
Reserved. Do not use.

KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrTestMode
ValueType : REG_DWORD
ValueData : Do not use.

TdrDebugMode
Specifies the debugging-related behavior of the TDR process. The default value is
TDR_DEBUG_MODE_RECOVER_NO_PROMPT, which indicates not to break into the debugger.
KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrDebugMode
ValueType : REG_DWORD
ValueData : TDR_DEBUG_MODE_OFF (0) - Break to kernel debugger before the recovery to allow investigation
of the timeout.
TDR_DEBUG_MODE_IGNORE_TIMEOUT (1) - Ignore any timeout.
TDR_DEBUG_MODE_RECOVER_NO_PROMPT (2) - Recover without breaking into the debugger. This is the default
value.
TDR_DEBUG_MODE_RECOVER_UNCONDITIONAL (3) - Recover even if some recovery conditions are not met (for
example, recover on consecutive timeouts).

TdrLimitTime
Supported in Windows Server 2008 and later versions, and Windows Vista with Service Pack 1 (SP1) and
later versions.
Specifies the default time within which a specific number of TDRs (specified by the TdrLimitCount key) are
allowed without crashing the computer. The default value is 60 seconds.

KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrLimitTime
ValueType : REG_DWORD
ValueData : Number of seconds before crashing. 60 seconds is the default value.

TdrLimitCount
Supported in Windows Server 2008 and later versions, and Windows Vista with Service Pack 1 (SP1) and
later versions.
Specifies the default number of TDRs (0x117) that are allowed during the time specified by the
TdrLimitTime key without crashing the computer. The default value is 5.

KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrLimitCount
ValueType : REG_DWORD
ValueData : Number of TDRs before crashing. The default value is 5.

Send comments about this topic to Microsoft


TDR changes in Windows 8
4/26/2017 • 7 min to read • Edit Online

Starting with Windows 8, GPU timeout detection and recovery (TDR) behavior has changed to allow parts of
individual physical adapters to be reset, instead of requiring an adapter-wide reset.

Minimum Windows Display Driver Model (WDDM) version 1.2

Minimum Windows version 8

Driver implementation—Full graphics and Render only Mandatory

WHCK requirements and tests Device.Graphics…TDRResiliency

TDR device driver interface (DDI)


To accommodate this behavior change, display miniport drivers should implement these functions:
DxgkDdiQueryDependentEngineGroup
DxgkDdiQueryEngineStatus
DxgkDdiResetEngine
Note A driver that supports these functions must also support level zero synchronization for the
DxgkDdiCollectDbgInfo function. This ensures that level zero miniport calls that are not affected by the reset
operation can continue. See Remarks of DxgkDdiCollectDbgInfo.
These structures are associated with the new functions:
DXGK_DRIVERCAPS (new SupportPerEngineTDR member)
DXGK_ENGINESTATUS
DXGKARG_QUERYDEPENDENTENGINEGROUP
DXGKARG_QUERYENGINESTATUS
DXGKARG_RESETENGINE
A display miniport driver indicates support for these functions by setting the
DXGK_DRIVERCAPS.SupportPerEngineTDR member, in which case it must implement the
DxgkDdiQueryDependentEngineGroup, DxgkDdiQueryEngineStatus, and DxgkDdiResetEngine functions.
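As a sketch, a driver opts in by reporting the capability from its DxgkDdiQueryAdapterInfo function when the DirectX graphics kernel subsystem queries DXGKQAITYPE_DRIVERCAPS. The function-table member names shown in the comment are assumptions about how the three DDIs are hooked up in DriverEntry, and the surrounding validation is omitted:

// Inside DxgkDdiQueryAdapterInfo, for pQueryAdapterInfo->Type == DXGKQAITYPE_DRIVERCAPS:
DXGK_DRIVERCAPS *pDriverCaps = (DXGK_DRIVERCAPS *)pQueryAdapterInfo->pOutputData;

pDriverCaps->SupportPerEngineTDR = 1;   // opt in to per-engine timeout detection and recovery

// A driver that sets this capability must also supply the per-engine DDIs when it
// fills in its initialization data in DriverEntry, for example:
//     InitialData.DxgkDdiQueryDependentEngineGroup = MyQueryDependentEngineGroup;
//     InitialData.DxgkDdiQueryEngineStatus         = MyQueryEngineStatus;
//     InitialData.DxgkDdiResetEngine               = MyResetEngine;
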

TDR process description


A common stability problem in graphics occurs when the system appears completely frozen or hung while
processing an end-user command or operation. Usually the GPU is busy processing intensive graphics operations,
typically during game-play. No screen updates occur, and users assume that their system is frozen. Users usually
wait a few seconds and then reboot the system by pressing the power button.
Both Windows Vista and Windows 7 try to detect these problematic hang situations and dynamically recover a
responsive desktop. The system does not reboot, but in most cases the screen flickers as it is redrawn. However,
some older Microsoft DirectX applications render a black screen at the end of recovery, and users must restart
these applications. These GPU hangs are referred to as time-out detection and recovery errors (TDRs).
The figure shows the timeout detection and recovery process. For more details about this process, see Timeout
Detection and Recovery (TDR).

TDRs happen when a GPU command has taken too long to complete or the hardware is hung. TDRs enable the
operating system to detect that the UI is not responsive.

Nodes
As used in the above TDR functions, a node is one of multiple parts of a single physical adapter that can be
scheduled independently. For example, a 3-D node, a video decoding node, and a copy node can all exist in the
same physical adapter, and each can be assigned a separate node ordinal value in the
DXGKARG_QUERYDEPENDENTENGINEGROUP.NodeOrdinal member in a call to
DxgkDdiQueryDependentEngineGroup.
The number of nodes in the physical adapter is reported by the display miniport driver in the
NbAsymetricProcessingNodes member of DXGK_DRIVERCAPS.GpuEngineTopology.
The node ordinal value is passed in the NodeOrdinal member of the DXGKARG_CREATECONTEXT structure
when a context is created.

Engines
As used in the above TDR functions, an engine is one of multiple physical adapters (or GPUs) that together act as
one logical adapter. The DirectX graphics kernel subsystem supports such configurations but requires that each
engine must have the same number of nodes.
As an example, the GPU scheduler considers engine 0 to correspond to physical adapter 0. Engine 0 must have the
same number of nodes as engine 1, which corresponds to adapter 1.

Engine ordinal value at context creation


When a context is created, a single bit corresponding to the engine ordinal value is set in the EngineAffinity
member of the DXGKARG_CREATECONTEXT structure. The EngineOrdinal member of this and other scheduler-
related structures is a zero-based index. The value of EngineAffinity is 1 << EngineOrdinal, and EngineOrdinal
is the highest bit position in EngineAffinity.

Packets unaffected by engine reset


The driver might be asked by the GPU scheduler to resubmit packets that were submitted too late to the engine
hardware queue to be fully processed before the engine reset completed. The driver must follow these guidelines
to resubmit such packets:
Paging packets: The driver will be asked by the GPU scheduler to resubmit paging packets with their original
fence IDs, and in the same order as they were originally submitted. Any such packets will be resubmitted before
new packets are added to the hardware queue.
Render packets: The GPU scheduler will assign render packets new fence IDs and then resubmit them.

Calling sequence to reset an engine


When DxgkDdiResetEngine succeeds, the GPU scheduler ensures that the LastAbortedFenceId value returned
from the engine reset call corresponds either to an existing fence ID in the hardware queue, or to the last
completed fence ID on the GPU. The latter situation can happen when the hardware queue empties after the GPU
timeout is detected, but before the engine reset callback is invoked.
The last completed fence ID value on the GPU must be maintained by the driver at all times because it is also
needed to set the DmaPreempted.LastCompletedFenceId member of a
DXGKARGCB_NOTIFY_INTERRUPT_DATA preemption interrupt notification structure. The last completed fence
ID should be advanced only in these situations:
When a packet is completed (not preempted), the last completed fence ID should be set to the fence ID of the
completed packet.
When DxgkDdiResetEngine succeeds, the last completed fence ID should be set to the value of the
LastCompletedFenceId member returned by the engine reset call.
For adapter-wide reset, the last completed fence ID on all nodes should be advanced to the last submitted fence
ID at the time of the reset.
Here's a chronological sequence of a successful engine reset, as seen by the GPU scheduler:
1. A preemption attempt is issued.
2. A GPU timeout is detected.
3. A snapshot of the last submitted and completed fence IDs is taken by the GPU scheduler, and interrupts from
the timed-out engine are ignored. This is one atomic operation at the device interrupt level.
4. If there are no packets in the hardware queue at this point, exit. This can happen if a packet was completed in
the time window between steps 2 and 3.
5. All queued DPCs are flushed.
6. Prepare for engine reset.
7. Call DxgkDdiResetEngine.
8. If the LastAbortedFenceId member is less than the last completed fence ID or is greater than the last
submitted fence ID, the DirectX graphics kernel subsystem causes a system bugcheck to occur. In a crash
dump file, the error is noted by the message BugCheck 0x119, which has these four parameters:
0xA, meaning the driver has reported an invalid aborted fence ID
LastAbortedFenceId value returned by the driver
Last completed fence ID
An internal operating system parameter
9. If the LastAbortedFenceId value is valid, proceed with engine reset recovery as follows. If a paging packet
was affected by the engine reset, the GPU scheduler follows the engine reset with an adapter-wide reset. All
devices that own allocations referenced by that paging packet are put in the error state as well. However,
the system device itself is not put into the error state, and it resumes execution after the reset is complete.

Special cases
A special situation can occur when a packet is completed on the GPU between steps 3 and 7 described above. In
this case, LastAbortedFenceId should be set by the driver to the fence ID of the last completed packet if there are
no packets in the hardware queue from the driver's point of view. From the scheduler's point of view, it will appear
that such a packet was aborted, and the corresponding device will be put into an error state even though the
packet eventually completed.
If the driver cannot perform a reset operation because the hardware is in an invalid state, or because the hardware
is incapable of resetting the nodes, the driver should return a failure status code. If the GPU scheduler receives a
failure status code, it performs an adapter-wide reset and restart operation following the TDR behavior prior to
Windows 8.
Even if a driver has opted into the Windows 8 TDR behavior, there will be cases when the GPU scheduler requests
a reset and restart of the entire logical adapter. Therefore the driver must still implement the
DxgkDdiResetFromTimeout and DxgkDdiRestartFromTimeout functions, and their semantics remain the same as
prior to Windows 8. When an attempt to reset a physical adapter with DxgkDdiResetEngine leads to a reset of the
logical adapter, the !analyze command of the Windows debugger shows that the TdrReason value of the TDR
recovery context is set to a new value of TdrEngineTimeoutPromotedToAdapterReset = 9.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…TDRResiliency.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Implementation Tips and Requirements for the
Windows Display Driver Model (WDDM)
4/26/2017 • 1 min to read • Edit Online

These topics discuss tips and requirements for implementing Windows Display Driver Model (WDDM) user-mode
drivers and display miniport drivers:
Hardware support for Direct3D feature levels
Saving Energy with VSync Control
Validating Private Data Sent from User Mode to Kernel Mode
Microsoft Windows Vista Display Driver 64-Bit Issues
Changing Floating-Point Control State
Supplying Fence Identifiers
Handling Resource Creation and Destruction
Supporting Video Capture and Other Child Devices
Supporting Rotation
Version Numbers for WDDM Drivers
Supporting Brightness Controls on Integrated Display Panels
Supporting Display Output and ACPI Events
Marking Sources as Removable
Supporting Output Protection Manager
Supporting Transient Multi-Monitor Manager
Connecting and Configuring Displays
Wireless displays (Miracast)
Adaptive refresh for playing 24 fps video content
Send comments about this topic to Microsoft
Hardware support for Direct3D feature levels
7/21/2017 • 1 min to read • Edit Online

Display device hardware must support Microsoft Direct3D feature levels as described in the following Direct3D
topics:
Direct3D feature levels
10Level9 ID3D11Device Methods
10Level9 ID3D11DeviceContext Methods
Hardware Support for Direct3D 10Level9 Formats
In addition, the user-mode driver must expose certain capabilities in Direct3D feature levels 9_1, 9_2, and 9_3 in
order for Direct3D features to be properly exposed to applications. These driver topics list the specific capabilities
and display formats that the driver must expose:
Required Direct3D 9 capabilities
Required DXGI formats
Send comments about this topic to Microsoft
Required Direct3D 9 capabilities
6/12/2017 • 2 min to read • Edit Online

For applications to fully access the features of Microsoft Direct3D versions 9_1, 9_2, and 9_3, the user-mode driver
must expose certain hardware capabilities. These capabilities are expressed in terms of the D3DCAPS9 structure
that is returned by the user-mode driver's GetCaps function. To indicate support of the capabilities, the driver must
set these members of D3DCAPS9 to a bitwise-OR of all of the respective flag values:

Minimum capabilities for Direct3D level 9_1


D3DCAPS9 MEMBER FLAG VALUE

Caps2 D3DCAPS2_DYNAMICTEXTURES
D3DCAPS2_FULLSCREENGAMMA

PresentationIntervals D3DPRESENT_INTERVAL_IMMEDIATE
D3DPRESENT_INTERVAL_ONE

PrimitiveMiscCaps D3DPMISCCAPS_COLORWRITEENABLE

ShadeCaps D3DPSHADECAPS_ALPHAGOURAUDBLEND
D3DPSHADECAPS_COLORGOURAUDRGB
D3DPSHADECAPS_FOGGOURAUD
D3DPSHADECAPS_SPECULARGOURAUDRGB

TextureFilterCaps D3DPTFILTERCAPS_MINFLINEAR
D3DPTFILTERCAPS_MINFPOINT
D3DPTFILTERCAPS_MAGFLINEAR
D3DPTFILTERCAPS_MAGFPOINT

TextureCaps D3DPTEXTURECAPS_ALPHA
(See Note.)
D3DPTEXTURECAPS_CUBEMAP
D3DPTEXTURECAPS_MIPMAP
D3DPTEXTURECAPS_PERSPECTIVE

TextureAddressCaps D3DPTADDRESSCAPS_CLAMP
D3DPTADDRESSCAPS_INDEPENDENTUV
D3DPTADDRESSCAPS_MIRROR
D3DPTADDRESSCAPS_WRAP

TextureOpCaps D3DTEXOPCAPS_DISABLE
D3DTEXOPCAPS_MODULATE
D3DTEXOPCAPS_SELECTARG1
D3DTEXOPCAPS_SELECTARG2

SrcBlendCaps D3DPBLENDCAPS_INVDESTALPHA
D3DPBLENDCAPS_INVDESTCOLOR
D3DPBLENDCAPS_INVSRCALPHA
D3DPBLENDCAPS_ONE
D3DPBLENDCAPS_SRCALPHA
D3DPBLENDCAPS_ZERO

DestBlendCaps D3DPBLENDCAPS_ONE
D3DPBLENDCAPS_INVSRCALPHA
D3DPBLENDCAPS_INVSRCCOLOR
D3DPBLENDCAPS_SRCALPHA
D3DPBLENDCAPS_ZERO

StretchRectFilterCaps D3DPTFILTERCAPS_MAGFLINEAR
D3DPTFILTERCAPS_MAGFPOINT
D3DPTFILTERCAPS_MINFLINEAR
D3DPTFILTERCAPS_MINFPOINT

ZCmpCaps D3DPCMPCAPS_ALWAYS
D3DPCMPCAPS_LESSEQUAL

RasterCaps D3DPRASTERCAPS_DEPTHBIAS
D3DPRASTERCAPS_SLOPESCALEDEPTHBIAS

StencilCaps D3DSTENCILCAPS_TWOSIDED

MaxTextureWidth 2048

MaxTextureHeight 2048

NumSimultaneousRTs 1

MaxSimultaneousTextures 8

MaxTextureBlendStages 8

PixelShaderVersion D3DPS_VERSION(2,0)

MaxPrimitiveCount 65535

MaxVertexIndex 65534

MaxVolumeExtent 256

MaxTextureRepeat Must be zero, or 128, or greater.

MaxAnisotropy 2

MaxVertexW 0.f

Note These requirements also apply:


The driver must also set the TextureCaps member to a value of
D3DPTEXTURECAPS_NONPOW2CONDITIONAL and D3DPTEXTURECAPS_POW2, or to neither.
When the driver responds to an event, where D3DDDIARG_CREATEQUERY.QueryType is
D3DDDIQUERYTYPE_EVENT, it must always set the event's BOOL value to TRUE when responding. See
CreateQuery and D3DDDIARG_CREATEQUERY.
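The following partial sketch shows how a user-mode driver's GetCaps handler might populate the 9_1 minimums listed above. The values come directly from the table, the remaining members follow the same pattern, and real hardware typically reports more than these minimums.

#include <d3d9caps.h>

static void FillLevel91Caps(D3DCAPS9 *pCaps)
{
    pCaps->Caps2                   = D3DCAPS2_DYNAMICTEXTURES | D3DCAPS2_FULLSCREENGAMMA;
    pCaps->PresentationIntervals   = D3DPRESENT_INTERVAL_IMMEDIATE | D3DPRESENT_INTERVAL_ONE;
    pCaps->PrimitiveMiscCaps       = D3DPMISCCAPS_COLORWRITEENABLE;
    pCaps->RasterCaps              = D3DPRASTERCAPS_DEPTHBIAS | D3DPRASTERCAPS_SLOPESCALEDEPTHBIAS;
    pCaps->ZCmpCaps                = D3DPCMPCAPS_ALWAYS | D3DPCMPCAPS_LESSEQUAL;
    pCaps->StencilCaps             = D3DSTENCILCAPS_TWOSIDED;
    pCaps->MaxTextureWidth         = 2048;
    pCaps->MaxTextureHeight        = 2048;
    pCaps->NumSimultaneousRTs      = 1;
    pCaps->MaxSimultaneousTextures = 8;
    pCaps->MaxTextureBlendStages   = 8;
    pCaps->PixelShaderVersion      = D3DPS_VERSION(2, 0);
    pCaps->MaxPrimitiveCount       = 65535;
    pCaps->MaxVertexIndex          = 65534;
    pCaps->MaxVolumeExtent         = 256;
    pCaps->MaxAnisotropy           = 2;
    pCaps->MaxVertexW              = 0.f;

    /* ShadeCaps, TextureFilterCaps, TextureCaps, TextureAddressCaps, TextureOpCaps,
       SrcBlendCaps, DestBlendCaps, StretchRectFilterCaps, and MaxTextureRepeat are
       filled in the same way from the table above. */
}
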

Minimum capabilities for Direct3D level 9_2


These capabilities must be set in addition to those listed for Direct3D level 9_1.

D3DCAPS9 MEMBER FLAG VALUE

PrimitiveMiscCaps D3DPMISCCAPS_SEPARATEALPHABLEND

DevCaps2 D3DDEVCAPS2_VERTEXELEMENTSCANSHARESTREAMOF
FSET

TextureAddressCaps D3DPTADDRESSCAPS_MIRRORONCE

VolumeTextureAddressCaps D3DPTADDRESSCAPS_MIRRORONCE

MaxTextureWidth 2048

MaxTextureHeight 2048

MaxTextureRepeat Must be zero, or 2048, or greater.

VertexShaderVersion D3DVS_VERSION(2,0)

MaxAnisotropy 16

MaxPrimitiveCount 1048575

MaxVertexIndex 1048575

MaxVertexW 10000000000.f

Note This requirement also applies:


When the driver responds to a z-testing query, where D3DDDIARG_CREATEQUERY.QueryType is
D3DDDIQUERYTYPE_OCCLUSION, it must always set the query's UINT value to a non-zero value when
responding. See CreateQuery and D3DDDIARG_CREATEQUERY.

Minimum capabilities for Direct3D level 9_3


These capabilities must be set in addition to those listed for Direct3D levels 9_1 and 9_2.

D3DCAPS9 MEMBER FLAG VALUE

PS20Caps->Caps D3DPS20CAPS_GRADIENTINSTRUCTIONS

PrimitiveMiscCaps D3DPMISCCAPS_INDEPENDENTWRITEMASKS
D3DPMISCCAPS_MRTPOSTPIXELSHADERBLENDING

TextureAddressCaps D3DPTADDRESSCAPS_BORDER

MaxTextureWidth 4096

MaxTextureHeight 4096

MaxTextureRepeat Must be zero, or 8192, or greater.

NumSimultaneousRTs 4

PS20Caps->NumInstructionSlots 512 (Pixel Shader Version 2b)

PS20Caps->NumTemps 32 (Pixel Shader Version 2b)

VS20Caps->NumTemps 32 (Vertex Shader Version 2a)

MaxVertexShaderConst 256 (Vertex Shader Version 2a)

VertexShaderVersion D3DVS_VERSION(3,0) (See Note.)

Note The VertexShaderVersion value of D3DVS_VERSION(3,0) guarantees instancing support. Direct3D 10Level
9 does not expose Shader Model 3.0.
Send comments about this topic to Microsoft
Required DXGI formats
4/26/2017 • 3 min to read • Edit Online

This topic presents the requirements that Microsoft Direct3D feature levels place on the user-mode display driver.
The first and second columns of the first table show all Direct3D format types that the driver must support. The
third column shows all associated constant values of the Direct3D D3D10_FORMAT_SUPPORT and/or
D3D11_FORMAT_SUPPORT enumerations that the driver must support. The fourth column shows the minimum
Direct3D feature level at which the driver must support each format.
The second table shows the Direct3D 10Level 9 support algorithm for each enumeration value.

D3D9 FORMAT (D3DDDIFMT_* AND/OR D3DDECLTYPE)    D3D10+ API EQUIVALENT (DXGI_FORMAT_)    REQUIRED D3D10_ OR D3D11_ FORMAT_SUPPORT_* ENUMERATION VALUES    MINIMUM REQUIRED DIRECT3D LEVEL

A32B32G32R32F or R32G32B32A32_FLOAT IA_VERTEX_BUFFER 9_1


D3DDECLTYPE_FLOAT4
TEXTURE2D 9_2
TEXTURE3D 9_3
TEXTURECUBE 9_3
SHADER_LOAD 9_2
MIP 9_3
MIP_AUTOGEN 9_3
RENDER_TARGET 9_2
CPU_LOCKABLE 9_2

D3DDECLTYPE_FLOAT3 R32G32B32_FLOAT IA_VERTEX_BUFFER 9_1

A16B16G16R16F or R16G16B16A16_FLOAT IA_VERTEX_BUFFER 9_3


D3DDECLTYPE_FLOAT16_4
TEXTURE2D 9_2
TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_2
MIP 9_2
MIP_AUTOGEN 9_2
RENDER_TARGET 9_2
BLENDABLE 9_3
CPU_LOCKABLE 9_2

A16B16G16R16 or R16G16B16A16_UNORM TEXTURE2D 9_2


D3DDECLTYPE_USHORT4N
TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_2
SHADER_SAMPLE 9_2
MIP 9_2
MIP_AUTOGEN 9_2
RENDER_TARGET 9_2
CPU_LOCKABLE 9_2

Q16W16V16U16 or R16G16B16A16_SNORM IA_VERTEX_BUFFER 9_1


D3DDECLTYPE_SHORT4N

D3DDECLTYPE_SHORT4 R16G16B16A16_SINT IA_VERTEX_BUFFER 9_1

G32R32F or R32G32_FLOAT IA_VERTEX_BUFFER 9_1


D3DDECLTYPE_FLOAT2
TEXTURE2D 9_3
TEXTURE3D 9_3
TEXTURECUBE 9_3
SHADER_LOAD 9_3
RENDER_TARGET 9_3
CPU_LOCKABLE 9_3

D3DDECLTYPE_UBYTE4 R8G8B8A8_UINT IA_VERTEX_BUFFER 9_1

A8R8G8B8 or R8G8B8A8_UNORM IA_VERTEX_BUFFER 9_1


D3DDECLTYPE_UBYTE4N
TEXTURE2D 9_1
TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
MIP_AUTOGEN 9_1
RENDER_TARGET 9_1
BLENDABLE 9_1
CPU_LOCKABLE 9_1
DISPLAY 9_1
BACK_BUFFER_CAST 9_1

A8R8G8B8 R8G8B8A8_UNORM_SRGB TEXTURE2D 9_1


TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
MIP_AUTOGEN 9_1
RENDER_TARGET 9_1
BLENDABLE 9_1
CPU_LOCKABLE 9_1
DISPLAY 9_1
BACK_BUFFER_CAST 9_1

Q8W8V8U8 R8G8B8A8_SNORM TEXTURE2D 9_1


TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
CPU_LOCKABLE 9_1

A8R8G8B8 B8G8R8A8_UNORM TEXTURE2D 9_1


TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
MIP_AUTOGEN 9_1
RENDER_TARGET 9_1
BLENDABLE 9_1
CPU_LOCKABLE 9_1
DISPLAY 9_1
BACK_BUFFER_CAST 9_1

X8R8G8B8 B8G8R8X8_UNORM TEXTURE2D 9_1


TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
MIP_AUTOGEN 9_1
RENDER_TARGET 9_1
BLENDABLE 9_1
CPU_LOCKABLE 9_1

A8R8G8B8 B8G8R8A8_UNORM_SRGB TEXTURE2D 9_1


TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
MIP_AUTOGEN 9_1
RENDER_TARGET 9_1
BLENDABLE 9_1
CPU_LOCKABLE 9_1
DISPLAY 9_1
BACK_BUFFER_CAST 9_1

X8R8G8B8 B8G8R8X8_UNORM_SRGB TEXTURE2D 9_1


TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
MIP_AUTOGEN 9_1
RENDER_TARGET 9_1
BLENDABLE 9_1
CPU_LOCKABLE 9_1

G16R16F or R16G16_FLOAT IA_VERTEX_BUFFER 9_3


D3DDECLTYPE_FLOAT16_2
TEXTURE2D 9_2
TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_2
MIP 9_2
MIP_AUTOGEN 9_2
RENDER_TARGET 9_2
CPU_LOCKABLE 9_2

G16R16 or R16G16_UNORM TEXTURE2D 9_2


D3DDECLTYPE_USHORT2N
TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_2
SHADER_SAMPLE 9_2
MIP 9_2
MIP_AUTOGEN 9_2
RENDER_TARGET 9_2
CPU_LOCKABLE 9_2

V16U16 or R16G16_SNORM IA_VERTEX_BUFFER 9_1


D3DDECLTYPE_SHORT2N
TEXTURE2D 9_1
TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_1
SHADER_SAMPLE 9_2
MIP 9_1
CPU_LOCKABLE 9_1

D3DDECLTYPE_SHORT2 R16G16_SINT IA_VERTEX_BUFFER 9_1



R32F or R32_FLOAT IA_VERTEX_BUFFER 9_1


D3DDECLTYPE_FLOAT1
TEXTURE2D 9_2
TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_2
MIP 9_2
MIP_AUTOGEN 9_2
RENDER_TARGET 9_2
CPU_LOCKABLE 9_2

R32_UINT IA_INDEX_BUFFER 9_1

S8D24 or D24S8 D24_UNORM_S8_UINT TEXTURE2D 9_1


DEPTH_STENCIL 9_1

L16 R16_UNORM TEXTURE2D 9_2


TEXTURE3D 9_2
TEXTURECUBE 9_2
SHADER_LOAD 9_2
SHADER_SAMPLE 9_2
MIP 9_2
CPU_LOCKABLE 9_2

R16_UINT IA_INDEX_BUFFER 9_1

D16 or D16_LOCKABLE D16_UNORM TEXTURE2D 9_1


DEPTH_STENCIL 9_1

V8U8 R8G8_SNORM TEXTURE2D 9_1


SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
CPU_LOCKABLE 9_1

L8 R8_UNORM TEXTURE2D 9_1


TEXTURE3D 9_1
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
CPU_LOCKABLE 9_1

DXT1 BC1_UNORM or TEXTURE2D 9_1


BC1_UNORM_SRGB
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
CPU_LOCKABLE 9_1

DXT2 BC2_UNORM or TEXTURE2D 9_1


BC2_UNORM_SRGB
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
CPU_LOCKABLE 9_1

DXT4 BC3_UNORM or TEXTURE2D 9_1


BC3_UNORM_SRGB
TEXTURECUBE 9_1
SHADER_LOAD 9_1
SHADER_SAMPLE 9_1
MIP 9_1
CPU_LOCKABLE 9_1

REQUIRED D3D10_ OR D3D11_ FORMAT_SUPPORT_* ENUMERATION VALUES    SUPPORT ALGORITHM IN DIRECT3D 10LEVEL 9

BACK_BUFFER_CAST Assumed true for any format that supports DISPLAY.

BLENDABLE No FORMATOP_NOALPHABLEND

CPU_LOCKABLE Assumed always true.



DISPLAY Hard-coded.

IA_VERTEX_BUFFER D3DDTCAPS_* (See Note.)

MIP No FORMATOP_NOTEXCOORDWRAPNORMIP

MIP_AUTOGEN (See Note.)

RENDER_TARGET FORMATOP_OFFSCREEN_RENDERTARGET

SHADER_LOAD Assumed for all non-depth formats.

SHADER_SAMPLE (See Note.)

TEXTURE2D FORMATOP_TEXTURE

TEXTURE3D FORMATOP_VOLUMETEXTURE

TEXTURECUBE FORMATOP_CUBETEXTURE

Note These are further details on the support algorithm's requirements in Direct3D 10Level 9:
The IA_VERTEX_BUFFER and/or IA_INDEX_BUFFER formats are supported by software vertex processing if there
is no D3DDEVCAPS_HWTRANSFORMANDLIGHT capability.
The TEXTURE2D format can also be inferred from it being a depth-stencil format.
For the SHADER_SAMPLE format, the driver must support FORMATOP_TEXTURE,
FORMATOP_VOLUMETEXTURE, or FORMATOP_CUBETEXTURE, and it must not report FORMATOP_NOFILTER.
For the MIP_AUTOGEN format, Direct3D 10Level 9 generates its own mip-maps, so it requires MIP,
RENDER_TARGET, and TEXTURE2D bits.
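For illustration only, the 10Level9 layer derives these bits from the FORMATOP operations that the user-mode driver reports through its GetCaps function; the sketch below shows one hedged example entry (the assumption here is that the array is returned for the D3DDDICAPS_GETFORMATDATA capability type, and a real driver returns one entry per supported format with the operation bits the hardware actually supports).

// One illustrative FORMATOP entry for the D3D9 A8R8G8B8 format.
FORMATOP formatOps[] = {
    {
        D3DDDIFMT_A8R8G8B8,                 // Format
        FORMATOP_TEXTURE |                  // corresponds to TEXTURE2D support
        FORMATOP_VOLUMETEXTURE |            // corresponds to TEXTURE3D support
        FORMATOP_CUBETEXTURE |              // corresponds to TEXTURECUBE support
        FORMATOP_OFFSCREEN_RENDERTARGET,    // corresponds to RENDER_TARGET support
        0,                                  // FlipMsTypes
        0,                                  // BltMsTypes
        0                                   // PrivateFormatBitCount
    },
};
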
Send comments about this topic to Microsoft
Saving Energy with VSync Control
4/26/2017 • 2 min to read • Edit Online

To save power on a computer, your kernel-mode display driver can reduce the number of VSync monitor refresh
interrupts that occur.
Newer processors and platforms often work with the operating system to conserve energy when the computer
system is idle. However, periodic system activity, such as the firing of interrupts, causes peak power usage and can
prevent the computer system from entering transient sleep states that would conserve energy.
Beginning with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008, the operating system can turn
off periodic VSync interrupt counting when the screen is not being refreshed from new graphics or mouse activity.
By controlling the VSync interrupt interval, your driver can save significant energy.
You can take advantage of this feature by rebuilding Windows Display Driver Model (WDDM) drivers by using the
Windows Server 2008 or later versions of the Windows Driver Kit (WDK).
Windows Vista with SP1 Driver Changes for VSync Control
For drivers to take advantage of this feature, they must support the VSyncPowerSaveAware member in the
DXGK_VIDSCHCAPS structure that was introduced in Windows Vista with SP1. Existing drivers that follow the
WDDM must be recompiled with the VSyncPowerSaveAware member by using the Windows Server 2008 or
later versions of the WDK.
A Windows Vista with SP1 or later system with a driver that follows the WDDM and that supports this feature will
turn off the counting feature of the VSync interrupt if no GPU activity occurs for 10 continuous periods of 1/Vsync,
where VSync is the monitor refresh rate. If the VSync rate is 60 hertz (Hz), the VSync interrupt occurs one time
every 16 milliseconds. Thus, in the absence of a screen update, the VSync interrupt is turned off after 160
milliseconds. If GPU activity resumes, the VSync interrupt is turned on again to refresh the screen.
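A minimal sketch of the opt-in, shown inside the DXGKQAITYPE_DRIVERCAPS handling of the driver's DxgkDdiQueryAdapterInfo function (surrounding validation omitted):

// Report that the driver tolerates the VSync interrupt being turned off when the GPU is idle.
DXGK_DRIVERCAPS *pDriverCaps = (DXGK_DRIVERCAPS *)pQueryAdapterInfo->pOutputData;

pDriverCaps->SchedulingCaps.VSyncPowerSaveAware = 1;
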
Windows 8 Display-Only VSync Requirements
Starting in Windows 8, it's optional for a kernel mode display-only driver (KMDOD) to support VSync functionality,
as follows:
Display-only driver supports VSync control
If the KMDOD supports the VSync control feature, it must implement both DxgkDdiControlInterrupt and
DxgkDdiGetScanLine functions and must provide valid function pointers to both of these functions in the
KMDDOD_INITIALIZATION_DATA structure.
In this case the KMDOD must also implement the DxgkDdiInterruptRoutine and DxgkDdiDpcRoutine functions in
order to report VSync interrupts to the operating system.
In addition, the values of the PixelRate, hSyncFreq, and vSyncFreq members of the
DISPLAYCONFIG_VIDEO_SIGNAL_INFO structure cannot be D3DKMDT_FREQUENCY_NOTSPECIFIED.
Display-only driver does not support VSync control
If the KMDOD does not support the VSync control feature, it must not implement either DxgkDdiControlInterrupt
or DxgkDdiGetScanLine functions and must not provide valid function pointers to either of these functions in the
KMDDOD_INITIALIZATION_DATA structure.
In this case the Microsoft DirectX graphics kernel subsystem simulates values of VSync interrupts and scan lines
based on the current mode and the time of the last simulated VSync.
In addition, the values of the PixelRate, hSyncFreq, and vSyncFreq members of the
DISPLAYCONFIG_VIDEO_SIGNAL_INFO structure must be set to D3DKMDT_FREQUENCY_NOTSPECIFIED.
If these conditions are not met, the DirectX graphics kernel subsystem will not load the KMDOD.
Registry Control
For Windows Vista with SP1 and later versions of the Windows operating systems, the default VSync idle time-out
is 10 VSync periods. Optionally, for testing purposes, the time-out can be controlled by using registry settings.
Important To avoid application compatibility problems, do not change the default registry setting in production
drivers.
Key Path:
RTL_REGISTRY_CONTROL\GraphicsDrivers\Scheduler
Key Value:
VsyncIdleTimeout
ValueType:
REG_DWORD
Value:
10 = default
Value:
0 = disable VSync control (produces the same behavior as Windows Vista)
Send comments about this topic to Microsoft
Validating Private Data Sent from User Mode to
Kernel Mode
4/26/2017 • 1 min to read • Edit Online

A display miniport driver must validate all private data sent from the user-mode display driver to prevent the
miniport driver from crashing, not responding (hanging), asserting, or corrupting memory if the private data is
invalid. However, because the operating system resets hardware that "hangs," the display miniport driver can send
instructions to the graphics processing unit (GPU) that cause the GPU to "hang." Private data can include any of the
following items:
Command buffer content sent to the miniport driver's DxgkDdiRender or DxgkDdiRenderKm function in
the pCommand buffer member of the DXGKARG_RENDER structure.
Data sent to the following miniport driver functions:
The DxgkDdiCreateAllocation function in the pPrivateDriverData buffer members of the
DXGKARG_CREATEALLOCATION and DXGK_ALLOCATIONINFO structures.
The DxgkDdiEscape function in the pPrivateDriverData buffer member of the DXGKARG_ESCAPE
structure (see the validation sketch after this list).
The DxgkDdiAcquireSwizzlingRange function in the PrivateDriverData 32-bit member of the
DXGKARG_ACQUIRESWIZZLINGRANGE structure.
The DxgkDdiReleaseSwizzlingRange function in the PrivateDriverData 32-bit member of the
DXGKARG_RELEASESWIZZLINGRANGE structure.
The DxgkDdiQueryAdapterInfo function in the pInputData buffer member of the
DXGKARG_QUERYADAPTERINFO structure when the DXGKQAITYPE_UMDRIVERPRIVATE value is
specified in the Type member.
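The following hedged sketch shows the kind of defensive check this requirement implies, using DxgkDdiEscape as the example. The MY_ESCAPE_DATA structure, its fields, and the command range are hypothetical driver conventions, not part of the DDI.

// Hypothetical driver-defined escape data.
typedef enum _MY_ESCAPE_COMMAND { MyEscapeQueryInfo = 0, MyEscapeCommandMax } MY_ESCAPE_COMMAND;
typedef struct _MY_ESCAPE_DATA { MY_ESCAPE_COMMAND Command; ULONG Argument; } MY_ESCAPE_DATA;

NTSTATUS APIENTRY MyDxgkDdiEscape(CONST HANDLE hAdapter, CONST DXGKARG_ESCAPE *pEscape)
{
    CONST MY_ESCAPE_DATA *pData;
    UNREFERENCED_PARAMETER(hAdapter);

    // Never trust the size or content supplied by user mode.
    if (pEscape->pPrivateDriverData == NULL ||
        pEscape->PrivateDriverDataSize < sizeof(MY_ESCAPE_DATA))
    {
        return STATUS_INVALID_PARAMETER;
    }

    pData = (CONST MY_ESCAPE_DATA *)pEscape->pPrivateDriverData;

    // Range-check every field before using it; the cast guards against negative values.
    if ((ULONG)pData->Command >= MyEscapeCommandMax)
    {
        return STATUS_INVALID_PARAMETER;
    }

    // ... handle the validated escape request ...
    return STATUS_SUCCESS;
}
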
Send comments about this topic to Microsoft
Specifying device state and frame latency starting in
WDDM 1.3
4/26/2017 • 1 min to read • Edit Online

Windows Display Driver Model (WDDM) 1.3 and later user-mode display drivers can use escape flags to pass
device status and frame latency info to the display miniport driver when the pfnEscapeCb function is called. These
flags are available in the D3DDDI_ESCAPEFLAGS structure starting in Windows 8.1.
These reference topics describe how to implement this capability in your user-mode display driver:
D3DDDI_DEVICEEXECUTION_STATE
D3DDDI_EXECUTIONSTATEESCAPE
D3DDDI_FRAMELATENCYESCAPE
D3DDDI_ESCAPEFLAGS (new DeviceStatusQuery and ChangeFrameLatency members)
Send comments about this topic to Microsoft
Windows Display Driver Model (WDDM) 64-Bit
Issues
4/26/2017 • 2 min to read • Edit Online

To allow 32-bit applications to run on a 64-bit operating system, a 32-bit user-mode display driver must be
provided in addition to the 64-bit user-mode display driver that 64-bit applications require. However, only the 64-
bit version of a display miniport driver is required on a 64-bit operating system. Windows on Windows (WOW64)
enables 32-bit applications to run on a 64-bit operating system. For more information, see Supporting 32-Bit I/O in
Your 64-Bit Driver.
To install a 32-bit user-mode display driver on a 64-bit operating system, the following entry must be set in an
add-registry section of the INF file for the graphics device's display miniport driver. This must happen so that the
32-bit user-mode display driver's DLL name is added to the registry during driver installation:

[Xxx_SoftwareDeviceSettings]
...
HKR,, UserModeDriverNameWow, %REG_MULTI_SZ%, Xxx.dll
...

The INF file must contain information to direct the operating system to copy the 32-bit user-mode display driver
into the system's %systemroot%\SysWOW64 directory. For more information, see INF CopyFiles Directive and
INF DestinationDirs Section.
Because WOW64 cannot process opaque or untyped data structures such as the D3DDDICB_ALLOCATE structure
passed via the pfnAllocateCb function, it cannot perform an automatic conversion from 32 bit to 64 bit.
Therefore, for WOW64 to work correctly, you must consider the following items when writing a 32-bit user-mode
display driver to run on a 64-bit operating system:
Avoid pointers and data types whose size differs between 32-bit and 64-bit platforms, such as SIZE_T or HANDLE.
Besides making the size of the entire structure vary, these variable-width data types change the
alignment and position of individual members. If variable-width members are unavoidable, you can
add another member to indicate that the data structure originates from a 32-bit user-mode display driver.
The 64-bit display miniport driver can then properly perform the conversion (a layout sketch follows this list).
Even if variable width members are not present, you might need to consider architecture-specific alignment
requirements. For instance, on x64, a UINT64 (or QWORD) should be 8-byte aligned. Because a 32-bit user-
mode display driver compiled by a standard 32-bit compiler might not align these native 64-bit types
correctly, the 64-bit display miniport driver might not be able to accurately access data from the 32-bit user-
mode display driver. However, you can force alignment by using the appropriate pragma compiler
directives. Although using pragma compiler directives might cause a slight waste of space on 32-bit
operating systems, this lets you use identical 32-bit user-mode display drivers on 32-bit and 64-bit
operating systems. If you cannot force alignment by using the appropriate pragma compiler directives, the
32-bit user-mode display driver that runs using WOW64 on a 64-bit operating system must be different
from the 32-bit user-mode display driver running on a 32-bit operating system.
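The fragment below sketches what such a layout might look like. The structure name and members are hypothetical, but the pattern follows the guidance above: fixed-width members, explicit packing, an origin flag, and a compile-time size check.

#pragma pack(push, 8)                    // identical packing on x86 and x64
typedef struct _MY_PRIVATE_DATA {
    UINT32 Version;                      // fixed-width types instead of SIZE_T or HANDLE
    UINT32 IsWow64Client;                // set to 1 by the 32-bit user-mode driver
    UINT64 SurfaceGpuAddress;            // naturally 8-byte aligned on both architectures
    UINT32 Width;
    UINT32 Height;
} MY_PRIVATE_DATA;
#pragma pack(pop)

// A compile-time size check helps catch layout drift between the 32-bit and
// 64-bit builds of the driver.
C_ASSERT(sizeof(MY_PRIVATE_DATA) == 24);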
Send comments about this topic to Microsoft
Changing Floating-Point Control State
4/26/2017 • 1 min to read • Edit Online

All functions of a display miniport driver and a user-mode display driver must save the floating-point control state,
such as rounding mode or precision, before changing the floating-point control state and must restore the
floating-point control state to the previously saved setting before returning.
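For a user-mode display driver, one way to follow this rule is to read the control word before changing it and to restore the saved value on exit, as in this minimal sketch; the specific rounding-mode change is only an example.

#include <float.h>

void UseTruncationRoundingTemporarily(void)
{
    unsigned int savedControl = 0;
    unsigned int unused = 0;

    // Read the current floating-point control word without changing it (mask == 0).
    _controlfp_s(&savedControl, 0, 0);

    // Change the rounding mode for the duration of this routine.
    _controlfp_s(&unused, _RC_CHOP, _MCW_RC);

    // ... driver work that relies on the modified rounding mode ...

    // Restore the previously saved rounding mode before returning.
    _controlfp_s(&unused, savedControl & _MCW_RC, _MCW_RC);
}

A display miniport driver, which runs in kernel mode, can use KeSaveFloatingPointState and KeRestoreFloatingPointState for the same purpose.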
Send comments about this topic to Microsoft
Supplying Fence Identifiers
4/26/2017 • 1 min to read • Edit Online

The Microsoft DirectX graphics kernel subsystem supplies an identical fence identifier in the SubmissionFenceId
members of the DXGKARG_PATCH and DXGKARG_SUBMITCOMMAND structures in calls to the display
miniport driver's DxgkDdiPatch and DxgkDdiSubmitCommand functions. Depending on how the graphics
hardware is implemented, the driver is required to use the fence identifier that is passed to only one of
these functions, for one of the following reasons:
The driver uses the fence identifier passed to DxgkDdiPatch to write into the end of the direct memory
access (DMA) buffer.
The driver uses the fence identifier passed to DxgkDdiSubmitCommand to write into the ring buffer, which
is the buffer where DMA buffers are queued for execution by the graphics processing unit (GPU) (most GPU
types use a DMA buffer queuing model).
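A rough sketch of the first approach is shown below. The MY_FENCE_COMMAND format and opcode are made-up hardware commands, and the sketch assumes that the driver reserved space for this command at the end of the submitted range when it originally built the DMA buffer; only pDmaBuffer, DmaBufferSubmissionEndOffset, and SubmissionFenceId are DXGKARG_PATCH members.

// Hypothetical hardware command format for writing a fence value.
typedef struct _MY_FENCE_COMMAND {
    ULONG Opcode;        // for example, "write value to the fence location"
    ULONG FenceValue;    // the SubmissionFenceId supplied by the graphics kernel
} MY_FENCE_COMMAND;

#define MY_OPCODE_WRITE_FENCE 0x10

NTSTATUS APIENTRY DxgkDdiPatch(const HANDLE hAdapter, const DXGKARG_PATCH *pPatch)
{
    MY_FENCE_COMMAND *fenceCmd;

    UNREFERENCED_PARAMETER(hAdapter);

    // ... patch the allocation references in the DMA buffer first ...

    // Fill in the fence slot that the driver reserved at the end of the
    // submitted range when it built the DMA buffer.
    fenceCmd = (MY_FENCE_COMMAND *)((PUCHAR)pPatch->pDmaBuffer +
        pPatch->DmaBufferSubmissionEndOffset - sizeof(MY_FENCE_COMMAND));
    fenceCmd->Opcode = MY_OPCODE_WRITE_FENCE;
    fenceCmd->FenceValue = pPatch->SubmissionFenceId;

    return STATUS_SUCCESS;
}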
Send comments about this topic to Microsoft
Handling Resource Creation and Destruction
4/26/2017 • 3 min to read • Edit Online

To enable the Microsoft DirectX graphics kernel subsystem to properly track resource lifetime and to prevent
memory leaks in the operating system, the user-mode display driver must properly create and destroy resources.
The Microsoft Direct3D runtime calls the following user-mode display driver functions to create user-mode
resources.
CreateResource creates a new shared or unshared resource.
OpenResource opens a view to an existing shared resource.
In both calls, the Direct3D runtime passes a unique user-mode runtime resource handle that the user-mode display
driver uses to call back into the runtime. When CreateResource or OpenResource returns successfully, the user-
mode display driver returns a unique user-mode handle that represents the resource. This handle is the user-mode
driver resource handle. The runtime uses the user-mode driver resource handle in subsequent driver calls.
A one-to-one correspondence exists between the user-mode runtime resource handle and the user-mode driver
resource handle. The Direct3D runtime and the user-mode display driver exchange the user-mode runtime and
driver resource handles through the hResource members of the D3DDDIARG_CREATERESOURCE and
D3DDDIARG_OPENRESOURCE structures.
When the user-mode display driver calls the Direct3D runtime's pfnAllocateCb function to create allocations for a
user-mode resource, the driver should specify the user-mode runtime resource handle in the hResource member
of the D3DDDICB_ALLOCATE structure that the pData parameter points to. The Direct3D runtime generates a
unique kernel-mode handle to the resource and passes it back to the user-mode display driver in the
hKMResource member of D3DDDICB_ALLOCATE. The user-mode display driver can insert the kernel-mode
resource handle in the command stream for the display miniport driver to use later.
Note Although user-mode resource handles are always unique for each user-mode resource creation, kernel-
mode resource handles are not always unique. When the Direct3D runtime calls the user-mode display driver's
OpenResource function to open a view to an existing shared resource, the runtime passes the resource's kernel-
mode handle in the hKMResource member of the D3DDDIARG_OPENRESOURCE structure that the pResource
parameter points to. The runtime previously created this kernel-mode handle after the runtime called the user-
mode display driver's CreateResource function.
To destroy a user-mode resource that CreateResource or OpenResource created, the Direct3D runtime passes the
user-mode driver resource handle in the hResource parameter in a call to the user-mode display driver's
DestroyResource function. To release the kernel-mode resource handle and all of the allocations that are
associated with the user-mode resource, the user-mode display driver passes the user-mode runtime resource
handle in the hResource member of the D3DDDICB_DEALLOCATE structure that the pData parameter points to
in a call to the pfnDeallocateCb function.
Consider the following items when a user-mode display driver creates and destroys resources:
For allocations that the user-mode display driver creates in response to shared resources (that is, in
response to CreateResource calls with the SharedResource bit-field flag set in the Flags member of
D3DDDIARG_CREATERESOURCE), the driver must assign a non-NULL value to the hResource member of
D3DDDICB_ALLOCATE.
For allocations that the user-mode display driver creates in response to non-shared resources, the driver is
not required to assign a non-NULL value to the hResource member of D3DDDICB_ALLOCATE. If the driver
assigns NULL to hResource, the allocations are associated with the device and not a particular resource
(and kernel-mode resource handle). However, if allocations are truly related to a resource, the driver should
associate the allocations with that resource. Note A kernel-mode resource handle is created only if the user-
mode display driver sets the hResource member of D3DDDICB_ALLOCATE to the user-mode runtime
resource handle that the driver received from the hResource member of the
D3DDDIARG_CREATERESOURCE structure in a call to CreateResource.
When DestroyResource is called to destroy a non-shared user-mode resource, the user-mode display
driver can call pfnDeallocateCb with the hResource member of D3DDDICB_DEALLOCATE set to NULL
only if the driver never associated any allocations with the resource. If the user-mode display driver
associated allocations with the resource, the driver must call pfnDeallocateCb with the hResource
member of D3DDDICB_DEALLOCATE set to a non-NULL value; otherwise, a memory leak will occur.
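The fragment below sketches the shared-resource case: the driver passes the runtime's hResource to pfnAllocateCb so that a kernel-mode resource handle is created. Error handling and the driver's private allocation data are omitted, the helper name is hypothetical, and the callback and member layout should be verified against d3dumddi.h.

// Sketch: creating one allocation for a shared resource inside CreateResource.
HRESULT CreateSharedResourceAllocation(
    const D3DDDI_DEVICECALLBACKS *pDeviceCallbacks,
    HANDLE hDevice,
    const D3DDDIARG_CREATERESOURCE *pCreateResource)
{
    D3DDDI_ALLOCATIONINFO allocationInfo = {0};
    D3DDDICB_ALLOCATE allocate = {0};
    HRESULT hr;

    // Associate the allocation with the runtime's resource handle so that a
    // kernel-mode resource handle is created (required for shared resources).
    allocate.hResource = pCreateResource->hResource;
    allocate.NumAllocations = 1;
    allocate.pAllocationInfo = &allocationInfo;
    // allocationInfo.pPrivateDriverData / PrivateDriverDataSize would describe
    // the allocation to the display miniport driver. (Omitted.)

    hr = pDeviceCallbacks->pfnAllocateCb(hDevice, &allocate);
    if (SUCCEEDED(hr)) {
        // allocate.hKMResource now holds the kernel-mode resource handle, and
        // allocationInfo.hAllocation holds the allocation handle.
    }
    return hr;
}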
Send comments about this topic to Microsoft
Supporting Video Capture and Other Child Devices
4/26/2017 • 3 min to read • Edit Online

A display miniport driver and the driver for a video capture device or another child device can mutually define a
private interface that the child driver can use to communicate with its device through the parent miniport driver. A
child video capture driver must be tightly coupled to the parent display miniport driver. In fact, video capture could
possibly be implemented as part of the display miniport driver. A video capture driver can use the private interface
with the display miniport driver to access the I2C bus and for other purposes.
To initialize the private interface, the video capture driver sends an IRP_MN_QUERY_INTERFACE request to the
display port driver (part of Dxgkrnl.sys) for the display miniport driver. After the display port driver receives such a
request, it calls the miniport driver's DxgkDdiQueryInterface function and passes a pointer to a
QUERY_INTERFACE structure that contains information to initialize the private interface.
Note If video capture is implemented as part of the display miniport driver, the video capture component might call
DxgkDdiQueryInterface directly.
Each driver of a child device (including video capture devices) must return the adapter GUID that indicates the
hardware that the device is associated with. The adapter GUID is supplied to the display miniport driver in the
AdapterGuid member of the DXGK_START_INFO structure that is pointed to by the DxgkStartInfo parameter of
the DxgkDdiStartDevice function that is sent when the adapter is initialized. User-mode capture components can
subsequently map this adapter GUID to a display adapter.
In the Microsoft Windows 2000 Display Driver Model, video capture applications send system memory capture
buffers to kernel mode. Kernel mode then describes the system memory buffers by using memory descriptor list
(MDL) structures and sends the MDLs to the video capture driver. In addition to supporting capture to system
memory, the Windows Vista display driver model supports capture to video memory. The Direct3D runtime calls
DirectX Video Acceleration 2.0-type functions to direct the GPU to perform post processing on capture data.
Instead of sending MDLs to describe the video memory buffers, the user-mode display driver will send
D3DKMT_HANDLE-type values that are handles to capture buffer allocations. Therefore, the video capture driver
and display miniport driver combination can use existing callback functions like DxgkCbGetHandleData to
reference private data that describes the capture buffer. The driver combination can also use the
DxgkCbGetCaptureAddress callback function to return the physical address of the capture buffer.
Video capture applications call into the Direct3D runtime to create capture buffers; the runtime subsequently calls
into the user-mode display driver. The runtime calls the user-mode display driver's CreateResource function with
the CaptureBuffer bit-field flag set in the Flags member of the D3DDDIARG_CREATERESOURCE structure to
create capture buffers. The display miniport driver must also specify the Capture bit-field flag for the video
memory manager when the memory manager calls the display miniport driver's DxgkDdiCreateAllocation
function to create allocations for the capture buffers. When capture buffers are created, they are immediately
pinned in memory and are not unpinned until they are released. Because the capture stack must send kernel-mode
allocation handles for capture buffers to the capture driver, the runtime calls the user-mode display driver's
GetCaptureAllocationHandle function to map each resource handle to the kernel-mode allocation handle for
that resource.
The capture driver can report whether it supports capturing to system memory directly. If the capture driver
supports capturing directly to system memory, MDLs are sent to the capture driver for this purpose. If the capture
driver does not support direct capture to system memory, the runtime creates video memory capture buffers, and
the capture driver must fill them. The user-mode display driver's CaptureToSysMem function is called to copy the
contents of a capture buffer to a system memory surface. The runtime can use CaptureToSysMem rather than the
Blt function to take advantage of special hardware for bit-block transfers (bitblt) that do not require that the user-
mode display driver call the pfnRenderCb function.
Because AVStream controls video capture, the DirectX graphics kernel subsystem is not aware of when video
capture occurs. However, the graphics kernel subsystem is aware of the allocations that are used as capture
buffers. When a capture buffer is about to be destroyed, the graphics kernel subsystem calls the display miniport
driver's DxgkDdiStopCapture function to indicate that the capture operation must immediately stop using an
allocation as the capture buffer. If the capture operation has already been stopped through the capture stack, the
driver can safely ignore the call.
Send comments about this topic to Microsoft
Supporting Rotation
4/26/2017 • 1 min to read • Edit Online

The following topics describe how display miniport drivers and user-mode display drivers support rotation:
Supporting Rotation in a Display Miniport Driver
VidPN Path-Independent Rotation Interface
Supporting Rotation in a User-Mode Display Driver
Send comments about this topic to Microsoft
Supporting Rotation in a Display Miniport Driver
4/26/2017 • 4 min to read • Edit Online

A display miniport driver's DxgkDdiEnumVidPnCofuncModality function calls the


pfnUpdatePathSupportInfo function to report rotation support for each path in a video present network (VidPN)
topology. For more information about reporting rotation support, see Enumerating Cofunctional VidPN Source and
Target Modes.
The Microsoft DirectX graphics kernel subsystem uses non-rotated surface dimensions to create the shared
primary surface. To notify a display miniport driver to rotate the surface, the DirectX graphics kernel subsystem
specifies D3DKMDT_VIDPN_PRESENT_PATH_ROTATION-typed values in the Rotation member of the
D3DKMDT_VIDPN_PRESENT_PATH_TRANSFORMATION structure that is specified in the
ContentTransformation member of the D3DKMDT_VIDPN_PRESENT_PATH structure in calls to the display
miniport driver's DxgkDdiCommitVidPn and DxgkDdiUpdateActiveVidPnPresentPath functions.
Note All rotation degrees are defined in the counter-clockwise direction, which is consistent with how GDI defines
rotation.
When the DirectX subsystem notifies the display miniport driver to rotate the surface, the driver should rotate the
surface data only if the Rotate bit-field flag was set in the Flags member of the DXGKARG_PRESENT structure
that the pPresent parameter points to in a call to the driver's DxgkDdiPresent function. Even if the driver
determines that the current orientation of the screen is rotated from the presentation data and Rotate was not set,
the driver should not rotate the data.
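A minimal sketch of that rule in DxgkDdiPresent follows; the two helper routines are hypothetical placeholders for whatever DMA-buffer building the driver actually performs.

NTSTATUS APIENTRY DxgkDdiPresent(const HANDLE hContext, DXGKARG_PRESENT *pPresent)
{
    if (pPresent->Flags.Rotate) {
        // dxgkrnl asked for rotation: build a rotating copy that matches the
        // current orientation of the path.
        return MyBuildRotatedPresent(hContext, pPresent);    // hypothetical helper
    }

    // Even if the screen is currently rotated, do not rotate when the flag is clear.
    return MyBuildStraightPresent(hContext, pPresent);       // hypothetical helper
}
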
Clone-mode behavior
Clone mode is a mode in which a video present source connects to multiple video present targets through multiple
paths in a video present network. (For more information about video present networks, see Multiple Monitors and
Video Present Networks.)
A display miniport driver handles rotation differently if it operates in clone mode because each target might
require a different rotation. The operating system, various versions of Microsoft DirectX runtimes, and user-mode
clients detect only the orientation of the primary video present target. Therefore, the content in the video present
source will always match the orientation of the primary video present target.
The following table shows how a display miniport driver behaves in clone mode for all of the relevant situations.
The setting of the Rotate flag is the setting of the Rotate bit-field in the Flags member of the
DXGKARG_PRESENT structure.

PRIMARY TARGET | SECONDARY TARGET | ROTATE FLAG | DRIVER BEHAVIOR
Not rotated | Not rotated | Not set | The driver performs no rotation.
Not rotated | Rotated | Not set | The driver rotates the secondary target even though the Rotate flag is not set.
Rotated | Not rotated | Set | The driver rotates the primary target but not the secondary target.
Rotated | Not rotated | Not set | Because Rotate is not set, the driver does not rotate the primary target. Because the secondary target does not match the orientation of the content in the source, the driver must rotate the secondary target. This situation occurs when the client is rotation-aware and has already properly oriented the content of the source; therefore, the operating system does not set Rotate.
Rotated | Rotated | Set | The driver rotates both the primary and secondary targets.
Rotated | Rotated | Not set | The rotation-aware client has already properly oriented the content of the source; therefore, no additional rotation is required.

Clone-mode requirements starting with Windows 8.1 Update


Starting with Windows 8.1 Update, drivers must meet these requirements. If test signing is enabled, a system
bugcheck will occur if a driver fails to meet these requirements.
Primary clone path
Definition: The path that includes the target monitor that duplicates the source display—for example, an external
monitor that duplicates the display on a laptop computer.
Requirement: In the primary clone path, the driver must set Offset0 to TRUE and the other 3 offset values in
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT to FALSE.
In the case of a portrait-first source display, the primary clone path is not rotationally offset. This means that the
primary clone path always has an offset of zero
(D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset0 is TRUE), and the Desktop Window
Manager (DWM) rotates its content in advance to match the proper orientation.
The primary clone path determines the monitor refresh rate for all primary and secondary clone targets.
Secondary clone path
Definition: The path that includes any additional target monitor, not part of the primary clone path, that also
duplicates the source display.
Requirement: In the secondary clone path, the driver must set at least one of the 4 offset values in
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT to TRUE. If the driver doesn't support path-
independent rotation, it should set Offset0 to TRUE in all secondary clone paths.
Here are two examples of settings the driver should make if it supports path-independent rotation:
Landscape-first example
If the source display and the target in the secondary clone path are both landscape-first monitors, in the secondary
clone path the driver would set D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset0 to TRUE
and the other 3 offset values in D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT to FALSE.
Alternately in this case, in the secondary clone path the driver would set both Offset0 and Offset180 to TRUE and
the other offset values to FALSE.
Portrait-first example
If the source display is a portrait-first device and connects to a landscape-first external monitor, in the secondary
clone path the driver would set either Offset270 or Offset90 to TRUE.
For more info, see Supporting Path-Independent Rotation.
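As a sketch of these requirements, the routine below fills the offset bits for a primary clone path and for a secondary clone path on a portrait-first internal panel. How the structure is actually reported through pfnUpdatePathSupportInfo in DxgkDdiEnumVidPnCofuncModality is omitted, and only the offset members named above are shown.

VOID SetCloneRotationOffsets(
    D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT *pPrimaryPath,
    D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT *pSecondaryPortraitPanelPath)
{
    // Primary clone path: Offset0 only, per the requirement above.
    pPrimaryPath->Offset0   = 1;
    pPrimaryPath->Offset90  = 0;
    pPrimaryPath->Offset180 = 0;
    pPrimaryPath->Offset270 = 0;

    // Secondary clone path on a portrait-first internal panel: the operating
    // system expects a 270-degree offset so the panel can be driven in landscape.
    pSecondaryPortraitPanelPath->Offset0   = 0;
    pSecondaryPortraitPanelPath->Offset90  = 0;
    pSecondaryPortraitPanelPath->Offset180 = 0;
    pSecondaryPortraitPanelPath->Offset270 = 1;
}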
Send comments about this topic to Microsoft
Optimized screen rotation support
4/26/2017 • 1 min to read • Edit Online

Windows 8 ensures a flicker-free screen rotation experience by ensuring that the output from the graphics adapter
stays enabled during a rotational mode change. This feature is required on all Windows Display Driver Model
(WDDM) 1.2 drivers that support rotated modes.
Note Starting with Windows 8.1 Update, device driver interfaces (DDIs) are updated to support the highest
possible resolution on cloned monitors when the primary display is rotated. See Supporting Path-Independent
Rotation.

Minimum WDDM version: 1.2
Minimum Windows version: 8
Driver implementation—Full graphics and Display only: Mandatory

Smooth rotation DDI


The display miniport driver must support updating the path rotation when these driver-implemented functions are
called:
DxgkDdiCommitVidPn
DxgkDdiUpdateActiveVidPnPresentPath
The driver must indicate support for smooth rotation in a call to DxgkDdiUpdateActiveVidPnPresentPath by setting
the DXGK_DRIVERCAPS structure's SupportSmoothRotation member, which is available starting with Windows
8. The driver must always be able to set the path rotation during a call to DxgkDdiCommitVidPn.
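Because DXGK_DRIVERCAPS is reported through DxgkDdiQueryAdapterInfo, a driver that supports this feature sets the member when it fills in its caps, roughly as sketched below (all other caps are omitted).

// Inside DxgkDdiQueryAdapterInfo:
if (pQueryAdapterInfo->Type == DXGKQAITYPE_DRIVERCAPS) {
    DXGK_DRIVERCAPS *pDriverCaps = (DXGK_DRIVERCAPS *)pQueryAdapterInfo->pOutputData;

    // ... fill in the rest of the driver caps ...

    pDriverCaps->SupportSmoothRotation = TRUE;   // available starting with Windows 8
    return STATUS_SUCCESS;
}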

Smooth rotation scenarios


On traditional desktop and laptop systems, screen rotation is not a frequently used scenario. But in mobile devices,
screen rotation is often a mainstream scenario. Windows 8 enables optimizations to the display infrastructure to
ensure that the monitor synchronization stays enabled during screen rotation. End users can experience a smooth
rotation transition when the following are true:
The platform is running WDDM 1.2.
The desktop composition manager is on and is actively composing.
The mode change request is determined to be compatible with smooth rotation mode transition. Two modes
are compatible if they have the same dimensions (width and height), topology, refresh rates, pixel formats, and
stride, and differ only in screen orientation (that is, are rotated).
Send comments about this topic to Microsoft
Supporting Path-Independent Rotation
4/26/2017 • 2 min to read • Edit Online

Starting with Windows 8.1 Update, the operating system supports cloning portrait-first displays on landscape-first
displays with the greatest possible resolution. The display miniport driver must set the proper offset values in the
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT structure for the primary clone path and secondary
clone path, as described in Supporting Rotation in a Display Miniport Driver.
These Device driver interfaces (DDIs) are new in Windows 8.1 Update:
VidPN Path-Independent Rotation Interface
These DDIs are updated in Windows 8.1 Update:
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION

Cloning a portrait-first device


When a driver of a portrait-first device is requested to clone to a landscape-first monitor, it should report source-
mode (x,y) resolutions that match the resolutions in the primary clone path. The secondary clone path could then
support 90- and 270-degree offset values
(D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset90 or .Offset270 are TRUE). So when a
VidPN is committed with an D3DKMDT_VIDPN_PRESENT_PATH_ROTATION enumeration value that indicates a
90- or 270-degree offset, this means that the (x,y) resolutions are flipped in this particular path.
By default the operating system chooses the secondary clone path to be the internal display panel. In the case that
the internal panel is portrait-first, the operating system expects
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset270 to be set on this path in order to display
on the internal display panel in landscape mode. In the case of a landscape-first external monitor in the secondary
clone path, the operating system expects the driver to support
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset90, although this is likely to be a rare
scenario.

Example clone scenarios


Here's a typical scenario where a portrait-first device with native resolution 800 (width) x 1280 pixels (height) is
connected in clone mode to a landscape-first TV with height 1080 pixels. The driver would report this info to the
operating system:
source mode
1280 x 800
TV target mode
1920 x 1080 (aspect-ratio preserved scaling)
device target mode
800 x 1280 (identity scaling)
primary clone path (TV)
driver supports only D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset0, as well as normal
rotation support
secondary clone path (device)
driver supports only D3DKMDT_VIDPN_PRESENT_PATH_ROTATION_SUPPORT.Offset270, as well as normal
rotation support
The call to the DxgkDdiCommitVidPn function then returns with these path settings from the
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION enumeration:
primary clone path (TV)
D3DKMDT_VPPR_IDENTITY
secondary clone path (device)
D3DKMDT_VPPR_IDENTITY_OFFSET270
The operating system expects the driver to rotate the provided content 270 degrees.
If, in the Display control panel's Orientation drop-down box, the user chooses the Landscape (flipped) option,
the call to the DxgkDdiCommitVidPn function returns with these path settings from the
D3DKMDT_VIDPN_PRESENT_PATH_ROTATION enumeration:
primary clone path (TV)
D3DKMDT_VPPR_ROTATE180
secondary clone path (device)
D3DKMDT_VPPR_ROTATE180_OFFSET270
If the Desktop Window Manager (DWM) has already rotated the content 180 degrees, the driver must still rotate it
another 270 degrees in the secondary clone path. Otherwise, the driver must rotate the content 180 degrees for
the TV and 90 degrees for the device. Note that to rotate the content, the driver must set the Rotate member of
the DXGK_PRESENTFLAGS structure.
Send comments about this topic to Microsoft
Supporting Rotation in a User-Mode Display Driver
4/26/2017 • 1 min to read • Edit Online

A user-mode display driver supports rotation differently, depending on many factors. For example, the user-mode
display driver must behave differently for full-screen devices than it does for windowed devices. Also, the primary
surfaces are created differently based on whether the desktop window manager (DWM) is running, the graphics
adapter supports Microsoft DirectX 9L, or the DirectX 9L application is rotation-aware.
The following topics describe how a user-mode display driver supports rotation for different situations:
Windowed-Mode Behavior
Full-Screen-Mode Behavior
DirectX Runtime Behavior
Send comments about this topic to Microsoft
Windowed-Mode Behavior
4/26/2017 • 1 min to read • Edit Online

The Microsoft Direct3D runtime for a windowed-mode device never calls functions of a user-mode display driver to
lock a rotated primary surface, to render to a rotated primary surface, or to perform bit-block transfers (bitblt) to or
from a rotated primary. That is, the Direct3D runtime for a windowed-mode device handles all of these situations.
The Direct3D runtime for a windowed-mode device might not call the user-mode display driver's OpenResource
function to open the shared primary surface and to inform the user-mode display driver of the orientation of the
primary surface. However, if the desktop window manager (DWM) is not running, the Direct3D runtime calls
OpenResource, and the user-mode display driver is informed about the orientation of the primary. The user-mode
display driver must be aware of the primary surface orientation only if the driver must access the primary surface
(through a bitblt or lock) for its own purposes; the Direct3D runtime for a windowed-mode device will never
request the user-mode display driver to access a rotated primary surface. Therefore, if the user-mode display driver
must access the primary surface for its own internal purposes, the driver requires a mechanism in addition to a call
to its OpenResource function because OpenResource is not always called.
The DWM or the display miniport driver's DxgkDdiPresent function rotates windowed-mode data.
Send comments about this topic to Microsoft
Full-Screen-Mode Behavior
4/26/2017 • 2 min to read • Edit Online

A user-mode display driver can determine that a rendering device is in full-screen mode:
If the Fullscreen bit-field flag is set in the Flags member of the D3DDDIARG_OPENRESOURCE structure
that the pResource parameter points to in a call to the driver's OpenResource function.
If the Primary bit-field flag is set in the Flags member of the D3DDDIARG_CREATERESOURCE structure
that the pResource parameter points to in a call to the driver's CreateResource function.
An application that is developed for Microsoft DirectX 9.0 or earlier will cause the Microsoft Direct3D runtime to
call OpenResource to open the shared primary surface and then CreateResource to create any additional back
buffers. A Microsoft DirectX 9L application will cause the Direct3D runtime to call CreateResource (without calling
OpenResource) to create all swap-chain buffers. The Direct3D runtime specifies the primary surface orientation in
the Rotation member of the D3DDDIARG_OPENRESOURCE and D3DDDIARG_CREATERESOURCE structures
that the pResource parameter points to in calls to both the OpenResource and the CreateResource functions,
respectively.
For a full-screen device, a user-mode display driver must lock a rotated resource, render to a rotated resource, and
perform bit-block transfers (bitblt) from a rotated resource. Typically, the user-mode display driver creates interim
render targets in the rotated orientation (all locks, bitblts, and renderings will go to these interim render targets)
and primary allocations in the landscape orientation (that is, the orientation that the digital-to-analog converter
[DAC] uses to scan out). When the user-mode display driver is called to flip the data, it performs a rotating bitblt
from the interim render target to the landscape buffer before it calls the pfnPresentCb function to issue the flip
command.
Whenever a user-mode display driver must perform a bitblt that involves a rotated resource and a non-rotated
resource, the Direct3D runtime specifies the Rotate bit-field flag in the Flags member of the D3DDDIARG_BLT
structure in a call to the driver's Blt function to indicate to the driver that the proper rotation must occur for the
bitblt.
DirectX 9L applications can be rotation-aware, which means that they will render everything in the proper
orientation and will properly handle locks to a rotated buffer. When the Direct3D runtime creates a swap chain for
a rotation-aware application, the runtime always specifies the rotation as D3DDDI_ROTATION_IDENTITY in the
Rotation member of the D3DDDIARG_CREATERESOURCE structure because the user-mode display driver is not
required to perform any special actions for the rotation-aware application to work.
Send comments about this topic to Microsoft
DirectX Runtime Behavior
4/26/2017 • 1 min to read • Edit Online

Various versions of the Microsoft DirectX runtime handle the following rotation situations on behalf of the driver:
The Microsoft DirectDraw runtime automatically fails any attempt to display an overlay while the display is
rotated.
All versions of the DirectX runtime adjust the scan-line values that are returned while the primary surface is
rotated so that the scan-line values cover the entire range up to the height of the resolution. Otherwise, an
application that attempts beam chasing might stop responding if it waits for a scan-line value that is greater
than the width of the display and that the application would otherwise never receive while in portrait mode.
All versions of the DirectX runtime handle all accesses to a rotated primary surface that are made by a
windowed-mode device that uses various forms of emulation.
Send comments about this topic to Microsoft
Version Numbers for WDDM Drivers
4/26/2017 • 1 min to read • Edit Online

To ensure that a display driver that conforms to the Windows Display Driver Model (WDDM) or the Windows 2000
display driver model (XDDM) runs on Microsoft Windows with a specific version of Microsoft DirectX, you must
apply an appropriate version number to that driver. If a vendor distributes a display driver with the wrong version
number or a version number that uses the wrong format, end users will encounter difficulties when they install any
DirectX application.
Note The DriverVer directive provides a way to add version information for the driver package, including the
driver file and the INF file itself, to the INF file. By using the DriverVer directive, you can safely and definitively
replace driver packages by future versions of the same package. For more information about this directive, see INF
DriverVer Directive.
This table gives examples of the range of version numbers that are appropriate for vendor-supplied display drivers
that conform to WDDM for compatibility with various versions of DirectX.

TARGET SYSTEM RANGE OF VERSION NUMBERS

WDDM and DirectX 9.0-compatible display drivers 7.14.01.0000 - 7.14.99.9999

WDDM and DirectX 10.0-compatible display drivers 7.15.01.0000 - 7.15.99.9999

This table gives the range of version numbers that are appropriate for vendor-supplied display drivers that
conform to the Windows 2000 display driver model for compatibility with DirectX 9.0.

TARGET SYSTEM RANGE OF VERSION NUMBERS

XDDM and DirectX 9.0-compatible display drivers 6.14.01.0000 - 6.14.99.9999

For more information about versioning for display drivers, see Version Numbers for Display Drivers.
Send comments about this topic to Microsoft
Supporting Brightness Controls on Integrated Display
Panels
4/26/2017 • 6 min to read • Edit Online

Brightness controls are implemented in the monitor driver, Monitor.sys, supplied by the operating system. The
monitor driver implements a Windows Management Instrumentation (WMI) interface to allow applications (such
as the operating system's brightness slider) to interact with the brightness level. The monitor driver registers with
the Device Power Policy Engine (DPPE) so that brightness levels respond to changes in power policy. The monitor
driver registers with the Advanced Configuration and Power Interface (ACPI) to process ACPI-based brightness
shortcut keys. For compatibility with the Windows 2000 Display Driver Model, the monitor driver implements the
IOCTL-based brightness controls.
Either the display miniport driver or ACPI methods that are exposed by the system basic input/output system
(BIOS) can support changing the brightness of an integrated display panel. For the first video target that is marked
as having output technology that connects internally in a computer (D3DKMDT_VOT_INTERNAL), the monitor
driver calls the display miniport driver's DxgkDdiQueryInterface function to query for the Brightness Control
Interface that is identified by GUID_DEVINTERFACE_BRIGHTNESS and
DXGK_BRIGHTNESS_INTERFACE_VERSION_1, and the Brightness Control Interface V. 2 (Adaptive and Smooth
Brightness Control) that is identified by GUID_DEVINTERFACE_BRIGHTNESS_2 and
DXGK_BRIGHTNESS_INTERFACE_VERSION_2. If the display miniport driver does not support at least the Brightness
Control Interface, the monitor driver uses ACPI to query for the _BCL, _BCM, and _BQC methods on the child
device. For more information about these methods, see the ACPI specification on the ACPI website.
Note In the Windows Display Driver Model (WDDM), an ACPI identifier is not used to identify an integrated
display panel. This is different from the Windows 2000 Display Driver Model, which supports only display panels
with an identifier of 0x0110.
If either the display miniport driver or BIOS-exposed ACPI methods support brightness controls, the monitor driver
registers for ACPI notifications of brightness shortcut keys. No alternative mechanism exists to signal the monitor
driver about shortcut key notifications. If the monitor driver cannot use either brightness-control mechanism or if
the display miniport driver supplies the brightness control interface but fails a call to the
DxgkDdiGetPossibleBrightness function, the monitor driver does not support brightness controls.
Brightness Levels
Brightness levels are represented as single-byte values in the range from zero to 100 where zero is off and 100 is
the maximum brightness that a laptop computer supports. Every laptop computer must report a maximum
brightness level of 100; however, a laptop computer is not required to support a level of zero. The only
requirement for values from zero to 100 is that larger values must represent higher brightness levels. The
increment between levels is not required to be uniform, and a laptop computer can support any number of distinct
values up to the maximum of 101 levels. You must decide how to map hardware levels to the range of brightness
level values. However, a call to the display miniport driver's DxgkDdiGetPossibleBrightness function should not
report more brightness level values than the hardware supports.
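The mapping itself is up to the driver. The helper below is only an illustration of one possible policy (an evenly spaced mapping of hypothetical hardware backlight steps onto the 0 to 100 range); it is not a DDI.

// Illustrative only: spread HardwareStepCount backlight steps across 0-100.
// Returns the number of levels produced (at most 101).
ULONG BuildBrightnessLevels(UCHAR *pLevels, ULONG HardwareStepCount)
{
    ULONG i;

    if (HardwareStepCount == 0) {
        return 0;
    }
    if (HardwareStepCount == 1) {
        pLevels[0] = 100;               // a maximum level of 100 must always exist
        return 1;
    }
    if (HardwareStepCount > 101) {
        HardwareStepCount = 101;        // never report more than 101 distinct levels
    }

    for (i = 0; i < HardwareStepCount; i++) {
        // Larger values must be brighter; the spacing does not have to be uniform.
        pLevels[i] = (UCHAR)((i * 100) / (HardwareStepCount - 1));
    }
    pLevels[HardwareStepCount - 1] = 100;
    return HardwareStepCount;
}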
Disabling Automatic Brightness Changes by the BIOS
To avoid problems that might occur if the system BIOS and the monitor driver both control display panel
brightness, the display miniport driver should set bit 2 of the argument to the _DOS method. For more information
about the _DOS method and its arguments, see the ACPI specification. By setting bit 2, the system BIOS is informed
that it should not perform any automatic brightness changes.
BIOS Requirements to Support Brightness Controls
For the display miniport driver to support controlling integrated panel brightness in an optimum way, the system
BIOS must provide the following items through the ACPI:
Brightness control methods
An integrated panel device should support the ACPI brightness control methods (_BCL, _BCM, and _BQC).
_BCL and _BCM are unchanged since version 1.0b of the ACPI specification; you can find their definitions in
the ACPI 3.0 specification in sections B.6.2 and B.6.3. _BQC is optional and is defined in the ACPI 3.0
specification in section B.6.4. For definitions of brightness levels, see Brightness Levels.
The following are the aliases for the ACPI brightness control methods defined in Dispmprt.h:
ACPI_METHOD_OUTPUT_BCL - Allows Windows to query a list of brightness levels supported by the
display output devices. This method is required if an integrated LCD is present and supports brightness
levels.
ACPI_METHOD_OUTPUT_BCM - Allows Windows to set the brightness level of the display output
device. Windows will only set levels that were reported by the ACPI_METHOD_OUTPUT_BCL method. The
ACPI_METHOD_OUTPUT_BCM method is required if the ACPI_METHOD_OUTPUT_BCL method is
implemented.
Disabling the automatic system BIOS brightness control
The system BIOS should support setting bit 2 of the argument to the _DOS method on the graphics adapter
to allow automatic system BIOS brightness changes to be disabled. This bit is an addition to the previously
defined values for the bits in this method. For details about this bit, see section B.4.1 in the ACPI 3.0
specification. If this bit is not supported, the monitor driver and the system BIOS can both change the
brightness level, which results in a flicker of brightness and can potentially leave the brightness set to a
value that is not what the user requested.
The following alias for the ACPI automatic brightness control method is defined in Dispmprt.h:
ACPI_METHOD_DISPLAY_DOS - Indicates that the system BIOS is capable of automatically switching
the active display output or controlling the brightness of the LCD. The following are the allowed
parameters:
ACPI_ARG_ENABLE_AUTO_LCD_BRIGHTNESS. States that the system BIOS should automatically
control the brightness level of the LCD when the power changes from AC to DC.
ACPI_ARG_DISABLE_AUTO_LCD_BRIGHTNESS. States that the system BIOS should not
automatically control the brightness level of the LCD when the power changes from AC to DC.
Notifications of brightness shortcut keys
Brightness shortcut key notifications should be targeted to the integrated display panel device, not to the
graphics adapter.
The following notifications are supported as defined in Dispmprt.h:
ACPI_NOTIFY_CYCLE_BRIGHTNESS_HOTKEY - The user has pressed the hotkey for cycling display
brightness.
ACPI_NOTIFY_INC_BRIGHTNESS_HOTKEY - The user has pressed the hotkey for increasing display
brightness.
ACPI_NOTIFY_DEC_BRIGHTNESS_HOTKEY - The user has pressed the hotkey for decreasing display
brightness.
ACPI_NOTIFY_ZERO_BRIGHTNESS_HOTKEY - The user has pressed the hotkey for reducing display
brightness to zero.
These shortcut key notifications are new to the ACPI 3.0 specification and are described in section B.7.
Typically, a laptop computer would not support all of these shortcut key notifications.
The default behavior of the monitor driver for the ACPI_NOTIFY_INC_BRIGHTNESS_HOTKEY and
ACPI_NOTIFY_DEC_BRIGHTNESS_HOTKEY notifications is to increment (or decrement) brightness by at least
5 percent more (or less) than the previous brightness level, until the next available 5-percent step level is
reached (5, 10, 15, ..., 95, 100). Incrementing or decrementing with shortcut keys can create asymmetrical
patterns in brightness levels, as the following examples show.
Available _BCL brightness control levels specified as 0, 1, 5, 10, ..., 95, 100
Results using the ACPI_NOTIFY_INC_BRIGHTNESS_HOTKEY notification:
0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100
Results using the ACPI_NOTIFY_DEC_BRIGHTNESS_HOTKEY notification:
100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 0
Available _BCL brightness control levels specified as 1, 5, 10, ..., 95, 100
Results using the ACPI_NOTIFY_INC_BRIGHTNESS_HOTKEY notification:
1, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100
Results using the ACPI_NOTIFY_DEC_BRIGHTNESS_HOTKEY notification:
100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 1
In the latter example, 1 is the last available value, so the driver sets the brightness level to 1 even though it is
less than 5 percentage units different from the previous value of 5.
This default monitor driver behavior can be overridden by changing the value of the MinimumStepPercentage
DWORD in the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Monitor\Parameters

Related topics
Supporting Display Output and ACPI Events
Send comments about this topic to Microsoft
Supporting Display Output and ACPI Events
6/12/2017 • 3 min to read • Edit Online

A comprehensive approach to system configuration and device power control is built into Windows, based on the
Advanced Configuration and Power Interface (ACPI) specification. Windows supports capabilities that can be used
by drivers to manage the configuration and power of display output devices. For more information, see the ACPI
specification on the ACPI website.

BIOS Requirements to Support Display Output Devices


The display miniport driver or ACPI methods that are exposed by the system BIOS support display output devices
configuration. The DxgkDdiNotifyAcpiEvent function is called to notify the display miniport driver about ACPI
events. For example, when the user presses the keyboard shortcut for switching the output device, the
DxgkDdiNotifyAcpiEvent function is called with the ACPI_NOTIFY_CYCLE_DISPLAY_HOTKEY notification, and the
driver can return a request of DXGK_ACPI_CHANGE_DISPLAY_MODE. As a result, the operating system calls the
DxgkDdiRecommendFunctionalVidPn function to query the selected display output device.
The following aliases for the ACPI display output methods are defined in Dispmprt.h:
ACPI_METHOD_DISPLAY_DOD - Enumerates all the devices attached to the display adapter. This method is
required if the integrated controller supports switching of output devices. This is the alias name for the _DOD
method defined by the ACPI specification.
ACPI_METHOD_DISPLAY_DOS - Indicates that the system firmware is capable of automatically switching the
active display output. This is the alias name for the _DOS method defined by the ACPI specification. The
following are the allowed parameters:
ACPI_ARG_ENABLE_SWITCH_EVENT. States that the system firmware should not automatically switch
the active display output device. Instead, it must save the desired change in state variables associated
with each display output device and generate a display switch event. The operating system can query the
active status of a device by calling the ACPI_METHOD_OUTPUT_DGS method.
ACPI_ARG_ENABLE_AUTO_SWITCH. States that the system firmware should automatically switch the
active display output device without interacting with the operating system. It does not generate a display
switch event.
ACPI_ARG_DISABLE_SWITCH_EVENT. States that the system firmware should not perform any action;
that is, neither switch the output device nor notify the operating system. The values returned by the
ACPI_METHOD_OUTPUT_DGS method are locked.
ACPI_METHOD_OUTPUT_DCS - Returns the status of a display output device. This is the alias name for the
_DCS method defined by the ACPI specification.
ACPI_METHOD_OUTPUT_DGS - Checks whether the status of a display output device is active. This is the alias
name for the _DGS method defined by the ACPI specification.
ACPI_METHOD_OUTPUT_DSS - Sets the status of a display output device to active or inactive. This is the alias
name for the _DSS method defined by the ACPI specification. The operating system manages this action to
avoid flickering.
ACPI_METHOD_DISPLAY_GPD - Queries the CMOS entry to determine which video device is posted at boot
time. This is the alias name for the _GPD method defined by the ACPI specification.
ACPI_METHOD_DISPLAY_SPD - Updates the CMOS entry that determines which video device is posted at boot
time. This is the alias name for the _SPD method defined by the ACPI specification.
ACPI_METHOD_DISPLAY_VPO - Determines what video options are implemented. This is the alias name for the
_VPO method defined by the ACPI specification.
External Asynchronous Events
The operating system must be notified about external, asynchronous events that affect the display output devices.
The following notifications and related request types are defined in Dispmprt.h and used in the
DxgkDdiNotifyAcpiEvent function.
ACPI_NOTIFY_CYCLE_DISPLAY_HOTKEY - Notifies the operating system that the user has pressed the cycle
display keyboard shortcut.
ACPI_NOTIFY_NEXT_DISPLAY_HOTKEY - Notifies the operating system that the user has pressed the next
display keyboard shortcut.
ACPI_NOTIFY_PREV_DISPLAY_HOTKEY - Notifies the operating system that the user has pressed the previous
display keyboard shortcut.
Note The previous notifications depend on the handling of the event caused by the user when pressing the
keyboard shortcuts.
The following are the types of requests that the display miniport driver can make to the operating system.
DXGK_ACPI_CHANGE_DISPLAY_MODE - Requests to initiate a mode change to the new recommended active
video present network (VidPN).
DXGK_ACPI_POLL_DISPLAY_CHILDREN - Requests to poll the connectivity of the children of the display
adapter.
Note The previous requests are the values of the AcpiFlags parameter returned by the DxgkDdiNotifyAcpiEvent
function.
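A hedged sketch of handling the cycle-display hotkey in DxgkDdiNotifyAcpiEvent follows; the parameter list is as commonly documented but should be checked against dispmprt.h, and the bookkeeping the driver would do before recommending a new VidPN is omitted.

NTSTATUS APIENTRY DxgkDdiNotifyAcpiEvent(
    const PVOID     MiniportDeviceContext,
    DXGK_EVENT_TYPE EventType,
    ULONG           Event,
    PVOID           Argument,
    PULONG          AcpiFlags)
{
    UNREFERENCED_PARAMETER(MiniportDeviceContext);
    UNREFERENCED_PARAMETER(EventType);
    UNREFERENCED_PARAMETER(Argument);

    *AcpiFlags = 0;

    if (Event == ACPI_NOTIFY_CYCLE_DISPLAY_HOTKEY) {
        // Remember which output the user wants next so that the subsequent
        // DxgkDdiRecommendFunctionalVidPn call can return it. (Omitted.)

        // Ask the operating system to initiate the mode change.
        *AcpiFlags = DXGK_ACPI_CHANGE_DISPLAY_MODE;
    }

    return STATUS_SUCCESS;
}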

Related topics
Supporting Brightness Controls on Integrated Display Panels
Send comments about this topic to Microsoft
Marking Sources as Removable
4/26/2017 • 1 min to read • Edit Online

To prevent a display application from making a video present source the primary view, you should mark the source
as removable. To indicate which sources are removable, you can specify a DWORD Plug and Play (PnP) value in the
registry named RemovableSources.
Note You cannot mark source 0 in the DWORD bit-field value as removable.
The nth bit in the bit-field value specifies whether source n-1 is removable. For example, to mark source 1 as
removable, you can add the following line to a display miniport driver's INF file:

HKR,, RemovableSources, %REG_DWORD%, 2


...

For more information about installing display drivers, see Installation Requirements for Display Miniport and User-
Mode Display Drivers.
Send comments about this topic to Microsoft
Stereoscopic 3D
4/26/2017 • 2 min to read • Edit Online

Windows 8 provides a consistent API and device driver interface (DDI) platform for stereoscopic 3-D scenarios
such as gaming and video playback.

Minimum Windows Display Driver Model (WDDM) version: 1.2
Minimum Windows version: 8
Driver implementation—Full graphics: Optional
WHCK requirements and tests: Device.Graphics…ProcessingStereoscopicVideoContent; Device.Display.Monitor.Stereoscopic3DModes

Stereoscopic 3-D rendering is only enabled on systems that have all the components that are stereoscopic 3-D-
capable. These components include 3-D-capable display hardware, graphics hardware, peripherals, and software
applications. The stereo design in the graphics stack is such that the particular visualization or display technology
that is used is agnostic to the operating system. The display driver communicates directly to the graphics display
and has knowledge about the display capabilities through the standardized Extended Display Identification Data
(EDID) structure. The driver enumerates stereo capabilities only when it recognizes that such a display is connected
to the system.
To implement stereo capabilities in your display miniport and user-mode drivers, see the lists of new or updated
DDIs below.
The stereoscopic display setting is part of the Screen Resolution control panel.
The Enable Stereo setting is a checkbox with the following states:


Not available (either grayed out or invisible): On systems incapable of rendering on stereo displays.
Set to Enabled (checked): This is the default setting on systems capable of rendering on stereo displays and
implies Stereo-On-Demand. By default, the Desktop Window Manager (DWM) is mono mode. DWM switches
to stereo mode only when a stereo app is launched by the user (on-demand). Note that the DWM can be in
either mono or stereo mode when this checkbox is checked.
Set to Disabled (unchecked): DWM is in mono mode if the user has unchecked this setting. Stereo applications
present in mono mode in this case.

Stereoscopic 3-D kernel-mode support


These DDIs are updated for Windows 8 to support stereoscopic 3-D rendering on a VidPN.
D3D11DDIARG_CREATERESOURCE
D3DDDI_ALLOCATIONINFO
D3DKMDT_VIDPN_SOURCE_MODE_TYPE
D3DKMT_PRESENTFLAGS
DXGI_DDI_ARG_ROTATE_RESOURCE_IDENTITIES
DXGK_PRESENTFLAGS
DXGK_SETVIDPNSOURCEADDRESS_FLAGS
DXGKARG_OPENALLOCATION

Stereoscopic 3-D swapchain DDIs


These DDIs are new or updated for Windows 8 to support stereoscopic 3-D swapchains.
BltDXGI
Blt1DXGI
CreateResource(D3D10)
CreateResource(D3D11)
RotateResourceIdentitiesDXGI
D3DDDI_ALLOCATIONINFO
D3D10DDIARG_CREATERESOURCE
D3D11DDIARG_CREATERESOURCE
DXGI_DDI_ARG_ROTATE_RESOURCE_IDENTITIES
DXGI_DDI_PRESENT_FLAGS
DXGI_DDI_PRIMARY_DESC

Hardware certification requirements


System builders are encouraged to test their stereo driver packages by using the above settings to ensure correct
functionality.
Stereo 3-D functionality can be enabled only on Microsoft DirectX 10–capable hardware and higher. However,
since Microsoft Direct3D 11 APIs work on DirectX 9.x and 10.x hardware, all WDDM 1.2 drivers must support
Direct3D 11 and be tested thoroughly to ensure that Direct3D 11 APIs work on all Windows 8 hardware.
Although stereoscopic 3-D is an optional WDDM 1.2 feature, Direct3D 11 API support is required on all Windows 8
hardware. Therefore, WDDM 1.2 drivers (Full Graphics and Render devices) must support Direct3D 11 APIs by
adding support for cross-process sharing of texture arrays. This requirement is to ensure that stereo apps don’t
have failures in mono modes.
For more info on requirements that hardware devices must meet when they implement this feature, refer to the
relevant WHCK documentation on Device.Graphics…Processing Stereoscopic Video Content and
Device.Display.Monitor.Stereoscopic 3D Modes.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Supporting Output Protection Manager
4/26/2017 • 1 min to read • Edit Online

The Output Protection Manager (OPM) device driver interface (DDI) enables the copy protection of video signals
that are output by various connectors of the graphics adapter. To learn more about how Windows Vista protects
the content that graphics adapters output, download the Output Content Protection document at the Output
Content Protection and Windows Vista website.
OPM is the successor to the Certified Output Protection Protocol (COPP) that the Windows 2000 display driver
model provides. OPM supports all of COPP's features. For information about COPP's features, see Introduction to
COPP. OPM also supports new features.
The OPM DDI is semantically similar to the COPP DDI because OPM is essentially COPP 1.1 for the Windows Vista
display driver model. However, the OPM DDI is much simpler than the COPP DDI because the OPM DDI consists of
a set of functions while the COPP DDI is mapped through the DirectDraw and DirectX Video Acceleration (VA) DDI.
If a display miniport driver supports the passing of protected commands, information, and status between
applications and the driver, the Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys) can successfully open
the driver's OPM DDI.
The following topics describe the new features of OPM and how to support and use the OPM DDI:
OPM Terminology
OPM Features
Performing a Hardware Functionality Scan
Retrieving the OPM DDI
Using the OPM DDI
Handling Protection Levels with OPM
Handling the Loss of a Display Device
Retrieving Information About a Protected Output
Retrieving COPP-Compatible Information about a Protected Output
Configuring a Protected Output
Reporting Status of a Protected Output
Implementation Tips and Requirements for OPM
Send comments about this topic to Microsoft
OPM Terminology
4/26/2017 • 1 min to read • Edit Online

The following are the primary terms that are used with OPM:
Connector
The physical output connection between the graphics adapter and a display device.
Protection type
The type of protection that can be applied to a video signal that is passed through a graphics adapter's connector.
More than one type of protection can be applied to a single connector.
Protection level
The level of protection that is applied to a video signal that is passed through a graphics adapter's connector. The
level value is dependent on the protection type. Some protection types (for example, High-bandwidth Digital
Content Protection (HDCP)) have only two protection levels (for example, on and off).
Send comments about this topic to Microsoft
OPM Features
4/26/2017 • 1 min to read • Edit Online

OPM supports all of Certified Output Protection Protocol's (COPP) features. The following describes some new
OPM features and how some OPM features compare to COPP features:
OPM requires that applications sign requests for information from the video output while COPP does not
require that applications sign requests for information from the graphics driver.
Note A COPP graphics driver is equivalent to an OPM video output.

COPP applications request information from a graphics driver by causing a DXVA_COPPStatusInput structure
(https://msdn.microsoft.com/library/windows/hardware/ff563899) to be passed to the driver.

OPM supports High-bandwidth Digital Content Protection (HDCP) repeaters. For more information about
HDCP repeaters, see the HDCP Specification Revision 1.1.
Applications can more easily support HDCP in OPM. Applications are not required to parse HDCP System
Renewability Messages (SRMs) and to determine if a monitor was revoked. For more information about
HDCP SRMs, see the HDCP Specification Revision 1.1.
OPM uses X.509 certificates and COPP uses proprietary XML certificates. The COPP certificate format is
based on the signature format in the XML-Signature Syntax and Processing specification. For information
about X.509 certificates, see the X.509 Certificate Profile.
COPP applications get the COPP IAMCertifiedOutputProtection interface by creating version 7 or 9 of the
Video Mixing Renderer (VMR) and then passing IID_IAMCertifiedOutputProtection to the VMR filter's
implementation of IUnknown::QueryInterface. OPM applications get the IOPMVideoOutput interface by
passing an HMONITOR or an IDirect3DDevice9 object to the OPMGetVideoOutputsFromHMONITOR or
OPMGetVideoOutputsFromIDirect3DDevice9Object function respectively. For more information about
these functions and interfaces, see the Microsoft Windows SDK documentation.
OPM supports clone mode in all cases while COPP supports clone mode only in one specific case.
OPM's redistribution control flag has slightly different semantics than COPP's redistribution control flag
(COPP_CGMSA_RedistributionControlRequired).
Send comments about this topic to Microsoft
Performing a Hardware Functionality Scan
4/26/2017 • 1 min to read • Edit Online

A display miniport driver's Hardware Functionality Scan (HFS) ensures that the miniport driver communicates with
the required hardware. For more information about HFS, download the Output Content Protection document at the
Output Content Protection and Windows Vista website.
A display miniport driver must start performing an HFS whenever the Microsoft DirectX graphics kernel subsystem
(Dxgkrnl.sys) calls the following driver functions:
DxgkDdiStartDevice
DxgkDdiSetPowerState with the graphics adapter's power state set to D0.
The HFS can be asynchronous and is not required to complete before DxgkDdiStartDevice or
DxgkDdiSetPowerState returns. However, no OPM DDI function can return until the HFS completes.
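A driver can satisfy this requirement by queuing the scan from DxgkDdiStartDevice (or DxgkDdiSetPowerState for the D0 transition) and by making every OPM DDI entry point wait for it to finish. The following is a minimal sketch under those assumptions; MY_DEVICE_CONTEXT and HwVerifyHardwareIdentity are hypothetical vendor-defined names, and only the kernel event routines are real WDK functions.

#include <ntddk.h>

typedef struct _MY_DEVICE_CONTEXT {
    KEVENT HfsCompleteEvent;    // initialized with KeInitializeEvent in DxgkDdiStartDevice
    // ... other per-adapter state ...
} MY_DEVICE_CONTEXT;

VOID HwVerifyHardwareIdentity(MY_DEVICE_CONTEXT *Ctx);   // hypothetical: performs the actual scan

// Work routine queued from DxgkDdiStartDevice or DxgkDdiSetPowerState (D0).
VOID MyHfsWorkRoutine(MY_DEVICE_CONTEXT *Ctx)
{
    HwVerifyHardwareIdentity(Ctx);
    KeSetEvent(&Ctx->HfsCompleteEvent, IO_NO_INCREMENT, FALSE);
}

// Called at the top of every DxgkDdiOPM* function so that no OPM DDI call
// returns before the HFS has completed.
VOID MyWaitForHfs(MY_DEVICE_CONTEXT *Ctx)
{
    KeWaitForSingleObject(&Ctx->HfsCompleteEvent, Executive, KernelMode, FALSE, NULL);
}
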
Retrieving the OPM DDI
4/26/2017 • 1 min to read • Edit Online

The following sequence shows how the Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys) retrieves the
display miniport driver's OPM DDI:
1. The DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiAddDevice function to
create a context block for a graphics adapter and to return a handle to that graphics adapter.
2. The DirectX graphics kernel subsystem initializes a QUERY_INTERFACE structure with the values in the
following table.

MEMBER NAME             MEMBER TYPE    VALUE
InterfaceType           CONST PGUID    A pointer to GUID_DEVINTERFACE_OPM (BF4672DE-6B4E-4BE4-A325-68A91EA49C09)
Size                    USHORT         sizeof(DXGK_OPM_INTERFACE)
Version                 USHORT         DXGK_OPM_INTERFACE_VERSION_1
Interface               PINTERFACE     A pointer to a DXGK_OPM_INTERFACE structure
InterfaceSpecificData   PVOID          NULL

3. The DirectX graphics kernel subsystem passes the initialized QUERY_INTERFACE in a call to the display
miniport driver's DxgkDdiQueryInterface function.
4. If the display miniport driver does not support the OPM interface, DxgkDdiQueryInterface must return
STATUS_NOT_SUPPORTED.
If the display miniport driver supports OPM, DxgkDdiQueryInterface initializes the
DXGK_OPM_INTERFACE structure that was received in the Interface member of QUERY_INTERFACE with
the values in the following table.
Member name, type, and value:
Size
Type USHORT
sizeof(DXGK_OPM_INTERFACE)
Version
Type USHORT
DXGK_OPM_INTERFACE_VERSION_1
InterfaceReference
Type PINTERFACE_REFERENCE
A pointer to the display miniport driver's InterfaceReference routine (For information about
InterfaceReference, see the Remarks section of the INTERFACE structure.)
InterfaceDereference
Type PINTERFACE_DEREFERENCE
A pointer to the display miniport driver's InterfaceDereference routine (For information about
InterfaceDereference, see the Remarks section of the INTERFACE structure.)
DxgkDdiOPMGetCertificateSize
Type DXGKDDI_OPM_GET_CERTIFICATE_SIZE
A pointer to the display miniport driver's DxgkDdiOPMGetCertificateSize function
DxgkDdiOPMGetCertificate
Type DXGKDDI_OPM_GET_CERTIFICATE
A pointer to the display miniport driver's DxgkDdiOPMGetCertificate function
DxgkDdiOPMCreateProtectedOutput
Type DXGKDDI_OPM_CREATE_PROTECTED_OUTPUT
A pointer to the display miniport driver's DxgkDdiOPMCreateProtectedOutput function
DxgkDdiOPMGetRandomNumber
Type DXGKDDI_OPM_GET_RANDOM_NUMBER
A pointer to the display miniport driver's DxgkDdiOPMGetRandomNumber function
DxgkDdiOPMSetSigningKeyAndSequenceNumbers
Type DXGKDDI_OPM_SET_SIGNING_KEY_AND_SEQUENCE_NUMBERS
A pointer to the display miniport driver's DxgkDdiOPMSetSigningKeyAndSequenceNumbers function
DxgkDdiOPMGetInformation
Type DXGKDDI_OPM_GET_INFORMATION
A pointer to the display miniport driver's DxgkDdiOPMGetInformation function
DxgkDdiOPMGetCOPPCompatibleInformation
Type DXGKDDI_OPM_GET_COPP_COMPATIBLE_INFORMATION
A pointer to the display miniport driver's DxgkDdiOPMGetCOPPCompatibleInformation function
DxgkDdiOPMConfigureProtectedOutput
Type DXGKDDI_OPM_CONFIGURE_PROTECTED_OUTPUT
A pointer to the display miniport driver's DxgkDdiOPMConfigureProtectedOutput function
DxgkDdiOPMDestroyProtectedOutput
Type DXGKDDI_OPM_DESTROY_PROTECTED_OUTPUT
A pointer to the display miniport driver's DxgkDdiOPMDestroyProtectedOutput function
5. When the display miniport driver is finished using the OPM interface, the driver calls its
InterfaceDereference routine. The driver should call InterfaceDereference before its
DxgkDdiRemoveDevice function is called.
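The following is a minimal sketch of how a display miniport driver's DxgkDdiQueryInterface function might expose the OPM DDI described above. The MyOpm* routines are hypothetical vendor callbacks assumed to be implemented elsewhere in the driver; the structure, member, and constant names are the ones listed in the preceding tables.

NTSTATUS APIENTRY DxgkDdiQueryInterface(
    CONST PVOID      MiniportDeviceContext,
    PQUERY_INTERFACE QueryInterface)
{
    if (IsEqualGUID(QueryInterface->InterfaceType, &GUID_DEVINTERFACE_OPM) &&
        QueryInterface->Version == DXGK_OPM_INTERFACE_VERSION_1 &&
        QueryInterface->Size >= sizeof(DXGK_OPM_INTERFACE))
    {
        DXGK_OPM_INTERFACE *Opm = (DXGK_OPM_INTERFACE *)QueryInterface->Interface;

        Opm->Size                                      = sizeof(DXGK_OPM_INTERFACE);
        Opm->Version                                   = DXGK_OPM_INTERFACE_VERSION_1;
        Opm->InterfaceReference                        = MyOpmInterfaceReference;     // hypothetical
        Opm->InterfaceDereference                      = MyOpmInterfaceDereference;   // hypothetical
        Opm->DxgkDdiOPMGetCertificateSize              = MyOpmGetCertificateSize;
        Opm->DxgkDdiOPMGetCertificate                  = MyOpmGetCertificate;
        Opm->DxgkDdiOPMCreateProtectedOutput           = MyOpmCreateProtectedOutput;
        Opm->DxgkDdiOPMGetRandomNumber                 = MyOpmGetRandomNumber;
        Opm->DxgkDdiOPMSetSigningKeyAndSequenceNumbers = MyOpmSetSigningKeyAndSequenceNumbers;
        Opm->DxgkDdiOPMGetInformation                  = MyOpmGetInformation;
        Opm->DxgkDdiOPMGetCOPPCompatibleInformation    = MyOpmGetCOPPCompatibleInformation;
        Opm->DxgkDdiOPMConfigureProtectedOutput        = MyOpmConfigureProtectedOutput;
        Opm->DxgkDdiOPMDestroyProtectedOutput          = MyOpmDestroyProtectedOutput;

        return STATUS_SUCCESS;
    }

    return STATUS_NOT_SUPPORTED;   // OPM, or this interface version, is not supported
}
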
Using the OPM DDI
4/26/2017 • 4 min to read • Edit Online

The Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys) uses the OPM DDI to create OPM protected outputs,
destroy OPM protected outputs, get certificates, configure protected outputs, get information about protected
outputs, and get information about the graphics adapter. The DirectX graphics kernel subsystem gets pointers to
the OPM DDI functions when it calls the display miniport driver's DxgkDdiQueryInterface function to query for
the interface that is identified by GUID_DEVINTERFACE_OPM and DXGK_OPM_INTERFACE_VERSION_1. The
following sequence describes how the OPM DDI is typically used to create, manipulate, and destroy OPM protected
outputs:
1. The DirectX graphics kernel subsystem calls the DxgkDdiOPMCreateProtectedOutput function to create
an OPM protected output. An OPM protected output always corresponds to exactly one physical video
output. DxgkDdiOPMCreateProtectedOutput returns a handle to the newly created output.
2. The DirectX graphics kernel subsystem calls the DxgkDdiOPMGetCertificateSize and
DxgkDdiOPMGetCertificate functions to get the display miniport driver's OPM certificate or COPP
certificate and its size. Note DxgkDdiOPMCreateProtectedOutput, DxgkDdiOPMGetCertificateSize, and
DxgkDdiOPMGetCertificate are the only OPM DDI functions that the DirectX graphics kernel subsystem does
not pass a protected output handle to.
3. The DirectX graphics kernel subsystem calls the DxgkDdiOPMGetRandomNumber function to get the
protected output's random number.
4. The DirectX graphics kernel subsystem passes a 256-byte buffer in a call to the
DxgkDdiOPMSetSigningKeyAndSequenceNumbers function. The buffer contains data that is encrypted
with one of the display miniport driver's public keys. For more information about public keys, download the
Output Content Protection document from the Output Content Protection and Windows Vista website. The
public key that is used depends on the semantics of the protected output. The public key in the display
miniport driver's OPM certificate is used if the protected output has OPM semantics. The public key in the
display miniport driver's COPP certificate is used if the protected output has COPP semantics. The
encryption scheme that is used to encrypt the data also depends on the protected output's semantics. The
data is encrypted with the standard RSA algorithm if the protected output has COPP semantics and with the
RSAES-OAEP encryption scheme if the protected output has OPM semantics. For information about RSA,
AES, and RSAES-OAEP, see the RSA Laboratories website. The display miniport driver uses the appropriate
private key and decryption method to decrypt the data. A random number, two random sequence numbers,
and a 128-bit AES key are in the decrypted data. The display miniport driver ensures that the random number
matches the random number that the driver returned when its DxgkDdiOPMGetRandomNumber
function was called. The driver then stores the two sequence numbers and the 128-bit AES key.
5. The DirectX graphics kernel subsystem can now call the DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation function to get information from a protected output. The
DirectX graphics kernel subsystem can also call DxgkDdiOPMConfigureProtectedOutput to configure a
protected output. DxgkDdiOPMGetInformation can be called only if the output has OPM semantics and
DxgkDdiOPMGetCOPPCompatibleInformation can be called only if the output has COPP semantics.
Typically, the DirectX graphics kernel subsystem calls DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation to get information about the output and then calls
DxgkDdiOPMConfigureProtectedOutput one or more times to configure the output. Then, the DirectX
graphics kernel subsystem calls DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation again. The DirectX graphics kernel subsystem can get the
following types of information by calling DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation:
The output's connector type.
The types of content protection that the output supports. Outputs can currently support Analog Copy
Protection (ACP), Content Generation Management System Analog (CGMS-A), High-bandwidth Digital
Content Protection (HDCP), and DisplayPort Content Protection (DPCP). For more information about ACP,
see the Rovi (formerly Macrovision) website. For more information about HDCP, see the HDCP
Specification Revision 1.1. For more information about DisplayPort, see the DisplayPort Web article.
The output's current virtual protection level for a particular protection type.
The physical output's actual protection level for a particular protection type.
The version of the HDCP System Renewability Message (SRM) that the output currently uses. For more
information about HDCP SRM, see the HDCP Specification Revision 1.1. Only
DxgkDdiOPMGetInformation can get this information.
The connected HDCP device's key-selection vector (KSV) and whether the HDCP device is a repeater. Only
DxgkDdiOPMGetCOPPCompatibleInformation can get this information. For more information about
HDCP repeaters and KSVs, see the HDCP Specification Revision 1.1.
The type of expansion bus that the graphics adapter uses. PCI and AGP are examples of expansion buses.
The format of the images that are sent from the physical connector that is associated with the protected
output to a monitor.
The CGMS-A and ACP signaling standards that the protected output supports. Only
DxgkDdiOPMGetCOPPCompatibleInformation can get this information.
The identifier of the output.
The electrical characteristics of a Digital Video Interface (DVI) output connector.
The DirectX graphics kernel subsystem can change the following settings by calling
DxgkDdiOPMConfigureProtectedOutput:
The current protection level of one of the output's protection types. For example,
DxgkDdiOPMConfigureProtectedOutput can enable or disable HDCP and can turn off ACP protection or
change the current ACP protection level.
The current HDCP SRM that the protected output uses.
The current signaling standard that the protected output uses. This change can be done only if the output
has COPP semantics.
6. The DirectX graphics kernel subsystem calls DxgkDdiOPMDestroyProtectedOutput when it finishes
using the protected output object.
Handling Protection Levels with OPM
4/26/2017 • 1 min to read • Edit Online

Each output protection type (for example, Analog Copy Protection (ACP), Content Generation Management System
Analog (CGMS-A), High-bandwidth Digital Content Protection (HDCP), and DisplayPort Content Protection (DPCP))
has protection levels associated with it. For more information about ACP, see the Rovi (formerly Macrovision)
website. For more information about HDCP, see the HDCP Specification Revision 1.1. For more information about
DisplayPort, see the DisplayPort Web article.
A graphics adapter is not required to support any output protection types. However, a graphics adapter must
accurately report the protection types that it supports for each of the graphics adapter's outputs and the currently
set protection level for each output.
ACP and CGMS-A protect analog TV signals. Currently, OPM can use ACP and CGMS-A to protect signals from
composite outputs, S-Video outputs, or component outputs. For information about the various ACP and CGMS-A
protection levels, see the DXGKMDT_OPM_ACP_PROTECTION_LEVEL and DXGKMDT_OPM_CGMSA
enumerations.
HDCP protects digital video signals. Currently, OPM can use HDCP to protect data from Digital Video Interface (DVI)
and High-Definition Multimedia Interface (HDMI) connector outputs. For information about the HDCP protection
levels, see the DXGKMDT_OPM_HDCP_PROTECTION_LEVEL enumeration.
DPCP protects digital video signals from DisplayPort output connectors.
The following sections describe the precedence that is placed on protection levels if more than one protected
output is created for a particular physical output connector and the algorithm for determining a physical output
connector's protection level:
Assigning Precedence to Protection Levels
Determining the Protection Level for a Physical Output
Assigning Precedence to Protection Levels
4/26/2017 • 2 min to read • Edit Online

A precedence value is assigned to each protection level for each protection type. This way, a physical output can
determine which protection level to use if two or more protected outputs are associated with the physical output
and each protected output has a different protection level.
The Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys) can make more than one call to a display miniport
driver's DxgkDdiOPMCreateProtectedOutput function to create more than one protected output for a particular
physical output. Furthermore, each of these protected outputs can have a different protection level for the same
output protection type.
For example, suppose that a graphics adapter has one composite output that has the CGMS-A protection type, and
that protected outputs A and B are both associated with that composite output. Next, suppose that protected output
A's CGMS-A protection level is set to DXGKMDT_OPM_CGMSA_COPY_NO_MORE while protected output B's
CGMS-A protection level is set to DXGKMDT_OPM_CGMSA_COPY_ONE_GENERATION. In this situation, the
physical output cannot use both protection levels. Therefore, because the physical output can output only one
CGMS-A protection level at a time, the physical output must use the CGMS-A protection level with the higher
precedence.
The following sections show which protection level a physical output should use (from highest to lowest
precedence) when different protected outputs instruct the physical output to use different protection levels. Note
that these tables apply to protected outputs with COPP or OPM semantics.
ACP Protection Level Precedence
When different protected outputs instruct the physical output to use different ACP protection levels, the physical
output should use the protection level with the higher precedence as shown in the following table. Note that this
table applies to protected outputs with COPP semantics.

ACP PROTECTION LEVEL VALUE           PRECEDENCE
DXGKMDT_OPM_ACP_OFF (0)              Lowest precedence (0)
DXGKMDT_OPM_ACP_LEVEL_ONE (1)        1
DXGKMDT_OPM_ACP_LEVEL_THREE (3)      2
DXGKMDT_OPM_ACP_LEVEL_TWO (2)        Highest precedence (3)

CGMS-A Protection Level Precedence
When different protected outputs instruct the physical output to use different CGMS-A protection levels, the physical
output should use the protection level with the higher precedence as shown in the following table. Note that this
table applies to protected outputs with COPP semantics.

CGMS-A PROTECTION LEVEL VALUE                   PRECEDENCE
DXGKMDT_OPM_CGMSA_OFF (0)                       Lowest precedence (0)
DXGKMDT_OPM_CGMSA_COPY_FREELY (1)               1
DXGKMDT_OPM_CGMSA_COPY_ONE_GENERATION (3)       2
DXGKMDT_OPM_CGMSA_COPY_NO_MORE (2)              3
DXGKMDT_OPM_CGMSA_COPY_NEVER (4)                Highest precedence (4)

Note The redistribution control flag (DXGKMDT_OPM_REDISTRIBUTION_CONTROL_REQUIRED) does not affect the
CGMS-A precedence value. For example, (DXGKMDT_OPM_CGMSA_COPY_ONE_GENERATION |
DXGKMDT_OPM_REDISTRIBUTION_CONTROL_REQUIRED) has the same precedence value as
DXGKMDT_OPM_CGMSA_COPY_ONE_GENERATION.
HDCP Protection Level Precedence
When different protected outputs instruct the physical output to use different HDCP protection levels, the physical
output should use the protection level with the higher precedence as shown in the following table. Note that this
table applies to protected outputs with COPP or OPM semantics.

HDCP PROTECTION LEVEL VALUE          PRECEDENCE
DXGKMDT_OPM_HDCP_OFF (0)             Lowest precedence (0)
DXGKMDT_OPM_HDCP_ON (1)              Highest precedence (1)

DPCP Protection Level Precedence
When different protected outputs instruct the physical output to use different DPCP protection levels, the physical
output should use the protection level with the higher precedence as shown in the following table. Note that this
table applies to protected outputs with OPM semantics.

DPCP PROTECTION LEVEL VALUE          PRECEDENCE
DXGKMDT_OPM_DPCP_OFF (0)             Lowest precedence (0)
DXGKMDT_OPM_DPCP_ON (1)              Highest precedence (1)
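One way a driver can put these tables to use is to map each protection-level value to its precedence so that levels requested by different protected outputs can be compared numerically. The following is a minimal sketch for the ACP table; the other protection types follow the same pattern. Only the DXGKMDT_OPM_ACP_* enumerators come from the OPM headers; the helper name is hypothetical.

ULONG MyAcpPrecedence(DXGKMDT_OPM_ACP_PROTECTION_LEVEL Level)
{
    switch (Level) {
    case DXGKMDT_OPM_ACP_OFF:         return 0;   // lowest precedence
    case DXGKMDT_OPM_ACP_LEVEL_ONE:   return 1;
    case DXGKMDT_OPM_ACP_LEVEL_THREE: return 2;
    case DXGKMDT_OPM_ACP_LEVEL_TWO:   return 3;   // highest precedence
    default:                          return 0;   // treat unknown values as OFF
    }
}
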



Determining the Protection Level for a Physical Output
4/26/2017 • 1 min to read • Edit Online

You should use the algorithms in the following sections to determine the protection level for a physical video
output connector. These algorithms are represented in pseudocode.
Algorithm for Protection Level
You should use the following algorithm to determine the protection level value for a physical video output
connector:
1. For each protection type (ACP, CGMS-A, HDCP, and DPCP) that the physical output connector supports,
perform the following steps:
a. Set the proposed protection level to no output protection. For example, for ACP, a driver should set the
protection level to DXGKMDT_OPM_ACP_OFF; for CGMS-A, a driver should set the protection level to
DXGKMDT_OPM_CGMSA_OFF; for HDCP, a driver should set the protection level to
DXGKMDT_OPM_HDCP_OFF; and for DPCP, a driver should set the protection level to
DXGKMDT_OPM_DPCP_OFF.
b. For each protected output that is associated with the physical output connector, perform the
following steps:
a. Retrieve the current protected output's protection level for the current protection type.
b. If the current protection type is CGMS-A, remove the
DXGKMDT_OPM_REDISTRIBUTION_CONTROL_REQUIRED flag if the flag is set.
c. End if
d. If the current protected output's protection level has a higher precedence than the proposed
protection level, set the proposed protection level to the current protected output's protection
level.
e. End if
c. End for
d. Set the physical output's protection level to the proposed protection level.
2. End for
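The following is a minimal sketch of the protection-level algorithm above for the CGMS-A protection type. MY_PROTECTED_OUTPUT, its CgmsaLevel field, and MyCgmsaPrecedence are assumptions about how a driver might track the protected outputs of a connector; only the DXGKMDT_OPM_* values come from the OPM headers.

ULONG MyResolveCgmsaLevel(CONST MY_PROTECTED_OUTPUT *Outputs, ULONG Count)
{
    ULONG Proposed = DXGKMDT_OPM_CGMSA_OFF;   // step a: start with no output protection
    ULONG i;

    for (i = 0; i < Count; i++) {             // step b: every protected output on this connector
        ULONG Level = Outputs[i].CgmsaLevel;

        // The redistribution control flag does not affect precedence, so remove it.
        Level &= ~DXGKMDT_OPM_REDISTRIBUTION_CONTROL_REQUIRED;

        // Keep the level with the higher precedence.
        if (MyCgmsaPrecedence(Level) > MyCgmsaPrecedence(Proposed)) {
            Proposed = Level;
        }
    }

    return Proposed;   // step d: program the physical output with this level
}
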
Algorithm for Redistribution Control
You should use the following algorithm to determine if a physical output connector must enable redistribution
control:
1. For each protected output that is associated with the physical output connector, perform the following
steps:
a. Retrieve the information on whether the current protected output's redistribution control flag is set.
b. If the DXGKMDT_OPM_REDISTRIBUTION_CONTROL_REQUIRED flag is set, perform the following
steps:
a. Enable redistribution control.
b. Stop executing the algorithm.
c. End if
2. End for
Handling the Loss of a Display Device
4/26/2017 • 1 min to read • Edit Online

The following scenarios initiate a call to the display miniport driver's DxgkDdiOPMDestroyProtectedOutput
function while content protection on a graphics adapter's output connector might be enabled:
Changing the display mode
Attaching or detaching a monitor from the Windows desktop
Entering a full-screen Command Prompt window
Starting any DirectDraw or Direct3D exclusive-mode application
Performing Fast User Switching
Locking the workstation or pressing CTRL+ALT+DELETE
Attaching to the workstation by using Remote Desktop Connection
Entering a power-saving mode--for example, suspend or hibernate
Terminating the application unexpectedly--for example, through a page fault
Retrieving Information About a Protected Output
4/26/2017 • 3 min to read • Edit Online

The display miniport driver can receive requests to retrieve information about the protected output that is
associated with a graphics adapter's physical output connector. The display miniport driver's
DxgkDdiOPMGetInformation function is passed a pointer to a DXGKMDT_OPM_GET_INFO_PARAMETERS
structure in the Parameters parameter that contains the information request. DxgkDdiOPMGetInformation writes
the required information to the DXGKMDT_OPM_REQUESTED_INFORMATION structure that the
RequestedInformation parameter points to. The guidInformation and abParameters members of
DXGKMDT_OPM_GET_INFO_PARAMETERS specify the information request. Depending on the information request,
the display miniport driver should populate the members of the DXGKMDT_OPM_STANDARD_INFORMATION,
DXGKMDT_OPM_OUTPUT_ID, or DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT structure with the required
information and point the abRequestedInformation member of DXGKMDT_OPM_REQUESTED_INFORMATION
to that structure. After the driver specifies the cbRequestedInformationSize (for example, sizeof
(DXGKMDT_OPM_STANDARD_INFORMATION)) and abRequestedInformation members of
DXGKMDT_OPM_REQUESTED_INFORMATION, the driver must calculate the One-key Cipher Block Chaining (CBC)-
mode message authentication code (OMAC) for the data in DXGKMDT_OPM_REQUESTED_INFORMATION and
must set this OMAC in the omac member of DXGKMDT_OPM_REQUESTED_INFORMATION. For more information
about calculating OMAC, see the OMAC-1 algorithm.
Note Before DxgkDdiOPMGetInformation returns, the display miniport driver must verify that the OMAC that is
specified in the omac member of DXGKMDT_OPM_GET_INFO_PARAMETERS is correct. The driver must also
verify that the sequence number that is specified in the ulSequenceNumber member of
DXGKMDT_OPM_GET_INFO_PARAMETERS matches the sequence number that the driver currently has stored. The
driver must then increment the stored sequence number.
Note The driver must return a 128-bit cryptographically secure random number in the rnRandomNumber
member of DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_OUTPUT_ID, or
DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT. The random number was generated by the sending application and
was provided in the rnRandomNumber member of DXGKMDT_OPM_GET_INFO_PARAMETERS.
The driver returns the following information for the indicated request:
For DXGKMDT_OPM_GET_SUPPORTED_PROTECTION_TYPES set in the guidInformation member and
undefined in the abParameters member of the DXGKMDT_OPM_GET_INFO_PARAMETERS structure, the
driver indicates the available types of protection mechanisms. To indicate the available protection types, the
driver returns a valid bitwise OR combination of values from the DXGKMDT_OPM_PROTECTION_TYPE
enumeration in the ulInformation member of DXGKMDT_OPM_STANDARD_INFORMATION. The
DXGKMDT_OPM_PROTECTION_TYPE_HDCP and DXGKMDT_OPM_PROTECTION_TYPE_DPCP values are
valid.
For DXGKMDT_OPM_GET_CONNECTOR_TYPE set in guidInformation and undefined in abParameters,
the driver indicates the connector type. To indicate the connector type, the driver returns a valid bitwise OR
combination of values from the D3DKMDT_VIDEO_OUTPUT_TECHNOLOGY enumeration in the
ulInformation member of DXGKMDT_OPM_STANDARD_INFORMATION.
For DXGKMDT_OPM_GET_VIRTUAL_PROTECTION_LEVEL or
DXGKMDT_OPM_GET_ACTUAL_PROTECTION_LEVEL set in guidInformation and the protection type set in
abParameters, the driver returns a protection-level value in the ulInformation member of
DXGKMDT_OPM_STANDARD_INFORMATION. If the protection type is
DXGKMDT_OPM_PROTECTION_TYPE_HDCP, the protection-level value is from the
DXGKMDT_OPM_HDCP_PROTECTION_LEVEL enumeration. If the protection type is
DXGKMDT_OPM_PROTECTION_TYPE_DPCP, the protection-level value is from the
DXGKMDT_OPM_DPCP_PROTECTION_LEVEL enumeration.
The DXGKMDT_OPM_GET_VIRTUAL_PROTECTION_LEVEL request returns the currently set protection level
for the protected output. The DXGKMDT_OPM_GET_ACTUAL_PROTECTION_LEVEL request returns the
currently set protection level for the physical connector that is associated with the protected output.
For DXGKMDT_OPM_GET_ADAPTER_BUS_TYPE set in guidInformation and undefined in abParameters,
the driver identifies the type and implementation of the bus that connects a graphics adapter to a mother
board chipset's north bridge. To identify the type and implementation of the bus, the driver returns a valid
bitwise OR combination of values from the DXGKMDT_OPM_BUS_TYPE_AND_IMPLEMENTATION
enumeration in the ulInformation member of DXGKMDT_OPM_STANDARD_INFORMATION.
For DXGKMDT_OPM_GET_CURRENT_HDCP_SRM_VERSION set in guidInformation and undefined in
abParameters, the driver returns a value in the ulInformation member of
DXGKMDT_OPM_STANDARD_INFORMATION that identifies the version number of the current High-
bandwidth Digital Content Protection (HDCP) System Renewability Message (SRM) for the protected output.
The least significant bits (bits 0 through 15) contain the SRM's version number in little-endian format. For
more information about the SRM version number, see the HDCP Specification Revision 1.1.
For DXGKMDT_OPM_GET_ACTUAL_OUTPUT_FORMAT set in guidInformation and undefined in
abParameters, the driver returns information in the members of
DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT that describe how the signal that goes through the physical
connector that is associated with the protected output is formatted.
For DXGKMDT_OPM_GET_OUTPUT_ID set in guidInformation and undefined in abParameters, the driver
returns information in the members of DXGKMDT_OPM_OUTPUT_ID that identifies the output connector.
For DXGKMDT_OPM_GET_DVI_CHARACTERISTICS set in the guidInformation member and undefined in
the abParameters member of the DXGKMDT_OPM_GET_INFO_PARAMETERS structure, the driver indicates
electrical characteristics of a Digital Video Interface (DVI) output connector. To indicate the DVI electrical
characteristics, the driver returns one of the values from the DXGKDT_OPM_DVI_CHARACTERISTICS
enumeration in the ulInformation member of DXGKMDT_OPM_STANDARD_INFORMATION.
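The following is a minimal sketch of one request from the list above: returning the connector type from DxgkDdiOPMGetInformation. MY_PROTECTED_OUTPUT, MyConnectorTypeOf, and MyComputeOmac are hypothetical vendor helpers, and the OMAC and sequence-number verification of the incoming request described earlier is omitted for brevity; the structure and member names are the ones documented above.

NTSTATUS MyHandleGetConnectorType(
    MY_PROTECTED_OUTPUT                   *Output,
    CONST DXGKMDT_OPM_GET_INFO_PARAMETERS *Parameters,
    DXGKMDT_OPM_REQUESTED_INFORMATION     *RequestedInformation)
{
    DXGKMDT_OPM_STANDARD_INFORMATION Info;

    RtlZeroMemory(&Info, sizeof(Info));

    // Echo back the application's random number and report the connector type.
    Info.rnRandomNumber = Parameters->rnRandomNumber;
    Info.ulStatusFlags  = DXGKMDT_OPM_STATUS_NORMAL;
    Info.ulInformation  = MyConnectorTypeOf(Output);   // a D3DKMDT_VIDEO_OUTPUT_TECHNOLOGY value

    RequestedInformation->cbRequestedInformationSize = sizeof(Info);
    RtlCopyMemory(RequestedInformation->abRequestedInformation, &Info, sizeof(Info));

    // Sign the reply with the AES key delivered through
    // DxgkDdiOPMSetSigningKeyAndSequenceNumbers.
    MyComputeOmac(Output, RequestedInformation);

    return STATUS_SUCCESS;
}
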
Retrieving COPP-Compatible Information about a Protected Output
4/26/2017 • 3 min to read • Edit Online

The display miniport driver can receive requests to retrieve COPP-compatible information about the protected
output that is associated with a graphics adapter's physical output connector. The display miniport driver's
DxgkDdiOPMGetCOPPCompatibleInformation function is passed a pointer to a
DXGKMDT_OPM_COPP_COMPATIBLE_GET_INFO_PARAMETERS structure in the Parameters parameter that
contains the information request. DxgkDdiOPMGetCOPPCompatibleInformation writes the required information to
the DXGKMDT_OPM_REQUESTED_INFORMATION structure that the RequestedInformation parameter points to.
The guidInformation and abParameters members of
DXGKMDT_OPM_COPP_COMPATIBLE_GET_INFO_PARAMETERS specify the information request. Depending on the
information request, the display miniport driver should populate the members of the
DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT,
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION structure with the required information and
point the abRequestedInformation member of DXGKMDT_OPM_REQUESTED_INFORMATION to that structure.
After the driver specifies the cbRequestedInformationSize (for example, sizeof
(DXGKMDT_OPM_STANDARD_INFORMATION)) and abRequestedInformation members of
DXGKMDT_OPM_REQUESTED_INFORMATION, the driver must calculate the One-key Cipher Block Chaining (CBC)-
mode message authentication code (OMAC) for the data in DXGKMDT_OPM_REQUESTED_INFORMATION and
must set this OMAC in the omac member of DXGKMDT_OPM_REQUESTED_INFORMATION. For more information
about calculating OMAC, see the OMAC-1 algorithm.
Note Before DxgkDdiOPMGetCOPPCompatibleInformation returns, the display miniport driver must verify
that the sequence number that is specified in the ulSequenceNumber member of
DXGKMDT_OPM_COPP_COMPATIBLE_GET_INFO_PARAMETERS matches the sequence number that the driver
currently has stored. The driver must then increment the stored sequence number.
Note The driver must return a 128-bit cryptographically secure random number in the rnRandomNumber
member of DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT,
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION. The random number was generated by the sending
application and was provided in the rnRandomNumber member of
DXGKMDT_OPM_COPP_COMPATIBLE_GET_INFO_PARAMETERS.
The driver returns the following information for the indicated request:
For DXGKMDT_OPM_GET_SUPPORTED_PROTECTION_TYPES set in the guidInformation member and
undefined in the abParameters member of
DXGKMDT_OPM_COPP_COMPATIBLE_GET_INFO_PARAMETERS, the driver indicates the available types of
protection mechanisms. To indicate the available protection types, the driver returns a valid bitwise OR
combination of values from the DXGKMDT_OPM_PROTECTION_TYPE enumeration in the ulInformation
member of DXGKMDT_OPM_STANDARD_INFORMATION. The DXGKMDT_OPM_PROTECTION_TYPE_ACP,
DXGKMDT_OPM_PROTECTION_TYPE_CGMSA, and
DXGKMDT_OPM_PROTECTION_TYPE_COPP_COMPATIBLE_HDCP values are valid.
For DXGKMDT_OPM_GET_CONNECTOR_TYPE set in guidInformation and undefined in abParameters,
the driver indicates the connector type. To indicate the connector type, the driver returns a valid bitwise OR
combination of values from the D3DKMDT_VIDEO_OUTPUT_TECHNOLOGY enumeration in the
ulInformation member of DXGKMDT_OPM_STANDARD_INFORMATION.
For DXGKMDT_OPM_GET_VIRTUAL_PROTECTION_LEVEL or
DXGKMDT_OPM_GET_ACTUAL_PROTECTION_LEVEL set in guidInformation and the protection type set in
abParameters, the driver returns a protection-level value in the ulInformation member of
DXGKMDT_OPM_STANDARD_INFORMATION. If the protection type is
DXGKMDT_OPM_PROTECTION_TYPE_ACP, the protection-level value is from the
DXGKMDT_OPM_ACP_PROTECTION_LEVEL enumeration. If the protection type is
DXGKMDT_OPM_PROTECTION_TYPE_CGMSA, the protection-level value is from the
DXGKMDT_OPM_CGMSA enumeration. If the protection type is
DXGKMDT_OPM_PROTECTION_TYPE_COPP_COMPATIBLE_HDCP, the protection-level value is from the
DXGKMDT_OPM_HDCP_PROTECTION_LEVEL enumeration.
The DXGKMDT_OPM_GET_VIRTUAL_PROTECTION_LEVEL request returns the currently set protection level
for the protected output. The DXGKMDT_OPM_GET_ACTUAL_PROTECTION_LEVEL request returns the
currently set protection level for the physical connector that is associated with the protected output.
For DXGKMDT_OPM_GET_ADAPTER_BUS_TYPE set in guidInformation and undefined in abParameters,
the driver identifies the type of the bus that connects a graphics adapter to a mother board chipset's north
bridge. To identify the type of the bus, the driver returns a valid bitwise OR combination of values from the
DXGKMDT_OPM_BUS_TYPE_AND_IMPLEMENTATION enumeration in the ulInformation member of
DXGKMDT_OPM_STANDARD_INFORMATION.
The driver can only combine the DXGKMDT_OPM_COPP_COMPATIBLE_BUS_TYPE_INTEGRATED
(0x80000000) value with one of the bus-type values when none of the interface signals between the
graphics adapter and other subsystems are available on an expansion bus that uses a publicly available
specification and standard connector type. Memory buses are excluded from this definition.
For DXGKMDT_OPM_GET_ACTUAL_OUTPUT_FORMAT set in guidInformation and undefined in
abParameters, the driver returns information in the members of
DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT that describe how the signal that goes through the physical
connector that is associated with the protected output is formatted.
For DXGKMDT_OPM_GET_ACP_AND_CGMSA_SIGNALING set in guidInformation and undefined in
abParameters, the driver returns information in the members of
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING that describe how the signal that goes through the
physical connector that is associated with the protected output is protected.
For DXGKMDT_OPM_GET_CONNECTED_HDCP_DEVICE_INFORMATION set in guidInformation and
undefined in abParameters, the driver returns information in the members of
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION that contain High-bandwidth Digital
Content Protection (HDCP) information.
Configuring a Protected Output
4/26/2017 • 1 min to read • Edit Online

The display miniport driver can receive requests to configure the protected output that is associated with a graphics
adapter's physical output connector. The display miniport driver's DxgkDdiOPMConfigureProtectedOutput
function is passed a pointer to a DXGKMDT_OPM_CONFIGURE_PARAMETERS structure that specifies how to
configure the protected output. The guidSetting and abParameters members of
DXGKMDT_OPM_CONFIGURE_PARAMETERS specify the configuration request.
Note Before DxgkDdiOPMConfigureProtectedOutput returns, the display miniport driver must verify that the
One-key Cipher Block Chaining (CBC)-mode message authentication code (OMAC) that is specified in the omac
member of DXGKMDT_OPM_CONFIGURE_PARAMETERS is correct. For more information about verifying OMAC,
see OMAC-1 algorithm. The driver must also verify that the sequence number that is specified in the
ulSequenceNumber member of DXGKMDT_OPM_CONFIGURE_PARAMETERS matches the sequence number that
the driver currently has stored. The driver must then increment the stored sequence number.
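The following is a minimal sketch of these checks. MY_PROTECTED_OUTPUT, its ConfigureSequenceNumber field, MyVerifyOmac, and the choice of error status are assumptions; the parameter structure and its members are the ones named above.

NTSTATUS MyValidateConfigureRequest(
    MY_PROTECTED_OUTPUT                    *Output,
    CONST DXGKMDT_OPM_CONFIGURE_PARAMETERS *Parameters)
{
    // The OMAC in the request must verify against the stored AES key before anything is configured.
    if (!MyVerifyOmac(Output, Parameters)) {
        return STATUS_INVALID_PARAMETER;   // assumption: an appropriate error status
    }

    // The sequence number must match the stored value, which is then incremented.
    if (Parameters->ulSequenceNumber != Output->ConfigureSequenceNumber) {
        return STATUS_INVALID_PARAMETER;
    }
    Output->ConfigureSequenceNumber++;

    return STATUS_SUCCESS;
}
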
The display miniport driver should support the following configuration requests:
Setting the Protection Level for a Protected Output
Configuring Protection for the Video Signal
Setting the HDCP SRM Version
Setting the Protection Level for a Protected Output
4/26/2017 • 1 min to read • Edit Online

OPM configuration can set the protection level of a protection type on a protected output. To set the protection
level, the display miniport driver's DxgkDdiOPMConfigureProtectedOutput function receives a pointer to a
DXGKMDT_OPM_CONFIGURE_PARAMETERS structure with the guidSetting member set to the
DXGKMDT_OPM_SET_PROTECTION_LEVEL GUID and the abParameters member set to a pointer to a
DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS structure that specifies the type of protection to set
and the level at which to set the protection. The following protection levels can be set for the indicated protection
types:
For DXGKMDT_OPM_PROTECTION_TYPE_ACP specified in the ulProtectionType member of
DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS, one of the protection-level values from the
DXGKMDT_OPM_ACP_PROTECTION_LEVEL enumeration can be specified in the ulProtectionLevel
member of DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS.
For DXGKMDT_OPM_PROTECTION_TYPE_CGMSA specified in the ulProtectionType member of
DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS, one of the protection-level values from the
DXGKMDT_OPM_CGMSA enumeration can be specified in the ulProtectionLevel member of
DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS.
For DXGKMDT_OPM_PROTECTION_TYPE_HDCP or
DXGKMDT_OPM_PROTECTION_TYPE_COPP_COMPATIBLE_HDCP specified in the ulProtectionType
member of DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS, one of the protection-level values
from the DXGKMDT_OPM_HDCP_PROTECTION_LEVEL enumeration can be specified in the
ulProtectionLevel member of DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS.
For DXGKMDT_OPM_PROTECTION_TYPE_DPCP specified in the ulProtectionType member of
DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS, one of the protection-level values from the
DXGKMDT_OPM_DPCP_PROTECTION_LEVEL enumeration can be specified in the ulProtectionLevel
member of DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS.
Note The DXGKMDT_OPM_SET_PROTECTION_LEVEL_ACCORDING_TO_CSS_DVD GUID is new for Windows 7 and
is used to indicate that the driver should enable HDCP according to the new CSS rules. Setting the
DXGKMDT_OPM_SET_PROTECTION_LEVEL_ACCORDING_TO_CSS_DVD command is identical to setting the
existing DXGKMDT_OPM_SET_PROTECTION_LEVEL command except that
DXGKMDT_OPM_SET_PROTECTION_LEVEL_ACCORDING_TO_CSS_DVD has no absolute requirement to enable the
requested protection.
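The following is a minimal sketch of how a driver might dispatch a DXGKMDT_OPM_SET_PROTECTION_LEVEL request to its hardware-programming code. The MyProgram* helpers and MY_PROTECTED_OUTPUT are hypothetical; the protection-type values and the ulProtectionType and ulProtectionLevel members are the ones documented above.

NTSTATUS MySetProtectionLevel(
    MY_PROTECTED_OUTPUT                                *Output,
    CONST DXGKMDT_OPM_SET_PROTECTION_LEVEL_PARAMETERS  *Level)
{
    switch (Level->ulProtectionType) {
    case DXGKMDT_OPM_PROTECTION_TYPE_ACP:
        return MyProgramAcp(Output, Level->ulProtectionLevel);
    case DXGKMDT_OPM_PROTECTION_TYPE_CGMSA:
        return MyProgramCgmsa(Output, Level->ulProtectionLevel);
    case DXGKMDT_OPM_PROTECTION_TYPE_HDCP:
    case DXGKMDT_OPM_PROTECTION_TYPE_COPP_COMPATIBLE_HDCP:
        return MyProgramHdcp(Output, Level->ulProtectionLevel);
    case DXGKMDT_OPM_PROTECTION_TYPE_DPCP:
        return MyProgramDpcp(Output, Level->ulProtectionLevel);
    default:
        return STATUS_INVALID_PARAMETER;   // assumption: an appropriate error status
    }
}
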
Configuring Protection for the Video Signal
4/26/2017 • 1 min to read • Edit Online

OPM configuration can configure protection for the video signal that goes through the physical connector that is
associated with the protected output. To set signal protection, the display miniport driver's
DxgkDdiOPMConfigureProtectedOutput function receives a pointer to a
DXGKMDT_OPM_CONFIGURE_PARAMETERS structure with the guidSetting member set to the
DXGKMDT_OPM_SET_ACP_AND_CGMSA_SIGNALING GUID and the abParameters member set to a pointer to a
DXGKMDT_OPM_SET_ACP_AND_CGMSA_SIGNALING_PARAMETERS structure that specifies how to protect the
signal.
Setting the HDCP SRM Version
4/26/2017 • 1 min to read • Edit Online

OPM configuration can set the version of the High-bandwidth Digital Content Protection (HDCP) System
Renewability Message (SRM) for the protected output. To set the version, the display miniport driver's
DxgkDdiOPMConfigureProtectedOutput function receives a pointer to a
DXGKMDT_OPM_CONFIGURE_PARAMETERS structure with the guidSetting member set to the
DXGKMDT_OPM_SET_HDCP_SRM GUID and the abParameters member set to a pointer to a
DXGKMDT_OPM_SET_HDCP_SRM_PARAMETERS structure. The
DXGKMDT_OPM_SET_HDCP_SRM_PARAMETERS structure contains a ULONG that specifies the version number.
The least significant bits (bits 0 through 15) contain the SRM's version number in little-endian format. For more
information about the SRM version number, see the HDCP Specification Revision 1.1.
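As a brief illustration, the version can be extracted from the low 16 bits of that ULONG. The member name ulSRMVersion used here is an assumption about DXGKMDT_OPM_SET_HDCP_SRM_PARAMETERS.

USHORT MySrmVersionFromParameters(CONST DXGKMDT_OPM_SET_HDCP_SRM_PARAMETERS *Params)
{
    return (USHORT)(Params->ulSRMVersion & 0xFFFF);   // bits 0 through 15 carry the SRM version
}
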
Reporting Status of a Protected Output
4/26/2017 • 2 min to read • Edit Online

External events can alter the nature of the protection that is applied to a connector or even modify the type of the
connector. The display miniport driver must report these events to OPM applications whenever the driver receives
a call to its DxgkDdiOPMGetInformation or DxgkDdiOPMGetCOPPCompatibleInformation function. The
display miniport driver must report the following external events by returning the specified status flags from the
DXGKMDT_OPM_STATUS enumeration only on the next call to DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation after the events occur:
Connection working properly
If the connection between the computer and the display device is working properly, the display miniport driver
should set the DXGKMDT_OPM_STATUS_NORMAL status flag in the ulStatusFlags member of the
DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT,
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION structure.
Connection integrity
If the computer and the display device become disconnected, the display miniport driver should set the
DXGKMDT_OPM_STATUS_LINK_LOST status flag in the ulStatusFlags member of the
DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT,
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION structure.
Connector reconfigurations
If the end-user causes the configuration of the physical connector to change, the display miniport driver should set
the DXGKMDT_OPM_STATUS_RENEGOTIATION_REQUIRED status flag in the ulStatusFlags member of the
DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT,
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION structure.
Tampering
If tampering with the graphics adapter or the adapter's display miniport driver has occurred, the display miniport
driver should set the DXGKMDT_OPM_STATUS_TAMPERING_DETECTED status flag in the ulStatusFlags member
of the DXGKMDT_OPM_STANDARD_INFORMATION, DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT,
DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION structure.
Revoked HDCP device
If a revoked High-bandwidth Digital Content Protection (HDCP) device is directly or indirectly attached to a
connector and if HDCP is enabled, the display miniport driver should set the
DXGKMDT_OPM_STATUS_REVOKED_HDCP_DEVICE_ATTACHED status flag in the ulStatusFlags member of the
DXGKMDT_OPM_STANDARD_INFORMATION or DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT structure. If HDCP is
not enabled, the driver is not required to set this status flag. The driver sets this status value only from a call to its
DxgkDdiOPMGetInformation function to determine if HDCP is enabled.
The display miniport driver returns a pointer to a DXGKMDT_OPM_STANDARD_INFORMATION,
DXGKMDT_OPM_ACTUAL_OUTPUT_FORMAT, DXGKMDT_OPM_ACP_AND_CGMSA_SIGNALING, or
DXGKMDT_OPM_CONNECTED_HDCP_DEVICE_INFORMATION structure in the abRequestedInformation
member of the DXGKMDT_OPM_REQUESTED_INFORMATION structure. A pointer to
DXGKMDT_OPM_REQUESTED_INFORMATION is returned through the RequestedInformation parameter of
DxgkDdiOPMGetInformation or DxgkDdiOPMGetCOPPCompatibleInformation.
For example, consider two media playback applications, A and B. Each application controls, via OPM, the HDCP
protection level of the connector that attaches the computer to the display monitor. Each application controls its
own unique protected output. If the connector becomes unplugged, the next time either application initiates a
DxgkDdiOPMGetInformation or DxgkDdiOPMGetCOPPCompatibleInformation request to its protected output, the
display miniport driver should return the DXGKMDT_OPM_STATUS_LINK_LOST status flag.
Assume application A is the first to initiate a call to DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation on its protected output. Application A then receives the
DXGKMDT_OPM_STATUS_LINK_LOST flag and acts accordingly. If application A initiates a subsequent
DxgkDdiOPMGetInformation or DxgkDdiOPMGetCOPPCompatibleInformation call, it should not receive the
DXGKMDT_OPM_STATUS_LINK_LOST flag, unless the connector becomes unplugged again. When application B
initiates a call to DxgkDdiOPMGetInformation or DxgkDdiOPMGetCOPPCompatibleInformation on its protected
output, it receives the DXGKMDT_OPM_STATUS_LINK_LOST flag and acts accordingly. Again, application B should
not receive the DXGKMDT_OPM_STATUS_LINK_LOST flag again until the connector becomes unplugged again.
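The following is a minimal sketch of this one-shot, per-output reporting. The PendingStatusFlags field and MY_PROTECTED_OUTPUT are assumptions about how a vendor might track events; only the DXGKMDT_OPM_STATUS_* values come from the OPM headers.

// A connector event marks every protected output that is associated with the connector.
VOID MyOnConnectorUnplugged(MY_PROTECTED_OUTPUT *Outputs, ULONG Count)
{
    ULONG i;
    for (i = 0; i < Count; i++) {
        Outputs[i].PendingStatusFlags |= DXGKMDT_OPM_STATUS_LINK_LOST;
    }
}

// Called when building the reply to DxgkDdiOPMGetInformation or
// DxgkDdiOPMGetCOPPCompatibleInformation; each event is reported once per output.
ULONG MyConsumeStatusFlags(MY_PROTECTED_OUTPUT *Output)
{
    ULONG Flags = Output->PendingStatusFlags;
    Output->PendingStatusFlags = 0;

    return (Flags != 0) ? Flags : DXGKMDT_OPM_STATUS_NORMAL;
}
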
Implementation Tips and Requirements for OPM
4/26/2017 • 1 min to read • Edit Online

The following topics discuss tips and requirements for implementing OPM functionality in display miniport drivers:
OPM and ChangeDisplaySettingsEx
OPM and Display Modes
CGMS-A Standards
OPM and ChangeDisplaySettingsEx
4/26/2017 • 1 min to read • Edit Online

Because applications can alter analog content protection (ACP) levels by calling the Microsoft Win32
ChangeDisplaySettingsEx function, the display miniport driver should ensure that adjustments to the ACP
protection type through ChangeDisplaySettingsEx are independent of adjustments made by the
IOPMVideoOutput interface. In other words, if the ACP protection type is set on the physical connector through
the display miniport driver's DxgkDdiOPMConfigureProtectedOutput function, the display miniport driver
should not permit disabling the ACP protection type on the physical connector through a
IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS request. Note that user-mode calls to ChangeDisplaySettingsEx
initiate IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS requests to the display miniport driver.
For more information about the ChangeDisplaySettingsEx function, see the Microsoft Windows SDK
documentation.
OPM and Display Modes
4/26/2017 • 1 min to read • Edit Online

The display miniport driver should report all the protection types that are supported on the physical connector that
is associated with the protected output, regardless of the display mode that is currently being used. The display
miniport driver reports supported protection types when it receives a call to its DxgkDdiOPMGetInformation or
DxgkDdiOPMGetCOPPCompatibleInformation function with
DXGKMDT_OPM_GET_SUPPORTED_PROTECTION_TYPES set in the guidInformation member of the
DXGKMDT_OPM_GET_INFO_PARAMETERS structure. For more information about retrieving supported
protection types, see Retrieving Information About a Protected Output or Retrieving COPP-Compatible Information
about a Protected Output.
If the current resolution is too high for a particular protection type, the driver should return an error when the
display miniport driver's DxgkDdiOPMConfigureProtectedOutput function is called to set the protection level
for that protection type. The following scenarios give examples of when the driver's
DxgkDdiOPMConfigureProtectedOutput function should return success and when it should return an error:
If the protected output is associated with an S-Video output connector, a call to the display miniport driver's
DxgkDdiOPMGetCOPPCompatibleInformation function with
DXGKMDT_OPM_GET_SUPPORTED_PROTECTION_TYPES set should indicate support of the analog content
protection (ACP) type (DXGKMDT_OPM_PROTECTION_TYPE_ACP). Thereafter, if the driver's
DxgkDdiOPMConfigureProtectedOutput function is called to set a level for the ACP type on this connector,
the driver should return success because the output resolution of S-Video is fixed, even though desktop
resolution (display mode) might be higher.
If the protected output is associated with component output connectors, a call to the display miniport
driver's DxgkDdiOPMGetCOPPCompatibleInformation function with
DXGKMDT_OPM_GET_SUPPORTED_PROTECTION_TYPES set should also indicate support of the ACP type.
However, if the driver's DxgkDdiOPMConfigureProtectedOutput function is called to set a level for the ACP
type on this output when the display resolution is 720p or 1080i, the driver should return the
STATUS_GRAPHICS_OPM_RESOLUTION_TOO_HIGH error code; 720p and 1080i are too high a resolution at which to
set the ACP protection level on component output connectors.
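The following is a minimal sketch of the component-output check in the second scenario above. MyIsHighDefMode and MY_PROTECTED_OUTPUT are hypothetical vendor helpers; STATUS_GRAPHICS_OPM_RESOLUTION_TOO_HIGH is the error code named in the text.

NTSTATUS MyCheckAcpOnComponentOutput(MY_PROTECTED_OUTPUT *Output)
{
    // 720p or 1080i is too high a resolution for ACP on component outputs.
    if (MyIsHighDefMode(Output)) {
        return STATUS_GRAPHICS_OPM_RESOLUTION_TOO_HIGH;
    }
    return STATUS_SUCCESS;
}
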
CGMS-A Standards
4/26/2017 • 2 min to read • Edit Online

Multiple standards define the Content Generation Management System Analog (CGMS-A) protection type. Various
countries and regions use various versions of CGMS-A. A hardware vendor must ensure that his or her display
miniport driver supports the appropriate CGMS-A version. For example, a driver for a graphics adapter to be used
in Japan should probably support the Association of Radio Industries and Businesses (ARIB) TR-B15 standard,
which is the operational guideline for digital satellite broadcasting. However, a driver for a graphics adapter to be
used in the United States should support the International Electrotechnical Commission (IEC) 61880 standard or
the Consumer Electronics Association (CEA) CEA-608-B standard. The standard that a graphics adapter's display
miniport driver supports depends on the type of signal that the adapter transmits. The following list describes
various standards that define CGMS-A. Currently, redistribution control is defined only in the CEA-805-A standard.
CEA-805-A
Data on Component Video Interfaces
Defines how CGMS-A and redistribution control information should be encoded in an analog 480p, 720p, or 1080i
signal that is transmitted from a component video output (Y/Pb/Pr output).
This standard is published by CEA. For more information about CEA, see the Consumer Electronics Association
website.
CEA-608-B and EIA-608-B
Line 21 Data Services
Defines how CGMS-A information should be encoded in a 480i signal that is transmitted from an RF, composite, or
S-Video output.
This standard is published by CEA and Electronic Components Industry Association (ECIA). For more information
about ECIA, see the Electronic Components Industry Association website.
EN 300 294 V1.3.2 (1998-04)
Television systems; 625-line television - Wide Screen Signaling (WSS)
Defines how CGMS-A should be encoded in a 576i Phase Alternation Line (PAL) or Sequential Color with Memory
(SECAM) signal.
This standard is published by the European Telecommunications Standards Institute (ETSI). For more information
about this standard, see the ETSI website.
IEC - 61880 - First edition - Video systems (525/60)
Video and accompanied data using the vertical blanking interval - Analog interface
A method of encoding CGMS-A information in a 480i video signal that is transmitted from an analog or digital
video output.
This method is published by IEC. For more information about the IEC, see the IEC website.
IEC - 61880-2 - First edition - Video systems (525/60)
Video and accompanied data using the vertical blanking interval - Analog interface - Part 2: 525 progressive scan
system
A method of encoding CGMS-A information in a 480p video signal that is transmitted from an analog or digital
video output.
IEC - 62375 - Video systems (625/50 progressive)
Video and accompanied data using the vertical blanking interval - Analog interface
A method of encoding CGMS-A information in a 576p video signal that is transmitted from an analog or digital
video output.
ARIB TR-B15
Operational Guideline for Digital Satellite Broadcasting
Defines how CGMS-A information should be encoded in an analog 480i, 480p, 720p, or 1080i signal that is
transmitted from a video output.
This standard applies only to Japan.
This standard is published by ARIB. For more information about ARIB, see the ARIB English website.
Supporting Transient Multi-Monitor Manager
4/26/2017 • 1 min to read • Edit Online

Transient Multi-Monitor Manager (TMM) is a Windows Vista feature that simplifies the setup of display configurations on
mobile computers. TMM can place a mobile computer display (for example, a laptop computer display) into clone
view when a new monitor is detected. TMM is disabled on desktop computers. For Windows Vista, there is no GDI
function that an application can call to enter clone view. Hardware vendors must continue to use their own
proprietary methods to enter clone view on desktop computers. However, hardware vendors should implement
and provide an IViewHelper COM interface object that will allow TMM to set clone-view mode on mobile
computers.
This section includes:
TMM Terminology
Requirements of an IViewHelper Clone-View COM Object
Using an IViewHelper Clone-View COM Object
Handling Monitor Configurations
Determining Whether a Platform is Mobile or Desktop
TMM Terminology
4/26/2017 • 1 min to read • Edit Online

The following are the primary terms that are used with TMM:
Clone View
The display mode where the primary display is shown on all active monitors that are attached to a graphics
adapter.
COM
Component Object Model; the binary standard for linking components.
External Only
Display configuration that is common on laptop computers where the display is shown on an external display
device instead of the internal display device.
Single View
The display mode where only one view is shown on one monitor.
Topology
Information that specifies which sources are shown on which targets for a graphics adapter.
Requirements of an IViewHelper Clone-View COM Object
4/26/2017 • 1 min to read • Edit Online

A hardware vendor's clone-view IViewHelper COM interface object must meet the following requirements:
The COM object must reside within a dynamic-link library (DLL), which is a COM in-process (in-proc) server.
The implementation of the COM object must be opaque to the operating system.
The IViewHelper interface must provide methods for getting and setting the topology data, which includes
clone view.
The hardware vendor must find a display mode for clone view so that the display is shown on two or more
monitors.
If a call to the COM object's IViewHelper::Commit method does not generate a mode change, Commit
must call the Win32 BroadcastSystemMessage function and must always post (using the
BSF_POSTMESSAGE broadcast option) a WM_DISPLAYCHANGE message. For more information about
BroadcastSystemMessage, see the Microsoft Windows SDK documentation.
The IViewHelper::Commit method must not be used in place of a call to the Win32
ChangeDisplaySettingsEx(NULL, NULL, NULL, 0, NULL) function with the indicated arguments. For more
information about ChangeDisplaySettingsEx, see the Windows SDK documentation.
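The following is a minimal sketch of the BroadcastSystemMessage requirement above, assuming the vendor's Commit implementation knows the current bit depth and resolution; the helper name is hypothetical, and the message parameters are the documented Win32 values for WM_DISPLAYCHANGE.

#include <windows.h>

// Called from Commit when the commit does not generate a mode change.
void MyBroadcastDisplayChange(DWORD bitsPerPel, DWORD width, DWORD height)
{
    DWORD recipients = BSM_APPLICATIONS;

    // Post (do not send) WM_DISPLAYCHANGE to all applications.
    BroadcastSystemMessage(BSF_POSTMESSAGE, &recipients, WM_DISPLAYCHANGE,
                           (WPARAM)bitsPerPel, MAKELPARAM(width, height));
}
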
Using an IViewHelper Clone-View COM Object
4/26/2017 • 1 min to read • Edit Online

TMM will use the methods of a hardware vendor's clone-view IViewHelper COM interface object in new monitor
and persisted monitor configurations. In a persisted monitor configuration, TMM restores display data (that is,
display modes and topology data) to monitors. TMM can pass this display data to the user-mode display driver
through the IViewHelper::SetConfiguration method so the driver can modify or fold in other display data (for
example, gamma or TV settings).
Errors from a Video Present Network (VidPN) are returned through the methods of IViewHelper. Therefore, if TMM
applies an improper topology, the VidPN fails and the failure result is passed back to the calling function. Mapping
a target to two sources or using a target or source identifier that the VidPN cannot identify are examples of
improper topology.
TMM determines the IViewHelper COM interface object through the UserModeDriverGUID string registry value.
Hardware vendors should add this value under the registry keys that the DeviceKey member of the
DISPLAY_DEVICE structure specifies. A call to the Win32 EnumDisplayDevices function returns this registry key
information in DISPLAY_DEVICE that the lpDisplayDevice parameter points to. If multiple DeviceKey names exist,
this value should appear under each of those keys. The following is an example of a device key and the
UserModeDriverGUID string registry value:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video\{7661971C-A9BD-48B5-ACBC-298A8826535D}\0000]
"UserModeDriverGUID"="{YYYYYYYY-YYYY-YYYY-YYYY-YYYYYYYYYYYY}"

For COM to load the IViewHelper COM interface object, the COM object should be registered as an in-process (in-
proc) handler, and the threading model should be Both. The GUID that is registered should match the GUID in
UserModeDriverGUID. For information about the Both threading model attribute, see the Microsoft Windows
SDK documentation.
You should only copy and register the correctly compiled versions of IViewHelper COM interface object DLLs in the
system directory. That is, you should only copy and register the 64-bit IViewHelper DLL for 64-bit operating
systems and the 32-bit IViewHelper DLL for 32-bit operating systems. The two DLL binaries should not be
concurrently present on the same computer. TMM will not operate properly if the two binaries are concurrently
present on the same computer, even with Windows on Windows (WOW).
Handling Monitor Configurations
4/26/2017 • 1 min to read • Edit Online

This section provides the following two examples of how TMM configures monitors:
Handling Two Monitor Configurations
Handling Existing Monitor Configurations
Handling Two Monitor Configurations
4/26/2017 • 2 min to read • Edit Online

A two-monitor configuration generates the TMM dialog. If two targets are part of the same graphics adapter, TMM
will map the one source that is currently mapped to one of the targets to both targets. After TMM performs the
mapping, the TMM dialog will pop up. If the targets are on different graphics adapters, the TMM dialog will pop up
without activating the second monitor. In this situation, the TMM dialog will not have the option for clone or
extended.
The following sequence shows the order in which TMM calls the methods of IViewHelper and performs other
operations in this situation:
1. TMM calls the EnumDisplayDevices function to retrieve the current display configuration, which includes
adapters, displays, and monitors. For more information about EnumDisplayDevices, see the Microsoft
Windows SDK documentation.
2. TMM compares display configuration against the previously recorded display configurations.
3. If the display configuration has one or two monitors with Extended Display Information Data (EDID) that
TMM has not encountered before, TMM proceeds to bring up the TMM dialog.
4. For each adapter in the display configuration, TMM makes calls to the IViewHelper::GetConnectedIDs
method to retrieve all of the sources on the adapter whether the sources are mapped or not.
5. TMM makes calls to the IViewHelper::GetConnectedIDs method to retrieve all of the targets on the
adapter, whether the targets are mapped or not. Each target must be connected but is not required to be
active.
6. For each source in the graphics adapter, TMM makes calls to the IViewHelper::GetActiveTopology
method to retrieve the active targets for the source.
7. TMM finds the graphics adapter that has a source that is mapped to a target. This source identifier is called
"CloneSource." If the adapter has two targets, TMM creates an array of two entries (ULONG targetArray[2]).
TMM places the existing target identifier as the first element and the second target identifier as the second
element.
8. TMM calls the IViewHelper::SetActiveTopology(adapterName, CloneSource, 2, targetArray) method with
the indicated parameters.
9. TMM calls the IViewHelper::Commit method.
If an error result is returned from any of the IViewHelper methods, the computer does not enter clone view, and the
TMM dialog pops up with clone-view and external-only options disabled.
If the computer enters clone view and the user chooses extended view from the TMM dialog (and clicks OK or
Apply), TMM must turn off clone view as follows:
1. TMM calls the IViewHelper::SetActiveTopology(adapterName, CloneSource, 1, targetArray) method with
the indicated parameters.
2. TMM calls the IViewHelper::Commit method.
In the preceding SetActiveTopology call, parameter three is set to 1 and not 2. In this situation,
SetActiveTopology interprets targetArray as an array with one element. SetActiveTopology turns off the
second target and enters single view. Next, TMM uses the ChangeDisplaySettingsEx function to extend the
display. For more information about ChangeDisplaySettingsEx, see the Microsoft Windows SDK documentation.
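The following fragment sketches that call sequence in C++. It assumes that pViewHelper already points to the
vendor's IViewHelper instance and that adapterName, cloneSource, existingTargetId, and secondTargetId were
obtained from the enumeration steps above; those names are illustrative, and error handling is reduced to checking
the returned HRESULT values:

ULONG targetArray[2];
targetArray[0] = existingTargetId;   // target that cloneSource is already mapped to
targetArray[1] = secondTargetId;     // target to add for clone view

// Enter clone view: map the one source to both targets, then commit.
HRESULT hr = pViewHelper->SetActiveTopology(adapterName, cloneSource, 2, targetArray);
if (SUCCEEDED(hr))
{
    hr = pViewHelper->Commit();
}

// Leave clone view: pass 1 so that only targetArray[0] remains active,
// then extend the desktop with ChangeDisplaySettingsEx.
if (SUCCEEDED(hr))
{
    hr = pViewHelper->SetActiveTopology(adapterName, cloneSource, 1, targetArray);
    if (SUCCEEDED(hr))
    {
        hr = pViewHelper->Commit();
    }
}
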
The following figure shows the flow of operations that occur when TMM handles the situation when a monitor is
added to make a two-monitor configuration.



Handling Existing Monitor Configurations

Besides detecting new monitors and launching the TMM dialog in a two-monitor configuration, TMM also must
restore previous display configurations. TMM can restore display configurations by passing display data to the
user-mode display driver through the IViewHelper::SetConfiguration method. TMM will allocate memory and
store display modes and topology information in the memory. TMM passes this memory in an IStream interface
that the pIStream parameter of SetConfiguration points to. The user-mode display driver can also modify or fold
in other display data (for example, gamma or TV settings). When the driver is finished with the display data, the
driver calls the IStream::Release method to free the memory.
The following figure shows the flow of operations that occur when TMM restores an existing monitor configuration.



Determining Whether a Platform is Mobile or
Desktop

TMM runs only on mobile computers and is automatically disabled on desktop computers. Hardware vendors
should enable and use their own proprietary methods to enter clone view on desktop computers. They should
determine if a platform is mobile so that they can avoid using their proprietary methods to enter clone view on a
mobile computer and instead use TMM.
Hardware vendors can use the following code to determine if a platform is mobile or desktop. The platform can
then use the appropriate mechanism to enter clone view.

#include <windows.h>
#include <powrprof.h>   // For PowerDeterminePlatformRole and GetPwrCapabilities; link with PowrProf.lib

BOOL IsMobilePlatform()
{
    BOOL fIsMobilePlatform = FALSE;

    // Check whether the operating system determines
    // that the computer is a mobile computer.
    POWER_PLATFORM_ROLE iRole = PowerDeterminePlatformRole();

    if (PlatformRoleMobile == iRole)
    {
        fIsMobilePlatform = TRUE;
    }
    else if (PlatformRoleDesktop == iRole)
    {
        // PlatformRoleDesktop can be reported for a laptop when no battery is installed.
        SYSTEM_POWER_CAPABILITIES powerCapabilities;

        if (GetPwrCapabilities(&powerCapabilities))
        {
            // Check whether a battery exists and is not a short-term (UPS) battery.
            // Note that SystemBatteriesPresent is set on a laptop even if the battery is unplugged.
            fIsMobilePlatform = ((TRUE == powerCapabilities.SystemBatteriesPresent) &&
                                 (FALSE == powerCapabilities.BatteriesAreShortTerm));
        }

        // GetPwrCapabilities should never fail.
        // However, if it does, leave fIsMobilePlatform == FALSE.
    }

    return fIsMobilePlatform;
}

For information about the functions that are called in the preceding code, see the Microsoft Windows SDK
documentation.
Connecting and Configuring Displays

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Microsoft
Windows operating system.
The new Connecting and Configuring Displays (CCD) Win32 APIs that are described in the Connecting and
Configuring Displays reference section provide more control over the desktop display setup. They can also be used
to make your app display correctly on a portrait device. For example, in versions of Windows prior to Windows 7, it
was impossible to set clone mode by using the ChangeDisplaySettingsEx function. The new CCD APIs move
away from using Windows Graphics Device Interface (GDI) concepts like view name and toward Windows Display
Driver Model (WDDM) concepts like adapter, source, and target identifiers.
The display control panel, new hot keys, and the Hot Plug Detection (HPD) manager can use the CCD APIs. OEMs
can use the CCD APIs for their value-add applets instead of using private driver escapes.
The CCD APIs provide the following functionality:
Enumerate the display paths that are possible from the currently connected displays.
Set the topology (for example, clone and extend), layout information, resolution, orientation, and aspect
ratio for all the connected displays in one function call. By performing multiple settings for all connected
displays in one function call, the number of screen flashes is reduced.
Add or update settings to the persistence database.
Apply settings that are persisted in the database.
Use best mode logic to apply optimum display settings.
Use best topology logic to apply the optimum topology for the connected displays.
Start or stop forced output.
Allow OEM hot keys to use the new operating system persistence database.
The CCD APIs cannot handle the following tasks. In addition, the CCD APIs are not backward compatible with the
Windows 2000 display driver model.
Replace the API sets and private driver escapes that hardware vendors previously provided to control
desktop display setup.
Pass private data down to the kernel-mode display miniport driver.
Provide a new set of monitor-control APIs.
Query the monitor capabilities, which include EDID, DDCCI, and so on.
Provide a context identifier to uniquely identify the settings that the CCD APIs retrieve from the persistence
database.
Although the CCD APIs allow a caller to get and set the displays, they do not provide any functionality to
enumerate the possible source modes in a given path. APIs that existed prior to Windows 7 already provide
this functionality.
The following sections describe the CCD APIs in more detail:
CCD Concepts
CCD APIs
Note In addition to using the CCD APIs to set up the desktop display, hardware vendors must modify their
Windows 7 Windows Display Driver Model (WDDM) display miniport drivers to support CCD. For more
information about supporting CCD in display miniport drivers, see CCD DDIs.
CCD Concepts

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections describe CCD concepts:
Forced Versus Connected Targets
Path Priority Order
Desktop Layout
Relationship of Mode Information to Path Information
Scaling the Desktop Image
Forced Versus Connected Targets

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The CCD APIs introduce the concepts of connected monitors and forceable targets. A monitor is connected to a
target if the GPU can detect the presence of the monitor, which is a physical attribute of the monitor and target. A
target is forceable if the GPU can send a display signal out of the target even if the GPU cannot detect a connected
monitor. All analog target types are considered forceable, and all digital targets are not considered forceable. The
following table describes the combination of connected and forced states when the path is active and not active.

PATH-ACTIVE STATE   PATH-FORCED STATE   MONITOR-CONNECTION STATE   RESULT

Active              Forced              Connected                  Target output is enabled because a monitor is
                                                                   connected and is active.
Active              Forced              Not connected              Target output is enabled as the path is being
                                                                   forced and is active.
Active              Not forced          Connected                  Target output is enabled because a monitor is
                                                                   connected and is active.
Active              Not forced          Not connected              The path cannot be set because it is not being
                                                                   forced and the monitor is not connected.
Not active          Forced              Connected                  Target output can be enabled because it is
                                                                   being forced and a monitor is connected.
Not active          Forced              Not connected              Target output can be enabled because it is
                                                                   being forced.
Not active          Not forced          Connected                  Target output can be enabled because a monitor
                                                                   is connected.
Not active          Not forced          Not connected              Target output cannot be enabled because a
                                                                   monitor is not connected and the path is not
                                                                   being forced.
The following table describes several types of possible forced state for each path.

FORCED STATE MEANING

Normal force This forced state is lost after power transitions, after reboots, or
when the forced state is turned off.

Path-persistent This forced state is lost after reboot. The Microsoft Win32
ChangeDisplaySettingsEx function always destroys all
path-persisted monitors even if those monitors in their
paths are the target of the ChangeDisplaySettingsEx
call. If a caller calls the SetDisplayConfig CCD function
with the SDC_USE_SUPPLIED_DISPLAY_CONFIG or
SDC_TOPOLOGY_SUPPLIED flag set in the Flags
parameter, SetDisplayConfig removes the path-persisted
monitor if the new topology does not include the path
that the monitor is in. For all other SDC_TOPOLOGY_XXX
flags that the caller specifies in the Flags parameter,
SetDisplayConfig removes the path-persisted monitor
unless the caller also specifies the
SDC_PATH_PERSIST_IF_REQUIRED flag and the path is
active in the new topology.

Boot persistent This forced state is only lost when it is turned off. This
state is persistent across system reboots.



Path Priority Order

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The SetDisplayConfig CCD function expects the active paths within the path array that the pathArray parameter
specifies to be in priority order; SetDisplayConfig gives higher priority to lower-numbered array path elements.
The ordering affects the following items:
If SetDisplayConfig does not find an existing display configuration, SetDisplayConfig uses the path
priority during the best mode logic in the search order. Therefore, SetDisplayConfig is more likely to
satisfy a higher priority path at native resolution than a lower priority path.
In cloned paths, the highest priority path is the path on which flips are scheduled. Therefore, lower priority
paths can be subject to minor tearing.
The DirectX graphics kernel subsystem uses the path priority (along with the GDI primary view) to derive the
path-importance value that the subsystem passes to the ImportanceOrdinal member of the
D3DKMDT_VIDPN_PRESENT_PATH structure in a call to the display miniport driver. The path-importance
value impacts driver decisions, such as, to which path the driver should give priority in resource allocations.
For example, the lower-ordinal path might have better access to overlays or to a higher quality controller.
The QueryDisplayConfig CCD function always returns the paths in priority order. If the QDC_ALL_PATHS flag is
set in the Flags parameter of QueryDisplayConfig, QueryDisplayConfig returns all of the inactive path
combinations following all the active path combinations in the paths array that the pPathInfoArray parameter
specifies.
Desktop Layout

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The caller uses the position member of the DISPLAYCONFIG_SOURCE_MODE structure in a call to the
SetDisplayConfig CCD function to control the arrangement of source surfaces on the desktop. The position
member specifies the position in desktop coordinates of the upper-left corner of the source surface. The source
surface that is positioned at (0, 0) is considered the primary surface. GDI has strict rules about how the source
surfaces can be arranged in the desktop space. For example, GDI does not allow any gaps between source surfaces
or any overlaps of source surfaces.
Although SetDisplayConfig attempts to rearrange source surfaces to enforce these GDI layout rules, the caller
should specify the layout of the source surfaces. It is undefined how GDI will rearrange the source surfaces to
enforce its layout rules, and the resultant layout of source surfaces might not be what the caller wanted to achieve.
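As a small sketch of the position member, a side-by-side extended layout for two 1920 x 1080 sources could be
expressed as follows; the array indices and resolutions are illustrative only, and ModeInfoArray is assumed to
come from an earlier QueryDisplayConfig call:

DISPLAYCONFIG_SOURCE_MODE* pFirst  = &ModeInfoArray[FirstSourceModeIdx].sourceMode;
DISPLAYCONFIG_SOURCE_MODE* pSecond = &ModeInfoArray[SecondSourceModeIdx].sourceMode;

pFirst->position.x  = 0;      // (0, 0) makes this source the primary surface
pFirst->position.y  = 0;
pSecond->position.x = 1920;   // flush against the right edge of the primary: no gap, no overlap
pSecond->position.y = 0;
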
Relationship of Mode Information to Path
Information

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The QueryDisplayConfig CCD function always returns path information and source and target mode information
for a particular display configuration. The following figure shows an example of how the source and target mode
information relates to the path information. In this example, the QDC_ALL_PATHS flag was passed to the Flags
parameter in the call to QueryDisplayConfig.



Scaling the Desktop Image

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
A caller can use the SetDisplayConfig CCD function to scale the desktop image to the monitor. If the desktop and
monitor use the same resolution, SetDisplayConfig is not required to scale the desktop image to the monitor.
This SetDisplayConfig operation is known as identity scaling. If the desktop and monitor resolution are different,
SetDisplayConfig applies one of the following types of scaling. The monitor resolution is defined by the
DISPLAYCONFIG_TARGET_MODE structure.
Centered
Centered scaling is a mode in which the desktop is displayed on the monitor without any scaling at all. When
SetDisplayConfig applies centered scaling, black bands might be visible above and below the desktop. The
following figure shows centered scaling.

Stretched
Stretched scaling is a mode in which the desktop is horizontally and vertically stretched on the monitor to ensure
that the entire display is used. When SetDisplayConfig applies stretched scaling, no black bands are visible above
and below the desktop. However, the desktop might appear distorted. The following figure shows stretched scaling.

Aspect-Ratio-Preserving Stretched
Aspect-ratio-preserving stretched scaling is a mode in which the desktop is stretched horizontally and vertically as
much as possible while maintaining the aspect ratio. When SetDisplayConfig applies aspect-ratio-preserving
stretched scaling, black bands might be visible either above and below or left and right of the desktop. However,
black bands cannot be visible both above and below and left and right of the desktop. Because users are expected
to prefer this type of scaling, SetDisplayConfig applies this type of scaling as the default. The following figure
shows aspect-ratio-preserving stretched scaling.
Scaling depends on the source and target modes that are used for a path. In addition, the caller can call
SetDisplayConfig without specifying the target mode information (that is, setting the modeInfoArray parameter
is optional and can be set to NULL). Therefore, the caller cannot typically predict if SetDisplayConfig must
perform any scaling. Furthermore, no API exists to get the full list of scaling types that the graphics adapter
supports. The EnumDisplaySettings Win32 function (described in the Windows SDK documentation) returns
DMDFO_DEFAULT in the dmDisplayFixedOutput member of the DEVMODE structure that the lpDevMode
parameter points to when the caller requests the new Windows 7 scaling types.
The scaling that a caller passes to SetDisplayConfig is a scaling intent rather than an explicit request to perform a
scaling operation. If scaling is required (for example, source and target resolutions differ), SetDisplayConfig uses
the scaling that the caller supplies. If the supplied scaling is not supported, SetDisplayConfig uses the graphics
adapter's default scaling. When the source and target resolutions that the caller passes to SetDisplayConfig are
the same, SetDisplayConfig always sets identity scaling.
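As a minimal sketch (assuming <windows.h> and <vector> are included and errors are ignored for brevity), a caller
could express an aspect-ratio-preserving scaling intent on every active path and reapply the configuration like
this:

UINT32 numPaths = 0, numModes = 0;
GetDisplayConfigBufferSizes(QDC_ONLY_ACTIVE_PATHS, &numPaths, &numModes);

std::vector<DISPLAYCONFIG_PATH_INFO> paths(numPaths);
std::vector<DISPLAYCONFIG_MODE_INFO> modes(numModes);
QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS, &numPaths, paths.data(), &numModes, modes.data(), nullptr);

for (UINT32 i = 0; i < numPaths; i++)
{
    // This is a scaling intent only; SetDisplayConfig falls back to the adapter default
    // if the path does not support the requested scaling.
    paths[i].targetInfo.scaling = DISPLAYCONFIG_SCALING_ASPECTRATIOCENTEREDMAX;
}

SetDisplayConfig(numPaths, paths.data(), numModes, modes.data(),
                 SDC_APPLY | SDC_USE_SUPPLIED_DISPLAY_CONFIG | SDC_SAVE_TO_DATABASE | SDC_ALLOW_CHANGES);
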
The following table shows the different SetDisplayConfig scaling requests.

SYMBOL IN TABLE             MEANING

DC_IDENTITY                 DISPLAYCONFIG_SCALING_IDENTITY

DC_CENTERED                 DISPLAYCONFIG_SCALING_CENTERED

DC_STRETCHED                DISPLAYCONFIG_SCALING_STRETCHED

DC_ASPECTRATIOCENTEREDMAX   DISPLAYCONFIG_SCALING_ASPECTRATIOCENTEREDMAX

DC_CUSTOM                   DISPLAYCONFIG_SCALING_CUSTOM

DC_PREFERRED                DISPLAYCONFIG_SCALING_PREFERRED

AdapterDefault              The adapter default scaling value. Currently, on tablet systems, the default is
                            stretched. On non-tablet systems with graphics adapters that support the Windows
                            Display Driver Model (WDDM), the default is defined by the driver. On non-tablet
                            systems with graphics adapters that support WDDM with features new for Windows 7,
                            the default is DC_ASPECTRATIOCENTEREDMAX.

DatabaseValue               The scaling value from the database for the currently connected monitors

The following list shows, for each scaling flag that is passed to SetDisplayConfig, the scaling value that is
actually set (Set) and the value that is saved in the database (Store), both when the resultant source mode and
target mode have the same resolution and when they have different resolutions.

DC_IDENTITY, current configuration not in the database:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DC_IDENTITY, current configuration in the database:
    Same resolution: Set DC_IDENTITY, Store DatabaseValue. Different resolution: Set DatabaseValue, Store DatabaseValue.
DC_CENTERED:
    Same resolution: Set DC_IDENTITY, Store DC_CENTERED. Different resolution: Set DC_CENTERED, Store DC_CENTERED.
DC_STRETCHED:
    Same resolution: Set DC_IDENTITY, Store DC_STRETCHED. Different resolution: Set DC_STRETCHED, Store DC_STRETCHED.
DC_ASPECTRATIOCENTEREDMAX, on a WDDM driver with Windows 7 features:
    Same resolution: Set DC_IDENTITY, Store DC_ASPRATIOMAX. Different resolution: Set DC_ASPRATIOMAX, Store DC_ASPRATIOMAX.
DC_ASPECTRATIOCENTEREDMAX, on a WDDM driver:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DC_CUSTOM, on a WDDM driver with Windows 7 features that supports custom scaling on the path:
    Same resolution: Set DC_CUSTOM, Store DC_CUSTOM. Different resolution: Set DC_CUSTOM, Store DC_CUSTOM.
DC_CUSTOM, on a WDDM driver with Windows 7 features that does not support custom scaling on the path:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DC_CUSTOM, on a WDDM driver:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DC_PREFERRED, current configuration not in the database:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DC_PREFERRED, current configuration in the database:
    Same resolution: Set DC_IDENTITY, Store DatabaseValue. Different resolution: Set DatabaseValue, Store DatabaseValue.
The following list shows how the scaling that a caller can pass to the legacy ChangeDisplaySettingsEx API
(described in the Windows SDK documentation) maps to the scaling that is set and stored.

DMDFO_DEFAULT, with current configuration not in the CCD database:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DMDFO_DEFAULT, with current configuration in the CCD database:
    Same resolution: Set DC_IDENTITY, Store DatabaseValue. Different resolution: Set DatabaseValue, Store DatabaseValue.
DMDFO_STRETCH:
    Same resolution: Set DC_IDENTITY, Store DC_STRETCHED. Different resolution: Set DC_STRETCHED, Store DC_STRETCHED.
DMDFO_CENTER:
    Same resolution: Set DC_IDENTITY, Store DC_CENTERED. Different resolution: Set DC_CENTERED, Store DC_CENTERED.
DM_DISPLAYFIXEDOUTPUT not set, current configuration not in the CCD database:
    Same resolution: Set DC_IDENTITY, Store AdapterDefault. Different resolution: Set AdapterDefault, Store AdapterDefault.
DM_DISPLAYFIXEDOUTPUT not set, current configuration in the CCD database:
    Same resolution: Set DC_IDENTITY, Store DatabaseValue. Different resolution: Set DatabaseValue, Store DatabaseValue.
The following table shows how display configuration scaling is translated and returned from
EnumDisplaySettings.

CURRENT ACTIVE SCALING      GDI SCALING VALUE RETURNED FROM LEGACY ENUMDISPLAYSETTINGS(ENUM_CURRENT_SETTINGS)

DC_IDENTITY                 DMDFO_DEFAULT

DC_CENTERED                 DMDFO_CENTER

DC_STRETCHED                DMDFO_STRETCH

DC_ASPRATIOMAX              DMDFO_DEFAULT

DC_CUSTOM                   DMDFO_DEFAULT

DC_PREFERRED                DMDFO_DEFAULT

DirectX Games and Scaling


Microsoft DirectX 9L and earlier runtimes require that applications always call the ChangeDisplaySettingsEx
function without DM_DISPLAYFIXEDOUTPUT set in the dmFields member of the DEVMODE structure that the
lpDevMode parameter points to. DirectX 10 and later runtimes allow applications to choose the scaling that those
applications pass to ChangeDisplaySettingsEx. The following table shows the mapping of scaling values to
scaling flags that are passed to ChangeDisplaySettingsEx.

DXGI FLIP CHAIN SCALING VALUE       SCALING FLAGS THAT ARE PASSED TO CHANGEDISPLAYSETTINGSEX

DXGI_MODE_SCALING_UNSPECIFIED       DMDFO_DEFAULT, DMDFO_CENTER, or DMDFO_STRETCH. The scaling that applications
                                    use depends on several factors, which include the current desktop scaling and
                                    the mode list that the driver exposes.

DXGI_MODE_SCALING_CENTERED          DMDFO_CENTER

DXGI_MODE_SCALING_STRETCHED         DMDFO_STRETCH

By using this information in combination with the preceding scaling tables, you can determine the expected scaling
from a DirectX application.
CCD APIs

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections describe the CCD APIs and show how to use them in some example code:
CCD Summaries and Scenarios
CCD Example Code
CCD Summaries and Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections summarize how a caller uses each CCD API and provide scenarios for using those CCD APIs:
QueryDisplayConfig Summary and Scenarios
GetDisplayConfigBufferSizes Summary and Scenarios
SetDisplayConfig Summary and Scenarios
DisplayConfigGetDeviceInfo Summary and Scenarios
DisplayConfigSetDeviceInfo Summary and Scenarios
QueryDisplayConfig Summary and Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections summarize how a caller uses the QueryDisplayConfig CCD function and provide scenarios
for using QueryDisplayConfig.
QueryDisplayConfig Summary
The caller can use QueryDisplayConfig to enumerate any of the following information:
All of the individual paths that are possible for the current set of connected monitors. The caller can then
combine the paths to construct possible topologies.
All of the paths that are currently active.
The active paths as they are currently defined in the persistence database for the set of connected displays.
The source and target mode along with orientation, scaling, layout, and connector type on a per-path basis.
The hot-key options that the current topology maps to.
QueryDisplayConfig Scenarios
QueryDisplayConfig is called in the following scenarios:
The display control panel applet calls QueryDisplayConfig to populate the Control Panel's user interface
with the current applied topology when the Control Panel first starts. The current applied topology includes
those displays on which forced projection is enabled.
The display control panel applet calls QueryDisplayConfig to enumerate all of the possible paths to
populate the multimon drop-down box.
Before the Control Panel user interface starts, the display hot key calls QueryDisplayConfig to obtain the
display option (that is, clone, internal, external, or extended) that is currently set.
A third party application might call QueryDisplayConfig to query the current settings that are stored in the
database for the set of connected displays.
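As a brief sketch of the database-query scenario above (assuming <windows.h> and <vector> are included),
QueryDisplayConfig can report both the persisted paths and which hot-key topology the current configuration
maps to:

UINT32 numPaths = 0, numModes = 0;
GetDisplayConfigBufferSizes(QDC_DATABASE_CURRENT, &numPaths, &numModes);

std::vector<DISPLAYCONFIG_PATH_INFO> paths(numPaths);
std::vector<DISPLAYCONFIG_MODE_INFO> modes(numModes);
DISPLAYCONFIG_TOPOLOGY_ID currentTopology = {};

// With QDC_DATABASE_CURRENT, the final parameter receives the hot-key option
// (internal, clone, extend, or external) that the current topology corresponds to.
LONG status = QueryDisplayConfig(QDC_DATABASE_CURRENT, &numPaths, paths.data(),
                                 &numModes, modes.data(), &currentTopology);

if (status == ERROR_SUCCESS && currentTopology == DISPLAYCONFIG_TOPOLOGY_CLONE)
{
    // The connected displays are currently in clone view.
}
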
GetDisplayConfigBufferSizes Summary and Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections summarize how a caller uses GetDisplayConfigBufferSizes CCD function and provide
scenarios for using GetDisplayConfigBufferSizes.
GetDisplayConfigBufferSizes Summary
The caller can use GetDisplayConfigBufferSizes to obtain information that the caller requires for the
QueryDisplayConfig CCD function.
GetDisplayConfigBufferSizes Scenarios
GetDisplayConfigBufferSizes is always called before calling QueryDisplayConfig.
SetDisplayConfig Summary and Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections summarize how a caller uses the SetDisplayConfig CCD function and provide scenarios for
using SetDisplayConfig.
SetDisplayConfig Summary
The caller can use SetDisplayConfig to apply a topology along with other display settings. That is, the caller can
use SetDisplayConfig to set the topology, layout, orientation, aspect ratio, bit depth, and so on. The caller can use
SetDisplayConfig to perform the following operations:
Set a particular topology of sources and targets.
Define the source and target mode for each path along with layout, orientation, and scaling factor.
Update the database while applying the display settings.
Test whether a particular topology that was constructed by using enumerated paths is possible.
Directly apply the last known setting from the database that maps to one of the four options from the hot
key.
Enable forced projection on a target.
Invoke the new operating system best mode logic.
SetDisplayConfig Scenarios
SetDisplayConfig is called in the following scenarios:
The display control panel applet calls SetDisplayConfig to test all the possible options to populate the
multimon drop-down box.
The display control panel applet calls SetDisplayConfig to apply the setting that a user selected from the
drop-down menu.
The display control panel applet calls SetDisplayConfig to apply the settings that a user selected from the
user interface. These settings include resolution, layout, orientation, scaling, primary, bit depth, and refresh
rate.
After the user makes a selection, the display hot key calls SetDisplayConfig to apply the appropriate setting
from the persistence database.
Tasks under the Control Panel user interface call SetDisplayConfig to apply the appropriate setting, which
is based on the type of the task.
The display control panel applet calls SetDisplayConfig to start or stop forced projection on a particular
target.
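For the hot-key scenario above, a sketch of applying a topology that is persisted in the CCD database, without
supplying any path or mode arrays, could look like this:

// Apply the clone topology that is persisted in the CCD database for the currently
// connected displays. SDC_TOPOLOGY_INTERNAL, SDC_TOPOLOGY_EXTEND, and
// SDC_TOPOLOGY_EXTERNAL select the other hot-key options.
LONG status = SetDisplayConfig(0, nullptr, 0, nullptr, SDC_APPLY | SDC_TOPOLOGY_CLONE);
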
DisplayConfigGetDeviceInfo Summary and Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections summarize how a caller uses the DisplayConfigGetDeviceInfo CCD function and provide
scenarios for using DisplayConfigGetDeviceInfo.
DisplayConfigGetDeviceInfo Summary
The caller can use DisplayConfigGetDeviceInfo to obtain more friendly names to display in the user interface.
The caller can obtain names for the adapter, the source, and the target. The caller can also use
DisplayConfigGetDeviceInfo to obtain the native resolution of the connected display device.
DisplayConfigGetDeviceInfo Scenarios
DisplayConfigGetDeviceInfo is called in the following scenarios:
The display control panel applet calls DisplayConfigGetDeviceInfo to obtain the monitor name to display
in the drop-down menu that lists all the connected monitors.
The display control panel applet calls DisplayConfigGetDeviceInfo to obtain the name of the adapters that
are connected to the system.
The display control panel applet calls DisplayConfigGetDeviceInfo to obtain the native resolution of each
connected monitor so the resolution can be highlighted in the user interface.
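A sketch of retrieving the monitor friendly name for a path might look like the following; path is assumed to be
a DISPLAYCONFIG_PATH_INFO element that was returned by an earlier QueryDisplayConfig call:

DISPLAYCONFIG_TARGET_DEVICE_NAME targetName = {};
targetName.header.type      = DISPLAYCONFIG_DEVICE_INFO_GET_TARGET_NAME;
targetName.header.size      = sizeof(targetName);
targetName.header.adapterId = path.targetInfo.adapterId;
targetName.header.id        = path.targetInfo.id;

LONG status = DisplayConfigGetDeviceInfo(&targetName.header);
if (status == ERROR_SUCCESS)
{
    // targetName.monitorFriendlyDeviceName now holds the name to show in the UI.
}
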
DisplayConfigSetDeviceInfo Summary and Scenarios

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following sections summarize how a caller uses the DisplayConfigSetDeviceInfo CCD function and provide
scenarios for using DisplayConfigSetDeviceInfo.
DisplayConfigSetDeviceInfo Summary
The caller can use DisplayConfigSetDeviceInfo to set the properties of a target. Currently,
DisplayConfigSetDeviceInfo can be used only to start and stop boot-persistent forced projection on an analog target.
DisplayConfigSetDeviceInfo Scenarios
DisplayConfigSetDeviceInfo is called in the following scenarios:
Suppose that a user used an S-video or composite connector to connect a television and that the operating
system is unable to detect the television. The display control panel applet can call
DisplayConfigSetDeviceInfo to force the output on the connector.
Suppose that a user used a switchbox or KVM switch and that the operating system is unable to read the
EDID from the monitor. The display control panel applet can call DisplayConfigSetDeviceInfo to force the
output on the connector and set a resolution.
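A sketch of the first scenario, turning on boot-persistent forced projection for a target, might use the
DISPLAYCONFIG_SET_TARGET_PERSISTENCE structure as follows; adapterId and targetId are assumed to come from an
earlier path enumeration:

DISPLAYCONFIG_SET_TARGET_PERSISTENCE targetPersistence = {};
targetPersistence.header.type      = DISPLAYCONFIG_DEVICE_INFO_SET_TARGET_PERSISTENCE;
targetPersistence.header.size      = sizeof(targetPersistence);
targetPersistence.header.adapterId = adapterId;
targetPersistence.header.id        = targetId;
targetPersistence.bootPersistenceOn = 1;   // 1 starts forced output; 0 stops it

LONG status = DisplayConfigSetDeviceInfo(&targetPersistence.header);
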
CCD Example Code

This section applies only to Windows 7 and later, and Windows Server 2008 R2 and later versions of the Windows
operating system.
The following pseudocode shows how to use the CCD APIs to set clone view:

SetCloneView
{
// Determine the size of the path array that is required to hold all valid paths.
Call GetDisplayConfigBufferSizes(QDC_ALL_PATHS) to retrieve the sizes of
the DISPLAYCONFIG_PATH_INFO and DISPLAYCONFIG_MODE_INFO buffers that are required.

// Allocate memory for path and mode information arrays.


Allocate PathArraySize*sizeof(DISPLAYCONFIG_PATH_INFO) for the path information array
Allocate ModeArraySize*sizeof(DISPLAYCONFIG_MODE_INFO) for the mode information array.

// Request all of the path information.


Call QueryDisplayConfig(QDC_ALL_PATHS) to obtain the path and mode information for all possible paths.

// Find and store the primary path.


Search the DISPLAYCONFIG_PATH_INFO array for an active path that is located at desktop position (0, 0).

// Determine the user friendly name of the current primary.


Call DisplayConfigGetDeviceInfo() by using the
DISPLAYCONFIG_DEVICE_INFO_GET_TARGET_NAME type and the adapter ID and target ID
from the DISPLAYCONFIG_PATH_TARGET_INFO of the primary path.

// DisplayConfigGetDeviceInfo can determine the user friendly names


// for all of the paths that might be part of the clone.
// Allow the user to pick which monitor the clone is enabled on.
// Only provide the user options of the paths from the current primary
// to targets with monitors that are connected or that are forceable.
Store a pointer to the DISPLAYCONFIG_PATH_INFO that the user picked.

// Mark the new path as active.


Set the DISPLAYCONFIG_PATH_ACTIVE in the DISPLAYCONFIG_PATH_INFO.flags of the new clone path.
NewClonePath->flags |= DISPLAYCONFIG_PATH_ACTIVE;
NewClonePath->sourceInfo.modeInfoIdx = DISPLAYCONFIG_PATH_MODE_IDX_INVALID;
NewClonePath->targetInfo.modeInfoIdx = DISPLAYCONFIG_PATH_MODE_IDX_INVALID;

// Set the new topology.


Call SetDisplayConfig
(SDC_APPLY | SDC_SAVE_TO_DATABASE | SDC_ALLOW_CHANGES | SDC_USE_SUPPLIED_DISPLAY_CONFIG)
to change to the clone topology.
}



CCD DDIs

The Connecting and Configuring Displays (CCD) feature introduced with Windows 7 provides for improved display
miniport driver control of display devices. The following reference topics describe the CCD device driver interfaces
(DDIs) that are available to developers of display miniport drivers:
System-Implemented Functions
DXGK_MONITOR_INTERFACE_V2::pfnGetAdditionalMonitorModeSet
DXGK_MONITOR_INTERFACE_V2::pfnReleaseAdditionalMonitorModeSet
Driver-Implemented Function
The following function must be implemented by display miniport drivers that support CCD:
DxgkDdiQueryVidPnHWCapability
Structures
D3DKMDT_VIDPN_HW_CAPABILITY
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT
D3DKMT_POLLDISPLAYCHILDREN
D3DKMT_RENDERFLAGS
DISPLAYID_DETAILED_TIMING_TYPE_I_ASPECT_RATIO
DXGKARG_QUERYVIDPNHWCAPABILITY
DXGK_MONITOR_INTERFACE_V2
DXGK_PRESENTATIONCAPS
DXGK_TARGETMODE_DETAIL_TIMING
Enumerations
D3DKMDT_VIDPN_PRESENT_PATH_SCALING
DISPLAYID_DETAILED_TIMING_TYPE_I_ASPECT_RATIO
DISPLAYID_DETAILED_TIMING_TYPE_I_SCANNING_MODE
DISPLAYID_DETAILED_TIMING_TYPE_I_STEREO_MODE
DISPLAYID_DETAILED_TIMING_TYPE_I_SYNC_POLARITY
For more details on how to implement CCD in your display miniport driver, see the following topics:
Obtaining Additional Monitor Target Modes
Using Aspect Ratio and Custom Scaling Modes
System Calls to Recommend VidPN Topology
ACPI Keyboard Shortcut Logic
Querying VidPN Hardware Capabilities
Obtaining Additional Monitor Target Modes

Beginning with Windows 7, a new monitor interface is available, DXGK_MONITOR_INTERFACE_V2. It provides two
additional functions that are not in the original DXGK_MONITOR_INTERFACE interface:
pfnGetAdditionalMonitorModeSet
pfnReleaseAdditionalMonitorModeSet
These functions provide a dynamic and scalable way for a display miniport driver to add target modes to the VidPN
target. In comparison, the DXGK_MONITOR_INTERFACE interface provides only a static list of target modes. Using
these functions, the driver can query the operating system for a list of additional modes that it should enumerate.
The driver can validate the requested modes and reject those that the monitor does not support.
When the display miniport driver receives a call to the driver-implemented
DxgkDdiEnumVidPnCofuncModality function to enumerate target modes,
it should use the following procedure to add compatible timing information to the target mode set:
1. Return the filtered additional target modes that it obtains when it calls
pfnGetAdditionalMonitorModeSet. It should also return the regular target modes, as described in
Enumerating Cofunctional VidPN Source and Target Modes.
2. The pfnGetAdditionalMonitorModeSet function will return the following:
ppAdditionalModesSet, a list of additional timing modes in DXGK_TARGETMODE_DETAIL_TIMING
format.
pNumberModes, the number of timing modes.
3. Iterate through all of these timing modes.
4. Filter out all incompatible timing modes and any regular modes that were already supplied during the call to
DxgkDdiEnumVidPnCofuncModality.
5. Convert the remaining timing modes to D3DKMDT_VIDPN_TARGET_MODE type.
6. Add all of the remaining timing modes to the VidPN target mode set.
7. Call pfnReleaseAdditionalMonitorModeSet to release the additional timing mode list that was returned
from pfnGetAdditionalMonitorModeSet.
The display miniport driver should add all additional timing modes that are supported by the hardware to the
VidPN source mode set and the target mode set. When the display mode manager (DMM) generates a mode list, all
display modes, including additional timing modes, that are not supported by the monitor are indicated as not being
supported by the monitor and appear only in the raw mode list. Regardless of whether a monitor is connected or
not, the miniport driver should report all VidPN source and target mode sets that are supported by the monitor. A
driver that reports only monitor-supported modes must also report the additional modes that are not supported by
the currently connected monitor.
CRT Monitors
For CRT monitors, DMM adds as an additional target mode the 640 x 480 x 60Hz standard monitor timing that is
defined in the Video Electronics Standards Association (VESA) specification, VESA and Industry Standards and
Guidelines for Computer Display Monitor Timing version 1.0.
DTV and HDTV Monitors
For Digital Television (DTV) and High-Definition Television (HDTV) monitors, DMM adds as additional target modes
all the standard DTV modes that are required by the WHCK Automated Test GRAPHICS-0043, as shown in the
following tables. A display miniport driver should prune all modes that are not supported by the display hardware.
59.94Hz DTV System:

DTV FORMAT                                       HDTV FORMAT

640 x 480p x 59.94Hz, Aspect Ratio 4:3           640 x 480p x 59.94Hz, Aspect Ratio 4:3
720(1440) x 480i x 59.94Hz, Aspect Ratio 4:3     720(1440) x 480i x 59.94Hz, Aspect Ratio 4:3
720(1440) x 480i x 59.94Hz, Aspect Ratio 16:9    720(1440) x 480i x 59.94Hz, Aspect Ratio 16:9
720 x 480p x 59.94Hz, Aspect Ratio 4:3           720 x 480p x 59.94Hz, Aspect Ratio 4:3
720 x 480p x 59.94Hz, Aspect Ratio 16:9          720 x 480p x 59.94Hz, Aspect Ratio 16:9
                                                 1280 x 720p x 59.94Hz, Aspect Ratio 16:9
                                                 1920 x 1080i x 59.94Hz, Aspect Ratio 16:9
                                                 1920 x 1080p x 59.94Hz, Aspect Ratio 16:9

50Hz DTV System:

DTV FORMAT                                       HDTV FORMAT

640 x 480p x 59.94Hz, Aspect Ratio 4:3           640 x 480p x 59.94Hz, Aspect Ratio 4:3
720(1440) x 576i x 50Hz, Aspect Ratio 4:3        720(1440) x 576i x 50Hz, Aspect Ratio 4:3
720(1440) x 576i x 50Hz, Aspect Ratio 16:9       720(1440) x 576i x 50Hz, Aspect Ratio 16:9
720 x 576p x 50Hz, Aspect Ratio 4:3              720 x 576p x 50Hz, Aspect Ratio 4:3
720 x 576p x 50Hz, Aspect Ratio 16:9             720 x 576p x 50Hz, Aspect Ratio 16:9
                                                 1280 x 720p x 50Hz, Aspect Ratio 16:9
                                                 1920 x 1080i x 50Hz, Aspect Ratio 16:9
                                                 1920 x 1080p x 50Hz, Aspect Ratio 16:9

Miniport drivers written for Windows Vista should continue to conform with the WHCK Automated Test
GRAPHICS-0043 and add the additional DTV modes specified in these tables. Drivers written for Windows 7 only
have to support the new pfnGetAdditionalMonitorModeSet and pfnReleaseAdditionalMonitorModeSet
functions.
Using Aspect Ratio and Custom Scaling Modes

To support aspect-ratio-preserving stretched scaling and custom scaling modes available beginning with Windows
7 (where DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_WIN7), the following
capabilities are added to VidPN present path data used by display miniport drivers:
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT structure:
AspectRatioCenteredMax and Custom members
D3DKMDT_VIDPN_PRESENT_PATH_SCALING enumeration:
D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX and D3DKMDT_VPPS_CUSTOM values
Specifying Scaling Modes
The behavior and appearance of the desktop on the monitor using these scaling modes is described in Scaling the
Desktop Image. When the display mode manager (DMM) calls the DxgkDdiEnumVidPnCofuncModality function,
the driver must set the members of D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT according to the
types of scaling that the VidPN present path supports, as follows:
Identity Scaling
If the path can display content with no transformation, set the Identity member of
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT to a nonzero value. When
DxgkDdiEnumVidPnCofuncModality is called, set the Scaling member of the
D3DKMDT_VIDPN_PRESENT_PATH_TRANSFORMATION structure to D3DKMDT_VPPS_IDENTITY.
Centered Scaling
If the path can display content unscaled and centered on the target, set
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT.Centered. When DxgkDdiEnumVidPnCofuncModality
is called, set D3DKMDT_VIDPN_PRESENT_PATH_TRANSFORMATION.Scaling to
D3DKMDT_VPPS_CENTERED.
Stretched Scaling
If the path can display content that is scaled to fit the target while not preserving the aspect ratio of the source, set
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT.Stretched. When DxgkDdiEnumVidPnCofuncModality
is called, set D3DKMDT_VIDPN_PRESENT_PATH_TRANSFORMATION.Scaling to
D3DKMDT_VPPS_STRETCHED.
Aspect-Ratio-Preserving Stretched Scaling
If the path can scale source content to fit the target while preserving the aspect ratio of the source, set
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT.AspectRatioCenteredMax. When
DxgkDdiEnumVidPnCofuncModality is called, set
D3DKMDT_VIDPN_PRESENT_PATH_TRANSFORMATION.Scaling to
D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX.
Custom Scaling
If the path can display one or more scaling modes that are not described by the other
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT structure members, set
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT.Custom. When DxgkDdiEnumVidPnCofuncModality is
called, set D3DKMDT_VIDPN_PRESENT_PATH_TRANSFORMATION.Scaling to D3DKMDT_VPPS_CUSTOM.
Independent hardware vendors (IHVs) can use private escape values to inform the driver how to interpret custom
scaling on a given target.
If the current pinned target and source modes have the same aspect ratio but are different sizes, the display
miniport driver should set only the Stretched and Centered members. In this case DMM will clear any nonzero
value of the AspectRatioCenteredMax member.
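For illustration, when a Windows 7 display miniport driver builds up a path in its
DxgkDdiEnumVidPnCofuncModality implementation, the scaling support and pinned scaling might be reported along
these lines; this is a sketch rather than a complete implementation, and pPath is assumed to point to the
D3DKMDT_VIDPN_PRESENT_PATH structure that is being populated:

// Report which scaling transformations this path supports.
pPath->ContentTransformation.ScalingSupport.Identity               = 1;
pPath->ContentTransformation.ScalingSupport.Centered               = 1;
pPath->ContentTransformation.ScalingSupport.Stretched              = 1;
pPath->ContentTransformation.ScalingSupport.AspectRatioCenteredMax = 1;  // Windows 7 and later only

// If aspect-ratio-preserving scaling is the mode in use on this path:
pPath->ContentTransformation.Scaling = D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX;
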
API to DDI Scaling
The correspondence of user-mode API scaling values to the display miniport driver DDI scaling values in the
D3DKMDT_VIDPN_PRESENT_PATH_SCALING enumeration is shown in the following table.

SETDISPLAYCONFIG API SCALING VALUE DDI SCALING VALUE

DC_IDENTITY D3DKMDT_VPPS_IDENTITY

DC_CENTERED D3DKMDT_VPPS_CENTERED

DC_STRETCHED D3DKMDT_VPPS_STRETCHED

DC_ASPRATIOMAX D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX

DC_CUSTOM D3DKMDT_VPPS_CUSTOM

DC_PREFERRED D3DKMDT_VPPS_PREFERRED

This mapping can be used with the tables in Scaling the Desktop Image to understand how user-mode scaling
types are translated into DDI scaling types that are sent to the display miniport driver.
Scaling and Driver Versions
The behavior of different display miniport driver versions running on different versions of the operating system is
as follows.

Windows Vista
Driver with DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_VISTA and < DXGKDDI_INTERFACE_VERSION_WIN7:
The driver has Windows Vista behavior.
Driver with DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_WIN7:
The driver must check the operating system version during initialization and should never expose or use the
AspectRatioCenteredMax and Custom members of D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT.
If the driver violates this requirement, DMM will ignore AspectRatioCenteredMax and Custom and will only
recognize the Identity, Centered, or Stretched members. If the driver attempts to pin the
D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX scaling mode on any VidPN path, DMM will return the status
code STATUS_GRAPHICS_INVALID_PATH_CONTENT_GEOMETRY_TRANSFORMATION and will treat this
scaling mode the same as full-screen stretch mode.

Windows 7
Driver with DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_VISTA and < DXGKDDI_INTERFACE_VERSION_WIN7:
The operating system clears the values of the AspectRatioCenteredMax and Custom members and assumes that
the driver does not support aspect-ratio-preserving stretched scaling and custom scaling modes. DMM will only set
the D3DKMDT_VPPS_IDENTITY, D3DKMDT_VPPS_STRETCHED, or D3DKMDT_VPPS_CENTERED scaling modes.
The driver behaves as on Windows Vista.
Driver with DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_WIN7:
The driver should support the AspectRatioCenteredMax member, and the operating system uses it from Control
Panel applications. The driver can optionally implement customized functionality by setting the Custom member.
DMM will always confirm that the driver interface is >= DXGKDDI_INTERFACE_VERSION_WIN7 before it attempts
to check and use the AspectRatioCenteredMax or Custom members of
D3DKMDT_VIDPN_PRESENT_PATH_SCALING_SUPPORT.
Important A display miniport driver that supports the D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX or
D3DKMDT_VPPS_CUSTOM values should never set a value of D3DKMDT_VPPS_NOTSPECIFIED.
Scaling With Multiple Adapters
The values of the scaling types D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX and
D3DKMDT_VPPS_CUSTOM introduced with Windows 7 are stored in the CCD connection database that is
associated with a graphics processing unit (GPU). If the user moves a monitor from one GPU with a driver that
supports these scaling members to another GPU, the second GPU might not be supported by the original driver. In
this case the operating system will map these scaling types to the system default scaling.
If both GPUs support the scaling types D3DKMDT_VPPS_ASPECTRATIOCENTEREDMAX and
D3DKMDT_VPPS_CUSTOM, and the driver for the first GPU implements the D3DKMDT_VPPS_CUSTOM custom
scaling request, then if the user switches the monitor to the second GPU, the driver for the second GPU will
probably not know how to interpret the custom scaling request. In this case the second driver should fail a call to
the DxgkDdiCommitVidPn function and should return the
STATUS_GRAPHICS_VIDPN_MODALITY_NOT_SUPPORTED status code; the operating system will map this
scaling type to the system default scaling.
System Calls to Recommend VidPN Topology

On a computer running Windows 7, the display mode manager (DMM) determines an appropriate VidPN topology
to apply using VidPN history data in the CCD database. DMM no longer determines the VidPN topology based
upon the last known good topology as it did in Windows Vista. Consequently, on Windows 7 DMM never calls the
DxgkDdiRecommendVidPnTopology function.
On Windows Vista and its service packs, DMM continues to call DxgkDdiRecommendVidPnTopology to request that
the driver provide a recommended functional VidPN topology.
ACPI Keyboard Shortcut Logic

Beginning with Windows 7, IHVs implement ACPI-based OEM-specific keyboard shortcuts. The operating system is
unaware of these keyboard shortcuts. On Windows 7, OEMs must use the CCD database to store and apply
keyboard shortcuts so that the operating system and any OEM applications are aware of each other.
The behavior of calls to the following functions has changed for drivers running on Windows 7:
DxgkDdiNotifyAcpiEvent and DxgkDdiRecommendFunctionalVidPn
If the display miniport driver receives a call to the DxgkDdiNotifyAcpiEvent function with the
DXGK_ACPI_CHANGE_DISPLAY_MODE flag set in the AcpiFlags parameter, DMM calls the
DxgkDdiRecommendFunctionalVidPn function to obtain the new VidPN and to compare against the current
client VidPN. If the topology of the two VidPNs is the same, DMM does not modify the new VidPN. Otherwise,
DMM removes mode information from the VidPN, leaving just the topology, and allows the CCD database to
determine the modes for the given topology. DMM then sets the display configuration based on the new VidPN.
D3DKMTInvalidateActiveVidPn
This function is supported on Windows Vista and later for display miniport drivers with version <
DXGKDDI_INTERFACE_VERSION_WIN7. Function behavior is identical to the behavior on Windows Vista.
This function is not supported on Windows 7 and later for display miniport drivers with version >=
DXGKDDI_INTERFACE_VERSION_WIN7. If called, the status code STATUS_NOT_SUPPORTED is returned.
Querying VidPN Hardware Capabilities

Beginning in Windows 7, display miniport drivers are required to report all hardware capabilities of a specified
functional VidPN. Drivers should support the following callback function and its associated structures:
DxgkDdiQueryVidPnHWCapability function
DXGKARG_QUERYVIDPNHWCAPABILITY structure
D3DKMDT_VIDPN_HW_CAPABILITY structure
When the driver reports the hardware capabilities, it should consider cloning to be an implicit procedure that is
done as part of rotation or scaling transformations: a source must first be cloned before it can be rotated or scaled.
If any of the members of D3DKMDT_VIDPN_HW_CAPABILITY have no meaning on the specified VidPN path, the
display mode manager (DMM) will not report any errors if the members are set to nonzero values. DMM will clear
all such values before reporting them to the user-mode client. However, the driver is required to set the value of
the Reserved member of D3DKMDT_VIDPN_HW_CAPABILITY to 0.
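As a minimal sketch only (the full parameter list, annotations, and structure member names are defined in the
DxgkDdiQueryVidPnHWCapability and DXGKARG_QUERYVIDPNHWCAPABILITY reference topics; the capability member is
shown here as VidPnHWCaps), a driver whose hardware performs rotation, scaling, and cloning itself might fill in
the capability structure as follows:

NTSTATUS APIENTRY DxgkDdiQueryVidPnHWCapability(
    _In_ const HANDLE hAdapter,
    _Inout_ DXGKARG_QUERYVIDPNHWCAPABILITY* pVidPnHWCaps)
{
    UNREFERENCED_PARAMETER(hAdapter);

    // 0 means the hardware performs the transformation; a nonzero value means
    // the driver performs it (for example, with an intermediate blit).
    pVidPnHWCaps->VidPnHWCaps.DriverRotation = 0;
    pVidPnHWCaps->VidPnHWCaps.DriverScaling  = 0;
    pVidPnHWCaps->VidPnHWCaps.DriverCloning  = 0;
    pVidPnHWCaps->VidPnHWCaps.Reserved       = 0;   // must be 0

    return STATUS_SUCCESS;
}
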
Example Scenario
To show how the display miniport driver should report hardware capabilities, consider the following example set of
hardware configurations P1, P2, and P3:
P1: Surface is cloned from Source S1, then rotated 90 degrees and scaled to fit the target.
P2: Surface is cloned from Source S1, with no applied transformation.
P3: Source S2 has no applied transformation.
When DxgkDdiQueryVidPnHWCapability is called, the driver should return values for the rotation, scaling, and
cloning members of D3DKMDT_VIDPN_HW_CAPABILITY according to the following table:
Returned values for the DriverRotation, DriverScaling, and DriverCloning members of
D3DKMDT_VIDPN_HW_CAPABILITY for each VidPN path (DriverRotation / DriverScaling / DriverCloning):

Hardware can perform all rotation, scaling, and cloning transformations.
    P1: 0 / 0 / 0
    P2: 0 / 0 / 0
    P3: 0 / 0 / 0

Hardware can perform all transformations except cloning.
    P1: 0 / 0 / 0
    P2: 0 / 0 / 1
    P3: 0 / 0 / 0

Hardware can perform cloning and scaling transformations, but not rotation. The driver performs rotation by using
an intermediate rotation blit.
    P1: 1 / 0 / 0
    P2: 0 / 0 / 0
    P3: 0 / 0 / 0

Hardware cannot perform cloning, scaling, or rotation transformations. These operations are performed by the
driver.
    P1: 1 / 1 / 0
    P2: 0 / 0 / 1
    P3: 0 / 0 / 0
Example code for displaying an app on a portrait
device

Here is code that you can use to make your app display correctly on a portrait device.

//
// This file contains utility functions for use in desktop applications for getting the current
// orientation as Landscape/Portrait/LandscapeFlipped/PortraitFlipped (abbr: L/P/LF/PF). These
// functions are most helpful for use with APIs which expect one of these values, while the APIs
// for retrieving all return the rotation in degrees (0/90/180/270). There is not a direct mapping
// between these two forms since 0 degrees means portrait on portrait-native devices and landscape
// on landscape-native devices.
//

#include <windows.h>
#include <iostream>
#include <new>       // for std::nothrow

enum ORIENTATION
{
INVALID,
LANDSCAPE,
PORTRAIT,
LANDSCAPE_FLIPPED,
PORTRAIT_FLIPPED
};

// Maps the current rotation from 0/90/180/270 to L/P/LF/PF using the unrotated
// resolution to guess at what the native orientation is.
ORIENTATION GetOrientationFromCurrentMode(_In_ PCWSTR pszDeviceName)
{
DEVMODEW CurrentMode = {};
CurrentMode.dmSize = sizeof(CurrentMode);
if (!EnumDisplaySettingsW(pszDeviceName,
ENUM_CURRENT_SETTINGS,
&CurrentMode))
{
// Error condition, likely invalid device name, could log error
// HRESULT hr = HRESULT_FROM_WIN32(GetLastError());
return INVALID;
}

if ((CurrentMode.dmDisplayOrientation == DMDO_90) ||
(CurrentMode.dmDisplayOrientation == DMDO_270))
{
DWORD temp = CurrentMode.dmPelsHeight;
CurrentMode.dmPelsHeight = CurrentMode.dmPelsWidth;
CurrentMode.dmPelsWidth = temp;
}

if (CurrentMode.dmPelsWidth < CurrentMode.dmPelsHeight)


{
switch (CurrentMode.dmDisplayOrientation)
{
case DMDO_DEFAULT: return PORTRAIT;
case DMDO_90: return LANDSCAPE_FLIPPED;
case DMDO_180: return PORTRAIT_FLIPPED;
case DMDO_270: return LANDSCAPE;
default: return INVALID;
}
}
else
{
switch (CurrentMode.dmDisplayOrientation)
{
case DMDO_DEFAULT: return LANDSCAPE;
case DMDO_90: return PORTRAIT;
case DMDO_180: return LANDSCAPE_FLIPPED;
case DMDO_270: return PORTRAIT_FLIPPED;
default: return INVALID;
}
}
}

// Overloaded function accepts an HMONITOR and converts to DeviceName


ORIENTATION GetOrientationFromCurrentMode(HMONITOR hMonitor)
{
// Get the name of the 'monitor' being requested
MONITORINFOEXW ViewInfo;
RtlZeroMemory(&ViewInfo, sizeof(ViewInfo));
ViewInfo.cbSize = sizeof(ViewInfo);
if (!GetMonitorInfoW(hMonitor, &ViewInfo))
{
// Error condition, likely invalid monitor handle, could log error
// HRESULT hr = HRESULT_FROM_WIN32(GetLastError());
return INVALID;
}
else
{
return GetOrientationFromCurrentMode(ViewInfo.szDevice);
}
}

// Returns true if this is an integrated display panel e.g. the screen attached to tablets or laptops.
bool IsInternalVideoOutput(const DISPLAYCONFIG_VIDEO_OUTPUT_TECHNOLOGY VideoOutputTechnologyType)
{
switch (VideoOutputTechnologyType)
{
case DISPLAYCONFIG_OUTPUT_TECHNOLOGY_INTERNAL:
case DISPLAYCONFIG_OUTPUT_TECHNOLOGY_DISPLAYPORT_EMBEDDED:
case DISPLAYCONFIG_OUTPUT_TECHNOLOGY_UDI_EMBEDDED:
return TRUE;

default:
return FALSE;
}
}

// Given a target on an adapter, returns whether it is a natively portrait display


bool IsNativeOrientationPortrait(const LUID AdapterLuid, const UINT32 TargetId)
{
DISPLAYCONFIG_TARGET_PREFERRED_MODE PreferredMode;
PreferredMode.header.type = DISPLAYCONFIG_DEVICE_INFO_GET_TARGET_PREFERRED_MODE;
PreferredMode.header.size = sizeof(PreferredMode);
PreferredMode.header.adapterId = AdapterLuid;
PreferredMode.header.id = TargetId;

HRESULT hr = HRESULT_FROM_WIN32(DisplayConfigGetDeviceInfo(&PreferredMode.header));
if (FAILED(hr))
{
// Error condition, assume natively landscape
return false;
}

return (PreferredMode.height > PreferredMode.width);


}

// Note: Since an hmon can represent multiple monitors while in clone, this function as written will return
// the value for the internal monitor if one exists, and otherwise the highest clone-path priority.
HRESULT GetPathInfo(_In_ PCWSTR pszDeviceName, _Out_ DISPLAYCONFIG_PATH_INFO* pPathInfo)
{
HRESULT hr = S_OK;
UINT32 NumPathArrayElements = 0;
UINT32 NumModeInfoArrayElements = 0;
DISPLAYCONFIG_PATH_INFO* PathInfoArray = nullptr;
DISPLAYCONFIG_MODE_INFO* ModeInfoArray = nullptr;

do
{
// In case this isn't the first time through the loop, delete the buffers allocated
delete[] PathInfoArray;
PathInfoArray = nullptr;

delete[] ModeInfoArray;
ModeInfoArray = nullptr;

hr = HRESULT_FROM_WIN32(GetDisplayConfigBufferSizes(QDC_ONLY_ACTIVE_PATHS, &NumPathArrayElements,
&NumModeInfoArrayElements));
if (FAILED(hr))
{
break;
}

PathInfoArray = new(std::nothrow) DISPLAYCONFIG_PATH_INFO[NumPathArrayElements];


if (PathInfoArray == nullptr)
{
hr = E_OUTOFMEMORY;
break;
}

ModeInfoArray = new(std::nothrow) DISPLAYCONFIG_MODE_INFO[NumModeInfoArrayElements];


if (ModeInfoArray == nullptr)
{
hr = E_OUTOFMEMORY;
break;
}

hr = HRESULT_FROM_WIN32(QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS, &NumPathArrayElements, PathInfoArray,


&NumModeInfoArrayElements, ModeInfoArray, nullptr));
}while (hr == HRESULT_FROM_WIN32(ERROR_INSUFFICIENT_BUFFER));

INT DesiredPathIdx = -1;

if (SUCCEEDED(hr))
{
// Loop through all sources until the one which matches the 'monitor' is found.
for (UINT PathIdx = 0; PathIdx < NumPathArrayElements; ++PathIdx)
{
DISPLAYCONFIG_SOURCE_DEVICE_NAME SourceName = {};
SourceName.header.type = DISPLAYCONFIG_DEVICE_INFO_GET_SOURCE_NAME;
SourceName.header.size = sizeof(SourceName);
SourceName.header.adapterId = PathInfoArray[PathIdx].sourceInfo.adapterId;
SourceName.header.id = PathInfoArray[PathIdx].sourceInfo.id;

hr = HRESULT_FROM_WIN32(DisplayConfigGetDeviceInfo(&SourceName.header));
if (SUCCEEDED(hr))
{
if (wcscmp(pszDeviceName, SourceName.viewGdiDeviceName) == 0)
{
// Found the source which matches this hmonitor. The paths are given in path-priority order
// so the first found is the most desired, unless we later find an internal.
if (DesiredPathIdx == -1 ||
IsInternalVideoOutput(PathInfoArray[PathIdx].targetInfo.outputTechnology))
{
DesiredPathIdx = PathIdx;
}
}
}
}
}

if (DesiredPathIdx != -1)
{
*pPathInfo = PathInfoArray[DesiredPathIdx];
}
else
{
hr = E_INVALIDARG;
}

delete[] PathInfoArray;
PathInfoArray = nullptr;

delete[] ModeInfoArray;
ModeInfoArray = nullptr;

return hr;
}

// Overloaded function accepts an HMONITOR and converts to DeviceName


HRESULT GetPathInfo(HMONITOR hMonitor, _Out_ DISPLAYCONFIG_PATH_INFO* pPathInfo)
{
HRESULT hr = S_OK;

// Get the name of the 'monitor' being requested


MONITORINFOEXW ViewInfo;
RtlZeroMemory(&ViewInfo, sizeof(ViewInfo));
ViewInfo.cbSize = sizeof(ViewInfo);
if (!GetMonitorInfoW(hMonitor, &ViewInfo))
{
// Error condition, likely invalid monitor handle, could log error
hr = HRESULT_FROM_WIN32(GetLastError());
}

if (SUCCEEDED(hr))
{
hr = GetPathInfo(ViewInfo.szDevice, pPathInfo);
}

return hr;
}

// Note: This function returns S_FALSE if there is no internal target.
// Gets the path info for the integrated display panel, e.g. the screen attached to tablets or laptops.
HRESULT GetPathInfoForInternal(_Out_ DISPLAYCONFIG_PATH_INFO* pPathInfo)
{
HRESULT hr = S_OK;
UINT32 NumPathArrayElements = 0;
UINT32 NumModeInfoArrayElements = 0;
DISPLAYCONFIG_PATH_INFO* PathInfoArray = nullptr;
DISPLAYCONFIG_MODE_INFO* ModeInfoArray = nullptr;

do
{
// In case this isn't the first time through the loop, delete the buffers allocated
delete[] PathInfoArray;
PathInfoArray = nullptr;

delete[] ModeInfoArray;
ModeInfoArray = nullptr;

hr = HRESULT_FROM_WIN32(GetDisplayConfigBufferSizes(QDC_ONLY_ACTIVE_PATHS, &NumPathArrayElements,
&NumModeInfoArrayElements));
if (FAILED(hr))
{
break;
}
PathInfoArray = new(std::nothrow) DISPLAYCONFIG_PATH_INFO[NumPathArrayElements];
if (PathInfoArray == nullptr)
{
hr = E_OUTOFMEMORY;
break;
}

ModeInfoArray = new(std::nothrow) DISPLAYCONFIG_MODE_INFO[NumModeInfoArrayElements];
if (ModeInfoArray == nullptr)
{
hr = E_OUTOFMEMORY;
break;
}

hr = HRESULT_FROM_WIN32(QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS, &NumPathArrayElements, PathInfoArray,
&NumModeInfoArrayElements, ModeInfoArray, nullptr));
} while (hr == HRESULT_FROM_WIN32(ERROR_INSUFFICIENT_BUFFER));

if (SUCCEEDED(hr))
{
hr = S_FALSE;
RtlZeroMemory(pPathInfo, sizeof(*pPathInfo));

for (UINT PathIdx = 0; PathIdx < NumPathArrayElements; ++PathIdx)
{
if (IsInternalVideoOutput(PathInfoArray[PathIdx].targetInfo.outputTechnology))
{
// There's only one internal target on the system and we found it.
*pPathInfo = PathInfoArray[PathIdx];

hr = S_OK;
break;
}
}
}

delete[] PathInfoArray;
PathInfoArray = nullptr;

delete[] ModeInfoArray;
ModeInfoArray = nullptr;

return hr;
}

// Given a path info, this function will find the native orientation of the path and map 0/90/180/270 to L/P/LF/PF.
ORIENTATION GetOrientationFromPathInfo(_In_ const DISPLAYCONFIG_PATH_INFO* const pPathInfo)
{
bool IsNativelyPortrait = IsNativeOrientationPortrait(pPathInfo->targetInfo.adapterId, pPathInfo->targetInfo.id);
DISPLAYCONFIG_ROTATION CurrentRotation = pPathInfo->targetInfo.rotation;

if (IsNativelyPortrait)
{
switch (CurrentRotation)
{
case DISPLAYCONFIG_ROTATION_IDENTITY: return PORTRAIT;
case DISPLAYCONFIG_ROTATION_ROTATE90: return LANDSCAPE_FLIPPED;
case DISPLAYCONFIG_ROTATION_ROTATE180: return PORTRAIT_FLIPPED;
case DISPLAYCONFIG_ROTATION_ROTATE270: return LANDSCAPE;
default: return INVALID;
}
}
else
{
switch (CurrentRotation)
{
case DISPLAYCONFIG_ROTATION_IDENTITY: return LANDSCAPE;
case DISPLAYCONFIG_ROTATION_ROTATE90: return PORTRAIT;
case DISPLAYCONFIG_ROTATION_ROTATE180: return LANDSCAPE_FLIPPED;
case DISPLAYCONFIG_ROTATION_ROTATE270: return PORTRAIT_FLIPPED;
default: return INVALID;
}
}
}

// This function shows the use of each of the utility functions found above in a reasonable order of calling.
ORIENTATION GetOrientation(bool UseInternal)
{
DISPLAYCONFIG_PATH_INFO PathInfo = {};
HMONITOR hPrimaryMon = MonitorFromWindow(NULL, MONITOR_DEFAULTTOPRIMARY);

HRESULT hr = S_FALSE;
if (UseInternal)
{
hr = GetPathInfoForInternal(&PathInfo);
}

if ((hr == S_FALSE) || FAILED(hr))
{
// Could log an error on FAILED(hr), but whether legitimate failure or desktop system, try the primary monitor.
hr = GetPathInfo(hPrimaryMon, &PathInfo);
}

if (SUCCEEDED(hr))
{
return GetOrientationFromPathInfo(&PathInfo);
}
else
{
// In Windows 8.1 and previous operating systems, the GetPathInfo (and ForInternal) call will fail in a remote session;
// falling back to checking the current mode is the most appropriate thing to do in this situation.
return GetOrientationFromCurrentMode(hPrimaryMon);
}
}

void PrintOrientation(ORIENTATION Orientation)
{
switch (Orientation)
{
case INVALID: std::cout << "Error" << std::endl; break;
case LANDSCAPE: std::cout << "Landscape" << std::endl; break;
case PORTRAIT: std::cout << "Portrait" << std::endl; break;
case LANDSCAPE_FLIPPED: std::cout << "Landscape Flipped" << std::endl; break;
case PORTRAIT_FLIPPED: std::cout << "Portrait Flipped" << std::endl; break;
}
}

int __cdecl main(int argc, const char* argv[])
{
UNREFERENCED_PARAMETER(argc);
UNREFERENCED_PARAMETER(argv);
HRESULT hr = E_FAIL;

// Note: This MonitorFromWindow call should be modified if the orientation is needed for
// the monitor the application's window is currently on. It is also unnecessary if only
// the internal monitor is desired.
HMONITOR hPrimaryMon = MonitorFromWindow(NULL, MONITOR_DEFAULTTOPRIMARY);

// Print the orientation of the integrated panel.
{
DISPLAYCONFIG_PATH_INFO PathInfo = {};

hr = GetPathInfoForInternal(&PathInfo);
if (hr == S_FALSE)
{
std::cout << "No integrated panel found." << std::endl;
}
else if (SUCCEEDED(hr))
{
std::cout << "Integrated panel: ";
PrintOrientation(GetOrientationFromPathInfo(&PathInfo));
}
else
{
std::cout << "Error looking for internal monitor: " << hr << std::endl;
}
}

// Print the orientation of the primary monitor.
{
DISPLAYCONFIG_PATH_INFO PathInfo = {};

hr = GetPathInfo(hPrimaryMon, &PathInfo);
if (SUCCEEDED(hr))
{
std::cout << "Primary monitor: ";
PrintOrientation(GetOrientationFromPathInfo(&PathInfo));
}
else
{
std::cout << "Error getting path info for primary monitor: " << hr << std::endl;
}
}

// In Windows 8.1 and previous operating systems, GetPathInfo (and GetPathInfoForInternal) will fail in a remote
// session; falling back to checking the current mode is the most appropriate thing to do in this situation.
if (FAILED(hr))
{
std::cout << "Fallback based on current mode: ";
PrintOrientation(GetOrientationFromCurrentMode(hPrimaryMon));
}
}

Send comments about this topic to Microsoft


Wireless displays (Miracast)

Wireless (Miracast) displays can optionally be supported by Windows Display Driver Model (WDDM) 1.3 and later
drivers. This capability is new starting with Windows 8.1.
For more information on the requirements of drivers and hardware to support Miracast displays, refer to the
Building best-in-class Miracast solutions with Windows 10 guide and the relevant WHCK documentation at
Device.Graphics.WDDM13.DisplayRender.WirelessDisplay.

Miracast design guide


These design guide sections describe how display miniport drivers and Miracast user-mode drivers support
Miracast displays:
WDDM display miniport driver tasks to support Miracast wireless displays
Miracast user-mode driver tasks to support Miracast wireless displays
Reporting Miracast encode chunks and statistics
Calling DisplayConfig functions for a Miracast target

Miracast reference
These reference sections describe how to implement this capability in your drivers:
User-mode device driver interfaces (DDIs)
Wireless display callback functions called by Miracast user-mode drivers
All functions implemented by the operating system that can be called by Miracast user-mode drivers.
Wireless display functions implemented by Miracast user-mode drivers
All functions that Miracast user-mode drivers must implement in order to enable Miracast displays.
Wireless display (Miracast) structures and enumerations
All user-mode structures and enumerations that are used with Miracast display device driver interfaces (DDIs).
These additional user-mode structures and enumerations support Miracast displays and are new or updated for
Windows 8.1:
DISPLAYCONFIG_TARGET_BASE_TYPE (new)
DISPLAYCONFIG_VIDEO_SIGNAL_INFO (AdditionalSignalInfo child structure added)
DISPLAYCONFIG_DEVICE_INFO_TYPE (DISPLAYCONFIG_DEVICE_INFO_GET_TARGET_BASE_TYPE
constant added)
D3DKMDT_VIDEO_SIGNAL_INFO (AdditionalSignalInfo child structure added)
Kernel-mode DDIs
Wireless display (Miracast) display callback interface
All functions that are implemented by the Microsoft DirectX graphics kernel subsystem to support Miracast
displays.
Wireless display (Miracast) interface
All functions that are implemented by display miniport drivers that support Miracast displays.
These additional kernel-mode structures and enumerations support Miracast displays and are new or updated for
Windows 8.1:
DXGK_MIRACAST_CAPS (new)
D3DKMDT_VIDEO_OUTPUT_TECHNOLOGY (D3DKMDT_VOT_MIRACAST constant added)
D3DKMDT_VIDEO_SIGNAL_INFO (AdditionalSignalInfo child structure added)
DXGK_CHILD_STATUS (Miracast child structure added)
DXGK_CHILD_STATUS_TYPE (StatusMiracast constant added)
DXGKARGCB_NOTIFY_INTERRUPT_DATA (MiracastEncodeChunkCompleted child structure added)
Send comments about this topic to Microsoft
WDDM display miniport driver tasks to support
Miracast wireless displays

To support Miracast wireless displays, Windows Display Driver Model (WDDM) display miniport drivers that run in
kernel mode need to do the following tasks.

Supporting the Miracast interface


If the display miniport driver supports Miracast displays, it must report the
DXGK_MIRACAST_DISPLAY_INTERFACE structure, which has pointers to driver-implemented Miracast functions,
when the Microsoft DirectX graphics kernel subsystem calls the DxgkDdiQueryInterface function.
If the operating system’s DirectX graphics kernel subsystem (Dxgkrnl.sys) does not call the
DxgkDdiQueryInterface function to query the Miracast display interface, then it does not support Miracast wireless
displays, and the display miniport driver should not report any Miracast target.
The driver should not report more than one Miracast target on any full WDDM graphics device, otherwise the
operating system fails to start the adapter.
After Dxgkrnl calls DxgkDdiQueryInterface to query the Miracast display interface, the driver can then report the
target type as D3DKMDT_VOT_MIRACAST during device initialization when Dxgkrnl calls the
DxgkDdiQueryChildRelations function.
The Miracast target should remain in a disconnected state until Dxgkrnl starts a Miracast connected session. When
a Miracast session is starting, and a monitor is connected to the Miracast sink or the driver receives an I/O request
from the Miracast user-mode driver because a new monitor has connected to the Miracast sink, the display
miniport driver should report a monitor arrival hot-plug detection (HPD) awareness value to the operating system
by calling the DxgkCbIndicateChildStatus function. In this call the driver should set these values:

DXGK_CHILD_STATUS MEMBER VALUE

Type StatusMiracast constant value of the


DXGK_CHILD_STATUS_TYPE enumeration

Miracast.Connected TRUE

Miracast.MiracastMonitorType Value that indicates the connection type. If the Miracast sink is
embedded in the monitor or TV, this should be set to the
D3DKMDT_VOT_MIRACAST constant value of the
D3DKMDT_VIDEO_OUTPUT_TECHNOLOGY enumeration.
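
As a rough illustration, a display miniport driver could report the monitor arrival described in the preceding table with a call such as the following. This is a minimal sketch, not code from the operating system: it assumes the DXGKRNL_INTERFACE pointer and the Miracast target's child UID were saved when DxgkDdiStartDevice was called, and it omits error handling.

// Minimal sketch: report monitor arrival on the Miracast target.
// pDxgkInterface and MiracastChildUid are assumed to have been captured in DxgkDdiStartDevice.
DXGK_CHILD_STATUS ChildStatus = {};
ChildStatus.Type = StatusMiracast;
ChildStatus.ChildUid = MiracastChildUid;
ChildStatus.Miracast.Connected = TRUE;
// In this example the sink is embedded in the monitor or TV.
ChildStatus.Miracast.MiracastMonitorType = D3DKMDT_VOT_MIRACAST;

NTSTATUS Status = pDxgkInterface->DxgkCbIndicateChildStatus(pDxgkInterface->DeviceHandle, &ChildStatus);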

These are the Miracast functions that the display miniport driver implements:
DxgkDdiMiracastCreateContext
Creates a context to start a kernel-mode instance of a Miracast display device.
DxgkDdiMiracastDestroyContext
Destroys a kernel-mode instance of a Miracast display device.
DxgkDdiMiracastIoControl
Processes a synchronous I/O request that originates from a Miracast user-mode driver call to MiracastIoControl.
DxgkDdiMiracastQueryCaps
Queries the Miracast capabilities of the current display adapter.

Miracast session start


When the Miracast session has been started, the operating system calls the DxgkDdiQueryChildStatus function. The
display miniport driver should set DXGK_CHILD_STATUS.Type to a value of StatusMiracast and should use the
Miracast child structure in DXGK_CHILD_STATUS. If a monitor is connected to the Miracast sink, the driver should
set Miracast.Connected to TRUE.
The driver must specify the value of D3DKMDT_VIDEO_SIGNAL_INFO.VsyncFreqDivider, which is the ratio of
the VSync rate of a monitor that displays through a Miracast connected session to the VSync rate of the Miracast
sink. For example, if the Miracast sink vertical refresh rate is 240 Hz and the VSync interrupt frequency of the
connected display is 30 Hz, the driver should set VsyncFreqDivider to 8.

Handling interrupts for completed encode chunks


The data for a single frame transmitted across the wireless Miracast connection can be broken into one or more
encode chunks. Each time the GPU finishes encoding one of these chunks, it must generate an interrupt. In
response to this interrupt, the display miniport driver must call the DxgkCbNotifyInterrupt function and complete
the MiracastEncodeChunkCompleted child structure in the DXGKARGCB_NOTIFY_INTERRUPT_DATA
structure, including setting the DXGK_INTERRUPT_TYPE interrupt type to
DXGK_INTERRUPT_MICACAST_CHUNK_PROCESSING_COMPLETE.
As part of the interrupt handling, the driver can optionally specify the
MiracastEncodeChunkCompleted.pPrivateDriverData and PrivateDataDriverSize members in the
DXGKARGCB_NOTIFY_INTERRUPT_DATA structure. The user-mode driver can access this private driver data in
the MIRACAST_CHUNK_DATA.PrivateDriverData member.
If, over a period of time, the display miniport driver generates more packets with chunk data than the user-mode
display driver consumes, then the available free memory space for new chunks can run out. In this case the display
miniport driver returns STATUS_NO_MEMORY in MiracastEncodeChunkCompleted.Status, and it must call the
DxgkCbNotifyDpc function to notify the operating system’s GPU scheduler about the error condition. A call to
the GetNextChunkData function will return the STATUS_CONNECTION_RESET status code, and subsequent
calls will start receiving chunks that were submitted after the reset operation. Because some chunks will have been
lost, we recommend that the driver generate and transmit a new I-frame.
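
The following fragment is a minimal sketch of the completion notification described above, as it might appear in the driver's interrupt service routine. The interrupt type, the private-data members, and the Status member are named in this topic; the rest of the layout, and the helper that fills in the chunk info, are assumptions that should be checked against the WDK headers.

// Sketch only: notify the graphics kernel, from the ISR, that an encode chunk has completed.
DXGKARGCB_NOTIFY_INTERRUPT_DATA NotifyData = {};
NotifyData.InterruptType = DXGK_INTERRUPT_MICACAST_CHUNK_PROCESSING_COMPLETE;
// Hypothetical helper that fills the DXGK_MIRACAST_CHUNK_INFO (chunk ID, chunk type,
// processing time, and encode rate) from the hardware's completion record.
FillChunkInfoFromHardware(&NotifyData.MiracastEncodeChunkCompleted.ChunkInfo);
NotifyData.MiracastEncodeChunkCompleted.Status = STATUS_SUCCESS;
// Optional private data that the Miracast user-mode driver can later read from
// MIRACAST_CHUNK_DATA.PrivateDriverData.
NotifyData.MiracastEncodeChunkCompleted.pPrivateDriverData = nullptr;
NotifyData.MiracastEncodeChunkCompleted.PrivateDataDriverSize = 0;

pDxgkInterface->DxgkCbNotifyInterrupt(pDxgkInterface->DeviceHandle, &NotifyData);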

Restrictions on source modes


A WDDM display miniport driver, to handle constraints of the pixel pipeline, typically restricts source modes that
are exposed to the operating system. The driver does this by only populating the list of source modes with modes
that are exposed by the monitor that the pixel pipeline also supports. For example, the driver doesn’t modify the
EDID based on pixel pipeline constraints.
Similarly, for Miracast displays the display miniport driver restricts the set of source modes that are exposed to the
operating system when it enumerates the set of source and target modes. For Miracast displays the GPU encode
capabilities, network properties, and sink decode capabilities can reduce the number of source modes that the
Miracast pixel pipeline can support.
If a display miniport driver calls the DXGK_VIDPNSOURCEMODESET_INTERFACE::pfnAddMode function to attempt
to add a 3-D stereo mode to a source that’s connected to a Miracast target, the function call will fail.

Calling operating system-provided callback functions


These are the Miracast kernel-mode callback functions that the operating system provides:
DxgkCbMiracastSendMessage
Sends an asynchronous message to the user-mode display driver.
DxgkCbMiracastSendMessageCallback
Used in a call to DxgkCbMiracastSendMessage to specify the IO_STATUS_BLOCK structure for the completed
IRP.
DxgkCbReportChunkInfo
Reports info about an encode chunk.

Sending messages asynchronously from kernel-mode to user-mode


Any message that the display miniport driver sends to its associated user-mode driver when it calls the
DxgkCbMiracastSendMessage function won’t be delivered until the Miracast connected session has started.
Therefore, if the user-mode driver's StartMiracastSession function has not yet been called, the sent message is
deferred until StartMiracastSession returns. If a message is sent after the StopMiracastSession function is called,
then the message is dropped by the operating system, and the DxgkCbMiracastSendMessageCallback function is
called with the error status set in pIoStatusBlock->Status.

Modifying an existing display miniport driver to support Miracast


displays
When the DxgkDdiStartDevice function is called, the display miniport driver needs to add a new Miracast target and
should mark the target’s hot-plug detection (HPD) awareness value as HpdAwarenessInterruptible so that
the operating system won’t poll this target. Also, when the DxgkDdiQueryChildRelations function is called, the
driver should report D3DKMDT_VOT_MIRACAST as its connection type.
The driver should not report more than one Miracast target on any full WDDM graphics device. If a driver reports
more than one Miracast target, the operating system fails the starting of the adapter. The driver should also not
report any monitor on this target if the Miracast connected session is not started.
The driver also needs to report a correct DXGK_MIRACAST_DISPLAY_INTERFACE structure, with pointers to
functions that are in kernel-mode address space, when the DirectX graphics kernel subsystem calls the
DxgkDdiQueryInterface function.
When a Miracast session is starting, and a monitor is connected to the Miracast sink, the display miniport driver
should set the DXGK_CHILD_STATUS.Type member to the StatusMiracast constant value, and should also set
DXGK_CHILD_STATUS.Miracast.Connected to TRUE, to report a monitor arrival HPD to the operating system.
The driver should set the DXGK_CHILD_STATUS.Miracast.MiracastMonitorType member to the correct monitor
type that’s connected to the sink. If the sink is part of the monitor, this member should be set to
D3DKMDT_VOT_MIRACAST.
If the driver knows the EDID of the monitor, it should report this EDID when the operating system calls the
DxgkDdiQueryDeviceDescriptor function.
Depending on hardware capabilities, the Miracast sink mode list, and network bandwidth, the driver should report
the correct source mode, target mode, rotation mode, and scaling mode. For the target mode, the driver should report
the correct VSyncFreqDivider member value in D3DKMDT_VIDEO_SIGNAL_INFO. The operating system
matches the target mode against the monitor mode and prunes any mode that isn’t supported by the monitor.
Send comments about this topic to Microsoft
Miracast user-mode driver tasks to support Miracast
wireless displays

To enable Miracast wireless displays, you need to create a standalone, unique DLL that implements a Miracast user-
mode driver. This driver will be loaded in a dedicated session 0 process. Add the name of the driver in device
software settings in the INF file as MiracastDriverName:

[MyDevice_DeviceSettings]
HKR,, MiracastDriverName, %REG_SZ%, Miracast.dll

The DLL should have an export function named QueryMiracastDriverInterface that the operating system can call.
This driver binary must not use an existing Microsoft Direct3D user-mode display driver DLL.
Note that because the Miracast user-mode driver is loaded into the UMDF0 process, no separate Windows on
Windows (WOW) version of this driver is needed. For example, on a 64-bit processor a 64-bit version of the driver
is used.
When the operating system is ready to prepare for a Miracast connected session, it calls the Miracast user-mode
driver’s CreateMiracastContext function. When this function is called, the Miracast user-mode driver allocates
all the software resources it needs to start a Miracast connected session. In this call the operating system also
provides pointers to callback functions that the driver can call during the lifetime of the current Miracast context.
Then after a Real-Time Streaming Protocol (RTSP) link is established, the operating system calls
StartMiracastSession to actually start the Miracast connected session. When responding to this function call, the
driver should use the Winsock getaddrinfo function, or other relevant functions, to get the Internet Protocol (IP)
address of the Miracast sink and use standard Winsock functions to create a Hypertext Caching Protocol (HTCP)
Remote Desktop Protocol (RDP) socket.
If a Miracast display becomes available, the Miracast user-mode driver calls the operating system-supplied
MiracastIoControl function to send an I/O control request to the display miniport driver to report a monitor
arrival hot-plug detection (HPD) awareness value. The Miracast user-mode driver should also query Miracast sink
info and capabilities and report some of this info, such as the monitor description, to the display miniport driver by
calling MiracastIoControl.
After the Miracast connected session has been started, and after streaming data has been prepared and before
sending it to the network, the driver needs to call the ReportStatistic callback function to report the statistics of
the Miracast link to the operating system.
When the operating system stops a Miracast connected session, it calls the Miracast user-mode driver’s
StopMiracastSession function. In response to this function call, the driver should close all the sockets it created and
stop all further data streaming. The driver should not close the RTSP socket that the operating system gave it. It
also should not send a request to the display miniport driver to report an HPD on monitor departure.
In responding to the operating system’s calls to the DestroyMiracastContext function, the Miracast user-mode
driver should release all the software resources that it allocated in CreateMiracastContext.
When the display miniport driver receives a DxgkDdiCommitVidPn request to power off the connected Miracast
monitor, the driver should call the operating system-supplied DxgkCbMiracastSendMessage callback function to
send a message to the Miracast user-mode driver. The Miracast user-mode driver should then put the Miracast sink
into a low-power state.
The RegisterForDataRateNotifications callback function can optionally be called by the Miracast user-mode
driver to register with the operating system to receive, once a second, network quality of service (QoS) notifications
and the current network bandwidth of the Miracast connection. This network info is provided by operating system
calls to the pfnDataRateNotify function.
The Miracast user-mode driver can also call these optional callback functions provided by the operating system:
GetNextChunkData
Provides info about the next encode chunk.
ReportSessionStatus
The driver calls this function to report the status of the current Miracast connected session.
Send comments about this topic to Microsoft
Reporting Miracast encode chunks and statistics

Display hardware can process each video frame sent over a Miracast wireless display link by splitting the frame
into multiple parts, or encode chunks. Each chunk has a unique chunk ID that’s generated from the frame
number and the frame part (or slice) number. Each chunk that’s related to the same desktop frame update must
be assigned the same frame number.

Reporting chunk processing


A driver can encode a frame to be sent over a Miracast wireless link either in multiple processing steps—for
example separating color conversion from encoding—or in a single step. Each processing step should be assigned
a unique frame part number within the frame.
Either the Miracast user-mode driver or the display miniport driver must notify the operating system each time that
the hardware has completed a processing step for a frame, and immediately before each part of the frame is sent
to the network. The time of a particular reported processing step is assumed to be the time the event was reported
to the operating system, so it's important to report the stages as rapidly as possible.
The operating system takes no action other than to log these events using the Event Tracing for Windows (ETW)
kernel-level tracing facility. This info is nevertheless important for measuring and investigating performance issues.
These are the possible ways that a driver can provide the notification:
The Miracast user-mode driver calls the ReportStatistic callback function to report details with the
MIRACAST_STATISTIC_TYPE_CHUNK_PROCESSING_COMPLETE type, or with
MIRACAST_STATISTIC_TYPE_CHUNK_SENT to indicate the chunk is just about to be sent to the network stack
for transmission.
The display miniport driver reports details of the chunk processing with the
DXGK_INTERRUPT_MICACAST_CHUNK_PROCESSING_COMPLETE interrupt type, although this report can
only be made at interrupt time, and in addition to logging the chunk info, a chunk packet is created and
queued so that the Miracast user-mode driver can retrieve it by calling the GetNextChunkData callback
function.
The display miniport driver calls the DxgkCbReportChunkInfo callback function at any IRQL level. This
function logs only the chunk info and does not queue any chunk packets.
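
As a rough sketch of the first option, the Miracast user-mode driver might log the moment a frame is handed to the network stack as shown below. The statistic type and the ChunkSent.ChunkId members are named in this topic; the callback table, device handle, and exact prototypes are assumptions and should be taken from the Miracast headers in the WDK rather than from this fragment.

// Sketch only: report that frame 101, part 0 is about to be sent to the network.
MIRACAST_STATISTIC_DATA StatData = {};
StatData.StatisticType = MIRACAST_STATISTIC_TYPE_CHUNK_SENT;
StatData.ChunkSent.ChunkId.FrameNumber = 101;
StatData.ChunkSent.ChunkId.PartNumber = 0;   // zero marks the last slice of the frame

// The callback table and handle are assumed to have been saved when the operating
// system called CreateMiracastContext.
pMiracastCallbacks->ReportStatistic(hMiracastDeviceHandle, &StatData);

// ... immediately afterward, pass the data to the networking API for transmission ...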
If the desktop image is not updated but the driver needs to encode the desktop image again to improve quality, the
same frame number and part numbers should be used. The performance tools will trigger the second encode
complete event for the same frame and part number, indicating that a second encode of the same frame was
performed.
Note The last slice of each frame must have a frame part number of zero. Doing this indicates to performance tools
that this is the last slice of the frame.
To ensure correct synchronization of the primary surface, if the encoding is performed by the pixel pipeline, any
requested flip operation at a VSync interval should not be reported before the encoding has finished accessing the
primary surface. This prevents the presenter from rendering to the primary surface while the encode engine is
reading from it.
The Miracast user-mode driver should inform the operating system at each of several stages of processing the
frame:
Start frame, chunk type MIRACAST_CHUNK_TYPE_FRAME_START:
Represents the point where the operating system asks the driver to display the new desktop frame. Although
technically this could be reported from the Miracast user-mode driver, the start of processing a new frame always
involves the display miniport driver and should be reported by that driver.
Color convert complete, chunk type MIRACAST_CHUNK_TYPE_COLOR_CONVERT_COMPLETE:
Some solutions have separate color convert and encode stages. In such solutions the color convert complete
processing event should be reported as soon as possible, and the driver should use the
DXGK_MIRACAST_CHUNK_INFO.ProcessingTime member to report the time it took for the hardware to
perform the operation. If the entire frame is color converted all at once rather than in slices, then the part number
should be zero.
Encode complete, chunk type MIRACAST_CHUNK_TYPE_ENCODE_COMPLETE:
Indicates that the H.264 encode has been completed. The ProcessingTime and EncodeRate members of the
DXGK_MIRACAST_CHUNK_INFO structure should be completed.
Frame send, calling ReportStatistic using MIRACAST_STATISTIC_TYPE_CHUNK_SENT:
Indicates that the data packet for this frame/part number is just about to be sent to the networking API for
transmission from the Miracast user-mode driver. If the data for this frame/part is sent using multiple calls to the
networking API, then this should only be logged just before the first packet is sent. The call to indicate this should
be made just before calling the network API. This is important because if the network API blocks calls, then we do
not want that blocked time to count against processing of the frame in the graphics stack.
Dropped frame, chunk type MIRACAST_CHUNK_TYPE_FRAME_DROPPED:
If at any time the driver decides that it won't complete the processing of the frame/part and send it to the sink, then
it should report the dropped frame. In this context a frame is only considered dropped if the driver actually started
processing it by logging MIRACAST_CHUNK_TYPE_FRAME_START. If a driver calculates that it's going to skip this
frame without any processing, it can log MIRACAST_CHUNK_TYPE_FRAME_DROPPED without logging
MIRACAST_CHUNK_TYPE_FRAME_START.
Driver defined chunk type MIRACAST_CHUNK_TYPE_ENCODE_DRIVER_DEFINED_1 or _2:
These chunk types are available to help you understand the performance of a scenario. An example would be
where the driver uses these types to indicate that an I-Frame was created for this frame. Another example would be
where the driver logs an additional packet after the last slice of the frame has been sent to network APIs that
contained the total size of the encoded frame.
Here are examples of how frame color is converted and then how the display miniport driver reports completion of
the color conversion.
Reporting a single frame without using slices:
Stage of processing | ChunkType (value of MIRACAST_CHUNK_INFO member, MIRACAST_CHUNK_TYPE_*) | ChunkId.FrameNumber | ChunkId.PartNumber | ProcessingTime | EncodeRate
Start processing frame | FRAME_START | 101 | 0 | 0 | 0
Color conversion is complete | COLOR_CONVERT_COMPLETE | 101 | 0 | 950 | 0
Encode is complete | ENCODE_COMPLETE | 101 | 0 | 1042 | 15000
Just before call to send data to network | ReportStatistic call* | 101 (value of ChunkSent.ChunkId.FrameNumber) | 0 (value of ChunkSent.ChunkId.PartNumber) | N/A | N/A
*Called using MIRACAST_STATISTIC_TYPE_CHUNK_SENT.
Reporting a single frame, processed using slices:
Stage of processing | ChunkType (value of MIRACAST_CHUNK_INFO member, MIRACAST_CHUNK_TYPE_*) | ChunkId.FrameNumber | ChunkId.PartNumber | ProcessingTime | EncodeRate
Start processing frame | FRAME_START | 101 | 0 | 0 | 0
Color conversion is complete | COLOR_CONVERT_COMPLETE | 101 | 0 | 950 | 0
Encode of slice 1 is complete | ENCODE_COMPLETE | 101 | 1 | 1042 | 15000
Encode of slice 2 is complete | ENCODE_COMPLETE | 101 | 0 | 400 | 15000
Just before call to send slice 1 data to network | ReportStatistic call* | 101 (value of ChunkSent.ChunkId.FrameNumber) | 1 (value of ChunkSent.ChunkId.PartNumber) | N/A | N/A
Just before call to send slice 2 data to network | ReportStatistic call* | 101 (value of ChunkSent.ChunkId.FrameNumber) | 0 (value of ChunkSent.ChunkId.PartNumber; see Note above) | N/A | N/A
*Called using MIRACAST_STATISTIC_TYPE_CHUNK_SENT.
Reporting an original frame, processed and then re-encoded without using slices:
Stage of processing | ChunkType (value of MIRACAST_CHUNK_INFO member, MIRACAST_CHUNK_TYPE_*) | ChunkId.FrameNumber | ChunkId.PartNumber | ProcessingTime | EncodeRate
Start processing frame | FRAME_START | 101 | 0 | 0 | 0
Color conversion is complete | COLOR_CONVERT_COMPLETE | 101 | 0 | 950 | 0
Encode is complete | ENCODE_COMPLETE | 101 | 0 | 1042 | 15000
Just before call to send data for original frame to network | ReportStatistic call* | 101 (value of ChunkSent.ChunkId.FrameNumber) | 0 (value of ChunkSent.ChunkId.PartNumber) | N/A | N/A
Re-encode is complete | ENCODE_COMPLETE | 101 | 0 | 500 | 15000
Just before call to send data for re-encoded frame to network | ReportStatistic call* | 101 (value of ChunkSent.ChunkId.FrameNumber) | 0 (value of ChunkSent.ChunkId.PartNumber) | N/A | N/A
*Called using MIRACAST_STATISTIC_TYPE_CHUNK_SENT.

Reporting protocol events


When the Miracast user-mode driver reports protocol events by calling the ReportStatistic function with
MIRACAST_STATISTIC_DATA.StatisticType set to MIRACAST_STATISTIC_TYPE_EVENT, the operating system
logs the event but takes no other action. These events are nevertheless valuable for diagnostics and performance
investigation.
The MIRACAST_PROTOCOL_EVENT enumeration includes possible protocol event types that can be reported.

Reporting protocol errors


While a Miracast connected session is in progress, if a Miracast user-mode driver discovers an error, it should call
the ReportSessionStatus callback function with appropriate MIRACAST_STATUS error status info in the
MiracastStatus parameter. The operating system will always destroy the session when an error is reported.
Note that the operating system merely logs the status info passed to ReportSessionStatus for diagnostics
and doesn't take any action based on its value. However, we recommend that the driver use this parameter to
differentiate between the different causes of the error.
Send comments about this topic to Microsoft
Calling DisplayConfig functions for a Miracast target

To reduce compatibility issues of existing apps being exposed to new Miracast targets, the QueryDisplayConfig
and SetDisplayConfig function implementations have ways for apps to find Miracast targets:
A value of DISPLAYCONFIG_OUTPUT_TECHNOLOGY_MIRACAST in the
DISPLAYCONFIG_VIDEO_OUTPUT_TECHNOLOGY enumeration indicates that the VidPN target is a Miracast
device.
The Flags parameter value of QDC_ALL_PATHS in a call to QueryDisplayConfig won’t return any paths that
connect to a Miracast target that does not have an active monitor attached.
For each path that has a connected Miracast monitor, QueryDisplayConfig returns the connector type that’s
reported by the Miracast sink. Internal Miracast sinks report a value of
DISPLAYCONFIG_OUTPUT_TECHNOLOGY_MIRACAST in the
DISPLAYCONFIG_VIDEO_OUTPUT_TECHNOLOGY enumeration. For example, if a Miracast sink reports that a
TV is connected to the sink with a High-Definition Multimedia Interface (HDMI) cable, then
QueryDisplayConfig would report the target type as DISPLAYCONFIG_OUTPUT_TECHNOLOGY_HDMI.
The DISPLAYCONFIG_VIDEO_SIGNAL_INFO structure has a VSync frequency divider member,
vSyncFreqDivider, that’s used similarly to D3DKMDT_VIDEO_SIGNAL_INFO.vSyncFreqDivider.
The DisplayConfigGetDeviceInfo function provides the base connector type for any target. In the case of a
Miracast target, this function always returns a value of DISPLAYCONFIG_OUTPUT_TECHNOLOGY_MIRACAST
in the DISPLAYCONFIG_VIDEO_OUTPUT_TECHNOLOGY enumeration.
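Putting these together, an application can tell whether a path drives a Miracast target by querying the base connector type. The following is a minimal user-mode sketch (with error handling reduced to the essentials), not code from the DisplayConfig documentation:

bool IsMiracastTarget(const DISPLAYCONFIG_PATH_INFO& Path)
{
    DISPLAYCONFIG_TARGET_BASE_TYPE BaseType = {};
    BaseType.header.type = DISPLAYCONFIG_DEVICE_INFO_GET_TARGET_BASE_TYPE;
    BaseType.header.size = sizeof(BaseType);
    BaseType.header.adapterId = Path.targetInfo.adapterId;
    BaseType.header.id = Path.targetInfo.id;

    if (DisplayConfigGetDeviceInfo(&BaseType.header) != ERROR_SUCCESS)
    {
        return false;
    }

    // The base type is always MIRACAST for a Miracast target, while
    // Path.targetInfo.outputTechnology reports the sink-side connector instead
    // (for example, HDMI for a TV attached to the Miracast sink over HDMI).
    return (BaseType.baseOutputTechnology == DISPLAYCONFIG_OUTPUT_TECHNOLOGY_MIRACAST);
}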
Send comments about this topic to Microsoft
Adaptive refresh for playing 24 fps video content

When Windows Display Driver Model (WDDM) 1.3 and later drivers play 24 frame per second (fps) video content
on 60-Hz monitors, they must implement 48-Hz adaptive refresh to conserve power. In this scenario the monitor
switches from 60 Hz to a 48-Hz refresh rate to play back 24-fps video content.
These are the device driver interfaces (DDIs) that WDDM 1.3 and later drivers must implement for 24-fps playback:

Adaptive refresh reference


These reference topics describe how to implement this capability in your drivers:
CheckPresentDurationSupport (new)
pfnCheckPresentDurationSupport(DXGI) (new)
D3DDDIARG_CHECKPRESENTDURATIONSUPPORT (new)
DXGI_DDI_ARG_CHECKPRESENTDURATIONSUPPORT (new)
DXGKARG_SETVIDPNSOURCEADDRESS (new Duration member)
DXGKARG_SETVIDPNSOURCEADDRESSWITHMULTIPLANEOVERLAY (new Duration member)
D3DDDI_DEVICEFUNCS (new pfnCheckPresentDurationSupport function pointer)
DXGI1_3_DDI_BASE_FUNCTIONS (new pfnCheckPresentDurationSupport function pointer)
Send comments about this topic to Microsoft
GPU power management of idle states and active
power

Starting with Windows 8, an optional GPU power management infrastructure is available that lets Windows
Display Driver Model (WDDM) 1.2 and later drivers manage power for individual devices or a set of devices. This
infrastructure provides a standardized mechanism to support F-state and P-state power management in
collaboration with Windows.

Minimum WDDM version 1.2

Minimum Windows version 8

Driver implementation Optional

WHCK requirements and tests Device.Graphics…RuntimePowerMgmt

GPU power management device driver interface (DDI)


These new and updated functions and structures are available starting with Windows 8 for the display miniport
driver to transition the state of power components and to communicate power events with the Microsoft DirectX
graphics kernel subsystem.
DxgkCbCompleteFStateTransition
DxgkCbPowerRuntimeControlRequest
DxgkCbSetPowerComponentActive
DxgkCbSetPowerComponentIdle
DxgkCbSetPowerComponentLatency
DxgkCbSetPowerComponentResidency
DxgkDdiPowerRuntimeControlRequest
DxgkDdiSetPowerComponentFState
DXGK_DRIVERCAPS
DXGK_POWER_COMPONENT_FLAGS
DXGK_POWER_COMPONENT_MAPPING
DXGK_POWER_COMPONENT_TYPE
DXGK_POWER_RUNTIME_COMPONENT
DXGK_QUERYADAPTERINFOTYPE
DXGKARG_QUERYADAPTERINFO
DXGK_QUERYSEGMENTOUT3
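
As a rough sketch of how these DDIs fit together, a driver that has registered power components typically brackets hardware access with the active and idle callbacks so that the graphics kernel can manage the component's F-state. The callback names come from the list above; the handle and component-index bookkeeping shown here are assumptions for illustration only, not a definitive implementation.

// Sketch only: keep a power component active while its registers are programmed.
NTSTATUS AccessEngineRegisters(DXGKRNL_INTERFACE* pDxgkInterface, UINT ComponentIndex)
{
    NTSTATUS Status = pDxgkInterface->DxgkCbSetPowerComponentActive(pDxgkInterface->DeviceHandle, ComponentIndex);
    if (!NT_SUCCESS(Status))
    {
        return Status;
    }

    // ... program the hardware while the component is guaranteed to be powered ...

    return pDxgkInterface->DxgkCbSetPowerComponentIdle(pDxgkInterface->DeviceHandle, ComponentIndex);
}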

GPU power management scenarios


GPUs and display screens are two of the largest power consumers in laptops, mobile devices, and desktop PCs.
These are the key power management scenarios to reduce power consumption and extend battery life:
Mobile form factor devices can go into an idle state and save power because individual system components
shut down if they are not in use.
Windows System on a Chip (SoC)–based devices behave like consumer devices and mobile phones; that is, they
turn on immediately when they are needed, thereby saving energy.

Hardware certification requirements


For info on requirements that hardware devices must meet when they implement this feature, refer to the relevant
WHCK documentation on Device.Graphics…RuntimePowerMgmt.
See WDDM 1.2 features for a review of features added with Windows 8.
Send comments about this topic to Microsoft
Windows 2000 Display Driver Model (XDDM)
Design Guide

Display adapter drivers that run on Windows Vista can adhere to one of two models: the Windows Display
Driver Model (WDDM) or the Windows 2000 display driver model (XDDM). Drivers that adhere to Windows
Display Driver Model (WDDM) run only on Windows Vista and later. Drivers that adhere to XDDM run on
Windows 2000 and later operating systems (including Windows Vista and Windows 7).
Note XDDM and VGA drivers will not compile on Windows 8 and later versions. If display hardware is
attached to a Windows 8 computer without a driver that is certified to support WDDM 1.2 or later, the system
defaults to running the Microsoft Basic Display Driver.
The following sections describe the Windows 2000 display driver model:
Introduction to Display (Windows 2000 Model)
Display Drivers (Windows 2000 Model)
DirectDraw
Direct3D DDI
DirectX Video Acceleration
Video Miniport Drivers in the Windows 2000 Display Driver Model
Implementation Tips and Requirements for the Windows Display Driver Model (WDDM)
GDI
Note The documentation for the Windows 2000 display driver model no longer includes information about
how to create a display driver that runs on the Microsoft Windows 98/Me platforms. If you want to create a
display driver for Windows 98/Me, you can use the WDK documentation that released with Windows Vista.
You can obtain the WDK for Windows Vista RTM from the Microsoft Connect website.
Send comments about this topic to Microsoft
Roadmap for Developing Drivers for the Windows
2000 Display Driver Model (XDDM)

The Windows 2000 display driver model (XDDM) requires that a graphics hardware vendor
supply a paired display driver and video miniport driver. Both of these drivers for display run in kernel mode.
Note XDDM and VGA drivers will not compile on Windows 8 and later versions. If display hardware is attached to a
Windows 8 computer without a driver that is certified to support Windows Display Driver Model (WDDM) 1.2 or
later, the system defaults to running the Microsoft Basic Display Driver.
To create XDDM display drivers on Windows 7 and earlier versions of Windows, download and install the Windows
7 Windows Driver Kit (WDK) (build 7600), open the WDK Help documentation from the Start menu, and follow the
recommended steps in the topic, Roadmap for Developing Drivers for the Windows Display Driver Model (WDDM).
Send comments about this topic to Microsoft
Introduction to Display (Windows 2000 Model)

Graphics adapters that run on NT-based operating systems require a paired display driver and video miniport
driver. This section introduces these drivers and provides the following information that is relevant to both:
Windows 2000 Display Architecture
General Design and Implementation Strategies
Accessing the Graphics Adapter
Fast User Switching
Creating Graphics INF Files
Compatibility Testing Requirements for Display and Video Miniport Drivers
Send comments about this topic to Microsoft
Windows 2000 Display Architecture

The following figure shows the components required to display on Windows 2000 and later.

The shaded elements in the preceding figure represent services that are supplied with Windows 2000 and later.
The unshaded elements indicate that a third-party display driver and video miniport driver are required in order
for a graphics adapter to display in the Windows 2000 and later systems.
For every type of graphics card that can be used with an NT-based operating system, there must be both a display
driver and a corresponding video miniport driver. The miniport driver is written specifically for one graphics
adapter (or family of adapters). The display driver can be written for any number of adapters that share a common
drawing interface; for example, the VGA display driver can be used with either the VGA or ET4000 miniport driver.
This is because the display driver draws, while the miniport driver performs operations such as mode sets and
provides information about the hardware to the driver. It is also possible for more than one display driver to work
with a particular miniport driver; for example, the 16- and 256-color SVGA display drivers can use the same
miniport driver.
The following sections describe the key responsibilities of display and video miniport drivers. The breakdown in
responsibilities is not hard and fast; the balance between modularity and performance is the key. For example, the
hardware pointer code for the VGA driver resides in the miniport driver. This promotes modularity, so the same
display driver can handle both the Video Seven VRAM, which has a hardware pointer, and the ET4000, which does
not.
Windows 2000 Display Driver Responsibilities
Windows 2000 Video Miniport Driver Responsibilities
Send comments about this topic to Microsoft
Windows 2000 Display Driver Responsibilities

A display driver is a kernel-mode DLL for which the primary responsibility is rendering. When an application calls a
Win32 function with device-independent graphics requests, the Graphics Device Interface (GDI) interprets these
instructions and calls the display driver. The display driver then translates these requests into commands for the
video hardware to draw graphics on the screen.
The display driver can access the hardware directly. This is because there is a wide variety in graphics hardware
capabilities, and because display is one of the most time-critical parts of any system. This accessibility and the wide
scope of capabilities within GDI provide considerable flexibility when implementing a display driver.
By default, GDI handles drawing operations on standard format bitmaps, such as on hardware that includes
a frame buffer. A display driver can hook and implement any of the drawing functions for which the
hardware offers special support. For less time-critical operations and more complex operations not
supported by the graphics adapter, the driver can punt functions back to GDI and allow GDI to do the work.
See Hooking Versus Punting for details.
For especially time-critical operations, the display driver has direct access to video hardware registers. For
example, the VGA display driver for x86 systems uses optimized assembly code to implement direct access
to hardware registers for some drawing and text operations. Note The video miniport driver must manage
all resources (for example, memory resources) shared between the video miniport driver and the display
driver. The system does not guarantee that resources acquired in the display driver will always be accessible
to the video miniport driver.
The display driver is discussed in detail in Display Drivers (Windows 2000 Model).
Send comments about this topic to Microsoft
Windows 2000 Video Miniport Driver Responsibilities

The kernel-mode video miniport driver (.sys file) generally handles operations that must interact with other NT
kernel components. For example, operations such as hardware initialization and memory mapping require action
by the NT I/O subsystem. Video miniport driver responsibilities include resource management, such as hardware
configuration, and physical device memory mapping. The video miniport driver must be specific to the video
hardware.
The display driver uses the video miniport driver for operations that are not frequently requested; for example, to
manage resources, perform physical device memory mapping, ensure that register outputs occur in close
proximity, or respond to interrupts.
Note The video miniport driver must manage all resources (for example, memory resources) shared between the
video miniport driver and the display driver. The system does not guarantee that resources acquired in the display
driver will always be accessible to the video miniport driver.
The video miniport driver also handles:
Mode set interaction with the graphics card.
Multiple hardware types, minimizing hardware-type dependency in the display driver.
Mapping the video register into the display driver's address space. I/O ports are directly addressable.
The video miniport driver is discussed in detail in Video Miniport Drivers in the Windows 2000 Display Driver
Model.
Send comments about this topic to Microsoft
General Design and Implementation Strategies

To design an effective Windows 2000 and later display driver and video miniport driver, consider the following
strategies:
Modify an existing Windows Driver Kit (WDK) sample driver that was designed for a similar type of graphics
adapter to reduce driver design time.
Use C to write as much of the driver as possible to maximize portability, using assembly language only
when necessary for time-critical operations that are not well supported by the hardware. Although coding in
assembly increases the potential for optimization, time and portability issues outweigh its benefits.
Use video miniport drivers for operations that manage resources, perform physical device memory
mapping, ensure that register outputs occur in close proximity, or respond to interrupts. Miniport drivers are
predominately used for handling variations within a hardware family and for minimizing display driver
hardware-type dependencies.
For additional information of interest to display driver writers, see Graphics DDI Functions for Display Drivers. This
topic and the subtopics following it discuss the graphics DDI functions that are required, conditionally required,
and optional for a display driver. Video Miniport Drivers in the Windows 2000 Display Driver Model and its
subtopics contain similar information aimed at video miniport driver writers.
You should also consider the following facts:
The display driver and video miniport driver operate in the same privileged kernel-mode address space as
the rest of the NT executive. A fault in either driver will cause the rest of the system to fault.
Display drivers and video miniport drivers can be preempted at any time.
The code and data sections of a display driver are both entirely pageable.
Exported functions must execute the standard NT-based operating system prolog on entry and the epilog on
exit. For more information, see the Microsoft Windows SDK documentation.
For information that is specific to display drivers, see Graphics DDI Functions for Display Drivers. That section
contains information about required, conditionally required, and optionally required graphics DDI functions.
Send comments about this topic to Microsoft
Accessing the Graphics Adapter

To ensure display performance, display drivers can access the graphics card in the following ways:
Indirectly--by sending IOCTLs to the video miniport driver of the graphics adapter. See Communicating
IOCTLs to the Video Miniport Driver.
Directly--by reading and writing to video memory (the frame buffer) or hardware registers. See Accessing
the Frame Buffer and Hardware Registers.
Send comments about this topic to Microsoft
Communicating IOCTLs to the Video Miniport Driver

The following figure shows how the display driver communicates with the video miniport driver using IOCTLs.

The display driver calls EngDeviceIoControl with an IOCTL to send a synchronous request to the video miniport
driver. GDI uses a single buffer for both input and output to pass the request to the I/O subsystem. The I/O
subsystem routes the request to the video port, which processes the request with the video miniport driver.
Some IOCTL requests require the miniport driver to access video registers, and others store or retrieve information
from the miniport driver's data structures. Generally, no requests require the video miniport driver to perform
actual drawing operations.
In general, and unless modularity dictates otherwise, the display driver handles drawing and other time-critical
operations. Sending an IOCTL to the miniport driver to perform a time-critical function can degrade system
performance.
See Video Miniport Driver I/O Control Codes for descriptions of system-defined video IOCTLs. You can extend the
interface between the display driver and the video miniport driver by adding a private IOCTL, which must be
formatted as described in Defining I/O Control Codes. If you need to write a new IOCTL, you should first contact
Microsoft Technical Support.
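
For example, a display driver could query its miniport driver through a private IOCTL roughly as follows. This is a sketch: IOCTL_VIDEO_ACME_QUERY_INFO and ACME_INFO are illustrative names, and hDriver is the handle the display driver received in DrvEnablePDEV.

// Sketch only: send a private, synchronous request to the video miniport driver.
ACME_INFO Info;
DWORD BytesReturned;
DWORD Error = EngDeviceIoControl(hDriver,
                                 IOCTL_VIDEO_ACME_QUERY_INFO,   // hypothetical private IOCTL
                                 NULL, 0,                       // no input buffer
                                 &Info, sizeof(Info),           // output buffer filled by the miniport driver
                                 &BytesReturned);
if (Error != NO_ERROR)
{
    // The miniport driver failed or rejected the request.
}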
Send comments about this topic to Microsoft
Accessing the Frame Buffer and Hardware Registers

There are several ways to reduce display driver size. For example, you can implement only those functions that the
display driver can perform faster than GDI, and then specify GDI to perform all other operations. GDI often
performs a substantial amount of the drawing to linear frame buffers to reduce the size of the driver. GDI cannot
access banked memory directly; therefore, when the frame buffer is not linearly addressable, the display driver
must divide the frame buffer into a series of banks and provide a means for GDI to perform its draw operations to
the appropriate bank. See Supporting Banked Frame Buffers for details.
The display driver has direct access to I/O-mapped and memory-mapped video registers. This access allows a
display driver to achieve high performance. For example, the driver might need to access video hardware registers
to send line-drawing commands at high throughput.
Similarly, for graphics cards, such as the S3, many of the innermost loops in the graphics engine code require reads
and writes of several video controller ports (for example, text output in graphics mode, bit block transfers, and line
drawing). Rather than requiring the display driver to send an IOCTL to the miniport driver for each request, the
display driver is permitted to access the video hardware directly.
Send comments about this topic to Microsoft
Fast User Switching

Fast User Switching, a feature of Windows XP and later, enables multiple users to be logged onto the same
machine. A particular user's desktop and any running applications persist from that user's logon session to his or
her next.
Fast User Switching works by allowing multiple virtual display drivers to run at the same time. (Each virtual display
driver is associated with a particular PDEV.) The video miniport driver, however, exists as a single instance. When
one of the virtual display drivers calls a video miniport driver callback, serious problems ensue if the miniport
driver attempts to access a passed-in memory address in the context of the display driver when that display driver
instance is no longer the active kernel thread. A tenet of the display driver/video miniport driver architecture is that
information should flow in one direction only: from the display driver to the video miniport driver.
Send comments about this topic to Microsoft
Creating Graphics INF Files

NT-based operating system display and video miniport drivers must be installed using an INF file. The Driver
Development Kit (DDK) provides a tool called geninf.exe that lets you generate an INF file for your display and
video miniport drivers.
Note The geninf.exe tool is not available in the Windows Driver Kit (WDK), which replaces the DDK.
When geninf.exe is run, it displays a number of dialogs that prompt you for information, such as the company
name and the names of the display drivers and video miniport driver. Geninf.exe then generates an INF file from
this information.
Note The file generated by geninf.exe might not be a fully valid INF file. Geninf.exe produces an INF file that will
likely need custom registry settings added for each device described in the INF file.
If your miniport driver maps more than 8 MB of device memory, you should manually edit the INF file to include
the section and appropriate entries described in INF GeneralConfigData Section.
When you run geninf.exe, select "Display" when prompted to choose the device class. An INF file marked as
Class=Display will be interpreted by the system-supplied display class installer during driver installation. This
ensures that all registry entries associated with a video driver are properly initialized.
An INF file of the class Display can install only the following files on the system:
A single miniport driver
A single display driver
Control panel extension DLLs
No other types of driver or application files can be installed from an INF file of class Display.
Limitations of geninf.exe
You cannot use geninf.exe to generate:
An INF file that supports more than one architecture.
An INF file that supports Windows 95/98/Me or Windows NT version 4.0 or previous.
A mirror driver INF file. Use the INF file provided with the mirror sample driver as a template. See Mirror
Driver INF File for more details.
A monitor INF file. Use the INF named monsamp.inf as a template. See Monitor INF File Sections for more
details.
These sample INF files are both shipped with the Windows Driver Kit (WDK).
See Creating an INF File and INF File Sections and Directives for detailed information when updating a sample INF
file.
Send comments about this topic to Microsoft
INF GeneralConfigData Section

If your miniport driver maps more than 8 MB of device memory, include a GeneralConfigData section in your INF
file.

[GeneralConfigData]

[MaximumDeviceMemoryConfiguration = n]
[MaximumNumberOfDevices = n]

The following are GeneralConfigData entries and values:


MaximumDeviceMemoryConfiguration=n
Specifies the maximum number of megabytes of device memory that the miniport driver will attempt to map into
the system address space for one video device enumerated by PCI. Windows uses this value as a hint to determine
how many system page table entries (PTEs) it should allocate for the mapping. For this entry to take effect, a reboot
may be needed. You can determine whether a reboot is necessary by checking the status of your device in the
Device Manager.
MaximumNumberOfDevices=n
Specifies how many video devices (as enumerated by PCI and driven by your miniport driver) are expected to be
present in the system. If you specify this entry, you must also specify the
MaximumDeviceMemoryConfiguration entry. For this entry to take effect, a reboot may be needed. You can
determine whether a reboot is necessary by checking the status of your device in the Device Manager.
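For example, an INF for a miniport driver that expects to map up to 64 MB of device memory per adapter and to drive at most two adapters could contain the following section; the values shown are illustrative.

[GeneralConfigData]
MaximumDeviceMemoryConfiguration = 64
MaximumNumberOfDevices = 2
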
For information about supporting more than one monitor, see Supporting DualView (Windows 2000 Model) and
Multiple-Monitor Support in the Display Driver.
Send comments about this topic to Microsoft
Display INF File Sections

This section tells you how to write the setup information file (INF) sections that specifically apply to a graphics-
adapter installation. For more general information about INF files, see INF File Sections and Directives.
DDInstall.SoftwareSettings Section
A DDInstall.SoftwareSettings section contains an AddReg directive and/or a DelReg directive. Each directive
points to a separate, writer-defined INF section that contains registry entries for the installer to add or delete.
For example, the following code shows an AddReg directive that points to a writer-defined add-registry section
named ACME-1234_SoftwareDeviceSettings. The DelReg directive points to a delete-registry section named
ACME-1234_DeleteSWSettings.

[ACME-1234.SoftwareSettings]
AddReg=ACME-1234_SoftwareDeviceSettings
DelReg=ACME-1234_DeleteSWSettings

The add-registry section adds four entries to the registry and sets their values, as shown in the following code.

[ACME-1234_SoftwareDeviceSettings]
HKR,, InstalledDisplayDrivers, %REG_MULTI_SZ%, Acme1
HKR,, OverRideMonitorPower, %REG_DWORD%, 0
HKR,, MultiFunctionSupported, %REG_DWORD%, 1
HKR,, VideoDebugLevel, %REG_DWORD%, 2

The preceding code first sets the value of the InstalledDisplayDrivers entry to the name of the display driver. The
code then sets the value of the OverRideMonitorPower entry to 0 (in other words, FALSE). This entry, which
should be used only by OEM system vendors, controls the power behavior of the monitor device (for example, the
LCD, CRT, or TV). When set to 1, OverRideMonitorPower limits the possible power states of the monitor device to
D0 and D3.
Third, the code sets the value of the MultiFunctionSupported entry to 1 (in other words, TRUE), which is the
required value for an adapter that supports multiple PCI functions. Last, the code sets the value of the
VideoDebugLevel entry, which controls the global debug level that checked builds use for debug messages. This
value ranges from 0 (no debug messages) to 3 (the most verbose messages). For more information about global
debug levels, see VideoDebugPrint.
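The %REG_MULTI_SZ% and %REG_DWORD% tokens used in the preceding section are not predefined; they are ordinarily defined in the INF's Strings section as the corresponding AddReg data-type flag values. A minimal sketch of such definitions follows:

[Strings]
REG_MULTI_SZ = 0x00010000
REG_DWORD = 0x00010001
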
Most video miniport drivers are not VGA-compatible and require no VgaCompatible entry in the registry. If your
video miniport driver is VGA-compatible, add the VgaCompatible entry to the registry and set its value to 1
(TRUE) in the add registry section, as shown here:

[ACME-1234_SoftwareDeviceSettings]
HKR,, VgaCompatible, %REG_DWORD%, 1

For more information about VGA-compatible video miniport drivers, see VGA-Compatible Video Miniport Drivers
(Windows 2000 Model).
The following delete-registry section deletes three registry entries: GraphicsClocking, MemClocking, and
CapabilityOverride.
[ACME-1234_DeleteSWSettings]
HKR,, GraphicsClocking
HKR,, MemClocking
HKR,, CapabilityOverride

The CapabilityOverride entry specifies the capabilities that the system turns off for the display driver. For
example, even if the display driver implements a DrvEscape function, that capability cannot be used if the 0x10 flag
is set in the CapabilityOverride entry.
The value of the CapabilityOverride registry entry is a bitwise OR of one or more of the flags that are listed in the
following table.

FLAG MEANING
0x1 Disables all hardware acceleration. Equivalent to moving the hardware-acceleration slide bar (in the Display item of Control Panel) to the minimum setting.
0x2 Disables all support for Microsoft DirectDraw and Microsoft Direct3D hardware acceleration.
0x4 Disables all support for Direct3D hardware acceleration. Prevents calls to DdGetDriverInfo, which request Direct3D capability and callback information, from reaching the driver.
0x8 Disables all support for the OpenGL installable client driver (ICD) and miniclient driver (MCD). Prevents calls to DrvSetPixelFormat, DrvDescribePixelFormat, and DrvSwapBuffers from reaching the driver. Also prevents OPENGL_GETINFO, OPENGL_CMD, and MCDFUNCS escapes from reaching the driver.
0x10 Disables support for all escapes in the driver. Prevents calls to DrvEscape and DrvDrawEscape from reaching the driver.

For display drivers that are shipped with Windows, CapabilityOverride is typically set to 0x8, which disables
OpenGL. Note that it is not necessary to set the 0x10 flag to disable OpenGL, and you should not set the 0x10 flag
unless you intend to disable all escapes.
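For example, a driver package that needed to disable OpenGL in the same way could set the entry from its add-registry section. The following is a hypothetical sketch that reuses the section name from earlier in this topic:

[ACME-1234_SoftwareDeviceSettings]
HKR,, CapabilityOverride, %REG_DWORD%, 0x8
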
Microsoft Windows XP and earlier operating systems do not delete the CapabilityOverride registry entry when a
display driver is updated--for example, from a driver that is shipped with Windows to a more recent driver
provided by an independent hardware vendor (IHV). The persistent CapabilityOverride entry disables the same
capabilities in the updated driver that it disabled in the old driver. Therefore, for Windows XP and earlier, include a
DelReg directive in your INF file that explicitly deletes the existing CapabilityOverride entry. Windows XP SP1
and later operating systems automatically delete the CapabilityOverride entry when a driver is updated so, for
those systems, it is not necessary to delete the CapabilityOverride entry.
Disabling AGP Transfer Rates and Sideband Addressing
If necessary, you can modify the INF file for your display adapter to disable certain AGP transfer rates or sideband
addressing. Note that a miniport driver can change AGP transfer rates when it calls AgpSetRate, but such calls are
not allowed to change transfer rates that are disabled in an INF file.
The regstr.h header file, which is shipped with the Windows Driver Kit (WDK), defines the following set of flags.

FLAG VALUE MEANING
AGP_FLAG_NO_1X_RATE 0x00000001L Disables the single-speed (66 MHz) transfer rate.
AGP_FLAG_NO_2X_RATE 0x00000002L Disables two times the single-speed transfer rate.
AGP_FLAG_NO_4X_RATE 0x00000004L Disables four times the single-speed transfer rate.
AGP_FLAG_NO_8X_RATE 0x00000008L Disables eight times the single-speed transfer rate.
AGP_FLAG_NO_SBA_ENABLE 0x00000100L Disables sideband addressing (SBA).

Two types of settings exist: global and platform-specific. The registry contains the global entries at the following
location:

HKLM,"SYSTEM\CurrentControlSet\Control\AGP"

You can find the platform-specific entries under "Parameters" in the filter-driver service key. For example, these
entries exist for the hypothetical AcmeAGP adapter in the following location in the registry:

HKLM,"SYSTEM\CurrentControlSet\Services\AcmeAGP\Parameters"

To disable sideband addressing for a device that has a DeviceID of 0x012A (Nuclear3D) and a VendorID of 0x1AD0
on VIA Technologies platforms, add a Nuclear3D_Install.HW section to your INF file. (For more information about
this type of INF Install section, see INF DDInstall.HW Section.) In this section, include an AddReg directive similar
to the following:

[Nuclear3D_Install.HW]
AddReg = Nuclear3D_Reg

Next, create the following section, which the AddReg directive points to:

[Nuclear3D_Reg]
HKLM,"SYSTEM\CurrentControlSet\Services\viaagp\Parameters","1AD0012A",0x00030003,00,01,00,00,00,00,00,00

The preceding entry indicates that the subkey identified by the string following HKLM is to be added to the registry, under the HKEY_LOCAL_MACHINE root. The "1AD0012A" string is the entry name; its first four characters compose the VendorID and its last four compose the DeviceID for this part. The hexadecimal number following the entry name is a set of flags that indicates the data type for the entry. The last part is the entry value, which disables sideband addressing.
Important The bytes in the value entry are in the opposite order from those of the AGP_FLAG_NO_SBA_ENABLE
flag's definition in the preceding table.
Suppose you determine that AGP 4X is broken on every chipset for this same device. To indicate this fact, add a
second entry to the Nuclear3D_Reg section:

[Nuclear3D_Reg]
HKLM,"SYSTEM\CurrentControlSet\Services\viaagp\Parameters","1AD0012A",0x00030003,00,01,00,00,00,00,00,00
HKLM,"SYSTEM\CurrentControlSet\Control\AGP","1AD0012A",0x00030003,04,00,00,00,00,00,00,00

The second entry in the preceding code indicates that the subkey identified by the string following HKLM is to be
added to the registry, under the HKEY_LOCAL_MACHINE root. As in the previous entry, the value name associated
with this subkey is a string that is composed of the device's DeviceID and VendorID. The flag value is also the same.
The value entry is AGP_FLAG_NO_4X_RATE, which disables the AGP 4X transfer rate. Notice that, as before, the bytes in this value entry are in the opposite order from those of the flag's value in the preceding table.
Monitor INF File Sections

Monitors must be installed in NT-based operating systems using an INF file. The Windows Driver Kit (WDK)
provides a sample monitor INF file, monsamp.inf, that you should use as a template to generate an INF file for
your monitor. You cannot use the geninf.exe tool described in Creating Graphics INF Files to generate a monitor
INF.
The rest of this topic comments on some of the sections in monsamp.inf that are of specific interest to monitor INF
writers. For more general information about INF files, see INF File Sections and Directives.
You can also use an INF file to override the monitor Extended Display Identification Data (EDID). See Overriding
Monitor EDIDs with an INF.
SourceDisksFiles Section
Files that must be copied during monitor installation should be placed in the [SourceDisksFiles] section. The
following example identifies an .icm file that is on distribution disk 1.

[SourceDisksFiles]
profile1.icm=1

For more general information, see INF SourceDisksFiles Section. See Monitor Profiles for more information
about color management and profiles.
Models Section
Information about each model that is supported by a given manufacturer should be placed in the Models section.
The following example identifies two models manufactured by ACME:

[ACME]
%ACME-1234%=ACME-1234.Install, Monitor\MON12AB
%ACME-5678%=ACME-5678.Install, Monitor\MON34CD

Each model is represented by a single line. Each line contains three elements:
Model name -- for example, %ACME-1234% is a token that represents the actual model name (which
would appear in the Strings section).
Link to a subsequent DDInstall section -- for example, ACME-1234.Install is a link to the subsequent
[ACME-1234.Install] section.
Hardware identification -- for example, the expression Monitor\MON12AB combines the device class
(Monitor) and the device identification (MON12AB) as it appears in the device's EDID.
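For example, the model-name tokens shown above would be defined in the INF's Strings section; the display names here are illustrative:

[Strings]
ACME-1234="ACME 1234 Digital Flat Panel"
ACME-5678="ACME 5678 Digital Flat Panel"
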
For more general information, see INF Models Section.
DDInstall Section
The DDInstall section provides information to the driver about the operations to be performed when it installs the
specified device. Each line in this section provides a link or links to different INF writer-defined sections that appear
later in the INF file. The following example shows the DDInstall section for the ACME-1234 model:
[ACME-1234.Install]
DelReg=DEL_CURRENT_REG
AddReg=ACME-1234.AddReg, 1280, DPMS
CopyFiles=ACME-1234.CopyFiles

DelReg directive--provides a link to the DEL_CURRENT_REG section, which details the registry keys to be
deleted.
AddReg directive--provides links to three sections in which registry keys to be added are detailed. These
sections are ACME-1234.AddReg, 1280, and DPMS.
CopyFiles directive--provides a link to the ACME-1234.CopyFiles section, which specifies the files to be
copied from the distribution disk or disks.
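For example, a corresponding writer-defined copy-files section might list the color profile that was identified earlier in the [SourceDisksFiles] section:

[ACME-1234.CopyFiles]
profile1.icm
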
For more general information, see INF DDInstall Section.
INF Writer-Defined Sections
An INF writer-defined section can have any name, provided it is unique within the INF file. These sections are
pointed to by directives in other sections. The following bullet items discuss some of the INF writer-defined
sections from monsamp.inf:
DEL_CURRENT_REG section -- identifies four registry keys whose values will be deleted: MODES,
MaxResolution, DPMS, and ICMProfile. These keys will be updated appropriately with new values in
subsequent sections.

[DEL_CURRENT_REG]
HKR,MODES
HKR,,MaxResolution
HKR,,DPMS
HKR,,ICMProfile

1280 section -- updates the MaxResolution registry key to the string value shown.

[1280]
HKR,,MaxResolution,,"1280, 1024"

DPMS section -- updates the DPMS registry key to 1 (TRUE), as shown in the following line. For a monitor that does not support power management, a similar line would instead set the DPMS key value to 0 (FALSE).

[DPMS]
HKR,,DPMS,,1

AddReg section -- You can specify entries under a MODES key in an add-registry section of a monitor INF
to identify the monitor's supported resolutions and timings. If the INF specifies modes in this way, the
modes' entries will override the values that are specified in the monitor's Extended Display Identification Data (EDID). Therefore, MODES key INF values should be used only if a problem exists with the EDID or in the interpretation of the EDID.
Each subkey to the MODES key specifies a resolution and can contain up to nine values that are used to
specify specific timings or timing ranges. The resolution for each subkey name must be a combination of
two integer values--width and height--separated by a comma. The specific timings are named from Mode1
to Mode9. The naming must be contiguous. The string values allow frequencies for horizontal and vertical
sync pulses to be specified, either as single values or as ranges, where a range is given as a minimum value,
followed by a dash (-), followed by a maximum value. The frequency values are currently interpreted only as
integers with any digits that follow the decimal place ignored. The string allows the polarity of the
horizontal and vertical sync pulses to be specified. However, these polarity values are currently ignored.
Only the maximum horizontal sync pulse value is required in each string. For example, the following shows
that for each subkey string, the information in square brackets is optional:

[{MinHSync}-]{MaxHSync}[,{MinVSync}-{MaxVSync}]

Therefore, each subkey string can be specified without a vertical sync range. However, it is not
recommended to specify a subkey string without a vertical sync range.
The first line of the following sets the "MODES\1280,1024" subkey to the string value that is shown. The
same line also identifies a value name for this subkey, Mode1. The first pair of numbers in the string
following the Mode1 value specifies the range of horizontal synchronization frequencies, in kHz. The next
pair of numbers in this string specifies the range of vertical synchronization frequencies, in Hz. In the
second line, the PreferredMode registry key is set to the values shown in the accompanying string. The
values in the string are used to set both the horizontal and the vertical resolution, in pixels, and the screen
refresh rate, in hertz (Hz), for the preferred screen mode. Only the horizontal and the vertical resolution
values are required in the PreferredMode string. For example, the following shows that for the
PreferredMode string, the information in square brackets is optional:

{Width},{Height}[,{Frequency}]

Therefore, a preferred mode can be specified without a frequency. However, it is not recommended to
specify a preferred mode without a frequency.
The third line sets the ICMProfile key to the string value "profile1.icm".

[ACME-1234.AddReg]
HKR,"MODES\1280,1024",Mode1,,"27.0-106.0,55.0-160.0,+,+"
HKR,,PreferredMode,,"1024,768,70"
HKR,,ICMProfile,0,"profile1.icm"

For a monitor that meets the sRGB specification, which is preferred, no monitor profile is needed.
Compatibility Testing Requirements for Display and
Video Miniport Drivers

This section lists some of the requirements for NT-based operating system graphics drivers and associated
components to meet the Windows Hardware Quality Lab (WHQL) test criteria. The drivers that meet or exceed
these tests are authorized to display the Microsoft Windows Logo. For more information, go to the Windows
Platform Development website.
INF and Installation Requirements
Video Miniport Driver and Display Driver Requirements (Windows 2000 Model)
Control Panel Requirements
INF and Installation Requirements

NT-based operating system display and video miniport drivers must be installed using an INF file. To ensure that all
registry entries associated with a video driver are properly initialized, this INF must be interpreted by the system-
supplied display class installer and marked as Class=Display.
Video Miniport Driver and Display Driver
Requirements (Windows 2000 Model)

This section provides links to other sections that contain information about requirements for video miniport drivers
and for display drivers.
Video Miniport Driver Requirements (Windows 2000 Model)
Display Driver Requirements
Control Panel Requirements

The following are requirements for extensions to the display Control Panel:
Existing property pages must remain unchanged. Any Microsoft-provided property page (including
Settings) must not be disabled, modified, removed, or replaced.
Custom property pages can be added only under Advanced Properties. The features exposed at the top
level are commonly accessed features that are included in every Windows system. Because Windows 2000
and later operating system versions support multiple displays, custom property pages cannot be added to
the top-level property page set in the Display Control Panel.
Control Panel extensions must be able to operate with existing Windows Control Panel elements.
Custom property pages must be labeled with an icon in addition to the text name. In order to prevent
conflicts with future operating system or shell releases, third-party tabs must contain, in addition to the page
label, either an icon (with the company's logo) or text with the name of the vendor. For example, "Acme
Video Controls" is acceptable; "Video Controls" is not.
Control panel extensions must not initialize if the necessary hardware/driver combination is not present. If
the display hardware that ships with the custom Control Panel extensions is not present, the extensions
should not load. Likewise, if a custom property page has features that are dependent on proprietary driver
extensions (for example, extensions that are not guaranteed to be present in every other display driver),
those features must disable themselves, or the property page must not load when the necessary driver is not
installed.
Control Panel extensions must respect the Hide modes that this monitor cannot display check box on the Monitor tab. If the check box is selected, the Control Panel extension must not display any modes that are not enumerated by EnumDisplaySettings (described in the Microsoft Windows SDK documentation).
Control panel state must be stored in the registry. No .ini files are allowed. Any state that is maintained by
your Control Panel extension must be stored in the SOFTWARE key in the registry, accessible through HKR
in the INF file.
Display Drivers (Windows 2000 Model)

Microsoft NT-based operating system display driver writers are concerned with two core software interfaces:
Graphics DDI interface--The set of functions that the display driver implements. GDI can call the graphics
DDI interface to process graphics commands.
GDI interface--System-supplied helper routines called by display drivers to simplify driver implementation.
This section describes key concepts associated with NT-based operating system display drivers as well as some
implementation information. See GDI Support for Graphics Drivers and Using the Graphics DDI for graphics driver
design details that are common to both printer drivers and display drivers, such as driver initialization and
termination, and graphics output.
Display driver writers can also implement the following DDIs:
DirectDraw DDI -- Graphics interface that allows vendors to provide hardware accelerations for DirectDraw.
See DirectDraw for details.
Direct3D DDI -- 3D graphics interface that allows vendors to provide hardware accelerations for Direct3D.
See Direct3D DDI for details.
For complete descriptions of the graphics DDI entry points and structures, as well as GDI service functions and
objects, see GDI Functions.
Graphics DDI Functions for Display Drivers

A Microsoft NT-based operating system display driver must implement several graphics DDI functions. Although a driver that capitalizes on existing GDI capabilities is smaller and simpler to write, you should make sure that your driver also implements those operations that it can perform more efficiently than GDI.
The display driver graphics DDI functions fall into three groups, each of which is discussed in following topics:
1. Graphics DDI functions required by every display driver.
2. Graphics DDI functions required under certain conditions.
3. Graphics DDI functions that are optional.
Required Display Driver Functions

At a minimum, every display driver must:


1. Enable and disable the graphics hardware.
2. Supply GDI with information about hardware capabilities.
3. Enable the drawing surface.
The following table lists the functions that all display drivers must implement. Following DrvEnableDriver, the
remaining functions are listed alphabetically. Note that except for DrvEnableDriver, which GDI calls by name, all
other display driver functions do not have fixed names, and are listed with pseudonames.

FUNCTION DESCRIPTION
DrvEnableDriver As the initial driver entry point, provides GDI with the driver version number and entry points of optional functions supported.
DrvAssertMode Resets the video mode for a specified video hardware device.
DrvCompletePDEV Informs the driver about the completion of device installation.
DrvDisableDriver Frees all allocated resources for the driver and returns the device to its initially loaded state.
DrvDisablePDEV When the hardware is no longer needed, frees memory and resources used by the device and any surface created, but not yet deleted.
DrvDisableSurface Informs the driver that the surface created for the current device is no longer needed.
DrvEnablePDEV Enables a PDEV.
DrvEnableSurface Creates a surface for a specified hardware device.
DrvGetModes Lists the modes supported by a specified video hardware device.

A list of required functions for all graphics drivers appears in Required Graphics Driver Functions.
Conditionally Required Display Driver Functions

Depending on how a driver is implemented and on the features of the underlying adapter, other graphics DDI
functions may be required. For example, if a driver manages its own surface (using EngCreateDeviceSurface to
get a handle to the surface), that driver must also, at a minimum, support the following drawing functions:

FUNCTION DESCRIPTION
DrvCopyBits Translates between device-managed raster surfaces and GDI standard-format bitmaps.
DrvStrokePath Draws a path (curve or line) when called by GDI.
DrvTextOut Renders a set of glyphs at specified positions.

Note Driver calls are serialized for any given surface.


Drivers that write to standard-format DIBs usually allow GDI to manage most or all of these operations. Displays
that support settable palettes must support the DrvSetPalette function.

FUNCTION DESCRIPTION
DrvSetPalette Requests that the driver realize the palette for a specified device. The driver sets the hardware palette to match the entries in the given palette as closely as possible.

A list of conditionally required functions for all graphics drivers appears in Conditionally Required Graphics Driver
Functions.
Optional Display Driver Functions

In order to reduce driver size, display driver writers usually add only those optional functions that are well
supported in video hardware. The display driver can implement the functions listed in the following tables. These
functions are sorted into the following categories:
Bitmap Management Functions
Drawing Functions
Image Color Management Functions
Pointer and Window Management Functions
Miscellaneous Functions
Bitmap Management Functions

FUNCTION DESCRIPTION
DrvCreateDeviceBitmap Creates and manages a bitmap with a driver-defined format.
DrvDeleteDeviceBitmap Deletes a device-managed bitmap.

Drawing Functions

FUNCTION DESCRIPTION
DrvAlphaBlend Provides bit-block transfer capabilities with alpha blending.
DrvBitBlt Provides general bit-block transfer capabilities between device-managed surfaces, between GDI-managed standard-format bitmaps, or between a device-managed surface and a GDI-managed standard-format bitmap.
DrvDitherColor Requests a device to create a brush dithered against a device palette.
DrvFillPath Paints a closed path for a device-managed surface.
DrvGradientFill Shades the specified primitives.
DrvLineTo Draws a single, solid, integer-only cosmetic line.
DrvPlgBlt Provides rotate bit-block transfer capabilities between combinations of device-managed and GDI-managed surfaces.
DrvRealizeBrush Realizes a specified brush for a defined surface.
DrvStretchBlt Allows stretching block transfers among device-managed and GDI-managed surfaces.
DrvStretchBltROP Performs a stretching bit-block transfer using a ROP.
DrvStrokeAndFillPath Simultaneously strokes and fills a path.
DrvTransparentBlt Provides bit-block transfer capabilities with transparency.

Image Color Management Functions

FUNCTION DESCRIPTION
DrvIcmCheckBitmapBits Checks whether the pixels in the specified bitmap lie within the device gamut of the specified transform.
DrvIcmCreateColorTransform Creates an ICM color transform.
DrvIcmDeleteColorTransform Deletes the specified ICM color transform.
DrvIcmSetDeviceGammaRamp Sets the hardware gamma ramp of the specified display device.

Pointer and Window Management Functions

FUNCTION DESCRIPTION
DrvDescribePixelFormat Describes the pixel format for a device-specified PDEV by writing a pixel format description to a PIXELFORMATDESCRIPTOR structure.
DrvMovePointer Moves a pointer to a new position and redraws it.
DrvSaveScreenBits Saves or restores a specified rectangle of the screen.
DrvSetPixelFormat Sets the pixel format of a window.
DrvSetPointerShape Removes the pointer from the screen if the driver has drawn it, and then sets a new pointer shape.

Miscellaneous Functions

FUNCTION DESCRIPTION
DrvDestroyFont Notifies driver that a font realization is no longer needed; driver can free allocated data structures.
DrvDrawEscape Implements draw-type escape functions.
DrvEscape Queries information from a device not available in a device-independent graphics DDI.
DrvFree Frees storage associated with an indicated data structure.
DrvNotify Allows a display driver to be notified about certain information by GDI.
DrvSynchronize Coordinates drawing operations between GDI and a display driver-supported coprocessor device; for engine-managed surfaces only.
DrvSynchronizeSurface Allows drawing operations performed by a device's coprocessor to be coordinated with GDI.

Display drivers can also optionally implement the Microsoft DirectDraw and/or Direct3D interfaces. See the
following sections for details:
DirectDraw
Direct3D DDI
A list of optional functions for all graphics drivers appears in Optional Graphics Driver Functions.
Display Driver Requirements

An NT-based operating system display driver must meet the requirements specified in the PC 99 Design Guide. If
the driver is to be submitted for Windows Hardware Quality Lab (WHQL) testing, it must meet WHQL
requirements as well.
Display Driver Initialization

Display driver initialization is similar to graphics driver initialization, as described in Supporting Initialization and
Termination Functions. This section provides initialization details that are specific to display drivers.
Video miniport and display driver initialization occur after the NT executive and the Win32 subsystem are loaded
and initialized. The system loads the video miniport driver or drivers that are enabled in the registry, and then
determines which video miniport driver and display driver pair to use. During this process, GDI opens all necessary
display drivers, based on the information provided by Window Manager.
The basic display driver initialization procedure, in which the desktop is created, is as follows.

1. When GDI is called to create the first device context (DC) for the video hardware, GDI calls the display driver
function DrvEnableDriver. Upon return, DrvEnableDriver provides GDI with a DRVENABLEDATA
structure that holds both the driver's graphics DDI version number and the entry points of all callable
graphics DDI functions that are implemented by the driver (other than DrvEnableDriver).
2. GDI then calls the driver's DrvEnablePDEV function to request a description of the driver's physical device's
characteristics. In the call, GDI passes in a DEVMODEW structure, which identifies the mode that GDI wants
to set. If GDI requests a mode that the display or underlying miniport driver does not support, then the
display driver must fail this call.
3. The display driver represents a logical device controlled by GDI. A single
logical device can manage several physical devices, each characterized by type of hardware, logical address,
and surfaces supported. The display driver allocates the memory to support the device it creates. A display
driver may be called upon to manage more than one PDEV for the same physical device, although only one
PDEV can be enabled at a time for a given physical device. Each PDEV is created in a separate GDI call to
DrvEnablePDEV, and each call creates another PDEV that is used with a different surface.
Because a driver must support more than one PDEV, it should not use global variables.
4. When installation of the physical device is complete, GDI calls DrvCompletePDEV. This function provides
the driver with a GDI-generated physical device handle to use when requesting GDI functions for the device.
5. In the final stage of initialization, a surface is created for the video hardware by a GDI call to
DrvEnableSurface, which enables graphics output to the hardware. Depending on the device and the
environment, the display driver enables a surface in one of two ways:
The driver manages its own surface by calling the GDI function EngCreateDeviceSurface to obtain a
handle for the surface. The device-managed surface method is required for hardware that does not
support a standard-format bitmap and is optional for hardware that does.
GDI can manage the surface completely as an engine-managed surface if the hardware device has a
surface organized as a standard-format bitmap. A driver can call EngModifySurface to convert the
device-managed primary bitmap to one that is engine-managed. The driver can still hook any drawing
operations.
Any existing GDI bitmap handle is a valid surface handle. A driver can call EngModifySurface to convert the
device-managed primary bitmap to an engine-managed bitmap. If the surface is engine-managed, GDI can handle
any or all drawing operations. If the surface is device-managed, at a minimum, the driver must handle DrvTextOut,
DrvStrokePath, and DrvBitBlt.
GDI automatically enables DirectDraw after calling DrvEnableSurface. After DirectDraw is initialized, the driver
can use DirectDraw's heap manager to perform off-screen memory management. See DirectDraw and GDI for
details.
A display driver must implement DrvNotify in order to receive notification events, particularly the
DN_DRAWING_BEGIN event. GDI sends this event immediately before it begins drawing, so it can be used to
determine when caches can be initialized.
See the Plug and Play section for more information about the boot process.
Synchronization Issues for Display Drivers

Microsoft advises that the display driver not call any GDI functions while holding a lock. It is especially important
that the display driver not call any of the following functions while holding a mutex. Doing so can lead to a
deadlock.
BRUSHOBJ_hGetColorTransform
BRUSHOBJ_pvAllocRbrush
BRUSHOBJ_pvGetRbrush
BRUSHOBJ_ulGetBrushColor
CLIPOBJ_bEnum
CLIPOBJ_cEnumStart
CLIPOBJ_ppoGetPath
EngAcquireSemaphore
EngAllocMem
EngAllocPrivateUserMem
EngAllocUserMem
EngAlphaBlend
EngAssociateSurface
EngBitBlt
EngCheckAbort
EngComputeGlyphSet
EngControlSprites
EngCopyBits
EngCreateBitmap
EngCreateClip
EngCreateDeviceBitmap
EngCreateDeviceSurface
EngCreateDriverObj
EngCreateEvent
EngCreatePalette
EngCreatePath
EngCreateSemaphore
EngCreateWnd
EngDeleteClip
EngDeleteDriverObj
EngDeleteEvent
EngDeletePalette
EngDeletePath
EngDeleteSafeSemaphore
EngDeleteSemaphore
EngDeleteSurface
EngDeleteWnd
EngDxIoctl
EngEraseSurface
EngFillPath
EngFntCacheAlloc
EngFntCacheFault
EngFntCacheLookUp
EngFreeMem
EngFreeModule
EngFreePrivateUserMem
EngFreeUserMem
EngGetType1FontList
EngGradientFill
EngHangNotification
EngInitializeSafeSemaphore
EngLineTo
EngLoadImage
EngLoadModule
EngLoadModuleForWrite
EngLockDirectDrawSurface
EngLockDriverObj
EngLockSurface
EngMapEvent
EngMapFile
EngMapFontFileFD
EngMarkBandingSurface
EngModifySurface
EngMovePointer
EngNineGrid
EngPaint
EngPlgBlt
EngQueryPalette
EngReleaseSemaphore
EngSetPointerShape
EngStretchBlt
EngStretchBltROP
EngStrokeAndFillPath
EngStrokePath
EngTextOut
EngTransparentBlt
EngUnloadImage
EngUnlockDirectDrawSurface
EngUnlockDriverObj
EngUnlockSurface
EngUnmapEvent
EngUnmapFile
EngUnmapFontFile
EngUnmapFontFileFD
EngWaitForSingleObject
FONTOBJ_cGetAllGlyphHandles
FONTOBJ_cGetGlyphs
FONTOBJ_pifi
FONTOBJ_pjOpenTypeTablePointer
FONTOBJ_pQueryGlyphAttrs
FONTOBJ_pvTrueTypeFontFile
FONTOBJ_pxoGetXform
FONTOBJ_vGetInfo
HeapVidMemAllocAligned
PALOBJ_cGetColors
PATHOBJ_bEnumClipLines
PATHOBJ_bMoveTo
PATHOBJ_bPolyBezierTo
PATHOBJ_vEnumStartClipLines
PATHOBJ_vGetBounds
STROBJ_bEnum
VidMemFree
WNDOBJ_bEnum
WNDOBJ_cEnumStart
WNDOBJ_vSetConsumer
XFORMOBJ_bApplyXform
XFORMOBJ_iGetFloatObjXform
XFORMOBJ_iGetXform
XLATEOBJ_cGetPalette
XLATEOBJ_hGetColorTransform
XLATEOBJ_iXlate
XLATEOBJ_piVector
Debugging Display Drivers

The following topics in this section describe debugging and testing techniques that are specific to display drivers:
Disabling EA Recovery
Disabling the Watchdog Timer While Testing Display Drivers
Disabling EA Recovery

In Microsoft Windows XP SP1 and later operating systems, GDI uses a watchdog timer to monitor the time that
threads spend executing in the display driver. The watchdog defines a time threshold. If a thread spends more time
in a display driver than the threshold specifies, the watchdog tries to recover by switching to VGA graphics mode. If
the attempt fails, the watchdog generates bug check 0xEA, THREAD_STUCK_IN_DEVICE_DRIVER.
Before attempting to recover, the watchdog will break into any debugger that is attached to the computer. You can
then debug the code--as long as you have first disabled EA recovery.
In Windows XP SP1, disable EA recovery by setting the global variable WdDisableRecovery, which is located in
watchdog.sys, to 1. To do so, you can enter the following WinDbg command:

ed watchdog!WdDisableRecovery 1

In Microsoft Windows Server 2003, disable EA recovery by setting the global variable VpDisableRecovery, which
is located in videoprt.sys, to 1. To do so, you can enter the following WinDbg command:

ed videoprt!VpDisableRecovery 1

After you have disabled EA recovery, put breakpoints in your display driver where you suspect the code is looping,
and resume execution.
Disabling the Watchdog Timer While Testing Display
Drivers

In Microsoft Windows XP SP1 and later operating systems, GDI uses a watchdog timer to monitor the time that
threads spend executing in the display driver. The watchdog defines a time threshold. If a thread spends more time
in a display driver than the threshold specifies, the watchdog tries to recover by switching to VGA graphics mode. If
the attempt fails, the watchdog generates bug check 0xEA, THREAD_STUCK_IN_DEVICE_DRIVER.
If, during debugging and testing, you use software emulation for the rendering that a display adapter will
eventually perform, you might need to increase the watchdog time threshold. Otherwise, it is likely that the
emulation code, which renders significantly more slowly than hardware does, will exceed the threshold.
To specify the watchdog time threshold for display drivers, create the following REG_DWORD entry in the registry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\BreakPointDelay

Set the value of BreakPointDelay to the watchdog time threshold, in 10-second units. For example, a value of 200
specifies a threshold of 2,000 seconds.
If you test your display driver without an attached debugger, you can prevent the watchdog timer from generating
a bug check. To do so, create the following REG_DWORD entry in the registry, and set its value to 1:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\DisableBugCheck

The techniques described in this topic are only for debugging and testing. Do not release a driver that creates or
alters BreakPointDelay or DisableBugCheck.
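For local testing only, you might set both values with AddReg-style registry entries similar to the following sketch, where 0x00010001 is the AddReg flag that denotes a REG_DWORD value:

HKLM,"SYSTEM\CurrentControlSet\Control\Watchdog\Display","BreakPointDelay",0x00010001,200
HKLM,"SYSTEM\CurrentControlSet\Control\Watchdog\Display","DisableBugCheck",0x00010001,1
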
Display Hardware Acceleration Slider

The Display Properties dialog box has a hardware acceleration slider that can be helpful when you debug a
display driver. By using the slider, you can set the display hardware acceleration support to one of six levels ranging
from level 0 (full acceleration) to level 5 (no acceleration).
To find the hardware acceleration slider in Microsoft Windows XP, open the Display Properties dialog box and
click the Settings tab. Click the Advanced button, and then click the Troubleshoot tab.
The following list describes the portion of hardware acceleration that is disabled at each level. Any feature that is
disabled at a particular level is disabled in all subsequent levels.
Level 0
The slider is in the far right position. Hardware acceleration is fully enabled.
Level 1
Hardware cursor and device-bitmap support are disabled.
Level 2
The following display driver functions are not called. Instead, GDI performs the operations in software.
DrvStretchBlt
DrvPlgBlt
DrvFillPath
DrvStrokeAndFillPath
DrvLineTo
DrvStretchBltROP
DrvTransparentBlt
DrvAlphaBlend
DrvGradientFill
Level 3
Microsoft DirectDraw and Direct3D support are disabled.
Level 4
Only the following graphics operations are accelerated.
DrvTextOut
DrvBitBlt
DrvCopyBits
DrvStrokePath
Also, the following display driver functions are not called.
DrvSaveScreenBits
DrvEscape
DrvDrawEscape
DrvResetPDEV
DrvSetPixelFormat
DrvDescribePixelFormat
DrvSwapBuffers
Level 5
The slider is in the far left position. The panning driver (part of kernel-mode GDI) handles all rendering. GDI calls the
display driver's DrvEnablePDEV and DrvEnableSurface functions to create a primary surface and also calls the
display driver to set the display mode. The display driver is not called to do any rendering.
Another way to limit display hardware acceleration is to set flags in the CapabilityOverride registry entry. For
example, setting the 0x2 flag in the CapabilityOverride entry is equivalent to placing the hardware acceleration
slider at level 3. For a description of the CapabilityOverride registry entry, see Display INF File Sections.
Desktop Management

A display driver must implement DrvAssertMode and DrvGetModes to manage desktops.


If the display driver is palette-managed, it will also receive a call to DrvSetPalette to reset its palette to the correct
state.
GDI's mechanism for handling dynamic mode changes has changed significantly in Windows 2000 and later
operating system versions. The GDI HDEV assigned to a driver during initialization may differ from the HDEV
assigned after the mode change is complete. Display drivers will generally be unaffected by this change for the
following reasons:
Drivers have always assigned ppdev->hdevEng = hdev in their DrvCompletePDEV implementations.
Drivers have always referenced ppdev->hdevEng in any callbacks that require an HDEV.
Switching Desktops: Responding to DrvAssertMode

When switching between desktops on a display, Window Manager ensures that the desktops are properly redrawn
and that a mouse pointer is enabled and displayed in the correct position. The display driver receives a call to
DrvAssertMode only when there is a desktop switch.
When this function is called, the driver ensures that the indicated PDEV is either in the mode specified when the
PDEV was created, or in text mode. Window Manager then selects the correct pointer shape and moves it to the
current position. GDI, not the driver, is responsible for maintaining the mouse pointer state.
GDI calls DrvAssertMode to set the mode of a specified hardware device. This function selects either the mode
specified when the display driver-defined PDEV structure was created or the default mode of the hardware. The
driver should keep a record of the current mode of the PDEV.
GDI also calls DrvAssertMode, with the enable parameter set to FALSE, when the user switches from a windowed application to a full-screen application (on x86 platforms), or when the user switches desktops (on all platforms).
The display driver must restore the video hardware to a default mode by sending IOCTL_VIDEO_RESET_DEVICE in
an EngDeviceIoControl call to the video miniport driver.
Returning Display Modes: DrvGetModes

The display driver must also support DrvGetModes. This function gives GDI a pointer to an array of DEVMODEW
structures. The structures define the attributes of the display for the various modes that it supports, including the
dimension (in both pixels and millimeters), number of planes, bits per plane, color information, and so on.
The order in which a driver writes the available display modes to memory when the DrvGetModes function is
called can affect the final display mode that Windows chooses. In general, if an application does not specify a
default mode, the system will select the first matching mode in the list supplied by the driver.
For example, suppose that the current display mode is
800x600x32bpp@60Hz DMDO_DEFAULT DMDFO_CENTER
and the driver specifies the list of available display modes as follows:
A. 600x800x32bpp@60Hz DMDO_270 DMDFO_STRETCH
B. 600x800x32bpp@60Hz DMDO_90 DMDFO_STRETCH
C. 600x800x32bpp@60Hz DMDO_90 DMDFO_CENTER
D. 600x800x32bpp@60Hz DMDO_270 DMDFO_CENTER
Case 1
If an application attempts to set the monitor to 600x800x32bpp@60Hz, but the DM_DISPLAYORIENTATION and
DM_DISPLAYFIXEDOUTPUT flags are not set in the dmFields member of DEVMODEW, the system must choose
the orientation and fixed output modes. In this case the system will choose display mode C because it is the first
listed mode that matches the current DMDFO_CENTER setting.
Case 2
If the application attempts to set the monitor to 600x800x32bpp@60Hz DMDFO_STRETCH, the system will choose
display mode A.
Case 3
If the application attempts to set the monitor to 600x800x32bpp@60Hz DMDO_270, the system will choose display
mode D.
Case 4
If the application attempts to set the monitor to 600x800x32bpp@60Hz DMDO_DEFAULT, the system will fail to
find an acceptable match.
One exception applies to these rules: when the system seeks a match for the display orientation, and the orientation
is not specified and the current mode cannot be matched, the system will give DMDO_DEFAULT priority over other
display orientations.
For example, suppose that the current display mode is
600x800x32bpp@60Hz DMDO_90 DMDFO_STRETCH
and the driver specifies the list of available display modes as follows:
A. 800x600x32bpp@60Hz DMDO_180 DMDFO_CENTER
B. 800x600x32bpp@60Hz DMDO_180 DMDFO_STRETCH
C. 800x600x32bpp@60Hz DMDO_DEFAULT DMDFO_CENTER
D. 800x600x32bpp@60Hz DMDO_DEFAULT DMDFO_STRETCH
In this situation, if the application attempts to set the monitor to 800x600x32bpp@60Hz, the system will choose
display mode D.
Supporting Multiple PDEVs

This section shows how an application can create a new PDEV while the current PDEV is still loaded. Control Panel's
Display program requires that a display driver support the enabling of additional PDEVs, because it is possible for
an application to create a new PDEV with a new desktop. Specifically, the end user can click the Display icon to run a
test on changes to such elements as the size, number of colors, and refresh rate of the screen. The Display program
creates a new desktop dynamically to test the display's mode changes.
GDI performs the following steps when the user clicks on the Display icon to request a mode change. These steps
assume that no active Direct3D, WNDOBJ, or DRIVEROBJ objects are owned by the current driver instance.
1. Temporarily disable the current PDEV
Call DrvAssertMode (old PDEV instance). The call is made with FALSE if the miniport driver is to assume control of the device, and with TRUE if the PDEV should be used.
2. Load new driver (if required by new PDEV instance)
Call DrvEnableDriver (new driver instance).
3. Create new PDEV
Call DrvEnablePDEV (new PDEV instance).
Call DrvCompletePDEV (new PDEV instance).
Call DrvEnableSurface (new PDEV instance).
4. Get DirectDraw information (if DirectDraw is hooked by driver). Second call to DrvGetDirectDrawInfo is
made only if the first call succeeds.
Call DrvGetDirectDrawInfo (new PDEV instance).
Call DrvGetDirectDrawInfo (new PDEV instance).
5. Enable DirectDraw (if hooked by driver and previous call to DrvGetDirectDrawInfo succeeded).
Call DrvEnableDirectDraw (new PDEV instance).
6. Copy old PDEV state to new PDEV instance (if both instances use same driver, and DirectDraw is hooked by
driver).
Call DrvResetPDEV.
7. Notify each driver instance of its new HDEV association. The first call to DrvCompletePDEV notifies the new
driver instance; the second call notifies the old driver instance.
Call DrvCompletePDEV (new PDEV instance).
Call DrvCompletePDEV (old PDEV instance).
The driver should use the new HDEV value in any callbacks to GDI that require an HDEV.
8. Disable DirectDraw (if hooked by the driver and DirectDraw is active).
Call DrvDisableDirectDraw (old PDEV instance).
9. Disable surface.
Call DrvDisableSurface (old PDEV instance).
10. Disable PDEV.
Call DrvDisablePDEV (old PDEV instance).
In this example, GDI temporarily disables the current PDEV when the user clicks Apply, and then creates a second
PDEV that matches the display mode selections in the dialog box. After the user views a bitmap on the display
screen under the test mode, the second PDEV is destroyed and the Display program restores the original PDEV for
the desktop. Note that without the ability to revert back to the original display settings, the system would become
unusable if the settings were incompatible with the hardware and driver.
If the current instance of the driver owns a Direct3D, WNDOBJ, or DRIVEROBJ object, the driver's view of the
previous mode change sequence changes as follows (note that in Windows 2000 and later, DirectDraw is always
enabled as soon as the driver is initialized):
Destruction of the owning driver instance will be deferred. Specifically, the second call to
DrvCompletePDEV in step 7, step 8, and step 9 will not occur at the time of the mode change. As a result,
the old driver instance is disabled due to the call to DrvAssertMode(FALSE) in step 1, and is retained until
either the system does a mode change back to the original mode, or until all the objects that reference the
instance are destroyed.
If the system reverts back to the original mode before the referencing objects are destroyed, the original
driver instance will be resurrected. That is, steps 2 through 5 do not occur, and the original driver instance is
reenabled by a call to DrvAssertMode(TRUE) (see step 1).
If the system does not revert back to the original mode before all the referencing objects are destroyed, then
the driver instance will be destroyed when the final referencing object is destroyed. That is, the second call to
DrvCompletePDEV in step 7, step 8, and step 9 will occur at the time the final referencing object is destroyed
(for example, when all the owning processes are terminated).
An implication of this is that Direct3D or OpenGL drivers can be called to destroy an inactive driver instance at any
time. These drivers can be called even if another instance of the driver is currently active, or if the driver is in full-
screen MS-DOS mode, or if another driver owns the hardware entirely (such as the VGA driver). Consequently, the
DrvDisableDirectDraw, DrvDisableSurface, and DrvDisablePDEV routines (see steps 8-10) of a driver cannot
assume that the device is in graphics mode and that they own exclusive access. As a general rule, drivers should not
manipulate their video hardware in their DrvDisableXxx routines unless they know that their instance is currently
active (by remembering the state from the last DrvAssertMode call).
Note A PDEV is private to a driver and contains all the information and data that represents the associated physical
device. To create multiple PDEVs, the graphics driver must meet both of the following requirements:
1. The driver must not use global variables in place of members of a PDEV structure. If global variables are used, they might contain or point to random data when a new PDEV is created or an old one is restored. All state information must be saved in the PDEV. The PDEV is always passed to any graphics
operation and is therefore used to get or set global data.
2. The DrvDisableSurface, DrvDisablePDEV, and DrvDisableDriver routines must be implemented in the
graphics driver so that an application can create and destroy additional PDEVs, and in some cases load more
than one driver.
Note If the driver's version number is 1.0, GDI will not call the driver to create a second PDEV. The version number
of the driver is returned in DRVENABLEDATA.
Note Occasionally, the Display program's test bitmap will be displayed using a different driver than the currently-
loaded driver. For example, if a system is running in 16-color mode with the VGA driver and testing a 64K-color
mode with the VGA64K display driver, the VGA64K driver will be loaded dynamically and unloaded when the test is
complete.
Pointer Control

Every application must be able to control a pointer that moves around a windowed display in response to a
pointing device, such as a mouse. The display driver, GDI, or the video miniport driver can draw the pointer. Refer
also to Controlling the Pointer and Moving the Pointer.
GDI can directly handle all pointer drawing for a display that uses a linearly addressable buffer. For a device that is
not a linear frame buffer, GDI uses DrvCopyBits for pointer drawing. However, pointer code supported by
hardware and implemented in the display driver is much faster.
Display drivers can sometimes choose which kinds of pointers they will draw and which kind they will allow GDI to
handle. For example, a device might support monochrome pointers in hardware but fail the color pointer calls,
allowing GDI to handle them instead.
The display driver can control the pointer in situations in which the processor does not have to be owned exclusively and the pointer does not have to be drawn from an interrupt, such as the vertical synchronization interrupt. When exclusive ownership or interrupt-driven drawing is required, the miniport driver must instead draw and control the pointer, because certain kernel-mode callbacks (which are available only in the video miniport driver) are required. This can adversely affect performance because it requires IOCTLs to communicate with the miniport driver for each pointer operation.
To write a display driver and miniport driver pair, you must include IOCTLs for passing pointer information
between the two drivers, and to allow the miniport driver to assume the drawing of any or all pointers, if necessary.
Pointer Drawing

GDI supports both color pointers and monochrome pointers. The shape of a monochrome pointer is defined by a
single bitmap. The width of the bitmap is the same as the width of the pointer on the display, but the bitmap has twice the vertical extent of the pointer as it appears on the display, which allows it to contain two masks.
Calls to the pointer functions are serialized by GDI. This means two different threads in the driver cannot execute
the pointer functions simultaneously. There are two possible pointer functions: DrvSetPointerShape and
DrvMovePointer.
Drawing Monochrome Pointers

A monochrome bitmap consists of two parts: the first defines the AND mask for the pointer; the second defines the
XOR mask. Together these masks provide two bits of information for each pixel of the pointer image. The following
table describes the result that is displayed for the indicated values in the AND and XOR masks.

AND MASK VALUE XOR MASK VALUE RESULT DISPLAYED

0 0 Pixel is black

0 1 Pixel is white

1 0 Pixel is unchanged (transparent)

1 1 Pixel color is inverted

This bitmap definition and usage supplies a black-and-white image, while providing support for transparency and
inversion of the pixels that make up the pointer.
Drawing Color Pointers

Defining the color pointer in the same way as the monochrome pointer (that is, as a bitmap that includes an AND
portion and an XOR portion -- see Pointer Drawing and Drawing Monochrome Pointers) also supports
transparency and inversion.
If the color bitmap is black (index 0) and the AND mask value is 1, then the result is transparency.
If the color bitmap is white and the AND mask value is 1, then the result is inversion.
If the device cannot display color, then a pointer can be drawn as black and white.
These conventions allow applications to use a single pointer definition for both color and monochrome displays.
Controlling the Pointer: DrvSetPointerShape

If a display driver controls the pointer, then the driver must support DrvSetPointerShape to allow the pointer
shape to be changed. A call to DrvSetPointerShape produces the following results:
1. The function removes any existing pointer that the driver has drawn on the display.
2. The function sets the new requested shape, unless it is unable to handle the shape.
3. The new pointer is displayed at the position indicated by the parameters of the call.
The driver can call EngSetPointerShape to have GDI manage a software cursor.
Moving the Pointer: DrvMovePointer

If DrvSetPointerShape is included in the driver, then DrvMovePointer must also be supported. This function
moves a driver-managed pointer to a new position. Because GDI serializes calls to pointer functions,
DrvMovePointer is not called while any thread is drawing in the display driver, unless the GCAPS_ASYNCMOVE
flag has been set in the DEVINFO structure.
The driver should call EngMovePointer to have GDI move an engine-managed pointer on the device. The driver
requests that GDI manage the cursor by calling EngSetPointerShape.
Managing Display Palettes

If the video hardware supports colors that can be set, it maintains a color lookup table called a palette. GDI takes
each RGB value and translates it into a device color index so that it can be displayed. GDI uses precalculated and
cached tables for the translation. These tables are accessible to drivers as the user object XLATEOBJ. Therefore,
every GDI graphics function that takes source colors and moves them to a destination device uses a XLATEOBJ
structure to translate the colors. For more information about palettes and how GDI handles them, see GDI Support
for Palettes.
If the video hardware supports palettes that can be set, GDI calls the DrvSetPalette function in the display driver
when it has finished mapping colors into the device palette requested by the application. GDI passes the new
palette to the display driver, and the driver queries the PALOBJ to set its internal hardware palette to match the
palette changes for the video hardware. This is known as palette realization.
The DrvSetPalette function supplies a handle to a PDEV to the driver, and requests the driver to realize the palette
for that device. The driver should set the hardware palette to match the entries in the given palette as closely as
possible.
This entry point is required if the device supports a palette that can be set, and should not be provided otherwise. A
display driver specifies that its device has a settable palette by setting the GCAPS_PALMANAGED bit in the
flGraphicsCaps field of the DEVINFO structure returned in DrvEnablePDEV.
The service routine PALOBJ_cGetColors is available to display drivers. This function downloads RGB colors from
an indexed palette, and should be called from within the implementation of DrvSetPalette.
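The following is a minimal sketch of such a palette realization, assuming a hypothetical WriteDacEntry helper that programs one hardware palette register and a PDEV structure of the driver's own design.

#define MAX_CLUT_ENTRIES 256

BOOL DrvSetPalette(
    DHPDEV  dhpdev,
    PALOBJ *ppalo,
    FLONG   fl,
    ULONG   iStart,
    ULONG   cColors)
{
    ULONG aulColors[MAX_CLUT_ENTRIES];
    ULONG i;

    if (iStart + cColors > MAX_CLUT_ENTRIES)
        return FALSE;

    // Download the RGB values for the requested range of palette entries.
    if (PALOBJ_cGetColors(ppalo, iStart, cColors, aulColors) != cColors)
        return FALSE;

    // Realize the palette: program each entry into the hardware DAC.
    for (i = 0; i < cColors; i++)
        WriteDacEntry((PDEV *)dhpdev, iStart + i, aulColors[i]);

    return TRUE;
}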
Bitmaps in Display Drivers

Certain devices, such as the 16-color VGA display, can more rapidly perform bit-block transfers from nonstandard
bitmaps. To support this, a driver can hook DrvCreateDeviceBitmap, which allows the driver to create a bitmap
that it manages completely. When a driver creates such a bitmap, the driver can store it in any format. The
driver examines the passed parameters and provides a bitmap with at least as many bits-per-pixel as requested.
The contents of the bitmap are undefined after creation. If the application requests a device-managed bitmap, GDI
calls the driver for drawing functions after DrvCreateDeviceBitmap returns control. If the driver returns FALSE,
the driver-managed bitmap is not created, so GDI can handle drawing operations on an engine-managed surface.
The DrvSaveScreenBits function is also related to bit-block transfers in display drivers. Some display drivers can
move data to or from off-screen device memory more rapidly than an area can be redrawn or copied from a DIB.
These drivers can hook DrvSaveScreenBits, which lets the driver be called to save or restore a specified rectangle
of a displayed image more quickly when a menu or dialog box appears.
Note For bit-block transfer calls, GDI (not the driver) handles pointer exclusion and clip region locking.
Drivers that implement device bitmaps in off-screen memory can significantly improve system performance. Off-
screen device bitmaps improve system performance by:
Using accelerator hardware in place of GDI to draw.
Improving the speed of bitmap-to-screen bit-block transfers.
Reducing demands on main memory (a bitmap stored in off-screen memory isn't taking up space in main
memory).
Leveraging hardware to perform operations that support OpenGL, such as mask bit-block transfers and
double-buffering.
Drivers should implement device bitmaps in off-screen memory through DrvCreateDeviceBitmap.
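A minimal sketch follows, assuming a hypothetical off-screen heap (AllocOffScreen and FreeOffScreen) and PDEV and DSURF structures of the driver's own design. Returning 0 tells GDI to fall back to an engine-managed surface.

HBITMAP DrvCreateDeviceBitmap(DHPDEV dhpdev, SIZEL sizl, ULONG iFormat)
{
    PDEV    *ppdev = (PDEV *)dhpdev;
    DSURF   *pdsurf;
    HBITMAP  hbm;

    // Only accelerate bitmaps at the screen's pixel depth (a common
    // simplification; the bitmap must have at least the requested depth).
    if (iFormat != ppdev->iBitmapFormat)
        return 0;

    pdsurf = AllocOffScreen(ppdev, sizl);        // reserve off-screen memory
    if (pdsurf == NULL)
        return 0;

    // Create the device bitmap and tie it to this PDEV so GDI routes the
    // hooked drawing calls (DrvBitBlt, DrvCopyBits, and so on) to the driver.
    hbm = EngCreateDeviceBitmap((DHSURF)pdsurf, sizl, iFormat);
    if (hbm == 0 ||
        !EngAssociateSurface((HSURF)hbm, ppdev->hdevEng, ppdev->flHooks))
    {
        if (hbm != 0)
            EngDeleteSurface((HSURF)hbm);
        FreeOffScreen(ppdev, pdsurf);
        return 0;
    }

    return hbm;
}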
Asynchronous Rendering

A display driver that handles one or more graphics DDI drawing operations asynchronously and provides GDI
access to its bitmaps through the use of EngModifySurface must implement a synchronization routine. The driver
must also provide a synchronization routine in order to avoid drawing errors if it batches graphics DDI drawing
operations.
Such a driver can implement either DrvSynchronizeSurface or DrvSynchronize as the
synchronization routine. GDI calls one of these routines only when the driver has hooked them in
EngAssociateSurface. GDI will call only DrvSynchronizeSurface in drivers that hook both of these
synchronization routines.
DrvSynchronizeSurface provides additional information to the driver regarding synchronization events and why
they occur. This enables the driver to reduce performance lag due to synchronization. For example, drivers that
track which device bitmaps are in the accelerator's queue might be able to return immediately from
DrvSynchronizeSurface if the specified surface is not currently in the queue.
In addition to providing a synchronization routine, a driver can also activate a time-based or programmatic flush
mechanism by setting the following flags in the flGraphicsCaps2 field of the DEVINFO structure:
GCAPS2_SYNCTIMER -- Setting this flag causes the driver's synchronization routine to be called periodically.
Drivers that batch graphics DDI calls must specify this flag. By doing so, these drivers avoid problems such
as lag in a software cursor's movement or in drawing that is performed in bursts.
GDI passes the DSS_TIMER_EVENT flag to DrvSynchronizeSurface when this synchronization routine is
called due to a periodic event.
GCAPS2_SYNCFLUSH -- Setting this flag causes the driver's synchronization routine to be called whenever
the Microsoft Win32 GdiFlush API is called. Drivers that perform asynchronous rendering must specify this
flag and provide a synchronization routine.
GDI passes the DSS_FLUSH_EVENT flag to DrvSynchronizeSurface when this synchronization routine is
called due to a flush-based event. See the Microsoft Windows SDK documentation for more information
about GdiFlush.
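A minimal sketch of such a synchronization routine is shown below, assuming hypothetical bSurfaceInHwQueue and vFlushHwQueue helpers; the driver would also set GCAPS2_SYNCTIMER and GCAPS2_SYNCFLUSH in the flGraphicsCaps2 field of the DEVINFO structure returned from DrvEnablePDEV.

VOID DrvSynchronizeSurface(SURFOBJ *pso, RECTL *prcl, FLONG fl)
{
    // fl carries DSS_TIMER_EVENT or DSS_FLUSH_EVENT to indicate why GDI
    // called; either way, the batched drawing must be drained before any
    // CPU access to the surface.
    if ((pso != NULL) && !bSurfaceInHwQueue(pso))
    {
        // The surface has no pending accelerator work; return immediately.
        return;
    }

    vFlushHwQueue();
}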
Limitations on Batching DirectDraw Drawing Calls
The driver must never batch DirectDraw drawing calls when the destination surface is the visible screen. Such a
situation occurs in a windowed DirectX application where the completed frame is updated to the screen via DdBlt
and should thus be displayed immediately. This restriction also applies to DirectDraw video port surfaces, which
might be flipped asynchronously.
Transparency in Display Drivers

If the display hardware supports transparency, the display driver should implement DrvTransparentBlt.
To reduce the cost of reading from video memory, drivers should implement this function when both the source
and destination surfaces are in video memory. Drivers should let GDI process transparent bit-block transfers from
system memory to video memory, and let GDI handle stretched bit-block transfers as well.
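A minimal sketch of that policy follows, assuming a hypothetical bHwTransparentBlt helper; only unstretched transfers between device-managed surfaces are accelerated, and everything else is punted back to GDI.

BOOL DrvTransparentBlt(SURFOBJ *psoDst, SURFOBJ *psoSrc, CLIPOBJ *pco,
                       XLATEOBJ *pxlo, RECTL *prclDst, RECTL *prclSrc,
                       ULONG iTransColor, ULONG ulReserved)
{
    // Both surfaces must be device-managed (not GDI system-memory bitmaps)
    // and the blt must be unstretched for the hardware path to be worthwhile.
    if (psoDst->iType != STYPE_BITMAP && psoSrc->iType != STYPE_BITMAP &&
        (prclDst->right - prclDst->left) == (prclSrc->right - prclSrc->left) &&
        (prclDst->bottom - prclDst->top) == (prclSrc->bottom - prclSrc->top))
    {
        return bHwTransparentBlt(psoDst, psoSrc, pco, pxlo,
                                 prclDst, prclSrc, iTransColor);
    }

    // Let GDI process the operation (system-memory source, stretching, ...).
    return EngTransparentBlt(psoDst, psoSrc, pco, pxlo,
                             prclDst, prclSrc, iTransColor, ulReserved);
}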
Special Effects in Display Drivers

Windows 2000 and later operating system versions support the following special effects:
If the display hardware supports alpha blending, the display driver can implement DrvAlphaBlend.
If the display hardware supports gradient fills, the display driver should implement DrvGradientFill.
Alpha Blending
The Microsoft Windows 2000 (and later) Shell uses alpha blending extensively to perform operations such as
blend-in and blend-out animations and alpha cursors. Because alpha blend operations require reading from both
the source and destination surfaces, it is very slow to punt to GDI when either the source or destination is in video
memory. Consequently, hardware accelerations in the driver will yield visibly smoother animations and improve
overall system performance.
Drivers should implement DrvAlphaBlend for bit-block transfers from compatible bitmaps using a constant alpha,
and from 32 bpp BGRA system-memory surfaces with per-pixel alpha values. DrvAlphaBlend can be implemented
using triangle texture fills, provided that no seam is ever visible.
The worst-case error produced by DrvAlphaBlend should not exceed one (1) per color channel. When stretching is
involved, the source should be COLORONCOLOR-stretched (see the Windows SDK documentation) prior to
blending; the worst-case error should not exceed one (1) per color channel combined with the worst-case
stretching error.
In cases where alpha blending is combined with stretching, there are tests in the WDK that evaluate a display
driver's implementation of DrvAlphaBlend in the following way:
1. The test calls the display driver's DrvAlphaBlend, producing an alpha-blended and stretched rectangle.
2. The test generates a destination rectangle, using the same source rectangle as was used in the call to
DrvAlphaBlend.
3. For each pixel P in the destination rectangle of step 2, the test simulates a reverse stretch to determine the
corresponding pixel in the source rectangle, before stretching. The test applies a tolerance value to the
reverse stretch to accommodate the varying stretch implementations by drivers. The test then calculates the
alpha blend that should be applied to that pixel.
Because any of four possible pixels (the corners of the 3 X 3 pixel square centered on pixel P) in the source
rectangle could be stretched to produce pixel P in the destination rectangle, the test must compare the color
value of each corner pixel with that of the pixel at the corresponding position in the rectangle produced by
DrvAlphaBlend.
The worst-case stretching error is the largest difference in color value between any pair of corresponding corner
pixels, where one of them is on the DrvAlphaBlend-produced rectangle, and the other is on the test-produced
source rectangle.
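For reference, the per-channel computation that a constant-alpha DrvAlphaBlend is expected to approximate (within the worst-case error of one per channel) can be sketched as follows; the 0-255 alpha range and the rounding shown here are illustrative assumptions.

ULONG BlendChannel(ULONG ulDst, ULONG ulSrc, ULONG ulAlpha)
{
    // dst' = src*alpha + dst*(1 - alpha), with alpha in 0..255 and rounding.
    return (ulSrc * ulAlpha + ulDst * (255 - ulAlpha) + 127) / 255;
}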
Gradient Fills
The Windows 2000 (and later) Shell uses gradient fills on all caption bars.
The results produced by DrvGradientFill depend on the number of bits per pixel, and must satisfy the following
guidelines:
24-bpp or 32-bpp surfaces
Values must increase or decrease monotonically in all gradated directions.
For rectangular gradients: When ulMode == GRADIENT_FILL_RECT_H, each vertical bar must be a single
color. When ulMode == GRADIENT_FILL_RECT_V, each horizontal bar must be a single color.
The worst-case error in any channel cannot exceed ±1.
The endpoints of the region must be exact matches.
15-bpp or 16-bpp surfaces
The worst-case error in any channel cannot exceed ±15.
1-bpp to 8-bpp surfaces
No error is permitted in gradient fills for any of these surfaces. For an 8-bpp surface, GDI does not call the driver's
DrvGradientFill function.
Note that in all surfaces, clipping does not affect results.
Color Management for Displays

GDI supports Image Color Management (ICM) version 2.0. Display drivers can use ICM without implementing any
special code.
If the display hardware supports a gamma ramp, the display driver should implement
DrvIcmSetDeviceGammaRamp. Color-calibrating applications that require color exactness use this capability.
DirectDraw also uses this function to allow DirectX applications -- such as a game that performs palette animation
in RGB modes -- to control the gamma ramp. For example code, refer to the Permedia sample display drivers.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia2 (3dlabs.htm) and 3Dlabs
Permedia3 (Perm3.htm) sample display drivers. You can get these sample drivers from the Windows Server 2003
SP1 Driver Development Kit (DDK), which you can download from the DDK - Windows Driver Development Kit
page of the WDHC website.
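In the absence of those samples, the following minimal sketch shows the general shape of the function; the 256-entry, 8-bit DAC and the WriteGammaEntry helper are hypothetical, and the ramp is assumed to arrive in the IGRF_RGB_256WORDS format (three consecutive arrays of 256 WORDs).

BOOL DrvIcmSetDeviceGammaRamp(DHPDEV dhpdev, ULONG iFormat, LPVOID lpRamp)
{
    PUSHORT pRed, pGreen, pBlue;
    ULONG   i;

    if (iFormat != IGRF_RGB_256WORDS)
        return FALSE;

    pRed   = (PUSHORT)lpRamp;
    pGreen = pRed + 256;
    pBlue  = pGreen + 256;

    // Program the DAC, scaling each 16-bit ramp value down to 8 bits.
    for (i = 0; i < 256; i++)
    {
        WriteGammaEntry((PDEV *)dhpdev, i,
                        pRed[i] >> 8, pGreen[i] >> 8, pBlue[i] >> 8);
    }

    return TRUE;
}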
Monitor Profiles

A monitor profile is a type of device profile used for color management. This profile contains information about
how to convert colors in a monitor's color space and color gamut into colors in a device-independent color space.
Any user-mode application, such as a setup program or a word processor with graphics capabilities, can use a
monitor profile, provided that ICM has been enabled, and that the application has knowledge of the profile's
format.
Although you can create custom monitor profiles using third-party tools, you may be able to use one of the
monitor profiles shipped with Windows 2000 and later operating system versions. These profiles are described in
the following table.

PROFILE MONITOR CHARACTERISTICS

mnB22G15.icm B22 phosphor, gamma 1.5

mnB22G18.icm B22 phosphor, gamma 1.8

mnB22G21.icm B22 phosphor, gamma 2.1

mnEBUG15.icm EBU phosphor, gamma 1.5

mnEBUG18.icm EBU phosphor, gamma 1.8

mnEBUB21.icm EBU phosphor, gamma 2.1

mnP22G15.icm P22 phosphor, gamma 1.5

mnP22G18.icm P22 phosphor, gamma 1.8

mnP22G21.icm P22 phosphor, gamma 2.1

Diamond Compatible 9300K G2.2.icm 9300° Kelvin white point, gamma 2.2

Hitachi Compatible 9300K G2.2.icm 9300° Kelvin white point, gamma 2.2

NEC Compatible 9300K G2.2.icm 9300° Kelvin white point, gamma 2.2

Trinitron Compatible 9300K G2.2.icm 9300° Kelvin white point, gamma 2.2
Installing a Monitor Profile
A user can install a monitor profile in three different ways:
1. In the Windows Explorer, select the profile, right-click the name, and then click Install Profile.
2. Refer to the profile in a monitor INF file.
3. Hard-code the profile's path and file name in an application.
Because the default directory for monitor profiles is subject to change, hard-coding the profile's path and file name
is not recommended.
Using a Monitor Profile
A monitor profile, unlike a printer profile, supports very little communication between the output device and an
application. For example, if a user changes the gamma ramp in the video buffer, the monitor profile is not notified
that such a change has occurred. In this case, with ICM enabled, two color corrections are applied to the image
before it is displayed, as shown in the following sequence of steps.
1. The application opens and then manipulates the image.
2. The application enables ICM by a call to a Win32 GDI ICM function, such as SetICMMode. (See the
Microsoft Windows SDK for more information.)
3. The application sends the image to Win32 GDI.
4. If ICM is enabled, Win32 GDI uses the monitor profile to translate the colors in the image.
5. Win32 GDI sends the image to kernel-mode GDI.
6. Kernel-mode GDI formats the image for the display driver, based on device characteristics of the device
context (DC), such as bit depth, resolution, and halftoning.
7. The display driver (or video hardware) performs gamma correction to the image.
DirectDraw and GDI

GDI automatically enables DirectDraw when the display driver is initialized. To provide better interaction between
DirectDraw and the graphics DDI portion of the driver, a driver that also supports the DirectDraw DDI can
implement or call the following functions:
DrvDeriveSurface
A driver-implemented function that wraps a GDI driver surface around a DirectDraw driver surface, allowing any
GDI drawing to DirectDraw video memory or AGP surfaces to be hardware accelerated (rather than being drawn in
software via the DIB engine). Typically, if the driver already supports off-screen device bitmaps, this function should
require only a few additional lines of code.
DrvDeriveSurface improves the performance of DirectDraw applications that also use GDI, and it also eliminates
cursor flicker when a software cursor is used with DirectDraw or Direct3D applications.
HeapVidMemAllocAligned and VidMemFree
Driver-called functions that use the DirectDraw heap manager for all off-screen memory management.
DrvCreateDeviceBitmap should call HeapVidMemAllocAligned to request DirectDraw to allocate space for
GDI bitmaps; DrvDeleteDeviceBitmap should call VidMemFree to free this allocation.
DirectDraw has priority over the graphics DDI portion of the driver for off-screen memory allocation. The driver
should hook the DirectDraw DdFreeDriverMemory callback, which allows the driver to remove GDI surfaces from
off-screen memory to make space for higher priority DirectDraw surface allocations.
Both HeapVidMemAllocAligned and VidMemFree are declared in dmemmgr.h, which ships with the Windows
Driver Kit (WDK). A driver might have to define __NTDDKCOMP__ before including this header file.
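A minimal sketch of such an allocation through the DirectDraw heap manager follows; the LPVIDMEM heap descriptor, the 8-byte alignment, and the caller-supplied width in bytes are illustrative assumptions.

#define __NTDDKCOMP__
#include "dmemmgr.h"

FLATPTR AllocBitmapFromDDrawHeap(LPVIDMEM lpVidMem, DWORD cjWidth,
                                 DWORD cyHeight, LONG *plNewPitch)
{
    SURFACEALIGNMENT alignment = {0};

    alignment.Linear.dwStartAlignment = 8;   // hypothetical device requirement

    // Returns 0 if the heap cannot satisfy the request; DrvCreateDeviceBitmap
    // should then fail so that GDI uses an engine-managed bitmap instead.
    // The matching DrvDeleteDeviceBitmap would call VidMemFree on this pointer.
    return HeapVidMemAllocAligned(lpVidMem, cjWidth, cyHeight,
                                  &alignment, plNewPitch);
}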
Tracking Window Changes

Changes to a window, including one in a multiple-monitor system, can be tracked by a device driver through a
WNDOBJ. A WNDOBJ is a driver-level window object that contains information about the position, size, and the
visible client region of a window. That is, by creating a WNDOBJ that corresponds to an application window, the
driver can track the size, position, and client region changes in that window.
An application uses the Win32 API to access the WNDOBJ_SETUP functionality implemented by the device driver.
Access is gained through the Win32 ExtEscape function. GDI passes this escape call to the device driver with
DrvEscape, implemented by the device driver with WNDOBJ_SETUP for the value of iEsc.
An application calls ExtEscape(hdc, WNDOBJ_SETUP,...) and passes a handle to the application-created window
(created by CreateWindow or some equivalent Win32 function) through the input buffer to the driver. If the driver
is to keep track of the window, it calls EngCreateWnd, within the context of the ExtEscape call, to create a
WNDOBJ structure for the given window. From that point on, any changes to that window will pass down to the
driver.
The driver should handle the ExtEscape call in a manner similar to the following:

ULONG DrvEscape(
    SURFOBJ *pso,
    ULONG    iEsc,
    ULONG    cjIn,
    PVOID    pvIn,
    ULONG    cjOut,
    PVOID    pvOut)
{
    WNDOBJ  *pwo;
    WNDDATA *pwd;

    if (iEsc == WNDOBJ_SETUP)
    {
        // Create a WNDOBJ for the window handle passed in the input buffer.
        // DrvVideo is the driver's change-notification callback.
        pwo = EngCreateWnd(pso, *((HWND *)pvIn), &DrvVideo,
                           WO_RGN_CLIENT, 0);
        if (pwo == NULL)
        {
            return 0;
        }

        // Allocate space for caching client rects. Remember the pointer
        // in the pvConsumer field.
        pwd = EngAllocMem(0, sizeof(WNDDATA), DRIVER_TAG);
        if (pwd == NULL)
        {
            return 0;
        }
        WNDOBJ_vSetConsumer(pwo, pwd);

        // Update the rectangle list for this wndobj.
        vUpdateRects(pwo);
        return 1;
    }

    // Unrecognized escape.
    return 0;
}

Creating a window object involves locking special window resources; therefore, EngCreateWnd should be called
only in the context of the WNDOBJ_SETUP escape in DrvEscape or DrvSetPixelFormat.
The EngCreateWnd function supports window tracking by multiple drivers. Through EngCreateWnd, each driver
identifies its own callback routine that GDI is to call for changes to the corresponding window. This feature allows,
for example, a live video driver to track changes to live video windows while an OpenGL driver is tracking changes
to OpenGL windows.
GDI will call back to the driver with the most recent window states if a new WNDOBJ is created in
DrvSetPixelFormat or ExtEscape. GDI will also call back to the driver when a window referenced by a WNDOBJ is
destroyed.
As an accelerator, the driver may access public members of the WNDOBJ structure.
Tracking window changes involves the use of three callback functions provided to support the WNDOBJ structure.
The visible client region may be enumerated by calling the WNDOBJ_cEnumStart and WNDOBJ_bEnum callback
functions. A driver may associate its own data with a WNDOBJ by calling the WNDOBJ_vSetConsumer callback
function.
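The vUpdateRects helper referenced in the code above might be sketched as follows; the enumeration parameters and the decision to process one ENUMRECTS buffer at a time are illustrative assumptions.

VOID vUpdateRects(WNDOBJ *pwo)
{
    ENUMRECTS enumRects;
    BOOL      bMore;

    // Enumerate the rectangles that make up the visible client region.
    WNDOBJ_cEnumStart(pwo, CT_RECTANGLES, CD_ANY, 0);
    do
    {
        bMore = WNDOBJ_bEnum(pwo, sizeof(enumRects), (ULONG *)&enumRects);

        // enumRects.c rectangles are now in enumRects.arcl; cache them in
        // the WNDDATA structure stored with WNDOBJ_vSetConsumer, or process
        // them directly.
    } while (bMore);
}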
Supporting the DitherOnRealize Flag

In earlier versions of GDI and the graphics DDI, two calls by GDI to display driver functions were required to dither
a specified color and then realize a brush for that color. For example, when an application requests that a rectangle
be filled with a dithered color, GDI typically calls DrvBitBlt, passing the extents of the rectangle and the brush
object to use. The display driver then checks the brush, finds that it has not been realized, and calls back to GDI with
BRUSHOBJ_pvGetRbrush for GDI's realization of the brush. Because the display driver, not GDI, performs the
dithering of a brush, GDI passes the RGB that the application originally supplied for dithering in a DrvDitherColor
callback to the display driver.
DrvDitherColor returns a pointer to an array of color indexes that describe the dither information for the supplied
color back to GDI. GDI immediately passes this dither information back to the display driver in a call to
DrvRealizeBrush. With the BRUSHOBJ realized, control returns back to GDI and subsequently back to the original
DrvBitBlt function.
To accomplish dithering using the above technique, GDI must call DrvDitherColor, followed immediately by a call to
DrvRealizeBrush -- two separate function calls. Setting a GCAPS_DITHERONREALIZE flag in the DEVINFO structure
and modifying DrvRealizeBrush to effectively combine these two functions eliminates the need for the separate call
to DrvDitherColor and also saves some memory allocation. Under this scheme, if the display driver sets
GCAPS_DITHERONREALIZE, GDI calls DrvRealizeBrush with the RGB to be dithered and with the RB_DITHERCOLOR
flag set in iHatch. The RB_DITHERCOLOR flag is set in the high byte of iHatch, while the RGB color to be dithered is
contained in the three low-order bytes. The need to call DrvDitherColor is eliminated in this situation because the
functionality of both calls is put into one.
For example code, refer to the Permedia sample display drivers.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia2 (3dlabs.htm) and 3Dlabs
Permedia3 (Perm3.htm) sample display drivers. You can get these sample drivers from the Windows Server 2003
SP1 Driver Development Kit (DDK), which you can download from the DDK - Windows Driver Development Kit
page of the WDHC website.
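In the absence of those samples, a minimal sketch of the DitherOnRealize path looks like the following; bRealizeDitheredBrush is a hypothetical, device-specific helper that builds the dither pattern and stores the realization with BRUSHOBJ_pvAllocRbrush.

BOOL DrvRealizeBrush(
    BRUSHOBJ *pbo,
    SURFOBJ  *psoTarget,
    SURFOBJ  *psoPattern,
    SURFOBJ  *psoMask,
    XLATEOBJ *pxlo,
    ULONG     iHatch)
{
    if (iHatch & RB_DITHERCOLOR)
    {
        // The three low-order bytes of iHatch hold the RGB color to dither,
        // so the dither and the realization happen in this single call.
        ULONG rgbToDither = iHatch & 0x00FFFFFF;
        return bRealizeDitheredBrush(pbo, psoTarget, rgbToDither);
    }

    // Handle pattern and hatch brushes here, or decline the realization.
    return FALSE;
}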
Supporting Banked Frame Buffers

Most of today's accelerators have frame buffers that can be mapped linearly into CPU address space. The display
drivers of such devices do not have to support banked frame buffers.
GDI cannot directly access banked memory associated with a banked frame buffer. Consequently, the display
driver of a device with such a frame buffer must divide the frame buffer into a series of contiguous banks and
provide a means for GDI to perform its draw operations to the appropriate frame buffer banks. That is, through a
mechanism referred to as banked callbacks, GDI writes its data to one bank of the frame buffer and is then directed
to subsequent banks, as necessary, to complete the draw operation.
The Permedia sample display drivers that shipped with the Driver Development Kit (DDK) provide sample code for
implementing banked frame buffer support.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia2 (3dlabs.htm) and 3Dlabs
Permedia3 (Perm3.htm) sample display drivers. You can get these sample drivers from the Windows Server 2003
SP1 DDK, which you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
The following figure shows a sample accelerator's frame buffer, a 1024-by-768 VGA display buffer, divided into
several banks. This figure is provided for the purpose of illustration only. The display driver does not specifically
use the physical address A000 but uses a logical address passed to it by the miniport driver.

In this example, the video memory contents are written to the accelerator through a series of draw operations that
address contiguous banks of the frame buffer. As far as GDI is concerned, each of its draw operations appears to
target the standard frame buffer rather than different banks of the accelerator's frame buffer. The device driver for the
accelerator handles the banking operations that cause GDI to draw to the accelerator's frame buffer on a bank-by-
bank basis.
The frame buffer is a device-managed surface when an accelerator employs a banked frame buffer, so the display
driver hooks the draw function calls. When the display driver hooks a call, such as draw path, fill path, or bit-block
transfer, it determines which banks in the frame buffer are affected by the draw function that was called.
If the driver elects to have GDI perform the draw function, the driver calls the appropriate EngXxx function.
However, before making the call, the display driver must modify the clip and surface objects it received in the
hooked call and pass these modified objects in the callback to GDI. The clip and surface objects are modified to
prevent GDI from drawing beyond the extents of the bank. That is, if GDI is called to draw a path that exists partially
in the next bank, and if there is no modification of the clip and surface objects, GDI will write to memory beyond
the extents of the current bank. If GDI attempts to draw outside the extents of the bank, the resulting access
violations can be difficult to track.
The example banked frame buffer in the following figure shows how an elliptical object drawn on the display spans
two banks of the banked frame buffer, BANK_1 and BANK_2.

To draw this object, GDI must first draw the top portion of the ellipse (in BANK_1) to the standard frame buffer, and
then draw the lower portion of the ellipse to the same standard buffer. The display driver must then map these two
successive writes by GDI to BANK_1 and BANK_2 of the banked frame buffer to display, and also to prevent GDI
from writing beyond the limits of each bank.
When performing banked frame buffering, the display driver can determine the bounds of the object (the size of
the destination rectangle) by checking the parameters of the call or by calling back to GDI. From the bounds of the
object, the driver can determine how many banks are spanned by the object. For every bank that the bounding
rectangle touches, the display driver calls back to the appropriate GDI draw function, changing values for each call.
The driver changes the CLIPOBJ members originally passed by GDI to correspond to changes in the bounds of the
bank. The top and bottom scan values are redefined so that GDI does not attempt to draw beyond the limits of the
bank. The bank manager takes the original CLIPOBJ data obtained from GDI and retains the values for later
restoration. Then it changes the bounds to provide new rclBounds.top and rclBounds.bottom values that
describe the extent of the bank being drawn to. During banking, GDI must perform clipping to a size that prevents
drawing the entire path and overwriting the limits of the current bank.
If the original CLIPOBJ passed by GDI was defined as NULL or DC_TRIVIAL, then the display driver passes a
substitute CLIPOBJ, created through EngCreateClip. This substitute CLIPOBJ is modified to define a clip window so
that GDI will clip to the extents of a single bank. If the CLIPOBJ is complex, such as a triangular-shaped clip object
on an ellipse as shown in the preceding figure, the display driver modifies the complex CLIPOBJ with the
rclBounds.top and rclBounds.bottom values to produce an additive effect between the two clip objects. As a
result, GDI is prevented from writing off the end of the bank. The driver must also restore the original bounds of
the CLIPOBJ data previously obtained from GDI.
In addition to altering the bounds values, the display driver sets the OC_BANK_CLIP flag in the clip object to
inform GDI that this is a banked callback.
GDI must also be made to draw with reference to the beginning of the standard frame buffer. When called to draw,
GDI simply gets a pointer to a SURFOBJ, which includes the pvScan0, lDelta, and iBitmapFormat members. GDI
calculates where to draw on the surface by using these values as follows:

start_draw_point = pvScan0 + (y*lDelta) + (x*PixelSize(iBitmapFormat))


where x and y are coordinates at which drawing is to begin, and start_draw_point is the address at which the
address of the first pixel is to be drawn. GDI performs this calculation on every drawing call and always references
the SURFOBJ for pvScan0, which is the logical address for the start of the standard frame buffer.
For example, if GDI needs to draw the entire contents of an 8 bits-per-pixel 64K frame buffer, beginning at a logical
address of pvScan0 = 0x100000, it would end the draw operation at 0x10FFFF (0x100000 + (63*1024)+(1023)),
where y is 63, lDelta is 1024, and x is 1023 (the position of the last pixel in the last scan line).
The next time the display driver calls GDI to draw that part of the object that falls within the next bank of the
banked frame buffer, GDI interprets the value of y as 64. With a value of 0x100000 for pvScan0 and 64 for y, GDI
would attempt to begin to write data at 0x110000. However, 0x110000 is beyond the 0x10FFFF extent of the 64K
frame buffer and must not be written to by GDI during this operation.
Consequently, when the display driver requests GDI to write the data that is to appear in the second and
subsequent banks of the frame buffer, the driver must decrement the value of pvScan0 so that GDI calculates a
starting point that is still referenced to the example address of 0x100000. Continuing in the example, this means
decrementing the value of pvScan0 to a value of 0x0F0000 when drawing to the second bank of the frame buffer.
As a result of this change to pvScan0, GDI still draws with a reference to address 0x100000. That is, 0x0F0000 +
(64*1024) + 0 is equal to 0x100000, where GDI must begin to draw in order for the data to be mapped into the
second bank of the frame buffer.
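A minimal sketch of the pvScan0 adjustment described above follows. Here ppdev, pjBankBase (the logical address of the start of the frame buffer window), and cjBankSize (64K in the example) are hypothetical driver fields, and pso is the surface object the driver passes back to the EngXxx drawing call.

VOID vSetSurfaceForBank(PDEV *ppdev, SURFOBJ *pso, LONG iBank)
{
    // Bank 0 draws with pvScan0 at the real window address (0x100000 in the
    // example); each later bank moves pvScan0 back by one bank size
    // (0x0F0000 for bank 1, and so on), so that GDI's
    // pvScan0 + y*lDelta + x*PixelSize calculation stays inside the window.
    pso->pvScan0 = ppdev->pjBankBase - (iBank * ppdev->cjBankSize);
}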
Unloading Video Hardware

When a surface is no longer required, a GDI call to DrvDisableSurface informs the display driver that the surface
created for the current hardware device by DrvEnableSurface can be disabled. The driver must also free any
resources the surface was using.
After the surface is disabled, GDI calls DrvDisablePDEV to inform the driver that the hardware device is no longer
needed. The driver then frees any memory and resources that were allocated during the processing of
DrvEnablePDEV.
Finally, GDI disables the display driver by calling DrvDisableDriver. The driver must free any resources allocated
during DrvEnableDriver and restore the video hardware to its default state. After the driver returns from the
DrvDisableDriver function, GDI frees the memory it has allocated for the driver and removes driver code and data
from memory.
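A minimal sketch of the first two steps, assuming the PDEV and its hsurf field are the driver's own bookkeeping from the corresponding enable calls:

VOID DrvDisableSurface(DHPDEV dhpdev)
{
    PDEV *ppdev = (PDEV *)dhpdev;

    // Delete the surface created in DrvEnableSurface and release any
    // resources (for example, off-screen memory) the surface was using.
    EngDeleteSurface(ppdev->hsurf);
    ppdev->hsurf = NULL;
}

VOID DrvDisablePDEV(DHPDEV dhpdev)
{
    // Free the memory allocated for the PDEV in DrvEnablePDEV.
    EngFreeMem(dhpdev);
}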
The following figure shows GDI's calling sequence for disabling the video hardware.



Using Events in Display Drivers

GDI provides support for events, a type of kernel dispatcher object that can be used to synchronize two threads
running below DISPATCH_LEVEL. A display driver can use events to synchronize access to the video hardware:
By the display driver and the video miniport driver
By the display or video miniport driver and another component, such as an OpenGL driver or a program
extension (such as the Display program in Control Panel).
The following table lists the GDI event-related functions.

FUNCTION DESCRIPTION

EngClearEvent Sets a given event object to the nonsignaled state.

EngCreateEvent Creates a synchronization event object.

EngDeleteEvent Deletes the specified event object.

EngMapEvent Maps a user-mode event object to kernel mode.

EngReadStateEvent Returns the current state of a given event object: signaled or nonsignaled.

EngSetEvent Sets an event object to the signaled state if it was not already in that state, and returns the event object's previous state.

EngUnmapEvent Cleans up the kernel-mode resources allocated for a mapped user-mode event.

EngWaitForSingleObject Puts the current thread into a wait state until the given dispatch object is set to the signaled state, or (optionally) until the wait times out.

The video port driver also provides support for events to video miniport drivers. See Events in Video Miniport
Drivers (Windows 2000 Model) for more information.
For a broader perspective on events, see Event Objects.
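A minimal sketch of one such use follows, assuming the driver stores a PEVENT in its PDEV and that another thread (for example, one servicing the video miniport driver) signals the event with EngSetEvent when the accelerator goes idle.

BOOL bInitAccelIdleEvent(PDEV *ppdev)
{
    // Create a synchronization event object; it starts in the nonsignaled state.
    return EngCreateEvent(&ppdev->pAccelIdleEvent);
}

VOID vWaitForAccelIdle(PDEV *ppdev)
{
    // Block until another thread calls EngSetEvent on this event;
    // a NULL timeout means wait indefinitely.
    EngWaitForSingleObject(ppdev->pAccelIdleEvent, NULL);
    EngClearEvent(ppdev->pAccelIdleEvent);
}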
Multiple-Monitor Support in the Display Driver

Multiple-monitor support is provided by Windows 2000 and later; therefore, display driver writers do not have to
implement any special code to provide this support.
Display drivers must be implemented without using global variables. All state must exist in the PDEV for a
particular display driver. GDI will call DrvEnablePDEV for every hardware device extension that is created by the
video miniport driver.
To track window changes in a multiple-monitor system, a driver can request GDI to create WNDOBJ objects with
desktop coordinates. The driver does this by calling EngCreateWnd using the flag WO_RGN_DESKTOP_COORD.
See Tracking Window Changes for more information.
In a multiple-monitor system, GDI stores the device's desktop position in the dmPosition member of the
DEVMODEW structure.
Disabling Timeout Recovery for Display Drivers

In Microsoft Windows XP SP1 and later operating systems, GDI uses a watchdog timer to monitor the time that
threads spend executing in the display driver. The watchdog defines a time threshold. If a thread spends more time
in a display driver than the threshold specifies, the watchdog tries to recover by switching to VGA graphics mode. If
the attempt fails, the watchdog generates bug check 0xEA, THREAD_STUCK_IN_DEVICE_DRIVER.
Because timeout recovery code is complex, it might cause incompatibility with display drivers. To resolve the
compatibility problems, timeout recovery can be disabled.
To disable timeout recovery, create the following REG_DWORD entry in the registry, and set its value to 0:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\EaRecovery
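
For example, the entry can be created by merging a .reg file such as the following (shown here only as one convenient way to create the value):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display]
"EaRecovery"=dword:00000000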



Mirror Drivers

In this section
Remote Display Drivers
Video Port DDI Support
Mirror Driver INF File
Mirror Driver Installation

Mirror drivers in Windows 8


Starting with Windows 8, mirror drivers will not install on the system. Mirror drivers described in this section will
install and run only on earlier versions of Windows.
However, a special GDI accessibility driver model is available starting with Windows 8 to developers who want to
provide mirror driver capabilities in assistive technologies for customers with disabilities or impairments. To learn
more about this special driver model, please contact acc_driver@microsoft.com.
A remote display driver model that is based on the mirror driver architecture can also run starting with Windows
8. For more information, see Remote Display Drivers.

Mirror driver description


A mirror driver is a display driver for a virtual device that mirrors the drawing operations of one or more
additional physical display devices. It is implemented and behaves much like any other display driver; however, its
paired video miniport driver is minimal in comparison to a typical miniport driver. See Mirror Driver Support in
Video Miniport Drivers (Windows 2000 Model) for more information about miniport drivers in mirroring systems.
The Windows Driver Kit (WDK) through the Windows 7 edition (Version 7600) contains a sample mirror driver
which includes component source files that are contained in three directories.

DIRECTORY CONTAINS SOURCE FILES FOR

\src\video\displays\mirror\disp The mirror driver.

\src\video\miniport\mirror\mini The miniport driver.

\src\video\displays\mirror\app The user-mode service. Also contains mirror.inf.

GDI supports a virtual desktop and provides the ability to replicate a portion of the virtual desktop on a mirror
device. GDI implements the virtual desktop as a graphics layer above the physical display driver layer. All drawing
operations start in this virtual desktop space; GDI clips and renders them on the appropriate physical display
devices that exist in the virtual desktop.
A mirror device can specify an arbitrary clip region in the virtual desktop, including one that spans more than one
physical display device. GDI then sends the mirror device all drawing operations that intersect that driver's clip
region. A mirror device can set a clip region that exactly matches a particular physical device; therefore, it can
effectively mirror that device.
Note In Windows 2000 and later, the mirror driver's clip region must include the primary display device.
Note In Windows Vista and later, the Desktop Windows Manager (DWM) will be turned off when the mirror
driver is loaded.
The mirror driver code sample illustrates how to implement a mirror driver. For more information that will help
you understand the sample:
Use the sample INF file, mirror.inf, as a template. See Mirror Driver INF File for details.
See the mirror.exe application, which demonstrates how the mirror driver is attached to the virtual desktop.
See Mirror Driver Installation for details.
Refer to the Windows SDK documentation for information about using the Win32 EnumDisplayDevices
function. You use this function to determine the \\.\Display# name associated with your mirrored display
device. This number is required to change the settings for your mirrored device. For multiple instances, # is
a different number for each instance; therefore you must determine this number by iterating through the
available display devices.
To attach the mirrored device to the global desktop
1. Add the REG_DWORD registry entry Attach.ToDesktop to your driver's services keys.
2. Set this key's entry to 1 (one).
To disable the mirror driver, set this entry to 0 (zero).
As mentioned previously, the driver is installed and operates in a drawing layer that resides above the device
layer. Because the mirror driver's coordinate space is the desktop coordinate space, it can span more than one
device. If the mirror driver is intended to mirror the primary display, its display coordinates should coincide with
the primary display's desktop coordinates.
After the mirror driver is installed, it will be called for all rendering operations that intersect the driver's
display region. On a multiple-monitor system, this might not include all drawing operations if the mirror
driver overlaps only the primary display device.
It is recommended that a user-mode service be used to maintain the mirror driver's settings. This
application can ensure that the driver is loaded correctly at boot time and it can respond appropriately to
changes to the desktop by getting notifications of display changes via the WM_DISPLAYCHANGE message.
GDI calls the mirror driver for any 2D graphics DDI drawing operation that intersects the driver's bounding
rectangle. Note that GDI does not perform a bounding rectangle check if the surface is a device format
bitmap; that is, if the SURFOBJ has an iType of STYPE_DEVBITMAP.
As always, the mirror driver must be implemented without the use of global variables. All state must exist
in the PDEV for that particular driver. GDI will call DrvEnablePDEV for every hardware device extension
created by the video miniport driver.
The mirror driver should not support DirectDraw.
A mirror driver must set the GCAPS_LAYERED flag to TRUE in the flGraphicsCaps member of the
DEVINFO structure.
An accessibility mirror driver must set the GCAPS2_EXCLUDELAYERED and GCAPS2_INCLUDEAPIBITMAPS
flags to TRUE in the flGraphicsCaps2 member of the DEVINFO structure.
A mirror driver can optionally support brush realizations by implementing DrvRealizeBrush.
GDI allows the same driver to run on both a single and multiple-monitor system. A driver in a multiple-monitor
system need only track its position within the global desktop. GDI provides this position to the driver whenever a
Win32 ChangeDisplaySettings call occurs, such as when a user dynamically changes the monitor's position in
the desktop by using the Display program in Control Panel. GDI updates the dmPosition member of the
DEVMODEW structure accordingly when such a change occurs. A driver can receive notification of such a change
by implementing DrvNotify. See Mirror Driver Installation for more information.
Note Mirror drivers are not required to render with pixel-perfect accuracy when rendering on the client side, where
such accuracy may be difficult to achieve. For example, the adapter/monitor receiving the mirrored image is not required to
render Grid Intersect Quantization (GIQ) line drawing and polygon fills with the same precision as the
adapter/monitor being mirrored.
Remote Display Drivers

A remote display driver is based on the Windows 2000 Mirror Driver model and is used to render the desktop in a
remote session.
To successfully install and run starting with Windows 8, a remote display driver must implement only the following
device driver interfaces (DDIs) and no more.
DrvAssertMode
DrvBitBlt
DrvCompletePDEV
DrvCopyBits
DrvDisableDriver
DrvDisablePDEV
DrvDisableSurface
DrvEnablePDEV
DrvEnableSurface
DrvEscape
DrvGetModes
DrvMovePointer
DrvResetPDEV
DrvSetPointerShape
Video Port DDI Support

Starting with Windows 8, display drivers based on the Windows 2000 Display Driver Model (XDDM) will not install
or run, but GDI accessibility drivers and remote display drivers will install and run. For these scenarios only some of
the functions that are exported by the Video Port Driver are supported.

Supported Video Port DDIs


For GDI accessibility drivers and remote display driver scenarios, starting with Windows 8 the following system-
implemented Video Port Driver device driver interfaces (DDIs) are still supported.
VideoDebugPrint
VideoPortAcquireDeviceLock
VideoPortAcquireSpinLock
VideoPortAcquireSpinLockAtDpcLevel
VideoPortAllocatePool
VideoPortClearEvent
VideoPortCompareMemory
VideoPortCreateEvent
VideoPortCreateSpinLock
VideoPortDeleteEvent
VideoPortDeleteSpinLock
VideoPortFlushRegistry
VideoPortFreePool
VideoPortGetAssociatedDeviceExtension
VideoPortGetCurrentIrql
VideoPortGetProcAddress
VideoPortGetRegistryParameters
VideoPortGetVersion
VideoPortInterlockedDecrement
VideoPortInterlockedExchange
VideoPortInterlockedIncrement
VideoPortInitialize
VideoPortLockBuffer
VideoPortLogError
VideoPortMoveMemory
VideoPortQueryPerformanceCounter
VideoPortQueryServices
VideoPortQuerySystemTime
VideoPortQueueDpc
VideoPortReadStateEvent
VideoPortReleaseDeviceLock
VideoPortReleaseSpinLock
VideoPortReleaseSpinLockFromDpcLevel
VideoPortSetEvent
VideoPortSetRegistryParameters
VideoPortSynchronizeExecution
VideoPortUnlockBuffer
VideoPortWaitForSingleObject
VideoPortZeroMemory

Unsupported Video Port DDIs


For GDI accessibility drivers and remote display driver scenarios, starting with Windows 8 the following system-
implemented Video Port Driver DDIs are not supported.
Items marked with an asterisk (*) are for hardware scenarios that are no longer supported starting with Windows 8.
VideoPortAllocateBuffer
VideoPortAllocateCommonBuffer
VideoPortAllocateContiguousMemory
VideoPortAssociateEventsWithDmaHandle
VideoPortCheckForDeviceExistence*
VideoPortCompleteDma*
VideoPortCreateSecondaryDisplay*
VideoPortDDCMonitorHelper*
VideoPortDebugPrint
VideoPortDisableInterrupt
VideoPortDoDma
VideoPortEnableInterrupt
VideoPortEnumerateChildren*
VideoPortFreeCommonBuffer
VideoPortFreeDeviceBase*
VideoPortGetAccessRanges*
VideoPortGetAgpServices
VideoPortGetAssociatedDeviceID*
VideoPortGetBusData*
VideoPortGetBytesUsed
VideoPortGetCommonBuffer
VideoPortGetDeviceBase*
VideoPortGetDeviceData*
VideoPortGetDmaAdapter*
VideoPortGetDmaContext
VideoPortGetMdl
VideoPortGetRomImage*
VideoPortGetVgaStatus*
VideoPortInt10*
VideoPortIsNoVesa*
VideoPortLockPages
VideoPortMapBankedMemory
VideoPortMapDmaMemory
VideoPortMapMemory*
VideoPortProtectWCMemory*
VideoPortPutDmaAdapter*
VideoPortReadPortBufferUchar*
VideoPortReadPortBufferUlong*
VideoPortReadPortBufferUshort*
VideoPortReadPortUchar*
VideoPortReadPortUlong*
VideoPortReadPortUshort*
VideoPortReadRegisterBufferUchar
VideoPortReadRegisterBufferUlong
VideoPortReadRegisterBufferUshort
VideoPortReadRegisterUchar
VideoPortReadRegisterUlong
VideoPortReadRegisterUshort
VideoPortRegisterBugcheckCallback
VideoPortReleaseBuffer
VideoPortReleaseCommonBuffer*
VideoPortRestoreWCMemory*
VideoPortScanRom*
VideoPortSetBusData*
VideoPortSetBytesUsed
VideoPortSetDmaContext
VideoPortSetTrappedEmulatorPorts*
VideoPortSignalDmaComplete
VideoPortStallExecution
VideoPortStartDma*
VideoPortStartTimer
VideoPortStopTimer
VideoPortUnlockPages
VideoPortUnmapDmaMemory
VideoPortUnmapMemory*
VideoPortVerifyAccessRanges*
VideoPortWritePortBufferUchar*
VideoPortWritePortBufferUlong*
VideoPortWritePortBufferUshort*
VideoPortWritePortUchar*
VideoPortWritePortUlong*
VideoPortWritePortUshort*
VideoPortWriteRegisterBufferUchar
VideoPortWriteRegisterBufferUlong
VideoPortWriteRegisterBufferUshort
VideoPortWriteRegisterUchar
VideoPortWriteRegisterUlong
VideoPortWriteRegisterUshort
VideoPortZeroDeviceMemory*
Mirror Driver INF File

Use the Mirror.inf sample mirror driver INF file as a template for constructing your own mirror driver INF file.
For more information, see Installing a Boot Driver and INF File Sections and Directives.
Note Starting with Windows 8, mirror drivers will not install on the system. For more information, see Mirror
Drivers.
Mirror Driver Installation

The system installs a mirror driver in response to a Win32 ChangeDisplaySettings or ChangeDisplaySettingsEx
call. You should implement a user-mode service to make one of these calls to install your mirror driver and
maintain its settings. Use this application to:
Ensure that the mirror driver is loaded correctly at boot time. The application should specify the
CDS_UPDATEREGISTRY flag to save the settings to the registry, so that the driver will automatically be
loaded on subsequent boots with the same DEVMODEW information described below.
Respond appropriately to desktop changes by getting display change notifications through the
WM_DISPLAYCHANGE message.
The sample Mirror.exe, which you can build from the source code files that ship with the Windows Driver Kit
(WDK), implements a subset of the operations a user-mode service should provide to load a mirror driver.
Before the mirror driver is installed, the user-mode application should fill in a DEVMODEW structure that specifies
the following display attributes:
Position (dmPosition)
Size (dmPelsWidth and dmPelsHeight)
Format of the mirrored display (dmBitsPerPel)
The user-mode application must also set dmFields appropriately, by including a flag for each of these structure
members to be changed. The mirrored display's position coordinates must be specified in desktop coordinates; as
such, they can span more than one device. To directly mirror the primary display, the mirror driver should specify
its location to coincide with the primary display's desktop coordinates.
After the DEVMODEW structure members have been set, change the mirrored display settings by passing this
structure in a call to the Win32 ChangeDisplaySettingsEx function.
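A minimal sketch of that call sequence is shown below; the position, size, and format values are illustrative assumptions, and the mirror device name would normally be discovered with EnumDisplayDevices as described in Mirror Drivers.

#include <windows.h>

BOOL AttachMirrorDriver(LPCWSTR pszMirrorDevice)   // e.g. \\.\DISPLAY2
{
    DEVMODEW dm;

    ZeroMemory(&dm, sizeof(dm));
    dm.dmSize       = sizeof(dm);
    dm.dmPosition.x = 0;           // coincide with the primary display
    dm.dmPosition.y = 0;
    dm.dmPelsWidth  = 1024;        // size of the mirrored area
    dm.dmPelsHeight = 768;
    dm.dmBitsPerPel = 32;          // format of the mirrored display
    dm.dmFields     = DM_POSITION | DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;

    // CDS_UPDATEREGISTRY saves the settings so the driver reloads at boot.
    return ChangeDisplaySettingsExW(pszMirrorDevice, &dm, NULL,
                                    CDS_UPDATEREGISTRY, NULL) ==
           DISP_CHANGE_SUCCESSFUL;
}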
After the mirror driver is installed, it will be called by GDI for all rendering operations that intersect the driver's
display region. GDI might not send all drawing operations to the mirror driver if the mirror driver overlaps only
the primary display in a multiple-monitor system.
See the Microsoft Windows SDK documentation for more information about the ChangeDisplaySettings and
ChangeDisplaySettingsEx functions, and display change desktop notifications.
Note Starting with Windows 8, mirror drivers will not install on the system. For more information, see Mirror
Drivers.
Display Driver Testing Tools

In addition to providing a debugger, Windows 2000 and later operating system versions provide the following
tools for testing and debugging your display driver:
newdisp -- lets you dynamically reload your driver without rebooting the system. See NewDisp: Dynamic
Reload of a Display Driver for details.
Slider control in Control Panel's Display program -- lets you dynamically change the accelerations that GDI
will call in your driver. See Dynamic Change of Permitted Driver Accelerations for details.
NewDisp: Dynamic Reload of a Display Driver

The Driver Development Kit (DDK) provides a tool that allows a display driver to be dynamically reloaded without
rebooting. This tool, called newdisp.exe, accelerates display driver testing during development by making reboots
less necessary when updating display driver code.
Note This tool is not available in the Windows Vista and later releases of the Windows Driver Kit (WDK).
To run *newdisp.exe*
1. Close all Direct3D and OpenGL applications.
2. Copy your updated display driver's DLL into the \system32 directory.
3. Run newdisp (without any arguments).
Each time newdisp is invoked, it reloads the display driver. Assuming that no driver references exist at the time of
invocation, newdisp accomplishes the dynamic reload by:
Calling ChangeDisplaySettings with 640x480x16 colors, which causes the system to load and run the 16-color
VGA display driver DLL and, at the same time, causes the old display driver DLL to be unloaded from memory.
Immediately making another ChangeDisplaySettings call back to the original mode, which causes the new display
driver DLL to be loaded from the \system32 directory, and the 16-color VGA display driver DLL to be unloaded.
A reference to the driver instance exists if the driver has active Direct3D, WNDOBJ, or DRIVEROBJ objects. When
newdisp is run while a reference to the driver instance exists, the old display driver DLL will never be unloaded, and
correspondingly the new display driver DLL will never be loaded.
Newdisp relies on dynamic driver loading functionality that has been added to Windows 2000 and later to reload
the driver without rebooting; consequently, it does not work on Windows NT 4.0 and previous operating system
versions. It also does not work if the VGA driver cannot be loaded on the graphics device, or if the native display
driver supports a mode of 640x480x16 colors instead of letting that mode be handled by the VGA driver.
Note that newdisp does not currently cause the video miniport driver to be reloaded. If the miniport driver is
changed, the system must be rebooted to install and test it.
Dynamic Change of Permitted Driver Accelerations

The driver's acceleration level can be changed through the user interface by using the slider that appears when you
click the Display icon in Control Panel. Depending on the value set with this slider, GDI allows the levels of driver
acceleration listed in the following table.

VALUE DESCRIPTION

0 All display driver accelerations are permitted.

1 DrvSetPointerShape and DrvCreateDeviceBitmap are disabled.

2 In addition to 1, more sophisticated display driver accelerations are disallowed, including DrvStretchBlt, DrvFillPath, DrvGradientFill, DrvLineTo, DrvAlphaBlend, and DrvTransparentBlt.

3 In addition to 2, all DirectDraw and Direct3D accelerations are disallowed.

4 In addition to 3, almost all display driver accelerations are disallowed, except for solid color fills, DrvCopyBits, DrvTextOut, and DrvStrokePath. DrvEscape is disabled.

5 No hardware accelerations are allowed. The driver will only be called to do bit-block transfers from a system memory surface to the screen.

A display driver can determine the current acceleration level by:
Receiving notification of a change to the acceleration level from GDI by implementing DrvNotify.
Calling EngQueryDeviceAttribute to query the current acceleration level.
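A minimal sketch of the query, assuming hdevEng is the HDEV the driver saved when it associated its surface:

DWORD dwGetAccelerationLevel(HDEV hdevEng)
{
    DWORD dwLevel = 0;

    // QDA_ACCELERATION_LEVEL returns the current slider value (0 through 5).
    if (!EngQueryDeviceAttribute(hdevEng, QDA_ACCELERATION_LEVEL,
                                 NULL, 0, &dwLevel, sizeof(dwLevel)))
    {
        dwLevel = 0;   // Assume full acceleration if the query fails.
    }

    return dwLevel;
}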
DirectDraw

This section describes the Microsoft DirectDraw interface and architecture, and provides implementation
guidelines for DirectDraw driver writers. The guidelines are written for Microsoft Windows 2000 and later. The
reader should be familiar with the DirectDraw APIs, and have a firm grasp of the Windows 2000 display driver
model.
Driver writers who are creating Microsoft DirectDraw drivers for Microsoft Windows 2000 and later should use
the following header files:
ddrawint.h contains the basic types, constants, and structures for DirectDraw drivers.
ddraw.h contains the basic types, constants, and structures used by both applications and drivers.
dvp.h is used when the driver supports the DirectDraw video port extensions (VPE).
dxmini.h is used when the video miniport driver includes support for kernel-mode video transport, the
DxApi interface (functions specified by the DXAPI_INTERFACE structure).
ddkmapi.h is used by video capture drivers to access the DxApi function. DirectDraw, in turn, calls upon the
DxApi interface.
dmemmgr.h is used when the driver wants to perform its own memory management instead of relying on
the DirectDraw runtime.
ddkernel.h is used when the driver includes kernel-mode support.
dx95type.h allows driver writers to easily port existing Windows 98/Me drivers to Windows 2000 and later.
This header file maps names that are different on the two platforms.
The ddraw.h header file is shipped with the Windows SDK; all other header files are included with the Windows
Driver Kit (WDK). The Windows Driver Development Kit (DDK) also contains sample code for a DirectDraw driver
in the p3samp video display directory.
Reference pages for DirectDraw driver functions, callbacks, and structures can be found in DirectDraw Driver
Functions and DirectDraw Driver Structures.
For more information about DirectDraw, see the Windows SDK. DirectDraw driver writers can send questions and
comments by email to directx@microsoft.com.
About DirectDraw

Microsoft DirectDraw is the display component of Microsoft DirectX that allows software designers to directly
manipulate display memory, hardware blitters, hardware overlays, and flipping surfaces. DirectDraw provides a
device-independent way for games and Windows subsystem software, such as 3D graphics packages and digital
video codecs, to gain access to the features of specific display devices.
DirectDraw provides device-independent access to the device-specific display functionality in a direct 32-bit path.
DirectDraw calls important functions in a driver that accesses the display card directly, without the intervention of
the Windows graphics device interface (GDI) or the device-independent bitmap (DIB) engine.
By taking advantage of this direct path, games and other display-intensive applications run faster and avoid tearing.
A tear is a screen flicker caused by an image displayed and written to at the same time. Direct access often allows
game performance to be limited solely by display card performance. DirectDraw also uses page flipping to provide
smooth animation.
The rapid motion and ever-changing screens of many games and multimedia applications put a heavy burden on
the display process and tend to exacerbate tearing. Although GDI is very fast at drawing spreadsheets, graphs,
TrueType font rendering, and so on, it is not meant to be a real-time graphics API. DirectDraw augments GDI by
handling the device-dependent hardware accelerator functions in a 32-bit driver.
DirectDraw Architecture

Microsoft DirectDraw includes the following components:


User-mode DirectDraw (ddraw.dll), which is a system-supplied dynamic-link library (DLL) that is loaded and
called by DirectDraw applications. This component provides hardware emulation, manages the various
DirectDraw objects, and provides display memory and display hardware management services.
Kernel-mode DirectDraw, which is an integral part of win32k.sys, the system-supplied graphics engine that is
loaded by a kernel-mode display driver. This portion of DirectDraw performs parameter validation for the
driver, making it easier to implement more robust drivers. This is a critical design goal because display
drivers are trusted components of the Microsoft Windows 2000 and later operating systems. Kernel-mode
DirectDraw also handles synchronization with GDI and all cross-process states.
The DirectDraw portion of the display driver, which, along with the rest of the display driver, is implemented
by graphics hardware vendors. This component is referred to as the DirectDraw driver in this document.
Other portions of the display driver handle GDI and other non-DirectDraw related calls.
This document generically refers to both of the system-supplied components as DirectDraw.
The following figure shows a diagram of the DirectDraw driver architecture.

As shown in the preceding figure, an application accesses the display card through GDI (user and kernel-mode
portions) and the display driver. The display driver always supports GDI calls and, usually, DirectDraw and Direct3D
calls. The device independent bitmap (DIB) engine portion of GDI emulates functionality when it is not supported by
the display driver.
When DirectDraw is invoked, it accesses the graphics card directly through the DirectDraw driver. DirectDraw calls
the DirectDraw driver for supported hardware functions, or the hardware emulation layer (HEL) for functions that
must be emulated in software. GDI calls, on the other hand, are sent to the driver, which must then call back into the
DIB engine if the call is unsupported.
Note If the DirectDraw driver fails an operation, DirectDraw does not pass the operation to the DirectDraw HEL, but
instead passes the DirectDraw driver's error code back to the application.
At initialization time and during mode changes, the display driver returns capability (caps) bits to DirectDraw. This
enables DirectDraw to access information about the available driver functions, their addresses, and the capabilities
of the display card and driver (such as stretching, transparent blits, display pitch, and other advanced
characteristics). Once DirectDraw has this information, it can use the DirectDraw driver to access the display card
directly, without making GDI calls or using the GDI specific portions of the display driver.
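As a minimal sketch of this exchange, a driver might fill in caps bits such as the following in the DD_HALINFO structure it returns at initialization. The member names come from ddrawint.h; the specific flags chosen here are illustrative only, not a required set.

/*
 * Sketch: advertise basic blt, stretch-blt, and source-color-key support.
 * A real driver reports exactly what its hardware can do.
 */
ddHALInfo.ddCaps.dwCaps     = DDCAPS_BLT | DDCAPS_BLTSTRETCH | DDCAPS_COLORKEY;
ddHALInfo.ddCaps.dwCKeyCaps = DDCKEYCAPS_SRCBLT;
ddHALInfo.ddCaps.dwFXCaps   = DDFXCAPS_BLTSTRETCHX | DDFXCAPS_BLTSTRETCHY;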

The DirectDraw Driver

DirectDraw provides device independence through the DirectDraw driver. The DirectDraw driver is a device-specific
interface usually provided by the display hardware manufacturer. DirectDraw exposes methods to the application
and uses the DirectDraw portion of the display driver to work directly with the hardware. Applications never call the
display driver directly.
Under Windows 2000 and later, the DirectDraw driver is always implemented as 32-bit code. The DirectDraw driver
can be part of the display driver or a separate DLL that communicates with the display driver through a private
interface defined by the driver writer. The DirectDraw documentation assumes that under Windows 2000 and later,
the DirectDraw driver is part of the display driver.
The DirectDraw driver contains only device-dependent code and performs no emulation. If a function is not
performed by the hardware, the DirectDraw driver does not report it as a hardware capability. Additionally, the
DirectDraw driver does not validate parameters because the DirectDraw runtime does this before the driver is
invoked.

First Steps For DirectDraw Drivers

A good way to begin implementing DirectDraw functionality is to modify an existing driver. If no driver is available,
start with the sample code in the DirectDraw portion of the Windows Driver Kit (WDK) and get driver initialization,
lock, and flip working. From that base functionality, more powerful functionality can be added that will improve
display performance.
The minimum driver functionality DirectDraw requires is the ability to lock, unlock, and flip a surface. Assuming the
hardware supports the related operations, driver support should also be added for blts (including transparent blts,
which are important for speed in games), stretching, and overlays, which are critical for video playback.

Hardware Emulation Layer

The DirectDraw hardware emulation layer (HEL) performs emulation for the DirectDraw driver. The HEL (written by
Microsoft as part of DirectDraw) performs this emulation in user mode.
For example, if the DirectDraw driver has capability (caps) bits set for blts but not for transparent blts, the
DirectDraw driver is called for blts and the HEL for transparent blts. In the case of a transparent blt, the HEL is
passed a display memory pointer and it compares each byte from a backing surface to the color key. If the byte
does not match the color key, the HEL copies it to the destination surface using the CPU. This emulation also occurs
for other unsupported operations or if the card is out of display memory.
DirectDraw does not pass failed DirectDraw driver operations to the HEL. If either the HEL or the DirectDraw driver
fails a particular operation, an error code is returned to the application.

DirectDraw Driver Fundamentals

This section contains general information about the design of display drivers for Microsoft DirectDraw. Information
is organized in the following groups:
DirectDraw Surfaces
Display Memory
Memory Configurations
Flipping
Blitting
Managing PDEVs
Performing Floating-point Operations in DirectDraw
Return Values for DirectDraw

DirectDraw Surfaces

The Microsoft DirectDraw Surface is the basic image unit in Microsoft DirectX graphics. It is either a rectangular
collection of pixels of a particular width, height, and pixel format; or a buffer containing commands or vertices for
Microsoft Direct3D. Surfaces have bits associated with them that denote their behavior and usage. These bits are
called surface capability bits (or caps bits for short). Caps bits denote intended usages of their associated surfaces
such as holding texels for rendering (the DDSCAPS_TEXTURE caps bit), being the target for 3D rendering
(DDSCAPS_3DDEVICE), and many others. For more information about surface caps bits, see the DDSCAPS
structure.
The primary surface is the surface that is currently being scanned out to the monitor by the display card. For more
information about the primary surface, see the Flipping and Memory Configurations sections.

Complex Surfaces and Attachments

Surfaces can be complex, which means that they are part of a larger collection of associated surfaces. Examples of
complex surfaces include the front buffer and associated back buffers, the various levels of a MIP map, and the
various faces of a cube map.
The DirectDraw runtime uses a concept known as surface attachments to manage the linking of different simple
surfaces into complex surfaces. Surfaces can be attached implicitly, as when the application makes one call to
IDirectDraw7::CreateSurface to build a flipping chain (front buffer and back buffers with a possible Z-buffer); or
explicitly, as when the application associates a Z-buffer with a render target by calling
IDirectDrawSurface7::AddAttachedSurface.

Creating and Destroying DirectDraw Surfaces

DirectDraw surfaces are created in a four-stage process. These stages are:
1. DdCanCreateSurface. The runtime calls the driver's DdCanCreateSurface to see if the driver allows the
creation of a surface of this type, size, and format. The driver can return a failure code that is propagated to
the application.
2. DdCreateSurface. The driver creates the surface, potentially allocating memory for the contents of the
surface. Complex surfaces can be created all at once, with one call to DdCreateSurface. Thus, the driver may
be required to create many surfaces in one call.
3. Memory allocation. The DirectDraw runtime allocates memory for any surface that is not allocated by the
driver in response to the DdCreateSurface call. This process is covered in more detail in the following
sections.
4. D3dCreateSurfaceEx. This function associates a handle with the surface in question for later use in the
D3dDrawPrimitives2 token stream. The driver also creates its own copy of the surface structure
maintained by DirectDraw at this time. For more information about D3dCreateSurfaceEx, see the DirectX
Driver Development Kit (DDK) documentation.
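The following fragment sketches stage 1. It is not taken from the WDK sample code; the HW_SUPPORTS_YUV check stands in for whatever capability test the hardware actually requires.

/*
 * Sketch of DdCanCreateSurface: reject surface types the hardware cannot hold.
 * HW_SUPPORTS_YUV() is a hypothetical driver-defined capability check.
 */
DWORD APIENTRY DdCanCreateSurface(PDD_CANCREATESURFACEDATA lpCanCreateSurface)
{
    LPDDSURFACEDESC lpDesc = lpCanCreateSurface->lpDDSurfaceDesc;

    /* Example policy: allow RGB surfaces, but allow FOURCC (YUV) surfaces
       only if the hardware can display them. */
    if ((lpDesc->ddpfPixelFormat.dwFlags & DDPF_FOURCC) && !HW_SUPPORTS_YUV())
    {
        lpCanCreateSurface->ddRVal = DDERR_INVALIDPIXELFORMAT;
        return DDHAL_DRIVER_HANDLED;
    }

    lpCanCreateSurface->ddRVal = DD_OK;
    return DDHAL_DRIVER_HANDLED;
}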
Note A DirectDraw driver must never directly allocate user-mode memory for a surface (for example, by calling the
EngAllocUserMem function). Instead, the driver can have the DirectDraw runtime allocate user-mode memory for
a surface. If the driver allocates the memory directly, a subsequent request to change the video mode by a process
other than the one that created the surface, could cause an operating system crash or memory leaks. To have the
DirectDraw runtime allocate user-mode memory, the driver should return the DDHAL_PLEASEALLOC_USERMEM
value from its DdCreateSurface function. For more information, see the Remarks section on the DdCreateSurface
reference page.
Surfaces are destroyed by a single call to the driver's DdDestroySurface entry point only if the driver allocated or
was involved in allocating the memory for the surface during surface creation. If the DirectDraw runtime allocated
the memory and the driver was not involved, the runtime does not call DdDestroySurface.
Surfaces persist only as long as the mode in which they are created persists. When there is a mode change, all the
surfaces under the driver's control are destroyed, as far as the driver is concerned. There are also other events that
can cause all surfaces to be destroyed in this way. It is not necessary for the driver to determine the cause of a
DdDestroySurface call.

Losing and Restoring DirectDraw Surfaces

The runtime's surface objects have longer lifetimes than the driver's surface objects. In a few situations, most
notably mode changes, surfaces become lost. This means that the driver's surface object is
destroyed when DdDestroySurface is called, but the runtime's surface object is placed into a suspended state. Later,
the runtime object can be restored, which corresponds to a DdCreateSurface call at the driver level.
Normally, drivers do not have to be aware of this intermediate lost state. However, there are some cases where an
understanding of this process will help the driver writer.
Driver writers can elect to handle complex surface restoration in one atomic call. At surface restoration time, the
DirectDraw runtime examines the driver's D3dCreateSurfaceEx entry point. If this entry point is defined, then the
DirectDraw runtime restores all complex surfaces in one call to DdCreateSurface. The driver probably will not be
able to differentiate between the original creation and the creation caused by restoring a surface.

Losing Driver-Managed Textures

Driver-managed texture surfaces, which consume video memory, must be able to be placed in a suspended (lost)
state. Because the driver controls the allocation of video memory for driver-managed texture surfaces, an extension
to the DdDestroySurface call notifies the driver when such texture surfaces need to be lost.
When a driver-managed texture surface (marked with the DDSCAPS2_TEXTUREMANAGE flag) is lost, the driver
receives a special DdDestroySurface call with DDRAWISURF_INVALID specified in the dwFlags member of the
texture surface structure. The driver should free video memory associated with the managed texture surface, but
should not free any other private surface information including the backing (system memory) image of the video
memory copy of the surface. There will be no new DdCreateSurface call to restore the lost driver-managed texture
surfaces because they are not really lost from the driver's point of view. For the most part, this special
DdDestroySurface call is used to inform the driver that it should evict its video memory copy.
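A sketch of how a driver might recognize this special call inside DdDestroySurface is shown below. EvictVidMemCopy is a hypothetical driver helper, and the exact location of the DDSCAPS2 bits (lpSurfMore->ddsCapsEx) is an assumption based on the DD_SURFACE_LOCAL and DD_SURFACE_MORE structures.

/*
 * Sketch: detect the lost driver-managed texture case and evict only the
 * video memory copy, keeping the system memory backing image intact.
 */
PDD_SURFACE_LOCAL lpSurf = lpDestroySurface->lpDDSurface;

if ((lpSurf->lpSurfMore->ddsCapsEx.dwCaps2 & DDSCAPS2_TEXTUREMANAGE) &&
    (lpSurf->dwFlags & DDRAWISURF_INVALID))
{
    EvictVidMemCopy(lpSurf);            /* free the video memory copy only      */
    /* Do not free the system memory image or other private surface data.       */
    lpDestroySurface->ddRVal = DD_OK;
    return DDHAL_DRIVER_HANDLED;
}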

Display Memory

In general, allocating as much display memory as possible to DirectDraw increases display performance and allows
games and other DirectDraw applications to run faster, with a better quality visual image.
Usually, display cards have the pitch set to the width of the display so that no memory is wasted to the right
(conceptually) of the front buffer. This leaves one scratch area at the conceptual bottom that can be used for other
surfaces. Memory access is very straightforward in this case because one pointer can reference the entire area of
accessible display memory.
If the pitch is greater than the width of the primary surface, memory to the conceptual right of the front buffer is
wasted. This has to be reclaimed as a separate rectangular heap, regardless of whether the memory on the card is
linear or rectangular. (Even if the memory is linear, some display drivers fix the pitch to speed up the blitter. Rather
than rewrite the driver, the memory can simply be reclaimed as a separate heap.)

Memory Heap Allocation

To allocate a surface, DirectDraw scans through the display memory heaps, in the order they were specified by the
drivers. Heaps are specified in an array of VIDEOMEMORY structures. DirectDraw visits the heaps in the order of
the VIDEOMEMORY structures in the array. The VIDEOMEMORY structure sets up certain metrics of the heap, such
as the starting and ending memory addresses, flags describing the heap access, and what types of usage are
restricted for the surface placed in this heap. DirectDraw manages the heap by suballocating and deallocating
memory, that is, by creating and destroying surfaces under each heap's jurisdiction. Physical limits determine how
to set up these attributes.
DirectDraw's heap manager makes two passes through the VIDEOMEMORY structure when attempting to allocate
memory in response to a surface creation or restoration. The ddsCaps member of the VIDEOMEMORY structure
informs DirectDraw what the memory in the heap cannot be used for on the first pass. For example, if the heap is
just big enough for a back buffer, sprites can be excluded from being allocated on the first pass by setting the
DDSCAPS_OFFSCREENPLAIN flag. That way, other heaps fill up with off-screen plain surfaces, while preserving the
back buffer for page flipping.
The ddsCapsAlt member of the VIDEOMEMORY structure can be set to allow sprites on the second pass. That way,
the heap in question can allow sprites, but only if the sprite could not be created in any other heap. Do not specify
the DDSCAPS_OFFSCREENPLAIN flag in ddsCapsAlt. This allows heaps to be used optimally, without ruling out
alternative uses.
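As an illustration of the two-pass behavior, the fragment below (not taken from the WDK sample, and using an arbitrary heap index) reserves a heap for back buffers on the first pass while still allowing it to be used for other surfaces as a last resort on the second pass:

/* First pass: exclude off-screen plain surfaces from this heap so it stays
   free for a back buffer. Second pass: no restrictions, so the heap can still
   be used if no other heap has room. */
vidMem[1].ddsCaps.dwCaps    = DDSCAPS_OFFSCREENPLAIN;
vidMem[1].ddsCapsAlt.dwCaps = 0;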
Display memory heaps can be either linear or rectangular, depending on the blitter or the needs of an existing
display driver. The dwFlags member of the VIDEOMEMORY structure is used to specify the memory allocation
type. Linear heaps describe regions of memory where the pitch of each surface can be different. Rectangular heaps
describe regions of memory where the pitch of each surface is fixed. These heaps can be mixed and matched within
the same display card, if necessary. For more information, see Memory Configurations.
A surface's memory pitch, also called stride or offset, is the number of bytes added to a column of display memory
in order to reach the same column of display memory on the following scan line. Because pitch is measured in
bytes rather than pixels, a 640x480x8 surface has a different pitch value than a surface with the same width and
height dimensions but a different pixel format (depth in bits). Additionally, the pitch value sometimes reflects extra
bytes that the runtime has reserved as a cache along with extra bytes due to alignment requirements. Therefore,
you cannot assume that pitch is simply the surface's width multiplied by the number of bytes per pixel. Rather,
visualize the difference between width and pitch as shown in the following figure.

As noted previously, you must also take into account alignment requirements when determining the pitch value.
For example, suppose a one byte per pixel (bpp) surface is 97 pixels wide. Also, suppose that the hardware or
display driver requires DWORD (4 bytes) alignment. If the runtime has not reserved cache bytes, the pitch is 100,
which is the next higher number above 97 that is evenly divisible by 4. The following calculation determines this
pitch value:
pitch = bpp * width + ( ( 4 - ( bpp * width ) % 4 ) % 4 )
that is, pitch = 97 + ( ( 4 - 97 % 4 ) % 4 ) = 97 + 3 = 100
(The outer modulo keeps the pitch unchanged when bpp * width is already a multiple of the alignment.)

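The same calculation can be wrapped in a small helper, shown here as a hypothetical illustration (it is not a WDK function and assumes the runtime has reserved no extra cache bytes):

/* Hypothetical helper: round a raw pitch up to the required alignment. */
DWORD ComputeAlignedPitch(DWORD widthInPixels, DWORD bytesPerPixel, DWORD alignment)
{
    DWORD rawPitch  = widthInPixels * bytesPerPixel;
    DWORD remainder = rawPitch % alignment;

    return (remainder == 0) ? rawPitch : rawPitch + (alignment - remainder);
}

/* Example: ComputeAlignedPitch(97, 1, 4) returns 100. */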

Memory Configurations

The following sections contain three different types of memory configurations: linear, rectangular, and mixed
display memory allocation. Each section includes sample code that can be modified to fit the physical
characteristics of the card, and that can be added to the HAL to allocate display memory heaps.
Alignment requirements as described in the Memory Heap Allocation topic can apply to any of the three types of
memory configuration. Linear memory is generally used more efficiently by the application than rectangular
memory because the rows are stored sequentially. Any location can be accessed easily by moving forward or
backward along this linear range.

Linear Memory Allocation

Display memory is considered linear whenever the pitch can be changed to match the surface width, taking into
account alignment requirements (as shown in the following figure). For example, if the blitter can only hit 8-byte
strides, and a 31-pixel sprite is used, each line of display needs to be adjusted by one to align the next line on an 8-
byte boundary.
The pitch is determined by multiplying the pixel depth, in bytes, by surface width, taking into account alignment
requirements. If the display is 640 8-bit pixels across, then the pitch is 640. If the pixels are 16-bit (2 bytes) and
there is a 640 X 480 screen, then the pitch is 1280 (640 * 2). Likewise, 640 wide, 32-bit (4 bytes) pixel screens have
a pitch of 2560 (640 * 4) in linear display memory.
The following diagram illustrates linear memory heap allocation with one primary surface and one scratch area.

The pointer to the start of the primary surface is fpPrimary, a member of the VIDEOMEMORYINFO structure. The
size of the primary surface and the various Windows caches are added to this to give a pointer to the beginning of
the scratch area, indicated by the fpStart member of the VIDEOMEMORY structure. The end point, indicated by
the fpEnd member of the VIDEOMEMORY structure, is calculated by adding the size of the remaining memory
minus one.
The VIDEOMEMORY structure holds the information that manages the display memory heaps. This sample has
only one element in the array of VIDEOMEMORY structures because there is only one heap. VIDMEM_ISLINEAR, a
flag in the dwFlags member of the VIDEOMEMORY structure, denotes this as linear memory.
The following pseudocode shows how a VIDEOMEMORY structure is set up for linear memory:

/*
 * video memory pool usage
 */
static VIDEOMEMORY vidMem [] = {
    { VIDMEM_ISLINEAR, 0x00000000, 0x00000000,
      { 0 }, { 0 } },
};

The following pseudocode shows how linear memory heaps are set up:
/*
 * video memory pool information
 */

/* Calculate the number of video memory heaps */
ddHALInfo.vmiData.dwNumHeaps = sizeof( vidMem ) / sizeof( vidMem[0] );

/* set up the pointer to the first available video memory after the primary surface */
ddHALInfo.vmiData.pvmList = vidMem;

/*
 * remainder of screen memory
 */
VideoHeapBase = ddHALInfo.vmiData.fpPrimary + dwPrimarySurfaceSize + dwCacheSize;
VideoHeapEnd  = VideoHeapBase + dwDDOffScreenSize - 1;

vidMem[0].fpStart = VideoHeapBase;
vidMem[0].fpEnd   = VideoHeapEnd;

The beginning of the first available scratch area is calculated by adding the beginning of the GDI primary surface to
the size of the primary surface and the size of the Windows brush, pen, and VDD caches. The result is used to set
the starting point in the first element of the VIDEOMEMORY structure to the beginning of the scratch area.
The end of the scratch area is found by adding the beginning of the scratch area to the size of the scratch area and
subtracting one to make it inclusive. The result is used to set the end point of the first (and, in this case, only)
element of the VIDEOMEMORY structure to the end of the scratch area. If there is more than one heap, the end
point is set to the end of this heap and the next heap starts where this one leaves off.

Rectangular Memory Allocation

Display memory is considered rectangular whenever the pitch is fixed to a particular size for all the surfaces within
a given heap.
With rectangular display memory, the layout is two-dimensional, with a finite width and height. This width is not
always the same as the width of the screen. Because display memory must account for different display resolutions
and design considerations, the actual horizontal width might span a much larger region than what is currently
displayed on the monitor. As described in Memory Heap Allocation, the pitch value is based on the number of bytes
to add to a column of display memory in order to reach the same column of display memory on the following scan
line.
For example, even though a screen might be displaying 640 pixels across, if the rectangular display memory is
1280 bytes across with 8-bit pixels, then the pitch is 1280 (not 640). The pitch across a 1280-pixel horizontal
stretch of memory with 16-bit pixels is 2560. The pitch for 32-bit pixels is quadruple what it is for 8-bit pixels, so if
the display is 1280 32-bit pixels across, the pitch is 5120.
Rectangular memory is generally used less efficiently by applications than linear memory because small fragments
might remain after an application stores a large surface. Applications might be unable to store other surfaces in the
remaining space even though the number of available bytes in the remaining space is greater than any new
surfaces require. Applications can access this space on a first-come, first-served basis and can only store small
surfaces that fit in the remaining fragments.
A rectangular heap can be as large as a contiguous region of available memory, but it cannot be L-shaped because
its size is measured in X by Y coordinates. If the rectangular heap is not tall enough and wide enough to hold a
primary surface, then it cannot be a back buffer. If the pitch of the primary surface is not equal to the display width
of the primary surface, a rectangular block of memory to the conceptual right of the display is left over (Heap 1 in
the following figure). This block is as wide as the pitch minus the width of the display. Leftover memory to the right
can also happen in linear cards if the existing display driver assumes a fixed pitch. Rectangular or linear memory
may also be left over below the primary surface (but not in this example).
The following diagram illustrates rectangular memory allocation.

In the preceding figure, the starting point (indicated by the fpStart member of the VIDEOMEMORY structure) of
the rectangular heap is calculated by adding the width of the primary surface to the starting address of the primary
surface. The width and height are also calculated to give the dimensions of the rectangular heap. If any memory
remains below the Windows caches, a heap could be created there.
The following pseudocode shows how a VIDEOMEMORY structure is set up for rectangular memory:
/*
 * video memory pool usage
 */
static VIDEOMEMORY vidMem [] = {
    { VIDMEM_ISRECTANGULAR, 0x00000000, 0x00000000,
      { 0 }, { 0 } },
};

The only difference between the code for rectangular memory and its linear counterpart is the
VIDMEM_ISRECTANGULAR flag, which indicates that this is rectangular memory.
The following pseudocode shows how rectangular memory heaps are set up:

/*
 * video memory pool information
 */

/* set up the pointer to the first available video memory after the primary surface */
ddHALInfo.vmiData.pvmList = vidMem;

/* this is set to zero because there may only be one heap depending on the pitch */
ddHALInfo.vmiData.dwNumHeaps = 0;

/*
 * Compute the Pitch here ...
 */

vidMem[0].fpStart  = ddHALInfo.vmiData.fpPrimary + dwPrimarySurfaceWidth;
vidMem[0].dwWidth  = dwPitch - dwPrimarySurfaceWidth;
vidMem[0].dwHeight = dwPrimarySurfaceHeight;
vidMem[0].ddsCaps.dwCaps = 0;   /* surface has no use restrictions */

The memory heap starting point is set to the starting address of the primary surface plus the width of the primary
surface. The width is determined by the pitch minus the width of the primary surface. The height is set to the height
of the primary surface. The surface capabilities are set to zero to indicate that there are no imposed surface use
restrictions (therefore, the surface can be used for sprites or any other type of surface).

Mixed Memory Allocation

Linear and rectangular memory heaps can be mixed and matched in any fashion, if the hardware supports it. For
example, if a front buffer has a fixed pitch, the DirectDraw-capable driver can allocate a rectangular heap to the
right of it.
As shown in the following figure, if sufficient memory remains below the primary surface, this area can be made
into a linear heap that can be used for a back buffer.

The preceding figure shows a linear piece of memory below the primary surface (Heap 1) and a rectangular piece
of memory that is reclaimed by DirectDraw to the right of the primary surface (Heap 2).
The following pseudocode shows how a VIDEOMEMORY structure is set up for a mix of linear and rectangular
memory:

/*
* video memory pool usage
*/
static VIDEOMEMORY vidMem [] =
{
    { VIDMEM_ISRECTANGULAR, 0x00000000, 0x00000000,
      { 0 }, { 0 } },
    { VIDMEM_ISLINEAR, 0x00000000, 0x00000000,
      { 0 }, { 0 } },
};

Two areas of display memory can be allocated in this instance. The area to the conceptual right of the primary
surface is necessarily rectangular, and is indicated by the VIDMEM_ISRECTANGULAR flag. The area conceptually
below the primary surface can be linear, and is indicated by the VIDMEM_ISLINEAR flag.
The following pseudocode shows how a mix of linear and rectangular memory heaps are set up:
/*
 * video memory pool information
 */

/* set up the pointer to the first available video memory after the primary surface */
ddHALInfo.vmiData.pvmList = vidMem;

/* how many heaps are there */
ddHALInfo.vmiData.dwNumHeaps = 2;

/* The linear piece: */

/*
 * remainder of screen memory
 */
VideoHeapBase = ddHALInfo.vmiData.fpPrimary + dwPrimarySurfaceSize + dwCacheSize;
VideoHeapEnd  = VideoHeapBase + dwDDOffScreenSize - 1;

vidMem[0].fpStart = VideoHeapBase;
vidMem[0].fpEnd   = VideoHeapEnd;

/* The rectangular piece: */

/*
 * Compute the Pitch here ...
 */
vidMem[1].fpStart  = ddHALInfo.vmiData.fpPrimary + dwPrimarySurfaceWidth;
vidMem[1].dwWidth  = dwPitch - dwPrimarySurfaceWidth;
vidMem[1].dwHeight = dwPrimarySurfaceHeight;
vidMem[1].ddsCaps.dwCaps = 0;   /* surface has no use restrictions */

A linear memory heap is set up by determining the start and end points of the scratch area below the primary
surface, indicated by the fpStart and fpEnd members of the linear VIDEOMEMORY structure (vidMem[0]). The
rectangular piece is set up using the starting point, indicated by the fpStart member of the rectangular
VIDEOMEMORY structure (vidMem[1]), width, indicated by the dwWidth member, and height, indicated by the
dwHeight member, of the primary surface. The pitch (the dwPitch member) must be calculated before the
rectangular piece can be set up. This is the same as in the previous rectangular example, except in this case the
rectangular heap is the second element of the VIDEOMEMORY structure array instead of the first. Each new heap requires a new
VIDEOMEMORY structure.
In some cases, the flip register can handle only 256 KB boundaries. In these instances, a small heap can use up the
space between the bottom of the caches and the start of the back buffer, allowing the back buffer to begin on a 256
KB boundary. This example is not shown, but it could be implemented by adding another element to the
VIDEOMEMORY structure and setting the starting point just beyond the caches and the ending point just before the
256 KB boundary. Such a heap should be flagged with DDSCAPS_BACKBUFFER so that it can be skipped over when
the heap manager looks for a back buffer. This back buffer heap (the one aligned) would also be marked with
DDSCAPS_OFFSCREENPLAIN to keep sprites and textures from using this heap until no other memory is available
in other heaps for off-screen plain surfaces.

Flipping

Using a back buffer that can be flipped with a front buffer is the best way to take advantage of DirectDraw. Page
flipping is essential for smooth, tear-free animation in games and video playback.
The primary surface is the area of memory that is being read from to draw the screen that is currently being
displayed. If a primary surface has one or more attached back buffers, it is a flippable surface.
Flipping structures are used to page flip in DirectDraw. Conceptually, they can be thought of as linked lists made of
surfaces. The front buffer is the "visible" buffer. The back buffer and all attached flippable surfaces must be the
same size and pixel depth as the front buffer. Most modern graphics cards have enough memory for flippable
front and back buffers in high resolution modes.
All types of surfaces can be flipped in DirectDraw; page flipping is the common special case. For instance, flipping
is not limited to the primary surface on cards that support overlays, or on 3D capable display cards that have
texture memory. In these cases, overlays and textures can be flipped in the same way as the primary surface, with
the same driver entry point.
Surfaces that are not used for flipping can have any dimension and can store nonflippable objects such as image
data. Image data may also be stored in system memory, but in that case DirectDraw may use the hardware
emulation layer (HEL) because hardware blitters cannot normally reach system memory to blit the image data.
Some cards allow hardware blitters to have direct memory access (DMA) to system memory, so DirectDraw
performs a check for DMA.
The following figure illustrates the relationship between two flippable surfaces.

If a front buffer has one or more back buffers attached, it is a flippable surface, as shown in the preceding figure.
The back buffer and all attached flippable surfaces must be the same size and pixel depth as the front buffer. A
back buffer surface becomes the primary surface using a flip. A flip simply changes a pointer so it points to a
different flippable surface, thereby displaying the new surface. The front buffer (which is no longer the primary
surface) then becomes accessible, and can have new data written to it.
Flipping solves most screen flicker problems. The ability to render to a surface that is not being displayed allows
smooth, tear-free animation for game play and video playback.
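The fragment below sketches the driver side of such a pointer change. It is not a complete DdFlip implementation; HwFlipPending and HwProgramDisplayStart are hypothetical hardware helpers, while the DD_FLIPDATA members are the documented ones.

DWORD APIENTRY DdFlip(PDD_FLIPDATA lpFlip)
{
    /* If the previous flip has not been retired yet, the runtime expects
       DDERR_WASSTILLDRAWING so that the caller can retry. */
    if (HwFlipPending())
    {
        lpFlip->ddRVal = DDERR_WASSTILLDRAWING;
        return DDHAL_DRIVER_HANDLED;
    }

    /* Point the display hardware at the target surface's display memory. */
    HwProgramDisplayStart(lpFlip->lpSurfTarg->lpGbl->fpVidMem);

    lpFlip->ddRVal = DD_OK;
    return DDHAL_DRIVER_HANDLED;
}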

Tearing

As discussed in the Flipping section, a flip essentially changes a memory pointer so that it points to a new region of
display memory (see the Permedia2 sample code). The surface being flipped away from must be finished being
displayed before the application can lock, blt, or alter that memory, or a tear may result (as shown in the following
figure).

A tear may also occur if the surface being flipped to is having data written to it during the flip. Tearing is universal
and may happen any time an image is being drawn and displayed at the same time. Faster frame rates do not solve
this problem. Because primary surfaces, overlays, and textures are all DirectDraw surfaces, they can be flipped the
same way to prevent tearing.
A tear occurs when a page flip or blt happens at the wrong time. For example, if a page flips while the monitor scan
line is in the middle of displaying a surface, as represented by the dashed line in the preceding figure, a tear occurs.
The tear can be avoided by timing the flip to occur only after the entire surface has been displayed (as in the lower
example of the figure). A tear can also occur when blitting to a surface that is in the process of being displayed.

Triple Buffering

Increasing the number of buffers that can hold a primary surface increases display performance. It is preferable to
have at least three flippable surfaces (some games use five or more). When there are only two surfaces and a page
flip occurs, further drawing must wait until the monitor's vertical retrace is finished. The delay is necessary to ensure
that the back buffer is not written to before it has finished being displayed. With triple buffering, the third surface is
always writable because it is a back buffer and available to draw on immediately (as shown in the following figure).
In a game that does not use sprite memory, 3D rendering using triple buffering is 20 to 30 percent faster than
double buffering.

The flipping structures in the preceding figure are the same as those in Tearing, only now three buffers are used.
One buffer is almost always writable (because it is not involved in a flip) so the driver does not have to wait for the
display scan to finish before allowing the back buffer to be written to again.
The following is a brief explanation of flipping and blitting in a triple-buffered system, using the labels from the
preceding figure. The example begins with surface pixel memory 11 being displayed. This is the primary surface
pointed to by the front buffer (fpVidMem in the sample code supplied with the Microsoft Windows Driver
Development Kit [DDK]). At some point, it becomes desirable to blt to the surface at pixel memory 22. Because
fpVidMem points to the surface beginning at 11 (not 22) and the flip status is false (no flip is occurring on the
requested surface), the blt can proceed. The driver locks the surface, writes to it, and then unlocks it. To display that
surface, a flip must occur.
The DirectDraw front buffer object can now change fpVidMem (the display memory pointer) to make the surface
at 22 the primary surface. Because no flip is pending, the display pointers are exchanged (see the bottom half of the
preceding figure), and the flip status is set to TRUE. The front buffer now points to surface pixel memory 22, the
back buffer points to surface pixel memory 33, and the third buffer object points to surface pixel memory 11 (the
old primary surface). Unlike with double buffering, DirectDraw is free to write to the back buffer at this time. In
other words, DirectDraw can write to surface pixel memory 33 because no flip is pending. This circular flipping
process continues endlessly to provide smooth animation and faster game play and video playback for applications
that use DirectDraw.

Timing a Flip

Timing a flip is very easy if a flip register is available and the scan line is known. Simply check the flip register to see
whether the last flip has occurred, then make sure the scan line is not in the vertical blank before doing the flip. If
the scan line and flip register are not available, however, use one or more of the algorithms described in the
following paragraphs.
If the hardware does not have a bit for checking whether the scan line has been through a refresh cycle,
more than one method should be used to time the flip. The alternative method is based on elapsed time. It is
okay to flip if adequate time for drawing the entire surface has elapsed since the last flip. This amount of
time is determined empirically (unless the monitor refresh rate is known) at driver initialization and on mode
changes by polling until the display is in vertical sync, and then polling until it is not in vertical sync. This is
done for 20 iterations, with the thread execution priority set to maximum. The result is divided by 20 to give
display duration. This is the maximum time to wait before allowing access to the back buffer. (If the display
duration is already known, it does not need to be calculated.) In most cases, a much shorter period is
required before doing a flip, but display duration is used as a fallback because it is completely reliable.
If only the line number of the scan line is available, the fastest method is to record the time and scan line.
Then, when the scan line is less than it was before, it has been through a refresh cycle. For instance, if the
scan line is at line 300 the first time it is polled and at line 50 the next time, then the entire surface has been
displayed at least once. However, the scan line is not completely reliable on all display cards. In some cases a
register can be read from and written to at the same time, giving half of two different numbers. In these
cases, the register must be read twice to verify that it is reporting a correct number. In some cases the scan
line may be inaccurate as it passes through the horizontal sync. The flag that indicates whether the scan line
is currently in display often returns a false negative result while the scan line is in horizontal blank.
If the line number of the scan line is not available, it may be possible to verify that the scan line has passed
from display through vertical blank and back into display. By tracking when the scan line passes into and out
of display, you can determine when an entire refresh cycle has been completed. The ability to check whether
the scan line is in display or in vertical blank should be available on all display cards, but the scan-line
number may not be.
A few adjustments may be required to get the last bit of flicker out of the display, such as making sure no
calculations are occurring after the address is read (for example, shifting right two places to get the true address
because the address is on a DWORD boundary). To avoid tearing, make sure that the scan line is not in vertical
blank, is just above the vertical sync, and is in display before writing to the flip register.
Although display cards are based on the original IBM video graphics card (VGA) specifications, Super VGAs vary
considerably (see Programmer's Guide to the EGA, VGA, and Super VGA Cards by Richard F. Ferraro - Addison-
Wesley, latest edition). Some cards provide no straightforward way of determining whether the scan line is in
vertical blank, or if it is even in display. These deficiencies become more important as the competition to meet the
demand for advanced display capabilities increases. In the meantime, driver writers must use creativity to
compensate for these deficiencies.
Once it is okay to flip, the reference to the surface to display next is loaded in the flip register. Nothing happens
until the scan line gets past the vertical sync and is back into the display area. Then the flip register is read once. If
something else is stored in the flip register before that time, or if the surface is being written to at the same time it
is being read, a tear occurs. Once the register is read, the surface should not be locked or written to while it is still
being drawn.
Checking whether the register has been read is just a matter of checking a flag in most newer cards. In older cards,
however, a hardware interrupt was often used for this purpose. When a certain point was crossed that read in the
flip register, whatever function was hooked to that interrupt would be called. The problem is that this interrupt may
be used for other purposes in some newer hardware.

Flip Intervals

Beginning with DirectX 6.0, DirectDraw added the ability for an application to determine when a flip command is
performed. Support for these features can be added to an existing driver, depending on the hardware capabilities.
The following terms are used when describing the timing related to the application's call to Flip:
Posted
The time at which the application calls Flip and DirectDraw calls the driver's DdFlip entry point.
Retired
The time at which the hardware begins displaying from the new surface.
In previous versions of DirectDraw, flips were always retired on or near the vertical sync following when they were
posted. With DirectX 6.0 and later versions, applications can specify that the flip be retired immediately, that is,
exactly when posted, or at some set number of vertical syncs after the flip is posted. There are two capability bits
(DDCAPS2_FLIPNOVSYNC and DDCAPS2_FLIPINTERVAL in the DDCORECAPS structure) and four flags
(DDFLIP_NOVSYNC, DDFLIP_INTERVAL2, DDFLIP_INTERVAL3, and DDFLIP_INTERVAL4 in the DD_FLIPDATA
structure) to enable these features.
If your driver sets DDCAPS2_FLIPNOVSYNC, it receives the DDFLIP_NOVSYNC flag in the dwFlags member of the
DD_FLIPDATA structure. The DDFLIP_NOVSYNC flag indicates that the flip should be retired as soon as it is posted.
However, in this case, your hardware must be able to switch buffers on at least a per-scan line basis. The driver
should not specify support for DDCAPS2_FLIPNOVSYNC if the display does not actually retire the flip until the next
vertical sync, even if the driver returns immediately.
The number at the end of the DDFLIP_INTERVAL2, DDFLIP_INTERVAL3, and DDFLIP_INTERVAL4 flags denotes how
many vertical syncs the hardware should wait before retiring a posted flip. For example, DDFLIP_INTERVAL2 means
that the hardware should count two vertical syncs, then retire the flip on or near the second vertical sync.
If your driver exposes DDCAPS2_FLIPINTERVAL, then DirectDraw places the number of vertical syncs to delay a flip
by, into the most significant byte of the dwFlags member of the DD_FLIPDATA structure. Because the
DDFLIP_INTERVAL2, DDFLIP_INTERVAL3, and DDFLIP_INTERVAL4 flags are defined to make this true, the driver
should not treat these three flags as bit flags. Additionally, if the driver exposes DDCAPS2_FLIPINTERVAL,
DirectDraw ensures that the most significant byte of the dwFlags member is set accordingly when the
DDFLIP_INTERVAL2, DDFLIP_INTERVAL3, and DDFLIP_INTERVAL4 flags are not set. DDFLIP_NOVSYNC causes a
zero to be placed in the most significant byte, and the default value of the most significant byte becomes one
because the default behavior of a flip call is to retire the flip on or near the first vertical sync after it is posted.
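Assuming the driver exposes DDCAPS2_FLIPINTERVAL, the requested interval can therefore be read directly from the top byte of dwFlags, as in this sketch:

/* Sketch: extract the requested flip interval. 0 means retire immediately
   (DDFLIP_NOVSYNC), 1 is the default single-vsync behavior, and 2 through 4
   correspond to DDFLIP_INTERVAL2 through DDFLIP_INTERVAL4. */
DWORD flipInterval = (lpFlip->dwFlags & 0xFF000000) >> 24;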
Since DirectX 1.0, drivers have been required to return DDERR_WASSTILLDRAWING whenever a flip was pending
(that is, when the flip had been posted but not yet retired). This requirement is extended for flip intervals. Because
DDFLIP_NOVSYNC flips are retired when they are posted and therefore are never pending, the driver should never
return DDERR_WASSTILLDRAWING as a result of such flips. Conversely, using one of the DDFLIP_INTERVAL2,
DDFLIP_INTERVAL3, or DDFLIP_INTERVAL4 flags means that the driver needs to return
DDERR_WASSTILLDRAWING for a long period of time, because the period between posting and retiring a flip is
extended.
DirectDraw does not preclude the use of these flags with overlay surfaces, but drivers are not required to respect
them, even if they do set the DDCAPS2_FLIPINTERVAL or DDCAPS2_FLIPNOVSYNC capability bits. Drivers may
choose to respect these flags for overlays if they have the capability, but applications are unlikely to exploit this
feature.
Note The DDFLIP_INTERVAL2, DDFLIP_INTERVAL3, and DDFLIP_INTERVAL4 flags are intended to exploit hardware
capabilities. Drivers should not attempt to emulate these flags by looping in the driver until the flip can be retired as
requested. Because important operating system mutexes are held while calling a DirectDraw driver, such an
implementation can affect system performance.

Overlay support

DirectDraw also supports overlays. An overlay surface is one that can be displayed on top of the primary surface
without altering the physical bits in the surface underneath it. With an overlay, registers are set that define a
rectangle on the primary surface that contains the overlay surface. The digital-to-analog converter (DAC) changes
the location of the rectangles. The scan line reads data in primary surface memory until it reaches the rectangle that
is set aside for the overlay. It reads from the overlay surface until that line in the overlay is finished, then continues
with the original primary surface image. This switching from primary surface to the overlay and back happens on
every pass of the scan line and continues until the overlay is completely displayed.
The overlay surface can have a different pixel depth than the primary surface. For example, while 8 bits per pixel
(bpp) may look fine for the primary surface, a video clip may need 16 bpp to display acceptably. The pixel depth
switches seamlessly between the primary surface and the overlay. For more information about overlays with
DirectDraw, see the Video Port Extensions to DirectX section.
Overlays flip in exactly the same way as the primary surface. The DirectDraw surface objects swap pointers so that
the new overlay surface is read when the scan line reaches the rectangle bounding the overlay. The same flipping
algorithm described in Timing a Flip prevents tearing.

Texture support

A texture in 3D space flips the same way as any other surface. A texture is just a flat image that has bits set to
specify that it can be transformed (texture mapped) onto a 3D surface. A texture can be mapped onto a 3D surface
and the motion can be rendered smoothly by page flipping the texture. Flip waits for the renderer to finish reading
(like waiting for the scan line). If the flipping driver supports textures, it must be able to recognize and handle them
appropriately. For more details on textures, see Direct3D Texture Management.

Flipping to the GDI Surface

A display driver should be implemented so that the GDI (desktop) surface can become the primary surface. Doing
so lets applications display GDI-rendered content, such as dialog boxes. The driver can use one of the following
methods to make the GDI surface into the primary surface:
The driver can include the GDI surface as one of the buffers in the driver's flipping chain. This method is the
recommended way to let applications flip to the GDI surface. By default, when an application makes flipping
requests, DirectDraw makes calls to the driver's DdFlip function to cycle through the buffers in the order that
they are attached to each other in the chain. An application can flip to the GDI surface by determining the
position of the GDI surface in the chain and then by making the appropriate number of flipping requests.
The driver can implement a DdFlipToGDISurface function to receive notification when DirectDraw is flipping
to or from a GDI surface. If the driver can access the GDI surface, the driver can flip to the GDI surface after
receiving this notification. Using this method, the GDI surface is not required to be part of the driver's
flipping chain.

Blitting

If a blt is happening within one surface and the source and destination areas overlap, the proper direction must be
determined to avoid overwriting part of the source before it is copied. This can be accomplished with just two
potential starting points at opposite corners of the surface. All the blt engine needs are the location and dimensions
for each image.
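One common way to pick the direction, sketched below using the rDest and rSrc rectangles from DD_BLTDATA, is to compare the two origins and copy bottom-up whenever the destination starts after the source:

/* Sketch: if the destination origin follows the source origin, copy from the
   bottom-right corner backward so unread source pixels are never overwritten;
   otherwise copy top-down from the top-left corner. */
BOOL copyBackward = (lpBlt->rDest.top >  lpBlt->rSrc.top) ||
                    (lpBlt->rDest.top == lpBlt->rSrc.top &&
                     lpBlt->rDest.left > lpBlt->rSrc.left);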
Everything possible should be done to speed up the actual blt. Duplicating sections of code to avoid an IF statement
may make the driver go faster, for example. Perhaps the best implementation of this technique is to put the code in
a macro and use that in different places rather than making function calls. For more information, see DdBlt.

Transparent Blt

In a transparent blt, a color key normally specifies the colors that will not be moved. The source color key is
analogous to the blue screen used in motion pictures. The color is compared to each pixel and if they match, the
pixel is not copied. If they do not match, the pixel is copied. DirectDraw also supports color keys that are ranges.
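The per-pixel test described above amounts to the following sketch, shown for an 8 bpp surface with the obvious local declarations assumed (src and dst are byte pointers into the surfaces; the pitches are in bytes):

/* Sketch: software source-color-key copy for an 8 bpp surface. */
for (y = 0; y < height; y++)
{
    for (x = 0; x < width; x++)
    {
        BYTE pixel = src[y * srcPitch + x];

        if (pixel != colorKey)          /* the key can also be a range */
            dst[y * dstPitch + x] = pixel;
    }
}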
In some cases, a transparent blt may be only partially supported by the hardware. This is probably still faster than
doing it in software. The DDCAPS_COLORKEYHWASSIST flag should be set in these cases.
An example of a partially supported transparent blt is a display card that requires a bitmask, instead of just using a
color key. In this case, rather than comparing each pixel with the color key to determine whether to copy it, a
monochrome mask is built. That is, all of the pixels are compared with the color key and the entire surface is
converted to a bitmask (usually one bit per byte, depending on color depth). When using DirectDraw, this is done
when the surface is unlocked.
Once the mask is built, it is compared to the source surface. Everything that is not set in the mask is
copied to the destination surface. This accomplishes the same effect as a source color key, but requires a mask to be
built first, rather than comparing and copying at the same time. The mask must be rebuilt any time the color key is
set. It also must be checked whenever a blt occurs because a color key override can be specified at that time. When
the application's Blt function is called, check that the color key override (the only color key passed to the blt) is the
same as the color key that is set on the surface. If they are the same, it is not really an override and the mask does
not need to be rebuilt. If they are different, then the mask must be rebuilt. (The driver's DdBlt function always sees
the color key as an override.)

Clip Lists

Clipped blts are never passed to the driver on Microsoft Windows 2000 and later. The IsClipped member of
DD_BLTDATA is always FALSE, and the clip list is always NULL.

Color Fills and Pattern Fills

A color fill, as described by a rectangle, fills an area of the screen with a particular color. On most cards, only the
address of display memory, the dimensions of the rectangle, and the color are needed. Some cards require
beginning and ending X and Y coordinates. Note that Windows automatically drops the last line of these
coordinates. For example, when they are numbered from 0 to 640, Windows drops line 640.
Some cards use a pattern fill, which can accomplish the same thing as a color fill. An 8 x 8 region of pixels (the
pattern) makes up the desired color, and that pattern is used to fill the specified area. The pattern is set to equal the
color desired and filled in the same as a color fill. A pattern fill takes four separate colors that can be blended,
reducing the number of necessary instructions.

DirectDraw and Color Space Conversion

DirectDraw allows surfaces to be created and stored in YUV formats. Four Character Codes (FOURCCs) denote what
color space conversions are being used. Then, during the overlay process, the image is converted to 16-bit RGB.
YUV 4:2:2 is effectively the same density as 16 bits per pixel (bpp), but the color fidelity is better. The image can be
written in as YUV and go into display memory as RGB, but normally the translation is done as it is read out of
display memory so that the compression is maintained. This saves display memory and speeds up playback. For
Windows 2000 and later, some YUV formats (UYVY and YUY2, both types of 4:2:2) are emulated, but only when
used as textures. Note that unsupported YUV format surfaces cannot be created in system memory.
Three common YUV color spaces are:
4:2:2 (standard television is of this type)
4:1:1 (more compressed)
4:4:4 (similar to RGB)
There are many varieties of these color spaces and many other YUV formats in use today. For information about
FOURCC, go to the FOURCC website.
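How a driver advertises such formats is sketched below for DrvGetDirectDrawInfo, assuming (for illustration only) that the hardware can display YUY2 and UYVY; on the first call pdwFourCC is NULL and only the count is reported.

/* Sketch: report the YUV FOURCC formats the hardware supports. */
static DWORD supportedFourCCs[] =
{
    MAKEFOURCC('Y','U','Y','2'),
    MAKEFOURCC('U','Y','V','Y'),
};

*pdwNumFourCCCodes = sizeof(supportedFourCCs) / sizeof(supportedFourCCs[0]);

if (pdwFourCC != NULL)
{
    memcpy(pdwFourCC, supportedFourCCs, sizeof(supportedFourCCs));
}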

Drivers That Assume State Information

Most display drivers set up the state of the blitter when they do an operation. However, some display drivers expect
the blitter to be in a known state. For example, some display drivers assume that the origin is set to do a standard
blt rather than a transparent blt, and so on. In these cases, the state has to be reset after DirectDraw uses it.
Blts between two surfaces can be accomplished either with a fixed origin or with a separate origin for source and
destination surfaces. If the display driver expects the origin to remain constant and DirectDraw changes it to access
a secondary surface, the old pointer must be saved and restored when the operation is finished.
If this forces DirectDraw to wait while operations are being done so it can restore registers to their previous state,
performance suffers. This is because DirectDraw's speed comes from being asynchronous.
Care must be taken in these cases to minimize the changes being made to the display state. Moving the origin in
this scenario also wastes room on the stack that could otherwise be used for passing parameters.

Managing PDEVs

This topic applies only to Windows 2000 and later.


The number of threads that call into a display driver is dependent on the number of existing PDEVs on a device.
Each device has a maximum of one enabled PDEV per adapter output and an unlimited number of disabled PDEVs.
A PDEV is disabled or enabled by calling the driver's DrvAssertMode function. When a display driver manages a
mix of disabled and enabled PDEVs, the operating system permits a single thread to call a driver function with an
enabled PDEV while simultaneously permitting multiple threads to call driver functions with disabled PDEVs. For
example, DrvBitBlt could be running on the enabled PDEV while another disabled PDEV is being destroyed by
DrvDisableSurface. Even if a single display driver manages multiple enabled PDEVs, (for example, in a multiple
monitor scenario), the operating system still only lets a single thread call into driver code with any of those enabled
PDEVs.
If the display driver must manage any global resources and hardware states that are shared between PDEVs, the
display driver must also handle any necessary synchronization. The display driver is mapped into session space, so
each session has its own set of global variables. Therefore, you must not use a display driver global variable to hold
a synchronization object such as a mutex. Instead, store the mutex in the device extension of the video miniport
driver, which is mapped into global space not session space. You can initialize the mutex in the video miniport
driver's HwVidInitialize function. Then, the display driver's DrvEnablePDEV function can obtain a pointer to the
mutex by sending a custom IOCTL to the video miniport driver. Display driver threads that belong to different
sessions will have separate copies of the pointer, but all of those copies will point to the same mutex object.
The display driver is not allowed to directly call the kernel routines that acquire and release a mutex, so the display
driver must rely on the video miniport driver to perform those tasks. The video miniport driver could implement a
function that acquires and releases the mutex, and the display driver could obtain a pointer to that function in the
same custom IOCTL that it uses to obtain a pointer to the mutex itself.
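The following sketch shows one way the display driver might obtain that pointer (and the acquire/release helpers) from the miniport. The IOCTL code, the SYNC_HELPERS layout, and the ppdev fields are all hypothetical and defined by the driver writer; only EngDeviceIoControl is a documented GDI call.

/* Hypothetical private interface shared with the video miniport driver. */
typedef struct _SYNC_HELPERS
{
    PVOID Mutex;                            /* mutex stored in the device extension */
    VOID (*AcquireMutex)(PVOID Mutex);      /* implemented in the miniport          */
    VOID (*ReleaseMutex)(PVOID Mutex);
} SYNC_HELPERS;

SYNC_HELPERS helpers;
DWORD        bytesReturned;

if (EngDeviceIoControl(ppdev->hDriver,
                       IOCTL_VIDEO_QUERY_SYNC_HELPERS,    /* hypothetical IOCTL */
                       NULL, 0,
                       &helpers, sizeof(helpers),
                       &bytesReturned) == NO_ERROR)
{
    /* Every session gets its own copy of these pointers, but they all refer
       to the same mutex object in the miniport's device extension. */
    ppdev->SyncHelpers = helpers;
}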
Only the following limited number of driver functions can be called with a disabled PDEV:
DdMapMemory (for unmapping memory only)
DrvDisableDirectDraw
D3dCreateSurfaceEx (for system memory only)
D3dDestroyDDLocal
D3dContextCreate
D3dContextDestroy
DdDestroySurface
DdLock
DdUnlock
DestroyD3DBuffer
LockD3DBuffer
UnlockD3DBuffer
DrvAssertMode
DrvDisablePDEV
DrvDisableSurface
DrvResetPDEV

Performing Floating-point Operations in DirectDraw

DirectDraw driver callback functions must perform all floating-point operations between calls to the GDI-supplied
EngSaveFloatingPointState and EngRestoreFloatingPointState functions. That is, the driver's callback
functions must save the floating-point state prior to performing a floating-point operation and must restore the
floating-point state when the floating-point operation completes. For more information about floating-point
operations, see Floating-Point Operations in Graphics Driver Functions.
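A minimal sketch of the required bracketing, assuming the floating-point work happens inside a DdBlt callback; the allocation strategy and pool tag are illustrative only.

DWORD APIENTRY DdBlt(PDD_BLTDATA lpBlt)
{
    ULONG cjState;
    VOID *pState = NULL;

    /* Ask GDI how large a buffer is needed to hold the floating-point state. */
    cjState = EngSaveFloatingPointState(NULL, 0);
    if (cjState != 0)
    {
        pState = EngAllocMem(FL_ZERO_MEMORY, cjState, 'tlFD');
        if (pState == NULL || !EngSaveFloatingPointState(pState, cjState))
        {
            lpBlt->ddRVal = DDERR_GENERIC;
            return DDHAL_DRIVER_HANDLED;
        }
    }

    /* ... floating-point work, for example computing stretch factors ... */

    if (pState != NULL)
    {
        EngRestoreFloatingPointState(pState);
        EngFreeMem(pState);
    }

    lpBlt->ddRVal = DD_OK;
    return DDHAL_DRIVER_HANDLED;
}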
Return Values for DirectDraw

The following tables list values that can be returned by the DirectDraw driver-supplied functions. The
DDHAL_DRIVER_Xxx values actually are returned in the DWORD return value. The DD_OK value and DDERR_Xxx
error codes are returned in the ddRVal member of the structure to which the particular function's parameter
points.
For specific error codes that each function can return, see the function descriptions in the reference section. Refer to
DirectDraw header files ddraw.h and dxmini.h for a complete listing of error codes and return values. Note that
error codes are represented by negative values and cannot be combined.
A function in a DirectDraw driver must return one of the two return codes: DDHAL_DRIVER_HANDLED or
DDHAL_DRIVER_NOTHANDLED. If the driver returns DDHAL_DRIVER_HANDLED, then it must also return either
DD_OK or one of the error codes listed in ddraw.h. A function in a DirectDraw driver can return the codes in the
following table. These codes are defined in ddraw.h.

RETURN CODE MEANING

DD_OK The request completed successfully.

DDHAL_DRIVER_HANDLED The driver has performed the operation and returned a valid return code for that operation in the ddRVal member of the structure passed to the driver's callback. If this code is DD_OK, DirectDraw or Direct3D proceeds with the function. Otherwise, DirectDraw or Direct3D returns the error code provided by the driver and aborts the function.

DDHAL_DRIVER_NOCKEYHW The display driver couldn't handle the call because it ran out of color key hardware resources.

DDHAL_DRIVER_NOTHANDLED The driver has no comment on the requested operation. If the driver is required to have implemented a particular callback, DirectDraw or Direct3D reports an error condition. Otherwise, DirectDraw or Direct3D handles the operation as if the driver callback had not been defined by executing the DirectDraw or Direct3D device-independent implementation. DirectDraw and Direct3D typically ignore any value returned in the ddRVal member of that callback's parameter structure.

DDERR_GENERIC There is an undefined error condition.

DDERR_OUTOFCAPS The hardware needed for the requested operation has already been allocated.

DDERR_UNSUPPORTED The operation is not supported.
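For illustration, a hedged sketch of how a driver callback typically reports these codes; the capability check and the HwQueueBlt helper are stand-ins for whatever the hardware actually supports, not DDK names.

DWORD APIENTRY DdBlt(PDD_BLTDATA lpBlt)
{
    /* Punt anything the hardware cannot do back to DirectDraw; the
       device-independent implementation then performs the operation as if
       the callback were not defined. */
    if (lpBlt->dwFlags & DDBLT_ZBUFFER)       /* illustrative "unsupported" case */
    {
        return DDHAL_DRIVER_NOTHANDLED;
    }

    if (!HwQueueBlt(lpBlt))                   /* hypothetical hardware helper */
    {
        /* The driver handled the call but it failed; report why in ddRVal. */
        lpBlt->ddRVal = DDERR_GENERIC;
        return DDHAL_DRIVER_HANDLED;
    }

    lpBlt->ddRVal = DD_OK;
    return DDHAL_DRIVER_HANDLED;
}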


A DxApi function that is implemented in a video miniport driver returns one of the codes in the following table.
These codes are defined in dxmini.h.

RETURN CODE MEANING

DX_OK The request completed successfully.

DXERR_GENERIC There is an undefined error condition.

DXERR_OUTOFCAPS The hardware needed for the requested operation has already been allocated.

DXERR_UNSUPPORTED The operation is not supported.

DirectDraw Driver Initialization

This section provides information about the driver initialization process, as it relates to Windows 2000 and later.
Windows 2000 Driver Initialization
Windows 2000 Driver Initialization

On Windows 2000 and later, driver information is only retrieved when requested by an application. In other words,
in response to a Microsoft DirectDraw application's request to create an instance of a DirectDraw object, the
graphics engine calls the driver functions to initialize a DirectDraw driver.
Starting with Windows 2000, this sequence is done at boot time and after each mode change. This has a side effect.
On Windows 98/Me, drivers typically have two modes of operation--GDI mode and DirectDraw mode. If
DirectDraw is running, it will not let GDI cache bitmaps, instead giving all of the memory to DirectDraw (and vice
versa when in GDI mode). This behavior caused windowed applications (such as webpages that use DirectX) to
suffer. Therefore, on Windows 2000 and later, GDI and DirectDraw are required to cooperate about how memory is
used. The Permedia3 sample driver that ships with the Windows Driver Development Kit (DDK) has examples of
how to do this. (The DDK preceded the Windows Driver Kit [WDK].)
The driver initialization sequence is achieved by calling the following functions:
DrvGetDirectDrawInfo to retrieve information about the hardware's capabilities. GDI calls this function
twice (a minimal sketch of this two-pass handling appears at the end of this topic):
The first call determines the size of the display memory heap and the number of FOURCCs that the driver
supports. GDI passes NULL for both pvmList and pdwFourCC parameters. The driver should initialize and
return pdwNumHeaps and pdwNumFourCC parameters only.
The second call is made after GDI allocates display memory and FOURCC memory based on the values
returned from the first call in pdwNumHeaps and pdwNumFourCC parameters. In the second call, the
driver should initialize and return pdwNumHeaps, pvmList, pdwNumFourCC, and pdwFourCC
parameters.
GDI allocates and zero-initializes the DD_HALINFO structure to which pHalInfo points.
DrvGetDirectDrawInfo function should fill in the pertinent members of the DD_HALINFO structure with
driver-specific information:
The driver should initialize the appropriate members of the VIDEOMEMORYINFO structures to describe
the general format of the display's memory. See Display Memory.
The driver should initialize the appropriate members of the DDCORECAPS structure to describe the
driver's core capabilities to DirectDraw.
If the driver supports any of the DirectX features that are queried by sending a GUID to the driver's
DdGetDriverInfo callback, the driver must initialize the GetDriverInfo member to point to the driver's
DdGetDriverInfo callback and set the DDHALINFO_GETDRIVERINFOSET bit in dwFlags.
The driver must set dwSize to the size, in bytes, of the DD_HALINFO structure.
DrvEnableDirectDraw is used by the runtime to enable the DirectDraw hardware and determine some of
the driver's callback support. GDI allocates and zero-initializes the DD_CALLBACKS,
DD_SURFACECALLBACKS, and DD_PALETTECALLBACKS parameter structures. The driver should do the
following for each of these callbacks that it implements:
Set the corresponding member of the appropriate structure to point to the callback.
Set the corresponding DDHAL_XXX_XXX bit in the dwFlags member of the appropriate structure.
The driver can implement its DrvEnableDirectDraw function to indicate that it supports the callback
functions listed in DirectDraw Callback Support Using DrvEnableDirectDraw.
A driver's DrvEnableDirectDraw implementation can also dedicate hardware resources such as display
memory for use by DirectDraw only.
DdGetDriverInfo to retrieve the other callback functions and capabilities that the driver supports.
If it is not NULL, the GetDriverInfo callback is returned in the DD_HALINFO structure by the driver's
DrvGetDirectDrawInfo. GDI allocates and initializes the DD_GETDRIVERINFODATA structure and calls
DdGetDriverInfo for each of the GUIDs described in the DD_GETDRIVERINFODATA reference section. All
GUIDs are defined in ddrawint.h.
The driver can implement its DdGetDriverInfo function to indicate that it supports the callback functions
specified in DirectDraw and Direct3D Callback Support Using DdGetDriverInfo.
Locking the surface memory (whether the whole surface or part of a surface) ensures that an application and the
hardware cannot obtain access to the surface memory at the same time. This prevents errors from occurring while
an application is writing to surface memory. In addition, an application cannot page flip until the surface memory is
unlocked.
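The two-pass DrvGetDirectDrawInfo handling described above might look like the following minimal sketch. PPDEV and its cjOffScreenStart and cjVideoMemory members are hypothetical driver-private names, and the single linear heap is an illustrative assumption.

BOOL APIENTRY DrvGetDirectDrawInfo(
    DHPDEV       dhpdev,
    DD_HALINFO  *pHalInfo,
    DWORD       *pdwNumHeaps,
    VIDEOMEMORY *pvmList,
    DWORD       *pdwNumFourCCCodes,
    DWORD       *pdwFourCC)
{
    PPDEV ppdev = (PPDEV)dhpdev;     /* driver-private per-device data (hypothetical) */

    *pdwNumHeaps       = 1;          /* one off-screen heap in this sketch */
    *pdwNumFourCCCodes = 0;          /* no FOURCC formats in this sketch */

    /* First call: GDI only wants the counts; pvmList and pdwFourCC are NULL. */
    if (pvmList == NULL && pdwFourCC == NULL)
    {
        return TRUE;
    }

    /* Second call: describe the heap and fill in DD_HALINFO. */
    pvmList[0].dwFlags        = VIDMEM_ISLINEAR;
    pvmList[0].fpStart        = ppdev->cjOffScreenStart;    /* first byte after the visible surface */
    pvmList[0].fpEnd          = ppdev->cjVideoMemory - 1;   /* last byte of display memory */
    pvmList[0].ddsCaps.dwCaps = 0;                          /* heap can hold any surface type */

    pHalInfo->dwSize = sizeof(DD_HALINFO);
    /* ... fill pHalInfo->vmiData (VIDEOMEMORYINFO) and pHalInfo->ddCaps
       (DDCORECAPS), and set GetDriverInfo plus DDHALINFO_GETDRIVERINFOSET
       in dwFlags if DdGetDriverInfo is implemented ... */

    return TRUE;
}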
DirectDraw Callback Support Using DrvEnableDirectDraw

The display driver can implement the DrvEnableDirectDraw function to indicate various DirectDraw callback
support. To indicate support, the driver returns pointers to the DD_CALLBACKS, DD_SURFACECALLBACKS, and
DD_PALETTECALLBACKS structures in the pCallBacks, pSurfaceCallBacks, and pPaletteCallBacks parameters.
The driver populates members of the DD_CALLBACKS structure to indicate that it supports the following callback
functions.

CALLBACK FUNCTION DESCRIPTION

DdCanCreateSurface Returns a value that indicates whether the driver can create a surface of the specified surface description.

DdCreatePalette Creates a DirectDrawPalette object for the specified DirectDraw object.

DdCreateSurface Creates a DirectDraw surface.

DdGetScanLine Returns the number of the current physical scan line.

DdMapMemory Maps application-modifiable portions of the frame buffer into the user-mode address space of the specified process, or it unmaps memory.

DdWaitForVerticalBlank Returns the vertical blank status of the device.

The driver populates members of the DD_SURFACECALLBACKS structure to indicate that it supports the following
callback functions.

CALLBACK FUNCTION DESCRIPTION

DdAddAttachedSurface Attaches a surface to another surface.

DdBlt Performs a bit-block transfer (blt) of display data from a source surface to a destination surface.

DdDestroySurface Destroys a DirectDraw surface.

DdFlip Causes the surface memory associated with the target surface to become the primary surface, and the current surface to become the nonprimary surface.

DdGetBltStatus Queries the blt status of the specified surface.

DdGetFlipStatus Determines whether the most recently requested flip on a surface has occurred.

DdLock Locks a specified area of surface memory and provides a valid pointer to a block of memory associated with a surface.

DdSetColorKey Sets the color key value for the specified surface.

DdSetOverlayPosition Sets the position for an overlay.

DdSetPalette Attaches a palette to the specified surface.

DdUnlock Releases the lock held on the specified surface.

DdUpdateOverlay Repositions or modifies the visual attributes of an overlay surface.

The driver populates members of the DD_PALETTECALLBACKS structure to indicate that it supports the following
callback functions.

CALLBACK FUNCTION DESCRIPTION

DdDestroyPalette Destroys the specified palette.

DdSetEntries Updates the palette entries in the specified palette.
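Taken together, a DrvEnableDirectDraw implementation that reports a small subset of this support might look like the following sketch. Which callbacks a real driver hooks depends entirely on its hardware; GDI zero-initializes the three structures before the call.

BOOL APIENTRY DrvEnableDirectDraw(
    DHPDEV               dhpdev,
    DD_CALLBACKS        *pCallBacks,
    DD_SURFACECALLBACKS *pSurfaceCallBacks,
    DD_PALETTECALLBACKS *pPaletteCallBacks)
{
    pCallBacks->dwSize           = sizeof(DD_CALLBACKS);
    pCallBacks->CanCreateSurface = DdCanCreateSurface;
    pCallBacks->CreateSurface    = DdCreateSurface;
    pCallBacks->dwFlags          = DDHAL_CB32_CANCREATESURFACE |
                                   DDHAL_CB32_CREATESURFACE;

    pSurfaceCallBacks->dwSize    = sizeof(DD_SURFACECALLBACKS);
    pSurfaceCallBacks->Blt       = DdBlt;
    pSurfaceCallBacks->Lock      = DdLock;
    pSurfaceCallBacks->Unlock    = DdUnlock;
    pSurfaceCallBacks->dwFlags   = DDHAL_SURFCB32_BLT |
                                   DDHAL_SURFCB32_LOCK |
                                   DDHAL_SURFCB32_UNLOCK;

    /* This sketch exposes no palette callbacks, so pPaletteCallBacks is left
       untouched.  Hardware resources, such as display memory dedicated to
       DirectDraw, could also be reserved here. */
    return TRUE;
}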

DirectDraw and Direct3D Callback Support Using DdGetDriverInfo

The display driver can implement the DdGetDriverInfo function to indicate various DirectDraw and Direct3D
callback support. Callback support is contingent on the following GUIDs that the driver receives in the guidInfo
member of the DD_GETDRIVERINFODATA structure, to which the lpGetDriverInfo parameter points. The driver
returns a pointer to a structure in the lpvData member that specifies DirectDraw or Direct3D callback support.
If the driver receives the GUID_ColorControlCallbacks GUID, it returns a pointer to the
DD_COLORCONTROLCALLBACKS structure. If it supports color control, the driver fills the ColorControl
member of DD_COLORCONTROLCALLBACKS to specify its DdControlColor callback function.
If the driver receives the GUID_D3DCallbacks, GUID_D3DCallbacks3, or GUID_Miscellaneous2Callbacks
GUID, it returns a pointer to the D3DHAL_CALLBACKS, D3DHAL_CALLBACKS3, or
DD_MISCELLANEOUS2CALLBACKS structure. The driver uses these structures to indicate its Direct3D
callback support. For more information, see Direct3D DDI.
If the driver receives the GUID_KernelCallbacks GUID, it returns a pointer to the DD_KERNELCALLBACKS
structure. The driver fills members of DD_KERNELCALLBACKS to indicate that it supports the following
callback functions.

CALLBACK FUNCTION DESCRIPTION

DdSyncSurfaceData Sets and modifies surface data.

DdSyncVideoPortData Sets and modifies video port extensions (VPE) object data.

If the driver receives the GUID_MiscellaneousCallbacks GUID, it returns a pointer to the
DD_MISCELLANEOUSCALLBACKS structure. If it supports a DdGetAvailDriverMemory callback function,
the driver fills the DdGetAvailDriverMemory member of DD_MISCELLANEOUSCALLBACKS to specify
DdGetAvailDriverMemory.
If the driver receives the GUID_MotionCompCallbacks GUID, it returns a pointer to the
DD_MOTIONCOMPCALLBACKS structure to indicate its support of motion compensation callbacks. For
more information, see Compressed Video Decoding.
If the driver receives the GUID_NTCallbacks GUID, it returns a pointer to the DD_NTCALLBACKS structure.
The driver fills members of DD_NTCALLBACKS to indicate that it supports the following callback functions.

CALLBACK FUNCTION DESCRIPTION

DdFlipToGDISurface Notifies the driver when DirectDraw is flipping to or from a GDI surface.

DdFreeDriverMemory Frees offscreen or nonlocal display memory to satisfy a new allocation request.

DdSetExclusiveMode Notifies the driver when a DirectDraw application is switching to or from exclusive mode.

If the driver receives the GUID_VideoPortCallbacks GUID, it returns a pointer to the
DD_VIDEOPORTCALLBACKS structure to indicate its support of VPE Callback Functions. For more information,
see Video Port Extensions to DirectX.
Video Port Extensions to DirectX

Driver developers for devices with a hardware video port should implement the video port extensions (VPE) to
Microsoft DirectX. The hardware video port on a VGA graphics controller provides a fast mechanism for getting
data to the frame buffer. The hardware video port is a dedicated connection between video devices, typically
between a hardware Moving Pictures Experts Group (MPEG) device or National Television Standards Committee
(NTSC) decoder and the video card. This dedicated connection carries horizontal sync (H-sync) and vertical sync (V-
sync) information with the video data. The hardware video port and overlay can use this sync information to flip
automatically between multiple buffers, writing to one surface while the overlay displays another. This allows tear-
free video without burdening the application.
VPE allows the client -- typically Microsoft DirectShow -- to negotiate the connection between the MPEG or NTSC
decoder and the hardware video port. VPE also allows the client to control effects in the video stream, including
cropping and scaling. A VPE implementation should do only what is requested of it by the client; for example, it
should crop only when the client requests cropping.
Microsoft DirectDraw VPE objects monitor the incoming signal and pass image data to the frame buffer, using
parameters set though their interface methods to modify the image, perform flipping, or carry out other services.
VPE objects do not control the video decoder or the video source.
The VPEs are not associated with the Microsoft Windows 2000 and later video port system module, videoprt.sys.
Video Port Extensions Background

The video port extensions (VPE) technology is a DirectDraw extension that supports direct hardware connections
from a video decoder and autoflipping in the graphics frame buffer.
When video data is placed in the frame buffer, a video overlay can be used to display it. The video overlay in the
graphics system tells the digital-to-analog converter (DAC) to show data in regions it would not ordinarily show.
The use of overlays is efficient because blitting is not required to make the data visible, and because the video data
can be viewed in a higher color depth than the visible frame buffer.
The hardware video port is a video stream delivery option that can be used instead of the PCI or AGP bus. For
applications that are expected to run concurrently with other applications that require PCI bandwidth, it is
advantageous to use the hardware video port to ensure low-latency video transmission between the decoder and
the VGA graphics controller. This route is not as flexible as a PCI bus solution, because it ties the decoder to a
particular graphics chipset, but it yields an opportunity to bypass the PCI bus if necessary.
The kernel-mode video transport capabilities, described in the Kernel-Mode Video Transport section, also provide
support for VPE to ensure enhanced video playback quality and enhanced video capture support. The driver must
support kernel-mode video transport in order to support VPE under Windows 2000 and later.
VPE supports only hardware-based data connections. Kernel-mode video transport supports direct access for data
transfer.
VPE is not a WDM technology. Under Windows 2000 and later, WDM is used for other peripherals, but not for the
display driver.
Because VPE support is part of the latest version of DirectX, an application can take advantage of these capabilities
with the assurance that the solution works on any graphics card that supports VPE under Windows 2000 and later.
Displaying Video on Interlaced and Proscan Monitors

For the following two scenarios, interlaced video must be integrated for playback under Windows 2000 and later:
Interlaced content that is authored for display on an interlaced television and a computer with an interlaced
monitor (NTSC/PAL).
Existing interlaced content that is authored for display on an interlaced television, which also needs to be
displayed with the best-possible quality on a progressive scan (proscan) computer monitor.
NTSC and Interlaced Data

The National Television Systems Committee (NTSC) standard provides a series of 59.94 interlaced fields per
second, each separated by 1/59.94 of a second. The scan lines of the even-numbered fields fall spatially halfway
between the scan lines of the odd-numbered fields. However, because of the phosphor persistence of the television
monitor, two fields are never displayed on a television screen at the same time. The viewer always sees either an
even field or an odd field, but never both.
A frame in NTSC is an arbitrary grouping of two sequential fields that are completely unrelated. That is, the first
field in a frame is no more related to the second field in the same frame than it is to the second field in the previous
frame. The Phase Alternation Line (PAL) format and the Sequential Color with Memory (SECAM) standard work
identically at about 50 fields per second.
If the video contains high-motion content, the data in the odd field could be different from that in the even field.
This does not cause a problem on a television monitor because you are never looking at both fields at the same
time and the eye does a good job integrating the data. On a computer, however, this interlaced data is often
interleaved into a single buffer and then displayed using progressive scan. This means that both fields are visible at
the same time, with a high potential for motion artifacts.
The process of putting video onto a digital video disc (DVD) and then replaying it is complex. Typically, the source
material being put on the disk was created for television, so each frame has two fields that are interlaced. Film shot
at 24 frames per second (fps), however, must be converted to 59.94 fields per second to be compatible with a
television display.
NTSC/PAL Conversion

Conversion from film to PAL is done by simply playing the film fast -- at 25 fps. Two sequential fields are created
from the same frame and displayed 1/50th of a second apart. Conversion from film to NTSC is done by repeating
every fifth video field with a process called 3:2 pulldown. A first film frame is used to create two video fields, and
then a second film frame is used to create three video fields. This process is repeated so that odd and even fields
are sent in the order Ao Ae Ao Be Bo Ce Co Ce.
Therefore, an interlaced video created from a film contains pairs of fields. But unlike a standard television signal
where each field is 1/60th or 1/50th of a second apart; many of these field pairs contain data from the same frame
of film. Displaying these field pairs at the same time does not produce any artifacts and provides better results than
television monitors do. However, any field pairs not from the same frame will be 1/24 of a second apart and
potentially could produce more artifacts.
When the interlaced video is encoded, a flag is set to indicate how the stream should be decoded. This should be
enough information to decode and optimally display the decoded pictures because the output of an MPEG-2 or
NTSC decoder for DVD or DSS is theoretically always 50 or 60 interlaced fields per second. However, older films
and some newer films often have a break in the flag cadence, especially at film reel changeover points. This requires
a method for identifying such progressive content in order to select the proper method for displaying the
information on a frame-by-frame basis.
MPEG and Progressive Content

MPEG-2 syntax provides the information necessary to identify progressive content and 3:2 pulldown. This
information is stored in the header for each frame in the following 1-bit flags:
PROGRESSIVE_FRAME: When TRUE, this indicates that the two fields (of the frame) are actually from the
same time instant (progressive film). When FALSE, this indicates that the fields might be one-half of a frame
time apart (interlaced video).
TOP_FIELD_FIRST: Indicates which field comes first in time.
REPEAT_FIRST_FIELD: Indicates whether a field should be repeated for 3:2 pulldown.
With VPE and DirectDraw under the latest DirectX release, video can always be displayed with the best-possible
quality if these flags or a signal derived from these flags can be conveyed to the system on a per-field basis.
Interlaced video can also be supported by DirectShow, with new flags in the media sample to indicate whether an
uncompressed video media sample is either a full frame or a field, plus any other information. As described in the
Displaying Interleaved Video with VPE section, DirectShow can be instructed to switch between display modes on a
per-frame basis by using either frame-based media samples or field-based media samples.
Progressive Scan Monitors and Interleaved Data

On a proscan monitor, the lines are displayed as a frame -- that is, line 1 is displayed, then line 2, then line 3, and so
on. The typical television set displays a 2:1 raster -- that is, it displays the first field of a frame on the odd lines, and
then it displays the second field of the frame on the even lines. Any material meant to be played on a television
must be processed before being displayed on a proscan monitor.
Displaying Interleaved Video with VPE

Many methods exist for deinterlacing. Professional television producers use devices such as line doublers when
deinterlacing for large-size rear-projection display, and effects systems with motion-adaptive filters when creating
zooms and slow-motion sequences.
Two simple methods are available for displaying interlace on a progressive computer monitor: bob and weave.
These terms are used here for simplicity, because computer and television industry terms are different.
Bob Method

The bob method of displaying data shows each field individually (similar to a television) using an overlay. The
resulting image is half the normal height, so it must be zoomed 200 percent in the vertical direction using an
interpolated overlay stretch. However, if this were all the bob algorithm did, the resulting image would jitter up and
down because the odd field and the even field are offset by one line. Adding one line to the overlay start address on
the odd fields solves this problem, as long as the vertical stretch is performed using an interpolator.
The bob method produces 60 (NTSC) progressive frames per second and retains all temporal information from an
interlaced source. If a video was created with a video camera and the image contains motion, bob is the best low-
cost display process for a progressive monitor. The bob method works for all sources of interlaced video data, but
the weave method produces a crisper image.
By default, DirectShow uses the bob method for correcting interlaced display.
Weave Method

The weave method displays data using the hardware video port to interleave the interlaced fields into an overlay
surface and then shows both fields at the same time. If this were all the weave algorithm did, however, motion
artifacts would appear. The weave algorithm also relies on the MPEG driver to recognize the 3:2 pattern and then
undoes it using functions in the kernel-mode video transport. The kernel-mode video transport may cause the
hardware video port to discard the repeat fields, causing all field pairs to come from the same frame. The result is
full-framed video displayed at 24 frames per second, just as it was originally sampled using film.
Each source-film frame is represented in the NTSC signal by two or three fields. This can be looked at as two A
fields that make up the A frame. Each sequence of four film frames is converted to five television frames. Film
frames A, B, C, and D become five television frames with the field pattern AA, BB, BC, CD, and DD. When the
REPEAT_FIELD flag is used to encode this pattern in an MPEG stream, the MPEG data payload contains only four
frames, but the field order of all five television frames is preserved.
The weave method produces 24 progressive frames per second and retains the full vertical resolution from an
interlaced source. If a video was created from film using a 3:2 pulldown, or if it contains no motion, weave is the
best affordable display process for a progressive monitor.
Mode Indicator and Anamorphic Format

Good driver design calls for a combination of bob, weave, and edge-adaptive filtering. Motion-detection circuitry
can dynamically control the degree to which each technique is used on a pixel-by-pixel basis. Unfortunately, display
logic to accomplish this is difficult to implement unless some additional data is made available. With the increasing
availability of large computer-grade displays, the operating system needs this additional data to assure that basic
graphics circuitry can reliably deliver an optimal picture.
In the world of DVD, it can be expected that many formats will be combined on one disk. There might be 24-frames
per second film edited to 525/60 video, and then edited to 30-frames per second film. Each time such an edit takes
place, it potentially changes how the data is best displayed. The current DVD specification does not ensure good
performance for handling this mix of formats in a display system.
Another possible scenario is the display of a film where the REPEAT_FIELD flag is irregular or absent because of an
error. When the encoder tries to interpret an irregular 3:2 pattern, it attempts to reacquire synchronization, but this
can be messy. It is entirely possible for a display that follows the REPEAT_FIELD flag to change from weave mode to
bob mode, or vice versa, literally in the middle of a shot. This would look bad to the viewer.
A good way to display a film that has an occasional 3:2 pattern irregularity is to switch from weave mode to bob
mode and back at the shot cuts that surround the irregularity. In other cases, it may be best to specifically identify
the first field pair of a new pattern and maintain weave mode.
One can imagine many content scenarios. Some automatic display schemes work better than others depending on
the particular content involved. It is likely that only some content providers will see a need to guarantee optimal
results on the widest range of playback platforms. DirectShow gives these content authors a method for judging
the impact of display mode on their titles and transmits their preferences.
VPE Functions in DirectDraw API and Driver

The video port extensions in the latest DirectX release are low-level extensions to the DirectDraw API. VPE allows
the client -- usually DirectShow -- to negotiate the connection between the MPEG or NTSC decoder and the
hardware video port. VPE also allows the client to control effects in the video stream, including cropping and
scaling.
VPE is not a high-level API designed for broad use by applications. Applications should use DirectShow, which
provides free support for VPE. The following figure shows a simple view of the VPE and kernel-mode architecture.
For more information, see Kernel-Mode Video Transport.

The preceding figure shows VPE in relation to other components of DirectDraw architecture. DirectShow uses VPE
to negotiate the connection, which provides information about how data and V-sync and H-sync information are
transferred. This information can be an APIC connection (ITU 656), external data lines with extra pins, or proprietary
data streams such as those implemented by Brooktree and Philips.
In the negotiation for the connection, the VGA hardware indicates what connections can be supported, and the
MPEG or NTSC decoder indicates its preferences. DirectShow negotiates the best connection between the two. The
connection is described as a globally unique identifier (GUID), with flags to describe other parameters, such as
double clocking and video active.
DirectX VPE Initialization

To enable VPE functionality, the driver must do the following:


When DrvGetDirectDrawInfo is called, initialize the following members of the DDCORECAPS structure
embedded in the DD_HALINFO structure to which the pHalInfo parameter points:
Set the DDCAPS2_VIDEOPORT flag in dwCaps2 to indicate that the display hardware contains a
hardware video port. The driver should also set any other hardware video-port-related DDCAPS2_Xxx
flags to describe the VPE support that the device is capable of.
Set dwMaxVideoPorts to the number of hardware video ports supported by the device.
Initialize dwCurrVideoPorts to zero.
Implement a DdGetDriverInfo function and set the GetDriverInfo member of the DD_HALINFO structure
to point to this function when DrvGetDirectDrawInfo is called. The driver's DdGetDriverInfo function
must parse the GUID_VideoPortCallbacks and GUID_VideoPortCaps GUIDs.
When DdGetDriverInfo is called with the GUID_VideoPortCallbacks GUID, fill in a
DD_VIDEOPORTCALLBACKS structure with the appropriate driver callbacks and flags set. These callbacks
are listed in VPE Callback Functions. The driver must then copy this initialized structure into the DirectDraw-
allocated buffer to which the lpvData member of the DD_GETDRIVERINFODATA structure points, and
return the number of bytes written into the buffer in dwActualSize.
When DdGetDriverInfo is called with the GUID_VideoPortCaps GUID, fill in the array of
DDVIDEOPORTCAPS structures with the capabilities of each hardware video port. Each hardware video port
has an entry in the array, with hardware video port zero specified first, hardware video port one specified
next, and so on. If the device supports only one hardware video port, there will only be one
DDVIDEOPORTCAPS structure in the array. The driver must then copy this data to the DirectDraw-allocated
buffer to which the lpvData member of the DD_GETDRIVERINFODATA structure points and return the
number of bytes written into the buffer in dwActualSize.
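A hedged sketch of the GUID_VideoPortCallbacks case follows; the GUID_VideoPortCaps case uses the same copy pattern, and which callbacks and DDHAL_VPORT32_Xxx flags a real driver sets depend on its hardware.

DWORD APIENTRY DdGetDriverInfo(PDD_GETDRIVERINFODATA lpData)
{
    if (memcmp(&lpData->guidInfo, &GUID_VideoPortCallbacks, sizeof(GUID)) == 0)
    {
        DD_VIDEOPORTCALLBACKS vpCallbacks;
        DWORD cbToCopy;

        memset(&vpCallbacks, 0, sizeof(vpCallbacks));
        vpCallbacks.dwSize          = sizeof(vpCallbacks);
        vpCallbacks.CreateVideoPort = DdVideoPortCreate;
        vpCallbacks.FlipVideoPort   = DdVideoPortFlip;
        vpCallbacks.dwFlags         = DDHAL_VPORT32_CREATEVIDEOPORT |
                                      DDHAL_VPORT32_FLIP;

        /* Never copy more than the size DirectDraw expects. */
        cbToCopy = sizeof(vpCallbacks);
        if (lpData->dwExpectedSize < cbToCopy)
        {
            cbToCopy = lpData->dwExpectedSize;
        }
        memcpy(lpData->lpvData, &vpCallbacks, cbToCopy);
        lpData->dwActualSize = cbToCopy;     /* bytes actually written */
        lpData->ddRVal       = DD_OK;
        return DDHAL_DRIVER_HANDLED;
    }

    /* GUID_VideoPortCaps, GUID_ColorControlCallbacks, and other GUIDs are
       handled with the same pattern. */
    return DDHAL_DRIVER_NOTHANDLED;
}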
VPE Callback Functions

The following table lists the video port extensions (VPE) callback functions that are implemented in a display driver.
A display driver that supports VPE must implement some of these callback functions; others are optional,
depending on the hardware capabilities.

VPE CALLBACK FUNCTION DESCRIPTION

DdVideoPortCanCreate Determines whether the driver can support a DirectDraw VPE object of the specified description.

DdVideoPortColorControl Gets or sets the VPE object color controls.

DdVideoPortCreate Notifies the driver that DirectDraw created a VPE object.

DdVideoPortDestroy Notifies the driver that DirectDraw destroyed the specified VPE object.

DdVideoPortFlip Performs a physical flip, causing the VPE object to start writing data to the new surface.

DdVideoPortGetBandwidth Reports the bandwidth limitations of the device's frame buffer memory based on the specified VPE object output format.

DdVideoPortGetConnectInfo Returns the connections supported by the specified VPE object.

DdVideoPortGetField Determines whether the current field of an interlaced signal is even or odd.

DdVideoPortGetFlipStatus Determines whether the most recently requested flip on a surface has occurred.

DdVideoPortGetInputFormats Determines the input formats that the DirectDraw VPE object can accept.

DdVideoPortGetLine Returns the current line number of the hardware video port.

DdVideoPortGetOutputFormats Determines the output formats that the VPE object supports.

DdVideoPortGetSignalStatus Retrieves the status of the video signal currently being presented to the hardware video port.

DdVideoPortUpdate Starts and stops the VPE object and modifies the VPE object data stream.

DdVideoPortWaitForSync Waits until the next vertical synch occurs.

DirectDraw Interfaces for Kernel-Mode Video Transport Support

The kernel-mode video transport must keep track of surface information for each surface it uses, and for each VPE
object. This information must be updated every time DdUpdateOverlay or DdVideoPortUpdate is called for the
surface or hardware video port. Before DirectDraw sends this information to the kernel-mode video transport, it
calls one of two driver functions: DdSyncSurfaceData or DdSyncVideoPortData. These functions allow the driver to
fill in or modify some of the structure information and to use four dwDriverReservedN members of the
DD_SYNCSURFACEDATA or three dwDriverReservedN members of the DD_SYNCVIDEOPORTDATA structure
for its own purposes. These driver functions are required for the kernel-mode video transport support to work
correctly.
A good example of how a driver can use these dwDriverReservedN members is to set a flag indicating which
physical overlay an overlay surface is using if the hardware supports more than one physical overlay.
Color Control Initialization

A driver's DdControlColor function adjusts the luminance and brightness controls of an overlay and/or primary
surface. To enable color control functionality, the Microsoft DirectDraw HAL must do the following at initialization
time:
If the overlay and/or primary surface contains color controls, set the DDCAPS2_COLORCONTROLOVERLAY
and/or DDCAPS2_COLORCONTROLPRIMARY flags in the dwCaps2 member of the DDCORECAPS structure
that is embedded in the DD_HALINFO structure.
The driver must specify a function in the DD_HALINFO structure that DirectDraw can call to get additional
information. This is described in DdGetDriverInfo.
The DdGetDriverInfo callback must be called with the GUID_ColorControlCallbacks GUID specified. The
driver must fill in a DD_COLORCONTROLCALLBACKS structure with the appropriate driver callbacks and
flags set, then copy this structure into the buffer to which the lpvData member of the input structure points.
AGP Support

Microsoft DirectDraw treats Accelerated Graphics Port (AGP) memory as a subclass of display memory. This
memory type is referred to as nonlocal display memory. The terms AGP memory and nonlocal display memory are
synonymous from the perspective of DirectDraw and DirectDraw drivers.
AGP memory is considered a pure subclass of display memory. That is, if a driver indicates it supports AGP
memory, in most cases it must have the same functional capabilities for local and nonlocal display memory,
although performance differences are permitted. The exception is if the DDCAPS2_NONLOCALVIDMEMCAPS flag
is set, in which case the blt capabilities for nonlocal display memory can differ from local display memory.
For example, if a driver states that it can texture from display memory, it must be able to texture from both local
and nonlocal display memory. Blitting is treated similarly. A driver that exports the source color key blt capability
must be able to do a source color keyed blt both to and from nonlocal display memory. The one exception to this
rule is that it is possible to preclude certain surface types from ever being allocated in nonlocal display memory.
For example, it is possible to use heaps to prevent overlay surfaces from ever being allocated in AGP memory.
Because AGP memory is treated as a subclass of display memory, DirectDraw has no separate set of display driver
entry points for AGP memory. The existing display driver calls are used for both AGP surfaces and local display
memory surfaces. An AGP-compatible driver must check incoming surfaces to see if they are in nonlocal or local
display memory, and take the appropriate action. Blts from system to AGP (and vice versa) go through the DirectDraw
emulation layer as normal, unless a driver supports system-to-display memory blts (in which case it must support
system-to-AGP transfers as well).
Drivers should set the DDCAPS2_TEXMANINNONLOCALVIDMEM flag as much as possible because the Direct3D
texture manager keeps its backing image of the video memory copy of a surface in AGP memory (rather than
system memory) when this is the case.
The remainder of this section discusses the steps necessary to modify your existing driver to support AGP memory
using DirectDraw nonlocal display memory features.
Flagging Support for Nonlocal Display Memory

A driver must inform DirectDraw (and DirectDraw applications) that it is AGP-compatible. This is accomplished by
specifying the capability bit DDCAPS2_NONLOCALVIDMEM in the dwCaps2 member of the DDCORECAPS
structure, which is part of the DD_HALINFO structure passed to DirectDraw.
If running on an operating system that does not support AGP services, DirectDraw turns off the
DDCAPS2_NONLOCALVIDMEM capability bit and all associated nonlocal heaps.
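For example, as part of filling in DD_HALINFO during DrvGetDirectDrawInfo (fragment only; the rest of the capability setup is assumed to exist elsewhere in the driver):

/* Advertise AGP (nonlocal display memory) support to DirectDraw. */
pHalInfo->ddCaps.dwCaps2 |= DDCAPS2_NONLOCALVIDMEM;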
Specifying Nonlocal Display Memory Heaps

A DirectDraw driver controls how much AGP memory is available and to which surfaces by returning heaps in the
DD_HALINFO structure that is passed back to DirectDraw. The driver identifies nonlocal heaps by specifying the
VIDMEM_ISNONLOCAL flag in the dwFlags member of the VIDEOMEMORY data structure that describes the
heap. Furthermore, a driver can choose to enable combining of memory writes on a nonlocal heap by specifying
the VIDMEM_ISWC flag in addition to VIDMEM_ISNONLOCAL.
It is the responsibility of an AGP-compatible DirectDraw driver to describe to DirectDraw the size (linear or
rectangular), attributes (write combining), and surface types the heap should not and cannot be used for. However,
it is not the driver's responsibility to actually reserve address space for the heap or commit memory to it. This is
handled by DirectDraw on the driver's behalf. DirectDraw hides the details of managing AGP memory from the
driver.
When specifying a nonlocal display memory heap, the start address specified by the driver has no meaning. The
start address, both graphic address remapping table (GART) linear and physical, of a nonlocal heap is determined
by the operating system when DirectDraw requests that a heap be created. Therefore, the driver can return any
value for the start address. For a rectangular heap, this start address is ignored by DirectDraw. The specified width
and height are all that DirectDraw needs to determine memory requirements. For a linear heap, the start address
has a meaning, but only to the extent that it is used to compute the size of the heap.
DirectDraw determines the size of a linear heap by (fpEnd - fpStart) + 1 (note that the specified end address is the
last byte in the heap, not the first byte after the end of the heap). As such, any start address can be specified as long
as when DirectDraw subtracts that address from the end address and adds 1, the result is the maximum size of the
heap.
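As an illustration, one linear, write-combined AGP heap might be described like this during the second DrvGetDirectDrawInfo pass. AGP_HEAP_SIZE and the array slot are assumptions made for the sketch.

#define AGP_HEAP_SIZE (32 * 1024 * 1024)            /* illustrative 32 MB reservation */

/* pvmList[1] is the second heap counted in *pdwNumHeaps. */
pvmList[1].dwFlags           = VIDMEM_ISLINEAR | VIDMEM_ISNONLOCAL | VIDMEM_ISWC;
pvmList[1].fpStart           = 0;                   /* ignored for nonlocal heaps; any value works */
pvmList[1].fpEnd             = AGP_HEAP_SIZE - 1;   /* size = (fpEnd - fpStart) + 1 */
pvmList[1].ddsCaps.dwCaps    = DDSCAPS_OVERLAY;     /* surfaces that may never live in this heap */
pvmList[1].ddsCapsAlt.dwCaps = DDSCAPS_OVERLAY;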
Although physical memory is only committed to the AGP heap when it is needed (that is, as surfaces are allocated),
it is important not to specify very large nonlocal heaps. Such heaps consume shared address space and other
important resources even before physical memory is committed.
It is also important to note that DirectDraw and the Windows operating system impose policy limits on the amount
of AGP memory that can be committed at any given time. This is necessary to prevent resource starvation for the
rest of the system. Therefore, it is quite possible for a request for a nonlocal display memory surface to fail even
though the nonlocal heaps are not fully committed.
When DirectDraw has determined the correct addresses (linear and physical) of the heap, it stores them in its heap
descriptors. DirectDraw also provides a mechanism to notify a driver at initialization time of these addresses. How
this is done is platform specific:
On Microsoft Windows 2000 and later, this is done with a DdGetDriverInfo call using the
GUID_UpdateNonLocalHeap GUID. When this GUID is passed to DdGetDriverInfo, the heap data is passed in the
DD_UPDATENONLOCALHEAPDATA data structure.
Notification of Actual Heap Base Addresses

A driver might need to know the linear and physical address of the base of the heap at DirectDraw initialization
time (for example, during mode changes) rather than waiting for a surface creation request and looking at the
heaps in the global DirectDraw surface object. To support this, DirectDraw calls the driver-supplied
DdGetDriverInfo callback function with a globally unique identifier (GUID) that identifies the information to be
returned by the driver. If the driver recognizes the GUID and has information to return, it copies this information
into the supplied data structure and passes it back to DirectDraw.
The driver uses two GUIDs to gather and offer further information regarding DirectDraw heaps:
GUID_GetHeapAlignment
GUID_UpdateNonLocalHeap
GUID_GetHeapAlignment signals to the driver to gather heap alignment information about any DirectDraw heaps
that are passed to it. The heap information is passed to the driver using the DD_GETHEAPALIGNMENTDATA
structure. GUID_GetHeapAlignment is defined as:

DEFINE_GUID( GUID_GetHeapAlignment,
0x42e02f16, 0x7b41, 0x11d2, 0x8b, 0xff, 0x0, 0xa0, 0xc9, 0x83, 0xea, 0xf6);

GUID_UpdateNonLocalHeap signals the driver to update its internal state with the heap information with the
nonlocal heap structures supplied by DirectDraw. This information is contained in the
DD_UPDATENONLOCALHEAPDATA structure. GUID_UpdateNonLocalHeap is defined as:

DEFINE_GUID( GUID_UpdateNonLocalHeap,
0x42e02f17, 0x7b41, 0x11d2, 0x8b, 0xff, 0x0, 0xa0, 0xc9, 0x83, 0xea, 0xf6);

If the driver must allocate memory for AGP surfaces by itself, but has exposed heaps to DirectDraw, then
HeapVidMemAllocAligned is exposed as an Eng function for this purpose. HeapVidMemAllocAligned only
deals with heap addresses so it returns an offset. The driver must do whatever memory mapping work it needs to
do to turn the information returned from HeapVidMemAllocAligned into a virtual address.
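A hedged sketch of such an allocation follows; ppdev->pvmAgpHeap is an assumed cached VIDEOMEMORY pointer for the heap the driver reported, and the alignment values and surface dimensions are arbitrary.

SURFACEALIGNMENT alignment;
LONG             lNewPitch;
FLATPTR          fpOffset;

memset(&alignment, 0, sizeof(alignment));
alignment.Linear.dwStartAlignment = 256;     /* start on a 256-byte boundary */
alignment.Linear.dwPitchAlignment = 256;     /* pitch must be a multiple of 256 */

fpOffset = HeapVidMemAllocAligned(ppdev->pvmAgpHeap,   /* hypothetical cached heap pointer */
                                  640 * 4,             /* width in bytes */
                                  480,                 /* height in scan lines */
                                  &alignment,
                                  &lNewPitch);
if (fpOffset != 0)
{
    /* fpOffset is a heap offset, not a virtual address; the driver must
       perform its own mapping before the CPU can touch the memory. */
}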
Callback Handling Of Nonlocal Display Memory Surfaces

Nonlocal display memory surfaces are treated in exactly the same way as local display memory surfaces in terms of
driver callbacks. For example, a driver's DdCanCreateSurface callback is called when attempting to create nonlocal
(as well as local) display memory surfaces, DdBlt is called when blitting between local and nonlocal display memory
surfaces, and DdDestroySurface is called when the surface memory is being discarded.
Because the same driver functions are used for both local and nonlocal display memory surfaces, drivers must
explicitly check the memory type of incoming surfaces. The memory type can be identified by checking the
ddsCaps.dwCaps member of the local surface object DD_SURFACE_LOCAL passed to the driver against the
capability bits DDSCAPS_LOCALVIDMEM and DDSCAPS_NONLOCALVIDMEM.
Applications and AGP hardware access the bits of a DirectDraw surface using two different addresses. Applications
use a virtual address that is translated through the operating system's page table to a portion of physical address
space. This physical address space is mapped by the GART hardware to appear contiguous. Hardware accesses this
physical linear address (again remapped to real, discontinuous pages of memory by the GART). The fpVidMem
member of the DD_SURFACE_GLOBAL structure holds the virtual linear address useful to applications (and
potentially some driver operations). The device-side physical address can be found from:

fpStartOffset = pSurface->fpHeapOffset - pSurface->lpVidMemHeap->fpStart;

This offset is then added to the device's GART physical base address (contained in the liPhysAGPBase member of
the VMEMHEAP structure).
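A hedged sketch of the per-surface check a callback such as DdBlt might perform; GetAgpOffset is a hypothetical helper, with the capability bits read from the DD_SURFACE_LOCAL object and the addresses from the DD_SURFACE_GLOBAL it references through lpGbl.

/* Returns TRUE and the heap-relative offset for a nonlocal (AGP) surface,
   or FALSE for a local display memory surface. */
BOOL GetAgpOffset(PDD_SURFACE_LOCAL pLcl, FLATPTR *pfpOffset)
{
    if (pLcl->ddsCaps.dwCaps & DDSCAPS_NONLOCALVIDMEM)
    {
        PDD_SURFACE_GLOBAL pGbl = pLcl->lpGbl;

        /* Heap-relative offset; the driver adds the heap's GART physical
           base address before handing the result to the hardware. */
        *pfpOffset = pGbl->fpHeapOffset - pGbl->lpVidMemHeap->fpStart;
        return TRUE;
    }

    /* DDSCAPS_LOCALVIDMEM: use pLcl->lpGbl->fpVidMem instead. */
    return FALSE;
}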
In all other respects, nonlocal display memory surfaces behave exactly like local display memory surfaces. The
driver receives lock requests when an application is trying to access the surface data of nonlocal display memory
surfaces. Operations such as blts between nonlocal display memory and local display memory can be
asynchronous, just as they can be between local display memory surfaces. Attempts to lock nonlocal display
memory surfaces when operations involving those surfaces are still pending should be failed by the driver with
DDERR_WASSTILLDRAWING error code in the usual way.
Furthermore, although DirectDraw manages the allocation and freeing of nonlocal display memory surfaces on
behalf of the driver, the driver is still notified of the creation and destruction of surfaces in nonlocal display
memory. When a nonlocal display memory surface is destroyed, the driver should not return until the surface is no
longer in use.
Nonlocal display memory is lost in exactly the same way as local display memory, that is, when a mode switch
occurs or when exclusive mode changes, all local and nonlocal display memory surfaces are lost and the
DdDestroySurface driver callback is invoked for each surface. However, DirectDraw does not guarantee that the
actual reserved address ranges and committed memory are preserved. DirectDraw may choose to discard all
committed memory and the reserved address ranges, or it may choose to decommit memory but preserve the
address range. It may also preserve both and simply mark the surfaces as lost. A driver should not make
assumptions based on any one of these scenarios.
Reordering Textures In Nonlocal Display Memory

There are special cases where the driver writer might want to reorder textures in AGP memory to allow more
efficient texture management. The DDCAPS2_SYSTONONLOCAL_AS_SYSTOLOCAL flag signals that the driver can
support blts from backing surfaces (system memory copy of a surface) to nonlocal video memory using all the
same caps that were specified for backing surface memory to local video memory blts.
The DDCAPS2_SYSTONONLOCAL_AS_SYSTOLOCAL flag is valid only if the DDCAPS2_NONLOCALVIDMEMCAPS
flag is set. If DDCAPS2_SYSTONONLOCAL_AS_SYSTOLOCAL is set, then the DDCAPS_CANBLTSYSMEM flag must
be set by the driver and all the associated backing surface blt caps must be correct.
DDCAPS2_SYSTONONLOCAL_AS_SYSTOLOCAL signifies that the backing surface to video memory DDCAPS blt
caps also apply to backing surface to nonlocal video memory blts. For example, the dwSVBCaps,
dwSVBCKeyCaps, dwSVBFXCaps, and dwSVBRops members of the DDCORECAPS structure are assumed to be
filled in correctly. Any blt from a backing surface to nonlocal memory that matches these caps bits is passed to the
driver.
Note This feature is intended to enable the driver itself to do efficient reordering of textures. This is not meant to
imply that hardware can write into AGP memory. Hardware writing directly into AGP memory is not currently
supported.
Handling DMA-style AGP

An AGP-compatible display card can use AGP memory in one of two ways: using the execute model or the direct
memory access (DMA) model.
In the execute model, if there is a texture in nonlocal video memory, the card accesses AGP memory directly.
That is, if a card is textured from AGP memory it reads texel data directly from a backing surface (system
memory copy of a surface).
In the DMA model, the contents of the surfaces must be explicitly moved to local display memory on the card
before a texturing operation can be performed.
It is important to note that the model refers to how a client of the display card sees the transfer. For example, a
display card may automatically move texel data from a backing surface to a small local display memory cache when
texturing. This may seem like the DMA model. However, because the client application has no information about
this transfer taking place, the display card is, in fact, exposing an execute model. Only when the client application
has to take explicit action to move the contents of a backing surface to local display memory is the display card
considered to be exposing the DMA model.
The previous sections that dealt with AGP memory described how a driver can enable and expose the execute
model of AGP usage. This section describes the additional steps a driver must take to expose use of DMA model
AGP to the application. Note that the driver writer must decide whether to expose the execute model or the DMA
model when writing the driver. The driver should expose one model or the other, but not both.
Before exposing the DMA model from a driver, it is important to consider the implications of the DMA model to the
application writer. If a driver exposes execute model AGP support, DirectDraw assumes that surfaces in AGP
(nonlocal display memory) and local display memory are functionally identical. Thus the display card can texture
either from nonlocal or local display memory without any additional actions by the application. When setting a
render state, an application can specify the handle to a texture surface directly, regardless of whether the surface is
in nonlocal or local display memory.
However, if a driver exposes the DMA model, surfaces in nonlocal display memory may have different capabilities
from those in local display memory. Therefore, before attempting to texture from a nonlocal display memory
surface, the application must check whether the hardware is capable of texture from nonlocal display memory. This
is accomplished by examining the capabilities exposed by the driver. The same is true for blitting.
An application explicitly requests AGP memory by specifying DDSCAPS_VIDEOMEMORY ORed with
DDSCAPS_NONLOCALVIDMEM. If an application does not specify a memory type or only specifies
DDSCAPS_VIDEOMEMORY, nonlocal display memory is not considered. Also, if the call does not specify local or
nonlocal display memory, the surface is a texture, and the device sets the
D3DDEVCAPS_TEXTURENONLOCALVIDEOMEMORY flag, then the surface can be allocated in AGP memory.
This means that if a driver exposes the DMA model, surfaces are not allocated from AGP memory. This is in contrast
to a driver that exposes execute model, in which AGP memory is allocated even if the application does not explicitly
request it. Drivers that expose the execute model are therefore much simpler for applications to use. Furthermore,
an execute model driver allows a legacy application to gain the benefits of AGP, whereas a DMA model driver only
accelerates new applications written explicitly for AGP. This should be considered when deciding whether to expose
the execute or DMA models.
Flagging Support for DMA Model Nonlocal Display Memory

In addition to specifying the DDCAPS2_NONLOCALVIDMEM flag to report AGP support, a DMA model driver must
export the capability flag DDCAPS2_NONLOCALVIDMEMCAPS. This flag indicates that nonlocal (AGP) memory has
different capabilities than local display memory.
Reporting DirectDraw Capabilities for DMA Model Nonlocal Display Memory

A DMA model driver has different capabilities for nonlocal display memory than for local display memory. For
example, a display card may be able to stretch blit local display memory surfaces but not nonlocal display memory
surfaces. If the driver specifies the DDCAPS2_NONLOCALVIDMEMCAPS flag, the driver is probed for the
DirectDraw capabilities of nonlocal display memory surface by the DdGetDriverInfo driver entry point. The GUID
that identifies this probe is GUID_NonLocalVidMemCaps.
It is important to note that for this release of DirectDraw, a driver can only specify the capabilities for blts from
nonlocal display memory to local display memory. Transfers from local display memory to nonlocal display
memory, and from nonlocal display memory to nonlocal display memory, are always emulated by the DirectDraw
HEL. This restriction may be relaxed in a future release.
Reporting Direct3D Capabilities for DMA Model Nonlocal Display Memory

A DMA model driver must also export the Direct3D capabilities for nonlocal display memory surfaces. This is
significantly simpler than reporting DirectDraw capabilities. The only capability affected is
D3DDEVCAPS_TEXTURENONLOCALVIDEOMEMORY. If a display card exporting the DMA model can texture directly
from nonlocal display memory, it should set this capability in its Direct3D device description. If it cannot, and the
application must explicitly load or blt the nonlocal display memory surface to a local display memory surface before
performing texturing, it should not set this capability. For completeness, an execute model driver should always set
this capability bit.
Managing AGP Heaps

This topic applies only to Windows NT-based operating systems.


A driver can manage AGP heaps using notifications that it receives from the DirectX runtime. The driver receives the
notifications from the runtime as GetDriverInfo2 requests that use the following values:
D3DGDI2_TYPE_DEFERRED_AGP_AWARE
D3DGDI2_TYPE_FREE_DEFERRED_AGP
D3DGDI2_TYPE_DEFER_AGP_FREES
For more information about the GetDriverInfo2 request, see Supporting GetDriverInfo2.
When the display device is created, the display driver receives a GetDriverInfo2 request with the
D3DGDI2_TYPE_DEFERRED_AGP_AWARE notification, which the driver uses to determine if it should disable its
other mechanisms that handle AGP heaps and instead use the D3DGDI2_TYPE_FREE_DEFERRED_AGP and
D3DGDI2_TYPE_DEFER_AGP_FREES notifications that the runtime subsequently sends. In the
D3DGDI2_TYPE_DEFERRED_AGP_AWARE notification, the DirectX runtime provides a pointer to a
DD_DEFERRED_AGP_AWARE_DATA structure in the lpvData member of the DD_GETDRIVERINFODATA data
structure.
The driver sometimes receives a GetDriverInfo2 request with the D3DGDI2_TYPE_DEFER_AGP_FREES notification
before a display mode change occurs. The DirectX runtime only sends this notification if the runtime performs the
display mode change. The driver should check the process identifier (PID) of the process destroying the surface
against the process that created the surface. If the PIDs are different, the driver should not destroy the user-mode
mappings of the AGP memory because an application might still be using the memory.
The driver receives a GetDriverInfo2 request with the D3DGDI2_TYPE_FREE_DEFERRED_AGP notification when all
display devices within the process stop using surfaces, textures, vertex buffers, and index buffers that were locked at
the time of the display mode change. The notification informs the driver that it can safely destroy all of the user-
mode mappings of the AGP memory.
In the D3DGDI2_TYPE_DEFER_AGP_FREES and D3DGDI2_TYPE_FREE_DEFERRED_AGP notifications, the runtime
provides a pointer to a DD_FREE_DEFERRED_AGP_DATA structure in the lpvData member of the
DD_GETDRIVERINFODATA data structure. The dwProcessId member of DD_FREE_DEFERRED_AGP_DATA specifies
the PID of the process that destroys the AGP memory.
Note that an application can terminate without the runtime sending the D3DGDI2_TYPE_FREE_DEFERRED_AGP
notification to the driver. Therefore, the driver should free all of the user-mode mappings of the AGP memory when
it receives a call to its D3dDestroyDDLocal function.
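A minimal sketch of dispatching these notifications follows. It assumes dwType has already been extracted from the GetDriverInfo2 request, lpData is the DD_GETDRIVERINFODATA pointer, and pDriverData together with the My* routines are hypothetical driver-defined helpers:

// Sketch: handling the deferred-AGP notifications in the GetDriverInfo2 dispatch.
switch (dwType)
{
case D3DGDI2_TYPE_DEFERRED_AGP_AWARE:
    // The runtime will send the two notifications below; disable any private
    // AGP heap clean-up scheme and rely on them instead.
    pDriverData->bDeferredAgpAware = TRUE;
    lpData->ddRVal = DD_OK;
    break;

case D3DGDI2_TYPE_DEFER_AGP_FREES:
    {
        DD_FREE_DEFERRED_AGP_DATA* pDefer = (DD_FREE_DEFERRED_AGP_DATA*)lpData->lpvData;
        // A mode change is coming: defer destroying user-mode AGP mappings that
        // belong to a process other than pDefer->dwProcessId.
        MyBeginDeferringAgpFrees(pDriverData, pDefer->dwProcessId);
        lpData->ddRVal = DD_OK;
    }
    break;

case D3DGDI2_TYPE_FREE_DEFERRED_AGP:
    {
        DD_FREE_DEFERRED_AGP_DATA* pFree = (DD_FREE_DEFERRED_AGP_DATA*)lpData->lpvData;
        // All devices in the process have stopped using the locked resources;
        // it is now safe to destroy the deferred user-mode mappings.
        MyFreeDeferredAgpMappings(pDriverData, pFree->dwProcessId);
        lpData->ddRVal = DD_OK;
    }
    break;
}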
Send comments about this topic to Microsoft
Kernel-Mode Video Transport
4/26/2017 • 1 min to read • Edit Online

This topic describes kernel-mode video transport as it exists on the Microsoft Windows 2000 and later operating
systems.
Kernel-mode video transport refers to a new Microsoft DirectDraw component in ring 0 (kernel mode) that
enhances video functionality. This component accesses the DxApi interface. This interface is added to the video
miniport driver under Windows 2000 and later operating systems.
Windows 2000 and Later
Kernel-mode video transport refers to a Microsoft DirectDraw component that a client, such as Microsoft
DirectShow, can use to enhance video functionality. A primary role of this functionality is to call the miniport driver
to tell it to perform hardware video port and overlay flips when V-sync occurs. This capability can support up to
ten buffers without encountering hardware limitations, as long as the hardware video port supports the V-sync
interrupt request (IRQ). This capability is used automatically by the versions of DirectDraw supplied with Microsoft
DirectX 5.0 and later versions when autoflipping is specified by the client and the hardware cannot autoflip.
The kernel-mode video transport also ensures enhanced capture support. Under Microsoft Windows 98/Me and
Microsoft Windows 2000 and later, the WDM-based video capture driver runs in kernel mode, with direct access to
the frame buffer. The capture driver can "manually" flip overlays. The Windows 2000 and later miniport video
transport driver can provide V-sync notification from the hardware video port or display; it can also get field
polarities, which can be useful when capturing vertical blanking interval (VBI) data.
Although the primary purpose of the kernel-mode driver is to enhance hardware video port autoflipping
capabilities, it also supports video bus masters, which can write data while in kernel mode. The bus master can be
notified before losing the surface because of a mode change, or because a full-screen Command Prompt instance
is launched. Because the new driver support allows a bus master to be called before the changes occur, the bus
master can shut off without causing a problem.
Send comments about this topic to Microsoft
VPE and Kernel-Mode Video Transport Architecture
4/26/2017 • 1 min to read • Edit Online

This section provides some details about the Windows 2000 and later architecture for the video port extensions
(VPE) and kernel-mode video transport in DirectX 5.0 and later versions. The architecture for kernel-mode video
transport is based on new functions that Microsoft added as device-independent code. Kernel-mode video
transport consists of a DxApi function that is supplied as part of DirectDraw, the video miniport driver, and the
COM interface methods supplied as part of DirectDraw.
Windows 2000 and Later
In Windows 2000 and later, as shown in the following figure, the DxApi callbacks are part of the video miniport
driver.

For more information about the DxApi callbacks, see DxApi Miniport Driver Functions For Windows 2000 and Later.
The preceding figure shows the kernel-mode video transport architecture in relation to other kernel-mode and
user-mode components (the dashed line denotes the kernel transition). In this architecture, DirectShow (or another
user-mode client) calls the IDirectDrawKernel and IDirectDrawSurfaceKernel DirectDraw COM interfaces to get
handles to the DirectDraw object and surface objects.
Note This architecture also supports using the PCI bus for data flow between the MPEG device and VGA device.
In Windows 2000 and later, the client then passes these handles to the miniport driver. These handles are specified
in the calls to the kernel-mode video transport. The following figure shows a simple version of how the handles are
passed in user- and kernel-mode video transport.

Send comments about this topic to Microsoft


Using Kernel-Mode Video Transport
4/26/2017 • 1 min to read • Edit Online

A video capture driver gains access to kernel-mode video transport functionality by linking with dxapi.lib, which
allows it to call into dxapi.sys at run time. This functionality is available only while DirectDraw is loaded.
A video capture driver (for a hardware decoder) uses the DxApi function supplied with kernel-mode DirectDraw to
access the DxApi interface callback functions. The DxApi function is a single entry point that accepts a function
identifier, an input buffer and size, and an output buffer and size. The behavior of this function and the size and
format of the input and output buffers depend on the specified function identifier. The DxApi function and its
function identifiers are defined in ddkmapi.h.
DirectShow or another client accesses the DxApi interface callback functions supplied by the video miniport driver
through DirectDraw. The DxApi interface callback functions are defined in dxmini.h.
To use the kernel-mode video transport interface, the video capture driver must first receive user-mode handles for
each DirectDraw object, surface, and VPE object that it needs to use. For the capture and MPEG models, these
handles are passed down using their existing APIs. If a driver requires this functionality but is not a stream-class
driver, a user-mode component can retrieve the handles using the IDirectDrawKernel and
IDirectDrawSurfaceKernel COM interfaces and pass them down to the driver. The COM interfaces and their
methods are identified in ddkernel.h.
Send comments about this topic to Microsoft
Getting the User-Mode Handles
4/26/2017 • 1 min to read • Edit Online

The following procedures show how to obtain the user-mode (ring 3) handles.
To get the DirectDraw handle for a DirectDraw object:
1. Call QueryInterface(lpDD, &IID_IDirectDrawKernel, &pNewInterface) on the DirectDraw interface.
2. Call the IDirectDrawKernel::GetKernelHandle method on the new interface.
The IDirectDrawKernel::GetKernelHandle method returns a kernel-mode handle for the DirectDraw object. To
release the handle, use the IDirectDrawKernel::ReleaseKernelHandle method.
A user-mode component can also call the IDirectDrawKernel::GetCaps method to retrieve the kernel-mode
capabilities of the DirectDraw object.
Code Sample

ddRVal = IDirectDraw_QueryInterface( lpDD, &IID_IDirectDrawKernel, &pDDK );
if( ( ddRVal == DD_OK ) && ( pDDK != NULL ) )
{
    dwDirectDrawHandle = 0;
    // GetKernelHandle returns the kernel-mode handle through the second parameter.
    IDirectDrawKernel_GetKernelHandle( pDDK, &dwDirectDrawHandle );
    if( dwDirectDrawHandle == 0 )
    {
        // error
    }
}

To get the DirectDrawSurface handle:


1. Call QueryInterface(lpSurface, &IID_IDirectDrawSurfaceKernel, &pDDSK) on the DirectDrawSurface
interface.
2. Call the IDirectDrawSurfaceKernel::GetKernelHandle method on the new interface.
The IDirectDrawSurfaceKernel::GetKernelHandle method returns a kernel-mode handle for the
DirectDrawSurface object. To release the handle, use the IDirectDrawSurfaceKernel::ReleaseKernelHandle
method.
Code Sample

ddRVal = IDirectDraw_QueryInterface( lpSurface,
                                     &IID_IDirectDrawSurfaceKernel, &pDDSK );
if( ( ddRVal == DD_OK ) && ( pDDSK != NULL ) )
{
    dwSurfaceHandle = 0;
    // GetKernelHandle returns the kernel-mode handle through the second parameter.
    IDirectDrawSurfaceKernel_GetKernelHandle( pDDSK, &dwSurfaceHandle );
    if( dwSurfaceHandle == 0 )
    {
        // error
    }
}

Send comments about this topic to Microsoft


Using the DxApi Interface
4/26/2017 • 1 min to read • Edit Online

As described in Using Kernel-Mode Video Transport, a video capture driver (hardware decoder) must call the
DxApi function to access the DxApi interface. As described in VPE and Kernel-Mode Video Transport Architecture, a
video miniport driver implements the DxApi interface on Windows 2000 and later platforms. The following section
describes how the DxApi interface is supported on these platforms:
DxApi Miniport Driver Functions For Windows 2000 and Later
Send comments about this topic to Microsoft
DxApi Miniport Driver Functions For Windows 2000
and Later
4/26/2017 • 1 min to read • Edit Online

The DxApi interface can be supported in a video miniport driver only on Windows 2000 and later.
DxApi interface support is useful for the following operations:
Autoflipping using an IRQ for devices that do not support hardware autoflipping or that have limitations
that make it undependable. This allows DirectDraw to always revert to software autoflipping when hardware
autoflipping is unavailable.
Field skipping using an IRQ to support MPEG drivers that can undo the 3:2 pulldown of MPEG data
originally sampled from film.
Bus mastering, so devices can continuously transfer data without having to call DdLock / DdUnlock for every
frame. This is especially useful because the drivers for these devices are WDM drivers.
Capturing video and VBI. In the miniport driver, it is easy to capture video that is based on a hardware video
port IRQ or graphics IRQ.
Send comments about this topic to Microsoft
DxApi Miniport Driver Initialization
4/26/2017 • 1 min to read • Edit Online

To enable DxApi interface functionality, the DirectDraw driver must perform the following tasks at initialization
time:
1. The driver must specify a DdGetDriverInfo function in the DD_HALINFO structure that DirectDraw can call
to get additional information.
2. The DdGetDriverInfo callback is called with the GUID_KernelCallbacks GUID specified. The driver must fill in a
DD_KERNELCALLBACKS structure with the appropriate callbacks and flags set. The driver then copies this
structure to the lpvData member of the DD_GETDRIVERINFODATA structure.
3. The DdGetDriverInfo callback is called with the GUID_KernelCaps GUID specified. The driver must fill in a
DDKERNELCAPS structure. The driver then copies this structure to the lpvData member of the
DD_GETDRIVERINFODATA structure. (A sketch of steps 2 and 3 appears at the end of this topic.)
4. The DirectDraw runtime calls the video port driver IOCTL handler with MajorFunction = IRP_MJ_PNP,
MinorFunction = IRP_MN_QUERY_INTERFACE, and InterfaceType = GUID_DxApi. The video port driver
then calls the video miniport driver's HwVidQueryInterface function to fill in the DXAPI_INTERFACE
structure with pointers to the DxApi interface callback functions that DirectDraw can call. These callback
functions are listed in Kernel-Mode Video Transport Callback Functions.
The video miniport driver can specify a value in the Context member of the DXAPI_INTERFACE structure that is
passed to the video miniport driver each time one of these functions is called.
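The sketch below shows roughly how steps 2 and 3 might look inside the driver's DdGetDriverInfo routine. The capability and IRQ values are placeholders, and MyDxSyncSurfaceData and MyDxSyncVideoPortData stand for the driver's own implementations of the kernel callbacks:

// Sketch: answering the GUID_KernelCallbacks and GUID_KernelCaps queries.
if (IsEqualGUID(&lpData->guidInfo, &GUID_KernelCallbacks))
{
    DD_KERNELCALLBACKS kcb;

    memset(&kcb, 0, sizeof(kcb));
    kcb.dwSize  = sizeof(kcb);
    kcb.dwFlags = DDHAL_KERNEL_SYNCSURFACEDATA | DDHAL_KERNEL_SYNCVIDEOPORTDATA;
    kcb.SyncSurfaceData   = MyDxSyncSurfaceData;
    kcb.SyncVideoPortData = MyDxSyncVideoPortData;

    lpData->dwActualSize = sizeof(kcb);
    memcpy(lpData->lpvData, &kcb, min(lpData->dwExpectedSize, (DWORD)sizeof(kcb)));
    lpData->ddRVal = DD_OK;
}
else if (IsEqualGUID(&lpData->guidInfo, &GUID_KernelCaps))
{
    DDKERNELCAPS kcaps;

    memset(&kcaps, 0, sizeof(kcaps));
    kcaps.dwSize    = sizeof(kcaps);
    kcaps.dwCaps    = DDKERNELCAPS_AUTOFLIP | DDKERNELCAPS_FLIPVIDEOPORT;  // placeholders
    kcaps.dwIRQCaps = DDIRQ_DISPLAY_VSYNC | DDIRQ_VPORT0_VSYNC;            // placeholders

    lpData->dwActualSize = sizeof(kcaps);
    memcpy(lpData->lpvData, &kcaps, min(lpData->dwExpectedSize, (DWORD)sizeof(kcaps)));
    lpData->ddRVal = DD_OK;
}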
Send comments about this topic to Microsoft
Kernel-Mode Video Transport Callback Functions
4/26/2017 • 1 min to read • Edit Online

The following table lists the kernel-mode video transport callback functions that are implemented in a video
miniport driver.

DXAPI CALLBACK FUNCTION - DESCRIPTION

DxBobNextField - Bobs the next field of interleaved data.
DxEnableIRQ - Indicates to the miniport driver which IRQs should be enabled or disabled.
DxFlipOverlay - Flips the overlay.
DxFlipVideoPort - Flips the video port extensions (VPE) object.
DxGetCurrentAutoflip - Determines which surface is receiving the current field of video data for capture purposes.
DxGetIRQInfo - Indicates that the driver manages the interrupt request.
DxGetPolarity - Returns the polarity (even or odd) of the current field being written by the VPE object.
DxGetPreviousAutoflip - Determines which surface received the previous field of video data for capture purposes.
DxGetTransferStatus - Determines which hardware bus master completed.
DxLock - Locks the frame buffer so that it can be accessed.
DxSetState - Switches from bob mode to weave mode, and vice versa.
DxSkipNextField - Skips or reenables the next field.
DxTransfer - Bus masters data from a surface to the buffer specified in the memory descriptor list (MDL).

Send comments about this topic to Microsoft


Notify Callback Functions in a Video Capture Driver
4/26/2017 • 1 min to read • Edit Online

The video capture driver supplies notify callback functions to the DirectDraw runtime when the video capture driver
calls the runtime's DxApi function for certain operations. For example, the video capture driver supplies a
NotifyCallback function when the driver calls DxApi with the DD_DXAPI_OPENVIDEOPORT function identifier to
open a video port. After the video port closes, the DirectDraw runtime is notified and calls NotifyCallback. The video
capture driver can then perform necessary operations that are related to the video port closing.
The video capture driver supplies a NotifyCallback function to the DirectDraw runtime when the video capture
driver calls the DxApi function and specifies any one of the following function identifiers:
DD_DXAPI_OPENDIRECTDRAW
DD_DXAPI_OPENSURFACE
DD_DXAPI_OPENVIDEOPORT
DD_DXAPI_REGISTER_CALLBACK
DD_DXAPI_OPENVPCAPTUREDEVICE
Thereafter, when an event that is associated with the function identifier occurs, the DirectDraw runtime calls the
NotifyCallback function. The video capture driver's NotifyCallback is implemented to perform operations related to
the event.
Send comments about this topic to Microsoft
Video VBI Capture
4/26/2017 • 1 min to read • Edit Online

DirectX 5.2 introduced two DirectDraw driver functions for video vertical blanking interval (VBI) capture. These
functions are DxTransfer and DxGetTransferStatus.
The DxTransfer function facilitates video and VBI capture. Because this function is called at IRQ time, it must return
as quickly as possible. If the display hardware is not ready to do a bus master at the time DxTransfer is called, then
the video miniport driver should keep an internal queue of a number of bus masters (the actual number of bus
masters saved in the queue is up to the driver developer). This allows the hardware to perform the bus master
when the hardware is ready. In other words, the driver should not poll and wait for the bus master to complete.
When DirectDraw calls the DxTransfer function, it supplies a transfer ID in the dwTransferID member of the
DDTRANSFERININFO structure. The video miniport driver can then use this identification when the
DxGetTransferStatus function is called.
When a bus master completes, the display hardware must generate an IRQ. The video miniport driver must then
call the IRQCallback function that was specified in DxEnableIRQ. In this IRQCallback call, the video miniport driver
specifies the DDIRQ_BUSMASTER flag. DirectDraw then calls the DxGetTransferStatus function to determine which
bus master completed. The video miniport driver must return the transfer ID (dwTransferID) that DirectDraw
passed to the driver in an earlier DxTransfer call. In this way, if the driver has five bus masters in the queue,
DirectDraw can determine which one completed most recently.
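One way to structure the queue described above is sketched here; all of the names are hypothetical driver scaffolding except dwTransferID, which DirectDraw supplies in DDTRANSFERININFO, and the DDIRQ_BUSMASTER notification delivered through the IRQCallback from DxEnableIRQ:

#define MAX_PENDING_TRANSFERS 8   // how many queued bus masters to track (arbitrary)

typedef struct _TRANSFER_QUEUE {
    unsigned long Ids[MAX_PENDING_TRANSFERS];
    unsigned long Head;             // oldest outstanding transfer
    unsigned long Tail;             // next free slot
    unsigned long LastCompletedId;  // value to report from DxGetTransferStatus
} TRANSFER_QUEUE;

/* DxTransfer path (called at IRQ time): remember the transfer ID that
   DirectDraw passed in DDTRANSFERININFO and return without polling. */
void QueueBusMaster(TRANSFER_QUEUE* q, unsigned long dwTransferID)
{
    q->Ids[q->Tail % MAX_PENDING_TRANSFERS] = dwTransferID;
    q->Tail++;
}

/* Bus-master-complete interrupt: record which transfer just finished.
   The driver would then raise DDIRQ_BUSMASTER through the IRQ callback
   registered via DxEnableIRQ (not shown in this sketch). */
void CompleteBusMaster(TRANSFER_QUEUE* q)
{
    q->LastCompletedId = q->Ids[q->Head % MAX_PENDING_TRANSFERS];
    q->Head++;
}

/* DxGetTransferStatus path: report the transfer ID of the bus master
   that completed most recently. */
unsigned long GetCompletedTransferId(const TRANSFER_QUEUE* q)
{
    return q->LastCompletedId;
}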
Send comments about this topic to Microsoft
Extended Surface Alignment
4/26/2017 • 1 min to read • Edit Online

Microsoft DirectDraw supports surface alignment requirements on a per-heap basis. This support was introduced
in Microsoft DirectX 5.0. The driver can specify X and Y alignments for rectangular heaps, and pitch and start-offset
alignments for linear heaps. These alignments can vary for different surface types.
Some display hardware cannot set its start-of-display offset in an atomic operation. At the beginning of a display
period, it is possible for such hardware to latch a new start-of-display offset when the driver is only halfway
through setting the value. DirectDraw now allows the driver to specify alignment requirements for visible back
buffers. Some hardware may be able to express alignment requirements for potentially visible back buffers that
force the start-of-display offset to be a value that requires only one register write. This technique can help avoid the
occasional flicker that would otherwise be visible when the primary surface is flipped at a high frequency.
Send comments about this topic to Microsoft
Review of the Older Alignment Method
4/26/2017 • 1 min to read • Edit Online

Versions of DirectDraw before DirectX 5.0 allowed the driver to express pitch alignment requirements for linear
heaps. For the purposes of this discussion, use of these alignment requirements by DirectDraw can be seen in three
steps:
1. Create the surface and fill in an aligned lPitch member based on the driver's global alignment requirements
(as returned in the VIDEOMEMORYINFO structure) and the surface's ddsCaps member. This pitch is
increased until it is a multiple of the appropriate alignment requirement.
2. Call the driver's DdCreateSurface callback, if defined. The driver can modify the lPitch value, but this change
will be ignored by Microsoft Windows 2000 and later.
3. If the driver call is not handled, or if it requests allocation, allocate display memory for the surface from one
of the driver's heaps. The width of the allocated surface is taken to be the aligned pitch determined in step 1,
unless modified by the driver in step 2.
If a driver implemented the DdCreateSurface callback, it could be assured that any incoming surface would have its
lPitch member set to an aligned value. For backward-compatibility, this behavior still exists. Step three maintains
exactly the same behavior, unless the driver has exposed a GetHeapAlignment entry point (see the
DD_GETHEAPALIGNMENTDATA structure). If, and only if, this entry point is defined, the previously calculated
lPitch alignment is discarded, and all surface alignment conforms to the requirements reported using
GUID_GetHeapAlignment. Drivers can keep their VIDEOMEMORYINFO structure alignment requirements as they
are, and expect the same alignment behavior when run on older DirectDraw runtimes. This alignment behavior has
been completely replaced for DirectX 5.0 and later versions of the DirectDraw runtime. It should be noted that
exposing GetHeapAlignment turns off this legacy alignment procedure for all heaps, not just those for which
GUID_GetHeapAlignment reports alignment requirements.
Send comments about this topic to Microsoft
Using Extended Surface Alignment
4/26/2017 • 1 min to read • Edit Online

To enable the extended surface alignment functionality, the DirectDraw driver must perform the following tasks at
initialization time:
The driver must specify a DdGetDriverInfo function in the DD_HALINFO structure that DirectDraw can call
to get additional information.
The DdGetDriverInfo callback is called with the GUID_GetHeapAlignment GUID specified. The driver must fill
in a DD_GETHEAPALIGNMENTDATA structure, then copy this structure to the lpvData member of the
DD_GETDRIVERINFODATA structure.
The driver should fill in the DDSCAPS structure pointed to in the HEAPALIGNMENT structure with the logical OR
of the DDSCAPS_xxxx flags for any type of surface that requires alignment in this heap. If a bit in DDSCAPS is set,
then DirectDraw abides by the alignment restrictions expressed in the appropriate SURFACEALIGNMENT structure
member. The DDSCAPS_FLIP bit and the FlipTarget member apply to surfaces that are back buffers in the primary
flipping chain, that is, a potentially primary (visible) surface. The following list shows the currently allowed set of
surface capabilities for which alignment can be specified:
DDSCAPS_OFFSCREENPLAIN
DDSCAPS_EXECUTEBUFFER
DDSCAPS_OVERLAY
DDSCAPS_TEXTURE
DDSCAPS_ZBUFFER
DDSCAPS_ALPHA
DDSCAPS_FLIP
Note DirectDraw compares a new surface's capabilities against the entries in the HEAPALIGNMENT structure in
the order in which they are specified. For example, a surface with DDSCAPS_MIPMAP | DDSCAPS_TEXTURE |
DDSCAPS_FLIP set is aligned according to the Texture member of the HEAPALIGNMENT structure, because this is
the first applicable capabilities bit for which an alignment is specified (that is, Texture appears before FlipTarget in
the HEAPALIGNMENT structure). The FlipTarget member is not considered in this example. Because back buffers in
a primary flipping chain are marked with DDSCAPS_FLIP and no other bit for which an alignment can be specified,
such surfaces are aligned according to the FlipTarget member. Surfaces that could potentially become members of
a primary flipping chain (those with the same pixel format and size as the primary surface) are also aligned
according to the FlipTarget member.
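A rough sketch of describing alignment for one heap follows. It assumes the Linear arm of the SURFACEALIGNMENT union, that ddsCaps is the DDSCAPS member of HEAPALIGNMENT, and that heapAlign is the HEAPALIGNMENT entry the driver returns for that heap through DD_GETHEAPALIGNMENTDATA; the alignment values are placeholders:

// Sketch: require aligned start offsets for potentially visible back buffers.
HEAPALIGNMENT heapAlign;

memset(&heapAlign, 0, sizeof(heapAlign));

// Only surfaces that can join the primary flipping chain need alignment here.
heapAlign.ddsCaps.dwCaps = DDSCAPS_FLIP;

// Linear heap: force flip targets onto a boundary that the display hardware
// can latch with a single register write (placeholder values).
heapAlign.FlipTarget.Linear.dwStartAlignment = 4096;
heapAlign.FlipTarget.Linear.dwPitchAlignment = 32;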
Send comments about this topic to Microsoft
Extended Surface Capabilities
4/26/2017 • 1 min to read • Edit Online

Beginning with Microsoft DirectX 6.0, Microsoft DirectDraw contains surface capabilities beyond those found in
previous versions. These extended capabilities require the addition of several new structures, specifically the
DDSCAPS2 and DD_MORESURFACECAPS structures. The DDSCAPS2 structure contains the dwCaps member
originally found in the DDSCAPS structure, but also contains three new members: dwCaps2, dwCaps3, and
dwCaps4. Only dwCaps2 is used in DirectDraw for DirectX 6.0. The last three members of the DDSCAPS2
structure are also identically arranged in the DDSCAPSEX structure.
Send comments about this topic to Microsoft
Extended Surface Capability Flags
4/26/2017 • 1 min to read • Edit Online

The extended surface capabilities added to the latest versions of DirectDraw are made visible to the driver when the
application sets the appropriate flags in the dwCaps2 member of the DDSCAPS2 structure.
Applications can only set the DDSCAPS2_HARDWAREDEINTERLACE flag in conjunction with the
DDSCAPS_OVERLAY flag. If a driver sees this flag set at CreateSurface time, it means that DirectDraw expects that
the driver will do whatever is necessary to match the hardware video port frame rate with the device frame rate.
The DDSCAPS2_HINTDYNAMIC, DDSCAPS2_HINTSTATIC, and DDSCAPS2_OPAQUE flags are hints set by the
application at CreateSurface time that inform the driver what the application plans to do with the surface. The
DDSCAPS2_HINTDYNAMIC flag means that the application will update the surface frequently. The
DDSCAPS2_HINTSTATIC flag means that the application will update the surface rarely, but still requires access. This
means the driver must be able to allow locks on the surface, which may involve some hidden decompression and
compression steps. The DDSCAPS2_OPAQUE flag means that the application will never lock, blt, or update the
surface for the rest of that surface's lifetime. The driver is free to compress or reorder the surface without having to
ever decompress it.
Note The driver does not need to set these flags to enable them. DirectDraw merely passes these bits to the driver
when DdCreateSurface is called.
Driver writers might want to use the extended heap restriction features (described in Extended Heap Restrictions) of
DirectDraw to automatically place DDSCAPS2_OPAQUE textures in optimized heaps. This is entirely up to the driver
developer.
The DDSCAPS2_HINTDYNAMIC, DDSCAPS2_HINTSTATIC, and DDSCAPS2_OPAQUE flags are described in more
detail in the Microsoft DirectX Driver Development Kit (DDK) documentation.
The DDSCAPS2_TEXTUREMANAGE flag is not relevant to drivers. This flag informs the DirectX runtime that it is
responsible for moving the surface from a backing surface to display memory, as appropriate, to enable accelerated
3D texturing.
Send comments about this topic to Microsoft
Exposing the Extended Surface Capabilities
4/26/2017 • 1 min to read • Edit Online

The DDCORECAPS structure contains a DDSCAPS member that drivers fill in to indicate what types of surfaces they
support. When these caps are reported to the application, a slightly different structure, DDCAPS, is returned. This
DDCAPS structure is built from the driver's DDCORECAPS and other structures that are queried using the
DdGetDriverInfo interface. For the latest version of DirectX, the application-visible DDCAPS contains a DDSCAPS2
member. This DDSCAPS2 member is constructed from the DDSCAPS member in the DDCORECAPS structure, and
the ddsCapsMore member of the DD_MORESURFACECAPS structure.
The DD_MORESURFACECAPS structure is queried from the driver at driver initialization time using the
DdGetDriverInfo call. The appropriate GUID, as defined in ddrawint.h, is GUID_DDMoreSurfaceCaps.
Responding to the GUID_DDMoreSurfaceCaps query is entirely optional. It is intended to allow drivers to do two
distinct things:
Expose extended surface capabilities that the driver can create in display memory.
Express to DirectDraw new heap restrictions for these extended surface capabilities.
The first item has been covered in the previous section and is fairly self-explanatory. The second item is more
complex, and readers should be familiar with the significance of the ddsCaps and ddsCapsAlt members of the
VIDEOMEMORY structure, described in Memory Heap Allocation, before reading the next section.
Send comments about this topic to Microsoft
Extended Heap Restrictions
4/26/2017 • 1 min to read • Edit Online

The DD_MORESURFACECAPS structure is of variable size. It always has a ddsCapsMore member, but it may have
zero or more ddsExtendedHeapRestrictions entries. If the driver responds to the GUID_DDMoreSurfaceCaps
query, it should return a DD_MORESURFACECAPS structure that contains as many
ddsExtendedHeapRestrictions entries as it returned display memory heaps in the DD_HALINFO structure
(DirectDraw guarantees that the GUID_DDMoreSurfaceCaps query is made after the driver reports DD_HALINFO.)
The driver should also fill in an appropriate dwSize value in the DD_MORESURFACECAPS structure. The value of
dwSize is calculated in this way:

DDMORESURFACECAPS.dwSize =
(DWORD) (sizeof(DDMORESURFACECAPS)
+ (((signed int)DDHALINFO.vmiData.dwNumHeaps) - 1)
* sizeof(DDSCAPSEX)*2 );

Note that subtracting 1 from the value of dwNumHeaps is necessary to account for the fact that the
DD_MORESURFACECAPS structure has a ddsExtendedHeapRestrictions member that is a one-element array.
Only those array elements after the first (that is, from ddsExtendedHeapRestrictions[1] on) should be counted in
calculating the total size of the DD_MORESURFACECAPS structure.
The ddsCapsEx and ddsCapsExAlt members are exactly analogous to the ddsCaps and ddsCapsAlt members of
the array of VIDEOMEMORY structures returned in the pvmList member of the VIDEOMEMORYINFO structure,
which is contained as a member of the DD_HALINFO structure. Any bit set in ddsCapsEx means that a surface with
that bit set must not be placed in that heap; the same is true of any bit set in the ddsCapsExAlt member.
When allocating surfaces, DirectDraw first passes through all heaps, and if it finds
any heap for which no capability bits in the ddsCaps member of the VIDEOMEMORY structure match with the
DDSCAPS bits of the surface, it allocates the surface in that heap. If this pass finds no such heaps, then DirectDraw
makes the same pass but checks the ddsCapsEx field. If this pass fails to find any heaps, then the surface cannot be
created in any heap.
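As an illustration only, the reply might be assembled along these lines; moreCaps is assumed to point at a buffer sized with the formula above, myHalInfo stands for the driver's DD_HALINFO, and the choice of flags and of heap 0 is arbitrary:

// Sketch: fill in DD_MORESURFACECAPS with one restriction entry per heap.
DWORD dwNumHeaps = myHalInfo.vmiData.dwNumHeaps;   // same count reported in DD_HALINFO
DWORD i;

moreCaps->dwSize = (DWORD)(sizeof(DD_MORESURFACECAPS)
    + (((signed int)dwNumHeaps) - 1) * sizeof(DDSCAPSEX) * 2);

// Extended surface capabilities the driver can create in display memory.
moreCaps->ddsCapsMore.dwCaps2 = DDSCAPS2_OPAQUE | DDSCAPS2_HINTSTATIC;

// Per-heap restrictions: as an example, keep opaque textures out of heap 0.
for (i = 0; i < dwNumHeaps; i++)
{
    moreCaps->ddsExtendedHeapRestrictions[i].ddsCapsEx.dwCaps2    = 0;
    moreCaps->ddsExtendedHeapRestrictions[i].ddsCapsExAlt.dwCaps2 = 0;
}
moreCaps->ddsExtendedHeapRestrictions[0].ddsCapsEx.dwCaps2 = DDSCAPS2_OPAQUE;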
Send comments about this topic to Microsoft
Extended Surface Description Structure
4/26/2017 • 1 min to read • Edit Online

The extended DirectDraw surface description structure, DDSURFACEDESC2, is identical to the DDSURFACEDESC
structure, except that the pointer to the DDSCAPS structure at the end of the structure has been replaced with a
pointer to a DDSCAPS2 structure.
The data blocks for the DdCreateSurface and DdCanCreateSurface driver calls each contain a pointer to a
DDSURFACEDESC structure. Beginning with DirectX 6.0, these pointers might actually point to a DDSURFACEDESC2
structure, even though the pointers remain typed as LPDDSURFACEDESC. If a driver chooses, it can examine the
dwSize member of the DDSURFACEDESC pointer, and thereby decide if the pointer actually points to a
DDSURFACEDESC2 structure. If your driver must run on pre-DirectX 6.0 installations, it must make this check.
If the size returned is sizeof(DDSURFACEDESC2), the driver can then examine the dwCaps2, dwCaps3, and
dwCaps4 members of the DDSCAPS2 structure.
Send comments about this topic to Microsoft
Compressed Texture Surfaces
4/26/2017 • 1 min to read • Edit Online

A surface can contain a bitmap to be used for texturing 3D objects. To reduce the amount of memory consumed by
textures, Microsoft DirectDraw supports the compression of texture surfaces.
Note No new callbacks have been added to support compressed texture surfaces. DirectDraw passes information
about compressed texture surfaces to the driver through the existing driver callbacks.

FOURCC - DESCRIPTION - ALPHA PREMULTIPLIED?

DXT1 - Opaque / one-bit alpha - N/A
DXT2 - Explicit alpha - Yes
DXT3 - Explicit alpha - No
DXT4 - Interpolated alpha - Yes
DXT5 - Interpolated alpha - No

The preceding table shows the five types of compressed textures that drivers should support.
For more information about the format of compressed textures, see Compressed Texture Formats in the
DirectDraw SDK documentation.
Send comments about this topic to Microsoft
Enumerating DXT Formats
4/26/2017 • 1 min to read • Edit Online

In Microsoft DirectX, there are two ways for your driver to enumerate pixel formats. The first method enumerates
formats that can be used for textures. This method is implemented using the lpTextureFormats member of the
D3DHAL_GLOBALDRIVERDATA structure. The second method enumerates formats that can be used for either
DDSCAPS_OVERLAY surfaces or DDSCAPS_OFFSCREENPLAIN surfaces. The second method uses the
dwNumFourCCCodes member of the DDCORECAPS structure included in the DD_HALINFO structure and the
lpdwFourCC array that is also included in the DD_HALINFO structure.
Because DXT formats are primarily intended to be used as textures, your driver enumerates DXT formats only
through the first method. There is no need to add DXT formats to the lpdwFourCC array.
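A sketch of adding a DXT1 entry to that texture format list follows; myTextureFormats, numFormats, and myGlobalDriverData are hypothetical driver-owned variables:

// Sketch: append DXT1 to the texture formats exposed through
// D3DHAL_GLOBALDRIVERDATA (repeat for DXT2 through DXT5 as appropriate).
DDSURFACEDESC* pFmt = &myTextureFormats[numFormats++];

memset(pFmt, 0, sizeof(*pFmt));
pFmt->dwSize  = sizeof(*pFmt);
pFmt->dwFlags = DDSD_PIXELFORMAT;
pFmt->ddpfPixelFormat.dwSize   = sizeof(DDPIXELFORMAT);
pFmt->ddpfPixelFormat.dwFlags  = DDPF_FOURCC;
pFmt->ddpfPixelFormat.dwFourCC = MAKEFOURCC('D','X','T','1');

myGlobalDriverData.dwNumTextureFormats = numFormats;
myGlobalDriverData.lpTextureFormats    = myTextureFormats;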
Send comments about this topic to Microsoft
Creating the Compressed Texture Surface
4/26/2017 • 1 min to read • Edit Online

Whenever DirectDraw requests the driver to create a surface, the driver must determine whether it is being asked
to create a compressed texture surface. To determine this, the driver must check for information that has previously
been set by DirectDraw in the DDSURFACEDESC2 structure for the surface being created. Your driver must include
the following verification steps (as with any surface); a sketch of these checks appears after the list:
Check for the DDSCAPS_TEXTURE flag in the dwCaps member of the DDSCAPS structure.
Check for the DDPF_FOURCC flag in the dwFlags member of the DDPIXELFORMAT structure for the
surface being created. This check should occur before the following dwFourCC check.
Check for one of the DXT codes in the dwFourCC member of the DDPIXELFORMAT structure for the
surface being created.
Check the width and height members (dwWidth and dwHeight) of the DDSURFACEDESC2 structure.
DirectDraw sets these members to multiples of 4 pixels.
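A sketch of these checks, written as a helper that inspects the DDSURFACEDESC2 for the surface being created, might look like this (IsCompressedTextureSurface is a hypothetical name):

// Sketch: returns TRUE only for a DXT1..DXT5 compressed texture surface.
BOOL IsCompressedTextureSurface(const DDSURFACEDESC2* pDesc)
{
    DWORD fourCC;

    if (!(pDesc->ddsCaps.dwCaps & DDSCAPS_TEXTURE))
        return FALSE;                                   // not a texture
    if (!(pDesc->ddpfPixelFormat.dwFlags & DDPF_FOURCC))
        return FALSE;                                   // no FOURCC format

    fourCC = pDesc->ddpfPixelFormat.dwFourCC;
    if (fourCC != MAKEFOURCC('D','X','T','1') &&
        fourCC != MAKEFOURCC('D','X','T','2') &&
        fourCC != MAKEFOURCC('D','X','T','3') &&
        fourCC != MAKEFOURCC('D','X','T','4') &&
        fourCC != MAKEFOURCC('D','X','T','5'))
        return FALSE;                                   // not a DXT code

    // DirectDraw sets the dimensions to multiples of 4 pixels.
    return ((pDesc->dwWidth & 3) == 0) && ((pDesc->dwHeight & 3) == 0);
}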
Send comments about this topic to Microsoft
Using Compressed Texture Surfaces
4/26/2017 • 4 min to read • Edit Online

DirectDraw only calls the driver to do a blt between two surfaces of the same DXT type if the
DDCAPS2_COPYFOURCC flag is set in the dwCaps2 member of the DDCORECAPS structure. If this flag is not set,
the DirectDraw HEL performs the blt. This is important for backing surface-to-display copy blts, because this is the
mechanism whereby textures are downloaded from backing (system memory) surfaces to display memory. Thus,
exposing DXT texture surfaces effectively requires your driver to support the DDCAPS2_COPYFOURCC flag.
The DDCAPS2_COPYFOURCC flag has some additional implications. Your driver must be able to execute a blt
between FOURCC formats having at least these attributes:
The source and destination formats are the same FOURCC format.
The source and destination surfaces are different.
The source and destination rectangles both make up the entire surface (that is, there is no stretching and
there are no subrectangles).
Both surfaces are in display memory.
The driver must be able to perform these blts for every FOURCC format it supports in display memory.
Note Microsoft DirectShow uses the DDCAPS2_COPYFOURCC flag to accelerate some video functionality; the
requirement for this flag implies that all FOURCC formats can be copied.
If a blt operation requires compression to a DXT format, the DirectDraw HEL always performs the blt. This means
that DirectDraw never requests the driver to perform a blt for which:
The destination surface has a DXT format.
The formats for the source and destination surfaces are not the same.
The semantics of the DirectDraw DDCAPS_CANBLTSYSMEM capability bit imply that the display driver is called for
all blts from system memory to display memory. Consequently, the driver may be called for such blts from DXT
surfaces to non-DXT surfaces. The only requirement in this case is that the driver return
DDHAL_DRIVER_NOTHANDLED if it cannot perform the decompression. This causes DirectDraw to propagate a
DDERR_UNSUPPORTED error code to the application. It is acceptable to implement decompression for blts from
system memory to display memory in your driver, but this is not required for DirectX 6.0 and later versions.
DirectDraw display memory allocation routines do not handle pixel format considerations.
HeapVidMemAllocAligned, for example, expects a count of bytes as its input parameter. Likewise,
DDHAL_PLEASEALLOC_BLOCKSIZE (see the fpVidMem member of the DD_SURFACE_GLOBAL structure) signifies
that the dwBlockSizeX and dwBlockSizeY members of the DD_SURFACE_GLOBAL structure are counts of bytes
and lines, respectively. Consequently, if your driver uses either of these mechanisms to allocate display memory
through DirectDraw allocators, your driver must be able to calculate the memory consumption, in bytes, of a DXT
surface by itself. The following sample shows one way to perform this calculation:
DWORD dx, dy;
DWORD blksize, surfsize;
LPVOID pmem;

/*
* Determine how much memory to allocate for the FOURCC format.
*/
switch ((int)pDDPF->dwFourCC)
{
case MAKEFOURCC('D','X','T','1'):
    blksize = 8;  // The size of a DXT1 4x4 pixel block.
    break;
case MAKEFOURCC('D','X','T','2'): // premultiplied alpha
case MAKEFOURCC('D','X','T','3'): // non-premultiplied alpha
    blksize = 16; // The size of a DXT2/DXT3 4x4 pixel block.
    break;
case MAKEFOURCC('D','X','T','4'): // premultiplied alpha
case MAKEFOURCC('D','X','T','5'): // non-premultiplied alpha
    blksize = 16; // The size of a DXT4/DXT5 4x4 pixel block.
    break;
break;

default:
DDASSERT(0);
}

/*
* Calculate the number of blocks in the x and y dimensions.
*/
dx = (nWidth + 3) >> 2;
dy = (nHeight + 3) >> 2;
surfsize = dx * dy * blksize;

When the application calls the IDirect3DVertexBuffer7::Lock or IDirectDrawSurface7::GetSurfaceDesc
methods (described in the Direct3D and DirectDraw SDK documentation sets, respectively) on a compressed
surface, the driver must set the DDSD_LINEARSIZE flag in the dwFlags member of the DDSURFACEDESC2
structure. In addition, the driver must set the number of bytes allocated to contain the compressed surface data in
the dwLinearSize member of the same structure. (The dwLinearSize member resides in a union with the lPitch
member, so these members are mutually exclusive, as are the DDSD_LINEARSIZE and DDSD_PITCH flags.)
Your hardware or driver can convert and store the compressed texture in any format you choose (typically a
reordering into a more hardware-efficient layout). However, your hardware or driver must be able to convert the
compressed texture back to its original DXT code format whenever DirectDraw requires it, that is, whenever the
application calls the IDirect3DVertexBuffer7::Lock method.
Windows 2000 Note
Under Windows 2000, system memory DXT surfaces have had some of their fields mapped for memory allocation
purposes. The mapping is:

wWidth = lPitch = dx * blksize;
wHeight = dy;
dwRGBBitCount = 8;

When the driver encounters a system memory DXT surface, for example in D3dCreateSurfaceEx, it must map the
fields back before making use of them. The back mapping is:

realWidth = (wWidth << 2) / blksize;
realHeight = wHeight << 2;
realLinearSize = dwLinearSize * wHeight;
realRGBBitCount = 0;
Reference Rasterizer Notes
The following three items should be observed when implementing reference rasterizer (RefRast) code:
1. The section entitled 3-Bit Linear Alpha Interpolation (DXT4 and DXT5 format) in the DirectDraw SDK
documentation shows the correct order in which to ramp the Alpha values for DXT4 and DXT5 formats.
2. Reference rasterizer code should have the following logic guarding the color comparison logic.

if ((color_0 > color_1) OR !DXT1) {
    /* color comparison logic */
}

3. Because hardware implementations of reference rasterizers can perform the rounding of calculations to
better approximate the original value in slightly different ways, testing should allow for slight variations in
the color and alpha values obtained from the decompression logic.
Send comments about this topic to Microsoft
Handling Compressed Texture Surfaces Created In
System Memory
4/26/2017 • 1 min to read • Edit Online

This topic applies only to Windows NT-based operating systems.


The width and height of a compressed-texture surface created in system memory are altered by the user-mode
runtime to force the kernel-mode runtime to allocate the appropriate amount of memory. The display driver must
reverse this alteration to prevent subsequent operations that are performed on this surface from failing. Whenever
the DirectDraw runtime calls the driver's D3dCreateSurfaceEx function to create a compressed-texture surface,
the driver must restore the width and height of the surface to their unaltered states.
The driver's D3dCreateSurfaceEx function receives the surface's width, pitch, and height altered as follows:
Width and pitch contain the number of 4x4 blocks in a row multiplied by the block size.
Height contains the number of 4x4 blocks in the column.
The following code snippet shows the calculations that the driver must perform to restore the width and height of
the surface:

RealWidth = (Width / BlockSize) * 4;
RealHeight = Height * 4;

The driver should assign the restored width and height values to members in the kernel's DD_SURFACE_GLOBAL
surface structure. Doing so prevents the DirectDraw kernel-mode runtime from rejecting DXT texture download blts
because the width and height values do not match. That is, if the driver leaves the altered sizes in the wWidth and
wHeight members of DD_SURFACE_GLOBAL, the DirectDraw kernel-mode runtime rejects a blt from the altered
system-memory surface to the video-memory surface because the width and height of the source, which is in
unaltered coordinates, seems to be "outside" the altered DD_SURFACE_GLOBAL size.
Send comments about this topic to Microsoft
Reporting Width and Height of Compressed Texture
Surfaces
4/26/2017 • 1 min to read • Edit Online

When the DirectX runtime requests that a driver create a DXTn compressed texture surface whose width and height
are less than 4x4, the driver actually allocates a 4x4 block of memory for the texture surface. However, the driver
reports the width and height of the texture surface as the values that the runtime requested. For example, if a 2x2
DXT1 compressed texture surface is requested, the driver allocates a 4x4 block but reports that the block is 2x2 by
leaving the requested texture size unchanged. To request a specific DXTn compressed texture size, the runtime sets
the dwWidth and dwHeight members of a DDSURFACEDESC or DDSURFACEDESC2 structure that represents
the texture surface. The driver does not alter these size settings even if it allocates a 4x4 texture surface when the
request is for a texture surface whose width and height are less than 4x4.
Send comments about this topic to Microsoft
DirectX Compressed VolumeTexture Formats
4/26/2017 • 4 min to read • Edit Online

Volume texture maps are digitized images that are sampled at regular intervals through a true 3D region of space. For
example, a ball of fire may be sampled and the "amount" of flame within a slice can be accounted for. In this sense,
the amount of flame represents a set of values for the alpha, red, green and blue components at each pixel in the
image. The entire flame is represented by a series of these slices.
Before going into details of DirectX volume rendering, it is important to clarify some terminology. There are two
common methods used in computer graphics to represent volume data sets. One is to sample the image at each
location and store the appropriate ARGB values. For example, data for the flame could be stored as a 256 X 256 X
256 three-dimensional array. This requires about 64MB to store a 32-bit value at each location in the array. (Some
applications, such as medical imaging, may require this amount of data.) However, "slicing" the flame into 4 or 8
slabs can make a very good approximation. Each slab stores 256 X 256 elements, where each element represents a
region of data within the slab.
If we assume an array of size 256 X 256 X 4 is reasonable, then storing a 32-bit value at each location requires only
1MB of storage (less if the data is compressed). The reason to talk about these two different methods is that if you
want anything in your image that is behind the flame to be hidden, then the data value stored at each location is
different. In the full 256 X 256 X 256 data set, any one element may contribute about 1/256 of the final alpha and
color. In the slicing method, an element could contribute anywhere from 0 up to the final alpha and color used for
that pixel. The problem is that sometimes the term "voxel" is used to describe both types of sampling. Starting with
DirectX 8, the slicing method (often called "Volume Texturing") is the only method supported. The idea is to reduce
the volume data set to a reasonable size. With some minor modifications to the texturing hardware, the DirectX
graphics API can expose high-performance volumetric rendering.
Up to this point, most of this section has focused on DXTN compressed surface formats. These formats work well
for 2D textures. They are inefficient, however, when you are working with a volume data set because volume data
sets are stored in separate surfaces that require extra hardware cycles to access. To address this issue, DirectX 8
supports a set of volume surface formats.
The idea is to take the volume slices and compress them using DXTN. Then, instead of storing the completed
surfaces sequentially in memory, the DXTN data blocks are reordered. The reordering is done so that the small 4X4
texel blocks form data cubes. They can be 4X4X1, 4X4X2, 4X4X3, or 4X4X4. Note that the 4X4X1 ordering is exactly
the same as that used in DXT1 through DXT5. The important point is:
Volume surfaces are slices of data that are compressed using DXTN and then reordered to account for the 3D
locality of data. These reordered data structures are referred to by the FOURCCs of DXV1,..., DXV5 (or just DXVN to
describe the general case). Note that a DXV1 data structure would hold a set of reordered DXT1 surfaces, DXV2
would hold a set of DXT2 surfaces, and so on.
DXV N Details
A DXVN surface stored in the 1-deep arrangement contains one set of 4x4 DXTN subblocks (in effect it is just a
DXTN surface). The DXVN 4x4x2 block format structure contains two 4x4 DXTN subblocks, one taken from each of
two adjacent data slices. Using this method, a DXVN 4x4x4 block format structure contains four 4x4 DXTN
subblocks taken from 4 data slices. In all cases the N in DXVN is used to match it to the corresponding DXTN type
used for compression. DXTN blocks can be stored as 64 bits (color and one-bit alpha) or as 128 bits (with additional
alpha information), so the DXVN subblocks have the same possible sizes based on the type N. That is, DXT1 and DXV1
use 64-bit subblocks, while DXT2,...,DXT5 and DXV2,...,DXV5 use 128-bit subblocks.
Given a texel coordinate (u, v, p), where u ∈ {0, 1, ..., width−1}, v ∈ {0, 1, ..., height−1}, and p ∈ {0, 1, ..., depth−1}, the following
can be used to compute the corresponding address of the compressed block and subblock in memory containing
that texel. As mentioned above, the subblock format matches the existing DXTN format:

subblock_size = 8 (for DXT1)
subblock_size = 16 (for DXT2,...,DXT5)
block_size = MIN(depth, 4) * subblock_size
horiz_stride = (width + 3) >> 2
planar_stride = ((height + 3) >> 2) * horiz_stride
block_byte_address = block_size *
    ((p >> 2) * planar_stride + (v >> 2) * horiz_stride + (u >> 2))
subblock_byte_address = block_byte_address + ((p & 3) * subblock_size)
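Expressed as C, and assuming the layout just described (MIN(depth, 4) subblocks per block), the computation might look like this:

// Sketch: byte address of the DXVN subblock containing texel (u, v, p).
unsigned int DxvnSubblockByteAddress(unsigned int u, unsigned int v, unsigned int p,
                                     unsigned int width, unsigned int height,
                                     unsigned int depth, int isDxt1)
{
    unsigned int subblock_size = isDxt1 ? 8u : 16u;          // DXT1 vs. DXT2..DXT5
    unsigned int blocks_deep   = (depth < 4u) ? depth : 4u;  // MIN(depth, 4)
    unsigned int block_size    = blocks_deep * subblock_size;
    unsigned int horiz_stride  = (width + 3u) >> 2;
    unsigned int planar_stride = ((height + 3u) >> 2) * horiz_stride;
    unsigned int block_byte_address =
        block_size * ((p >> 2) * planar_stride + (v >> 2) * horiz_stride + (u >> 2));

    return block_byte_address + ((p & 3u) * subblock_size);
}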

Send comments about this topic to Microsoft


Compressed Video Decoding
4/26/2017 • 1 min to read • Edit Online

The following topics describe motion compensation, which is the most frequently supported part of compressed
video decoding:
Motion Compensation
Motion Compensation Callbacks
DirectX Video Acceleration uses the motion compensation callback functions to aid in the acceleration of digital
video decoding processing.
Send comments about this topic to Microsoft
Motion Compensation
4/26/2017 • 1 min to read • Edit Online

Motion compensation is the term for an important stage of the decoding process for compressed digital video.
Many graphic accelerator devices provide some type of acceleration capability for supporting compressed video
decoding. Because the motion compensation process is the most frequently supported part of video decoding, the
device driver interface that supports compressed video decoding is called the motion compensation DDI. In
addition to motion compensation, some devices can perform IDCT (Inverse Discrete Cosine Transformation) and
other hardware functions that a software video decoder can use to accelerate the decoding process. The motion
compensation DDI is flexible enough to handle devices that provide these other capabilities as well.
The input data to a software MPEG decoder is well defined. If the decoder is designed for MPEG-2, the input is in
MPEG-2 format. The output of the decoder is also well defined. It is an uncompressed frame in a variety of formats.
However, the interim formats between the software decoders and the display devices are not well defined, with
many devices requiring their own proprietary data formats. Therefore, the motion compensation device driver
interface is flexible and the interim formats are described as GUIDs. The display driver reports the GUIDs that
represent the capabilities it supports, and the software decoder chooses the GUID that best matches its
requirements.
To enable motion compensation functionality, the driver must perform the following steps (a sketch of the second step appears after the list):
Implement a DdGetDriverInfo function and set the GetDriverInfo member of the DD_HALINFO structure
to point to this function when DrvGetDirectDrawInfo is called. The driver's DdGetDriverInfo function must
parse the GUID_MotionCompCallbacks GUID.
Fill in a DD_MOTIONCOMPCALLBACKS structure with the appropriate driver callback pointers and
callback type flags set when the DdGetDriverInfo function is called with the GUID_MotionCompCallbacks
GUID. The driver must then copy this initialized structure into the Microsoft DirectDraw-allocated buffer to
which the lpvData member of the DD_GETDRIVERINFODATA structure points, and return the number of
bytes written into the buffer in dwActualSize.
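A sketch of the second step, filling DD_MOTIONCOMPCALLBACKS inside the DdGetDriverInfo handling, might look roughly like this; the member and flag names should be checked against ddrawint.h, lpData is the DD_GETDRIVERINFODATA pointer, and the MyMoComp* routines are the driver's own callbacks:

// Sketch: answering the GUID_MotionCompCallbacks query.
if (IsEqualGUID(&lpData->guidInfo, &GUID_MotionCompCallbacks))
{
    DD_MOTIONCOMPCALLBACKS mcb;

    memset(&mcb, 0, sizeof(mcb));
    mcb.dwSize  = sizeof(mcb);
    mcb.dwFlags = DDHAL_MOCOMP32_GETGUIDS | DDHAL_MOCOMP32_GETFORMATS |
                  DDHAL_MOCOMP32_CREATE   | DDHAL_MOCOMP32_RENDER     |
                  DDHAL_MOCOMP32_QUERYSTATUS | DDHAL_MOCOMP32_DESTROY;
    mcb.GetMoCompGuids    = MyMoCompGetGuids;
    mcb.GetMoCompFormats  = MyMoCompGetFormats;
    mcb.CreateMoComp      = MyMoCompCreate;
    mcb.RenderMoComp      = MyMoCompRender;
    mcb.QueryMoCompStatus = MyMoCompQueryStatus;
    mcb.DestroyMoComp     = MyMoCompDestroy;

    lpData->dwActualSize = sizeof(mcb);
    memcpy(lpData->lpvData, &mcb, min(lpData->dwExpectedSize, (DWORD)sizeof(mcb)));
    lpData->ddRVal = DD_OK;
}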
Send comments about this topic to Microsoft
Motion Compensation Callbacks
4/26/2017 • 1 min to read • Edit Online

DirectX Video Acceleration makes use of the following motion compensation callback functions provided in
DirectDraw drivers for acceleration of digital video decoding processing, with support of alpha blending for such
purposes as DVD subpicture support:
DdMoCompBeginFrame
DdMoCompCreate
DdMoCompDestroy
DdMoCompEndFrame
DdMoCompGetBuffInfo
DdMoCompGetFormats
DdMoCompGetGuids
DdMoCompGetInternalInfo
DdMoCompQueryStatus
DdMoCompRender
The motion compensation callback functions comprise the device driver side of the DirectX Video Acceleration
interface. The motion compensation callback functions are specified by members of the
DD_MOTIONCOMPCALLBACKS structure. The following steps show how motion compensation callback
functions are accessed:
1. GUIDs received from IAMVideoAccelerator::GetVideoAcceleratorGUIDs originate from the device
driver's DdMoCompGetGuids.
2. A call to the downstream input pin's IAMVideoAccelerator::GetUncompFormatsSupported returns
data from the device driver's DdMoCompGetFormats.
3. At the start of the relevant processing, the DXVA_ConnectMode data structure from the output pin of the
decoder's IAMVideoAcceleratorNotify::GetCreateVideoAcceleratorData is passed to the device
driver's DdMoCompCreate, which notifies the decoder about the video acceleration object.
4. Data returned from IAMVideoAccelerator::GetCompBufferInfo originates from the device driver's
DdMoCompGetBuffInfo.
5. Buffers sent using IAMVideoAccelerator::Execute are received by the device driver's
DdMoCompRender.
6. Use of IAMVideoAccelerator::QueryRenderStatus calls the device driver's DdMoCompQueryStatus. A
return code of DDERR_WASSTILLDRAWING from DdMoCompQueryStatus will be seen by the host
decoder as a return code of E_PENDING from IAMVideoAccelerator::QueryRenderStatus.
7. Data sent to IAMVideoAccelerator::BeginFrame is received by the device driver's
DdMoCompBeginFrame. A return code of DDERR_WASSTILLDRAWING is needed from
DdMoCompBeginFrame in order for E_PENDING to be seen by the host decoder in response to
IAMVideoAccelerator::BeginFrame.
8. Data sent to IAMVideoAccelerator::EndFrame is received by the device driver's DdMoCompEndFrame.
9. At the end of the relevant processing, the device driver's DdMoCompDestroy is used to notify the driver
that the current video acceleration object will no longer be used, so that the driver can perform any
necessary cleanup.
Send comments about this topic to Microsoft
Direct3D DDI
4/26/2017 • 2 min to read • Edit Online

The Microsoft Direct3D device driver interface (DDI) is a graphics interface that allows vendors to provide
hardware acceleration for Direct3D. The interface is flexible, allowing vendors to provide Direct3D acceleration
according to hardware capabilities. Driver writers implement the Direct3D DDI as an integral part of the display
driver.
This section describes the Direct3D DDI, and provides implementation guidelines for Direct3D driver writers. It is
assumed that the reader is familiar with the Direct3D and Microsoft DirectDraw APIs, and that the reader has a
firm grasp of the Windows 2000 display driver model, including the DirectDraw DDI.
All Direct3D drivers for Windows 2000 and later must conform to the Microsoft DirectX 7.0 or later Direct3D
driver model. The DirectX 8.0 driver model is supported in Microsoft Windows XP.
Driver writers who are creating Microsoft Direct3D drivers for Microsoft Windows 2000 and later should use the
following header files:
d3dnthal.h
Contains prototypes for callbacks that are implemented by the driver and definitions for driver-level structures.
The D3DHAL_DP2OPERATION enumerated type is defined in this file. This header is included in winddi.h, which
must be included in all Windows 2000 and later display drivers.
d3dtypes.h
Contains Direct3D type definitions used by both applications and drivers. Except for D3DHAL_DP2OPERATION, all
other Direct3D enumerated types are defined in this header.
d3dcaps.h
Contains structures and definitions that describe capabilities of various aspects of Direct3D drivers.
dx95type.h
Allows driver developers to write driver code that is portable between Windows 2000 and later and Windows
98/Me.
ddrawint.h
This header file, which is included in winddi.h, is required to develop the Microsoft DirectDraw portion of a display
driver.
All of these header files are shipped with the Windows Driver Kit (WDK). Previous Driver Development Kits (DDKs)
also provide sample code for a Direct3D driver in the Perm3 video display directory.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia2 (3dlabs.htm) and 3Dlabs
Permedia3 (Perm3.htm ) sample display drivers. You can get these sample drivers from the Windows Server 2003
SP1 DDK, which you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
Reference pages for Direct3D DDI functions, structures, and enumerations can be found in Direct3D Driver
Functions, Direct3D Driver Structures, and Direct3D Driver Enumerations.
The primary reference for SDK-related aspects of the Direct3D interface is the Microsoft Windows SDK
documentation. Computer Graphics: Principles and Practice by Foley, van Dam, Feiner, and Hughes, which was
published by Addison-Wesley, is a useful general graphics reference.
Send comments about this topic to Microsoft
Cross Platform Direct3D Driver Development
4/26/2017 • 1 min to read • Edit Online

The Microsoft Windows 2000 and later and Windows 98/Me Direct3D DDI types are not directly compatible when
they are compiled, because of naming differences and some type changes of structure and function members in
each DDI type. Logically however, equivalent members in each DDI type serve the same purpose.
If your code will be portable between Windows 2000 and later and Windows 98/Me, use dx95type.h, a utility file
that is included in the Windows Driver Kit (WDK) and previous Driver Development Kits (DDKs). It contains type
definitions and macros that map some naming differences that occur between the Windows 98/Me and Windows
2000 and later platforms, enabling common driver code to be used between them.
Send comments about this topic to Microsoft
Direct3D Implementation Requirements
4/26/2017 • 1 min to read • Edit Online

The purpose of this section and the following subsections is to provide detailed descriptions of the Windows Logo
requirements for hardware with respect to exposed features. This section includes descriptions of features that are
present in the PC2001 Design Guide.
Feature compliance is generally assured by comparing a scene rasterized by the hardware to an identical scene
rasterized by the reference rasterizer. Each feature must conform to the reference rasterizer as described in the
following topics:
Render Target Requirements
Point, Line, and Triangle Filling Requirements
Shading Requirements
Texturing Requirements
Send comments about this topic to Microsoft
Render Target Requirements
4/26/2017 • 1 min to read • Edit Online

The requirements for color buffers and depth buffers are as follows:
Color Buffers
If the hardware does not support a render target that is also to be used as a texture (that is, the device cannot
"render to a texture"), the device must fail calls to the IDirect3DDevice7::SetRenderTarget and
IDirect3D7::CreateDevice methods. These methods are described in the Direct3D SDK documentation. The fact
that a render target is to be used as a texture is signified by the presence of the DDSCAPS_TEXTURE flag in the
surface description (see the dwCaps member of the DDSCAPS structure).
Depth Buffers
If the hardware does not support a particular combination of render target and depth buffer, then the device must
fail API calls that create this scenario when it detects mismatches of this sort, such as in calls to the
IDirect3D7::CreateDevice and IDirectDrawSurface7::AddAttachedSurface methods. These methods are
described in the Direct3D and DirectDraw SDK documentation sets, respectively. An example of such a mismatch
is a render target and depth buffer of different bit depths. Do not transparently alter the format of either the
render target or the depth buffer to make an invalid combination work (for example, do not allocate a
higher-precision depth buffer without informing the DirectX runtime).
Send comments about this topic to Microsoft
Point, Line, and Triangle Filling Requirements
4/26/2017 • 2 min to read • Edit Online

The requirements for filling points, lines, and triangles are as follows:
Points
The point fill and rasterization rules determine how a point is rendered. These rules are identical to the triangle fill
rules. All flags and capabilities that apply to triangles also apply to points, and vice-versa. The reference
implementation expands a point into a rectangle and applies the triangle fill rules to the result.
Given a point with coordinates P₀(x, y), generate four new points P₁, P₂, P₃, and P₄ as follows:

P₁ = (x − 0.5, y − 0.5)
P₂ = (x − 0.5, y + 0.5)
P₃ = (x + 0.5, y + 0.5)
P₄ = (x + 0.5, y − 0.5)

The rectangle is then generated as two triangles, such as (P₁, P₂, P₃) and (P₁, P₃, P₄). You may also examine the
reference rasterizer implementation for rendering points in the source files setup.cpp and scancnv.cpp of the
DirectX Driver Development Kit (DDK).
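The following fragment sketches this expansion in C. It is not taken from the reference rasterizer; the POINTVTX
type and the EmitTriangle helper are hypothetical stand-ins for a driver's own vertex representation and triangle
setup path.

// Hypothetical sketch of expanding a point into two triangles per the rules above.
typedef struct { float x, y; } POINTVTX;

void EmitTriangle(const POINTVTX *a, const POINTVTX *b, const POINTVTX *c);  // driver's triangle setup

void ExpandPointToTriangles(const POINTVTX *p0)
{
    POINTVTX p1 = { p0->x - 0.5f, p0->y - 0.5f };
    POINTVTX p2 = { p0->x - 0.5f, p0->y + 0.5f };
    POINTVTX p3 = { p0->x + 0.5f, p0->y + 0.5f };
    POINTVTX p4 = { p0->x + 0.5f, p0->y - 0.5f };

    // Rasterize the rectangle as two triangles using the triangle fill rules.
    EmitTriangle(&p1, &p2, &p3);
    EmitTriangle(&p1, &p3, &p4);
}
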
Lines
Line fill rules (that is, rules that determine how a line is rendered) follow the Grid Intersection Quantization (GIQ)
diamond convention. For more information about the GIQ diamond convention, see Cosmetic Lines. An example of
line-drawing code that follows these rules can be found in the DirectX DDK in the reference rasterizer source files
setup.cpp and scancnv.cpp.
Triangles
The triangle fill rules determine how a triangle is rendered. These rules are identical to the point fill rules. An
example of triangle-drawing code that follows the triangle fill rules can be found in the DirectX DDK in the
reference rasterizer source files setup.cpp and scancnv.cpp.
Hardware should supply the culling caps and properly implement the three culling modes. The following code
fragment determines whether to cull the current triangle:

if (CurrentCullMode != D3DCULL_NONE) {
    int ccw = (((v[0]->sx - v[2]->sx) *
                (v[1]->sy - v[2]->sy)) <
               ((v[1]->sx - v[2]->sx) *
                (v[0]->sy - v[2]->sy)));
    if ((CurrentCullMode == D3DCULL_CW  && (ccw == 0)) ||
        (CurrentCullMode == D3DCULL_CCW && (ccw != 0))) {
        // The current triangle is culled; move on to the
        // next triangle.
    }
}
// The current triangle is not culled; render it.

The preceding code sample determines which way the triangle is facing. The triangle, defined by vertices 0, 1, and
2, is tested for counterclockwise winding in screen space. If it is not counterclockwise and clockwise culling is
selected, the triangle is not drawn because its vertices go in clockwise order.
Send comments about this topic to Microsoft
Shading Requirements
4/26/2017 • 5 min to read • Edit Online

The requirements for flat shading, Gouraud shading, specular highlighting, alpha blending, dithering, and color-key
are as follows:
Flat Shading
The D3DPSHADECAPS_COLORFLATRGB bit in the dwShadeCaps member of the D3DPRIMCAPS structure must
be set for the appropriate primitive type (line or triangle) in order to indicate that flat shading is supported for that
primitive type.
For all primitive types except triangle fans, the color, specular (if supported) and alpha data come from the first
vertex of each primitive. For triangle fans, the second vertex is used. These colors remain constant across the entire
triangle (that is, they are not interpolated).
Gouraud Shading
The dwShadeCaps member of the D3DPRIMCAPS structure must have the
D3DPSHADECAPS_COLORGOURAUDRGB bit set for the appropriate primitive type (line or triangle) to indicate that
Gouraud interpolation of colors is supported. The color and specular components are both linearly interpolated
between vertices.
For all primitives, the color data must use linear interpolation between the vertices of the primitive, and must
conform to an image generated by the reference rasterizer. Furthermore, all color and alpha components must be
interpolated in the same fashion if they are present. For example, it is not valid to use Gouraud interpolation of the
RGB components of a color while using flat shading for the alpha component. An exception is transmitting the fog
component via the alpha component of the specular color, which should never be flat shaded.
Perspective correction of iterated color components is encouraged. Note that the reference rasterizer in the latest
DirectX release already performs perspective correction of color components; this is taken into account in current
testing procedures.
Specular Highlighting
If specular highlighting support is exposed, then one or both of the following flags must be set in the
dwShadeCaps member of the D3DPRIMCAPS structure for the appropriate primitive type (line or triangle):
D3DPSHADECAPS_SPECULARFLATRGB must be set if flat shading is supported.
D3DPSHADECAPS_SPECULARGOURAUDRGB must be set if Gouraud shading is supported.
Also, it must be possible to enable or disable specular highlighting by setting the appropriate value of the
D3DRENDERSTATE_SPECULARENABLE render state.
Specular highlighting must be full color, and must be capable of producing the full range of colors available for the
render target. Monochromatic specular highlighting is not sufficient to meet this requirement.
The D3DPSHADECAPS_SPECULARFLATMONO and D3DPSHADECAPS_SPECULARGOURAUDMONO flags are not
to be used to indicate monochromatic specular highlighting. They indicate only that specular highlighting is
supported in RAMP mode. There is no cap that can indicate that an adapter supports only monochromatic specular
highlights in RGB mode. If your adapter supports only monochromatic specular highlights in RGB mode, you
should not set the SPECULARFLATRGB or SPECULARGOURAUDRGB caps. With the monomodes, the hardware
should just interpolate the blue channel of the specular components as a white intensity. This is controlled by the
D3DRENDERSTATE_MONOENABLE render state.
To properly implement specular highlights, a second set of interpolants is required. However, hardware parts that
support z-buffering and blending can often emulate specular highlights by doing a second blending pass over the
triangle with the z comparison function set to D3DCMP_EQUAL (see the D3DCMPFUNC enumerated type in the
DirectX SDK documentation). This second pass would do a blend to add the interpolated specular component to the
pixel written during the first pass; values that exceed the maximum should saturate to white.
Alpha Blending
To expose support for alpha blending, the following flags must be set in the dwShadeCaps member of the
D3DPRIMCAPS structure for the appropriate primitive type (line or triangle):
D3DPSHADECAPS_ALPHAFLATBLEND must be set if flat shading is supported with alpha.
D3DPSHADECAPS_ALPHAGOURAUDBLEND must be set if Gouraud shading is supported with alpha.
If alpha blending is supported, it must be possible to enable or disable it by setting the appropriate value of the
D3DRENDERSTATE_ALPHABLENDENABLE render state.
Alpha blending, or true transparency (from the 3D space), is the process of modulating incoming color by the value
of the data that is already located in the frame buffer. This render state has modes that can be set by the
D3DRENDERSTATE_SRCBLEND and D3DRENDERSTATE_DESTBLEND render states.
When the alpha value for a blending operation is not available, it must be assumed to be 1.0 (opaque). An example
of when this is possible is blending with textures that have no alpha channel.
If the target to which the resultant pixel is written contains an alpha channel, the alpha value resulting from any
alpha blending operation must be written to that channel, allowing proper accumulation of transparency.
Dithering
If dithering is supported on a particular primitive type (line or triangle), the dwRasterCaps member of the
D3DPRIMCAPS structure must have the D3DPRASTERCAPS_DITHER flag set. The feature must be controllable by
means of the D3DRENDERSTATE_DITHERENABLE render state.
If dithering is supported, it must not be forced always on or always off regardless of the render state setting.
Color-key
Color-key is related to texture transparency and the D3DPTEXTURECAPS_TRANSPARENCY cap (see the
dwTextureCaps member of the D3DPRIMCAPS structure).
Color-key transparency, which is used to create 2D sprites, replaces some colors of an object with the colors that
are beneath them in the frame buffer. The driver should make it possible for an application to enable color-key for
an entire scene, but only use the color-key on certain surfaces with the attached color-key values instead of turning
the color-key on and off for each surface.
Color-key is enabled if the D3DRENDERSTATE_COLORKEYENABLE render state is set to TRUE and the texture
surface has the DDRAWISURF_HASCKEYSRCBLT bit set. (See the dwFlags member of the DD_SURFACE_LOCAL
structure for more information.) Applications create a texture surface that uses DDSD_CKSRCBLT and then call the
IDirect3DDevice7::SetRenderState method with D3DRENDERSTATE_COLORKEYENABLE and TRUE. Both of
these must be true for color-key to occur, and applications must be permitted to leave the render state TRUE all the
time and still selectively use color-key for a subset of a frame's textures (that is, those that have the
DDRAWISURF_HASCKEYSRCBLT bit set). It is up to the driver to correctly handle this behavior. For more
information about IDirect3DDevice7::SetRenderState, see the Direct3D SDK documentation.
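As a rough illustration of the two conditions described above, a driver might gate its color-key path on a check
such as the following. This is only a sketch; MYCONTEXT and its RenderStates array are hypothetical driver state,
and the surface flag check follows the DD_SURFACE_LOCAL description above.

// Hedged sketch: color-key applies to a texture only when both conditions above hold.
BOOL UseColorKey(const MYCONTEXT *pCtx, const DD_SURFACE_LOCAL *pSurfLcl)
{
    BOOL bRenderStateOn = (pCtx->RenderStates[D3DRENDERSTATE_COLORKEYENABLE] != 0);
    BOOL bSurfaceHasKey = (pSurfLcl->dwFlags & DDRAWISURF_HASCKEYSRCBLT) != 0;

    return bRenderStateOn && bSurfaceHasKey;
}
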
Send comments about this topic to Microsoft
Texturing Requirements
4/26/2017 • 3 min to read • Edit Online

This section lists requirements for texture sizes and texture filtering. There are also texture-related requirements for
the IDirect3DDevice7::ValidateDevice method.
Texture Sizes
The following are requirements for texture sizes:
1. The driver must expose its minimum and maximum texture dimensions through the dwMinTextureWidth,
dwMinTextureHeight, dwMaxTextureWidth, and dwMaxTextureHeight members of the
D3DDEVICEDESC7 structure. This structure is defined in the Direct3D SDK documentation.
2. If the hardware has an aspect ratio restriction on its textures, that ratio must be present in the
dwMaxTextureAspectRatio member of the D3DDEVICEDESC7 structure.
3. If the device supports only texture dimensions that are powers of two, then it must set the dwTextureCaps
member of the D3DPRIMCAPS structure to contain the D3DPTEXTURECAPS_POW2 flag for the appropriate
primitive type (line or triangle).
4. If the device can support two-dimensional (2D) textures (that is, not volume or cube textures) of an arbitrary
size when the texture addressing mode for the texture stage is set to D3DTADDRESS_CLAMP, the texture
wrapping for the texture stage is disabled (D3DRENDERSTATE_WRAPn set to 0), and MIP mapping is not in
use, then it must set the D3DPTEXTURECAPS_NONPOW2CONDITIONAL flag.
5. If the device only supports textures whose dimensions are equal, then it must set the dwTextureCaps
member of the D3DPRIMCAPS structure to contain the D3DPTEXTURECAPS_SQUAREONLY flag for the
appropriate primitive type (line or triangle).
If the device supports textures of an arbitrary size without restrictions other than those described in the first and
second requirements, then it must not set any of the flags described in the third, fourth, and fifth requirements.
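The following sketch shows one way the restrictions above might be reported. It assumes the myHw* values come
from the driver's own hardware queries; the members shown are the ones named in the requirements.

// Hedged sketch of reporting the texture-size restrictions (requirements 1-5 above).
VOID ReportTextureSizeCaps(D3DDEVICEDESC7 *pDesc, D3DPRIMCAPS *pTriCaps)
{
    pDesc->dwMinTextureWidth       = myHwMinTextureWidth;
    pDesc->dwMinTextureHeight      = myHwMinTextureHeight;
    pDesc->dwMaxTextureWidth       = myHwMaxTextureWidth;
    pDesc->dwMaxTextureHeight      = myHwMaxTextureHeight;
    pDesc->dwMaxTextureAspectRatio = myHwMaxTextureAspectRatio;   // 0 if unrestricted.

    if (myHwRequiresPow2Textures)
        pTriCaps->dwTextureCaps |= D3DPTEXTURECAPS_POW2;
    if (myHwSupportsConditionalNonPow2)
        pTriCaps->dwTextureCaps |= D3DPTEXTURECAPS_NONPOW2CONDITIONAL;
    if (myHwRequiresSquareTextures)
        pTriCaps->dwTextureCaps |= D3DPTEXTURECAPS_SQUAREONLY;
}
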
Texture Filtering
Filters that magnify and minify textures must be enabled and disabled through the D3DTSS_MAGFILTER and
D3DTSS_MINFILTER texture stage states. This filtering must not be performed automatically when these states are
disabled. For more information about the D3DTSS_Xxx texture stage states, see the D3DTEXTURESTAGESTATETYPE
enumerated type in the Direct3D SDK documentation.
Texture MIP mapping must be enabled and disabled through the D3DTSS_MIPFILTER texture stage state. If this
state is disabled, but the texture was created as a MIP map, the device must use only the top level of the MIP map.
MIP mapped filtering must not be performed when this state is disabled.
If the device supports anisotropic filtering, the maximum anisotropy level must be exported through the value of
the dwMaxAnisotropy member of the D3DDEVICEDESC7 structure (defined in the Direct3D SDK documentation).
Furthermore, the device must accept any setting from 1 through dwMaxAnisotropy in the
D3DTSS_MAXANISOTROPY texture stage state.
The device must be able to apply all supported filter modes to textures of any supported format. For example, if
MIP mapping is supported for other texture formats, it must also be possible to perform MIP map filtering of YUV
textures.
Note DirectX 9.0 and later applications can use values in the D3DSAMPLERSTATETYPE enumeration to control the
characteristics of sampler texture-related render states. In DirectX 8.0 and earlier, these sampler states were
included in the D3DTEXTURESTAGESTATETYPE enumeration. The runtime maps user-mode sampler states
(D3DSAMP_Xxx) to kernel-mode D3DTSS_Xxx values so that drivers are not required to process user-mode
sampler states. For more information about D3DSAMPLERSTATETYPE, see the latest DirectX SDK documentation.
IDirect3DDevice7::ValidateDevice
If a device supports a particular combination of texture stage state blending operations and operands in a single
pass, then the device must return DD_OK from a call to the IDirect3DDevice7::ValidateDevice method
(described in the Direct3D SDK documentation) for each such combination.
If a device does not support a particular combination of texture stage state blending operations in a single pass, or
does not support one or more of the blending operations or operands, then it must return one of the error codes
allowable for the IDirect3DDevice7::ValidateDevice method. Invalid blending operations cannot silently fail the
IDirect3DDevice7::ValidateDevice method.
Send comments about this topic to Microsoft
Direct3D Driver DDI
4/26/2017 • 1 min to read • Edit Online

The following sections describe the callback functions and operation codes that comprise the Direct3D DDI.
Send comments about this topic to Microsoft
Driver Functions to Support Direct3D
4/26/2017 • 1 min to read • Edit Online

A driver that supports Direct3D provides both Direct3D callback functions and DirectDraw DDI functions. The
Direct3D DDI callbacks are prototyped as follows:

typedef DWORD (APIENTRY *LPD3DHAL_MYFUNCTIONCB) (LPD3DHAL_MYFUNCTIONDATA);

In the preceding syntax:


LPD3DHAL_MYFUNCTIONCB points to a driver-implemented callback that can be called MyFunction. All
callback names are pseudonames decided upon by the display driver writer.
LPD3DHAL_MYFUNCTIONDATA is a pointer to a D3DHAL_MYFUNCTIONDATA structure being passed to the
callback. Callback parameter structures are characterized as follows:
The first member of every structure, dwhContext, is the context handle that describes the 3D context in
which the callback should operate. The only exception to this rule is the D3DHAL_CONTEXTCREATEDATA
structure.
The last member of every structure is ddrval. This member is used to pass the callback's return value
back to Direct3D so it can be returned to the calling application.
To determine how to initialize Direct3D callback functions, see Direct3D Driver Initialization.
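The following skeleton illustrates these conventions. It is only a sketch: D3DHAL_MYFUNCTIONDATA is the
pseudonym used above, and MYCONTEXT and MyLookupContext are hypothetical driver helpers.

// Hedged skeleton of a Direct3D callback following the dwhContext/ddrval conventions.
DWORD APIENTRY D3dMyFunction(LPD3DHAL_MYFUNCTIONDATA pData)
{
    MYCONTEXT *pCtx = MyLookupContext(pData->dwhContext);
    if (pCtx == NULL) {
        pData->ddrval = D3DHAL_CONTEXT_BAD;   // Report the invalid context to Direct3D.
        return DDHAL_DRIVER_HANDLED;
    }

    // ... device-specific work for this callback goes here ...

    pData->ddrval = DD_OK;                    // Success; Direct3D returns this to the application.
    return DDHAL_DRIVER_HANDLED;
}
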
The following table lists the Direct3D callback functions that are implemented in a Direct3D driver. All callback
functions are required except for D3dValidateTextureStageState, which is optional depending on the hardware
capabilities.

FUNCTION                       DESCRIPTION
D3dContextCreate               Creates a context.
D3dContextDestroy              Destroys a context.
D3dCreateSurfaceEx             Creates an association between a texture handle and a surface.
D3dDestroyDDLocal              Destroys all the Direct3D surfaces previously created by D3dCreateSurfaceEx that belong to the same given local DirectDraw object.
D3dDrawPrimitives2             Renders primitives and returns updated state to Direct3D.
D3dGetDriverState              Returns state information about the driver to the DirectDraw and Direct3D runtimes.
D3dValidateTextureStageState   Performs texture stage state validation, which is required for all drivers that support texturing.

In order to support Direct3D, a driver must minimally support Microsoft DirectDraw and must also implement
certain DirectDraw DDI functions. The functions pertinent to Direct3D support are listed in the following table.

FUNCTION               DESCRIPTION
DrvGetDirectDrawInfo   This function retrieves the capabilities of the graphics hardware. In this initialization function the driver indicates that it supports Direct3D.
DdGetDriverInfo        The runtime queries this callback function with GUIDs for additional information about the driver. Several GUIDs pertain specifically to the driver's Direct3D support.

DirectDraw function and callback implementation details are discussed in DirectDraw.


Send comments about this topic to Microsoft
Operation Code Handling
4/26/2017 • 1 min to read • Edit Online

A display driver handles requests to render graphics primitives and processes state changes in its
D3dDrawPrimitives2 function. The driver receives these requests as D3DHAL_DP2OPERATION operation codes.
The following topics describe how the driver processes operation codes and how its performance can be improved
during such processing:
Command Stream
Improving Performance of Operation Handling
In order to support the Microsoft DirectX 7.0 and later Direct3D driver model, driver writers need to make their
drivers respond to a number of new operation codes in their implementation of D3dDrawPrimitives2. Some of
these operation codes replace callback functions, and others provide new functionality. The most important new
operation codes are summarized in the following table, beginning with those that replace callbacks.

OPERATION CODE               CONDITION/DESCRIPTION
D3DDP2OP_SETRENDERTARGET     Always required. Maps a new rendering target surface and depth buffer in the current context. Replaces D3dSetRenderTarget.
D3DDP2OP_CLEAR               Always required. Used to clear the context's render target and Z-buffer. Replaces D3dClear2. Also used to clear hardware stencil buffers, and depth buffers that cannot be cleared properly by a depth fill bit-block transfer.
D3DDP2OP_SETPALETTE          Used to map an association between a palette handle and a surface handle, and to specify the characteristics of the palette. Only needed for drivers that support paletted textures; otherwise this should be a no operation (NOP).
D3DDP2OP_UPDATEPALETTE       Used to make alterations to a texture palette, for drivers that support paletted textures. Otherwise, this should be a NOP.
D3DDP2OP_TEXBLT              Specifies a blt operation from a source texture to a destination texture.

Send comments about this topic to Microsoft


Command Stream
4/26/2017 • 1 min to read • Edit Online

At the driver level, instructions come in the form of calls to D3dDrawPrimitives2. The input structure
D3DHAL_DRAWPRIMITIVES2DATA contains a pointer into a command buffer. This is a sequence of
D3DHAL_DP2COMMAND structures. Each of these structures contains a bCommand member that specifies
what type of data follows it in the buffer. This specification comes in the form of a D3DHAL_DP2OPERATION
enumerated type, such as D3DDP2OP_INDEXEDTRIANGLESTRIP or, in the case of setting up texture states,
D3DDP2OP_TEXTURESTAGESTATE.
In other words, the D3DHAL_DP2OPERATION operation code specifies what type of structures follow it in the
command buffer. The number of structures to follow is specified by either wPrimitiveCount or wStateCount,
members of a union that is in turn a member of the D3DHAL_DP2COMMAND structure. The wPrimitiveCount
member keeps track of the number of graphics primitives to render, while the wStateCount member keeps track
of the number of state changes to process.
For an example of how a driver processes operation codes, see Processing Texture Stages.
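The following fragment sketches the walk described above. How the driver obtains the command buffer pointer
and length from D3DHAL_DRAWPRIMITIVES2DATA is omitted, and MYCONTEXT, SetTextureStageStates, and
SetRenderStates are hypothetical helpers; only the bCommand / wStateCount / wPrimitiveCount pattern is the
point here.

// Hedged sketch of walking the D3dDrawPrimitives2 command buffer.
VOID ParseDP2Commands(MYCONTEXT *pCtx, BYTE *pBuf, DWORD cbBuf)
{
    BYTE *p    = pBuf;
    BYTE *pEnd = pBuf + cbBuf;

    while (p < pEnd) {
        D3DHAL_DP2COMMAND *pCmd = (D3DHAL_DP2COMMAND *)p;
        p += sizeof(D3DHAL_DP2COMMAND);

        switch (pCmd->bCommand) {
        case D3DDP2OP_TEXTURESTAGESTATE:
            // wStateCount D3DHAL_DP2TEXTURESTAGESTATE structures follow the command.
            SetTextureStageStates(pCtx, (D3DHAL_DP2TEXTURESTAGESTATE *)p, pCmd->wStateCount);
            p += pCmd->wStateCount * sizeof(D3DHAL_DP2TEXTURESTAGESTATE);
            break;

        case D3DDP2OP_RENDERSTATE:
            // wStateCount D3DHAL_DP2RENDERSTATE structures follow the command.
            SetRenderStates(pCtx, (D3DHAL_DP2RENDERSTATE *)p, pCmd->wStateCount);
            p += pCmd->wStateCount * sizeof(D3DHAL_DP2RENDERSTATE);
            break;

        default:
            // Primitive-rendering tokens consume wPrimitiveCount primitives; tokens the
            // driver does not recognize are handed to D3DParseUnknownCommand (not shown).
            return;
        }
    }
}
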
Send comments about this topic to Microsoft
Improving Performance of Operation Handling
4/26/2017 • 1 min to read • Edit Online

To improve the performance of your display driver, you should observe the following items when you implement
your driver to render graphics primitives and process state changes:
The DirectX runtime filters redundant requests to set render-state parameters. That is, if an application calls
the IDirect3DDevice8::SetRenderState method multiple times to set the same device render-state
parameter before it renders a scene, the runtime filters out redundant calls, and your driver's
D3dDrawPrimitives2 function only receives one request to set this particular render-state parameter.
Therefore, you do not have to implement your driver to perform this filtering action.
Your driver should only write to render-state registers just before it draws primitives and not every time it
receives an operation request (D3DHAL_DP2OPERATION).
For more information about IDirect3DDevice8::SetRenderState, see the Direct3D SDK documentation.
Send comments about this topic to Microsoft
Return Codes for Direct3D Driver Callbacks
4/26/2017 • 1 min to read • Edit Online

The following table lists values that can be returned by the Direct3D Driver-Supplied Functions. The
DDHAL_DRIVER_Xxx values actually are returned in the DWORD return value. The D3D_OK value, D3DHAL_Xxx
values, and D3DERR_Xxx error codes are returned in the ddrval member of the structure to which the particular
function's parameter points.
For specific error codes that each function can return, see the function and structure descriptions in the reference
section. Refer to Direct3D header files d3d.h and d3dhal.h for a complete listing of error codes and return values
(also, d3d8.h and d3d9.h for DirectX versions 8.0 and 9.0). Note that error codes are represented by negative values
and cannot be combined.
A function in a Direct3D driver must return one of the two return codes: DDHAL_DRIVER_HANDLED or
DDHAL_DRIVER_NOTHANDLED. If the driver returns DDHAL_DRIVER_HANDLED, then it must also return either
D3D_OK or one of the values listed in d3d.h or d3dhal.h. A function in a Direct3D driver can return the values in the
following table. These values are defined in d3d.h and d3dhal.h.

VALUE                              MEANING
D3D_OK (defined as DD_OK)          The request completed successfully.
D3DHAL_CONTEXT_BAD                 The context that was passed in was not valid.
DDHAL_DRIVER_HANDLED               The driver has performed the operation and returned a valid return code for that operation in the ddrval member of the structure passed to the driver's callback. If this code is D3D_OK, Direct3D proceeds with the function. Otherwise, Direct3D returns the error code provided by the driver and aborts the function.
DDHAL_DRIVER_NOTHANDLED            The driver has no comment on the requested operation. If the driver is required to have implemented a particular callback, Direct3D reports an error condition. Otherwise, Direct3D handles the operation as if the driver callback had not been defined by executing the Direct3D device-independent implementation. Direct3D typically ignores any value returned in the ddrval member of that callback's parameter structure.
D3DHAL_OUTOFCONTEXTS               There are no more contexts left in this process.
D3DERR_UNSUPPORTEDCOLOROPERATION   The color operation is not supported.

Send comments about this topic to Microsoft


Performing Floating-point Operations in Direct3D
4/26/2017 • 1 min to read • Edit Online

The DirectX runtime saves and restores floating-point state when it calls many of a display driver's Direct3D
callback functions. However, as described in Performing Floating-point Operations in DirectDraw, some of the
driver's Direct3D callback functions must save floating-point state prior to performing floating-point operations
and must restore floating-point state when the operations complete.
The DirectX runtime saves and restores floating-point state as required for the following Direct3D callback
functions:
D3dContextCreate
D3dContextDestroy
D3dDrawPrimitives2
D3dGetDriverState
D3dValidateTextureStageState
For the following callback functions, a Direct3D-supported display driver must save floating-point state before
performing floating-point operations, and restore it when the operations are complete:
D3dCreateSurfaceEx
D3dDestroyDDLocal
D3DBuffer Callbacks
For more information about floating-point operations, see Floating-Point Operations in Graphics Driver Functions.
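For the callbacks in the second list, the save/restore bracket can be built on the EngSaveFloatingPointState and
EngRestoreFloatingPointState GDI functions. The sketch below assumes the usual query-the-size-first pattern and
uses EngAllocMem with a made-up pool tag; check winddi.h for the exact prototypes before relying on this.

// Hedged sketch: bracket floating-point work in a callback such as D3dCreateSurfaceEx.
ULONG cjState = EngSaveFloatingPointState(NULL, 0);      // Query the required buffer size.
if (cjState != 0) {
    VOID *pState = EngAllocMem(FL_ZERO_MEMORY, cjState, 'xD3D');  // 'xD3D' is a made-up tag.
    if (pState != NULL) {
        EngSaveFloatingPointState(pState, cjState);       // Save the current FP state.

        // ... floating-point work for the callback goes here ...

        EngRestoreFloatingPointState(pState);             // Restore the saved FP state.
        EngFreeMem(pState);
    }
}
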
Send comments about this topic to Microsoft
Direct3D Driver Initialization
4/26/2017 • 2 min to read • Edit Online

When the driver's DrvGetDirectDrawInfo function is called by the Microsoft DirectDraw runtime to initialize
DirectDraw support, the driver must do the following to indicate its Microsoft Direct3D capabilities:
Set the DDCAPS_3D flag in the ddCaps.dwCaps member of the DD_HALINFO structure to indicate that the
driver's hardware has 3D acceleration.
Set the DDSCAPS_Xxx flags in the ddCaps.ddsCaps member of the DD_HALINFO structure that describe
the 3D capabilities of a driver's video memory surface. The flags are listed in the following table.

FLAG               MEANING
DDSCAPS_3DDEVICE   Indicates that a driver's surface can be used as a destination for 3D rendering.
DDSCAPS_TEXTURE    Indicates that a driver's surface can be used for 3D texture mapping.
DDSCAPS_ZBUFFER    Indicates that a driver's surface can be used as a Z-buffer.

Set the GetDriverInfo member of the DD_HALINFO structure to point to the driver's DdGetDriverInfo
callback. The driver must also set the DDHALINFO_GETDRIVERINFOSET flag in the dwFlags member of the
DD_HALINFO structure to indicate that it has implemented the DdGetDriverInfo callback.
Allocate and initialize the members of the D3DHAL_CALLBACKS structure and return this structure in the
lpD3DHALCallbacks member of the DD_HALINFO structure.
Allocate and initialize the members of the D3DHAL_GLOBALDRIVERDATA structure and return this
structure in the lpD3DGlobalDriverData member of the DD_HALINFO structure.
To indicate that the driver is capable of working with Microsoft DirectX 7.0, it should do the following:
Include the D3DDEVCAPS_DRAWPRIMITIVES2EX flag in the dwDevCaps member of the
D3DDEVICEDESC_V1 structure that is reported during Microsoft Direct3D driver initialization.
Respond to the GUID_Miscellaneous2Callbacks GUID in DdGetDriverInfo callback by setting the
GetDriverState, CreateSurfaceEx, and DestroyDDLocal members of the
DD_MISCELLANEOUS2CALLBACKS structure. These are set to point to the appropriate callbacks for the
Direct3D driver and ORed in the dwFlags member with the DDHAL_MISC2CB32_CREATESURFACEEX,
DDHAL_MISC2CB32_GETDRIVERSTATE, and DDHAL_MISC2CB32_DESTROYDDLOCAL bits, respectively.
After DrvGetDirectDrawInfo returns, GDI calls the driver's DdGetDriverInfo callback several times for different
GUIDs to complete the driver's initialization. The DdGetDriverInfo callback must respond to the following GUIDs
to support Direct3D:
GUID_D3DCallbacks3
The driver should allocate and initialize the members of the D3DHAL_CALLBACKS3 structure and return this
structure in the lpvData member of the DD_GETDRIVERINFODATA structure.
GUID_Miscellaneous2Callbacks
The driver should allocate and initialize the members of the DD_MISCELLANEOUS2CALLBACKS structure and
return this structure in the lpvData member of the DD_GETDRIVERINFODATA structure.
GUID_D3DExtendedCaps
The driver should allocate and initialize the appropriate members of the D3DHAL_D3DEXTENDEDCAPS structure
and return this structure in the lpvData member of the DD_GETDRIVERINFODATA structure.
GUID_ZPixelFormats
The driver should allocate and initialize the appropriate members of a DDPIXELFORMAT structure for every Z-
buffer format that the driver supports and return these structures in the lpvData member of the
DD_GETDRIVERINFODATA structure. The driver must respond to this GUID if it supports the D3DDP2OP_CLEAR
operation code in its implementation of D3dDrawPrimitives2.
GUID_D3DParseUnknownCommandCallback
The driver should store the pointer to the Direct3D runtime's D3DParseUnknownCommand callback. The
pointer is passed to the driver in the lpvData member of the DD_GETDRIVERINFODATA structure. The driver's
D3dDrawPrimitives2 callback calls the D3DParseUnknownCommand callback to parse commands that the
driver does not recognize.
For more information, see DirectDraw Driver Initialization.
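A condensed sketch of the DrvGetDirectDrawInfo portion of this initialization follows. Only the fields called out
above are shown; g_D3DCallbacks and g_D3DGlobalDriverData are hypothetical driver globals, and
DdGetDriverInfo stands for the driver's own callback.

// Hedged sketch: advertise Direct3D support in DD_HALINFO during DrvGetDirectDrawInfo.
VOID MarkD3DCapabilities(DD_HALINFO *pHalInfo)
{
    pHalInfo->ddCaps.dwCaps |= DDCAPS_3D;                   // Hardware has 3D acceleration.
    pHalInfo->ddCaps.ddsCaps.dwCaps |= DDSCAPS_3DDEVICE |   // 3D render-target surfaces.
                                       DDSCAPS_TEXTURE  |   // Texture surfaces.
                                       DDSCAPS_ZBUFFER;     // Z-buffer surfaces.

    pHalInfo->GetDriverInfo = DdGetDriverInfo;              // The driver's DdGetDriverInfo callback.
    pHalInfo->dwFlags |= DDHALINFO_GETDRIVERINFOSET;

    pHalInfo->lpD3DHALCallbacks     = &g_D3DCallbacks;          // D3DHAL_CALLBACKS
    pHalInfo->lpD3DGlobalDriverData = &g_D3DGlobalDriverData;   // D3DHAL_GLOBALDRIVERDATA
}
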
Send comments about this topic to Microsoft
Direct3D Context Management
4/26/2017 • 1 min to read • Edit Online

A context encapsulates the state information for an application-created Microsoft Direct3D hardware abstraction
layer (HAL) device; that is, a context describes how the driver should draw. State includes information such as the
surface being rendered to, the depth surface, shading information, and texture information.
A Direct3D driver is responsible for creating and managing its own rendering contexts.
Send comments about this topic to Microsoft
Creating and Destroying a Context
4/26/2017 • 1 min to read • Edit Online

A driver must create and initialize a device-specific context that encapsulates the state information that it requires to
perform rendering. State is not shared between contexts; so the driver must maintain full state information for each
context that it creates.
To create a context, the driver should do the following:
Allocate the device-specific context and zero-initialize it.
See D3dContextCreate for additional steps to be done within that callback. The D3dContextCreate
callback is called when an application creates a Direct3D HAL device. The driver must implement this
callback.
The driver must be able to reference all texture handles created by D3dCreateSurfaceEx within a created context.
This enables the driver to clean up all driver-specific data related to textures created within this context when a call
to the D3dContextDestroy function is made.
Direct3D calls D3dContextDestroy when an application requests that a Direct3D HAL device be destroyed. The
driver should free all resources that it allocated to the specified context. These resources include, for example,
texture resources, vertex and pixel shaders, declarations and code for vertex shaders, and resources for
asynchronous queries.
Send comments about this topic to Microsoft
Maintaining State Within a Context
4/26/2017 • 1 min to read • Edit Online

A driver updates its internal state associated with a context when its D3dDrawPrimitives2 callback is called. This
callback must also return the updated context's public state to Direct3D.
Send comments about this topic to Microsoft
Direct3D Surface Handles
4/26/2017 • 1 min to read • Edit Online

The Microsoft DirectX 7.0 device driver interface (DDI) is designed to promote a model whereby the Direct3D
runtime components parse as little of the command stream as possible before handing the commands to the
driver. Additionally, the command stream should be formatted so that it can be used by future hardware.
One important change directed toward these goals is the movement of all surface-related data out of intermediate
structures owned by the Direct3D/DirectDraw runtime into structures owned, updated, and formatted by the driver.
Surfaces are referred to by handles embedded in the command stream. In these high-frequency operations, the
driver can look up its own representation of a surface from the handle, without resorting to locking a surface via
helper functions such as EngLockDirectDrawSurface.
The mechanism for assigning these handles is a driver entry point called D3dCreateSurfaceEx. This entry point is
called directly after calls to the existing DdCanCreateSurface and DdCreateSurface entry points, and after a video
memory address and handle have been assigned to a surface. At D3dCreateSurfaceEx time, the driver copies all
pertinent information out of the DirectDraw runtime's copy of the surface structure and into its own surface
structure. Driver-side copies are required for surface data such as size, format, and fpVidMem (a member of the
DD_SURFACE_GLOBAL structure).
Handles are guaranteed by the runtime to be unique for each device and for each process. Handles are not
guaranteed to be unique for each context, and this has some implications for drivers that are discussed in greater
detail in Creating Driver-Side Surface Structures.
There is no corresponding DestroySurfaceEx call, so driver-side surface structures are destroyed at
DdDestroySurface time.
Send comments about this topic to Microsoft
Creating Driver-Side Surface Structures
4/26/2017 • 1 min to read • Edit Online

The DirectDraw runtime calls the driver's D3dCreateSurfaceEx entry point after it has called the DdCreateSurface
entry point and allocated memory for the surface. The runtime calls D3dCreateSurfaceEx only for those surfaces
tagged with DDSCAPS_TEXTURE, DDSCAPS_EXECUTEBUFFER, DDSCAPS_3DDEVICE, or DDSCAPS_ZBUFFER flags.
Before calling D3dCreateSurfaceEx, the runtime assigns an integer value as a handle to the surface. This value is
stored in the dwSurfaceHandle member of the DDRAWI_DDSURFACE_MORE structure (as pointed to by the
lpSurfMore member of the DDRAWI_DDSURFACE_LCL structure). See DD_SURFACE_MORE and
DD_SURFACE_LOCAL, which are aliases for the DDRAWI_DDSURFACE_MORE and DDRAWI_DDSURFACE_LCL
structures.
These integer values start at one and are kept as small as possible. (Zero is a guaranteed invalid value for a surface
handle.) The intention is that a driver can keep an array of pointers into its own structures. As soon as it receives a
handle (when D3dCreateSurfaceEx is called) that is beyond the end of the array, it can reallocate the array and
continue. The Direct3D runtime passes no handle value to the driver before that handle is shown to the driver via
D3dCreateSurfaceEx. However, the driver should be robust enough to handle values that are out of range, or that
refer to a slot in the handle table that has been freed (that is, a handle for which DdDestroySurface has been called).
Note that since zero is a guaranteed invalid value, the zero entry in the handle table can be reused for other
purposes. The Perm3 sample driver uses the zero entry to store the current length of the array.
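A growable table along these lines might look like the sketch below. MYSURFACE, MYDEVICE, and MyReallocZero
are hypothetical; the Perm3 zero-entry trick is noted in a comment rather than reproduced.

// Hedged sketch of a per-device surface handle table indexed by dwSurfaceHandle.
// (The Perm3 sample instead stores the table length in the unused zero entry.)
typedef struct _MYDEVICE {
    MYSURFACE **ppSurface;     // ppSurface[handle] -> driver-side surface data, or NULL.
    DWORD       dwTableSize;   // Number of entries currently allocated.
} MYDEVICE;

BOOL EnsureSurfaceSlot(MYDEVICE *pDev, DWORD dwSurfaceHandle)
{
    if (dwSurfaceHandle < pDev->dwTableSize)
        return TRUE;                                   // Slot already exists.

    // Handle is beyond the end of the array: reallocate and zero the new slots.
    DWORD dwNewSize = dwSurfaceHandle + 1;
    MYSURFACE **ppNew = MyReallocZero(pDev->ppSurface,
                                      pDev->dwTableSize * sizeof(*ppNew),
                                      dwNewSize * sizeof(*ppNew));
    if (ppNew == NULL)
        return FALSE;

    pDev->ppSurface   = ppNew;
    pDev->dwTableSize = dwNewSize;
    return TRUE;
}

Lookups of out-of-range handles, or of slots already freed by DdDestroySurface, should simply return NULL rather
than faulting.
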
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia3 sample display driver
(Perm3.h). You can get this sample driver from the Windows Server 2003 SP1 Driver Development Kit (DDK), which
you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
Send comments about this topic to Microsoft
Pointers to DirectDraw Surfaces
4/26/2017 • 1 min to read • Edit Online

Driver writers might be tempted to keep a pointer to the DirectDrawSurface data structures inside their private
driver-side surface structures. However, this practice does not succeed on Microsoft Windows 2000 and later
because access to DirectDraw kernel-side data structures is mediated through a management scheme that insulates
these structures from user mode and from drivers. EngLockDirectDrawSurface provides a pointer to the
structure that is valid until the EngUnlockDirectDrawSurface routine is called.
Outside of this lock/unlock pair, the structure is not guaranteed to reside, or even exist, at the same location.
Additionally, these lock/unlock pairs impede performance. If the driver keeps its own copies of the surface
structures, then the locks are not needed. Updates to data within the driver-side surface structures are made during
low-frequency calls like D3dCreateSurfaceEx. The result is that less code must be executed during high-frequency
calls like D3dDrawPrimitives2.
Send comments about this topic to Microsoft
D3dCreateSurfaceEx Handles
4/26/2017 • 1 min to read • Edit Online

Certain situations can cause D3dCreateSurfaceEx to be called for an invalid, destroyed surface. Drivers can
robustly handle this case by simply ignoring any D3dCreateSurfaceEx call for a video memory surface that has an
fpVidMem (a member of the DD_SURFACE_GLOBAL structure) of zero. However, a driver can get a
D3dCreateSurfaceEx call with a system memory surface that has an fpVidMem value of zero, which means that
the system memory surface is in the process of being destroyed. The driver should then free any existing driver-
side data related to this surface.
Send comments about this topic to Microsoft
D3dCreateSurfaceEx and Backing Surfaces
4/26/2017 • 1 min to read • Edit Online

D3dCreateSurfaceEx is also called for backing surfaces, which are system memory persistent copies of managed
surfaces. This allows the driver to allocate a driver-side structure for the surface and respond to the
D3DDP2OP_TEXBLT token for a system to video texture download.
D3dCreateSurfaceEx should never fail based on the format and capabilities of the backing surface requested
because emulation code could support and handle the surface. However, other conditions for failure are valid. For
example, the driver can fail D3dCreateSurfaceEx if it maintains private data structures and runs out of memory
space.
The driver should not fail D3dCreateSurfaceEx for backing surfaces whose pixel format it does not support. Such
surfaces may be created for use with the software rasterizer. The driver should simply ignore backing surfaces it
does not support. (Alternatively, the driver can create a driver-side structure, but the corresponding handle is never
subsequently sent to the driver.)
Failing D3dCreateSurfaceEx for these backing surfaces causes a failure code to be propagated to the application;
the driver can thereby break an application that would otherwise run in emulation-only mode. A driver's response
to such situations can be tested by running the ddtest.exe application included in the DirectX 7.0 SDK. Run ddtest.exe and try to
create a backing surface texture of a format unsupported by the driver, but supported by the DirectDraw emulation
layer (a list of these formats can be found in the DirectDraw SDK documentation).
Send comments about this topic to Microsoft
D3dCreateSurfaceEx and Complex Surfaces
4/26/2017 • 2 min to read • Edit Online

As explained in Creating and Destroying DirectDraw Surfaces in the DirectDraw documentation, creation of a
complex surface causes an array of surfaces to be passed to DdCreateSurface. However, even in complex cases,
only a pointer to the root surface is passed to D3dCreateSurfaceEx. It is necessary for the driver to go through the
attachment lists of the root surface and create driver-side copies for all attached surfaces. This can be a difficult
operation if the driver is attempting to handle creations for cube maps or MIP maps.
There are two types of DirectDraw surface attachments, implicit and explicit. An implicit attachment is formed
during a complex surface creation. For example, when an application creates a primary flipping chain, the primary
and back buffer are created by a single CreateSurface API call and are attached to one another implicitly by the
CreateSurface call. Another example is a MIP map chain where a series of MIP maps are created by a single
CreateSurface API call. An explicit attachment is formed when surfaces created from different CreateSurface calls
are attached explicitly by AddAttachedSurface.
How and when the DirectX runtime calls a driver's D3dCreateSurfaceEx function and how the driver processes
surfaces depends on whether those surfaces are attached implicitly or explicitly. When two surfaces are explicitly
attached, then both of those surfaces were created by separate calls to CreateSurface and each surface will have
resulted in a call to D3dCreateSurfaceEx before the surface attachment was made. However, in the case of an
implicitly attached surface, only a single CreateSurface and D3dCreateSurfaceEx call is made for the entire chain
of surfaces. Therefore, when processing a D3dCreateSurfaceEx call, a driver must run the attached list of surfaces
to identify handles and create driver side data structures for each attached surface. However, the attached surface
list may contain a mixture of implicitly and explicitly attached surfaces. The driver will already have been notified of
explicitly attached surfaces by D3dCreateSurfaceEx and probably will not want to process such surfaces again.
The driver can distinguish between implicit and explicit attachments by means of the DDAL_IMPLICIT flag stored in
the dwFlags field of the DD_ATTACHLIST data structure. If DDAL_IMPLICIT is set in the dwFlags field, the
attachment is implicit and no separate D3dCreateSurfaceEx call is seen for the attached surface. If this flag is not
set, the attachment is explicit and the attached surface results in its own D3dCreateSurfaceEx call. By examining
this flag, the driver can determine whether it must process an attached surface as part of the parent surface's
D3dCreateSurfaceEx call or whether a separate D3dCreateSurfaceEx call will have been made for the attached
surface.
For more information, see the section on surface attachments in DirectDraw Driver Fundamentals and see the
sample code included in D3dCreateSurfaceEx.
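The recursion over implicit attachments might be structured as in the sketch below. CreateDriverSideSurface is a
hypothetical helper, and the lpAttachList, lpLink, and lpAttached member names are assumptions based on the
DirectDraw attachment structures; a real driver must also guard against revisiting surfaces, since flipping chains
form a ring of attachments.

// Hedged sketch: create driver-side structures for a root surface and every
// implicitly attached surface reached from it during D3dCreateSurfaceEx.
VOID ProcessSurfaceAndImplicitAttachments(MYDEVICE *pDev, DD_SURFACE_LOCAL *pSurf)
{
    DD_ATTACHLIST *pAttach;

    CreateDriverSideSurface(pDev, pSurf);

    for (pAttach = pSurf->lpAttachList; pAttach != NULL; pAttach = pAttach->lpLink) {
        // Explicitly attached surfaces get their own D3dCreateSurfaceEx call,
        // so only implicit attachments are followed here.
        // (Cycle detection for flipping-chain rings is omitted for brevity.)
        if (pAttach->dwFlags & DDAL_IMPLICIT) {
            ProcessSurfaceAndImplicitAttachments(pDev, pAttach->lpAttached);
        }
    }
}
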
Send comments about this topic to Microsoft
D3dCreateSurfaceEx and MIP Maps
4/26/2017 • 1 min to read • Edit Online

Each level in a MIP map is associated with a different handle value. These handles might not be consecutive,
however. The Direct3D DDI is designed so that only the top-level surface's handle is passed as an argument in the
IDirect3DDevice7::SetTexture API method (described in the Direct3D SDK documentation), and then the current
level-of-detail is specified by a texture stage state (D3DTSS_MAXMIPLEVEL). The most natural way to work with MIP
maps is to build one driver-side structure that represents the entire MIP map.
Send comments about this topic to Microsoft
D3dCreateSurfaceEx Handles and Flip
4/26/2017 • 1 min to read • Edit Online

DirectDraw surface structures are designed to represent conceptual surfaces, not necessarily specific locations in
video memory. The main usage of this abstraction is in a primary flipping chain, where the application uses one
constant surface object to represent the back buffer, even though the back buffer may be moving around in video
memory as a result of the DdFlip function.
The DdFlip function takes a ring of surfaces, and sequentially reassigns their video memory pointers around this
ring. In the particular case of two surface objects, the process is reduced to trading their video memory pointers. In
addition, the DirectDraw runtime also rotates the D3dCreateSurfaceEx handles associated with each surface, and
the driver-owned contents of the dwReserved1 members of each surface. This behavior has some interesting
consequences for a DirectX 7.0 driver, and effectively rules out the embedding of pointers to DirectDraw surface
structures inside the driver's own surface structures.
Consider two surface objects, A and B, that have associated handles HA and HB, and fpVidMem (a member of the
DD_SURFACE_GLOBAL structure) values of FA and FB. Further, suppose that the application is using surface
structure A to refer to the back buffer of a flipping chain. At DdFlip time, the handles and both fpVidMem values
are swapped, so that surface A has HB and FB, and surface B has HA and FA. The application now tries to draw to the
back buffer, surface A, which should represent the video memory at FB (because the application initiated a call to
DdFlip).
A drawing command is issued to the driver, which looks up the handle associated with that surface (which is now
HB, not HA). What would happen if the driver merely stored a pointer to the DirectDraw surface structure? The driver
looks up HB, then follows the stored pointer to surface B, which now has an fpVidMem value of FA. Drawing begins
on the video memory at FA, which is not what the application is expecting. If, on the other hand, the driver stores
surface data in its own structures, rather than following a pointer to the DirectDraw surface structure, then HB still
resolves to FB, and drawing occurs on the correct surface. This latter case is the way the current DDI is implemented.
Send comments about this topic to Microsoft
D3dCreateSurfaceEx Handles and DirectDraw DDIs
4/26/2017 • 1 min to read • Edit Online

Handles do not completely insulate a DirectX 7.0 driver from the DirectDraw-managed
DDRAWI_DDSURFACE_MORE and DDRAWI_DDSURFACE_LCL structures. These structure names are essentially
aliases for the structures DD_SURFACE_MORE and DD_SURFACE_LOCAL. In DirectDraw DDIs such as DdBlt and
DdFlip, the driver is passed surface structure pointers, and must be able to use these structures instead of its private
representations.
Send comments about this topic to Microsoft
Points to Consider when Restoring Complex Flipping
Chains
4/26/2017 • 1 min to read • Edit Online

When a complex primary surface is created, it might or might not have an attached Z-buffer. When a surface is
restored, it is possible that the application has added an attachment to a Z-buffer. Driver writers should be aware of
these different scenarios when going through surface attachment lists in D3dCreateSurfaceEx.
A typical technique, presented in the Perm3 sample driver, is to mark a surface's dwReserved1 field when
D3dCreateSurfaceEx is called for that surface. The driver only marks the surface if fpVidMem (a member of the
DD_SURFACE_GLOBAL structure) is not equal to zero. This is because fpVidMem could be zero because the
application restored a primary surface that had a back buffer with an explicitly attached Z-buffer, but the Z-buffer
had not yet been restored. At some later time, the application restores the Z-buffer, and the driver then marks it. If
the application restores the Z-buffer before restoring the primary chain, the driver may receive the Z-buffer, already
marked, attached to the back buffer when D3dCreateSurfaceEx is called.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia3 sample display driver
(Perm3.h). You can get this sample driver from the Windows Server 2003 SP1 Driver Development Kit (DDK), which
you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
Send comments about this topic to Microsoft
Lost Surfaces, Surface Handles, and Contexts
4/26/2017 • 1 min to read • Edit Online

Some surfaces should be referred to in a context (such as the render target, Z-buffer, or a texture), typically by
storing the handles for those surfaces in the driver's context. These surfaces may become lost (destroyed) as far as
the driver is concerned. The context itself may survive, but may now have stale (invalid) handles to lost surfaces. The
runtime guarantees that no rendering commands are passed to the context while it is in this state, but there is still
the problem of how to reassociate the context with restored surfaces before rendering can begin again. The runtime
guarantees that the handles for lost surfaces do not change. This in turn guarantees that if a context keeps handles
to its surfaces (render target, Z, and textures), then these surfaces are always recreated (restored) with the same
handle values before rendering can resume on this context.
Some driver writers may wish to optimize the state of a context by storing surface data directly in the context,
rather than doing the work of dereferencing the handles on every D3dDrawPrimitives2 call. Because the cost of
this dereference is likely to be small compared to the cost of executing a batch of D3dDrawPrimitives2 tokens, this
optimization is not encouraged. However, if driver writers wish to do so, then they must be aware that surfaces may
move when they are restored. This means that while the handle may be the same, there is no guarantee that a
surface is restored with the same fpVidMem (a member of the DD_SURFACE_GLOBAL structure) pointer with
which it was created. Contexts that are optimized in this way may end up with stale video memory pointers, and
have no information that the surface has moved. One method to deal with this is that the driver may tag any
surface when it is associated with a context (as render target, Z-buffer, or texture). Then at D3dCreateSurfaceEx
time, it can search for any context that refers to this surface and then update that context.
It is not recommended that a surface keep a pointer to a context, because one surface may be associated with more
than one context.
The concept of lost surfaces was introduced in the DirectDraw SDK documentation. Lost surfaces have some
implications in the DirectX 7.0 DDI model. For more information, see Losing and Restoring DirectDraw Surfaces.
Send comments about this topic to Microsoft
Direct3D Texture Management
4/26/2017 • 1 min to read • Edit Online

Although texture support is optional, most of today's drivers are capable of supporting it. Drivers that support
texture mapping must respond to all of the texture-related operation codes in the Microsoft Direct3D DDI. For
more information about texture-related operation codes, see D3DHAL_DP2OPERATION.
Drivers must also validate the texture stage states with the D3dValidateTextureStageState callback.
The following sections describe how drivers implement support for textures:
Multiple Textures
Paletted Textures
Texture Blitting
Driver-Managed Textures
Driver-Managed Resources
Send comments about this topic to Microsoft
Multiple Textures
4/26/2017 • 1 min to read • Edit Online

Direct3D drivers can support the simultaneous use of multiple textures by using texture-stage states of the texture
stage state type D3DTEXTURESTAGESTATETYPE, which is described in the DirectX SDK documentation. This type
allows all the properties of a texture to be defined and combined with vertex extensions that specify independent
sets of texture coordinate data.
Adding multiple texture support for Direct3D drivers requires setting the correct capability bits (Caps),
implementing texture blending, and implementing D3dValidateTextureStageState.
To be compliant with DirectX 6.0 and later versions, a driver is required to properly parse up to eight texture
coordinate sets, even if the device can only iterate and use the number of coordinates defined in the dwFVFCaps
member of the D3DHAL_D3DEXTENDEDCAPS structure. The driver uses D3DTSS_TEXCOORDINDEX to get the
correct coordinates to use for texturing.
Flexible vertex formats (FVF) allow multiple texturing because they make it possible to pass more than one texture
coordinate in the vertex structure. Multiple textures can then be blended together in an iterated process and applied
to a piece of geometry.
Texture handles are no longer generated by Direct3D drivers. Instead, the texture handles are generated by the
Direct3D runtime. Texture cache management is done completely by the Direct3D runtime so that, to the driver,
textures always appear to come from the application itself. All texture state is sent to the driver in the
D3dDrawPrimitives2 command stream.
With the addition of multiple texturing, the methods for blending and texture filtering have also been refined to
provide a clearer and more well-defined mechanism for blending. For more information about these blending and
texture filtering mechanisms, see the Microsoft DirectX SDK documentation.
If a driver sets the D3DDEVCAPS_SEPARATETEXTUREMEMORIES flag in the DevCaps member of the D3DCAPS8
structure, it indicates to DirectX 8.0 and later applications that they cannot use multiple textures simultaneously.
The driver returns a D3DCAPS8 structure in response to a GetDriverInfo2 query as
described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is described in Supporting
GetDriverInfo2.
Send comments about this topic to Microsoft
Texture Stages
4/26/2017 • 1 min to read • Edit Online

The texture stage indicates the location of the texture in the texture pipeline. The stage with the highest number
that has a non-NULL texture is closest to the frame buffer. Each stage is a texture blending unit that performs the operation used
to combine an associated texture onto a polygon, as shown in the following figure.

The current texture enters the stage and is blended with another texture and a diffuse component with the result
being passed forward to the next stage in the texture pipeline (or frame buffer if this is the last stage).
There are eight texture stages, numbered zero through seven, with zero being furthest from the frame buffer, and
corresponding to the render state texture handle D3DRENDERSTATE_TEXTUREHANDLE, which is described in the
DirectX SDK documentation. The driver must handle up to eight texture coordinates, even if the hardware does not
support that many.
In multiple texture rendering, the lower-numbered texture stages are farther away from the frame buffer. The
lowest texture stage in the cascade is picked up and filtered to get a texel, or texture element. A blending operation
occurs between that texel and the next as it cascades down the texture pipeline toward the frame buffer.
For example, if two textures, Texture0 and Texture1, are blended together, the resulting texel enters the rasterization
pipeline just as a single texture would using legacy texturing. With three textures, Texture0 gets blended with
Texture1. The resulting texel is then blended with Texture2 according to some programmable weight. This means
that Texture0 cannot influence Texture2 directly; it can only do so by being blended with Texture1, as illustrated in
the following figure.

Each texture stage introduces one texture into the pipeline. The pixel pipeline is separate and comes after multiple
texture operations. This may include fog application or frame buffer alpha blending.
Send comments about this topic to Microsoft
Processing Texture Stages
4/26/2017 • 1 min to read • Edit Online

The driver uses the D3DDP2OP_TEXTURESTAGESTATE operation code and the D3DHAL_DP2TEXTURESTAGESTATE
structures that follow it in the command stream to process changes to texture-stage states. For information about
how the driver processes operation codes, see Command Stream.
For example, when the operation code is D3DDP2OP_TEXTURESTAGESTATE, and the value of the wStateCount
member of the D3DHAL_DP2COMMAND structure is seven, then seven D3DHAL_DP2TEXTURESTAGESTATE
structures follow before the next D3DHAL_DP2COMMAND instruction is reached. Each
D3DHAL_DP2TEXTURESTAGESTATE structure contains a dwStage member that specifies which stage of the texture
blending pipeline needs to have a texture state change. The TSState member of the same structure specifies which
state of the D3DTEXTURESTAGESTATETYPE enumerated type to set, and the dwValue member of the
D3DHAL_DP2TEXTURESTAGESTATE structure contains the value to which the specified state should be set.
The process is the same for all render states, or any other type of instruction. If the bCommand member of the
D3DHAL_DP2COMMAND structure is D3DDP2OP_RENDERSTATE, then the structure to follow is a
D3DHAL_DP2RENDERSTATE structure and the information in that structure is used to set the render state
accordingly.
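A handler for one such run of structures might look like the following sketch; pCtx and its TssStates array are
hypothetical context fields, and the member names follow the description above. As noted in Improving
Performance of Operation Handling, a driver typically just records the values here and programs the hardware
immediately before drawing.

// Hedged sketch: record a run of D3DDP2OP_TEXTURESTAGESTATE structures in the context.
VOID HandleTextureStageStates(MYCONTEXT *pCtx,
                              const D3DHAL_DP2TEXTURESTAGESTATE *pState,
                              WORD wStateCount)
{
    WORD i;

    for (i = 0; i < wStateCount; i++, pState++) {
        // dwStage selects the blending stage, TSState selects which
        // D3DTEXTURESTAGESTATETYPE value to change, and dwValue is the new value.
        pCtx->TssStates[pState->dwStage][pState->TSState] = pState->dwValue;
    }

    pCtx->bStageStatesDirty = TRUE;   // Flush to hardware just before drawing.
}
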
Rather than using distinct Boolean-valued render states to control the coordinates, each render state value is a set
of flags composed with the D3DWRAP_U and D3DWRAP_V flags (defined in d3dtypes.h). This change was made for
compatibility with higher-dimensional textures.
Other useful information pertaining to multiple texture implementation can be found in the DirectX SDK
documentation, in the sections covering blend equations, semantics of per-texture states, color operations, and
alpha operations. For more information about the texture stage state types enabled for DirectX 6.0 and later
versions, see the D3DTEXTUREOP and D3DTEXTUREFILTERTYPE enumerated types.
Note DirectX 9.0 and later applications can use values in the D3DSAMPLERSTATETYPE enumeration to control the
characteristics of sampler texture-related render states. In DirectX 8.0 and earlier, these sampler states were
included in the D3DTEXTURESTAGESTATETYPE enumeration. The runtime maps user-mode sampler states
(D3DSAMP_Xxx) to kernel-mode D3DTSS_Xxx values so that drivers are not required to process user-mode
sampler states. For more information about D3DSAMPLERSTATETYPE, see the latest DirectX SDK documentation.
Send comments about this topic to Microsoft
Texture Addressing and Filtering Operations
4/26/2017 • 2 min to read • Edit Online

In Direct3D, texture addressing, filtering, and blending operations are performed by a separate logical unit called a
texture stage. Addressing and filtering operations are described here because they form a logical grouping
independent of the blending operations. For further information about texture operations, see the D3DTEXTUREOP
enumerated type in the DirectX SDK documentation.
Although addressing and sampling operations are defined in conjunction with blending operations in DirectX 6.0
and later versions, in later releases of DirectX they are likely to be independent of the blending operations. The
texture stage states listed in the following table are used to set up texture addressing and filtering operations for
each stage in the texture pipeline.

OPERATION DESCRIPTION

D3DTSS_ANISOTROPY Specifies the anisotropic filtering ratio limit. It specifies the maximum aspect ratio of anisotropic filtering to be applied during sampling of this texture.

D3DTSS_MAGFILTER Defines the type of filter used to sample textures when they are being magnified (that is, when one texel is getting stretched onto multiple rendering surface pixels). The filters that can be used for texture magnification are enumerated in D3DTEXTUREMAGFILTER.

D3DTSS_MAXMIPLEVEL Specifies the maximum MIP map level to be used. It indicates that this texture should never sample MIP map levels that are larger than the one indicated. Therefore, the maximum dimension is 2^MAXMIPLEVEL. Zero indicates that there is no limit.

D3DTSS_MINFILTER Defines the type of filtering that is used to sample textures when they are being minified, that is, when one texel is mapped onto less than one screen pixel. The filters that can be used to minify textures are enumerated in D3DTEXTUREMINFILTER.

D3DTSS_MIPFILTER Defines the type of filtering that is used to sample between layers of a MIP map. The filters that can be used for this are enumerated in D3DTEXTUREMIPFILTER.

D3DTSS_MIPLEVEL Allows the application to set the MIP level when the hardware cannot. This setting is overridden when the MIP level is determined by hardware.

D3DTSS_MIPMAPLODBIAS A D3DVALUE that specifies the MIP map level of detail (LOD) bias. This bias affects the MIP map level calculation, allowing more or less blurring of textures (and more aliasing) as desired. Units are in MIP levels. Current WHQL/DCT tests require the MIP map LOD bias to operate in the range -3.0 to 3.0.

D3DTSS_TEXCOORDINDEX Specifies the index of a texture coordinate set. This integer indicates the index of the set of texture coordinates from which the addressing unit should sample. These coordinates are listed in the incoming flexible vertex format (FVF) vertex data in numerical order, with zero being the standard DirectX set of texture coordinates, one being a second texture coordinate set, and so on. This allows textures to share sets of texture coordinates as desired.

Note To be Direct3D-compliant, drivers are required to properly parse up to eight texture coordinate sets, even if
the device can only iterate and use the number of coordinates defined in dwFVFCaps. The driver must use
D3DTSS_TEXCOORDINDEX to grab the right coordinates to use for texturing.
Texture Stage Operations

An application performs blending operations for texture stages by calling the


IDirect3DDevice7::SetTextureStageState method. Multiple texture blending operations are performed by a set
of texture blending unit stages. Each of these can be individually programmed to carry out a variety of texture
blending operations, selected by a parameter. For a description of IDirect3DDevice7::SetTextureStageState, see
the Direct3D SDK documentation.
Direct3D does not provide a mechanism for specifying more than one texture being introduced at each blending
stage. Saturation is defined to occur between texture stages in the pipeline, but it should occur as late as possible
within each stage.
The following operations, which are enumerated in D3DTEXTUREOP, are required for PC98 compatibility
compliance:
D3DTOP_DISABLE
D3DTOP_SELECTARG1, D3DTOP_SELECTARG2
D3DTOP_MODULATE
D3DTOP_ADD
D3DTOP_BLENDTEXTUREALPHA
The default values are D3DTOP_MODULATE for the first stage (stage zero), and D3DTOP_DISABLE for all other stages.
D3DTOP_MODULATE is used for the first stage for backward compatibility, but, by default, all texturing should be disabled.
Texture Stage Arguments

Each of the multiple texture blending operations combines two inputs. These can be selected by calling the
IDirect3DDevice7::SetTextureStageState method and specifying one of the following members of the
D3DTEXTURESTAGESTATETYPE enumerated type.

ENUMERATOR MEANING

D3DTSS_ALPHAARG1 Controls first input to alpha operation.

D3DTSS_ALPHAARG2 Controls second input to alpha operation.

D3DTSS_COLORARG1 Controls first input to color operation.

D3DTSS_COLORARG2 Controls second input to color operation.

For a description of IDirect3DDevice7::SetTextureStageState, see the Direct3D SDK documentation.


Texture Argument Flags
At each texture stage, any of the preceding four parameters can be set using the texture argument flags listed in the
following table.

FLAG MEANING

D3DTA_CURRENT The texture argument is the result of the previous blending stage. In the first texture stage (stage zero), this argument is equivalent to D3DTA_DIFFUSE. This can be thought of as the current color of the polygon as each texture is blended onto it. If the previous blending stage uses a bump-map texture (the D3DTOP_BUMPENVMAP operation), the system chooses the texture from the stage before the bump-map texture. (If s represents the current texture stage, and s - 1 contains a bump-map texture, this argument becomes the result output by texture stage s - 2.)

D3DTA_DIFFUSE The iterated color data obtained from the Gouraud interpolators. This is often used as the ARG2 on the first texture, because there is no D3DTA_CURRENT texture color at that point.

D3DTA_TEXTURE The texture bound to this texture stage using the IDirect3DDevice7::SetTexture(n, lpTex3) method (described in the Direct3D SDK documentation), where n is the stage number. IDirect3DDevice7::SetTexture defines which texture object to use for the texture in this stage when D3DTA_TEXTURE is one of the arguments. D3DTA_TEXTURE can be present only in D3DTSS_ALPHAARG1 and D3DTSS_COLORARG1, not in D3DTSS_ALPHAARG2 or D3DTSS_COLORARG2.

D3DTA_TFACTOR A value set in Direct3D with D3DRENDERSTATE_TEXTUREFACTOR.

Note Some implementations may not be able to simultaneously use both D3DTA_TFACTOR and a D3DTA_DIFFUSE
color.
Modifier Flags
The two values listed in the following table should be combined with one of the preceding flags using the bitwise
OR operator.

VALUE MEANING

D3DTA_ALPHAREPLICATE Indicates that this argument should have its alpha channel
replicated to all color channels before use in this
operation. If this is a texture with only one component, it
is automatically replicated to all color channels for these
operations. This flag need not be specified for the
ALPHA_ARGs, but using it does not produce an error.

D3DTA_COMPLEMENT Indicates that this argument should be inverted for the operation.

Defaults
The following default values are used if a state is not filled in by the application. While these default values have
been defined to make multiple texture operations more convenient, robust code always fully specifies the desired
state.
D3DTSS_COLORARG1 and D3DTSS_ALPHAARG1 both default to D3DTA_TEXTURE, if the corresponding texture has
been set. If no texture has been set for this stage, then these both default to D3DTA_DIFFUSE.
D3DTSS_COLORARG2 and D3DTSS_ALPHAARG2 default to D3DTA_CURRENT. Note that D3DTA_CURRENT
defaults to D3DTA_DIFFUSE on the first stage (except as noted in the description of D3DTA_CURRENT).
ARG2 defaults to D3DTA_DIFFUSE, but is ignored because the operation defaults to D3DTOP_SELECTARG1.
D3DTA_DIFFUSE defaults to 0xFFFFFFFF if no diffuse color is specified in the flexible vertex format (FVF) data.
D3DTA_SPECULAR defaults to 0x00000000 if no specular color is specified in the FVF data.
D3DTA_CURRENT defaults to D3DTA_DIFFUSE if this is the first stage except when the previous blending stage is a
D3DTOP_BUMPENVMAP or D3DTOP_BUMPENVMAPLUMINANCE color operation. In this case, the following
occurs:
If the previous stage is D3DTOP_BUMPENVMAP or D3DTOP_BUMPENVMAPLUMINANCE, then this
argument is the result of the stage before the previous stage.
In the second texture stage (stage one), this argument defaults to D3DTA_DIFFUSE.
D3DTA_TEXTURE is a value for a D3DTSS_COLORARG1 or D3DTSS_ALPHAARG1 state of any stage, or defaults to
0x0 if no texture is bound to this stage.
Multiple Texture Validation

Current hardware does not necessarily implement everything that Direct3D can express. The application determines
whether a particular blending operation can be performed by first setting up the desired blending mode, and then
calling the IDirect3DDevice7::ValidateDevice method. The driver must accurately report its capabilities at
initialization time and support D3dValidateTextureStageState to allow its capabilities to be validated. Validation
also covers operations specified at the TBLEND level. For information about IDirect3DDevice7::ValidateDevice,
see the Direct3D SDK documentation.
The following table lists the return codes for IDirect3DDevice7::ValidateDevice.

RETURN CODE MEANING

CONFLICTINGTEXTUREFILTER The hardware cannot do trilinear filtering and multiple texturing at the same time.

TOOMANYOPERATIONS The hardware cannot handle the specified number of operations.

UNSUPPORTEDALPHAARG The specified alpha argument is unsupported.

UNSUPPORTEDALPHAOPERATION The specified alpha operation is unsupported.

UNSUPPORTEDCOLORARG The specified color argument is unsupported.

UNSUPPORTEDCOLOROPERATION The specified color operation is unsupported.

UNSUPPORTEDFACTORVALUE The hardware cannot support a D3DTA_TFACTOR value greater than 1.0.

WRONGTEXTUREFORMAT The hardware cannot support the current state in the selected texture format.
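To give a feel for the driver side of this validation, the following sketch shows the general shape of a D3dValidateTextureStageState implementation. The D3DHAL_VALIDATETEXTURESTAGESTATEDATA member names and the D3DERR_* codes are taken from the legacy DDK headers and should be verified there; the MyDriver_* checks are hypothetical stand-ins for hardware-specific capability tests.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers defining the callback data and D3DERR_* codes */

/* Hypothetical hardware-specific checks against the currently set stage states. */
BOOL  MyDriver_FiltersConflict(ULONG_PTR dwhContext);
BOOL  MyDriver_ColorOpSupported(ULONG_PTR dwhContext);
DWORD MyDriver_CountPasses(ULONG_PTR dwhContext);

DWORD APIENTRY D3dValidateTextureStageState(D3DHAL_VALIDATETEXTURESTAGESTATEDATA *pData)
{
    pData->dwNumPasses = 0;

    if (MyDriver_FiltersConflict(pData->dwhContext)) {
        pData->ddrval = D3DERR_CONFLICTINGTEXTUREFILTER;   /* e.g. trilinear plus multitexture */
        return DDHAL_DRIVER_HANDLED;
    }
    if (!MyDriver_ColorOpSupported(pData->dwhContext)) {
        pData->ddrval = D3DERR_UNSUPPORTEDCOLOROPERATION;
        return DDHAL_DRIVER_HANDLED;
    }

    /* Everything checks out: report how many passes the current setup needs. */
    pData->dwNumPasses = MyDriver_CountPasses(pData->dwhContext);
    pData->ddrval = DD_OK;
    return DDHAL_DRIVER_HANDLED;
}

Only two of the possible failure codes are shown; a real implementation would walk every active stage and report the first problem it finds.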

Paletted Textures

Direct3D allows palettes to be used with textures. A palette can be attached to a texture, just as it can to any other
DirectDrawSurface object. To support paletted textures, drivers must respond to the D3DDP2OP_SETPALETTE and
D3DDP2OP_UPDATEPALETTE operation codes in their implementation of D3dDrawPrimitives2. These operation
codes are followed by D3DHAL_DP2SETPALETTE and D3DHAL_DP2UPDATEPALETTE structures, respectively, in
the command stream. D3DDP2OP_SETPALETTE creates an association between a palette handle and a surface
handle (already created by D3dCreateSurfaceEx). Later, D3DDP2OP_UPDATEPALETTE can be sent multiple times
to set the values of the palette entries for this texture.
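A driver's command-buffer parser might handle these two operation codes along the lines of the following sketch. The structure member names (dwPaletteHandle, dwSurfaceHandle, wStartIndex, wNumEntries), the use of the command's count field for D3DDP2OP_SETPALETTE, and the convention that PALETTEENTRY data follows the D3DHAL_DP2UPDATEPALETTE structure are assumptions drawn from the legacy d3dhal.h and should be verified there; the MyDriver_* helpers are hypothetical.

#include <d3dhal.h>   /* assumption: legacy DDK header defining the DP2 palette structures */

/* Hypothetical driver bookkeeping for palette/surface associations. */
void MyDriver_AttachPalette(DWORD dwPaletteHandle, DWORD dwSurfaceHandle, DWORD dwFlags);
void MyDriver_WritePaletteEntries(DWORD dwPaletteHandle, WORD wStart, WORD wCount, const PALETTEENTRY *pEntries);

/* Called while walking the command buffer; returns the address just past the parsed data. */
const BYTE *MyDriver_HandlePaletteOp(const D3DHAL_DP2COMMAND *pCmd, const BYTE *pData)
{
    WORD i;

    if (pCmd->bCommand == D3DDP2OP_SETPALETTE) {
        const D3DHAL_DP2SETPALETTE *pSet = (const D3DHAL_DP2SETPALETTE *)pData;
        for (i = 0; i < pCmd->wStateCount; i++, pSet++) {   /* count field from D3DHAL_DP2COMMAND (assumption) */
            MyDriver_AttachPalette(pSet->dwPaletteHandle, pSet->dwSurfaceHandle, pSet->dwPaletteFlags);
        }
        return (const BYTE *)pSet;
    } else {   /* D3DDP2OP_UPDATEPALETTE: one structure followed by the PALETTEENTRY data */
        const D3DHAL_DP2UPDATEPALETTE *pUpd = (const D3DHAL_DP2UPDATEPALETTE *)pData;
        const PALETTEENTRY *pEntries = (const PALETTEENTRY *)(pUpd + 1);
        MyDriver_WritePaletteEntries(pUpd->dwPaletteHandle, pUpd->wStartIndex, pUpd->wNumEntries, pEntries);
        return (const BYTE *)(pEntries + pUpd->wNumEntries);
    }
}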
Texture Blitting

An important change to the Direct3D DDI, introduced in DirectX 7.0, is that textures are blitted by embedding a
token in the D3dDrawPrimitives2 command stream. This token is D3DDP2OP_TEXBLT, and it signals the driver
that a texture has to be transferred from a backing surface into local or nonlocal video memory.
Also, instead of the driver being responsible for creating internal handles for textures through the legacy
D3dTextureCreate and D3dTextureDestroy callbacks, the runtime now assigns a handle number to each
DirectDrawSurface object that is created within the Direct3D context. The driver is signaled about this handle
number through the D3dCreateSurfaceEx callback.
D3dCreateSurfaceEx is called after every hardware abstraction layer (HAL) DdCreateSurface call is finished.
D3dCreateSurfaceEx is also called after every internal hardware emulation layer (HEL) CreateSurface call is
finished. The HEL call usually occurs when a backing DirectDrawSurface object is created. These calls may occur
before and after a Direct3D context is created with D3dContextCreate.
Also, when the application is running, a call is made to D3dDestroyDDLocal to clean up and destroy any driver
data created explicitly for these surfaces. This call is also made before a Direct3D context is created. This is done to
ensure that there are no dirty handles associated with any contexts that have not been cleaned up. This is simply a
preventative measure that should not actually destroy anything if contexts are properly cleaned up after use.
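The handle scheme described above implies that the driver keeps a per-device lookup table that maps the runtime-assigned surface handles to its own surface data; D3DDP2OP_TEXBLT then names its source and destination surfaces by those handles. The following sketch shows one possible arrangement. The table, the MYSURFACE type, and the helper names are hypothetical, and the D3DHAL_DP2TEXBLT member names (dwDDSrcSurface, dwDDDestSurface) should be checked against the legacy d3dhal.h.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers */

#define MYDRIVER_MAX_SURFACE_HANDLES 4096   /* hypothetical fixed-size table */

typedef struct _MYSURFACE {
    void  *pSysMem;        /* backing (system memory) copy */
    void  *pVidMem;        /* video/nonlocal memory copy; NULL until uploaded */
    DWORD  dwWidth, dwHeight, dwPitch;
} MYSURFACE;

/* Indexed by the handle the runtime assigned and reported through D3dCreateSurfaceEx. */
static MYSURFACE *g_SurfaceTable[MYDRIVER_MAX_SURFACE_HANDLES];

/* Record the association when D3dCreateSurfaceEx reports a new handle. */
void MyDriver_RegisterSurface(DWORD dwSurfaceHandle, MYSURFACE *pSurf)
{
    if (dwSurfaceHandle < MYDRIVER_MAX_SURFACE_HANDLES)
        g_SurfaceTable[dwSurfaceHandle] = pSurf;
}

/* Handle one D3DDP2OP_TEXBLT structure from the command stream. */
void MyDriver_HandleTexBlt(const D3DHAL_DP2TEXBLT *pBlt)
{
    MYSURFACE *pSrc  = g_SurfaceTable[pBlt->dwDDSrcSurface];   /* member names per legacy d3dhal.h (assumption) */
    MYSURFACE *pDest = g_SurfaceTable[pBlt->dwDDDestSurface];

    if (pSrc != NULL && pDest != NULL) {
        /* Upload or copy the texture bits from the backing surface into local or
           nonlocal video memory (hardware-specific), e.g.:
           MyDriver_CopyTexture(pDest, pSrc /*, source rect / dest point from pBlt */);
    }
}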
Driver-Managed Textures

The driver can manage textures that have been marked as manageable. These DirectDrawSurface objects are
marked as manageable with the DDSCAPS2_TEXTUREMANAGE flag in the dwCaps2 member of the structure
referred to by lpSurfMore->ddCapsEx. (lpSurfMore and ddCapsEx are members of the DD_SURFACE_LOCAL
and DD_SURFACE_MORE structures, respectively.)
The driver supports driver-managed textures by setting the dwCaps2 member of the DDCORECAPS structure to
the DDCAPS2_CANMANAGETEXTURE bit. The driver specifies this DDCORECAPS structure in the ddCaps member
of a DD_HALINFO structure. DD_HALINFO is returned by DrvGetDirectDrawInfo in response to the initialization
of the DirectDraw component of the driver.
The driver can then create the necessary surfaces in video or nonlocal memory in a "lazy" fashion. That is, the
driver leaves the textures in backing surfaces until it requires them, which is just before rasterizing a primitive that
makes use of the texture.
Surfaces should be evicted primarily by their priority assignment. The driver responds to the
D3DDP2OP_SETPRIORITY operation code in the D3dDrawPrimitives2 command stream. This operation code sets
the priority for a given surface. As a secondary measure, the driver should use a least-recently-used (LRU) scheme
to evict surfaces. The driver uses this scheme whenever the priority of two or more textures is identical in a
particular scenario. Logically, any surface that is in use should not be evicted at all.
If a driver supports managed surfaces, then the driver may receive a special DdDestroySurface call for a managed
surface in the case where video memory is lost, such as when a mode switch occurs. In this case the
DDRAWISURF_INVALID flag is set, and the driver simply evicts the video memory copy of this managed surface and
keeps other structures intact. Otherwise, the driver performs a regular destroy surface call.
The driver should handle DdBlt and DdLock calls appropriately when managing textures. This is because any
change to the backing surface image must be propagated into the video memory copy of the surface before the
texture is used again. The driver should determine if it is better to update just a portion of the surface or all of it.
For example, if the driver's DdLock function is called to modify only a portion of a backing (system memory) image
of the video memory copy of a surface, then when the driver's DdBlt function is called the driver can optimize the
update by just blitting the necessary subsurface from system memory into video memory.
The driver is allowed to perform texture management in order to perform optimization transformations on the
textures or to decide for itself where and when to transfer textures in memory.
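The eviction policy described above (priority first, least recently used as the tie-breaker, never evict a surface that is in use) might be implemented along the lines of the following self-contained sketch. The MYMANAGEDTEX bookkeeping structure is hypothetical and would normally live in the driver's private surface data.

#include <windows.h>

typedef struct _MYMANAGEDTEX {
    DWORD dwPriority;      /* set via D3DDP2OP_SETPRIORITY */
    DWORD dwLastUseStamp;  /* driver-maintained; updated each time the texture is used */
    BOOL  bInUse;          /* referenced by a command buffer still being rendered */
    BOOL  bInVidMem;       /* has a video memory copy that could be evicted */
} MYMANAGEDTEX;

/* Pick the surface to evict: lowest priority first, least recently used as the tie-breaker. */
MYMANAGEDTEX *MyDriver_ChooseEvictionVictim(MYMANAGEDTEX *pTex, DWORD dwCount)
{
    MYMANAGEDTEX *pVictim = NULL;
    DWORD i;

    for (i = 0; i < dwCount; i++) {
        MYMANAGEDTEX *p = &pTex[i];
        if (!p->bInVidMem || p->bInUse)
            continue;                       /* nothing to evict, or the surface is in use */
        if (pVictim == NULL ||
            p->dwPriority < pVictim->dwPriority ||
            (p->dwPriority == pVictim->dwPriority &&
             p->dwLastUseStamp < pVictim->dwLastUseStamp)) {
            pVictim = p;
        }
    }
    return pVictim;   /* NULL means nothing can be evicted right now */
}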
Driver-Managed Resources

In addition to supporting texture management as described in Driver-Managed Textures, a DirectX 8.1 driver can
also manage resources in general, such as textures, volume textures, cube-map textures, vertex buffers, and index
buffers.
The driver supports driver-managed resources by setting the dwCaps2 member of the DDCORECAPS structure to
the DDCAPS2_CANMANAGERESOURCE bit. The driver specifies this DDCORECAPS structure in the ddCaps
member of a DD_HALINFO structure. DD_HALINFO is returned by DrvGetDirectDrawInfo in response to the
initialization of the DirectDraw component of the driver.
Primitive Drawing and State Changes

All Microsoft Direct3D graphics primitives and state changes are passed to the D3dDrawPrimitives2 callback in
command and vertex buffers. The driver must parse these buffers and process all drawing and state change
requests.
The following sections discuss the layout of command and vertex buffers and describe how the driver should
process them:
Command and Vertex Buffers
Direct3D Command Buffers
Direct3D Vertex Buffers
Accelerated State Management
Command and Vertex Buffers

The D3dDrawPrimitives2 DDI uses two types of buffers: command buffers and vertex buffers. Command buffers
contain instructions followed by data in a structure similar to that of execute buffers. Command buffers might
contain indexed and nonindexed primitives and, occasionally, inline vertex data. Command buffers can be either
API-level execute buffers or Direct3D internal command buffers. For a description of the main input structure, see
D3DHAL_DRAWPRIMITIVES2DATA.
With internal command buffers, the driver allocates the memory and may do multibuffering. Internal command
buffers are write-only. The instruction format can be seen in D3DHAL_DP2COMMAND.
If the D3DHALDP2_USERMEMVERTICES flag is set, the vertex buffer is specified by a user-memory pointer.
Otherwise, the vertex buffer is a DirectDrawSurface that can be an API-level execute buffer, an internal implicit
vertex buffer, or an API-level vertex buffer.
The vertex buffer API can create, destroy, lock and unlock vertex buffers, and can use the
IDirect3DVertexBuffer7::ProcessVertices method to process vertices from source to destination buffers. The
IDirect3DDevice7::DrawPrimitiveVB and IDirect3DDevice7::DrawIndexedPrimitiveVB methods are the
primary API-level calls. Vertex buffers can also be optimized, but optimized vertex buffers cannot be locked. For
descriptions of these three methods, see the Direct3D SDK documentation.
Command and Vertex Buffer Allocation

There are three types of buffers used in Direct3D:


Implicit vertex buffers, which are created for internal use only; that is, applications are unaware of them. One
implicit vertex buffer is always created after context creation, and Direct3D stores vertex data in it.
Explicit vertex buffers, which are created only in response to an application request. Direct3D then stores
vertex data in explicit vertex buffers.
Command buffers, which are created for internal use only; that is, applications are unaware of command
buffers. Direct3D stores command data in command buffers.
Implicit vertex buffers are special vertex buffers used internally by Direct3D for batching. They are created during
device initialization and can be multibuffered. They are always read/write so they should not be put in video
memory (for Microsoft DirectX 6.0 and later versions). This type of buffer is marked by the absence of both the
DDSCAPS2_VERTEXBUFFER and DDSCAPS2_COMMANDBUFFER flags.
Explicit vertex buffers are created and controlled by the application. These cannot be multibuffered and cannot be
put into local or nonlocal video memory unless the DDSCAPS_WRITEONLY flag is set. Explicit vertex buffers are
marked with DDSCAPS_VERTEXBUFFER.
Command buffers are used by Direct3D to batch commands. They can be multibuffered and are used for all APIs
except for TLVERTEX or unclipped execute-buffer API calls. This type of buffer is marked by the flag
DDSCAPS2_COMMANDBUFFER. They are always write-only, though no explicit flag is set and they never contain
invalid instructions.
By default, the Direct3D runtime allocates all of these buffers. Implicit vertex buffers and command buffers are
accessed through the surfaces with which they are associated. All buffers are passed to the driver's
D3dDrawPrimitives2 callback.
Driver-Allocated Vertex and Command Buffers
A Direct3D driver optionally performs the allocation of vertex and command buffers by supplying callback
functions. To supply these callback functions, the Direct3D driver fills out a DD_D3DBUFCALLBACKS structure and
points the lpD3DBufCallbacks member of the DD_HALINFO structure to it. DD_HALINFO is returned by
DrvGetDirectDrawInfo in response to the initialization of the DirectDraw component of the driver. The callbacks
reported in the DD_D3DBUFCALLBACKS structure are:
CanCreateD3DBuffer
CreateD3DBuffer
DestroyD3DBuffer
LockD3DBuffer
UnlockD3DBuffer
These functions are called in the same way as the DdXxxSurface callbacks (such as DdCanCreateSurface) and only
when the DDSCAPS_EXECUTEBUFFER flag is set. The buffer creation flags are DDSCAPS_WRITEONLY,
DDSCAPS2_VERTEXBUFFER, and DDSCAPS2_COMMANDBUFFER.
Drivers determine the type of buffer being requested by checking the ddsCaps member of the
DD_SURFACE_LOCAL structure passed to the CanCreateExecuteBuffer and CreateExecuteBuffer callbacks for
the following flags:
DDSCAPS_VERTEXBUFFER indicates that the driver should allocate an explicit vertex buffer.
DDSCAPS_COMMANDBUFFER indicates that the driver should allocate a command buffer.
If neither flag is set, the driver should allocate an implicit vertex buffer.
The driver internally allocates vertex and command buffers and cycles through these buffers. Direct3D fills a given
pair while the hardware asynchronously renders from the other queued buffers. This is very useful with direct
memory access (DMA).
Buffers in a multibuffering set can be in different memory types, that is, in system or video memory. When the
driver is called to create the first buffer, it creates the set immediately and returns the first buffer in the set to
Direct3D. The driver uses flags to specify the type of memory that it used to allocate each buffer in the set. The
driver should return a new buffer in system memory for each call to D3dDrawPrimitives2 if the
D3DHALDP2_SWAPVERTEXBUFFER or D3DHALDP2_SWAPCOMMANDBUFFER flag is set. If the returned buffer is in
video memory, the corresponding D3DHALDP2_VIDMEMVERTEXBUF or D3DHALDP2_VIDMEMCOMMANDBUF flag
should be set.
Occasionally, Direct3D requests the minimum size for the next buffer. If the size is too large, the driver should
allocate the buffer in system memory (a backing surface). If the size is too small, the driver is permitted to provide a
larger buffer. The driver should keep track of how many buffers it has allocated and which type of memory each occupies, and it should clean up everything on exit.
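The following fragment illustrates the flag handshake just described inside D3dDrawPrimitives2. It touches only the dwFlags and dwhContext members of D3DHAL_DRAWPRIMITIVES2DATA; the ring-buffer bookkeeping (MyDriver_NextCommandBuffer and the storing of the new buffer) is hypothetical and driver-specific.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers */

/* Hypothetical: returns the next free buffer in the driver's ring and whether it lives in video memory. */
void *MyDriver_NextCommandBuffer(ULONG_PTR dwhContext, BOOL *pbInVidMem);

void MyDriver_SwapBuffersIfRequested(D3DHAL_DRAWPRIMITIVES2DATA *pDP2Data)
{
    BOOL bInVidMem = FALSE;

    if (pDP2Data->dwFlags & D3DHALDP2_SWAPCOMMANDBUFFER) {
        /* Hand the runtime a fresh command buffer so the hardware can keep
           consuming the one it was just given (multibuffering). */
        void *pNext = MyDriver_NextCommandBuffer(pDP2Data->dwhContext, &bInVidMem);
        /* ...store pNext back through the appropriate surface/size fields for this driver... */
        (void)pNext;

        if (bInVidMem)
            pDP2Data->dwFlags |= D3DHALDP2_VIDMEMCOMMANDBUF;   /* tell the runtime where the new buffer lives */
    }

    /* The vertex buffer side works the same way with D3DHALDP2_SWAPVERTEXBUFFER
       and D3DHALDP2_VIDMEMVERTEXBUF. */
}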
Direct3D Command Buffers

The following figure shows portions of a sample logical command buffer. The driver's D3dDrawPrimitives2
callback receives a pointer to a command buffer in the lpDDCommands member of the
D3DHAL_DRAWPRIMITIVES2DATA structure. The command buffer is always processed sequentially.

As shown in the preceding figure, a command buffer contains D3DHAL_DP2COMMAND structures, where the
bCommand member of each structure identifies a command. The following lists possible commands:
D3DDP2OP_RENDERSTATE indicates that there are wStateCount D3DHAL_DP2RENDERSTATE structures
that follow in the command buffer. The driver should parse the state from each of these structures and
update its private driver state accordingly. The driver should also update the appropriate state in the array to
which lpdwRStates points. If the driver does not support the state requested in the command buffer, the
driver should override the requested value with one that it supports.
D3DDP2OP_TEXTURESTAGESTATE indicates that there are
wStateCount D3DHAL_DP2TEXTURESTAGESTATE structures that follow in the command buffer. The
driver should parse the state from each of these structures and update the driver's texture state associated
with the specified texture stage accordingly. The driver does not report texture stage state back to the
Direct3D runtime.
A driver is required to properly parse up to eight texture coordinate sets regardless of how many coordinate
sets it actually uses.
D3DDP2OP_VIEWPORTINFO indicates that there is one D3DHAL_DP2VIEWPORTINFO structure that
follows in the command buffer. The driver should parse this structure and update the viewport information
stored in the driver's internal rendering context.
D3DDP2OP_WINFO indicates that there is one D3DHAL_DP2WINFO structure that follows in the command
buffer. The driver should parse this structure and update the w-buffer information stored in the driver's
internal rendering context.
Any of the remaining D3DDP2OP_Xxx commands indicate that there is enough data following in the
command buffer to render wPrimitiveCount (a member of the D3DHAL_DP2COMMAND structure)
primitives. Depending on the primitive command, the driver should parse D3DHAL_DP2Xxx structures from
the command buffer and vertex-associated data from either or both the vertex buffer and command buffer.
The driver must attempt to process all valid D3DDP2OP_Xxx commands; that is, the driver cannot choose to
ignore certain defined primitive types. For more information, see the individual D3DHAL_DP2Xxx structure
reference pages.
Depending on the current command, the following additional information is stored in the command buffer:
Index information for all D3DDP2OP_INDEXEDXxx primitive commands.
Vertex data for the D3DDP2OP_TRIANGLEFAN_IMM and D3DDP2OP_LINELIST_IMM primitive commands.
Additional operations are also defined as D3DDP2OP_Xxx opcodes in the D3DHAL_DP2OPERATION enumerated type. These are equivalent to the D3DDP2OP_Xxx commands with the same names.
The command buffer occasionally contains commands that are understood only by Direct3D. If the driver's
D3dDrawPrimitives2 callback does not recognize the command, the driver should call Direct3D's
D3dParseUnknownCommand callback to attempt to parse it. When D3dParseUnknownCommand returns
successfully, the driver should continue parsing and processing the command buffer. If
D3dParseUnknownCommand fails by returning D3DERR_COMMAND_UNPARSED, D3dDrawPrimitives2
should set the following members of the D3DHAL_DRAWPRIMITIVES2DATA structure and return:
In dwErrorOffset, write the offset of the first unhandled D3DHAL_DP2COMMAND structure that is part of
the buffer to which lpDDCommands points.
Set ddrval to D3DERR_COMMAND_UNPARSED.
For information about how to initialize the D3dParseUnknownCommand callback, see Direct3D Driver
Initialization.
To simplify implementation of D3dDrawPrimitives2, driver writers can copy the parsing code from the Perm3
sample code and write driver-specific rendering and state update code only.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia3 sample display driver
(Perm3.h). You can get this sample driver from the Windows Server 2003 SP1 Driver Development Kit (DDK), which
you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
Direct3D is not always informed of the current render states. For example, execute buffers are not inspected by the
runtime before they reach the driver. The driver can keep track of the render state array with the lpdwRStates
member of the D3DHAL_DRAWPRIMITIVES2DATA structure. This is a pointer to the internal render states array
that the driver keeps up to date as state changes occur.
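Putting the pieces together, the outer parsing loop of D3dDrawPrimitives2 typically has the shape shown below. The sketch assumes the driver has already resolved the raw command memory from lpDDCommands plus dwCommandOffset; the dispatcher is a hypothetical placeholder, and the D3dParseUnknownCommand prototype expressed in the typedef follows the legacy d3dhal.h declaration and should be treated as an assumption.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers */

/* Pointer to the runtime's D3dParseUnknownCommand callback, captured at initialization.
   The prototype below follows the legacy d3dhal.h declaration (assumption). */
typedef HRESULT (CALLBACK *MYPARSEUNKNOWNCOMMAND)(LPVOID lpvCommands, LPVOID *lplpvReturnedCommand);
extern MYPARSEUNKNOWNCOMMAND g_pfnParseUnknownCommand;

/* Hypothetical dispatcher for the commands this driver understands; returns the address
   just past the command's data, or NULL if the command is not recognized. */
BYTE *MyDriver_DispatchKnownCommand(ULONG_PTR dwhContext, D3DHAL_DP2COMMAND *pCmd, DWORD *lpdwRStates);

void MyDriver_ParseCommandBuffer(D3DHAL_DRAWPRIMITIVES2DATA *pDP2Data, BYTE *pStart, BYTE *pEnd)
{
    BYTE *p = pStart;   /* pStart/pEnd already resolved from lpDDCommands + dwCommandOffset/Length */

    while (p < pEnd) {
        D3DHAL_DP2COMMAND *pCmd = (D3DHAL_DP2COMMAND *)p;
        BYTE *pNext = MyDriver_DispatchKnownCommand(pDP2Data->dwhContext, pCmd, pDP2Data->lpdwRStates);

        if (pNext == NULL) {
            /* Unknown command: let the runtime try to parse it. */
            LPVOID lpvNext = NULL;
            HRESULT hr = g_pfnParseUnknownCommand((LPVOID)p, &lpvNext);
            if (FAILED(hr)) {
                pDP2Data->dwErrorOffset = pDP2Data->dwCommandOffset + (DWORD)(p - pStart);
                pDP2Data->ddrval = D3DERR_COMMAND_UNPARSED;
                return;
            }
            pNext = (BYTE *)lpvNext;
        }
        p = pNext;
    }
    pDP2Data->ddrval = DD_OK;
}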
Direct3D Vertex Buffers

A vertex buffer contains the vertex data associated with a command buffer's primitives in a call to
D3dDrawPrimitives2. Vertices are represented using the flexible vertex format (FVF), where each vertex can have
the following data associated with it:
Position (x,y,z, and optional w) (required)
Diffuse color (optional)
Specular color (optional)
Texture coordinates (optional). Direct3D can send up to a maximum of eight sets of (u,v) values.
Drivers must provide FVF support.
The actual vertices and the order in which they should be processed depends on the D3DDP2OP_Xxx primitive
command just parsed from the command buffer. For details, see the individual D3DHAL_DP2Xxx structure
reference pages.
Accelerated State Management

Accelerated state management is a mechanism for communicating large state changes across the API and DDI in a
single call. This scheme allows an application to define a collection of state-set calls as a state block defined by a
single integer. Sending this integer as a render state executes all the state changes in one call.
This reduces API overhead by reducing the number of IDirect3DDevice7::SetRenderState method calls required, and can improve the efficiency of drivers by allowing them to "precompile" state changes into their own hardware-specific format when the state block is defined, instead of on execution of each state change. IDirect3DDevice7::SetRenderState is described in the Direct3D SDK documentation.
Most applications render in only a handful of states, so having fine-grained state transitions is seldom important.
What is more important is being able to define blocks of state that can be interchanged as the driver switches
between common rendering scenarios. This is the whole point of accelerated state management.
State-set tokens are used to record the states in the driver. A handle refers to a collection of states. The
D3DHAL_DP2STATESET structure informs the driver about what state-set operations to perform.
If the dwOperation member of the D3DHAL_DP2STATESET structure is set to D3DHAL_STATESETBEGIN, the driver
begins recording the states for the handle contained in the dwParam member. When the driver receives a
dwOperation of D3DHAL_STATESETEND, it stops recording state.
If the dwOperation member is D3DHAL_STATESETDELETE, the state-set referred to by the dwParam handle
should be deleted.
If the dwOperation member is D3DHAL_STATESETEXECUTE, the state block referred to by the dwParam handle
should be applied in the device.
If the dwOperation member is D3DHAL_STATESETCAPTURE, the current state in the driver should be captured in a
specific way, giving a snapshot of the current states defined in the state block. That is, only states that are already in
the state block are captured. Thus, the state block acts as a sort of mask, only recording states that are defined in it.
For example, if there is a D3DRENDERSTATE_ZENABLE render state in the state block, then the current state for
D3DRENDERSTATE_ZENABLE is captured and put in the state block. If there is no D3DRENDERSTATE_ZENABLE in
the state block, then that state is not captured.
Groupings of states are used to make generic state blocks that can be modified slightly for different rendering
scenarios. These predefined groupings (enumerated in D3DSTATEBLOCKTYPE in the DirectX SDK documentation)
define generic state blocks that can be subsequently modified with state changes to accommodate anticipated
recurring rendering scenarios. For example, the driver might create 100 generic predefined state blocks and then
modify each slightly to accommodate a different rendering scenario. The state block type is passed in the sbType
member of the D3DHAL_DP2STATESET structure.
The sbType member is only valid for D3DHAL_STATESETBEGIN and D3DHAL_STATESETEND and specifies the
predefined state block type with one of the following D3DSTATEBLOCKTYPE enumerated types: NULL for no state,
D3DSBT_ALL for all state, D3DSBT_PIXELSTATE for pixel state, and D3DSBT_VERTEXSTATE for vertex state.
The driver should ignore the sbType member unless it implements render state extensions. If the driver implements extended render states, that is, render states beyond those the Direct3D runtime supplies, it can use sbType to determine what type of predefined render states are being used. From this information it can determine how to append the state block appropriately, to support its extensions.
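A driver that records state blocks might dispatch the D3DHAL_DP2STATESET operations as in the following sketch. The dwOperation values and structure members are the ones described above; the MyDriver_* recording functions are hypothetical placeholders for the driver's own bookkeeping.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers defining D3DHAL_DP2STATESET */

/* Hypothetical per-context state block bookkeeping. */
void MyDriver_BeginRecording(ULONG_PTR dwhContext, DWORD dwHandle);
void MyDriver_EndRecording(ULONG_PTR dwhContext);
void MyDriver_DeleteStateBlock(ULONG_PTR dwhContext, DWORD dwHandle);
void MyDriver_ExecuteStateBlock(ULONG_PTR dwhContext, DWORD dwHandle);
void MyDriver_CaptureStateBlock(ULONG_PTR dwhContext, DWORD dwHandle);

void MyDriver_HandleStateSet(ULONG_PTR dwhContext, const D3DHAL_DP2STATESET *pSS)
{
    switch (pSS->dwOperation) {
    case D3DHAL_STATESETBEGIN:   MyDriver_BeginRecording(dwhContext, pSS->dwParam);    break;
    case D3DHAL_STATESETEND:     MyDriver_EndRecording(dwhContext);                    break;
    case D3DHAL_STATESETDELETE:  MyDriver_DeleteStateBlock(dwhContext, pSS->dwParam);  break;
    case D3DHAL_STATESETEXECUTE: MyDriver_ExecuteStateBlock(dwhContext, pSS->dwParam); break;
    case D3DHAL_STATESETCAPTURE: MyDriver_CaptureStateBlock(dwhContext, pSS->dwParam); break;
    default:
        /* sbType-based extensions (see above) would be handled here. */
        break;
    }
}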
Setting the Number of Line Pattern Repetitions

Applications can direct a Direct3D device to render primitives using solid or patterned lines. Applications can also
stretch a particular line pattern if the device supports repeating the pattern. The device's driver must set the
D3DPMISCCAPS_LINEPATTERNREP flag to indicate that the device supports repeating a particular line pattern. How
this flag is set depends on the DirectX version:
For DirectX 7.0 and earlier, set this flag in the dwMiscCaps member of the D3DPRIMCAPS structure.
For DirectX 8.0 and later, set this flag in the PrimitiveMiscCaps member of the D3DCAPSXx structure,
where Xx indicates the DirectX version (for example, D3DCAPS8 for version 8 and D3DCAPS9 for version 9).
D3DCAPS8 and D3DCAPS9 are described in their respective versions of the DirectX SDK documentation.
When applications set the render-state value for the D3DRENDERSTATE_LINEPATTERN (or D3DRS_LINEPATTERN)
render state, they can specify the number of times to repeat the line pattern by setting the wRepeatFactor member
of the D3DLINEPATTERN structure. Applications can set this member to a maximum value of 65535 (16-bit value).
However, hardware only supports a maximum of 255 (8-bit value). Therefore, a display driver must fail a request
that attempts to set the line-pattern-repetition number to a value greater than 255 as an invalid request.
D3DLINEPATTERN is described in the DirectX SDK documentation.
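The check itself is small; a sketch is shown below. It assumes the D3DLINEPATTERN layout from d3dtypes.h, in which wRepeatFactor occupies the low word of the packed render-state value, and the helper name is hypothetical.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers (d3dtypes.h defines D3DLINEPATTERN) */

/* Returns DD_OK if the requested line pattern is acceptable, or an error to place in ddrval. */
HRESULT MyDriver_CheckLinePattern(DWORD dwRenderStateValue)
{
    /* D3DLINEPATTERN packs { WORD wRepeatFactor; WORD wLinePattern; } into the DWORD
       render-state value, so the repeat factor is the low word (layout assumption). */
    WORD wRepeatFactor = LOWORD(dwRenderStateValue);

    if (wRepeatFactor > 255) {
        /* The hardware, per the limit described above, caps the repeat factor at 255;
           reject larger values as invalid. */
        return DDERR_INVALIDPARAMS;
    }
    return DD_OK;
}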
FVF (Flexible Vertex Format)

The driver's D3dDrawPrimitives2 callback receives vertex data in a flexible vertex format (FVF). Because the
vertex format is flexible, there is no comprehensive data structure defined for this data. Drivers must implement
full FVF functionality.
There is an FVF update for Microsoft DirectX 7.0 that includes 1D, 3D, and 4D textures in addition to the usual 2D
textures. For more information about this update, see FVF Update. See the Perm3 sample driver and DirectX SDK
documentation for more information about these topics.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia3 sample display driver
(Perm3.h). You can get this sample driver from the Windows Server 2003 SP1 Driver Development Kit (DDK),
which you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
Determining the Vertex Buffer Data Format

To identify the format of the data in the vertex buffer, a driver should determine the following information:
The dimension of the textures (1D, 2D, 3D, or 4D)
The components that are present in the FVF data
The ordering of the components that are present
FVF Texture Dimension
The driver should determine the dimension of the textures from the D3DTEXTURETRANSFORMFLAGS texture
coordinate count flags (D3DTTFF_COUNTn, described in the DirectX SDK documentation). The number of the count
flag signals how many texture coordinates are present. Note that this does not necessarily equate to the dimension
of the textures themselves, as explained in the following sections.
Nonprojected Textures
The following lists nonprojected textures:
D3DTTFF_COUNT1 indicates that the rasterizer should expect 1D texture coordinates.
D3DTTFF_COUNT2 indicates that the rasterizer should expect 2D texture coordinates.
D3DTTFF_COUNT3 indicates that the rasterizer should expect 3D texture coordinates.
D3DTTFF_COUNT4 indicates that the rasterizer should expect 4D texture coordinates.
Projected Textures
If projected textures are being used, the D3DTTFF_PROJECTED flag is set to indicate that the texture coordinates are
to be divided by the last (COUNTth) element of the texture coordinate set. Thus, for a 2D projected texture, the count
would be three, because the first two elements are divided by the third, resulting in two floats for a 2D texture
lookup. That is, both D3DTTFF_COUNT2 and D3DTTFF_COUNT3 | D3DTTFF_PROJECTED reference a 2D texture.
FVF Vertex Data Components
The driver determines which components are present by analyzing the flags specified in the dwVertexType
member of the D3DHAL_DRAWPRIMITIVES2DATA structure. The following table indicates the bitfields that can
be set in dwVertexType and the components that they identify:

VALUE MEANING

D3DFVF_DIFFUSE Each vertex has a diffuse color.

D3DFVF_SPECULAR Each vertex has a specular color.

D3DFVF_TEX0 No texture coordinates are provided with the vertex data.

D3DFVF_TEX1 Each vertex has one set of texture coordinates.

D3DFVF_TEX2 Each vertex has two sets of texture coordinates.

D3DFVF_TEX3 Each vertex has three sets of texture coordinates.

D3DFVF_TEX4 Each vertex has four sets of texture coordinates.

D3DFVF_TEX5 Each vertex has five sets of texture coordinates.

D3DFVF_TEX6 Each vertex has six sets of texture coordinates.

D3DFVF_TEX7 Each vertex has seven sets of texture coordinates.

D3DFVF_TEX8 Each vertex has eight sets of texture coordinates.

D3DFVF_XYZRHW Each vertex has x, y, z, and w coordinates.

Only one of the D3DFVF_TEXn flags is set.


FVF Vertex Component Ordering
Microsoft Direct3D supplies the driver with vertex data whose components are ordered as shown in the following
figure.

Direct3D always sends x,y,z, and w values; the remaining data is sent only as required by an application. Note that
this diagram assumes 2D texture coordinates, although 1D, 3D, and 4D textures are also valid for the latest DirectX
release.
As shown in the preceding figure, vertex data consists of the following components:
1. Location (x,y,z,w) (required)
The first vertex component is four D3DVALUEs that identify the position of the vertex. Direct3D always sets
the D3DFVF_XYZRHW bit in dwVertexType.
2. Diffuse Color (optional).
If present, this component is a D3DCOLOR value that specifies the diffuse color for this vertex. Direct3D sets
the D3DFVF_DIFFUSE bit in dwVertexType when this component is present.
3. Specular Color (optional).
If present, this component is a D3DCOLOR value that specifies the specular color for this vertex. Direct3D
sets the D3DFVF_SPECULAR bit in dwVertexType when this component is present.
4. Texture Data (optional).
This part varies based on the dimension of the texture. For each dimension in the texture, a D3DVALUE
specifies each of the u, v, w, or q components (see explanation of FVF Texture Dimension). For example, if 2D
nonprojected textures are being used, two D3DVALUEs per texture are needed to specify the vertex's u,v
values for each texture up to eight textures total. The number of u,v pairs present is n, where n corresponds
to the D3DFVF_TEXn flag set in dwVertexType. For example, if D3DFVF_TEX3 is set in dwVertexType, then
three u,v pairs are supplied with each vertex.
FVF data is always tightly packed; that is, no memory is wasted on components that are not explicitly specified in
the vertex buffer. For example, when dwVertexType is (D3DFVF_XYZRHW | D3DFVF_TEX2), and the texture
dimension is 2D, each vertex in the buffer consists of eight tightly packed D3DVALUEs. These specify the location
(x,y,z,w) and texture coordinates for two textures (tu₀, tv₀, tu₁, tv₁) as shown in the following figure:

In the preceding figure it is assumed that there are only two texture coordinates. The vertex data supplied to the
driver is always transformed and lit. The driver never receives normals. All data in the FVF texture coordinate sets
are single precision IEEE floats. For implementation details, see the Perm3 sample driver. For more information
about FVF, see the DirectX SDK documentation.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia3 sample display driver
(Perm3.h). You can get this sample driver from the Windows Server 2003 SP1 Driver Development Kit (DDK), which
you can download from the DDK - Windows Driver Development Kit page of the WDHC website.
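The packing rules above make it straightforward to compute the stride of one vertex from dwVertexType. The following sketch does so under the simplifying assumption that every texture coordinate set has the same dimension; the helper name is hypothetical, and the D3DFVF_TEXCOUNT_MASK and D3DFVF_TEXCOUNT_SHIFT constants come from d3dtypes.h.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers (d3dtypes.h supplies the D3DFVF_* flags) */

/* Compute the size in bytes of one FVF vertex as described above, assuming every
   texture coordinate set has the same dimension (1 to 4 floats). Illustrative only. */
DWORD MyDriver_FvfVertexStride(DWORD dwVertexType, DWORD dwTexCoordDimension)
{
    DWORD dwStride = 4 * sizeof(D3DVALUE);          /* x, y, z, w (D3DFVF_XYZRHW) is always present */
    DWORD dwTexSetCount = (dwVertexType & D3DFVF_TEXCOUNT_MASK) >> D3DFVF_TEXCOUNT_SHIFT;

    if (dwVertexType & D3DFVF_DIFFUSE)
        dwStride += sizeof(D3DCOLOR);               /* packed diffuse color */
    if (dwVertexType & D3DFVF_SPECULAR)
        dwStride += sizeof(D3DCOLOR);               /* packed specular color */

    dwStride += dwTexSetCount * dwTexCoordDimension * sizeof(D3DVALUE);
    return dwStride;
}

For the (D3DFVF_XYZRHW | D3DFVF_TEX2) example above with 2D coordinates, this returns 8 * sizeof(D3DVALUE), or 32 bytes per vertex.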
Advanced Direct3D Driver Topics

This section describes Microsoft Direct3D specialty topics and optimizations. These features are not central to
implementing core functionality in a Direct3D driver; they should be implemented later in the driver development
process, after core driver functionality is completed.
The topics described in this section are:
Optimized Textures
Stencil Planes
Guard Band Clipping
Extents Adjustment
W-Buffering
Bump Mapping
Hardware Transform and Lighting
Optimized Textures

In order to increase performance, some hardware vendors have changed their texture formats from the standard
Microsoft DirectDraw surface format. One approach is to tile the pixels so they are arranged in memory with 2D
locality of reference. For example, instead of arranging the pixels scan line by scan line like this:

0 1 2 3 ... width

width+1 width+2 width+3 width+4 ...

pixels are arranged in 4x8 blocks of DWORDs:

0 1 2 3

4 5 6 7

Thus, a single contiguous 32-byte memory reference pulls in a 4x8 block of pixels for use by the rasterizer. This
layout maps much better to the pattern of memory references typically generated by the rasterizer than does the
standard DirectDraw layout. Such schemes are referred to as 'swizzling' or 'patching' the texture. These operations are relatively fast to perform and do not affect the size of the memory surface allocated. However, the operation still introduces significant latency that applications must control.
Another approach is to compress the texture into video memory and decompress it into a local cache on the
accelerator. This decreases the video memory bandwidth required to keep the rasterizer running at full speed. In
DirectX 5.0, drivers could do this as well, but it is a much slower operation that affects the application's view of
video memory allocation due to the resulting smaller texture.
To accommodate hardware vendors, DirectX 6.0 and later versions enable "optimized" textures that give the driver
an opportunity to translate the surface into a proprietary format when the texture is used. Once optimized, the
surface cannot be locked and may be smaller or larger than the original.
Many 3D accelerator chipsets on the market have a proprietary format that cannot be described with the standard
DirectDraw mechanisms. This feature is intended to cover all hardware with such surface types.
Currently applications have some textures that must be updated frequently, and others that can be left alone for the
duration of the application. These latter benefit from patching due to improved resulting fill rate. The flags shown in
the Optimized Texture API section allow the application to specify this usage scenario to the driver so it can
optimize performance accordingly.
Optimized Texture API

Three new capabilities indicate the level of optimization that can be applied to a DirectDrawSurface object. In
DirectX 6.0 and beyond, only textures can be marked with these caps bits. The optimized surface paradigm may be
extended in the future to cover other types of surfaces, although the semantics may not be the same as for textures.
To address these issues, three new flags can be specified when the surface is created (through the IDirectDraw7::CreateSurface method). When none of these three flags is specified, the decision whether to patch or swizzle is left up to the driver. These flags are as follows:
DDSCAPS2_HINTDYNAMIC
Indicates to the driver that this surface is locked frequently, (for example, once per frame) for uses such as with
streaming video or procedural textures. This cap should work for all driver-enumerated texture surface formats.
The driver should avoid any transformation for these textures, especially if it requires some overhead.
DDSCAPS2_HINTSTATIC
This indicates to the driver that the surface can be reordered/retiled/swizzled in the IDirect3DDevice7::Load and
IDirectDrawSurface7::Blt methods (described in the Direct3D SDK and DirectDraw SDK documentation,
respectively). This operation does not change the size of the texture. It is relatively fast, and symmetrical, because
the application may still lock these bits (although it takes a performance hit when doing so). Drivers are not allowed
to fail locks on these surfaces and therefore cannot use lossy compression techniques. MIP map surfaces can be
interleaved in this case.
This cap is not intended to force swizzling under any circumstances, especially those in which no performance
benefit arises. Some texel formats may silently fail to swizzle.
DDSCAPS2_OPAQUE
This indicates to the driver that this surface will not be accessed by the application again. This flag behaves like the
DDSCAPS2_HINTSTATIC flag, but with the addition of allowing actual compression using a hardware-specific
compression scheme. This operation is relatively slow, but should allow simple, symmetric compression schemes
(such as YUV 4:2:0, or color cell compression) to be used, providing compression ratios from 2 to 6x. Asymmetric
schemes such as VQ should not be used here because they result in unacceptable benchmarks.
MIP map textures can be interleaved arbitrarily by the driver. This technique should probably only be requested
outside of internal rendering loops such as when textures are loaded from disk. Heap size reports after such a
texture is loaded reflect the reduced memory consumption if compression was applied. There is additional header
overhead on textures and therefore compressing many small textures does not save as much memory as might be
expected.
In general, there is no guarantee about texture compression ratio, or compression quality implied by this flag.
Surfaces created with this flag fail in the following cases:
Calls to the IDirectDrawSurface7::Lock method.
Calls to the IDirectDrawSurface7::GetDC method.
Subrectangle blts to such surfaces.
All blts from such surfaces.
The only way to put data into such surfaces is with the IDirect3DDevice7::Load method (described in the
Direct3D SDK documentation), or a full surface blt call. For more information about IDirectDrawSurface7::Lock
and IDirectDrawSurface7::GetDC, see the DirectDraw SDK documentation.
Stencil Planes

Stencil planes enable and disable drawing on a per-pixel basis. They are typically used in multipass algorithms to
achieve special effects, such as decals, outlining, shadows, and constructive solid geometry rendering.
Some hardware designed to accelerate Direct3D implements stencil planes. The special effects enabled by stencil
planes are useful for entertainment applications.
Stencil planes are assumed to be embedded in the z-buffer data.
In DirectX 5.0, applications found the available z-buffer bit depths using the DDBD_Xx flags set in the
dwDeviceZBufferBitDepth member of the D3DDEVICEDESC_V1 structure. To support z-buffers with stencil and
z-buffer bit depths that cannot be represented using the existing DDBD_Xx flags, DirectX 6.0 and later versions have
a new API entry point, IDirect3D7::EnumZBufferFormats (described in the Direct3D SDK documentation), which
returns an array of DDPIXELFORMAT structures describing the possible z-buffer/stencil pixel formats. The
DDPIXELFORMAT structure includes the following new z-buffer-related members:
dwStencilBitDepth
Specifies the number of stencil bits (as an integer, not as a DDBD_Xx flag value).
dwZBitMask
Specifies which bits the z-value occupies. If nonzero, this mask means that the z-buffer is a standardized unsigned
integer z-buffer format.
dwStencilBitMask
Specifies which bits the stencil value occupies.
A new flag, DDPF_STENCILBUFFER, indicates the presence of stencil bits within the z-buffer. The
dwZBufferBitDepth member, which existed previously, gives the total number of z-buffer bits including the
stencil bits.
DirectX 6.0 and later versions drivers should still set the appropriate DDBD_Xx flags in dwDeviceZBufferBitDepth
for the z-only z-buffer formats they support. If stencil planes are not supported and the DDBD_Xx flags can
represent all available z-buffer formats, then setting these flags is sufficient, because they are translated into
DDPIXELFORMAT by IDirect3D7::EnumZBufferFormats. Otherwise, the Direct3D driver must respond to a
DdGetDriverInfo query that uses the GUID_ZPixelFormats GUID by returning a buffer in which the first DWORD
indicates the number of valid z-buffer DDPIXELFORMAT structures, followed by the DDPIXELFORMAT structures
themselves.
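The shape of that DdGetDriverInfo response is sketched below. The DD_GETDRIVERINFODATA member names (lpvData, dwExpectedSize, dwActualSize, ddRVal) and the header that declares the structure follow the NT-based display driver DDK and should be verified against the headers in use; the format table itself is driver-specific, and only its layout in the output buffer (a DWORD count followed by DDPIXELFORMAT structures) is dictated by the query.

#include <winddi.h>   /* assumption: legacy display driver DDK header set declaring DD_GETDRIVERINFODATA */

/* Hypothetical table of z/stencil formats this hardware supports, filled in elsewhere with
   dwZBufferBitDepth, dwStencilBitDepth, dwZBitMask, dwStencilBitMask, and the
   DDPF_ZBUFFER/DDPF_STENCILBUFFER flags. */
extern DDPIXELFORMAT g_ZStencilFormats[];
extern DWORD g_dwZStencilFormatCount;

/* Fragment of DdGetDriverInfo handling the GUID_ZPixelFormats query. */
void MyDriver_ReportZPixelFormats(DD_GETDRIVERINFODATA *pData)
{
    DWORD dwSize = sizeof(DWORD) + g_dwZStencilFormatCount * sizeof(DDPIXELFORMAT);
    BYTE *pOut = (BYTE *)pData->lpvData;

    pData->dwActualSize = dwSize;
    if (pData->dwExpectedSize >= dwSize) {          /* only copy if the runtime's buffer is big enough */
        DWORD i;
        *(DWORD *)pOut = g_dwZStencilFormatCount;   /* first DWORD: number of formats */
        for (i = 0; i < g_dwZStencilFormatCount; i++) {
            ((DDPIXELFORMAT *)(pOut + sizeof(DWORD)))[i] = g_ZStencilFormats[i];
        }
    }
    pData->ddRVal = DD_OK;
}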
New render states associated with stencil planes are shown in the following table, which lists the render state, the
type associated with the render state's value, and a description. For more details on these render states, see the
DirectX SDK documentation.

RENDER STATE TYPE DESCRIPTION

D3DRENDERSTATE_STENCILFUNC (D3DCMPFUNC) Comparison function. The test passes if the following expression is true: (ref & mask) OPERATION (stencil & mask), where ref is the reference value, stencil is the value in the stencil buffer, and mask is the value of D3DRENDERSTATE_STENCILMASK.

D3DRENDERSTATE_STENCILREF (DWORD) Reference value used in the stencil test.

D3DRENDERSTATE_STENCILMASK (DWORD) Mask value used in the stencil test.

D3DRENDERSTATE_STENCILWRITEMASK (DWORD) Write mask applied to any values written to the stencil buffer.

D3DRENDERSTATE_STENCILFAIL, D3DRENDERSTATE_STENCILZFAIL, D3DRENDERSTATE_STENCILPASS (D3DSTENCILOP) These new render states are defined, respectively, to inform the hardware about what to do when the stencil test fails, when the stencil test passes but the z-test fails, and when both the stencil and z-tests pass. The values of these new render states can be set to enumerators of the D3DSTENCILOP enumerated type, which specify the desired stencil operation to be performed. For more information about D3DSTENCILOP, see the DirectX SDK documentation.

Guard Band Clipping

The driver signals that it supports guard band clipping when it fields the GUID_D3DExtendedCaps GUID in
DdGetDriverInfo. A guard band is a rectangle that is potentially larger than the viewport (and even the render
target), to which vertices can be clipped automatically by the driver. The Microsoft Direct3D clipping code clips to
this rectangle instead of to the viewport. By allowing the driver to specify potentially large guard band rectangles,
the need to generate new vertices due to clipping is reduced. One example is a hardware device that can correctly
render as long as screen x and y coordinates fall in the range -2048 through 2047.
Guard band clipping is also beneficial for anti-aliasing hardware, because the filter area can extend outside the
rendering surface extent. This reduces filtering errors that can be introduced if primitives are geometrically clipped
to this extent.
To do the correct clipping, the driver is passed the viewport information. This specifies the actual viewport that the
application requires the geometry to be clipped to. Driver writers who do not want to implement guard band
clipping can ignore this information. It is recommended that drivers do not use this data to implement clipping
through scissors or masking operations because these are likely to be slower than letting Direct3D do the clipping.
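A driver typically advertises its guard band through the extended caps it returns for GUID_D3DExtendedCaps. The following fragment uses the -2048 through 2047 example above; the dvGuardBand* member names are taken from the legacy D3DHAL_D3DEXTENDEDCAPS definition and should be verified against d3dhal.h.

#include <d3dhal.h>   /* assumption: DX7-era DDK headers defining D3DHAL_D3DEXTENDEDCAPS */

/* Fragment of the GUID_D3DExtendedCaps response: advertise the rectangle to which the
   hardware can safely rasterize without clipping (example values for a part that handles
   screen coordinates from -2048 through 2047). */
void MyDriver_SetGuardBandCaps(D3DHAL_D3DEXTENDEDCAPS *pCaps)
{
    pCaps->dvGuardBandLeft   = -2048.0f;   /* member names per the legacy d3dhal.h (assumption) */
    pCaps->dvGuardBandTop    = -2048.0f;
    pCaps->dvGuardBandRight  =  2047.0f;
    pCaps->dvGuardBandBottom =  2047.0f;
}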
Extents Adjustment

Some hardware uses an anti-aliasing kernel that influences pixels outside the extents rectangle defined by the
screen-space vertices. Applications that use the extents rectangle in the D3DCLIPSTATUS structure (defined in
d3dtypes.h) for dirty rectangle processing might experience rendering artifacts because the extents rectangle does
not cover the pixels modified by the hardware.
Direct3D addresses this problem by enabling hardware drivers to request that the extents rectangle be adjusted
outward by a specified number of pixels in the dvExtentsAdjust member of the D3DHAL_D3DEXTENDEDCAPS
structure. This member is filled in response to the GUID_D3DExtendedCaps GUID in DdGetDriverInfo. The extents
rectangle is clipped to the extents of the render target surface for the device. The default is zero.
W-Buffering

Normally, z-buffering uses perspective-correct z for depth comparison and storage in the z-buffer, as this is the z
that the rasterizer iterators must generate in order to maintain planar polygons. Some implementations can
perform hidden surface elimination by filling the z-buffer with depth information expressed as w, or z relative to
the eye. This is what is referred to as w-buffering. This can be achieved by linearly interpolating the vertex 1/w term
specified in the classic transformed and lit vertex structure (TLVERTEX), computing its reciprocal per pixel, and then
using this w value for the depth comparison and conditionally storing it into the depth buffer. For more
information about TLVERTEX, see the Direct3D SDK documentation.
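As a concrete illustration of that interpolation, the following self-contained fragment computes the per-pixel depth for one span from the endpoint 1/w (rhw) values supplied in the transformed-and-lit vertex data. It is a simplified sketch of the math, not a description of any particular hardware.

/* Compute the w-buffer depth for each pixel of a span between two vertices.
   rhw0 and rhw1 are the 1/w values at the span endpoints (from the TLVERTEX data). */
void ComputeSpanWDepth(float rhw0, float rhw1, int nPixels, float *pDepthOut)
{
    int i;
    for (i = 0; i < nPixels; i++) {
        float t = (nPixels > 1) ? (float)i / (float)(nPixels - 1) : 0.0f;
        float rhw = rhw0 + t * (rhw1 - rhw0);   /* 1/w interpolates linearly in screen space */
        pDepthOut[i] = 1.0f / rhw;              /* per-pixel reciprocal gives eye-relative w */
    }
}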
Typically, the hardware stores a floating-point value in the buffer. The following precision formats are common:

SIZE FORMAT

16 bits 12.4

24 bits IEEE single-precision float with no low byte of mantissa.

32 bits Standard IEEE single-precision float.

Conventional z-buffering was developed for the technical markets that use CAD or authoring tools, in which the
viewing volume/workspace is of known and limited extent. The range of depth values stored can therefore be of
limited extent, allowing the ratio of far/near (the distances to the far and near clip planes) to be on the order of two
to ten.
Typical hardware designed for such applications iterates perspective-correct z and stores it directly into the z-
buffer. Due to the mathematics involved, this perspective-correct z is not distributed evenly within the z-buffer
range. Using a far/near ratio of 100 results in 90 percent of the depth buffer range being used on the first 10
percent of the scene depth range. While this may be sufficient for tools, typical applications for entertainment or
visual simulations with exterior scenes require far/near ratios of 1000 to 1 or 10000 to 1. At a ratio of 1000 to 1,
98 percent of the range is used on the first two percent of the depth. This can cause hidden surface artifacts in
distant objects, especially when using 16-bit depth buffers.
By contrast, when w (or eye-relative z) is used, the buffer bits can be more evenly allocated between the near and
far clip planes in world space. The key benefit is that the ratio of far to near is no longer an issue, allowing
applications to support a maximum range of miles, yet still get reasonably accurate depth buffering within inches
of the eye point.
W-Buffering API

The D3DRENDERSTATE_ZENABLE render state supports three settings from the D3DZBUFFERTYPE enumerated
type.

VALUE MEANING

D3DZB_FALSE Disables all depth buffering.

D3DZB_TRUE Enables z-buffer using perspective correct z.

D3DZB_USEW Disables z-buffering but enables w-buffering, which is eye-relative z.

Because the exact format used for storing w varies widely, it should be assumed to be opaque.
Surface allocations and depth-fill operations work identically when using w-buffering. All z-buffer compare modes
work identically in either case.
For more information, see the DirectX SDK documentation.
W-Buffering DDI

The driver supports w-buffering by enabling the D3DPRASTERCAPS_WBUFFER cap in the dwRasterCaps member
of the D3DPRIMCAPS structure. The D3DRENDERSTATE_ZENABLE render state is passed to the driver to enable or
disable w-buffering or z-buffering.
The D3DHAL_DP2VIEWPORTINFO structure supports fields that correspond to world-space front and back clip
planes (hither and yon respectively). This information can be used to adjust fog tables as well.
Bump Mapping

Bump mapping enables a surface to appear wrinkled or dimpled without the need to model these depressions
geometrically. It involves perturbing the angles of the surface normals according to information given in a two-
dimensional "bump map." This causes the local reflection model, where intensity is mainly a function of the surface
normal, to produce local variations on a smooth surface. This deception is evident only when silhouetted, because
in that case the perturbations are no longer visible at the edges. That is, the silhouette follows the line of the model
and is therefore not perturbed. This appears to add texture to a surface, rather than modulating the color of a flat
surface.
Because bump mapping for Direct3D is done by texturing a surface in the rendering phase without perturbing the
geometry, it bypasses the modeling problems that would otherwise occur: if the perturbations were modeled
geometrically, a polygonal mesh would have to be fine enough to receive them from the texture map, which is a
drawback if the bump texture is meant to be optional.
Bump mapping in Direct3D can be described as per-pixel texture coordinate perturbation of diffuse and specular
environment maps. Rasterizers provide information about the contour of the bump map in terms of delta values,
which the system applies to the u and v texture coordinates of an environment map in the next texture stage. The
delta values are encoded in the pixel format of the bump map surface and are integrated with multiple-texture
blending through the D3DTEXTURESTAGESTATETYPE enumerated type. For more information, see the DirectX SDK
documentation.
This technique allows the lighting environment of the scene to be represented in an image environment map
(either for diffuse or specular effects). It permits the lights to be of any number, shape, color, or intensity
distribution that can be represented in such a map. These maps can be created in advance for static cases, or
updated dynamically for changing light sources, by using blts.
Conventional bump mapping schemes derived from Phong shading are limited to spherical light sources of
constant color and fixed fall-off curve. These do not require the additional texture addressing capabilities of per-
pixel environment mapping and can work well for diffuse lighting effects, but cannot produce the visually
structured specular highlights required for photorealism. In the future, use of such techniques will be facilitated by
the integration of environment mapping calculations directly into the Direct3D geometry pipeline.
Bump mapping is commonly provided in photorealistic renderers. It can be used to drastically increase surface
detail effects, without tessellating the surface into large numbers of small triangles. When used with specular
effects, bump maps can simulate reflective yet rough surfaces, such as wet stone or pavement. When supported by
3D accelerators, such effects can be performed in real time, allowing dynamic light source changes.
When a DirectDrawSurface object is created with a bump-map format, that texture is considered to be a bump
map. This object can be bound into the Direct3D device using any of the textures bound to the texture stages. A
programmed texture blending stage can be set to perform the bump map operation by setting
D3DTOP_BUMPENVMAP. This bump-map texture uses the texture coordinates of the flexible vertex format
specified by this stage's texture coordinate identifier to position itself on the rendered objects. It also respects the
specified texture stage's controlling filtering, wrapping, and so on.
The bump values in this texture perturb the texture coordinates used by the immediately following texture, which
should be considered a specular or diffuse environment map.
Emulation
This methodology can be emulated on any hardware that supports an 8-bit paletted texture mode. The restriction is
that the environment map used (the texture immediately following the bump texture map) must have a resolution
of exactly 16x16 texels, with no filtering. This is indicated by the D3DTEXOPCAPS_MAXBUMPENVMAP16X16 flag.
Hardware that does not rely on this emulation places no limit on the size of the environment map that the bump
map can perturb.
Bump mapping is enabled with the multiple-texture blending operations D3DTOP_BUMPENVMAP and
D3DTOP_BUMPENVMAPPREMODULATE. The latter operation is a combination of bump mapping and gloss
mapping that allows the specular reflection intensity to be encoded in the same surface as the bump map data.
There are now four additional texture states provided at each stage:
D3DTSS_BUMPENVMAT00
D3DTSS_BUMPENVMAT01
D3DTSS_BUMPENVMAT10
D3DTSS_BUMPENVMAT11
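As an illustration, an API-level setup might look like the following sketch; pd3dDevice, pBumpMap, pEnvMap, and the matrix values are placeholders, and the bump map is placed on stage 0 so that it perturbs the environment map on stage 1:

    /* Sketch (DirectX 7.0 API level, C++ COM call syntax). */
    float m00 = 0.5f, m01 = 0.0f, m10 = 0.0f, m11 = 0.5f;   /* example 2x2 bump matrix */

    /* Stage 0: the bump map; its delta values perturb the coordinates of stage 1. */
    pd3dDevice->SetTexture(0, pBumpMap);
    pd3dDevice->SetTextureStageState(0, D3DTSS_COLOROP, D3DTOP_BUMPENVMAP);
    pd3dDevice->SetTextureStageState(0, D3DTSS_BUMPENVMAT00, *(DWORD*)&m00);
    pd3dDevice->SetTextureStageState(0, D3DTSS_BUMPENVMAT01, *(DWORD*)&m01);
    pd3dDevice->SetTextureStageState(0, D3DTSS_BUMPENVMAT10, *(DWORD*)&m10);
    pd3dDevice->SetTextureStageState(0, D3DTSS_BUMPENVMAT11, *(DWORD*)&m11);

    /* Stage 1: the environment map whose lookup is perturbed by stage 0. */
    pd3dDevice->SetTexture(1, pEnvMap);
    pd3dDevice->SetTextureStageState(1, D3DTSS_COLOROP, D3DTOP_MODULATE);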
Hardware Transform and Lighting

Hardware acceleration of geometry operations, such as lighting and transformation, has been enabled with
modifications to the D3dDrawPrimitives2 DDI for the latest DirectX release. At the API level, devices that support
vertex operations in hardware are enumerated separately from those that do rasterization only.
The existing caps structures have been extended to indicate features that may be present on a hardware-
accelerated transform device. For example, the number of supported light sources is set with the dwNumLights
member of the D3DLIGHTINGCAPS structure that is reported with the D3DDEVICEDESC_V1 structure.
Other flags are listed in the following table:

FLAG                                  MEANING
D3DDEVCAPS_CANBLTSYSTONONLOCAL        The device supports a texture blt from system memory to nonlocal video memory.
D3DDEVCAPS_DRAWPRIMITIVES2EX          The driver is DirectX 7.0-compliant by supporting extended D3dDrawPrimitives2 capabilities.
D3DDEVCAPS_HWRASTERIZATION            The device has hardware acceleration for rasterization.
D3DDEVCAPS_HWTRANSFORMANDLIGHT        The device supports both transform and lighting in hardware.
D3DDEVCAPS_SEPARATETEXTUREMEMORIES    The device is texturing from separate memory pools.
D3DTRANSFORMCAPS_CLIP                 The hardware can clip while transforming.

Because the feature sets of hardware geometry accelerators may differ (such as the number of light sources
supported), the caps structures indicate which subset of geometry operations this device performs. Zero is a valid
value for the number of light sources supported, indicating that the hardware does transformations only.
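As an illustration, a caller might interpret these capabilities along the following lines; this is only a sketch, assuming the D3DDEVICEDESC_V1 layout with a dwDevCaps field and an embedded D3DLIGHTINGCAPS, and devDesc is a placeholder for a structure already filled in by the driver:

    /* Sketch: interpret the reported geometry-acceleration capabilities. */
    BOOL  hwTnL  = (devDesc.dwDevCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT) != 0;
    DWORD lights = devDesc.dlcLightingCaps.dwNumLights;   /* zero => transform-only hardware */

    if (hwTnL && lights == 0) {
        /* The device transforms in hardware but leaves lighting to the host. */
    }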
Only vertices that include a vertex normal are properly lit; for vertices that do not contain a normal, a dot product of
0 is employed in all lighting calculations.
All the key state and data structures used by the software implementation of the geometry pipeline are made
available at the DDI level. Some display cards only implement lighting in the hardware, and do transformation and
clipping on the host processor.
The following render state types pertain only to devices that accelerate transform and lighting:
D3DRENDERSTATE_AMBIENT
D3DRENDERSTATE_AMBIENTMATERIALSOURCE
D3DRENDERSTATE_CLIPPING
D3DRENDERSTATE_CLIPPLANEENABLE
D3DRENDERSTATE_COLORVERTEX
D3DRENDERSTATE_DIFFUSEMATERIALSOURCE
D3DRENDERSTATE_EMISSIVEMATERIALSOURCE
D3DRENDERSTATE_EXTENTS
D3DRENDERSTATE_FOGVERTEXMODE
D3DRENDERSTATE_LIGHTING
D3DRENDERSTATE_LOCALVIEWER
D3DRENDERSTATE_NORMALIZENORMALS
D3DRENDERSTATE_SPECULARMATERIALSOURCE
D3DRENDERSTATE_VERTEXBLEND

Vertex Blending

Vertex blending operations are supported for the latest DirectX release. Vertex blending works in this way: an
object from modeling space is multiplied by a 4x4 world matrix, placing the model's origin in a particular world
space relative to the origin of that world space. One part of the matrix does orientation and another part does the
position. There can be up to three world matrices applied, allowing objects to be "bent" by blending the vertices
with different weighting over the span of the object.
Next, the view matrix is applied, which effectively compresses the space relative to a particular viewpoint; much like
a camera renders the real world onto a two-dimensional picture.
Multimatrix Vertex Blending

Multimatrix vertex blending is a technique for rendering objects with smoothly blended skin joints. While this
particular technique is not common in current software, some kind of smooth-skinning is currently used by most
games involving animated characters.
There are a variety of geometry blending techniques. The technique described here is a form of geometry blending.
This technique blends vertices across the joints of segmented articulated models, providing increased realism with
reduced polygon count. As the joint changes in each frame, the geometry pipeline updates the positions of nearby
vertices to smoothly blend between the segments at that joint. This capability can operate between segments that
are related by joint types other than rotations, such as translations, scales, shears, or any combination that is able to
be represented by a 4X4 transformation matrix.
The general principle is to extend the typical segmented character by blending mesh vertices at joints.
Segmented models are considered to be built up from segments, or coordinate systems, each defined by a
transformation matrix. Traditionally, each vertex of a mesh is considered to belong to only a single segment. This
paradigm has been extended to allow vertices to have partial membership in other segments. Each vertex near the
joint between two or more segments can have a skin weight Beta value, representing the proportion by which it
actually belongs to the other segments.
When used in classic hierarchical modeling, the root segment gets no blending. It is always a rigid body because
there is no other transform with which to blend. The skin weight Beta is defined as the proportion of the parent
coordinate system that is to be used. This will be 0.0 over the entire object if it is entirely rigid-body.
The Beta will be 1.0 for those portions that are rigidly attached to the parent segment. This will drop as the
proportion of the parent segment's contribution to the blend drops. There must be some set of points with 1.0 in
them (and 0.0) in order to maintain geometric continuity between several segments.
Multimatrix vertex blending allows smooth skin blending to be performed by updating the position of each vertex,
without requiring a separate 4X4 transform to be specified for each vertex. It addresses the common case of a
continuous smooth surface and yet requires only one additional value --the skin weight Beta -- per vertex,
minimizing additional bandwidth requirements.
The specified vertex position data is transformed by all active world matrices, and the results are blended together
using corresponding vertex weights.
The same steps are performed for any normal present in the vertex. This produces a single vertex (with position and
normal) that is then fed into the rest of the pipeline for conventional lighting and clipping. For more details, see
Multimatrix Vertex Blending Algorithm.
Characters often can contain the majority of the polygons in a scene, and are a key performance issue. Geometry
pipelines that do not handle smooth-skinning well will have difficulty with a majority of the content.
Furthermore, the number of characters visible is one of the most highly variable components of the scene. This has
serious impact on frame-rate leveling behavior. For interactive applications, level frame rate is more important than
high average frame rate, making efficient character rendering a very critical feature.
Vertex blending must be performed after transformations. If blending is not integrated into the rendering pipeline,
the transformations must be performed twice: once for skinning and once for rendering.
Integrating vertex blending into the geometry pipeline allows this case to be handled with a single traversal of the
vertex data, which makes it just as fast as unblended vertices in bandwidth limited cases.
The following sections provide implementation details for multimatrix vertex blending:
Multimatrix Vertex Blending Algorithm
FVF Code Changes
D3DTRANSFORMSTATE Changes
D3DRENDERSTATETYPE Changes
Multimatrix Vertex Blending Algorithm

The multimatrix vertex blending algorithm assumes that only two matrices are being used.
On matrix change, update the following four matrices:

MATRIX                                              COMPUTATION
Current Transform Matrix (CTM)                      CTM = WORLD * VIEW * PROJ
Secondary CTM (parent coordinates)                  CTM2 = WORLD1 * VIEW * PROJ
Inverse transpose of CTM                            ITCTM = (CTM^T)^-1
Inverse transpose of CTM2 (required if lighting)    ITCTM2 = (CTM2^T)^-1

In some cases it may be more efficient to blend the matrices first using the vertex's weights, and then do only one
(matrix)X(vertex) multiplication.
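A sketch of the per-vertex work for the two-matrix case follows. Vec4, Mat4, TransformVec4, and MatLerp are hypothetical helpers (not DDI names), beta is the per-vertex skin weight, and CTM/CTM2 are the matrices from the table above:

    /* Sketch: blend a vertex between the primary and parent segments. */
    Vec4 p0 = TransformVec4(&vertexPos, &CTM);    /* vertex transformed by the primary matrix */
    Vec4 p1 = TransformVec4(&vertexPos, &CTM2);   /* vertex transformed by the parent matrix  */
    Vec4 blended;
    blended.x = beta * p0.x + (1.0f - beta) * p1.x;
    blended.y = beta * p0.y + (1.0f - beta) * p1.y;
    blended.z = beta * p0.z + (1.0f - beta) * p1.z;
    blended.w = beta * p0.w + (1.0f - beta) * p1.w;

    /* Equivalent, and sometimes cheaper: blend the matrices with the vertex's weight
       first, then transform the vertex once. */
    Mat4 blendedMatrix = MatLerp(&CTM2, &CTM, beta);   /* beta*CTM + (1 - beta)*CTM2 */
    Vec4 alsoBlended   = TransformVec4(&vertexPos, &blendedMatrix);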
FVF Code Changes

The key API impact of multimatrix vertex blending is the addition of the vertex blending weight parameters to the
position component of the flexible vertex format (FVF). These parameters are stored as 32-bit IEEE single precision
floats. They are indicated as present in the input vertex data by the addition of four new bit patterns for the FVF
code: D3DFVF_XYZB2, D3DFVF_XYZB3, D3DFVF_XYZB4, and D3DFVF_XYZB5.
These codes identify extra DWORDs of space that may alternatively be allocated to other uses, such as particle
radius or fog parameter, depending on which features are enabled.
Note If the number of blend weights specified is one less than the number of matrices currently being blended,
then the weight assigned to the last matrix is defined to be (1.0 - B ), where B is the total of the other weights for
that vertex.
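A sketch of that rule; weights and numWeights are placeholders for the blend weights read from the FVF vertex data:

    /* Sketch: when the vertex supplies one fewer weight than the number of matrices
       being blended, the weight of the final world matrix is implied. */
    float total = 0.0f;
    for (int i = 0; i < numWeights; i++)
        total += weights[i];
    float lastWeight = 1.0f - total;   /* implied weight for the last matrix */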
D3DTRANSFORMSTATE Changes

Multimatrix blending also requires the specification of three additional world transform matrices.
In addition to the original world transform matrices, D3DTRANSFORMSTATE_WORLD (which might be thought of
as "D3DTRANSFORMSTATE_WORLD0"), D3DTRANSFORMSTATE_VIEW, and
D3DTRANSFORMSTATE_PROJECTION, there are now the following world transform matrices, which are described
in the DirectX SDK documentation:
D3DTRANSFORMSTATE_WORLD1, the second matrix to blend
D3DTRANSFORMSTATE_WORLD2, the third matrix to blend
D3DTRANSFORMSTATE_WORLD3, the fourth matrix to blend
Note that these are not consecutively enumerated after the original D3DTRANSFORMSTATE_WORLD.
Matrices that are not defined by this call, but are enabled for blending, are assumed to default to identity matrices.
D3DRENDERSTATETYPE Changes

A new render state has been defined to enable and control multimatrix vertex blending operations:
D3DRENDERSTATE_VERTEXBLEND, which is described in the DirectX SDK documentation. The value of this render
state can be one of the following D3DVERTEXBLENDFLAGS enumerators:
D3DVBLEND_DISABLE (use only the world matrix specified by the D3DTRANSFORMSTATE_WORLD
transformation state)
D3DVBLEND_1WEIGHT (blend between the two world matrices specified by the
D3DTRANSFORMSTATE_WORLD and D3DTRANSFORMSTATE_WORLD1 transformation states)
D3DVBLEND_2WEIGHTS (blend between the three world matrices specified by the
D3DTRANSFORMSTATE_WORLD, D3DTRANSFORMSTATE_WORLD1, and
D3DTRANSFORMSTATE_WORLD2 transformation states)
D3DVBLEND_3WEIGHTS (blend between four world matrices specified by the
D3DTRANSFORMSTATE_WORLD, D3DTRANSFORMSTATE_WORLD1, D3DTRANSFORMSTATE_WORLD2,
and D3DTRANSFORMSTATE_WORLD3 transformation states)
For a description of the D3DTRANSFORMSTATE_WORLD n transformation states, see D3DTRANSFORMSTATETYPE
in the DirectX SDK documentation.
Even though additional blending world matrices have been defined with the IDirect3DDevice7::SetTransform
method (described in the Direct3D SDK documentation), the contributions (that is, weights) of any matrices beyond
the number specified in this render state are set to zero.
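For example, an application blending each vertex between two world matrices might issue calls along these lines; pd3dDevice, matWorld0, and matWorld1 are placeholders:

    /* Sketch: blend between WORLD and WORLD1 using one per-vertex weight. */
    pd3dDevice->SetTransform(D3DTRANSFORMSTATE_WORLD,  &matWorld0);
    pd3dDevice->SetTransform(D3DTRANSFORMSTATE_WORLD1, &matWorld1);
    pd3dDevice->SetRenderState(D3DRENDERSTATE_VERTEXBLEND, D3DVBLEND_1WEIGHT);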
User Clip Planes

User-defined clip planes are enabled for the latest DirectX release. These work just like other clip planes but they are
settable by the application. The driver must handle these planes by responding to the D3DDP2OP_SETCLIPPLANE
operation code in D3dDrawPrimitives2.
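A rough sketch of how a driver's D3dDrawPrimitives2 command loop might pick up this token; the D3DHAL_DP2SETCLIPPLANE layout (a plane index plus four plane coefficients) is assumed from the DDK headers, and pCmd, pCommandData, pContext, and MyDeviceSetUserClipPlane are placeholders:

    /* Sketch: inside the driver's DP2 command-stream parser. */
    case D3DDP2OP_SETCLIPPLANE:
    {
        D3DHAL_DP2SETCLIPPLANE* pCP = (D3DHAL_DP2SETCLIPPLANE*)pCommandData;
        for (WORD i = 0; i < pCmd->wStateCount; i++, pCP++)
        {
            /* pCP->dwIndex selects the user clip plane; pCP->plane[0..3] hold the
               plane equation coefficients (a, b, c, d). */
            MyDeviceSetUserClipPlane(pContext, pCP->dwIndex, pCP->plane);
        }
        break;
    }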
Texture Coordinate Transformations

Texture coordinate transformations are enabled for the latest DirectX release.
Texture transforms are vertex-level transformation operations. These operations can be performed by any
transform and lighting-enabled HAL driver and by any HAL device type.
Texture transforms are enabled by defining 4X4 matrices associated with each texture stage. All of the key state and
data structures used by the software implementation of the geometry pipeline are available at the DDI level.
Using texture transformations, texture coordinates can be moved, relative to the vertices they are being drawn
from. The texture transform describes how the texture coordinates are moved inside the texture map. Every time a
texture transform is applied, the matrix changes the coordinates of the texture. The standard world matrix is used in
these transformations.
At the API level, the IDirect3DDevice7::SetTransform method (described in the Direct3D SDK documentation)
defines the texture transform matrix associated with the texture currently bound to the i-th texture stage. This
matrix should be applied to the texture coordinates of that texture during rendering. Note that the same vertex-level
texture coordinate set can be used at different stages with a separate texture transform matrix that may apply
different textures.
The operation of the texture transform is controlled by the IDirect3DDevice7::SetTextureStageState method
(described in the Direct3D SDK documentation) using the D3DTSS_TEXTURETRANSFORMFLAGS flag.
The D3DTEXTURETRANSFORMFLAGS enumerated type, described in the DirectX SDK documentation, is used to
control the dimension of the texture coordinate set that is generated by the texture coordinate transformation
operation, and to control whether any perspective division should be applied to it.
D3DTTFF_DISABLE indicates that the texture coordinates are passed directly to the hardware.
The number of the D3DTTFF_COUNTn flag signals how many texture coordinates are going to be present. Note that
this does not necessarily equate to the dimension of the textures themselves, if projected textures are used.
If projected textures are used, the D3DTTFF_PROJECTED flag is set to indicate that the texture coordinates are to be
divided by the last (COUNTth) element of the texture coordinate set. Thus, for a 2D projected texture, the count is
three, because the first two elements are divided by the third, resulting in two floats for a 2D texture lookup. That is,
both D3DTTFF_COUNT2 and D3DTTFF_COUNT3 | D3DTTFF_PROJECTED refer to a 2D texture.
For nonprojected textures:
D3DTTFF_COUNT1 indicates that the rasterizer should expect 1D texture coordinates.
D3DTTFF_COUNT2 indicates that the rasterizer should expect 2D texture coordinates.
D3DTTFF_COUNT3 indicates that the rasterizer should expect 3D texture coordinates.
D3DTTFF_COUNT4 indicates the rasterizer should expect 4D texture coordinates.
The first byte encodes a count of texture coordinates expected to be used by the texture at a particular stage. Setting
this to zero causes no texture transformation to be applied even if one was defined by the
IDirect3DDevice7::SetTransform method. This is the number that is output from the texture coordinate
transform stage.
The number of texture coordinates passed into the texture transform is defined by the Direct3D API FVF code.
D3DTSS_TEXTURETRANSFORMFLAGS defaults to D3DTTFF_DISABLE with no D3DTTFF_PROJECTED flag set. The
texture transform matrices default to 4X4 identity matrices.
On nontransform-and-lighting HAL devices (that is, those that require transformation operations on the host
processor), the output vertices are provided to the driver with the appropriate DDI-level FVF code that may differ
from the one specified at the API level. Devices that do not support hardware-accelerated transform and lighting
may still do projected textures, and therefore the drivers must still respond to the
D3DTSS_TEXTURETRANSFORMFLAGS.
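At the API level, a 2D projected texture on stage 0 might be set up as in the following sketch; pd3dDevice and matTexProj are placeholders:

    /* Sketch: three coordinates are output and the first two are divided by the third. */
    pd3dDevice->SetTransform(D3DTRANSFORMSTATE_TEXTURE0, &matTexProj);
    pd3dDevice->SetTextureStageState(0, D3DTSS_TEXTURETRANSFORMFLAGS,
                                     D3DTTFF_COUNT3 | D3DTTFF_PROJECTED);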
Cube Environment Map Support

Multiple texture support in Direct3D allows the use of environment maps for lighting and reflections. However,
single-map 360-degree solutions like circular or spherical maps are not robust enough to be widely used in real
time. The most suitable solution for real-time generation and addressing of a 360-degree environment is a cubical
map, composed of six textures (faces). Each face can be generated by pointing a camera with a 90-degree field-of-
view in the appropriate direction. Per-vertex vectors (normal, reflection, or refraction) are provided to the
rasterization hardware that then iterates them across the polygon and calculates the intersections of the
interpolated vectors with the faces of the cube map. If the application or API generates the cube environment map,
the driver does not require the information about the transformation matrix or coordinate space in which the per-
vertex vectors are defined. This is because the vectors are used only to address the environment map, which is
logically in the same coordinate space.
Addressing of a circular map involves vector normalization; addressing of a spherical map requires the use of
trigonometric functions. All types of single-map environments are nonlinear: a circular map is extremely distorted
and anisotropic near its periphery, while a spherical map has large distortions near its poles. This makes it
necessary to re-create the environment map every time the viewpoint changes in such a way that the central view
area becomes distorted.
Cubical environment maps, formed by pointing a real or simulated camera with 90-degree field-of-view in six
different directions, are free of these disadvantages. They can be generated faster but need to be updated less
frequently, have fewer distortions, and can be addressed by using equations similar to the ones already used for
perspective-correct texture mapping.
In general, cube maps are the best choice to provide real-time environment mapping for complex lighting and
reflections.
The cube map enables are passed to the driver using the D3dDrawPrimitives2 render state mechanism. FVF
texture coordinates are passed with FVF code 01 for that texture coordinate set.
The cube map is defined in world coordinates; that is, its world transform matrix is the identity matrix. The cube
map can appear to be in a different space if texture transforms are used on the corresponding texture coordinate
indexes. These texture coordinate indexes will be looking directly at face four, the +z face. Y is up by default. The
origin of the (u,v) texel grid is at the upper left corner of each face, in order to allow creation of the face without any
additional transformations by pointing camera from the center of the cube.
DirectDrawCreateEx takes a flag that indicates a cube map is to be created. Some faces need not be allocated at
the API level, although drivers may pad as required. The surface descriptor contains a bitfield with six bits indicating
the faces that the application expects to use. When faces are obtained with the
IDirectDrawSurface7::GetAttachedSurface method, the NULL faces are skipped. The dimensions of each face
are available from its surface descriptor, and the face bitcode field indicates which face it is. For more information
about DirectDrawCreateEx and IDirectDrawSurface7::GetAttachedSurface, see the DirectDraw SDK
documentation.
The pointer returned from DirectDrawCreate is actually a pointer to the first non-NULL face in the cube. The face
identifier can be obtained by taking the surface's bitcode. This is the pointer that is passed to the
IDirect3DDevice7::SetTexture method (described in the Direct3D SDK documentation) to make this map
available in the multiple-texture pipeline.
If any of the surfaces are intended to be rendered to, the cube map must be created with the
D3DPTEXTURECAPS_CUBEMAP cap flag set.
Any faces not created by the call are assumed to be filled with the color specified in the surface descriptor's
dwEmptyFaceColor member. (See the DDSURFACEDESC2 structure.)
Note Current restrictions: All cube faces must be the same size and must be square. The cube faces can be MIP
mapped. No color keying is supported with cube map textures. As with other textures, alpha channels and alpha
palettes are supported.
Integration of Environment Mapping with Standard Diffuse Lighting

Standard diffuse lighting calculations for Gouraud shading of diffuse components can be performed in the
geometry pipeline. However, for best realism, the specular component can be generated by cube environment
mapping. When both are in operation, the FVF specified by the application should contain one vector in the position
for the lighting computation, and a separate 3D texture coordinate set for use as the vector into the cube map.
Vertex and Pixel Fogging

Fog has three main profile types: linear, exponential, and exponential squared. There are also two main
implementation methods: vertex fog (also known as iterated or local fog) and pixel fog (also known as table or
global fog).
In monochromatic (ramp) lighting mode, fog can work properly only when the fog color is black. (If there is no
lighting, any fog color works because fog is rendered as black.) Fog can be considered a measure of visibility -- the
lower the fog value, the greater the fog effect and the less visible the object.
The fog blending factor f is used in all fog calculations. It stands for the proportion of fog color versus object color.
The final color is determined by the following formula:
Color = f * objColor + (1.0 - f) * fogColor
Therefore, a fog blending factor of 0.0 is full fog color and a fog blending factor of 1.0 is full object color. Typically, f
decreases with distance.
As the following figure shows, linear fog density increases in a linear fashion as the distance increases.

This linear increase differs from exponential fog where the fog density increases exponentially. A linear fog profile
might be set up as follows: the D3DRENDERSTATE_FOGSTART render state is set to ZFront and f = 1.0; the
D3DRENDERSTATE_FOGEND render state is set to ZBack and f = 0.0. The D3DRENDERSTATE_FOGDENSITY render
state is ignored.
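The three profiles reduce to the following factor computations; this is only a sketch, where fogStart, fogEnd, and fogDensity correspond to the D3DRENDERSTATE_FOGSTART, _FOGEND, and _FOGDENSITY render states, d is the depth or range used for the fog computation, and expf comes from <math.h>:

    /* Sketch of the three fog profiles. */
    float fLinear = (fogEnd - d) / (fogEnd - fogStart);          /* clamp to [0.0, 1.0] */
    float fExp    = expf(-fogDensity * d);
    float fExp2   = expf(-(fogDensity * d) * (fogDensity * d));

    /* Final color, as given above: full fog color at f == 0.0, full object color at f == 1.0. */
    /* Color = f * objColor + (1.0f - f) * fogColor; */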
Vertex Fog

Vertex fog is enabled using the D3DRENDERSTATE_FOGENABLE render state. Vertex fog can be made perspective-
correct. For scenes with large polygons, vertex fog might not be the better choice, because the vertices are farther
apart.
Vertex fog can be used in two ways:
1. Make use of Direct3D lighting code using D3DVERTEX structures (described in the Direct3D SDK
documentation) to define the fog blending factor.
2. Make use of D3DLVERTEX or D3DTLVERTEX structures (both are described in the Direct3D SDK
documentation). This is useful for doing custom fog effects such as layered fog, range-based fog, and
volume fog.
When using fog with the D3DVERTEX structure, PRIMCAPS.dwRasterCaps has the
D3DPRASTERCAPS_FOGVERTEX flag set. Individual calls to the IDirect3DDevice7::SetRenderState method are
used to set D3DRENDERSTATE_FOGENABLE to TRUE, and D3DRENDERSTATE_FOGCOLOR to a 24-bit RGB color.
For linear fog, the IDirect3DDevice7::SetRenderState method is used to set D3DRENDERSTATE_FOGSTART and
D3DRENDERSTATE_FOGEND. For exponential and exponential squared fog, the D3DRENDERSTATE_FOGDENSITY
render state is set with D3DLIGHTSTATETYPE. For more information, see the Direct3D SDK documentation.
When using fog with the D3DTLVERTEX structure, the IDirect3DDevice7::SetRenderState method is used to set
D3DRENDERSTATE_FOGENABLE to TRUE, set the color of the fog in D3DRENDERSTATE_FOGCOLOR, and set
D3DRENDERSTATE_FOGTABLEMODE to D3DFOG_NONE (this is set in the D3DTLVERTEX structure itself). A fog
blend factor f is defined at each vertex. This is the alpha component of the specular RGBA.
The following figure illustrates a sample relationship between altitude and fog density in a layered atmosphere
model.

The fog blending factor is calculated during the lighting phase and is placed in the alpha component of the specular
color value in the vertex. This should then be interpolated according to the current shade mode set up by the
D3DRENDERSTATE_SHADEMODE render state.
The fog blending factors for vertices v1, v2, and v3 are determined by using the following calculations.
f1, f2, and f3 are interpolated across the triangle based on the current shade mode.
The new color, C, is obtained from the following formula:
C = (1- f) * fog_color + f * src_color
In this formula,
f is the source, interpolated fog blending factor
fog_color is the current fog color (set by the render state D3DRENDERSTATE_FOGCOLOR)
src_color is the source, interpolated, textured color
If f, the fog blending factor, is 0.0, then C is set to a value identical to the fog color. If f is 1.0 there is no fog effect.
The fog factor at a vertex is a function of the distance from the camera position to the vertex. This distance can be
approximated by taking only the z value in camera space. For per-vertex fog, (Xc, Yc, Zc) is computed in camera
space by transforming the vertex using Mworld * Mview, and the distance to the vertex is then computed.
In RGB mode, fog factor f is scaled to be in the range 0 through 255, and is written to the alpha component of the
specular output color.
In Ramp mode, diffuse and specular components are multiplied by the fog factor f and are clamped to be in the
range 0.0 through 1.0.
Pixel Fog

Pixel fog is similar to vertex fog, but the fog blending factor, f, is calculated at rasterization time rather than at
lighting time. Pixel fog is more accurate than vertex fog. The fog blending factor specified in the LVERTEX or
D3DTLVERTEX structures (described in the Direct3D SDK documentation) is ignored. Pixel fog is limited to the three
fog profile types (linear fog, exponential fog, and exponential squared fog).
For pixel fog, the hardware does a table look-up on the depth value interpolated at each pixel. There is no need for
256 table entries if they are being interpolated linearly in-between. Future releases of DirectX may provide
compensation for nonlinear z-distribution. ZFront and ZBack have values in the interval [0.0, 1.0].
Pixel fog is used by setting the D3DPRASTERCAPS_FOGTABLE flag in the dwRasterCaps member of the D3DPRIMCAPS structure. The
IDirect3DDevice7::SetRenderState method sets the D3DRENDERSTATE_FOGENABLE render state to TRUE; the
D3DRENDERSTATE_FOGCOLOR render state to 24-bit RGB; and the D3DRENDERSTATE_FOGTABLEMODE render
state to one of D3DFOG_LINEAR, D3DFOG_EXP, or D3DFOG_EXP2. Here, the fog blending factor is calculated
according to the three render states as follows:
fStart is determined by the render state D3DRENDERSTATE_FOGSTART and is in the interval [0.0, 1.0].
fEnd is determined by the render state D3DRENDERSTATE_FOGEND and is in the interval [0.0, 1.0].
fDensity is determined by the render state D3DRENDERSTATE_FOGDENSITY and is in the interval [0.0, 1.0].
The calculation of the fog blending factor f is based on z and the three fog render states just described. The actual
calculation depends on the render state D3DRENDERSTATE_FOGTABLEMODE. Only D3DFOGMODE_LINEAR uses
the fog start and end values.
D3DFOGMODE_NONE
No pixel fog is applied.
D3DFOGMODE_LINEAR
Linear fog growth.

D3DFOGMODE_EXP
Exponential fog growth.

D3DFOGMODE_EXP2
Exponential squared fog growth.

Typically, exponential and exponential squared fog are too expensive to do directly. Instead, look-up tables are
precalculated for a number of z values in the interval [0.0, 1.0] using the current fog density. The nearest table entry
can then be used for the current z value, or an interpolating value between the two surrounding z values can be
used to get the appropriate fog factor.
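A sketch of such a precalculated table for exponential fog, with linear interpolation between entries; the table size and names are arbitrary:

    #include <math.h>

    #define FOG_TABLE_SIZE 64
    static float g_fogTable[FOG_TABLE_SIZE];

    /* Precompute f = exp(-(density * z)) for z in [0.0, 1.0]. */
    void BuildExpFogTable(float density)
    {
        for (int i = 0; i < FOG_TABLE_SIZE; i++) {
            float z = (float)i / (FOG_TABLE_SIZE - 1);
            g_fogTable[i] = expf(-density * z);
        }
    }

    /* Look up (and interpolate) the fog factor for a depth value in [0.0, 1.0]. */
    float LookupFogFactor(float z)
    {
        float fi = z * (FOG_TABLE_SIZE - 1);
        int   i0 = (int)fi;
        int   i1 = (i0 < FOG_TABLE_SIZE - 1) ? i0 + 1 : i0;
        float t  = fi - (float)i0;
        return g_fogTable[i0] + t * (g_fogTable[i1] - g_fogTable[i0]);
    }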
The final fogged color C is then calculated in the same manner as vertex fog as follows:
C = (1-f) * fog_color + f * src_color
In this formula,
f is the fog blending factor
fog_color is the current fog color (set by the render state D3DRENDERSTATE_FOGCOLOR)
src_color is the source, interpolated, textured color
If f, the fog blending factor, is 0.0, then the final fogged color is identical to the fog color. If f is 1.0, there is no fog
effect.
Range Based Fog

Fog can also be range-based. With normal z, or depth-based, fog an object can appear at the side of the view, but
then as the viewer rotates toward it, the object disappears back into the fog because its z-value changes.
However, if fog is based on range instead of depth, it does not vary as the viewer rotates in place, as illustrated in
the following figure.

Objects that are visible remain visible, regardless of the rotation. This is compelling for flight simulators, tank
games, and other applications where it is undesirable to have objects disappearing and reappearing in the distance
as the viewer rotates.
To set up fog to be range-based, D3DPRASTERCAPS_FOGRANGE and the D3DRENDERSTATE_RANGEFOGENABLE
render state should be set. This render state works only with D3DVERTEX vertices. When the application specifies
D3DLVERTEX or D3DTLVERTEX vertices, the F (fog) component of the RGBF fog value should already be corrected
for range. The D3DVERTEX, D3DLVERTEX, and D3DTLVERTEX structures are defined in the Direct3D SDK
documentation.
DirectX 7.0 Release Notes

The following topics contain update information for DirectX 7.0, concerning flexible vertex format (FVF), the
rasterizer, and MIP Map surface creation:
FVF Update
Rasterizer Update
MIP Map Surface Creation Update
FVF Update

The FVF codes originally defined in DirectX 6.0 now support the specifications for texture coordinate sets in DirectX
7.0.
In addition to the normal 2D textures, supported in DirectX 6.0, DirectX 7.0 supports 1D, 3D, and 4D textures. In
addition, the textures may be projected. The dwVertexType member of D3DHAL_DRAWPRIMITIVES2DATA can
be examined when D3dDrawPrimitives2 is called, to determine the dimensions of each texture coordinate set.
For example, if there is a vertex with five texture coordinate sets, each one of these textures can be 1D, 2D, 3D, or
4D and they may be projected textures. Each texture stage is independent, so the dimensions can be different for
each set of coordinates. The upper 16 bits of the FVF flag contained in dwVertexType can be examined to
determine the dimensions of the texture coordinates.
The texture coordinate count is a 4-bit field that can range from zero through eight. It gives the number of
texture coordinate sets described by the upper 16 bits of the FVF code. Those upper 16 bits are allocated as
two bits for each of eight texture coordinate sets. The meaning of the texture coordinate bits is as follows:

BIT PATTERN    DECIMAL VALUE    MEANING
00             0                Two-dimensional texture coordinate pair (u, v), as in DirectX 6.0
01             1                Three-dimensional texture coordinate triple (u, v, q)
10             2                Four-dimensional texture coordinate quadruple (u, v, w, q)
11             3                One-dimensional texture coordinate, u
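A sketch of decoding these fields from the dwVertexType FVF code; the shift and mask values follow the layout described above (the coordinate-set count in bits 8-11 and two size bits per set starting at bit 16):

    /* Sketch: decode the texture coordinate layout from an FVF code. */
    DWORD texCount = (dwVertexType >> 8) & 0xF;            /* number of texture coordinate sets */
    for (DWORD set = 0; set < texCount; set++) {
        DWORD sizeBits = (dwVertexType >> (16 + 2 * set)) & 0x3;
        /* 00 = 2D (u, v), 01 = 3D (u, v, q), 10 = 4D (u, v, w, q), 11 = 1D (u) */
    }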

3D texture coordinate sets can be used for any of three different purposes: projected textures (signaled by
D3DTTFF_PROJECTED - see D3DTEXTURETRANSFORMFLAGS in the DirectX SDK documentation), volume textures,
or cube map vector textures, as determined by a set of render states analogous to the D3DRENDERSTATE_WRAP0
to D3DRENDERSTATE_WRAP7 modes that are already specified on a texture coordinate set basis.
The flags used with the D3DRENDERSTATE_WRAPn render states for 1D through 4D texture coordinates,
respectively, are described in the following table.

FLAGS             MEANING
D3DWRAPCOORD_0    Same as D3DWRAP_U, which specifies wrapping in the u coordinate direction.
D3DWRAPCOORD_1    Same as D3DWRAP_V, which specifies wrapping in the v coordinate direction.
D3DWRAPCOORD_2    Specifies wrapping in the w coordinate direction.
D3DWRAPCOORD_3    Specifies wrapping in the q coordinate direction.

When projected textures are in use, they take the RHW value from the corresponding texture coordinate field,
instead of from the position field. However, the position field's RHW is still used for both w-buffering and fog
calculations, and therefore must be provided when either of these is in use.
Rasterizer Update

The reference rasterizer has been extracted into a separate DLL to enable additional WHQL tests to be added
independently of normal DirectX ship cycles (quarterly is typical). It has been updated to support any of the
rasterizer-level operations added to the API either in the core or as extensions that require guaranteed consistency
of implementation.
The production rasterizer may not be updated to support these techniques, because environment mapping at the
vertex level is likely to be faster than at the pixel level when running in software.
This rasterizer is likely to be upgraded in terms of performance on key cases that are identified.
MIP Map Surface Creation Update

Before DirectX 7.0, the attachment chain for a MIP map usually consisted only of the sublevels of that MIP map.
With the advent of cubic environment maps, this is no longer the case. Each face of a cubic environment may itself
be a MIP map and, as such, the attachment chain of a surface forming one face of a cubic environment map can
consist of links to the other faces of the cube map as well as links to sublevels of the MIP map.
As the attachment chain of a MIP map surface can now contain links to surfaces other than simply a lower level MIP
map, a new capability bit has been introduced, DDSCAPS2_MIPMAPSUBLEVEL (see the DDSCAPS2 structure for
this and the following flags). This bit is set for all but the top-level surface of a MIP map chain. Thus, given the top-
level surface of a MIP map chain you can find the surface representing the next lowest level of the MIP map chain
by traversing the attachment list of the top-level surface looking for a surface with the
DDSCAPS2_MIPMAPSUBLEVEL capability bit set.
To determine if a surface is a face of a cubic environment map, check for the surface capability bit
DDSCAPS2_CUBEMAP. If the DDSCAPS_MIPMAP capability bit is not set, the attachment list of this surface consists
of the other faces of the cube map that are being created (check for the capability bits
DDSCAPS2_CUBEMAP_POSITIVEX, DDSCAPS2_CUBEMAP_NEGATIVEX, DDSCAPS2_CUBEMAP_POSITIVEY,
DDSCAPS2_CUBEMAP_NEGATIVEY, DDSCAPS2_CUBEMAP_POSITIVEZ, DDSCAPS2_CUBEMAP_NEGATIVEZ).
If the DDSCAPS_MIPMAP capability bit is set then the attached surface list of the cube map surface consists of links
to the other faces of the cube map as above and also to the next level of the MIP map for this face. The sublevel of
the MIP map can be identified via the DDSCAPS2_MIPMAPSUBLEVEL bit described above.
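A sketch of classifying a surface from its caps bits when walking an attachment chain; ddsd is assumed to be the surface's DDSURFACEDESC2:

    /* Sketch: classify a surface in an attachment chain by its DDSCAPS2 bits. */
    DWORD caps2 = ddsd.ddsCaps.dwCaps2;

    BOOL isMipSublevel = (caps2 & DDSCAPS2_MIPMAPSUBLEVEL) != 0;
    BOOL isCubeFace    = (caps2 & DDSCAPS2_CUBEMAP) != 0;
    BOOL isPositiveX   = (caps2 & DDSCAPS2_CUBEMAP_POSITIVEX) != 0;

    if (isMipSublevel) {
        /* This attachment is the next-lower level of the same MIP map chain. */
    } else if (isCubeFace) {
        /* This attachment is another face of the cube map; the face bits say which one. */
    }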
D3DRENDERSTATE_MIPMAPLODBIAS
Although this render state is obsolete, its functionality has been moved into the texture stage states, that is,
D3DRENDERSTATE_MIPMAPLODBIAS is exactly equivalent to the D3DTSS_MIPMAPLODBIAS enumerator in the
D3DTEXTURESTAGESTATETYPE enumerated type for texture stage zero. On receiving the
D3DRENDERSTATE_MIPMAPLODBIAS render state through a legacy interface, your driver should simply map this
to the same code that handles D3DTSS_MIPMAPLODBIAS for texture stage zero.
The MIP map LOD bias is a floating-point value used to change the level of detail (LOD) bias. This value offsets the
value of the MIP map level that is computed by trilinear texturing. It is usually in the range -1.0 to 1.0; the default
value is 0.0. Current WHQL/DCT tests require the MIP map LOD bias to operate in the range -3.0 to 3.0.
Each unit bias (+/-1.0) biases the selection by exactly one MIP map level. A negative bias causes the use of larger
MIP map levels, resulting in a sharper but more aliased image. A positive bias causes the use of smaller MIP map
levels, resulting in a blurrier image. Applying a negative bias also results in the referencing of a smaller amount of
texture data, which can boost performance on some systems.
Note DirectX 9.0 and later applications can use the D3DSAMP_MIPMAPLODBIAS value in the
D3DSAMPLERSTATETYPE enumeration to control the level of detail bias for mipmaps. The runtime maps user-
mode sampler states (D3DSAMP_Xxx) to kernel-mode D3DTSS_Xxx values so that drivers are not required to
process user-mode sampler states. Drivers still should process the D3DTSS_MIPMAPLODBIAS value. For more
information about D3DSAMPLERSTATETYPE, see the latest DirectX SDK documentation.
DirectX 8.0 Release Notes

The following sections contain update information for DirectX 8.0, focusing on the areas of the DDI that have
been modified or extended for DirectX 8.0.
Header Files for DirectX 8.0 Drivers

A DirectX 8.0 display driver's source code must include the d3d8.h header file. The header files d3d8caps.h and
d3d8types.h are included in d3d8.h.
The DirectX 8.0 Driver Development Kit (DDK) introduces a new DDI-only header file called d3dhalex.h. This header
file contains optional helper definitions and macros. Currently, this header contains some macros to assist with
reporting D3DCAPS8 to the runtime.
DIRECT3D_VERSION

A DirectX display driver must support DirectX 7.0 and earlier versions of the DirectX runtime. To do that, it is
necessary to include both old and new DirectX headers, for example d3d.h and d3d8.h. However, this can cause a
problem with the definition of the preprocessor symbol DIRECT3D_VERSION. This preprocessor symbol is used in
the header files to indicate which structures and functions should be included. If the DIRECT3D_VERSION has not
already been defined, the DirectX header files set the value of DIRECT3D_VERSION to the most recent version they
were designed for. Thus, d3d.h sets DIRECT3D_VERSION to 0x0700 and d3d8.h sets DIRECT3D_VERSION to
0x0800. If d3d.h is included in your source before d3d8.h, new Direct3D 8.0 features are not defined and compiler
errors will result.
To avoid this, define DIRECT3D_VERSION to 0x0800 before including any header files. In order to get all the
necessary symbols in header files, include d3d8.h before winddi.h or d3dnthal.h.
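For example, a driver source file might begin with the following ordering (one possible arrangement consistent with the rules above):

    /* Define the version before any DirectX headers are pulled in. */
    #define DIRECT3D_VERSION 0x0800

    #include <d3d8.h>      /* include before winddi.h or d3dnthal.h */
    #include <d3d.h>
    #include <winddi.h>
    #include <d3dnthal.h>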
Minimal DirectX 8.0 DDI Support

DirectX 8.0 provides hardware acceleration through DirectX 7.0-level drivers. However, for a driver to expose any of the
new features of DirectX 8.0 such as multiple vertex streams, index buffers, or vertex and pixel shaders, it must
identify itself by reporting DirectX 8.0 style capabilities and support the new D3dDrawPrimitives2 rendering
tokens. In order to support the new D3dDrawPrimitives2 rendering tokens the driver is required to provide basic
support for vertex streams and fixed function vertex shaders.
Reporting DirectX 8.0 style capabilities involves the following steps:
Handling the new GetDriverInfo2 variant of the existing DdGetDriverInfo entry point.
Returning a D3DCAPS8 structure containing the capabilities of the device when requested.
Ensuring that defined fields of that structure have certain minimum values.
Returning a texture format list that includes DirectX 8.0 style surface format descriptions.
These various requirements are discussed in the following sections.
Supporting GetDriverInfo2

The DirectX 8.0 DDI introduces a new mechanism for querying the driver for information. This mechanism
extends the existing DdGetDriverInfo entry point to query for additional information from the driver. Currently,
this mechanism is only used for querying for DX8 style D3D caps.
Note As you read the following you may question why the GetDriverInfo2 mechanism is necessary. It would
seem preferable to simply define a new GetDriverInfo GUID that the driver would handle by returning a
D3DCAP8 structure. GetDriverInfo2, introduced in the following paragraphs, is a mechanism to minimize the
changes required to the Windows Operating Systems to enable DirectX 8.0 level functionality and thus make
redistributing the DirectX 8.0 runtime practical.
This extension to GetDriverInfo takes the form of a DdGetDriverInfo call with GUID_GetDriverInfo2. When a
DdGetDriverInfo call with that GUID is received by the driver, it must examine the data structure passed in the
lpvData field of the DD_GETDRIVERINFODATA data structure to see what information is being requested. As
described below, lpvData can point to either a DD_GETDRIVERINFO2DATA or DD_STEREOMODE structure.
The GUID_GetDriverInfo2 is the same GUID value as GUID_DDStereoMode. If your driver does not handle
GUID_DDStereoMode, this is not an issue. However, if your DirectX 8.0 driver handles GUID_DDStereoMode, note
that when a call to DdGetDriverInfo with the GUID_GetDriverInfo2(GUID_DDStereoMode) is made, the runtime
sets the dwHeight field of the DD_STEREOMODE structure to the special value D3DGDI2_MAGIC. This field
corresponds to the dwMagic field of the DD_GETDRIVERINFO2DATA structure. Therefore, by casting the
lpvData pointer to either a pointer to a DD_STEREOMODE structure or a pointer to a
DD_GETDRIVERINFO2DATA structure and checking the value of the corresponding field (dwHeight or
dwMagic) for the value D3DGDI2_MAGIC, you can distinguish between a call to determine stereo mode
capabilities or a request of Direct3D 8.0 capabilities.
Once the driver has determined that this is a call to GetDriverInfo2 it must then determine the type of
information being requested by the runtime. This type is contained in the dwType field of the
DD_GETDRIVERINFO2DATA data structure.
Finally, the driver copies the requested data into the supplied buffer. It is important to note that the lpvData field
of the DD_GETDRIVERINFODATA data structure points to the buffer to which to copy the requested data.
lpvData also points to the DD_GETDRIVERINFO2DATA structure. This means that the data returned by the driver
overwrites the DD_GETDRIVERINFO2DATA structure (and, therefore, that the DD_GETDRIVERINFO2DATA
structure occupies the first few DWORDs of the buffer).
In order to be called with GetDriverInfo2, and report DirectX 8.0 capabilities, it is necessary for the driver to set
the new flag DDHALINFO_GETDRIVERINFO2 in the dwFlags field of the DD_HALINFO structure returned by the
driver. If this flag is not set, the runtime does not send GetDriverInfo2 calls to the driver and the driver is not
recognized as a DirectX 8.0 level driver.
The runtime uses GetDriverInfo2 with type D3DGDI2_TYPE_DXVERSION to notify the driver of the current DX
runtime version being used by the application. The runtime provides a pointer to a DD_DXVERSION structure in
the lpvData field of DD_GETDRIVERINFODATA.
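A rough sketch of the dispatch inside DdGetDriverInfo; the structure and field names follow the DDK description above, MyD3DCaps8 is a placeholder for the driver's prefilled D3DCAPS8, and error and size checking are omitted:

    DWORD CALLBACK DdGetDriverInfo(DD_GETDRIVERINFODATA* pgdi)
    {
        if (IsEqualGUID(&pgdi->guidInfo, &GUID_GetDriverInfo2)) {
            DD_GETDRIVERINFO2DATA* pgdi2 = (DD_GETDRIVERINFO2DATA*)pgdi->lpvData;

            /* The same GUID is used for stereo-mode queries; dwMagic distinguishes them. */
            if (pgdi2->dwMagic == D3DGDI2_MAGIC) {
                switch (pgdi2->dwType) {
                case D3DGDI2_TYPE_GETD3DCAPS8:
                    /* The reply overwrites the DD_GETDRIVERINFO2DATA in the same buffer. */
                    memcpy(pgdi->lpvData, &MyD3DCaps8, sizeof(D3DCAPS8));
                    pgdi->dwActualSize = sizeof(D3DCAPS8);
                    pgdi->ddRVal = DD_OK;
                    return DDHAL_DRIVER_HANDLED;
                default:
                    break;
                }
            }
        }
        /* ... handle other GetDriverInfo GUIDs here ... */
        return DDHAL_DRIVER_NOTHANDLED;
    }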
Reporting DirectX 8.0 Style Direct3D Capabilities

In response to a GetDriverInfo2 query with type D3DGDI2_TYPE_GETD3DCAPS8, the driver should copy an
initialized D3DCAPS8 structure into the lpvData field of the DD_GETDRIVERINFODATA structure. This structure
is new for DirectX 8.0 and is used for both reporting capabilities from the driver to runtime and from the runtime
to the application.
D3DCAPS8 has fields that describe both capabilities new to DirectX 8.0 and capabilities carried forward from
DirectX 7.0. D3DCAPS8 is not a complete replacement for existing capabilities. Although this structure (along with
information of supported surface formats) is a complete description of the device's capabilities from an API
perspective, it is not sufficient for the DDI. The runtime makes use of the DirectDraw capabilities reported by the
driver for such information as supported surface capabilities (DDSCAPS) even though these are not exposed
directly through the DirectX 8.0 API.
Furthermore, the driver is required to continue to report legacy capability structures (such as
D3DHAL_D3DEXTENDEDCAPS) as applications using legacy interfaces (DirectX 7.0 and earlier) continue to
request these capabilities. Therefore, reporting DirectX 8.0 style caps through D3DCAPS8 is an additional
requirement, rather than a replacement for the existing capability reporting mechanisms. When DirectX 8.0
interfaces are used by the application the runtime does not query for extended D3D capabilities such as
D3DHAL_D3DEXTENDEDCAPS if the driver reports DirectX 8.0 capabilities with D3DCAPS8.
D3DCAPS8 is described in the DirectX 8.0 SDK documentation. The driver should not supply values for the DeviceType or
AdapterOrdinal fields; the runtime initializes them to appropriate values. The driver should simply leave these
fields set to zero.
Minimum Capability Requirements for DirectX 8.0 Drivers

In addition to returning the D3DCAPS8 data structure in response to a GetDriverInfo2 query, the DirectX 8.0
runtime has other requirements that a driver must meet to be considered a DirectX 8.0 level driver.
A DirectX 8.0 driver must explicitly:
Report support for one or more vertex streams in the MaxStreams field of D3DCAPS8.
Report a maximum point sprite size of at least one in the MaxPointSize field of D3DCAPS8.
Modify its list of supported texture formats to support new style pixel format specifications.
Handle the new D3dDrawPrimitives2 (DP2) drawing tokens.
Handle D3dCreateSurfaceEx for vertex and index buffers even if your driver does not support video
memory vertex buffer creation. Handles for system memory vertex and index buffers are passed to the
driver.
Set the new posttransformed clipping flag D3DPMISCCAPS_CLIPTLVERT if the hardware supports clipping of
posttransformed vertex data.
It should be noted that a driver is not required to support any of the new features of DirectX 8.0 such as pixel or
vertex shaders, volume textures, point sprites (beyond the nonzero maximum point size), multisampling or even
multiple vertex streams (as the driver can set the maximum number of simultaneous vertex streams to one) in
order to be considered a DirectX 8.0 driver.
Reporting Support for Video Memory Vertex and Index Buffers

DirectX 8.0 adds two new Direct3D capability flags that indicate to the runtime whether the driver supports video
memory vertex and index buffers. Before these flags were created, the runtime could not determine whether the
driver provided real support for video memory vertex buffers. Therefore, if a DirectX 8.0 driver exports the execute
buffer (d3d buffer) creation and destruction driver entry points, it should add one or both of
the capability bits D3DDEVCAPS_HWVERTEXBUFFER and D3DDEVCAPS_HWINDEXBUFFER to the DevCaps field of
the D3DCAPS8 structure reported via GetDriverInfo2 to the runtime. Set the flag
D3DDEVCAPS_HWVERTEXBUFFER if your driver supports video or nonlocal video memory vertex buffers and
D3DDEVCAPS_HWINDEXBUFFER if your driver supports video or nonlocal video memory index buffers.
The runtime masks these capability bits off before reporting capabilities to the application (they are useful only to
the runtime itself, not to applications). Therefore, these capabilities are not visible to the DirectX Caps Viewer
application even if your driver exports them.
Correct support for these capabilities is part of Microsoft Windows Hardware Quality Labs (WHQL) testing.
Setting Presentation Swap Intervals

A driver should always set the PresentationIntervals member of the D3DCAPS8 structure to zero when it reports
the capabilities of its Direct3D hardware. The runtime then assigns the D3DPRESENT_INTERVAL_ONE value as the
default. In addition, the runtime assigns the following presentation swap intervals depending on how the driver
specifies capability bits in the Caps2 member of D3DCAPS8:
If the driver specifies the DDCAPS2_FLIPNOVSYNC bit, the runtime also sets PresentationIntervals to
D3DPRESENT_INTERVAL_IMMEDIATE.
If the driver specifies the DDCAPS2_FLIPINTERVAL bit, the runtime also sets PresentationIntervals to
D3DPRESENT_INTERVAL_TWO, D3DPRESENT_INTERVAL_THREE, and D3DPRESENT_INTERVAL_FOUR.
The Texture Format List

DirectX 8.0 introduces a new mechanism for describing pixel formats. In previous versions of DirectDraw and
Direct3D pixel formats were described by a data structure (DDPIXELFORMAT) that contained information about
the number of bits per color channel and bitmasks for each color channel (along with flags and size field). Pixel
formats in DirectX 8.0 are simple DWORDs that identify a particular pixel format and are compatible with FOURCCs
(Direct3D pixel formats are simply FOURCCs with all but the least significant byte being zero).
The DDPIXELFORMAT data structure is no longer exposed through API level interfaces. However, it is still used at
the DDI level. The driver reports its supported texture formats through a texture format array that consists of
surface descriptions with their embedded DDPIXELFORMAT data structures. However, the embedded pixel format
structures can now be used to report new style pixel formats. To specify a new style pixel format using the
DDPIXELFORMAT data structure, set the dwFlags field of the structure to the value DDPF_D3DFORMAT and store
the new pixel format identifier in the dwFourCC field.
In addition, certain other new fields have been added to DDPIXELFORMAT (the new fields have been added as
members of unions with existing fields so the size of the data structure is the same). These fields include:
dwOperations, dwPrivateFormatBitCount, wFlipMSTypes, and wBltMSTypes.
A DirectX 8.0 DDI compliant driver should continue to report DX7 style surface formats through the standard
mechanisms, that is, the texture format list reported in the global driver data structure
(D3DHAL_GLOBALDRIVERDATA) and the Z/Stencil list reported in response to a GUID_ZPixelFormats from
DdGetDriverInfo. However, the driver should also report all of its supported surface formats through the new
DirectX 8.0 DDI mechanism described below.
DirectX 8.0 DDI style surface formats are reported using GetDriverInfo2. Two GetDriverInfo2 query types are
used by the runtime to query for surface formats from the driver. D3DGDI2_TYPE_GETFORMATCOUNT is used to
request the number of DirectX 8.0 style surface formats supported by the driver. D3DGDI2_TYPE_GETFORMAT is
used to query for a particular surface format from the driver.
To handle the D3DGDI2_TYPE_GETFORMATCOUNT request, the driver must store the number of DirectX 8.0 DDI style
surface formats that it supports in the dwFormatCount field of the DD_GETFORMATCOUNTDATA structure.
When the runtime has received the number of supported formats from the driver, it then queries for each surface
format in turn with GetDriverInfo2 queries of type D3DGDI2_TYPE_GETFORMAT. The data structure pointed to by
the lpvData field of the DD_GETDRIVERINFODATA data structure is, in this case, DD_GETFORMATDATA.
The DirectX 8.0 runtime scans the texture format list reported by the driver examining the dwFlags fields of each
pixel format. If any of the texture formats have dwFlags set to DDPF_D3DFORMAT, then the runtime identifies this
texture format list as DX8 style and filters all texture formats whose pixel format is not flagged as
DDPF_D3DFORMAT. Furthermore, a DX7 runtime filters any texture format that has DDPF_D3DFORMAT set.
Therefore, a driver supporting the DX8 DDI can return a texture format list that contains two entries for each
supported format, one specified in the old style and one in the new. DX8 runtimes use the formats specified in the
new style and DX7 runtimes use the formats specified in the old style.
All supported surface formats, such as textures, depth or stencil buffers, or render targets, should be reported
through the GetDriverInfo2 mechanism. The runtime ignores the texture and Z/Stencil formats returned through
legacy mechanisms (D3DHAL_GLOBALDRIVERDATA and GUID_ZPixelFormats). No attempt is made to map these
formats to DX8 style formats for DirectX 8.0 drivers. However, legacy formats are mapped to the new style for
DirectX 7.0 or earlier drivers. Therefore, a driver must report all supported surface formats through the DirectX 8.0
DDI. Furthermore, because legacy runtimes do not map new style surface formats to old style formats it is essential
that the driver continues to report DirectX 7.0 style surface and Z/Stencil formats through the legacy mechanism.
Send comments about this topic to Microsoft
Format Operations
4/26/2017 • 1 min to read • Edit Online

When reporting supported surface formats, a DirectX 8.0 driver must also indicate which operations can be
performed on surfaces of that format. The supported operations for a pixel format are reported through the
dwOperations field of the DDPIXELFORMAT structure. The driver should set this field to the logical combination
of all supported operations for surfaces of that format.
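For example, a hedged sketch of one format-table entry follows; the ddpf local name is illustrative, and the specific D3DFORMAT_OP_TEXTURE and D3DFORMAT_OP_OFFSCREEN_RENDERTARGET flag names should be verified against the DirectX 8.0 DDK headers.

// Describe a DX8-style format entry that can be used as a texture and
// as an offscreen render target.
ddpf.dwSize       = sizeof(DDPIXELFORMAT);
ddpf.dwFlags      = DDPF_D3DFORMAT;
ddpf.dwFourCC     = D3DFMT_A8R8G8B8;
ddpf.dwOperations = D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET;
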
Send comments about this topic to Microsoft
Surface Formats as FOURCCs
4/26/2017 • 1 min to read • Edit Online

Three of the new surface formats defined by DirectX 8.0, D3DFMT_Q8W8V8U8, D3DFMT_V16U16 and
D3DFMT_W11V11U10, are passed to the driver as FOURCCs. This means the various bit depth and mask fields of
the DDPIXELFORMAT data structure are not initialized and their values are undefined. Hence, a driver processing
these three formats must not rely on the bit count or masks in the pixel format but must compute these as
necessary. For example, when computing the pitch of a surface of one of these types the dwRGBBitCount field of
the pixel format must not be used. All formats other than YUV, DXT, and IHV-specific extension formats are
mapped to the legacy DDPIXELFORMAT representation when passed to the driver and, therefore, have valid pixel
formats and masks in the pixel format data structure.
Send comments about this topic to Microsoft
Multiple Vertex Streams
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 adds support for multiple vertex streams. Even if the driver and hardware combination does not support
more than one stream of vertex data, the driver must still handle the stream binding DP2 tokens
(D3DDP2OP_SETSTREAMSOURCE and D3DDP2OP_SETSTREAMSOURCEUM) and the new vertex stream based DP2
drawing tokens (see New DP2 Stream Drawing Tokens). These are the mechanisms by which vertex data is passed to
DirectX 8.0 level drivers for drawing.
Send comments about this topic to Microsoft
Reporting Multiple Vertex Stream Capability
4/26/2017 • 1 min to read • Edit Online

A driver reports the ability to support multiple vertex streams by setting the value of the MaxStreams field of the
D3DCAPS8 structure. A driver that supports multiple vertex streams should specify a value greater than one. A DX8
level driver that does not support multiple vertex streams should set MaxStreams to one. No DX8 level driver
should specify a value of zero for this field. The driver should also set the MaxStreamStride field to the maximum
supported stride (in bytes) between vertex elements in a vertex stream.
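For example, following the myD3DCaps8 convention used elsewhere in this documentation (the stride value shown is purely illustrative and hardware dependent):

// A DX8 driver that supports only a single vertex stream.
myD3DCaps8.MaxStreams      = 1;     // must never be zero for a DX8 driver
myD3DCaps8.MaxStreamStride = 256;   // illustrative maximum stride, in bytes
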
Send comments about this topic to Microsoft
Stream Zero
4/26/2017 • 1 min to read • Edit Online

Vertex stream zero is treated differently from the other streams because it is the only stream supported by earlier
versions of Direct3D. Vertex buffers that have a flexible vertex format (FVF), where the FVF field is nonzero, can only
be bound to stream zero. However, this does not imply that the vertex buffer bound to stream zero always has a
flexible vertex format.
Stream zero is also the implied vertex source when one of the special, fixed-function vertex shaders is the current
vertex shader handle.
Send comments about this topic to Microsoft
Notification of Stream and Vertex Buffer Binding
4/26/2017 • 1 min to read • Edit Online

A driver is notified of the binding of a vertex buffer to a particular stream through a new DP2 token,
D3DDP2OP_SETSTREAMSOURCE, and its associated HAL data structure, D3DHAL_DP2SETSTREAMSOURCE.
Send comments about this topic to Microsoft
Vertex and Index Buffers
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 introduces index buffers and updates information for vertex buffers. The following sections discuss
these buffer types:
Index Buffers
Vertex Buffers
Send comments about this topic to Microsoft
Index Buffers
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 introduces the concept of index buffers. These buffers are very similar to vertex buffers but store simple
16- or 32-bit indices into vertex data rather than the vertex data itself. Index buffers extend all the benefits of vertex
buffers, for example optimal download and caching, to index data.
Index buffers are created, locked, unlocked and destroyed with the same driver entry points as those used for
vertex buffers. A driver can distinguish between these buffer types using the new surface capability bit
DDSCAPS2_INDEXBUFFER. For index buffers, this flag is set in the ddsCapsEx.dwCaps2 field of the surface's
DD_SURFACE_MORE structure; for vertex buffers it is clear.
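A minimal sketch of that test at buffer-creation time follows; the flag and field come from the text above, while the pSurfaceMore local name is an assumption.

// Distinguish an index buffer from a vertex buffer in the D3D buffer
// creation path.
if (pSurfaceMore->ddsCapsEx.dwCaps2 & DDSCAPS2_INDEXBUFFER)
{
    // The buffer holds 16-bit or 32-bit indices.
}
else
{
    // The buffer holds vertex data.
}
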
Unlike many other surface types, a driver does not need to set the capability DDSCAPS2_INDEXBUFFER when
reporting its capabilities to the runtime to receive driver calls for index buffer creation, destruction, and locking. A
DirectX 8.0 driver that supports vertex buffers is assumed to support index buffers also. If the underlying hardware
has no direct support for index buffers, then the driver should handle index buffer creation by allocating system
memory for the surface.
Send comments about this topic to Microsoft
Reporting Support for 32-bit Indices
4/26/2017 • 1 min to read • Edit Online

Before DirectX 8.0, vertex indices were restricted to 16-bit quantities. DirectX 8.0 adds support for 32-bit indices. A
driver reports support for 32-bit indices by setting the value of the MaxVertexIndex field of D3DCAPS8 (currently
also in D3DHAL_D3DEXTENDEDCAPS) to a value greater than 0xFFFF. This field also allows the driver to report
that although it supports indices requiring 32-bits of storage it does not support the full range of 32-bit values.
DirectX 9.0 and later versions only.
In order for a driver to expose its Direct3D hardware abstraction layer (HAL) device to applications through DirectX
9.0 interfaces, the driver must set the value of MaxVertexIndex to a value greater than or equal to 0xFFFF.
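For example, a driver that supports a 20-bit index range (an illustrative value greater than 0xFFFF, so 32-bit index buffers can be used) might report:

myD3DCaps8.MaxVertexIndex = 0x000FFFFF;
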
Send comments about this topic to Microsoft
Setting the Current Index Buffer
4/26/2017 • 1 min to read • Edit Online

As with vertex data, the index buffer to be used by drawing primitives is no longer part of the data passed to the
driver with the primitive, but rather is driver state. The current index buffer is set by a new DP2 token,
D3DDP2OP_SETINDICES. This token establishes the index buffer with the given handle as the current index buffer
to use when drawing indexed primitives until a new index buffer is set or the current index buffer is cleared (by
specifying an index buffer handle of zero in the DP2 token data).
Send comments about this topic to Microsoft
Vertex Buffers
4/26/2017 • 1 min to read • Edit Online

The following topics contain update information for vertex buffer creation and use.
Vertex Buffer Callbacks and Windows 2000
Vertex Buffer Creation Handling on Windows 2000
Vertex Buffer Renaming
Handling Renaming on Windows 2000
Send comments about this topic to Microsoft
Vertex Buffer Callbacks and Windows 2000
4/26/2017 • 1 min to read • Edit Online

DirectX 7.0 on the initial retail release of Windows 2000 prevents a driver's execute buffer (D3D buffer) callbacks
from being invoked by the runtime. This prevents the driver from being notified of vertex buffer creation requests
and, hence, no video memory or nonlocal video memory buffers can be created or used in this scenario. Video
memory vertex buffers are enabled in DirectX 8.0 and in the version of DirectX 7.0 that is shipped with DirectX 8.0.
Furthermore, Windows 2000 Service Pack 1 (SP1) enables video memory vertex buffers and all future versions of
Windows 2000 will enable video memory vertex buffers. However, there is no workaround to enable video
memory vertex buffers on Windows 2000 other than to install DirectX 8.0.
Send comments about this topic to Microsoft
Vertex Buffer Creation Handling on Windows 2000
4/26/2017 • 1 min to read • Edit Online

In DirectX 8.0, vertex (and index) buffers can be managed as textures were in DirectX 7.0. That is, a system memory
copy of a vertex buffer is maintained at all times and a video memory copy is only allocated when that vertex buffer
is actually required.
If the driver does not allocate a vertex buffer in video memory but, instead, requires the runtime to allocate the
buffer in system memory, it should not return DDHAL_DRIVER_NOTHANDLED but rather should return
DDHAL_DRIVER_HANDLED and indicate failure by setting a ddRVal of E_FAIL. If the driver returns
DDHAL_DRIVER_NOTHANDLED, the runtime attempts to allocate the surface from the video memory heaps
returned by the driver. This may either fail and return an error to the application or result in the surface being
allocated in local or nonlocal video memory (which is not the intention).
Therefore, if you wish the runtime to allocate a vertex buffer in system memory on your behalf, set ddRVal to
E_FAIL and return DDHAL_DRIVER_HANDLED.
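A hedged sketch of that return path in the driver's buffer-creation callback follows; the ddRVal setting and return code come from the text above, while the pcsd parameter name is an assumption.

// Ask the runtime to allocate this vertex buffer in system memory on
// the driver's behalf.
pcsd->ddRVal = E_FAIL;
return DDHAL_DRIVER_HANDLED;
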
Send comments about this topic to Microsoft
Vertex Buffer Renaming
4/26/2017 • 1 min to read • Edit Online

To improve parallelism between the driver and the runtime, Direct3D supports the concept of vertex buffer
"renaming". Essentially, this is a double buffering scheme for vertex buffers. In certain circumstances a driver can,
when passed a vertex buffer through a DDI call, modify the video memory pointer of the vertex buffer. In this way,
the driver can continue to process the contents of the vertex buffer, while, at the same time, the application can lock
and fill the vertex buffer. As far as the application is concerned it is using the same vertex buffer. The fact that the
memory pointed to by that vertex buffer has been modified is hidden by the runtime and driver.
Although previous versions of DirectX supported vertex buffer renaming there have been certain changes with
DirectX 8.0. In previous versions of Direct3D, renaming was primarily accomplished via the D3dDrawPrimitives2
DDI entry point. Flags specified in D3DHAL_DRAWPRIMITIVES2DATA would specify whether the driver could
swap the vertex or command buffer and if so, what the required sizes of the buffers would be. However, in DirectX
8.0, vertex buffer swapping is not accomplished through D3dDrawPrimitives2 (although calls through legacy
interfaces still exploit this mechanism) but rather through the LockExecuteBuffer (LockD3DBuffer) DDI entry point.
DirectX 8.0 defines a new lock flag, D3DLOCK_DISCARD, that, when passed to the driver, indicates that the caller
does not require the existing contents of the buffer, so they can be discarded before the pointer to the vertex
buffer data is returned. Hence, when the driver receives a vertex buffer lock call with the D3DLOCK_DISCARD flag
set, it can choose to rename the vertex buffer by setting fpVidMem to a new value.
Note that the D3DLOCK_DISCARD flag will not be passed to the driver by the initial retail release of Windows 2000.
The flag will be passed on Windows 2000 Service Pack 1 (SP1) and all subsequent versions of Windows 2000.
In DirectX 7.0, vertex buffer renaming could also be accomplished via LockExecuteBuffer using the flag
DDLOCK_DISCARDCONTENTS. However, the synchronization between runtime and driver on the original release
of DirectX 7.0 prevents this mechanism from working correctly. The version of DirectX 7.0 released with
DirectX 8.0 corrects this problem, and vertex buffer renaming at lock time is functional through DirectX 7.0
interfaces.
Send comments about this topic to Microsoft
Handling Renaming on Windows 2000
4/26/2017 • 4 min to read • Edit Online

In order to correctly perform vertex buffer renaming it is important to understand the nature of the fpVidMem
pointer stored in the surface global object on Windows 2000 and later. The interpretation of fpVidMem depends
on the type of memory in which the surface is stored. For both system and nonlocal video memory (AGP) surfaces
the fpVidMem is a pointer directly into the user-mode address space of the process owning that surface.
For local video memory surfaces, fpVidMem is an offset from the start of video memory. In order to convert this
to a user-mode pointer it is necessary to add the base address of video memory as mapped into a user-mode
process. This base address can be found in the fpProcess field of the DirectDraw local object for a given process.
Although fpVidMem for a nonlocal video memory surface is simply a user-mode pointer, the means by which this
user-mode pointer is generated is somewhat complex. It is necessary to understand how the Windows 2000
kernel maintains AGP heaps and manages surface allocations from them. The first important point is that, for
nonlocal heaps, the start address of the heap maintained by the kernel may not be a pointer into any real address
space. It is, in fact, normally a numerical offset designed to ensure that valid allocations from that heap cannot have
a NULL (zero) address.
It may be helpful to think of AGP heaps as residing in a conceptual address space that does not correspond to any
real address space. The fpStart field of an AGP heap is the base address of the heap in this conceptual address
space. Furthermore, any surfaces allocated from an AGP heap have an fpHeapOffset that also lies in this
conceptual address space. Thus, fpHeapOffset is an offset from the base of this conceptual address space and it is not an
offset from the start of the heap itself. Furthermore, it is not a pointer into any real address space. In order for a
user-mode process to access the memory of a surface fpHeapOffset must be mapped (through pointer
arithmetic) into the address space of that user-mode process. When a surface is created, the kernel performs this
mapping according to the formula outlined below.
Given a surface (pSurface), a kernel-mode AGP heap (pvmHeap) and a mapping of the heap into a particular
user-mode process (pMap), the following formula is used to compute the actual, user-mode fpVidMem for a
surface:

fpVidMem = pMap->pvVirtAddr +
    (pSurface->fpHeapOffset - pvmHeap->fpStart)

pvVirtAddr is the base address of the user-mode mapping of the AGP heap into a given process. fpStart is the
offset of the base of the AGP heap into the conceptual address space described above and fpHeapOffset is the
offset of the start of the surface from the base of the same conceptual address space.
Your driver is notified of the conceptual base address of AGP heaps through the DdGetDriverInfo callback. When
DdGetDriverInfo is called with GUID_UpdateNonLocalHeap the fpGARTLin field of the data structure passed is the
same value as fpStart, that is, the base address of the start of the AGP heap in the conceptual address space.
Unfortunately, your driver is not notified of the value of pvVirtAddr, and it is not visible to the driver through any
of the data structures passed to the driver. Therefore, its value has to be computed from the fpVidMem computed
by the kernel for the vertex buffer on initial creation: simply subtract (fpHeapOffset - fpStart) from that fpVidMem.
Given the fpHeapOffset of the new memory to be swapped into the vertex buffer on renaming, the new value of
fpVidMem can then be easily computed.
The following code fragment demonstrates computing a new fpVidMem for an AGP surface in a lock call.
// Get the vertex buffer's surface local and global objects from the
// lock data.
LPDDRAWI_DDRAWSURFACE_LCL pLcl = pLockData->lpDDSurface;
LPDDRAWI_DDRAWSURFACE_GBL pGbl = pLcl->lpGbl;

// Get the heap this vertex buffer was allocated from.
LPVIDEOMEMORY pHeap = pGbl->lpVidMemHeap;

// Get the current fpVidMem for the vertex buffer.
FLATPTR fpCurrentVidMem = pGbl->fpVidMem;

// Compute the virtual base address of the mapping of this AGP heap
// into the process owning this vertex buffer.
FLATPTR pvVirtAddr = fpCurrentVidMem -
    (pGbl->fpHeapOffset - pHeap->fpStart);

// Given the fpHeapOffset of the nonlocal video memory to be swapped
// into the vertex buffer, compute the new fpVidMem as follows.
FLATPTR fpNewVidMem = pvVirtAddr + (fpNewHeapOffset - pHeap->fpStart);

// Now store the new fpVidMem in the surface global object and
// also in the lock data.
pGbl->fpHeapOffset = fpNewHeapOffset;
pGbl->fpVidMem = fpNewVidMem;
pLockData->lpSurfData = (LPVOID)fpNewVidMem;

// Return success and driver handled.
pLockData->ddRVal = DD_OK;
return DDHAL_DRIVER_HANDLED;

In order to make nonlocal video memory accessible to a user-mode process it is necessary for the memory to be
both committed and mapped to the user-mode process. To ensure that this is done when vertex buffer renaming is
being performed, it is essential that the new memory for the vertex buffer be allocated using the EngXxx function
HeapVidMemAllocAligned. This guarantees that the memory is committed and mapped before use.
HeapVidMemAllocAligned returns an offset into the conceptual address space of the AGP heap and, therefore,
this pointer can be used as an fpHeapOffset directly.
If the driver returns DDHAL_DRIVER_HANDLED for a lock of an AGP surface the kernel code returns the value of
lpSurfData in the DD_LOCKDATA data structure to the runtime and application. If the driver returns
DDHAL_DRIVER_NOTHANDLED the kernel simply returns the value of fpVidMem to user mode. Therefore, it is not
necessary to return DDHAL_DRIVER_HANDLED as long as fpVidMem is updated to point to the new user-mode
pointer. However, we recommend that the driver both set fpVidMem and lpSurfData and return
DDHAL_DRIVER_HANDLED.
Send comments about this topic to Microsoft
New DP2 Stream Drawing Tokens
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0's support for multiple streams of vertex data requires that new DP2 drawing tokens be introduced.
These new tokens are necessary because existing drawing tokens assumed that there was a single pointer to vertex
data for a particular drawing instruction. With multiple streams, this is no longer the case. A drawing command
may well access multiple vertex data buffers simultaneously through streams.
Note that these drawing tokens replace the existing primitive type specific tokens (for example,
D3DDP2OP_POINTS, D3DDP2OP_TRIANGLELIST, D3DDP2OP_TRIANGLESTRIP) for calls through the new DirectX
8.0 interfaces only. Calls made through DX7 or earlier interfaces are still passed through the DDI as the old style
drawing tokens. Therefore, a DX8 driver is required to support both old and new style drawing tokens.
The indexed and nonindexed drawing tokens have two variants. For example, nonindexed drawing is accomplished
by the tokens D3DDP2OP_DRAWPRIMITIVE and D3DDP2OP_DRAWPRIMITIVE2. Similarly, indexed drawing is
accomplished by the tokens D3DDP2OP_DRAWINDEXEDPRIMITIVE and D3DDP2OP_DRAWINDEXEDPRIMITIVE2.
The main distinction between the two variants is that D3DDP2OP_DRAWPRIMITIVE2 and
D3DDP2OP_DRAWINDEXEDPRIMITIVE2 are used when the vertex data has been transformed by the runtime. This
is either because the driver/hardware combination does not support hardware vertex processing or the software
vertex processing has been explicitly selected. For these tokens, only stream zero is used and it contains
transformed and lit vertices.
D3DDP2OP_DRAWPRIMITIVE and D3DDP2OP_DRAWINDEXEDPRIMITIVE are used when the runtime has not
processed the vertex data. Thus, these tokens can supply untransformed vertex data when the hardware supports
hardware vertex processing or transformed vertex data when the application supplies transformed data directly to
the runtime. In this case, any number of streams (up to MaxStreams) can be active. These variants (along with the
other new drawing token, D3DDP2OP_CLIPPEDTRIANGLEFAN) enable optimal code paths in the runtime and the
distinctions beyond those described here are not significant to the driver.
Send comments about this topic to Microsoft
Copying Vertex and Index Buffers in the DP2 Stream
4/26/2017 • 1 min to read • Edit Online

A new DP2 token, D3DDP2OP_BUFFERBLT, has been added to support optimal copying and updating of index and
vertex buffers. This token is very similar to the existing D3DDP2OP_TEXBLT that copies and updates textures but has
been modified to support subbuffer copying rather than simple rectangles.
Send comments about this topic to Microsoft
Point Sprites
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 introduces support for point sprites. A point sprite is an extension to basic point rendering that allows
the size of the point to be specified, either by a render state or by a vertex component. When accelerated, point
sprites are rendered in hardware as a screen-space quadrilateral formed of two triangles, with render states such as
texturing and blending applied.
Send comments about this topic to Microsoft
Reporting Support for Point Sprites
4/26/2017 • 2 min to read • Edit Online

A driver notifies the runtime of its support for point sprites by setting the MaxPointSize field of the D3DCAPS8
structure to a floating-point number greater than one (reporting a value of one is part of the requirement to
indicate a DX8 level HAL). This value specifies the maximum point width and height in render target pixels. Devices
that do not support point sprites can set this value to 1.0.
The size of a point sprite can be specified either by a new per-vertex element or by a new render state. If the driver
and hardware combination supports the interleaving of point size information with other vertex data (rather than
simply through the point size render state D3DRS_POINTSIZE), it should set the D3DFVFCAPS_PSIZE flag in the
FVFCaps field of the D3DCAPS8 structure.
The absence of D3DFVFCAPS_PSIZE indicates that the device does not support a vertex format specified in point
size (indicated by the D3DFVF_PSIZE flag); therefore, the base point size is always specified with the
D3DRS_POINTSIZE render state.
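For example, a device that supports 64x64 point sprites and per-vertex point size might report the following (the maximum point size shown is illustrative and hardware dependent):

// Point sprites up to 64x64 render-target pixels, with per-vertex
// point size supported in the FVF.
myD3DCaps8.MaxPointSize = 64.0f;
myD3DCaps8.FVFCaps     |= D3DFVFCAPS_PSIZE;
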
DX8 drivers for which the D3DFVFCAPS_PSIZE flag is not set are still required to accept D3DFVF_PSIZE and must
ignore any point size data passed through the flexible vertex format (FVF). Note that the D3DUSAGE_POINTS flag
must be set for vertex buffers that are to be used for rendering point sprites. If this flag is set, the driver can avoid
allocating these vertex buffers in memory types that are slow for reads into the CPU.
Point sprites present a challenge when user clip planes are being used. It is possible that a particular hardware
implementation of point sprites will clip only the actual vertex position of the point sprite against the user clip
plane, rather than the expanded quad actually rendered. If the driver and hardware combination can support
clipping of point sprites by their actual computed size rather than simple vertex position then the
D3DPMISCCAPS_CLIPPLANESCALEDPOINTS capability bit should be set in the PrimitiveMiscCaps field of
D3DCAPS8.
DX8 drivers that perform transform and lighting (that is, offer hardware vertex processing) are responsible for a
correct point sprite implementation. No emulation is performed by the DirectX 8.0 runtime. This means that even if
the hardware is used with software vertex processing, point sprites are the DX8 driver's responsibility. However, in
DirectX 8.1 and later, if the hardware is used with software vertex processing, the runtime can provide emulation.
Send comments about this topic to Microsoft
Computing the Size of Point Sprites
4/26/2017 • 2 min to read • Edit Online

Point sprites are rendered by using the existing D3DPT_POINT primitive type. The size of point sprites can be
controlled either through the new render state D3DRS_POINTSIZE or by the new FVF component D3DFVF_PSIZE.
For vertices without the D3DFVF_PSIZE vertex component, the current value of the D3DRS_POINTSIZE render state
should be used. Otherwise, the value specified in the vertex data should be used. In either case, the value is a
floating-point number that is the size (width and height) of the rendered quad in rendering target pixels. The default
value of the point size render state (1.0) is sent to the driver during initialization.
Two render states control clamping of the computed point sprite size, D3DRS_POINTSIZE_MIN and
D3DRS_POINTSIZE_MAX. The computed size of the point should be clamped to be no smaller than the size given by
D3DRS_POINTSIZE_MIN and no larger than the size given by D3DRS_POINTSIZE_MAX. It is the driver's
responsibility to ensure that the point sprite size is clamped to the minimum and maximum sizes specified by the
render states.
For drivers that support hardware vertex processing, the size of point sprites may also be scaled based on the
distance from the point to the eye (in eye space). Scaling of the point sprites is enabled by the new render state
D3DRS_POINTSCALEENABLE. If the value of this render state is TRUE then the points are scaled according to the
following parameters, the S formula, and maximum/minimum determination. Note that in this case the
application-specified point size is expressed in camera space units. This scaling is performed by drivers that support
transform and lighting only.
Sᵢ: Input point size (either per-vertex or D3DRS_POINTSIZE)
A, B, C: Point scale factors D3DRS_POINTSCALEA/B/C
V: Height of the viewport (dwHeight field in D3D_VIEWPORT)
Pₑ = (Xₑ, Yₑ, Zₑ): Eye-space position of the point
Dₑ = sqrt(Xₑ² + Yₑ² + Zₑ²): Distance from the eye to the point (eye at origin)
S = V * Sᵢ * sqrt(1/(A + B*Dₑ + C*Dₑ²)): Screen-space point size
Smax: MaxPointSize (member of D3DCAPS8) device capability
Smin: D3DRS_POINTSIZE_MIN
The final screen-space point size S is then clamped: it is set to Smax if S > Smax, to Smin if S < Smin, and is left unchanged otherwise.
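The following is a minimal sketch of the computation described above, not a required driver entry point; the function and parameter names are illustrative.

#include <math.h>

// Compute the final screen-space point size from the inputs defined above.
float ComputePointSpriteSize(float Si,                  // input point size
                             float A, float B, float C, // D3DRS_POINTSCALEA/B/C
                             float V,                   // viewport height
                             float De,                  // eye-space distance
                             float Smin, float Smax)    // clamp limits
{
    float S = V * Si * sqrtf(1.0f / (A + B * De + C * De * De));
    if (S > Smax) S = Smax;
    if (S < Smin) S = Smin;
    return S;
}
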
Note that for the application to be drawing single pixel vertices, rather than point sprites, it must have the following
render states set:

SetRenderState(D3DRS_POINTSCALEENABLE, FALSE);
// All textures must be turned off.
SetTexture(0, NULL);
SetTextureStageState(1, D3DTSS_COLOROP, D3DTOP_DISABLE);
// The point size render state must be set to a value between 0.0 and 1.0.
SetRenderState(D3DRS_POINTSIZE, 1.0);
// D3DRS_POINTSIZE_MIN and D3DRS_POINTSIZE_MAX
// must be set appropriately to allow
// D3DRS_POINTSIZE to be set to a value between 0.0 and 1.0.

Send comments about this topic to Microsoft


Rendering Point Sprites
4/26/2017 • 2 min to read • Edit Online

A screen space point P = (X, Y, Z, W) of screen-space size S is rasterized as a quadrilateral with the following 4
vertices:

(X−S/2, Y−S/2, Z, W)
(X+S/2, Y−S/2, Z, W)
(X−S/2, Y+S/2, Z, W)
(X+S/2, Y+S/2, Z, W)

The vertex color attributes are duplicated at each of the 4 vertices, therefore each point is always rendered with
constant colors.
The assignment of texture coordinates is controlled by the D3DRS_POINTSPRITEENABLE setting. If
D3DRS_POINTSPRITEENABLE is set to FALSE, then the texture coordinates of the vertex are duplicated at each of
the 4 vertices. If no texture coordinates are present in the vertex the default values of (0.0f, 0.0f, 0.0f, 1.0f) are used
for the corners of the point sprite. If the D3DRS_POINTSPRITEENABLE is set to TRUE, then the texture coordinates
at the 4 vertices, starting from the top left corner and winding clockwise, are set to:

(0.0f, 0.0f)
(1.0f, 0.0f)
(0.0f, 1.0f)
(1.0f, 1.0f)

When clipping is enabled, points are clipped as follows: If the vertex is outside the view frustum in Z (either near or
far), then the point is not rendered. If the point, taking into account the point size, is totally outside the viewport in x
or y, then the point is not rendered. Remaining points are rendered. Note that it is possible for the point position to
be outside the viewport (in x or y) and still be partially visible.
Points may or may not be correctly clipped to user-defined clip planes. If D3DPMISCCAPS_CLIPPLANESCALEDPOINTS
is not set, then points are clipped to user-defined clip planes based only on the vertex position, ignoring the point
size. In this case, scaled points are fully rendered when the vertex position is inside the clip planes, and are
discarded when the vertex position is outside a clip plane. Applications may prevent potential 'popping' artifacts by
adding a border geometry to clip planes that is as large as the maximum point size.
If the D3DPMISCCAPS_CLIPPLANESCALEDPOINTS bit is set, then the scaled points are correctly clipped to user-
defined clip planes.
It is important to remember that point sprites should have no dependencies on the culling or fill modes. Point
sprites should always be rendered regardless of the cull or fill mode.
It is also important that, in point fill mode with flat shading, the rules for flat shading a primitive are complied
with. This means that the first vertex of a primitive dictates the color of that primitive and hence the color of each
vertex of the primitive. This is not what occurs with version 8.0 of the reference rasterizer or the sample driver; it
is fixed in version 8.1.
Send comments about this topic to Microsoft
Volume Textures
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 adds support for volume or 3D textures. Such textures have depth in addition to width and height.
Send comments about this topic to Microsoft
Reporting Support for Volume Textures
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 introduces two new primitive texture capabilities flags that the driver sets to indicate support for volume
textures. These flags are D3DPTEXTURECAPS_VOLUMEMAP and D3DPTEXTURECAPS_MIPVOLUMEMAP.
D3DPTEXTURECAPS_VOLUMEMAP should be set in the dwTextureCaps field of the D3DPRIMCAPS8 structure
(part of D3DCAPS8) if the hardware has support for volume textures. D3DPTEXTURECAPS_MIPVOLUMEMAP
indicates that the driver supports MIP mapped volume textures.
Hardware that supports volume textures must also support the use of volume textures in multitexturing scenarios
(in combination with other volume textures or 2D textures). If this scenario is not supported by the hardware, the
driver cannot set D3DPTEXTURECAPS_VOLUMEMAP.
The driver can indicate that it requires the dimensions of the volume texture to be a power of 2 by setting the
primitive texture capability D3DPTEXTURECAPS_VOLUMEMAP_POW2.
A driver that supports volume textures is also required to specify the minimum and maximum volume texture
dimensions that it supports. The field MaxVolumeExtent should be set to the maximum supported dimensions of
the volume texture. The same constraint must apply to all three dimensions of the volume texture (width, height
and depth).
A driver notifies the runtime of the volume texture filtering and texture addressing modes supported by the
hardware by setting the VolumeTextureFilterCaps and VolumeTextureAddressCaps to the appropriate
combinations of flags.
Finally, the driver notifies the runtime about what surface formats can be used with volume textures by setting the
D3DFORMAT_OP_VOLUMETEXTURE in the dwOperations field of the surface format's DDPIXELFORMAT.
Send comments about this topic to Microsoft
Handling the Creation of Volume Textures
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 introduces a new surface capability bit DDSCAPS2_VOLUME. This flag is set in the
ddsCapsEx.dwCaps2 field of the surface's DD_SURFACE_MORE structure. In the DdCreateSurface and
D3dCreateSurfaceEx callbacks the depth of the volume texture can be found in the low word of the dwCaps4
field of the extended surface capabilities (ddsCapsEx) of the surface's DD_SURFACE_MORE structure. The driver
should return the "slice pitch" (that is, the number of bytes to add to move from one 2D slice of the volume to the
next) of the volume texture in the dwBlockSizeY field of the surface global structure.
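A hedged sketch of that handling inside DdCreateSurface follows; the dwCaps4 and dwBlockSizeY fields come from the text above, while the local names pSurfaceMore and pSurfaceGlobal and the particular pitch arithmetic (row pitch times height) are illustrative assumptions.

// The volume depth is packed into the low word of dwCaps4 of the
// extended surface capabilities.
DWORD dwDepth = LOWORD(pSurfaceMore->ddsCapsEx.dwCaps4);

// Return the slice pitch: the byte distance from one 2D slice of the
// volume to the next. Here it is assumed to be the row pitch multiplied
// by the surface height; dwDepth would drive the total allocation size.
pSurfaceGlobal->dwBlockSizeY = dwRowPitch * dwSurfaceHeight;
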
Send comments about this topic to Microsoft
Copying Volume Textures in the DP2 Stream
4/26/2017 • 1 min to read • Edit Online

A new DP2 token, D3DDP2OP_VOLUMEBLT, has been added to support optimal copying and updating of volume
textures. This token is very similar to the existing D3DDP2OP_TEXBLT that copies and updates textures but has been
extended to support subvolume (box) copying rather than simple rectangles.
Send comments about this topic to Microsoft
Locking a Subvolume of a Volume Texture
4/26/2017 • 1 min to read • Edit Online

DirectX 8.1 introduces a new feature that lets a driver lock just a subvolume of a volume texture. When a driver's
DdLock function is called, the driver can improve system performance by locking just a subvolume instead of the
whole volume texture.
To indicate support of this feature, the driver must set the D3DDEVCAPS_SUBVOLUMELOCK bit in the DevCaps
member of the D3DCAPS8 structure. The driver returns a D3DCAPS8 structure in response to a GetDriverInfo2
query as described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is described in
Supporting GetDriverInfo2.
After support of this feature is determined, the driver can receive a DdLock call with the
DDLOCK_HASVOLUMETEXTUREBOXRECT bit set in the dwFlags member of the passed DD_LOCKDATA structure.
This bit informs the driver to lock down the specified subvolume texture. The driver must then obtain the front and
back coordinates of the locked subvolume from the left and right members of the RECTL structure that is specified
in the rArea member of DD_LOCKDATA. The driver obtains the front and back coordinates from the higher 16 bits
of the left and right members respectively.
The left and right coordinates of the locked subvolume are constrained to the lower 16 bits of the left and right
members. The driver uses the top and bottom members of the RECTL structure in rArea unchanged to specify the
top and bottom coordinates of the locked subvolume. In this way, the rArea member effectively provides three
coordinate sets to specify the locked subvolume. The RECTL structure is described in the Microsoft Windows SDK
documentation.
The following code shows how to obtain the front and back coordinates:

"real" left = rArea.left && 0xFFFF;


"real" right = rArea.right && 0xFFFF;
front = rArea.left >> 16;
back = rArea.right >> 16;

This feature is available on Windows Me and Windows XP and later versions. This feature is also available on
Windows 2000 and Windows 98 operating system versions that have the DirectX 8.1 runtime installed on them.
Send comments about this topic to Microsoft
Presentation
4/26/2017 • 2 min to read • Edit Online

DirectX 8.0 formalizes the concept of "presentation" (or making the results of rendering visible to the user) in the
API. Previously, this was accomplished either by page flipping in full screen mode or by blitting in windowed mode.
Applications use the new Present API to perform either full screen flipping or windowed mode blitting. However,
this mechanism is not yet exposed at the DDI level. The runtime simply maps the Present API to either the DdFlip
or DdBlt DDI entry points depending on the application mode.
DirectX 8.0 has added two new DirectDraw blt flags that are passed to the driver as notification of when a blt
operation is actually part of a Present and therefore marks a frame boundary. These new flags are
DDBLT_PRESENTATION and DDBLT_LAST_PRESENTATION. Two flags are necessary because clipping may result in
a single Present call invoking multiple blt operations in the driver. In this case, all of the blts that are invoked as a
result of the Present operation have the DDBLT_PRESENTATION flag set. However, only the final blt of the
sequence used to perform the Present has the DDBLT_LAST_PRESENTATION bit set. Therefore, if blt is used to
implement a Present call, the driver sees zero or more blts with the DDBLT_PRESENTATION bit set followed by
exactly one blt with both the DDBLT_PRESENTATION and DDBLT_LAST_PRESENTATION bits set. These flags are
never set by the application. Only the runtime is allowed to pass these flags to a blt. In addition, these flags are only
passed to drivers supporting the DirectX 8.0 DDI.
The driver is only permitted to queue a maximum of three frames. If the driver sees a blt call with
DDBLT_PRESENTATION set and it already has three DDBLT_LAST_PRESENTATION blts queued it must fail the call
with DDERR_WASSTILLDRAWING. The runtime retries until the queue has drained sufficiently.
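A rough sketch of that throttling rule inside DdBlt follows; the flags and return values come from the text above, while the QueuedLastPresentationBlts helper and the pBltData member names are hypothetical.

// Limit the presentation queue to three frames per render target.
if ((pBltData->dwFlags & DDBLT_PRESENTATION) &&
    QueuedLastPresentationBlts(pBltData->lpDDSrcSurface) >= 3)
{
    pBltData->ddRVal = DDERR_WASSTILLDRAWING;
    return DDHAL_DRIVER_HANDLED;
}
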
If the driver cannot effectively determine when a DDBLT_LAST_PRESENTATION blt in the queue has been retired,
then the driver must not queue frames at all. DDBLT_LAST_PRESENTATION should cause such drivers to return
DDERR_WASSTILLDRAWING until the accelerator is completely finished, exactly as if the application had called
Lock on the source surface before calling Blt.
Finally, in the case of multiple windowed applications running simultaneously, the driver should count presentation
blts based on the source of each blt, rather than the primary, that is, the driver is allowed to queue three frames per
window/render target. This results in better performance.
Send comments about this topic to Microsoft
New DDSCAPS2 Flags
4/26/2017 • 1 min to read • Edit Online

A new flag, DDSCAPS2_DISCARDBACKBUFFER, has been introduced to indicate that preservation of the back buffer
is not required. It is set on the primary surface and the back buffers if the application has set
D3DSWAPEFFECT_DISCARD on the Present API.
DX8 runtimes now set another new flag, DDSCAPS2_NOTUSERLOCKABLE, on the primary and the back buffers if
the flipping chain is not lockable, or on any render target that is not lockable. This allows drivers to do behind the
scenes optimization. Note that it is still possible to lock the surfaces so the driver must handle these cases, but such
locks are infrequent and are not expected to be fast.
The driver can also determine whether the depth/stencil buffer is lockable by the presence of the
DDSCAPS2_NOTUSERLOCKABLE flag.
Send comments about this topic to Microsoft
Present and GetBltStatus
4/26/2017 • 1 min to read • Edit Online

For DX8 the runtime no longer calls DdGetBltStatus on blts involving system memory surfaces. This was always the
behavior on Windows 2000. The result is that asynchronous DMA to or from system memory surfaces is no longer
possible. DX8 drivers should not page lock system memory surfaces by themselves, and system memory to video
memory transfers should be synchronous.
Send comments about this topic to Microsoft
Palettized Textures
4/26/2017 • 1 min to read • Edit Online

Although API support for palettized textures has been changed for DirectX 8.0, this is not reflected in the DDI. The
existing palette-oriented DP2 tokens continue to be used to notify the driver of the binding between a palette and a
texture and of updates to palettes.
The driver cannot assume that the lpPalette field of the surface structure points to a valid palette just because an
association between the surface and a palette has been established with D3DDP2OP_SETPALETTE. The association
between a palette and a surface established by the DP2 stream is not reflected in the actual surface and palette data
structures.
Furthermore, DirectDraw's palette DDI entry points are not called for these palettes. All DDI notifications of texture
palette operations are done through the DP2 stream.
Send comments about this topic to Microsoft
Cursors
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 has added an API to support high update frequency cursors without requiring API level direct access to
the primary surface. For DirectX 8.0, the cursor is the standard GDI cursor if capabilities permit, or else it is
emulated with DirectDraw blts. To support the DirectX cursor API, the driver has to return capability information in
D3DCAPS8.
The CursorCaps field should be set to D3DCURSORCAPS_MONO, D3DCURSORCAPS_COLOR, or both, to indicate
support for monochrome and color hardware cursors. The MaxCursorEdgeSize field should be set to the
minimum of the maximum width and maximum height of the hardware cursor (or zero if no hardware cursor is
supported). It is not possible to express different maximum sizes for the width and height of the cursor.
Send comments about this topic to Microsoft
Direct3D Shaders
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 includes support for programmable vertex and pixel shaders. The following sections discuss these
shaders:
Vertex Shaders
Pixel Shaders
Send comments about this topic to Microsoft
Vertex Shaders
4/26/2017 • 1 min to read • Edit Online

All drivers that support the DirectX 8.0 DDI must support the new DP2 token D3DDP2OP_SETVERTEXSHADER even
if programmable vertex shaders are not supported in hardware. This is because D3DDP2OP_SETVERTEXSHADER is
the mechanism by which the FVF code of incoming vertex data is communicated to the driver when using fixed
function as well as programmable vertex processing.
D3DDP2OP_SETVERTEXSHADER can be used to notify the driver of either the handle of the current programmable
vertex shader to use or the FVF code of the vertex data for fixed function vertex processing. The handle space for
vertex shaders is managed by the runtime and includes valid FVF codes. Thus, a vertex shader handle can refer
either to a programmable vertex shader handle previously created by means of the
D3DDP2OP_CREATEVERTEXSHADER DP2 token, or to the FVF code of a vertex format to be processed by fixed
function vertex processing.
The driver for hardware that does not support programmable vertex processing should process
D3DDP2OP_SETVERTEXSHADER to determine the FVF code (and hence the processing to be performed) on the
vertex data bound to stream zero. This is particularly important when processing user memory (UM) primitives. In
this case, the only way of determining the FVF code of the supplied vertex data is through the
D3DDP2OP_SETVERTEXSHADER token. If the least significant bit of the handle is set (1), then the handle is a vertex
shader handle. If the least significant bit is clear (0), then the handle is a legacy FVF code.
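A minimal sketch of that test when processing D3DDP2OP_SETVERTEXSHADER follows; the local variable name is illustrative.

// Decide whether the handle names a programmable shader or an FVF code.
if (dwVertexShaderHandle & 1)
{
    // Handle of a shader created via D3DDP2OP_CREATEVERTEXSHADER.
}
else
{
    // Legacy FVF code; use fixed-function vertex processing.
}
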
If the FVF code of a vertex buffer conflicts with that specified by D3DDP2OP_SETVERTEXSHADER the driver should
ignore the FVF code of the vertex buffer and continue.
The DirectX runtime guarantees that only FVF codes are passed as vertex shader handles to a driver that does not
support programmable vertex processing. However, such a driver should have debug code to verify that the FVF
code that is passed is supported.
Send comments about this topic to Microsoft
Reporting Support for Programmable Vertex
Processing Hardware
4/26/2017 • 1 min to read • Edit Online

For a DirectX 8.0 level driver to report support for programmable vertex shader hardware it must set the
VertexShaderVersion field of the D3DCAPS8 structure to a valid, nonzero vertex shader version number. The
VertexShaderVersion is a DWORD where the most significant word must have the value 0xFFFE and the least
significant word holds the actual version number. The least significant byte of this word holds the minor version
number and the most significant byte holds the major version number. Because the format of this DWORD is
complex, the driver must set the value of VertexShaderVersion using the macro D3DVS_VERSION defined in
d3d8types.h. For example, the following code fragment sets the VertexShaderVersion to indicate support for 1.0
level functionality.

myD3DCaps8.VertexShaderVersion = D3DVS_VERSION(1, 0);

To report no support for programmable vertex shaders, the following code fragment would be used:

myD3DCaps8.VertexShaderVersion = D3DVS_VERSION(0, 0);

Drivers that do not support programmable vertex processing should set VertexShaderVersion to zero.
In addition to setting the vertex shader version, the driver should report the number of constant registers it has for
vertex shading. In order to support the 1.0 vertex shading specification, the device must have at least 96 constant
registers. The driver reports the number of constant registers in the MaxVertexShaderConst field of the
D3DCAPS8 structure. For example, the following code fragment reports the minimum number of constant registers
required for version 1.0 vertex shaders.

myD3DCaps8.MaxVertexShaderConst = 96;

d3d8types.h defines a symbol for the minimum number of constant registers required by version 1.0 of the vertex
shader specification. This symbol is D3DVS_CONSTREG_MAX_V1_0 and it is recommended that the driver use this
symbol unless it supports more than 96 constant registers.
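For example:

// Use the DDK-defined minimum unless the hardware exposes more than
// 96 vertex shader constant registers.
myD3DCaps8.MaxVertexShaderConst = D3DVS_CONSTREG_MAX_V1_0;
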
Send comments about this topic to Microsoft
Pixel Shaders
4/26/2017 • 1 min to read • Edit Online

A driver that supports the DirectX 8.0 DDI needs to support the new DP2 token D3DDP2OP_SETPIXELSHADER only if
programmable pixel shaders are supported in hardware.
D3DDP2OP_SETPIXELSHADER can be used to notify the driver of the handle of the current programmable pixel
shader to use. A pixel shader handle refers to a programmable pixel shader handle previously created by means of
the D3DDP2OP_CREATEPIXELSHADER DP2 token.
Send comments about this topic to Microsoft
Reporting Support for Programmable Pixel
Processing Hardware
4/26/2017 • 1 min to read • Edit Online

For a DirectX 8.0 level driver to report support for programmable pixel shader hardware, it must set the
PixelShaderVersion field of the D3DCAPS8 structure to a valid, nonzero pixel shader version number. The
PixelShaderVersion is a DWORD where the most significant word must have the value 0xFFFF and the least
significant word holds the actual version number. The least significant byte of this word holds the minor version
number and the most significant byte holds the major version number. Because the format of this DWORD is
complex, the driver must set the value of PixelShaderVersion using the macro D3DPS_VERSION defined in
d3d8types.h. For example, the following code fragment sets the PixelShaderVersion to indicate support for 1.0
level functionality.

myD3DCaps8.PixelShaderVersion = D3DPS_VERSION(1, 0);

Drivers that do not support programmable pixel processing should set PixelShaderVersion to zero.
Unlike reporting the number of constant registers a device has for vertex shaders, a device cannot expose more
constant registers than are defined by the pixel shader version it specifies. For example, a device that implements
the 1.0 pixel shader specification must expose only eight constant pixel shader registers. However, there is an
additional pixel shader related capability that a driver should set, MaxPixelShaderValue. This field gives the
internal range of values supported for pixel color blending operations.
Implementations must allow data within the range they report to pass through pixel processing unmodified (for
example unclamped). This value normally defines the limits of a signed range, that is, an absolute value. Therefore,
for example, 1 indicates that the range is [-1.0 to 1.0], and 8 indicates that the range is [-8.0 to 8.0]. For pixel shader
version 1.0 to 1.3, the driver must set the value in MaxPixelShaderValue to a minimum of 1. For 1.4, the driver
must set the value in MaxPixelShaderValue to a minimum of 8.
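For example, a driver exposing pixel shader version 1.4 might report the following; the values simply reflect the minimums described above.

myD3DCaps8.PixelShaderVersion  = D3DPS_VERSION(1, 4);
// Pixel shader 1.4 requires an internal blending range of at least [-8.0, 8.0].
myD3DCaps8.MaxPixelShaderValue = 8.0f;
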
Send comments about this topic to Microsoft
High Order Surfaces
4/26/2017 • 1 min to read • Edit Online

DirectX 8 introduces support for a class for high order surfaces. This section describes the mechanics of the DDI for
these patch surfaces but it does not describe the algorithms used. Refer to the reference rasterizer source code in
the Driver Development Kit (DDK) for the details of the algorithms that are used.
Send comments about this topic to Microsoft
Reporting Support for High Order Surfaces
4/26/2017 • 1 min to read • Edit Online

A driver reports its support for high order surfaces using four new capability bits in the DevCaps field of the
D3DCAPS8 structure. These flags are as follows:
D3DDEVCAPS_QUINTICRTPATCHES
Device supports quintic béziers and B-splines.
D3DDEVCAPS_RTPATCHES
Device supports rectangular and triangular patches.
D3DDEVCAPS_RTPATCHHANDLEZERO
When this device capability is set, the hardware architecture does not require caching of any information, and
uncached patches (handle zero) are drawn as efficiently as cached ones. Note that
D3DDEVCAPS_RTPATCHHANDLEZERO does not mean that a patch with handle zero can be drawn. A handle zero
patch can always be drawn whether this cap is set or not.
D3DDEVCAPS_NPATCHES
Device supports n-patches.
Send comments about this topic to Microsoft
High Order Surface DP2 Stream Drawing Tokens
4/26/2017 • 1 min to read • Edit Online

The D3DDP2OP_DRAWRECTPATCH token is sent to the driver to draw a rectangular patch. The
D3DDP2OP_DRAWTRIPATCH token is sent to the driver to draw a triangular patch.
Send comments about this topic to Microsoft
High Order Surface Render States
4/26/2017 • 1 min to read • Edit Online

There are three render states that are used with high order surfaces. These render states are described below.
D3DRS_PATCHEDGESTYLE
This render state is used to control whether patch edges use discrete or continuous tessellation. See the DirectX 8.0
SDK documentation for more details.
D3DRS_PATCHSEGMENTS
This render state gives the number of segments to be used for each edge of the patch. If an explicit number of
segments is specified in the DP2 token those segments should override the value of this render state. For more
details, see the DirectX 8.0 SDK documentation.
D3DRS_DELETERTPATCH
This render state notifies the driver that a patch is to be deleted. For more information, see
D3DRENDERSTATETYPE.
Send comments about this topic to Microsoft
Multisample Rendering
4/26/2017 • 1 min to read • Edit Online

DirectX 8.0 introduces support for multisample rendering with the number of samples per pixel under application
control. The IDirect3DDevice8 interface supports multisampling in both fullscreen and windowed modes of
operation. Furthermore, there is sufficient flexibility to support hardware that performs the processing of samples
into pixels at the back end (directly out of the frame buffer) or at the front end (via a special flip or blt call). For more
information about IDirect3DDevice8, see the DirectX 8.0 documentation.
Send comments about this topic to Microsoft
Reporting Multisample Support
4/26/2017 • 1 min to read • Edit Online

A driver reports the multisample capabilities of its associated hardware by specifying the number of samples per
pixel for each surface format it reports. The DDPIXELFORMAT structure has been extended to include a structure
called MultiSampleCaps. This structure has members that let the driver express the number of samples per pixel
for both flip (fullscreen) and blt (windowed) multisampling. Each of these members is a WORD type in which each
bit of the WORD value indicates support for a given number of samples per pixel. Hence, the driver can express
support for several different sample counts with a single surface format entry.
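As a hedged example (the bit encoding shown, bit n-1 set meaning n samples per pixel, and the ddpf local name are assumptions that should be verified against the DirectX 8.0 DDK headers):

// Report support for 2 and 4 samples per pixel when flipping
// (fullscreen) and no multisampling support for blt (windowed).
ddpf.MultiSampleCaps.wFlipMSTypes =
    (1 << (D3DMULTISAMPLE_2_SAMPLES - 1)) |
    (1 << (D3DMULTISAMPLE_4_SAMPLES - 1));
ddpf.MultiSampleCaps.wBltMSTypes = 0;
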
Send comments about this topic to Microsoft
Multisample Support through StretchBlt
4/26/2017 • 1 min to read • Edit Online

Although not the recommended mechanism for supporting multisampling, the driver can implement
multisampling support by rendering to a large back buffer and performing a stretch blt to resample the large back
buffer to the lower resolution primary. However, if this is the mechanism by which the driver supports
multisampling, the driver must set the new capability bit D3DPRASTERCAPS_STRETCHBLTMULTISAMPLE in the
RasterCaps member of the D3DCAPS8 structure. For a description of D3DCAPS8, see the DirectX 8.0 SDK
documentation.
When the driver sets the D3DPRASTERCAPS_STRETCHBLTMULTISAMPLE bit, it indicates that it:
Fails requests from applications to enable and disable full-scene anti-aliasing while the same scene is being
rendered. That is, it fails requests to turn on and off the BOOL value of the D3DRS_MULTISAMPLEANTIALIAS
device render state (D3DRENDERSTATETYPE) during the rendering of a single scene. Note that requests to
change the BOOL value of D3DRS_MULTISAMPLEANTIALIAS must not fail for a different scene. That is, if
D3DRS_MULTISAMPLEANTIALIAS is TRUE for one scene, it could be FALSE for another scene.
Is nonresponsive to requests from applications to modify samples in a multisample render target. That is, it
does not respond to setting the bitmask of the D3DRS_MULTISAMPLEMASK device render state
(D3DRENDERSTATETYPE).
It is important to note that if the driver uses a stretch blt to perform a page flip in fullscreen mode, the driver should
specify the supported sample counts in the wFlipMSTypes member of the DDPIXELFORMAT's
MultiSampleCaps structure, and not the wBltMSTypes member, because a flip is being performed.
Send comments about this topic to Microsoft
Handling the Creation of Multisampled Surfaces
4/26/2017 • 1 min to read • Edit Online

When a multisampled surface is being created, the number of samples can be found in the ddsCapsEx.dwCaps3 of
the DD_SURFACE_MORE structure. This field holds one of the values of the enumerated type
D3DMULTISAMPLE_TYPE. It is not a bitfield like wFlipMSTypes or wBltMSTypes. If a surface is not multisampled,
dwCaps3 has the value D3DMULTISAMPLE_NONE (0).
When determining whether a creation request for a multisample surface can be satisfied or not, the driver should
not take into account the current value of the D3DRS_MULTISAMPLEANTIALIAS render state. It is not permissible
for a driver to fail a request to set D3DRS_MULTISAMPLEANTIALIAS FALSE. Therefore, any restriction that affects
the ability to perform multisample rendering should be enforced at context create time even if
D3DRS_MULTISAMPLEANTIALIAS is FALSE at that time.
Send comments about this topic to Microsoft
Accessing a Multisampled Primary Surface
4/26/2017 • 1 min to read • Edit Online

The Direct3D runtime prevents high-performance CPU access to multisampled buffers. However, the runtime might
call a driver's DdLock function for low-performance access to multisampled buffers, such as for screen-shots and
for image verification in test scenarios.
Because the runtime cannot process the sample layout of multisampled buffers, the driver must convert the format,
and the driver's DdLock function must return a buffer of data that contains the contents of the primary surface in a
single sample-per-pixel format. If an application calls IDirect3DDevice8::GetFrontBuffer to obtain a copy of the front
buffer of a multisampled flipping chain, the Direct3D runtime calls the driver's DdLock function to lock the front
buffer. This buffer contains a version of the current front buffer that is resolved to the nominal width, height and
pixel format of the primary surface.
If such a buffer is available in device memory, then the driver can return a pointer to that buffer. If such a buffer is
not available in device memory (as is the case for devices that resolve multisample buffers at scan-out time), then
the driver should allocate a buffer in system memory and resolve the multisampled front buffer into this system
buffer. The runtime lets the driver take as much time as required to resolve the multisampled front buffer into this
system buffer.
Regardless of whether the runtime sets the DDLOCK_READONLY flag when it calls the driver's DdLock function, the
runtime treats these buffers as read only. Therefore, the driver is not required to copy any data from the system
memory surface back into device memory. In addition, the driver's DdUnlock function is not required to convert the
single sample-per-pixel format back to the primary surface's multisampled format.
Calls by applications to the cursor methods of the IDirect3DDevice8 interface can also result in DdBlt calls
targeting a multisampled primary. These DdBlt calls must handle the conversion from the single sample-per-pixel
cursor data to the multisampled primary.
For more information about IDirect3DDevice8, see the DirectX 8.0 SDK documentation.
Send comments about this topic to Microsoft
Controlling Multisampling
4/26/2017 • 1 min to read • Edit Online

Two render states of the D3DRENDERSTATETYPE enumeration control multisample rendering. For more
information about D3DRENDERSTATETYPE, see the DirectX 8.0 SDK documentation.
D3DRS_MULTISAMPLEANTIALIAS
A BOOL value that determines how individual samples are computed when using a multisample render target
buffer. When set to TRUE, the multiple samples are computed so that full-scene anti-aliasing is performed by
sampling at different sample positions for each multiple sample. When set to FALSE, the multiple samples are all
written with the same sample value (sampled at the pixel center), which allows non-antialiased rendering to a
multisample buffer. This render state has no effect when rendering to a single sample buffer. The default value is
TRUE.
D3DRS_MULTISAMPLEMASK
Each bit in this mask, starting at the LSB, controls modification of one of the samples in a multisample render target.
Thus, for an 8-sample render target, the low byte contains the 8 write enables for each of the 8 samples. This render
state has no effect when rendering to a single sample buffer. The default value is 0xFFFFFFFF.
This render state enables use of a multisample buffer as an accumulation buffer, doing multipass rendering of
geometry where each pass updates a subset of samples.
Each sample in a multisample render target contributes uniform intensity to the final presented image. Consider,
for example, a render target with 3 samples per pixel in which only 2 samples are enabled through multisample
masking. The resulting intensity of the rendered image is then 2/3; that is, each red, green,
and blue component of every pixel is scaled by 2/3.

Pure Devices

DirectX 8.0 introduces the concept of a "pure" device. When using a pure device the runtime does not track state or
state blocks or perform any software vertex processing on behalf of the hardware. Furthermore, the application
cannot query back state from the runtime. The lack of state tracking, particularly when state blocks are being used,
can result in a significant performance boost for the application.
Only vertex processing directly supported by the hardware is available to the application when using a pure device.
For example, for cards that do not support hardware transform and lighting, only pretransformed vertices can be
passed to Direct3D. Furthermore, the API functions SetClipStatus, GetClipStatus and ProcessVertices cannot be
used with the pure device.
To use a pure device, the application must request it with the D3DCREATE_PUREDEVICE device-creation flag,
and the driver must report its ability to act as a pure device.

Reporting Pure Device Capability

A driver reports the ability to support pure devices by setting the new device capability D3DDEVCAPS_PUREDEVICE
in the DevCaps field of the D3DCAPS8 structure.
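For illustration, a minimal sketch of reporting this capability when the driver fills in its D3DCAPS8 data (d3dCaps8 is a placeholder for wherever the driver keeps its capability structure):

/* Advertise that the device can operate as a pure device. */
d3dCaps8.DevCaps |= D3DDEVCAPS_PUREDEVICE;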

State Block Recording and Pure Devices

State block handling is different for a device operating in pure device mode. In this mode, the state block control
DP2 token (D3DDP2OP_STATESET) is sent to the driver with a new operation type (in the dwOperations field). This
new operation type is D3DHAL_STATESETCREATE.

Processing the D3DDP2OP_CLEAR DP2 Token

DirectX 8.0 introduces some changes to the required processing of the D3DDP2OP_CLEAR token. Specifically, a new
flag, D3DCLEAR_COMPUTERECTS, has been added to the dwFlags field of the D3DHAL_DP2CLEAR data structure.
This new flag is only passed to the driver when a pure device type is being used (that is, D3DCREATE_PUREDEVICE
was specified when creating the device and the driver exports the D3DDEVCAPS_PUREDEVICE device cap).
Furthermore, this flag is never passed to non-DirectX 8.0 drivers and it is not specified by using the legacy Clear or
Clear2 driver callbacks.

Clipping Transformed Vertices

The Direct3D 8.0 runtime fully supports the clipping of pretransformed vertices through both the DrawPrimitive
and ProcessVertices API calls. This clipping includes user-defined clipping planes as well as Z and the X and Y
viewport extents. However, the runtime does not guarantee the clipping of posttransformed vertices.
Posttransformed vertex data is passed directly from the application to the driver by the runtime. This does not
imply that a driver is required to fully clip posttransformed vertex data. A new capability flag
D3DPMISCCAPS_CLIPTLVERTS has been added for DirectX 8.0. If the driver sets this flag in the PrimitiveMiscCaps
field of the D3DCAPS8 structure, the application can assume that the driver fully clips posttransformed vertex data
to Z and the X and Y viewport extents. Clipping to user-defined clip planes is never supported for posttransformed
data. If the driver does not set this flag, the application is required to perform clipping of the posttransformed
data to the Z extents and to (at least) the guard-band extents in X and Y.
It is important to note that the runtime does not validate that the application has correctly clipped posttransformed
data. It is the driver's responsibility to ensure that a crash or hang does not occur if unclipped or incorrectly clipped
data is passed when this flag is set.

Enabling Alpha Channels On Full-Screen Back Buffers

In the DirectDraw DDI, the creation of a primary flipping chain has no intrinsic pixel format. Consequently, surfaces
in this chain take on the pixel format of the display mode. For example, a primary flipping chain created in a 32bpp
mode takes on a D3DFMT_X8R8G8B8 format.
Such a chain is created for many full-screen applications. Because the back buffer of the chain has no alpha channel,
the D3DRS_ALPHABLENDENABLE render state and the associated blend-render states for destination surfaces are
poorly defined. DirectX 8.1 introduces a new feature that the Direct3D runtime uses to inform a driver of an
application's request to create a full-screen flipping chain of surfaces with an alpha channel in the pixel formats of
those surfaces.
To indicate support of this feature, the driver must set the D3DCAPS3_ALPHA_FULLSCREEN_FLIP_OR_DISCARD bit
(defined in the d3d8caps.h file) in the Caps3 member of the D3DCAPS8 structure. The driver returns a D3DCAPS8
structure in response to a GetDriverInfo2 query as described in Reporting DirectX 8.0 Style Direct3D Capabilities.
Support of this query is described in Supporting GetDriverInfo2.
After support of this feature is determined, the driver can receive DdCreateSurface calls with the
DDSCAPS2_ENABLEALPHACHANNEL (defined in the ddraw.h file) bit set in the dwCaps2 member of the
DDSCAPS2 structure. This bit is only set to create surfaces that are part of a primary flipping chain or that are
stand-alone back buffers.
If the driver detects this bit, the surfaces take on not the display mode's format, but the
display mode's format plus an alpha channel. For example, in a 32bpp mode, such surfaces should be given the
D3DFMT_A8R8G8B8 format.
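A minimal sketch of that format decision for a 32bpp mode (the function name and its single parameter are hypothetical):

/* Choose the format for a surface in a full-screen flipping chain.
   In a 32bpp mode, surfaces normally take the display format
   (D3DFMT_X8R8G8B8); with DDSCAPS2_ENABLEALPHACHANNEL set they take
   the display format plus alpha (D3DFMT_A8R8G8B8). */
D3DFORMAT ChooseBackBufferFormat(DWORD dwCaps2)
{
    if (dwCaps2 & DDSCAPS2_ENABLEALPHACHANNEL)
        return D3DFMT_A8R8G8B8;
    return D3DFMT_X8R8G8B8;
}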
This feature is available on Windows XP and later versions and on Windows 2000 operating system versions that
have the DirectX 8.1 runtime installed.

GDI Event Services in Windows 2000

GDI Event Services describes a group of GDI event-related functions that a display driver can use for
synchronization. While these event-related functions are documented as only available in Microsoft Windows XP
and later, most of them are also available in Microsoft Windows 2000. Although most of these event-related
functions are available in Windows 2000, using them in a driver implemented for Windows 2000 is discouraged
because such a driver could make Windows 2000 unreliable.
The event-related functions that are available in Windows 2000 behave the same in Windows 2000 as they do in
Windows XP, except for the EngWaitForSingleObject function. The EngWaitForSingleObject implementation in
Windows 2000 returns a DWORD value rather than the BOOL value that the Windows XP implementation returns.
This DWORD value can be one of the following values:
Zero
Indicates that one of the following operations occurred:
The wait succeeded. That is, the specified event object was set to the signaled state. The thread that called
EngWaitForSingleObject can resume processing.
The calling thread passed an invalid event-object pointer to the pEvent parameter of
EngWaitForSingleObject.
Any nonzero value
This value is an NTSTATUS status value that indicates the specific error condition. For example, STATUS_TIMEOUT
indicates that a time-out occurred.
Note The EngClearEvent and EngReadStateEvent functions are not available in Windows 2000.
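The following sketch only illustrates interpreting that Windows 2000 return value as described above; pEvent and pTimeout are assumed to be a valid PEVENT and PLARGE_INTEGER, and the cast reflects that the function is prototyped as returning BOOL:

DWORD status = (DWORD)EngWaitForSingleObject(pEvent, pTimeout);
if (status == 0) {
    /* Either the wait succeeded or an invalid event pointer was passed. */
} else {
    /* status is an NTSTATUS code, for example STATUS_TIMEOUT. */
}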

DirectX 9.0 Release Notes

The following sections contain update information for DirectX 9.0 and focus on those areas of the DDI that have
been modified or extended for DirectX 9.0:
Header Files for DirectX 9.0 Drivers
Required DirectX 9.0 Driver Support
Recommended DirectX 9.0 Driver Support
Optional DirectX 9.0 Driver Support
Note that the DirectX 9.0 runtime only supplies hardware acceleration in any form if the display driver is a DirectX
7.0 or later driver (that is, the driver supports at least the DirectX 7.0 DDI).
The Updates for Windows DDK section contains update information for specific versions of the Microsoft Windows
Driver Development Kit (DDK) with which the DirectX 9.0 DDK is installed.
The Updates for Earlier DirectX DDK Versions section contains update information that applies to version 9.0 as
well as to prior versions.

Header Files for DirectX 9.0 Drivers

A DirectX 9.0 display driver's source code must include the d3d9.h header file. The header files d3d9caps.h and
d3d9types.h are included in d3d9.h.
To support DirectX 8.1 and earlier versions of the DirectX runtime, the driver's source code must include both old
and new DirectX headers, for example d3d.h, d3d8.h, and d3d9.h.
To avoid problems when building a DirectX 9.0 version driver, define DIRECT3D_VERSION as 0x0900 in the driver's
source code before including any header files. Doing so prevents the possibility of DirectX 9.0 features being
missed as described in the DIRECT3D_VERSION topic. To ensure that the build process retrieves all the necessary
symbols in header files, include d3d9.h and d3d8.h before winddi.h or d3dnthal.h.
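For example, a driver source file targeting both the DirectX 9.0 and earlier runtimes might begin as follows (a sketch of the ordering described above):

#define DIRECT3D_VERSION 0x0900   /* define before including any DirectX headers */

#include <d3d9.h>    /* also pulls in d3d9caps.h and d3d9types.h */
#include <d3d8.h>
#include <d3d.h>
#include <winddi.h>
#include <d3dnthal.h>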

Required DirectX 9.0 Driver Support

The DirectX 9.0 runtime will supply hardware acceleration if the display driver is a DirectX 7.0 or later driver.
However, for a driver to be loaded by the operating system as a version 9.0 driver, it must implement the features
that are described in the following sections:
Supporting Two-Dimensional Operations
Supporting Dynamic Resources
Supporting Vertex Shader Declarations
Supporting Stream Offsets
Reporting Support of UBYTE4 Vertex Element
Supporting Commands for Setting Render Target
Setting Scissor Rectangle
Notifying about DirectX Version
Reporting DDI Version
A DirectX 9.0 version driver must support:
Reporting the capabilities of its device by returning a D3DCAPS9 structure when requested. The driver returns a
D3DCAPS9 structure in response to a GetDriverInfo2 request using the D3DGDI2_TYPE_GETD3DCAPS9 value
similarly to how it returns a D3DCAPS8 structure as described in Reporting DirectX 8.0 Style Direct3D
Capabilities. Support of this request is described in Supporting GetDriverInfo2. D3DCAPS9 contains both
DirectX 9.0 and DirectX 8.0 related capabilities. The driver must continue to report only DirectX 8.0 related
capabilities in D3DCAPS8 when queried by the DirectX 8.0 runtime.
Setting the D3DFORMAT_OP_BUMPMAP flag in the dwOperations member of the DDPIXELFORMAT
structure for all surface formats that can support bump mapping in either fixed-function or programmable-
pixel pipes.
Reporting support of asynchronous query operations, even if the driver just responds by indicating that no
query types are supported. For more information, see Verifying Support of Query Types. Querying
asynchronously imposes two new requirements on the D3dDrawPrimitives2 DDI. For more information, see
Imposing Requirements on the D3dDrawPrimitives2 DDI.
Letting applications perform other processing with busy present queues.

Supporting Two-Dimensional Operations

The DirectX 9.0 runtime directs a driver to perform two-dimensional (2D) pixel-copy operations differently
depending on the version of the driver that the runtime detects. For a DirectX 8.1 and earlier driver, the runtime
calls the driver's DdBlt function and synchronizes this call with the command stream. For a DirectX 9.0 and later
driver, the runtime passes the D3DDP2OP_BLT, D3DDP2OP_SURFACEBLT, or D3DDP2OP_COLORFILL operation
code along with the D3DHAL_DP2BLT, D3DHAL_DP2SURFACEBLT, or D3DHAL_DP2COLORFILL structure
respectively in the command stream. DirectX 9.0 and later drivers must support these 2D operation codes.
If the runtime specifies the DDBLT_COLORFILL flag in a call to a DirectX 8.1 or earlier driver's DdBlt function, the
runtime converts the D3DCOLOR fill-color type to an explicit pixel value as long as the runtime recognizes the
target surface format (that is, the code for the format is one of the codes in the D3DFORMAT enumerated type). If
the format is supplied by the vendor and not recognized by the runtime, the runtime passes the D3DCOLOR fill-
color type directly to the driver for processing. However, the runtime does convert the D3DCOLOR fill-color type
to an explicit pixel value for certain color formats that are used by DirectShow but are otherwise private to the
driver.
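A minimal dispatch sketch for these 2D operation codes inside the D3dDrawPrimitives2 command loop; the handler names (HandleDp2Blt and so on) are hypothetical, only the first structure after each command is shown, and the command-buffer walking is omitted:

switch (pDp2Cmd->bCommand)   /* pDp2Cmd: current D3DHAL_DP2COMMAND in the stream */
{
case D3DDP2OP_BLT:
    HandleDp2Blt((const D3DHAL_DP2BLT *)(pDp2Cmd + 1));
    break;
case D3DDP2OP_SURFACEBLT:
    HandleDp2SurfaceBlt((const D3DHAL_DP2SURFACEBLT *)(pDp2Cmd + 1));
    break;
case D3DDP2OP_COLORFILL:
    HandleDp2ColorFill((const D3DHAL_DP2COLORFILL *)(pDp2Cmd + 1));
    break;
}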

Reporting Support for 2D Operations Using Surface Formats

The driver specifies flags in the dwOperations member of the DDPIXELFORMAT structure for a surface's format
to indicate that it can perform 2D operations using that format.
For example, the driver can indicate that it can copy to or from and color fill to a surface by setting the
D3DFORMAT_OP_OFFSCREENPLAIN flag.
When the driver uses vendor-supplied codes or codes from the D3DFORMAT enumerated type to set the
dwFourCC member of DDPIXELFORMAT and assign the format for a surface, the driver can also use the
D3DFORMAT_OP_CONVERT_TO_ARGB and D3DFORMAT_MEMBEROFGROUP_ARGB flags to indicate whether
color conversion can be performed between source and target surfaces. That is, a target surface that has the
D3DFORMAT_MEMBEROFGROUP_ARGB flag set indicates that its color format can be converted from any source
surface that has the D3DFORMAT_OP_CONVERT_TO_ARGB flag set.
The driver can only specify the D3DFORMAT_MEMBEROFGROUP_ARGB flag for target surface formats with at least
5 bits of color information per channel. That is, the D3DFMT_A1R5G5B5 format set in the dwFourCC member of
DDPIXELFORMAT is valid. However, the D3DFMT_A4R4G4B4 format is invalid. The driver is also constrained to
certain source surface formats when specifying the D3DFORMAT_OP_CONVERT_TO_ARGB flag. Source formats
can be any format that is valid for the D3DFORMAT_MEMBEROFGROUP_ARGB flag or a FOURCC surface format.
Note that although D3DFORMAT_OP_CONVERT_TO_ARGB and D3DFORMAT_MEMBEROFGROUP_ARGB indicate
ARGB formats, the runtime also lets the driver specify surfaces with XRGB formats (for example,
D3DFMT_X1R5G5B5). If the driver specifies D3DFORMAT_MEMBEROFGROUP_ARGB or
D3DFORMAT_OP_CONVERT_TO_ARGB with an invalid format, the runtime prevents the Direct3D HAL from
loading.

Using DXVA with 2D Operations

DirectX 9.0 and later drivers use the D3DDP2OP_BLT operation code to perform blits between DirectX Video
Acceleration (DXVA) surfaces. Therefore, if the runtime detects a DirectX 9.0 or later driver, the runtime must call the
driver's D3dCreateSurfaceEx function to create any DXVA (or 2D-only) surface.

Supporting Dynamic Resources

A DirectX 9.0 version driver must support the following dynamic resources:
Dynamic Vertex and Index Buffers
Dynamic Textures

Dynamic Vertex and Index Buffers

A dynamic vertex or index buffer is a resource that an application frequently locks and writes to. When a dynamic
buffer is locked in a call to the driver's LockD3DBuffer function, the DDLOCK_OKTOSWAP bit (also known as the
D3DLOCK_DISCARD bit) of the dwFlags member of the DD_LOCKDATA structure can be set to indicate that the
caller does not require the existing contents of the buffer. Therefore, the driver can discard the contents before
returning the pointer to the buffer data. Because the caller does not require the existing contents, the driver can
rename the buffer by setting the fpVidMem member of the DD_SURFACE_GLOBAL structure for the buffer to a
new value. By renaming the buffer (that is, setting up multiple buffering), the driver avoids hardware stalling.
The DDLOCK_OKTOSWAP bit can only be set to lock dynamic buffers and never to lock static buffers.
Note that drivers should store dynamic buffers in AGP memory because if dynamic buffers are stored in local
video memory and an application writes data into those buffers in a nonsequential manner, bus performance
might be seriously affected.
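A minimal sketch of the discard path in the driver's buffer-lock handling; AllocateBufferAlias is a hypothetical helper that returns a new memory address for the renamed buffer:

/* pLock is the DD_LOCKDATA for the request; pSurfGbl is the buffer's
   DD_SURFACE_GLOBAL. */
if (pLock->dwFlags & DDLOCK_OKTOSWAP) {
    /* The caller discards the old contents, so rename the buffer to avoid
       stalling while the hardware still reads the previous memory. */
    FLATPTR fpNew = AllocateBufferAlias(pSurfGbl);
    if (fpNew != 0) {
        pSurfGbl->fpVidMem = fpNew;
    }
}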

Dynamic Textures

Dynamic textures are almost exactly the same as dynamic buffers. Because applications also frequently lock and
modify dynamic textures, drivers should:
Optimize the texture upload or tiling speed.
Create dynamic textures in a nontiled manner if the hardware architecture lets the driver use nontiled
textures. This is because the performance improvement received from not requiring the driver to untile
dynamic textures when the textures are locked is greater than from the fill-rate advantages of tiling.
Set up multiple buffering similar to the description in Dynamic Vertex and Index Buffers. That is, set the
DDLOCK_OKTOSWAP bit to lock dynamic textures. Similarly, storing dynamic textures in local video
memory can also cause system performance to suffer if the application writes to such textures in a
nonsequential manner. Therefore, the driver should store dynamic textures in AGP memory.

Supporting Vertex Shader Declarations

A DirectX 9.0 version driver must support vertex shader declarations as described in the following topics:
Separating Declarations and Code for Vertex Shaders
Supporting Vertex Elements Sharing Offset in a Stream
Handling Vertex Elements

Separating Declarations and Code for Vertex Shaders

In DirectX 9.0, declarations and code for a vertex shader are no longer bound together when the vertex shader is
created. A DirectX 9.0 version driver for a device that supports vertex shaders must handle separate creations and
management of declaration and code objects. However, this DirectX 9.0 driver must still be able to manage a
vertex shader object, which combines both declarations and code, because the DirectX 8.0 runtime might request
to create such a vertex shader object. For more information, see Vertex Shaders.
The DirectX 9.0 runtime assigns handles from separate handle pools to both declaration and code objects. The
DirectX 9.0 driver must store these handles in separate arrays. Like the vertex shader handle space in DirectX 8.0,
the DirectX 9.0 vertex shader declaration handle space is shared with flexible vertex format (FVF) codes. A handle
with bit zero set identifies a vertex shader declaration; otherwise, the handle is an FVF code. For more information,
see the reference rasterizer (refrast.cpp sample code).
The DirectX 9.0 driver receives a vertex shader declaration when it processes the
D3DDP2OP_CREATEVERTEXSHADERDECL operation code in its D3dDrawPrimitives2 function. A
D3DHAL_DP2CREATEVERTEXSHADERDECL structure and an array of D3DVERTEXELEMENT9 structures that
define the vertex elements that make up the shader declaration follow the operation code in the command stream.
If the DirectX 9.0 driver is implemented to process vertex elements of the shader declaration, it must support all
the possible uses of the vertex data. That is, it must support all the D3DDECLUSAGE types along with multiple
meanings (usage-index values) for those types. For more information about D3DVERTEXELEMENT9 and
D3DDECLUSAGE, see the latest DirectX SDK documentation.
The DirectX 9.0 driver receives vertex shader code when it processes the
D3DDP2OP_CREATEVERTEXSHADERFUNC operation code. A D3DHAL_DP2CREATEVERTEXSHADERFUNC
structure and the vertex shader code follow the operation code in the command stream. For more information
about the format of individual shader code and the tokens that comprise each shader code, see Direct3D Driver
Shader Codes.
The DirectX 9.0 driver processes the D3DDP2OP_SETVERTEXSHADERDECL and
D3DDP2OP_SETVERTEXSHADERFUNC operation codes to make particular vertex shader declaration and code
current in the vertex shader assembler. The driver processes the D3DDP2OP_DELETEVERTEXSHADERDECL and
D3DDP2OP_DELETEVERTEXSHADERFUNC operation codes to remove these vertex shader declaration and code
from the vertex shader assembler. For each of these operation codes, a D3DHAL_DP2VERTEXSHADER structure
follows in the command stream. This structure contains just one member that identifies the handle to the
declaration or code to set or delete.
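Because the declaration handle space is shared with FVF codes as described above, a driver can distinguish the two with a test such as the following sketch (the macro name is hypothetical):

/* Bit zero set: vertex shader declaration handle; bit zero clear: FVF code. */
#define IS_VERTEX_DECL_HANDLE(dwHandle)   (((dwHandle) & 1) != 0)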

Supporting Vertex Elements Sharing Offset in a Stream

A DirectX 9.0 version driver indicates that its device lets multiple vertex elements share the same offset in a stream
by setting the D3DDEVCAPS2_VERTEXELEMENTSCANSHARESTREAMOFFSET capability bit in the DevCaps2
member of the D3DCAPS9 structure. A vertex shader declaration consists of an array of vertex elements. For more
information, see Separating Declarations and Code for Vertex Shaders.
If a DirectX 9.0 driver for a device that supports pixel shader (PS) versions earlier than 3.0 sets
D3DDEVCAPS2_VERTEXELEMENTSCANSHARESTREAMOFFSET, the driver can handle most vertex declarations with
elements that specify the D3DDECLUSAGE_POSITIONT (0) usage type. Such a pre-PS 3.0 driver converts vertex
declarations with D3DDECLUSAGE_POSITIONT (0) to a valid flexible vertex format (FVF). However, a pre-PS 3.0
driver cannot handle vertex declarations with elements that specify the D3DDECLUSAGE_POSITIONT (0) usage type
if the declarations have gaps in texture coordinates. For example, a pre-PS 3.0 driver cannot handle the following
vertex declaration:

{0, 0,  D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITIONT, 0}
{0, 16, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD,  0}
{0, 24, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD,  5}

Because there is a gap in the texture coordinates, a pre-PS 3.0 driver cannot express the
D3DDECLUSAGE_TEXCOORD elements using FVF.
If a DirectX 9.0 driver for a device that supports pixel shader 3.0 and later sets
D3DDEVCAPS2_VERTEXELEMENTSCANSHARESTREAMOFFSET, the driver must handle all vertex declarations with
elements that specify the D3DDECLUSAGE_POSITIONT (0) usage type. This driver must let multiple vertex
elements:
Share the same offset in a stream.
Be different types. Therefore, they can have different sizes.
Overlap arbitrarily. For example, one element can start at a location of a stream that is currently in the
middle of another element.

Handling Vertex Elements

The number of vertex elements in a shader declaration that a DirectX 9.0 version driver can handle depends on
whether the driver's device supports fixed-function or programmable vertex processing. For more information
about vertex elements in a shader declaration, see Separating Declarations and Code for Vertex Shaders.
If the device supports fixed-function vertex processing, the driver must handle up to 17 vertex elements (FVF
codes).
If the device supports programmable vertex processing, the driver must handle up to 64 vertex elements and skip
over those elements that it does not use. Because each channel (4 maximum) of an input register (16 maximum) for
a device that supports vertex shader 3_0 and later can be declared separately, up to 64 (16 * 4) vertex elements are
possible. This maximum number of 64 does not include the end element, which is formed from the D3DDECL_END
macro.

Supporting Stream Offsets

A DirectX 9.0 version driver must support letting applications store vertex data of multiple vertex formats in a
single vertex data stream. Applications notify the driver of where vertex data of a particular format is located in the
vertex data stream by supplying the stream offset, in bytes, to the beginning of that vertex data. To support stream
offset, the driver must process the D3DDP2OP_SETSTREAMSOURCE2 operation code in its D3dDrawPrimitives2
function. A D3DHAL_DP2SETSTREAMSOURCE2 structure, which follows the operation code in the command
stream, is used to specify the stream and the offset to where vertex data is located.

Reporting Support for Stream Offsets

A DirectX 9.0 version driver must indicate support for stream offsets by setting the D3DDEVCAPS2_STREAMOFFSET
capability bit in the DevCaps2 member of the D3DCAPS9 structure.

Reporting Support of UBYTE4 Vertex Element

A DirectX 9.0 version driver must report support of the UBYTE4 vertex element type by setting the
D3DDTCAPS_UBYTE4 bit in the DeclTypes member of the D3DCAPS9 structure. To indicate nonsupport of the
UBYTE4 vertex element type, the driver does not set the D3DDTCAPS_UBYTE4 bit. In contrast, a DirectX 8.1 and
earlier driver sets the D3DVTXPCAPS_NO_VSDT_UBYTE4 bit to indicate nonsupport of the UBYTE4 vertex element
type.

Supporting Commands for Setting Render Target

A DirectX 9.0 version driver must support new operation codes that set the render target. These operation codes
are discussed in the following topics:
Setting Multiple Render Targets and Depth Stencils
Verifying Validity of Render Target

Setting Multiple Render Targets and Depth Stencils

A DirectX 9.0 version driver must process D3DDP2OP_SETRENDERTARGET2 and D3DDP2OP_SETDEPTHSTENCIL
operation codes in its D3dDrawPrimitives2 function even if it does not support rendering to multiple targets
simultaneously. D3DHAL_DP2SETRENDERTARGET2 and D3DHAL_DP2SETDEPTHSTENCIL structures
respectively follow these codes in the command stream.

Verifying Validity of Render Target

A DirectX 9.0 version driver must verify whether its internal render target is valid before using the render target
because the DirectX 9.0 runtime permits applications to set render targets to NULL. In contrast, DirectX 8.1 and
earlier runtimes guarantee that render targets are always valid for a Direct3D context.

Setting Scissor Rectangle

A DirectX 9.0 version driver must support setting a rectangular clipping region, that is, a scissor rectangle. After this
scissor rectangle is set, rendering is restricted to just the portion of the render target that is specified by the scissor
rectangle. To set a scissor rectangle, the driver must process the D3DDP2OP_SETSCISSORRECT operation code in
its D3dDrawPrimitives2 function. A RECT structure that specifies the rectangular clipping region follows the
operation code in the command stream.

Reporting Support for Scissor Rectangle

A DirectX 9.0 version driver indicates that its device supports a scissor test by setting the
D3DPRASTERCAPS_SCISSORTEST capability bit in the RasterCaps member of the D3DCAPS9 structure.
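Taken together, the DirectX 9.0 capability bits called out in the preceding topics might be reported as in the following sketch (d3dCaps9 is a placeholder for the driver's D3DCAPS9 data):

d3dCaps9.DevCaps2   |= D3DDEVCAPS2_STREAMOFFSET;                        /* stream offsets          */
d3dCaps9.DevCaps2   |= D3DDEVCAPS2_VERTEXELEMENTSCANSHARESTREAMOFFSET;  /* shared element offsets  */
d3dCaps9.DeclTypes  |= D3DDTCAPS_UBYTE4;                                /* UBYTE4 vertex element   */
d3dCaps9.RasterCaps |= D3DPRASTERCAPS_SCISSORTEST;                      /* scissor rectangle       */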

Notifying about DirectX Version

DirectX 8.0 and later drivers are always notified about the DirectX runtime version being used by an application in a
D3DGDI2_TYPE_DXVERSION request so they can report device capabilities for the version. In addition, because an
application requests operations on surfaces with various pixel formats, DirectX 9.0 and later drivers are also
notified about the DirectX runtime version that the application supports in D3DGDI2_TYPE_GETFORMATCOUNT
and D3DGDI2_TYPE_GETFORMAT queries so those drivers can appropriately handle the operations for the version.
For example, for version 8.0 of the DirectX runtime, a DirectX 9.0 or later driver can set the number of samples for a
multiple-sampled surface using elements of the D3DMULTISAMPLE_TYPE enumerated type regardless of whether
the driver supports maskable multisampling. However, for version 9.0 of the DirectX runtime, a DirectX 9.0 or later
driver must not set D3DMULTISAMPLE_TYPE bits in the DDSCAPS3_MULTISAMPLE_MASK mask unless the driver
supports the bits as maskable. For more information about D3DMULTISAMPLE_TYPE, see the DirectX SDK
documentation.
In a D3DGDI2_TYPE_GETFORMATCOUNT query, the DirectX 9.0 driver is notified of the runtime version in the
dwReserved member of the DD_GETFORMATCOUNTDATA structure. The dwReserved member is set to
DD_RUNTIME_VERSION, which is 0x00000900 for DirectX 9.0.
In a D3DGDI2_TYPE_GETFORMAT query, the DirectX 9.0 driver is notified of the runtime version in the dwSize
member of the DDPIXELFORMAT structure that is specified in the format member of the DD_GETFORMATDATA
structure. The dwSize member is also set to DD_RUNTIME_VERSION.

Reporting DDI Version

A DirectX 9.0 version driver must report the version of the DDI that it supports so that the DirectX 9.0 runtime can
determine how to handle the driver. To report the DDI version, the driver responds to a GetDriverInfo2 request
that uses the D3DGDI2_TYPE_GETDDIVERSION value. The dwDXVersion member of the
DD_GETDDIVERSIONDATA structure is set to 9 to indicate that the DirectX 9.0 runtime makes the request.
The driver sets the dwDDIVersion member of DD_GETDDIVERSIONDATA to the DDI version that it supports for
the DirectX 9.0 runtime. If the driver was built with a prereleased version of the DirectX 9.0 Driver Development Kit
(DDK) in which the DDI version number was lower than the number in the final version of DirectX 9.0, the runtime
treats the driver as DirectX 8.0 instead.
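A sketch of filling in that response (the surrounding GetDriverInfo2 plumbing is omitted, and DX9_DDI_VERSION is a placeholder for the DDI version constant defined in the final DirectX 9.0 DDK headers):

/* pData points to the DD_GETDDIVERSIONDATA for the request. */
if (pData->dwDXVersion == 9) {
    /* Report the DDI version this driver implements for the DirectX 9.0 runtime. */
    pData->dwDDIVersion = DX9_DDI_VERSION;
}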

Supporting Asynchronous Query Operations

The following topics describe how drivers support operations that query for information asynchronously:
Verifying Support of Query Types
Handling Asynchronous Queries
Imposing Requirements on the D3dDrawPrimitives2 DDI

Verifying Support of Query Types

The DirectX 9.0 runtime must verify which query types that a driver supports before any asynchronous query
operations can be performed. To verify the number of query types that the driver supports, the runtime sends a
GetDriverInfo2 request using the D3DGDI2_TYPE_GETD3DQUERYCOUNT value. If the driver does not support
any query types, it returns zero in the dwNumQueries member of the DD_GETD3DQUERYCOUNTDATA
structure for this request.
To receive information about each supported query type, the runtime sends a GetDriverInfo2 request using the
D3DGDI2_TYPE_GETD3DQUERY value for each type. The driver then returns information about the query type in a
DD_GETD3DQUERYDATA structure. For more information about GetDriverInfo2, see Supporting
GetDriverInfo2.

Handling Asynchronous Queries

A driver handles asynchronous query operations that are received in the command stream of its
D3dDrawPrimitives2 function as discussed in the following sequence:
1. The driver creates resources for a query after it receives a D3DDP2OP_CREATEQUERY operation code along
with a D3DHAL_DP2CREATEQUERY structure in the command stream.
2. The driver starts to process a query after it receives a D3DDP2OP_ISSUEQUERY operation code along with a
D3DHAL_DP2ISSUEQUERY structure in the command stream.
3. If previously submitted queries using the D3DDP2OP_ISSUEQUERY operation completed, the driver sets the
size of the response buffer in the dwErrorOffset member of the D3DHAL_DRAWPRIMITIVES2DATA
structure and sets the ddrval member of D3DHAL_DRAWPRIMITIVES2DATA to D3D_OK for successful
completion. The driver overwrites the command buffer in the incoming command stream with the response
buffer in the outgoing stream. The driver sets the D3DHAL_DP2RESPONSE structure's bCommand
member to D3DDP2OP_RESPONSEQUERY to indicate that responses to previously issued queries are
available in the response buffer. Each D3DHAL_DP2RESPONSEQUERY in the response buffer is followed
by the following data related to the query:
BOOL for D3DQUERYTYPE_EVENT. Before responding with D3DDP2OP_RESPONSEQUERY for an event,
the driver must ensure that the graphics processing unit (GPU) is finished processing all
D3DHAL_DP2OPERATION operations that are related to the event. That is, the driver only responds
after the event's ISSUE_END state occurs. Before the driver sets the event to the signaled state (set to
TRUE), the GPU might be required to perform a flush to ensure that the pixels are finished rasterizing,
blts are completed, resources are no longer being used, and so on. The driver must always set the event's
BOOL value to TRUE when responding.
DWORD for D3DQUERYTYPE_OCCLUSION. The driver sets this DWORD to the number of pixels for
which the z-test passed for all primitives between the begin and end of the query. If the depth buffer is
multisampled, the driver determines the number of pixels from the number of samples. However, if the
display device is capable of per-multisample z-test accuracy, the conversion to number of pixels should
generally be rounded up. An application can then check the occlusion result against 0, to effectively mean
"fully occluded." Drivers that convert multisampled quantities to pixel quantities should detect render
target multisampling changes and continue to compute the query results appropriately.
D3DDEVINFO_VCACHE structure for D3DQUERYTYPE_VCACHE.
If the supplied command buffer is too small for the driver to write all the responses, the driver also sends
D3DDP2OP_RESPONSECONTINUE in the outgoing stream.
4. If the runtime determines that the driver's D3dDrawPrimitives2 function succeeded (ddrval member of
D3DHAL_DRAWPRIMITIVES2DATA set to D3D_OK), the runtime examines the dwErrorOffset member of
D3DHAL_DRAWPRIMITIVES2DATA to determine if responses are available from the driver. This
dwErrorOffset member is zero if no responses are available; otherwise, dwErrorOffset is the size of the
response buffer in bytes. Therefore, on success of D3dDrawPrimitives2 (ddrval set to D3D_OK), the driver
must ensure that it only sets dwErrorOffset to nonzero when responses are available.
5. The runtime parses the returned response buffer and updates its internal data structures.
6. If the driver sent D3DDP2OP_RESPONSECONTINUE, the runtime submits an empty command buffer in the
incoming command stream so that the driver can continue to write more responses. The driver must ensure
that it can process empty command buffers.

Imposing Requirements on the D3dDrawPrimitives2 DDI

The ability of a DirectX 9.0 version driver to handle asynchronous queries imposes two new requirements on the
driver's D3dDrawPrimitives2 function. These requirements, which are mentioned in the Handling Asynchronous
Queries topic, are summarized in the following list:
The driver's D3dDrawPrimitives2 function must ensure that it can process empty command buffers because
the runtime might submit them so that the driver can write more responses. The runtime submits empty
command buffers in the incoming command stream if the driver previously returned the
D3DDP2OP_RESPONSECONTINUE operation code in the response buffer.
On success of D3dDrawPrimitives2 (ddrval of the D3DHAL_DRAWPRIMITIVES2DATA structure set to
D3D_OK), the driver must ensure that it only sets the dwErrorOffset member of
D3DHAL_DRAWPRIMITIVES2DATA to nonzero when responses are available. If the driver does not respond
to any queries and ddrval is D3D_OK, dwErrorOffset must be set to zero.
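A sketch of honoring that contract at the end of D3dDrawPrimitives2 processing (pDP2Data is the D3DHAL_DRAWPRIMITIVES2DATA for the call; cbResponses is a hypothetical count of response bytes written):

pDP2Data->ddrval = D3D_OK;
/* Only report a nonzero response-buffer size when responses were actually written. */
pDP2Data->dwErrorOffset = (cbResponses > 0) ? cbResponses : 0;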

Processing with Busy Present Queues

A DirectX 9.0 version driver must return the DDERR_WASSTILLDRAWING value from a call to its DdFlip function if
the runtime passed the DDFLIP_DONOTWAIT flag in the dwFlags member of the DD_FLIPDATA structure and the
driver is unable to schedule a presentation, for example, if the present queue is full or if the driver is waiting for a
vsync interval. The runtime calls the driver's DdFlip function with DDFLIP_DONOTWAIT set if an application called
the IDirect3DSwapChain9::Present method with the D3DPRESENT_DONOTWAIT flag set. If the driver cannot
schedule a presentation, its DdFlip function returns DDERR_WASSTILLDRAWING in the ddRVal member of
DD_FLIPDATA. The application's Present method in turn returns DDERR_WASSTILLDRAWING, which lets the
application perform other processing.
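A sketch of that check in a DdFlip implementation (PresentQueueIsFull and WaitingForVsync are hypothetical device-state helpers):

/* pFlip is the DD_FLIPDATA for the call. */
if ((pFlip->dwFlags & DDFLIP_DONOTWAIT) &&
    (PresentQueueIsFull() || WaitingForVsync())) {
    pFlip->ddRVal = DDERR_WASSTILLDRAWING;   /* cannot schedule the presentation now */
    return DDHAL_DRIVER_HANDLED;
}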
The D3DPRESENT_DONOTWAIT flag is new for DirectX 9.0. The DDFLIP_DONOTWAIT flag has been available since
DirectX 7.0. If a DirectX 7.0 application were to set DDFLIP_DONOTWAIT in a call to the IDirectDrawSurface7::Flip
method, a DirectX 7.0 or later driver's DdFlip function would receive the DDFLIP_DONOTWAIT flag.
If D3DPRESENT_DONOTWAIT is not set, Present behaves as in DirectX 8.1 and earlier. That is, Present spins until
the hardware is free, without returning an error.
For more information about IDirect3DSwapChainXxx::Present, see the latest DirectX SDK documentation.

Recommended DirectX 9.0 Driver Support

It is recommended that DirectX 9.0 drivers set defaults for unused channels of texture formats.
Setting Defaults for Unused Channels of Texture Formats
Drivers and their devices should set a default value for the unused channels in texture formats so that applications
can rely on a known value being present in those channels that are not provided by input textures.
Similarly to the way that the reference rasterizer for DirectX 8.1 and later versions sets the default value for the
unused B channel in the D3DFMT_G16R16 texture format to 1.0f (see refrast.cpp sample code), a DirectX 9.0
version driver should set the default values for the unused channels in the following DirectX 9.0 floating-point
texture formats to 1.0f:
D3DFMT_R16F
D3DFMT_G16R16F
D3DFMT_R32F
D3DFMT_G32R32F
A DirectX 9.0 driver should also set the following defaults:
The alpha channel (A) (for transparency) to 1.0f, which is opaque.
The luminance channel (L) to 1.0f, which produces a maximum light intensity.
The reference rasterizer also sets the default for the B channel (of RGBA), in addition to the A channel, to 1.0f for
the D3DFMT_V16U16 format. In this way, the D3DFMT_V16U16 format operates interchangeably with the
D3DFMT_L6V5U5 format, which actually has an L channel; in the D3DFMT_L6V5U5 format, luminance is placed in
the B channel.

Optional DirectX 9.0 Driver Support

The following sections describe features that DirectX 9.0 drivers can implement if their hardware supports such
features.
Controlling Multiple-Sample Rendering
Supporting Nonstandard Display Modes
Supporting Multiple-Head Hardware
Managing MIP Map Textures
Handling Gamma Correction
Supporting Stretch Blit Operations
Rendering to Multiple Targets Simultaneously
Extended Blt Flags
Clamping Fog Intensity Per Pixel
Modifying Vertex Stream Frequency
Supporting Single-Pixel-Wide Antialiased Lines
Supporting High-Order Patched Surfaces
Supporting Additional Instruction Slots for Shader 3
Reporting Capabilities for Shader Versions

Controlling Multiple-Sample Rendering

The following topics describe how drivers support operations that control multiple-sample rendering.
Controlling Quality of Multiple-Sample Rendering
Dynamically Controlling Multiple-Sample Rendering

Controlling Quality of Multiple-Sample Rendering

Before an application can request to create a surface with a specific multisampling technique, it should call the
IDirect3D9::CheckDeviceMultiSampleType method to verify if the display device supports that technique. The
runtime in turn sends a GetDriverInfo2 request using the D3DGDI2_TYPE_GETMULTISAMPLEQUALITYLEVELS
value to retrieve the number of quality levels for the particular multisample type and surface format associated
with the technique. For more information about GetDriverInfo2, see Supporting GetDriverInfo2.
Whether the display device supports maskable multisampling (more than one sample for a multiple-sample
render-target format plus antialias support) or just nonmaskable multisampling (only antialias support), the driver
for the device must provide the number of quality levels for the D3DMULTISAMPLE_NONMASKABLE multiple-
sample type. Applications that just use multisampling for antialiasing purposes are then only required to query for
the number of nonmaskable multiple-sample quality levels that the driver supports.
Besides verifying whether the display device supports the multisampling technique,
IDirect3D9::CheckDeviceMultiSampleType also returns the number of quality levels associated with the
technique.
When the application requests to create a surface, it uses a combination of surface format, multisample type, and
number of quality levels whose support was previously verified. This ensures that the surface is created
successfully. The runtime calls the driver's DdCanCreateSurface, DdCreateSurface, or D3dCreateSurfaceEx
function to create the surface. In this call, the runtime encodes the number of samples for the multiple-sampled
surface into five bits (the DDSCAPS3_MULTISAMPLE_MASK mask) and the number of multiple-sample quality
levels into three bits (the DDSCAPS3_MULTISAMPLE_QUALITY_MASK mask) of the dwCaps3 member of the
DDSCAPS2 structure for the surface.
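A sketch of isolating those two encoded values at surface-creation time; whether an additional shift is needed to obtain the raw numbers depends on the DDSCAPS3_* definitions in the DDK headers, so this only shows the masking described above:

/* pCaps is the DDSCAPS2 supplied for the surface being created. */
DWORD dwSampleBits  = pCaps->dwCaps3 & DDSCAPS3_MULTISAMPLE_MASK;          /* 5 bits: sample count   */
DWORD dwQualityBits = pCaps->dwCaps3 & DDSCAPS3_MULTISAMPLE_QUALITY_MASK;  /* 3 bits: quality level  */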
For more information about IDirect3D9::CheckDeviceMultiSampleType, see the latest DirectX SDK
documentation.

Dynamically Controlling Multiple-Sample Rendering

A DirectX 9.0 version driver can support the capability of alternately enabling and disabling multiple-sample
rendering between the rendering of primitives. To report that the driver's device supports this capability, the driver
sets the D3DPRASTERCAPS_MULTISAMPLE_TOGGLE capability bit in the RasterCaps member of the D3DCAPS9
structure. The driver returns a D3DCAPS9 structure in response to a GetDriverInfo2 query similarly to how it
returns a D3DCAPS8 structure as described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this
query is described in Supporting GetDriverInfo2.
To toggle multiple-sample rendering on and off between begin-scene and end-scene states, the driver receives the
D3DDP2OP_RENDERSTATE operation code in the command stream of its D3dDrawPrimitives2 function. The
driver processes the D3DRS_MULTISAMPLEANTIALIAS render state from the RenderState member of the
D3DHAL_DP2RENDERSTATE structure that is associated with this operation code. The driver determines whether
to enable or disable multiple-sample rendering from the Boolean value in the dwState member of
D3DHAL_DP2RENDERSTATE. The value TRUE means to enable and FALSE means to disable.
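A sketch of that render-state handling inside the D3DDP2OP_RENDERSTATE processing (EnableMultisampleRendering is a hypothetical hardware-programming helper):

/* pRS points to the D3DHAL_DP2RENDERSTATE for this state change. */
if (pRS->RenderState == D3DRS_MULTISAMPLEANTIALIAS) {
    /* dwState is TRUE to enable, FALSE to disable multiple-sample rendering. */
    EnableMultisampleRendering(pRS->dwState ? TRUE : FALSE);
}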
If the D3DPRASTERCAPS_MULTISAMPLE_TOGGLE capability bit is set, the driver can receive the
D3DRS_MULTISAMPLEANTIALIAS render state between D3DRENDERSTATE_SCENECAPTURE render states that
specify TRUE for begin-scene information and FALSE for end-scene information.

Supporting Nonstandard Display Modes

A DirectX 9.0 version driver for a device that supports any nonstandard display modes, such as the 10-bits-per-
channel (10:10:10:2) display and render target format, must respond to requests to enumerate these extended
nonstandard display modes. In addition, the DirectX 9.0 driver must be able to perform operations that enable
switching between standard and nonstandard display modes. The following sections describe how drivers support
nonstandard display modes:
Enumerating Extended Formats
Switching Between Standard and Nonstandard Modes
Handling Nonstandard Display Modes

Enumerating Extended Formats

The DirectX 9.0 runtime must verify which extended nonstandard display modes that a driver supports before
performing any operations using those display modes. To verify the number of nonstandard display modes that
the driver supports, the runtime sends a GetDriverInfo2 request using the
D3DGDI2_TYPE_GETEXTENDEDMODECOUNT value. If the driver does not support any nonstandard display modes,
it returns zero in the dwModeCount member of the DD_GETEXTENDEDMODECOUNTDATA structure for this
request. To receive information about each supported nonstandard display mode, the runtime sends a
GetDriverInfo2 request using the D3DGDI2_TYPE_GETEXTENDEDMODE value for each mode. The driver then
returns a D3DDISPLAYMODE structure that specifies the nonstandard display mode in the mode member of the
DD_GETEXTENDEDMODEDATA structure. For more information about GetDriverInfo2, see Supporting
GetDriverInfo2. For more information about D3DDISPLAYMODE, see the latest DirectX SDK documentation.

Switching Between Standard and Nonstandard Modes

A DirectX 9.0 driver creates the standard primary surface for a standard display mode and a dummy primary
surface for the nonstandard mode so that the runtime can switch between modes when necessary. Both surfaces
represent the same video memory, except displayed in different formats. The driver switches between standard and
nonstandard modes when a page flip is requested as shown in the following sequence:
1. The application requests a mode switch.
An application calls the ChangeDisplaySettings function to change video mode to a matching bit depth.
For the 10:10:10:2 mode, the bit depth is 32 bits per pixel. For more information about
ChangeDisplaySettings, see documentation for the Microsoft Windows SDK.
2. The driver creates the standard primary surface.
The runtime calls the driver's DdCreateSurface function to request the creation of the primary surface. This
primary surface uses the standard display format (for example, D3DFMT_A8B8G8R8) and has no back
buffers.
3. The driver creates the dummy primary surface chain.
The runtime calls the driver's DdCreateSurface function to request the creation of the dummy primary
surface. The runtime specifies the DDSCAPS2_EXTENDEDFORMATPRIMARY (0x40000000) capability bit in
the dwCaps2 member of the DDSCAPS2 structure for this surface to indicate that the surface uses a
nonstandard display mode (for example, D3DFMT_A2R10G10B10). The runtime also specifies the
DDSCAPS_OFFSCREENPLAIN capability bit in the dwCaps member of DDSCAPS2 to indicate that the
surface has an explicit pixel format.
Because this surface is intended to be just another name for the existing primary surface, the driver should
not allocate further video memory to the surface.
For this surface, the runtime also specifies the DDSCAPS_FLIP and DDSCAPS_COMPLEX capability bits in
dwCaps and an attached set of back buffers similarly to the way the runtime sets up a standard primary
surface flipping chain. The driver should allocate video memory for these back buffers because no further
calls to the driver's DdCreateSurface function are made for these back buffers; that is, the runtime creates
more than one surface object only for the standard primary.
4. The driver flips the surface to the nonstandard format.
While the display device outputs the standard format, the application composes a nonstandard image in one
of these back buffers. Once this image is ready for display, the runtime specifies one of the nonstandard
surfaces as the target in a call to the driver's DdFlip function. The driver then reprograms the display device
to output the nonstandard format.
5. The application runs.
The application generates further calls to the driver's DdFlip function between the nonstandard buffers, and
the driver continues to display the nonstandard format. The application can also generate calls to the driver's
D3dDrawPrimitives2 function using the D3DDP2OP_BLT operation code to copy the back buffer to the
front buffer, but these calls are always made between two nonstandard surface objects. Unless the driver
supports the nonstandard format in windowed mode, the driver does not process blts between nonstandard
and standard surface formats. For more information about the windowed-mode case, see Supporting Two-
Dimensional Operations.
6. The driver flips the surface back to standard format.
When the application is closed or minimized, the runtime specifies the standard-format primary surface as
the destination in a call to the driver's DdFlip function. The driver then reprograms the display device to
output the standard format.
7. The driver destroys the dummy surface.
When the driver destroys the dummy surface, it should ensure that the standard format is reprogrammed in
the display device.

Handling Nonstandard Display Modes

A DirectX 9.0 driver for a device that supports a nonstandard display mode must also handle the following
operations using that nonstandard mode:
Flip, blit, lock, and unlock operations that behave the same as with a standard display mode.
Calls to the driver's Graphics Device Interface (GDI) functions while the DirectX-primary surface is active.
The driver should not receive any GDI DDI drawing calls while the DirectX primary is active. However, the
driver should handle such drawing without causing the operating system to crash. The driver can provide an
implementation for this situation, ignore it by immediately returning success, or fail it. Note that the data
from GDI is based on a GDI primary surface format. Therefore, if the driver provides an implementation for
this situation, it must convert from the GDI format before drawing to the DirectX-primary surface.
Calls to the GDI DDI DrvDeriveSurface function against the DirectX-primary surface cannot occur because
GDI cannot access the nonstandard display format.
Pressing Ctrl+Alt+Del while the DirectX-primary surface is active.
The kernel specifies the standard primary as the target in a call to the driver's DdFlip function before any GDI
drawing occurs. Therefore, the driver must program the display device to the standard display mode before
any GDI drawing. The driver's DdDestroySurface function for the primary surface is also called. Note that the
driver can discard contents of the DirectX-primary surface.
Windowed mode and nonstandard formats
The Reporting Support for 2D Operations Using Surface Formats topic describes how the driver specifies
that it can perform rendering to and present images from a format that differs from that of the current
desktop. This scheme extends naturally to support nonstandard formats; the driver must merely add the
enabling flags in the dwOperations member of the DDPIXELFORMAT structure for the formats.
Private formats and legacy code cannot be used to expose nonstandard desktop formats.

Supporting Multiple-Head Hardware

A DirectX 9.0 version driver can implement multiple-head support for multiple-head cards, which have the
following features:
Common frame buffer and accelerator for all display devices (heads) on the card.
Independent digital to analog converters (DAC) and monitor outputs for each display device (head).
More usable multiple-monitor support than a similar number of heterogeneous display cards.
One head control or independent operation. A single device can be exposed to an application and that device
can drive several fullscreen swap chains. Consequently, all resources are shared by the many heads, and
each head has exactly the same capabilities. Each head can be set to independent display modes; the
application can then call the Present method on each head at different times. Each swap chain for a head
must be fullscreen. Once the device enters multiple-head mode, it must remain fullscreen. The transition
back to windowed mode requires the destruction of the device (except for the minimize operation).
Note that for DirectX 8.1 and earlier applications, a DirectX 9.0 driver should still use the former mechanism of
dividing video memory between heads and treating each head as a fully independent accelerator. Only if an
application is coded to function in the DirectX 9.0 multiple-head mode does the driver use these new multiple-head
features. The driver is notified when to switch between the two modes of operation.
The following sections describe how drivers support multiple-head hardware.
Identifying Adapter Group and Providing Capabilities
Creating Heads
Example of Handle Assignments
Managing Multiple-Head Memory
Reporting Multiple-Head Video Memory
Presentation with Multiple Heads
Using Multiple Multiple-Head Adapters

Identifying Adapter Group and Providing Capabilities

The DirectX 9.0 runtime sends a GetDriverInfo2 request using the D3DGDI2_TYPE_GETADAPTERGROUP value to a
DirectX 9.0 version driver to request the identifier for the group of adapters that make up the driver's multiple-
head video card. The driver returns the identifier in the ulUniqueAdapterGroupId member of a
DD_GETADAPTERGROUPDATA structure. The driver must provide a unique identifier for the master and all
subordinate adapters within a group. The runtime uses this identifier in subsequent operations to determine
whether the given adapter is part of a group. This identifier must be unique across drivers, including drivers from
other hardware vendors. Therefore, it is recommended to report this identifier as a unique nonzero kernel-mode
address that cannot be common with other multiple-head video cards.
A DirectX 9.0 version driver indicates how its multiple-head hardware is configured by setting the following
members of the D3DCAPS9 structure:
NumberOfAdaptersInGroup
Specifies the number of adapters in the adapter group (only if master). This is 1 for single-head cards
(conventional adapters). The value is greater than 1 for the master adapter of a multiple-head card. The
value is 0 for a subordinate adapter of a multiple-head card. Each card can have at most one master, but can
have many subordinates.
MasterAdapterOrdinal
Specifies the number for the master adapter in the group. This number is relevant if the system contains
more than one multiple-head card. For example, if the system contains a single-head card, a double-head
card, and a triple-head card, the system references the heads as: 0 for the single, 1 and 2 for the double, and
3, 4, and 5 for the triple. In this case, the master adapter is: 0 for the single, 1 for the double, and 3 for the
triple.
AdapterOrdinalInGroup
Specifies a number that indicates the order in which heads in a group are referenced by the driver. This
value is always 0 for the master adapter and numbered consecutively for each subordinate adapter (that is,
1, 2, and so on).
The driver returns a D3DCAPS9 structure in response to a GetDriverInfo2 query similarly to how it returns a
D3DCAPS8 structure as described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is
described in Supporting GetDriverInfo2.
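Using the system from the example above (a single-head, a double-head, and a triple-head card), the double-head card's two heads might report these three members as in the following sketch; the function and its head-ordinal parameter are hypothetical.

// Assumes the Direct3D 9 DDK headers (d3d9caps.h) are included.
void MyFillAdapterGroupCaps(D3DCAPS9 *pCaps, DWORD headOrdinal)
{
    if (headOrdinal == 1) {                     // master head of the double-head card
        pCaps->NumberOfAdaptersInGroup = 2;     // two heads in this group
        pCaps->MasterAdapterOrdinal    = 1;     // adapter 1 is the group's master
        pCaps->AdapterOrdinalInGroup   = 0;     // always 0 for the master
    } else {                                    // headOrdinal == 2, the subordinate
        pCaps->NumberOfAdaptersInGroup = 0;     // 0 on a subordinate adapter
        pCaps->MasterAdapterOrdinal    = 1;     // points at the group's master
        pCaps->AdapterOrdinalInGroup   = 1;     // first subordinate in the group
    }
}
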

Creating Heads

The Microsoft DirectX 9.0 driver creates one Microsoft Direct3D context for each multiple-head card and a
Microsoft DirectDraw object for each head on each multiple-head card. Therefore, the creation process for the
multiple-head card has a per-head part and a cross-head part. The per-head part corresponds roughly to DirectDraw DDI calls, and the cross-head part corresponds roughly to Direct3D DDI calls.
The point of connection across the various heads is the Direct3D handle that is created by the driver's
D3dCreateSurfaceEx function. The driver assigns a unique Direct3D handle to each surface across all heads in the
group. The Direct3D context on the master head manages all these handles and can target any render target that is
created on any head, most notably the back buffers in the flipping chains on the subordinate heads. The
D3dCreateSurfaceEx function for each subordinate head must be able to update the handle lookup table that is
managed by the master head. Subsequently, these handles are only used in calls to the driver's
D3dDrawPrimitives2 function for the master head.
The driver only creates textures and other resources on the master head.
The driver creates and works with heads as described in the following sequence:
1. For each head, the following operations are performed to set up the display mode and primary-flipping
surfaces:
The runtime sets the display mode.
The runtime creates the DirectDraw object.
The runtime creates a primary flipping chain and possibly a Z buffer. The runtime specifies the
DDSCAPS2_ADDITIONALPRIMARY (0x80000000) capability bit in the dwCaps2 member of the
DDSCAPS2 structure for each surface (including the Z buffer) to indicate an additional primary surface
for a multiple-head card. The runtime calls the driver's DdCreateSurface function.
The runtime calls the driver's D3dCreateSurfaceEx function, first for the master and in the order defined
by AdapterOrdinalInGroup for the subordinates. In this call, the Direct3D handle that the runtime
passes is guaranteed to be unique across all the heads in the group. The driver can insert a reference into
a subordinate head's handle lookup table. However, because a Direct3D context is not created on
subordinate heads, no D3dDrawPrimitives2 commands are issued to any subordinate heads. Therefore,
inserting this reference is not necessary.
After the runtime calls DdCreateSurface for all heads (including the master), a further
D3dCreateSurfaceEx call is made for each subordinate head's flipping chain on the master head's
DirectDraw object. The driver makes an entry in the master head's handle lookup table for each front,
back, and depth/stencil buffer for each subordinate head.
2. The runtime calls the driver's D3dContextCreate function only for the DirectDraw object on the master head.
This is the only context that is used while the application runs.
3. When the application requests to create textures and resources, the runtime calls the driver's
DdCreateSurface and D3dCreateSurfaceEx functions through the master head.
4. When the application makes rendering calls, the runtime calls the driver's D3dDrawPrimitives2 function
on the master head using the appropriate operation codes.
When the application performs other operations, the following calls are routed to master and subordinate
heads:
As described in step one, D3dCreateSurfaceEx calls are made to supply the driver with handles for each
subordinate head's flipping chain. These handles are typically used with the
D3DDP2OP_SETRENDERTARGET operation code token when the application renders a frame into the
back buffer of one of the subordinate head's swap chains.
The runtime calls the driver's DdFlip function on each head (master and subordinates) to present back
buffers to primary surfaces for those heads. This call never presents a back buffer from one head to
another head's primary surface. The flipping chains on each head are completely independent.
The runtime might call the driver's DdBlt function to copy the back buffer to the front buffer for any head.
This call never copies a back buffer from one head to another head's front buffer.
The runtime can call the driver's DdGetScanLine function on any head because this call relates to the state
of the monitor and not the Direct3D context.
The runtime can call the driver's DdLock function on any head's back buffer.
The application can either allocate a Z buffer with each head or allocate one Z buffer to use with each head
sequentially. In the former case, the runtime calls the driver's DdCreateSurface function on each head
(master and subordinates) as described in step one. In the latter case, the runtime calls the driver's
DdCreateSurface function only on the master head. In either case, the runtime calls the driver's
D3dCreateSurfaceEx function to supply handles to all Z buffers that are unique across all heads in the
group.

Example of Handle Assignments

The following table shows an example arrangement of Direct3D handle values (supplied through
D3dCreateSurfaceEx) that might be present in a two-head scenario. The front, back and depth/stencil surfaces on
each head all have unique handles; the master head must work with all of these handles. The master head owns all
texture, vertex buffer, and index buffer surfaces; handles for these surfaces are only created on the master head.

MASTER HEAD HANDLE VALUE    SUBORDINATE HEAD HANDLE VALUE    SURFACE
0                                                            Front buffer for master
1                                                            Back buffer for master
2                                                            Depth buffer for master
                            3                                Front buffer for subordinate
                            4                                Back buffer for subordinate
                            5                                Depth buffer for subordinate
6                                                            Texture 1 for master
7                                                            Texture 2 for master
8                                                            Texture 3 for master


Managing Multiple-Head Memory

Setting the DDSCAPS2_ADDITIONALPRIMARY capability bit in the dwCaps2 member of the DDSCAPS2 structure
for each surface on the subordinate head notifies that head that these surfaces are the last surfaces that are
allocated from the video memory assigned to that head. The subordinate head should then relinquish control of the
allocation of its video memory to the master head because the subordinate head is guaranteed that it does not
receive subsequent DdCreateSurface calls for the lifetime of the application.
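The following minimal sketch shows one way a driver might detect this notification while processing DdCreateSurface; the lpSurfMore->ddsCapsEx access path and the helper name are assumptions, while the DDSCAPS2_ADDITIONALPRIMARY bit is the one described above.

// Assumes the DirectDraw DDK headers (ddrawint.h) are included.
BOOL MySurfaceIsAdditionalPrimary(DD_SURFACE_LOCAL *pSurf)
{
    return (pSurf->lpSurfMore != NULL) &&
           ((pSurf->lpSurfMore->ddsCapsEx.dwCaps2 & DDSCAPS2_ADDITIONALPRIMARY) != 0);
}
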
The driver must ensure that the master head is able to allocate memory that is associated with subordinate heads.
When the runtime calls the driver's DdDestroySurface function to destroy surfaces on the subordinate head in
which the DDSCAPS2_ADDITIONALPRIMARY capability bit is set, the driver is notified that the subordinate head is
again in control of its video memory management.
For the most part, this choice of which head owns video memory is inherent in the existing DirectDraw process.
Specifically:
The runtime guarantees that no subsequent allocation requests are made on subordinate heads after
DdCreateSurface calls are made using the DDSCAPS2_ADDITIONALPRIMARY bit. Therefore, the driver is not
required to restrict allocations from its own video memory pool at any time.
When the application is terminated or minimized, all surfaces are destroyed. Therefore, all textures that were
created by the master head from the subordinate head's pool are cleaned up.
If the DDSCAPS2_ADDITIONALPRIMARY bit is not set for surfaces on subordinate heads, then those heads
continue to allocate video memory as if they were stand-alone heads. In fact, such subordinate heads are
functionally identical to any other multiple-monitor adapter.
The driver is required to provide an implementation in which the master head allocates memory from a
subordinate head's pool, including the determination about when a particular resource can be allocated
from a subordinate head's pool. Note that the master head does not have any information itself about
whether it is participating in a multiple-head scenario. When the master head runs out of its own video
memory, it must traverse all the subordinate heads in its group to determine if any of these heads have
pools that can be used by the master (in other words, to determine if any of the subordinate heads received
DdCreateSurface calls with the DDSCAPS2_ADDITIONALPRIMARY bit set).
Finally, note that the runtime guarantees that all heads in the group participate in the multiple-head
scenario. Therefore, the driver must only maintain one bit of state indicating whether it is currently in
multiple-head mode.

Reporting Multiple-Head Video Memory

In multiple-head mode, the master head must respond to a call to the driver's DdGetAvailDriverMemory function
as if the master head were the only head controlling the multiple-head card. The amount of free memory that the
driver returns must include the video memory of any subordinate head whose video memory was surrendered to
the master head (that is, any subordinate head that received DdCreateSurface calls with the
DDSCAPS2_ADDITIONALPRIMARY bit set).
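A minimal sketch of that accounting follows. The per-head bookkeeping structure, its members, and the function are hypothetical; the dwFree output member of DD_GETAVAILDRIVERMEMORYDATA is assumed here.

// Hypothetical per-head bookkeeping; assumes the DirectDraw DDK headers are included.
typedef struct _MY_HEAD {
    struct _MY_HEAD *pNextSubordinate;   // next subordinate head in the master's group
    BOOL             surrenderedPool;    // saw DDSCAPS2_ADDITIONALPRIMARY surfaces
    DWORD            freeBytes;          // free bytes remaining in this head's pool
} MY_HEAD;

void MyReportAvailMemoryForMaster(MY_HEAD *pMaster, MY_HEAD *pFirstSubordinate,
                                  DD_GETAVAILDRIVERMEMORYDATA *pData)
{
    DWORD freeBytes = pMaster->freeBytes;
    MY_HEAD *pSub;

    for (pSub = pFirstSubordinate; pSub != NULL; pSub = pSub->pNextSubordinate) {
        if (pSub->surrenderedPool) {
            freeBytes += pSub->freeBytes;   // include surrendered subordinate pools
        }
    }
    pData->dwFree = freeBytes;              // dwFree output member assumed
}
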

Presentation with Multiple Heads

Applications can call the Present method either to present contents of back buffers for all heads at once or to
present the back buffer for an individual head. For more information about Present, see the latest DirectX SDK
documentation.
The runtime in turn makes independent sequential calls to the driver's DdFlip or DdBlt function. Because the display
mode and refresh rate of each head might be different, these calls are always independent at the DDI level.

Using Multiple Multiple-Head Adapters

The driver can provide multiple-head support if the system is equipped with more than one multiple-head card. If
the driver owns more than one multiple-head card, then the driver must ensure that the separate multiple-head
cards remain independent.

Managing MIP Map Textures

The following topics describe how a DirectX 9.0 version driver can manage MIP-map textures:
Handling Lightweight MIP Map Textures
Obtaining Sublevels of Lightweight MIP Map Textures
Generating Sublevels of MIP Map Textures

Handling Lightweight MIP Map Textures

Because the MIP sublevels of lightweight MIP-map textures are implicit and do not have corresponding DirectDraw
surface structures (DD_SURFACE_LOCAL, DD_SURFACE_GLOBAL and DD_SURFACE_MORE), a DirectX 9.0
version driver can determine if a MIP-map texture is lightweight and thus avoid creating unnecessary driver surface
structures to save memory. To determine if a MIP-map texture is lightweight, the driver verifies if the
DDSCAPS3_LIGHTWEIGHTMIPMAP bit in the dwCaps3 member of the DDSCAPSEX (DDSCAPS2) structure for the
texture surface is set.
Note that all MIP-map textures in DirectX 9.0 are lightweight by default.
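The following sketch shows the check described above; the access path through lpSurfMore->ddsCapsEx.dwCaps3 is an assumption.

// Assumes the DirectDraw DDK headers are included.
BOOL MyMipMapIsLightweight(DD_SURFACE_LOCAL *pTopLevel)
{
    return (pTopLevel->lpSurfMore->ddsCapsEx.dwCaps3 &
            DDSCAPS3_LIGHTWEIGHTMIPMAP) != 0;
}
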
The DirectX 9.0 version driver observes the following rules when handling lightweight and heavyweight MIP-map
textures:
A DirectX 9.0 and later driver can receive a D3DDP2OP_TEXBLT operation code in which the source MIP-map
texture is heavyweight and the destination MIP-map texture is lightweight or vice versa. Of course, the driver
can also receive a D3DDP2OP_TEXBLT in which both source and destination MIP-map textures are
lightweight.
Because a system memory lightweight MIP-map texture consumes only a single surface of memory, the
entire MIP map is visible to the driver within the top-level surface. The driver is never required to perform a
texture operation directly from a system memory lightweight MIP-map texture. Such a MIP-map texture can
only be the source of a D3DDP2OP_TEXBLT.
The following MIP-mapped textures must be heavyweight because locks and direct writes to video or AGP
memory corresponding to each sublevel are possible with such textures:
Render target
Depth stencil
Dynamic
Vendor formatted
Therefore, a full surface data structure is required per sublevel.
Because a video or AGP memory lightweight MIP-map texture is never locked or referenced by other DDIs,
such as DdBlt, the driver determines the sublevel placement for such a MIP-map texture. Therefore, full
surfaces (explicit fpVidmem members of the DD_SURFACE_GLOBAL structure) for the sublevels of such a
MIP-map texture are not required.
Driver-managed lightweight MIP-map textures are also restricted to a single surface and must use exactly
the same layout that Direct3D uses with system memory lightweight MIP-map textures. Note that this has
no adverse effect (other than implementation cost) because the corresponding resident (video and AGP)
MIP-map textures can have their own implementation-specific layout.

Obtaining Sublevels of Lightweight MIP Map Textures

A DirectX 9.0 version driver can use the CPixel class methods to obtain information about the sublevels of a
lightweight system memory MIP-map texture -- only information about the top level of a lightweight MIP-map
texture is stored. If the driver must copy a lightweight system memory MIP-map texture to video memory, the
driver can use the CPixel class methods to calculate the source texture's size and the offset to the source texture's
sublevels.
Driver writers are not required to use the CPixel class methods to calculate the locations of sublevels for
lightweight MIP-map textures. However, the DirectX 9.0 runtime uses CPixel class methods to recover the memory
layout of lightweight system memory MIP-map textures. Therefore, to ensure that the runtime and drivers recover
the memory layout of lightweight system memory MIP-map textures in the same manner, driver writers must
follow the same CPixel class rules to implement their own code.
For information about how the CPixel class is implemented, see the pixel.hpp, pixel.cpp, and pixlib.cpp files in the
PixLib sample in the MSDN Developer Samples code gallery.
The CPixel class contains the following methods:

CPIXEL METHOD              DESCRIPTION
ComputeSurfaceSize         Determines the amount of memory required to allocate a surface.
ComputeVolumeSize          Determines the amount of memory required to allocate a volume.
ComputeMipMapSize          Determines the amount of memory required to allocate a MIP-map texture.
ComputeMipVolumeSize       Determines the amount of memory required to allocate a MIP-map texture volume.
ComputeMipMapOffset        Determines the sublevel offset of a MIP-map texture.
ComputeMipVolumeOffset     Determines the subvolume offset of a MIP-map volume texture.
ComputeSurfaceOffset       Determines the subrectangular offset of a surface.


Generating Sublevels of MIP Map Textures

A display driver indicates support of automatically generating the sublevels of MIP-map textures by setting the
DDCAPS2_CANAUTOGENMIPMAP bit of the dwCaps2 member of the DDCORECAPS structure. The driver
specifies this DDCORECAPS structure in the ddCaps member of a DD_HALINFO structure. DD_HALINFO is
returned by the driver's DrvGetDirectDrawInfo function. The display driver also indicates whether a particular
surface format supports automatically generating sublevels by setting the D3DFORMAT_OP_AUTOGENMIPMAP
flag in the dwOperations member of the DDPIXELFORMAT structure for the format.
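A minimal sketch of these two advertisement steps follows; the function and parameter names are hypothetical, while the capability bit, the format-operation flag, and the members are the ones named above.

// Assumes the DirectDraw/Direct3D DDK headers are included.
void MyAdvertiseAutoGenMipMaps(DD_HALINFO *pHalInfo, DDPIXELFORMAT *pTextureFormat)
{
    // Device-wide capability, reported through DrvGetDirectDrawInfo.
    pHalInfo->ddCaps.dwCaps2 |= DDCAPS2_CANAUTOGENMIPMAP;

    // Per-format capability, set on each texture format that supports it.
    pTextureFormat->dwOperations |= D3DFORMAT_OP_AUTOGENMIPMAP;
}
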
When a texture surface is created, the Direct3D runtime sets the DDSCAPS3_AUTOGENMIPMAP bit of the
dwCaps3 member of the DDSCAPSEX (DDSCAPS2) structure to indicate that the MIP-map sublevels for this
texture can be automatically generated. If Direct3D directs some textures to automatically generate their MIP-map sublevels and directs other textures not to, the driver can only perform blit operations
(D3DDP2OP_TEXBLT) on these textures as described in the following scenarios:
The driver cannot blit from a source texture that auto-generates MIP maps to a destination texture that does
not.
If the driver blits from a source texture that does not auto-generate MIP maps to a destination texture that
does, the driver only blits the topmost matching level. The sublevels from the source texture are ignored. The
destination sublevels can be generated.
Similarly, if the driver blits from source to destination textures that both auto-generate MIP maps, the driver
only blits the topmost matching level. The sublevels from the source texture are ignored. The destination
sublevels can be generated.
To generate the sublevels of a MIP-map texture, the driver receives a D3DDP2OP_GENERATEMIPSUBLEVELS
command along with a D3DHAL_DP2GENERATEMIPSUBLEVELS structure. In order to receive this command, the
texture's surface format must expose the D3DFORMAT_OP_AUTOGENMIPMAP flag.
For driver-managed resources, when the driver evicts and replaces a resource in video memory, the driver must
use the last set filter type to automatically generate sublevels. Because Direct3D does not control the eviction and
replacement of the resource, Direct3D does not send a D3DDP2OP_GENERATEMIPSUBLEVELS command to the
driver.
The Direct3D runtime cannot call the driver's DdLock function or use any other DDI to access the sublevels of an
auto-generated MIP-map texture. This implies that the sublevels for auto-generated MIP-map textures, like
lightweight MIP-map textures, are "implicit" and can be specified by the driver as appropriate. The driver is not
required to specify "complete" surface data structures. Note, however, that Direct3D must be able to call the
driver's DdLock or DdBlt functions, send the D3DDP2OP_BLT command, or use any other DDI (for driver-managed
textures, dynamic textures or vendor-specific formats only) to access the top level of an auto-generated MIP-map
texture.

Handling Gamma Correction

The following topics describe how a DirectX 9.0 version driver can handle the gamma correction of surface and
buffer content. Gamma-corrected content is stored in sRGB format. For more information about sRGB format, go to
the sRGB website.
Marking Formats for Gamma and Linear Conversion
Performing Gamma Correction on Swap Chains

Marking Formats for Gamma and Linear Conversion

A DirectX 9.0 version driver marks texture formats for linear or gamma conversion so that it can determine
whether to convert textures of those formats in order to accurately process or render them.
Texture content is typically stored in sRGB format, which is gamma corrected. However, for the pixel pipeline to
perform accurate blending operations on sRGB-formatted textures, the driver must convert those textures to a
linear format before reading from them. When the pixel pipeline is ready to write those textures out to the render
target, the driver must convert those textures back to sRGB format. In this way, the pixel pipeline performs all
operations in linear space.
The driver specifies the following flags in the dwOperations member of the DDPIXELFORMAT structure for a
texture surface's format to mark the format for conversion:
D3DFORMAT_OP_SRGBREAD to indicate whether a texture is gamma 2.2 corrected or not (sRGB or not), and
if it must be converted to a linear format by the driver either for blending operations or for the sampler at
lookup time.
D3DFORMAT_OP_SRGBWRITE to indicate whether the pixel pipeline should gamma correct back to sRGB
space when writing out to the render target.
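For example, a driver might mark a format for both conversions as in the following sketch; the function name is hypothetical, and the flags and member are the ones described above.

// Assumes the DirectDraw DDK headers are included.
void MyMarkFormatForSrgb(DDPIXELFORMAT *pFormat)
{
    pFormat->dwOperations |= D3DFORMAT_OP_SRGBREAD;    // convert to linear on read
    pFormat->dwOperations |= D3DFORMAT_OP_SRGBWRITE;   // convert back to sRGB on write
}
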

Performing Gamma Correction on Swap Chains

Applications can maintain back buffers of their swap chains in linear color space in order to perform blending
operations correctly. Because the desktop is typically not in linear color space, gamma correction to the contents of
back buffers is required before the contents can be presented on the desktop.
An application calls the IDirect3DSwapChain9::Present method to present the contents of the next back buffer in
the swap chain. In this call, to indicate that the back-buffer contents are in linear color space, the application sets
the D3DPRESENT_LINEAR_CONTENT flag. The DirectX 9.0 runtime, in turn, calls the display driver's DdBlt function
with the DDBLT_EXTENDED_FLAGS and DDBLT_EXTENDED_LINEAR_CONTENT flags set. When the driver receives
this DdBlt call, the driver determines that the source surface contains content in a linear color space. The driver can
then perform gamma 2.2 correction (sRGB) on the linear color space as part of the blt. For more information about
extended blit flags, see Extended Blt Flags.
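A minimal sketch of the flag test inside DdBlt follows; the two helper functions are hypothetical placeholders for the driver's real blit paths.

// Assumes the DirectDraw DDK headers are included.
void MyBltWithLinearToSrgbConversion(DD_BLTDATA *pBlt);   // hypothetical
void MyBltWithoutConversion(DD_BLTDATA *pBlt);            // hypothetical

void MyDdBltHandleLinearContent(DD_BLTDATA *pBlt)
{
    if ((pBlt->dwFlags & DDBLT_EXTENDED_FLAGS) &&
        (pBlt->dwFlags & DDBLT_EXTENDED_LINEAR_CONTENT)) {
        // The source back buffer holds linear-space colors; apply gamma 2.2
        // (sRGB) correction as part of the copy to the desktop.
        MyBltWithLinearToSrgbConversion(pBlt);
    } else {
        MyBltWithoutConversion(pBlt);
    }
}
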
The driver sets the D3DCAPS3_LINEAR_TO_SRGB_PRESENTATION capability bit in the Caps3 member of the
D3DCAPS9 structure to indicate that its device supports gamma 2.2 correction. The driver returns a D3DCAPS9
structure in response to a GetDriverInfo2 query similarly to how it returns a D3DCAPS8 structure as described in
Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is described in Supporting GetDriverInfo2.
For more information about IDirect3DSwapChainXxx::Present, see the latest DirectX SDK documentation.

Supporting Stretch Blit Operations

How a driver performs a stretch blit depends on the platform on which it runs. For Windows 98/Me platforms,
when the driver's DdBlt function receives a blit request, the driver can calculate stretch factor from the unclipped
rectangular areas in the rOrigDest and rOrigSrc members of the DD_BLTDATA structure and factor in the
calculation when it performs the blit operation.
For DirectX 9.0 and later on NT-based operating systems, the driver can calculate and record stretch factor when it
receives a blit request with the DDBLT_EXTENDED_FLAGS and
DDBLT_EXTENDED_PRESENTATION_STRETCHFACTOR flags set in the dwFlags member of DD_BLTDATA. The
driver calculates the stretch factor from the unclipped source and destination rectangular areas in the rSrc and
bltFX members respectively of DD_BLTDATA with DDBLT_EXTENDED_PRESENTATION_STRETCHFACTOR set. Note
that the driver must obtain the unclipped destination rectangular area from the following members of the
DDBLTFX structure in bltFX, and not use information in the rDest member.
Left and top coordinates from the following members of the DDCOLORKEY structure in the ddckDestColorkey
member of DDBLTFX:
Left coordinate from the dwColorSpaceLowValue member of DDCOLORKEY.
Top coordinate from the dwColorSpaceHighValue member of DDCOLORKEY.
Right and bottom coordinates from the following members of the DDCOLORKEY structure in the
ddckSrcColorkey member of DDBLTFX:
Right coordinate from the dwColorSpaceLowValue member of DDCOLORKEY.
Bottom coordinate from the dwColorSpaceHighValue member of DDCOLORKEY.
Note that the driver interprets these coordinates as signed integers rather than DWORDs. Note also that the driver
must validate the rectangle that these coordinates form before calculating the stretch factor and programming the
stretch factor in the graphics device. For more information about DDBLTFX and DDCOLORKEY, see the latest
DirectDraw SDK documentation.
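The following sketch illustrates recovering and validating the packed destination rectangle described above; the stretch-factor storage helper is hypothetical, and only the members named in this topic are used.

// Assumes the DirectDraw DDK headers are included.
void MyStoreStretchFactor(double xScale, double yScale);   // hypothetical bookkeeping

void MyRecordStretchFactor(DD_BLTDATA *pBlt)
{
    RECTL dst;

    // The runtime packs the unclipped destination rectangle into the color keys;
    // treat the values as signed integers.
    dst.left   = (LONG)pBlt->bltFX.ddckDestColorkey.dwColorSpaceLowValue;
    dst.top    = (LONG)pBlt->bltFX.ddckDestColorkey.dwColorSpaceHighValue;
    dst.right  = (LONG)pBlt->bltFX.ddckSrcColorkey.dwColorSpaceLowValue;
    dst.bottom = (LONG)pBlt->bltFX.ddckSrcColorkey.dwColorSpaceHighValue;

    // Validate both rectangles before computing the scale factors.
    if (dst.right > dst.left && dst.bottom > dst.top &&
        pBlt->rSrc.right > pBlt->rSrc.left && pBlt->rSrc.bottom > pBlt->rSrc.top) {
        MyStoreStretchFactor(
            (double)(dst.right - dst.left) / (pBlt->rSrc.right - pBlt->rSrc.left),
            (double)(dst.bottom - dst.top) / (pBlt->rSrc.bottom - pBlt->rSrc.top));
    }
}
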
When the driver receives a blit with DDBLT_EXTENDED_PRESENTATION_STRETCHFACTOR set, the driver must not
use the unclipped rectangular areas to do any actual blitting.
When the driver subsequently receives blit requests with the DDBLT_PRESENTATION and
DDBLT_LAST_PRESENTATION flags set, the driver can factor in this recorded stretch factor in the blit operations.
After the driver finishes the final blit with the DDBLT_LAST_PRESENTATION flag set, the driver must clear the
stretch-factor record to prevent interference with any subsequent blits. For more information about the
DDBLT_PRESENTATION and DDBLT_LAST_PRESENTATION flags, see Presentation.
Because stretch factor is a floating-point calculation, not all graphics devices can support it. Therefore, the driver for
such a device is not required to calculate and use stretch factor. However, even if stretch-factor calculations are
unsupported, a DirectX 9.0 and later driver on an NT-based operating system must still determine the presence of
the DDBLT_EXTENDED_PRESENTATION_STRETCHFACTOR flag because attempting to perform an actual blit
operation in which the DDBLT_EXTENDED_PRESENTATION_STRETCHFACTOR flag is set would cause rendering
corruption.
For more information about extended blit flags, see Extended Blt Flags.

Rendering to Multiple Targets Simultaneously

A DirectX 9.0 version driver can render to multiple targets simultaneously if the driver indicates that its device
supports multiple render targets. To indicate the number of render targets that the device supports, the driver sets
this number in the NumSimultaneousRTs member of the D3DCAPS9 structure. The driver must set this number
to 1, if only rendering to a single target is supported. The driver returns a D3DCAPS9 structure in response to a
GetDriverInfo2 query similarly to how it returns a D3DCAPS8 structure as described in Reporting DirectX 8.0
Style Direct3D Capabilities. Support of this query is described in Supporting GetDriverInfo2.
Render targets in a multiple render target group must have identical dimensions but can have different surface
formats.
The driver receives the D3DDP2OP_SETRENDERTARGET2 operation code if an application requests to set the color
buffer for one of the render targets in the multiple group.
If the DirectX 9.0 driver supports rendering to multiple targets simultaneously, it must support certain features and
can support extended features. The following topics describe these required and optional features:
Required Features for Multiple Render Targets
Optional Features for Multiple Render Targets

Required Features for Multiple Render Targets

A DirectX 9.0 version driver that supports rendering to multiple targets simultaneously must support the following
features:
All surfaces for a given multiple render target group are allocated atomically. This limitation is addressed by
treating this as a new type of surface format with multiple RGBA channels interleaved.
Only 32-bit surface formats are supported (for example, RGBA8, RGBA10, U16V16, and R32f type formats).
This limitation is expressed by the name of the new surface formats.
A multiple render target group cannot be the primary (that is, the surface that is displayed). The multiple
render target group must be off-screen only. This limitation is expressed by the surface format enumeration.
A multiple render target group cannot be a mipmap. That is, the creation of a MIP chain fails.
An element of a multiple render target group cannot be set as a texture at the same time as being a render
target. However, different elements of the group surface can simultaneously be textures and render targets.
No antialiasing of a multiple render target group is supported.
An element of a multiple render target group when used as a texture cannot be filtered. That is, no sampler
state can affect the lookup.
An element of a multiple render target group cannot be locked.
Multiple elements of a multiple render target group can be used simultaneously, by assigning each element
to various stages like typical textures.
Elements of a multiple render target group support gamma 2.2-1.0 conversion on read, just like other
texture formats.
The D3DDP2OP_CLEAR operation code clears all elements of a multiple render target group.

Optional Features for Multiple Render Targets

A DirectX 9.0 version driver that supports rendering to multiple targets simultaneously can support extended
features. If the driver supports these extended features, it must indicate such support by reporting capability bits in
the PrimitiveMiscCaps member of the D3DCAPS9 structure. The driver can support the following extended
features:
Setting independent bit depths for render targets in a multiple render target group. The render targets can
have different formats; however, unless this feature is supported, the render targets must have identical bit
depths. The D3DPMISCCAPS_MRTINDEPENDENTBITDEPTHS capability bit must be set to indicate support
for independent bit depths.
Performing operations--other than the z and stencil test--on render targets in a multiple render target group
after pixel shader operations. For example, unless this feature is supported, the driver cannot dither, alpha
test, apply fog, blend, or perform raster operations after pixel shader operations. The
D3DPMISCCAPS_MRTPOSTPIXELSHADERBLENDING capability bit must be set to indicate support for
postpixel-shader operations.
If D3DPMISCCAPS_MRTPOSTPIXELSHADERBLENDING is set, the display device must apply the following
states to all render targets that are simultaneously rendered:
Alpha blend. Set oCi to cause the color value to blend with the ith render target.
Alpha test. Set oC0 for a comparison to occur; if the comparison fails, the pixel is canceled for all render
targets.
Fog. Apply fog to render target 0; other render targets are undefined. The driver can apply fog to all
render targets using the same state.
Dither. Undefined.
Applying independent color-write masks (D3DRS_COLORWRITEENABLE) for render targets in a multiple render
target group. The D3DPMISCCAPS_INDEPENDENTWRITEMASKS capability bit must be set to indicate support
for independent color-write masks. If D3DPMISCCAPS_INDEPENDENTWRITEMASKS is set, the available number
of independent color-write masks is equal to the maximum number of render targets in a multiple render target
group (the NumSimultaneousRTs member of the D3DCAPS9 structure).
Note that a driver for a display device that supports pixel shader version 3.0 and later must indicate that it supports
the extended features for multiple render targets. For more information, see Reporting Capabilities for Shader 3
Support.

Extended Blt Flags

DirectX 9.0 uses the DDBLT_EXTENDED_FLAGS blt flag to extend use of DDBLT_Xxx blt flags that are available in
the dwFlags member of the DD_BLTDATA structure. When the DirectX 9.0 runtime calls the display driver's DdBlt
function to perform a blt operation, the runtime can combine DDBLT_EXTENDED_FLAGS with DDBLT_Xxx flags
using a bitwise OR to create new meanings for the flags. The driver then determines the presence of
DDBLT_EXTENDED_FLAGS, reinterprets the meaning of flags, and performs the blt operation accordingly. The
driver uses this mechanism when it determines if it should perform gamma correction on a linear color space
during a blt from a back buffer to the desktop. The driver also uses extended blt flags to determine if stretch-blit
operations are requested.

Clamping Fog Intensity Per Pixel

A DirectX 9.0 version driver for a device that supports either pixel or vertex shader version 2.0 and later must
indicate that its device supports clamping the fog intensity value on a per-pixel basis by setting the
D3DPMISCCAPS_FOGINFVF capability bit. This informs users that the device does not save the fog factor in the
specular alpha channel when using software vertex shaders. The device can pass the alpha channel of the specular
color (computed in the fixed function vertex pipeline) to the pixel processing unit, instead of always overwriting the
alpha channel with the per-vertex fog intensity value.
Because the driver clamps the fog intensity value on a per-pixel basis, the runtime for DirectX 9.0 and later no
longer clamps the fog intensity value before sending it to the driver.
The driver determines how to obtain the fog value by verifying if the D3DFVF_FOG bit in the flexible vertex format
(FVF) is set. If D3DFVF_FOG is set, the driver obtains the separate fog value that is passed per vertex. If
D3DFVF_FOG is not set and the driver must use fog, the driver obtains the fog value from the specular color's
alpha channel.
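Conceptually, the selection looks like the following sketch. The vertex structure, its layout, and the function are hypothetical, because the actual layout depends on the FVF; only the D3DFVF_FOG test and the specular-alpha fallback come from this topic.

// Entirely illustrative vertex layout.
typedef struct _MY_VERTEX {
    float x, y, z;
    DWORD specular;   // D3DCOLOR: alpha in the top 8 bits
    float fog;        // used only when D3DFVF_FOG is set
} MY_VERTEX;

float MyGetPerVertexFog(DWORD dwVertexType, const MY_VERTEX *pVertex)
{
    if (dwVertexType & D3DFVF_FOG) {
        return pVertex->fog;                             // separate per-vertex fog value
    }
    return ((pVertex->specular >> 24) & 0xFF) / 255.0f;  // specular alpha fallback
}
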
When the driver sets D3DPMISCCAPS_FOGINFVF, the runtime in turn sets the
D3DPMISCCAPS_FOGANDSPECULARALPHA capability bit in the PrimitiveMiscCaps member of the D3DCAPS9
structure.

Modifying Vertex Stream Frequency

A DirectX 9.0 version driver for a device that supports vertex shader version 3.0 and later must implement vertex
stream frequency division. For version 2.0 and earlier models of vertex shader (including fixed function), the vertex
shader is called once per vertex; for each call, the input vertex registers are initialized with unique vertex elements
from the vertex streams. However, using vertex stream frequency division, the vertex shader (3.0 and later) can be
called to initialize applicable input registers at a less frequent rate.
When an application calls the IDirect3DDevice9::SetStreamSourceFreq method to set the frequency for a given
stream, the DirectX 9.0 runtime in turn calls the driver's D3dDrawPrimitives2 function using the
D3DDP2OP_SETSTREAMSOURCEFREQ operation code.
After the stream's frequency divisor is set (for example, to 2), the driver must fetch data from the stream and pass this data into the applicable input vertex registers every 2 vertices. This divisor affects each element in the stream.
The driver uses this divisor to compute the vertex offset into the vertex buffer according to the following formula:

VertexOffset = VertexIndex / Divider * StreamStride + StreamOffset

For each vertex stream used, if the driver receives a start-vertex value during a call to the driver's
D3dDrawPrimitives2 function using the D3DDP2OP_DRAWPRIMITIVE operation code, the driver also divides this
start-vertex value by the frequency divisor and factors the result in the formula. This start-vertex value is provided
in the VStart member of the D3DHAL_DP2DRAWPRIMITIVE structure. The following formula factors in the start-
vertex value:

VertexOffset = StartVertex / Divider + VertexIndex / Divider * StreamStride + StreamOffset

Note that the preceding formulas use integer division.
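The second formula translates directly into code, as in the following sketch; the function is illustrative only, and all divisions are integer divisions.

DWORD MyComputeVertexOffset(DWORD StartVertex, DWORD VertexIndex, DWORD Divider,
                            DWORD StreamStride, DWORD StreamOffset)
{
    return (StartVertex / Divider) +
           (VertexIndex / Divider) * StreamStride +
           StreamOffset;
}
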


The application passes the D3DSBT_VERTEXSTATE state type in a call to the IDirect3DDevice9::CreateStateBlock
method to capture the current vertex state.
The driver ignores the setting of a stream's frequency divisor either for indexed primitives or if the driver only
supports a vertex shader model that is earlier than version 3.0 (including fixed function).
For more information about IDirect3DDeviceXxx::SetStreamSourceFreq and
IDirect3DDeviceXxx::CreateStateBlock, see the latest DirectX SDK documentation.

Supporting Single-Pixel-Wide Antialiased Lines

A DirectX 9.0 version driver can support single-pixel-wide lines that are either alias or antialias. The driver indicates
antialias support by setting the D3DLINECAPS_ANTIALIAS capability bit in the LineCaps member of the D3DCAPS9
structure. The driver returns a D3DCAPS9 structure in response to a GetDriverInfo2 query similarly to how it
returns a D3DCAPS8 structure as described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this
query is described in Supporting GetDriverInfo2.
To enable line antialiasing, the driver receives the D3DDP2OP_RENDERSTATE operation code in the command
stream of its D3dDrawPrimitives2 function. The driver processes the D3DRS_ANTIALIASEDLINEENABLE render
state from the RenderState member of the D3DHAL_DP2RENDERSTATE structure that is associated with this
operation code. The driver determines whether to enable or disable line antialiasing from the Boolean value in the
dwState member of D3DHAL_DP2RENDERSTATE. The value TRUE means to enable and FALSE means to disable.
By default, this render-state value is set to FALSE.
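A minimal sketch of handling this render state in the D3dDrawPrimitives2 command stream follows; the driver context structure and its flag are hypothetical, while RenderState and dwState are the members named above.

// Assumes the Direct3D DDK headers are included.
typedef struct _MY_D3D_CONTEXT {
    BOOL antialiasedLines;   // defaults to FALSE
} MY_D3D_CONTEXT;

void MyProcessRenderState(const D3DHAL_DP2RENDERSTATE *pRS, MY_D3D_CONTEXT *pCtx)
{
    if (pRS->RenderState == D3DRS_ANTIALIASEDLINEENABLE) {
        pCtx->antialiasedLines = (pRS->dwState != 0);
    }
}
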
The D3DRS_ANTIALIASEDLINEENABLE render state applies to triangles drawn in wire-frame mode as well as line-
drawing primitive types.
When rendering to a multiple-sample render target, the driver must ignore a request to enable line antialiasing and
render all lines aliased.

Supporting High-Order Patched Surfaces

A DirectX 9.0 version driver for a device that supports adaptive tessellation and displacement mapping for high-
order patched surfaces must indicate such support with capability bits and be able to process new adaptive-
tessellation render states and a displacement-map texture stage state. For more information about adaptive
tessellation and displacement mapping, see the latest DirectX SDK.
To indicate support of adaptive tessellation and displacement mapping, the driver sets the following capability bits
in the DevCaps2 member of the D3DCAPS9 structure:
D3DDEVCAPS2_ADAPTIVETESSRTPATCH
Device can adaptively tessellate render-target patches.
D3DDEVCAPS2_ADAPTIVETESSNPATCH
Device can adaptively tessellate N-patches.
D3DDEVCAPS2_DMAPNPATCH
Device supports displacement maps for N-patches.
D3DDEVCAPS2_PRESAMPLEDDMAPNPATCH
Device supports presampled displacement maps for N-patches.
To indicate the maximum number of N-patch subdivisions that the display device can support, the driver sets the
MaxNpatchTessellationLevel member of the D3DCAPS9 structure to the maximum number. Applications that
use presampled displacement mapping are affected by the device clamping to this maximum number.
The driver returns a D3DCAPS9 structure in response to a GetDriverInfo2 query similarly to how it returns a
D3DCAPS8 structure as described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is
described in Supporting GetDriverInfo2.
The driver specifies the D3DFORMAT_OP_DMAP flag in the dwOperations member of the DDPIXELFORMAT
structure for a particular surface format to mark the format for displacement-map sampling. When a texture
surface is created, the Direct3D runtime sets the DDSCAPS3_DMAP bit of the dwCaps3 member of the
DDSCAPSEX (DDSCAPS2) structure to indicate that the texture can be sampled in the tessellation unit.
Note that DirectX 9.0 and later drivers must turn off the N-patch feature only when the value of the
D3DRS_PATCHSEGMENTS render state is less than 1.0f. DirectX 8.1 and earlier drivers are not required to behave
in this manner.
The following adaptive-tessellation render states along with their default values are new for DirectX 9.0:
D3DRS_MAXTESSELLATIONLEVEL = 1.0f
D3DRS_MINTESSELLATIONLEVEL = 1.0f
D3DRS_ADAPTIVETESS_X = 0.0f
D3DRS_ADAPTIVETESS_Y = 0.0f
D3DRS_ADAPTIVETESS_Z = 1.0f
D3DRS_ADAPTIVETESS_W = 0.0f
D3DRS_ENABLEADAPTIVETESSELLATION = FALSE
The D3DDMAPSAMPLER sampler, which is also new for DirectX 9.0, is used in the tessellation unit to set a
displacement map texture.
Note DirectX 9.0 and later applications can use the D3DSAMP_DMAPOFFSET value in the D3DSAMPLERSTATETYPE
enumeration to control the offset, in vertices, into the presampled displacement map. The runtime maps user-mode
sampler states (D3DSAMP_Xxx) to kernel-mode D3DTSS_Xxx values so that DirectX 9.0 and later drivers are not
required to process user-mode sampler states. Therefore, drivers must instead process the D3DTSS_DMAPOFFSET
value in the TSState member of the D3DHAL_DP2TEXTURESTAGESTATE structure for
D3DDP2OP_TEXTURESTAGESTATE operations. For more information about D3DSAMPLERSTATETYPE and
presampled displacement mapping, see the latest DirectX SDK documentation.

Supporting Additional Instruction Slots for Shader 3

A display device that supports either pixel or vertex shader version 3.0 and later must support at least 512
instruction slots for either shader type. However, this display device can support up to 32768 instruction slots for
either shader type.
To indicate the maximum number of instruction slots for the vertex shader 3.0 that the device supports, the DirectX
9.0 driver for the device sets the MaxVertexShader30InstructionSlots member of the D3DCAPS9 structure to
the maximum number.
To indicate the maximum number of instruction slots for the pixel shader 3.0 that the device supports, the DirectX
9.0 driver for the device sets the MaxPixelShader30InstructionSlots member of the D3DCAPS9 structure to the
maximum number.
Because the maximum number of instruction slots for pixel and vertex 3.0 shaders can be different, the DirectX 9.0
driver can set MaxVertexShader30InstructionSlots and MaxPixelShader30InstructionSlots to different
values. The driver can set the maximum number of instruction slots from 512 to 32768. If the driver sets either
MaxVertexShader30InstructionSlots or MaxPixelShader30InstructionSlots to a value that is outside the
allowable range, the driver fails to load.
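For example, a driver might report the limits as in the following sketch; the function name is hypothetical and the chosen values are arbitrary within the allowed range.

// Assumes the Direct3D 9 DDK headers are included.
void MyReportShader30InstructionSlots(D3DCAPS9 *pCaps)
{
    pCaps->MaxVertexShader30InstructionSlots = 32768;   // anywhere from 512 to 32768
    pCaps->MaxPixelShader30InstructionSlots  = 512;     // the required minimum
}
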
The driver returns a D3DCAPS9 structure in response to a GetDriverInfo2 query similarly to how it returns a
D3DCAPS8 structure as described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is
described in Supporting GetDriverInfo2.

Reporting Capabilities for Shader Versions

The DirectX 9.0 version driver for a display device that supports pixel or vertex shader version 2.0 or 3.0 and later
must indicate that it supports a minimum set of capabilities in order to bind the device to the shader version. The
driver must set members of the D3DCAPS9 structure to indicate support of the capabilities. The driver returns a
D3DCAPS9 structure in response to a GetDriverInfo2 query similarly to how it returns a D3DCAPS8 structure as
described in Reporting DirectX 8.0 Style Direct3D Capabilities. Support of this query is described in Supporting
GetDriverInfo2. These capabilities are discussed in the following topics:
Reporting Capabilities for Shader 2 Support
Reporting Capabilities for Shader 3 Support

Reporting Capabilities for Shader 2 Support

The DirectX 9.0 version driver for a display device that supports pixel or vertex shader version 2.0 and later must
indicate that it supports the following capabilities:
If a device supports vertex shader 2.0 and later, its driver must set the members of the D3DCAPS9 structure to the
following values:
Set the MaxStreams member to be at least 8 to indicate that the device can handle 8 or more concurrent
data streams.
Set the D3DDTCAPS_UBYTE4 bit in the DeclTypes member to 1 to indicate support of the UBYTE4 vertex
element type. For more information, see Reporting Support of UBYTE4 Vertex Element.
If a device supports pixel shader 2.0 and later, its driver must configure the following bits in the TextureCaps
member to indicate whether the device supports nonpower-of-2 2D texture mapping conditionally or
unconditionally. For more information, see the description of these bits in the D3DPRIMCAPS reference page.
Set the D3DPTEXTURECAPS_POW2 and D3DPTEXTURECAPS_NONPOW2CONDITIONAL bits to 1 to indicate
conditional support.
Set the D3DPTEXTURECAPS_POW2 and D3DPTEXTURECAPS_NONPOW2CONDITIONAL bits to 0 (that is, do
not set these bits) to indicate unconditional support.

Reporting Capabilities for Shader 3 Support

The DirectX 9.0 version driver for a display device that supports pixel or vertex shader version 3.0 and later must
indicate that it supports the following capabilities:
Vertex shader 3.0 and later
If a device supports vertex shader 3.0 and later, its driver must set the members of the D3DCAPS9 structure to the
following values:
VS20Caps
Set the following members of the D3DVSHADERCAPS2_0 structure:
DynamicFlowControlDepth set to 24.
NumTemps set to 32.
StaticFlowControlDepth set to 4.
Caps set to the D3DVS20CAPS_PREDICATION bit to indicate that predication is supported.
GuardBandLeft, GuardBandTop, GuardBandRight, GuardBandBottom
Set each to 8K.
VertexShaderVersion
Set to 3.0.
MaxVertexShaderConst
Set to 256.
MaxVertexShader30InstructionSlots
Set to 512.
RasterCaps
Set the D3DPRASTERCAPS_FOGVERTEX bit for fog support.
VertexTextureFilterCaps
Set the following filter capabilities:
D3DPTFILTERCAPS_MINFPOINT
D3DPTFILTERCAPS_MAGFPOINT
DevCaps2
Set the D3DDEVCAPS2_VERTEXELEMENTSCANSHARESTREAMOFFSET bit to indicate that vertex elements in a
vertex declaration can share the same stream offset.
DeclTypes
Set the following bits to indicate the vertex data types supported by the device:
D3DDTCAPS_UBYTE4
D3DDTCAPS_UBYTE4N
D3DDTCAPS_SHORT2N
D3DDTCAPS_SHORT4N
D3DDTCAPS_FLOAT16_2
D3DDTCAPS_FLOAT16_4
Pixel shader 3.0 and later
If a device supports pixel shader 3.0 and later, its driver must set the members of the D3DCAPS9 structure to the
following values:
PS20Caps
Set the following members of the D3DPSHADERCAPS2_0 structure:
DynamicFlowControlDepth set to 24.
NumTemps set to 32.
StaticFlowControlDepth set to 4.
NumInstructionSlots set to 512.
Caps set to the following bits:
D3DPS20CAPS_ARBITRARYSWIZZLE to indicate that arbitrary swizzle is supported.
D3DPS20CAPS_GRADIENTINSTRUCTIONS to indicate that gradient instructions are supported.
D3DPS20CAPS_PREDICATION to indicate that predication is supported.
D3DPS20CAPS_NODEPENDENTREADLIMIT to indicate no dependent read limit.
D3DPS20CAPS_NOTEXINSTRUCTIONLIMIT to indicate no limit on the mix of texture and math instructions.
MaxTextureWidth, MaxTextureHeight
Set each to 4K.
MaxTextureRepeat
Set to 8K.
MaxAnisotropy
Set to 16.
PixelShaderVersion
Set to 3.0.
MaxPixelShader30InstructionSlots
Set to 512.
PrimitiveMiscCaps
Set the following bits:
D3DPMISCCAPS_MASKZ
All the cull modes: D3DPMISCCAPS_CULLNONE, D3DPMISCCAPS_CULLCW, D3DPMISCCAPS_CULLCCW.
D3DPMISCCAPS_COLORWRITEENABLE
D3DPMISCCAPS_CLIPPLANESCALEDPOINTS
D3DPMISCCAPS_CLIPTLVERTS
D3DPMISCCAPS_BLENDOP
D3DPMISCCAPS_FOGINFVF
RasterCaps
Set the following bits:
D3DPRASTERCAPS_MIPMAPLODBIAS
D3DPRASTERCAPS_ANISOTROPY
D3DPRASTERCAPS_COLORPERSPECTIVE
D3DPRASTERCAPS_SCISSORTEST
Full depth support: D3DPRASTERCAPS_SLOPESCALEDEPTHBIAS, D3DPRASTERCAPS_DEPTHBIAS
ZCmpCaps
Set the following bits for a full set of comparisons for stencil, depth and alpha test:
D3DPCMPCAPS_NEVER
D3DPCMPCAPS_LESS
D3DPCMPCAPS_EQUAL
D3DPCMPCAPS_LESSEQUAL
D3DPCMPCAPS_GREATER
D3DPCMPCAPS_NOTEQUAL
D3DPCMPCAPS_GREATEREQUAL
D3DPCMPCAPS_ALWAYS
SrcBlendCaps, DestBlendCaps
Set the following source and destination blending modes except where noted:
D3DPBLENDCAPS_ZERO
D3DPBLENDCAPS_ONE
D3DPBLENDCAPS_SRCCOLOR
D3DPBLENDCAPS_INVSRCCOLOR
D3DPBLENDCAPS_SRCALPHA
D3DPBLENDCAPS_INVSRCALPHA
D3DPBLENDCAPS_DESTALPHA
D3DPBLENDCAPS_INVDESTALPHA
D3DPBLENDCAPS_DESTCOLOR
D3DPBLENDCAPS_INVDESTCOLOR
D3DPBLENDCAPS_SRCALPHASAT (not set for DestBlendCaps)
D3DPBLENDCAPS_BOTHSRCALPHA (not set for DestBlendCaps)
D3DPBLENDCAPS_BOTHINVSRCALPHA (not set for DestBlendCaps)
D3DPBLENDCAPS_BLENDFACTOR
TextureCaps
Set the following texture capabilities:
D3DPTEXTURECAPS_PERSPECTIVE
D3DPTEXTURECAPS_TEXREPEATNOTSCALEDBYSIZE
D3DPTEXTURECAPS_PROJECTED
D3DPTEXTURECAPS_CUBEMAP
D3DPTEXTURECAPS_VOLUMEMAP
D3DPTEXTURECAPS_MIPMAP
D3DPTEXTURECAPS_MIPVOLUMEMAP
D3DPTEXTURECAPS_MIPCUBEMAP
TextureFilterCaps, VolumeTextureFilterCaps, CubeTextureFilterCaps
Set the following filter capabilities for each except where noted:
D3DPTFILTERCAPS_MINFPOINT
D3DPTFILTERCAPS_MINFLINEAR
D3DPTFILTERCAPS_MINFANISOTROPIC (not required for VolumeTextureFilterCaps and
CubeTextureFilterCaps)
D3DPTFILTERCAPS_MIPFPOINT
D3DPTFILTERCAPS_MIPFLINEAR
D3DPTFILTERCAPS_MAGFPOINT
D3DPTFILTERCAPS_MAGFLINEAR
TextureAddressCaps
Set the following texture address modes to indicate support at vertex and pixel stages:
D3DPTADDRESSCAPS_WRAP
D3DPTADDRESSCAPS_MIRROR
D3DPTADDRESSCAPS_CLAMP
D3DPTADDRESSCAPS_BORDER
D3DPTADDRESSCAPS_INDEPENDENTUV
D3DPTADDRESSCAPS_MIRRORONCE
StencilCaps
Set the following bits to indicate support of stencil operations:
D3DSTENCILCAPS_KEEP
D3DSTENCILCAPS_ZERO
D3DSTENCILCAPS_REPLACE
D3DSTENCILCAPS_INCRSAT
D3DSTENCILCAPS_DECRSAT
D3DSTENCILCAPS_INVERT
D3DSTENCILCAPS_INCR
D3DSTENCILCAPS_DECR
D3DSTENCILCAPS_TWOSIDED
FVFCaps
Set the D3DFVFCAPS_PSIZE capability to indicate that the device supports point size per vertex.
TextureCaps
Indicate either full or conditional support for nonpower-of-2 textures. For more information, see Reporting
Capabilities for Shader 2 Support.
Must not set the D3DPTEXTURECAPS_SQUAREONLY bit. That is, the device cannot be limited to square textures
only.
If the device supports Rendering to Multiple Targets Simultaneously (that is, the NumSimultaneousRTs member
is set to greater than 1), its driver must set the members of the D3DCAPS9 structure to the following values:
PrimitiveMiscCaps
Set the following bits:
D3DPMISCCAPS_INDEPENDENTWRITEMASKS
D3DPMISCCAPS_MRTINDEPENDENTBITDEPTHS
D3DPMISCCAPS_MRTPOSTPIXELSHADERBLENDING
MaxUserClipPlanes
If vertex shader 3.0 and later is supported, set to 6.
DeclTypes
Set the following bits to indicate the vertex formats that the device supports if vertex shader 3.0 and later is
supported:
D3DDTCAPS_SHORT2N
D3DDTCAPS_SHORT4N
D3DDTCAPS_UDEC3
D3DDTCAPS_DEC3N
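The following abbreviated sketch shows a few of the vertex shader 3.0 values from the list above being reported in D3DCAPS9. A real driver must set every member and capability bit listed in this topic; the function name and the guard-band sign convention shown are assumptions.

// Assumes the Direct3D 9 DDK headers are included.
void MyReportVertexShader30Caps(D3DCAPS9 *pCaps)
{
    pCaps->VertexShaderVersion               = D3DVS_VERSION(3, 0);
    pCaps->MaxVertexShaderConst              = 256;
    pCaps->MaxVertexShader30InstructionSlots = 512;
    pCaps->VS20Caps.DynamicFlowControlDepth  = 24;
    pCaps->VS20Caps.NumTemps                 = 32;
    pCaps->VS20Caps.StaticFlowControlDepth   = 4;
    pCaps->VS20Caps.Caps                     = D3DVS20CAPS_PREDICATION;
    pCaps->GuardBandLeft                     = -8192.0f;   // "8K"; sign convention assumed
    pCaps->GuardBandTop                      = -8192.0f;
    pCaps->GuardBandRight                    = 8192.0f;
    pCaps->GuardBandBottom                   = 8192.0f;
}
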

Updates for Windows DDK

The DirectX 9.0 DDK can be installed and used with many versions of the Microsoft Windows Driver Development
Kit (DDK). The following topics describe features or requirements in versions of the Windows DDK that were not
documented or documented incorrectly when those versions were released:
Allocating Nonpaged Display Memory
Specifying Maximum Size of Bug-check Data in a Video Miniport Driver

Allocating Nonpaged Display Memory

This topic applies only to Microsoft Windows XP and later.


A DirectX 9.0 version display driver can call the EngAllocMem graphics device interface (GDI) function to not only
allocate memory from the system's paged pool but also from nonpaged pool. To allocate nonpaged memory, the
driver must specify the FL_NONPAGED_MEMORY flag in the Flags parameter of the EngAllocMem call. If this flag
is not specified, the memory is allocated from the system's paged pool.
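For example, the following call requests zero-initialized, nonpaged memory; the size and pool tag are arbitrary example values.

// Assumes winddi.h is included.
PVOID pBuffer = EngAllocMem(FL_ZERO_MEMORY | FL_NONPAGED_MEMORY,
                            4096,       // size in bytes (example)
                            'DpmE');    // pool tag (example)
if (pBuffer == NULL) {
    // The allocation failed; fall back to paged memory or fail the operation.
}
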
Windows 2000 and earlier only permitted allocations from the system's paged pool.
Although this feature of allocating from nonpaged pool was available in Windows XP and later, it was not
documented in the Windows XP and Windows XP with Service Pack 1 (SP1) DDKs.

Specifying Maximum Size of Bug-check Data in a Video Miniport Driver

This topic applies only to Microsoft Windows XP with Service Pack 1 (SP1) and later.
A video miniport driver must set the value for the BugcheckDataSize parameter of the
VideoPortRegisterBugcheckCallback function to be no greater than 0x0FA0 (4000) bytes for Windows XP SP1
and Microsoft Windows Server 2003 releases. In Windows Server 2003 and later releases, the maximum value for
BugcheckDataSize is the MAX_SECONDARY_DUMP_SIZE constant. The value of this constant might change in
releases later than Windows Server 2003.
The Windows XP SP1 DDK documentation incorrectly specified the maximum value for BugcheckDataSize.

Updates for Earlier DirectX DDK Versions

The following topics describe features that were not previously documented and that apply to DirectX version 9.0
as well as earlier versions:
Promoting Z Buffers to 32 Bits Per Pixel
Destroying Objects Associated with a Direct3D Context
Handling Color Values for Pixel Formats
Supplying Default Values for Texture Coordinates in Vertex Declarations

Promoting Z Buffers to 32 Bits Per Pixel

This topic applies to DirectX 8.0 and later.


A display driver whose display device does not support rendering to z and color buffers with differing pixel depths
must transparently promote a 16 bits per pixel (bpp) z buffer to 32 bpp in order to render both the z buffer and a
32 bpp color buffer at the same time. Note, however, that the z buffer cannot also have stencil bits. Therefore,
applications are not required to correct this mismatch in buffer pixel depth.
If the driver's display device can render to z and color buffers of differing pixel depth, the driver sets the
D3DFORMAT_OP_ZSTENCIL_WITH_ARBITRARY_COLOR_DEPTH flag in the dwOperations member of the
DDPIXELFORMAT structure for z-buffer formats. The Direct3D runtime then lets applications render to any
mismatch of z- and color-pixel depths.
If the driver does not set D3DFORMAT_OP_ZSTENCIL_WITH_ARBITRARY_COLOR_DEPTH for z-buffer formats, the
runtime only lets applications render to a mismatch of 32 bpp color buffer and 16 bpp z buffer with no stencil bits
as described in the introductory paragraph. In this case, the driver allocates a 32 bpp z buffer in place of the
requested 16 bpp z buffer.
If D3DFORMAT_OP_ZSTENCIL_WITH_ARBITRARY_COLOR_DEPTH is not set, the runtime does not let applications
render to the following mismatch scenarios:
16 bpp color buffer and 32 bpp z buffer at the same time. For rendering to succeed in this scenario, the
driver would have to substitute a 16 bpp z buffer for the 32 bpp z buffer, which would degrade z precision
and cause noticeable artifacts.
Any z format whose depth stencil does not occupy the same number of bits per pixel as the color buffer (in
other words, mismatching z and stencil surfaces). For rendering to succeed in this scenario, the driver would
have to change the number of stencil bits, which would also cause noticeable artifacts.

Destroying Objects Associated with a Direct3D Context

This topic applies to DirectX 7.0 and later.


To prevent memory leaks, a display driver must release all objects associated with a Direct3D context when the
driver's D3dContextDestroy function is called. These objects include, for example, vertex and pixel shaders,
declarations and code for vertex shaders, resources for asynchronous queries, and texture resources.

Handling Color Values for Pixel Formats

This topic applies to DirectX 7.0 and later.


A display driver must convert input color values for the ARGB and YUV classes of color formats because
applications request color-fill and clear operations on surfaces with these formats in a uniform way. However, the
driver must directly use the color values from other class formats. For example, applications use A8R8G8B8 as the
uniform color value for all surfaces that have at most 8 bits for the alpha (A), red (R), green (G), and blue (B)
components; the driver must convert the A8R8G8B8 color to the color value that is specific to the actual ARGB
format by copying the bits with the highest significance.
The display driver receives color values when it processes the D3DDP2OP_CLEAR and D3DDP2OP_COLORFILL
operation codes in its D3dDrawPrimitives2 function.
The display driver can use the following code to convert color values for the ARGB and YUV class formats:

DWORD Convert2N(DWORD Color, DWORD n)
{
    return (Color * (1 << n)) / 256;
}

DWORD CPixel::ConvertFromARGB(D3DCOLOR InputColor,
                              D3DFORMAT OutputFormat)
{
DWORD Output = (DWORD) InputColor;
DWORD Alpha = InputColor >> 24;
DWORD Red = (InputColor >> 16) & 0x00ff;
DWORD Green = (InputColor >> 8) & 0x00ff;
DWORD Blue = InputColor & 0x00ff;
switch(OutputFormat) {
case D3DFMT_R8G8B8:
case D3DFMT_X8R8G8B8:
Output = InputColor & 0x00ffffff;
break;

case D3DFMT_A8R8G8B8:
Output = InputColor;
break;

case D3DFMT_R5G6B5:
Output = (Convert2N(Red,5) << 11) |
(Convert2N(Green,6) << 5) |
(Convert2N(Blue,5));
break;

case D3DFMT_X1R5G5B5:
Output = (Convert2N(Red,5) << 10) |
(Convert2N(Green,5) << 5) |
(Convert2N(Blue,5));
break;

case D3DFMT_A1R5G5B5:
Output = (Convert2N(Alpha, 1) << 15) |
(Convert2N(Red,5) << 10) |
(Convert2N(Green,5) << 5) |
(Convert2N(Blue,5));
break;

case D3DFMT_X4R4G4B4:
Output = (Convert2N(Red,4) << 8) |
(Convert2N(Green,4) << 4) |
(Convert2N(Blue,4));
break;

case D3DFMT_A4R4G4B4:
Output = (Convert2N(Alpha,4) << 12) |
(Convert2N(Red,4) << 8) |
(Convert2N(Green,4) << 4) |
(Convert2N(Blue,4));
break;

case D3DFMT_R3G3B2:
Output = (Convert2N(Red,3) << 5) |
(Convert2N(Green,3) << 2) |
(Convert2N(Blue,2));
break;

case D3DFMT_A8R3G3B2:
Output = (Alpha << 8) |
(Convert2N(Red,3) << 5) |
(Convert2N(Green,3) << 2) |
(Convert2N(Blue,2));
break;

case D3DFMT_A2B10G10R10:
Output = (Convert2N(Alpha,2) << 30) |
(Convert2N(Red,10)) |
(Convert2N(Green,10) << 10) |
(Convert2N(Blue,10) << 20);
break;

case D3DFMT_X8B8G8R8:
Output = (Convert2N(Red,8)) |
(Convert2N(Green,8) << 8) |
(Convert2N(Blue,8) << 16);
break;

case D3DFMT_A8B8G8R8:
Output = (Alpha << 24) |
(Convert2N(Red,8)) |
(Convert2N(Green,8) << 8) |
(Convert2N(Blue,8) << 16);
break;

#if (DXPIXELVER > 8)


case D3DFMT_A2R10G10B10:
Output = (Convert2N(Alpha,2) << 30) |
(Convert2N(Red,10) << 20) |
(Convert2N(Green,10) << 10) |
(Convert2N(Blue,10));
break;
#endif

case D3DFMT_UYVY:
#if (DXPIXELVER > 8)
case D3DFMT_R8G8_B8G8:
#endif
Output = (Red << 24) |
(Green << 16) |
(Red << 8) |
(Blue);
break;

case D3DFMT_YUY2:
#if (DXPIXELVER > 8)
case D3DFMT_G8R8_G8B8:
#endif
Output = (Green << 24) |
(Red << 16) |
(Blue << 8) |
(Red);
break;

case MAKEFOURCC(&#39;A&#39;, &#39;Y&#39;, &#39;U&#39;, &#39;V&#39;):


case MAKEFOURCC(&#39;N&#39;, &#39;V&#39;, &#39;1&#39;, &#39;2&#39;):
case MAKEFOURCC(&#39;Y&#39;, &#39;V&#39;, &#39;1&#39;, &#39;2&#39;):
case MAKEFOURCC(&#39;I&#39;, &#39;C&#39;, &#39;M&#39;, &#39;1&#39;):
case MAKEFOURCC(&#39;I&#39;, &#39;C&#39;, &#39;M&#39;, &#39;2&#39;):
case MAKEFOURCC(&#39;I&#39;, &#39;C&#39;, &#39;M&#39;, &#39;3&#39;):
case MAKEFOURCC(&#39;I&#39;, &#39;C&#39;, &#39;M&#39;, &#39;4&#39;):
Output = InputColor;
break;
}
return Output;
}

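For example, passing an opaque mid-gray value of 0xFF808080 through this routine with an OutputFormat of D3DFMT_R5G6B5 yields 0x8410: red 0x10 in bits 15 through 11, green 0x20 in bits 10 through 5, and blue 0x10 in bits 4 through 0.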

Supplying Default Values for Texture Coordinates in Vertex Declarations

This topic applies to DirectX 8.0 and later.


A display driver whose display device supports a programmable pixel shader must supply default values for any
texture coordinates that are missing in a vertex declaration. Texture coordinates that are supplied to pixel shaders
must have four components (u,v,w,q). If the u, v, or w component is missing, the hardware or driver must supply a
default value of 0 to that component. If the q component is missing, the hardware or driver must supply a default
value of 1 to that component. Therefore, if all components are missing, (0,0,0,1) is the default value. For example, if
a 2D texture coordinate is sent to a pixel shader that uses 3D texture coordinates, then the hardware or driver
supplies default values of 0 and 1 to the 3rd and 4th components respectively.
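A minimal sketch of that defaulting rule follows; the helper is hypothetical and is not part of the DDI:

void ExpandTexCoord(const float *src, DWORD count, float dst[4])
{
    // Copy the components that were supplied and fill in the defaults for the rest.
    dst[0] = (count > 0) ? src[0] : 0.0f;  // u defaults to 0
    dst[1] = (count > 1) ? src[1] : 0.0f;  // v defaults to 0
    dst[2] = (count > 2) ? src[2] : 0.0f;  // w defaults to 0
    dst[3] = (count > 3) ? src[3] : 1.0f;  // q defaults to 1
}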
The exception for source parameter tokens is with the following instruction:

// D3DSIO_DEF c#,f0,f1,f2,f3

For this instruction, the source parameter tokens (f#) are taken as 32-bit floats.

DirectX Video Acceleration

This section contains information about Microsoft DirectX Video Acceleration (DirectX VA). This is an application
programming interface (API) and a corresponding motion compensation device driver interface (DDI) for
acceleration of digital video decoding. The following additional DDIs are also provided as part of DirectX VA:
A deinterlacing DDI for deinterlacing and frame-rate conversion of video content.
A ProcAmp DDI to support ProcAmp control and postprocessing of video content.
A COPP DDI for protecting video content.
Driver writers who are creating DirectX VA drivers for Microsoft Windows XP with Service Pack 1 (SP1) and later
should use the dxva.h header file, which contains the structures and enumerations used for video acceleration,
deinterlacing, and frame-rate conversion.
This section includes the following topics:
Introduction to DirectX VA
Video Decoding
Deinterlacing and Frame-Rate Conversion
ProcAmp Control Processing
COPP Processing
Example Code for DirectX VA Devices
DirectX VA Data Flow Management
DirectX VA Operations
Defining Accelerator Capabilities

Introduction to DirectX VA

DirectX VA allows frequently executed, simple video processing operations to be performed by a hardware
accelerator. Confining the less complex video processing operations to the accelerator allows video decoding
acceleration to be accomplished for various video standards with minimal customization of the accelerator. Video
processing operations that are less frequently executed and more complex, such as bitstream parsing and variable-
length decoding (VLD), can be performed on the host CPU.
The DirectX VA API and corresponding motion compensation DDI provide support for the following operations:
Alpha blending for purposes such as DVD subpicture support.
Encryption for applications that require it.
Deinterlacing and frame-rate conversion of video content.
ProcAmp control and post processing of video content.
Protecting video content from unauthorized copying and displaying through the Certified Output Protection
Protocol.
The information presented here is applicable to both application and device driver developers. The format specified
defines how information is exchanged between the user-mode host decoder and the kernel-mode device driver. In
most cases, the data is transferred from the host to the device driver but, in some cases, data is sent in the other
direction.
For sample code used for decoding Windows media video format, see the Windows media sample drivers in the
Windows Media Porting Kit. The Windows Media Porting Kit is used to convert audio and video to Windows media
format.
For support of Windows Media format, Windows Media Video codec version 9 or later must be used. The Windows
Media Video version 8 codecs supplied with Windows XP do not support DirectX VA.
For a display driver that uses the deinterlacing DDI, video content must be interlaced and properly marked as
interlaced. The video mixing renderer (VMR) uses the VIDEOINFOHEADER2 structure in conjunction with the
deinterlacing DDI to deinterlace and perform frame-rate conversion. For more information about the
VIDEOINFOHEADER2 structure, see the Windows SDK documentation.
The ProcAmp control DDI extends DirectX VA to support ProcAmp control and post processing of video content by
graphics device drivers. The DDI maps to the existing DirectDraw and DirectX VA DDI. The DDI is not accessible
through the IAMVideoAccelerator interface. The ProcAmp control DDI is available in Microsoft DirectX 9.0 and
later versions only.
The Implementation of Current Standards topic details the hardware accelerator and software decoder
requirements that must be met for the following, motion-compensated video codec standards: ITU-T H.261, MPEG-
1, MPEG-2 (H.262), ITU-T H.263, MPEG-4, MPEG-4 AVC (H.264), and VC-1.
There are no tools supplied with DirectX VA. For more information about tools supplied for Windows media
support, see the Windows Media Porting Kit.

Implementation of Current Standards

DirectX VA provides support for ITU-T H.261, MPEG-1, MPEG-2 (H.262), ITU-T H.263, MPEG-4, MPEG-4 AVC
(H.264), and VC-1. The implementation of these video standards is described in the following topics. Some
operations must be implemented by the hardware accelerator portion of DirectX VA and some by the software
decoder and are noted as such.
ITU-T H.261
MPEG-1
MPEG-2 (H.262)
ITU-T H.263
MPEG-4
MPEG-4 AVC (H.264)
VC-1

ITU-T H.261

This standard is titled "Video Codec for Audiovisual Services at px64 kbit/s," ITU-T Recommendation H.261. This
recommendation contains the same basic design later used in other video codec standards. H.261 uses 8-bit
samples with Y, Cb, and Cr components, 4:2:0 sampling, 16x16 macroblock-based motion compensation, 8x8 IDCT,
zigzag inverse scanning of coefficients, scalar quantization, and variable-length coding of coefficients based on a
combination of zero-valued run-lengths and quantization index values.
All H.261 prediction blocks use forward-only prediction from the previous picture. H.261 does not have half-
sample accurate prediction filters, but instead uses a type of low-pass filter called the loop filter (Section 3.2.3 of the
H.261 specification) that can be turned off or on during motion compensation prediction for each macroblock.
Annex D Graphics
Recommendation H.261 Annex D Graphic Transfer mode can be supported by reading four decoded pictures from
the accelerator back onto the host and interleaving them there for display as a higher-resolution graphic picture.

MPEG-1

The MPEG-1 video standard is titled ISO/IEC 11172-2. This standard was developed after H.261 and borrowed
significantly from it. The MPEG-1 standard does not have a loop filter. Instead, it uses a simple half-sample filter
that represents a finer accuracy of movement between frames than the full-sample accuracy supported by H.261.
Two additional prediction modes, bidirectional and backward prediction, were added. These prediction modes
require one additional reference frame to be buffered. The bidirectional prediction mode averages forward-
predicted and backward-predicted prediction blocks. The arithmetic for averaging forward and backward prediction
blocks is similar to that for creating a half-sampled interpolated prediction block. The basic structure is otherwise
the same as H.261.

MPEG-2 (H.262)

The MPEG-2 standard is titled "Information Technology − Generic Coding of Moving Pictures and Associated
Audio Information: Video," ITU-T Recommendation H.262 | ISO/IEC 13818-2. This standard added only a basic 16x8
shape to the existing tools of MPEG-1 (from a very low-layer perspective). From a slightly higher-layer perspective,
MPEG-2 added many additional ways to combine predictions referenced from multiple fields in order to deal with
interlaced video characteristics.

Frame-Structured Pictures

Frame MC: Frame motion compensation has a 16x16 prediction block shape, similar to MPEG-1 predictions. This is
the only progressive-style motion compensation within MPEG-2 (as specified by the motion_type variable in MPEG-
2). There is either one prediction plane (forward-only or backward-only) or two (bidirectional) as determined by the
macroblock_type MPEG-2 variable. Reference blocks are formed from contiguous frame lines from the frame
buffer. The frame buffer is selected by the semantics of the decoding process. Half-sample interpolation (MPEG-2
Section 7.6.4) and bidirectional interpolation (MPEG-2 Section 7.6.7.1) have identical averaging operations as in the
MPEG-1 case.
Field (16x8) MC: Field (16x8) motion compensation has each prediction plane (forward and backward directions)
consisting of a top 16x8 prediction block, and a bottom 16x8 prediction block. The reference block corresponding to
each prediction block may be extracted from the top field or bottom field of a reference frame, as determined by
the MPEG-2 variable motion_vertical_field_select[r][s]. There are two possibilities for field (16x8) motion
compensation:
One set of two prediction blocks for one prediction plane (forward or backward only prediction with two
prediction blocks per macroblock).
Two sets of two prediction blocks for two prediction planes (bidirectional prediction with four prediction
blocks in a macroblock).
Dual-Prime MC: Like field (16x8) motion compensation, dual-prime motion compensation has each plane (parity)
consisting of a top and bottom 16x8 shape. The same and opposite parity planes are combined together in the
averaging operation. This averaging operation is identical to the bidirectional interpolation that is used for other
motion types in MPEG-2. Unlike the other motion compensation types, a dual-prime macroblock always consists of
two sets of prediction blocks (of the same and opposite parity) for a total of four prediction blocks per macroblock.

Field-Structured Pictures

Field (16x16) MC: Field (16x16) motion compensation resembles frame motion compensation in that each
prediction has a 16x16 shape. However, the reference block data is formed from sequential top field or bottom field
lines only (not a mixture of alternating top field and bottom field lines as in progressive motion). As with all motion
compensation in field-structured pictures, the reconstructed macroblock is stored in the current frame buffer as
only sequential top field or bottom field lines. The top or bottom field destination is determined by the MPEG-2
picture_structure variable.
16x8 MC: Although the basic prediction block shapes of this type of motion compensation are the same as for the
other 16x8 shapes (field 16x8 MC and dual-prime MC), it does not partition the macroblock in the same manner.
The two partitions correspond to the upper and lower halves of a macroblock prediction plane, rather than the top
and bottom fields within a macroblock. For more information, see the "Two MPEG-2 Macroblock 16x8 Partitions"
illustration in Macroblock Partitioning. Note that the anchor point for the lower 16x8 half is the upper-left corner of
the 16x8 lower portion, not the upper-left corner of the whole macroblock, as is the case with all other types of
motion compensation.
Dual-Prime MC: At the lowest layer, there is virtually no distinction between dual-prime and the field-structured
field motion compensation with bidirectional prediction. The differences are manifested in the frame-buffer
selections from which reference blocks are formed. Dual-prime motion compensation in field-structured pictures
always consists of two 16x16 prediction blocks (the same and opposite parity predictions).

ITU-T H.263

ITU-T Recommendation H.263 is titled Video Coding for Low Bit Rate Communication. This recommendation offers
improved compression performance relative to H.261, MPEG-1, and MPEG-2. The H.263 standard contains a
baseline mode of operation that supports only the most basic features of H.263. It also contains a large number of
optional, enhanced modes of operation that can be used for various purposes. Baseline H.263 prediction operates
in this interface using a subset of the MPEG-1 features. The baseline mode contains no bidirectional prediction −
only forward prediction.
Rounding control: Several H.263 optional modes require rounding control. This feature is supported by the
bRcontrol member of DXVA_PictureParameters.
Motion Vectors over Picture Boundaries: Several H.263 optional modes allow motion vectors that address
locations outside the boundaries of a picture, as defined in H.263 Annex D. The bPicExtrapolation member of
DXVA_PictureParameters indicates whether the accelerator needs to support such motion. There are two ways
that an accelerator can support motion vectors over picture boundaries. In either case, the result is the same:
Clip the value of the address on each sample fetch to ensure that it stays within picture boundaries.
Pad the picture by using duplicated samples to widen the actual memory area used by one macroblock
width and height across each border of the picture.
Bidirectional motion prediction: the bidirectional motion prediction used in some optional H.263 prediction
operations uses a different rounding operator than MPEG-1. (It uses downward-rounding of half integer values as
opposed to upward-rounding.) The bBidirectionalAveragingMode member of DXVA_PictureParameters
indicates the rounding method used for combining prediction planes in bidirectional motion compensation.
Four-MV Motion Compensation (INTER4V): Although each macroblock in H.263 is 16x16 in size, some optional
modes (for example, Annex F and Annex J of H.263) allow four motion vectors to be sent for a single macroblock,
with one motion vector sent for each of the four 8x8 luminance blocks within the macroblock. The corresponding
8x8 chrominance area uses a single, derived motion vector.
Overlapped Block Motion Compensation (OBMC): H.263 Annex F contains Overlapped Block Motion
Compensation (OBMC) for luminance samples, in addition to INTER4V support. OBMC prediction is supported by
allowing 10 motion vectors to be sent for forward prediction of a macroblock.
OBMC prediction blocks can be realized in the hardware accelerator as a combination of predictions organized into
three planes:
A current plane (with a plane index of zero)
An upper/lower plane (with a plane index of 1)
A left/right plane (with a plane index of 2)
The three planes can serve as temporary storage for the blocks q(x,y), r(x,y), and s(x,y) defined in H.263 Section F.3.
After each of the three planes has been filled out for all four blocks, they can be combined according to the
formula in H.263 Section F.3, and weighted by their respective H matrices given in H.263 Annex F.
As an example, an OBMC luminance macroblock prediction may be comprised of eight top/bottom prediction
blocks of 8x4 shape, eight left/right blocks of 4x8 shape, and four current blocks of 8x8 shape. If all four of the
motion vectors for the plane with index 0 have the same motion vector (that is, when not in an INTER4V
macroblock), a single 16x16 macroblock prediction can be used to fill the entire 16x16 plane.
To implement the OBMC process in DirectX VA, 10 motion vectors are sent for the macroblock as shown in the
following figure. The first four motion vectors are for the Y₀, Y₁, Y₂, and Y₃ blocks in the current macroblock. Remote
motion vectors are then sent in the following order:
1. Left and right halves of the top of the macroblock.
2. Top and bottom halves of the left side of the macroblock.
3. Top and bottom halves of the right side of the macroblock.
The following figure shows the motion vectors sent for a macroblock when using OBMC processing. (The letter C
indicates a motion vector of the current macroblock. The letter R indicates a motion vector that is remote with
respect to the current macroblock.)

Note that H.263 does not use distinct remote vectors for the left and right halves of the bottom of the macroblock;
instead, it reuses the vectors for the current macroblock.
The following figure shows how one 8x8 block is placed in the three types of prediction planes used by OBMC
processing in H.263.

PB frames (Annex G and M): In this mode, macroblocks for a P-frame and a pseudo−B-frame are multiplexed
together into the unique PB-frame picture coding type. The B portion of each macroblock borrows from
information encoded for the P portion of the macroblock: the B-frame forward and backward motion vectors are
scaled from the P-frame vector, and the reconstructed P-frame macroblock serves as backward reference for the B
portion. The PB-frame includes only a pseudo−B-frame, because the backward prediction for each macroblock can
only refer to the reconstructed P macroblock that is contained within the same PB macroblock. However, as with
traditional B-frame semantics, a B macroblock within a PB-frame can refer to any location within the forward-
reference frame. The limitation of the backward reference creates smaller backward prediction block shapes (as
described in H.263 Figure G.2). PB-frames are supported in DirectX VA by representing the P portions of the PB-
frame as a P-frame, and the B portions of the PB-frame as a separate B-in-P bidirectionally predicted picture
containing a unique B-in-PB type of macroblock that has two motion vectors.
Deblocking Filter (Annex J): Special commands are defined to accelerate deblocking filters, whether used within
the motion-compensated prediction loop as with Annex J, or outside the loop as is the case when deblocking H.261
pictures or H.263 baseline pictures. The host CPU must create deblocking commands that observe group of blocks
(GOB) or slice segment boundaries, if necessary.
Reference Picture Selection (Annexes N and U): Multiple reference frames are supported by the accelerator
using the picture index selection field of each prediction block.
Scalability (Annex O): Temporal, SNR, and spatial scalability features are specified in H.263 Annex O. H.263
temporal scalability B-frames are very similar in DirectX VA to MPEG-1 B-frames. Spatial scalability requires
upsampling the lower-layer reference picture and then using the upsampled picture as a reference picture for
coding an enhancement-layer picture (in all other aspects, spatial scalability is essentially the same as signal-to-
noise ratio and temporal scalability). The appropriate bidirectional averaging rounding control should be set to
downward-biased averaging for H.263. (MPEG-1 and MPEG-2 use upward-biased averaging, and H.263 uses
downward-biased averaging.)
Reference Picture Resampling (Annex P): The simple form of this annex is supported by reference buffer
resampling. For advanced forms of Annex P resampling, the reconstructed frames that serve as reference frames
must be resampled by external means and stored as reference frame buffers that are addressable by the
accelerator.
Reduced-Resolution Update (Annex Q): The H.263 reduced-resolution update mode is not currently supported,
as it has unusual residual upsampling requirements, a different form of deblocking filter, and a different form of
advanced prediction OBMC. However, reduced-resolution update-mode operation can be supported in this
interface using host-based IDCT processing when the deblocking filter mode and the advanced prediction mode
are inactive.
Independent Segment Decoding (Annex R): There is no accelerator awareness of independent segment
borders. Some forms of Annex R can be supported without any special handling (for example, baseline plus Annex
R). Forms of Annex R that require picture segment extrapolation can be supported by decoding each segment as a
picture and then constructing the complete output picture from these smaller pictures.
IDCT (Annex W): This interface supports the inverse discrete cosine transform (IDCT) specified in Annex W of
H.263.
Other H.263 Optional Features: Other optional features of H.263 can be supported without any impact on the
DirectX VA design. For example, Annexes I, K, S, and T can be easily handled by altering the software decoder
without any impact on the accelerator.

MPEG-4

MPEG-4 was based heavily on H.263 for progressive-scan coding, and on MPEG-2 for support of interlace and
color sampling formats other than 4:2:0. The features that support H.263 and MPEG-2 can be used to support
MPEG-4.
MPEG-4 can support a sample accuracy of more than 8 bits. DirectX VA includes a mechanism to support more
than 8 bits per pixel using the bBPPminus1 member of the DXVA_PictureParameters structure.
Note The features most unique to MPEG-4, such as shape coding, object orientation, face modeling, mesh objects,
and sprites, are not supported in DirectX VA.

MPEG-4 AVC (H.264)

MPEG-4 Advanced Video Coding (AVC), also known as ITU-T H.264, is a standard for video compression that can
provide good video quality at substantially lower bit rates than previous standards (for example, half or less the bit
rate of MPEG-2, H.263, or MPEG-4). MPEG-4 AVC provides this video quality without increasing design complexity
so much that implementation becomes impractical or excessively expensive. MPEG-4 AVC
can be applied to a wide variety of applications on a wide variety of networks and systems, including low and high
bit rates, low and high resolution video, broadcast, DVD storage, RTP and IP packet networks, and ITU-T multimedia
telephony systems. For information on how DirectX VA decodes MPEG-4 AVC, download DirectX Video
Acceleration Specification for H.264/AVC Decoding.

VC-1

VC-1 is an evolution of the conventional video codec design that is based on discrete cosine transform (DCT) and
also found in H.261, H.263, MPEG-1, MPEG-2, and MPEG-4. VC-1 is an alternative to the MPEG-4 AVC (H.264) video
codec standard. VC-1 contains coding tools for interlaced video sequences as well as progressive encoding. The
main goal of VC-1 development and standardization is to support the compression of interlaced content without
first converting the content to progressive. This support makes VC-1 more attractive to broadcast and video
industry professionals. For information on how DirectX VA decodes VC-1, download DirectX Video Acceleration
Specification for Windows Media Video v8, v9 and vA Decoding (Including SMPTE 421M "VC-1").

DirectX VA Relationship to IAMVideoAccelerator API and Motion Compensation DDI

DirectX VA uses the IAMVideoAcceleratorNotify and IAMVideoAccelerator interfaces (documented in the
Microsoft Windows SDK), and the motion compensation DDI to specify the format of the data exchanged between
the software decoder, the video mixing renderer (VMR) or the overlay mixer (OVM), and the video display driver.
The following figure shows the relationship of these interfaces to the software decoder, VMR, and video display
driver.

The IAMVideoAcceleratorNotify interface retrieves or sets decompressed buffer information for a given video
accelerator GUID.
The IAMVideoAccelerator interface enables a video decoder filter to access the functionality of a video accelerator
and provides video rendering using the video mixing renderer (VMR) or the overlay mixer (OVM).
The motion compensation DDI establishes a common interface to access hardware acceleration capabilities and
allow cross-vendor compatibility between user-mode software applications and acceleration capabilities. The DDI
notifies the decoder when a video acceleration object is being used, starts and stops the decoding of frame buffers,
indicates the uncompressed picture formats that the hardware supports, and notifies the display driver of the
macroblocks that need to be rendered. The motion compensation DDI is accessed through the
DD_MOTIONCOMPCALLBACKS structure.
For more information about the IAMVideoAccelerator and IAMVideoAcceleratorNotify interfaces, see the
Windows SDK documentation. For more information about the motion compensation DDI, see Motion
Compensation and Motion Compensation Callbacks.

Video Decoding

DirectX VA permits one or more stages of the video decoding process to be divided between the host CPU and the
video hardware accelerator. The accelerator executes the motion-compensated prediction (MCP), and may also
execute the inverse discrete-cosine transform (IDCT) and the variable-length decoding (VLD) stages of the
decoding process.
The DirectX VA API decodes a single video stream. Support of multiple video streams requires a separate DirectX
VA session for each video stream (for example, a separate pair of output and input pins for the video decoder and
acceleration driver to use in filter graph operation). For more information about a filter graph, see KS Minidriver
Architecture.

Frame Buffer Organization

All picture buffers are assumed to have frame-organized buffers as described in the MPEG-2 video specification
(sample locations are given as frame coordinates).
It is possible to use an implementation-specific translation layer to losslessly convert prediction blocks that are
described in frame coordinates into field coordinates. For example, a single frame motion
prediction can be broken into two separate, top and bottom macroblock-portion predictions.
Three video component channels (Y, Cb, Cr) are decoded using interfaces defined for DirectX VA. Motion vectors for
the two chrominance components (Cb, Cr) are derived from those sent for the luminance component (Y). The
accelerator is responsible for converting any of these motion vectors to different coordinate systems that may be
used.
The following figure shows how video data buffering is implemented in the host and accelerator.


Decoder Stages

The decoder stages that are depicted in the following figure show the operation of the motion compensation
prediction (MCP) and inverse discrete-cosine transform (IDCT) parts of an accelerator. The data indicated as
dct_type is a syntax element that controls the type of IDCT that is performed.


Motion-Compensated Prediction

Block motion-compensated prediction (MCP) is the type of prediction implemented by DirectX VA. This prediction
type is what gives the MPEG and H.26x family of codecs the advantage over pure still-frame coding methods, such
as JPEG. Types of motion-compensated prediction other than block-based prediction are not implemented by
DirectX VA.
In motion-compensated prediction, previously transmitted and decoded data serves as the prediction for current
data. The difference between the prediction and the actual current data values is the prediction error. The coded
prediction error is added to the prediction to obtain the final representation of the input data. After the coded
prediction error is added to the MCP, the final decoded picture is used in the MCP to generate subsequent coded
pictures.
This recursive loop occasionally is broken by various types of resets that are specific to the element being
predicted. The resets are described by the semantics of the decoding process. (For example, motion vectors and
coefficient predictions are reset at slice boundaries, while the whole temporal frame prediction chain is reset by an
intra-refresh frame.)
The following figure shows the signal flow for motion-compensated prediction.

The steps required for motion-compensated prediction coding of pictures are as follows:
1. Reference blocks are extracted from previously decoded frames and modified as specified by encoded
mode selection and the motion vectors and other prediction commands to form the prediction of each
image block.
2. The transformed difference between the current input data block and the prediction is approximated as
closely as possible within the available bit rate by the encoder, and the result is sent as the coded prediction
error.
3. The prediction and inverse-transformed prediction error are summed to form a reconstructed picture block.
4. The reconstructed picture block is stored in a reference frame buffer to be used for the prediction of
subsequent pictures.
5. This process continues again at step 1.
Motion vectors, DCT coefficients, and other data that is not directly part of the MCP process also employ prediction
to make the transmitted form of the data more compact. These instances of prediction are executed on the host
CPU processor or bitstream parser/variable-length-decoding unit.
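The reconstruction in step 3 amounts to a clamped per-sample addition. The following is a minimal sketch only (8-bit samples are assumed) and is not part of the DDI:

static BYTE ReconstructSample(int Prediction, int PredictionError)
{
    int v = Prediction + PredictionError;   // add the inverse-transformed prediction error to the prediction
    if (v < 0)   v = 0;                     // clamp to the valid 8-bit sample range
    if (v > 255) v = 255;
    return (BYTE)v;                         // the result is stored in the reference frame buffer
}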

Macroblock Prediction

The formation of a macroblock prediction through motion-compensated prediction (MCP) must be done as a series
of discrete stages as shown in the following figure and steps:

The following four steps are involved in creating a macroblock prediction:


1. Form the reference frame
A reference frame is an uncompressed surface that was previously created by the decoding of a previous
picture, or by writing directly into a video accelerator uncompressed surface.
2. Extract the reference block
A reference block is not necessarily the same as a prediction block. It most likely consists of extra samples
that are needed in the prediction filtering stages. Unless half-sample filtering is executed in the memory unit,
the reference block for a 16x16 half-sample filtered macroblock is a 17-row by 17-column array of pixel element
data. The size of the reference block is both a function of
the prediction block dimensions and filter attributes of the prediction block. A reference block must refer to a
block of data extracted from a reference frame buffer for use in motion-compensated prediction (MCP).
Note The reference block is not defined for DirectX VA because it may have properties that reflect
implementation-specific means of maintaining picture buffers.
3. Filter the reference block to form a prediction block
The reference block may be filtered in a third stage to produce a prediction block.
4. Combine prediction blocks to form macroblock prediction
One or more prediction blocks are combined to form the final prediction of the macroblock samples. Blocks
are combined by averaging the pixel values between corresponding blocks in one or more prediction planes
and rounding each result to the nearest integer, with values of exactly 0.5 rounded up (see the averaging
sketch at the end of this topic). P picture blocks are combined with the temporally closest previous I or P
picture blocks. B picture blocks are combined with the closest previous and future I or P picture blocks.
The following figure shows the additional steps in the video decoding process that occur when creating a
macroblock prediction. (The blocks with solid lines depict the motion compensation process, while the blocks with
dotted lines depict other aspects of video decoding.)
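As a minimal illustration of the block averaging described in step 4 (a sketch only, not part of the DDI), two prediction planes can be combined one sample at a time as follows:

void CombinePredictionPlanes(const BYTE *Fwd, const BYTE *Bwd, BYTE *Out, size_t Count)
{
    for (size_t i = 0; i < Count; i++) {
        // The +1 makes values of exactly 0.5 round up before the divide by 2.
        Out[i] = (BYTE)((Fwd[i] + Bwd[i] + 1) >> 1);
    }
}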


Macroblock Partitioning

Macroblocks can be broken into segments in order to compartmentalize areas with different characteristics. The
following diagram shows how a macroblock can be partitioned.

In the MPEG-2 case, the top and bottom portion of a field-structured macroblock in a frame picture represent lines
from two different fields captured at different instances in time, as much as one-fiftieth of a second apart. Thus, the
top and bottom portion could have totally noncorrelated content if significant movement has taken place between
the two fields for the frame area covered by the macroblock. As illustrated in the following figure, an additional
16x8 scheme is added in field-structured pictures to provide a finer vertical granularity of prediction, which better
accommodates edges and smaller objects with different motion characteristics.

The prediction block itself is an approximation of shape and represents a compromise selection of motion vector
for all samples that belong to the portion of the macroblock that the prediction block represents. Ideally, each
sample would have its own motion vector, but this would consume a considerable number of bits and require extra
overhead in processing.
Prediction blocks contribute to only one partition of a macroblock. A whole prediction block covers the 16x16 area
of a macroblock. (This is the case with all H.261 and MPEG-1 predictions.) MPEG-2 introduced the 16x8 prediction
shape to address the dual field/frame nature of macroblocks. The 16x8 shape was also borrowed for use in MPEG-
2 field-structured pictures to create a finer granularity of prediction. The 8x8 shape is deployed in H.263 (Advanced
Prediction) and MPEG-4. Each chrominance prediction block generally uses a motion vector derived from the
motion vector of the corresponding luminance prediction block, because the standards model motion as being the
same for all color components.
Chrominance prediction blocks usually have half the size in both horizontal and vertical directions of their
corresponding luminance prediction blocks. Chrominance vectors are therefore generally derived by scaling down
the luminance vectors to account for the difference in the respective luminance and chrominance sample
dimensions. MPEG-2's 16x8 luminance prediction blocks have corresponding 8x4 chrominance shapes.
Exceptions to the method of scaling chrominance block dimensions are often made when the luminance prediction
block becomes too small. For example, in the H.263 Advanced Prediction mode, the chrominance prediction block
remains 8x8 in shape, and the chrominance motion vector is derived from a scaled average of the four 8x8
luminance motion vectors.

Prediction Planes

The following figure illustrates the conceptual macroblock prediction planes that exist prior to forming the final
prediction.

MPEG-2 has two planes: forward and backward (bidirectional prediction), or same-parity and opposite-parity (dual-
prime). The forward reference plane consists of blocks from the closest previous I or P picture. The backward
reference plane consists of blocks from the closest future I or P picture.
In the cases of MPEG-1 and MPEG-2, prediction planes are combined by averaging between the corresponding
block pixel values of the two prediction planes and rounding each up to the nearest integer. More sophisticated
prediction schemes, such as H.263's overlapped block motion compensated (OBMC) prediction, have three planes.

Deinterlacing and Frame-Rate Conversion

A DDI between DirectDraw and the graphics device driver extends DirectX VA to support deinterlacing and
frame-rate conversion of video content by using the kernel-mode portion of the DirectDraw DDI and the Direct3D
DDI. The deinterlace and frame-rate conversion interface is independent of all video presentation mechanisms.
The output of the deinterlacing or frame-rate conversion process is always a progressive frame.
To use this interface, the following requirements must be met:
The deinterlaced output must physically exist in the target DirectDraw surface. This requirement precludes
all hardware overlay solutions.
The graphics engine and the hardware overlay, if present, must support a minimum of bob and weave
deinterlacing functionality.
This DDI applies to Microsoft Windows XP SP1 and later versions.
This section covers the following topics:
Deinterlace Modes
Frame-Rate Conversion Modes
Bob Deinterlacing
Mapping the Deinterlace DDI to DirectDraw and DirectX VA
Video Content for Deinterlace and Frame-Rate Conversion
Deinterlacing on 64-bit Operating Systems
Combining Deinterlacing and Video Substream Compositing
Sample Functions for Deinterlacing

Deinterlace Modes

Following are examples of the deinterlace modes that can be supported by the DDI.

Bob (line doubling): This mode uses a bit-block transfer (blt). This mode should always be available.

Simple Switching Adaptive: Either a blend of two adjacent fields if low motion is detected for that field, or bobbing
if high motion is detected.

Motion Vector Steered: Motion vectors of the different objects in the surface are used to align individual
movements to the time axis before interpolation takes place.

Advanced 3D Adaptive: The missing lines are generated through some adaptive process that is proprietary to the
hardware. The process may use several reference samples to aid generation of the missing lines. The reference
samples may be in the past or future. Three-dimensional linear filtering falls into this category.


Frame-Rate Conversion Modes

Following are examples of the frame-rate conversion modes that can be supported by the DDI.

Frame Repeat/Drop: This is not a recommended mode, because it uses extra memory by copying the selected source
sample into the destination surface.

Linear Temporal Interpolation: A future and a previous reference field are alpha blended together to produce a new
frame.

Motion Vector Steered: Motion vectors of the different objects in a scene are used to align individual movements to
the time axis before interpolation takes place.


Bob Deinterlacing

A display driver that implements deinterlacing must support bob-style deinterlacing. The following topics describe
the mechanics and the algorithm for bob-style deinterlacing:
Bob Deinterlacing Mechanics
Bob Deinterlacing Algorithm

Bob Deinterlacing Mechanics

All graphics adapters that can perform bit-block transfers can do simple bob-style deinterlacing. When a surface
contains two interleaved fields, the memory layout of the surface can be reinterpreted to isolate each field. This is
achieved by doubling the original surface's stride and dividing the height of the surface in half. After the two fields
are isolated in this way, they can be deinterlaced by stretching the individual fields to the correct frame height.
Additional horizontal stretching or shrinking can also be applied to correct the aspect ratio for the pixels of the
video image. A display driver can report its ability to do this to the DirectX Video Mixing Renderer (VMR). The
individual field's height can be stretched vertically by line replication or, preferably, by a filtered stretch. If the line
replication method is used, the resulting image has a blocky appearance. If a filtered stretch is used, the resulting
image may have a slightly fuzzy appearance.
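A minimal sketch of this stride-doubling reinterpretation follows; the SURFACE_VIEW type is a hypothetical stand-in for whatever surface description the driver actually uses:

typedef struct {
    BYTE  *pBits;   // first byte of the surface
    LONG   Pitch;   // bytes per line
    DWORD  Height;  // number of lines
} SURFACE_VIEW;

// Parity 0 selects the top field, 1 the bottom field.
void GetFieldView(const SURFACE_VIEW *Frame, int Parity, SURFACE_VIEW *Field)
{
    Field->pBits  = Frame->pBits + (Parity ? Frame->Pitch : 0);  // start on the chosen field's first line
    Field->Pitch  = Frame->Pitch * 2;                            // skip over the other field's lines
    Field->Height = Frame->Height / 2;                           // each field holds half the frame lines
}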
The following figure shows a video surface that contains two interleaved fields.

If the video sample contains two interleaved fields as specified by the DXVA_SampleFieldInterleavedEvenFirst
and DXVA_SampleFieldInterleavedOddFirst members of the DXVA_SampleFormat enumeration, the start
time of the second field is calculated using the rtStart and rtEnd members of the DXVA_VideoSample structure
as follows:
(rtStart + rtEnd) / 2
The end time of the first field is the start time of the second field.

Bob Deinterlacing Algorithm

If your display driver implements the DXVA deinterlacing DDI, it must support the bob-style deinterlacing
algorithm in addition to any proprietary deinterlacing algorithms. Following is a description of the bob-style
deinterlacing algorithm:
Input is a field Fin(i,j) of size MxN such that 0 <= i <= M−1 and 0 <= j <= N−1, where i and j are row and column
indices, respectively.
Output is a frame Fout(i,j) of size 2MxN such that 0 <= i <= 2M−1 and 0 <= j <= N−1, where i and j are row and
column indices, respectively.
If Fin(i,j) is a top field, the even-numbered output lines are copied directly from the field:
Fout(2i, j) = Fin(i, j)
and each odd-numbered output line is interpolated from the vertically adjacent field lines with the FIR filter
described below.
If Fin(i,j) is a bottom field, the odd-numbered output lines are copied directly from the field:
Fout(2i + 1, j) = Fin(i, j)
and each even-numbered output line is interpolated with the same filter.
Each definition uses a finite impulse response (FIR) filter with an impulse response h of length 2K. Impulse
response h is symmetric about its midpoint, such that h₋₍ₖ₊₁₎ = hₖ for k = 0 to K−1, and the 2K filter taps sum to 1.
The preferred form of bob-style deinterlacing uses K=2 and h₀ = 9/16 (so h₁ = −1/16). This filter should be
implemented as (9*(b+c)−(a+d)+8)>>4, where a, b, c, and d are the four input samples used to produce one
output sample.
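A minimal sketch of this preferred filter for one interpolated output sample follows (8-bit samples are assumed, and clamping to the valid range has been added):

static BYTE BobInterpolateSample(int a, int b, int c, int d)
{
    // a, b, c, d are four vertically adjacent field samples; b and c are nearest the missing line.
    int v = (9 * (b + c) - (a + d) + 8) >> 4;   // taps -1/16, 9/16, 9/16, -1/16 with rounding
    if (v < 0)   v = 0;
    if (v > 255) v = 255;
    return (BYTE)v;
}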

Mapping the Deinterlace DDI to DirectDraw and DirectX VA

Deinterlacing functionality must be accessed through DirectDraw's motion compensation callback functions to
which the deinterlace DDI can be mapped.
The deinterlace DDI is divided into two functional groups: the DirectX VA container methods and the DirectX VA
device methods. The container methods determine the capabilities of each DirectX VA device that is contained by
the display hardware. The device methods direct the device to perform operations specific to the device. A DirectX
VA driver can have only one container, but it can support multiple devices.
It is possible to map the deinterlace DDI to the motion compensation callbacks because they do not use typed
parameters (that is, their single parameter is a pointer to a structure). In other words, the information in the single
parameter that is passed to a motion compensation callback function can be processed according to its information
type. For example, if DXVA_DeinterlaceBltFnCode-type information is passed to the DdMoCompRender function,
DdMoCompRender can call the DeinterlaceBlt function of the deinterlace DDI to perform a bit-block deinterlace
of video stream objects. However, if DXVA_DeinterlaceQueryModeCapsFnCode-type information is passed to
DdMoCompRender instead, DdMoCompRender can call the DeinterlaceQueryModeCaps function of the
deinterlace DDI to query for the capabilities of a deinterlacing mode.
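For example, a render callback can branch on the dwFunction value roughly as follows. This is a simplified sketch; DeinterlaceBlt and DeinterlaceQueryModeCaps here stand in for the driver's implementations of the sample functions described later in this section:

DWORD APIENTRY DdMoCompRender(PDD_RENDERMOCOMPDATA lpData)
{
    switch (lpData->dwFunction) {
    case DXVA_DeinterlaceBltFnCode:
        // lpInputData points to a DXVA_DeinterlaceBlt structure.
        lpData->ddRVal = DeinterlaceBlt(lpData);
        break;
    case DXVA_DeinterlaceQueryModeCapsFnCode:
        // lpInputData points to a DXVA_DeinterlaceQueryModeCaps structure;
        // lpOutputData receives a DXVA_DeinterlaceCaps structure.
        lpData->ddRVal = DeinterlaceQueryModeCaps(lpData);
        break;
    default:
        lpData->ddRVal = DDERR_UNSUPPORTED;
        break;
    }
    return DDHAL_DRIVER_HANDLED;
}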
The following topics describe how the deinterlace DDI is mapped to the motion compensation callbacks:
Deinterlace Container Device for Deinterlacing
Calling the Deinterlace DDI from a User-Mode Component

Deinterlace Container Device for Deinterlacing

The sample functions for deinterlacing can only be used in the context of a DirectX VA device, so it is necessary to
first define and create a deinterlace container device.
If a driver supports accelerated decoding of compressed video, when initiated by the VMR, the driver also creates
two more DirectX VA devices: one to perform the video decoding work and one to perform deinterlacing
operations.
Note The deinterlace container device is a software construct only and does not represent any functional hardware
contained on a device.

Calling the Deinterlace DDI from a User-Mode Component

A user-mode component, such as the VMR, initiates calls to the deinterlacing DDI.
So that the VMR can deinterlace and perform frame-rate conversion on video content, the display driver must
implement the motion compensation callback functions, which are defined by members of the
DD_MOTIONCOMPCALLBACKS structure.
To simplify driver development, driver writers can use a motion-compensation code template and implement the
deinterlacing sample functions. The motion-compensation template calls the deinterlacing sample functions to
perform deinterlacing and frame-rate conversion on video content. For more information about using a motion-
compensation template, see Example Code for DirectX VA Devices.
The following steps explain how the VMR initiates calls to the deinterlace DDI:
1. When the VMR is added to a filter graph, it initiates a call to the driver-supplied DdMoCompGetGuids
callback function to retrieve the list of devices supported by the driver. The GetMoCompGuids member of
the DD_MOTIONCOMPCALLBACKS structure points to this callback function. For more information about a
filter graph, see KS Minidriver Architecture.
2. If the deinterlace container device GUID is present, the VMR initiates a call to the DdMoCompCreate callback
function to create an instance of the device. The CreateMoComp member of
DD_MOTIONCOMPCALLBACKS points to the callback function. In the DdMoCompCreate call, a pointer to
the container device GUID is specified in the lpGuid member of the DD_CREATEMOCOMPDATA structure.
The container device GUID is defined as follows:

DEFINE_GUID(DXVA_DeinterlaceContainerDevice,
0x0e85cb93,0x3046,0x4ff0,0xae,0xcc,0xd5,0x8c,0xb5,0xf0,0x35,0xfd);

3. To determine the available deinterlacing or frame-rate conversion modes for a particular input video format,
the VMR initiates a call to the driver-supplied DdMoCompRender callback function. The RenderMoComp
member of DD_MOTIONCOMPCALLBACKS points to the callback function. In the DdMoCompRender
call, the DXVA_DeinterlaceQueryAvailableModesFnCode constant (defined in dxva.h) is set in the dwFunction
member of the DD_RENDERMOCOMPDATA structure. The lpInputData member of
DD_RENDERMOCOMPDATA passes the input parameters to the driver by pointing to a completed
DXVA_VideoDesc structure. The driver returns its output through the lpOutputData member of
DD_RENDERMOCOMPDATA; lpOutputData points to a DXVA_DeinterlaceQueryAvailableModes
structure.
If the driver implements a DeinterlaceQueryAvailableModes sample function, the DdMoCompRender
callback function calls DeinterlaceQueryAvailableModes.
4. For each deinterlace mode supported by the driver, the VMR initiates a call to the driver-supplied
DdMoCompRender callback function. In the DdMoCompRender call, the
DXVA_DeinterlaceQueryModeCapsFnCode constant (defined in dxva.h) is set in the dwFunction
member of DD_RENDERMOCOMPDATA. The lpInputData member of DD_RENDERMOCOMPDATA passes
the input parameters to the driver by pointing to a completed DXVA_DeinterlaceQueryModeCaps
structure. The driver returns its output through the lpOutputData member of DD_RENDERMOCOMPDATA;
lpOutputData points to a DXVA_DeinterlaceCaps structure.
If the driver implements a DeinterlaceQueryModeCaps sample function, the DdMoCompRender
callback function calls DeinterlaceQueryModeCaps.
5. After the VMR has determined the deinterlacing capabilities of a particular deinterlace mode (for example,
bob deinterlacing), it initiates a call to DdMoCompCreate to create an instance of the deinterlace mode
device (for example, the deinterlace bob device). In the DdMoCompCreate call, a pointer to the deinterlace
mode device GUID is specified in the lpGuid member of DD_CREATEMOCOMPDATA. The deinterlace bob
device GUID is defined as follows:

DEFINE_GUID(DXVA_DeinterlaceBobDevice,
0x335aa36e,0x7884,0x43a4,0x9c,0x91,0x7f,0x87,0xfa,0xf3,0xe3,0x7e);

If the driver implements a DeinterlaceOpenStream sample function, the DdMoCompCreate callback
function calls DeinterlaceOpenStream.
6. For each deinterlacing operation, the VMR initiates a call to the driver-supplied DdMoCompRender callback
function. In the DdMoCompRender call, the DXVA_DeinterlaceBltFnCode constant (defined
in dxva.h) is set in the dwFunction member of DD_RENDERMOCOMPDATA. The lpBufferInfo member of
DD_RENDERMOCOMPDATA points to an array of buffers that describes the destination surface and each
input video source sample. The lpInputData member of DD_RENDERMOCOMPDATA passes the input
parameters to the driver by pointing to a completed DXVA_DeinterlaceBlt structure. The driver does not
return any output; that is, the lpOutputData member of DD_RENDERMOCOMPDATA is NULL.
If the driver implements a DeinterlaceBlt sample function, the DdMoCompRender callback function calls
DeinterlaceBlt.
7. For each combined deinterlacing and substream compositing operation, the VMR on Microsoft Windows
Server 2003 SP1 and later and Windows XP SP2 and later initiates a call to the driver-supplied
DdMoCompRender callback function. In the DdMoCompRender call, the DXVA_DeinterlaceBltExFnCode
constant (defined in dxva.h) is set in the dwFunction member of DD_RENDERMOCOMPDATA. The
lpBufferInfo member of DD_RENDERMOCOMPDATA points to an array of buffers that describes the
destination surface and the surface for each input video source sample. The lpInputData member of
DD_RENDERMOCOMPDATA passes the input parameters to the driver by pointing to a completed
DXVA_DeinterlaceBltEx structure. The driver does not return any output; that is, the lpOutputData
member of DD_RENDERMOCOMPDATA is NULL.
If the driver implements a DeinterlaceBltEx sample function, the DdMoCompRender callback function
calls DeinterlaceBltEx.
8. When the VMR no longer needs to perform any more deinterlace operations, the driver-supplied
DdMoCompDestroy callback function is called. The DestroyMoComp member of
DD_MOTIONCOMPCALLBACKS points to the callback function.
If the driver implements a DeinterlaceCloseStream sample function, the DdMoCompDestroy callback
function calls DeinterlaceCloseStream.
9. The driver then releases any resources used by the deinterlace mode device.

Video Content for Deinterlace and Frame-Rate Conversion

The driver receives a description of video content so that it can determine how it should deinterlace or frame-rate
convert such content. The driver receives this video content as a pointer to a DXVA_VideoDesc structure in the
following function calls:
DeinterlaceQueryAvailableModes
DeinterlaceQueryModeCaps
DeinterlaceOpenStream
The following examples indicate how the driver performs deinterlacing and frame-rate conversion on the received
video content.
Deinterlacing 720 x 480i Content Example
The DXVA_VideoDesc structure is filled as follows to direct the driver to deinterlace 720 x 480i content that is
sourced as two fields per sample at a frequency of 29.97 Hz.

MEMBER                           VALUE
SampleWidth                      720
SampleHeight                     480
SampleFormat                     DXVA_SampleFieldInterleavedOddFirst enumerator in DXVA_SampleFormat
d3dFormat                        D3DFMT_YUY2, defined in the d3d8types.h and d3d9types.h header files
InputSampleFreq.Numerator        30000 (29.97 Hz)
InputSampleFreq.Denominator      1001
OutputFrameFreq.Numerator        60000 (59.94-Hz monitor frequency)
OutputFrameFreq.Denominator      1001

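A minimal sketch of populating the structure for this first example follows; only the members shown in the table are set, and any remaining members are left to the caller:

DXVA_VideoDesc VideoDesc = {0};

VideoDesc.SampleWidth                 = 720;
VideoDesc.SampleHeight                = 480;
VideoDesc.SampleFormat                = DXVA_SampleFieldInterleavedOddFirst;
VideoDesc.d3dFormat                   = D3DFMT_YUY2;
VideoDesc.InputSampleFreq.Numerator   = 30000;   // 29.97 Hz
VideoDesc.InputSampleFreq.Denominator = 1001;
VideoDesc.OutputFrameFreq.Numerator   = 60000;   // 59.94 Hz
VideoDesc.OutputFrameFreq.Denominator = 1001;
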
Deinterlacing and Frame-Rate Conversion of 720 x 480i Content Example

The OutputFrameFreq member of the DXVA_VideoDesc structure is filled as follows to direct the driver to
deinterlace and frame-rate convert 720 x 480i content.

MEMBER                           VALUE
OutputFrameFreq.Numerator        85 (85-Hz monitor frequency)
OutputFrameFreq.Denominator      1

Deinterlacing a Single Field to a Progressive Frame Example

The OutputFrameFreq member of the DXVA_VideoDesc structure is filled as follows to direct the driver to
deinterlace a single field to a progressive frame for later MPEG encoding.

MEMBER                           VALUE
OutputFrameFreq.Numerator        30000 (29.97 Hz)
OutputFrameFreq.Denominator      1001

Frame-Rate Conversion of 480p Content Example

The DXVA_VideoDesc structure is filled as follows to direct the driver to perform frame-rate conversion on 480p
content and to match the monitor display frequency.

MEMBER                           VALUE
SampleWidth                      720
SampleHeight                     480
SampleFormat                     DXVA_SampleProgressiveFrame enumerator in the DXVA_SampleFormat enumeration
d3dFormat                        D3DFMT_YUY2, defined in the d3d8types.h and d3d9types.h header files
InputSampleFreq.Numerator        60 (60 Hz)
InputSampleFreq.Denominator      1
OutputFrameFreq.Numerator        85 (85-Hz monitor frequency)
OutputFrameFreq.Denominator      1


Deinterlacing on 64-bit Operating Systems

To ensure that deinterlacing operations initiated by a 32-bit application run successfully on a 64-bit operating
system, the display driver code must first detect whether the application is 32 bit or 64 bit. To perform the
detection, the driver should check the Size member of the DXVA_DeinterlaceBlt structure that the application
passes. The size of the 32-bit version of DXVA_DeinterlaceBlt is smaller than the size of the 64-bit version because
of the pointer size difference between 32 bit and 64 bit. If the driver determines that the initiating application is 32
bit, the driver should handle the deinterlacing operations by thunking. For more information about thunking, see
Supporting 32-Bit I/O in Your 64-Bit Driver.
The following example code demonstrates how the driver should handle the thunk:

switch (lpData->dwFunction) {
case DXVA_DeinterlaceBltFnCode:
    {
        DXVA_DeinterlaceBlt* pBlt = (DXVA_DeinterlaceBlt*)lpData->lpInputData;
        if (pBlt->Size == sizeof(DXVA_DeinterlaceBlt)) {
            // Correctly formed structure for this build (64-bit or 32-bit), so use it.
        }
#ifdef _WIN64
        else if (pBlt->Size < sizeof(DXVA_DeinterlaceBlt)) {
            // Smaller structure passed by a 32-bit application, so thunk it.
        }
#endif
        else {
            // Unknown structure, so return an error.
        }
        break;
    }
}

Send comments about this topic to Microsoft


Combining Deinterlacing and Video Substream
Compositing
4/26/2017 • 1 min to read • Edit Online

This section applies only to Microsoft Windows Server 2003 with Service Pack 1 (SP1) and later, and Windows XP
with Service Pack 2 (SP2) and later.
To improve video quality on hardware with limited memory bandwidth, driver writers can implement the
DeinterlaceBltEx function in their display drivers. The DeinterlaceBltEx function combines, within the YUV color
space, operations that composite the video substreams on top of the video stream with operations that deinterlace
and/or frame-rate convert each video frame. Driver writers are encouraged to support the DeinterlaceBltEx
function in their drivers for all of their deinterlacing modes.
The following topics describe how to support DeinterlaceBltEx:
Overview of DeinterlaceBltEx
Reporting Support for DeinterlaceBltEx
Supplying Video Substream and Destination Surfaces
Supporting Operations on Video Substream and Destination Surfaces
Displaying Samples and Background Color in the Target Rectangle
Processing Subrectangles
Input Buffer Order
Deinterlacing and Compositing on 64-bit Operating Systems
Send comments about this topic to Microsoft
Overview of DeinterlaceBltEx
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR on Windows Server 2003 SP1 and later and Windows XP SP2 and later can initiate calls to the display
driver's DeinterlaceBltEx function to combine deinterlacing and substream compositing operations.
The VMR on Windows Server 2003 and Windows XP SP1 uses DXVA to deinterlace or frame-rate convert video
and to output the video onto an RGB32 surface. The VMR then uses Direct3D to combine video substreams with
the video image. In other words, the video is first deinterlaced, resized, and then color-space converted to a
Direct3D RGB32 render target using the display driver's DeinterlaceBlt function. Then, video substreams are
composited over the top of the resulting video image using calls to the display driver's D3dDrawPrimitives2
function.
By using DeinterlaceBltEx rather than DeinterlaceBlt and D3dDrawPrimitives2 combined, operations can be
performed more efficiently on the available hardware.
The DeinterlaceBltEx function also can be called with progressive video and multiple video substreams. This
scenario can occur when the VMR is used for DVD playback that contains a mixture of progressive and interlaced
video. In this case, the driver should not attempt to deinterlace the video stream because the stream is already
progressive. The driver should combine the video stream with any given substreams and resize each stream as
required.
If you implement the DeinterlaceBltEx function in your driver, you must also implement the original
DeinterlaceBlt function. The VMR on Windows Server 2003 SP1 and later and Windows XP SP2 and later can
initiate calls to either the driver's DeinterlaceBltEx or DeinterlaceBlt function; the application controls which
function the VMR uses.
Send comments about this topic to Microsoft
Reporting Support for DeinterlaceBltEx
4/26/2017 • 2 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The display driver reports support for the DeinterlaceBltEx deinterlace DDI function by setting the
DXVA_VideoProcess_SubStreams, DXVA_VideoProcess_StretchX, and DXVA_VideoProcess_StretchY flags in the
VideoProcessingCaps member of the DXVA_DeinterlaceCaps structure. The driver returns a pointer to
DXVA_DeinterlaceCaps when its DeinterlaceQueryModeCaps function is called.
The display driver sets DXVA_VideoProcess_SubStreams to combine video substream compositing with
deinterlacing. The driver sets DXVA_VideoProcess_StretchX and DXVA_VideoProcess_StretchY because the pixel
aspect ratio of the video stream and substreams can be different and nonsquare, and the driver must be able to
independently stretch (horizontally and/or vertically) the video frame that is submitted for deinterlacing as well as
the supplied video substreams.
The DXVA_VideoProcess_YUV2RGB and DXVA_VideoProcess_AlphaBlend flags in the VideoProcessingCaps
member of the DXVA_DeinterlaceCaps structure have no meaning in the context of the driver's
DeinterlaceBltEx function. These flags relate to the original DeinterlaceBlt function. Because a display driver that
supports DeinterlaceBltEx must also support DeinterlaceBlt, the driver must still report these flags if it supports
their associated operations in the context of DeinterlaceBlt.
DXVA_VideoProcess_SubStreamsExtended, DXVA_VideoProcess_YUV2RGBExtended, and
DXVA_VideoProcess_AlphaBlendExtended flags
A display driver that implements the DeinterlaceBltEx function can support significantly enhanced color
information for each source and destination surface. The driver can report such support by setting the
DXVA_VideoProcess_SubStreamsExtended, DXVA_VideoProcess_YUV2RGBExtended, and
DXVA_VideoProcess_AlphaBlendExtended flags in the VideoProcessingCaps member of the
DXVA_DeinterlaceCaps structure.
Support for the DXVA_VideoProcess_SubStreamsExtended flag indicates that the display driver can perform the
color adjustments indicated in the extended color data on the source video stream and substreams as the video is
deinterlaced, composited with the substreams, and written to the destination surface. Extended color data is
specified by members of the DXVA_ExtendedFormat structure in the SampleFormat members of the
DXVA_VideoSample2 structures for the source sample array (the lpDDSrcSurfaces parameter in the
DeinterlaceBltEx call or the Source member of the DXVA_DeinterlaceBltEx structure).
Support for the DXVA_VideoProcess_YUV2RGBExtended flag indicates that the display driver can perform a color-
space-conversion operation as the deinterlaced and composited pixels are written to the destination surface. If an
RGB destination surface is passed to the display driver, the VMR ensures that each color channel contains a
minimum of 8 bits. An RGB destination surface could be an offscreen, texture, or Direct3D render target, or a
combined texture and Direct3D render target surface type. The VMR still specifies the background color parameter
in the YUV color space even though an RGB destination surface is used.
Support for the DXVA_VideoProcess_AlphaBlendExtended flag indicates that the display driver can perform an
alpha-blend operation with the destination surface when the deinterlaced and composited pixels are written to the
destination surface. The driver must handle background color based on the alpha value of the fAlpha parameter in
the DeinterlaceBltEx call. When the alpha value is 1.0f, the background color is drawn opaque (without
transparency). When the alpha value is 0.0f, the background should not be drawn (transparent).
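For illustration, a driver that supports DeinterlaceBltEx with extended color data might set its capability flags in DeinterlaceQueryModeCaps as in the following sketch. The flag and member names are from dxva.h, but the output format and the remaining capability values are placeholders.

// Inside the driver's DeinterlaceQueryModeCaps implementation.
DXVA_DeinterlaceCaps Caps;
ZeroMemory(&Caps, sizeof(Caps));
Caps.Size = sizeof(DXVA_DeinterlaceCaps);
Caps.d3dOutputFormat = D3DFMT_YUY2;                // example output format
Caps.VideoProcessingCaps = (DXVA_VideoProcessCaps)(
    DXVA_VideoProcess_YUV2RGB |                    // DeinterlaceBlt only
    DXVA_VideoProcess_AlphaBlend |                 // DeinterlaceBlt only
    DXVA_VideoProcess_StretchX |
    DXVA_VideoProcess_StretchY |
    DXVA_VideoProcess_SubStreams |                 // enables DeinterlaceBltEx compositing
    DXVA_VideoProcess_SubStreamsExtended |         // extended color data for streams and substreams
    DXVA_VideoProcess_YUV2RGBExtended |            // extended color-space conversion
    DXVA_VideoProcess_AlphaBlendExtended);         // extended alpha blend with the destination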
Send comments about this topic to Microsoft
Supplying Video Substream and Destination Surfaces
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR on Windows Server 2003 SP1 and later and Windows XP SP2 and later only supplies video substreams
with substream-surface formats that DXVA supports. That is, the VMR only supplies the following FOURCC codes
for alpha-blending substream-surface formats: AI44, IA44, or AYUV. For more information, see Loading an AYUV
Alpha-Blending Surface. Note that when multiple video substreams are supplied, each substream might be
formatted differently. Because the supplied video substream formats are palettized surface formats, a
complete 16-color palette for each surface is supplied in the Palette member of each DXVA_VideoSample2
structure in the array that is passed in the lpDDSrcSurfaces parameter when DeinterlaceBltEx is called. Therefore, the
driver is not required to maintain palette information for each video substream surface.
The VMR also only supplies destination surfaces whose formats are specified by the driver in the
d3dOutputFormat member of the DXVA_DeinterlaceCaps structure. The driver returns a pointer to
DXVA_DeinterlaceCaps when its DeinterlaceQueryModeCaps function is called.
Send comments about this topic to Microsoft
Supporting Operations on Video Substream and
Destination Surfaces
4/26/2017 • 3 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR on Microsoft Windows Server 2003 SP1 and later and Windows XP SP2 and later requires that certain
operations can be performed on video substream and destination surfaces.
Operations on Video Substream Surfaces
In addition to the operations on video substream surfaces that your driver's DeinterlaceBltEx and DeinterlaceBlt
functions perform, your driver must support the following operations:
Color Filling Substream Surfaces
The VMR and other Microsoft DirectShow components must be able to fill video substream surfaces to a known
initial color value, such as transparent black. Therefore, your driver should support calls to its DdBlt callback
function using the DDBLT_COLORFILL flag where the video substream surface is the target for the bit-block
transfer (blt).
For video substream surfaces with the AYUV FOURCC format, the VMR specifies the AYUV color for transparent
black in the dwFillColor member of the DDBLTFX structure. The driver receives DDBLTFX in the bltFX member of
the DD_BLTDATA structure when its DdBlt function is called. For information about the DDBLTFX structure, see the
Windows SDK documentation.
The AYUV color for transparent black is set as follows:

DXVA_AYUVsample2 clr;
clr.bCrValue = 0x80;
clr.bCbValue = 0x80;
clr.bY_Value = 0x10;
clr.bSampleAlpha8 = 0x00;
DWORD dwFillColor = *(DWORD*)&clr;

For video substream surfaces with the AI44 or IA44 format, the low-order byte of the value in the dwFillColor
member indicates the color value that the driver should use to fill the surface. Typically, the color value is 0.
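For illustration, the color-fill path in a driver's DdBlt callback might look like the following sketch. The DD_BLTDATA member names follow the DdBlt DDI; the FillSubStreamSurface helper is hypothetical and stands in for the hardware-specific fill.

DWORD APIENTRY DdBlt(PDD_BLTDATA lpBlt)
{
    if (lpBlt->dwFlags & DDBLT_COLORFILL) {
        // For AYUV substream surfaces this is the transparent-black value shown
        // above; for AI44/IA44 surfaces the low-order byte selects the palette index.
        DWORD dwFillColor = lpBlt->bltFX.dwFillColor;

        // Hypothetical helper that performs the hardware fill of the target rectangle.
        FillSubStreamSurface(lpBlt->lpDDDestSurface, &lpBlt->rDest, dwFillColor);

        lpBlt->ddRVal = DD_OK;
        return DDHAL_DRIVER_HANDLED;
    }

    // ... handle other bit-block transfer requests ...
    return DDHAL_DRIVER_NOTHANDLED;
}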
Copying Contents to Substream Surfaces
The Line21 closed caption decoder and the teletext decoder create a source video substream surface that contains a
series of cached-character glyphs. These decoders generate each frame of output by copying the appropriate
characters from the glyph cache to the video substream surface. The VMR then sends the video substream surface
to your driver's DeinterlaceBltEx function.
Therefore, your driver's DdBlt function should support copying any FOURCC surface to a video substream surface
of the same FOURCC format.
Your driver should indicate that it supports copying FOURCC formats by setting the DDCAPS2_COPYFOURCC flag
in the dwCaps2 member of the DDCORECAPS structure. The driver specifies the DDCORECAPS structure in the
ddCaps member of a DD_HALINFO structure. DD_HALINFO is returned by the driver's DrvGetDirectDrawInfo
function.
In a FOURCC video substream surface copy operation, the driver should not perform stretching or color-space
conversion operations.
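For illustration, the driver might advertise this capability from its DrvGetDirectDrawInfo function as in the following sketch; the rest of the DD_HALINFO initialization is omitted.

BOOL APIENTRY DrvGetDirectDrawInfo(
    DHPDEV       dhpdev,
    DD_HALINFO  *pHalInfo,
    DWORD       *pdwNumHeaps,
    VIDEOMEMORY *pvmList,
    DWORD       *pdwNumFourCCCodes,
    DWORD       *pdwFourCC)
{
    // ... fill in the remaining DD_HALINFO members ...

    // Advertise that DdBlt can copy between surfaces of the same FOURCC format.
    pHalInfo->ddCaps.dwCaps2 |= DDCAPS2_COPYFOURCC;

    return TRUE;
}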
Operations on Destination Surfaces
Your driver must support the following operations on the destination surface that is used in your driver's
DeinterlaceBltEx function:
Color Filling the Destination Surface
Because the VMR must initialize the destination surface to YUV opaque black, your driver must also support calls to
its DdBlt callback function using the DDBLT_COLORFILL flag where the target for the bit-block transfer is the
destination surface. The VMR specifies the color for opaque black in the dwFillColor member of the DDBLTFX
structure. The driver receives the DDBLTFX structure in the bltFX member of the DD_BLTDATA structure when its
DdBlt is called.
For YUV packed surface types, the VMR sets the fill color DWORD to the appropriate byte pattern for opaque black.
For a YUY2 surface, the fill color DWORD for opaque black is 0x80108010.
For planar surface types, the VMR sets the AYUV color for opaque black as follows:

DXVA_AYUVsample2 clr;
clr.bCrValue = 0x80;
clr.bCbValue = 0x80;
clr.bY_Value = 0x10;
clr.bSampleAlpha8 = 0xFF;
DWORD dwFillColor = *(DWORD*)&clr;

Your driver should ensure that the correct pixel values are written to each plane of the YUV surface.
Stretching the Destination Surface
Your driver must also support the destination surface being used as a source surface for a bit-block transfer that
combines a stretching operation with a color-space conversion. For more information, see Supporting Stretch Blit
Operations.
Copying Contents from the Destination Surface
Your driver's DdBlt function must support copying the FOURCC destination surface to a surface of the same
FOURCC format. The destination surface is used as the source surface in the copy operation. Your driver should
indicate that it supports copying FOURCC formats by setting the DDCAPS2_COPYFOURCC flag.
The destination surface for the bit-block transfer operations can be the primary surface or a Direct3D texture.
Send comments about this topic to Microsoft
Displaying Samples and Background Color in the
Target Rectangle
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and later.
The VMR on Windows Server 2003 SP1 and later and Windows XP SP2 and later can specify target rectangle and
background color to determine how the video stream and substreams are displayed.
The VMR specifies a target rectangle to identify the location within the destination surface to which your driver
should direct output. The coordinates of the source rectangles are always specified as absolute locations within the
source surface; likewise, the coordinates of the destination rectangles and target rectangle are always specified as
absolute locations within the destination surface. Typically, the source and destination rectangles for the video
stream and substreams are the same size as the source and destination surfaces; however, this is not always the
case. For more information, see Processing Subrectangles.
The following topics show how to display various samples with background color in the target rectangle:
Displaying 16:9 Video within a 4:3 Destination Surface
Combining Video Stream and Substream with Different Aspect Ratios
Combining Two Streams with Different Heights and Widths
Send comments about this topic to Microsoft
Displaying 16:9 Video within a 4:3 Destination Surface
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and later.
In the following example, the VMR directs the driver to display a 16:9 video stream within a 4:3 destination surface.

Note that for clarity the preceding example does not contain any video substreams. In the preceding example, the
rectangles are as follows:
Target rectangle = {0, 0, 640, 480}
Video stream:
Source rectangle = {0, 0, 720, 480}
Destination rectangle = {0, 60, 640, 420}
Send comments about this topic to Microsoft
Combining Video Stream and Substream with
Different Aspect Ratios
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and later.
In the following example, the VMR calls the driver with a video stream destination rectangle that does not fully
cover the destination surface. This example can occur when the VMR presents DVD content where the video stream
is in the 4:3 aspect ratio and the subpicture stream is in the 16:9 aspect ratio.
The following diagram shows how, in this example, the video stream, video substream, and background color are
combined.

In the preceding example, the rectangles are as follows:


For the video stream, the source rectangle is {0, 0, 720, 480} and the destination rectangle is {107, 0, 747, 480}.
For the subpicture stream, the source rectangle is {0, 0, 720, 480} and the destination rectangle is {0, 0, 854, 480}.
The target rectangle is also {0, 0, 854, 480}.
As shown in the preceding example, the left and right edges of the destination surface contain no pixels from the
video stream. The driver's DeinterlaceBltEx function should interpret pixels that fall outside the video stream's
destination rectangle as background color because they are combined with the pixels from the subpicture stream.
Send comments about this topic to Microsoft
Combining Two Streams with Different Heights and
Widths
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and later.
In the following example, the VMR calls the driver with a video stream and a video substream of different heights
as well as widths. The following diagram shows how, in this example, the driver combines the two streams and
background color:

Note that in the preceding example the driver's DeinterlaceBltEx function should only draw the specified
background color over the target rectangle, as shown in the following diagram.

In the preceding example, the VMR is directed to reduce the size of the output image horizontally and vertically by
a factor of two. The background color should only be displayed in the target rectangle. The driver must not write to
pixels in the destination surface that are outside of the target rectangle (hatched in the preceding diagram). In the
preceding example, the destination surface is 300 x 200 pixels, but the target rectangle is {0, 0, 150, 100}. The source
rectangle for the video stream is {0, 0, 300, 150}; the destination rectangle for the video stream is {0, 12, 150, 87}. The
substream source rectangle is {0, 0, 150, 200}; the substream destination rectangle is {37, 0, 112, 100}. Remember that
the target rectangle is the bounding rectangle of the video stream and all substreams.
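For illustration, the destination rectangles in this example can be derived by fitting each source into the target rectangle while preserving its aspect ratio and centering it, as in the following sketch (the function name is illustrative; the integer truncation matches the values above).

#include <windows.h>

RECT FitRectPreservingAspect(LONG srcWidth, LONG srcHeight, const RECT *target)
{
    RECT dest;
    LONG targetWidth  = target->right  - target->left;
    LONG targetHeight = target->bottom - target->top;

    // Scale by the smaller ratio so the image fits entirely inside the target.
    LONG scaledWidth  = targetWidth;
    LONG scaledHeight = (srcHeight * targetWidth) / srcWidth;
    if (scaledHeight > targetHeight) {
        scaledHeight = targetHeight;
        scaledWidth  = (srcWidth * targetHeight) / srcHeight;
    }

    // Center the scaled image in the target rectangle.
    dest.left   = target->left + (targetWidth  - scaledWidth)  / 2;
    dest.top    = target->top  + (targetHeight - scaledHeight) / 2;
    dest.right  = dest.left + scaledWidth;
    dest.bottom = dest.top  + scaledHeight;
    return dest;
}

For the 300 x 150 video stream and the {0, 0, 150, 100} target rectangle this yields {0, 12, 150, 87}; for the 150 x 200 substream it yields {37, 0, 112, 100}.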
Send comments about this topic to Microsoft
Processing Subrectangles
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR on Windows Server 2003 SP1 and later and Windows XP SP2 and later can process subrectangular
regions of the source video image and video substreams and can write to subrectangular regions on the
destination surface. The VMR performs a subrectangular-process operation by making the coordinates of the
rectangles in the rcSrc and rcDest members of the DXVA_VideoSample2 structure for each sample different
from the coordinates of the source and destination surfaces.
If the deinterlace hardware supports subrectangular-process operations, the display driver reports this support by
setting the DXVA_VideoProcess_SubRects flag in the VideoProcessingCaps member of the
DXVA_DeinterlaceCaps structure. The driver returns a pointer to DXVA_DeinterlaceCaps when its
DeinterlaceQueryModeCaps function is called.
In subrectangular-process operations, the VMR can stretch subrectangles and can intersect subrectangles with
each other on the destination surface.
The following topics show how to perform various subrectangular-process operations:
Processing Subrectangles without Stretching
Stretching Subrectangles
Send comments about this topic to Microsoft
Processing Subrectangles without Stretching
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
In the following two examples, the destination surface is 720 x 576, the coordinates of the target rectangle are
{0,0,720,576}, and the background color is solid black.
The first example shows a case in which the video stream and substream rectangles do not intersect.
In this example, the reference video stream and single substream are characterized by the following rectangular
coordinates:
Video stream coordinates:
Source surface: {0,0,720,480}
Source subrectangle (rcSrc): {360,240,720,480}
Destination subrectangle (rcDest): {0,0,360,240}
Substream coordinates:
Source surface: {0,0,640,576}
Source subrectangle (rcSrc): {0,288,320,576}
Destination subrectangle (rcDest): {400,0,720,288}
In this example, the bottom-left corner of the video stream is displayed in the top-left corner of the destination
surface, and the bottom-right corner of the substream is displayed in the top-right corner of the destination
surface. The following diagram shows the output of the combination deinterlacing and substream compositing
operation (the hashed regions indicate the subrectangles that are processed).

The second example shows a case in which the video stream and substream rectangles intersect.
In the second example, the source surface coordinates are the same as in the first example. In this example, the
reference video stream and single substream are characterized by the following subrectangular coordinates:
Video stream subrectangular coordinates:
Source subrectangle (rcSrc): {260,92,720,480}
Destination subrectangle (rcDest): {0,0,460,388}
Substream subrectangular coordinates:
Source subrectangle (rcSrc): {0,0,460,388}
Destination subrectangle (rcDest): {260,188,720,576}
In this example, the lower-right corner of the video stream is displayed in the top-left corner of the destination
surface, shifted on the X and Y axis by +100. The top-left corner of the substream is displayed in the lower-right
corner of the destination surface, shifted on the X and Y axis by -100. The following diagram shows the output of
the combination deinterlacing and substream compositing operation.

Send comments about this topic to Microsoft


Stretching Subrectangles
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
In the following example, the destination surface is 720 x 480, the coordinates of the target rectangle are
{0,0,720,480}, and the background color is solid black.
The source surface of the video stream is 360 x 240, with the following source and destination subrectangles:
Source subrectangle (rcSrc): {180,120,360,240}
Destination subrectangle (rcDest): {0,0,360,240}
The source surface of the single substream is 360 x 240, with the following source and destination subrectangles:
Source subrectangle (rcSrc): {0,0,180,120}
Destination subrectangle (rcDest): {360,240,720,480}
The following diagram shows the output of the combination deinterlacing and substream compositing operation.

Send comments about this topic to Microsoft


Input Buffer Order
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
For each combination deinterlacing and substream compositing operation, the VMR initiates a call to the driver-
supplied DdMoCompRender callback function. In the DdMoCompRender call, the lpBufferInfo member of the
DD_RENDERMOCOMPDATA structure points to an array of buffers that describes the destination surface and the
surface for each input video source sample. The DdMoCompRender function in turn calls the driver's
DeinterlaceBltEx function. For more information, see Calling the Deinterlace DDI from a User-Mode Component.
The order of the elements in the array of DXVA_VideoSample2 structures in the Source member of the
DXVA_DeinterlaceBltEx structure matches the lpBufferInfo array with the exception that the destination surface
is not present.
The following topics describe the rules for arranging surfaces in the lpBufferInfo array and provide examples that
explain the sequence order of surfaces:
Input Buffer Order Rules
Input Buffer Order Example 1
Input Buffer Order Example 2
Input Buffer Order Example 3
Input Buffer Order Example 4
Input Buffer Order Example 5
Input Buffer Order Example 6
Send comments about this topic to Microsoft
Input Buffer Order Rules
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The order of the surfaces within the lpBufferInfo array conforms to the following rules:
The first surface in the array is the destination surface. The driver only writes to the destination surface.
The next sequence of surfaces in the array is the group of any previous destination surfaces, in reverse
temporal order, that the deinterlacing device requested for its deinterlace algorithm.
The next sequence of surfaces in the array is the collection of input interlaced or progressive surfaces that
the device requires in order to perform its deinterlace operation.
The next sequence of surfaces in the array is the collection of video substream surfaces, which are in Z order.
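For illustration, a driver can locate each group within the lpBufferInfo array from the counts it reported in DXVA_DeinterlaceCaps, as in the following sketch (Caps and lpRenderData are illustrative local names; dwNumBuffers is the buffer count from DD_RENDERMOCOMPDATA).

// Rule 1: the destination surface is always first.
DWORD iDestination    = 0;
// Rule 2: any previous destination surfaces follow, in reverse temporal order.
DWORD iFirstPrevDest  = 1;
DWORD cPrevDest       = Caps.NumPreviousOutputFrames;
// Rule 3: the input samples (backward references, current sample, forward references).
DWORD iFirstInput     = iFirstPrevDest + cPrevDest;
DWORD cInputSamples   = Caps.NumBackwardRefSamples + 1 + Caps.NumForwardRefSamples;
// Rule 4: the video substream surfaces, in Z order, fill the remainder of the array.
DWORD iFirstSubStream = iFirstInput + cInputSamples;
DWORD cSubStreams     = lpRenderData->dwNumBuffers - iFirstSubStream;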
Send comments about this topic to Microsoft
Input Buffer Order Example 1
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
Consider a device that does not require any previous output reference frames or previous or future input reference
frames to perform its deinterlacing operation (for example, the bob deinterlacer). The sequence of surfaces in the
lpBufferInfo array for this device are:

INDEX POSITION     SURFACE TYPE        TEMPORAL LOCATION
lpBufferInfo[0]    Destination         T
lpBufferInfo[1]    Interlaced input    T

Send comments about this topic to Microsoft


Input Buffer Order Example 2
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR initiates a call to the driver's DeinterlaceBltEx function to use the device in Input Buffer Order Example 1
to combine 2 video substreams with an interlaced video stream. The sequence of surfaces in the lpBufferInfo
array are:

INDEX POSITION     SURFACE TYPE        TEMPORAL LOCATION    LAYER LOCATION
lpBufferInfo[0]    Destination         T
lpBufferInfo[1]    Interlaced input    T                    Z
lpBufferInfo[2]    Substream           0                    Z+1
lpBufferInfo[3]    Substream           0                    Z+2

Send comments about this topic to Microsoft


Input Buffer Order Example 3
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR initiates a call to the driver's DeinterlaceBltEx function to use the device in Input Buffer Order Example 1
and Input Buffer Order Example 2 to instead combine 3 video substreams with a progressive video stream. The
sequence of surfaces in the lpBufferInfo array are:

INDEX POSITION     SURFACE TYPE        TEMPORAL LOCATION    LAYER LOCATION
lpBufferInfo[0]    Destination         T
lpBufferInfo[1]    Progressive input   T                    Z
lpBufferInfo[2]    Substream           0                    Z+1
lpBufferInfo[3]    Substream           0                    Z+2
lpBufferInfo[4]    Substream           0                    Z+3

In the change from example 2 to 3, the video stream changed from interlaced to progressive and an additional
video substream became active.
Send comments about this topic to Microsoft
Input Buffer Order Example 4
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
Consider a more sophisticated deinterlacing device that requires a single backward reference sample, a single
future reference sample, and the current sample to perform its deinterlace operation. Two video substreams are
also combined with the deinterlace operation. The sequence of surfaces in the lpBufferInfo array are:

INDEX POSITION     SURFACE TYPE        TEMPORAL LOCATION    LAYER LOCATION
lpBufferInfo[0]    Destination         T
lpBufferInfo[1]    Interlaced input    T-1                  Z
lpBufferInfo[2]    Interlaced input    T                    Z
lpBufferInfo[3]    Interlaced input    T+1                  Z
lpBufferInfo[4]    Substream           0                    Z+1
lpBufferInfo[5]    Substream           0                    Z+2

Send comments about this topic to Microsoft


Input Buffer Order Example 5
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
The VMR initiates a call to the driver's DeinterlaceBltEx function to use the device in Input Buffer Order Example 4
to combine the 2 video substreams with a progressive video stream. The VMR still passes the same number of
progressive video samples even though those samples are not necessary to produce the output in the destination
buffer. The sequence of surfaces in the lpBufferInfo array are:

INDEX POSITION     SURFACE TYPE        TEMPORAL LOCATION    LAYER LOCATION
lpBufferInfo[0]    Destination         T
lpBufferInfo[1]    Progressive input   T-1                  Z
lpBufferInfo[2]    Progressive input   T                    Z
lpBufferInfo[3]    Progressive input   T+1                  Z
lpBufferInfo[4]    Substream           0                    Z+1
lpBufferInfo[5]    Substream           0                    Z+2

The driver can ignore the surfaces at index 1 and index 3 because they are not required for the deinterlace
operation. Progressive samples are marked with the DXVA_SampleProgressiveFrame flag in the SampleFormat
member of DXVA_VideoSample2 structures for the samples. Substream samples are marked with the new
DXVA_SampleSubStream flag.
Send comments about this topic to Microsoft
Input Buffer Order Example 6
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
Consider an even more sophisticated deinterlace device that requires two previous output frames, a single
backward reference sample, a single future reference sample, and the current sample to perform the deinterlace
operation. Two video substreams are also combined with the deinterlace operation. The sequence of surfaces in the
lpBufferInfo array are:

INDEX POSITION     SURFACE TYPE           TEMPORAL LOCATION    LAYER LOCATION
lpBufferInfo[0]    Destination            T
lpBufferInfo[1]    Previous destination   T-1
lpBufferInfo[2]    Previous destination   T-2
lpBufferInfo[3]    Interlaced input       T-1                  Z
lpBufferInfo[4]    Interlaced input       T                    Z
lpBufferInfo[5]    Interlaced input       T+1                  Z
lpBufferInfo[6]    Substream              0                    Z+1
lpBufferInfo[7]    Substream              0                    Z+2

Send comments about this topic to Microsoft


Deinterlacing and Compositing on 64-bit Operating
Systems
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and later.
To ensure that deinterlacing with substream-compositing operations initiated by a 32-bit application run
successfully on a 64-bit operating system, the display driver code must first detect whether the application is 32 bit
or 64 bit. To perform the detection, the driver should check the size of the DXVA_DeinterlaceBltEx structure that
the application passes. If the driver determines that the initiating application is 32 bit, the driver should handle the
deinterlacing operations by thunking. The driver should use the DXVA_VideoSample32 and
DXVA_DeinterlaceBltEx32 structures to perform the deinterlace thunk. For more information about thunking, see
Supporting 32-Bit I/O in Your 64-Bit Driver.
Note When the driver code is compiled for 64-bit, the DXVA_VideoSample2 structure contains two extra
DWORD members to make the size of the 32-bit version of DXVA_VideoSample2 different from the 64-bit version.
Because of an 8-byte alignment, the 32-bit compiler adds 4 bytes of padding to the end of the 32-bit version,
which--without these two extra DWORD members--makes the 32-bit version the same size as the 64-bit version,
even accounting for the pointer-size difference between 32 bit and 64 bit. With two extra DWORD members
included in DXVA_VideoSample2 for 64-bit compile, the driver can differentiate between the 32-bit and 64-bit
versions based on the Size member of the DXVA_DeinterlaceBltEx structure.
The following example code demonstrates how the driver should handle the thunk:

switch (lpData->dwFunction) {
case DXVA_DeinterlaceBltExFnCode:
    {
        DXVA_DeinterlaceBltEx* pBlt = (DXVA_DeinterlaceBltEx*)lpData->lpInputData;
        switch (pBlt->Size) {
        case sizeof(DXVA_DeinterlaceBltEx):    // should be 4400 bytes on Win64
                                               // should be 4144 bytes on Win32
            break;
#ifdef _WIN64
        case sizeof(DXVA_DeinterlaceBltEx32):  // should be 4144 bytes
            // 32-bit structure, so thunk it!
            break;
#endif
        default:
            // Unknown structure, so return an error.
            break;
        }
    }
    break;
}

Send comments about this topic to Microsoft


Sample Functions for Deinterlacing
4/26/2017 • 1 min to read • Edit Online

The sample deinterlacing functions in this section show how to implement deinterlacing and frame-rate
conversion functionality. The sample functions map to the motion compensation callback functions defined in the
DD_MOTIONCOMPCALLBACKS structure. You can implement each sample function and then use the motion-
compensation code template to complete the implementation. For more information, see Example Code for
DirectX VA Devices.
Deinterlace Container Device Class Sample Functions
The sample deinterlacing functions in the following table are member functions of
DXVA_DeinterlaceContainerDeviceClass (that is, they are called by using the deinterlace container device). For
more information, see Defining the Deinterlace Container Device Class and Performing ProcAmp Control and
Deinterlacing Operations.

MEMBER FUNCTION                    DESCRIPTION
DeinterlaceQueryAvailableModes     Queries for available deinterlacing and frame-rate conversion modes.
DeinterlaceQueryModeCaps           Queries for the capabilities of a given deinterlacing and frame-rate conversion mode.

Deinterlace Bob Device Class Sample Functions


The sample deinterlacing functions in the following table are member functions of
DXVA_DeinterlaceBobDeviceClass (that is, they are called by using the deinterlace bob device). For more
information, see Defining the Deinterlace Bob Device Class.

MEMBER FUNCTION            DESCRIPTION
DeinterlaceOpenStream      Opens a video stream object.
DeinterlaceBlt             Provides bit-block deinterlacing of video stream objects.
DeinterlaceBltEx           Windows Server 2003 SP1 and later and Windows XP SP2 and later only. Deinterlaces video and composites video substreams over the top of the video stream.
DeinterlaceCloseStream     Closes a video stream object.

Mapping Sample Functions to DD_MOTIONCOMPCALLBACKS


The sample functions in this section map to the motion compensation callback functions as shown in the following
table. That is, each sample function is called within its respective motion compensation callback.
FUNCTION                          DD_MOTIONCOMPCALLBACKS MEMBER
DeinterlaceQueryAvailableModes    RenderMoComp
DeinterlaceQueryModeCaps          RenderMoComp
DeinterlaceOpenStream             CreateMoComp
DeinterlaceBlt                    RenderMoComp
DeinterlaceBltEx                  RenderMoComp
DeinterlaceCloseStream            DestroyMoComp

Send comments about this topic to Microsoft


ProcAmp Control Processing
4/26/2017 • 1 min to read • Edit Online

The ProcAmp control DDI extends DirectX VA to support ProcAmp control and post processing of video content by
graphics device drivers. The ProcAmp control DDI is an interface between the video mixing renderer (VMR) and the
graphics device driver. The DDI maps to the existing DirectDraw and DirectX VA DDI. The DDI is not accessible via
the IAMVideoAccelerator interface. The ProcAmp control DDI is available in Microsoft DirectX version 9.0.
If a driver supports accelerated decoding of compressed video, the VMR will call the driver to create two DirectX
VA devices, one to perform the actual video decoding work, and the other to be used strictly for ProcAmp
adjustments.
This section covers the following topics:
Processing in the 8-bit YUV Color Space
VMR Video Processing
Mapping the ProcAmp Control DDI to DirectDraw and DirectX VA
ProcAmp Properties
Sample Functions for ProcAmp Control
Send comments about this topic to Microsoft
Processing in the 8-bit YUV Color Space
4/26/2017 • 1 min to read • Edit Online

Working in the YUV color space simplifies the calculations involved for ProcAmp adjustment control of a video
stream.
Y Processing
To perform ProcAmp adjustment for the Y component, subtract 16 from the Y value to position the black level at
zero. This removes the DC offset so that adjusting the contrast does not vary the black level. Because Y values
might be less than 16, negative Y values should be supported at this point. Contrast is adjusted by multiplying the
YUV pixel values by a constant. If U and V are not adjusted, a color shift will result whenever the contrast is
changed. The brightness property value is added (or subtracted) from the contrast adjusted Y values; this is done to
avoid introducing a DC offset due to adjusting the contrast. Finally, the value 16 is added to reposition the black
level at 16.
The following equation summarizes the steps described in the previous paragraph. C is the contrast value and B is
the brightness value.

Y' = ((Y - 16) x C) + B + 16

UV Processing
To perform ProcAmp adjustment for the U and V components, subtract 128 from both U and V values to position
the range around zero. The hue property is implemented by mixing the U and V values together as shown in the
following equations. H is the desired hue angle:

U' = (U-128) x Cos(H) + (V-128) x Sin(H)

V' = (V-128) x Cos(H) - (U-128) x Sin(H)

Saturation is adjusted by multiplying U' and V' by a pair of constants, and then by adding 128 to each. The
combined processing of hue and saturation on the UV data is shown in the following equations. H is the desired
hue angle, C is the contrast value, and S is the saturation value:

U'' = (((U-128) x Cos(H) + (V-128) x Sin(H)) x C x S) + 128

V'' = (((V-128) x Cos(H) - (U-128) x Sin(H)) x C x S) + 128
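For illustration, the preceding equations can be applied to one pixel's 8-bit YUV values as in the following sketch. C, B, and S are the contrast, brightness, and saturation values, H is the hue angle in radians, and the function and variable names are illustrative; clamping the results back to the 0-255 range is omitted.

#include <math.h>

void ProcAmpAdjustPixel(float C, float B, float S, float H,
                        float *pY, float *pU, float *pV)
{
    float y = *pY - 16.0f;      // position the black level at zero
    float u = *pU - 128.0f;     // center the chroma range around zero
    float v = *pV - 128.0f;

    // Luma: contrast, then brightness, then restore the black level at 16.
    *pY = (y * C) + B + 16.0f;

    // Chroma: hue rotation, then contrast and saturation, then restore the 128 offset.
    *pU = ((u * cosf(H) + v * sinf(H)) * C * S) + 128.0f;
    *pV = ((v * cosf(H) - u * sinf(H)) * C * S) + 128.0f;
}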

Send comments about this topic to Microsoft


VMR Video Processing
4/26/2017 • 2 min to read • Edit Online

The VMR can perform the following sequence of processing operations on the video before it is displayed. The
VMR's mixer component always performs these operations in the order listed. Color space conversion and aspect
ratio correction are applied to all video streams; other operations are optional. The VMR performs only those
operations that are requested by the video playback application.
ProcAmp control adjustments
Deinterlacing
Aspect ratio correction
Color space conversion
Vertical or horizontal mirroring and alpha blending
Whenever possible, the VMR's mixer combines as many of these processes as possible to reduce the overall
memory bandwidth needed to process the video. The degree to which these processes can be combined is
determined by the capabilities of the hardware.
The illustrations in this section show the video processing pipelines used by the VMR's mixer to process video
during a ProcAmp control operation. The operations performed by the mixer depend on the capabilities of the
hardware. In the illustrations, rectangles represent Direct3D surfaces and circles represent Direct3D or DirectX VA
operations. The illustrations show the video processing pipelines for the following capabilities:
Hardware that can perform color space conversion and horizontally resize the video image.
Hardware that cannot perform color space conversion and cannot horizontally resize the video image, but
can support YUV textures.
Hardware that cannot perform color space conversion, cannot horizontally resize the video image, and
cannot support YUV textures.
The output surface of the VMR's processing pipeline is always a Direct3D render target. Applications are able to
configure the VMR such that the output render target may also be a Direct3D texture or part of a Direct3D swap
chain.
The following illustration shows the video processing pipelines used by the VMR to process progressive video
when the ProcAmp control hardware is able to perform color space conversion and horizontally resize the video
image.
Usually, a video playback application does not request that the VMR perform alpha blending or vertical/horizontal
mirroring of the video as it is displayed. The VMR is then able to incorporate all the video processing into a single
stage. In this case, the first pipeline is used. If the application requests that the VMR perform alpha blending or
vertical/horizontal mirroring of the video image prior to display, the VMR inserts an extra stage to the pipeline. In
this case, the second pipeline is used.
The following illustration shows the video pipeline used by the VMR to process progressive video when the
ProcAmp control hardware cannot perform color space conversion and cannot horizontally resize the video image
during a ProcAmp adjustment operation (as indicated by the DXVA_VideoProcess_YUV2RGB and
DXVA_VideoProcess_StretchX enumerators in DXVA_VideoProcessCaps), but does support YUV textures.

The following illustration shows the video pipelines used by the VMR to process progressive video when the
ProcAmp control hardware cannot perform color space conversion, cannot horizontally resize the video image
during a ProcAmp adjustment operation (as indicated by the DXVA_VideoProcess_YUV2RGB and
DXVA_VideoProcess_StretchX enumerators in DXVA_VideoProcessCaps), and does not support YUV textures.
The VMR uses the first pipeline if the application does not request any alpha blending or mirroring of the video
image. The VMR uses the second pipeline if the application requests either alpha blending or mirroring of the video
image.
Send comments about this topic to Microsoft
Mapping the ProcAmp Control DDI to DirectDraw
and DirectX VA
4/26/2017 • 1 min to read • Edit Online

ProcAmp control functionality must be accessed through DirectDraw's motion compensation callback functions to
which the ProcAmp control DDI can be mapped. For more information about mapping a DirectX VA DDI to
DirectDraw's motion compensation callbacks, see Mapping the Deinterlace DDI to DirectDraw and DirectX VA.
The following topics describe how the ProcAmp control DDI is mapped to the motion compensation callbacks:
Deinterlace Container Device for ProcAmp Control
Calling the ProcAmp Control DDI from a User-Mode Component
Send comments about this topic to Microsoft
Deinterlace Container Device for ProcAmp Control
4/26/2017 • 1 min to read • Edit Online

The sample functions for ProcAmp control can only be used in the context of a DirectX VA device, so it is necessary
to first define and create a deinterlace container device.
The deinterlace container device for deinterlacing can also be used for ProcAmp control to determine the
capabilities of a ProcAmp control device (if the driver supports ProcAmp control adjustments). If supported, the
driver creates the ProcAmp control device when the VMR initiates a call to do so.
Note The deinterlace container device is a software construct only and does not represent any functional hardware
contained on a device.
Send comments about this topic to Microsoft
Calling the ProcAmp Control DDI from a User-Mode
Component
4/26/2017 • 3 min to read • Edit Online

A user-mode component, such as the VMR, initiates calls to the ProcAmp Control DDI.
So that the VMR can access ProcAmp control functionality, the display driver must implement the motion
compensation callback functions, which are defined by members of the DD_MOTIONCOMPCALLBACKS structure.
To simplify driver development, driver writers can use a motion-compensation code template, and implement the
ProcAmp control sample functions. The motion-compensation template calls its ProcAmp control sample functions
to perform ProcAmp control operations. For more information about using a motion-compensation template, see
Example Code for DirectX VA Devices.
The following steps explain how the VMR initiates calls to the ProcAmp Control DDI:
1. When the VMR is added to a filter graph, it initiates a call to the driver-supplied DdMoCompGetGuids
callback function to retrieve the list of devices supported by the driver. The GetMoCompGuids member of
DD_MOTIONCOMPCALLBACKS points to this callback function. For more information about a filter graph,
see KS Minidriver Architecture.
2. If the deinterlace container device GUID is present, the VMR initiates a call to the DdMoCompCreate callback
function to create an instance of the device. The CreateMoComp member of
DD_MOTIONCOMPCALLBACKS points to the callback function. In the DdMoCompCreate call, a pointer to
the container device GUID is specified in the lpGuid member of the DD_CREATEMOCOMPDATA structure.
The container device GUID is defined as follows:

DEFINE_GUID(DXVA_DeinterlaceContainerDevice,
0x0e85cb93,0x3046,0x4ff0,0xae,0xcc,0xd5,0x8c,0xb5,0xf0,0x35,0xfd);

3. To determine capabilities of the ProcAmp control device, the VMR initiates a call to the driver-supplied
DdMoCompRender callback function. The RenderMoComp member of DD_MOTIONCOMPCALLBACKS
points to the callback function. In the DdMoCompRender call, the
DXVA_ProcAmpControlQueryCapsFnCode constant (defined in dxva.h) is set in the dwFunction
member of the DD_RENDERMOCOMPDATA structure. The lpInputData member of
DD_RENDERMOCOMPDATA passes the input parameters to the driver by pointing to a DXVA_VideoDesc
structure. The driver returns its output through the lpOutputData member of DD_RENDERMOCOMPDATA;
lpOutputData points to a DXVA_ProcAmpControlCaps structure.
If the driver implements a ProcAmpControlQueryCaps sample function, the DdMoCompRender callback
function calls ProcAmpControlQueryCaps.
4. For each ProcAmp adjustment property supported by the hardware, the VMR initiates a call to the driver-
supplied DdMoCompRender callback function. In the DdMoCompRender call, the
DXVA_ProcAmpControlQueryRangeFnCode constant (defined in dxva.h) is set in the dwFunction
member of DD_RENDERMOCOMPDATA. The lpInputData member of DD_RENDERMOCOMPDATA passes
the input parameters to the driver by pointing to a completed DXVA_ProcAmpControlQueryRange
structure. The driver returns its output through the lpOutputData member of DD_RENDERMOCOMPDATA;
lpOutputData points to a DXVA_VideoPropertyRange structure.
If the driver implements a ProcAmpControlQueryRange sample function, the DdMoCompRender callback
function calls ProcAmpControlQueryRange.
5. After the VMR has determined the ProcAmp adjustment capabilities of the hardware, it initiates a call to
DdMoCompCreate to create an instance of the ProcAmp control device. In the DdMoCompCreate call, a
pointer to the ProcAmp control device GUID is specified in the lpGuid member of
DD_CREATEMOCOMPDATA. The ProcAmp control device GUID is defined as follows:

DEFINE_GUID(DXVA_ProcAmpControlDevice,
0x9f200913,0x2ffd,0x4056,0x9f,0x1e,0xe1,0xb5,0x08,0xf2,0x2d,0xcf);

If the driver implements a ProcAmpControlOpenStream sample function, the DdMoCompCreate callback
function calls ProcAmpControlOpenStream.
6. For each ProcAmp adjusting operation, the VMR initiates a call to the driver-supplied
DdMoCompRender callback function. In the DdMoCompRender call, the
DXVA_ProcAmpControlBltFnCode constant (defined in dxva.h) is set in the dwFunction
member of DD_RENDERMOCOMPDATA. The lpBufferInfo member of DD_RENDERMOCOMPDATA
points to an array of two buffers that describe the destination and source surfaces. The lpInputData
member of DD_RENDERMOCOMPDATA passes the input parameters to the driver by pointing to a
completed DXVA_ProcAmpControlBlt structure. The driver does not return any output; that is, the
lpOutputData member of DD_RENDERMOCOMPDATA is NULL.
If the driver implements a ProcAmpControlBlt sample function, the DdMoCompRender callback function
calls ProcAmpControlBlt.
7. When the VMR no longer needs to perform any more ProcAmp control operations, the driver-supplied
DdMoCompDestroy callback function is called. The DestroyMoComp member of
DD_MOTIONCOMPCALLBACKS points to the callback function.
If the driver implements a ProcAmpControlCloseStream sample function, the DdMoCompDestroy
callback function calls ProcAmpControlCloseStream.
8. The driver releases any resources used by the ProcAmp control device.
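For illustration, the dispatch inside the driver's DdMoCompRender callback might look like the following sketch. The dwFunction constants and the DXVA_* structures are defined in dxva.h; the sample-function signatures shown here are simplified assumptions rather than required prototypes.

switch (lpData->dwFunction) {
case DXVA_ProcAmpControlQueryCapsFnCode:
    // lpInputData -> DXVA_VideoDesc, lpOutputData -> DXVA_ProcAmpControlCaps.
    lpData->ddRVal = ProcAmpControlQueryCaps(
        (DXVA_VideoDesc*)lpData->lpInputData,
        (DXVA_ProcAmpControlCaps*)lpData->lpOutputData);
    break;
case DXVA_ProcAmpControlQueryRangeFnCode:
    // lpInputData -> DXVA_ProcAmpControlQueryRange, lpOutputData -> DXVA_VideoPropertyRange.
    lpData->ddRVal = ProcAmpControlQueryRange(
        (DXVA_ProcAmpControlQueryRange*)lpData->lpInputData,
        (DXVA_VideoPropertyRange*)lpData->lpOutputData);
    break;
case DXVA_ProcAmpControlBltFnCode:
    // lpBufferInfo[0] describes the destination surface, lpBufferInfo[1] the source.
    lpData->ddRVal = ProcAmpControlBlt(
        lpData->lpBufferInfo,
        (DXVA_ProcAmpControlBlt*)lpData->lpInputData);
    break;
}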
Send comments about this topic to Microsoft
ProcAmp Properties
4/26/2017 • 2 min to read • Edit Online

A display driver must supply minimum, maximum, step size, and default values when the VMR queries the driver
for information about ProcAmp properties. The driver can supply this information in response to a call to its
ProcAmpControlQueryRange function. Although the driver can return any value for the ProcAmp properties, the
following are the recommended settings (all values are floats).

PROPERTY       MINIMUM      MAXIMUM     DEFAULT     INCREMENT
Brightness     -100.0F      100.0F      0.0F        0.1F
Contrast       0.0F         10.0F       1.0F        0.01F
Saturation     0.0F         10.0F       1.0F        0.01F
Hue            -180.0F      180.0F      0.0F        0.1F

It is important that the default values result in a null transform of the video stream. This allows the VMR to bypass
the ProcAmp adjustment stage in its video pipeline if an application has not altered any of the ProcAmp control
properties.
The VMR and the display driver perform certain validations when working with ProcAmp properties. The VMR
enforces the following parameter validations before calling a driver:
The VMR ensures that values supplied by applications fall within the valid range as specified by the driver.
The VMR clamps any application-provided values to the specified range. For example, if the maximum value
for brightness is 100, and the application supplies a value of 105, the VMR clamps the application's value to
100. When the application queries the VMR to determine the current brightness setting, it receives the
clamped value, in this case 100.
The VMR also rounds the application-provided value so that it falls on the nearest location indicated by the
step size increment returned by your driver. For example, if the brightness step size increment is 0.5, the
minimum permitted brightness value is -100.0, and the application supplies a value of -80.7, the VMR adjusts
the application's value to the nearest valid value, -80.5 in this case.
The driver should ensure that the following relationships hold:
The maximum range value is greater than the minimum range value. This implies that the difference
between the maximum and minimum value is greater than 0.0.
The default and maximum values fall on valid locations as specified by the step size increment, as shown in
the following expressions:

min + (int((default - min) / increment) * increment) == default


min + (int((max - min) / increment) * increment) == max

Because applications usually use Windows' slider controls to display ProcAmp settings, and because the
maximum range of Windows' slider controls is 65536, drivers should keep the number of distinct ProcAmp
values to fewer than 65536. The following inequality should be true for the values chosen:

int((max - min) / increment) < 65536.

For ProcAmp properties that are not supported by your hardware, the driver should return the maximum
value, minimum value, and default value with a step size increment of 0.0.
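For illustration, these relationships can be checked for a supported property with a small helper such as the following sketch (plain C, assuming windows.h and math.h; the function name is illustrative, and a small tolerance absorbs floating-point rounding).

BOOL IsValidProcAmpRange(float minVal, float maxVal, float defVal, float step)
{
    float nearestDef, nearestMax, numSteps;

    if (maxVal <= minVal || step <= 0.0f)
        return FALSE;

    // The default and maximum values must fall on step-size boundaries from the minimum.
    nearestDef = minVal + floorf((defVal - minVal) / step + 0.5f) * step;
    nearestMax = minVal + floorf((maxVal - minVal) / step + 0.5f) * step;
    if (fabsf(nearestDef - defVal) > 0.0001f || fabsf(nearestMax - maxVal) > 0.0001f)
        return FALSE;

    // Keep the number of distinct values below 65536 so slider controls can represent them.
    numSteps = floorf((maxVal - minVal) / step);
    return numSteps < 65536.0f;
}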
Send comments about this topic to Microsoft
Sample Functions for ProcAmp Control
4/26/2017 • 1 min to read • Edit Online

The sample ProcAmp functions in this section show how to implement ProcAmp control functionality. These
sample functions map to the motion compensation callback functions defined in the
DD_MOTIONCOMPCALLBACKS structure. You can implement each sample function, and then use a motion-
compensation code template to complete the implementation. For more information, see Example Code for
DirectX VA Devices.
Deinterlace Container Device Class Sample Functions
The sample ProcAmp control functions in the following table are member functions of
DXVA_DeinterlaceContainerDeviceClass (that is, they are called using the deinterlace container device). For more
information, see Defining the Deinterlace Container Device Class and Performing ProcAmp Control and
Deinterlacing Operations.

MEMBER FUNCTION              DESCRIPTION
ProcAmpControlQueryCaps      Queries the driver to determine input requirements of the ProcAmp control device.
ProcAmpControlQueryRange     Queries the driver to determine the minimum, maximum, step size, and default value for each ProcAmp property.

ProcAmp Control Device Class Sample Functions


The sample ProcAmp control functions in the following table are member functions of
DXVA_ProcAmpControlDeviceClass (that is, they are called using the ProcAmp control device). For more
information, see Defining the ProcAmp Control Device Class.

MEMBER FUNCTION               DESCRIPTION
ProcAmpControlOpenStream      Creates a ProcAmp stream object.
ProcAmpControlBlt             Performs the ProcAmp adjustment operation by writing the output to the destination surface.
ProcAmpControlCloseStream     Closes the ProcAmp stream object and instructs the device driver to release hardware resources associated with the stream.

Mapping Sample Functions to DD_MOTIONCOMPCALLBACKS


The sample functions in this section map to the motion compensation callback functions as follows. That is, each
sample function is called within its respective motion-compensation callback.
FUNCTION                      DD_MOTIONCOMPCALLBACKS MEMBER
ProcAmpControlQueryCaps       RenderMoComp
ProcAmpControlQueryRange      RenderMoComp
ProcAmpControlOpenStream      CreateMoComp
ProcAmpControlBlt             RenderMoComp
ProcAmpControlCloseStream     DestroyMoComp

Send comments about this topic to Microsoft


COPP Processing
4/26/2017 • 1 min to read • Edit Online

This section applies only to Microsoft Windows Server 2003 with Service Pack 1 (SP1) and later, and Windows XP
with Service Pack 2 (SP2) and later.
The Certified Output Protection Protocol (COPP) device driver interface (DDI) extends DirectX Video Acceleration
(VA) to support copy protection of video that is output by various connectors of the graphics adapter. The COPP
DDI is an interface between the Video Mixing Renderer (VMR) and the video miniport driver. The COPP DDI maps
to the existing DirectDraw and DirectX VA DDI. The DDI is not accessible via the IAMVideoAccelerator interface.
The DDI is accessible to applications via the IAMCertifiedOutputProtection interface.
For more information about the IAMCertifiedOutputProtection and IAMVideoAccelerator interfaces, see the
latest DirectX Software Development Kit (SDK) documentation.
If a video miniport driver supports the passing of protected commands and status between applications and the
driver, the VMR will initiate a call to the driver to create a DirectX VA COPP device.
The following topics describe the COPP DDI and how to support COPP:
Introduction to COPP
Mapping the COPP DDI to DirectDraw and DirectX VA
Sample Functions for COPP
Returning Error Codes from COPP Functions
Performing COPP Operations
Implementation Tips and Requirements for COPP
Send comments about this topic to Microsoft
Introduction to COPP
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and
later.
COPP provides a mechanism of applying copy protection to video that is output by the graphics adapter. COPP
provides a common protocol for sending various link-protection requirements to the graphics adapter in a more
protected fashion than by using the IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS I/O control (IOCTL) code.
The following topics describe COPP:
Cryptographic Primitives Used by COPP
Communicating Through a Secure Channel
Graphics Adapter Output Requirements to Support COPP
Send comments about this topic to Microsoft
Cryptographic Primitives Used by COPP
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 with SP1 and later, and Windows XP with SP2 and later.
COPP uses the following cryptographic primitives:
Public key cryptography
COPP requires the RSA algorithm with 2,048-bit keys for public key encryption and decryption. For information
about the RSA algorithm, see the RSA Laboratories website.
Digital certificates
COPP uses eXtensible rights Markup Language (XrML) digital certificates.
Message authentication code (MAC)
COPP uses a one-key Cipher Block Chaining (CBC)-mode MAC (OMAC) for message authenticity. The OMAC is
based on Advanced Encryption Standard (AES). For information about AES, see the RSA Laboratories website. For
more information about OMAC, see the OMAC-1 algorithm.
Send comments about this topic to Microsoft
Communicating Through a Secure Channel
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
To use COPP, a secure communication channel must be established between the application and the certified
physical output connector of the graphics adapter. The following topics describe establishing and using a secure
communication channel:
Authenticated Key Exchange
Command Security and Authenticity
Status Security and Authenticity
Send comments about this topic to Microsoft
Authenticated Key Exchange
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The following figure shows establishing a secure connection through authentication and key exchange. First, the
video miniport driver supplies the graphics hardware certificate to the application. Next, the application extracts the
public key from the graphics hardware certificate. After the application generates a data integrity key (kDI), the
application uses the public key to encrypt a sequence that includes the data integrity key and supplies the sequence
to the driver.
Command and status messages are subsequently passed unencrypted; however, for each message, MACs are
created by using the data integrity key.

For more information about MACs, see Cryptographic Primitives Used by COPP.
The following table describes the values in the preceding figure.

rGH
    128-bit random number generated by the driver.

CertGH
    Variable-length digital certificate used by the graphics hardware.

PGH(rGH, kDI, status_start, command_start)
    Start sequence for the secure channel, which consists of the following items concatenated together:
        128-bit random number generated by the driver (rGH).
        128-bit random data integrity session key generated by the application (kDI).
        32-bit random starting status sequence number generated by the application (status_start).
        32-bit random starting command sequence number generated by the application (command_start).
    The application encrypts the sequence by using the public key obtained from the graphics hardware
    certificate. The sequence is 2,048 bits long; the remainder of the sequence is padded with 0s.
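For illustration only, the plaintext start sequence can be pictured as the following layout before it is encrypted with the graphics hardware's RSA public key. The structure and field names here are invented for this sketch; the DDI itself carries the encrypted 2,048-bit result in the DXVA_COPPSignature structure that is later passed to COPPSequenceStart.

#include <stdint.h>

// Hypothetical layout of the plaintext start sequence (2,048 bits total).
typedef struct _COPP_START_SEQUENCE_PLAINTEXT {
    uint8_t  rGH[16];         // 128-bit random number generated by the driver.
    uint8_t  kDI[16];         // 128-bit data integrity session key from the application.
    uint32_t StatusStart;     // 32-bit random starting status sequence number.
    uint32_t CommandStart;    // 32-bit random starting command sequence number.
    uint8_t  Padding[216];    // Zero padding up to 256 bytes (2,048 bits).
} COPP_START_SEQUENCE_PLAINTEXT;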

Send comments about this topic to Microsoft


Command Security and Authenticity
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The following figure shows an application sending command messages to the video miniport driver across the
secure channel.

These command messages are contained in an envelope. The envelope contains data and MAC sections. The
application calculates the MAC of the command data by using the data integrity key and the OMAC. For more
information about the MAC and OMAC, see Cryptographic Primitives Used by COPP.
The following table describes the values in the preceding figure.

COMMAND
    Variable-length command data.

MACkDI(COMMAND)
    128-bit MAC of the command data using the data integrity session key.
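A hedged sketch of how a receiver might verify the authenticity of a command envelope follows. It builds on the ComputeOmac1 sketch shown in Cryptographic Primitives Used by COPP; the function name is hypothetical and the envelope fields are simplified rather than the exact DXVA_COPPCommand layout.

#include <stdint.h>
#include <string.h>

// ComputeOmac1 is the OMAC-1 sketch shown in Cryptographic Primitives Used by COPP.
extern void ComputeOmac1(const uint8_t Key[16], const uint8_t *Msg, size_t Len, uint8_t Mac[16]);

// Returns TRUE only if the MAC carried in the envelope matches the MAC computed
// over the command data with the data integrity session key (kDI).
BOOL IsCommandAuthentic(const uint8_t kDI[16],
                        const uint8_t *CommandData, size_t CommandLength,
                        const uint8_t ReceivedMac[16])
{
    uint8_t computedMac[16];
    ComputeOmac1(kDI, CommandData, CommandLength, computedMac);
    return memcmp(computedMac, ReceivedMac, 16) == 0;
}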

Send comments about this topic to Microsoft


Status Security and Authenticity
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The following figure shows an application requesting status messages from the video miniport driver, and then the
video miniport driver sending the status messages to the application across the secure channel.

These status messages are contained in an envelope. The envelope contains data and MAC sections. The video
miniport driver calculates the MAC of the status data by using the data integrity key and the OMAC. For more
information about the MAC and OMAC, see Cryptographic Primitives Used by COPP.
The following table describes the values in the preceding figure.

rAPP
    128-bit random number generated by the application.

STATUS
    Variable-length status data.

MACkDI(rAPP, STATUS)
    128-bit MAC of the status data using the data integrity session key.

Send comments about this topic to Microsoft


Graphics Adapter Output Requirements to Support
COPP
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
To support COPP, the physical output connector of the graphics adapter must perform the following tasks:
Protect key information
A certified output must protect key information that was issued to the output upon certification.
Restrict unauthorized components from accessing secure content
A certified output must prevent unauthorized software or hardware components from accessing content sent
through a COPP secure channel. The certified output should only permit COPP commands to operate on data
received through the COPP secure channel.
Switch to a failure mode if the output fails
If the certified output can no longer enforce the configuration profile specified at configuration time, then the
output should cease decrypting incoming content and should send a status message to the application. The status
message should indicate that the output can no longer perform as configured.
Send comments about this topic to Microsoft
Mapping the COPP DDI to DirectDraw and DirectX
VA
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
COPP functionality must be accessed through the motion compensation callback functions of DirectDraw, to which
the COPP DDI can be mapped. Because the COPP DDI is implemented in the video miniport driver, the display
driver must communicate with the video miniport driver by using COPP IOCTLs.
The COPP DDI can be mapped to the motion compensation callback functions because they do not use typed
parameters (that is, their single parameter is a pointer to a structure). In other words, the information in the single
parameter that is passed to a motion compensation callback function can be processed according to its information
type.
For example, if DXVA_COPPGetCertificateLengthFnCode-type information is passed to the DdMoCompRender
function, then DdMoCompRender can initiate a call to the COPPGetCertificateLength function of the COPP DDI to
query for the length in bytes of the certificate used by the graphics hardware. However, if
DXVA_COPPSequenceStartFnCode-type information is passed to DdMoCompRender instead, then
DdMoCompRender can initiate a call to the COPPSequenceStart function of the COPP DDI to indicate the start of a
protected command and status sequence on the current video session.
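For example, the display driver's DdMoCompRender implementation might recognize the COPP function codes as in the following abbreviated sketch. The ForwardCopp* helpers are placeholders for display-driver code that packages the request into a COPP IOCTL and sends it to the video miniport driver, as described in Calling the COPP DDI from the Display Driver; a fuller dispatch routine for the other DirectX VA devices appears in Example Code for DirectX VA Devices.

DWORD APIENTRY
MOCOMPCB_RENDER(PDD_RENDERMOCOMPDATA lpData)
{
    // Abbreviated sketch: dispatch on the COPP function code. Error checking
    // and the non-COPP function codes are omitted.
    switch (lpData->dwFunction) {
    case DXVA_COPPGetCertificateLengthFnCode:
        // lpOutputData points to a DWORD that receives the certificate length.
        lpData->ddRVal = ForwardCoppGetCertificateLength(
            lpData, (DWORD*)lpData->lpOutputData);
        break;

    case DXVA_COPPSequenceStartFnCode:
        // lpInputData points to the encrypted start sequence.
        lpData->ddRVal = ForwardCoppSequenceStart(
            lpData, (DXVA_COPPSignature*)lpData->lpInputData);
        break;

    default:
        lpData->ddRVal = E_INVALIDARG;
        break;
    }
    return DDHAL_DRIVER_HANDLED;
}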
The following topics describe how the COPP DDI is mapped to the motion compensation callback functions:
DirectX VA COPP Device
Calling the COPP DDI from a User-Mode Component
Calling the COPP DDI from the Display Driver
Send comments about this topic to Microsoft
DirectX VA COPP Device
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The display driver should be implemented so that a DirectX VA COPP device is created for each video session. The
COPP device represents an end point to receive COPP commands and status requests. The COPP device also stores
the protection settings of its associated physical connector. A physical connector can support multiple content
protection types. For example, an S-Video connector can support analog content protection (ACP) in addition to
Content Generation Management System for Analog (CGMS-A) protection. For more information, see Defining the
COPP Device Class.
Multiple instances of the COPP device are required so that different processes can configure the settings of the
graphics adapter's output through COPP. Therefore, you should implement the COPP device class so that each
COPP device acts appropriately when multiple video sessions are active on the system.
Note A video session consists of a video stream possibly combined with one or more video substreams. A video
session is tied to a particular graphics adapter's output connector. Several video sessions can be active on a system
and within a single process.
Send comments about this topic to Microsoft
Calling the COPP DDI from a User-Mode Component
4/26/2017 • 5 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
A user-mode component, such as the VMR, initiates calls to the COPP DDI.
So that the VMR can notify the video miniport driver to apply protection to the graphics adapter's video output, the
display driver must implement the motion compensation callback functions, which are defined by members of the
DD_MOTIONCOMPCALLBACKS structure.
To simplify driver development, driver writers can use a motion-compensation code template and implement
COPP IOCTLs and the COPP sample functions. The display driver and video miniport driver use the COPP IOCTLs to
communicate. For more information, see Calling the COPP DDI from the Display Driver. The motion-compensation
code template initiates calls to the COPP sample functions. For more information about using a motion-
compensation code template, see Example Code for DirectX VA Devices.
The following steps explain how the VMR initiates calls to the COPP DDI:
1. When the VMR is added to a filter graph, it initiates a call to the display driver-supplied DdMoCompGetGuids
callback function to retrieve the list of devices supported by the driver. The GetMoCompGuids member of
the DD_MOTIONCOMPCALLBACKS structure points to this callback function. For more information about a
filter graph, see KS Minidriver Architecture.
2. If the DirectX VA COPP device GUID is present, then the VMR initiates a call to the DdMoCompCreate
callback function to initialize a COPP device on the current video session. The CreateMoComp member of
DD_MOTIONCOMPCALLBACKS points to the callback function. In the DdMoCompCreate call, a pointer to
the COPP device GUID is specified in the lpGuid member of the DD_CREATEMOCOMPDATA structure. The
COPP device GUID is defined as follows:

DEFINE_GUID(DXVA_COPPDevice, 0xd2457add,0x8999,0x45ed,0x8a,0x8a,0xd1,0xaa,0x04,0x7b,0xa4,0xd5);

The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPOpenVideoSession sample function, then the DdMoCompCreate callback
function initiates the call to COPPOpenVideoSession.
3. To determine the length of the variable-length graphics hardware certificate that should be used for the
current video session, the VMR initiates a call to the display driver-supplied DdMoCompRender callback
function. The RenderMoComp member of DD_MOTIONCOMPCALLBACKS points to the callback function.
In the DdMoCompRender call, the dwFunction member of DD_RENDERMOCOMPDATA is set to the
value DXVA_COPPGetCertificateLengthFnCode (defined in dxva.h). The display driver does not receive
any input in this call; that is, the lpInputData member of DD_RENDERMOCOMPDATA is NULL. The display
driver returns the length of the certificate through the lpOutputData member of
DD_RENDERMOCOMPDATA; lpOutputData points to a DWORD data type.
The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPGetCertificateLength sample function, then the DdMoCompRender
callback function initiates the call to COPPGetCertificateLength.
4. To retrieve the certificate used by the graphics hardware, the VMR initiates a call to the display driver-
supplied DdMoCompRender callback function. In the DdMoCompRender call, the dwFunction member
of DD_RENDERMOCOMPDATA is set to the value DXVA_COPPKeyExchangeFnCode (defined in dxva.h).
The display driver does not receive any input in this call; that is, the lpInputData member of
DD_RENDERMOCOMPDATA is NULL. The lpBufferInfo member of DD_RENDERMOCOMPDATA points to a
single RGB32 system memory surface that contains the space required for the display driver to store the
certificate. The display driver returns the 128-bit random number that it generated through the
lpOutputData member of DD_RENDERMOCOMPDATA.
The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPKeyExchange sample function, then the DdMoCompRender callback
function initiates the call to COPPKeyExchange.
5. To set the current video session to protected mode, the VMR initiates a call to the display driver-supplied
DdMoCompRender callback function. In the DdMoCompRender call, the dwFunction member of
DD_RENDERMOCOMPDATA is set to the value DXVA_COPPSequenceStartFnCode (defined in dxva.h).
The lpInputData member of DD_RENDERMOCOMPDATA passes the input command and status sequence
start codes to the display driver by pointing to a completed DXVA_COPPSignature structure. The display
driver does not return any output; that is, the lpOutputData member of DD_RENDERMOCOMPDATA is
NULL.
The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPSequenceStart sample function, then the DdMoCompRender callback
function initiates the call to COPPSequenceStart.
6. To set the protection level on the physical connector associated with the DirectX VA COPP device, the VMR
initiates a call to the display driver-supplied DdMoCompRender callback function. In the DdMoCompRender
call, the dwFunction member of DD_RENDERMOCOMPDATA is set to the value
DXVA_COPPCommandFnCode (defined in dxva.h). The lpInputData member of
DD_RENDERMOCOMPDATA passes the input parameters to the display driver by pointing to a completed
DXVA_COPPCommand structure. The display driver does not return any output; that is, the lpOutputData
member of DD_RENDERMOCOMPDATA is NULL.
The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPCommand sample function, then the DdMoCompRender callback
function initiates the call to COPPCommand.
7. To retrieve protection information regarding the physical connector being used, the VMR initiates a call to
the display driver-supplied DdMoCompRender callback function. In the DdMoCompRender call, the
dwFunction member of DD_RENDERMOCOMPDATA is set to the value DXVA_COPPQueryStatusFnCode
(defined in dxva.h). The lpInputData member of DD_RENDERMOCOMPDATA passes the input parameters
to the display driver by pointing to a completed DXVA_COPPStatusInput structure. The display driver
returns its output through the lpOutputData member of DD_RENDERMOCOMPDATA; lpOutputData
points to a DXVA_COPPStatusOutput structure.
The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPQueryStatus sample function, then the DdMoCompRender callback
function initiates the call to COPPQueryStatus.
8. When the VMR no longer needs to perform any more COPP operations, the display driver-supplied
DdMoCompDestroy callback function is called. The DestroyMoComp member of
DD_MOTIONCOMPCALLBACKS points to the callback function.
The display driver must communicate with the video miniport driver by using a COPP IOCTL. If the video
miniport driver implements a COPPCloseVideoSession sample function, then the DdMoCompDestroy
callback function initiates the call to COPPCloseVideoSession.
9. The drivers then release any resources used by the DirectX VA COPP device.
Send comments about this topic to Microsoft
Calling the COPP DDI from the Display Driver
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The display driver initiates calls to the video miniport driver's COPP DDI by using COPP I/O control (IOCTL)
requests. The display driver calls the EngDeviceIoControl function by using a COPP IOCTL to send a synchronous
COPP request to the video miniport driver. Graphics Device Interface (GDI) uses a single buffer for both input and
output to pass the request to the I/O subsystem. The I/O subsystem routes the request to the video port, which
processes the request by using the video miniport driver.
The following sample data structure and IOCTLs can be used to transfer COPP information between the display
driver and the video miniport driver. Your drivers can either use the data structure and IOCTLs or create new ones,
as appropriate.

typedef struct {
PVOID* ppThis;
PVOID InputBuffer;
HRESULT* phr;
} COPP_IO_InputBuffer;

#define IOCTL_COPP_OpenDevice \
CTL_CODE(FILE_DEVICE_VIDEO, 2190, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_COPP_CloseDevice \
CTL_CODE(FILE_DEVICE_VIDEO, 2191, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_COPP_GetCertificateLength \
CTL_CODE(FILE_DEVICE_VIDEO, 2192, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_COPP_KeyExchange \
CTL_CODE(FILE_DEVICE_VIDEO, 2193, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_COPP_StartSequence \
CTL_CODE(FILE_DEVICE_VIDEO, 2194, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_COPP_Command \
CTL_CODE(FILE_DEVICE_VIDEO, 2195, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_COPP_Status \
CTL_CODE(FILE_DEVICE_VIDEO, 2196, METHOD_BUFFERED, FILE_ANY_ACCESS)

If you do not use the preceding IOCTLs, then you can define your own private IOCTLs, which must be formatted as
described in Defining I/O Control Codes.
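For example, the display-driver side of the IOCTL_COPP_GetCertificateLength request might look like the following sketch. GetDriverHandleFromPDEV is the driver-supplied helper that also appears in the example code later in this documentation, DXVA_COPPDeviceClass is the device class from that example code, and the COPP result is returned through the phr member rather than through the EngDeviceIoControl return value (see Returning Error Codes from COPP Functions). The helper name is hypothetical.

HRESULT ForwardCoppGetCertificateLength(
    PDD_RENDERMOCOMPDATA lpData,
    DWORD* pCertificateLength)
{
    HRESULT hr = E_FAIL;
    ULONG BytesReturned;
    COPP_IO_InputBuffer InputBuffer;

    // The COPP device object was stored in lpDriverReserved1 when the device
    // was created (see Example Code for DirectX VA Devices).
    DXVA_COPPDeviceClass* lpDev =
        (DXVA_COPPDeviceClass*)lpData->lpMoComp->lpDriverReserved1;

    InputBuffer.ppThis = &lpDev->m_pThis;          // Identifies the miniport's COPP device.
    InputBuffer.InputBuffer = pCertificateLength;  // Single buffer carries the returned length.
    InputBuffer.phr = &hr;                         // Receives the COPP error code.

    EngDeviceIoControl(
        (HANDLE)GetDriverHandleFromPDEV(lpData->lpDD->lpGbl->dhpdev),
        IOCTL_COPP_GetCertificateLength,
        &InputBuffer,
        sizeof(InputBuffer),
        NULL,
        0,
        &BytesReturned);

    return hr;
}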
Send comments about this topic to Microsoft
Sample Functions for COPP
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The sample COPP functions show how to implement COPP processing functionality. These sample functions map
to the motion compensation callback functions defined in the DD_MOTIONCOMPCALLBACKS structure. You can
implement each sample function and a corresponding COPP I/O control (IOCTL) request, and then use a motion-
compensation code template and a video miniport driver template to complete the implementation. For more
information, see Example Code for DirectX VA Devices.
COPP Sample Functions
The sample COPP functions in the following table are called by using the COPP device. For more information about
the COPP device, see COPP Device Definition Template Code and Defining the COPP Device Class.

COPPOpenVideoSession
    Initializes the COPP device used for the current video session.

COPPGetCertificateLength
    Retrieves the size, in bytes, of the certificate used by the graphics hardware.

COPPKeyExchange
    Retrieves the digital certificate used by the graphics hardware.

COPPSequenceStart
    Sets the current video session to protected mode.

COPPCommand
    Sets the protection level on the physical connector associated with the COPP device.

COPPQueryStatus
    Retrieves status on a protected video session that is associated with a COPP device.

COPPCloseVideoSession
    Closes the COPP device object and instructs the driver to release hardware resources associated with the
    COPP device.

Mapping Sample Functions to DD_MOTIONCOMPCALLBACKS


The sample functions in this section map to the motion compensation callback functions by using a COPP IOCTL,
as follows; that is, each sample function is called within its respective COPP IOCTL, and each COPP IOCTL is passed
to the EngDeviceIoControl function within its respective motion-compensation callback function.
FUNCTION                  IOCTL                              DD_MOTIONCOMPCALLBACKS MEMBER

COPPOpenVideoSession      IOCTL_COPP_OpenDevice              CreateMoComp
COPPGetCertificateLength  IOCTL_COPP_GetCertificateLength    RenderMoComp
COPPKeyExchange           IOCTL_COPP_KeyExchange             RenderMoComp
COPPSequenceStart         IOCTL_COPP_StartSequence           RenderMoComp
COPPCommand               IOCTL_COPP_Command                 RenderMoComp
COPPQueryStatus           IOCTL_COPP_Status                  RenderMoComp
COPPCloseVideoSession     IOCTL_COPP_CloseDevice             DestroyMoComp

Send comments about this topic to Microsoft


Returning Error Codes from COPP Functions
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The COPP DDI can return the following types of error codes:
Error codes that are defined in the winerror.h header file and are common to all Windows applications.
These Windows error codes start with the E_ prefix.
Error codes that are defined in the ddraw.h header file and are unique to DirectDraw. These DirectDraw error
codes start with the DDERR_ prefix.
No error codes are unique to the COPP DDI.
When implementing the COPP DDI, you should restrict your usage of Windows error codes to the following:
E_UNEXPECTED
The display driver is in an invalid state. For example, the COPPSequenceStart function was called before the
COPPKeyExchange function.
E_INVALIDARG
Input parameters passed to the driver are invalid.
E_POINTER
An output parameter, which should point to a valid address, is NULL.
The COPP DDI can return the E_FAIL and DDERR_GENERIC error codes; however, because they do not indicate what
caused the error, their use is discouraged.
The Remarks section for each COPP function specifies the DDERR_ error codes that the COPP function can report.
The COPP DDI should not be required to return any other DDERR_ error codes.
When propagating error information from the COPP DDI in the video miniport driver to the display driver, you
should not use the return value from the EngDeviceIoControl function, because the Windows kernel manipulates
the error value that is returned from the IOCTL to EngDeviceIoControl. Instead, error information should be
passed through the lpInBuffer parameter of EngDeviceIoControl. For more information, see Calling the COPP DDI
from the Display Driver and the example code in COPP Video Miniport Driver Template and Performing COPP
Operations.
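A minimal sketch of the video miniport driver side follows, assuming the COPP_IO_InputBuffer layout shown in Calling the COPP DDI from the Display Driver; the helper name and the use of the InputBuffer member to carry the returned length are assumptions. The key point is that the COPP result travels back through the phr pointer, not through the IOCTL status.

VP_STATUS HandleCoppGetCertificateLengthIoctl(
    COPP_IO_InputBuffer* pInputBuffer,
    ULONG CertificateLength)
{
    // Return the certificate length through the caller-supplied buffer and the
    // COPP result through phr.
    *(ULONG*)pInputBuffer->InputBuffer = CertificateLength;
    *pInputBuffer->phr = S_OK;
    return NO_ERROR;   // The IOCTL itself succeeds; COPP errors travel in *phr.
}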
Send comments about this topic to Microsoft
Performing COPP Operations
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The following topics describe operations performed by using the COPP DDI:
Starting a Protected Video Session
Handling Protection Levels
Handling the Loss of a COPP Device
COPP Commands
COPP Status
COPP Status Events
Send comments about this topic to Microsoft
Starting a Protected Video Session
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
To start a protected video session, the VMR should initiate operations on the DirectX VA COPP device in a specific
order. If this order is not followed, the video miniport driver should return error code E_UNEXPECTED. The video
miniport driver can determine that the correct operation order is followed by assigning a unique device-state
constant to the COPP device when an operation is performed, and then by verifying the device-state constant
before performing a subsequent operation.
To start the protected video session, calls should be made to the COPP device's functions in the following order:
1. The COPPOpenVideoSession function to initialize the COPP device. Before returning, the driver should set
the device-state constant to COPP_OPENED.
2. The COPPGetCertificateLength function to retrieve the size, in bytes, of the certificate used by the graphics
hardware. The driver should first verify that the device-state constant is currently set to COPP_OPENED. If it
is not, the driver should return E_UNEXPECTED. Before returning, the driver should set the device-state
constant to COPP_CERT_LENGTH_RETURNED.
3. The COPPKeyExchange function to retrieve the digital certificate used by the graphics hardware. The driver
should first verify that the device-state constant is currently set to COPP_CERT_LENGTH_RETURNED. If it is
not, the driver should return E_UNEXPECTED. Before returning, the driver should set the device-state
constant to COPP_KEY_EXCHANGED.
4. The COPPSequenceStart function to set the video session to protected mode. The driver should first verify
that the device-state constant is currently set to COPP_KEY_EXCHANGED. If it is not, the driver should return
E_UNEXPECTED. Before returning, the driver should set the device-state constant to COPP_SESSION_ACTIVE
to show that the video session is in the protected mode.
After the video session is set to protected mode, the video miniport driver can process COPP commands and
requests for COPP status, and pass COPP status events.
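One way to enforce this ordering is to track a per-device state value, as in the following minimal sketch. The enumeration values mirror the device-state constants named above, but the enumeration itself and the helper routine are illustrative only; they are not defined by DirectX VA.

// Illustrative per-device state used to enforce the required COPP call order.
typedef enum _COPP_DEVICE_STATE {
    COPP_OPENED,                  // Set by COPPOpenVideoSession.
    COPP_CERT_LENGTH_RETURNED,    // Set by COPPGetCertificateLength.
    COPP_KEY_EXCHANGED,           // Set by COPPKeyExchange.
    COPP_SESSION_ACTIVE           // Set by COPPSequenceStart.
} COPP_DEVICE_STATE;

// Sketch of the ordering check performed at the start of COPPSequenceStart.
// The state is assumed to live in the driver's per-session COPP device object.
HRESULT EnterProtectedMode(COPP_DEVICE_STATE* pState)
{
    if (*pState != COPP_KEY_EXCHANGED) {
        return E_UNEXPECTED;      // COPPKeyExchange has not completed yet.
    }

    // ... decrypt and validate the start sequence, record the data integrity
    // key and the starting sequence numbers ...

    *pState = COPP_SESSION_ACTIVE;
    return S_OK;
}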
Send comments about this topic to Microsoft
Handling Protection Levels
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
For each output connector of the graphics adapter that supports protection, the video miniport driver should
maintain a global reference count for each protection type and for each protection level. Note that the default
global reference counters are initialized to 0.
When the DirectX VA COPP device is created for a video session, the COPP device should contain a local reference
count for each protection type at each protection level. The driver should set the default protection-level counter for
each protection type to the value 1 and the remaining protection-level counters for each protection type to the
value 0.
When a video session sets a new protection level for a particular protection type, the driver should decrement the
reference count for the current protection level and should increment the reference count for the new protection
level. Corresponding changes should also be made to the global reference-level counters.
Whenever any global-level counters change, the driver should inspect all the counters for a particular output
connector and ensure that the protection level is set to a level that corresponds to the highest level counter whose
value is greater than 0. For more information, see the example code in the COPPCommand and COPPQueryStatus
reference pages.
While the global reference counter is greater than 0, the video miniport driver should apply content protection to
the output connector. As soon as the global reference counter reaches 0, the video miniport driver should remove
content protection from the output connector. Whenever the display driver receives a call to its DdMoCompDestroy
callback function (and, in turn, the video miniport driver receives a call to its COPPCloseVideoSession function), the
video miniport driver should decrement the global reference counter by the current level of the COPP device's local
reference counter. The video miniport driver should only remove content protection from the certified output
connector if the global reference counter for the connector reaches 0.
Note The DdMoCompDestroy function might be called while the COPP device's local reference counter is still set to
greater than 0 (for example, if the user-mode process terminated abnormally).
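A simplified sketch of this bookkeeping follows. Each array is indexed by protection level for one protection type; the local array belongs to the COPP device (the video session) and the global array to the physical connector. The arrays and helper names are hypothetical, and a real driver would also handle the extended information described in the COPPCommand reference.

// Moves a session from its current level to a new level for one protection type.
VOID MoveProtectionLevel(
    ULONG LocalLevelCount[],
    ULONG GlobalLevelCount[],
    ULONG OldLevel,
    ULONG NewLevel)
{
    LocalLevelCount[OldLevel]--;
    LocalLevelCount[NewLevel]++;
    GlobalLevelCount[OldLevel]--;
    GlobalLevelCount[NewLevel]++;
}

// Returns the protection level that should actually be applied to the
// connector: the highest level whose global reference count is greater than 0.
ULONG EffectiveProtectionLevel(const ULONG GlobalLevelCount[], ULONG NumLevels)
{
    for (ULONG Level = NumLevels; Level > 0; Level--) {
        if (GlobalLevelCount[Level - 1] > 0) {
            return Level - 1;
        }
    }
    return 0;   // No session has requested protection for this type.
}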
Send comments about this topic to Microsoft
Handling the Loss of a COPP Device
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
A video session that is set to protected mode must handle scenarios that cause the destruction of a DirectX VA
COPP device that is associated with the video session. The following scenarios initiate a call to the display driver's
DdMoCompDestroy callback function while content protection on the certified output connector for the video
session is possibly enabled:
Changing the display mode
Attaching or detaching a monitor from the Windows desktop
Entering a full-screen Command Prompt window
Starting any DirectDraw or Direct3D exclusive-mode application
Performing Fast User Switching
Locking the workstation or pressing CTRL+ALT+DELETE
Attaching to the workstation by using Remote Desktop Connection
Entering a power-saving mode--for example, suspend or hibernate
Terminating the application unexpectedly--for example, through a page fault
If one of the preceding scenarios occurs while output content protection for the video session is enabled, the
display driver's DdMoCompDestroy function should initiate a call to the video miniport driver's
COPPCloseVideoSession function to decrement the global protection-level count by the current local protection-
level count for the COPP device. The video miniport driver should then examine the modified global protection
level and adjust the protection level applied to the output connector accordingly.
Send comments about this topic to Microsoft
COPP Commands
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The video miniport driver can receive a request to perform an operation on the physical connector associated with
the DirectX VA COPP device. The video miniport driver's COPPCommand function is passed a pointer to a
DXVA_COPPCommand structure that specifies the operation to perform. The guidCommandID and
CommandData members of DXVA_COPPCommand specify the operation. The following operations are
supported:
Setting the protection level
Instructing how to protect the signal
Send comments about this topic to Microsoft
Setting the Protection Level
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The COPP command can set the protection level of a protection type on the physical connector associated with the
DirectX VA COPP device. To set the protection level, the video miniport driver's COPPCommand function receives a
pointer to a DXVA_COPPCommand structure with the guidCommandID member set to the
DXVA_COPPSetProtectionLevel GUID and the CommandData member set to a pointer to a
DXVA_COPPSetProtectionLevelCmdData structure that specifies the type of protection to set and the level at
which to set the protection. If a protection level is not available for the protection type, the COPP command sets the
protection level to COPP_NoProtectionLevelAvailable (-1). The COPP command might also specify some extended
information in the ExtendedInfoChangeMask and ExtendedInfoData members of
DXVA_COPPSetProtectionLevelCmdData for the video miniport driver to set for the protection type.
The following protection levels can be set for the indicated protection types:
For COPP_ProtectionType_ACP, set one of the following values from the COPP_ACP_Protection_Level
enumerated type:
COPP_ACP_Level0 or COPP_ACP_LevelMin (0)
COPP_ACP_Level1 (1)
COPP_ACP_Level2 (2)
COPP_ACP_Level3 or COPP_ACP_LevelMax (3)
For COPP_ProtectionType_CGMSA, set one of the following values from the COPP_CGMSA_Protection_Level
enumerated type:
COPP_CGMSA_Disabled or COPP_CGMSA_LevelMin (0)
COPP_CGMSA_CopyFreely (1)
COPP_CGMSA_CopyNoMore (2)
COPP_CGMSA_CopyOneGeneration (3)
COPP_CGMSA_CopyNever (4)
COPP_CGMSA_RedistributionControlRequired (0x08)
(COPP_CGMSA_RedistributionControlRequired + COPP_CGMSA_CopyNever) or
COPP_CGMSA_LevelMax
For COPP_ProtectionType_HDCP, set one of the following values from the COPP_HDCP_Protection_Level
enumerated type:
COPP_HDCP_Level0 or COPP_HDCP_LevelMin (0)
COPP_HDCP_Level1 or COPP_HDCP_LevelMax (1)
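As a rough sketch of the DXVA_COPPSetProtectionLevel handling described at the start of this topic, a driver might validate the requested level against the ranges listed above before updating its reference counts. The ProtType and ProtLevel member names follow the dxva.h declaration of DXVA_COPPSetProtectionLevelCmdData; verify them against the header, and note that a complete driver would also reject the unused values between valid CGMS-A levels.

HRESULT HandleSetProtectionLevel(const DXVA_COPPCommand* pCommand)
{
    const DXVA_COPPSetProtectionLevelCmdData* pLevelData =
        (const DXVA_COPPSetProtectionLevelCmdData*)pCommand->CommandData;

    switch (pLevelData->ProtType) {
    case COPP_ProtectionType_ACP:
        if (pLevelData->ProtLevel > COPP_ACP_LevelMax) {
            return E_INVALIDARG;
        }
        break;
    case COPP_ProtectionType_CGMSA:
        if (pLevelData->ProtLevel > COPP_CGMSA_LevelMax) {
            return E_INVALIDARG;
        }
        break;
    case COPP_ProtectionType_HDCP:
        if (pLevelData->ProtLevel > COPP_HDCP_LevelMax) {
            return E_INVALIDARG;
        }
        break;
    default:
        return E_INVALIDARG;
    }

    // ... update the local and global reference counts as described in
    // Handling Protection Levels, then program the output hardware ...
    return S_OK;
}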
Send comments about this topic to Microsoft
Instructing How to Protect the Signal
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The COPP command can provide instructions about how to protect the signal that goes through the physical
connector associated with the DirectX VA COPP device. To set signal protection, the video miniport driver's
COPPCommand function receives a pointer to a DXVA_COPPCommand structure with the guidCommandID
member set to the DXVA_COPPSetSignaling GUID and the CommandData member set to a pointer to a
DXVA_COPPSetSignalingCmdData structure that specifies how to protect the signal.
Send comments about this topic to Microsoft
COPP Status
4/26/2017 • 2 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The video miniport driver can receive a request for COPP status on the physical connector associated with the
DirectX VA COPP device.
The video miniport driver's COPPQueryStatus function is passed a pointer to a DXVA_COPPStatusInput structure
that contains the request. COPPQueryStatus writes the status to the DXVA_COPPStatusOutput structure to which
the pOutput parameter points. The guidStatusRequestID and StatusData members of DXVA_COPPStatusInput
specify the status request. Depending on the request, the video miniport driver should cast the status information
to a pointer to a DXVA_COPPStatusData, DXVA_COPPStatusDisplayData, DXVA_COPPStatusHDCPKeyData,
or DXVA_COPPStatusSignalingCmdData structure. The video miniport driver should then copy the status
information to the COPPStatus array member of DXVA_COPPStatusOutput.
Note The driver must return, in the rApp member of DXVA_COPPStatusData, DXVA_COPPStatusDisplayData,
DXVA_COPPStatusHDCPKeyData, or DXVA_COPPStatusSignalingCmdData, the 128-bit random number that is used only
once (the nonce). This random number was generated by the sending application and was provided in the rApp
member of DXVA_COPPStatusInput.
The driver returns the following status data for the indicated request:
For DXVA_COPPQueryProtectionType set in guidStatusRequestID and nothing set in StatusData, returns a
valid ORed combination of the following values in the dwData member of DXVA_COPPStatusData that indicate
the available types of protection mechanisms on the physical connector associated with the COPP device:
COPP_ProtectionType_Unknown
COPP_ProtectionType_None
COPP_ProtectionType_HDCP
COPP_ProtectionType_ACP
COPP_ProtectionType_CGMSA
For DXVA_COPPQueryConnectorType set in guidStatusRequestID and nothing set in StatusData, returns
one of the following values in the dwData member of DXVA_COPPStatusData that identifies the type of
physical connector the video session uses:
COPP_ConnectorType_Unknown
COPP_ConnectorType_VGA
COPP_ConnectorType_SVideo
COPP_ConnectorType_CompositeVideo
COPP_ConnectorType_ComponentVideo
COPP_ConnectorType_DVI
COPP_ConnectorType_HDMI
COPP_ConnectorType_LVDS
COPP_ConnectorType_TMDS
COPP_ConnectorType_D_JPN
The driver can also combine the COPP_ConnectorType_Internal (0x80000000) value with one of the
preceding connector-type values to indicate that the connection between the graphics adapter and display
monitor is permanent and not accessible from the outside of a non-user-serviceable enclosure.
For DXVA_COPPQueryLocalProtectionLevel or DXVA_COPPQueryGlobalProtectionLevel set in
guidStatusRequestID and the protection type set in StatusData, returns a protection-level value in the
dwData member of DXVA_COPPStatusData. For possible protection levels, see COPP Commands. The
DXVA_COPPQueryLocalProtectionLevel request returns the currently set protection level for the video
session. The DXVA_COPPQueryGlobalProtectionLevel request returns the currently set protection level for
the physical connector.
The COPP status query might also request that the video miniport driver retrieve some extended
information.
For DXVA_COPPQueryBusData set in guidStatusRequestID and nothing in StatusData, returns one of the
following values in the dwData member of DXVA_COPPStatusData that identifies the type of bus used by
the graphics hardware:
COPP_BusType_Unknown
COPP_BusType_PCI
COPP_BusType_PCIX
COPP_BusType_PCIExpress
COPP_BusType_AGP
The driver can only combine the COPP_BusType_Integrated (0x80000000) value with one of the preceding
bus-type values when none of the command and status interface signals between the graphics adapter and
other subsystems are available on an expansion bus that uses a publicly available specification and standard
connector type. Memory buses are excluded from this definition.
For DXVA_COPPQueryDisplayData set in guidStatusRequestID and nothing set in StatusData, returns
information in a DXVA_COPPStatusDisplayData structure that describes the display mode of the signal
that is transmitted over the connector associated with a DirectX VA COPP device.
For DXVA_COPPQueryHDCPKeyData set in guidStatusRequestID and nothing set in StatusData, returns
information in a DXVA_COPPStatusHDCPKeyData structure that describes a High-bandwidth Digital
Content Protection (HDCP) key selection vector (KSV).
For DXVA_COPPQuerySignaling set in guidStatusRequestID and nothing set in StatusData, returns
information in a DXVA_COPPStatusSignalingCmdData structure that describes how the signal that goes
through the physical connector associated with the DirectX VA COPP device is protected.
The COPP status query might also request that the video miniport driver retrieve some extended
information.
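A hedged sketch of one of these requests follows. It shows only the DXVA_COPPQueryConnectorType case and the required echo of the application's random number; the MAC over the output, the remaining request GUIDs, and other bookkeeping are omitted, and the connector type reported is only an example.

HRESULT HandleQueryStatus(const DXVA_COPPStatusInput* pInput,
                          DXVA_COPPStatusOutput* pOutput)
{
    if (pInput->guidStatusRequestID == DXVA_COPPQueryConnectorType) {
        DXVA_COPPStatusData StatusData = {};

        // Echo the application's 128-bit random number back unchanged.
        StatusData.rApp = pInput->rApp;

        // Report, for example, a DVI connector that is internal to the enclosure.
        StatusData.dwData = COPP_ConnectorType_DVI | COPP_ConnectorType_Internal;

        // Report any pending status events (for example, COPP_LinkLost) in dwFlags,
        // as described in COPP Status Events; none are pending here.
        StatusData.dwFlags = 0;

        memcpy(pOutput->COPPStatus, &StatusData, sizeof(StatusData));
        return S_OK;
    }

    // ... handle the other status request GUIDs ...
    return E_INVALIDARG;
}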
Send comments about this topic to Microsoft
COPP Status Events
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
External events can alter the nature of the protection that is applied to a connector or even modify the type of the
connector. The video miniport driver must report these events to COPP applications whenever the driver receives a
call to its COPPQueryStatus function. The video miniport driver must report the following external events by
returning the specified flags only on the next call to COPPQueryStatus after the events occur:
Connection integrity: If the connection between the computer and the display device becomes unplugged,
the video miniport driver should set the COPP_LinkLost flag in the dwFlags member of the
DXVA_COPPStatusData structure, the DXVA_COPPStatusDisplayData structure, the
DXVA_COPPStatusHDCPKeyData structure, or the DXVA_COPPStatusSignalingCmdData structure.
Connector reconfigurations: If the user causes the configuration of the physical connector to change, the
video miniport driver should set the COPP_RenegotiationRequired flag in the dwFlags member of the
DXVA_COPPStatusData, DXVA_COPPStatusDisplayData, DXVA_COPPStatusHDCPKeyData, or
DXVA_COPPStatusSignalingCmdData structure.
The video miniport driver returns a pointer to a DXVA_COPPStatusData, DXVA_COPPStatusDisplayData,
DXVA_COPPStatusHDCPKeyData, or DXVA_COPPStatusSignalingCmdData structure in the COPPStatus array
member of the DXVA_COPPStatusOutput structure. A pointer to DXVA_COPPStatusOutput is returned through
the pOutput parameter of COPPQueryStatus.
For example, consider two media playback applications, A and B, each controlling, via COPP, the HDCP protection
level of the connector that attaches the computer to the display monitor. Each application controls its own unique
COPP DirectX VA device. If the connector becomes unplugged, then the next time either application initiates a
COPPQueryStatus request to its COPP device, the video miniport driver should return the COPP_LinkLost flag.
Assume application A is the first to initiate a call to COPPQueryStatus on its COPP device. Application A then
receives the COPP_LinkLost flag and acts accordingly. If application A initiates a subsequent COPPQueryStatus call,
it should not receive the COPP_LinkLost flag, unless the connector becomes unplugged again. When application B
initiates a call to COPPQueryStatus on its COPP device, it receives the COPP_LinkLost flag and acts accordingly.
Again, application B should not receive the COPP_LinkLost flag again until the connector becomes unplugged
again.
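One simple way to meet this report-once-per-device requirement is to latch pending event flags on each COPP device, as in the following sketch; the PendingStatusFlags storage and the helper names are hypothetical.

// Called from the display-change or hot-plug path for every active COPP device.
VOID RecordLinkLoss(ULONG* pPendingStatusFlags)
{
    *pPendingStatusFlags |= COPP_LinkLost;
}

// Called while building the status structure for a COPPQueryStatus request.
ULONG ConsumePendingStatusFlags(ULONG* pPendingStatusFlags)
{
    ULONG flags = *pPendingStatusFlags;
    *pPendingStatusFlags = 0;     // Each event is reported only once per device.
    return flags;
}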
Send comments about this topic to Microsoft
Implementation Tips and Requirements for COPP
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The following topics discuss tips and requirements for implementing COPP functionality in display and video
miniport drivers:
COPP and Multiple-Monitor Support
COPP and ChangeDisplaySettingsEx
COPP and Display Modes
Send comments about this topic to Microsoft
COPP and Multiple-Monitor Support
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The only multiple-monitor mode supported by COPP is DualView. Various clone and theater modes are not
supported. The only exception to this rule is the case where a graphics adapter uses both composite and S-Video
connectors and simultaneously feeds the same display signal through both connectors. In this case, the video
miniport driver should report that the connector is S-Video and should ensure that the appropriate protections are
applied to both display outputs when requested by COPP calls initiated through applications.
Send comments about this topic to Microsoft
COPP and ChangeDisplaySettingsEx
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Because applications can alter analog content protection (ACP) levels by calling the Microsoft Win32
ChangeDisplaySettingsEx function, the video miniport driver should ensure that adjustments to the ACP
protection type through ChangeDisplaySettingsEx are independent of adjustments made by the
IAMCertifiedOutputProtection interface. In other words, if the ACP protection type is set on the physical
connector through the video miniport driver's COPPCommand function, the video miniport driver should not
permit disabling the ACP protection type on the physical connector through an
IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS request. Note that user-mode calls to ChangeDisplaySettingsEx
initiate IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS requests to the video miniport driver.
For more information about the ChangeDisplaySettingsEx function, see the Windows SDK documentation.
Send comments about this topic to Microsoft
COPP and Display Modes
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The video miniport driver should report all the protection types that are supported on the physical connector
associated with the DirectX VA COPP device, regardless of the display mode currently being used. The video
miniport driver reports supported protection types when it receives a call to its COPPQueryStatus function with the
DXVA_COPPQueryProtectionType set in the guidStatusRequestID member of the DXVA_COPPStatusInput
structure. For more information, see COPP Status.
If the current resolution is too high for a particular protection type, then when the video miniport driver's
COPPCommand function is called to set the protection level for that protection type, the driver should return an
error. The following scenarios give examples of when the driver's COPPCommand function should return success
or an error:
If the DirectX VA COPP device is associated with an S-Video output connector, a call to the video miniport
driver's COPPQueryStatus function with DXVA_COPPQueryProtectionType set should indicate support of the
analog content protection (ACP) type (COPP_ProtectionType_ACP). Thereafter, if the driver's
COPPCommand function is called to set a level for the ACP type on this connector, the driver should return
success because the output resolution of S-Video is fixed, even though desktop resolution (display mode)
might be higher.
If the DirectX VA COPP device is associated with component output connectors, a call to the video miniport
driver's COPPQueryStatus function with DXVA_COPPQueryProtectionType set should also indicate support
of the ACP type. However, if the driver's COPPCommand function is called to set a level for the ACP type on
this output when the display resolution is 720p or 1080i, the driver should return the DDERR_TOOBIGSIZE
error code because the resolution is too high to set the protection level for the ACP type on component
output connectors.
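A rough sketch of the component-output check described in the second scenario follows. The connector-type and display-mode values are assumed to be tracked elsewhere in the driver, and the height test is only one possible way to detect 720p and 1080i modes.

HRESULT CheckAcpResolution(ULONG ConnectorType, ULONG DisplayWidth, ULONG DisplayHeight)
{
    if (ConnectorType == COPP_ConnectorType_ComponentVideo) {
        // 720p and 1080i modes are too high-resolution for ACP on component output.
        if (DisplayHeight >= 720) {
            return DDERR_TOOBIGSIZE;
        }
    }
    return S_OK;
}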
Send comments about this topic to Microsoft
Example Code for DirectX VA Devices
4/26/2017 • 1 min to read • Edit Online

The example code provided in this section shows implementations of a motion-compensation code template and
a Certified Output Protection Protocol (COPP) video miniport driver template. The motion-compensation code
template is used to access ProcAmp control, deinterlacing, and COPP functionality. The COPP video miniport
driver template is used to access COPP functionality. Using these templates can simplify your display driver and
video miniport driver development. However, you are not required to implement access to ProcAmp control,
deinterlacing, and COPP functionality in this manner for your drivers to work correctly.
This section includes:
Motion Compensation Code Template
COPP Video Miniport Driver Template
Send comments about this topic to Microsoft
Motion Compensation Code Template
4/26/2017 • 1 min to read • Edit Online

The example code provided in this section shows an implementation of a motion-compensation code template that
is used to access ProcAmp control, deinterlacing, and Certified Output Protection Protocol (COPP) functionality.
Using this template can simplify your display driver development. However, you are not required to implement
access to ProcAmp control, deinterlacing, and COPP functionality in this manner for your display driver to work
correctly.
If your driver supports other DirectX VA functions, such as decoding MPEG-2 video streams, then extend the
example code to include processing of additional DirectX VA GUIDs.
This section includes:
Defining DirectX VA Device Classes
Retrieving DirectX VA Devices
Creating Instances of DirectX VA Device Objects
Performing ProcAmp Control and Deinterlacing Operations
Performing Deinterlacing with Substream Compositing Operations
Performing COPP Operations
Deleting Instances of DirectX VA Device Objects
Send comments about this topic to Microsoft
Defining DirectX VA Device Classes
4/26/2017 • 1 min to read • Edit Online

Use the example code in this section to define device classes for the deinterlace container device, ProcAmp control
device, deinterlace mode device (for example, bob), and COPP device. These device classes contain declarations for
member functions that comprise the ProcAmp Control DDI and Deinterlace DDI. These device class definitions can
be declared in a driver-supplied header file.
Use the following example code to define each device type and a base class that applies to each device type:

// These enumerated types specify the DirectX VA device class.


enum DXVA_DeviceType {
DXVA_DeviceContainer = 0x0001,
DXVA_DeviceDecoder = 0x0002,
DXVA_DeviceDeinterlacer = 0x0003,
DXVA_DeviceProcAmpControl = 0x0004,
DXVA_DeviceCOPP = 0x0005
};
// Other DirectX VA device classes inherit from this base class,
struct DXVA_DeviceBaseClass {
GUID m_DeviceGUID;
DXVA_DeviceType m_DeviceType;

DXVA_DeviceBaseClass(const GUID& guid, DXVA_DeviceType Type) :


m_DeviceGUID(guid), m_DeviceType(Type)
{}
};

The following topics contain example code that defines classes for the deinterlace container device, ProcAmp
control device, deinterlace bob device, and COPP device:
Defining the Deinterlace Container Device Class
Defining the ProcAmp Control Device Class
Defining the Deinterlace Bob Device Class
Defining the COPP Device Class
Send comments about this topic to Microsoft
Defining the Deinterlace Container Device Class
4/26/2017 • 1 min to read • Edit Online

Use the following example code to define the deinterlace container device class:

// Deinterlace container device class.


struct DXVA_DeinterlaceContainerDeviceClass : public DXVA_DeviceBaseClass
{
// Uses the base class's constructor.
DXVA_DeinterlaceContainerDeviceClass(const GUID& guid, DXVA_DeviceType Type) :
DXVA_DeviceBaseClass(guid, Type)
{}
// Part of the Deinterlace DDI.
HRESULT DeinterlaceQueryAvailableModes(
LPDXVA_VideoDesc lpVideoDescription,
LPDWORD lpdwNumModesSupported,
LPGUID pGuidsDeinterlaceModes
);
// Part of the Deinterlace DDI.
HRESULT DeinterlaceQueryModeCaps(
LPGUID pGuidDeinterlaceMode,
LPDXVA_VideoDesc lpVideoDescription,
LPDXVA_DeinterlaceCaps lpDeinterlaceCaps
);
// Part of the ProcAmp Control DDI.
HRESULT ProcAmpControlQueryCaps(
LPDXVA_VideoDesc lpVideoDescription,
LPDXVA_ProcAmpControlCaps lpProcAmpControlCaps
);
// Part of the ProcAmp Control DDI.
HRESULT ProcAmpControlQueryRange(
DWORD VideoProperty,
LPDXVA_VideoDesc lpVideoDescription,
LPDXVA_VideoPropertyRange lpProcAmpControlRange
);
};

Send comments about this topic to Microsoft


Defining the ProcAmp Control Device Class
4/26/2017 • 1 min to read • Edit Online

Use the following example code to define the ProcAmp control device class:

// ProcAmp control device class.


struct DXVA_ProcAmpControlDeviceClass : public DXVA_DeviceBaseClass
{
DXVA_VideoDesc m_VideoDesc;
// Uses the base class's constructor.
DXVA_ProcAmpControlDeviceClass(const GUID& guid, DXVA_DeviceType Type) :
DXVA_DeviceBaseClass(guid, Type)
{}
// The following three functions are part of the
// ProcAmp Control DDI.
HRESULT ProcAmpControlOpenStream(LPDXVA_VideoDesc lpVideoDescription);
HRESULT ProcAmpControlCloseStream();
HRESULT ProcAmpControlBlt(
LPVOID lpDDSDstSurface,
LPVOID lpDDSSrcSurface,
DXVA_ProcAmpControlBlt* pCcBlt);
};

Send comments about this topic to Microsoft


Defining the Deinterlace Bob Device Class
4/26/2017 • 1 min to read • Edit Online

Use the following example code to define the deinterlace bob device class:

// Deinterlace bob device class.


struct DXVA_DeinterlaceBobDeviceClass : public DXVA_DeviceBaseClass
{
DXVA_VideoDesc m_VideoDesc;
// Uses the base class's constructor.
DXVA_DeinterlaceBobDeviceClass(const GUID& guid, DXVA_DeviceType Type) :
DXVA_DeviceBaseClass(guid, Type)
{}
// The following functions are part of the
// Deinterlace DDI.
HRESULT DeinterlaceOpenStream(LPDXVA_VideoDesc lpVideoDescription);
HRESULT DeinterlaceCloseStream();
HRESULT DeinterlaceBlt(
REFERENCE_TIME rtTargetFrame,
LPRECT lprcDstRect,
LPDDSURFACE lpDDSDstSurface,
LPRECT lprcSrcRect,
LPDXVA_VideoSample lpDDSrcSurfaces,
DWORD dwNumSurfaces,
FLOAT fAlpha);
HRESULT DeinterlaceBltEx(
REFERENCE_TIME rtTargetFrame,
LPRECT lprcTargetRect,
DXVA_AYUVsample2 BackgroundColor,
DWORD dwDestinationFormat,
DWORD dwDestinationFlags,
LPDDSURFACE lpDDSDstSurface,
LPDXVA_VideoSample2 lpDDSrcSurfaces,
DWORD dwNumSurfaces,
FLOAT fAlpha);
};

Send comments about this topic to Microsoft


Defining the COPP Device Class
4/26/2017 • 1 min to read • Edit Online

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to define the COPP device class:

// COPP device class.


struct DXVA_COPPDeviceClass : public DXVA_DeviceBaseClass
{
VOID* m_pThis;
};

Send comments about this topic to Microsoft


Retrieving DirectX VA Devices
4/26/2017 • 1 min to read • Edit Online

Use the following example code to retrieve DirectX VA devices. This code is an implementation of the
DdMoCompGetGuids callback function. The GetMoCompGuids member of the DD_MOTIONCOMPCALLBACKS
structure points to the callback function.

DWORD g_dwDXVANumSupportedGUIDs = 4;
const GUID* g_DXVASupportedGUIDs[4] = {
&DXVA_DeinterlaceContainerDevice,
&DXVA_ProcAmpControlDevice,
&DXVA_DeinterlaceBobDevice,
&DXVA_COPPDevice
};

DWORD APIENTRY
MOCOMPCB_GETGUIDS(
PDD_GETMOCOMPGUIDSDATA lpData
)
{
DWORD dwNumToCopy;

// If lpGuids == NULL, the driver must return the number of
// supported GUIDS in the dwNumGuids parameter. If non-NULL,
// the supported GUIDS must be copied into the buffer at lpGuids.
if (lpData->lpGuids) {
dwNumToCopy = min(g_dwDXVANumSupportedGUIDs, lpData->dwNumGuids);
for (DWORD i = 0; i < dwNumToCopy; i++) {
lpData->lpGuids[i] = *g_DXVASupportedGUIDs[i];
}
}
else {
dwNumToCopy = g_dwDXVANumSupportedGUIDs;
}

lpData->dwNumGuids = dwNumToCopy;
lpData->ddRVal = DD_OK;

return DDHAL_DRIVER_HANDLED;
}

Send comments about this topic to Microsoft


Creating Instances of DirectX VA Device Objects
4/26/2017 • 1 min to read • Edit Online

Use the following example code to create instances of DirectX VA device objects. This code is an implementation of
the DdMoCompCreate callback function. The CreateMoComp member of the DD_MOTIONCOMPCALLBACKS
structure points to the callback function.

// Determine that the passed in GUID is valid.


BOOL
ValidDXVAGuid(LPGUID lpGuid) {
// See Retrieving DirectX VA Devices, for more information
// about g_dwDXVANumSupportedGUIDs and g_DXVASupportedGUIDs[].
for (DWORD i = 0; i < g_dwDXVANumSupportedGUIDs; i++) {
if (*g_DXVASupportedGUIDs[i] == *lpGuid) {
return TRUE;
}
}
return FALSE;
}

DWORD APIENTRY
MOCOMPCB_CREATE(
PDD_CREATEMOCOMPDATA lpData
)
{
// Determine that the passed in GUID is valid.
if (!ValidDXVAGuid(lpData->lpGuid)) {
lpData->ddRVal = E_INVALIDARG;
return DDHAL_DRIVER_HANDLED;
}
// Determine that this is the deinterlace container device GUID.
if (*lpData->lpGuid == DXVA_DeinterlaceContainerDevice) {
DXVA_DeinterlaceContainerDeviceClass* lpDev =
new DXVA_DeinterlaceContainerDeviceClass(*lpData->lpGuid,
DXVA_DeviceContainer);
if (lpDev) {
lpData->ddRVal = DD_OK;
}
else {
lpData->ddRVal = E_OUTOFMEMORY;
}
lpData->lpMoComp->lpDriverReserved1 =
(LPVOID)(DXVA_DeviceBaseClass*)lpDev;
return DDHAL_DRIVER_HANDLED;
}
//
// Determine that this is the ProcAmp control device GUID.
if (*lpData->lpGuid == DXVA_ProcAmpControlDevice) {
DXVA_ProcAmpControlDeviceClass* lpDev =
new DXVA_ProcAmpControlDeviceClass(*lpData->lpGuid,
DXVA_DeviceProcAmpControl);
if (lpDev) {
LPDXVA_VideoDesc lpVideoDescription =
(LPDXVA_VideoDesc)lpData->lpData;
// Part of the ProcAmp Control DDI.
lpData->ddRVal = lpDev->ProcAmpControlOpenStream(
lpVideoDescription);
if (lpData->ddRVal != DD_OK) {
delete lpDev;
lpDev = NULL;
}
}
else lpData->ddRVal = E_OUTOFMEMORY;
lpData->lpMoComp->lpDriverReserved1 =
(LPVOID)(DXVA_DeviceBaseClass*)lpDev;
return DDHAL_DRIVER_HANDLED;
}

// Determine that this is the deinterlace bob device GUID.
if (*lpData->lpGuid == DXVA_DeinterlaceBobDevice) {
DXVA_DeinterlaceBobDeviceClass* lpDev =
new DXVA_DeinterlaceBobDeviceClass(*lpData->lpGuid,
DXVA_DeviceDeinterlacer);
if (lpDev) {
LPDXVA_VideoDesc lpVideoDescription =
(LPDXVA_VideoDesc)lpData->lpData;
// Part of the Deinterlace DDI.
lpData->ddRVal = lpDev->DeinterlaceOpenStream(
lpVideoDescription);
if (lpData->ddRVal != DD_OK) {
delete lpDev;
lpDev = NULL;
}
}
else lpData->ddRVal = E_OUTOFMEMORY;
lpData->lpMoComp->lpDriverReserved1 =
(LPVOID)(DXVA_DeviceBaseClass*)lpDev;
return DDHAL_DRIVER_HANDLED;
}

// Determine that this is the COPP device GUID.
if (*lpData->lpGuid == DXVA_COPPDevice) {
DXVA_COPPDeviceClass* lpDev =
new DXVA_COPPDeviceClass(*lpData->lpGuid, DXVA_DeviceCOPP);
if (lpDev) {
// Determine the correct DevID of the graphics device that
// the COPP device is attached to.
ULONG DevID = 0;
ULONG BytesReturned;
COPP_IO_InputBuffer InputBuffer;
InputBuffer.ppThis = &lpDev->m_pThis;
InputBuffer.InputBuffer = &DevID;
// Pass, to the video miniport driver, a
// pointer to the error variable.
InputBuffer.phr = &lpData->ddRVal;
EngDeviceIoControl(
(HANDLE)GetDriverHandleFromPDEV(lpData->lpDD->lpGbl->dhpdev),
IOCTL_COPP_OpenDevice,
&InputBuffer,
sizeof(InputBuffer),
NULL,
0,
&BytesReturned);
if (lpData->ddRVal != DD_OK) {
delete lpDev;
lpDev = NULL;
}
}
else {
lpData->ddRVal = E_OUTOFMEMORY;
}
lpData->lpMoComp->lpDriverReserved1 = (LPVOID)(DXVA_DeviceBaseClass*)lpDev;
return DDHAL_DRIVER_HANDLED;
}
lpData->ddRVal = DDERR_CURRENTLYNOTAVAIL;
return DDHAL_DRIVER_HANDLED;
}

Send comments about this topic to Microsoft


Performing ProcAmp Control and Deinterlacing
Operations
4/26/2017 • 1 min to read • Edit Online

Use the following example code to perform ProcAmp control and deinterlacing operations. This code is an
implementation of the DdMoCompRender callback function. The RenderMoComp member of the
DD_MOTIONCOMPCALLBACKS structure points to the callback function.

DWORD APIENTRY
MOCOMPCB_RENDER(
PDD_RENDERMOCOMPDATA lpData
)
{
// The driver saves the device class object in lpDriverReserved1
// during the DdMoCompCreate callback. For more information,
// see Creating Instances of DirectX VA Device Objects.
DXVA_DeviceBaseClass* pDXVABase =
(DXVA_DeviceBaseClass*)lpData->lpMoComp->lpDriverReserved1;
if (pDXVABase == NULL) {
lpData->ddRVal = E_POINTER;
return DDHAL_DRIVER_HANDLED;
}
// Process according to the device type in the class object.
// For more information, see Defining DirectX VA Device Classes.
switch (pDXVABase->m_DeviceType) {
// This is the deinterlace container device.
case DXVA_DeviceContainer:
switch (lpData->dwFunction) {
case DXVA_DeinterlaceQueryAvailableModesFnCode:
{
DXVA_DeinterlaceContainerDeviceClass* pDXVADev =
(DXVA_DeinterlaceContainerDeviceClass*)pDXVABase;

DXVA_DeinterlaceQueryAvailableModes* pQAM =
(DXVA_DeinterlaceQueryAvailableModes*)lpData->lpOutputData;

// Part of the Deinterlace DDI.


lpData->ddRVal =
pDXVADev->DeinterlaceQueryAvailableModes(
(DXVA_VideoDesc*)lpData->lpInputData,
&pQAM->NumGuids,
&pQAM->Guids[0]);
}
return DDHAL_DRIVER_HANDLED;

case DXVA_DeinterlaceQueryModeCapsFnCode:
{
DXVA_DeinterlaceContainerDeviceClass* pDXVADev =
(DXVA_DeinterlaceContainerDeviceClass*)pDXVABase;

DXVA_DeinterlaceQueryModeCaps* pQMC =
(DXVA_DeinterlaceQueryModeCaps*)lpData->lpInputData;

DXVA_DeinterlaceCaps*pDC =
(DXVA_DeinterlaceCaps*)lpData->lpOutputData;

// Part of the Deinterlace DDI.


lpData->ddRVal = pDXVADev->DeinterlaceQueryModeCaps(
&pQMC->Guid,
&pQMC->VideoDesc,
pDC);
}
return DDHAL_DRIVER_HANDLED;

case DXVA_ProcAmpControlQueryCapsFnCode:
{
DXVA_DeinterlaceContainerDeviceClass* pDXVADev =
(DXVA_DeinterlaceContainerDeviceClass*)pDXVABase;

DXVA_VideoDesc* pVideoDesc =
(DXVA_VideoDesc *)lpData->lpInputData;

DXVA_ProcAmpControlCaps* pCC =
(DXVA_ProcAmpControlCaps*)lpData->lpOutputData;

// Part of the ProcAmp Control DDI.


lpData->ddRVal = pDXVADev->ProcAmpControlQueryCaps(
pVideoDesc,
pCC);
}
return DDHAL_DRIVER_HANDLED;

case DXVA_ProcAmpControlQueryRangeFnCode:
{
DXVA_DeinterlaceContainerDeviceClass* pDXVADev =
(DXVA_DeinterlaceContainerDeviceClass*)pDXVABase;

DXVA_ProcAmpControlQueryRange* pccqr =
(DXVA_ProcAmpControlQueryRange *)lpData->lpInputData;

DXVA_VideoPropertyRange* pPR =
(DXVA_VideoPropertyRange*)lpData->lpOutputData;

// Part of the ProcAmp Control DDI.
lpData->ddRVal = pDXVADev->ProcAmpControlQueryRange(
pccqr->ProcAmpControlProp,
&pccqr->VideoDesc,
pPR);
}
return DDHAL_DRIVER_HANDLED;

default:
lpData->ddRVal = E_INVALIDARG;
return DDHAL_DRIVER_HANDLED;
}
break;
// This is the ProcAmp control device.
case DXVA_DeviceProcAmpControl:
switch (lpData->dwFunction) {
case DXVA_ProcAmpControlBltFnCode:
{
DXVA_ProcAmpControlDeviceClass* pDXVADev =
(DXVA_ProcAmpControlDeviceClass*)pDXVABase;

DXVA_ProcAmpControlBlt* lpBlt =
(DXVA_ProcAmpControlBlt*)lpData->lpInputData;
LPDDMOCOMPBUFFERINFO lpBuffInfo = lpData->lpBufferInfo;
// Part of the ProcAmp Control DDI.
lpData->ddRVal = pDXVADev->ProcAmpControlBlt(
lpBuffInfo[0].lpCompSurface,
lpBuffInfo[1].lpCompSurface,
lpBlt);
}
return DDHAL_DRIVER_HANDLED;

default:
lpData->ddRVal = E_INVALIDARG;
return DDHAL_DRIVER_HANDLED;
}
break;
// This is the deinterlace bob device.
case DXVA_DeviceDeinterlacer:
switch (lpData->dwFunction) {
case DXVA_DeinterlaceBltFnCode:
{
DXVA_DeinterlaceBobDeviceClass* pDXVADev =
(DXVA_DeinterlaceBobDeviceClass*)pDXVABase;

DXVA_DeinterlaceBlt* lpBlt =
(DXVA_DeinterlaceBlt*)lpData->lpInputData;
LPDDMOCOMPBUFFERINFO lpBuffInfo = lpData->lpBufferInfo;

for (DWORD i = 0; i < lpBlt->NumSourceSurfaces; i++) {
lpBlt->Source[i].lpDDSSrcSurface =
lpBuffInfo[1 + i].lpCompSurface;
}
// Part of the Deinterlace DDI.
lpData->ddRVal = pDXVADev->DeinterlaceBlt(
lpBlt->rtTarget,
&lpBlt->DstRect,
lpBuffInfo[0].lpCompSurface,
&lpBlt->SrcRect,
lpBlt->Source,
lpBlt->NumSourceSurfaces,
lpBlt->Alpha);
}
return DDHAL_DRIVER_HANDLED;

default:
lpData->ddRVal = E_INVALIDARG;
return DDHAL_DRIVER_HANDLED;
}
break;
}

lpData->ddRVal = DDERR_CURRENTLYNOTAVAIL;
return DDHAL_DRIVER_HANDLED;
}


Performing Deinterlacing with Substream Compositing Operations

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to perform operations that combine deinterlacing the video stream and
compositing video substreams on top of the video stream. The example code implements the DdMoCompRender
callback function. The RenderMoComp member of the DD_MOTIONCOMPCALLBACKS structure points to the
callback function. The example code only shows how DdMoCompRender is used for deinterlacing with substream
compositing operations. For an implementation of DdMoCompRender that performs ProcAmp control and
deinterlacing operations, see Performing ProcAmp Control and Deinterlacing Operations.
DWORD APIENTRY
MOCOMPCB_RENDER(PDD_RENDERMOCOMPDATA lpData)
{
// The driver saves the device class object in lpDriverReserved1
// during the DdMoCompCreate callback. For more information,
// see Creating Instances of DirectX VA Device Objects.
DXVA_DeviceBaseClass* pDXVABase =
(DXVA_DeviceBaseClass*)lpData->lpMoComp->lpDriverReserved1;
if (pDXVABase == NULL) {
lpData->ddRVal = E_POINTER;
return DDHAL_DRIVER_HANDLED;
}
switch (lpData->dwFunction) {
case DXVA_DeinterlaceBltExFnCode:
{
DXVA_DeinterlaceBobDeviceClass* pDXVADev =
(DXVA_DeinterlaceBobDeviceClass*)pDXVABase;
DXVA_DeinterlaceBltEx* lpBlt =
(DXVA_DeinterlaceBltEx*)lpData->lpInputData;
LPDDMOCOMPBUFFERINFO lpBuffInfo = lpData->lpBufferInfo;

for (DWORD i = 0; i < lpBlt->NumSourceSurfaces; i++) {
lpBlt->Source[i].lpDDSSrcSurface =
lpBuffInfo[1 + i].lpCompSurface;
}

// Part of the Deinterlace DDI.
lpData->ddRVal = pDXVADev->DeinterlaceBltEx(
lpBlt->rtTarget,
&lpBlt->rcTarget,
lpBlt->BackgroundColor,
lpBlt->DestinationFormat,
lpBlt->DestinationFlags,
lpBuffInfo[0].lpCompSurface,
lpBlt->Source,
lpBlt->NumSourceSurfaces,
lpBlt->Alpha);
return DDHAL_DRIVER_HANDLED;
}
default:
lpData->ddRVal = E_INVALIDARG;
return DDHAL_DRIVER_HANDLED;
}

lpData->ddRVal = DDERR_CURRENTLYNOTAVAIL;
return DDHAL_DRIVER_HANDLED;
}


Performing COPP Operations Example

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to perform operations over the Certified Output Protection Protocol (COPP). The
example code implements the DdMoCompRender callback function. The RenderMoComp member of the
DD_MOTIONCOMPCALLBACKS structure points to the callback function. The example code only shows how
DdMoCompRender is used for COPP operations. For an implementation of DdMoCompRender that performs
ProcAmp control and deinterlacing operations, see Performing ProcAmp Control and Deinterlacing Operations and
Performing Deinterlacing with Substream Compositing Operations.

DWORD APIENTRY
MOCOMPCB_RENDER(
PDD_RENDERMOCOMPDATA lpData
)
{
// The driver saves the device class object in lpDriverReserved1
// during the DdMoCompCreate callback. For more information,
// see Creating Instances of DirectX VA Device Objects.
DXVA_DeviceBaseClass* pDXVABase =
(DXVA_DeviceBaseClass*)lpData->lpMoComp->lpDriverReserved1;
if (pDXVABase == NULL) {
lpData->ddRVal = E_POINTER;
return DDHAL_DRIVER_HANDLED;
}
// Process according to the device type in the class object.
// For more information, see Defining DirectX VA Device Classes.
switch (pDXVABase->m_DeviceType) {
// This is the COPP device.
case DXVA_DeviceCOPP:
{
DXVA_COPPDeviceClass* pDXVACopp =
(DXVA_COPPDeviceClass*)pDXVABase;
ULONG BytesReturned;
HANDLE handle = (HANDLE)GetDriverHandleFromPDEV(lpData->lpDD->lpGbl->dhpdev);
COPP_IO_InputBuffer InputBuffer;
InputBuffer.ppThis = &pDXVACopp->m_pThis;
InputBuffer.phr = &lpData->ddRVal;
switch (lpData->dwFunction) {
case DXVA_COPPGetCertificateLengthFnCode:
if (lpData->dwOutputDataSize < sizeof(ULONG)) {
lpData->ddRVal = E_INVALIDARG;
}
else {
InputBuffer.InputBuffer = NULL;
EngDeviceIoControl(handle,
IOCTL_COPP_GetCertificateLength,
&InputBuffer,
sizeof(InputBuffer),
lpData->lpOutputData,
lpData->dwOutputDataSize,
&BytesReturned);
}
break;
case DXVA_COPPKeyExchangeFnCode:
if (lpData->dwOutputDataSize < sizeof(DXVA_COPPKeyExchangeOutput)) {
lpData->ddRVal = E_INVALIDARG;
}
else {
InputBuffer.InputBuffer = NULL;
DD_SURFACE_LOCAL* lpCompSurf =
lpData->lpBufferInfo[0].lpCompSurface;
InputBuffer.InputBuffer = (PVOID)lpCompSurf->lpGbl->fpVidMem;
EngDeviceIoControl(handle,
IOCTL_COPP_KeyExchange,
&InputBuffer,
sizeof(InputBuffer),
lpData->lpOutputData,
lpData->dwOutputDataSize,
&BytesReturned);
}
break;
case DXVA_COPPSequenceStartFnCode:
if (lpData->dwInputDataSize < sizeof(DXVA_COPPSignature)) {
lpData->ddRVal = E_INVALIDARG;
}
else {
InputBuffer.InputBuffer = lpData->lpInputData;
EngDeviceIoControl(handle,
IOCTL_COPP_StartSequence,
&InputBuffer,
sizeof(InputBuffer),
NULL,
0,
&BytesReturned);
}
break;
case DXVA_COPPCommandFnCode:
if (lpData->dwInputDataSize < sizeof(DXVA_COPPCommand)) {
lpData->ddRVal = E_INVALIDARG;
}
else {
InputBuffer.InputBuffer = lpData->lpInputData;
EngDeviceIoControl(handle,
IOCTL_COPP_Command,
&InputBuffer,
sizeof(InputBuffer),
NULL,
0,
&BytesReturned);
}
break;
case DXVA_COPPQueryStatusFnCode:
if (lpData->dwInputDataSize < sizeof(DXVA_COPPStatusInput) ||
lpData->dwOutputDataSize < sizeof(DXVA_COPPStatusOutput)) {
lpData->ddRVal = E_INVALIDARG;
}
else {
InputBuffer.InputBuffer = lpData->lpInputData;
EngDeviceIoControl(handle,
IOCTL_COPP_Status,
&InputBuffer,
sizeof(InputBuffer),
lpData->lpOutputData,
lpData->dwOutputDataSize,
&BytesReturned);
}
break;
default:
lpData->ddRVal = E_INVALIDARG;
break;
}
break;
}
}
return DDHAL_DRIVER_HANDLED;
}

Deleting Instances of DirectX VA Device Objects

Use the following example code to delete instances of DirectX VA device objects. This code is an implementation of
the DdMoCompDestroy callback function. The DestroyMoComp member of the DD_MOTIONCOMPCALLBACKS
structure points to the callback function.

DWORD APIENTRY
MOCOMPCB_DESTROY(
PDD_DESTROYMOCOMPDATA lpData
)
{
// The driver saves the device class object in lpDriverReserved1
// during the call to the DdMoCompCreate callback. For more information,
// see Creating Instances of DirectX VA Device Objects.
DXVA_DeviceBaseClass* pDXVABase =
(DXVA_DeviceBaseClass*)lpData->lpMoComp->lpDriverReserved1;
if (pDXVABase == NULL) {
lpData->ddRVal = E_POINTER;
return DDHAL_DRIVER_HANDLED;
}
// Process according to the device type in the class object.
// For more information, see Defining DirectX VA Device Classes.
switch (pDXVABase->m_DeviceType) {
// This is the deinterlace container device.
case DXVA_DeviceContainer:
lpData->ddRVal = S_OK;
delete pDXVABase;
break;

// This is the ProcAmp control device.
case DXVA_DeviceProcAmpControl:
{
DXVA_ProcAmpControlDeviceClass* pDXVADev =
(DXVA_ProcAmpControlDeviceClass*)pDXVABase;
// Part of the ProcAmp Control DDI.
lpData->ddRVal = pDXVADev->ProcAmpControlCloseStream();
delete pDXVADev;
}
break;

// This is the deinterlace bob device.
case DXVA_DeviceDeinterlacer:
{
DXVA_DeinterlaceBobDeviceClass* pDXVADev =
(DXVA_DeinterlaceBobDeviceClass*)pDXVABase;
// Part of the Deinterlace DDI.
lpData->ddRVal = pDXVADev->DeinterlaceCloseStream();
delete pDXVADev;
}
break;

// This is the COPP device.
case DXVA_DeviceCOPP:
{
DXVA_COPPDeviceClass* pDXVADev = (DXVA_COPPDeviceClass*)pDXVABase;
ULONG BytesReturned;
HANDLE handle = (HANDLE)GetDriverHandleFromPDEV(lpData->lpDD->lpGbl->dhpdev);
COPP_IO_InputBuffer InputBuffer;
InputBuffer.ppThis = &pDXVADev->m_pThis;
InputBuffer.InputBuffer = NULL;
InputBuffer.phr = &lpData->ddRVal;
EngDeviceIoControl(handle,
IOCTL_COPP_CloseDevice,
&InputBuffer,
sizeof(InputBuffer),
NULL,
0,
&BytesReturned);
delete pDXVADev;
}
break;
}
return DDHAL_DRIVER_HANDLED;
}


COPP Video Miniport Driver Template

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The example code provided in this section shows an implementation of a COPP video miniport driver code
template that is used to access COPP functionality. Using this template can simplify your video miniport driver
development. However, you are not required to implement access to COPP functionality in this manner for your
video miniport driver to work correctly.
This section includes:
COPP Device Definition Template Code
COPP Video Miniport Driver IOCTL Template Code
COPP Video Miniport Driver Open Template Code
COPP Video Miniport Driver Get Certificate Template Code
COPP Video Miniport Driver Key Exchange Template Code
COPP Video Miniport Driver Sequence Start Template Code
COPP Video Miniport Driver Command Template Code
COPP Video Miniport Driver Status Template Code
COPP Video Miniport Driver Close Template Code

COPP Device Definition Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to define a COPP DirectX VA device object.

#define COPP_OPENED 0
#define COPP_CERT_LENGTH_RETURNED 1
#define COPP_KEY_EXCHANGED 2
#define COPP_SESSION_ACTIVE 3
typedef struct {
DWORD m_LocalLevel[COPP_MAX_TYPES];
GUID m_KDI;
DWORD m_CmdSeqNumber;
DWORD m_StatusSeqNumber;
DWORD m_rGraphicsDriver;
DWORD m_COPPDevState;
DWORD m_DevID;

AESHelper m_AesHelper;

} COPP_DeviceData;


COPP Video Miniport Driver IOCTL Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
The video miniport driver must implement a HwVidStartIO function to process the I/O requests that originate in the
display driver. The following example code shows only how the video miniport driver processes COPP IOCTLs:

BOOLEAN
HwVidStartIO(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
VP_STATUS vpStatus;
switch (pVideoRequestPacket->IoControlCode)
{
case IOCTL_COPP_OpenDevice:
vpStatus = IoctlCOPPOpenDevice(pHwDeviceExtension, pVideoRequestPacket);
break;
case IOCTL_COPP_GetCertificateLength:
vpStatus = IoctlCOPPGetCertificateLength(pHwDeviceExtension, pVideoRequestPacket);
break;
case IOCTL_COPP_KeyExchange:
vpStatus = IoctlCOPPKeyExchange(pHwDeviceExtension, pVideoRequestPacket);
break;
case IOCTL_COPP_StartSequence:
vpStatus = IoctlCOPPStartSequence(pHwDeviceExtension, pVideoRequestPacket);
break;
case IOCTL_COPP_Command:
vpStatus = IoctlCOPPCommand(pHwDeviceExtension, pVideoRequestPacket);
break;
case IOCTL_COPP_Status:
vpStatus = IoctlCOPPStatus(pHwDeviceExtension, pVideoRequestPacket);
break;
case IOCTL_COPP_CloseDevice:
vpStatus = IoctlCOPPCloseDevice(pHwDeviceExtension, pVideoRequestPacket);
break;
default:
vpStatus = ERROR_INVALID_FUNCTION;
break;
}
pVideoRequestPacket->StatusBlock->Status = vpStatus;
return TRUE;
}


COPP Video Miniport Driver Open Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to create instances of COPP DirectX VA device objects.

VP_STATUS
IoctlCOPPOpenDevice(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
ULONG uDevID = *(ULONG*)pInBuff->InputBuffer;
COPP_DeviceData* pThis = VideoPortAllocatePool(pHwDeviceExtension,
VpPagedPool,
sizeof(COPP_DeviceData),
'PPOC');
*pInBuff->ppThis = NULL;
if (pThis == NULL) {
*pInBuff->phr = ERROR_NOT_ENOUGH_MEMORY;
return NO_ERROR;
}
*pInBuff->phr = COPPOpenVideoSession(pThis, uDevID);
if (*pInBuff->phr == NO_ERROR) {
*pInBuff->ppThis = pThis;
}
else {
VideoPortFreePool(pHwDeviceExtension, pThis);
*pInBuff->ppThis = NULL;
}
return NO_ERROR;
}


COPP Video Miniport Driver Get Certificate Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to retrieve the size, in bytes, of the graphics hardware certificate for the COPP
DirectX VA device object.

VP_STATUS
IoctlCOPPGetCertificateLength(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
COPP_DeviceData* pThis = (COPP_DeviceData*)*pInBuff->ppThis;
HRESULT* phr = pInBuff->phr;
*phr = COPPGetCertificateLength(pThis, (ULONG*)pVideoRequestPacket->OutputBuffer);
if (*phr == NO_ERROR) {
pVideoRequestPacket->StatusBlock->Information = sizeof(ULONG);
}
return NO_ERROR;
}


COPP Video Miniport Driver Key Exchange Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to retrieve the digital certificate used by the graphics hardware for the COPP
DirectX VA device object.

VP_STATUS
IoctlCOPPKeyExchange(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
COPP_DeviceData* pThis = (COPP_DeviceData*)*pInBuff->ppThis;
GUID* lpout = (GUID*)pVideoRequestPacket->OutputBuffer;
BYTE* pCertificate = (BYTE*)pInBuff->InputBuffer;
HRESULT* phr = pInBuff->phr;
*phr = COPPKeyExchange(pThis, lpout, pCertificate);
if (*phr == NO_ERROR) {
pVideoRequestPacket->StatusBlock->Information = pVideoRequestPacket->OutputBufferLength;
}
return NO_ERROR;
}


COPP Video Miniport Driver Sequence Start Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to set the current video session to protected mode for the COPP DirectX VA device
object.

VP_STATUS
IoctlCOPPStartSequence(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
COPP_DeviceData* pThis = (COPP_DeviceData*)*pInBuff->ppThis;
DXVA_COPPSignature* lpin = (DXVA_COPPSignature*)pInBuff->InputBuffer;
*pInBuff->phr = COPPSequenceStart(pThis, lpin);
return NO_ERROR;
}


COPP Video Miniport Driver Command Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to perform an operation on the COPP DirectX VA device object.

VP_STATUS
IoctlCOPPCommand(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
COPP_DeviceData* pThis = (COPP_DeviceData*)*pInBuff->ppThis;
DXVA_COPPCommand* lpin = (DXVA_COPPCommand*)pInBuff->InputBuffer;
*pInBuff->phr = COPPCommand(pThis, lpin);
return NO_ERROR;
}


COPP Video Miniport Driver Status Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to retrieve status on a protected video session that is associated with the COPP
DirectX VA device object.

VP_STATUS
IoctlCOPPStatus(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
COPP_DeviceData* pThis = (COPP_DeviceData*)*pInBuff->ppThis;
DXVA_COPPStatusInput* lpin = (DXVA_COPPStatusInput*)pInBuff->InputBuffer;
DXVA_COPPStatusOutput* lpout = (DXVA_COPPStatusOutput*)pVideoRequestPacket->OutputBuffer;
HRESULT* phr = pInBuff->phr;
*phr = COPPQueryStatus(pThis, lpin, lpout);
if (*phr == NO_ERROR) {
pVideoRequestPacket->StatusBlock->Information = sizeof(DXVA_COPPStatusOutput);
}
return NO_ERROR;
}


COPP Video Miniport Driver Close Template Code

This section applies only to Windows Server 2003 SP1 and later, and Windows XP SP2 and later.
Use the following example code to release instances of COPP DirectX VA device objects.

VP_STATUS
IoctlCOPPCloseDevice(
PHW_DEVICE_EXTENSION pHwDeviceExtension,
PVIDEO_REQUEST_PACKET pVideoRequestPacket
)
{
COPP_IO_InputBuffer* pInBuff = pVideoRequestPacket->InputBuffer;
COPP_DeviceData* pThis = (COPP_DeviceData*)*pInBuff->ppThis;
*pInBuff->phr = COPPCloseVideoSession(pThis);
VideoPortFreePool(pHwDeviceExtension, pThis);
*pInBuff->ppThis = NULL;
return NO_ERROR;
}


DirectX VA Data Flow Management

This section describes how data flow is managed in the DirectX Video Acceleration DDI and API. This information is
covered in the following sections:
Encryption Support
Setting Up DirectX VA Decoding
Probing and Locking of Configurations
Buffer Description List
Sequence Requirements

Encryption Support

Data used in video decoding can be encrypted for the following structures and types of data:
Macroblock control command structures
Residual difference block structures
Bitstream buffers
In order for the host decoder to use encryption, it must determine what types of encryption the accelerator
supports. The information about the types of encryption that are supported by an accelerator is contained in a list
of encryption-type GUIDs that are supplied to the host as video accelerator format GUIDs. For more information
about video accelerator format GUIDs, see the Microsoft Windows SDK documentation.
Note All DirectX VA accelerators must be able to operate without using encryption. Support for operating without
encryption, therefore, does not need to be declared, and the DXVA_NoEncrypt "no encryption" GUID must never be
sent in the video accelerator format GUID list.
The host selects the type of encryption protocol to apply and indicates this choice by sending a GUID to the
accelerator. In a typical encryption scenario, two more steps take place before encrypted data can be successfully
transferred:
1. The host decoder may require verification that the accelerator is authorized to receive the data. This
verification can be provided by having the accelerator pass a signed structure to the host to prove that it
holds an authorized public/private key pair.
2. The host decoder then sends an encrypted content key to the accelerator.
The precise number of steps for initializing the encryption protocol depends on the type of encryption being used
and how it is implemented.
Each data set that is exchanged between the host and accelerator to pass the necessary encryption initialization
parameters must be prefixed by the encryption protocol type GUID. This GUID distinguishes the data of one type of
encryption from the data of another. This is necessary because one type of encryption could be used for one
DirectX VA buffer, and another type of encryption could be used for another DirectX VA buffer.
The DXVA_EncryptProtocolHeader structure is used to indicate that an encryption protocol is being used as well
as the type of encryption being used.

Setting Up DirectX VA Decoding

In order for a decoder to operate correctly with an accelerator, the decoder and the accelerator must be set up for
two distinct aspects of operation:
The format of the video data to be decoded. The DXVA_ConnectMode structure is used to specify the
format.
The configuration determining the format used for data exchange between the host and the accelerator, and
establishing which process resides on the host and which on the accelerator. This configuration is
established by the negotiation of a connection configuration for each DirectX VA function to be used (as
determined by the bDXVA_Func variable). The DXVA_ConfigPictureDecode structure specifies the
configuration.

bDXVA_Func Variable

The bDXVA_Func variable is an 8-bit value that is associated with DirectX VA operations as follows.

BDXVA_FUNC VALUE   OPERATION

1                  Compressed picture decoding
2                  Alpha-blend data loading
3                  Alpha-blend combination
4                  Picture resampling
The bDXVA_Func variable is used to perform the following tasks:


Probe and lock a configuration for a specific DirectX VA function. This is done by including bDXVA_Func
in a DXVA_ConfigQueryOrReplyFlag variable and in a DXVA_ConfigQueryorReplyFunc variable when
these variables are sent in the dwFunction member of a DD_RENDERMOCOMPDATA structure in a call
to DdMoCompRender.
Specify the function associated with a configuration structure passed with a probe or lock command.
This is done by including bDXVA_Func, together with a DXVA_ConfigQueryOrReplyFlag variable, in a
DXVA_ConfigQueryorReplyFunc variable sent in the dwFunction member of one of the following
structures: DXVA_ConfigPictureDecode for compressed picture decoding, DXVA_ConfigAlphaLoad for
alpha-blending data loading, or DXVA_ConfigAlphaCombine for alpha-blending combination.
Initialize an encryption protocol for a specific DirectX VA function by inclusion in a
DXVA_EncryptProtocolFunc variable sent in the dwFunction member of a
DD_RENDERMOCOMPDATA structure in a call to DdMoCompRender.
Specify the function associated with an encryption protocol by inclusion in the dwFunction member of
the DXVA_EncryptProtocolHeader structure.
Signal an operation to be performed by inclusion in a series of bDXVA_Func byte values in the
dwFunction member of a DD_RENDERMOCOMPDATA structure in a call to DdMoCompRender. The
first bDXVA_Func operation is specified in the most significant byte, the next operation is specified in the
next most significant byte, and so on. Remaining bytes in dwFunction not used to signal an operation are
set to zero.
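
The byte packing described in the last item of the preceding list can be illustrated with a small helper. The following sketch is illustrative only; the helper name is not part of the DDI.

DWORD PackDXVAFunctionCodes(const BYTE* pFuncCodes, DWORD NumCodes)
{
    // Pack up to four bDXVA_Func values into a DWORD, most significant byte
    // first. Bytes that do not carry an operation remain zero.
    DWORD dwFunction = 0;
    for (DWORD i = 0; i < NumCodes && i < 4; i++) {
        dwFunction |= ((DWORD)pFuncCodes[i]) << (8 * (3 - i));
    }
    return dwFunction;
}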

DXVA_ConfigQueryOrReplyFlag and DXVA_ConfigQueryorReplyFunc Variables

The DXVA_ConfigQueryOrReplyFlag variable indicates the type of query or response when using probing and
locking commands. The most significant 24 bits of the dwFunction member of the following structures contains
the DXVA_ConfigQueryOrReplyFlag variable.
DXVA_ConfigPictureDecode for compressed picture decoding.
DXVA_ConfigAlphaLoad for alpha-blending data loading.
DXVA_ConfigAlphaCombine for alpha-blending combination.
The most significant 20 bits of the DXVA_ConfigQueryOrReplyFlag variable specify the following queries and
responses.

VALUE     DESCRIPTION

0xFFFF1   Sent by the host decoder as a probing command.
0xFFFF5   Sent by the host decoder as a locking command.
0xFFFF8   Sent by the accelerator with an S_OK response to a probing command, with a copy of the probed configuration.
0xFFFF9   Sent by the accelerator with an S_OK response to a probing command, with a suggested alternative configuration.
0xFFFFC   Sent by the accelerator with an S_OK response to a locking command, with a copy of the locked configuration.
0xFFFFB   Sent by the accelerator with an S_FALSE response to a probing command, with a suggested alternative configuration.
0xFFFFF   Sent by the accelerator with an S_FALSE response to a locking command, with a suggested alternative configuration.

The least significant 4 bits of the DXVA_ConfigQueryOrReplyFlag variable specify the following status indicators for
queries and responses.
BIT   DESCRIPTION

3     Zero when sent by the host decoder, and 1 when sent by the accelerator.
2     Zero when associated with a probe, and 1 when associated with a lock.
1     Zero for success, and 1 for failure.
0     Zero when it is a duplicate configuration structure, and 1 when it is a new configuration structure.

The least significant 8 bits of the dwFunction member contain the bDXVA_Func variable. The bDXVA_Func
variable, when used with DXVA_ConfigQueryorReplyFunc, indicates probing and locking operations and specifies
an associated configuration function.
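
As a minimal sketch of the layout just described (the helper and local variable names are illustrative and are not defined in dxva.h), a driver can split dwFunction into its flag and function parts and test the four status bits.

void ParseQueryOrReplyFunction(DWORD dwFunction)
{
    DWORD QueryOrReplyFlag = dwFunction >> 8;        // most significant 24 bits
    BYTE bDXVA_Func = (BYTE)(dwFunction & 0xFF);     // least significant 8 bits

    // The four least significant bits of the flag, as listed in the table above.
    BOOL SentByAccelerator = (QueryOrReplyFlag & 0x8) != 0;   // bit 3
    BOOL IsLockCommand     = (QueryOrReplyFlag & 0x4) != 0;   // bit 2
    BOOL IsFailure         = (QueryOrReplyFlag & 0x2) != 0;   // bit 1
    BOOL IsNewConfig       = (QueryOrReplyFlag & 0x1) != 0;   // bit 0

    // ... act on bDXVA_Func and the decoded status bits ...
}
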
Probing and Locking
When bDXVA_Func is used to probe and lock a configuration for a specific DirectX VA function, bDXVA_Func is
placed in the 8 least significant bits of the DXVA_ConfigQueryorReplyFunc variable.
DXVA_ConfigQueryorReplyFunc is conveyed to the accelerator as specified in the Microsoft Windows SDK.
Specifying a Configuration To Be Probed or Locked
When bDXVA_Func is used to specify the function associated with a configuration structure that is passed with a
probe or lock command, bDXVA_Func is placed in the 8 least significant bits of the DXVA_ConfigQueryorReplyFunc
variable in the dwFunction member of one of the following configuration structures:
DXVA_ConfigPictureDecode for compressed picture decoding.
DXVA_ConfigAlphaLoad for alpha-blending data loading.
DXVA_ConfigAlphaCombine for alpha-blending combination.
DXVA_EncryptProtocolFunc
The most significant 24 bits of the DXVA_EncryptProtocolFunc DWORD variable are set as follows:
0xFFFF00 when sent by the host software decoder in the dwFunction member of the
DD_RENDERMOCOMPDATA structure in a call to DdMoCompRender.
0xFFFF08 when sent by the video accelerator in the dwFunction member of the
DXVA_EncryptProtocolHeader structure.
The least significant 8 bits of the DXVA_EncryptProtocolFunc DWORD variable contain the value of bDXVA_Func
associated with the encryption protocol. The only value supported for this use is bDXVA_Func = 1 (compressed
picture decoding).
Specifying an Operation to be Performed by DdMoCompRender
When bDXVA_Func is used to signal an actual operation to be performed (compressed picture decoding, alpha-
blend data loading, alpha-blend combination, or picture resampling), bDXVA_Func is conveyed to the accelerator by
inclusion in a series of bDXVA_Func byte values in the dwFunction member of a DD_RENDERMOCOMPDATA
structure in a call to DdMoCompRender. The first bDXVA_Func operation is specified in the most significant byte,
the next operation is specified in the next most significant byte, and so on. Any remaining bytes of dwFunction are
set to zero.

Probing and Locking of Configurations

The process for establishing the configuration for each DirectX VA function (a specific value of bDXVA_Func) that
needs a configuration (for example, compressed picture decoding, alpha-blending data loading, and alpha-blending
combination) can be performed by:
1. Probing (if needed) to determine whether a configuration is accepted by the accelerator.
2. Locking in a specific configuration, if it is supported.
To determine if a specific configuration is supported, a probing command is sent to the accelerator for the
particular bDXVA_Func value to be probed, along with a configuration. In addition to the probing command, a
configuration structure (for the value in bDXVA_Func) is sent that describes the configuration being probed to
determine whether the configuration is supported. The accelerator then returns a value of S_OK or S_FALSE,
indicating whether the specified configuration is supported by the accelerator. The accelerator can also return a
suggested alternative configuration.
To lock in a specific configuration, a locking command is sent to the accelerator for the particular bDXVA_Func to
be locked. Along with the locking command, a configuration structure (for the value in bDXVA_Func) is sent that
describes the configuration to be locked in, if the configuration is supported. The accelerator returns an S_OK or
S_FALSE indicating whether the specified configuration is supported by the accelerator. If the return value is S_OK,
the specified configuration is locked in for use. If the return value is S_FALSE, a suggested alternative configuration
is returned.
The decoder may send a locking command without first sending a probing command for the specified
configuration. If the accelerator returns an S_OK in a probing command for a specific configuration, it returns an
S_OK to a locking command for that same configuration, unless otherwise noted. After a locking command has
been sent and the accelerator returns S_OK, the specified configuration is locked in and no additional probing or
locking commands are sent by the decoder for the same value of bDXVA_Func.
To ensure that all DirectX VA software decoders can operate with all DirectX VA accelerators, a minimal
interoperability configuration set is defined as a set of configurations that must be supported by any decoder using
a particular value for bDXVA_Func. Every accelerator that indicates support for the bDXVA_Func variable by
exposing an associated video accelerator GUID must support at least one member of this interoperability
configuration set. In some cases, an additional encouraged configuration set may also be defined.
The following figure shows the control flow of probing and locking commands sent by the decoder.
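
A rough sketch of that control flow, from the host decoder's point of view, follows. SendProbe and SendLock are hypothetical wrappers (not part of the DDI) around the DdMoCompRender call that carries the probing or locking command together with a DXVA_ConfigPictureDecode structure for the given bDXVA_Func value.

HRESULT NegotiateDecodeConfiguration(BYTE bDXVA_Func, DXVA_ConfigPictureDecode* pConfig)
{
    // Probe first (optional). S_FALSE means the probed configuration is not
    // supported; *pConfig then holds the accelerator's suggested alternative.
    HRESULT hr = SendProbe(bDXVA_Func, pConfig);
    if (hr != S_OK && hr != S_FALSE) {
        return hr;
    }

    // Lock in the configuration now held in *pConfig. S_OK means it is locked;
    // S_FALSE means it was rejected and *pConfig again holds an alternative.
    return SendLock(bDXVA_Func, pConfig);
}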

Buffer Description List

DirectX VA operates primarily by passing buffers of data from the host decoder to the hardware accelerator. When
a set of buffers is passed from the host to the accelerator, a buffer description list is sent to describe the buffers. A
buffer description list is an array of DXVA_BufferDescription structures. The buffer description list contains one
DXVA_BufferDescription structure for each buffer in the set of buffers being sent. The buffer description list starts
with one or more DXVA_BufferDescription structures for the first type of buffer being sent. This is followed by one
or more DXVA_BufferDescription structures for the next type of buffer being sent, and so on.
The value of the dwTypeIndex member of the DXVA_BufferDescription structure specifies what type of buffer is
passed from the host to the accelerator.
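
As a minimal sketch of that ordering (the helper and its parameters are illustrative), a host decoder filling a buffer description list keeps the entries for one buffer type contiguous and sets the dwTypeIndex member of each entry accordingly.

void DescribeBuffers(DXVA_BufferDescription* pList,
                     DWORD NumMacroblockControlBuffers,
                     DWORD MacroblockControlTypeIndex,
                     DWORD NumResidualDifferenceBuffers,
                     DWORD ResidualDifferenceTypeIndex)
{
    DWORD n = 0;
    // One entry per macroblock control command buffer, all of the same type.
    for (DWORD i = 0; i < NumMacroblockControlBuffers; i++, n++) {
        pList[n].dwTypeIndex = MacroblockControlTypeIndex;
    }
    // Followed by one entry per residual difference block data buffer.
    for (DWORD i = 0; i < NumResidualDifferenceBuffers; i++, n++) {
        pList[n].dwTypeIndex = ResidualDifferenceTypeIndex;
    }
    // The remaining members of each DXVA_BufferDescription structure (buffer
    // sizes, offsets, and macroblock counts) are filled in per buffer.
}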

Sequence Requirements

Sequence requirements for the accelerator and for the decoder must be observed to avoid race conditions and
improper operation of the decoder and accelerator during the decoding process.
Accelerator
When queried, the hardware accelerator reports whether the display of an uncompressed surface is pending or in
progress, and if requested operations have been completed. However, it is the responsibility of the host software
decoder (not the accelerator) to ensure that race conditions do not cause undesirable behavior during the decoding
process.
Decoder
The decoder must observe two rules to properly decode and display uncompressed surfaces:
1. Do not overwrite any picture that has been submitted for display unless it has already been shown on the
display and also removed from the display.
2. Do not overwrite any picture that is needed as a reference for the creation of other pictures that have not yet
been created.
Following these rules ensures proper operation of sequential operations in the decoding process and avoids
tearing artifacts on the display. The guiding rule is: Do not write over what you need for referencing or display, and
avoid race conditions.
To avoid race conditions, the software decoder must query the status of the accelerator. The decoder must also use
a sufficient number of uncompressed picture surfaces to ensure that space is available for all necessary operations.
This results in a need for at least four uncompressed picture surfaces for decoding video streams consisting of I, B,
and P pictures. Using more than four surfaces is generally encouraged and is necessary for some operations, such
as front-end alpha blending. (Using extra surfaces can significantly reduce the need to wait for operational
dependencies to be resolved.)
Examples that show the decoding of conventional I, B, and P-structured video frames (without using a deblocking
filter) are provided in Using Four Uncompressed Surfaces for Decoding and Using Five or More Uncompressed
Surfaces for Decoding.
Note For compressed buffers, as well as for uncompressed surfaces, it is generally better to cycle through the
allocated and available buffers rather than to keep reusing the same buffer, or the same subset of allocated buffers.
This can reduce the possibility of added delays caused by waiting on unnecessary dependencies. The allocation of
multiple buffers by a driver should be taken as an indication that cycling through these buffers for double or triple
buffering is the proper way to operate and to avoid artifacts, such as temporary picture freezes. This applies to
alpha-blend data loading in particular.

Using Four Uncompressed Surfaces for Decoding

The following table shows a hypothetical situation in which a video decoder requires one frame time to decode
each picture. It decodes a bitstream consisting of a steadily increasing number of B pictures starting from zero B
pictures after an initial I picture. The bitstream of B pictures occurs between pairs of P pictures. In this table, a letter
shows the type of each picture (I, B, or P), a subscript shows the frame display index (the temporal display order of
each picture), and a superscript shows the number of the buffer containing the picture.

PICTURE DECODED   PICTURE DISPLAYED   FRAMES DECODED (AT START OF INTERVAL)

I⁰₀               (none)              0
P¹₁               (none)              1
P²₃               I⁰₀                 2
B³₂               P¹₁                 3
P⁰₆               B³₂                 4
B¹₄               P²₃                 5
B³₅               B¹₄                 6
P²₁₀              B³₅                 7
B¹₇               P⁰₆                 8
B³₈               B¹₇                 9
B¹₉               B³₈                 10
P⁰₁₅              B¹₉                 11
B³₁₁              P²₁₀                12
B¹₁₂              B³₁₁                13
B³₁₃              B¹₁₂                14
B¹₁₄              B³₁₃                15
P²₂₁              B¹₁₄                16
B³₁₆              P⁰₁₅                17
B¹₁₇              B³₁₆                18
B³₁₈              B¹₁₇                19
B¹₁₉              B³₁₈                20
B³₂₀              B¹₁₉                21
P⁰₂₈              B³₂₀                22

Each B picture in the preceding table requires the decoding of two prior pictures in bitstream order before it can be
decoded. As a consequence, the decoder cannot begin displaying pictures with their proper timing until after the
second picture has been decoded (that is, until during the third time slice of decoding). Somewhere during this time
slice, the display of pictures with their proper timing can begin.
The initiation of the display of a picture may not perfectly coincide with the picture that appears on the display.
Instead, the display may continue to show a picture prior to the one that was sent for display until the proper time
arrives to switch to the new picture. For optimal performance, surface 0 (which holds the first I picture) should not
be overwritten for use by the B picture that arrives three frame times later, even though the I picture is not needed
by that B picture for referencing. Instead, a fourth surface (surface 3) should be used to hold that B picture. This
eliminates the need to check whether the display period of the first I picture has been completed before decoding
the B picture.
The two rules described in sequence requirements for decoders require that each of the first three decoded pictures
be placed in different surfaces, because none of them has been displayed until some time during the third period
(period 2). Then, the fourth decoded picture should be placed in a fourth surface, because the display of the first
displayed picture may not yet be over until some time during the fourth period (period 3).
A significant obstacle in the decoding process occurs as a result of having more than two B pictures in succession.
This occurs in the preceding table upon encountering the tenth decoded picture (B¹₉). When the third or
subsequent B picture in a contiguous series is encountered, the time lag tolerance between the display of one B
picture and the use of a surface to hold the next decoded B picture is eliminated. The host decoder must check the
display status of the B picture that was displayed in the previous period (B¹₇) to ensure that it has been removed
from the display (waiting for this to happen if necessary), then it must immediately use the same surface for the
next B picture to be decoded (surface 1 used for B¹₉). The decoder cannot decode the new B picture into either of
the surfaces being used to hold its reference I or P pictures (in this case, surfaces 0 and 2 used for P⁰₆ and P²₁₀), and
cannot decode the new B picture into the surface being displayed during the same interval of time (in this case,
surface 3 used for B³₈). Thus, it must use the surface that was displayed in the immediately preceding period (in this
case, surface 1).

Using Five or More Uncompressed Surfaces for Decoding

More than four uncompressed surfaces can be used for decoding, allowing the time lag between the start of the
display of a buffer and new write operations to that buffer to increase from a minimum of one display period to
two or more. This technique can provide more of an allowance for jitter in the timing of the decoding process. This
technique can also enable output processing on the decoded pictures to perform a three-field deinterlace operation
as part of the display process. This is because not only is the current picture available for display, but the previous
picture is also available, and can provide context and allow a one-field delay in the actual display process.
Although a minimum of four buffers is required for effective use of DirectX VA with B pictures, the use of five or
more buffers is encouraged, particularly in scenarios that do not require keeping delay to a minimum. DirectX VA
decoders for I, B, and P-structured video decoding are therefore expected to set their minimum and maximum
requested uncompressed surface allocation counts to at least four and five, respectively, when allocating
uncompressed surfaces. Using one or more extra uncompressed surfaces can achieve reliable, tear-free operation.

DirectX VA Operations

This section describes the operations defined by values of the bDXVA_Func variable. This variable is defined by the
dwFunction member of structures that are related to the operations that are described in the following topics:
Compressed Picture Decoding
Alpha-Blend Data Loading
Alpha-Blend Combination
Picture Resampling Control

Compressed Picture Decoding

When the bDXVA_Func variable equals 1, the operation specified is compressed picture decoding. The
DXVA_ConfigPictureDecode structure contains the DirectX VA connection configuration data for compressed
picture decoding.
Compressed Picture Parameters
The parameters that must be sent once for each picture to be decoded are specified in the
DXVA_PictureParameters structure.

Pixel Formats for Uncompressed Video

In order for applications to use uncompressed decoded pictures, pictures must be produced in a known video pixel
format. The list of uncompressed picture formats supported by any DirectX VA accelerator must contain at least one
of the pixel formats described in 4:2:2 Video Pixel Formats or 4:2:0 Video Pixel Formats.

4:2:2 Video Pixel Formats

To decode compressed 4:2:2 video, use one of the following uncompressed pixel formats.

YUY2: Data is found in memory as an array of unsigned characters in which the first byte contains the first sample
of Y, the second byte contains the first sample of Cb, the third byte contains the second sample of Y, the fourth
byte contains the first sample of Cr, and so on. If data is addressed as an array of two little-endian WORD type
variables, the first WORD contains Y₀ in the least significant bits and Cb in the most significant bits, and the
second WORD contains Y₁ in the least significant bits and Cr in the most significant bits. YUY2 is the preferred
DirectX VA 4:2:2 pixel format.

UYVY: The same as YUY2, except for swapping the byte order in each WORD. If data is addressed as an array of
two little-endian WORD type variables, the first WORD contains Cb in the least significant bits and Y₀ in the most
significant bits, and the second WORD contains Cr in the least significant bits and Y₁ in the most significant bits.
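
As a small illustration of the YUY2 byte order described above (the helper is hypothetical), the first two pixels of a scan line can be unpacked as follows.

void ReadFirstYUY2Pair(const BYTE* pLine, BYTE* pY0, BYTE* pCb, BYTE* pY1, BYTE* pCr)
{
    // Byte order within the first four bytes of a YUY2 line: Y0, Cb, Y1, Cr.
    *pY0 = pLine[0];
    *pCb = pLine[1];
    *pY1 = pLine[2];
    *pCr = pLine[3];
}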


4:2:0 Video Pixel Formats

To decode compressed 4:2:0 video, use one of the following uncompressed pixel formats.

YUY2: As described in 4:2:2 Video Pixel Formats, except that two lines of output Cb and Cr samples are produced
for each actual line of 4:2:0 Cb and Cr samples. The second line of each pair of output lines is generally either a
duplicate of the first line or is produced by averaging the samples in the first line of the pair with the samples of
the first line of the next pair.

UYVY: As described in 4:2:2 Video Pixel Formats, except that two lines of output Cb and Cr samples are produced
for each actual line of 4:2:0 Cb and Cr samples. The second line of each pair of output lines is generally either a
duplicate of the first line or is produced by averaging the samples in the first line of the pair with the samples of
the first line of the next pair.

YV12: All Y samples are found first in memory as an array of unsigned char (possibly with a larger stride for
memory alignment), followed immediately by all Cr samples (with half the stride of the Y lines, and half the
number of lines), then followed immediately by all Cb samples in a similar fashion.

IYUV: The same as YV12, except for swapping the order of the Cb and Cr planes.

NV12: A format in which all Y samples are found first in memory as an array of unsigned char with an even
number of lines (possibly with a larger stride for memory alignment). This is followed immediately by an array of
unsigned char containing interleaved Cb and Cr samples. If these samples are addressed as a little-endian WORD
type, Cb would be in the least significant bits and Cr would be in the most significant bits with the same total
stride as the Y samples. NV12 is the preferred 4:2:0 pixel format.

NV21: The same as NV12, except that Cb and Cr samples are swapped so that the chroma array of unsigned char
would have Cr followed by Cb for each sample (such that if addressed as a little-endian WORD type, Cr would be
in the least significant bits and Cb would be in the most significant bits).

IMC1: The same as YV12, except that the stride of the Cb and Cr planes is the same as the stride in the Y plane.
Also, the Cb and Cr planes must fall on memory boundaries that are a multiple of 16 lines. The following code
examples show calculations for the Cb and Cr planes.

BYTE* pCr = pY + (((Height + 15) & ~15) * Stride);
BYTE* pCb = pY + (((((Height * 3) / 2) + 15) & ~15) * Stride);

In the preceding examples, pY is a byte pointer that points to the beginning of the memory array, and Height
must be a multiple of 16.

IMC2: The same as IMC1, except that Cb and Cr lines are interleaved at half-stride boundaries. In other words,
each full-stride line in the chrominance area starts with a line of Cr, followed by a line of Cb that starts at the
next half-stride boundary. (This is a more address-space-efficient format than IMC1, because it cuts the
chrominance address space in half, and thus cuts the total address space by 25 percent.) This is an optionally
preferred format in relation to NV12, but NV12 appears to be more popular.

IMC3: The same as IMC1, except for swapping Cb and Cr.

IMC4: The same as IMC2, except for swapping Cb and Cr.

For more information about these formats, see Recommended 8-Bit YUV Formats for Video Rendering in the
Microsoft Media Foundation documentation.
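
The NV12 layout described above can be illustrated with a short sketch. The helper is hypothetical; it assumes that Stride and Height describe the Y plane and that the interleaved Cb/Cr plane follows the Y plane directly with the same stride.

void NV12SampleAddresses(BYTE* pY, DWORD Stride, DWORD Height,
                         DWORD x, DWORD y,
                         BYTE** ppLuma, BYTE** ppCb, BYTE** ppCr)
{
    // The interleaved Cb/Cr plane follows the Y plane and uses the same stride.
    BYTE* pChroma = pY + Height * Stride;

    *ppLuma = pY + y * Stride + x;

    // 4:2:0 sampling: one Cb/Cr pair covers a 2x2 block of Y samples.
    BYTE* pPair = pChroma + (y / 2) * Stride + (x & ~1u);
    *ppCb = pPair;       // Cb is at the lower address (least significant bits)
    *ppCr = pPair + 1;   // Cr follows
}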

Macroblock-Oriented Picture Decoding

The macroblock is a fundamental unit of the video decoding process. A macroblock consists of a rectangular array
of luminance (Y) samples and two corresponding arrays of chroma (Cb and Cr) samples. In the established video
coding standards, the macroblocks are 16x16 blocks in luminance sample dimensions. If the video is coded in 4:2:0
format, the two chroma arrays each have half the height and half the width of the luma array for the macroblock. If
the video is coded in 4:2:2 format, the two chrominance arrays each have the same height and half the width of the
luminance array for the macroblock. If the video is coded in the 4:4:4 format, the two chrominance arrays each have
the same size as the luminance array for the macroblock.
A macroblock may be predicted using motion compensation with one or more motion vectors, or may be coded as
intra without such prediction. After determining whether the macroblock is predicted or not, the remaining signal
refinement, if any, is added in the form of residual difference data blocks. In the established video coding standards,
these residual difference data blocks are 8x8, so that four residual difference data blocks are needed to cover a
16x16 luminance macroblock.

Macroblock Control Commands

The generation of each decoded macroblock during compressed picture decoding is governed by a macroblock
control command structure. There are four macroblock control command structures defined in the dxva.h header
file:
DXVA_MBctrl_I_HostResidDiff_1
DXVA_MBctrl_I_OffHostIDCT_1
DXVA_MBctrl_P_HostResidDiff_1
DXVA_MBctrl_P_OffHostIDCT_1
The structures explicitly defined in dxva.h are special cases of a generic design used for macroblock control
commands in DirectX VA. For a description of this generic design, see Generic Form of Macroblock Control
Command Structures.
The selection of which macroblock control command structure to use is based on the type of picture to be decoded
and on how it will be decoded. The following structure members and flags determine picture type, decoding
options, and which of the four DirectX VA macroblock control structures will be used:
The bPicIntra, bChromaFormat, bPicOBMC, bPicBinPB, bPic4MVallowed and bMV_RPS members of
the DXVA_PictureParameters structure.
The bConfigResidDiffHost member of the DXVA_ConfigPictureDecode structure.
The HostResidDiff flag (bit 10 in the wMBtype member of each macroblock control structure).
The values for these structure members and flags are shown in the following sections.
DXVA_MBctrl_I_HostResidDiff_1
The DXVA_MBctrl_I_HostResidDiff_1 structure is used for intra pictures with host-based residual difference
decoding. The following structure members and flags must equal the indicated values:
bPicIntra must equal 1 (intra pictures).
bChromaFormat must equal 1 (4:2:0 sampling).
HostResidDiff must equal 1 (host-based IDCT).
bConfigResidDiffHost must equal 1 (host-based residual difference decoding).
DXVA_MBctrl_I_OffHostIDCT_1
The DXVA_MBctrl_I_OffHostIDCT_1 structure is used for intra pictures with 4:2:0 sampling with off-host residual
difference decoding. The following structure members and flags must equal the indicated values:
bPicIntra must equal 1 (intra pictures).
bChromaFormat must equal 1 (4:2:0 sampling).
HostResidDiff must equal zero (off-host IDCT).
bConfigResidDiffHost must equal zero (off-host residual difference decoding).
DXVA_MBctrl_P_HostResidDiff_1
The DXVA_MBctrl_P_HostResidDiff_1 structure is used for P and B pictures with host-based residual difference
decoding. The following macroblock control processes are not used: OBMC, use of four motion vectors per
macroblock for the B part of a PB picture, and use of motion vector reference picture selection.
The following structure members and flags must equal the indicated values:
bPicIntra must equal zero (decoding for P picture and B picture or concealment motion vectors in I picture).
bChromaFormat must equal 1 (4:2:0 sampling).
HostResidDiff must equal 1 (host-based IDCT).
bPicOBMC must equal zero (OBMC not used).
bMV_RPS must equal zero (motion vector reference picture selection not used).
At least one of bPicBinPB (B-picture in PB-frame motion compensation not used) and bPic4MVallowed
(four forward-reference motion vectors per macroblock not used) must equal zero.
bConfigResidDiffHost must equal 1 (host-based residual difference decoding).
DXVA_MBctrl_P_OffHostIDCT_1
The DXVA_MBctrl_P_OffHostIDCT_1 structure is used for P and B pictures with 4:2:0 sampling with off-host
residual difference decoding. The following macroblock control processes are not used: OBMC, use of four motion
vectors per macroblock for the B part of a PB picture, and use of motion vector reference picture selection.
The following structure members and flags must equal the indicated values:
bPicIntra member of the DXVA_PictureParameters structure must equal zero (decoding for P and B
pictures or concealment motion vectors in I pictures).
bChromaFormat must equal 1 (4:2:0 sampling).
HostResidDiff must equal zero (off-host IDCT).
bPicOBMC must equal zero (OBMC not used).
bMV_RPS must equal zero (motion vector reference picture selection not used).
At least one of bPicBinPB (B-picture in PB-frame motion compensation not used) and bPic4MVallowed
(four forward-reference motion vectors per macroblock not used) must equal zero.
bConfigResidDiffHost must equal zero (off-host residual difference decoding).
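
The selection rules above can be summarized in a small sketch. The enumeration and helper are illustrative only; the real decision is driven by the listed members of DXVA_PictureParameters and DXVA_ConfigPictureDecode together with the HostResidDiff flag.

typedef enum {
    MBctrlLayout_I_HostResidDiff,   // DXVA_MBctrl_I_HostResidDiff_1
    MBctrlLayout_I_OffHostIDCT,     // DXVA_MBctrl_I_OffHostIDCT_1
    MBctrlLayout_P_HostResidDiff,   // DXVA_MBctrl_P_HostResidDiff_1
    MBctrlLayout_P_OffHostIDCT      // DXVA_MBctrl_P_OffHostIDCT_1
} MBCTRL_LAYOUT;

MBCTRL_LAYOUT SelectMacroblockControlLayout(BYTE bPicIntra, BOOL HostResidDiff)
{
    // Intra pictures use the DXVA_MBctrl_I_* structures; P and B pictures use
    // the DXVA_MBctrl_P_* structures. Host-based residual difference decoding
    // selects the HostResidDiff variants; otherwise the OffHostIDCT variants apply.
    if (bPicIntra) {
        return HostResidDiff ? MBctrlLayout_I_HostResidDiff
                             : MBctrlLayout_I_OffHostIDCT;
    }
    return HostResidDiff ? MBctrlLayout_P_HostResidDiff
                         : MBctrlLayout_P_OffHostIDCT;
}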

Motion Vectors

If the picture is not an intra picture (the bPicIntra member of the DXVA_PictureParameters structure is zero),
motion vectors are included in the macroblock control command structure. The number of motion vectors that are
included in the structure depends upon the type of picture (for example, B picture or P picture). Additionally, if
macroblock-based reference-picture selection (as defined in H.263 Annex U) is in use, then a reference-picture
selection index for each motion vector is also included in the macroblock control-command structure.
The space reserved for motion vectors in each macroblock control command structure is generally the amount
needed for four motion vectors. Each motion vector is specified using a DXVA_MVvalue structure. These usual
cases include the two preceding nonintra cases. The remaining cases (not explicitly defined in the dxva.h header file)
are as follows:
If OBMC is in use (the bPicOBMC member of the DXVA_PictureParameters structure is 1) and the picture
is not the B part of a PB picture (the bPicBinPB member of this structure is zero), space for 10 motion
vectors, plus any additional space needed to align to a 16-byte boundary, is included.
If OBMC is in use (the bPicOBMC member of the DXVA_PictureParameters structure is 1) and the picture is
the B part of a PB picture (the bPicBinPB member of this structure is 1), space for 11 motion vectors, plus
any additional space needed to align to a 16-byte boundary, is included.
If OBMC is not in use (the bPicOBMC member of the DXVA_PictureParameters structure is zero), the picture
is the B part of a PB picture (the bPicBinPB member of this structure is 1), and four motion vectors per
macroblock are allowed (the bPic4MVallowed member of this structure is 1), space for five motion
vectors, plus any additional space needed to align to a 16-byte boundary, is included.

Macroblock Control Command Buffers

A decoded picture contains one or more macroblock control command buffers (if it does not contain bitstream
buffers). The decoding process for every macroblock is specified (only once) in a macroblock control command
buffer.
For every macroblock control command buffer, there is a corresponding residual difference block data buffer
containing data for the same set of macroblocks. If one or more deblocking filter control buffers are sent, the set of
macroblocks in each deblocking filter control buffer is the same as the set of macroblocks in the corresponding
macroblock control and residual difference block data buffers.
The processing of a picture requires that the motion prediction for each macroblock precede the addition of the
residual difference data. Picture decoding can be accomplished in one of the following two ways:
Process the motion prediction commands in the macroblock control command buffer first and then read the
motion-compensated prediction data back in from the uncompressed destination surface, while processing
the residual difference data buffer.
Process the macroblock control command buffer and the residual difference data buffer in a coordinated
fashion. Add the residual data specified in the residual difference data buffer to the prediction before writing
the result to the uncompressed destination surface.
The macroblock control command and the residual difference data for each macroblock affect only the rectangular
region within that macroblock.
The total number of macroblock control commands in the macroblock control command buffer is specified by the
dwNumMBsInBuffer member of the corresponding DXVA_BufferDescription structure.
The quantity and type of data in the residual difference data buffer is determined by the wPatternCode,
wPC_Overflow, and bNumCoef members of the corresponding macroblock control command.
The following figure shows the relationship between the macroblock control command buffer and the residual
difference data buffer.

If the bConfigMBcontrolRasterOrder member of the DXVA_ConfigPictureDecode structure is equal to 1, then
the following expression applies to the preceding illustration, where i is the index of the macroblock within the
macroblock control command buffer.


Macroblock Addresses

A macroblock address is the position of the macroblock in raster-scan order within the picture. The horizontal and
vertical position of the macroblock in the picture is determined from the macroblock address using the specified
width and height of the picture, which is defined by the wPicWidthInMBminus1 and wPicHeightInMBminus1
members of the DXVA_PictureParameters structure. Following are some examples of macroblock addresses.

POSITION       MACROBLOCK ADDRESS
top-left       Zero
top-right      wPicWidthInMBminus1
lower-left     wPicHeightInMBminus1 x (wPicWidthInMBminus1 + 1)
lower-right    (wPicHeightInMBminus1 + 1) x (wPicWidthInMBminus1 + 1) - 1
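The mapping between a raster-scan macroblock address and a macroblock column and row follows directly from the picture width in macroblocks. The following minimal C sketch illustrates the conversion; the helper names are illustrative only and are not part of dxva.h.

/* Minimal sketch: convert a raster-scan macroblock address to an (x, y)
   macroblock position and back, using the picture width in macroblocks
   from DXVA_PictureParameters. Helper names are illustrative only. */
#include <windows.h>

void MBAddressToPosition(WORD wMBaddress, WORD wPicWidthInMBminus1,
                         WORD *pMbX, WORD *pMbY)
{
    WORD widthInMB = (WORD)(wPicWidthInMBminus1 + 1);
    *pMbX = (WORD)(wMBaddress % widthInMB);   /* column of the macroblock */
    *pMbY = (WORD)(wMBaddress / widthInMB);   /* row of the macroblock    */
}

WORD PositionToMBAddress(WORD mbX, WORD mbY, WORD wPicWidthInMBminus1)
{
    return (WORD)(mbY * (wPicWidthInMBminus1 + 1) + mbX);
}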

Generating Skipped Macroblocks

The generation of a skipped macroblock in DirectX VA differs somewhat from that in MPEG-2 Video Section 7.6.6.
In DirectX VA, skipped macroblocks are generated explicitly by a macroblock control command, rather than being
inferred from the type of the preceding nonskipped macroblock and the type of picture displayed (for example, in
MPEG-2, the method of generating skipped macroblocks depends on whether the picture is a P picture or a B
picture).
The following conditions are required when generating and using skipped macroblocks:
Skipped macroblocks have no residual differences.
Skipped macroblocks can be generated by repeating the operation of a macroblock control command with
an incremented wMBaddress. (Each subsequent skipped macroblock is generated in the same way as the
first, except for incrementing the value of wMBaddress.)
Macroblock skipping is restricted from wrapping to a new row of macroblocks in the picture. (A separate
macroblock control command must be sent to generate the first macroblock of each row of macroblocks.)
The content of a macroblock control command with a nonzero value for MBskipsFollowing is equivalent
(except for the value of MBskipsFollowing) to the content of an explicit specification of the first of the series
of skipped macroblocks. Thus, whenever MBskipsFollowing is not zero, the following structure members and
variables must all be equal to zero: Motion4MV, IntraMacroblock, wPatternCode, and wPC_Overflow.
Because of the first three preceding conditions, an accelerator may implement motion compensation (when
Motion4MV is zero) by applying the specified motion vectors to a rectangle of width equal to the following
expression in the luminance component, and to a similarly specified rectangle in the chrominance components. This
rectangular-area motion compensation method can be performed by the accelerator rather than by using
MBskipsFollowing+1 repetitions of the same macroblock control operation.

(bMacroblockWidthMinus1+1) X (MBskipsFollowing+1)

The bMacroblockWidthMinus1 member is contained in DXVA_PictureParameters. The MBskipsFollowing
variable is in the dwMB_SNL member of each macroblock control structure.
Skipped Macroblocks in H.263 (Annex F)
The generation of skipped macroblocks in H.263 with advanced prediction mode active (Annex F), requires
representing some skipped macroblocks as nonskipped macroblocks in DirectX VA macroblock control commands.
This is done in order to generate the OBMC effect within these macroblocks.
Generating Skipped Macroblocks in MPEG-2 Example
The following example shows how macroblock control commands are used when skipped macroblocks are
generated. For demonstration purposes, assume that in an MPEG-2 bitstream seven macroblocks are used in the
following manner.

MACROBLOCK NUMBER    DESCRIPTION
0                    Coded with a residual difference
1                    Skipped
2                    Coded with a residual difference
3                    Skipped
4                    Skipped
5                    Skipped
6                    Coded with a residual difference

These seven macroblocks would require the generation (at least) of the five DirectX VA macroblock control
commands shown in the following table. The MBskipsFollowing variable indicates the number of skipped
macroblocks. The wMBaddress member indicates the address of the macroblock. MBskipsFollowing and
wMBaddress are contained in the DXVA_MBctrl_P_OffHostIDCT_1 and DXVA_MBctrl_P_HostResidDiff_1
structures. (The MBskipsFollowing variable is defined in the dwMB_SNL structure member.)

MACROBLOCK COMMAND    MEMBER VALUES
First                 wMBaddress = 0, MBskipsFollowing = 0
Second                wMBaddress = 1, MBskipsFollowing = 0
Third                 wMBaddress = 2, MBskipsFollowing = 0
Fourth                wMBaddress = 3, MBskipsFollowing = 2
Fifth                 wMBaddress = 6, MBskipsFollowing = 0
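The preceding table can be derived mechanically from the coded/skipped pattern. The following minimal C sketch shows one way a host decoder might produce the (wMBaddress, MBskipsFollowing) pairs of this example; it assumes the macroblocks all lie in a single row (skipping cannot wrap to a new row), and the MBCommand structure and helper names are illustrative only, not dxva.h definitions.

/* Minimal sketch: derive the (wMBaddress, MBskipsFollowing) pairs of the
   example above from a coded/skipped map of one macroblock row. */
#include <stdio.h>

typedef struct { unsigned short wMBaddress; unsigned char MBskipsFollowing; } MBCommand;

int BuildCommands(const int *isSkipped, int numMBs, MBCommand *out)
{
    int count = 0;
    for (int addr = 0; addr < numMBs; )
    {
        if (!isSkipped[addr])
        {
            out[count].wMBaddress = (unsigned short)addr;
            out[count].MBskipsFollowing = 0;        /* coded macroblock */
            count++;
            addr++;
        }
        else
        {
            int run = 1;                            /* length of the skip run */
            while (addr + run < numMBs && isSkipped[addr + run]) run++;
            out[count].wMBaddress = (unsigned short)addr;
            out[count].MBskipsFollowing = (unsigned char)(run - 1);
            count++;
            addr += run;
        }
    }
    return count;
}

int main(void)
{
    /* Macroblocks 1, 3, 4, and 5 are skipped, as in the example above. */
    int isSkipped[7] = { 0, 1, 0, 1, 1, 1, 0 };
    MBCommand cmd[7];
    int n = BuildCommands(isSkipped, 7, cmd);
    for (int i = 0; i < n; i++)
        printf("wMBaddress = %u, MBskipsFollowing = %u\n",
               cmd[i].wMBaddress, cmd[i].MBskipsFollowing);
    return 0;
}

Running this sketch prints the same five commands listed in the table.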

Deblocking Filter Commands

A deblocking filter command for a macroblock may require the accelerator to read the value of reconstructed
samples within, and next to, the current macroblock. The reconstructed values read are the two rows of samples
above the current macroblock, the two columns of samples to the left of the current macroblock, and samples
within the current macroblock. A deblocking filter command can result in modification of one row of samples
above the current macroblock and one column of samples left of the current macroblock, as well as up to three
rows and three columns of samples within the current macroblock. The deblocking filtering process for a given
macroblock could, therefore, require the prior reconstruction of two other macroblocks.
The two different types of deblocking filter command buffers are:
A buffer that requires access and modification of the value of reconstructed samples for macroblocks
outside those of the current deblocking filter command buffer (when the bPicDeblockConfined member
of the DXVA_PictureParameters structure is zero).
A buffer that does not require access and modification of the value of reconstructed samples for
macroblocks outside those of the current deblocking filter command buffer (when bPicDeblockConfined
is 1).
To process the first type of deblocking command buffer, the accelerator must ensure that the macroblock
reconstruction has been completed for all buffers that affect macroblocks to the left or above the macroblocks
controlled in the current buffer. This must be done before processing the deblocking commands in the current
buffer.
To process the second type of deblocking command buffer, the accelerator uses only prior reconstruction values
within the current buffer.
The deblocking filter operations can be performed in the accelerator in one of two ways:
Processing the motion prediction and residual difference data for the entire buffer or frame first, followed by
reading back in the values of some of the samples and modifying them as a result of the deblocking filter
operations.
Processing the deblocking command buffer in a coordinated way with the residual difference data buffer. In
this case, the deblocking command buffers are processed before writing the reconstructed output values to
the destination picture surface.
Note The destination picture surface for the deblocked picture could differ from that of the picture reconstructed
prior to deblocking. This would then support "outside the loop" deblocking as a postdecoding process that did not
affect the sample values used for prediction of the next picture.
Generic Form of Macroblock Control Command Structures

The following macroblock control structures explicitly defined in dxva.h are special cases of a generic design used
for macroblock control commands in DirectX VA:
DXVA_MBctrl_I_HostResidDiff_1
DXVA_MBctrl_I_OffHostIDCT_1
DXVA_MBctrl_P_HostResidDiff_1
DXVA_MBctrl_P_OffHostIDCT_1
These structures represent only the most commonly used forms of macroblock control commands. Additional
macroblock control commands can be created, based upon the design of these existing structures, to allow a driver
to support other video decoding elements and to handle different configurations for the decoding process.
This section describes the members of a generic macroblock control command structure that are used as the basis
for creating additional macroblock control commands. The macroblock control command structure definition in
this section is divided into four parts.
Note Macroblock control commands are aligned with 16-byte memory boundaries and constructed as packed data
structures with single-byte alignment packing.
First Part of Macroblock Control Command Structure

The first four members of a generic macroblock control command structure are always the same. The following
table describes the members of the first part of this structure.

MEMBER          DESCRIPTION

wMBaddress      Specifies the macroblock address of the macroblock currently being processed.

wMBtype         Specifies the type of macroblock being processed. This member contains flags that indicate whether motion compensation is used to predict the value of the macroblock and what type of residual difference data is sent.

dwMB_SNL        Contains the two fields MBskipsFollowing (in the upper 8 bits) and MBdataLocation (in the lower 24 bits). MBskipsFollowing specifies the number of skipped macroblocks to be generated following the current macroblock. MBdataLocation is an index into the IDCT residual difference block data buffer, indicating the location of the residual difference data for the blocks of the current macroblock.

wPatternCode    Indicates whether residual difference data is sent for each block in the macroblock.

wMBaddress
The wMBaddress structure member specifies the macroblock address of the current macroblock in raster scan
order. The following table shows examples of macroblock addresses.

POSITION       MACROBLOCK ADDRESS
top-left       Zero
top-right      wPicWidthInMBminus1
lower-left     wPicHeightInMBminus1 x (wPicWidthInMBminus1+1)
lower-right    (wPicHeightInMBminus1+1) x (wPicWidthInMBminus1+1) - 1

The wPicWidthInMBminus1 and wPicHeightInMBminus1 values are members of the DXVA_PictureParameters structure.
wMBtype
The wMBtype structure member specifies the type of macroblock being processed. This member contains a set of
bits that define the way macroblocks and motion vectors are processed. The bPic4MVallowed, bPicScanMethod,
bPicBackwardPrediction, bPicStructure, and bPicScanFixed values are members of the
DXVA_PictureParameters structure. The bConfigHostInverseScan value is a member of the
DXVA_ConfigPictureDecode structure.

BITS        DESCRIPTION

15 to 12    MvertFieldSel_3 (bit 15, the most significant) through MvertFieldSel_0 (bit 12). Specifies vertical field selection for corresponding motion vectors sent later in the macroblock control command, as specified in the following tables. For frame-based motion with a frame picture structure (for example, for H.261 and H.263), these bits must all be zero. The bits in MvertFieldSel_0, MvertFieldSel_1, MvertFieldSel_2, and MvertFieldSel_3 correspond to the motion_vertical_field_select[r][s] bits in Section 6.3.17.2 of MPEG-2.

11          Reserved bit. Must be zero.

10          HostResidDiff. Specifies whether spatial-domain residual difference decoded blocks are sent, or whether transform coefficients are sent for off-host IDCT for the current macroblock. Must be zero if bConfigResidDiffHost is zero. Must be 1 if bConfigResidDiffAccelerator is zero.

9 and 8     MotionType. Specifies the motion type in the picture. For example, for frame-based motion with a frame picture structure (as in H.261), bit 9 must be 1 and bit 8 must be zero. The use of these bits corresponds directly to the use of the frame_motion_type or field_motion_type bits in Section 6.3.17.1 and Tables 6-17 and 6-18 of the MPEG-2 video standard when these bits are present in an MPEG-2 bitstream. The use of these bits is further explained following this table.

7 and 6     MBscanMethod. Specifies the macroblock scan method. This must be equal to bPicScanMethod if bPicScanFixed is 1. If HostResidDiff is 1, this variable has no meaning and these bits should be set to zero. If bConfigHostInverseScan is zero, MBscanMethod must be one of the following values: bit 6 is zero and bit 7 is zero for zigzag scan (MPEG-2 Figure 7-2); bit 6 is 1 and bit 7 is zero for alternate-vertical scan (MPEG-2 Figure 7-3); bit 6 is zero and bit 7 is 1 for alternate-horizontal scan (H.263 Figure I.2 Part a). If bConfigHostInverseScan is 1, MBscanMethod must be equal to the following value: bit 6 is 1 and bit 7 is 1 for arbitrary scan with absolute coefficient address.

5           FieldResidual. Indicates whether the residual difference blocks use a field IDCT structure as specified in MPEG-2. This flag must be 1 if bPicStructure is 1 or 2. This flag must be zero when used for MPEG-2 if the frame_pred_frame_DCT flag in the MPEG-2 syntax is 1. This flag must be equal to the dct_type element of the MPEG-2 syntax when used for MPEG-2 if dct_type is present for the macroblock.

4           H261LoopFilter. Specifies whether the H.261 loop filter (Section 3.2.3 of H.261) is active for the current macroblock prediction. The H.261 loop filter is a separable ¼, ½, ¼ filter applied both horizontally and vertically to all six blocks in an H.261 macroblock, except at block edges where one of the taps would fall outside the block. In such cases, the filter is changed to have coefficients 0, 1, 0. Full arithmetic precision is retained with rounding to 8-bit integers at the output of the 2-D filter process (half-integer or higher values being rounded up).

3           Motion4MV. Indicates that forward motion uses a distinct motion vector for each of the four luminance blocks in the macroblock, as used in H.263 Annexes F and J. Motion4MV must be zero if MotionForward is zero or if bPic4MVallowed is zero.

2           MotionBackward. This variable is used as specified for the corresponding macroblock_motion_backward parameter in MPEG-2. If the bPicBackwardPrediction member of the DXVA_PictureParameters structure is zero, MotionBackward must be zero.

1           MotionForward. This variable is used as specified for the corresponding macroblock_motion_forward in MPEG-2. The use of this bit is further explained in the text following this table.

0           IntraMacroblock. Indicates that the macroblock is coded as intra and that no motion vectors are used for the current macroblock. This variable corresponds to the macroblock_intra variable in MPEG-2. The use of this bit is further explained in the text following this table.
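For illustration, the bit positions in the preceding table can be read with simple shift-and-mask operations. The following C sketch is illustrative only; the macro names are hypothetical and dxva.h provides its own accessors for these fields.

/* Minimal sketch of extracting the wMBtype flags listed above.
   Macro names are illustrative, not dxva.h definitions. */
#include <windows.h>

#define MB_INTRA(wMBtype)            (((wMBtype) >> 0) & 1)   /* IntraMacroblock */
#define MB_MOTION_FORWARD(wMBtype)   (((wMBtype) >> 1) & 1)   /* MotionForward   */
#define MB_MOTION_BACKWARD(wMBtype)  (((wMBtype) >> 2) & 1)   /* MotionBackward  */
#define MB_MOTION_4MV(wMBtype)       (((wMBtype) >> 3) & 1)   /* Motion4MV       */
#define MB_H261_LOOP_FILTER(wMBtype) (((wMBtype) >> 4) & 1)   /* H261LoopFilter  */
#define MB_FIELD_RESIDUAL(wMBtype)   (((wMBtype) >> 5) & 1)   /* FieldResidual   */
#define MB_SCAN_METHOD(wMBtype)      (((wMBtype) >> 6) & 3)   /* MBscanMethod    */
#define MB_MOTION_TYPE(wMBtype)      (((wMBtype) >> 8) & 3)   /* MotionType      */
#define MB_HOST_RESID_DIFF(wMBtype)  (((wMBtype) >> 10) & 1)  /* HostResidDiff   */
#define MB_VERT_FIELD_SEL(wMBtype, n) (((wMBtype) >> (12 + (n))) & 1) /* MvertFieldSel_n */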

When macroblocks are predictively coded, they have associated motion vector values. The values are generated
based on whether macroblocks are used for field-coded or frame-coded pictures. It is important for any
implementation to properly account for every utilized macroblock type (especially for field-structured pictures or
dual-prime motion).
The following two tables in this section indicate valid combinations of IntraMacroblock, MotionForward,
MotionBackward, MotionType, MvertFieldSel, and MVector for frame-coded and field-coded pictures. MVector
contains the horizontal and vertical components of a motion vector. The remaining variables and flags specify
motion vector operation. This is determined according to the type of macroblock processed and whether
macroblocks are being used for frame-coded or field-coded pictures.
The values shown in the following tables (in this section) occur for the following conditions:
H261LoopFilter, Motion4MV, and bPicOBMC are zero.
PicCurrentField flag is zero unless bPicStructure is 2 (bottom field). In this case, PicCurrentField is 1.
MVector is a member of the DXVA_MBctrl_P_HostResidDiff_1 and DXVA_MBctrl_P_OffHostIDCT_1 structures.
The IntraMacroblock, MotionForward, MotionBackward, MotionType, MvertFieldSel, H261LoopFilter, and
Motion4MV flags and variables are bitfields contained in the wMBtype member of the
DXVA_MBctrl_P_HostResidDiff_1 and DXVA_MBctrl_P_OffHostIDCT_1 structures. bPicOBMC is a member of the
DXVA_PictureParameters structure. The PicCurrentField flag is derived from the bPicStructure member of
DXVA_PictureParameters.
The following considerations apply when reviewing the following tables in this section:
In a number of places, the MPEG-2 variable name PMV is used to indicate the value of a motion vector. This
notation is used to distinguish between the PMV variable as defined in MPEG-2, which is in frame
coordinates, and a motion vector that may be in field coordinates (in other words, at half-vertical resolution).
In all cases, PMV refers to the value of PMV after it has been updated by the current motion vector value (as
specified in MPEG-2 video Section 7.6.3.1).
The definitions of vector'[2][0] and vector'[3][0] are found in MPEG-2 Section 7.6.3.6. The left-shift operation
shown indicates that the vertical component is modified to frame coordinates.
In both "no motion" cases (0,0,0), the macroblock parameters emulate a forward prediction macroblock
(0,1,0) with a zero-valued motion vector. (See also MPEG-2 Section 7.6.3.5.)
The values shown for MotionType in single quotes are binary representations (the first number is for bit 9
and the second is for bit 8).
The left-shift operator in the first table applies only to the second value shown.
Frame-Structured Pictures
The following table shows the valid combinations of element settings for frame-structured pictures (when the
bPicStructure member of the DXVA_PictureParameters structure is equal to 3).

(In each motion-vector column, the cell is shown as MVector[n] / MvertFieldSel_n; a dash indicates that the value is not used.)

IntraMacroblock, MotionForward, MotionBackward | MotionType (meaning depends on picture type) | MVector[0] / MvertFieldSel_0 (1st, dir1) | MVector[1] / MvertFieldSel_1 (1st, dir2) | MVector[2] / MvertFieldSel_2 (2nd, dir1) | MVector[3] / MvertFieldSel_3 (2nd, dir2)
1,0,0 (intra) | '00' (intra) | - / - | - / - | - / - | - / -
0,0,0 (no motion) | '10' (no motion) | 0 / - | - / - | - / - | - / -
0,1,0 | '10' (frame MC) | PMV[0][0] / - | - / - | - / - | - / -
0,0,1 | '10' (frame MC) | - / - | PMV[0][1] / - | - / - | - / -
0,1,1 | '10' (frame MC) | PMV[0][0] / - | PMV[0][1] / - | - / - | - / -
0,1,0 | '01' (field MC) | PMV[0][0] / sel[0][0] | - / - | PMV[1][0] / sel[1][0] | - / -
0,0,1 | '01' (field MC) | - / - | PMV[0][1] / sel[0][1] | - / - | PMV[1][1] / sel[1][1]
0,1,1 | '01' (field MC) | PMV[0][0] / sel[0][0] | PMV[0][1] / sel[0][1] | PMV[1][0] / sel[1][0] | PMV[1][1] / sel[1][1]
0,1,0 | '11' (dual-prime) | PMV[0][0] / 0 (top) | vector'[2][0][0], vector'[2][0][1]<<1 / 1 | PMV[0][0] / 1 (bottom) | vector'[3][0][0], vector'[3][0][1]<<1 / 0

Field-Structured Pictures
The following table shows the valid combinations of element settings for field-structured pictures (when the
bPicStructure member of the DXVA_PictureParameters structure is equal to 1 or 2).

(In each motion-vector column, the cell is shown as MVector[n] / MvertFieldSel_n; a dash indicates that the value is not used.)

IntraMacroblock, MotionForward, MotionBackward | MotionType (meaning depends on picture type) | MVector[0] / MvertFieldSel_0 (1st, dir1) | MVector[1] / MvertFieldSel_1 (1st, dir2) | MVector[2] / MvertFieldSel_2 (2nd, dir1) | MVector[3] / MvertFieldSel_3 (2nd, dir2)
1,0,0 (intra) | '00' (intra) | - / - | - / - | - / - | - / -
0,0,0 (no motion) | '01' (no motion) | 0 / PicCurrentField | - / - | - / - | - / -
0,1,0 | '01' (field MC) | PMV[0][0] / sel[0][0] | - / - | - / - | - / -
0,0,1 | '01' (field MC) | - / - | PMV[0][1] / sel[0][1] | - / - | - / -
0,1,1 | '01' (field MC) | PMV[0][0] / sel[0][0] | PMV[0][1] / sel[0][1] | - / - | - / -
0,1,0 | '10' (16x8 MC) | PMV[0][0] / sel[0][0] | - / - | PMV[1][0] / sel[1][0] | - / -
0,0,1 | '10' (16x8 MC) | - / - | PMV[0][1] / sel[0][1] | - / - | PMV[1][1] / sel[1][1]
0,1,1 | '10' (16x8 MC) | PMV[0][0] / sel[0][0] | PMV[0][1] / sel[0][1] | PMV[1][0] / sel[1][0] | PMV[1][1] / sel[1][1]
0,1,0 | '11' (dual-prime) | PMV[0][0] / PicCurrentField | vector'[2][0] / PicCurrentField | - / - | - / -

Additional Valid Element Settings for Field and Frame Pictures


The remaining allowed cases for frame-structured and field-structured pictures are as follows.

VALUES | DESCRIPTION
H261LoopFilter = 1, bPicOBMC = 0, Motion4MV = 0 | Indicates that one forward-motion vector is sent in MVector[0] and that the H.261 loop filter is active for the forward prediction in the macroblock. MotionForward must be 1 in this case, and IntraMacroblock and MotionBackward must both be zero.
bPicOBMC = 0, Motion4MV = 1 | Indicates that four forward-motion vectors are sent in MVector[0] through MVector[3]. MotionForward must be 1 in this case, and IntraMacroblock must be zero. If MotionBackward is 1, a fifth motion vector is sent for backward prediction in MVector[4].
bPicOBMC = 1, Motion4MV = 0 | Indicates that 10 forward-motion vectors are sent in MVector[0] through MVector[9] for specification of OBMC motion, and that the values of the first four such motion vectors are all equal. If MotionBackward is 1, an eleventh motion vector is sent for backward prediction in MVector[10].
bPicOBMC = 1, Motion4MV = 1 | Indicates that 10 forward-motion vectors are sent in MVector[0] through MVector[9] for specification of OBMC motion, and that the values of the first four such motion vectors may differ from each other. If MotionBackward is 1, an eleventh motion vector is sent for backward prediction in MVector[10].

Note The average operator ((s1 + s2 + 1) >> 1) is mathematically identical for MPEG-1 and MPEG-2 half-sample
prediction filtering, bidirectional averaging, and dual-prime same-opposite parity combining. The H.263
bidirectional averaging operator does not add the offset of +1 prior to right-shifting. The
bBidirectionalAveragingMode member of DXVA_PictureParameters determines which of these methods is
used.
Interaction Between OBMC and INTER4V in H.263

Some details about the interactions between H.263's OBMC, INTER4V, B, EP, and B in PB frames may be helpful:
No current configuration of the H.263 standard will exercise the case in which bPicOBMC is equal to 1,
Motion4MV is equal to 1, and MotionBackward is equal to 1.
OBMC cannot be used in an H.263 B or EP picture.
OBMC cannot be used in the B part of an H.263 PB picture.
INTER4V cannot be used in an H.263 B or EP picture.
If INTER4V is used in the macroblock of an H.263 P picture and this macroblock is later used as the reference
macroblock for "direct" prediction in an H.263 B picture, OBMC is not used in the direct prediction. This is
because four motion vectors are used according to H.263 Annex M, which uses them like H.263 Annex G,
which does not apply the OBMC. H.263 never requires both OBMC and backward prediction at the same
time, and never uses INTER4V in a backward direction.
dwMB_SNL
The dwMB_SNL structure member specifies the number of skipped macroblocks to be generated following the
current macroblock, and indicates the location of the residual difference data for the blocks of the current
macroblock. This member contains two variables: MBskipsFollowing in the most significant 8 bits and
MBdataLocation in the least significant 24 bits. MBskipsFollowing indicates the number of skipped macroblocks to
be generated following the current macroblock. MBdataLocation is an index into the residual difference block data
buffer. This index indicates the location of the residual difference data for the blocks of the current macroblock,
expressed as a multiple of 32 bits.
Each skipped macroblock indicated by MBskipsFollowing must be generated in a manner mathematically equivalent
to incrementing the value of wMBaddress and then repeating the same macroblock control command.
Any macroblock control command with a nonzero value for MBskipsFollowing specifies how motion-compensated
prediction is performed for each macroblock to be skipped, and is equivalent (except for the value of
MBskipsFollowing) to an explicit nonskip specification of the generation of the first of the series of skipped
macroblocks. Thus, whenever MBskipsFollowing is not zero, the following structure members and variables must all
be equal to zero: Motion4MV IntraMacroblock wPatternCode and wPCOverflow.
The MBdataLocation variable must be zero for the first macroblock in the macroblock control command buffer.
MBdataLocation may contain any value if wPatternCode is zero. When wPatternCode is zero, decoders are
recommended but not required to set this value either to zero or to the same value as in the next macroblock
control command.
For more information about generating skipped macroblocks, see Generating Skipped Macroblocks.
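For illustration, the two fields of dwMB_SNL can be packed and unpacked with simple shifts and masks. The following C sketch is illustrative only; the helper names are hypothetical and not part of dxva.h.

/* Minimal sketch of packing and unpacking dwMB_SNL as described above:
   MBskipsFollowing in the most significant 8 bits, MBdataLocation
   (an index into the residual difference data buffer, expressed as a
   multiple of 32 bits) in the least significant 24 bits. */
#include <windows.h>

DWORD PackMB_SNL(BYTE mbSkipsFollowing, DWORD mbDataLocation)
{
    return ((DWORD)mbSkipsFollowing << 24) | (mbDataLocation & 0x00FFFFFF);
}

BYTE  GetMBskipsFollowing(DWORD dwMB_SNL) { return (BYTE)(dwMB_SNL >> 24); }
DWORD GetMBdataLocation(DWORD dwMB_SNL)   { return dwMB_SNL & 0x00FFFFFF; }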
wPatternCode
The wPatternCode structure member indicates whether residual difference data is sent for each block in the
macroblock.
Bit (11 - i) of wPatternCode (where bit zero is the least significant bit) indicates whether residual difference data is
sent for block i, where i is the index of the block within the macroblock as specified in MPEG-2 video figures 6-10,
6-11, and 6-12 (raster-scan order for Y, followed by 4:2:0 blocks of Cb in raster-scan order, followed by 4:2:0 blocks
of Cr, followed by 4:2:2 blocks of Cb, followed by 4:2:2 blocks of Cr, followed by 4:4:4 blocks of Cb, followed by 4:4:4
blocks of Cr). The data for the coded blocks (those blocks having bit 11-i equal to 1) is found in the residual coding
buffer in the same indexing order (increasing i). For 4:2:0 MPEG-2 data, the value of wPatternCode corresponds to
shifting the decoded value of CBP (Coded Block Pattern) to the left by six bit positions (those lower bit positions
being used for 4:2:2 and 4:4:4 chroma formats).
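As a rough illustration of the bit assignment and the 4:2:0 MPEG-2 relationship just described, the following C sketch tests whether residual data is present for block i and derives wPatternCode from a decoded coded_block_pattern (CBP). The helper names are illustrative only.

/* Minimal sketch: bit (11 - i) of wPatternCode selects block i, and a
   4:2:0 MPEG-2 CBP maps to wPatternCode by a left shift of six bits. */
#include <windows.h>

BOOL BlockHasResidual(WORD wPatternCode, int blockIndex /* i */)
{
    return (wPatternCode >> (11 - blockIndex)) & 1;
}

WORD PatternCodeFromMpeg2CBP420(WORD codedBlockPattern)
{
    return (WORD)(codedBlockPattern << 6);
}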
If the bConfigSpatialResidInterleaved member of DXVA_ConfigPictureDecode is 1, host-based residual
differences are sent in a chroma-interleaved form matching that of the YUV pixel format in use. In this case, each Cb
and spatially corresponding Cr pair of blocks is treated as a single residual difference structure unit. This does not
alter the value or meaning of wPatternCode, but it implies that both members of each pair of Cb and Cr data
blocks are sent whenever either of these data blocks has the corresponding bit (bit 7 or bit 6) set in wPatternCode.
If the bit in wPatternCode for a particular data block is zero, the corresponding residual difference data values
must be sent as zero whenever the pairing of the Cb and Cr blocks necessitates sending a residual difference data
block for a block with a wPatternCode bit equal to zero.
Second Part of Macroblock Control Command Structure

The second part of a generic macroblock control command structure contains three variations, depending on the
configuration of the picture decoding process:
1. If HostResidDiff (bit 10 in the wMBtype member) is equal to 1, the next element of the macroblock control
command is wPC_Overflow. The wPC_Overflow member, if used, specifies which blocks of the macroblock
use overflow residual difference data. wPC_Overflow is followed by a DWORD equal to zero.
2. If HostResidDiff (bit 10 in the wMBtype member) is equal to zero and the bChromaFormat member of
DXVA_PictureParameters is equal to 1, the next element of the macroblock control command is
bNumCoef, a six-element array of bytes. The bNumCoef member indicates the number of coefficients in
the residual difference data buffer for each block of the macroblock.
3. If HostResidDiff (bit 10 in the wMBtype member) is equal to zero and the bChromaFormat member of
DXVA_PictureParameters is not equal to 1, the next element of the macroblock control command is
wTotalNumCoef. This is followed by a DWORD equal to zero.
wPC_Overflow
The wPC_Overflow structure member specifies which blocks of the macroblock use overflow residual difference
data.
When using host-based residual difference decoding (when HostResidDiff is equal to 1) with the
bPicOverflowBlocks member of DXVA_PictureParameters equal to 1 and IntraMacroblock equal to zero (the 8-
8 overflow method), wPC_Overflow contains the pattern code of the overflow blocks specified in the same manner
as wPatternCode. The data for the coded overflow blocks (those blocks having bit 11 minus i equal to 1) is found
in the residual coding buffer in the same indexing order (increasing i).
bNumCoef
The bNumCoef structure member is an array of six elements. The ith element of the bNumCoef array contains the
number of coefficients in the residual difference data buffer for each block i of the macroblock, where i is the index
of the block within the macroblock as specified in MPEG-2 video Figures 6-10, 6-11, and 6-12 (raster-scan order for
Y, followed by Cb, followed by Cr). bNumCoef is used only when HostResidDiff is zero and the bChromaFormat
member of DXVA_PictureParameters is 1 (4:2:0). If used in 4:2:2 or 4:4:4 formats, it will increase the size of typical
macroblock control commands past a critical memory alignment boundary, so only an EOB within the transform
coefficient structure is used for determining the number of coefficients in each block in non-4:2:0 cases. The
purpose of bNumCoef is to indicate the quantity of data present for each block in the residual difference data
buffer, expressed as the number of coefficients present. When the bConfig4GroupedCoefs member of
DXVA_ConfigPictureDecode is 1, bNumCoef may contain either the actual number of coefficients sent for the
block or that value rounded up to be a multiple of four. The data for these coefficients is found in the residual
difference buffer in the same order.
wTotalNumCoef
The wTotalNumCoef structure member indicates the total number of coefficients in the residual difference data
buffer for the entire macroblock. This member is used only when HostResidDiff is zero and the bChromaFormat
member of DXVA_PictureParameters is not equal to 1 (4:2:0).
Third Part of Macroblock Control Command Structure

If the bPicIntra member of DXVA_PictureParameters is 1, the macroblock control command structure ends with
the data described in the Second Part of Macroblock Control Command Structure. If bPicIntra is zero, the following
additional data elements are included in the macroblock control command to control the motion compensation
process. The data that follows is an array of DXVA_MVvalue structures contained in the MVector member of the
macroblock control command structure. The number of elements in MVector depends on the type of picture
specified by the members of DXVA_PictureParameters in the following table.

BPICOBMC    BPICBINPB    BPIC4MVALLOWED    NUMBER OF ELEMENTS IN MVECTOR
0           0            0                 4
0           0            1                 4
0           1            0                 4
0           1            1                 5
1           0            0                 10
1           0            1                 10
1           1            0                 11
1           1            1                 11

Note The number of motion vectors specified in the MVector arrays for the macroblock control command
structures defined in the dxva.h file is four, as this is the most commonly used form of the structure.
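The table above reduces to a small decision rule. The following C sketch expresses it; the function name is illustrative only.

/* Minimal sketch of the table above: number of elements in the MVector
   array as a function of bPicOBMC, bPicBinPB, and bPic4MVallowed. */
int MVectorElementCount(int bPicOBMC, int bPicBinPB, int bPic4MVallowed)
{
    if (bPicOBMC)
        return bPicBinPB ? 11 : 10;   /* OBMC: 10 vectors, plus one backward vector for the B part of a PB frame */
    if (bPicBinPB && bPic4MVallowed)
        return 5;                     /* B part of a PB frame with four motion vectors allowed */
    return 4;                         /* all remaining non-OBMC cases */
}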
Fourth Part of Macroblock Control Command Structure

If the bPicIntra and the bMV_RPS members of DXVA_PictureParameters are zero, the macroblock control
command structure ends with the data described in Third Part of Macroblock Control Command Structure. The
macroblock control command structure ends with the third part of the structure padded with zero-valued data, if
necessary, to align the next macroblock control command to a 16-byte boundary.
If the bPicIntra member of DXVA_PictureParameters is zero and the bMV_RPS member of
DXVA_PictureParameters is 1, the fourth part of the macroblock control command structure is an array of bytes
called bRefPicSelect. The number of elements in that array is the same as the number of elements in the MVector
array shown in the preceding table. Each element of the array specifies the index of the uncompressed surface
associated with the corresponding motion vector found in the MVector array. Then, the macroblock control
command structure ends and is padded with zero-valued data, if necessary, to align the next macroblock control
command structure to a 16-byte boundary.
Low-Level IDCT Processing Elements

The DirectX VA interface supports various ways of handling low-level inverse discrete-cosine transform (IDCT).
There are two fundamental types of operation:
1. Off-host IDCT: Passing macroblocks of transform coefficients to the accelerator for external IDCT, picture
reconstruction, and reconstruction clipping.
2. Host-based IDCT: Performing an IDCT on the host and passing blocks of spatial-domain results to the
accelerator for external picture reconstruction and reconstruction clipping.
In both cases, the basic inverse-quantization process, pre-IDCT range saturation, MPEG-2 mismatch control (if
necessary), and intra-DC offset (if necessary) are performed on the host. In both cases, the final picture
reconstruction and reconstruction clipping are done on the accelerator.
The inverse quantization, pre-IDCT saturation, mismatch control, intra-DC offset, IDCT, picture reconstruction, and
reconstruction clipping processes are defined in the following steps. The DXVA_QmatrixData structure loads
inverse-quantization matrix data for compressed video picture decoding. (The values of BPP, WT, and HT should be
assumed to be equal to 8, unless otherwise specified by the DXVA_PictureParameters structure.)
1. Perform inverse quantization as necessary (including application of any inverse-quantization weighting matrices) to create a set of IDCT coefficient values F"(u,v) from entropy-coded quantization indices. This is performed by the host.
2. Saturate each reconstructed coefficient value F"(u,v) of the transform coefficient block to obtain a value F'(u,v) within the restricted allowable range. This is performed by the host.
3. Perform mismatch control for MPEG-2. (This stage of processing is needed for MPEG-2 only.) Mismatch control is performed by summing the saturated values of all coefficients in the macroblock (this is equivalent to XORing their least significant bits). If the sum is even, 1 is subtracted from the saturated value of the last coefficient F'(WT-1, HT-1). If the sum is odd, the saturated value of F'(WT-1, HT-1) is used as is, without alteration. The coefficient values that are created after saturation and mismatch control are referred to as F(u,v) in this documentation. This is performed by the host. Note: MPEG-1 has a different form of mismatch control that consists of altering the value by plus or minus 1 for each coefficient that would otherwise have an even value after inverse quantization. H.263 does not require the mismatch control described in this section. In any case, mismatch control is the host's responsibility if needed.
4. Add an intra-DC offset (if necessary) to all intra blocks so all intra blocks represent a difference relative to a spatial reference prediction value of 2^(BPP-1). Such an offset is necessary for all the referenced video-coding standards (H.261, H.263, MPEG-1, MPEG-2, and MPEG-4), except when HostResidDiff is 1 and the bConfigIntraResidUnsigned member of the DXVA_ConfigPictureDecode structure is 1. The intra-DC offset has the value 2^(BPP-1) x sqrt(WT x HT) in the transform domain. This value is 1024 in all cases except MPEG-4, which allows BPP to be greater than 8. This is performed by the host.
5. Perform the inverse discrete cosine transform (IDCT) on either the host or the accelerator. In the IDCT formula: C(u) = 1 for u = 0, otherwise C(u) = sqrt(2); C(v) = 1 for v = 0, otherwise C(v) = sqrt(2); x and y are the horizontal and vertical spatial coordinates in the pixel domain; u and v are the transform-domain horizontal and vertical frequency coordinates; WT and HT are the width and height of the transform block (generally both are 8).
6. Add the spatial-domain residual information to the motion-compensated prediction value for nonintra blocks, or to the constant reference value for intra blocks, to perform picture reconstruction on the accelerator. The constant reference value for intra blocks is 2^(BPP-1), except when HostResidDiff (bit 10 of the wMBtype member of the DXVA_MBctrl_P_HostResidDiff_1 structure) is 1 and the bConfigIntraResidUnsigned member of the DXVA_ConfigPictureDecode structure is 1. In the latter case, the constant is zero.
7. Clip the picture reconstruction to a range from zero through 2^BPP - 1 and store the final resulting picture sample values on the accelerator.
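For illustration, the following C sketch applies the host-side portion of steps 2 and 3 to one 8x8 block of MPEG-2 coefficients. It assumes BPP = 8 and the usual MPEG-2 coefficient saturation range of -2048 to 2047 (the exact saturation formula is not reproduced in this topic), follows the "subtract 1 when the sum is even" rule from step 3, and uses an illustrative row-major array layout rather than a dxva.h structure.

/* Minimal host-side sketch of pre-IDCT saturation and MPEG-2 mismatch
   control for one 8x8 transform block (assumptions noted above). */
#include <windows.h>

void SaturateAndMismatchControlMpeg2(SHORT coeff[64])
{
    LONG sum = 0;
    for (int i = 0; i < 64; i++)
    {
        if (coeff[i] > 2047)  coeff[i] = 2047;    /* saturate F"(u,v) to obtain F'(u,v) */
        if (coeff[i] < -2048) coeff[i] = -2048;
        sum += coeff[i];
    }
    /* Mismatch control: if the sum of the saturated coefficients is even,
       subtract 1 from the last coefficient F'(WT-1, HT-1); if the sum is
       odd, the coefficients are used as is. */
    if ((sum & 1) == 0)
    {
        coeff[63] -= 1;
    }
}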
Off-Host IDCT

The transfer of macroblock inverse discrete-cosine transform (IDCT) coefficient data for off-host IDCT processing is
done using a buffer of scan index and value information to define and specify the transform equations. Index
information is sent as 16-bit words (although only 6-bit quantities are really needed for 8x8 transform blocks).
Transform coefficient value information is sent as signed 16-bit words (although only 12 bits are needed for the
usual case of 8x8 transform blocks and BPP equal to 8).
Transform coefficients are sent in either the DXVA_TCoefSingle structure or the DXVA_TCoef4Group structure. If
the bConfig4GroupedCoefs member of the DXVA_ConfigPictureDecode structure is zero, coefficients are sent
individually using DXVA_TCoefSingle structures. If bConfig4GroupedCoefs is 1, coefficients are sent in groups of
four using DXVA_TCoef4Group structures.
Host-Based IDCT

IDCT may be performed on the host, with the result passed through the DirectX VA API in the spatial domain. There
are two supported methods for sending the results from the host to the accelerator: 16-bit and 8-8 overflow. The
bConfigSpatialResid8 member of the DXVA_ConfigPictureDecode structure indicates which method is used.
16-bit Host-Based IDCT Processing
The macroblock control structures used with 16-bit host-based residual difference decoding are
DXVA_MBctrl_I_HostResidDiff_1 and DXVA_MBctrl_P_HostResidDiff_1.
When sending spatial-domain residual difference data using the 16-bit method, blocks of 16-bit data are sent
sequentially. Each block of spatial-domain data consists of 64 16-bit integers.
If BPP, as derived from the DXVA_PictureParameters structure, is greater than 8, only the 16-bit method can be
used. If the bPicIntra member of the DXVA_PictureParameters structure is 1 and BPP is 8, the 8-8 overflow method
is used. If IntraMacroblock is zero, the 16-bit residual difference samples are sent as signed quantities to be added
to the motion-compensated prediction values. If IntraMacroblock is 1, the 16-bit samples are sent as follows:
If the bConfigIntraResidUnsigned member of the DXVA_ConfigPictureDecode structure is 1, the
samples are sent as unsigned quantities relative to the constant reference value of zero. For example, mid-
level gray would be represented as Y=2^(BPP-1), Cb=2^(BPP-1), Cr=2^(BPP-1).
If the bConfigIntraResidUnsigned member of the DXVA_ConfigPictureDecode structure is zero, the
samples are sent as signed quantities relative to the constant reference value of 2^(BPP-1). For example, mid-
level gray would be represented as Y=0, Cb=0, Cr=0.
Blocks of data are sent sequentially, in the order specified by scanning the wPatternCode member of the
macroblock control structure for bits with values of 1 from the most significant bit to least significant bit.
No clipping of the residual difference values can be assumed to have been performed on the host, unless the
bConfigSpatialHost8or9Clipping member of the DXVA_ConfigPictureDecode structure is 1. Although only a
BPP+1 bit range is needed to adequately represent the spatial-domain difference data, the output of some IDCT
implementations will produce numbers beyond this range unless they are clipped.
Note The accelerator must work with at least a 15-bit range of values. Although video-coding standards typically
specify clipping of a difference value prior to adding it to a prediction value (that is, 9-bit clipping in 8-bit-per-
sample video), this clipping stage is actually unnecessary because it has no effect on the resulting decoded output
picture. It is not assumed that this clipping occurs unless necessary for the accelerator hardware as indicated by the
bConfigSpatialHost8or9Clipping member of the DXVA_ConfigPictureDecode structure being set to 1.
8-8 Overflow Host-Based IDCT Processing
The macroblock control structures used with 8-8 overflow host-based residual difference decoding are
DXVA_MBctrl_I_HostResidDiff_1 and DXVA_MBctrl_P_HostResidDiff_1.
If the BPP variable derived from the DXVA_PictureParameters structure is 8, the 8-8 overflow spatial-domain
residual difference method may be used. Its use is required if the bPicIntra member of this structure is 1 and BPP
is 8. In this case, each spatial-domain difference value is represented using only 8 bits. When sending data using the
8-8 overflow method, blocks of 8-bit data are sent sequentially. Each block of 8-bit spatial-domain residual
difference data consists of 64 bytes containing the values of the data in conventional raster scan order (the
elements of the first row in order, followed by the elements of the second row, and so on).
If IntraMacroblock in the macroblock control command is zero, the 8-bit spatial-domain residual difference samples
are signed differences to be added or subtracted (as determined from the bConfigResid8Subtraction member of
the DXVA_ConfigPictureDecode structure and whether the sample is in a first pass block or an overflow block)
relative to the motion compensation prediction value.
If IntraMacroblock (bit 0 in the wMBtype member of the macroblock structure) is zero, and the difference to be
represented for some pixel in a block is too large to represent using only 8 bits, a second overflow block of 8-bit
spatial-domain residual difference samples is sent.
If IntraMacroblock (bit 0 in the wMBtype member of the macroblock structure) is 1, the 8-bit spatial-domain
residual difference samples are set as follows:
If the bConfigIntraResidUnsigned member of the DXVA_ConfigPictureDecode structure is 1, the 8-bit
samples are sent as unsigned quantities relative to the constant reference value of zero. For example, mid-
level gray would be represented as Y=2^(BPP-1), Cb=2^(BPP-1), Cr=2^(BPP-1).
If the bConfigIntraResidUnsigned member of the DXVA_ConfigPictureDecode structure is zero, the 8-
bit samples are sent as signed quantities relative to the constant reference value of 2^(BPP-1). For example,
mid-level gray would be represented as Y=0, Cb=0, Cr=0.
If IntraMacroblock is 1, 8-bit overflow blocks are not sent.
Blocks of data are sent sequentially, in the order specified by scanning the wPatternCode member of the
macroblock control command for bits with values of 1, from most significant to least significant. All necessary 8-bit
overflow blocks are then sent as specified by the wPC_Overflow member of the macroblock control command.
Such overflow blocks are subtracted rather than added if the bConfigResid8Subtraction member of the
DXVA_ConfigPictureDecode structure is 1. The first pass of 8-bit differences for each nonintra macroblock is
added. If the bPicOverflowBlocks member of the DXVA_PictureParameters structure is zero or the
IntraMacroblock member of the macroblock control command is 1, there is no second pass. If
bPicOverflowBlocks is 1, IntraMacroblock is zero, and bConfigResid8Subtraction is 1, the second pass of 8-bit
differences for each nonintra macroblock is subtracted. If bPicOverflowBlocks is 1, IntraMacroblock is zero, and
bConfigResid8Subtraction is zero, the second pass of 8-bit differences for each nonintra macroblock is added.
If any sample is nonzero in both an original 8-bit block and in a corresponding 8-bit overflow block, the following
rules apply:
If bConfigResid8Subtraction is zero, the sign of the sample must be the same in both blocks.
If bConfigResid8Subtraction is 1, the sign of the sample in the original 8-bit block must be the same as the
sign of negative 1 times the value of the sample in the corresponding overflow block.
These rules allow the sample to be added to the prediction picture with 8-bit clipping of the result after each of the
two passes.
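As a rough illustration of the two-pass rule for a nonintra sample (applicable only when bPicOverflowBlocks is 1 and IntraMacroblock is zero), the following C sketch reconstructs one sample with 8-bit clipping after each pass. The helper names are illustrative only.

/* Minimal sketch: reconstruct one nonintra sample from a motion-compensated
   prediction plus the first-pass and overflow 8-bit differences described
   above, clipping to 8 bits after each pass. */
static int Clip8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

int ReconstructSample8_8(int prediction,
                         signed char firstPassDiff,
                         signed char overflowDiff,
                         int bConfigResid8Subtraction)
{
    int value = Clip8(prediction + firstPassDiff);   /* first pass is always added            */
    if (bConfigResid8Subtraction)
        value = Clip8(value - overflowDiff);         /* overflow-block pass is subtracted      */
    else
        value = Clip8(value + overflowDiff);         /* overflow-block pass is added           */
    return value;
}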
Note Using 8-bit differences with overflow blocks with bConfigResid8Subtraction equal to zero (which results in
adding two 8-bit differences for each overflow block) cannot represent a residual difference value of +255 if
IntraMacroblock is zero. (The largest difference value that can be represented this way is 127+127=254.) This
makes the 8-8 overflow host-based IDCT method not strictly compliant with video-coding standards when
bConfigResid8Subtraction is zero. However, this format is supported because it is used in some existing
implementations, is more efficient than 16-bit sample use in terms of the amount of data needed to represent a
picture, and does not generally result in any meaningful degradation of video quality.
Deblocking Filter Control

Deblocking filter control commands, if present, are sent once for each luminance block in a macroblock and are
sent once for each pair of chrominance blocks. The filter control commands are sent in raster-scan order within the
macroblock. Filter control commands are sent for all blocks for luminance before any blocks for chrominance. Filter
control commands are then sent for one chrominance 4:2:0 block, then for one chrominance 4:2:2 block (if 4:2:2 is
in use), then for two chrominance 4:4:4 commands if needed (the same filtering is applied to both chrominance
components).
The filtering for each block is done by specifying deblocking across the top edge of the block, followed by
deblocking across the left edge of the block. Deblocking is specified for chrominance only once, and the same
deblocking commands are used for both the Cb and Cr components. For example, deblocking of a 16x16
macroblock that contains 4:2:0 data using 8x8 blocks is done by sending four sets of two (top and left) edge-
filtering commands for the luminance blocks, followed by one set of two edge-filtering commands for the two
chrominance blocks.
Edge Filtering Command Bytes

Each edge filtering control command consists of a single byte. The DXVA_DeblockingEdgeControl constant defined
in dxva.h defines how deblocking edges are processed. The 7 most significant bits of the byte contain the
EdgeFilterStrength variable, and the least significant bit is the EdgeFilterOn flag.
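For illustration, the two fields of an edge filtering command byte can be separated as follows; the helper names are illustrative only and are not the dxva.h accessors.

/* Minimal sketch of parsing an edge filtering control command byte:
   EdgeFilterStrength in the 7 most significant bits, EdgeFilterOn in the
   least significant bit. */
#include <windows.h>

BYTE GetEdgeFilterStrength(BYTE command) { return (BYTE)(command >> 1); }
BOOL GetEdgeFilterOn(BYTE command)       { return command & 1; }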
Edge filtering is performed as specified in H.263 Annex J. The EdgeFilterStrength variable specifies the strength of
the filtering to be performed. The EdgeFilterOn flag specifies whether filtering is to be done. EdgeFilterOn is 1 if the
edge is to be filtered, and zero if not.
Edge filtering (for the edges with EdgeFilterOn equal to 1) is performed with the strength value specified by
EdgeFilterStrength and with clipping the output to the range of 0 to 2^BPP - 1. Top-edge filtering for all blocks is
performed before left-edge filtering for any blocks because the values of the samples used for top-edge filtering
must be those reconstructed values prior to any deblocking filtering for left-edge filtering.
If the bPicDeblockConfined member of the DXVA_PictureParameters structure indicates that sample values of
macroblocks outside of the current deblocking filter command buffer are not affected, the EdgeFilterOn flag is zero
for all edges at the left and top of the region covered by the macroblocks with deblocking filter commands in the
buffer.
Read-Back Buffers
One read-back command buffer is passed to the accelerator when the bPicReadbackRequests member of the
DXVA_PictureParameters structure is 1. The data in this buffer commands the accelerator to return the resulting
final picture macroblock data (after deblocking, if applicable) to the host. If an encryption protocol is in use, the
accelerator may respond to read-back requests by returning an error indication, erroneous data, or encrypted data
(as may be specified by the encryption protocol).
The read-back command buffer passed to the accelerator must contain read-back commands consisting of a single
wMBaddress member of the macroblock control command for the macroblock to be read. The wMBaddress
member is a 16-bit value that specifies the macroblock address of the current macroblock in raster-scan order.
Raster-scan order (based on the wPicWidthInMBminus1 and wPicHeightInMBminus1 members of the
DXVA_PictureParameters structure) is defined as follows:
Zero is the address of the top-left macroblock.
wPicWidthInMBminus1 is the address of the top-right macroblock.
wPicHeightInMBminus1 x (wPicWidthInMBminus1+1) is the address of the lower-left macroblock.
(wPicHeightInMBminus1+1) x (wPicWidthInMBminus1+1)-1 is the address of the lower-right
macroblock.
If BPP as specified in the bBPPminus1 member of the DXVA_PictureParameters structure is 8, the macroblock
data is returned in the form of 8-bit unsigned values (thus, black is nominally Y=16, Cb=Cr=128, and white is
nominally Y=235, Cb=Cr=128). If BPP is greater than 8, the data is returned in the form of 16-bit unsigned values.
The macroblock data is returned from the accelerator to the host in the form of a copy of the read-back command
buffer itself, followed by padding to the next 32-byte alignment boundary. Then, the macroblock data values for
luminance and chrominance data are returned in the order sent in the read-back command buffer, in the form of 64
samples per block for each block in each macroblock.
Residual difference blocks within a macroblock are returned in the order specified in MPEG-2 Figures 6-10, 6-11,
and 6-12 (raster-scan order for Y blocks of the macroblock, followed by the 4:2:0 block of Cb, followed by the 4:2:0
block of Cr. If in a 4:2:2 or a 4:4:4 sampling operation, the 4:2:0 blocks are followed by the 4:2:2 block of Cb,
followed by the 4:2:2 block of Cr. If in 4:4:4 sampling operation, the 4:2:2 blocks are followed by the 4:4:4 blocks of
Cb, followed by the 4:4:4 blocks of Cr).
Off-Host VLD Bitstream Decoding Operation

When variable-length decoding of raw bitstream data is performed on the accelerator, the data sent by the host for
the decoding of the picture is divided into the following buffer types.

BUFFER TYPE                    DESCRIPTION
Inverse-quantization matrix    Provides information about how to perform inverse-quantization of the bitstream data.
Slice control                  Provides information about the location of start codes and data within a corresponding bitstream data buffer.
Bitstream                      Contains raw streams of data encoded according to a particular video coding specification.

Inverse-Quantization Matrix Buffers


An inverse-quantization matrix buffer is sent to initialize inverse-quantization matrices for off-host bitstream
decoding. Inverse-quantization matrix buffers provide information about how to decode all current and subsequent
video in the bitstream, until a new inverse-quantization matrix buffer is provided. (Thus, inverse-quantization
matrices are persistent.) No more than one inverse-quantization matrix buffer can be sent from the host to the
accelerator at a time. The DXVA_QmatrixData structure loads quantization matrix data for compressed video-
picture decoding.
Slice-Control Buffers
Slice-control buffers guide the operation of off-host VLD bitstream processing. The host software decoder
determines the location of slice-level resynchronization points in the bitstream. A slice is defined to be a
multimacroblock layer that includes a resynchronization point in the bitstream data. In H.261 bitstreams, an H.261
Group Of Blocks (GOB) is considered a slice. In H.263 bitstreams, a sequence of one or more H.263 GOBs starting
with a GOB start code and containing no additional GOB start codes is considered a slice. The slice-control buffer
contains an array of DXVA_SliceInfo slice-control structures, which apply to the contents of a corresponding
bitstream data buffer.
Bitstream Buffers
If a bitstream buffer is used, the buffer simply contains raw bytes from a video bitstream. This type of buffer is used
for off-host decoding, including low-level bitstream parsing with variable-length decoding.
Certain restrictions are imposed on the contents of bitstream buffers, in order that the data received by accelerators
is in a recognizable and efficient form.
1. Except for MPEG-1 and MPEG-2, the first bitstream buffer for each picture must start with all data, if any,
following the end of all data for any prior picture that precedes the first slice for the current picture in the
bitstream (for example, the sequence header or picture header).
2. For MPEG-1 and MPEG-2, the first bitstream buffer for each picture must start with the slice start code of the
first slice of the picture (for example, no sequence header or picture header), because all relevant data is
provided in other parameters.
3. If the start of a slice of bitstream data is located within a particular bitstream buffer, the end of that slice must
also be located within that same buffer unless the buffer that contains the start of the slice has reached its
allocated size.
The decoder should manage the filling of the bitstream buffers to avoid placing the data for one slice into more
than one buffer.
Alpha-Blend Data Loading

When the bDXVA_Func variable is equal to 2, the operation specified is the loading of data specifying an alpha-
blending surface to be blended with video data. There are three ways that the alpha-blending data can be loaded:
A 16-entry AYUV palette with an index-alpha 4-4 (IA44) or alpha-index 4-4 (AI44) alpha-blending surface
A 16-entry AYUV palette with DPXD, Highlight, and DCCMD data
An AYUV graphic surface
The DXVA_ConfigAlphaLoad structure determines which of these methods is used.
Loading an AYUV Alpha-Blending Surface

An AYUV alpha-blending surface is defined as an array of samples of 32 bits each in the DXVA_AYUVsample2
structure. This surface can be used as the source for blending a graphic with decoded video pictures.
The width and height of the AYUV alpha-blending surface are specified in the associated buffer description list.
Loading a 16-Entry YUV Palette
A 16-entry YUV palette is defined as an array of 16 DXVA_AYUVsample2 structures. This palette is used along
with an IA44 or AI44 alpha-blending surface. The palette array is sent to the accelerator in an AYUV alpha-blending
sample buffer (buffer type 8). In this case, the bSampleAlpha8 member of the DXVA_AYUVsample2 structure for
each sample has no meaning and must be zero.
The YUV palette can be used to create the source for blending a graphic with decoded video pictures. This palette
can be used to create the graphic source along with either
An IA44/AI44 alpha-blending surface, or
A DPXD alpha-blending surface, a highlight buffer, and DCCMD data
Loading an AYUV Surface
Rather than loading just a 16-entry palette, an entire image graphic can simply be loaded directly as an AYUV
image to specify the graphic content. In this case, the AYUV graphic is sent to the accelerator in an AYUV alpha-
blending sample buffer (buffer type 8) as specified in the DXVA_BufferDescription structure.
Loading an IA44/AI44 Alpha-Blending Surface
An index-alpha 4-4 (IA44) alpha-blending surface is defined as an array of 8-bit samples, each of which is
structured as a byte. This byte is referred to as DXVA_IA44sample and is defined in dxva.h. The 4 most significant
bits of this byte contain an index referred to as SampleIndex4, and the 4 least significant bits of this byte contain an
alpha value referred to as SampleAlpha4.
An alpha-index 4-4 (AI44) alpha-blending surface is defined as an array of 8-bit samples, each of which is
structured as a byte. This byte is referred to as DXVA_AI44sample and is defined in dxva.h. The 4 most significant
bits of this byte contain an alpha value referred to as SampleAlpha4 and the 4 least significant bits of this byte
contain an index referred to as SampleIndex4.
The SampleIndex4 field for both DXVA_IA44sample and DXVA_AI44sample contains the index into the 16-entry
palette for the sample.
The SampleAlpha4 field for both DXVA_IA44sample and DXVA_AI44sample contains the following values to specify
the opacity of the sample:
Zero indicates that the sample is transparent (so that the palette entry for SampleIndex4 has no effect on the
resulting blended picture). For a zero value of SampleAlpha4, the blend specified is to use the picture value
without alteration.
A value of 15 indicates that the sample is opaque (so that the palette entry for SampleIndex4 completely
determines the resulting blended picture).
Nonzero values indicate that the blend specified is found by the following expression:
((SampleAlpha4+1) X graphic_value + (15-SampleAlpha4) X picture_value + 8) >> 4
The width and height of the IA44 alpha-blending surface are specified in the associated buffer description list.
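As a minimal illustrative sketch (the types and names below are not from dxva.h), the following fragment unpacks one IA44 sample and blends a single luminance value against the decoded picture using the expression above; for an AI44 sample, the index and alpha nibbles are simply reversed.

    typedef struct { unsigned char Y, Cb, Cr, Alpha8; } AYUV_PALETTE_ENTRY;

    unsigned char BlendLumaIA44(unsigned char ia44Sample,
                                const AYUV_PALETTE_ENTRY palette[16],
                                unsigned char pictureLuma)
    {
        unsigned char SampleIndex4 = (unsigned char)((ia44Sample >> 4) & 0x0F); /* 4 MSBs: index */
        unsigned char SampleAlpha4 = (unsigned char)(ia44Sample & 0x0F);        /* 4 LSBs: alpha */
        unsigned char graphicLuma  = palette[SampleIndex4].Y;

        if (SampleAlpha4 == 0)
            return pictureLuma;   /* transparent: use the picture value unaltered */

        return (unsigned char)(((SampleAlpha4 + 1) * graphicLuma +
                                (15 - SampleAlpha4) * pictureLuma + 8) >> 4);
    }

The same blend is applied to the chrominance components, subject to the chrominance subsampling considerations described for alpha-blend combination.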
Send comments about this topic to Microsoft
Loading a DPXD Alpha-Blending Surface
4/26/2017 • 1 min to read • Edit Online

A decoded PXD (DPXD) alpha-blending surface is defined as an array of bytes for a frame. Each byte of frame data
contains four 2-bit samples. Each 2-bit sample is used as an index into a four-color table determined by highlight
and DCCMD (display control command) data. The result of the combination of DPXD, highlight, and DCCMD is
equivalent to an IA44 surface, and is used with a 16-entry YUV palette for blending. If the DPXD alpha-blending
surface is treated as an array of bytes, the index of the first 2-bit sample is in the most significant bits of the first
byte of DPXD data, the next sample is in the next 2 bits, the third sample is in the next 2 bits, the fourth sample is in
the least significant bits, the fifth sample is in the most significant bits of the next byte, and so on.
The DPXD alpha-blending surface may be created from the PXD information on a DVD. (The PXD data is
recorded on a DVD in a run-length encoded format.) The creation of DPXD from the PXD on a DVD requires the
host decoder to perform run-length decoding of the raw PXD data on the DVD.
The stride of the surface must be interpreted as the stride in bytes, not in 2-bit samples. However, the width and
height must be in 2-bit sample units.
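As a minimal illustrative sketch, the 2-bit sample at a given position can be fetched as follows; the function name is illustrative only, x and y are in 2-bit sample units, and the stride is assumed to be given in bytes as described above.

    unsigned char GetDpxdSample(const unsigned char *dpxd,
                                unsigned int strideInBytes,
                                unsigned int x, unsigned int y)
    {
        unsigned char packed = dpxd[y * strideInBytes + (x >> 2)];
        unsigned int  shift  = 6 - 2 * (x & 3);   /* first sample in the most significant bits */

        return (unsigned char)((packed >> shift) & 0x3);  /* index into the four-color table */
    }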
Note The PXD on a DVD is in a field-structured interlaced format. The DPXD alpha-blending surface defined for
DirectX VA is not. The host is therefore responsible for interleaving the data from the two fields if forming DPXD
from DVD PXD data.
For more clarification of DVD subpicture definition and data field interpretation, see DVD Specifications for Read-
Only Disk: Part 3 - Video Specification (version 1.11, May 1999).
Send comments about this topic to Microsoft
Loading Highlight Data
4/26/2017 • 1 min to read • Edit Online

The DXVA_Highlight structure specifies a highlighted rectangular area of a subpicture, and is used along with
DCCMD data and a DPXD surface to create an alpha-blending surface. The highlight data is formatted in a manner
compatible with the DVD ROM specification. For further clarification of DVD subpicture definition and data field
interpretation, see DVD Specifications for Read-Only Disk: Part 3 - Video Specification (v. 1.11, May 1999).
Send comments about this topic to Microsoft
Loading DCCMD Data
4/26/2017 • 1 min to read • Edit Online

The DCCMD (display control command) data is formatted in a manner compatible with the DVD ROM specification,
and is to be applied along with highlight data to a DPXD surface to create an alpha-blending surface. The DCCMD
data buffer contents must consist of data formatted as a list of DVD DCCMDs. For further clarification of DVD
subpicture definition and data field interpretation, see DVD Specifications for Read-Only Disk: Part 3 - Video
Specification (version 1.11, May 1999).
Send comments about this topic to Microsoft
Alpha-Blend Combination
4/26/2017 • 1 min to read • Edit Online

When the bDXVA_Func variable is equal to 3, the operation specified is an alpha-blend combination. An alpha-
blend combination takes the last loaded alpha-blend source information and combines it with a reference picture
to create a blended picture for display.
The alpha-blend combination buffer specified by the dwTypeIndex member of the DXVA_BufferDescription
structure is used to generate a blended picture from a source picture and alpha-blending information. In the event
that the source and destination pictures are not in 4:4:4 format, every second sample (for example, the first, third,
fifth, and so on) of the graphic blending information in an AYUV alpha-blending surface or equivalent is applied to
the (lower resolution) source chrominance information in the vertical or horizontal direction, as applicable, to
produce the blended result.
The following structures are used to implement alpha-blend combination.

STRUCTURE DESCRIPTION

DXVA_BufferDescription Specifies the alpha-blend combination buffer to be used. This buffer governs the generation of a blended picture from a source picture and alpha-blending information.

DXVA_BlendCombination Specifies how a blended picture is generated from an alpha-blend combination buffer.

DXVA_ConfigAlphaCombine Establishes the configuration for how alpha-blending combination operations are to be performed.

Send comments about this topic to Microsoft


MPEG-2 Pan-Scan Example
4/26/2017 • 1 min to read • Edit Online

When the PictureSourceRect16thPel member of the DXVA_BlendCombination structure is used to select an area specified by MPEG-2 video pan-scan parameters, the values for PictureSourceRect16thPel members can be computed using the following expressions. These values should not violate the restrictions described for the alpha-blend combination buffers when using PictureSourceRect16thPel. For more information, see the Remarks section for the DXVA_BlendCombination structure.
These constraints could be violated with some MPEG-2 pan-scan parameters and, in particular, with some MPEG-2 DVD content, requiring some adjustments to the PictureSourceRect16thPel.
left = 8 x (horizontal_size - display_horizontal_size) - frame_centre_horizontal_offset
top = 8 x (vertical_size - display_vertical_size) - frame_centre_vertical_offset
right = left + (16 x display_horizontal_size)
bottom = top + (16 x display_vertical_size)
The PictureDestinationRect member of the DXVA_BlendCombination structure would then typically use the
following values:
left = 0 or 8 (as in DVD 704-Wide Non-Pan-Scan Picture Example)
top = 0
right = left + display_horizontal_size
bottom = top + display_vertical_size
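As a minimal sketch, the expressions above might be evaluated as follows; the rectangle type and function name are placeholders standing in for the PictureSourceRect16thPel and PictureDestinationRect members of DXVA_BlendCombination.

    typedef struct { long left, top, right, bottom; } PAN_SCAN_RECT;

    void ComputeMpeg2PanScanRects(long horizontal_size, long vertical_size,
                                  long display_horizontal_size, long display_vertical_size,
                                  long frame_centre_horizontal_offset,
                                  long frame_centre_vertical_offset,
                                  PAN_SCAN_RECT *source16thPel, PAN_SCAN_RECT *destination)
    {
        /* Source rectangle in units of 1/16 of a luminance sample. */
        source16thPel->left   = 8 * (horizontal_size - display_horizontal_size)
                                    - frame_centre_horizontal_offset;
        source16thPel->top    = 8 * (vertical_size - display_vertical_size)
                                    - frame_centre_vertical_offset;
        source16thPel->right  = source16thPel->left + 16 * display_horizontal_size;
        source16thPel->bottom = source16thPel->top  + 16 * display_vertical_size;

        /* Typical destination rectangle in whole samples; left may instead be 8,
           as in the DVD 704-Wide Non-Pan-Scan Example. */
        destination->left   = 0;
        destination->top    = 0;
        destination->right  = destination->left + display_horizontal_size;
        destination->bottom = destination->top  + display_vertical_size;
    }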
Send comments about this topic to Microsoft
DVD 4:3 Pan-Scan Within 16:9 Pictures Example
4/26/2017 • 1 min to read • Edit Online

In DVD use of MPEG-2 for 4:3 pan-scan within 16:9 pictures, the pan-scan MPEG-2 variables must not violate the
restrictions specified in the DXVA_BlendCombination structure. These variables must also maintain the following
restrictions required by the DVD specification.

MPEG-2 VARIABLE VALUE

horizontal_size 720 or 704

vertical_size 480 or 576

display_horizontal_size 540

display_vertical_size vertical_size

frame_centre_vertical_offset Zero

frame_centre_horizontal_offset Less than or equal to 1440 for horizontal_size = 720; less than or equal to 1312 for horizontal_size = 704

The formulation described in MPEG-2 Pan-Scan Example can then be applied directly in this case.
Send comments about this topic to Microsoft
DVD 704-Wide Non-Pan-Scan Example
4/26/2017 • 1 min to read • Edit Online

The use of MPEG-2 on DVD for 704-wide pictures requires a source rectangle that exceeds the boundaries of the
decoded picture (if using the method described in MPEG-2 Pan-Scan Example). In this case, the DVD specifies a
display_horizontal_size of 720 that exceeds the decoded picture's horizontal_size of 704. When the source
rectangle exceeds the boundaries of the decoded picture, the host software decoder is responsible for cropping the
source rectangle to keep it from reaching outside the allocated source area and for managing the destination
rectangle to adjust for the cropping.
The source rectangle is defined by the PictureSourceRect16thPel member of the DXVA_BlendCombination
structure (in one-sixteenth of a luminance sample spacing resolution) with the following values:
left = 0
right = 16 X (left + horizontal_size) = 11264
The picture destination rectangle is defined by the PictureDestinationRect member of the
DXVA_BlendCombination structure (in one-sixteenth of a luminance sample spacing resolution) by one of the
following two alternatives:
1. A rectangle with the following values:
left = (display_horizontal_size − horizontal_size) / 2 = 8
right = left + horizontal_size = 712
2. A rectangle with the following values:
left = 0
right = left + horizontal_size = 704
In the second case, the rectangle indicated by the GraphicDestinationRect member of the
DXVA_BlendCombination structure is displaced to the left by eight samples to compensate for the shifted picture
destination.
The second of these two alternatives creates only the destination area that is used for the display.
Send comments about this topic to Microsoft
DVD 352-Wide Example
4/26/2017 • 1 min to read • Edit Online

DVD can use 352-wide pictures, which can be stretched to a width of 704 by use of the PictureSourceRect16thPel
member of the DXVA_BlendCombination structure (in one-sixteenth of a luminance sample spacing resolution).
The PictureSourceRect16thPel member defines a source rectangle with the following values:
left = 0
right = 16 X (left + horizontal_size) = 5632
The PictureDestinationRect member of the DXVA_BlendCombination structure defines two alternative
destination rectangles with the following values:
1. A destination rectangle with the following values:
left = 8
right = left + (2 X horizontal_size) = 712
2. A destination rectangle with the following values:
left = 0
right = left + (2 X horizontal_size) = 704
In the second case, the rectangle indicated by the GraphicDestinationRect member of the
DXVA_BlendCombination structure is displaced to the left by eight samples to compensate for the shifted picture destination.
The second of these two alternatives creates only the destination area that is used for the display.
Send comments about this topic to Microsoft
DVD 720-Wide Example
4/26/2017 • 1 min to read • Edit Online

The use of MPEG-2 on DVD with 720-wide pictures uses picture source rectangle values specified by the
PictureSourceRect16thPel member of the DXVA_BlendCombination structure (in one-sixteenth of a luminance
sample spacing resolution) with the following values:
left = 0
right = left + (16 X horizontal_size) = 11520
Generally, the following destination rectangle values are used:
left = 0
right = left + horizontal_size = 720
Send comments about this topic to Microsoft
DVD 16:9 Letterbox Height in 4:3 Example
4/26/2017 • 1 min to read • Edit Online

The use of 16:9 video for 4:3 displays with letterbox framing for DVD has the following values for the source and
destination pictures.
The following rectangle values are used in the PictureSourceRect16thPel member of the
DXVA_BlendCombination structure for the source picture:
top = 0
bottom = top + (16 X vertical_size) = 7680 or 9216
The following rectangle values are used in the PictureDestinationRect member of the
DXVA_BlendCombination structure for the destination picture:
top = vertical_size / 8 = 60 or 72
bottom = 7 X vertical_size / 8 = 420 or 504
Send comments about this topic to Microsoft
Picture Resampling Control
4/26/2017 • 1 min to read • Edit Online

When the bDXVA_Func variable is equal to 4, the operation specified is picture resampling. This operation is used
for purposes such as spatial scalable video coding, reference picture resampling, or resampling for use as an
upsampled or display picture.
Picture resampling is performed as specified in H.263 Annex O Spatial Scalability or in H.263 Annex P with clipping
at the picture edges, which is the same method of picture resampling as in some forms of Spatial Scalability in
MPEG-2 and MPEG-4. This function uses simple two-tap separable filtering.
Note that picture resampling control does not require a connection configuration. Its operation requires only
support of the appropriate restricted mode GUID. Because no connection configuration is needed for picture
resampling control, no minimal interoperability set must be defined for its operation.
A single buffer type defined in the DXVA_PicResample structure controls the resampling process.
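As a rough illustration of the kind of two-tap separable filtering referred to above, the following sketch upsamples one row of samples by a factor of two, clipping at the right picture edge. It is not the exact H.263 Annex O/P resampling procedure, and the function name is illustrative only; a full resampler would apply the same filter separably to rows and then to columns.

    void UpsampleRowByTwo(const unsigned char *src, unsigned int srcWidth,
                          unsigned char *dst /* holds 2 * srcWidth samples */)
    {
        unsigned int x;

        for (x = 0; x < srcWidth; x++)
        {
            unsigned int next = (x + 1 < srcWidth) ? (x + 1) : x;  /* clip at the edge */

            dst[2 * x]     = src[x];
            dst[2 * x + 1] = (unsigned char)((src[x] + src[next] + 1) >> 1);
        }
    }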
Send comments about this topic to Microsoft
Defining Accelerator Capabilities
4/26/2017 • 1 min to read • Edit Online

An accelerator can be used in restricted operation, in which case it conforms to a restricted profile, or it can be used
in nonrestricted operation, in which case it does not conform to a restricted profile.
Restricted Operation
The capabilities of an accelerator are defined according to which restricted profile it supports. An accelerator may
support one or more restricted profiles.
Some restricted profiles are defined as subsets of the capabilities of other restricted profiles (for example, the
MPEG2_A profile is a subset of the capabilities of the MPEG2_B profile). Accelerators that support a particular
restricted profile must also support any restricted profile that is a subset of the profile being supported. For
example, accelerators that support the MPEG2_B profile must also support the MPEG2_A profile.
Nonrestricted Operation
If in DirectX VA an accelerator is used without strict conformance to a restricted profile, the wRestrictedMode
member of the DXVA_ConnectMode structure must be set to 0xFFFF to indicate this lack of restriction.
All defined values of the bDXVA_Func variable are allowed.
Send comments about this topic to Microsoft
Restricted Profiles
4/26/2017 • 1 min to read • Edit Online

This section provides information about the following restricted profiles that can be supported by Microsoft
DirectX VA.
These restricted profiles are defined in anticipation of combinations of features likely to find widespread support. They establish the sets of video coding tools necessary for decoding, and also provide a way to determine whether a given video data format can be decoded in some fashion using the DirectX VA API.
H261_A
H261_B
H263_A
H263_B
H263_C
H263_D
H263_E
H263_F
MPEG1_A
MPEG2_A
MPEG2_B
MPEG2_C
MPEG2_D
WMV8_A, WMV8_B, WMV9_A, WMV9_B, and WMV9_C
For information about the restricted profiles of the MPEG-4 AVC (H.264) and VC-1 video codec standards,
download DirectX Video Acceleration Specification for H.264/AVC Decoding and DirectX Video Acceleration
Specification for Windows Media Video v8, v9 and vA Decoding (Including SMPTE 421M "VC-1").
Send comments about this topic to Microsoft
H261_A
7/21/2017 • 1 min to read • Edit Online

The H261_A restricted profile contains the set of features required for minimal support of ITU-T Rec. H.261 without
acceleration support for H.261 Annex D graphics. Support of this profile is currently encouraged, but not required.
This set of features is defined by the following restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined by
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1 (picture decoding).

STRUCTURE MEMBER CONSTANT

wRestrictedMode DXVA_RESTRICTED_MODE_H261_A

Restrictions on DXVA_PictureParameters
The following restrictions on the DXVA_PictureParameters structure apply when the bDXVA_Func variable
defined by the dwFunction member of DXVA_ConfigPictureDecode is equal to 1.

STRUCTURE MEMBER VALUE

bBPPMinus1 7

bSecondField Zero

bMacroblockWidthMinus1 15

bMacroblockHeightMinus1 15

bBlockWidthMinus1 7

bBlockHeightMinus1 7

bChromaFormat 1 (4:2:0)

bPicStructure 3 (frame structured)

bMVprecisionAndChromaRelation 2 (H.261 integer-sample motion)

bPicExtrapolation Zero

bPicDeblocked Zero

bPic4MVallowed Zero

bPicOBMC Zero

bMV_RPS Zero

bPicScanFixed 1

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
The following restrictions on the DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1,
DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1 structures apply when the bDXVA_Func
variable defined by the dwFunction member of DXVA_ConfigPictureDecode is equal to 1.

STRUCTURE MEMBER VALUE

MotionType 2 (frame motion) if the MotionForward variable defined in the wMBtype member of these structures equals 1.

MBscanMethod Zero (zigzag) if the bConfigHostInverseScan member of DXVA_ConfigPictureDecode equals zero.

FieldResidual Zero (frame residual)

MotionBackward Zero (no backward prediction)

Restrictions on Bitstream Buffers


The contents of any bitstream buffer must contain data in the H.261 video format.
This restriction applies when the bDXVA_Func variable defined by the dwFunction member of
DXVA_ConfigPictureDecode is equal to 1.
Send comments about this topic to Microsoft
H261_B
7/21/2017 • 1 min to read • Edit Online

The H261_B restricted profile contains the set of features required for support of ITU-T Rec. H.261 without
acceleration support for H.261 Annex D graphics, but with deblocking filter postprocessing support. Support of this
profile is currently encouraged, but not required.
This set of features is defined by the restrictions for the H261_A restricted profile, with the following exceptions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined by
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1 (picture decoding).

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H261_B

Restrictions on DXVA_PictureParameters
The following restrictions on the DXVA_PictureParameters structure apply when the bDXVA_Func variable
defined by the dwFunction member of DXVA_ConfigPictureDecode is equal to 1.

STRUCTURE MEMBER VALUE

bPicDeblocked Zero or 1

wDeblockedPictureIndex Must not be equal to the wDecodedPictureIndex member of DXVA_PictureParameters when the bPicDeblocked member is 1.

Send comments about this topic to Microsoft


H263_A
7/21/2017 • 1 min to read • Edit Online

The H263_A restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a small
specific set of enhanced optional capabilities. Support of this profile is currently encouraged but not required. This
set of features is defined by the following set of restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined by
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1 (picture decoding).

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H263_A

Restrictions on DXVA_PictureParameters
The following restrictions on the DXVA_PictureParameters structure apply when the bDXVA_Func variable
defined by the dwFunction member of DXVA_ConfigPictureDecode is equal to 1.

STRUCTURE MEMBER VALUE

BPP variable (defined by adding 1 to bBPPminus1) 8

bSecondField Zero

bMacroblockWidthMinus1 15

bMacroblockHeightMinus1 15

bBlockWidthMinus1 7

bBlockHeightMinus1 7

bChromaFormat 1 (4:2:0)

bPicStructure 3 (frame structured)

bRcontrol Zero

bMVprecisionAndChromaRelation 1 (H.263 half-sample motion)


bPicExtrapolation Zero

bPicDeblocked Zero

bPic4MVallowed Zero

bPicOBMC Zero

bMV_RPS Zero

bPicScanFixed 1

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
The following restrictions on the DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1,
DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1 structures apply when the bDXVA_Func
variable defined by the dwFunction member of DXVA_ConfigPictureDecode is equal to 1.

WMBTYPE VARIABLES VALUE

MotionType 2 (frame motion) if the MotionForward variable defined in the wMBtype member is equal to 1.

MBscanMethod Zero (zigzag) if the bConfigHostInverseScan member of DXVA_ConfigPictureDecode equals zero.

FieldResidual Zero (frame residual)

H261LoopFilter Zero (no H.261 loop filter)

MotionBackward Zero (no backward or bidirectional motion)

Restrictions on Bitstream Buffers


The contents of any bitstream buffer must contain data in the H.263 video format in baseline mode (no options, no
PLUSPTYPE), or with Annex L information (to be ignored).
Send comments about this topic to Microsoft
H263_B
7/21/2017 • 1 min to read • Edit Online

The H263_B restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set
of enhanced optional capabilities. Support of this profile is currently encouraged, but not required. This set of
features is specified by the restrictions listed for the H263_A restricted profile, except for the following additional
restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined in
the dwFunction member of DXVA_ConfigPictureDecode is equal to 1 (picture decoding).

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H263_B

Restrictions on DXVA_PictureParameters
The following restrictions on the DXVA_PictureParameters structure apply when the bDXVA_Func variable
defined in the dwFunction member of DXVA_ConfigPictureDecode is equal to 1.

STRUCTURE MEMBER VALUE

bRcontrol Equal to zero or 1

bPicExtrapolation Equal to zero or 1

bPic4MVallowed Equal to zero or 1

bPicScanFixed Equal to zero or 1

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
STRUCTURE MEMBER VALUE

MBscanMethod May be a value of zero (zigzag), a value of 1 (alternate vertical) or a value of 2 (alternate horizontal) if bConfigHostInverseScan is equal to zero.

wMBtype Motion4MV flag contained in this structure member is equal to zero or 1.

Restrictions on Bitstream Buffers


The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of CPCF,
CPFMT and Annexes D, I, N (single forward reference picture per output picture), and T.
Send comments about this topic to Microsoft
H263_C
7/21/2017 • 1 min to read • Edit Online

The H263_C restricted profile contains the set of features required for support of ITU-T Recommendation H.263
and a specific set of enhanced optional capabilities. Support of this profile is currently encouraged but not required.
This set of features is specified by the restrictions listed above for the H263_B restricted profile, except for the
following additional restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined by
the dwFunction member of DXVA_ConfigPictureDecode is equal to 1 (picture decoding).

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H263_C

Restrictions on DXVA_PictureParameters
STRUCTURE MEMBER VALUE

bPicDeblocked May be 1.

wDeblockedPictureIndex May or may not be equal to the wDecodedPictureIndex member of DXVA_PictureParameters when the bPicDeblocked member is 1.

Restrictions on Bitstream Buffers


The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of CPCF,
CPFMT and Annexes D, I, J, N (single forward-reference picture per output picture), and T.
Send comments about this topic to Microsoft
H263_D
7/21/2017 • 1 min to read • Edit Online

The H263_D restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set
of enhanced optional capabilities. Support of this profile is currently encouraged, but not required. This set of
features is specified by the restrictions for the H263_C restricted profile, except for the following additional
restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined in
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1 (picture decoding) or 4
(picture resampling).

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H263_D

Restrictions on DXVA_PictureParameters
STRUCTURE MEMBER VALUE

bBidirectionalAveragingMode 1 (H.263 bidirectional averaging) or 0 (MPEG-2 bidirectional averaging)

bMV_RPS Zero or 1

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
STRUCTURE MEMBER VALUE

wMBtype The MotionBackward variable defined by this member may be zero or 1.

Restrictions on Bitstream Buffers


The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of Annexes
K, O, P (factor-of-two resizing with clipping only in one or both dimensions), S, and U.
Restrictions on DXVA_PicResample
The following restrictions on the DXVA_PicResample structure apply when the bDXVA_Func variable defined in
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 4.

STRUCTURE MEMBER VALUE

dwPicResampleSourceWidth Must be equal to dwPicResampleDestWidth or related to dwPicResampleDestWidth by a multiplication factor of 2 (or 1/2).

dwPicResampleDestWidth Must be equal to dwPicResampleSourceWidth or related to dwPicResampleSourceWidth by a multiplication factor of 2 (or 1/2).

dwPicResampleSourceHeight Must be equal to dwPicResampleDestHeight or related to dwPicResampleDestHeight by a multiplication factor of 2 (or 1/2).

dwPicResampleDestHeight Must be equal to dwPicResampleSourceHeight or related to dwPicResampleSourceHeight by a multiplication factor of 2 (or 1/2).

If dwPicResampleSourceHeight and dwPicResampleDestHeight are equal, dwPicResampleSourceWidth and dwPicResampleDestWidth must be related by a multiplication factor of 2 (or 1/2). If dwPicResampleSourceHeight and dwPicResampleDestHeight indicate an upsampling operation, dwPicResampleSourceWidth and dwPicResampleDestWidth must not indicate a downsampling operation, and vice versa.
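As a minimal illustrative sketch (the helper names below are not from dxva.h), a host decoder might validate these width and height relationships before submitting a DXVA_PicResample buffer for the H263_D profile.

    static int DimensionsRelatedByTwoOrEqual(unsigned long s, unsigned long d)
    {
        return (s == d) || (s == 2 * d) || (d == 2 * s);
    }

    int PicResampleDimensionsAllowed(unsigned long srcWidth, unsigned long srcHeight,
                                     unsigned long dstWidth, unsigned long dstHeight)
    {
        if (!DimensionsRelatedByTwoOrEqual(srcWidth, dstWidth) ||
            !DimensionsRelatedByTwoOrEqual(srcHeight, dstHeight))
            return 0;                         /* each dimension: equal or a factor of 2 */

        if (srcWidth == dstWidth && srcHeight == dstHeight)
            return 0;                         /* at least one dimension must change by a factor of 2 */

        if ((srcWidth < dstWidth && srcHeight > dstHeight) ||
            (srcWidth > dstWidth && srcHeight < dstHeight))
            return 0;                         /* no mixed upsampling and downsampling */

        return 1;
    }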
Note Although H.263 requires only support of the bBidirectionalAveragingMode member of
DXVA_PictureParameters equal to 1 when MotionForward is 1 and MotionBackward is 1, the H263_D restricted
profile also allows bBidirectionalAveragingMode to be zero. This is intended to allow the H263_D restricted
profile to support MPEG-4 video as well as H.263 video (MPEG-4 uses the MPEG-1/MPEG-2 style of bidirectional
averaging).
Send comments about this topic to Microsoft
H263_E
7/21/2017 • 1 min to read • Edit Online

The H263_E restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set
of enhanced optional capabilities. Support of this profile is currently encouraged but not required. This set of
features is specified by the restrictions listed for the H263_D restricted profile, except for the following additional
restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined in
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1.

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H263_E

Restrictions on DXVA_PictureParameters
STRUCTURE MEMBER VALUE

bPicOBMC Zero or 1

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
If the bPicOBMC member of DXVA_PictureParameters is 1 and the Motion4MV variable defined in the
wMBtype member of these macroblock control structures is 1, the MotionBackward variable defined in the
wMBtype member must be zero.
Restrictions on Bitstream Buffers
The contents of any bitstream buffer may also contain data in the H.263 (with Annex F) video format.
Send comments about this topic to Microsoft
H263_F
7/21/2017 • 1 min to read • Edit Online

The H263_F restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set
of enhanced optional capabilities. Support of this profile is currently encouraged, but not required. This set of
features is specified by the restrictions listed for the H263_E restricted profile, except for the following additional
restrictions.
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined in
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1.

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_H263_F

Restrictions on DXVA_PictureParameters
STRUCTURE MEMBER VALUE

bPicBinPB Zero or 1

Restrictions on Bitstream Buffers


The contents of any bitstream buffer may also contain data in the H.263 video format with any subset of Annexes
G, M, V and W.
Send comments about this topic to Microsoft
MPEG1_A
7/21/2017 • 1 min to read • Edit Online

The MPEG1_A restricted profile contains a set of features required for support of MPEG-1 video. Support of this
profile is required for video accelerator drivers that provide hardware video acceleration capabilities. This set of
features is defined by the following set of restrictions:
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined in
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1.

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_MPEG1_A

Restrictions on DXVA_PictureParameters
STRUCTURE MEMBER VALUE

BPP variable (defined by adding 1 to bBPPminus1) Equals 8

bSecondField Equals zero

bMacroblockWidthMinus1 15

bMacroblockHeightMinus1 15

bBlockWidthMinus1 7

bBlockHeightMinus1 7

bChromaFormat (4:2:0) 1

bPicStructure 3 (frame structured)

bRcontrol Zero

bBidirectionalAveragingMode Zero (MPEG-2 bidirectional averaging)

bMVprecisionAndChromaRelation Zero (MPEG-2 half-sample motion)


bPicExtrapolation Zero

bPicDeblocked Zero

bPic4MVallowed Zero

bPicOBMC Zero

bMV_RPS Zero

SpecificIDCT Zero

bPicScanFixed Zero

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
WMBTYPE BITS VALUE

MotionType 2 (frame motion)

MBscanMethod Zero (zigzag) if bConfigHostInverseScan equals zero

FieldResidual Zero (frame residual)

H261LoopFilter Zero (no H.261 loop filter)

Restrictions on Bitstream Buffers


The contents of any bitstream buffer must contain data in the MPEG-1 main profile video format.
Send comments about this topic to Microsoft
MPEG2_A
7/21/2017 • 1 min to read • Edit Online

The MPEG2_A restricted profile contains a set of features required for support of MPEG-2 video Main Profile.
Support of this profile is required for video accelerator drivers that provide hardware video acceleration
capabilities.
The MPEG2_A profile is defined by the following sets of restrictions:
Restrictions on DXVA_ConnectMode
The following restriction on the DXVA_ConnectMode structure applies when the bDXVA_Func variable defined in
the dwFunction member of the DXVA_ConfigPictureDecode structure is equal to 1.

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_MPEG2_A

Restrictions on DXVA_PictureParameters
STRUCTURE MEMBER VALUE

wRestrictedMode 0x0A

BPP variable (defined by adding 1 to bBPPminus1) 8

bMacroblockWidthMinus1 15

bMacroblockHeightMinus1 15

bBlockWidthMinus1 7

bBlockHeightMinus1 7

bChromaFormat (4:2:0) 1

bRcontrol Zero

bBidirectionalAveragingMode Zero (MPEG-2 bidirectional averaging)

bMVprecisionAndChromaRelation Zero (MPEG-2 half-sample motion)

bPicExtrapolation Zero

bPicDeblocked Zero

bPic4MVallowed Zero

bPicOBMC Zero

bMV_RPS Zero

SpecificIDCT Zero

bPicScanFixed 1

Restrictions on DXVA_MBctrl_I_HostResidDiff_1, DXVA_MBctrl_I_OffHostIDCT_1, DXVA_MBctrl_P_HostResidDiff_1, and DXVA_MBctrl_P_OffHostIDCT_1
WMBTYPE BITS VALUE

MBscanMethod May be a value of zero (zigzag) or a value of 1 (alternate vertical) if the bConfigHostInverseScan member of DXVA_ConfigPictureDecode is equal to zero.

H261LoopFilter Zero

Restrictions on Bitstream Buffers


The contents of any bitstream buffer must contain data in the MPEG-2 main profile video format.
The bNewQmatrix member of DXVA_QmatrixData must equal zero for i = 2 and 3 when inverse-quantization matrices are used.
Send comments about this topic to Microsoft
MPEG2_B
7/21/2017 • 1 min to read • Edit Online

The MPEG2_B restricted profile contains a set of features required for support of MPEG-2 video Main Profile and an
associated DVD subpicture using front-end buffer-to-buffer subpicture blending. Alpha-blending source and
destination surfaces are supported with width and height of at least 720 and 576, respectively. Support of this
profile is currently encouraged, but not required.
Because the MPEG2_A restricted profile is defined by a relaxation of the accelerator requirements of the MPEG2_B
profile, all accelerators that support the MPEG2_B profile must support the MPEG2_A profile.
The restrictions for MPEG2_B are defined by the restrictions listed for MPEG2_A, except for the following additional
restrictions.
Restrictions on DXVA_ConnectMode
These values of the bDXVA_Func variable must be supported: 1 (picture decoding), 2 (alpha-blend data loading), or
3 (alpha-blend combination).

STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_MPEG2_B

Restrictions on DXVA_ConfigAlphaLoad and DXVA_ConfigAlphaCombine


STRUCTURE MEMBER VALUE

bConfigBlendType (DXVA_ConfigAlphaCombine) Zero (front-end buffer-to-buffer blending)

bConfigDataType (DXVA_ConfigAlphaLoad) Zero, 1, or 3 (at the accelerator's discretion)

Send comments about this topic to Microsoft


MPEG2_C
7/21/2017 • 1 min to read • Edit Online

The MPEG2_C restricted profile contains a set of features required for support of MPEG-2 video Main Profile.
Support of this profile is required for video accelerator drivers that provide hardware video acceleration
capabilities.
Because the MPEG2_C restricted profile is defined by a relaxation of the accelerator requirements of the MPEG2_A
profile (by allowing an accelerator to not support any of the members of the minimal interoperability set for
MPEG2_A), all accelerators that support the MPEG2_A profile must support the MPEG2_C profile. Similarly, all
accelerators that support the MPEG2_D profile must support the MPEG2_C profile.
The restrictions for MPEG2_C are defined by the restrictions listed for MPEG2_A, except for the following additional
restrictions.
Restrictions on DXVA_ConnectMode
STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_MPEG2_C

Restrictions on DXVA_ConfigPictureDecode
This profile adds an additional configuration to the minimal interoperability set for picture decoding. This
additional configuration is defined by the following DXVA_ConfigPictureDecode members.

STRUCTURE MEMBER VALUE

bConfigResidDiffHost Zero

bConfigResidDiffAccelerator 1

Send comments about this topic to Microsoft


MPEG2_D
7/21/2017 • 1 min to read • Edit Online

The MPEG2_D restricted profile contains a set of features required for support of MPEG-2 video Main Profile and
an associated DVD subpicture using back-end hardware subpicture blending.
Because the MPEG2_D restricted profile is defined by a relaxation of the accelerator requirements of the MPEG2_B
profile (the accelerator is not required to support the minimal interoperability set for MPEG2_B), all drivers that
support the MPEG2_B profile must support the MPEG2_D profile. The restrictions for MPEG2_D are defined by the
restrictions listed for the MPEG2_B restricted profile, except for the following additional restrictions.
Restrictions on DXVA_ConnectMode
STRUCTURE MEMBER VALUE

wRestrictedMode DXVA_RESTRICTED_MODE_MPEG2_D

Restrictions on DXVA_ConfigPictureDecode
These restrictions add an additional configuration to the minimal interoperability set for picture decoding
(bDXVA_Func equal to 1). This additional configuration is defined by the following DXVA_ConfigPictureDecode
members.

STRUCTURE MEMBER VALUE

bConfigResidDiffHost Zero

bConfigResidDiffAccelerator 1

Restrictions on DXVA_ConfigAlphaCombine
STRUCTURE MEMBER VALUE

bConfigBlendType Zero or 1 (at the accelerator's discretion).

Restrictions on DXVA_ConfigAlphaLoad
STRUCTURE MEMBER VALUE

bConfigDataType Any value (at the accelerator's discretion).

Send comments about this topic to Microsoft


WMV8_A, WMV8_B, WMV9_A, WMV9_B, and
WMV9_C
7/21/2017 • 1 min to read • Edit Online

The WMV8_A, WMV8_B, WMV9_A, WMV9_B, and WMV9_C restricted profiles contain the sets of features required
for support of Windows Media Video, versions 8 and 9. For more information about these profiles, download
DirectX Video Acceleration Specification for Windows Media Video v8, v9 and vA Decoding (Including SMPTE 421M
"VC-1").
Send comments about this topic to Microsoft
Minimal Interoperability Configuration Sets
4/26/2017 • 1 min to read • Edit Online

All DirectX VA decoders must operate with all DirectX VA accelerators to use a restricted profile. Every decoder
must be capable of operation with any member of a set of connection configurations, and every accelerator must
be capable of operation with at least one member of that set. There are three configuration sets that define the
minimal level of functionality that a device driver must provide:
Compressed Picture Decoding Set
Alpha-Blend Data Loading Set
Alpha-Blend Combination Set
For each set the decoder and accelerator must support the same DirectX VA restricted-mode GUID.
Send comments about this topic to Microsoft
Compressed Picture Decoding Set
4/26/2017 • 1 min to read • Edit Online

This section defines the minimal interoperability configuration set for compressed picture decoding. This entire set
of configurations must be supported by a decoder, and one or more configurations in this set must be supported
by an accelerator. An additional configuration set is provided for which support is encouraged (these
configurations are not required).
The first six configurations in this set are for all restricted profiles. The seventh configuration in this set is defined
only for MPEG2_C and MPEG2_D.
The minimal interoperability set of configurations for compressed picture decoding is defined in terms of the third through the last members of the DXVA_ConfigPictureDecode structure.
Send comments about this topic to Microsoft
First Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The first configuration in this set (a configuration preferred over the second and third configurations in this set) is
defined as follows.

MEMBER VALUE

guidConfigBitstreamEncryption DXVA_NoEncrypt

guidConfigMBcontrolEncryption DXVA_NoEncrypt

guidConfigResidDiffEncryption DXVA_NoEncrypt

bConfigBitstreamRaw Zero

bConfigMBcontrolRasterOrder 1

bConfigResidDiffHost 1

bConfigSpatialResid8 Zero

bConfigResid8Subtraction Zero

bConfigSpatialHost8or9Clipping Zero

bConfigSpatialResidInterleaved Zero

bConfigIntraResidUnsigned Zero

bConfigResidDiffAccelerator Zero

bConfigHostInverseScan Zero

bConfigSpecificIDCT Zero

bConfig4GroupedCoefs Zero

Send comments about this topic to Microsoft


Second Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The second configuration in this set (which is not a preferred configuration) is defined the same way as the first
picture decoding configuration with the following exception.

MEMBER VALUE

bConfigSpatialHost8or9Clipping 1

Send comments about this topic to Microsoft


Third Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The third configuration in this set (which is not a preferred configuration) is defined the same way as the first
picture decoding configuration with the following exceptions.

MEMBER VALUE

bConfigSpatialResid8 1

bConfigSpatialResidInterleaved 1

Send comments about this topic to Microsoft


Fourth Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The fourth configuration in this set (which is not a preferred configuration) is defined the same way as the first
picture decoding configuration with the following exceptions.

MEMBER VALUE

bConfigSpatialHost8or9Clipping 1

bConfigIntraResidUnsigned 1

Send comments about this topic to Microsoft


Fifth Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The fifth configuration in this set (which is not a preferred configuration) is defined the same way as the first picture
decoding configuration with the following exception.

MEMBER VALUE

bConfigSpatialResid8 1

Send comments about this topic to Microsoft


Sixth Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The sixth configuration in this set (a configuration preferred over the fifth in this set) is defined the same way as the
first picture decoding configuration with the following exceptions.

MEMBER VALUE

bConfigSpatialResid8 1

bConfigResid8Subtraction 1

Send comments about this topic to Microsoft


Seventh Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The seventh configuration in this set is defined only for the MPEG2_C and MPEG2_D restricted profiles indicated in
the DXVA_ConnectMode structure. No other restricted profiles include this configuration in their minimal
interoperability set.
This configuration (which is not a preferred configuration) is defined the same way as the first picture decoding
configuration with the following exceptions.

MEMBER VALUE

bConfigResidDiffHost Zero

bConfigResidDiffAccelerator 1

bConfig4GroupedCoefs 1

Send comments about this topic to Microsoft


Additional Encouraged Configuration Set
4/26/2017 • 1 min to read • Edit Online

Implementation of additional configurations for software decoders is encouraged. These configurations may exist
in hardware and can provide a significant performance benefit relative to those in the minimal interoperability
configuration sets.
This additional configuration set is defined in terms of the members of the DXVA_ConfigPictureDecode
structure.
Send comments about this topic to Microsoft
First Encouraged Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The first encouraged configuration is for improved support of off-host bitstream processing acceleration.
This configuration is defined the same way as the first picture decoding configuration with the following exceptions.

MEMBER VALUE

bConfigBitstreamRaw 1

bConfigMBcontrolRasterOrder Zero

bConfigResidDiffHost Zero

Send comments about this topic to Microsoft


Second Encouraged Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The second encouraged configuration provides improved support of off-host IDCT acceleration. Accelerators that
implement the first configuration in this set should support the second one. Implementing support for both
configurations provides flexibility in the manner in which their acceleration capabilities can be used.
This configuration is defined the same way as the first picture decoding configuration with the following exceptions.

MEMBER VALUE

bConfigMBcontrolRasterOrder Zero

bConfigResidDiffHost Zero

bConfigResidDiffAccelerator 1

bConfigHostInverseScan 1

Send comments about this topic to Microsoft


Third Encouraged Picture Decoding Configuration
4/26/2017 • 1 min to read • Edit Online

The third encouraged configuration provides support for off-host IDCT that is expected in some implementations.
This configuration is encouraged for decoders. However, the second configuration is preferred for accelerators.
This configuration is defined the same way as the first picture decoding configuration with the following exceptions.

MEMBER VALUE

bConfigMBcontrolRasterOrder Zero

bConfigResidDiffHost Zero

bConfigResidDiffAccelerator 1

bConfig4GroupedCoefs 1

Send comments about this topic to Microsoft


Alpha-Blend Data Loading Set
4/26/2017 • 1 min to read • Edit Online

The minimal interoperability configuration set for alpha-blend data loading consists of all defined values of the
bConfigDataType member of the DXVA_ConfigAlphaLoad structure.
Send comments about this topic to Microsoft
Alpha-Blend Combination Set
4/26/2017 • 1 min to read • Edit Online

The minimal interoperability configuration set for alpha-blend combination consists of the bConfigBlendType
member of the DXVA_ConfigAlphaCombine structure having a choice of values according to the restricted
profile being implemented.
Send comments about this topic to Microsoft
Video Miniport Drivers in the Windows 2000 Display
Driver Model
4/26/2017 • 1 min to read • Edit Online

This section describes the role of the video miniport driver, which is part of the Microsoft Windows 2000 display
driver model. The video miniport driver is not part of the Windows Vista display driver model.
For more information about the differences between the two display driver models, see Windows 2000 Display
Driver Model.
Send comments about this topic to Microsoft
Video Miniport Driver Header Files (Windows 2000
Model)
4/26/2017 • 1 min to read • Edit Online

Video miniport drivers in the Windows 2000 display driver model include the following header files:

FILE NAME CONTENTS

dderror.h Contains the Win32 status constants that miniport drivers return to the video port driver, which are also returned to the miniport driver's corresponding kernel-mode display driver.

devioctl.h Contains the macros and constants used to define I/O control codes.

miniport.h Contains the basic types, constants, and structures for video (and SCSI) miniport drivers.

ntddvdeo.h Contains the system-defined I/O control codes (IOCTLs) and corresponding structures that are sent in video request packets (VRPs) to video miniport drivers.

tvout.h Contains the VIDEOPARAMETERS structure used to implement TV connector and copy protection support and the constants used in this structure.

video.h Contains the VideoPortXxx and SvgaHwIoPortXxx video port function declarations, video-specific structures, such as the VIDEO_REQUEST_PACKET, and the HwVidXxx video miniport function prototypes.

videoagp.h Contains the AGP-specific structures, AgpXxx miniport driver function prototypes, and VideoPortXxx function declarations required to implement AGP support in a video miniport driver.

These headers are shipped with the Windows Driver Kit (WDK). For more detailed information about the functions,
structures, system-defined I/O control codes, and constants in these header files, see GDI Functions.
Send comments about this topic to Microsoft
Video Miniport Driver Requirements (Windows 2000
Model)
4/26/2017 • 1 min to read • Edit Online

The following are some of the requirements for video miniport drivers.
An NT-based operating system video miniport driver must be a single .sys file.
A miniport driver consists of a single binary file. The miniport driver's main purpose is to detect, initialize,
and configure one or more graphics adapters of the same type.
A miniport driver can only make calls exported by videoprt.sys.
A miniport driver can call only those functions that are exported by the system-supplied video port driver.
(The exported video port functions are listed on the reference pages following Video Port Driver Functions.)
Driver writers can also use the following to determine which functions a miniport driver is calling:

link -dump -imports my_driver.sys

A miniport driver cannot load or install another driver on the machine using undocumented operating
system function calls.
A miniport driver can enable panning only upon receiving an end-user request.
Panning must be disabled by default. The miniport driver should enable it only when it is requested through
a control panel. OEMs can enable panning by default as a part of their preinstall.
Send comments about this topic to Microsoft
Video Miniport Driver Within the Graphics
Architecture (Windows 2000 Model)
4/26/2017 • 1 min to read • Edit Online

The following figure shows the video miniport driver within the NT-based operating system graphics subsystem.

Each video miniport driver provides hardware-level support for a display driver. The display driver calls the
graphics engine EngDeviceIoControl function to request support from the underlying video miniport driver.
EngDeviceIoControl, in turn, calls an I/O system service to send the request through the video port driver to the
miniport driver.
In most circumstances, the display driver carries out time-critical operations that are visible to the user, while the
underlying miniport driver provides support for infrequently requested operations or for truly time-critical
operations that cannot be preempted by an interrupt or a context switch to another process.
A display driver cannot handle device interrupts, and only the miniport driver can set up device memory and map it
into a display driver's virtual address space.
The video port driver is a system-supplied module provided to support video miniport drivers. It acts as the
intermediary between the display driver and video miniport drivers.
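As a minimal sketch of this flow, a display driver might forward a standard request to its miniport driver as follows. Here hDriver is the handle that the display driver receives in DrvEnablePDEV, and the IOCTL and VIDEO_NUM_MODES structure come from ntddvdeo.h; the usual prerequisite display driver headers are assumed.

    #include <winddi.h>     /* EngDeviceIoControl */
    #include <ntddvdeo.h>   /* IOCTL_VIDEO_QUERY_NUM_AVAIL_MODES, VIDEO_NUM_MODES */

    BOOL QueryNumberOfAvailableModes(HANDLE hDriver, VIDEO_NUM_MODES *pNumModes)
    {
        DWORD bytesReturned;
        DWORD status = EngDeviceIoControl(hDriver,
                                          IOCTL_VIDEO_QUERY_NUM_AVAIL_MODES,
                                          NULL,
                                          0,
                                          pNumModes,
                                          sizeof(VIDEO_NUM_MODES),
                                          &bytesReturned);

        /* The request travels through the video port driver to the miniport
           driver's HwVidStartIO routine. */
        return (status == NO_ERROR);
    }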
For more information about NT-based operating system display drivers, see Introduction to Display (Windows
2000 Model) and Display Drivers (Windows 2000 Model).
Send comments about this topic to Microsoft
Video Miniport Driver Initialization (Windows 2000
Model)
4/26/2017 • 1 min to read • Edit Online

Video miniport driver initialization occurs after the NT kernel, HAL, and core drivers, such as the PCI bus driver, are
loaded and initialized. The basic system initialization sequence occurs as follows:
1. The NT kernel and HAL are loaded and initialized.
2. Core drivers such as the PCI bus driver are loaded and initialized.
3. The PCI bus driver obtains PCI resource information and the device ID and vendor ID from each of its
children's PCI configuration spaces and reports this information back to the system.
4. If the PnP manager recognizes the device and vendor IDs, the I/O manager loads the corresponding video
miniport driver and the video port driver from a known location. If the PnP manager does not recognize the
IDs, it prompts the user for the location of the miniport driver and loads it from this location.
5. The I/O manager calls the miniport driver's DriverEntry routine with two system-supplied input pointers.
DriverEntry allocates and initializes a VIDEO_HW_INITIALIZATION_DATA structure with driver-specific
and adapter-specific values, including pointers to the miniport driver's other entry points. DriverEntry must
also claim any legacy resources, which are those resources not listed in the device's PCI configuration space
but that are decoded by the device. See Claiming Legacy Resources for details.
6. The miniport driver's DriverEntry function calls VideoPortInitialize. VideoPortInitialize performs those
aspects of miniport driver initialization that are common to all miniport drivers. For example, for non-PnP
drivers, VideoPortInitialize verifies portions of the miniport driver-initialized
VIDEO_HW_INITIALIZATION_DATA structure, initializes some of the public members of the system-created
device object, allocates memory for the device extension of the device object, and collects and stores
pertinent information in the device extension. See Video Miniport Driver's Device Extension (Windows 2000
Model) for more details about device extensions. For PnP drivers, the device object-related actions occur at a
later time.
7. When VideoPortInitialize returns, DriverEntry propagates the return value of VideoPortInitialize back
to the caller. Miniport driver writers should make no assumptions about the value returned by
VideoPortInitialize.
At this point, the system has loaded and initialized the video miniport driver. The next step is for the PnP manager
to start the device. See Starting the Device of the Video Miniport Driver for details.
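A skeletal DriverEntry corresponding to steps 5 through 7 might look like the following sketch; the MyXxx entry points and the MY_DEVICE_EXTENSION type are driver-supplied names used here only for illustration.

    #include "dderror.h"
    #include "miniport.h"
    #include "video.h"

    typedef struct _MY_DEVICE_EXTENSION {
        PVOID FrameBufferBase;      /* filled in later, during HwVidFindAdapter */
    } MY_DEVICE_EXTENSION;

    VP_STATUS MyFindAdapter(PVOID HwDeviceExtension, PVOID HwContext,
                            PWSTR ArgumentString, PVIDEO_PORT_CONFIG_INFO ConfigInfo,
                            PUCHAR Again);
    BOOLEAN MyInitialize(PVOID HwDeviceExtension);
    BOOLEAN MyStartIO(PVOID HwDeviceExtension, PVIDEO_REQUEST_PACKET RequestPacket);

    ULONG DriverEntry(PVOID Context1, PVOID Context2)
    {
        VIDEO_HW_INITIALIZATION_DATA hwInitData;

        VideoPortZeroMemory(&hwInitData, sizeof(VIDEO_HW_INITIALIZATION_DATA));
        hwInitData.HwInitDataSize = sizeof(VIDEO_HW_INITIALIZATION_DATA);

        /* Entry points that the video port driver calls on behalf of the system. */
        hwInitData.HwFindAdapter         = MyFindAdapter;
        hwInitData.HwInitialize          = MyInitialize;
        hwInitData.HwStartIO             = MyStartIO;
        hwInitData.HwDeviceExtensionSize = sizeof(MY_DEVICE_EXTENSION);
        hwInitData.AdapterInterfaceType  = PCIBus;

        /* Propagate the return value of VideoPortInitialize without
           interpreting it, as described in step 7. */
        return VideoPortInitialize(Context1, Context2, &hwInitData, NULL);
    }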
Send comments about this topic to Microsoft
Starting the Device of the Video Miniport Driver
4/26/2017 • 1 min to read • Edit Online

The PnP manager sends an IRP code (see IRP Major Function Codes) to the video port driver requesting that the
graphics adapter be started. The video port driver's dispatch routine calls the miniport driver's HwVidFindAdapter
routine in response to this IRP code. Details for some of HwVidFindAdapter's tasks are discussed in the following
topics:
Setting up Video Adapter Access Ranges
Setting Hardware Information in the Registry
Changing State on the Adapter
Send comments about this topic to Microsoft
Setting up Video Adapter Access Ranges
4/26/2017 • 3 min to read • Edit Online

An array of VIDEO_ACCESS_RANGE-type elements describes one or more ranges of memory and/or I/O ports
that a video adapter decodes. Each access range element in this array contains bus-relative physical address values.
The miniport driver's HwVidFindAdapter routine must claim all PCI memory and ports or ranges of ports that the
adapter will respond to. Depending on the adapter and the AdapterInterfaceType value in
VIDEO_PORT_CONFIG_INFO, HwVidFindAdapter can call some of the following VideoPortXxx functions to get
the necessary bus-relative configuration data:
VideoPortGetAccessRanges
VideoPortGetBusData
VideoPortGetDeviceData
VideoPortGetRegistryParameters
VideoPortVerifyAccessRanges
If HwVidFindAdapter cannot get bus-relative access ranges information by calling VideoPortGetBusData or
VideoPortGetAccessRanges, or from the registry with VideoPortGetDeviceData or
VideoPortGetRegistryParameters, the miniport driver should have a set of bus-relative default values for access
ranges.
Every HwVidFindAdapter function must map each claimed bus-relative physical address range to a range in kernel-
mode address space with VideoPortGetDeviceBase before the miniport driver attempts to communicate with an
adapter. The HAL can remap bus-relative access range values to system space logical address ranges, particularly in
multiple bus machines.
With mapped logical range addresses, the driver can call the VideoPortReadXxx and VideoPortWriteXxx
functions to read from or write to an adapter. These kernel-mode addresses also can be passed to
VideoPortCompareMemory, VideoPortMoveMemory, VideoPortZeroDeviceMemory, and/or
VideoPortZeroMemory. For a mapped range in I/O space, the miniport driver calls the VideoPortReadPortXxx
and VideoPortWritePortXxx functions. For a mapped range in memory, the miniport driver calls the
VideoPortReadRegisterXxx and VideoPortWriteRegisterXxx functions.
The HwVidFindAdapter function must always call VideoPortVerifyAccessRanges or
VideoPortGetAccessRanges successfully before it calls VideoPortGetDeviceBase.
Any successful call to VideoPortVerifyAccessRanges or VideoPortGetAccessRanges establishes a
miniport driver's claim on bus-specific video memory and register addresses or I/O ports for its adapter in
the registry. It is critical to note, however, that any subsequent call to VideoPortVerifyAccessRanges or
VideoPortGetAccessRanges will cause that driver's previously claimed resources to be erased and
replaced with the ranges passed to the most recently called function. Therefore, if a driver claims ranges by
separate calls to these functions, it must pass in all range arrays, including those already claimed.
HwVidFindAdapter can claim a small set of access ranges for an adapter, use this set to determine whether
the adapter is one that the miniport driver supports, and claim a full set of access ranges for a supported
adapter with another call to VideoPortGetAccessRanges or VideoPortVerifyAccessRanges. Again, each
successful call to these VideoPortXxxAccessRanges routines for a particular adapter overwrites the caller's
previous claims in the registry.
To claim other types of hardware resources, such as an interrupt vector, a miniport driver should set
appropriate values in the VIDEO_PORT_CONFIG_INFO and call VideoPortVerifyAccessRanges, or it should
call VideoPortGetAccessRanges.
Calling VideoPortGetAccessRanges or VideoPortVerifyAccessRanges successfully ensures that a
miniport driver does not try to use register or device memory addresses already in use by another driver.
Claiming an adapter's bus-relative hardware resources in the registry prevents drivers that load later from
attempting to use the same access ranges (and other hardware resources) for their adapters. It also prevents
a subsequently loaded driver from changing the initialized state of the video adapter.
The miniport driver of hardware that decodes legacy resources must claim these resources in its DriverEntry
routine, or if implemented, its HwVidLegacyResources routine. Legacy resources are those resources not listed in
the device's PCI configuration space but that are decoded by the device. See Claiming Legacy Resources for details.
After a miniport driver is loaded and its HwVidInitialize function is run, the miniport driver's HwVidStartIO function
is called to map any access range of video memory that the miniport driver makes visible to its corresponding
display driver.
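The following fragment is a minimal sketch of claiming access ranges and mapping the frame buffer from HwVidFindAdapter. The addresses and lengths are placeholders, and a PCI miniport driver would normally obtain its ranges by calling VideoPortGetAccessRanges rather than hard-coding them.

    VP_STATUS ClaimAndMapRanges(PVOID HwDeviceExtension, PVOID *FrameBufferBase)
    {
        VIDEO_ACCESS_RANGE ranges[2];
        VP_STATUS status;

        VideoPortZeroMemory(ranges, sizeof(ranges));

        ranges[0].RangeStart.QuadPart = 0xA0000000;   /* frame buffer (placeholder address) */
        ranges[0].RangeLength         = 0x00400000;
        ranges[0].RangeInIoSpace      = 0;
        ranges[0].RangeVisible        = 1;

        ranges[1].RangeStart.QuadPart = 0x3C0;        /* VGA register block (placeholder) */
        ranges[1].RangeLength         = 0x20;
        ranges[1].RangeInIoSpace      = 1;
        ranges[1].RangeVisible        = 1;
        ranges[1].RangeShareable      = 1;

        /* Claim every range in a single call; a later call would replace
           this claim, as noted above. */
        status = VideoPortVerifyAccessRanges(HwDeviceExtension, 2, ranges);
        if (status != NO_ERROR)
            return status;

        /* Map the bus-relative frame buffer range into kernel-mode address space. */
        *FrameBufferBase = VideoPortGetDeviceBase(HwDeviceExtension,
                                                  ranges[0].RangeStart,
                                                  ranges[0].RangeLength,
                                                  ranges[0].RangeInIoSpace);

        return (*FrameBufferBase != NULL) ? NO_ERROR : ERROR_INVALID_PARAMETER;
    }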

Setting Hardware Information in the Registry

HwVidFindAdapter can call the VideoPortGetRegistryParameters and VideoPortSetRegistryParameters
functions to get and set configuration information in the registry. For example, HwVidFindAdapter might call
VideoPortSetRegistryParameters to set up nonvolatile configuration information in the registry for the next
boot. It might call VideoPortGetRegistryParameters to get adapter-specific, bus-relative configuration
parameters written into the registry by an installation program.
It is recommended that miniport drivers set certain hardware information in the registry to display useful
information to the user and for assistance in debugging. A miniport driver can set a chip type, DAC type, memory
size (of the adapter), and a string to identify the adapter. This information is shown by the Display program in
Control Panel.
The driver sets this information by calling VideoPortSetRegistryParameters. Typically, the driver makes the call
in its HwVidFindAdapter routine.
The following table describes the information that the driver can register and provides details for the ValueName
and ValueData parameters of VideoPortSetRegistryParameters:

INFORMATION FOR ENTRY | VALUENAME | VALUEDATA
Chip type | HardwareInformation.ChipType | Null-terminated string containing the chip name.
DAC type | HardwareInformation.DacType | Null-terminated string containing the DAC name or ID.
Memory size | HardwareInformation.MemorySize | ULONG containing the amount of video memory on the adapter, in MB.
Adapter ID | HardwareInformation.AdapterString | Null-terminated string containing the name of the adapter.
BIOS | HardwareInformation.BiosString | Null-terminated string containing information about the BIOS.
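As an illustration, a HwVidFindAdapter path could register these values along the following lines; the chip name, adapter string, and memory size shown are placeholders, and return values are not checked for brevity.

#include <video.h>

VOID SetHardwareInformation(PVOID HwDeviceExtension)
{
    // Placeholder identification strings and size used only for this sketch.
    WCHAR ChipType[]      = L"Sample Chip";
    WCHAR AdapterString[] = L"Sample Video Adapter";
    ULONG MemorySize      = 16;   // in MB, per the table above

    VideoPortSetRegistryParameters(HwDeviceExtension,
                                   L"HardwareInformation.ChipType",
                                   ChipType, sizeof(ChipType));

    VideoPortSetRegistryParameters(HwDeviceExtension,
                                   L"HardwareInformation.AdapterString",
                                   AdapterString, sizeof(AdapterString));

    VideoPortSetRegistryParameters(HwDeviceExtension,
                                   L"HardwareInformation.MemorySize",
                                   &MemorySize, sizeof(ULONG));
}
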


Changing State on the Adapter

The miniport driver must not permanently change the state of the adapter until its HwVidInitialize routine is called.
Miniport driver routines called before HwVidInitialize, such as HwVidFindAdapter, should not change the state of
any video adapter unnecessarily and must not change the state of any video adapter permanently.
While HwVidFindAdapter runs, the HAL has control of the video adapter so it can write information to the screen in
the early stages of the system boot process. If HwVidFindAdapter's attempt to identify its adapter affects an
adapter's state, this routine should restore the original state immediately so that on return from HwVidFindAdapter
the HAL can continue to display boot-up messages.
For example, HwVidFindAdapter should defer determining the DAC type of an adapter to the HwVidInitialize
function, because making this determination does not affect whether the miniport driver will be loaded but does
change the state of the adapter permanently.

Claiming Legacy Resources

A video miniport driver must claim and report all legacy resources in its VIDEO_HW_INITIALIZATION_DATA
structure during driver initialization. Legacy resources are those resources not listed in the device's PCI
configuration space but that are decoded by the device. NT-based operating systems will disable power
management and docking when they encounter legacy resources that are not reported in the manner outlined in
this section.
Miniport drivers must do the following to report such legacy resources:
If the legacy resource list for the device is known at compile time, fill in the following two fields of the
VIDEO_HW_INITIALIZATION_DATA structure that is created and initialized in the DriverEntry routine:

STRUCTURE MEMBER | DEFINITION
HwLegacyResourceList | Points to an array of VIDEO_ACCESS_RANGE structures. Each structure describes a device I/O port or memory range for the video adapter that is not listed in PCI configuration space.
HwLegacyResourceCount | Is the number of elements in the array to which HwLegacyResourceList points.

If the legacy resource list for the device is not known at compile time, implement a HwVidLegacyResources
function and initialize the HwGetLegacyResources member of VIDEO_HW_INITIALIZATION_DATA to point
to this function. For example, a miniport driver that supports two devices with different sets of legacy
resources would implement HwVidLegacyResources to report the legacy resources at run time. The video
port driver will ignore the HwLegacyResourceList and HwLegacyResourceCount members of
VIDEO_HW_INITIALIZATION_DATA when a miniport driver implements HwVidLegacyResources.
Fill in the RangePassive field for each VIDEO_ACCESS_RANGE structure defined in the miniport driver
accordingly. Setting RangePassive to VIDEO_RANGE_PASSIVE_DECODE indicates that the region is
decoded by the hardware but that the display and video miniport drivers will never touch it. Setting
RangePassive to VIDEO_RANGE_10_BIT_DECODE indicates that the device decodes ten bits of the port
address for the region.
Again, a driver should only include resources that the hardware decodes but that are not claimed by PCI. Code in a
driver that needs to claim minimal legacy resources might look something like the following:
// Each initializer is:
// {RangeStart (low part), RangeStart (high part), RangeLength,
//  RangeInIoSpace, RangeVisible, RangeShareable, RangePassive}
VIDEO_ACCESS_RANGE AccessRanges[] = {
    // [0] 0x3b0-0x3bb: legacy VGA I/O ports
    {0x000003b0, 0x00000000, 0x0000000c, 1, 1, 1, 0},
    // [1] 0x3c0-0x3df: legacy VGA I/O ports
    {0x000003c0, 0x00000000, 0x00000020, 1, 1, 1, 0},
    // [2] 0xa0000-0xaffff: legacy VGA frame buffer (memory, not I/O space)
    {0x000a0000, 0x00000000, 0x00010000, 0, 0, 0, 0},
};

// Within the DriverEntry routine:
VIDEO_HW_INITIALIZATION_DATA hwInitData;
hwInitData.HwLegacyResourceList  = AccessRanges;
hwInitData.HwLegacyResourceCount = 3;

The miniport driver can "reclaim" legacy resources in subsequent calls to VideoPortVerifyAccessRanges;
however, the video port driver will just ignore requests for any such previously claimed resources. Power
management and docking will be disabled in the system if the miniport driver attempts to claim a legacy access
range in VideoPortVerifyAccessRanges that was not previously claimed in the HwLegacyResourceList during
DriverEntry or returned in the LegacyResourceList parameter of HwVidLegacyResources.
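For the run-time case described above, a rough sketch of an HwVidLegacyResources implementation follows. The parameter list is assumed here to take the vendor ID, device ID, a pointer to the range list, and a count (verify against video.h), and the device IDs and range tables are hypothetical.

#include <video.h>

// Hypothetical legacy-resource tables for two supported devices.
static VIDEO_ACCESS_RANGE LegacyRangesDeviceA[2];
static VIDEO_ACCESS_RANGE LegacyRangesDeviceB[3];

VOID
HwVidLegacyResources(
    ULONG VendorId,
    ULONG DeviceId,
    PVIDEO_ACCESS_RANGE *LegacyResourceList,
    PULONG LegacyResourceCount
    )
{
    (VOID)VendorId;

    // 0x1234 is a placeholder device ID used only for this sketch.
    if (DeviceId == 0x1234) {
        *LegacyResourceList  = LegacyRangesDeviceA;
        *LegacyResourceCount = 2;
    } else {
        *LegacyResourceList  = LegacyRangesDeviceB;
        *LegacyResourceCount = 3;
    }
}

// In DriverEntry, register the callback instead of the static list:
//     hwInitData.HwGetLegacyResources = HwVidLegacyResources;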

Initializing the Video Miniport for Communication with Display Driver

For each adapter found by the PnP manager and successfully configured by the miniport driver, the miniport
driver's HwVidInitialize function is called when the corresponding display driver is loaded. HwVidInitialize can
initialize software state information, but it should not set up visible state on the adapter. On return from
HwVidInitialize, the adapter should be set to the same state as on return from the miniport driver's HwVidResetHw
routine. For more information about HwVidResetHw, see Resetting the Adapter in Video Miniport Drivers.
If necessary, a miniport driver's HwVidInitialize function can carry out a one-time initialization operation on the
adapter that was postponed by its HwVidFindAdapter function. For example, a miniport driver might postpone
loading microcode on the adapter and have the HwVidInitialize function call VideoPortGetRegistryParameters.
When the HwVidInitialize function returns control, the graphics engine has a handle for the miniport driver's
adapter. The corresponding display driver then can call the engine's EngDeviceIoControl function to request
access to mapped video memory or to request any other operation. The video port driver sends such a request on
to the miniport driver's HwVidStartIO function, as a VRP. See Processing Video Requests (Windows 2000 Model)
for details.
Usually, a display driver controls the display the end user sees, except occasionally when a full-screen MS-DOS
application is run in an x86-based machine running an NT-based operating system. For more information about
supporting this feature in VGA-compatible miniport drivers, see VGA-Compatible Video Miniport Drivers (Windows
2000 Model).
The HwVidInitialize function can call VideoPortGetRegistryParameters or VideoPortSetRegistryParameters to
get and set configuration information in the registry. For example, HwVidInitialize might call
VideoPortSetRegistryParameters to set up nonvolatile configuration information in the registry for the next
boot. It might call VideoPortGetRegistryParameters to get adapter-specific, bus-relative configuration
parameters written into the registry by an installation program.

Video Miniport Driver's Device Extension (Windows 2000 Model)

A device extension is each miniport driver's primary and only global storage area for adapter-specific state
information.
Each miniport driver defines the size, internal structure, and contents of its device extension. The video port driver
passes a pointer to the device extension as an input parameter to every system-defined miniport driver function
except DriverEntry and, if implemented, the HwVidSynchronizeExecutionCallback and SvgaHwIoPortXxx functions.
Many VideoPortXxx functions require this pointer as an argument as well.
The miniport driver must also use the device extension to maintain the state information for a single adapter. Each
adapter detected by the system will have separate state information maintained in a separate device extension. The
miniport driver must not use global variables to store any per-adapter state. This is especially critical in order to
provide seamless multiple monitor support.

Individually Registered Callback Functions in Video Miniport Drivers

In certain instances, communication between the vendor-supplied video miniport driver and the system-supplied
video port driver proceeds as follows:
1. The video miniport driver calls a function in the video port driver.
2. Before the video port driver function completes, it calls back into the video miniport driver for assistance.
When the video miniport driver calls the video port driver function, it passes a pointer to the callback function. For
example, when the video miniport driver calls VideoPortStartDma, it passes a pointer to an HwVidExecuteDma
callback function (implemented by the video miniport driver).
When the video miniport driver passes the address of a callback function to a video port driver function, it registers
the callback function with the video port driver. The registration is temporary in the sense that the video port driver
does not permanently store the callback function pointer. Rather, the video port driver holds the function pointer
only during the execution of the function that calls back. This kind of temporary registration is in contrast to the
permanent registration of many video miniport driver functions. For example, the video miniport driver registers a
set of functions during DriverEntry, and the video port driver stores those function pointers permanently in the
device extension.
In some instances, it makes sense for the video miniport driver to implement several functions, each of which can
serve as the callback function for a particular video port driver function. For example, the video miniport driver
might implement several variations of the HwVidQueryDeviceCallback function and pass the variation of choice in
a particular call to VideoPortGetDeviceData.
For a list of callback functions that can be implemented by the video miniport driver, and for information about
how those callback functions are registered, see Individually Registered Video Miniport Driver Functions.

Events in Video Miniport Drivers (Windows 2000 Model)

The video port driver provides support for events, a type of kernel dispatcher object that can be used to
synchronize two threads running below DISPATCH_LEVEL. A video miniport driver can use events to synchronize
access to the video hardware:
By the video miniport driver and the display driver
By the display or video miniport driver and another component, such as an OpenGL driver or a program
extension (such as the Display program in Control Panel).
The following table lists the event-related functions that the video port driver supplies.

FUNCTION | DESCRIPTION
VideoPortClearEvent | Sets a given event object to the nonsignaled state.
VideoPortCreateEvent | Creates an event object.
VideoPortDeleteEvent | Deletes the specified event object.
VideoPortReadStateEvent | Returns the current state of a given event object: signaled or nonsignaled.
VideoPortSetEvent | Sets an event object to the signaled state if it was not already in that state, and returns the event object's previous state.
VideoPortWaitForSingleObject | Puts the current thread into a wait state until the given dispatch object is set to the signaled state, or (optionally) until the wait times out.

GDI also provides support for events to display drivers. See Using Events in Display Drivers for more information.
For a broader perspective on events, see Event Objects in the Kernel-Mode Drivers Design Guide.

Processing Video Requests (Windows 2000 Model)

All I/O requests that originate in a display driver's call to EngDeviceIoControl are mapped from IRP codes (see
IRP Major Function Codes) to VRPs by the video port driver. The video port driver then calls the corresponding
miniport driver's HwVidStartIO function with a pointer to each VIDEO_REQUEST_PACKET structure that it sets up.
All VRPs sent to HwVidStartIO have the IoControlCode member set to an IOCTL_VIDEO_XXX.
The video port driver also manages the synchronization of incoming requests for all video miniport drivers by
sending each miniport driver's HwVidStartIO routine only one VRP for processing at a time. HwVidStartIO owns
each input VRP until the miniport driver completes the requested operation and returns control. Until a miniport
driver completes the current VRP, the video port driver holds on to any outstanding IRP codes that the I/O
manager sends in response to subsequent calls to EngDeviceIoControl by the corresponding display driver.
On receipt of a video request, HwVidStartIO must examine the VRP, process the video request on the adapter, set
the appropriate status and other information in the VRP, and return TRUE.

System-Defined IOCTL_VIDEO_XXX Requests

Most video miniport drivers support the following requests:
IOCTL_VIDEO_QUERY_NUM_AVAIL_MODES
IOCTL_VIDEO_QUERY_AVAIL_MODES
IOCTL_VIDEO_QUERY_CURRENT_MODE
IOCTL_VIDEO_SET_CURRENT_MODE
IOCTL_VIDEO_RESET_DEVICE
IOCTL_VIDEO_MAP_VIDEO_MEMORY
IOCTL_VIDEO_UNMAP_VIDEO_MEMORY
IOCTL_VIDEO_SHARE_VIDEO_MEMORY
IOCTL_VIDEO_UNSHARE_VIDEO_MEMORY
IOCTL_VIDEO_QUERY_PUBLIC_ACCESS_RANGES
IOCTL_VIDEO_FREE_PUBLIC_ACCESS_RANGES
IOCTL_VIDEO_GET_POWER_MANAGEMENT
IOCTL_VIDEO_SET_POWER_MANAGEMENT
IOCTL_VIDEO_GET_CHILD_STATE
IOCTL_VIDEO_SET_CHILD_STATE_CONFIGURATION
IOCTL_VIDEO_VALIDATE_CHILD_STATE_CONFIGURATION
Depending on the adapter's features, video miniport drivers can support the following additional requests:
IOCTL_VIDEO_QUERY_COLOR_CAPABILITIES
IOCTL_VIDEO_SET_COLOR_REGISTERS (required if the device has a palette)
IOCTL_VIDEO_DISABLE_POINTER
IOCTL_VIDEO_ENABLE_POINTER
IOCTL_VIDEO_QUERY_POINTER_CAPABILITIES
IOCTL_VIDEO_QUERY_POINTER_ATTR
IOCTL_VIDEO_SET_POINTER_ATTR
IOCTL_VIDEO_QUERY_POINTER_POSITION
IOCTL_VIDEO_SET_POINTER_POSITION
IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS
IOCTL_VIDEO_SWITCH_DUALVIEW
VGA-compatible SVGA miniport drivers are required to support the following additional requests:
IOCTL_VIDEO_SAVE_HARDWARE_STATE
IOCTL_VIDEO_RESTORE_HARDWARE_STATE
IOCTL_VIDEO_DISABLE_CURSOR
IOCTL_VIDEO_ENABLE_CURSOR
IOCTL_VIDEO_QUERY_CURSOR_ATTR
IOCTL_VIDEO_SET_CURSOR_ATTR
IOCTL_VIDEO_QUERY_CURSOR_POSITION
IOCTL_VIDEO_SET_CURSOR_POSITION
IOCTL_VIDEO_GET_BANK_SELECT_CODE
IOCTL_VIDEO_SET_PALETTE_REGISTERS
IOCTL_VIDEO_LOAD_AND_SET_FONT
Details for each IOCTL can be found in Video Miniport Driver I/O Control Codes. Miniport driver writers should not
use undocumented system-defined IOCTLs.

Privately Defined Display-Miniport Driver IOCTL_VIDEO_XXX Requests

A miniport driver can define one or more private I/O control codes for its corresponding display driver.
However, only a specific display-and-miniport driver pair can use privately defined I/O control codes. That is, a
miniport driver designed to run under an existing display driver should not define private I/O control codes
because the existing display driver cannot issue new I/O control requests without being rewritten, and rewriting it
could break the existing miniport drivers it already works with. An existing or generic display driver layered over many
different models of adapters, such as SVGA adapters, also cannot rely on a privately defined I/O control code to
have the same effects in every underlying miniport driver.
For more information about defining private I/O control codes, see Using I/O Control Codes.

Handling Unsupported IOCTL_VIDEO_XXX Requests

Every HwVidStartIO function also must handle the receipt of an unsupported IOCTL_VIDEO_XXX, as follows:
1. Set the input VRP's Status field to ERROR_INVALID_FUNCTION.
2. Set the input VRP's Information field to zero.
3. Return TRUE to indicate the request was processed.
See the VIDEO_REQUEST_PACKET and STATUS_BLOCK structures for more details.
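A minimal sketch of the corresponding default case in a HwVidStartIO implementation (the surrounding switch and the omitted supported cases are illustrative):

#include <video.h>

BOOLEAN
HwVidStartIO(PVOID HwDeviceExtension, PVIDEO_REQUEST_PACKET RequestPacket)
{
    (VOID)HwDeviceExtension;

    switch (RequestPacket->IoControlCode) {

    // ... cases for the IOCTL_VIDEO_XXX requests this driver supports ...

    default:
        // Unsupported request: report ERROR_INVALID_FUNCTION and return no data.
        RequestPacket->StatusBlock->Status = ERROR_INVALID_FUNCTION;
        RequestPacket->StatusBlock->Information = 0;
        break;
    }

    // TRUE indicates the request was processed, even though this one failed.
    return TRUE;
}
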

Plug and Play and Power Management in Video Miniport Drivers (Windows 2000 Model)

All Windows 2000 and later miniport drivers must support Plug and Play and Power Management. This includes
the ability to enumerate child devices such as DDC monitors, inter-integrated circuit (I²C) devices, and secondary
adapters.
The video port driver manages most of the PnP requirements for the miniport driver, including creating the FDO
(Functional Device Object) and receiving and dispatching PnP-specific IRP codes (see IRP Major Function Codes) on
the miniport driver's behalf.
Miniport drivers must implement the following functions to support PnP and Power Management:
HwVidSetPowerState
HwVidGetPowerState
HwVidGetVideoChildDescriptor
The graphics adapter for a legacy miniport driver cannot be removed from the system while the system is running,
nor are legacy miniport drivers automatically detected when added to a running system.
See Child Devices of the Display Adapter (Windows 2000 Model) for more information about detecting and
communicating with an adapter's child devices. For general information about Plug and Play drivers, see Plug and
Play.

Video Port Driver Support for AGP

The video port driver implements the following functions to support Accelerated Graphics Port (AGP).
AgpReservePhysical
AgpCommitPhysical
AgpReserveVirtual
AgpCommitVirtual
AgpFreeVirtual
AgpReleaseVirtual
AgpFreePhysical
AgpReleasePhysical
AgpSetRate
Before the video miniport driver calls the functions in the preceding list, it must obtain function pointers by calling
VideoPortQueryServices. For more information about obtaining pointers to the AGP functions, see AGP
Functions Implemented by the Video Port Driver.
The video miniport driver performs the following steps to reserve and commit a portion of the AGP aperture
through which the display adapter can access system memory:
1. Call AgpReservePhysical to reserve a contiguous range of physical addresses in the AGP aperture.
2. Call AgpCommitPhysical to map a portion (or all) of the address range returned by AgpReservePhysical to
pages in system memory. The pages in system memory are locked, but not necessarily contiguous. The
video miniport driver can call AgpCommitPhysical several times to do several small commitments rather
than one large one. However, the driver must not attempt to commit a range that is already committed.
Then, for an application to be able to see and use the committed pages in system memory, the video miniport
driver performs the following steps:
1. Call AgpReserveVirtual to reserve a range of virtual addresses in the application's address space. The video
miniport driver must pass AgpReserveVirtual a handle, previously returned by AgpReservePhysical, so that
the reserved virtual address range can be associated with the physical address range created by
AgpReservePhysical.
2. Call AgpCommitVirtual to map a portion of the virtual address range returned by AgpReserveVirtual to
pages in system memory. The pages that AgpCommitVirtual maps must have been previously mapped by a
call to AgpCommitPhysical. Furthermore, that mapping established by AgpCommitPhysical must still be
current; that is, those pages must not have been freed by a call to AgpFreePhysical.
Note Whenever you use the AGP functions to commit or reserve an address range (physical or virtual), the size of
the range must be a multiple of 64 kilobytes.
The video miniport driver is responsible for releasing and freeing all memory that it has reserved and committed
by calling the following functions:
AgpFreeVirtual unmaps virtual addresses that were mapped to system memory by a prior call to
AgpCommitVirtual.
AgpReleaseVirtual releases virtual addresses that were reserved by a prior call to AgpReserveVirtual.
AgpFreePhysical unmaps physical addresses that were mapped to system memory by a prior call to
AgpCommitPhysical.
AgpReleasePhysical releases physical addresses that were reserved by a prior call to
AgpReservePhysical.

Video Port Driver Support for Bug Check Callbacks

In Windows XP SP1 and later, a video miniport driver can implement and register HwVidBugcheckCallback, a
function that the system calls when Bug Check 0xEA (THREAD_STUCK_IN_DEVICE_DRIVER) occurs.
HwVidBugcheckCallback can append its own data to a dump file that driver developers can use to diagnose
problems in their drivers.
For information about registering HwVidBugcheckCallback, see the following topics:
Individually Registered Video Miniport Driver Functions
VideoPortRegisterBugcheckCallback

Child Devices of the Display Adapter (Windows 2000 Model)

The following sections discuss issues that affect miniport drivers of graphics adapters with one or more child
devices:
Detecting Child Devices
Communicating with the Driver of a Child Device
Using I2C to Communicate with a Child Device

Detecting Child Devices

You must implement HwVidGetVideoChildDescriptor in your miniport driver for the Plug and Play manager to be
able to detect child devices of a graphics adapter.
By default, HwVidGetVideoChildDescriptor cannot be called until after the parent device is started; that is,
HwVidGetVideoChildDescriptor cannot be called until after HwVidFindAdapter has completed. To override this
default, thus allowing child enumeration to occur at any time, you can set the AllowEarlyEnumeration member of
VIDEO_HW_INITIALIZATION_DATA to TRUE.
Some devices generate an interrupt when new hardware is connected to the system or when existing hardware is
disconnected from the system. To handle such an interrupt, the miniport driver should do the following:
Implement a DPC (HwVidDpcRoutine) that calls VideoPortEnumerateChildren.
Implement an interrupt handler (HwVidInterrupt) that calls VideoPortQueueDpc to queue the DPC when
an interrupt on the device occurs.
VideoPortEnumerateChildren forces the reenumeration of the adapter's child devices by causing the miniport
driver's HwVidGetVideoChildDescriptor function to be called for each of the parent device's children. The Plug and
Play manager will update the relationship between the parent device and its children accordingly.
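A sketch of that interrupt-to-DPC flow follows; the hot-plug status check is a hypothetical, hardware-specific helper.

#include <video.h>

BOOLEAN AdapterSignaledHotplug(PVOID HwDeviceExtension);   // hypothetical hardware check

// DPC routine: runs at DISPATCH_LEVEL, where VideoPortEnumerateChildren may be called.
VOID HotplugDpcRoutine(PVOID HwDeviceExtension, PVOID Context)
{
    (VOID)Context;
    VideoPortEnumerateChildren(HwDeviceExtension, NULL);
}

BOOLEAN HwVidInterrupt(PVOID HwDeviceExtension)
{
    // A real driver reads and dismisses its own interrupt status register here.
    if (!AdapterSignaledHotplug(HwDeviceExtension)) {
        return FALSE;   // not this adapter's interrupt
    }

    // Defer the re-enumeration to a DPC rather than doing it at device IRQL.
    VideoPortQueueDpc(HwDeviceExtension, HotplugDpcRoutine, NULL);
    return TRUE;
}
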

Communicating with the Driver of a Child Device

A video miniport driver and the driver of a child device can mutually define an interface that allows the child driver
to communicate with its hardware through the parent miniport driver. The child driver obtains this interface by
sending an IRP_MN_QUERY_INTERFACE request to the video port driver for the parent miniport driver. Upon
receiving such a request, the video port driver calls the miniport driver's HwVidQueryInterface function, if it is
defined, and the miniport driver returns a pointer to the interface. The driver of the child device can then call into
the miniport driver through the functions exposed by HwVidQueryInterface at any time.
If the miniport driver does not implement HwVidQueryInterface or fails the call, the video port driver passes the
request to the parent of the miniport driver's device. If a child driver sends an IRP_MN_QUERY_INTERFACE to
another child of the miniport driver and the other child driver does not implement HwVidQueryInterface or fails
the call, the video port driver returns an error.
Because the child driver can call into the miniport driver without the video port driver's knowledge, the miniport
driver must synchronize access to itself in all of the functions exposed by HwVidQueryInterface. It does this by
calling VideoPortAcquireDeviceLock and VideoPortReleaseDeviceLock to grab and release the video port
driver-maintained device lock, respectively.
A child device is enumerated by HwVidGetVideoChildDescriptor.

Using I2C to Communicate with a Child Device

On Microsoft Windows XP and later, after the Plug and Play manager has enumerated a video adapter's child
devices, the miniport driver can communicate with those devices on an I²C bus using the I²C protocol.
Communication between the miniport driver and WDM drivers for those devices on an I²C bus can occur via a
software interface exposed by the miniport driver (as described in Communicating with the Driver of a Child
Device). The miniport driver can initiate physical communication between those devices on the I²C bus by way of a
new hardware interface exposed by the video port driver. If the miniport driver needs the I²C master device (usually
the graphics chip) to read from or write to a physical child device over the I²C bus, it can use the hardware I²C
interface provided by the video port driver's VideoPortQueryServices routine. Note that this communication over
the I²C bus is limited strictly to hardware devices on the same I²C bus. Miniport driver writers are strongly
encouraged to use these routines for all such communication.
This mode of communication is also useful in cases where a video adapter has components for which there is no
WDM driver. For example, a video adapter may have a daughter board or circuit that is used to send the video
image to a digital flat panel. In this case, the miniport driver can make use of the hardware I²C interface provided by
VideoPortQueryServices to send commands to that circuit over the I²C bus.

The preceding figure illustrates how a miniport driver can initiate communication between two hardware devices
on an I²C bus.
To take advantage of the video port's I²C routines, the miniport driver must query the video port driver for an I²C
interface. In preparation for this, the miniport driver must allocate a VIDEO_PORT_I2C_INTERFACE structure, and
initialize its first two members (the Size and Version members) to appropriate values. The miniport driver then
calls the video port driver's VideoPortQueryServices routine, setting the servicesType parameter to
VideoPortServicesI2C, and setting the pInterface parameter to the partially-initialized
VIDEO_PORT_I2C_INTERFACE structure.
If the call to VideoPortQueryServices is successful, the video port driver fills in the remaining members of the
VIDEO_PORT_I2C_INTERFACE structure, including the addresses of four I²C routines: I2CStart, I2CStop, I2CRead,
and I2CWrite.
I2CStart and I2CStop are used, respectively, to initiate communication with the child device, and to terminate
communication with it.
I2CRead reads a specified number of bytes from the child device; I2CWrite writes a specified number of bytes to it.
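A sketch of that query follows; the version constant named here is an assumption and should be checked against the value defined in video.h.

#include <video.h>

VP_STATUS GetI2CInterface(PVOID HwDeviceExtension, VIDEO_PORT_I2C_INTERFACE *I2CInterface)
{
    VideoPortZeroMemory(I2CInterface, sizeof(*I2CInterface));

    // Initialize the first two members before querying, as described above.
    I2CInterface->Size    = sizeof(VIDEO_PORT_I2C_INTERFACE);
    I2CInterface->Version = VIDEO_PORT_I2C_INTERFACE_VERSION_1;   // assumed constant name

    // On success, the video port driver fills in the remaining members,
    // including the I2CStart, I2CStop, I2CRead, and I2CWrite routines.
    return VideoPortQueryServices(HwDeviceExtension,
                                  VideoPortServicesI2C,
                                  (PINTERFACE)I2CInterface);
}
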

I2C Bus and Child Devices of the Display Adapter

Display adapters typically communicate with child devices over the I²C bus. For example, a monitor is a child device
of the display adapter, and the display adapter can read a monitor's capability information over the I²C bus, which
is built into all standard monitor cables.
The I²C bus has only two wires: the serial clock line and the serial data line. Data is read from and written to the
lines one bit at a time. Reading and writing data bits to the I²C lines on the display adapter is hardware dependent,
so the vendor-supplied video miniport driver must provide the functions that instruct the display adapter to read
and write the individual bits.
The following functions, implemented by the video miniport driver, read and write individual data bits to the I²C
serial clock and data lines:
ReadClockLine
WriteClockLine
ReadDataLine
WriteDataLine
The I²C specification defines a protocol for initiating I²C communication, reading and writing bytes over the I²C
data line and terminating I²C communication. The system-supplied video port driver provides the following
functions that implement that protocol.
I2CStart
I2CRead
I2CWrite
I2CStop
Each of the functions (implemented by the video port driver) in the preceding list requires assistance from the
video miniport driver. For example, the I2CRead function reads a sequence of bytes over the I²C data line, but
reading each byte requires reading eight individual bits, a task that only the video miniport driver can do. The
I2CRead function can obtain assistance from the video miniport driver because it receives pointers (in an
I2CCallbacks structure) to the four I²C functions implemented by the video miniport driver (ReadClockLine,
WriteClockLine, ReadDataLine, and WriteDataLine). Similarly, I2CStart, I2CRead, and I2CWrite each receive an
I2CCallbacks structure that contains pointers to all four of the video miniport driver's I²C functions.
The HwVidGetVideoChildDescriptor function, implemented by the video miniport driver, is responsible for reading the
Extended Display Identification Data (EDID) structure from a particular monitor and returning the EDID to the
video port driver. HwVidGetVideoChildDescriptor can get assistance from the video port driver by calling
VideoPortDDCMonitorHelper, which uses the I²C bus to read a monitor's EDID according to the Display Data
Channel (DDC) standard. However, when VideoPortDDCMonitorHelper needs to read and write individual bits
to the I²C clock and data lines, it must call back into the video miniport driver for assistance. Therefore,
HwVidGetVideoChildDescriptor passes an I2CCallbacks structure (which contains pointers to ReadClockLine, WriteClockLine,
ReadDataLine, and WriteDataLine) to VideoPortDDCMonitorHelper.
For more information about the I²C functions implemented by the video miniport driver and video port driver, see
the following topics:
I2C Functions
I2C Functions Implemented by the Video Port Driver
For an overview of all video miniport driver functions and how those functions are registered, see Video Miniport
Driver Functions.
For details on the I²C bus, see the I²C Bus Specification published by Philips Semiconductors.

Interrupts in Video Miniport Drivers

A video miniport driver for an adapter that generates interrupts must implement a HwVidInterrupt routine. The
miniport driver's DriverEntry routine should initialize the HwInterrupt member of the
VIDEO_HW_INITIALIZATION_DATA structure to point to the interrupt handler.
The video port driver sets up an interrupt object for the video miniport driver if the adapter generates interrupts.
Because the interrupt object is created and managed by the video port driver, a video miniport driver writer needs
no further information about interrupt objects.
If the miniport driver's HwVidFindAdapter function finds that the video adapter does not actually generate
interrupts or that it cannot determine a valid interrupt vector/level for the adapter, HwVidFindAdapter should set
both InterruptLevel and InterruptVector in the VIDEO_PORT_CONFIG_INFO structure to zero.
When HwVidFindAdapter returns control, the video port driver checks the interrupt configuration members in
VIDEO_PORT_CONFIG_INFO and, if both are zero, does not connect an interrupt for the miniport driver. Explicitly
setting both interrupt configuration members to zero in HwVidFindAdapter disables the HwVidInterrupt entry
point, if any, that was set up by the miniport driver's DriverEntry function.
Note that HwVidInterrupt can access the miniport driver's device extension since it is nonpaged. Depending on the
design of the miniport driver, it might be impossible for other driver functions to share the device extension or a
particular area of the device extension with HwVidInterrupt safely.
For example, suppose the miniport driver's HwVidStartIO function is accessing the device extension when the
adapter interrupts, HwVidInterrupt is run on another processor, and HwVidInterrupt also accesses the device
extension. If such a situation might occur, HwVidStartIO should call VideoPortSynchronizeExecution with a
driver-supplied HwVidSynchronizeExecutionCallback function.
A video miniport driver should adhere to the following two rules:
1. Whenever the miniport driver and hardware are in any state other than D0, the hardware never generates
an interrupt.
2. Because of Rule 1, a device driver ISR should never act on an interrupt if the power state is D3 (it should
return FALSE).
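A sketch of an ISR that observes these rules, using a hypothetical PowerState field that the miniport driver keeps up to date in its device extension:

#include <video.h>

// Hypothetical device extension; a real driver records the state it last set
// in response to HwVidSetPowerState.
typedef struct _HW_DEVICE_EXTENSION {
    VIDEO_POWER_STATE PowerState;
} HW_DEVICE_EXTENSION, *PHW_DEVICE_EXTENSION;

BOOLEAN HwVidInterrupt(PVOID HwDeviceExtension)
{
    PHW_DEVICE_EXTENSION hwExt = (PHW_DEVICE_EXTENSION)HwDeviceExtension;

    // Rule 2: never act on an interrupt while the device is in D3.
    if (hwExt->PowerState == VideoPowerOff) {
        return FALSE;
    }

    // ... read the adapter's interrupt status, dismiss the interrupt,
    //     and queue a DPC for any deferred work ...

    return TRUE;
}
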

When to Implement a HwVidSynchronizeExecutionCallback Routine

Miniport drivers for adapters that do not generate interrupts seldom call VideoPortSynchronizeExecution with a
HwVidSynchronizeExecutionCallback function.
In fact, even miniport drivers that have a HwVidInterrupt function do not necessarily have a
HwVidSynchronizeExecutionCallback function. Because the video port driver does not send a request to a miniport
driver's HwVidStartIO function until it completes processing of the preceding request (see Processing Video
Requests (Windows 2000 Model) for more information), miniport drivers rarely call
VideoPortSynchronizeExecution.
There are two possible uses for a miniport driver's HwVidSynchronizeExecutionCallback function:
To access the adapter registers using the miniport driver's device extension for a driver function other than
the HwVidInterrupt function.
When the HwVidSynchronizeExecutionCallback function is given control, interrupts from the adapter are
masked off so the miniport driver's HwVidInterrupt function cannot change state in the device extension
while the HwVidSynchronizeExecutionCallback function is running in an SMP machine.
To write commands to the adapter registers or ports very quickly if the adapter requires it.
When the HwVidSynchronizeExecutionCallback function is given control, almost all system interrupts are
masked off, so the HwVidSynchronizeExecutionCallback function cannot be preempted by a device (or even,
a clock) interrupt.
An HwVidSynchronizeExecutionCallback function must return control as quickly as possible.
With the first type of HwVidSynchronizeExecutionCallback function, the miniport driver calls
VideoPortSynchronizeExecution with the Priority set to VpMediumPriority. With the second type of
HwVidSynchronizeExecutionCallback function, the miniport driver also makes this call with the Priority set to
VpMediumPriority if the driver has no HwVidInterrupt function. Otherwise, such a miniport driver makes this call
with the Priority set to VpHighPriority.
In general, a miniport driver should not call VideoPortSynchronizeExecution with the second type of
HwVidSynchronizeExecutionCallback function unless the driver designer has no other alternative: that is, unless the
adapter is such that it must be programmed with system interrupts masked off. Otherwise, the miniport driver
should call VideoPortSynchronizeExecution with the Priority set to VpLowPriority.
A HwVidSynchronizeExecutionCallback function, like a HwVidInterrupt function, cannot be pageable and cannot call
certain VideoPortXxx functions without bringing down the system. For a summary of VideoPortXxx functions that
the HwVidSynchronizeExecutionCallback function can call safely, see HwVidInterrupt.

Timers in Video Miniport Drivers

Any video miniport driver can have a HwVidTimer function at the discretion of the driver writer. A HwVidTimer
function allows the miniport driver to time out operations or to monitor state changes over a coarser-grained
interval than is possible by calling VideoPortStallExecution. HwVidTimer also does not prevent other system
operations from occurring as VideoPortStallExecution does.
For example, a miniport driver for an adapter that emulates VGA functionality might have a HwVidTimer function
that monitors the status of its adapter's "VGA" registers periodically so the driver can emulate VGA-style graphics.
After a call to VideoPortStartTimer, the video port driver calls HwVidTimer once every second until the video
miniport driver calls VideoPortStopTimer. A video miniport driver can enable and disable calls to the HwVidTimer
function repeatedly.
Note that a HwVidTimer function cannot disable calls to itself with VideoPortStopTimer. Another video miniport
driver function must control the enabling or disabling of calls to a HwVidTimer function through the use of
VideoPortStartTimer and VideoPortStopTimer.
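For illustration, enabling and disabling the once-per-second callback might look like this; the HwVidTimer body is a placeholder.

#include <video.h>

// Called by the video port driver roughly once per second while the timer is started.
VOID HwVidTimer(PVOID HwDeviceExtension)
{
    (VOID)HwDeviceExtension;
    // ... poll hardware state, time out a pending operation, and so on ...
}

// Called from some other miniport driver routine, never from HwVidTimer itself.
VOID EnableStatePolling(PVOID HwDeviceExtension, BOOLEAN Enable)
{
    if (Enable) {
        VideoPortStartTimer(HwDeviceExtension);   // begin periodic HwVidTimer calls
    } else {
        VideoPortStopTimer(HwDeviceExtension);    // stop them; they can be re-enabled later
    }
}
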

Spin Locks in Video Miniport Drivers

The video port driver supports multiprocessor synchronization in the video miniport driver by providing spin lock
functions to protect data when one or more miniport driver threads are running at or below IRQL
DISPATCH_LEVEL. The video port driver's spin lock functions enable miniport driver threads to create, acquire,
release, and destroy spin locks. The video port driver provides these functions because video miniport driver
writers must implement miniport drivers using functions provided exclusively by the video port driver. For a
general discussion on spin locks, see Spin Locks.
Before a video miniport driver can use a spin lock, it must create the spin lock by calling
VideoPortCreateSpinLock. After the spin lock has been created, a thread can attempt to acquire the spin lock by a
call to either VideoPortAcquireSpinLock or VideoPortAcquireSpinLockAtDpcLevel. The first function of this
pair can be used when the miniport driver's thread is at or below IRQL DISPATCH_LEVEL. The second function can
be used only when the thread is running at IRQL DISPATCH_LEVEL.
When the thread that is holding the spin lock has completed its task, the miniport driver should release the spin
lock. If the thread acquired the spin lock in a call to VideoPortAcquireSpinLock, it should use
VideoPortReleaseSpinLock to release the spin lock. In the call to VideoPortReleaseSpinLock, the thread must
pass the same value in the NewIrql parameter that it received in the OldIrql parameter of
VideoPortAcquireSpinLock when that function returned. If the thread called
VideoPortAcquireSpinLockAtDpcLevel, it should call VideoPortReleaseSpinLockFromDpcLevel to release
the spin lock.
When the miniport driver has no further use for the spin lock, it should destroy the spin lock by a call to
VideoPortDeleteSpinLock.
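As a sketch, the create/acquire/release/delete sequence described above might be used to guard a driver-defined counter; the device extension layout is hypothetical.

#include <video.h>

typedef struct _HW_DEVICE_EXTENSION {       // hypothetical layout
    PSPIN_LOCK FifoLock;                    // guards FifoDepth
    ULONG      FifoDepth;                   // example of shared data
} HW_DEVICE_EXTENSION, *PHW_DEVICE_EXTENSION;

VP_STATUS InitFifoLock(PHW_DEVICE_EXTENSION hwExt)
{
    return VideoPortCreateSpinLock(hwExt, &hwExt->FifoLock);
}

VOID IncrementFifoDepth(PHW_DEVICE_EXTENSION hwExt)
{
    UCHAR oldIrql;

    // Running at or below DISPATCH_LEVEL, so use the Acquire/Release pair.
    VideoPortAcquireSpinLock(hwExt, hwExt->FifoLock, &oldIrql);
    hwExt->FifoDepth++;
    VideoPortReleaseSpinLock(hwExt, hwExt->FifoLock, oldIrql);
}

VOID DestroyFifoLock(PHW_DEVICE_EXTENSION hwExt)
{
    VideoPortDeleteSpinLock(hwExt, hwExt->FifoLock);
}
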

Resetting the Adapter in Video Miniport Drivers

Every miniport driver must have a HwVidResetHw function if its adapter cannot be reset to an initialized state
without a hard reboot of the machine.
HwVidResetHw is called by the HAL if the machine is about to crash or if the user initiates a soft reboot of the
machine. HwVidResetHw resets the adapter to a specified character mode, so the HAL can display crash-dump
information as it shuts down the system or initialization information during a soft reboot.
HwVidResetHw cannot call the BIOS, cannot call any pageable code, nor may it be made pageable. If possible, it
should call only the VideoPortReadXxx and VideoPortWriteXxx functions, but it also can call any of the
following:
VideoPortStallExecution
VideoPortZeroDeviceMemory
VideoPortZeroMemory

Bus-Master DMA in Video Miniport Drivers

Beginning with Windows XP, the operating system graphics interface supports DMA on PCI bus-master devices.
Video miniport drivers of PCI bus-master devices can implement the following types of DMA support using helper
functions supplied by the video port driver:
Packet-based DMA
In packet-based DMA, data is transferred directly between the requester's space and the device. Since the
requester's space might not be contiguous, packet-based DMA is more efficient on those devices with
hardware scatter/gather support. Packet-based DMA is an ideal choice for moving large amounts of
arbitrary data between user space and the device.
Common-buffer DMA
In common-buffer DMA, a buffer is shared between (hence, common to), and used by both the host and the
device for repeated DMA operations. Some drivers use common-buffer DMA to upload driver-manipulated
data, such as a series of commands, to the graphics engine. The common buffer is contiguous and is always
accessible to both the device and the host CPU.
The common buffer is a precious system resource. For better overall driver and system performance, drivers
should use common-buffer DMA as economically as possible.
Depending on the nature of the bus-master adapter, some miniport drivers use packet-based DMA exclusively,
others use common-buffer DMA exclusively, and some use both.
Regardless of which type of DMA is used, the miniport driver should call VideoPortGetDmaAdapter to get a
pointer to the VP_DMA_ADAPTER structure and use it for subsequent DMA functions calls. When there is no
longer any need for continued DMA operations, the miniport driver should call VideoPortPutDmaAdapter to
discard the adapter object.
The following subsections describe how to use the packet-based and common-buffer DMA support supplied by the
video port driver.
Packet-Based Bus-Master DMA
Common-Buffer Bus-Master DMA
Points to Consider When Using DMA

Packet-Based Bus-Master DMA

Ordinarily, a display driver initiates a DMA operation by sending a transfer request to the miniport driver. When a
miniport driver supporting packet-based DMA operations receives such a request, it first locks the buffer involved
in the data transfer. The miniport driver then initiates the transfer by calling the video port driver's
VideoPortStartDma function, which in turn calls the miniport driver's HwVidExecuteDma callback routine to
carry out the data transfer. This DMA operation is handled asynchronously: VideoPortStartDma does not wait for
the DMA operation to complete before returning control to the miniport driver.
Depending on the size of the transfer request and the number of system resources assigned to the adapter, the
driver may not be able to transfer all of the data in a single DMA operation. The miniport driver should inspect the
actual transfer size returned in order to find out whether there is more data to be transferred. As soon as the DMA
hardware finishes the current transfer, the miniport driver should call the video port driver's
VideoPortCompleteDma function to complete the current DMA operation. If there is still data remaining to be
transferred, the miniport driver repeats the process of calling the video port driver's VideoPortStartDma and
VideoPortCompleteDma functions iteratively until no more data remains to be transferred. When all of the data
has been transferred, the miniport driver should unlock the buffer.
The miniport driver performs the following sequence of operations to use packet-based DMA:
1. Report hardware capabilities to the system and acquire an adapter object.
The miniport driver calls the video port driver's VideoPortGetDmaAdapter function, which returns a
pointer to a VP_DMA_ADAPTER structure. This is usually done at initialization time, typically within the
miniport driver's HwVidFindAdapter routine. The miniport driver uses this pointer for subsequent DMA
operations.
2. Lock host memory.
The miniport driver calls the video port driver's VideoPortLockBuffer function, which probes the buffer,
makes those memory pages resident, and locks them.
3. Start the DMA transfer.
The miniport driver calls the video port driver's VideoPortStartDma function, which flushes the host
processor memory caches, builds the scatter/gather list, and calls the miniport driver's HwVidExecuteDma
callback routine to carry out the DMA operation asynchronously. VideoPortStartDma returns control to the
miniport driver without waiting for the DMA operation to complete.
4. Complete the DMA transfer.
The miniport driver should call the video port driver's VideoPortCompleteDma function as soon as the
hardware finishes the DMA operation. Many video adapters generate an interrupt when a DMA operation is
complete. For example, a system with this type of adapter could react to the interrupt in the following way.
When the hardware generates the interrupt to notify the miniport driver that the DMA operation has
completed, the miniport driver's interrupt service routine (ISR) calls the video port driver's
VideoPortQueueDpc function to queue a DPC routine, which in turn calls the video port driver's
VideoPortCompleteDma function. The ISR cannot directly call VideoPortCompleteDma since this video
port driver function must be called at or below IRQL DISPATCH_LEVEL.
VideoPortCompleteDma flushes any data remaining in the bus-master adapter's internal cache, and frees
any unused resources (including the scatter/gather list built by VideoPortStartDma).
If only part of the data has been transferred (due to limitations on the number of available map registers, for
example), the miniport driver must make repeated calls to VideoPortStartDma and
VideoPortCompleteDma until all of the data has been transferred.
5. Unlock host memory.
When all of the data has been transferred, the miniport driver should call the video port driver's
VideoPortUnlockBuffer function to unlock the data buffer it acquired in the second step.
6. Discard the adapter object.
This step is optional. If, for some reason, the miniport driver decides that there will be no further DMA
operations for the rest of its lifetime, it should discard the DMA adapter object by calling the video port
driver's VideoPortPutDmaAdapter function.

Common-Buffer Bus-Master DMA

The miniport driver performs the following sequence of operations to use common-buffer DMA:
1. Get an adapter object.
The miniport driver calls the video port driver's VideoPortGetDmaAdapter function, usually within the
miniport driver's HwVidFindAdapter routine, to get a pointer to a VP_DMA_ADAPTER structure. The
miniport driver uses this pointer for subsequent DMA operations.
2. Allocate a common buffer.
The miniport driver calls the video port driver's VideoPortAllocateCommonBuffer function, using the
pointer obtained in the previous step.
3. Release the common buffer.
When the miniport driver no longer requires the common buffer, it calls the video port driver's
VideoPortReleaseCommonBuffer function.
4. Discard the adapter object.
This step is optional. If, for some reason, the miniport driver decides that there will be no further DMA
operations for the rest of its lifetime, it should discard the DMA adapter object by calling the video port
driver's VideoPortPutDmaAdapter function.
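A sketch of that sequence follows. Treat it as an outline rather than a drop-in implementation: the VP_DEVICE_DESCRIPTION members that are set, the buffer size, the cache-enabled choice, and the error handling are all simplifying assumptions.

#include <video.h>

#define COMMON_BUFFER_SIZE (64 * 1024)      // placeholder size

typedef struct _HW_DEVICE_EXTENSION {       // hypothetical device extension
    PVP_DMA_ADAPTER  DmaAdapter;
    PVOID            CommonBufferVa;        // CPU-visible address
    PHYSICAL_ADDRESS CommonBufferLa;        // logical address used by the DMA hardware
} HW_DEVICE_EXTENSION, *PHW_DEVICE_EXTENSION;

BOOLEAN SetUpCommonBuffer(PHW_DEVICE_EXTENSION hwExt)
{
    VP_DEVICE_DESCRIPTION deviceDescription;

    VideoPortZeroMemory(&deviceDescription, sizeof(deviceDescription));
    deviceDescription.ScatterGather     = FALSE;    // member names assumed; verify against video.h
    deviceDescription.Dma32BitAddresses = TRUE;
    deviceDescription.MaximumLength     = COMMON_BUFFER_SIZE;

    // Step 1: get the adapter object.
    hwExt->DmaAdapter = VideoPortGetDmaAdapter(hwExt, &deviceDescription);
    if (hwExt->DmaAdapter == NULL) {
        return FALSE;
    }

    // Step 2: allocate the common buffer (cached, by this sketch's assumption).
    hwExt->CommonBufferVa = VideoPortAllocateCommonBuffer(hwExt,
                                                          hwExt->DmaAdapter,
                                                          COMMON_BUFFER_SIZE,
                                                          &hwExt->CommonBufferLa,
                                                          TRUE,
                                                          NULL);
    return (hwExt->CommonBufferVa != NULL);
}

VOID TearDownCommonBuffer(PHW_DEVICE_EXTENSION hwExt)
{
    // Steps 3 and 4: release the common buffer, then discard the adapter object.
    VideoPortReleaseCommonBuffer(hwExt, hwExt->DmaAdapter, COMMON_BUFFER_SIZE,
                                 hwExt->CommonBufferLa, hwExt->CommonBufferVa, TRUE);
    VideoPortPutDmaAdapter(hwExt, hwExt->DmaAdapter);
}
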

Points to Consider When Using DMA

This section provides some important points to consider if you plan to use DMA operations in your miniport driver.
Additional Notes on VideoPortStartDma
The display driver usually sends transfer requests to the miniport driver, which actually carries out those DMA
transfers. The display driver cannot assume that just because its DMA engine is idle, all data in a transfer request
has been transferred. This is because the miniport driver needs to call VideoPortStartDma and
VideoPortCompleteDma multiple times for a large transfer request. The hardware's DMA engine is idle between
two such DMA operations, even though there might be additional data to transfer. It is the miniport driver's
responsibility to inform the display driver when the transfer request has been completely accomplished.
The Context parameter of VideoPortStartDma should point to nonpaged memory, such as memory in the
hardware extension. This parameter is passed through to the miniport driver's HwVidExecuteDma callback
routine, which runs at IRQL DISPATCH_LEVEL.
DMA and Interrupts
For many devices, an interrupt is generated when a hardware DMA operation is complete. The video miniport
driver's interrupt service routine (ISR) should queue a DPC routine for further DMA-related tasks. Do not call the
video port driver's DMA functions in an ISR since they can only be called at or below IRQL DISPATCH_LEVEL.
It is safe to check the size being transferred in the aforementioned DPC routine, even if the VideoPortStartDma
function has not yet returned, since the variable pointed to by the pLength argument of VideoPortStartDma has
already been updated at the time HwVidExecuteDma was called.
Logical Addresses Versus Physical Addresses
The video port driver's DMA implementation uses the concept of logical addresses, which are addresses used by
the DMA hardware. Logical addresses can be different from physical addresses. The video port driver-provided
DMA functions take into account any platform-specific memory restrictions. For this reason, it is important to use
the video port driver DMA functions instead of such kernel-mode functions as MmGetPhysicalAddress. Please
refer to Adapter Objects and DMA for more information about logical addresses.
Concurrent DMA
For devices that support concurrent DMA transfers, either on a DMA controller that supports simultaneous reads
and writes, or on two separate DMA controllers, miniport drivers should obtain a separate DMA adapter object for
each concurrent path. For example, if a device has two DMA controllers that work in parallel, the miniport driver
should make two calls to VideoPortGetDmaAdapter, thereby obtaining pointers to two VP_DMA_ADAPTER
structures. After that, whenever the miniport driver makes a DMA transfer request of a particular DMA controller, it
should use the appropriate pointer in that request.

Supporting DualView (Windows 2000 Model)

Many modern display adapters are able to drive two or more different display devices simultaneously. DualView, a
feature of Microsoft Windows XP and later, provides system-level support for features similar to those of
Multimonitor, but requires only a single display adapter. The graphics device interfaces (GDIs), and the end-user
experiences, are identical for both DualView and Multimonitor.
SingleView Mode
In SingleView mode, a display adapter drives a single display device, regardless of the number of monitors. This is
the usual mode for most of the display adapters that Windows 2000 and later operating system versions currently
support.
DualView Mode
A computer in DualView mode can use a single display adapter (with multiple video ports) to drive multiple images
on different monitors, with each display device portraying a different part of the desktop. The primary image
displays the primary view; other images display secondary views.
The following subsections provide more information about DualView:
Enabling DualView
DualView Advanced Implementation Details

Enabling DualView

For a minimal DualView implementation, perform the following actions:


Just before the miniport driver's HwVidFindAdapter returns, call the new video port driver entry point,
VideoPortCreateSecondaryDisplay, to generate a device extension for the secondary view. In the
secondary device extension, add two new private members:
1. A flag that indicates the device extension is for a secondary display
2. A pointer that contains the address of the primary display's device extension
Four changes must be made in the miniport driver's HwVidStartIO callback routine, modifying the way it
responds to the four IOCTL requests shown. The fourth item in the following list presents two ways of
accomplishing the same outcome.
1. In response to the IOCTL_VIDEO_MAP_VIDEO_MEMORY request, each view's frame buffer pointer and
length should be properly set.
2. The response to the IOCTL_VIDEO_SET_CURRENT_MODE request should be made specific to the
secondary view.
3. The response to the IOCTL_VIDEO_RESET_DEVICE request depends on whether the device is the
primary or the secondary display. If the device is the primary display, carry out any needed operations. If
the device is the secondary display, however, it is recommended that no action be taken.
4. Change the response to the IOCTL_VIDEO_SHARE_VIDEO_MEMORY request, to get a correct map of
both views. Note that for DirectDraw implementations, you can modify the DirectDraw function
DdMapMemory to get the correct map of both views.
The display driver should take care of the adjustment between the logical frame buffer address and the physical
video memory offset. This is especially important for DirectDraw implementations, because in DualView the
primary surface may start at a video memory offset other than 0. The display driver should notify DirectDraw
by filling pHalInfo->vmiData.pvPrimary and pHalInfo->vmiData.fpPrimary with the appropriate video
memory offsets on handling DrvGetDirectDrawInfo.
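A sketch of the HwVidFindAdapter change described in the first item above, with the two private members held in a hypothetical secondary-extension layout:

#include <video.h>

typedef struct _SECONDARY_DEVICE_EXTENSION {    // hypothetical layout
    BOOLEAN IsSecondary;        // flag: this extension belongs to the secondary view
    PVOID   PrimaryExtension;   // back-pointer to the primary display's device extension
} SECONDARY_DEVICE_EXTENSION, *PSECONDARY_DEVICE_EXTENSION;

// Called just before HwVidFindAdapter returns for the primary view.
VP_STATUS CreateSecondaryView(PVOID PrimaryExtension)
{
    PSECONDARY_DEVICE_EXTENSION secondary = NULL;
    VP_STATUS status;

    // VIDEO_DUALVIEW_REMOVABLE marks the view as removable (see the notes below).
    status = VideoPortCreateSecondaryDisplay(PrimaryExtension,
                                             (PVOID *)&secondary,
                                             VIDEO_DUALVIEW_REMOVABLE);
    if (status == NO_ERROR) {
        secondary->IsSecondary      = TRUE;
        secondary->PrimaryExtension = PrimaryExtension;
    }
    return status;
}
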
Additional Implementation Notes
HwVidInitialize is called only once for the primary device object. Any secondary device objects must be
initialized in this call.
For a DrvAssertMode call in which bEnable is set to FALSE, the miniport driver should check the status of
the other views. It should avoid turning off the video chip while other views are still active.
Never assume that drawing operations have the same drawing context (for example, color depth and stride).
This is especially important for chips that use tile frame buffers.
GDI can only set the primary view on a built-in device. Some systems, such as laptop computers, have built-
in monitor devices (LCDs), but can also be connected to external monitors. The miniport driver should mark
a view as removable by passing the VIDEO_DUALVIEW_REMOVABLE flag when it calls
VideoPortCreateSecondaryDisplay.
On laptop computers in DualView mode, hotkey switches should be disabled. On a video ACPI-enabled
system, the miniport driver should reject IOCTL_VIDEO_VALIDATE_CHILD_STATE_CONFIGURATION
requests.
For laptop computers supporting multichild devices, the miniport driver should handle
IOCTL_VIDEO_GET_CHILD_STATE requests and return logical child relationships (discussed in the
following section).

DualView Advanced Implementation Details

An ideal DualView implementation should recognize when its secondary views are enabled or disabled. When the
secondary views are disabled, the primary view should behave as it would without DualView enabled. This means
that:
The primary display can access all parts of video memory.
On a laptop computer, the primary display can be switched to any of the child display devices.
Video Memory Arrangement
In an ideal DualView implementation, memory buffer usage is optimized so that the entire video memory is used
by the primary display when the secondary display is disabled. This optimization is optional, however; the video
memory allocation strategy to use is completely up to the driver writer.
When secondary views are disabled, the primary view should be able to access all parts of video memory to
maximize system performance. When secondary views are enabled, however, the miniport driver should not just
appropriate the primary view's memory. Instead, a miniport driver should reserve video memory for secondary
views before changing to DualView mode. Starting with Windows XP (and continuing in later operating system
versions), there is a video request, IOCTL_VIDEO_SWITCH_DUALVIEW, that helps driver writers handle video
memory arrangement. When Windows XP (and later) handles a call to the ChangeDisplaySettings function
(described in the Windows SDK documentation), it sends the IOCTL_VIDEO_SWITCH_DUALVIEW request to each
DualView-related view before it attempts to change the mode. Drivers can use that notification to arrange video
memory in advance.
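A miniport driver might respond to this request in its HwVidStartIO routine along the following lines; the helper routine is hypothetical, and the exact memory strategy is up to the driver writer:

case IOCTL_VIDEO_SWITCH_DUALVIEW:
    // The system is about to change display settings for a DualView-related
    // view. Reserve or release video memory for the secondary view now, so
    // the arrangement is in place before the mode change arrives.
    ArrangeDualViewMemory(hwDeviceExtension);       // hypothetical helper
    RequestPacket->StatusBlock->Status = NO_ERROR;
    break;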
The following figure illustrates an arrangement of video memory with DualView disabled.

The following figure illustrates a suggested arrangement of video memory with DualView enabled. Each view has
its own screen buffer and offscreen heap.

Child Relationships
A typical mobile video chip has multiple child devices, such as LCD, CRT, and TV. In SingleView mode, as shown in
the following figure, the primary view owns all of these child devices, while the secondary view owns none of them.
A user can switch the primary view from one child device to another. Only one device can be active at a time.
In DualView mode, however, each child can be assigned to a different view; the question arises as to which view is
associated with which child. The relationships between views and devices can be described in two ways: in terms of
physical child relations and in terms of logical child relations.
Physical child relations reflect the relationship between the adapter's video chip and its display devices. After the
system boots, the physical relationship between the video chip and the display devices never changes. In the
preceding figure and the following figure, the video chip owns the LCD, CRT and TV display devices; hence, all three
display devices are physical children of the video chip.
Logical child relations reflect the dynamic relationships between the views and the display devices. In the following
figure, DualView has been enabled, and the situation is that the primary view (View 1) owns the LCD device, while
the secondary view (View 2) owns both the CRT and TV devices. Another way to say this is that the LCD device is
the logical child of the primary view, while the CRT and TV devices are the logical children of the secondary view.
The miniport driver reports logical child relationships through the IOCTL_VIDEO_GET_CHILD_STATE request.

One additional point remains. When DualView is enabled, the primary view may automatically switch children. In
SingleView mode, only the CRT, which is associated with the primary (and only) view, is active. All other display
devices are inactive. After DualView has been enabled, the preceding figure shows the primary view has switched
to display on the LCD device, while the CRT is a child of the secondary view. This switch might be necessary for a
laptop computer due to the fact that the secondary view is removable, which means that the LCD device cannot be
associated with that view. Whether and how to make this switch is totally under the control of miniport drivers.
Send comments about this topic to Microsoft
TV Connector and Copy Protection Support in Video
Miniport Drivers
4/26/2017 • 1 min to read • Edit Online

A video miniport driver for an adapter that has a TV connector must handle VRPs with the
IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS I/O control code. This IOCTL is sent to the miniport driver to either
query the capabilities and current settings of the TV connector and copy protection hardware or set the
functionality of the TV connector and copy protection hardware. The miniport driver determines the action to be
performed by checking the dwCommand field of the VIDEOPARAMETERS structure, which is passed in the VRP's
InputBuffer. The system will not allow playback of Rovi (formerly Macrovision) protected DVDs if a miniport
driver does not handle this VRP.
If dwCommand is set to VP_COMMAND_GET, and the device does not support TV output, then the miniport driver
should return NO_ERROR in the Status member of the VRP's StatusBlock. It should also set the Information
member of the VRP's StatusBlock to the size, in bytes, of the VIDEOPARAMETERS structure. It should set dwFlags
to zero, set dwTVStandard to VP_TV_STANDARD_WIN_VGA, and set dwAvailableTVStandard to
VP_TV_STANDARD_WIN_VGA.
If dwCommand is set to VP_COMMAND_GET, and the device does support TV Out, the miniport driver should
indicate this in the VIDEOPARAMETERS structure by setting the appropriate flags in the dwFlags member and by
assigning values to the other structure members that correspond to the set flags.
The following sections provide implementation details for miniport drivers of devices that have a TV connector:
Querying TV Connector and Copy Protection Hardware
Setting Copy Protection Hardware
Send comments about this topic to Microsoft
Querying TV Connector and Copy Protection
Hardware
4/26/2017 • 2 min to read • Edit Online

A video miniport driver for an adapter that has a TV connector must process the
IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS request in its HwVidStartIO function. When the IOCTL request is
IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS, the InputBuffer member of the VIDEO_REQUEST_PACKET
structure points to a VIDEOPARAMETERS structure. The dwCommand member of that VIDEOPARAMETERS
structure specifies whether the miniport driver must provide information about the TV connector
(VP_COMMAND_GET) or apply specified settings to the TV connector (VP_COMMAND_SET).
When the dwCommand member of the VIDEOPARAMETERS structure is VP_COMMAND_GET, the miniport driver
must do the following:
Verify the Guid member of the VIDEOPARAMETERS structure.
For each capability that the TV connector supports, set the corresponding flag in the dwFlags member of
the VIDEOPARAMETERS structure.
For each flag set in the dwFlags member, assign values to the corresponding members of the
VIDEOPARAMETERS structure to indicate the capabilities and current settings associated with that flag. See
the VIDEOPARAMETERS reference page for a list of structure members that correspond to a given flag.
The dwMode member of the VIDEOPARAMETERS structure specifies whether the TV output is optimized for video
playback or for displaying the Windows desktop. A value of VIDEO_MODE_TV_PLAYBACK specifies that the TV
output is optimized for video playback (that is, flicker filter is disabled and overscan is enabled). A value of
VIDEO_MODE_WIN_GRAPHICS specifies that the TV output is optimized for Windows graphics (that is, maximum
flicker filter is enabled and overscan is disabled).
In response to VP_COMMAND_GET, the miniport driver must set the VP_FLAGS_TV_MODE flag in dwFlags and
must set the VP_MODE_WIN_GRAPHICS bit in dwAvailableModes. Setting the VP_MODE_TV_PLAYBACK bit in
dwAvailableModes is optional. In addition, the miniport driver must set the VP_FLAGS_MAX_UNSCALED flag in
dwFlags and must assign values to the corresponding members of the VIDEOPARAMETERS structure.
In response to VP_COMMAND_GET, if the TV output is currently disabled, the miniport driver should set dwMode
to 0, set dwTVStandard to VP_TV_STANDARD_WIN_VGA, and set dwAvailableTVStandard to
VP_TV_STANDARD_WIN_VGA.
Example 1: An adapter supports TV output, which is currently disabled. The miniport driver must do the following in
response to VP_COMMAND_GET:
In dwFlags, set VP_FLAGS_TV_MODE, VP_FLAGS_TV_STANDARD, and all other flags that represent
capabilities supported by the TV connector.
Set dwMode to 0.
In dwAvailableModes, set VP_MODE_WIN_GRAPHICS. If the hardware supports VP_MODE_TV_PLAYBACK,
set that bit also.
Set dwTVStandard to VP_TV_STANDARD_WIN_VGA.
In dwAvailableTVStandard, set all bits that represent TV standards supported by the TV connector.
For all flags set in dwFlags (other than VP_FLAGS_TV_MODE and VP_FLAGS_TV_STANDARD, which have
already been discussed), assign values to the corresponding members of the VIDEOPARAMETERS structure.
Example 2: To enable TV output, the caller (not the miniport driver) should do the following:
In dwFlags, set VP_FLAGS_TV_MODE and VP_FLAGS_TV_STANDARD. Clear all other flags.
Set dwMode to either VP_MODE_WIN_GRAPHICS or VP_MODE_TV_PLAYBACK. Do not set both bits.
Set dwTvStandard to the desired standard (for example VP_TV_STANDARD_NTSC_M). Do not set any other
bits in dwTvStandard.
Example 3: To disable TV output, the caller (not the miniport driver) should do the following:
In dwFlags, set VP_FLAGS_TV_MODE and VP_FLAGS_TV_STANDARD. Clear all other flags.
Set dwMode to 0.
In dwTvStandard, set VP_TV_STANDARD_WIN_VGA. Clear all other bits in dwTvStandard.
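The following sketch condenses the VP_COMMAND_GET handling described above into a HwVidStartIO fragment. The capability query helper is hypothetical, and the flags and standards shown are illustrative rather than a complete set:

case IOCTL_VIDEO_HANDLE_VIDEOPARAMETERS:
{
    VIDEOPARAMETERS *vp = (VIDEOPARAMETERS *)RequestPacket->InputBuffer;

    if (vp->dwCommand == VP_COMMAND_GET) {
        // Report every capability the TV connector supports.
        vp->dwFlags = VP_FLAGS_TV_MODE | VP_FLAGS_TV_STANDARD;   // plus any other supported flags
        vp->dwAvailableModes = VP_MODE_WIN_GRAPHICS;              // VP_MODE_TV_PLAYBACK is optional
        vp->dwAvailableTVStandard = VP_TV_STANDARD_NTSC_M;        // plus any other supported standards

        if (IsTvOutputEnabled(hwDeviceExtension)) {                // hypothetical query
            vp->dwMode = VP_MODE_WIN_GRAPHICS;
            vp->dwTVStandard = VP_TV_STANDARD_NTSC_M;
        } else {
            // TV output is currently disabled.
            vp->dwMode = 0;
            vp->dwTVStandard = VP_TV_STANDARD_WIN_VGA;
        }

        RequestPacket->StatusBlock->Information = sizeof(VIDEOPARAMETERS);
        RequestPacket->StatusBlock->Status = NO_ERROR;
    }
    break;
}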
Send comments about this topic to Microsoft
Setting TV Connector and Copy Protection Hardware
4/26/2017 • 1 min to read • Edit Online

For any bit set by a miniport driver in the dwFlags member of VIDEOPARAMETERS on a VP_COMMAND_GET, the
miniport driver can perform a set on a VP_COMMAND_SET. It is the caller's responsibility to call the miniport driver
to set only that functionality for which the miniport driver indicated support on a VP_COMMAND_GET. The
miniport driver should respond to a VP_COMMAND_SET by setting the hardware with the value of each
VIDEOPARAMETERS field for which the corresponding bit is set in dwFlags. For example:
If the miniport driver set the VP_FLAGS_TV_MODE bit on a VP_COMMAND_GET, then the miniport driver
should change the TV mode to the value specified by dwMode when VP_FLAGS_TV_MODE is set on a
VP_COMMAND_SET.
If the miniport driver set the VP_FLAGS_TV_STANDARD bit on a VP_COMMAND_GET, then the miniport
driver should change the TV standard to the value specified by dwTVStandard when
VP_FLAGS_TV_STANDARD is set on a VP_COMMAND_SET.
If the miniport driver set the VP_FLAGS_CONTRAST bit on a VP_COMMAND_GET, then the miniport driver
should set the contrast to the value specified by dwContrast when VP_FLAGS_CONTRAST is set on a
VP_COMMAND_SET.
A VIDEOPARAMETERS field contains undefined data if the corresponding bit is not set in dwFlags.
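A corresponding VP_COMMAND_SET fragment might look like the following, where vp points to the VIDEOPARAMETERS structure in the VRP's InputBuffer; the hardware-programming helpers are hypothetical, and only the fields whose flags are set in dwFlags are consulted:

if (vp->dwCommand == VP_COMMAND_SET) {
    if (vp->dwFlags & VP_FLAGS_TV_MODE) {
        SetHwTvMode(hwDeviceExtension, vp->dwMode);               // hypothetical helper
    }
    if (vp->dwFlags & VP_FLAGS_TV_STANDARD) {
        SetHwTvStandard(hwDeviceExtension, vp->dwTVStandard);     // hypothetical helper
    }
    if (vp->dwFlags & VP_FLAGS_CONTRAST) {
        SetHwContrast(hwDeviceExtension, vp->dwContrast);         // hypothetical helper
    }
    // Fields whose flags are not set in dwFlags contain undefined data
    // and must not be read.

    RequestPacket->StatusBlock->Status = NO_ERROR;
}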
Send comments about this topic to Microsoft
Setting Copy Protection Hardware
4/26/2017 • 1 min to read • Edit Online

Miniport drivers that returned VP_FLAGS_PROTECTED in VIDEOPARAMETERS's dwFlags member on a
VP_COMMAND_GET should do the following in response to the VP_COMMAND_SET command, depending on the
dwCPCommand member of the VIDEOPARAMETERS structure:
If dwCPCommand is VP_CP_CMD_ACTIVATE, the miniport driver should turn on copy protection and
generate and return a unique copy protection key in dwCPKey.
If dwCPCommand is VP_CP_CMD_DEACTIVATE and the copy protection key in dwCPKey is valid, the
miniport driver should turn off copy protection.
If dwCPCommand is VP_CP_CMD_CHANGE and the copy protection key in dwCPKey is valid, the miniport
driver should change the copy protection level based on the trigger data in bCP_APSTriggerBits.
Miniport drivers of devices that do not have copy protection hardware should simply return NO_ERROR in the
Status field of the VRP's StatusBlock.
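For devices that do have copy protection hardware, the dwCPCommand handling might be structured as follows, where vp points to the VIDEOPARAMETERS structure in the VRP's InputBuffer; the key-management and hardware helpers are hypothetical:

switch (vp->dwCPCommand) {
case VP_CP_CMD_ACTIVATE:
    vp->dwCPKey = GenerateCopyProtectionKey(hwDeviceExtension);    // hypothetical helper
    EnableCopyProtection(hwDeviceExtension);                       // hypothetical helper
    break;

case VP_CP_CMD_DEACTIVATE:
    if (IsValidCopyProtectionKey(hwDeviceExtension, vp->dwCPKey))  // hypothetical helper
        DisableCopyProtection(hwDeviceExtension);                  // hypothetical helper
    break;

case VP_CP_CMD_CHANGE:
    if (IsValidCopyProtectionKey(hwDeviceExtension, vp->dwCPKey))
        ChangeCopyProtection(hwDeviceExtension,                    // hypothetical helper
                             vp->bCP_APSTriggerBits);
    break;
}
RequestPacket->StatusBlock->Status = NO_ERROR;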
Send comments about this topic to Microsoft
Multiple Session Copy Protection
4/26/2017 • 1 min to read • Edit Online

The miniport driver of a device that has copy protection can optionally support multiple simultaneous copy
protection sessions. To do so, the miniport driver should do the following (a reference-counting sketch follows this list):
Return a unique copy protection key in dwCPKey for each copy protection activation.
Keep copy protection enabled until all sessions have been temporarily turned off (through
VP_CP_CMD_CHANGE) or deactivated (VP_CP_CMD_DEACTIVATE). For example, the miniport driver could
increment or decrement a reference count every time copy protection is activated (VP_CP_CMD_ACTIVATE)
or deactivated/turned off, disabling copy protection entirely only when the reference count is zero.
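A minimal sketch of this reference-counting approach, assuming a per-adapter session count in the device extension and hypothetical key and hardware helpers (vp points to the VIDEOPARAMETERS structure in the VRP's InputBuffer):

case VP_CP_CMD_ACTIVATE:
    vp->dwCPKey = GenerateCopyProtectionKey(hwDeviceExtension);    // unique per activation (hypothetical)
    if (hwDeviceExtension->CpSessionCount++ == 0) {
        EnableCopyProtection(hwDeviceExtension);                   // hypothetical helper
    }
    break;

case VP_CP_CMD_DEACTIVATE:
    if (IsValidCopyProtectionKey(hwDeviceExtension, vp->dwCPKey) &&
        --hwDeviceExtension->CpSessionCount == 0) {
        // The last outstanding session is gone; only now disable the hardware.
        DisableCopyProtection(hwDeviceExtension);                  // hypothetical helper
    }
    break;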
Send comments about this topic to Microsoft
Mirror Driver Support in Video Miniport Drivers
(Windows 2000 Model)
4/26/2017 • 1 min to read • Edit Online

Mirror driver support for video miniport drivers is provided by Windows 2000 and later, so a miniport driver does
not need any special code to provide such support. See Mirror Drivers for more information about display drivers
in mirroring systems.
The requirements for a mirror driver miniport driver are minimal. The only functions which must be implemented
are DriverEntry, which is exported by the miniport driver, and the following:
HwVidFindAdapter
HwVidInitialize
HwVidStartIO
Since there is no physical display device associated with a mirrored surface, all three of the functions shown in the
preceding list can be empty implementations that always return success.
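Because no hardware is involved, the three required routines can be little more than stubs, as in the following sketch; the function names are placeholders for the driver's own routines:

VP_STATUS
MirrorFindAdapter(PVOID HwDeviceExtension, PVOID HwContext, PWSTR ArgumentString,
                  PVIDEO_PORT_CONFIG_INFO ConfigInfo, PUCHAR Again)
{
    return NO_ERROR;                     // nothing to detect: there is no physical device
}

BOOLEAN
MirrorInitialize(PVOID HwDeviceExtension)
{
    return TRUE;                         // nothing to initialize
}

BOOLEAN
MirrorStartIO(PVOID HwDeviceExtension, PVIDEO_REQUEST_PACKET RequestPacket)
{
    RequestPacket->StatusBlock->Status = NO_ERROR;   // succeed every request
    return TRUE;
}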
Note Starting with Windows 8, mirror drivers will not install on the system. For more information, see Mirror
Drivers.
Send comments about this topic to Microsoft
VGA-Compatible Video Miniport Drivers (Windows
2000 Model)
4/26/2017 • 1 min to read • Edit Online

On x86-based NT-based operating system platforms, there are two kinds of video miniport drivers: nonVGA-
compatible miniport drivers and VGA-compatible miniport drivers.
Most miniport drivers are nonVGA-compatible, and are consequently much simpler to implement. NonVGA-
compatible video miniport drivers rely on having the system-supplied VGA miniport driver (vga.sys) or another
VGA-compatible SVGA miniport driver loaded concurrently. Such a miniport driver is set up to configure itself in
the registry with VgaCompatible set to zero (FALSE) and has the following features:
It provides no special support for full-screen MS-DOS applications in x86-based machines. Instead, it is
loaded along with a system-supplied VGA (or, possibly, with a VGA-compatible SVGA) miniport driver,
which provides this support for full-screen MS-DOS applications.
In most cases, it either is written for an adapter that has no VGA compatibility mode or for an accelerator
that works independently of the VGA.
A VGA-compatible miniport driver is based on the system-supplied VGA miniport driver, with code modified to
support adapter-specific features. The system-supplied VGA display drivers use the support provided by VGA-
compatible miniport drivers, so the developer of a new miniport driver for a VGA-compatible adapter need not
write a new display driver. It provides support for full-screen MS-DOS applications to do I/O directly to the adapter
registers. It also functions as a video validator to prevent such applications from issuing any sequence of
instructions that would hang the machine.
Self-declared "VGA-compatible" miniport drivers are set up to configure themselves in the registry with
VgaCompatible set to one (TRUE).
VGA-compatible miniport drivers in x86-based machines replace the system-supplied VGA miniport driver.
Therefore, VGA-compatible miniport drivers must have a set of SvgaHwIoPortXxx functions to support full-screen
MS-DOS applications as the system-supplied VGA miniport driver does.
The designer of a new VGA-compatible SVGA miniport driver should adapt one of the system-supplied SVGA
miniport driver's SvgaHwIoPortXxx functions to the adapter's features. Miniport drivers for other types of adapters
in x86-based machines can also supply a set of SvgaHwIoPortXxx routines to provide the same support, either at the
discretion of the miniport driver designer or because the miniport driver cannot be loaded while the system VGA
miniport driver is loaded.
Send comments about this topic to Microsoft
Windowed VDMs in x86-Based Machines
4/26/2017 • 1 min to read • Edit Online

Each MS-DOS application runs in a Windows VDM, which, in turn, runs as a console manager application in the
Win32 protected subsystem.
In NT-based operating system platforms, a kernel-mode component called the V86 emulator traps I/O instructions
issued by MS-DOS applications. As long as such an application runs within a window, its attempts to access video
adapter ports are trapped and reflected back to the system-supplied video VDD, which emulates the behavior of the
adapter for the application.
In other words, the display driver retains control of the video adapter while a VDM runs in a window.
Send comments about this topic to Microsoft
Full-Screen VDMs in x86-based Machines
4/26/2017 • 1 min to read • Edit Online

For performance reasons, when the user switches an MS-DOS application to full-screen mode in an x86-based
machine, the display driver yields control of the adapter. The system VGA or a VGA-compatible miniport driver then
hooks out from the V86 emulator all I/O instructions, such as application-issued IN, REP INSB/INSW/INSD, OUT,
and REP OUTSB/OUTSW/OUTSD instructions, to the video I/O ports. These hooked I/O operations are forwarded
to the VGA-compatible miniport driver's SvgaHwIoPortXxx functions.
However, for faster performance, a miniport driver can call VideoPortSetTrappedEmulatorPorts to allow some
I/O ports to be accessed directly by the application. The miniport driver continues to hook other I/O ports with its
SvgaHwIoPortXxx to validate the application-issued instruction stream to those ports.
To prevent a full-screen application from issuing a sequence of instructions that might hang the machine, the
SvgaHwIoPortXxx functions monitor the application instruction stream to a driver-determined set of adapter
registers. A miniport driver must enable direct access only to I/O ports that are completely safe. For example, ports
for the sequencer and miscellaneous output registers should always be hooked by the V86 emulator and trapped to
the miniport driver-supplied SvgaHwIoPortXxx functions for validation.
Direct access to I/O ports for the application is determined by the IOPM (named for the x86 I/O permission map)
that the VGA-compatible miniport driver sets by calling VideoPortSetTrappedEmulatorPorts. Note that the
miniport driver can adjust the IOPM by calling this function to have access ranges, describing I/O ports, released for
direct access by the application or retrapped to an SvgaHwIoPortXxx function. The current IOPM determines which
ports can be accessed directly by the application and which remain hooked by the V86 emulator and trapped to an
SvgaHwIoPortXxx function for validation.
By default, all I/O ports set up in such a miniport driver's emulator access ranges array are trapped to the
corresponding SvgaHwIoPortXxx function. However, VGA-compatible miniport drivers usually call
VideoPortSetTrappedEmulatorPorts on receipt of an IOCTL_VIDEO_ENABLE_VDM request to reset the IOPM for
the VDM to allow direct access to some of these I/O ports. Usually, such a driver allows direct access to all video
adapter registers except the VGA sequencer registers and the miscellaneous output register, plus any SVGA
adapter-specific registers that the driver writer has determined should always be validated by an SvgaHwIoPortXxx
function.
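For example, on receipt of IOCTL_VIDEO_ENABLE_VDM a driver might reset the IOPM as in the following sketch; the access-range array and its size are placeholders for the driver's own table describing which ports are released for direct access and which remain trapped:

case IOCTL_VIDEO_ENABLE_VDM:
    // VdmAccessRanges is a driver-defined VIDEO_ACCESS_RANGE array (placeholder
    // name) that releases most adapter registers for direct access but keeps the
    // sequencer registers and the miscellaneous output register trapped.
    VideoPortSetTrappedEmulatorPorts(hwDeviceExtension,
                                     NUM_VDM_ACCESS_RANGES,    // placeholder count
                                     VdmAccessRanges);
    RequestPacket->StatusBlock->Status = NO_ERROR;
    break;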
Send comments about this topic to Microsoft
VGA-Compatible Miniport Driver's
HwVidFindAdapter
4/26/2017 • 1 min to read • Edit Online

A VGA-compatible miniport driver's HwVidFindAdapter function (or registry HwVid..Callback) must set up the
following in the VIDEO_PORT_CONFIG_INFO buffer:
NumEmulatorAccessEntries, indicating the number of entries in the EmulatorAccessEntries array
EmulatorAccessEntries, pointing to a static array containing the given number of
EMULATOR_ACCESS_ENTRY-type elements, each describing a range of I/O ports hooked from the V86
emulator and, by default, forwarded to an SvgaHwIoPortXxx function
Each entry includes a starting I/O address, a range length, the size of access to be trapped (UCHAR, USHORT,
or ULONG), whether the miniport driver supports input or output of string data through the I/O port(s), and
the miniport driver-supplied SvgaHwIoPortXxx function that actually validates and, possibly, transfers the
data. Each SvgaHwIoPortXxx function handles read (IN or REP INSB/INSW/INSD) and/or write (OUT or
REP OUTSB/OUTSW/OUTSD) transfers of UCHAR-, USHORT-, or ULONG-sized data.
EmulatorAccessEntriesContext, a pointer to storage, such as an area in the miniport driver's device
extension, in which the miniport driver's SvgaHwIoPortXxx functions can batch a sequence of application-
issued instructions that require validation
VdmPhysicalVideoMemoryAddress and VdmPhysicalVideoMemoryLength, describing a range of
video memory that must be mapped into the VDM address space to support BIOS INT10 calls from full-
screen MS-DOS applications
The miniport driver can call the VideoPortInt10 function when such an application changes the video mode
to one that the miniport driver's adapter can support.
HardwareStateSize, describing the minimum number of bytes required to store the hardware state for the
adapter in response to an IOCTL_VIDEO_SAVE_HARDWARE_STATE request
When the user switches a full-screen MS-DOS application to run in a window, the miniport driver must save
the adapter state before the display driver regains control of the video adapter. Note that a VGA-compatible
miniport driver also must support the reciprocal IOCTL_VIDEO_RESTORE_HARDWARE_STATE request
because the user might switch the windowed application back to full-screen mode.
A VGA-compatible miniport driver's emulator access entries specify subsets of its access ranges array for the
adapter. The emulator access entries can be and usually are all I/O ports in the mapped access ranges array set up
by its HwVidFindAdapter function. The access ranges it passes in calls to VideoPortSetTrappedEmulatorPorts,
defining the current IOPM and determining the I/O ports that are directly accessible by a full-screen MS-DOS
application, specify subsets of the miniport driver's emulator access entries.
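A condensed HwVidFindAdapter fragment that fills in these members might look like the following; the emulator access entry array, context area, addresses, and state size are placeholders for adapter-specific values:

// EmulatorAccessEntries is a static, driver-defined array of
// EMULATOR_ACCESS_ENTRY elements (contents omitted here).
ConfigInfo->NumEmulatorAccessEntries     = NUM_EMULATOR_ACCESS_ENTRIES;   // placeholder count
ConfigInfo->EmulatorAccessEntries        = EmulatorAccessEntries;
ConfigInfo->EmulatorAccessEntriesContext = (ULONG_PTR)&hwDeviceExtension->EmulatorContext;

// Video memory range mapped into the VDM address space for INT10 support;
// 0xA0000 is the traditional VGA window and is shown only as an example.
ConfigInfo->VdmPhysicalVideoMemoryAddress.QuadPart = 0x000A0000;
ConfigInfo->VdmPhysicalVideoMemoryLength           = 0x00020000;

// Minimum buffer needed to save the adapter state in response to
// IOCTL_VIDEO_SAVE_HARDWARE_STATE.
ConfigInfo->HardwareStateSize = sizeof(HW_SAVED_STATE);                   // placeholder structure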
Send comments about this topic to Microsoft
Validating Instructions in SvgaHwIoPortXxx
4/26/2017 • 1 min to read • Edit Online

As already mentioned in VGA-Compatible Miniport Driver's HwVidFindAdapter, the IOPM set for directly accessible
I/O ports usually includes all SVGA registers except the sequencer registers and the miscellaneous output register,
which the VGA-compatible miniport driver continues to monitor with its SvgaHwIoPortXxx functions. The
sequencer registers control internal chip timing on VGA-compatible video adapters. If a full-screen MS-DOS
application touches other adapter registers during a synchronous reset, the machine can hang. Likewise, if the
miscellaneous output register is set to select a nonexistent clock, the machine can hang.
VGA-compatible miniport drivers must ensure that full-screen MS-DOS applications do not issue instructions that
cause the machine to hang. Each such miniport driver must supply SvgaHwIoPortXxx functions that monitor
application-issued instructions to the I/O ports for the adapter sequencer registers and miscellaneous output
register. Each new VGA-compatible miniport driver for an adapter with special features also must monitor and
continue to validate any I/O ports to which an application might send any instruction sequence that could hang the
machine.
Whenever an application attempts to access the sequencer clock register, the SvgaHwIoPortXxx function must
change the IOPM in order to trap all instructions coming in during a synchronous reset. As soon as an application
sends an instruction that affects the sequencer or attempts to write to the miscellaneous output register, the
SvgaHwIoPortXxx function should adjust the IOPM by calling VideoPortSetTrappedEmulatorPorts to disable
direct access to all adapter registers.
The miniport driver-supplied SvgaHwIoPortXxx functions should buffer subsequent IN (or INSB/INSW/INSD)
and/or OUT (or OUTSB/OUTSW/OUTSD) instructions in the EmulatorAccessEntriesContext area it set up in the
VIDEO_PORT_CONFIG_INFO (see VGA-Compatible Miniport Driver's HwVidFindAdapter) until the synchronous
reset is done, or until the application either restores the miscellaneous output register or resets it to a "safe" clock.
The miniport driver is then responsible for checking whether the buffered instructions could hang the machine. If they
cannot, the miniport driver should process the buffered instructions, usually by calling VideoPortSynchronizeExecution
with a driver-supplied HwVidSynchronizeExecutionCallback function. Otherwise, the miniport driver should discard
the buffered instructions.
Send comments about this topic to Microsoft
VGA-Compatible Miniport Driver's HwVidStartIO
4/26/2017 • 1 min to read • Edit Online

When the user switches a full-screen MS-DOS application back to running in a window, a VGA-compatible miniport
driver's HwVidStartIO function is sent a VRP with the I/O control code IOCTL_VIDEO_SAVE_HARDWARE_STATE. The
miniport driver must store the state of the adapter in case the user switches the application to full-screen mode
again.
Note that the miniport driver's SvgaHwIoPortXxx function might have buffered a sequence of application INs
and/or OUTs, as described in Validating Instructions in SvgaHwIoPortXxx, when its HwVidStartIO function is called
to save the adapter state. In these circumstances, the miniport driver should save the current state, including the
buffered instructions, so that the SvgaHwIoPortXxx functions can resume validation operations exactly where they
left off if the user switches the application to full-screen mode again.
When the miniport driver completes a save operation, the port driver automatically disables the current IOPM for
VDMs and the miniport driver's SvgaHwIoPortXxx functions. The video port driver restores the IOPM automatically
if the application is switched to full-screen mode again. It also resumes calling the miniport driver's
SvgaHwIoPortXxx function, after it calls the miniport driver's HwVidStartIO function with the
IOCTL_VIDEO_RESTORE_HARDWARE_STATE request.
Send comments about this topic to Microsoft
Video Miniport Drivers on Multiple Windows Versions
(Windows 2000 Model)
4/26/2017 • 1 min to read • Edit Online

If you plan to modify a video miniport driver written for Windows NT 4.0 to run on a later NT-based operating
system version, see Converting a Windows NT 4.0 Miniport Driver to Windows 2000. You can develop a video
miniport driver on a current version of Windows, but have it run on an earlier NT-based Windows version. For
details, see Using VideoPortGetProcAddress.
Send comments about this topic to Microsoft
Converting a Windows NT 4.0 Miniport Driver to
Windows 2000
4/26/2017 • 2 min to read • Edit Online

A good Windows NT 4.0 and previous miniport driver can easily become a Windows 2000 and later miniport
driver. The following are some of the updates necessary to provide Plug and Play support, which is required in
Windows 2000 and later miniport drivers:
See Plug and Play and Power Management in Video Miniport Drivers (Windows 2000 Model) for a list of
new functions that must be implemented. Be sure to initialize the new members of
VIDEO_HW_INITIALIZATION_DATA to point to these new functions.
Update the call to VideoPortInitialize in your DriverEntry function. The fourth parameter (HwContext)
must be NULL on Windows 2000 and later.
Update your HwVidFindAdapter function. For devices on an enumerable bus, HwVidFindAdapter must be
changed as follows:
Remove most of your device detection code. This is because a call to HwVidFindAdapter on Windows
2000 means that the PnP manager has already detected the device.
Call VideoPortGetAccessRanges to obtain the bus-relative physical addresses to which the device will
respond. These addresses are assigned by the PnP manager.
If the driver supports more than one device type, determine the type of device.
Ignore the Again parameter. This is because the system will call HwVidFindAdapter only once per device.
For a device on a nonenumerable bus such as ISA, PnP still attempts to start the device, although it is the
responsibility of HwVidFindAdapter to determine whether the device is actually present.
Update the .Mfg section of the driver's INF file to include the device and vendor ID. This is required so that
the PnP manager can associate the device with its INF file. Samples of the Windows NT 4.0 and updated
Windows 2000 and later .Mfg sections follow:

[ABC.Mfg] ; Windows NT V4.0 INF
%ABC% ABC Graphics Accelerator A = abc
%ABC% ABC Graphics Accelerator B = abc

[ABC.Mfg] ; Windows 2000 and later INF
%ABC% ABC Graphics Accelerator A = abc, PCI\VEN_ABCD&DEV_0123
%ABC% ABC Graphics Accelerator B = abc, PCI\VEN_ABCD&DEV_4567

You can use the geninf.exe tool that is included with the Driver Development Kit (DDK) to generate an INF. (The
DDK preceded the Windows Driver Kit [WDK].) Keep in mind, however, that geninf.exe does not create an INF for
Windows NT 4.0. You must modify the INF file produced by geninf.exe if you intend to support Windows NT 4.0.
See Creating Graphics INF Files for more details.
The Windows 2000 and later video port supports Windows NT 4.0 miniport drivers as legacy drivers. The graphics
adapter for a legacy miniport driver cannot be removed from the system while the system is running, nor are
legacy miniport drivers automatically detected when added to a running system.
Send comments about this topic to Microsoft
Using VideoPortGetProcAddress
4/26/2017 • 1 min to read • Edit Online

A video miniport driver developed on one NT-based operating system version can be loaded and run on an earlier
operating system version, as long as the miniport driver does not attempt to use functionality that is specific to the
newer operating system version.
When the video miniport driver is loaded, the VideoPortGetProcAddress member of the
VIDEO_PORT_CONFIG_INFO structure contains the address of a callback routine that the video port driver
exports, VideoPortGetProcAddress. A miniport driver can use this callback routine to find the address of a video
port function exported from videoprt.sys. After the miniport driver has the function's address, it can use this
address to call the function. This is shown in the following example code.

// Useful typedef for a function pointer type that points to a function
// with the same argument types as VideoPortCreateSecondaryDisplay
typedef VP_STATUS (*pFunc)(PVOID, PVOID *, ULONG);

// Declare a pointer to a function
pFunc pVPFunction;

// Declare a pointer to a VIDEO_PORT_CONFIG_INFO struct
PVIDEO_PORT_CONFIG_INFO pConfigInfo;

// Call through the VideoPortGetProcAddress callback
// to get the address of VideoPortCreateSecondaryDisplay
pVPFunction = (pFunc)
    (*(pConfigInfo->VideoPortGetProcAddress))(
        pDeviceExt,
        (PUCHAR)"VideoPortCreateSecondaryDisplay");

if (NULL == pVPFunction) {
    // Video port does not export the function
    ...
}
else {
    Status = pVPFunction(pDeviceExt,
                         &SecondDevExtension,
                         VIDEO_DUALVIEW_REMOVABLE);
}

After the call through the VideoPortGetProcAddress callback routine has executed, pVPFunction either is NULL or
contains the address of the VideoPortCreateSecondaryDisplay function. If pVPFunction is NULL, the video port
driver does not export the function you are trying to find, and the miniport driver must not attempt to use it. If
pVPFunction is not NULL, you can use this pointer to call VideoPortCreateSecondaryDisplay as shown in the
preceding example.
Send comments about this topic to Microsoft
Implementation Tips and Requirements for the
Windows 2000 Display Driver Model
4/26/2017 • 1 min to read • Edit Online

The following topics discuss tips and requirements for implementing display and video miniport drivers:
Exception Handling When Accessing User-Mode Memory
Version Numbers for Display Drivers
Handling Removable Child Devices
Send comments about this topic to Microsoft
Exception Handling When Accessing User-Mode
Memory
4/26/2017 • 2 min to read • Edit Online

A display or video miniport driver must use exception handling around code that accesses data structures allocated
in user mode. The Microsoft Direct3D runtime secures ownership of such data structures before passing them to
the driver. To secure ownership of user-mode memory, the runtime calls the MmSecureVirtualMemory function.
When the runtime secures ownership of user-mode memory, it prevents any other thread from modifying the type
of access to the memory. For example, if the runtime secures ownership of a data structure that a user-mode
thread has allocated with read and write access, other threads cannot restrict the data structure's access type to
read-only. Also, securing ownership of user-mode memory does not guarantee that the memory remains valid.
Therefore, unless exception handling is implemented around code that accesses such memory, the operating
system crashes if the driver attempts to access invalid user-mode memory. For invalid kernel-memory accesses,
the only available option for the operating system is to crash. However, for invalid user-memory accesses, the
driver can terminate the application that invalidated the memory and leave the operating system and the driver's
device in a stable state.
The driver must implement exception handling rather than rely on the runtime to handle exceptions. If the runtime
handled exceptions and the driver accessed invalid user-mode memory, the stack would return to the exception-
handling code in the runtime, leaving the driver or the device in an unknown state. The driver must implement
exception handling so that it can perform the following operations if an exception occurs:
Restore its state and the state of its device.
Release any spin locks that it acquired.
In the following scenarios, the runtime secures ownership of memory allocated in user mode before passing the
memory to the driver.
The driver processes vertex data that is specified by a pointer to user-mode memory. The driver receives this
memory pointer in a call to its D3dDrawPrimitives2 function. In this D3dDrawPrimitives2 call, the
D3DHALDP2_USERMEMVERTICES flag of the dwFlags member of the
D3DHAL_DRAWPRIMITIVES2DATA structure is set.
The driver updates the render state array to which the lpdwRStates member of
D3DHAL_DRAWPRIMITIVES2DATA points. The driver updates this array during a call to its
D3dDrawPrimitives2 function.
The driver updates its state at the lpdwStates member of the DD_GETDRIVERSTATEDATA structure
during a call to its D3dGetDriverState function.
The driver bit-block transfers or accesses a system texture that was allocated in user memory.
A display driver can use the try/except mechanism to implement exception handling. For more information about
try/except, see the Microsoft Visual C++ documentation.
The following code example shows how the driver can use the try/except mechanism to throw an exception if an
error occurs due to accessing invalid memory.
__try
{
// Access user-mode memory.
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
// Recover and leave driver and hardware in a stable state.
}

Note Aside from accessing and copying the user-mode value into a local variable, the driver should not perform
any other operations inside the __try block. Other operations can cause their own exceptions to occur. The
operating system handles these exceptions differently.
Send comments about this topic to Microsoft
Version Numbers for Display Drivers
4/26/2017 • 3 min to read • Edit Online

To ensure that the end user is able to use a display driver on a specific operating system and with a specific version
of DirectX, an appropriate version number must be applied to that driver. With DirectX, version numbers have
become very important for device drivers. If a device driver is shipped with the wrong version number or a version
number that uses the wrong format, the end user will encounter difficulties when any DirectX application is
installed.
Note The DriverVer directive provides a way to add version information for the driver package, including the
driver file and the INF file itself, to the INF file. By using and updating the DriverVer directive, driver packages can
be safely and definitively replaced by future versions of the same package. For more information about this
directive, see INF DriverVer Directive in the Device Installation section of the Windows Driver Kit (WDK)
documentation.
The following table gives the range of version numbers appropriate for IHV- or OEM-supplied drivers for
compatibility with various versions of DirectX.

TARGET SYSTEM                                            VERSION NUMBER FROM:    VERSION NUMBER UP THROUGH:

Windows 98-only drivers (DirectX 5)                      4.05.00.0000            4.05.00.9999
DirectX 1.0-compatible drivers                           4.02.00.0095            4.03.00.1096
DirectX 2.0-compatible drivers                           4.03.00.1096            4.03.00.2030
DirectX 3.0-compatible drivers                           4.03.00.2030            4.04.00.0000
DirectX 5.0-compatible drivers                           4.10.10.0000            4.10.10.9999
DirectX 6.0-compatible drivers                           4.11.10.0000            4.11.10.9999
Windows 98/Me DirectX 7.0-compatible drivers             4.12.10.0000            4.12.10.9999
Windows 2000 DirectX 7.0-compatible drivers              5.12.10.0000            5.12.10.9999
Windows XP and later DirectX 7.0-compatible drivers      6.12.10.0000            6.12.10.9999
Windows 98/Me DirectX 8.0-compatible drivers             4.13.10.0000            4.13.10.9999
Windows 2000 DirectX 8.0-compatible drivers              5.13.10.0000            5.13.10.9999
Windows XP and later DirectX 8.0-compatible drivers      6.13.10.0000            6.13.10.9999
Windows 98/Me DirectX 9.0-compatible drivers             4.14.10.0000            4.14.10.9999
Windows 2000 DirectX 9.0-compatible drivers              5.14.10.0000            5.14.10.9999
Windows XP and later DirectX 9.0-compatible drivers      6.14.10.0000            6.14.10.9999

Note The DirectX 9.0 DDK documentation indicated that the version number for a Windows XP and later DirectX-
compatible driver must be from 6.nn.01.0000 to 6.nn.01.9999. However, to support legacy WHQL manual test
specifications, the documentation also indicated that the version number could be from 6.nn.10.0000 to
6.nn.10.9999. Because of this legacy WHQL requirement, some DirectX applications required a display driver
version number of n.nn.10.nnnn. If a display driver's version number was switched from n.nn.10.nnnn to
n.nn.01.nnnn so that it more accurately conformed to the DirectX 9.0 DDK documentation requirement, such
applications might not run because they would interpret the driver as an earlier version.
Therefore, a display driver's version number should be set to n.nn.10.nnnn.
For device drivers that do not support DirectX, the version number must be greater than 4.00.00.0095 and less
than 4.02.00.0095. For example, if a display device driver is a Windows 3.1 display driver or a Windows 95-only
display driver, a version number of 4.01.00.0000 would be correct.
Conversely, a version number of 4.03.00.0000 for this driver would be incorrect. Device drivers that support
DirectX on Windows 95 only should have a version number equal to or greater than 4.02.00.0095 and less than
4.04.00.0000.
Storing Internal Version Numbers
In addition to the format that Microsoft requires for the version number, many vendors have expressed the desire
to store an internal version number for product support and testing purposes. Every DirectX driver has one version
number that is stored in duplicate: one binary version stored as two DWORDs, and one string version. The binary
version cannot be modified.
The string version, however, can be appended in the following way:
1. The vendor creates a version number, as described earlier in this article. This version number is used "as is"
in the binary version number.
2. The vendor uses this version number as the basis for the string version number. If desired, a vendor-specific
version string can be appended to the existing version number to form the complete string version number.
The vendor-specific string and the version number are separated by a "-" (hyphen character).
For example, if "4.03.00.2100" is the version number for a DirectX-compliant display driver, and the vendor uses
the "xx.xx.xx" number format internally, then the combined string version number in the driver is "4.03.00.2100-
xx.xx.xx".
When the customer checks the version number of the driver (by right-clicking on the file in Windows Explorer,
choosing Properties, and then clicking the Version tab), Windows displays the string version. The vendor's
product support should be able to identify the vendor-specific portion of the version number if it exists and take
appropriate action.
Send comments about this topic to Microsoft
Handling Removable Child Devices
4/26/2017 • 1 min to read • Edit Online

A video miniport driver should detect when a removable child device is changed with another like device so the
driver can prevent Plug and Play (PnP) from using the data of the original child device. For example, the video
miniport driver should detect when the user switches monitors.
If Extended Display Information Data (EDID) for the attached monitor changes between successive calls to the video
miniport driver's HwVidGetVideoChildDescriptor function, instead of tearing down the original monitor stack and
building a new stack for the new monitor, the video port driver modifies the state of the current stack. Although the
graphics subsystem can determine the new monitor's capabilities, because the original stack was not torn down,
other operating system components (such as, PnP) use the capability data of the original monitor.
A video miniport driver can detect a change to the attached monitor and perform one of the following operations
to prevent PnP from using the data of the original monitor:
1. The video miniport driver can report that no monitor is present in order to force the tear down of the former
monitor stack. Then, to force the video port driver to re-enumerate child devices in order to report the new
monitor, the video miniport driver calls the VideoPortEnumerateChildren function. The video miniport
driver should call VideoPortEnumerateChildren to schedule the re-enumeration of child devices only
after the first enumeration that reports that the monitor is disconnected completes.
2. On appropriate computer and monitor configurations (see the following exception), the video miniport
driver can respond to its HwVidGetVideoChildDescriptor function by returning the new monitor's
information in the buffer that the pChildDescriptor parameter of HwVidGetVideoChildDescriptor points to.
However, the video miniport driver must specify a 32-bit device ID for the new monitor in the variable that
the UId parameter points to. This value must be different from the value that the video miniport driver used
for the former monitor.
For an Advanced Configuration and Power Interface (ACPI) enumerated monitor, the first mechanism is generally
preferable because 32-bit device IDs are tied to the BIOS implementation. Therefore, specifying a different 32-bit
device ID might not be possible.
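A condensed sketch of the first approach follows; the EDID comparison and the device extension flags are hypothetical, driver-specific details:

// In the child enumeration path (HwVidGetVideoChildDescriptor):
if (MonitorEdidChanged(hwDeviceExtension)) {            // hypothetical comparison
    // Report the monitor as absent on this enumeration pass so the old
    // monitor stack is torn down.
    hwDeviceExtension->ReportMonitorAbsent = TRUE;       // hypothetical flag
    hwDeviceExtension->ReenumerationNeeded = TRUE;       // hypothetical flag
}

// Later, after the enumeration that reported the monitor as disconnected
// has completed, schedule a fresh enumeration so the new monitor is reported:
if (hwDeviceExtension->ReenumerationNeeded) {
    hwDeviceExtension->ReenumerationNeeded = FALSE;
    hwDeviceExtension->ReportMonitorAbsent = FALSE;
    VideoPortEnumerateChildren(hwDeviceExtension, NULL);
}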
Send comments about this topic to Microsoft
GDI
4/26/2017 • 1 min to read • Edit Online

The topics in this section describe Graphics Device Interface (GDI) and its relationship to printer drivers and display
adapter drivers.
Display adapter drivers that run on Windows Vista can adhere to one of two models: the Windows Vista display
driver model or the Windows 2000 display driver model. The topics in this section apply to printer drivers and to
drivers in the Windows 2000 display driver model, but they do not apply to drivers in the Windows Vista display
driver model.
Graphics System Overview
Using the Graphics DDI
GDI Support for Graphics Drivers
Send comments about this topic to Microsoft
Graphics System Overview
4/26/2017 • 1 min to read • Edit Online

Microsoft Windows NT-based operating systems provide a robust graphics architecture in which third-party
graphics hardware companies can easily integrate their video displays and printing devices. These sections provide
design guidelines for writing effective graphics drivers:
Graphics DDI
This section describes the Windows Graphics Device Interface (GDI) and Graphics device driver interface
(DDI). Design and implementation details that are common to both display and printing drivers are
discussed.
Windows Display Driver Model (WDDM) Design Guide
Windows 2000 Display Driver Model (XDDM) Design Guide
These sections describe the video display environment in Windows NT-based operating systems and
provide design and implementation details for display, video miniport, and monitor driver writers. Note that
drivers that comply with the Windows 2000 Display Driver Model cannot be installed on Windows 8 and
later computers.
Print Devices Design Guide
This section describes the drivers and print spooler that make up the printing environment in Windows
NT-based operating systems. Sections within this part of the Windows Driver Kit (WDK) specify how to
provide customized driver and spooler components, so that new printer hardware and network
configurations can be supported.
Send comments about this topic to Microsoft
Using the Graphics DDI
4/26/2017 • 1 min to read • Edit Online

In response to device-independent application calls routed through the Graphics Device Interface (GDI), a graphics
driver must ensure that its graphics device produces the required output. A graphics driver controls graphics
output by implementing as much of the graphics Device Driver Interface (DDI) as is necessary.
Graphics DDI function names are in the DrvXxx form. GDI calls these DrvXxx functions to pass data to the driver.
When an application makes a request of GDI, and GDI determines that the driver supports the relevant function,
GDI calls that function. It is the responsibility of the driver to provide the function and return to GDI upon the
function's completion.
This section describes the graphics DDI functions that writers of display and printer drivers must be aware of.
Graphics DDI function declarations, structure definitions, and constants can be found in winddi.h. For more
information about the graphics DDI functions, see GDI Functions Implemented by Printer and Display Drivers.
The topics contained in this section are as follows:
Graphics Driver Functions
Supporting Initialization and Termination Functions
Floating-Point Operations in Graphics Driver Functions
Creating Device-Dependent Bitmaps
Supporting Graphics Output
Supporting Graphics DDI Color and Pattern Functions
Supporting Graphics DDI Font and Text Functions
The DEVMODEW Structure
Send comments about this topic to Microsoft
Graphics Driver Functions
4/26/2017 • 1 min to read • Edit Online

The topics that follow describe the driver entry point functions, categorizing them as required, required under
certain circumstances, and optional.
Required Graphics Driver Functions
Conditionally Required Graphics Driver Functions
Optional Graphics Driver Functions
When a device driver returns an error, it should typically call the GDI EngSetLastError function to report an
extended error code. The application program can then retrieve the error code.
Send comments about this topic to Microsoft
Required Graphics Driver Functions
4/26/2017 • 1 min to read • Edit Online

All graphics drivers must support the entry points that GDI calls to enable and disable the driver, the PDEV
structure, and the surface associated with each PDEV. The following table lists the needed functions in the order in
which they are typically called.

ENTRY POINT          DESCRIPTION

DrvEnableDriver      As the initial driver entry point, this function provides GDI with the driver version
                     number and entry points of optional functions supported. This is also the only driver
                     function that GDI calls by name. All of the other driver functions are accessed through
                     a table of function pointers. Unlike DrvEnableDriver, the names of the other driver
                     functions are not fixed.

DrvGetModes          Lists the modes supported by a specified video hardware device. (This function is
                     required of display drivers only.)

DrvEnablePDEV        Enables a PDEV.

DrvCompletePDEV      Informs the driver upon completion of device installation.

DrvEnableSurface     Creates a surface for a specified hardware device.

DrvDisableSurface    Informs the driver that the surface created for the current device is no longer needed.

DrvDisablePDEV       When the hardware is no longer needed, frees memory and resources used by the device
                     and any surface created, but not yet deleted.

DrvDisableDriver     Frees all allocated resources for the driver and returns the device to its initial state.

DrvAssertMode        Resets the video mode for a specified hardware device. (This function is required of
                     display drivers only.)

DrvResetDevice       Resets the device when it has become inoperable or unresponsive.

Send comments about this topic to Microsoft


Conditionally Required Graphics Driver Functions
4/26/2017 • 1 min to read • Edit Online

Besides the functions that are always required, certain other functions may be required, depending on how a driver
is implemented. The conditionally-required functions are listed in the following table. If the driver manages its own
primary surface (using the EngCreateDeviceSurface function to get a handle to the surface), or its own offscreen
bitmaps, the driver must also support several drawing functions. Drivers writing to standard format DIBs usually
allow GDI to manage most or all of these operations. Displays that support settable palettes must also support the
DrvSetPalette function.
It is more common for a printer driver than a display driver to define or draw fonts. A display driver is not required
to handle fonts. If the hardware has a resident font, the driver must supply information to GDI about this font. This
information includes font metrics, mappings from Unicode to individual glyph identities, individual glyph
attributes, and kerning tables.

ENTRY POINT               WHEN REQUIRED                            DESCRIPTION

DrvCopyBits               Device-managed surfaces                  Translates between device-managed raster surfaces
                                                                   and GDI standard-format bitmaps.

DrvDescribePixelFormat    Displays that support windows with       Describes a PDEV's pixel format.
                          different pixel formats on a single
                          surface

DrvGetTrueTypeFile        TrueType font drivers                    Gives GDI access to a memory-mapped TrueType
                                                                   font file.

DrvLoadFontFile           Font drivers                             Specifies file to use for font realizations.

DrvQueryFont              Printer drivers                          Retrieves a GDI structure for a given font.

DrvQueryFontCaps          Font drivers                             Asks driver for font driver capabilities.

DrvQueryFontData          Printer drivers                          Retrieves information about a realized font.

DrvQueryFontFile          Font drivers                             Asks driver for font file information.

DrvQueryFontTree          Printer drivers                          Queries a tree structure defining one of three
                                                                   types of font mapping.

DrvQueryTrueTypeOutline   TrueType font drivers                    Returns TrueType glyph handles to GDI.

DrvQueryTrueTypeTable     TrueType font drivers                    Gives GDI access to TrueType font files.

DrvResetPDEV              Devices that allow mode changes in       Transfers driver state from old PDEV to new PDEV.
                          documents

DrvSetPalette             Displays that support settable palettes  Realizes the palette for a specified device.

DrvSetPixelFormat         Displays that support windows with       Sets a window's pixel format.
                          different pixel formats on a single
                          surface

DrvStrokePath             Device-managed surfaces                  Renders a path on the display.

DrvSwapBuffers            Drivers that support a pixel format      Displays contents of a surface's hidden buffer.
                          with double buffering

DrvTextOut                Device-managed surfaces or drivers       Renders a set of character images (glyphs) at
                          that define fonts                        specified positions.

DrvUnloadFontFile         Font drivers                             Informs driver that a font file is not needed.

Send comments about this topic to Microsoft


Optional Graphics Driver Functions
4/26/2017 • 2 min to read • Edit Online

In the interests of reducing driver size, driver writers usually add only those optional functions that are well-
supported in hardware. For example, a driver for hardware that supports Image Color Management (ICM) can
implement the DrvIcmXxx functions. The following tables list the functions that a graphics driver can optionally
implement.
Display and Printer Driver Functions

ENTRY POINT                    DESCRIPTION

DrvAlphaBlend                  Provides bit block transfer capabilities with alpha blending.

DrvBitBlt                      Executes general bit block transfers to and from surfaces.

DrvCreateDeviceBitmap          Creates and manages a bitmap with a driver-defined format.

DrvDeleteDeviceBitmap          Deletes a device-managed bitmap.

DrvDitherColor                 Requests a device to create a brush dithered against a device palette.

DrvFillPath                    Paints a closed path for a device-managed surface.

DrvGradientFill                Shades the specified primitives.

DrvIcmCheckBitmapBits          Checks whether the pixels in the specified bitmap lie within the device gamut of
                               the specified transform.

DrvIcmCreateColorTransform     Creates an ICM color transform.

DrvIcmDeleteColorTransform     Deletes the specified ICM color transform.

DrvIcmSetDeviceGammaRamp       Sets the hardware gamma ramp of the specified display device.

DrvLineTo                      Draws a single solid integer-only cosmetic line.

DrvPlgBlt                      Provides rotate bit block transfer capabilities between combinations of
                               device-managed and GDI-managed surfaces.

DrvRealizeBrush                Realizes a specified brush for a defined surface.

DrvStretchBlt                  Allows stretching block transfers among device-managed and GDI-managed surfaces.

DrvStretchBltROP               Performs a stretching bit block transfer using a ROP.

DrvStrokeAndFillPath           Simultaneously fills and strokes a path.

DrvSynchronize                 Coordinates drawing operations between GDI and a display driver-supported
                               coprocessor device; for engine-managed surfaces only.

DrvSynchronizeSurface          Coordinates drawing operations between GDI and a display driver-supported
                               coprocessor device; for engine-managed surfaces only. If a driver provides both
                               DrvSynchronize and DrvSynchronizeSurface, GDI will call only DrvSynchronizeSurface.

DrvTransparentBlt              Provides bit block transfer capabilities with transparency.

Functions Used Exclusively by Display Drivers

ENTRY POINT                    DESCRIPTION

DrvMovePointer                 Moves a pointer to a new position, and redraws it.

DrvSaveScreenBits              Saves or restores a specified rectangle of the screen (display driver only).

DrvSetPointerShape             Removes the pointer from the screen, if the driver has drawn it, and then sets a
                               new pointer shape.

Functions Used Primarily by Printer Drivers

ENTRY POINT                    DESCRIPTION

DrvDestroyFont                 Notifies driver that a font realization is no longer needed; driver can free
                               allocated data structures.

DrvDrawEscape                  Implements draw-type escape functions.

DrvEscape                      Queries information from a device not available in a device-independent DDI.

DrvFree                        Frees font storage associated with an indicated data structure.

Functions Used Exclusively by Printer Drivers

ENTRY POINT                    DESCRIPTION

DrvEndDoc                      Sends end-of-document information.

DrvFontManagement              Allows access to printer functionality not directly available through GDI.

DrvGetGlyphMode                Returns type of font information to be stored for a particular font.

DrvNextBand                    Realizes the contents of a surface's just-drawn band.

DrvQueryPerBandInfo            Returns banding information for the specified banded printer surface.

DrvSendPage                    Sends raw bits from a surface to the printer.

DrvStartBanding                Prepares the driver for banding.

DrvStartDoc                    Sends start-of-document control information.

DrvStartPage                   Sends start-of-page control information.

Font Driver Function

ENTRY POINT                    DESCRIPTION

DrvQueryAdvanceWidths          Supplies character advance widths for a specified set of glyphs.

Send comments about this topic to Microsoft


Supporting Initialization and Termination Functions
4/26/2017 • 1 min to read • Edit Online

A graphics driver can support multiple devices and multiple concurrent use of each device. Therefore, initialization
and termination occur in three distinct layers, with each layer having its own timing. Initialization occurs in the
following order:
1. Driver initialization
2. PDEV initialization
3. Surface initialization
Termination occurs in the reverse order.
Send comments about this topic to Microsoft
Driver Initialization and Cleanup
4/26/2017 • 1 min to read • Edit Online

While the device driver may implement several or many functions, it exports only DrvEnableDriver to GDI. The
driver exposes its other supported functions through a function table. The first call GDI makes to a device driver is
to the DrvEnableDriver function. Within this function, the driver fills in the passed-in DRVENABLEDATA structure
so that GDI can determine which other DrvXxx functions are supported and where they are located. The driver
supplies the following information in DRVENABLEDATA:
The iDriverVersion member contains the graphics DDI version number for a particular Windows operating
system version. The winddi.h header defines the following constants:

CONSTANT OPERATING SYSTEM VERSION

DDI_DRIVER_VERSION_NT4 Windows NT 4.0

DDI_DRIVER_VERSION_NT5 Windows 2000

DDI_DRIVER_VERSION_NT5_01 Windows XP

For more information about how these constants are used, see DRVENABLEDATA.
The c member contains the number of DRVFN structures in the array.
The pdrvfn member points to an array of DRVFN structures that lists the supported functions and their
indexes.
For GDI to call a function other than the driver's enable and disable functions, the driver must make the function's
name and location available to GDI.
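Put together, a minimal DrvEnableDriver might look like the following sketch. It assumes the driver implements the standard entry points declared in winddi.h and hooks only a handful of them; a real driver lists every DrvXxx function it supports.

// Table of the DrvXxx functions this driver hooks. The names refer to the
// driver's own implementations of the standard entry points from winddi.h.
static DRVFN gDrvFn[] =
{
    { INDEX_DrvEnablePDEV,     (PFN) DrvEnablePDEV     },
    { INDEX_DrvCompletePDEV,   (PFN) DrvCompletePDEV   },
    { INDEX_DrvDisablePDEV,    (PFN) DrvDisablePDEV    },
    { INDEX_DrvEnableSurface,  (PFN) DrvEnableSurface  },
    { INDEX_DrvDisableSurface, (PFN) DrvDisableSurface },
    { INDEX_DrvDisableDriver,  (PFN) DrvDisableDriver  },
};

BOOL DrvEnableDriver(ULONG iEngineVersion, ULONG cj, DRVENABLEDATA *pded)
{
    // Refuse to load on an engine older than the DDI version the driver
    // targets, or if the DRVENABLEDATA buffer is too small to fill in.
    if (iEngineVersion < DDI_DRIVER_VERSION_NT5 || cj < sizeof(DRVENABLEDATA))
        return FALSE;

    pded->iDriverVersion = DDI_DRIVER_VERSION_NT5;          // graphics DDI version
    pded->c              = sizeof(gDrvFn) / sizeof(DRVFN);  // number of DRVFN entries
    pded->pdrvfn         = gDrvFn;                          // supported DrvXxx functions
    return TRUE;
}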
While DrvEnableDriver can also perform one-time initializations, such as the allocation of semaphores, a driver
should not actually enable the hardware during DrvEnableDriver. Hardware initialization should occur in a
driver's DrvEnablePDEV function. Likewise, a driver should enable the surface in the DrvEnableSurface function.
GDI calls the DrvDisableDriver function to notify the driver that it is about to be unloaded. In response to this call,
the driver should free all resources and memory still allocated by the driver at this point.
If the hardware needs to be reset, GDI calls the driver's DrvAssertMode function.

PDEV Initialization and Cleanup

Each kernel-mode graphics driver represents a single logical device managed by GDI. In turn, a driver can manage
one or more PDEV structures. A PDEV is a logical representation of the physical device. It is characterized by the
type of hardware, logical address, and surfaces that can be supported:
Type of Hardware − As an example of a driver supporting a PDEV characterized by the type of hardware,
one driver could support the LaserWhiz, LaserWhiz II, and LaserWhiz Super printers. The device name
passed by GDI specifies which logical device is requested from the total set of driver-supported devices.
Logical Address − A single driver can support printers attached to LPT1, COM2, and a server named
\\SERVER1\PSLASER, for example. In addition, a display driver that can support more than one VGA display
simultaneously might differentiate between them by port numbers, such as 0x3CE, 0x2CE, and so on. The
logical address for printers and other hard copy output devices is determined by GDI; the EngWritePrinter
function directs the output to the proper destination. Displays can either determine their own logical address
implicitly or retrieve the address from the private section of DEVMODEW.
The DEVMODEW structure provides the driver with required environment settings, such as the name of the
device and other information specific to either printer or display drivers.
Surfaces − Each PDEV requires a unique surface. For example, if a printer driver is to work on two print
jobs simultaneously, each requiring a different page format such as the landscape and portrait formats, each
print job requires a different PDEV. Similarly, a display driver might support two desktops on the same
display, each desktop requiring a different PDEV and surface. For each surface required, there is a call to the
DrvEnablePDEV function to create a different PDEV for that surface.
In response to a call to DrvEnablePDEV, the driver returns information about the capabilities of the hardware device
to GDI through several structures.
The GDIINFO structure is zero-filled before GDI calls DrvEnablePDEV. The driver fills in GDIINFO to communicate
the following information to GDI:
Driver version number
Basic device technology (raster versus vector)
Size and resolution of printable page
Color palette and gray scale information
Font and text capabilities
Halftoning support
Style step numbers
The driver should fill only the fields it supports and ignore the rest.
The driver fills in the DEVINFO structure with flags that describe the graphics capabilities of this PDEV. In nearly all
cases, the information from DEVINFO tells GDI the level of graphics support the driver can provide. For example, if
a drawing of a treble clef is needed, information within DEVINFO tells GDI whether the driver can handle Bezier
curves or whether GDI must send multiple line segments instead. The driver should fill in as many fields as it
supports and leave the others untouched.
Another important piece of information the driver must provide is a pointer (phsurfPatterns) to a buffer filled with
handles for surfaces representing the standard fill patterns. Besides the standard fill patterns, phsurfPatterns can
contain a null, which causes GDI to create the pattern surface automatically according to the device resolution and
the pixel size. When GDI is called on to realize a brush with a standard pattern, it calls the DrvRealizeBrush
function to realize the brush defined for the requested pattern.
GDI passes DrvEnablePDEV a handle, hDriver, for the kernel driver that supports the device. For a printer driver,
hDriver provides the handle to the printer and is used in calls, such as EngWritePrinter, to the spooler.
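A skeletal DrvEnablePDEV is sketched below. The MY_PDEV structure and every specific value shown (raster display technology, the 1024 x 768 resolution, the GCAPS flags, the pool tag) are illustrative assumptions, not requirements; a real driver reports whatever its hardware actually supports.

// Hypothetical per-PDEV state; its contents are entirely driver defined.
typedef struct _MY_PDEV {
    HANDLE hDriver;               // kernel driver handle passed to DrvEnablePDEV
    HDEV   hdevEng;               // GDI's handle, saved later in DrvCompletePDEV
    HSURF  hsurf;                 // surface created later in DrvEnableSurface
    ULONG  cxScreen, cyScreen;    // surface dimensions
} MY_PDEV;

DHPDEV DrvEnablePDEV(DEVMODEW *pdm, LPWSTR pwszLogAddress, ULONG cPat,
                     HSURF *phsurfPatterns, ULONG cjCaps, ULONG *pdevcaps,
                     ULONG cjDevInfo, DEVINFO *pdi, HDEV hdev,
                     LPWSTR pwszDeviceName, HANDLE hDriver)
{
    GDIINFO *pGdiInfo = (GDIINFO *)pdevcaps;
    MY_PDEV *ppdev;
    ULONG    i;

    if (cjCaps < sizeof(GDIINFO) || cjDevInfo < sizeof(DEVINFO))
        return NULL;

    // Fill in only the GDIINFO fields the device supports; the structure
    // arrives zero-filled.
    pGdiInfo->ulTechnology = DT_RASDISPLAY;   // raster display (assumption)
    pGdiInfo->ulHorzRes    = 1024;            // illustrative resolution
    pGdiInfo->ulVertRes    = 768;
    pGdiInfo->cBitsPixel   = 32;
    pGdiInfo->cPlanes      = 1;

    // Describe the graphics capabilities of this PDEV (illustrative flags).
    pdi->flGraphicsCaps = GCAPS_BEZIERS | GCAPS_OPAQUERECT;
    // pdi->hpalDefault must also be created here; see Managing Palettes.

    // Leave the standard pattern handles NULL so that GDI builds the pattern
    // surfaces from the device resolution and pixel size.
    for (i = 0; i < cPat; i++)
        phsurfPatterns[i] = NULL;

    // Allocate the driver's private PDEV and remember the kernel driver handle.
    ppdev = (MY_PDEV *)EngAllocMem(FL_ZERO_MEMORY, sizeof(MY_PDEV), 'vdpD');
    if (ppdev == NULL)
        return NULL;
    ppdev->hDriver  = hDriver;
    ppdev->cxScreen = pGdiInfo->ulHorzRes;
    ppdev->cyScreen = pGdiInfo->ulVertRes;
    return (DHPDEV)ppdev;
}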
Whenever GDI calls DrvEnablePDEV, the driver must allocate the memory required to support the PDEV that is
created, even if DrvEnablePDEV is called to create other PDEV structures for different modes. (A driver can have
several active PDEVs, although only one can be enabled at a time.) However, an actual surface is not supported until
GDI calls DrvEnableSurface.
If a device surface requires the allocation of a bitmap, the allocation is not necessary until the surface is enabled
(usually within the DrvEnableSurface function). Although applications often request device information before
actually writing to the device, waiting to allocate a large bitmap can save valuable resources and improve driver
performance during system initialization.
When the installation of the PDEV is complete, GDI calls the DrvCompletePDEV function to inform the driver that
installation of the physical device is complete. This function also provides the driver with GDI's logical handle to the
PDEV, which the driver uses in calls to GDI functions.
A call to the driver's DrvDisablePDEV function indicates that the given physical device is no longer needed. In this
function, the driver should free any memory and resources used by the physical device.
Refer also to Enabling and Disabling the Surface.

Enabling and Disabling the Surface

As the final initialization stage, GDI calls DrvEnableSurface to have the driver enable a surface for an existing
PDEV. DrvEnableSurface must specify the type of surface by calling the appropriate GDI service to create it. As
described in GDI Support for Surfaces, and depending on the device and circumstances, the driver can call the
appropriate GDI services from within DrvEnableSurface to create the surfaces:
For a device-managed surface, the driver should call the EngCreateDeviceSurface function to get a handle
for the surface.
To create a standard-format (DIB) bitmap that GDI can manage completely, including the performance of all
drawing operations, the driver should call the EngCreateBitmap function. The driver can hook out any
drawing operations it can optimize. The driver can either have GDI allocate the space for the pixels or can
provide the space itself, although the latter option is usually used only by printer and frame buffer drivers.
DrvEnableSurface returns a valid surface handle as a return value.
Following the creation of the surface, the driver must associate that surface with a PDEV by calling the GDI service
EngAssociateSurface. This call also tells GDI which drawing functions a driver has hooked for that surface.
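For a device-managed surface, that sequence looks roughly like the following sketch. It reuses the hypothetical MY_PDEV structure from the earlier DrvEnablePDEV sketch; the surface format and the hook flags shown are illustrative assumptions.

HSURF DrvEnableSurface(DHPDEV dhpdev)
{
    MY_PDEV *ppdev = (MY_PDEV *)dhpdev;      // driver's private PDEV (hypothetical)
    SIZEL    sizl;
    HSURF    hsurf;

    sizl.cx = ppdev->cxScreen;               // surface dimensions kept in the PDEV
    sizl.cy = ppdev->cyScreen;

    // Create a device-managed surface; GDI returns an opaque handle for it.
    hsurf = EngCreateDeviceSurface((DHSURF)ppdev, sizl, BMF_32BPP);
    if (hsurf == NULL)
        return NULL;

    // Associate the surface with this PDEV and tell GDI which drawing
    // functions the driver hooks for it (illustrative hook flags).
    if (!EngAssociateSurface(hsurf, ppdev->hdevEng,
                             HOOK_BITBLT | HOOK_COPYBITS | HOOK_TEXTOUT | HOOK_STROKEPATH))
    {
        EngDeleteSurface(hsurf);
        return NULL;
    }

    ppdev->hsurf = hsurf;                    // remember it for DrvDisableSurface
    return hsurf;
}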
GDI calls the DrvDisableSurface function to inform the driver that the current surface created for the PDEV by
DrvEnableSurface is no longer required. The driver must deallocate any memory and resources allocated during
the execution of DrvEnableSurface. DrvDisableSurface is always called before DrvDisablePDEV, if the PDEV has
an enabled surface.
Once created, a surface must be deleted when it is no longer in use. Failure to properly match surface creation with
deletion can cause stray objects to accumulate and degrade system performance.

Floating-Point Operations in Graphics Driver Functions

If a graphics driver function contains code that uses the floating-point unit (FPU), that code must be preceded by a
call to EngSaveFloatingPointState and followed by a call to EngRestoreFloatingPointState. For a list of
graphics driver functions, see Graphics Driver Functions.
If an FPU is available, it will be used by any code that assigns a value to a floating-point variable or performs
calculations that involve floating-point numbers. For example, each of the following lines of code uses the FPU.

double myDouble = 5;
int myInt = 5 * 4.3;
int myInt = 50 * cos(2);

Suppose you are writing a DrvAlphaBlend function that uses the FPU. The following example demonstrates how
you should save and restore the floating-point state.
#define DRIVER_TAG 'zyxD'   // Put your driver's pool tag here; 'zyxD' is only an example

BOOL DrvAlphaBlend(...)
{
    ...
    ULONG result;
    double floatVal;
    VOID* pBuf;
    ULONG bufSize;

    // Determine the size of the required buffer.
    bufSize = EngSaveFloatingPointState(NULL, 0);

    if (bufSize > 0)
    {
        // Allocate a zeroed buffer in the nonpaged pool.
        pBuf = EngAllocMem(FL_NONPAGED_MEMORY | FL_ZERO_MEMORY, bufSize, DRIVER_TAG);

        if (pBuf != NULL)
        {
            // The buffer was allocated successfully.
            // Save the floating-point state.
            result = EngSaveFloatingPointState(pBuf, bufSize);

            if (TRUE == result)
            {
                // The floating-point state was saved successfully.
                // Use the FPU.
                floatVal = 0.8;
                ...
                EngRestoreFloatingPointState(pBuf);
            }

            EngFreeMem(pBuf);
        }
    }
    ...
}

GDI automatically saves the floating-point state for any calls to a driver's DrvEscape function when the escape is
OPENGL_CMD, OPENGL_GETINFO, or MCDFUNCS. In those cases, you can use the FPU in your DrvEscape function
without calling EngSaveFloatingPointState and EngRestoreFloatingPointState.
Most DirectDraw and Direct3D callback functions that perform floating-point operations should also save and
restore the floating-point state. For more information, see Performing Floating-point Operations in DirectDraw
and Performing Floating-point Operations in Direct3D.
For information about floating-point services provided by GDI, see GDI Floating-Point Services.

Creating Device-Dependent Bitmaps

When an application requests the creation of a bitmap, a driver can create and manage a DDB by supporting the
DrvCreateDeviceBitmap function. When such a driver creates the bitmap, it can store the bitmap in any format.
The driver examines the passed parameters and provides a bitmap with at least as many bits-per-pixel as
requested.
Note Graphics drivers can improve performance by supporting bitmaps in off-screen memory and by drawing
bitmaps using hardware. For an example of this, see the Permedia display driver sample.
Note The Microsoft Windows Driver Kit (WDK) does not contain the 3Dlabs Permedia2 (3dlabs.htm) and 3Dlabs
Permedia3 (Perm3.htm) sample display drivers. You can get these sample drivers from the Windows Server 2003
SP1 Driver Development Kit (DDK), which you can download from the DDK - Windows Driver Development Kit
page of the WDHC website.
Within DrvCreateDeviceBitmap, the driver calls the GDI service EngCreateDeviceBitmap to have GDI create a
handle for the device bitmap.
If the driver supports DrvCreateDeviceBitmap, it creates a DDB, defines its format, and returns a handle to it. The
driver controls where the bitmap is stored, and in what format. The driver should support the color format that
matches its device surface most closely.
The contents of the bitmap are undefined after creation. If the driver returns NULL, it does not create and manage
the bitmap; instead, GDI performs these tasks.
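A rough sketch of this decision follows. The MY_DDB structure, the pool tag, and the "only accelerate 32-bpp bitmaps" policy are hypothetical; a real driver applies whatever criteria match its hardware, and returns NULL whenever it prefers GDI to manage the bitmap.

typedef struct _MY_DDB {      // hypothetical bookkeeping for an off-screen bitmap
    SIZEL sizl;
    ULONG ulVramOffset;
} MY_DDB;

HBITMAP DrvCreateDeviceBitmap(DHPDEV dhpdev, SIZEL sizl, ULONG iFormat)
{
    MY_PDEV *ppdev = (MY_PDEV *)dhpdev;
    MY_DDB  *pddb;
    HBITMAP  hbm;

    // Manage only bitmaps the hardware can accelerate; returning NULL tells
    // GDI to create and manage the bitmap itself.
    if (iFormat != BMF_32BPP)
        return NULL;

    pddb = (MY_DDB *)EngAllocMem(FL_ZERO_MEMORY, sizeof(MY_DDB), 'bddD');
    if (pddb == NULL)
        return NULL;
    pddb->sizl = sizl;
    // ... reserve off-screen video memory for the bitmap here ...

    // Ask GDI for a handle representing this device-managed bitmap, then
    // associate it with the PDEV and the drawing functions hooked for it.
    hbm = EngCreateDeviceBitmap((DHSURF)pddb, sizl, iFormat);
    if (hbm == NULL ||
        !EngAssociateSurface((HSURF)hbm, ppdev->hdevEng,
                             HOOK_BITBLT | HOOK_COPYBITS | HOOK_TEXTOUT))
    {
        if (hbm != NULL)
            EngDeleteSurface((HSURF)hbm);
        EngFreeMem(pddb);
        return NULL;
    }
    return hbm;
}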
If the driver creates bitmaps, it must also be able to delete them by implementing the DrvDeleteDeviceBitmap
function.

Supporting Graphics Output

The particular graphics operations that a driver handles depend upon the drawing surface and the capabilities of
the hardware. If the surface is a standard-format DIB, GDI will handle all rendering operations not supported by the
driver. The driver can hook out any of the drawing functions and implement them to take advantage of hardware
support.
For a device-managed surface, a driver must, at a minimum, support the graphics output functions DrvCopyBits,
DrvTextOut, and DrvStrokePath. It can optionally support any of the other graphics output functions. Supporting
DrvBitBlt, for example, can enhance performance. Some functions require a certain level of capability while others
allow the device to indicate its capability by setting the appropriate GCAPS flags in the DEVINFO structure.
All drawing calls to the driver are always single threaded, regardless of the surface type.
The topics that follow describe how a driver can implement the following operations:
Drawing Lines and Curves
Drawing and Filling Paths
Copying Bitmaps
Halftoning
Image Color Management

Drawing Lines and Curves

The types of lines and curves included in graphic output are geometric lines, cosmetic lines, and Bezier curves.
For line and curve output, a driver can support the DrvStrokePath, DrvFillPath, and DrvStrokeAndFillPath
functions. The driver must support DrvStrokePath for drawing lines if the surface is device-managed; drivers are
not required to support curves.
When GDI draws a line or curve with any set of attributes, GDI can call DrvStrokePath. At a minimum, the
DrvStrokePath function must support the drawing of solid and styled cosmetic lines with a solid color brush and
arbitrary clipping. The GDI PATHOBJ_Xxx and CLIPOBJ_Xxx service functions make this possible by breaking down
the lines into a set of lines one pixel wide with precomputed clipping. DrvStrokePath provides a pointer,
plineattrs, to the LINEATTRS structure that defines the various line attributes.
When the path or clipping is too complex for the driver to process on the device, the driver can punt the callback to
GDI by calling the EngStrokePath function. In this case, GDI can break the DrvStrokePath call into a set of lines
one pixel wide with precomputed clipping.
By calling the CLIPOBJ_Xxx services from GDI, a driver can have GDI enumerate all the lines in the path and perform
all of the line clipping calculations. In addition, a driver can use the PATHOBJ_Xxx, CLIPOBJ_Xxx, or XFORMOBJ_Xxx
services to simplify the graphics operations. For example, a driver can use CLIPOBJ_cEnumStart and
CLIPOBJ_bEnum to enumerate the rectangles in a clip region, send this region down to the printer, and clip to it.
The driver can also use PATHOBJ_vEnumStart and PATHOBJ_bEnum to enumerate lines or curves in the path. It
can then send the path to the device, and stroke it.
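The rectangle enumeration described above typically follows the pattern sketched below. Only the enumeration loop is shown; the helper name and the batch size of eight rectangles are arbitrary, and what the driver does with each rectangle is device specific.

typedef struct _CLIPENUM {
    ULONG c;                // number of rectangles returned in this batch
    RECTL arcl[8];          // batch of clip rectangles filled in by GDI
} CLIPENUM;

VOID EnumerateClipRects(CLIPOBJ *pco)
{
    CLIPENUM ce;
    BOOL     bMore;
    ULONG    i;

    // Ask GDI to hand back the clip region as batches of rectangles.
    CLIPOBJ_cEnumStart(pco, FALSE, CT_RECTANGLES, CD_ANY, 0);
    do
    {
        bMore = CLIPOBJ_bEnum(pco, sizeof(ce), (ULONG *)&ce);
        for (i = 0; i < ce.c; i++)
        {
            // ce.arcl[i] is one clip rectangle; clip the output to it here.
        }
    } while (bMore);
}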

Cosmetic Lines

A cosmetic line is always one pixel wide and is drawn using a solid color brush. It is rendered according to the Grid
Intersection Quantization (GIQ) diamond convention, which determines which pixels should be turned on to
render the cosmetic line.
The following figure shows a line superimposed on a rectangular grid, in which the pixels are located at the grid
intersection points. To determine which pixels should be illuminated, imagine a diamond, centered on the line, and
sliding along it. The diamond's width and height are exactly equal to the distance between adjacent pixel centers.
As the diamond moves along the line, any pixel whose center is completely covered by the diamond is turned on. If
a line passes through a point halfway between two adjacent pixels, the pixel to be turned on depends on the slope
of the line and how the adjacent pixels are oriented: horizontally (side by side), or vertically (one above the other).
The following table summarizes these cases.

SLOPE OF LINE (ABSOLUTE VALUE)   ADJACENT PIXELS ARE ORIENTED   RESULT

Slope < 1 or Slope > 1           Horizontally                   Light the pixel at diamond's left vertex.
Slope < 1 or Slope > 1           Vertically                     Light the pixel at diamond's top vertex.
Slope = 1                        Horizontally                   Light the pixel at diamond's top vertex.
Slope = 1                        Vertically                     Light the pixel at diamond's right vertex.

The diamond convention lights one pixel in each column for lines with slopes between -1 and 1, and one pixel in
each row for lines with slope greater than 1 in absolute value. This way, a cosmetic line is rendered with no gaps.
Start and end pixels of a cosmetic line are also determined by the diamond convention. A cosmetic line is first-
pixel-inclusive and last-pixel-exclusive; that is, if the line starts inside the diamond for a pixel, that pixel is
illuminated. Conversely, if the line ends inside the diamond for a pixel, that pixel is not illuminated.
The following graph illustrates the diamond convention for a cosmetic line.
For rendering cosmetic lines, the DrvStrokePath function follows the GIQ diamond convention. The DrvLineTo
function is an optional entry point that a driver can supply as an optimization for application calls to the Microsoft
Win32 LineTo function. DrvLineTo is simpler than DrvStrokePath because it supports only integer end-points
and solid cosmetic lines.
For raster devices that support the R2_NOT mix mode, a binary raster operation that changes the destination color
to its inverse, the driver must use exact rendering. Rendering should also be exact for devices that require
rendering by both GDI and the driver. This includes devices for which GDI draws on some bitmaps and the driver
draws on other surfaces (unless the pixels are too small to make any visible difference). This also includes devices
that request GDI to handle complex clipping.

Geometric Wide Lines

The shape of a geometric line is determined by the width, join style, and end-cap style of the brush, and the current
world-to-device transform in the XFORMOBJ structure. The line can be drawn using either a solid or a nonsolid
brush.
Drivers for more advanced hardware may support geometric wide lines in the DrvStrokePath function. GDI
determines whether a driver can draw a path containing a geometric line by testing the GCAPS_GEOMETRICWIDE
capability flag in the DEVINFO structure returned in the call to DrvEnablePDEV. If the driver does not have the
capability, or if the function fails to handle an operation because the path or clipping is too complex for the device,
GDI automatically transforms the call to the simpler DrvFillPath function.
A geometric wide line has a specific meaning to a display driver graphics function. A path containing device
coordinates is transformed to world coordinates using the inverse of the current transform. A geometric
construction with the specified width then obtains a widened version of the path, taking into account joins and end
caps. This path is transformed to device coordinates again and filled with the specified brush.
Styling of a geometric wide line is specified by an array of floating-point values. The array has a finite length, but is
used as though it repeats indefinitely. The first array entry specifies the length, in world coordinates, of the first
dash; the next entry specifies the length of the first gap. After this, lengths of dashes and gaps alternate. For
example, the style array {3.0,1.0,1.0,1.0} causes a line to be drawn with alternating long and short dashes.
Styling can be thought of as the driver moving along a path before widening, "erasing" the parts of the path
corresponding to the gaps. This breaks the path into many subpaths. The broken path is then widened as if it had
no line style, applying end caps and joins as usual. Style arrays can be of odd length. For example, the style array
{1.0} causes the driver to draw a line with alternating dashes and gaps, each one unit long. The style state (defined as the current distance into
the styling array) is provided for the beginning of the first subpath in a path. It is considered to be reset to 0.0 at the
beginning of each subsequent subpath, which occurs after any Win32 MoveToEx operation.

Styled Cosmetic Lines

The DrvStrokePath function must support the drawing of cosmetic lines with arbitrary clipping using a solid-color
brush. The driver can make a call to the GDI service PATHOBJ_vEnumStartClipLines to precompute the clipping.
Styling of a cosmetic line is similar to that of a geometric wide line because it is specified by a repeating array. For a
styled cosmetic line, the array entries are LONG values that contain the lengths in style steps. The relation between
style steps and pixels is defined by the xStyleStep, yStyleStep, and denStyleStep fields in the GDIINFO structure
returned by the DrvEnablePDEV function.
When the driver calls PATHOBJ_bEnumClipLines to handle styled cosmetic lines through complex clipping, GDI
modifies the value of the CLIPLINE structure's lStyleState member to represent the style state. The style state is
the offset back to the first pixel of the line segment; that is, the first pixel that would be rendered if the line were not
clipped. The style state consists of two 16-bit values packed into a ULONG value. If HIGH and LOW are the high-
order and the low-order 16 bits of the style state, a fractional version of the style state, referred to as style position,
can be computed as:

style position = HIGH + LOW/denStyleStep

For example, if the HIGH and LOW values in lStyleState are 1 and 2, and denStyleStep is 3, then the style position is 5/3. To
determine exactly where the drawing of the style begins in the style array, take the product:

style position * denStyleStep

In this example, with a denStyleStep value of 3, the drawing position is calculated to exclude the first five (5/3 * 3)
pixels of the style array. That is, drawing begins at the sixth pixel in the style array of this clipped line.
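In code, unpacking the packed style state and applying the product above might look like the following sketch; the helper name is hypothetical, pcl is the CLIPLINE returned by PATHOBJ_bEnumClipLines, and denStyleStep is the value the driver reported in GDIINFO.

ULONG StylePixelsToSkip(CLIPLINE *pcl, ULONG denStyleStep)
{
    ULONG ulState = (ULONG)pcl->lStyleState;   // HIGH in the upper 16 bits, LOW in the lower 16
    ULONG ulHigh  = ulState >> 16;
    ULONG ulLow   = ulState & 0xFFFF;

    // style position = HIGH + LOW / denStyleStep, so the number of style pixels
    // consumed before the clipped segment is (style position * denStyleStep):
    return ulHigh * denStyleStep + ulLow;      // 1 * 3 + 2 = 5 in the example above
}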
There are y-styled cosmetic lines and x-styled cosmetic lines. If a line extends dx device units in the x direction and
dy units in the y direction, the line is y-styled when the following is true:

(dy * yStyleStep) >= (dx * xStyleStep)

In this case, the style position is advanced by yStyleStep/denStyleStep for each pixel advanced in the y direction.
Conversely, a line is x-styled and the style position is advanced by xStyleStep/denStyleStep for each pixel
advanced in the x direction when the following is true:

(dx * xStyleStep) > (dy * yStyleStep)

When the style position advances to a new integer, the style step advances one unit in the style array.
The following figure shows several cosmetic styled lines having different slopes.
In this illustration, the pixel grid shown is not square, but is shown as it would be for an EGA display in which four
pixels in the x direction represent the same distance as three pixels in the y direction. The style steps in the
GDIINFO structure ensure that styled lines appear the same at any slope on displays whose pixels are not square.
In this illustration, the styling array (defined by the pstyle member of the LINEATTRS structure) is {1,1}, which is a
broken line having equal-sized dots and gaps. The driver's value of xStyleStep is 3, yStyleStep is 4, and
denStyleStep is 12.
To illustrate further, suppose a dot matrix printer has a 144-dpi horizontal resolution and a 72-dpi vertical
resolution, and that the minimum dot length is 1/24 inch. To support this printer, select the smallest numbers for
xStyleStep and yStyleStep that compensate for the printer's aspect ratio: 1 for xStyleStep, 2 (144/72) for
yStyleStep, and 6 (144/24) for denStyleStep.
If the LA_ALTERNATE bit is set in the flag in the LINEATTRS structure, a special style is used for a cosmetic line. In
this case, every other pixel is on, regardless of direction or aspect ratio. Style state is returned as if the style array is
{1,1} and xStyleStep, yStyleStep, and denStyleStep are all one. In other words, if lStyleState is zero, the first
pixel is on; if lStyleState is one, the first pixel is off.
If the LA_STARTGAP bit is set in the LINEATTRS flag, the sense of the elements in the style array is inverted. The first
array entry specifies the length of the first gap, the second entry specifies the length of the first dash, and so forth.

Bezier Curves

Some advanced hardware devices can draw paths containing Bezier curves (cubic splines), which are general-
purpose curve primitives. If so, the driver can include support for these curves in the DrvStrokePath function.
When GDI must draw a Bezier curve path on a device-managed surface, it will test the GCAPS_BEZIERS flag (in the
DEVINFO structure) to determine if it should call DrvStrokePath. If called, this function either performs the
requested operation or decides not to handle it, just as it does for geometric wide lines. In the latter case, GDI
breaks the request down into simpler operations, for example, by converting curves to line approximations.

Drawing and Filling Paths

The graphics driver considers a path to be a sequence of lines, and/or curves, defined by a path object (PATHOBJ
structure). To handle the filling of closed paths, the driver supports the function DrvFillPath.
GDI can call DrvFillPath to fill a path on a device-managed surface. GDI compares the requirements of the fill with
the DEVINFO structure's flags GCAPS_BEZIERS, GCAPS_ALTERNATEFILL, and GCAPS_WINDINGFILL, to decide
whether to call the driver. If GDI does call the driver, the driver either performs the operation or returns, informing
GDI that the path or clipping requested is too complex to be handled by the device. In the latter case, GDI breaks the
request down into several simpler operations.
A driver can also support the optional DrvStrokeAndFillPath function to fulfill requests for path fills. This function
fills and strokes a path at the same time. Many GDI primitives require this functionality. If a wide line is used for
stroking, the filled area must be reduced to compensate for the increased width of the bounding path.
When the driver returns FALSE from either the DrvFillPath or DrvStrokeAndFillPath functions, GDI converts the
fill-path request to a set of simpler operations and calls the driver function again. If the device returns FALSE again
on the second call to DrvFillPath, GDI converts the path to a clip object and then calls EngFillPath. For a FALSE
return when DrvStrokeAndFillPath is recalled, GDI can convert the call into separate calls to DrvStrokePath and
DrvFillPath.

Path Fill Modes

The two fill modes defined for paths are alternate and winding. Both fill modes use an even-odd rule to determine
how to fill a closed path.
FP_ALTERNATEMODE applies the even-odd rule as follows: draw a line from any arbitrary start point in the closed
path to some point obviously outside of the closed path. If the line crosses an odd number of path segments, the
start point is inside the closed region and is therefore part of the fill area. An even number of crossings means that
the point is not in an area to be filled.
FP_WINDINGMODE considers not only the number of times that the vector crosses segments of the path, but also
considers the direction of each segment. The path is considered to be drawn from start to finish, with each
segment's direction implied by the order of its specified points: the first vertex of a segment is the "from" point, and
the second vertex is the "to" point. Now draw the same arbitrary line described in alternate mode. Starting from
zero, add one for every "forward" direction segment that the line crosses, and subtract one for every "reverse"
direction segment crossed. (Forward and reverse are based on the dot product of the segment and the arbitrary
line.) If the count result is nonzero, then the start point is inside the fill area; a zero count means the point is outside
the fill area.
The following figure shows how to apply both rules to the more complex situation of a self-intersecting path.

In alternate fill mode, point A is inside because ray 1 passes through an odd number of line segments, while points
B and C are outside, because rays 2 and 3 pass through an even number of segments. In winding-fill mode, points A
and C are inside, because the sum of the forward (positive) and reverse (negative) line segments crossed by their
rays, 1 and 3 respectively, is not zero, while point B is outside, because the sum of the forward and reverse line
segments that ray 2 crosses is zero.
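Both rules can be expressed compactly in code. The sketch below is purely illustrative: it tests a point against a closed polygon of straight segments using a horizontal ray, counting crossings for the alternate (even-odd) rule and signed crossings for the winding rule. The function name is hypothetical, and because it uses floating point, an actual driver would need the state save/restore described earlier.

// Returns nonzero if pt lies inside the polygon under the chosen rule.
// apt holds the vertices; the polygon is implicitly closed.
int PointInPath(const POINTL *apt, int cpt, POINTL pt, int bWindingMode)
{
    int i, crossings = 0, winding = 0;

    for (i = 0; i < cpt; i++)
    {
        POINTL a = apt[i];
        POINTL b = apt[(i + 1) % cpt];

        // Does the horizontal ray from pt toward +x cross segment a->b?
        if ((a.y <= pt.y) != (b.y <= pt.y))
        {
            // x coordinate where the segment crosses the ray's scan line.
            double x = a.x + (double)(pt.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if (x > pt.x)
            {
                crossings++;                       // alternate (even-odd) rule
                winding += (b.y > a.y) ? 1 : -1;   // winding rule: direction matters
            }
        }
    }
    return bWindingMode ? (winding != 0) : (crossings & 1);
}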

Filling Areas (Closed Paths)

As in line drawing, pixels for filling are considered to be at integer coordinates. Each scan line in a region is
bordered on the left and right by a segment of the path. Pixels that fall between the left and right borders are
considered inside the fill region. Pixels that are exactly on the left border are also inside, but those exactly on the
right border are excluded. If a top border is exactly horizontal, any pixels exactly on the border are inside while
pixels exactly on the lower border are excluded.
The following figure shows how the pixels included in the fill region are determined relative to left and right
borders of the region. Stated mathematically, the region is considered to be "closed" on the left and top, and "open"
on the right and bottom.

The convention described above for the x-axis of the fill region also applies to the y-axis by substituting the left
border with the top border and the right border with the bottom border.

Copying Bitmaps

Bit block transfer (BitBlt) functions implemented by drivers copy blocks of bits from one surface to another. These
functions include:
DrvBitBlt
DrvCopyBits
DrvStretchBlt
DrvTransparentBlt
There is also a display-driver-specific BitBlt function named DrvSaveScreenBits.
If the surface being drawn on is a device-managed surface or bitmap, the driver must support a minimum level of
bit block transfer functions. If the surface is a GDI-managed standard format bitmap, GDI handles only those
operations not hooked by the driver.
DrvBitBlt
The DrvBitBlt function provides general bit block transfer capabilities. If a source is used, DrvBitBlt copies the
contents of the source rectangle onto the destination rectangle. (The pptlSrc parameter of this function identifies
the upper left corner of the rectangle.) If there is no source rectangle, DrvBitBlt ignores the pptlSrc parameter. The
destination rectangle, the surface to be modified, is defined by two integer points, the upper left and lower right
corners. The rectangle is lower right exclusive; the lower and right edges of the rectangle are not part of the block
transfer. DrvBitBlt cannot be called with an empty destination rectangle. The two points of the rectangle are always
well ordered; that is, both coordinates of the lower right point are greater than their counterparts in the upper left
point.
DrvBitBlt deals with different ROPs and performs optimizations depending on the device. In some cases, if the
ROP is a solid color, a fill rather than a BitBlt can be performed. For devices or drivers that do not support ROPs,
such as the Pscript driver, there can be discrepancies between the displayed and printed images.
Optionally, a block transfer handled by DrvBitBlt can be masked and involve color index translation. A translation
vector assists in color index translation for palettes. The transfer might need to be arbitrarily clipped by a display
driver, using a series of clip rectangles. The required region and information are furnished by GDI.
Implementing DrvBitBlt represents a significant portion of the work involved in writing a driver for a raster display
device that does not have a standard-format frame buffer. The Microsoft VGA driver furnished with the
Windows Driver Kit (WDK) provides sample code that supports the basic function for a planar device. Implementing
DrvBitBlt for other devices may be less complex.
DrvCopyBits
The DrvCopyBits function is called by GDI from its simulation operations to translate between a device-managed
raster surface and a GDI standard-format bitmap. DrvCopyBits provides a fast path for SRCCOPY (0xCCCC) ROP
bit block transfers.
Required for a graphics driver with device-managed bitmaps or raster surfaces, this function must translate driver
surfaces to and from any standard-format bitmap. DrvCopyBits is never called with an empty destination
rectangle, and the two points of the destination rectangle are always well ordered. This call has the same
requirements as DrvBitBlt.
If a driver supports a device-managed surface or bitmap, the driver must implement the DrvCopyBits function. At
a minimum, the driver must do the following when DrvCopyBits is called:
Perform a block transfer to and from a bitmap, in the device's preferred format, and to the device surface.
Perform the transfer with the SRCCOPY (0xCCCC) raster operation (ROP).
Allow arbitrary clipping.
The driver can use the GDI CLIPOBJ enumeration services to reduce the clipping to a series of clip rectangles. GDI
passes down a translation vector, the XLATEOBJ structure, to assist in color index translation between source and
destination surfaces.
If the surface of a device is organized as a standard-format device-independent bitmap (DIB), the driver can
support only simple transfers. If a call comes in with a complicated ROP, the driver can punt the block transfer
request back to GDI with a call to the EngCopyBits function. This allows GDI to break up the call into simpler
functions that the driver can perform.
DrvCopyBits also is called with RLE bitmaps (see the Microsoft Windows SDK documentation) and device-
dependent bitmaps (DDBs). The bitmaps are provided to this function as a result of application program calls to
several Win32 GDI routines. The optional DDB is supported only by a few specialized drivers.
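The accelerate-or-punt pattern described above might be sketched as follows. HwCanBlt and HwSrcCopy are hypothetical placeholders for the device-specific fast path; everything else follows the DDI described in this section.

BOOL DrvCopyBits(SURFOBJ *psoDest, SURFOBJ *psoSrc, CLIPOBJ *pco,
                 XLATEOBJ *pxlo, RECTL *prclDest, POINTL *pptlSrc)
{
    // Accelerate only trivial clipping with no color translation.
    BOOL bTrivial = (pco == NULL || pco->iDComplexity == DC_TRIVIAL) &&
                    (pxlo == NULL || (pxlo->flXlate & XO_TRIVIAL));

    if (bTrivial && HwCanBlt(psoDest, psoSrc))                 // hypothetical capability check
        return HwSrcCopy(psoDest, psoSrc, prclDest, pptlSrc);  // hypothetical SRCCOPY fast path

    // Anything more complex: let GDI break the operation into simpler calls.
    return EngCopyBits(psoDest, psoSrc, pco, pxlo, prclDest, pptlSrc);
}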
DrvStretchBlt
A driver can optionally provide the DrvStretchBlt function; this is true even for drivers that support device-managed surfaces.
This function provides capabilities for stretching block transfers between device-managed and GDI-managed
surfaces. DrvStretchBlt can support only certain types of stretching, such as stretching by integer multiples.
DrvStretchBlt also allows a driver to write on GDI bitmaps, especially when the driver can do halftoning. The
function also permits the same halftoning algorithm to be applied to GDI bitmaps and device surfaces.
DrvStretchBlt maps a geometric source rectangle exactly onto a geometric destination rectangle. The source is a
rectangle with corners displaced by (-0.5,-0.5) from the given integer coordinates. The points specified in the
function parameters lie on integer coordinates that correspond to pixel centers. A rectangle defined by two such
points is considered to be geometric, with two vertices whose coordinates are the given points, but with 0.5
subtracted from each coordinate. (GDI POINTL structures use a shorthand notation for specifying these fractional
coordinate vertices.) Note that the edges of any such rectangle never intersect a pixel, but go around a set of pixels.
The pixels inside the rectangle are normal pixels for a lower right-exclusive rectangle.
The points defining the corners of the source rectangle are well-ordered; DrvStretchBlt cannot be given an empty
source rectangle. Unlike DrvBitBlt, DrvStretchBlt can be called with a single clipping rectangle to prevent round-
off errors in clipping the output.
The destination rectangle is defined by two integer points. These points are not well ordered, which means that the
coordinates of the second point are not necessarily larger than those of the first. The rectangle that these points
describe does not include the lower and right edges. Because the rectangle is not well ordered, DrvStretchBlt must
sometimes perform inversions in the two x coordinates and/or the two y coordinates. (The driver must not attempt
to read pixels that do not lie on the source surface). DrvStretchBlt cannot be called with an empty destination
rectangle.
For color translation, DrvStretchBlt provides a pointer, pxlo, to the XLATEOBJ structure, which is used to translate
between the source and destination surfaces. The XLATEOBJ structure can be queried to find the destination index
for any source index. For a high-quality stretching block transfer, DrvStretchBlt is required to interpolate colors in
some cases. DrvStretchBlt also uses the COLORADJUSTMENT structure to define the color adjustment values that
are to be applied to the source bitmap before the bits are stretched.
DrvStretchBlt uses the iMode parameter to define how the source pixels are to be combined for output. In
particular, iMode provides the HALFTONE option that permits the driver to use groups of pixels in the output
surface to approximate the color or grey level of the output. Changes to the COLORADJUSTMENT structure are
passed to the driver after the next DrvStretchBlt call with an iMode of HALFTONE. In addition, if the driver requires
GDI to handle halftoning for GDI bitmaps, the driver hooks DrvStretchBlt and, when called with an iMode of
HALFTONE, punts the operation back to GDI by calling EngStretchBlt.
Likewise, if DrvStretchBlt is asked to do something that it does not support, it can return the request to GDI by
calling EngStretchBlt so that GDI can handle it.
DrvTransparentBlt
The DrvTransparentBlt function causes a source bitmap to be copied onto a destination bitmap so that portions of
the destination bitmap remain visible after the copy. The iTransColor parameter of this function specifies the color
that is to be made transparent.
The following figure depicts an example of a transparent blt.

From left to right, the preceding figure shows the source bitmap, the destination bitmap before the transparent blt,
and the destination bitmap after the transparent blt. Note that the color in iTransColor is the same as that in the
four regions above, below, and to either side of the central region in the source bitmap.
When the blt operation takes place, these four regions are not copied, which causes any pixel pattern in the
destination bitmap under these regions to remain visible. Any pixel pattern under the other regions (the four
corners and the center) is overwritten in the transparent blt.
This is illustrated in the right-most image: the portions of the letter 'M' in the four corners and the center were
overwritten with the colors in the source bitmap. The portions of the letter 'M' under the four regions whose color
is the same as that in iTransColor remain visible.

Halftoning

Traditional analog halftoning uses a halftoning screen, composed of cells of equal sizes, with fixed-cell spacing
center-to-center. The fixed-cell spacing accommodates the thickness of the ink, while the size of a dot within each
cell can vary to produce the impression of a continuous tone.
On a computer, most printing or screen shading also uses a fixed-cell pixel size. To simulate the variable dot size, a
combination of cluster pixels simulates the halftone screen. GDI includes halftoning default parameters that provide
a good first approximation. Additional device-specific information can be added to the system to improve output.
The driver sends GDI the device-related specifications that GDI needs to do halftoning through the GDIINFO
structure returned by the DrvEnablePDEV function. The driver specifies the pattern size with the ulHTPatternSize
member of GDIINFO, which defines the preferred output format for halftoning. For specific devices, halftoning
relates to the halftone pattern sizes. GDI provides numerous predefined pattern sizes from 2 x 2 through 16 x 16.
For each standard pattern size, there is also a modified version. It is identified by the suffix "_M" on the standard
pattern size's name. For example, the defined name of the standard 6-by-6 pattern is HT_PATSIZE_6x6, while the
name of the modified 6-by-6 pattern is HT_PATSIZE_6x6_M. The modified version gives more color resolution, but
can produce a side effect of horizontal or vertical noise. In addition, because each of these pattern sizes is device
resolution-dependent, the appropriate pattern size depends upon the specific device.
The tradeoff between pattern size (spatial resolution) and color resolution is determined by the pattern size. A
larger halftone pattern produces better color resolution, while a smaller pattern results in the best spatial
resolution. Determining the best pattern size is frequently a matter of trial and error. For more information, refer to
GDIINFO.
Another of the GDIINFO structure members affecting halftoning is flHTFlags, which contains flags that describe
the device resolution needed for halftoning.
GDI handles color adjustment requests from the application and passes the information down to driver functions
through the graphics DDI. If the application selects halftoning and the surface is a standard format DIB, GDI
processes the bitmap using its halftoning capabilities, after which, the bitmap is sent to the device. In the PostScript
driver, the EngStretchBlt function can send the bitmap to the printer using either the DrvCopyBits or DrvBitBlt
(in the SRCCOPY mode) functions.
Letting GDI perform the halftoning instead of the PostScript printer, for example, provides a faster output with
better WYSIWYG quality. An interface to the PostScript driver allows the user to adjust the halftoning and provides
a check box to turn off GDI halftoning if the printer's built-in halftoning capabilities are preferred.
The DrvDitherColor function can return the DCR_HALFTONE value, which requests that GDI approximate a color
using the existing device (halftone) palette. DCR_HALFTONE can be used with a display driver only when the device
contains a device (halftone) palette, such as a VGA-16 adapter card, because it has a standard fixed palette.
Monochrome drivers, including most raster printers, can use the iMode parameter in DrvDitherColor to obtain
good gray-scale effects.
Note Windows 2000 and later do not support halftoning on 24-bit (or higher) devices.

Image Color Management

An image displayed on a computer monitor often appears differently when it is printed on a color printer. In
recognition of this problem, Windows 2000 and later incorporates Image Color Management (ICM) to perform
color correction on images so that their appearance is consistent across a variety of output devices.
To find out more about Image Color Management and a particular class of output device, follow the appropriate
link:
Color Management for Displays
Color Management for Printers
Color Management for Still Image Devices
For a general discussion about Image Color Management, see the Microsoft Windows SDK documentation.

Supporting Graphics DDI Color and Pattern Functions

Graphics DDI color and pattern functions include palette management and brush realization functions.

Managing Palettes

As described in GDI Support for Graphics Drivers, GDI handles much of the palette management work. The driver
must supply its default palette to GDI in the DEVINFO structure when GDI calls the function DrvEnablePDEV. At
this time, the driver should create the default palette with a call to the GDI service function EngCreatePalette.
Drivers that support settable palettes also must support the DrvSetPalette function. This function is used
exclusively by display drivers.
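Creating the default palette inside DrvEnablePDEV might look like the following sketch, assuming a 32-bpp device that exposes 8-8-8 bitfields; pdi is the DEVINFO pointer that GDI passes to DrvEnablePDEV.

// Create the default palette and hand it to GDI through DEVINFO.
pdi->hpalDefault = EngCreatePalette(PAL_BITFIELDS, 0, NULL,
                                    0x00FF0000,    // red mask
                                    0x0000FF00,    // green mask
                                    0x000000FF);   // blue mask
if (pdi->hpalDefault == NULL)
    return NULL;        // fail DrvEnablePDEV if the palette cannot be created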

Realizing Brushes

Graphics functions that output lines, text or fills take at least one brush as an argument. The brush defines the
pattern to be used to draw the graphics object on the specified surface. Each output function that takes a brush
requires a brush origin. The brush origin provides the coordinates of a pixel on the device surface to be aligned
with the upper left pixel of the brush's pattern. The brush pattern is repeated (tiled) to cover the whole device
surface.
The driver can support the following functions to define brushes:
DrvRealizeBrush
DrvDitherColor
A brush is always used with a mix mode that defines how the pattern should be mixed with the data already on the
device surface. The MIX data type consists of two ROP2 values packed into a single ULONG value. The foreground
ROP is in the lowest-order byte. The next byte contains the background ROP. For more information, see the
Microsoft Windows SDK documentation.
GDI keeps track of all logical brushes that an application has requested for use. Before asking a driver to draw
something, GDI first issues a call to the driver function DrvRealizeBrush. This allows the driver to compute the
optimal representation of the required pattern for its own drawing code.
DrvRealizeBrush is called to realize the brush defined by psoPattern (pattern for the brush) and by psoTarget
(surface for the realized brush). A realized brush contains information and accelerators a driver needs to fill an area
with a pattern. This information is defined and used only by the driver. Driver realization of a brush is written into a
buffer that the driver can cause to be allocated by calling the GDI service function BRUSHOBJ_pvAllocRbrush
from within DrvRealizeBrush. GDI caches all realized brushes; consequently, they seldom need to be recomputed.
In DrvRealizeBrush, the BRUSHOBJ user object represents the brush. The surface for which the brush is to be
realized can be the physical surface for the device, a DDB, or a standard-format bitmap. For a raster device, the
surface describing the brush pattern represents a bitmap; and for a vector device, it is always one of the pattern
surfaces returned by the DrvEnablePDEV function. The transparency mask used for the brush is a one-bit-per-
pixel bitmap with the same extent as the pattern. A mask bit of zero means that the pixel is considered to be a
background pixel for the brush; that is, the target pixel is unaffected by that particular pattern pixel.
DrvRealizeBrush uses an XLATEOBJ structure to translate the colors in the brush pattern to the device color
indexes.
The driver should call the GDI service function BRUSHOBJ_pvGetRbrush when the value of the iSolidColor
member of the BRUSHOBJ structure is 0xFFFFFFFF and the pvRbrush member is NULL.
BRUSHOBJ_pvGetRbrush retrieves a pointer to the driver's realization of a specified brush. If the brush has not
been realized when the driver calls this function, GDI automatically calls DrvRealizeBrush for the driver's realization
of the brush.
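In a drawing routine, that pattern typically looks like the following fragment. RBRUSH is a hypothetical driver-defined realization; what the driver actually stores in it, and how it tiles the pattern, is entirely device specific.

typedef struct _RBRUSH {        // hypothetical driver-defined brush realization
    ULONG aulPattern[8];        // for example, an 8x8 pattern in device format
} RBRUSH;

// Fragment from a drawing function that received a BRUSHOBJ *pbo.
if (pbo->iSolidColor != 0xFFFFFFFF)
{
    // Solid brush: iSolidColor is already a device color index.
    // ... fill with the solid color ...
}
else
{
    // Pattern brush: fetch (or trigger) the driver's realization.
    RBRUSH *prb = (RBRUSH *)pbo->pvRbrush;
    if (prb == NULL)
        prb = (RBRUSH *)BRUSHOBJ_pvGetRbrush(pbo);   // GDI calls DrvRealizeBrush if needed
    if (prb == NULL)
        return FALSE;            // realization failed; fail the call or punt to GDI
    // ... tile the surface with prb->aulPattern, aligned to the brush origin ...
}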
Dithering
If necessary, GDI can request the assistance of the driver when trying to create a brush with a solid color that
cannot be represented exactly on the hardware. GDI calls the driver function DrvDitherColor to request the driver
to dither a brush against the reserved portion of the device palette.
Dithering uses a pattern of several colors to approximate the chosen color, and its result is an array of device color
indexes. A brush created using these colors for its pattern is usually a good approximation of the given color.
DrvDitherColor can also represent a color that cannot be specified exactly by a device. To do this, DrvDitherColor
requests a pattern of several colors and creates a brush that approximates the given solid color.
The function DrvDitherColor is optional and is called only if the GCAPS_COLOR_DITHER or GCAPS_MONO_DITHER
capability flags are set in the flGraphicsCaps member of the DEVINFO structure. DrvDitherColor can return the
values listed in the following table.

VALUE          MEANING

DCR_DRIVER     Indicates that the dither values have been calculated by the driver. The handle to a cxDither by cyDither array of device color indexes is passed back in this case.
DCR_HALFTONE   Indicates that GDI should approximate a color using the existing device (halftone) palette. For example, GDI can use the typical palette for a printer that contains only three or four colors. DCR_HALFTONE can be used with a display driver only when the device contains a device (halftone) palette, such as a VGA-16 adapter card, which has a standard fixed palette.
DCR_SOLID      Indicates that GDI should map the requested color to the nearest color value in the existing device palette (many to one).

Monochrome drivers should support DrvDitherColor in order for GDI to obtain good gray-level patterns.

Supporting Graphics DDI Font and Text Functions

For many devices, GDI can handle all font functions. Some drivers, however, can draw their own fonts, or their
device's own fonts, on device surfaces. Other drivers are font drivers, which can provide glyph bitmaps and/or
outlines, as well as glyph metrics to GDI. In these cases, the driver must support some of the available font
functions.
Text output is a more general function. If the surface is a standard-format bitmap, GDI can handle all text output,
unless the driver hooks out the call to enhance performance. For a device-managed surface, the driver must
support text output.
The following topics provide information with regard to the support of font and text management functions.
Managing and Optimizing Fonts
Drawing Text

Managing and Optimizing Fonts

A producer is a driver that can generate fonts. It produces glyph information as output, including glyph metrics,
bitmaps, and outlines. A consumer is a driver that uses fonts. It accepts glyph information as input for generating
text output, and must draw its own fonts or those of the hardware on a device-managed surface. A driver can be
both a producer and a consumer. For example, a printer driver can act as a producer while servicing a
DrvQueryFontData call to provide glyph metrics and later act as a consumer while processing a DrvTextOut call.
A driver is required to handle fonts only when it is a font producer or a font consumer. If the hardware has a
resident font, the driver must supply information to GDI about this font, including the font metrics in the
IFIMETRICS structure, mappings from Unicode to individual glyph identities, individual glyph attributes, and
kerning tables. There are also functions the driver must support. Some functions are required both by font drivers
and drivers that use driver-specific or device-specific fonts. Others are required only by font drivers.
The support of font functions depends on the driver's abilities. The general types are:
Metrics functions
Glyph functions
TrueType functions

Font Metric Functions

When a driver must support fonts, it must supply font information to GDI through the IFIMETRICS structure. There
is a separate IFIMETRICS structure for each font. Most of the fields are expressed in terms of FWORDs, each a
signed 16-bit quantity, in the design space. If the font is a raster font, the design space and device space are the
same and a font unit is equivalent to the distance between pixels.
Basically, the IFIMETRICS structure is the graphics DDI version of a text-metric structure. All distances refer to the
notional coordinate system of the font designer. The notional space coordinate system is a right-handed Cartesian
coordinate system in which the y-coordinate increases toward the top and the x-coordinate increases toward the
right.
The IFIMETRICS structure is designed to be of variable length. No restriction is placed on the length of the character
strings associated with the font. It is common practice to store the strings immediately following the last field of the
IFIMETRICS structure.
Any driver that provides fonts must support the DrvQueryFont function. The driver also can include the function
DrvQueryFontData to retrieve information about a realized font. In a call to DrvQueryFontData, GDI provides a
pointer to an array of glyphs or kerning handles. The driver returns information about associated glyphs in GDI
GLYPHDATA structures. If DrvQueryFontData has been given kerning handles, it returns information about
kerning pairs in the form of Win32 POINTL structures. The following table lists the font metric functions.

FUNCTION           DESCRIPTION

DrvDestroyFont     Notifies the driver that a font realization is no longer needed so the driver can free any data structures that it allocated. GDI calls this function once for the font producer and once for the font consumer. Optional; should be supported only if the driver must free allocated resources.
DrvFree            Informs the driver that the indicated data structure is no longer needed. Optional; should be implemented only if the driver's memory management requires this information.
DrvQueryFont       Returns a pointer to the IFIMETRICS structure for a font. Required by all drivers that deal with fonts.
DrvQueryFontData   Returns information about a realized font. Required (for selected iMode values) by all drivers that deal with fonts.
DrvQueryFontTree   Returns pointers to structures that define either the mapping from Unicode to glyph handles or the mapping of kerning pairs to kerning handles. Required by all drivers that deal with fonts.

The function DrvQueryFontTree allows GDI to obtain pointers to tree structures that define one of the following:
Mapping from Unicode to glyph handles, including glyph variants (GDI FD_GLYPHSET structure)
Mapping of kerning pairs to kerning handles (FD_KERNINGPAIR structure)
Generating the structures that DrvQueryFontTree needs takes effort, so the driver should precompute them if
possible. The structures can be stored in a resource or in a file. If the structures are stored in a file, the ideal method
for loading or reading them is to call the EngMapFontFile function, which maps the file into memory. Because the
file does not get added to the swap file, the memory can be reclaimed if needed, which is more efficient than
opening and reading the file.
In particular, the driver returns an identifier in the pid parameter. GDI passes it to the DrvFree function, with the
returned pointer, when the FD_GLYPHSET structure or an array of FD_KERNINGPAIR structures is no longer
needed. Depending on how memory is managed in the driver, pid can identify the structure, identify how the
structure was allocated, or do nothing at all.
DrvFree and DrvDestroyFont are both optional functions. GDI calls DrvFree to inform the driver that the specified
data structure is no longer needed. The driver does not need to implement it unless it allocates memory for the
structure and needs to be informed when the corresponding data structure can be released. For example, if the data
is associated with the FONTOBJ structure, the deletion could be deferred until a call to DrvDestroyFont, so it would
not be necessary to implement DrvFree.
DrvDestroyFont notifies the driver that a font realization is no longer needed so the driver can free any data
structures it allocated. GDI calls this function once for the font producer and once for the font consumer. It should
be implemented only if the driver must free allocated resources when the font instance is destroyed.

Font Driver Functions

In addition to the functions described in the previous topics, the following table lists several other functions that
font drivers should support.

FUNCTION                DESCRIPTION

DrvLoadFontFile         Specifies a file to be used for creating font realizations; the driver must prepare the file for use. Required for font drivers.
DrvQueryAdvanceWidths   Asks the driver to send GDI character advance widths for a specified set of glyphs.
DrvQueryFontCaps        Copies an array of bits that defines the capabilities of a font driver to a specified buffer.
DrvQueryFontFile        Depending on the mode of the query, returns either the number of font faces in a font file or a string describing the font file. Required for font drivers.
DrvQueryGlyphAttrs      Returns information about a font's glyphs.
DrvUnloadFontFile       Informs the driver that a font file is no longer needed so the driver can do necessary cleanup. Required for font drivers.

GDI calls the DrvLoadFontFile function with a particular file to be used for creating font realizations. This function is
required only of font drivers. When the function DrvLoadFontFile is called, the driver performs the conversions
necessary to prepare the file for use.
DrvLoadFontFile returns a unique identifier that allows GDI to request the correct font using a GDI-maintained font
usage table. Once a font is loaded, GDI does not call for the same font to be loaded again.
GDI calls DrvUnloadFontFile when the specified font file is no longer needed. The DrvUnloadFontFile function is
required only in font drivers. DrvUnloadFontFile causes all scratch files to be deleted and all allocated system
resources to be freed.
GDI calls the DrvQueryFontFile function to return information about a font file that was loaded by the driver.
DrvQueryFontFile is required only in font drivers. The type of information to be returned is specified by iMode. If
iMode is QFF_DESCRIPTION, the function returns a string that Microsoft NT-based operating systems use to
describe the font file. If iMode is QFF_NUMFACES, the function returns the number of faces in the font file. The
faces are identified by an index in the range from 1 to the number of faces.
TrueType Font Driver Functions
4/26/2017 • 1 min to read • Edit Online

TrueType font drivers must support the functions listed in the following table.

FUNCTION                   DESCRIPTION

DrvGetTrueTypeFile         Gives GDI efficient access to the memory-mapped TrueType font file.

DrvQueryTrueTypeOutline    Returns glyph outlines in native TrueType format.

DrvQueryTrueTypeTable      Gives GDI access to specific tables in the TrueType font-file
                           format.

All these functions provide GDI with information about TrueType font files. DrvQueryTrueTypeTable should give
GDI access to specific tables in the TrueType font-file format. DrvQueryTrueTypeOutline must send GDI glyph
outlines in native TrueType format. DrvGetTrueTypeFile returns to GDI the TrueType driver's private entry point that
allows GDI efficient access to the memory mapped TrueType font file.
Drawing Text
4/26/2017 • 2 min to read • Edit Online

The text output functions are called only for a device-managed surface (a device bitmap or surface), or for a GDI-
managed surface if the driver has hooked the call in the EngAssociateSurface function. The graphic output
primitives for text are the functions:
DrvTextOut
DrvGetGlyphMode
GDI calls DrvTextOut to render the pixels for a set of glyphs at specified positions for text output. Many of the
DrvTextOut capabilities are defined with the GCAPS bits of the DEVINFO structure returned by the
DrvEnablePDEV function.
The input parameters for DrvTextOut define two sets of pixels, foreground and opaque. The driver renders the
surface to provide the following results:
1. The opaque pixels are rendered first, with the opaque brush.
2. The foreground pixels are then rendered with the foreground brush.
Each of these rendering operations is performed in a clip region. The pixels outside the clip region cannot be
affected.
The driver must render the surface so opaque pixels are calculated and drawn on the surface first with an opaque
brush. Then foreground pixels are calculated and rendered with a foreground brush. Each of these operations is
limited by clipping.
Foreground and opaque pixels make up a mask through which color is brushed onto the surface. The glyphs of a
font do not, in themselves, have color. The foreground set of pixels is defined as the union of the glyphs' pixels and
the pixels of certain extra rectangles used to simulate strikethrough or underline. Opaque pixels are defined by
opaque rectangles.
DrvTextOut selects the specified font using a pointer, pfo, to query the current FONTOBJ structure. This process
can include downloading a soft font or a font substitution, or any other font optimizations necessary for the device.
If a driver has scalable fonts, it should call the FONTOBJ_pxoGetXform function for the current FONTOBJ
structure, to return the notional-to-device transform for the associated font. This is required for a driver-supplied
font. Notional space is the design space of the device font. For example, PostScript fonts are defined in 1000-by-
1000 unit character cells. Most of the metrics returned in the IFIMETRICS structure are converted to notional space,
which is why the notional-to-device transform is necessary.
The graphics engine queries the driver by calling the function DrvGetGlyphMode to find out how it should
internally cache its font information. It can cache individual glyphs as bitmaps, outlines, or neither (the proper
choice for device fonts).
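The following fragment is a minimal sketch of the two-pass drawing order described above, assuming trivial clipping
and a simple frame-buffer device; FillRectWithBrush and DrawGlyphsWithBrush are hypothetical driver helpers, not DDI
functions. A real implementation must also honor the CLIPOBJ, the mix mode, any prclExtra rectangles, and the GCAPS
bits it reported from DrvEnablePDEV.

BOOL APIENTRY
DrvTextOut(
    SURFOBJ  *pso,
    STROBJ   *pstro,
    FONTOBJ  *pfo,
    CLIPOBJ  *pco,
    RECTL    *prclExtra,
    RECTL    *prclOpaque,
    BRUSHOBJ *pboFore,
    BRUSHOBJ *pboOpaque,
    POINTL   *pptlOrg,
    MIX       mix)
{
    ULONG     cGlyphs;
    GLYPHPOS *pgp;
    BOOL      bMore;

    // 1. Opaque pixels first: fill prclOpaque with the opaque brush.
    if (prclOpaque != NULL) {
        FillRectWithBrush(pso, prclOpaque, pboOpaque);      // hypothetical helper
    }

    // 2. Foreground pixels next: enumerate the glyphs and render them with the
    //    foreground brush.
    STROBJ_vEnumStart(pstro);
    do {
        bMore = STROBJ_bEnum(pstro, &cGlyphs, &pgp);
        DrawGlyphsWithBrush(pso, pgp, cGlyphs, pboFore);    // hypothetical helper
    } while (bMore);

    return TRUE;
}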
The DEVMODEW Structure
4/26/2017 • 4 min to read • Edit Online

The DEVMODEW structure is the Unicode version of the DEVMODE structure, which is described in the Microsoft
Windows SDK documentation. (The 'W' suffix on DEVMODEW stands for "wide", or Unicode characters.) While
applications can use either structure, drivers are required to use the DEVMODEW structure rather than the
DEVMODE structure.
Public and Private Members
Immediately following a DEVMODEW structure's defined members (often referred to as its public DEVMODEW
members), there can be a set of driver-defined members (its private DEVMODEW members). The following figure
shows the public section (the actual DEVMODEW structure itself) and the private section.

Normally, the private members are used only by printer drivers. The driver supplies the size, in bytes, of this private
area in the dmDriverExtra member. Driver-defined private members are for exclusive use by the driver.
For printer drivers, the DEVMODEW structure is used to specify user choices for a print document. It is also used to
specify default values of these choices for printers, such as the number of copies to print, paper size, and other
attributes. For display devices, the DEVMODEW structure specifies display attributes such as the number of bits per
pixel, pixel dimensions, and display frequency.
Initializing a DEVMODEW Structure
Depending on whether it is to be used by a display driver or by a printer driver, a DEVMODEW structure is
initialized in two different ways.
Display driver DEVMODEW initialization
A display driver's DrvGetModes entry point initializes all members of the DEVMODEW structure to zero.
DrvGetModes then copies the name of the display driver DLL to the dmDeviceName member, fills in the
dmSpecVersion and dmDriverVersion members with the version of the DEVMODEW structure, and
copies display attribute information to the appropriate members.
Printer driver DEVMODEW initialization
When an application makes a call to either DocumentProperties (a printer interface DLL function that is
described in the Microsoft Windows SDK documentation) or DrvDocumentPropertySheets (an NT-based
operating system graphics DDI), a DEVMODEW structure is created with default values. An application is
then free to modify any of the public DEVMODEW members. After any changes, the application should then
make a second call to the same function it called before, in order to merge the changed members with those
of the driver's internal DEVMODEW structure. The second call is necessary since some changes may not
work correctly; the printer driver must be called to correct the DEVMODEW structure. When the document is
about to be printed, the application passes the merged DEVMODEW structure to CreateDC (described in the
Microsoft Windows SDK documentation), which passes it on to the DrvEnablePDEV DDI. At that time, the
driver's rendering DLL validates the DEVMODEW structure and makes repairs, if necessary, before carrying
out the print job.
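As a rough sketch of the display-driver initialization just described (not the printer path), the fragment below
fills in a single hard-coded DEVMODEW. The 640 x 480 x 8-bpp, 60 Hz values and the omitted device-name copy are
placeholders only; a real driver reports one DEVMODEW for each mode its hardware actually supports.

ULONG APIENTRY
DrvGetModes(HANDLE hDriver, ULONG cjSize, DEVMODEW *pdm)
{
    if (pdm == NULL) {
        return sizeof(DEVMODEW);        // caller is asking how large a buffer it needs
    }
    if (cjSize < sizeof(DEVMODEW)) {
        return 0;
    }

    memset(pdm, 0, sizeof(DEVMODEW));   // zero all members first
    // Copy the display driver DLL's name into pdm->dmDeviceName here.
    pdm->dmSpecVersion      = DM_SPECVERSION;
    pdm->dmDriverVersion    = DM_SPECVERSION;
    pdm->dmSize             = sizeof(DEVMODEW);
    pdm->dmDriverExtra      = 0;
    pdm->dmBitsPerPel       = 8;
    pdm->dmPelsWidth        = 640;
    pdm->dmPelsHeight       = 480;
    pdm->dmDisplayFrequency = 60;
    pdm->dmFields           = DM_BITSPERPEL | DM_PELSWIDTH |
                              DM_PELSHEIGHT | DM_DISPLAYFREQUENCY;

    return sizeof(DEVMODEW);            // number of bytes written
}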
Using a DEVMODEW Structure
Several APIs and graphics DDIs use the information in the DEVMODEW structure for such purposes as printing,
querying device capabilities, showing user interface, and others. For example, DrvConvertDevMode is a print
spooler graphics DDI that translates the DEVMODEW structure from one operating system version to another. This
might be necessary if a printer driver gets a DEVMODEW structure from another machine that is running on a
different operating system version.
Modifying a DEVMODEW Structure
Applications and drivers are free to ask for a DEVMODEW structure and modify its public part directly. Only drivers,
however, are permitted to modify the private DEVMODEW structure members.
In order to modify private DEVMODEW structure members, a driver must first determine the offset of the
beginning of the private data. Given a pointer to the beginning of this structure, and the dmSize member, which
holds the size of the public portion of the structure, the beginning of the private portion can be found. The
following example shows how to initialize a pointer to the beginning of the private section. In this example, pdm
points to the beginning of the DEVMODEW structure.

PVOID pvDriverData = (PVOID)(((BYTE *)pdm) + pdm->dmSize);
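
Building on that expression, the following sketch shows one way a driver might locate and validate its private
section before using it; the MYDRIVEREXTRA layout is purely hypothetical.

typedef struct _MYDRIVEREXTRA {     // hypothetical driver-defined private members
    DWORD dwSignature;
    DWORD dwNupSetting;
} MYDRIVEREXTRA;

MYDRIVEREXTRA *pExtra = NULL;

// Touch the private section only if the DEVMODEW actually carries enough
// driver-private bytes; dmSize gives the offset of the private section.
if (pdm->dmDriverExtra >= sizeof(MYDRIVEREXTRA)) {
    pExtra = (MYDRIVEREXTRA *)(((BYTE *)pdm) + pdm->dmSize);
}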

Printer Driver/Display Driver DEVMODEW Differences


The DEVMODEW structure members fall into three categories:
Members used only by printer drivers
Members used only by display drivers
Members used by both printer and display drivers
The following table lists several public DEVMODEW members that are used only by printer drivers:

DEVMODEW MEMBERS USED ONLY BY PRINTER DRIVERS        PURPOSE

dmScale              Specifies the percentage by which the image is to be scaled for printing.

dmCopies             Specifies the number of copies to be printed.

dmColor              Specifies whether a color printer should print in color or monochrome.

dmOrientation        Specifies the orientation of the paper, either portrait or landscape.

The next table lists several public DEVMODEW members that are used only by display drivers:

DEVMODEW MEMBERS USED ONLY BY DISPLAY DRIVERS        PURPOSE

dmBitsPerPel           Specifies the color resolution, in bits per pixel, of the display device.

dmPelsWidth            Specifies the width, in pixels, of the visible device surface.

dmPelsHeight           Specifies the height, in pixels, of the visible device surface.

dmDisplayFlags         Specifies the display mode: color versus monochrome, interlaced versus
                       noninterlaced.

dmDisplayFrequency     Specifies the display's refresh rate, in hertz.

The third table lists several public DEVMODEW members that are used by both printer and display drivers:

DEVMODEW MEMBERS USED BY PRINTER AND DISPLAY DRIVERS        PURPOSE

dmDeviceName         For displays, specifies the display driver's DLL. For printers, specifies the
                     "friendly name" of the printer.

dmFields             Specifies bit flags identifying which of the DEVMODEW members that follow it are
                     in use. For example, the DM_BITSPERPEL flag is set when the dmBitsPerPel member
                     contains valid data.

dmSize               Specifies the size, in bytes, of the public portion of the DEVMODEW structure.

dmDriverExtra        Specifies the number of bytes of private driver data that follow the public
                     structure members. For display drivers, this is usually zero.



Obsolete Graphics DDI Functions
4/26/2017 • 1 min to read • Edit Online

The following graphics DDI functions appear in the winddi.h header, but are obsolete for Windows 2000 and later.
For a list of obsolete GDI functions and structures, see Obsolete GDI Functions, Structures, and Constants.
DrvMovePanning
DrvPaint
DrvQuerySpoolType
GDI Support for Graphics Drivers
4/26/2017 • 1 min to read • Edit Online

This section describes the Microsoft Windows NT-based operating system graphics device interface (GDI). It then
details the support that GDI provides to graphics drivers.
References to GDI in this section are implicit references to kernel-mode GDI; Microsoft Win32 GDI will be explicitly
identified. Kernel-mode GDI is also known as the Graphics Engine.
GDI function and structure references are documented in the Display Devices Reference section. Most of the GDI
function declarations and structure definitions can be found in Winddi.h. For display drivers, the Microsoft
DirectDraw heap manager functions are declared in Dmemmgr.h. Both files are shipped with the Windows Driver
Kit (WDK).
GDI from the Driver's Perspective
4/26/2017 • 1 min to read • Edit Online

GDI is the intermediary support between a Microsoft Windows NT-based graphics driver and an application.
Applications call Microsoft Win32 GDI functions to make graphics output requests. These requests are routed to
kernel-mode GDI. Kernel-mode GDI then sends these requests to the appropriate graphics driver, such as a display
driver or printer driver. Kernel-mode GDI is a system-supplied module that cannot be replaced.
GDI communicates with the graphics driver through a set of graphics device driver interface (graphics DDI)
functions. These functions are identified by their Drv prefix. Information is passed between GDI and the driver
through the input/output parameters of these entry points. The driver must support certain DrvXxx functions for
GDI to call. The driver supports GDI's requests by performing the appropriate operations on its associated hardware
before returning to GDI.
GDI includes many graphics output capabilities in itself, eliminating the need for the driver to support these
capabilities and thereby making it possible to reduce the size of the driver. GDI also exports service functions that
the driver can call, further reducing the amount of support the driver must provide. GDI service functions are
identified by their Eng prefix, and functions that provide access to GDI-maintained structures have names in the
form XxxOBJ_Xxx.
The following figure shows this flow of communication.



GDI as a Graphics Language for Applications
4/26/2017 • 1 min to read • Edit Online

Both the Win32 GDI and the graphics engine are completely device-independent. Consequently, applications do not
need to access the hardware directly. Based on an application graphics request, GDI works in conjunction with
device-dependent graphics drivers to provide high quality graphics output for an array of graphics devices. The
same GDI code paths are used for both printing and display devices.
GDI as a Rendering Engine
4/26/2017 • 1 min to read • Edit Online

For rendering operations, the driver must first enable a surface for each PDEV structure that is enabled. A PDEV is a
logical representation of a physical device. If the hardware can be set up as a GDI standard-format bitmap, GDI can
be used to do some or all of the drawing to the bitmap surface. GDI can also handle advanced halftoning.
For information about enabling PDEVs and surfaces, refer to the DrvEnablePDEV and DrvEnableSurface
functions.
GDI-Managed Bitmaps
4/26/2017 • 1 min to read • Edit Online

GDI manages bitmaps in all DIB formats including 1, 4, 8, 16, 24, and 32 bits-per-pixel. GDI can do all line drawing,
filling, text output, and bit block transfer (bitblt) operations on these bitmaps. This makes it possible for the driver to
either have GDI do all graphics rendering, or to implement functions for which its hardware offers special support.
If the device has a frame buffer in a DIB format, GDI can perform any or all graphics output directly to the frame
buffer, thereby reducing the size of the driver. If the device uses a nonstandard-format frame buffer, then the driver
must implement all required drawing functions. GDI can still simulate most drawing functions, although a
performance cost is incurred: the pixels must be copied into a standard format bitmap before they can be operated
on by GDI, and then be copied back to the original format after drawing is complete.
GDI-Managed Lines and Curves
4/26/2017 • 1 min to read • Edit Online

GDI offers improved definitions of lines and curves. Lines are not required to have integer endpoints in DEVICE
coordinates, as was true for Microsoft Windows 3.x. This allows the driver to transform graphics objects without
gross rounding. The fundamental curve in GDI is a Bezier curve (cubic spline) rather than an ellipse. All GDI internal
operations are handled with Bezier curves, which are supported by most high-end devices. For devices that do not
handle Bezier curves, GDI breaks curves down into line segments before calling the driver to draw them.
GDI can download regions to be filled in the form of paths, as well as rectangles. Drivers can decompose paths into
trapezoids or spans for filling.
GDI-Managed Attributes: Brushes
4/26/2017 • 1 min to read • Edit Online

GDI also manages all attributes. GDI passes attributes to the driver as brushes; the driver realizes these brushes by
converting them to a useful internal form. GDI maintains this converted information for the driver. GDI also
maintains all states of the brushes, including bounds, correlation, current position, and line style. The driver can
cache information but is not assumed to maintain any state. Except for initialization and brush realization, GDI calls
the driver only to draw on the device. GDI takes care of transformations, region locking, and pointer exclusion
before it calls the driver.
Whenever a driver needs to use a brush not yet realized, it calls back to GDI. GDI allocates memory for the brush
and calls the driver to realize it and, if necessary, dither it.
GDI Halftoning Capabilities
4/26/2017 • 1 min to read • Edit Online

GDI halftoning produces a quality dither or color-halftone image for printing devices or display devices that do not
already have such capabilities built-in. Color halftoning provides:
Highest quality color and gray-scale reproduction possible on a given device.
Increased visual resolution with a limited set of intensity levels.
Improved color correlation between the different output devices.
Traditional analog halftoning is a cellular process that uses a halftoning screen. The halftoning screen is composed
of cells of equal sizes, with fixed-cell spacing center-to-center. The fixed-cell spacing accommodates the thickness
of the ink, while the size of a dot within each cell can vary to produce the impression of a continuous tone.
On a computer, most printing or screen shading also uses a fixed-cell pixel size. To simulate the variable dot size, a
combination of cluster pixels simulates the halftone screen. In Windows NT-based operating systems, GDI includes
halftoning default parameters that provide a good first approximation. Additional device-specific information can
be added to the system to improve output.
Using GDI 8-Bit-Per-Pixel CMY Mask Modes
4/26/2017 • 8 min to read • Edit Online

In Microsoft Windows 2000, the HT_Get8BPPMaskPalette function returned 8-bit-per-pixel monochrome or CMY
palettes. In Windows XP and later, this function has been modified so that it also returns inverted-index CMY
palettes when the Use8BPPMaskPal parameter is set to TRUE. The type of palette returned depends on the value
stored in pPaletteEntry[0] when HT_Get8BPPMaskPalette is called. If pPaletteEntry[0] is set to 'RGB0', an
inverted-index palette is returned. If pPaletteEntry[0] is set to 0, a normal CMY palette is returned.
The reason for this change in behavior of HT_Get8BPPMaskPalette is that when Windows GDI uses ROPs, which
are based on the indexes in a palette and not on the palette colors, it assumes that index 0 of the palette is always
black and that the last index is always white. GDI does not check the palette entries. This change in
HT_Get8BPPMaskPalette ensures correct ROP output, instead of a result that is inverted.
To correct the GDI ROP behavior, GDI in Windows XP and later supports a special CMY palette composition format
in which the CMY mask palette entries start at index 255 (white) and work down to index 0 (black), instead of
starting at index 0 (white) and working up to index 255 (black). The CMY inverted modes also move all CMY mask
color entries to the middle of a full 256-entry palette, with the beginning and end of the palette padded with equal
numbers of black and white entries.
Note In the discussion that follows, the term CMY mode refers to a mode supported in the previous
implementation of HT_Get8BPPMaskPalette. The term CMY_INVERTED mode refers to modes supported only on
Windows XP and later GDI, in which this function inverts bitmask indexes when pPaletteEntry[0] is set to 'RGB0'.
The following steps are required for all Windows XP and later drivers that use Windows GDI halftone 8-bit-per-pixel
CMY mask modes. If you are developing a driver for Windows 2000, you should limit the driver's use to 8-bit-per-
pixel monochrome palettes.
1. Set the flHTFlags member of the GDIINFO structure to HT_FLAG_INVERT_8BPP_BITMASK_IDX so that GDI
will render images in one of the CMY_INVERTED modes.
2. Set pPaletteEntry[0] as follows prior to a call to HT_Get8BPPMaskPalette:

pPaletteEntry[0].peRed   = 'R';
pPaletteEntry[0].peGreen = 'G';
pPaletteEntry[0].peBlue  = 'B';
pPaletteEntry[0].peFlags = '0';

To do this, a caller should use the HT_SET_BITMASKPAL2RGB macro (defined in winddi.h). Here is an
example showing the use of this macro:

HT_SET_BITMASKPAL2RGB(pPaletteEntry)

Here pPaletteEntry is the pointer to the PALETTEENTRY that was passed in the call to the
HT_Get8BPPMaskPalette function. When this macro completes execution, pPaletteEntry[0] will contain the
string 'RGB0'.
3. Check the pPaletteEntry parameter returned from the call to HT_Get8BPPMaskPalette using the
HT_IS_BITMASKPALRGB macro, which is defined in winddi.h. Here is an example showing the use of this
macro.
InvCMYSupported = HT_IS_BITMASKPALRGB(pPaletteEntry)

In this expression, pPaletteEntry is the pointer to the PALETTEENTRY that was passed to the
HT_Get8BPPMaskPalette function. If this macro returns TRUE, then GDI does support the inverted CMY 8-
bit-per-pixel bitmask modes. The caller must use a translation table to convert the palette indexes to ink
levels. See Translating 8-Bit-Per-Pixel Halftone Indexes to Ink Levels for an example of a function that
generates this translation table.
If this macro returns FALSE, then the current version of GDI does not support the inverted CMY 8-bit-per-
pixel bitmask modes. In that case, GDI supports only the older CMY noninverted modes.
For GDI versions that support the 8-bit-per-pixel CMY_INVERTED modes, the meaning of the CMYMask parameter
value passed to the HT_Get8BPPMaskPalette function has been changed. The following table summarizes the
changes:

CMYMask value 0
    CMY mode indexes (pPaletteEntry[0] != 'RGB0'): 0: White. 1 to 254: Light Gray to Dark Gray. 255: Black.
    CMY_INVERTED mode indexes (pPaletteEntry[0] == 'RGB0'): 0: Black. 1 to 254: Dark Gray to Light Gray.
    255: White.

CMYMask value 1
    CMY mode indexes: 0: White. 1 to 123: 123 5x5x5 colors. 124 to 255: Black.
    CMY_INVERTED mode indexes: 0 to 65: Black. 66 to 189: the 123 5x5x5 colors plus one duplicate; the entry at
    index 127 is copied to index 128 so that the XOR ROP works correctly. 190 to 255: White.

CMYMask value 2
    CMY mode indexes: 0: White. 1 to 214: 214 6x6x6 colors. 215 to 255: Black.
    CMY_INVERTED mode indexes: 0 to 20: Black. 21 to 234: 214 6x6x6 colors. 235 to 255: White.

CMYMask values 3 to 255
    CMY mode indexes: 0: White. 1 to 254: CxMxY color bitmask, where C, M, and Y represent the number of levels
    of cyan, magenta, and yellow, respectively. 255: Black. Note: For these modes, a valid combination must not
    have any of the cyan, magenta, or yellow ink levels equal to zero. For such a combination,
    HT_Get8BPPMaskPalette indicates an error condition by returning a zero-count palette in its pPaletteEntry
    parameter.
    CMY_INVERTED mode indexes: 0: Black. 1 to 254: centered CxMxY colors, padded with black at the beginning and
    white at the end, where C, M, and Y represent the number of levels of cyan, magenta, and yellow,
    respectively; if CxMxY is an odd number, the entry at index 128 is a duplicate of the one at index 127.
    255: White. Note: The (C x M x Y) indexes are centered in the 256-entry palette; that is, equal numbers of
    black entries pad the low end of the palette and white entries pad the high end. As with the CMY mode, a
    valid combination must not have any of the cyan, magenta, or yellow ink levels equal to zero; for such a
    combination, HT_Get8BPPMaskPalette returns a zero-count palette in its pPaletteEntry parameter.

For a value of CMYMask of 0 (Gray scale), the caller can process either the CMY mode or the CMY_INVERTED
mode. Note, however, that GDI ROPs are correctly processed only in the CMY_INVERTED mode.
CMY Mode: Indexes 0 to 255 represent a gray scale from white to black.
CMY_INVERTED Mode: Indexes 0 to 255 represent a gray scale ranging from black to white.
For any valid value of CMYMask from 1 to 255, the caller should use the example function shown in
Translating 8-Bit-Per-Pixel Halftone Indexes to Ink Levels to translate indexes to ink levels.
For any valid value of CMYMask from 1 to 255, the CMY_INVERTED modes pad the palettes with black
entries at the beginning of the array, and an equal number of white entries at the end of the array. The
middle of the array is filled with the other colors. This ensures that all 256 of the color palette entries are
symmetrically distributed so that GDI ROPs, which are index-based, not color-based, work correctly. The
colors are symmetrically distributed when the color at index N is the inverse of the color at index (256 - N).
When a color and its inverse are printed together, the result is black. In other words, for a given color and its
inverse, the two cyan ink levels add to the maximum cyan ink level, as do the two magenta ink levels, and the
two yellow ink levels. The resulting ink levels correspond to black.
For example, a CMY palette with three levels each of cyan, magenta, and yellow has a total of 27 (3 x 3 x 3)
indexes for colors, including black and white. Because 27 is an odd number, and because GDI requires that a
CMY_INVERTED mode palette be padded with equal numbers of black and white entries, GDI duplicates the
entry at the middle index (index 13 of the 27 colors). With the entries at indexes 13 and 14 now the same,
the palette will now have 28 colors. To fill the palette, GDI places 114 black entries at the beginning of the palette
(indexes 0 to 113), places the 28 colors at indexes 114 (black) through 141 (white), and fills the remaining
114 entries with white (indexes 142 through 255). This makes a total of 256 entries (114 + 28 + 114 = 256
entries). This layout of the indexes ensures that all ROPs will be correctly rendered. The example function in
Translating 8-Bit-Per-Pixel Halftone Indexes to Ink Levels shows how to generate the ink levels as well as a
Windows 2000 CMY332 index translation table.
The following table lists the cyan, magenta, and yellow levels for the 3 x 3 x 3 palette discussed in the
previous paragraph. The 28 colors (27 original palette colors plus one duplicate) are embedded in the middle
of the 256-color palette, with equal amounts of black padding at the beginning and white padding at the
end. The palette is symmetric, meaning that if the ink levels at index N are added to those at index (256 - N),
the result will be black (cyan, magenta, and yellow levels = 2).

PALETTE INDEX (3X3X3 INDEX)       CYAN LEVEL (0 TO 2)   MAGENTA LEVEL (0 TO 2)   YELLOW LEVEL (0 TO 2)

0 to 113 (Black)                  2                     2                        2
114 (0) (Black)                   2                     2                        2
115 (1)                           2                     2                        1
116 (2)                           2                     2                        0
117 (3)                           2                     1                        2
118 (4)                           2                     1                        1
119 (5)                           2                     1                        0
120 (6)                           2                     0                        2
121 (7)                           2                     0                        1
122 (8)                           2                     0                        0
123 (9)                           1                     2                        2
124 (10)                          1                     2                        1
125 (11)                          1                     2                        0
126 (12)                          1                     1                        2
127 (13) (copied to index 128)    1                     1                        1
128 (14) (duplicate of 127)       1                     1                        1
129 (15)                          1                     1                        0
130 (16)                          1                     0                        2
131 (17)                          1                     0                        1
132 (18)                          1                     0                        0
133 (19)                          0                     2                        2
134 (20)                          0                     2                        1
135 (21)                          0                     2                        0
136 (22)                          0                     1                        2
137 (23)                          0                     1                        1
138 (24)                          0                     1                        0
139 (25)                          0                     0                        2
140 (26)                          0                     0                        1
141 (27) (White)                  0                     0                        0
142 to 255 (White)                0                     0                        0

If the requested palette is a CMY mode palette (not a CMY_INVERTED mode palette), then for values of
CMYMask from 3 to 255, the rendered 8-bit-per-pixel byte index bits have the following meaning. In this case,
the bit patterns represent ink levels that can be used directly without translation. This also applies when a
CMY_INVERTED mode byte index is mapped to CMY mode using a translation table's CMY332Idx member. See
Translating 8-Bit-Per-Pixel Halftone Indexes to Ink Levels for more information.
Bit 7 6 5 4 3 2 1 0
| | | | | |
+---+ +---+ +-+
| | |
| | +-- Yellow 0-3 (Max. 4 levels)
| |
| +-- Magenta 0-7 (Max. 8 levels)
|
+-- Cyan 0-7 (Max. 8 levels)



Translating 8-Bit-Per-Pixel Halftone Indexes to Ink
Levels
4/26/2017 • 4 min to read • Edit Online

The GenerateInkLevels function shown here provides an example of how to translate 8-bit-per-pixel halftone
indexes to ink levels. These indexes are contained in CMY mode and CMY_INVERTED mode palettes that GDI's
HT_Get8BPPMaskPalette function returns in its pPaletteEntry parameter. GenerateInkLevels generates a 256-
element array of INKLEVELS structures.
This function can be used to generate either a Windows 2000 CMY mode or a post-Windows 2000
CMY_INVERTED mode translation table. This function can also be used to generate a Windows 2000 CMY mode
CMY332 reverse-mapping index table. (CMY332 uses three bits each for cyan and magenta, and two bits for
yellow.) When the CMYMask value is in the range 3 to 255, the function's caller can use this table to map post-
Windows 2000 CMY_INVERTED indexes to Windows 2000 CMY indexes for currently existing drivers.
INKLEVELS Structure

typedef struct _INKLEVELS {


BYTE Cyan; // Cyan level from 0 to max
BYTE Magenta; // Magenta level from 0 to max
BYTE Yellow; // Yellow level from 0 to max
BYTE CMY332Idx; // Original windows 2000 CMY332 Index
} INKLEVELS, *PINKLEVELS;

Example GenerateInkLevels Function


The GenerateInkLevels function computes an 8-bit-per-pixel translation table of INKLEVELS structures, based on
the values in the CMYMask and CMYInverted parameters. This function generates an INKLEVELS translation table
for a valid CMYMask value in the range 0 to 255.
When this function is called, the pInkLevels parameter must point to a valid memory location of 256 INKLEVELS
entries. If the function returns TRUE, then pInkLevels can be used to translate 8-bit-per-pixel indexes to ink levels,
or to map to the older CMY332 indexes. If the function is called with CMYMask set to an invalid value (a value from
3 to 255 in which any of the cyan, magenta, or yellow levels is zero), the function returns FALSE.

BOOL
GenerateInkLevels(
PINKLEVELS pInkLevels, // Pointer to 256 INKLEVELS table
BYTE CMYMask, // CMYMask mode
BOOL CMYInverted // TRUE for CMY_INVERTED mode
)

{
PINKLEVELS pILDup;
PINKLEVELS pILEnd;
INKLEVELS InkLevels;
INT Count;
INT IdxInc;
INT cC; // Number of Cyan levels
INT cM; // Number of Magenta levels
INT cY; // Number of Yellow levels
INT xC; // Max. number Cyan levels
INT xM; // Max. number Magenta levels
INT xY; // Max. number Yellow levels
INT iC;
INT iM;
INT iY;
INT mC;
INT mM;

switch (CMYMask) {

case 0:

cC =
cM =
xC =
xM = 0;
cY =
xY = 255;
break;

case 1:
case 2:

cC =
cM =
cY =
xC =
xM =
xY = 3 + (INT)CMYMask;
break;

default:

cC = (INT)((CMYMask >> 5) & 0x07);


cM = (INT)((CMYMask >> 2) & 0x07);
cY = (INT)( CMYMask & 0x03);
xC = 7;
xM = 7;
xY = 3;
break;
} // end switch statement

Count = (cC + 1) * (cM + 1) * (cY + 1);

if ((Count < 1) || (Count > 256)) {


return(FALSE);
}

InkLevels.Cyan =
InkLevels.Magenta =
InkLevels.Yellow =
InkLevels.CMY332Idx = 0;
mC = (xM + 1) * (xY + 1);
mM = xY + 1;
pILDup = NULL;
if (CMYInverted) {

//
// Move the pInkLevels to the first entry following
// the centered embedded entries.
// Skipped entries are set to white (zero). Because this
// is a CMY_INVERTED mode, entries start from back of the
// table and move toward the beginning of the table.
//

pILEnd = pInkLevels - 1;
IdxInc = ((256 - Count - (Count & 0x01)) / 2);
pInkLevels += 255;

while (IdxInc--) {
    *pInkLevels-- = InkLevels;
}

if (Count & 0x01) {

//
// If we have an odd number of entries, we need to
// duplicate the center one for the XOR ROP to
// operate correctly. pILDup will always be index
// 127, and the duplicates are at indexes 127 and 128.
//
pILDup = pInkLevels - (Count / 2) - 1;
}

//
// We are running from the end of table to the beginning,
// because in CMY_INVERTED mode, index 0 is black and
// index 255 is white. Since we generate only Count
// white, black, and colored indexes, and place them at
// the center, we will change xC, xM, xY max. indexes
// to the same as cC, cM and cY
// so we only compute cC*cM*cY entries.
//

IdxInc = -1;
xC = cC;
xM = cM;
xY = cY;

} else {
IdxInc = 1;
pILEnd = pInkLevels + 256;
}

//
// In the following composition of ink levels, the index
// always runs from 0 ink level (white) to maximum ink
// levels (black). With CMY_INVERTED mode, we compose ink
// levels from index 255 to index 0 rather than from index
// 0 to 255.
//

if (CMYMask) {

INT Idx332C;
INT Idx332M;

for (iC = 0, Idx332C = -mC; iC <= xC; iC++) {

if (iC <= cC) {

InkLevels.Cyan = (BYTE)iC;
Idx332C += mC;
}

for (iM = 0, Idx332M = -mM; iM <= xM; iM++) {

if (iM <= cM) {

InkLevels.Magenta = (BYTE)iM;
Idx332M += mM;
}

for (iY = 0; iY <= xY; iY++) {

if (iY <= cY) {

InkLevels.Yellow = (BYTE)iY;
}

InkLevels.CMY332Idx = (BYTE)(Idx332C + Idx332M) + InkLevels.Yellow;
*pInkLevels = InkLevels;

if ((pInkLevels += IdxInc) == pILDup) {

*pInkLevels = InkLevels;
pInkLevels += IdxInc;
}
}
}
}

//
// Now, if we need to pad black at the other end of the
// translation table, then do it here. Notice that
// InkLevels are at cC, cM and cY here and CMY332Idx
// is at black.
//

while (pInkLevels != pILEnd) {

*pInkLevels = InkLevels;
pInkLevels += IdxInc;
}

} else {

//
// Gray Scale case
//

for (iC = 0; iC < 256; iC++, pInkLevels += IdxInc) {

pInkLevels->Cyan =
pInkLevels->Magenta =
pInkLevels->Yellow =
pInkLevels->CMY332Idx = (BYTE)iC;
}
}

return(TRUE);
}



GDI/Driver Division of Labor
4/26/2017 • 1 min to read • Edit Online

To understand graphics driver design, it is important to understand the roles of GDI and the driver, and how they
negotiate. GDI, with its enhanced capabilities, can handle many operations that previously required a graphics
driver. GDI also has the responsibility of managing data structures critical to graphics operations, such as surfaces,
although each graphics driver must have access to them.
GDI Communication with the Driver
4/26/2017 • 1 min to read • Edit Online

The driver exports only one function to GDI: DrvEnableDriver. All other driver-supported functions, including the
DrvDisableDriver function, are exposed to GDI through an array of pointers. A GDI call to DrvEnableDriver
initializes the driver and passes back the list of driver-supported graphics DDI functions. While there are some
functions a driver must support, GDI will handle those operations not included in the function list received from the
driver's DrvEnableDriver routine. GDI calls DrvDisableDriver when the driver is to be unloaded. Graphics DDI
functions are discussed in depth in Using the Graphics DDI.
GDI makes a large number of objects and services available to the driver. These fall into two categories: user objects
and service routines.
GDI User Objects
4/26/2017 • 1 min to read • Edit Online

GDI maintains important internal data structures, but gives the driver access to the public fields of these structures
by passing them down as user objects. User objects are intermediate data structures that provide an interface
between GDI data structures and the drivers that need access to the information within these structures. The driver
can pass the pointer to a user object back to GDI to query for information or to ask for various services. User
objects with public fields provide the following advantages:
They eliminate problems associated with direct access to internal GDI data structures.
They provide a place to hold GDI data for the driver. For example, a PATHOBJ structure can hold all the extra
data required to enumerate a complex object like a path.
The following user objects are available:

OBJECT       DESCRIPTION

BRUSHOBJ     Defines the brush objects for graphic functions that output lines, text, or fills.
             Drivers can call BRUSHOBJ services to realize brushes or to find realizations previously
             cached by GDI.

CLIPOBJ      Provides the driver with access to a clip region for drawing or filling. This region can
             be enumerated as a series of rectangles.

FLOATOBJ     Allows graphics drivers to emulate floating-point operations. Floating-point operations
             are disabled for all other kernel-mode drivers.

FONTOBJ      Gives the driver access to information about a particular instance (or realization) of a
             font.

PALOBJ       A structure containing RGB palette colors, accessible via the PALOBJ_cGetColors and
             DrvSetPalette functions. The PALOBJ structure contains no public members.

PATHOBJ      Defines a path that specifies what is to be drawn (lines or Bezier curves). A PATHOBJ
             structure is passed to the driver to describe a set of lines and Bezier curves that are
             to be stroked or filled.

STROBJ       Enumerates for the driver a list of glyph handles and positions that describe how a text
             string is to be drawn.

SURFOBJ      Identifies a surface, which can be a GDI bitmap, a device-dependent bitmap, or a
             device-managed surface. See Surface Types for more information.

XFORMOBJ     Describes an arbitrary linear two-dimensional transform, such as for geometric wide
             lines.

XLATEOBJ     Defines the translations needed to convert pixels from the source surface format to the
             destination surface format.



GDI Service Routines
4/26/2017 • 1 min to read • Edit Online

GDI exports many service routines, whose names have the form EngXxx. The driver dynamically links to win32k.sys
to directly access these routines. GDI service routines include surface management, rendering simulations, and
path, palette, font, and text services. These services are discussed in detail in GDI Support Services.
PDEV Negotiation
4/26/2017 • 1 min to read • Edit Online

One of the primary responsibilities of any graphics driver is to enable a PDEV during driver initialization. A PDEV is
a logical representation of the physical device. This representation is defined by the driver and is typically a private
data structure. Refer to DrvEnablePDEV for more information about enabling PDEVs.
Through the DrvEnablePDEV function, the driver must provide information to GDI that describes the requested
device and its capabilities. One piece of important information that the driver gives GDI is the set of graphics
capability flags (GCAPS_Xxx and GCAPS2_Xxx flags) in the flGraphicsCaps and flGraphicsCaps2 members of the
DEVINFO structure.
The capability flags allow GDI to determine which operations the PDEV supports. For example, GDI tests the
capability flags that indicate whether the PDEV can handle Bezier curves and geometric wide lines before GDI
attempts to call the DrvStrokePath function to draw paths with these primitive types. If the capability flags indicate
that the PDEV cannot handle these primitive types, GDI will break down the lines or curves so it can make simpler
calls to the driver.
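For illustration only, the fragment below shows how such capabilities might be reported from DrvEnablePDEV. The
particular GCAPS_Xxx bits chosen here are examples; a driver should set only the bits that its hardware and its
DrvXxx implementations actually honor.

// Hypothetical helper called from the driver's DrvEnablePDEV implementation.
VOID ReportGraphicsCaps(DEVINFO *pdi)
{
    pdi->flGraphicsCaps  = GCAPS_BEZIERS       |   // DrvStrokePath accepts Bezier curves
                           GCAPS_GEOMETRICWIDE |   // ...and geometric wide lines
                           GCAPS_OPAQUERECT;       // DrvTextOut draws opaque rectangles
    pdi->flGraphicsCaps2 = 0;                      // no GCAPS2_Xxx capabilities claimed
}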
From the driver's side, whenever the driver gets an advanced path-related call from GDI, it can return FALSE if the
path or clipping is too complex for the device to process.
The driver cannot return FALSE from DrvStrokePath when handling a cosmetic line because the driver must handle
any complex clipping or styling for cosmetic lines. However, DrvStrokePath can return FALSE if the path has Bezier
curves or geometric lines. When this occurs, GDI breaks the call down to simpler calls, just as it does if the capability
bits are not set. For example, if DrvStrokePath returns FALSE when it is sent a geometric line, GDI simplifies the line
and calls the DrvFillPath function.
If DrvStrokePath is to report an error, it must return DDI_ERROR.
This kind of negotiation between GDI and the driver, for functions that depend on the PDEV, permits GDI and the
driver to produce high quality output without excess communication.
Surface Negotiation
4/26/2017 • 1 min to read • Edit Online

Drawing and text output require a surface on which to draw. This surface is created by the DrvEnableSurface
function and is referred to as the primary surface. This surface is also known as the on-screen surface, because it
appears in the video display. There is only one primary surface enabled per PDEV although a driver can support
several PDEVs. Drivers that support the DrvCreateDeviceBitmap function can create and use additional surfaces.
These bitmap surfaces are referred to as secondary surfaces or off-screen surfaces. For either type of surface, the
driver is responsible for determining the type of drawing operations it supports.
Surface Coordinates
4/26/2017 • 1 min to read • Edit Online

A device surface is a subset of an array of 2²⁸ by 2²⁸ pixels. These pixels are addressed by pairs of 28-bit signed
numbers; the upper-left pixel of the device surface is given the coordinates (0,0). The device surface lies in the
lower-right quadrant of this coordinate space, where both coordinates are nonnegative.
DC Origin
4/26/2017 • 1 min to read • Edit Online

Application programs are required to keep their graphics within an array of 2²⁷ by 2²⁷ pixels. The device space has
additional size at the graphics DDI level because Window Manager may offset the application's coordinate system
by a signed 27-bit coordinate, the DC origin. The DC origin is not visible to the driver; the driver sees graphics
coordinates only after the offset has been applied.
FIX Coordinates
4/26/2017 • 1 min to read • Edit Online

Graphics DDIs use fractional coordinates that can specify a location on the device surface within one-sixteenth of a
pixel. (On vector devices, the fractional coordinates are sixteen times more accurate than the device resolution.) The
fractional coordinates are represented as 32-bit numbers in signed 28.4 bit FIX notation. In this notation, the
highest-order 28 bits represent the integer part of the coordinate, and the lowest 4 bits represent the fractional
part. For example, 0x0000003C equals +3.7500, and 0xFFFFFFE8 equals -1.5000.
FIX coordinates represent control points for lines and Bezier curves. For certain objects, such as rectangular clip
regions, GDI uses signed 32-bit integers to represent coordinates. Because coordinates are 28-bit quantities, the
highest 5 bits of an integer coordinate are either all cleared or all set.
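The following illustrative macros (not part of the DDI; winddi.h defines its own FIX type) show the 28.4 arithmetic
described above.

#define PELS_TO_FIX(l)    ((LONG)((l) << 4))      // integer pixels -> 28.4 FIX
#define FIX_TO_PELS(f)    ((LONG)((f) >> 4))      // 28.4 FIX -> integer pixels (arithmetic shift rounds down)
#define FIX_FRACTION(f)   ((LONG)((f) & 0xF))     // fractional part, in sixteenths of a pixel

// Examples: PELS_TO_FIX(3) + 12 == 0x3C, which is 3.75 pixels;
// FIX_TO_PELS(0xFFFFFFE8) == -2, because -1.5 rounds down to -2.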
Surface Types
4/26/2017 • 3 min to read • Edit Online

Surface types can be viewed in the context of how they are handled. The following types exist:
Engine-managed surfaces
Device-managed surfaces (standard-format bitmaps)
Device-managed surfaces (nonstandard-format bitmaps)
Engine-Managed Surfaces
An engine-managed surface:
Is created and managed by GDI.
Is created as a device-independent bitmap (DIB) in one of the standard DIB formats: top-down, in which the
origin is at the upper-left corner, or bottom-up, in which the origin is at the lower-left corner.
Is of type STYPE_BITMAP.
Does not have a corresponding device handle to a surface.
A standard-format bitmap is a single-plane, packed-pixel (where the data for each pixel is stored in a contiguous
manner) format bitmap. Each scan line of the bitmap is aligned on a four-byte boundary.
Bitmaps created in the EngCreateBitmap function are in DIB format. A bitmap must be in DIB format for the
engine to manage it.
Device-Managed Surfaces (Standard-Format Bitmaps)
A device-managed surface:
Is created by a call to the device driver's DrvCreateDeviceBitmap function.
Has an associated device handle to a surface (DHSURF; for more information, see SURFOBJ).
Can be either opaque or nonopaque.
An opaque device-managed surface is one for which GDI has neither any information about the bitmap format nor
a reference to the bits in the bitmap. For these reasons, the driver must support, at minimum, the DrvBitBlt,
DrvTextOut, and DrvStrokePath functions. The type of such a surface is STYPE_DEVBITMAP.
A nonopaque device-managed surface is one for which GDI has information about the bitmap format and knows
the location of the bits within the bitmap. Because of this, the driver does not need to implement any drawing
operations, deferring all of them to GDI. The type of such a surface is STYPE_BITMAP.
For a driver to convert a device-managed opaque bitmap to one that is nonopaque, it must call the
EngModifySurface function. With this call, the driver is informing GDI of the bitmap format and location of the
bitmap in memory.
When a driver has a device-managed DIB surface, the driver can call back to GDI to have GDI draw on the surface. A
driver that is managing its own surfaces, but is using a DIB, can still refer calls back to GDI by wrapping a DIB
(which is created with the EngCreateBitmap function) around its surface. The following steps describe how the
driver can have GDI draw on a device-managed DIB surface:
1. The driver calls EngCreateBitmap to create a DIB engine-managed surface.
2. The driver calls the EngCreateDeviceBitmap function to create a device-dependent bitmap (DDB) surface,
which is a device-managed DIB surface.
3. The driver internally saves the engine-managed DIB data in the device-managed DDB data.
4. GDI always calls the driver to interact with the surface through the device-managed DDB data.
5. When the driver receives a call from GDI and cannot handle the call (for example, the driver cannot handle
complex clipping), the driver retrieves the DIB data that is stored in the DDB data and passes the DIB data to
GDI to render.
Device-Managed Surfaces (Nonstandard-Format Bitmaps)
A driver can enable a device-managed non-DIB surface by calling the EngCreateDeviceSurface function to have
GDI create the surface and return a handle to it. GDI relies on the driver to access, to control drawing to, and to read
from a device-managed surface.
A device-dependent bitmap (DDB), which is sometimes called a device-format bitmap, is another type of non-DIB,
device-managed surface. The DDB is supported to allow certain drivers, such as the VGA driver, to implement faster
bitmap-to-screen block transfers. The DDB also allows drivers to draw to banked or non-DIB bitmaps in offscreen
display memory. If a DDB is required, the driver can support the DrvCreateDeviceBitmap function and call the
EngCreateDeviceBitmap function to have the engine return a handle to the bitmap.
GDI Color Space Conversions
4/26/2017 • 2 min to read • Edit Online

GDI uses three RGB color spaces for its bitmap representations. In each of these color spaces, three bitfields, or
color channels, are used to specify the number of bits used for red, green, and blue, respectively, in a given color. To
be able to match GDI's capabilities with bitmaps, video drivers must be able to convert from one RGB color space to
another.
GDI recognizes the following RGB color spaces:
5:5:5 RGB: five-bit color channels for red, green, and blue
5:6:5 RGB: a five-bit color channel for red, a six-bit color channel for green, and a five-bit color channel for
blue
8:8:8 RGB: eight-bit color channels for red, green, and blue
In general, when converting from a color channel with more bits to one with fewer bits, GDI discards the lower-
order bits. When the conversion goes from a color channel with fewer bits to one with more bits, all of the bits of
the smaller channel are copied to the larger channel. To fill out the remaining bits of the larger channel, some of the
bits of the smaller channel are copied again to the larger channel. The following table summarizes the rules that
GDI uses to convert from one RGB color space to another. In this table, color channels whose values change in the
conversion are shown in bold font style.
GDI Color Space Conversion Rules
5:5:5 to 5:6:5
    Rule: The most-significant bit (MSB) of the source's green channel is appended to the low-order end of the
    target's green channel.
    Example: (0x15, 0x19, 0x1D) becomes (0x15, 0x33, 0x1D). Only the green channel changes: the 5-bit source
    value 1 1001 (binary) is converted to the 6-bit value 11 0011.

5:5:5 to 8:8:8
    Rule: For each channel, the three MSBs of the source channel are appended to the low-order end of the target
    channel.
    Example: (0x15, 0x19, 0x1D) becomes (0xAD, 0xCE, 0xEF). In the red channel, 1 0101 becomes 1010 1101.
    Similar changes occur in the green and blue channels.

5:6:5 to 5:5:5
    Rule: Discard the least-significant bit (LSB) of the source's green channel.
    Example: (0x15, 0x33, 0x1D) becomes (0x15, 0x19, 0x1D). Only the green channel changes: discarding the
    lowest bit of 11 0011 gives 1 1001.

5:6:5 to 8:8:8
    Rule: For the 5-bit (red and blue) source channels, copy the three MSBs and append them to the low-order end
    of the target channel. For the 6-bit green channel, copy the two MSBs and append them to the low-order end
    of the target channel.
    Example: (0x15, 0x33, 0x1D) becomes (0xAD, 0xCF, 0xEF). In the red channel, 1 0101 becomes 1010 1101. In the
    green channel, 11 0011 becomes 1100 1111. The blue channel is transformed like the red channel.

8:8:8 to 5:5:5
    Rule: Discard the three LSBs from each channel of the source.
    Example: (0xAB, 0xCD, 0xEF) becomes (0x15, 0x19, 0x1D). In the red channel, 1010 1011 becomes 1 0101.
    Similar transformations occur in the other two channels.

8:8:8 to 5:6:5
    Rule: Discard the three LSBs from the red and blue channels, and the two LSBs from the green channel.
    Example: (0xAB, 0xCD, 0xEF) becomes (0x15, 0x33, 0x1D). In the green channel, 1100 1101 becomes 11 0011. The
    red and blue channels change exactly as in the previous conversion.
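As a sketch of these rules, the following helpers (illustrative names, assuming a packed 16-bit 5:6:5 pixel and a
32-bit xRGB 8:8:8 pixel layout) perform the conversions between 5:6:5 and 8:8:8; the remaining conversions follow
the same bit-replication and bit-discarding pattern.

ULONG Convert565To888(USHORT p565)
{
    ULONG r5 = (p565 >> 11) & 0x1F;
    ULONG g6 = (p565 >>  5) & 0x3F;
    ULONG b5 =  p565        & 0x1F;

    // Replicate the high-order source bits into the low-order target bits.
    ULONG r8 = (r5 << 3) | (r5 >> 2);   // 5 bits -> 8 bits
    ULONG g8 = (g6 << 2) | (g6 >> 4);   // 6 bits -> 8 bits
    ULONG b8 = (b5 << 3) | (b5 >> 2);

    return (r8 << 16) | (g8 << 8) | b8;
}

USHORT Convert888To565(ULONG p888)
{
    // Discard the low-order bits of each source channel.
    ULONG r = (p888 >> 19) & 0x1F;      // keep the top 5 of 8 red bits
    ULONG g = (p888 >> 10) & 0x3F;      // keep the top 6 of 8 green bits
    ULONG b = (p888 >>  3) & 0x1F;      // keep the top 5 of 8 blue bits

    return (USHORT)((r << 11) | (g << 5) | b);
}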



Hooking Versus Punting
4/26/2017 • 3 min to read • Edit Online

The terms hooking and punting refer to a driver's decision about whether to provide standard bitmap drawing
operations itself or to rely on GDI to provide them. If the driver implements engine-managed surfaces, GDI can handle all
drawing operations. A driver can, however, provide one or more of the drawing functions if its hardware can
accelerate those operations. It does this by implementing, or hooking, a DrvXxx function.
A driver writer may wish to implement only a subset of the drawing operations a particular graphics DDI entry
point implements. For any operations the driver does not support, it can call the appropriate GDI functions to carry
them out. This is referred to as punting to GDI. There are some situations in which the operation must be
implemented in the driver. For example, if the driver implements a device-managed surface, certain drawing
functions must be implemented in the display driver.
Hooking
By default, when a drawing surface is an engine-managed surface, GDI handles the drawing (rendering) operation.
For a driver to take advantage of hardware that offers acceleration for some or all of these drawing functions for a
given surface, or to make use of special block transfer hardware, it can hook these functions. To hook calls, the
driver specifies the hooks as flags in the flHooks parameter of the EngAssociateSurface and EngModifySurface
functions.
If the driver specifies the hook flag of a function, it must provide that function in its list of supported graphics DDI
entry points. The driver can optimize the operation where there is hardware support. Such a driver might handle
only certain cases on a hooked call. For example, if complicated graphics are requested on a call that is hooked, it
may still be more efficient to punt the call back to GDI, allowing GDI to handle the operation.
Here is another example of a driver that chooses whether to handle a hooked call. Consider a driver that supports
hardware capable of handling bit-block-transfer calls with certain ROPs. Even though this driver can carry out many
operations on its own, it is otherwise just a frame buffer. Such a driver will return a handle to the bitmap surface for
the frame buffer as the surface for its PDEV, but it will hook the DrvBitBlt call for itself. When GDI calls DrvBitBlt,
the driver can check the ROP to see if it is one of those supported by the hardware. If not, the driver can pass the
operation back to GDI with a call to the EngBitBlt function.
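A minimal sketch of that pattern appears below; HwSupportsRop4 and HwAcceleratedBlt are hypothetical driver routines,
and a real driver would also examine the clip complexity, the translation, and the surface types before deciding.

BOOL APIENTRY
DrvBitBlt(SURFOBJ *psoTrg, SURFOBJ *psoSrc, SURFOBJ *psoMask, CLIPOBJ *pco,
          XLATEOBJ *pxlo, RECTL *prclTrg, POINTL *pptlSrc, POINTL *pptlMask,
          BRUSHOBJ *pbo, POINTL *pptlBrush, ROP4 rop4)
{
    // Handle only the cases the hardware accelerates.
    if (HwSupportsRop4(rop4) && (pco == NULL || pco->iDComplexity == DC_TRIVIAL)) {
        return HwAcceleratedBlt(psoTrg, psoSrc, pxlo, prclTrg, pptlSrc, rop4);
    }

    // Too complex for the hardware: punt the call back to the GDI simulation.
    return EngBitBlt(psoTrg, psoSrc, psoMask, pco, pxlo, prclTrg,
                     pptlSrc, pptlMask, pbo, pptlBrush, rop4);
}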
Drivers that support device-managed surfaces must hook out some of the drawing functions; namely
DrvCopyBits, DrvTextOut, and DrvStrokePath. Although GDI simulations can handle other drawing functions, it
is recommended for performance reasons that drivers of this type hook out other functions, such as the DrvBitBlt
and DrvRealizeBrush functions, because simulation requires drawing from and to the surface.
Punting
Punting a call back to GDI means calling the corresponding GDI simulation. In general, for every DrvXxx
graphics call, there is a corresponding GDI EngXxx simulation call that takes the same arguments. As long as the
driver has made the bitmap nonopaque, all parameters can be passed without change to a GDI simulation. For each
call the driver punts back to GDI, the size of the driver is reduced (since the code for that functionality can be
omitted). However, because the engine owns the call, the driver does not have control over the execution speed. For
some complicated cases, there may be no real advantage to providing support in the driver.
Hookable GDI Graphics Output Functions
The graphics output functions that the driver can hook and the corresponding GDI simulations are listed in the
following table.
DRIVER GRAPHICS OUTPUT FUNCTION CORRESPONDING GDI SIMULATION

DrvBitBlt EngBitBlt

DrvPlgBlt EngPlgBlt

DrvStretchBlt EngStretchBlt

DrvStretchBltROP EngStretchBltROP

DrvTextOut EngTextOut

DrvStrokePath EngStrokePath

DrvFillPath EngFillPath

DrvStrokeAndFillPath EngStrokeAndFillPath

DrvLineTo EngLineTo

DrvCopyBits EngCopyBits

DrvAlphaBlend EngAlphaBlend

DrvGradientFill EngGradientFill

DrvTransparentBlt EngTransparentBlt



GDI Support Services
4/26/2017 • 1 min to read • Edit Online

GDI exports many service routines that can simplify driver design. The driver can call these routines directly.
Routines that are general graphics engine services have names that begin with Eng. Service routines related
to a particular object always start with the name of the object; for example, CLIPOBJ_cEnumStart is a CLIPOBJ
service.
Note The service routines in which the first argument is a pointer to a user object are methods on that user object,
and are called using the usual C++ conventions. Therefore, drivers written in C++ can access the service routines
as methods.
These service routines fall into the following categories:
Surface management
Palette services
Path services
Window services
Rendering services
Font and text services
Memory services
Event services
File, Module, and Process services
Semaphore services
Printer services
Driver-related services
Information services
Utility services
Floating-point services
Halftone services
Using the Graphics DDI describes the graphics DDI entry points and also explains where many of these service
routines can be used to help the driver implement the entry points. For detailed descriptions of each of the service
functions, see GDI Functions Called by Printer and Display Drivers.
GDI Support for Surfaces
4/26/2017 • 1 min to read

For each PDEV, a driver must support the DrvEnableSurface function. DrvEnableSurface sets up the surface to be
drawn on and associates it with the PDEV. The driver must also support the DrvDisableSurface function to disable
created surfaces. Because GDI creates and maintains the surface, the driver relies on several GDI service functions,
listed in the following table, to implement the enabling and disabling of surfaces.

FUNCTION NAME  PURPOSE

EngAssociateSurface  Associates a surface with a PDEV and defines the drawing operations the driver writer wants to hook out for that surface. It uses the PDEV's default palette and style steps. The driver must make this call for the primary surface during the execution of DrvEnableSurface. The driver must also make this call when it enables a secondary surface, before locking the surface to write on it.
EngCheckAbort  (Printers only) Enables a printer driver to determine whether its printer job has been terminated.
EngCreateBitmap  Creates a standard format DIB bitmap. GDI can perform all drawing operations on this type of surface.
EngCreateDeviceBitmap  Creates a device-dependent bitmap that the driver is responsible for drawing on (although it can be created as a DIB, in which case the driver can call back to have GDI draw on it).
EngCreateDeviceSurface  Creates a device-managed surface. The driver is responsible for managing certain drawing operations for this surface. The function returns a handle that the driver manages.
EngCreateWnd  Creates a WNDOBJ structure on a specified surface.
EngDeleteSurface  Deletes a surface (DIB, device-dependent bitmap, or device-managed surface).
EngDeleteWnd  Deletes a WNDOBJ structure.
EngEraseSurface  Fills a specified rectangle on a surface with a given color, effectively erasing it. This function should be called only to erase the surface of a GDI bitmap.
EngLockDirectDrawSurface  Locks the kernel-mode handle of a DirectDraw surface.
EngLockSurface  Gives the driver access to a created surface by creating a user object (SURFOBJ) for that surface. (The primary surface is not locked.)
EngMarkBandingSurface  (Printers only) Marks a surface as a banding surface.
EngModifySurface  Notifies GDI about the attributes of a surface that was created by the driver.
EngUnlockDirectDrawSurface  Releases the lock on a specified DirectDraw surface.
EngUnlockSurface  Unlocks a surface when the driver has finished a drawing operation (to be called when disabling a secondary surface).
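The following fragment sketches how these services typically fit together in DrvEnableSurface for a device-managed primary surface. The PDEV structure, its members, and the choice of BMF_32BPP are assumptions for the example; the HDEV stored in hdevEng is the handle GDI supplied to DrvCompletePDEV.

#include <winddi.h>

/* Hypothetical driver-defined PDEV; hdevEng is the HDEV saved in DrvCompletePDEV. */
typedef struct _PDEV {
    HDEV  hdevEng;
    HSURF hsurf;
    SIZEL sizlScreen;
} PDEV;

HSURF APIENTRY
DrvEnableSurface(DHPDEV dhpdev)
{
    PDEV  *ppdev = (PDEV *)dhpdev;
    HSURF  hsurf;

    /* Create a device-managed surface; the DHSURF is a driver-chosen cookie. */
    hsurf = EngCreateDeviceSurface((DHSURF)ppdev, ppdev->sizlScreen, BMF_32BPP);
    if (hsurf == NULL)
    {
        return NULL;
    }

    /* Tie the surface to the PDEV and declare which drawing calls are hooked.
       Device-managed surfaces must hook at least DrvCopyBits, DrvTextOut,
       and DrvStrokePath. */
    if (!EngAssociateSurface(hsurf, ppdev->hdevEng,
                             HOOK_BITBLT | HOOK_COPYBITS |
                             HOOK_TEXTOUT | HOOK_STROKEPATH))
    {
        EngDeleteSurface(hsurf);
        return NULL;
    }

    ppdev->hsurf = hsurf;   /* Remembered so that DrvDisableSurface can delete it. */
    return hsurf;
}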



GDI Support for Palettes
4/26/2017 • 2 min to read

GDI can do most of the work with regard to palette management. When GDI calls the DrvEnablePDEV function,
the driver returns its default palette to GDI as part of the DEVINFO structure. The driver must create this palette
using the EngCreatePalette function.
A palette effectively maps 32-bit color indexes into 24-bit RGB color values, which is the way GDI uses palettes. A
driver specifies its palette so GDI can determine how different color indexes are to appear on the device.
The driver need not deal with most palette operations and calculations as long as it uses the XLATEOBJ provided
by GDI.
If the device supports a modifiable palette, it should implement the function DrvSetPalette. GDI calls
DrvSetPalette when an application changes the palette for a device and passes the resulting new palette to the
driver. The driver should set its internal hardware palette to match the new palette as closely as possible.
A palette can be defined for GDI in either of the two different formats listed in the following table.

PALETTE FORMAT  DESCRIPTION

Indexed  A color index is an index into an array of RGB values. The array can be small, containing, for example, 16 color indexes, or large, containing, for example, 4096 color indexes or more.
Bit Fields  Bit fields in the color index specify colors in terms of the amounts of R, G, and B in each color. For example, 5 bits could be used for each, providing a value between 0 and 31 for each color. The 5-bit value would be scaled up to cover a range of 0 to 255 for each component when converting to RGB. (The usual RGB representation itself is defined by bitfields.)

GDI typically uses the palette mapping in reverse. That is, an application specifies an RGB color for drawing and
GDI must locate the color index that causes the device to display that color. As indicated in the next table, GDI
provides two primary palette service functions for creating and deleting the palette, as well as some service
functions related to the PALOBJ and the XLATEOBJ used to translate color indexes between two palettes.

FUNCTION  DESCRIPTION

EngCreatePalette  Creates a palette. The driver associates the palette with a device by returning a handle to the palette in the DEVINFO structure.
EngDeletePalette  Deletes the given palette.
EngDitherColor  Returns a standard 8x8 dither that approximates the specified RGB color.
EngQueryPalette  Queries a palette for its attributes.
PALOBJ_cGetColors  Allows a driver to download RGB colors from an indexed palette. Called by the display driver in the DrvSetPalette function.
XLATEOBJ_cGetPalette  Retrieves the 24-bit RGB colors or the bitfield format for the colors in an indexed source palette. The driver can use this function to obtain information from the palette to perform color blending.
XLATEOBJ_hGetColorTransform  Returns the color transform for the specified translation object.
XLATEOBJ_iXlate  Translates a single source color index to a destination color index.
XLATEOBJ_piVector  Retrieves a translation vector from an indexed source palette. The driver can use this vector to perform its own translation of the source indexes to destination indexes.
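As an illustration, a fragment along the following lines might create the default bit-fields palette for a 16-bpp 5-6-5 device and return it to GDI through the DEVINFO structure. It is a sketch only; the helper name and the 5-6-5 masks are assumptions, and pDevInfo stands for the DEVINFO pointer that GDI passes to DrvEnablePDEV.

#include <winddi.h>

/* Sketch: create a 5-6-5 bit-fields palette and hand it to GDI as the
   PDEV's default palette. Called from within DrvEnablePDEV. */
BOOL bInitDefaultPalette(DEVINFO *pDevInfo)
{
    HPALETTE hpal;

    /* PAL_BITFIELDS: no color table, just the red/green/blue masks. */
    hpal = EngCreatePalette(PAL_BITFIELDS,
                            0,          /* cColors: unused for bit fields   */
                            NULL,       /* pulColors: unused for bit fields */
                            0xF800,     /* red mask   (5 bits)              */
                            0x07E0,     /* green mask (6 bits)              */
                            0x001F);    /* blue mask  (5 bits)              */
    if (hpal == NULL)
    {
        return FALSE;
    }

    /* DrvDisablePDEV should later call EngDeletePalette on this handle. */
    pDevInfo->hpalDefault = hpal;
    return TRUE;
}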



GDI Services for Paths
4/26/2017 • 1 min to read

To assist vector devices in filling complex areas, their drivers can call the engine functions, listed in the following
table, that create, modify, and enumerate a path. The driver has access to paths through the PATHOBJ structure.

GDI PATH SERVICE FUNCTION  DESCRIPTION

EngCreatePath  Allocates a path for the driver's temporary use. The driver should delete this path before returning to GDI from its current drawing call.
EngDeletePath  Deletes a path allocated by the EngCreatePath function.
PATHOBJ_bCloseFigure  Closes a path (for filling) by drawing a line back to the starting point.
PATHOBJ_bEnum  Retrieves the next PATHDATA record from a path. Each record describes all or part of a subpath.
PATHOBJ_bEnumClipLines  Enumerates clipped line segments from a path.
PATHOBJ_bMoveTo  Changes the current position in a PATHOBJ-defined path.
PATHOBJ_bPolyBezierTo  Draws Bezier curves (cubic splines) in a PATHOBJ-defined path.
PATHOBJ_bPolyLineTo  Draws lines in a PATHOBJ-defined path.
PATHOBJ_vEnumStart  Notifies a PATHOBJ that the driver will begin calling PATHOBJ_bEnum to enumerate the curves in the specified path. This function must be called in case of an enumeration restart.
PATHOBJ_vEnumStartClipLines  Allows the driver to ask for lines to be clipped against a CLIPOBJ. This is useful when the clip region is more complex than a single rectangle.
PATHOBJ_vGetBounds  Returns a bounding rectangle for the path.
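A DrvStrokePath implementation might walk the path roughly as sketched below. vHwLineSegment is a hypothetical helper that draws one device-space segment; Bezier records and figure closing are ignored to keep the sketch short, and the 28.4 FIX coordinates are simply truncated to integer pixels.

#include <winddi.h>

/* Hypothetical helper that draws one line segment on the device. */
VOID vHwLineSegment(LONG x1, LONG y1, LONG x2, LONG y2);

/* Sketch: enumerate the subpaths of a PATHOBJ and stroke them as
   straight segments. */
VOID vStrokePathData(PATHOBJ *ppo)
{
    PATHDATA pd;
    BOOL     bMore;
    ULONG    i;
    POINTFIX ptfxPrev = {0, 0};

    PATHOBJ_vEnumStart(ppo);

    do
    {
        /* bMore is FALSE on the last record, which is still valid. */
        bMore = PATHOBJ_bEnum(ppo, &pd);

        i = 0;
        if ((pd.flags & PD_BEGINSUBPATH) && pd.count != 0)
        {
            ptfxPrev = pd.pptfx[0];      /* First point starts the subpath. */
            i = 1;
        }

        for (; i < pd.count; i++)
        {
            /* FIX values are 28.4 fixed point; >> 4 truncates to pixels. */
            vHwLineSegment(ptfxPrev.x >> 4, ptfxPrev.y >> 4,
                           pd.pptfx[i].x >> 4, pd.pptfx[i].y >> 4);
            ptfxPrev = pd.pptfx[i];
        }
    } while (bMore);
}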



GDI Support for Window Objects
4/26/2017 • 1 min to read

GDI provides support for window creation and deletion, and for the enumeration of rectangles in a window.

FUNCTION  DESCRIPTION

EngCreateWnd  Creates a WNDOBJ structure on a specified surface.
EngDeleteWnd  Deletes a WNDOBJ structure.
WNDOBJ_bEnum  Gets a collection of rectangles from the visible region of a window.
WNDOBJ_cEnumStart  Sets parameters for enumeration of rectangles in the visible region of a window.
WNDOBJ_vSetConsumer  Sets a driver-defined value in the pvConsumer member of the specified WNDOBJ structure.



GDI Drawing and Related Services
4/26/2017 • 2 min to read

To support the CLIPOBJ, BRUSHOBJ, and XFORMOBJ structures, GDI offers several drawing services, listed in the
following table.

GDI DRAWING SERVICE FUNCTION  DESCRIPTION

BRUSHOBJ_hGetColorTransform  Retrieves the color transform for the specified brush.
BRUSHOBJ_pvAllocRbrush  Allocates memory for the driver's realization of a brush.
BRUSHOBJ_pvGetRbrush  Returns a pointer to the driver's realization of the brush. Realizes the brush if it has not yet been realized.
BRUSHOBJ_ulGetBrushColor  Returns the RGB color of the specified solid brush.
CLIPOBJ_bEnum  Retrieves a batch of rectangles from the clip region.
CLIPOBJ_cEnumStart  Sets parameters for enumeration of the rectangles in all or part of the clipped region. (The region can be enumerated once without calling this function, but subsequent enumerations require this function's use.)
CLIPOBJ_ppoGetPath  Retrieves complicated clip regions as a path.
EngAlphaBlend  Provides bit-block transfer capabilities with alpha blending. This is the GDI simulation for the DrvAlphaBlend function.
EngBitBlt  Provides general bit-block transfer capabilities either between device-managed surfaces, or between a device-managed surface and a GDI-managed standard format bitmap. This is the GDI simulation for the DrvBitBlt function.
EngControlSprites  Tears down or redraws sprites on the specified WNDOBJ area.
EngCopyBits  Translates between device-managed raster surfaces and GDI standard-format bitmaps. This is the GDI simulation for the DrvCopyBits function.
EngCreateClip  Allocates a CLIPOBJ for the driver's temporary use. The driver should call the EngDeleteClip function to delete it when it is no longer needed.
EngDeleteClip  Deletes a CLIPOBJ allocated with the EngCreateClip function.
EngDeviceIoControl  Sends a control code to the specified video miniport driver, causing the device to perform the specified operation.
EngFillPath  Fills (paints) a specified path. This is the GDI simulation for the DrvFillPath function.
EngGradientFill  Shades the specified graphics primitives. This is the GDI simulation for the DrvGradientFill function.
EngLineTo  Draws a single, solid, integer-only cosmetic line. This is the GDI simulation for the DrvLineTo function.
EngMovePointer  Moves the engine-managed pointer on the device. This is the GDI simulation for the DrvMovePointer function.
EngPaint  Paints a specified region. This is the GDI simulation for the obsolete DrvPaint function.
EngPlgBlt  Performs a rotate bit-block transfer. This is the GDI simulation for the DrvPlgBlt function.
EngSetPointerShape  Sets the shape of the pointer.
EngSetPointerTag  Creates a shape that is ORed with the application's pointer shape on DrvSetPointerShape calls to other associated drivers in a mirrored system. This function is obsolete for Windows 2000 and later.
EngStretchBlt  Performs a stretching bit-block transfer. This is the GDI simulation for the DrvStretchBlt function.
EngStretchBltROP  Performs a stretching bit-block transfer using a ROP. This is the GDI simulation for the DrvStretchBltROP function.
EngStrokeAndFillPath  Strokes (draws) a path and fills it at the same time. This is the GDI simulation for the DrvStrokeAndFillPath function.
EngStrokePath  Strokes (draws) a path. This is the GDI simulation for the DrvStrokePath function.
EngTransparentBlt  Performs a transparent blt. This is the GDI simulation for the DrvTransparentBlt function.
XFORMOBJ_bApplyXform  Applies the given transform or its inverse to the given array of points.
XFORMOBJ_iGetFloatObjXform  Downloads a FLOATOBJ transform to the driver.
XFORMOBJ_iGetXform  Downloads a transform to the driver.
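For example, a drawing routine that must honor an arbitrary clip region might enumerate it in batches, roughly as sketched below; vHwFillRect is a hypothetical helper that fills one rectangle, and the intersection of each clip rectangle with the target rectangle is omitted for brevity.

#include <winddi.h>

/* Hypothetical helper that fills one rectangle on the device. */
VOID vHwFillRect(RECTL *prcl);

/* Sketch: fill a target rectangle, honoring the clip region in pco. */
VOID vFillClipped(CLIPOBJ *pco, RECTL *prclTrg)
{
    /* Batch buffer for CLIPOBJ_bEnum: a count followed by rectangles. */
    struct { ULONG c; RECTL arcl[8]; } clipEnum;
    BOOL  bMore;
    ULONG i;

    if (pco == NULL || pco->iDComplexity == DC_TRIVIAL)
    {
        vHwFillRect(prclTrg);                /* Nothing to clip against. */
        return;
    }

    if (pco->iDComplexity == DC_RECT)
    {
        vHwFillRect(&pco->rclBounds);        /* Single clip rectangle. */
        return;
    }

    /* DC_COMPLEX: walk the region a batch of rectangles at a time. */
    CLIPOBJ_cEnumStart(pco, FALSE, CT_RECTANGLES, CD_ANY, 0);
    do
    {
        bMore = CLIPOBJ_bEnum(pco, sizeof(clipEnum), (ULONG *)&clipEnum);
        for (i = 0; i < clipEnum.c; i++)
        {
            vHwFillRect(&clipEnum.arcl[i]);  /* Intersect with prclTrg in real code. */
        }
    } while (bMore);
}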



GDI Font and Text Services
4/26/2017 • 1 min to read

GDI provides support for font management and text output. The FONTOBJ structure and related functions give a
driver access to a particular instance of a font. To support text output, the driver has access to the STROBJ structure
and related functions. The following table lists FONTOBJ- and STROBJ-related functions.

FUNCTION  DESCRIPTION

EngComputeGlyphSet  Computes the glyph set supported on a device.
EngFntCacheAlloc  Allocates memory for a cached font file.
EngFntCacheFault  Reports an error to the font engine if the font driver encountered an error reading from or writing to a font data cache.
EngFntCacheLookUp  Retrieves a pointer to cached font file data.
EngGetCurrentCodePage  Returns the system's default OEM and ANSI code pages.
EngGetType1FontList  Retrieves a list of PostScript Type 1 fonts that are installed both locally and remotely.
EngTextOut  This is the GDI simulation for the DrvTextOut function.
FONTOBJ_cGetAllGlyphHandles  Allows the driver to retrieve every glyph handle of a GDI font. The driver uses this service to download an entire font.
FONTOBJ_cGetGlyphs  Translates glyph handles into pointers to the associated glyph data for the font consumer. These pointers are valid until the next call to FONTOBJ_cGetGlyphs.
FONTOBJ_pfdg  Retrieves the pointer to the FD_GLYPHSET structure associated with the specified font.
FONTOBJ_pifi  Retrieves the pointer to the IFIMETRICS structure that describes the associated font.
FONTOBJ_pjOpenTypeTablePointer  Returns a pointer to a view of an OpenType table.
FONTOBJ_pQueryGlyphAttrs  Returns information about a font's glyphs.
FONTOBJ_pvTrueTypeFontFile  Retrieves a pointer to a view of a TrueType, OpenType, or Type1 font file.
FONTOBJ_pwszFontFilePaths  Retrieves the file path(s) associated with a font.
FONTOBJ_pxoGetXform  Retrieves the Notional-to-Device transform for the associated font. This transform is required for a driver to realize a driver-supplied font.
FONTOBJ_vGetInfo  Returns information that describes the associated font.
STROBJ_bEnum  Enumerates glyph identities and positions in the specified STROBJ.
STROBJ_bEnumPositionsOnly  Enumerates glyph identities and positions for a specified text string, but does not create cached glyph bitmaps.
STROBJ_bGetAdvanceWidths  Returns vectors specifying the probable widths of glyphs making up a specified string.
STROBJ_dwGetCodePage  Returns the code page associated with the specified STROBJ.
STROBJ_fxBreakExtra  Retrieves the amount of extra space to be added to each space character in a string when displaying and/or printing justified text.
STROBJ_vEnumStart  Restarts the enumeration of the GLYPHPOS array for the specified STROBJ. This function should be called by the driver prior to subsequent enumerations.
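Inside DrvTextOut, the glyphs of a STROBJ are typically walked with STROBJ_vEnumStart and STROBJ_bEnum along the lines of the sketch below; vHwDrawGlyph is a hypothetical helper that copies one monochrome glyph bitmap to the surface.

#include <winddi.h>

/* Hypothetical helper that draws one monochrome glyph bitmap at a position. */
VOID vHwDrawGlyph(GLYPHBITS *pgb, POINTL *pptl);

/* Sketch: enumerate and draw the glyphs of a text string object. */
VOID vDrawString(STROBJ *pstro)
{
    GLYPHPOS *pgp;
    ULONG     cGlyphs, i;
    BOOL      bMore;

    STROBJ_vEnumStart(pstro);

    do
    {
        /* Each call returns a batch of glyph handles and positions. */
        bMore = STROBJ_bEnum(pstro, &cGlyphs, &pgp);

        for (i = 0; i < cGlyphs; i++)
        {
            /* pgdf->pgb points at the monochrome bitmap for this glyph. */
            vHwDrawGlyph(pgp[i].pgdf->pgb, &pgp[i].ptl);
        }
    } while (bMore);
}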



GDI Memory Services
4/26/2017 • 1 min to read

GDI provides several memory-related services to driver writers, including the ability to allocate and deallocate
system memory, user memory, private user memory, and video memory, as well as the ability to lock and unlock a
range of memory. The following table lists the GDI memory services.

FUNCTION  DESCRIPTION

EngAllocMem  Allocates a block of memory, and inserts a caller-supplied tag before the allocation.
EngAllocPrivateUserMem  Allocates a block of private user memory from the address space of a specified process, and inserts a caller-supplied tag before the allocation.
EngAllocUserMem  Allocates a block of memory from the address space of the current process, and inserts a caller-supplied tag before the allocation.
EngFreeMem  Deallocates a block of system memory allocated by EngAllocMem.
EngFreePrivateUserMem  Deallocates a block of private user memory allocated by EngAllocPrivateUserMem.
EngFreeUserMem  Deallocates a block of user memory allocated by EngAllocUserMem.
EngSecureMem  Locks down the specified address range in memory.
EngUnsecureMem  Unlocks a memory address range that is locked down.
HeapVidMemAllocAligned  Allocates off-screen memory for a display driver by using the DirectDraw video memory heap manager.
VidMemFree  Frees off-screen memory allocated for a display driver by HeapVidMemAllocAligned.
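A typical allocate/use/free sequence, sketched with an assumed pool tag, looks like this:

#include <winddi.h>

#define DRIVER_POOL_TAG  'psdV'   /* Assumed tag; shows up in pool-tracking tools. */

/* Sketch: allocate a zero-initialized scratch buffer and free it again. */
BOOL bUseScratchBuffer(ULONG cjSize)
{
    PVOID pvScratch;

    pvScratch = EngAllocMem(FL_ZERO_MEMORY, cjSize, DRIVER_POOL_TAG);
    if (pvScratch == NULL)
    {
        return FALSE;           /* Allocation failure must be handled. */
    }

    /* ... use the buffer ... */

    EngFreeMem(pvScratch);      /* Always pair with EngAllocMem. */
    return TRUE;
}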



GDI Event Services
4/26/2017 • 1 min to read

GDI provides several services related to events. Drivers using these services can create and delete events, map and
unmap events, and read, set and clear events.

FUNCTION  DESCRIPTION

EngClearEvent  Sets a specified event object to the nonsignaled state.
EngCreateEvent  Creates a synchronization event object that can be used to synchronize hardware access between a display driver and the video miniport driver.
EngDeleteEvent  Deletes the specified event object.
EngMapEvent  Maps a user-mode event object to kernel mode.
EngReadStateEvent  Returns the current state of the specified event object: signaled or nonsignaled.
EngSetEvent  Sets the specified event object to the signaled state, and returns the event object's previous state.
EngUnmapEvent  Cleans up the kernel-mode resources allocated for a mapped user-mode event.
EngWaitForSingleObject  Puts the current thread of the display driver into a wait state until the specified event object is set to the signaled state, or until the wait times out.
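The sketch below shows the basic create/wait/delete sequence; in practice the PEVENT would also be handed to the video miniport driver (for example, through EngDeviceIoControl) so that it can signal the event, and the one-second relative timeout is an assumed value.

#include <winddi.h>

/* Sketch: create an event, wait on it with a timeout, then clean up. */
BOOL bWaitForHardware(VOID)
{
    PEVENT        pEvent;
    LARGE_INTEGER liTimeout;
    BOOL          bOk;

    if (!EngCreateEvent(&pEvent))
    {
        return FALSE;
    }

    /* Relative timeout of one second (negative value, 100-ns units). */
    liTimeout.QuadPart = -10 * 1000 * 1000;

    /* Blocks until the event is signaled or the timeout expires. */
    bOk = EngWaitForSingleObject(pEvent, &liTimeout);

    EngDeleteEvent(pEvent);
    return bOk;
}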



GDI File, Module, and Process Services
4/26/2017 • 1 min to read

GDI provides a variety of services for file, module, and process manipulation.

FUNCTION  DESCRIPTION

EngDeleteFile  Deletes a file.
EngFindImageProcAddress  Returns the address of a function within an executable module.
EngFindResource  Determines the location of a resource in a module.
EngFreeModule  Unmaps a file from system memory.
EngGetCurrentProcessId  Gets the ID of the current process.
EngGetCurrentThreadId  Gets the ID of the current thread.
EngGetFileChangeTime  Returns the time a file was last written to.
EngGetFilePath  Determines the file path associated with the specified font file.
EngGetProcessHandle  Retrieves a handle to the current client process.
EngLoadImage  Loads the specified executable image into kernel-mode memory.
EngLoadModule  Loads the specified data module into system memory for reading.
EngLoadModuleForWrite  Loads the specified executable module into system memory for writing.
EngMapFile  Creates or opens a file and maps it into system space.
EngMapFontFile  Obsolete. See the entry in this table for EngMapFontFileFD.
EngMapFontFileFD  Maps a font file into system memory, if necessary, and returns a pointer to the base location of the font data in the file.
EngMapModule  Returns the address and size of an executable file that was loaded by EngLoadModule.
EngQueryFileTimeStamp  Returns the time stamp of a file.
EngUnloadImage  Unloads an image loaded by EngLoadImage.
EngUnmapFile  Unmaps the view of a file from system space.
EngUnmapFontFile  Obsolete. See the entry in this table for EngUnmapFontFileFD.
EngUnmapFontFileFD  Unmaps the specified font file from system memory.
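As an illustration, a driver that ships an auxiliary data file might read it through these services along the lines of the sketch below; the file name is hypothetical, and error handling is kept to a minimum.

#include <winddi.h>

/* Sketch: map a driver data file into system memory for reading. */
BOOL bReadDataFile(VOID)
{
    HANDLE hModule;
    PVOID  pvData;
    ULONG  cjSize;

    /* Hypothetical file name; EngLoadModule maps the file read-only. */
    hModule = EngLoadModule(L"drvdata.bin");
    if (hModule == NULL)
    {
        return FALSE;
    }

    /* Obtain the base address and size of the mapped file. */
    pvData = EngMapModule(hModule, &cjSize);
    if (pvData != NULL)
    {
        /* ... parse cjSize bytes at pvData ... */
    }

    EngFreeModule(hModule);     /* Unmaps the file from system memory. */
    return (pvData != NULL);
}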



GDI Semaphore Services
4/26/2017 • 1 min to read

GDI provides a selection of services related to semaphores and safe semaphores. A driver can use these services to
create or delete a semaphore, and acquire or release a semaphore.

FUNCTION  DESCRIPTION

EngAcquireSemaphore  Acquires the resource associated with the semaphore for exclusive access by the calling thread.
EngCreateSemaphore  Creates a semaphore object.
EngDeleteSafeSemaphore  Removes a reference to the specified safe semaphore.
EngDeleteSemaphore  Deletes a semaphore object from the system's resource list.
EngInitializeSafeSemaphore  Initializes the specified safe semaphore.
EngIsSemaphoreOwned  Determines whether any thread holds the specified semaphore.
EngIsSemaphoreOwnedByCurrentThread  Determines whether the currently executing thread holds the specified semaphore.
EngReleaseSemaphore  Releases the specified semaphore.
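The usual pattern is to create a semaphore once (for example, during PDEV initialization), bracket access to the shared resource with acquire and release calls, and delete the semaphore on teardown. A minimal sketch, in which the guarded hardware queue is assumed:

#include <winddi.h>

static HSEMAPHORE g_hsemHwQueue;   /* Guards a shared hardware queue (assumed). */

BOOL bInitLock(VOID)
{
    g_hsemHwQueue = EngCreateSemaphore();
    return (g_hsemHwQueue != NULL);
}

VOID vTouchSharedState(VOID)
{
    EngAcquireSemaphore(g_hsemHwQueue);   /* Exclusive access for this thread. */

    /* ... read or modify the shared hardware queue ... */

    EngReleaseSemaphore(g_hsemHwQueue);
}

VOID vCleanupLock(VOID)
{
    EngDeleteSemaphore(g_hsemHwQueue);
}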



GDI Printer Services
4/26/2017 • 1 min to read

GDI provides a number of services that are of interest to printer driver writers. The following table lists these
services.

FUNCTION  DESCRIPTION

EngEnumForms  Enumerates the forms supported by the specified printer.
EngGetForm  Gets the FORM_INFO_1 details for the specified form.
EngGetPrinter  Retrieves information about the specified printer.
EngGetPrinterData  Retrieves configuration data for the specified printer.
EngGetPrinterDataFileName  Retrieves the string name of the printer's data file.
EngGetPrinterDriver  Retrieves driver data for the specified printer.
EngMapFontFile  Obsolete. See the entry in this table for EngMapFontFileFD.
EngMapFontFileFD  Maps a font file into system memory, if necessary, and returns a pointer to the base location of the font data in the file.
EngMarkBandingSurface  Marks the specified printer surface as a banding surface.
EngSetPrinterData  Obsolete. Sets the configuration data for the specified printer.
EngUnmapFontFile  Obsolete. See the entry in this table for EngUnmapFontFileFD.
EngUnmapFontFileFD  Unmaps the specified font file from system memory.
EngWritePrinter  Allows printer graphics DLLs to send a data stream to printer hardware.



GDI Driver-Related Services
4/26/2017 • 1 min to read

Driver writers can use the GDI driver-related services listed in the following table to create or delete driver objects,
obtain the name of the driver's DLL, and lock or unlock a driver object.

FUNCTION  DESCRIPTION

EngCreateDriverObj  Creates a DRIVEROBJ structure. This structure is used to track a device-managed resource that must be released if the resource-allocating process terminates without first cleaning it up.
EngDeleteDriverObj  Frees the handle used for tracking a device-managed resource.
EngGetDriverName  Returns the name of the driver's DLL.
EngLockDriverObj  Creates an exclusive lock on a driver object for the calling thread.
EngUnlockDriverObj  Unlocks the driver object.



GDI Information Services
4/26/2017 • 1 min to read

GDI provides several services a driver can use to query the system about device and system attributes, a file's time
stamp, and the performance counter. These services are listed in the following table.

FUNCTION  DESCRIPTION

EngQueryDeviceAttribute  Allows the driver to query the system about particular attributes of the device.
EngQueryFileTimeStamp  Returns the time stamp of a file.
EngQueryLocalTime  Queries the local time.
EngQueryPerformanceCounter  Queries the performance counter.
EngQueryPerformanceFrequency  Queries the frequency of the performance counter.
EngQuerySystemAttribute  Queries processor or system-specific capabilities.
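For example, a driver can time an operation with the performance-counter services roughly as follows (a sketch only):

#include <winddi.h>

/* Sketch: time an operation with the GDI performance-counter services. */
VOID vTimeOperation(VOID)
{
    LONGLONG llStart, llEnd, llFreq, llMicroseconds;

    EngQueryPerformanceFrequency(&llFreq);   /* Ticks per second. */
    EngQueryPerformanceCounter(&llStart);

    /* ... the operation being measured ... */

    EngQueryPerformanceCounter(&llEnd);

    /* Elapsed time in microseconds (llFreq is assumed to be nonzero). */
    llMicroseconds = ((llEnd - llStart) * 1000000) / llFreq;
    (void)llMicroseconds;   /* For example, report it through EngDebugPrint. */
}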



GDI Utility Services
4/26/2017 • 1 min to read

The following table lists the miscellaneous GDI utility services. These services include debugging support, getting
and setting the last error, several conversion services that convert from one character encoding type to another, a
sort routine, and others.

FUNCTION  DESCRIPTION

EngBugCheckEx  Brings down the system in a controlled manner when the caller discovers an unrecoverable inconsistency.
EngDebugBreak  Causes a breakpoint in the current process to occur.
EngDebugPrint  Prints the specified debug message to the kernel debugger.
EngGetLastError  Returns the last error code logged by GDI for the calling thread.
EngHangNotification  Notifies the system that a specified device is inoperable or unresponsive.
EngLpkInstalled  Determines whether the language pack is installed on the system.
EngMulDiv  Multiplies two 32-bit values and then divides the 64-bit result by a third 32-bit value. The return value is rounded up or down to the nearest integer.
EngMultiByteToUnicodeN  Converts the specified ANSI source string into a Unicode string using the current ANSI code page.
EngMultiByteToWideChar  Converts an ANSI source string into a wide character string using the specified code page.
EngProbeForRead  Probes a structure for read accessibility.
EngProbeForReadAndWrite  Probes a structure for read and write accessibility.
EngSetLastError  Causes GDI to report an error code, which can be retrieved by an application.
EngSort  Performs a quick-sort on the specified list.
EngUnicodeToMultiByteN  Converts the specified Unicode string into an ANSI string using the current ANSI code page.
EngWideCharToMultiByte  Converts a wide character string into an ANSI string using the specified code page.



GDI Floating-Point Services
4/26/2017 • 1 min to read

Kernel-mode graphics drivers must do all floating-point operations between calls to the GDI-supplied
EngSaveFloatingPointState and EngRestoreFloatingPointState routines.
If the hardware has a floating-point processor, the driver can do floating-point operations directly. Otherwise, the
driver can use the GDI FLOATOBJ services shown in the following table to emulate floating-point operations.
Regardless of processor type, the driver should use the FLOATL data type when declaring floating-point values.

FUNCTION  DESCRIPTION

EngRestoreFloatingPointState  Restores the Windows 2000 and later kernel floating-point state after the driver uses any floating-point or MMX hardware instructions.
EngSaveFloatingPointState  Saves the current Windows 2000 and later kernel floating-point state.
FLOATOBJ_Add  Adds two FLOATOBJs.
FLOATOBJ_AddFloat  Adds a FLOATOBJ and a FLOATL.
FLOATOBJ_AddLong  Adds a FLOATOBJ and a LONG.
FLOATOBJ_Div  Divides one FLOATOBJ by another.
FLOATOBJ_DivFloat  Divides a FLOATOBJ by a FLOATL.
FLOATOBJ_DivLong  Divides a FLOATOBJ by a LONG.
FLOATOBJ_Equal  Determines whether two FLOATOBJs are equal.
FLOATOBJ_EqualLong  Determines whether a FLOATOBJ and a LONG are equal.
FLOATOBJ_GetFloat  Calculates and returns the FLOAT-equivalent value of a FLOATOBJ.
FLOATOBJ_GetLong  Calculates and returns the LONG-equivalent value of a FLOATOBJ.
FLOATOBJ_GreaterThan  Determines whether one FLOATOBJ is larger than another.
FLOATOBJ_GreaterThanLong  Determines whether a FLOATOBJ is larger than a LONG.
FLOATOBJ_LessThan  Determines whether one FLOATOBJ is less than another.
FLOATOBJ_LessThanLong  Determines whether a FLOATOBJ is less than a LONG.
FLOATOBJ_Mul  Multiplies two FLOATOBJ values.
FLOATOBJ_MulFloat  Multiplies a FLOATOBJ by a FLOATL.
FLOATOBJ_MulLong  Multiplies a FLOATOBJ by a LONG.
FLOATOBJ_Neg  Changes the sign of a FLOATOBJ.
FLOATOBJ_SetFloat  Sets a FLOATOBJ to a particular FLOATL value.
FLOATOBJ_SetLong  Sets a FLOATOBJ to a particular LONG value.
FLOATOBJ_Sub  Subtracts one FLOATOBJ from another.
FLOATOBJ_SubFloat  Subtracts a FLOATL from a FLOATOBJ.
FLOATOBJ_SubLong  Subtracts a LONG from a FLOATOBJ.
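The sketch below illustrates both patterns: the first routine scales a value using only the FLOATOBJ emulation services, and the second brackets native floating-point work with the save/restore pair, assuming that an initial call with a NULL buffer returns the required state-buffer size.

#include <winddi.h>

/* Sketch: scale a LONG by a ratio using the FLOATOBJ emulation services,
   avoiding native floating-point instructions entirely. */
LONG lScale(LONG lValue, LONG lNumerator, LONG lDenominator)
{
    FLOATOBJ fo;

    FLOATOBJ_SetLong(&fo, lValue);        /* fo = lValue            */
    FLOATOBJ_MulLong(&fo, lNumerator);    /* fo = fo * lNumerator   */
    FLOATOBJ_DivLong(&fo, lDenominator);  /* fo = fo / lDenominator */

    return FLOATOBJ_GetLong(&fo);         /* Truncated back to a LONG. */
}

/* Sketch: bracket native floating-point work with state save/restore. */
VOID vNativeFloatWork(VOID)
{
    ULONG cjState;
    PVOID pvState;

    cjState = EngSaveFloatingPointState(NULL, 0);   /* Assumed: returns required size. */
    pvState = EngAllocMem(0, cjState, 'tolF');
    if (pvState == NULL)
    {
        return;
    }

    if (EngSaveFloatingPointState(pvState, cjState))
    {
        /* ... native floating-point or MMX work is safe here ... */
        EngRestoreFloatingPointState(pvState);
    }

    EngFreeMem(pvState);
}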



GDI Halftone Services
4/26/2017 • 1 min to read

GDI Halftone support includes the services listed in the following table.

FUNCTION  DESCRIPTION

HT_ComputeRGBGammaTable  Causes GDI to compute device red, green, and blue intensities based on gamma numbers.
HT_Get8BPPFormatPalette  Returns a halftone palette for use on standard 8-bits per pixel device types.
HT_Get8BPPMaskPalette  Returns a mask palette for an 8-bits per pixel device type.
HTUI_DeviceColorAdjustment  Displays a dialog box that allows a user to adjust a device's halftoning properties.



GDI Data Types
4/26/2017 • 2 min to read

The data types defined in the following table appear in the device driver interface. Several of the listed data types
have already been described in GDI User Objects. Data types that are pointers are marked with an asterisk (*).

GRAPHICS DDI DATA TYPE  VARIABLE NAME PREFIX  DEFINITION

BOOL  b  A 32-bit value that can be either TRUE or FALSE.
BYTE  j  An 8-bit unsigned integer.
BRUSHOBJ  pbo  A pointer to a brush object.
CLIPLINE  cl  A clipline object.
CLIPOBJ  pco  A pointer to a clipping object.
DHPDEV  dhpdev  A 32-bit handle, defined by the device driver, that identifies a physical device.
DHSURF  dhsurf  A 32-bit handle, defined by the device driver, that identifies a device-managed surface.
FIX  fix  A fixed-point number.
FLOATL  e  A floating-point number.
FLOAT_LONG  el  A 32-bit overloaded value that is interpreted as either a LONG or FLOATL, depending on context.
FLONG  fl  A set of 32-bit flags.
FONTOBJ  pfo  A pointer to a font object.
FSHORT  fs  A set of 16-bit flags.
FWORD  fw  A 16-bit signed integer.
HBM  hbm  A 32-bit handle, defined by GDI, that identifies a bitmap.
HPAL  hpal  A 32-bit handle, defined by GDI, that identifies a palette.
HSURF  hsurf  A 32-bit handle, defined by GDI, that identifies a surface.
LONG  l  A 32-bit signed integer.
MIX  mix  A 32-bit quantity, whose lower 16 bits define foreground and background mix modes.
PALOBJ  ppalo  A pointer to a palette object.
PATHOBJ  ppo  A pointer to a path object.
POINTE  pte  A point structure that consists of {FLOATL x, y;}.
POINTFIX  ptfx  A point structure that consists of {FIX x, y;}.
POINTQF  ptq  A point structure that consists of {LARGE_INTEGER x, y;}. Each member of this structure is a 64-bit coordinate in 28.36 format.
PWSZ  pwsz  A pointer to a null-terminated Unicode string.
PVOID  pv  A pointer to a VOID, an undefined data type.
RECTFX  rcfx  A rectangle structure that consists of {FIX xLeft, yTop, xRight, yBottom;}.
ROP4  rop4  A 32-bit value that specifies how source, destination, pattern, and mask pixels are to be mixed.
SHORT  s  A 16-bit signed integer.
SIZEL  sizl  A structure that consists of {LONG cx, cy;}.
STROBJ  pstro  A pointer to a text string object.
SURFOBJ  pso  A pointer to a surface object.
ULONG  ul  A 32-bit unsigned integer.
USHORT  us  A 16-bit unsigned integer.
XFORMOBJ  pxo  A pointer to a coordinate transform object.
XLATEOBJ*  pxlo  A pointer to a color translation object.

The parameter prefixes listed in the next table are used to modify variable name prefixes in accordance with their
usage.

PREFIX PARAMETER USAGE

i An enumerated index

c A count

p A pointer



Obsolete GDI Functions, Structures, and Constants
4/26/2017 • 1 min to read

The following functions, structures, and constants appear in the winddi.h header, but are obsolete in Windows 2000
and later.
For a list of obsolete display driver functions, see Obsolete Graphics DDI Functions.
Obsolete GDI Functions
EngDxIoctl
EngMapFontFile
EngQueryEMFInfo
EngUnmapFontFile
FONTOBJ_pGetGammaTables
XFORMOBJ_cmGetTransform
Obsolete GDI Structures
EMFINFO
FD_LIGATURE
LIGATURE
Obsolete GDI Constants
FD_NEGATIVE_FONT
MAXCHARSETS
Display Samples
4/26/2017 • 1 min to read

Starting with Windows 8, samples are published in the MSDN Developer Samples code gallery, along with
documentation. For Windows 7 and earlier, samples and documentation were included in the Windows Driver Kit
(WDK) or Driver Development Kit (DDK).

Kernel mode display-only miniport driver (available in the MSDN Developer Samples code gallery)
Build environment: Windows 8. Target operating system: Windows 8. PnP driver: Yes. In-box driver: No.
Sample description: Implements most of the device driver interfaces (DDIs) that a display-only miniport driver should provide to the Windows Display Driver Model (WDDM).

PixLib (available in the MSDN Developer Samples code gallery)
Build environment: Windows 8, Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP. Target operating system: Windows 8, Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP. PnP driver: No. In-box driver: No.
Sample description: Demonstrates how to implement the CPixel class.

Mirror - Mirror driver for mirroring GDI content
Build environment: Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP, Windows 2000. Target operating system: Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP, Windows 2000. PnP driver: No. In-box driver: Yes.
Sample description: Demonstrates a driver that performs video mirroring. Note Starting with Windows 8, mirror drivers will not install on the system.

MonINF
Build environment: Windows 7, Windows Server 2008, Windows Vista. Target operating system: Windows 7, Windows Server 2008, Windows Vista. PnP driver: No. In-box driver: No.
Sample description: Demonstrates how monitor manufacturers can avoid reflashing the monitor's EEPROM by implementing a monitor INF that overrides part of, or the entire, EDID information in software.
