
Media Foundation: Essential Concepts

If you are new to digital media, this topic introduces some concepts that you will need to understand before writing a Media Foundation application.

Streams
Compression
Media Containers
Formats
Related topics

Streams
A stream is a sequence of media data with a uniform type. The most common types are audio and video, but a stream can contain almost any kind of data, including text, script commands, and still images. The term stream in this documentation does not imply delivery over a network. A media file intended for local playback also contains streams. Usually, a media file contains either a single audio stream, or exactly one video stream and one audio stream. However, a media file might contain several streams of the same type. For example, a video file might contain audio streams in several different languages. At run time, the application would select which stream to use.

Compression
Compression refers to any process that reduces the size of a data stream by removing redundant information. Compression algorithms fall into two broad categories:

Lossless compression. Using a lossless algorithm, the reconstructed data is identical to the original.

Lossy compression. Using a lossy algorithm, the reconstructed data is an approximation of the original, but is not an exact match.

In most other domains, lossy compression is not acceptable. (Imagine getting back an "approximation" of a spreadsheet!) But lossy compression schemes are well-suited to audio and video, for a couple of reasons.

The first reason has to do with the physics of human perception. When we listen to a complex sound, like a music recording, some of the information contained in that sound is not perceptible to the ear. With the help of signal processing theory, it is possible to analyze and separate the frequencies that cannot be perceived. These frequencies can be removed with no perceptual effect. Although the reconstructed audio will not match the original exactly, it will sound the same to the listener. Similar principles apply to video.

Second, some degradation in sound or image quality may be acceptable, depending on the intended purpose. In telephony, for example, audio is often highly compressed. The result is good enough for a phone conversation, but you wouldn't want to listen to a symphony orchestra over a telephone.

Compression is also called encoding, and a device that encodes is called an encoder. The reverse process is decoding, and the device is naturally called a decoder. The general term for both encoders and decoders is codec. Codecs can be implemented in hardware or software.

Compression technology has changed rapidly since the advent of digital media, and a large number of compression schemes are in use today. This fact is one of the main challenges for digital media programming.

Media Containers
It is rare to store a raw audio or video stream as a computer file, or to send one directly over the network. For one thing, it would be impossible to decode such a stream, without knowing in advance which codec to use. Therefore, media files usually contain at least some of the following elements:

File headers that describe the number of streams, the format of each stream, and so on.

An index that enables random access to the content.

Metadata that describes the content (for example, the artist or title).

Packet headers, to enable network transmission or random access.

This documentation uses the term container to describe the entire package of streams, headers, indexes, metadata, and so forth. The reason for using the term container rather than file is that some container formats are designed for live broadcast. An application could generate the container in real time, never storing it to a file. An early example of a media container is the AVI file format. Other examples include MP4 and Advanced Systems Format (ASF). Containers can be identified by file name extension (for example, .mp4) or by MIME type. The following diagram shows a typical structure for a media container. The diagram does not represent any specific format; the details of each format vary widely.

Notice that the structure shown in the diagram is hierarchical, with header information appearing at the start of the container. This structure is typical of many (but not all) container formats. Also notice that the data section contains interleaved audio and video packets. This type of interleaving is common in media containers. The term multiplexing refers to the process of packetizing the audio and video streams and interleaving the packets into the container. The reverse process, reassembling the streams from the packetized data, is called demultiplexing.

Formats
In digital media, the term format is ambiguous. A format can refer to the type of encoding, such as H.264 video, or the container, such as MP4. This distinction is often confusing for ordinary users. The names given to media formats do not always help. For example, MP3 refers both to an encoding format (MPEG-1 Audio Layer 3) and a file format. The distinction is important, however, because reading a media file actually involves two stages:

1. First, the container must be parsed. In most cases, the number of streams and the format of each stream cannot be known until this step is complete.
2. Next, if the streams are compressed, they must be decoded using the appropriate decoders.

This fact leads quite naturally to a software design where separate components are used to parse containers and decode streams. Further, this approach lends itself to a plug-in model, so that third parties can provide their own parsers and codecs. On Windows, the Component Object Model (COM) provides a standard way to separate an API from its implementation, which is a requirement for any plug-in model. For this reason (among others), Media Foundation uses COM interfaces. The following diagram shows the components used to read a media file:

Writing a media file also requires two steps:

1. Encoding the uncompressed audio/video data.
2. Putting the compressed data into a particular container format.
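A minimal sketch of these two steps, using the Sink Writer that is introduced later in this documentation, might look like the following. The output file name, the choice of AAC as the encoded format, and the audio attribute values are illustrative assumptions; error handling is abbreviated, and MFStartup is assumed to have already been called.

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")

// Sketch: encode uncompressed audio (step 1) and place it in an MP4 container (step 2).
// pInputType describes the uncompressed input; pSample is one uncompressed sample
// (a real application would write many samples in a loop).
HRESULT SketchWriteAacToMp4(IMFMediaType *pInputType, IMFSample *pSample)
{
    IMFSinkWriter *pWriter = NULL;
    IMFMediaType  *pOutType = NULL;
    DWORD stream = 0;

    // The .mp4 file name selects the MP4 container (step 2).
    HRESULT hr = MFCreateSinkWriterFromURL(L"output.mp4", NULL, NULL, &pWriter);

    // Describe the encoded stream we want; the Sink Writer loads an AAC encoder (step 1).
    if (SUCCEEDED(hr)) hr = MFCreateMediaType(&pOutType);
    if (SUCCEEDED(hr)) hr = pOutType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    if (SUCCEEDED(hr)) hr = pOutType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_AAC);
    if (SUCCEEDED(hr)) hr = pOutType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100); // Illustrative values.
    if (SUCCEEDED(hr)) hr = pOutType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2);
    if (SUCCEEDED(hr)) hr = pOutType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
    if (SUCCEEDED(hr)) hr = pOutType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 16000);

    if (SUCCEEDED(hr)) hr = pWriter->AddStream(pOutType, &stream);
    if (SUCCEEDED(hr)) hr = pWriter->SetInputMediaType(stream, pInputType, NULL);

    if (SUCCEEDED(hr)) hr = pWriter->BeginWriting();
    if (SUCCEEDED(hr)) hr = pWriter->WriteSample(stream, pSample);
    if (SUCCEEDED(hr)) hr = pWriter->Finalize();

    if (pOutType) pOutType->Release();
    if (pWriter) pWriter->Release();
    return hr;
}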

The following diagram shows the components used to write a media file:

Media Foundation Architecture


This section describes the general design of Microsoft Media Foundation. For information about using Media Foundation for specific programming tasks, see Media Foundation Programming Guide.

In this section

Overview of the Media Foundation Architecture: Gives a high-level overview of the Media Foundation architecture.

Media Foundation Primitives: Describes some basic interfaces that are used throughout Media Foundation. Almost all Media Foundation applications will use these interfaces.

Media Foundation Platform APIs: Describes core Media Foundation functions, such as asynchronous callbacks and work queues. Some applications might use platform-level interfaces. Also, custom plug-ins, such as media sources and MFTs, use these interfaces.

Media Foundation Pipeline: The Media Foundation pipeline layer consists of media sources, MFTs, and media sinks. Most applications do not call methods directly on the pipeline layer. Instead, applications use one of the higher layers, such as the Media Session or the Source Reader and Sink Writer.

Media Session: The Media Session manages data flow in the Media Foundation pipeline.

Source Reader: The Source Reader enables an application to get data from a media source, without the application needing to call the media source APIs directly. The Source Reader can also perform decoding of compressed streams.

Protected Media Path: The protected media path (PMP) provides a protected environment for playing premium video content. It is not necessary to use the PMP when writing a Media Foundation application.

Overview of the Media Foundation Architecture


This topic describes the general design of Microsoft Media Foundation. For information about using Media Foundation for specific programming tasks, see Media Foundation Programming Guide. The following diagram shows a high-level view of the Media Foundation architecture.

Media Foundation provides two distinct programming models. The first model, shown on the left side of the diagram, uses an end-to-end pipeline for media data. The application initializes the pipeline (for example, by providing the URL of a file to play) and then calls methods to control streaming. In the second model, shown on the right side of the diagram, the application either pulls data from a source, or pushes it to a destination (or both). This model is particularly useful if you need to process the data, because the application has direct access to the data stream.

Primitives and Platform

Starting from the bottom of the diagram, the primitives are helper objects used throughout the Media Foundation API:

Attributes are a generic way to store information inside an object, as a list of key/value pairs.

Media Types describe the format of a media data stream.

Media Buffers hold chunks of media data, such as video frames and audio samples, and are used to transport data between objects.

Media Samples are containers for media buffers. They also contain metadata about the buffers, such as time stamps.

The Media Foundation Platform APIs provide some core functionality that is used by the Media Foundation pipeline, such as asynchronous callbacks and work queues. Certain applications might need to call these APIs directly; also, you will need them if you implement a custom source, transform, or sink for Media Foundation.

Media Pipeline

The media pipeline contains three types of object that generate or process media data:

Media Sources introduce data into the pipeline. A media source might get data from a local file, such as a video file; from a network stream; or from a hardware capture device.

Media Foundation Transforms (MFTs) process data from a stream. Encoders and decoders are implemented as MFTs.

Media Sinks consume the data; for example, by showing video on the display, playing audio, or writing the data to a media file.

Third parties can implement their own custom sources, sinks, and MFTs; for example, to support new media file formats. The Media Session controls the flow of data through the pipeline, and handles tasks such as quality control, audio/video synchronization, and responding to format changes.

Source Reader and Sink Writer

The Source Reader and Sink Writer provide an alternative way to use the basic Media Foundation components (media sources, transforms, and media sinks). The source reader hosts a media source and zero or more decoders, while the sink writer hosts a media sink and zero or more encoders. You can use the source reader to get compressed or uncompressed data from a media source, and use the sink writer to encode data and send the data to a media sink.

Note: The source reader and sink writer are available in Windows 7.

This programming model gives the application more control over the flow of data, and also gives the application direct access to the data from the source.
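As a minimal sketch of the second (pull) model, the following code reads samples from the first audio stream of a file with the Source Reader. The file name is a placeholder, and error handling is abbreviated.

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")

// Sketch: pull samples from the first audio stream of a media file.
HRESULT SketchReadFile()
{
    HRESULT hr = MFStartup(MF_VERSION);
    if (FAILED(hr)) return hr;

    IMFSourceReader *pReader = NULL;
    hr = MFCreateSourceReaderFromURL(L"example.mp4", NULL, &pReader); // Placeholder file name.

    while (SUCCEEDED(hr))
    {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample *pSample = NULL;

        // The source reader parses the container and, if asked, decodes the stream.
        hr = pReader->ReadSample(
            MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0,
            &streamIndex, &flags, &timestamp, &pSample);

        if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
        {
            if (pSample) pSample->Release();
            break;
        }
        if (pSample)
        {
            // Process the sample here. It may be compressed or uncompressed,
            // depending on the output type set on the reader.
            pSample->Release();
        }
    }

    if (pReader) pReader->Release();
    MFShutdown();
    return hr;
}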

Media Foundation Primitives


Media Foundation defines several basic object types that are used throughout the Media Foundation APIs.

Attributes: Attributes and properties are key/value pairs stored on an object.

Media Types: A media type describes the format of a digital media stream.

Media Buffers: A media buffer manages a block of memory, so that it can be shared between objects.

Media Samples: A media sample is an object that contains a list of media buffers.

Attributes in Media Foundation


An attribute is a key/value pair, where the key is a GUID and the value is a PROPVARIANT. Attributes are used throughout Microsoft Media Foundation to configure objects, describe media formats, query object properties, and for other purposes. This topic contains the following sections.

About Attributes
Serializing Attributes
Implementing IMFAttributes
Related topics

About Attributes
An attribute is a key/value pair, where the key is a GUID and the value is a PROPVARIANT. Attribute values are restricted to the following data types:

Unsigned 32-bit integer (UINT32).
Unsigned 64-bit integer (UINT64).
64-bit floating-point number.
GUID.
Null-terminated wide-character string.
Byte array.
IUnknown pointer.

These types are defined in the MF_ATTRIBUTE_TYPE enumeration. To set or retrieve attribute values, use the IMFAttributes interface. This interface contains type-safe methods to get and set values by data type. For example, to set a 32-bit integer, call IMFAttributes::SetUINT32. Attribute keys are unique within an object. If you set two different values with the same key, the second value overwrites the first.

Several Media Foundation interfaces inherit the IMFAttributes interface. Objects that expose this interface have optional or mandatory attributes that the application should set on the object, or have attributes that the application can retrieve. Also, some methods and functions take an IMFAttributes pointer as a parameter, which enables the application to set configuration information. The application must create an attribute store to hold the configuration attributes. To create an empty attribute store, call MFCreateAttributes.

The following code shows two functions. The first creates a new attribute store and sets a hypothetical attribute named MY_ATTRIBUTE with a string value. The second function retrieves the value of this attribute.

extern const GUID MY_ATTRIBUTE;

HRESULT ShowCreateAttributeStore(IMFAttributes **ppAttributes)
{
    IMFAttributes *pAttributes = NULL;
    const UINT32 cElements = 10;  // Starting size.

    // Create the empty attribute store.
    HRESULT hr = MFCreateAttributes(&pAttributes, cElements);

    // Set the MY_ATTRIBUTE attribute with a string value.
    if (SUCCEEDED(hr))
    {
        hr = pAttributes->SetString(
            MY_ATTRIBUTE,
            L"This is a string value"
            );
    }

    // Return the IMFAttributes pointer to the caller.
    if (SUCCEEDED(hr))
    {
        *ppAttributes = pAttributes;
        (*ppAttributes)->AddRef();
    }

    SAFE_RELEASE(pAttributes);
    return hr;
}

HRESULT ShowGetAttributes()
{
    IMFAttributes *pAttributes = NULL;
    WCHAR *pwszValue = NULL;
    UINT32 cchLength = 0;

    // Create the attribute store.
    HRESULT hr = ShowCreateAttributeStore(&pAttributes);

    // Get the attribute.
    if (SUCCEEDED(hr))
    {
        hr = pAttributes->GetAllocatedString(
            MY_ATTRIBUTE,
            &pwszValue,
            &cchLength
            );
    }

    CoTaskMemFree(pwszValue);
    SAFE_RELEASE(pAttributes);
    return hr;
}

For a complete list of Media Foundation attributes, see Media Foundation Attributes. The expected data type for each attribute is documented there.

Serializing Attributes
Media Foundation has two functions for serializing attribute stores. One writes the attributes to a byte array, the other writes them to a stream that supports the IStream interface. Each function has a corresponding function that loads the data.

Save to a byte array: MFGetAttributesAsBlob. Load from a byte array: MFInitAttributesFromBlob.

Save to an IStream: MFSerializeAttributesToStream. Load from an IStream: MFDeserializeAttributesFromStream.

To write the contents of an attribute store into a byte array, call MFGetAttributesAsBlob. Attributes with IUnknown pointer values are ignored. To load the attributes back into an attribute store, call MFInitAttributesFromBlob. To write an attribute store to a stream, call MFSerializeAttributesToStream. This function can marshal IUnknown pointer values. The caller must provide a stream object that implements the IStream interface. To load an attribute store from a stream, call MFDeserializeAttributesFromStream.
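As a minimal sketch of the byte-array path, the following function serializes one attribute store into a blob and loads the blob into a new store. The function and variable names are illustrative, and error handling is abbreviated.

// Sketch: serialize an attribute store to a blob and restore it into a new store.
HRESULT SketchRoundTripAttributes(IMFAttributes *pSource, IMFAttributes **ppCopy)
{
    UINT32 cbSize = 0;
    UINT8 *pBuffer = NULL;
    IMFAttributes *pCopy = NULL;

    // Measure, allocate, and serialize. IUnknown-valued attributes are ignored.
    HRESULT hr = MFGetAttributesAsBlobSize(pSource, &cbSize);
    if (SUCCEEDED(hr))
    {
        pBuffer = (UINT8*)CoTaskMemAlloc(cbSize);
        hr = (pBuffer != NULL) ? MFGetAttributesAsBlob(pSource, pBuffer, cbSize) : E_OUTOFMEMORY;
    }

    // Load the blob into a fresh attribute store.
    if (SUCCEEDED(hr)) hr = MFCreateAttributes(&pCopy, 0);
    if (SUCCEEDED(hr)) hr = MFInitAttributesFromBlob(pCopy, pBuffer, cbSize);

    if (SUCCEEDED(hr))
    {
        *ppCopy = pCopy;
        (*ppCopy)->AddRef();
    }
    if (pCopy) pCopy->Release();
    CoTaskMemFree(pBuffer);
    return hr;
}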

Implementing IMFAttributes
Media Foundation provides a stock implementation of IMFAttributes, which is obtained by calling the MFCreateAttributes function. In most situations, you should use this implementation, and not provide your own custom implementation.

There is one situation when you might need to implement the IMFAttributes interface: if you implement a second interface that inherits IMFAttributes. In that case, you must provide implementations for the IMFAttributes methods inherited by the second interface. In this situation, it is recommended to wrap the existing Media Foundation implementation of IMFAttributes. The following code shows a class template that holds an IMFAttributes pointer and wraps every IMFAttributes method, except for the IUnknown methods.

#include <assert.h>

// Helper class to implement IMFAttributes.
// This is an abstract class; the derived class must implement the IUnknown
// methods. This class is a wrapper for the standard attribute store provided
// in Media Foundation.
// template parameter:
//     The interface you are implementing, either IMFAttributes or an interface
//     that inherits IMFAttributes, such as IMFActivate

template <class IFACE=IMFAttributes>
class CBaseAttributes : public IFACE
{
protected:
    IMFAttributes *m_pAttributes;

    // This version of the constructor does not initialize the
    // attribute store. The derived class must call Initialize() in
    // its own constructor.
    CBaseAttributes() : m_pAttributes(NULL)
    {
    }

    // This version of the constructor initializes the attribute
    // store, but the derived class must pass an HRESULT parameter
    // to the constructor.
    CBaseAttributes(HRESULT& hr, UINT32 cInitialSize = 0) : m_pAttributes(NULL)
    {
        hr = Initialize(cInitialSize);
    }

    // The next version of the constructor uses a caller-provided
    // implementation of IMFAttributes.
    // (Sometimes you want to delegate IMFAttributes calls to some
    // other object that implements IMFAttributes, rather than using
    // MFCreateAttributes.)
    CBaseAttributes(HRESULT& hr, IUnknown *pUnk)
    {
        hr = Initialize(pUnk);
    }

    virtual ~CBaseAttributes()
    {
        if (m_pAttributes)
        {
            m_pAttributes->Release();
        }
    }

    // Initializes the object by creating the standard Media Foundation attribute store.
    HRESULT Initialize(UINT32 cInitialSize = 0)
    {
        if (m_pAttributes == NULL)
        {
            return MFCreateAttributes(&m_pAttributes, cInitialSize);
        }
        else
        {
            return S_OK;
        }
    }

    // Initializes this object from a caller-provided attribute store.
    // pUnk: Pointer to an object that exposes IMFAttributes.
    HRESULT Initialize(IUnknown *pUnk)
    {
        if (m_pAttributes)
        {
            m_pAttributes->Release();
            m_pAttributes = NULL;
        }

        return pUnk->QueryInterface(IID_PPV_ARGS(&m_pAttributes));
    }

public:

    // IMFAttributes methods

    STDMETHODIMP GetItem(REFGUID guidKey, PROPVARIANT* pValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetItem(guidKey, pValue);
    }

    STDMETHODIMP GetItemType(REFGUID guidKey, MF_ATTRIBUTE_TYPE* pType)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetItemType(guidKey, pType);
    }

    STDMETHODIMP CompareItem(REFGUID guidKey, REFPROPVARIANT Value, BOOL* pbResult)
    {
        assert(m_pAttributes);
        return m_pAttributes->CompareItem(guidKey, Value, pbResult);
    }

    STDMETHODIMP Compare(
        IMFAttributes* pTheirs,
        MF_ATTRIBUTES_MATCH_TYPE MatchType,
        BOOL* pbResult
        )
    {
        assert(m_pAttributes);
        return m_pAttributes->Compare(pTheirs, MatchType, pbResult);
    }

    STDMETHODIMP GetUINT32(REFGUID guidKey, UINT32* punValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetUINT32(guidKey, punValue);
    }

    STDMETHODIMP GetUINT64(REFGUID guidKey, UINT64* punValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetUINT64(guidKey, punValue);
    }

    STDMETHODIMP GetDouble(REFGUID guidKey, double* pfValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetDouble(guidKey, pfValue);
    }

    STDMETHODIMP GetGUID(REFGUID guidKey, GUID* pguidValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetGUID(guidKey, pguidValue);
    }

    STDMETHODIMP GetStringLength(REFGUID guidKey, UINT32* pcchLength)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetStringLength(guidKey, pcchLength);
    }

    STDMETHODIMP GetString(REFGUID guidKey, LPWSTR pwszValue, UINT32 cchBufSize, UINT32* pcchLength)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetString(guidKey, pwszValue, cchBufSize, pcchLength);
    }

    STDMETHODIMP GetAllocatedString(REFGUID guidKey, LPWSTR* ppwszValue, UINT32* pcchLength)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetAllocatedString(guidKey, ppwszValue, pcchLength);
    }

    STDMETHODIMP GetBlobSize(REFGUID guidKey, UINT32* pcbBlobSize)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetBlobSize(guidKey, pcbBlobSize);
    }

    STDMETHODIMP GetBlob(REFGUID guidKey, UINT8* pBuf, UINT32 cbBufSize, UINT32* pcbBlobSize)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetBlob(guidKey, pBuf, cbBufSize, pcbBlobSize);
    }

    STDMETHODIMP GetAllocatedBlob(REFGUID guidKey, UINT8** ppBuf, UINT32* pcbSize)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetAllocatedBlob(guidKey, ppBuf, pcbSize);
    }

    STDMETHODIMP GetUnknown(REFGUID guidKey, REFIID riid, LPVOID* ppv)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetUnknown(guidKey, riid, ppv);
    }

    STDMETHODIMP SetItem(REFGUID guidKey, REFPROPVARIANT Value)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetItem(guidKey, Value);
    }

    STDMETHODIMP DeleteItem(REFGUID guidKey)
    {
        assert(m_pAttributes);
        return m_pAttributes->DeleteItem(guidKey);
    }

    STDMETHODIMP DeleteAllItems()
    {
        assert(m_pAttributes);
        return m_pAttributes->DeleteAllItems();
    }

    STDMETHODIMP SetUINT32(REFGUID guidKey, UINT32 unValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetUINT32(guidKey, unValue);
    }

    STDMETHODIMP SetUINT64(REFGUID guidKey, UINT64 unValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetUINT64(guidKey, unValue);
    }

    STDMETHODIMP SetDouble(REFGUID guidKey, double fValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetDouble(guidKey, fValue);
    }

    STDMETHODIMP SetGUID(REFGUID guidKey, REFGUID guidValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetGUID(guidKey, guidValue);
    }

    STDMETHODIMP SetString(REFGUID guidKey, LPCWSTR wszValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetString(guidKey, wszValue);
    }

    STDMETHODIMP SetBlob(REFGUID guidKey, const UINT8* pBuf, UINT32 cbBufSize)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetBlob(guidKey, pBuf, cbBufSize);
    }

    STDMETHODIMP SetUnknown(REFGUID guidKey, IUnknown* pUnknown)
    {
        assert(m_pAttributes);
        return m_pAttributes->SetUnknown(guidKey, pUnknown);
    }

    STDMETHODIMP LockStore()
    {
        assert(m_pAttributes);
        return m_pAttributes->LockStore();
    }

    STDMETHODIMP UnlockStore()
    {
        assert(m_pAttributes);
        return m_pAttributes->UnlockStore();
    }

    STDMETHODIMP GetCount(UINT32* pcItems)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetCount(pcItems);
    }

    STDMETHODIMP GetItemByIndex(UINT32 unIndex, GUID* pguidKey, PROPVARIANT* pValue)
    {
        assert(m_pAttributes);
        return m_pAttributes->GetItemByIndex(unIndex, pguidKey, pValue);
    }

    STDMETHODIMP CopyAllItems(IMFAttributes* pDest)
    {
        assert(m_pAttributes);
        return m_pAttributes->CopyAllItems(pDest);
    }

    // Helper functions

    HRESULT SerializeToStream(DWORD dwOptions, IStream* pStm)
        // dwOptions: Flags from MF_ATTRIBUTE_SERIALIZE_OPTIONS
    {
        assert(m_pAttributes);
        return MFSerializeAttributesToStream(m_pAttributes, dwOptions, pStm);
    }

    HRESULT DeserializeFromStream(DWORD dwOptions, IStream* pStm)
    {
        assert(m_pAttributes);
        return MFDeserializeAttributesFromStream(m_pAttributes, dwOptions, pStm);
    }

    // SerializeToBlob: Stores the attributes in a byte array.
    //
    // ppBuf: Receives a pointer to the byte array.
    // pcbSize: Receives the size of the byte array.
    //
    // The caller must free the array using CoTaskMemFree.
    HRESULT SerializeToBlob(UINT8 **ppBuffer, UINT32 *pcbSize)
    {
        assert(m_pAttributes);

        if (ppBuffer == NULL)
        {
            return E_POINTER;
        }
        if (pcbSize == NULL)
        {
            return E_POINTER;
        }

        *ppBuffer = NULL;
        *pcbSize = 0;

        UINT32 cbSize = 0;
        BYTE *pBuffer = NULL;

        HRESULT hr = MFGetAttributesAsBlobSize(m_pAttributes, &cbSize);

        if (FAILED(hr))
        {
            return hr;
        }

        pBuffer = (BYTE*)CoTaskMemAlloc(cbSize);
        if (pBuffer == NULL)
        {
            return E_OUTOFMEMORY;
        }

        hr = MFGetAttributesAsBlob(m_pAttributes, pBuffer, cbSize);

        if (SUCCEEDED(hr))
        {
            *ppBuffer = pBuffer;
            *pcbSize = cbSize;
        }
        else
        {
            CoTaskMemFree(pBuffer);
        }

        return hr;
    }

    HRESULT DeserializeFromBlob(const UINT8* pBuffer, UINT cbSize)
    {
        assert(m_pAttributes);
        return MFInitAttributesFromBlob(m_pAttributes, pBuffer, cbSize);
    }

    HRESULT GetRatio(REFGUID guidKey, UINT32* pnNumerator, UINT32* punDenominator)
    {
        assert(m_pAttributes);
        return MFGetAttributeRatio(m_pAttributes, guidKey, pnNumerator, punDenominator);
    }

    HRESULT SetRatio(REFGUID guidKey, UINT32 unNumerator, UINT32 unDenominator)
    {
        assert(m_pAttributes);
        return MFSetAttributeRatio(m_pAttributes, guidKey, unNumerator, unDenominator);
    }

    // Gets an attribute whose value represents the size of something (eg a video frame).
    HRESULT GetSize(REFGUID guidKey, UINT32* punWidth, UINT32* punHeight)
    {
        assert(m_pAttributes);
        return MFGetAttributeSize(m_pAttributes, guidKey, punWidth, punHeight);
    }

    // Sets an attribute whose value represents the size of something (eg a video frame).
    HRESULT SetSize(REFGUID guidKey, UINT32 unWidth, UINT32 unHeight)
    {
        assert(m_pAttributes);
        return MFSetAttributeSize(m_pAttributes, guidKey, unWidth, unHeight);
    }
};

The following code shows how to derive a class from this template:

#include <shlwapi.h>

class MyObject : public CBaseAttributes<>
{
    MyObject() : m_nRefCount(1) { }
    ~MyObject() { }

    long m_nRefCount;

public:

    // IUnknown
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
    {
        static const QITAB qit[] =
        {
            QITABENT(MyObject, IMFAttributes),
            { 0 },
        };
        return QISearch(this, qit, riid, ppv);
    }

    STDMETHODIMP_(ULONG) AddRef()
    {
        return InterlockedIncrement(&m_nRefCount);
    }

    STDMETHODIMP_(ULONG) Release()
    {
        ULONG uCount = InterlockedDecrement(&m_nRefCount);
        if (uCount == 0)
        {
            delete this;
        }
        return uCount;
    }

    // Static function to create an instance of the object.
    static HRESULT CreateInstance(MyObject **ppObject)
    {
        HRESULT hr = S_OK;

        MyObject *pObject = new MyObject();
        if (pObject == NULL)
        {
            return E_OUTOFMEMORY;
        }

        // Initialize the attribute store.
        hr = pObject->Initialize();
        if (FAILED(hr))
        {
            delete pObject;
            return hr;
        }

        *ppObject = pObject;
        (*ppObject)->AddRef();

        return S_OK;
    }
};

You must call CBaseAttributes::Initialize to create the attribute store. In the previous example, that is done inside a static creation function. The template argument is an interface type, which defaults to IMFAttributes. If your object implements an interface that inherits IMFAttributes, such as IMFActivate, set the template argument equal to the name of the derived interface.

Media Types
A media type is a way to describe the format of a media stream. In Media Foundation, media types are represented by the IMFMediaType interface. Applications use media types to discover the format of a media file or media stream. Objects in the Media Foundation pipeline use media types to negotiate the formats they will deliver or receive. This section contains the following topics.

About Media Types: General overview of media types in Media Foundation.

Media Type GUIDs: Lists the defined GUIDs for major types and subtypes.

Audio Media Types: How to create media types for audio formats.

Video Media Types: How to create media types for video formats.

Complete and Partial Media Types: Describes the difference between complete media types and partial media types.

Media Type Conversions: How to convert between Media Foundation media types and older format structures.

Media Type Helper Functions: A list of functions that manipulate or get information from a media type.

Media Type Debugging Code: Example code that shows how to view a media type while debugging.

About Media Types


A media type describes the format of a media stream. In Microsoft Media Foundation, media types are represented by the IMFMediaType interface. This interface inherits the IMFAttributes interface. The details of a media type are specified as attributes.

To create a new media type, call the MFCreateMediaType function. This function returns a pointer to the IMFMediaType interface. The media type initially has no attributes. To set the details of the format, set the relevant attributes. For a list of media type attributes, see Media Type Attributes.

Major Types and Subtypes


Two important pieces of information for any media type are the major type and the subtype.

The major type is a GUID that defines the overall category of the data in a media stream. Major types include video and audio. To specify the major type, set the MF_MT_MAJOR_TYPE attribute. The IMFMediaType::GetMajorType method returns the value of this attribute.

The subtype further defines the format. For example, within the video major type, there are subtypes for RGB-24, RGB-32, YUY2, and so forth. Within audio, there are PCM audio, IEEE floating-point audio, and others. The subtype provides more information than the major type, but it does not define everything about the format. For example, video subtypes do not define the image size or the frame rate. To specify the subtype, set the MF_MT_SUBTYPE attribute.

All media types should have a major type GUID and a subtype GUID. For a list of major type and subtype GUIDs, see Media Type GUIDs.
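As a minimal sketch of these two attributes, the following code creates a media type for uncompressed 32-bit RGB video. The frame size and frame rate values are illustrative, and error handling is abbreviated.

#include <mfapi.h>
#include <mfidl.h>
#pragma comment(lib, "mfplat.lib")

// Sketch: build a media type that describes uncompressed RGB-32 video.
HRESULT SketchCreateVideoType(IMFMediaType **ppType)
{
    IMFMediaType *pType = NULL;

    HRESULT hr = MFCreateMediaType(&pType);

    // Major type and subtype identify the overall category and the specific format.
    if (SUCCEEDED(hr)) hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (SUCCEEDED(hr)) hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);

    // Additional attributes fill in details that the subtype does not define.
    if (SUCCEEDED(hr)) hr = MFSetAttributeSize(pType, MF_MT_FRAME_SIZE, 640, 480);   // Illustrative size.
    if (SUCCEEDED(hr)) hr = MFSetAttributeRatio(pType, MF_MT_FRAME_RATE, 30, 1);     // Illustrative rate.
    if (SUCCEEDED(hr)) hr = pType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);

    if (SUCCEEDED(hr))
    {
        *ppType = pType;
        (*ppType)->AddRef();
    }
    if (pType) pType->Release();
    return hr;
}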

Why Attributes?
Attributes have several advantages over the format structures that have been used in previous technologies such as DirectShow and the Windows Media Format SDK.

It is easier to represent "don't know" or "don't care" values. For example, if you are writing a video transform, you might know in advance which RGB and YUV formats the transform supports, but not the dimensions of the video frame, until you get them from the video source. Similarly, you might not care about certain details, such as the video primaries. With a format structure, every member must be filled with some value. As a result, it has become common to use zero to indicate an unknown or default value. This practice can cause errors if another component treats zero as a legitimate value. With attributes, you simply omit the attributes that are unknown or not relevant to your component.

As requirements have changed over time, format structures were extended by adding additional data at the end of the structure. For example, WAVEFORMATEXTENSIBLE extends the WAVEFORMATEX structure. This practice is prone to error, because components must cast structure pointers to other structure types. Attributes can be extended safely.

Mutually incompatible format structures have been defined. For example, DirectShow defines the VIDEOINFOHEADER and VIDEOINFOHEADER2 structures. Attributes are set independently of each other, so this problem does not arise.

Major Media Types


In a media type, the major type describes the overall category of the data, such as audio or video. The subtype, if present, further refines the major type. For example, if the major type is video, the subtype might be 32-bit RGB video. Subtypes also distinguish encoded formats, such as H.264 video, from uncompressed formats.

Major type and subtype are identified by GUIDs and stored in the following attributes:

MF_MT_MAJOR_TYPE: Major type.
MF_MT_SUBTYPE: Subtype.

The following major types are defined.

MFMediaType_Audio: Audio. Subtypes: Audio Subtype GUIDs.
MFMediaType_Binary: Binary stream. Subtypes: None.
MFMediaType_FileTransfer: A stream that contains data files. Subtypes: None.
MFMediaType_HTML: HTML stream. Subtypes: None.
MFMediaType_Image: Still image stream. Subtypes: WIC GUIDs and CLSIDs.
MFMediaType_Protected: Protected media. The subtype specifies the content protection scheme.
MFMediaType_SAMI: Synchronized Accessible Media Interchange (SAMI) captions. Subtypes: None.
MFMediaType_Script: Script stream. Subtypes: None.
MFMediaType_Video: Video. Subtypes: Video Subtype GUIDs.

Third-party components can define new major types and new subtypes.

Audio Media Types


This section describes how to create and manipulate media types that describe audio data.

Audio Subtype GUIDs: Contains a list of audio subtype GUIDs.

Uncompressed Audio Media Types: How to create a media type that describes an uncompressed audio format.

AAC Media Types: Describes how to specify the format of an Advanced Audio Coding (AAC) stream.

Audio Subtype GUIDs


The following audio subtype GUIDs are defined. To specify the subtype, set the MF_MT_SUBTYPE attribute on the media type. Except where noted, these constants are defined in the header file mfapi.h. When these subtypes are used, set the MF_MT_MAJOR_TYPE attribute to MFMediaType_Audio. The format tag (FOURCC) for each subtype is shown in parentheses.

MEDIASUBTYPE_RAW_AAC1: Advanced Audio Coding (AAC). This subtype is used for AAC contained in an AVI file with an audio format tag equal to 0x00FF. For more information, see AAC Decoder. Defined in wmcodecdsp.h. Format tag: WAVE_FORMAT_RAW_AAC1 (0x00FF).

MFAudioFormat_AAC: Advanced Audio Coding (AAC). Note: Equivalent to MEDIASUBTYPE_MPEG_HEAAC, defined in wmcodecdsp.h. The stream can contain raw AAC data or AAC data in an Audio Data Transport Stream (ADTS) stream. For more information, see AAC Decoder and MPEG-4 File Source. Format tag: WAVE_FORMAT_MPEG_HEAAC (0x1610).

MFAudioFormat_ADTS: Not used. Format tag: WAVE_FORMAT_MPEG_ADTS_AAC (0x1600).

MFAudioFormat_Dolby_AC3_SPDIF: Dolby AC-3 audio over Sony/Philips Digital Interface (S/PDIF). This GUID value is identical to the following subtypes: KSDATAFORMAT_SUBTYPE_IEC61937_DOLBY_DIGITAL, defined in ksmedia.h; and MEDIASUBTYPE_DOLBY_AC3_SPDIF, defined in uuids.h. Format tag: WAVE_FORMAT_DOLBY_AC3_SPDIF (0x0092).

MFAudioFormat_DRM: Encrypted audio data used with secure audio path. Format tag: WAVE_FORMAT_DRM (0x0009).

MFAudioFormat_DTS: Digital Theater Systems (DTS) audio. Format tag: WAVE_FORMAT_DTS (0x0008).

MFAudioFormat_Float: Uncompressed IEEE floating-point audio. Format tag: WAVE_FORMAT_IEEE_FLOAT (0x0003).

MFAudioFormat_MP3: MPEG Audio Layer-3 (MP3). Format tag: WAVE_FORMAT_MPEGLAYER3 (0x0055).

MFAudioFormat_MPEG: MPEG-1 audio payload. Format tag: WAVE_FORMAT_MPEG (0x0050).

MFAudioFormat_MSP1: Windows Media Audio 9 Voice codec. Format tag: WAVE_FORMAT_WMAVOICE9 (0x000A).

MFAudioFormat_PCM: Uncompressed PCM audio. Format tag: WAVE_FORMAT_PCM (1).

MFAudioFormat_WMASPDIF: Windows Media Audio 9 Professional codec over S/PDIF. Format tag: WAVE_FORMAT_WMASPDIF (0x0164).

MFAudioFormat_WMAudio_Lossless: Windows Media Audio 9 Lossless codec or Windows Media Audio 9.1 codec. Format tag: WAVE_FORMAT_WMAUDIO_LOSSLESS (0x0163).

MFAudioFormat_WMAudioV8: Windows Media Audio 8 codec, Windows Media Audio 9 codec, or Windows Media Audio 9.1 codec. Format tag: WAVE_FORMAT_WMAUDIO2 (0x0161).

MFAudioFormat_WMAudioV9: Windows Media Audio 9 Professional codec or Windows Media Audio 9.1 Professional codec. Format tag: WAVE_FORMAT_WMAUDIO3 (0x0162).

The format tags listed in the third column of this table are used in the WAVEFORMATEX structure, and are defined in the header file mmreg.h.
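Because these format tags come from WAVEFORMATEX, a common related task is converting a WAVEFORMATEX structure into a media type. A minimal sketch using the MFInitMediaTypeFromWaveFormatEx helper follows; the function name is illustrative and error handling is abbreviated.

#include <mfapi.h>
#include <mmreg.h>
#pragma comment(lib, "mfplat.lib")

// Sketch: convert an existing WAVEFORMATEX (for example, one read from a WAV file)
// into a Media Foundation media type.
HRESULT SketchTypeFromWaveFormat(const WAVEFORMATEX *pwfx, IMFMediaType **ppType)
{
    IMFMediaType *pType = NULL;

    HRESULT hr = MFCreateMediaType(&pType);
    if (SUCCEEDED(hr))
    {
        // The helper sets MF_MT_MAJOR_TYPE, MF_MT_SUBTYPE, and the other audio
        // attributes from the structure; the subtype GUID is derived from wFormatTag.
        hr = MFInitMediaTypeFromWaveFormatEx(pType, pwfx, sizeof(WAVEFORMATEX) + pwfx->cbSize);
    }
    if (SUCCEEDED(hr))
    {
        *ppType = pType;
        (*ppType)->AddRef();
    }
    if (pType) pType->Release();
    return hr;
}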

Given an audio format tag, you can create an audio subtype GUID as follows:

1. Start with the value MFAudioFormat_Base, which is defined in mfapi.h.
2. Replace the first DWORD of this GUID with the format tag.

You can use the DEFINE_MEDIATYPE_GUID macro to define a new GUID constant that follows this pattern.
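For example, the following sketch defines a subtype GUID for a hypothetical format tag; the tag value 0x1234 is a placeholder, not a real registered format.

#include <mfapi.h>

// Placeholder format tag, for illustration only.
#define WAVE_FORMAT_MY_CODEC 0x1234

// Expands to a GUID whose first DWORD is the format tag and whose remaining fields
// match MFAudioFormat_Base. Define INITGUID in one source file before including the
// headers so that the GUID constant is actually instantiated.
DEFINE_MEDIATYPE_GUID(MFAudioFormat_MyCodec, WAVE_FORMAT_MY_CODEC);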


DXVA

About DXVA 2.0


DirectX Video Acceleration (DXVA) is an API and a corresponding DDI for using hardware acceleration to speed up video processing. Software codecs and software video processors can use DXVA to offload certain CPU-intensive operations to the GPU. For example, a software decoder can offload the inverse discrete cosine transform (iDCT) to the GPU.

In DXVA, some decoding operations are implemented by the graphics hardware driver. This set of functionality is termed the accelerator. Other decoding operations are implemented by user-mode application software, called the host decoder or software decoder. (The terms host decoder and software decoder are equivalent.) Processing performed by the accelerator is called off-host processing. Typically the accelerator uses the GPU to speed up some operations. Whenever the accelerator performs a decoding operation, the host decoder must convey to the accelerator buffers containing the information needed to perform the operation.

The DXVA 2 API requires Windows Vista or later. The DXVA 1 API is still supported in Windows Vista for backward compatibility. An emulation layer is provided that converts between either version of the API and the opposite version of the DDI:

If the graphics driver conforms to the Windows Display Driver Model (WDDM), DXVA 1 API calls are converted to DXVA 2 DDI calls.

If the graphics driver uses the older Windows XP Display Driver Model (XPDM), DXVA 2 API calls are converted to DXVA 1 DDI calls.

The following table shows the operating system requirements and the supported video renderers for each version of the DXVA API.

DXVA 1. Requirements: Windows 2000 or later. Video renderer support: Overlay Mixer, VMR-7, VMR-9 (DirectShow only).

DXVA 2. Requirements: Windows Vista. Video renderer support: EVR (DirectShow and Media Foundation).

In DXVA 1, the software decoder must access the API through the video renderer. There is no way to use the DXVA 1 API without calling into the video renderer. This limitation has been removed with DXVA 2. Using DXVA 2, the host decoder (or any application) can access the API directly, through the IDirectXVideoDecoderService interface. The DXVA 1 documentation describes the decoding structures used for the following video standards:

ITU-T Rec. H.261
ITU-T Rec. H.263
MPEG-1 video
MPEG-2 Main Profile video

The following specifications define DXVA extensions for other video standards:

DXVA Specification for H.264/AVC Decoding
DXVA Specification for H.264/MPEG-4 AVC Multiview Video Coding (MVC), Including the Stereo High Profile
DXVA Specification for MPEG-1 VLD and Combined MPEG-1/MPEG-2 VLD Video Decoding
DXVA Specification for Off-Host VLD Mode for MPEG-4 Part 2 Video Decoding
DXVA Specification for Windows Media Video v8, v9 and vA Decoding (Including SMPTE 421M "VC-1")

DXVA 1 and DXVA 2 use the same data structures for decoding. However, the procedure for configuring the decoding session has changed. DXVA 1 uses a "probe and lock" mechanism, wherein the host decoder can test various configurations before setting the desired configuration on the accelerator. In DXVA 2, the accelerator returns a list of supported configurations and the host decoder selects one from the list. Details are given in the following sections:

Supporting DXVA 2.0 in DirectShow
Supporting DXVA 2.0 in Media Foundation

Direct3D Device Manager


The Microsoft Direct3D device manager enables two or more objects to share the same Microsoft Direct3D 9 device. One object acts as the owner of the Direct3D 9 device. To share the device, the owner of the device creates the Direct3D device manager. Other objects can obtain a pointer to the device manager from the device owner, then use the device manager to get a pointer to the Direct3D device. Any object that uses the device holds an exclusive lock, which prevents other objects from using the device at the same time.

Note: The Direct3D Device Manager supports Direct3D 9 devices only. It does not support DXGI devices.

To create the Direct3D device manager, call DXVA2CreateDirect3DDeviceManager9. This function returns a pointer to the device manager's IDirect3DDeviceManager9 interface, along with a reset token. The reset token enables the owner of the Direct3D device to set (and reset) the device on the device manager. To initialize the device manager, call IDirect3DDeviceManager9::ResetDevice. Pass in a pointer to the Direct3D device, along with the reset token. The following code shows how to create and initialize the device manager.

HRESULT CreateD3DDeviceManager(
    IDirect3DDevice9 *pDevice,
    UINT *pReset,
    IDirect3DDeviceManager9 **ppManager
    )
{
    UINT resetToken = 0;

    IDirect3DDeviceManager9 *pD3DManager = NULL;

    HRESULT hr = DXVA2CreateDirect3DDeviceManager9(&resetToken, &pD3DManager);

    if (FAILED(hr))
    {
        goto done;
    }

    hr = pD3DManager->ResetDevice(pDevice, resetToken);

    if (FAILED(hr))
    {
        goto done;
    }

    *ppManager = pD3DManager;
    (*ppManager)->AddRef();

    *pReset = resetToken;

done:
    SafeRelease(&pD3DManager);
    return hr;
}

The device owner must provide a way for other objects to get a pointer to the IDirect3DDeviceManager9 interface. The standard mechanism is to implement the IMFGetService interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.

To share the device among several objects, each object (including the owner of the device) must access the device through the device manager, as follows:

1. Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the device.
2. To use the device, call IDirect3DDeviceManager9::LockDevice and pass in the device handle. The method returns a pointer to the IDirect3DDevice9 interface. The method can be called in a blocking mode or a non-blocking mode, depending on the value of the fBlock parameter.
3. When you are done using the device, call IDirect3DDeviceManager9::UnlockDevice. This method makes the device available to other objects.
4. Before exiting, call IDirect3DDeviceManager9::CloseDeviceHandle to close the device handle.

You should hold the device lock only while using the device, because holding the device lock prevents other objects from using the device.

The owner of the device can switch to another device at any time by calling ResetDevice, typically because the original device was lost. Device loss can occur for various reasons, including changes in the monitor resolution, power management actions, locking and unlocking the computer, and so forth. For more information, see the Direct3D documentation.

The ResetDevice method invalidates any device handles that were opened previously. When a device handle is invalid, the LockDevice method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, close the handle and call OpenDeviceHandle again to obtain a new device handle, as shown in the following code.

The following example shows how to open a device handle and lock the device.

HRESULT LockDevice(
    IDirect3DDeviceManager9 *pDeviceManager,
    BOOL fBlock,
    IDirect3DDevice9 **ppDevice, // Receives a pointer to the device.
    HANDLE *pHandle              // Receives a device handle.
    )
{
    *pHandle = NULL;
    *ppDevice = NULL;

    HANDLE hDevice = 0;

    HRESULT hr = pDeviceManager->OpenDeviceHandle(&hDevice);

    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->LockDevice(hDevice, ppDevice, fBlock);
    }

    if (hr == DXVA2_E_NEW_VIDEO_DEVICE)
    {
        // Invalid device handle. Try to open a new device handle.
        hr = pDeviceManager->CloseDeviceHandle(hDevice);

        if (SUCCEEDED(hr))
        {
            hr = pDeviceManager->OpenDeviceHandle(&hDevice);
        }

        // Try to lock the device again.
        if (SUCCEEDED(hr))
        {
            hr = pDeviceManager->LockDevice(hDevice, ppDevice, TRUE);
        }
    }

    if (SUCCEEDED(hr))
    {
        *pHandle = hDevice;
    }
    return hr;
}

Supporting DXVA 2.0 in DirectShow


This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a DirectShow decoder filter. Specifically, it describes the communication between the decoder and the video renderer. This topic does not describe how to implement DXVA decoding.

Prerequisites
Migration Notes
Finding a Decoder Configuration
Notifying the Video Renderer
Allocating Uncompressed Buffers
Decoding
Related topics

Prerequisites
This topic assumes that you are familiar with writing DirectShow filters. For more information, see the topic Writing DirectShow Filters in the DirectShow SDK documentation. The code examples in this topic assume that the decoder filter is derived from the CTransformFilter class, with the following class definition:

class CDecoder : public CTransformFilter
{
public:
    static CUnknown* WINAPI CreateInstance(IUnknown *pUnk, HRESULT *pHr);

    HRESULT CompleteConnect(PIN_DIRECTION direction, IPin *pPin);

    HRESULT InitAllocator(IMemAllocator **ppAlloc);
    HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);

    // TODO: The implementations of these methods depend on the specific decoder.
    HRESULT CheckInputType(const CMediaType *mtIn);
    HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
    HRESULT CTransformFilter::GetMediaType(int, CMediaType *);

private:
    CDecoder(HRESULT *pHr);
    ~CDecoder();

    CBasePin * GetPin(int n);

    HRESULT ConfigureDXVA2(IPin *pPin);
    HRESULT SetEVRForDXVA2(IPin *pPin);

    HRESULT FindDecoderConfiguration(
        /* [in] */  IDirectXVideoDecoderService *pDecoderService,
        /* [in] */  const GUID& guidDecoder,
        /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
        /* [out] */ BOOL *pbFoundDXVA2Configuration
        );

private:
    IDirectXVideoDecoderService *m_pDecoderService;

    DXVA2_ConfigPictureDecode m_DecoderConfig;
    GUID                      m_DecoderGuid;
    HANDLE                    m_hDevice;

    FOURCC                    m_fccOutputFormat;
};

In the remainder of this topic, the term decoder refers to the decoder filter, which receives compressed video and outputs uncompressed video. The term decoder device refers to a hardware video accelerator implemented by the graphics driver.

Here are the basic steps that a decoder filter must perform to support DXVA 2.0:

1. Negotiate a media type.
2. Find a DXVA decoder configuration.
3. Notify the video renderer that the decoder is using DXVA decoding.
4. Provide a custom allocator that allocates Direct3D surfaces.

These steps are described in more detail in the remainder of this topic.

Migration Notes
If you are migrating from DXVA 1.0, you should be aware of some significant differences between the two versions:

DXVA 2.0 does not use the IAMVideoAccelerator and IAMVideoAcceleratorNotify interfaces, because the decoder can access the DXVA 2.0 APIs directly through the IDirectXVideoDecoder interface.

During media type negotiation, the decoder does not use a video acceleration GUID as the subtype. Instead, the subtype is just the uncompressed video format (such as NV12), as with software decoding.

The procedure for configuring the accelerator has changed. In DXVA 1.0, the decoder calls Execute with a DXVA_ConfigPictureDecode structure to configure the accelerator. In DXVA 2.0, the decoder uses the IDirectXVideoDecoderService interface, as described in the next section.

The decoder allocates the uncompressed buffers. The video renderer no longer allocates them.

Instead of calling IAMVideoAccelerator::DisplayFrame to display the decoded frame, the decoder delivers the frame to the renderer by calling IMemInputPin::Receive, as with software decoding.

The decoder is no longer responsible for checking when data buffers are safe for updates. Therefore, DXVA 2.0 does not have any method equivalent to IAMVideoAccelerator::QueryRenderStatus.

Subpicture blending is done by the video renderer, using the DXVA 2.0 video processor APIs. Decoders that provide subpictures (for example, DVD decoders) should send subpicture data on a separate output pin.

For decoding operations, DXVA 2.0 uses the same data structures as DXVA 1.0.

The enhanced video renderer (EVR) filter supports DXVA 2.0. The Video Mixing Renderer filters (VMR-7 and VMR-9) support DXVA 1.0 only.

Finding a Decoder Configuration


After the decoder negotiates the output media type, it must find a compatible configuration for the DXVA decoder device. You can perform this step inside the output pin's CBaseOutputPin::CompleteConnect method. This step ensures that the graphics driver supports the capabilities needed by the decoder, before the decoder commits to using DXVA.

To find a configuration for the decoder device, do the following:

1. Query the renderer's input pin for the IMFGetService interface.
2. Call IMFGetService::GetService to get a pointer to the IDirect3DDeviceManager9 interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3. Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the renderer's Direct3D device.
4. Call IDirect3DDeviceManager9::GetVideoService and pass in the device handle. This method returns a pointer to the IDirectXVideoDecoderService interface.
5. Call IDirectXVideoDecoderService::GetDecoderDeviceGuids. This method returns an array of decoder device GUIDs.
6. Loop through the array of decoder GUIDs to find the ones that the decoder filter supports. For example, for an MPEG-2 decoder, you would look for DXVA2_ModeMPEG2_MOCOMP, DXVA2_ModeMPEG2_IDCT, or DXVA2_ModeMPEG2_VLD.
7. When you find a candidate decoder device GUID, pass the GUID to the IDirectXVideoDecoderService::GetDecoderRenderTargets method. This method returns an array of render target formats, specified as D3DFORMAT values.
8. Loop through the render target formats and look for one that matches your output format. Typically, a decoder device supports a single render target format. The decoder filter should connect to the renderer using this subtype. In the first call to CompleteConnect, the decoder can determine the render target format and then return this format as a preferred output type.
9. Call IDirectXVideoDecoderService::GetDecoderConfigurations. Pass in the same decoder device GUID, along with a DXVA2_VideoDesc structure that describes the proposed format. The method returns an array of DXVA2_ConfigPictureDecode structures. Each structure describes one possible configuration for the decoder device.
10. Assuming that the previous steps are successful, store the Direct3D device handle, the decoder device GUID, and the configuration structure. The filter will use this information to create the decoder device.

The following code shows how to find a decoder configuration.

HRESULT CDecoder::ConfigureDXVA2(IPin *pPin)
{
    UINT cDecoderGuids = 0;
    BOOL bFoundDXVA2Configuration = FALSE;
    GUID guidDecoder = GUID_NULL;

    DXVA2_ConfigPictureDecode config;
    ZeroMemory(&config, sizeof(config));

    // Variables that follow must be cleaned up at the end.
    IMFGetService *pGetService = NULL;
    IDirect3DDeviceManager9 *pDeviceManager = NULL;
    IDirectXVideoDecoderService *pDecoderService = NULL;

    GUID   *pDecoderGuids = NULL; // size = cDecoderGuids
    HANDLE hDevice = INVALID_HANDLE_VALUE;

    // Query the pin for IMFGetService.
    HRESULT hr = pPin->QueryInterface(IID_PPV_ARGS(&pGetService));

    // Get the Direct3D device manager.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE,
            IID_PPV_ARGS(&pDeviceManager)
            );
    }

    // Open a new device handle.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->OpenDeviceHandle(&hDevice);
    }

    // Get the video decoder service.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->GetVideoService(
            hDevice, IID_PPV_ARGS(&pDecoderService));
    }

    // Get the decoder GUIDs.
    if (SUCCEEDED(hr))
    {
        hr = pDecoderService->GetDecoderDeviceGuids(
            &cDecoderGuids, &pDecoderGuids);
    }

    if (SUCCEEDED(hr))
    {
        // Look for the decoder GUIDs we want.
        for (UINT iGuid = 0; iGuid < cDecoderGuids; iGuid++)
        {
            // Do we support this mode?
            if (!IsSupportedDecoderMode(pDecoderGuids[iGuid]))
            {
                continue;
            }

            // Find a configuration that we support.
            hr = FindDecoderConfiguration(pDecoderService, pDecoderGuids[iGuid],
                &config, &bFoundDXVA2Configuration);

            if (FAILED(hr))
            {
                break;
            }

            if (bFoundDXVA2Configuration)
            {
                // Found a good configuration. Save the GUID and exit the loop.
                guidDecoder = pDecoderGuids[iGuid];
                break;
            }
        }
    }

    if (!bFoundDXVA2Configuration)
    {
        hr = E_FAIL; // Unable to find a configuration.
    }

    if (SUCCEEDED(hr))
    {
        // Store the things we will need later.
        SafeRelease(&m_pDecoderService);
        m_pDecoderService = pDecoderService;
        m_pDecoderService->AddRef();

        m_DecoderConfig = config;
        m_DecoderGuid = guidDecoder;
        m_hDevice = hDevice;
    }

    if (FAILED(hr))
    {
        if (hDevice != INVALID_HANDLE_VALUE)
        {
            pDeviceManager->CloseDeviceHandle(hDevice);
        }
    }

    SafeRelease(&pGetService);
    SafeRelease(&pDeviceManager);
    SafeRelease(&pDecoderService);
    return hr;
}

HRESULT CDecoder::FindDecoderConfiguration(
    /* [in] */  IDirectXVideoDecoderService *pDecoderService,
    /* [in] */  const GUID& guidDecoder,
    /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
    /* [out] */ BOOL *pbFoundDXVA2Configuration
    )
{
    HRESULT hr = S_OK;
    UINT cFormats = 0;
    UINT cConfigurations = 0;

    D3DFORMAT                 *pFormats = NULL; // size = cFormats
    DXVA2_ConfigPictureDecode *pConfig = NULL;  // size = cConfigurations

    // Find the valid render target formats for this decoder GUID.
    hr = pDecoderService->GetDecoderRenderTargets(
        guidDecoder,
        &cFormats,
        &pFormats
        );

    if (SUCCEEDED(hr))
    {
        // Look for a format that matches our output format.
        for (UINT iFormat = 0; iFormat < cFormats; iFormat++)
        {
            if (pFormats[iFormat] != (D3DFORMAT)m_fccOutputFormat)
            {
                continue;
            }

            // Fill in the video description. Set the width, height, format,
            // and frame rate.
            DXVA2_VideoDesc videoDesc = {0};

            FillInVideoDescription(&videoDesc); // Private helper function.
            videoDesc.Format = pFormats[iFormat];

            // Get the available configurations.
            hr = pDecoderService->GetDecoderConfigurations(
                guidDecoder,
                &videoDesc,
                NULL, // Reserved.
                &cConfigurations,
                &pConfig
                );

            if (FAILED(hr))
            {
                break;
            }

            // Find a supported configuration.
            for (UINT iConfig = 0; iConfig < cConfigurations; iConfig++)
            {
                if (IsSupportedDecoderConfig(pConfig[iConfig]))
                {
                    // This configuration is good.
                    *pbFoundDXVA2Configuration = TRUE;
                    *pSelectedConfig = pConfig[iConfig];
                    break;
                }
            }

            CoTaskMemFree(pConfig);
            break;

        } // End of formats loop.
    }

    CoTaskMemFree(pFormats);

    // Note: It is possible to return S_OK without finding a configuration.
    return hr;
}

Because this example is generic, some of the logic has been placed in helper functions that would need to be implemented by the decoder. The following code shows the declarations for these functions:

// Returns TRUE if the decoder supports a given decoding mode.
BOOL IsSupportedDecoderMode(const GUID& mode);

// Returns TRUE if the decoder supports a given decoding configuration.
BOOL IsSupportedDecoderConfig(const DXVA2_ConfigPictureDecode& config);

// Fills in a DXVA2_VideoDesc structure based on the input format.
void FillInVideoDescription(DXVA2_VideoDesc *pDesc);
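As a rough sketch only, a hypothetical MPEG-2 decoder might implement the first two helpers along these lines; the exact modes and configuration checks depend on what your decoder actually supports.

// Sketch for a hypothetical MPEG-2 VLD decoder; adjust to the modes your decoder handles.
BOOL IsSupportedDecoderMode(const GUID& mode)
{
    return (mode == DXVA2_ModeMPEG2_VLD);
}

BOOL IsSupportedDecoderConfig(const DXVA2_ConfigPictureDecode& config)
{
    // For VLD decoding, ConfigBitstreamRaw is typically nonzero; a real decoder
    // would also check the other configuration members it depends on.
    return (config.ConfigBitstreamRaw != 0);
}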

Notifying the Video Renderer


If the decoder finds a decoder configuration, the next step is to notify the video renderer that the decoder will use hardware acceleration. You can perform this step inside the CompleteConnect method. This step must occur before the allocator is selected, because it affects how the allocator is selected.

1. Query the renderer's input pin for the IMFGetService interface.
2. Call IMFGetService::GetService to get a pointer to the IDirectXVideoMemoryConfiguration interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3. Call IDirectXVideoMemoryConfiguration::GetAvailableSurfaceTypeByIndex in a loop, incrementing the dwTypeIndex variable from zero. Stop when the method returns the value DXVA2_SurfaceType_DecoderRenderTarget in the pdwType parameter. This step ensures that the video renderer supports hardware-accelerated decoding. This step will always succeed for the EVR filter.
4. If the previous step succeeded, call IDirectXVideoMemoryConfiguration::SetSurfaceType with the value DXVA2_SurfaceType_DecoderRenderTarget. Calling SetSurfaceType with this value puts the video renderer into DXVA mode. When the video renderer is in this mode, the decoder must provide its own allocator.

The following code shows how to notify the video renderer.

HRESULT CDecoder::SetEVRForDXVA2(IPin *pPin)
{
    HRESULT hr = S_OK;

    IMFGetService                    *pGetService = NULL;
    IDirectXVideoMemoryConfiguration *pVideoConfig = NULL;

    // Query the pin for IMFGetService.
    hr = pPin->QueryInterface(__uuidof(IMFGetService), (void**)&pGetService);

    // Get the IDirectXVideoMemoryConfiguration interface.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE,
            IID_PPV_ARGS(&pVideoConfig));
    }

    // Notify the EVR.
    if (SUCCEEDED(hr))
    {
        DXVA2_SurfaceType surfaceType;

        for (DWORD iTypeIndex = 0; ; iTypeIndex++)
        {
            hr = pVideoConfig->GetAvailableSurfaceTypeByIndex(iTypeIndex, &surfaceType);

            if (FAILED(hr))
            {
                break;
            }

            if (surfaceType == DXVA2_SurfaceType_DecoderRenderTarget)
            {
                hr = pVideoConfig->SetSurfaceType(DXVA2_SurfaceType_DecoderRenderTarget);
                break;
            }
        }
    }

    SafeRelease(&pGetService);
    SafeRelease(&pVideoConfig);
    return hr;
}

If the decoder finds a valid configuration and successfully notifies the video renderer, the decoder can use DXVA for decoding. The decoder must implement a custom allocator for its output pin, as described in the next section.

Allocating Uncompressed Buffers


In DXVA 2.0, the decoder is responsible for allocating Direct3D surfaces to use as uncompressed video buffers. Therefore, the decoder must implement a custom allocator that will create the surfaces. The media samples provided by this allocator will hold pointers to the Direct3D surfaces. The EVR retrieves a pointer to the surface by calling IMFGetService::GetService on the media sample. The service identifier is MR_BUFFER_SERVICE.

To provide the custom allocator, perform the following steps:

1. Define a class for the media samples. This class can derive from the CMediaSample class. Inside this class, do the following:
   Store a pointer to the Direct3D surface.
   Implement the IMFGetService interface. In the GetService method, if the service GUID is MR_BUFFER_SERVICE, query the Direct3D surface for the requested interface. Otherwise, GetService can return MF_E_UNSUPPORTED_SERVICE.
   Override the CMediaSample::GetPointer method to return E_NOTIMPL.
2. Define a class for the allocator. The allocator can derive from the CBaseAllocator class. Inside this class, do the following:
   Override the CBaseAllocator::Alloc method. Inside this method, call IDirectXVideoAccelerationService::CreateSurface to create the surfaces. (The IDirectXVideoDecoderService interface inherits this method from IDirectXVideoAccelerationService.)
   Override the CBaseAllocator::Free method to release the surfaces.
3. In your filter's output pin, override the CBaseOutputPin::InitAllocator method. Inside this method, create an instance of your custom allocator.
4. In your filter, implement the CTransformFilter::DecideBufferSize method. The pProperties parameter indicates the number of surfaces that the EVR requires. Add to this value the number of surfaces that your decoder requires, and call IMemAllocator::SetProperties on the allocator. (A minimal sketch of this step follows the list.)
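The following is a minimal sketch of step 4, assuming the decoder needs a fixed number of extra surfaces. The constant DECODER_EXTRA_SURFACES and the CDecoder class shown here are illustrative assumptions, not part of the SDK sample.

HRESULT CDecoder::DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProperties)
{
    // Assumption: the decoder needs this many surfaces in addition to what the EVR requests.
    const long DECODER_EXTRA_SURFACES = 4;

    if (pAlloc == NULL || pProperties == NULL)
    {
        return E_POINTER;
    }

    // pProperties->cBuffers holds the number of surfaces that the EVR requires.
    pProperties->cBuffers += DECODER_EXTRA_SURFACES;

    ALLOCATOR_PROPERTIES Actual;
    HRESULT hr = pAlloc->SetProperties(pProperties, &Actual);
    if (FAILED(hr))
    {
        return hr;
    }

    // Fail if the allocator could not provide enough buffers.
    return (Actual.cBuffers < pProperties->cBuffers) ? E_FAIL : S_OK;
}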

The following code shows how to implement the media sample class:

class CDecoderSample : public CMediaSample, public IMFGetService
{
    friend class CDecoderAllocator;

public:

    CDecoderSample(CDecoderAllocator *pAlloc, HRESULT *phr) :
        CMediaSample(NAME("DecoderSample"), (CBaseAllocator*)pAlloc, phr, NULL, 0),
        m_pSurface(NULL),
        m_dwSurfaceId(0)
    {
    }

    // Note: CMediaSample does not derive from CUnknown, so we cannot use the
    //       DECLARE_IUNKNOWN macro that is used by most of the filter classes.

    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        CheckPointer(ppv, E_POINTER);

        if (riid == IID_IMFGetService)
        {
            *ppv = static_cast<IMFGetService*>(this);
            AddRef();
            return S_OK;
        }
        else
        {
            return CMediaSample::QueryInterface(riid, ppv);
        }
    }

    STDMETHODIMP_(ULONG) AddRef()
    {
        return CMediaSample::AddRef();
    }

    STDMETHODIMP_(ULONG) Release()
    {
        // Return a temporary variable for thread safety.
        ULONG cRef = CMediaSample::Release();
        return cRef;
    }

    // IMFGetService::GetService
    STDMETHODIMP GetService(REFGUID guidService, REFIID riid, LPVOID *ppv)
    {
        if (guidService != MR_BUFFER_SERVICE)
        {
            return MF_E_UNSUPPORTED_SERVICE;
        }
        else if (m_pSurface == NULL)
        {
            return E_NOINTERFACE;
        }
        else
        {
            return m_pSurface->QueryInterface(riid, ppv);
        }
    }

    // Override GetPointer because this class does not manage a system memory buffer.
    // The EVR uses the MR_BUFFER_SERVICE service to get the Direct3D surface.
    STDMETHODIMP GetPointer(BYTE **ppBuffer)
    {
        return E_NOTIMPL;
    }

private:

    // Sets the pointer to the Direct3D surface.
    void SetSurface(DWORD surfaceId, IDirect3DSurface9 *pSurf)
    {
        SafeRelease(&m_pSurface);

        m_pSurface = pSurf;
        if (m_pSurface)
        {
            m_pSurface->AddRef();
        }

        m_dwSurfaceId = surfaceId;
    }

    IDirect3DSurface9 *m_pSurface;
    DWORD             m_dwSurfaceId;
};

The following code shows how to implement the Alloc method on the allocator.

HRESULT CDecoderAllocator::Alloc()
{
    CAutoLock lock(this);

    HRESULT hr = S_OK;

    if (m_pDXVA2Service == NULL)
    {
        return E_UNEXPECTED;
    }

    hr = CBaseAllocator::Alloc();

    // If the requirements have not changed, do not reallocate.
    if (hr == S_FALSE)
    {
        return S_OK;
    }

    if (SUCCEEDED(hr))
    {
        // Free the old resources.
        Free();

        // Allocate a new array of pointers.
        m_ppRTSurfaceArray = new (std::nothrow) IDirect3DSurface9*[m_lCount];
        if (m_ppRTSurfaceArray == NULL)
        {
            hr = E_OUTOFMEMORY;
        }
        else
        {
            ZeroMemory(m_ppRTSurfaceArray, sizeof(IDirect3DSurface9*) * m_lCount);
        }
    }

    // Allocate the surfaces.
    if (SUCCEEDED(hr))
    {
        hr = m_pDXVA2Service->CreateSurface(
            m_dwWidth,
            m_dwHeight,
            m_lCount - 1,
            (D3DFORMAT)m_dwFormat,
            D3DPOOL_DEFAULT,
            0,
            DXVA2_VideoDecoderRenderTarget,
            m_ppRTSurfaceArray,
            NULL
            );
    }

    if (SUCCEEDED(hr))
    {
        for (m_lAllocated = 0; m_lAllocated < m_lCount; m_lAllocated++)
        {
            CDecoderSample *pSample = new (std::nothrow) CDecoderSample(this, &hr);

            if (pSample == NULL)
            {
                hr = E_OUTOFMEMORY;
                break;
            }
            if (FAILED(hr))
            {
                break;
            }

            // Assign the Direct3D surface pointer and the index.
            pSample->SetSurface(m_lAllocated, m_ppRTSurfaceArray[m_lAllocated]);

            // Add to the sample list.
            m_lFree.Add(pSample);
        }
    }

    if (SUCCEEDED(hr))
    {
        m_bChanged = FALSE;
    }
    return hr;
}

Here is the code for the Free method:

void CDecoderAllocator::Free()
{
    CMediaSample *pSample = NULL;

    do
    {
        pSample = m_lFree.RemoveHead();
        if (pSample)
        {
            delete pSample;
        }
    } while (pSample);

    if (m_ppRTSurfaceArray)
    {
        for (long i = 0; i < m_lAllocated; i++)
        {
            SafeRelease(&m_ppRTSurfaceArray[i]);
        }

        delete [] m_ppRTSurfaceArray;
    }
    m_lAllocated = 0;
}

For more information about implementing custom allocators, see the topic Providing a Custom Allocator in the DirectShow SDK documentation.

Decoding
To create the decoder device, call IDirectXVideoDecoderService::CreateVideoDecoder. The method returns a pointer to the IDirectXVideoDecoder interface of the decoder device.

On each frame, call IDirect3DDeviceManager9::TestDevice to test the device handle. If the device has changed, the method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, do the following:

1. Close the device handle by calling IDirect3DDeviceManager9::CloseDeviceHandle.
2. Release the IDirectXVideoDecoderService and IDirectXVideoDecoder pointers.
3. Open a new device handle.
4. Negotiate a new decoder configuration, as described in the section Finding a Decoder Configuration.
5. Create a new decoder device.
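As a rough sketch of this recovery path, the code below checks the device on each frame and rebuilds the DXVA objects when TestDevice reports a new device. The member names (m_pDeviceManager, m_hDevice, m_pDecoderService, m_pDecoder) and the helpers FindDecoderConfiguration and CreateDecoderDevice are assumptions for illustration only.

HRESULT CDecoder::CheckDecoderDevice()
{
    HRESULT hr = m_pDeviceManager->TestDevice(m_hDevice);

    if (hr == DXVA2_E_NEW_VIDEO_DEVICE)
    {
        // The Direct3D device changed; rebuild the DXVA decoder.
        m_pDeviceManager->CloseDeviceHandle(m_hDevice);
        m_hDevice = NULL;

        SafeRelease(&m_pDecoder);
        SafeRelease(&m_pDecoderService);

        hr = m_pDeviceManager->OpenDeviceHandle(&m_hDevice);
        if (SUCCEEDED(hr))
        {
            hr = m_pDeviceManager->GetVideoService(
                m_hDevice, IID_PPV_ARGS(&m_pDecoderService));
        }
        if (SUCCEEDED(hr))
        {
            // Renegotiate the configuration and re-create the decoder device,
            // as described earlier (details omitted here).
            hr = FindDecoderConfiguration();   // hypothetical helper
        }
        if (SUCCEEDED(hr))
        {
            hr = CreateDecoderDevice();        // hypothetical helper
        }
    }
    return hr;
}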

Assuming that the device handle is valid, the decoding process works as follows:

1. Call IDirectXVideoDecoder::BeginFrame.
2. Do the following one or more times:
   1. Call IDirectXVideoDecoder::GetBuffer to get a DXVA decoder buffer.
   2. Fill the buffer.
   3. Call IDirectXVideoDecoder::ReleaseBuffer.
3. Call IDirectXVideoDecoder::Execute to perform the decoding operations on the frame.

DXVA 2.0 uses the same data structures as DXVA 1.0 for decoding operations. For the original set of DXVA profiles (for H.261, H.263, and MPEG-2), these data structures are described in the DXVA 1.0 specification.

Within each pair of BeginFrame/Execute calls, you may call GetBuffer multiple times, but only once for each type of DXVA buffer. If you call it twice with the same buffer type, you will overwrite the data.

After calling Execute, call IMemInputPin::Receive to deliver the frame to the video renderer, as with software decoding. The Receive method is asynchronous; after it returns, the decoder can continue decoding the next frame. The display driver prevents any decoding commands from overwriting the buffer while the buffer is in use.

The decoder should not reuse a surface to decode another frame until the renderer has released the sample. When the renderer releases the sample, the allocator puts the sample back into its pool of available samples. To get the next available sample, call CBaseOutputPin::GetDeliveryBuffer, which in turn calls IMemAllocator::GetBuffer. For more information, see the topic Overview of Data Flow in DirectShow in the DirectShow documentation.
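The following minimal sketch shows the BeginFrame/GetBuffer/Execute call pattern described above. It is not the SDK's sample code: the codec-specific work of filling the DXVA buffers and describing them to Execute is only indicated by comments, and the helper name DecodeOneFrame is hypothetical.

HRESULT DecodeOneFrame(IDirectXVideoDecoder *pDecoder, IDirect3DSurface9 *pSurface)
{
    HRESULT hr = pDecoder->BeginFrame(pSurface, NULL);
    if (FAILED(hr))
    {
        return hr;
    }

    // For each type of DXVA buffer that the frame needs (picture parameters,
    // bitstream data, and so on), get the buffer, fill it, and release it.
    void *pBuffer = NULL;
    UINT cbBuffer = 0;

    hr = pDecoder->GetBuffer(DXVA2_PictureParametersBufferType, &pBuffer, &cbBuffer);
    if (SUCCEEDED(hr))
    {
        // ... copy the picture parameters into pBuffer (codec-specific) ...
        hr = pDecoder->ReleaseBuffer(DXVA2_PictureParametersBufferType);
    }
    // Repeat for the other buffer types. Do not request the same buffer type
    // twice within one BeginFrame/Execute pair.

    if (SUCCEEDED(hr))
    {
        // Describe the buffers that were filled and submit the frame.
        DXVA2_DecodeExecuteParams params = { 0 };
        // params.NumCompBuffers and params.pCompressedBuffers would list the
        // DXVA2_DecodeBufferDesc entries for the buffers filled above.
        hr = pDecoder->Execute(&params);
    }

    // EndFrame signals that the decoder has finished submitting commands
    // for this frame.
    pDecoder->EndFrame(NULL);
    return hr;
}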

Supporting DXVA 2.0 in Media Foundation


This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a Media Foundation transform (MFT) using Microsoft Direct3D 9. Specifically, it describes the communication between the decoder and the video renderer, which is mediated by the topology loader. This topic does not describe how to implement DXVA decoding.

In the remainder of this topic, the term decoder refers to the decoder MFT, which receives compressed video and outputs uncompressed video. The term decoder device refers to a hardware video accelerator implemented by the graphics driver.

Here are the basic steps that a decoder must perform to support DXVA 2.0 in Media Foundation:

1. Open a handle to the Direct3D 9 device.
2. Find a DXVA decoder configuration.
3. Allocate uncompressed buffers.
4. Decode frames.

These steps are described in more detail in the remainder of this topic.

Opening a Direct3D Device Handle


The MFT uses the Microsoft Direct3D device manager to get a handle to the Direct3D 9 device. To open the device handle, perform the following steps:

1. Expose the MF_SA_D3D_AWARE attribute with the value TRUE. The topology loader queries this attribute by calling IMFTransform::GetAttributes. Setting the attribute to TRUE notifies the topology loader that the MFT supports DXVA.
2. When format negotiation begins, the topology loader calls IMFTransform::ProcessMessage with the MFT_MESSAGE_SET_D3D_MANAGER message. The ulParam parameter is an IUnknown pointer to the video renderer's Direct3D device manager. Query this pointer for the IDirect3DDeviceManager9 interface.
3. Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the renderer's Direct3D device.
4. Call IDirect3DDeviceManager9::GetVideoService and pass in the device handle. This method returns a pointer to the IDirectXVideoDecoderService interface.
5. Cache the pointers and the device handle.
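A minimal sketch of steps 2 through 5 is shown below. The class name CDecoderMFT, its member variables (m_pDeviceManager, m_hDevice, m_pDecoderService), and the ReleaseD3DResources helper are assumptions for illustration only.

HRESULT CDecoderMFT::OnSetD3DManager(ULONG_PTR ulParam)
{
    // A NULL pointer means the MFT should release its Direct3D resources.
    if (ulParam == 0)
    {
        ReleaseD3DResources();   // hypothetical helper
        return S_OK;
    }

    IUnknown *pUnk = reinterpret_cast<IUnknown*>(ulParam);

    HRESULT hr = pUnk->QueryInterface(IID_PPV_ARGS(&m_pDeviceManager));
    if (SUCCEEDED(hr))
    {
        // Open a handle to the renderer's Direct3D device.
        hr = m_pDeviceManager->OpenDeviceHandle(&m_hDevice);
    }
    if (SUCCEEDED(hr))
    {
        // Get the decoder service and cache the pointer.
        hr = m_pDeviceManager->GetVideoService(
            m_hDevice, IID_PPV_ARGS(&m_pDecoderService));
    }
    return hr;
}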

Finding a Decoder Configuration

The MFT must find a compatible configuration for the DXVA decoder device. Perform the following steps inside the IMFTransform::SetInputType method, after validating the input type:

1. Call IDirectXVideoDecoderService::GetDecoderDeviceGuids. This method returns an array of decoder device GUIDs.
2. Loop through the array of decoder GUIDs to find the ones that the decoder supports. For example, for an MPEG-2 decoder, you would look for DXVA2_ModeMPEG2_MOCOMP, DXVA2_ModeMPEG2_IDCT, or DXVA2_ModeMPEG2_VLD.
3. When you find a candidate decoder device GUID, pass the GUID to the IDirectXVideoDecoderService::GetDecoderRenderTargets method. This method returns an array of render target formats, specified as D3DFORMAT values.
4. Loop through the render target formats and look for a format supported by the decoder.
5. Call IDirectXVideoDecoderService::GetDecoderConfigurations. Pass in the same decoder device GUID, along with a DXVA2_VideoDesc structure that describes the proposed output format. The method returns an array of DXVA2_ConfigPictureDecode structures. Each structure describes one possible configuration for the decoder device. Look for a configuration that the decoder supports.
6. Store the render target format and configuration.
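The following sketch illustrates one way to run this loop. The helpers IsSupportedDecoderMode and IsSupportedRenderTarget stand in for the decoder's own capability checks and are hypothetical; the chosen render target format is returned in the Format member of the DXVA2_VideoDesc structure.

HRESULT FindDecoderConfiguration(
    IDirectXVideoDecoderService *pService,
    DXVA2_VideoDesc *pDesc,              // Proposed output format; Format is updated
                                         // with the chosen render target format.
    GUID *pDeviceGuid,                   // Receives the decoder device GUID.
    DXVA2_ConfigPictureDecode *pConfig   // Receives the decoder configuration.
    )
{
    UINT cGuids = 0;
    GUID *pGuids = NULL;

    HRESULT hr = pService->GetDecoderDeviceGuids(&cGuids, &pGuids);
    if (FAILED(hr))
    {
        return hr;
    }

    BOOL bFound = FALSE;

    for (UINT i = 0; i < cGuids && !bFound; i++)
    {
        // IsSupportedDecoderMode: hypothetical helper that checks whether the
        // decoder implements this DXVA mode (for example, DXVA2_ModeMPEG2_VLD).
        if (!IsSupportedDecoderMode(pGuids[i]))
        {
            continue;
        }

        UINT cFormats = 0;
        D3DFORMAT *pFormats = NULL;

        hr = pService->GetDecoderRenderTargets(pGuids[i], &cFormats, &pFormats);
        if (FAILED(hr))
        {
            break;
        }

        for (UINT j = 0; j < cFormats && !bFound; j++)
        {
            // IsSupportedRenderTarget: hypothetical helper for the decoder's
            // own output-format check.
            if (!IsSupportedRenderTarget(pFormats[j]))
            {
                continue;
            }

            pDesc->Format = pFormats[j];   // Propose this render target format.

            UINT cConfigs = 0;
            DXVA2_ConfigPictureDecode *pConfigs = NULL;

            hr = pService->GetDecoderConfigurations(
                pGuids[i], pDesc, NULL, &cConfigs, &pConfigs);

            if (SUCCEEDED(hr) && cConfigs > 0)
            {
                // For simplicity, take the first configuration.
                *pDeviceGuid = pGuids[i];
                *pConfig = pConfigs[0];
                bFound = TRUE;
            }
            CoTaskMemFree(pConfigs);
        }
        CoTaskMemFree(pFormats);
    }

    CoTaskMemFree(pGuids);
    return bFound ? S_OK : MF_E_UNSUPPORTED_D3D_TYPE;
}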

In the IMFTransform::GetOutputAvailableType method, return an uncompressed video format, based on the proposed render target format. In the IMFTransform::SetOutputType method, check the media type against the render target format.

Fallback to Software Decoding

If the MFT cannot find a DXVA configuration (for example, if the graphics driver does not have the right capabilities), it should return the error code MF_E_UNSUPPORTED_D3D_TYPE from the SetInputType and SetOutputType methods. The topology loader will respond by sending the MFT_MESSAGE_SET_D3D_MANAGER message with the value NULL for the ulParam parameter. The MFT should release its pointer to the IDirect3DDeviceManager9 interface. The topology loader will then renegotiate the media type, and the MFT can use software decoding.

Allocating Uncompressed Buffers


In DXVA 2.0, the decoder is responsible for allocating Direct3D surfaces to use as uncompressed video buffers. The decoder should allocate 3 surfaces for the EVR to use for deinterlacing. This number is fixed, because Media Foundation does not provide a way for the EVR to specify how many surfaces the graphics driver requires for deinterlacing. Three surfaces should be sufficient for any driver.

In the IMFTransform::GetOutputStreamInfo method, set the MFT_OUTPUT_STREAM_PROVIDES_SAMPLES flag in the MFT_OUTPUT_STREAM_INFO structure. This flag notifies the Media Session that the MFT allocates its own output samples.

To create the surfaces, call IDirectXVideoAccelerationService::CreateSurface. (The IDirectXVideoDecoderService interface inherits this method from IDirectXVideoAccelerationService.) You can do this in SetInputType, after finding the render target format.

For each surface, call MFCreateVideoSampleFromSurface to create a media sample to hold the surface. The method returns a pointer to the IMFSample interface.
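A minimal sketch of this allocation is shown below. It assumes a decoder class CDecoderMFT with an m_pDecoderService member and a simple sample container m_SampleQueue; both names are illustrative assumptions.

HRESULT CDecoderMFT::AllocateVideoSamples(
    UINT width, UINT height, D3DFORMAT fmt, UINT cSurfaces)
{
    // Create the Direct3D surfaces that will hold the decoded frames.
    IDirect3DSurface9 **ppSurfaces =
        new (std::nothrow) IDirect3DSurface9*[cSurfaces];
    if (ppSurfaces == NULL)
    {
        return E_OUTOFMEMORY;
    }
    ZeroMemory(ppSurfaces, sizeof(IDirect3DSurface9*) * cSurfaces);

    HRESULT hr = m_pDecoderService->CreateSurface(
        width, height,
        cSurfaces - 1,                    // Number of back buffers.
        fmt, D3DPOOL_DEFAULT, 0,
        DXVA2_VideoDecoderRenderTarget,
        ppSurfaces, NULL);

    // Wrap each surface in a media sample.
    for (UINT i = 0; SUCCEEDED(hr) && i < cSurfaces; i++)
    {
        IMFSample *pSample = NULL;
        hr = MFCreateVideoSampleFromSurface(ppSurfaces[i], &pSample);
        if (SUCCEEDED(hr))
        {
            // m_SampleQueue: assumed std::vector<IMFSample*> member; the
            // samples are handed out later in ProcessOutput.
            m_SampleQueue.push_back(pSample);
        }
    }

    // The samples hold their own references to the surfaces.
    for (UINT i = 0; i < cSurfaces; i++)
    {
        SafeRelease(&ppSurfaces[i]);
    }
    delete [] ppSurfaces;
    return hr;
}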

Decoding
To create the decoder device, call IDirectXVideoDecoderService::CreateVideoDecoder. The method returns a pointer to the IDirectXVideoDecoder interface of the decoder device.

Decoding should occur inside the IMFTransform::ProcessOutput method. On each frame, call IDirect3DDeviceManager9::TestDevice to test the device handle. If the device has changed, the method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, do the following:

1. Close the device handle by calling IDirect3DDeviceManager9::CloseDeviceHandle.
2. Release the IDirectXVideoDecoderService and IDirectXVideoDecoder pointers.
3. Open a new device handle.
4. Negotiate a new decoder configuration, as described in "Finding a Decoder Configuration" earlier on this page.
5. Create a new decoder device.

Assuming that the device handle is valid, the decoding process works as follows:

1. Get an available surface that is not currently in use. (Initially all of the surfaces are available.)
2. Query the media sample for the IMFTrackedSample interface.
3. Call IMFTrackedSample::SetAllocator and provide a pointer to the IMFAsyncCallback interface, implemented by the decoder. When the video renderer releases the sample, the decoder's callback will be invoked.
4. Call IDirectXVideoDecoder::BeginFrame.
5. Do the following one or more times:
   1. Call IDirectXVideoDecoder::GetBuffer to get a DXVA decoder buffer.
   2. Fill the buffer.
   3. Call IDirectXVideoDecoder::ReleaseBuffer.
6. Call IDirectXVideoDecoder::Execute to perform the decoding operations on the frame.

DXVA 2.0 uses the same data structures as DXVA 1.0 for decoding operations. For the original set of DXVA profiles (for H.261, H.263, and MPEG-2), these data structures are described in the DXVA 1.0 specification.

Within each pair of BeginFrame/Execute calls, you may call GetBuffer multiple times, but only once for each type of DXVA buffer. If you call it twice with the same buffer type, you will overwrite the data.

Use the callback from the SetAllocator method (step 3) to keep track of which samples are currently available and which are in use.
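The following sketch shows step 3 in isolation. The callback member m_xOnSampleFree is an assumed IMFAsyncCallback implementation on the decoder; its Invoke method would return the sample to the pool of available samples.

HRESULT CDecoderMFT::TrackOutputSample(IMFSample *pSample)
{
    IMFTrackedSample *pTracked = NULL;

    HRESULT hr = pSample->QueryInterface(IID_PPV_ARGS(&pTracked));
    if (SUCCEEDED(hr))
    {
        // The callback's Invoke method is called when the renderer is done
        // with the sample, so the decoder knows the surface is free again.
        hr = pTracked->SetAllocator(&m_xOnSampleFree, NULL);
        pTracked->Release();
    }
    return hr;
}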

Related topics
DirectX Video Acceleration 2.0
Media Foundation Transforms

DXVA Video Processing


DXVA video processing encapsulates the functions of the graphics hardware that are devoted to processing uncompressed video images. Video processing services include deinterlacing and video mixing. This topic contains the following sections:

Overview
Creating a Video Processing Device
Get the IDirectXVideoProcessorService Pointer
Enumerate the Video Processing Devices
Enumerate Render-Target Formats
Query the Device Capabilities
Create the Device
Video Process Blit
Blit Parameters
Input Samples
Image Composition
Example 1: Letterboxing
Example 2: Stretching Substream Images
Example 3: Mismatched Stream Heights
Example 4: Target Rectangle Smaller Than Destination Surface
Example 5: Source Rectangles
Example 6: Intersecting Destination Rectangles
Example 7: Stretching and Cropping Video
Input Sample Order
Example 1
Example 2
Example 3
Example 4
Related topics

Overview
Graphics hardware can use the graphics processing unit (GPU) to process uncompressed video images. A video processing device is a software component that encapsulates these functions. Applications can use a video processing device to perform functions such as:

Deinterlacing and inverse telecine
Mixing video substreams onto the main video image
Color adjustment (ProcAmp) and image filtering
Image scaling
Color-space conversion
Alpha blending

The following diagram shows the stages in the video processing pipeline. The diagram is not meant to show an actual implementation. For example, the graphics driver might combine several stages into a single operation. All of these operations can be performed in a single call to the video processing device. Some stages shown here, such as noise and detail filtering, might not be supported by the driver.

The input to the video processing pipeline always includes a primary video stream, which contains the main image data. The primary video stream determines the frame rate for the output video. Each frame of the output video is calculated relative to the input data from the primary video stream. Pixels in the primary stream are always opaque, with no per-pixel alpha data. The primary video stream can be progressive or interlaced.

Optionally, the video processing pipeline can receive up to 15 video substreams. A substream contains auxiliary image data, such as closed captions or DVD subpictures. These images are displayed over the primary video stream, and are generally not meant to be shown by themselves. Substream pictures can contain per-pixel alpha data, and are always progressive frames. The video processing device alpha-blends the substream images with the current deinterlaced frame from the primary video stream.

In the remainder of this topic, the term picture is used for the input data to a video processing device. A picture might consist of a progressive frame, a single field, or two interleaved fields. The output is always a deinterlaced frame.

A video driver can implement more than one video processing device, to provide different sets of video processing capabilities. Devices are identified by GUID. The following GUIDs are predefined:

DXVA2_VideoProcBobDevice. This device performs bob deinterlacing.

DXVA2_VideoProcProgressiveDevice. This device is used if the video contains only progressive frames, with no interlaced frames. (Some video content contains a mix of progressive and interlaced frames. The progressive device cannot be used for this kind of "mixed" video content, because a deinterlacing step is required for the interlaced frames.)

Every graphics driver that supports DXVA video processing must implement at least these two devices. The graphics driver may also provide other devices, which are identified by driver-specific GUIDs. For example, a driver might implement a proprietary deinterlacing algorithm that produces better quality output than bob deinterlacing. Some deinterlacing algorithms may require forward or backward reference pictures from the primary stream. If so, the caller must provide these pictures to the driver in the correct sequence, as described later in this section. A reference software device is also provided. The software device is optimized for quality rather than speed, and may not be adequate for real-time video processing. The reference software device uses the GUID value DXVA2_VideoProcSoftwareDevice.

Creating a Video Processing Device


Before using DXVA video processing, the application must create a video processing device. Here is a brief outline of the steps, which are explained in greater detail in the remainder of this section:

1. Get a pointer to the IDirectXVideoProcessorService interface.
2. Create a description of the video format for the primary video stream. Use this description to get a list of the video processing devices that support the video format. Devices are identified by GUID.
3. For a particular device, get a list of render-target formats supported by the device. The formats are returned as a list of D3DFORMAT values. If you plan to mix substreams, get a list of the supported substream formats as well.
4. Query the capabilities of each device.
5. Create the video processing device.

Sometimes you can omit some of these steps. For example, instead of getting the list of render-target formats, you could simply try creating the video processing device with your preferred format, and see if it succeeds. A common format such as D3DFMT_X8R8G8B8 is likely to succeed. The remainder of this section describes these steps in detail.

Get the IDirectXVideoProcessorService Pointer

The IDirectXVideoProcessorService interface is obtained from the Direct3D device. There are two ways to get a pointer to this interface:

From a Direct3D device.
From the Direct3D Device Manager.

If you have a pointer to a Direct3D device, you can get an IDirectXVideoProcessorService pointer by calling the DXVA2CreateVideoService function. Pass in a pointer to the device's IDirect3DDevice9 interface, and specify IID_IDirectXVideoProcessorService for the riid parameter, as shown in the following code:

// Create the DXVA-2 Video Processor service.
hr = DXVA2CreateVideoService(g_pD3DD9, IID_PPV_ARGS(&g_pDXVAVPS));

In some cases, one object creates the Direct3D device and then shares it with other objects through the Direct3D Device Manager. In this situation, you can call IDirect3DDeviceManager9::GetVideoService on the device manager to get the IDirectXVideoProcessorService pointer, as shown in the following code:

HRESULT GetVideoProcessorService(
    IDirect3DDeviceManager9 *pDeviceManager,
    IDirectXVideoProcessorService **ppVPService
    )
{
    *ppVPService = NULL;

    HANDLE hDevice;

    HRESULT hr = pDeviceManager->OpenDeviceHandle(&hDevice);
    if (SUCCEEDED(hr))
    {
        // Get the video processor service.
        HRESULT hr2 = pDeviceManager->GetVideoService(
            hDevice,
            IID_PPV_ARGS(ppVPService)
            );

        // Close the device handle.
        hr = pDeviceManager->CloseDeviceHandle(hDevice);

        if (FAILED(hr2))
        {
            hr = hr2;
        }
    }

    if (FAILED(hr))
    {
        SafeRelease(ppVPService);
    }
    return hr;
}

Enumerate the Video Processing Devices

To get a list of video processing devices, fill in a DXVA2_VideoDesc structure with the format of the primary video stream, and pass this structure to the IDirectXVideoProcessorService::GetVideoProcessorDeviceGuids method. The method returns an array of GUIDs, one for each video processing device that can be used with this video format.

Consider an application that renders a video stream in YUY2 format, using the BT.709 definition of YUV color, with a frame rate of 29.97 frames per second. Assume that the video content consists entirely of progressive frames. The following code fragment shows how to fill in the format description and get the device GUIDs:

// Initialize the video descriptor.
g_VideoDesc.SampleWidth                         = VIDEO_MAIN_WIDTH;
g_VideoDesc.SampleHeight                        = VIDEO_MAIN_HEIGHT;
g_VideoDesc.SampleFormat.VideoChromaSubsampling = DXVA2_VideoChromaSubsampling_MPEG2;
g_VideoDesc.SampleFormat.NominalRange           = DXVA2_NominalRange_16_235;
g_VideoDesc.SampleFormat.VideoTransferMatrix    = EX_COLOR_INFO[g_ExColorInfo][0];
g_VideoDesc.SampleFormat.VideoLighting          = DXVA2_VideoLighting_dim;
g_VideoDesc.SampleFormat.VideoPrimaries         = DXVA2_VideoPrimaries_BT709;
g_VideoDesc.SampleFormat.VideoTransferFunction  = DXVA2_VideoTransFunc_709;
g_VideoDesc.SampleFormat.SampleFormat           = DXVA2_SampleProgressiveFrame;
g_VideoDesc.Format                              = VIDEO_MAIN_FORMAT;
g_VideoDesc.InputSampleFreq.Numerator           = VIDEO_FPS;
g_VideoDesc.InputSampleFreq.Denominator         = 1;
g_VideoDesc.OutputFrameFreq.Numerator           = VIDEO_FPS;
g_VideoDesc.OutputFrameFreq.Denominator         = 1;

// Query the video processor GUID.
UINT count;
GUID* guids = NULL;

hr = g_pDXVAVPS->GetVideoProcessorDeviceGuids(&g_VideoDesc, &count, &guids);

The code for this example is taken from the DXVA2_VideoProc SDK sample. The GUID array in this example is allocated by the GetVideoProcessorDeviceGuids method, so the application must free the array by calling CoTaskMemFree. The remaining steps can be performed using any of the device GUIDs returned by this method.

Enumerate Render-Target Formats

To get the list of render-target formats supported by the device, pass the device GUID and the DXVA2_VideoDesc structure to the IDirectXVideoProcessorService::GetVideoProcessorRenderTargets method, as shown in the following code:

// Query the supported render-target formats.
UINT i, count;
D3DFORMAT* formats = NULL;

HRESULT hr = g_pDXVAVPS->GetVideoProcessorRenderTargets(
    guid, &g_VideoDesc, &count, &formats);

if (FAILED(hr))
{
    DBGMSG((L"GetVideoProcessorRenderTargets failed: 0x%x.\n", hr));
    return FALSE;
}

for (i = 0; i < count; i++)
{
    if (formats[i] == VIDEO_RENDER_TARGET_FORMAT)
    {
        break;
    }
}

CoTaskMemFree(formats);

if (i >= count)
{
    DBGMSG((L"The device does not support the render-target format.\n"));
    return FALSE;
}

The method returns an array of D3DFORMAT values. In this example, where the input type is YUY2, a typical list of formats might be D3DFMT_X8R8G8B8 (32-bit RGB) and D3DFMT_YUY2 (the input format). However, the exact list will depend on the driver.

The list of available formats for the substreams can vary depending on the render-target format and the input format. To get the list of substream formats, pass the device GUID, the format structure, and the render-target format to the IDirectXVideoProcessorService::GetVideoProcessorSubStreamFormats method, as shown in the following code:

// Query the supported substream formats.
formats = NULL;

hr = g_pDXVAVPS->GetVideoProcessorSubStreamFormats(
    guid, &g_VideoDesc, VIDEO_RENDER_TARGET_FORMAT, &count, &formats);

if (FAILED(hr))
{
    DBGMSG((L"GetVideoProcessorSubStreamFormats failed: 0x%x.\n", hr));
    return FALSE;
}

for (i = 0; i < count; i++)
{
    if (formats[i] == VIDEO_SUB_FORMAT)
    {
        break;
    }
}

CoTaskMemFree(formats);

if (i >= count)
{
    DBGMSG((L"The device does not support the substream format.\n"));
    return FALSE;
}

This method returns another array of D3DFORMAT values. Typical substream formats are AYUV and AI44.

Query the Device Capabilities

To get the capabilities of a particular device, pass the device GUID, the format structure, and a render-target format to the IDirectXVideoProcessorService::GetVideoProcessorCaps method. The method fills in a DXVA2_VideoProcessorCaps structure with the device capabilities.

// Query video processor capabilities.
hr = g_pDXVAVPS->GetVideoProcessorCaps(
    guid, &g_VideoDesc, VIDEO_RENDER_TARGET_FORMAT, &g_VPCaps);

if (FAILED(hr))
{
    DBGMSG((L"GetVideoProcessorCaps failed: 0x%x.\n", hr));
    return FALSE;
}

Create the Device

To create the video processing device, call IDirectXVideoProcessorService::CreateVideoProcessor. The input to this method is the device GUID, the format description, the render-target format, and the maximum number of substreams that you plan to mix. The method returns a pointer to the IDirectXVideoProcessor interface, which represents the video processing device.

// Finally create a video processor device.
hr = g_pDXVAVPS->CreateVideoProcessor(
    guid,
    &g_VideoDesc,
    VIDEO_RENDER_TARGET_FORMAT,
    SUB_STREAM_COUNT,
    &g_pDXVAVPD
    );

Video Process Blit


The main video processing operation is the video processing blit. (A blit is any operation that combines two or more bitmaps into a single bitmap. A video processing blit combines input pictures to create an output frame.) To perform a video processing blit, call IDirectXVideoProcessor::VideoProcessBlt. This method passes a set of video samples to the video processing device. In response, the video processing device processes the input pictures and generates one output frame. Processing can include deinterlacing, color-space conversion, and substream mixing. The output is written to a destination surface provided by the caller. The VideoProcessBlt method takes the following parameters:

pRT points to an IDirect3DSurface9 render target surface that will receive the processed video frame.
pBltParams points to a DXVA2_VideoProcessBltParams structure that specifies the parameters for the blit.
pSamples is the address of an array of DXVA2_VideoSample structures. These structures contain the input samples for the blit.
NumSamples gives the size of the pSamples array.
The Reserved parameter is reserved and should be set to NULL.

In the pSamples array, the caller must provide the following input samples:

The current picture from the primary video stream.
Forward and backward reference pictures, if required by the deinterlacing algorithm.
Zero or more substream pictures, up to a maximum of 15 substreams.

The driver expects this array to be in a particular order, as described in Input Sample Order.

Blit Parameters

The DXVA2_VideoProcessBltParams structure contains general parameters for the blit. The most important parameters are stored in the following members of the structure:

TargetFrame is the presentation time of the output frame. For progressive content, this time must equal the start time for the current frame from the primary video stream. This time is specified in the Start member of the DXVA2_VideoSample structure for that input sample.

For interlaced content, a frame with two interleaved fields produces two deinterlaced output frames. On the first output frame, the presentation time must equal the start time of the current picture in the primary video stream, just like progressive content. On the second output frame, the start time must equal the midpoint between the start time of the current picture in the primary video stream and the start time of the next picture in the stream.

For example, if the input video is 25 frames per second (50 fields per second), the output frames will have the time stamps shown in the following table. Time stamps are shown in units of 100 nanoseconds.

Input picture    TargetFrame (1)    TargetFrame (2)
0                0                  200000
400000           400000             600000
800000           800000             1000000
1200000          1200000            1400000

If interlaced content consists of single fields rather than interleaved fields, the output times always match the input times, as with progressive content.

TargetRect defines a rectangular region within the destination surface. The blit will write the output to this region. Specifically, every pixel inside TargetRect will be modified, and no pixels outside of TargetRect will be modified. The target rectangle defines the bounding rectangle for all of the input video streams. Placement of individual streams within that rectangle is controlled through the pSamples parameter of IDirectXVideoProcessor::VideoProcessBlt.

BackgroundColor gives the color of the background wherever no video image appears. For example, when a 16 x 9 video image is displayed within a 4 x 3 area (letterboxing), the letterboxed regions are displayed with the background color. The background color applies only within the target rectangle (TargetRect). Any pixels outside of TargetRect are not modified.

DestFormat describes the color space for the output video, for example, whether ITU-R BT.709 or BT.601 color is used. This information can affect how the image is displayed. For more information, see Extended Color Information.

Other parameters are described on the reference page for the DXVA2_VideoProcessBltParams structure.

Input Samples

The pSamples parameter of IDirectXVideoProcessor::VideoProcessBlt points to an array of DXVA2_VideoSample structures. Each of these structures contains information about one input sample and a pointer to the Direct3D surface that contains the sample. Each sample is one of the following:

The current picture from the primary stream.
A forward or backward reference picture from the primary stream, used for deinterlacing.
A substream picture.

The exact order in which the samples must appear in the array is described later, in the section Input Sample Order.

Up to 15 substream pictures can be provided, although most video applications need only one substream, at the most. The number of substreams can change with each call to VideoProcessBlt. Substream pictures are indicated by setting the SampleFormat.SampleFormat member of the DXVA2_VideoSample structure equal to DXVA2_SampleSubStream. For the primary video stream, this member describes the interlacing of the input video. For more information, see the DXVA2_SampleFormat enumeration.

For the primary video stream, the Start and End members of the DXVA2_VideoSample structure give the start and end times of the input sample. For substream pictures, set these values to zero, because the presentation time is always calculated from the primary stream. The application is responsible for tracking when each substream picture should be presented and submitting it to VideoProcessBlt at the proper time.

Two rectangles define how the source video is positioned for each stream:

The SrcRect member of the DXVA2_VideoSample structure specifies the source rectangle, a rectangular region of the source picture that will appear in the composited output frame. To crop the picture, set this to a value smaller than the frame size. Otherwise, set it equal to the frame size.
The DstRect member of the same structure specifies the destination rectangle, a rectangular region of the destination surface where the video frame will appear.

The driver blits pixels from the source rectangle into the destination rectangle. The two rectangles can have different sizes or aspect ratios; the driver will scale the image as needed. Moreover, each input stream can use a different scaling factor. In fact, scaling might be necessary to produce the correct aspect ratio in the output frame. The driver does not take the source's pixel aspect ratio into account, so if the source image uses non-square pixels, it is up to the application to calculate the correct destination rectangle.

The preferred substream formats are AYUV and AI44. The latter is a palettized format with 16 colors. Palette entries are specified in the Pal member of the DXVA2_VideoSample structure. (If your source video format is originally expressed as a Media Foundation media type, the palette entries are stored in the MF_MT_PALETTE attribute.) For non-palettized formats, clear this array to zero.
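To tie these parameters together, here is a minimal sketch of a single blit with one progressive picture from the primary stream and no substreams. The helper name ProcessFrame is hypothetical, and the sketch assumes the source, destination, and target rectangles are all the same size; g_pDXVAVPD is the video processor created earlier.

HRESULT ProcessFrame(
    IDirect3DSurface9 *pSrcSurface,   // Current picture (primary stream).
    IDirect3DSurface9 *pDstSurface,   // Destination render target.
    REFERENCE_TIME rtStart,
    REFERENCE_TIME rtEnd,
    const RECT& rcTarget)
{
    DXVA2_VideoProcessBltParams blt = { 0 };
    DXVA2_VideoSample sample = { 0 };

    // Blit parameters.
    blt.TargetFrame = rtStart;         // Presentation time of the output frame.
    blt.TargetRect  = rcTarget;        // Where the output appears.
    blt.Alpha       = DXVA2_Fixed32OpaqueAlpha();

    // Input sample: the current picture from the primary stream.
    sample.Start = rtStart;
    sample.End   = rtEnd;
    sample.SampleFormat.SampleFormat = DXVA2_SampleProgressiveFrame;
    sample.SrcSurface  = pSrcSurface;
    sample.SrcRect     = rcTarget;     // Assumption: no cropping or scaling.
    sample.DstRect     = rcTarget;
    sample.PlanarAlpha = DXVA2_Fixed32OpaqueAlpha();

    return g_pDXVAVPD->VideoProcessBlt(pDstSurface, &blt, &sample, 1, NULL);
}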

Image Composition
Every blit operation is defined by the following three rectangles:

The target rectangle (TargetRect) defines the region within the destination surface where the output will appear. The output image is clipped to this rectangle.
The destination rectangle for each stream (DstRect) defines where the input stream appears in the composited image.
The source rectangle for each stream (SrcRect) defines which part of the source image appears.

The target and destination rectangles are specified relative to the destination surface. The source rectangle is specified relative to the source image. All rectangles are specified in pixels.

The video processing device alpha blends the input pictures, using any of the following sources of alpha data:

Per-pixel alpha data from substreams.

A planar alpha value for each video stream, specified in the PlanarAlpha member of the DXVA2_VideoSample structure.
The planar alpha value of the composited image, specified in the Alpha member of the DXVA2_VideoProcessBltParams structure. This value is used to blend the entire composited image with the background color.

This section gives a series of examples that show how the video processing device creates the output image.

Example 1: Letterboxing

This example shows how to letterbox the source image, by setting the destination rectangle to be smaller than the target rectangle. The primary video stream in this example is a 720 x 480 image, and is meant to be displayed at a 16:9 aspect ratio. The destination surface is 640 x 480 pixels (4:3 aspect ratio). To achieve the correct aspect ratio, the destination rectangle must be 640 x 360. For simplicity, this example does not include a substream. The following diagram shows the source and destination rectangles.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 640, 480 }
Primary video:
    Source rectangle: { 0, 0, 720, 480 }
    Destination rectangle: { 0, 60, 640, 420 }

The driver will deinterlace the video, shrink the deinterlaced frame to 640 x 360, and blit the frame into the destination rectangle. The target rectangle is larger than the destination rectangle, so the driver will use the background color to fill the horizontal bars above and below the frame. The background color is specified in the DXVA2_VideoProcessBltParams structure.

Example 2: Stretching Substream Images

Substream pictures can extend beyond the primary video picture. In DVD video, for example, the primary video stream can have a 4:3 aspect ratio while the substream is 16:9. In this example, both video streams have the same source dimensions (720 x 480), but the substream is intended to be shown at a 16:9 aspect ratio. To achieve this aspect ratio, the substream image is stretched horizontally. The source and destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 854, 480 }
Primary video:
    Source rectangle: { 0, 0, 720, 480 }
    Destination rectangle: { 107, 0, 747, 480 }
Substream:
    Source rectangle: { 0, 0, 720, 480 }
    Destination rectangle: { 0, 0, 854, 480 }

These values preserve the image height and scale both images horizontally. In the regions where both images appear, they are alpha blended. Where the substream picture extends beyond the primary video, the substream is alpha blended with the background color. This alpha blending accounts for the altered colors in the right-hand side of the diagram.

Example 3: Mismatched Stream Heights

In the previous example, the substream and the primary stream are the same height. Streams can also have mismatched heights, as shown in this example. Areas within the target rectangle where no video appears are drawn using the background color, which is black in this example. The source and destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 150, 85 }
Primary video:
    Source rectangle: { 0, 0, 150, 50 }
    Destination rectangle: { 0, 17, 150, 67 }
Substream:
    Source rectangle: { 0, 0, 100, 85 }
    Destination rectangle: { 25, 0, 125, 85 }

Example 4: Target Rectangle Smaller Than Destination Surface

This example shows a case where the target rectangle is smaller than the destination surface.

The preceding diagram shows the following rectangles:

Destination surface: { 0, 0, 300, 200 }
Target rectangle: { 0, 0, 150, 85 }
Primary video:
    Source rectangle: { 0, 0, 150, 50 }
    Destination rectangle: { 0, 17, 150, 67 }
Substream:
    Source rectangle: { 0, 0, 100, 85 }
    Destination rectangle: { 25, 0, 125, 85 }

Pixels outside of the target rectangle are not modified, so the background color appears only within the target rectangle. The dotted area indicates portions of the destination surface that are not affected by the blit.

Example 5: Source Rectangles

If you specify a source rectangle that is smaller than the source picture, the driver will blit just that portion of the picture. In this example, the source rectangles specify the lower-right quadrant of the primary video stream and the lower-left quadrant of the substream (indicated by hash marks in the diagram). The destination rectangles are the same sizes as the source rectangles, so the video is not stretched. The source and destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 720, 576 }
Primary video:
    Source surface size: { 0, 0, 720, 480 }
    Source rectangle: { 360, 240, 720, 480 }
    Destination rectangle: { 0, 0, 360, 240 }
Substream:
    Source surface size: { 0, 0, 640, 576 }
    Source rectangle: { 0, 288, 320, 576 }
    Destination rectangle: { 400, 0, 720, 288 }

Example 6: Intersecting Destination Rectangles

This example is similar to the previous one, but the destination rectangles intersect. The surface dimensions are the same as in the previous example, but the source and destination rectangles are not. Again, the video is cropped but not stretched. The source and destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 720, 576 }
Primary video:
    Source surface size: { 0, 0, 720, 480 }
    Source rectangle: { 260, 92, 720, 480 }
    Destination rectangle: { 0, 0, 460, 388 }
Substream:
    Source surface size: { 0, 0, 640, 576 }
    Source rectangle: { 0, 0, 460, 388 }
    Destination rectangle: { 260, 188, 720, 576 }

Example 7: Stretching and Cropping Video

In this example, the video is stretched as well as cropped. A 180 x 120 region from each stream is stretched to cover a 360 x 240 area in the destination rectangle.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 720, 480 }
Primary video:
    Source surface size: { 0, 0, 360, 240 }
    Source rectangle: { 180, 120, 360, 240 }
    Destination rectangle: { 0, 0, 360, 240 }
Substream:
    Source surface size: { 0, 0, 360, 240 }
    Source rectangle: { 0, 0, 180, 120 }
    Destination rectangle: { 360, 240, 720, 480 }

Input Sample Order


The pSamples parameter of the VideoProcessBlt method is a pointer to an array of input samples. Samples from the primary video stream appear first, followed by substream pictures in Z-order. Samples must be placed into the array in the following order:

Samples for the primary video stream appear first in the array, in temporal order. Depending on the deinterlace mode, the driver may require one or more reference samples from the primary video stream. The NumForwardRefSamples and NumBackwardRefSamples members of the DXVA2_VideoProcessorCaps structure specify how many forward and backward reference samples are needed. The caller must provide these reference samples even if the video content is progressive and does not require deinterlacing. (This can occur when progressive frames are given to a deinterlacing device, for example when the source contains a mix of both interlaced and progressive frames.)
After the samples for the primary video stream, the array can contain up to 15 substream samples, arranged in Z-order, from bottom to top. Substreams are always progressive and do not require reference pictures.
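As a rough illustration of this ordering, the sketch below fills a four-element sample array for a mode that needs one backward and one forward reference, plus one substream. The helper name BuildSampleArray is hypothetical, and the SrcRect, DstRect, and PlanarAlpha members are omitted for brevity; they are set as shown earlier.

void BuildSampleArray(
    DXVA2_VideoSample pSamples[4],
    IDirect3DSurface9 *pPrevPicture,   // Backward reference (T - 1).
    IDirect3DSurface9 *pCurrPicture,   // Current picture (T).
    IDirect3DSurface9 *pNextPicture,   // Forward reference (T + 1).
    IDirect3DSurface9 *pSubstream,
    REFERENCE_TIME rtCurrent,
    REFERENCE_TIME rtFrameDuration)
{
    ZeroMemory(pSamples, sizeof(DXVA2_VideoSample) * 4);

    // Backward reference picture.
    pSamples[0].Start = rtCurrent - rtFrameDuration;
    pSamples[0].End = rtCurrent;
    pSamples[0].SampleFormat.SampleFormat = DXVA2_SampleFieldInterleavedEvenFirst;
    pSamples[0].SrcSurface = pPrevPicture;

    // Current picture.
    pSamples[1].Start = rtCurrent;
    pSamples[1].End = rtCurrent + rtFrameDuration;
    pSamples[1].SampleFormat.SampleFormat = DXVA2_SampleFieldInterleavedEvenFirst;
    pSamples[1].SrcSurface = pCurrPicture;

    // Forward reference picture.
    pSamples[2].Start = rtCurrent + rtFrameDuration;
    pSamples[2].End = rtCurrent + 2 * rtFrameDuration;
    pSamples[2].SampleFormat.SampleFormat = DXVA2_SampleFieldInterleavedEvenFirst;
    pSamples[2].SrcSurface = pNextPicture;

    // Substream picture: times are zero, format is DXVA2_SampleSubStream.
    pSamples[3].SampleFormat.SampleFormat = DXVA2_SampleSubStream;
    pSamples[3].SrcSurface = pSubstream;
}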

At any time, the primary video stream can switch between interlaced and progressive content, and the number of substreams can change.

The SampleFormat.SampleFormat member of the DXVA2_VideoSample structure indicates the type of picture. For substream pictures, set this value to DXVA2_SampleSubStream. For progressive pictures, the value is DXVA2_SampleProgressiveFrame. For interlaced pictures, the value depends on the field layout.

If the driver requires forward and backward reference samples, the full number of samples might not be available at the start of the video sequence. In that case, include entries for them in the pSamples array, but mark the missing samples as having type DXVA2_SampleUnknown.

The Start and End members of the DXVA2_VideoSample structure give the temporal location of each sample. These values are used only for samples from the primary video stream. For substream pictures, set both members to zero.

The following examples may help to clarify these requirements.

Example 1

The simplest case occurs when there are no substreams and the deinterlacing algorithm does not require reference samples (NumForwardRefSamples and NumBackwardRefSamples are both zero). Bob deinterlacing is an example of such an algorithm. In this case, the pSamples array should contain a single input surface, as shown in the following table.

Index          Surface type          Temporal location
pSamples[0]    Interlaced picture    T

The time value T is assumed to be the start time of the current video frame.

Example 2

In this example, the application mixes two substreams with the primary stream. The deinterlacing algorithm does not require reference samples. The following table shows how these samples are arranged in the pSamples array.

Index          Surface type          Temporal location    Z-order
pSamples[0]    Interlaced picture    T                    0
pSamples[1]    Substream             0                    1
pSamples[2]    Substream             0                    2

Example 3

Now suppose that the deinterlacing algorithm requires one backward reference sample and one forward reference sample. In addition, two substream pictures are provided, for a total of five surfaces. The correct ordering is shown in the following table.

Index          Surface type                      Temporal location    Z-order
pSamples[0]    Interlaced picture (reference)    T - 1                Not applicable
pSamples[1]    Interlaced picture                T                    0
pSamples[2]    Interlaced picture (reference)    T + 1                Not applicable
pSamples[3]    Substream                         0                    1
pSamples[4]    Substream                         0                    2

The time T - 1 is the start time of the frame before the current frame, and T + 1 is the start time of the following frame.

If the video stream switches to progressive content, using the same deinterlacing mode, the application must provide the same number of samples, as shown in the following table.

Index          Surface type                       Temporal location    Z-order
pSamples[0]    Progressive picture (reference)    T - 1                Not applicable
pSamples[1]    Progressive picture                T                    0
pSamples[2]    Progressive picture (reference)    T + 1                Not applicable
pSamples[3]    Substream                          0                    1
pSamples[4]    Substream                          0                    2

Example 4

At the start of a video sequence, forward and backward reference samples might not be available. When this happens, entries for the missing samples are included in the pSamples array, with sample type DXVA2_SampleUnknown.

Assuming that the deinterlacing mode needs one forward and one backward reference sample, the first three calls to VideoProcessBlt would have the sequences of inputs shown in the following three tables.

Index          Surface type                      Temporal location
pSamples[0]    Unknown
pSamples[1]    Unknown
pSamples[2]    Interlaced picture (reference)    T + 1

Index          Surface type                      Temporal location
pSamples[0]    Unknown
pSamples[1]    Interlaced picture                T
pSamples[2]    Interlaced picture (reference)    T + 1

Index          Surface type                      Temporal location
pSamples[0]    Interlaced picture                T - 1
pSamples[1]    Interlaced picture                T
pSamples[2]    Interlaced picture (reference)    T + 1

DXVA-HD
Microsoft DirectX Video Acceleration High Definition (DXVA-HD) is an API for hardware-accelerated video processing. DXVA-HD uses the GPU to perform functions such as deinterlacing, compositing, and color-space conversion. DXVA-HD is similar to DXVA Video Processing (DXVA-VP), but offers enhanced features and a simpler processing model. By providing a more flexible composition model, DXVA-HD is designed to support the next generation of HD optical formats and broadcast standards. The DXVA-HD API requires either a WDDM display driver that supports the DXVA-HD device driver interface (DDI), or a plug-in software processor.

Improvements over DXVA-VP
Related topics

Improvements over DXVA-VP


DXVA-HD expands the set of features provided by DXVA-VP. Enhancements include:

RGB and YUV mixing. Any stream can be either RGB or YUV. There is no longer a distinction between the primary stream and the substreams.
Deinterlacing of multiple streams. Any stream can be either progressive or interlaced. Moreover, the cadence and frame rate can vary from one input stream to the next.
RGB background colors. Previously, only YUV background colors were supported.
Luma keying. When luma keying is enabled, luma values that fall within a designated range become transparent.
Dynamic switching between deinterlace modes.

DXVA-HD also defines some advanced features that drivers can support. However, applications should not assume that all drivers will support these features. The advanced features include:

Inverse telecine (for example, 60i to 24p).
Frame-rate conversion (for example, 24p to 120p).
Alpha-fill modes.
Noise reduction and edge enhancement filtering.
Anamorphic non-linear scaling.
Extended YCbCr (xvYCC).

This section contains the following topics.

Creating a DXVA-HD Video Processor
Checking Supported DXVA-HD Formats
Creating DXVA-HD Video Surfaces
Setting DXVA-HD States
Performing the DXVA-HD Blit

Creating a DXVA-HD Video Processor


Microsoft DirectX Video Acceleration High Definition (DXVA-HD) uses two primary interfaces:

IDXVAHD_Device. Represents the DXVA-HD device. Use this interface to query the device capabilities and create the video processor.
IDXVAHD_VideoProcessor. Represents a set of video processing capabilities. Use this interface to perform the video processing blit.

In the code that follows, the following global variables are assumed:

IDirect3D9Ex            *g_pD3D = NULL;
IDirect3DDevice9Ex      *g_pD3DDevice = NULL;  // Direct3D device.
IDXVAHD_Device          *g_pDXVAHD = NULL;     // DXVA-HD device.
IDXVAHD_VideoProcessor  *g_pDXVAVP = NULL;     // DXVA-HD video processor.
IDirect3DSurface9       *g_pSurface = NULL;    // Video surface.

const D3DFORMAT RENDER_TARGET_FORMAT = D3DFMT_X8R8G8B8;
const D3DFORMAT VIDEO_FORMAT         = D3DFMT_X8R8G8B8;
const UINT      VIDEO_FPS            = 60;
const UINT      VIDEO_WIDTH          = 640;
const UINT      VIDEO_HEIGHT         = 480;

To create a DXVA-HD video processor:

1. Fill in a DXVAHD_CONTENT_DESC structure with a description of the video content. The driver uses this information as a hint to optimize the capabilities of the video processor. The structure does not contain a complete format description.

    DXVAHD_RATIONAL fps = { VIDEO_FPS, 1 };

    DXVAHD_CONTENT_DESC desc;

    desc.InputFrameFormat = DXVAHD_FRAME_FORMAT_PROGRESSIVE;
    desc.InputFrameRate = fps;
    desc.InputWidth = VIDEO_WIDTH;
    desc.InputHeight = VIDEO_HEIGHT;
    desc.OutputFrameRate = fps;
    desc.OutputWidth = VIDEO_WIDTH;
    desc.OutputHeight = VIDEO_HEIGHT;

2. Call DXVAHD_CreateDevice to create the DXVA-HD device. This function returns a pointer to the IDXVAHD_Device interface.

    hr = DXVAHD_CreateDevice(g_pD3DDevice, &desc, DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL,
        NULL, &pDXVAHD);

3. Call IDXVAHD_Device::GetVideoProcessorDeviceCaps. This method fills in a DXVAHD_VPDEVCAPS structure with the device capabilities. If you require specific video processing features, such as luma keying or image filtering, check their availability by using this structure.

    DXVAHD_VPDEVCAPS caps;

    hr = pDXVAHD->GetVideoProcessorDeviceCaps(&caps);

4. Check whether the DXVA-HD device supports the input video formats that you require. The section Checking Supported Input Formats describes this step in more detail.

5. Check whether the DXVA-HD device supports the output format that you require. The section Checking Supported Output Formats describes this step in more detail.

6. Allocate an array of DXVAHD_VPCAPS structures. The number of array elements that must be allocated is given by the VideoProcessorCount member of the DXVAHD_VPDEVCAPS structure, obtained in step 3.

    // Create the array of video processor caps.
    DXVAHD_VPCAPS *pVPCaps = new (std::nothrow) DXVAHD_VPCAPS[ caps.VideoProcessorCount ];

    if (pVPCaps == NULL)
    {
        return E_OUTOFMEMORY;
    }

7. Each DXVAHD_VPCAPS structure represents a distinct video processor. You can loop through this array to discover the capabilities of each video processor. The structure includes information about the deinterlacing, telecine, and frame-rate conversion capabilities of the video processor.

    HRESULT hr = pDXVAHD->GetVideoProcessorCaps(
        caps.VideoProcessorCount, pVPCaps);

8. Select a video processor to create. The VPGuid member of the DXVAHD_VPCAPS structure contains a GUID that uniquely identifies the video processor. Pass this GUID to the IDXVAHD_Device::CreateVideoProcessor method. The method returns an IDXVAHD_VideoProcessor pointer.

9. Optionally, call IDXVAHD_Device::CreateVideoSurface to create an array of input video surfaces. For more information, see Creating Video Surfaces.

The following code example shows the complete sequence of steps:

// Initializes the DXVA-HD video processor.
//
// NOTE: The following example makes some simplifying assumptions:
//
// 1. There is a single input stream.
// 2. The input frame rate matches the output frame rate.
// 3. No advanced DXVA-HD features are needed, such as luma keying or IVTC.
// 4. The application uses a single input video surface.

HRESULT InitializeDXVAHD()
{
    if (g_pD3DDevice == NULL)
    {
        return E_FAIL;
    }

    HRESULT hr = S_OK;

    IDXVAHD_Device          *pDXVAHD = NULL;
    IDXVAHD_VideoProcessor  *pDXVAVP = NULL;
    IDirect3DSurface9       *pSurf = NULL;

    DXVAHD_RATIONAL fps = { VIDEO_FPS, 1 };

    DXVAHD_CONTENT_DESC desc;

    desc.InputFrameFormat = DXVAHD_FRAME_FORMAT_PROGRESSIVE;
    desc.InputFrameRate = fps;
    desc.InputWidth = VIDEO_WIDTH;
    desc.InputHeight = VIDEO_HEIGHT;
    desc.OutputFrameRate = fps;
    desc.OutputWidth = VIDEO_WIDTH;
    desc.OutputHeight = VIDEO_HEIGHT;

#ifdef USE_SOFTWARE_PLUGIN
    HMODULE hSWPlugin = LoadLibrary(L"C:\\dxvahdsw.dll");

    PDXVAHDSW_Plugin pSWPlugin = (PDXVAHDSW_Plugin)GetProcAddress(hSWPlugin, "DXVAHDSW_Plugin");

    hr = DXVAHD_CreateDevice(g_pD3DDevice, &desc, DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL,
        pSWPlugin, &pDXVAHD);
#else
    hr = DXVAHD_CreateDevice(g_pD3DDevice, &desc, DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL,
        NULL, &pDXVAHD);
#endif

    if (FAILED(hr))
    {
        goto done;
    }

    DXVAHD_VPDEVCAPS caps;

    hr = pDXVAHD->GetVideoProcessorDeviceCaps(&caps);
    if (FAILED(hr))
    {
        goto done;
    }

    // Check whether the device supports the input and output formats.

    hr = CheckInputFormatSupport(pDXVAHD, caps, VIDEO_FORMAT);
    if (FAILED(hr))
    {
        goto done;
    }

    hr = CheckOutputFormatSupport(pDXVAHD, caps, RENDER_TARGET_FORMAT);
    if (FAILED(hr))
    {
        goto done;
    }

    // Create the VP device.
    hr = CreateVPDevice(pDXVAHD, caps, &pDXVAVP);
    if (FAILED(hr))
    {
        goto done;
    }

    // Create the video surface for the primary video stream.
    hr = pDXVAHD->CreateVideoSurface(
        VIDEO_WIDTH,
        VIDEO_HEIGHT,
        VIDEO_FORMAT,
        caps.InputPool,
        0,      // Usage
        DXVAHD_SURFACE_TYPE_VIDEO_INPUT,
        1,      // Number of surfaces to create
        &pSurf, // Array of surface pointers
        NULL
        );
    if (FAILED(hr))
    {
        goto done;
    }

    g_pDXVAHD = pDXVAHD;
    g_pDXVAHD->AddRef();

    g_pDXVAVP = pDXVAVP;
    g_pDXVAVP->AddRef();

    g_pSurface = pSurf;
    g_pSurface->AddRef();

done:
    SafeRelease(&pDXVAHD);
    SafeRelease(&pDXVAVP);
    SafeRelease(&pSurf);
    return hr;
}

The CreateVPDevice function shown in this example creates the video processor (steps 6 through 8 in the preceding list):

// Creates a DXVA-HD video processor.

HRESULT CreateVPDevice(
    IDXVAHD_Device          *pDXVAHD,
    const DXVAHD_VPDEVCAPS& caps,
    IDXVAHD_VideoProcessor  **ppDXVAVP
    )
{
    // Create the array of video processor caps.

    DXVAHD_VPCAPS *pVPCaps = new (std::nothrow) DXVAHD_VPCAPS[ caps.VideoProcessorCount ];

    if (pVPCaps == NULL)
    {
        return E_OUTOFMEMORY;
    }

    HRESULT hr = pDXVAHD->GetVideoProcessorCaps(
        caps.VideoProcessorCount, pVPCaps);

    // At this point, an application could loop through the array and examine
    // the capabilities. For purposes of this example, however, we simply
    // create the first video processor in the list.

    if (SUCCEEDED(hr))
    {
        // The VPGuid member contains the GUID that identifies the video
        // processor.

        hr = pDXVAHD->CreateVideoProcessor(&pVPCaps[0].VPGuid, ppDXVAVP);
    }

    delete [] pVPCaps;
    return hr;
}

Checking Supported DXVA-HD Formats


Checking Supported Input Formats
To get a list of the input formats that the Microsoft DirectX Video Acceleration High Definition (DXVA-HD) device supports, do the following:

1. Call IDXVAHD_Device::GetVideoProcessorDeviceCaps to get the device capabilities.
2. Check the InputFormatCount member of the DXVAHD_VPDEVCAPS structure. This member gives the number of supported input formats.
3. Allocate an array of D3DFORMAT values, of size InputFormatCount.
4. Pass this array to the IDXVAHD_Device::GetVideoProcessorInputFormats method. The method fills the array with a list of input formats.

The following code shows these steps:

// Checks whether a DXVA-HD device supports a specified input format.

HRESULT CheckInputFormatSupport(
    IDXVAHD_Device          *pDXVAHD,
    const DXVAHD_VPDEVCAPS& caps,
    D3DFORMAT               d3dformat
    )
{
    D3DFORMAT *pFormats = new (std::nothrow) D3DFORMAT[ caps.InputFormatCount ];
    if (pFormats == NULL)
    {
        return E_OUTOFMEMORY;
    }

    HRESULT hr = pDXVAHD->GetVideoProcessorInputFormats(
        caps.InputFormatCount,
        pFormats
        );
    if (FAILED(hr))
    {
        goto done;
    }

    UINT index;
    for (index = 0; index < caps.InputFormatCount; index++)
    {
        if (pFormats[index] == d3dformat)
        {
            break;
        }
    }
    if (index == caps.InputFormatCount)
    {
        hr = E_FAIL;
    }

done:
    delete [] pFormats;
    return hr;
}

Checking Supported Output Formats


To get a list of the output formats that the DXVA-HD device supports, do the following:

1. Call IDXVAHD_Device::GetVideoProcessorDeviceCaps to get the device capabilities.
2. Check the OutputFormatCount member of the DXVAHD_VPDEVCAPS structure. This member gives the number of supported output formats.
3. Allocate an array of D3DFORMAT values, of size OutputFormatCount.
4. Pass this array to the IDXVAHD_Device::GetVideoProcessorOutputFormats method. The method fills the array with a list of output formats.

The following code shows these steps:

// Checks whether a DXVA-HD device supports a specified output format.

HRESULT CheckOutputFormatSupport(
    IDXVAHD_Device          *pDXVAHD,
    const DXVAHD_VPDEVCAPS& caps,
    D3DFORMAT               d3dformat
    )
{
    D3DFORMAT *pFormats = new (std::nothrow) D3DFORMAT[caps.OutputFormatCount];
    if (pFormats == NULL)
    {
        return E_OUTOFMEMORY;
    }

    HRESULT hr = pDXVAHD->GetVideoProcessorOutputFormats(
        caps.OutputFormatCount,
        pFormats
        );
    if (FAILED(hr))
    {
        goto done;
    }

    UINT index;
    for (index = 0; index < caps.OutputFormatCount; index++)
    {
        if (pFormats[index] == d3dformat)
        {
            break;
        }
    }
    if (index == caps.OutputFormatCount)
    {
        hr = E_FAIL;
    }

done:
    delete [] pFormats;
    return hr;
}

Creating DXVA-HD Video Surfaces


The application must create one or more Direct3D surfaces to use for the input frames. These must be allocated in the memory pool specified by the InputPool member of the DXVAHD_VPDEVCAPS structure. The following surface types can be used:

A video surface created by calling IDXVAHD_Device::CreateVideoSurface and specifying the DXVAHD_SURFACE_TYPE_VIDEO_INPUT or DXVAHD_SURFACE_TYPE_VIDEO_INPUT_PRIVATE surface type. This surface type is equivalent to an off-screen plain surface.
A decoder render-target surface, created by calling IDirectXVideoAccelerationService::CreateSurface and specifying the DXVA2_VideoDecoderRenderTarget surface type. This surface type is used for DXVA decoding.
An off-screen plain surface.

The following code shows how to allocate a video surface, using CreateVideoSurface:

// Create the video surface for the primary video stream.
hr = pDXVAHD->CreateVideoSurface(
    VIDEO_WIDTH,
    VIDEO_HEIGHT,
    VIDEO_FORMAT,
    caps.InputPool,
    0,      // Usage
    DXVAHD_SURFACE_TYPE_VIDEO_INPUT,
    1,      // Number of surfaces to create
    &pSurf, // Array of surface pointers
    NULL
    );

Setting DXVA-HD States


During video processing, the Microsoft DirectX Video Acceleration High Definition (DXVA-HD) device maintains a persistent state from one frame to the next. Each state has a documented default. After you configure the device, set any states that you wish to change from their defaults. Before you process each frame, update any states that should change.

Note  This design differs from DXVA-VP. In DXVA-VP, the application must specify all of the VP parameters with each frame.

Device states fall into two categories:

Stream states apply to each input stream separately. You can apply different settings to each stream.
Blit states apply globally to the entire video processing blit.

The following stream states are defined:

DXVAHD_STREAM_STATE_D3DFORMAT: Input video format.
DXVAHD_STREAM_STATE_FRAME_FORMAT: Interlacing.
DXVAHD_STREAM_STATE_INPUT_COLOR_SPACE: Input color space. This state specifies the RGB color range and the YCbCr transfer matrix for the input stream.
DXVAHD_STREAM_STATE_OUTPUT_RATE: Output frame rate. This state controls frame-rate conversion.
DXVAHD_STREAM_STATE_SOURCE_RECT: Source rectangle.
DXVAHD_STREAM_STATE_DESTINATION_RECT: Destination rectangle.
DXVAHD_STREAM_STATE_ALPHA: Planar alpha.
DXVAHD_STREAM_STATE_PALETTE: Color palette. This state applies only to palettized input formats.
DXVAHD_STREAM_STATE_LUMA_KEY: Luma key.
DXVAHD_STREAM_STATE_ASPECT_RATIO: Pixel aspect ratio.
DXVAHD_STREAM_STATE_FILTER_Xxxx: Image filter settings. The driver can support brightness, contrast, and other image filters.

The following blit states are defined:

DXVAHD_BLT_STATE_TARGET_RECT: Target rectangle.
DXVAHD_BLT_STATE_BACKGROUND_COLOR: Background color.
DXVAHD_BLT_STATE_OUTPUT_COLOR_SPACE: Output color space.
DXVAHD_BLT_STATE_ALPHA_FILL: Alpha fill mode.
DXVAHD_BLT_STATE_CONSTRICTION: Constriction. This state controls whether the device downsamples the output.

To set a stream state, call the IDXVAHD_VideoProcessor::SetVideoProcessStreamState method. To set a blit state, call the IDXVAHD_VideoProcessor::SetVideoProcessBltState method. In both of these methods, an enumeration value specifies the state to set. The state data is given using a state-specific data structure, which the application casts to a void* type.

The following code example sets the input format and destination rectangle for stream 0, and sets the background color to black.

HRESULT SetDXVAHDStates(HWND hwnd, D3DFORMAT inputFormat)
{
    // Set the initial stream states.

    // Set the format of the input stream.
    DXVAHD_STREAM_STATE_D3DFORMAT_DATA d3dformat = { inputFormat };

    HRESULT hr = g_pDXVAVP->SetVideoProcessStreamState(
        0,  // Stream index
        DXVAHD_STREAM_STATE_D3DFORMAT,
        sizeof(d3dformat),
        &d3dformat
        );

    if (SUCCEEDED(hr))
    {
        // For this example, the input stream contains progressive frames.
        DXVAHD_STREAM_STATE_FRAME_FORMAT_DATA frame_format = { DXVAHD_FRAME_FORMAT_PROGRESSIVE };

        hr = g_pDXVAVP->SetVideoProcessStreamState(
            0,  // Stream index
            DXVAHD_STREAM_STATE_FRAME_FORMAT,
            sizeof(frame_format),
            &frame_format
            );
    }

    if (SUCCEEDED(hr))
    {
        // Compute the letterbox area.
        RECT rcDest;
        GetClientRect(hwnd, &rcDest);

        RECT rcSrc;
        SetRect(&rcSrc, 0, 0, VIDEO_WIDTH, VIDEO_HEIGHT);

        rcDest = LetterBoxRect(rcSrc, rcDest);

        // Set the destination rectangle, so the frame is displayed within the
        // letterbox area. Otherwise, the frame is stretched to cover the
        // entire surface.
        DXVAHD_STREAM_STATE_DESTINATION_RECT_DATA DstRect = { TRUE, rcDest };

        hr = g_pDXVAVP->SetVideoProcessStreamState(
            0,  // Stream index
            DXVAHD_STREAM_STATE_DESTINATION_RECT,
            sizeof(DstRect),
            &DstRect
            );
    }

    if (SUCCEEDED(hr))
    {
        DXVAHD_COLOR_RGBA rgbBackground = { 0.0f, 0.0f, 0.0f, 1.0f };  // RGBA

        DXVAHD_BLT_STATE_BACKGROUND_COLOR_DATA background = { FALSE, rgbBackground };

        hr = g_pDXVAVP->SetVideoProcessBltState(
            DXVAHD_BLT_STATE_BACKGROUND_COLOR,
            sizeof(background),
            &background
            );
    }

    return hr;
}
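The same pattern applies to the other states listed earlier. As a minimal sketch (not part of the original sample), the following sets the source rectangle for stream 0 and the target rectangle for the blit; it assumes that DXVAHD_STREAM_STATE_SOURCE_RECT_DATA and DXVAHD_BLT_STATE_TARGET_RECT_DATA follow the same enable-flag-plus-rectangle layout as the destination-rectangle state shown above, and the rectangle values themselves are illustrative:

// Sketch: restrict processing to a sub-rectangle of the input frame and of the
// render target. The rectangle values here are hypothetical.
RECT rcSource = { 0, 0, VIDEO_WIDTH, VIDEO_HEIGHT };
DXVAHD_STREAM_STATE_SOURCE_RECT_DATA SrcRect = { TRUE, rcSource };

hr = g_pDXVAVP->SetVideoProcessStreamState(
    0,  // Stream index
    DXVAHD_STREAM_STATE_SOURCE_RECT,
    sizeof(SrcRect),
    &SrcRect
    );

if (SUCCEEDED(hr))
{
    RECT rcTarget = { 0, 0, 640, 480 };  // Hypothetical target area.
    DXVAHD_BLT_STATE_TARGET_RECT_DATA TargetRect = { TRUE, rcTarget };

    hr = g_pDXVAVP->SetVideoProcessBltState(
        DXVAHD_BLT_STATE_TARGET_RECT,
        sizeof(TargetRect),
        &TargetRect
        );
}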

Performing the DXVA-HD Blit


To perform the video processing blit, call IDXVAHD_VideoProcessor::VideoProcessBltHD. The following code processes one frame of the primary video stream and presents the result:

BOOL ProcessVideoFrame(HWND hwnd, UINT frameNumber)
{
    if (!g_pD3D || !g_pDXVAVP)
    {
        return FALSE;
    }

    RECT client;
    GetClientRect(hwnd, &client);

    if (IsRectEmpty(&client))
    {
        return TRUE;
    }

    // Check the current status of the D3D9 device.
    HRESULT hr = TestCooperativeLevel();

    switch (hr)
    {
    case D3D_OK:
        break;

    case D3DERR_DEVICELOST:
        return TRUE;

    case D3DERR_DEVICENOTRESET:
        return FALSE;

    default:
        return FALSE;
    }

    IDirect3DSurface9 *pRT = NULL;          // Render target
    DXVAHD_STREAM_DATA stream_data = { 0 };

    // Get the render-target surface.
    hr = g_pD3DDevice->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &pRT);
    if (FAILED(hr))
    {
        goto done;
    }

    // Initialize the stream data structure for the primary video stream.
    stream_data.Enable = TRUE;
    stream_data.OutputIndex = 0;
    stream_data.InputFrameOrField = 0;
    stream_data.pInputSurface = g_pSurface;

    // Perform the blit.
    hr = g_pDXVAVP->VideoProcessBltHD(
        pRT,
        frameNumber,
        1,
        &stream_data
        );
    if (FAILED(hr))
    {
        goto done;
    }

    // Present the frame.
    hr = g_pD3DDevice->Present(NULL, NULL, NULL, NULL);

done:
    SafeRelease(&pRT);
    return SUCCEEDED(hr);
}
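ProcessVideoFrame returns FALSE when the device needs to be reset or the blit fails, so the caller can decide whether to reset the device or stop rendering. The following is a minimal sketch of driving it from a standard Win32 message loop; the loop structure and frame counter are illustrative and not part of the original sample:

// Sketch: drive ProcessVideoFrame from a simple render loop.
MSG  msg = { 0 };
UINT frameNumber = 0;

while (msg.message != WM_QUIT)
{
    if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    else if (!ProcessVideoFrame(hwnd, frameNumber++))
    {
        // The device was lost and needs a reset, or the blit failed.
        // A real application would attempt to reset the device here.
        break;
    }
}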
