Media processing is not easy. It’s more accurate to say it’s amazingly complicated. With byte-streams, protocols, codecs, clocks, sample formats and rendering, there’s a lot of knowledge anyone needs just to be able to “play some media.” There are several libraries and frameworks out there that ease this task, so that developers only need to know the specifics of their own domain. MediaFoundation is a media processing framework from Microsoft. It made its debut back in the Vista days and has been positioned in many ways as the successor to Microsoft’s previous media framework, DirectShow. Since its first release, the framework has not stood still. A few things have been deprecated, and quite a few things added.
Metro, What’d You Do With My…
Like most familiar Windows technologies that are available in Metro, the framework has been slimmed down from its “desktop” API. When I first investigated the difference, I had some anxiety. A lot of what I knew of MediaFoundation and processing media was gone. On closer inspection, things didn’t look as bad as I first felt. Some things just didn’t fit the Metro sandbox, and other things I found wouldn’t be missed…and hopefully you won’t miss them either.
MediaFoundation, Play Me Some Media
There are quite a few ways to read media in MediaFoundation…even in Metro, but they vary in complexity. The main interfaces developers will probably use are IMFMediaEngine and IMFSourceReader.
Media Engine – Also known as the IMFMediaEngine, which is new to Windows 8. This allows for fairly high-level interaction with media, but has lower-level extensibility points. This is most likely the replacement for the deprecated IMFPMediaPlayer. The nice thing about this interface is that if you just want to load up an audio/video file, you only have to supply it with a source (path, URL or byte stream) and tell it to play. To render video you can simply call TransferVideoFrame and draw the frame with D3D or other means. If you wish to add custom processing or effects, you create an IMFTransform and register it with IMFMediaEngineEx::InsertVideoEffect/InsertAudioEffect. You can even register your own custom decoders and sources via the Windows Runtime class, “MediaExtensionManager”. It may also be worth mentioning that IMFMediaEngine seems to be what Microsoft uses inside the XAML MediaElement class and also in WinJS (you can see the similarity in the API).
Media Source Reader – Also known as the IMFSourceReader, which IIRC, is new to Windows 7 (but works on Vista now), and has some features that are at the moment exclusive to Windows 8. This is lower level than the IMFMediaEngine, but is still fairly high level when you consider everything it handles for you. The typical scenario for a source reader is when you simply care about having access to media samples, whether compressed or uncompressed. This may be a case where you already have a rendering/processing pipeline outside MediaFoundation. The closest analogy is the DirectShow “SampleGrabber” filter, but the source reader is much more powerful and much easier to use. You create a source reader from a URL/path/byte stream and configure it to deliver what you need (e.g., uncompressed or untouched compressed samples). Then you call IMFSourceReader::ReadSample(…) and you are on your way.
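The source reader flow above can be sketched roughly as follows. This is a minimal desktop-style sketch, not a definitive implementation: `ReadSomeVideoSamples` and `HnsToSeconds` are hypothetical names of my own, the choice of NV12 as the requested output format is an assumption, and a Metro app would more likely start from a byte stream than a URL. The Windows-only part is wrapped in `#ifdef _WIN32` so the small timestamp helper can still be built elsewhere.

```cpp
#include <cstdint>

// ReadSample reports timestamps in 100-nanosecond units; convert to seconds.
constexpr double HnsToSeconds(std::int64_t hns)
{
    return static_cast<double>(hns) / 10000000.0;
}

#ifdef _WIN32  // everything below needs the Windows SDK
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

using Microsoft::WRL::ComPtr;

// Hypothetical helper: pulls up to maxSamples decoded video frames from url
// and reports the timestamp (in seconds) of the last frame it received.
HRESULT ReadSomeVideoSamples(const wchar_t* url, int maxSamples, double* lastSeconds)
{
    HRESULT hr = MFStartup(MF_VERSION);
    if (FAILED(hr)) return hr;
    {
        const DWORD video = static_cast<DWORD>(MF_SOURCE_READER_FIRST_VIDEO_STREAM);
        ComPtr<IMFSourceReader> reader;
        ComPtr<IMFMediaType> type;

        hr = MFCreateSourceReaderFromURL(url, nullptr, &reader);

        // Assumption: we want uncompressed NV12 frames. Skipping this step
        // leaves the stream in its native (usually compressed) format.
        if (SUCCEEDED(hr)) hr = MFCreateMediaType(&type);
        if (SUCCEEDED(hr)) hr = type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
        if (SUCCEEDED(hr)) hr = type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
        if (SUCCEEDED(hr)) hr = reader->SetCurrentMediaType(video, nullptr, type.Get());

        for (int i = 0; SUCCEEDED(hr) && i < maxSamples; ++i) {
            DWORD actualStream = 0, flags = 0;
            LONGLONG timestamp = 0;
            ComPtr<IMFSample> sample;

            // Synchronous read: blocks until a sample or a stream event arrives.
            hr = reader->ReadSample(video, 0, &actualStream, &flags, &timestamp, &sample);
            if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM)) break;

            // sample can be null when only a stream event was delivered.
            if (sample && lastSeconds) *lastSeconds = HnsToSeconds(timestamp);
        }
    }   // COM pointers are released here, before MFShutdown
    MFShutdown();
    return hr;
}
#endif
```

Error handling is deliberately thin here; a real app would also check the other stream flags that ReadSample returns (for example, a media type change) rather than only end-of-stream.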
GPUs!? How about them GPUs?!
GPUs are almost a requirement for any type of efficient decode and playback of media, especially with HD video and slow mobile processors. It’s natural to want to offload as much of the video pipeline to the GPU as possible. MediaFoundation has always supported DXVA for GPU decoding of certain media types (e.g., VC1 and H264) and also for heavy operations like deinterlacing and colorspace conversion. Historically, MediaFoundation only supported Direct3D9, but with Windows 8 it has full support for D3D11, and you should utilize it in Metro applications.
To enable GPU support with MediaFoundation, you should first familiarize yourself with the IMFDXGIDeviceManager. This is usually created and configured with a D3D11 device by the developer and passed to MediaFoundation to allow synchronized access to the D3D device.
For the IMFMediaEngine, you set the MF_MEDIA_ENGINE_DXGI_MANAGER attribute to your IMFDXGIDeviceManager instance on the IMFAttributes you use to instantiate the IMFMediaEngine. The IMFMediaEngine will then use the GPU where possible to accelerate the media pipeline.
For the IMFSourceReader, it is similar to IMFMediaEngine, but instead you set the MF_SOURCE_READER_D3D_MANAGER attribute to your IMFDXGIDeviceManager instance, and turn on MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, on the IMFAttributes you pass to the function that creates the IMFSourceReader. Without the “ADVANCED_VIDEO_PROCESSING” attribute, the IMFSourceReader will only hand you back a format the decoder supports. For instance, the H264 decoder will give you DXGI_FORMAT_NV12, which is not very helpful if you wish to do video processing with D3D or D2D. With that attribute, you can get a 32-bit RGB format such as DXGI_FORMAT_B8G8R8X8_UNORM, which can be used in Direct3D or blitted to a surface D2D supports.
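To see why raw NV12 is awkward to work with, here is a rough CPU-side sketch of the conversion the video processor otherwise does for you. NV12 stores a full-resolution Y plane followed by a half-resolution plane of interleaved U and V bytes, so every 2x2 block of pixels shares one U/V pair. `Nv12ToBgrx` is a hypothetical helper of my own, and it assumes BT.601 limited-range video, tightly packed rows (stride equal to width) and even dimensions; real MediaFoundation buffers may have padded strides, and HD content often uses BT.709 coefficients instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value into the 0..255 byte range.
static std::uint8_t ClampToByte(int v)
{
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical helper: converts a tightly packed NV12 frame to 32-bit pixels
// stored as B, G, R, X bytes (the byte order of DXGI_FORMAT_B8G8R8X8_UNORM).
// Assumes BT.601 limited-range video and even width/height.
std::vector<std::uint8_t> Nv12ToBgrx(const std::vector<std::uint8_t>& nv12,
                                     int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    assert(nv12.size() >= w * h * 3 / 2);

    const std::uint8_t* yPlane = nv12.data();
    const std::uint8_t* uvPlane = yPlane + w * h;   // interleaved U,V pairs

    std::vector<std::uint8_t> bgrx(w * h * 4);
    for (std::size_t row = 0; row < h; ++row) {
        for (std::size_t col = 0; col < w; ++col) {
            // Each U,V pair covers a 2x2 block of luma samples.
            const std::uint8_t* uv = uvPlane + (row / 2) * w + (col / 2) * 2;
            const int c = yPlane[row * w + col] - 16;
            const int d = uv[0] - 128;   // U
            const int e = uv[1] - 128;   // V

            std::uint8_t* out = &bgrx[(row * w + col) * 4];
            out[0] = ClampToByte((298 * c + 516 * d + 128) / 256);            // B
            out[1] = ClampToByte((298 * c - 100 * d - 208 * e + 128) / 256);  // G
            out[2] = ClampToByte((298 * c + 409 * e + 128) / 256);            // R
            out[3] = 255;                                                     // X (unused)
        }
    }
    return bgrx;
}
```

Doing this per frame on the CPU is exactly the kind of work you want to leave to the GPU, which is why letting the source reader deliver an RGB format directly is usually the better choice.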
Go Make Some Media Apps!
I hope to see some great media-related applications in the Windows 8 App store that leverage MediaFoundation. I also hope to contribute to some of those apps. If you are familiar with C++ and COM, you should find Metro MediaFoundation a delight to use. Just make sure to RTFM, or you will waste lots of precious development time like I did.