Pixel Shaders for Xaml Windows Runtime

Pixel shaders first popped up in WPF around .NET 3.5 as a replacement for the very slow BitmapEffects. They allowed developers to build GPU-accelerated shader effects, such as Blur and DropShadow. When Silverlight started picking up steam, they were added there too, but as a CPU-rasterized (software) implementation. Since then, pixel shader effects have all but disappeared from XAML frameworks such as Windows Phone and XAML in Metro applications.

Windows 8.1 brings a number of new features that make pixel shaders on XAML UIElements feasible, and that is the goal of this library. Before anyone gets too excited, there are limitations, most imposed by the XAML framework and some by my library. But first, let me explain further.

How does it work?

The short explanation of how this is accomplished: the new RenderTargetBitmap is used to rasterize a UIElement. The result is then loaded into a GPU surface, and Direct2D is used to apply a pixel shader. The shaded surface is then presented with a SurfaceImageSource. The original UIElement is left at 0.0 opacity, so it remains interactive and makes it appear as if the shaded result is interactive.

To give a better perspective, here’s a sample of the usage:

<xamlFx:ShaderContentControl EffectRenderStrategy="RenderingEvent">
    <xamlFx:ShaderContentControl.Effect>
        <xamlFx:BlurEffect Radius="{Binding Value, ElementName=amountSlider}" />
    </xamlFx:ShaderContentControl.Effect>
    <Button Content="BUTTON" />
</xamlFx:ShaderContentControl>

Notice that the implementation is a subclass of ContentControl. My original idea was to make a Blend Behavior, but to make this work I had to modify the visual tree. While that would be possible from a Behavior, it wasn't optimal and could cause unexpected issues for the consuming developer. The ShaderContentControl captures the UIElement set as its Content and uses that as the target for the RenderTargetBitmap. Within the ShaderContentControl template is an <Image/> that sits directly behind the ShaderContentControl's content; it is used to present the D3D content. Here is what the ShaderContentControl's template looks like:

<ControlTemplate TargetType="ContentControl"
                 xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
                 xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
    <Grid>
        <ContentControl UseLayoutRounding="False"
                        Width="{Binding Content.ActualWidth, ElementName=contentShim}"
                        Height="{Binding Content.ActualHeight, ElementName=contentShim}"
                        HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                        VerticalAlignment="{TemplateBinding VerticalContentAlignment}">
            <Image x:Name="rendererImage" />
        </ContentControl>
        <ContentPresenter x:Name="contentShim"
                          ContentTemplate="{TemplateBinding ContentTemplate}"
                          ContentTransitions="{TemplateBinding ContentTransitions}"
                          Content="{TemplateBinding Content}"
                          HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                          VerticalAlignment="{TemplateBinding VerticalContentAlignment}" />
    </Grid>
</ControlTemplate>



Keeping the <Image/> "perfectly" synchronized with where the UIElement would be rendered proved pretty difficult. With most shaders, like a hue/saturation modifier, one only has to place the <Image/> behind the UIElement and bind to its size. The problem comes with shader effects that draw larger than the original sample; drop shadow and blur are good examples. If this is not handled, the D2D surface acts as a natural clipping area and your blur gets clipped, which sucks. To overcome this, a shader can supply a padding value that describes how much the effect will grow on the left, right, top and bottom, and the ShaderContentControl handles the rest.
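To make the padding idea concrete, here is a rough sketch of the placement math in plain C++. The types and names are hypothetical, not the library's actual API:

```cpp
// Hypothetical padding a shader reports: how far, in pixels, the effect
// draws beyond the source element on each side (e.g. a blur radius).
struct EffectPadding {
    double left = 0, top = 0, right = 0, bottom = 0;
};

struct Rect { double x, y, width, height; };

// Grow the rectangle the D2D surface (and the <Image/>) must cover so
// the effect is not clipped to the element's own layout slot.
inline Rect ExpandForEffect(const Rect& content, const EffectPadding& pad) {
    return Rect{
        content.x - pad.left,
        content.y - pad.top,
        content.width + pad.left + pad.right,
        content.height + pad.top + pad.bottom
    };
}
```

A blur with an 8px radius, for example, would report a padding of 8 on every side, so the backing surface grows by 16 pixels in each dimension.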


Performance is always a big concern, and this is one of those limitations. Though XAML's RenderTargetBitmap is fast relative to the alternatives, it still accounts for almost all of the CPU overhead here. On top of that, RTB maps GPU memory back to system memory to retrieve the pixel buffer, only for me to send it right back to the GPU. The other issue is that there's no way to tell when a UIElement has visually changed, so (in some cases) a constant re-rasterization needs to be done. All that said, it does perform well, but there are gotchas:

The ShaderContentControl has a property called “EffectRenderStrategy”, which can take 3 enums:

  1. RenderingEvent – Renders on every CompositionTarget::Rendering event
  2. Conservative – Renders when a layout update has occurred or the pixel shader has been invalidated
  3. Manual – Renders only when ShaderContentControl->Render is called
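The decision each strategy makes boils down to something like the following sketch (illustrative C++, not the library's actual code):

```cpp
enum class EffectRenderStrategy { RenderingEvent, Conservative, Manual };

// Should the control re-rasterize and re-shade on this frame?
inline bool ShouldRender(EffectRenderStrategy strategy,
                         bool layoutUpdated,
                         bool shaderInvalidated,
                         bool manualRenderRequested) {
    switch (strategy) {
    case EffectRenderStrategy::RenderingEvent:
        return true;  // every CompositionTarget::Rendering tick
    case EffectRenderStrategy::Conservative:
        return layoutUpdated || shaderInvalidated;
    case EffectRenderStrategy::Manual:
        return manualRenderRequested;  // only an explicit Render() call
    }
    return false;
}
```

RenderingEvent keeps animated content correct at the cost of constant rasterization; Conservative is the cheapest option that still tracks layout and effect changes.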

Known Issues?

  1. RenderTargetBitmap::RenderAsync has issues with rendering certain elements.
  2. RenderTargetBitmap::RenderAsync has a problem with items in a ScrollViewer, such as a ListBox. I have a repro example here.
  3. Shaders are not extensible outside the library. This could be resolved, but I may have to do a lot of work abstracting Direct2D or making it more ABI friendly.
  4. This is a project-to-project library at the moment. I may make it an extension SDK later.
  5. I only wrote a drop shadow and blur effect. Other effects coming soon.

Where do I get it?


Some of my XAML Windows 8.1 Blend Behaviors

It’s been a while since I’ve blogged any .NET stuff.  At the moment I’m jumping around and wanted to share a few behaviors I made that I think can be useful or fun.

ZoomToSelectedBehavior – This is a very easy-to-set-up behavior you put on any Selector.  Once an item is selected, a render transform is animated to perfectly size the selected element to the size of the Selector.

DistanceValueBehavior – This is my favorite.  There have been so many design situations where I simply wanted to apply an effect based on how far an element is from something else.  This allows you to create things like parallax effects on a per-item basis in an ItemsControl.  If you remember, the Zune HD had similar effects to really add some depth to the UI (like scrolling music lists, where the "Play" icons would scroll faster).  This is not a simple binding to a ScrollViewer's Horizontal/VerticalOffset, as not all situations where you want to use this have ScrollViewers, and that value does not take into account the "touch-pull-elastic" effect.  The behavior pumps out values of ~0 to ~1.0.  Value converters can then adapt the value to specific situations.

AncestorProviderBehavior – WPF has the ability to bind to an element's ancestor based on type and other parameters.  This was really useful in templated controls, where an element whose property you wanted to bind to is out of the namescope.  Store XAML doesn't have this ability, and none of the RelativeSource parameters fit all needs.  This behavior allows you to bind to an ancestor element by Type, Name, or Type and Name, along with an ancestor level.  I haven't fully tested this one, but the demo seems to work.

Here’s the sample app and behaviors.

Download Behaviors Here

Direct2D GUI Library – Graphucks



Say What?

A few months ago, I started writing a Windows 8 application in my free time.  I'm aiming for it to be a video editing application, where users can cut, mix, and add audio tracks and effects.  I got far enough along on the video encoding and decoding libraries that I had to start focusing on video mixing and effects.

To create a good video mixer that performed well, I needed to lean on D3D and D2D.  Video samples from each source would be composed in a D3D scene, and the final output would be rendered to the screen and also sent to my encoder library.

When I started working on the mixer, I felt it would be a good idea to make this code reusable, not only for composing videos, but to spice up my application's UI, as XAML has left out quite a bit of visual "wow", such as shaders, blending and a "WriteableBitmap.Render(…)".  Even if XAML had all these, I would still need a D2D library, as I need frame-by-frame control.

As I started building, the first run of it didn’t look so dissimilar to an XNA game template, but as I continued iterating it, it began to look a lot more like a GUI library…and kinda-sorta like WPF.

XAML Style Layout – Measure, Arrange, Oh My!

If you don't know about Measure and Arrange from XAML UI frameworks, go look them up.  The layout engine is a very powerful mechanism for simplifying how UI elements get arranged on the screen.  It's an extensible model which allows anyone to write a Panel subclass and create any layout they desire.  I wanted Graphucks to be simple like this, so I made sure to build it in.  Internally, layout is a little more complex than what is required of a Measure/ArrangeOverride.  The engine must handle situations like clipping, for when an element's desired size is greater than the available size.  It must handle margins (negative ones too!).  It also must handle vertical/horizontal alignment.  Ultimately it must handle the final calculation of where the element is placed.

Right now Graphucks has a Panel base class, StackPanel, RadialPanel and a MultiContentPanel.  MultiContentPanel is a temporary solution until I build in a Grid panel; I didn't have the required dependency property system until now (more on this later).
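For readers unfamiliar with the measure pass, here is what a vertical StackPanel's MeasureOverride essentially computes (a plain C++ sketch, not Graphucks' actual code):

```cpp
#include <algorithm>
#include <vector>

struct Size { float width = 0, height = 0; };

// A vertical stack wants to be as tall as all its children combined
// and as wide as its widest child.
inline Size MeasureVerticalStack(const std::vector<Size>& childDesired) {
    Size total;
    for (const Size& child : childDesired) {
        total.width = std::max(total.width, child.width);
        total.height += child.height;
    }
    return total;
}
```

The arrange pass then walks the children again, handing each one its final rectangle; that is where margins, alignment and clipping come into play.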

Resource Handling

Creating resources in Direct2D and DirectWrite is quite different from what you do in XAML development.  For instance, in XAML you never have to worry about recreating resources if a device is lost.  In Direct2D, you do.  In Direct2D you also create resources from a factory, while in XAML you create a resource, such as a brush, by instantiating a "new Brush".  Also, in D2D and DWrite a lot of things are immutable, such as geometries; in XAML you just set properties and forget about it.  I wanted to keep with this intuitive XAML API, so all resources are lazy loaded.  This means that no D2D/DWrite resource is created until it NEEDS to be used.  All resources are fully "retained mode" and keep enough information on resource state to recreate themselves.  If an immutable property of a resource changes, it is marked as dirty and recreated the next time it needs to be used.  For device-dependent resources, this happens on a render pass.  For device-independent resources, this can be on a render pass, or even a hit test on geometry/text.  This is transparent to the API consumer.
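The pattern is roughly this (an illustrative C++ sketch, with a string standing in for the native D2D resource):

```cpp
#include <memory>
#include <string>

// A retained-mode wrapper: it stores enough state to (re)create the
// native resource, creates it lazily on first use, and recreates it
// when a property changes or the device is lost.
class LazyBrush {
public:
    void SetColor(const std::string& color) {
        if (color != color_) { color_ = color; dirty_ = true; }
    }

    // Called on a render pass (or hit test): realize the native
    // resource only if it is missing or stale.
    const std::string& Realize() {
        if (!native_ || dirty_) {
            native_ = std::make_shared<std::string>("d2d-brush:" + color_);
            dirty_ = false;
            ++createCount_;
        }
        return *native_;
    }

    void OnDeviceLost() { native_.reset(); }  // force recreation on next use
    int CreateCount() const { return createCount_; }

private:
    std::string color_ = "black";
    bool dirty_ = true;
    std::shared_ptr<std::string> native_;  // stands in for an ID2D1Brush
    int createCount_ = 0;
};
```

The consumer only ever sets properties; device loss and dirty tracking stay inside the wrapper.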

The one side effect of this is that it hides which calls and situations can be "heavy".  But I'm giving you the source code, so you can figure it out yourself.

XAML Brushes

It wouldn’t be any fun if we didn’t have brushes.  These include SolidColorBrush, LinearGradientBrush, RadialGradientBrush and ImageBrush right now.  These were the easiest as they are all direct abstractions of Direct2D.  I would like to make a VisualBrush and/or a VideoBrush, but I need to make a MediaElement first.

The trick with Direct2D and XAML is how XAML maps a brush over geometry content.  XAML will map a brush to the bounds of the drawn geometry and then apply any transformations.  For the most part, bugs notwithstanding, I think I have this mostly implemented.  The API needs a little more TLC, though, to fit better with newer parts of the library.


Vector Graphics

To be anything like XAML, we have to be able to render vector graphics.  I have most of the Direct2D geometry functionality wrapped up to work nicely with the Graphucks library.  I do not yet have a WPF-like "Path" element to easily render complex vector graphics, but it can easily be made and is on my to-do list.

As explained earlier, resources are lazy loaded, and that's the case with geometry too.  Modifying a geometry isn't a much bigger hit than re-rendering the same geometry frame by frame, because the biggest hit in both cases is that Direct2D will re-tessellate the geometry on each draw.  To mitigate this, I have a feature to cache geometries to an A8 texture if desired.  An A8 texture is simply an alpha-only texture on the GPU.  Because it stores only an alpha channel, it uses a quarter of the memory of an RGBA texture, though it can still get rather large if used in abundance.  The only other built-in way to cache geometry in Direct2D is with a mesh.  The upside to a mesh is that it only caches vertex data, which is much smaller than an A8 texture, and D2D may also be able to coalesce subsequent draw calls for better performance.  The downside is that you must render meshes without anti-aliasing, which means quality is compromised.  Quality of the A8 cache is lost when scaled larger, but not when just translated or rotated.

If this geometry cache feature is enabled (property of the geometry base class or the ShapeElement), modifying the geometry does become much heavier.



Input

Surely lacking a lot here.  Hit testing is implemented and seemingly working, but I simply don't have any events in there beyond "PointerDown".  Even the eventing implementation is a "just make it work to test something else" implementation.  More events aren't difficult to add, though.  Generally, I'm hoping Graphucks can be embedded into any existing UI framework that supports rendering D3D/D2D, or can stand completely alone: desktop applications, Metro apps, Windows Phone 8 (if they ever open up D2D).  What this means in the context of input is that events like "pointer moved, pointer down, key press" need to be able to come from anywhere: a win32 message, a WinForms event, a XAML event, etc.  Once these events are sent to the "Graphucks root", it will do its own hit testing and input logic.

Rendering System

Direct2D makes this incredibly simple, but my code isn't without some flaws here, which I'll explain.  After Measure/Arrange and input have happened, rendering starts at the root element of a parent-child tree of "VisualElement"s.  Each VisualElement runs a RenderOverride and is passed a rendering context.  This is very similar to WPF; the biggest difference is that the Graphucks render context is a true immediate-mode system, where WPF's is deferred to the render thread and its copy of the visual tree.

This system is very simplistic, until you actually come across the need for an intermediate render target (IRT).  You need IRTs for lots of things, such as effects, blending or "BitmapCache".  Their existence throws a wrench in the whole deal.  For instance, when rendering to an IRT, you must keep transform data relative to the IRT for rendering, and a totally separate transform for hit testing.  There is also an annoying current bug: if the size of the IRT isn't correct (incorrect because things like RenderTransform, or child render transforms, weren't taken into consideration), it forces clipping to the IRT even when the layout doesn't call for it.  Another issue is with blending, or as Direct2D calls them, composition modes.  Composition modes, like "Plus", "XOR" and "MaskInvert", blend the source and destination pixels based on some simple algorithm.  This is a problem with my rendering code.  Right now, if VisualElement-Root (the backbuffer) has a child VisualElement-A that is an IRT, the result is "correct" when the IRT is drawn to the backbuffer.  But if VisualElement-B has a child that is an IRT, then when B gets blended with A and that output is blended to the root, the result is not "correct".  Still a bit of work to be done here!

Shader Effects

Shader effects were probably the easiest thing to implement; Direct2D does all the work for you.  I do still have a "one effect per element" rule like WPF, but because D2D is so awesome, effects are much more flexible.  You can easily chain together a "drop shadow + desaturation + blur" in a single re-usable "ElementEffect".  Blur never looked so simple!  I need to be honest, though: I am talking about built-in D2D effects (Direct2D comes with 50 that you can chain together in different configurations).  If you want to use your own shader code, you still have to author a D2D effect, and the rest fits right into Graphucks.

Dependency Property System

This is something I didn't plan on making at first, but you really start seeing the value of dependency properties when you don't have them!  Some great features like Grid and Canvas need at least attached properties to be useful…unless you spend a lot of time writing things like "GridItemContainer" classes.  They will also come in handy for making an animation framework like XAML's (not even started), or even binding.

This area is also pretty new, and I've only started implementing dependency properties in certain areas of Graphucks.  Property inheritance does seem to work now.  I'm not sure I'm totally happy with using a shared_ptr<void> as my "value abstraction", and I've been thinking about internally wrapping values in a class that can provide richer information on the types.  Dependency properties here are heavy at the moment, but like most of this library, there's lots of room for optimization.
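One direction for that richer wrapper (a sketch of the idea, not Graphucks code): keep the shared_ptr<void> storage but carry the type alongside it, so the property system can answer type questions and cast safely:

```cpp
#include <memory>
#include <typeindex>

class PropertyValue {
public:
    template <typename T>
    static PropertyValue Make(T value) {
        PropertyValue v;
        v.type_ = std::type_index(typeid(T));
        v.data_ = std::make_shared<T>(std::move(value));
        return v;
    }

    // Returns nullptr on a type mismatch instead of silently
    // reinterpreting the storage, which a bare shared_ptr<void> allows.
    template <typename T>
    const T* TryGet() const {
        if (type_ != std::type_index(typeid(T))) return nullptr;
        return static_cast<const T*>(data_.get());
    }

private:
    std::type_index type_{typeid(void)};
    std::shared_ptr<void> data_;
};
```

The shared_ptr still gives cheap copying of property values; the type_index just adds enough metadata to make misuse fail loudly.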

I do have some classes that are handled specially by the DP system, like "DependantObject" (not to be confused with DependencyObject).  It's used so a resource, like a Brush, can invalidate a visual that is using it.  It's kind of a hack to get around the fact that resources aren't part of the property inheritance chain.

What next?

I don't have much free time between home and work, but I've been trying to put in at least a few hours a week on this, because it's something I need and am happy to share.  I think immediately I'll try to write some quick C# WinRT samples on how to quickly do simple things that you can't in XAML, such as "text with a drop shadow" or "an effect on an image", or even, *gasp*, a RadialGradientBrush.  I'd also like to get Grid and Canvas in there, animation, maybe bindings, and fix the many rendering/layout bugs.  More text features are low-hanging fruit too.  Thinking about everything is making my head spin…


Writing a Win8 App: Part 1 of Many

I previously wrote about how I was starting a Metro application, for learning and fun's sake.  I talked about how I wanted to exercise some knowledge in multimedia and get comfortable with all the WinRT goodies.  I also covered that I wanted to make a video editing application.  That idea, even now, is pretty abstract.  The loose goal is "something like Adobe Premiere, Avid for iPad or iMovie".  I'm making this app as I go along.  This specific post digs into a lot of Media Foundation, and I sometimes make a lot of assumptions about everyone's knowledge (and willingness to read about something as boring as COM media APIs).  My hope is to cover some of the thought process, and even a few bumps in the road, so others may not have to endure them.

Making a Video Editing Engine

Any video editing application is going to need a way of dealing with media.  The term "engine" might be a little strong in this case; in reality it is a set of components.  These components are suited for specific tasks related to processing media: reading media attributes, processing an audio/video stream and decompressing it, encoding, mixing of audio and video, and of course playback.  Metro allows for a subset of Media Foundation that will facilitate a large portion of the multimedia features.  If you are familiar with Media Foundation, you will notice the subset is very trimmed down from its desktop counterpart.  My initial reaction was, "OMG!  How am I going to do anything without XYZ?!", but I found the subset and its tradeoffs to eventually be beneficial, and just easier.  Some areas required some flexing of the API, but it's hard to complain when you get so much more for "free".

My first step in designing this editing engine was to see what infrastructure already exists in Metro.  There's a transcode API and MediaElement, both of which can take a custom Media Foundation Transform (MFT) to add effects.  I can see using the transcode API in some situations in this app, but in reality, by the time we are done, we'll have "rewritten" all that functionality (plus more).  Unfortunately, transcode and MediaElement do not allow the kind of flexibility we want in this application.  To get the control we really need, we'll mainly use the IMFSourceReader, IMFSinkWriter, a custom media source, Direct3D and the IMFMediaEngine.

The Media Type Detector

All media container formats (WMV, MOV, AVI, MKV, etc.) contain important information about the media they house.  For this app, we need to know things like "What kind of media streams do you have?", "What compression format are the streams in?" and "What is the duration of the stream?".  This kind of data is needed to actually decode the media, but it is also nice to have so it can be displayed to the user.  With this component we can discover video resolutions, so we can properly initialize other parts of the application.  It is also possible to discover multiple audio tracks, in case the user wishes to include those within the video editing session.

DirectShow had a built-in component to do this, exposed through the IMediaDet interface.  Media Foundation has the same functionality, but it's rolled into the IMFSourceReader.  The source reader handles quite a bit, which I'll cover next, but for this requirement we are only using its GetCurrentMediaType/GetNativeMediaType functionality.

Note on Metro and IMFSourceReader:  Because of the sandbox restrictions, you may find creating an IMFSourceReader with MFCreateSourceReaderFromURL problematic.  You will want to use MFCreateSourceReaderFromByteStream, which takes an IMFByteStream.  You can use any WinRT random access stream by first passing it to MFCreateMFByteStreamOnStreamEx; that function wraps your WinRT stream in an IMFByteStream.  This is also useful in other areas of Media Foundation that use/require an IMFByteStream.
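Put together, the call sequence from that note looks roughly like this (a sketch that builds only against the Windows SDK, with HRESULT checks elided for brevity):

```cpp
#include <wrl/client.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

using Microsoft::WRL::ComPtr;

// 'winrtStream' is the IUnknown of a WinRT IRandomAccessStream,
// e.g. one obtained from a file picker.
ComPtr<IMFSourceReader> CreateReaderFromWinRTStream(IUnknown* winrtStream)
{
    // Wrap the WinRT stream in an IMFByteStream...
    ComPtr<IMFByteStream> byteStream;
    MFCreateMFByteStreamOnStreamEx(winrtStream, &byteStream);

    // ...then create the source reader from the byte stream instead
    // of a URL, which the sandbox would reject.
    ComPtr<IMFSourceReader> reader;
    MFCreateSourceReaderFromByteStream(byteStream.Get(), nullptr, &reader);
    return reader;
}
```

In a real app each HRESULT should be checked, and you would pass an IMFAttributes instead of nullptr when enabling things like GPU decoding.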

The Media Readers

Reading media is where things get a little more complicated.  This component needs to handle everything from parsing the container to splitting out media samples and decoding them.  In the case of video, the GPU should be leveraged as much as possible for accelerated decoding and colorspace conversion.  For this, like the media type detector, we use the IMFSourceReader.  The source reader handles almost everything you need to read and decode media, at least media supported by installed codecs.  If it's a video stream, you can tell it to decode to a specific colorspace (e.g. RGB32).  If it's audio, you can have it down/up-sample and even convert the number of channels.  This is extremely helpful when you need a specific format for audio/video mixing, and also for outputting to specific encoders.

The IMFSourceReader has a method called ReadSample, which returns the next consecutive media sample.  If your media contains an audio track and a video track, ReadSample(…) could return an audio sample or a video sample, in the order they were interleaved into the media container.  This is great for some cases, but we have an issue.  What if a user wishes to move an audio track +/-5 seconds relative to the video?  What if the user only wants to use the video stream from the media?  Or just an audio track?  The solution here is to write an "audio sample reader" and a "video sample reader".  Each is independent of the other, which allows for the greatest flexibility and independent seeking, and overcomes the situation where the IMFSourceReader will queue up media samples if you do not call ReadSample on a specific stream.

With the "video sample reader", I needed to make sure performance was the priority.  I previously wrote about how to achieve GPU acceleration with the IMFSourceReader.  Not all codecs allow for GPU decoding, but H.264 and VC-1 should in most cases, along with YUV → RGB conversions.  When you have a GPU-enabled IMFSourceReader, ReadSample(…) will give you an IMFDXGIBuffer among the IMFSample's buffers.  I have settled on GPU surfaces as the base format returned by my "video sample reader".  They work for rendering to XAML, and they work for sending to encoders.

The "audio sample reader" is slightly less complex.  There is no GPU acceleration for audio decoding in Media Foundation.  I simply let the consumer set the output format (channels, sample rate, bits per channel), and Media Foundation diligently decompresses and converts to that format.  Setting all audio streams to decode to a single uncompressed format will simplify the process of making our own custom audio mixer later.

The Media Writer

To encode media, I am using the IMFSinkWriter.  With this interface one can simply specify an output stream or file, configure the encoder parameters of the audio and video streams, and just send uncompressed samples to it.  The rest is magically handled for you.  Though this API is very easy to use, there are plenty of "gotchas".  The first is that you need to be very aware of which container formats can hold which streams.  For instance, an MP4 container cannot hold a VC-1 video stream.  WMV (aka ASF) can technically hold "any" format, but compatibility with many players is minimal.  The second issue is that you must be very aware of what parameters to use for the codec.  Some audio codecs only work with very specific bitrates, and not all of them are fully published.  You can find valid WMA bitrates and information in my rant here.

There is also a more subtle "gotcha" that I came across.  When I first tested the writer, the output video looked very "jerky".  I double- and triple-checked the time stamps, but everything looked fine.  I found that every time I sent a video sample to the writer, it would not encode immediately; it required at least a few samples in the queue to do its job.  In essence, I kept modifying the output buffer before it got encoded.  To solve this, I created an IMFVideoSampleAllocator.  Before writing a video sample, I first grab a sample from the allocator, copy the source sample into it, then send the copy to the sink writer.  Once the sink writer is finished with the sample, it is returned to the allocator.
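The shape of the fix, stripped of the Media Foundation specifics, is a pooled copy-before-write (a generic C++ sketch; the real code uses IMFVideoSampleAllocator with GPU samples):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

using Sample = std::vector<unsigned char>;  // stands in for an IMFSample

// A trivial sample pool: acquire a (possibly recycled) sample, and
// return it to the pool when the consumer is done with it.
class SamplePool {
public:
    std::shared_ptr<Sample> Acquire() {
        if (free_.empty()) free_.push_back(std::make_shared<Sample>());
        auto s = free_.back();
        free_.pop_back();
        return s;
    }
    void Release(std::shared_ptr<Sample> s) { free_.push_back(std::move(s)); }
    std::size_t FreeCount() const { return free_.size(); }

private:
    std::vector<std::shared_ptr<Sample>> free_;
};

// Deep-copy the working frame into a pooled sample before queueing it
// for encoding, so later edits to 'working' cannot race the encoder.
inline std::shared_ptr<Sample> StageForEncode(SamplePool& pool,
                                              const Sample& working) {
    auto copy = pool.Acquire();
    *copy = working;
    return copy;
}
```

The pool exists so the per-frame copy doesn't also mean a per-frame allocation once the writer starts recycling samples.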

The Media Sample Player

Now I have a "media type detector", "media readers" and a "media writer".  These are the bare essentials for being able to transcode any format to any format, which is important when a user wishes to compile their editing project to a media file as fast as the device can.  What we need now is a way to show a "live preview" to the user.  The "media readers" just read as fast as they can; they do not render audio and do not keep the audio and video in sync.  Audio rendering is not a problem with the Metro APIs, but writing clocks, syncing streams and doing correct timing is not easy or fun.  The solution I came up with is to leverage the IMFMediaEngine and write a custom media source.

The IMFMediaEngine can be configured to use a custom media source.  Typically folks use this functionality to implement a custom protocol (e.g. RTSP, RTP, etc.) that plugs right into the Media Foundation pipeline.  For our situation, we want to make a "virtual media source".  By "virtual", I mean the media source's streams don't really exist as a file, but are created on the fly from a user's video editing session (picture a non-linear video editor UI).  It receives output from the audio and video mixers, which I have yet to discuss.  So think of this virtual source as a virtual file: when the IMFMediaEngine asks for the next sample, the mixers will compose an audio or video sample based on the current media time.  This could involve reading a new sample from the "media readers".  If the IMFMediaEngine is told to seek, the application will virtually seek within the editing session, rendering audio and video from the mixer.

One "gotcha" with this setup is that IMFMediaEngine will not let you directly instantiate your custom media source.  This was a requirement for me, as I needed to make sure I could pump it with media samples to be rendered.  The work-around is to implement IMFMediaEngineExtension and register it with the MF_MEDIA_ENGINE_EXTENSION attribute when you create the IMFMediaEngine.  When you tell the IMFMediaEngine to open a custom URI, e.g. "myproto://", it will call into you, where you can instantiate your custom source, keep a reference to it and pass it back to the IMFMediaEngine.

Note:  When IMFMediaEngine calls your custom IMFMediaStream::RequestSample(…), DO NOT synchronously call IMFSourceReader::ReadSample(…).  It will deadlock.  Instead, put the IMFSourceReader in async mode, or use something like the PPL to asynchronously call IMFSourceReader::ReadSample.

What’s Next?

At this moment, this is all I have written.  I plan on tackling the infrastructure for the video mixer next, which involves the fun stuff like Direct3D/2D.  I will post on this when I have something more solid.  It will be exciting for me, as I can finally demonstrate the output of the system as it stands.  Right now, to you, it's just a long, rambling blog post.


I’m Building a Windows 8 App

I'm not alone in downloading and eagerly using all the Windows 8 betas, spending nights perusing the documentation for new features, using up what little extra time I have to play with the APIs.  I'm not silent or original in calling out this Metro, tablet-touch app model's shortfalls.  I'm not even particularly keen on much of what Microsoft has deemed Metro design principles (Metro Design v1?).  On the other side, with my developer hat on, I see something that others see also.  The Metro platform is part of Windows.  It's deeply rooted in its subsystems.  It's littered throughout SDK header files.  WinRT support is in the .NET CLR and IE.  It's not some flavor-of-the-month monstrosity.  Silverlight was not Windows.  Metro is.

Why are you building a Windows 8 app?

I don't have an immediate, pressing request to make a Windows 8 application.  I'm not doing it for a particular business idea.  I'm doing it to stay literate with Windows.  There will be a time, soon, when I have a hard business requirement to support Windows 8.  This isn't hypothetical, and I'm not going to ignore it.  There's also a second reason: fun.  Developers understand that we can do the same type of thing we do at work for enjoyment.  It's not like "asking a mailman to go for a walk" for us.

What kind of application are you attempting to build?

I've spent a lot of my career dealing with multimedia.  It seems only natural that I attempt to build something around what I know.  Putting just a little bit of thought into it, I've decided to build a video editing application.  I've had this itch for a while now, but I feel this is the first time Microsoft has given me the toolset to make a mobile video editing app I may be proud of.  Windows Phone 7 is a particularly weak developer platform in this context.  WPF on a Win7 tablet wouldn't make for a very fun experience for users either.  We developers are never in short supply of complaints about the Metro API, but in my humblest opinion, this is the strongest mobile platform the company has ever released.

WinJS or XAML?  Please tell me so I can edify my feelings on my favorite platform or berate you on yours!

If you must really know, I plan on using XAML.  I chose XAML for a number of reasons, but the most compelling was that it fits the requirements of this application.  When dealing with video, you want to leverage the GPU as much as possible: decoding, colorspace conversions, rendering.  WinJS has done an incredible job at keeping parity with XAML on being able to add video/audio effects, and WinJS can interact with C++ components, which in turn can use the GPU.  The problem with WinJS is that there is no (sane) way to render D3D to the screen.  Beyond just the "video" aspect, D3D and Direct2D are a form of escape hatch for when you wish to move beyond the design or performance limitations of XAML or HTML.

So XAML…C# or C++?

C++ of course.  This was an easy pick for me, again because it fits my requirements for this application.  There is naturally going to be a lot of C++ code in this project, as it uses quite a bit of Metro-safe COM/Win32 API.  There are also obvious performance and battery benefits to consider in such a resource-intensive mobile application.  Surely I could write C++ WinRT components to consume from .NET, but honestly, that sounds like more work.  Right now I have five static libraries, and wrapping them in ABI-safe components is just another thing I don't have to worry about.  I also chose C++ so y'all don't just decompile it, convert it to Chinese and re-submit it.

So this is going to be closed source?

Even though I'm making this app for fun in what little time I have each week, I am going to keep it closed source.  On the other hand, I plan on writing blog posts as I develop it, explaining how I am making it.  Even though it's natural to be guarded against possible "app competitors," I think sharing knowledge is important too, all the more so because I do not expect to get rich off this.  Rest assured, my next blog post on this subject will be heavy on MediaFoundation and Direct3D.

Mixing the Concurrency PPL Tasks with “Regular” C++ Callbacks

If you are doing C++ programming on Windows 8 and don't know about PPL tasks and how they make asynchronous programming much easier, then this post is the wrong starting point for you.

Callbacks and PPL…What’s the Problem?

What is great about PPL tasks is being able to chain them using .then(…).  This makes asynchronous order of execution much easier to understand and write.  A problem I came across the other night was how to combine regular C++ callbacks with the PPL, whether they are COM callback interfaces or plain C++ function-pointer callbacks.  This was difficult with my shallow knowledge of the PPL, because the lambda you pass to the task<> constructor is scheduled for execution immediately.  That doesn't play well when you don't want your task to execute immediately and fire off the ".then(…)" chain.  You want that to happen when you get a callback from somewhere else!

What’s a Solution?

The one solution I've found is the task_completion_event.  The task<> class has a constructor overload that takes this type.  Simply put, a task initialized with a task_completion_event does not execute anything initially, but waits for task_completion_event::set(…) to be called.  Once this happens, your task continues on with its .then(…) chain.

Below is a very generic example of the task_completion_event.  In a real application, you may have to add a little bit more management internally to make sure your callback executes the correct completion event that corresponds to another native callback.
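Here is a minimal sketch of that bridging pattern.  So it compiles anywhere, it uses std::promise/std::future, which play the same role here as the PPL's task_completion_event and a task<> constructed from it; the FrameReadyCallback signature, FakeNativeDecode, and DemoWaitForFrame are all hypothetical names of mine.  The PPL equivalents are noted in comments: you would construct concurrency::task<int> t(tce); chain t.then(…); and call tce.set(…) from inside the callback.

```cpp
#include <future>

// Hypothetical C-style callback that some native API invokes when a frame
// has been decoded; the void* context lets us pass our bridge object through.
typedef void (*FrameReadyCallback)(int frameId, void* context);

// The bridge: the plain callback completes a promise, which releases whatever
// is waiting on the future.
// PPL equivalent: concurrency::task_completion_event<int> tce;
//                 concurrency::task<int> t(tce);
struct CallbackBridge
{
    std::promise<int> completion;

    // Static member functions convert to plain function pointers, so this
    // matches the C-style callback signature above.
    static void OnFrameReady(int frameId, void* context)
    {
        // PPL equivalent: bridge->tce.set(frameId);
        static_cast<CallbackBridge*>(context)->completion.set_value(frameId);
    }
};

// Stand-in for a native API that stores the callback and fires it later
// (synchronously here for simplicity; in real life it fires asynchronously).
inline void FakeNativeDecode(FrameReadyCallback cb, void* context)
{
    cb(42, context);
}

// Nothing runs until the callback fires -- just like a task constructed from
// a task_completion_event waits for set(...) before its .then(...) chain runs.
inline int DemoWaitForFrame()
{
    CallbackBridge bridge;
    std::future<int> frameReady = bridge.completion.get_future();
    FakeNativeDecode(&CallbackBridge::OnFrameReady, &bridge);
    return frameReady.get(); // the value the callback delivered
}
```

In a real application, you may have to add a little more bookkeeping internally to make sure each native callback completes the completion event that corresponds to it.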


MediaFoundation for Metro

Media processing is not easy.  It's more accurate to say it's amazingly complicated.  With byte streams, protocols, codecs, clocks, sample formats, and rendering, there's a lot anyone needs to know "just to be able to play some media."  There are several libraries and frameworks out there that ease this task, shrinking the burden so developers only need to know the specifics of their own domain.  MediaFoundation is a media processing framework from Microsoft.  It made its debut back in the Vista days and has been positioned in many ways as the successor to Microsoft's previous media framework, DirectShow.  Since its first release, the framework has not stood still: a few things have been deprecated, and quite a few things have been added.

Metro, What’d You Do With My…

Like most familiar Windows technologies that are available in Metro, the framework has been slimmed down from its "desktop" API.  When I first investigated the differences, I had some anxiety; a lot of what I knew of MediaFoundation and processing media was gone.  Giving it a closer look, things didn't seem as bad as I first felt.  Some things just didn't fit the Metro sandbox; other things, I found, wouldn't be missed…and hopefully you won't miss them either.

MediaFoundation, Play Me Some Media

There are quite a few ways to deal with reading media in MediaFoundation…even in Metro, but each varies in complexity.  The main interfaces developers will probably use are IMFMediaEngine and IMFSourceReader.

Media Engine – Also known as IMFMediaEngine, which is new to Windows 8.  This allows for fairly high-level interaction with media, but has lower-level extensibility points.  It is most likely the replacement for the deprecated IMFPMediaPlayer.  The nice thing about this interface is that if you just want to load up an audio/video file, you only have to supply it with a source (path, URL, byte stream) and tell it to play.  To render video you can simply call TransferVideoFrame and present the result with D3D or other means.  If you wish to add custom processing or effects, you create an IMFTransform and register it with IMFMediaEngine::InsertVideoEffect/InsertAudioEffect.  You can even register your own custom decoders and sources via the Windows Runtime class MediaExtensionManager.  It may also be worth mentioning that IMFMediaEngine seems to be what Microsoft uses inside the XAML MediaElement class and also in WinJS (you can see the similarity in the API).
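To show how little plumbing the engine needs, here is a Windows-only configuration sketch (it requires the Windows SDK headers and will not compile elsewhere); CreateEngine is my own helper name, pNotify is assumed to be your implementation of IMFMediaEngineNotify, and error handling is reduced to bare HRESULT checks:

```cpp
#include <mfapi.h>
#include <mfmediaengine.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// pNotify receives engine events such as MF_MEDIA_ENGINE_EVENT_CANPLAY.
HRESULT CreateEngine(IMFMediaEngineNotify* pNotify, IMFMediaEngine** ppEngine)
{
    ComPtr<IMFMediaEngineClassFactory> factory;
    HRESULT hr = CoCreateInstance(CLSID_MFMediaEngineClassFactory, nullptr,
                                  CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&factory));
    if (FAILED(hr)) return hr;

    // The callback attribute is required; everything else is optional.
    ComPtr<IMFAttributes> attr;
    hr = MFCreateAttributes(&attr, 1);
    if (FAILED(hr)) return hr;
    hr = attr->SetUnknown(MF_MEDIA_ENGINE_CALLBACK, pNotify);
    if (FAILED(hr)) return hr;

    // Audio-only here for brevity; for video you would leave this flag off
    // and pull frames yourself with TransferVideoFrame.
    return factory->CreateInstance(MF_MEDIA_ENGINE_AUDIOONLY,
                                   attr.Get(), ppEngine);
}
```

Once created, calling SetSource with a BSTR URL and then Play() is enough to get playback going.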

Media Source Reader – Also known as IMFSourceReader, which, IIRC, is new to Windows 7 (though it now works on Vista too), and has features that are at the moment exclusive to Windows 8.  This is lower level than IMFMediaEngine, but still fairly high level when you consider everything it handles for you.  The typical scenario for a source reader is when you simply care about having access to media samples, whether compressed or uncompressed.  This may be a case where you already have a rendering/processing pipeline outside MediaFoundation.  The closest analogy is the DirectShow "SampleGrabber" filter, but much more powerful and much easier to use.  One simply creates a media source from a URL/path/byte stream, configures it to deliver what you need (e.g., uncompressed or untouched compressed samples), calls IMFSourceReader::ReadSample(…), and you are on your way.
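The pull loop described above can be sketched like this.  Again this is Windows-only (Windows SDK required), DrainFirstVideoStream is a name of my own invention, and error handling is trimmed to bare HRESULT checks:

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT DrainFirstVideoStream(const wchar_t* url)
{
    ComPtr<IMFSourceReader> reader;
    HRESULT hr = MFCreateSourceReaderFromURL(url, nullptr, &reader);
    if (FAILED(hr)) return hr;

    for (;;)
    {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        ComPtr<IMFSample> sample;

        // Synchronous read; an async variant exists via
        // MF_SOURCE_READER_ASYNC_CALLBACK.
        hr = reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                0, &streamIndex, &flags, &timestamp, &sample);
        if (FAILED(hr)) return hr;
        if (flags & MF_SOURCE_READERF_ENDOFSTREAM) break;
        if (!sample) continue; // e.g. a gap/tick with no sample data

        // 'sample' now holds the (compressed or uncompressed) media sample;
        // hand it off to your own pipeline here.
    }
    return S_OK;
}
```

MF_SOURCE_READER_FIRST_VIDEO_STREAM and MF_SOURCE_READERF_ENDOFSTREAM are the stock constants from mfreadwrite.h.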

GPUs!?  How about them GPUs?!

GPUs are almost a requirement for any kind of efficient decode and playback of media, especially with HD video and slow mobile processors.  It's natural to want to offload as much of the video pipeline to the GPU as possible.  MediaFoundation has always supported DXVA for GPU decoding of certain media types (e.g., VC-1 and H.264) and also heavy operations like deinterlacing and colorspace conversion.  Historically, MediaFoundation only supported Direct3D 9, but with Windows 8 it has full support for D3D11, and you should utilize it in Metro applications.

To enable GPU support with MediaFoundation, you should first familiarize yourself with the IMFDXGIDeviceManager.  This is usually created and configured with a D3D11 device by the developer and passed to MediaFoundation to allow synchronized access to the D3D device.

For the IMFMediaEngine, you set the MF_MEDIA_ENGINE_DXGI_MANAGER attribute to your IMFDXGIDeviceManager instance on the IMFAttributes you use to instantiate the IMFMediaEngine.  The IMFMediaEngine will then use the GPU where possible to accelerate the media pipeline.

For the IMFSourceReader it is similar, but instead you set the MF_SOURCE_READER_D3D_MANAGER attribute to your IMFDXGIDeviceManager instance, and set MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING to TRUE, on the IMFAttributes you pass to the methods that instantiate the IMFSourceReader.  Without the "ADVANCED_VIDEO_PROCESSING" attribute, the IMFSourceReader will only return a format the decoder supports.  For instance, the H.264 decoder will give you DXGI_FORMAT_NV12, which is not very helpful if you wish to do video processing with D3D or D2D.  With that attribute, you can get DXGI_FORMAT_B8G8R8X8_UNORM, which can be used in Direct3D or blitted to a surface D2D supports.
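Putting those pieces together, here is a Windows-only configuration sketch of creating the device manager and handing it to a source reader; CreateGpuSourceReader is a name of mine, and error handling is trimmed:

```cpp
#include <d3d11.h>
#include <mfapi.h>
#include <mfreadwrite.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT CreateGpuSourceReader(const wchar_t* url, IMFSourceReader** ppReader)
{
    // BGRA support helps D2D interop; video support enables DXVA paths.
    UINT flags = D3D11_CREATE_DEVICE_BGRA_SUPPORT |
                 D3D11_CREATE_DEVICE_VIDEO_SUPPORT;
    ComPtr<ID3D11Device> device;
    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                                   flags, nullptr, 0, D3D11_SDK_VERSION,
                                   &device, nullptr, nullptr);
    if (FAILED(hr)) return hr;

    // Wrap the device so MediaFoundation can share it safely.
    UINT resetToken = 0;
    ComPtr<IMFDXGIDeviceManager> dxgiManager;
    hr = MFCreateDXGIDeviceManager(&resetToken, &dxgiManager);
    if (FAILED(hr)) return hr;
    hr = dxgiManager->ResetDevice(device.Get(), resetToken);
    if (FAILED(hr)) return hr;

    ComPtr<IMFAttributes> attr;
    hr = MFCreateAttributes(&attr, 2);
    if (FAILED(hr)) return hr;
    attr->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, dxgiManager.Get());
    attr->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE);

    return MFCreateSourceReaderFromURL(url, attr.Get(), ppReader);
}
```

You would then typically call SetCurrentMediaType on the video stream with a media type of MFVideoFormat_RGB32 to actually request the RGB output discussed above.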

Go Make Some Media Apps!

I hope to see some great media related applications in the Windows 8 App store that leverage MediaFoundation.  I also hope to contribute to some of those apps too.  If you are familiar with C++ and COM, you should find Metro MediaFoundation a delight to use.  Just make sure to RTFM, or you will waste lots of precious development time like I did.