Mono on Windows IoT Galileo

Over the last week, since my Intel Galileo dev board arrived, I’ve been trying to get mono ported over to Microsoft’s “Windows on Devices”.  With a bit of recent success, I wanted to share the experience and give you something to download. 

tl;dr: Download link at the bottom of the post.

Why Mono on Windows?  Doesn’t Microsoft have their own CLR, and why not just use a Raspberry Pi-like board?

Right now, Microsoft’s Windows on IoT (let’s call it mincore until we are corrected) has minimal .NET support.  The CLR is there, but there are little to no supporting libs.  In short, it’s either incomplete, or perhaps just waiting for .NET Native support.  I don’t want to wait around to see what Microsoft is doing with this OS.  I just paid $50 for this device, so I want to do something besides blinking LEDs.

So why not just use an already-supported Linux device running Mono?  For some people this may be a no-brainer, and they’re probably already doing it.  For others like me, we have a good chunk of dependencies on Microsoft technologies and/or APIs, which may also include our own C and C++ libraries.  So naturally, what I want is a cheap, low-powered device I can mostly just recompile for, without mucking up a stable codebase in the process.  Mono on this mincore Windows may just fit, even if temporarily.

What is this mincore-Windows-IoT and what’s involved in porting Mono over to it?

There’s little out there that really tells us the direction for this Microsoft OS.  But what we can tell at this point is that it’s SMALL.  The compressed WIM file is 171 MB!  So you can imagine how much “stuff” was cut out of it.  There’s no shell32.dll at all.  Some Win32 functions are totally gone or moved to different DLLs.  Mono already supports Windows, but in the WIN32 build configuration its runtime and BCL use a good share of the Win32 API.  It should also be noted that the specific Galileo CPU doesn’t support any modern instruction set.  No MMX, SSE, or the like.  Luckily, in this case that just came down to compiler flags.

The first step I took in trying to get Mono working was simply to compile it using Cygwin/MinGW.  I don’t have much experience with gcc, and I couldn’t get a single build of the Mono runtime to execute without an “illegal instruction” error.  This pointed the problem at compiler switches (-march=i586 in this case), but no matter what I tried, I got no love.  On the plus side, this process did create working Mono BCLs.

I had just about given up when I decided to see if anyone had used the MSVC compiler on Mono, which is when I found out I should have RTFM from the beginning.  There’s an msvc folder in the source tree, with Visual Studio .sln and project files ready to go.  I changed the compiler flags to /arch:IA32 on all the projects, added mincore.lib to the linker, and passed additional options such as:

  -d2:-nolock /NODEFAULTLIB:ole32.lib /NODEFAULTLIB:kernel32.lib /NODEFAULTLIB:user32.lib /NODEFAULTLIB:advapi32.lib

The linker stuff is very important.  The mincore.lib appears to provide exports for the moved Win32 functions.  For instance, CoInitializeEx used to exist in ole32.dll, but now that the DLL is gone, it’s in api-ms-win-core-com-l1-1-1.dll.  The /NODEFAULTLIB options just tell the linker to ignore the standard Win32 import libraries and use the ones defined in mincore.lib.

Figuring out which Win32 functions are simply missing was another issue.  At runtime, when Windows loads a DLL, it enumerates its imports to verify them, and if one export is missing, your app fails with 0xC0000139 (missing entry point).  Finding out which functions are failing is easy:

  reg ADD "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager" /v GlobalFlag /t REG_DWORD /d 2

That turns on Windows loader snaps, which kick in when you debug an application after a reboot.  This makes running an app very verbose, but you get the missing Win32 function you were looking for.  Once a missing function was identified, its usage in Mono could be changed to an alternative Win32 function (often the *Ex variant), or the calling method in the Mono source was otherwise modified.
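For illustration only (I’m not claiming this exact function was one of the real offenders), here is the kind of one-line swap that loader snaps point you toward.  The MONO_MINCORE_PORT flag is a hypothetical name for the port’s preprocessor switch.

  // Hypothetical example: loader snaps reported CreateSemaphoreW as the missing entry
  // point, so the call site is switched to the *Ex variant, which (an assumption for
  // this sketch) mincore still exports.
  #include <windows.h>

  static HANDLE
  create_sem (LONG initial, LONG max)
  {
  #ifdef MONO_MINCORE_PORT   /* hypothetical flag enabling the mincore port */
      return CreateSemaphoreExW (NULL, initial, max, NULL, 0, SEMAPHORE_ALL_ACCESS);
  #else
      return CreateSemaphoreW (NULL, initial, max, NULL);
  #endif
  }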

What about p/invoke stuff in the BCLs?

There’s also a good amount of DllImports in the Mono BCLs.  Win32RegistryApi.cs is a good example of an internal class that is used quite a bit and that will fail.  Its DllImports point to entry points in advapi32.dll that have been moved to API-MS-WIN-CORE-REGISTRY-L1-1-0.DLL.  I haven’t fixed every BCL DllImport yet, but I did at least patch this one because of how much it is used.
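If you want to double-check where an entry point actually lives on the device, a throwaway probe like the following works.  The api-set DLL name is the one mentioned above; the rest is just a diagnostic sketch, not part of the port.

  // Probe both the classic DLL and the api-set DLL for the same export.
  #include <windows.h>
  #include <cstdio>

  static void probe (const wchar_t *dll, const char *fn)
  {
      HMODULE mod = LoadLibraryExW (dll, NULL, 0);
      void *p = mod ? (void *) GetProcAddress (mod, fn) : NULL;
      wprintf (L"%s!%S -> %s\n", dll, fn, p ? L"found" : L"missing");
  }

  int main ()
  {
      probe (L"advapi32.dll", "RegOpenKeyExW");
      probe (L"api-ms-win-core-registry-l1-1-0.dll", "RegOpenKeyExW");
      return 0;
  }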

What isn’t working with mono besides stuff that doesn’t already work with mono?

I can’t think of much that isn’t working, now that I’ve figured out some mistakes I made early on.  COM interop should be working now.  Some Windows security stuff may be broken.  EnumProcesses doesn’t exist AFAIK, so that won’t work.  It’s a large test surface with a lot of variables, but hopefully it will get you by until a supported CLR comes from MS.

Where’s the code? You gotta hook it up.  The power of GPL compels you!

It’s coming, I promise.  The problem is I hacked up so many things just trying to get it to work, and I’m working on cleaning it up.  I want it to be compatible with the rest of the Mono build system, only enabled with some preprocessor flags.  It shouldn’t be much longer than a week.

Where’s the bins, I wanna try it out!

I uploaded the latest port here: https://dl.dropboxusercontent.com/u/4165054/mono_iot.zip. Enjoy (and remember, JITting on this is SLOW).

Also remember this environment variable if you like verbose output:  set MONO_LOG_LEVEL=debug

PS, Can you port Node.js to mincore for me?

No I will not.

EntranceThemeTransitionBehavior Behavior for WPF

I really liked the XAML transition themes in Windows 8.  They give a nice way to have consistent animations within your application and from application to application.

One transition theme I really liked in XAML was the entrance theme.  One feature of this theme is its “IsStaggeringEnabled” option, which staggers the animations of items within the same container, giving an incremental loading effect.

I wanted to use this effect in WPF and here it is.  I also added a few more features for controlling the animation that Win8 Xaml doesn’t include (and for good reason).

http://1drv.ms/Pkca5H

Please note that virtualization in things like ListBox can cause the effect to go overboard, so you may want to use the VirtualizingPanel.VirtualizationMode="Recycling" attached property.

Pixel Shaders for Xaml Windows Runtime

Pixel Shaders first popped up in WPF around .NET 3.5 and were a replacement for the very slow BitmapEffects. This allowed developers to make GPU accelerated shader effects, such as Blur and DropShadow. When Silverlight started picking up steam, they were also added, but as a CPU (concurrent) rasterized version. Since then pixel shader effects have all but disappeared from XAML frameworks such as Windows Phone or XAML in Metro applications.

Windows 8.1 brings a lot of new features that can now facilitate pixel shaders on XAML UIElements, and that is the goal of this library. Before anyone gets too excited, there are limitations. Most are imposed by the XAML framework and some by my library. But first, let me explain further.

How does it work?

The most trivial explanation of how this is accomplished: we use the new RenderTargetBitmap to rasterize a UIElement. The result is then loaded into a GPU surface and Direct2D is used to apply a pixel shader. The shaded surface is then presented with SurfaceImageSource. The original UIElement is left at 0.0 opacity, so it remains interactive and makes it appear as if the shaded result is interactive.
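For the Direct2D half of that pipeline, a minimal sketch looks roughly like the following. This is not the library’s actual code: the rasterized element is assumed to already be in sourceBitmap, and sisNative is the ISurfaceImageSourceNative obtained from the XAML SurfaceImageSource.

  // Apply a D2D effect to the rasterized element and present it through SurfaceImageSource.
  #include <wrl/client.h>
  #include <windows.ui.xaml.media.dxinterop.h>
  #include <d2d1_1.h>
  #include <d2d1_1helper.h>
  #include <d2d1effects.h>

  using Microsoft::WRL::ComPtr;

  void DrawBlurredElement(ISurfaceImageSourceNative* sisNative,
                          ID2D1DeviceContext* d2dContext,
                          ID2D1Bitmap1* sourceBitmap,   // rasterized UIElement pixels
                          float blurAmount,
                          const RECT& updateRect)
  {
      ComPtr<IDXGISurface> surface;
      POINT offset = {};
      if (FAILED(sisNative->BeginDraw(updateRect, &surface, &offset)))
          return;

      // Wrap the surface handed back by the SurfaceImageSource as a D2D render target.
      D2D1_BITMAP_PROPERTIES1 props = D2D1::BitmapProperties1(
          D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
          D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED));
      ComPtr<ID2D1Bitmap1> target;
      d2dContext->CreateBitmapFromDxgiSurface(surface.Get(), &props, &target);
      d2dContext->SetTarget(target.Get());

      // Build the effect graph: here just a Gaussian blur over the rasterized element.
      ComPtr<ID2D1Effect> blur;
      d2dContext->CreateEffect(CLSID_D2D1GaussianBlur, &blur);
      blur->SetInput(0, sourceBitmap);
      blur->SetValue(D2D1_GAUSSIANBLUR_PROP_STANDARD_DEVIATION, blurAmount);

      d2dContext->BeginDraw();
      d2dContext->Clear();
      d2dContext->DrawImage(blur.Get(),
                            D2D1::Point2F((float)offset.x, (float)offset.y));
      d2dContext->EndDraw();

      sisNative->EndDraw();
  }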

To give a better perspective, here’s a sample of the usage:

  <xamlFx:ShaderContentControl EffectRenderStrategy="RenderingEvent"
                               HorizontalAlignment="Left"
                               HorizontalContentAlignment="Stretch"
                               VerticalAlignment="Stretch">
    <xamlFx:ShaderContentControl.Effect>
      <xamlFx:BlurEffect Radius="{Binding Value, ElementName=amountSlider}" />
    </xamlFx:ShaderContentControl.Effect>
    <Button Content="BUTTON"
            Width="300"
            Height="300"
            BorderThickness="4">
    </Button>
  </xamlFx:ShaderContentControl>

Notice that the implementation is a subclass of ContentControl. My original idea was to make a Blend Behavior, but to make this work I had to modify the visual tree. While this would be possible from a Behavior, it wasn’t optimal and would cause unexpected issues for the consuming developer. The ShaderContentControl captures the UIElement set as its Content and uses it as the target for the RenderTargetBitmap. Within the ShaderContentControl template is an <Image/> that sits directly behind the ShaderContentControl’s content. This is used to render the D3D content. Here is what the ShaderContentControl’s template looks like:

  <ControlTemplate TargetType="ContentControl"
                   xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
                   xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
    <Grid>
      <ContentControl UseLayoutRounding="False"
                      Width="{Binding Content.ActualWidth, ElementName=contentShim}"
                      Height="{Binding Content.ActualHeight, ElementName=contentShim}"
                      HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                      VerticalAlignment="{TemplateBinding VerticalContentAlignment}">
        <Image x:Name="rendererImage"
               Stretch="Fill"
               IsHitTestVisible="False"/>
      </ContentControl>
      <ContentPresenter x:Name="contentShim"
                        Opacity="0.0"
                        ContentTemplate="{TemplateBinding ContentTemplate}"
                        ContentTransitions="{TemplateBinding ContentTransitions}"
                        Content="{TemplateBinding Content}"
                        HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                        VerticalAlignment="{TemplateBinding VerticalContentAlignment}"/>
    </Grid>
  </ControlTemplate>

Keeping the <Image/> “perfectly” synchronized with where the UIElement would be rendered proved pretty difficult. With most shaders, like a hue/saturation modifier, one would only have to place the <Image/> behind the UIElement and bind to its size. The problem comes when you have shader effects that draw larger than the original sample. Shaders such as drop shadow and blur are good examples of this. If this is not handled, the D2D surface acts as a natural clipping area and your blur gets clipped, which sucks. To overcome this, a shader can supply a padding value that describes how much it will grow on the left, right, top and bottom, and the ShaderContentControl will handle the rest.
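As a rough illustration of that padding idea (not the library’s actual implementation), a Gaussian-style blur could report its growth like this; the three-standard-deviations rule of thumb is an assumption here:

  // Illustrative only: an expanding effect reports how far it draws beyond the source
  // so the host control can enlarge the D2D surface and offset the drawing.
  #include <math.h>

  struct EffectPadding { float Left, Top, Right, Bottom; };

  static EffectPadding GetBlurPadding(float standardDeviation)
  {
      // A Gaussian kernel is effectively zero beyond ~3 standard deviations,
      // so pad each side by that much, rounded up to a whole pixel.
      float pad = ceilf(3.0f * standardDeviation);
      return EffectPadding{ pad, pad, pad, pad };
  }

  // The host then sizes its surface as (contentWidth + Left + Right) by
  // (contentHeight + Top + Bottom) and draws the effect output at (Left, Top).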

Performance?

Performance is always a big concern, and this is one of those limitations. Though XAML’s RenderTargetBitmap is the fastest relative to other implementations, it still accounts for almost all the CPU overhead here. On top of that, RTB will map GPU memory to system memory to retrieve the pixel buffer, just for me to send it back to the GPU. The other issue is there’s no way to tell when a UIElement has visually changed, so constant re-rasterization needs to be done (in some cases). All that being said, it does perform well, but there are gotchas:

The ShaderContentControl has a property called “EffectRenderStrategy”, which can take 3 enums:

  1. RenderingEvent – Renders on every CompositionTarget::Rendering event
  2. Conservative – Renders when a layout update has occurred or the pixel shader has been invalidated
  3. Manual – Renders only when ShaderContentControl->Render is called

Known Issues?

  1. RenderTargetBitmap::RenderAsync has issues with rendering certain elements.
  2. RenderTargetBitmap::RenderAsync has a problem with items in a ScrollViewer, such as a ListBox. I have a repro example here.
  3. Shaders are not extensible outside the library. This could be resolved, but I may have to do a lot of work abstracting Direct2D or making it more ABI friendly.
  4. This is a project-to-project library at the moment. I may make it an extension SDK later
  5. I only wrote a drop shadow and blur effect. Other effects coming soon.

Where do I get it?

https://xamlfx.codeplex.com/

Some of my XAML Windows 8.1 Blend Behaviors

It’s been a while since I’ve blogged any .NET stuff.  At the moment I’m jumping around and wanted to share a few behaviors I made that I think can be useful or fun.

ZoomToSelectedBehavior – This is a very easy-to-set-up behavior you put on any Selector.  Once an item is selected, a render transform is animated to perfectly size the selected element to the size of the Selector.

DistanceValueBehavior – This is my favorite.  There have been so many design situations where I simply wanted to apply an effect based on how far an element is from something else.  This allows you to create things like per-item parallax effects in an ItemsControl.  If you remember, the Zune HD had similar effects to really add some depth to the UI (as you scrolled music lists, the “Play” icons would scroll faster).  This is not a simple binding to a ScrollViewer’s Horizontal/VerticalOffset, as not every situation where you’d want this has a ScrollViewer, and that value does not take into account the “touch-pull-elastic” effect.  The behavior pumps out values of ~0 to ~1.0.  Value converters can be written to adapt it to custom situations.

AncestorProviderBehavior – WPF has the ability to bind to an element’s ancestor based on type and other parameters.  This is really useful in templated controls, where an element whose property you want to bind to is out of the namescope.  Store XAML doesn’t have this ability, and none of the RelativeSource parameters fit all needs.  This behavior allows you to bind to an ancestor element by Type, Name, or Type and Name, along with an ancestor level.  I haven’t fully tested this one, but the demo seems to work.

Here’s the sample app and behaviors.

Download Behaviors Here

Direct2D GUI Library–Graphucks


CodePlex: http://graphucks.codeplex.com/

Say What?

A few months ago, I started writing a Windows 8 application in my free time.  I’m aiming for it to be a video editing application, where users can cut, mix, add audio tracks and apply effects.  I got far enough along on the video encoding and decoding libraries that I had to start focusing on video mixing and effects.

To create a good video mixer that performed well, I needed to lean on D3D and D2D.  Video samples from each source would be composed in a D3D scene, and the final output would be rendered to the screen and also sent to my encoder library.

When I started working on the mixer, I felt it would be a good idea to make this code reusable for not only composing videos, but to spice up my application’s UI as there was quite a bit of visual “wow” XAML has left out, such as shaders, blending and a “WriteableBitmap.Render(…)”.  Even if XAML had all these, I would still need a D2D library as I need to be able to have frame-by-frame control.

As I started building, the first run of it didn’t look so dissimilar to an XNA game template, but as I continued iterating it, it began to look a lot more like a GUI library…and kinda-sorta like WPF.

XAML Style Layout – Measure, Arrange, Oh My!

If you don’t know about Measure and Arrange from the XAML UI frameworks, go look it up.  The layout engine is a very powerful mechanism to simplify how UI elements get arranged on the screen.  It’s an extensible model which allows anyone to write a Panel subclass and create any layout they desire.  I wanted Graphucks to be simple like this, so I made sure to build it in.  Internally, layout is a little more complex than what is required of a Measure/ArrangeOverride.  The engine must handle situations like clipping, for when an element’s desired size is greater than the available size.  It must handle margins (negative ones too!).  It also must handle vertical/horizontal alignment.  Ultimately it must handle the final calculation of where the element is placed.
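To make the two-pass idea concrete, here is a hedged, self-contained sketch using made-up minimal types; Graphucks’ real element and panel APIs differ, this only illustrates the Measure-then-Arrange mechanism:

  #include <algorithm>
  #include <limits>
  #include <vector>

  struct Size { float Width = 0, Height = 0; };
  struct Rect { float X = 0, Y = 0, Width = 0, Height = 0; };

  struct Element
  {
      Size Natural;     // the size the element wants (stand-in for real content)
      Size Desired;     // filled in by the Measure pass
      Rect Final;       // filled in by the Arrange pass

      void Measure(const Size& available)
      {
          Desired.Width  = std::min(Natural.Width,  available.Width);
          Desired.Height = std::min(Natural.Height, available.Height);
      }
      void Arrange(const Rect& slot) { Final = slot; }
  };

  struct VerticalStackPanel
  {
      std::vector<Element> Children;

      Size MeasureOverride(const Size& available)
      {
          Size desired;
          for (auto& child : Children)
          {
              // Constrain to our width, but offer unbounded height (a stack just grows).
              child.Measure({ available.Width, std::numeric_limits<float>::infinity() });
              desired.Width   = std::max(desired.Width, child.Desired.Width);
              desired.Height += child.Desired.Height;
          }
          return desired;
      }

      Size ArrangeOverride(const Size& finalSize)
      {
          float y = 0;
          for (auto& child : Children)
          {
              child.Arrange({ 0, y, finalSize.Width, child.Desired.Height });
              y += child.Desired.Height;
          }
          return finalSize;
      }
  };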

Right now Graphucks has a Panel base class, StackPanel, RadialPanel and a MultiContentPanel.  MultiContentPanel is a temporary solution until I build in a Grid panel.  I didn’t have the required dependency property system until now (more on this later).

Resource Handling

Creating resources in Direct2D and DirectWrite is quite different from what you do in XAML development.  For instance, in XAML you never have to worry about recreating resources if a device is lost.  In Direct2D, you do.  In Direct2D you also create resources from a factory; in XAML you create a resource, such as a brush, by instantiating a “new Brush”.   Also, in D2D and DWrite a lot of things are immutable, such as geometries; in XAML you just set properties and forget about it.   I wanted to keep with this intuitive XAML API, so all resources are lazy loaded.  This means that no D2D/DWrite resource is created until it NEEDS to be used.  All resources are fully “retained mode” and keep enough information about resource state to recreate themselves. If an immutable property of a resource changes, it is marked as dirty and recreated again when it next needs to be used.  For device-dependent resources, this happens on a render pass.  For device-independent resources, this can be on a render pass, or even a hit test on geometry/text. This is transparent to the API consumer.
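A hedged sketch of that lazy, dirty-flag pattern for a device-dependent resource might look like this; the class name and structure are illustrative, not Graphucks’ actual API:

  #include <wrl/client.h>
  #include <d2d1_1.h>

  using Microsoft::WRL::ComPtr;

  class SolidColorBrushResource
  {
  public:
      void SetColor(const D2D1_COLOR_F& color)
      {
          m_color = color;
          m_dirty = true;                       // invalidate; recreate lazily later
      }

      // Called during a render pass. Recreates the device-dependent brush if the
      // state changed or it was released because the device was lost.
      ID2D1SolidColorBrush* GetOrCreate(ID2D1DeviceContext* dc)
      {
          if (!m_brush || m_dirty)
          {
              m_brush.Reset();
              dc->CreateSolidColorBrush(m_color, &m_brush);
              m_dirty = false;
          }
          return m_brush.Get();
      }

      void ReleaseDeviceResources() { m_brush.Reset(); }   // on device loss

  private:
      D2D1_COLOR_F m_color = D2D1::ColorF(D2D1::ColorF::White);
      bool m_dirty = true;
      ComPtr<ID2D1SolidColorBrush> m_brush;
  };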

The one side effect of this is that it hides which calls and situations can be “heavy” or not.  But I’m giving you the source code, so you can figure it out yourself.

XAML Brushes

It wouldn’t be any fun if we didn’t have brushes.  These include SolidColorBrush, LinearGradientBrush, RadialGradientBrush and ImageBrush right now.  These were the easiest as they are all direct abstractions of Direct2D.  I would like to make a VisualBrush and/or a VideoBrush, but I need to make a MediaElement first.

The trick with Direct2D and XAML is how XAML maps a brush over geometry content.  XAML will map a brush to the bounds of the drawn geometry and then apply any transformations.  For the most part, bugs notwithstanding, I think I have this mostly implemented.  The API needs a little more TLC though to fit better with newer parts of the library.

Geometry

To be anything like XAML we have to be able to render vector graphics.  I have most of the Direct2D geometry stuff wrapped up to work nicely with the Graphucks library.  I do not yet have a WPF-like “Path” element to easily render complex vector graphics, but it can easily be made and is on my list of to-dos.

As explained earlier, resources are lazy loaded, and that’s also the case with geometry.  Modifying geometry isn’t that much bigger of a hit compared to re-rendering the same geometry frame by frame, because the biggest hit is that Direct2D will re-tessellate the geometry on each draw.  To mitigate this, I have a feature to cache geometries to an A8 texture if desired.  An A8 texture is simply an alpha-only texture on the GPU.  Because it is alpha only, it uses a quarter of the memory of an RGBA texture, though it is still rather large if used in abundance.  The only other built-in way to cache geometry in Direct2D is with a mesh.  The upside to a mesh is it only caches vertex data, which is much smaller than an A8, and D2D may also be able to coalesce subsequent draw calls for better performance.  The downside is you must render meshes without anti-aliasing, which means quality is compromised.  Quality of the A8 is lost when scaled larger, but not if it’s just translated or rotated.

If this geometry cache feature is enabled (property of the geometry base class or the ShapeElement), modifying the geometry does become much heavier.

 

Input

Surely lacking a lot here.  Hit testing is implemented and seemingly working, but I simply don’t have any events in there beyond “PointerDown”.  Even the eventing implementation is a “just make it work to test something else” implementation.  More events aren’t difficult to add, though.  The way I’m hoping Graphucks will work, generally, is that it can be embedded into any existing UI framework that supports rendering D3D/D2D, or stand completely alone: desktop applications, Metro apps, Windows Phone 8 (if they ever open up D2D).  What this means in the context of input is that things like “pointer moved, pointer down, key press” need to be able to come from anywhere, for instance a Win32 message, a WinForms event, a XAML event, etc.  Once these events are sent to the “Graphucks root”, it will do its own hit testing and input logic.

Rendering System

Direct2D makes this incredibly simple, but my code isn’t without some flaws here, which I’ll explain.  After Measure/Arrange and input have happened, rendering starts at the root element of a parent-child tree of “VisualElement”s.  Each VisualElement runs a RenderOverride and is passed a rendering context.  This is very similar to WPF; the biggest difference is that the Graphucks render context is a true immediate-mode system, where WPF’s is deferred to the render thread and its copy of the visual tree.

This system is very simplistic until you actually come across the need for an intermediate render target (IRT).  You need IRTs for lots of things, such as doing effects, blending or “BitmapCache”.  The existence of these throws a wrench in the whole deal.  For instance, when rendering to an IRT, you must keep the transform data relative to the IRT for rendering and a totally separate one for hit testing.  Also (an annoying current bug), if the size of the IRT isn’t correct (incorrect because things like RenderTransform or child render transforms weren’t taken into consideration), it forces clipping to the IRT even when the layout doesn’t call for it.  Another issue is with blending, or as Direct2D calls them, composition modes.  Composition modes like “Plus”, “XOR” and “MaskInvert” blend the source and destination pixels based on some simple algorithm.  This is a problem with my rendering code.  Right now, if VisualElement-Root (the backbuffer) has a VisualElement-A child that is an IRT, then when the IRT is drawn to the backbuffer the effect is “correct”.  But if VisualElement-B has a child that is an IRT, then when B gets blended with A and that output is blended to the root, the result is not “correct”.  Still a bit of work to be done here!

Shader Effects

Shader effects were probably the easiest thing to implement.  Direct2D does all the work for you.  I do still have a “one effect per element” rule like WPF, but because D2D is so awesome, they are much more flexible.  You can easily chain together a “drop shadow + desaturation + blur” in a single reusable “ElementEffect”.  A blur effect never looked so simple!  I need to be honest though: I am talking about built-in D2D effects (Direct2D ships with about 50 of them, which you can chain together in different configurations).  If you want to use your own shader code, you still have to author a D2D effect, and then the rest fits right into Graphucks.
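To make that concrete, here is a hedged sketch of chaining built-in Direct2D effects into one graph, roughly what a reusable “ElementEffect” could wrap internally (the Graphucks wrapper types are omitted):

  #include <wrl/client.h>
  #include <d2d1_1.h>
  #include <d2d1effects.h>

  using Microsoft::WRL::ComPtr;

  // Shadow -> composite (shadow under the element) -> desaturate -> blur.
  ComPtr<ID2D1Effect> BuildChain(ID2D1DeviceContext* dc, ID2D1Image* elementContent)
  {
      ComPtr<ID2D1Effect> shadow, composite, saturation, blur;
      dc->CreateEffect(CLSID_D2D1Shadow, &shadow);
      dc->CreateEffect(CLSID_D2D1Composite, &composite);
      dc->CreateEffect(CLSID_D2D1Saturation, &saturation);
      dc->CreateEffect(CLSID_D2D1GaussianBlur, &blur);

      shadow->SetInput(0, elementContent);            // shadow generated from the element's alpha
      composite->SetInputEffect(0, shadow.Get());     // shadow underneath...
      composite->SetInput(1, elementContent);         // ...original on top
      saturation->SetInputEffect(0, composite.Get()); // then desaturate the whole thing
      saturation->SetValue(D2D1_SATURATION_PROP_SATURATION, 0.25f);
      blur->SetInputEffect(0, saturation.Get());      // and finally blur it
      blur->SetValue(D2D1_GAUSSIANBLUR_PROP_STANDARD_DEVIATION, 4.0f);

      return blur;   // draw during the render pass with dc->DrawImage(blur.Get())
  }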

Dependency Property System

This is something I didn’t plan on making at first, but you really start seeing the value of them when you don’t have them!  Some great things like Grid and Canvas need at least attached properties to be useful…unless you spend a lot of time writing things like “GridItemContainer” classes.  They will also come in handy for making an animation framework, like XAML’s (not even started) or even binding.

This area is pretty new also, and I’ve only started implementing them in certain areas of Graphucks.  Property inheritance does seem to work now.  I’m not sure I’m totally happy with using a shared_ptr<void> as my “value abstraction”, and I’ve been thinking about internally wrapping values in a class that can provide richer information on the types.  Dependency properties here are heavy at the moment, but like most of this library, there’s lots of room for optimization.
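For a sense of what that shared_ptr<void> value abstraction means in practice, here is a hedged, minimal sketch; the real Graphucks implementation (metadata, inheritance, change notification) is richer, and the names here are illustrative:

  #include <map>
  #include <memory>
  #include <string>

  struct DependencyProperty
  {
      int Id;
      std::string Name;
      std::shared_ptr<void> DefaultValue;
  };

  class DependencyObject
  {
  public:
      template <typename T>
      void SetValue(const DependencyProperty& dp, const T& value)
      {
          m_values[dp.Id] = std::make_shared<T>(value);
          // A real implementation would raise a property-changed callback here.
      }

      template <typename T>
      T GetValue(const DependencyProperty& dp) const
      {
          auto it = m_values.find(dp.Id);
          auto ptr = (it != m_values.end()) ? it->second : dp.DefaultValue;
          // The caller must already know T; shared_ptr<void> keeps no type info,
          // which is exactly the weakness mentioned above.
          return *std::static_pointer_cast<T>(ptr);
      }

  private:
      std::map<int, std::shared_ptr<void>> m_values;   // local values keyed by property id
  };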

I do have some classes that are specially handled in the DP system, like “DependantObject” (not to be confused with DependencyObject).  It’s used so a resource, like a Brush, can invalidate a visual that is using it.  It’s kind of a hack to get around the fact that resources aren’t part of the property inheritance chain.

What next?

I don’t have much free time between home and work, but I’ve been trying to put in at least a few hours a week on this because it’s something I need and am happy to share.  I think immediately I’ll try to write some quick C# WinRT samples on how to quickly do simple things that you can’t in XAML, such as “text with a drop shadow” or “an effect on an image”.  Or even, *gasp*, a RadialGradientBrush.  I’d also like to get Grid and Canvas in there, animation, maybe bindings, and fix the many rendering/layout bugs.  More text features are low-hanging fruit also.  Thinking about everything is making my head spin…

-Jer

Writing a Win8 App: Part 1 of Many

I previously wrote about how I was starting a Metro application, for learning and fun’s sake.  I talked about how I wanted to exercise some knowledge in multimedia and get comfortable with all the WinRT goodies.  I also covered that I wanted to make a video editing application.  That idea, even now, is pretty abstract.  The loose goals are “something like Adobe Premiere, Avid for iPad or iMovie”.  I’m making this app up as I go along.  This specific post digs into a lot of Media Foundation, and I sometimes make a lot of assumptions about everyone’s knowledge (and willingness to read about something as boring as COM media APIs).  My hope is to cover some of the thought process and even a few bumps in the road, so others may not have to endure them.

Making a Video Editing Engine

Any video editing application is going to need a way of dealing with media.  The term “engine” might be a little strong in this case; in reality it is a set of components.  These components are suited to specific tasks related to processing media, like reading media attributes, processing an audio/video stream and decompressing it, encoding, mixing audio and video, and of course playback.  Metro allows a subset of Media Foundation that will facilitate a large portion of the multimedia features.  If you are familiar with Media Foundation, you will notice the subset is very trimmed down from its desktop counterpart.  My initial reaction was “OMG!  How am I going to do anything without XYZ?!”, but I found the subset and tradeoff to eventually be beneficial and just easier.  Some areas required some flexing of the API, but it’s hard to complain when you get so much more for “free”.

My first look at how this editing engine would be made was to see what infrastructure already exists in Metro.  There’s a transcode API and MediaElement, both of which can take a custom Media Foundation Transform (MFT) to add effects.  I can see using the transcode API in some situations in this app, but in reality, by the time we are done, we’ll have “rewritten” all of that functionality (plus more).  Unfortunately, transcode and MediaElement do not allow the kind of flexibility we want in this application.  To get the control we really need, we’ll mainly use the IMFSourceReader, the IMFSinkWriter, a custom media source, Direct3D and the IMFMediaEngine.

The Media Type Detector

All media container formats (WMV, MOV, AVI, MKV, etc.) contain important information about the media they house.  For this app, we need to know things like “What kind of media streams do you have?”, “What compression format are the streams in?” and “What is the duration of the stream?”.  This kind of data is needed to actually decode the media, but it’s also nice to have so it can be displayed to the user.  With this component we can discover video resolutions, so we can properly initialize other parts of the application.  It is also possible to discover multiple audio tracks, in case the user wishes to include those within the video editing session.

DirectShow had a built-in component to do this, called IMediaDet.  Media Foundation has the same functionality, but it’s rolled into the IMFSourceReader.  The source reader handles quite a bit, which I’ll cover next, but for this requirement we are only using its GetCurrentMediaType/GetNativeMediaType functionality.

Note on Metro and IMFSourceReader: because of the sandbox restrictions, you may find creating an IMFSourceReader using MFCreateSourceReaderFromURL problematic.  You will want to use MFCreateSourceReaderFromByteStream, which takes an IMFByteStream.  You can use any WinRT random access stream by first calling MFCreateMFByteStreamOnStreamEx.  That function will wrap your WinRT stream with an IMFByteStream.  It is also useful in other areas of Media Foundation that use/require IMFByteStream.
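Putting the note together with the “media type detector”, a hedged sketch of that path looks like this (error handling omitted, and the WinRT stream is assumed to already be open and passed in as an IUnknown):

  #include <wrl/client.h>
  #include <mfapi.h>
  #include <mfidl.h>
  #include <mfreadwrite.h>

  using Microsoft::WRL::ComPtr;

  void InspectMedia(IUnknown* winrtRandomAccessStream)   // IRandomAccessStream as IUnknown
  {
      // Wrap the WinRT stream so the source reader can consume it.
      ComPtr<IMFByteStream> byteStream;
      MFCreateMFByteStreamOnStreamEx(winrtRandomAccessStream, &byteStream);

      ComPtr<IMFSourceReader> reader;
      MFCreateSourceReaderFromByteStream(byteStream.Get(), nullptr, &reader);

      // Native media type of the first video stream: what the file actually contains.
      ComPtr<IMFMediaType> nativeType;
      reader->GetNativeMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &nativeType);

      UINT32 width = 0, height = 0;
      MFGetAttributeSize(nativeType.Get(), MF_MT_FRAME_SIZE, &width, &height);

      // Duration (in 100 ns units) comes from the underlying media source.
      PROPVARIANT var;
      PropVariantInit(&var);
      reader->GetPresentationAttribute((DWORD)MF_SOURCE_READER_MEDIASOURCE,
                                       MF_PD_DURATION, &var);
      ULONGLONG duration100ns = var.uhVal.QuadPart;
      PropVariantClear(&var);

      (void)width; (void)height; (void)duration100ns;   // surface these to the UI as needed
  }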

The Media Readers

Reading media is where things get a little more complicated.  This component needs to handle everything from parsing the container to delivering media samples and decoding them.  In the case of video, the GPU should be leveraged as much as possible for accelerated decoding and colorspace conversion.  For this, like the media type detector, we use the IMFSourceReader.   The source reader handles almost everything you need to read and decode media automatically, at least for media supported by installed codecs.  If it’s a video stream, you can tell it to decode to a specific colorspace (e.g. RGB32).  If it’s audio, you can have it down/up-sample, and even convert the number of channels.  This is extremely helpful when you need a specific format to do audio/video mixing, and also for outputting to specific encoders.

The IMFSourceReader has a method called ReadSample.  This will return the next consecutive media sample.  If your media contains an audio track and a video track, ReadSample(…) could return an audio sample or a video sample, in the order they were interleaved into the media container.  This is great for some cases, but we have an issue.  What if a user wishes to move an audio track +/-5 seconds relative to the video?  What if the user only wants to use the video stream from the media?  Or just an audio track?  The solution here is to write an “Audio Sample Reader” and a “Video Sample Reader”.  Each is independent of the other, which allows for the greatest flexibility and independent seeking, and it overcomes a situation where the IMFSourceReader will queue up media samples if you do not “ReadSample” from a specific stream.
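A hedged sketch of that “two readers over the same file” idea (one reader per stream, each created as shown earlier) might look like this:

  #include <wrl/client.h>
  #include <mfapi.h>
  #include <mfreadwrite.h>

  using Microsoft::WRL::ComPtr;

  static void SelectOnlyStream(IMFSourceReader* reader, DWORD streamIndex)
  {
      // Deselect everything, then select just the one stream this reader owns, so the
      // other stream's samples are never queued up behind our back.
      reader->SetStreamSelection((DWORD)MF_SOURCE_READER_ALL_STREAMS, FALSE);
      reader->SetStreamSelection(streamIndex, TRUE);
  }

  static ComPtr<IMFSample> ReadNextVideoSample(IMFSourceReader* videoReader)
  {
      DWORD actualStream = 0, flags = 0;
      LONGLONG timestamp = 0;
      ComPtr<IMFSample> sample;
      videoReader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                              0, &actualStream, &flags, &timestamp, &sample);
      // sample may be null at end of stream (flags & MF_SOURCE_READERF_ENDOFSTREAM).
      return sample;
  }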

With the “Video Sample Reader”, I needed to make sure performance is key.  I wrote previously about how to achieve GPU acceleration with IMFSourceReader.  Not all codecs allow for GPU decoding, but H264 and VC-1 should in most cases, along with YUV -> RGB conversions.  When you do have a GPU-enabled IMFSourceReader, ReadSample(…) will give you an IMFDXGIBuffer in the IMFSample’s buffers.  I have settled on GPU surfaces as the base format returned by my “Video Sample Reader”.  They work for rendering to XAML, and they work for sending to encoders.

The “Audio Sample Reader” is slightly less complex.  There is no GPU acceleration for audio decoding in Media Foundation ;).  I simply let the consumer set the output format (channels, sample rate, bits per channel), and Media Foundation diligently decompresses and converts to that format.  Setting all audio streams to decode to a single uncompressed format will simplify the process of making our own custom audio mixer later.

The Media Writer

To encode media, I am using the IMFSinkWriter.  With this interface one can simply specify an output stream or file, configure the audio and video streams’ encoder parameters, and just send uncompressed samples to it.  The rest is magically handled for you.  Though this API is very easy to use, there are plenty of “gotchas”.  The first is that you need to be very aware of which container formats can hold which streams.  For instance, an MP4 container cannot hold a VC-1 video stream.  WMV (aka ASF) can technically hold “any” format, but compatibility with many players is minimal.  The second issue is you must be very aware of what parameters to use for the codec.  Some audio codecs only work with very specific bitrates, and not all of them are fully published either.  You can find valid WMA bitrates and information in my rant here.
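For reference, a hedged sketch of setting up one H.264 video stream on the sink writer follows; the bitrate, frame size and frame rate are arbitrary example values, not recommendations:

  #include <wrl/client.h>
  #include <mfapi.h>
  #include <mfidl.h>
  #include <mfreadwrite.h>

  using Microsoft::WRL::ComPtr;

  HRESULT AddH264Stream(IMFSinkWriter* writer, DWORD* streamIndex)
  {
      // What we want written into the file (the encoder's output type).
      ComPtr<IMFMediaType> outType;
      MFCreateMediaType(&outType);
      outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
      outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
      outType->SetUINT32(MF_MT_AVG_BITRATE, 5 * 1000 * 1000);
      outType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
      MFSetAttributeSize(outType.Get(), MF_MT_FRAME_SIZE, 1280, 720);
      MFSetAttributeRatio(outType.Get(), MF_MT_FRAME_RATE, 30, 1);
      HRESULT hr = writer->AddStream(outType.Get(), streamIndex);
      if (FAILED(hr)) return hr;

      // What we will feed it (the uncompressed samples coming from the mixer).
      ComPtr<IMFMediaType> inType;
      MFCreateMediaType(&inType);
      inType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
      inType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
      inType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
      MFSetAttributeSize(inType.Get(), MF_MT_FRAME_SIZE, 1280, 720);
      MFSetAttributeRatio(inType.Get(), MF_MT_FRAME_RATE, 30, 1);
      return writer->SetInputMediaType(*streamIndex, inType.Get(), nullptr);
  }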

There is also a more subtle “gotcha” that I came across.  When I first tested the writer, the output video looked very “jerky”.  I double- and triple-checked the timestamps, but everything looked fine.  I found that every time I sent a video sample to the writer, it would not encode immediately and required at least a few samples in the queue to do its job.  In essence, I kept modifying the output before it got encoded.  To solve this, I created an IMFVideoSampleAllocator.  Before writing a video sample, I first grab a sample from the allocator, copy the source sample into it, then send the copy to the sink writer.  Once the sink writer is finished with the sample, it returns to the allocator.
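A hedged sketch of that allocate-copy-write pattern is below.  It shows a plain system-memory copy; for GPU (IMFDXGIBuffer) samples the pixel copy would instead go through ID3D11DeviceContext::CopyResource, and the allocator is assumed to have already been initialized with the target media type.

  #include <cstring>
  #include <wrl/client.h>
  #include <mfapi.h>
  #include <mfidl.h>
  #include <mfreadwrite.h>

  using Microsoft::WRL::ComPtr;

  HRESULT WriteVideoSampleCopy(IMFSinkWriter* writer, DWORD streamIndex,
                               IMFVideoSampleAllocator* allocator, IMFSample* source)
  {
      ComPtr<IMFSample> copy;
      HRESULT hr = allocator->AllocateSample(&copy);   // returns to the pool when released
      if (FAILED(hr)) return hr;

      // Copy timing information.
      LONGLONG time = 0, duration = 0;
      source->GetSampleTime(&time);
      source->GetSampleDuration(&duration);
      copy->SetSampleTime(time);
      copy->SetSampleDuration(duration);

      // Copy the pixel data (system-memory path).
      ComPtr<IMFMediaBuffer> srcBuf, dstBuf;
      source->GetBufferByIndex(0, &srcBuf);
      copy->GetBufferByIndex(0, &dstBuf);

      BYTE *srcData = nullptr, *dstData = nullptr;
      DWORD srcLen = 0, dstMax = 0;
      srcBuf->Lock(&srcData, nullptr, &srcLen);
      dstBuf->Lock(&dstData, &dstMax, nullptr);
      DWORD copyLen = srcLen < dstMax ? srcLen : dstMax;
      memcpy(dstData, srcData, copyLen);
      dstBuf->SetCurrentLength(copyLen);
      dstBuf->Unlock();
      srcBuf->Unlock();

      return writer->WriteSample(streamIndex, copy.Get());
  }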

The Media Sample Player

Now I have a “media type detector”, “media readers” and a “media writer”.  These are the bare essentials for being able to transcode any format to any format.  That is important when a user wishes to compile their editing project to a media file as fast as the device can manage.  What we need now is a way to show a “live preview” to the user.  The “media readers” just read as fast as they can; they do not render audio and do not keep the audio and video in sync.  Audio rendering is not a problem with the Metro APIs, but writing clocks, syncing streams and getting the timing correct is not easy or fun.  The solution I came up with is to leverage the IMFMediaEngine and write a custom media source.

The IMFMediaEngine can be configured to use a custom media source.  Typically folks use this functionality to write a custom protocol (e.g. RTSP, RTP, etc.) that plugs right into the Media Foundation pipeline.  For our situation, we want to make it a “virtual media source”.  By “virtual”, I mean the media source’s streams don’t really exist as a file, but are created on the fly from a user’s video editing session (picture a non-linear video editor UI).  It receives output from the audio and video mixers that I have yet to discuss.  So think of this virtual source as a virtual file: when the IMFMediaEngine asks to play the next sample, the mixers will compose an audio or video sample based on the current media time.  This could involve reading a new sample from the “media readers”.  If the IMFMediaEngine is told to seek, the application will virtually seek within the editing session, rendering audio and video from the mixers.

One “gotcha” with this setup is that IMFMediaEngine will not let you directly instantiate your custom media source.  This is a requirement for me, as I need to make sure I can pump it with media samples to be rendered.  The work-around is to implement IMFMediaEngineExtension and register it with the MF_MEDIA_ENGINE_EXTENSION attribute when you create the IMFMediaEngine.  When you tell the IMFMediaEngine to open a custom URI, e.g. “myproto://”, it will call into your extension, where you can instantiate your custom source, keep a reference to it and pass it back to the IMFMediaEngine.
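A hedged sketch of that registration is below.  The extension and notify callback objects stand in for the app’s own implementations, and in a real app a video output attribute (e.g. a DXGI device manager) would also be set before creating the engine:

  #include <windows.h>
  #include <wrl/client.h>
  #include <mfapi.h>
  #include <mfmediaengine.h>

  using Microsoft::WRL::ComPtr;

  HRESULT CreateEngineWithCustomSource(IMFMediaEngineExtension* extension,
                                       IMFMediaEngineNotify* notify,
                                       IMFMediaEngine** engine)
  {
      ComPtr<IMFMediaEngineClassFactory> factory;
      HRESULT hr = CoCreateInstance(CLSID_MFMediaEngineClassFactory, nullptr,
                                    CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&factory));
      if (FAILED(hr)) return hr;

      ComPtr<IMFAttributes> attrs;
      MFCreateAttributes(&attrs, 3);
      attrs->SetUnknown(MF_MEDIA_ENGINE_CALLBACK, notify);      // required event callback
      attrs->SetUnknown(MF_MEDIA_ENGINE_EXTENSION, extension);  // our URI -> source hook
      // (Rendering setup, e.g. a DXGI device manager attribute, omitted here.)

      hr = factory->CreateInstance(0, attrs.Get(), engine);
      if (FAILED(hr)) return hr;

      // Opening our fake scheme makes the engine call the extension's BeginCreateObject,
      // which hands back the virtual media source built from the editing session.
      BSTR url = SysAllocString(L"myproto://session");
      hr = (*engine)->SetSource(url);
      SysFreeString(url);
      return hr;
  }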

Note:  When the IMFMediaEngine calls your custom IMFMediaStream::RequestSample(…), DO NOT synchronously call IMFSourceReader::ReadSample(…).  It will deadlock.  Instead, put the IMFSourceReader in async mode, or use something like the MS PPL library to asynchronously call IMFSourceReader::ReadSample.
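A hedged sketch of the PPL approach mentioned in the note; DeliverSample and the surrounding media-stream class are stand-ins for the app’s own code, not real API:

  #include <ppltasks.h>
  #include <wrl/client.h>
  #include <mfreadwrite.h>

  using Microsoft::WRL::ComPtr;

  // Called from the custom IMFMediaStream's RequestSample; kicks the blocking read
  // onto a background task so the media engine's thread is never blocked.
  void OnRequestSample(IMFSourceReader* videoReader)
  {
      ComPtr<IMFSourceReader> reader(videoReader);   // keep the reader alive for the task
      concurrency::create_task([reader]()
      {
          DWORD stream = 0, flags = 0;
          LONGLONG time = 0;
          ComPtr<IMFSample> sample;
          reader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                             0, &stream, &flags, &time, &sample);
          // DeliverSample(sample) would queue an MEMediaSample event back to the engine.
      });
  }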

What’s Next?

At this moment, this is all I have written.  I plan on tackling the infrastructure for the video mixer next, which involves the fun stuff like Direct3D/2D.  I will post on this when I have something more solid.  It will be exciting for me, as I can finally demonstrate the output of the system as it stands.  Right now, to you, it’s just a long, rambling blog post. :)

-Jer

I’m Building a Windows 8 App

I’m not alone in downloading and eagerly using all the Windows 8 betas.  Spending nights perusing the documentation for new features.  Using up what little extra time I have to play with APIs.  I’m not silent or original in calling out this Metro, tablet-touch app model’s shortfalls.  I’m not even particularly keen on much of what Microsoft has deemed Metro design principles (Metro Design v1?).  On the other side, with my developer hat on, I see something that others see also.  The Metro platform is part of Windows.  It’s deeply rooted in its subsystems.  It’s littered throughout SDK header files.  WinRT support is in the .NET CLR and IE.  It’s not some flavor-of-the-month monstrosity.  Silverlight was not Windows.  Metro is.

Why are you building a Windows 8 app?

I don’t have an immediate, pressing request to make a Windows 8 application.  I’m not doing it for a particular business idea.  I’m doing it to stay literate with Windows.  There will be a time, soon, when I have a hard business requirement to support Windows 8.  This isn’t hypothetical, and I’m not going to ignore it.  There’s also a second reason: fun.  Developers understand that we can do the same type of thing we do at work for enjoyment.  It’s not like “asking a mailman to go for a walk” for us.

What kind of application are you attempting to build?

I’ve spent a lot of my career dealing with multimedia.  It seems only natural that I attempt to build something around what I know.  Putting just a little bit of thought into it, I’ve decided to build a video editing application.  I’ve had this itch for a while now, but I feel this is the first time Microsoft has given me the toolset to make a mobile video editing app I might be proud of.  Windows Phone 7 is a particularly weak developer platform in this context.  A WPF/Win7 tablet wouldn’t make for a very fun experience for users either.  We developers are never in short supply of complaints over the Metro API, but in my humblest opinion, this is the strongest mobile platform the company has ever released.

WinJS or XAML?  Please tell me so I can edify my feelings on my favorite platform or berate you on yours!

If you must really know, I plan on using XAML.  I chose XAML for a number of reasons, but the most compelling reason was that it fits the requirements for this application.  When dealing with video, you want to leverage the GPU as much as possible, from decoding and colorspace conversions to rendering.  WinJS has done an incredible job at keeping parity with XAML on being able to add video/audio effects.  WinJS can interact with C++ components, which in turn can use the GPU.  The problem with WinJS is there is no (sane) way to render D3D to the screen.  Beyond just the “video” aspect, D3D and Direct2D are a form of escape hatch for when you wish to move beyond the design or performance limitations of XAML or HTML.

So XAML…C# or C++?

C++, of course.  This was an easy pick for me also because, again, it fits my requirements for this application.  There is naturally going to be a lot of C++ code in this project, as it uses quite a bit of the Metro-safe COM/Win32 API.  There are also obvious performance and battery benefits to consider in such a resource-intensive mobile application.  Surely I could write C++ WinRT components to consume in .NET, but honestly, that sounds like more work.  Right now I have five static libraries, and wrapping them in an ABI-safe component is just another thing I don’t have to worry about.  I also chose C++ so y’all don’t just decompile it, convert it to Chinese and re-submit it.

So this is going to be closed source?

Even though I’m making this app for fun in what little time I have each week, I am going to be making it as a closed source application.  On the other hand, I plan on doing blog posts as I develop it, explaining how I am making it.  Even though it’s natural to be guarded against possible “app competitors”, I do think sharing knowledge is important also.  This is more pronounced with the fact that I do not expect to become rich off this. Rest assured my next blog post on this subject will be heavy into MediaFoundation and Direct3D.