Jer's Hacks

I assume no liability

HTTP WinRT Client (for C++)

To really start digging into Metro style apps in C++, I wanted to make a simple component.  Currently, there are two ways of making generic HTTP calls from C++ in a Metro application.  One is IXMLHTTPRequest2 and the other is Casablanca.  IXMLHTTPRequest2 is not really that great of an API.  Casablanca is going to be awesome, and I suggest folks try it out.  When it’s released I’m positive it’ll be the preferred way of doing HTTP in C++.  For now, though, it is pre-beta and the API is going to change in coming releases.

So why did I make this? As a learning exercise.  There are a lot of concepts I wanted to learn, and I figured a WinRT wrapper around IXMLHTTPRequest2 would be full of surprises.

In the code I have a few things of interest:

  • A custom IBuffer implementation made with WRL (Microsoft::WRL::RuntimeClass).  WRL will let us mix old style COM interfaces with a runtime object.  It’s very well made and way easier than ATL.
  • CoCreateInstanceFromApp usage.  This is the Metro version of CoCreateInstance and will let you use your own registration free COM objects and Metro approved COM classes.
  • Lots of WRL ComPtr.  Never use raw COM interface pointers unless you know what you are doing!  ComPtr will ensure you never miss an AddRef/Release.
  • WeakReferences.  A circular reference is the Achilles heel of reference counted objects!  Learn when to use WeakReferences.

I implemented the standard C++/CX events for callbacks.  The new async model (the impl is still async) would have been better, so one could use continuations or async/await, but I’ll tackle that in another project.

One note, in the sample application I seemed to have odd perf problems with long strings in a XAML text box, so you will see I hard-coded the string to be no more than 200 chars.

As always, this is a prototype for learning and fun.  Since I’m still learning, there could be mistakes, bugs and whatnot :)

Here’s the code!

http://dl.dropbox.com/u/4165054/HttpSample.zip

GPU Accelerated Media Effects in Windows 8 Metro

Windows 8 Metro applications have the ability to add effects to media, such as video playback and live cameras.  This is usually applied to the MediaElement class via the AddVideoEffect (or AddAudioEffect) call.  There are a few reasons you’d want to make these a/v effects.  One is post-processing of video playback; another is applying the effects as the media is being encoded and recorded.

These media effects are created by implementing a MediaFoundation Transform.  There is a fine example located here.  Unfortunately that linked sample only shows how to do effects by processing pixels on the CPU.  Since this isn’t the 80’s any more, where we modify pixels in system memory, I wanted to use the GPU and Direct3D and D2D to create GPU accelerated effects for video.

In the sample application I only have it displaying video in a MediaElement with a blur effect applied to it.  But with not too much work I believe you can make this work in a D3D application with IMFMediaEngine and possibly even a WinJS application.  You can also add it to a video capture and record it.

Keep in mind this is just a prototype and I’m not sure it works in every situation (e.g., does it need a software fallback?  What about device loss?)

Download the application here (You may need the full version of VS11)

And special thanks to Shawn Hargreaves for SpriteBatch!

Windows 8 Metro XAML UI and Direct3D 11 Interop

One feature of WPF that I always liked was the D3DImage class.  It allowed you to render any Direct3D surface to WPF airspace.  This came in handy for when the retained mode XAML system (or WPF 3D) was not flexible enough.  This opened up a wide range of opportunities in the user interface, such as hardware accelerated video and high quality 3D graphics.  Windows Phone 7 also has a similar feature, where in Silverlight you can mix XNA and XAML in the same user interface.

At the time of this writing, that feature is missing from the new (prerelease) Windows 8 Metro XAML user interface.  You can make a Direct3D11 Metro application or a XAML application.  No mixing allowed.  As an intermediate solution I wanted to take a crack at what can be done about this.  Here’s a quick video of the result, with apologies for my crappy camera.

How does this work?

Before I get anyone too excited, let me just say it involves a GPU read-back to system memory.  Yeah, I know it’s not optimal, but it’s better than nothing :) .  Now that we got that out of the way…

If you aren’t having hurdles to jump over, then you aren’t using pre-beta software.

I was going to be writing this in C++ for a number of reasons, so I spent a good amount of time reading the MSDN documents.  I was happy to see there was a WriteableBitmap class, as at least this will let us render pixels.  My excitement dwindled when I found the WriteableBitmap::PixelBuffer property returned an IBuffer.  Why does this stink?  IBuffer only has two properties, “Capacity” and “Length”.  There doesn’t seem to be a “GetBuffer” method.  After some digging I found some examples of how to do this in C# by using some WinRT extension methods.  This was little help because I was doing this in C++.  Getting a little desperate, I pulled out Reflector to see how .NET was getting the pixel buffer.  Lo and behold, there is an undocumented interface!   Here is the COM interface I rewrote in native:

image

Ready for pixels!

Now that we had a way to update the WriteableBitmap we needed pixels.  With my sheer amount of weekend laziness, I decided to just hack up one of the Direct3D11 Metro samples.  To make one of these samples work, I only had to make a minimal amount of changes.  Most notably moving all the code to a WinRT DLL, removing all swap chain code and replacing it with an ID3D11Texture2D.  I had a few other pain-points with “Content” in WinRT DLLs, but I figure it’s my ignorance or pre-beta blues.

Performance?

Like I said before, we’re downloading the texture from video memory to system memory, just to be copied and sent back to the GPU…so there is a performance hit.  In my testing I’ve seen around 40 – 55 FPS.  We can probably increase performance a little more with maybe a few more tricks. 

Codes.  Shut up and gimme codes!

You can download the code example here.

Included is a reusable D3DRenderer control that you can use from C++ or C#, and a sample application.

Have fun…and I assume no liability.

Windows 8. “Don’t worry, nothing has changed for traditional desktop development” And that’s the problem

Well the Build conference has come and gone.  The dust has started to settle and folks have been doing a lot of reflecting on what all the Windows 8 announcements really mean.  We learned quite a bit from Build, but I don’t think we know the whole story yet.

Before Build, there was a wave of uncertainty with some developers about whether their favorite platform would be supported on Windows 8…or if we were all going to be shoe-horned into HTML5 and Javascript.  Developers were grasping at every rumor that supported what they wanted to believe.  Some folks were out for blood.  Others quietly jumped off bridges in anticipation of what they felt was the inevitable.  It was hysteria of biblical proportions.  People wanted to know that the technology they bet on was the right one.  People wanted to know that they weren’t using a technology that was soon to become obsolete.

I don’t know what pre-Build camp I fell into.  I felt WPF would not be satisfactory unless it got a major performance overhaul, which Microsoft has already publicly told us it would not get.  I also didn’t feel Silverlight was robust or performant enough for a general, full featured desktop framework.  Nothing less than an overhaul the size WPF needed, and wouldn’t get, would make this the framework of the future. Personally I became tired and apathetic about the fates of WPF/Silverlight.  I was ready for something new…and like they always say, “Be careful what you wish for.”

After the announcements of WinRT and this new “Metro Style Xaml UI”, which are nothing short of engineering masterpieces, it was explained that, “Yes you can still run your applications of today on the Windows 8 desktop.”  One of the keynote speakers even showed a screenshot of Adobe Photoshop, stressing that these types of UI are not metro, but still important.  So this is what options Microsoft laid out for us:

  1. “Fast and fluid” UI framework, enhanced by Direct2D and D3D11, that is restricted to WinRT and subsets of Win32.  Deployment is restricted to AppStore and enterprise.  These applications will not run on the traditional desktop.
  2. “Slow and clunky” UI frameworks such as WPF or Silverlight.  Deployment options are side loading.

So you got your “fast and fluid” XAML framework you always wanted.  What’s the beef?

If the classic desktop is still important (e.g. the Photoshop example), what’s the forward momentum?  The WinRT style app simply doesn’t cover all the ground of what is possible on a Windows OS, nor does it cover all the deployment needs.  Are we to just get minor tweaks on WPF and Silverlight with a new version number and we as Microsoft customers are happy?   Yes, your favorite framework will run on Windows 8.  Yes, you are getting new versions of all of them.  But unless Microsoft lets us run these “Metro XAML UI” apps on the desktop, it could be a sign that everything running under it might be on the chopping block.  Murdered by neglect.  In the near future, maybe a couple years, you will have to make a new application, and it might not be the right fit for a “metro style” application.  Will you be stuck with slow-and-clunky?

Give some constructive criticism!

It’s all just complaining if you don’t provide an alternative, right?

One solution would be to allow classic desktop applications to use the new XAML framework in WinRT.  This means getting all the advantages of the new Direct2D powered XAML, WinRT and the full Win32 plus side loading.

I have built a proof-of-concept to load WinRT applications on the desktop.  But if it’s not a supported scenario, it might as well be a hack.  Source code here (requires VS11 Ultimate).

Give me a conspiracy theory.  I like those!

One thought is that Microsoft has this feature planned.  But if they did announce it, they’d cannibalize their efforts with Silverlight 5 and .NET 4.5.  Not to mention they’d probably have a riot given how big of a reaction developers had to just the thought of their platforms being last week’s flavor of the month.  Microsoft doesn’t really want us to have 4 XAML frameworks to worry about and fiddle with.  They want one.  By not having “Metro XAML UI” on the desktop, they are just prolonging pain and guiding us to make the wrong investments.  If this conspiracy is wrong, then this signals the beginning of the end of the classic desktop and everything that runs inside it.

In short – I think we don’t have the full picture on Windows 8.

Reengineering DirectCanvas

When I first started work on DirectCanvas, the idea was to give .NET developers access to a rich, high performance, 2D drawing API.  I took the project so far, but gained a level in frustration and decided to restart the whole project, starting with C++.

Why C++?

The number one reason is performance.  When developing anything “high performance” a managed language is like bringing a Yugo to the drag strip.  You can drop as many horses into that Yugo as you like, but you are still using it beyond what it was designed for. You begin to put a lot of mental bandwidth into writing around the GC, and still you end up with something that isn’t as fast as it could be.  I hate leaving performance on the table, don’t you?

Another reason is productivity.  Given all the APIs DirectCanvas would eventually use are written in native, it’s just easier to also write in native code. I was using SlimDX for the Direct3D interop in the .NET version of DirectCanvas.  I can’t say enough nice things about SlimDX and would suggest it over anything else to anyone doing D3D in .NET.  But there are things that SlimDX didn’t cover, like WIC, which I ended up having to write interop code for.  I wrote it using .NET COM interop and came across RCW issues, which I promptly got fed up with.  I decided that to make it work well, I’d have to rewrite the interop code yet again in C++/CLI.

The third reason, is “C++ devs need a drawing library too.”  This might sound odd as Direct2D, Direct3D, WIC, etc, are already C++ APIs, so devs already have that support.  But in the end, just because one is fluent in C++ doesn’t mean they want to understand the GPU, image processing, media or any other of the facilities DirectCanvas abstracts.

What about .NET?

I still plan on supporting .NET via a wrapper in the future.  It is not really a high priority until the C++ code is to a solid state.  I think it is important to have eventually as it increases the developer base of the library.  Also a lot of performance can be preserved by just going through one managed –> native abstraction layer with most things still “managed” in the native code.

What is the current state of native DirectCanvas?

I’ve been using the .NET version as a reference implementation.  Slowly rewriting all the features to mostly match, but in native code. 

Compositor – The glorified sprite batch is mostly complete.  This allows for an extremely high number of sprites (500k at 60 FPS on my machine) and supports blend modes (e.g., Additive, Subtractive, etc).

MediaPlayer – This plays media and is backed by MediaFoundation, so it supports GPU accelerated decoding of VC1 and H264 video.  I have much of this completed, but still needs a handful of features to make it worthwhile.  I’ve totally thrashed some Microsoft samples for implementing a custom EVR and player, but still it was a huge undertaking that still needs a little more love.

VideoBrush – Allows you to use a MediaPlayer as a brush.  Because it is a brush, it can be painted within any vector geometry or stroke around geometry.

SolidColorBrush – Nothing fancy here

DrawingLayer – This is where all drawing happens, a conceptual GPU backed bitmap.  Right now it only supports drawing rectangles, ellipses and MediaPlayers, but sooner or later it will support all geometry.  That shouldn’t be too hard as it’s mostly wrapping up the Direct2D API.

D3D9 GPU Support – I’ve built in support for a compatibility mode so folks can use this on DX9 GPUs.  There are some restrictions with that enabled, but mostly stuff that can be worked around.

HwndPresenter – This lets you present your drawing layer to any Hwnd window.

Where’s the codes?

You can download a current copy of it here, but I do warn that everything is in a state of flux and refactoring happens on a weekly basis.  Don’t expect too much. :) It does require the June 2010 DirectX SDK and the Windows SDK to compile.  Also remember that release builds run without the debugger attached have the best performance (duh!).

C++ Productivity: Memory Management 101 for the .NET Guy

I wanted to start what hopefully will become a series on C++ productivity that is geared towards .NET developers that have had at least some (maybe painful) experience with C++.  The first topic I’m covering is memory management as this area is often cited as the killer reason to use a garbage collected runtime over a native language like C++.

“My code has ninety-nine problems, but memory management should not be one.”

Fair enough.  Developing code is hard.  Developing code that is stable and doesn’t leak like a sieve is even harder.  Even in .NET we fight “leaks”, but as we became masters of the runtime we learned simple rules for avoiding leaks in our managed applications, such as:

  • Being mindful of a hooked event, making sure it is unhooked if the object lifetime of the observer is shorter than the subject.
  • Static objects are rooted.  Objects that static objects refer to are also rooted.
  • Dispose an IDisposable or make use of the using statement.
  • Be aware of framework deficiencies (like this one from WPF)

Being masters of the .NET runtime, rules like these are second nature to us.  What about C++ where we directly allocate/deallocate memory?  How much more difficult is it than .NET?  Depending on your point of view, some say it’s easier.  Just like .NET, you just have to remember some simple rules.  I’ll explain as I go along, but first some background…

C++, Dynamic Memory and Automatic Variables

Apologies if this next part is rehash from computer science, but bear with me as it contains some important concepts.  I won’t go too far in depth as there are thousands of smarter folks that have covered this far better than I.  In C++ you have dynamically allocated variables and automatic variables.  The distinction is very simple.  Dynamic means you created something using the new keyword and will look like a pointer.  Automatic means the new keyword was not used.

Here is an example of a class that uses dynamically allocated memory:

CMyClass* pClass = new CMyClass();
pClass->DoSomething();

Notice here that pClass is a pointer type.  When we instantiate it with the new keyword, memory is allocated on the heap and that memory location is assigned to pClass.  This memory will not be freed unless we execute delete pClass.

Here is an example of an automatic variable:

CMyClass myClass;
myClass.DoSomething();

Compare this to the previous dynamically allocated example.  There is no new keyword involved.  This might seem foreign to a .NET developer, because the equivalent C# would fail on the next line: an unassigned local will not even compile, and an unassigned field would throw a NullReferenceException when DoSomething is called.  In C++, myClass is automatically allocated, and the constructor is called.  You also do not call delete myClass to free it.  It gets freed and the destructor gets called automatically.  The question here should be “When does this happen?”  And the answer is, “It gets freed at the only safe time to release something automatic…when it loses scope.”

Consider the following example:

void MyMethod()
{
  // myClassA ctor called
  CMyClass myClassA;

} // myClassA dtor called

In this snippet myClassA is allocated on the stack because it is local to the method.  When the method exits, myClassA is automatically deallocated.

Here is another example, but this time we have an automatic variable, m_myClass, that is scoped inside a class as a member.

class MyClassA
{
public:
  MyClassA()
  {

  }

  ~MyClassA()
  {
    /* Do nothing as m_myClass
       will automatically be freed */
  }
private:
  MyClassX m_myClass;
};

So hopefully now you have a rough understanding of automatic vs dynamic.  Just to make sure, I’ll reiterate.  Dynamic allocations are pointers created with the new keyword (malloc too if you wanna be picky), and freed with the delete keyword.  Automatic variables are auto instantiated and freed when they lose scope.
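If you want to see the ordering for yourself, here is a minimal sketch.  Tracer, Owner and RunScopeDemo are names I made up for illustration (they are not from any library); all they do is log when constructors and destructors run, for a local variable and for a class member.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: records its own construction and destruction in a log.
class Tracer
{
public:
    Tracer(std::vector<std::string>& log, const std::string& name)
        : m_log(log), m_name(name)
    {
        m_log.push_back("ctor " + m_name);
    }

    ~Tracer()
    {
        m_log.push_back("dtor " + m_name);
    }

private:
    std::vector<std::string>& m_log;
    std::string m_name;
};

// Hypothetical owner with an automatic member, like m_myClass above.
class Owner
{
public:
    explicit Owner(std::vector<std::string>& log)
        : m_member(log, "member")
    {
    }

private:
    Tracer m_member; // freed automatically when the Owner is destroyed
};

std::vector<std::string> RunScopeDemo()
{
    std::vector<std::string> log;
    {
        Tracer local(log, "local"); // ctor runs here
        Owner owner(log);           // constructs its member too
    } // owner is destroyed first, then local (reverse order of declaration)
    return log;
}
```

Calling RunScopeDemo() yields the log "ctor local", "ctor member", "dtor member", "dtor local".  No delete anywhere, and nothing is left behind.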

“I need dynamic allocations!  I can’t just use auto variables everywhere!  You better not be wasting my time.”

Truth!  Nothing but a hello world application can really suffice without dynamically allocating some of that sweet, sweet memory.  How else are you going to hold thousands of strings for your latest Twitter client?  Or where else are you going to keep that huge buffer of the image you decoded?  In C++ it’s just a fact of life that for every use of the new keyword, you better have a delete to free it at some point, unless you consider leaking a feature.  Managing dynamically allocated memory is where almost everyone has had an issue with native code.  Here are some common scenarios:

  • Ownership problem:  Say the Joe class hands over a jpeg image to Randy.  Sooner or later, Joe gets destroyed.  Who frees the jpeg image?  If it is Joe, then what if Randy is still using the jpeg?  In that case Randy will just get pissed and crash the entire app when he finds out!
  • Exception problem: Say we just allocated a string in a ProcessSomething method, then immediately run the DoStuff method.  DoStuff ends up throwing an exception.  The stack then unwinds and our ProcessSomething method never runs its cleanup to free the string.
  • Double delete/free:  You run delete on some memory, but accidentally run delete on it again.  Usually related to the ownership problem.  The behavior is undefined: some C runtimes will crash, some appear to do nothing.
  • Use after delete/free:  You try to use your memory after it was already deleted.  Also usually related to the ownership problem.  This will surely crash most apps!
  • Human error:  You just plain don’t remember to free something you allocated.

With all these things to worry about, no wonder native code has such a bad name.  It just looks like a minefield of problems where one has to worry more about infrastructure than writing code to solve a real world problem!  Rest assured there is a solution in a little idiom known as RAII.

“Resource Acquisition Is Initialization” or RAII: Another acronym to make things sound more complicated than they really are.

RAII has been around for quite a while and Wikipedia has a good article on it, so I’ll just give it to you in layman’s terms.  RAII is a pattern for binding the lifetime of an allocated resource to automatic variables. Remember that part!  RAII is the basis of making robust, leak proof applications.  Though RAII is at the root of the solution, keep in mind that it is not the solution in its entirety.

Here is a simple C++ class that can use RAII:

#include <stdexcept>

class MyClass
{
public:
  MyClass()
  {
    /* Allocate some memory */
    m_myDynamicString = new wchar_t[50];
  }

  ~MyClass()
  {
    /* Free memory on dtor; new[] must be paired with delete[] */
    delete[] m_myDynamicString;
  }

  void DoSomething()
  {
    throw std::runtime_error("file write failure");
  }

private:
  wchar_t* m_myDynamicString;
};

In this class, there is nothing out of the ordinary.  It allocates memory in the constructor and nicely deallocates it in the destructor.  When we use MyClass as an automatic variable, RAII starts to make sense.  Consider this usage of the above “MyClass”.

void CrazyMethod()
{
  MyClass myClass; /* ctor executed */
  myClass.DoSomething(); /* throws exception,
               dtor automatic
               when stack unwinds */
}

Because myClass is an automatic variable, the destructor is guaranteed to be executed when the stack unwinds after an exception.  It is also guaranteed to be executed when CrazyMethod returns normally.  Because the destructor always runs, delete is always called on our allocated memory.  Alternatively, MyClass can be a member of another class as an automatic variable, and it will still be released when the class it’s contained in loses scope (or is released).  This is all great stuff, but we still have problems…
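To convince yourself that the cleanup really does run on the exception path, here is a small sketch.  ScopedFlag, Risky and ReleasedAfterThrow are hypothetical names used only for this illustration; the “resource” is just a bool that the destructor flips.

```cpp
#include <stdexcept>

// Hypothetical resource wrapper: sets a flag when its destructor runs.
class ScopedFlag
{
public:
    explicit ScopedFlag(bool& released) : m_released(released)
    {
        m_released = false; // "acquire" the resource
    }

    ~ScopedFlag()
    {
        m_released = true; // "release" runs on every exit path
    }

private:
    bool& m_released;
};

void Risky(bool& released)
{
    ScopedFlag guard(released);
    throw std::runtime_error("file write failure");
} // guard's destructor runs while the stack unwinds

bool ReleasedAfterThrow()
{
    bool released = false;
    try
    {
        Risky(released);
    }
    catch (const std::runtime_error&)
    {
        // By the time we get here, guard has already been destroyed.
    }
    return released;
}
```

ReleasedAfterThrow() returns true: the destructor ran even though Risky never reached its closing brace in the normal way.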

“Your example still shows having to call delete xyz!  I thought you were going to show me how to avoid having to do that!  This solves only a few issues with memory management!”

RAII is a great way to handle some problems, but this basic pattern doesn’t go far enough.  As mentioned before, we still have to explicitly delete the memory we allocated with the new keyword.  It also does not solve the “ownership problem” of resources. If only we could bind all dynamically allocated memory to a specific scope and also have a way to track ownership, we’d be sitting pretty.  Luckily there are some smart folks that have come up with a very smart solution.

Enter the Smart Pointers – Giving Intelligence to a Memory Address

So I finally get to the root of this post.  Smart pointers, simply said, are wrappers around pointers.  They are not a new concept either.  The smart pointers I’ve been exposed to all use RAII to make them work.  Also they are all based on templates (generics for you .NET folks), so they are generally very flexible and add little code smell.

For the sake of brevity, I only want to cover some smart pointers, at least the ones I find the most useful in modern C++ development. 

The idea behind all the smart pointers I cover here is to take ownership of everything you allocate with new.  They take care of managing the lifetime of your dynamic memory and making sure you have no leaks.

unique_ptr

Nothing explains more than a good old example.  So let’s start there!

#include <memory>

void DoSomething()
{
  /* auto keyword is like var in c# */
  auto myWideString = std::unique_ptr<wchar_t[]>(new wchar_t[10]);
}

Here we initialize a new unique_ptr, passing it some wchar_t array we have dynamically allocated.  Keep in mind that we have NOT done “myWideString = new unique_ptr…”.  In fact, I can’t think of a reason to ever new a smart pointer as it defeats the purpose.  If you recall the previous section on RAII, you will realize that the lifetime of our “new wchar_t[10]” memory is controlled by the lifetime of the unique_ptr.  In this case, that is the duration of the DoSomething method.  Once DoSomething exits, the unique_ptr destructor is automatically called.  The unique_ptr is smart enough to run “delete[]” on the buffer we have assigned to it.

Here is another, more complex example:

#include <memory>

class MyCoolClass
{
public:
  MyCoolClass()
    : m_myAllocedInt(new int(123)) /* hand the new int to the unique_ptr */
  {
  }

  void Process()
  {
    /* Process something */
  }
private:
  /* Auto destroyed */
  std::unique_ptr<int> m_myAllocedInt;
};

void DoSomethingElse()
{
  auto myCoolInstance = std::unique_ptr<MyCoolClass>(new MyCoolClass());
  myCoolInstance->Process();
}

First notice that MyCoolClass has a unique_ptr (m_myAllocedInt) as a class member.  It’s also an automatic variable that is tied to the scope of the class.  In the constructor a new int is allocated on the heap and handed over to the unique_ptr.  This memory will be freed when the class’s destructor is called.

Next look at the DoSomethingElse method.  We dynamically create the MyCoolClass on the heap and assign it to the myCoolInstance unique_ptr.  This ties that class instance to the lifetime of the unique_ptr that owns it, which lasts until DoSomethingElse exits.  The next line shows a call to myCoolInstance->Process.  Notice that even though we are dealing with an automatic variable, we still can use it as if it was a pointer of the template type!  Neat!

Performance – Performance overhead of unique_ptr is nothing.  The compiler will inline everything, so it’s just as if you called the delete yourself.

Caveats – The unique_ptr directly replaces the older smart pointer called auto_ptr.  Only one pointer can own the dynamically allocated memory at one time.  So you cannot do “unique_ptr<MyCoolClass> x = myCoolInstance” as it will result in a compiler error.  Instead you must transfer ownership explicitly with “unique_ptr<MyCoolClass> x = std::move(myCoolInstance)”, or exchange what two pointers own with “x.swap(myCoolInstance)”.  This keeps ownership very obvious and concise and fixes major complaints of the auto_ptr it replaced.
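Here is a small sketch of handing ownership from one unique_ptr to another.  Widget and MoveDemo are made-up names for illustration; std::move (from <utility>) and the swap member are the standard library pieces doing the work.

```cpp
#include <memory>
#include <utility>

// Hypothetical payload type for the demo.
struct Widget
{
    explicit Widget(int v) : value(v) {}
    int value;
};

bool MoveDemo()
{
    std::unique_ptr<Widget> a(new Widget(42));
    // std::unique_ptr<Widget> b = a;         // would not compile: copying is disabled
    std::unique_ptr<Widget> b = std::move(a); // ownership moves from a to b

    bool aIsEmpty = (a == nullptr); // a moved-from unique_ptr is guaranteed to be null
    bool bOwnsIt = (b != nullptr) && (b->value == 42);

    std::unique_ptr<Widget> c;
    c.swap(b); // swap exchanges what the two pointers own

    return aIsEmpty && bOwnsIt && (b == nullptr) && (c->value == 42);
} // c deletes the Widget here; a and b are empty and delete nothing
```

At every point exactly one pointer owns the Widget, so there is exactly one delete, and it happens when that owner goes out of scope.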

When to use – Use unique_ptr in situations where an instance has exactly one owner at a time, including when you need to transfer that ownership.  Personally, I mostly use this internal to a class implementation. If you wish to have multiple classes share an instance, you would use this next smart pointer…

shared_ptr

shared_ptr is by far my favorite of all smart pointers.  It’s very similar to unique_ptr in usage and solves the same issue as unique_ptr.  In addition, it also solves the ownership problem.  shared_ptr achieves this by using a pattern known as (automatic) reference counting.  If you were to create a shared_ptr, it would have an initial reference count of 1.  If you were to pass your shared_ptr to be referenced somewhere else, it has a count of 2.  As soon as a shared_ptr instance falls out of scope (recall RAII), the reference count is decremented. Once the reference count hits 0, it will run delete on the memory.

Let’s look at this common example of needing to share instances between classes:

#include <memory>

class MyCommonInstance
{
public:
  void DoMagic();
};

class MyCoolClass
{
public:
  MyCoolClass()
  {
  }

  void SetCommonData(const std::shared_ptr<MyCommonInstance>& someInstance)
  {
    /* auto increments ref count */
    m_localInstance = someInstance;
  }
private:
  std::shared_ptr<MyCommonInstance> m_localInstance;
};

void DoSomethingElse()
{
  auto instanceA = std::unique_ptr<MyCoolClass>(new MyCoolClass());
  auto instanceB = std::unique_ptr<MyCoolClass>(new MyCoolClass());

  /* 1 ref count */
  auto sharedInstance = std::shared_ptr<MyCommonInstance>(new MyCommonInstance);

  /* 2 ref count */
  instanceA->SetCommonData(sharedInstance);

  /* 3 ref count */
  instanceB->SetCommonData(sharedInstance);
} /* smart ptrs destroyed on exit, in reverse order of declaration -
     sharedInstance dtor   - ref count 2
     instanceB dtor called - ref count 1
     instanceA dtor called - ref count 0, delete called automatically */

  1. Here two instances of MyCoolClass have been created.  I am wrapping the instances in a unique_ptr as you should already be familiar with it if you haven’t totally skimmed this post.
  2. Next we create a shared_ptr of type MyCommonInstance.  At the point of instantiation, the sharedInstance variable has a reference count of 1.  When we pass sharedInstance to instanceA->SetCommonData, instanceA keeps a reference to it.
  3. So by the time SetCommonData completes, sharedInstance will have a reference count of 2.  Because we send the shared_ptr to instanceB, sharedInstance will have a total reference count of 3 before the method exits.
  4. The method exits, freeing all smart pointers and subsequently the memory that was assigned to them.  As each shared_ptr gets released, the reference count gets lowered until it hits 0 and the deleter gets called.
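You can watch the count change with shared_ptr’s use_count() member.  The sketch below follows the same steps with stripped down, hypothetical types (Common, Holder and CountDemo are my names, not part of the sample above), and it assumes a single thread, where use_count() is exact.

```cpp
#include <memory>
#include <vector>

struct Common {}; // hypothetical shared payload

// Hypothetical holder that keeps its own reference, like MyCoolClass above.
class Holder
{
public:
    void Set(const std::shared_ptr<Common>& p) { m_p = p; }

private:
    std::shared_ptr<Common> m_p;
};

std::vector<long> CountDemo()
{
    std::vector<long> counts;
    std::shared_ptr<Common> shared(new Common);
    counts.push_back(shared.use_count()); // only 'shared' refers to the object
    {
        Holder a, b;
        a.Set(shared);
        counts.push_back(shared.use_count()); // 'shared' plus a
        b.Set(shared);
        counts.push_back(shared.use_count()); // 'shared' plus a and b
    } // a and b are destroyed here, each dropping its reference
    counts.push_back(shared.use_count()); // back to just 'shared'
    return counts;
}
```

CountDemo() returns 1, 2, 3, 1.  The object itself is deleted only when the last shared_ptr (here the local named shared) goes away.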

Performance – shared_ptr performs very fast, but there are some things to be aware of.  When a shared_ptr is first created, it has to do an allocation on the heap to store the reference counting memory.  So you are doing two allocations by default, one for the reference counting memory and one for dynamically allocated memory.  There is a work around though.  If you use the make_shared helper method, it will typically allocate your instance AND the reference counting memory in one allocation.  Neat!  Also support for custom allocators makes it extremely flexible, but that’s a 200 level topic.  You can say that reference counting eats up a blip of CPU time, but the reality is if you are going to be sharing ownership, you would otherwise most likely be doing reference counting by hand.  shared_ptr simply automates it.  The reference counting block also typically destroys the object through a type-erased (often virtual) call, so there is a small overhead from that indirection.  Usually not a problem, but in high perf scenarios you should be aware of this.
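For reference, this is roughly what the make_shared version looks like.  MakeGreeting is a hypothetical function name; std::make_shared itself is standard and forwards its arguments to the constructor of the type you name.

```cpp
#include <memory>
#include <string>

// Builds the string and its reference counting block together,
// instead of "new std::string(...)" followed by a second allocation.
std::shared_ptr<std::string> MakeGreeting()
{
    return std::make_shared<std::string>("hello");
}
```

The caller gets an ordinary shared_ptr with a reference count of 1; nothing about how you use it afterwards changes.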

Caveats – The Achilles heel of reference counting mechanisms is the dreaded circular reference. The .NET GC handles this scenario automatically.  But here in native land, we aren’t using a big complicated memory management system.  Consider class Parent has a reference to Child.  And Child keeps a reference to Parent.  Both have a strong reference to each other and will never be released with a shared_ptr.  This is easily solvable with the use of another smart pointer, known as weak_ptr, which I’m not covering in depth here.
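Just so the Parent/Child scenario isn’t left hanging, here is a bare-bones sketch of the weak_ptr fix.  The Parent and Child structs and CycleIsReleased are hypothetical, written only for this illustration; the bool references exist purely so we can observe that both destructors ran.

```cpp
#include <memory>

struct Parent; // forward declaration so Child can refer to it

struct Child
{
    explicit Child(bool& gone) : destroyed(gone) {}
    ~Child() { destroyed = true; }

    std::weak_ptr<Parent> parent; // weak: observes the parent without owning it
    bool& destroyed;
};

struct Parent
{
    explicit Parent(bool& gone) : destroyed(gone) {}
    ~Parent() { destroyed = true; }

    std::shared_ptr<Child> child; // strong: the parent owns the child
    bool& destroyed;
};

bool CycleIsReleased()
{
    bool parentGone = false;
    bool childGone = false;
    bool sawParent = false;
    {
        auto parent = std::make_shared<Parent>(parentGone);
        auto child = std::make_shared<Child>(childGone);
        parent->child = child;
        child->parent = parent; // does not raise the parent's reference count

        // lock() returns a temporary shared_ptr, or an empty one if the parent is gone.
        if (auto locked = child->parent.lock())
        {
            sawParent = (locked.get() == parent.get());
        }
    } // both locals go out of scope; nothing keeps the pair alive
    return sawParent && parentGone && childGone;
}
```

If Child held a shared_ptr<Parent> instead, neither destructor would run and the pair would leak.  With the back reference made weak, CycleIsReleased() returns true.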

The other caveat is with arrays.  Consider this:

  1: auto myWideString = shared_ptr<wchar_t[]>(new wchar_t[10]);

By default, shared_ptr will run “delete pData” This isn’t correct in C++, for arrays must be freed with “delete[] pData”.  The unique_ptr can handle this situation by default, but shared_ptr cannot. To rectify the situation we just supply a custom deleter to our shared_ptr.  Here is an example:

struct array_deleter
{
  /* templated so delete[] sees the real element type; delete[] on a void* is undefined */
  template <typename T>
  inline void operator ()(T * p) const
  {
    delete[] p;
  }
};

void DoSomething()
{
  /* auto keyword is like var in c# */
  auto myWideString = shared_ptr<wchar_t>(new wchar_t[10], array_deleter());
}

This will ensure the memory is properly released.

When to use – (Almost) all the time!  Seriously.  The few times not to use this smart pointer are when you need extreme performance allocating/deallocating hundreds of thousands of instances.  In that case, just go for the unique_ptr.

CComPtr, CComQIPtr and _com_ptr_t

When working with COM objects, we realize that they implement the IUnknown interface.  This interface already supports reference counting (like shared_ptr).  The difference is COM uses something called intrusive reference counting.  This means the reference counting is built into the class itself.

Now that you know the basics of smart pointers, I won’t go into the grit of these three pointer types.  Do know that they will call AddRef/Release on your COM object automatically.  AddRef is called when the smart pointer gets a reference, Release is called when the smart pointer falls out of scope.  The COM class itself will keep track of its reference count and delete itself when it gets to 0.  If using COM smart pointers, the reference count will hit 0 when all smart pointers referencing it fall out of scope.

Performance – Just as much performance as calling AddRef/Release yourself!

Conclusion

“So what were those simple rules for not leaking memory in my C++ application?”

  • Use smart pointers with every new keyword.
  • Always pass your smart pointers by reference:
    void DoSomething(smartPtr<mytype>& data);
  • Never call delete on a smart pointer
  • Be mindful of shared_ptr and arrays.  Use a custom deleter as shown previously
  • Circular references.  Know how to use weak_ptr!

We all know that no application is free of memory leaks, no matter how managed it is.  Hopefully I have shown how memory management in C++ is not as difficult as it used to be and even mortals like us can make applications that are robust and free of leaks and tedious cleanup routines.  Just #include <memory> and have fun!

-Jer

COM for the “I did C++ once a thousand years ago, but only do .NET now” Developer –or– In the Defense of COM

 

To some .NET developers, COM is a dirty little turd that no matter how hard they try, it won’t flush.   Microsoft and other vendors keep on pumping out new COM based SDKs year after year.  Sometimes these APIs are directly consumable by .NET and in the case they are not, you can visibly see developers’ anxiety rise at the thought of working directly with COM.  There’s a lot of beef with a lot of developers about COM.  Where does it all come from?

image

“I did XYZ in COM and I couldn’t make it work, so COM sucks”, “I used ABC COM technology and it was way too overcomplicated” and the variations are things that have been heard and are wide spread.  While the fact that so many developers have these grievances about COM is generally relevant, it is also not very fair.  Imagine looking at .NET for the first time and diving right into WCF or Workflow and saying “I got burned by it, so .NET blows”.

Wikipedia has a fairly comprehensive article on COM, but unfortunately it just reinforces what many developers already believe…That it’s an over-engineered, over-complicated bloated piece of shier…machinery.  (Being objective, I wanted to add that we’ve heard the same thing about the .NET BCLs.)  In defense of COM, I wanted to write an article explaining why at its root, COM is simple, elegant, largely misunderstood and how it wants to be your friend.

COM – What is it?  Why is it?

COM covers a lot of ground and has accumulated quite a few surrounding technologies around it over the years, many I would consider obsolete.  Because of this, I only wanted to cover what I felt was the core and timeless pieces of COM. 

Get on with it already!

Back in the olden days, a developer would create a DLL using good old “C” to contain all of their wonderful logic.  When they would consume these DLLs, they would dynamically link to these libraries using the DLL’s lib file or export some specific methods (the methods you would call using p/invoke).  Life was simple in these days, but then object oriented programming had to come in and screw it all up. How does one share objects between DLLs?

Looking at the “Why” so we can explain the “What”

To really give an accurate picture of what COM is and why it exists, I need to give a little background.  When C++ was coming on the scene, there were no C++ compilers as we know today.  One would write C++ code, then “precompile” that to “C” code for a “C” compiler to directly consume and compile.  This made practical sense as C++ was really just C with some minor compiler magic.  Your CPU doesn’t know what an object is, so it was the C++ precompiler job to flatten out your class prototype to “C”.  Today’s C++ compilers do not need a C++ precompiler, but the end result is conceptually the same.

Since examples in assembly can scare most folks away from a blog post (and my asm prowess is weak), here is an example of what flattening a C++ class to “C” might look like (pseudo code, so not exact, but similar):

class CMyWidget
{
public:
    void Process()
    {
       ProcessMyPrivate(5);
    }

private:
    void ProcessMyPrivate(int num)
    {
      this->m_myLocal = num;
    }

    int m_myLocal;
};

Might get compiled down to this:

void CMyWidget_Process(void* pObj)
{
    /* Won’t compile, but gives a good picture */
    (*(pObj_vtable + CMyWidget_ProcessMyPrivate_vTable_Offset))(pObj, 5);
}

void CMyWidget_ProcessMyPrivate(void* pObj, int num)
{
    /* Calc mem address of m_myLocal and set value */
    *(int*)((char*)pObj + m_myLocal_Memory_Offset) = num;
}

When working with higher level and object oriented languages like .NET it’s easy to forget some important items.  So when I explain this example to .NET devs, or devs that have never used native code before, I need to point out a few things.  First is that all methods, even if they belong to your class, are ALWAYS “static”.  That means for every method you write in code, by the time it gets executed, it only exists in one memory location.  Second, there is no such thing as an object.  The object gets compiled down to a pointer, which in the end, only represents the memory address where all your object’s local variables start (it also has the v-table, but I’ll talk about that in a bit).  Notice the class method calls have been expanded to take this context pointer.   So really, to your CPU, an object instance is simply just a context structure, and the “this” keyword simply refers to that context.
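
For something you can actually compile, here is a hand-flattened sketch of the same class.  CMyWidgetData and pThis are names I made up for illustration.  This sketch calls the functions directly, which is what a compiler does for non-virtual methods; the v-table indirection only comes into play later.

```cpp
// Hand-flattened version of CMyWidget: the "object" is just a struct of
// its fields, and every method is a free function taking a context pointer.
struct CMyWidgetData
{
    int m_myLocal;
};

void CMyWidget_ProcessMyPrivate(CMyWidgetData* pThis, int num)
{
    pThis->m_myLocal = num;      // same as this->m_myLocal = num
}

void CMyWidget_Process(CMyWidgetData* pThis)
{
    CMyWidget_ProcessMyPrivate(pThis, 5);
}
```

Calling CMyWidget_Process(&data) on a CMyWidgetData does exactly what widget.Process() does on the real class: there is one copy of each function, and the only per-instance state is the struct.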

Sharing the C++ Class across DLL Boundaries – What Seems to be the Problem?

I explained briefly how simple exporting regular “C” methods can be to share code between DLLs and the application.  I also explained how C++ classes get flattened and compiled down and also how they are seen to your CPU (more or less).  If C++ classes just get compiled to static “C” methods, which can be exported from our DLL, is that not sufficient to reuse and share classes across DLLs?  Yes…but not without some major caveats as this would come at a cost.

·         The first disadvantage of exporting a C++ class is you must use the same C runtime in all DLLs/exe involved.  So if one DLL used MSVCRT7, the caller of the DLL must be using MSVCRT7.  The reason is different versions of C runtimes manage the heap differently, so if DLL A tried to free resources allocated by DLL B, memory corruption can occur.

·         The second disadvantage is the same compiler version must be used.  This is because different compilers mangle names differently.  A DLL built with Microsoft’s compiler will have completely different symbol names than a DLL built with GCC.  So when linking of your code occurs, the linker might be looking for “??4MyWidget…” when the export is really “MyWidget__Fi…”

·         Third, if a large change is made to the base classes in DLL A, even without breaking the API, then DLL B, which uses those base classes, must be recompiled.

·         Fourth, if you take the first three disadvantages into consideration, we have a very tight coupling between our DLLs.  This inhibits reusability of the module.

Sometimes these issues are not an issue for a given project.  This is usually acceptable when the project is small, or an in house utility.  That’s totally fine.  But let’s pretend Microsoft ignored these problems.  All developers would have to use the same compiler version.  Applications would have to be written for very specific versions of Windows.  This would be a complete nightmare for everyone.

Enter COM to Save the Day

COM attempts to solve these particular issues of sharing an object between DLLs that may or may not have been built using different compilers or use different C runtimes in a simple and elegant way, by way of using interfaces and harnessing the v-table.  So hopefully I’ve confused you here and you are asking “WTF is a v-table?” or saying “C++ doesn’t have a notion of interfaces, WTF are you talking about?”

Remember how I said a C++ object really just gets compiled down to a context-like pointer?  Well it not only keeps reference to your class’s local variables, it also keeps reference to all the memory locations of the functions that make up the class.  This is known as a virtual table aka v-table.

 

CMyWidget Object Pointer
    CMyWidget V-Table
        0 index    0xDEADBEEF    /* Points to CMyWidget::Process */
        1 index    0xBADBEEF     /* Points to CMyWidget::ProcessMyPrivate */
    m_myLocal    /* Address to local variable */

So if we look at the value located at the CMyWidget object pointer, we will find another pointer.  This is the pointer to the v-table.  Think of the v-table as just an array of pointers.   In an x86 process, each entry in the v-table is just 4 bytes that point to a method.  The next 4 bytes point to another method and so on.

So here we can deduce that if we have an object’s pointer, we can read its v-table.  If we can read its v-table, we can call any method on a C++ class!  Still this might not be very helpful as the compiler might assign private methods mixed in with the public methods.  COM solves this by using interfaces.  Interfaces don’t exist as a language feature in C++, but they do conceptually: they are abstract classes made up of pure virtual methods.  One might look like this:
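
To see the “array of pointers” idea without any compiler magic, here is a v-table built by hand.  Everything in it (Widget, WidgetMethod, g_widgetVTable, CallSlot) is a made-up name, and a real compiler’s layout is an implementation detail.  Treat it as a sketch of the concept rather than what MSVC or GCC actually emit.

```cpp
#include <cassert>

// A v-table built by hand: an array of function pointers, plus an object
// whose first member points at that array. All names are illustrative.
struct Widget;
typedef int (*WidgetMethod)(Widget*);

struct Widget
{
    const WidgetMethod* vtable;   // what the object pointer leads to first
    int m_myLocal;
};

int Widget_Process(Widget* self)  { self->m_myLocal = 5; return self->m_myLocal; }
int Widget_GetLocal(Widget* self) { return self->m_myLocal; }

// Index 0 -> Process, index 1 -> GetLocal.
const WidgetMethod g_widgetVTable[] = { &Widget_Process, &Widget_GetLocal };

// Calls a method knowing only the object pointer and a slot index.
int CallSlot(Widget* obj, int slot)
{
    assert(slot >= 0 && slot < 2);   // only two slots exist in this sketch
    return obj->vtable[slot](obj);
}
```

The caller never needs the names of the functions, only the agreed-upon slot order.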

struct IMyWidget
{
   virtual void Process() = 0;
};

Then we can modify our class to implement this interface:

class CMyWidget : public IMyWidget
{
public:
    void Process()
    {
       ProcessMyPrivate(5);
    }

private:
    void ProcessMyPrivate(int num)
    {
      this->m_myLocal = num;
    }

    int m_myLocal;
};

So how does this solve the issue of passing a class across DLL boundaries? Consider this in DLL A.

extern "C" IMyWidget* CreateWidget()
{
   return new CMyWidget();
}

The compiler will construct a CMyWidget object and, because the function returns an IMyWidget pointer, the caller sees a v-table with one entry, which will be the “Process” method defined in IMyWidget. This means a calling DLL, say DLL B, can simply do something like this:

IMyWidget* widget = CreateWidget();
widget->Process();

This works because when DLL B is compiled, the compiler reads the IMyWidget definition and knows the first (0 index) method is the “Process” method.  This is how COM gets around sharing object instances across DLLs, across compilers and across C runtime versions.  There are a couple implied things here though.  The first is a factory method is required to instantiate the object.  This is because the object needs to be created by the DLL that defines it as that DLL will use a specific version of the C runtime.  The other implication, not shown here, is you cannot simply call “delete widget” from DLL B.  This is because the DLLs may be using different C runtimes.  Instead a method must be added to the CMyWidget and interface that executes “delete this”.  That will ensure the correct version of the C runtime will delete the object.
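
Putting the factory and the “delete this” method together, a minimal sketch of the DLL A side could look like this.  Destroy and GetLocal are names I’m assuming for illustration (real COM uses Release, covered below), and here everything lives in one file rather than two DLLs.

```cpp
// Hypothetical sketch of the DLL A side. Destroy() is an assumed name for
// the "delete this" method described above; COM itself uses Release().
struct IMyWidget
{
    virtual void Process() = 0;
    virtual int  GetLocal() = 0;   // added only so the result can be observed
    virtual void Destroy() = 0;
protected:
    ~IMyWidget() {}                // callers cannot "delete widget" directly
};

class CMyWidget final : public IMyWidget
{
public:
    CMyWidget() : m_myLocal(0) {}
    void Process() override  { m_myLocal = 5; }
    int  GetLocal() override { return m_myLocal; }
    void Destroy() override  { delete this; }   // freed by the module that allocated it
private:
    int m_myLocal;
};

extern "C" IMyWidget* CreateWidget()
{
    return new CMyWidget();
}
```

DLL B would call CreateWidget(), use the interface, and finish with widget->Destroy().  Making the interface destructor protected is a design choice that turns an accidental “delete widget” in the caller into a compile error.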

You Seem to be Digressing.  Get Back to COM!

So now we should be familiar with the general problem of sharing native objects between DLLs and how COM attempts to fix it.   That’s all fine and dandy, but there needs to be some standardization as this simple pattern and compiler parlor tricks don’t make up a technology such as COM.  COM standardization all begins with the interface called IUnknown.  If a class implements IUnknown, it’s a COM object.  No ifs, ands or buts.   Here is what IUnknown is defined as:

interface IUnknown
{
   virtual HRESULT QueryInterface(REFIID riid, void** ppvObject) = 0;
   virtual ULONG AddRef(void) = 0;
   virtual ULONG Release(void) = 0;
};

 

Three functions.  That’s it.  As far as I’m concerned, this IS COM.  Though these methods are simple, they deserve explanation. 

AddRef/Release – Reference Counting

In native languages like C and C++ there is no garbage collector.  When you allocate on the heap, it stays there until “delete” is called on it.  This can create difficulties, especially when you consider object ownership.  The problem is if Class A has reference to Object-X, then gives Object-X to Class B, who “owns” Object-X?  More specifically, who deletes Object-X?  If Class B deletes it when a Class B object gets destroyed, Class A now has an invalid Object-X reference!  A common pattern for fixing this is known as reference counting.  So if we consider Object-X as a COM object that Class A had reference to, Object-X would have a reference count of 1.  If Class B got a reference to Object-X, it would then have a reference count of 2.  If Class B got destroyed, it would not delete Object-X, but instead it would decrement Object-X’s reference count to 1.  If Class A was destroyed, it would decrement Object-X’s reference count, which would then be 0.  If Object-X’s count reaches 0, then Object-X would delete itself (delete this).  The Achilles-Heel of reference counting is circular references, but there are patterns to help with this.

COM has what is known as intrusive reference counting.  This means that reference counting is built into the object itself, by way of IUnknown.  The AddRef method increments the reference count and Release decrements it.  When the count reaches 0, the object deletes itself.

The basic protocol with AddRef/Release on a COM object is this:

·         Any factory method that creates a COM object for you will return it with a reference count greater than 0; the count has already been incremented for you.

·         Any time a method returns reference to a COM object, the callee has already called AddRef.

·         When done with a reference to a COM object, call Release.

As a side note, if you use smart pointers like CComPtr, they will handle all this housekeeping for you and you’ll never leak or need to call AddRef/Release ever again.
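
The protocol above can be sketched in portable C++ without any Windows headers.  RefCounted, CreateRefCounted and the g_liveObjects counter are stand-ins I made up; a real COM class would implement IUnknown and would usually use InterlockedIncrement/InterlockedDecrement rather than std::atomic.

```cpp
#include <atomic>
#include <cassert>

// Portable stand-in for the AddRef/Release half of IUnknown.
// g_liveObjects exists only so the sketch can show when deletion happens.
static int g_liveObjects = 0;

class RefCounted
{
public:
    RefCounted() : m_refs(1) { ++g_liveObjects; }   // handed out with a count of 1

    unsigned long AddRef()
    {
        return ++m_refs;
    }

    unsigned long Release()
    {
        assert(m_refs.load() > 0);      // releasing more than was referenced is a bug
        unsigned long remaining = --m_refs;
        if (remaining == 0)
            delete this;                // the object frees itself
        return remaining;
    }

private:
    ~RefCounted() { --g_liveObjects; }  // private: only Release can destroy
    std::atomic<unsigned long> m_refs;
};

// Factory: the caller receives an object that is already AddRef'd.
RefCounted* CreateRefCounted()
{
    return new RefCounted();
}
```

Walking the Class A / Class B story through it: the factory gives A a count of 1, B calls AddRef (2), B releases (1), A releases (0) and the object deletes itself.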

All about IUnknown::QueryInterface

The QueryInterface method is at the very heart of COM.  It enables the “C” in the COM acronym.  As a component, a COM object can contain several different services (or interfaces).  QueryInterface provides access to these.

Consider this example:

IMyWidget* widget = (IMyWidget*)pComObj;

This would work in COM because the compiler knows to construct an IMyWidget pointer, set up with the correct v-table.  The problem is we may not know what interfaces pComObj supports at runtime.  We can’t be casting to various interfaces and hoping for the best when one isn’t supported.  Another issue is pComObj might not implement a specific interface, so it won’t be castable, but it might contain an object internally that does.

QueryInterface takes two parameters.  The first is a GUID.  Because C++ doesn’t have a rich typing system/reflection like Java or .NET, QueryInterface needs to know what specific interface is being requested.  The second is an out parameter that receives the requested interface pointer.  If QueryInterface was a .NET technology, it might look like this:

comObj.QueryInterface<IMyWidget>(out widget);

A typical QueryInterface implementation looks like this:

HRESULT QueryInterface(REFIID riid, LPVOID * ppvObj)
{
    // Always set out parameter to NULL, validating it first.
    if (!ppvObj)
        return E_INVALIDARG;
    *ppvObj = NULL;
    if (riid == IID_IUnknown)
    {
        // Increment the reference count and return the pointer.
        *ppvObj = (IUnknown*)this;
        AddRef();
        return NOERROR;
    }
    if (riid == IID_IMyWidget)
    {
        // Increment the reference count and return the pointer.
        *ppvObj = (IMyWidget*)this;
        AddRef();
        return NOERROR;
    }
    if (riid == IID_ISomeOtherInterface)
    {
        // Return a local reference to an internal service
        *ppvObj = m_myLocalISomeOtherInterface;
        m_myLocalISomeOtherInterface->AddRef();
        return NOERROR;
    }

    return E_NOINTERFACE;
}

Here you can see any interface can be queried safely at runtime and even internal local instances can be returned, making the COM object truly componentized.  That’s it!  Hopefully by now you see that COM is quite simple, and in its own way, elegant.
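
If you want to play with the idea without the Windows SDK, here is a portable sketch.  InterfaceId stands in for a GUID, the interfaces and CWidget are hypothetical, and reference counting is left out to keep it short, so this is the shape of QueryInterface, not the real COM API.

```cpp
// Portable sketch of the QueryInterface idea. InterfaceId stands in for a
// GUID and the interfaces are hypothetical; this is not the real COM API.
enum InterfaceId { ID_IWidget, ID_INamed, ID_IUnsupported };

struct IWidget { virtual int Process() = 0; };
struct INamed  { virtual const char* Name() = 0; };

class CWidget final : public IWidget, public INamed
{
public:
    int Process() override { return 5; }
    const char* Name() override { return "widget"; }

    // Returns true and fills *ppv when the interface is supported.
    bool QueryInterface(InterfaceId id, void** ppv)
    {
        if (!ppv)
            return false;
        *ppv = nullptr;                              // always clear the out parameter
        if (id == ID_IWidget)
            *ppv = static_cast<IWidget*>(this);
        else if (id == ID_INamed)
            *ppv = static_cast<INamed*>(this);       // cast adjusts the pointer
        return *ppv != nullptr;
    }
};
```

The casts inside QueryInterface matter: with more than one interface, each one lives at a different offset in the object, and only the object itself knows how to hand back the right pointer.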

There’s Gotta be More to COM!

There is more, but that’s where things do get complicated.  Technologies like threading apartments, registry (if gratuitous), COM+, DCOM.  Bah!   None of these technologies do you have to use in order to use COM.  These are the technologies I feel are obsolete and to an extent I think Microsoft feels the same way as almost all their new COM APIs do not use these things, but instead just use what they describe as “lightweight COM”, which is essentially what I have described here.

COMmon Misconceptions

“I need to register a COM object (regsvr32) to use it”

False.  Registering a COM object is analogous to adding a .NET assembly to the GAC.  It is not required for use, but if one wants their COM object globally accessible by just a ProgId or a CLSID (like a DirectShow filter or a WIC codec) registration is required.  When a COM object is registered, it simply adds ONE GUID (along with some other minor metadata) per COM object to the registry and the path to the DLL that the COM object resides in.

“COM is just over complicated and bloated”

False.  COM is just IUnknown, described above in this post.  The years and years of random technologies surrounding COM is what is complicated and bloated.  In the end, you choose how complicated your COM based project will be.

“I cannot use any parameter types I want in my COM methods.  I don’t want to be confined to COM automation types!”

True and False.  This is more of a C++ issue.  For instance if one parameter of your COM method was an std::string, one could have issues because the caller is using a different C runtime than the compiled DLL that contains the COM class.  That’s not to say it’s not possible, but for safety and compatibility you want to pick bare-bones types.

“COM Threading Apartments are bullshit”

True.  They are bullshit and you should not use them (IMO) in modern COM development.  As of this writing, it’s 2011.  We should not need things like single-threaded-apartment as we know how to properly synchronize our code to be thread safe.  It’s my advice you always make your COM objects free threaded and honestly, just bypass CoInitialize and CoCreateInstance where possible and just use factory methods to instantiate your COM objects.

 

 

 

A Critical Deep Dive into the WPF Rendering System

At first I didn’t think I’d publish this. I wanted to consider a bit of diplomacy and also thought I’d be beating a dead horse.  After being convinced by some people whose opinions I highly value, I decided to. Developers who are investing quite a bit into Microsoft’s UX platforms should know more about how the innards of the platform work, so that when they hit a brick wall, they can properly understand the issue and also communicate what they need changed in the platform more accurately.

I believe WPF and Silverlight are well made technologies, but…

If you’ve been following my Twitter stream the last few months, you might have noticed I’ve been taking what looks like some pretty cheap shots at WPF (and Silverlight for that matter) performance.  Why would I do that?  After all I have invested hundreds, and hundreds of hours of my own time over the years, evangelizing the platform, building libraries, community help, guidance etc.  I am by definition, personally invested. I want to see the platforms get better.

Performance, Performance, Performance

When developing an immersive, consumer based UX, performance is your number one feature.  It is the enabling feature that allows you to add all other features.  How many times have you had to scale back your UI because it was too jerky?  How many times have you come up with the “groundbreaking new UX model” that you had to scrap because the technology couldn’t handle it?  How many times have you told a customer they require a 2.4 GHz quad core to get the full experience? I’ve been asked by customers why they cannot deliver the same fluid UX they have on their iPad application using WPF or Silverlight on a PC with four times the horses.  This technology may be good enough for line-of-business applications, but it falls short of being able to deliver a next generation consumer application.

I thought WPF was hardware accelerated?  Tell me why you think it is inefficient.

WPF is hardware accelerated, and it is actually pretty neat how some parts of it work internally.  Unfortunately it doesn’t use the GPU nearly as efficiently as it could.  Its rendering system is very brute force.  I hope to explain that claim here.

Analyzing a single WPF rendering pass

For analyzing performance, we need to find out what WPF is really doing under the covers.  To do this, I use “PIX”, a Direct3D profiler that comes with the DirectX SDK.  PIX will launch your D3D based application and inject hooks into all Direct3D calls in order to analyze and monitor.

I’ve created a simple WPF application that contains two ellipses that animate left to right.  Each ellipse has the same fill color (#55F4F4F5) and a black stroke.  You can see the screenshot below:

clip_image001

How does WPF render this?

The first thing WPF will do is clear (#ff000000) out the dirty region that is going to redraw.  The purpose of dirty regions is to reduce the number of pixels sent to the output merger stage of the GPU pipeline.  We might even be able to guess that it can reduce the geometry that needs to be re-tessellated (more on that later).  At the clear of the dirty region our frame looks like this:

clip_image002

Next WPF does something I don’t understand.  It first fills up a vertex buffer and then draws what looks to be a quad over the dirty region.  So the frame now looks like this (exciting huh?):

clip_image003

Next it tessellates an ellipse on the CPU.  You may already know about tessellation, but essentially it’s turning our 100×100 ellipse geometry into a bunch of triangles.  This happens for two reasons: 1) Triangles are sort of the native rendering primitive of a GPU.  2) Tessellating an ellipse might only produce a couple hundred vertices, so it is MUCH faster than rasterizing 10,000 anti-aliased pixels on the CPU (what Silverlight does).  Below is a screenshot of what the tessellation looks like.  For those of you versed in 3D programming, you may have noticed this is a triangle strip.  Notice that the ellipse does look somewhat incomplete in the tessellation.  WPF next takes this tessellation and loads it into a vertex buffer for the GPU and issues yet another draw command using the pixel shader that is configured to use the “brush” configured in our Xaml.

clip_image004

Remember how I mentioned the ellipse looks incomplete?  Well it is.  WPF then generates what Direct3D programmers would know as a “line list”.  GPUs understand lines as well as triangles.  WPF fills in a vertex buffer with these lines…and you guessed it!  Issues another draw call.  Here is what the line list looks like:

clip_image005

So WPF is done drawing the ellipse now, right?  Nope!  You forgot about the stroked border!  The stroked border is also a line list.  This is sent to the vertex buffer for the GPU and another draw call is sent.  Here is what the border looks like.

clip_image006

By now we have drawn one ellipse, so our frame will look like this:

clip_image007

The process must be completed for each ellipse in the scene.  In this case two.

I don’t get it.  Why is this bad for performance?

The first thing you may have noticed is that it took three draw calls to render one ellipse.  Over those three draw calls, the same vertex buffer was used twice.  To explain the inefficiency, I need to explain a little on how GPUs work.  First, today’s GPUs process data VERY fast and run asynchronously with the CPU.  Also, there are costly user-mode to kernel-mode transitions that happen with certain operations.  When a vertex buffer is filled, it must be locked.  If the buffer is currently in use by the GPU, the lock forces the CPU and GPU to sync, which can cause a performance hit.  The vertex buffer is created with D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, but when it is locked (which happens quite a bit), D3DLOCK_DISCARD is not used, so that stall is a real possibility.  With lots of draw calls, we also have potentially a lot of kernel transitions and driver load.  The goal for good performance is to send as much work as possible to the GPU, or else your CPU will be busy and your GPU will be idle.  Also, do not forget that in this example, I’m only talking about 1 frame.  Typical WPF UI tries to execute at 60 frames every second!  If you’ve ever wondered what that high CPU usage on your render thread was from, you’ll find a lot (most?) is coming from your GPU driver.

What about Cached Composition?  That really helps with the performance!

No doubt it does.  Cached composition, aka BitmapCache, works by caching a visual to a GPU texture.  That means your CPU does not have to re-tessellate and your GPU does not have to re-rasterize.  On a rendering pass, WPF can just use the texture in video ram to render, therefore increasing performance.  Below is the BitmapCache of an ellipse.

clip_image008

WPF has a dark side to this though.  For every BitmapCache it comes across, it issues a single draw call.  Now to be fair, sometimes you do have to issue a single draw call to render a visual for some scenarios.  It’s the nature of the beast.  But let’s give a scenario where we have a <Canvas/> filled with 300 animated BitmapCached ellipses.  An advanced system would look ahead, determine it had 300 textures to render and that they are all z-ordered one after another.  It would then batch as many as possible; I believe DX9 can handle 16 sampler inputs at a time.  That would take 300 draw calls down to 19 in this scenario, saving quite a bit of CPU load.  In terms of 60 FPS, we take it from 18,000 draw calls a second to 1,140/s. In Direct3D 10, the number of sampler inputs is much higher.
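
The draw call math is easy to sanity check.  The two helper functions below are mine, and the 16-textures-per-batch figure is the assumption from the paragraph above rather than anything measured.

```cpp
#include <cassert>

// Back-of-the-envelope draw call math. The textures-per-batch value is an
// assumption (16 sampler inputs on DX9), not a measured number.
int BatchedDrawCalls(int visuals, int texturesPerBatch)
{
    assert(texturesPerBatch > 0);
    // Round up: a partially filled batch still costs one draw call.
    return (visuals + texturesPerBatch - 1) / texturesPerBatch;
}

int DrawCallsPerSecond(int drawCallsPerFrame, int framesPerSecond)
{
    return drawCallsPerFrame * framesPerSecond;
}
```

For 300 visuals, rounding up to whole draw calls gives 19 per frame and 1,140 per second at 60 FPS (the unrounded 18.75 per frame would give 1,125), against 18,000 per second unbatched.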

Ok, I read this far. Tell me about how WPF handles pixel shaders!

WPF has an extensible pixel shader API, along with some built-in effects.  This allows developers to really add some very unique effects to their UI.  In Direct3D when you apply a shader to an existing texture, it’s very typical to use an intermediate rendertarget…after all you can’t sample from a texture you are writing to!  WPF does this also, but unfortunately it will create a totally new texture EACH FRAME and destroy it when it’s done.  Creating and destroying GPU resources is one of the slowest things you can do on a per frame basis.  I wouldn’t even typically do this with system memory allocations of that size. There would be a considerable performance increase on the use of shaders if somehow these intermediate surfaces could be reused.  If you’ve ever wondered why you get noticeable CPU usage with these hardware accelerated shaders, this is why.

Well maybe this is how vector graphics need to be rendered on the GPU!

Microsoft has put considerable effort into fixing a lot of these issues.  Unfortunately it wasn’t focused on WPF.  The answer is Direct2D.   Consider this group of 9 stroked ellipses rendered by Direct2D:

clip_image009

Remember how many draw calls it took WPF to render a single ellipse with a stroke border?  And how many vertex buffer locks?  Direct2D did this in ONE draw call.  Here’s what the tessellation looks like:

clip_image010

Direct2D tries to draw as much as it can at once, maximizing GPU usage and minimizing unneeded CPU overhead. In “Insights: Direct2D Rendering” at the bottom of this page, Mark Lawrence explains in a good amount of detail how Direct2D works. You can also look deeper and see that even though Direct2D is very fast, there are even MORE areas where it can be improved in a v2. It’s also not too far-fetched to believe that a v2 of Direct2D would support the hardware tessellation features of DX11.

Looking at the Direct2D API, it wouldn’t be crazy to believe that a lot of the code was taken from WPF to create it.  If you watch this old Avalon video, Michael Wallent does talk about creating a native replacement for GDI from this technology. It has a similar geometry API and nomenclature.  Internally it does a lot of the same things, but it is very optimized and very modern.

What about Silverlight?

I would go into Silverlight, but it would be a bit redundant.  The performance of the Silverlight renderer is inefficient, but in different ways.  It rasterizes on the CPU (even shaders, which IIRC are written partly in assembler), and the CPU is at least 10x – 30x slower than the GPU at this. This leaves you with quite a bit less power to render your UI, and even less for your application logic.  Its hardware acceleration is very rudimentary and is almost exactly the same as WPF’s Cached Composition and behaves in the same manner, issuing a draw call for every BitmapCached visual.

Where do we go from here?

This is a common question I get from customers with performance problems with WPF or Silverlight. Unfortunately I don’t have an answer for a lot of them. Some who can roll their own framework for their specific needs. For others, I lend an ear, but they have to live with it as there are no rich alternatives to WPF or SL. I will say my customers that just build LOB apps generally don’t have many complaints and are just happy with the developer productivity. It’s the folks that want to build experiences (i.e., consumer apps or kiosk apps) that are in pain.

If you find any information in here incorrect, please notify me and I’ll get it changed.

Windows Phone 7 Sockets and File System Libraries

I never got a chance to write a blog post about these because my last blog got too painful to even use.  Here are two libraries that will only work on developer phones.  Do not expect to get applications written with them past the Marketplace approval.  They are only made for fun.

Sockets

This library I wrote is very similar to the System.Net sockets class.  I never built any code to host a socket, but others have taken it and added it to make things like a WP7 web server.

File System

This library is very similar to the System.IO classes to read a directory or a file via a file stream.  The phone is pretty locked down, so the only place I’ve been able to read is the \Windows folder.  There are several ways to get those files off the phone.  One ingenious solution given to me by a good friend (not sure he wants to be named or not) is to base64 the file and view it in the text viewer in the debugger.  The other way is to upload the file to a web server…or use the sockets library above.

Happy Hacking!

A Simple DirectCanvas Tutorial

I admit it’s probably still too early to be giving tutorials on how to use DirectCanvas.  I’m sure it’ll be outdated by the time the weekend is over!  But I do think it is far enough along to show how to fire up the library and render something.

The Pixel Shader Sample

In the DirectCanvas source code, I have a sample here, called the PixelShaderScene.  Below is an image of its output.

image

The Boiler Plate

“Oh nos!  Boiler plate?!  I thought this crap was supposed to be easy to use! I hate you.”
There’s no getting around it.  You have to write a few lines of code before you get started, but I promise it’s nothing much to worry about.

image

So far we’ve had to create our DirectCanvasFactory.  As it stands now, all resources created with a single instance of a DirectCanvasFactory can only be used with other resources that were created with the same DirectCanvasFactory.
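To make the "same factory" rule concrete, here is a rough C++ sketch of how such a restriction can be enforced.  Factory and Layer are hypothetical stand-ins I invented for this illustration; they are not the DirectCanvas types, and I am only assuming the real library checks something similar.

```cpp
#include <stdexcept>

class Factory;  // hypothetical stand-in, not DirectCanvasFactory

// A resource that remembers which factory created it.
class Layer {
public:
    explicit Layer(const Factory* owner) : owner_(owner) {}

    // Drawing one layer into another is only allowed when both
    // were created by the same factory instance.
    void DrawLayer(const Layer& source) {
        if (source.owner_ != owner_) {
            throw std::invalid_argument("layers come from different factories");
        }
        ++drawCount_;
    }

    int drawCount() const { return drawCount_; }

private:
    const Factory* owner_;
    int drawCount_ = 0;
};

class Factory {
public:
    // Every layer is stamped with the factory that made it.
    Layer CreateLayer() const { return Layer(this); }
};
```

The practical takeaway is the same as in the text: create one factory up front and make every layer, presenter and effect from it.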

We’ve also created a WindowsFormsPresenter.  This is a special DrawingLayer made to render content to various technologies.  In this sample, it happens to be Windows Forms.  If we were to render to WPF, it would look like this:

image

When we are done drawing and want to present the rendering to the technology the presenter is tied to, simply run this:

image

Now what?  I want to draw something!

So now on to the juicy stuff!  This is where this PixelShaderSource.cs code comes into play. If you look at the constructor here, you will notice it takes a DrawingLayer as a parameter.

image

Both presenters we saw earlier inherit from DrawingLayer, so we had no problem passing ours to the PixelShaderScene class.  If you read the second line of code, you may be wondering, “What is this InitializeResources method?”

The InitializeResources method sets up the resources we wish to use for rendering.  In this case they are all meant to exist for the life of the application.  Here we are actually creating two drawing layers.  One is going to be used as a place to render our initial drawing.  The other one is going to hold an image we load from the file system.

image

One other important thing to notice in this code snippet is the RippleEffect.  This class uses the extensible shader API in DirectCanvas.  The original code actually came from the CodePlex project and only needed slight modification when ported over (which took only about 5 minutes in total).

Cool, we created ‘resources’.  But you never showed me how to draw!

Truth!

In the same class, you will find a method called Render.  This method is called from a simple timer in the main application.  How that method gets executed isn’t important, so I won’t say anything more about it!

You will notice that we have a Begin/EndDraw pattern on our DrawingLayer.  This not only mirrors the underlying graphics libraries, but is also a necessary evil for a high performance API.  This is because a BeginDraw call will set up any state or resources needed for drawing.  An EndDraw will typically flush any drawing commands, doing its best to batch the work to the GPU.  The DrawingLayer also has BeginCompose/EndCompose methods, but I’ll save that for another tutorial.

image
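If the Begin/EndDraw idea is new to you, the sketch below shows the general shape of it in C++.  BatchingLayer and its members are made-up names for illustration and say nothing about how DirectCanvas is implemented internally; the point is only that draw calls are queued between the two calls and handed off in one go at the end.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layer that records commands and submits them as one batch.
class BatchingLayer {
public:
    void BeginDraw() {
        if (drawing_) throw std::logic_error("BeginDraw called twice");
        drawing_ = true;  // a real API would set up device state here
    }

    void Draw(const std::string& command) {
        if (!drawing_) throw std::logic_error("Draw outside Begin/EndDraw");
        pending_.push_back(command);  // queued, not yet submitted
    }

    void EndDraw() {
        if (!drawing_) throw std::logic_error("EndDraw without BeginDraw");
        if (!pending_.empty()) {
            // One submission for everything queued since BeginDraw.
            submitted_.insert(submitted_.end(), pending_.begin(), pending_.end());
            pending_.clear();
            ++flushCount_;
        }
        drawing_ = false;
    }

    std::size_t flushCount() const { return flushCount_; }
    std::size_t submittedCount() const { return submitted_.size(); }

private:
    bool drawing_ = false;
    std::vector<std::string> pending_;
    std::vector<std::string> submitted_;
    std::size_t flushCount_ = 0;
};
```

Three Draw calls between one BeginDraw and one EndDraw end up as a single flush, which is exactly the kind of batching that keeps expensive trips to the GPU to a minimum.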

The DrawLayer method that is called between the Begin/EndDraw draws its contents into the given target rectangle area.  This ensures our image will always fill the m_tempLayer entirely, even warping it if it has to.
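For a feel of what "fill the target rectangle, warping if needed" means, here is a tiny CPU-side sketch in C++.  Image and StretchToFill are hypothetical names, and I use nearest-neighbour sampling to keep it short; the GPU would normally apply nicer filtering, so treat this as an illustration of the mapping only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical image type: row-major, one int per pixel.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<int> pixels;
};

// Stretch a non-empty `src` so it exactly fills a width x height target.
// Each destination pixel looks up the proportionally matching source
// pixel, so the aspect ratio is not preserved.
Image StretchToFill(const Image& src, std::size_t width, std::size_t height) {
    Image dst{width, height, std::vector<int>(width * height)};
    for (std::size_t y = 0; y < height; ++y) {
        std::size_t sy = y * src.height / height;
        for (std::size_t x = 0; x < width; ++x) {
            std::size_t sx = x * src.width / width;
            dst.pixels[y * width + x] = src.pixels[sy * src.width + sx];
        }
    }
    return dst;
}
```

Stretching a 2x1 image into a 4x2 target simply repeats each source pixel, which is the "warping" you see when the source and target shapes differ.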

Also notice the ApplyEffect method and how you must specify an output layer.  You always need an output layer!  This is a restriction of how the GPU works.  Effects basically read from a source and write the result to a separate destination.  In this case the output is our m_mainLayer, which also happens to be the presenter we initialized in our boiler plate.  That means a Present just needs to be called…and BAP!  GPU Ripples!
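Why the separate output?  Any effect that samples neighbouring pixels needs the source to stay untouched while the result is being written.  The C++ sketch below uses a trivial "shift by one pixel" as a stand-in for a real shader; ApplyShift is a hypothetical name and has nothing to do with the actual RippleEffect code.

```cpp
#include <cstddef>
#include <vector>

// Shift every pixel one step to the right, wrapping at the edge.
// Reads only from `source` and writes only to `output`.
void ApplyShift(const std::vector<int>& source, std::vector<int>& output) {
    output.resize(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        std::size_t from = (i + source.size() - 1) % source.size();
        output[i] = source[from];  // the source is never modified
    }
}
```

With distinct buffers, {1, 2, 3, 4} becomes {4, 1, 2, 3}.  Pass the same vector as both arguments and every element ends up as 4, because later pixels read values that were already overwritten.  That is the CPU version of the problem the output layer avoids.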

-Jer
