A Critical Deep Dive into the WPF Rendering System

At first I didn’t think I’d publish this. I wanted to consider a bit of diplomacy and also thought I was beating a dead horse. After being convinced by some people whose opinions I highly value, I decided to. Developers who are investing quite a bit in Microsoft’s UX platforms should know more about how the innards of the platform work, so that when they hit a brick wall they can properly understand the issue and communicate more accurately what they need changed in the platform.

I believe WPF and Silverlight are well made technologies, but…

If you’ve been following my Twitter stream the last few months, you might have noticed I’ve been taking what looks like some pretty cheap shots at WPF (and Silverlight for that matter) performance. Why would I do that? After all, I have invested hundreds and hundreds of hours of my own time over the years evangelizing the platform, building libraries, helping the community, writing guidance, etc. I am, by definition, personally invested. I want to see the platforms get better.

Performance, Performance, Performance

When developing an immersive, consumer-based UX, performance is your number one feature. It is the enabling feature that allows you to add all other features. How many times have you had to scale back your UI because it was too jerky? How many times have you come up with the “groundbreaking new UX model” that you had to scrap because the technology couldn’t handle it? How many times have you told a customer they require a 2.4 GHz quad core to get the full experience? I’ve been asked by customers why they cannot deliver the same fluid UX they have on their iPad application using WPF or Silverlight on a PC with four times the horses. This technology may be good enough for line-of-business applications, but it falls short of being able to deliver a next generation consumer application.

I thought WPF was hardware accelerated?  Tell me why you think it is inefficient.

WPF is hardware accelerated, and it is actually pretty neat how some parts of it work internally. Unfortunately, it doesn’t use the GPU nearly as efficiently as it could. Its rendering system is very brute force. I hope to explain that claim here.

Analyzing a single WPF rendering pass

For analyzing performance, we need to find out what WPF is really doing under the covers.  To do this, I use “PIX”, a Direct3D profiler that comes with the DirectX SDK.  PIX will launch your D3D based application and inject hooks into all Direct3D calls in order to analyze and monitor.

I’ve created a simple WPF application that contains two ellipses that animate left to right.  Each ellipse has the same fill color (#55F4F4F5) and a black stroke.  You can see the screenshot below:

clip_image001

How does WPF render this?

The first thing WPF will do is clear (to #ff000000) the dirty region that is going to be redrawn. The purpose of dirty regions is to reduce the number of pixels sent to the output merger stage of the GPU pipeline. We might even guess that they reduce the amount of geometry that needs to be re-tessellated (more on that later). After the clear of the dirty region, our frame looks like this:

clip_image002
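
To make the dirty-region idea concrete, here is a minimal Python sketch. It assumes a dirty region is simply the bounding rectangle covering an element’s old and new positions. The `Rect` type, the `dirty_region` helper, and the 800×600 window size are hypothetical, chosen for illustration; they are not WPF APIs, and WPF’s real bookkeeping is certainly more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def union(self, other: "Rect") -> "Rect":
        """Smallest rectangle that covers both self and other."""
        left = min(self.x, other.x)
        top = min(self.y, other.y)
        right = max(self.x + self.width, other.x + other.width)
        bottom = max(self.y + self.height, other.y + other.height)
        return Rect(left, top, right - left, bottom - top)

    def area(self) -> float:
        return self.width * self.height


def dirty_region(old_bounds: Rect, new_bounds: Rect) -> Rect:
    """Area to clear and redraw when an element moves (illustrative only)."""
    return old_bounds.union(new_bounds)


# A 100x100 ellipse that moves 10 pixels to the right in an assumed 800x600 window.
window = Rect(0, 0, 800, 600)
dirty = dirty_region(Rect(50, 50, 100, 100), Rect(60, 50, 100, 100))
print(dirty)
print(dirty.area() / window.area())  # fraction of the window that gets redrawn
```

In this model only a small fraction of the window is touched per frame, which is the whole point of tracking dirty regions.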

Next WPF does something I don’t understand. It first fills a vertex buffer and then draws what looks to be a quad over the dirty region. So the frame now looks like this (exciting, huh?):

clip_image003

Next it tessellates an ellipse on the CPU. You may already know about tessellation, but essentially it’s turning our 100×100 ellipse geometry into a bunch of triangles. There are two reasons for this: 1) Triangles are the native rendering primitive of a GPU. 2) The tessellation of an ellipse might be only a couple hundred vertices, so it is MUCH faster than rasterizing 10,000 anti-aliased pixels on the CPU (which is what Silverlight does). Below is a screenshot of what the tessellation looks like. Those of you versed in 3D programming may have noticed this is a triangle strip. Notice that the ellipse does look somewhat incomplete in the tessellation. WPF next takes this tessellation, loads it into a vertex buffer for the GPU, and issues yet another draw command using the pixel shader that is configured to use the “brush” specified in our Xaml.

clip_image004
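
For readers who have not seen tessellation before, here is a rough Python sketch of the idea. It is a generic illustration, not WPF’s tessellator: it builds a simple fan of triangles around the center rather than the triangle strip seen in PIX, and the function names and the choice of 64 segments are assumptions.

```python
import math


def tessellate_ellipse(cx, cy, rx, ry, segments=64):
    """Approximate a filled ellipse with a fan of triangles around its center.

    Returns a list of triangles, each a tuple of three (x, y) vertices.
    This is a generic illustration, not WPF's actual tessellator.
    """
    rim = [
        (cx + rx * math.cos(2 * math.pi * i / segments),
         cy + ry * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    center = (cx, cy)
    return [(center, rim[i], rim[(i + 1) % segments]) for i in range(segments)]


def triangle_area(a, b, c):
    """Unsigned area of a triangle from the 2D cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


# The 100x100 ellipse from the example: radii of 50 in both directions.
triangles = tessellate_ellipse(50, 50, 50, 50, segments=64)
covered = sum(triangle_area(*t) for t in triangles)
print(len(triangles))                  # number of triangles handed to the GPU
print(covered / (math.pi * 50 * 50))   # fraction of the true area covered
```

The triangles fall slightly short of the true curve, which is one plausible reason a renderer would follow up with a separate anti-aliased edge pass.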

Remember how I mentioned the ellipse looks incomplete? Well, it is. WPF then generates what Direct3D programmers would know as a “line list”. GPUs understand lines as well as triangles. WPF fills a vertex buffer with these lines…and you guessed it! It issues another draw call. Here is what the line list looks like:

clip_image005

So WPF is done drawing the ellipse now, right?  Nope!  You forgot about the stroked border!  The stroked border is also a line list.  This is sent to the vertex buffer for the GPU and another draw call is sent.  Here is what the border looks like.

clip_image006

By now we have drawn one ellipse, so our frame will look like this:

clip_image007

This process must be repeated for each ellipse in the scene; in this case, two.

I don’t get it.  Why is this bad for performance?

The first thing you may have noticed is that it took three draw calls to render one ellipse. Over those three draw calls, the same vertex buffer was used twice. To explain the inefficiency, I need to explain a little about how GPUs work. First, today’s GPUs process data VERY fast and run asynchronously with the CPU. Also, there are costly user-mode to kernel-mode transitions that happen with certain operations. When a vertex buffer is filled, it must be locked. If the buffer is currently in use by the GPU, this causes the GPU to sync with the CPU, which can cause a performance hit. The vertex buffer is created with D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, but when it is locked (which happens quite a bit), D3DLOCK_DISCARD is not used. This could cause a stall (a sync of the CPU and GPU) if the buffer is in use by the GPU. With lots of draw calls, we possibly have a lot of kernel transitions and driver load. The goal for good performance is to send as much work as possible to the GPU in each call, or else your CPU will be busy and your GPU will be idle. Also, do not forget that in this example I’m only talking about one frame. A typical WPF UI tries to run at 60 frames every second! If you’ve ever wondered what that high CPU usage on your render thread was from, you’ll find a lot (most?) is coming from your GPU driver.
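
A toy model makes the cost of locking without a discard flag easier to see. This Python sketch is not Direct3D code: the `count_stalls` function is hypothetical, and it assumes the worst case, where every draw locks the same buffer and the GPU is still reading it each time the CPU asks for it again.

```python
def count_stalls(locks_per_frame, use_discard):
    """Count CPU/GPU syncs for one frame in a deliberately simplified model.

    Assumption: every draw call leaves the GPU still reading the buffer when
    the CPU asks to lock it again. With a discard-style lock the driver hands
    back fresh memory, so the CPU never waits. Without it, the CPU must wait
    for the GPU to finish with the old contents.
    """
    stalls = 0
    gpu_reading = False  # is the GPU still using the current contents?
    for _ in range(locks_per_frame):
        if gpu_reading and not use_discard:
            stalls += 1  # CPU blocks until the GPU is done
        # CPU fills the buffer and issues a draw; the GPU starts reading it.
        gpu_reading = True
    return stalls


# Three draw calls per ellipse, two ellipses, each assumed to lock the same buffer.
print(count_stalls(6, use_discard=False))
print(count_stalls(6, use_discard=True))
```

Real drivers are smarter than this model, so treat the numbers as an upper bound on what could go wrong rather than a measurement.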

What about Cached Composition?  That really helps with the performance!

No doubt it does. Cached composition, aka BitmapCache, works by caching a visual to a GPU texture. That means your CPU does not have to re-tessellate and your GPU does not have to re-rasterize. On a rendering pass, WPF can just use the texture in video RAM to render, therefore increasing performance. Below is the BitmapCache of an ellipse.

clip_image008

WPF has a dark side to this though. For every BitmapCache it comes across, it issues a single draw call. Now to be fair, in some scenarios you do have to issue a single draw call to render a visual. It’s the nature of the beast. But let’s take a scenario where we have a <Canvas/> filled with 300 animated BitmapCached ellipses. An advanced system would look ahead, determine it had 300 textures to render and that they are all z-ordered one after another. It would then batch as many as possible; I believe DX9 allows 16 sampler inputs at a time. That would take us from 300 draw calls down to 19 in this scenario, saving quite a bit of CPU load. At 60 FPS, that takes us from 18,000 draw calls a second to about 1,140. In Direct3D 10, the number of sampler inputs is much higher.
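
Rounding up to whole draw calls, the arithmetic works out as in the short Python sketch below. The `draw_calls` helper is a hypothetical name, and the limit of 16 textures per draw is the assumption stated above, not a verified figure.

```python
import math


def draw_calls(visuals, textures_per_draw=1):
    """Draw calls needed when each call can sample up to textures_per_draw textures."""
    return math.ceil(visuals / textures_per_draw)


FPS = 60
unbatched = draw_calls(300)                      # one draw per cached visual
batched = draw_calls(300, textures_per_draw=16)  # assumed 16 samplers per draw

print(unbatched, unbatched * FPS)  # 300 draws per frame, 18000 per second
print(batched, batched * FPS)      # 19 draws per frame, 1140 per second
```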

Ok, I read this far. Tell me about how WPF handles pixel shaders!

WPF has an extensible pixel shader API, along with some built-in effects. This allows developers to add some really unique effects to their UI. In Direct3D, when you apply a shader to an existing texture, it’s very typical to use an intermediate rendertarget…after all, you can’t sample from a texture you are writing to! WPF does this also, but unfortunately it will create a totally new texture EACH FRAME and destroy it when it’s done. Creating and destroying GPU resources is one of the slowest things you can do on a per frame basis. I wouldn’t even typically do this with system memory allocations of that size. There would be a considerable performance increase in the use of shaders if these intermediate surfaces could somehow be reused. If you’ve ever wondered why you get noticeable CPU usage with these hardware accelerated shaders, this is why.
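
Reusing intermediate surfaces could look something like the following Python sketch. It is only an illustration of pooling, under the assumption that a surface of the same size can be reused from frame to frame. The `RenderTargetPool` class and its methods are hypothetical, and plain dictionaries stand in for GPU textures.

```python
class RenderTargetPool:
    """Toy pool that reuses intermediate surfaces of a matching size.

    'Surfaces' here are plain Python objects standing in for GPU textures;
    the class and its method names are hypothetical.
    """

    def __init__(self):
        self._free = {}       # (width, height) -> list of idle surfaces
        self.allocations = 0  # how many surfaces were actually created

    def acquire(self, width, height):
        idle = self._free.get((width, height))
        if idle:
            return idle.pop()
        self.allocations += 1
        return {"size": (width, height)}  # stand-in for creating a texture

    def release(self, surface):
        self._free.setdefault(surface["size"], []).append(surface)


pool = RenderTargetPool()
for frame in range(600):             # ten seconds at 60 frames per second
    target = pool.acquire(256, 256)  # intermediate target for the effect
    # ... render the visual into target, run the shader, composite ...
    pool.release(target)

print(pool.allocations)  # 1, versus 600 when creating a new surface each frame
```

A real pool would also need to trim idle surfaces to cap video memory, which this sketch leaves out.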

Well maybe this is how vector graphics need to be rendered on the GPU!

Microsoft has put considerable effort into fixing a lot of these issues; unfortunately, that effort wasn’t focused on WPF. The answer is Direct2D. Consider this group of 9 stroked ellipses rendered by Direct2D:

clip_image009

Remember how many draw calls it took WPF to render a single ellipse with a stroke border?  And how many vertex buffer locks?  Direct2D did this in ONE draw call.  Here’s what the tessellation looks like:

clip_image010

Direct2D tries to draw as much as it can at once, maximizing GPU usage and minimizing unneeded CPU overhead. In “Insights: Direct2D Rendering” at the bottom of this page, Mark Lawrence explains in a good amount of detail how Direct2D works. You can also look deeper and see that even though Direct2D is very fast, there are even MORE areas where it could be improved in a v2. It’s also not too far-fetched to believe that a v2 of Direct2D would support the hardware tessellation features of DX11.
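
The general batching technique can be sketched in a few lines of Python. This is not a description of Direct2D’s internals. It only shows how several small meshes can be merged into one vertex buffer and one index buffer so that a single indexed draw call covers them, assuming they all share the same brush and render state. The `batch_meshes` helper is a hypothetical name.

```python
def batch_meshes(meshes):
    """Merge several (vertices, indices) meshes into one vertex/index buffer.

    Each mesh's indices are shifted by the number of vertices already in the
    combined buffer, so one indexed draw call can render all of them.
    """
    vertices, indices = [], []
    for mesh_vertices, mesh_indices in meshes:
        base = len(vertices)
        vertices.extend(mesh_vertices)
        indices.extend(base + i for i in mesh_indices)
    return vertices, indices


# Two triangles standing in for two tessellated shapes.
shape_a = ([(0, 0), (1, 0), (0, 1)], [0, 1, 2])
shape_b = ([(5, 5), (6, 5), (5, 6)], [0, 1, 2])

vertices, indices = batch_meshes([shape_a, shape_b])
print(len(vertices))  # 6
print(indices)        # [0, 1, 2, 3, 4, 5]
```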

Looking at the Direct2D API, it wouldn’t be crazy to believe that a lot of the code was taken from WPF to create it. If you watch this old Avalon video, Michael Wallent does talk about creating a native replacement for GDI from this technology. It has a similar geometry API and nomenclature. Internally it does a lot of the same things, but it is very optimized and very modern.

What about Silverlight?

I would go into Silverlight, but it would be a bit redundant. The Silverlight renderer is also inefficient, but in different ways. It rasterizes on the CPU (even shaders, which IIRC are written partly in assembler), and the CPU is at least 10x – 30x slower than the GPU at this kind of work. This leaves you with quite a bit less power to render your UI, and even less for your application logic. Its hardware acceleration is very rudimentary, is almost exactly the same as WPF’s Cached Composition, and behaves in the same manner, issuing a draw call for every BitmapCached visual.

Where do we go from here?

This is a common question I get from customers with performance problems with WPF or Silverlight. Unfortunately, I don’t have an answer for a lot of them. Some who can, roll their own framework for their specific needs. To the others I lend an ear, but they have to live with it, as there are no rich alternatives to WPF or SL. I will say that my customers who just build LOB apps generally don’t have many complaints and are just happy with the developer productivity. It’s the folks who want to build experiences (i.e., consumer apps or kiosk apps) that are in pain.

If you find any information in here incorrect, please notify me and I’ll get it changed.

56 comments

  1. Fahad

    Very clear and detailed explanation. I was also thinking in terms of Direct2D and was about to mail Rob Rylea about the ways in which WPF rendering could be changed. Now I’m just gonna forward this blog post :), and request him to check on WPF vNext. It is critical that the performance issues in WPF get resolved. Everyone would sleep more! Thanks again.

    -Fahad

  2. Tom

    Thank you for sharing the details of your observations. It’s interesting stuff and I don’t doubt that there’s huge room for improvement. Unfortunately, I’m afraid the WPF team probably has (or should have) bigger concerns than performance issues of this sort. For example, correctness of behavior on a wide variety of systems, mitigation of complexity, friendliness towards users (developers, end-users, underlying APIs, the OS, etc), not being buried by the opposing factions within MS, and so on.

    Correctness alone is huge. You mentioned a way that cached composition could be more sophisticated, but the current implementation doesn’t even work correctly in some cases that come up easily in a real-world app. That kind of thing is far more important to fix than performance of drawing ellipses IMO.

    It seems pretty easy to look at a real-world system that actually shipped and pick apart performance for specific scenarios. I think it’s only fair to also consider those attempts to make things better that failed or weren’t attempted for all kinds of reasons, such as undesirable interactions with other internal systems, driver incompatibilities and bugs, concerns about maintainability or future expansion, lack of manpower, political problems, etc. There’s also the considerable work needed to create something entirely new that has to offer the same capabilities (and much more) that other technologies evolved for years. Something that has to cover all those edge cases. (Not that WPF does, but it goes a long way.) And then there’s ever increasing difficulty and risk associated with making changes to the system as time goes on.

    Maybe such thoughts aren’t relevant to the current discussion, in which case I apologize. I just think that, all things considered, they actually did a pretty decent job with rendering performance if one excludes WPF3D and remembers the focus of the technology: application UI. I’m seeing very smooth scrolling and animation in my current WPF4 app even on a cheap Atom-based netbook with 1GB of RAM and Intel integrated graphics. Perhaps I spent too much time with Silverlight and my ambitions are too modest. I’m not drawing enough ellipses yet. :) I’m curious to know what sorts of real-world WPF app scenarios have found developers unable to achieve the performance they desire. How does building “experiences” and “consumer applications” translate to demand on the rendering system, specifically? These terms seem a bit nebulous and perhaps worthy of some clarification.

    In my opinion, the biggest and most noticeable performance problem with WPF in a real-world app is not rendering but cold startup time. It’s painful on systems with slow disks. I hate needing a splash screen for my app and making people wait as the disk is molested for as long as 20-30 seconds. I still need to see what kinds of tricky things I might be able to do to cut this down, but I don’t think it should be this bad out of the box.

    Anyway, I’d love to see improvements in rendering performance, but I suspect there are probably many other things that the WPF team should address first. These are just my thoughts as an outsider… someone trying to make WPF work in a real-world app of substantial complexity.

    P.S. I’ve stepped through WPF source code in the debugger in the past. It was made available to the public some time ago. Were the parts relevant to your examination of rendering behavior excluded? I was just thinking it might be easier than using PIX. Then again, maybe not… :)

  3. Christian Jetter

    Great article!
    I totally agree with you on the “performance, performance, performance” point and I love your detailed analysis.

    My problem is that I absolutely love WPF but currently WPF’s performance and airspace issues prohibit so many really revolutionary designs. I am developing most of my stuff for Surface and WPF is simply the best platform for multi-touch and tangible UIs, but it is still too sluggish and still has all these issues with airspace and interoperability (e.g. WebBrowser control). This is definitely on top of my wishlist for vNext!

    I think Tom completely missed your point when he reduced the problem of inefficient tessellation to ellipses. I strongly assume that all kinds of hardware rendered UI elements of WPF have to be tessellated (except textures of course). Therefore optimizing this process is absolutely essential. For example, text rendering is certainly something that all UIs have to struggle with…

    For me, it would be very helpful, if you could comment on what you believe happens during WPF’s text rendering. What I experience here are extremely slow frame rates and lags as soon as we have a lot of text on the screen. With regard to what I learned from your article, I guess that the tessellation of fonts can become extremely costly and is some kind of a worst-case scenario for WPF performance. Do you agree? Or is text rendering a different story?

    Christian

    • Pete Brown

      @Christian

      Airspace is a big deal for us as well. I’m not sure if you caught the PDC announcements and presentations, but we’re taking a big stick to the airspace issues with WPF vnext. The web browser scenario you mentioned is a big one we intend to fix.

      Pete

  4. Mike Carlton

    A couple of years ago I built a type of flow chart application using WPF. At that time WPF had serious issues with poor rendering of fonts and issues with the performance of dependency properties.

    I came to the conclusion that it was ahead of its time, i.e., it really needs a 250dpi monitor technology and a lot of CPU power to overcome the problems with dependency properties.

    Silverlight seems to be dying now anyway. It is poison to the average consumer website to demand that visitors download a plugin, so SL is largely hidden behind firewalls.

  5. Ben

    ” I’ve been asked by customers why they cannot deliver the same fluid UX they have on their iPad application using WPF or Silverlight on a PC with four times the horses.”

    Thank you. I’m developing a touch screen WPF application right now, and while I love the API and the productivity it affords, my (relatively simple) application’s UI just feels pokey. I get the same question from other developers and management: “why do phone applications running on dinky ARM processors feel so much smoother?”

    Then I will see some cool HTML5 canvas example posted on Reddit, check the CPU loading it causes, and think there’s no way I could do that in WPF without 2x the loading. Even jquery widgets seem to behave more responsively than my WPF app.

  6. JohnW

    Could you mention which version of .NET you tried this on? When Microsoft implemented WPF for the IDE in VS2010, performance did get better.

  7. Cory Plotts

    Great article Jer! I, too, am frustrated by hitting the WPF perf wall … time after time. This shouldn’t be the case … WPF is a beautiful framework and perf is the only thing holding it back.

    Let’s break it free! It wants to fly!

  8. Pingback: A Critical Deep Dive into the WPF Rendering System | Switch on the Code
  9. Pingback: Thoughts on WPF / Silverlight « Code972
  10. Trackback: Microsoft Weblogs
  11. Pingback: Top Posts — WordPress.com
  12. Tom

    @Christian – my point about not enough ellipses was sarcastic. Sorry this was not obvious. Of course other graphics primitives are going to require tessellation. But I think it’s a big mistake to take an analysis of one case (such as an ellipse) and assume that other things (such as text) behave the same way. It’s also a big mistake to analyze a single operation (i.e. single loop iteration) when what’s important in the real world is the net result of a number of operations (i.e. 100 or 1000 or 100000 iterations). When trying to understand the truth of performance, experience says it’s important to be specific, make minimal assumptions, and do a proper investigation with careful analysis of the behavior and data observed.

    Imagine you’re a programmer on the WPF team. You’ve heard from a customer or two that text looks blurry. (Sarcasm again, ok?) You’ve also heard that it’s too slow. So after several years of painstaking effort, you finally introduce new text drawing code that solves all kinds of problems. Fast forward to the present. Some customers come along and assume that drawing text works the same way as drawing ellipses. Well maybe it does, maybe it used to and doesn’t anymore, maybe it never did! Wouldn’t you be kind of irritated to see people jumping to such conclusions?

    I have a real WPF app with plenty of text in view and, as I said before, its rendering performance is excellent even on a very low end machine. If WPF was really as bad at rendering as this post and the comments make it sound, I just don’t know how this could be possible.

    I’d like to see more concrete evidence and less jumping to conclusions, that’s all. It’s a bit too convenient to take one person’s detailed analysis and say “aha, that must be the reason behind x” where x is something different.

    • Christian Jetter

      @Tom

      Sorry for getting the sarcastic bit wrong. I also totally agree with you that jumping to conclusions would be the wrong thing to do now.

      However, in my opinion, Jer’s approach appears very sound and solid and delivers true insight. From my experience, to understand a problem, it is sometimes inevitable to look at a single case for reasoning about the whole (talking about debugging). Of course a large-scale analysis would be helpful now, but then again: is that our job as WPF users, and shouldn’t MS be doing such large-scale surveys? I mean, it shows true dedication to a platform if a user like Jer tries to look into the inner workings and details to find out what’s going wrong. I think this deserves our and MS’s support. It is not as if Jer dully dismisses WPF as “too slow”.

      Everyone of us who has faced WPF’s performance issues has most likely tried out a whole bunch of different approaches to find out what’s going wrong. To some extent the MS performance and profiling tools do their job, e.g. if it’s about beginners’ mistakes like CPU-rendered drop shadows or large visual trees or far too many images at a time. MS also offers a lot of blog posts and the WPF team gave a lot of talks and tutorials with practical advice.

      However, after initial progress with all that you hit what @Cory called the “WPF perf wall” above. You know what you are doing is quite costly (e.g. in my case it’s Zoomable User Interfaces with lots of text and controls at different scales on one screen), but still frame rates are far too slow considering that WPF is hardware accelerated and other platforms achieve similar user interfaces with higher performance only on the CPU. It’s also kind of confusing if your pixel shader leads to 50% CPU load. :-) If you then test your code on high-end hardware with a NVIDIA QuadroPlex and a Core i7 and you do not see any performance improvement, you start thinking about a bottleneck which lies below the application layer. And this is where the trouble with WPF starts: existing profiling and performance tools are too high-level to understand what’s going wrong and MS does not really give much insight into what’s happening below the application layer. So I think it’s very important that Jer highlights selected problems to create the necessary awareness.

      It’s great for you that you have never experienced any performance problems in your applications. I am daring to say that this is most likely due to the fact that you are dealing with a rather conventional GUI. However, WPF is MS’s platform for the whole Surface and NUI eco-system where things have to look and feel very different from traditional GUIs, e.g. zoomable and rotatable user interfaces that feel much more like a game than a traditional dialog-driven GUI. Regarding this future scenario of use for WPF – which I personally find much more convincing than the initial belief that WPF will replace Win32 or WinForms – performance is absolutely mission critical.

      Also, I have not claimed that text rendering behaves in the same way as the ellipses from Jer’s example. I meant that from what I know this could be the case and if that should be the case the impact on performance would be grave. Therefore I have asked Jer for his opinion on this. I have not drawn the conclusion that my performance problems have the same origin, but I really would like to know if he believes that this could be the case. If text rendering happens without tessellation in textures this is not the case of course.

      • Tom

        Christian – it sounds like you’re thinking about this reasonably. My frustration is with the “WPF is a big slow inefficient piece of crap” bandwagon that people seem to be jumping onto here. A few responses to things in your post:

        Profiling tools should be good enough to get a sense of where the bottleneck is. If you know you’re burning a lot of CPU, find out whether it’s managed code or native code. Some good free options are SlimTune (for managed) and AMD CodeAnalyst (for native). You can easily find out if the display driver is the bottleneck, for example, which would suggest the kinds of problems Jer is talking about here – resource abuse, too many draw calls, etc. PIX can also be useful but it’s relatively difficult to work with.

        I didn’t say I haven’t experienced performance problems when working with WPF, or in my app. I have. But I overcome them through virtualization, cached composition, and keeping the visual tree reasonable. And I’m happy with the result, aside from the correctness issue I mentioned.

        I’ve always had the feeling that there’s a lot of overhead with WPF, especially in the managed layer with dependency properties. It doesn’t surprise me that there’s inefficiency in rendering. But how much these rendering inefficiencies contribute to the problems people encounter in the real world has not been proven at all – yet people act as though it has. This is the source of my frustration. Too much has been assumed.

        You are correct in saying that my app has more of a conventional UI. It’s more like an Office app than an iPhone app. Although, a major component of it is 3D and I chose not to use WPF for that part. Preliminary research indicated that WPF3D was terrible. As you distance yourself more and more from the conventional UI paradigm, I have to question whether WPF is the right tool for the job – or ever will be. Maybe this part of Jer’s message didn’t sink in for me before. Do people really expect WPF to handle any kind of UI? It never struck me as a Scaleform competitor. And I’ve never associated a high level API with high performance, UNLESS I stay within the lines (do not deviate from intended use). How far do the lines for WPF extend beyond conventional UI? Hard for me to say. It sounds like not as far as many would like.

      • Ben

        I agree that WPF is great for conventional UIs, but anytime you get into continuous animations–and the animation system is one of WPF’s key selling points, in my opinion–CPU loading goes off the charts.

        I’m currently working on a touch UI for an embedded medical device, and the requirement is basically to provide an iPad app-like UX. The machine has probably 2-3x the raw horsepower of an iPad, but no matter what, the UI just won’t feel as smooth or responsive as a slick iOS app.

        For example, create a 300×300 rectangle and do a DoubleAnimation that animates its opacity from 0 to 1 in a loop. Very high loading, even if you set the framerate to 20-30.

        For another example, create an animated Expander with a large amount of content in it, or basically, slide anything with a large amount of content in it. This is a common iOS thing. Its animation will be stuttery, even on a Core i7.

        In my current application, I’m animating the scale of a vector graphic that is approximately 100 pixels square. It sits at the bottom of the screen, but during the animation, it for some reason invalidates a rectangle that extends all the way to the top of the screen. This causes parts of a chart to be redrawn and sends loading through the roof.

        For another example, try using a DropShadow, or any pixel shader effect, and animate the RenderTransform of an object that has the effect applied to it. Extreme CPU loading, even though pixel shaders should be running on the GPU, right?

        For another example, try to implement any sort of real-time charting updating at 20 Hz or faster (e.g., medical waveforms or live trends) without ridiculous CPU loading. I went through the gamut of overriding OnRender, using DrawingVisuals, WriteableBitmap + GDI plus, etc. High CPU loading across the board, even with cached composition. We’ve gotten the CPU loading to an acceptable level after considerable effort, but again, it just doesn’t have the smooth and ultra-responsive feel of a typical iOS app.

        I know we could use Direct2D or some other native solution, but the reason we chose WPF/.Net in the first place was for the enormous developer productivity it affords. I hope that WPF v.next will either address the performance or provide easy Direct2D interop, ideally through a well thought-out managed wrapper.

  13. Pingback: Windows Client Developer Roundup 059 for 2/14/2011 · All About Computer
  14. Bigsby

    Nicely done, Jer.

    I feel for your pain but I’m also very sympathetic with the Vole’s timings on these matters. There are far more important issues than this to approach in WPF/SL world. It’s a matter of compromise. And sure some bad decisions were and will always be made.

    About the performance thing, I must say that my experience is that either someone wants the whole world on a browser in 3D (and millions of particle fogs) or there is a very poor implementation of UI. Things like very deep visual trees with tons of bubbling events.

    Everyone wants 3D on SL5 but I wish/hope this doesn’t happen. Although I gather enabling 3D improves everything else’s performance, I think it’s too soon for so many people to look at WPF and SL as 3D engines. We are still very far from that. Honestly, it should be looked upon as a merely academic feature.

    Here, I’m totally with David Platt. If candy doesn’t serve the purpose of the UI it should be removed. WPF/SL and the Binding engine have no competition in LOB and that’s their purpose. Other usage is like using an iPhone to print. Personally, I’m already getting fed up with all these blinding gradients and, even worse, transparencies.

    Anyway…loved to read your article. The content is great but, in my opinion, misplaced.

    Cheers.

  15. Paulus

    Jer,

    Thanks for this excellent article.

    Regarding WPF performance, some basic capabilities like “Enable RedrawRegions” and “Enable FrameRateCounter” would routinely help in optimizing LOB WPF development and discover/validate adequate design strategies.

    Cheers, Paulus

  16. Jon Harrop

    I codeveloped a vector graphics GUI library that used hardware acceleration from 1998 to 2002:

    http://ffconsultancy.com/products/smoke_vector_graphics/

    We put a lot of effort into optimizing using the techniques you describe and many more besides (we would incrementally update tessellations in real time during animations and could zoom and pan around Gb PDF drawings). However, we made the mistake of not investing nearly enough time and effort in stability across a wide range of platforms and, consequently, many people suffered from buggy OpenGL drivers and our product was a commercial failure as a library (although it has been used in many demonstrations and presentations).

    With the benefit of hindsight, I think Microsoft did exactly the right thing. In point of fact, we have since ported our higher-level code to F# and WPF/Silverlight and it is a commercial success as a product (and continues to be used in demonstrations and presentations):

    http://ffconsultancy.com/products/fsharp_for_visualization/

    Also, I should mention that the performance problems don’t stop with just the rendering. We had a problem with our WPF product where a 3D animation would stutter visibly as the mouse was moved and the problem turned out to be the hit testing in WPF. My first foray into investigating this was to write a completely naive function to do the hit testing myself, simply by intersecting the pointer with each triangle in the scene. Amazingly, my completely naive implementation turned out to be around 10,000× faster than the built-in WPF solution. To this day, I cannot begin to imagine what they must be doing to make that code run so slowly…

  17. Abu S3ood

    I noticed a performance difference (smoother transitions) in a kiosk app I built back on .NET 3.5 SP1 after disabling Aero Glass in Windows, without changing the frame rate, and it was very noticeable.

    I seriously think that the bottleneck has been in the OS’s GPU acceleration, where an app and an OS that are both GPU-intensive fight for GPU cycles.

    Maybe someone with a multi-GPU PC can try out an app and give feedback, as the lower-level parallel GPU engine might point to that fact: that GPU acceleration at the OS level needs a re-architecture.

    • Ben

      I can confirm this observation. Disabling Aero (which turns off the Desktop Window Manager) really improved our WPF application’s performance. This was the only way we were able to achieve acceptable performance on an Intel Atom processor. I shared this feedback with both an Intel engineer and an engineer on the WPF team, but who knows if they’ll actually do anything about it.

      FYI the Intel engineer said that WPF does some questionable stuff, like calling Flush() way too often.

  18. Luke

    I really love WPF from the development point of view. Its flexibility and development efficiency, the binding system, the animation system, this is all really great stuff. I just wish that the performance were at least on par with other UI frameworks. But for many versions now, I have not seen major improvement in this area. And I do not have the feeling that MS is taking this seriously enough.

    With the arrival of all those new Pads, Slates, Tablets, etc., performance and efficiency are crucial while still providing rich, animated user experiences for end-user applications! These devices do not have the horsepower of your multi-core desktop machine, they work on much less power, and you also do not want to waste their CPU cycles (=> battery level). Both iOS and Android show that this is possible on ARM-based devices. With WPF this is impossible right now; even a much more powerful system like a dual-core Atom is not able to smoothly render a halfway rich WPF UI.

  19. Pingback: the rasx() context » Blog Archive » “A Critical Deep Dive into the WPF Rendering System” and other links…
  20. Pingback: Die MIX 11 ist vorbei und von WPF gibt es nix Neues, Silverlight wird auf Windows reduziert und HTML5 soll es richten « Der faule Programmierer
  21. J Trana

    Jer,

    I’ve been watching your blog for a while now and the question that comes to mind is this: why not write a framework that would act similarly to a subset of WPF using your latest library where MS chose to use MilCore? I know that’s a big undertaking but with a solid MilCore-esque layer it would be much easier (so that one could in some ways start with DrawingContext half implemented). Although WPF (w/ XAML templating, styles, etc.) is powerful and expressive I’ve wondered about a few other core decisions as well: why can’t I build visual trees in code like XAML can – why is that API closed? Is INotifyPropertyChanged really the right way to signal ViewModel changes? For that matter, why are binding times listed at greater than 100 ms??
    Anyways, I guess I’m curious about your thoughts on some of those questions. I think the core concepts of WPF are pretty elegant, but…

  22. Karoline

    I have always wondered why my WPF applications perform so poorly. Until now, I have blamed it on the .NET Framework and it being a “managed environment”. Performance (plus bugs in WPF) has convinced me to stay in the C++ Win32 world. I think WPF will be phased out in the near future.

  23. Jesus

    Hi Jeremiah,

    We are working on a proprietary WPF implementation based on C++ and the GPU.

    Could you contact me please?

    Thanks!

  24. Pingback: Feature: Why Microsoft has made developers horrified about coding for Windows 8 – JailBake
  25. Leo

    What about WP7?

    Silverlight for WP7 isn’t perfect, but it performs decently and with really good frame rates as long as you keep the fill rate below the recommended maximum guidelines. Animations can be very smooth considering the hardware has a fraction of the power of a PC. I am wondering if the optimizations to the rendering pipeline for WP7 aren’t being ported back to Win8 as we speak.

  26. Josh

    “Those of you who have been following me on Twitter”… and where pray tell would that be? There’s not a single link to your Twitter here or anywhere else on this website.

  27. Pingback: Is Microsoft Killing Silverlight?
  28. Pingback: Why Microsoft has made developers horrified about coding for Windows 8 | WorldWright's …
  29. Chris Nahr

    Great article. Now that news about the “Windows Runtime” in Windows 8 is starting to trickle out, it seems that we’ll finally get a fast native XAML implementation, although it’s not yet clear how the existing WPF API will profit from that.

    Meanwhile, some people here and elsewhere have wondered how to do fast custom drawing in WPF (frequently updated charts etc.). You could embed DirectX (e.g. via SlimDX) but that’s rather laborious. However, I got a surprising result from a small benchmark for triangle drawing: double-buffered GDI+ can be an order of magnitude faster than WPF, even when unbuffered GDI+ is slower! Since GDI+ is very easy to use within a WPF application, that’s a good first choice when you need more speed.

    http://www.kynosarges.org/WpfPerformance.html

  30. Good Night and Good Luck

    I think WPF is broken at the core. Apart from the rendering, the approach to object layout is horribly slow.

    We can’t get a test project with a single column of text in a ListView to scroll smoothly on an Atom tablet. We are using only stock objects with all optimizations (virtualizations, etc) enabled. Profiling shows severe CPU spikes in the Measuring system. The problem is in the entire layout/drawing approach, and the design appears to be flawed at the planning stage. After reading the WPF sources I do not believe it can be fixed.

    A test project in C++/MFC doing the same thing is totally fluid. WPF performance is absolutely pathetic and is probably why MS is leaving it behind.

  31. falken

    Thanks. This is a really serious deep dive. I still hope that the strategic thing here is XAML. If you consider WPF the V1 managed prototype and SL the V2 native prototype, then what is released today must be the V3 native rewrite, still using the XAML interface almost unchanged, but with bindings to C++ as well and integrated with HTML5/JS. And when anything reaches its third version, not only at MS, it is often done really right. Really, your comments on the current rendering flaws are serious, no doubt.

  32. Andrew B

    Absolutely excellent article. I’ve always suspected WPF was doing something horribly wrong to achieve lower performance than software-based renderers like GDI. I suspected it was the layout engine (which always shows up as a CPU hog in profilers) but to my horror the rendering engine is also flawed!

    Well what to do about this? I don’t see WPF and Silverlight going away any time soon. There are businesses running LoB apps with Win32 and later Windows Forms and these technologies were supposed to ‘die’ years ago. However they really need to do something to get the performance up to scratch.

    I guess the proof is in the pudding. Microsoft have barely used WPF in their own products. It’s a shame really, it really is a good framework other than that!

  33. Pingback: Fast Line Rendering in WPF
  34. Pingback: Jeremiah Morrill & The Scholarly Kitchen « Kynosarges Weblog
  35. Pingback: Grand Subscriptions & Miscellaneous Update « Kynosarges
  36. Philip

    This question is not entirely related to the topic, but I wanted to ask you about something from your previous blog and I figure I have a better chance of getting noticed here. Sorry for the inconvenience.
    A while ago you posted code that allows getting to the DirectX stuff behind WPF. My question is: is it possible to change the BackBufferFormat used by WPF? I’ve tried to achieve that with your hack, but with no effect. My aim is to change the format to 10-bit (e.g. A2B10G10R10) to be able to use specialized monitors, but for the sake of testing on normal monitors a 5-bit format (e.g. A1R5G5B5) is a good indication (it shows ‘stripes’ on gradients when tested on pure DirectX). Any help would be appreciated.

    • jeremiahmorrill

      Sorry, I can’t really help with this. Changing something like that at such an integral, low level is bound to break things. With the knowledge that milcore was the precursor to Direct2D, and D2D needed very specific surface formats, it’s highly unlikely that this would work. :(
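
    The ‘stripes’ mentioned in the question above follow directly from the number of levels a channel can hold. A rough sketch of the arithmetic, assuming a single color channel, a linear black-to-white gradient across a 1920-pixel-wide surface, and no dithering; the helper names `quantize` and `band_count` are hypothetical and this does not touch WPF or DirectX at all.

```python
def quantize(value: float, bits: int) -> int:
    """Map a channel value in [0.0, 1.0] to the nearest integer level."""
    max_level = (1 << bits) - 1
    return round(value * max_level)


def band_count(width: int, bits: int) -> int:
    """Number of distinct levels a full-range gradient of `width` pixels uses."""
    return len({quantize(x / (width - 1), bits) for x in range(width)})


WIDTH = 1920  # assumed surface width in pixels

bands_5 = band_count(WIDTH, 5)    # e.g. A1R5G5B5
bands_8 = band_count(WIDTH, 8)    # e.g. the common 8-bit-per-channel formats
bands_10 = band_count(WIDTH, 10)  # e.g. A2B10G10R10

for bits, bands in ((5, bands_5), (8, bands_8), (10, bands_10)):
    print(f"{bits:2d} bits: {bands:4d} bands, about {WIDTH / bands:.1f} px each")
```

    With 5 bits per channel the gradient collapses into 32 bands of roughly 60 pixels each, which is easy to see on any monitor, whereas 10 bits gives 1024 bands narrower than two pixels.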
