libuvxx – a libuv based c++ library

https://github.com/jmorrill/libuvxx

I’ve been doing some development on small, IoT-like devices such as the Raspberry Pi, BeagleBone and Galileo.  I first tried out the .NET experience via mono.  This works fine for some tasks, but some of the applications I want to build need to be real time and are relatively heavy.  I ported some of our media streaming stuff over to mono and unfortunately you could visually see the GC kicking in via micro-stutters in the video, even with sgen enabled.  I then wanted to explore technologies and techniques that better fit these relatively low powered, sometimes single core, IoT devices.

“Dat Event Loop Tho”

There’s been a recent upswing in the popularity of event loops for systems programming.  An event loop is primarily a single thread that services I/O asynchronously.  I/O event loops are nothing new.  If you’ve ever done any GUI work, you’ve used an event loop, though typically the input was keyboard and mouse and the output was GUI layout and rendering.  With systems programming, the event loop model’s I/O servicing is mostly asynchronous socket or file system calls.  Why is this important and how is this better than a pool of 40 threads servicing I/O and timers?  The answer lies in the fact that threads are not free.  They take up memory (mostly in stack space), and they cost CPU in context switches from the operating system scheduler.  In an ideal situation, an application would have a maximum of one thread per CPU core and these threads would never block on I/O, only invoking asynchronous I/O calls to the operating system.  I will stress that this is only an “ideal” to aim for, and usually not the current reality of application design or technology (more on this later).  Given that an event loop encourages single threaded development, I felt this was a perfect starting point for low powered IoT devices.
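To make the idea concrete, here is a minimal sketch of a single-threaded loop in plain C++.  It is not libuv: the tiny_loop class and its post/run methods are made-up names for illustration, and a real event loop also waits on the OS for I/O readiness instead of only draining a queue.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical toy loop: one thread, one queue of callbacks.
class tiny_loop {
public:
    // Queue a callback to run later on the loop thread.
    void post(std::function<void()> fn) {
        assert(fn && "callback must not be empty");
        queue_.push(std::move(fn));
    }

    // Run callbacks until the queue is empty; returns how many ran.
    // Callbacks are allowed to post more work while the loop is running.
    int run() {
        int executed = 0;
        while (!queue_.empty()) {
            auto fn = std::move(queue_.front());
            queue_.pop();
            fn();
            ++executed;
        }
        return executed;
    }

private:
    std::queue<std::function<void()>> queue_;
};
```

Everything runs on the thread that calls run(), so callbacks never race with each other and no locks are needed in this toy version.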

What Programming Libraries Support Event Loops?

There are quite a few event loop libraries out there, some with very specific uses and some with generalized ones.  Two of the generalized libraries are asio and libuv.  In the end, I chose libuv as I felt it had a) a larger abstraction surface and b) a simple bare-bones API.  libuv is also one of the libraries that power the popular node.js.  libuv leverages asynchronous OS APIs on sockets and uses its small, shared thread pool for operations that only have synchronous APIs (such as resolving DNS).  Unfortunately libuv uses its thread pool for all file system calls because Linux’s async filesystem APIs suck.  Windows has async calls (via IOCP and APC) for read/write operations, but apparently they can block under certain conditions.  I still advocate libuv doing APC/IOCP on Windows, even on the thread pool…just to free up any contention with any other blocking calls happening.  In the end, this is just a reality of asynchronous I/O today.  It is not a deal-breaker, as the libuv thread pool is small (unless reconfigured via the UV_THREADPOOL_SIZE environment variable) and, ignoring disk caching, a disk will not go faster just because you throw more threads at it.  The threads are just in a blocked state until the I/O request completes.  This can become an issue if you are dealing with high latency storage, like a network share, or plan on reading/writing to more storage devices simultaneously than your thread pool has threads.

Why Develop a Library Around libuv if libuv is So Great?

libuv is a C library and really “looks” like someone reimagined the POSIX/BSD sockets API to make it asynchronous.  It’s bare-bones, which makes it very flexible.  The difficulty with libuv actually has nothing to do with the library, but with the nature of asynchronous programming: the dreaded “callback hell”.  Coordinating callbacks quickly becomes a nightmare.  C# had System.Threading.Tasks to help solve the problem, then followed up soon after with its async/await features.  Javascript has “promises”.  C++ also has a billion “solutions”, but none I personally liked and none that really felt modern…except Microsoft’s “PPL Tasks”.  This is the closest I’ve seen to Javascript promises.  It has a very natural syntax.  I wanted this PPL Task library working with libuv.

Microsoft PPL Task and libuv Mashup or “Is libuvxx Just a Redundant Library?”

Microsoft was nice enough to release a cross platform, Apache licensed version of the PPL tasks. It’s contained in the Casablanca SDK.  I know what some may be thinking here.  “If Casablanca is already cross platform, supports ppl tasks, why bother making a NIH library?”  Casablanca does support IOCP file operations on Windows (sync calls thread pooled on Linux), but does not have cross platform socket APIs (does have HTTP client server though) and does NOT support an event loop OUTSIDE a Windows Store Application.  I need something that is optimized for single core, but scales up to many core.  Even though Casablanca is an extremely well made library, it wasn’t exactly what I was looking for.  Also, working with a well made, well tested library didn’t sound as fun :)

What Does libuvxx Look Like?

Before going further, I think it’d be good to show some of the library usage.

/* get the dispatcher for the current thread */
auto dispatcher = event_dispatcher::current_dispatcher();

/* run the event loop*/
dispatcher.run();

If you have ever done any work with WPF, you may find this very similar to the WPF Dispatcher.  You never create a dispatcher object directly, but always via the static function “event_dispatcher::current_dispatcher()”.  Internally we keep weak references to all dispatchers and return them based on the current thread’s ID.
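A rough sketch of that lookup, using only the standard library, might look like the following.  This is not libuvxx’s actual code: toy_dispatcher and the free function current_dispatcher() are stand-ins I made up to show the weak-reference-per-thread-id idea.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a dispatcher; not libuvxx's real class.
struct toy_dispatcher {
    std::thread::id owner = std::this_thread::get_id();
};

// Return the dispatcher for the calling thread, creating it on first use.
// The registry holds only weak references, so a dispatcher is destroyed
// once the last caller releases its shared_ptr.
inline std::shared_ptr<toy_dispatcher> current_dispatcher() {
    static std::mutex guard;
    static std::map<std::thread::id, std::weak_ptr<toy_dispatcher>> registry;

    std::lock_guard<std::mutex> lock(guard);
    auto& slot = registry[std::this_thread::get_id()];
    auto existing = slot.lock();
    if (!existing) {
        existing = std::make_shared<toy_dispatcher>();
        slot = existing;
    }
    assert(existing->owner == std::this_thread::get_id());
    return existing;
}
```

Calling current_dispatcher() twice on the same thread hands back the same object, while the registry itself never keeps a dispatcher alive.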

/* get the dispatcher for the current thread */
auto dispatcher = event_dispatcher::current_dispatcher();

/* get all files in all subdirectories */
fs::directory::get_files_async("C:\\Users\\Jeremiah\\", true).
then([](task<fs::directory_entry_result_ptr> task)
{
     auto file_list = task.get();
}).
then([]
{
     /* do something else */
});

/* run the event loop*/
dispatcher.run();

The example above shows getting a list of all files from a given path, and receiving it via a task continuation.  All the continuations here run on the event_dispatcher thread.  How does this all work?  Hell if I know, but I did spend weeks in a debugger, modifying the pplx task library and got a bit lucky.  If you want to see a fuller example you can check out this test project.

Seriously, How Does the PPL Tasks Dispatch to libuv? What Changes Were Made?

Even though the Apache 2.0 PPLX Tasks is a well made lib, it’s a SOB to figure out due to its heavy use of templates.  There may have been better ways of modifying the PPL Tasks, but I figure if you fork it, make it your own.  The first order of business was ripping out almost all the WinRT related stuff.  What we really needed was the task_continuation_context::use_current() abilities.  Under WinRT, this continuation context captures a special COM object to use later.  This is not much different in concept from the .NET SynchronizationContext class.  This COM object is closed source, but we can assume it queues something up in the WinRT message pump.  I replaced usage of this COM object with a simple thread id value of the current thread if it is an event_dispatcher thread.  When the call needs to be processed, I simply pass the method to execute to the event_dispatcher, using the thread id to look up the correct dispatcher.

I have also made many performance improvements…and broken a few things (like stack trace capturing).  The vanilla PPLX Tasks oddly will dispatch ALL calls first to the thread pool, then they will be passed back to the UI thread in WinRT.  In my tests involving a tight loop of ppl tasks, this was the heaviest operation.  It seemed silly to even involve the thread pool if the method was not task_continuation_context::use_arbitrary().  I modified this to simply dispatch to the event_dispatcher’s function queue and immediately saw a huge improvement.
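The dispatch path described above can be sketched roughly like this.  To be clear, this is not the modified pplx code: captured_context, schedule and run_pending are hypothetical names, and locking is left out to keep it short.  It only illustrates capturing a thread id up front and later queueing the continuation straight onto that thread’s function queue, with no detour through a thread pool.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <thread>
#include <utility>

using work_item = std::function<void()>;

// Hypothetical per-thread function queues, keyed by thread id.
// (A real implementation would need locking; omitted in this sketch.)
inline std::map<std::thread::id, std::deque<work_item>>& queues() {
    static std::map<std::thread::id, std::deque<work_item>> q;
    return q;
}

// Captured when a task is created: remembers which thread to come back to.
struct captured_context {
    std::thread::id target = std::this_thread::get_id();
};

// Queue the continuation directly on the captured thread's queue.
inline void schedule(const captured_context& ctx, work_item fn) {
    assert(fn);
    queues()[ctx.target].push_back(std::move(fn));
}

// Drain the calling thread's queue; returns the number of items run.
inline int run_pending() {
    auto& q = queues()[std::this_thread::get_id()];
    int ran = 0;
    while (!q.empty()) {
        work_item fn = std::move(q.front());
        q.pop_front();
        fn();
        ++ran;
    }
    return ran;
}
```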

I also reworked a lot of places where things could be std::move’d or passed by reference instead of copied.  My profiler showed quite a bit of copying going on.  I followed this up by reducing the amount of atomic reference counting that was happening.  I may have broken something, but so far my changes appear “stable”.  I also have a more optimized version of create_iterative_task, which reduces a lot of copying and possibly reference counts.  My version only exits the loop via exception, but I may add an exit-by-return-false.
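As a small illustration of the copy-versus-move difference (not taken from the pplx sources; the tracked type and both enqueue helpers are invented for this example), a payload that counts its own copies shows what a profiler would otherwise have to tell you:

```cpp
#include <utility>
#include <vector>

// Payload that counts how often it is copied.
struct tracked {
    static int copies;
    tracked() = default;
    tracked(const tracked&) { ++copies; }
    tracked(tracked&&) noexcept {}
    tracked& operator=(const tracked&) { ++copies; return *this; }
    tracked& operator=(tracked&&) noexcept { return *this; }
};
int tracked::copies = 0;

// Takes a const reference and copies the item into the queue.
inline void enqueue_by_copy(std::vector<tracked>& q, const tracked& item) {
    q.push_back(item);
}

// Takes the item by value and moves it into the queue; if the caller
// passes an rvalue, no copy happens at all.
inline void enqueue_by_move(std::vector<tracked>& q, tracked item) {
    q.push_back(std::move(item));
}
```

For something like a std::function holding captured state or a shared_ptr, skipping the copy also skips heap allocations or atomic reference count updates.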

How Far Along is this Project?

Not too far.  Much of my time was spent in the pplx and reducing overhead as much as possible.  A lot of the APIs were copied from the .NET BCLs so they should be semi familiar.

So far you’ll find:

  • fs::directory – Static functions for querying for files or directories and deleting directories.  Supports recursive delete and reading of contents.
  • fs::path – Helper functions for dealing with filenames (like .NET’s Path)
  • fs::file – A file class for read/write
  • net::stream_socket – A socket client for read / write.  Had to do some funky stuff to make the read_async work well.  Still more to be done.
  • net::dns – Functions dealing with name resolution
  • uvxx::event_dispatcher_timer – Executes a callback on an interval.  Leverages the libuv timer.
  • uvxx::event_dispatcher_object – Much like a DispatcherObject in WPF.

As far as the usage goes, the objects are passed around by value or reference and act like a smart pointer.  So to initialize a stream_socket, you’d just type:  stream_socket s;
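For readers unfamiliar with this style, here is a minimal sketch of a value-type handle that behaves like a smart pointer.  toy_socket is a made-up class, not libuvxx’s stream_socket; it only shows that copies of the handle share one underlying object.

```cpp
#include <memory>
#include <string>

// Hypothetical value-type handle: copies share one underlying implementation.
class toy_socket {
public:
    toy_socket() : impl_(std::make_shared<state>()) {}

    void set_peer(const std::string& host) { impl_->peer = host; }
    const std::string& peer() const { return impl_->peer; }

    // True when both handles refer to the same underlying object.
    bool same_object(const toy_socket& other) const { return impl_ == other.impl_; }

private:
    struct state { std::string peer; };
    std::shared_ptr<state> impl_;
};
```

Because the handle is cheap to copy, it can be captured by value in a continuation lambda and the underlying object stays alive until the last copy goes away.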

What Operating Systems Does This Support?

I’ve tested on Ubuntu on Beagle Bone Black, Windows and Mincore on the Galileo.  For Galileo I had to make some small changes to libuv as a method was not supported.  The build configuration is in the main msvc solution.  On Linux, this requires at least GCC 4.8.  I cheated and used VisualGDB and Linaro cross compiler so I could do all the Linux stuff in Visual Studio. A Linux savvy person should be able to use the .mak files and compile right on Linux.  I’ll save that for another blog post as this one is getting too long.

-Jer

Managed Galileo Arduino-ish Stuff

With some pointers from @Pete_Brown and @BretStateham, I started writing a .NET interop layer for the Galileo I/O stuff that currently only exists as a C++ API.  This includes the GPIO, Analog-to-Digital converters, I2C, PWM and SPI APIs.  This will allow folks to use mono, for now, to control the I/O that comes natively with the device.

At the moment, I only have the p/invoke layer (mostly) completed.  These calls still need a .NET style wrapper around them.  This is mostly simple work, but it’s not all straightforward.  Looking at the C++ SDK code for the Arduino, there are port mappings for the pins.  This seems to come from the C++ SDK striving to have an Arduino style API (which is possibly some unofficial “standard” amongst devices) while things like I/O ports are different on the Galileo.

No promises for finishing this up soon, but the code so far is on github here.

Mono on Windows for Intel Galileo source code

Previously I blogged about running mono on the Intel Galileo board and provided the binaries.  I promised the source code and wanted to make good on that.  I have uploaded this experiment to github here.

Some Quick Notes on Compiling

At the moment, compiling is sort of a “two pass” deal.  I’m sure someone with more gcc experience could make it a single pass, but this method works well.

Once you download the source, more or less follow the instructions here but use the source from my repo: http://www.codeproject.com/Articles/815565/How-to-build-Mono-on-Windows Those instructions will essentially build the mono runtime, compiler and .NET BCLs.  The native mono runtime and other native binaries from this output are not compatible with the Intel Galileo, but it does give us valid BCL dlls.

Next open up mono\msvc_minwin\mono.sln.  Compile a Release|Win32 or Release_SGEN|Win32 build.  Copy the binary outputs (mono*.exe and mono*.dll) to your monoinstall\bin folder.  Copy your mono installation directory to your device and enjoy.

Quick Notes on the Code Changes

  • Added a MONO_MINWIN preprocessor define that the msvc projects set.  Changes are mostly minimal and are due to win32 apis being removed totally or replaced by their methodEx alternatives.  Some functionality is totally disabled pending more research on win32 apis that have been removed (like EnumProcesses)
  • Added mincore.lib and /NODEFAULTLIB:libs to each MSVC project output.  mono.props modified also.
  • Modified some BCLs that were pointed to advapi32.dll.  These were mostly registry p/invoke
  • Added /arch:IA32 so the instruction set is compatible with Galileo

Mono on Windows IoT Galileo

Over the last week, since my Intel Galileo dev board arrived, I’ve been trying to get mono ported over to Microsoft’s “Windows on Devices”.  With a bit of recent success, I wanted to share the experience and give you something to download. 

tl;dr: Download link at the bottom of the post

Why Mono on Windows?  Doesn’t Microsoft have their own CLR and why not just use a Raspberry Pi- like board?

Right now, on Microsoft’s Windows on IoT (let’s call it mincore until we are corrected), there is minimal .NET support.  The CLR is there, but there are little to no supporting libs.  In short it’s either incomplete, or perhaps just waiting for .NET native support.  I don’t want to wait around to see what Microsoft is doing with this OS.  I just paid $50 for this device, so I want to do something besides blinking LEDs.

So why not just use an already Linux supported device running mono?  For some people, this may be a no-brainer and they are probably already doing it.  For others like me, we have a good chunk of dependencies on Microsoft technologies and/or APIs, which also may include our own C and C++ libraries.  So naturally, what I want is a cheap low powered device I can mostly just recompile for, without mucking up a stable codebase in the process.  Mono on this mincore Windows just may fit, even if only temporarily.

What is this mincore-Windows-IoT and what’s involved in porting Mono over to it?

There’s little out there that really tells us the direction for this Microsoft OS.  But what we can tell at this point is that it’s SMALL.  The compressed WIM file is 171 MB!  So you can imagine how much “stuff” was cut out of this.  There’s no shell32.dll at all.  Some Win32 functions are totally gone or moved to different DLLs.  Mono already supports Windows, but in the WIN32 build configuration its runtime and BCL use a good share of the Win32 API.  It should also be noted that the specific Galileo CPU doesn’t support any modern instruction set.  No MMX, SSE, or the like.  Luckily in this case this just came down to compiler flags.

The first step I took in trying to get mono working was simply to compile it using cygwin/mingw.  I don’t have much experience with gcc, but I couldn’t get a single build of the mono runtime to execute without an “illegal instruction” error.  This pointed the problem to compiler switches (-march=i586 in this case), but no matter what I tried, I got no love.  On the plus side, this process did create working Mono BCLs.

I had just about given up when I decided to see if anyone had used the MSVC compiler on Mono, which is when I found out I should have RTFM from the beginning.  There’s an msvc folder in their source tree, with Visual Studio .sln and project files ready to go.  What finally worked was changing the compiler flags to /arch:IA32 on all the projects and adding mincore.lib to the linker, with additional options such as:

  -d2:-nolock /NODEFAULTLIB:ole32.lib /NODEFAULTLIB:kernel32.lib /NODEFAULTLIB:user32.lib /NODEFAULTLIB:advapi32.lib

The linker stuff is very important.  The mincore.lib appears to provide exports for moved win32 functions.  For instance, CoInitializeEx used to exist in ole32.dll, but now that the DLL is gone, it’s in api-ms-win-core-com-l1-1-1.dll.  The /NODEFAULTLIB:lib option just tells the linker to ignore the standard win32 libs and use the ones defined in mincore.lib.

Figuring out which Win32 methods are simply missing was another issue.  At runtime, when Windows loads a DLL, it enumerates its imports to verify them, and if one export is missing, your app will get a 0xC0000139, missing entry point.  Finding out which methods are failing is easy.

reg ADD "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager" /v GlobalFlag /t REG_DWORD /d 2

That turns on Windows loader snaps when you debug an application after a reboot.  This makes running an app very verbose, but you get the missing Win32 function you were looking for.  Once the missing method was identified, its usage in mono could be changed to the alternative win32 method (using the win32Ex alternative), or the calling method in the mono source was otherwise modified.

What about p/invoke stuff in the BCLs?

There’s also a good amount of DllImports in the Mono BCLs.  Win32RegistryApi.cs is a good example of an internal class that is used quite a bit and that will fail.  Its imports point to entry points in advapi32.dll, when they have been moved to API-MS-WIN-CORE-REGISTRY-L1-1-0.DLL.  I didn’t fix every BCL DllImport yet, but did at least patch this one because of how much it is used.

What isn’t working with mono besides stuff that doesn’t already work with mono?

I can’t think of much that isn’t working now that I figured out some mistakes I made early on.  COM interop should be working now.  Some windows security stuff may be broken.  EnumProcesses doesn’t exist afaik, so that won’t work.  It’s a large test surface, with a lot of variables, but hopefully it will get you by until a supported CLR comes from MS.

Where’s the code? You gotta hook it up.  The power of GPL compels you!

It’s coming, I promise.  The problem is I hacked up so many things just trying to get it to work that I’m still cleaning it up.  I want it to be compatible with the rest of the mono build system, only enabled with some preprocessor flags.  Shouldn’t be much longer than a week.

Where’s the bins, I wanna try it out!

I uploaded the latest port here: https://dl.dropboxusercontent.com/u/4165054/mono_iot.zip, Enjoy (remember, JITTing on this is SLOW)

Also remember this environment variable if you like verbose output:  set MONO_LOG_LEVEL=debug

PS, Can you port Node.js to mincore for me?

No I will not.

EntranceThemeTransitionBehavior Behavior for WPF

I really liked the xaml transition themes in Windows 8.  They give a nice way to have consistent animations within your application and from application to application.

One transition theme I really like in Xaml was the entrance theme.  One feature of this theme is it has an “IsStaggeringEnabled” in which it will stagger the animations of items within the same container, giving an incremental loading effect.

I wanted to use this effect in WPF and here it is.  I also added a few more features for controlling the animation that Win8 Xaml doesn’t include (and for good reason).

http://1drv.ms/Pkca5H

Please note that virtualization of things like ListBox can cause the effect to go overboard, so you may want to use: VirtualizingPanel.VirtualizationMode=Recycling attached property.

Pixel Shaders for Xaml Windows Runtime

Pixel Shaders first popped up in WPF around .NET 3.5 and were a replacement for the very slow BitmapEffects. This allowed developers to make GPU accelerated shader effects, such as Blur and DropShadow. When Silverlight started picking up steam, they were also added, but as a CPU (concurrent) rasterized version. Since then pixel shader effects have all but disappeared from XAML frameworks such as Windows Phone or XAML in Metro applications.

Windows 8.1 brings a lot of new features that can now facilitate pixel shaders on Xaml UIElements, which is the goal of this library. Before anyone gets too excited, there are limitations. Most are imposed by the XAML framework and some by my library. But first, let me explain further.

How does it work?

The most trivial explanation on how this is accomplished: We use the new RenderTargetBitmap to rasterize a UIElement. The result is then loaded to a GPU surface and Direct2D is used to apply a pixel shader. The shaded surface is then presented with SurfaceImageSource. The original UIElement is left at 0.0 opacity, so it still remains interactive and makes it appear as if the shaded result is interactive.

To give a better perspective, here’s a sample of the usage:

<xamlFx:ShaderContentControl EffectRenderStrategy="RenderingEvent"
                             HorizontalAlignment="Left"
                             HorizontalContentAlignment="Stretch"
                             VerticalAlignment="Stretch">
    <xamlFx:ShaderContentControl.Effect>
        <xamlFx:BlurEffect Radius="{Binding Value, ElementName=amountSlider}" />
    </xamlFx:ShaderContentControl.Effect>
    <Button Content="BUTTON"
            Width="300"
            Height="300"
            BorderThickness="4">
    </Button>
</xamlFx:ShaderContentControl>

Notice that the implementation is a subclass of ContentControl. My original idea was to make a Blend Behavior, but to make this work I had to modify the visual tree. While this would be possible from a Behavior, it wasn’t optimal and would cause unexpected issues to the consuming developer. The ShaderContentControl will capture the UIElement set to the Content and use that as the target for the RenderTargetBitmap. Within the ShaderContentControl template is an <Image/>, that sits directly behind the set ShaderContentControl’s content. This is used to render the D3D content. Here is what the ShaderContentControl’s template looks like:

<ControlTemplate TargetType="ContentControl" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
    <Grid>
        <ContentControl UseLayoutRounding="False"
                        Width="{Binding Content.ActualWidth, ElementName=contentShim}"
                        Height="{Binding Content.ActualHeight, ElementName=contentShim}"
                        HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                        VerticalAlignment="{TemplateBinding VerticalContentAlignment}">
            <Image x:Name="rendererImage"
                   Stretch="Fill"
                   IsHitTestVisible="False"/>
        </ContentControl>
        <ContentPresenter x:Name="contentShim"
                          Opacity="0.0"
                          ContentTemplate="{TemplateBinding ContentTemplate}"
                          ContentTransitions="{TemplateBinding ContentTransitions}"
                          Content="{TemplateBinding Content}"
                          HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                          VerticalAlignment="{TemplateBinding VerticalContentAlignment}"/>
    </Grid>
</ControlTemplate>

Keeping the <Image/> “perfectly” synchronized with where the UIElement would be rendered proved pretty difficult. With most shaders, like a Hue/Saturation modifier, one would only have to place the <Image/> behind the UIElement and bind to its size. The problem comes when you have shader effects that draw larger than the original sample. Shaders such as drop shadow and blur are good examples of this. If this is not handled, the D2D surface is a natural clipping area and your blur gets clipped, which sucks. To overcome this, a shader can supply a padding value that specifies how much the shader will grow on the left, right, top and bottom, and the ShaderContentControl will handle the rest.
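The arithmetic behind the padding is simple.  Here is a sketch of it, assuming padding is expressed in pixels per edge; shader_padding, surface_size and the two helper functions are hypothetical names and not the library’s actual API.

```cpp
// Hypothetical types; the library's real padding API may differ.
struct shader_padding { double left, top, right, bottom; };
struct surface_size   { double width, height; };
struct image_offset   { double x, y; };

// The surface must fit the element plus whatever the shader draws outside it.
inline surface_size padded_surface_size(surface_size element, shader_padding p) {
    return { element.width + p.left + p.right,
             element.height + p.top + p.bottom };
}

// The image is shifted up and to the left by the padding so that the
// element's own pixels stay aligned with the invisible original.
inline image_offset padded_image_offset(shader_padding p) {
    return { -p.left, -p.top };
}
```

For example, a 300 by 300 button with 20 pixels of blur padding on every side would need a 340 by 340 surface, offset by (-20, -20).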

Performance?

Performance is always a big concern but this is one of those limitations. Though the RenderTargetBitmap in XAML is the fastest relative to all other implementations, it still accounts for almost all the CPU overhead here. On top of that, RTB will map GPU memory to system memory to retrieve the pixel buffer, just to be sent back to the GPU by me. The other issue, there’s no way to tell a UIElement has visually changed, so a constant re-rasterization needs to be done (in some cases). That all being said, it does perform well, but there’s gotchas:

The ShaderContentControl has a property called “EffectRenderStrategy”, which takes one of three enum values:

  1. RenderingEvent – Renders on every CompositionTarget::Rendering event
  2. Conservative – Renders when a layout update has occurred or the pixel shader has been invalidated
  3. Manual – Renders only when ShaderContentControl->Render is called
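The decision each strategy makes per frame can be sketched as follows.  This is only my reading of the three descriptions above, written as standalone C++; the enum spelling, frame_state and should_render are hypothetical and not the library’s implementation.

```cpp
// Mirrors the three strategies listed above (names are illustrative).
enum class effect_render_strategy { rendering_event, conservative, manual };

// What happened since the last render (hypothetical bookkeeping).
struct frame_state {
    bool layout_updated;      // a layout update occurred
    bool shader_invalidated;  // an effect property changed
    bool render_requested;    // Render was called explicitly
};

// Decide whether the element should be re-rasterized this frame.
inline bool should_render(effect_render_strategy strategy, const frame_state& f) {
    switch (strategy) {
        case effect_render_strategy::rendering_event:
            return true;  // every frame, the most expensive option
        case effect_render_strategy::conservative:
            return f.layout_updated || f.shader_invalidated;
        case effect_render_strategy::manual:
            return f.render_requested;
    }
    return false;
}
```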

Known Issues?

  1. RenderTargetBitmap::RenderAsync has issues with rendering certain elements.
  2. RenderTargetBitmap::RenderAsync has a problem with items in a ScrollViewer, such as a ListBox. I have a repro example here.
  3. Shaders are not extensible outside the library. This could be resolved, but I may have to do a lot of work abstracting Direct2D or making it more ABI friendly.
  4. This is a project-to-project library at the moment. I may make it an extension SDK later.
  5. I only wrote a drop shadow and blur effect. Other effects coming soon.

Where do I get it?

https://xamlfx.codeplex.com/