libuvxx – a libuv-based C++ library

https://github.com/jmorrill/libuvxx

I’ve been doing some development on small, IoT-like devices such as the Raspberry Pi, BeagleBone and Galileo.  I first tried out the .NET experience via Mono.  It works fine for some tasks, but some of the applications I’m looking to build need to be real-time and are relatively heavy.  I ported some of our media streaming code over to Mono and, unfortunately, you could visually see the GC kicking in as micro-stutters in the video, even with the SGen collector enabled.  I then wanted to explore technologies and techniques that better fit these relatively low-powered, sometimes single-core, IoT devices.

“Dat Event Loop Tho”

There’s been a recent upswing in the popularity of event loops for systems programming.  An event loop is primarily a single thread that services I/O asynchronously.  I/O event loops are nothing new.  If you’ve ever done any GUI work, you’ve used an event loop, though there the input was typically keyboard and mouse, and the output was GUI layout and rendering.  In systems programming, the I/O an event loop services is mostly asynchronous socket or file system calls.  Why is this important, and how is it better than a pool of 40 threads servicing I/O and timers?  The answer lies in the fact that threads are not free.  They take up memory (mostly stack space), and they cost CPU time in context switches by the operating system scheduler.  In an ideal situation, an application would have at most one thread per CPU core, and those threads would never block on I/O, only issuing asynchronous I/O calls to the operating system.  I will stress that this is only a direction toward the “ideal” and usually not the current reality of application design or technology (more on this later).  Given that an event loop encourages single-threaded development, I felt it was a perfect starting point for low-powered IoT devices.
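To make the idea concrete, here is a toy, standard-C++-only sketch of the core of an event loop: a single thread draining a queue of callbacks, where callbacks may queue further work. It rests on simplifying assumptions (no OS polling, no timers, and the class name toy_event_loop is hypothetical), so treat it as an illustration of the model rather than how libuv is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// A toy, single-threaded event loop (hypothetical, illustration only).
// Callbacks are queued with post() and executed one at a time on the
// thread that calls run(). Real loops such as libuv's also poll the OS
// for I/O readiness and timers; this one only drains its queue.
class toy_event_loop {
public:
    // Queue a callback to run on a later iteration of the loop.
    void post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }

    // Run callbacks until the queue is empty. Callbacks may post more
    // work, which is appended and run in FIFO order.
    // Returns the number of callbacks executed.
    std::size_t run() {
        std::size_t executed = 0;
        while (!queue_.empty()) {
            auto fn = std::move(queue_.front());
            queue_.pop_front();
            fn();
            ++executed;
        }
        return executed;
    }

private:
    std::deque<std::function<void()>> queue_;
};
```

Because everything runs on one thread, the callbacks never need locks to share state with each other, which is much of the appeal on a single-core device.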

What Programming Libraries Support Event Loops?

There are quite a few event-loop libraries out there, some with very specific uses and some with generalized ones.  Among the generalized ones are asio and libuv.  In the end, I chose libuv as I felt it had a) a larger abstraction surface and b) a simple, bare-bones API.  libuv is also one of the libraries that power the popular node.js.  libuv leverages asynchronous OS APIs for sockets and uses its small, shared thread pool for operations that only have synchronous APIs (such as resolving DNS).  Unfortunately, libuv uses its thread pool for all file system calls because Linux’s async file system APIs suck.  Windows has async calls (via IOCP and APC) for read/write operations, but apparently they can block under certain conditions.  I would still advocate libuv using APC/IOCP on Windows, even from the thread pool, just to free up any contention with other blocking calls happening there.  In the end, this is just a reality of asynchronous I/O today.  It is not a deal-breaker: the libuv thread pool is small (four threads by default, unless reconfigured via the UV_THREADPOOL_SIZE environment variable), and, ignoring disk caching, a disk will not go faster just because you throw more threads at it.  The threads simply sit blocked until the I/O request completes.  This can become an issue if you are dealing with high-latency storage, like a network share, or plan on reading from or writing to more storage devices simultaneously than your thread pool has threads.
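The split described above, where blocking work runs on a helper thread but its completion is delivered back on the loop thread, is the shape libuv exposes through uv_queue_work() (a work callback plus an after-work callback). The sketch below imitates that shape in plain C++. The names completion_loop and queue_work are hypothetical, and for brevity it starts one std::thread per job, whereas libuv reuses a small fixed pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: blocking work off the loop thread, completion on
// the loop thread. Not libuv's code; one std::thread per job, no pool.
class completion_loop {
public:
    ~completion_loop() {
        for (auto& t : workers_) t.join();  // never leave threads joinable
    }

    // Run `work` on a helper thread; hand its result to `done` later,
    // on whichever thread calls run_until_idle().
    template <class T>
    void queue_work(std::function<T()> work, std::function<void(T)> done) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            ++pending_;
        }
        workers_.emplace_back([this, work, done] {
            T result = work();  // may block: file read, DNS lookup, ...
            std::lock_guard<std::mutex> lk(mu_);
            completions_.push([done, result] { done(result); });
            cv_.notify_one();
        });
    }

    // Deliver completions until no queued work remains.
    void run_until_idle() {
        for (;;) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> lk(mu_);
                if (pending_ == 0) break;
                cv_.wait(lk, [this] { return !completions_.empty(); });
                fn = std::move(completions_.front());
                completions_.pop();
                --pending_;
            }
            fn();  // runs on the loop thread, outside the lock
        }
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> completions_;
    std::size_t pending_ = 0;
    std::vector<std::thread> workers_;
};
```

Only the blocking part leaves the loop thread; the `done` callbacks can touch loop-owned state without locks, which is the property libuv preserves for its thread-pooled file system calls.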

Why Develop a Library Around libuv if libuv is So Great?

libuv is a C library and really “looks” like someone reimagined the POSIX/BSD sockets API to make it asynchronous.  It’s bare-bones, which makes it very flexible.  The difficulty with libuv actually has nothing to do with the library, but with the nature of asynchronous programming: the dreaded “callback hell”.  Coordinating callbacks quickly becomes a nightmare.  C# had System.Threading.Tasks to help solve the problem, then soon followed up with its async/await features.  JavaScript has “promises”.  C++ also has a billion “solutions”, but none I personally liked and none that really felt modern…except Microsoft’s “PPL Tasks”.  This is the closest I’ve seen to JavaScript promises.  It has a very natural syntax.  I wanted this PPL Task library working with libuv.
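For anyone who hasn’t hit it yet, this is roughly what callback hell looks like. The three async functions below are hypothetical stand-ins (not libuv or libuvxx calls) that complete immediately so the example is self-contained; the point is the nesting, which deepens with every dependent step.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical callback-style async API (not libuv's or libuvxx's).
// For this demo each "async" call completes immediately; real code would
// invoke the callback later, from the event loop.
using on_string = std::function<void(std::string)>;

void resolve_async(const std::string& host, const on_string& cb) { cb("10.0.0.1 (" + host + ")"); }
void connect_async(const std::string& addr, const on_string& cb) { cb("socket to " + addr); }
void read_async(const std::string& sock, const on_string& cb) { cb("data from " + sock); }

// Three dependent steps already mean three levels of nesting, and in
// real code each level would also need its own error path.
void fetch(const std::string& host, const on_string& done) {
    resolve_async(host, [done](std::string addr) {
        connect_async(addr, [done](std::string sock) {
            read_async(sock, [done](std::string data) {
                done(data);
            });
        });
    });
}
```

A task/promise library flattens this into a linear chain of continuations, which is what PPL’s `then` provides.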

Microsoft PPL Task and libuv Mashup or “Is libuvxx Just a Redundant Library?”

Microsoft was nice enough to release a cross-platform, Apache-licensed version of the PPL tasks.  It’s contained in the Casablanca SDK.  I know what some may be thinking here: “If Casablanca is already cross-platform and supports PPL tasks, why bother making a NIH library?”  Casablanca does support IOCP file operations on Windows (on Linux, the synchronous calls are run on a thread pool), but it does not have cross-platform socket APIs (though it does have an HTTP client and server) and does NOT support an event loop OUTSIDE a Windows Store application.  I need something that is optimized for a single core but scales up to many cores.  Even though Casablanca is an extremely well-made library, it wasn’t exactly what I was looking for.  Also, working with a well-made, well-tested library didn’t sound as fun 🙂

What Does libuvxx Look Like?

Before going further, I think it would be good to show some of the library’s usage.

/* get the dispatcher for the current thread */
auto dispatcher = event_dispatcher::current_dispatcher();

/* run the event loop*/
dispatcher.run();

If you have ever done any work with WPF, you may find this very similar to the WPF Dispatcher.  You never create a dispatcher object directly; you always obtain it via the static function “event_dispatcher::current_dispatcher()”.  Internally, we keep weak references to all dispatchers and return the one that matches the current thread’s ID.
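One way to get “one dispatcher per thread, held weakly” is a mutex-guarded map from std::thread::id to std::weak_ptr. This is a hedged sketch of the idea only: dispatcher_impl and this current_dispatcher() are hypothetical stand-ins, not libuvxx’s internals, and stale entries for exited threads are never pruned here.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a per-thread dispatcher object.
struct dispatcher_impl {
    std::thread::id owner = std::this_thread::get_id();
};

// Hands out one dispatcher per thread while the registry itself holds
// only weak references, so a dispatcher is destroyed once no caller
// holds it. Illustrative only; expired entries are not cleaned up.
std::shared_ptr<dispatcher_impl> current_dispatcher() {
    static std::mutex mu;
    static std::map<std::thread::id, std::weak_ptr<dispatcher_impl>> registry;

    std::lock_guard<std::mutex> lock(mu);
    auto id = std::this_thread::get_id();
    if (auto existing = registry[id].lock()) {
        return existing;  // reuse the live dispatcher for this thread
    }
    auto created = std::make_shared<dispatcher_impl>();
    registry[id] = created;  // remember it weakly
    return created;
}
```

Calling current_dispatcher() twice on the same thread yields the same object, while another thread gets its own.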

/* get the dispatcher for the current thread */
auto dispatcher = event_dispatcher::current_dispatcher();

/* get all files in all subdirectories */
fs::directory::get_files_async("C:\\Users\\Jeremiah\\", true).
then([](task<fs::directory_entry_result_ptr> result_task)
{
     auto file_list = result_task.get();
}).
then([]
{
     /* do something else */
});

/* run the event loop*/
dispatcher.run();

This next example shows getting a list of all files under a given path and receiving it via a task continuation.  All the continuations here run on the event_dispatcher thread.  How does this all work?  Hell if I know, but I did spend weeks in a debugger modifying the pplx task library, and I got a bit lucky.  If you want to see a fuller example, you can check out this test project.

Seriously, How Do the PPL Tasks Dispatch to libuv? What Changes Were Made?

Even though the Apache 2.0 PPLX Tasks is a well-made lib, it’s a SOB to figure out due to its heavy use of templates.  There may have been better ways of modifying the PPL Tasks, but I figured that if you fork it, you might as well make it your own.  The first order of business was ripping out almost all the WinRT-related stuff.  What we really needed was the task_continuation_context::use_current() ability.  Under WinRT, this continuation context captures a special COM object to use later.  This is not so different in concept from the .NET SynchronizationContext class.  This COM object is closed source, but we can assume it queues something up in the WinRT message pump.  I replaced usage of this COM object with a simple thread ID value for the current thread, if it is an event_dispatcher thread.  When the call needs to be processed, I simply pass the method to execute to the event_dispatcher, using the thread ID to look up the correct dispatcher.
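The thread-ID approach can be sketched in a few lines: a context records std::this_thread::get_id() when it is created, and scheduling a continuation looks up the queue registered for that ID. Everything here (dispatcher_registry, continuation_context, the run-inline fallback) is hypothetical and simplified; the real pplx and libuvxx code paths are considerably more involved.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical per-thread function queues keyed by thread ID. A thread
// becomes a "dispatcher thread" in this sketch by registering a queue.
class dispatcher_registry {
public:
    void register_current_thread() {
        std::lock_guard<std::mutex> lk(mu_);
        queues_[std::this_thread::get_id()];  // creates an empty queue
    }

    // Post fn to the queue owned by `id`; false if `id` has no dispatcher.
    bool post(std::thread::id id, std::function<void()> fn) {
        std::lock_guard<std::mutex> lk(mu_);
        auto it = queues_.find(id);
        if (it == queues_.end()) return false;
        it->second.push_back(std::move(fn));
        return true;
    }

    // Run everything queued for the calling thread (the "event loop" part).
    void drain_current_thread() {
        for (;;) {
            std::function<void()> fn;
            {
                std::lock_guard<std::mutex> lk(mu_);
                auto& q = queues_[std::this_thread::get_id()];
                if (q.empty()) return;
                fn = std::move(q.front());
                q.pop_front();
            }
            fn();  // executed outside the lock
        }
    }

private:
    std::mutex mu_;
    std::map<std::thread::id, std::deque<std::function<void()>>> queues_;
};

// A "use_current"-style context that captures nothing but a thread ID.
struct continuation_context {
    std::thread::id target = std::this_thread::get_id();  // captured at creation

    // Send the continuation to the captured thread's queue, or run it
    // inline if that thread is not a dispatcher thread (a simplification).
    void schedule(dispatcher_registry& reg, std::function<void()> fn) const {
        if (!reg.post(target, fn)) fn();
    }
};
```

A continuation scheduled from a worker thread is queued, not run, until the captured thread drains its queue.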

I have also made many performance improvements, and broke a few things (like stack trace capturing).  Oddly, the vanilla PPLX Tasks dispatch ALL calls to the thread pool first, and under WinRT they are then passed back to the UI thread.  In my tests involving a tight loop of PPL tasks, this was the heaviest operation.  It seemed silly to involve the thread pool at all if the continuation was not task_continuation_context::use_arbitrary().  I modified this to dispatch directly to the event_dispatcher’s function queue and immediately saw a huge improvement.

I also went through a lot of places where things could be std::move’d or passed by reference; my profiler showed quite a bit of copying going on.  I followed this up by reducing the amount of atomic reference counting that was happening.  I may have broken something, but so far my changes appear “stable”.  I also have a more optimized version of create_iterative_task, which cuts out a lot of copying and possibly some reference counting.  My version only exits the loop via an exception, but I may add an exit-by-returning-false option.
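As a general illustration of why moves and references matter here (not the actual pplx changes): every std::shared_ptr copy performs an atomic increment, and later a matching decrement, while a move only transfers ownership and a const reference does not touch the count at all. The payload, holder and peek names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct payload { int value = 0; };  // hypothetical shared state

// Sink parameter: take by value, then std::move into the member, so a
// caller that passes an rvalue pays no reference-count increment.
struct holder {
    std::shared_ptr<payload> p;
    explicit holder(std::shared_ptr<payload> in) : p(std::move(in)) {}
};

// Read-only access needs no ownership: take a const reference and
// avoid the increment/decrement pair entirely.
int peek(const std::shared_ptr<payload>& p) { return p->value; }
```

The use_count() values make the difference visible: passing by const reference leaves the count alone, a copy bumps it, and a move leaves the source empty without bumping it.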

How Far Along is this Project?

Not too far.  Much of my time was spent in the pplx code, reducing overhead as much as possible.  A lot of the APIs were modeled after the .NET BCL, so they should feel semi-familiar.

So far you’ll find:

  • fs::directory – Static functions for querying for files or directories and deleting directories.  Supports recursive delete and reading of contents.
  • fs::path – Helper functions for dealing with filenames (like .NET’s Path)
  • fs::file – A file class for read/write
  • net::stream_socket – A socket client for read / write.  Had to do some funky stuff to make the read_async work well.  Still more to be done.
  • net::dns – Functions dealing with name resolution
  • uvxx::event_dispatcher_timer – Executes a callback on an interval.  Leverages the libuv timer.
  • uvxx::event_dispatcher_object – Much like a DispatcherObject in WPF.

As far as usage goes, the objects are passed around by value or by reference and act like smart pointers.  So to initialize a stream_socket, you’d just type:  stream_socket s;
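The “acts like a smart pointer” behavior is the familiar handle/impl pattern: the public object holds a std::shared_ptr to shared state, so copies refer to the same underlying socket. The sketch below is hypothetical; socket_handle and its members are invented for illustration and are not libuvxx’s stream_socket API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical value-semantics handle: copying the handle shares one
// underlying object, like a smart pointer. Not libuvxx's stream_socket.
class socket_handle {
public:
    // Default construction allocates the shared state, so a plain
    // `socket_handle s;` is a ready-to-use object.
    socket_handle() : impl_(std::make_shared<impl>()) {}

    void set_peer(std::string peer) { impl_->peer = std::move(peer); }
    const std::string& peer() const { return impl_->peer; }

    // True if both handles refer to the same underlying socket state.
    bool same_socket(const socket_handle& other) const { return impl_ == other.impl_; }

private:
    struct impl {
        std::string peer;  // real code would own e.g. a uv_tcp_t here
    };
    std::shared_ptr<impl> impl_;
};
```

Copying a handle is cheap (one reference-count increment), and changes made through any copy are visible through all of them.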

What Operating Systems Does This Support?

I’ve tested on Ubuntu on the BeagleBone Black, Windows, and Mincore on the Galileo.  For the Galileo I had to make some small changes to libuv, as a method was not supported.  The build configuration is in the main MSVC solution.  On Linux, this requires at least GCC 4.8.  I cheated and used VisualGDB and the Linaro cross compiler so I could do all the Linux work in Visual Studio.  A Linux-savvy person should be able to use the .mak files and compile right on Linux.  I’ll save that for another blog post, as this one is getting too long.

-Jer

One comment

  1. Niek

    >> C++ also has a billion “solutions”, but none I personally liked and none that really felt modern…
    What about the *modern* asio approach? You can choose between callbacks, promises/futures and (stackful) coroutines, where I find the latter the most convenient. What’s your take on coroutines (sometimes called fibers)?
