Harder to C++: Member Function Callbacks

Using a class member function as a callback is a possible source of confusion in C++, not least because C++11 brought considerable changes on this point. In this blog post we will see a few ways to do it well, and also mention deprecated facilities. I did not invent these techniques, but it seems helpful to spread the good word.

What is the Problem?

I wanted native C++ classes to report status updates to a XAML user interface. The XAML user interface has a method in its C++ code-behind class that updates a TextBlock, and thus can inform the user about significant events in the program. The idea is to provide the objects of the native classes with a pointer or reference to the method in the code-behind, so they can invoke that method and thus update the log.

How hard could that be? Well, hard enough. So, I turned to the World Wide Web, and tried to find that one good solution. It turns out that C++ did not support callbacks in a very comfortable way until C++11, so easy-to-handle solutions are relatively new, and not easy to find on the World Wide Web. That’s why I wrote this blog post: to increase the probability that people who need to write callbacks in C++ can find a solution. Note that if you need a systematic treatment of the subject you are probably better off reading (parts of) an authoritative book, e.g. “The C++ Standard Library”, by Nicolai M. Josuttis, 2nd ed.

This blog post will not discuss the ins and outs of callbacks in general in any depth. Let’s just say that a callback is a function (or method) that is inserted into an object at runtime, such that the object may call the function whenever its logic dictates. The object doesn’t know the definition of the injected function.

What’s the Solution?

The solution is to exploit a number of features that are new in C++11 and its Standard Template Library. Below we will see them introduced one by one, to settle in the end on the solution that seems most elegant and easy to use to me (hopefully, you will agree with the conclusion). Note that the solutions presented here are simplified samples meant to demonstrate the principle, not the code I use to log events to a display.

The ‘function’ Class

To conveniently create callable objects that are also easy to use, i.e. like standard C++ objects or functions, one uses the std::function class. For the samples in this blog post we will use the following function instantiation:

Variables of type WriteOutFunc are functions that take an int argument and return a string.
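A minimal sketch of such an instantiation, assuming the alias name WriteOutFunc used in the text. The free function WriteDecimal and the helper CallThroughAlias are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Callable type used in the samples: takes an int, returns a std::string.
typedef std::function<std::string(int)> WriteOutFunc;

// Hypothetical free function with a matching signature.
std::string WriteDecimal(int number) {
    return std::to_string(number);
}

// A WriteOutFunc can hold any callable with a compatible signature
// and is then invoked like an ordinary function.
std::string CallThroughAlias(int number) {
    WriteOutFunc writer = WriteDecimal;
    assert(writer);  // a non-empty std::function converts to true
    return writer(number);
}
```

The same variable could just as well hold a lambda or a bound member function, which is what makes std::function convenient for callbacks.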

Below follows the definition of a simple class. Instantiations of this class will receive a function of type WriteOutFunc , and they will invoke (call) this function.

The method WriteForMe is there just for demonstration purposes. In real life the internal logic of a class dictates when the callback is invoked.
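A Caller along these lines could look roughly as follows. This is a sketch, not the original listing: the member name m_writer and the constructor signature are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

typedef std::function<std::string(int)> WriteOutFunc;

// Receives a callback at construction time and invokes it on demand.
class Caller {
public:
    explicit Caller(WriteOutFunc writer) : m_writer(std::move(writer)) {
        assert(m_writer);  // calling an empty std::function would throw
    }

    // Demonstration hook: forwards the argument to the injected callback.
    std::string WriteForMe(int number) const {
        return m_writer(number);
    }

private:
    WriteOutFunc m_writer;  // assumed member name
};
```

The Caller knows only the signature of the callback, not its definition, which is exactly the decoupling described above.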

As a first use we instantiate a Caller, while injecting a lambda expression that implements a trivial Roman numeral writer: it just writes the Roman numeral for 3 (III). We do not yet inject a class member function, but just a lambda that is local to the _tmain function.

So what happens here is that we instantiate a Caller object with a simple lambda that returns “III”. Then we call the Caller’s method WriteForMe with the argument 3. WriteForMe invokes the lambda we injected, returns its result, and that result is finally written to the standard output stream. The output is indeed “III” (without the quotes). All pretty standard.
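The steps just described can be sketched as follows. The Caller shown is the assumed minimal version, and RunLambdaDemo is a hypothetical helper that stands in for the body of _tmain; it returns the string instead of printing it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

typedef std::function<std::string(int)> WriteOutFunc;

class Caller {
public:
    explicit Caller(WriteOutFunc writer) : m_writer(std::move(writer)) {
        assert(m_writer);
    }
    std::string WriteForMe(int number) const { return m_writer(number); }

private:
    WriteOutFunc m_writer;
};

// Injects a trivial "Roman numeral writer" that ignores its argument.
std::string RunLambdaDemo() {
    Caller caller([](int) { return std::string("III"); });
    return caller.WriteForMe(3);  // yields "III"
}
```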

The ‘bind’ Function

Next we define a Callee class that instantiates a Caller object and injects a member function, so that the Caller object may invoke the method on this specific Callee object:

The important stuff is in the constructor. Since a member function is always called on a specific object, we ‘bind’ the object (this) to the member function WriteAsString, and assign the result to a WriteOutFunc variable m_numWriter. So, a call to m_numWriter(3) for a Callee my_callee is always my_callee.WriteAsString(3). Once we have bound the Callee object and member function, we create a Caller object with the resulting WriteOutFunc object m_numWriter. The WriteForMe method is there again just for demonstration purposes.
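A sketch of a Callee built this way, using std::bind with std::placeholders::_1 for the int argument. The offset logic (add 4 on every call) and the argument 12 are assumptions, chosen only so that the callback visibly changes the Callee's state; RunBindDemo is a hypothetical helper.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

typedef std::function<std::string(int)> WriteOutFunc;

class Caller {
public:
    explicit Caller(WriteOutFunc writer) : m_writer(std::move(writer)) {
        assert(m_writer);
    }
    std::string WriteForMe(int number) const { return m_writer(number); }

private:
    WriteOutFunc m_writer;
};

class Callee {
public:
    Callee()
        : m_offset(0),
          // Bind 'this' to the member function; _1 is the int argument.
          m_numWriter(std::bind(&Callee::WriteAsString, this,
                                std::placeholders::_1)),
          m_caller(m_numWriter) {}

    // The callback stores 'this', so copying would leave a stale pointer.
    Callee(const Callee&) = delete;
    Callee& operator=(const Callee&) = delete;

    std::string WriteForMe(int number) { return m_caller.WriteForMe(number); }

private:
    // Assumed behaviour: each call raises the offset by 4 before writing.
    std::string WriteAsString(int number) {
        m_offset += 4;
        return std::to_string(number + m_offset);
    }

    int m_offset;
    WriteOutFunc m_numWriter;
    Caller m_caller;
};

std::string RunBindDemo() {
    Callee callee;
    return callee.WriteForMe(12);  // 12 plus the updated offset 4: "16"
}
```

Members are initialized in declaration order, so m_numWriter is ready before m_caller copies it.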

The Callee class can be used as follows:

The output is 16 (not 12 🙂 ). I’ve added the offset, which gets modified when the callback is invoked by the Caller, and is then added to the argument. This is to show that the Callee object can have its state changed by the Caller object by means of the callback. So, the Callee class demonstrates a solution to the problem of how to create callbacks for class member functions. It’s a good solution, since it is built from STL facilities. It’s an elegant solution as well: it requires just a single extra line of code (bind) compared to the non-member function case.

Let’s halt at this point for a moment, and discuss deprecated STL facilities. The point is that member function callbacks were, of course, already possible in C++ and the STL before C++11, but they were much more complex, and required much more work. See e.g. sections 18.4.4.2 and 18.4.4.3 in Bjarne Stroustrup’s “The C++ Programming Language, Special Edition”. Those sections lay out constructs that depend on templates such as mem_fun and bind2nd. The point: these templates are deprecated as of C++11 (see e.g. ‘Josuttis’, page 497), and they were removed from the standard in C++17.

In Comes the Lambda Expression

OK, we already have a neat solution to the problem posed. However, we can take it a bit further. We can introduce lambda expressions into the picture. With lambda expressions we can rewrite our Callee class as follows:
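A sketch of what such a lambda-based Callee could look like: the lambda captures this and forwards to the member function, so neither std::bind nor a separate function variable is needed. As before, the offset logic is an assumption made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

typedef std::function<std::string(int)> WriteOutFunc;

class Caller {
public:
    explicit Caller(WriteOutFunc writer) : m_writer(std::move(writer)) {
        assert(m_writer);
    }
    std::string WriteForMe(int number) const { return m_writer(number); }

private:
    WriteOutFunc m_writer;
};

class Callee {
public:
    Callee()
        : m_offset(0),
          // The lambda captures 'this' and calls the member function.
          m_caller([this](int number) { return WriteAsString(number); }) {}

    // The lambda stores 'this', so copying would leave a stale pointer.
    Callee(const Callee&) = delete;
    Callee& operator=(const Callee&) = delete;

    std::string WriteForMe(int number) { return m_caller.WriteForMe(number); }

private:
    // Assumed behaviour: each call raises the offset by 4 before writing.
    std::string WriteAsString(int number) {
        m_offset += 4;
        return std::to_string(number + m_offset);
    }

    int m_offset;
    Caller m_caller;
};
```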

Well, this is considerably shorter, and works just as well. Of course, if you want to inject the same functionality into several objects, it is better to first create a named lambda expression, like so:
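A sketch of the named-lambda variant, here with a local variable instead of a Callee so that it stays short. The names writeWithOffset and RunNamedLambdaDemo are hypothetical. Both Caller objects receive a copy of the same lambda, and because the lambda captures offset by reference, both copies update the same state.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

typedef std::function<std::string(int)> WriteOutFunc;

class Caller {
public:
    explicit Caller(WriteOutFunc writer) : m_writer(std::move(writer)) {
        assert(m_writer);
    }
    std::string WriteForMe(int number) const { return m_writer(number); }

private:
    WriteOutFunc m_writer;
};

std::string RunNamedLambdaDemo() {
    int offset = 0;

    // Named lambda: defined once, injected into several objects.
    auto writeWithOffset = [&offset](int number) {
        offset += 4;
        return std::to_string(number + offset);
    };

    Caller first(writeWithOffset);
    Caller second(writeWithOffset);

    // Separate statements keep the order of the two calls well defined.
    const std::string a = first.WriteForMe(12);   // offset becomes 4
    const std::string b = second.WriteForMe(12);  // offset becomes 8
    return a + " " + b;                           // "16 20"
}
```

The captured variable must outlive every object that holds the lambda, which is the case here because everything lives in the same function.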

So, that’s it. And isn’t it a nice solution?

Harder to C++: Aligned Memory Allocation

Using the DirectX XMMATRIX structure may under certain conditions crash your program. Overloading the new and delete operators in a specific way solves this problem, as does the STL aligned_storage class. This blog post integrates information from several sources (books, official documentation, forums) to provide an overview of possible solutions.

What is the XMMATRIX structure?

DirectX 11 contains a high performance math library, called DirectXMath, specifically designed to handle vectors of up to 4 elements and matrices of up to 4 x 4 elements as fast as modern processors (implementing SSE2) can process them. XMVECTOR and XMMATRIX are the central data structures in the library; you use them all through your code when programming with DirectXMath.

In code you typically find something like

The function XMMatrixIdentity is also part of the library, along with a host of other functions, and generates an identity matrix. For the uninitiated: multiplying a matrix A by an identity matrix of the same dimensions is like multiplying an integer by 1.

So, do we want to use XMMATRIX, although there are other, similar data structures in the library? Yes, we do. We want the performance, and the other data structures don’t offer the same performance, or the same compatibility with functions like XMMatrixIdentity.

What is the Problem, Exactly

Having decided we want to use XMVECTOR and XMMATRIX we will have to deal with the requirements for their use, which is that these structures need to be 16 byte aligned in memory (RAM). To be 16 byte aligned in memory means that the memory address of the data structure is a multiple of 16. The alignment requirement entails that any data structure that contains XMVECTOR or XMMATRIX also needs to be 16 byte aligned, etc. (recursively).
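To make the requirement concrete, here is a small portable sketch. Matrix4x4 is a hypothetical stand-in for XMMATRIX (the real type comes from DirectXMath), declared with alignas(16); Holder and IsAligned are likewise names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for XMMATRIX: 16 floats that must be 16-byte aligned.
struct alignas(16) Matrix4x4 {
    float m[4][4];
};

// A structure that contains the matrix inherits its alignment requirement.
struct Holder {
    char tag;
    Matrix4x4 matrix;
};

static_assert(alignof(Matrix4x4) == 16, "stand-in must be 16-byte aligned");
static_assert(alignof(Holder) == 16, "containing type inherits the alignment");

// True when the address is a multiple of the given alignment.
bool IsAligned(const void* address, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(address) % alignment == 0;
}
```

For objects on the stack or in static storage the compiler takes care of this; the heap is where it can go wrong, as the scenario below shows.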

In many scenarios in Windows 8 this is not a problem; you will not notice that this requirement exists. However, I happened to stumble upon a scenario in which the requirement does come into play, and it crashes my program.

The scenario is this: in a Windows Store application (henceforth WinRT application) define a native class (pure C++, as opposed to C++/CX) holding an XMMATRIX object. This class’ constructor creates an XMMATRIX matrix and assigns an identity matrix to it using the XMMatrixIdentity function. In release builds (but not in debug builds) instantiating this class on the heap (not on the stack, and not as a static variable) will crash the program every once in a while (!). So, for testing purposes I surrounded creation and destruction of an object of my class with a for loop. Within 10 iterations the program then practically always crashes.

The class looks like this.

And we use it in MainPage.cpp (this is about where you start programming a WinRT application) like this:

The error message looks like this; location 0xFFFFFFFF is typical for this error:

What is the Solution?

Of course, I could not be the only one who has encountered this problem, and indeed, a number of other people also got stuck. It turns out that people who come to a forum with a hard problem definitely find a lot of good intentions, though sometimes founded on arrogance. Alas, they do not often find authoritative knowledge of C++ and the Standard Template Library, or even a clear understanding of their problem. Not that I myself am such an expert, but it is painful to browse through the numerous accounts of someone who went to a forum in despair with a problem he couldn’t solve or even understand, and subsequently had to fend off several people trying to push very bad solutions onto him, who typically ended up fighting among themselves over which of them was really knowledgeable. It makes you think twice before asking for help.

Nevertheless, I managed to work my way through the debris, and find some valuable information. In this blog post we will examine three solutions from various sources:

  1. Use of _aligned_malloc and placement new by the MainPage class. This leaves the DXMathTrial class unchanged.
  2. Overloading the new and delete operators of the DXMathTrial class.
  3. Creating an aligned typedef with the aligned_storage class.

From the DirectXMath documentation we learn that we can overload the new and delete operators of a class with XMMATRIX / XMVECTOR members if we want to allocate 16 byte aligned objects of that class on the heap. The documentation also suggests the use of _aligned_malloc, see below. We can combine that nicely with placement new, see e.g. section 10.4.11 of the Special Edition of the good old C++ manual by Stroustrup. The latter idiom refers to a standard overload of the new operator that takes a memory address as an argument.

Placement new

What we do in this scenario, is we first allocate a correctly aligned block of memory with _aligned_malloc, then call placement new to construct an object of the DXMathTrial class at the obtained and aligned address. To destroy an object we first call the destructor of our DXMathTrial object, then free the allocated memory with _aligned_free. See the code below.
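A portable sketch of this sequence. Everything outside _aligned_malloc, _aligned_free and placement new is an assumption: Matrix4x4 is a hypothetical stand-in for XMMATRIX, the helpers AlignedAlloc and AlignedFree wrap the Microsoft CRT functions on Visual C++ and fall back to POSIX posix_memalign elsewhere, and CreateTrial / DestroyTrial stand in for the code that would live in MainPage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical stand-in for XMMATRIX.
struct alignas(16) Matrix4x4 {
    float m[4][4];
};

class DXMathTrial {
public:
    DXMathTrial() : m_matrix() {  // zero first, then set the diagonal
        for (int i = 0; i < 4; ++i) m_matrix.m[i][i] = 1.0f;
    }
    float Trace() const {
        return m_matrix.m[0][0] + m_matrix.m[1][1] +
               m_matrix.m[2][2] + m_matrix.m[3][3];
    }

private:
    Matrix4x4 m_matrix;
};

// Assumed wrappers: Microsoft CRT on Visual C++, POSIX elsewhere.
void* AlignedAlloc(std::size_t size, std::size_t alignment) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* memory = nullptr;
    return posix_memalign(&memory, alignment, size) == 0 ? memory : nullptr;
#endif
}

void AlignedFree(void* memory) {
#if defined(_MSC_VER)
    _aligned_free(memory);
#else
    free(memory);
#endif
}

// Step 1: obtain aligned memory. Step 2: construct the object in it.
DXMathTrial* CreateTrial() {
    void* memory = AlignedAlloc(sizeof(DXMathTrial), 16);
    if (memory == nullptr) throw std::bad_alloc();
    return new (memory) DXMathTrial();  // placement new
}

// Step 1: run the destructor explicitly. Step 2: release the memory.
void DestroyTrial(DXMathTrial* trial) {
    if (trial == nullptr) return;
    trial->~DXMathTrial();
    AlignedFree(trial);
}
```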

This solution works, and we have fine control over it: if we set the alignment to 8, we get the errors right back again. Nevertheless, this solution has some drawbacks, which are matters of style or good taste.

  1. Although the DXMathTrial holds the XMMATRIX, the MainPage object has to do all the work to get the instantiation right. This is not really in the OO spirit, and it doesn’t seem fair. The DXMathTrial class should hold the code that makes instantiation of its objects easy and natural.
  2. It now takes much more code to create and delete an object: 7 lines instead of 2.

Overloading new and delete

To alleviate the drawbacks of the above solution, one can overload the new and delete operators. This can be done globally (no!), or just for the relevant class.

But how does one overload new and delete? That is not straightforward, and I had never done it before. Information about overloading new and delete in the context of memory alignment on the heap can be found e.g. here. Funny how the contributors do not mention XMMATRIX / XMVECTOR at all. So, you cannot find this solution on the internet using search terms that describe your problem only; you will have to describe the solution!

Overloading new and delete is treated well in e.g. S.B. Lippman et al., C++ Primer, Fifth Edition. It boils down to allocating memory in a user-defined new operator overload, and de-allocating it in a user-defined delete overload. In this case we will do the (de-)allocation with the ‘aligned’ variants, which gives us the following definition for the DXMathTrial class.
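A sketch of such a class, again under assumptions: Matrix4x4 is a hypothetical stand-in for XMMATRIX, and AlignedAlloc / AlignedFree are made-up wrappers that call _aligned_malloc / _aligned_free on Visual C++ and POSIX functions elsewhere. Only the plain (non-array, throwing) forms of new and delete are overloaded here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical stand-in for XMMATRIX.
struct alignas(16) Matrix4x4 {
    float m[4][4];
};

// Assumed wrappers: Microsoft CRT on Visual C++, POSIX elsewhere.
void* AlignedAlloc(std::size_t size, std::size_t alignment) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* memory = nullptr;
    return posix_memalign(&memory, alignment, size) == 0 ? memory : nullptr;
#endif
}

void AlignedFree(void* memory) {
#if defined(_MSC_VER)
    _aligned_free(memory);
#else
    free(memory);
#endif
}

class DXMathTrial {
public:
    DXMathTrial() : m_matrix() {
        for (int i = 0; i < 4; ++i) m_matrix.m[i][i] = 1.0f;
    }

    // Class-specific allocation: every 'new DXMathTrial' comes through here.
    static void* operator new(std::size_t size) {
        void* memory = AlignedAlloc(size, 16);
        if (memory == nullptr) throw std::bad_alloc();
        return memory;
    }

    // Must release with the function that matches the allocation above.
    static void operator delete(void* memory) noexcept {
        AlignedFree(memory);
    }

    float Trace() const {
        return m_matrix.m[0][0] + m_matrix.m[1][1] +
               m_matrix.m[2][2] + m_matrix.m[3][3];
    }

private:
    Matrix4x4 m_matrix;
};
```

With this in place the calling code can go back to a plain new and delete, and the alignment logic lives in the class that needs it.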

Testing with our initial definition of the MainPage class confirms that this is a solution. The drawbacks of the first solution are now gone, but we now see other drawbacks:

  • _aligned_malloc and _aligned_free are Microsoft specific: they are members of the VC++ CRT. We would prefer a solution that is completely general, one that is pure standard C++ & STL.
  • What I didn’t do here is overload all new and delete operators, but that really *is* required. That would make 8 overloads in all (see e.g. section 19.1.1 in S.B. Lippman et al., C++ Primer, Fifth Edition). Code bloat!

The aligned_storage class

The Standard Template Library contains the aligned_storage class. It is a template that takes two value parameters: the size of the memory to be allocated, and the required alignment. To use it we add one line of code (!) to the DXMathTrial.h file to define an aligned version of the DXMathTrial class, which we will call DXMathTrialA (appending the ‘A’ may become a naming convention). We adapt our MainPage code accordingly. This gives us the following class:

With corresponding usage:
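A sketch of how such an aligned typedef might be defined and used, under several assumptions. DXMathTrial is reduced to a hypothetical stand-in holding a plain float array, so that its natural alignment is lower than 16. Note that std::aligned_storage<...>::type only provides raw, uninitialized storage with the requested alignment; constructing the object is a separate step, done here with placement new on stack storage. Also note that before C++17 a plain new expression is not required to honour alignments stricter than the default, and that std::aligned_storage itself is deprecated as of C++23.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical stand-in: a plain float array with no special alignment.
class DXMathTrial {
public:
    DXMathTrial() {
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                m_matrix[i][j] = (i == j) ? 1.0f : 0.0f;
    }
    float Trace() const {
        return m_matrix[0][0] + m_matrix[1][1] +
               m_matrix[2][2] + m_matrix[3][3];
    }

private:
    float m_matrix[4][4];
};

// The one extra line: raw storage of the right size, aligned to 16 bytes.
typedef std::aligned_storage<sizeof(DXMathTrial), 16>::type DXMathTrialA;

static_assert(alignof(DXMathTrialA) == 16, "storage must be 16-byte aligned");
static_assert(sizeof(DXMathTrialA) >= sizeof(DXMathTrial), "storage too small");

// Constructs a DXMathTrial inside aligned storage and reports its alignment.
bool ConstructInAlignedStorage() {
    DXMathTrialA storage;                               // no object yet
    DXMathTrial* trial = new (&storage) DXMathTrial();  // construct in place
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(trial) % 16 == 0;
    assert(trial->Trace() == 4.0f);
    trial->~DXMathTrial();                              // destroy explicitly
    return aligned;
}
```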

And now the problem has gone. Man, what a solution! …

BUT, aligned_storage requires the type to be aligned to be a POD type (see here for an explanation). The class above is at the edge of being a POD type. If you e.g. add a member method that sets m_matrix to the identity matrix, like so:

the problems are back again. So, we settle on overloading new and delete.

Inheritance and Membership Relations

The final question we would like to see answered is to what extent the solution involving overloaded new and delete operators propagates through membership and inheritance relations. To that end we define two classes, A_Base and A_Child, which both contain an XMMATRIX member and both assign the identity matrix to this member in a dedicated method. A pointer to class A_Child will be a member of the DXMathTrial class, and allocation will be on the heap. The classes look like this.

Note that the definitions of the overloaded operators have been simplified (past sound programming practice). The usage of the DXMathTrial class is unchanged. The result is that no exceptions are thrown. So, the new and delete operators of member classes also need to be overloaded, but it suffices to overload these operators in the base class. This, then, is the solution.
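The inheritance part can be sketched as follows. The class names come from the text; everything else is an assumption: Matrix4x4 is a hypothetical stand-in for XMMATRIX, the allocation wrappers fall back to POSIX outside Visual C++, and g_lastRequestedSize exists only to show that allocating an A_Child goes through the operator defined in A_Base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

struct alignas(16) Matrix4x4 { float m[4][4]; };  // stand-in for XMMATRIX

void* AlignedAlloc(std::size_t size, std::size_t alignment) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* memory = nullptr;
    return posix_memalign(&memory, alignment, size) == 0 ? memory : nullptr;
#endif
}

void AlignedFree(void* memory) {
#if defined(_MSC_VER)
    _aligned_free(memory);
#else
    free(memory);
#endif
}

// Demonstration only: size of the most recent request to A_Base's new.
std::size_t g_lastRequestedSize = 0;

class A_Base {
public:
    static void* operator new(std::size_t size) {
        g_lastRequestedSize = size;
        void* memory = AlignedAlloc(size, 16);
        if (memory == nullptr) throw std::bad_alloc();
        return memory;
    }
    static void operator delete(void* memory) noexcept { AlignedFree(memory); }

    void SetBaseIdentity() { SetIdentity(m_baseMatrix); }

protected:
    static void SetIdentity(Matrix4x4& matrix) {
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                matrix.m[i][j] = (i == j) ? 1.0f : 0.0f;
    }

private:
    Matrix4x4 m_baseMatrix;
};

// Declares no operators of its own: it inherits new and delete from A_Base.
class A_Child : public A_Base {
public:
    void SetChildIdentity() { SetIdentity(m_childMatrix); }

private:
    Matrix4x4 m_childMatrix;
};

class DXMathTrial {
public:
    DXMathTrial() : m_child(new A_Child()) {  // uses A_Base::operator new
        m_child->SetBaseIdentity();
        m_child->SetChildIdentity();
    }
    ~DXMathTrial() { delete m_child; }        // uses A_Base::operator delete
    DXMathTrial(const DXMathTrial&) = delete;
    DXMathTrial& operator=(const DXMathTrial&) = delete;

    const A_Child* Child() const { return m_child; }

private:
    A_Child* m_child;
};
```

The size passed to the inherited operator is that of the derived class, so a single overload in the base class covers the whole hierarchy.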