Harder to C++: Aligned Memory Allocation

Posted: December 23, 2012 in Harder to C++

Using the DirectX XMMATRIX structure may, under certain conditions, crash your program. Overloading the new and delete operators in a specific way solves this problem, as does, within limits, the STL aligned_storage class. This blog post integrates information from several sources (books, official documentation, forums) to provide an overview of possible solutions.

What is the XMMATRIX structure?

DirectX 11 contains a high-performance math library, called DirectXMath, specifically designed to handle vectors of up to 4 elements and matrices of up to 4 x 4 elements as fast as modern processors (implementing SSE2) can process them. XMVECTOR and XMMATRIX are the central data structures in the library; you use them all through your code when programming with DirectXMath.

In code you typically find something like

The function XMMatrixIdentity is also part of the library, along with a host of other functions, and generates an identity matrix. For the uninitiated: multiplying a matrix A by an identity matrix of the same dimensions is like multiplying an integer by 1; the result is A again.
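To make the identity property concrete without the Windows SDK, here is a minimal sketch. Mat4, MakeIdentity, Multiply and CheckIdentityProperty are hypothetical stand-ins invented for this illustration; they are not DirectXMath types or functions, and the product is plain scalar code rather than SSE2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4 x 4 float matrix; this is NOT DirectXMath's XMMATRIX.
struct Mat4 {
    std::array<float, 16> v{};  // row-major storage, zero-initialized

    float& at(std::size_t row, std::size_t col) { return v[row * 4 + col]; }
    float at(std::size_t row, std::size_t col) const { return v[row * 4 + col]; }
};

// Plays the role XMMatrixIdentity plays in the library: ones on the diagonal.
inline Mat4 MakeIdentity() {
    Mat4 m;
    for (std::size_t i = 0; i < 4; ++i) m.at(i, i) = 1.0f;
    return m;
}

// Plain scalar matrix product, enough to show the identity property.
inline Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 out;
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            for (std::size_t k = 0; k < 4; ++k)
                out.at(r, c) += a.at(r, k) * b.at(k, c);
    return out;
}

// A * I and I * A both give A back, just as n * 1 == n for integers.
inline void CheckIdentityProperty(const Mat4& a) {
    assert(Multiply(a, MakeIdentity()).v == a.v);
    assert(Multiply(MakeIdentity(), a).v == a.v);
}
```

Calling CheckIdentityProperty with any matrix of finite values should pass, since multiplying by 1 and adding zeros is exact in floating point.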

So, do we want to use XMMATRIX, although there are other, similar data structures in the library? Yes, we do. We want the performance; the other data structures don’t offer the same performance, or the same compatibility with functions like XMMatrixIdentity.

What is the Problem, Exactly

Having decided we want to use XMVECTOR and XMMATRIX, we have to deal with the requirement for their use: these structures need to be 16 byte aligned in memory (RAM). To be 16 byte aligned in memory means that the memory address of the data structure is a multiple of 16. The alignment requirement entails that any data structure that contains an XMVECTOR or XMMATRIX also needs to be 16 byte aligned, and so on, recursively.
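The "multiple of 16" definition and the recursive propagation can be checked in a few lines. This is a sketch only: Vec4 and Holder are hypothetical stand-ins (not DirectXMath types), and IsAligned is a helper name made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the address held by p is a multiple of `alignment` bytes.
inline bool IsAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical stand-in for a SIMD-backed type such as XMVECTOR.
struct alignas(16) Vec4 { float x, y, z, w; };

// A structure that merely contains a Vec4 member.
struct Holder {
    char tag;
    Vec4 v;
};

static_assert(alignof(Vec4) == 16, "Vec4 is declared 16 byte aligned");
static_assert(alignof(Holder) == 16, "the requirement propagates to the enclosing type");

// Automatic (stack) objects get the alignment their type asks for.
inline void CheckStackObjects() {
    Vec4 v = {};
    Holder h = {};
    assert(IsAligned(&v, 16));
    assert(IsAligned(&h, 16) && IsAligned(&h.v, 16));
}
```

The compiler honours alignof for stack and static objects; the heap is where the guarantee can get lost, which is the case examined in the rest of this post.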

In many scenarios in Windows 8 this is not a problem; you will not even notice that this requirement exists. However, I happened to stumble upon a scenario in which the requirement does come into play, and it crashes my program.

The scenario is this: in a Windows Store application (henceforth WinRT application), define a native class (pure C++, as opposed to C++/CX) holding an XMMATRIX object. This class’ constructor creates an XMMATRIX matrix and assigns an identity matrix to it using the XMMatrixIdentity function. In release builds (but not in debug builds), instantiating this class on the heap (not on the stack, and not as a static variable) will crash the program every once in a while (!). So, for testing purposes I surrounded creation and destruction of an object of my class with a for loop. Within 10 iterations the program then practically always crashes.

The class looks like this.

And we use it in MainPage.cpp (this is about where you start programming a WinRT application) like this:
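What follows is not the original listing but a minimal sketch of what such a class and test loop might look like. It assumes a hypothetical stand-in Mat4 in place of XMMATRIX and MakeIdentity in place of XMMatrixIdentity, so that it compiles without the Windows SDK; RunHeapLoop is likewise a made-up name for the loop that would live in MainPage.cpp.

```cpp
#include <cassert>

// Hypothetical stand-in for XMMATRIX: 16 floats with a 16 byte alignment requirement.
struct alignas(16) Mat4 { float m[4][4]; };

// Stand-in for XMMatrixIdentity.
inline Mat4 MakeIdentity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Native class holding a matrix member, initialized to identity in the constructor.
class DXMathTrial {
public:
    DXMathTrial() : m_matrix(MakeIdentity()) {}
    const Mat4& Matrix() const { return m_matrix; }

private:
    Mat4 m_matrix;
};

// Create and destroy the object on the heap repeatedly; returns completed iterations.
// Plain new gives no 16 byte guarantee on every platform, which is the crux of the problem.
inline int RunHeapLoop(int iterations) {
    int completed = 0;
    for (int i = 0; i < iterations; ++i) {
        DXMathTrial* trial = new DXMathTrial();
        assert(trial->Matrix().m[0][0] == 1.0f);
        delete trial;
        ++completed;
    }
    return completed;
}
```

With scalar stand-in code the loop simply runs to completion; the crash described here depends on aligned SSE loads and stores hitting an address that the default heap allocator did not align to 16 bytes.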

The error message looks like this; location 0xFFFFFFFF is typical for this error:

What is the Solution?

Of course, I could not be the only one who has encountered this problem, and indeed, a number of other people also got stuck. It turns out that people who come to a forum with a hard problem definitely find a lot of good intentions, though sometimes founded on arrogance. Alas, they do not that often find authoritative knowledge of C++ and the Standard Template Library, or even a clear understanding of their problem. Not that I myself am such an expert, but it is painful to browse through the numerous accounts that describe how a person went to the forum in despair with a problem he couldn’t solve or even understand, and subsequently had to fend off several guys who tried to push very bad solutions onto him, and who typically ended up fighting among each other over which of them is really knowledgeable. It makes you think twice before asking for help.

Nevertheless, I managed to work my way through the debris, and find some valuable information. In this blog post we will examine three solutions from various sources:

  1. Use of _aligned_malloc and placement new by the MainPage class. This leaves the DXMathTrial class unchanged.
  2. Overloading the new and delete operators of the DXMathTrial class.
  3. Creating an aligned typedef with the aligned_storage class.

From the DirectXMath documentation we learn that we can overload the new and delete operators if we want to allocate, on the heap, 16 byte aligned objects of a class with XMMATRIX / XMVECTOR members. The documentation also suggests the use of _aligned_malloc, see below. We can combine that nicely with placement new, see e.g. section 10.4.11 of the Special Edition of the good old C++ manual by Stroustrup (The C++ Programming Language). The latter idiom refers to a standard overload of the new operator that takes a memory address as an argument.

Placement new

In this scenario we first allocate a correctly aligned block of memory with _aligned_malloc, then call placement new to construct an object of the DXMathTrial class at the obtained, aligned address. To destroy an object we first call the destructor of our DXMathTrial object, then free the allocated memory with _aligned_free. See the code below.
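The allocate, construct, destroy, free sequence can be sketched as follows. This is an illustration under assumptions, not the original listing: Mat4 and MakeIdentity are hypothetical stand-ins for XMMATRIX and XMMatrixIdentity, and AlignedAlloc, AlignedFree, CreateTrial and DestroyTrial are helper names invented here. On Visual C++ the helpers call _aligned_malloc and _aligned_free as described above; elsewhere they fall back to posix_memalign and free so the sketch also builds on POSIX systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>  // posix_memalign, free
#endif

// Hypothetical stand-in for XMMATRIX.
struct alignas(16) Mat4 { float m[4][4]; };

inline Mat4 MakeIdentity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

class DXMathTrial {
public:
    DXMathTrial() : m_matrix(MakeIdentity()) {}
    const Mat4& Matrix() const { return m_matrix; }
private:
    Mat4 m_matrix;
};

// Returns nullptr on failure, like _aligned_malloc.
inline void* AlignedAlloc(std::size_t size, std::size_t alignment) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

inline void AlignedFree(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

// Step 1: aligned raw memory. Step 2: placement new constructs the object in it.
inline DXMathTrial* CreateTrial() {
    void* raw = AlignedAlloc(sizeof(DXMathTrial), 16);
    if (raw == nullptr) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(raw) % 16 == 0);
    try {
        return new (raw) DXMathTrial();
    } catch (...) {
        AlignedFree(raw);  // do not leak the block if the constructor throws
        throw;
    }
}

// Step 3: explicit destructor call. Step 4: release the aligned block.
inline void DestroyTrial(DXMathTrial* trial) {
    if (trial == nullptr) return;
    trial->~DXMathTrial();
    AlignedFree(trial);
}
```

Wrapping the steps in two helper functions is a choice made for this sketch; written out inline at the call site, the same steps account for the extra lines of code discussed next.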

This solution works, and we have fine control over it: if we set the alignment to 8, we get the errors right back again. Nevertheless, this solution has some drawbacks, which are a matter of style or good taste:

  1. Although the DXMathTrial class holds the XMMATRIX, the MainPage object has to do all the work to get the instantiation right. This is not in the OO spirit; it doesn’t seem fair. The DXMathTrial class should hold the code to make instantiation of its objects easy and natural.
  2. It now takes much more code to create and delete an object: 7 lines instead of 2.

Overloading new and delete

To alleviate the drawbacks of the above solution, one can overload the new and delete operators. This can be done globally (no!), or just for the relevant class.

But how does one overload new and delete? That is not straightforward, and I had never done it before. Information about overloading new and delete in the context of memory alignment on the heap can be found e.g. here. Funny how the contributors do not mention XMMATRIX / XMVECTOR at all. So, you cannot find this solution on the internet using search terms that describe your problem only; you will have to describe the solution!

Overloading new and delete is well treated in e.g. S.B. Lippman et al.: C++ Primer, Fifth Edition. It boils down to allocating memory in a user defined new operator overload, and de-allocating it in a user defined delete overload. In this case we will do the (de-)allocation with the ‘aligned’ variants, which gives us the following definition for the DXMathTrial class.
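A minimal sketch of such a class definition, not the original listing: Mat4 and MakeIdentity are hypothetical stand-ins for XMMATRIX and XMMatrixIdentity, and the non-Microsoft branch (posix_memalign and free) is an addition so the sketch also builds outside Visual C++. Only the plain scalar forms of new and delete are overloaded here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>  // posix_memalign, free
#endif

// Hypothetical stand-in for XMMATRIX.
struct alignas(16) Mat4 { float m[4][4]; };

inline Mat4 MakeIdentity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

class DXMathTrial {
public:
    DXMathTrial() : m_matrix(MakeIdentity()) {}
    const Mat4& Matrix() const { return m_matrix; }

    // Class-level overload: every `new DXMathTrial` now gets 16 byte aligned memory.
    static void* operator new(std::size_t size) {
        void* p = nullptr;
#if defined(_MSC_VER)
        p = _aligned_malloc(size, 16);
#else
        if (posix_memalign(&p, 16, size) != 0) p = nullptr;
#endif
        if (p == nullptr) throw std::bad_alloc();
        assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
        return p;
    }

    // Must release with the function that matches the allocator used above.
    static void operator delete(void* p) noexcept {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        free(p);
#endif
    }

private:
    Mat4 m_matrix;
};
```

The calling code stays at `DXMathTrial* t = new DXMathTrial(); ... delete t;`, so the alignment logic lives with the class that needs it.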

Testing with our initial definition of the MainPage class confirms that this is a solution. The drawbacks of the first solution are now gone, but we now see other drawbacks:

  • _aligned_malloc and _aligned_free are Microsoft specific; they are members of the VC++ CRT. We would prefer a solution that is completely general, one that is pure standard C++ & STL.
  • What I didn’t do here is overload all new and delete operators, but that really *is* required. That would make 8 overloads in all (see e.g. section 19.1.1 in S.B. Lippman et al.: C++ Primer, Fifth Edition). Code bloat!
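One way to contain the bloat from the eight overloads is to write them once in a small base class and derive from it. The sketch below is an assumption about how that could look, not code from the sources cited: Aligned16, Particle, IsAligned16 and the detail helpers are hypothetical names, and the non-Microsoft branch uses posix_memalign and free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>  // posix_memalign, free
#endif

namespace detail {
inline void* Alloc16(std::size_t size) noexcept {
#if defined(_MSC_VER)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    return posix_memalign(&p, 16, size) == 0 ? p : nullptr;
#endif
}
inline void* Alloc16OrThrow(std::size_t size) {
    void* p = Alloc16(size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
inline void Free16(void* p) noexcept {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}
}  // namespace detail

// Hypothetical mixin holding all eight overloads: scalar and array, throwing and nothrow.
struct Aligned16 {
    static void* operator new(std::size_t n) { return detail::Alloc16OrThrow(n); }
    static void* operator new[](std::size_t n) { return detail::Alloc16OrThrow(n); }
    static void* operator new(std::size_t n, const std::nothrow_t&) noexcept { return detail::Alloc16(n); }
    static void* operator new[](std::size_t n, const std::nothrow_t&) noexcept { return detail::Alloc16(n); }
    static void operator delete(void* p) noexcept { detail::Free16(p); }
    static void operator delete[](void* p) noexcept { detail::Free16(p); }
    static void operator delete(void* p, const std::nothrow_t&) noexcept { detail::Free16(p); }
    static void operator delete[](void* p, const std::nothrow_t&) noexcept { detail::Free16(p); }
};

// Example user: inherits every overload from the mixin.
struct Particle : Aligned16 {
    alignas(16) float position[4];
};

inline bool IsAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Usage is unchanged at the call site: `new Particle`, `new Particle[4]` and `new (std::nothrow) Particle` all route through the aligned allocator. This still leans on platform-specific allocation functions, so it addresses the second drawback but not the first.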

The aligned_storage class

The Standard Template Library contains the aligned_storage class. It is a template that takes two value parameters: the size of the memory to be allocated, and the required alignment. To use it we add one line of code (!) to the DXMathTrial.h file to define an aligned version of the DXMathTrial class, which we will call DXMathTrialA (appending the ‘A’ may become a naming convention). We adapt our MainPage code accordingly. This gives us the following class:
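A hedged sketch of what such a typedef might look like, and of what it actually names. Mat4 and MakeIdentity are hypothetical stand-ins for XMMATRIX and XMMatrixIdentity (Mat4 deliberately carries no alignment of its own, so that the 16 comes from aligned_storage alone), and ConstructInStorage is a helper name invented for the example. Note that the typedef denotes a block of raw, suitably aligned storage (defined in the header type_traits), not a DXMathTrial; one common way to use such storage is to construct the object in it with placement new, as done here for a stack variable.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical stand-in for XMMATRIX, without any alignment requirement of its own.
struct Mat4 { float m[4][4]; };

inline Mat4 MakeIdentity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

class DXMathTrial {
public:
    DXMathTrial() : m_matrix(MakeIdentity()) {}
    const Mat4& Matrix() const { return m_matrix; }
private:
    Mat4 m_matrix;
};

// The one extra line: storage of the right size with 16 byte alignment.
typedef std::aligned_storage<sizeof(DXMathTrial), 16>::type DXMathTrialA;

static_assert(alignof(DXMathTrialA) == 16, "storage type is 16 byte aligned");
static_assert(sizeof(DXMathTrialA) >= sizeof(DXMathTrial), "storage is large enough");
static_assert(!std::is_same<DXMathTrialA, DXMathTrial>::value,
              "the typedef names raw storage, not the class itself");

// Construct a DXMathTrial inside the aligned storage, inspect it, destroy it again.
inline bool ConstructInStorage() {
    DXMathTrialA storage;  // automatic variable, so alignof is honoured
    DXMathTrial* trial = new (&storage) DXMathTrial();
    bool ok = reinterpret_cast<std::uintptr_t>(trial) % 16 == 0 &&
              trial->Matrix().m[0][0] == 1.0f;
    trial->~DXMathTrial();  // placement new requires an explicit destructor call
    return ok;
}
```

Because the storage type has no constructors or member functions of DXMathTrial, anything beyond treating it as bytes has to go through a pointer to the object constructed inside it.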

With corresponding usage:

And now the problem has gone. Man, what a solution! …

BUT, aligned_storage requires the type that is to be aligned to be a POD type (see here for an explanation). The class above is on the edge of being a POD type. If you e.g. add a member method that sets m_matrix to the identity matrix, like so:

the problems are back again. So, we settle at overloading new and delete.

Inheritance and Membership Relations

The final question we would like to see answered is to what extent the solution involving overloaded new and delete operators propagates through membership and inheritance relations. To that end we define the classes A_Base and A_Child, which both contain an XMMATRIX member, and which both assign the identity matrix to this member in a dedicated method. A pointer to class A_Child will be a member of the DXMathTrial class, and allocation will be on the heap. The classes look like this.

Note that the definitions of the overloaded operators have been simplified (past sound programming practice). The usage of the DXMathTrial class is unchanged. The result is that no exceptions are thrown. So, the new and delete operators of member classes also need to be overloaded, but it suffices to overload these operators in the base class; derived classes inherit them. This then is the solution.
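The inheritance part of this conclusion can be sketched as follows. Again this is an illustration under assumptions rather than the original listing: Mat4 and MakeIdentity are hypothetical stand-ins for XMMATRIX and XMMatrixIdentity, AllocationCount is instrumentation added only to show which operator new runs, and the non-Microsoft branch uses posix_memalign and free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>  // posix_memalign, free
#endif

// Hypothetical stand-in for XMMATRIX.
struct alignas(16) Mat4 { float m[4][4]; };

inline Mat4 MakeIdentity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

struct A_Base {
    // Instrumentation for this sketch only: counts calls to the overload below.
    static int& AllocationCount() { static int count = 0; return count; }

    static void* operator new(std::size_t size) {
        void* p = nullptr;
#if defined(_MSC_VER)
        p = _aligned_malloc(size, 16);
#else
        if (posix_memalign(&p, 16, size) != 0) p = nullptr;
#endif
        if (p == nullptr) throw std::bad_alloc();
        ++AllocationCount();
        return p;
    }
    static void operator delete(void* p) noexcept {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        free(p);
#endif
    }

    void SetBaseIdentity() { m_base = MakeIdentity(); }
    Mat4 m_base;
};

// No overloads here: A_Child inherits operator new and delete from A_Base.
struct A_Child : A_Base {
    void SetChildIdentity() { m_child = MakeIdentity(); }
    Mat4 m_child;
};

class DXMathTrial {
public:
    DXMathTrial() : m_child(new A_Child()) {
        m_child->SetBaseIdentity();
        m_child->SetChildIdentity();
    }
    ~DXMathTrial() { delete m_child; }
    DXMathTrial(const DXMathTrial&) = delete;
    DXMathTrial& operator=(const DXMathTrial&) = delete;
    const A_Child* Child() const { return m_child; }
private:
    A_Child* m_child;  // heap-allocated member whose class needs the alignment
};
```

When `new A_Child()` runs, the inherited A_Base::operator new is called with sizeof(A_Child), so the whole derived object, including m_child, lands in the aligned block.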

Comments
  1. Adrian says:

    This is an extremely detailed and civilized explanation of the problem and solution, thank you! Excellent work.

    I’m building a DirectX engine and was torn between using Bullet’s linear math (since I may want to use Bullet anyway, just for easy intersection/collision) and converting to XMxxx types as needed, or just using DirectXMath everywhere. The mention of alignment requirements in the documentation was off-putting, as it seemed like a solution along these lines would be necessary, and indeed it is. MS didn’t provide a lot of detail, and it’s really so helpful that you did.

    Cheers!
    Adrian

  2. Alex says:

    Hello Marc,
    Thanks a lot for this helpful and detailed explanation. This is exactly what I was looking for !

  3. Asgard says:

    Thanks from me too!

  4. Bob says:

    Multiple places say “16 bit” instead of “16 byte”, otherwise good article!
