Archive for August, 2014

Boolean Fields in a Struct: Another Complex PInvoke Construct

Adjacent Boolean fields in a structure constitute a problem in marshaling a C++ struct to C# .Net. Point is that the memory layout of structs is maintained in marshalling. Using the MarshalAs function or the StructLayout Pack field doesn’t solve the problem. Sharing of the bool aggregation memory word by several Properties, in combination with bit-masks and a bitwise operator, does however do resolve the situation.

Introduction

In the previous blog post we saw how to use PInvoke to copy an array of structs from C++ to C#, using a callback. In this blog post we build on the code developed in that previous post to solve a problem in working with Boolean fields (fields of type bool).

I know, it should be easy. However, it is complicated.

Point is: if we extend our structure with a single Boolean field, marshaling works fine, but if we add two adjacent Boolean fields, marshaling breaks down. Why?

Let’s first see what happens, then analyze the problem , and with the analysis in hand evaluate solution alternatives. And yes, the problem will be solved.

Adding Boolean Fields to a Struct, Then Marshal it in PInvoke

Assume we would like to extend our C++ MeteoInfo (meteorological Information) struct with a Boolean field indicating whether it is raining (whether there is precipitation) at the meteo station at time of measurement. In C++ we get:

struct MeteoInfo
{
    wchar_t* DisplayName;
    wchar_t* UniqueID;
    int Temp;
    bool IsRaining;
    double Humidity;
};

In our WPF C# GUI we would meet that with:

public struct MMeteoInfo
{
// Properties, not fields to serve data binding
    public string DisplayName { get; private set; }
    public string UniqueID { get; private set; }
    public int Temp { get; private set; }
    public bool IsRaining { get; private set; }
    public double Humidity { get; private set; }
}

 

And this will work.

Then, encouraged by our success, we add two more, but adjacent Boolean fields: IsOperational and IsOnline, which indicate whether the meteo station is known to function properly, and can be reached by internet, respectively. In C++:

struct MeteoInfo
{
    wchar_t* DisplayName;
    wchar_t* UniqueID;
    bool IsOperational;
    bool IsOnline;
    int Temp;
    bool IsRaining;
    double Humidity;
};

And in C#:

public struct MMeteoInfo
{
// Properties, not fields to serve data binding
    public string DisplayName { get; private set; }
    public string UniqueID { get; private set; }
    public bool IsOperational { get; private set; }
    public bool IsOnline { get; private set; }
    public int Temp { get; private set; }
    public bool IsRaining { get; private set; }
    public double Humidity { get; private set; }
}

But now we are in trouble. We create an array of MeteoInfos as:

MeteoInfo m_infos[] =
{
    { L"Meteo1", L"123-123-123", true, true, 25, true, 60.3 },
    { L"Meteo2", L"456-456-456", true, false, 27, false, 81.25 },
    { L"Meteo3", L"789-789-789", false, true, 33, true, 36.7 }
};

And it shows up in the GUI like:

Which is completely wrong!

So, what happened?

Analysis: Struct Memory Layout

From an issue posted a connect.microsoft.com (MS’s feedback site). We learn that:

  • In Marshaling a type with layout from C++, default alignment/padding of types is the same as in C++.
  • Alignment requirement is by default 8 but can be changed using the Pack field on the StructLayoutAttribute.

To complete the picture we should add that:

  • The (minimum) byte size of the bool type has not been defined in C++, so it is implementation dependent. In VC++ sizeof(bool)=1, as it is in C#.

Now let’s take a look at the memory layout of the structure in C++, hence in C# .Net, and compare this with the layout we would expect in C# based on aligned type byte sizes.

Field Byte offset in C++, C# Type Size CLR Aligned type size based offset
DisplayName 0 8 0
UniqueID 8 8 8
Temp 16 4 16
IsRaining 20 1 (padding=3) 20
Humidity 24 8 24
One Past End of struct (size) 32 32 32

So indeed, no problems in sight; the actual offset are equal to the offsets based on aligned managed type sizes.

How did I get these data? The C++ structure memory layout can be read off the Memory window while debugging in VS2013. However, in order to obtain the offset numbers for the C# struct (MMeteoInfo) you have to use a trick, see here: assign each field of a struct object to a suitably typed variable (i.e. var 🙂 ), and evaluate the offset of the source value. (I’m open to suggestions on utilities that show memory layout of complex types in managed memory, how ever volatile.)

When we add two (or more) adjacent Boolean fields, we are in trouble

Field Byte offset in C++, C# Type Size CLR Aligned type size based offset
DisplayName 0 8 0
UniqueID 8 8 8
IsOperational 16 1 16
IsOnline 17 1 20
Temp 20 (aligned at word) 4 24
IsRaining 24 1 28
Humidity 32 (aligned at 8 multiple) 8 32
One Past End of struct(size) 40 40 40

This shows that the two adjacent Boolean fields are packed together in a single word. This is the doing of default marshaling, but clearly the CLR cannot adapt to this.

We can explain, or predict the errors now:

  • An assignment of IsOperational grabs four bytes, so it also copies IsOnline, hence with our initializations the result is always true. Indeed, if we set both to False, IsOperational comes out false too.
  • An assignment of IsOnline also grabs four bytes, so also the first byte of Temp. Temp is positive in our initializations, so IsOnline is always true. Indeed, if we set Temp to zero, IsOnline comes out False.
  • Assignment of Temp really grabs the value of IsRaining at offset 24, so we get to see the value of IsRaining instead of the value of Temp.
  • The value for IsRaining is copied from the empty padding bytes at offset 28, rendering it the value 0.
  • The value or Humidity, a double, is taken from the first multiple of 8 bytes, at offset 32, hence it is correct.

So, now we know the MSIL instructions copy too many bytes, or bytes from the wrong location, how do we repair this error?

Solutions and Non-Solutions

A solution we do not want is to change the C++ code by using #pragma pack(…). #pragma is used to create non-portable code, pack is used to adapt the memory layout of structures. I don’t want to make code non-portable, just because I would like to add a GUI to that code.

Let’s review a number of approaches that might seem reasonable, but in fact will not work.

The MarshalAs Method

We could insert a clause like:

MarshalAs(UnmanagedType.Bool)

However, that one is meant for export of data to C++, targeting the Windows BOOL type, which is four bytes. For our 1 byte bool type we use:

MarshalAs(UnmanagedType.U1)

Then we change the definition of MMeteoInfo so we can marshal the bool variables, if we would like to:

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct MMeteoInfo
{
    public string DisplayName { get; private set; }
    public string UniqueID { get; private set; }

    [MarshalAs(UnmanagedType.U1)]
    bool isOperational;
    public bool IsOperational
    {
        get { return isOperational; }
        private set { isOperational = value; }
    }

    [MarshalAs(UnmanagedType.U1)]
    bool isOnline;
    public bool IsOnline
    {
        get { return isOnline; }
        private set { isOnline = value; }
    }

    public int Temp { get; private set; }

    [MarshalAs(UnmanagedType.U1)]
    bool isRaining;
    public bool IsRaining
    {
        get { return isRaining; }
        private set { isRaining = value; }
    }

    public double Humidity { get; private set; }
}

 

But if we run the program, we get an error:

Let’s say this is a marshaling error.

The StructLayoutAttribute.Pack Field

If we set Packing to 4, all our troubles should be over; all fields should start at word boundary. However, mirroring the C++ layout takes precedence – the two Boolean fields keep being marshaled as two adjacent bytes, and we still get the same error message. The same result is obtained for each packing value we may choose.

Use the StructLayout Attribute with LayoutKind.Explicit

So, we cannot change the structure’s memory layout by merely specifying a packing. Next step is to explicitly specify the layout per structure field. If we do so, the definition of the structure becomes like this:

[StructLayout(LayoutKind.Explicit, CharSet = CharSet.Unicode)]
public struct MMeteoInfo
{
    [FieldOffset(0)]
    string displayName;
    public string DisplayName
    {
        get { return displayName; }
        private set { displayName = value; }
    }

    [FieldOffset(8)]
    string uniqueID;
    public string UniqueID
    {
        get { return uniqueID; }
        private set { uniqueID = value; }
    }

    [FieldOffset(16)]
    bool isOperational;
    public bool IsOperational
    {
        get { return isOperational; }
        private set { isOperational = value; }
    }

    public bool IsOnline
    {
        get { return isOperational; }
        private set { isOperational = value; }
    }

    [FieldOffset(20)]
    int temp;
    public int Temp
    {
        get { return temp; }
        private set { temp = value; }
    }

    [FieldOffset(24)]
    bool isRaining;
    public bool IsRaining
    {
        get { return isRaining; }
        private set { isRaining = value; }
    }

    // double gets laid out on a multiple of 8 bytes.
    [FieldOffset(32)]
    double humidity;
    public double Humidity
    {
        get { return humidity; }
        private set { humidity = value; }
    }
}

Note that the property IsOnline does not have a backing field of its own, It shares a backing field with IsOperational. After all, both bools are encoded in these same 4 bytes. This code works, except that IsOperational, and IsOnline are now dependent values.

So, how do we get the correct value for each bool out of the 4 bytes in the isOperational field?

We use a mask and a Boolean bitwise operator, like this:

[FieldOffset(16)]
int booleans;
public bool IsOperational
{
    get
    {
        int mask = 1;
        // The first byte of booleans contains the value of IsOperational
        int tmp = booleans & mask;
        return tmp > 0;
    }
    private set { ; } // dummy
}

public bool IsOnline
{
    get
    {
        int mask = 256;
        // The second byte of booleans contains the value of IsOnline
        int tmp = booleans & mask;
        return tmp > 255;
    }
    private set { ; } // dummy
}

And it works!

Now we have the same values as in the initialization code above.