
Harder to C++: Monads for Mortals [6], Performance of Basic Implementations

In parts 2 – 5 we have seen four different implementations of the monad. Now it is time for a shootout. I want to see which version has the best time performance, and to compare it with a regular, non-monadic implementation of the same task, to evaluate the cost of using a monad.

The monad implementations are:

  1. The first implementation that copies all its data.
  2. The lazy implementation that evaluates at explicit call.
  3. The references and move semantics implementation.
  4. The pointer based implementation.

The Task at Hand

The challenge is to create a task that the compiler cannot optimize by cleverly removing parts of the functionality. Imagine we have an algorithm that fills a vector with arbitrary data, which is then thrown away unused because the vector goes out of scope. We wouldn’t want the compiler to decide that creating the vector is useless altogether and skip the code.

Further, I will make the task less predictable by extensive use of the rand() function. I know, rand() is not a great pseudo-random engine. However, rand() is not used here to simulate randomness, but to get values that are available only at runtime, not at compile time, when the code optimizer runs. This way, the optimizer cannot exploit compile-time knowledge of constant values.
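A minimal illustration of the point, assuming an optimizing compiler: when a bound is a compile-time constant, the whole computation can be folded into a constant; when it comes from rand(), the result cannot be precomputed.

#include <cstdlib>

// May be reduced to 'return 499500;' at compile time
int sum_constant()
{
	int s = 0;
	for (int i = 0; i < 1000; ++i)
		s += i;
	return s;
}

// The bound n is known only at runtime, so the result
// cannot be folded into a constant
int sum_runtime()
{
	int s = 0;
	const int n = rand() % 1000;
	for (int i = 0; i < n; ++i)
		s += i;
	return s;
}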

The algorithm, implemented both by the functions that are ‘applied’ by the monad and by two regular functions, is as follows (we will be working with the ValueObject class used in earlier episodes, but with a method added that retrieves a pointer to the payload):

  1. Receive a (pointer / reference to a) ValueObject object.
  2. Retrieve a char from the ValueObject object’s payload, at a random index, and store it in a global data structure.
  3. Create a size for a new ValueObject object that depends on the size of the received object and on the pseudo random generator.
  4. Create a (monad with a new) ValueObject object and return (a pointer to) it.

Step 2 ensures that the payload is not optimized away because it would otherwise never be accessed. Step 3 serves to make the size of the payload somewhat unpredictable, and also wildly varying.
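The ValueObject class itself is defined in the earlier episodes; to keep this episode self-contained, here is a minimal sketch of the interface assumed below. The logging in the real class is omitted, and getload() is the added payload accessor:

// Minimal sketch of the ValueObject interface assumed in this episode;
// the real class (with logging) is defined in earlier episodes
class ValueObject
{
public:
	explicit ValueObject(unsigned int size)
		: m_size(size), m_load(new char[size])
	{
		for (unsigned int i = 0; i < m_size; ++i)
			m_load[i] = static_cast<char>(i);	// arbitrary payload data
	}

	ValueObject(ValueObject&& other)
		: m_size(other.m_size), m_load(other.m_load)
	{
		other.m_size = 0;
		other.m_load = nullptr;
	}

	~ValueObject() { delete[] m_load; }

	unsigned int size() const { return m_size; }
	const char* getload() const { return m_load; }	// the added accessor

private:
	unsigned int m_size;
	char* m_load;
};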

This is the version for the reference and move semantics based implementation:

	// Creates one more ValueObject object
	monad<ValueObject> onemore(const ValueObject& v)
	{
		// samples is a file scope vector<char>; the guard avoids
		// indexing into an empty payload
		if (v.size() > 0)
			samples.push_back(*(v.getload() + rand() % v.size()));

		unsigned int rsize;
		do
		{
			rsize = rand();
		} while (rsize == 0);

		const unsigned int vsize = v.size() + 1;
		rsize = rsize > vsize ? rsize / vsize : vsize / rsize;

		return monad<ValueObject>(ValueObject(rsize));
	}
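The monad type constructor and bind function used here are the ones from part 4. Roughly along these lines; the exact definitions are in that episode:

#include <utility>

// Rough sketch of the references and move semantics monad of part 4
template<typename T>
struct monad
{
	explicit monad(T&& t) : value(std::move(t)) {}
	T value;
};

// Bitwise OR operator overload, monad bind function
template<typename A, typename R>
monad<R> operator|(const monad<A>& mnd, monad<R>(*func)(const A&))
{
	return func(mnd.value);
}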

Main Routine

In the main routine, we have a composition of 25 applications of the onemore function by the bind function. We execute this composition iterations times, which we call a run. We measure the time of each run, average over the runs, and print the result to screen.

The number of iterations and the number of runs are given by rand(), which is seeded with a number obtained from the user.

Scoping is such that all objects are created and destroyed in measured time.

This is the code for the reference and move semantics based implementation:

{
	using namespace I3;

	srand(seed);
	unsigned int iterations = rand();
	unsigned int runs = rand() % 1000;

	cout << "Runs: " << runs << endl
		<< "Iterations per run: " << iterations << endl
		<< "Applications per iteration: 25" << endl;

	cout << "References and move semantics." << endl;
	total = 0;

	for (unsigned int i = 0; i < runs; ++i)
	{
		t.Start();

		for (unsigned int j = 0; j < iterations; ++j)
		{
			unsigned int size1 = rand();

			auto m1 = monad<ValueObject>(ValueObject(size1));

			auto m2 = m1
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore;
		}

		t.Stop();
		total += t.Elapsed();

		cout << ". ";
	}

	cout << endl << "Average over "
		<< runs << " differently sized runs of 25 x "
		<< iterations << " applications: " << total / runs << " ms"
		<< endl << endl;
}

As earlier, the timer code is from Simon Wybranski. The outer braces just ensure that the different implementations do not interfere with each other.
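That timer code is not reproduced here; a minimal stand-in with the same interface, assuming Elapsed() reports milliseconds, could look like this:

#include <chrono>

// Minimal stand-in for the timer used above; Elapsed() reports
// the milliseconds between Start() and Stop()
class Timer
{
public:
	void Start() { m_begin = std::chrono::steady_clock::now(); }
	void Stop() { m_end = std::chrono::steady_clock::now(); }

	double Elapsed() const
	{
		return std::chrono::duration<double, std::milli>(m_end - m_begin).count();
	}

private:
	std::chrono::steady_clock::time_point m_begin, m_end;
};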

The iteration for the lazy monad differs from the code above in that it also makes a call, m2(), to evaluate the constructed expression.
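So the body of the inner loop for the lazy monad ends roughly like this:

			auto m2 = m1
				| onemore | onemore | onemore | onemore | onemore
				// ... 15 more applications, as above ...
				| onemore | onemore | onemore | onemore | onemore;

			// the extra call: evaluate the composed expression now
			m2();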

Regular Algorithms

The monad implementations are compared to regular algorithms that are functionally equivalent, but do not use monads. We will use one variant with references, and one variant with pointers and heap allocation.

The algorithm used is the same as above. The call in the main loop is terrible:

auto m2 = m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m1)
))))))))))))))))))))))));

This also shows why the name onemore was abbreviated to m.

For references (&), the m function was implemented as:

	ValueObject m(const ValueObject& v)
	{
		// guard against an empty payload, as in onemore
		if (v.size() > 0)
			samples.push_back(*(v.getload() + rand() % v.size()));

		unsigned int rsize;
		do
		{
			rsize = rand();
		} while (rsize == 0);

		const unsigned int vsize = v.size() + 1;
		rsize = rsize > vsize ? rsize / vsize : vsize / rsize;

		// First creating a named tmp, then returning it, enables move
		// semantics (or copy elision) instead of copy semantics
		auto tmp = ValueObject(rsize);

		return tmp;
	}
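The variant with pointers and heap allocation follows the same algorithm. A minimal sketch, here using a unique_ptr so the intermediate objects of the nested calls are released automatically; the variant actually measured may have managed them differently:

	// Sketch of the pointer-and-heap variant; assumes <memory> and
	// using namespace std, as elsewhere in these episodes
	unique_ptr<ValueObject> m(unique_ptr<ValueObject> v)
	{
		if (v->size() > 0)
			samples.push_back(*(v->getload() + rand() % v->size()));

		unsigned int rsize;
		do
		{
			rsize = rand();
		} while (rsize == 0);

		const unsigned int vsize = v->size() + 1;
		rsize = rsize > vsize ? rsize / vsize : vsize / rsize;

		// each intermediate object is destroyed when its unique_ptr
		// parameter goes out of scope
		return unique_ptr<ValueObject>(new ValueObject(rsize));
	}

With this signature the innermost call takes the initial object by move: m(m( ... m(move(m1)) ... )).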

Procedure

Time performance was measured using a release build with the default compiler switches for release builds, and the runs were executed outside of Visual Studio; that is, the program was started from a console window, not from within Visual Studio.

Results

And now the results:

[Chart: average time per run for each implementation; the colors below refer to the chart’s series.]

So, what do we see?

  1. The monad implementation based on references and move semantics (yellow) is the fastest. But the pointer based implementation has about the same performance.
  2. The regular implementations are only slightly faster (red, orange).
  3. The lazy monad (blue) is very slow, as expected.

Conclusions

The choice is clear. In following episodes, we will work with the monad built on references and move semantics. If required, we might use the pointer based variant as well. I will not use the lazy monad, unless it is clear that very, very special circumstances make it a favorable choice.

I was surprised by how small the difference (if any) is between the fast monad implementation and the regular algorithms; the monad involves about twice as much source code as the regular function, since it also includes the bind function. So (hypothesis): the cost of using the references and move semantics monad is negligible.

Next

Next time we will start looking at various types of monads and how we could build compositions of them.

Harder to C++: Monads for Mortals [5], Pointers

In part 2 we have seen a small and elegant implementation of the monad. In part 4 we have seen how references and move semantics can be used to make the monad’s performance independent of the size of the wrapped value. In this part we will see another implementation of the monad, based on a pointer.

Implementation of a Monad with a Smart Pointer

We have seen in part 2 that the type constructor is represented by a template in C++. In part 2 this is a simple struct, in part 3 it is a std::function holding a lambda expression (so as to implement delayed evaluation), and in this part it is a smart pointer, the std::unique_ptr to be precise.

So, now the type constructor is:

// Monad type constructor
template<typename T>
using monad = unique_ptr<T>;

We do not use a separate unit function; the unique_ptr’s constructor will do. The bind function is:

// Bitwise OR operator overload, Monad bind function
template<typename A, typename R>
monad<R> operator|(const monad<A>& mnd, monad<R>(*func)(const A*))
{
	log("---Function bind");

	// taking the monad by const reference also lets the temporaries
	// produced by chaining bind to this parameter
	return func(mnd.get());
}

Since the applied functions do not manage the life cycle of the existing A object, we pass in a raw pointer, not a smart pointer.

Given the simple functions we used earlier, this implementation of the monad is used as follows:

// divides arg by 2 and returns result as monadic template
monad<ValueObject> divideby2(const ValueObject* v)
{
	log("---Function divideby2");

	return monad<ValueObject>(new ValueObject(v->size() / 2));
}

// checks if arg is even and returns result as monadic template
monad<bool> even(const ValueObject* v)
{
	log("---Function even");

	return monad<bool>(new bool(v->size() % 2 == 0));
}

void valueobjectmain()
{
	{
		auto m1 = monad<ValueObject>(new ValueObject(16));

		auto m2 = m1 | divideby2 | divideby2 | divideby2 | even;

		cout << boolalpha << *m2 << endl;
	}
}

The ValueObject class is the same as used in part 4.

Running the above code results in the output below:

[Image: the logged output of the run.]

I think the most important result here is what we don’t see: calls to the move constructor and the accompanying call to the destructor of the ValueObject.

Pros, Cons, and Questions

The above image of the output is nicely constrained. Functions return the unique_ptr by value, i.e. it gets moved (a unique_ptr cannot be copied), which is ok, and the bind function takes a reference to a unique_ptr from which to extract the raw pointer. We see that the calls to the ValueObject destructor occur after the assignment to m2, and when leaving the anonymous scope.

So, the pros of this implementation are its simplicity, clarity, and efficiency: the overhead of a unique_ptr is comparable to the overhead of a raw pointer. Values are not copied, so performance is independent of size.

You may wonder, though, whether it is a good idea to create all the monad’s values on the heap, which is not so efficient. In part 6 we take a look at what we have so far and see what works best.