Harder to C++: Monads for Mortals [6], Performance of Basic Implementations

Posted: December 15, 2014 in Harder to C++

In parts 2–5 we have seen four different implementations of the monad. Now it is time for a shootout. I want to see which version has the best time performance, and to compare it with a regular implementation of the same task, to evaluate the cost of using a monad.

The monad implementations are (their common shape is sketched below):

  1. The first implementation that copies all its data.
  2. The lazy implementation that evaluates at explicit call.
  3. The references and move semantics implementation.
  4. The pointer based implementation.
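
To recap, all four share the same shape: a constructor that wraps a value (‘unit’) and a pipe operator that applies a function to the wrapped value (‘bind’). Here is a minimal sketch along the lines of the references and move semantics variant; this is not the actual class from the earlier episodes, just the essential ingredients:

	#include <utility>

	template<typename T>
	class monad
	{
	public:
		// 'unit': wrap a value by moving it into the monad
		explicit monad(T&& value) : value_(std::move(value)) {}

		// 'bind': apply a function T -> monad<U> to the wrapped value,
		// passing a reference rather than a copy
		template<typename F>
		auto operator|(F f) -> decltype(f(std::declval<T&>()))
		{
			return f(value_);
		}

	private:
		T value_;
	};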

The Task at Hand

The challenge is to create a task that the optimizer cannot defeat by cleverly removing parts of the functionality. Imagine we have an algorithm that fills a vector with arbitrary data, which is then thrown away unused because the vector goes out of scope. We wouldn’t want the compiler to decide that it is useless to create the vector at all, and skip the code.

Further, I will make the task less predictable by extensive use of the rand() function. I know, rand() is not a great pseudo-random engine. However, rand() is not used here to simulate randomness, but to get values that are available only at runtime, and not at compile time, when the code optimizer runs. This way, the code cannot be optimized based on compile-time knowledge of constant values.
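
To make the point concrete (the function names here are mine, purely for illustration):

	#include <cstdlib>

	// The optimizer is free to compute this entirely at compile time.
	unsigned int compiletime_size() { return 42 * 10; }

	// This value exists only at runtime, so code that depends on it
	// must actually be executed.
	unsigned int runtime_size() { return rand() % 1000 + 1; }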

The algorithm that will be implemented by the functions that are ‘applied’ by the monad, and that will also be implemented by two regular functions, is as follows (we will be working with the ValueObject class used in earlier episodes, with a method added that retrieves a pointer to the payload):

  1. Receive a (pointer / reference to a) ValueObject object.
  2. Retrieve a char at a random index from the ValueObject object’s payload, and store it in a global data structure.
  3. Create a size for a new ValueObject object that depends on the size of the received object and on the pseudo random generator.
  4. Create a (monad with a new) ValueObject object and return (a pointer to) it.

Step 2 ensures that the payload is not optimized away as never accessed. Step 3 serves to make the size of the payload somewhat unpredictable, and also wildly varying.
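
For reference, here is a minimal sketch of the ValueObject interface that the code below relies on. The actual class from the earlier episodes does more; in particular, the size + 1 allocation is my assumption, made so that the index expression rand() % (v.size() + 1) used below stays in bounds:

	#include <cstring>

	class ValueObject
	{
	public:
		explicit ValueObject(unsigned int size)
			: size_(size), load_(new char[size + 1])
		{
			// Give the payload real content, so that sampling it
			// means something
			memset(load_, 'x', size_ + 1);
		}

		// Move constructor: steal the payload, leave the source empty
		ValueObject(ValueObject&& other)
			: size_(other.size_), load_(other.load_)
		{
			other.size_ = 0;
			other.load_ = nullptr;
		}

		~ValueObject() { delete[] load_; }

		unsigned int size() const { return size_; }

		// The method added for this episode: a pointer to the payload
		const char* getload() const { return load_; }

	private:
		unsigned int size_;
		char* load_;
	};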

This is the version for the reference and move semantics based implementation:

	// Creates one more ValueObject object
	monad<ValueObject> onemore(const ValueObject& v)
	{
		// Sample one char of the payload, so the payload cannot be
		// optimized away as never read. samples is a file scope
		// vector<char>; rsize and vsize are file scope integers
		samples.push_back(*(v.getload() + rand() % (v.size()+1)));

		// Draw a nonzero random number
		do
		{
			rsize = rand();
		} while (rsize == 0);

		// Derive a new, wildly varying size from the old size and
		// the random number
		vsize = v.size() + 1;
		rsize = rsize > vsize ? rsize / vsize : vsize / rsize;

		return monad<ValueObject>(ValueObject(rsize));
	}

Main Routine

In the main routine, we have a composition of 25 applications of the onemore function by the bind function. We execute this composition iterations times; one such batch we call a run. We measure the time of each run, average over the runs, and print the result to the screen.

The numbers of iterations and runs are given by rand(), which is seeded with a number obtained from the user.

Scoping is such that all objects are created and destroyed in measured time.

This is the code for the reference and move semantics based implementation:

{
	using namespace I3;

	srand(seed);
	unsigned int iterations = rand();
	unsigned int runs = rand() % 1000;

	cout << "Runs: " << runs << endl
		<< "Iterations per run: " << iterations << endl
		<< "Applications per iteration: 25" << endl;

	cout << "References and move semantics." << endl;
	total = 0;

	for (unsigned int i = 0; i < runs; ++i)
	{
		t.Start();

		for (unsigned int j = 0; j < iterations; ++j)
		{
			unsigned int size1 = rand();

			auto m1 = monad<ValueObject>(ValueObject(size1));

			auto m2 = m1
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore
				| onemore | onemore | onemore | onemore | onemore;
		}

		t.Stop();
		total += t.Elapsed();

		cout << ". ";
	}

	cout << endl << "Average over "
		<< runs << " differently sized runs of 25 x "
		<< iterations << " applications: " << total / runs << " ms"
		<< endl << endl;
}

As earlier, the timer code is from Simon Wybranski. The outer braces are just to ensure that the different implementations do not interfere.

The iterations for the lazy monad differ from the above in that they also make a call, m2(), to evaluate the constructed expression.
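
Concretely, the inner loop body for the lazy monad would end like this (a sketch; the composition itself is identical to the one above):

			auto m2 = m1
				| onemore | onemore | onemore | onemore | onemore
				// ... three more lines like the one above ...
				| onemore | onemore | onemore | onemore | onemore;

			// Up to here, nothing has been computed; this call
			// evaluates the whole composed expression
			m2();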

Regular Algorithms

The monad implementations are compared to regular algorithms that are functionally equivalent, but do not use monads. We will use one variant with references, and one variant with pointers and heap allocation.

The algorithm used is the same as above. The call in the main loop is terrible:

auto m2 = m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m(m1)
))))))))))))))))))))))));

This also illustrates why the name onemore was abbreviated to m.

For references (&), the m function was implemented as:

	ValueObject m(const ValueObject& v)
	{
		// Same file scope variables as in the monadic version
		samples.push_back(*(v.getload() + rand() % (v.size()+1)));

		do
		{
			rsize = rand();
		} while (rsize == 0);

		vsize = v.size() + 1;
		rsize = rsize > vsize ? rsize / vsize : vsize / rsize;

		// Returning a named local lets the compiler move the result
		// (or elide the copy) instead of copying it
		auto tmp = ValueObject(rsize);

		return tmp;
	}
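
The pointer and heap allocation variant of m is not shown in the post. Presumably it looks something like the sketch below, where the ownership convention (the function deletes its argument) is my assumption:

	// Pointer variant: receives and returns heap allocated objects
	ValueObject* m(ValueObject* v)
	{
		samples.push_back(*(v->getload() + rand() % (v->size() + 1)));

		do
		{
			rsize = rand();
		} while (rsize == 0);

		vsize = v->size() + 1;
		rsize = rsize > vsize ? rsize / vsize : vsize / rsize;

		ValueObject* result = new ValueObject(rsize);

		// Consume the argument, so intermediate objects do not leak
		delete v;

		return result;
	}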

Procedure

Time performance was measured using a release build, with the default compiler switches for release builds, and the runs were executed outside of Visual Studio; that is, the program was started from a console window, not from within Visual Studio.
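
For example, from a Visual Studio developer command prompt, such a build and run could look like the following; the file name is made up, and /O2 (optimize for speed) with /EHsc (standard exception handling) approximate the default release settings:

	cl /O2 /EHsc monads.cpp
	monads.exe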

Results

And now the results:

So, what do we see?

  1. The monad implementation based on references and move semantics (yellow) is the fastest of the monad implementations, with the pointer based implementation showing about the same performance.
  2. The regular implementations (red, orange) are only slightly faster.
  3. The lazy monad (blue) is very slow, as expected.

Conclusions

The choice is clear. In following episodes, we will work with the monad built on references and move semantics. If required, we might use the pointer based variant as well. I will not use the lazy monad, unless it is clear that very, very special circumstances make it a favorable choice.

I was surprised by the small difference (if any) between the fast monad implementation and the regular algorithms; the monad involves about twice as much source code as the regular function, since it also includes the bind function. So (hypothesis): the cost of using the references and move semantics monad is negligible.

Next

Next time we will start looking at various types of monads and how we could build compositions of them.
