Posts Tagged ‘Performance’

Pixel Shader Based Expansion

Posted: June 7, 2011 in Silverlight
Tags: animation, Expansion, Internet, Performance, Pixel Shader, Silverlight

Pixel Shader Based Expansion

Best animation performance in Silverlight 4 is obtained from the combination of procedural animation and a pixel shader, as reported in a previous blog. A pixel shader is not really meant to be used for spatial manipulation, but in Silverlight 4 vertex and geometry shaders are not available. Also, pixel shaders are limited to Model 2 shaders in Silverlight 4, and only limited data exchange is possible. The question is then, “how far can we push pixel shaders model 2”. Another previous blog post discussed a preliminary version of dispersion. This post is about Expansion. The effect is much like Dispersion, but the implementation is quite different – better if I may say so. Expansion means that a surface, in this case a playing video is divided up in blocks. These blocks then move toward, and seemingly beyond the edges of the surface.

The Pixel Shader

The pixel shader was created using Shazzam. The first step is to reduce the image within its drawing surface, thus creating room for expansion. Expansion is in fact just translation of all the blocks in distinct direction that depend on the location of the block relative to the center of the reduced image.

In the pixel shader below, parameter S is for scaling, reducing, the pixel surface. Parameter N defines the number of blocks along an axis. So if N = 20, expansion concerns 400 blocks. In the demo App I’ve set the maximum for N to 32, which results in 1024 blocks tops. Parameter D defines the distance over which the reduced image is expanded. If the maximum value for D is the number of blocks, all the blocks (if N is even), or all but the center block (N is odd) just ‘move off’ the surface.

The code has been commented extensively, so should be self explanatory. The big picture is that the reduced, centered and then translated location of the blocks is calculated. Then we test whether a texel is in a bloc, not in an inter block gap. If the test is positive, we sample the unreduced, uncentered and untranslated image for a value to assign to the texel.

sampler2D input : register(s0);

// new HLSL shader

/// <summary>Reduces (S: Scales) the image</summary>
/// <minValue>0.01</minValue>
/// <maxValue>1.0</maxValue>
/// <defaultValue>0.3</defaultValue>
float S : register(C0);

/// <summary>Number of blocks along the X or Y axis</summary>
/// <minValue>1</minValue>
/// <maxValue>10</maxValue>
/// <defaultValue>5</defaultValue>
float N : register(C2);

/// <summary>Displaces (d) the coordinate of the reduced image along the X or Y axis</summary>
/// <minValue>0.0</minValue>
/// <maxValue>10.0</maxValue> // Max should be N
/// <defaultValue>0.2</defaultValue>
float D : register(C1);

float4 main(float2 uv : TEXCOORD) : COLOR
{
	/* Helpers */
	float4 Background = {0.937f, 0.937f, 0.937f, 1.0f};
	// Length of the side of a block
	float L = S / N;
	// Total displacement within a Period, an inter block gap
	float d = D / N;
	// Period
	float P = L + d;
	// Offset. d is subtracted because a Period also holds a d
	float o = (1.0f - S - D - d) / 2.0f;
	// Minimum coord value
	float Min = d + o;
	// Maximum coord value
	float Max = S + D + o;

	// First filter out the texels that will not sample anyway
	if (uv.x >= Min && uv.x <= Max && uv.y >= Min && uv.y <= Max)
	{
		// i is the index of the block a texel belongs to
		float2 i = floor( float2( (Max - uv.x ) / P , (Max - uv.y ) / P  ));

		// iM is a kind of macro, reduces calculations.
		float2 iM = Max - i * P;
		// if a texel is in a centered block,
		// as opposed to a centered gap
		if (uv.x >= (iM.x - L) && uv.x <= iM.x &&
		    uv.y >= (iM.y - L) && uv.y <= iM.y)
		{
			// sample the source, but first undo the translation
			return tex2D(input, (uv - o - d * (N -i)) / S );
		}
		else
			return Background;
	}
	else
		return Background;
}

Client Application

The above shader is used in an App that runs a video fragment, and can be explored at my App Shop. The application has controls for image size, number of blocks, and distance. The video used in the application is a fragment of “Big Buck Bunny”, an open source animated video, made using only open source software tools.

Animation

Each of the above controls can be animated independently. Animation is implemented using Storyboards for the slider control values. Hence you‘ll see them slide during animation. The App is configured to take advantage of GPU acceleration, where possible. It really needs that for smooth animation. Also the maximum frame rate has been raised to 1000.

Performance Statistics

The animations run at about 225 FPS on my pc. This requires significant effort as from the CPU –about 50% of the processor time. The required memory approaches 2.3Mb.

Pixel Shader Based Translation

Posted: May 30, 2011 in Silverlight
Tags: animation, Internet, Performance, Pixel Shader, Silverlight, Translation

Pixel Shader Based Translation

Best animation performance in Silverlight 4 is obtained from the combination of procedural animation and a pixel shader, as reported in a previous blog. I know, a pixel shader is not really meant to be used for spatial manipulation. However, in Silverlight 4 vertex and geometry shaders are not available. Also, pixel shaders are limited to Model 2 shaders, and only limited data exchange is possible. The question is then, “how far can we push pixel shaders model 2” Another previous blog post discussed a preliminary version of dispersion, which is fairly complicated. This post is about translation. Translation means here that the coordinates in the 2-dimensional plane of a reduced image, in this case a video, are changed.

The Pixel Shader

The pixel shader was created using Shazzam. The first step is to reduce the image within its drawing surface, thus creating the room required for translation. Translation is in fact just changing the coordinates of the reduced images while correcting for the change in coordinates when sampling the source texture.

The pixel shader below, the if statement does the filtering. Left and Right are parameterized offset and cutoff values. The offset ‘Left’ is also used to correct for the displacement in sampling. Sampling is done by the ‘tex2d’ function.

sampler2D input : register(s0);

// new HLSL shader

/// <summary>Reduces the image</summary>
/// <minValue>0.01</minValue>
/// <maxValue>1.0</maxValue>
/// <defaultValue>0.33</defaultValue>
float Scale : register(C0);

/// <summary>Changes the coordinate of the reduced image along the X axis</summary>
/// <minValue>0.0</minValue>
/// <maxValue>1.0</maxValue>
/// <defaultValue>0.5</defaultValue>
float OriginX : register(C1);

/// <summary>Changes the coordinate of the reduced image along the Y axis</summary>
/// <minValue>0.0</minValue>
/// <maxValue>1.0</maxValue>
/// <defaultValue>0.5</defaultValue>
float OriginY : register(C2);

float4 main(float2 uv : TEXCOORD) : COLOR
{
	float4 Background = {0.937f, 0.937f, 0.937f, 1.0f};
	float2 Origin = {OriginX, OriginY};
	// Reduce nr. of computations
	float halfScale = 0.5 * Scale;
	float2 Left = Origin - halfScale;
	float2 Right = Origin + halfScale;

	// Filter out texels that do not sample (the correct location)
	if (uv.x >= Left.x && uv.x <= Right.x && uv.y >= Left.y && uv.y <= Right.y)
		return tex2D(input, (uv - Left) / Scale);
	else
		return Background;
}

Client Application

The above shader is used in an App that runs a video fragmant, and can be explored
at my App Shop. The application has controls for video surface reduction, translation along the X-axis, and translation along the Y-axis. The video used in the application is a fragment of “Big Buck Bunny”, an open source animated video, made using only open source software tools.

Animation

Each of the above controls can be animated independently. Animation is implemented suing Storyboards for the slider control values. Hence you‘ll see them slide during animation. I didn’t use procedural animation, since, as discussed in the aforementioned previous blog post, is hindered by the implementation of the MediaElement.

The App is configured to take advantage of GPU acceleration, where possible. It really needs that for smooth animation. Also the maximum frame rate has been raised to 1000.

Performance Statistics

The animations run at about 250 FPS on my pc. This requires both significant effort from both the GPU as from the CPU. The required memory reaches a little over 3.5Mb.

Spatial transformation by pixel shader

Posted: May 27, 2011 in Silverlight
Tags: Bitmap PNG Conversion, Internet, MediaStreamSource, Performance, Pixel Shader, Procedural Animation, Silverlight

Spatial Transformation by Pixel Shader

As reported in a previous blog, best animation performance in Silverlight 4 is obtained from the combination of procedural animation and a pixel shader. A pixel shader is meant to be used for the manipulation of colors and transparency, resulting also in lightning and structural effects that can be expressed pixel wise.

A pixel shader is not really meant to be used for spatial manipulation – spatial information is hard to come by, but see e.g. this discussion on the ShaderEffect.DdxUvDdyUvRegisterIndex property. Vertex shaders and geometry shaders exist to filter, scale, translate, rotate and filter vertices and topological primitives respectively. Another limitation of the use of shaders in Silverlight 4 is that only model 2 pixel shaders can be used, so that only 64 arithmetic slots are available. Finally, you can apply only one pixel shader to a UIElement.

In order to explore the limitations on spatial manipulation and the number of arithmetic slots, I decided to build a Silverlight 4 application that does some spatial manipulation of a playing video by means of procedural animation and a pixel shader. Well, this is also my first encounter with pixel shaders, so I gathered that a real challenge would show me many sides of pixel shaders (and indeed, no disappointments here).

The manipulation consists of two steps. First the video surface is reduced in size in order to create some space for the second step. In the second step, the video is divided up in a (large) number of rectangles which drift apart – disperse, while reducing in size. The second step is animated.

To be frank, I’m not very excited about the results so far. It definitely needs improvement, but I need some time and a well defined starting point for further elaboration.

The completed application

You can explore the completed application, called ‘Dispersion Demo’ in my App Shop. The application is controlled by 4 parameters:

Initial Reduction. This parameter reduces the video surface. The higher the value, the greater the reduction.
Dispersion. The Dispersion parameter controls the amount of dispersion of the blocks. If you set both Initial Reduction and Dispersion to 1, you can see the untouched video.
Resolution. The Resolution is a measure for the number of blocks along both the X-axis, and the Y-axis.
Duration. Controls the duration of the dispersion animation.

The application also has three buttons to Start, Stop, and Pause the animation of the Dispersion. The idea is that you first select an initial Reduction and a Resolution and then click Start to animate the Dispersion.

The video used in the application is a fragment of “Big Buck Bunny”, an open source animated video, made using only open source software tools (and an astonishing amount of talent).

Performance statistics

The use of some procedural animation and the pixel shader results on my pc in a frame rate of about 198 FPS, and a footprint of about 2Mb video memory. This is good (though ~400FPS would be exciting), and leaves plenty of time for other manipulations of the video image.

The pixel shader

The pixel shader was created using Shazzam, an absolutely fabulous tool, and perfect for the job. The pixel shader shown below centers the image or video snapshot and reduces it. Then it enlarges it again according to the Dispersion factor. The farther away from the center, the larger the gaps are between the blocks displaying video, and the smaller the blocks themselves are.

sampler2D input : register(s0);

// new HLSL shader

/// <summary>Reduces image to starting point.</summary>
/// <minValue>1.0</minValue>
/// <maxValue>20.0</maxValue>
/// <defaultValue>10.0</defaultValue>
float Scale : register(C0);

/// <summary>Expands / enlarges the image.</summary>
/// <minValue>1</minValue>
/// <maxValue>20</maxValue>
/// <defaultValue>6</defaultValue>
float Dispersion : register(C1); //Dispersion should never get bigger than Scale!!!

/// <summary>Measure for number of blocks</summary>
/// <minValue>100</minValue>
/// <maxValue>800</maxValue>
/// <defaultValue>200</defaultValue>
float Resolution : register(C2);

float4 main(float2 uv : TEXCOORD) : COLOR
{
       float ratio = Dispersion / Scale;
       float offset = (1 - ratio) / 2;
       float cutoff = 1 - offset;
       float center = 0.5;
       float4 background = {0.937f, 0.937f, 0.937f, 1.0f}; // Same as demo program's Grid

       float2 dispersed = (uv - offset)  / ratio;
       float2 limit = 4 * (abs(uv - center) * (Dispersion - 1) / (Scale - 1)) - 1;

       if(uv.x >= offset && uv.x <= cutoff && uv.y >= offset && uv.y <= cutoff)
       {
             if (sin(dispersed.x * Resolution) > limit.x &&
                 sin(dispersed.y * Resolution) > limit.y)
             {
                    return tex2D(input, dispersed);
             }
             else return background;
       }
       else return background;
}

In the code above, the outer “if” does the scaling and centering. The sine function and Limit variable take care of the gaps and block size. See the graph below, the Limit rises to cut off increasingly many values of the sine, thus creating smaller blocks and larger gaps.

Evaluation of limitations

Although this is a solution for the challenge set, it is not the solution. This solution uses scaling for dispersion, but what you really would like to have is a translation per block of pixels that disperse. Needing this reveals another limitation: in Silverlight 4 you cannot provide your pixel shader with an array of structs that define your blocks layout, hence it is very hard to identify the block a certain pixel belongs to, and thus what its translation vector is.

At this point I had the idea to code the block layout into a texture which is made available to the pixel shader (you can have up to three extra textures (PNG images) in your pixel shader. To create such a lookup map you have to create a writeable bitmap with the values you want, then convert this to a PNG image as Andy Beaulieu shows, using Joe Stegman’s PNG encoder. The hard part will be the encoding (and decoding in the shader). You have at your disposal 4 unsigned bytes, and you will have to encode 2D translation vectors that also consist of negative numbers. The encoding / decoding will involve a signed – unsigned conversion, and the encoding shorts (16 bit integers) as 2 bytes. This will, of course, be a feasible but tedious chore.

A better move would be to invest some effort in getting acquainted with Silverlight 5 Beta’s integration with XNA, by which also vertex shaders come available.

As far as the limit of 64 arithmetic slots concern, it seems that one absolutely must hit it. But then, it always seems possible to rethink and simplify your code, thus reducing its size. This limitation turned out not very prohibitive.

Procedural animation

The above pixel shader is integrated in the Silverlight application using the C# code generated by Shazzam. Then when implementing procedural animation, another limitation came up concerning manipulating videos.

The initial plan was to display the video using a collection of WriteableBitmaps that could be manipulated using the high performance extensions from the WriteableBitmapEx library. However, it turns out that to take a snapshot from a MediaElement, you have to create a new WriteableBitmap per snapshot. This is expensive since we now have to create at least about 30 WriteableBitmaps per second. Indeed, running the Dispersion application (at 198 FPS) sets the CPU load to about 50% at my pc. This is not really what we are looking for.

A way out is to implement the abstract MediaStreamSource class, both for the vision part as well as for the sound part of the video to be exposed. By the looks of it, this seems to be quite a challenge. However, Pete Brown has provided examples for both video and sound. And there is also the CodePlex project that provides the ManagedMediaHelpers. So, we add this challenge to the list.

Another experiment might be to see if the XNA sound and video facilities can be accessed when exploiting the Silverlight – XNA integration in Silverlight 5.

Conclusions

The challenge set turned out to be very instructive. I’ve learned a lot about writing pixel shaders, though there will be much more to learn. No doubt about that.

It is absolutely true that spatial manipulation is hard in pixel shaders :-), the main cause being the extraordinary effort required to provide the shader with sufficient data, or sufficiently elaborate data. The current shader needs improvement, the next challenge in writing shaders will address Silverlight 5 (beta) and XNA.

I noticed that you can achieve a lot in pixel shaders using trigonometry, and (no doubt) the math of signals.

Finally, in order to really do procedural animation on surfaces playing parts of a video, you seem to need to either implement MediaStreamSource or get use the XNA media facilities.

Silverlight massive animation performance

Posted: May 10, 2011 in Silverlight
Tags: DirectX, Internet, Performance, Silverlight, Silverlight 5, XNA

Silverlight massive animation performance

As it turns out, Storyboard animations in Silverlight have limited performance capability. Presumably this system has been designed for ease of use and developer / designer productivity. If you want to create massive amounts of animations, like for instance in particle systems, you soon hit the performance limits of the rendering, graphics, animation subsystem.

Hey, that’s interesting!

Of course, now we want to know what the performance limits are, and how we can get around them. When I first hit the aforementioned performance limits, I had no clue as to how to improve performance. In this article you will find some articles I found on the World Wide Web concerning the subject. Great stuff. Some solutions found are about 20 times faster than others, and current developments of Silverlight 5 seems to promise to take it a step further.

Growing trees

The first article encountered was How I let the trees grow by Peter Kuhn. He describes how he ran into performance problems creating a tree that grows by splitting branches into smaller branches, terminating in leaves. At some point he finds his software trying to render over 20k paths, which is ‘massive’ enough to create performance problems. The solution is found in the use of the WriteableBitmapEx CodePlex project. The WriteableBitmapEx contains (among others) a fast Blit operation for copy operations (claimed to be 20-30 times faster than the standard Silverlight operation – I believe it). You can draw on Bitmaps that are not shown yet, thus prepare images for the screen, and then quickly shove them into vision when ready. The (in browser – IE9) solution presented performs well.

What we do not get from this article are clear figures about standard Silverlight performance and improved performance. So let’s discuss another article.

Procedural animations

The WriteableBitmapEx CodePlex project contains a reference to Advanced Animation: Animating 15,000 Visuals in Silverlight by Eric Klimczak. He tells us that if we want to animate ~50 objects concurrently, we need additional performance measures over Storyboards and Timelines. The main performance measures he employs extend the ones mentioned above with: “procedural animations”.

In Procedural Animation within the context of Silverlight you code an Update() and Draw() Loop that is driven by the Windows.Media.CompositionTarget.Rendering event. Essentially, you now code the new position, color, or any other attribute, in the Update() method, and Blit it to the render target in the Draw() method – thus putting it on screen.

This works very well! Eric Klimczak has provided source code with his article, among which a program that animates moving particles that respond quickly to mouse actions (in browser).

For 3000 particles the program renders at ~200 frames per second (FPS), tops, and 15.000(!) particles are rendered at a still pleasant 36 – 46 FPS. I’ve used the Silverlight FPS counter for all Silverlight programs in this article in order to get comparable measurements. See the fps counter in the status bar of the IE screenshot below.

Curiously, there is no maxFrameRate setting in his code. About this maxFrameRate setting the Silverlight documentation writes: “A value of approximately 60 frames-per-second should be reliable on all platforms. The 1000 and 30000 frames-per-second range is where the maximum frame rate could differ between platforms”. So, the obvious step is to set the maxFrameRate to 1.000 – both in code as well as in the html host, which showed a factor 3 performance increase compared to the original article software, for the 3K particle case (screenshot above). The Silverlight Documentation also states that the enableGPUAcceleration setting doesn’t work for the WriteableBitmap, so I skipped that one.

It seems to me that this approach solves most problems. However, procedural animation – a gaming software approach – opens the door to other, even more apt approaches.

Note that this approach does not employ the GPU. All rendering is done using the CPU.

Pixel shaders

An approach that takes performance a step further is Silverlight 3 WriteableBitmap Performance Follow-Up by René Schulte. In this article a number of approaches are compared. All approaches yield comparable results, except the pixel shader approach, which yields a factor ~20 better performance compared to the WriteableBitmap (WOW!).

How does it work? UIElement descendants have an Effect property. You can create custom Effects using a pixel shader written in HLSL (a .fx file) which you compile using e.g. fxc.exe – the DirectX HLSL compiler, or Shazzam. The compiled shader effect is loaded as a resource by a descendant of the ShaderEffect class. The article by René Schulte uses a custom derived class thereby showing how to transfer data into the shader during program execution. The loaded shader should be attached to the UIElement’s Effect Property. The shader will be executed for each pixel to be rendered. This gives you great control over the UIElement. You can modify many attributes of each pixel, for instance color and opacity. Do not forget that dropshadows are implemented as shaders, so you can also duplicate the UIElement visual.

According to René Schulte, the program / pixel shader is not executed on the GPU. That may have been true for Silverlight 3, but in Silverlight 4 it is absolutely possible to put the GPU to work. So, with a bit of tweaking the code here and there we find a maximum performance of >450 FPS.

I’ve registered the GPU invocation for specific tasks using the Catalyst utility of my graphics card, see the fields ‘GPU Clock’ and ‘Memory Clock’ at the bottom of the screen shot below. Regular values are 157 and 300 respectively.

Silverlight 5 Beta and XNA

Recently (April 13th 2011), Silverlight 5 Beta was released. It includes the DrawingSurface control which is a gateway into XNA functionality. A little experimenting reveals that like XNA the default drawing frequency is at 60 FPS, and you can’t seem to get it up by recurring calls to the OnDraw() event handler.

In Silverlight the frequency is raised as described above. In XNA the default of 60 FPS can be lifted by setting both the Game’s object ‘IsFixedTimeStep’ property and the GraphicsDeviceManager’s ‘SynchronizeWithVerticalRetrace’ property to false.

From the MIX demo video it is clear that performance is very good, however, at 60 FPS within Silverlight. The performce step is ‘made’ by the shaders and realized on the GPU. It is currently not clear to me how to measure that performance, so this exercise ends here for now.

Non Silverlight performance

What performance can we expect? Is Silverlight slow, despite the extra tricks? What is the promise hidden in Silverlight 5? We now know that for demanding graphics we can turn to the integration of Silverlight with XNA. XNA, in turn is built upon DirectX.

Below you’ll find a screenshot of a DirectX11 particle demo. For 16K particles (reminiscent of the 15K in the above particle demo) we see a performance of ~620 FPS (not measured with the same frame counter as with the other programs, however), immediately requiring maximum performance from the GPU. For 8K particles performance rises to ~1175 FPS.

One conclusion I would like to draw here is that this performance correlates to the performance of the pixel shader used as a custom Effect. So, we may conclude that the real performance enhancement lies with the use of shaders.

Will this performance hold up in XNA? Yes, a particle simulation in XNA (from the XNA community, with small adaptation) brings us a ~1000 FPS performance, see screenshot.

Conclusion

The above is an exploration of techniques and approaches to realize massive animation performance in Silverlight. It is not a methodological, comparative study. A more rigorous investigation into performance (of what exactly?) might be subject for a later article that builds on the findings presented here.

Here we have learnt that in order to have massive animation in Silverlight we use the WriteableBitmap, the Blit operation from the WriteableBitmapEx Codeplex project, Procedural Animation programming, and pixel shaders (do not forget the enableGPUAcceleration setting, when applicable). We have seen that the exposure of XNA, built on DirectX, in Silverlight will most likely bring us further performance improvements.

Today we can have a very powerful massive animation performance of around 400-500 FPS, and the future is bright.

Clatter from the Byte Kitchen

Recent Posts

Archives

Categories

Meta

Posts Tagged ‘Performance’