COMP 238: Advanced Image Generation
Max Smolens (max@cs.unc.edu)
15 December, 2004
Final Project: GPU-Accelerated Painterly Rendering


OVERVIEW

I implemented Aaron Hertzmann's "Painterly Rendering with Curved Brush Strokes of Multiple Sizes" [1], with the image processing operations performed on the GPU. The painterly rendering algorithm produces images that appear hand-painted. It builds the painting up in layers, with each layer corresponding to a single brush size, and it performs blurring, edge detection, difference, and averaging operations on images when rendering each layer. Most of these steps are 2D convolution operations in which one or more kernels are applied at every pixel of an image; the difference step is a simple pointwise operation. Such repetitive, independent per-pixel calculations are well suited to the GPU because of the hardware's massive parallelism.


PAINTERLY RENDERING

The painterly rendering algorithm transforms an input image into a version of the image that looks hand-painted. The algorithm progressively refines the painting by first placing large brush strokes in large, flat areas of color in the source image, and then drawing an additional layer for each smaller brush size in decreasing order. The algorithm uses small brush strokes only in regions of high detail in the source image. Therefore, the lower layers, those with broad brush strokes, are still visible in the final painting.

For each layer, the algorithm creates a reference image by blurring the source image to a degree proportional to the brush size used in the layer. Reference images represent the image to be approximated by each brush. Therefore, large brushes approximate only the gross features in a source image and small brushes approximate the fine features. In pseudocode, this algorithm is written as:

Paint (source_image, brush_radii[])
{
	canvas := constant color image
	foreach brush radius r from largest to smallest do
		reference_image := gaussian_blur (source_image, r)
		PaintLayer (canvas, reference_image, r)
	end
}
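
A minimal Python sketch of this top-level loop, using a crude box blur as a stand-in for the real Gaussian and a do-nothing paint_layer stub (all names here are mine, not from the actual implementation):

```python
def box_blur_1d(row, radius):
    """Crude stand-in for a 1D Gaussian: window mean, clamped at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out

def blur(image, radius):
    """Separable blur: rows first, then columns (image is a list of rows)."""
    rows = [box_blur_1d(r, radius) for r in image]
    cols = [box_blur_1d(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def paint_layer(canvas, reference, r):
    """Stub; the real routine is the PaintLayer pseudocode that follows."""
    pass

def paint(source, radii):
    canvas = [[0.0] * len(source[0]) for _ in source]  # constant-color canvas
    for r in sorted(radii, reverse=True):              # largest brush first
        reference = blur(source, r)                    # blur scales with r
        paint_layer(canvas, reference, r)
    return canvas
```

Note that the blur window grows with the brush radius, so each reference image keeps only features at the scale of its brush.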

The PaintLayer routine paints a new layer on the canvas based on the current reference image and brush size. The routine chooses starting points for brush strokes by traversing a grid whose spacing is proportional to the brush size. The average color difference between the canvas and the reference image in the neighborhood of each grid point is computed and compared to a threshold. If the difference exceeds the threshold, a stroke is drawn for that grid point. This thresholding allows previously drawn layers to remain visible, because each successive layer does not cover the entire canvas. The actual starting point for each stroke is the pixel in the neighborhood with the greatest difference between the canvas and the reference image. The routine is written as:

PaintLayer (canvas, reference_image, r)
{
	S := empty set of strokes
	D := pointwise_difference (canvas, reference_image)
	grid := f_g * r
	for x in [0..width] step grid do
		for y in [0..height] step grid do
			area_error := average difference of points in grid*grid neighborhood around (x,y)
			if (area_error > threshold) then
				(x',y') := largest error point in neighborhood around (x,y)
				s := MakeStroke (r, x', y', reference_image)
				add s to S
		end
	end
	paint strokes in S in random order
}

where f_g is a constant grid size factor. The distance between two colors is defined as sqrt((r_1-r_2)^2 + (g_1-g_2)^2 + (b_1-b_2)^2). MakeStroke draws a brush stroke of radius r starting at (x',y').
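
As an illustrative Python sketch of this routine (canvas and reference are 2D lists of RGB triples; it returns seed points instead of drawing, and the f_g and threshold defaults are arbitrary values of mine):

```python
import math

def color_distance(c1, c2):
    """Euclidean distance between two RGB triples, as defined above."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def paint_layer(canvas, reference, r, f_g=1, threshold=25.0):
    """Sketch of the grid scan in PaintLayer."""
    h, w = len(reference), len(reference[0])
    grid = max(1, int(f_g * r))
    seeds = []
    for y in range(0, h, grid):
        for x in range(0, w, grid):
            # Per-pixel differences in the grid x grid neighborhood.
            pts = [(color_distance(canvas[j][i], reference[j][i]), i, j)
                   for j in range(y, min(y + grid, h))
                   for i in range(x, min(x + grid, w))]
            area_error = sum(d for d, _, _ in pts) / len(pts)
            if area_error > threshold:
                _, sx, sy = max(pts)   # largest-error point seeds the stroke
                seeds.append((sx, sy))
    return seeds
```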


CURVED BRUSH STROKES

The brush strokes are modeled as cubic B-splines with constant thickness and color. Each successive B-spline control point is located by following the normal of the image gradient for the length of the brush radius, so the splines roughly match the reference image's isocontours. Sobel edge detection on the reference image luminance provides the image gradients. A stroke ends either when it reaches a maximum length or when its color deviates too much from the reference image.
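
For reference, a point on a uniform cubic B-spline segment can be evaluated from four control points with the standard basis functions (a Python sketch; the function name is mine):

```python
def bspline_point(p0, p1, p2, p3, t):
    """Evaluate a uniform cubic B-spline segment at t in [0, 1].
    Each p is an (x, y) control point."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    b3 = t ** 3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))
```

Sampling t across [0, 1] for each run of four control points yields a polyline approximation of the stroke path.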

When choosing the normal of the image gradient at a control point, there are two candidate directions. Hertzmann chooses the one that minimizes the stroke curvature. The algorithm can also limit or exaggerate the curvature by filtering the stroke directions. Strokes are rendered using OpenGL, either as points or as a triangle strip along the subdivided B-spline. The stroke drawing algorithm is written as:

MakeStroke (r, x0, y0, reference_image)
{
	stroke_color := reference_image at (x0,y0)
	K := stroke of radius r and color stroke_color
	add (x0,y0) to K
	(x,y) = (x0,y0)
	(last_dx,last_dy) := (0,0)
	for i in [1..max_stroke_length] do
		if (i > min_stroke_length and
			   |reference_image at (x,y) - canvas at (x,y)| <
			   |reference_image at (x,y) - stroke_color|) then
			return K
		// detect vanishing gradient
		if reference_image gradient magnitude at (x,y) near 0 then
			return K
		// get unit vector of gradient
		(gx,gy) := reference_image gradient direction at (x,y)
		// compute a normal direction
		(dx,dy) := (-gy,gx)
		// reverse normal if necessary
		if (last_dx * dx + last_dy * dy < 0) then
			(dx,dy) := (-dx,-dy)
		// filter the stroke direction
		(dx,dy) := f_c*(dx,dy) + (1-f_c)*(last_dx,last_dy)
		normalize (dx,dy)
		(x,y) := (x+r*dx,y+r*dy)
		(last_dx,last_dy) := (dx,dy)
		add (x,y) to K
	end
	return K
}

where f_c is a filter constant.
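
The direction-reversal and filtering steps of MakeStroke can be isolated in a small Python helper (a sketch; the function name is mine):

```python
import math

def filter_direction(dx, dy, last_dx, last_dy, f_c):
    """Blend the new normal with the previous direction by the
    curvature filter constant f_c, then renormalize."""
    # Reverse the normal if it points against the previous direction,
    # so successive control points do not double back.
    if last_dx * dx + last_dy * dy < 0:
        dx, dy = -dx, -dy
    fx = f_c * dx + (1 - f_c) * last_dx
    fy = f_c * dy + (1 - f_c) * last_dy
    n = math.hypot(fx, fy) or 1.0
    return fx / n, fy / n
```

With f_c = 1 the strokes follow the raw normals (maximum curvature); as f_c approaches 0 the filter straightens the strokes.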


RENDERING STYLES

Hertzmann's painterly rendering algorithm makes use of a set of style parameters, several of which appear in the pseudocode above: the approximation threshold, the grid size factor f_g, the curvature filter constant f_c, the minimum and maximum stroke lengths, and the set of brush radii.

My implementation defines twelve styles as combinations of these parameters. The style of the loaded image can be changed in the program by selecting a new style from the menu.


GPU-ACCELERATION

When drawing each layer, the painterly rendering algorithm described above performs four image processing operations that map cleanly and directly to the GPU: blurring, edge detection, difference, and area averaging. I implemented each operation as a fragment program; the input images are bound as textures and the results are written to textures.

I implemented the blurring operation as a separable Gaussian blur. In general, a 2D convolution with an n x n filter requires n^2 multiplications per pixel, but when the filter is separable the convolution requires only 2n multiplications per pixel. Therefore, I first convolve the image with an (n x 1) filter, then convolve the result with a (1 x n) filter. As an example, my Gaussian blur with a 5x5 filter uses the following two fragment programs to create a reference image from the source image:

// Horizontal filter.
void
main (half4 position : POSITION,
      half2 texCoord : TEXCOORD0,
      
      uniform samplerRECT image,
      uniform half weights[5],
      
      out half4 color : COLOR)
{
	color = 0;
	for (int x = -2; x <= 2; x++) {
		color += texRECT(image, half2(texCoord.x+x, texCoord.y)) * weights[x+2];
	}
}

// Vertical filter.
void
main (half4 position : POSITION,
      half2 texCoord : TEXCOORD0,
      
      uniform samplerRECT image,
      uniform half weights[5],
      
      out half4 color : COLOR)
{
	color = 0;
	for (int y = -2; y <= 2; y++) {
		color += texRECT(image, half2(texCoord.x, texCoord.y+y)) * weights[y+2];
	}
}

The Gaussian filter coefficients are bound to weights. The source image is initially bound as image in the horizontal pass, then the result of that pass is bound to image in the vertical pass. The result is a Gaussian-blurred source image: the reference image. I have four sets of these horizontal/vertical filter fragment programs, one for each of the four brush sizes defined by my program.
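
The separability claim is easy to check on the CPU. The following Python sketch compares a row pass followed by a column pass against the full 2D convolution built from the outer product of the 1D kernel with itself (borders are ignored for brevity; only the "valid" interior is computed):

```python
def conv1d(row, k):
    """1D convolution with a symmetric kernel k, valid region only."""
    half = len(k) // 2
    return [sum(row[i + j - half] * k[j] for j in range(len(k)))
            for i in range(half, len(row) - half)]

def conv_separable(img, k):
    """Row pass, then column pass (img is a list of rows)."""
    rows = [conv1d(r, k) for r in img]
    cols = [conv1d(list(c), k) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def conv2d(img, k):
    """Full 2D convolution with the outer-product kernel k[a] * k[b]."""
    half = len(k) // 2
    h, w = len(img), len(img[0])
    return [[sum(img[y + a - half][x + b - half] * k[a] * k[b]
                 for a in range(len(k)) for b in range(len(k)))
             for x in range(half, w - half)]
            for y in range(half, h - half)]
```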

The PaintLayer routine finds the difference between the canvas and the reference image with the following fragment program:

void
main (half4 position : POSITION,
      half2 texCoord : TEXCOORD0,

      uniform samplerRECT canvas,
      uniform samplerRECT reference_image,
      
      out half3 color : COLOR)
{
	half3 c = texRECT(canvas, texCoord).rgb;
	half3 r = texRECT(reference_image, texCoord).rgb;
	color = distance(c, r);
}

Note that the distance function computes the Euclidean distance between c and r, which is exactly the color difference defined above.

The PaintLayer routine also finds the average color difference in the neighborhood of each grid point. I achieve this by feeding the result of the above difference operation into a box filter convolution. Like the Gaussian filter, the box filter is separable. Therefore, I can use the same set of fragment programs as the Gaussian blur, but I bind uniform fractional weights to the weights arrays.
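
With uniform weights 1/n bound in place of the Gaussian coefficients, the same two-pass structure computes exactly the n x n neighborhood average. A small Python check of this (my own sketch, borders ignored):

```python
def uniform_pass(row, n):
    """One 1D pass with uniform weights 1/n, valid region only."""
    half = n // 2
    return [sum(row[i + j - half] for j in range(n)) / n
            for i in range(half, len(row) - half)]

def area_average(img, n):
    """Row pass then column pass with uniform weights: the box filter."""
    rows = [uniform_pass(r, n) for r in img]
    cols = [uniform_pass(list(c), n) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```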

Finally, the MakeStroke routine of the painterly rendering algorithm requires the image gradient directions and magnitudes. I find these by Sobel filtering the luminance of the reference image with the filter:

|-1 -2 -1|
| 0  0  0|
| 1  2  1|
and with its transpose:
|-1  0  1|
|-2  0  2|
|-1  0  1|
I store the results in a 16-bit floating point texture. As an optimization, the texture coordinates of the neighboring texels are computed in a vertex program. My Sobel filtering fragment program is:

void
main (half4 position        : POSITION,
      half2 texCoord        : TEXCOORD0,
      half4 texCoord_ne_nw  : TEXCOORD1,
      half4 texCoord_n_s    : TEXCOORD2,
      half4 texCoord_e_w    : TEXCOORD3,
      half4 texCoord_se_sw  : TEXCOORD4,
      
      uniform samplerRECT reference_image,

      out half4 color : COLOR)
{
	static const half3 lum = half3 (0.3, 0.59, 0.11);
	static const int weights[2][9] = {{-1, 0, 1, -2, 0, 2, -1, 0, 1},  // left to right
					  {-1, -2, -1, 0, 0, 0, 1, 2, 1}}; // top to bottom
	half2 g = half2(0,0);
	color = 0.0;
	half lums[9];
	lums[0] = dot(lum, texRECT (reference_image, texCoord_se_sw.zw));
	lums[1] = dot(lum, texRECT (reference_image, texCoord_n_s.zw));
	lums[2] = dot(lum, texRECT (reference_image, texCoord_se_sw.xy));
	lums[3] = dot(lum, texRECT (reference_image, texCoord_e_w.zw));
	lums[4] = dot(lum, texRECT (reference_image, texCoord));
	lums[5] = dot(lum, texRECT (reference_image, texCoord_e_w.xy));
	lums[6] = dot(lum, texRECT (reference_image, texCoord_ne_nw.zw));
	lums[7] = dot(lum, texRECT (reference_image, texCoord_n_s.xy));
	lums[8] = dot(lum, texRECT (reference_image, texCoord_ne_nw.xy));

	for (int i = 0; i < 9; i++) {
		g += lums[i] * int2(weights[0][i], weights[1][i]);
	}

	// Magnitude.
	color.r = length(g);
	// Direction.
	color.gb = normalize(g);
	color.g = -color.g;
}
Note that the fragment program normalizes the direction and computes the normal direction by flipping the sign of one of the direction components.
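
The same gradient computation can be sketched on the CPU for a single 3x3 luminance neighborhood (the bottom-row-first ordering here is my reading of how the fragment program unpacks its packed texture coordinates):

```python
import math

# Sobel kernels flattened in the same order as the lums[] array in the
# fragment program: sw, s, se, w, c, e, nw, n, ne (bottom row first).
GX = [-1, 0, 1, -2, 0, 2, -1, 0, 1]    # horizontal (left-to-right) gradient
GY = [-1, -2, -1, 0, 0, 0, 1, 2, 1]    # vertical (bottom-to-top) gradient

def sobel(lums):
    """Return (magnitude, unit direction) of the gradient of a 3x3
    luminance neighborhood, or a zero direction when it vanishes."""
    gx = sum(l * w for l, w in zip(lums, GX))
    gy = sum(l * w for l, w in zip(lums, GY))
    mag = math.hypot(gx, gy)
    if mag == 0:
        return 0.0, (0.0, 0.0)   # vanishing gradient: the stroke ends here
    return mag, (gx / mag, gy / mag)
```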

The results of three of the four image processing operations performed on the GPU are read back to the CPU in order to draw the brush strokes; only the difference operation result is not read back. Note that the canvas is rendered using OpenGL and never needs to be read back from the GPU. For each layer, the brush strokes are rendered to the canvas using render-to-texture operations. Therefore, the canvas is always available as a texture for the next layer's image processing operations or to be displayed.


PERFORMANCE

I tested my program on a 3GHz Pentium 4 with a Quadro FX 500 graphics card. The processing time should be roughly proportional to the source image size. The GPU image processing and the CPU stroke calculations are both included in these approximate measurements:

Table 1. Quadro FX 500 (Linux)
Image Size    Processing Time
320x240       0.5 s
640x480       2 s
1280x850      5.5 s

The CPU processing times depend on the style parameters. For example, a style with a long maximum brush stroke length takes longer to process. However, the GPU image processing times should be independent of the style. I found that on the Quadro FX 500 approximately 50% of the processing time is spent in GPU image processing, so the CPU stroke calculations account for a significant share of the total. There is likely room for improved efficiency in my CPU stroke drawing implementation. I also measured the processing times on a GeForce 6800 Ultra:

Table 2. GeForce 6800 Ultra (Windows)
Image Size    Processing Time
320x240       0.38 s
640x480       1.17 s
1280x850      2.5 s

Although the processing times are much lower than those on the Quadro FX 500, it is clear that the CPU processing time is significant. I measured the GPU processing time for the GeForce 6800 Ultra:

Table 3. GeForce 6800 Ultra (Windows)
Image Size    GPU Processing Time
320x240       0.22 s
640x480       0.28 s
1280x850      0.50 s

With the 1280x850 image, the GPU processing time accounts for only 20% of the total processing time. Unfortunately, I did not have time to implement CPU-based versions of these algorithms for comparison. A proper speed comparison would use an optimized image processing library such as the Intel Integrated Performance Primitives.


RESULTS

Examples of the painting styles are presented on separate pages:

Impressionism
Expressionism
Pointillism
Colorist Wash


REFERENCES

  1. Aaron Hertzmann. "Painterly Rendering with Curved Brush Strokes of Multiple Sizes." Proceedings of SIGGRAPH 98, 1998.