🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Floodfill using compute shaders?

Started by
9 comments, last by KaiserJohan 3 years, 6 months ago

Hello,

I am trying to implement a flood fill alghorithm on a compute shader. I have seeded a texture with water-texels and then I want other texels to hold “distance to water” values so the seeded texels would have value 0, their direct neighbours value 1, and so on…

It would be trivial to implement on CPU but I am thinking if I can do this on a compute shader instead. I am using a very large texture and I am already doing work with the data inside a compute shader.

The problem I see is that the alghorithm I had in mind dosn't seem to fit very well with how compute shaders execute their threads.

On a CPU I would use a FIFO structure, append the seeded river texels and then for all its neighbours increment and append to the queue.

Any idea how I could do something like this on the GPU?

Advertisement

I wonder if this works:

For each pixel: distance = max(left and right neighbors) + 1;

For each pixel: distance = max(top and bottom neighbors) + 1;

Assuming your max distance would be 20, this would be already 40 dispatches. : /

Eventually start with a downs scaled map of yours, to obtain lists of tiles per dispatch to avoid useless processing.

The texture is 12000x9000 so its gonna be rather large

And is your initial seed sparse or noisy?

If it's sparse, my hierarchical idea should work.
E.g. if using 16*16 tiles, see if there is any seed in the tile and if so, make the tile a seed for the low res version.
Then, calculate distances for the low res version, and append tiles to one of N work lists, using the distance to index the list.
After that, the first dispatch for the high res version would process tiles from the list with distance 0, next dispatch distance 1, etc.

The goal would be to process the high res tiles in coarse order at a time where their affecting neighborhood is already known.

Though, i'm pretty sure it's not that easy, and there should be a need to process some tiles multiple times. Getting this right surely is some work. And if your input is fuzzy (some seed ends up in almost every low res tile, so most tiles go to the first list), this won't help at all. Otherwise a fully hierarchical approach e.g. using multi level grids makes sense as well.

Some alternatives i can think of:

  • Exponential sum of density mipmaps. (I use this to approximate a SDF from density octree obtained from voxelized models, and it works quite well.) But this won't give you exact integer Manhattan distance.
  • Having a temporally inaccurate solution which reduces error every frame with a constant time algorithm. E.g. selecting N random texels per frame and update them looking only at their local neighborhood. That's surely nice if you can accept resulting latency of distance data. (Idea: texels with low distance update more often.)

You probably want to look up articles on SDF genertion, which is quite popular now and the same problem. Found this: https://prideout.net/blog/distance_fields/​ There are also various open source plug ins for Unity or UE4.

when u do it using compute shaders, forget about this:

KaiserJohan said:
On a CPU I would use a FIFO structure, append the seeded river texels and then for all its neighbours increment and append to the queue.

u need to think differently, compute shaders are not invoked in fix order;

compute shaders run on compute units which are loaded with work groups from a work domain (these work groups can run in any particular order to execute their work items), so there is no dependency on a particular order of execution when it comes to work items in compute shader concepts; they can all run at the same time or independently of each other:

for example, 10 bottom-right pixels of your flood fill algorithm could be the first lot to be drawn (and not the 19 first top-left);

imagine walking into a large football stadium at night; when u turn the lights on, they don't all come on at the same time, right: some will blink for a little before they steady others will flash for a little while, but ultimately they all become lit;

it's kinda like this a compute shader gets executed;

so when u write your compute shader think more in the lines of “when the shader gets called to process this pixel then do blablabla …”

look at this example:

https://forum.unity.com/threads/compute-shader-for-line-drawing.599989/

https://gist.github.com/unitycoder/2fe0bfd2498041dedd5c326c9d4c727e

u will see that there is no for-loop to run from pixel to pixel instead the code works out which pixel it is about to draw:

(from the 1st link)
float2 uv = float2 ((float)id.x/1024, (float)id.y/1024);

1024 is both the width and height of the texture passed to CShader;

then u use id.x and id.y divided by the size of your texture to tell which pixel to process;

so float2 uv in this will be in the range UV(0,1) … like texture coordinates ?

so once u know what coordinate u are working on, then plug it into your algorithm;

there's a lot more I haven't said, but to keep things simple, i hope this clarifies it a bit;

have fun ?

You probably want to try jump flooding for this, which is the go-to algorithm for generating a distance field on the GPU.

MJP said:

You probably want to try jump flooding for this, which is the go-to algorithm for generating a distance field on the GPU.

@MJP

I will look into jump flooding alghorithm.

It requires me to run the same shader a dozen times but with different offsets. Is there a smarter way of providing this offset rather than via a constant buffer?

Also slightly related question. If I invoke Dispatch a dozen times, do I need to worry about concurrency between Dispatches (since each run will read/write to the intermediate JFA texture) or will each Dispatch block until the next one runs?

And what happens if I try updating a constant buffer currently in use by a Dispatch call?

@kaiserjohan

which API are you using for this? D3D11/OpenGL will handle everything for you in terms of synchronization, in D3D12/Vulkan you will need to issue the proper barriers between dispatches and also handle updating the constant buffer in a safe way.

So looked into the Jump Flooding Alghorithm but my implementation must have some issue I can't figure out.

Here's a visualization of the distance where 1.0 = river pixel and 0.0 is as far away from river pixel as possible.

The first ~3-4 pixels away from a river pixel looks OK but then I get wildly swinging values as you can see from the noise-like pattern since some of the nearby pixels actually stored a coordinate that was way, way off…

The alghorithm seems pretty simple unless I am gravely missunderstanding something.

This is the C++ code that runs the JFA passes:

void DX11TerrainPass::UpdateJFA( TerrainRenderData& renderData, uint32_t width, uint32_t height, uint32_t dispatchX, uint32_t dispatchY )
{
	const DX11DynamicTexture& JFARiverMap = renderData.mJumpFloodRiverMap;

	mContext->CSSetShader( mJFAComputeShader, nullptr, 0 );

	mJFACBuffer.Bind();

	mContext->CSSetUnorderedAccessViews( UAV_SLOT_0, 1, &JFARiverMap.mUAV.p, nullptr );


	// 12000 x 9000 texture so ~13 passes
	int32_t logX = static_cast<int32_t>( std::log2( width ) );
	int32_t logY = static_cast<int32_t>( std::log2( height ) );

	int32_t numRuns = std::max( logX, logY );
	for ( int32_t runIndex = 0; runIndex < numRuns; ++runIndex )
	{
		int32_t offsetX = static_cast<int32_t>( std::pow( 2, logX - runIndex - 1 ) );
		int32_t offsetY = static_cast<int32_t>( std::pow( 2, logY - runIndex - 1 ) );

		mJFACBuffer.SetData( { offsetX, offsetY } );
		mContext->Dispatch( dispatchX, dispatchY, 1 );
	}

	mContext->CSSetUnorderedAccessViews( UAV_SLOT_0, 1, &gNullUAV.p, nullptr );
}

This is the compute shader for JFA:

cbuffer PerTerrainConstants : register( CBUFFER_REGISTER_EXTRA )
{
	int2 gSampleOffset;
}

RWTexture2D<uint2> gJumpFloodRiverMap : register( UAV_REGISTER_0 );

void UpdateClosest( inout int2 currentValue, int2 currentTexcoord, int2 jumpTexcoord )
{
	// out-of-bounds or no value stored
	int2 jumpValue = gJumpFloodRiverMap[ jumpTexcoord ];
	if ( IsZero( jumpValue ) )
	{
		return;
	}


	// nothing stored yet, use this
	float jumpDistance = distance( (float2)currentTexcoord, (float2)jumpTexcoord );
	if ( IsZero( currentValue ) )
	{
		gJumpFloodRiverMap[ currentTexcoord ] = jumpValue;
		currentValue = jumpValue;
		return;
	}

	float currentDistance = distance( (float2)currentTexcoord, (float2)currentValue );
	if ( jumpDistance < currentDistance )
	{
		gJumpFloodRiverMap[ currentTexcoord ] = jumpValue;
		currentValue = jumpValue;
	}
}

[numthreads(TERRAIN_NORMAL_THREADS_AXIS, TERRAIN_NORMAL_THREADS_AXIS, 1)]
void cs_main(uint3 groupID : SV_GroupID, uint3 dispatchTID : SV_DispatchThreadID, uint3 groupTID : SV_GroupThreadID, uint groupIndex : SV_GroupIndex)
{
	int2 texCoord = dispatchTID.xy;
	int2 currentValue = gJumpFloodRiverMap[ texCoord ];


	// the offset texture coordinates, clockwise from top
	int2 texCoordT = texCoord + int2( 0, gSampleOffset.y );
	int2 texCoordTR = texCoord + gSampleOffset;
	int2 texCoordR = texCoord + int2( gSampleOffset.x, 0 );
	int2 texCoordBR = texCoord + int2( gSampleOffset.x, -gSampleOffset.y );
	int2 texCoordB = texCoord + int2( 0, -gSampleOffset.y );
	int2 texCoordBL = texCoord - gSampleOffset;
	int2 texCoordL = texCoord + int2( -gSampleOffset.x, 0 );
	int2 texCoordTL = texCoord + int2( -gSampleOffset.x, gSampleOffset.y );

	UpdateClosest( currentValue, texCoord, texCoordT );
	UpdateClosest( currentValue, texCoord, texCoordTR );
	UpdateClosest( currentValue, texCoord, texCoordR );
	UpdateClosest( currentValue, texCoord, texCoordBR );
	UpdateClosest( currentValue, texCoord, texCoordB );
	UpdateClosest( currentValue, texCoord, texCoordBL );
	UpdateClosest( currentValue, texCoord, texCoordL );
	UpdateClosest( currentValue, texCoord, texCoordTL );
}

Anything that stands out?

Nevermind was distancing the offset coordinate rather than the stored value. This is the correct:

float jumpDistance = distance( (float2)currentTexcoord, (float2)jumpValue);

Seems to work fine now at a quick glance!

This topic is closed to new replies.

Advertisement