I am trying to implement a flood fill algorithm in a compute shader. I have seeded a texture with water texels, and now I want the other texels to hold “distance to water” values, so the seeded texels would have value 0, their direct neighbours value 1, and so on…
It would be trivial to implement on the CPU, but I am wondering if I can do this in a compute shader instead. I am using a very large texture, and I am already processing its data in a compute shader.
The problem I see is that the algorithm I had in mind doesn't seem to fit very well with how compute shaders execute their threads.
On a CPU I would use a FIFO structure: append the seeded river texels, then for each dequeued texel's neighbours, increment the distance and append them to the queue.
Any idea how I could do something like this on the GPU?
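For comparison, the CPU approach described above is a multi-source BFS. A minimal Python sketch (grid size and seed positions are made up for illustration):

```python
from collections import deque

def bfs_distance(seeds, width, height):
    """Multi-source BFS: each texel gets the Manhattan-style step count
    to the nearest seed texel (seeds themselves get 0)."""
    UNVISITED = -1
    dist = [[UNVISITED] * width for _ in range(height)]
    queue = deque()
    for x, y in seeds:
        dist[y][x] = 0
        queue.append((x, y))
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and dist[ny][nx] == UNVISITED:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((nx, ny))
    return dist

grid = bfs_distance([(0, 0)], 4, 4)  # single seed in the corner
```

The queue is exactly the part with no direct GPU equivalent: it serializes the wavefront, which is what makes a straight compute-shader port awkward.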
If the seeds are sparse, my hierarchical idea should work. E.g. if using 16*16 tiles, check whether there is any seed in a tile and, if so, make the tile a seed for the low-res version. Then calculate distances for the low-res version, and append tiles to one of N work lists, using the distance to index the list. After that, the first dispatch for the high-res version would process tiles from the list with distance 0, the next dispatch distance 1, etc.
The goal would be to process the high-res tiles in coarse order, at a time when their affecting neighborhood is already known.
Though I'm pretty sure it's not that easy, and some tiles will probably need to be processed multiple times; getting this right surely is some work. And if your input is fuzzy (some seed ends up in almost every low-res tile, so most tiles go to the first list), this won't help at all. Otherwise a fully hierarchical approach, e.g. using multi-level grids, makes sense as well.
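A rough CPU-side sketch of that tile bucketing (Python for clarity; the tile size, names, and the coarse BFS stand in for whatever the real GPU passes would compute):

```python
from collections import deque

TILE = 16  # tile size from the post (16*16 texels per tile)

def coarse_tile_lists(seeds, width, height):
    """Mark each tile that contains a seed, BFS distances on the coarse
    tile grid, then bucket tiles into work lists indexed by distance."""
    tw, th = width // TILE, height // TILE
    dist = [[-1] * tw for _ in range(th)]
    queue = deque()
    for x, y in seeds:
        tx, ty = x // TILE, y // TILE
        if dist[ty][tx] == -1:
            dist[ty][tx] = 0
            queue.append((tx, ty))
    while queue:
        tx, ty = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = tx + dx, ty + dy
            if 0 <= nx < tw and 0 <= ny < th and dist[ny][nx] == -1:
                dist[ny][nx] = dist[ty][tx] + 1
                queue.append((nx, ny))
    lists = {}
    for ty in range(th):
        for tx in range(tw):
            lists.setdefault(dist[ty][tx], []).append((tx, ty))
    return lists

# one seed in the top-left tile of a 64*64 texture -> 4*4 tile grid
lists = coarse_tile_lists([(3, 5)], 64, 64)
```

Each `lists[d]` would then become its own dispatch, starting with `d = 0`.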
Some alternatives I can think of:
Exponential sum of density mipmaps. (I use this to approximate an SDF from a density octree obtained from voxelized models, and it works quite well.) But this won't give you exact integer Manhattan distances.
Having a temporally inaccurate solution which reduces its error every frame with a constant-time algorithm, e.g. selecting N random texels per frame and updating them looking only at their local neighborhood. That's surely nice if you can accept the resulting latency of the distance data. (Idea: texels with low distance update more often.)
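That amortized idea could look roughly like this on the CPU (a sketch under assumptions: one seed, arbitrary initial values and update counts; seed texels are pinned at 0 and everything else relaxes against its 4-neighborhood):

```python
import random

def relax_frame(dist, seeds, n):
    """One 'frame': relax n random texels against their 4-neighborhood.
    Each relaxation can only move a value toward the true distance."""
    h, w = len(dist), len(dist[0])
    for _ in range(n):
        x, y = random.randrange(w), random.randrange(h)
        if (x, y) in seeds:
            continue  # seed texels stay at distance 0
        best = min(dist[ny][nx]
                   for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                   if 0 <= nx < w and 0 <= ny < h)
        dist[y][x] = min(dist[y][x], best + 1)

# toy setup: one seed at (0, 0), everything else starts "far away"
W = H = 8
seeds = {(0, 0)}
dist = [[0 if (x, y) in seeds else 10**6 for x in range(W)] for y in range(H)]
random.seed(1)
for _ in range(200):      # run a few hundred "frames"
    relax_frame(dist, seeds, 32)
```

Values never drop below the true distance, so the field only ever converges; the latency mentioned above is the price.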
You probably want to look up articles on SDF generation, which is quite popular now and is the same problem. Found this: https://prideout.net/blog/distance_fields/ There are also various open-source plugins for Unity and UE4.
When you do it using compute shaders, forget about this:
KaiserJohan said: On a CPU I would use a FIFO structure, append the seeded river texels and then for all its neighbours increment and append to the queue.
You need to think differently; compute shaders are not invoked in a fixed order.
Compute shaders run on compute units, which are loaded with work groups from a work domain. These work groups can run in any order to execute their work items, so there is no dependency on a particular order of execution when it comes to work items in compute shader concepts; they can all run at the same time or independently of each other.
For example, 10 bottom-right pixels of your flood fill algorithm could be the first lot to be processed (and not the top-left ones).
Imagine walking into a large football stadium at night: when you turn the lights on, they don't all come on at the same time, right? Some will blink for a little while before they steady, others will flash for a while, but ultimately they all become lit.
That's kind of how a compute shader gets executed.
So when you write your compute shader, think more along the lines of “when the shader gets called to process this pixel, then do blablabla…”
It requires me to run the same shader a dozen times but with different offsets. Is there a smarter way of providing this offset than via a constant buffer?
Also, a slightly related question: if I invoke Dispatch a dozen times, do I need to worry about concurrency between dispatches (since each run will read/write the intermediate JFA texture), or will each Dispatch complete before the next one runs?
And what happens if I try to update a constant buffer currently in use by a Dispatch call?
Which API are you using for this? D3D11/OpenGL will handle everything for you in terms of synchronization; in D3D12/Vulkan you will need to issue the proper barriers between dispatches and also handle updating the constant buffer in a safe way.
So I looked into the Jump Flooding Algorithm, but my implementation must have some issue I can't figure out.
Here's a visualization of the distance, where 1.0 = river pixel and 0.0 is as far away from a river pixel as possible.
The first ~3-4 pixels away from a river pixel look OK, but then I get wildly swinging values, as you can see from the noise-like pattern; some of the nearby pixels actually stored a coordinate that was way, way off…
The algorithm seems pretty simple, unless I am gravely misunderstanding something.
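For reference, a minimal CPU sketch of the jump flooding pass structure (Python; names are illustrative). Two details that commonly produce exactly this kind of garbage are reading and writing the same texture within a pass instead of ping-ponging between two buffers, and comparing candidate seeds by anything other than their actual distance to the current texel:

```python
def jump_flood(seeds, width, height):
    """Nearest-seed map via jump flooding: each texel stores the coordinate
    of the best seed found so far; the step size halves every pass."""
    nearest = [[None] * width for _ in range(height)]
    for x, y in seeds:
        nearest[y][x] = (x, y)

    def dist2(seed, x, y):
        # squared distance from texel (x, y) to a stored seed coordinate
        return (seed[0] - x) ** 2 + (seed[1] - y) ** 2

    step = max(width, height) // 2
    while step >= 1:
        # ping-pong: read only from 'nearest', write only to 'out'
        out = [row[:] for row in nearest]
        for y in range(height):
            for x in range(width):
                best = nearest[y][x]
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        nx, ny = x + dx, y + dy
                        if not (0 <= nx < width and 0 <= ny < height):
                            continue
                        cand = nearest[ny][nx]
                        if cand is not None and (
                                best is None or dist2(cand, x, y) < dist2(best, x, y)):
                            best = cand
                out[y][x] = best
        nearest = out
        step //= 2
    return nearest

nearest = jump_flood([(0, 0)], 8, 8)  # single seed: every texel should find it
```

Note that JFA is only approximate in general (a small fraction of texels may pick a slightly suboptimal seed), but stored coordinates should never be wildly off; on the GPU, the outer loop becomes one Dispatch per step, with the step size passed in per pass.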