
Compute shader performance tied to size of buffer?

2 comments, last by AidanofVT 3 years, 5 months ago

Hi all, this is my first time posting on these forums, so please let me know if there are any conventions I'm missing:

I have a compute shader that was working just fine until I quadrupled the size of one of the compute buffers from 262,144 bytes to 1,048,576 bytes. After that increase, the Unity performance profiler tells me that this shader takes about 70ms to complete its work, when previously the time was insignificant. This buffer is only set once, at launch, and is never read back into the main C# script, so I don't think it can be anything to do with data transfer. I can only assume that I've crossed some sort of subtle-but-important memory threshold: some register is overflowing or something.

Shader memory management is a pretty arcane topic, so I don't know where to start. Can anyone give me a clue? Any help is appreciated.

And here's the shader. The buffer in question is "imageLibrary":

RWStructuredBuffer<uint> bugger;
RWStructuredBuffer<float> imageLibrary;
RWStructuredBuffer<uint> world;
RWStructuredBuffer<float2> cameraDimensions;
RWStructuredBuffer<float2> hereNow;
RWStructuredBuffer<float2> scale;
RWStructuredBuffer<float> output;

#pragma kernel action
[numthreads(1,1,1)]
void action (uint3 group : SV_GroupID, uint3 thread : SV_GroupThreadID) {
    uint members, stride;
    imageLibrary.GetDimensions(members, stride);
    for (int i  = 0; i < members; ++i) {
        bugger[i] = world [i];
    }
    uint totalWidth = cameraDimensions[0].x;
    float halfWidth = totalWidth / 2;
    uint totalHeight = cameraDimensions[0].y;
    float halfHeight = totalHeight / 2;
    uint offset, nevermind;
    world.GetDimensions(offset, nevermind);
    offset = sqrt(offset * 4) / 2;
    uint sectorLength = totalWidth / 8; //THE DIVISOR CHANGES IF THE THREAD-GROUP COUNT IS CHANGED
    uint sectorHeight = totalHeight / 8; //THE DIVISOR CHANGES IF THE THREAD-GROUP COUNT IS CHANGED
    uint xStart = group.x * sectorLength;
    uint yStart = group.y * sectorHeight;
    int k = 0;
    for (uint i = 0; i < sectorHeight; ++i) {
        for (uint j = 0; j < sectorLength; ++j) {
            float worldX = hereNow[0].x + ((float)(xStart - halfWidth) + (float) j) * scale[0].x / halfWidth;
            float worldY = hereNow[0].y + ((float)(yStart - halfHeight) + (float) i) * scale[0].y / halfHeight;
            int xInSquare = floor(worldX);
            int yInSquare = floor(worldY);
            int index = ((xInSquare + offset) * offset * 2 + (yInSquare + offset));
            int byteInFloat = index % 4;         
            index = index / 4;         
            int tileType = world[index];
            tileType = tileType >> (byteInFloat * 8);
            tileType = tileType & 0x000000ff;         
            int2 fromPixel = int2((float)(worldX - xInSquare) * 128, (float)(worldY - yInSquare) * 128);
            int2 toPixel = int2 (j + xStart, i + yStart);
            output[cameraDimensions[0].x * toPixel.y + toPixel.x] = imageLibrary.Load(16384 * tileType + 128 * fromPixel.y + fromPixel.x);
        }
    }

}


AidanofVT said:
[numthreads(1,1,1)]

this is huuuugely discouraged by Shawn Hargreaves; it means that you are using 1 thread per thread group when you have 32, 64, etc. threads at your disposal per thread group;

the point about gpu and simd is to run multiple threads at the same time, but you're using 1 and hoping that mum will have your dinner ready before you get home -lol-

AidanofVT said:
uint sectorLength = …; //THE DIVISOR CHANGES IF THE THREAD COUNT IS CHANGED

yes, but you have set your thread count to 1, so how does your comment help?

what you've done by increasing from 256KB to 1MB is you have increased the football playground size but kept the number of players the same, and what's worse is you're only using 1 player - some team this is!

i don't think the other players like u as a coach

right, let's get you out of misery, go and learn this:

https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/sm5-attributes-numthreads

and be a better coach
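To make the numthreads point concrete, here is a minimal sketch of the same per-pixel lookup with one thread per output pixel instead of one thread per screen sector. The buffer names and the tile arithmetic are taken from the posted shader; the kernel name `actionPerPixel`, the 8x8 group size, and the variable names `id`, `worldCount` and `packedTiles` are assumptions made for illustration, and the debug copy into `bugger` is left out.

```hlsl
RWStructuredBuffer<float> imageLibrary;
RWStructuredBuffer<uint> world;
RWStructuredBuffer<float2> cameraDimensions;
RWStructuredBuffer<float2> hereNow;
RWStructuredBuffer<float2> scale;
RWStructuredBuffer<float> output;

#pragma kernel actionPerPixel
[numthreads(8, 8, 1)] // 64 threads per group; the group size is an assumption
void actionPerPixel (uint3 id : SV_DispatchThreadID) {
    uint totalWidth = (uint)cameraDimensions[0].x;
    uint totalHeight = (uint)cameraDimensions[0].y;
    // The dispatch is rounded up to whole groups, so edge threads can fall outside the image.
    if (id.x >= totalWidth || id.y >= totalHeight) return;

    float halfWidth = totalWidth / 2;
    float halfHeight = totalHeight / 2;

    // Same offset computation as the posted shader.
    uint worldCount, stride;
    world.GetDimensions(worldCount, stride);
    int offset = (int)(sqrt((float)(worldCount * 4)) / 2);

    // id.x and id.y replace xStart + j and yStart + i from the nested loops.
    float worldX = hereNow[0].x + ((float)id.x - halfWidth) * scale[0].x / halfWidth;
    float worldY = hereNow[0].y + ((float)id.y - halfHeight) * scale[0].y / halfHeight;
    int xInSquare = (int)floor(worldX);
    int yInSquare = (int)floor(worldY);

    // Four tile types are packed into each uint of world, one per byte.
    int index = (xInSquare + offset) * offset * 2 + (yInSquare + offset);
    uint packedTiles = world[index / 4];
    uint tileType = (packedTiles >> ((index % 4) * 8)) & 0xff;

    // Each tile image is 128 x 128 floats (16384 per tile).
    int2 fromPixel = int2((int)((worldX - xInSquare) * 128), (int)((worldY - yInSquare) * 128));
    output[totalWidth * id.y + id.x] = imageLibrary[16384 * tileType + 128 * fromPixel.y + fromPixel.x];
}
```

With this layout the C# side would dispatch enough groups to cover the image, for example `Dispatch(kernel, (width + 7) / 8, (height + 7) / 8, 1)`, rather than a fixed 8 x 8 grid of groups. Whether it is actually faster for this workload would need to be profiled.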

@ddlox Thanks for the response. What's not shown here is that the shader is dispatched with 64 thread groups. Originally I actually did have multiple threads per thread group, but I found that it was fast enough with just 64 threads overall, so for the sake of simplicity I made thread groups the only determinant of responsibility. Before posting this thread, I did change it to one group of 64 threads, and there was no improvement. I'll try a few threads per thread group with 64 threads and see what happens, but I doubt it will change anything because the “imageLibrary” buffer has nothing to do with the shader's workload. It's just an array of pixels to refer to; the number of screen-pixels that each thread is responsible for isn't changed, so I wouldn't think that there are any more operations demanded.
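One thing in the posted kernel may be worth ruling out before looking at memory thresholds: the first loop takes its trip count from `imageLibrary.GetDimensions`, not from `world`, so its length does scale with the size of "imageLibrary". A 1,048,576-byte buffer of floats has 262,144 elements, against 65,536 for the 262,144-byte version, and every thread runs the whole loop. With 64 threads that is about 16.8 million iterations per dispatch instead of about 4.2 million. A minimal sketch of the copy bounded by `world` and split across threads follows; it assumes `bugger` is only a debug copy of `world`, and the kernel name `copyWorld` and the group size of 64 are made up for the example.

```hlsl
RWStructuredBuffer<uint> bugger;
RWStructuredBuffer<uint> world;

#pragma kernel copyWorld
[numthreads(64, 1, 1)] // group size is an assumption
void copyWorld (uint3 id : SV_DispatchThreadID) {
    // Bound the copy by the source buffer, so its cost no longer depends on imageLibrary.
    uint count, stride;
    world.GetDimensions(count, stride);
    // Each thread copies one element; threads past the end of the buffer do nothing.
    if (id.x < count) {
        bugger[id.x] = world[id.x];
    }
}
```

This would be dispatched with enough groups to cover `world`, that is, its element count divided by 64 and rounded up. If the copy is only there for debugging, removing it and re-profiling is a quick way to check whether it accounts for the 70ms.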
