
Percentage Closer Soft Shadows (combined with jump flooding): how to correctly estimate penumbra size (diagrams inside)


evelyn4you said:
But i have to clearly admit a BIG DISADVANTAGE of raytraced shadows: using character animation with high-poly characters will slow things down considerably, because the BLAS (bottom level acceleration structure) needs to be rebuilt or updated per frame.

Animated foliage will add to this issue.

evelyn4you said:
(really big scene)

But what if you have an open world? Then you need to build the BVH for the streamed scene at runtime. At this point, because current APIs do not allow us to precompute and stream BVH, the whole raytracing idea is just broken at the moment, at least on PC.
We need a BVH API, so we can stream our custom BVH from disk and convert it to the format required by the hardware.
We also need a BVH API so we can edit the BVH locally, to support advanced LOD solutions such as Nanite.

As it is now, raytracing seems more like a step back, lacking the flexibility required to move on, and bringing us back to the stone age of fixed-function restrictions and brute force.

Short-sighted, expensive, too slow for what it gives. It came years too early, rushed just to feign progress and to dictate how games have to work, to the advantage of a certain chip maker.

You see, i really don't like its current state, although i agree it's preferable over shadow maps, and the proper solution for many other things. : /


I took the liberty of fixing my back-projection shadows algorithm (the one without using a hierarchy). There were certain issues regarding linearization of Z, which I was linearizing multiple times, because I'm stupid (sorry for the rant) and my engine already HAS THE Z LINEARIZED ALL THE TIME … AND I HAD A COMMENT SAYING SO 10 LINES ABOVE WHERE I EDITED THE CODE. I'll clean up the code a little bit and post it here for standard back-projection shadows. I will intentionally show 2 results:

  • Back-projection shadow maps with gap filling - ends up with an over-estimated penumbra (i.e. the penumbra is darker than what one would expect)
  • Back-projection shadow maps without gap filling - ends up with a correct penumbra, but introduces light leaks

Now, I do know that one can get rid of light leaks in multiple ways:

  • Use the gap filling (with parameter fine-tuning, the over-estimation of the penumbra could get better) … I might grab a video of that, because it really is visible only with small (but nonzero) area-light sizes
  • Darkening … which is used with VSM and often heavily destroys the penumbra part
  • Use depth peeling (i.e. have more information than just the ‘first back-face hit' in the shadow map) … which would be an astronomical performance hit

There is another huge problem: when your search through the shadow map can't cover the light size, the shadow starts getting brighter. The proper solution for this one is to either use an astronomically huge filter kernel (I've tried 128x128 and that somewhat fixes it in most cases - but even a Radeon RX 6800 was sweating about as much as in my custom path tracer) … or use the hierarchical approach.

I will look into the hierarchical one at some point and fix it (so it is correct too).

Now for the interesting part, the code (very crude, not many comments - I didn't have time to update it - this version is without gap filling):

float visibility = 1.0f;                      // start fully lit; blocker coverage is subtracted
float searchTexelSpread = 1.0f;               // spacing between search samples, in texels
float oneTexelSize = lightsData[lightInput.id].shadowScale;   // 1.0 / shadow map resolution
float halfTexelSize = oneTexelSize * 0.5f;
float2 ndcCurrentFragmentLightmapTexel = lNDC.xy;             // current fragment in light NDC
float lightSizeScale = 1.0f / lightsData[lightInput.id].data3.x;  // 1.0 / light size
float near_plane = spotLight.shadowNear;
float far_plane = spotLight.shadowFar;
float lightmapFarPlaneSize = tan(spotLight.spotAngle * 0.5f) * 2.0f * far_plane;  // world-space extent of the far plane
float oneFarPlaneTexelSize = lightmapFarPlaneSize * lightsData[lightInput.id].shadowScale;

float z = lNDC.z;                             // linearized depth of the current fragment

// Search a (2 * KERNEL_SEARCH_SIZE)^2 texel neighborhood and back-project each
// blocker texel onto the light, subtracting the fraction of the light it covers.
for (int i = -KERNEL_SEARCH_SIZE; i < KERNEL_SEARCH_SIZE; i++)
{
    for (int j = -KERNEL_SEARCH_SIZE; j < KERNEL_SEARCH_SIZE; j++)
    {
        float sampleTexX = lNDC.x + oneTexelSize * searchTexelSpread * float(i);
        float sampleTexY = lNDC.y + oneTexelSize * searchTexelSpread * float(j);

        float2 sampleCoords = float2(sampleTexX, sampleTexY);

        float zs = shadowMap.SampleLevel(shadowSampler, sampleCoords.xy, 0.0f).x;

        // Samples behind the current fragment cannot block it.
        if (zs > z)
        {
            continue;
        }

        float thisSampleZPlaneSize = lightmapFarPlaneSize * zs;

        float2 ndcNextFragmentLightmapTexel = sampleCoords;
        float scaleFactor = z / (z - zs);     // similar triangles: project the blocker onto the light plane

        // Edges of the blocker texel projected onto the light, normalized to the light size.
        float left = (ndcNextFragmentLightmapTexel.x - ndcCurrentFragmentLightmapTexel.x - halfTexelSize) * thisSampleZPlaneSize * scaleFactor * lightSizeScale;
        float right = (ndcNextFragmentLightmapTexel.x - ndcCurrentFragmentLightmapTexel.x + halfTexelSize) * thisSampleZPlaneSize * scaleFactor * lightSizeScale;
        float top = (ndcNextFragmentLightmapTexel.y - ndcCurrentFragmentLightmapTexel.y + halfTexelSize) * thisSampleZPlaneSize * scaleFactor * lightSizeScale;
        float bottom = (ndcNextFragmentLightmapTexel.y - ndcCurrentFragmentLightmapTexel.y - halfTexelSize) * thisSampleZPlaneSize * scaleFactor * lightSizeScale;

        // Clip the projected rectangle to the light's extent.
        left = clamp(left, -0.5f, 0.5f);
        right = clamp(right, -0.5f, 0.5f);
        top = clamp(top, -0.5f, 0.5f);
        bottom = clamp(bottom, -0.5f, 0.5f);

        float width = right - left;
        float height = top - bottom;

        float area = width * height;          // fraction of the light this blocker occludes

        visibility -= area;
    }
}

// Overlapping blockers can over-subtract, so clamp.
if (visibility < 0.0f)
{
    visibility = 0.0f;
}

Most of those should be self-explanatory - keep in mind that ALL depth info NEEDS TO BE LINEARIZED Z, or you will bash your head against the keyboard wondering why it looks like one huge light leak (a small helper for that follows after these notes). A few notes:

  • lNDC is the NDC position from the light's view projection
  • shadowScale is 1.0 / resolution of the shadow map
  • data3.x on lights is the light size
  • searchTexelSpread should be a parameter (I don't expose it to the GUI right now)
  • spotAngle is there because I have implemented this for spotlights only as of now
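
Since that caveat bites easily, here is a tiny helper for it - a minimal sketch assuming a standard D3D-style perspective projection with depth in [0, 1] (written in C++ here, but the expression drops straight into HLSL; if your projection differs, invert your own matrix instead):

// Converts a non-linear [0,1] depth back to linear view-space Z, assuming the
// standard D3D perspective projection: zNdc = (zFar / (zFar - zNear)) * (1 - zNear / zView).
float LinearizeDepth(float zNdc, float zNear, float zFar)
{
    return (zNear * zFar) / (zFar - zNdc * (zFar - zNear));
}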

If I get a chance (and time) I may turn this into a proper article at some point. And while the large blocker search seems to be a bit of a problem - I can imagine situations where this is going to be superior to other techniques. I'm keeping that option in the light editor inside my engine as of now.

Note: As for the gap filling, you have to do something like this prior to calculating the width and height of a single blocker:

// Gap filling: peek at the neighbor one search step to the left; if it is also
// a blocker, extend this blocker's left edge so no gap opens between the two.
float2 sampleCoordsX = float2(sampleTexX - (oneTexelSize * searchTexelSpread), sampleTexY);
float zsX = shadowMap.SampleLevel(shadowSampler, sampleCoordsX.xy, 0.0f).x;

if (zsX < z)
{
    float thisSampleZPlaneTexelSize = lightmapFarPlaneSize * zsX;
    float2 ndcNextFragmentLightmapTexel = sampleCoordsX;
    float scaleFactor = z / (z - zsX);
    float leftX = (ndcNextFragmentLightmapTexel.x - ndcCurrentFragmentLightmapTexel.x - halfTexelSize) * thisSampleZPlaneTexelSize * scaleFactor * lightSizeScale;
    leftX = clamp(leftX, -0.5f, 0.5f);
    left = min(leftX, left);
}

// Same for the neighbor one search step below, extending the bottom edge.
float2 sampleCoordsY = float2(sampleTexX, sampleTexY - (oneTexelSize * searchTexelSpread));
float zsY = shadowMap.SampleLevel(shadowSampler, sampleCoordsY.xy, 0.0f).x;

if (zsY < z)
{
    float thisSampleZPlaneTexelSize = lightmapFarPlaneSize * zsY;
    float2 ndcNextFragmentLightmapTexel = sampleCoordsY;
    float scaleFactor = z / (z - zsY);
    float bottom1 = (ndcNextFragmentLightmapTexel.y - ndcCurrentFragmentLightmapTexel.y - halfTexelSize) * thisSampleZPlaneTexelSize * scaleFactor * lightSizeScale;
    bottom1 = clamp(bottom1, -0.5f, 0.5f);
    bottom = min(bottom1, bottom);
}

Basically you sample the neighboring fragment, and extend the current one's size in case the neighbor is also a blocker.


Next up: you asked me how I build the hierarchical min-max mip map.

It is unoptimized (unlike standard mip map creation - because I was lazy), but in my lighting system I have this call:

void GenerateMipmaps(Engine::DescriptorHeap* heap, Engine::ComputeContext* context)
{
	context->SetPipelineState(mMipmapPS);
	context->SetRootSignature(mMipmapRS);
	context->SetDescriptorHeap(Engine::DescriptorHeap::CBV_SRV_UAV, heap);
	int todo = mMiplevels - 1;            // number of mip levels left to generate
	int base = 0;                         // source mip level for this pass
	int dimension = mShadowAtlasResolution;

	while (todo != 0)
	{
		int mipLevels = 1;                // this unoptimized version does 1 level per dispatch

		// Constants: source mip, number of levels to generate, 1/dimension as texel size.
		context->SetConstants(0, Engine::DWParam(base), Engine::DWParam(mipLevels), Engine::DWParam(1.0f / (float)dimension), Engine::DWParam(1.0f / (float)dimension));
		context->SetDescriptorTable(1, mShadowColorMap->GetSRV());
		context->SetDescriptorTable(2, mShadowColorMap->GetUAV(base + 1));  // write the next level
		context->Dispatch(dimension / 2, dimension / 2, 1);                 // one 2x2 group per output texel
		context->TransitionResource(mShadowColorMap, D3D12_RESOURCE_STATE_UNORDERED_ACCESS, true);  // UAV barrier between levels

		todo -= 1;
		base += 1;
		dimension /= 2;
	}
}

Which goes in a loop to generate all mip levels of the texture. You can use LDS in the compute shader to nicely compute multiple levels at once (I didn't do that here yet - as I said, this part is WIP - I implemented a proof of concept years ago and never really finished it … so seeing a post will kind of force me to drag it over the finish line, which is often mainly a matter of some additional changes, cleanup, and enabling the given technique everywhere). I think I went with 4 levels computed at once for standard mipmaps. The compute kernel to calculate a single mip level is straightforward:

[numthreads(2, 2, 1)]
void GenerateMipmapsMinMax(uint GI : SV_GroupIndex, uint3 DTid : SV_DispatchThreadID)
{
	float4 src0;
	float4 src1;
	float4 src2;
	float4 src3;

	// Each thread of the 2x2 group samples one source texel and stashes it in
	// LDS (StoreColor/LoadColor are groupshared-memory helpers).
	float2 uv = (DTid.xy + 0.5f) * texelSize;
	src0 = srcLevel.SampleLevel(srcSampler, uv, srcMiplevel);
	StoreColor(GI, src0);
	GroupMemoryBarrierWithGroupSync();

	// Thread 0 reduces the 2x2 block: min into red, max into green.
	if (GI == 0)
	{
		src1 = LoadColor(GI + 0x01);
		src2 = LoadColor(GI + 0x02);
		src3 = LoadColor(GI + 0x03);

		mipLevel1[DTid.xy / 2] = float4(min(min(src0.x, src1.x), min(src2.x, src3.x)), max(max(src0.x, src1.x), max(src2.x, src3.x)), 0.0f, 0.0f);
	}
}

This way you end up with the minimum in the red channel and the maximum in the green channel. You have to use two-component textures for this.


@Vilem Otte

thanks for your code and the screenshots. The back-projected shadow maps version with gap filling looks very good.
The code, with so many texture fetches, looks very expensive :-(

JoeJ said:
But what if you have an open world? Then you need to build the BVH for the streamed scene at runtime. At this point, because current APIs do not allow us to precompute and stream BVH, the whole raytracing idea is just broken at the moment, at least on PC. We need a BVH API, so we can stream our custom BVH from disk and convert it to the format required by the hardware. We also need a BVH API so we can edit the BVH locally, to support advanced LOD solutions such as Nanite.

hi JoeJ, let me ask here: are you sure?
My intention was the following scenario:

A. Precompute at game build time:

foreach BLAS object:
- create vertex buffer, index buffer
- create BLAS
- copy BLAS to CPU memory and store to disk

B. At runtime:
- when an object is needed, upload its VB, IB and BLAS to the GPU
- only update the TLAS

I haven't done the streaming part, but this was on my route. Is that impossible?

The update time of the TLAS seemed to me much less time critical than the creation of the BLAS.
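
For reference, step B maps to plain DXR calls; a minimal sketch of the per-frame TLAS build over already-resident BLASes (buffer and state management omitted; instanceDescsVA, scratchVA and tlasVA are placeholders for engine-managed GPU addresses):

#include <d3d12.h>

// Rebuilds the TLAS over instance descs that each point at a resident BLAS -
// one D3D12_RAYTRACING_INSTANCE_DESC per streamed-in object.
void RebuildTlas(ID3D12GraphicsCommandList4* cmdList,
                 D3D12_GPU_VIRTUAL_ADDRESS instanceDescsVA, UINT numInstances,
                 D3D12_GPU_VIRTUAL_ADDRESS scratchVA,
                 D3D12_GPU_VIRTUAL_ADDRESS tlasVA)
{
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC desc = {};
    desc.Inputs.Type = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL;
    desc.Inputs.Flags = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE;
    desc.Inputs.DescsLayout = D3D12_ELEMENTS_LAYOUT_ARRAY;
    desc.Inputs.NumDescs = numInstances;
    desc.Inputs.InstanceDescs = instanceDescsVA;
    desc.ScratchAccelerationStructureData = scratchVA;
    desc.DestAccelerationStructureData = tlasVA;
    cmdList->BuildRaytracingAccelerationStructure(&desc, 0, nullptr);
}

A full TLAS build over the instances is indeed cheap compared to BLAS builds, so the plan hinges entirely on whether the BLAS builds can be skipped.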


evelyn4you said:
- copy BLAS to CPU memory and store to disk

Tbh, i did not know this is possible! Is it?
Though, i guess what you get is vendor-specific data. Your data from an RTX 2000 might not work on an RTX 3000, future models, or AMD or Intel GPUs?
I guess the data might even break after the next driver update.

Maybe you can clarify this. Would be interesting. Personally i haven't arrived at next gen HW yet.

But even if this would work, i would still rant.
The whole idea of static BLAS models breaks once you introduce some solution to achieve continuous LOD, because then the mesh constantly changes.
RT APIs literally prevent us from coming up with a solution to the LOD problem, which is one of the key problems in computer graphics.
As i work on LOD, i'm pretty mad about those short-sighted amateurs ;D

JoeJ said:
Though, i guess what you get is vendor-specific data. Your data from an RTX 2000 might not work on an RTX 3000, future models, or AMD or Intel GPUs? I guess the data might even break after the next driver update. Maybe you can clarify this. Would be interesting. Personally i haven't arrived at next gen HW yet.

To clarify this, one needs to read the functional specifications of both DXR and Vulkan Ray Tracing, and both are… disappointing, to say the least. Apart from the definition of the TLAS (scene level) and BLAS (primitive/instance level), there are no requirements for the acceleration structure to be of a specific type. Now keep in mind that there are many acceleration structures, and pretty much all of them can be multi-level: grids, BSP trees (and KD-trees), various BVHs that are incompatible with each other (the standard binary BVH, the quad BVH (QBVH, 4 children per interior node), the 8-wide BVH, etc.), BIH (Bounding Interval Hierarchies), and so on.

Let's assume 2 cases (and unless I'm terribly blind - the specifications do allow for this):

Hardware Vendor A:

  • Builds the BVH using a standard SAH-BVH algorithm
  • Decides that their QBVH traversal implementation is faster than a standard binary BVH, so right after the build they convert the BVH to a QBVH (collapsing every second level of the tree)

Hardware Vendor B:

  • Builds the BVH using the same standard SAH-BVH algorithm
  • Decides to be more generic and uses standard binary BVH

If you serialize a BLAS from vendor A, it is incompatible with vendor B's hardware. And remember, at any point they can release just a driver update which makes previous versions completely incompatible. Yeah.
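
For what it's worth, this is baked right into the API: DXR does expose a serialize/deserialize path (intended mainly for tools like PIX), and every serialized blob starts with an opaque driver-matching identifier that must be checked before deserializing. A minimal sketch of the load path (buffer plumbing omitted; the helper and parameter names are illustrative):

#include <d3d12.h>
#include <cstdint>
#include <vector>

// cachedBlob was produced earlier via CopyRaytracingAccelerationStructure with
// D3D12_RAYTRACING_ACCELERATION_STRUCTURE_COPY_MODE_SERIALIZE and saved to disk.
bool TryLoadCachedBlas(ID3D12Device5* device,
                       ID3D12GraphicsCommandList4* cmdList,
                       const std::vector<uint8_t>& cachedBlob,
                       D3D12_GPU_VIRTUAL_ADDRESS uploadedBlobVA,  // blob copied to GPU memory
                       D3D12_GPU_VIRTUAL_ADDRESS destBlasVA)
{
    auto header = reinterpret_cast<const D3D12_SERIALIZED_RAYTRACING_ACCELERATION_STRUCTURE_HEADER*>(cachedBlob.data());

    // Ask the *current* driver whether it still understands this blob. Any GPU
    // swap or driver update can fail this check - exactly the incompatibility above.
    if (device->CheckDriverMatchingIdentifier(
            D3D12_SERIALIZED_DATA_RAYTRACING_ACCELERATION_STRUCTURE,
            &header->DriverMatchingIdentifier)
        != D3D12_DRIVER_MATCHING_IDENTIFIER_COMPATIBLE_WITH_DEVICE)
    {
        return false;  // fall back to a full BLAS rebuild from the geometry
    }

    // Turn the blob back into a usable BLAS on the GPU timeline.
    cmdList->CopyRaytracingAccelerationStructure(
        destBlasVA, uploadedBlobVA,
        D3D12_RAYTRACING_ACCELERATION_STRUCTURE_COPY_MODE_DESERIALIZE);
    return true;
}

So evelyn4you's caching plan is expressible, but only as a cache that any driver or hardware change may invalidate - it cannot replace the runtime build path.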

All in all, one can always use compute to build a ray tracer, acceleration structures, etc. (which I've been doing for years … fun fact: if I'm not mistaken, Unreal Engine also went this way) - so what's the point of the additional hardware? As for me, it would simply be better to increase the number of processors, the memory, the bandwidth, etc., instead of going in this direction. The vendors' sales teams thought otherwise though (the RTX vs. no-RTX comparison videos are often laughable - but they apparently worked as a sales point, by cleverly picking extreme scenarios). While I get the hype among consumers, I absolutely do not get the hype among developers - it is the fixed-function pipeline with all its pitfalls all over again (which we never even asked for, because real-time ray tracing has been possible for over a decade).

To follow up: with the current DXR/VulkanRT specifications we are headed in a bad direction - there is no acceleration structure to rule them all.

  • For static geometry the best approach is to just build one huge SBVH (also known as Split-BVH), which can take up to a few minutes (or more) for a standard scene … precomputing is the way to go, of course
  • For dynamic geometry like trees, grass or an ocean surface, one most likely also wants an SBVH or a SAH-BVH variant (note: one may also want to fine-tune parameters to make sure a refit for wind animation is ideal performance-wise) … but you need to refit the BVH most of the time on the fly (not rebuild it), as the movement is subtle
  • For deformable geometry, rebuilding may be the only option (at which point HLBVH or a similar algorithm might be viable)
  • For skinned geometry it might be possible to precompute & refit between precomputed keyframes … during animation blending or ragdoll (generally, physics taking over), a rebuild may be the only option
  • For particle systems, fast LBVH/HLBVH rebuilds might be the only option; a refit would often produce way too poor performance

And you could go on (a refit sketch follows below, to make the refit/rebuild distinction concrete). The current specifications don't allow any of this, and likely never will. It is likely that there is going to be either a major re-spec, or DXR/VulkanRT won't be used nearly as much as the public would expect.
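
Here is that minimal CPU-side refit sketch over a flat binary BVH, assuming parents are stored before their children (typical for top-down builders) and that the leaf bounds were already refreshed from the animated vertices; a rebuild, by contrast, recomputes the topology itself:

#include <algorithm>
#include <vector>

struct Aabb { float bmin[3], bmax[3]; };

struct BvhNode
{
    Aabb bounds;
    int  left  = -1;  // child indices; -1 marks a leaf
    int  right = -1;
};

// Refit keeps the tree topology and only re-grows the boxes, bottom-up.
void RefitBvh(std::vector<BvhNode>& nodes)
{
    // Reverse order visits children before their parents, so one sweep suffices.
    for (int i = (int)nodes.size() - 1; i >= 0; --i)
    {
        BvhNode& n = nodes[i];
        if (n.left < 0)
            continue;  // leaf: bounds were already updated from the geometry

        const Aabb& a = nodes[n.left].bounds;
        const Aabb& b = nodes[n.right].bounds;
        for (int k = 0; k < 3; ++k)
        {
            n.bounds.bmin[k] = std::min(a.bmin[k], b.bmin[k]);
            n.bounds.bmax[k] = std::max(a.bmax[k], b.bmax[k]);
        }
    }
}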


Hi Vilem Otte and JoeJ,

everything you say about the problems of dynamic BVH updates is right.

Maybe a little misunderstanding is happening here, because i did not correctly say what i meant.
What i meant:

The BLAS buffers shall be precomputed and stored only once, on the user's machine. So the BLAS buffers will be vendor specific, but that will be no problem.
According to the specs, the BLAS does NOT keep any live reference to the vertex and index buffers. They do not have to reside in RAM; e.g. after creating the BLAS, the buffers could even be unloaded until they are needed.

After generating the vertex and index buffers, all the "big data" is in RAM and the creation of the BLAS is very, very fast (a few seconds or so).

As we all know, the "real truth" reveals itself when we really want to implement it, which i haven't done yet.

About nanite technique:

According to the published docs and the source code, Epic Games partly runs into the same problem that was mentioned.
Yes, they have solved a very big, everlasting problem of continuous LOD handling, even paired with data compression.

But the necessary preprocessing of the Nanite meshes is much, much more complex and CPU intensive, with all the graph theory behind it.
Using Nanite for realtime character animation, water animation, or foliage animation comparable to the rasterizer will probably be impossible in the near future.

So developers also have to go the hybrid route and mix the old lower-poly mesh technique with the new Nanite pipeline, which UE handles smartly.

Vilem Otte said:
To clarify this, one needs to read the functional specifications of both DXR and Vulkan Ray Tracing, and both are… disappointing, to say the least. Apart from the definition of the TLAS (scene level) and BLAS (primitive/instance level), there are no requirements for the acceleration structure to be of a specific type. Now keep in mind that there are many acceleration structures, and pretty much all of them can be multi-level: grids, BSP trees (and KD-trees), various BVHs that are incompatible with each other (the standard binary BVH, the quad BVH (QBVH, 4 children per interior node), the 8-wide BVH, etc.), BIH (Bounding Interval Hierarchies), and so on.

The reasoning is obvious: by not specifying the data structures, vendors can do whatever suits their approach best.

Nothing wrong with that. But all 3 vendors communicate that they use a BVH, which is the most widely used structure for raytracing in general. Thus we can rule out all the alternatives, making the request for a BVH API reasonable.
But then, how should we deal with the differences? We know AMD uses a branching factor of 4, from their RT instructions. NV and Intel may use binary trees, or branching factors of up to maybe 64. NV may even use compression based on treelets.

So what we need is an API in the form of special compute functions to generate the BVH data with the necessary amount of abstraction, plus a specification we can query from the driver, so we know which exact requirements the custom BVH has (sketched below).
I see this is pretty involved, and not many would want to use such low level stuff. We would end up writing code for multiple vendors and chip generations. But if we want any form of continuous LOD, we simply have no other choice.
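
To make the request tangible, a purely hypothetical shape of such an API - nothing like this exists in DXR or Vulkan RT today, and every name below is invented:

#include <cstdint>

// Hypothetical: what the driver would report about its hardware BVH format.
struct BvhHardwareSpec
{
    uint32_t branchingFactor;         // e.g. 4 on current AMD, per their RT instructions
    uint32_t nodeSizeInBytes;         // stride of one hardware node
    uint32_t maxDepth;                // traversal stack limit
    bool     usesTreeletCompression;  // compressed treelet layouts, if any
};

// Hypothetical: query the spec, convert a streamed custom BVH into the hardware
// layout, and patch individual nodes locally (for continuous LOD) instead of
// rebuilding everything from triangles.
BvhHardwareSpec QueryBvhHardwareSpec();
void ConvertCustomBvhToHardwareBlas(const void* customBvh, void* hardwareBlas);
void PatchHardwareBlasNodes(void* hardwareBlas, const uint32_t* dirtyNodes, uint32_t count);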

It will take many years until we get this, if at all.

The workaround is to do what Epic does with Nanite: use low-poly proxies with discrete LOD for raytracing. No details, no exact results, but at least that's doable.
But this also means: keep the legacy crap around just to support raytracing. Even if we have just solved the LOD problem for rasterization, we are still stuck with all the problems from before for raytracing.
This is not acceptable imo. It's better to choose: either LOD or raytracing. Pick one of those two wrong options.

Vilem Otte said:
All in all, one can always use compute to build a ray tracer, acceleration structures, etc. (which I've been doing for years … fun fact: if I'm not mistaken, Unreal Engine also went this way)

UE does software tracing of SDF volumes, but now you can completely replace this system with DXR and proxies, afaik.
Pros and cons are scene dependent. DXR sucks in scenes with many overlaps due to kitbashing (early cave demo), but wins in modular scenes (city demo).
Neither option will give accurate high frequency details in general. SDF is an approximation, DXR traces low-poly proxies. Thus you could not use RT for shadows. Instead they keep using ‘crappy legacy shadow maps', because the ‘awesome and innovative raytracing' is not even compatible with their geometry, which still is just triangles.
Besides that, although they already have a BVH on disk (Nanite), they still need to build the RT BVH at runtime as well. (One may argue Nanite's BVH is not good for RT, but in my case it would be, and Epic's looks good for such a purpose too.)

I'm not impressed by this patchwork. It's garbage, waiting to be replaced with something better. Just, the smart guys at MS and NV forgot about giving us the options to do so. They thought the classic approaches from the 70's and offline rendering are fine for games, and the next big thing for sure. <:]

Now the damage has been done and it's very hard to fix that mess.

Vilem Otte said:
I absolutely do not get the hype among developers

I was hanging around a lot on the b3d forum to discuss those issues. Many industry experts there. Those were the most heated discussions of my internet life. Total fun as well, but opinions, expectations, arguments and agendas just vary wildly.

Idk, but ofc. my request on dynamic geometry does not represent state of the art in games tech. I see that, and besides that, raytracing is pretty nice.

But to me, ‘state of the art in games' means one thing first: constant progress. I want the next game to look better than the former one. RT may give us that, but people can not see the doors it closes, the double work it requires, and that flexibility is worth more than a temporary performance advantage.

Vilem Otte said:
For skinned geometry it might be possible to precompute & refit between precomputed keyframes … during animation blending or ragdoll (generally, physics taking over), a rebuild may be the only option

We can not precompute based on keyframes. We have dynamic ragdolls, blending animations, procedural animation, etc.
For characters or foliage it might be best to precompute the BVH at the reference pose, extend the bounds so they bound all possible animations, and then transform the nodes per frame. Notice that, due to the extension, we do not even need to refit the BVH (causing expensive barriers between tree levels). Simply transforming the bounds is good enough.
The resulting overlap of bounds would decrease tracing performance, but the cost to update the BVH is basically zero. Sounds like a win to me.
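
A minimal sketch of that per-frame node transform (C++; Arvo's method computes the exact AABB of a transformed AABB, and the node-to-bone mapping plus the pre-extended reference-pose bounds are assumed to exist):

#include <algorithm>

struct Aabb   { float bmin[3], bmax[3]; };
struct Mat3x4 { float m[3][4]; };  // rows: rotation/scale in columns 0..2, translation in column 3

// Exact AABB of an AABB transformed by a bone matrix (Arvo's method).
// Run once per node per frame - no refit, no inter-level barriers.
Aabb TransformAabb(const Aabb& box, const Mat3x4& bone)
{
    Aabb out;
    for (int r = 0; r < 3; ++r)
    {
        out.bmin[r] = out.bmax[r] = bone.m[r][3];  // start from the translation
        for (int c = 0; c < 3; ++c)
        {
            float e = bone.m[r][c] * box.bmin[c];
            float f = bone.m[r][c] * box.bmax[c];
            out.bmin[r] += std::min(e, f);
            out.bmax[r] += std::max(e, f);
        }
    }
    return out;
}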

Vilem Otte said:
It is likely that there is going to be either a major re-spec, or DXR/VulkanRT won't be used nearly as much as the public would expect.

If there is enough pressure from the devs, they will solve this sooner rather than later. But at the moment high HW prices are the bigger problem preventing RT adoption. And i don't think this will change either.

I see two camps: devs who want to deliver cutting edge tech, so they request high minimum specs, targeting an enthusiast niche of gamers.
And devs who try to keep gaming affordable, investing more in art than tech.

The former makes little sense currently, so i assume NV pushes RT a lot with both money and assistance. But NV does not listen to what people want. Instead, they tell them how and with what technology they will get what they want. They play the role of the innovator, while what they actually want is to sell bigger and bigger GPUs. For them, gaming is ideally a status symbol of cutting edge hardware.

And their strategy just works. Nobody doubts it, despite the obvious conflict of interest.

IMO, what we want is a small box with a 5 tf APU. Low power, but powerful enough to game and do all the other PC things. Something like a Series S, or even an M1 chip for games.
Cheap hardware, and if we lower some expectations, just fine. This should be what the gaming industry wants, no?

It does not seem so. Actually, we almost have such a chip: AMD's Rembrandt has a 3.5 tf iGPU, if i guess correctly, at 60 Watt.
I like that. I think that's what i need to deliver UE5-level gfx. But at 60 fps, not 20 like in that city demo.
So i tried to get one such laptop.
But all such laptops are ‘enthusiast gaming laptops'. Expensive, and they have a discrete GPU like an RTX 3050 at least. That's a 5.3 tf GPU. No big win over the iGPU. A waste of money and power. Ridiculously stupid.
I did not find a single model without a dGPU. So again, i'll keep my money and stick with retro games.

From my perspective as a PC player, gaming currently really looks bad, no matter from what angle i look at it. There are no interesting offers, regarding both games and hardware.
But well, we've had some lows before.

JoeJ said:
The workaround is to do what Epic does with Nanite: use low-poly proxies with discrete LOD for raytracing. No details, no exact results, but at least that's doable.

Doable, but at the cost of losing the ray tracing advantages, ending up with heavier algorithms. Speaking about reflections - this solution is about as hacky as cube maps, but at an astronomically higher cost. Speaking about shadows - shadow maps will beat this in terms of accuracy, not even mentioning performance. And so on… You will literally lose all the advantages over previous approaches, at the cost of being (much) slower than them. Doesn't seem like a win to me.

JoeJ said:
DXR sucks in scenes with many overlaps due to kitbashing (early cave demo), but wins in modular scenes (city demo).

Hm… now I'm going into assumptions here (I don't know the implementations of their BLAS/TLAS … if someone does, they are free to correct me), but Sponza (the Crytek version too) was kind of tricky for ray tracers. I remember from the time when I worked on my thesis that different approaches to BVH didn't really work that well for it, until I started messing with Split-BVH (which took multiple minutes to build on high end hardware at the time … the Split-BVH builder was of course running on the CPU). The difference between using a binned SAH-BVH and a proper Split-BVH was huge - clearly at least an order of magnitude in subsequent ray traversal.

My assumption here is that vendors don't really use algorithms like Split-BVH due to their enormous time requirements, but instead favor binned approaches (like the good old binned SAH-BVH) or Morton code ones (like HLBVH, which can use the GPU to significantly speed up the build). This would explain why overlapping geometry could cause such problems for DXR.

JoeJ said:
Idk, but ofc. my request on dynamic geometry does not represent state of the art in games tech.

At least terrain systems dare to disagree in general.

JoeJ said:
For characters or foliage it might be best to precompute the BVH at the reference pose, extend the bounds so they bound all possible animations, and then transform the nodes per frame. Notice that, due to the extension, we do not even need to refit the BVH (causing expensive barriers between tree levels). Simply transforming the bounds is good enough.

A general problem with acceleration structures which I had in the past (and still have) is that the performance comparison between a good acceleration structure that is refitted after a few frames of character animation … and one that is re-built with even a slightly worse algorithm like HLBVH - always ended up in favor of the rebuild. Extension and refit within bounds doesn't seem that attractive to me. But this being said, I have not tried it - and the more thought I give it (I'm rewriting this paragraph for like the 10th time) … it might not be that bad an idea for foliage.

Practically, you would build a standard BVH with the algorithm of your choice. Then, in a loop over the whole animation, you recalculate the vertices in the leaves, extending each node to the bounds of the animated vertices throughout the whole animation loop. At last you'd simply go bottom-up and refit the interior nodes to contain everything below them. Your BVH would be overestimated the whole time - but as long as the animation is subtle (exactly like grass blades in wind or such) it's going to work correctly. As for performance, one would need to compare it to rebuild/refit (this could even be automated, choosing the best strategy for a given animation).
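
A minimal sketch of that sweep (C++; SkinVertices and the leaf-to-vertex mapping are placeholders for the engine-specific pieces):

#include <algorithm>
#include <vector>

struct Aabb   { float bmin[3], bmax[3]; };
struct Vertex { float p[3]; };

std::vector<Vertex> SkinVertices(float time);  // placeholder: evaluate the animation at a given time

void GrowToInclude(Aabb& box, const Vertex& v)
{
    for (int k = 0; k < 3; ++k)
    {
        box.bmin[k] = std::min(box.bmin[k], v.p[k]);
        box.bmax[k] = std::max(box.bmax[k], v.p[k]);
    }
}

// leafBounds[i] starts as the reference-pose AABB of leaf i; after the sweep it
// conservatively bounds that leaf's vertices over the entire animation loop.
void ExtendLeavesOverAnimation(std::vector<Aabb>& leafBounds,
                               const std::vector<std::vector<int>>& leafVertexIds,
                               float animLength, float timeStep)
{
    for (float t = 0.0f; t <= animLength; t += timeStep)
    {
        const std::vector<Vertex> posed = SkinVertices(t);
        for (size_t leaf = 0; leaf < leafBounds.size(); ++leaf)
            for (int vid : leafVertexIds[leaf])
                GrowToInclude(leafBounds[leaf], posed[vid]);
    }
    // Then refit the interior nodes bottom-up once, as in the refit sketch above.
}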

JoeJ said:
But at the moment high HW prices are the bigger problem preventing RT adoption. And i don't think this will change either.

Yup, I can confirm - I bought an RX 6800 on launch day, and even then a Ferrari looked like the cheaper investment next to it. Since then GPUs have only gotten more expensive.

I'm still not convinced about devs going 2 different ways - the market will win, it always does (whatever sells more). For me it is always about whether the graphics fit the genre and the game. Yes, plausible shadows are great, perfect reflections and GI look awesome, and flying decapitated parts of enemies make for a wow effect. But the heart of a game - I love stories, characters, writing and gameplay - is the thing that holds you tight to your screen overnight. The wow from graphics catches the eye, but it can't fill in a lacking heart. And from my point of view it looks like most games of almost the past decade have, sadly, forgotten that (kudos to the smaller studios, often with custom engines, that still deliver the experience I'm looking for).


Vilem Otte said:
You will literally lose all the advantages over previous approaches, at the cost of being (much) slower than them. Doesn't seem like a win to me.

Yeah. I have some packet traversal in mind, which will do more work than classic raytracing, but solves the random access problem. And i can fake cone tracing by using the LOD embedded in my BVH hierarchy.
That's some advantages, so maybe my approach will be fast enough. Worth a try, but i won't be surprised if it's just a waste of time.
But in any case it will be faster than using DXR, which would force me to rebuild BVH for the entire scene each frame due to LOD.
Using low poly proxies is no option either, because i already have a complete but low frequency lighting model from my GI. So my only interest in raytracing is to get the high frequency detail i currently miss.

Vilem Otte said:
My assumption here is that vendors don't really use algorithms like Split-BVH due to their enormous time requirements, but instead favor binned approaches (like the good old binned SAH-BVH) or Morton code ones (like HLBVH, which can use the GPU to significantly speed up the build). This would explain why overlapping geometry could cause such problems for DXR.

No, the problem is completely independent of BVH quality, speaking of UE5.
It's about having very high detail, and to fit this into memory, but mainly on disk, they need to use heavy instancing of their models.
And if you want to compose a nature scene from a limited set of instanced rock models, you have to generate natural variation by combining multiple models to form new and unique shapes. That's where the overlaps come from.

Now we could solve this problem, e.g. by removing interior geometry, and build just one BVH over the final result. But then we can not precompute the BVH, because it would take terabytes of storage.

So our only option is to generate a BVH for each model, and each model becomes its own BLAS. But then BVH instances overlap, no matter how good their quality is. And a ray has to traverse all overlaps to find the closest intersection.

To get the best of both worlds, we could use a bit vector per instance to disable branches of the interior BVH. Such bit vectors would fit on disk, and tracing speed would be much better.
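
A small sketch of how such a per-instance mask could work during traversal (C++; node layout and names are illustrative):

#include <cstdint>
#include <vector>

struct BvhNode { int left = -1, right = -1; /* AABB etc. omitted */ };

// One bit per BLAS node, stored per *instance*: tiny on disk, while the BLAS
// itself stays shared between all instances of the model.
inline bool NodeEnabled(const std::vector<uint64_t>& mask, int node)
{
    return (mask[node >> 6] >> (node & 63)) & 1ull;
}

void Traverse(const std::vector<BvhNode>& nodes,
              const std::vector<uint64_t>& instanceMask, int node = 0)
{
    if (!NodeEnabled(instanceMask, node))
        return;  // subtree disabled for this instance (e.g. buried interior geometry)

    const BvhNode& n = nodes[node];
    if (n.left < 0)
    {
        // intersect leaf primitives here
        return;
    }
    Traverse(nodes, instanceMask, n.left);
    Traverse(nodes, instanceMask, n.right);
}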

Though, to do such things, we would need some flexibility. And NV does not want to give us that, but rather teraflops. So they pitched their legacy offline OptiX API as the next big thing for games, and Microsoft took the bait.
If people are stupid enough to buy snake oil without asking a doctor for an opinion, there is just nothing you can do about that.
I wish our tech world were less dominated by a tiny set of megacorps; then such mistakes would be prevented by competition.

Vilem Otte said:
At least terrain systems dare to disagree in general.

Yeah, but unfortunately you can just use a static high res height map, so this argument does not make enough impact to be heard.
I would hope more companies work on LOD, so more people realize the problem and complain.
But guess what? They just scratch their engines and switch over to UE.
So it's just Brian Karis asking ‘How to do LOD with DXR?' on Twitter, and me invading tech forums. : /
They surely talk, complain, criticize and request changes, but MS and NV will just listen and say ‘We're working on it. We'll come up with something. Just wait and see. We are the experts; we know what we're doing and what's currently possible.'
blah, blah, blah

Vilem Otte said:
A general problem with acceleration structures which I had in the past (and still have) is that the performance comparison between a good acceleration structure that is refitted after a few frames of character animation … and one that is re-built with even a slightly worse algorithm like HLBVH - always ended up in favor of the rebuild. Extension and refit within bounds doesn't seem that attractive to me. But this being said, I have not tried it - and the more thought I give it (I'm rewriting this paragraph for like the 10th time) … it might not be that bad an idea for foliage.

I lack experience with classical raytracing, so i'm not sure either. I just think we don't have that many characters, and we can cull some foliage with distance, but we can not rebuild the BVH for a lot of stuff at runtime. And awesome hardware acceleration should compensate for a somewhat bad quality BVH.

It remains an open problem. But hey - it's no longer our business. Our hands are bound, and we can do nothing anymore to improve raytracing performance. Let the geniuses at NV care about it, they know better anyway. Maybe machine learning can solve this problem, like any other as well, and we finally get some work to do for all those useless tensor cores. \:D/

Vilem Otte said:
Then, in a loop over the whole animation, you recalculate the vertices in the leaves, extending each node to the bounds of the animated vertices throughout the whole animation loop.

A better approach would be to use the definition of joint limits to generate random poses, and build the bounds from that, because all your animations would still miss a lot of poses which can happen at runtime due to ragdoll physics.
The top level bounds would increase and overlap like crazy, but the further down the tree you get, the smaller the issue. That's not bad.

To get the best of both worlds, we could rebuild only the top levels of the tree, while merely transforming the massive lower levels.
Separating top and low levels and treating them differently is always a good idea. But we actually need a hierarchy of such separations. Just one single TLAS over all the BLASes is not enough. We need fine grained options to be efficient.
So we get to a very similar situation as with LOD: BVH must be opened up to enable local processing and optimization.

Vilem Otte said:
market will win, it always does (whatever sells more).

That's very optimistic.

The problem is: the market is only as good as the options it offers. If it offers no good option at all, the good option can not win.
And that's our current situation.
Devs: ‘This is the best game you can get. It cost half a billion to make. It's the same game as the former game, but way bigger!' Nobody takes the risk of working on new gameplay ideas, so nobody notices the stagnation, because it's state of the art and all there is.

HW vendors: ‘Moore's Law is dead, but our new GPU is still twice as powerful as the former one. It takes 1000W, and you'll pay a premium, but it's worth it. You will play the newest games at max settings, and they look soooo much better.' And there is no alternative offer to that. I can't get the laptop i wanted, and the next GPU generation will lack entry level completely, as far as we know.

And new players can't enter the market. You can not make some fancy new console today anymore, or a new platform. It simply won't compete against the industry behemoths, no matter how wrong they are.
And the behemoths can not react to the current times, because they are too big, and too restricted by their business relationships.

The market won't self-regulate, it will simply collapse. Game over.
But maybe there's a reboot after that, at much smaller scales.

Ofc. i hope i'm just too pessimistic, but it seems we are past our peak in every regard, not just in tech. It goes downhill, and capitalism can't deal well with such a situation.

