
[D3D12] Best approach to manage constant buffer for the frame?

8 comments, last by MJP 3 years, 6 months ago

Hello everybody!
With Direct3D 11 you could update one constant buffer and reuse it for multiple draws, but with Direct3D 12 this is no longer the case.
I have read several texts about it, and one approach seems to be to offset into a large allocated block of memory and create a view at that offset.
The problem is that you need to choose the right size up front, so it's not a safe approach. Is it the recommended way?
Another approach would be to create constant buffers on the fly from a pool.
What is the recommended way to handle constant buffers in Direct3D 12?
Thanks!


in d3d12, u can create your cb like this:

// pseudo

num_frame_buffers = 2 // double-buffer (for example)
ID3D12DescriptorHeap* cb_in_system_mem[num_frame_buffers] // shader-visible heap holding the CBV (the descriptor, not the data)
ID3D12Resource* cb_in_gpu_mem[num_frame_buffers]          // the buffer holding the constants, in an UPLOAD heap

struct const_buf
{
	mat4 transform
}

const_buf data

UINT8* const_buf_gpu_address[num_frame_buffers] // CPU pointer returned by Map(); the GPU reads the same memory

void init_best_engine_once()
{
	foreach (n : num_frame_buffers)
	{
		...
		// create a shader-visible CBV_SRV_UAV descriptor heap for the view
		device->CreateDescriptorHeap(... IID_PPV_ARGS(&cb_in_system_mem[n]))
		...
		// create the buffer in an UPLOAD heap (CBV size must be a multiple of 256 bytes)
		device->CreateCommittedResource(... IID_PPV_ARGS(&cb_in_gpu_mem[n]))
		...
		// write a CBV describing the buffer into the heap
		device->CreateConstantBufferView(... cb_in_system_mem[n]->GetCPUDescriptorHandleForHeapStart())
		...
		// map once; upload-heap buffers can stay mapped for their whole lifetime
		cb_in_gpu_mem[n]->Map(... &const_buf_gpu_address[n])

		// initial upload of the transform data
		memcpy(const_buf_gpu_address[n], &data, sizeof(data))
	}
}

void update()
{
	...
	// update your constants with memcpy or whatever...
	data.transform = to_mat4(new_rotation)
	...
	// write the copy that belongs to the current frame; this is only safe once the
	// GPU has finished the frame that last used it (wait on that frame's fence first)
	bb = swap_chain->GetCurrentBackBufferIndex()
	memcpy(const_buf_gpu_address[bb], &data, sizeof(data))
}

void render()
{
	...
	// get current frame buf
	bb = swap_chain->GetCurrentBackBufferIndex()

	ID3D12DescriptorHeap* my_sys_mem_heaps[] = { cb_in_system_mem[bb], ... }

	// set cb and others... (el = the ID3D12GraphicsCommandList)
	el->SetDescriptorHeaps(... my_sys_mem_heaps)
	...
	// set cb view
	el->SetGraphicsRootDescriptorTable(0, cb_in_system_mem[bb]->GetGPUDescriptorHandleForHeapStart())
	...
	el->DrawIndexedInstanced(...)
}

int main()
{
	init_best_engine_once()
	loop
	{
		update()
		render()
	}
	end_best_engine()
}

ok so i have knocked up a quick simplified pseudo for your OP;

notice in the update( ) you only have the memcpy to update yr cb and in the render you set the cb view for the current frame buf bb;

take some time to inspect it, it should become clearer;

have fun!
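A note on the `update()` step: with double-buffering, the copy for a given frame may only be overwritten after the GPU has finished the frame that last read it, which in real code means waiting on an `ID3D12Fence` value. Below is a minimal CPU-only sketch of that bookkeeping, assuming a fake fence in place of `ID3D12Fence` and plain `std::vector` memory in place of the mapped upload buffers; `FakeFence`, `PerFrameConstants` and `Transform` are hypothetical names, not D3D12 API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an ID3D12Fence: the "GPU" raises completed_value
// when it finishes a frame. Nothing here calls D3D12.
struct FakeFence {
    std::uint64_t completed_value = 0;
};

constexpr int kFramesInFlight = 2;  // double-buffering, as in the pseudo

struct Transform { float m[16]; };

struct PerFrameConstants {
    // One CPU-writable copy of the constants per frame in flight
    // (plays the role of cb_in_gpu_mem[n] / const_buf_gpu_address[n]).
    std::vector<std::uint8_t> copies[kFramesInFlight];
    // Fence value that will be signalled once the frame that last used each copy completes.
    std::uint64_t fence_value[kFramesInFlight] = {};

    PerFrameConstants() {
        for (auto& c : copies) c.resize(sizeof(Transform));
    }

    // True if the copy for frame_index may be overwritten: the GPU has
    // finished the frame that last read it.
    bool can_write(int frame_index, const FakeFence& fence) const {
        return fence.completed_value >= fence_value[frame_index];
    }

    void write(int frame_index, const Transform& t) {
        std::memcpy(copies[frame_index].data(), &t, sizeof(t));
    }
};
```

In a real renderer, `fence_value[i]` would be the value passed to `ID3D12CommandQueue::Signal` after submitting the frame that used copy `i`, and a `false` from `can_write` would be followed by `ID3D12Fence::SetEventOnCompletion` and a wait.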

This solution works if you render one object with one constant buffer. The problem comes when you want to reuse that constant buffer for another object: that is where this strategy stops working, whereas in D3D11 it worked fine.

(respectfully) your usage of the word “reuse” isn't very clear:

  • a/ are you trying to change cb.transform to cb.direction for a different object?
  • b/ or are you saying that cb.transform = obj1.transform, render( ) then cb.transform = obj2.transform, render() is no longer possible in dx12?

which is it? a, b or none of the above? if none of the above, then give us an example.

You can have a constant buffer holding the world-view-projection matrix. You map/unmap it for the first object and record that object's draw call, then you map/unmap the same constant buffer for a second object and record its draw call. When all these commands are executed there is a conflict, because the constant buffer was overwritten before the first draw was done. You could add a wait for the GPU, but that is not a solution because it makes the CPU wait. Another approach would be to have a pool of buffers that you cycle through while rendering one frame, but if you have 1000 objects to render, then you have to create 1000 constant buffers the first time you render those 1000 objects. This is the main point of this thread: how to correctly render while using a constant buffer multiple times.
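The conflict described above can be reproduced without a GPU: draws are recorded first and executed later, and each recorded draw only remembers where its constants live. In D3D11, mapping with `D3D11_MAP_WRITE_DISCARD` let the driver hand out fresh memory behind the scenes; in D3D12 the application has to do that itself. The sketch below is a simplified CPU-only model (`RecordedDraw`, `FakeCommandList`, `single_slot` and `slot_per_draw` are hypothetical names) comparing one reused slot against one slot per draw.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A recorded draw only remembers where its constants live
// (stands in for a GPU virtual address).
struct RecordedDraw { std::size_t offset; };

struct FakeCommandList {
    std::vector<RecordedDraw> draws;
    void draw(std::size_t offset) { draws.push_back({offset}); }

    // "Execution" happens after all CPU writes, like ExecuteCommandLists:
    // each draw reads whatever is in memory at that point.
    std::vector<float> execute(const std::vector<float>& memory) const {
        std::vector<float> seen;
        for (const auto& d : draws) seen.push_back(memory[d.offset]);
        return seen;
    }
};

// Strategy A: one slot overwritten for every object (the D3D11 habit).
std::vector<float> single_slot(const std::vector<float>& per_object_value) {
    std::vector<float> memory(1);
    FakeCommandList cl;
    for (float v : per_object_value) {
        memory[0] = v;  // overwrite the same constant buffer
        cl.draw(0);
    }
    return cl.execute(memory);
}

// Strategy B: every draw gets its own slot in one larger buffer.
std::vector<float> slot_per_draw(const std::vector<float>& per_object_value) {
    std::vector<float> memory(per_object_value.size());
    FakeCommandList cl;
    for (std::size_t i = 0; i < per_object_value.size(); ++i) {
        memory[i] = per_object_value[i];
        cl.draw(i);
    }
    return cl.execute(memory);
}
```

With inputs {1, 2, 3}, `single_slot` makes every draw see 3 (the last value written), while `slot_per_draw` gives each draw its own value.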

u can use 1 structured buffer with multiple instances but for constants:

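//pseudo
struct const_b
{
	mat4 transform
}

// pack all objects into ONE view: the 256-byte rounding applies to the whole view,
// and a single CBV can cover at most 64 KB (4096 float4s = 1024 mat4s)
D3D12_CONSTANT_BUFFER_VIEW_DESC d = {};
d.SizeInBytes = (sizeof(const_b) * 1000 + 255) & ~255 // <-- your objects (1000 * 64 = 64000 bytes)
dev->CreateCommittedResource(... buffer width = d.SizeInBytes ... &cb_in_gpu_mem[n]) // like in my previous pseudo
d.BufferLocation = cb_in_gpu_mem[n]->GetGPUVirtualAddress()
dev->CreateConstantBufferView(&d, ...)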

then u want to populate this buffer like this:

// pseudo
std::vector<mat4> data(1000) // 1000 entries, packed tightly (64 bytes each)
foreach (n : 1000)
	data[n] = obj[n].transform

// update buffer
ptr = map(cb_in_gpu_mem)
{
	memcpy(ptr, data.data(), data.size() * sizeof(mat4)) // size in bytes, not element count
}
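The size arithmetic and the byte count passed to `memcpy` are the easiest parts of this to get wrong, so here is a small C++ sketch of just that step. It assumes a 64-byte `Mat4` matching an HLSL `float4x4`, the 256-byte CBV size alignment, and the 64 KB per-CBV limit (4096 registers of 16 bytes). Because one CBV cannot span 1000 slots at a 256-byte stride (256000 bytes is over 64 KB), the sketch packs the matrices tightly and rounds only the total. `Mat4`, `align_up` and `pack_transforms` are illustrative names; `mapped` stands in for the pointer returned by `Map()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Mat4 { float m[16]; };  // 64 bytes, same size as an HLSL float4x4

constexpr std::size_t kCbvAlignment = 256;       // CBV size/placement alignment
constexpr std::size_t kMaxCbvBytes = 4096 * 16;  // 4096 float4 registers = 64 KB

// Round size up to a multiple of alignment (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Copies one matrix per object into 'mapped', tightly packed as a
// "float4x4 wvp_mat[N]" array expects, and returns the value to use for
// D3D12_CONSTANT_BUFFER_VIEW_DESC::SizeInBytes.
std::size_t pack_transforms(const std::vector<Mat4>& transforms, std::uint8_t* mapped) {
    std::size_t bytes = transforms.size() * sizeof(Mat4);  // bytes, not element count
    std::memcpy(mapped, transforms.data(), bytes);
    return align_up(bytes, kCbvAlignment);
}
```

With 1000 objects the view is exactly 64000 bytes, which fits; 1025 matrices would already exceed a single view, which is where the 1001-objects question comes in.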

then remember to update the shader as well:

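struct all_const_b
{
	float4x4 wvp_mat[1000]; // one matrix per object: 1000 * 64 = 64000 bytes, within the 64 KB CBV limit
};
ConstantBuffer<all_const_b> all_objects : register(b0);
...
main(..., uint id : SV_InstanceID) // or a root constant holding the object index
{
	mul(all_objects.wvp_mat[id], vert) (or whatever)
}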

that will get u going.

Yes, this is the only solid option I see to avoid allocating 1000 constant buffers in this example. But it means that once you reach 1000 objects and need to render 1001, you are in trouble. Surely a paging system is needed, allocating a new page of 1000 virtual constant buffers.

yes, there are 4 ways to go about it:

  • gb (grow-by) method: u can destroy the previous buffer and create a new, larger one (and its view) to cater for the new number of objects: 1001
    pros: fast (if the buffer does not grow too often; at some point it should stop growing if u can organise it in the game)
    cons: growing can hit perf for a frame or 2 (depends on your engine, card, or whatever…)
  • bdr (batch-draw-repeat) method: fill up your 1000, draw them, don't recreate the view, then fill in the remaining 1 & draw it
    pros: faster, as no buffer is recreated
    cons: n-passes; refilling the same buffer before the GPU has executed the first batch overwrites its data, so it needs a GPU wait or a second buffer
  • paging method: create a new page on the fly, fill it up and draw
    pros: fast (if pages don't need to be created often)
    cons: memory waste if a new page is never filled up + n-passes (one per page)
  • large buffer method: create once and organise your scene so that the correct number of objs always fits and no recreation is needed
    pros: fastest
    cons: hard to achieve
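Of the four methods, paging is the easiest to show in isolation. The sketch below is a hypothetical C++ allocator, not D3D12 code (`PagedConstants`, `Slot` and `Mat4` are made-up names): each page stands in for one upload buffer holding a fixed number of per-object slots, a new page is created only when all existing pages are full, and pages are kept and reused after `reset()`, assuming that is only called once the GPU has finished with the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Mat4 { float m[16]; };

class PagedConstants {
public:
    explicit PagedConstants(std::size_t slots_per_page) : slots_per_page_(slots_per_page) {}

    struct Slot { std::size_t page; std::size_t index; };

    // Hands out the next free slot, creating a new page only when all pages are full.
    Slot allocate(const Mat4& value) {
        Slot s{cursor_ / slots_per_page_, cursor_ % slots_per_page_};
        if (s.page == pages_.size())              // every existing page is full
            pages_.emplace_back(slots_per_page_); // "create a new page on the fly"
        pages_[s.page][s.index] = value;
        ++cursor_;
        return s;
    }

    // Call once the GPU has finished the frame; pages are kept for reuse.
    void reset() { cursor_ = 0; }

    std::size_t page_count() const { return pages_.size(); }
    const Mat4& at(Slot s) const { return pages_[s.page][s.index]; }

private:
    std::size_t slots_per_page_;
    std::size_t cursor_ = 0;                  // total slots handed out this frame
    std::vector<std::vector<Mat4>> pages_;    // each inner vector = one page
};
```

With 1000 slots per page, the 1001st object lands in slot 0 of a second page; rendering then takes one batch of draws per page (the "n-passes" cost listed above), and the second page's unused 999 slots are the memory-waste cost.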

paging was also known as ‘bucketing’ back in the day; they were not exactly the same but very similar (bucketing took materials and textures into account);

if there's another method to add to this list then i don't know of it…(yet);

implement what u need when u need it.

Generally I like to sort dynamic (CPU-writable) buffers into two use cases: temporary and persistent. This refers to their lifetime: temporary buffers are requested on the fly from a global source and can only be used for that frame, while persistent buffers allocate their own backing memory, so their contents survive for as long as you want after being written (although you can write to them at most once per frame). For the global pool that you pull temporary buffers from, there are multiple ways to handle it:

  • You can have N pre-allocated buffers of a fixed size (where N is the number of frames in flight) and use atomics to linearly allocate chunks of memory from them on the fly
  • You can have buffers per-thread to avoid atomics, similar to how command allocators work
  • You can allocate chunks that you linearly allocate from, and add new chunks when you run out
  • You can use a ring buffer where the tail gets moved up as the GPU consumes resources
  • You can use more complex heap-based allocation
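The first option (one fixed-size buffer per frame in flight, carved up with an atomic bump pointer) can be sketched in C++ as below. This is an illustration under stated assumptions, not MJP's implementation: `FrameLinearAllocator` is a hypothetical name, a `std::vector` stands in for the persistently mapped upload buffer, and `begin_frame` assumes the caller has already waited on the fence for the frame that last used that buffer. In real code, the returned offset would be added to the buffer's `GetGPUVirtualAddress()` and passed to `SetGraphicsRootConstantBufferView`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kFramesInFlight = 2;
constexpr std::size_t kCbAlignment = 256;  // each CBV must start on a 256-byte boundary

class FrameLinearAllocator {
public:
    explicit FrameLinearAllocator(std::size_t bytes_per_frame) : capacity_(bytes_per_frame) {
        for (auto& b : buffers_) b.resize(bytes_per_frame);
    }

    // Thread-safe bump allocation: returns the byte offset of a 256-byte-aligned
    // block in this frame's buffer, or SIZE_MAX if the buffer is full.
    std::size_t allocate(std::size_t size) {
        std::size_t aligned = (size + kCbAlignment - 1) & ~(kCbAlignment - 1);
        std::size_t offset = used_.fetch_add(aligned, std::memory_order_relaxed);
        if (offset + aligned > capacity_) return SIZE_MAX;  // out of space this frame
        return offset;
    }

    // CPU pointer for writing the constants at 'offset' in the current frame's buffer.
    std::uint8_t* cpu_ptr(std::size_t offset) { return buffers_[frame_].data() + offset; }

    // Call from one thread at the start of a frame, with no allocations in flight,
    // after waiting on the fence for the frame that last used this buffer.
    void begin_frame(std::size_t frame_number) {
        frame_ = frame_number % kFramesInFlight;
        used_.store(0, std::memory_order_relaxed);
    }

private:
    std::size_t capacity_;
    std::size_t frame_ = 0;
    std::atomic<std::size_t> used_{0};
    std::vector<std::uint8_t> buffers_[kFramesInFlight];
};
```

Everything allocated in a frame is implicitly freed by the next `begin_frame` for that buffer, which is what makes this scheme so cheap.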

There are really all sorts of possibilities here; I couldn't possibly list them all. The first one, though, is probably the simplest to implement. If you then bind your constant buffers as root CBVs, you're good to go: you don't need to mess with descriptors at all. However, if you're using descriptor tables, you can use similar methods to allocate and fill CBV descriptors on the fly. Those techniques then extend to other buffer types that require SRVs, like structured buffers. I have a pretty simple and straightforward implementation of this in my open source codebase that I use for samples:
https://github.com/TheRealMJP/DXRPathTracer/blob/master/SampleFramework12/v1.02/Graphics/DX12_Helpers.cpp#L605
https://github.com/TheRealMJP/DXRPathTracer/blob/master/SampleFramework12/v1.02/Graphics/DX12_Upload.cpp#L440

For persistent buffers, the simplest approach is to have N internal buffers (or one buffer whose size is multiplied by N), where N is the number of frames in flight. Whenever you need to update the contents, you flip to the next buffer (and descriptor, if necessary) in the chain; as long as you do that no more than once per frame, you can do it safely without touching memory that the GPU is currently reading from. I have a simple example of that here: https://github.com/TheRealMJP/DXRPathTracer/blob/master/SampleFramework12/v1.02/Graphics/GraphicsTypes.cpp#L325
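The N-copies scheme for persistent buffers can be sketched in C++ under simplifying assumptions: `PersistentBuffer` is a hypothetical name, `std::vector` copies stand in for the N upload buffers (or N slices of one buffer), frame pacing is assumed to already keep at most N frames in flight, and the caller passes a frame counter so that a second update in the same frame is refused rather than silently overwriting a copy the GPU may still be reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kFramesInFlight = 2;

class PersistentBuffer {
public:
    explicit PersistentBuffer(std::size_t size) {
        for (auto& c : copies_) c.resize(size);
    }

    // Flips to the next copy and writes the new contents into it. Returns false
    // (and writes nothing) if this buffer was already updated during frame_number.
    bool update(const void* data, std::size_t size, std::uint64_t frame_number) {
        assert(size <= copies_[0].size());
        if (updated_once_ && last_update_frame_ == frame_number) return false;
        current_ = (current_ + 1) % kFramesInFlight;  // flip to the next copy in the chain
        std::memcpy(copies_[current_].data(), data, size);
        last_update_frame_ = frame_number;
        updated_once_ = true;
        return true;
    }

    // Index of the copy (in real code: GPU address / descriptor) to bind from now on.
    std::size_t current_copy() const { return current_; }
    const std::uint8_t* data() const { return copies_[current_].data(); }

private:
    std::vector<std::uint8_t> copies_[kFramesInFlight];
    std::size_t current_ = 0;
    std::uint64_t last_update_frame_ = 0;
    bool updated_once_ = false;
};
```

Frames that don't update the buffer simply keep binding the current copy, so its contents persist for as long as needed.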

This topic is closed to new replies.
