
Benefits of allocating all memory of a structure contiguously

Started by
2 comments, last by Alberth 4 years, 4 months ago

Hello,

I was wondering if allocating all members of a structure contiguously helps the CPU reduce the number of fetches from main memory / disk.

Example:

struct MyStruct
{
  void* data;
  void* data1;
};

void initMyStruct(MyStruct** ms, size_t sizeOfData, size_t sizeOfData1)
{
  // One allocation holds the struct followed by both data blocks.
  size_t requiredSize = sizeof(MyStruct);
  requiredSize += sizeOfData;
  requiredSize += sizeOfData1;
  BYTE* mem = (BYTE*)allocator(requiredSize);
  *ms = (MyStruct*)mem;
  (*ms)->data = mem + sizeof(MyStruct);
  (*ms)->data1 = mem + sizeof(MyStruct) + sizeOfData;
}

Does the code above have any benefit over allocating MyStruct::data and MyStruct::data1 separately? Like will it reduce number of accesses to main memory?

As far as I remember, data gets loaded into cache in chunks based on the line size. So only a part of MyStruct::data may get loaded if we assume sizeOfData is greater than the line size, and we will need another trip to main memory to load the remaining parts of MyStruct::data and MyStruct::data1. Assuming all this is correct, is there any advantage in allocating struct members contiguously with the struct?

Thank you


Well, you gain from not calling your allocator/deallocator so many times; some of those calls can be expensive. But I'd certainly wait until they show up in a profiler.
You can get many of the benefits just from a better allocator (e.g. consider that malloc/free don't go straight to the OS kernel; they already have their own internal optimised structures, and then there are pools and other specialized allocators) or from avoiding allocations entirely (which I guess you're already looking at: combining like you showed, using the stack for temporary variables, and pre-calculating string/array/etc. sizes instead of growing them dynamically).

Otherwise, as far as I recall:

At best you save two trips to memory (for the three items: struct, data, data1), provided the combined size is less than the three separately. The allocation would also need to be cache-line aligned. CPUs are pretty smart about fetching memory these days; I think this would mostly only be really measurable if you process a lot of `MyStruct` objects in a loop.

For swap/paged/mapped memory hitting the disk, the blocks are much larger (a minimum of 4 KB, I think) and access is much slower, even on an SSD. So if you were doing that, combining data that is accessed together into one physical page can be a great gain (again, the allocator would need to arrange this). But in the context of a game, you probably don't want to be touching the swap file anyway, at least for anything performance-sensitive.

graphics_programmer said:
Does the code above have any benefit over allocating MyStruct::data and MyStruct::data1 separately? Like will it reduce number of accesses to main memory?

struct MyData {
    struct SubData1 data;
    struct SubData2 moreData;
};

seems like a much simpler way to get the same effect, and it saves you two pointers.

If MyData doesn't fit in a cache line, then you're likely seeing almost no effect: one additional memory access typically dwarfs the few cycles you save on the first part of the second sub-structure. You do get the cache's pre-fetching advantage, though, if you access the first sub-structure nicely sequentially.

Of course, all of this sits at the bottom of the performance ladder; choosing a good algorithm and good data structures for the problem is going to give you much more gain.

This topic is closed to new replies.
