I have a memory-heavy application which is supposed to run with low latency and at constant speed, but in practice it performs poorly during the first few seconds after startup. This appears to be because the initial memory accesses trigger page faults, which carry a significant performance cost.

I would like to try preallocating a single large block of memory, paging it all in (via mlock() or just by touching each byte), and then using a custom malloc()/free() implementation to ensure that all further allocations are done from within this block.
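To make the "page it all in" step concrete, here is a minimal sketch for a POSIX system. The function name `arena_create_prefaulted` is hypothetical and only for illustration. It assumes anonymous `mmap()` is available, tries `mlock()` first, and falls back to writing one byte per page if locking is refused (for example because of `RLIMIT_MEMLOCK`). Writing rather than reading matters here: on Linux, merely reading an untouched anonymous page can map a shared zero page, so the real page fault would still happen on the first write.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `size` bytes and make every page resident
 * before returning. Returns NULL if the mapping itself fails. */
void *arena_create_prefaulted(size_t size)
{
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* mlock() faults the whole range in and pins it in RAM. It can fail
     * when the locked-memory limit is too small for `size`. */
    if (mlock(base, size) != 0) {
        /* Fallback: write one byte per page so each page is actually
         * allocated now. The pages stay swappable in this case. */
        long page = sysconf(_SC_PAGESIZE);
        assert(page > 0);
        volatile unsigned char *p = base;
        for (size_t off = 0; off < size; off += (size_t)page)
            p[off] = 0;
    }
    return base;
}
```

Calling this once at startup with the full arena size moves the page-fault cost ahead of the real work. The returned block can then be handed to whichever allocator is able to manage user-provided memory.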

I am aware of numerous custom memory allocators (TCMalloc, Hoard, jemalloc, etc) but it is not clear to me whether they can be backed by user-provided memory, or whether they always perform their internal allocations from the OS. Does anyone have any insight or recommendations here?

To be clear, I am not looking for a memory pooling system (which would be for reusing small objects). The custom implementation of malloc()/free() should be able to perform any size allocation while limiting fragmentation of its backing store and following other best practices.

Edit based on comments: I do not expect to make the system faster - I just want to move the slow part (allocation, initial page faults) to the start of the process, and then do the real computation work once the system is 'primed'.

Thanks!

6
  • 1
    "First few seconds" is a rather short time, especially if the total runtime of your program stretches over several minutes, hours or even days. Are these "few seconds" really such a big problem for your larger system? Also, even if you allocate a large memory area and "touch" bytes in each page, you will still get page faults that actually create and map the pages into your process, so the performance gain might not be as big as you expect. Commented Jul 15, 2022 at 8:14
  • 1
    Did you already consider std::pmr::monotonic_buffer_resource? Commented Jul 15, 2022 at 8:15
  • 1
    I'm not sure how you expect this to help. If the OS needs time to zero out pages, nothing you do can help with that. And eventually, all memory comes from the OS. You should profile the problem first to find the source of the delays. Finally, "fragmentation of its backing store"? That does not sound like a real problem to me. Commented Jul 15, 2022 at 8:15
  • 1
    @Someprogrammerdude It depends on the system, but it can impact performance for the first 20-30 seconds if there are a lot of threads (each thread allocates memory once it gets assigned some work). One of the main challenges is that this makes calibrating the system difficult, because for calibration purposes I would ideally run for a relatively short time. So I'd like to get the system 'primed' and as ready as it can be before I start the actual processing. Commented Jul 15, 2022 at 8:25
  • 1
    @MSalters No, I don't expect to make it faster overall. I would just like to move the slow operations (allocation, paging) to the start of the process, so that they do not affect performance once I start the real work. As for fragmentation, I believe it is a real problem that memory management systems need to address (though not something which the user is typically exposed to). I'm not well-versed there though. Commented Jul 15, 2022 at 8:39

1 Answer

A bit late to the party.

dlmalloc is one allocator that can be backed by pre-allocated memory. You can find it here. You may only need to add a few definitions at the top of the file to force it to use your pre-allocated memory rather than calling the system mmap; see the thorough documentation at the beginning of the file.
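As a rough illustration of what those definitions might look like, the sketch below implements an sbrk-style callback that hands out memory from a fixed, pre-allocated arena. The names `arena_bind` and `arena_morecore` are hypothetical. The configuration macros mentioned in the comments (`HAVE_MMAP`, `MORECORE`, `MORECORE_CANNOT_TRIM`) are given as I recall them from the comments at the top of dlmalloc's malloc.c, so check them against the copy you actually use. I believe dlmalloc also offers an mspace interface (`create_mspace_with_base`) that takes a base pointer and capacity directly, which may be simpler if you do not need to replace the global malloc().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed build configuration for dlmalloc (verify against your malloc.c):
 *   -DHAVE_MMAP=0                 never ask the OS for memory via mmap
 *   -DMORECORE=arena_morecore     grow the heap through the callback below
 *   -DMORECORE_CANNOT_TRIM        never call the callback with a negative size
 */

static unsigned char *g_arena_base;   /* start of the pre-allocated block */
static size_t g_arena_size;           /* total capacity in bytes          */
static size_t g_arena_used;           /* bytes handed out so far          */

/* Hypothetical helper: register the pre-allocated block before first use. */
void arena_bind(void *base, size_t size)
{
    g_arena_base = base;
    g_arena_size = size;
    g_arena_used = 0;
}

/* sbrk-like callback: returns the old end of the heap and advances it by
 * `increment` bytes, or (void *)-1 if the request cannot be satisfied. */
void *arena_morecore(ptrdiff_t increment)
{
    assert(g_arena_base != NULL);

    if (increment < 0)
        return (void *)-1;            /* shrinking is not supported here */
    if ((size_t)increment > g_arena_size - g_arena_used)
        return (void *)-1;            /* arena exhausted */

    void *previous_end = g_arena_base + g_arena_used;
    g_arena_used += (size_t)increment;
    return previous_end;
}
```

With a callback like this, every byte the allocator manages comes from the block you bound at startup, so if that block was already locked or touched, later allocations should not trigger first-touch page faults. The sketch is single-threaded; a real integration would need whatever locking the allocator expects.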

2 Comments

Thanks for the answer. I've moved on to the next project now so I won't get a chance to test this, but I'll accept it because it seems reasonable and is the only answer I got.
For the occasional googler, here's also an example of it being used. In this instance, the user allocates a huge shared memory heap, then uses dlmalloc to allocate from it as needed by their application.
