Monday, November 14, 2016

the malloc challenge

Welcome to the malloc challenge. The task is to implement an application level memory allocator in C that outperforms provided implementations according to specified evaluation criterias; using the standard library, libc4life, creativity and skills. Links to all submitted implementations will be added to the end of this post with the goal of gathering enough input for a follow up to analyze the results. Discussions and submissions are directed to the posts on Reddit and Hacker News.

evaluation criterias


correctness

For a submitted implementation to be considered valid, it has to run the benchmark in valgrind without producing warnings and free any memory allocated from the heap.

performance

A benchmark is provided that measures the time each implementation takes to allocate and deallocate a large number of random sized blocks.

focus and simplicity

Implementations that provide specific features, solve specific problems well and play nice with other allocators are preferred to global, time traveling masters of all trades.

example

The following minimal implementation may be used as a starting point.



reference implementations

Four reference implementations are provided, a basic allocator that delegates to malloc/free, a slab allocator that allocates memory in blocks of user defined size from any upstream allocator, a pool allocator that adds the ability to track pointers to any allocator, and a free list allocator that adds pointer recycling on top of pool allocators.

the pool allocator (mpool.h / mpool.c)

The pool allocator allows treating sets of separate allocations as single blocks of memory, while retaining the ability to release individual pointers. The data needed for book keeping is prefixed to each allocation and supports O(1) addition and removal without additional allocations.

the slab allocator (mslab.h / mslab.c)

The slab allocator allocates memory as slabs of user defined size and keeps track of available space within them. Since it doesn't keep any information about individual allocations; the only way to release allocated memory is to free the allocator. It's useful for reducing the number of actual allocations that a downstream allocator causes.

the free list allocator (mfreel.h / mfreel.c)

The freelist allocator is used to recycle released pool memory, it reuses the embedded book keeping data to track released pointers.

performance comparison

The short story is that slabs are faster than the basic allocator; adding a free list to recycle pointers is typically about as fast, while reducing memory usage. The reason the pool/slab combination sticks out is most probably that the added overhead pushes the number of slab allocations, the problem disappears when a free list is added in front. I still haven't solved the mystery of why the pool allocator by itself is faster than the basic allocator; some kind of alignment effect from the prefix, maybe; all I know is it executes additional code before delegating to the basic allocator, allocates additional memory, and still manages to run faster.

getting started

libc4life includes everything you need to get started; start by forking the repository and following the instructions to build and run the tests. Add the files for your allocator to the src/mem directory, add the implementation below the others in tests/malloc_perf.c. Rerun cmake/make and launch 'build/tests' to run the benchmark.

May the source be with you...

peace, out