slab: cache sizes for kmalloc

Discussion:

Maksym Planeta

2011-03-17 23:18:20 UTC

There are predefined cache sizes in <linux/kmalloc_sizes.h>. But I don't
understand why exactly these sizes were chosen.

I wrote a hook that counts which object sizes are requested most
often. They were objects of 8 and 16 bytes, but the smallest
available cache is 32 bytes, so fragmentation in that cache is
about 40%. There is also significant fragmentation in the 512- and
1024-byte caches: 25 and 35 percent respectively. There are empty caches
as well; all DMA caches on my system are empty. In total, memory is being wasted.
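
To make figures like these concrete, here is a minimal sketch of how per-cache internal fragmentation could be computed from a histogram of requested sizes. The size classes and request counts are made-up illustration values, not measurements from the hook described above, and `CACHE_SIZES`, `cache_for` and `fragmentation` are hypothetical names.

```python
from bisect import bisect_left
from collections import defaultdict

# Assumed size classes for illustration only; the real list lives in
# <linux/kmalloc_sizes.h> and depends on the kernel configuration.
CACHE_SIZES = [32, 64, 128, 256, 512, 1024]


def cache_for(request):
    """Return the smallest cache size that can hold `request` bytes."""
    i = bisect_left(CACHE_SIZES, request)
    if i == len(CACHE_SIZES):
        raise ValueError("request larger than the largest cache")
    return CACHE_SIZES[i]


def fragmentation(histogram):
    """Map each used cache size to the fraction of its bytes that is wasted.

    `histogram` maps requested size in bytes -> number of live objects.
    """
    requested = defaultdict(int)
    allocated = defaultdict(int)
    for size, count in histogram.items():
        cache = cache_for(size)
        requested[cache] += size * count
        allocated[cache] += cache * count
    return {c: 1 - requested[c] / allocated[c] for c in allocated}


# Hypothetical workload: many tiny objects plus some exact-fit ones.
sample = {8: 100, 16: 100, 32: 300, 400: 10, 512: 30}
for cache, frag in sorted(fragmentation(sample).items()):
    print(f"{cache:5d}-byte cache: {frag:.1%} wasted")
```

Here fragmentation is taken to mean wasted bytes divided by allocated bytes within one cache; other definitions (for example, counting partially filled slabs) would give different numbers.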

That is why I think caches for kmalloc could be created dynamically.
For example, if I have a 32-byte cache whose fragmentation exceeds
20%, a new cache with a smaller object size could be created, and new
objects that fit that size would be allocated there. Conversely, if a
cache holds too few objects, new allocations from it could be stopped,
and once it eventually becomes empty it could be destroyed.
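
The policy proposed above can be sketched as a small decision function. This is only an illustration of the idea, not kernel code: the 20% threshold comes from the text, while `MIN_OBJECTS`, `CacheStats` and `decide` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD = 0.20  # fragmentation level taken from the proposal
MIN_OBJECTS = 16        # assumed cutoff for "too few objects"


@dataclass
class CacheStats:
    object_size: int      # bytes per slot in this cache
    live_objects: int     # objects currently allocated
    requested_bytes: int  # sum of the sizes callers actually asked for

    @property
    def fragmentation(self):
        allocated = self.object_size * self.live_objects
        return 0.0 if allocated == 0 else 1 - self.requested_bytes / allocated


def decide(stats):
    """Return 'destroy', 'retire', 'split' or 'keep' for one cache."""
    if stats.live_objects == 0:
        return "destroy"  # an empty cache can be freed
    if stats.live_objects < MIN_OBJECTS:
        return "retire"   # stop new allocations and wait until it drains
    if stats.fragmentation > SPLIT_THRESHOLD:
        return "split"    # create a smaller cache for the small requests
    return "keep"


# 500 objects in a 32-byte cache, but only 12000 bytes actually requested.
print(decide(CacheStats(object_size=32, live_objects=500, requested_bytes=12000)))
```

A real implementation would also have to track the requested sizes at allocation time and handle objects that outlive a retired cache; the sketch leaves both out.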

The aim is to reduce memory waste and make fragmentation roughly equal
across caches. So I would like to know whether this kind of cache
management makes sense. If so, I'll work on it.

--
Thanks,

Maksym Planeta

Mulyadi Santosa

2011-03-17 23:56:46 UTC

Hi....

Probably just a quick share from me...

Post by Maksym Planeta
I wrote a hook that counts which object sizes are requested most
often.

Uhuh, and why don't you just use the "slabtop" utility, which simply
reads /proc/slabinfo?

Post by Maksym Planeta
They were objects of 8 and 16 bytes, but the smallest
available cache is 32 bytes, so fragmentation in that cache is
about 40%. There is also significant fragmentation in the 512- and
1024-byte caches: 25 and 35 percent respectively. There are empty caches
as well; all DMA caches on my system are empty. In total, memory is being wasted.

I think 32 bytes was chosen because of the page size on 32-bit x86 ==
4 KiB... that way, a cache is simply allocated using page_alloc (or
alloc_page? I forgot) and then later "torn apart" into slab
objects...

--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

Maksym Planeta

2011-03-18 05:52:19 UTC

Post by Mulyadi Santosa

Post by Maksym Planeta
I wrote a hook that counts which object sizes are requested most
often.

Uhuh, and why don't you just use the "slabtop" utility, which simply
reads /proc/slabinfo?

In slabinfo I can see how many objects each cache holds. But I was
interested in which object sizes are requested most often, and slabinfo
has no such information. For example, if I request 8 bytes, a 32-byte
object will be allocated, and nothing in slabinfo tells me how
much memory I actually needed.
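
The difference between the two views can be shown with a small sketch. It assumes a single 32-byte cache and that every request fits in it; `slabinfo_view` and `hook_view` are hypothetical helpers, and the dictionary keys `active_objs` and `objsize` only mirror the column names of /proc/slabinfo.

```python
CACHE_SIZE = 32  # assumed smallest kmalloc cache, as in the thread


def slabinfo_view(requests):
    """What a slabinfo-style counter can report: object count and slot size."""
    return {"active_objs": len(requests), "objsize": CACHE_SIZE}


def hook_view(requests):
    """What a hook at allocation time can additionally see."""
    allocated = CACHE_SIZE * len(requests)
    requested = sum(requests)
    return {"requested": requested, "allocated": allocated,
            "wasted": allocated - requested}


tiny = [8] * 100    # 100 requests of 8 bytes each
exact = [32] * 100  # 100 requests of 32 bytes each

# Both workloads occupy 100 slots of 32 bytes, so the slabinfo-style
# views compare equal even though the requested bytes differ.
print(slabinfo_view(tiny) == slabinfo_view(exact))
print(hook_view(tiny))
print(hook_view(exact))
```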

Post by Mulyadi Santosa
I think 32 bytes was chosen because of the page size on 32-bit x86 ==
4 KiB... that way, a cache is simply allocated using page_alloc (or
alloc_page? I forgot) and then later "torn apart" into slab
objects...

But the SLUB allocator has 8- and 16-byte caches. Why can't SLAB
have the same?

--
Thanks,

Maksym Planeta

Mulyadi Santosa

2011-03-18 22:26:15 UTC

Hi ....

Post by Maksym Planeta
In slabinfo I can see how many objects each cache holds. But I was
interested in which object sizes are requested most often, and slabinfo
has no such information. For example, if I request 8 bytes, a 32-byte
object will be allocated, and nothing in slabinfo tells me how
much memory I actually needed.

Hm... alright, if that's what you are after, maybe slabinfo can't provide
it... although I think you can approximate it from the number of
objects. But since you need to compare requested vs. actual
allocation, that is something slabinfo hardly provides, AFAIK.

Post by Maksym Planeta
But the SLUB allocator has 8- and 16-byte caches. Why can't SLAB
have the same?

"
config SLAB
        bool "SLAB"
        help
          The regular slab allocator that is established and known to work
          well in all environments. It organizes cache hot objects in
          per cpu and per node queues.
"

I am thinking about the phrase "cache hot objects". Well, IMHO, that is
achievable by allocating the biggest page size possible (without using PAE
etc), and that's 4K on x86.

So we get this 4 KiB arena and put it as close as possible to the
CPU that needs it (to avoid cache ping-pong, AFAIK)... or, in the NUMA
case, keep it really close to the CPU that needs it.

By using the normal granularity (which is the page size), I think moving
a cache will be a lot simpler....

just my thoughts...

--
regards,

Mulyadi Santosa