Efficient metadata caching
In my last post I talked about 1 million mailboxes. Each of them is a directory with several subdirectories, like Trash, Sent Items, etc.
The mailbox directory itself lies 4 directories below the root node (like /a/b/c/mailbox). The hierarchy is managed by our mail-store application.
I don't know the average number of files per directory, but let's assume each mailbox consists of 20 files/directories on average; we would then currently have about 20'000'000 inodes.
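Just to make the back-of-the-envelope arithmetic explicit, here is a minimal Python sketch; the mailbox count and the 20 entries per mailbox are the assumptions from above, not measured values:

# Back-of-the-envelope inode count, using the assumed averages from the text.
mailboxes = 1000000
entries_per_mailbox = 20              # assumed average of files + subdirectories

total_inodes = mailboxes * entries_per_mailbox
print("estimated inodes: %d" % total_inodes)   # -> 20000000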
Mailbox access is mostly random. We don't know when a mail comes in, and we don't know when a user reads his mails either. What we know from experience is that a lot of time is spent looking up metadata.
With mostly random access (we measured it at ~55% write / 45% read a while ago) and this amount of data, the chance of identifying a data working set is quite low. OK, maybe recently received emails could be part of a "working set".
But wouldn't it be great if we could cache as much metadata as possible?
Roch Bourbonnais wrote a blog entry a while ago about inodes on ZFS. This is by no means a scientific analysis, but let's take his numbers:
"23.8M files consuming 27GB of data. Basically less than 1.2K of used disk space per KB of files"
Let's say each mail/directory uses 0.2K; with 20'000'000 of them, we would currently have about 3.8GB of inode data. No problem to cache that. I certainly have to investigate a bit more what kind of metadata the ARC cache is actually caching.
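The same back-of-the-envelope style in Python, using Roch's figures and the assumed 0.2K per mail/directory (both rough assumptions, not measurements):

# Rough metadata size estimate, based on the assumptions above.
# Roch's numbers: 23.8M files consuming 27GB of data.
roch_kb_per_file = 27.0 * 1024 * 1024 / 23800000
print("Roch's figure: ~%.2f KB per file" % roch_kb_per_file)      # ~1.19 KB

# Our assumption: 0.2 KB of metadata per mail/directory.
inodes = 20000000
kb_per_inode = 0.2

total_gb = inodes * kb_per_inode / (1024.0 * 1024)
print("estimated inode data: ~%.1f GB" % total_gb)                # ~3.8 GB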
Thanks to analytics, I can at least do a bit of sanity checking: it currently shows me that around 11G inside the ARC is used for metadata caching.
If we take another view, we can see that the metadata is not only cached, it is also heavily used. In this picture I have colored all cache hits.
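Outside of analytics, the same sanity check can be scripted against the ARC kstats. This is only a sketch: it assumes a Solaris-style kstat(1M) command and that the zfs:0:arcstats module exposes arc_meta_used, demand_metadata_hits and demand_metadata_misses (true on the OpenSolaris builds I know of, but verify the statistic names on your release):

#!/usr/bin/env python
# Sketch: read ARC metadata statistics via the Solaris kstat command.
# Assumes kstat(1M) is in PATH and arcstats exposes the statistics below.
import subprocess

def arcstat(name):
    # kstat -p prints "module:instance:name:statistic<TAB>value"
    out = subprocess.check_output(["kstat", "-p", "zfs:0:arcstats:" + name])
    return int(out.split()[-1])

meta_used = arcstat("arc_meta_used")
hits      = arcstat("demand_metadata_hits")
misses    = arcstat("demand_metadata_misses")

print("ARC metadata cached : %.1f GB" % (meta_used / 1024.0 ** 3))
print("metadata hit ratio  : %.1f %%" % (100.0 * hits / (hits + misses)))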
Lessons learned today:
-ZFS does not waste space for inodes, and therefore does not waste cache either.
-ARC is very efficient
Questions to be answered:
-What does "metadata" include?