In my last post I was talking about 1 Million mailboxes. Each of them is a directory with several subdirectories, like Trash, Sent Items etc. The mailbox directory itself lies 4 directories below the root node (like /a/b/c/mailbox). The hierarchy is managed by our mail-store application. I don't know the average number of files/directory, but let's assume, each mailbox consists in average of 20 files/directories, we would currently have about 20'000'000 inodes. Mailbox access is mostly random. We don't know when a mail is coming in, we also don't know about when a user is reading his mails. What we now from experience is, that a lot of time is spend in looking up metadata. With mostly random access (we measured it as ~ 55% write / 45% read a while ago), and the amount of data, the chance to identify a data working set is quite low. Ok, maybe recently received emails could be part of a "working-set". But wouldn't it be great if we could cache as muc...