A basic test brought up many questions about the design of the btrfs filesystem. Btrfs is built around b-trees, and there has been an ongoing debate about whether b-trees are a good fit for filesystems.
I'm not deep enough into algorithms to judge, so I'll let you decide...
The test consists of a loop that creates as many 2k-sized files as possible on a 1GB filesystem:
# for i in $(seq 1000000); do dd if=/dev/zero of=/mnt/file_$i bs=2048 count=1; done
(terminated once "No space left on device" was reported).
The result from Edward Shishkin (Red Hat) was 59,480 files. That gives 2048 * 59480 ~ 116MB of data, or in other words around 880MB of wasted space.
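For the record, that arithmetic is easy to reproduce with shell arithmetic expansion:

```shell
# 59480 files of 2048 bytes each, expressed in MB (1 MB = 1024*1024 bytes);
# the ~880MB waste figure treats the 1GB device as roughly 1000MB
echo "$((59480 * 2048 / 1024 / 1024)) MB"   # → 116 MB
```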
In the meantime, Chris Mason, the creator of btrfs, wrote a patch to improve utilisation.
With it he was able to achieve 106,894 files. That's 208MB of data, or roughly 800MB of waste. I'm not sure what he meant with his comment about duplicated metadata; presumably, if you dropped the duplicate copies, you could store about twice as much data (while putting that data at risk?). Even then, almost 60% of the space is wasted...
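If that comment refers to btrfs keeping two copies of its metadata by default (the dup profile on a single device), the duplication can be switched off when the filesystem is created. A hedged sketch, assuming the -m/--metadata profile option of mkfs.btrfs and a loop device of my own choosing:

```shell
# /dev/loop0 is an assumption; "single" stores only one copy of
# metadata, trading redundancy for usable space
mkfs.btrfs -m single /dev/loop0
```

Whether that actually doubles the number of storable small files in this test is a guess on my part, not something the numbers above confirm.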
Next on the list was of course to see how ZFS behaves.
I created a test pool from a 1GB file. The usable space according to zfs list is 984M, and I was able to squeeze in 444,555 files, which amounts to 868MB of used space.
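The pool setup isn't shown above; a sketch of how such a file-backed test pool can be built (the pool name and backing-file path are my assumptions, not the ones used in the test):

```shell
# create a 1GB backing file (path is an assumption)
dd if=/dev/zero of=/var/tmp/zfs_test.img bs=1M count=1024
# zpool create accepts an absolute file path as a vdev
zpool create testpool /var/tmp/zfs_test.img
# check the usable space
zfs list testpool
```

File-backed vdevs are meant for exactly this kind of experimentation, not for production use.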
If we compare this to the initial 1GB capacity, we lose about 13% to metadata etc. I think that's not much, considering all the validation and checksumming happening behind the curtains...
Now, someone might say that nobody stores that many small files. On our mail platform we do, so any wasted space costs $$$ in the end.