Btrfs or how to lose space
On the linux-btrfs mailing list there was an interesting thread about internal fragmentation in btrfs.
A basic test brought up many questions about the btrfs filesystem design. Btrfs is a filesystem built around b-trees, and there has been some debate about whether it is a good idea to use b-trees for filesystems.
I'm not that deep into algorithms, so I'll let you decide...
The test consists of a loop that creates as many 2 KB files as possible on a 1 GB filesystem:
# for i in $(seq 1000000); \
do dd if=/dev/zero of=/mnt/file_$i bs=2048 count=1; done
(terminated after getting "No space left on device" reports).
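After the loop stops, the number of files that actually made it onto the filesystem can be counted with something like this (assuming the test filesystem is mounted on /mnt, as in the loop above):
# find /mnt -maxdepth 1 -type f -name 'file_*' | wc -l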
The result from Edward Shishkin (Red Hat) was 59480 files. That gives us 2048 * 59480 ~ 116 MB of actual data, or, in other words, around 880 MB of wasted space.
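For reference, the data figure is simply the file count multiplied by the file size; a quick shell calculation (whole MB, ignoring any per-file metadata) looks like this:
# echo $(( 59480 * 2048 / 1024 / 1024 ))
116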
In the meantime, Chris Mason, the creator of btrfs, came up with a patch to improve utilisation.
With it he was able to fit 106894 files. That's 208 MB of data, or roughly 800 MB wasted. I'm not sure what he meant by his comment about the duplication of metadata. Maybe, if you turned the duplication off, you could store twice as much data (while putting it at greater risk?). Even then, almost 60% of the space would still be wasted...
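If the duplication he referred to is the btrfs default of keeping two copies of all metadata, one way to test that theory would be to recreate the filesystem with a single metadata copy. A rough sketch (I'm assuming the 1 GB image is attached as a loop device, which the original test doesn't spell out):
# mkfs.btrfs -m single /dev/loop0
# mount /dev/loop0 /mnt
With -m single only one copy of the metadata is kept, trading redundancy for space.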
Next on the list was of course to see how ZFS behaves.
I created a test pool from a 1 GB file. The usable space according to zfs list is 984M. I was able to squeeze in 444555 files, which results in 868 MB of used space.
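For anyone who wants to repeat the ZFS side: a pool backed by a plain file can be set up roughly like this (the backing file path and pool name are my own choices, not taken from the original run; on Linux, dd from /dev/zero can replace mkfile):
# mkfile 1g /var/tmp/testpool.img
# zpool create testpool /var/tmp/testpool.img
# zfs list testpool
zfs list then shows the usable space (984M in my case), and the file loop from above can simply be pointed at the pool's mountpoint.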
Compared to the initial 1 GB capacity, we lose about 13% to metadata and the like. I think that's not much, considering all the validation and checksumming happening behind the scenes...
Now, someone might say that nobody stores that many small files. On our mail platform we do, so any wasted space costs $$$ in the end.
Comments
> idea to use b-trees for filesystems.
>
> I'm not that deep into algorithms, so I'll let you decide...
I am into algorithms, and that sentence alone is enough to make me stop reading your article. Sorry, it sounded interesting otherwise.
I prefer to be honest about things I'm not an expert in.
But tell me, why would I need to be an expert in algorithms if all I want to do is use a filesystem?
The fact is, btrfs is not ready for environments with many small files. And if you read the btrfs mailing list, there is no quick fix in sight.