Comment on [PSA] Swapping your Deck's filesystem to Btrfs is easy to do, and can give you more space for free
skullgiver@popplesburger.hilciferous.nl 1 year ago
It should be, but neither of the files was damaged before the dedup attempt. I went balance-check-dedup-check-balance on purpose to make sure I wouldn’t accidentally deduplicate with a damaged extent. I don’t know whether the metadata or the extent itself was damaged, but there were checksum failures on two specific extents in two sets of files that got deduplicated (4 files in total, both sets being temporary files in the Lutris cache).
I’m not mad or anything, and I accept a few kilobytes of lost data every now and then. There’s a reason I have (signed, encrypted, diffed) backups in the first place! That doesn’t change the fact that there are still a few edge cases where BTRFS suffers corruption under heavy load with a wide range of features in use (many of which don’t even exist in other filesystems; I myself am quite fond of CoW + deduplication of existing files + compression on selected paths + snapshots every time I run apt upgrade). If we all pretend these edge cases never happen, they’ll never get fixed.
If this happened often enough for me to be able to replicate the problem, I would’ve filed a bug report. I’m happily using btrfs on my desktop drives, and I’m planning on converting my laptop as soon as I can get enough space for a full backup, just in case. There’s no doubt BTRFS is superior to ext4, and even ZFS has issues BTRFS doesn’t have (no dumb Oracle licensing issues, for one).
For something like a Steam Deck SD card, BTRFS is a no-brainer. I’m a little annoyed that I need to mount the SD card manually if it’s not ext4, but the space savings and improved loading times are worth it.
yote_zip@pawb.social 1 year ago
What deduplication program did you use? Deduplication is not technically an end-to-end supported feature, and depending on how the third-party program implemented it, there could be issues earlier in the pipeline. I’m also not sure how a RAM bit flip would interact with this scenario: I know ZFS checks the file checksum several times during a transaction, but I don’t know how often BTRFS does.
The problem is that a lot of people online report vague problems with BTRFS, but the reports have little information on how the problems were actually caused and can’t be reproduced. There is no solution if we’re operating under these rules, other than to stop using BTRFS entirely out of pure superstition. If there are bugs, we need to be able to point to them in order to fix them. As I said before, the problem you had would not even have been detected by Ext4, so I think error reporting is biased against a filesystem that actually checks its work. With regard to checking work, I think ZFS gets away with a lot more because it’s normally run in RAID setups, where healing happens automatically. BTRFS, lacking stable RAID5/6 support, is usually just run on a single drive, and any data integrity error becomes a target of frustration as soon as it happens.
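To make the "checks its work" point concrete, here is a toy Python model (not Btrfs’s on-disk format) of per-block checksums. An unchecked read silently returns a flipped bit; a checksummed read with a single copy refuses with an error; a checksummed read with a second copy (as with RAID1 or DUP) returns good data and rewrites the bad copy. It assumes zlib.crc32 as a stand-in for the crc32c that Btrfs uses by default, and every function name here is hypothetical.

```python
import zlib
from dataclasses import dataclass


def checksum(data: bytes) -> int:
    # Stand-in only: Btrfs defaults to crc32c, while zlib.crc32 is plain CRC-32.
    return zlib.crc32(data)


@dataclass
class Block:
    copies: list          # bytearrays: one copy, or two for RAID1/DUP
    csum: int             # checksum recorded when the block was written


def write_block(data: bytes, mirrors: int = 1) -> Block:
    return Block([bytearray(data) for _ in range(mirrors)], checksum(data))


def read_unchecked(block: Block) -> bytes:
    # No verification: whatever is stored in the first copy is returned as-is.
    return bytes(block.copies[0])


def read_block(block: Block) -> bytes:
    """Return verified data, rewriting damaged copies from a good one."""
    good = next((bytes(c) for c in block.copies
                 if checksum(bytes(c)) == block.csum), None)
    if good is None:
        # Every copy fails verification: report an I/O error, never bad data.
        raise OSError("checksum mismatch and no good copy to repair from")
    for copy in block.copies:
        copy[:] = good    # self-heal: overwrite any copy that failed the check
    return good


def flip_bit(block: Block, copy: int, byte: int, bit: int) -> None:
    # Simulate silent corruption of one bit in one stored copy.
    block.copies[copy][byte] ^= 1 << bit
```

With mirrors=1 and one flipped bit, read_unchecked hands back the corrupted bytes while read_block raises OSError; with mirrors=2, read_block returns the original bytes and repairs copy 0.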
skullgiver@popplesburger.hilciferous.nl 1 year ago
I used duperemove, which uses the FIDEDUPERANGE ioctl.
My biggest issue with BTRFS is that the tooling isn’t complete yet. On ext4 a wide range of issues can be fixed with fsck (including broken metadata), while the btrfs check documentation basically tells you:
It’s a bit like Windows: it works while it works, it can do some great tricks, but when it breaks and the automagic recovery doesn’t help, you’re kind of screwed.
I feel you: most people with BTRFS issues have hardware problems and are hit by Intel/AMD making ECC memory an expensive feature. But in my specific case, the files actually got messed up because of a dedup gone wrong.
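For reference, a minimal Python sketch of what a tool like duperemove hands to the kernel: a file_dedupe_range header followed by one file_dedupe_range_info entry per destination, passed to FIDEDUPERANGE on the source file descriptor. The struct layouts follow linux/fs.h; the ioctl number assumes the generic _IOWR bit layout used on x86-64 and arm64, and build_request, parse_result, and dedupe are hypothetical helpers, not duperemove’s code.

```python
import fcntl
import os
import struct

# Assumes the generic Linux _IOWR layout (x86-64, arm64):
# dir << 30 | size << 16 | type << 8 | nr. Some architectures differ.
_IOC_READ_WRITE = 3
HEADER = struct.Struct("=QQHHI")  # struct file_dedupe_range: 24 bytes
INFO = struct.Struct("=qQQiI")    # struct file_dedupe_range_info: 32 bytes
FIDEDUPERANGE = (_IOC_READ_WRITE << 30) | (HEADER.size << 16) | (0x94 << 8) | 54

FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1


def build_request(src_offset: int, length: int, dest_fd: int,
                  dest_offset: int) -> bytearray:
    """Pack a request with a single destination range (dest_count = 1)."""
    buf = bytearray(HEADER.size + INFO.size)
    HEADER.pack_into(buf, 0, src_offset, length, 1, 0, 0)
    # bytes_deduped and status start at 0; the kernel writes them back.
    INFO.pack_into(buf, HEADER.size, dest_fd, dest_offset, 0, 0, 0)
    return buf


def parse_result(buf: bytearray) -> tuple:
    """Return (status, bytes_deduped) for the single destination entry."""
    _fd, _offset, deduped, status, _reserved = INFO.unpack_from(buf, HEADER.size)
    return status, deduped


def dedupe(src_path: str, dest_path: str, length: int) -> tuple:
    """Ask the kernel to share `length` bytes from offset 0 of both files."""
    src = os.open(src_path, os.O_RDONLY)
    dest = os.open(dest_path, os.O_RDWR)
    try:
        buf = build_request(0, length, dest, 0)
        fcntl.ioctl(src, FIDEDUPERANGE, buf)  # buffer is updated in place
        status, deduped = parse_result(buf)
        if status < 0:
            raise OSError(-status, os.strerror(-status))
        return status, deduped
    finally:
        os.close(src)
        os.close(dest)
```

The design point worth noting: the kernel compares both ranges byte for byte before sharing any extent, and if they differ it reports FILE_DEDUPE_RANGE_DIFFERS and leaves both files alone, so a wrong hash in the userspace tool alone should not merge different data.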
yote_zip@pawb.social 1 year ago
I’d be interested to see that reported somewhere. The duperemove repo might be a good starting point, as that’s generally the standard BTRFS dedupe solution. I don’t currently see any issues about corruption on its GitHub repo (or at least the last one was 7 years ago). Again, I’m not sure whether a RAM bit flip could cause this during a dedupe. Just because you scrubbed, deduped, and scrubbed again doesn’t mean there wasn’t a bit flip during the dedupe.
As for btrfs check vs fsck, there are just way fewer things that need to be repaired in BTRFS and ZFS because they are copy-on-write (ZFS doesn’t even have an fsck at all!). Because Ext4 is not copy-on-write, it’s more vulnerable to power-loss events, and its journal has to be replayed when one happens (normally at mount time, though e2fsck can do it too). BTRFS and ZFS make atomic CoW transactions and will never be left in a corrupt state by a power loss. The other part of fsck is repairing the filesystem, which BTRFS and ZFS handle through scrub and/or auto-heal on read instead: they keep multiple copies of the filesystem metadata so they can repair themselves while online. btrfs check is not something that should be used lightly, and I’ve seen a lot of people just run btrfs check --repair expecting the same behavior as fsck, then wonder why they ended up with a broken filesystem.
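The copy-on-write argument can be illustrated with a toy Python model (hypothetical names, not how either filesystem is laid out on disk). An in-place update interrupted by a crash leaves a file that is half old and half new, while a CoW update writes new blocks elsewhere and only switches a single root pointer at the end, so a crash before that commit leaves the old version intact. It deliberately leaves out ext4’s journal, which exists to limit exactly this kind of damage, and it assumes the drive writes blocks in the order requested.

```python
from dataclasses import dataclass


class PowerLoss(Exception):
    """Simulated crash partway through a sequence of block writes."""


@dataclass
class Disk:
    blocks: dict          # block number -> contents
    root: list            # "superblock": block numbers that make up the file
    next_free: int        # next unused block number


def fresh_disk() -> Disk:
    return Disk({0: "old-a", 1: "old-b", 2: "old-c"}, [0, 1, 2], 3)


def read_file(disk: Disk) -> list:
    return [disk.blocks[n] for n in disk.root]


def update_in_place(disk: Disk, new_data: list, crash_after=None) -> None:
    # Overwrite the live blocks one by one (journal not modelled).
    # A crash after `crash_after` writes leaves a mix of old and new blocks.
    for i, (n, value) in enumerate(zip(disk.root, new_data)):
        if i == crash_after:
            raise PowerLoss
        disk.blocks[n] = value


def update_cow(disk: Disk, new_data: list, crash_after=None) -> None:
    # Write the new version into free blocks; live blocks are never touched.
    new_root = []
    for i, value in enumerate(new_data):
        if i == crash_after:
            raise PowerLoss
        disk.blocks[disk.next_free] = value
        new_root.append(disk.next_free)
        disk.next_free += 1
    # Commit: one pointer update switches readers to the new version.
    disk.root = new_root
```

With crash_after=2, the in-place version reads back ["new-a", "new-b", "old-c"], while the CoW version still reads back ["old-a", "old-b", "old-c"].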