by cesarb on 12/26/2023, 1:32:52 PM
by mgerdts on 12/26/2023, 1:13:58 PM
When I think of a fs corruption bug, I think of something that causes fsck/scrub to have some work to do, sometimes resulting in a restore from backups. From the early reports of this, I was having a hard time understanding how it was a corruption bug. This excellent write-up clears that up:
> Incidentally, that’s why this isn’t “corruption” in the traditional sense (and why a scrub doesn’t find it): no data was lost. cp didn’t read data that was there, and it wrote some zeroes which OpenZFS safely stored.
by dannyw on 12/26/2023, 10:10:28 AM
Fascinating write up. As someone with a ZFS system, how can I check if I’m affected?
by LanzVonL on 12/26/2023, 6:25:22 PM
It's important to note that the recent showstopper bugs have all been in OpenZFS, with the Oracle (née Sun) ZFS being unaffected by either.
by frankjr on 12/26/2023, 6:39:33 PM
I wonder if any large storage provider has been affected by this. I know Hetzner Storage Box and rsync.net both use ZFS under the hood.
by joshxyz on 12/26/2023, 11:42:52 AM
anyone know what diagram tool he used? thanks
by commandersaki on 12/26/2023, 10:53:37 AM
Excellent writeup robn!
by lupusreal on 12/26/2023, 5:10:13 PM
Is anybody using bcachefs yet?
by MenhirMike on 12/26/2023, 10:12:58 AM
Periodic reminder to check if your backups are working, and if you can also restore them. It doesn't matter which file system or operating system you use, make sure to backup your stuff. In a way that's immune to ransomware as well, so not just a RAID-1/5/Z or another form of hot/warm storage (RAID is not a backup, it's an uptime/availability mechanism) but cold storage. (I snapshot and tar that snapshot every night, then back it up both on tape and in the cloud.)
by hulitu on 12/26/2023, 11:12:30 AM
> This whole madness started because someone posted an attempt at a test case for a different issue, and then that test case started failing on versions of OpenZFS that didn’t even have the feature in question.
One would expect more seriousness from filesystem maintainers, and serious regression testing before a release.
IMO, part of the issue is that something which used to be just a low-level optimization (don't store large sequences of zeros) became visible to userspace (SEEK_HOLE and friends). Quoting from this article:
"This is allowed; its always safe to say there’s data where there’s a hole, because reading a hole area will always find “zeroes”, which is valid data."
But I recall reading elsewhere a discussion about some userspace program which did depend on holes being present in the filesystem as actual holes (visible to SEEK_HOLE and so on) and not as runs of zeros.
Combined with the holes being restricted to specific alignments and sizes, this means that the underlying "sequence of fixed-size blocks" implementation is leaking through the abstract "stream of bytes" representation we're more used to. Perhaps it might be time to rethink our filesystem abstractions?
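For anyone who hasn't played with this: the SEEK_HOLE interface is easy to poke at from userspace. A minimal sketch in Python, assuming a Linux system and a filesystem that supports sparse files (the filename is a temp file, nothing here is ZFS-specific):

```python
import os
import tempfile

# Create a sparse file: 4 KiB of data, a ~1 MiB hole, then 4 KiB more data.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * 4096)
os.lseek(fd, 1024 * 1024, os.SEEK_CUR)  # skip forward without writing
os.write(fd, b"B" * 4096)
size = os.fstat(fd).st_size

# SEEK_HOLE from offset 0 finds the first hole. A filesystem is allowed
# to report no holes at all (returning the end of file), because saying
# "there's data here" is always safe -- reading it just yields zeroes.
hole = os.lseek(fd, 0, os.SEEK_HOLE)
print(f"first hole reported at {hole}, file size {size}")

# Reading inside the never-written region returns zeroes either way,
# whether the filesystem stored a hole or actual zero blocks.
os.lseek(fd, 8192, os.SEEK_SET)
data = os.read(fd, 16)
print(data)  # b'\x00\x00...'

os.close(fd)
os.remove(path)
```

The asymmetry is exactly what the article describes: a program that treats the *absence* of a reported hole as meaningful is relying on something the interface never promised.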