ZFS and shred
By Rémi Duraffort on Friday, July 30 2010, 00:02 - Geek
ZFS is a really powerful and convenient file system, but the way ZFS works makes some tools completely ineffective...
ZFS: a short overview
ZFS is a file system designed by Sun Microsystems. The file system is based on the Copy-on-Write (CoW) paradigm. When a file is modified, the blocks that must be changed are never modified in place. Instead the following operations are executed:
- Copy the block
- Change the new block

A block is never ever modified in place: a copy is created and then modified.
Like some other modern file systems, ZFS uses a tree to store the list of blocks that form a file: each node knows the addresses of its children. When a block is to be changed, a copy is made and then the copy is modified. This means that the parent has to be updated when one of its children changes. Thus the following operations happen:
- Copy the block
- Modify the new block
- Copy the parent block
- Update the address of the child in the new parent block
- Loop until the root block (Über block) is reached
- Atomically update the Über block
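
The steps above can be mimicked with a toy model. This is only a sketch under loud assumptions: blocks are plain files in a temporary directory, the `alloc` helper and the `root` file are made-up names, and nothing here reflects the real on-disk layout of ZFS. It only illustrates that the old blocks are left untouched while the root pointer is switched in one step.

```sh
#!/bin/sh
# Toy copy-on-write store: blocks are files in $disk, named by number.
# Illustration only; this is NOT how ZFS lays out data on disk.
set -eu
disk=$(mktemp -d)
next=0

# alloc CONTENT: write CONTENT to a fresh block, leave its number in $blk.
alloc() {
    blk=$next
    next=$((next + 1))
    printf '%s\n' "$1" > "$disk/$blk"
}

alloc "old data";        leaf=$blk      # data block
alloc "child=$leaf";     parent=$blk    # parent points at the data block
printf '%s\n' "$parent" > "$disk/root"  # root pointer (stands in for the Über block)

# Modify the file: never touch the old blocks, write new ones instead.
alloc "new data";        new_leaf=$blk
alloc "child=$new_leaf"; new_parent=$blk
printf '%s\n' "$new_parent" > "$disk/root.tmp"
mv "$disk/root.tmp" "$disk/root"        # rename replaces the pointer in one step

live=$(cat "$disk/root")
live_content=$(cat "$disk/$live")
old_content=$(cat "$disk/$leaf")
echo "live parent block: $live -> $live_content"
echo "old data block $leaf still holds: $old_content"
rm -rf "$disk"
```

After the update, the live tree only references the new blocks, yet the old data block was never overwritten.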

Erasing a file
Classical file system
When deleting a file, most file systems only remove the references to the blocks that form the file while leaving the blocks unchanged. That's why files can sometimes be restored after a deletion: the blocks are still present on the hard drive.
Sometimes you might want to erase a file and ensure that the data is no longer present on the hard drive. A tool called shred has been developed for this purpose.
root@localhost:% cat private
Really important information that must be removed.
root@localhost:% shred private
root@localhost:% hexdump -C private
00000000  c9 b9 75 91 02 1f a6 6f  71 d0 8a 9f 3c b5 f7 0f  |..u....oq...<...|
00000010  a4 9d 7c fb 56 ac 41 b3  a5 dc be f8 8d c4 41 5d  |..|.V.A.......A]|
..............
The file content is now overwritten with random data (this process must be repeated several times to ensure that the data cannot be recovered by some special tools).
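
As a minimal sketch of repeating the overwrite, GNU shred accepts `-n` to set the number of random passes, `-z` to add a final pass of zeros, and `-u` to remove the file afterwards. The example assumes GNU coreutils and works on a throwaway temporary file, not on the `private` file above.

```sh
#!/bin/sh
# Assumes GNU coreutils shred; $f is a throwaway temporary file.
set -eu
f=$(mktemp)
printf 'Really important information that must be removed.\n' > "$f"

# -n 5: five passes of random data, -z: one final pass of zeros
shred -n 5 -z "$f"

# Count how many lines of the file still contain the original text.
# grep exits non-zero when nothing matches, hence the "|| true".
hits=$(grep -a -c 'Really important' "$f" || true)
echo "matches after shred: $hits"

# -u also removes the file once it has been overwritten.
shred -u "$f"
```

On a classical file system this overwrites the blocks in place; on a CoW file system it does not give that guarantee.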
ZFS
On a ZFS file system, the same set of commands will show the same result: the file is replaced by some random data. But as we have seen before, ZFS is based on CoW, which means that data blocks can still be present on the hard drive. Let's have a look at the hard drive to see if we can find the deleted data.
For the sake of the demonstration, I am using a file as a partition for the ZFS file system. With a real device the operations are exactly the same.
root@localhost:% zpool create zpool_test /root/zfs_partition
root@localhost:% zfs mount zpool_test
root@localhost:% cat /zpool_test/private
Really important information that must be removed.
root@localhost:% shred /zpool_test/private
root@localhost:% hexdump -C /zpool_test/private
00000000  c7 cc 86 60 d6 a3 f4 45  37 d5 e7 68 4d 49 c4 43  |...`...E7..hMI.C|
00000010  a8 87 ae e8 8c ac 21 37  aa e7 c1 34 a2 d5 1d ad  |......!7...4....|
..............
Shred seems to do its job, but if we look directly at the partition:
root@localhost:% hexdump -C /root/zfs_partition
[...]
0040f000  52 65 61 6c 6c 79 20 69  6d 70 6f 72 74 61 6e 74  |Really important|
0040f010  20 69 6e 66 6f 72 6d 61  74 69 6f 6e 20 74 68 61  | information tha|
0040f020  74 20 6d 75 73 74 20 62  65 20 72 65 6d 6f 76 65  |t must be remove|
0040f030  64 2e 0a 00 00 00 00 00  00 00 00 00 00 00 00 00  |d...............|
[...]
Here we find the data that should have been removed by shred. Because ZFS is a CoW file system, the data stays on the hard drive as long as the blocks are not reused.
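
Rather than paging through a full hexdump, the raw image can be searched directly. The sketch below is an assumption-laden stand-in: it builds a small image with `dd` and plants the text at a made-up offset, so no ZFS pool is involved. It relies on GNU grep, where `-a` treats binary data as text and `-b` combined with `-o` prints the byte offset of each match.

```sh
#!/bin/sh
# Stand-in for /root/zfs_partition: a 1 MiB image of zeros with the text
# planted at a made-up offset (4096). No ZFS is involved in this sketch.
set -eu
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null
printf 'Really important information' |
    dd of="$img" bs=1 seek=4096 conv=notrunc 2>/dev/null

# GNU grep: -a treat binary as text, -b print byte offset, -o only the match
found=$(grep -a -b -o 'Really important' "$img" || true)
echo "$found"
rm -f "$img"
```

Pointed at the real backing file or device, the same `grep` invocation would report where the supposedly shredded text still lives.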
Workaround
This issue occurs with every CoW file system, like the promising Btrfs. I don't know of any way to erase the data other than wiping the entire partition. Maybe a specific tool will be developed for this purpose...
Comments
I'm not sure what the benefit of copy-on-write is - maybe it's applicable to physical storage such as flash memory that is only physically written in big blocks - but it seems to me that what may be wanted is a secure delete option provided by the file system itself: that it goes back later and kills the older copy of your data. Rather like the automatic data-garage collection... that is done by the Java execution environment... devised by Sun.
Alternatively, as I did recently when a Windows tool called sdelete didn't do what I expected, use a DOS command to create a file of space characters of length 2k / 4k / 8k / 16k / so on up to, well, after 1 GiB I couldn't put any more of them in zip files. But anyway, as many copies of those files as are required to fill up your disk with null bytes - then delete them again. It's a disk with bad sectors, ntfsclone got confused and let me down, and I wanted to back it up using dd working around the bad part but that means that you back up every file that USED to be on the hard disc, not just the ones that are now.
I had expected that the option I used with sdelete would fill my disk free space with blankness instead of evil random bytes, but it didn't. YMMV.
Copy-on-Write is really useful to ensure data integrity. As the über block is changed atomically, if the system crashes while you are changing a file, the file is still consistent. Moreover (I did not mention this advantage of Copy-on-Write in this article), it is really easy to create snapshots of the file system. After a snapshot, when a file is changed, the usual copy-on-write mechanism applies, but when the über block is changed, the old one is kept (with its tree of blocks) as part of the snapshot.
About erasing data on the file system, the easy way is just to encrypt the file system, but IMHO that is a bit too heavy just to erase one file.
Thank you. By the way, "data-garage collection" was meant as "data-garbage collection", and not a smaller-scale version of a "data-warehouse".
I'm a student and interested in the Zettabyte File System. I know ZFS writes out of place, which is CoW, so this file system will produce garbage. How the garbage is collected is a question on my mind, and I can't find the answer. After repeated searches on the web, I failed, so I want to ask for your help. Could you explain how ZFS collects the garbage?
When a block is modified, the old block is marked as free unless a snapshot occurred (in this case the old block is now part of the snapshot tree). There is no need for any garbage collection as the file system always knows whether a block is allocated or free.
You can have a look here for more information about the way ZFS keeps track of allocated and freed blocks.
Thank you so much!