ZFS Writeup.
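Here’s my ZFS write-up: building a mirrored pool on FreeBSD, carving it into file systems, and then growing it by swapping in bigger disks.
Here I’ve got 4×160GB Seagate drives. The first two (ad4 and ad6) are my OS gmirror, and I’m leaving those alone. I’ll be doing my testing with ad8 and ad10.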
[root@db ~]# dmesg |grep Seagate
ad4: 152627MB <Seagate ST3160812AS 3.AAD> at ata2-master SATA300
ad6: 152627MB <Seagate ST3160812AS 3.AAD> at ata3-master SATA300
ad8: 152627MB <Seagate ST3160812AS 3.AAD> at ata4-master SATA300
ad10: 152627MB <Seagate ST3160812AS 3.AAD> at ata5-master SATA300
Here I create a mirror using the two 160GB disks.
[root@db ~]# zpool create zfs mirror ad8 ad10
Here I confirm it looks good.
[root@db ~]# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/mirror/gm0s1a 4.8G 197M 4.3G 4% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/mirror/gm0s1e 9.7G 12K 8.9G 0% /tmp
/dev/mirror/gm0s1f 63G 1.4G 57G 2% /usr
/dev/mirror/gm0s1d 63G 43M 58G 0% /var
zfs 147G 0B 147G 0% /zfs
[root@db ~]# zpool status
  pool: zfs
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
errors: No known data errors
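The "looks good" check above can also be scripted. This is only a minimal sketch: `pool_state` is a hypothetical helper name, and the sample text is copied from the transcript rather than read from a live pool.

```shell
#!/bin/sh
# Print the value of the "state:" line from `zpool status` text on stdin.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Sample input taken from the transcript above. On a live system you would
# pipe in the real thing instead:  zpool status zfs | pool_state
sample_status='pool: zfs
state: ONLINE
scrub: none requested'

state=$(printf '%s\n' "$sample_status" | pool_state)
if [ "$state" = "ONLINE" ]; then
    echo "pool looks healthy"
else
    echo "pool needs attention: $state"
fi
```

For a quick manual check, `zpool status -x` reports only pools that have problems.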
Here I create a couple of file systems with various mount points. I’m giving Skip a 10GB quota so he doesn’t consume the entire pool. I reserve 100GB for porn, since I don’t want to take any chances on running out of space on that mount point.
[root@db ~]# zfs create -o quota=10G -o mountpoint=/usr/home/skip zfs/skip
[root@db ~]# zfs create -o reservation=100G -o mountpoint=/usr/local/pr0n zfs/pr0n
[root@db ~]# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/mirror/gm0s1a 4.8G 197M 4.3G 4% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/mirror/gm0s1e 9.7G 12K 8.9G 0% /tmp
/dev/mirror/gm0s1f 63G 1.4G 57G 2% /usr
/dev/mirror/gm0s1d 63G 43M 58G 0% /var
zfs 47G 0B 47G 0% /zfs
zfs/skip 10G 128K 10G 0% /usr/home/skip
zfs/pr0n 147G 0B 147G 0% /usr/local/pr0n
What is interesting is that the ‘zfs’ file system went from 147GB down to 47GB due to the 100GB porn reservation. The ‘skip’ file system is listed as 10GB due to his quota. (These are ZFS file systems sharing one pool, not fixed-size partitions.) Now I’m going to create a file system for some music.
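The drop from 147GB to 47GB is plain subtraction: a reservation is taken off the top of what every other file system can use, while a quota only caps the file system it is set on. A small sketch of that arithmetic, where `unreserved_gb` is a made-up helper and the sizes are the rounded figures from df:

```shell
#!/bin/sh
# unreserved_gb POOL_GB RESERVED_GB...
# Space left over for file systems that have no reservation, in whole GB.
unreserved_gb() {
    left=$1
    shift
    for reserved in "$@"; do
        left=$((left - reserved))
    done
    echo "$left"
}

# 147G pool with 100G reserved for zfs/pr0n
unreserved_gb 147 100
```

On the real system, `zfs get quota,reservation zfs/skip zfs/pr0n` shows the properties that drive these numbers.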
[root@db ~]# zfs create -o mountpoint=/usr/local/music zfs/music
[root@db ~]# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/mirror/gm0s1a 4.8G 197M 4.3G 4% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/mirror/gm0s1e 9.7G 12K 8.9G 0% /tmp
/dev/mirror/gm0s1f 63G 1.4G 57G 2% /usr
/dev/mirror/gm0s1d 63G 43M 58G 0% /var
zfs 47G 0B 47G 0% /zfs
zfs/skip 10G 128K 10G 0% /usr/home/skip
zfs/pr0n 147G 0B 147G 0% /usr/local/pr0n
zfs/music 47G 0B 47G 0% /usr/local/music
I set that one up without any flags, so it gets access to the entire non-reserved pool (47GB). Next I’m going to populate the file systems with a bunch of data.
[root@db ~]# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/mirror/gm0s1a 4.8G 197M 4.3G 4% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/mirror/gm0s1e 9.7G 12K 8.9G 0% /tmp
/dev/mirror/gm0s1f 63G 1.4G 57G 2% /usr
/dev/mirror/gm0s1d 63G 43M 58G 0% /var
zfs 39G 0B 39G 0% /zfs
zfs/skip 10G 3.6G 6.4G 36% /usr/home/skip
zfs/pr0n 139G 3.6G 136G 3% /usr/local/pr0n
zfs/music 43G 3.6G 39G 8% /usr/local/music
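The numbers in this df output can be reproduced by hand. The sketch below assumes the rounded figures shown (147G pool, 100G reservation, 10G quota, 3.6G copied into each file system) and assumes that data in zfs/pr0n is charged against its own reservation first. `space_after_copy` is a hypothetical helper, not a ZFS command.

```shell
#!/bin/sh
# space_after_copy POOL RESERVED QUOTA USED   (all figures in GB)
# Prints the space still available to each file system once every file
# system holds USED GB of data.
space_after_copy() {
    awk -v pool="$1" -v resv="$2" -v quota="$3" -v used="$4" 'BEGIN {
        # skip and music both draw from the unreserved part of the pool
        free = pool - resv - 2 * used
        # skip is additionally capped by its quota
        skip = quota - used
        if (skip > free) skip = free
        # pr0n has the rest of its reservation plus the shared free space
        pr0n = (resv - used) + free
        printf "music %.1f\nskip %.1f\npr0n %.1f\n", free, skip, pr0n
    }'
}

space_after_copy 147 100 10 3.6
```

The results land within about a gigabyte of the Avail column above. The df figures are themselves rounded, so an exact match is not expected.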
Now, with the machine live, I am pulling out one of the 160GB drives and replacing it with a 500GB.
[root@db ~]# zpool status
  pool: zfs
 state: ONLINE
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    UNAVAIL      0     0     0  cannot open
errors: No known data errors
At this point we see that the pool is irritated and recognizes that there is a problem: ad10 shows as UNAVAIL. However, note that there are no data errors. This is of course to be expected of any RAID mirror.
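Note that in the output above the pool-level state still reads ONLINE even though ad10 cannot be opened, so a monitoring script is safer looking at the per-device rows. A minimal sketch, assuming the column layout shown in this transcript; `failed_devices` is a made-up name and the list of bad states is not exhaustive.

```shell
#!/bin/sh
# Print the names of devices whose STATE column marks them as unusable.
# Reads `zpool status` text on stdin.
failed_devices() {
    awk '
        $1 == "NAME"    { in_table = 1; next }   # device table starts here
        $1 == "errors:" { in_table = 0 }         # and ends here
        in_table && $2 ~ /^(UNAVAIL|FAULTED|OFFLINE|REMOVED)$/ { print $1 }
    '
}

# Device table copied from the transcript above. On a live system:
#   zpool status zfs | failed_devices
sample='NAME STATE READ WRITE CKSUM
zfs ONLINE 0 0 0
mirror ONLINE 0 0 0
ad8 ONLINE 0 0 0
ad10 UNAVAIL 0 0 0 cannot open
errors: No known data errors'

printf '%s\n' "$sample" | failed_devices
```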
[root@db ~]# zpool replace zfs ad10
[root@db ~]# zpool status
  pool: zfs
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 92.58% done, 0h0m to go
config:
        NAME           STATE     READ WRITE CKSUM
        zfs            DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            ad8        ONLINE       0     0     0
            replacing  DEGRADED     0     0     0
              ad10/old UNAVAIL      0     0     0  cannot open
              ad10     ONLINE       0     0     0
errors: No known data errors
Done. So at this point, I’ve got a 160GB drive in a mirror along with a 500GB drive. Of course it’s only usable up to the 160GB mark. Now, less than ten minutes later, I am pulling the remaining 160GB drive out and replacing it with a 500GB drive.
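Pulling the second original drive is only safe once the resilver onto the first new drive has finished, because until then ad8 holds the only complete copy. A rough sketch of how a script could wait for that, assuming the "resilver in progress" wording shown in the status output above (other ZFS versions may word this line differently); `resilver_pct` is a hypothetical helper.

```shell
#!/bin/sh
# Extract the completion percentage from a "resilver in progress" line.
# Prints nothing when no resilver is running.
resilver_pct() {
    sed -n 's/.*resilver in progress, \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical wait loop for a live system (pool name "zfs" as in this post):
#   while [ -n "$(zpool status zfs | resilver_pct)" ]; do sleep 30; done

# Demonstration on the line from the transcript:
echo 'scrub: resilver in progress, 92.58% done, 0h0m to go' | resilver_pct
```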
[root@db ~]# zpool status
  pool: zfs
 state: ONLINE
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: scrub completed with 0 errors on Thu Mar 20 19:49:18 2008
config:
        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad8     UNAVAIL      0     0     0  cannot open
            ad10    ONLINE       0     0     0
errors: No known data errors
[root@db ~]# zpool replace zfs ad8
At this point, I still have a 160GB mirror, even though it is made up of two 500GB drives. Two commands later, all that will change.
[root@db ~]# zpool export zfs
[root@db ~]# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/mirror/gm0s1a 4.8G 197M 4.3G 4% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/mirror/gm0s1e 9.7G 12K 8.9G 0% /tmp
/dev/mirror/gm0s1f 63G 1.4G 57G 2% /usr
/dev/mirror/gm0s1d 63G 43M 58G 0% /var
[root@db ~]# zpool import zfs
[root@db ~]# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/mirror/gm0s1a 4.8G 197M 4.3G 4% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/mirror/gm0s1e 9.7G 12K 8.9G 0% /tmp
/dev/mirror/gm0s1f 63G 1.4G 57G 2% /usr
/dev/mirror/gm0s1d 63G 43M 58G 0% /var
zfs/skip 10G 3.6G 6.4G 36% /usr/home/skip
zfs/music 354G 3.6G 351G 1% /usr/local/music
zfs/pr0n 451G 3.6G 447G 1% /usr/local/pr0n
zfs 351G 0B 351G 0% /zfs
Unfortunately, growing from a 160GB mirrored pool to a 500GB mirror DID require unmounting the file systems (that is what the export and import do). Additionally, shrinking a pool is not an option under FreeBSD at this point. I do believe that both of those things are going to change before long. I’ve seen a neat PDF talking about some of the benefits of ZFS at http://mediacast.sun.com/users/JamesCMcPherson/media/ZFS_SOSUG17oct2005_preso.pdf
Thanks!
As an additional thought, I didn’t *HAVE* to unmount the file systems to swap out those drives. I only needed to in order to use the additional space. I could have just waited until the next reboot and the space would have been available then. One of the huge benefits to me would come in a situation where I’ve got a 1TB mirror with a bunch of data on it, say on a high-traffic web server. To gain more space before ZFS, I’d need to either migrate to a new machine, or back up the data from the massive mirror, rebuild the mirror on the bigger disks, and then copy it all back. That can take a *VERY* long time. With ZFS I can eliminate the downtime of the copy back.
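Since the export/import pair takes the file systems offline, it is worth wrapping in something that can be rehearsed first. This is just a sketch: `run` and `grow_pool` are made-up helpers, and by default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Dry-run helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Re-import a pool so it notices its larger disks.
# This unmounts every file system in the pool while it runs.
grow_pool() {
    run zpool export "$1" &&
    run zpool import "$1"
}

grow_pool zfs
```

Later ZFS releases added an `autoexpand` pool property and `zpool online -e`, which can grow a pool without the export. Those were not part of the setup used in this post, so treat them as something to check against your own version.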
Lastly, with ZFS, if I had room for more disks (rather than just larger disks), I could add them to the pool at any time and the space would become usable immediately.
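To put numbers on both ways of growing: a mirror is only as big as its smallest disk, and a pool is the sum of its mirrors. The sketch below uses nominal disk sizes in GB (real usable space is lower, as the 147G figure for a 160GB disk shows), and both function names are made up for illustration.

```shell
#!/bin/sh
# Usable size of one mirror: the smallest member disk.
mirror_usable() {
    smallest=$1
    for disk in "$@"; do
        if [ "$disk" -lt "$smallest" ]; then
            smallest=$disk
        fi
    done
    echo "$smallest"
}

# Usable size of a pool: the sum of its mirrors' usable sizes.
pool_usable() {
    total=0
    for vdev in "$@"; do
        total=$((total + vdev))
    done
    echo "$total"
}

echo "160+500 mirror:    $(mirror_usable 160 500) GB"
echo "500+500 mirror:    $(mirror_usable 500 500) GB"
echo "two 500GB mirrors: $(pool_usable "$(mirror_usable 500 500)" "$(mirror_usable 500 500)") GB"
```

Adding a second mirror to the pool in this post would look something like `zpool add zfs mirror ad12 ad14`, where ad12 and ad14 are hypothetical device names for the extra disks.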