
Solaris 10 Administration Topics Workshop

3 - File Systems
By Peter Baer Galvin

For Usenix
Last Revision April 2009

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


About the Speaker
Peter Baer Galvin - 781 273 4100
pbg@cptech.com
www.cptech.com
peter@galvin.info
My Blog: www.galvin.info
Bio
Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading
systems integrator and VAR, and was the Systems Manager for Brown University's
Computer Science Department. He has written articles for Byte and other magazines. He
was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's
Wicked World, the security column for SunWorld magazine, and Pete’s Super Systems, the
systems administration column there. He is now the Sun columnist for the Usenix ;login:
magazine. Peter is co-author of the Operating System Concepts and Applied Operating
System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials
in security and system administration and given talks at many conferences and
institutions.

Copyright 2009 Peter Baer Galvin - All Rights Reserved 2

Saturday, May 2, 2009


Objectives
Cover a wide variety of topics in Solaris 10

Useful for experienced system administrators

Save time

Avoid (my) mistakes

Learn about new stuff


Answer your questions about old stuff

Won't read the man pages to you

Workshop for hands-on experience and to reinforce concepts

Note – Security covered in separate tutorial

Copyright 2009 Peter Baer Galvin - All Rights Reserved 3

Saturday, May 2, 2009


More Objectives
What makes a novice vs. an advanced administrator?
Bytes as well as bits, tactics and strategy
Knows how to avoid trouble
How to get out of it once in it
How to not make it worse
Has reasoned philosophy
Has methodology

Copyright 2009 Peter Baer Galvin - All Rights Reserved 4

Saturday, May 2, 2009


Prerequisites

Recommend at least a couple of years of Solaris experience
Or at least a few years of other Unix experience
Best is a few years of admin experience, mostly on Solaris

Copyright 2009 Peter Baer Galvin - All Rights Reserved 5

Saturday, May 2, 2009


About the Tutorial

Every SysAdmin has a different knowledge set


A lot to cover, but the notes should make a good reference
So some topics are covered quickly, some in detail
Setting a base of knowledge
Please ask questions
But let’s take off-topic discussion off-line (or to the Solaris BOF)
Copyright 2009 Peter Baer Galvin - All Rights Reserved 6

Saturday, May 2, 2009


Fair Warning
Sites vary
Circumstances vary
Admin knowledge varies
My goals
Provide information useful for each of you at
your sites
Provide opportunity for you to learn from
each other

Copyright 2009 Peter Baer Galvin - All Rights Reserved 7

Saturday, May 2, 2009


Why Listen to Me
20 Years of Sun experience
Seen much as a consultant
Hopefully, you've used:
My Usenix ;login: column
The Solaris Corner @ www.samag.com
The Solaris Security FAQ
SunWorld “Pete's Wicked World”
SunWorld “Pete's Super Systems”
Unix Secure Programming FAQ (out of date)
Operating System Concepts (The Dino Book), now 8th ed
Applied Operating System Concepts

Copyright 2009 Peter Baer Galvin - All Rights Reserved 8

Saturday, May 2, 2009


Slide Ownership

As indicated per slide, some slides are copyright Sun Microsystems
Feel free to share all the slides - as long as you don’t charge for them or teach from them for a fee

Copyright 2009 Peter Baer Galvin - All Rights Reserved 9

Saturday, May 2, 2009


Overview
Lay of the Land

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


Schedule
Times and Breaks

Copyright 2009 Peter Baer Galvin - All Rights Reserved 11

Saturday, May 2, 2009


Coverage

Solaris 10+, with some Solaris 9 where needed
Selected topics that are new, different, confusing, underused, overused, etc.

Copyright 2009 Peter Baer Galvin - All Rights Reserved 12

Saturday, May 2, 2009


Outline

Overview
Objectives
Choosing the most appropriate file system(s)
UFS / SDS
Veritas FS / VM (not in detail)
ZFS

Copyright 2009 Peter Baer Galvin - All Rights Reserved 13

Saturday, May 2, 2009


Polling Time
Solaris releases in use?
Plans to upgrade?
Other OSes in use?
Use of Solaris rising or falling?
SPARC and x86
OpenSolaris?

Copyright 2009 Peter Baer Galvin - All Rights Reserved 14

Saturday, May 2, 2009


Your Objectives?

Copyright 2009 Peter Baer Galvin - All Rights Reserved 15

Saturday, May 2, 2009


Lab Preparation
Have a device capable of telnet on the USENIX network
Or have a buddy
Learn your “magic number”
Telnet to 131.106.62.100 + “magic number”
User “root”, password “lisa”
It’s all very secure

Copyright 2009 Peter Baer Galvin - All Rights Reserved 16

Saturday, May 2, 2009


Lab Preparation

Or...
Use VirtualBox
Use your own system
Use a remote machine you have legitimate access to

Copyright 2009 Peter Baer Galvin - All Rights Reserved 17

Saturday, May 2, 2009


Choosing the Most Appropriate File Systems

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


Choosing the Most Appropriate File Systems

Many file systems, many not optional (tmpfs et al)

Where you have choice, how to choose?

Consider

Solaris version being used

< S10 means no ZFS

ISV support

For each ISV make sure desired FS is supported

Apps, backups, clustering

Priorities

Now weigh priorities of performance, reliability, experience, features, risk/reward

Copyright 2009 Peter Baer Galvin - All Rights Reserved 19

Saturday, May 2, 2009


Consider...
Pros and cons of mixing file systems
Root file system
Not much value in using vxfs / vxvm here unless used elsewhere
Interoperability (need to detach from one type of system and attach to another?)
Cost
Supportability & support model
Non-production vs. production use
Copyright 2009 Peter Baer Galvin - All Rights Reserved 20

Saturday, May 2, 2009


Root Disk Mirroring
The Crux of Performance

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


Topics

• Root disk mirroring
• ZFS

Copyright 2009 Peter Baer Galvin - All Rights Reserved 22

Saturday, May 2, 2009


Root Disk Mirroring
Complicated because
Must be bootable
Want it protected from disk failure
And want the protection to work

Can increase or decrease upgrade complexity
Veritas
Live upgrade
Copyright 2009 Peter Baer Galvin - All Rights Reserved 23

Saturday, May 2, 2009


Manual Mirroring
Vxvm encapsulation can cause lack of availability
Vxvm needs a rootdg disk
Any automatic mirroring can propagate errors
Consider
Use disksuite (Solaris Volume Manager) to mirror the boot disk
Use a 3rd disk as rootdg, 3rd disksuite metadb, manual mirror copy
Or use a 10MB rootdg on the 2 boot disks in disksuite to do the mirroring
Best of all worlds – details in column at www.samag.com/solaris

Copyright 2009 Peter Baer Galvin - All Rights Reserved 24

Saturday, May 2, 2009


Manual Mirroring
Sometimes want more than no mirroring, less than real mirroring
Thus "manual mirroring"
Nightly cron job to copy partitions elsewhere
Can be used to duplicate root disk, if installboot used
Combination of newfs, mount, ufsdump | ufsrestore
Quite effective, useful, and cheap
Easy recovery from corrupt root image, malicious error, sysadmin error
Has saved at least one client
But disk failure can require manual intervention
Complete script can be found at www.samag.com/solaris
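A minimal sketch of one nightly pass (device names are placeholders; the published script adds error handling and logging):
# newfs /dev/rdsk/c0t2d0s0
# mount /dev/dsk/c0t2d0s0 /mnt
# ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /mnt; ufsrestore rf -)
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0   (SPARC; makes the copy bootable)
# umount /mnt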

Copyright 2009 Peter Baer Galvin - All Rights Reserved 25

Saturday, May 2, 2009


Best Practice – Root Disk
Have 4 disks for root!
1st is primary boot device
2nd is disksuite mirror of first
3rd is manual mirror of 1st
4th is manual mirror, kept on a shelf!
Put nothing but system files on these disks (/, /var, /opt, /usr, swap)

Copyright 2009 Peter Baer Galvin - All Rights Reserved 26

Saturday, May 2, 2009


Aside: Disk Performance
Which is faster?

73GB drive: 10000 RPM, 3Gb/sec
300GB drive: 10000 RPM, 3Gb/sec

Copyright 2009 Peter Baer Galvin - All Rights Reserved 27

Saturday, May 2, 2009


UFS / SDS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


UFS Overview
Standard pre-Solaris 10 file system
Many years old, updated continuously
But still showing its age
No integrated volume manager; instead use SDS (disk suite)
Very fast, but feature poor
For example, snapshots exist but are only useful for backups
Painful to manage, change, repair

Copyright 2009 Peter Baer Galvin - All Rights Reserved 29

Saturday, May 2, 2009


Features
64-bit pointers
16TB file systems (on 64-bit Solaris)
1TB maximum file size
metadata logging (by default) increases
performance and keeps file systems (usually)
consistent after a crash
Lots of ISV and internal command (dump) support
Only bootable Solaris file system (until S10 10/08)
Dynamic multipathing, but via separate “traffic manager” facility
Copyright 2009 Peter Baer Galvin - All Rights Reserved 30

Saturday, May 2, 2009


Issues
Sometimes there is still corruption

Need to run fsck

Sometimes it fails

Many limits
Many features lacking (compared to ZFS)

Lots of manual administration tasks

format to slice up a disk

newfs to format the file system, fsck to check it

mount and /etc/vfstab to mount a file system

share commands, plus svcadm commands, to NFS export


Plus separate volume management
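A minimal sketch of that manual workflow for one new UFS file system (disk, slice, and mount point are placeholders):
# format                                   (label and slice the disk)
# newfs /dev/rdsk/c1t1d0s0
# fsck /dev/rdsk/c1t1d0s0
# echo "/dev/dsk/c1t1d0s0 /dev/rdsk/c1t1d0s0 /export ufs 2 yes -" >> /etc/vfstab
# mkdir -p /export ; mount /export
# share -F nfs -o rw /export ; svcadm enable nfs/server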
Copyright 2009 Peter Baer Galvin - All Rights Reserved 31

Saturday, May 2, 2009


Volume Management
Separate set of commands (meta*) to manage volumes (RAID et al)

For example, to mirror the root file system

Have 2 disks with identical partitioning

Have 2 small partitions per disk for meta-data (here slices 5 and 6)

newfs the file systems

Create meta-data state databases (at least 3, for quorum)

# metadb -a /dev/dsk/c0t0d0s5

# metadb -a /dev/dsk/c0t0d0s6

# metadb -a /dev/dsk/c0t1d0s5

# metadb -a /dev/dsk/c0t1d0s6

Copyright 2009 Peter Baer Galvin - All Rights Reserved 32

Saturday, May 2, 2009


Volume Management (cont)
Initialize submirrors (components of mirrors) and mirror the partitions - here
we do /, swap, /var, and /export
# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d0 -m d10
Make the new / bootable
# metaroot d0
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d1 -m d11
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d4 -m d14
# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d7 -m d17
Copyright 2009 Peter Baer Galvin - All Rights Reserved 33

Saturday, May 2, 2009


Volume Management (cont)

Update /etc/vfstab to reflect new meta devices


/dev/md/dsk/d1 - - swap - no -
/dev/md/dsk/d4 /dev/md/rdsk/d4 /var ufs 1 yes -
/dev/md/dsk/d7 /dev/md/rdsk/d7 /export ufs 1 yes -

Finally attach the submirror to each device to be mirrored


# metattach d0 d20
# metattach d1 d21
# metattach d4 d24
# metattach d7 d27

Now the root disk is mirrored, and commands such as Solaris upgrade, live
upgrade, and boot understand that

Copyright 2009 Peter Baer Galvin - All Rights Reserved 34

Saturday, May 2, 2009


Veritas VM / FS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


Overview
A popular, commercial addition to Solaris
64-bit
Integrated volume management (vxfs + vxvm)
Mirrored root disk via “encapsulation”
Good ISV support
Good extended features such as snapshots, replication
Shrink and grow file systems
Extent based (for better and worse), journaled,
clusterable
Cross-platform
Copyright 2009 Peter Baer Galvin - All Rights Reserved 36

Saturday, May 2, 2009


Features
Very large limits
Dynamic multipathing included
Hot spares to automatically replace failed
disks
Dirty region logging (DRL) volume transaction logs for fast recovery from a crash
But can still require a consistency check

Copyright 2009 Peter Baer Galvin - All Rights Reserved 37

Saturday, May 2, 2009


Issues
$$$
Adds supportability complexities (who do you call?)
Complicates OS upgrades (unencapsulate first)
Fairly complex to manage
Comparison of performance vs. ZFS at
http://www.sun.com/software/whitepapers/solaris10/zfs_veritas.pdf

Copyright 2009 Peter Baer Galvin - All Rights Reserved 38

Saturday, May 2, 2009


ZFS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009


ZFS
Looks to be the “next great thing”
Shipped officially in S10U2 (the 06/06 release)
From scratch file system
Includes volume management, file system, reliability,
scalability, performance, snapshots, clones,
replication
128-bit file system, almost everything is “infinite”
Checksumming throughout
Simple, endian independent, export/importable…
Still using traffic manager for multipathing
(some following slides are from ZFS talk by Jeff Bonwick
and Bill Moore – ZFS team leads at Sun)
Copyright 2009 Peter Baer Galvin - All Rights Reserved 40

Saturday, May 2, 2009


Trouble with Existing Filesystems
No defense against silent data corruption
Any defect in disk, controller, cable, driver, or firmware can
corrupt data silently; like running a server without ECC
memory
Brutal to manage
Labels, partitions, volumes, provisioning, grow/shrink, /etc/vfstab...
Lots of limits: filesystem/volume size, file size, number of files,
files per directory, number of snapshots, ...
Not portable between platforms (e.g. x86 to/from SPARC)
Dog slow
Linear-time create, fat locks, fixed block size, naïve prefetch,
slow random writes, dirty region logging
Copyright 2009 Peter Baer Galvin - All Rights Reserved 41

Saturday, May 2, 2009


Design Principles
Pooled storage
Completely eliminates the antique notion of volumes
Does for storage what VM did for memory

End-to-end data integrity


Historically considered “too expensive”
Turns out, no it isn't
And the alternative is unacceptable

Transactional operation
Keeps things always consistent on disk
Removes almost all constraints on I/O order
Allows us to get huge performance wins
Copyright 2009 Peter Baer Galvin - All Rights Reserved 42

Saturday, May 2, 2009


Why “volumes” Exist
In the beginning, each filesystem managed a
single disk
Customers wanted more space, bandwidth,
reliability
Rewrite filesystems to handle many disks: hard
Insert a little shim (“volume”) to cobble disks together:
easy

An industry grew up around the FS/volume model
Filesystems, volume managers sold as separate products
Inherent problems in FS/volume interface can't be fixed
Copyright 2009 Peter Baer Galvin - All Rights Reserved 43

Saturday, May 2, 2009


Traditional Volumes

(Diagram: one file system per volume: FS on Volume (stripe), FS on Volume (mirror))

Copyright 2009 Peter Baer Galvin - All Rights Reserved 44

Saturday, May 2, 2009


ZFS Pools

Abstraction: malloc/free
No partitions to manage
Grow/shrink automatically
All bandwidth always available
All storage in the pool is shared

Copyright 2009 Peter Baer Galvin - All Rights Reserved 45

Saturday, May 2, 2009


ZFS Pooled Storage

(Diagram: many file systems sharing pooled storage: Storage Pool (RAIDZ), Storage Pool (Mirror))

Copyright 2009 Peter Baer Galvin - All Rights Reserved 46

Saturday, May 2, 2009



ZFS Data Integrity Model
Everything is copy-on-write
Never overwrite live data
On-disk state always valid – no “windows of
vulnerability”
No need for fsck(1M)
Everything is transactional
Related changes succeed or fail as a whole
No need for journaling
Everything is checksummed
No silent data corruption
No panics due to silently corrupted metadata
Copyright 2009 Peter Baer Galvin - All Rights Reserved 48

Saturday, May 2, 2009


(Slides 49-71: diagrams from the Bonwick/Moore ZFS presentation; images not reproduced in these notes)


Terms
Pool - set of disks in one or more RAID formats (e.g. mirrored stripe)
No “/” in a pool name
File system - mountable container of files
Data set - file system, block device, snapshot, volume or clone within a pool
Named via pool/path[@snapshot]

Copyright 2009 Peter Baer Galvin - All Rights Reserved 72

Saturday, May 2, 2009


Terms (cont)
ZIL - ZFS intent log
On-disk duplicate of the in-memory log of changes to make to data sets
A write goes to memory and the ZIL, is acknowledged, then goes to disk
ARC - in-memory read cache
L2ARC - level 2 ARC - on flash memory
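A hedged sketch of putting the ZIL and L2ARC on their own devices (pool and device names are placeholders; requires a pool version that supports separate log and cache vdevs):
# zpool add tank log c2t5d0     (separate intent log device, e.g. an SSD)
# zpool add tank cache c2t6d0   (L2ARC device, e.g. flash)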

Copyright 2009 Peter Baer Galvin - All Rights Reserved 73

Saturday, May 2, 2009


What ZFS doesn’t do
Can’t remove individual devices from pools
Rather, replace the device, or 3-way mirror including the device and then remove the device
Can’t shrink a pool (yet)
Can add individual devices, but not optimum (yet)
If adding a disk to RAIDZ or RAIDZ2, you end up with RAIDZ(2) + 1 concatenated device
Instead add full RAID elements to a pool
Add a mirror pair or RAIDZ(2) set (example below)
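For example, growing a pool by whole redundancy groups rather than single disks (a sketch; pool and device names are placeholders):
# zpool add tank mirror c3t0d0 c4t0d0         (add a mirror pair)
# zpool add tank raidz c5t0d0 c6t0d0 c7t0d0   (add another RAIDZ set)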
Copyright 2009 Peter Baer Galvin - All Rights Reserved 74

Saturday, May 2, 2009


zpool
# zpool
missing command
usage: zpool command args ...
where 'command' is one of the following:

create [-fn] [-o property=value] ...


[-O file-system-property=value] ...
[-m mountpoint] [-R root] <pool> <vdev> ...
destroy [-f] <pool>

add [-fn] <pool> <vdev> ...


remove <pool> <device> ...

list [-H] [-o property[,...]] [pool] ...


iostat [-v] [pool] ... [interval [count]]
status [-vx] [pool] ...

online <pool> <device> ...


offline [-t] <pool> <device> ...
clear <pool> [device]

Copyright 2009 Peter Baer Galvin - All Rights Reserved


75

Saturday, May 2, 2009


zpool (cont)
attach [-f] <pool> <device> <new-device>
detach <pool> <device>
replace [-f] <pool> <device> [new-device]

scrub [-s] <pool> ...

import [-d dir] [-D]


import [-o mntopts] [-o property=value] ...
[-d dir | -c cachefile] [-D] [-f] [-R root] -a
import [-o mntopts] [-o property=value] ...
[-d dir | -c cachefile] [-D] [-f] [-R root] <pool | id>
[newpool]
export [-f] <pool> ...
upgrade
upgrade -v
upgrade [-V version] <-a | pool ...>

history [-il] [<pool>] ...


get <"all" | property[,...]> <pool> ...
set <property=value> <pool>
Copyright 2009 Peter Baer Galvin - All Rights Reserved 76

Saturday, May 2, 2009


zpool (cont)
# zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0
# zpool status -v
pool: ezfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM


ezfs ONLINE 0 0 0
raidz ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c3t0d0 ONLINE 0 0 0
c4t0d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0

errors: No known data errors

Copyright 2009 Peter Baer Galvin - All Rights Reserved 77

Saturday, May 2, 2009


zpool (cont)
pool: zfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM


zfs ONLINE 0 0 0
raidz ONLINE 0 0 0
c0d0s7 ONLINE 0 0 0
c0d1s7 ONLINE 0 0 0
c1d1 ONLINE 0 0 0
c1d0 ONLINE 0 0 0

errors: No known data errors

Copyright 2009 Peter Baer Galvin - All Rights Reserved 78

Saturday, May 2, 2009


zpool (cont)
(/)# zpool iostat -v
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
bigp 630G 392G 2 4 41.3K 496K
raidz 630G 392G 2 4 41.3K 496K
c0d0s6 - - 0 2 8.14K 166K
c0d1s6 - - 0 2 7.77K 166K
c1d0s6 - - 0 2 24.1K 166K
c1d1s6 - - 0 2 22.2K 166K
---------- ----- ----- ----- ----- ----- -----

Copyright 2009 Peter Baer Galvin - All Rights Reserved 79

Saturday, May 2, 2009


zpool (cont)
# zpool status -v
pool: rpool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror ONLINE 0 0 0
c0d0s0 ONLINE 0 0 0
c0d1s0 ONLINE 0 0 0
errors: No known data errors
pool: zpbg
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
zpbg ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c4t0d0 ONLINE 0 0 0
c4t1d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c6t0d0 ONLINE 0 0 0
errors: No known data errors

Copyright 2009 Peter Baer Galvin - All Rights Reserved 80

Saturday, May 2, 2009


zpool (cont)
zpool iostat -v
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
rpool 6.72G 225G 0 1 9.09K 11.6K
mirror 6.72G 225G 0 1 9.09K 11.6K
c0d0s0 - - 0 0 5.01K 11.7K
c0d1s0 - - 0 0 5.09K 11.7K
---------- ----- ----- ----- ----- ----- -----
zpbg 3.72T 833G 0 0 32.0K 1.24K
raidz1 3.72T 833G 0 0 32.0K 1.24K
c4t0d0 - - 0 0 9.58K 331
c4t1d0 - - 0 0 10.3K 331
c5t0d0 - - 0 0 10.4K 331
c5t1d0 - - 0 0 10.3K 331
c6t0d0 - - 0 0 9.54K 331
---------- ----- ----- ----- ----- ----- -----

Copyright 2009 Peter Baer Galvin - All Rights Reserved 81

Saturday, May 2, 2009


zpool (cont)

Note that for import and export, a pool is the delineator
You can’t import or export a file system, because it’s an integral part of a pool
Might cause you to use smaller pools than you otherwise would (example below)
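For example, moving a whole pool between hosts (a sketch using the zpbg pool shown earlier):
# zpool export zpbg         (on the old host)
# zpool import              (on the new host: list pools available for import)
# zpool import zpbg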

Copyright 2009 Peter Baer Galvin - All Rights Reserved 82

Saturday, May 2, 2009


zfs
# zfs
missing command
usage: zfs command args ...
where 'command' is one of the following:

create [-p] [-o property=value] ... <filesystem>


create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume>
destroy [-rRf] <filesystem|volume|snapshot>

snapshot [-r] [-o property=value] ... <filesystem@snapname|


volume@snapname>
rollback [-rRf] <snapshot>
clone [-p] [-o property=value] ... <snapshot> <filesystem|volume>
promote <clone-filesystem>
rename <filesystem|volume|snapshot> <filesystem|volume|snapshot>
rename -p <filesystem|volume> <filesystem|volume>
rename -r <snapshot> <snapshot>

Copyright 2009 Peter Baer Galvin - All Rights Reserved 83

Saturday, May 2, 2009


zfs (cont)
list [-rH] [-o property[,...]] [-t type[,...]] [-s
property] ...
[-S property] ... [filesystem|volume|snapshot] ...
set <property=value> <filesystem|volume|snapshot> ...
get [-rHp] [-o field[,...]] [-s source[,...]]
<"all" | property[,...]> [filesystem|volume|
snapshot] ...
inherit [-r] <property> <filesystem|volume|snapshot> ...
upgrade [-v]
upgrade [-r] [-V version] <-a | filesystem ...>

mount
mount [-vO] [-o opts] <-a | filesystem>
unmount [-f] <-a | filesystem|mountpoint>
share <-a | filesystem>
unshare [-f] <-a | filesystem|mountpoint>

Copyright 2009 Peter Baer Galvin - All Rights Reserved 84

Saturday, May 2, 2009


zfs (cont)
send [-R] [-[iI] snapshot] <snapshot>
receive [-vnF] <filesystem|volume|snapshot>
receive [-vnF] -d <filesystem>

allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...]


<filesystem|volume>
allow [-ld] -e <perm|@setname>[,...] <filesystem|volume>
allow -c <perm|@setname>[,...] <filesystem|volume>
allow -s @setname <perm|@setname>[,...] <filesystem|volume>

unallow [-rldug] <"everyone"|user|group>[,...]


[<perm|@setname>[,...]] <filesystem|volume>
unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem|
volume>
Each dataset is of the form: pool/[dataset/]*dataset[@name]
For the property list, run: zfs set|get
For the delegated permission list, run: zfs allow|unallow

Copyright 2009 Peter Baer Galvin - All Rights Reserved 85

Saturday, May 2, 2009


zfs (cont)
# zfs get
missing property argument
usage:
get [-rHp] [-o field[,...]] [-s source[,...]]
<"all" | property[,...]> [filesystem|volume|snapshot] ...
The following properties are supported:
PROPERTY EDIT INHERIT VALUES
available NO NO <size>
compressratio NO NO <1.00x or higher if compressed>
creation NO NO <date>
mounted NO NO yes | no
origin NO NO <snapshot>
referenced NO NO <size>
type NO NO filesystem | volume | snapshot
used NO NO <size>
aclinherit YES YES discard | noallow | restricted |
passthrough
aclmode YES YES discard | groupmask | passthrough
atime YES YES on | off

Copyright 2009 Peter Baer Galvin - All Rights Reserved 86

Saturday, May 2, 2009


zfs (cont)
canmount YES NO on | off | noauto
casesensitivity NO YES sensitive | insensitive | mixed
checksum YES YES on | off | fletcher2 | fletcher4 |
sha256
compression YES YES on | off | lzjb | gzip | gzip-[1-9]
copies YES YES 1 | 2 | 3
devices YES YES on | off
exec YES YES on | off
mountpoint YES YES <path> | legacy | none
nbmand YES YES on | off
normalization NO YES none | formC | formD | formKC |
formKD
primarycache YES YES all | none | metadata
quota YES NO <size> | none
readonly YES YES on | off
recordsize YES YES 512 to 128k, power of 2
refquota YES NO <size> | none
refreservation YES NO <size> | none
reservation YES NO <size> | none

Copyright 2009 Peter Baer Galvin - All Rights Reserved 87

Saturday, May 2, 2009


zfs (cont)
secondarycache YES YES all | none | metadata
setuid YES YES on | off
shareiscsi YES YES on | off | type=<type>
sharenfs YES YES on | off | share(1M)
options
sharesmb YES YES on | off | sharemgr(1M)
options
snapdir YES YES hidden | visible
utf8only NO YES on | off
version YES NO 1 | 2 | 3 | current
volblocksize NO YES 512 to 128k, power of 2
volsize YES NO <size>
vscan YES YES on | off
xattr YES YES on | off
zoned YES YES on | off

Sizes are specified in bytes with standard units such as K, M, G,


etc.
User-defined properties can be specified by using a name
containing a colon (:).

Copyright 2009 Peter Baer Galvin - All Rights Reserved 88

Saturday, May 2, 2009


zfs (cont)
(/)# zfs list
NAME USED AVAIL REFER MOUNTPOINT
bigp 630G 384G - /zfs/bigp
bigp/big 630G 384G 630G /zfs/bigp/big
(root@sparky)-(7/pts)-(06:35:11/05/05)-
(/)# zfs snapshot bigp/big@5-nov
(root@sparky)-(8/pts)-(06:35:11/05/05)-
(/)# zfs list
NAME USED AVAIL REFER MOUNTPOINT
bigp 630G 384G - /zfs/bigp
bigp/big 630G 384G 630G /zfs/bigp/big
bigp/big@5-nov 0 - 630G /zfs/bigp/big@5-nov

# zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/big@5-nov
# zfs send -i 5-nov bigp/big@6-nov | ssh host \
    zfs receive poolB/received/big

Copyright 2009 Peter Baer Galvin - All Rights Reserved 89

Saturday, May 2, 2009


zfs (cont)
# zpool history
History for 'zpbg':
2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0
c11t0d0 c12t0d0 c13t0d0
2006-04-03.18:19:48 zfs receive zpbg/imp
2006-04-03.18:41:39 zfs receive zpbg/home
2006-04-03.19:04:22 zfs receive zpbg/photos
2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home
2006-04-03.19:44:22 zfs receive zpbg/mail
2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail
2006-04-03.20:14:32 zfs receive zpbg/mqueue
2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/mqueue
# zfs create -V 2g tank/volumes/v2
# zfs set shareiscsi=on tank/volumes/v2
# iscsitadm list target
Target: tank/volumes/v2
iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a
Connections: 0
Copyright 2009 Peter Baer Galvin - All Rights Reserved 90

Saturday, May 2, 2009


zpool history -l
Shows user name, host name, and zone of
command
# zpool history -l users
History for ’users’:
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
[user root on corona:global]
2008-07-10.09:43:13 zfs create users/marks
[user root on corona:global]
2008-07-10.09:43:44 zfs destroy users/marks
[user root on corona:global]
2008-07-10.09:43:48 zfs create users/home
[user root on corona:global]
2008-07-10.09:43:56 zfs create users/home/markm
[user root on corona:global]
2008-07-10.09:44:02 zfs create users/home/marks
[user root on corona:global]

Copyright 2009 Peter Baer Galvin - All Rights Reserved 91

Saturday, May 2, 2009


zpool history -i

Shows zfs internal activities - useful for debugging
# zpool history -i users
History for ’users’:
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
2008-07-10.09:43:13 [internal create txg:6] dataset = 21
2008-07-10.09:43:13 zfs create users/marks
2008-07-10.09:43:48 [internal create txg:12] dataset = 27
2008-07-10.09:43:48 zfs create users/home
2008-07-10.09:43:55 [internal create txg:14] dataset = 33

Copyright 2009 Peter Baer Galvin - All Rights Reserved 92

Saturday, May 2, 2009


ZFS Delegate Admin
Use zfs allow and zfs unallow to grant and remove permissions
Use the “delegation” pool property to control whether delegation is enabled
Then delegate
# zfs allow cindys create,destroy,mount,snapshot tank/cindys
# zfs allow tank/cindys
-------------------------------------------------------------
Local+Descendent permissions on (tank/cindys)
user cindys create,destroy,mount,snapshot
-------------------------------------------------------------

# zfs unallow cindys tank/cindys


# zfs allow tank/cindys

Copyright 2009 Peter Baer Galvin - All Rights Reserved 93

Saturday, May 2, 2009


ZFS - Odds and Ends
zfs get all will display all set attributes of all ZFS file systems
Recursive snapshots (via -r) as of S10 8/07
zfs clone makes a RW copy of a snapshot
zfs promote makes the specified clone independent of its origin snapshot (the clone and its origin swap roles; example below)
You can undo a zpool destroy with zpool import -D
As of S10 8/07 ZFS is integrated with FMA
As of S10 11/06 ZFS supports double-parity RAID (raidz2)
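A short sketch of clone and promote (dataset and snapshot names are illustrative):
# zfs snapshot tank/ws@today
# zfs clone tank/ws@today tank/ws-test
# zfs promote tank/ws-test      (tank/ws-test no longer depends on tank/ws@today)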
Copyright 2009 Peter Baer Galvin - All Rights Reserved 94

Saturday, May 2, 2009


ZFS “GUI”

Did you know that Solaris has an admin GUI?
Webconsole is enabled by default
Turn it off via svcadm if not used
By default (on Nevada b64 at least) ZFS is the only on-by-default feature

Copyright 2009 Peter Baer Galvin - All Rights Reserved 95

Saturday, May 2, 2009


Copyright 2009 Peter Baer Galvin - All Rights Reserved 96

Saturday, May 2, 2009


ZFS Automatic Snapshots
In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris 2008.11
SMF service and GNOME app
Can take automatic scheduled snapshots
By default all zfs file systems, at boot, then every 15 minutes, every hour, every day, etc
Auto delete of oldest snapshots if user-defined amount of space is not available
Can perform incremental or full backups via those snapshots
Nautilus integration allows user to browse and restore files graphically

Copyright 2009 Peter Baer Galvin - All Rights Reserved 97

Saturday, May 2, 2009


ZFS Automatic Snapshots (cont)

One SMF service per time frequency:
frequent snapshots every 15 mins, keeping 4 snapshots
hourly snapshots every hour, keeping 24 snapshots
daily snapshots every day, keeping 31 snapshots
weekly snapshots every week, keeping 7 snapshots
monthly snapshots every month, keeping 12 snapshots
Details here: http://src.opensolaris.org/source/xref/jds/zfs-snapshot/README.zfs-auto-snapshot.txt

Copyright 2009 Peter Baer Galvin - All Rights Reserved 98

Saturday, May 2, 2009


ZFS Automatic Snapshots (cont)
Service properties provide more details

zfs/fs-name The name of the filesystem. If the special filesystem name "//" is used, then the
system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to
true, so to take frequent snapshots of tank/timf, run the following zfs command:

# zfs set com.sun:auto-snapshot:frequent=true tank/timf


The "snap-children" property is ignored when using this fs-name value. Instead, the system
automatically determines when it's able to take recursive, vs. non-recursive snapshots of the system,
based on the values of the ZFS user properties.

zfs/interval [ hours | days | months | none]

When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to
manually fire the method script whenever they want - useful for snapshotting on system events.

zfs/keep How many snapshots to retain - eg. setting this to "4" would keep only the four
most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has
been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot.
Setting to "all" keeps all snapshots.

zfs/period How often you want to take snapshots, in intervals set according to "zfs/
interval" (eg. every 10 days)

Copyright 2009 Peter Baer Galvin - All Rights Reserved 99

Saturday, May 2, 2009


ZFS Automatic Snapshots (cont)
zfs/snapshot-children "true" if you would like to recursively take snapshots of all child
filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name='//'

zfs/backup [ full | incremental | none ]

zfs/backup-save-cmd The command string used to save the backup stream.

zfs/backup-lock You shouldn't need to change this - but it should be set to "unlocked"
by default. We use it to indicate when a backup is running.

zfs/label A label that can be used to differentiate this set of snapshots from
others; not required. If multiple schedules are running on the same machine, using
distinct labels for each schedule is needed - otherwise one schedule could remove
snapshots taken by another schedule according to its snapshot-retention policy. (see
"zfs/keep")

zfs/verbose Set to false by default, setting to true makes the service produce more output about what it's doing.

zfs/avoidscrub Set to false by default, this determines whether we should avoid taking snapshots on any pools that have a scrub or resilver in progress. More info in the bugid:

6343667 need itinerary so interrupted scrub/resilver doesn't have to start over


Copyright 2009 Peter Baer Galvin - All Rights Reserved 100

Saturday, May 2, 2009


ZFS Automatic Snapshot (cont)

http://blogs.sun.com/erwann/resource/menu-location.png

Copyright 2009 Peter Baer Galvin - All Rights Reserved 101

Saturday, May 2, 2009


ZFS Automatic Snapshot (cont)

If the life-preserver icon is enabled in the file browser, then a backup of the directory is available
Press it to bring up the nav bar

Copyright 2009 Peter Baer Galvin - All Rights Reserved 102

Saturday, May 2, 2009


ZFS Automatic Snapshot (cont)
Drag the slider into the past to show previous versions of files in the directory
Then right-click on a file and select “Restore to Desktop” if you want it back
More features coming

Press to bring up nav bar


Copyright 2009 Peter Baer Galvin - All Rights Reserved 103

Saturday, May 2, 2009


ZFS Status
Netbackup, Legato support ZFS for backup / restore
VCS supports ZFS as file system of clustered services
Most vendors don’t care which file system app runs on
Performance as good as other file systems
Feature set better

Copyright 2009 Peter Baer Galvin - All Rights Reserved 104

Saturday, May 2, 2009


ZFS Futures
Support by ISVs
Backup / restore
Some don’t get metadata (yet)
Use zfs send to emit file containing filesystem

Clustering (see Lustre)

Performance still a work in progress
Being ported to BSD, Mac OS X Leopard
Check out the ZFS FAQ at
http://www.opensolaris.org/os/community/zfs/faq/

Copyright 2009 Peter Baer Galvin - All Rights Reserved 105

Saturday, May 2, 2009


ZFS Performance
From http://www.opensolaris.org/jive/thread.jspa?messageID=14997
billm

On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote:
> Does ZFS reorganize (ie. defrag) the files over time?

Not yet.

> If it doesn't, it might not perform well in "write-little read-much"
> scenarios (where read performance is much more important than write
> performance).

As always, the correct answer is "it depends". Let's take a look at
several cases:

- Random reads: No matter if the data was written randomly or
  sequentially, random reads are random for any filesystem,
  regardless of their layout policy. Not much you can do to
  optimize these, except have the best I/O scheduler possible.

Copyright 2009 Peter Baer Galvin - All Rights Reserved 106

Saturday, May 2, 2009


ZFS Performance (cont)

- Sequential writes, sequential reads: With ZFS, sequential writes
  lead to sequential layout on disk. So sequential reads will
  perform quite well in this case.

- Random writes, sequential reads: This is the most interesting
  case. With random writes, ZFS turns them into sequential writes,
  which go *really* fast. With sequential reads, you know which
  order the reads are going to be coming in, so you can kick off
  a bunch of prefetch reads. Again, with a good I/O scheduler
  (which ZFS just happens to have), you can turn this into good read
  performance, if not entirely as good as totally sequential.

Believe me, we've thought about this a lot. There is a lot we can do to
improve performance, and we're just getting started.

Copyright 2009 Peter Baer Galvin - All Rights Reserved 107

Saturday, May 2, 2009


ZFS Performance (cont)
For DBs and other applications that want direct disk access
There is no direct I/O in ZFS
But you can get very good performance by matching the I/O size of the app (e.g. Oracle uses 8K) to the recordsize of the zfs file system
This is set at file system create time (example below)
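For example, a file system intended for an 8K-block database might be created like this (a sketch; pool and dataset names are placeholders):
# zfs create -o recordsize=8k tank/oradata
# zfs get recordsize tank/oradata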
Copyright 2009 Peter Baer Galvin - All Rights Reserved 108

Saturday, May 2, 2009


ZFS Performance (cont)
The ZIL can be a bottleneck on NFS servers
NFS does sync writes
Put the ZIL on another disk, or on SSD
ZFS aggressively uses memory for caching
It is a low-priority memory user, but can cause temporary conflicts with other users
Use arcstat to monitor memory use:
http://www.solarisinternals.com/wiki/index.php/Arcstat
Copyright 2009 Peter Baer Galvin - All Rights Reserved 109

Saturday, May 2, 2009


ZFS Backup Tool
Zetaback is a thin-agent based ZFS backup tool
Runs from a central host
Scans clients for new ZFS filesystems
Manages varying desired backup intervals (per host) for
full backups
incremental backups
Maintains varying retention policies (per host)
Summarizes existing backups
Restores any host:fs backup at any point in time to any target host
https://labs.omniti.com/trac/zetaba
Copyright 2009 Peter Baer Galvin - All Rights Reserved 110

Saturday, May 2, 2009


zfs upgrade
On-disk format of ZFS changes over time
Forward-upgradeable, but not backward compatible
Watch out when attaching and detaching zpools
Also, “zfs send” streams are not readable by older zfs versions
# zfs upgrade
This system is currently running ZFS filesystem version 2.
The following filesystems are out of date, and can be upgraded. After being
upgraded, these filesystems (and any ’zfs send’ streams generated from
subsequent snapshots) will no longer be accessible by older software
versions.
VER FILESYSTEM
--- ------------
1 datab
1 datab/users
1 datab/users/area51
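Upgrading in place is one command each for pools and file systems; only do this once every host that might import the pool or receive its streams runs new enough software:
# zpool upgrade -a
# zfs upgrade -a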

Copyright 2009 Peter Baer Galvin - All Rights Reserved 111

Saturday, May 2, 2009


Automatic Snapshots and Backups

Unsupported services, may become supported
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10
http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people

Copyright 2009 Peter Baer Galvin - All Rights Reserved 112

Saturday, May 2, 2009


ZFS - Smashing!

http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18
Copyright 2009 Peter Baer Galvin - All Rights Reserved 113

Saturday, May 2, 2009


Storage Odds and Ends
iostat -y shows performance info on multipathed devices

raidctl is RAID configuration tool for multiple RAID controllers

fsstat file-system based stat command


# fsstat -F
new name name attr attr lookup rddir read read write write
file remov chng get set ops ops ops bytes ops bytes
0 0 0 0 0 0 0 0 0 0 0 ufs
0 0 0 26.0K 0 52.0K 354 4.71K 1.56M 0 0 proc
0 0 0 0 0 0 0 0 0 0 0 nfs
53.2K 1.02K 24.0K 8.99M 48.6K 4.26M 161K 44.8M 11.8G 23.1M 6.58G zfs
0 0 0 2.94K 0 0 0 0 0 0 0 lofs
7.26K 2.84K 4.30K 31.5K 83 35.4K 6 40.5K 41.3M 45.6K 39.2M tmpfs
0 0 0 410 0 0 0 33 11.0K 0 0 mntfs
0 0 0 0 0 0 0 0 0 0 0 nfs3
0 0 0 0 0 0 0 0 0 0 0 nfs4
0 0 0 0 0 0 0 0 0 0 0 autofs
Copyright 2009 Peter Baer Galvin - All Rights Reserved 114

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes
http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html
Example 1: ZFS Filesystem

Objectives:

Understand the purpose of the ZFS filesystem.

Configure a ZFS pool and filesystem.

Requirements:

A server (SPARC or x64 based) running the OpenSolaris OS.

Configuration details from the running server.

Step 1: Identify your Disks.

Identify the storage available for adding to the ZFS pool using the format(1) command. Your output will vary from that shown here:

# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t2d0
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0
1. c0t3d0
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0
Specify disk (enter its number): ^D

Copyright 2009 Peter Baer Galvin - All Rights Reserved 115

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 2: Add your disks to your ZFS pool.

# zpool create -f mypool c0t3d0s0


# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 10G 94K 10.0G 0% ONLINE -
Step 3: Create a filesystem in your pool.

# zfs create mypool/myfs


# df -h /mypool/myfs
Filesystem size used avail capacity Mounted on
mypool/myfs 9.8G 18K 9.8G 1% /mypool/myfs

Copyright 2009 Peter Baer Galvin - All Rights Reserved 116

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 2: Network File System (NFS)

Objectives:

Understand the purpose of the NFS filesystem.

Create an NFS shared filesystem on a server and mount it on a client.

Requirements:

Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS.

Configuration details from the running systems.

Step 1: Create the NFS shared filesystem on the server.

Switch on the NFS service on the server:

# svcs nfs/server
STATE STIME FMRI
disabled 6:49:39 svc:/network/nfs/server:default
# svcadm enable nfs/server
Share the ZFS filesystem over NFS:

# zfs set sharenfs=on mypool/myfs


# dfshares
RESOURCE SERVER ACCESS TRANSPORT
x4100:/mypool/myfs x4100 - -

Copyright 2009 Peter Baer Galvin - All Rights Reserved 117

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 2: Switch on the NFS service on the client.

This is similar to the procedure for the server:

# svcs nfs/client
STATE STIME FMRI
disabled 6:47:03 svc:/network/nfs/client:default
# svcadm enable nfs/client
Mount the shared filesystem on the client:

# mkdir /mountpoint
# mount -F nfs x4100:/mypool/myfs /mountpoint
# df -h /mountpoint
Filesystem size used avail capacity Mounted on
x4100:/mypool/myfs 9.8G 18K 9.8G 1% /mountpoint

Copyright 2009 Peter Baer Galvin - All Rights Reserved 118

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Example 3: Common Internet File System (CIFS)

Objectives:

Understand the purpose of the CIFS filesystem.

Configure a CIFS share on one machine (from the previous example) and make it available on the other machine.

Requirements:

Two servers (SPARC or x64 based) running the OpenSolaris OS.

Configuration details provided here.

Step 1: Create a ZFS filesystem for CIFS.

# zfs create -o casesensitivity=mixed mypool/myfs2


# df -h /mypool/myfs2
Filesystem size used avail capacity Mounted on
mypool/myfs2 9.8G 18K 9.8G 1% /mypool/myfs2
Step 2: Switch on the SMB Server service on the server.

# svcs smb/server
STATE STIME FMRI
disabled 6:49:39 svc:/network/smb/server:default
# svcadm enable smb/server

Copyright 2009 Peter Baer Galvin - All Rights Reserved 119

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 3: Share the filesystem using CIFS.

# zfs set sharesmb=on mypool/myfs2


Verify using the following command:

# zfs get sharesmb mypool/myfs2


NAME PROPERTY VALUE SOURCE
mypool/myfs2 sharesmb on local
Step 4: Verify the CIFS naming.

Because we have not explicitly named the share, we can examine the default name assigned to it using the following command:

# sharemgr show -vp


default nfs=()
zfs
zfs/mypool/myfs nfs=()
/mypool/myfs
zfs/mypool/myfs2 smb=()
mypool_myfs2=/mypool/myfs2
Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown.

Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS.

Add the following line to the end of the file:

other password required pam_smb_passwd.so.1 nowarn

Copyright 2009 Peter Baer Galvin - All Rights Reserved 120

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 6: Change the password using the passwd command.


# passwd username
New Password:
Re-enter new Password:
passwd: password successfully changed for root
Now repeat Steps 5 and 6 on the Solaris client.
Step 7: Enable CIFS client services on the client node.
# svcs smb/client
STATE STIME FMRI
disabled 6:47:03 svc:/network/smb/client:default
# svcadm enable smb/client

Copyright 2009 Peter Baer Galvin - All Rights Reserved 121

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 8: Make a mount point on the client and mount the CIFS resource
from the server.

Mount the resource across the network and check it using the following
command sequence:
# mkdir /mountpoint2
# mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2
Password: *******
# df -h /mountpoint2
Filesystem size used avail capacity Mounted on
//root@x4100/mypool_myfs2 9.8G 18K 9.8G 1% /mountpoint2
# df -n
/ : ufs
/mountpoint : nfs
/mountpoint2 : smbfs
Copyright 2009 Peter Baer Galvin - All Rights Reserved 122

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 4: Comstar Fibre Channel Target

Objectives

Understand the purpose of the Comstar Fibre Channel target.

Configure an FC target and initiator on two servers.

Requirements:

Two servers (SPARC or x64 based) running the OpenSolaris OS.

Configuration details provided here.

Step 1: Start the SCSI Target Mode Framework (STMF) and verify it.

Use the following commands to start up and check the service on the host that provides the target:

# svcs stmf
STATE STIME FMRI
disabled 19:15:25 svc:/system/device/stmf:default
# svcadm enable stmf
# stmfadm list-state
Operational Status: online
Config Status : initialized
Copyright 2009 Peter Baer Galvin - All Rights Reserved 123

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 2: Ensure that the framework can see the ports.

Use the following command to ensure that the target mode framework can see the HBA ports:
# stmfadm list-target -v

Target: wwn.210000E08B909221
Operational Status: Online

Provider Name : qlt

Alias : qlt0,0
Sessions : 4

Initiator: wwn.210100E08B272AB5

Alias: ute198:qlc1
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210100E08B296A60
Alias: ute198:qlc3

Logged in since: Thu Mar 27 16:38:30 2008


Initiator: wwn.210000E08B072AB5

Alias: ute198:qlc0
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B096A60

Alias: ute198:qlc2
Logged in since: Thu Mar 27 16:38:30 2008

Copyright 2009 Peter Baer Galvin - All Rights Reserved 124

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Target: wwn.210100E08BB09221
Operational Status: Online
Provider Name : qlt
Alias : qlt1,0
Sessions : 4
Initiator: wwn.210100E08B272AB5
Alias: ute198:qlc1
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210100E08B296A60
Alias: ute198:qlc3
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B072AB5
Alias: ute198:qlc0
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B096A60
Alias: ute198:qlc2
Logged in since: Thu Mar 27 16:38:30 2008

Copyright 2009 Peter Baer Galvin - All Rights Reserved 125

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 3: Create a device to use as storage for the target.

Use ZFS to create a volume (zvol) for use as the storage behind the
target:

# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 68G 94K 68.0G 0% ONLINE -

# zfs create -V 5gb mypool/myvol


# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 5.00G 61.9G 18K /mypool
mypool/myvol 5G 66.9G 16K -

Copyright 2009 Peter Baer Galvin - All Rights Reserved 126

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 4: Register the zvol with the framework.

The zvol becomes the SCSI logical unit (disk) behind the target:
# sbdadm create-lu /dev/zvol/rdsk/mypool/myvol
Created the following LU:
GUID DATA SIZE SOURCE
6000ae4093000000000047f3a1930007 5368643584 /dev/zvol/rdsk/mypool/myvol

Confirm its existence as follows:

# stmfadm list-lu -v
LU Name: 6000AE4093000000000047F3A1930007

Operational Status: Online

Provider Name : sbd


Alias : /dev/zvol/rdsk/mypool/myvol
View Entry Count : 0
Copyright 2009 Peter Baer Galvin - All Rights Reserved 127

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 5: Find the initiator HBA ports to which to map the LUs.

Discover HBA ports on the initiator host using the following command:
# fcinfo hba-port
HBA Port WWN: 25000003ba0ad303
Port Mode: Initiator
Port ID: 1
OS Device Name: /dev/cfg/c5
Manufacturer: QLogic Corp.
Model: 2200
Firmware Version: 2.1.145
FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver:
Type: L-port
State: online
Supported Speeds: 1Gb
Current Speed: 1Gb
Node WWN: 24000003ba0ad303

Copyright 2009 Peter Baer Galvin - All Rights Reserved 128

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 5: Find the initiator HBA ports to which to map the LUs.

Discover HBA ports on the initiator host using the following command:
# fcinfo hba-port
HBA Port WWN: 25000003ba0ad303
Port Mode: Initiator
Port ID: 1
OS Device Name: /dev/cfg/c5
Manufacturer: QLogic Corp.
Model: 2200
Firmware Version: 2.1.145
FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver:
Type: L-port
State: online
Supported Speeds: 1Gb
Current Speed: 1Gb
Node WWN: 24000003ba0ad303
. . .
Copyright 2009 Peter Baer Galvin - All Rights Reserved 129

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 6: Create a host group and add the world-wide numbers (WWNs) of the initiator host HBA
ports to it.

Name the group mygroup:


# stmfadm create-hg mygroup
# stmfadm list-hg
Host Group: mygroup

Add the WWNs of the ports to the group:


# stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 \
wwn.210100E08B296A60 \
wwn.210100E08B272AB5 \
wwn.210000E08B072AB5

Now check that everything is in order:


# stmfadm list-hg-member -v -g mygroup

With the host group created, you're now ready to export the logical unit. This is accomplished by
adding a view entry to the logical unit using this host group, as shown in the following command:
# stmfadm add-view -h mygroup 6000AE4093000000000047F3A1930007

Copyright 2009 Peter Baer Galvin - All Rights Reserved 130

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 7: Check the visibility of the targets on the initiator host.

First, force the devices on the initiator host to be rescanned with a simple
script:
#!/bin/ksh
fcinfo hba-port |grep "^HBA" |awk '{print $4}'|while read ln
do
fcinfo remote-port -p $ln -s >/dev/null 2>&1
done
The disk exported over FC should then appear in the format list:
# format
Searching for disks...done
c6t6000AE4093000000000047F3A1930007d0: configured with capacity of 5.00GB

Copyright 2009 Peter Baer Galvin - All Rights Reserved 131

Saturday, May 2, 2009


Build an OpenSolaris Storage Server in 10 Minutes - cont

...
partition> p
Current partition table (default):
Total disk cylinders available: 20477 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks


0 root wm 0 - 511 128.00MB (512/0/0) 262144
1 swap wu 512 - 1023 128.00MB (512/0/0) 262144
2 backup wu 0 - 20476 5.00GB (20477/0/0) 10484224
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 usr wm 1024 - 20476 4.75GB (19453/0/0) 9959936
7 unassigned wm 0 0 (0/0/0) 0

partition>

Copyright 2009 Peter Baer Galvin - All Rights Reserved 132

Saturday, May 2, 2009


ZFS Root
Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file
system (as does OpenSolaris)

Note that you can’t as of U6 flash archive a ZFS root system(!)

Can upgrade by using liveupgrade (LU) to mirror to second disk (ZFS pool) and
upgrading there, then booting there

lucreate to copy the primary BE to create an alternate BE

# zpool create mpool mirror c1t0d0s0 c1t1d0s0


# lucreate -c c1t2d0s0 -n zfsBE -p mpool
The default file systems are created in the specified pool and the non-shared file
systems are then copied into the root pool

Run luupgrade to upgrade the alternate BE (optional)

Run luactivate on the newly upgraded alternate BE so that when the system is
rebooted, it will be the new primary BE

# luactivate zfsBE
Copyright 2009 Peter Baer Galvin - All Rights Reserved 133

Saturday, May 2, 2009


Life is good
Once on ZFS as root, life is good
Mirror the root disk with 1 command (if not mirrored):
# zpool attach rpool c1t0d0s0 c1t1d0s0
Note that you have to manually do an installboot on the mirrored disk
Now consider all the ZFS features, used on the boot disk
Snapshot before patch, upgrade, any change
Undo change via 1 command (sketch below)
Replicate to another system for backup, DR
...
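A hedged sketch of that snapshot-before-change habit (the BE dataset name rpool/ROOT/s10BE is illustrative; on SPARC the attached mirror also needs the ZFS boot block installed with installboot, on x86 with installgrub):
# zfs snapshot -r rpool@pre-patch                  (recursive snapshot of the root pool)
  ...apply the patch or change...
# zfs rollback rpool/ROOT/s10BE@pre-patch          (roll an affected dataset back, then reboot)
# zfs send rpool/ROOT/s10BE@pre-patch | ssh drhost zfs receive backup/s10BE   (replicate elsewhere)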

Copyright 2009 Peter Baer Galvin - All Rights Reserved 134

Saturday, May 2, 2009


ZFS Labs
What pools are available in your zone?
What are their states?
What is their performance like?
What ZFS file systems?
Create a new file system
Create a file there
Take a snapshot of that file system
Delete the file
Revert to the file system state as of the snapshot
How do you see the contents of a snapshot?
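One possible command sequence, as a hint (pool and file system names are placeholders):
# zpool list ; zpool status ; zpool iostat -v 5
# zfs list
# zfs create rpool/lab
# touch /rpool/lab/afile
# zfs snapshot rpool/lab@before
# rm /rpool/lab/afile
# zfs rollback rpool/lab@before
# ls /rpool/lab/.zfs/snapshot/before      (snapshot contents; see the snapdir property)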

Copyright 2009 Peter Baer Galvin - All Rights Reserved 135

Saturday, May 2, 2009


ZFS Final Thought
Eric Schrock's Weblog - Thursday Nov 17, 2005

UFS/SVM vs. ZFS: Code Complexity

A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People
tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment
to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that
UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being
bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging
effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is
considerably newer, it is a huge beast with its own set of problems. Since ZFS is both a volume manager and a
filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true
measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate
yields:

UFS: kernel= 46806 user= 40147 total= 86953

SVM: kernel= 75917 user=161984 total=237901

TOTAL: kernel=122723 user=202131 total=324854

ZFS: kernel= 50239 user= 21073 total= 71312

The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to
be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code
(kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what
those ZFS numbers will look like in 20 years...
Copyright 2009 Peter Baer Galvin - All Rights Reserved 136

Saturday, May 2, 2009




Where to Learn More
Community: http://www.opensolaris.org/os/community/zfs
Wikipedia: http://en.wikipedia.org/wiki/ZFS
ZFS blogs: http://blogs.sun.com/main/tags/zfs
ZFS ports
Apple Mac: http://developer.apple.com/adcnews
FreeBSD: http://wiki.freebsd.org/ZFS
Linux/FUSE: http://zfs-on-fuse.blogspot.com
As an appliance: http://www.nexenta.com
Beginner’s Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
Copyright 2009 Peter Baer Galvin - All Rights Reserved 138

Saturday, May 2, 2009


Sun Storage 7x10

Copyright 2009 Peter Baer Galvin - All Rights Reserved 139

Saturday, May 2, 2009


Speaking of Futures

The future of Sun storage?


Announced 11/10/2008

Copyright 2009 Peter Baer Galvin - All Rights Reserved 140

Saturday, May 2, 2009


Most Scalable Storage System Design

• Hybrid Flash Storage Pools
> Data is intelligently placed in DRAM, Flash, or Disk
> Transparently managed as one storage pool
> Optimizes $/GB and $/IOP performance
• Enterprise Grade Flash
> 3-5 year lifetime
(Diagram: Read-optimized L2ARC SSDs and Write-optimized ZIL SSDs in front of an HDD pool (SATA))
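
For reference, a hybrid pool can be assembled by hand with ZFS commands like the
following sketch (device names are examples; on the 7000 appliance this layout is
managed for you):
Create the main pool on SATA disks:
# zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
Add a read-optimized SSD as L2ARC cache:
# zpool add tank cache c3t0d0
Add mirrored write-optimized SSDs as the ZIL log:
# zpool add tank log mirror c4t0d0 c4t1d0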

Sun Confidential: Internal Only


Copyright 2009 Peter Baer Galvin - All Rights Reserved 141

Saturday, May 2, 2009


Latency Comparison
Bridging the DRAM to HDD Gap
(Chart: log-scale access latency from 1 nS to 1 S; CPU, DRAM, Flash/SSD, HDD, and Tape
are ordered from fastest to slowest, with Flash/SSD filling the gap between DRAM and disk)

Sun Confidential: Internal Only
Copyright 2009 Peter Baer Galvin - All Rights Reserved 142

Saturday, May 2, 2009


ZFS Hybrid Pool Example
Based on Actual Benchmark Results
(Bar chart comparing a Hybrid Storage Pool (DRAM + Read SSD + Write SSD + 5x 4200 RPM SATA)
with a Traditional Storage Pool (DRAM + 7x 10K RPM 2.5”) across Read IOPs, Write IOPs, Cost,
Storage Power (Watts), and Raw Capacity (TB); the hybrid pool delivers roughly 4.9x the read
IOPs and 3.2x the write IOPs, with the remaining bars showing 4%, 2x, and 11%)

Copyright 2009 Peter Baer Galvin - All Rights Reserved 143

Sun Confidential: Internal Only

Saturday, May 2, 2009


Full Complement of Storage Software
Included with the system at no additional cost

Data Protocols
• NFS v3 and v4
• CIFS
• iSCSI
• HTTP
• WebDAV
• FTP
• NDMP v4
• FC Target (Roadmap)
• InfiniBand (Roadmap)

Data Services
• Write Flash Acceleration
• Read Flash Acceleration
• RAID-Z DP (6)
• Mirroring
• Striping
• Active-active Clustering
• Remote Replication
• Antivirus Quarantine
• Snapshots (r/o, r/w, unlimited)
• Compression

Data Management
• DTrace Analytics
• Self-healing system and data
• Simple out-of-the-box setup
• Secure Browser UI and CLI
• Advanced Networking
• NIS, LDAP, and AD
• Users, Roles
• SNMP
• Dashboard
• Alerts
• Phone Home
• Scripting
• Upgrade

Copyright 2009 Peter Baer Galvin - All Rights Reserved 144

Saturday, May 2, 2009


Sun Confidential: Internal Only
Copyright 2009 Peter Baer Galvin - All Rights Reserved 145

Saturday, May 2, 2009


Providing Unprecedented
Storage Analytics
• Automatic real-time visualization of
application and storage related workloads
• Simple yet sophisticated instrumentation
provides real-time comprehensive analysis
• Supports multiple simultaneous application and workload analyses in real time
• Analyses can be saved, exported, and replayed for later review
• Built on DTrace instrumentation
> NFSv3, NFSv4, CIFS, iSCSI
> ZFS and the Solaris I/O path
> CPU and Memory Utilization
> Networking (TCP, UDP, IP)
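The appliance hides the plumbing, but a flavor of the underlying DTrace instrumentation
can be seen on any Solaris 10 system; for example, this one-liner counts disk I/O events
by process name:
# dtrace -n 'io:::start { @[execname] = count(); }'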
Sun Confidential: Internal Only
Copyright 2009 Peter Baer Galvin - All Rights Reserved 146

Saturday, May 2, 2009


ANSWERING KEY QUESTIONS
“What is CPU and Memory
Utilization?”
“How much storage is being
utilized?”
“How is disk performing?
How many Ops/Sec?”
“What Services are active?”
“Which applications/users
are causing performance
issues?”
Sun Confidential: Internal Only
Copyright 2009 Peter Baer Galvin - All Rights Reserved 147

Saturday, May 2, 2009


Data Services
ZFS - Continued
• ZFS Usable Space
> Market Leading Usable Space
Double Parity RAID: 72%
Double Parity RAID, Wide Stripes: 83%
Mirrored: 42%
Single Parity RAID: 60%
Striped: 90%


Copyright 2009 Peter Baer Galvin - All Rights Reserved 148

Saturday, May 2, 2009


Sun Storage 7000 Unified Storage Systems
Price, Performance, Capacity and Availability
7410 Cluster
288 x 3.5” SATAII Disks
Up to 287TB* total storage
Hybrid Storage Pool with Read / Write optimized SSD

7410
288 x 3.5” SATAII Disks
Up to 287TB* total storage
Hybrid Storage Pool with Read and Write optimized SSD

7210
48x 3.5” SATAII Disks
Up to 46TB total storage
Hybrid Storage Pool with Write optimized SSD

7110
16 x 2.5” SAS Disks, 2.3TB
Standard Storage Pool (SSD is not used)

*Up to 575TB soon after release
(Chart axes: Price and Capacity / Performance)

Copyright 2009 Peter Baer Galvin - All Rights Reserved 149

Saturday, May 2, 2009


References
You Are Now Free to Move About Solaris

Copyright 2009 Peter Baer Galvin - All Rights Reserved 150

Saturday, May 2, 2009


References
• [Kozierok] TCP/IP Guide, No Starch Press, 2005
• [Nemeth] Nemeth et al., Unix System Administration Handbook, 3rd Edition, Prentice Hall, 2001
• [SunFlash] The SunFlash announcement mailing list, run by John J. Mclaughlin. News and a whole lot more. Mail sunflash-info@sun.com
• Sun online documents at docs.sun.com
• [Kasper] Kasper and McClellan, Automating Solaris Installations, SunSoft Press, 1995

Copyright 2009 Peter Baer Galvin - All Rights Reserved 151

Saturday, May 2, 2009


References (continued)

• [O’Reilly] Networking CD Bookshelf, Version 2.0, O’Reilly, 2002
• [McDougall] Richard McDougall et al., Resource Management, Prentice Hall, 1999 (and other "Blueprint" books)
• [Stern] Stern, Eisler, Labiaga, Managing NFS and NIS, 2nd Edition, O’Reilly and Associates, 2001

Copyright 2009 Peter Baer Galvin - All Rights Reserved 152

Saturday, May 2, 2009


References (continued)
• [Garfinkel and Spafford] Simson Garfinkel and Gene Spafford, Practical Unix & Internet Security, 3rd Edition, O’Reilly & Associates, Inc., 2003 (best overall Unix security book)
• [McDougall, Mauro, Gregg] McDougall, Mauro, and Gregg, Solaris Internals and Solaris Performance and Tools, 2007 (great Solaris internals, DTrace, and mdb books)

Copyright 2009 Peter Baer Galvin - All Rights Reserved 153

Saturday, May 2, 2009


References (continued)
• Subscribe to the Firewalls mailing list by sending "subscribe firewalls <mailing-address>" to Majordomo@GreatCircle.COM
• USENIX membership and conferences. Contact the USENIX office at (714) 588-8649 or office@usenix.org
• Sun Support: Sun’s technical bulletins, plus access to the bug database: sunsolve.sun.com
• Solaris 2 FAQ by Casper Dik: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/Solaris2/FAQ

Copyright 2009 Peter Baer Galvin - All Rights Reserved 154

Saturday, May 2, 2009


References (continued)
• Sun Managers Mailing List FAQ by John DiMarco: ftp://ra.mcs.anl.gov/sun-managers/faq
• Sun's unsupported tool site (IPv6, printing): http://playground.sun.com/
• Sunsolve STBs and Infodocs: http://www.sunsolve.com

Copyright 2009 Peter Baer Galvin - All Rights Reserved 155

Saturday, May 2, 2009


References (continued)
• comp.sys.sun.* FAQ by Rob Montjoy: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/comp-sys-sun-faq
• “Cache File System” White Paper from Sun: http://www.sun.com/sunsoft/Products/Solaris-whitepapers/Solaris-whitepapers.html
• “File System Organization, The Art of Automounting” by Sun: ftp://sunsite.unc.edu/pub/sun-info/white-papers/TheArtofAutomounting-1.4.ps
• Solaris 2 Security FAQ by Peter Baer Galvin: http://www.sunworld.com/common/security-faq.html
• Secure Unix Programming FAQ by Peter Baer Galvin: http://www.sunworld.com/swol-08-1998/swol-08-security.html

Copyright 2009 Peter Baer Galvin - All Rights Reserved 156

Saturday, May 2, 2009


References (continued)
• Firewalls mailing list FAQ: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/firewalls-faq
• There are a few Solaris-helping files available via anonymous ftp at ftp://ftp.cs.toronto.edu/pub/darwin/solaris2
• Peter’s Solaris Corner at SysAdmin Magazine: http://www.samag.com/solaris
• Marcus and Stern, Blueprints for High Availability, Wiley, 2000
• Privilege Bracketing in Solaris 10: http://www.sun.com/blueprints/0406/819-6320.pdf

Copyright 2009 Peter Baer Galvin - All Rights Reserved 157

Saturday, May 2, 2009


References (continued)

• Peter Baer Galvin's Sysadmin Column (and old Pete's Wicked World security columns, etc.): http://www.galvin.info
• My blog at http://pbgalvin.wordpress.com
• Operating Environments: Solaris 8 Operating Environment Installation and Boot Disk Layout, by Richard Elling: http://www.sun.com/blueprints (March 2000)
• Sun’s BigAdmin web site, including Solaris and Solaris x86 tools and information: http://www.sun.com/bigadmin

Copyright 2009 Peter Baer Galvin - All Rights Reserved 158

Saturday, May 2, 2009


References (continued)

• DTrace
http://users.tpg.com.au/adsln4yb/dtrace.html
http://www.solarisinternals.com/si/dtrace/index.php
http://www.sun.com/bigadmin/content/dtrace/

Copyright 2009 Peter Baer Galvin - All Rights Reserved 159

Saturday, May 2, 2009
