
Solaris 10:

What's New, DTrace and Zones


Simon Ritter
Technology Evangelist
Sun Microsystems
Agenda

• Introduction
• Solaris 10 Feature overview
• Solaris Zones
• DTrace
• Summary and Resources
Solaris Timeline
• 1990: work begins on Solaris 2.0
> Multithreading, scalability, real-time, ...
• 2000: major architectural efforts finish
> 64-bit, IPv6, NFS v3, ...
• Several different engineering teams began to
pursue new, radical ideas
• These ideas have taken 2-3 years to migrate to final
product
• 2004: all available in Solaris 10
TCP/IP Overhaul

• Problem:
> Poor “first byte” TCP/IP latency
> STREAMS are flexible but complex (=slow)
> NCA is fast but not generally used
• Solution:
> Major rewrite of TCP/IP implementation
> Eliminates STREAMS between TCP and IP
> Improved CPU locality
> No API changes
> Result: 20-40% improvement on SPECweb99
Advanced Processor Support
• Problem:
> Emerging multithreaded/multicore processors have
new OS requirements
> Mostly CPU scheduling, synchronization
> Also need to cope with heterogeneous systems
• Solution:
> New scheduler design allows per-CPU optimizations
> Supports a variety of CMT enhancements for SPARC
and x86
> Behavior on existing systems unchanged
Process Rights Management
• Problem:
> Current “all or nothing” privilege model leads to
security problems
> Applications needing only a few privileged operations
run as root (network daemons)
> No way to limit root's privileges
> No way for non-root users to perform privileged
operations
• Solution:
> Fine-grained privileges allow apps and users to run
with just the privileges they need
> Even root can be restricted
Predictive Self-Healing
• Problem:
> Limited resilience to HW faults
> Ad hoc error reporting and handling
> Dependent on human fault diagnosis
• Solution:
> Cohesive structure for fault management
> Consistent standards for error and fault
reporting
> Pluggable diagnosis engines consuming error
event stream
> Tracks dependencies between system
components – needed to limit impact of faults
Predictive Self-Healing (cont'd)
• Problem:
> Ad hoc mechanisms for managing services:
rc scripts, /etc files
• Solution:
> Framework for service management
> Repository for configuration data
> Administrative enable/disable controls
> Fine-grained access control
> Link between applications and fault management
> Automated single-node restart
Solaris Zones
Zones Overview
• Provides virtualized OS services that look like
different Solaris instances
• Can improve system security
• Isolates applications from each other
• Hides details of the underlying platform
• Provides almost arbitrary granularity in isolating
and sharing resources
• Application environment is compatible with existing
programs
When to Deploy Zones
• Hostile and untrustworthy applications
> Example: two web servers, each binding to port 80
> Untrusted software that should be isolated
• Data center consolidation
> Multiple database instances with different admins
• Hosting
> Consolidate many small customers onto a server,
giving some or all of them a root password
• Software development
> A cheap way to simulate a set of production systems,
test software installation, et cetera
What is a Zone
• Virtual Platform
> File systems
> Network interfaces
> Devices
> Resource Management Controls
• Application Environment
> Security
> Processes
> IPC objects
> Identity: Nodename, Timezone, RPC domain, Locale,
et cetera
Zones Block Diagram
• global zone (serviceprovider.com) hosts three non-global zones
• Application environment, per zone:
> blue zone (bslugs.com), zone root: /aux0/bslugs
web services (Apache 1.3.22, J2SE); enterprise services
(Oracle 8i, IAS 6); core services (ypbind, automountd)
> foo zone (foo.net), zone root: /aux0/foonet
login services (OpenSSH sshd 3.4); network services
(BIND 8.3, sendmail); core services (ypbind, inetd, rpcbind)
> beck zone (beck.org), zone root: /aux0/beck
web services (Apache 2.0); network services
(BIND 9.2, sendmail); core services (inetd, ldap_cachemgr)
• Virtual platform, per zone:
> zcons console, /usr, zoneadmd
> Logical network interfaces (ce0:1, ce0:2, ge0:1, ge0:2)
> /opt/yt also appears in the diagram
• Global zone:
> Zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
> Core services (inetd, rpcbind, ypbind, automountd, snmpd,
dtlogin, sendmail, sshd, ...)
> Remote admin/monitoring (SNMP, SunMC, WBEM)
> Platform administration (syseventd, devfsadm, ...)
• Hardware: network devices (ce0, ge0), storage complex
Zone Security Overview

• Each zone has a security boundary around it
• Zones run with reduced privilege
• Important namespaces are isolated
• A compromised zone is not able to escalate its
privileges; thus it cannot compromise the whole
system or another zone
• Processes running in non-global zones (even as
root) are not able to affect activity in other zones
Security: System Calls

• Global zone root user can see and do almost
everything
• Activity is restricted inside a non-global zone at
the system call boundary
> Safe: chmod, chroot, chown and setuid
> Unsafe: memcntl, mknod, stime
> Some calls, such as kill, are limited in scope
Security: Other Restrictions

• Other restricted operations
> Loading and unloading of kernel modules
> Plumbing and modifying network interfaces
> Access to DLPI (for example, snoop)
• See also privileges
• By default, cross-zone communication is via the
network only
Process Model in a Zone
• The process namespace is partitioned
• Processes:
> In the same zone interact as usual
> May not see or interact with processes in other zones
> Running in other zones appear not to exist from
within a zone
> In the global zone are able to see processes in all
zones
• proc(4) only provides information about
processes in the zone
• proc_zone privilege required to control processes in
other zones
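The visibility rule on this slide can be sketched as a small Python toy model. This is an illustration only, not how the kernel implements it; the `Proc` record, the `visible()` helper, and the sample process list are all hypothetical names invented for the sketch.

```python
from dataclasses import dataclass

GLOBAL = "global"  # assumed label for the global zone in this toy model


@dataclass(frozen=True)
class Proc:
    pid: int
    zone: str
    name: str


def visible(procs, observer_zone):
    """Return the processes an observer in observer_zone can see."""
    if observer_zone == GLOBAL:
        # The global zone sees processes in all zones.
        return list(procs)
    # A non-global zone sees only its own processes; the rest appear not to exist.
    return [p for p in procs if p.zone == observer_zone]


procs = [
    Proc(1, GLOBAL, "init"),
    Proc(100, "blue", "httpd"),
    Proc(200, "foo", "sshd"),
]
print([p.name for p in visible(procs, "blue")])
print([p.name for p in visible(procs, GLOBAL)])
```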
File Systems in a Zone

• Virtualized view of the file system namespace
• The zonepath is part of the configuration
• The root of the zone is located at
$zonepath/root
• Restricted access to $zonepath
• Per-zone mount table:
> Mounts from global zone into zone
> From the zone configuration or done manually
> Mounts from within zone limited by what is accessible
> vfstab(4)
Networking Overview

• Single TCP/IP stack for the system as a whole
• Zones are shielded from details like routing and the
network configuration
• Zones are assigned one or more unique
IPv4/IPv6 addresses
• Each zone has its own distinct IP port space
• Zone processes binding to INADDR_ANY only
receive traffic destined for that zone
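One way to picture "each zone has its own distinct IP port space" is a binding table keyed by (zone, port) rather than by port alone. The Python sketch below is a toy model under that assumption; the `PortSpace` class and its methods are hypothetical and do not correspond to any Solaris interface.

```python
class PortSpace:
    """Toy model: bindings are keyed by (zone, port), not by port alone."""

    def __init__(self):
        self._bound = {}

    def bind(self, zone, port, owner):
        key = (zone, port)
        if key in self._bound:
            # Within one zone the usual "address in use" rule still applies.
            raise OSError(f"port {port} already in use in zone {zone!r}")
        self._bound[key] = owner

    def owner(self, zone, port):
        """Return who receives traffic for this zone and port, or None."""
        return self._bound.get((zone, port))


ports = PortSpace()
ports.bind("blue", 80, "Apache 1.3.22")
ports.bind("beck", 80, "Apache 2.0")   # same port, different zone: no conflict
print(ports.owner("blue", 80))
print(ports.owner("beck", 80))
```

This mirrors the earlier deployment example of two web servers each binding to port 80.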
Networking: Security and Isolation

• Zones cannot view another zone's network traffic,
although inter-zone traffic is permitted
• When a zone is booted, a logical
interface is plumbed for each of its
configured addresses and assigned to
that zone
• Except for ICMP, raw IP socket access
is not permitted from within a zone
Minimal Configuration

• Zone name
> Modeled after host names
• Zone path
• IP address
> Not required, but makes a zone more interesting
> IPv4 and IPv6 (manual configuration only)
> Unless explicitly provided as a prefix length, netmask
is looked up in netmasks(4) of the global zone
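As a quick illustration of what a prefix length implies, the Python sketch below derives the netmask from an "address/prefix" string using the standard `ipaddress` module. The helper name and the sample addresses are made up for the example; this only shows the arithmetic, not what zonecfg(1M) does internally.

```python
import ipaddress


def netmask_from_prefix(address_with_prefix: str) -> str:
    """Return the netmask implied by an 'address/prefixlen' string."""
    # ip_interface keeps the host address and derives the mask from the prefix.
    iface = ipaddress.ip_interface(address_with_prefix)
    return str(iface.netmask)


print(netmask_from_prefix("192.168.1.10/24"))  # 255.255.255.0
print(netmask_from_prefix("10.1.2.3/20"))      # 255.255.240.0
```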
Inherited Package Directories

• Four default inherit-pkg-dir resources provided
> /lib, /platform, /sbin, /usr
> Implemented via a read-only loopback file system
mount which provides security as well as storage and
virtual memory efficiencies
• /opt is a good candidate to add to this list, unless it
will be configured differently than in the global zone
DTrace
Introducing DTrace

• Dynamic tracing framework introduced in Solaris 10
• Available on stock systems – typical system has more
than 25,000 probes
• Dynamically interpreted language allows for arbitrary
actions and predicates
• Can instrument at both user-level and kernel-level
• Runs on production systems
• No impact on performance when not enabled
Introducing DTrace (cont'd)

• Powerful data management primitives eliminate the need
for most postprocessing
• Unwanted data is pruned as close to the source as
possible
• Mechanism to trace during boot
• Mechanism to retrieve all data from a kernel crash
dump
• Much more...
The D language

• D is a C-like language specific to DTrace, with some
constructs similar to awk
• Complete access to kernel C types
• Complete access to statics and globals
• Complete support for ANSI-C operators
• Support for strings as first-class citizens
• We'll introduce D features as we need them...
Probes

• A probe is a point of instrumentation
• A probe is made available by a provider
• Each probe identifies the module and function that it
instruments
• Each probe has a name
• These four attributes (provider, module, function, name)
define a tuple that uniquely identifies each probe
• Each probe is assigned an integer identifier
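The four-part tuple is what appears in probe descriptions such as `syscall::open:entry` later in this deck. As a rough Python sketch (a model for illustration, not DTrace's own parser), the description can be split on colons, with an empty field treated as matching anything. `ProbeDesc`, `parse_probe()` and `matches()` are hypothetical names, and only the fully spelled-out four-field form is handled.

```python
from typing import NamedTuple


class ProbeDesc(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_probe(desc: str) -> ProbeDesc:
    """Split 'provider:module:function:name' into its four fields."""
    parts = desc.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected four colon-separated fields: {desc!r}")
    return ProbeDesc(*parts)


def matches(pattern: ProbeDesc, probe: ProbeDesc) -> bool:
    """An empty field in the pattern matches any value in the probe."""
    return all(p == "" or p == v for p, v in zip(pattern, probe))


pattern = parse_probe("syscall:::entry")
print(parse_probe("syscall::open:entry"))
print(matches(pattern, ProbeDesc("syscall", "", "read", "entry")))
```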
Providers

• A provider represents a methodology for
instrumenting the system
• Providers make probes available to the DTrace
framework
• DTrace informs providers when a probe is to be
enabled
• Providers transfer control to DTrace when an
enabled probe is hit
Consumers
• A DTrace consumer is a process that interacts with
DTrace
• No limit on concurrent consumers; DTrace handles the
multiplexing
• Some programs are DTrace consumers only as an
implementation detail
• dtrace(1M) is a DTrace consumer that acts as a
generic front-end to the DTrace facility
Actions
• Actions are taken when a probe fires
• Actions are completely programmable
• Most actions record some specified state in the system
• Some actions change the state of the system
in a well-defined way
> These are called destructive actions
> Disabled by default
• Many actions take as parameters expressions in the D
language
Predicates
• Predicates allow actions to only be taken when certain
conditions are met
• A predicate is a D expression
• Actions will only be taken if the predicate expression
evaluates to true
• A predicate takes the form “/expression/” and is placed
between the probe description and the action
Predicates (cont'd)
• For example, tracing the pid of every process named
“date” that performs an open(2):
#!/usr/sbin/dtrace -s

/* Fire on entry to open(2), but only for processes named "date". */
syscall::open:entry
/execname == "date"/
{
        trace(pid);
}
Actions: More actions
• tracemem() records memory at a specified
location for a specified length
• stack() records the current kernel stack trace
• ustack() records the current user stack trace
• exit() tells the DTrace consumer to exit with the
specified status
Actions: Destructive actions

• Must specify the "-w" option to dtrace(1M)
• stop() stops the current process
• raise() sends a specified signal to the current
process
• breakpoint() triggers a kernel breakpoint
• panic() induces a kernel panic
• chill() spins for a specified number of
nanoseconds
DTrace Variables
• Global
> No need to define
> DTrace infers appropriate type
• Thread local
> Separate storage for each thread
> Referenced by self->
• Clause local
> Storage for the clause
> Referenced by this->
Thread-local D variables
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Record the entry time in this thread's own copy of ts. */
syscall::poll:entry
{
        self->ts = timestamp;
}

/* Report only polls that lasted longer than one second (in ns). */
syscall::poll:return
/self->ts && timestamp - self->ts > 1000000000/
{
        printf("%s polled for %d seconds\n", execname,
            (timestamp - self->ts) / 1000000000);
        self->ts = 0;
}
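To see why the script above needs thread-local storage, the Python sketch below models `self->ts` as a dictionary keyed by thread ID, so interleaved poll calls from different threads do not overwrite each other's start time. It is a model of the script's logic only; `PollTimer`, the thread IDs, and the timestamps are invented for the illustration.

```python
NANOSEC = 1_000_000_000


class PollTimer:
    """Toy model of the poll script: one ts slot per thread."""

    def __init__(self, threshold_ns=NANOSEC):
        self.ts = {}              # plays the role of self->ts
        self.threshold_ns = threshold_ns
        self.reports = []

    def entry(self, tid, now_ns):
        # syscall::poll:entry clause
        self.ts[tid] = now_ns

    def ret(self, tid, now_ns, execname):
        # syscall::poll:return clause, including its predicate
        start = self.ts.get(tid, 0)
        if start and now_ns - start > self.threshold_ns:
            seconds = (now_ns - start) // NANOSEC
            self.reports.append(f"{execname} polled for {seconds} seconds")
            self.ts[tid] = 0


timer = PollTimer()
timer.entry(tid=1, now_ns=1_000)
timer.entry(tid=2, now_ns=500_000_000)            # second thread starts later
timer.ret(tid=2, now_ns=600_000_000, execname="sshd")   # 0.1 s: below threshold
timer.ret(tid=1, now_ns=3_000_001_000, execname="Xorg")  # about 3 s
print(timer.reports)
```

With a single global variable instead of the per-thread dictionary, the second `entry` call would have overwritten the first thread's start time.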
Aggregations
• An aggregation is the result of an aggregating
function keyed by an arbitrary tuple
• For example, to count all system calls on a system
by system call name:
dtrace -n 'syscall:::entry
{
        @syscalls[probefunc] = count();
}'

• By default, aggregation results are printed when
dtrace(1M) exits
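The idea of "an aggregating function keyed by an arbitrary tuple" can be modelled in a few lines of Python with `collections.Counter`. This is a sketch of the concept for `count()` only, not of how DTrace stores aggregations; the event dictionaries and the `aggregate_count()` helper are hypothetical.

```python
from collections import Counter


def aggregate_count(events, key_fields):
    """Count events grouped by the tuple of the named fields."""
    agg = Counter()
    for ev in events:
        agg[tuple(ev[f] for f in key_fields)] += 1
    return agg


# Made-up stream of system call entries.
events = [
    {"execname": "date", "probefunc": "open"},
    {"execname": "date", "probefunc": "read"},
    {"execname": "sshd", "probefunc": "read"},
    {"execname": "date", "probefunc": "read"},
]

# Analogous to @syscalls[probefunc] = count();
by_syscall = aggregate_count(events, ["probefunc"])
# Analogous to keying by a two-element tuple [execname, probefunc].
by_proc_and_syscall = aggregate_count(events, ["execname", "probefunc"])

for key, n in sorted(by_syscall.items(), key=lambda kv: kv[1]):
    print(key, n)
```

Only the running counts are kept, not the individual events, which is the sense in which aggregations reduce the need for postprocessing.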
The DTrace Revolution
• DTrace tightens the diagnosis loop: hypothesis,
instrumentation, data gathering, analysis, hypothesis
• Tightened loop effects a revolution in the way we
diagnose transient failure
• Focus can shift from instrumentation stage to
hypothesis stage:
> Much less labor intensive, less error prone
> Much more brain intensive
> Much more effective! (And a lot more fun)
Summary and Resources
Summary

• Solaris 10 has lots of great features
• Solaris 10 leapfrogs the competition
• Solaris 10 is open source
• The OS might be a commodity, but it still matters
• Zones are cool
Resources

• All things Solaris
www.sun.com/solaris
• OpenSolaris
www.opensolaris.org
Solaris 10:
What's New, DTrace and Zones
Simon Ritter
simon.ritter@sun.com
