
Next Generation Systems Architectures

A Sun Perspective
Franz Haberhauer
Technical Director
Chief Technologist Client Solutions Germany
Franz.Haberhauer@Sun.com
Innovation Matters
$1.9B R&D
Sun's Three Franchises
Sun Microsystems
A True Systems Company
Developer

Apps / Webservices
Middleware
Web App Dir Msg Msg ...

Operating System
Server
Network Storage
The New Software Platform
Shrink-wrap stack: App on O/S on Processor
● Application scales from 1 to perhaps 100's of users
● For Desktop: 1 copy SW = 1 copy HW
[Figure: landscape of operating systems and the processors they run on]
Operating systems: MVS, OS400, DGUX, Solaris™, Irix, VMS, Linux, AIX, Ultrix, Aegis, HP-UX, Windows Server
Processors: 370, AS400, 88K, SPARC®, MIPS, VAX, x86, Power, Alpha, 68K, Precision
The New Software Platform
Shrink-wrap stack (App, O/S, Processor) next to networked components (Comp) on middleware (MW)
● Service must scale to 10,000's or millions of simultaneous users!
[Figure: the same operating system and processor landscape: MVS, OS400, DGUX, Solaris™, Irix, VMS, Linux, AIX, Ultrix, Aegis, HP-UX, Windows Server on 370, AS400, 88K, SPARC®, MIPS, VAX, x86, Power, Alpha, 68K, Precision]
The New Software Platform
Components (Comp) on middleware (MW), above the existing O/S and processor landscape
● The Java™ Platform
● XML
● The Windows Platform
[Figure: the same operating system and processor landscape: MVS, OS400, DGUX, Solaris™, Irix, VMS, Linux, AIX, Ultrix, Aegis, HP-UX, Windows Server on 370, AS400, 88K, SPARC®, MIPS, VAX, x86, Power, Alpha, 68K, Precision]
Application Architecture Evolution
Services = Graphs of Interacting Components
● Evolution: Client/Server, Objects, Web Application, Web Service, Next ...
● Protocols along the way: RPC/XDR, CORBA, RMI, COM+, XML
● Cross-cutting: Directories, Security Models, Administration
● Service graphs: Midlets (J2ME), cache/filter, Web (JSP), App (EJB), DB (SQL), Dir (LDAP), MSG, connected via XML, x 10^6
● SLA: Capability, Capacity, Connectivity
● Infrastructure: Computing Pools, Internet/Intranet, Storage Virtualization, Storage Pools
New Challenges
● Virtualization
● Provisioning

● Infrastructure

● Services

● Telemetry

● Observability
Service Point Architecture
SAN SAN NAS NAS

Storage Network
Presentation
App
DB
Internet/
Intranet

Directory Security Policy Management


Vertical Scaling / Scale Up / Data Facing
Multiprocessor
● Cache coherent SMPs
● Large main memory
● One pool of processors
● Tightly coupled
● High bandwidth, low latency
● Flexible, dynamic scheduling by OS
● Scaling within the node
● Highly available nodes
● Powerful I/O subsystems
[Figure: processors, memory and I/O connected through a memory switch]

Horizontal Scaling / Scale Out / Network Facing
Cluster / Blade Server
● Loosely coupled
● Cheap, small nodes
● 1-4 way, few I/O slots
● Scaling by the number of nodes
● Load scheduling by a separate component (HW or SW)
– Load balancer, Oracle RAC, App Server, Grid Job Queuing System
– Usually only when TA/Session/Job is started
● Highly parallel/parallelizable loads (Web, HPTC)
● RAS through replication
● Multiple OS instances, cluster management
[Figure: nodes, each with processor, memory and I/O, connected through a network switch]
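
The load-scheduling bullet for horizontally scaled systems (a separate component places work on a node, usually only when a transaction, session or job starts) can be illustrated with a small Python sketch. The class name SessionBalancer, the node names and the round-robin policy are assumptions made for illustration; the slide does not prescribe a particular algorithm.

```python
from itertools import cycle


class SessionBalancer:
    """Hypothetical dispatcher: picks a node once per session, then sticks to it."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._next_node = cycle(nodes)   # simple round-robin over the node pool
        self._assignments = {}           # session id -> node chosen at session start

    def route(self, session_id):
        # The scheduling decision is made only when a new session starts.
        if session_id not in self._assignments:
            self._assignments[session_id] = next(self._next_node)
        return self._assignments[session_id]


if __name__ == "__main__":
    balancer = SessionBalancer(["blade1", "blade2"])
    for session in ["a", "b", "a", "c"]:
        print(session, "->", balancer.route(session))
```

Adding capacity in this model means adding entries to the node list, which is the "scaling by the number of nodes" property; a session keeps its node for its lifetime, so state does not have to move between loosely coupled nodes.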


Deutsch's 7 Fallacies of Networking
● The Network is reliable
● Latency is zero
● Bandwidth is infinite
● The network is secure
● Topology doesn't change
● There is one administrator
● Transport cost is zero
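
As a hedged illustration of the first two fallacies (the network is reliable, latency is zero), the Python sketch below wraps a remote call in bounded retries with exponential backoff. The function name call_with_retries, its parameters and the choice of exceptions are assumptions for illustration, not part of the original slide.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run operation(); retry on network-style errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as error:
            last_error = error
            # Wait longer after each failure, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Simulated remote call that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("simulated dropped connection")
        return "ok"

    print(call_with_retries(flaky))
```

The sleep function is passed in so the backoff can be replaced in tests; code written for a single SMP box rarely needs this kind of wrapper, which is one practical difference between the two scaling models.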
Blade Shelf
A Horizontally Scaled System
Blade Server Evolution
Platform Wave 1: Evolving thin server design to provide increased density,
improved environmentals and better system management for enhanced
throughput processing

Platform Wave 2: Adopts the latest interconnect technology and optimizes the
blade design to deliver superior transaction processing

Evolution between Waves is achieved through enhanced service management

[Figure: market adoption over time, Wave 1 followed by Wave 2, 2003 to 2005]
A New Meaning of “System”
What we did inside the E10K box… we are doing to the network…
● True scalability: Add performance without adding management complexity!
● “Soft configuration” and “Soft cabling”
● Multiple, secure domains
● But with a big difference: Heterogeneous elements
Network becomes like SMP backplane
Trend: Traditional Tiers are Disaggregated and Recomposed
● Disaggregation of server state and function
– Computation, storage, network, software
● Recomposition into optimized entities
– Appliances, intelligent storage, compute engines, EJB™ components, software services, load-balancers, etc.
● Entities live in the network
– Collectively, they are the (new) computer
– N1 Grid: The Computer that is the Network
Moore's Law
● Transistor density doubles every 18-24 months
● Process nodes: 130 nm (100-400M), 90 nm (200-800M), 65 nm (300M-1B)
[Figure: transistors per chip on a log scale, from 1 transistor (planar, 1959) up to 1T, over 1959-2030; labeled points 1K, 2.25K, 5K, 29K, 275K, 1M, 3.2M, 42M, 1B, 1T; word sizes 4-bit, 8-bit, 16-bit, 32-bit, 64-bit]
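
The doubling rule on this slide can be turned into a small worked calculation. The Python sketch below is only an illustration: the function name projected_transistors is hypothetical, and a constant doubling period is assumed.

```python
def projected_transistors(start_count, years, doubling_months):
    """Transistor count after `years`, doubling every `doubling_months` months."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    doublings = years * 12 / doubling_months
    return start_count * 2 ** doublings


if __name__ == "__main__":
    # Bracket the slide's 18-24 month range, starting from 1 transistor in 1959.
    for months in (18, 24):
        count = projected_transistors(1, 2000 - 1959, months)
        print(f"doubling every {months} months -> {count:,.0f} transistors by 2000")
```

Under these assumptions the two ends of the 18-24 month range differ by about two orders of magnitude after four decades (about 170 million versus about 1.5 million), which shows how sensitive such projections are to the doubling period.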


UltraSPARC IV Die
[Die photo: two cores (Core 0, Core 1), each with instruction cache, data cache, instruction logic, FPU, DTLB, L2 tags and ECC for L2 tags]
Challenges in Processor Design
● Memory
● Complexity
● Power
Memory Bottleneck
● CPU frequency: 2x every 2 years
● DRAM speeds: 2x every 6 years
[Figure: relative performance (log scale, 1 to 10000) over 1980-2005; the gap between CPU frequency and DRAM speed keeps widening]
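
The two growth rates on this chart (CPU frequency 2x every 2 years, DRAM speed 2x every 6 years) imply a gap that itself grows exponentially. The Python sketch below works this out; the function names are hypothetical and both rates are assumed to stay constant from 1980 on.

```python
def relative_speed(years, doubling_period_years):
    """Speed relative to year 0 when speed doubles every doubling_period_years."""
    return 2 ** (years / doubling_period_years)


def cpu_dram_gap(years, cpu_doubling=2, dram_doubling=6):
    """Ratio of CPU speed-up to DRAM speed-up after `years`."""
    return relative_speed(years, cpu_doubling) / relative_speed(years, dram_doubling)


if __name__ == "__main__":
    for year in range(1980, 2006, 5):
        elapsed = year - 1980
        print(year,
              f"CPU x{relative_speed(elapsed, 2):,.0f}",
              f"DRAM x{relative_speed(elapsed, 6):,.1f}",
              f"gap x{cpu_dram_gap(elapsed):,.0f}")
```

With these rates, the 25 years from 1980 to 2005 give roughly 5,800x for the CPU against about 18x for DRAM, a gap of roughly 320x.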
Typical Complex High Frequency Processor
[Figure: one thread over time, alternating compute (C) and memory latency (M) phases, shown twice; the second timeline is slightly shorter ("Time Saved")]
HURRY UP AND WAIT!
Note: Up to 75% Cycles Waiting for Memory
Throughput Computing
● Basic idea
– Maximum application-level work performed
for throughput

● Exploit rich TLP (thread-level parallelism) in modern workloads


– Parallelism
– Pipeline simplicity
– Latency tolerance
Chip Multithreading (CMT)

[Figure: four threads (Thread 1 to Thread 4) over time, each alternating compute (C) and memory latency (M) phases; the threads are staggered so that one computes while the others wait on memory]
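
The effect of interleaving threads can be estimated with an idealized model, sketched below in Python. The assumptions are that every thread repeats a fixed compute phase followed by a fixed memory stall, that switching between threads is free, and that memory bandwidth is never the limit; the function name pipeline_utilization is hypothetical.

```python
def pipeline_utilization(threads, compute_cycles, memory_cycles):
    """Idealized fraction of cycles in which the core does useful work."""
    if threads < 1 or compute_cycles <= 0 or memory_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, memory_cycles >= 0")
    period = compute_cycles + memory_cycles   # one compute + stall round of a thread
    busy = threads * compute_cycles           # compute work available per round
    return min(1.0, busy / period)            # the core cannot be more than 100% busy


if __name__ == "__main__":
    # 1 compute cycle per 3 stall cycles matches "up to 75% cycles waiting for memory".
    for n in (1, 2, 4):
        print(n, "thread(s):", pipeline_utilization(n, 1, 3))
```

In this model a single thread that waits 75% of the time keeps the core 25% busy, and four such threads are enough to fill the pipeline, which is the intuition behind four threads per core.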
CMT – A Simpler Design
● Large caches
● Superscalar design
● Out-of-order execution
● Very high clock rates
● Deep pipelines
● Speculative prefetches
Faster time-to-market
Simple Core: Step and Repeat
Simpler by Design

Limited number of unique transistors


CMT—Multiple Multithreaded Cores
[Figure: eight cores (Core 1 to Core 8), each running four threads (Thread 1 to Thread 4); time axis with compute and memory latency phases interleaved across the threads]
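How Can CMT Deliver?
[Figure: relative performance per core vs. core size as a share of the die; a core using 10% of the die delivers 0.5x, a core using 100% of the die delivers 1x]
How Can CMT Deliver?
● One core using 100% of the die: 100% x 1 = 1x
● Ten cores using 10% of the die each: 50% x 10 = 5x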
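
The arithmetic on this slide can be written as a short Python sketch. It assumes that cores tile the die without overhead and that throughput adds up linearly across cores; shared caches, interconnect and memory bandwidth are ignored. The function name aggregate_throughput is hypothetical.

```python
def aggregate_throughput(core_die_percent, relative_core_performance):
    """Total throughput when the die is filled with identical cores."""
    if not 0 < core_die_percent <= 100:
        raise ValueError("core_die_percent must be in (0, 100]")
    cores = 100 // core_die_percent            # how many cores fit on the die
    return cores * relative_core_performance   # assumes perfectly parallel work


if __name__ == "__main__":
    print("one big core:   ", aggregate_throughput(100, 1.0))
    print("ten small cores:", aggregate_throughput(10, 0.5))
```

The comparison only holds for workloads with enough independent threads to keep all ten cores busy; a single-threaded job would still see the 0.5x per-core performance.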
Throughput Networking:
Changing the Rules Again
● Two tasks on two hardware threads
– Hardware Thread 1: compute
– Hardware Thread 2: network processing
● Increase packet processing by devoting silicon and threads to network tasks
● No missed packets
[Figure legend: Compute, Packet Processing, Context Switching/Interrupts]
Chip Multithreading: Increase Network Efficiency


Niagara
http://blogs.sun.com/roller/page/jonathan/20040910

Silicon for our Project Niagara chip: 8 cores * 4 threads per core = a 32-way computer. On a chip.
That's what we call a system. A system built for internet workloads. Not for the expedience of a press release. ...
(And before you ask, yes, we are planning a nicer box when we ship :)
Throughput Gains: An Order of Magnitude Ahead
SPARC: The Next Generation
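SPARC Processor Roadmap
[Figure: two roadmap tracks, with "Today" marked]
● Data intensive: UltraSPARC III (1050), UltraSPARC IV (1200), UltraSPARC IV+ (1200), SPARC APL, SPARC APL+, Rock; relative throughput labels 1X, 2X, 4X, 8X, 30X
● Network intensive: UltraSPARC IIIi (1000), UltraSPARC IIIi+ (1280), Niagara; relative throughput labels 1Y, 2Y, 15Y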
Joining SPARC Forces
● Sun Fire: Throughput Computing Design Excellence (UltraSPARC IV, UltraSPARC IV+)
● Fujitsu PRIMEPOWER: Mission-critical Computing Heritage (SPARC64 V, SPARC64 V+)
● Thousands of Applications
Advanced Product Line (APL)
● Optimized to address all network computing workloads
● Multiple product families (low, mid, high)
● Systems based on SPARC64 (jointly developed) and Niagara/Rock (Sun developed)
Timeline: 2004, 2005, 2006
Sun x86 System Roadmap
● Sun/AMD Alliance: Nov. 17, 2003
● Sun Fire V20z: Feb. 10, 2004
● Sun Fire V40z: July 30, 2004
● Sun W1100z, W2100z: July 30, 2004
● Nauticus Network Switch: Q3 2004
● 8 Socket Systems: Coming Soon
● Blade Systems: Coming Soon
● Beyond


AMD64 Benefits Besides 64 Bits
● Full 32-bit compatibility
– Both for i386 ABI and Linux
● Benefits: 32-bit Apps on 64-bit OS
– More Memory - Full 4GB virtual address space
– System Call and Library Optimizations
– PROT_EXEC
– Faster Kernel – read/write, mmap, etc.
● Large segmap, SEG_KPM, etc.

– Seamless operation
● Apps ported to 64 bits can be faster
– Twice # of Integer Registers
– Increased # of SSE Registers (128-bit register)
– Improved calling conventions
– PIC code no longer causes speed impact (Position Independent Code)
AMD Opteron & HyperTransport Advantages
● IA32 and IA64
– Shared frontside bus between memory traffic and I/O is performance bottleneck
● AMD64
– Modern, distributed architecture
[Figure labels: CPU, NorthBridge, PCI-X bridge, Southbridge]
Sun's Multi-Platform OS Strategy

Solaris
● SPARC and x86
– maybe Power, Itanium
● 64 and 32-Bit
● Horizontal and vertical scaling
● Sun and 3rd party hw
● Established ISV acceptance
● Runs Linux apps unchanged (x86)
● Defined release cycle
– Binary compatibility “write once – run forever”
● Directed innovation

Linux
● Complements Solaris on the x86 platform
● 32 and 64-Bit
● Horizontal scaling
● Sun and 3rd party hw
– wide variety
● Starting ISV acceptance
– Red Hat
– SuSE
● Focus on open source, less binary compatibility
● Interoperability
– Solaris (UNIX) interoperability
Innovation in Operating Systems
Solaris 10
● Trusted Solaris: Military-Grade Security
● Advanced Tracing: Live Monitoring of Production Systems
● N1 Grid Containers: System Software Partitioning
● Extreme Performance: Fast TCP/IP Stack
● Predictive Self Healing: Reduced System Downtime, Automatic Service Restart
Server Virtualization with N1 Grid Containers
[Figure: Domain 1 hosting five containers on the network: www (192.9.9.1), store (192.9.9.2), B2B (192.9.9.3), appserver (192.9.9.4), oltp (192.9.9.5), on top of disk storage and file systems]
● Resource Management
● Security Isolation
● Fault Isolation
● Isolated Containers: Independent Users, Separate Networks, Independent Storage
N1 Grid Containers
Zones
global zone (serviceprovider.com)
● blue zone (blueslugs.com), zone root: /aux0/blueslugs
– web services (Apache 1.3.22, J2SE)
– enterprise services (Oracle, Application Server)
– core services (ypbind, automountd)
● foo zone (foo.net), zone root: /aux0/foonet
– login services (OpenSSH sshd 3.4)
– network services (BIND 8.3, sendmail)
– core services (ypbind, inetd, rpcbind)
● beck zone (beck.org), zone root: /aux0/beck
– web services (Apache 2.0)
– network services (BIND 9.2, sendmail)
– core services (inetd, ldap_cachemgr)
Application Environment: the services listed per zone
Virtual Platform: per zone zoneadmd, zcons, /usr, virtual network interfaces (hme0:1, hme0:2, ce0:1, ce0:2), /opt/yt
Global zone services
● zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
● core services (inetd, rpcbind, ypbind, automountd, snmpd, dtlogin, sendmail, sshd, ...)
● remote admin/monitoring (SNMP, SunMC, WBEM)
● platform administration (syseventd, devfsadm, ...)
● storage complex, network devices (hme0, ce0)
Virtual Machines versus Containers
● Multiple OS Instances
– Stack: Hardware, Host OS / HyperVisor, one OS kernel per virtual machine
– 5-15%? Overhead
– Examples: IBM LPAR, HP VPAR, EMC VMware
● One OS Instance
– Stack: SPARC or x86, one Solaris 10 OS kernel, containers on top
– <1% Overhead
– 4000 tested on a V880
Server Virtualization
N1 Grid Containers - “N1 Grid In a Box”
● Dynamic Reconfiguration
● Dynamic System Domains
● N1 Grid Containers
[Figure: one server with two Solaris domains (Domain 1, Domain 2) hosting Containers 1 to 5]
N1 Grid – One System to Manage

The Datacenter as a Single System


“n computers operating as 1”
Sun Microsystems
A True Systems Company
Developer
Java Systems

Apps / Webservices
Middleware
Web App Dir Msg Msg ...

Operator
N1 Grid
Operating System
Server
Network Storage
Scaling the N1 Grid
● Scope: N1 Grid Server, N1 Grid System, N1 Grid Distributed Datacenter
● Workload: N1 Grid Engine
● Business Application Services: N1 Grid Service Provisioning System
● System Infrastructure: N1 Grid Provisioning Server for Blades, N1 Grid Containers, N1 Grid Data Platform
● Across all layers: Workflow, Monitoring & Policy Automation
N1 Grid Architectural Concepts
● Service Virtualization
● Separating the service from the platform
● "Movable" service components
● Efficient scaling of the number of instances
● Cloning reference installations
● Service level management
● Performance monitoring and modelling
● Capacity planning
There May Still Be a Lock-in

Apps / Webservices
Administrative Lock-in
Middleware
Web App Dir Msg Msg ...
Binary Lock-in
Operating System
Server
Network Storage
Sun Software Product Stack
SUN JAVA SYSTEMS, Open Source
● Software Infrastructure Services: Java Enterprise System, Java Desktop System, Java Mobility System, Java Card System, Java Studio
● Operating Systems: Solaris™, Linux; N1 Grid
● Hardware: SPARC, x86
Java™ Enterprise System
[Figure: quarterly release timeline (Q1 to Q4, over two years) for the components below]
● Directory
● Identity/Access
● J2EE™ Application
● Web
● Portal
● E-Mail/Messaging
● Calendar Server
● Instant Messaging
● Collaboration
● Availability
● MPEG Streaming
● Grid
● Virtualization
● Solaris
● Dev. Environment
[Figure: Java Enterprise System component overview; overlapping labels from the slide build, regrouped]
● Java ES Portal Server: secure, centralized and personalized access to services and information
– Presentation, Transformation, Desktop, Personalization
– XML/JSP, Portlets, device aware
● Secure Remote Access (SRA) and Mobile Access
– Secure Access (Search, Files, Mail, Applications, VPN...)
● Java ES Access Manager & Directory Server
– User management, Authentication, Authorization, SSO
– Central Directory (LDAP): User Profiles, Passwords, Service Profiles
– PKI
● Java ES Web & Application Server: Web Server, Web Container, EJB Container
● Java ES Communication and Collaboration Services: Mail Server (MTA), Calendar Server, IM Server, Address Book
● Java ES Identity Manager: Sync/Provisioning Engine, Workflow, Rules, Adapters
● Java ES Message Queue: Message Oriented Middleware (JMS, SOAP, XML)
● Java Studio IDE
● Availability & Performance: Cluster, Grid Computing (Grid Engine)
● Integrated services (S): UDDI, CRM, Shop, CMS, SAP
Java™ Enterprise System
Delivery
Pricing
Licensing

$100/Employee/Year
Summary
● Sun's franchises
– Innovation matters and pays
● Choice
– SPARC or x86-based systems
– Solaris or Linux on x86-based systems from Sun
● Sun as a true systems vendor
– Pre-integrated Java Systems
  ● Licensing model fits throughput computing and N1 Grid
– Other components can be integrated
– N1 Grid to manage n systems as 1
Sun
The Network is the Computer
Franz Haberhauer
Franz.Haberhauer@Sun.com