
TSC Systems Tech Talk, Feb 2009

Logical Domains Networking : An Introduction to Logical Domain Channels

Lui, Hoe Keong

TSC APAC

Agenda
> Logical Domains : Virtual IO Model
> What's the Logical Domain Channel ?
> LDC : An Overview
> LDC Comms : Packet Based
> LDC Comms : Shared Memory
> LDom Networks & LDC

Sun Confidential: Need to Know Only

LDOMs : Networking

The virtual network (vnet) device implements a virtual Ethernet device and communicates with other vnet devices in the system through the virtual network switch. The virtual network switch (vsw) is a layer-2 network switch that connects the virtual network devices to the external network and also switches packets between them.

LDOM : Virtual IO Model


Virtualized I/O model: devices are shared from an I/O service domain through a Logical Domain Channel (LDC) to a guest domain. The concept of virtual devices is based upon at least one service domain owning a device through the direct I/O model and establishing a path to the other domains via a logical domain channel. The operating system in the guest domains then sees a virtual device driver with which it can interact as if it were a local, physical device.


LDOM : Background
LDoms allows you to allocate a system's various resources, such as memory, CPUs, and devices, into logical groupings and create multiple, discrete systems, each with its own operating system, resources, and identity, within a single computer system. This is done by abstracting the underlying compute and I/O resources. The LDoms virtual I/O (VIO) infrastructure provides device access to domains via virtualized devices that communicate with a 'service' domain, which completely owns a device along with its driver and functions as a proxy to the device. This is implemented via a client-server model in which client virtual devices communicate with their service counterparts via a general-purpose channel infrastructure for inter-domain and domain-Hypervisor communications : the Logical Domain Channels ( LDCs )


LDOM Virtual IO Model : LDC


Virtual device drivers interact with their underlying hardware via the Hypervisor. There are two primary reasons to virtualize a device driver : sharing and security. LDOM virtual I/O functionality includes support for virtual networking, disk and console ( along with their corresponding service backends ). Underlying the infrastructure of service backends and service consumers is a general-purpose channel infrastructure for inter-domain and domain-Hypervisor communications : the Logical Domain Channels ( LDCs ) :
 41  131f648  d558    1   1  ldc (sun4v LDC module v1.9)
165 7bfd0000  30d8  130   1  vldc (sun4v Virtual LDC Driver 1.6)

The vldc driver extends the LDC functionality to user-level clients via standard driver interfaces


Logical Domains : What's LDC ?


When you first set up an LDoms system, you boot into the "factory-default" machine description, which gives all of the hardware resources to the control node. From the control node you then take resources away and set up the virtualization server services so that guests can be created. You update the machine description for the control node, give it a new name, and reboot into it; you are then ready to create guests. The copies of these machine descriptions sit on the System Controller of the machine. Alongside the machine description facility are Logical Domain Channels (LDCs). Each virtualized service between guests, control nodes, service nodes, and the system controller communicates over point-to-point links. Each link is configured within the hypervisor; there is a transmit queue and a receive queue for each end of the channel. Each entry in a queue holds a 64-byte fixed-size LDC packet. You can size your queues however you like, with some minor restrictions.



Logical Domains : What's LDC ?


The LDC link layer defines a handshake, plus reliable, unreliable, and raw modes of operation. The handshake is used to negotiate an LDC protocol version that both sides can understand. The handshake is also used to get the sequence numbers initialized so real work can be done on the link. The raw mode elides the handshake entirely, has no packet headers, and just sends raw 64-byte packets over the link. The hypervisor also provides memory-sharing facilities for the LDC channels. There is a page table where exported pages are defined, and exported memory is expressed to the remote consumer using "cookies", which essentially define which export page-table entry holds the translation, the offset into that page, the page size of the translation, and the size of the area being described. Essentially these cookies are DMA descriptors.


Logical Domains : What's LDC ?


So the safest thing to do, and what every existing use of LDC channels does, is to use the hypervisor copy operation to access imported memory. In this case you only need to handle error return values from the LDC hypervisor call, rather than complicated faults all over the place when revoked memory is accessed. On top of the LDC protocol sits the VIO layer, which has its own handshake mechanism. It handles versioning and sequence-number initialization just like the LDC handshake does, but it also handles the transfer of device-specific attributes such as exported disk size, network device MTU, etc. The VIO handshake also handles the registration of descriptor rings. These rings are how VIO devices set up I/O operations. Each ring entry is composed of a generic VIO tag (containing an entry state value and an ACK field which says whether the receiver should ACK the ring entry after it is processed or defer the ACK until its current run over the ring is complete). After the tag is the device-type-specific area, where virtual disk devices describe the block I/O and virtual network devices can describe the size of the packet, etc. Finally, there is an array of cookie entries to describe the I/O buffer.



LDCs : An Overview
A point-to-point duplex link for:
> domain-to-domain
> hypervisor-to-domain
> and HV/domain-to-SP communication

Two methods for transferring data:
> A simple 64-byte datagram
> Shared memory DMA
> The link layer protocol provides drivers with the ability to choose either mechanism for data transfer

A multi-protocol sun4v transport layer



LDCs : An Overview

The LDom Manager (ldmd) specifies each LDC as a node in the Hypervisor Machine Description (MD). The virtual devices in the LDOM VIO infrastructure see two classes of nodes :
> Devices which do not use LDCs : virtual-devices
> Devices which use LDCs : channel-devices
Virtual-device nodes under channel-devices can point to channel-endpoint node(s), each of which represents an instance of a channel endpoint available to this guest domain

LDC : Packet based Comms


A simple packet-based transfer mechanism where data is sent in 64-byte packets. Each logical domain's LDC transport registers Tx and Rx message queues with the Hypervisor on behalf of the virtual device client(s), along with a target virtual CPU for each LDC endpoint. Each entry on the queue holds 64 bytes of data. A transfer is initiated by copying data into the Tx queue and invoking the Hypervisor API to set the tail of the Tx queue. The Hypervisor triggers a dev_mondo interrupt to the target vCPU for the Rx queue. Data is then transferred from the Tx to the Rx queue at the request of the receiver when it reads the head and tail pointers. The link layer protocol is responsible for fragmenting (and reassembling) messages being transferred; it inserts additional header information into each packet to denote the start/end of the fragmented data transfer. The packet-based transfer approach is recommended for short messages.



LDC : Shared Memory framework


The LDC Shared Memory framework allows one logical domain to export a number of its own memory pages across an LDC for access and use by the logical domain at the other end of the channel. This approach allows clients to share regions of their memory address space with the clients at the other end of the LDC connection. The importing client can access the remote memory region by mapping it into its address space, by using the Hypervisor API to copy data to/from the exported memory, or by programming an IOMMU to directly read/write the memory region. Each logical domain uses export and import map tables ( allocated and defined within its own memory ) that are registered with the hypervisor. A map table entry consists of two 64-bit words which correspond to the location of the shared memory pages



LDOM Networking & LDCs


Virtual network support within LDOM is realized by two components :
> The virtual network device (vnet) emulates an Ethernet device and communicates with other vnet devices or the virtual switch device (vsw) over a point-to-point connection
> All communications between the virtual network components occur via LDC connections. The job of establishing and tearing down the channel connections is facilitated by the virtual channel nexus (cnex)
> The vnet device driver uses a multi-protocol transport infrastructure so that it can use different types of transport to send/receive data
> It is a GLDv3-framework-compliant driver implementation ( see OpenSolaris' Project Nemo ), i.e., it plumbs and configures into the IP stack as a regular driver, supports standard Ethernet / jumbo MTUs, permits creation of logical interfaces on top of it, supports snoop, etc. The only material difference is that changing the MAC addr via the ifconfig cmd is not supported ( by default, the LDom manager assigns a unique MAC addr to each vnet interface )



LDOM Networking & LDCs


Virtual network support within LDOM is realized by two components :
> The virtual switch device (vsw) functions as a mux/demux for ingress/egress packets for all network traffic ( to and from hosts outside the system )
> It may be bound to one physical network interface and/or network group
> The virtual switch device comprises two components : the vnet proxy server ( which functions as a switch that interacts with all vnet devices, the LDOM mgr and the packet multiplexer on top of it ) and the packet multiplexer ( which sends packets it receives from vnet devices to their destination via the physical network i/f and distributes packets it receives from the physical i/f to the appropriate vnet devices )
> The virtual switch device can be plumbed as a network device with IP routing enabled to allow the vsw device to exchange packets with the outside world on behalf of its network clients : i.e.,

# dladm show-link | grep -i vsw
vsw0 type: non-vlan mtu: 1500 device: vsw0
# ifconfig vsw0 <IP-addr> netmask <netmask-addr> broadcast + up



LDOM Networking & LDCs


[Diagram: LDom networking architecture. In the service domain (Logical Domain 0), the Nemo MAC driver (bge / e1000g) for the physical "dumb" NIC sits beneath the Solaris TCP/IP and Nemo generic layer; the LDoms Manager, the vSwitch packet multiplexer and the vnet proxy server run above it. In guest Logical Domain A, the vnet proxy client and the vnet (leaf) device plumb into that domain's Solaris TCP/IP and Nemo stack. The two domains are connected through the Hypervisor, with separate control, data (RX/TX) and error paths.]



LDOM Networking & LDCs


For the network device there is a single TX descriptor ring created at each end; these are populated locally with transmit packets for receipt at the other end. They are imported into the peer using the hypervisor export mechanism. Descriptor ring entries on the importer side are accessed with the aforementioned LDC copy mechanism. I/O is triggered using DRING_DATA packets over the LDC channel, which tell the receiver which entries in the descriptor ring to process. Writes into the local peer's descriptor entries just use local CPU loads and stores, where ordering is important. The DRING_DATA packets give a start and an end descriptor index for the peer to process. The end index can be specified as "-1", which means to just keep processing until you see a descriptor which is not in READY state. It is thus important for the sending peer to update the state field as the last possible operation, with a memory barrier, so that the receiver does not accidentally see a half-initialized descriptor in READY state.



Questions ?

Logical Domains Networking : An Introduction to Logical Domain Channels

Lui, Hoe Keong

APAC TSC
