VXRAIL CONCEPTS AND ARCHITECTURE

VCE VXRAIL APPLIANCE


Hyper-Converged Infrastructure Appliance
from EMC and VMware

Document H15104
Version 1.0
April, 2016

2016 VCE COMPANY, LLC. ALL RIGHTS RESERVED.

Copyright 2016 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of
its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC², EMC, VCE, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and
other countries. All other trademarks used herein are the property of their respective owners.
For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).


Table of Contents

Preface
AUDIENCE
RELATED RESOURCES AND DOCUMENTATION
CONTRIBUTORS
CONVENTIONS

Introduction
DEPLOYMENT TREND TOWARDS CONVERGED INFRASTRUCTURE
DESIGN TREND TOWARDS SDDCs
HYPER-CONVERGED INFRASTRUCTURE

VCE Converged Infrastructure Platforms Overview
BLOCK ARCHITECTURE
RACK ARCHITECTURE
APPLIANCE ARCHITECTURE
VCE VXRAIL PRODUCT PROFILE

VxRail Hardware Architecture
VXRAIL APPLIANCE CLUSTER
VxRail Node
VxRail Node Storage Disk Drives
VXRAIL MODELS AND SPECIFICATIONS
Scaling

VxRail Software Architecture
APPLIANCE MANAGEMENT
VxRail Manager
VxRail Manager Extension
VMWARE VSPHERE
VMware vSphere vCenter Server
vCenter Server Services and Interfaces
PSC Deployment Options
VMware vSphere ESXi
ESXi Overview
Communication between vCenter Server and ESXi Hosts
Virtual Machines
Virtual Machine Hardware
Virtual Machine Communication
Virtual Networking
Standard Virtual Switch
Virtual Distributed Switch
Migration and vMotion
Enhanced vMotion Compatibility
Storage vMotion
vSphere Distributed Resource Scheduler
vSphere High Availability (HA)
vCenter Server Watchdog
vSphere Fault Tolerance (FT)
VIRTUAL SAN
Disk Groups
Hybrid and All-Flash Differences
Read Cache: Basic Function
Write Cache: Basic Function
Flash Endurance
Virtual SAN's Impact on Flash Endurance
Client Cache
Objects and Components
Witness
Replicas
Storage Policy Based Management (SPBM)
Dynamic Policy Changes
Storage Policy Attributes
I/O Paths and Caching Algorithms
Read Caching
Write Caching
Distributed Caching Considerations
Virtual SAN High Availability and Fault Domains
Limitations of Two- and Three-Node Configurations
Fault Domain Overview
Virtual SAN Stretched Cluster
Site Locality
Networking
Stretched-Cluster Heartbeats and Site Bias
vSphere HA settings for Stretched Cluster
Snapshots
How Snapshots Work
Managing Snapshots
Deduplication and Compression
Advantages of Data-Reduction Technology
In-line Deduplication and Compression per Disk Group
Latency and Resource Consumption
Enabling Deduplication and Compression
Erasure Coding
Enabling Erasure Coding
Requirements
Overhead Issues (RAID-5 and RAID-6)

Integrated Solutions
STORAGE TIERING WITH CLOUDARRAY
INTEGRATED BACKUP AND RECOVERY WITH VSPHERE DATA PROTECTION (VDP)
INTEGRATED REPLICATION WITH RECOVERPOINT FOR VIRTUAL MACHINES

Use Case Examples
USE CASE: CREATE IT CERTAINTY FOR VIRTUAL DESKTOP INFRASTRUCTURE (VDI)
Meeting the Virtualization Challenge for Federal Agencies
USE CASE: SIMPLIFYING THE DISTRIBUTED ENTERPRISE ENVIRONMENT
Meeting the Distributed Enterprise Challenge for State and Local Agencies

Product Information
PRODUCT SUPPORT
EMC PROFESSIONAL SERVICES FOR VXRAIL APPLIANCES
VSPHERE ORDERING INFORMATION
WE'D LIKE TO HEAR FROM YOU!


Preface
This EMC TechBook provides a thorough conceptual and architectural review of the VCE VxRail
Appliance. It reviews current trends in the industry that are driving adoption of converged
infrastructure and highlights the pivotal role of VxRail in today's modern data center.
As part of an effort to improve and enhance the performance and capabilities of its product lines,
EMC periodically releases revisions of its hardware and software. Therefore, some functions
described in this document may not be supported by all versions of the software or hardware
currently in use. For the most up-to-date information on product features, refer to the product
release notes. If a product does not function as described in this document, please contact your
EMC representative.

AUDIENCE
This TechBook is intended for EMC field personnel, partners, and customers involved in designing, acquiring,
managing, or operating a VxRail Appliance solution. This TechBook may also be useful for Systems Administrators
and EMC Solutions Architects.

RELATED RESOURCES AND DOCUMENTATION


Refer to the following items for related, supplemental documentation, technical papers, and websites.
DRS Web Content at https://www.vmware.com/products/vsphere/features/distributed-switch#sthash.WC5hSHzt.dpuf
EMC CloudArray Product Description Guide at https://www.emc.com/collateral/guide/h13456-cloudarray-pdg.pdf
EMC CloudArray Administrator Guide at http://uk.emc.com/collateral/TechnicalDocument/docu60786.pdf
An overview of VMware VSAN Caching Algorithms at https://www.vmware.com/files/pdf/products/vsan/vmwarevirtual-san-caching-whitepaper.pdf
vSphere Resource Management at http://www.vmware.com/support/pubs
Virtual SAN 6.2 Stretched Cluster Guide at http://www.vmware.com/files/pdf/products/vsan/VMware-Virtual-SAN6.2-Stretched-Cluster-Guide.pdf
Virtual SAN Sparse Tech Note for Virtual SAN 6.0 Snapshots at https://www.vmware.com/files/pdf/products/
SAN
vSphere Virtual Machine Administration Guide at https://www.vmware.com/support/pubs/vsphere-esxi-vcenterserver-6-pubs.html
Blogs, web pages, publications, and multimedia content from http://www.hyperconverged.org/


CONTRIBUTORS
Along with other EMC and VMware engineers, field personnel, and partners, the following individuals have been
contributors to this TechBook:
Flavio Fomin
Bill Leslie
Arron Lock
Joe Vukson
Sam Huang
Aleksey Lib
Violin Zhang
Colin Gallagher
Megan McMichael
Hanoch Eiron
Gail Riley
Jim Wentworth

CONVENTIONS
EMC uses the following type style conventions in this document.
Normal: Used in running (nonprocedural) text for:
Names of interface elements, such as names of windows, dialog boxes, buttons, fields, and menus
Names of resources, attributes, pools, Boolean expressions, DQL statements, keywords, clauses, environment
variables, functions, and utilities
URLs, pathnames, filenames, directory names, computer names, links, groups, file systems, and notifications
Bold: Used in running (nonprocedural) text for names of commands, daemons, options, programs, processes,
services, applications, utilities, kernels, notifications, system calls, and man pages.
Italic: Used in all text (including procedures) for:
Full titles of publications referenced in text
Emphasis, for example, a new term
Policies and variables
Courier: Used for:
System output, such as an error message or script
URLs, complete paths, filenames, prompts, and syntax when shown outside of running text


Introduction
The IT infrastructure market is undergoing unprecedented transformation. The most significant transformation is
reflected by two major trends: a deployment trend toward converged infrastructure and a design trend toward
software-defined data centers (SDDCs). Both are responses to the IT realities of infrastructure clutter, complexity,
and high cost; they represent attempts to simplify IT and reduce the overall cost of infrastructure ownership.
Today's infrastructure environments typically comprise multiple hardware and software products from
multiple vendors, with each product offering a different management interface and requiring different training. Each
product in this type of legacy stack is likely to be grossly overprovisioned, using its own resources (CPU, memory,
and storage) to address the intermittent peak workloads of resident applications. The value of a single shared
resource pool, offered by server virtualization, is still generally limited to the server layer. All other products are
islands of overprovisioned resources that are not shared. Therefore, low utilization of the overall stack results in the
ripple effects of high acquisition, space, and power costs. Too many resources can be wasted in legacy
environments.
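
The utilization argument above can be made concrete with a small sketch. The demand figures and product names below are hypothetical and chosen only for illustration; they do not come from this TechBook. The sketch compares sizing each product for its own peak (the legacy "island" approach) with sizing one shared pool for the peak of the combined demand.

```python
# Hypothetical hourly demand (in CPU cores) for three products in a legacy stack.
# These figures are illustrative assumptions, not measurements.
demand = {
    "web":       [4, 6, 20, 8],
    "database":  [16, 5, 6, 7],
    "analytics": [3, 4, 5, 18],
}


def siloed_capacity(demand):
    """Each product is sized for its own peak, so the individual peaks add up."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand):
    """A shared pool is sized for the peak of the combined demand."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


def average_utilization(demand, capacity):
    """Mean combined demand divided by the provisioned capacity."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return sum(combined) / len(combined) / capacity


siloed = siloed_capacity(demand)
pooled = pooled_capacity(demand)
print(f"siloed: {siloed} cores, utilization {average_utilization(demand, siloed):.0%}")
print(f"pooled: {pooled} cores, utilization {average_utilization(demand, pooled):.0%}")
```

With these assumed figures, the siloed stack needs 54 cores and runs at roughly 47% average utilization, while a shared pool needs 33 cores and runs at roughly 77%. The gap exists because the individual peaks do not occur at the same time.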

DEPLOYMENT TREND TOWARDS CONVERGED INFRASTRUCTURE (CI)
Industry infrastructure deployment has shifted from a "build" to a "buy" approach. This shift is being driven by the
need for IT to focus limited economic resources on driving business innovation. While a build-your-own strategy can
achieve a productive IT infrastructure, these deployments can be difficult and lengthy to implement, are vulnerable
to higher operating costs, and are susceptible to greater risk related to component integration, configuration,
qualification, compliance, and management. Converged infrastructure (CI) packages compute, storage, and
networking components into a single optimized IT solution. CI is a simple, fast, and effective alternative to build-your-own and has been widely adopted.

CI typically brings together blade servers, enterprise storage arrays, storage area networks, IP networking,
virtualization, and management software into a single product. CI means that multiple pre-engineered and pre-integrated components operate under a single controlled converged architecture with a single point of management
and a single source for end-to-end support. CI provides a localized single resource pool that enables a higher
overall resource utilization than a legacy island-based infrastructure. Overall acquisition cost is lower and
management is simplified. In the data center, CI typically has a smaller footprint with less cabling and can be
deployed much faster than traditional infrastructure.

DESIGN TREND TOWARDS SOFTWARE-DEFINED DATA CENTERS (SDDCs)
Traditional data centers are hardware-centric. Emerging data centers are software-centric. While the concept is still
evolving, a software-defined data center (SDDC) is a software-centric architectural approach based on virtualization
and automation. To logically define all infrastructure services, the SDDC applies the widely successful principles of
server virtualization (abstraction, isolation, and pooling) to the remaining network and storage infrastructure
services. SDDC management is automated through policy-based software that controls both on-premises and off-premises resources. With SDDC, traditional enterprise applications can be supported in a more flexible and
cost-effective manner. SDDC represents the epitome of the agile digital business model, where pooled resources adapt
and respond to shifting application requirements.

Figure 1: SDDC
Virtualized servers are probably the most well-known software-defined IT entity, where hypervisors running on a
cluster of hosts allocate hardware resources to virtual machines (VMs). In turn, VMs can function with a degree of
autonomy from the underlying physical hardware. Software-defined storage (SDS) and software-defined
networking (SDN) are based on a similar premise: physical resources are aggregated and dynamically allocated
based on predefined policies with software abstracting control from the underlying hardware. The result is the
logical pooling of compute, storage, and networking resources. Physical servers function as a pool of CPU resources
hosting VMs, while network bandwidth is aggregated into logical resources, and pooled storage capacity is allocated
by specified service levels for performance and durability.
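
As a minimal sketch of the premise described above, the following example aggregates the capacity of several hosts into one logical pool and allocates from it according to a policy rather than a specific device. The class names, policy fields, host names, and sizes are all assumptions made for illustration; they are not VMware or VxRail APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical service-level policy; the fields are assumptions for this sketch."""
    name: str
    capacity_gb: int
    copies: int  # number of copies kept for durability


@dataclass
class StoragePool:
    """Capacity aggregated from the local disks of every host in the cluster."""
    host_capacity_gb: dict
    allocations: dict = field(default_factory=dict)

    @property
    def total_gb(self):
        return sum(self.host_capacity_gb.values())

    @property
    def free_gb(self):
        return self.total_gb - sum(self.allocations.values())

    def allocate(self, vm_name, policy):
        """Reserve raw capacity for a VM according to its policy, not a specific disk."""
        raw_needed = policy.capacity_gb * policy.copies
        if raw_needed > self.free_gb:
            raise ValueError(f"pool cannot satisfy policy {policy.name!r} for {vm_name}")
        self.allocations[vm_name] = raw_needed
        return raw_needed


pool = StoragePool({"host-1": 2000, "host-2": 2000, "host-3": 2000})
gold = Policy("gold", capacity_gb=500, copies=2)
bronze = Policy("bronze", capacity_gb=500, copies=1)

pool.allocate("vm-db", gold)      # consumes 1000 GB of raw capacity
pool.allocate("vm-test", bronze)  # consumes 500 GB of raw capacity
print(pool.free_gb)               # 4500
```

The consumer states a service level (capacity and number of copies) and the pool decides whether it can be met. This is the sense in which software abstracts control from the underlying hardware.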
Once the data center has abstracted resources, SDDC services make the data center remarkably adaptable and
responsive to business demands. In addition to virtualized infrastructure, the SDDC includes automation, policy-based management, and hybrid cloud services. The policy-based model insulates users from the underlying
commodity technology, and policies balance and coordinate resource delivery. Resources are allocated where
needed, absorbing utilization spikes while maintaining consistent and predictable performance. Conceptually, SDDC
encompasses more than the IT infrastructure itself; it also represents an essential departure from traditional
methods of delivering and consuming IT resources. Infrastructure, platforms, and software have become services,
and SDDC is the fundamental mechanism that underpins the most sophisticated cloud services. The most effective
SDDC deployments are based on technology that provides simple implementation, administration, and
management. This requires an infrastructure solution with an extremely high level of efficiency and serviceability,
such as hyper-converged infrastructure.

HYPER-CONVERGED INFRASTRUCTURE
Hyper-converged infrastructure (HCI) is the next level of converged infrastructure. HCI is a new type of CI with a
software-centric architecture based on smaller, industry-standard building-block servers that can be scaled. HCI
has a software-defined architecture with everything virtualized. Compute, storage, and networking functions are
decoupled from the underlying infrastructure and run on a common set of physical resources that are based on
industry-standard components. Hyper-converged systems do not include separate enterprise storage arrays.
Instead, they adopt industry-standard server platforms with local direct-attached storage (DAS), which is
virtualized using software-defined storage technology. (See Figure 2 below.) By integrating these technologies,
HCI systems are managed as a single system through a common toolset.
The ideal HCI solution integrates these building-block servers with familiar, simple management software for
reliability and serviceability. This enables efficient and safe use of commercial off-the-shelf (COTS) hardware.
Simple management software allows a common operational model, which drives efficiency and enables workload
mobility. Other benefits of HCI include a lower total cost of operation as well as flexible scalability: nodes, which
provide both CPU and storage, can easily be added to meet business demands. Unlike CI, the technologies in HCI
are so integrated that they cannot be broken down into separate components for independent use. HCI offers a
seamless framework of integrated, virtualized, scalable nodes with built-in management.

Figure 2: CI and HCI


HCI carries forward the benefits of CI, including a single shared resource pool, and takes them even further. By
reinventing the underlying data architecture, HCI includes full data services. Complete integration and innovation at
the software layer allow for radically simple end-to-end data management. New infrastructure, which could take up
to a week to deploy in the build-your-own model, can be up and running in under 30 minutes because HCI
offers such high levels of task automation. Ideally, HCI is fully integrated, preconfigured, and tested. This provides
a simple, cost-effective, non-disruptive, scalable solution with centralized management functionality, rich data
services, and a single source of support.
HCI enables faster, better, and simpler management of consolidated workloads, virtual desktops, business-critical
applications, and remote office infrastructure.
HCI solutions have distinct features including scalability, simplicity, and data services.
Scalability. Hyper-converged infrastructures are designed to scale out by adding nodes, which provides a
predictable pay-as-you-grow approach. Adding nodes, rather than separately adding CPUs or storage capacity,
provides linear performance and an elastic infrastructure. Dynamic pooled resources are allocated according to
fluctuating workload requirements. This absorbs application workload spikes and maintains performance
consistency. Mid-sized IT departments or remote enterprise-edge locations, like branch offices, can implement an
inexpensive, entry-level HCI solution, starting small and then easily and non-disruptively scaling both capacity and
performance. HCI integration with public-cloud offerings can also seamlessly and securely expand capacity on
demand and without limits to provide a hybrid-cloud solution.
Simplicity. Hyper-convergence changes the game in terms of management and serviceability. Seamless
integration among HCI elements unifies operations, using familiar consistent interfaces, and simplifies
management. In addition, HCI facilitates simple workload mobility within the entire SDDC. The HCI management
software stack includes applications for monitoring, logging, security and access control, compliance and upgrades,
in addition to configuration utilities for virtual machines, network, and data services. The building-block design
provides a superior implementation model in which all the components have been fully integrated, preconfigured,
and tested, making the system simple to set up, expand, and maintain.
Data Services. HCI provides the same level of mission-critical data services provided by traditional high-end
enterprise storage arrays. Enterprise IT applications are designed with the expectation that the IT infrastructure is
equipped for consistent performance, high availability, and disaster recovery. HCI meets these expectations with
rich data services such as deduplication, compression, replication, and backup and recovery. HCI brings
consumption-based infrastructure economics and flexibility to enterprise IT without compromising on performance,
reliability, or availability.
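
The scale-out behavior described under Scalability above can be sketched as follows. The node size is a hypothetical value chosen for illustration and is not a VxRail specification. The point is only that, because each node contributes both CPU and storage, aggregate resources grow in proportion to the node count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical building-block node; sizes are illustrative, not VxRail specifications."""
    cpu_cores: int
    storage_tb: float


def cluster_resources(nodes):
    """Aggregate CPU and storage across all nodes in the cluster."""
    return (sum(n.cpu_cores for n in nodes), sum(n.storage_tb for n in nodes))


node = Node(cpu_cores=24, storage_tb=10.0)
for count in (4, 8, 16):
    cores, tb = cluster_resources([node] * count)
    print(f"{count:>2} nodes: {cores} cores, {tb:.0f} TB")
```

Doubling the node count doubles both totals in this model. Real clusters also incur overhead for replication and management, which this sketch deliberately ignores.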
So when should CI be implemented and when is HCI a better option? The answer depends on the scale and scope
of the infrastructure and the workloads. If the purpose is to support a large number of dense workloads and a
multi-petabyte capacity, then CI is a better option. But for a smaller set of workloads (including the most
demanding loads like databases and OLTP, but at a smaller scale), HCI is an excellent option. It is also the
appropriate choice for specific departments or remote offices. In short, HCI is ideal for applications that need agility
and need to scale quickly at the lowest cost per unit. HCI is easy to deploy with little expertise. HCI doesn't replace
CI, but it allows IT to better tier infrastructure for varied application needs. Most IT operations can benefit from a
combination of CI and HCI that can flex to meet the evolving demands of their business.
In summary, IT organizations are rapidly evolving into cloud-centric business models where agility, scalability,
security, resource optimization, and SLAs are paramount. The SDDC architecture makes the hybrid cloud possible
by defining a platform common to both private and public clouds. Enterprises have three ways to establish an
SDDC: 1) build their own; 2) use a converged infrastructure; or 3) use a hyper-converged infrastructure. With
seamless integration of the technology stack, both CI and HCI create platforms that allow IT organizations to
efficiently and effectively transition to a modern SDDC. HCI is the easiest and
fastest way to stand up a fully virtualized SDDC environment so IT organizations can
focus on innovation and adding business value.


Figure 3: One Destination, Multiple Deployment Approaches

VCE Converged Infrastructure Platforms Overview
This section reviews the VCE CI and HCI platform architectures and product portfolios and then specifically focuses
on the VCE VxRail Appliance. Included is an introduction to the VxRail architecture and components with a specific
emphasis on the key integrated VMware software technologies that provide VxRail core services and functionality.
VCE, the Converged Platforms Division of EMC, specializes in industry-leading Converged and Hyper-Converged
Infrastructure platforms that simply and quickly transition data centers to a modern SDDC, enabling business
transformation. Simplicity is the core driver behind the VCE portfolio of CI platforms. The VCE mission is to break
down the silos of static infrastructure in the data center and make available flexible, shared pools of resources.
With the VCE portfolio, IT leaders have the flexibility to shift resources from maintaining infrastructure to delivering
new, innovative business services while remaining cost-effective. The VCE portfolio can quickly and reliably
modernize the data center to meet the evolving and dynamic demands of today's tech-savvy business workforce.
VCE pioneered converged infrastructure with the introduction of Vblock systems, which bring together VMware
virtualization, Cisco networking and compute, and EMC storage. The VCE portfolio expanded quickly, offering
increased choice, flexibility, and targeted application-workload solutions as new workload platforms emerged in the
industry. Applications are now typically identified by industry-defined workload platforms: Platform 1.0, which refers
to mainframe-application workloads; Platform 2.0, which refers to client-server and virtualized x86 traditional-application workloads; and Platform 3.0, which refers to Big Data applications with new workloads built for cloud,
social, and mobile.


Figure 4: Industry-Defined Workload Platforms

The full VCE portfolio features pre-integrated, preconfigured components, tested, validated and qualified with a
single source of support. The VCE portfolio is built on the widely adopted, industry-leading VMware technology for
core functionality and management operations. The VCE portfolio features three distinct system-level architectures,
reflected in the graphic below. The architectures are Blocks, Racks, and Appliances, and their corresponding design points
are proven, flexible, and simple, respectively. Each architecture has its own distinct role in an SDDC and hybrid-cloud solution
based on application workload and business requirements.

Figure 5: VCE Portfolio

BLOCK ARCHITECTURE
In the Block architecture, VCE offers two product families, Vblock and VxBlock. These systems bring together
VMware virtualization, Cisco networking and compute, and varied EMC storage arrays. The Block system
architecture typically implements Cisco UCS server blades configured as ESXi hosts for compute layer services. The
VxBlock adds two fully integrated options for software-defined networking (SDN) and network-layer abstraction:
VMware NSX technology or Cisco's Application Centric Infrastructure (ACI). Within the Vblock product family,
specific models correspond to specific data-center purposes, but they all focus on traditional, mission-critical
enterprise workloads.
The Block architecture design center is proven. Vblock and VxBlock are proven and widely deployed. In fact,
they have become an industry-standard CI system, with the terms "Vblock" and "converged infrastructure"
often used interchangeably.
The Block system-level architecture has disaggregated compute, memory, network, and storage which allows
for variation at all layers. Vblock and VxBlock also have the traditional elements required to deliver legacy
persistence and networking capabilities. This Block system-level architecture has step-function scaling.
The Block architecture workload and business requirements focus on rich infrastructure services to support
Platform 2.0 applications. Vblock and VxBlock both support any open-system workload in the data center and
have a broad set of traditional data services to meet enterprise business requirements.

RACK ARCHITECTURE
The VxRack expands VCE's industry-leading CI portfolio to include hyper-converged infrastructure. The VxRack
architecture scales linearly with hyper-converged node servers that consolidate compute and storage layers. It
incorporates a leaf-spine network architecture specifically designed to accommodate extensive, scale-out workloads
and over a thousand nodes. VCE refers to the VxRack platform as "hyper-convergence at rack scale." It represents a
full system deployment that includes integrated storage-attached servers and network hardware. The VxRack
implements VMware EVO SDDC to facilitate ESXi server-based software-defined storage and to deploy a virtualized
NSX network layer over the physical network fabric for SDN. VxRack provides performance, reliability, and
operational simplicity at large scale.
The Rack architecture design center is flexible. VCE VxRack is an example of the flexible design center. It's an
adaptable platform in terms of its hardware and persona. (Persona flexibility refers to VxRack's ability to run
multiple hypervisors, ESXi or KVM, as well as support bare-metal deployments.)
Rack systems are engineered systems with network design as the key differentiator. At scale, leaf-and-spine
and top-of-rack (ToR) cabling architectures are critical. Rack architecture incorporates the leaf-and-spine
network and ToR cabling architectures that enable scaling to hundreds and thousands of nodes, deployed not
in small clusters but as a massive, rack-scale, web-scale, and hyper-scale system. VxRack incorporates the
network fabric as a core part of the system design and management stack. The network is not just bundled
but rather is an integral part of the system with single support and warranty plus management integration.
Rack system-level architecture uses software-defined storage (SDS) and commodity-off-the-shelf (COTS)
hardware. This rack system-level architecture has linear-function scaling.
Rack-architecture workload and business requirements focus on flexibility for different workload types
(Platform 2.0, Platform 3.0, kernel-mode VMs, Linux containers), and the systems come in multiple personas. (VxRack
supports OpenStack and VMware hypervisors initially and will support others in the future.)

APPLIANCE ARCHITECTURE
The hyper-converged VxRail appliance features a clustered node architecture that consolidates compute, storage,
and management into a single, resilient, network-ready HCI unit. The software-defined architectural structure
converges server and storage resources, allowing a scale-out, building-block approach, and each appliance carries
management as an integral component. From a hardware perspective, the VxRail node is a server equipped with
integrated direct-attached storage. No network components are included with the appliance; VxRail leaves that up
to the customer (although VCE can bundle switch hardware and NSX can function as an integrated option for SDN).
Typically, organizations with a small IT staff can benefit from the simplicity of the appliance architecture to expedite
application deployment and take advantage of the same data services available from high-end systems.
The Appliance architecture design center is simple. VxRail is simple to acquire, deploy, operate, scale, and maintain.

The Appliance system-level architecture uses SDS and multi-node servers with integrated storage and can leverage
whatever network infrastructure is available. Appliance architecture provides low-cost and low-capacity entry points
with simple configurations that can easily scale.
Appliance-architecture workload and business requirements focus on simplicity and the ability to start small and grow
easily. VDI and productivity applications are examples of the initial workloads deployed in appliances.

Figure 6: VCE Blocks, Racks, and Appliances


All three VCE converged infrastructure architecture models can be deployed in the same data center or, as shown
in Figure 7 below, can be part of a Federated Enterprise Hybrid Cloud (FEHC) that allows integration of
the entire suite of data center solutions (including those in remote, branch and edge locations) and provisioning of
the resources in local or remote sites using a common service catalog.

Figure 7: VCE Converged Infrastructure in the Enterprise Data Center

VCE VXRAIL PRODUCT PROFILE


VxRail was jointly developed by EMC and VMware and is the only fully integrated, preconfigured, and tested HCI
appliance powered by VMware Hyper-Converged Software. Managed through the ubiquitous VMware vCenter Server
interface, VxRail provides a familiar VMware experience that enables streamlined deployment and the ability to
extend existing IT tools and processes. The VxRail Appliance is fully loaded with integrated, mission-critical data
services from EMC and VMware including compression, deduplication, replication, and backup. The VxRail Appliance
delivers resiliency and centralized-management functionality enabling faster, better, and simpler management of
consolidated workloads, virtual desktops, business-critical applications, and remote-office infrastructure. As the
exclusive hyper-converged infrastructure appliance from VCE and VMware, VxRail is the easiest and fastest way to
stand up a fully virtualized SDDC environment.
VxRail provides an entry point to the SDDC and caters to small- and medium-sized environments, remote and
branch offices (ROBO), edge departments, and projects within larger organizations. Small-shop IT personnel can
benefit from the simplicity of the appliance model to expedite the application-deployment process while still taking
advantage of data services typically available only in high-end systems. VxRail allows businesses to start small,
with a single appliance, and scale non-disruptively. VxRail is highly configurable. Storage can be configured for either
all-flash or hybrid applications. In addition, appliances are available in nine different models, each with a different
configuration, scale points, and options for processors, storage, and cache capacity. Finally, because the VxRail is
jointly engineered, integrated, and tested, organizations can leverage a single source of support and remote
services from EMC.
Each VxRail appliance holds four server nodes with direct-attached storage drives. VxRail appliances are delivered
ready to deploy and ready to attach to a 10GbE customer-provided network. At the software layer, VxRail uses
VMware technology for server virtualization, network virtualization, and software-defined storage. VxRail servers
are configured as ESXi hosts, and VMs depend on the virtual switch for logical networking. VMware Virtual SAN
technology embeds storage pooling capabilities at the ESXi-kernel level, a highly efficient design which dramatically
reduces the complexities involved in infrastructure management. The policy-based software in the management
layer controls storage distribution based on application service settings.
The VxRail management platform is a strategic advantage for VxRail: a remedy for the HCI system's inherent
operational complexity. VxRail bundles management software as a centralized stack, and the VxRail Manager and
Manager Extension each have a simple dashboard interface to automate and accelerate deployment and to perform
management tasks like upgrades. Since VxRail nodes function as ESXi hosts, the appliance taps vCenter Server for
VM-related management, automation, monitoring, and security. Furthermore, VxRail supports the wider-ranging
VMware ecosystem for high availability, cloud management, and end-user computing services. vSphere is a
well-established virtualization platform, a familiar, usable entity in most data centers. The VxRail product relies on a
tailor-made management stack rather than the Advanced Management Pod model used by Vblock and VxBlock.
However, all three VCE product platforms leverage vCenter Server and offer support for optional VMware and EMC
services.
Software-defined functionality provided by VxRail introduces significant advancements in IT services. The appliance
is built around VMware Hyper-Converged Software (HCS), an operational software stack that includes vSphere
functionality for ESXi-based virtualization and VM networking as well as Virtual SAN for SDS. NSX for SDN can also
be easily integrated into the solution as an option. A VxRail implementation integrates smoothly into
VMware-centric data centers and, as a VCE product, it operates in concert with the Block and Rack level deployments. This
allows all data-center assets to be maintained using a single administrative platform, which means monitoring,
upgrading, and diagnostics activities are performed efficiently and reliably. Blocks, Racks, and Appliances use the
same migration technologies from VMware for moving VMs and data, thus providing advantages in workload
mobility. Finally, VxRail supports existing tools and optional services with seamless integration. The VxRail
Extension provides additional EMC services, including RecoverPoint replication, Data Domain for backup, EMC
Secure Remote Services (ESRS), and cloud tiering services. VxRail also has optional support for VCE Vision
Intelligent Operations software, allowing IT shops to leverage integration with VxRack and Vblock, enabling them to
deliver a full enterprise solution for all workloads and to replicate and protect from the enterprise edge to the data
center.


VxRail Hardware Architecture


The VxRail Appliance family is a proven building block of the Software-Defined Data Center and delivers up to five times
the performance of other hyper-converged appliances. The appliance-based design allows IT centers to scale capacity and
performance non-disruptively, so they can start small and grow incrementally with minimal up-front planning. VxRail
configurations can start with as few as 200 virtual machines (VMs) and scale to thousands. The VxRail architecture
enables a predictable pay-as-you-grow approach that aligns to changing business goals and user demand.
The VxRail is built using a distributed system architecture consisting of modular blocks (2U appliances with four nodes each)
that scale linearly from 1 to 16 appliances, for a maximum of 64 nodes in a cluster. In addition, different options are
available for compute, memory, and storage configurations to match any use case. Choose from a range of next-gen Intel
processors, variable RAM, storage, and cache capacity for flexible CPU-to-RAM-to-storage ratios. Single-node scaling and
a low-cost entry point let customers procure just the right amount of storage and compute for today's requirements and
tomorrow's growth. Additionally, all-flash models deliver the industry's most powerful HCI to maximize performance and
scale for applications that demand low latency. Figure 8 below shows the basic VxRail Appliance building block: a
four-node appliance with storage in front and compute in the back.

Figure 8: VxRail Appliance

VXRAIL APPLIANCE CLUSTER


Again, each VxRail appliance consists of four nodes. Each node includes a server and six storage disk drives, either
all-flash SSDs or a hybrid mix of flash SSDs and HDDs. The nodes form a networked cluster that can be expanded by adding
more appliances (containing more nodes).

VxRail Node
The VxRail Appliance is assembled with proven server-node hardware that has been integrated, tested, and validated as a
complete solution by EMC. The current generation of VxRail nodes uses Haswell-based Intel Xeon E5-2600 processors. The
Intel Xeon E5 processor family is a multi-threaded, multi-core CPU designed to handle diverse workloads for cloud
services, high-performance computing, and networking. The number of cores and memory capacity differ for each VxRail
Appliance model. Figure 9 below shows a physical view of a node server with its processors, memory and supporting
components.


Figure 9: VxRail Physical Node Server


Each node server includes the following technology:
1 or 2 Intel Xeon E5-2600 V3 processors with 6, 8, or 10 cores per processor
16 DDR4 DIMMs, providing memory capacity from 64GB to 512GB per node
A PCIe SAS controller supporting 6Gb/s SAS speeds
A 64GB SATADOM sub-module
Dual-port network adapters
An integrated graphics BMC port, 2 USB ports, 1 Serial port, 1 VGA port
Figure 10 shows the single node from the back.


Figure 10: VxRail Node Server: Back View


VxRail Node Storage Disk Drives


Storage capacity for the VxRail Appliance is provided by disk drives that have been integrated, tested, and validated by
EMC. 2.5-inch form-factor solid-state drives (SSDs) and mechanical hard disk drives (HDDs) are managed in logical groups.
Each group has up to six disk drives and each node has one disk group. Disk groups are configured in two ways:
Hybrid configurations, which contain a single SSD flash-based disk for caching (the caching tier) and multiple HDD
disks for capacity (the capacity tier)
All-flash configurations, which contain all SSD flash-based disk drives
The flash drives used for caching and capacity have different endurance levels. Endurance level refers to the number of
times that an entire flash disk can be written every day for a five-year period before it has to be replaced. A
higher-endurance SSD is used for caching than for capacity. Currently, the caching tier uses 200GB, 400GB, and 800GB flash
disks, and the capacity tier uses either 3.84TB flash SSDs, 1.2TB HDDs, or 2TB HDDs. All VxRail disk configurations use a
carefully designed cache-to-capacity ratio to ensure consistent performance.
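
The endurance definition above reduces to simple arithmetic: capacity multiplied by full-drive writes per day, multiplied by the number of days in the rated period. The following is a minimal sketch of that calculation. The function name and the drive-writes-per-day (DWPD) ratings are illustrative assumptions, not published VxRail specifications.

```python
def lifetime_writes_tb(capacity_gb, drive_writes_per_day, years=5):
    """Total data, in TB, that can be written over the rated period.

    Endurance is modeled as full-drive writes per day sustained for
    `years` years, following the definition in the text. Decimal units
    (1 TB = 1000 GB) and 365-day years are assumed.
    """
    days = 365 * years
    return capacity_gb * drive_writes_per_day * days / 1000


# Hypothetical DWPD ratings, chosen only to illustrate the arithmetic.
cache_tb = lifetime_writes_tb(capacity_gb=400, drive_writes_per_day=10)
capacity_tb = lifetime_writes_tb(capacity_gb=3840, drive_writes_per_day=1)

print(f"400GB caching SSD at 10 DWPD: {cache_tb:,.0f} TB over five years")
print(f"3.84TB capacity SSD at 1 DWPD: {capacity_tb:,.0f} TB over five years")
```

Under these assumed ratings, the small caching drive is expected to absorb more total writes than a capacity drive nearly ten times its size, which is why the caching tier calls for the higher-endurance part.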

VXRAIL MODELS AND SPECIFICATIONS


Nine VxRail Appliance models are currently available, ranging from the Model 60 with nodes containing a single, 6-core
processor and 64GB of memory to the Model 280F with nodes that use dual, 14-core processors and up to 512GB of
memory. Figure 11 identifies the configuration range for both the hybrid and all-flash nodes.

Figure 11: Configuration ranges for all-flash and hybrid nodes.


(*Certain selections can limit other options that are available.)



Figure 12 shows the five VxRail Appliance models that have nodes containing all-flash storage, and Figure 13 shows the
four hybrid disk-configuration models.

Figure 12: All-Flash VxRail Models

Figure 13: Hybrid VxRail Models

Scaling
Current model configurations start with as few as four nodes housed in a single appliance and can grow in one-appliance
increments up to 16 appliances (64 nodes). New appliances can be added non-disruptively, and different model appliances
can be mixed within the larger appliance cluster environment.


Figure 14: VxRail Scaling


A few basic rules regarding scaling are worth considering when planning a cluster build-out:

1. Balance: All nodes in an appliance chassis must be balanced (i.e., be the same).
   a. Only the first appliance must include a full four nodes.
   b. Additional appliances can be partially populated with 1, 2, or 3 nodes, or they can be fully populated.
   c. If a drive is added to one node in an appliance, all nodes in that appliance must also receive the drive upgrade.
2. Flexibility: Appliances in a cluster can be different models and can have different numbers of nodes.
   a. Exceptions:
      Hybrid models and flash models cannot be mixed in a cluster.
      1GbE-networking models (i.e., the VxRail 60) cannot be mixed with 10GbE-networking models (i.e., VxRail 120
      and higher).
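
The scaling rules above lend themselves to a simple planning check. The sketch below is illustrative only: the `Appliance` record and `check_cluster` function are hypothetical names, not part of any VxRail tool. Each appliance is modeled as a single model type, so the balance rule within a chassis is implicit, and the drive-upgrade rule is not modeled.

```python
from dataclasses import dataclass
from typing import List

MAX_APPLIANCES = 16      # per the text: up to 16 appliances in a cluster
NODES_PER_APPLIANCE = 4  # a fully populated appliance holds four nodes


@dataclass
class Appliance:
    model: str        # label only, e.g. "60", "120", "280F"
    nodes: int        # populated nodes in this chassis
    all_flash: bool   # True for all-flash, False for hybrid
    network_gbe: int  # 1 or 10


def check_cluster(appliances: List[Appliance]) -> List[str]:
    """Return rule violations; an empty list means the plan passes."""
    if not appliances:
        return ["cluster needs at least one appliance"]
    problems = []
    if len(appliances) > MAX_APPLIANCES:
        problems.append(f"more than {MAX_APPLIANCES} appliances")
    if appliances[0].nodes != NODES_PER_APPLIANCE:
        problems.append("first appliance must be fully populated with four nodes")
    for i, a in enumerate(appliances[1:], start=2):
        if not 1 <= a.nodes <= NODES_PER_APPLIANCE:
            problems.append(f"appliance {i} must hold 1 to 4 nodes")
    if len({a.all_flash for a in appliances}) > 1:
        problems.append("hybrid and all-flash models cannot be mixed")
    if len({a.network_gbe for a in appliances}) > 1:
        problems.append("1GbE and 10GbE models cannot be mixed")
    return problems


plan = [Appliance("120", 4, False, 10), Appliance("120", 2, False, 10)]
print(check_cluster(plan))                       # []
print(sum(a.nodes for a in plan), "nodes")       # 6 nodes
print(MAX_APPLIANCES * NODES_PER_APPLIANCE)      # 64, the cluster maximum
```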


VxRail Software Architecture


These sections on software architecture provide a comprehensive examination of all the VxRail software components and
their relationships and co-dependencies. The VCE VxRail Appliance is architected with software for appliance management
and for virtualization and virtual-system management. The software stack comes preinstalled and simply requires running
a configuration wizard on site to integrate the appliance into an existing network environment. The picture below (Figure
15) shows the software layers and the previously discussed underlying hardware represented at a high level.
The VxRail management, operations, and automation software includes:
VxRail Manager
VxRail Manager Extension (including VMware vRealize Log Insight, formerly vCenter Log Insight)
Supplemental management options: VCE Vision Intelligent Operations software and additional VMware vRealize
components
The VMware virtualization and virtual-infrastructure management software includes:
vSphere vCenter Server
vSphere ESXi
VMware Virtual SAN (Software-Defined Storage)

Figure 15: VxRail Infrastructure Components

VxRail provides a unique and tightly integrated architecture for VMware environments. VxRail deeply integrates VMware
virtualization software. Specifically, VMware Virtual SAN is integrated at the kernel level and is managed with VMware
vSphere, which enables higher performance for the VxRail as well as automated scaling and wizard-based upgrades.
The next sections review the VxRail management, operations, and automation software in depth.

APPLIANCE MANAGEMENT
VxRail Manager
In the introduction section of this TechBook, we discussed the complexity of the software-defined data center and the
challenges of managing and maintaining an SDDC environment. The VxRail Manager provides a user-friendly dashboard
interface (shown below in Figure 16) to automate VxRail configuration, VM provisioning, and management. The dashboard
Health Tab can be used to monitor the health of all individual appliances and individual nodes in the entire cluster.
Once the appliance is configured and deployed, VxRail Manager can be accessed by pointing a browser at the VxRail
Manager IP address or the DNS host name.

Figure 16: VxRail Manager Dashboard: The Home view displays all the VMs, and the Health Tab indicates CPU, memory,
storage, and usage.

VxRail Manager Extension


VxRail Manager Extension is used for adding new appliances to an existing cluster easily and non-disruptively, monitoring
the appliance resource utilization, expediting diagnostics, and troubleshooting software problems. It can, for instance,
guide systems administrators through the replacement of failed disk drives without disrupting the appliance's availability.
The VxRail Manager Extension leverages the underlying VMware vRealize Log Insight product to capture events and
provide real-time holistic notifications about the state of virtual applications, virtual machines, and appliance hardware.
The Manager Extension adopts the VxRail Manager's simple, effective dashboard user interface (shown below in Figure
17), providing a consistent look and feel for convenient access to EMC services.


Figure 17: VxRail Manager Extension displays overall system health, and its Support Tab displays support status
information and resources.
The Manager Extension dashboard lets users directly reach things like EMC knowledge-base articles and user-community
forums for FAQ information and VxRail best practices. The Manager Extension also provides service integration and
simplifies the appliance life-cycle management by delivering patch software and update notifications that can be
automatically installed without interruption or downtime.
Another feature within the Manager Extension is EMC Secure Remote Services (ESRS), which enables appliances
deployed offsite to have the same level of support and service as the devices deployed in the main datacenter. ESRS also
can be used for online chat support and EMC field-service assistance. Figure 18 below summarizes its implementation
details.

Figure 18: VxRail Manager Extension ESRS details

Furthermore, the Manager Extension provides access to a digital market (Figure 19) for finding and downloading qualified,
value-add VxRail VM applications such as CloudArray, RecoverPoint for VMs, and vSphere Data Protection (VDP).

Figure 19: VxRail Manager Extension Dashboard Market Tab

In addition to service integration, the Manager Extension augments the VxRail Manager's health monitoring via integration
with VMware vRealize Log Insight to track alerts for hardware, software, and virtual machines. It delivers real-time
automated log management for the VxRail Appliance with log monitoring, intelligent grouping, and analytics to provide
better troubleshooting at scale across VxRail physical, virtual, and cloud environments.

VMWARE VSPHERE
The VMware vSphere software suite delivers an industry-leading virtualization platform to provide application virtualization
within a highly available, resilient, efficient on-demand infrastructure, making it the ideal software foundation for VxRail.
ESXi and vCenter are components of the vSphere software suite. ESXi is a hypervisor installed directly onto a physical
server node in VxRail, enabling it to be partitioned into multiple logical servers referred to as virtual machines (VMs). VMs
are installed on top of the ESXi server. VMware vCenter server is a centralized management application that is used to
manage the ESXi hosts and VMs.
The following sections will provide in-depth examination of the VMware vSphere software components that are
implemented in the VxRail software architecture.

VMware vSphere vCenter Server


VxRail uses vSphere vCenter Server from VMware as the central administrator for networked ESXi hosts. vCenter Server
provides the VxRail appliance with trusted, functional, and familiar VM management. vCenter Server enables pooling and
manages resources from multiple ESXi servers. (See Figures 20 and 21 below.) A single vCenter Server can manage up to
1,000 ESXi hosts and/or up to 10,000 virtual machines.
The vCenter Server architecture includes the following components:
vSphere Client, which provides direct connection to ESXi hosts.
vSphere Web Client, which provides direct connection to vCenter Server.
vCenter Server database, which functions as the back-end SQL database for storing the inventory items, security
roles, resource pools, performance data, and other critical information for vCenter Server.
VMware vSphere Platform Services Controller (PSC), which is a new service in vSphere 6 that handles the
infrastructure security functions such as vCenter Single Sign-On, licensing, certificate management, directory services,
and server reservation. The PSC also includes a Lookup Service that keeps topology information about the vSphere
infrastructure for secure component interconnectivity. Other services (such as the Inventory Service) register with the
Lookup Service so they can be located by vCenter Server components (like the vSphere Web Client).

Figure 20: vCenter Server Architecture (1 of 2)

Figure 21: vCenter Server Architecture (2 of 2)

vCenter Server Services and Interfaces


vCenter provides a number of services and interfaces, including:
Core VM and resource services such as an inventory service, task scheduling, statistics logging, alarm and event
management, and VM provisioning and configuration
Distributed services such as vSphere vMotion, vSphere DRS, and vSphere HA
vCenter Server database interface


Figure 22: vCenter Server services

PSC Deployment Options


The Platform Services Controller (PSC) can be deployed either as embedded or external, as depicted in Figure 23.
Embedded PSC is implemented in stand-alone deployments where vCenter Server is the only SSO-integrated solution.
The vCenter Server is bundled with an embedded PSC, and all the PSC services reside on the same host machine as
vCenter Server.
External PSC is deployed in environments with multiple SSO-enabled solutions, and supports an Enhanced Linked
Mode (ELM) that connects multiple vCenter Servers to the External PSC. VxRail administrators have a clear view of all
the vCenter Server instances across all linked vCenter Server systems and can create and replicate roles, permissions,
licenses, and other key data. vCenter supports High-Availability External PSC configurations, where multiple PSCs use
a load balancer to provide resilient availability. (See Figure 24.) The vCenter Server systems can then join that PSC
domain using the IP address of the load balancer. In the end, the ELM-created replicated services that exist on
multiple instances of vCenter Server can be attached to two PSCs implemented in a highly available configuration,
which is resilient to failures.

Figure 23: Embedded and External PSC deployments



Figure 24: External PSCs configured for High Availability

VMware vSphere ESXi


vSphere is the core operational software in the VxRail appliance. vSphere aggregates a comprehensive set of features that
efficiently pool and manage the resources available under the ESXi hosts. Keep in mind that this TechBook focuses on
vSphere technology specifically as it pertains to the VxRail appliance. Features included in other vSphere implementations
may not apply to VxRail and features included in VxRail may not apply to other implementations.

ESXi Overview
VMware ESXi is an enterprise-class hypervisor that deploys and services virtual machines. Figure 25 illustrates its basic
architecture. ESXi partitions a physical server into multiple secure and portable VMs that can run side by side on the same
physical server. Each VM represents a complete system, with processors, memory, networking, storage, and BIOS, so
any operating system (guest OS) and software applications can be installed and run in the virtual machine without any
modification. The hypervisor provides physical-hardware resources dynamically to virtual machines (VMs) as needed to
support the operation of the VMs. The hypervisor enables virtual machines to operate with a degree of independence from
the underlying physical hardware. For example, a virtual machine can be moved from one physical host to another. Also,
the VM's virtual disks can be moved from one type of storage to another without affecting the functioning of the virtual
machine. ESXi also isolates VMs from one another, so when a guest operating system running in one VM fails, other VMs
on the same physical host are unaffected and continue to run. Virtual machines share access to CPUs and the hypervisor
is responsible for CPU scheduling. In addition, ESXi assigns VMs a region of usable memory and provides shared access to
the physical network cards and disk controllers associated with the physical host. Different virtual machines can run
different operating systems and applications on the same physical computer.


Figure 25: Bird's-Eye View: vSphere ESXi Architecture

Communication between vCenter Server and ESXi Hosts


vCenter Server communicates with the ESXi host through a vCenter Server agent, also referred to as vpxa or the
vmware-vpxa service, which is started on the ESXi host when it is added to the vCenter Server inventory. (See Figure
26.) Specifically, the vCenter vpxd daemon communicates through the vpxa to the ESXi host daemon known as the
hostd process. The vpxa process acts as an intermediary between the vpxd process that runs on vCenter Server and the
hostd process that runs on the ESXi host, relaying the tasks to perform on the host. The hostd process runs directly on
the ESXi host and is responsible for managing most of the operations on the ESXi host including creating VMs, migrating
VMs, and powering on VMs.
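
The relay chain described above (vpxd on vCenter Server, vpxa on the host, then hostd) can be pictured with a toy model. This is a conceptual sketch only: the class and method names are hypothetical and do not correspond to any VMware API or to the real inter-process protocol.

```python
class Hostd:
    """Stands in for the hostd process that runs on the ESXi host."""

    def __init__(self, host_name):
        self.host_name = host_name
        self.powered_on = set()

    def perform(self, task, vm):
        # hostd is the component that actually carries out the operation.
        if task == "power_on":
            self.powered_on.add(vm)
        return f"hostd@{self.host_name}: {task} {vm}"


class Vpxa:
    """Stands in for the vCenter Server agent started on the host."""

    def __init__(self, hostd):
        self.hostd = hostd

    def relay(self, task, vm):
        # The agent does not execute the task itself; it hands it to hostd.
        return self.hostd.perform(task, vm)


class Vpxd:
    """Stands in for the vpxd daemon that runs on vCenter Server."""

    def __init__(self):
        self.agents = {}

    def add_host(self, host_name):
        # The agent is started when the host joins the inventory.
        self.agents[host_name] = Vpxa(Hostd(host_name))

    def submit(self, host_name, task, vm):
        return self.agents[host_name].relay(task, vm)


vcenter = Vpxd()
vcenter.add_host("esxi-01")  # hypothetical host name
print(vcenter.submit("esxi-01", "power_on", "vm-a"))
```

The point of the layering is that vCenter Server never acts on a host directly: every task passes through the agent, and hostd remains the single component that changes host state.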

Figure 26: Communication Between vCenter and ESXi Hosts

Virtual Machines
A virtual machine consists of a core set of the following related files, or a set of objects. (See Figure 27.) Except for the
log files, the name of each file starts with the virtual machine's name (VM_name). These files include:
A configuration file (.vmx) and/or a virtual-machine template-configuration file (.vmtx)
One or more virtual disk files (.vmdk)
A file containing the virtual machine's BIOS settings (.nvram)
A virtual machine's current log file (.log) and a set of files used to archive old log entries (-#.log)
Swap files (.vswp), used to reclaim memory during periods of contention
A snapshot description file (.vmsd), which is empty if the virtual machine has no snapshots
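
The file list above maps naturally onto a lookup by extension. The sketch below groups a directory listing by role. The helper names and the sample listing (a VM called `web01`) are hypothetical, and real datastores can contain additional file types not covered in the list.

```python
from pathlib import PurePosixPath

# Extension-to-role mapping taken from the list above.
VM_FILE_ROLES = {
    ".vmx": "configuration",
    ".vmtx": "template configuration",
    ".vmdk": "virtual disk",
    ".nvram": "BIOS settings",
    ".log": "log",
    ".vswp": "swap",
    ".vmsd": "snapshot description",
}


def classify(filename):
    """Return the role of a VM file based on its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    return VM_FILE_ROLES.get(suffix, "other")


def group_vm_files(filenames):
    """Group file names by role, preserving listing order."""
    groups = {}
    for name in filenames:
        groups.setdefault(classify(name), []).append(name)
    return groups


# Hypothetical listing; log files do not start with the VM name.
listing = ["web01.vmx", "web01.vmdk", "web01.nvram", "web01.vmsd",
           "web01.vswp", "vmware.log", "vmware-1.log"]
for role, names in group_vm_files(listing).items():
    print(f"{role}: {', '.join(names)}")
```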

Figure 27: Virtual Machine Files

Virtual Machine Hardware


A virtual machine uses virtual hardware. Each guest operating system sees ordinary hardware devices and does not know
that these devices are virtual. (Hardware resources are shown below in Figure 28.) All virtual machines have uniform
hardware, except for a few variations that the system administrator can apply. Uniform hardware makes virtual machines
portable across VMware virtualization platforms. vSphere supports many of the latest CPU features, including virtual CPU
performance counters. It is possible to add virtual hard disks and NICs, and configure virtual hardware, such as CD/DVD
drives, floppy drives, SCSI devices, USB devices, and up to 16 PCI vSphere DirectPath I/O devices.


Figure 28: Hardware resources for VMs

Virtual Machine Communication


The Virtual Machine Communication Interface (VMCI) provides a high-speed communication channel between a virtual
machine and the hypervisor. VMCI devices cannot be added or removed. The SATA controller provides access to virtual
disks and DVD/CD-ROM devices. The SATA virtual controller appears to a virtual machine as an AHCI SATA controller.
Without VMCI, virtual machines would communicate with the host using the network layer, which adds overhead to the
communication. With VMCI, communication overhead is minimal, and tasks requiring that communication can be
optimized. An internal network can transmit an average of slightly over 2Gbps using VMXNET3. VMCI can go up to nearly
10Gbps with twelve 8k-sized queue pairs.
VMCI provides socket APIs that are very similar to the APIs already used for TCP/UDP applications.
For more information about the virtual hardware, see the vSphere Virtual Machine Administration Guide at
https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html.

Virtual Networking
VMware vSphere provides a rich set of networking capabilities that integrate well with sophisticated enterprise networks.
These networking capabilities are provided by ESXi and managed by vCenter Server. Virtual networking provides the
ability to network virtual machines in the same way physical machines are networked. Virtual networks can be built within
a single ESXi host or across multiple ESXi hosts. Virtual switches allow virtual machines on the same ESXi
host to communicate with each other using the same protocols that would be used over physical switches, without
the need for additional networking hardware. ESXi virtual switches also support VLANs that are compatible with
standard VLAN implementations from other vendors. A virtual switch, like a physical Ethernet switch, forwards frames at
the data link layer. A virtual machine can be configured with one or more virtual Ethernet adapters, each of which has its
own IP address and MAC address. As a result, virtual machines have the same properties as physical machines from a
networking standpoint. In addition, virtual networks enable functionality not possible with physical networks today. The
key virtual networking components provided by vSphere are virtual Ethernet adapters, which are used by individual virtual
machines, and virtual switches, which connect virtual machines to each other and connect both virtual machines and the
ESXi management interface to external networks.

Figure 29: Virtual Switch Architecture


An ESXi host might contain multiple virtual switches. The virtual switch connects to the external network through
outbound Ethernet adapters called vmnics, and the virtual switch can bind multiple vmnics together (much like NIC
teaming on a traditional server), extending availability and bandwidth to the virtual machines it services.
Virtual switches are similar to their physical-switch counterparts. A general architecture is depicted in Figure 29. Like a
physical network device, each virtual switch is isolated for security and has its own forwarding table. An entry in one table
cannot point to another port on another virtual switch. The switch looks up only destinations that match the ports on the
virtual switch where the frame originated. This feature stops potential hackers from breaking virtual switch isolation.
Virtual switches also support VLAN segmentation at the port level, so each port can be configured either as an access port
to a single VLAN or as a trunk port to multiple VLANs.
VMware has developed two virtual switches, the standard switch and the distributed switch, for different applications. The
VxRail Appliance supports both switch types through vCenter Server.
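The isolation property described above can be illustrated with a small model. This is a sketch only, not how ESXi implements virtual switches: the class name, the table layout keyed by VLAN and MAC address, and the sample MAC addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualSwitch:
    """Toy model of one virtual switch with its own private forwarding table."""
    name: str
    # Maps (VLAN ID, MAC address) -> port number on THIS switch only.
    table: dict = field(default_factory=dict)

    def connect(self, port: int, mac: str, vlan: int) -> None:
        """Register a virtual NIC on an access port that belongs to one VLAN."""
        self.table[(vlan, mac)] = port

    def lookup(self, vlan: int, dst_mac: str):
        """Return the local port for a destination, or None if unknown here."""
        return self.table.get((vlan, dst_mac))


vswitch0 = VirtualSwitch("vSwitch0")
vswitch1 = VirtualSwitch("vSwitch1")
vswitch0.connect(port=1, mac="00:50:56:aa:00:01", vlan=10)
vswitch1.connect(port=1, mac="00:50:56:bb:00:01", vlan=10)

print(vswitch0.lookup(10, "00:50:56:aa:00:01"))  # port on vSwitch0
print(vswitch0.lookup(10, "00:50:56:bb:00:01"))  # None: that MAC is on vSwitch1
```

Because each switch object owns its table, a lookup on one switch can never resolve to a port on another, which mirrors the rule that an entry in one table cannot point to a port on a different virtual switch.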

Standard Virtual Switch


The standard virtual switch is responsible for connecting virtual machines to a virtual network. It works much like a
physical switch and controls how virtual machines communicate with one another. The standard switch has a host-level
virtual network configuration. In this case, each ESXi host uses the standard switch both to connect virtual machines to
the physical network and to connect the physical network to VMkernel services, including access to IP storage, such as
NFS or iSCSI.

Figure 30: Single Standard Switch


More than one network can coexist on the same virtual switch (Figure 30), or multiple networks can exist on separate
virtual switches (Figure 31).

Figure 31: Multiple Standard Switches

Virtual Distributed Switch


The VMware vSphere Distributed Switch (VDS) has similar components to those of a standard switch, but functions as a
single virtual switch across all associated hosts. This switch enables virtual machines to maintain consistent network
configuration as they migrate across multiple hosts. A distributed switch is configured in vCenter Server at the data-center
level and makes the configuration consistent across all hosts. vCenter Server stores the state of distributed ports in the
vCenter Server database. Networking statistics and policies migrate with virtual machines when the virtual machines are
moved from host to host. As we discuss in upcoming sections, Virtual SAN relies on VDS for its storage-virtualization
capabilities, and the VxRail Appliance uses VDS for appliance traffic.
Figure 32 provides a VDS overview. Detailed information about VDS is available at:
https://www.vmware.com/products/vsphere/features/distributed-switch#sthash.WC5hSHzt.dpuf

Figure 32: Distributed Switch

Migration and vMotion


The advanced capabilities for migrating data without disruption are among the features that distinguish the VxRail solution
from other HCI options. In the vSphere virtual infrastructure, migration refers to moving a virtual machine from one host,
datastore, or vCenter Server system to another host, datastore, or vCenter Server system. Different types of migration
exist, including:
Cold, which is migrating a powered-off VM to a new host or datastore
Suspended, which is migrating a suspended VM to a new host or datastore
Live, which uses vSphere vMotion to migrate a live, powered-on VM to a new host and/or uses vSphere Storage
vMotion to migrate the files of a live, powered-on VM to a new datastore
vMotion allows for live migration of virtual machines between compatible ESXi hosts with no disruption or downtime. The
process is summarized in Figure 33. With vMotion, while the entire state of the virtual machine is migrated, the data
storage remains in the same datastore. The state information includes the current memory content and all the information
that defines and identifies the virtual machine. The memory content consists of transaction data and whatever parts of the
operating system and applications are in memory. The definition and identification information stored in the state includes all
the data that maps to the virtual machine hardware elements, including BIOS, devices, CPU, and MAC addresses for the
Ethernet cards.


Figure 33: vMotion Migration

A vMotion migration consists of the following steps:
1. The VM memory state is copied over the vMotion network from the source host to the target host. Users continue to
access the VM and, potentially, update pages in memory. A list of modified pages in memory is kept in a memory
bitmap on the source host.
2. After most of the VM memory is copied from the source host to the target host, the VM is quiesced. No additional
activity occurs on the VM. During the quiesce period, vMotion transfers the VM-device state and memory bitmap to
the destination host.
3. Immediately after the VM is quiesced on the source host, the VM is initialized and starts running on the target host. A
Gratuitous Address Resolution Protocol (GARP) request notifies the subnet that the MAC address for the VM is now on
a new switch port.
4. Users access the VM on the target host instead of the source host. The memory pages used by the VM on the source
host are marked as free.
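The steps above can be sketched as a small simulation. It is a deliberately simplified model, not VMware's implementation: memory is a dictionary of pages, the bitmap is a set of page numbers, and the function name and input format are hypothetical. In particular, the sketch copies the dirtied pages during the quiesce step, which is an assumption about how the bitmap is used.

```python
def simulate_vmotion(source_pages, writes_during_copy):
    """Copy memory pages to a target while tracking pages dirtied mid-copy.

    source_pages: dict of page number -> contents on the source host.
    writes_during_copy: list of (page number, new contents) guest writes
        that arrive while the first copy pass is running.
    """
    # Step 1: copy the memory state while the VM keeps running.
    target_pages = dict(source_pages)
    dirty_bitmap = set()  # pages modified on the source after being copied
    for page, contents in writes_during_copy:
        source_pages[page] = contents
        dirty_bitmap.add(page)

    # Step 2: the VM is quiesced; the bitmap tells the target which pages
    # are stale, and those pages are brought up to date.
    for page in dirty_bitmap:
        target_pages[page] = source_pages[page]

    # Steps 3 and 4: the VM runs on the target; source pages are freed.
    source_pages.clear()
    return target_pages, dirty_bitmap


memory = {0: "os", 1: "app", 2: "txn-a"}
final, dirty = simulate_vmotion(memory, [(2, "txn-b")])
print(final)  # {0: 'os', 1: 'app', 2: 'txn-b'}
print(dirty)  # {2}
```

The point of the bitmap is that only the pages changed during the copy need attention while the VM is paused, which keeps the quiesce period short.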

Enhanced vMotion Compatibility


Enhanced vMotion Compatibility (EVC) is a cluster feature that prevents vMotion migrations from failing because of
incompatible CPUs. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if
the actual CPUs on the hosts differ.

Storage vMotion
Storage vMotion uses an I/O-mirroring architecture to copy disk blocks between source and destination. The image below
(Figure 34) helps to describe the process:
1. Initiate storage migration.
2. Use the VMkernel data mover or vSphere Storage APIs for Array Integration (VAAI) to copy data.
3. Start a new VM process.
4. Mirror I/O calls to file blocks that have already been copied to the virtual disk on the target datastore.
5. Switch to the target-VM process to begin accessing the virtual-disk copy.


Figure 34: Storage vMotion

The storage-migration process copies the disk just once, and the mirror driver synchronizes the source and target blocks
with no need for recursive passes. In other words, if a source block changes after it has been copied, the mirror driver writes to
both disks simultaneously, which maintains transactional integrity. The mirroring architecture of Storage vMotion produces
more predictable results, shorter migration times, and fewer I/O operations than more conventional storage-migration
options. It is fast enough to be unnoticeable to the end user. It also guarantees migration success even when using a slow
destination disk.
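The single-pass copy with mirrored writes can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual mirror driver: disks are Python lists of blocks, and the guest_writes format (a write keyed by how many blocks have been copied when it arrives) is hypothetical.

```python
def storage_vmotion(source, guest_writes):
    """Single-pass disk copy with mirrored writes.

    source: list of block contents on the source datastore.
    guest_writes: dict mapping "number of blocks copied so far" to a
        (block index, data) write issued by the guest at that moment.
    """
    target = [None] * len(source)
    for copied in range(len(source) + 1):
        if copied in guest_writes:
            index, data = guest_writes[copied]
            source[index] = data
            if index < copied:
                # Block was already copied: mirror the write to the target.
                target[index] = data
        if copied < len(source):
            target[copied] = source[copied]  # the one and only copy pass
    return target


source = ["a", "b", "c", "d"]
target = storage_vmotion(source, {1: (2, "C"), 3: (0, "A")})
print(target)  # ['A', 'b', 'C', 'd']
```

A write to a block that has not yet been copied needs no mirroring, because the copy pass picks up the new contents when it reaches that block. Only writes behind the copy cursor are sent to both disks, so one pass is enough.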
vSphere 6.0 supports the following Storage vMotion migrations:
Between clusters
Between datastores
Between networks
Between vCenter Server instances for vCenter Servers configured in Enhanced Linked Mode with hosts that are time-synchronized
Over long distances (up to 150 ms round-trip time)

vSphere Distributed Resource Scheduler


VMware Distributed Resource Scheduler (DRS) is a key feature included with vSphere Enterprise Plus and vSphere with
Operations Management Enterprise Plus. DRS balances computing capacity across a collection of VxRail server resources
that have been aggregated into logical pools. It continuously balances and optimizes compute resource allocation among
the VMs. When a VM experiences an increased workload, DRS evaluates the VM priority against user-defined resource-allocation rules and policies. If justified, DRS allocates additional resources. It can also be configured to dedicate
consistent resources to the VMs of particular business-unit applications to meet SLAs and business requirements. DRS
allocates resources to the VM either by migrating the VM to another server with more available resources or by making
more resources available to the VM on the same server by migrating other VMs off the server. In the VxRail Appliance, all ESXi
hosts are part of a vMotion network. The live migration of VMs to different node servers through vMotion is completely
transparent to end users (see Figures 35 and 36 below). DRS adds considerable value to the VxRail Appliance by automating VM
placement, which helps keep application-workload performance consistent and predictable.

Figure 35: DRS Movement of VMs Across Node Servers

Figure 36: VM migration across the vMotion Network


DRS offers a considerable advantage to VxRail users during maintenance situations, because it automates the tasks
normally involved in manually moving live machines during upgrades or repairs. DRS facilitates maintenance automation,
providing transparent, continuous operations by dynamically migrating all VMs to other physical servers. That way,
servers can be attended to for maintenance, or new node servers can be added to a resource pool, all while DRS
automatically redistributes the VMs among the available servers as the physical resources change. In other words, DRS
dynamically balances VMs as soon as additional resources become available when a new server is added or when an existing
server has finished its maintenance cycle. DRS allocates only CPU and memory resources for the VMs and uses Virtual
SAN for shared storage.


Figure 37: Configuring DRS Settings
Some conditions and business operations warrant a more aggressive DRS migration strategy than others. Adjustable
Virtual SAN cluster parameters establish the thresholds that trigger DRS migrations. For example, a Level-2 threshold
applies only those migration recommendations that would significantly improve the cluster's load balance, whereas a Level-5
threshold applies every recommendation that would improve the cluster's load balance even slightly.
DRS applies only to VxRail virtual machines. (Virtual SAN uses a single datastore and handles placement and balancing
internally. Virtual SAN does not currently support Storage DRS or Storage I/O Control.)
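The effect of the migration threshold can be sketched as a simple filter. This is an illustration only: the Recommendation type, the 1-to-5 priority scale (1 for the largest expected improvement), and the rule that a threshold level applies every recommendation at or below that priority number are assumptions consistent with the Level-2 and Level-5 examples above, not a vSphere API.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    vm: str
    target_host: str
    priority: int  # assumed scale: 1 = largest improvement, 5 = smallest


def recommendations_to_apply(recs, threshold_level):
    """Keep the recommendations whose priority falls within the threshold."""
    return [r for r in recs if r.priority <= threshold_level]


recs = [
    Recommendation("vm-a", "node-2", 1),
    Recommendation("vm-b", "node-3", 2),
    Recommendation("vm-c", "node-4", 5),
]
conservative = recommendations_to_apply(recs, threshold_level=2)
aggressive = recommendations_to_apply(recs, threshold_level=5)
print([r.vm for r in conservative])  # ['vm-a', 'vm-b']
print([r.vm for r in aggressive])    # ['vm-a', 'vm-b', 'vm-c']
```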

vSphere High Availability (HA)


vSphere provides several solutions to ensure a high level of availability in both planned and unplanned downtime scenarios.
vSphere depends on the following technologies to make sure that virtual machines running in the environment remain
available (as in Figure 38):
Virtual machine migration
Multiple I/O adapter paths
Virtual machine load balancing
Fault tolerance
Disaster recovery
Together with Virtual SAN, vSphere HA produces a resilient, highly available solution for VxRail virtual machine workloads.
vSphere HA protects virtual machines by restarting them in the event of a host failure. It leverages the ESXi cluster
configuration to ensure rapid recovery from outages, providing cost-effective high availability for applications running in
virtual machines. When a host joins a cluster, its resources become part of the cluster resources. The cluster manages the
resources of all hosts within it. In a vSphere environment, ESXi clusters are responsible for vSphere HA, DRS, and the
Virtual SAN technology that provides VxRail software-defined storage capabilities.


Figure 38: vSphere HA
vSphere HA provides several points of protection for applications:
It circumvents any server failure by restarting the virtual machines on other hosts within the cluster.
It continuously monitors virtual machines and resets any virtual machine in which it detects a failure.
It protects against datastore accessibility failures and provides automated recovery for affected virtual machines. With
Virtual Machine Component Protection (VMCP), the affected VMs are restarted on other hosts that still have access to
the datastores.
It protects virtual machines against network isolation by restarting them if their host becomes isolated on the
management or VMware Virtual SAN network. This protection is provided even if the network has become partitioned.
Once vSphere HA is configured, all workloads are protected. No actions are required to protect new virtual machines and
no special software needs to exist within the application or virtual machine.
Included in the failover capabilities in vSphere HA is a service called the Fault Domain Manager (FDM) that runs on the
member hosts. After the FDM agents have started, the cluster hosts become part of a fault domain, and a host can exist in
only one fault domain at a time. Hosts cannot participate in a fault domain if they are in maintenance mode, standby
mode, or disconnected from vCenter Server.


Figure 39: Fault Domain Management
FDM uses a master-slave operational model (Figure 39). An automatically designated master host manages the fault
domain, and the remaining hosts are slaves. FDM agents on slave hosts communicate with the FDM service on the master
host using a secure TCP connection. In the VxRail environment, vSphere HA is enabled only after the Virtual SAN cluster
has been configured. Once vSphere HA has started, vCenter Server contacts the master host agent and sends it a list of
cluster-member hosts along with the cluster configuration. That information is saved to local storage on the master host
and then pushed out to the slave hosts in the cluster. If additional hosts are added to the cluster during normal operation,
the master agent sends an update to all hosts in the cluster.
The master host provides an interface to vCenter Server for querying and reporting on the state of the fault domain and
virtual machine availability. vCenter Server governs the vSphere HA agent, identifying the virtual machines to protect and
maintaining a VM-to-host compatibility list. The agent learns of state changes through hostd, and vCenter Server learns
of them through vpxa. The master host monitors the health of the slaves and takes responsibility for virtual machines
that had been running on a failed slave host. Meanwhile, the slave host monitors the health of its local virtual machines
and sends state changes to the master host. A slave host also monitors the health of the master host.
vSphere HA is configured, managed, and monitored through vCenter Server. Cluster configuration data is maintained by
the vCenter Server vpxd process. If vpxd reports any cluster configuration changes to the master agent, the master
advertises a new copy of the cluster configuration information and then each slave fetches the updated copy and writes
the new information to local storage. Each datastore includes a list of protected virtual machines. The list is updated after
vCenter Server notices any user-initiated power-on (protected) or power-off (unprotected) operation.
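The master host's role of taking responsibility for virtual machines on a failed slave host can be sketched as follows. This is a hypothetical model, not the FDM protocol: the heartbeat timestamps, the timeout value, the host names, and the round-robin placement are all assumptions made for illustration, and real placement decisions take capacity and other constraints into account.

```python
def plan_restarts(last_seen, now, timeout, vms_by_host):
    """Return {vm: new_host} for VMs on hosts that missed the heartbeat window.

    last_seen: dict of host -> time (seconds) of the last heartbeat received.
    vms_by_host: dict of host -> list of protected VM names running there.
    """
    failed = sorted(h for h, t in last_seen.items() if now - t > timeout)
    healthy = sorted(h for h in last_seen if h not in failed)
    restarts = {}
    if not healthy:
        return restarts  # no surviving host to restart anything on
    slot = 0
    for host in failed:
        for vm in vms_by_host.get(host, []):
            # Round-robin placement across the surviving hosts.
            restarts[vm] = healthy[slot % len(healthy)]
            slot += 1
    return restarts


plan = plan_restarts(
    last_seen={"node-2": 100.0, "node-3": 100.0, "node-4": 80.0},
    now=101.0,
    timeout=15.0,
    vms_by_host={"node-4": ["vm-x", "vm-y"], "node-2": ["vm-z"]},
)
print(plan)  # {'vm-x': 'node-2', 'vm-y': 'node-3'}
```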

vCenter Server Watchdog


One method of providing vCenter Server availability is to use the Watchdog feature in a vSphere HA cluster. Watchdog
monitors and protects vCenter Server services. If any services fail, Watchdog attempts to restart them. If it cannot restart
the service because of a host failure, vSphere HA restarts the virtual machine (VM) running the service on a new host.
Watchdog can provide better availability by using vCenter Server processes (PID Watchdog) or the vCenter Server API
(API Watchdog).

vSphere Fault Tolerance (FT)


vSphere Fault Tolerance provides a higher level of availability, allowing users to protect any virtual machine from a host
failure with no loss of data, transactions, or connections. Fault Tolerance works through redundancy. It duplicates the
virtual machine workload and transactions onto an identical virtual machine on a different host so it can be used for
transparent failover. In other words, it implements a primary and secondary VM, as in Figure 40 below. The key is
ensuring that the states of the primary and secondary virtual machines remain identical at all points in the instruction
execution.


Figure 40: Fault Tolerance

vSphere Fault Tolerance creates two complete virtual machines. Each virtual machine has its own .vmx configuration file
and .vmdk files. The protected virtual machine is the primary, and the secondary VM runs on another host. It can take
over at any point without interruption, providing fault-tolerant protection.
The primary and secondary virtual machines continuously monitor the status of one another to securely maintain fault
tolerance. If the primary VM fails, the secondary is activated immediately as a replacement. At that point, a new
secondary virtual machine is started and redundant fault tolerance is reestablished automatically. Furthermore, if a host
failure occurs on the secondary VM, it is also immediately replaced. In either case, users experience no interruption in
service and no loss of data.
vSphere Fault Tolerance can be used together with DRS. Using both solutions requires that the Enhanced vMotion
Compatibility mode be enabled. DRS can then make initial placement recommendations for fault-tolerant virtual machines,
knowing that fault-tolerant primary and secondary VMs cannot run on the same host.
vSphere Fault Tolerance can accommodate symmetric multiprocessor (SMP) virtual machines with up to four vCPUs.

VIRTUAL SAN
VxRail leverages VMware's Virtual SAN software, which is fully integrated with vSphere, to provide full-featured,
efficient, and cost-effective software-defined storage. Virtual SAN aggregates locally attached disks of vSphere
cluster hosts to create a pool of distributed shared storage. (See Figure 41 below.) IT centers can easily scale up
the Virtual SAN storage solution by adding new or larger disks to the ESXi hosts (nodes) and just as easily scale it
out by adding new ESXi hosts to the cluster. This provides the flexibility to start with a very small environment and
scale it over time, adding new hosts and more disks. VM-level policies can be set and modified on the fly to control
storage provisioning and day-to-day management of storage service-level agreements (SLAs). vSphere and Virtual
SAN are integrated into VxRail to deliver enterprise-class features for VMs such as vMotion, HA, and DRS and to
provide storage scale and performance.
Virtual SAN is a software-based distributed storage solution that is built into the ESXi hypervisor. It is preconfigured
and managed through vCenter to provide storage capacity across all VxRail Appliance nodes. The appliance-initialization process collects locally attached storage disks from each ESXi node in the cluster to create a
distributed, shared-storage datastore. The amount of storage in the Virtual SAN datastore is an aggregate of all of
the capacity drives in the cluster. Cache drives are not used in calculating the size of the datastore. For example, if
a cluster has eight hosts, and each host contributes three 12GB SAS drives, the Virtual SAN datastore will be
approximately 288GB. All VMs created in VxRail are automatically added to the Virtual SAN datastore. A typical
VxRail configuration would have four ESXi node servers for each appliance, and the disk group for each node
contains at least one flash SSD and three-to-five HDDs.
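The sizing example above is a straightforward product, sketched here for clarity. The function and parameter names are hypothetical, and the result is raw capacity only: it does not account for replicas or any other overhead.

```python
def vsan_raw_capacity_gb(hosts, capacity_drives_per_host, drive_size_gb):
    """Raw datastore size: capacity drives only, cache drives excluded."""
    return hosts * capacity_drives_per_host * drive_size_gb


# Eight hosts, each contributing three 12GB capacity drives.
print(vsan_raw_capacity_gb(hosts=8, capacity_drives_per_host=3, drive_size_gb=12))  # 288
```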

Figure 41: Virtual SAN Datastore


Virtual SAN enables rapid storage provisioning within vCenter as part of the VM-creation and -deployment operations.
Virtual SAN is policy driven and designed to simplify storage provisioning and management. It automatically and
dynamically matches requirements with underlying storage resources based on VM-level storage policies. With Virtual
SAN, VxRail provides two different node-storage configuration options: a hybrid configuration that leverages both flash
SSDs and mechanical HDDs, and an all-flash SSD configuration. The hybrid configuration uses flash SSDs at the cache tier
and mechanical HDDs for capacity and persistent data storage. This delivers enterprise performance and a resilient
storage platform. The all-flash configuration uses flash SSDs for both the caching tier and capacity tier.

Disk Groups
Storage disks in VxRail hosts are organized into disk groups, and they contribute to the storage available from the
Virtual SAN cluster. Think of disk groups as the main unit of storage on an ESXi host. (See Figure 42 below.) In a
VxRail appliance, a disk group contains a maximum of one flash-cache device and up to five capacity devices:
either mechanical disks or flash devices used as capacity in an all-flash configuration. Each server node (ESXi host)
has its own disk group.

Figure 42: VxRail Disk Groups

In hybrid configurations, a disk group combines a single flash-based device for caching with multiple mechanical-disk devices for capacity. For these deployments, the flash device is assigned during configuration to provide the
cache for a given set of capacity devices. This gives a degree of control over performance because the cache-to-capacity ratio is based on disk-group configuration. Wider cache-to-capacity ratios generally require flash devices of
larger capacity. Currently, the VxRail Appliance is offered with 200GB, 400GB, or 800GB cache-tier flash devices for
hybrid configurations.
The screenshot below (Figure 43) identifies the disk group on a host that contains four disks. The first is a flash
SSD, and its role is defined as Cache. The other three disks are HDDs defined as Capacity. The role of the disks,
either cache or capacity, is automatically set in the VxRail appliance.


Figure 43: Disk Group Configuration


Hybrid and All-Flash Differences
The cache is used differently in hybrid and all-flash configurations. In hybrid disk-group configurations (which use
mechanical HDDs for the capacity tier and flash SSD devices for the caching tier), the caching algorithm attempts
to maximize both read and write performance. The flash SSD device serves two purposes: a read cache and a write
buffer. Seventy percent of the available cache is allocated for storing frequently read disk blocks, minimizing accesses to
the slower mechanical disks. The remaining 30 percent of available cache is allocated to writes. Multiple writes are
coalesced and written sequentially if possible, again maximizing mechanical HDD performance.
In all-flash configurations, one designated flash SSD device is used for the cache tier, while additional flash SSD
devices are used for the capacity tier. In all-flash disk-group configurations, there are two types of flash SSDs: a
very fast and durable flash device that functions as write cache and more cost-effective SSD devices that function
as capacity. Here, the cache-tier SSD is 100 percent allocated for writes. None of the flash cache is used for reads;
read performance from capacity-tier flash SSDs is more than sufficient for high performance. Many more writes can
be held by the cache SSD in an all-flash configuration, and writes are only written to capacity when needed, which
extends the life of the capacity-tier SSD.
While both configurations dramatically improve the performance of VMs running on Virtual SAN, all-flash
configurations provide the most predictable and uniform performance regardless of workload.
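The cache split described above can be expressed as a short calculation. The function name and the returned dictionary keys are hypothetical; the percentages are the ones given in this section.

```python
def cache_allocation_gb(cache_size_gb, all_flash):
    """Split a cache-tier device into read-cache and write-buffer portions."""
    if all_flash:
        # All-flash: the cache tier is allocated entirely to writes.
        return {"read_cache_gb": 0.0, "write_buffer_gb": float(cache_size_gb)}
    # Hybrid: 70 percent read cache, 30 percent write buffer.
    return {
        "read_cache_gb": cache_size_gb * 70 / 100,
        "write_buffer_gb": cache_size_gb * 30 / 100,
    }


print(cache_allocation_gb(400, all_flash=False))
# {'read_cache_gb': 280.0, 'write_buffer_gb': 120.0}
print(cache_allocation_gb(400, all_flash=True))
# {'read_cache_gb': 0.0, 'write_buffer_gb': 400.0}
```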

Read Cache: Basic Function


The read cache, which only exists in hybrid configurations, keeps a collection of recently read disk blocks. This
reduces the I/O read latency in the event of a cache hit, i.e. the disk block can be fetched from cache rather than
mechanical disk. For a given VM data block, Virtual SAN always reads from the same replica/mirror. However,
when there are multiple replicas (to tolerate failures), Virtual SAN divides up the caching of the data blocks evenly
between the replica copies.
If the data block being read from the first replica is not in cache, the directory service is referenced to discover
whether or not the data block exists in the cache of another mirror (on another host) in the cluster. If the data
block is found there, the data is retrieved. If the data block isn't in cache on the other host, then there is a read-cache miss. In that case, the data is retrieved directly from the mechanical HDD.
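The read path in a hybrid configuration can be summarized as a three-level lookup. This sketch models each cache and the disk as a plain dictionary; the function name and the source labels are hypothetical, and the directory-service lookup is reduced to a simple membership test.

```python
def read_block(block, first_replica_cache, other_mirror_cache, hdd):
    """Return (data, where it came from) for a hybrid-configuration read."""
    if block in first_replica_cache:
        return first_replica_cache[block], "first-replica cache"
    if block in other_mirror_cache:
        # Found in the read cache of another mirror on another host.
        return other_mirror_cache[block], "other-mirror cache"
    # Read-cache miss: fetch from the mechanical disk.
    return hdd[block], "hdd"


hdd = {1: "b1", 2: "b2", 3: "b3"}
print(read_block(1, {1: "b1"}, {}, hdd))  # ('b1', 'first-replica cache')
print(read_block(2, {}, {2: "b2"}, hdd))  # ('b2', 'other-mirror cache')
print(read_block(3, {}, {}, hdd))         # ('b3', 'hdd')
```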

Write Cache: Basic Function


The write cache, found in both hybrid and all-flash configurations, behaves as a non-volatile write buffer. This
greatly improves performance in both hybrid and all-flash configurations and also extends the life of flash capacity
devices in all-flash configurations. When writes are written to cache, Virtual SAN ensures that a copy of the data is
written elsewhere in the cluster. All VMs deployed with Virtual SAN are set with a default availability policy that
ensures at least one additional copy of the VM data is available. This includes making sure that writes end up in
multiple write caches in the cluster.
Once an application running inside the guest OS initiates a write, it is duplicated to the write cache on the hosts
that include replicas of the storage objects. This means that in the event of a host failure, a copy of the data is in
cache and no data loss occurs. The VM simply uses the replicated copy of the cache data.

Flash Endurance
Flash endurance is related to the number of write/erase cycles that the cache-tier flash SSD can tolerate before it
begins having issues with reliability. For Virtual SAN 6.0 and VxRail configurations, the endurance specification has
been changed to use Terabytes Written (TBW); previously the specification was full Drive Writes Per Day (DWPD).
By quoting the specification in TBW, VMware allows vendors the flexibility to use larger capacity drives with lower
full DWPD specifications. For example, from an endurance perspective, a 200GB drive with a specification of 10 full
DWPD is equivalent to a 400GB drive with a specification of 5 full DWPD. If VMware kept a specification of 10 DWPD
for Virtual SAN flash devices, the 400 GB drive with 5 DWPD would be excluded from the Virtual SAN certification.
By changing the specification to 2 TBW per day, both the 200GB and 400GB drives meet the certification
requirement. 2 TBW per day is the equivalent of 5 DWPD for the 400GB drive and 10 DWPD for
the 200GB drive. For all-flash Virtual SAN deployments running high workloads, the flash-cache device specification
is 4 TBW per day, the equivalent of 7,300 TB written over five years.
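The arithmetic behind these endurance figures is shown below. The function names are hypothetical, and the sketch assumes decimal units (1 TB = 1000 GB) and 365-day years.

```python
def tb_written_per_day(capacity_gb, dwpd):
    """Convert full Drive Writes Per Day into terabytes written per day."""
    return capacity_gb * dwpd / 1000  # decimal units: 1 TB = 1000 GB


def lifetime_tb_written(tb_per_day, years):
    """Total terabytes written over the given number of 365-day years."""
    return tb_per_day * 365 * years


print(tb_written_per_day(200, 10))  # 2.0
print(tb_written_per_day(400, 5))   # 2.0
print(lifetime_tb_written(4, 5))    # 7300
```

Both drives reach the same 2 TB per day, which is why a TBW-based specification admits the larger drive with the lower DWPD rating.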

Virtual SAN's Impact on Flash Endurance


There are two commonly used approaches to improve NAND Flash endurance: improve wear leveling and minimize
write activity. Unfortunately, a distributed storage implementation that focuses on localizing data on the same node
where the VMs reside prevents the distribution of the writes across all the drives in the cluster. This localization
inevitably increases drive usage, leading to early drive replacement.
In contrast, Virtual SAN distributes the objects and components of a VM across all the disk groups in the VxRail
cluster. This distribution significantly improves wear leveling and reduces write activity by deferring writes. Virtual
SAN also reduces writes by employing data-reduction techniques such as de-duplication and compression.

Client Cache
The client cache is used on both hybrid and all-flash configurations. It leverages local DRAM server memory
on the node local to the VM to accelerate read performance. The amount of memory allocated is 0.4 percent of host memory, up to 1GB per
host. Virtual SAN first tries to fulfill the read request from the local client cache, so the VM can avoid crossing the
network to complete the read, and the request is fulfilled faster. If the data is unavailable in the client cache, the cache-tier
SSD is queried to fulfill the read request. The client cache benefits read-cache-friendly workloads.
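The client-cache sizing rule can be written as a one-line calculation. This sketch reads the figure above as 0.4 percent of host memory, capped at 1GB per host; that reading and the function name are assumptions.

```python
def client_cache_gb(host_memory_gb):
    """Client cache size: 0.4 percent of host memory, capped at 1 GB."""
    return min(host_memory_gb * 0.004, 1.0)


print(client_cache_gb(512))  # 1.0 (2.048 GB would exceed the cap)
```

Under this reading, any host with more than 250GB of memory reaches the 1GB cap.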

Objects and Components


VxRail virtual machines are made up of a set of objects. For example, a VMDK is an object, a snapshot is an object,
VM swap space is an object, and the VM home namespace (where the .vmx file, log files, etc. are stored) is also an
object. (See Figure 44 below.)
Virtual-machine objects are split into multiple components based on performance and availability requirements
defined in the VM storage profile. For example, if the VM is deployed with a policy to tolerate failure, the objects
have two replica components. Distributed storage uses a disk-striping process to distribute data blocks across
multiple devices. The stripe itself refers to a slice of divided data; the striped device is the individual drive that
holds the stripe. If the policy contains a stripe width, the object is striped across multiple devices in the capacity
layer, and each stripe is an object component.

Figure 44: Virtual SAN Objects and Components

Each Virtual SAN host has a maximum of 9,000 components. The largest component size is 255GB. For objects
greater than 255GB, Virtual SAN automatically divides them into multiple components. For example, a VMDK of
62TB generates more than 500 x 255GB components.

Witness
In Virtual SAN, witnesses are generally an integral component of every storage object, as long as the object is
configured to tolerate at least one failure. They are components that contain no data, only metadata. Their purpose
is to serve as tiebreakers when availability decisions are made to meet the failures-to-tolerate policy setting, and
they're used when determining whether a quorum of components exists in the cluster.
In Virtual SAN 6.0, storage components can be distributed in such a way that they can guarantee availability
without relying on a witness. In this case, each component has at least one vote. Quorum is
calculated based on the rule that requires "more than 50 percent of votes." (Still, many objects have a witness in
6.0.)
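
The "more than 50 percent of votes" rule can be illustrated with a minimal sketch. The component names and the one-vote-each assignment below are assumptions made for illustration; they are not taken from Virtual SAN itself.

```python
def has_quorum(votes_by_component, reachable):
    """Return True when the reachable components hold more than 50% of all votes."""
    total = sum(votes_by_component.values())
    available = sum(
        votes for name, votes in votes_by_component.items() if name in reachable
    )
    # Strictly more than half: an exact 50/50 split does not form a quorum.
    return 2 * available > total


# Hypothetical object with two replicas and one witness, one vote each.
votes = {"replica-a": 1, "replica-b": 1, "witness": 1}
print(has_quorum(votes, {"replica-a", "witness"}))  # two of three votes
print(has_quorum(votes, {"replica-b"}))             # one of three votes
```

With three votes, losing any single component still leaves a majority, which is why the witness can act as a tiebreaker between two replicas.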

Replicas
Replicas make up the virtual machine's storage objects. Replicas are instantiated when an availability policy
(NumberOfFailuresToTolerate) is specified for the virtual machine. The availability policy dictates how many replicas
are created and lets virtual machines continue running with a full complement of data even when host, network, or
disk failures occur in the cluster.

Storage Policy Based Management (SPBM)


Virtual SAN policies define virtual-machine storage requirements, such as performance and availability. These
policies determine how storage objects are provisioned and allocated within the datastore to guarantee the required
level of service.


Virtual SAN implements Storage Policy Based Management, and each virtual machine deployed in a Virtual SAN
datastore has at least one assigned policy. When the VM is created and assigned a storage policy, the policy
requirements are pushed to the Virtual SAN layer. (See Figure 45 below.)

Figure 45
Policy assignments can be generated manually or automatically, based on rules. For instance, all virtual machines
that include PROD-SQL in their name or resource group might be set to RAID-1 with a 5-percent read-cache
reservation, and TEST-WEB would be automatically set to RAID-0.
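
A rule-based assignment of this kind might look like the following sketch. The `StoragePolicy` class, the `RULES` table, and the `policy_for` function are hypothetical names introduced for illustration; they do not correspond to a vSphere or Virtual SAN API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    raid_level: str
    read_cache_reservation_pct: int = 0


# Hypothetical rules: a name token mapped to the policy it triggers.
RULES = [
    ("PROD-SQL", StoragePolicy("prod-sql", "RAID-1", 5)),
    ("TEST-WEB", StoragePolicy("test-web", "RAID-0", 0)),
]


def policy_for(vm_name: str) -> Optional[StoragePolicy]:
    """Return the first policy whose token appears in the VM name, else None."""
    upper_name = vm_name.upper()
    for token, policy in RULES:
        if token in upper_name:
            return policy
    return None


print(policy_for("PROD-SQL-01"))
print(policy_for("test-web-7"))
```

VMs that match no rule return `None` here; in practice they would fall back to whatever default policy the datastore defines.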

Dynamic Policy Changes


Administrators can dynamically change a VM storage policy. When changing attributes such as
NumberOfFailuresToTolerate (FTT), Virtual SAN attempts to find a new placement for a replica with the new
configuration. In some cases, existing parts of the current configuration can be reused, and the configuration just
needs to be updated or extended. For example, if an object currently uses NumberOfFailuresToTolerate=1, and the
user asks for NumberOfFailuresToTolerate=2, Virtual SAN can simply add another mirror (and witness).
In other cases, such as changing the stripe width from one to two, Virtual SAN cannot reuse existing replicas, and it
creates a brand new replica (or replicas) without impacting the existing objects.

Storage Policy Attributes


The screenshot in Figure 46 displays the current policy attributes available with Virtual SAN:

Figure 46: Virtual SAN Policy Attributes

Number of Disk Stripes per Object


This policy attribute establishes the minimum number of capacity devices used for striping each virtual-machine
replica. A value higher than 1 might result in better performance, but it also results in higher resource
consumption. The default and minimum value is 1, and the maximum value is 12. The stripe size is 1MB.
Virtual SAN may decide that an object needs to be striped across multiple disks without any stripe-width policy
requirement. The reason for this can vary, but typically it occurs when a VMDK is too large to fit on a single
physical drive. However, when a particular stripe width is required, then it should not exceed the number of disks
available to the cluster.
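
To make the stripe width and 1MB stripe size concrete, the sketch below maps a byte offset in an object to a stripe component. It assumes a plain round-robin (RAID-0 style) layout, which is an illustrative assumption; the text above does not describe the exact on-disk layout.

```python
from typing import Tuple

STRIPE_SIZE = 1024 * 1024  # 1MB stripe size, as stated in the text
MAX_STRIPE_WIDTH = 12      # maximum policy value, as stated in the text


def locate(offset: int, stripe_width: int) -> Tuple[int, int]:
    """Map an object byte offset to (component index, offset inside that component).

    Assumes stripes are dealt out round-robin across the components.
    """
    if not 1 <= stripe_width <= MAX_STRIPE_WIDTH:
        raise ValueError("stripe width must be between 1 and 12")
    stripe_index = offset // STRIPE_SIZE
    component = stripe_index % stripe_width
    component_offset = (stripe_index // stripe_width) * STRIPE_SIZE + offset % STRIPE_SIZE
    return component, component_offset


# With a stripe width of 2, consecutive 1MB stripes alternate between components.
print(locate(0, 2))
print(locate(STRIPE_SIZE, 2))
print(locate(2 * STRIPE_SIZE + 5, 2))
```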

Flash Cache Reservation


Flash Cache Reservation refers to flash capacity reserved as read cache for the virtual-machine object, and it
applies to hybrid configurations only. By default, Virtual SAN dynamically allocates read cache to storage objects
based on demand. As a result, no need typically exists to change the default 0 value for this parameter.
However, in very specific cases, when a small increase in the read cache for a single VM can provide a significant
change in performance, it is an option. It should be used with caution to avoid wasting resources or taking
resources from other VMs.
The default value is 0 percent. Maximum value is 100 percent.

Number of Failures to Tolerate


This FTT option generally defines the number of host and device failures that a virtual machine object can tolerate.
For n failures tolerated, n+1 copies of the VM object are created and 2n+1 hosts with storage are required.
The default value is 1. Maximum value is 3.
Virtual SAN supports two specific configurations when erasure codes are enabled. The first, RAID-5, applies when
the number of failures to tolerate is set to 1, and the second, RAID-6, applies when the number of failures to
tolerate is set to 2. Note that a Virtual SAN cluster size needs to be at least four hosts for RAID-5 and at least six
hosts for RAID-6. Of course, it may be (much) larger than that.
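
The sizing rules in this section reduce to simple arithmetic. The following sketch encodes them; the function names are illustrative, and the accepted FTT range of 0 to 3 is an assumption based on the stated maximum of 3.

```python
def mirroring_requirements(ftt: int):
    """Return (copies, hosts) for RAID-1 mirroring: n+1 copies and 2n+1 hosts."""
    if not 0 <= ftt <= 3:
        raise ValueError("failures to tolerate must be between 0 and 3")
    return ftt + 1, 2 * ftt + 1


def erasure_coding_requirements(ftt: int):
    """Return (RAID level, minimum hosts) for the two erasure-coding configurations."""
    table = {1: ("RAID-5", 4), 2: ("RAID-6", 6)}
    if ftt not in table:
        raise ValueError("erasure coding applies only when FTT is 1 or 2")
    return table[ftt]


for n in (1, 2, 3):
    copies, hosts = mirroring_requirements(n)
    print(f"FTT={n}: {copies} copies, {hosts} hosts")
print(erasure_coding_requirements(1))
print(erasure_coding_requirements(2))
```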


Fault Tolerance Method
Fault Tolerance Method specifies whether the data-replication method optimizes for performance or capacity. The
RAID-1 mirroring option for performance uses more disk space to place the object components but consumes less
CPU and network resources. RAID-5/6 erasure coding is the capacity option. It uses less disk space, but consumes
more CPU and network resources. (An upcoming section on erasure coding provides additional information.)

IOPS Limit for Object (QoS)


This attribute defines the IOPS limit for an object, such as a VMDK. IOPS is calculated as the number of disk I/O
operations, using a weighted size. If the system uses the default base size of 32KB, a 64KB I/O would be
represented as two I/O operations. This Quality of Service option can be used to keep workloads from impacting each other
(the noisy-neighbor issue) or establish limits for differentiated services.
A few notes regarding IOPS:
• When calculating IOPS, read and write are considered equivalent, but keep in mind that cache-hit ratio and
  sequentiality are not considered.
• When an object exceeds its disk IOPS limit, I/O operations are throttled.
• If the IOPS limit for object is set to 0, IOPS limits are not enforced.
• Virtual SAN allows the object to double the IOPS-limit rate during the first second of operation or after a period
  of inactivity.
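
The weighted I/O count can be sketched as follows. Rounding partial multiples of the 32KB base size up to the next whole operation is an assumption made for this illustration, and the function names are hypothetical.

```python
import math

BASE_IO_SIZE = 32 * 1024  # default base size of 32KB, as stated in the text


def normalized_io_count(io_size_bytes: int) -> int:
    """Count one I/O as a number of base-size operations (assumed to round up)."""
    return max(1, math.ceil(io_size_bytes / BASE_IO_SIZE))


def is_throttled(normalized_iops: int, iops_limit: int) -> bool:
    """An IOPS limit of 0 means limits are not enforced."""
    return iops_limit != 0 and normalized_iops > iops_limit


print(normalized_io_count(64 * 1024))  # a 64KB I/O counts as two operations
print(normalized_io_count(4 * 1024))   # a small I/O still counts as one
print(is_throttled(501, 500))
print(is_throttled(501, 0))
```

Reads and writes are treated identically in this sketch, matching the first note above.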

Figure 47: IOPS limits impact Quality of Service.

Checksum
Virtual SAN uses end-to-end checksum to ensure the integrity of data by confirming that each copy of a file is
exactly the same as the source file. The system checks the validity of the data during read/write operations. If a
checksum mismatch is detected, Virtual SAN automatically repairs the data by overwriting it with correct data; if
the data cannot be repaired, it reports the error. Checksum calculation and error-correction are background operations.
The default setting of the Disable Object Checksum attribute for all objects in the cluster is No, which means that
checksum is enabled.
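
The verify-then-repair behavior can be sketched in a few lines. The text does not name the checksum algorithm, so `zlib.crc32` is used here purely as a stand-in, and `read_other_replica` is a hypothetical callback that fetches the same block from another copy.

```python
import zlib


def verify_and_repair(block: bytes, stored_checksum: int, read_other_replica):
    """Return (data, repaired). Falls back to another replica on a mismatch."""
    if zlib.crc32(block) == stored_checksum:
        return block, False
    candidate = read_other_replica()
    if zlib.crc32(candidate) != stored_checksum:
        # No good copy is available, so the error is reported instead of repaired.
        raise IOError("checksum mismatch on all copies")
    return candidate, True


good = b"vxrail-block"
checksum = zlib.crc32(good)
print(verify_and_repair(good, checksum, lambda: good))
print(verify_and_repair(b"vxrail-blocK", checksum, lambda: good))
```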

Force Provisioning
If this option is set to Yes, the object is provisioned even if the NumberOfFailuresToTolerate,
NumberOfDiskStripesPerObject, and FlashReadCacheReservation policies specified in the storage policy cannot be
satisfied by the datastore.
This parameter is used in bootstrapping scenarios and during an outage when standard provisioning is no longer
possible.
The default No is acceptable for most production environments. With this setting, Virtual SAN does not provision a
virtual machine whose policy requirements cannot be met, although it still creates the user-defined storage policy.

Object Space Reservation


Object space reservation defines the percentage of the logical size of the VMDK object that is reserved, or
thick provisioned, when virtual machines are deployed.
The default value is 0%. Maximum value is 100%.
The value should be set either to 0% or 100% when using RAID-5/6.

I/O Paths and Caching Algorithms [1]


This section elaborates on some of the Virtual SAN concepts that have been introduced so far with additional,
general information about Virtual SAN's caching algorithms. The next paragraphs briefly describe how Virtual SAN
leverages flash, memory, and rotating disks. They also illustrate the I/O Paths between the guest OS and the
persistent storage areas.

Read Caching
Read caching in Virtual SAN exists to separate performance from capacity and deliver low latency and capacity
density at a competitive cost. Part of the SSD is used as the read cache (RC) of the corresponding disk group. The
purpose is to serve the highest possible ratio of read operations from data staged in the RC and to minimize the
portion of read operations served by the HDDs. It leverages the higher IOPS capabilities and lower latencies of the
SSDs to provide a cost-performance solution for the VxRail appliance.
The RC is organized in terms of cache lines. They represent the unit of data management in RC, and the current
size is 1MB. Data is fetched into the RC and evicted at cache-line granularity. In addition to the SSD read cache,
Virtual SAN also maintains a small in-memory (RAM) read cache that holds the most-recently accessed cache lines
from the RC. The in-memory cache is dynamically sized based on the available memory in the system.
Virtual SAN maintains in-memory metadata that tracks the state of the RC (both SSD and in memory), including
the logical addresses of cache lines, valid and invalid regions in each cache line, aging information, etc. These data
structures are designed to be compact, using memory space efficiently without imposing a substantial CPU
overhead on regular operations. RC metadata never needs to be swapped in or out of persistent storage. (This is one
area where VMware holds important IP.)
Read-cache contents are not tracked across power-cycle operations of the host. If power is lost and recovered, then
the RC is re-populated (warmed) from scratch. So, essentially RC is used as a fast-storage tier, and its persistence
is not required across power cycles. The rationale behind this approach is to avoid any overheads on the common
data path that would be required if the RC metadata were persisted every time the RC was modified, such as during
cache-line fetching and eviction, or when write operations invalidate a sub-region of a cache line.

Anatomy of a Hybrid Read


Read operations follow a defined procedure. To illustrate, the VMDK in the example below has two replicas on esxi-01
and esxi-03.

[1] Much of the content in this specific section has been extracted from an existing technical whitepaper: An overview of
VMware VSAN Caching Algorithms.


1. Guest OS issues a read on virtual disk.
2. Owner chooses replica to read from.
   • Load balance across replicas
   • Not necessarily local replica (if one)
   • A block always reads from same replica
3. At chosen replica (esxi-03): read data from flash write buffer, if present.
4. At chosen replica (esxi-03): read data from flash read cache, if present.
5. Otherwise, read from HDD and place data in flash read cache.
   • Allocate a 1MB buffer for the missing cache line and replace coldest data (eviction of coldest data to
     make room for new read)
     o Each missing line is read from the HDD as multiples of 64KB chunks, starting with the chunks that
       contain the referenced data
6. Return data to owner.
7. Complete read and return data to VM.
8. Once the 1MB cache line is added to the in-line read cache, its population continues asynchronously. This
   occurs to exploit both the spatial and temporal locality of reference, increasing the chances that the next
   reads will be found in the read cache.
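
The lookup order at the chosen replica (write buffer, then read cache, then HDD) can be sketched as below. This is a simplified model under several assumptions: 4KB blocks (256 per 1MB cache line), least-recently-used eviction as a stand-in for "coldest data", and a synchronous fetch of the whole cache line rather than the asynchronous 64KB-chunk population described above. The `HybridReplica` class is hypothetical.

```python
from collections import OrderedDict

BLOCKS_PER_LINE = 256  # assumption: 4KB blocks in a 1MB cache line


class HybridReplica:
    def __init__(self, hdd, max_lines):
        self.hdd = hdd                   # block number -> data (capacity tier)
        self.write_buffer = {}           # block number -> most recent written data
        self.read_cache = OrderedDict()  # cache-line number -> {block: data}, oldest first
        self.max_lines = max_lines

    def read(self, block):
        """Return (data, tier) following the hybrid read order."""
        if block in self.write_buffer:
            return self.write_buffer[block], "write-buffer"
        line = block // BLOCKS_PER_LINE
        if line in self.read_cache:
            self.read_cache.move_to_end(line)  # mark the line as recently used
            return self.read_cache[line][block], "read-cache"
        if len(self.read_cache) >= self.max_lines:
            self.read_cache.popitem(last=False)  # evict the coldest line
        start = line * BLOCKS_PER_LINE
        self.read_cache[line] = {
            b: self.hdd[b] for b in range(start, start + BLOCKS_PER_LINE) if b in self.hdd
        }
        return self.hdd[block], "hdd"


hdd = {b: f"data-{b}" for b in range(1024)}
replica = HybridReplica(hdd, max_lines=2)
print(replica.read(5))  # first access is served from HDD and fills the cache line
print(replica.read(6))  # a neighboring block now hits the read cache
```

The second read shows the spatial-locality benefit: block 6 was never requested before, but it arrived with the cache line fetched for block 5.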

Figure 48: Hybrid Read

Anatomy of an All-Flash Read


1. Guest OS issues a read on virtual disk.
2. Owner chooses replica to read from.
   • Load balance across replicas
   • Not necessarily local replica (if one)
3. At chosen replica (esxi-03): read data from flash write buffer, if present.
4. Otherwise, read from capacity flash device.
5. Return data to owner.
6. Complete read and return data to VM.

Figure 49: All-Flash Read


The major difference from the hybrid read path is that read-cache misses cause no serious performance degradation. Reads from flash
capacity devices should be almost as quick as reads from the cache SSD. Another significant difference is that no
need exists to move the block from the capacity layer to the cache layer, as in hybrid configurations.

Write Caching
Why write-back caching? In hybrid configurations, this is done entirely for performance. The aggregate-storage
workloads in virtualized infrastructures are almost always random, thanks to the statistical multiplexing of the
many VMs and applications that share the infrastructure.
HDDs can perform only a small number of random I/O operations, with high latency, compared to SSDs. So, sending the
random write part of the workload directly to spinning disks can cause performance degradation. On the other
hand, magnetic disks exhibit decent performance for sequential workloads. Modern HDDs may exhibit sequential-like
behavior and performance even when the workload is not perfectly sequential. Proximal I/O suffices.
In hybrid disk groups, Virtual SAN uses the write-buffer (WB) section of the SSD (by default, 30 percent of device
capacity) as a write-back buffer that stages all the write operations. The key objective is to de-stage written data
(not individual write operations) in a way that creates a benign, near-sequential (proximal) write workload for the
HDDs that form the capacity tier.
In all-flash disk groups, Virtual SAN utilizes the tier-1 SSD entirely as a write-back buffer (100 percent of the
device capacity, up to a maximum of 600GB). The purpose of the WB is quite different in this case. It absorbs the
highest rate of write operations in a high-endurance device and allows only a trickle of data to be written to the
capacity flash tier. This approach allows low-endurance, larger-capacity SSDs at the capacity tier.
Nevertheless, capacity-tier SSDs are capable of serving very large numbers of read IOPS. Thus, no read caching
occurs in the tier-1 SSD, except when the most-recent data referenced by a read operation still resides in the WB.


In either case (hybrid or all-flash), every write operation is handled through transactional processes:
• A record for the operation is persisted in the transaction log in the SSD.
• The data (payload) of the operation is persisted in the WB.
• Updated in-memory tables reflect the new data and its logical address space (for tracking) as well as its
  physical location in the capacity tier.
• The write operation completes upstream after the transaction has committed successfully.
Commonly (under typical steady-state workload), the log records of multiple write operations are coalesced before
they are persisted in the log. This reduces the amount of metadata-write operations for the SSD. By definition, the
log is a circular buffer, written and freed in a sequential fashion. Thus write amplification can be avoided (good for
device endurance). The WB region allocates blocks in a round-robin fashion, keeping wear leveling in mind. Even
when a write operation overwrites existing WB data, Virtual SAN never rewrites an existing SSD page in place.
Instead, it allocates a new block and updates metadata to reflect that the old blocks are invalid. Virtual SAN fills an
entire SSD page before it moves to the next one. Eventually, entire pages are freed when all their data is invalid.
(It is very rare to re-buffer data to allow SSD pages to be freed). Also, because the device firmware does not have
visibility into invalidated data, it sees no holes in pages. In effect, internal write leveling (by moving data around
to fill holes in pages) is all but eliminated. This extends the overall endurance of a device. In general, the Virtual
SAN design has gone to great lengths to impose a benign workload in terms of endurance. As a result, the life
expectancy of SSDs implemented in Virtual SAN may exceed the manufacturer's specifications, which are
developed with more generic workloads in mind.

Anatomy of a Write I/O: Hybrid and All-Flash


1. VM running on host esxi-01.
2. esxi-01 is owner of virtual disk object.
   • Number Of Failures To Tolerate = 1
3. Object has two (2) replicas on esxi-01 and esxi-03.
4. Guest OS issues write op to virtual disk.
5. Owner clones write operation.
   • In parallel: sends write op to esxi-01 (locally) and esxi-03
6. esxi-01, esxi-03 persist operation to flash (log).
7. esxi-01, esxi-03 ACK-write operation to owner.
8. Owner waits for ACK from both writes and completes I/O.
9. Later, backend hosts commit batch of writes.
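
The write sequence above can be modeled with a short sketch. The `ReplicaHost` class and `owner_write` function are hypothetical; a Python list stands in for the flash log and a dictionary for the capacity tier, and threads stand in for the parallel network sends.

```python
from concurrent.futures import ThreadPoolExecutor


class ReplicaHost:
    def __init__(self, name):
        self.name = name
        self.log = []       # stand-in for the flash transaction log and write buffer
        self.capacity = {}  # stand-in for the capacity tier

    def persist(self, block, data):
        """Persist the operation to flash and acknowledge it."""
        self.log.append((block, data))
        return True

    def destage(self):
        """Later, commit the batch of logged writes to the capacity tier."""
        for block, data in self.log:
            self.capacity[block] = data
        self.log.clear()


def owner_write(replicas, block, data):
    """Clone the write to every replica in parallel and wait for all ACKs."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        acks = list(pool.map(lambda host: host.persist(block, data), replicas))
    return all(acks)


esxi_01, esxi_03 = ReplicaHost("esxi-01"), ReplicaHost("esxi-03")
print(owner_write([esxi_01, esxi_03], 7, b"payload"))  # completes once both ACK
esxi_01.destage()
esxi_03.destage()
```

The I/O completes as soon as both replicas have logged the write to flash; de-staging to the capacity tier happens afterward and is not on the latency path.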

Figure 50: Hybrid and Flash Write I/O

Distributed Caching Considerations


Virtual SAN's caching algorithms and data-locality techniques reflect a number of objectives and observations
pertaining to distributed caching:
• Virtual SAN exploits temporal and spatial locality for caching.
• Virtual SAN implements a distributed, persistent cache on flash across the cluster. Caching is done in front of
  the disks where the data replicas live, not on the client side. A distributed-caching mechanism results in better
  overall flash-cache utilization.
• Another benefit of distributed caching appears during VM migrations, which can happen in some data centers over
  ten times a day. With DRS and vMotion, VMs can move around from host to host in a cluster. Without a
  distributed cache, the migrations would have to move around a lot of data and rewarm caches every time a VM
  migrates. As the graph below (Figure 51) illustrates, Virtual SAN prevents any performance degradation after a
  VM migration.

Figure 51: Virtual SAN prevents performance degradation after VM migration.


• The network introduces a small latency when accessing data on another host. Typical latencies in 10GbE
  networks range from 5 to 50 microseconds. Typical latencies of a flash drive, accessed through a SCSI layer,
  are near 1ms for small (4K) I/O blocks. So, for the majority of the I/O executed in the system, the network
  impact adds only about 0.5 to 5 percent to the latency.
• Few workloads are truly cache friendly, meaning that small increases in cache size significantly increase
  their rate of I/O. Those workloads can benefit from a local cache.
• Virtual SAN works with View Accelerator (a deduplicated, in-memory read cache), which is notably effective for
  VDI use cases. Remember also that Virtual SAN 6.2 features a client cache that leverages DRAM local to
  the virtual machine to accelerate read performance. The amount of memory allocated is 0.4 percent of host
  memory, up to 1GB per host.

Virtual SAN High Availability and Fault Domains


Virtual SAN policy attributes establish parameters to protect against host failures, but they may not be the most
effective or efficient way to build tolerance for events like rack failures. This section surveys the availability
solutions for Virtual SAN clusters on the VxRail appliance. It starts out by looking at the availability implications on
small VxRail deployments with fewer than four nodes.

Limitations of Two- and Three-Node Configurations


Currently, VxRail clusters require a minimum of four nodes. If starting small is the ideal for scalability, why not
begin even smaller than the four-node cluster? Virtual SAN supports a three-node cluster, but IT shops that deploy
it need to understand the trade-off between the cost of the hardware and software components and the degree of
availability that the configuration provides. Two- and three-node configurations can behave differently from
configurations with at least four nodes. In particular, the system can come up short in the event of a failure. Such
small clusters have slim resources, certainly not enough to rebuild components on another host and automatically
restore fault tolerance. Also, two-node and three-node configurations affect VM uptime during certain
host-maintenance operations that require data migration to another host.
Recall that Virtual SAN replication requires two copies of data and a witness, each of which resides on a different
host. In configurations with fewer than four nodes, that is a problem. At best, they can tolerate only one failure. If a
node fails, Virtual SAN can neither rebuild components nor provision new VMs that tolerate failures until the failed
node is replaced. When the applications require maximum availability, both for planned and unplanned outage
scenarios, a configuration with at least four nodes is recommended.
That said, VCE is planning a two-node VxRail appliance for the near future. The two-node deployment is targeted at
ROBO locations where a small witness VM can reside in the central data center (1+1+1) or in the cloud. Each of the
nodes is a failure domain. The witness VM requires two vCPUs, 8GB of memory, 15GB of capacity, and 10GB for
caching.
On larger enterprise deployments, a three- or four-node Virtual SAN cluster could be deployed in the central data
center to host all the witnesses (as in Figure 52 below). All sites could be managed centrally by a single instance of
vCenter. (vSphere limitations apply: 1,000 hosts per vCenter, etc.)

Figure 52: ROBO implementation: witness VMs at a central location.

Fault Domain Overview


Virtual SAN and VxRail appliances implement fault domains as a solution for tolerating rack and site failures. Fault
domains instruct Virtual SAN to spread redundancy components across the servers in separate racks. They protect
the environment from a rack-level failure such as loss of power or connectivity. Consider, for example, a cluster
with four VxRail appliances, each one placed in a different rack. The nodes of each appliance can be in a different
fault domain.
In terms of implementation, any host that is not part of another fault domain is considered its own single-host fault
domain. Virtual SAN requires at least two fault domains, and each has at least one host. Fault-domain definitions
recognize the physical hardware constructs that represent the domain itself. Once the domain is enabled, Virtual
SAN applies the active virtual-machine storage policy to the entire domain, instead of just to the individual hosts.
The number of fault domains required in a cluster is calculated based on the FTT attribute:
(NumberOfFaultDomains) = 2 * (NumberOfFailuresToTolerate) + 1

Figure 53: Managing Fault Domains


Fault Domains and Rack-Level Failures
The fault-domain mechanism is smart enough to perceive when the configuration is vulnerable. Consider a cluster
that contains four server racks, each with two hosts. If the NumberOfFailuresToTolerate is set to 1, and fault
domains are not enabled, Virtual SAN might store both replicas of an object on hosts in the same rack, and if
that is the case, applications are exposed to a potential rack-level failure. With fault domains enabled, however,
Virtual SAN ensures that each protection component (replicas and witnesses) is placed in a separate fault domain.
It makes sure that the hosts holding them can't fail together. The chart below (Figure 54) illustrates four server
racks, each with two ESXi hosts.
Four defined Fault Domains:
FD1 = esxi-01, esxi-02
FD2 = esxi-03, esxi-04
FD3 = esxi-05, esxi-06
FD4 = esxi-07, esxi-08

Figure 54: Fault Domains for a Four-Server VxRail Rack

This configuration guarantees that the replicas of an object are stored in hosts of different rack enclosures,
ensuring availability and data protection in case of a rack-level failure.
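
The placement rule can be illustrated with the four fault domains listed above. The sketch assumes n+1 replicas and n witnesses for n failures to tolerate (which adds up to the 2n+1 fault domains required) and simply picks the first host in each domain; both choices are simplifications for illustration, and `place_components` is a hypothetical function.

```python
FAULT_DOMAINS = {
    "FD1": ["esxi-01", "esxi-02"],
    "FD2": ["esxi-03", "esxi-04"],
    "FD3": ["esxi-05", "esxi-06"],
    "FD4": ["esxi-07", "esxi-08"],
}


def required_fault_domains(ftt: int) -> int:
    """NumberOfFaultDomains = 2 * NumberOfFailuresToTolerate + 1."""
    return 2 * ftt + 1


def place_components(fault_domains, ftt):
    """Place each replica and witness in its own fault domain."""
    needed = required_fault_domains(ftt)
    if len(fault_domains) < needed:
        raise ValueError(f"FTT={ftt} needs {needed} fault domains")
    roles = [f"replica-{i + 1}" for i in range(ftt + 1)]
    roles += [f"witness-{i + 1}" for i in range(ftt)]
    chosen = sorted(fault_domains)[:needed]
    # Assumption: take the first host of each domain; real placement weighs capacity and load.
    return {role: (fd, fault_domains[fd][0]) for role, fd in zip(roles, chosen)}


for role, (domain, host) in place_components(FAULT_DOMAINS, 1).items():
    print(role, domain, host)
```

With four racks, FTT=1 fits comfortably (three domains needed), while FTT=2 would need five domains and is rejected by this sketch.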

Virtual SAN Stretched Cluster


We touched on the advantages of Virtual SAN's native integration with vSphere, and the concept of a stretched
cluster is exactly the kind of thing we were talking about. This is a case where deploying VxRail technology extends
the availability of the larger enterprise data center. The stretched cluster is a specific configuration implemented in
environments where the requirement for data-center level disaster/downtime avoidance is absolute. We've already
reviewed the way fault domains enable rack awareness for rack failures. This section discusses how fault domains
provide data-center awareness, maintaining virtual-machine availability despite specific data-center failure
scenarios.
In a VxRail environment, a stretched cluster with a witness host refers to a deployment where a Virtual SAN cluster
consists of two active/active sites with an identical number of ESXi hosts distributed evenly between them. The
sites are connected via a high-bandwidth, low-latency link.

Figure 55: Stretched VxRail Cluster

In the graphic above (Figure 55), each site is configured as a Virtual SAN fault domain. The nomenclature used to
describe the stretched-cluster configuration is X+Y+Z, where X is the number of ESXi hosts at Site A, Y is the
number of ESXi hosts at Site B, and Z is the number of witness hosts at site C.
A virtual machine deployed on a stretched cluster has one copy of its data on Site A, and another on Site B, as well
as witness components placed on the host at Site C.
It is a singular configuration, achieved only through a combination of fault domains, hosts and VM groups, and
affinity rules. In the event of a complete site failure, the other site still has a full copy of virtual-machine data and
at least half of the resource components available. That means all the VMs remain active and available on the
Virtual SAN datastore.
The minimum supported configuration is 1+1+1 (3 nodes). The maximum configuration is 15+15+1 (31 nodes).
Stretched clusters are supported by both hybrid- and all-flash VxRail configurations.
NOTE: This section contains only a brief discussion of design considerations. More information can be found in
VMware's Virtual SAN 6.2 Stretched Cluster Guide: http://www.vmware.com/files/pdf/products/vsan/VMwareVirtual-SAN-6.2-Stretched-Cluster-Guide.pdf
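
The X+Y+Z nomenclature and the supported range described above can be checked with a small sketch. The `parse_stretched_config` function is hypothetical, and it validates only the stated bounds (1+1+1 through 15+15+1); it does not enforce that both data sites have the same number of hosts.

```python
def parse_stretched_config(spec: str):
    """Parse 'X+Y+Z' into (site A hosts, site B hosts, witness hosts)."""
    parts = spec.split("+")
    if len(parts) != 3:
        raise ValueError("expected the form X+Y+Z")
    site_a, site_b, witness = (int(part) for part in parts)
    if not 1 <= site_a <= 15 or not 1 <= site_b <= 15:
        raise ValueError("each data site supports 1 to 15 hosts")
    if witness != 1:
        raise ValueError("exactly one witness host is expected")
    return site_a, site_b, witness


for spec in ("1+1+1", "15+15+1"):
    config = parse_stretched_config(spec)
    print(spec, "->", sum(config), "nodes")
```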

Site Locality
In a conventional storage-cluster configuration, reads are distributed across replicas. In a stretched-cluster
configuration, the Virtual SAN Distributed Object Manager (DOM) also takes into account the object's fault domain,
and only reads from replicas in the same domain. That way, it avoids any lag time associated with using the
inter-site network to perform reads.


Networking
Both Layer-2 (same subnet) and Layer-3 (routed) configurations are used for stretched-cluster deployments. A
Layer-2 connection should exist between data sites, and a Layer-3 connection between the witness and the data
sites.
The bandwidth between data sites depends on workloads, but VCE recommends a minimum of 10Gbps for VxRail
appliances, and that should accommodate the stretched cluster. (In two-node ROBO configurations, dedicated
1Gbps may suffice, but it still depends on workload activity.) The supported latency for witness hosts is up to
500ms RTT, and the witness link needs a bandwidth of 2Mbps for every 1,000 Virtual SAN objects. Also bear in mind
that the latency between data sites should be no greater than 5ms RTT; the estimated distance for a 5ms RTT is
500km, or about 310 miles.
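
These networking figures lend themselves to a quick sizing check. The sketch below assumes the witness bandwidth scales linearly with the object count, which is an interpretation of "2Mbps for every 1,000 objects"; the function names are illustrative.

```python
def witness_bandwidth_mbps(object_count: int) -> float:
    """Witness-link bandwidth at 2Mbps per 1,000 Virtual SAN objects (assumed linear)."""
    return 2.0 * object_count / 1000


def data_site_link_ok(rtt_ms: float, bandwidth_gbps: float) -> bool:
    """Check the inter-site link against the 5ms RTT and 10Gbps figures in the text."""
    return rtt_ms <= 5 and bandwidth_gbps >= 10


def witness_link_ok(rtt_ms: float, bandwidth_mbps: float, object_count: int) -> bool:
    """Check the witness link against the 500ms RTT limit and the required bandwidth."""
    return rtt_ms <= 500 and bandwidth_mbps >= witness_bandwidth_mbps(object_count)


print(witness_bandwidth_mbps(2500))
print(data_site_link_ok(rtt_ms=3.2, bandwidth_gbps=10))
print(witness_link_ok(rtt_ms=120, bandwidth_mbps=4, object_count=2500))
```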

Stretched-Cluster Heartbeats and Site Bias


Stretched cluster configurations effectively have three fault domains. The first functions as the preferred data site,
the second is the secondary data site, and the third is simply the witness host site.
The Virtual SAN master node is placed on the preferred site and the Virtual SAN backup node is placed on the
secondary site. As long as nodes (ESXi hosts) are available in the preferred site, then a master is always selected
from one of the nodes on this site. Similarly, the backup is always selected from the secondary site, as long as
nodes are available there.
The master node and the backup node send heartbeats every second. If heartbeat communication is lost for five
consecutive heartbeats (five seconds), the witness is deemed to have failed. If the witness has suffered a
permanent failure, a new witness host can be configured and added to the cluster. Preferred sites gain ownership in
case of a partition.
After a complete failure, both the master and the backup end up at the sole remaining live site. Once the failed site
returns, it continues with its designated role as preferred or secondary, and the master and backup migrate to
their respective locations.
In terms of the communication with the witness, if the heartbeat pauses for five consecutive beats, the master
assumes that the witness failed. If it is a permanent failure, a new witness host needs to be configured and added to
the cluster, and preferred sites gain ownership in case of a partition.
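
The five-missed-heartbeats rule can be expressed as a small counter. The `HeartbeatMonitor` class is a hypothetical illustration of the rule, with one call to `tick` per one-second heartbeat interval.

```python
class HeartbeatMonitor:
    """Declare a peer failed after a run of consecutive missed heartbeats."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.missed = 0

    def tick(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat interval; return True once the peer is deemed failed."""
        if heartbeat_received:
            self.missed = 0  # any heartbeat resets the consecutive count
        else:
            self.missed += 1
        return self.missed >= self.threshold


monitor = HeartbeatMonitor()
for second, received in enumerate([True, False, False, False, False, False], start=1):
    print(second, monitor.tick(received))
```

Because the count resets on every received heartbeat, intermittent losses shorter than five seconds do not trigger a failure.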

vSphere HA settings for Stretched Cluster


Host monitoring is enabled by default in all VxRail deployments, including of course stretched-cluster
configurations. This feature also uses network heartbeat to determine the status of hosts participating in the
cluster. It indicates a possible need for remediation, such as restarting virtual machines on other cluster nodes.
Configuring admission control ensures that vSphere HA has sufficient available resources to restart virtual
machines after a failure. This may be even more significant in a stretched cluster than it is in a single-site
cluster, because it makes the entire, multi-site infrastructure resilient. Workload availability is perhaps the
primary motivation behind most stretched-cluster implementations.
The deployment needs sufficient capacity to accommodate full-site failure. Since the stretched cluster equally
divides the number of ESXi hosts between sites, VCE recommends configuring the admission-control policy to
50 percent for both CPU and memory to ensure that all workloads can be restarted by vSphere HA.

Snapshots
Snapshots have been around for a while as a means of capturing the state of a system at a particular point in time
(PIT), so that it can be rolled back to that state if need be after a crash. In the case of the VxRail solution,
administrators can create, roll back, or delete VM snapshots using the Snapshot Manager in the vSphere Web
Client. Each VM supports a chain of up to 32 snapshots.
A virtual machine snapshot generally includes the settings (.nvram and .vmx) and power state, the state of all the
VM's associated disks, and optionally, the memory state. Specifically, each snapshot includes:

• Delta disk:
  o The state of the virtual disk at the time the snapshot is taken is preserved. When this occurs, the guest OS
    is unable to write to its .vmdk file. Instead, changes are captured in an alternate file named
    VM_name-delta.vmdk.
• Memory-state file:
  o VM_name-Snapshot#.vmsn, where # is the next number in the sequence, starting with 1. This file
    holds the memory state since the snapshot was taken. If memory is captured, the size of this file is the
    size of the virtual machine's maximum memory. If memory is not captured, the file is much smaller.
• Disk-descriptor file:
  o VM_name-00000#.vmdk, a small text file that contains information about the snapshot.
• Snapshot-delta file:
  o VM_name-00000#-delta.vmdk, which contains the changes to the virtual disk's data at the time the
    snapshot was taken.
• VM_name.vmsd:
  o This snapshot list file is created when the virtual machine itself is deployed. It maintains VM snapshot
    information that goes into a snapshot list in the vSphere Web Client. This information includes the name of
    the snapshot .vmsn file and the name of the virtual-disk file.

The snapshot state uses a .vmsn extension and stores the requisite VM information at the time of the snapshot.
Each new VM snapshot generates a new .vmsn file. The size of this file varies, based on the options selected during
creation. For example, including the memory state of the virtual machine increases the size of the .vmsn file. It
typically contains the name of the VMDK, the display name and description, and an identifier for each snapshot.
Other files might also exist. For example, a snapshot of a powered-on virtual machine has an associated
snapshot_name_number.vmem file that contains the main memory of the guest OS, saved as part of the
snapshot.
A quiesce option is available to maintain consistent point-in-time copies for powered-on VMs. VMware Tools may
use its own sync driver or use Microsoft's Volume Shadow Copy Service (VSS) to quiesce not only the guest OS
file system, but also any Microsoft applications that understand VSS directives.

How Snapshots Work


Virtual SAN snapshots use an efficient, on-disk Virtual SANSparse format. When a base-disk snapshot is taken, it
creates a child delta disk. The parent functions as a static, PIT copy. Meanwhile the child delta starts a snapshot
chain, recording the virtual-machine write history. The delta disk snapshot object is made up of a set of grains,
where each grain is a block of sectors containing virtual-disk data. The deltas keep only changed grains, which
makes them space efficient.
In the diagram below (Figure 56), the base disk object is called Disk.vmdk and sits at the bottom of the chain. The
chain includes three snapshot objects (Disk-001.vmdk, Disk-002.vmdk and Disk-003.vmdk) that have been
taken at various intervals. Various guest-OS writes have also occurred at various intervals, leading to changes in
snapshot deltas.
Base object: writes to grains 1, 2, 3, and 5
Delta object Disk-001: writes to grains 1 and 4
Delta object Disk-002: writes to grains 2 and 4
Delta object Disk-003: writes to grains 1 and 6

Figure 56: Snapshot Chain


A virtual-machine read would return the following:
Grain 1: retrieved from Delta object Disk-003
Grain 2: retrieved from Delta object Disk-002
Grain 3: retrieved from Base object
Grain 4: retrieved from Delta object Disk-002
Grain 5: retrieved from Base object; 0 returned as it was never written
Grain 6: retrieved from Delta object Disk-003
The diagram below (Figure 57) reuses the example above to illustrate the Virtual SANSparse driver and its in-memory cache.

Figure 57: Virtual SANSparse Driver

When a guest OS sends a write request, the Virtual SANSparse driver writes the data to the top-most object in the
snapshot chain and updates its in-memory read cache. On subsequent reads, the Virtual SANSparse driver references the
in-memory read cache to determine which delta (or deltas) to read. The read requests are sent in parallel to all
deltas that have the necessary data.
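
The write and read behavior described above can be sketched in a few lines of Python. This is only a toy model, not the Virtual SANSparse driver itself: the class name, method names, and the tiny grain size are assumptions made for illustration, and reads are resolved one at a time rather than in parallel. It replays the writes listed for Figure 56 and uses an in-memory map to find the newest object that holds each grain.

```python
from typing import Dict, List, Optional

GRAIN_SIZE = 4  # hypothetical grain size in bytes, kept tiny for the example


class SnapshotChain:
    """Toy model of a base disk plus a chain of sparse delta objects."""

    def __init__(self, base_name: str) -> None:
        # Each object maps grain number -> data; index 0 is the base disk.
        self.names: List[str] = [base_name]
        self.objects: List[Dict[int, bytes]] = [{}]
        # In-memory cache: grain number -> index of the newest object holding it.
        self.cache: Dict[int, int] = {}

    def take_snapshot(self, delta_name: str) -> None:
        """Freeze the current top object and start a new, empty child delta."""
        self.names.append(delta_name)
        self.objects.append({})

    def write(self, grain: int, data: bytes) -> None:
        """Writes always land in the top-most object; the cache is updated."""
        top = len(self.objects) - 1
        self.objects[top][grain] = data
        self.cache[grain] = top

    def locate(self, grain: int) -> Optional[str]:
        """Return the name of the object that serves this grain, if any."""
        index = self.cache.get(grain)
        return None if index is None else self.names[index]

    def read(self, grain: int) -> bytes:
        """Read a grain; a grain that was never written reads back as zeros."""
        index = self.cache.get(grain)
        if index is None:
            return bytes(GRAIN_SIZE)
        return self.objects[index][grain]


# Replay the writes listed for the snapshot chain in Figure 56.
chain = SnapshotChain("Disk.vmdk")
for grain in (1, 2, 3, 5):
    chain.write(grain, b"base")
chain.take_snapshot("Disk-001.vmdk")
for grain in (1, 4):
    chain.write(grain, b"d001")
chain.take_snapshot("Disk-002.vmdk")
for grain in (2, 4):
    chain.write(grain, b"d002")
chain.take_snapshot("Disk-003.vmdk")
for grain in (1, 6):
    chain.write(grain, b"d003")

sources = {grain: chain.locate(grain) for grain in range(1, 8)}
```

In this sketch, `sources` maps grains 1 and 6 to Disk-003.vmdk, grains 2 and 4 to Disk-002.vmdk, and grains 3 and 5 to Disk.vmdk. Grain 7, which no object ever wrote, maps to `None` and reads back as zeros.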

Managing Snapshots
Administrators use the Snapshot Manager to review active virtual-machine snapshots and perform limited
management operations, including:
Delete, which commits snapshot data to its parent snapshot
Delete All, which removes all the snapshots, including the parent
Revert to, which rolls back to a referenced snapshot so that it becomes the current snapshot
Note that deleting a snapshot consolidates the changes between snapshots and previous disk states. It also writes
to the parent disk all data from the delta disk and the deleted snapshot. When the parent is deleted, all changes
merge with the base VMDK.
Administrators should also remember to monitor the read cache, because snapshots, when used extensively, can consume
read cache at a higher-than-optimal rate.
NOTE: For full details regarding VxRail snapshot technology, refer to Virtual SANSparse Tech Note for Virtual SAN
6.0 Snapshots at https://www.vmware.com/files/pdf/products/ SAN

Deduplication and Compression


Many IT sites want their storage solution to include data-reduction technology. For some, it's more of a
requirement than for others. Naturally, environments with highly redundant data (full-clone virtual desktops, for
instance, or homogeneous server operating systems) benefit the most from deduplication. Likewise, compression
makes more of an impact on resources that compress well: text, bitmap, and program files. For these
environments, deduplication and compression can dramatically reduce the amount of physical storage consumed,
resulting in a lower total cost of ownership.
It may sound obvious, but considering that deduplication and compression algorithms consume CPU and memory,
it's important to verify that the stored data in question is actually compressible. Sometimes data has already been
compressed (for example, certain graphics formats and video files) or encrypted. Such data may ultimately yield
little or no reduction at all in storage consumption from compression.

Advantages of Data-Reduction Technology


Several years ago, when NAND flash started to appear in storage arrays, a gulf separated HDDs from flash drives in
terms of cost/GB. Flash cost fifteen times more than magnetic devices. The introduction of deduplication and
compression techniques in the data path helped create the market segment of all-flash arrays (AFAs), which were
effective in reducing the cost of flash for tier-1 applications, despite the high cost of a global-lookup table for
fingerprints.
More recently, the cost of NAND flash has dropped 50 percent, and an all-flash configuration is suddenly very
attractive for more than just tier-1 workloads. It also has the opportunity to better balance the data-reduction
target and the consumption of CPU against memory and network resources on an appliance like VxRail. This is
precisely where data reduction benefits VxRail customers. The appliance includes in-line deduplication and
compression at a disk-group level.

Remember our conversation about flash endurance and drive writes per day (DWPD)? Currently, the price per GB
of a 1 DWPD flash drive is about 4.5 times that of a 10K RPM HDD. If cost alone is the issue, a
data reduction of 4.5 times makes the price of an all-flash appliance comparable to the cost of a hybrid
configuration.
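
The break-even arithmetic above can be checked with a short Python sketch. The function name and the normalized HDD cost of 1.0 are assumptions for illustration; only the 4.5x price ratio comes from the text.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per GB of logical data once data reduction is applied."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


HDD_COST = 1.0               # normalized price per GB of a 10K RPM HDD (assumed unit)
FLASH_COST = 4.5 * HDD_COST  # the 4.5x price ratio quoted in the text

# Data-reduction ratio at which flash matches the HDD price per GB.
break_even_ratio = FLASH_COST / HDD_COST

flash_at_break_even = effective_cost_per_gb(FLASH_COST, break_even_ratio)
flash_at_2_to_1 = effective_cost_per_gb(FLASH_COST, 2.0)
```

With a 4.5:1 reduction the effective flash cost equals the HDD cost. With a 2:1 reduction it is still 2.25 times the HDD cost, so at lower ratios the case for all-flash rests on performance rather than on price alone.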
But cost is not the only factor worth considering here. As HDD capacity increases, so does the gap in performance
between HDDs and flash disks. In other words, capacity grows much faster than performance. In terms of
IOPS/GB, a 3.8TB flash drive performs 50 times better than a 1.2TB 10K RPM HDD, and it has a latency advantage of at
least 10 to 1.
Because of its data-reduction technology, the all-flash VxRail configuration in particular has found the sweet spot in
terms of price-performance, even if the compression ratio is lower than 4:1. An all-flash appliance provides a
significantly higher throughput and a much more predictable performance behavior at an attractive cost.

In-line Deduplication and Compression per Disk Group


In Virtual SAN, deduplication occurs when data is de-staged from the cache tier. It uses a fixed block-length
deduplication (4KB blocks), which increases the chances of finding duplicated blocks. Virtual SAN performs the
deduplication algorithm within each disk group and reduces redundant copies into one copy (as in Figure 58 below).
Redundant blocks across multiple disk groups, though, are not deduplicated.
This is a smart technique. By deduplicating only when de-staging, the implementation minimizes the CPU overhead
of creating hash keys for new writes directed to the same cache locality. By limiting the deduplication domain to a
disk group, Virtual SAN further diminishes network overhead and CPU utilization. It avoids the requirement of a
global lookup table, which would add a sizable resource overhead. This way, resources can track to a smaller and
more meaningful block size.
Compression occurs after deduplication, but before the data is de-staged from the cache to the capacity tier. Virtual
SAN only stores compressed data if it can reduce a unique 4KB block to 2KB. Otherwise, the block is written
uncompressed, avoiding misalignment and resource waste.
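
A minimal sketch of the de-staging path described above follows. It is not Virtual SAN code: the class and method names are hypothetical, and SHA-1 and zlib are stand-ins chosen only because they ship with Python. The sketch fingerprints fixed 4KB blocks, keeps one copy per fingerprint within a single disk group, and stores the compressed form only when it fits in 2KB.

```python
import hashlib
import os
import zlib
from typing import Dict, List

BLOCK_SIZE = 4096      # fixed deduplication block size from the text
COMPRESS_LIMIT = 2048  # keep the compressed form only if it fits in 2KB


class DiskGroup:
    """Toy deduplication domain: fingerprints are private to one disk group."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}      # fingerprint -> stored payload
        self.compressed: Dict[str, bool] = {}  # fingerprint -> payload is compressed?
        self.block_map: List[str] = []         # logical block order -> fingerprint

    def destage(self, block: bytes) -> str:
        """Deduplicate, then compress, one block leaving the cache tier."""
        if len(block) != BLOCK_SIZE:
            raise ValueError("blocks must be exactly 4KB")
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in self.store:      # only unique blocks reach compression
            packed = zlib.compress(block)
            if len(packed) <= COMPRESS_LIMIT:
                self.store[fingerprint] = packed
                self.compressed[fingerprint] = True
            else:                              # poor ratio: write uncompressed
                self.store[fingerprint] = block
                self.compressed[fingerprint] = False
        self.block_map.append(fingerprint)
        return fingerprint

    def read(self, index: int) -> bytes:
        """Return the original 4KB block for a logical block index."""
        fingerprint = self.block_map[index]
        payload = self.store[fingerprint]
        return zlib.decompress(payload) if self.compressed[fingerprint] else payload


group = DiskGroup()
text_block = b"A" * BLOCK_SIZE         # highly compressible
random_block = os.urandom(BLOCK_SIZE)  # effectively incompressible

fp_text = group.destage(text_block)
fp_text_again = group.destage(text_block)  # duplicate: no new copy is stored
fp_random = group.destage(random_block)
```

A second `DiskGroup` instance would keep its own `store`, so the same block written to two disk groups is stored twice, which mirrors the per-disk-group deduplication domain.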

Figure 58: Deduplication

Latency and Resource Consumption
Performance overhead should be expected during a read miss, or during decompression when data moves from the
capacity to the performance tier. However, don't overlook the fact that any overhead is mitigated by low-latency
flash-disk response times, nearly 1ms for small-block I/O. Meanwhile, write latency should not be affected.
The metadata created in the data-reduction process is kept in the capacity tier and can consume between 3 and 5
percent of the flash-disk space.

Enabling Deduplication and Compression


In VxRail appliances, including stretched-cluster implementations, data-reduction operations use cluster-wide
settings. Deduplication and compression are disabled by default, so they need to be enabled. (See Figure 59.) They
are activated at the same time, which triggers an online rolling reformat of all the disks in the Virtual SAN
cluster. If deduplication and compression are disabled at some point, turning them back on triggers another
rolling reformat.

Figure 59: Deduplication and Compression Enabled

Erasure Coding
When it comes to fault tolerance and data protection, purely conventional data-replication services are not the most
workable solution for a distributed storage system, because replication consumes so much storage space. Erasure
coding provides a practical alternative for all-flash VxRail configurations. It breaks up data into fragments, and
distributes redundant chunks of data across the system.
Erasure codes introduce redundancy by using data blocks and striping. We briefly discussed striping earlier, and we
won't go too far into explaining it here, because it could lead to an unnecessary investigation of RAID technology.
But basically, data blocks are grouped in sets of n, and for each set of n data blocks, a set of p parity blocks exists.
Together, these sets of (n + p) blocks make up a stripe. The crux is that any n blocks of the (n + p) stripe are
enough to recover the entire data on the stripe.
In VxRail clusters, the data and parity blocks that belong to a single stripe are placed on different ESXi hosts in a
cluster, providing a layer of fault tolerance for each stripe. Stripes don't follow a one-to-one distribution model. It's
not a situation where the set of n data blocks sits on one host, and the parity set sits on another. Rather, the
algorithm distributes individual blocks from the parity set among the ESXi hosts.
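
The recovery property can be illustrated with the simplest erasure code, a single XOR parity block over n = 3 data blocks (p = 1). This is a sketch under that assumption: the function names are hypothetical, and the source does not describe the parity computation Virtual SAN actually uses.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Build an (n + 1) stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a stripe in which at most one block (marked None) is missing."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of the n surviving blocks reproduces the lost one.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


data = [b"AAAA", b"BBBB", b"CCCC"]  # n = 3 data blocks, each on its own host
stripe = make_stripe(data)          # 3+1 stripe spread over four hosts

damaged: List[Optional[bytes]] = list(stripe)
damaged[1] = None                   # simulate losing the host that held block 1
recovered = rebuild(damaged)
```

Rebuilding the lost block reads all three surviving blocks, which is the source of the rebuild overhead that erasure coding carries compared with copying a mirror.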

The diagrams below (Figures 60 and 61) illustrate the implementation. A 3+1 stripe uses 3 data blocks and 1 parity block.
It requires a minimum of four hosts or four fault domains to ensure availability in case one of the hosts or disks
fails. This is recognized as a RAID-5 network implementation.

Figure 60: RAID-5 Network

A RAID-6 implementation with a 4+2 configuration (4 data blocks and 2 parity blocks) requires at least six hosts.

Figure 61: RAID-6 Network

Look at the comparison of usable capacity in the graph below (Figure 62). The erasure-code protection method
increases the usable capacity up to 50 percent compared to mirroring.

Figure 62: Erasure coding increases usable capacity up to 50 percent.

An all-flash VxRail node using 3.84TB drives has up to 19.2TB of raw capacity (5 x 3.84).
When using mirroring as the protection method and an FTT policy of 1, the usable capacity is 9.6TB.
When using erasure coding as the protection method and FTT=1, the usable capacity is 14.4TB.
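
The capacity figures above follow from the ratio of data blocks to total blocks written. A small Python sketch, with a hypothetical helper name, reproduces them, assuming mirroring with FTT=1 writes one data copy plus one replica and RAID-5 writes a 3+1 stripe:

```python
def usable_capacity_tb(raw_tb: float, data_blocks: int, redundancy_blocks: int) -> float:
    """Usable capacity when every `data_blocks` carry `redundancy_blocks` of overhead."""
    return raw_tb * data_blocks / (data_blocks + redundancy_blocks)


raw_tb = 5 * 3.84  # five 3.84TB drives per node, as in the text

mirror_ftt1 = round(usable_capacity_tb(raw_tb, 1, 1), 2)  # RAID-1: one copy + one replica
raid5_ftt1 = round(usable_capacity_tb(raw_tb, 3, 1), 2)   # RAID-5: 3 data + 1 parity

gain_percent = round((raid5_ftt1 / mirror_ftt1 - 1) * 100)
```

Here `mirror_ftt1` is 9.6, `raid5_ftt1` is 14.4, and `gain_percent` is 50, matching the 50 percent increase in usable capacity cited for Figure 62.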

Enabling Erasure Coding


As mentioned in the section on Storage Policy Based Management, a rule called Fault Tolerance Method lets
administrators choose between RAID-1 (Mirroring) and RAID-5/6 (Erasure Coding). The FTT policy (in Figure 63)
determines the number of parity blocks written by the erasure code.

Figure 63: FTT policy determines the number of parity blocks written by the erasure code

VxRail implements erasure coding at a very granular level, and it can be applied to individual VMDKs, making for a nuanced
approach. For VMs with write-intensive workloads, the write-heavy component (a database log, for instance) can use a
mirroring policy, while the data component can use an erasure-coding policy.

Requirements
Erasure coding requires a minimum number of fault domains to ensure availability: four for RAID-5 and six for RAID-6.
(Remember that if no fault domains have been defined, each individual host becomes a fault domain.)

Overhead Issues (RAID-5 and RAID-6)


Erasure coding saves space, yes, but the cost is performance. Computing parity blocks consumes CPU cycles and
adds overhead to the network and disks, as does distributing data slices across multiple hosts. This extra activity
can affect latency and overall IOPS throughput.
The rebuild operation also adds overhead. In general, rebuild operations multiply the number of reads and network
transfers used for replication. If n refers to the number of data blocks in a stripe, then
the rebuild operation costs n times that of ordinary replication. For a 3+1 stripe, that means three disk reads and
three network transfers for every one of conventional data replication. The rebuild operation can also be invoked to
serve read requests for currently unavailable data.
This additional I/O is the primary reason why only all-flash VxRail configurations use erasure coding. The rationale
here is that the flash disks compensate for the extra I/O.

Integrated Solutions
STORAGE TIERING WITH CLOUDARRAY
EMC CloudArray, EMC's cloud storage gateway, is integrated into VxRail and seamlessly extends the appliance to public
and private clouds to securely expand managed storage capacity. Cloud-storage gateways make it possible to take
advantage of storage services from both public and private cloud storage providers while maintaining predictable
performance behavior. EMC CloudArray is accessed through VxRail Manager Extension and provides an additional 10TB of
on-demand cloud storage per appliance. EMC CloudArray currently provides connections (APIs) to over 20 different public
and private clouds including EMC ViPR, VMware vCloud Air, Rackspace, Amazon Web Services, Google Cloud, EMC Atmos,
and OpenStack. VxRail CloudArray can provide an elegant, seamless solution for cost-efficient cold (inactive) data storage
or an easily accessible online archive with predictable performance behavior.
VxRail deploys CloudArray as a virtual appliance: a preconfigured, ready-to-run VM packaged with an operating system
and a software application. Self-contained virtual appliances make it simpler to acquire, deploy, and manage applications.
The CloudArray virtual appliance is essentially a VM already installed with and running the EMC CloudArray software
application. The communication between the VxRail VMs and the CloudArray VM takes place through the VM IP network.
An iSCSI initiator is configured on the VM's guest OS to connect it to the CloudArray VM, and the IP address of the
CloudArray VM is defined as the iSCSI target. Figure 64 below illustrates the implementation.

Figure 64: CloudArray Communication


When using VxRail and CloudArray for cloud tiering, virtual disks (vdisks) are first created in the VSAN Datastore for the
CloudArray virtual appliance to use as cache sources. CloudArray identifies these vdisk devices as cache sources and
places them in pools, and the cache sources then allocate the capacity into different-sized spaces, or cache areas.
For the VxRail Appliance, CloudArray creates volumes using specific volume-provisioning definitions associated with the
cache area. These definitions determine whether the volume accesses capacity from a cloud service or remains local
(cloudless). Typically, local provisioning requires large cache areas that can store 100 percent of the volume capacity
locally. Large cache areas accommodate frequently accessed volumes. Less-active volumes are generally provisioned
using small cache areas and leverage a cloud provider for capacity.

Observe in the illustration (Figure 65) below that Vol1 requires 10GB of capacity from Cache1, which can provide up to
600GB of capacity. On the other hand, Vol2 requires 100GB of capacity from Cache2, which is allocated from a cache area
that provides only 25GB of capacity. Regardless of the cache area size, the cache always maintains the most recently
accessed data, and the less frequently accessed data can be tiered to a cloud.
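
The caching behavior can be sketched as a small least-recently-used (LRU) cache in front of a cloud tier. This is an assumption-laden toy, not CloudArray's implementation: the class name, the block granularity, and the choice of LRU as the eviction policy are all illustrative stand-ins for "the cache always maintains the most recently accessed data."

```python
from collections import OrderedDict
from typing import Dict


class TieredVolume:
    """Toy volume: a small local cache area backed by a (simulated) cloud tier."""

    def __init__(self, cache_blocks: int) -> None:
        if cache_blocks < 1:
            raise ValueError("the cache area must hold at least one block")
        self.cache_blocks = cache_blocks
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()  # oldest entry first
        self.cloud: Dict[int, bytes] = {}                      # stand-in for a cloud provider

    def _evict(self) -> None:
        """Tier the least recently accessed blocks to the cloud when over capacity."""
        while len(self.cache) > self.cache_blocks:
            block, data = self.cache.popitem(last=False)
            self.cloud[block] = data

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data
        self.cache.move_to_end(block)  # mark as most recently accessed
        self._evict()

    def read(self, block: int) -> bytes:
        if block not in self.cache:    # cache miss: fetch from the cloud tier
            self.cache[block] = self.cloud[block]
        self.cache.move_to_end(block)
        self._evict()
        return self.cache[block]


volume = TieredVolume(cache_blocks=2)  # small cache area, larger logical volume
for number in range(3):
    volume.write(number, f"block-{number}".encode())
volume.read(1)                         # touch block 1 so it stays local
volume.write(3, b"block-3")

cached_now = list(volume.cache)
tiered_now = sorted(volume.cloud)
```

After these operations `cached_now` is `[1, 3]` and `tiered_now` is `[0, 2]`: the volume holds four blocks while the cache area holds only two, as with Vol2 and Cache2 in Figure 65.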

Figure 65: CloudArray Cache Sources


CloudArray can also create and schedule in-cloud snapshots, which are extremely space efficient and can be managed with
age-based retention controls. A granular bandwidth scheduler helps optimize WAN utilization by controlling when and how
much bandwidth CloudArray uses. Local caching naturally reduces bandwidth consumption and data latency, and
only changed data blocks are sent to the cloud after the initial data is delivered.
CloudArray also provides multi-layered AES 256-bit encryption. Data and metadata are encrypted separately, with
two different sets of keys. Furthermore, the keys themselves are password protected.

Figure 66: CloudArray Local and Cloud storage

In conclusion, CloudArray offers VxRail environments a valuable set of extended services to make cloud tiering simple,
secure, reliable, and efficient. For more information about CloudArray, refer to EMC CloudArray Product Description and
Administrator Guides:
https://www.emc.com/collateral/guide/h13456-cloudarray-pdg.pdf
http://uk.emc.com/collateral/TechnicalDocument/docu60786.pdf

INTEGRATED BACKUP AND RECOVERY WITH VSPHERE DATA PROTECTION (VDP)
VxRail appliances interoperate with vSphere Data Protection (VDP) for extended backup and recovery services. VDP is
deployed as a Linux-based virtual appliance and includes up to 8TB of backup virtual disks per ESXi host. VDP protects
every application or VM on the VxRail appliance. It features the familiar vCenter Server interface and is powered by EMC
Avamar with built-in enterprise deduplication to reduce network bandwidth and shrink backup windows. VDP leverages
vCenter management for one-step recovery with verification, enabling 30 percent faster backups compared to disk
backup. VDP provides agentless backup and recovery for VMs running on VSAN Datastores. VDP's deduplication uses a
variable-length segment algorithm that reduces consumption in backup storage. Backup data can also be moved offsite
using replication.
VDP backs up VMs without running any services within the VM itself. APIs allow VDP to connect to the ESXi host running
the VM and to take a snapshot via a process similar to VSAN's standard snapshot technology. The VDP snapshot is a
static, read-only, point-in-time reference that non-disruptively captures virtual-disk data and VM-configuration
information. The snapshot information is then copied to backup media, and VDP tracks changes to disk sectors using
changed-block tracking (CBT).
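
The idea behind changed-block tracking can be sketched as follows. This is a toy model with hypothetical names, not the vSphere CBT API: the disk remembers which blocks were written since the last backup, so the first backup copies everything and later backups copy only the changed blocks.

```python
from typing import Dict, List, Set


class TrackedDisk:
    """Toy virtual disk that records which blocks changed since the last backup."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [b"\0"] * num_blocks
        # Every block counts as changed at first, so the first backup is a full one.
        self.changed: Set[int] = set(range(num_blocks))

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)


def backup(disk: TrackedDisk, target: Dict[int, bytes]) -> int:
    """Copy only the changed blocks to the backup target; return how many were copied."""
    copied = 0
    for index in sorted(disk.changed):
        target[index] = disk.blocks[index]
        copied += 1
    disk.changed.clear()  # start tracking for the next incremental backup
    return copied


disk = TrackedDisk(num_blocks=8)
backup_store: Dict[int, bytes] = {}

first_run = backup(disk, backup_store)   # full backup: all 8 blocks
disk.write(2, b"new")
disk.write(5, b"data")
second_run = backup(disk, backup_store)  # incremental: only blocks 2 and 5
```

The second run copies two blocks instead of eight, which is how tracking changed sectors keeps backup windows and data movement small.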
In addition, VDP has the ability to reduce bandwidth consumption by using SCSI HotAdd for backup data transmission.
VDP attaches a vdisk to the backup storage device the same way the vdisk would attach to a VM. As long as the ESXi host
of the VM being backed up has access to the backup storage device, VDP does not use the network. (See the diagram in
Figure 67 below.) If the ESXi host cannot access the backup storage device, VDP sends the encrypted snapshot data
across the network using an incremental transmission to maintain low bandwidth.

Figure 67: vSphere Data Protection

INTEGRATED REPLICATION WITH RECOVERPOINT FOR VIRTUAL MACHINES
EMC's RecoverPoint for VMs (RPVM) provides simple and efficient local and remote VM-level replication for VxRail
deployments. It supports synchronous or asynchronous replication over any distance and includes built-in capabilities for
workflow and disaster-recovery automation (as illustrated in Figure 68 below). RPVM is integrated with vCenter to provide
continuous data protection, built-in orchestration and automation, and recovery for VMs to any point in time. It also
features deduplication and compression and uses algorithms to reduce bandwidth consumption. Each VxRail Appliance
includes RecoverPoint for VMs licenses to replicate fifteen VMs.
RecoverPoint for VMs has three architectural components, which are fully integrated and deployed in a VMware ESXi server
environment: the vCenter plug-in, a RecoverPoint write-splitter embedded in vSphere ESXi, and a virtual appliance. VxRail
implements RPVM as a virtual appliance. A RecoverPoint write-splitter embeds directly into the ESXi kernel on all servers
with protected workloads, allowing replication and recovery at the virtual-disk (VMDK and RDM) granularity level.
Replication provisioning occurs through vCenter, using a simple user interface to select the destination for the replication,
define the consistency group of multiple VMs representing inter-dependent applications, set the data-protection policies,
and auto-provision VMDKs and VMs on the replicas. The automated workflows for disaster recovery include recovery from
logical corruption to any point in time, failover and failback of specific consistency groups, and non-disruptive DR testing.
RPVM's compression, deduplication, and advanced bandwidth-reduction algorithms decrease WAN bandwidth
consumption by up to 90 percent, saving associated communication costs. RPVM scales along with the VxRail appliance
and can support the maximum 16-appliance configuration and thousands of VMs.

Figure 68: RPVM implements a journal model that tracks changes to the virtual machine as rolling data that can be
unrolled to a specific point in time.
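
The journal model in Figure 68 can be illustrated with a toy undo journal. This sketch is an assumption, not RecoverPoint's actual design: the class name, the integer timestamps, and the choice to log the previous contents of each block are all hypothetical. Every write records what it overwrote, so the disk image at any earlier point in time can be obtained by unrolling newer journal entries.

```python
from typing import Dict, List, Optional, Tuple


class JournaledDisk:
    """Toy disk whose journal allows rolling back to any earlier point in time."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        # Journal entries: (timestamp, block number, previous contents or None).
        self.journal: List[Tuple[int, int, Optional[bytes]]] = []

    def write(self, timestamp: int, block: int, data: bytes) -> None:
        """Apply a write; timestamps are assumed to be non-decreasing."""
        self.journal.append((timestamp, block, self.blocks.get(block)))
        self.blocks[block] = data

    def image_at(self, timestamp: int) -> Dict[int, bytes]:
        """Return the disk image as it was at the given time by undoing newer writes."""
        image = dict(self.blocks)
        for entry_time, block, previous in reversed(self.journal):
            if entry_time <= timestamp:
                break                  # everything older is already in the image
            if previous is None:
                image.pop(block, None) # the block did not exist before this write
            else:
                image[block] = previous
        return image


disk = JournaledDisk()
disk.write(1, 0, b"v1")
disk.write(2, 1, b"x1")
disk.write(3, 0, b"v2")  # for example, a logical corruption at time 3

before_corruption = disk.image_at(2)
```

Here `before_corruption` is `{0: b"v1", 1: b"x1"}`, the state just before the write at time 3, while `disk.blocks` still holds the latest data.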

VxRail Use-Case Examples


VCE VxRail Hyper-Converged Infrastructure Appliances have been deployed successfully to fit many use cases. This
section describes two such use cases: one for a virtual desktop infrastructure (VDI) platform and one for a remote
office/branch office IT infrastructure platform. Each use case is then highlighted in a specific customer solution
implementation. These customers benefit from the simplicity and business value of the VxRail Appliance.

USE CASE: CREATE IT CERTAINTY FOR VIRTUAL DESKTOP INFRASTRUCTURE (VDI)
VCE VxRail Appliances are the easiest, fastest, most affordable way to implement a high-performance VDI infrastructure.
Rapidly deploy an appliance that integrates market-leading compute, storage, virtualization, and management software
from EMC and VMware to set up a VDI infrastructure in minutes. Flexible configuration options and modular scalability
ensure that optimum performance and capacity are always available whether you are deploying hundreds or thousands of
virtual desktops. The VxRail Appliance's highly redundant architecture, integrated EMC data protection software, and
non-disruptive upgrades create certainty that virtual desktops will always be available to end users and that the user
experience will always exceed expectations. VCE VxRail Appliances are a family of hyper-converged infrastructure (HCI)
appliances that include a full suite of industry-leading data services, including replication, backup, and recovery for data
protection. Built on the foundation of VMware Hyper-Converged Software and managed through the familiar VMware
vCenter interface, VxRail Appliances provide customers with a familiar experience that also allows them to take advantage
of the hallmark benefits of VCE: increased agility, simplified operations, and lower risk.

VxRail Advantages for VDI


Quick and easy automated deployment with power-on to VM creation in minutes and easy ongoing VM management
Scalability from 80 to 600 virtual desktops per appliance, and a maximum 9,600 desktops in a fully-populated VxRail
cluster
One-click, non-disruptive patches and upgrades
Application uptime ensured through highly available VMware VSAN
Automated operational and disaster-recovery orchestration for VMs, including local and remote replication and
continuous data protection with granular recovery to any point in time
VxRail Appliances enable customers to reduce VDI footprints, saving power and infrastructure costs while minimizing
administrative burdens and lowering operational costs. The modular, just-in-time purchase approach enables predictable
evolution with a repeatable, simple, and agile means to scale on demand. VxRail Appliances can host virtual desktops from
VMware, Citrix, and other VDI vendors. Businesses can be confident that VxRail Appliances will meet performance and
capacity demands associated with desktop growth and application and user demands through continuous hardware and
software evolution. VxRail Appliances seamlessly integrate new enterprise-class x86 and storage technologies and non-disruptively update to the latest VMware software to ensure that the VDI deployment can continuously modernize to meet
business demands.

Meeting the Virtualization Challenge for Federal Agencies


Business Challenge
The IT challenges facing today's federal government organizations are much like those of their corporate counterparts:
deadlines and budgets shrink while expectations grow. They need to provide security and the freedom and flexibility to
support a mobile workforce. However, federal agencies also face the added pressures of public oversight. IT purchases
may be subject to strict procurement guidelines, require more than the typical due diligence in planning and may take
more time for purchase approvals. In one study of federal IT professionals, more than half (54 percent) said they do not
believe that their agency is able to acquire new IT resources in a timely manner. This is a challenge, especially in light of
the fact that many federal agencies are vulnerable to the problems and inefficiencies of aging IT systems and
infrastructures. The same survey noted that 77 percent felt that their agencies needed a more flexible IT infrastructure.

Business Solution
For increasing numbers of federal agencies, the answer to the challenge is IT resource virtualization. A virtual desktop
infrastructure (VDI) puts resources precisely where they are needed and in the strength they are needed at a moment's
notice. Virtualized IT infrastructures are in place in most large organizations today. But until the recent advent of HCI
technology, they have been beyond the reach of smaller federal agencies or departments within large federal
organizations. With VCE VxRail Appliances, federal organizations can take advantage of a just-in-time approach to
deployment and expansion. An organization can start with a single appliance and then build out an IT infrastructure over
time. This can help expedite the procurement process by keeping incremental purchase amounts for technology below
discretionary federal agency spending limits. It can also reduce the need for overprovisioning and facilitate the creation of
a master configuration that can be replicated in other departments within the organization.
VxRail Appliances make federal agencies more confident in their IT infrastructure because they provide a pre-configured,
pre-tested solution jointly developed by EMC/VCE and VMware, vendors trusted by organizations around the world, and
they are backed by a single point of 24/7 global support for all appliance hardware and software. With VxRail Appliances,
businesses can be confident that the virtual infrastructure will work today and will lead them along the path to more
innovative technologies, from cloud computing to the software-defined data center (SDDC).

USE CASE: SIMPLIFYING THE DISTRIBUTED ENTERPRISE ENVIRONMENT
Distributed Enterprises usually have a central IT staff that creates the overall business network architecture for the
enterprise, and also have many remote offices that are essential to running the business, normally with limited on-site
technical staff. However, the infrastructure and data at these distributed locations are mission critical. Typical important
operations found in distributed enterprises are warehousing and distribution, manufacturing of the company's core
products, and mobile or remote life-saving operations like health clinics. Support responsibilities for these remote
operations usually fall to the central IT staff. According to Enterprise Strategy Group (ESG) in their 2015 research report
Remote Office/Branch Office Technology Trends, 72 percent of organizations intend to increase spending on remote office
IT infrastructure. In addition to the above challenges, footprint is an issue because remote locations, unlike data centers,
do not have the dedicated space, power, or cooling capabilities necessary for multiple servers running multiple
applications. This means remote organizations are much more sensitive to server sprawl. While some organizations may
look to the cloud to reduce server sprawl and centralize operations, in many cases that is not feasible. This is because
offices either are in remote locations with limited Internet service or have minimal WAN bandwidth and redundancy
available. So issues such as latency and availability become limiting factors. VCE VxRail Appliances are ideal for
consolidating multiple applications in a remote location onto a single high-performance and highly-available platform that
is easy to deploy and manage.
VCE VxRail Appliances that integrate compute, storage, virtualization, and management software from EMC and VMware
are the optimal endpoints for the distributed enterprise. As an integral solution in the VCE converged infrastructure
portfolio, VxRail Appliances can be monitored with VCE Vision Intelligent Operations, enabling IT to have visibility across
the distributed solution from the same single-pane-of-glass console used to manage the data center infrastructure. The
VxRail Appliance enables customers to consolidate multiple remote office applications onto a single appliance. VMware
Virtual SAN software integrated with flash or hybrid storage ensures the highest possible performance since Virtual SAN is
embedded in the hypervisor and eliminates many data path bottlenecks. Simple deployment enables customers to be up
and running in 15 minutes. The local team only needs to plug in the appliance and power it up. All other configuration can
be done remotely. In addition, the VxRail Appliance is the only HCI appliance on the market offering Quality of Service
(QoS) functionality that eliminates noisy neighbors. This functionality makes it certain that multiple applications can be
hosted on the same appliance or in the same cluster without performance impact.

VxRail Appliance Advantages for Distributed Enterprises


Tailor compute and capacity deployment for each remote location
Simple, standard set-up reduces IT skills needed at remote locations
Part of a complete portfolio of converged infrastructure core-to-edge solutions
Backup locally at the remote office or over the WAN to central data centers with RecoverPoint for VMs
Management and visibility across the distributed enterprise with VMware tools and VCE Vision software

Meeting the Distributed Enterprise Challenge for State and Local Agencies
Business Challenge
For state and local agencies, the promises of new technology come with unique new challenges as well. New applications,
including those for remote and mobile computing, for example, can boost user productivity, but they present performance
and data-storage demands for aging IT infrastructures that their original planners could not have foreseen. State and local
systems are barely able to keep up with the refresh cycles of their various hardware and software components, much less
adjust to today's demands for tightened service level agreements, shorter project deadlines, and shrinking budgets.
Pressured to keep costs low, agencies have difficulty justifying specialized IT technicians or even new real estate to
support an upgrade in IT infrastructure.

Business Solution
For a fast-growing segment of state and local agencies, a hyper-converged appliance is an effective solution to the
problems of high expectations and small budgets. Leveraging VxRail reduces cost by eliminating conflicting system-refresh
cycles, redundant software, and the need for specialized IT technicians. VxRail Appliances provide the ability to put
compute resources where they are most needed at any given time, saving the cost of over-provisioning IT systems and
building out new office space for larger servers, storage, or networking gear. With the emergence of VxRail Appliances,
the benefits of a Software-Defined Data Center (SDDC) are within the reach of state and local agencies.
With conventional IT systems, deployment can take months to plan, procure, install, configure, provision, and test. And it
can require the services of technicians skilled in servers, storage, networking, and applications. The more time it takes for
deployment, the higher the cost and the more likely the project will be stopped in its tracks or diminished in scope by
budget-conscious regulators. VxRail Appliances avoid these pitfalls because they are totally self-contained and thoroughly
tested by EMC/VCE before they are shipped. Wizard-based automation helps non-technical staff set up pools of virtual
machines for users. Once this setup is complete, it takes just 15 minutes from power-on to creation of a new virtual
machine. Expansion is a simple matter of plugging in a new node or adding another appliance. New nodes are
hot-swappable, so the appliance does not have to be powered down and no new software is required to grow your
infrastructure. In addition, VxRail Appliances have a system architecture that is predictable and repeatable, so new
versions of a master configuration can be installed into other offices without new testing or troubleshooting.


PRODUCT INFORMATION
For documentation, release notes, software updates, or for information about EMC products, licensing, and service,
go to the EMC Online Support site (registration required) at: https://support.EMC.com.

PRODUCT SUPPORT
Single-source, 24x7 global support is provided for VxRail Appliance hardware and software via phone, chat, or
instant message. Support also includes access to online support tools and documentation, rapid on-site parts
delivery and replacement, access to new software versions, assistance with operating environment updates, and
remote monitoring, diagnostics and repair with EMC Secure Remote Services (ESRS).

EMC PROFESSIONAL SERVICES FOR VXRAIL APPLIANCES


EMC offers installation and implementation services to ensure smooth and rapid integration of VxRail Appliances
into customer networks. The standard service, optimal for a single appliance, provides an expert on site to perform
a pre-installation checklist with the data-center team, confirm the network and Top of Rack (TOR) switch settings,
conduct site validation, rack and cable, configure, and initialize the appliance. Finally, an on-site EMC service
technician will configure EMC Secure Remote Services (ESRS) and conduct a brief functional overview on essential
VxRail Appliance administrative tasks. A custom version of this installation and implementation service is available
for larger-scale VxRail Appliance deployments, including those with multiple appliances or clustered environments.
Also offered is VxRail extended service, which is delivered remotely and provides an expert service technician to
rapidly implement VxRail pre-loaded data services (RecoverPoint for Virtual Machines, vSphere Data Protection, and
CloudArray).

vSPHERE ORDERING INFORMATION


Beginning May 9, 2016, the VxRail Appliance is moving to a vSphere license-independent model to allow customers
to use any existing eligible vSphere licenses. This model (also called the bring-your-own or BYO vSphere license
model, the VMware Loyalty Program model, or the VLP model) allows customers to leverage a wide variety of
vSphere licenses they may have already purchased. As a result, the VxRail bundled vSphere Standard Edition
license will no longer be an orderable option.
For the VxRail BYO vSphere license model, several vSphere license editions are supported, including Enterprise+,
Standard, and ROBO editions. Also supported are vSphere licenses from Horizon bundles or add-ons when the
appliance is dedicated to VDI. Using vSphere license editions other than Enterprise+ requires VxRail 3.5,
which will be available in June.



If vSphere BYO licenses need to be purchased, they should be ordered through the customer's preferred VMware
channel partner or from VMware directly. vSphere licenses will be orderable from EMC in July. BYO licenses acquired
through a VMware ELA, VMware partners, or EMC will receive single-call support from EMC. See the VMware Loyalty
Program (VLP) FAQ on the enablement center (https://www.emc.com/collateral/faq/vmware-vsphere-loyaltyprogram-vce-vxrailappliances.pdf) for additional details.

WE'D LIKE TO HEAR FROM YOU!


Feedback will help us continue to improve the accuracy, organization, and overall quality of EMC user publications.
Please send feedback regarding this TechBook to: techpubcomments@emc.com.


ABOUT VCE
VCE, an EMC Federation Company, is the world market leader in converged infrastructure and converged solutions. VCE
accelerates the adoption of converged infrastructure and cloud-based computing models that reduce IT costs while improving
time to market. VCE delivers the industry's only fully integrated and virtualized cloud infrastructure systems, allowing customers
to focus on business innovation instead of integrating, validating, and managing IT infrastructure. VCE solutions are available
through an extensive partner network, and cover horizontal applications, vertical industry offerings, and application development
environments.
For more information, go to vce.com.

Copyright 2010-2016 VCE Company, LLC. All rights reserved. VCE, VCE Vision, VCE Vscale, Vblock, VxBlock, VxRack, VxRail, and the VCE logo are registered
trademarks or trademarks of VCE Company LLC. All other trademarks used herein are the property of their respective owners.
