
APP-CAP1426

The Benefits of Virtualization for Middleware

Jeff Battisti, Cardinal Health
Emad Benjamin, VMware, Inc.

#vmworldapps

Disclaimer

This session may contain product features that are currently under development.

This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.

About the speaker

I have been with VMware for the last 7 years, working on Java and vSphere

20 years' experience as a Software Engineer/Architect, with the last 15 years focused on Java development

Open source contributions. Prior work with Cisco, Oracle, and banking/trading systems. Authored the Enterprise Java Applications Architecture on VMware


Agenda

Conventional Middleware Platforms
Middleware Platform Architecture on VMware vSphere
Design and Sizing Middleware Platforms
Performance Benefits of Virtualizing Middleware
Customer Success Stories
Questions

Conventional Middleware Platform

Enterprise Java applications are multitier (client-server). The "-" in client-server is essentially the middleware.

Load Balancer Tier (load balancers): IT Operations Network Team
Web Server Tier (web servers): IT Operations Server Team
Java App Tier (Java applications): IT Apps Java Dev Team
DB Server Tier (DB servers): IT Ops & Apps Dev Team

These teams are the organizational key stakeholder departments.

Middleware Platform Architecture on vSphere


Load balancers, web servers, Java application servers, and DB servers all run as VMs on VMware vSphere, giving high-uptime, scalable, and dynamic enterprise Java applications.

APPLICATION SERVICES: Capacity on Demand, Dynamic, High Availability
SHARED INFRASTRUCTURE SERVICES: shared, always-on infrastructure

VMware vFabric:
Programming model: Rich Web, Social and Mobile, Data Access, Integration Patterns, Batch Framework, Spring Tool Suite, WaveMaker, Cloud Foundry
Runtime: Java Runtime (tc Server), Web Runtime (ERS), Data Director, Messaging (RabbitMQ), Global Data (GemFire), In-memory SQL (SQLFire)
Management: App Monitoring (Spring Insight), Performance Mgmt (Hyperic), Java Optimizations (EM4J, ...), App Director
Virtual Datacenter: Cloud Infrastructure and Management

Step 2: Establish Benchmark

Scale Up Test: establish the building block VM
Establish vertical scalability: how many JVMs on a VM, and how large a VM should be in terms of vCPU and memory
Investigate the bottlenecked layer: network, storage, application configuration, and vSphere
If the building block app/VM configuration has a problem, adjust and iterate
If the SLA is met, the test is complete; if not, iterate

Scale Out Test: determine how many VMs (building block VM x N)
Establish horizontal scalability: how many VMs do you need to meet your response time SLAs without reaching 70%-80% CPU saturation
Establish your horizontal scalability factor before bottlenecks appear in your application
If a scale-out bottlenecked layer is removed, iterate the scale out test

Design and Sizing HotSpot JVMs on vSphere

VM Memory layout:
Guest OS memory, plus JVM memory
JVM memory includes: Java stacks (-Xss per thread), Perm Gen (-XX:MaxPermSize), other mem (direct native memory and non-direct memory), and the heap (JVM max heap -Xmx, initial heap -Xms), all within the process virtual address space

Design and Sizing of HotSpot JVMs on vSphere

Guest OS memory: depends on the OS and other processes; approximately 0.5-1GB
VM Memory = Guest OS Memory + JVM Memory
JVM Memory = JVM Max Heap (-Xmx) + Perm Size (-XX:MaxPermSize) + NumberOfConcurrentThreads * (-Xss) + other mem
Perm Size is an area additional to the -Xmx (max heap) value; it is not GC-ed, because it contains class-level information
Other mem is additional memory required for NIO buffers, the JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal info
If you have multiple JVMs (N JVMs) on a VM, then: VM Memory = Guest OS Memory + N * JVM Memory
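As a quick hedged illustration (the workload numbers here are assumed, not from the deck): a VM hosting two JVMs, each with a 4096m heap, 256m perm, 100 threads at -Xss256k (~25m of stacks), and ~256m other mem, needs roughly 1g (Guest OS) + 2 * (4096m + 256m + 25m + 256m) ≈ 10g of VM memory and reservation.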

Sizing Example
Sizing breakdown (set the VM memory reservation to 5088m):
VM Memory (5088m) = Guest OS Memory (500m used by the OS) + JVM Memory (4588m)
JVM Memory (4588m) = JVM Max Heap -Xmx (4096m) + Perm Gen -XX:MaxPermSize (256m) + Java stacks, -Xss per thread (256k * 100 threads) + other mem (~217m)
Initial heap -Xms (4096m), set equal to -Xmx
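A minimal launch command matching this sizing might look like the following sketch (the main class is hypothetical; "other mem" is consumed by the JVM itself and has no flag):

java -Xms4096m -Xmx4096m -XX:MaxPermSize=256m -Xss256k com.example.MyApp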

Larger JVMs for In-Memory Data Management Systems


Sizing breakdown (set the memory reservation to 34g):
VM Memory for SQLFire (34g) = Guest OS Memory (0.5-1g used by the OS) + JVM Memory for SQLFire (32g)
JVM Memory (32g) = JVM Max Heap -Xmx (30g) + Perm Gen -XX:MaxPermSize (0.5g) + Java stacks, -Xss per thread (1M * 500 threads) + other mem (~1g)
Initial heap -Xms (30g), set equal to -Xmx
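A hedged sketch of the corresponding launch command (the main class is hypothetical; the full GC tuning flags for a JVM this size appear in the CMS Collector Example later in the deck):

java -Xms30g -Xmx30g -XX:MaxPermSize=512m -Xss1m com.example.SqlFireServer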

Middleware ESXi Dedicated Cluster


Locator/heartbeat middleware components: DO NOT vMotion them
Example host: 96GB RAM, 2 sockets, 8 pCPU per socket
Memory available for all VMs = 96 * 0.99 - 1GB => ~94GB
Per-NUMA-node memory => 94 / 2 => 47GB
The ESX scheduler keeps each VM on a NUMA node, so size VMs at 8 vCPU with less than 47GB RAM each
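Generalizing the arithmetic above into a rule of thumb (overhead factors taken from this example):

Memory available for VMs ≈ (Total host RAM * 0.99) - 1GB
Per-NUMA-node memory ≈ Memory available for VMs / number of sockets

With 96GB and 2 sockets: (96 * 0.99) - 1 ≈ 94GB, and 94 / 2 = 47GB per node.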

Performance Perspective

See "Performance of Enterprise Java Applications on VMware vSphere 4.1 and SpringSource tc Server" at http://www.vmware.com/resources/techresources/10158.

[Chart: % CPU and response time (R/T) vs. load, with the 80% CPU threshold marked]

If given 4 vCPUs, which VM Configuration is better?


1 VM: 4 vCPU, 1 JVM (4GB heap)
2 VMs: 2 vCPU each, 2 JVMs total (2.5GB heap each)
4 VMs: 1 vCPU each, 4 JVMs total (2GB heap each)

Most Common VM Size for Java workloads

2 vCPU VM with 1 JVM, for tier-1 production workloads
Maintain this ratio as you scale out or scale up, i.e. 1 JVM : 2 vCPU
Scale-out is preferred over scale-up, but both can work
You can diverge from this ratio for less critical workloads

Building block: 2 vCPU VM, 1 JVM (-Xmx 4096m), approximately 5GB RAM reservation

However for Large JVMs + CMS


Start with a 4+ vCPU VM with 1 JVM for tier-1, in-memory data management system type production workloads
For large JVMs: 4+ vCPU VM, 1 JVM (8-128GB heap)
Likely increase the JVM size instead of launching a second JVM instance
A 4+ vCPU VM allows ParallelGCThreads to be allocated 50% of the available vCPUs, i.e. 2+ GC threads
The ability to increase ParallelGCThreads is critical to YoungGen scalability for large JVMs
ParallelGCThreads should be allocated 50% of the vCPUs available to the JVM and no more; you want to ensure other vCPUs remain available for other transactions

Most Common Sizing and Configuration Question


Option 1: scale out VM and JVM (best): four JVMs (JVM-1 through JVM-4), each in its own VM
Option 2: scale up JVM heap size (2nd best): two JVMs (JVM-1, JVM-2) with larger heaps
Option 3: stack multiple JVMs per VM (3rd best; detailed in the backup slides)

What else to consider when sizing?


Mixed workloads, e.g. a job scheduler vs. a web app, require different GC tuning
Job schedulers care about throughput
Web apps care about minimizing latency and response time
You can't have both reduced response time and increased throughput without compromise
Separate the concerns for optimal tuning: run web and job workloads in separate JVMs (e.g. JVM-1/JVM-3 for web, JVM-2/JVM-4 for jobs), scaling vertically and/or horizontally as needed

Which GC?

ESX doesn't care which GC you select, because of the degree of independence of Java from the OS, and of the OS from the hypervisor.

Tuning GC Art Meets Science!

You tune either for throughput or for latency, one at the cost of the other:
Reduce latency: improved response time, at the cost of slightly reduced throughput
Increase throughput: improved throughput, at the cost of longer response times (increased latency impact)

Parallel Young Gen and CMS Old Gen


Heap layout: Young Generation (-Xmn) plus Old Generation (Xmx minus Xmn)
Young Generation (minor GC): parallel GC in YoungGen using -XX:+UseParNewGC and -XX:ParallelGCThreads; minor GC threads pause application threads briefly; survivor spaces S0/S1
Old Generation (major GC): concurrent collection in OldGen using -XX:+UseConcMarkSweepGC; concurrent mark and sweep runs alongside application threads

High Level GC Tuning Recipe


Step A, Young Gen tuning: measure minor GC duration and frequency; adjust the -Xmn young gen size and/or ParallelGCThreads
Step B, Old Gen tuning: measure major GC duration and frequency; adjust the heap size -Xmx
Step C, survivor spaces tuning: adjust -Xmn and/or the survivor spaces
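Measuring GC duration and frequency in Steps A and B requires GC logging. A commonly used set of HotSpot (Java 6/7 era) logging flags, with the log file name assumed:

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log ...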

CMS Collector Example


java -Xms30g -Xmx30g -Xmn10g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ScavengeBeforeFullGC -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=8 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=4 -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseStringCache

This JVM configuration scales up and down effectively:
-Xmx = -Xms, and -Xmn is ~33% of -Xmx
-XX:ParallelGCThreads = minimum 2, but less than 50% of the vCPUs available to the JVM
NOTE: ideally use this for 4+ vCPU VMs; if used on a 2 vCPU VM, drop the -XX:ParallelGCThreads option and let Java select it

Middleware on VMware Best Practices


Enterprise Java Applications on VMware Best Practices Guide: http://www.vmware.com/resources/techresources/1087
Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: http://www.vmware.com/resources/techresources/10220
High Performance Data with VMware vFabric GemFire Best Practices Guide: http://www.vmware.com/resources/techresources/10231

Middleware on VMware Best Practices Summary

Follow the design and sizing examples discussed thus far
Set appropriate memory reservations
Leave hyper-threading enabled; size based on vCPU = 1.25 pCPU if needed
RHEL 6 and SLES 11 SP1 have tickless kernels that do not rely on a high-frequency interrupt-based timer, and are therefore much friendlier to virtualized latency-sensitive workloads
Do not overcommit memory
Locator/heartbeat processes should not be vMotion-migrated; doing so can lead to network split-brain problems
vMotion over 10Gbps when doing scheduled maintenance
Use affinity and anti-affinity rules to avoid redundant copies on the same VMware ESX/ESXi host

Middleware on VMware Best Practices

Disable NIC interrupt coalescing on the physical and virtual NIC: extremely helpful in reducing latency for latency-sensitive virtual machines
Disable virtual interrupt coalescing for VMXNET3: note this can impose performance penalties on other virtual machines on the ESXi host, as well as higher CPU utilization to deal with the higher rate of interrupts from the physical NIC
This implies it is best to use a dedicated ESX cluster for middleware platforms: all hosts are configured the same way for latency sensitivity, and this ensures that non-middleware workloads, such as other enterprise applications, are not negatively impacted

SQLFire vs. Traditional RDBMS

[Charts: throughput and response time of SQLFire vs. a traditional RDBMS under increasing load]

SQLFire vs. Traditional RDBMS

SQLFire scaled 4x compared to the RDBMS
Response times of SQLFire are 5x to 30x faster than the RDBMS
Response times on SQLFire are more stable and constant with increased load, while RDBMS response times increase with increased load

Middleware on VMware Benefits

Flexibility to change compute resources, VM sizes, and add more hosts
Ability to apply hardware and OS patches while minimizing downtime
Creates a more manageable system through reduced middleware sprawl
Ability to tune the entire stack within one platform
Ability to monitor the entire stack within one platform
Ability to handle seasonal workloads: commit resources when they are needed, and remove them when they are not

Cardinal Health Java on WebSphere

Jeff Battisti
Sr. Enterprise Architect August 30, 2012

Copyright 2011, Cardinal Health, Inc. or one of its subsidiaries. All rights reserved.

About Cardinal Health


Founded in 1971
Leading provider of products and services across the healthcare supply chain, with an extensive footprint across multiple channels
Serving >50,000 customers at >60,000 healthcare sites across North America daily
Approximately one-third of all distributed pharmaceutical, laboratory and medical products in the U.S. and Puerto Rico flow through the Cardinal Health supply chain
More than 30,000 employees; direct operations in 10 countries
Number 19 on Fortune magazine's list of the 500 largest U.S. corporations

The business behind healthcare with the broadest view of the supply chain


Agenda
Virtualization journey
Why virtualize WebSphere on VMware
Factors impacting migration/expansion
Virtualization questions to answer: performance and scalability, high availability, licensing
Summary

Virtualization Journey

Theme: from centralized IT and shared services (capital intensive, high response) toward variable-cost subscription services.

2005-2008, Consolidation:
<40% virtual; <2,000 VMs; <2,355 physical
DC: data center optimization, 30 DCs to 2 DCs
HW: transition to blades; <10% utilization; <10:1 VM/physical
SW: low-criticality systems, 8x5 applications

2009-2012, Internal cloud:
>81% virtual; >4,054 VMs; <1,147 physical
DC: power remediation; P2Vs on refresh
HW: commoditization; 25% utilization; 60:1 VM/physical
SW: business-critical systems: WebSphere ~490, Unix to Linux ~655, SAP ~550

2013-2016, Cloud resources:
>95% virtual; >10,000 VMs; <800 physical
DC: optimizing DCs; internal disaster recovery; metered service offerings (SaaS, PaaS, IaaS)
HW: shrinking footprint; >50% utilization; >100:1 VM/physical
SW: cloud computing, virtualized databases, open source migrations

Virtualization

Why Virtualize WebSphere on VMware

DC strategy alignment:
Pooled resource capacity at ~15% utilization
Elasticity for changing workloads
Unix to Linux (>$36 million savings)
Disaster recovery without license implications

Simplification and manageability:
High availability for thousands of systems, instead of thousands of high availability solutions
Network & system management in the DMZ

Five-year cost savings of ~$6 million:
Hardware savings ~$660K
WAS licensing ~$862K
Unix to Linux ~$3.7M
DMZ ports >$1M

Virtualization

Questions to Answer

Alignment across teams and consistent messaging to stakeholders was critical.

Can the system perform & scale?
JVM stacking & vertical scaling
Massive OLTP performance
Memory: overcommitment, DRS, and large JVMs
I/O impacts

Will VMware pose challenges to the WebSphere HA model?
Affinity/anti-affinity
Complexity must remain low
Storage, VMware, servers, chassis, and switches cannot be a single point of failure
User error should not take down both sides of critical applications

Can we maintain or reduce WebSphere license costs?
JVM stacking
Server affinity
Overcommitment in non-production systems

Factors Impacting Migration/Expansion

150% user growth, with 90% of traffic in a 3-hour time frame
Moving to a new data center; new DMZs in both data centers
AIX to SUSE Linux migration
Virtualizing on VMware
Upgrading the WebSphere stack:
WebSphere Commerce 6.0 to 7.0
WebSphere Portal 6.1.5 to 7.0
WebSphere Content Mgr 6.1.5 to 7.0
WAS services layer: WAS 6.1 to WAS 7.0
DB2 platform 9.5 to 9.7
J2EE 1.4 / 32-bit to JEE 6 / 64-bit
SiteMinder upgrade

Non-functional enhancements:
Central logging framework
Remote portlet rendering
Positioning for future cross-datacenter failover
64-bit JVM and JEE 6 support

Systems: 111 DB instances, 31 Portal servers, 19 Commerce servers, 15 WAS servers, 7 WCM servers, 3 BODL servers, 11 Business Objects servers, 15 Endeca servers, 9 PlanetPress servers

Performance & Scalability

WebSphere VM performance:
The sweet spot is between 2 and 4 cores
JVM stacking/vertical scaling was reduced, but is still in place for licensing reasons
~20% performance difference from 2 to 4 cores

Source: http://www.vmware.com/resources/techresources/10095

Performance & Scalability

Intel vs. Power6 Java performance: Intel systems outperformed Power systems by over 300%

[SPECjbb bar chart comparing HP BL490 Nehalem 2.93 GHz (8 cores), HP DL580 G7 (32 cores*), IBM Power7 750 3.55 GHz (32 cores), and IBM Power6 550 4.2 GHz (8 cores); scores shown: 2,478,929; 1,433,000; 509,962; 350,642]

Order Express Performance

150% customer growth

[Chart: Version 1 vs. Version 2 response times across operations (View Dashboard, Order Submit, Buy Now, View Product Details, Quick Add, Search Results, View Cart), with percentage differences ranging from 13% to 454%]

High Availability

Low complexity: high availability for thousands of systems, instead of thousands of high availability solutions

Challenges: affinity/anti-affinity, keeping complexity low, no single points of failure, and user error

Licensing

WebSphere non-production model: using server affinity with failover and CPU over-allocation, we save millions in licensing

Summary
Culture shift from virtualization to save costs, to virtualization for superior capabilities
We are continually improving performance, resiliency, availability, and manageability
The simplicity of this design enables us to effectively and efficiently manage these systems

Thank you! Any questions?

Emad Benjamin, ebenjamin@vmware.com
You can get my book here: https://www.createspace.com/3632131

Backup slides


ESX Memory Management

Ballooning makes the Guest OS aware that the ESX host is short on memory
Due to VM isolation, the Guest OS is not aware that it is in a VM, and is not aware of the state of other VMs or of the ESX host memory situation
The balloon driver is loaded into the Guest OS and communicates with ESX via a private channel; ESX instructs it to inflate (allocate Guest OS physical pages) by the amount ESX needs to reclaim

Why did we use Memory Reservation?

We used 5088MB as the memory reservation for the VM, the sum of the various memory segments
This is the physical ESX host memory guaranteed to be available to the VM upon startup, so that memory overcommitment is avoided
Hence ESX memory management techniques such as ballooning and swapping are avoided, preserving performance

ESX Memory Management

Guest virtual memory is mapped to guest physical memory, which in turn is mapped to ESX physical memory.

ESX Memory Management

The figure illustrates an overcommitted memory situation: only 4GB of physical memory is available, but 6GB has been allocated to the VMs
ESX uses Transparent Page Sharing (TPS), ballooning, and host swapping to support memory reclamation in an overcommitted situation

GC Policy Types

Concurrent GC (CMS):
Concurrent mark and sweep, no compaction
Concurrent implies that a running GC doesn't pause your application threads; this is the key difference from throughput/parallel GC
Suited to applications that care more about response time than throughput
CMS uses more heap compared to throughput/parallel GC
CMS works on the old gen concurrently, but the young generation is collected using ParNewGC, a version of the throughput collector
Has multiple phases: initial mark (short pause), concurrent mark (no pause), pre-cleaning (no pause), re-mark (short pause), concurrent sweep (no pause)

G1:
Only in Java 7 and mostly experimental; roughly equivalent to CMS plus compaction

ESX Memory Management

ESX host swapping is a last resort, and things are quite bad at this stage: if TPS and ballooning didn't reclaim enough, ESX will swap VM memory to the swap file.

Impact of Reducing Young Generation (-Xmn)


Young gen (minor GC): more frequent minor GCs, but of shorter duration
Old gen (major GC): potentially increased major GC duration
You can mitigate the increase in major GC duration by decreasing -Xmx

Increasing Survivor Ratio Impact on Old Generation


Increasing the survivor ratio shrinks the survivor spaces (S0/S1), which increases tenuring/promotion into the old gen and hence increases major GC activity.

Why is Duration and Frequency of GC Important?


We want to ensure regular application user threads get a chance to execute between GC activity: watch both the frequency and the duration of young gen minor GCs and old gen major GCs.

Parallel Young Gen and CMS Old Gen


Young gen (minor GC): parallel/throughput GC in YoungGen using -XX:+UseParNewGC and -XX:ParallelGCThreads; minor GC threads briefly pause application user threads
Old gen (major GC): concurrent mark and sweep using -XX:+UseConcMarkSweepGC, running alongside application user threads


CMS Collector Example

JVM options and descriptions:

-Xmn16g: fixed-size young generation.
-XX:+UseConcMarkSweepGC: the concurrent collector is used to collect the tenured generation; it does most of the collection concurrently with the execution of the application, which is paused only for short periods during the collection.
-XX:+UseParNewGC: a parallel version of the young generation copying collector, used with the concurrent collector; it sets whether to use multiple threads in the young generation (with CMS only!). By default this is enabled in Java 6u13, probably any Java 6, when the machine has multiple processor cores.
-XX:CMSInitiatingOccupancyFraction=51: the percentage of the heap that must be full before the JVM starts a concurrent collection in the tenured generation. The default is somewhere around 92 in Java 6, which can lead to significant problems; setting it lower allows CMS to run more often (sometimes all the time), but it often clears more quickly and helps avoid fragmentation.

CMS Collector Example


JVM options and descriptions (continued):

-XX:+UseCMSInitiatingOccupancyOnly: indicates all concurrent CMS cycles should start based on -XX:CMSInitiatingOccupancyFraction=51.
-XX:+ScavengeBeforeFullGC: do a young generation GC prior to a full GC.
-XX:TargetSurvivorRatio=80: desired percentage of survivor space used after scavenge.
-XX:SurvivorRatio=8: ratio of eden to survivor space size.

CMS Collector Example


JVM options and descriptions (continued):

-XX:+UseBiasedLocking: enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread that first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.
-XX:MaxTenuringThreshold=15: sets the maximum tenuring threshold for use in adaptive GC sizing. The current largest value is 15; the default is 15 for the parallel collector and 4 for CMS.

CMS Collector Example


JVM options and descriptions (continued):

-XX:ParallelGCThreads=6: sets the number of garbage collection threads in the young and old parallel garbage collectors. The default value varies with the platform on which the JVM is running.
-XX:+UseCompressedOops: enables the use of compressed pointers (object references represented as 32-bit offsets instead of 64-bit pointers) for optimized 64-bit performance with Java heap sizes under 32GB.
-XX:+OptimizeStringConcat: optimize String concatenation operations where possible (introduced in Java 6 Update 20).
-XX:+UseCompressedStrings: use a byte[] for Strings that can be represented as pure ASCII (introduced in Java 6 Update 21 performance release).
-XX:+UseStringCache: enables caching of commonly allocated strings.

IBM JVM: GC choice

-Xgcpolicy:optthruput (default): performs the mark and sweep operations during garbage collection while the application is paused, to maximize application throughput; mostly not suitable for multi-CPU machines. Example: apps that demand high throughput but are not very sensitive to the occasional long garbage collection pause.
-Xgcpolicy:optavgpause: performs the mark and sweep concurrently while the application is running, to minimize pause times; this provides the best application response times. There is still a stop-the-world GC, but the pause is significantly shorter. After GC, the app threads help out and sweep objects (concurrent sweep). Example: apps sensitive to long latencies, such as transaction-based systems where response times are expected to be stable.
-Xgcpolicy:gencon: treats short-lived and long-lived objects differently, to provide a combination of lower pause times and high application throughput. Before the heap fills up, each app thread helps mark objects (concurrent mark). Example: latency-sensitive apps where objects in the transaction don't survive beyond the transaction commit.

jRockit JVM: GC choice

-Xgc:throughput (default); also -Xgc:genpar, -Xgc:singlepar (non-gen), -Xgc:parallel (non-gen): optimizes for maximum throughput. Example: apps that demand high throughput but are not very sensitive to the occasional long garbage collection pause.
-Xgc:pausetime; also -Xgc:gencon, -Xgc:singlecon (non-gen): optimizes for short and even pause times; the default pause target is 500ms. Can use -XpauseTarget:<time>. The pause target affects application throughput: a lower pause target inflicts more overhead on the memory management system. Example: apps sensitive to long latencies, such as transaction-based systems where response times are expected to be stable.
-Xgc:deterministic: optimizes for very short and deterministic pause times. Can use -XpauseTarget:<time>. Example: apps with deterministic latency requirements, such as brokerage transaction-based applications.

What is the practical limit for JVM memory sizing?

The most limiting practical sizing factor is the per-NUMA-node RAM. From largest to smallest:
64-bit Java theoretical limit: 16 exabytes
Guest OS limit: 1 to 16 TB
ESXi 5 limit: 32 vCPU and 1TB RAM per VM
Physical server limit: ~256GB, <1TB
Per-NUMA-node RAM

What are the practical and theoretical limits of JVM sizes


Java is 64-bit, hence the theoretical limit is 16 exabytes
Guest OS limits: Windows 2008 (64-bit) is 2TB; RHEL 5 is 1TB; RHEL 6 is 2TB; SUSE 11 is 16TB of RAM
ESXi 5 limits: 32 vCPU and 1TB RAM per VM
Practical NUMA-localization limits depend on the amount of RAM available in the server hardware you select, divided by the number of CPU sockets on the server:

NUMA Local Memory = Total RAM on Server / Number of Processors (sockets)
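A hedged worked example (hardware assumed, not from the deck): a 256GB server with 2 sockets yields 256 / 2 = 128GB of NUMA-local memory per node, so a VM and its JVM should be kept under ~128GB to stay NUMA-local.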

GC tuning knowledge requirements differ by JVM size band: <4GB, 4-12GB, 12-32GB, and >32GB

Next limitation is GC tuning knowledge for each JVM size:

<4GB: enterprise web apps, internal/external
4GB to 12GB: large enterprise apps
12GB to 32GB: large monitoring systems and other large web-scale public apps
32GB to 128GB: large distributed data platforms (trading systems)

Less than 4GB JVMs


90% of enterprise web apps will fit into this range
The ease of scale-out in Java allows you to keep your JVM size small; you can add more VMs in horizontal scale-out fashion to service more traffic
The throughput collector -XX:+UseParallelOldGC should be good enough
You benefit from automatic 32-bit address compression even though you are on a 64-bit JVM; most JVMs do this automatically
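A hedged launch sketch for this size class (heap size and main class assumed):

java -Xms4g -Xmx4g -XX:+UseParallelOldGC com.example.MyWebApp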

4GB to 12GB JVMs

Large enterprise applications; could be applications that don't have good horizontal scalability built in, or that just need the larger heap
Medium amount of GC tuning needed: minor GC frequency, -Xmn, and full GC duration adjustments start to become a consideration
The throughput collector -XX:+UseParallelOldGC should be good enough
Use large pages if the application consumes a lot of data within a single thread of allocation
You will need to turn on HotSpot -XX:+UseCompressedOops for heaps up to 32GB; from Java 6 update 18 it is enabled by default
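A hedged launch sketch for this size class (sizes and main class assumed; -XX:+UseLargePages enables large pages in HotSpot):

java -Xms12g -Xmx12g -XX:+UseParallelOldGC -XX:+UseLargePages -XX:+UseCompressedOops com.example.MyApp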

12GB to 32GB JVMs


Large enterprise systems and monitoring tools, for example; could be applications that don't have good horizontal scalability built in, or that just need the larger heap
Medium amount of GC tuning needed: minor GC frequency, -Xmn, and full GC duration adjustments start to become a consideration
Use large pages
CMS will likely be used if latency and response time targets are not met due to GC pauses
You will need to turn on HotSpot -XX:+UseCompressedOops for heaps up to 32GB; from Java 6 update 18 it is enabled by default

Greater than 32GB to 128GB JVMs


Large distributed data platforms; could be applications that don't have good horizontal scalability built in, or that just need the larger heap
Extensive amount of GC tuning needed: minor GC frequency, -Xmn, full GC duration, survivor space sizing, etc.
ParallelGCThreads set to 50% of available cores
All about CMS, i.e. -XX:+UseConcMarkSweepGC; use large pages if needed
-XX:+UseCompressedOops is no longer applicable above 32GB heaps
The UseNUMA flag doesn't work with CMS, so you may have to rely on numactl and/or ESX
128GB is the extent of most cost-feasible servers, assuming a 256GB server with 2 sockets

Sizing Large JVMs

Sizing breakdown (set the memory reservation to 31955m):
VM Memory for GemFire (31955m) = Guest OS Memory (500m used by the OS) + JVM Memory for GemFire (31455m)
JVM Memory (31455m) = JVM Max Heap -Xmx (29696m) + Perm Gen -XX:MaxPermSize (256m) + Java stacks, -Xss per thread (192k * 100 threads) + other mem (~1484m)
Initial heap -Xms (29696m), set equal to -Xmx

Sizing large JVMs: -XX:+UseNUMA in the HotSpot JVM

Only available from Java 6 update 2
Only works with -XX:+UseParallelOldGC and -XX:+UseParallelGC
Only resort to NUMA tuning for a large JVM that spans multiple memory/processor nodes
You can check N%L in esxtop; if it is not 100%, the workload is not NUMA-local
ESXi 5 NUMA optimizations are effective for the majority of cases
In Linux, for a single JVM use numactl --interleave; for multiple JVMs you could use --cpunodebind=<node> and --membind=<node>
We typically don't see the need to do this; the ESX scheduler does a pretty good job
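A hedged illustration of the numactl usage described above (node numbers, heap sizes, and main classes assumed):

numactl --interleave=all java -Xms29g -Xmx29g com.example.BigDataNode
numactl --cpunodebind=0 --membind=0 java -Xms12g -Xmx12g com.example.Jvm0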

Sizing large JVMs and the Java Stack (-Xss)

Increasing the Java stack size may improve memory-intensive workloads

In regular, small heap spaces (<4GB) found in typical web apps built with Java, we would decrease the default stack size in order to increase scalability
However, in memory-intensive workloads, if sufficient objects are created within one thread and do NOT escape to another thread, you can increase -Xss into the 1MB to 2MB range, ideally staying within the L1 and L2 cache range; you may still get an improvement in speed of execution if you slightly exceed the available L1 cache
This limits the number of horizontally scaled-out, or concurrent, threads you can fit within the memory space
This is more suited to large JVMs and data-intensive in-memory data management systems, i.e. distributed caches etc.
We are trying to localize as much of the execution as possible within L1, L2, L3 and the NUMA node, in that order of priority
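A hedged pairing of the two directions above (values assumed, not prescriptive):

java -Xss192k ... (small-heap web app: smaller stacks, more concurrent threads)
java -Xss1m ... (memory-intensive, thread-local workload: larger stacks, fewer threads)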

Most Common VM Size for Java workloads

2 vCPU VM with 1 JVM, for tier-1 production workloads
Maintain this ratio as you scale out or scale up, i.e. 1 JVM : 2 vCPU
Scale-out is preferred over scale-up, but both can work
You can diverge from this ratio for less critical workloads

Building block: 2 vCPU VM, 1 JVM (-Xmx 4096m), approximately 5GB RAM reservation

However for Large JVMs + CMS


Start with a 4+ vCPU VM with 1 JVM for tier-1, in-memory data management system type production workloads
For large JVMs: 4+ vCPU VM, 1 JVM (8-128GB heap)
Likely increase the JVM size instead of launching a second JVM instance
A 4+ vCPU VM allows ParallelGCThreads to be allocated 50% of the available vCPUs, i.e. 2+ GC threads
The ability to increase ParallelGCThreads is critical to YoungGen scalability for large JVMs
ParallelGCThreads should be allocated 50% of the vCPUs available to the JVM and no more; you want to ensure other vCPUs remain available for other transactions

Large RAM NUMA Nodes on ESX Hosts

Most ESX hosts have 48-144GB of memory; it can be more in some cases, e.g. 384GB
To consume this amount of RAM, designers are forced to increase the traditional JVM size
Take an example of 128GB RAM and 2 sockets, 8 cores each
If we assume 2 vCPU per JVM, then we can have 8 JVMs/VMs
ESX memory overhead is ~1%: 128 * 0.99 / 8 => 15.84GB per VM
The Java process typically needs ~25% on top of the Java heap, hence the JVM heap can be 11.88GB
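Spelling out the arithmetic (a sketch using this example's assumptions): usable RAM ≈ 128GB * 0.99 = 126.72GB; per VM = 126.72 / 8 ≈ 15.84GB; JVM heap ≈ 15.84 * 0.75 ≈ 11.88GB, reserving the remaining ~25% for off-heap process memory.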

An Example ESX Host BIOS Configuration


Performance Perspective

See "Performance of Enterprise Java Applications on VMware vSphere 4.1 and SpringSource tc Server" at http://www.vmware.com/resources/techresources/10158.

Performance Perspective

The 90th-percentile response-time curves and CPU utilizations for the 2-CPU native and virtualized cases: below 80% CPU utilization in the VM, the native and virtual configurations have essentially identical performance, with only minimal absolute differences in response times.

[Chart: % CPU and R/T vs. load, with the 80% threshold marked]

Performance Perspective

The 90th-percentile response-time curves for increasing load in the 4-CPU native and virtualized cases: the native and virtual configurations have essentially identical performance across all loads, with only minimal differences in response times.

[Chart: % CPU and R/T vs. load, with the 80% threshold marked]

Performance Perspective
Peak throughput for a single instance of Olio running on tc Server, both natively and in a VM, with 1, 2, and 4 CPUs.

If given 4 vCPUs, which VM configuration is better?

1 VM: 4 vCPU, 1 JVM (4GB heap)
2 VMs: 2 vCPU each, 2 JVMs total (2.5GB heap each)
4 VMs: 1 vCPU each, 4 JVMs total (2GB heap each)

Performance Perspective

Results (lowest CPU % and best R/T marked):

vCPUs per VM | Number of VMs | Per-VM maximum heap | Total heap for the 4 vCPU case
1 | 4 | 2GB | 8GB
2 | 2 | 2.5GB | 5GB (best case)
4 | 1 | 4GB | 4GB


Most Common Sizing and Configuration Question

Option 1: JVM/VM horizontal scalability (best option)
4 x 2 vCPU VMs, 1 JVM on each VM, 4GB heap on each JVM (JVM-1 Web, JVM-2 Web, JVM-3 Web, JVM-4 Web)

Option 2: JVM vertical scalability (second best option)
2 x 2 vCPU VMs, 1 JVM on each VM, 8GB heap on each JVM (JVM-1 Web, JVM-2 Web)
Reduces JVM/VM sprawl

Most Common Sizing and Configuration Question

Option 3: JVM stacking (third best option)
1 x 4 vCPU VM, 2 JVMs on the VM (JVM-1 Web, JVM-2 Web), 4GB heap on each JVM
Reduces JVM/VM sprawl and the OS count, with an increased number of JVM instances on a single OS
While the JVM-to-vCPU ratio is still 1 JVM : 2 vCPU, it is still not as prudent as Option 1
It would likely not perform well if you configured a 2 vCPU VM: in busy JVMs, GC would require 1 vCPU while other user transactions would take the additional available vCPUs

Performance Perspective
2 VMs with 2 vCPU each and 2.5GB RAM per VM showed the best throughput for the amount of memory used and the R/T achieved.

Performance Perspective
Scaling of peak throughput: 2 vCPU VMs show the best scalability when horizontally scaled.

FILL OUT A SURVEY


EVERY COMPLETED SURVEY IS ENTERED INTO A DRAWING FOR A $25 VMWARE COMPANY STORE GIFT CERTIFICATE

APP-CAP1426

The Benefits of Virtualization for Middleware

Jeff Battisti, Cardinal Health
Emad Benjamin, VMware, Inc.

#vmworldapps
