
Jetstress 2013

Jetstress Field Guide


Wednesday, 26 February 2014 Version 2.0.0.8 [Issued]

Prepared by neil.johnson@microsoft.com

Template Version October 2011

Prepared for Exchange Community

MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, our provision of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The descriptions of other companies' products in this document, if any, are provided only as a convenience to you. Any such references should not be considered an endorsement or support by Microsoft. Microsoft cannot guarantee their accuracy, and the products may change over time. Also, the descriptions are intended as brief highlights to aid understanding, rather than as thorough coverage. For authoritative descriptions of these products, please consult their respective manufacturers.

© 2011 Microsoft Corporation. All rights reserved. Any use or distribution of these materials without express authorization of Microsoft Corp. is strictly prohibited.

Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Jetstress 2013, Jetstress Field Guide, Version 2.0.0.8 Draft Prepared by neil.johnson@microsoft.com "Document1" last modified on 26 Feb. 14, Rev 2


Revision and Signoff Sheet


Change Record
Date         Author        Version   Change reference
22/03/2013   Neil Johnson  2.0.0.1   First draft for Jetstress 2013
03/04/2013   Neil Johnson  2.0.0.2   Updates after feedback from Robert Gillies and Ramone Infante.
19/06/2013   Neil Johnson  2.0.0.5   Final issue after internal review
20/06/2013   Neil Johnson  2.0.0.6   Updated Error Table description with JET codes
20/06/2013   Neil Johnson  2.0.0.7   Added troubleshooting information for ESE 606. Fixed formatting issues


Document Contributors
Name              Position                                   Section
Neil Johnson      Senior Consultant, UK MCS                  Author
Alexandre Costa   SENIOR SDET, Exchange Test                 Jetstress internals
Ross Smith IV     PRINCIPAL PROGRAM MANAGER, Exchange CXP    Configuring Jetstress
Ramon b. Infante  DIR, WW COMMUNITIES, UC                    Various
Matt Gossage      PRINCIPAL PROGRAM MANAGER LEAD             Various
Umair Ahmad       SDET II, Exchange Test                     Various


Reviewers
Name              Version   Position                                               Date
Neil Johnson      2.0.0.1   Senior Consultant II, MCS UK
Alexandre Costa   2.0.0.1   SENIOR SDET, Exchange Test
Ross Smith IV     2.0.0.1   PRINCIPAL PROGRAM MANAGER, Office 365 - CAT SVCS
Ramon b. Infante  2.0.0.1   DIR, WW COMMUNITIES, UC
Matt Gossage      2.0.0.1   PRINCIPAL PROGRAM MANAGER LEAD, Exchange PM US
Umair Ahmad       2.0.0.1   SDET II, Exchange Test US
Nathan Muggli     2.0.0.1   SENIOR PROGRAM MANAGER, Exchange PM - US
Scott Schnoll     2.0.0.1   PRINCIPAL TECHNICAL WRITER, Content Publishing
Boris Lokhvitsky  2.0.0.1   DELIVERY ARCHITECT, US-US-MCS West SL 2
Jeff Mealiffe     2.0.0.1   SENIOR PROGRAM MANAGER LEAD, Office 365 - CAT SVCS
Robert Gillies    2.0.0.1   REGIONAL ARCHITECT, US-MCS DOD SL 2
David Mosier      2.0.0.1   PRINCIPAL CONSULTANT, US-MCS Civilian SL 2

Table 1: Document reviewers


Table of Contents
1 Purpose
2 What is New in Jetstress 2013
3 Introduction to Jetstress
4 Jetstress Internals
  4.1 Main Jetstress Components
    4.1.1 Auto Tuning Component
    4.1.2 Thread Dispatcher
    4.1.3 Background Log Checksummer
    4.1.4 Offline Log and Database Checksummer
    4.1.5 Reporting and Verification
5 Planning for Jetstress
  5.1 Jetstress testing flow chart
    5.1.1 High Level Test Overview
    5.1.2 Process with Automatic thread tuning
  5.2 When should I run Jetstress in my project?
  5.3 Where should I run Jetstress in my infrastructure?
  5.4 Failure Mode Testing
    5.4.1 Raid Array Testing
    5.4.2 Resilient Component Testing
    5.4.3 Example of a failed degraded mode test
  5.5 Jetstress testing inside virtual machines
    5.5.1 What is different about Jetstress inside a virtual machine?
  5.6 How much time should I allocate for Jetstress testing?
    5.6.1 Initialisation
    5.6.2 Testing
    5.6.3 Clean-up
  5.7 Preparing for the Jetstress test
  5.8 What happens if the test fails?
6 Installing Jetstress
  6.1 Documentation
  6.2 Jetstress Version and Download
  6.3 Prerequisites
  6.4 Getting ESE Files necessary for Jetstress
    6.4.1 File locations from an installed Exchange Server
    6.4.2 File locations from the installation media
  6.5 Installation
    6.5.1 Application Installation
    6.5.2 ESE File Installation
7 Configuring Jetstress
  7.1 Jetstress Test Types
    7.1.1 Test a disk subsystem throughput
    7.1.2 Test an Exchange mailbox profile
  7.2 Initial configuration
8 Jetstress Output Files
9 Reading Jetstress report data
  9.1 Target design values
  9.2 Reading the Jetstress Test Result Report
    9.2.1 Test Summary
    9.2.2 Database Sizing and Throughput
    9.2.3 Jetstress System Parameters
    9.2.4 Database Configuration
    9.2.5 Transactional I/O Performance
    9.2.6 Background Database Maintenance I/O Performance
    9.2.7 Log Replication I/O Performance
    9.2.8 Total I/O Performance
    9.2.9 Host System Performance
    9.2.10 Error Counts Per Volume
    9.2.11 Test Log
  9.3 Interpreting Jetstress test results
  9.4 Test evaluation
10 Appendix A Configuring thread count
11 Appendix B Configuring sluggishsessions
12 Appendix C - Running a Jetstress Test with JetstressCmd.exe
13 Appendix E Running Jetstress on a production server
14 Common Issues
  14.1 Troubleshooting Jetstress
    14.1.1 Jetstress cannot attach to or create a database
    14.1.2 Error loading Performance Monitor counters
    14.1.3 Unable to tune for the parameters
    14.1.4 Unable to mount databases due to invalid mount point configuration
    14.1.5 Jetstress testing failed. Error: System.ApplicationException: Faulty performance counter paths: \MSExchange Database(*)\*

1 Purpose

This document explains the process and requirements for validating an Exchange 2013 storage solution prior to releasing an Exchange deployment into production. It explains how Jetstress works, how to plan for and perform a Jetstress test, and how to analyse the results of the test. This document is not intended to provide Exchange storage design guidance. For guidance on Exchange 2013 server design and planning, refer to Planning and Deployment.

2 What is New in Jetstress 2013


Jetstress 2013 is an evolution of Jetstress 2010. It includes improvements and bug fixes, and it allows validation of Exchange Server 2013 solutions. A quick outline of new features:

- The Event log is captured and logged to the test log. These events show up in the Jetstress UI as the test is progressing.
- Any errors are logged against the volume on which they occurred. The final report shows the error counts per volume in a new sub-section.
- A single IO error anywhere will fail the test. CRC errors might be remapped; a re-run of Jetstress should verify that they were indeed remapped.
- Detects -1018, -1019, -1021, -1022, -1119, hung IO, DbtimeTooNew, DbtimeTooOld.
- Threads, which generate IO, are now controlled at a global level. Instead of specifying Threads/DB, you now specify a global thread count, which works against all databases. This improves the granularity of thread tuning and enables automatic tuning to work more effectively.
- Jetstress configuration files (JetstressConfig.XML) generated from an older version of Jetstress are no longer allowed.

Important Changes: Do not use Jetstress 2013 for older versions of Exchange Server. Jetstress 2013 has only been tested with Exchange Server 2013.


3 Introduction to Jetstress

Jetstress is a tool for simulating Exchange database I/O load without requiring Exchange to be installed. It is primarily used to validate physical deployments against the theoretical design targets that were derived during the design phase. To simulate the complex Exchange database I/O pattern effectively, Jetstress makes use of the same ESE.DLL that Exchange uses in production. It is therefore vital that Jetstress uses the same version of the Extensible Storage Engine (ESE) files that your Exchange infrastructure will be built with in production.

Ideally, Jetstress testing will be part of the overall project plan. The best time to schedule Jetstress testing is just before Exchange is physically installed onto the servers. Jetstress testing provides the following benefits prior to deploying live users:

- Validates that the physical deployment is capable of meeting specific performance requirements
- Validates that the storage design is capable of meeting specific performance requirements
- Finds weak components prior to deploying in production
- Proves storage and I/O stability

The most important aspect of Jetstress testing is that it allows you to see how the physically deployed storage and server infrastructure will behave once a real Exchange workload is applied. This often works out differently from expectations, especially in scenarios where shared storage infrastructure is deployed or where the storage design is complex. Often the Jetstress test will not provide the results that were expected. Sometimes subtle configuration changes to the storage infrastructure (for example, driver or firmware updates) are enough to get the test to pass.

It is important to remember that when the Jetstress test reports a failure, Jetstress has not failed; Jetstress is just reporting on the performance of your storage solution. This may seem an obvious point, but a large number of customer escalation cases for Jetstress are not actually Jetstress cases and are instead storage performance cases. If you need to remediate a test failure, remember that Jetstress is a dumb tool that is used worldwide by thousands of Exchange professionals and in Office 365. It is extremely unlikely that Jetstress is broken; it is far more likely that you have a design issue or misconfiguration with your storage deployment.

Fundamentally, a successful Jetstress test validates that all of the hardware and software components within the I/O stack, from the operating system down to the physical disk drive, are working to a sufficient level to meet the predicted performance required by Exchange to operate successfully.


Important: The validity of your Jetstress testing is only as good as the user profile analysis and workload prediction that was completed during the design phase of the project.

4 Jetstress Internals

4.1 Main Jetstress Components
Like Exchange, Jetstress is an ESE-based application. It runs in user memory space, makes API calls to ESE, which in turn makes calls to the Windows File system and I/O Manager to gain access to the data stored on disk. During each of these tasks Windows records performance information about the specific task and the operating system as a whole. Once the test is completed, Jetstress analyses the performance data to determine if the system meets the targets specified at the beginning of the test.
Figure 1 - Main Jetstress Components (diagram; labelled components: Jetstress Application, Auto tuning, Thread Dispatcher, Reporting and Verification, Background Log Checksummer, Offline Log & Database Checksummer, Extensible Storage Engine (ESE), Transactional I/O, Background Database Maintenance, Windows Operating System, Windows Performance Counters, Performance Data, Windows I/O Manager, Device Drivers, Hardware, Storage Subsystem)

4.1.1 Auto Tuning Component

This component is responsible for auto tuning within Jetstress. Each thread performs a set number of ESE calls, which generates a set amount of disk I/O, so the storage workload can be modified by raising or lowering the thread count. The auto-tuning component attempts to determine the maximum thread count that the storage solution can support whilst remaining within the published disk latency guidelines for Exchange Server. The Jetstress test parameters for disk latency are shown in section 9.3 Interpreting Jetstress test results.
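As an illustration only, the idea of raising the thread count until the latency guideline is exceeded can be sketched as below. This is not the algorithm Jetstress itself uses. The function names, the one-thread step size, the `fake_storage` model and the 20 ms limit in the example call are all assumptions made for the sketch; use the latency values from the section on interpreting test results for real work.

```python
from typing import Callable


def find_max_thread_count(
    measure_latency_ms: Callable[[int], float],
    latency_limit_ms: float,
    max_threads: int = 64,
) -> int:
    """Return the highest thread count whose measured latency stays within the limit.

    Returns 0 if even a single thread exceeds the limit.
    measure_latency_ms is a hypothetical callback that runs a short test
    with the given thread count and reports the observed disk latency.
    """
    best = 0
    for threads in range(1, max_threads + 1):
        if measure_latency_ms(threads) > latency_limit_ms:
            break  # workload is now too high; keep the last good value
        best = threads
    return best


# Hypothetical storage model: 2 ms base latency plus 1.5 ms per thread.
def fake_storage(threads: int) -> float:
    return 2.0 + 1.5 * threads


# Placeholder limit of 20 ms, used only for this illustration.
print(find_max_thread_count(fake_storage, latency_limit_ms=20.0))
```

A real tuning run would replace `fake_storage` with a short measured test, which is why each step is expensive and why the step size matters in practice.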


New: Auto tuning has been improved in Jetstress 2013 by moving to a global thread controller. Auto-tuning may still fail; however, it should be successful in many more scenarios than in Jetstress 2010.


4.1.2 Thread Dispatcher

The thread dispatcher is responsible for managing workload within Jetstress. The main areas of interest within the thread dispatcher are as follows:

- ThreadCount: the number of transactional threads. Prior to Exchange 2010 this was the number of threads per storage group, and in Exchange 2010 it was the number of threads per database. In Exchange 2013 this is a global parameter.
- ThreadTypes: each thread chooses one type of work to perform against the database, and the same thread can perform different types of work during a given run. There are four types: insert, read, update and delete (all against records in a table). The default operation mix for an Exchange 2010 simulation is 40%, 35%, 5% and 20%, respectively.
- SluggishSessions: the default is 1 for Exchange 2010. This is usually used to fine-tune the amount of work performed by a given thread. Internally, a thread sleeps for (SluggishSessions * TaskRunTime) before picking up the next task to run. For example, if SluggishSessions is 3 and an insert thread took 100ms in the last cycle, it will sleep for 300ms before moving on to the next cycle. A value of 0 means the thread runs at full throttle.
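The SluggishSessions arithmetic and the operation mix described above can be sketched as follows. The function and constant names are hypothetical and the code is not taken from Jetstress; it only restates the formula (SluggishSessions * TaskRunTime) and the quoted percentages.

```python
import random

# Operation mix quoted above for an Exchange 2010 simulation (percentages).
OPERATION_MIX = {"insert": 40, "read": 35, "update": 5, "delete": 20}


def pick_operation(rng: random.Random) -> str:
    """Choose the next type of work for a thread, weighted by the operation mix."""
    operations = list(OPERATION_MIX)
    weights = list(OPERATION_MIX.values())
    return rng.choices(operations, weights=weights, k=1)[0]


def sleep_after_task_ms(sluggish_sessions: int, task_run_time_ms: float) -> float:
    """Sleep time before the next task: SluggishSessions * TaskRunTime."""
    return sluggish_sessions * task_run_time_ms


# The example from the text: SluggishSessions = 3, last insert took 100 ms.
print(sleep_after_task_ms(3, 100))  # 300
# SluggishSessions = 0 means no sleep at all (full throttle).
print(sleep_after_task_ms(0, 100))  # 0
```

Because the sleep scales with the previous task's run time, raising SluggishSessions reduces the work done per thread proportionally, which gives finer control than changing the thread count alone.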

4.1.3 Background Log Checksummer

This component simulates the I/O overhead of additional database copies. This copy operation has an I/O cost which increases with each additional copy.

4.1.4 Offline Log and Database Checksummer

This process checksums all database and log files at the end of a Jetstress run to ensure that all data is intact. It also provides performance data for CRC checksum speed should VSS copies require a checksum prior to backup. This process is extremely hard on storage hardware, often applying an I/O load many times greater than the workload that the actual Jetstress test applies.

Important: If you are running Jetstress on multiple servers in parallel on shared storage infrastructure, it is vital that the CRC check is not running while other servers are performing their Jetstress tests. Selecting the multi-host option during the test configuration causes the testing process to stop and wait for confirmation before beginning the CRC check, to avoid servers interfering with each other's results.

While working out the correct thread count to use, it is not necessary to let the checksum part of the test complete. To stop the checksum you can either click on cancel, which will stop the checksum part of the test but still generate the performance test report, or edit the Jetstress configuration file and change the VerifyChecksum value to false (default is true):

    <VerifyChecksum>false</VerifyChecksum>
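If the configuration file is edited by script rather than by hand, a minimal sketch might look like the following. It uses a plain text substitution so that the rest of the file is left untouched. The function name and the sample fragment are hypothetical, and the surrounding structure of a real Jetstress configuration file is not assumed here.

```python
import re

# Matches the element shown above, whatever its current true/false value.
_VERIFY_CHECKSUM = re.compile(
    r"(<VerifyChecksum>)\s*(?:true|false)\s*(</VerifyChecksum>)",
    re.IGNORECASE,
)


def set_verify_checksum(config_text: str, enabled: bool) -> str:
    """Return config_text with the VerifyChecksum value set to true or false."""
    value = "true" if enabled else "false"
    new_text, count = _VERIFY_CHECKSUM.subn(
        lambda m: m.group(1) + value + m.group(2), config_text
    )
    if count == 0:
        raise ValueError("no <VerifyChecksum> element found")
    return new_text


# Hypothetical fragment used only to demonstrate the substitution.
sample = "<configuration><VerifyChecksum>true</VerifyChecksum></configuration>"
print(set_verify_checksum(sample, enabled=False))
```

Remember to set the value back to true before the final validation runs, since the checksum is what confirms that the data is intact.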


4.1.5 Reporting and Verification

At the end of a Jetstress test, the reporting and verification process compares the observed performance results against a set of acceptable values. These results are then written to an HTML file. During the test, binary performance data is written out to a BLG file.


5 Planning for Jetstress

Jetstress testing can be difficult to account for in your planning process. In particular, how much time should be allocated for testing, and in which parts of the project should Jetstress testing occur? This section will try to answer some of these questions and explain the process in more detail.

5.1 Jetstress testing flow chart


The aim of the following process is to find the maximum workload while still passing the test. Fundamentally, the aim is to increase workload until the test fails or meets the design goals identified in the mailbox role calculator.

Important: The last value before failure is the highest workload that the system can support. If this value is below the design target, then use sluggishsessions to fine-tune the test. If the storage is still unable to meet the requirements then we have determined that it is unsuitable for the workload intended.
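The rule that the last value before failure is the highest supported workload can be sketched as a small evaluation over a series of runs. The `RunResult` type, the function names and the figures in the example are hypothetical; they are not Jetstress output formats.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunResult:
    thread_count: int
    passed: bool
    achieved_iops: float


def highest_supported(results: Iterable[RunResult]) -> Optional[RunResult]:
    """Return the last passing run before the first failure, by increasing workload."""
    last_pass = None
    for run in sorted(results, key=lambda r: r.thread_count):
        if not run.passed:
            break
        last_pass = run
    return last_pass


def evaluate(results: Iterable[RunResult], target_iops: float) -> str:
    """Compare the highest supported workload with the design target."""
    best = highest_supported(results)
    if best is None:
        return "no passing run"
    if best.achieved_iops >= target_iops:
        return "design target met"
    return "below design target: fine-tune with sluggishsessions, then reassess"


# Made-up figures for illustration only.
runs = [
    RunResult(4, True, 800.0),
    RunResult(8, True, 1500.0),
    RunResult(12, False, 1900.0),
]
print(evaluate(runs, target_iops=1200.0))
```

Note that the failed run's IOPS figure is ignored on purpose: throughput achieved while failing the test does not count as supported workload.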

The following process assumes that you are using the disk subsystem throughput test and autotuning as recommended.

5.1.1 High Level Test Overview

Figure 2 - High Level Test Overview shows a high-level flowchart for Jetstress testing. The process begins with a completed Mailbox Role Calculator and ends when the test has passed successfully while meeting the targets identified in the calculator.
Figure 2 - High Level Test Overview (flowchart: Complete Mailbox Role Calculator, Begin Testing, Jetstress Testing, Test Pass?, Achieved IOPS?, Validation Complete; a "No" at either decision leads to Remediation / Reconfiguration)


5.1.2 Process with Automatic thread tuning


Figure 3 - Jetstress test flowchart for automatic thread tuning (flowchart. Main path: Testing Begins, Test Initialisation, AutoTuning sets thread count, Perform 15 minute test, Test Pass?, IOPS Sufficient?, Perform 2hr strict mode test, Test Pass?, Perform 24hr lenient mode test, Test Pass?, Test Results, Testing Ends. Other nodes: Non Latency Error?, IOPS Exceeded?, Increase thread count manually, Reduce thread count manually, Record failure results and Retest, Storage Solution Requires Remediation)
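The main path of the flowchart (a 15 minute test, then a 2 hour strict mode test, then a 24 hour lenient mode test) can be sketched as a staged sequence that stops at the first failure. This covers only the main path; the remediation and manual thread adjustment branches are not modelled. The `run_test` callback and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Stages on the main path: (label, duration in hours, mode).
# The figure names no mode for the 15 minute stage, so it is left as None.
STAGES: List[Tuple[str, float, Optional[str]]] = [
    ("15 minute test", 0.25, None),
    ("2hr strict mode test", 2.0, "strict"),
    ("24hr lenient mode test", 24.0, "lenient"),
]


def run_stages(
    run_test: Callable[[float, Optional[str]], bool],
) -> Tuple[bool, List[str]]:
    """Run each stage in order; stop at the first failure.

    Returns (all_passed, labels of the stages that passed).
    run_test is a hypothetical callback taking (hours, mode) and
    returning True when that stage passes.
    """
    completed: List[str] = []
    for label, hours, mode in STAGES:
        if not run_test(hours, mode):
            return False, completed
        completed.append(label)
    return True, completed


# Example with a stand-in callback that fails only the 24 hour stage.
print(run_stages(lambda hours, mode: hours < 24))
```

Ordering the stages from shortest to longest means a misconfigured system fails within minutes rather than after a full day of testing.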


5.2 When should I run Jetstress in my project?


Jetstress testing can often take place at multiple phases within the project plan. Depending on the design approach taken, Jetstress testing may be performed during both the planning (design) and build phases of a project.

Figure 4 - SDM phase overview

So, why would you run Jetstress during the planning/design phase of a project? The simple answer is that with today's powerful hardware, Exchange design teams must use standard chunks of hardware to create their design. Rather than attempt to guess the I/O limits of the hardware, it is preferable to perform some Jetstress tests on the hardware to determine the maximum storage IO capacity of the system. This allows the design team to specify the bill of materials much more precisely, thereby saving money and reducing risk.

However, if you have already proven the solution in the lab, why test again at build time? This is a common question. Many projects only schedule sufficient time for testing a single server and its storage solution, in the belief that they only need to validate the design. The problem with this approach is that it assumes a zero error rate in the build out. What happens if someone forgets a part of the build on one server, or deploys a different device driver from the one used in the lab? What happens if a faulty piece of hardware has been deployed?

Jetstress testing at build time is a great way to validate that the physically deployed hardware and software are capable of providing the required I/O performance for Exchange. It is also a way to identify failing components such as disk drives; it is much less stressful to identify a weak batch of disks during a Jetstress test than on a Monday morning after a large user migration!

If the project plan will allow it, build in sufficient time to test each server and storage chassis that will be deployed before migrating user mailboxes to it. Remember that Jetstress can be fully automated, so with a little bit of planning it can be left to run overnight and may not actually add any significant overhead to the project.


5.3 Where should I run Jetstress in my infrastructure?


To ensure that the Jetstress test is representative of production, it is recommended to run Jetstress on every set of disks that will hold mailbox database copies (active, passive or lagged). The test is designed to validate the storage system, so where multiple Exchange servers use the same storage system, you must test them in parallel to simulate the production workload. If the storage system also supports additional workload that is not yet active at the time of testing, you should use IOMeter to simulate it.

Note: It is important to remember not to run Jetstress on production servers that have Exchange Server already installed. This may lead to problems with Exchange performance counters. It is recommended to run Jetstress BEFORE installing Exchange Server into production. In the event that you have already installed and configured Jetstress on your production Exchange Servers, refer to the following article for more information on resolving Exchange Performance Counter problems: http://blogs.technet.com/b/mikelag/archive/2010/09/10/how-to-unload-reload-performance-counters-on-exchange-2010.aspx

Each database copy must be designed to provide sufficient I/O to support the copy if it were to become active. Therefore, by testing each database LUN in parallel, we are validating that the storage solution is able to meet the design requirements. We are also validating that any pieces of shared infrastructure are able to meet the demand of the entire solution, rather than simply testing each server individually.

Note: Where there is no shared infrastructure and all storage is directly attached, servers may be tested individually. However, to be a valid test, the test must be configured to include any active, replica or lagged LUNs that could be online at the same time.


5.4 Failure Mode Testing


5.4.1 Raid Array Testing
Since the improvements in Exchange I/O from Exchange 2007 onwards, it is now viable to deploy Exchange Server databases on a multitude of storage types, from JBOD to RAID 6. RAID arrays offer a good compromise between data redundancy and performance. However, they can also suffer a significant performance reduction when operating in degraded mode (spindle failure). For this reason, RAID arrays that will host Exchange Server databases should be designed to provide sufficient IOPS for the Exchange workload when running in degraded mode.

Important: While testing for failure scenarios, it is not necessary to run your Jetstress test at peak working load. Instead, modify the thread count until the Jetstress test achieves just above the Total Database Required IOPS / Server value reported in the Mailbox Role Calculator.

From a service availability perspective, it is important to validate that your storage can provide sufficient performance in all common failure conditions. Due to this, it is recommended to run the Jetstress test while the array is operating in the following conditions.
Array Condition | Test importance | Description
Optimal | Recommended for all deployments | All disk spindles operating normally
Degraded | Recommended for all deployments | Single spindle removed from the array
Rebuilding | Recommended if array has hot spare (see footnote 1) | Failed spindle replaced and array controller is rebuilding the array

Table 2: Raid array testing conditions

Ideally, the Jetstress test should still pass during a degraded mode test. If the test fails, refer to this post to analyse the failure severity.

5.4.2 Resilient Component Testing

Any aspect of the storage solution that has been designed to be resilient should also be tested in a failed state to determine the impact. For example, if there are multiple paths between the host and the storage controller, the Jetstress test should still pass if one path is disabled. There are too many possible types of resilient component to list them all here. However, the general spirit of this test is to evaluate potential sources of failure within your storage solution and ensure that Jetstress still passes if they enter a degraded state.
Footnote 1: If your array does not contain a hot spare, you can choose to perform array rebuilds out of hours so that the end-user impact is minimised; however, this increases your data loss exposure. If you plan to perform array rebuilds during working hours, it is recommended to run a Jetstress test while the array is rebuilding, even if you do not have a hot spare configured.

5.4.3 Example of a failed degraded mode test

This example shows an unacceptable test result. I have chosen to show an unacceptable result since a good test is just a flat line, which is not particularly interesting.

In this instance, the storage was based on RAID 6 technology. The Jetstress test was configured to run at 1256 IOPS (the Mailbox Role Calculator predicted 1200 IOPS). Approximately halfway through the test, a hard disk drive was (carefully) removed from the array and the spare began rebuilding. The test data shows that the average read I/O latency (Exchange Database ==> Instances\I/O Database Reads (Attached) Average Latency) increased from 11ms to 400ms+, with latency spikes of 3000-4000ms on the affected LUN. It took 18 hours for the array to return to normal after the failure. This represented a clear failure of the degraded mode test.

Important: Common failure modes such as a disk rebuild should not materially affect the test results.

Figure 5: Degraded mode failure

Note: For more information on recommended RAID configurations for Exchange Server, refer to the following article about understanding storage configuration for Exchange Server 2013: http://technet.microsoft.com/en-us/library/ee832792.aspx


5.5 Jetstress testing inside virtual machines


A quick history lesson: Over the years, we have seen a huge increase in deployments on hypervisor technology. During the early stages of hypervisor use for Exchange, we worked with a number of customers who observed inaccurate results during their Jetstress tests of virtual machines. This culminated in the Exchange product group releasing a statement that advised against using Jetstress inside a virtual machine and recommended testing on the root of the hypervisor instead. Obviously this worked for Hyper-V, but it was not quite so practical for all hypervisors.

On 30th March 2012, after significant internal testing against modern hypervisors, the Exchange product group announced that it is now viable to perform your Jetstress testing directly from inside the virtual machines that are planned to host the Exchange Mailbox role. The single caveat is that the hypervisor being used is one of the following or newer:

• Microsoft Windows Server 2008 R2 (or newer)
• Microsoft Hyper-V Server 2008 R2 (or newer)
• VMware ESX 4.1 (or newer)

Information: More information about deploying Exchange Server 2013 on a hypervisor can be found here: http://technet.microsoft.com/en-us/library/jj619301.aspx

5.5.1 What is different about Jetstress inside a virtual machine?

The approach and testing process do not change. The aim of the test is to validate that the storage presented to the virtual guest can provide sufficient performance to meet the predicted requirements from the Mailbox Role Calculator. All performance counters and recommended values remain the same from a physical to a virtual guest, and the recommendations for testing against RAID arrays and in failure modes still apply. However, there are things that we may need to consider during our Jetstress testing.

1. Is the virtual host operating at a normal working load during our test? If the host has capacity for 10 virtual machines and we are testing with a single virtual machine running, then there is the possibility that we will experience performance problems once the host is fully loaded.

2. Does the host server have any high availability technology that we need to test in degraded mode? This could include things like multiple paths to the storage or network, or maybe even a hypervisor HA solution. Additionally, the host may be the failover location for other guests, meaning that workload may increase dramatically in a failure scenario.

3. Follow the current recommended practices from both Microsoft and your hypervisor vendor. Yes, I know this is obvious, but it still amazes me how many problems are resolved by following the recommended guidance!

Guidance: The spirit of the test is to ensure that the system can meet its predicted workload during normal working conditions and during any common failure modes that the system has been designed to survive.


For more information about virtualizing Exchange Server:

• Announcing Enhanced Hardware Virtualization Support for Exchange 2010 (this applies equally to Exchange Server 2013): http://blogs.technet.com/b/exchange/archive/2011/05/16/announcing-enhanced-hardware-virtualization-support-for-exchange-2010.aspx
• Demystifying Exchange 2010 SP1 Virtualization (this applies equally to Exchange Server 2013): http://blogs.technet.com/b/exchange/archive/2011/10/11/demystifying-exchange-2010-sp1-virtualization.aspx
• Best Practices for Virtualizing Exchange Server 2010 with Windows Server 2008 R2 Hyper-V (this applies equally to Exchange Server 2013): http://www.microsoft.com/download/en/details.aspx?id=2428

5.6 How much time should I allocate for Jetstress testing?


Jetstress testing can take a long time to complete, and it is vital that this time is correctly planned for within your Exchange project plan. Generally, the test procedure can be broken up into three parts:

• Initialisation
• Testing
• Clean-up

5.6.1 Initialisation

This phase includes installation, prerequisites and initial database creation. Of these tasks, the initial database creation takes the longest. Database creation time varies between hardware deployments; however, expect around 24 hours for 10TB of data per server (~7GB/minute). If you are using direct attached storage and initialise multiple servers in parallel, these predictions apply to each server. If you are using shared storage, initialisation may take considerably longer.
DATA (TB) | TIME (Hours) | TIME (Days)
1TB | 2.4 | 0.1
2TB | 4.8 | 0.2
5TB | 12.0 | 0.5
10TB | 24.1 | 1.0
50TB | 120.3 | 5.0
100TB | 240.6 | 10.0

Table 3: Database initialisation time
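For planning purposes, the figures in Table 3 can be reproduced with a few lines of Python. This is a minimal sketch only: the function name is hypothetical, and the rate constant is an assumption back-derived from the table (240.6 hours for 100TB), not a value published by Jetstress. Your own hardware may initialise faster or slower.

```python
from typing import Tuple

# Assumed planning rate, back-derived from Table 3 (240.6 hours / 100TB).
# It is a rule of thumb for direct attached storage, not a measured value.
HOURS_PER_TB = 2.406


def initialisation_estimate(data_tb: float) -> Tuple[float, float]:
    """Return (hours, days) to initialise data_tb terabytes on one server."""
    if data_tb < 0:
        raise ValueError("data_tb must not be negative")
    hours = data_tb * HOURS_PER_TB
    return round(hours, 1), round(hours / 24, 1)


if __name__ == "__main__":
    for size_tb in (1, 2, 5, 10, 50, 100):
        hours, days = initialisation_estimate(size_tb)
        print(f"{size_tb}TB: {hours} hours ({days} days)")
```

Remember that the estimate applies per server for direct attached storage; shared storage may take considerably longer.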

5.6.2 Testing

The actual testing phase will vary depending on the complexity and maturity of the design. If your design is based on complex, cutting-edge storage technology, it is highly likely that you will need to allocate more time for testing. If your design is based on common direct attached components, the testing phase is likely to be quite short. For simple direct attached solutions, allow 2 to 5 days; for complex SAN solutions, try to allocate up to 10 working days. If you are working in a complex enterprise with large-scale, complex storage infrastructure, budget 4 to 6 weeks for Jetstress testing. Troubleshooting storage performance issues can often be very time-consuming.

5.6.3 Clean-up

Before the server can be put into production, it is necessary to remove the Jetstress application and the test databases that were created. The recommended procedure is as follows:

• Uninstall Jetstress and reboot
• Copy the Jetstress data to a safe location
• Delete the Jetstress installation folder
• Remove all test databases

Depending on complexity, allow between 1 and 2 hours per Exchange server that needs to have Jetstress uninstalled.

Tip: If you have a complex deployment, you can use the scripts embedded here:

JetstressScripts.zip

The scripts will parse your JetstressConfig.XML file and remove all database and log folders defined in the test. The scripts take two input parameters:

• [XMLFile] Path to the JetstressConfig.XML file. Defaults to C:\Program Files\Exchange Jetstress\JetstressConfig.xml if no other value is specified.
• [Prompt] $true or $false. The default is $true; specify $false to use the scripts as part of an automated process.

Note that these scripts are unsupported and you use them entirely at your own risk. They are provided here for convenience only.
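To illustrate the idea behind such a clean-up script, the following Python sketch lists the folders referenced by a configuration file without deleting anything. It is not the embedded script, which is not reproduced in this guide. The layout of JetstressConfig.xml is not described here, so the sketch makes no assumption about tag names: it simply collects any element text or attribute value that looks like an absolute Windows path. The function name and the sample XML are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Matches values that look like an absolute Windows path, e.g. D:\DB1
WINDOWS_PATH = re.compile(r"^[A-Za-z]:\\")


def find_test_folders(xml_text: str) -> List[str]:
    """Return unique path-like values found anywhere in the XML, in document order."""
    root = ET.fromstring(xml_text)
    folders: List[str] = []
    for element in root.iter():
        candidates = [element.text or ""] + list(element.attrib.values())
        for value in candidates:
            value = value.strip()
            if WINDOWS_PATH.match(value) and value not in folders:
                folders.append(value)
    return folders


if __name__ == "__main__":
    # Hypothetical fragment; the real JetstressConfig.xml layout may differ.
    sample = r"""
    <configuration>
      <database><path>D:\DB1</path><log>E:\LOG1</log></database>
      <database><path>D:\DB2</path><log>E:\LOG2</log></database>
    </configuration>
    """
    for folder in find_test_folders(sample):
        print("Would remove:", folder)
```

Printing the candidate folders first, rather than deleting them, mirrors the intent of the [Prompt] parameter: review the list before anything is removed.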


5.7 Preparing for the Jetstress test


Jetstress simulates an Exchange database workload. To ensure that the environment is ready, it should be configured according to both the hardware vendor's and Microsoft's recommendations. Refer to Understanding Exchange 2013 Storage Configuration Options for further detail. As a starting point, ensure that the following conditions have been met:

1. If multiple clusters will be sharing any aspect of the disk subsystem, the server/storage configuration must be Cluster/Multi-Cluster Certified.
2. Verify with vendors that drivers and firmware are current and consistent across all servers. Drivers and firmware include, but are not limited to, the following items:
   a. Server BIOS/firmware
   b. SCSI/Array Controller firmware and driver
   c. Fibre Host Bus Adapter (HBA) firmware and driver
   d. Fibre switch/hub firmware
   e. SAN (Storage Area Network) enclosure Operating System/Microcode/firmware
   f. Hard disk firmware
3. Verify that the HBA/SAN specific configuration is set correctly and is consistent across all servers. Many HBAs use registry keys to customize the configuration to a specific SAN platform (for example, Queue Depth).
4. RAID controller stripe size is 256KB or greater (refer to your hardware vendor for guidance).
5. Read/write cache is set to 75% write and 25% read on all LUNs.
6. Configure the storage logical unit numbers (LUNs) (consider Exchange log devices and database devices).
7. Format the LUNs within Windows with the NTFS file system. Best practice is a 64KB allocation unit size.
8. NTFS compression is not enabled.
9. File-level anti-virus is configured to exclude all Exchange data locations and any directories that Jetstress has been configured to use.
10. Storport.SYS has been updated to the latest supported version for your hardware.
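Some of the checklist items above are simple numeric or yes/no checks, so they lend themselves to a recorded pre-flight review. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the values are assumed to have been gathered by hand from your controller and Windows tools. It does not query any hardware itself.

```python
from typing import Any, Dict, List


def preflight_issues(settings: Dict[str, Any]) -> List[str]:
    """Compare recorded settings with the checklist values and list any mismatches."""
    issues: List[str] = []
    if settings["stripe_size_kb"] < 256:
        issues.append("RAID stripe size is below 256KB")
    if (settings["write_cache_pct"], settings["read_cache_pct"]) != (75, 25):
        issues.append("Controller cache is not 75% write / 25% read")
    if settings["filesystem"].upper() != "NTFS":
        issues.append("LUN is not formatted with NTFS")
    if settings["allocation_unit_kb"] != 64:
        issues.append("NTFS allocation unit size is not 64KB")
    if settings["ntfs_compression"]:
        issues.append("NTFS compression is enabled")
    if not settings["antivirus_exclusions"]:
        issues.append("Anti-virus exclusions are not configured")
    return issues


if __name__ == "__main__":
    # Hypothetical values recorded for one server.
    recorded = {
        "stripe_size_kb": 128,
        "write_cache_pct": 75,
        "read_cache_pct": 25,
        "filesystem": "NTFS",
        "allocation_unit_kb": 64,
        "ntfs_compression": False,
        "antivirus_exclusions": True,
    }
    for issue in preflight_issues(recorded):
        print("Check before testing:", issue)
```

Always defer to your hardware vendor where their guidance differs from these starting values.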


5.8 What happens if the test fails?


It is important to determine the pass and fail criteria for the test. The test will find the peak working load that the storage is able to provide at the I/O latency targets recommended by the Microsoft Exchange Team. These are defined in section 9.3 Interpreting Jetstress test results. If the IOPS achieved in the Jetstress test meets or exceeds the targets documented within the Exchange design, then the storage solution is deemed to have passed the test. If it does not meet the design targets, then the storage solution is deemed to have failed the test.

If the test shows that the storage has failed to meet its design targets, it will be necessary to perform remediation. This usually involves a combination of resources from the design/project, build, hardware, and storage vendor teams. The aim of remediation is to determine why the achieved IOPS was below the design target and to provide a remediation plan before submitting the solution for a re-test.

Before beginning significant storage redesign work, it is important to check the basics listed in section 5.7 Preparing for the Jetstress test. The most common causes of Jetstress test failures are missing simple configuration steps during deployment and/or misconfiguring the Jetstress test itself.

One of the most common pitfalls when a test fails is focussing on Jetstress itself. Remember that Jetstress has not failed. Your storage has failed the test. Jetstress is just the messenger. Instead, concentrate on understanding the data that Jetstress has provided and how you can fix your storage solution. Jetstress is a well-proven tool and is extremely unlikely to be the root cause of your storage test failing.

Advice: It is much easier to resolve configuration problems during this phase of the deployment than after the Exchange servers have been put into production. It is far better to suffer a small delay to the project timescales than to put a service into production that does not meet its original goals.
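The IOPS comparison described above can be written down as a short check. This is a sketch under stated assumptions: the function names are hypothetical, and it covers only the IOPS comparison against the design target. The latency criteria are evaluated by Jetstress itself and are not modelled here. The sample figures are the 1256 and 1200 IOPS values from the degraded mode example earlier in this guide.

```python
def meets_design_target(achieved_iops: float, required_iops: float) -> bool:
    """True when the achieved IOPS meets or exceeds the design target."""
    return achieved_iops >= required_iops


def headroom_percent(achieved_iops: float, required_iops: float) -> float:
    """Achieved IOPS above (positive) or below (negative) the target, as a percentage."""
    if required_iops <= 0:
        raise ValueError("required_iops must be greater than zero")
    return (achieved_iops - required_iops) / required_iops * 100


if __name__ == "__main__":
    achieved, required = 1256, 1200
    verdict = "meets" if meets_design_target(achieved, required) else "misses"
    print(f"Storage {verdict} the design target "
          f"({headroom_percent(achieved, required):.1f}% headroom)")
```

A negative headroom value gives the remediation team a first indication of how far the storage is from the design target.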


6 Installing Jetstress

6.1 Documentation

The document that you are currently reading represents the main source of information for Jetstress 2013. If you are validating Exchange Server 2003, 2007 or 2010 refer to the Jetstress Field Guide for Jetstress 2010.

6.2 Jetstress Version and Download


Version | Build | Usage | Link
14.01.0225.017 | 32 bit | Exchange 2003 (see footnote 2) | http://www.microsoft.com/en-us/download/details.aspx?id=20054
14.01.0225.017 | 64 bit | Exchange 2007, Exchange 2010 | http://www.microsoft.com/en-us/download/details.aspx?id=4167
15.0.658.4 | 64 bit | Exchange 2013 | http://www.microsoft.com/en-us/download/details.aspx?id=36849

Table 4 - Jetstress version and download table

Note: Although there is a 32-bit build of Exchange 2007, it is not recommended or supported to use these ESE files to run a Jetstress test. This is due to the requirement for a 64-bit address space to simulate a realistic Exchange I/O pattern.

Jetstress 2013 will not allow you to use an XML configuration file from an older version of Jetstress. Always ensure that you use the same version of Jetstress to initialise the databases and to perform the testing.

Footnote 2: Refer to Appendix D Exchange 2003 for information on configuring Jetstress 14.01.225.x for Exchange 2003.

6.3 Prerequisites

• .NET Framework 4.5 or higher
• A copy of your 64-bit production ESE files (see footnote 3)
  o ese.dll
  o eseperf.dll
  o eseperf.hxx
  o eseperf.ini
  o eseperf.xml

It is important that the version of ESE that is used for the test is the same version that will be used in production.

Footnote 3: See section 6.4 Getting ESE Files necessary for Jetstress for the locations of these files.

6.4 Getting ESE Files necessary for Jetstress


Jetstress requires ESE to function. The required files are available from an installed Exchange server or from the Exchange installation media. It is recommended to get the files from an installed Exchange server that has been fully updated and patched. If you are validating Exchange 2010 or newer, it is possible to get the necessary files directly from the installation media without requiring an Exchange installation.

Note: AMD64 refers to the 64-bit x86-64 architecture and is not specific to AMD processors. Do NOT use the x86 files!

6.4.1 File locations from an installed Exchange Server

File | Path
ESE.DLL | C:\Program Files\Microsoft\Exchange Server\V15\Bin
ESEPERF.DLL | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.HXX | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.INI | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.XML | C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64

Table 5 - ESE file locations on running Exchange server

6.4.2 File locations from the installation media

File | Path
ESE.DLL | \setup\serverroles\common
ESEPERF.DLL | \setup\serverroles\common\perf\amd64
ESEPERF.HXX | \setup\serverroles\common\perf\amd64
ESEPERF.INI | \setup\serverroles\common\perf\amd64
ESEPERF.XML | \setup\serverroles\common\perf\amd64

Table 6 - ESE file locations from installation media

Caution: Remember to use the same version of ESE files in your Jetstress tests that you will use in production.


6.5 Installation

Before starting this section, ensure that all prerequisites have been met and that Exchange Server is not installed on any servers being used for Jetstress testing.

6.5.1 Application Installation

1. Begin Jetstress installation.

2. Accept the License agreement.

3. Leave the installation options as default unless you have a good reason to change them. Note: All performance data and HTML reports will be stored in the installation folder, so if your system drive is short of space, select an alternative folder.

4. This is the last chance to stop the installation. Click on Next to install.

5. Once installation is completed, click on Close.

Table 7 - Jetstress installation instructions

6.5.2 ESE File Installation

1. Copy the ESE prerequisite files into the Jetstress installation folder. By default this is c:\Program Files\Exchange Jetstress

2. Start Exchange Jetstress 2013. Note: Jetstress requires local Administrator access. If User Account Control is enabled, ensure that you start the JetstressWin.EXE process as an administrator.

3. Click on Start new test.

4. Jetstress will attempt to use the ESE files that were copied over in step 1. The first time that this occurs, Jetstress must be restarted. Verify in the output on this screen that the ESE version is correct and that the last line of the status output states that Jetstress must be restarted. Close Jetstress. This is the end of the Jetstress installation.

Table 8 - ESE installation instructions


7 Configuring Jetstress

For the purposes of this document, we will be configuring a disk subsystem throughput test. The goal of this test is to identify the peak working IOPS value that the storage subsystem can sustain while remaining within the disk latency targets established by the Exchange Product Group.

7.1 Jetstress Test Types


7.1.1 Test a disk subsystem throughput
This test uses some fixed parameters to determine the maximum storage performance at maximum working capacity (80%). This is the recommended test type since it identifies the maximum working load of the storage solution for use with Exchange Server 2013 while the disks are filled to capacity. The values observed from this test can be used both to qualify the solution as ready for production and to calculate available system I/O headroom once the service is in production. This test should be regarded as mandatory for each Exchange server released into production.

Database Size Control: Where you are testing multiple databases per volume, Jetstress will automatically calculate the size of all databases on the same volume to ensure that the test runs at 80% of volume capacity. If your volume is oversized for your solution and the test databases are too large, you can control the size of the databases by reducing the value in the size the database using storage capacity percentage box during test configuration.

7.1.2 Test an Exchange mailbox profile

This test helps you determine whether your storage system meets or exceeds the planned Exchange mailbox profile. In the Exchange mailbox profile test scenario, you can specify the number of mailbox users, IOPS per mailbox and quota size to simulate the profiled Exchange mailbox load. This test type can be useful if your storage has been specifically designed to operate only at a specific disk capacity (see footnote 4).

Note: Even if this test type is used, it is still recommended to complete the disk subsystem throughput test to determine the maximum working load of the storage solution at full capacity.

Footnote 4: It is not recommended to design Exchange storage performance based on less than 80% utilisation capacity.

7.2 Initial configuration

1. Open Exchange Jetstress 2013.

2. Click on Start new test.

3. Check that the status text does not ask for a restart and that the last two lines state that the ESE engine and performance libraries were detected.


4. Since this is the first time we are configuring a test, we will accept the defaults and click Next. This will create a new configuration file called JetstressConfig.xml in the default installation directory. If you already have an XML file, select that.

5. Select the Test disk subsystem throughput test and click Next.

6. Ensure that Suppress tuning and use thread count is unchecked. This is a change from Jetstress 2010, where auto-tuning would rarely work. Auto-tuning should work in most scenarios with Jetstress 2013. If auto-tuning fails, revert to manual thread configuration as per Appendix A Configuring Thread Count. You should always test with 100% database capacity and target IOPS throughput; however, if the storage presented to your servers is greatly oversized, you can control the Jetstress test database sizes by reducing the size the database using storage capacity percentage value. Most validation tests should leave both values at 100.


7. Configure the test for performance. If you are testing a shared storage platform, enable the multi-host checkbox. Ensure that run background database maintenance is checked. Set continue the test run despite encountering errors to enabled. If any errors are detected during the test, they will be reported in a new table to highlight disk errors.

8. Enter the folder for storing the test results and set the correct duration for Jetstress. A minimum of one successful 2-hour test and a separate 24-hour test is required for deployment validation. Note: While auto-tuning or configuring thread count, you can set a test duration shorter than 2 hours by typing directly into the window: 0.75 = 45m, 0.50 = 30m, 0.25 = 15m. Recommendation: Use 0.50 (30 minute) test runs to set thread count for SAN storage.

9. Configure the test to represent the production deployment. Number of databases should be the total on this server, including all database copies (active, passive and lagged). Number of copies per database represents the total number of copies that will exist for each unique database. This value simply simulates some LOG I/O reads to account for the log shipping between active and passive databases; it does NOT actually copy logs between servers. For example, if your 6 server DAG contained 30 databases, with 1 active copy, 2 passive HA copies and 1 lagged copy per database (or 120 database copies spread across 6 servers, with each server hosting 20 copies), you would set the number of databases to 20 and the number of copies per database to 4.

10. Configure the database and log file paths appropriately. Scroll to the bottom of this page to find the Next link. Note: Refer to the Mailbox Role Calculator's Distribution tab to understand how your databases should be configured.

11. If this is the first time the test has been run, select Create new databases; otherwise, select Attach existing databases.


12. Verify that the paths are as expected and click Prepare test.

13. This will begin database initialisation. The duration of this process will vary, but plan on 24 hours for every 10TB of data to be initialised. This value should equate to 80% of the available storage. Refer to section 5.6.1 Initialisation for further information on database sizes and creation time.


14. Once the test has been initialised, click Execute Test.

15. Once the test has completed, close Jetstress and copy the Jetstress report and performance data somewhere for analysis. Each performance test will generate the following files:

• Performance_<date>.XML
• Performance_<date>.HTML
• Performance_<date>.BLG
• DBChecksum_<date>.XML
• DBChecksum_<date>.HTML
• DBChecksum_<date>.BLG
• XMLConfig_<date>.XML

Ensure that you make a copy of all of these files. Note: In addition, you may also wish to make a copy of the *.EVT files, which contain event log data taken during the test.
Table 9 - Jetstress initial configuration
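The worked example in step 9 (30 databases with 4 copies each across 6 servers) is simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes that copies are spread evenly across the DAG members. For an uneven layout, take the per-server figures from the Mailbox Role Calculator's Distribution tab instead.

```python
from typing import Tuple


def jetstress_database_settings(unique_databases: int,
                                copies_per_database: int,
                                servers: int) -> Tuple[int, int]:
    """Return (number of databases, number of copies per database) for one server."""
    total_copies = unique_databases * copies_per_database
    if servers <= 0 or total_copies % servers != 0:
        # An uneven spread cannot be derived here; use the calculator's figures.
        raise ValueError("copies do not divide evenly across the servers")
    return total_copies // servers, copies_per_database


if __name__ == "__main__":
    # 30 databases, each with 1 active + 2 passive + 1 lagged copy, on 6 servers.
    databases, copies = jetstress_database_settings(30, 4, 6)
    print(f"Number of databases: {databases}, copies per database: {copies}")
```

Note that the first value counts every copy hosted on the server, not only the unique databases.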


8 Jetstress Output Files

This section will explain what output files will be created after the test and what is in each one.
File | Content | Purpose
Performance_<date>.BLG | Binary performance data captured during the performance test. | To provide detailed data for analysis. Open this file in perfmon and examine the counters manually to understand reasons for failure.
Performance_<date>.XML | XML Report for the performance test | Provides the status report data in XML format.
Performance_<date>.HTML | HTML Report for the performance test | Provides an easy to read status report for the test.
DBChecksum_<date>.BLG | Binary performance data captured during the checksum test. | Provides binary performance data gathered during the CRC checksum of the database. Useful if the checksum fails or takes a long time to complete.
DBChecksum_<date>.XML | XML Report for the checksum test | Provides status report data in XML format.
DBChecksum_<date>.HTML | HTML Report for the checksum test | Provides an easy to read status report for the checksum test.
XMLConfig_<date>.XML | XML Configuration File | Provides a backup of the Jetstress Configuration file used for the test.

Table 10 - Jetstress output files
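Because every test run produces several files, it can help to copy them to a safe location with a script rather than by hand. The following Python sketch shows one way to do that. The function name and folder paths are hypothetical, and wildcard patterns are used because the exact format of the <date> portion of the file names is not described here. The *.EVT pattern covers the optional event log files.

```python
import fnmatch
import shutil
from pathlib import Path
from typing import List

# File name patterns taken from the output file list; matched case-insensitively.
PATTERNS = [
    "Performance_*.xml", "Performance_*.html", "Performance_*.blg",
    "DBChecksum_*.xml", "DBChecksum_*.html", "DBChecksum_*.blg",
    "XMLConfig_*.xml", "*.evt",
]


def collect_results(results_dir: str, archive_dir: str) -> List[str]:
    """Copy Jetstress output files to archive_dir and return the names copied."""
    source, target = Path(results_dir), Path(archive_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for item in sorted(source.iterdir()):
        name = item.name.lower()
        if item.is_file() and any(fnmatch.fnmatch(name, p.lower()) for p in PATTERNS):
            shutil.copy2(item, target / item.name)  # copy2 keeps file timestamps
            copied.append(item.name)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; adjust them to your own results and archive folders.
    for name in collect_results(r"C:\Program Files\Exchange Jetstress", r"D:\JetstressArchive"):
        print("Copied", name)
```

Keeping one archive folder per test run makes it easier to compare results later.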


9 Reading Jetstress report data

This section will walk through a very simple sample report, and explain where the key values are stored and how to interpret the data.

9.1 Target design values


Before we can evaluate our Jetstress data, we need to know what our design targets are. Assuming that the storage design was based on data from the Mailbox Role Calculator (which it should be), the information we need is in the following table on the Role Requirements tab.

Make a note of the following value: Total Database Required IOPS / Server


9.2 Reading the Jetstress Test Result Report


The following report is for a test with four databases configured.

9.2.1 Test Summary

This section is a basic summary of the test: when it started and finished, and which versions of the operating system and ESE were used. The most important part of this section is the overall test result, pass or fail.

9.2.2 Database Sizing and Throughput

This section shows some more detailed parameters regarding the test. A test disk subsystem throughput test report will always show 100% for Capacity Percentage and Throughput Percentage. In this example, 4 x 25GB databases were created on a 126GB LUN. Jetstress created a total of 101GB (109154926592 bytes) of data for testing, which is 80% of the available space. This is normal behaviour; by default, in performance mode Jetstress will use 80% of the disk capacity to allow room for growth during the test process.

The most important value in this section is the Achieved Transactional I/O per Second. In this example, the test validated that the storage can provide 231 transactional I/O per second. This represents random database IOPS.

Note: To validate that the test has met the design requirements, compare the Achieved Transactional I/O per Second from your Jetstress report to the Total Database Required IOPS / Server value from the Mailbox Role Calculator, recorded in section 9.1 Target design values.
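The capacity figures in this example can be checked with a short calculation. The sketch below is illustrative: the function name is hypothetical, and it assumes that the report's GB values are binary gigabytes (1024^3 bytes), which is consistent with 109154926592 bytes being shown as 101GB. The 126GB LUN size is taken from the example as a rounded figure, so the percentage is approximate.

```python
from typing import Tuple

BYTES_PER_GB = 1024 ** 3  # assumption: the report uses binary gigabytes


def capacity_used(test_data_bytes: int, lun_size_gb: float) -> Tuple[float, float]:
    """Return (test data in GB, percentage of the LUN that the test data occupies)."""
    if lun_size_gb <= 0:
        raise ValueError("lun_size_gb must be greater than zero")
    data_gb = test_data_bytes / BYTES_PER_GB
    return data_gb, data_gb / lun_size_gb * 100


if __name__ == "__main__":
    data_gb, percent = capacity_used(109154926592, 126)
    print(f"Test data: {data_gb:.1f}GB, about {percent:.0f}% of the LUN")
```

A result far from 80% suggests that the capacity percentage was changed during test configuration or that the LUN size was recorded incorrectly.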


9.2.3 Jetstress System Parameters

This section displays some system values that Jetstress used for this test. The important values for analysis here are the thread count and number of copies per database.

9.2.4 Database Configuration

This section lists the paths for each database and log combination. In this example, 4 x 25GB databases were configured on a single LUN. Check that all of the test databases are listed here and the path names are correct.

9.2.5 Transactional I/O Performance

This section of the report displays the Transactional I/O values that were achieved for each database. Transactional I/O does not include I/O for Background Database Maintenance (BDM). BDM I/O is mostly sequential, so it is not usually considered during the design phase.

Information: If you sum the values highlighted in the red box, the result should add up to the Achieved Transactional I/O per Second reported in the Database Sizing and Throughput table. In this example, 33.859 + 24.069 + 33.87 + 23.491 + 33.978 + 24.186 + 34.043 + 23.807 = ~231 IOPS.
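The cross-check described above is simple arithmetic. As a minimal sketch, the eight values quoted in the example can be summed as follows; the variable names are illustrative only and do not correspond to anything in the Jetstress report format.

```python
# Per-database transactional IOPS values quoted in the example report above.
per_database_iops = [33.859, 24.069, 33.87, 23.491, 33.978, 24.186, 34.043, 23.807]

# The sum should agree with Achieved Transactional I/O per Second (about 231 here).
total_transactional_iops = sum(per_database_iops)
print(f"Total transactional IOPS: {total_transactional_iops:.3f}")
```

If the sum in your own report differs noticeably from the Achieved Transactional I/O per Second value, check that every test database is included.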

9.2.6 Background Database Maintenance I/O Performance

This section displays the I/O that was used to perform Background Database Maintenance only. The sum of values in the red box shows the total amount of I/O used for BDM operations. These are sequential operations and we do not usually need to account for them in our design. However, take the advice of your storage vendor on this aspect: some storage platforms do not handle sequential I/O as well as others and may require some additional design work to help them deal with BDM more gracefully.

9.2.7 Log Replication I/O Performance

This section displays the I/O overhead for LOG file replication. In this example there were two replica copies (replicas=2); this is shown by a non-zero count for I/O Log Reads/sec. If this value is greater than zero, it confirms that database replication is being simulated.

Note: For those that noticed, I finally provided a report that shows log I/O. I know, the little things count.


9.2.8 Total I/O Performance

This table shows all I/O that was recorded during the test (transactional I/O plus BDM I/O plus LOG I/O). The summation of I/O values from areas highlighted in red in this table should agree (roughly) with those observed at the storage subsystem. In this case, the summation suggests that the storage subsystem had to deal with 349 IOPS. However, roughly one third of those IOPS (349 - 231 = 118) were sequential and so were not accounted for during the design process, since sequential I/O is very easy on most disk subsystems.

The following chart shows the observed IOPS from the Windows host during the Jetstress test. This counter includes all system IOPS as well as the test IOPS; however, there should be a strong correlation between the IOPS observed on the Windows host and at the storage subsystem. In the event of contradiction between observed IOPS at the Windows host and those at the storage controller, the Windows host values take precedence from a Jetstress validation perspective.

Figure 6 - Host observed IOPS



It is important to differentiate between sequential IOPS and transactional (random) IOPS when validating your storage. We are only interested in transactional IOPS when we are Jetstress testing; BDM and LOG I/O are sequential in nature, so we ignore them from a performance planning perspective for Exchange Server. Storage teams are often confused by the results of a Jetstress test because the Achieved Transactional I/O per Second value is much lower than the observations they make at the storage system, which also include the sequential workloads.

Note: It is an invalid approach to sum the values displayed in the Total I/O Performance table and compare them to the Total Database Required IOPS / Server predicted by the Mailbox Role calculator. The only value from the Jetstress report that is required for validation is Achieved Transactional I/O per Second. All other values are for support and curiosity only!
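To make the split between the two workloads concrete, the sketch below separates the example figures from this section (349 total IOPS, 231 transactional IOPS) into their transactional and sequential parts. The variable names are illustrative; the numbers are the ones quoted in the example report.

```python
total_iops = 349          # sum of the Total I/O Performance table in the example
transactional_iops = 231  # Achieved Transactional I/O per Second in the example

# Everything that is not transactional is BDM or log I/O, which is sequential.
sequential_iops = total_iops - transactional_iops
sequential_share = sequential_iops / total_iops

print(f"Sequential (BDM + log) IOPS: {sequential_iops}")
print(f"Share of total I/O: {sequential_share:.0%}")
```

Only `transactional_iops` is compared against the design target; the sequential share is useful mainly when explaining the difference to a storage team.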

9.2.9 Host System Performance

Figure 7: Host System Performance Table

This section of the report shows the observed system performance during the test. This section is most often used for troubleshooting. The most important thing to note from this section is that the CPU load from Jetstress is usually minimal. Jetstress has been optimized to evaluate the storage subsystem and not the host performance itself.

9.2.10 Error Counts Per Volume


If Jetstress detects I/O errors during the test, it will try to continue running and will report the errors in both the Test Log and the Error Counts Per Volume table. The table lists each volume along with the number and type of I/O errors that were recorded.

Figure 8: Error Counts Per Volume Table


Error Type               JET/ESE Error Type                    Error Code
IO Failures              JET_errDiskIO                         -1022
                         JET_errReadVerifyFailure              -1018
                         JET_errPageNotInitialized             -1019
                         JET_errReadPgnoVerifyFailure          -1118
                         JET_errDiskReadVerificationFailure    -1021
Filesystem Corruptions   JET_errCheckpointCorrupt              -533
                         JET_errMissingLogFile                 -528
                         JET_errLogFileCorrupt                 -501
                         JET_errInvalidPath                    -1023
                         JET_errInvalidSystemPath              -1024
                         JET_errInvalidLogDirectory            -1025
                         JET_errFileAccessDenied               -1032
                         JET_errFileInvalidType                -1812
                         JET_errLogCorrupted                   -1852
                         JET_errObjectNotFound                 -1305
Lost Flush               JET_errReadLostFlushVerifyFailure     -1119
                         JET_errDbTimeTooOld                   -566
                         JET_errDbTimeTooNew                   -567

Table 11: JET Error Code Groupings

Information: Some failure events are more important than others. Lost Flush events signal that significant data corruption has occurred and that something is very wrong with your storage (under no circumstances should you entertain putting a system into production that is experiencing ANY lost flush events during a test). However, some other IO Failures are relatively normal. For example, in a JBOD environment we may see -1021 (JET_errDiskReadVerificationFailure). Although this signifies that the data we read was not the same as the data we originally wrote (the checksum failed), Exchange will try to deal with this scenario via Page Patching in normal operation, so it is not of critical importance.

For a full list of JET/ESE event types, see the following article: Extensible Storage Engine Error Codes.
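When reviewing the Error Counts Per Volume table, it can help to map each recorded code back to its group. The following is a minimal sketch that uses only the codes listed in Table 11; the function names are hypothetical and are not part of Jetstress.

```python
# Groupings copied from Table 11; codes outside the table are reported as "Other".
ERROR_GROUPS = {
    "IO Failures": {-1022, -1018, -1019, -1118, -1021},
    "Filesystem Corruptions": {-533, -528, -501, -1023, -1024, -1025,
                               -1032, -1812, -1852, -1305},
    "Lost Flush": {-1119, -566, -567},
}


def classify_error(code: int) -> str:
    """Return the Table 11 group for a JET/ESE error code."""
    for group, codes in ERROR_GROUPS.items():
        if code in codes:
            return group
    return "Other"


def blocks_production(codes) -> bool:
    """True if any recorded code belongs to the Lost Flush group."""
    return any(classify_error(code) == "Lost Flush" for code in codes)


print(classify_error(-1021))              # IO Failures
print(blocks_production([-1021, -1119]))  # True
```

The `blocks_production` helper encodes only the lost flush rule stated above; other error types still need to be judged in context.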


What is a Lost Flush?

A lost flush occurs if we issued a write operation to the disk and the OS reported the operation as having successfully completed, but it didn't actually get physically committed to the non-volatile storage. The two main reasons for this to happen are:

1. A bug somewhere in the storage stack.
2. Power loss on storage with write-cache enabled: in this case, the operation is committed to the volatile cache of the disk or controller, but if the hardware loses power, it means it never actually made it to the non-volatile storage, even though it was reported to the application that it did. This is the reason why we only run with write-cache enabled on the storage if there's a battery backing up the cache, so if it loses power, the controller makes sure to flush the uncommitted cache to the disk.

A lost flush is a very insidious type of storage failure for a database engine because the consequences can range from none (if we are very lucky) to nasty and potentially undetectable logical database corruption (more likely). Undetected lost flushes on the active copy may show up as a JET_errDbTimeTooNew (-567) replication error on the passive copy. Undetected lost flushes on the passive copy may show up as a JET_errDbTimeTooOld (-566) replication error on the passive copy.

ESE has implemented lost flush detection, based on a flush map. Basically, every time we issue a write on a page, we flip a bit on the actual page and also store that bit in a flush map in memory. If we read the page again off the disk, we check the bit against the in-memory flush map and if they don't match, it means the flush was lost.

Important: The bottom line for lost flushes is that you should NEVER put a system into production that has recorded lost flushes during the Jetstress test. You must be 100% certain that you have resolved the underlying problem and have at least one good 24 hour test that has no lost flushes recorded before accepting the solution into production.
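The flush map idea described above can be illustrated with a toy model. This is only a sketch of the bit-flip-and-compare principle, not how ESE actually stores or checks the bit; the classes `ToyDisk` and `ToyEngine` are invented for this illustration.

```python
class ToyDisk:
    """In-memory stand-in for a disk that can silently drop one write."""

    def __init__(self):
        self.pages = {}
        self.drop_next_write = False

    def write(self, page_no, data, flush_bit):
        if self.drop_next_write:
            self.drop_next_write = False
            return  # the write "succeeds" but nothing is persisted
        self.pages[page_no] = (data, flush_bit)

    def read(self, page_no):
        return self.pages[page_no]


class ToyEngine:
    """Flips a bit on every page write and remembers it in a flush map."""

    def __init__(self, disk):
        self.disk = disk
        self.flush_map = {}

    def write_page(self, page_no, data):
        new_bit = 1 - self.flush_map.get(page_no, 0)
        self.disk.write(page_no, data, new_bit)
        self.flush_map[page_no] = new_bit

    def read_page(self, page_no):
        data, bit_on_page = self.disk.read(page_no)
        if bit_on_page != self.flush_map[page_no]:
            raise RuntimeError(f"lost flush detected on page {page_no}")
        return data


disk = ToyDisk()
engine = ToyEngine(disk)
engine.write_page(7, "version 1")
disk.drop_next_write = True          # simulate a lost flush
engine.write_page(7, "version 2")
try:
    engine.read_page(7)
except RuntimeError as error:
    print(error)                     # lost flush detected on page 7
```

Because the second write never reached the toy disk, the bit stored with the page no longer matches the in-memory flush map, and the read reports the mismatch instead of returning stale data.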


9.2.11 Test Log


This section of the report is a log of the Jetstress test. It is most often used for troubleshooting failures.


9.3 Interpreting Jetstress test results


Jetstress evaluates latency values for Database Reads and LOG writes since these affect the end user experience.

Threshold                        Performance Test, Strict mode    Stress Test, Lenient mode
                                 (<= 6 hour test)                 (> 6 hour test)
Average Database Read Latency    20ms                             20ms
Average Log File Write Latency   10ms                             10ms
Max Database Read Latency        100ms                            200ms
Max Log File Write Latency       100ms                            200ms
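The thresholds above can be expressed as a small lookup, which is handy when checking values copied from several reports. This is a sketch under stated assumptions: the dictionary keys and function names are invented here, and a value that is exactly equal to a threshold is treated as a violation, in line with the "<20ms" and "<10ms" wording used in the test evaluation criteria.

```python
# Thresholds in milliseconds, copied from the strict and lenient figures above.
THRESHOLDS_MS = {
    "strict": {"db_read_avg": 20, "log_write_avg": 10,
               "db_read_max": 100, "log_write_max": 100},
    "lenient": {"db_read_avg": 20, "log_write_avg": 10,
                "db_read_max": 200, "log_write_max": 200},
}


def mode_for_duration(hours: float) -> str:
    """Tests of 6 hours or less use strict mode; longer tests use lenient mode."""
    return "strict" if hours <= 6 else "lenient"


def latency_violations(measured_ms: dict, hours: float) -> list:
    """Return the names of the measurements that reach or exceed their limit."""
    limits = THRESHOLDS_MS[mode_for_duration(hours)]
    # Assumption: a value equal to the limit counts as a violation.
    return [name for name, limit in limits.items() if measured_ms[name] >= limit]


measured = {"db_read_avg": 15, "log_write_avg": 4,
            "db_read_max": 150, "log_write_max": 60}
print(latency_violations(measured, hours=2))   # ['db_read_max']
print(latency_violations(measured, hours=24))  # []
```

The same measurements fail a 2 hour performance test but pass a 24 hour stress test, because only the maximum latency limits differ between the two modes.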


9.4 Test evaluation

Evaluate the following criteria for each test run. The first criterion is validated against the design target and must be checked manually; Jetstress does not validate this value. The second and third are checked against pre-defined latency targets for Exchange. If these values are not within tolerance, Jetstress will report the test as failed.

1. DB IOPS Target: Is the Achieved Transactional I/O per Second in the test report higher than the Total Database Required IOPS / Server predicted in the Mailbox Role Calculator?
2. Is the I/O Database Reads Average Latency in the test report <20ms?
3. Is the I/O Log Writes Average Latency in the test report <10ms?

DB IOPS Target   DB Read Latency   LOG Write Latency   Action
PASS             PASS              PASS                Test successful.
FAIL             PASS              PASS                The test is failing to meet the IOPS target, but the latency values are good. Increase the thread count by 1 and re-test. Use sluggishsessions to fine-tune if necessary.
PASS             FAIL              FAIL                At least one database has recorded latency over threshold. If the latency values are very close to limits, increase sluggishsessions by 1; if both target IOPS and latency values are much higher, decrease the thread count.
PASS             PASS              FAIL                As above.
PASS             FAIL              PASS                As above.
FAIL             FAIL              FAIL                If the test shows that Achieved IOPS is below the design target AND the test latency values are above limits, the storage solution is unable to meet the requirements. At this stage, it is necessary to re-evaluate the storage design and begin troubleshooting the physical deployment to determine the correct remediation.
FAIL             FAIL              PASS                As above.
FAIL             PASS              FAIL                As above.

Table 12 - Quick results analysis table
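The logic of Table 12 collapses to four outcomes, because the action depends only on whether the IOPS target was met and whether both latency checks passed. A minimal sketch follows; the function name and the shortened action strings are illustrative, not Jetstress output.

```python
def recommended_action(iops_ok: bool, read_latency_ok: bool, log_latency_ok: bool) -> str:
    """Return a short version of the Table 12 action for one test run."""
    latency_ok = read_latency_ok and log_latency_ok
    if iops_ok and latency_ok:
        return "Test successful."
    if not iops_ok and latency_ok:
        return "Increase the thread count by 1 and re-test; fine-tune with sluggishsessions."
    if iops_ok:
        return "Latency over threshold: increase sluggishsessions by 1 or decrease the thread count."
    return "Storage cannot meet the requirements: re-evaluate the design and troubleshoot."


print(recommended_action(True, True, True))
print(recommended_action(False, True, True))
print(recommended_action(True, False, True))
print(recommended_action(False, False, True))
```

Remember that the IOPS check is the manual comparison against the Mailbox Role Calculator value, so `iops_ok` has to be supplied by you rather than read from the Jetstress pass or fail result.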


10 Appendix A - Configuring thread count


Jetstress 2013 has been updated so that the auto-tuning feature will work in far more scenarios than previously. Due to this, it is recommended to begin Jetstress testing in auto-tuning mode and only revert to manual thread configuration if auto-tuning fails to set a thread value.

Thread count controls how many IOPS Jetstress attempts to drive through the storage subsystem. Setting this value correctly requires some trial and error. For the process described within this document, the goal is to increase the thread count to a value that fails and then reduce the value until the test passes; this should then represent the peak working IOPS value that the storage subsystem can support.

Each thread will generate a workload on the system. So for example, if the storage design team recommended that the storage for a given server was able to support 1000 IOPS:

Target IOPS = 1000

Starting thread count = Target IOPS / IOPS per thread

Given this example:

Starting thread count = 1000 / 65 = 15.38 (round up to 16)

Notes:
- Try auto-tuning with Jetstress 2013.
- If in doubt, start with thread=1 and work up until the test fails.
- If the thread count predicted is less than 1, it may be necessary to modify the sluggishsessions value afterwards.
- The exact quantity of IOPS generated per thread will change as the storage system workload changes. As the storage system gets closer to its performance limit, the IOPS per thread value will reduce.
- Jetstress was designed to produce approximately 60 IOPS per thread at 20ms disk latency.
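The starting thread count estimate can be sketched as a one-line calculation. The divisor of 65 IOPS per thread used below is an assumption inferred from the worked result in this appendix (1000 / 15.38 is approximately 65); the function name is illustrative.

```python
import math


def starting_thread_count(target_iops: float, iops_per_thread: float) -> int:
    """Estimate a starting thread count, rounding up to a whole thread."""
    return math.ceil(target_iops / iops_per_thread)


# Assumption: 65 IOPS per thread, inferred from the 15.38 result in the example.
print(starting_thread_count(1000, 65))  # 16
```

Treat the result only as a starting point: as the notes above explain, the IOPS generated per thread falls as the storage approaches its limit, so the final thread count still has to be found by testing.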


11 Appendix B - Configuring sluggishsessions


If it is not possible to achieve the right IOPS value by modifying the thread count, it becomes necessary to modify the sluggishsessions value within the JetstressConfig.xml file. The sluggishsessions value adds a pause between each task, which allows a level of fine-tuning over the workload dispatched by Jetstress. As sluggishsessions is increased, the achieved IOPS value decreases.

To change the value, open the JetstressConfig.xml file and look for the default configuration option:

<SluggishSessions>1</SluggishSessions>

Modify the value, save the configuration file and then restart Jetstress.
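If the value has to be changed on several servers, the edit can be scripted. The sketch below is an illustration only: the surrounding structure of JetstressConfig.xml is not described in this guide, so the one-element sample document and the `configuration` wrapper are hypothetical, and the function simply updates every element named SluggishSessions wherever it appears.

```python
import xml.etree.ElementTree as ET


def set_sluggish_sessions(xml_text: str, value: int) -> str:
    """Return the XML text with every SluggishSessions element set to value."""
    root = ET.fromstring(xml_text)
    # Match on the local name so a namespace prefix on the tag does not matter.
    matches = [el for el in root.iter() if el.tag.split("}")[-1] == "SluggishSessions"]
    if not matches:
        raise ValueError("SluggishSessions element not found")
    for element in matches:
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal document; a real JetstressConfig.xml contains much more.
sample = "<configuration><SluggishSessions>1</SluggishSessions></configuration>"
updated = set_sluggish_sessions(sample, 2)
print(updated)
```

Keep a copy of the original configuration file before writing any scripted change back to disk, since re-serialising XML can alter formatting and namespace prefixes.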


12 Appendix C - Running a Jetstress Test with JetstressCmd.exe


Both JetstressWin.exe and JetstressCmd.exe use the common Jetstress core library files, which means you will have comparable test results with the same XML configuration file. We recommend that you use JetstressWin.exe to create new test scenarios, and JetstressCmd.exe to open and run the test scenarios by using the /config command-line option. You can also see all the other available options by using the /? (help) command-line option.

Action Argument          Example of Use                          Description
help                     /?                                      The help for the command-line program
Config                   /c JetstressConfig.xml                  Open a configuration file
Generate                 /g                                      Generate a sample XML configuration file
TimeOut                  /TimeOut 2H0M0S                         Test duration. Default is 2 hours.
Output                   /output c:\output                       Path for test output. Default is the current directory.
DBPath                   /dbpath m:\sg1\mdb /dbpath n:\sg2\mdb   Database paths for each storage group
LogPath                  /log x:\sg1\log y:\sg2\log              Log path for each storage group
PctCapacity              /pctcapacity 100                        Specify capacity percentage
Throughput               /throughput 100                         Specify throughput percentage
Threads                  /threads                                Suppress auto tuning and specify thread count
DoNotRunDBMPerformance                                           Do not run background database maintenance during performance/stress test
RunDBMPerformance                                                Run background database maintenance during soft recovery test
New                      /new                                    Create new databases
Open                     /open                                   Open existing databases
Bak                      /bak                                    Restore backup database
Recovery                 /recovery                               Run soft recovery test
Streaming                                                        Run streaming backup test
Transaction                                                      Run transaction performance test
VerifyCheckSum                                                   Run database checksums
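When the same scenario is run repeatedly, it can be convenient to assemble the command line in a script. The sketch below only builds the argument list; it uses the /c, /TimeOut and /output examples shown in this appendix, and it assumes (without confirmation from this guide) that these options can be combined in a single invocation. The function name is illustrative.

```python
def build_jetstress_command(config_path: str, output_dir: str, timeout: str = "2H0M0S") -> list:
    """Build a JetstressCmd.exe argument list from options shown in this appendix."""
    # Assumption: /c, /TimeOut and /output may be given together on one command line.
    return ["JetstressCmd.exe", "/c", config_path, "/TimeOut", timeout, "/output", output_dir]


command = build_jetstress_command("JetstressConfig.xml", r"c:\output")
print(" ".join(command))
```

The list form can be passed to `subprocess.run` on the test server; run `JetstressCmd.exe /?` first to confirm the option names for your build.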


13 Appendix E - Running Jetstress on a production server


Although the formal support position on this is that you shouldn't do it (ever, at all, under any circumstances; in fact, you shouldn't even be reading this section of the field guide), we all accept there are cases where it can be necessary, such as when attaching new storage to an existing server or troubleshooting performance bottlenecks on existing servers. That still doesn't mean it's OK to do it!

If you really MUST do it, here are some things to know before beginning:

- Record the start-up state of all Exchange Services.
- Stop and Disable all Exchange Services on the server.
- Copy the ESE files from the currently installed version of Exchange Server. Jetstress will detect that the performance counters are already installed for this version of ESE and will use them; this will prevent performance counter problems afterwards!
- Do not unload/reload performance counters after the test (if you have used the same ESE files as are currently installed, this is unnecessary and could break things!).
- Remember to clean up the Jetstress test databases after testing.
- Uninstall Jetstress.
- Set Exchange Services back to the state they were in before you began testing.
- Reboot your Exchange Server.
- Check that the Exchange Performance counters are working.
- Inspect the Windows System and Application Event logs for errors.

Remember: This is not supported or recommended. Only follow this as a matter of last resort or under the instruction of Microsoft Support/Microsoft Consulting Services.


14 Common Issues
14.1 Troubleshooting Jetstress
While using Jetstress, you may encounter some known issues. This section provides possible causes and the recommended solutions.

14.1.1 Jetstress cannot attach to or create a database


Event log error that may display: Error -1023
Possible cause: The path of the database or log files is incorrect.
Solution: Ensure that the paths and file names are correct.

Event log error that may display: Error -1032
Possible cause: Permissions are insufficient to access the .edb file or the log files.
Solution: Verify that permissions are sufficient for the account under which Jetstress is running. Jetstress requires read/write permission to the directories it is using.

Event log error that may display: Error -550 (0)
Possible cause: The last time Jetstress was run, it was ended uncleanly. This caused the log files to become unsynchronized with the database.
Solution: Delete the Jetstress database (*.edb), log files (*.log), and check file (*.chk), and re-create the Jetstress database. You can also use Eseutil.exe with the /r switch to resynchronize the logs and database.

Event log error that may display: Error -1022
Possible cause: The failure is caused by circular logging by Jetstress.
Solution: Check the log drive for the log file name that is identified in the event log. Delete that log file and all the log files that have a higher number in the file name. Then, run Eseutil.exe /r to recover Jetstress.edb. When the database is in a good state, delete all the log files in the log directory, and rerun Jetstress.

14.1.2 Error loading Performance Monitor counters


JetstressWin.exe relies on performance counters to monitor the system and requires the ESE database counters to be installed.

Cause: When the counters are not loaded correctly, you may see exception errors related to performance counters.

Solution: To reload the counters:
1. Exit from JetstressWin.exe.
2. Locate the directory where JetstressWin.exe was installed and verify that the eseperf.dll, eseperf.hxx, and eseperf.ini files exist in the directory.
3. In a command shell window, type the command unlodctr ESE and then press Enter. This will unregister the ESE Database performance counters.
4. Start JetstressWin.exe and allow it to reload the performance counters.


14.1.3 Unable to tune for the parameters


This error indicates that Jetstress could not find appropriate parameters that could be used to run a performance or stress test at the desired level of I/O load.

Cause: This can be caused by several factors. The most common reason is that the storage subsystem has multiple hosts attached to it, and those hosts are competing for common resources during the tuning process.

Solution: When you are running in a scenario such as this, you can run Jetstress on a single host with tuning enabled to generate the appropriate load parameters, and then rerun the test on the other hosts with the Suppress Tuning option enabled and the tuning parameters entered manually from the results of the first test.

14.1.4 Unable to mount databases due to invalid mount point configuration


When using mount points and running the Prepare phase of Jetstress, the operation fails with the error "There is insufficient disk space on volume <system drive>:\", where <system drive> is the drive letter where you keep your root mount folder.

Cause: This error means that one or more of the mount points is invalid or the mount point folder path is not connected to its LUN. Database creation fails saying that volume C: (or in general, the system volume) does not have enough space. The issue here is that some of the mount points mapped to directories in the system volume are not properly configured, so Jetstress is looking at the directory (thus checking against the system drive itself), rather than the actual disk.

Troubleshooting: Execute a DIR command in the mount point root folder. ALL mount point folder paths are indicated by a <JUNCTION> notation. Any folder that is listed as a <DIR> is not attached to its mount point and is likely causing the problem.

Solution: A mount path folder can be listed as <DIR> for a number of reasons. Work through the following checks:

1. Verify the LUN is present and in good health.
2. Use the storage system array management software to verify that the LUN has an assigned logical drive.
3. Using the Disk Management MMC, re-assign the LUN to the correct mount point.
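The DIR check described in the troubleshooting step can be automated when there are many mount points. The sketch below scans captured DIR output for folders marked <DIR> rather than <JUNCTION>; the sample listing and the folder names DB01 and DB02 are hypothetical, and the exact column layout of DIR output on your system may differ.

```python
import re


def unattached_mount_folders(dir_output: str) -> list:
    """Return folder names listed as <DIR> (not <JUNCTION>) in DIR output."""
    suspects = []
    for line in dir_output.splitlines():
        match = re.search(r"<DIR>\s+(\S.*)$", line)
        if match:
            name = match.group(1).strip()
            if name not in (".", ".."):   # skip the current and parent entries
                suspects.append(name)
    return suspects


# Hypothetical listing of a mount point root folder.
sample_output = "\n".join([
    "01/02/2014  10:15    <DIR>          .",
    "01/02/2014  10:15    <DIR>          ..",
    "01/02/2014  10:15    <JUNCTION>     DB01",
    "01/02/2014  10:15    <DIR>          DB02",
])
print(unattached_mount_folders(sample_output))  # ['DB02']
```

Any folder the function returns is a candidate for the LUN and mount point checks listed above.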

14.1.5 Jetstress testing failed. Error: System.ApplicationException: Faulty performance counter paths: \MSExchange Database(*)\*
Jetstress version 658.004 has an incompatibility with ESE version 620 (CU1) and above, if you try to run a test with more than 38 databases configured. If you experience this issue either use the RTM version of ESE (516.26) or use a version of ESE later than 726, which will be released with CU2. Additionally, a fixed version of Jetstress will be released (726) that will work with all versions of ESE after 516.26 (Exchange 2013 ESE).
