Microsoft Windows Azure Platform White Paper
Contents
Executive Summary / Introduction
Overview
About the Open Text Archive Server
Architecture
Scalability and Distribution
Features of the Open Text Archive Server
Single Instance Archiving
Compression
Encryption of the stored data
Secure Data Transport
Data transport secured with checksums
Retention Handling
Storage Management
Logical archives
Hardware abstraction
Supported storage media
Backup, Replication, High Availability and Disaster Recovery
Backup
Disaster recovery
Remote standby
High Availability
About Microsoft Windows Azure Storage
Archive Server integration with Azure Storage
Business Case
How can Open Text customers profit from the Microsoft Azure Storage?
What are the benefits for the customer
Performance Measurements
Test scenarios
Test environment
Host system
Virtual test clients
Archive Server
Network connection
Performance Results
Load on the Archive Server
Iteration with 10 kB documents
Iteration with 20 kB documents
Iteration with 50 kB documents
Iteration with 100 kB documents
Architecture
The Open Text Archive Server comprises multiple services and processes, such
as the Document Service, the Administration Server and the Storage Manager.
The Document Service provides document management functionality, storage of
technical metadata, and secure communication with archiving clients. The
Storage Manager is responsible for managing external devices. The
Administration Server offers an API to administer the archive environment, tools
and jobs.
Figure: Open Text Archive Server architecture
Compression
To save storage space, content can be compressed before it is written to the
storage system. Compression can be activated for different content types, and can
reduce storage consumption by more than 30 percent.
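
As a rough illustration of how such a saving can be measured, the following Python sketch compresses a sample payload with the standard `zlib` module and reports the fraction of space saved. The helper name `compression_saving` and the sample content are assumptions made for illustration only; the compression method actually used by the Archive Server is not described here.

```python
import zlib


def compression_saving(content: bytes, level: int = 6) -> float:
    """Return the fraction of space saved by compressing `content`."""
    if not content:
        return 0.0
    compressed = zlib.compress(content, level)
    # A negative result would mean the compressed form is larger than the input.
    return 1.0 - len(compressed) / len(content)


# Hypothetical sample: repetitive text compresses well, already-compressed data does not.
sample = b"Invoice 4711; customer: Example Corp; status: archived\n" * 200
saving = compression_saving(sample)
print(f"space saved: {saving:.0%}")
```

The achievable saving depends strongly on the content type, which is one reason to activate compression per content type rather than globally.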
Retention Handling
The Archive Server allows applying retention periods to content. Retention
periods are handled by the Archive Server and are passed to the storage
platform, as far as the storage platform supports the notion of retention.
Storage Management
Logical archives
A logical archive is an area on the Archive Server in which documents belonging
together can be stored. Each logical archive can be configured to represent a
different archiving strategy appropriate to the types of documents archived
exclusively there.
Logical archives make it possible to store documents in a structured way. You
can organize archived documents in different logical archives according to
various criteria, e.g.
Compliance requirements
Storage platforms
Security requirements
Hardware abstraction
A key task of the Archive Server is hiding specific hardware characteristics from
leading applications, providing transparent access, and optimizing storage
resources.
The Archive Server can handle various types of storage hardware and provides
hardware abstraction by offering a unified storage interface. If a hardware vendor's
storage API changes, or if new versions are released, it is not necessary to change all the
leading applications using the hardware, but only the Archive Server's interface to
the storage device.
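
The idea of hardware abstraction can be sketched in a few lines of Python. This is a minimal illustration, not the Archive Server's actual interface: the names `StorageDevice`, `InMemoryDevice` and `archive_document` are hypothetical, and an in-memory dictionary stands in for a vendor-specific device.

```python
from abc import ABC, abstractmethod


class StorageDevice(ABC):
    """Hypothetical unified storage interface seen by leading applications."""

    @abstractmethod
    def write(self, doc_id: str, content: bytes) -> None:
        """Store the content under the given document ID."""

    @abstractmethod
    def read(self, doc_id: str) -> bytes:
        """Return the content stored under the given document ID."""


class InMemoryDevice(StorageDevice):
    """Stand-in for a vendor-specific adapter; only this class knows the 'vendor API'."""

    def __init__(self) -> None:
        self._blobs = {}  # maps doc_id to content

    def write(self, doc_id: str, content: bytes) -> None:
        self._blobs[doc_id] = content

    def read(self, doc_id: str) -> bytes:
        return self._blobs[doc_id]


def archive_document(device: StorageDevice, doc_id: str, content: bytes) -> bytes:
    """Application code depends only on the interface, not on the device type."""
    device.write(doc_id, content)
    return device.read(doc_id)


print(archive_document(InMemoryDevice(), "doc-1", b"example content"))
```

If the vendor API changes, only the adapter class needs to change; `archive_document` and any other caller stay as they are.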
The Archive Server can create copies of volumes as backups. The copies may be
produced on the local Archive Server or on a remote backup or standby server. To
avoid losing data in the event of a hard disk failure, and to resume using the Archive
Server immediately, we recommend using RAID (Redundant Array of
Independent Disks) technology as an additional data backup mechanism.
In addition to document content, administrative information is synchronized
between original and backup systems.
Disaster recovery
The Archive Server stores the technical metadata together with the content on the
storage media (e.g. DocId, aid, timestamp). This allows the Archive Server to
completely restore access to archived documents if the Archive Server
hardware suffers a major breakdown or is destroyed.
Remote standby
With a remote standby server, all the documents in an archive are duplicated on
a second Archive Server (the remote standby server) via a WAN connection for
geographic separation. If the production Archive Server fails, the remote standby
server continues to provide read access to all the documents. Physically
separating the two servers also provides optimal protection against fire, flood and
other catastrophic loss.
High Availability
To eliminate long downtimes, the Archive Server offers active-passive high
availability.
High availability is a two-node cluster solution in which a fully equipped Archive
Server node monitors the current production system by heartbeat. If a node fails,
the other node automatically assumes all activities, with full transparency for end
users.
If the production system fails, users can continue to work normally on the
secondary archive system. In contrast to the remote standby server scenario,
both read (retrieval) and write (archiving) access to documents is possible in this
configuration.
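
The heartbeat-based failover described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Archive Server's cluster implementation: the class `ClusterMonitor`, the node names and the timeout value are all hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class ClusterMonitor:
    """Passive node's view of the active node (all names are hypothetical)."""

    timeout_s: float        # silence longer than this counts as a failure
    last_heartbeat: float   # time of the most recent heartbeat, in seconds
    active: str = "node-a"
    passive: str = "node-b"

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Swap roles if the active node has been silent too long; return the active node."""
        if now - self.last_heartbeat > self.timeout_s:
            self.active, self.passive = self.passive, self.active
            self.last_heartbeat = now  # start monitoring the new active node
        return self.active


monitor = ClusterMonitor(timeout_s=5.0, last_heartbeat=0.0)
monitor.heartbeat(now=3.0)
print(monitor.check(now=6.0))   # last heartbeat 3 s ago: within the timeout
print(monitor.check(now=12.0))  # last heartbeat 9 s ago: roles are swapped
```

A real cluster would additionally have to guard against both nodes becoming active at once, which this sketch does not address.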
The following picture shows an Azure device named OTCloud. Five volumes are
configured.
The volumes market_vol1, market_vol2 and market_vol3 are used in the logical archive
HH_LA_4.
Business Case
How can Open Text customers profit from the Microsoft
Azure Storage?
Any customer using an application based on the Open Text ECM Suite and the
Archive Server is a candidate for using Azure Storage. Customers have to
upgrade to Archive Server 9.7.1. Use of Azure Storage is not restricted to
Microsoft Windows platforms; it is also available for Unix operating systems such as Sun
Solaris, IBM AIX, HP HP-UX and Linux. The Archive Server runs on-premises at the
customer site, whereas Azure Storage is provided over the Internet.
To use Azure Storage, customers need to contact Microsoft for an account. With
the account, the customer can configure the storage environment (see "Archive
Server integration with Azure Storage") and start using the cloud.
The Archive Server comes with a built-in Volume Migration tool that
transparently migrates existing content from local hardware to Azure Storage.
Performance Measurements
Performance tests were done by using the Open Text XOTE test tool.
The XOTE test tool is an internal test suite developed by the Open Text Quality
Assurance department to run performance and benchmark tests with the Archive
Server and storage platforms. The tool allows creating arbitrary documents of
different sizes, supports automated test scenarios, and collects results for evaluation
in log files.
Within the benchmark test, the following test cases were set up.
Test scenarios
No.   Number of documents   Document size
1.    20000                 10 kB
2.    10000                 20 kB
3.    10000                 50 kB
4.    5000                  100 kB
5.    5000                  200 kB
6.    2000                  500 kB
7.    2000                  1000 kB
Measured times are extracted from the log files of the tool with a precision of
milliseconds. The start and end times are given in GMT+1 (CET).
Test environment
The test setup consists of one Archive Server 9.7.1 connected to Microsoft
Windows Azure storage.
Figure: Test setup. Archive Server with Document Service (15 connections) and libAzure HTTP client (15 connections to the HTTP server).
There are four test PCs, each hosting 5 virtual clients that send parallel read and
write requests to the Archive Server. In total, 20 parallel clients send read and
write requests.
All servers are hosted on a Hyper-V server with Microsoft Windows 2008 Server
as the operating system.
Host system
2 x Quad Core Opteron 2376 (2,3GHz, 6MB)
32GB (8x4GB Dual Rank DIMMs) 667MHz
450GB SAS 15.000 1/min
Gigabit Ethernet network
Archive Server
Version 9.7.1, Patch AS097-057
Windows Server 2008 R2 Standard (64 Bit)
4 (virtual) CPU 2,3 GHz Opteron 2376
4 GB memory
Network connection
The Archive Server is connected via Gigabit Ethernet to the Open Text
network in Munich, Germany. The Open Text network (Ethernet backbone)
connects via the Internet (155 Mbit) to the cloud storage located in South US.
Therefore, latency and throughput between the Archive Server and Windows
Azure are dominated by a combination of (a) the connection between Munich
and South US and (b) the bandwidth between the two sites.
Figure: Test environment (Location: Munich, Germany): client PCs and Hyper-V server, connected via the Internet to Microsoft Azure Storage.
Client PC:
Windows Server 2008 R2 Standard (64 Bit)
2 (virtual) CPU 2,3 GHz Opteron 2376
2 GB memory
Performance Results
Load on the Archive Server
The following figures show the load on the server during the different phases.
These figures did not change with document size.
Figure 5 Archive Server Task Manager while archiving documents to the disk buffer
Figure 6 Archive Server Task Manager while verifying documents on the disk buffer
Figure 8 Archive Server Task Manager while purging documents from the disk buffer
Figure 9 Archive Server Task Manager while verifying documents from Microsoft Windows Azure
Figure 10 Archive Server Task Manager while deleting documents from Microsoft Windows Azure
Iteration with 10 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     19,75       < 16        297
Read from Diskbuffer    87,25       16          656
Write to Azure          1.239,89    1.190       2.846
Read from Azure         825,25      578         11.327
Delete from Azure       1.153,75    828         2.969
Iteration duration: 6,5 hours
Start: 2009-11-13 22:11:54 (Fri)
End: 2009-11-14 04:32:12 (Sat)
The cause of the maximum value is unknown. The average was calculated over
20.000 documents. The minimum value (578 ms) for reading from Azure is an
upper bound for the latency.
Figure 11 shows the average time per step during the test in graphical form.
Figure 11 Average time per step for 10 kB documents (20000 documents)
Iteration with 20 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     20,00       < 16        344
Read from Diskbuffer    121,25      16          39.059
Write to Azure          1.213,52    1.170       2.441
Read from Azure         822,50      578         22.048
Delete from Azure       1.125,75    827         2.375
Figure: Graphical overview for 20 kB documents (10000 documents)
Iteration duration: 6,5 hours
Start: 2009-11-13 22:11:54 (Fri)
End: 2009-11-14 04:32:12 (Sat)
Iteration with 50 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     37,75       16          5.044
Read from Diskbuffer    200,50      16          859
Write to Azure          1.362,63    1.180       370.016
Read from Azure         881,50      594         96.232
Delete from Azure       1.141,25    812         4.429
Figure: Graphical overview for 50 kB documents (10000 documents)
Iteration duration: approx. 7 hours
Start: 2009-11-07 11:31:44 (Sat)
End: 2009-11-07 18:12:24 (Sat)
Iteration with 100 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     54,25       16          328
Read from Diskbuffer    350,50      31          906
Write to Azure          1.402,12    1.354       4.332
Read from Azure         1.310,25    828         6.874
Delete from Azure       1.114,00    797         2.531
Figure 14 Graphical overview for 100 kB documents (5000 documents)
The graphic shows that a read request takes longer than a delete request and
almost as long as a write request.
The following iterations show that the read time increases with file size.
Iteration duration for 5.000 documents: approx. 7 hours
Start: 2009-11-07 21:16:57 (Sat)
End: 2009-11-08 04:17:31 (Sun)
Iteration with 200 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     91,00       47          4.218
Read from Diskbuffer    707,50      78          20.499
Write to Azure          1.650,55    1.549       3.839
Read from Azure         2.269,25    1.203       36.934
Delete from Azure       935,75      811         2.577
Figure: Graphical overview for 200 kB documents (5000 documents)
Iteration with 500 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     199,00      156         1.172
Read from Diskbuffer    1.551,75    202         2.906
Write to Azure          2.530,55    2.221       6.773
Read from Azure         4.016,50    2.125       10.826
Delete from Azure       1.089,75    811         2.250
Figure 16 Graphical overview on results for 500 kB documents (2000 documents)
Iteration duration: approx. 10 hours
Start: 2009-11-09 19:58:19 (Mon)
End: 2009-11-10 06:10:26 (Tue)
Iteration with 1.000 kB documents
Action                  AVG (ms)    Min (ms)    Max (ms)
Write to Diskbuffer     383,75      328         1.484
Read from Diskbuffer    3.281,50    484         4.453
Write to Azure          3.866,18    2.544       5.511
Read from Azure         7.249,75    3.952       13.405
Delete from Azure       1.074,00    828         3.154
Figure: Graphical overview for 1.000 kB documents (2000 documents)
Iteration duration: 11 hours
Start: 2009-11-13 10:21:04 (Fri)
End: 2009-11-13 21:09:09 (Fri)
Summary
Any interpretation of the results has to be done with care. There are a lot of
known and unknown parameters influencing the results.
The following parameters can influence the results:
Outlook
The results show that Internet latency seems to be the limiting factor for write and
read requests. Several steps are possible to prove this assumption and to overcome
this factor.
Install the clients and the Archive Server in the U.S., or use a European data
center with the clients in Munich.
Average time in ms by document size:
Size       Write to Diskbuffer   Read from Diskbuffer   Write to Azure   Read from Azure   Delete from Azure
10 KB      20                    87                     1.240            825               1.154
20 KB      20                    121                    1.214            823               1.126
50 KB      38                    201                    1.363            882               1.141
100 KB     54                    351                    1.402            1.310             1.114
200 KB     91                    708                    1.651            2.269             936
500 KB     199                   1.552                  2.531            4.017             1.090
1000 KB    384                   3.282                  3.866            7.250             1.074
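
One way to read these averages is to fit a straight line to the "Read from Azure" times, so that the intercept approximates a size-independent overhead per request and the slope approximates the additional time per kB. The sketch below does this with an ordinary least-squares fit in plain Python. The linear model is an assumption introduced here for illustration, not an analysis from the test report, and the function name `fit_line` is hypothetical.

```python
# Average "Read from Azure" time in ms per document size in KB (from the table above).
SIZES_KB = [10, 20, 50, 100, 200, 500, 1000]
READ_MS = [825, 823, 882, 1310, 2269, 4017, 7250]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


overhead_ms, ms_per_kb = fit_line(SIZES_KB, READ_MS)
print(f"fixed overhead per read: about {overhead_ms:.0f} ms")
print(f"additional time per KB:  about {ms_per_kb:.2f} ms")
```

Under this model, a large fixed overhead relative to the per-KB cost would be consistent with latency dominating small requests, but the fit uses only seven averaged data points and should be interpreted with the same care as the measurements themselves.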
The following graphic shows an overall view of the different test runs.
Figure: Overall view of the test runs (time in ms vs. document size in kB)
Figure 19 Write rates for Microsoft Windows Azure (write rate vs. document size in kB)
The write rate to Azure is calculated from the number of connections to Azure,
document size and write time per document.
Write rate = (number of connections / write time) * document size
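
The formula can be applied directly to the measured averages. The following Python sketch assumes the 15 connections shown in the test setup and the average "Write to Azure" times from the results tables; expressing the result in kB/s and the function name `write_rate_kb_per_s` are choices made here for illustration.

```python
CONNECTIONS = 15  # assumed from the test setup: 15 connections to Azure

# Average "Write to Azure" time in ms per document size in kB (from the results tables).
WRITE_TIME_MS = {
    10: 1239.89,
    20: 1213.52,
    50: 1362.63,
    100: 1402.12,
    200: 1650.55,
    500: 2530.55,
    1000: 3866.18,
}


def write_rate_kb_per_s(size_kb: float, write_time_ms: float,
                        connections: int = CONNECTIONS) -> float:
    """Write rate = (number of connections / write time) * document size."""
    write_time_s = write_time_ms / 1000.0
    return connections / write_time_s * size_kb


for size_kb, time_ms in WRITE_TIME_MS.items():
    rate = write_rate_kb_per_s(size_kb, time_ms)
    print(f"{size_kb:>5} kB documents: {rate:8.1f} kB/s")
```

The calculation assumes that all connections are busy writing documents of the same size for the whole measurement, so it gives an aggregate rate rather than a per-connection rate.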
Figure: Read rates for Microsoft Windows Azure (read rate vs. document size in KB)
Figure 20 Overall view of benchmark test with Microsoft Windows Azure Cloud Storage (logarithmic; time in ms vs. document size in kB)
For more information about Open Text products and services, visit www.opentext.com. Open Text is a publicly traded company on both NASDAQ (OTEX) and the TSX (OTC).
Copyright 2009 by Open Text Corporation. Open Text and The Content Experts are trademarks or registered trademarks of Open Text Corporation. This list is not exhaustive. All other
trademarks or registered trademarks are the property of their respective owners. All rights reserved. SKU#_EN