This document describes products and services of Pegasystems Inc. It may contain trade secrets and proprietary information. The document and product are protected by copyright and distributed under licenses restricting their use, copying, distribution, or transmittal in any form without prior written authorization of Pegasystems Inc. This document is current as of the date of publication only. Changes in the document may be made from time to time at the discretion of Pegasystems. This document remains the property of Pegasystems and must be returned to it upon request. This document does not imply any commitment to offer or deliver the products or services described. This document may include references to Pegasystems product features that have not been licensed by your company. If you have questions about whether a particular capability is included in your installation, please consult your Pegasystems service consultant. For Pegasystems trademarks and registered trademarks, all rights reserved. Other brand or product names are trademarks of their respective holders. Although Pegasystems Inc. strives for accuracy in its publications, any publication may contain inaccuracies or typographical errors. This document could contain technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Pegasystems Inc. may make improvements and/or changes in the information described herein at any time.
This document is the property of:
Pegasystems Inc.
101 Main Street
Cambridge, MA 02142-1590
Phone: (617) 374-9600
Fax: (617) 374-9620
www.pega.com
PegaRULES Process Commander
Document: Database LOB Sizing and Performance Optimization
Software Version 5.4
Updated: February 4, 2009
CONTENTS
1 OVERVIEW
2 SUGGESTED APPROACH
3 THE STRUCTURE OF PEGARULES DATA
4 DATABASE SIZING SCRIPTS AND SCRIPT OUTPUT
5 ORACLE BLOB SIZING
5.1 Using the Oracle Sizing Script
5.2 Effects of Row Storage, Chunk Size, and BLOB Caching
5.3 Performance and Sizing Guidelines for PegaRULES BLOBs in Oracle
5.4 Exposed Columns and Table Sizing in Oracle Databases
5.5 Performance and Sizing Guidelines for Oracle Tables
6
6.1 Using the Sizing Scripts for DB2 Versions Earlier Than V8.2
6.2 Using the Sizing Scripts for DB2 Versions 8.2 and Later
6.3 BLOB Storage in DB2 UDB
6.4 Performance and Sizing Guidelines for PegaRULES BLOBs in DB2
6.5 Exposed Column and Table Sizing in DB2 Databases
6.6 Performance and Sizing Guidelines for DB2 Tables
7
7.1 Using the Sizing Script with SQL Server 2000
7.2 Using the Sizing Script with SQL Server 2005/2008
7.3 BLOB Storage in SQL Server
7.4 Performance and Sizing Guidelines for PegaRULES BLOBs in SQL Server
7.5 Exposed Column and Table Sizing in SQL Server Databases
1 Overview
This document explains how to estimate disk requirements for the PegaRULES database by running sizing scripts, calculating an estimated size, and tuning performance for BLOB (Binary Large Object) data. Database system types described in this document are Oracle, DB2, and SQL Server. This document is intended for database and system administrators. Readers of this document should be familiar with the management of their database system and the PRPC database.
2 Suggested Approach
The BLOB sizing scripts should be run during new application development, at the point when the developers have some knowledge of the volume of data being generated and need to estimate the size of the database and its BLOBs. The developers of more mature applications should run the sizing procedures to estimate disk space whenever new work objects or work types are introduced that will put a new strain on the database. Important factors in both cases include the size of the application's work objects and whether BLOB compression is enabled. Finally, the procedures can be run on a schedule, as part of regular database administration, to help the DBA estimate the growth of each table in the database.
Typically, database management systems store BLOB data in an area separate from the data in exposed columns. In addition, default caching and access methods for this binary data may differ from the ones used for regular data. Therefore, sizing and managing the BLOB data for space, performance, and scale requires careful consideration. Sizing information generated by the script helps a DBA decide on the appropriate placement and sizing of the BLOB data for each table in PRPC.
Sizing entails a three-part procedure:
1. Run the sizing script appropriate for the database type to collect storage allocation information on both the BLOB column and all exposed columns for each table in the database.
2. Use the script data as input to sizing calculations. These yield the sizing estimates for the tables.
3. Adjust database settings accordingly. Use the given performance guidelines to reorganize the storage of the database tables.
Compression, which can reduce BLOB size by a third or more, is enabled by a property setting in the PRPC pegarules.xml file (versions 4.x and earlier) or the prconfig.xml file (versions 5.x and later). Compression is turned on by default. Refer to PRKB-9850, How to Compress the BLOB Values in the PegaRULES Database, for further information.
Scripts for each supported database system are in the zip file Blob_Sizing.zip. Download this zip file from: http://pdn.pega.com/DevNet/PRPCv4/TechnologyPapers/TechPapers.asp
Text data, used by the application, is stored in relational, or exposed, columns in database tables. A developer might choose to add to the exposed data by adding text columns for data maintained in the BLOBs.
SQL Server
All versions Version 8.2 and later (Version 8.2 and Version 8.1 Fix Pack 7 are the same.) All versions earlier than 8.2 All versions
No Yes
Yes Yes
Script Output
For each table in the PegaRULES schema, the script output lists:
Name of Table
Row Count
Average Row Length: The average row size in bytes, including the exposed columns and the table's LOB locator value.
Average BLOB Length: The average size in bytes of the table's BLOB column.
Max BLOB Length: The size in bytes of the table's largest BLOB.
A PegaRULES table can have either zero or one BLOB column. A table that meets both of the following conditions does not have a BLOB column:
Average BLOB length and Max BLOB length both equal 0, -, or NULL.
Number of rows is greater than 0.
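The two conditions above can be applied mechanically to each row of script output. The following Python sketch is illustrative only: the function names are hypothetical, and treating an empty string like NULL, and a table with zero rows as undetermined, are assumptions of this sketch rather than statements from the sizing scripts.

```python
def is_empty(value):
    """Return True if a reported BLOB length is 0, '-', or NULL.

    Treating None and an empty string as NULL is an assumption.
    """
    if value is None:
        return True
    text = str(value).strip().upper()
    if text in {"", "-", "NULL"}:
        return True
    try:
        return float(text) == 0
    except ValueError:
        return False


def blob_column_status(num_rows, avg_blob_len, max_blob_len):
    """Classify one row of sizing-script output."""
    if is_empty(avg_blob_len) and is_empty(max_blob_len):
        # With rows present and no BLOB lengths, there is no BLOB column.
        # With zero rows, the output cannot distinguish the two cases.
        return "no BLOB column" if num_rows > 0 else "undetermined"
    return "has BLOB column"


print(blob_column_status(13, 1117, 1625))
print(blob_column_status(1, "-", "-"))
print(blob_column_status(0, None, None))
```

A table reported with zero rows is left as undetermined because an empty table shows no BLOB lengths whether or not it has a BLOB column.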
5.1 Using the Oracle Sizing Script
Download and open Blob_Sizing.zip. Extract oracle_sizing.sql to a local directory on your system and navigate to that directory. Log on to the Oracle database using SQL*Plus with DBA credentials. For example:
C:\>sqlplus sys/manager as sysdba
Run the script using the name of the PegaRULES schema as the argument.
sql>@oracle_sizing.sql PEGA
where PEGA is the schema name for the PegaRULES application. The schema name must be in uppercase letters.
6. Turn off spooling and exit.
sql>spool off;
sql>exit
The resulting file (pega_sizes.txt) contains a tabular report that shows row and column size statistics for each table in the PegaRULES schema.
SQL> @oracle_sizing.sql PEGA
Table Name           | NUM ROWS | AVG ROW SZ | MAX LOB LEN | AVG LOB LEN
PC_ASSIGN_WORKBASKET |        0 |          0 |             |
PC_ASSIGN_WORKLIST   |        1 |       2237 |        1867 |        1867
PC_DATA_WORKATTACH   |        0 |          0 |             |
PC_HISTORY_WORK      |       13 |       1598 |        1625 |        1117
5.2 Effects of Row Storage, Chunk Size, and BLOB Caching
Row Storage. Oracle uses a complex strategy for storing and accessing BLOB data. BLOBs may be stored either in the row (along with other data) or in a separate storage area. If the clause ENABLE STORAGE IN ROW is used during table creation, and the size of the BLOB is less than 3964 bytes, the BLOB is stored together with the other row data. When BLOBs are stored in the row, multiple BLOBs may reside in a single data block. ENABLE STORAGE IN ROW is the default setting for all tables. If the DISABLE STORAGE IN ROW clause is used during table creation, the BLOB is stored in a separate area regardless of its size.
Chunk Size. A chunk is one or more Oracle blocks, and the default chunk size is one block. When data is stored in an area outside of the row, the CHUNK SIZE parameter sets the number of bytes Oracle will read in and out of the BLOB at a time. Specify the chunk size for the LOB when creating the table that contains the LOB. Chunk size corresponds to the data size used in Oracle when accessing or modifying the LOB value. Set CHUNK SIZE to a multiple of the tablespace block size for that LOB column. For example, if the block size is 8192 bytes, using a chunk size of 16384 bytes will require two blocks for each row with BLOB data (16384 bytes / 8192 bytes). If the CHUNK SIZE value is less than the block size, or not a multiple of it, Oracle sets the CHUNK SIZE to the next closest multiple of the block size. In the above example, if the CHUNK SIZE is set to 1024, Oracle resets it to 8192 bytes.
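The row-storage and chunk-size rules described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not Oracle's implementation: the function names are hypothetical, and reading "next closest multiple" as rounding up to the next multiple of the block size is an assumption of this sketch.

```python
IN_ROW_LIMIT_BYTES = 3964  # in-row threshold quoted in the text above


def stored_in_row(blob_bytes, enable_storage_in_row=True):
    """Return True if the BLOB would be kept with the other row data."""
    return enable_storage_in_row and blob_bytes < IN_ROW_LIMIT_BYTES


def effective_chunk_size(requested_bytes, block_size_bytes):
    """Round a requested chunk size up to a whole number of blocks.

    Assumption: a value that is not a multiple of the block size is
    rounded up to the next multiple.
    """
    if requested_bytes <= 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks = -(-requested_bytes // block_size_bytes)  # ceiling division
    return blocks * block_size_bytes


print(stored_in_row(1867))                 # True
print(effective_chunk_size(1024, 8192))    # 8192
print(effective_chunk_size(16384, 8192))   # 16384
```

With an 8192-byte block, a requested chunk size of 1024 becomes 8192 bytes, which matches the example in the text.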
BLOB Caching. BLOBs that are stored in table rows use the same buffer caching rules as the rest of the row, because they reside in the same data block as the row. By default, BLOBs stored out of the row use direct path reads and writes to store and retrieve data without caching. To cache out-of-row BLOBs, use the CACHE parameter when defining the BLOB storage clause. This parameter causes reads and writes to BLOB data to use the buffer cache. Because CACHE is intended for frequently accessed data, set the CACHE parameter for BLOBs that are small enough to stay in memory without excessive thrashing. An alternative to the CACHE parameter is CACHE READS. This causes BLOB data to be brought into the buffer cache during read operations only. During write operations, the BLOB uses direct path writes and bypasses the buffer cache.
5.3 Performance and Sizing Guidelines for PegaRULES BLOBs in Oracle
1.
2.
3.
4.
5.
6.
7.
8.
5.4 Exposed Columns and Table Sizing in Oracle Databases
The tables that are most likely to grow over time are:
Depending on the flows and the processes developed, additional tables (such as pc_other) may grow as well; discuss this issue with your PRPC System Architect.
Use the data from pega_sizes.txt to calculate the size required for the exposed columns for each table.
1. Estimate the number of rows per DB block by dividing the block size by the average row size from pega_sizes.txt.
DB block size / Average row size = Rows per block
2. Calculate the number of blocks required for the table by dividing the number of rows by the number of rows per block.
Number of rows in table / Number of rows per block = Number of blocks needed per table
3. Calculate the space required for the exposed columns by multiplying the blocks needed by the block size.
Number of blocks needed per table * block size = Space required
For example, consider a database created with a block size of 8K, in which the table pc_work contains 2000 rows with an average row length of 1.5K (from pega_sizes.txt).
Block size / Ave row size = Rows per block
8K / 1.5K = 5
Rows in table / Rows per block = Blocks required
2000 / 5 = 400
Blocks required * Block size = Space required
400 * 8K = 3200K (3.2MB)
In this example, the pc_work table requires approximately 400 8K blocks for storage, or 3.2 MB of space.
Notes
Keep in mind that this figure is an estimate, because it does not account for the per-block header of approximately 80 to 150 bytes, nor for any percent free space value that might have been specified when the table was allocated. Both values must be subtracted from the DB block size to report a more accurate value. Discard any remainder portion of the rows per block value. It is necessary to round down this number, because the system cannot split a row across blocks (8 / 1.5 = 5.33, rounded down to 5).
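The three-step calculation and the adjustments mentioned in the notes can be combined in one small function. The Python sketch below is a hedged illustration: the function name and parameters are hypothetical, and the way the header and percent-free values are applied (header subtracted first, then the free-space percentage taken off the remainder) is an assumption of this sketch.

```python
def estimate_table_space(block_size, avg_row_size, num_rows,
                         header_bytes=0, pct_free=0):
    """Estimate space for a table's exposed columns.

    Returns (rows_per_block, blocks_needed, bytes_needed).
    header_bytes and pct_free are optional refinements from the notes.
    """
    if avg_row_size <= 0:
        raise ValueError("average row size must be positive")
    # Usable space per block after the header and reserved free space.
    usable = (block_size - header_bytes) * (100 - pct_free) // 100
    # Round down: a row cannot be split across blocks.
    rows_per_block = usable // avg_row_size
    if rows_per_block < 1:
        raise ValueError("average row does not fit in one block")
    # Round up: a partly filled block still occupies a whole block.
    blocks_needed = -(-num_rows // rows_per_block)
    return rows_per_block, blocks_needed, blocks_needed * block_size


# The pc_work example: 8K blocks, 1.5K average row, 2000 rows.
print(estimate_table_space(8192, 1536, 2000))
# Same table with a 100-byte header and 10 percent free space.
print(estimate_table_space(8192, 1536, 2000, header_bytes=100, pct_free=10))
```

With the defaults, the function reproduces the worked example (5 rows per block, 400 blocks, 3200K). Once a header and free space are accounted for, fewer rows fit in each block and the estimate grows.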
5.5 Performance and Sizing Guidelines for Oracle Tables
1.
For example, if a user has requested a large amount of data (256K) through a Windows system, the system can return this data in 64K increments. If the database block size is 8K, and db_file_multiblock_read_count is not set, then each read will return only 8K. It would require 32 reads to return all the data. However, if db_file_multiblock_read_count is set to 8, the system will return 8 blocks of 8K with one read. Thus only four reads are necessary, greatly increasing the efficiency of the database access.
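The read counts in this example follow from simple arithmetic, sketched below in Python. The function and parameter names are hypothetical, and modeling the 64K increment as a cap on the size of a single read is an assumption drawn from the example, not a description of Oracle internals.

```python
def reads_needed(total_bytes, block_size, multiblock_read_count=1,
                 os_max_io_bytes=64 * 1024):
    """Return the number of reads needed to fetch total_bytes.

    Each read returns multiblock_read_count blocks, capped at the
    largest transfer the operating system returns at once (assumed 64K).
    """
    bytes_per_read = min(block_size * multiblock_read_count, os_max_io_bytes)
    return -(-total_bytes // bytes_per_read)  # ceiling division


K = 1024
print(reads_needed(256 * K, 8 * K))                            # 32
print(reads_needed(256 * K, 8 * K, multiblock_read_count=8))   # 4
```

Under this model, raising the setting beyond 8 with an 8K block size would not reduce the read count further, because each read is already at the assumed 64K limit.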
6.1 Using the Sizing Scripts for DB2 Versions Earlier Than V8.2
Use the sizing script db2_pre8.2_sizing.db2 if you are running DB2 versions earlier than 8.2. The download location for the sizing script is: http://pdn.pega.com/DevNet/PRPCv4/TechnologyPapers/TechPapers.asp
1. Open Blob_Sizing.zip and extract the script db2_pre8.2_sizing.db2 to a local directory.
2. From DB2's Command Center Utility, run the RUNSTATS command for all tables in the PegaRULES schema.
3. Edit the db2_pre8.2_sizing.db2 script: Substitute your database name, logon user, and user password for the placeholders in the script's first line:
Connect to <DATABASE> user <USERNAME> using <PASSWORD>;
Enter an output directory location for the files exported by the script.
4. Run the script:
$>db2 -td@ -f db2_pre8.2_sizing.db2
The output file, avg_row_length.txt, contains a list of tables, along with the avg_col_size and number of pages used for each table.
5. This script creates another SQL script file (by default in C:\) called SIZESCRIPT.DB2. Open this script for editing. Substitute your database name, logon user, and user password for the placeholders in the script's first line:
Connect to <DATABASE> user <USERNAME> using <PASSWORD>;
Find all instances of the double-quote character (") and replace each with a blank character (space).
6. Run the updated script:
$>db2 -t -f SIZESCRIPT.DB2 -r lob_sizes.txt
The output file, lob_sizes.txt, will contain a list of table names in the PegaRULES schema, along with a count of the number of rows and the average and maximum size of the LOB field.
Database Connection Information
 Database server        = DB2/NT 8.1.0
 SQL authorization ID   = PRPCV51
 Local database alias   = PEGA
TABLENAME          NUM_ROWS    AVG_LOB_LEN MAX_LOB_LEN
------------------ ----------- ----------- -----------
PC_ASSIGN_WORKLIST           0
  1 record(s) selected.
TABLENAME        NUM_ROWS
---------------- -----------
PC_DATA_UNIQUEID           1
  1 record(s) selected.
...
6.2 Using the Sizing Scripts for DB2 Versions 8.2 and Later
The later DB2 versions use the SQL scripts db2_sizing_proc.sql and db2_sizing_run.sql to determine the size of the LOB columns. The download location for the sizing script is: http://pdn.pega.com/DevNet/PRPCv4/TechnologyPapers/TechPapers.asp
Note
You must have a compiler installed on your DB2 server in order to install and run stored procedures.
1. Open Blob_Sizing.zip and extract the db2_sizing_proc and db2_sizing_run scripts to a local directory.
2. From DB2's Command Center Utility, run the RUNSTATS command for all tables in the PegaRULES schema.
3. Substitute your database name, logon user, and user password for the placeholders in the script's first line:
Connect to <DATABASE> user <USERNAME> using <PASSWORD>;
4. Run the db2_sizing_proc.db2 script to set up the stored procedures required to return the results.
$>db2 -td@ -f db2_sizing_proc.db2
5.
This file lists the average row length, number of rows, and average LOB size for tables in the PegaRULES schema. Run this script on a regular schedule to determine the growth rate of the PegaRULES database.
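When the script is run on a schedule, the row counts from two runs are enough for a first growth estimate. The Python sketch below assumes linear growth between snapshots, which is a simplification. The function names and the sample figures are made up for illustration and do not come from any sizing script output.

```python
from datetime import date


def rows_per_day(first_snapshot, second_snapshot):
    """Average daily row growth between two (date, row_count) snapshots."""
    (first_date, first_rows), (second_date, second_rows) = (
        first_snapshot, second_snapshot)
    days = (second_date - first_date).days
    if days <= 0:
        raise ValueError("snapshots must be at least one day apart, in order")
    return (second_rows - first_rows) / days


def projected_rows(latest_rows, daily_growth, days_ahead):
    """Project a row count forward, assuming growth stays linear."""
    return latest_rows + daily_growth * days_ahead


# Hypothetical snapshots of one table taken 30 days apart.
growth = rows_per_day((date(2009, 1, 1), 1000), (date(2009, 1, 31), 1600))
print(growth)
print(projected_rows(1600, growth, 90))
```

The projected row count can then be fed into the table sizing calculation to estimate future disk requirements.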
6.3 BLOB Storage in DB2 UDB
In DB2, LOB columns are stored in a different physical format, separate from other data in a row. LOB data is stored in 64 MB areas broken up into segments starting at 1024 bytes and doubling in size with each successive segment (that is, the first segment is 1024 bytes, the next 2048, the next 4096 bytes, etc., up to 64MB). You can store BLOB columns and indexes in separate tablespaces from the tabular rows using the LONG IN clause of the CREATE TABLE statement. Note that the tablespace specified for the table with a BLOB must be a DMS tablespace, and the tablespace specified for the BLOB column must be a LARGE DMS tablespace.
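The doubling segment sizes described above can be enumerated with a short loop. This Python sketch only illustrates the size progression stated in the text. The function names are hypothetical, and the idea that a LOB occupies the smallest segment large enough to hold it is an assumption made for illustration, not a documented DB2 allocation rule.

```python
FIRST_SEGMENT_BYTES = 1024                 # smallest segment, per the text
LARGEST_SEGMENT_BYTES = 64 * 1024 * 1024   # 64 MB upper bound, per the text


def lob_segment_sizes():
    """List segment sizes, doubling from 1024 bytes up to 64 MB."""
    sizes = []
    size = FIRST_SEGMENT_BYTES
    while size <= LARGEST_SEGMENT_BYTES:
        sizes.append(size)
        size *= 2
    return sizes


def smallest_segment_for(lob_bytes):
    """Return the smallest listed segment that can hold lob_bytes (assumed rule)."""
    for size in lob_segment_sizes():
        if size >= lob_bytes:
            return size
    raise ValueError("LOB is larger than the largest segment")


print(len(lob_segment_sizes()))      # 17
print(smallest_segment_for(1867))    # 2048
```

Under that assumption, a 1,867-byte LOB would occupy a 2,048-byte segment, which shows how segment rounding can add to the space a LOB consumes.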
Note
The separation of BLOB columns can only be done as part of the table creation, and not with the ALTER TABLE statement.
LOB fields are never placed in the buffer cache and can only take advantage of O/S level file system caching, if available.
6.4 Performance and Sizing Guidelines for PegaRULES BLOBs in DB2
1.
2.
3. 4. 5.
BLOB compression can greatly improve utilization of disk storage. Refer to PRKB-9850, How to Compress the BLOB Values in the PegaRULES Database, for further information.
6.5 Exposed Column and Table Sizing in DB2 Databases
The tables that are most likely to grow over time in PRPC are:
Depending on the flows and the processes developed, additional tables (such as pc_other) may grow as well; discuss this issue with your PRPC System Architect. Use the data from the sizing scripts (refer to Section 6.1 or Section 6.2) to calculate the size required for the exposed columns for each table.
1. Estimate the number of rows per DB page by dividing the page size by the average row size reported by the sizing script.
DB page size / Average row size = Rows per page
2. Calculate the number of pages required for the table by dividing the number of rows by the number of rows per page.
Number of rows in table / Rows per page = Number of pages needed for table
3. Calculate the space required for the exposed columns by multiplying the pages needed by the page size.
Number of pages needed for table * page size = Space required
For example, consider a database created with a page size of 4K, in which the table pc_work contains 1,500 rows with an average row length of 1.2K (from the sizing script output).
Page size / Ave row size = Rows per page
4K / 1.2K = 3
Rows in table / Rows per page = Pages required
1500 / 3 = 500
Pages required * Page size = Space required
500 * 4K = 2000K (2.0MB)
Notes
Keep in mind that this figure is an estimate, because it does not account for the page header of 68 bytes, nor for any percent free space value that might have been specified when the table was allocated. Both values must be subtracted from the page size to attain a more accurate value. Discard any remainder portion of the rows per page value. It is necessary to round down this number, because the system cannot split a row across pages (4 / 1.2 = 3.33, rounded down to 3).
6.6 Performance and Sizing Guidelines for DB2 Tables
1.
7.1 Using the Sizing Script with SQL Server 2000
1. Download Blob_Sizing.zip and extract the sql_server_sizing.sql script to a local directory on your system.
2. Open MS Query Analyzer and log on to the PegaRULES database.
3. Select File > Open. Navigate to the location where you saved sql_server_sizing.sql, and open the file. The file opens in the Editor pane.
4. Search for the statement containing TABLE_CATALOG:
select distinct TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_CATALOG = 'DB_NAME' and table_name <> 'dtproperties' and table_name not like 'sy%'
Change the TABLE_CATALOG value to the name of your PegaRULES database. Save the script.
5. Press F5 to execute the script.
6. Open a new Query Analyzer window and run:
exec dbo.listTableRowCounts
The results display in the Results pane and can be exported, or you can copy and paste the information into another application.
7.2 Using the Sizing Script with SQL Server 2005/2008
1. Download Blob_Sizing.zip and extract the sql_server_sizing.sql script to a local directory on your system.
2. Open SQL Server Management Studio and connect to your server.
3. In the Object Explorer, navigate to the PegaRULES database.
4. Select File > Open > File... Navigate to the location where you saved sql_server_sizing.sql, and open the file. The file opens in the Editor pane.
5. Search for the statement containing DB_NAME():
SET @SQL = 'DBCC UPDATEUSAGE (' + DB_NAME() + ')'
6. Change the TABLE_CATALOG value to the name of your PegaRULES database. Save the script.
7. Click Execute. Ignore the message reporting an error caused by dropping an unknown procedure.
8. Click the New Query button. Enter and execute the stored procedure:
exec dbo.listTableRowCounts;
GO
9. Save the results to a file with Query > Results To > Results to File, or copy and paste the information into another application.
7.3 BLOB Storage in SQL Server
Note
SQL Server 2000 stores BLOB data using the IMAGE data type, which is stored as a collection of 8K pages that may not be located together on a disk. In SQL Server 2000, BLOB data pages are logically organized in a B-tree structure to facilitate accessing sub-structures within the BLOB. The text_in_row parameter of sp_tableoption controls the behavior of image storage in a BLOB:
sp_tableoption [ @TableNamePattern = ] <tablename> , [ @OptionName = ] 'text_in_row' , [ @OptionValue = ] 'ON'
If this parameter is set to ON, SQL Server stores image data in the row up to 256 bytes (by default). You can also enter a value between 24 and 7,000 to store image data in the row up to that size. If text_in_row is ON and the space needed for the BLOB is larger than the space available in the row, or larger than the amount specified in the text_in_row option, then the database inserts a 72-byte root structure in the row, as well as pointers to the pages that contain the BLOB data.
If text_in_row is set to OFF, then the image data is stored in separate pages as a logical B-tree. SQL Server allows more than one BLOB per image data page. If there is less than 32K of data, then the database stores a 16-byte pointer with the row data that points to a root structure stored on an image page. If there is less than 64 bytes of data for the image, all the data is stored with the root structure on an image page. If the BLOB size is greater than 32K, the database builds intermediate node structures between the data blocks and the root node. These intermediate node structures are stored in separate pages, which are not shared with image data pages.
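The storage rules above amount to a small decision table, sketched here in Python. This is a reading aid under stated assumptions, not SQL Server's actual logic: the function name and the returned labels are hypothetical, free space in the row is ignored, and a BLOB of exactly 32K is treated like a larger one because the text does not specify that boundary.

```python
def image_storage(blob_bytes, text_in_row=False, in_row_limit=256):
    """Describe where image data of a given size would be stored."""
    if text_in_row:
        if not 24 <= in_row_limit <= 7000:
            raise ValueError("in-row limit must be between 24 and 7000 bytes")
        if blob_bytes <= in_row_limit:
            # Assumes the row has enough free space for the data.
            return "data in row"
        return "72-byte root structure in row"
    # text_in_row OFF: the row holds a 16-byte pointer to an image page.
    if blob_bytes < 64:
        return "pointer in row, data with root structure on image page"
    if blob_bytes < 32 * 1024:
        return "pointer in row, root structure on image page"
    return "pointer in row, root structure plus intermediate nodes"


print(image_storage(100, text_in_row=True))
print(image_storage(1000, text_in_row=True))
print(image_storage(100000))
```

Comparing a table's average and maximum BLOB lengths against these thresholds gives a rough idea of which storage path most of its rows take.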
BLOB columns may be stored in a separate file group from relational data if the TEXTIMAGE_ON clause is specified during table creation.
7.4 Performance and Sizing Guidelines for PegaRULES BLOBs in SQL Server
1.
BLOB compression can greatly improve utilization of disk storage. Refer to PRKB-9850, How to Compress the BLOB Values in the PegaRULES Database, for further information.
7.5 Exposed Column and Table Sizing in SQL Server Databases
The tables that are most likely to grow over time in a PRPC application are:
Depending on the flows and the processes developed, additional tables (such as pc_other) may grow as well; discuss this issue with your PRPC System Architect. Use the data obtained from running sql_server_sizing.sql to calculate the size required for the exposed columns.
1. Estimate the number of rows per DB page by dividing the page size by the average row size reported by the sizing script.
DB page size / Average row size = Rows per page
2. Calculate the number of pages required for the table by dividing the number of rows by the number of rows per page.
Number of rows in table / Rows per page = Number of pages needed for table
3. Calculate the space required for the exposed columns by multiplying pages needed by page size.
Number of pages needed for table * page size = Space required
For example, consider a database with a fixed page size of 8K (8060 usable bytes), in which the table pc_work contains 1500 rows with an average row length of 1.2K (from the sizing script output).
Page size / Ave row size = Rows per page
8K / 1.2K = 6
Rows in table / Rows per page = Pages required
1500 / 6 = 250
Pages required * Page size = Space required
250 * 8K = 2000K (2.0MB)
Notes
This figure is an estimate, because it does not account for the page header and footer of 132 bytes, nor for any percent free space value that might have been specified when the table was allocated. Both values must be subtracted from the page size to attain a more accurate value. Discard any remainder portion of the rows per page value. It is necessary to round down this number, because the system cannot split a row across pages (8 / 1.2 = 6.66, rounded down to 6).