
How to configure indexer partitions for FAST ESP

Summary

This article applies to any capacity-planning exercise that uses FAST ESP. It describes what to consider when you size data_index partitions or set triggers for their use. It also suggests what to do when you receive any of the following messages in the product:

- Memory allocation failed for query (ERROR 1013)
- Partition full
- Ran out of memory. Indexing failed.

Items to have available:

- %FASTSEARCH%\etc\config_data\RTSearch\webcluster\rtsearchrc.xml from the ESP admin node
- The amount of free space on the drive that hosts the data_index directory on ESP indexer nodes
- The total amount of RAM on the servers that host ESP indexer nodes
- A calculator

Data to be collected or calculated:

- Average bytes for each document's attributes that are loaded into memory
- Maximum number of document attributes that can be loaded into memory
- Maximum number of documents that can be indexed on disk
- Average size in kilobytes (KB) of the documents that are indexed on disk

Any recommendations for ESP sizing that you determine by using this article are approximations and are specific to an index profile and document corpus. Any adjustment to these recommendations requires new sizing calculations. Relevant sizing factors that are not covered in this article include CPU and I/O load, which vary from system to system. Benchmark any change that you make to the partition configuration to make sure that, for example, query latency does not suffer because one of these other resource constraints is reached.

How to size for memory usage

ESP processes are 32-bit. Therefore, each process is limited in how much memory it can address: 2 gigabytes (GB) on 32-bit systems and 4 GB on 64-bit systems. Fsearch is bound by whichever of these limits applies, which restricts the number of large partitions that it can load.

To determine the maximum number of large partitions, subtract 4 GB from the total system RAM, and then divide the remaining memory by 2 (or by 4 if you are using a 64-bit system). This number, rounded down, is the maximum recommended number of large partitions. For an 8 GB 32-bit server, the limit is therefore two large index partitions. Partitions 0 and 1 are generally kept empty through triggering to maintain low-latency indexing of new documents. Therefore, the total number of large and small partitions that the server can accommodate is four.

The number of documents per large partition is limited by the bytes per document. You can obtain the average by using the rc command:

    rc -f float -r indexer-0-0 | findstr bytes

Values are displayed for each indexing column in the installation. Use the highest value. This memory usage is determined by the quantity and content of fields that have navigators or fullsort enabled. Changing these fields or their content can reduce the overall partition memory usage and the number of bytes per document.
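The partition-count arithmetic in this section can be sketched in a few lines of Python. This is only an illustration of the rule of thumb stated above (subtract 4 GB, divide by the per-process limit, round down). The function names are hypothetical and are not part of FAST ESP.

```python
def max_large_partitions(total_ram_gb: int, is_64bit_os: bool) -> int:
    """Recommended maximum number of large index partitions (rule of thumb)."""
    reserved_gb = 4                              # subtracted from total RAM, per the article
    per_partition_gb = 4 if is_64bit_os else 2   # addressable memory of a 32-bit process
    usable_gb = max(total_ram_gb - reserved_gb, 0)
    return usable_gb // per_partition_gb         # integer division rounds down


def total_partitions(total_ram_gb: int, is_64bit_os: bool) -> int:
    """Large partitions plus the two small partitions (0 and 1)."""
    return max_large_partitions(total_ram_gb, is_64bit_os) + 2


if __name__ == "__main__":
    # The 8 GB, 32-bit server used as the example in the text.
    print(max_large_partitions(8, False))
    print(total_partitions(8, False))
```

For the 8 GB 32-bit server, the sketch gives two large partitions and four partitions in total, which matches the figures in the text.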
The maximum memory usage for all documents in a partition is set in rtsearchrc.xml through the docsDistributionMaxMB parameter. This parameter has a default value of 1,024 MB. However, partitions are marked as "full" at 90 percent of this value, and full partitions cannot be indexed further. Planning for 85 percent of the docsDistributionMaxMB value therefore gives a safe 870 MB (912,261,120 bytes) of usage. For example, if the average number of bytes per document is 384, the default settings allow for approximately 2.375 million documents per partition within this planned limit.

The docsDistributionMaxMB parameter can be adjusted based on the platform (32-bit or 64-bit) and on the complexity of queries. (Queries use the memory space that remains after the document attributes are loaded.) At least 512 MB should be left for queries and for query caching, although some scenarios require more. The default leaves 1,024 MB for queries and for query caching.

How to size for disk usage

The maximum number of documents in a column may be limited by the available hard disk space. To assess this value, follow these steps:

1. Run indexerinfo -a activeindexset. Each column and row has output similar to the following:

    Column 0, row 0: active index set is 0_439,1_3,2_3

2. In Windows Explorer, check the size on disk of the largest partition (index_2_3), which is located in %FASTSEARCH%\data\data_index\. To do this, right-click the directory, and then click Properties.

3. Determine the number of documents in the partition. To do this, use an rc command that is directed at that column and row:

    rc -f int -r indexer-0-0 | findstr "documents,"

4. Determine the number of kilobytes per document. This is equal to the partition size in kilobytes divided by the number of documents.

5. Add the size of the whole current data_index on that column to the free space on the drive. Divide this value by the number of kilobytes per document, and then by 2.5 (the overhead that is needed for the re-indexing of partitions). You should also leave some free space as overhead.
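Steps 4 and 5 above can be sketched as follows. This is a minimal illustration, not a FAST ESP tool: the function names are hypothetical, and the sizes are assumed to have been read manually from Windows Explorer and from the rc output. The sample figures are the ones from the article's worked example.

```python
KB_PER_GB = 1024 * 1024


def kb_per_document(partition_size_kb: float, document_count: int) -> float:
    """Step 4: average on-disk size of one document, in KB."""
    return partition_size_kb / document_count


def max_documents_on_disk(data_index_kb: float, free_space_kb: float,
                          kb_per_doc: float, reindex_factor: float = 2.5) -> int:
    """Step 5: documents that fit, allowing 2.5x overhead for re-indexing."""
    usable_kb = data_index_kb + free_space_kb
    return int(usable_kb / kb_per_doc / reindex_factor)


if __name__ == "__main__":
    # 200 GB partition holding 3 million documents (about 70 KB each).
    per_doc = round(kb_per_document(200 * KB_PER_GB, 3_000_000))
    # 270 GB of existing data_index plus 480 GB of free space.
    print(per_doc, max_documents_on_disk(270 * KB_PER_GB, 480 * KB_PER_GB, per_doc))
```

The result is a ceiling, not a target. The text also recommends leaving additional free space as overhead, which this sketch does not subtract.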
For example, if the index set that we are considering here contained 1 million documents in index_1_3, and if index_2_3 was marked "full" at 3 million documents because of memory limits, you would focus just on index_2_3. If the size of index_2_3 on disk was 200 GB (209,715,200 KB), you would divide that size by the 3 million documents to reach a figure of roughly 70 KB per document. (If there are multiple columns, use the highest value.) If the total free space was 480 GB, and the total data_index size was 270 GB, the maximum usable space would be 750 GB (786,432,000 KB). Dividing this figure by 70 and then by 2.5 yields a maximum of approximately 4.5 million documents. The existing 4 million documents could be expected to use approximately 670 GB (4,000,000 * 70 KB * 2.5 = 700,000,000 KB) during a resetindex, leaving 80 GB for overhead. Therefore, you would conclude that the indexing column had sufficient space for its current content.

Combining memory and disk constraints

If your system can handle an additional partition based on memory considerations, and if your system is not approaching the document limit based on disk considerations, you can add a fourth partition to resolve any of the three errors that are mentioned in the "Summary" section. To do this, divide the documents across more partitions by using the following rtsearchrc.xml settings:

    docsDistributionPst = "100,100,100,50"
    numberPartitions = "4"

In the earlier sections on disk and memory, partition 2 had reached a memory limit, but the server had sufficient memory (8 GB) to handle additional large partitions. You also verified that there was sufficient space to fully reindex the current document load. The rtsearchrc.xml settings would then create partition 3. Partition 3 would hold 50 percent of the content, or 2 million documents. It would be expected to have approximately 730 MB (2,000,000 * 384 bytes) of attribute memory usage and would therefore not be "full."
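The memory check used in this example can be sketched as below. It assumes the 85 percent planning margin and the 384 bytes per document that the article uses. The function names are hypothetical, and the planned size is truncated to whole megabytes so that it reproduces the article's 870 MB figure.

```python
BYTES_PER_MB = 1024 * 1024


def planned_budget_bytes(docs_distribution_max_mb: int = 1024,
                         planning_pct: int = 85) -> int:
    """Planned attribute memory per partition, truncated to whole MB."""
    planned_mb = docs_distribution_max_mb * planning_pct // 100
    return planned_mb * BYTES_PER_MB


def max_documents_per_partition(bytes_per_document: int, budget_bytes: int) -> int:
    """Documents that fit within the planned memory budget."""
    return budget_bytes // bytes_per_document


def fits_in_partition(document_count: int, bytes_per_document: int,
                      budget_bytes: int) -> bool:
    """True if the partition's attribute memory stays within the budget."""
    return document_count * bytes_per_document <= budget_bytes


if __name__ == "__main__":
    budget = planned_budget_bytes()                  # default docsDistributionMaxMB
    print(max_documents_per_partition(384, budget))  # roughly 2.375 million
    print(fits_in_partition(2_000_000, 384, budget))
```

With the default setting, 2 million documents at 384 bytes each stay inside the planned budget, which is why the new partition 3 is not expected to be marked "full."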
The docsDistributionPst setting is read right-to-left to allocate documents from the largest partition to the smallest. Generally, partition 2 and any following partitions are used to split documents evenly. For example, the following setting allocates one-third of the content to each of the three large partitions:

    100,100,100,50,33

Any change in the number and sizes of partitions also requires trigger changes. (Trigger changes are described in the "Partition triggering" section.) Any change in the number of partitions also requires a restart of the "clarity" monitoring service so that it detects the new partition total.

You can also follow the steps in the earlier sections to estimate the capacity of the indexing column, based on the maximums that you determined in those sections. Use the lower of the two maximums unless you plan to add more storage or memory. If you expect to have more documents than the number of columns multiplied by this capacity, you may need more servers to host additional indexing columns.
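The right-to-left reading of docsDistributionPst can be illustrated with a short sketch. It rests on an assumption that is consistent with both examples in the article but is not spelled out there: each percentage is applied to the documents that remain after the partitions to its right have taken their share. The function name is hypothetical.

```python
from fractions import Fraction


def partition_shares(docs_distribution_pst: str) -> list:
    """Share of all documents per partition, index 0 first.

    Assumption: percentages are consumed right-to-left, each one applied
    to the documents left over by the partitions to its right.
    """
    percentages = [int(p) for p in docs_distribution_pst.split(",")]
    shares = [Fraction(0)] * len(percentages)
    remaining = Fraction(1)
    for index in range(len(percentages) - 1, -1, -1):
        shares[index] = remaining * percentages[index] / 100
        remaining -= shares[index]
    return shares


if __name__ == "__main__":
    for setting in ("100,100,100,50", "100,100,100,50,33"):
        print(setting, [float(s) for s in partition_shares(setting)])
```

Under this assumption, "100,100,100,50" gives half of the documents to each of partitions 2 and 3, and "100,100,100,50,33" gives roughly one-third to each of partitions 2, 3, and 4, with nothing left for partitions 0 and 1 after a resetindex.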

If the system is not yet close to capacity, you may want to consider using fewer partitions. This has the advantage of reduced I/O and CPU load. Setting the number of partitions to the maximum requires trigger changes as the document load increases. Setting partition sizes close to the maximum requires adding more partitions as the document load increases.

Partition triggering

Partitions are allocated documents from data_fixml, oldest to newest. The largest partition (ID 2 in a default installation) contains all the oldest content. Triggers keep documents moving to larger partitions so that partitions 0 and 1 can index small subsets quickly. A resetindex ignores triggers completely and allocates documents based only on docsDistributionPst.

Document count triggers should generally be 10 percent larger than the number of documents that is expected in a partition after a resetindex. In the example data that we considered earlier, four partitions would be used to index 4 million documents, with 2 million documents each in partitions 2 and 3. The trigger for partition 3 should then be set to 2.2 million:

    <index-scheduling type="docCount" triggers="10000,100000,2200000" generationTriggers="2:6"/>

These trigger settings have the following meaning:

- Partition 0 is triggered when there is new content.
- Partition 1 is triggered when partition 0 reaches 10,000 documents.
- Partition 2 is triggered when partition 1 reaches 100,000 documents or indexes 6 times.
- Partition 3 is triggered when partition 2 reaches 2,200,000 documents.

The generationTriggers setting is intended to keep documents flowing to partition 2 even when the overall total in partition 1 has not reached 100,000. This helps maintain low indexing latency and makes sure that partition 2 uses resources efficiently.

Alternatively, if the final document volume is known but the current counts continue to change, you can use a single setting that adjusts triggers regardless of the absolute count per partition:

    <index-scheduling type="smart" triggerPercentage="5"/>

This setting triggers based on the percentage difference from docsDistributionPst. However, it results in more disk thrashing if the full document set is updated repeatedly instead of growing organically.
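The 10 percent rule for document count triggers can be sketched as follows. The helper names are hypothetical. The triggers for the small partitions (10,000 and 100,000) are taken from the article's example and passed through unchanged, and only the large-partition trigger is derived from the expected post-resetindex count.

```python
def doc_count_trigger(expected_documents: int, headroom_pct: int = 10) -> int:
    """Trigger set 10 percent above the expected post-resetindex count."""
    return expected_documents * (100 + headroom_pct) // 100


def triggers_value(small_partition_triggers, expected_large_counts) -> str:
    """Build the comma-separated value for the triggers attribute."""
    values = list(small_partition_triggers)
    values += [doc_count_trigger(count) for count in expected_large_counts]
    return ",".join(str(value) for value in values)


if __name__ == "__main__":
    # Partition 2 is expected to hold 2 million documents after a resetindex.
    print(triggers_value([10_000, 100_000], [2_000_000]))
```

For the four-partition example, the sketch yields 10000,100000,2200000, the trigger values used in the docCount example. Recalculate whenever docsDistributionPst or the expected document total changes.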
