Anda di halaman 1dari 4

Indexes, Too Much of a Good Thing?

Some of you may be familiar with relational databases other than Teradata and how those other RDBMs utilize indexes. In Teradata, an Index is a physical mechanism that is used to distribute, store, and access data rows. Indexes provide a physical access path to the data and their use can avoid unnecessary fulltable scans to locate rows. To level set, lets first consider the difference between a Key and an Index. A Key is a relational term. The Primary Key of a table is a column set that uniquely identifies a row in a logical table. A key is an identifier and not a physical mechanism. In Teradata, there are four main types of indexes: Primary Indexes, Secondary Indexes, Join Indexes, and Hash Indexes. Each of these types of indexes have their own distinctive flavors and uses. All Teradata Database tables require a Primary Index because the system distributes table rows to the AMPs based on primary index values. (* Teradata 13.0 has a NoPI feature for loading staging tables.) To accommodate Teradatas massively parallel architecture, indexes use a hashing algorithm based on data row hash values as the most efficient means of distributing and retrieving data. Primary indexes can be either unique or non-unique and partitioned or non-partitioned:
y y y y

Unique primary index (UPI) Non-unique primary index (NUPI) Non-partitioned primary index (NPPI) Partitioned primary index (PPI)

Partitioned primary indexes may have a single partitioning expression or multiple partitioning expressions:
y y

Single-level PPI (SLPPI) Multilevel PPI (MLPPI)

The main purposes for Teradatas Primary Indexes:


y

Provide access to data rows, obviating the need to do full-table scans. By providing the values for all the primary index columns in your SQL WHERE clause, direct access to the AMP with the row(s) can be made using that primary index value. (Note: Teradata

does not do partial indexing, you must provide values for all the columns defined in an index.) Determine which AMP a row will distribute to. Rows are distributed across the AMPs based on the hash of their Primary Index. The uniqueness of the data values in the primary index will affect how evenly the data distribution is across the AMPs. Efficiency of join processing. Depending upon the choice of index, rows of the table may or may not have to be redistributed, spooled, and sorted prior to the join.

A trade-off that must often be considered is between the needs of full table scan queries and queries that can be range constrained in some way. For example, between strategic queries that need to scan a multi-year history of sales, and tactical queries that only need transactions from, say, the last 60 days. The compromise thats often made is to break the large table into a number of range partitioned tables which work better for tactical queries than scanning one big table, and to UNION the partitioned tables together for strategic queries that need all the data. But breaking up what should be one table into multiple tables makes them harder to manage and maintain, and although it improves performance for tactical queries, it is not the optimum solution. Up to 64 columns can be specified for a primary index definition.

Partitioned Primary Index (PPI) is a table organization scheme that very elegantly enhances the existing Teradata structures and helps remove this trade-off dilemma. Its an optional extension to Teradatas hashed primary indexing that adapts it to more efficiently handle range queries. What if, instead of having to fully scan a large table to satisfy a query we only had to scan 50% of it? What if we only had to scan 10%.or perhaps even less? This would have a huge, very positive impact on performance, and this is what PPI has been designed to address, to push the envelope for Teradata users who want to see better and better query performance whether its for tactical or strategic requests. PPI has also been designed to be very easily set up and managed and to put a minimum additional burden on the DBA in keeping with our philosophy of low cost database management.

Secondary Indexes provide alternate access paths to the data and may be Hash-ordered or Value-ordered. Secondary indexes may be added or dropped as needed with the caveat that building them requires some amount of system resources and youll want to check with your DBA appropriate.
y y

Unique secondary index (USI) Nonunique secondary index (NUSI)

Join Indexes offer a variety of six subtypes that include:


y y

Single-table join index (STJI) Single-table aggregate join index (STAJI)

y y y y

Single-table sparse join index Multitable simple join index Multitable aggregate join index Multitable sparse join index

Join Indexes are highly recommended for decision support applications because they often provide superior performance to large table joins and aggregate computations. Any Join Index, whether simple or aggregate, multi-table or single-table, can be sparse. The create statement of the index uses a constant expression in the WHERE clause of its definition to narrowly filter its row population. A caution when choosing Join Indexes is that neither MultiLoad nor FastLoad utilities support tables with join indexes. If Join Indexes are needed, a possible workaround could be to FastLoad data into an empty table and then either MERGE or do a INSERT/SELECT into the table with the Join Index. Hash Indexes are a similar to Join Indexes and have a narrower usage. Hash Indexes are limited to one table and the tables Primary Index cannot be partitioned. Hash indexes are useful for queries where the index contains the columns referenced by a query, which will allow the Optimizer to use the index to satisfy the query rather than the underlying base table. Hash indexes can also be defined on a table in place of traditional secondary indexes. Like the Join Index, FastLoad and MulitLoad do not support Hash Indexes. Criteria for choosing Join Indexes vs NUSIs The similarities between STJI and NUSI are that STJIs can be defined with the same columns as NUSI; the concept of index covering (the query can be satisfied by columns in the index without accessing the base table) applies to both STJI and NUSI; value ordering is available on both STJI and NUSI; and joins to either STJIs or NUSIs are possible. Teradata does not do partial index retrievals, however, it can use either STJIs or NUSIs to do Full Table Scans of their subtables instead of a FTS of the base table. The basic differences between STJI and NUSI are that a STJI is similar to a table with a primary index with additional columns defined; a STJI row can be stored on the same AMP or a different AMP as table data row whereas NUSIs are stored on same AMP as table data row; and NUSIs are supported by MultiLoad, but not STJIs.

Advantages of Indexes Indexes are a retrieval mechanism and the intent of indexes is to lessen the time it takes to retrieve rows from a database and eliminate full table scans. Disadvantages of Indexes This is where too much of a good thing can be a disadvantage. Whenever a base table row is updated, deleted, or inserted, Index subtables must be also be updated. The more secondary, join or hash indexes you have, the more maintenance that will have to be performed. All Teradata secondary, hash, and join indexes are stored in subtables and will require extra storage space. Some of the indexes are allergic to some Load Utilities so youll want to check on what may

be impacted as a result of index choices.

A good practice is to use the EXPLAIN before you execute a query will help you determine which indexes are being used for your query. One of the most important tasks of a DBA is to choose Indexes, CHOOSE WISELY!

http://developer.teradata.com/database/articles/indexes-too-much-of-a-good-thing

Anda mungkin juga menyukai