Contents

1. Scenario
2. Why partitioning?
3. History
4. Steps to create partitioned table
5. Kinds of query well-supported by partitioning
6. Manage partitions
7. Create index on partitioned table
8. Convert non-partitioned table to partitioned table
9. Solution to the scenario
1. Scenario:
Imagine we have a table named Orders that stores order data from customers for the current year and many previous years. We have to perform the following operations on this table:
- Analysis only on data from previous years; in particular, each analysis covers 5 consecutive years.
- Data modification only on data from the current year.
- The table stores only data from a period of 11 years backward from the current year, so data older than this period must be moved to another table (or deleted) in some way.
2. Why partitioning?
We have a very large table and query it frequently, but each query needs only some specific chunks of the data. If we could partition this table (break it into many pieces stored separately) according to our needs, and query only the pieces we actually need, our queries would run more efficiently. For instance, in the scenario above, we frequently analyze 5 consecutive years.
We also run various read and write operations on the table concurrently, but the reads touch only some specific chunks of data and the writes touch others. If these chunks are mixed together in one table, contention increases. If they are stored separately, according to the needs of our read and write operations, contention decreases. In the scenario above, we analyze only the previous years and modify only the current year's data.
3. History:
Releases before 7.0: We had to design many smaller sub-tables instead of one large table, and our stored procedures or data-access code had to address exactly the right sub-tables. This increased the complexity of both design and programming. To cope with that complexity, we would usually create a view that unions the sub-tables, and our access code would use this view instead of the sub-tables. However, this approach eventually leads back to the very inefficiency problem we started with.
Version 7.0: This version introduced a new concept called the partitioned view. We still had to design the sub-tables explicitly and create a partitioned view to union them, in place of the ordinary view used in earlier releases. Selecting through the partitioned view could be efficient, but the concept supported only the SELECT statement, not the other DML statements.
Version 2000: The partitioned view was extended to support all DML statements.
Version 2005: This version introduced a genuinely new concept: the partitioned table. Armed with it, we no longer need to design many sub-tables and union them with a view. Instead, we just create a partitioned table, and our access code can work with this table directly and still achieve real efficiency. DML statements are also supported and are directed to the right pieces of the table.
4. Steps to create partitioned table:

First, create a database with several file groups, one to hold each intended partition:
CREATE DATABASE [TestPartitioning] ON PRIMARY
(
    NAME = 'Data Partition DB Primary FG',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\Primary\Data Partition DB Primary FG.mdf',
    SIZE = 5,
    MAXSIZE = 500000,
    FILEGROWTH = 1
),
FILEGROUP [Data Partition DB FG1]
(
    NAME = 'Data Partition DB FG1',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\FG1\Data Partition DB FG1.ndf',
    SIZE = 5MB,
    MAXSIZE = 500000,
    FILEGROWTH = 1
),
FILEGROUP [Data Partition DB FG2]
(
    NAME = 'Data Partition DB FG2',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\FG2\Data Partition DB FG2.ndf',
    SIZE = 5MB,
    MAXSIZE = 500000,
    FILEGROWTH = 1
),
FILEGROUP [Data Partition DB FG3]
(
    NAME = 'Data Partition DB FG3',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\FG3\Data Partition DB FG3.ndf',
    SIZE = 5MB,
    MAXSIZE = 500000,
    FILEGROWTH = 1
),
FILEGROUP [Data Partition DB FG4]
(
    NAME = 'Data Partition DB FG4',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\FG4\Data Partition DB FG4.ndf',
    SIZE = 5MB,
    MAXSIZE = 500000,
    FILEGROWTH = 1
),
FILEGROUP [Data Partition DB FG5]
(
    NAME = 'Data Partition DB FG5',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\FG5\Data Partition DB FG5.ndf',
    SIZE = 5MB,
    MAXSIZE = 500000,
    FILEGROWTH = 1
)
create partition function DateTimePartitionFunction (datetime)
as range left for values ('2006/12/31', '2007/12/31', '2008/12/31', '2009/12/31')
With RANGE LEFT, each boundary value belongs to the partition on its left, so this function defines five partitions: up to the end of 2006, then 2007, 2008, 2009, and everything after 2009/12/31.
create partition scheme DateTimePartitionScheme
as partition DateTimePartitionFunction
to ([Data Partition DB FG1], [Data Partition DB FG2], [Data Partition DB FG3], [Data Partition DB FG4], [Data Partition DB FG5])
If the number of file groups is more than the number of partitions, the first unassigned file group will be marked as "Next Used" and the remaining file groups will be ignored.
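As a sketch of this behavior (DateTimePartitionScheme2 is an illustrative name, and the sixth entry simply reuses one of the file groups created above):

```sql
-- Six file group entries for five partitions: the sixth entry
-- ([Data Partition DB FG1], reused here) is marked NEXT USED.
create partition scheme DateTimePartitionScheme2
as partition DateTimePartitionFunction
to ([Data Partition DB FG1], [Data Partition DB FG2], [Data Partition DB FG3],
    [Data Partition DB FG4], [Data Partition DB FG5], [Data Partition DB FG1])
```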
create table PartitionedOrders
(
    OrderDate datetime,
    OrderId int,
    OrderData char(1000)
) on DateTimePartitionScheme(OrderDate)
create table PartitionedOrderDetails
(
    OrderDate datetime,
    OrderId int,
    OrderDeatilId int,
    OrderDeatilData char(1000)
) on DateTimePartitionScheme(OrderDate)
Before running any queries, run the following script to insert sample data into the two tables above. (The script also populates NonpartitionedOrders and NonpartitionedOrderDetails, two non-partitioned tables with the same column definitions, which are used for performance comparison.)
declare @i int, @j int, @k int, @t int, @year char(4), @month char(2)

set @i = 1
while @i <= 5
begin
    if @i = 1 set @year = '2006'
    else if @i = 2 set @year = '2007'
    else if @i = 3 set @year = '2008'
    else if @i = 4 set @year = '2009'
    else set @year = '2010'

    set @j = 1
    while @j <= 12
    begin
        if @j < 10 set @month = '0' + CAST(@j as varchar(2))
        else set @month = CAST(@j as varchar(2))

        set @k = 1
        while @k <= 100
        begin
            insert into PartitionedOrders(OrderDate, OrderId, OrderData)
                values (@year + '/' + @month + '/' + '15', @k, 'a')
            insert into NonpartitionedOrders(OrderDate, OrderId, OrderData)
                values (@year + '/' + @month + '/' + '15', @k, 'a')

            set @t = 1
            while @t <= 5
            begin
                insert into PartitionedOrderDetails(OrderDate, OrderId, OrderDeatilId, OrderDeatilData)
                    values (@year + '/' + @month + '/' + '15', @k, @t, 'b')
                insert into NonpartitionedOrderDetails(OrderDate, OrderId, OrderDeatilId, OrderDeatilData)
                    values (@year + '/' + @month + '/' + '15', @k, @t, 'b')
                set @t = @t + 1
            end
            set @k = @k + 1
        end
        set @j = @j + 1
    end
    set @i = @i + 1
end
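After the load, it is worth confirming how the rows were distributed across the partitions. A sketch using the built-in $PARTITION function:

```sql
-- Count rows per partition of PartitionedOrders; partition 2,
-- for example, should hold the year-2007 rows.
select $partition.DateTimePartitionFunction(OrderDate) as PartitionNumber,
       count(*) as NumberOfRows
from PartitionedOrders
group by $partition.DateTimePartitionFunction(OrderDate)
order by PartitionNumber
```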
5. Kinds of query well-supported by partitioning:

The query must have at least one filter predicate on the partitioning column.

Filter predicate on the partitioning column:
select * from PartitionedOrders where OrderDate between '2007/01/01' and '2007/12/31'
select * from NonpartitionedOrders where OrderDate between '2007/01/01' and '2007/12/31'
Filter predicate not on the partitioning column (no partition elimination is possible):
select * from PartitionedOrders where OrderId = 10
select * from NonpartitionedOrders where OrderId = 10
The filter predicate on the partitioning column must also be simple: a comparison between the partitioning column itself and an expression whose operands are only constants or variables. A predicate that applies an expression to the column, as below, prevents partition elimination.
select * from PartitionedOrders where (OrderDate + 1) between '2007/01/01' and '2007/12/31'
select * from NonpartitionedOrders where (OrderDate + 1) between '2007/01/01' and '2007/12/31'
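When the predicate wraps the column in an expression, it can often be rewritten so the column stands alone by moving the arithmetic to the other side. A sketch; the rewritten bounds below are the algebraic equivalent of the OrderDate + 1 filter above:

```sql
-- Equivalent filter with the partitioning column left bare,
-- so partition elimination remains possible:
select * from PartitionedOrders
where OrderDate between dateadd(day, -1, '2007/01/01')
                    and dateadd(day, -1, '2007/12/31')
```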
Indexes should be aligned with the partitioned table, and joined tables should be partitioned in an aligned way as well. To test this idea, we create another partition function, then another partition scheme, and finally another table that can join with the PartitionedOrders table but is partitioned differently.
create partition function DateTimePartitionFunction1 (datetime)
as range left for values ('2007/12/31')

create partition scheme DateTimePartitionScheme1
as partition DateTimePartitionFunction1
to ([Data Partition DB FG1], [Data Partition DB FG2])

create table PartitionedOrderDetails1
(
    OrderDate datetime,
    OrderId int,
    OrderDeatilId int,
    OrderDeatilData char(1000)
) on DateTimePartitionScheme1(OrderDate)
Now we can run the following two queries and compare their performance: the first joins two tables partitioned identically, the second joins tables partitioned differently.
select * from PartitionedOrders o
inner join PartitionedOrderDetails od
    on (o.OrderDate = od.OrderDate and o.OrderId = od.OrderId)
where o.OrderDate between '2007/01/01' and '2007/12/31'

select * from PartitionedOrders o
inner join PartitionedOrderDetails1 od
    on (o.OrderDate = od.OrderDate and o.OrderId = od.OrderId)
where o.OrderDate between '2007/01/01' and '2007/12/31'
6. Manage partitions:
Modify partition scheme:
alter partition scheme DateTimePartitionScheme next used [Data Partition DB FG1]
In a partition scheme, only one file group can be designated NEXT USED at a time. If another file group is already marked NEXT USED, the designation is transferred to the file group specified in the statement.

Modify partition function:

Merge range:
alter partition function DateTimePartitionFunction() merge range ('2006/12/31')
The partition that held the boundary value is dropped, and its file group is removed from every partition scheme that uses the partition function, unless that file group is still used by a remaining partition or is marked NEXT USED. The merged (surviving) partition is the one that did not hold the boundary value; the rows of the dropped partition are scanned and moved into it.

Split range:
alter partition function DateTimePartitionFunction() split range ('2010/12/31')
The partition in which the new boundary value resides is considered the new one. At the time of splitting, there must be one file group marked NEXT USED in each partition scheme that uses the partition function. The rows of the old partition containing the boundary value are scanned and redistributed into the two resulting partitions.

Switch data in/out of a partition:

This technique is used mainly for moving data between partitions of partitioned tables.

Benefit of SWITCH over INSERT-DELETE: a switch is only a metadata modification, not a data modification at all, so its performance is excellent.

Requirements for a switch:
- The source partition and the target partition must be in the same file group.
- The source table and the target table must have the same structure.
- The set of constraints on the target partition must be deducible from the set of constraints on the source partition. For a table constraint, the two expressions are checked for a match. For a column constraint: a simple constraint (a comparison between one column and a constant) is checked logically; a complex constraint always evaluates to false unless it is set to NOCHECK on the target.
- There must be no primary key/foreign key relationship between the source table and the target table, and the source table cannot be referenced by any other table.
- Any indexes on the source table must be aligned with the source table, and likewise for the target table. "Aligned" means they use the same partition function, or different functions with the same boundary values.
- For every index defined on the target table, there must be an index on the source table that is identical in uniqueness, key columns, and sort direction (ascending or descending) for each key column.
- The index partitions of the target partition must be in the same file groups as the corresponding index partitions of the source partition.
- Triggers on the source and target tables are not fired while partitions are being switched.

Switch data out from a partitioned table:
alter table PartitionedOrders switch partition 1 to OldOrders
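The switch target must already exist, with the same structure, on the file group that holds partition 1. A minimal sketch of such a target table (the choice of [Data Partition DB FG1] assumes partition 1 is mapped there, as in the scheme above):

```sql
-- Non-partitioned target table for the switched-out rows.
-- It must reside on the file group holding partition 1 of
-- PartitionedOrders ([Data Partition DB FG1] in the scheme above).
create table OldOrders
(
    OrderDate datetime,
    OrderId int,
    OrderData char(1000)
) on [Data Partition DB FG1]
```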
7. Create index on partitioned table:
For a unique index: the partitioning column must be one of the index key columns. This rule lets SQL Server check the unique constraint efficiently, within a single partition.
For a non-unique index:
- Clustered: if the partitioning column is not specified, SQL Server automatically adds it to the end of the list of index keys.
- Non-clustered: if the partitioning column is not specified, SQL Server automatically adds it to the end of the list of included columns.
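These rules can be sketched on the sample table (the index names below are illustrative):

```sql
-- Unique index: OrderDate, the partitioning column, must appear
-- among the key columns.
create unique clustered index IX_PartitionedOrders_Date_Id
    on PartitionedOrders (OrderDate, OrderId)

-- Non-unique non-clustered index keyed on OrderId alone:
-- SQL Server adds OrderDate to the included columns automatically
-- to keep the index aligned.
create nonclustered index IX_PartitionedOrders_Id
    on PartitionedOrders (OrderId)
```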
9. Solution to the scenario:

At the end of every year:
- Use a SWITCH statement to move the data of the oldest partition of OldOrders into OldData. For this statement to execute successfully, OldData must have all the indexes that OldOrders has and must meet the other requirements stated in the preceding section.
- Drop the table OldData.
- Mark the file group that contained the oldest partition as NEXT USED.
- Split the rightmost partition into two partitions: one is used to store the data coming from CurrentOrders, and the new rightmost one is reserved for this process next year.
- Move the data from CurrentOrders into the correct partition of OldOrders with INSERT and DELETE statements. This step takes a lot of time; SWITCH cannot be used here because CurrentOrders and OldOrders have different sets of indexes.
- Use a TRUNCATE statement to delete the data of CurrentOrders efficiently.

Self-evaluation of the solution:

Strong points:
- Since the analysis always covers a chunk of 5 consecutive years and OldOrders is partitioned by year, this job can be done efficiently (provided, of course, that the queries are written in the partition-friendly ways presented in the preceding sections).
- The contention problem is solved, because analysis and data modification are done on two separate tables (OldOrders and CurrentOrders).
- The process of moving data at the end of every year is fairly efficient, except for the step that moves data from CurrentOrders to OldOrders.

Weak points:
- The step that moves data from CurrentOrders to OldOrders is not efficient.
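The year-end steps above can be sketched as a single T-SQL batch. This is only an outline: the object names OldOrders, OldData, CurrentOrders, OldOrdersFunction, and OldOrdersScheme are assumptions, since the setup of these objects is not shown in this section, and the boundary value and file group are placeholders.

```sql
-- Year-end maintenance sketch (all object names are assumptions).

-- 1. Switch the oldest partition of OldOrders into OldData
--    (a metadata-only operation).
alter table OldOrders switch partition 1 to OldData

-- 2. Drop (or archive elsewhere, then drop) the switched-out data.
drop table OldData

-- 3. Reuse the freed file group for the upcoming split.
alter partition scheme OldOrdersScheme next used [Data Partition DB FG1]

-- 4. Split the rightmost partition: one half receives the finished
--    year, the new rightmost half is reserved for next year.
alter partition function OldOrdersFunction() split range ('2010/12/31')

-- 5. Move the finished year from CurrentOrders into OldOrders
--    (the slow step; SWITCH is not possible here because the two
--    tables have different sets of indexes).
insert into OldOrders select * from CurrentOrders

-- 6. Empty CurrentOrders efficiently.
truncate table CurrentOrders
```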