Anda di halaman 1dari 9

Best Practices for synchronizing data

SQL Data Compare version 6,7


3 September, 2008

SQL Data Compare is a tool that can be used to inspect the contents of
similar tables in a database, for instance, to make sure that the data is identical
after a synchronization, to inspect a database after a software upgrade, or to
perform Quality Assurance on data. Additionally, it can synchronize the data so
that the tables in both databases contain exactly the same information.

When using Data Compare as a data synchronization tool, there are a


number of issues that may occur because of the way that Data Compare works.

The diagram shows a few important things. One is that rows are matched
together using the column of a table on which there is a primary key, unique
index, or unique constraint. This can be bypassed by manually setting the
comparison key column, but the performance is degraded as compared to a
column with a primary key or unique index. Another important thing to note is
that the data comparison and storage of the results is done on the local
workstation’s hard disk. This could potentially limit the scalability of the
comparison.

Once a comparison is complete, the results can be reviewed, and if


desired, a synchronization SQL script can be saved or run from within the
program. There are various methods that can be used to manage the data
synchronization in some cases.

A common problem that can be solved using SQL Data Compare’s data
synchronization is synchronizing data from various off-line sources into a central
data repository. Because the client machines are potentially not connected to a
network, this makes some other data replication techniques unfeasible or difficult.
One such example exists at a fire station. The station has a central server with a
Microsoft SQL Server database that logs the activities of the firefighters. The
firefighters carry a laptop in the cab of the truck when they are called to the
scene of a fire. After the fire has been extinguished, the details are logged into
the laptop, and when the truck returns to the station, the data is synchronized
into the main server, and new data from any other units at other scenes is
synchronized back from the main server to the firefighters’ laptop. The schema of
one of the tables is similar to this:

CREATE TABLE Response (


RecordID int PRIMARY KEY IDENTITY(1,1),
UnitId int NOT NULL,
CaptainID int NOT NULL,
Address NVARCHAR(255)
)

This causes a problem when the data is synchronized with the station’s server,
because of conflicting identity values: remember that Data Compare uses the
primary key column to match rows of data! To illustrate:

Station
RecordID UnitID CaptainID Address
1 2 1 1 Main Street House
2 1 2 59 Elm Street
3 3 4 315 Bowery

RecordID UnitID CaptainID Address RecordID UnitID CaptainID Address


4 1 1 1060 W Addison 4 2 2 1600 Pennsylvania Av

Unit 1 Unit 2

In this situation, two units had been dispatched at the same time, and
they had both returned with a new record to insert into the station’s database.
Once the new record from Unit 1 had been inserted, the new record from Unit 2
cannot be inserted because the RecordID of 4 already exists.

Designing a database for synchronization

There are a number of changes that could be made to the database


schema to prevent this conflict. The easiest and most obvious option would
simply be to assign a range of RecordIDs for each unit. After synchronizing the
schema to the laptops, it would be relatively simple to adjust the schema on each
laptop to change the identity seed. This would not help with data that already
exists. There is also a risk that the identities could overlap, given time.

USE FireUnit1
DBCC CHECKIDENT (Response, RESEED, 1000)
Use FireUnit2
DBCC CHECKIDENT (Response, RESEED, 2000)

Another idea would be create a compound primary key from RecordID and
UnitID, which should still uniquely identify each response. If the table had a
relationship with another table, then the second table would also need to have a
UnitID column and a compound foreign key created in order to ensure referential
integrity:

CREATE TABLE [dbo].[ResponseDetails](


[ResponseID] [int] NOT NULL,
[UnitID] [int] NOT NULL,
[Details] [ntext] NULL,
CONSTRAINT [PK_ResponseDetails] PRIMARY KEY CLUSTERED
(
[UnitID] ASC,
[ResponseID] ASC
)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO
ALTER TABLE [dbo].[ResponseDetails] WITH CHECK ADD CONSTRAINT
[FK_ResponseDetails_Response] FOREIGN KEY([ResponseID], [UnitID])
REFERENCES [dbo].[Response] ([RecordID], [UnitId])
GO
ALTER TABLE [dbo].[ResponseDetails] CHECK CONSTRAINT
[FK_ResponseDetails_Response]

This is assuming that the individual firefighters only add new data at the
scene. If they were allowed to change existing data, there would be more to pay
attention to.

The next step, once all of the firefighting units have synchronized to the
fire station’s database, would be to synchronize all of the data back to the
firefighters’ laptops using SQL Data Compare.

Data merging

There may also be situations where the type of replication required is to


merge data from several sources into one – a hub and spoke topology. This
introduces several new challenges: how to manage new records from multiple
sources, how to manage the deletion of records, and how to design strategies for
electing updates when the same record is being updated from several sources.

Unlike in the first example, the ‘hub’ database may have had some records
deleted, and these deletions need to be applied to the databases at the end of the
‘spoke’. Any record that exist at the end of the spoke, but not on the hub,
however, must be prevented from deletion. The way to prevent this is to use
Data Compare’s Actions menu, and choose to exclude all on the right.
If records have been updated at the spoke database and need to be
synchronized to the hub database, then you can click synchronize. To prevent
updates from being sent upwards, exclude all different objects. Now click
synchronize, and run the synchronization against the hub database.

The next step would be to get the changes made at the hub database and
pull them down to the spoke database. The synchronization direction can be
changed by double-clicking the ‘synchronization direction’ arrow, causing it to
point at the ‘spoke’ database. If, as mentioned in the previous paragraph,
updates are allowed only at the hub database, select Include all, then exclude
all on the left. If updates are allowed only from the spoke database to the hub,
also choose exclude all different objects. Now click the synchronize button,
which will cause a synchronization to happen on the ‘spoke’ database.

Congratulations! The databases have effectively been merged!

One of the more advanced topics regarding data merging is how to elect a
winner when two updates to the same row of data occur at both data sources.
Data Compare’s default behaviour is simply to overwrite the row, meaning it’s a
last-update-wins strategy for the whole row. In some cases, this is too simplistic.
The Red Gate SQL Tools do offer a solution for choosing which changes will be
applied on a row-by-row basis. This requires the SQL Toolkit programming API,
which will allow you to write code to choose records based on their content, for
instance, automatically choose rows of data in which a certain datetime field
contains a newer date than in the second database. Please see the
SelectionDelegate object in the SQL Comparison and Synchronization Toolkit help
file that comes with SQL Bundle for more information.

Managing the temporary disk storage

Because SQL Data Compare heavily uses disk storage to hold comparison
results, it may be sensible to consider limiting the amount of data actually being
compared. SQL Data Compare allows this by implementing table selection (object
filtering), column selection (vertical filtering) and WHERE clauses (horizontal
filtering). The Tables & Views tab of the Data Compare project allows all of
these filters to be set.

In this example, all tables except Responses and ResponseDetails are removed
from the comparison, because the responding units will never update the
‘captains’ or ‘units’ tables. These are only updated back at the station. If some of
the columns in either of the included tables are not necessary, they can also be
removed. If the latest recordID value was known when the responding unit had
left the station, this could be included in a WHERE clause, to prevent the repeated
comparison of data that may well never change. The rows of data that will be
stored on local disk have now been cut from 17 rows down to only two, saving a
bit of hard disk space.

It is also possible to change the location of the temporary storage, for


instance, to a larger hard disk. To do this, open a command prompt, change the
current directory to the %programfiles%\red gate\sql bundle 5 folder, and run
the following commands:

SET RGTEMP=”d:\temp”
RedGate.SQLDataCompare.UI.exe

Comparing different types of data


In some cases, columns containing different types of data must be
compared. This could potentially cause some problems. The first concerns NULL
data.

CREATE TABLE [dbo].[Maintenance]( CREATE TABLE [dbo].[Maintenance](


[MaintenenceID] [int] PRIMARY KEY [MaintenenceID] [int] PRIMARY KEY
NOT NULL, NOT NULL,
[Date] [datetime] NULL CONSTRAINT [Date] [datetime] NULL CONSTRAINT
[DF_Maintenence_Date] DEFAULT [DF_Maintenence_Date] DEFAULT (getdate()),
(getdate()), [MaintCode] [nchar](10) NOT NULL
[MaintCode] [nchar](10) NULL )
)

The Maintenance table in one database allows NULL data for the
MaintCode column, and the other does not. If NULLS are synchronized to the
second database, the update will fail. One solution would be to replace all of the
NULL data in the first database, but possibly the owner of this data would object.
In that case, a view could be used rather than a table. The view could CAST data
or replace it with a value. Creating a view on database 1 is the first
step:

CREATE VIEW [dbo].[MaintView]


AS
SELECT CASE WHEN MaintCode IS NULL THEN 'NONE' ELSE MaintCode END AS
MaintCode, Date, MaintenenceID
FROM dbo.Maintenance

This view will replace any NULL data with the word ‘NONE’. The next step
is to instruct Data Compare to map the view to the Maintenance table on the
other database:
After unmapping the Maintenance table, it can be mapped to the MaintView view:
And finally, set the comparison key for the mapped objects:
After comparing the data, the result is that the SQL update script replaces NULL
with ‘NONE’:

-- Update 1 row in [dbo].[Maintenance]


UPDATE [dbo].[Maintenance] SET [MaintCode]=N'NONE', [Date]='20070504
09:29:54.403' WHERE [MaintenenceID]=1

The same idea can be used when the datatype of the columns in a table are
different, by using the CAST or CONVERT functions instead of the CASE
statement.

In this document, it has been demonstrated that SQL Data Compare can be used
as a tool for synchronizing data between Microsoft SQL Server databases when
the databases are specifically designed with this in mind. Best practices for
performing synchronizations efficiently have been demonstrated by using
horizontal and vertical filtering, and workarounds for comparing disparate
datatypes using views have been demonstrated.

To learn more, or to download a free 14-day evaluation of the software, please


visit http://www.red-gate.com.

Anda mungkin juga menyukai