What is new in SQL Server 2008 aka Katmai
Change Data Capture feature in SQL Server 2008
What is Change Data Capture?
Change data capture feature is a new enhancement in sql server management with
SQL Server 2008, Katmai. Change Data Management is included in the new features
of Katmai with the June CTP release of SQL Server 2008.
Change data capture enables SQL Server administrators and developers to capture
insert, update and delete events in a sql server table as well as the details of
the event which caused data change on the relevant database table. When you
apply Change Data Capture feature on a database table, a mirror of the tracked
table is createad which reflects the same column structure of the original table
and additional columns that include metadata which is used to summarize what is
the change in the database table row.
So enabling the Change Data Capture feature on a database table, you can track
the activity on modified rows or records in the related table. Change Data Capture (CDC) can be considered as Microsoft solution for data capture systems in SQL Server 2008 and next versions.
There were samples of data capture solutions implemented for Microsoft SQL Server 2000 and SQL Server 2005 by using after update/insert or after delete triggers.
But CDC enables SQL Server developers to build sql server data archiving without a necessity to create triggers on tables for logging.
SQL Server database administrators or programmers can also easily monitor the activity for the logged tabled.
How to enable a SQL Server database for Data Change Capture
Before applying the data change capture (CDC) on a SQL Server database table the
database should be enabled for Data Change Capture. To see whether data change
capture (cdc) is enabled over a database, you can run and check the
is_cdc_enabled column value in the sys.databases catalog view by running the following query.
select [name], database_id, is_cdc_enabled from sys.databases
If is_cdc_enabled column value for the related database is false, then you can
enable this change data capture by running the sys.sp_cdc_enable_db_change_data_capture system stored procedure in the related database.
When CDC is enabled on a database, a new user named "cdc", a schema named "cdc"
and the following tables are created on the related database.
How to enable a database table for Data Change Capture
After enabling the database for CDC, now you can configure tables that you wish
to track the modifications upon by running the sys.sp_cdc_enable_table_change_data_capture system stored procedure.
Before running the sys.sp_cdc_enable_table_change_data_capture stored procedure you can check whether the related table is enabled preior for data change capture
by querying the is_tracked_by_cdc column value in the sys.tables for the related database table.
select [name], is_tracked_by_cdc from sys.tables
Also for Change Data Capture to be successfull, SQL Server Agent should be
running. Otherwise, you may get the following error message:
SQLServerAgent is not currently running so it cannot be notified of this action.
exec sys.sp_cdc_enable_table_change_data_capture @source_schema = N'dbo', @source_name = N'Country', @role_name = N'cdc_Admin'
SQL Server Agent plays an important role in the database design or implementation of data capture solution for Microsoft SQL Server databases since for each database two CDC related job is created automatically when sys.sp_cdc_enable_table_change_data_capture is first executed for a database.
These jobs are named cdc.database_name_capture and cdc.database_name_cleanup
cdc.database_name_capture SQL Server job runs the sql server system stored procedure sys.sp_MScdc_capture_job to start Change Data Capture Collection Agent
For more detailed inside view of SQL Server for CDC you can examine the stored procedure sp_cdc_scan how the CDC scan is executed on a data capture enabled sql database.
cdc.database_name_cleanup SQL Server job runs the sql server system stored procedure sys.sp_MScdc_cleanup_job to clean up database changes tables
After you have run the above t-sql command, you can view the cdc.change_tables
for the newly created record. capture_instance is a parameter that can be set by passing it as a parameter to the sys.sp_cdc_enable_table_change_data_capture procedure.
If you have not passed the capture instance as a parameter, you can get the default created value by the capture_instance column value of the cdc.change_tables table.
After you have enabled CDC for a table, for example for Country table just as I
did, another table is created for keeping changed data and information about the
As you will see there are five additional fields to the mirrored original table
__$operation identifies the data manipulation language (DML) operation which
caused the Change Data Capture fetch this process.
An __$operation column value 2 refers to an Insert statement
If an Update statement is run against a row in the table, a row with
__$operation column value 3 is created referring to the column values before
update and a row with __$operation column value 4 is created referring to the
column values after update statement is run.
An __$operation column value which is equal to 1 refers to a Delete statement.
__$update_mask identifies the columns which are affected by the DML command. If
an Insert command (where __$operation is equal to 2) or a Delete command (where
__$operation is equal to 1) since all the columns are effected __$update_mask
has a value formed of bit value 1's for each column.
For example, the followin Insert command results with binary value "111" for
each column having 1's in order
INSERT INTO Country SELECT N'TR', N'Türkiye', N'Ankara'
The __$update_mask becomes 0x07 where we have "111" in binary. The binary
digit with least value is the first column in the table.
If an Update command is run and the Capital column value is updated then 2
new rows are created in the CDC table of the Country table.
UPDATE Country SET Capital = N'Tunus' WHERE Code = 'TU'
First row indicates the values prior the Update statement. This row has
__$operation equal to 3 and __$update_mask value equal to 0x04 which is "100" in
binary. You can see the updated column value from the table referencing the
related column. The second row indicated the values after the Update statement.
This second row has __$operation equal to 4 and again the same __$update_mask
column value with the new value of the related columns.
You can also get the changes on a table by calling the cdc.fn_cdc_get_all_changes_capture_instance function.
the capture_instance in the function name is the capture_instance column value
which is declared or determined when the table is enabled for capturing data
You can also find the capture_instance for a table that is tracked and logged for data changes in the cdc.change_tables table as the capture_instance column value.
You can execute the below sql statement to fetch the capture_instance value to use as a parameter to cdc.fn_cdc_get_all_changes_capture_instance function.
select capture_instance from cdc.change_tables
Sample Demonstration on Change Data Capture with DML queries
Assume that we have an empty dbo.Country table as shown in figure below
Assume that on the sample table which is initially empty and CDC feature is enabled, the following DML queries are executed in order.
insert into Country select 'TR', 'Turkey', 'Ankara'
update Country Set Name = N'Türkiye' where Code= 'TR'
delete from Country where code = 'TR'
If you want to see the effects of the above insert, update and delete queries on the capture instance of the Country table, run the below select query.
select * from cdc.dbo_Country_CT
How to get a list of all the data changes for a SQL Server database table
So you can run the below SELECT query from the CDC table to get all changes.
select * from cdc.dbo_Country_CT
Or you can run the below SELECT query from the user function which returns
rows only whose __$start_lsn value is between
@from_lsn and @to_lsn
DECLARE @begin_time datetime
DECLARE @end_time datetime
DECLARE @from_lsn binary(10)
DECLARE @to_lsn binary(10);
SET @begin_time = GETDATE()-1;
SET @end_time = GETDATE();
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
select * from cdc.fn_cdc_get_all_changes_dbo_Country(@from_lsn, @to_lsn, 'all');
LSN values __$start_lsn are Log Sequence Number associated with the
commit transaction for the changes.
Log sequence numbers for a date-time value can be obtained from the cdc.lsn_time_mapping table
SQL syntax for sys.fn_cdc_map_time_to_lsn to get mapped lsn value for a datetime value is as follows:
sys.fn_cdc_map_time_to_lsn ( '<relational_operator>' , tracking_time )
Tracking time is in datetime and possible relational_operator values are;
- largest less than
- largest less than or equal
- smallest greater than
- smallest greater than or equal
To disable Change Data Capture feature on a database table, you can use
the system procedure sys.sp_cdc_disable_table_change_data_capture. The syntax of
the sys.sp_cdc_disable_table_change_data_capture procedure is as follows;
[ @source_schema = ] 'source_schema' ,
[ @source_name = ] 'source_name'
[, [ @capture_instance = ] 'capture_instance' | 'all' ]