SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Development resources, articles, tutorials, code samples, tools and downloads for ASP.Net, SQL Server, Reporting Services, T-SQL, Windows, AWS, SAP HANA and ABAP


String Manipulation to Find Difference in Different Columns in a Wide Table

The recent thread at T-SQL forum at MSDN Dynamic SQL Stored Proc with information_schema.columns presented an interesting problem � find the differences in columns and list old and new values along with the column names in a new table.

The test tables that demonstrate the problem are:

CREATE TABLE dbo.z_test (
  businessdate DATE,
  colpk VARCHAR(20),
  col3 VARCHAR(10),
  col4 VARCHAR(10),
  col5 VARCHAR(10),
  col6 VARCHAR(10),
  col7 VARCHAR(10),
  col8 VARCHAR(10),
  col9 VARCHAR(10)
)
INSERT INTO z_test
VALUES ('20100628','ABC01','A','B','C','D','E','F','G')

INSERT INTO z_test
VALUES ('20100628','ABC02','AA','BB','CC','DD','EE','FF','GG')

INSERT INTO z_test
VALUES ('20100628','ABC03','AAA','BBB','CCC','DDD','EEE','FFF','GGG')

INSERT INTO z_test
VALUES ('20100627','ABC01','A','BB','C','D','E','F','GG')

INSERT INTO z_test
VALUES ('20100627','ABC02','A','BB','C','DD','E','F','GG')

INSERT INTO z_test
VALUES ('20100627','ABC03','AAA','BBB','CC','D','EEE','FF','GGG')
Code

And log table:

-- Log table
CREATE TABLE dbo.z_logtable (
  businessdate DATE,
  colpk VARCHAR(20),
  colname VARCHAR(100),
  preval VARCHAR(20),
  curval VARCHAR(20)
)
Code

Now, we want to construct a SQL statement, that will populate the Log table with the Column Name and Previous and New values for this column if there is a difference.

For this purpose we will use meta-table with the information about columns Information_Schema.Columns

The query I�m using is not guaranteed to work 100% since it uses string concatenation technique and assumes that the string will be concatenated in the order of columns in a table. In my tests this technique always worked as intended.

The first script demonstrates how to list columns with differences and count number of such columns:

SET nocount ON;

DECLARE @SQL VARCHAR(MAX)

SELECT @SQL = Coalesce(@SQL + ' + space(1) + ','') +
  ' case when isnull(a.' + Quotename(column_name) + ',space(0)) <> isnull(b.' + Quotename(column_name) + ',space(0)) then ' + Char(39) + Quotename(column_name) + Char(39) + ' else '''' end'
FROM information_schema.columns
WHERE table_name = 'z_test'
  AND column_name NOT IN ('BusinessDate','ColPK')

SET @SQL = '
select *, LEN(ColDiff) - LEN(replace(colDiff,'']'',space(0))) as CountDiff
from (
  Select a.BusinessDate, a.ColPK, RTRIM(LTRIM(' + @SQL + ')) as ColDiff
  from z_test a
  INNER JOIN z_test b on a.ColPK = b.ColPK and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)
) X'

--print @SQL
EXECUTE( @SQL)
Code

In the script above I used ISNULL(a.column,space(0)) <> ISNULL(b.column, space(0)) in order to account for possible NULLs in the columns. In the later scripts I will just test a.column <> b.column for simplicity.

The sample data can be displayed by running the below t-sql SELECT query.

select * from z_test order by colpk, businessdate
Code

sample-data-to-find-differences-in-sql-table-fields

When we execute dynamic sql script, it will compare sql data in database table z_test and find differences.
The list of different fields are listed in the output of the data compare statement as follows.

compare-sql-data-in-database-table-find-differences





The second query shows how to create and execute script that will log column differences and previous/new values:

SET nocount ON

DECLARE @SQL VARCHAR(MAX)
DECLARE @SQLCase VARCHAR(MAX)
DECLARE @SQlFrom VARCHAR(MAX)
DECLARE @SQlSelect VARCHAR(MAX)

SELECT @SQlSelect = 'SELECT b.BusinessDate, a.ColPK, '

SELECT @SQlFrom = ' FROM Z_Test a
        INNER JOIN Z_Test b on a.ColPK = b.ColPK
        and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)'

SELECT @SQL = Coalesce(@SQL + '','') + Quotename(column_name,'''') + ' as ColName, b.' +
      Quotename(column_name) + ' as PrevVal, a.' + Quotename(column_name) + ' as CurVal ' +
      @SQlFrom + '
      where a.' + Quotename(column_name) + '<> b.' + Quotename(column_name) +
      Char(13) + ' UNION ALL' + Char(13) + @SQlSelect
FROM information_schema.columns
WHERE table_name = 'Z_Test'
   AND column_name NOT IN ('BusinessDate','ColPK')

SELECT @SQL = Substring(@SQL,1,Len(@SQL) - Len(@SQlSelect) - 10)

SELECT @SQL = 'Select b.BusinessDate, b.ColPK, ' + @SQL

DELETE FROM dbo.z_logtable

SELECT @SQL = 'Insert into dbo.Z_LogTable (
  BusinessDate, ColPK, ColName, CurVal, PreVal
) ' + @SQL

PRINT @SQL

EXECUTE( @SQL)

select * from dbo.Z_LogTable order by BusinessDate, ColPK, ColName
Code

This generates one long insert statement for all columns with differences.

compare-data-and-display-previous-and-current-column-values.PNG

The original poster also asked for amendment in this script to do such inserts in chunks.
The next script demonstrates the same technique but with split for every 3 columns (see the difference in generated scripts in the Messages pane):

set nocount on

declare @ColCount int

select @ColCount = COUNT(*)
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'Z_Test'

declare @SQL varchar(max)
declare @SQLInsert varchar(max)
declare @SQlFrom varchar(max)
declare @SQlSelect varchar(max)

select @SQlSelect = 'Select b.BusinessDate, a.ColPK, '

select @SQlFrom = '
  from Z_Test a
  INNER JOIN Z_Test b on a.ColPK = b.ColPK
  and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)'

set @SQLInsert = 'Insert into dbo.Z_LogTable (
  BusinessDate, ColPK, ColName, CurVal, PreVal
) '

select @SQL = coalesce(@SQL + '','') +
  quotename(Column_Name,'''') + ' as ColName,
  ' +
  'b.' + quotename(Column_Name) + ' as PrevVal, ' +
  'a.' + quotename(Column_Name) + ' as CurVal ' +
  @SQlFrom + '
  where a.' + QUOTENAME(Column_Name) +
  '<> b.' + QUOTENAME(Column_Name) + CHAR(13) +

  Case when ORDINAL_POSITION = @ColCount
    then ''
  when ORDINAL_POSITION %3 = 0
    then @SQLInsert + CHAR(13) + @SQlSelect
  else
    ' UNION ALL' + CHAR(13) + @SQlSelect
  end
  from INFORMATION_SCHEMA.COLUMNS
  where TABLE_NAME ='Z_Test'
    and COLUMN_NAME NOT IN ('BusinessDate','ColPK')

--SELECT @SQL = SUBSTRING(@SQL, 1,LEN(@SQL) -LEN(@SQlSelect) -10)

SELECT @SQL = 'Select b.BusinessDate, b.ColPK, ' + @SQL

DELETE FROM dbo.Z_LogTable

SELECT @SQL = @SQLInsert + @SQL

print @SQL
execute(@SQL)

SELECT * FROM Z_LogTable order by BusinessDate, ColPK, ColName
Code

As you see, such tasks can be solved using simple string manipulation, although this technique is not guaranteed to work, especially if you try to add ORDER BY in the queries used. In my tests, though, it always worked.

You can see more scripts discussing the same technique at this blog
How to get information about all databases without a loop

About the Author : Naomi Nosonovsky

Naomi Nosonovsky Naomi Nosonovsky is a Senior Software Developer who has more than 15 years of enterprise experience in analysis, design, development and implementation of high-performance client/server data management solutions.

She is a personality of the year 2008 and 2009 at www.universalthread.com in .NET, SQL Server & VFP categories.
Naomi Nosonovsky is also an All Star contributor at ASP.NET Forums and MSDN Transact-SQL Forum.
Code

Readers of this sql article can share their ideas about this article at String Manipulation Discussion Forum



SQL Server

SQL Server 2019 Installation
download SQL Server 2019
download SQL Server 2017
download SQL Server 2016
download SQL Server 2014
download SQL Server 2012
MacOS ve SQL Server 2019


Copyright © 2004 - 2021 Eralper YILMAZ. All rights reserved.