SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Development resources, articles, tutorials, code samples, tools and downloads for ASP.Net, SQL Server, Reporting Services, T-SQL, Windows, AWS, SAP HANA and ABAP


Detect Data Islands of Nodes and Edges using SQL on SQL Server

SQL developers can query hierarchy data using recursive CTE expression on SQL Server databases easily. Although SQL recursive CTE queries are advanced techniques for database developers which is not supported by all database providers like SAP HANA, it seems to be an option to work with connected nodes just like graph data. For example, to separate nodes connected to each other by edges into groups like islands seperating connected ones from others, many SQL programmers get stuck in recursive SQL queries.

In this SQL tutorial, I want to share a solution for SQL database developers showing how to group nodes which are connected into islands. I use the island term as grouped nodes which are related to each other but totally separated from nodes in other islands.

Following image is showing a group of nodes and relation between nodes (fromNode connected to toNode).
This fromNode and toNode naming calls a similarity for hierarchy data. On the other hand, it seems to be impossible for me at least to solve this grouping nodes into islands problem with recursive CTE query.

As seen in below image, nodes with same color have a relation and there is a path (ignoring the arrow direction and assuming relation is bidirectional) from one node in island to an arbitrary node in the same island.

edge and nodes grouped into data islands by sql code

My sample data has four islands, now I want to build a SQL script that will identify these four islands including the nodes and edges within each data island.

Before you execute the given SQL code script, maybe I can tell you how it works for you to make you understand it easily.

First of all, I create a temporary table and store my sample data in it.
This is the ##Edges table where I insert data rows representing edges (relation of fromNodes and toNodes)

A second temp table I use for calculating groups in SQL database is ##RelatedNodes.
This table has nodes field where I add every node (fromNode, toNode) which are related to each other.
If we continue to read the SQL code, you will see I create a SQL cursor and loop through all edges of my sample data and try to place these nodes in one of the rows of ##RelatedNodes temp table.

Basically, ##RelatedNodes temporary table is a sandbox where I insert independent edge data as a separate row.

If there is a relation of the current cursor row nodes with only one of the rows of the temp table, then I concatenate the loop nodes to the related data group in ##RelatedNodes temp table

One of the cases that database programmers cannot handle is the situation where the current row of the cursor has a relation with two of the rows (two previously calculated islands).
In this case the current loop data (fromNode and toNode) is a bridge between two different nodes.
This case requires removal of two different islands by replacing them with a single island representation.

All these calculations are managed within the SQL cursor.
The number of relation of the loop data is calculated into @cnt variable.

If you place your pseudo-code within the loop or query ##RelatedNodes temp table to understand how SQL script code works, you will see there are repeating nodes in table's nodes field concatenated and separated by ","" comma character.'
I did not follow a path to remove these repeats since it will cause more complexity in SQL code and might cause the performance be worse.

And one last note for performance. Although I mentioned a few times that recursive SQL CTE query is not the way to solve this problem, running it once and grouping hierarhical data at the beginning of the script can gain performance. After the first detection of related nodes, you will see recursive CTE fails to combine some data. At this point the following structure can be executed to fetch additional relations, to detect bridges and to merge different data islands into a single group of nodes.

Here is the SQL codes that can be executed on SQL Server database

set nocount on

-- sample data
create table ##Edges (
 NodeFrom int,
 NodeTo int
)

insert into ##Edges values (1,4),(10,4),(10,16),(16,17),(10,11),(3,11),(19,11),(19,20)
insert into ##Edges values (5,2),(2,21),(2,9),(14,9),(15,14)
insert into ##Edges values (7,12),(12,8),(8,13)
insert into ##Edges values (6,18)

-- required temporary table
create table ##RelatedNodes (id int identity(1,1), nodes varchar(max))

-- variables
declare @NodeFrom int
declare @NodeTo int
declare @cnt int
declare @Nodes1 varchar(max)
declare @Nodes2 varchar(max)
declare @Id1 int
declare @Id2 int

----------------------- SQL CURSOR ---------------------
DECLARE sqlcursor CURSOR FAST_FORWARD FOR
select
 NodeFrom, NodeTo
from ##Edges
Order By NodeFrom, NodeTo

OPEN sqlcursor

FETCH NEXT FROM sqlcursor INTO @NodeFrom, @NodeTo

WHILE @@FETCH_STATUS = 0
BEGIN
--- SQL Code in Cursor

 select
  Id
 into ##tblId
 from ##RelatedNodes
 where
  nodes like ('%,' + convert(varchar(max),@NodeFrom) + ',%') or
  nodes like ('%,' + convert(varchar(max),@NodeTo) + ',%')

 select @cnt = count(*) from ##tblId

 if @cnt = 0
 begin
  insert into ##RelatedNodes
  select
   case when @NodeFrom < @NodeTo
    then (',' + convert(varchar(max),@NodeFrom) + ',' + convert(varchar(max),@NodeTo) + ',')
    else (',' + convert(varchar(max),@NodeTo) + ',' + convert(varchar(max),@NodeFrom) + ',')
   end
 end
 else if @cnt = 1
 begin

  update ##RelatedNodes
  set nodes = nodes + convert(varchar(max),@NodeFrom) + ',' + convert(varchar(max),@NodeTo) + ','
  from ##RelatedNodes rn
  inner join ##tblId t
   on rn.id = t.id
  
 end
 else if @cnt = 2
 begin

  select top 1
   @Id1 = rn.id,
   @Nodes1 = nodes
  from ##RelatedNodes rn
  inner join ##tblId t
   on rn.id = t.id
  order by t.id

  select top 1
   @Id2 = rn.id,
   @Nodes2 = nodes
  from ##RelatedNodes rn
  inner join ##tblId t
   on rn.id = t.id
  order by t.id desc

  set @Nodes1 = @Nodes1 + @Nodes2

  update ##RelatedNodes
  set nodes = @Nodes1
  where Id = @Id1

  delete ##RelatedNodes where Id = @Id2

 end

 drop table ##tblId

--- SQL Code in Cursor
 FETCH NEXT FROM sqlcursor INTO @NodeFrom, @NodeTo
END

CLOSE sqlcursor
DEALLOCATE sqlcursor
----------------------- SQL CURSOR ---------------------

select
 NodeFrom, NodeTo, GroupId
from ##Edges
inner join (
 select nodes, row_number() over (order by id) as GroupId from ##RelatedNodes
) g
 on g.nodes like ('%,' + convert(varchar(max),NodeFrom) + ',%')
order by GroupId, NodeFrom, NodeTo

drop table ##RelatedNodes
drop table ##Edges

set nocount off
Code

When above SQL code is executed on a SQL Server database, it will create sample data, calculate node relations in SQL cursor code and display grouped data as islands identified with GroupId in execution output

SQL Server graph data grouped as islands

To summarize, I know executing SQL scripts with cursors is not a best practise for database developers. I have to admit that I will accept a solution using a set based operation instead of SQL cursors and looping through each data row to perform grouping and building relations between nodes. But I worked to solve this problem, seperating nodes and edges into data islands as I shared in the image at the beginning of this SQL tutorial but failed to resolve the issue without the use of SQL cursors.

Please feel free to inform me for additional solutions of this SQL problem. I will be pleased especially if there is a graph solution of this grouping and categorizing of nodes and edges.



SQL Server

SQL Server 2019 Installation
download SQL Server 2019
download SQL Server 2017
download SQL Server 2016
download SQL Server 2014
download SQL Server 2012
MacOS ve SQL Server 2019


Copyright © 2004 - 2021 Eralper YILMAZ. All rights reserved.