Update Table Data for Uniform Distribution in SQL

On SQL Server database table, if SQL developers want to create equal number of rows or uniform distribution among data categories database programmers can use SQL script given in this tutorial.

The original SQL table data is having a distribution on selected column as follows

SQL table data distribution


;with summary as (

 SELECT distinct 

  COUNT(*) Over (Partition By 1) Cnt, 

  RevId, 

  COUNT(RevId) Over (Partition By RevId) RevCnt

 FROM MyTable

), todo as (

 select

  Cnt, 

  Cnt / (count(*) over (partition by 1)) Average,

  RevId, RevCnt

 from summary

), joint as (

select

 id,

 MyTable.RevId,

 orderno = ROW_NUMBER() over (partition by MyTable.RevId order by id),

 Cnt,

 Average,

 RevCnt,

 RevCnt - Average as forupdate

from MyTable

inner join todo on MyTable.RevId = todo.RevId

), upd as (

 select * from joint

 where RevCnt < Average and orderno <= (forupdate * -1)

 union all

 select * from joint

 where RevCnt > Average and orderno <= forupdate

), final as (

select 

 id, RevId,

 case when forupdate > 0 then 1 else -1 end as overAvg,

 rn = ROW_NUMBER() over (partition by (case when forupdate > 0 then 1 else -1 end) order by id)

from upd

)

update MyTable

set

 RevId = ISNULL(n.RevId,p.RevId)

from MyTable t

inner join final p on p.id = t.id

left join final n on p.rn = n.rn and n.overAvg = -1

where p.overAvg = 1

Code

After above SQL update statement is executed, table data is now having a uniform distribution on selected column

uniform data distribution

Unfortunately, I realized that above update is not sufficient in some cases.
Assume that our values are distributed as seen in below numbers in our sample database table.

As you will realize, I had a total number of 100 rows at first sample.
Then for following case, I had introduced two new groups of rows in low numbers.
5 new rows for category 400 and 5 rows for category group 500

table data

After we execute above code, the updated data will have below count values.
Since the count difference between minimum group (with 5 rows) and target distribution count (100+5+5)/5 = 22 is bigger than minimum group count (5), the distribution is not uniform even after first correction.

group by counts

An other attemp to make numbers equal among different categories or groups by running the same script once more will improve the result

SQL table data grouped by categories

If not satistied with the result, let's try again. Execute the exact SQL script again. And then check the final situation on value distribution on table data.

uniformly distributed table data

Yes now it is all uniformly distributed :)

SQL Server

SQL Server 2019 Installation
download SQL Server 2019
download SQL Server 2017
download SQL Server 2016
download SQL Server 2014
download SQL Server 2012
MacOS ve SQL Server 2019

SQL Resources

SQL Server Tutorials
SQL Server 2012
SQL Server Tools
SQL Server SSIS