T-SQL Union – Grouping Results from Multiple Databases in Batches

group bypowershellt-sqlunion

First post in dba, so please be gentle.

I have a project I am working through and have 4+ SQL Server databases to work with.

These DBs contain 2 (or more tables) I wish to UNION the results from.

The data I am interested in contains Windows Server file paths (SrcFilename, DestFilename for reference) and looks something like this:

srcfilename001.dwg|\\srcServer1\path\to\file\|destfilename001.dwg|\\destserver\path\to\file\
srcfilename002.dwg|\\srcServer2\path\to\file\|destfilename002.dwg|\\destserver\path\to\file\
srcfilename003.dwg|\\srcServer3\path\to\file\|destfilename003.dwg|\\destserver\random\path\
srcfilename004.dwg|\\srcServer4\path\to\file\|destfilename004.dwg|\\destserver\random\path\

I have been able to UNION the data using a query like this:

    SELECT * FROM (
SELECT * FROM [Database_015].[dbo].[CAD_DATA] Where [Database_015].[dbo].[CAD_DATA].[UniqueValue] = 1 
UNION ALL
SELECT * FROM [Database_016].[dbo].[CAD_DATA] Where [Database_016].[dbo].[CAD_DATA].[UniqueValue] = 1 
UNION ALL
SELECT * FROM [Database_017].[dbo].[CAD_DATA] Where [Database_017].[dbo].[CAD_DATA].[UniqueValue] = 1 
UNION ALL
SELECT * FROM [Database_020].[dbo].[CAD_DATA] Where [Database_020].[dbo].[CAD_DATA].[UniqueValue] = 1 
) A  
ORDER BY NEWID()

Which works great…

I get the data I need in a random order rather than a bunch of files from \\server1, \\server2, \\server3 etc.

Obviously the randomness is because of the ORDER BY NEWID().

The reason for gathering the data is because we're migrating millions of files and ultimately need to hash both the source and destination files to see if they match.

Ultimately I will use Powershell to hash the files from the resultant data and I think that having each Powershell Start-ThreadJob 'Job' point at each server in turn for n files will be faster than having it work on a "stripe" of data across multiple servers.

Thanks for your time. 🙂

Edit: The first comment asked for more explanation so here goes:

I don't know how to combine both UNION ALL, GROUP BY and ORDER BY to produce a dataset from multiple different database tables that has n batches of files grouped by server in a random order.

Best Answer

Assuming the question is...

"I've randomised per-server, but now I need to randomize among the whole set"

...then it may be prudent to remove UNION & ORDER BY from your query altogether. You can unify a standard query against all database/server combinations into a single dataset stored as a PSCustomObject and pipe the unified data into Sort-Object using the expression {Get-Random} to randomise among the whole set rather than per-call implicit partitions.

You can try something like the following using dbatools.

$connections = @"
Server,Database
sql-01,Database_015
sql-01,Database_016
sql-02,Database_017
sql-03,Database_020
@" | ConvertFrom-Csv

$query_tmp = @"
select 
    '{0}' as serv,
    '{1}' as db,
    SrcFilename,
    DestFilename
from dbo.CAD_DATA
where UniqueValue = 1;
"@

$table = foreach($conn in $connections){
    $query = $query_tmp -f $conn.Server, $conn.Database
    Invoke-DbaQuery -Query $query -Server $conn.Server -Database $conn.Database
}

$table | Sort-Object {Get-Random} | Export-Csv ~/Desktop/myquery.csv

If you don't have access to / can't be bothered with dbatools, you may be able to use Invoke-SqlCmd in lieu of Invoke-DbaQuery