SQL Server – How to Develop Union Reports from Single Tenant Databases

data-warehouse, sql-server, sql-server-2016, view

We have one database per customer – over 700 databases in total, all with identical schemas, and all residing on the same server instance. We now want to run reporting across all customers at once. Many DB professionals prefer single tenant databases.
Single Tenant Database Discussion

CREATE TABLE CustomerOne.[dbo].[CustomerTransaction]
(
    [CustomerTransactionid] [int] IDENTITY(1,1) PRIMARY KEY NOT NULL,
    [Customerid] [int] NOT NULL DEFAULT(1),
    [QuantityBought] [int] NULL
)

Does anyone know of any other ideas, besides those listed below, that would work?
Our first idea is to create large views; however, view performance is slow. The second option is a stored procedure.

(1)

USE Reportingdb;
GO
CREATE VIEW dbo.CustomerTransaction
as
SELECT 
    [CustomerTransactionid]
    ,[Customerid]
    ,[QuantityBought]
FROM [CustomerOne].[dbo].[CustomerTransaction]

UNION ALL

SELECT 
    [CustomerTransactionid]
    ,[Customerid]
    ,[QuantityBought]
FROM [CustomerTwo].[dbo].[CustomerTransaction]
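Hand-writing a 700-branch view is impractical, so one option is to generate it. A sketch only: it assumes the customer databases can be identified by name (the `N'Customer%'` filter is a placeholder for your actual naming convention) and that the instance supports `CREATE OR ALTER` (SQL Server 2016 SP1 or later):

```sql
-- Sketch: build the UNION ALL view body from sys.databases.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT [CustomerTransactionid], [Customerid], [QuantityBought] FROM '
    + QUOTENAME(name) + N'.[dbo].[CustomerTransaction]'
FROM sys.databases
WHERE name LIKE N'Customer%';   -- placeholder filter

SET @sql = N'CREATE OR ALTER VIEW dbo.CustomerTransaction AS ' + @sql + N';';

-- The three-part name runs the batch in the reporting DB,
-- so the view is created there.
EXEC Reportingdb.sys.sp_executesql @sql;
```

Re-running the script picks up newly added customer databases automatically.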

(2) We can also create a stored procedure that inserts into one large table. However, there will be a time lag between runs, and we want up-to-the-minute data.

insert into ReportingDB.[dbo].[customertransaction] 
select * from CustomerOne.[dbo].[customertransaction]
UNION ALL
select * from CustomerTwo.[dbo].[customertransaction]
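Scaled to 700 databases, that insert would be generated in a loop rather than written by hand. A sketch, with the same placeholder `N'Customer%'` name filter, assuming the reporting table declares `CustomerTransactionid` as a plain `int` (no IDENTITY) so the source values can be inserted directly:

```sql
-- Sketch: full reload of the reporting copy from every customer DB.
CREATE PROCEDURE dbo.ReloadCustomerTransaction
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @db sysname, @sql nvarchar(max);

    TRUNCATE TABLE ReportingDB.dbo.CustomerTransaction;

    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.databases WHERE name LIKE N'Customer%';
    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'INSERT INTO ReportingDB.dbo.CustomerTransaction
                         (CustomerTransactionid, Customerid, QuantityBought)
                     SELECT CustomerTransactionid, Customerid, QuantityBought
                     FROM ' + QUOTENAME(@db) + N'.dbo.CustomerTransaction;';
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM dbs INTO @db;
    END
    CLOSE dbs; DEALLOCATE dbs;
END;
```

Note that transaction ids repeat across customer databases; a real version would likely also record the source database name in the reporting table.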


Best Answer

Given that the data is in 700 separate databases and likely to stay there, and that you want reporting to reflect the current state of the transactional databases, I see basically two choices: send the query to the data, or send the data to the query.

Your view would be an example of sending the query to the data. Each client DB performs some of the work and the reporting DB collates and formats the results. There will be a couple of challenges with this. Cross-database queries are slower than single-database queries. I don't have numbers to quote, but this is the received wisdom. Likely the various databases will be accessed serially - first DB1 is read, then DB2, then DB3, etc. - meaning the reporting query's execution time will be (at least) 700 times an individual DB's query time. Ouch. There may be clever tricks to force parallel processing in this circumstance. I don't know of any. Certainly a SQL CLR procedure could be written to spawn a thread for each client DB, then collate the results. A sort of Hadoop Lite, if you will. If I were going to those lengths, however, I think I'd prefer to have BIML generate the necessary SSIS packages and run them on demand. It would be a close-run thing to decide which I disliked more - 700 extra threads running inside the SQL Server process or 700 SSIS packages running simultaneously.

By sending the data to the query I mean having a second copy of the data in a single reporting database. It is against this copy that report queries run. This strategy comes in two flavours - pull and push. Pull is the reporting DB sucking data from client DBs according to the reporting DB's schedule and requirements. Your second suggestion using a stored procedure would fall into this approach. BIML-generated SSIS packages could also be used for this. It will have the normal issues with polling (figuring out what's changed, wasted cycles when nothing has). SQL Server has features to help with this (Change Tracking and Change Data Capture), but still.
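Change Tracking, for instance, can be switched on per client database so the pull job fetches only what changed since its last run, rather than the whole table. A sketch for one client DB (the `@last_sync` value is a placeholder; in practice the job would persist the last synced version per database):

```sql
-- Sketch: enable Change Tracking on one client DB.
ALTER DATABASE CustomerOne SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO
USE CustomerOne;
GO
ALTER TABLE dbo.CustomerTransaction ENABLE CHANGE_TRACKING;
GO
-- Pull only rows changed since the last sync.
DECLARE @last_sync bigint = 0;  -- placeholder; persist the real value per DB

SELECT ct.CustomerTransactionid, t.Customerid, t.QuantityBought,
       ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.CustomerTransaction, @last_sync) AS ct
LEFT JOIN dbo.CustomerTransaction AS t
       ON t.CustomerTransactionid = ct.CustomerTransactionid;
```

Deleted rows appear in the change table but not in the join, which is why the join is a LEFT JOIN.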

Push techniques are generally favoured because they avoid the polling overhead. A simple trigger may suffice, duplicating the data to the reporting DB or a staging table on each write to a client DB. This is extra work in a client-facing transaction, however, and an extra opportunity for failure. For this reason asynchronous techniques would be preferable. One option is to have the application put a message on a reliable service which the reporting application picks up and processes. There is even a message service baked into SQL Server which could be used (Service Broker). Processing, coordination, failure and recovery are all your concern in client code, however.
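A minimal sketch of the synchronous trigger flavour, again assuming the reporting table's id column is a plain `int`, and covering inserts only (updates and deletes would need the same treatment):

```sql
USE CustomerOne;
GO
-- Sketch: push each insert straight into the reporting copy. This runs
-- inside the client's transaction, so a failure here rolls back the
-- customer's write too - the risk described above.
CREATE TRIGGER dbo.trg_CustomerTransaction_Push
ON dbo.CustomerTransaction
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO ReportingDB.dbo.CustomerTransaction
           (CustomerTransactionid, Customerid, QuantityBought)
    SELECT CustomerTransactionid, Customerid, QuantityBought
    FROM inserted;
END;
```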

SQL Server comes with a number of replication options. These reliably copy data from a source to a destination. Initiation, synchronisation and recovery are all available. I think this is likely to be your best option. Configure SQL Server to replicate changes in client DBs asynchronously but reliably to the reporting DB. Run queries off this reporting DB.
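For transactional replication the gist, per client database, looks roughly like this. A sketch only: a distributor must already be configured on the instance, the snapshot and log reader agents do the actual work, and in practice this would be scripted once per client database:

```sql
USE CustomerOne;
GO
-- Sketch: mark the DB for publication, publish the table,
-- then subscribe the reporting DB to the publication.
EXEC sp_replicationdboption
    @dbname  = N'CustomerOne',
    @optname = N'publish',
    @value   = N'true';

EXEC sp_addpublication
    @publication = N'CustomerOne_Reporting',
    @status      = N'active';

EXEC sp_addarticle
    @publication   = N'CustomerOne_Reporting',
    @article       = N'CustomerTransaction',
    @source_owner  = N'dbo',
    @source_object = N'CustomerTransaction';

DECLARE @subscriber sysname = @@SERVERNAME;  -- same instance in this scenario

EXEC sp_addsubscription
    @publication       = N'CustomerOne_Reporting',
    @subscriber        = @subscriber,
    @destination_db    = N'ReportingDB',
    @subscription_type = N'Push';
```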