Sql-server – How to create sums/counts of grouped items over multiple tables

sql serversql server 2014sql-server-2012window functions

DISCLAIMER: The title might be misleading.

Introduction

I have aquired the wonderful project of bringing all our SQL Server instances up-to-date, which means I have to ensure that the approrpriaate Service Packs are applied to each instance.

We have a database tools (abbreviated as DBT) application that stores all the information of our MySQL, PostgreSQL, Microsoft SQL Server and Oracle RDBMS databases in one location.

This DBT-application links a given database to an application, the database to an instance and the instance to a server, and of course the database to a responsible person.

A database will and can have a lot of additional information (DB Version, Status, Project Manager, Database Manager, …) which I have kept out of the description to simplify the explanation.

To get the project going I wanted to output a list of unique SQL Servers with a sum of the databases and grouped by all other relevant information.
The idea was to have an overview of the SQL Servers having the most databases and the highest complexity (users, applications, instances).

TL;DR

Here is a sample of already summarized data and what I was expecting to achieve

Sample result set

SRV_NAME             INST_NAME            DB_NAME              USER_NAME            APPL_NAME
-------------------- -------------------- -------------------- -------------------- --------------------
SQLSRV_01            ANOTHER              HIS_DB               HIM                  TELLTAIL            
SQLSRV_01            ANOTHER              RZO_P4               YOU                  PSB IZQ             
SQLSRV_01            GENERAL              MY_DB2               ME                   HAL_2000            
SQLSRV_01            GENERAL              MY_DB3               ME                   HAL_2000            
SQLSRV_01            GENERAL              MY_DB4               ME                   HAL_2000            
SQLSRV_01            GENERAL              RZO_6_4              ME                   RZO_6.4             
SQLSRV_01            GENERAL              RZO_6_4_1            ME                   RZO_6.4             
SQLSRV_01            GENERAL              RZO_6_4_2            YOU                  RZO_6.4             
SQLSRV_01            GENERAL              YOUR_DB2             YOU                  HAL_2000            
SQLSRV_01            SECURE               DB1                  ME                   HAL_2000            
SQLSRV_01            SECURE               PURCHGRAV            HER                  PURCHGRAV           
SQLSRV_01            SECURE               TELLTAIL             HER                  TELLTAIL            

Expected results after further grouping by / order over

SRV_NAME             GRP_CNT_INST_NAME    SUM_DB_NAME          GRP_CNT_USER_NAME    GRP_CNT_APPL_NAME
-------------------- -------------------- -------------------- -------------------- --------------------
SQLSRV_01            3                    12                   4                    5

Explanation of expected results

The SQL Server SQLSRV_01 in the example has three (3) unique instances, twelve (12) databases total, four (4) responsible persons and five (5) applications linked to the databases. This is the summary of the sample data above.

Applying this to the whole DBT database would provide me with an overview of the most complex systems.

Reference Material already consulted

The long Version

Following the data and definitions for each table involved in the query. And at the end, the steps I have so far accomplished.

Table [DBT].[Server]

Data

 ID | SRV_NAME  | ... 
----+-----------+----
  1 | SQLSRV_01 |     
  2 | SQLSRV_11 |     
  3 | SQLSRV_21 |     

Definition

CREATE TABLE [DBT].[Server](
    [ID] [int] NOT NULL,
    [SRV_NAME] [nchar](20) NULL
) ON [PRIMARY]
END
GO
ALTER AUTHORIZATION ON [DBT].[Server] TO  SCHEMA OWNER 
GO
SET ANSI_PADDING ON
GO
CREATE UNIQUE CLUSTERED INDEX [CL_UX_Server_ALL] ON [DBT].[Server]
(
    [ID] ASC,
    [SRV_NAME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Table [DBT].[Instance]

Data

 ID | INST_NAME   | SRV_ID | ...
----+-------------+--------+----
  1 | GENERAL     |      1 |
  2 | SECURE      |      1 |
  3 | ANOTHER     |      1 |
  4 | GENERAL     |      2 |
  5 | MSSQLSRV    |      3 |
  6 | MSSQLSRV    |      2 |
  7 | PRODUCTION  |      2 |
  8 | TESTING     |      3 |
... |             |        | 

Definition

CREATE TABLE [DBT].[Instance](
    [ID] [int] NOT NULL,
    [INST_NAME] [nchar](20) NOT NULL,
    [SRV_ID] [int] NOT NULL
) ON [PRIMARY]
END
GO
ALTER AUTHORIZATION ON [DBT].[Instance] TO  SCHEMA OWNER 
GO
SET ANSI_PADDING ON
GO
CREATE UNIQUE CLUSTERED INDEX [CL_UX_Instance_ALL] ON [DBT].[Instance]
(
    [ID] ASC,
    [INST_NAME] ASC,
    [SRV_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Table [DBT].[Application]

Data

 ID | APPL_NAME  | ... 
----+------------+-----
  1 | HAL_2000   |     
  2 | RZO_6.4    |     
  3 | PSB IZQ    |     
  4 | TELLTAIL   |     
  5 | PURCHGRAV  |     
... |            |     

Definition

CREATE TABLE [DBT].[Application](
    [ID] [int] NOT NULL,
    [APPL_NAME] [nchar](20) NOT NULL
) ON [PRIMARY]
END
GO
ALTER AUTHORIZATION ON [DBT].[Application] TO  SCHEMA OWNER 
GO
SET ANSI_PADDING ON
GO
CREATE UNIQUE CLUSTERED INDEX [CL_UX_Application_ALL] ON [DBT].[Application]
(
    [ID] ASC,
    [APPL_NAME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Table [DBT].[User]

Data

 ID | USER_NAME  | ... 
----+------------+-----
  1 | ME         |     
  2 | YOU        |     
  3 | HIM        |     
  4 | HER        |     
  5 | THE OTHERS |     
  6 | ALIENS     |     
... |            |     

Definition

CREATE TABLE [DBT].[User](
    [ID] [int] NOT NULL,
    [USER_NAME] [nchar](20) NOT NULL
) ON [PRIMARY]
END
GO
ALTER AUTHORIZATION ON [DBT].[User] TO  SCHEMA OWNER 
GO
SET ANSI_PADDING ON
GO
CREATE UNIQUE CLUSTERED INDEX [CL_UX_User_ALL] ON [DBT].[User]
(
    [ID] ASC,
    [USER_NAME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Table [DBT].[Database]

Data

 ID | DB_NAME    | INST_ID | APPL_ID | USER_ID | ...
----+------------+---------+---------+---------+-----
  1 | MY_DB2     |       1 |       1 |       1 | 
  2 | YOUR_DB2   |       1 |       1 |       2 | 
  3 | RZO_6_4    |       1 |       2 |       1 | 
  4 | DB1        |       2 |       1 |       1 | 
  5 | TELLTAIL   |       2 |       4 |       4 | 
  6 | PURCHGRAV  |       2 |       5 |       4 | 
  7 | HIS_DB     |       3 |       4 |       3 | 
  8 | RZO_P4     |       3 |       3 |       2 | 
  9 | PURCH      |       4 |       5 |       2 | 
 10 | YOUR_DB    |       5 |       4 |       2 | 
 11 | HER_DB     |       6 |       4 |       4 | 
 12 | TEST_PURCH |       6 |       5 |       5 | 
 13 | PROD_PURCH |       7 |       5 |       5 | 
 14 | TELLTAIL   |       7 |       4 |       4 | 
 15 | IZQ_TEST   |       8 |       3 |       3 | 
 16 | IZQ_PROD   |       7 |       2 |       2 | 
 17 | HAL_CA1    |       5 |       1 |       3 | 
 18 | MY_DB3     |       1 |       1 |       1 | 
 19 | MY_DB4     |       1 |       1 |       1 | 
 20 | RZO_6_4_1  |       1 |       2 |       1 | 
 21 | RZO_6_4_2  |       1 |       2 |       2 | 
 22 | HAL_CA1_1  |       5 |       1 |       3 | 
 23 | HAL_CA1_2  |       5 |       1 |       6 | 
... |

Definition

CREATE TABLE [DBT].[Database](
    [ID] [int] NOT NULL,
    [DB_NAME] [nchar](20) NOT NULL,
    [INST_ID] [int] NOT NULL,
    [APPL_ID] [int] NOT NULL,
    [USER_ID] [int] NOT NULL
) ON [PRIMARY]
END
GO
ALTER AUTHORIZATION ON [DBT].[Database] TO  SCHEMA OWNER 
GO
SET ANSI_PADDING ON
GO
CREATE UNIQUE CLUSTERED INDEX [CL_UX_Database_ID_DB_NAME_INST_ID] ON [DBT].[Database]
(
    [ID] ASC,
    [DB_NAME] ASC,
    [INST_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

That's all for now.

Selecting all the information

The first statement was my starting point for selecting the base information.

SELECT s.[SRV_NAME], i.[INST_NAME], d.[DB_NAME], u.[USER_NAME], a.[APPL_NAME]
FROM   [DBT].[Server]            AS s
       JOIN [DBT].[Instance]     AS i
            ON  s.ID = i.SRV_ID
       JOIN [DBT].[Database]     AS d
            ON  i.[ID] = d.[INST_ID]
       JOIN [DBT].[Application]  AS a
            ON  d.[APPL_ID] = a.[ID]
       JOIN [DBT].[User]         AS u
            ON  u.ID = d.[USER_ID]
ORDER BY 1, 2, 3, 4, 5

This results in the following records being returned and is the long version of the sample data in the introduction:

SRV_NAME             INST_NAME            DB_NAME              USER_NAME            APPL_NAME
-------------------- -------------------- -------------------- -------------------- --------------------
SQLSRV_01            ANOTHER              HIS_DB               HIM                  TELLTAIL            
SQLSRV_01            ANOTHER              RZO_P4               YOU                  PSB IZQ             
SQLSRV_01            GENERAL              MY_DB2               ME                   HAL_2000            
SQLSRV_01            GENERAL              MY_DB3               ME                   HAL_2000            
SQLSRV_01            GENERAL              MY_DB4               ME                   HAL_2000            
SQLSRV_01            GENERAL              RZO_6_4              ME                   RZO_6.4             
SQLSRV_01            GENERAL              RZO_6_4_1            ME                   RZO_6.4             
SQLSRV_01            GENERAL              RZO_6_4_2            YOU                  RZO_6.4             
SQLSRV_01            GENERAL              YOUR_DB2             YOU                  HAL_2000            
SQLSRV_01            SECURE               DB1                  ME                   HAL_2000            
SQLSRV_01            SECURE               PURCHGRAV            HER                  PURCHGRAV           
SQLSRV_01            SECURE               TELLTAIL             HER                  TELLTAIL            
SQLSRV_11            GENERAL              PURCH                YOU                  PURCHGRAV           
SQLSRV_11            MSSQLSRV             HER_DB               HER                  TELLTAIL            
SQLSRV_11            MSSQLSRV             TEST_PURCH           THE OTHERS           PURCHGRAV           
SQLSRV_11            PRODUCTION           IZQ_PROD             YOU                  RZO_6.4             
SQLSRV_11            PRODUCTION           PROD_PURCH           THE OTHERS           PURCHGRAV           
SQLSRV_11            PRODUCTION           TELLTAIL             HER                  TELLTAIL            
SQLSRV_21            MSSQLSRV             HAL_CA1              HIM                  HAL_2000            
SQLSRV_21            MSSQLSRV             HAL_CA1_1            HIM                  HAL_2000            
SQLSRV_21            MSSQLSRV             HAL_CA1_2            ALIENS               HAL_2000            
SQLSRV_21            MSSQLSRV             YOUR_DB              YOU                  TELLTAIL            
SQLSRV_21            TESTING              IZQ_TEST             HIM                  PSB IZQ             

Summarize by count(DB_NAME)

So I thought it would be a good idea to group by the SRV_NAME, INST_NAME, USER_NAME and APPL_NAME and then add COUNT(DB_NAME) to the select statement.

Statement

SELECT s.[SRV_NAME], i.[INST_NAME], count(d.[DB_NAME]) AS SUMDB, u.[USER_NAME], a.[APPL_NAME]
FROM   [DBT].[Server]            AS s
       JOIN [DBT].[Instance]     AS i
            ON  s.ID = i.SRV_ID
       JOIN [DBT].[Database]     AS d
            ON  i.[ID] = d.[INST_ID]
       JOIN [DBT].[Application]  AS a
            ON  d.[APPL_ID] = a.[ID]
       JOIN [DBT].[User]         AS u
            ON  u.ID = d.[USER_ID]

GROUP BY s.[SRV_NAME], i.[INST_NAME], u.[USER_NAME], a.[APPL_NAME]
ORDER BY 1, 2, 3, 4, 5

Results

SRV_NAME             INST_NAME            SUMDB       USER_NAME            APPL_NAME
-------------------- -------------------- ----------- -------------------- --------------------
SQLSRV_01            ANOTHER              1           HIM                  TELLTAIL            
SQLSRV_01            ANOTHER              1           YOU                  PSB IZQ             
SQLSRV_01            GENERAL              1           YOU                  HAL_2000            
SQLSRV_01            GENERAL              1           YOU                  RZO_6.4             
SQLSRV_01            GENERAL              2           ME                   RZO_6.4             
SQLSRV_01            GENERAL              3           ME                   HAL_2000            
SQLSRV_01            SECURE               1           HER                  PURCHGRAV           
SQLSRV_01            SECURE               1           HER                  TELLTAIL            
SQLSRV_01            SECURE               1           ME                   HAL_2000            
SQLSRV_11            GENERAL              1           YOU                  PURCHGRAV           
SQLSRV_11            MSSQLSRV             1           HER                  TELLTAIL            
SQLSRV_11            MSSQLSRV             1           THE OTHERS           PURCHGRAV           
SQLSRV_11            PRODUCTION           1           HER                  TELLTAIL            
SQLSRV_11            PRODUCTION           1           THE OTHERS           PURCHGRAV           
SQLSRV_11            PRODUCTION           1           YOU                  RZO_6.4             
SQLSRV_21            MSSQLSRV             1           ALIENS               HAL_2000            
SQLSRV_21            MSSQLSRV             1           YOU                  TELLTAIL            
SQLSRV_21            MSSQLSRV             2           HIM                  HAL_2000            
SQLSRV_21            TESTING              1           HIM                  PSB IZQ        

As you can see by the results there is further potential to summarise (group?) for example by the INST_NAME, USER_NAME, APPL_NAME to have an overview of the most complex systems.

Group over INST_NAME, USER_NAME and APPL_NAME

So basically I would like to have a summary of each unique (sub-)item based on the server as explained in the introduction:

SRV_NAME             GRP_CNT_INST_NAME    SUM_DB_NAME          GRP_CNT_USER_NAME    GRP_CNT_APPL_NAME
-------------------- -------------------- -------------------- -------------------- --------------------
SQLSRV_01            3                    12                   4                    5

Hmmm. Looking at the books online I have the option of the OVER Clause (Transact-SQL) and partitioning on the relevant columns. But then I might be misinterpreting the description.

Statement

SELECT s.[SRV_NAME], 
COUNT(i.[INST_NAME]) OVER (PARTITION by i.[INST_NAME]) as GRP_CNT_INST_NAME, 
COUNT(d.[DB_NAME]) AS SUMDB, 
COUNT(u.[USER_NAME]) OVER (PARTITION by u.[USER_NAME]) as GRP_CNT_USER_NAME, 
COUNT(a.[APPL_NAME]) OVER (PARTITION by a.[APPL_NAME]) as GRP_CNT_APPL_NAME
FROM   [DBT].[Server]            AS s
       JOIN [DBT].[Instance]     AS i
            ON  s.ID = i.SRV_ID
       JOIN [DBT].[Database]     AS d
            ON  i.[ID] = d.[INST_ID]
       JOIN [DBT].[Application]  AS a
            ON  d.[APPL_ID] = a.[ID]
       JOIN [DBT].[User]         AS u
            ON  u.ID = d.[USER_ID]

GROUP BY s.[SRV_NAME]--, i.[INST_NAME], u.[USER_NAME], a.[APPL_NAME]
ORDER BY 1, 2, 3, 4, 5

Results

SRV_NAME             GRP_CNT_INST_NAME SUMDB       GRP_CNT_USER_NAME GRP_CNT_APPL_NAME
-------------------- ----------------- ----------- ----------------- -----------------
SQLSRV_01            2                 1           3                 5
SQLSRV_01            2                 1           6                 2
SQLSRV_01            3                 1           3                 5
SQLSRV_01            3                 1           4                 4
SQLSRV_01            3                 1           4                 5
SQLSRV_01            5                 1           6                 3
SQLSRV_01            5                 1           6                 5
SQLSRV_01            5                 2           3                 3
SQLSRV_01            5                 3           3                 5
...
...

That doesn't look like what I was expecting to achieve. But then again, I might need a totally different approach.

Question

I'm still trying to find the right way to summarise each sub-item so as to have an overview of the most complex systems. What is a possible solution to my problem?

Best Answer

It appears you want COUNT(DISTINCT), which gives you the count of unique values in a column – seems to be exactly what you want.

SELECT s.[SRV_NAME],
       GRP_CNT_INST_NAME = COUNT(DISTINCT i.[INST_NAME]),
       SUM_DB_NAME       = COUNT(*),
       GRP_CNT_USER_NAME = COUNT(DISTINCT u.[USER_NAME]),
       GRP_CNT_APPL_NAME = COUNT(DISTINCT a.[APPL_NAME])
FROM   [DBT].[Server]            AS s
       JOIN [DBT].[Instance]     AS i
            ON  s.ID = i.SRV_ID
       JOIN [DBT].[Database]     AS d
            ON  i.[ID] = d.[INST_ID]
       JOIN [DBT].[Application]  AS a
            ON  d.[APPL_ID] = a.[ID]
       JOIN [DBT].[User]         AS u
            ON  u.ID = d.[USER_ID]
GROUP BY s.[SRV_NAME];

Based on your joins, it appears that DB rows are going to be unique, so you probably do not need COUNT(DISTINCT) in that specific instance. If the output should reflect the number of actual databases, counting distinct names can give you a skewed result, since different instances might have databases with identical names and COUNT(DISTINCT) would see that as a single item.

On the other hand, there would probably be no issue if you counted IDs rather than names:

SELECT s.[SRV_NAME],
       GRP_CNT_INST_NAME = COUNT(DISTINCT i.[ID]),
       SUM_DB_NAME       = COUNT(DISTINCT d.[ID]),
       GRP_CNT_USER_NAME = COUNT(DISTINCT u.[ID]),
       GRP_CNT_APPL_NAME = COUNT(DISTINCT a.[ID])
FROM   [DBT].[Server]            AS s
       JOIN [DBT].[Instance]     AS i
            ON  s.ID = i.SRV_ID
       JOIN [DBT].[Database]     AS d
            ON  i.[ID] = d.[INST_ID]
       JOIN [DBT].[Application]  AS a
            ON  d.[APPL_ID] = a.[ID]
       JOIN [DBT].[User]         AS u
            ON  u.ID = d.[USER_ID]
GROUP BY s.[SRV_NAME];