Depending on your permissions, the linked server could be streaming all of the data over locally and then doing the filtering.
You might be able to skip that pain by computing the aggregate counts into a table on the local server first, and then joining against that.
CREATE TABLE #LOCAL
(
package_uuid nvarchar(255) NOT NULL PRIMARY KEY CLUSTERED
, [count] bigint
);
INSERT INTO
#LOCAL
SELECT
p.package_uuid
, count(d.external_identification) AS [count]
FROM
ServerB.DATABASE.dbo.package p
INNER JOIN
ServerB.DATABASE.dbo.doc2 d
ON p.package_id = d.package_id
GROUP BY
p.package_uuid;
Try running that query directly on ServerB first to get a sense of the theoretical throughput without your network as a factor. You can then make some quick-and-dirty estimates based on data sizes (up to roughly 510 bytes for the nvarchar(255) key plus 8 bytes for the bigint per row in the temporary table), and the rest depends on your network. Hopefully this is all on a local network.
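As a rough illustration (the table name is taken from the query above, and the 510 + 8 bytes per row is an upper bound for the nvarchar(255) key plus the bigint count), you could estimate the temp table's footprint directly on ServerB:

```sql
-- Back-of-the-envelope size estimate, run directly on ServerB.
-- Assumes the same DATABASE.dbo.package table as in the query above.
SELECT
      COUNT(DISTINCT p.package_uuid)                               AS [estimated_rows]
    , COUNT(DISTINCT p.package_uuid) * (510 + 8) / 1024.0 / 1024.0 AS [estimated_mb]
FROM DATABASE.dbo.package p;
```

In practice most uuid values will be far shorter than 255 characters, so treat the result as a ceiling, not a prediction.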
If the time is significantly different between the run on ServerB and pulling the results back over, then you might need to use the OPENQUERY syntax to force the join to happen on the remote server. The code would look approximately like this:
CREATE TABLE #LOCAL
(
package_uuid nvarchar(255) NOT NULL PRIMARY KEY CLUSTERED
, [count] bigint
);
INSERT INTO
#LOCAL
SELECT
OQ.package_uuid
, OQ.[count]
FROM
OPENQUERY(ServerB,
N'
SELECT
p.package_uuid
, count(d.external_identification) AS [count]
FROM
DATABASE.dbo.package p
INNER JOIN
DATABASE.dbo.doc2 d
ON p.package_id = d.package_id
GROUP BY
p.package_uuid
') AS OQ;
Even though you fixed the immediate rounding issue, the overall algorithm to get per-object / index stats is incorrect. It does not properly handle LOB and row-overflow data. It also excludes: Indexed Views, FullText indexes, XML indexes, and a few other cases. Hence, you might not be seeing all of your data.
The following is an adaptation of the code I posted in an answer on Stack Overflow ( sp_spaceused - How to measure the size in GB in all the tables in SQL ) that handles all of the cases that sp_spaceused handles. That S.O. question was only concerned with per-object stats, not per-index, so I have adjusted the code to report at the index level.
;WITH agg AS
( -- Get info for Tables, Indexed Views, etc
SELECT ps.[object_id] AS [ObjectID],
ps.index_id AS [IndexID],
NULL AS [ParentIndexID],
NULL AS [PassThroughIndexName],
NULL AS [PassThroughIndexType],
SUM(ps.in_row_data_page_count) AS [InRowDataPageCount],
SUM(ps.used_page_count) AS [UsedPageCount],
SUM(ps.reserved_page_count) AS [ReservedPageCount],
SUM(ps.row_count) AS [RowCount],
SUM(ps.lob_used_page_count + ps.row_overflow_used_page_count)
AS [LobAndRowOverflowUsedPageCount]
FROM sys.dm_db_partition_stats ps
GROUP BY ps.[object_id],
ps.[index_id]
UNION ALL
-- Get info for FullText indexes, XML indexes, Spatial indexes, etc
SELECT sit.[parent_id] AS [ObjectID],
sit.[object_id] AS [IndexID],
sit.[parent_minor_id] AS [ParentIndexID],
sit.[name] AS [PassThroughIndexName],
sit.[internal_type_desc] AS [PassThroughIndexType],
0 AS [InRowDataPageCount],
SUM(ps.used_page_count) AS [UsedPageCount],
SUM(ps.reserved_page_count) AS [ReservedPageCount],
0 AS [RowCount],
0 AS [LobAndRowOverflowUsedPageCount]
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.internal_tables sit
ON sit.[object_id] = ps.[object_id]
WHERE sit.internal_type IN
(202, 204, 207, 211, 212, 213, 214, 215, 216, 221, 222, 236)
GROUP BY sit.[parent_id],
sit.[object_id],
sit.[parent_minor_id],
sit.[name],
sit.[internal_type_desc]
), spaceused AS
(
SELECT agg.[ObjectID],
agg.[IndexID],
agg.[ParentIndexID],
agg.[PassThroughIndexName],
agg.[PassThroughIndexType],
OBJECT_SCHEMA_NAME(agg.[ObjectID]) AS [SchemaName],
OBJECT_NAME(agg.[ObjectID]) AS [TableName],
SUM(CASE
WHEN (agg.IndexID < 2) THEN agg.[RowCount]
ELSE 0
END) AS [Rows],
SUM(agg.ReservedPageCount) * 8 AS [ReservedKB],
SUM(agg.LobAndRowOverflowUsedPageCount +
CASE
WHEN (agg.IndexID < 2) THEN (agg.InRowDataPageCount)
ELSE 0
END) * 8 AS [DataKB],
SUM(agg.UsedPageCount - agg.LobAndRowOverflowUsedPageCount -
CASE
WHEN (agg.IndexID < 2) THEN agg.InRowDataPageCount
ELSE 0
END) * 8 AS [IndexKB],
SUM(agg.ReservedPageCount - agg.UsedPageCount) * 8 AS [UnusedKB],
SUM(agg.UsedPageCount) * 8 AS [UsedKB]
FROM agg
GROUP BY agg.[ObjectID],
agg.[IndexID],
agg.[ParentIndexID],
agg.[PassThroughIndexName],
agg.[PassThroughIndexType],
OBJECT_SCHEMA_NAME(agg.[ObjectID]),
OBJECT_NAME(agg.[ObjectID])
)
SELECT sp.SchemaName,
sp.TableName,
sp.IndexID,
CASE
WHEN (sp.IndexID > 0) THEN COALESCE(si.[name], sp.[PassThroughIndexName])
ELSE N'<Heap>'
END AS [IndexName],
sp.[PassThroughIndexName] AS [InternalTableName],
sp.[Rows],
sp.ReservedKB,
(sp.ReservedKB / 1024.0 / 1024.0) AS [ReservedGB],
sp.DataKB,
(sp.DataKB / 1024.0 / 1024.0) AS [DataGB],
sp.IndexKB,
(sp.IndexKB / 1024.0 / 1024.0) AS [IndexGB],
sp.UsedKB AS [UsedKB],
(sp.UsedKB / 1024.0 / 1024.0) AS [UsedGB],
sp.UnusedKB,
(sp.UnusedKB / 1024.0 / 1024.0) AS [UnusedGB],
so.[type_desc] AS [ObjectType],
COALESCE(si.type_desc, sp.[PassThroughIndexType]) AS [IndexPrimaryType],
sp.[PassThroughIndexType] AS [IndexSecondaryType],
SCHEMA_ID(sp.[SchemaName]) AS [SchemaID],
sp.ObjectID
--,sp.ParentIndexID
FROM spaceused sp
INNER JOIN sys.all_objects so -- in case "WHERE so.is_ms_shipped = 0" is removed
ON so.[object_id] = sp.ObjectID
LEFT JOIN sys.indexes si
ON si.[object_id] = sp.ObjectID
AND (si.[index_id] = sp.IndexID
OR si.[index_id] = sp.[ParentIndexID])
WHERE so.is_ms_shipped = 0
--so.[name] LIKE N'' -- optional name filter
--ORDER BY ????
Best Answer
SQL Server stores data on 8 KB pages. You can't have a table that's smaller than two pages (one data page plus one IAM page). So when you add one row to the table, you have a minimum space consumption of 2 * 8 KB = 16 KB. But those pages aren't full, which means adding 15 more rows doesn't increase the table size by 15 times the initial size; it just means the data page holding the initial row had enough room for more data. If you add 1,000 more rows, I'd bet your size will increase, but by how much will vary depending on what's in a row and how many rows can fit on an 8 KB page.
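A minimal sketch of that floor, runnable in a scratch database (the table name is mine; the exact reserved figure can differ on newer versions where mixed-extent allocation is off by default):

```sql
-- Illustration only: a single-row heap still reserves whole pages.
CREATE TABLE dbo.PageDemo (id int NOT NULL, payload nvarchar(100));

INSERT INTO dbo.PageDemo (id, payload) VALUES (1, N'hello');

-- With classic mixed-extent allocation this reports roughly 16 KB reserved
-- (one data page + one IAM page); data is a single 8 KB page either way.
EXEC sys.sp_spaceused N'dbo.PageDemo';

DROP TABLE dbo.PageDemo;
```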
I don't know that it really matters how much data is in a table if you only want to look at the space on a page that's used and ignore the space that isn't. You're taking up 8 KB on disk and 8 KB in memory whether the page holds 10 bytes of data or 7,997 bytes. So you will see that most space-calculation scripts just take
(number of pages) * 8,192
because the unused space on a page is very rarely relevant.
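That formula is essentially what a query against sys.dm_db_partition_stats gives you (dbo.YourTable is a placeholder; the DMV reports page counts, so multiply by 8 for KB or by 8,192 for bytes):

```sql
-- Used space = (number of used pages) * 8,192 bytes.
SELECT
      SUM(ps.used_page_count)        AS [used_pages]
    , SUM(ps.used_page_count) * 8    AS [used_kb]    -- 8 KB per page
    , SUM(ps.used_page_count) * 8192 AS [used_bytes]
FROM sys.dm_db_partition_stats ps
WHERE ps.[object_id] = OBJECT_ID(N'dbo.YourTable');
```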