Sql-server – Consolidating 2 Tables with same Values

duplicationmigrationsql server

I am working on a project trying to consolidate two databases, DB1 and DB2, for my organization into a single database, DB3.

Initially our organization started with DB1. A few years back they made a new database called DB2 and moved on to using only DB2.

Currently, I am working on the Orders Table.

Since new rows were inserted into DB2 without importing existing rows from DB1, Order-id Column values started from 1 instead of continuing from where the Order-id Column in DB1 had ended.

I have many Order-id Column values that are the same in both Tables that represent different orders.

For example Order-id = 100 in DB1 refers to order "X", whereas Order-id = 100 in DB2 refers to order "Y".

If I insert rows from both DB1.Orders and DB2.Orders into DB3.Orders there will be duplicate rows representing different orders.

As I have not come across this sort of situation before, I need a suggested approach for dealing with this.

Best Answer

Sometimes, the lack of an uint type in SQL Server can come in handy. If Order-id is defined as identity(1,1) in DB2, you likely do not have any negative Order-ids.

Consider importing all the legacy DB1..Orders.Order-id fields as the same number, but multiplied by -1. If there's a FK to Order-id in DB1, you would use the same multiple-by-negative-one conversion when those records are loaded into their corresponding tables in DB2.

With this method, you also have the benefit of easily identifying the legacy orders originally from DB1 because their Order-id is always negative (just be sure to document this.)

Edit - added an example

-- Assumptions:
-- DB1 and DB2 are on the same server
-- The table Orders is in the dbo schema
-- Order-id is an integer-type in DB1 and DB2
-- Order-id >= 0 in DB1 and DB2
-- Order-id is an identity(a, b), where a and b >=0 in DB2

-- Overview: Insert records from DB1 into DB2, but reverse the
-- sign of Order-id (i.e., positive to negative).

Use DB2
Go

Set Identity_Insert dbo.Orders On;

Insert dbo.Orders
(Order-id, col2, col3)

Select (-1 * Order-id), col2, col3
From DB1.dbo.Orders;

Set Identity_Insert dbo.Orders Off;
Go

-- Have related records? Insert them here.

Related Solutions

Sql-server – Bitmask Flags with Lookup Tables Clarification

I partially agree with Aaron's comment - in the most general case for storing 21 unrelated pieces of information, you'd probably use 21 bit columns. As a general solution, it may well be your best solution. If you had multiple bitmask-ed varchar columns, that would translate to a row with possibly over a hundred bit flags. FYI, 21 bits get stored as 3 bytes when you don't define them as NULLable, removing the necessity for space in the NULL bitmap. Since you have multiple bitmask columns, you'd end up with every 8 bits mashed into a byte.

What SQL Server ends up doing with your multi-column queries is eventually a bunch of bitmasking routines (yes! SQL Server uses bitmasks, so they the concept per se can't be all bad!) but for average use cases, it makes life easier for you.

If we had more information about what types of queries you run, we may be able to better advise, because ultimately the use cases dictate the design.

If you persist with the COMPUTED column, I would persist and index it if you haven't already. It helps some queries, such as

exact matches

WHERE computedInt = POWER(2, 6) -- bit position 7
AND matching on 15th bit and OR matching on 2 other bits (10th and 7th)

WHERE computedInt >= Power(2,14) AND computedInt < Power(2,15) AND computedInt & (Power(2,9) + Power(2,6)) > 0

But these are probably exotic samples and yet also real live in some cases. It's certainly not too much worse than 21 individual bit columns, for which yes your statements could be easier to write, but remember that SQL Server has mashed them for storage into 3 bytes and will be doing the bit-unmasking anyway! You would have thought if bit-masking were all bad (without exception) then SQL Server wouldn't be doing it, right?

EDIT

Re the scenario of

Four flags, HasHouse,HasCar,HasCat,HasDog, 0000 is has none, 1111 is has all.

it is more efficient and logically expedient to test all 4 bits at once and do a single integer based operation, e.g.

WHERE computedInt & (POWER(2,10)+POWER(2,5)+POWER(2,3)+POWER(2,1)) = 0 -- has none
WHERE computedInt & (POWER(2,10)+POWER(2,5)+POWER(2,3)+POWER(2,1)) > 0 -- has one or more

Hypothetically, if this were your most exercised query on the table, you might even group the four columns into another computed column and index it separately, making the bitmask unnecessary (just test the resultant int with =0 and >0). You might even go further and just precompute the answer... horses for courses.

Sql-server – Question about non-clustered index storage in SQL Server

Nonclustered indexes always include a row locator.

For a heap this will be an 8 byte RID (File:Page:Slot). On a table with a clustered index it will be the clustered index key column(s). And it will always be the copied values not a pointer to the values. This duplication of CI key values into all non clustered indexes is why it is often recommended that the CI key be narrow and not frequently updated.

In the table shown in the question the Clustered index key is a 4 byte integer and potentially may also include a 4 byte uniqueifier for any duplicate key values.

In your case as the NCIs are not declared as unique the CI key will be appended to the NCI key.

For unique non clustered indexes the CI key would be added as included column(s) in the leaf pages unless explicitly made part of the key.

See Kalen Delaney: More About Nonclustered Index Keys for some additional information about how you can see this for yourself.

With these 4 rows of data all three indexes only consume a single 8KB data page.

SELECT index_id,
       index_level,
       page_count,
       record_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('people'), NULL, NULL, 'DETAILED')

Returns

+----------+-------------+------------+--------------+
| index_id | index_level | page_count | record_count |
+----------+-------------+------------+--------------+
|        1 |           0 |          1 |            4 |
|        2 |           0 |          1 |            4 |
|        3 |           0 |          1 |            4 |
+----------+-------------+------------+--------------+

The additional page shown in use by sys.allocation_units.total_pages is an IAM page. This is not used for storing data but just for tracking the pages and extents comprising the index.

Best Answer

Related Solutions

Sql-server – Bitmask Flags with Lookup Tables Clarification

Sql-server – Question about non-clustered index storage in SQL Server

Related Question