Sql-server – How to break down RELATIONAL data into HIERARCHY data structure

hierarchy sql-server sql-server-2008

I have a query that returns the following dataset:

RELATIONid MAPid  D1id   D2id  D3id
4999       4999   626    1250    7 
5000       5000   626    1250    8

For the next step, I need to bind this dataset to a treeview (a hierarchical structure), so I need to transform it into the following:

Nodeid   ParentNodeid  Header
626       null           D1
1250      626            D2
7         1250           D3
8         1250           D3

How can I get this structure from the original dataset?

I have one more favor to ask: the data structure is a (little bit) more complex than the previous one. Let's say I have a sample dataset like this:

RELATIONid MAPid  D1id   D2id  D3id
4999       4999   626    1250    7 
5000       5000   626    1250    8
5001       5001   627    1300    10 
5002       5002   627    1300    12 
5003       5003   628    1400    15

In this dataset we have three main parents (626, 627, 628), and the expected output of the transformation (CROSS APPLY) is:

Nodeid   ParentNodeid  Header
626       null           D1
1250      626            D2
7         1250           D3
8         1250           D3
627       null           D1
1300      627            D2
10        1300           D3
12        1300           D3
628       null           D1
1400      628            D2
15        1400           D3

Please note that the rows are ordered by parent node, with each parent followed by its child nodes.

Best Answer

A CROSS APPLY would seem perfect for the job:

SELECT
  v.Nodeid,
  v.ParentNodeid,
  v.Header
FROM
  dbo.atable
  CROSS APPLY
  (
    VALUES
    (D1id, NULL, 'D1'),
    (D2id, D1id, 'D2'),
    (D3id, D2id, 'D3')
  ) AS v (Nodeid, ParentNodeid, Header)
;

For every row of the source dataset, CROSS APPLY produces three rows. The VALUES row constructor explicitly specifies which column of the original set goes into which new column.
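To try this without creating a table, here is a minimal sketch that feeds the two sample rows from the question through the same CROSS APPLY. The inline VALUES derived table standing in for dbo.atable is an assumption made only for illustration:

```sql
-- Stand-in for dbo.atable: the two sample rows from the question.
SELECT
  v.Nodeid,
  v.ParentNodeid,
  v.Header
FROM
  (
    VALUES
    (4999, 4999, 626, 1250, 7),
    (5000, 5000, 626, 1250, 8)
  ) AS t (RELATIONid, MAPid, D1id, D2id, D3id)
  CROSS APPLY
  (
    -- One output row per hierarchy level of each source row.
    VALUES
    (t.D1id, NULL  , 'D1'),
    (t.D2id, t.D1id, 'D2'),
    (t.D3id, t.D2id, 'D3')
  ) AS v (Nodeid, ParentNodeid, Header)
;
```

This returns six rows (three per source row), with (626, NULL, 'D1') and (1250, 626, 'D2') each appearing twice because both source rows share the same D1id and D2id.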

Now, the above will return duplicates if some pairs in your source repeat. You can suppress them with DISTINCT:

SELECT DISTINCT
  v.Nodeid,
  v.ParentNodeid,
  v.Header
...

In order for the transformed set to follow the order of D1id ASC, D2id ASC, D3id ASC, you could include those columns in the output and use them for sorting:

SELECT DISTINCT
  t.D1id,
  t.D2id,
  t.D3id,
  v.Nodeid,
  v.ParentNodeid,
  v.Header
FROM
  dbo.atable AS t
  CROSS APPLY
  (
    VALUES
    (t.D1id, NULL  , 'D1'),
    (t.D2id, t.D1id, 'D2'),
    (t.D3id, t.D2id, 'D3')
  ) AS v (Nodeid, ParentNodeid, Header)
ORDER BY
  t.D1id ASC,
  t.D2id ASC,
  t.D3id ASC
;

You have to include them in SELECT because, with DISTINCT, you may only sort by columns that appear in the SELECT clause. Naturally, the result set will include the three extra columns as well. If you do not want them in the output, you can use the above as a derived table: your outer SELECT would pull only the three required columns and sort by the other three:

SELECT
  Nodeid,
  ParentNodeid,
  Header
FROM
(
  SELECT DISTINCT
    t.D1id,
    t.D2id,
    t.D3id,
    v.Nodeid,
    v.ParentNodeid,
    v.Header
  FROM
    dbo.atable AS t
    CROSS APPLY
    (
      VALUES
      (t.D1id, NULL  , 'D1'),
      (t.D2id, t.D1id, 'D2'),
      (t.D3id, t.D2id, 'D3')
    ) AS v (Nodeid, ParentNodeid, Header)
) AS s
ORDER BY
  D1id ASC,
  D2id ASC,
  D3id ASC
;

Alternatively, you could use GROUP BY instead of DISTINCT and thus return the sorted three-column set without a derived table:

SELECT
  v.Nodeid,
  v.ParentNodeid,
  v.Header
FROM
  dbo.atable AS t
  CROSS APPLY
  (
    VALUES
    (t.D1id, NULL  , 'D1'),
    (t.D2id, t.D1id, 'D2'),
    (t.D3id, t.D2id, 'D3')
  ) AS v (Nodeid, ParentNodeid, Header)
GROUP BY
  t.D1id,
  t.D2id,
  t.D3id,
  v.Nodeid,
  v.ParentNodeid,
  v.Header
ORDER BY
  t.D1id ASC,
  t.D2id ASC,
  t.D3id ASC
;
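One caveat with the sorted variants above: because t.D3id is part of the DISTINCT or GROUP BY list, a D1 or D2 node shared by several source rows is still returned once per source row, and rows that tie on the three sort columns come back in no guaranteed order. A possible workaround, offered as a sketch rather than as part of the original answer, is to emit per-level sort keys from the VALUES constructor itself. The column names SortD1, SortD2 and SortD3 are hypothetical, and the inline sample data stands in for dbo.atable:

```sql
SELECT
  v.Nodeid,
  v.ParentNodeid,
  v.Header
FROM
  (
    VALUES
    (4999, 4999, 626, 1250, 7),
    (5000, 5000, 626, 1250, 8),
    (5001, 5001, 627, 1300, 10),
    (5002, 5002, 627, 1300, 12),
    (5003, 5003, 628, 1400, 15)
  ) AS t (RELATIONid, MAPid, D1id, D2id, D3id)
  CROSS APPLY
  (
    -- Each level carries only the keys of its own path, so a shared
    -- D1 or D2 node produces identical rows that GROUP BY can collapse.
    VALUES
    (t.D1id, NULL  , 'D1', t.D1id, NULL  , NULL  ),
    (t.D2id, t.D1id, 'D2', t.D1id, t.D2id, NULL  ),
    (t.D3id, t.D2id, 'D3', t.D1id, t.D2id, t.D3id)
  ) AS v (Nodeid, ParentNodeid, Header, SortD1, SortD2, SortD3)
GROUP BY
  v.Nodeid,
  v.ParentNodeid,
  v.Header,
  v.SortD1,
  v.SortD2,
  v.SortD3
ORDER BY
  -- SQL Server sorts NULLs first in ascending order, so each parent
  -- precedes its children.
  v.SortD1 ASC,
  v.SortD2 ASC,
  v.SortD3 ASC
;
```

This relies on SQL Server placing NULLs first in an ascending sort, which puts every D1 row ahead of its D2 row and every D2 row ahead of its D3 rows.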

Related Solutions

Sql-server – Database structure, relational or data warehouse

It seems you want to aggregate location-based rainfall statistics over time. A database structure like the one below would let you do that. The 'data source' could be just a filename, or some indication of where the data came from.

create table DimDataSource (
       DataSourceID      int identity (1,1) not null
      ,DataSourceDesc    nvarchar (100)  -- May need unicode for file names
)
go

alter table DimDataSource
  add constraint PK_DataSource
      primary key clustered (DataSourceID)
go

create table DimLocation (
       LocationID        int identity (1,1) not null
      ,LocationDesc      varchar (50)
)
go

alter table DimLocation
  add constraint PK_Location
      primary key clustered (LocationID)
go

create table DimDate (
       DateID           smalldatetime not null  -- 'Date' is a reserved word
      ,MonthID          int not null
      ,MonthDesc        varchar (15)
      ,QuarterID        int not null
      ,QuarterDesc      varchar (15)
      ,YearID           int not null
)
go

alter table DimDate
  add constraint PK_Date
      primary key clustered (DateID)
go

create table DimTime (
       TimeID           time not null  -- 'Time' is a reserved word
      ,Hour             int not null
)
go

alter table DimTime
  add constraint PK_Time
      primary key clustered (TimeID)
go


-- If the table is <50GB, don't bother with partitioning, but put a clustered
-- index on DateID or LocationID and DateID, depending on how you normally expect
-- to query the data.

create table FactRainfall (
       RainfallID        int identity (1,1) not null -- May need bigint if >2.1B rows.
                                                     -- SSAS likes an identity column for
                                                     -- incremental loads
      ,DataSourceID      int not null
      ,LocationID        int not null
      ,DateID            smalldatetime not null
      ,TimeID            time not null
      ,Rainfall          float
)
go

-- Add foreign keys as necessary
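As a rough sketch of what those foreign keys might look like, assuming the primary keys defined above are in place (the constraint names are hypothetical):

```sql
-- Each fact row must reference an existing row in every dimension.
alter table FactRainfall
  add constraint FK_Rainfall_DataSource
      foreign key (DataSourceID) references DimDataSource (DataSourceID)
go

alter table FactRainfall
  add constraint FK_Rainfall_Location
      foreign key (LocationID) references DimLocation (LocationID)
go

alter table FactRainfall
  add constraint FK_Rainfall_Date
      foreign key (DateID) references DimDate (DateID)
go

alter table FactRainfall
  add constraint FK_Rainfall_Time
      foreign key (TimeID) references DimTime (TimeID)
go
```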

Populate the dimensions with the appropriate list of locations, date ranges, time of day to the right grain and one data source record per file. This table will also allow you to put a cube over the top, or can be flattened with a view, which will help people using tools like Excel or stats packages to get and use the data.
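A flattening view over this schema could look something like the sketch below. The view name vRainfall and the choice of columns are assumptions; adjust them to whatever your users actually need:

```sql
-- Joins the fact table to every dimension so that users see one wide row
-- per measurement instead of surrogate keys.
create view vRainfall
as
select
       l.LocationDesc
      ,d.DateID
      ,d.MonthDesc
      ,d.QuarterDesc
      ,d.YearID
      ,t.[Hour]
      ,s.DataSourceDesc
      ,f.Rainfall
  from FactRainfall  f
  join DimLocation   l on l.LocationID   = f.LocationID
  join DimDate       d on d.DateID       = f.DateID
  join DimTime       t on t.TimeID       = f.TimeID
  join DimDataSource s on s.DataSourceID = f.DataSourceID
go

-- Example use: total rainfall per location and year.
select LocationDesc, YearID, sum(Rainfall) as TotalRainfall
  from vRainfall
 group by LocationDesc, YearID
 order by LocationDesc, YearID
go
```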

Sql-server – Combining data from two databases with same structure into one database

There's some lack of detail in the question; for example, it would be helpful to know:

What is the reason to combine the data together into a single database? It makes no sense to me if there isn't a further goal behind doing so. Perhaps there is a better way to make that end goal happen with fewer intermediate step(s).
Is there some kind of boundary between the sources where the backups are taken, and the destination where the backups are restored that prevents using a more direct approach?
What kind of overlap in data is there between the source sites? Obviously the user data will be different, but things like default/pre-defined rows may be common (are they user-editable?).
How well-structured is the database schema? Are there sufficient surrogate and business keys defined to be able to uniquely identify rows, or merge rows by business meaning? This is critical if you choose a programmatic/dynamic approach to merge the data.
What is the data access strategy? Is it stored-procedure-based, or is it direct table access, or by views, etc.?

In any event, here are a few different architectures you could pursue based on the answers to the questions above.

The most obvious, and probably least disruptive, is to add a piece of code to the existing environment that will merge the data together from the restored databases.

Depending on the number of tables in the database, you could simply script it out (for a small number of tables), or see if there's an off-the-shelf piece of software to do it for you (for a large number of tables). Since you're asking this question, I'm assuming the number of tables isn't trivial.

I've actually written a piece of software that essentially does this (we didn't use off-the-shelf because we have custom requirements) for our database, which is about 750 tables, with the requirement to merge common data. I will tell you now that this was not an easy task. If there are custom requirements, I would strongly consider manually scripting the transfer process, even for a relatively large number of tables. That may sound like a lot of work, and it is, but it's simpler to create and maintain -- the complexity is in size rather than code wizardry, which is much more difficult to debug and test.

Merge replication. This could be accomplished directly from the source databases, assuming there isn't any barrier between the sources and target (see above).

This introduces extra requirements, potential performance issues, and support complexity around the source databases. I would only recommend this if the schema is really clean and there is very little overlap in data between the sources.

Shell database. Create a new (empty) database at the target location that contains views that mimic the source tables or views by using 3-part names to UNION or UNION ALL the data from the individual target databases.

If the eventual plan is to do something like create a data warehouse and you're just going to end up scanning the target databases anyway, this strategy could work out really well, as it's completely transparent if you end up adding more databases down the line. It also requires very little additional storage space.
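To make the shell-database idea concrete, here is a minimal sketch of one such view. The database names SiteA and SiteB, the table dbo.Customer, and its columns are all hypothetical; the SourceSite column is an assumed addition that keeps rows from different sites distinguishable:

```sql
-- Created inside the empty shell database; one view per source table.
CREATE VIEW dbo.Customer
AS
SELECT
  'SiteA' AS SourceSite,
  c.CustomerID,
  c.CustomerName
FROM
  SiteA.dbo.Customer AS c   -- 3-part name: database.schema.table
UNION ALL
SELECT
  'SiteB' AS SourceSite,
  c.CustomerID,
  c.CustomerName
FROM
  SiteB.dbo.Customer AS c
;
```

UNION ALL keeps every row and avoids the duplicate-removal cost of UNION; plain UNION would only make sense if the same rows can legitimately appear in several sources and the SourceSite column is dropped. Adding another restored database later means adding one more branch to each view.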

So those are a few of the architectures I could think of off the top of my head. There are undoubtedly others. What you ultimately end up doing will depend greatly on your specific environment and requirements.
