I have a table which contains folders' paths. This table contains four columns:
DirID
– The folder's ID.BaseDirID
– The ID of the first folder in the hierarchy. So all the folders (paths) from the same hierarchy share the same value in this column.DirLevel
– The depth of the folder.DisplayPath
– The folder's path.
I need to find all the "gaps" between those folders in the hierarchy.
Sample data for example:
DirID BaseDirID DirLevel DisplayPath
1 1 1 'A'
2 1 3 'A\B\C'
3 1 5 'A\B\C\D\E'
4 1 3 'A\B\F'
5 1 5 'A\B\F\G\H'
6 2 1 'U'
7 2 3 'U\V\W'
8 2 5 'U\V\W\X\Y'
9 2 3 'U\V\M'
10 2 5 'U\V\M\L\O'
So we need to find the following data:
BaseDirID DisplayPath
1 'A\B'
1 'A\B\C\D'
1 'A\B\F\G'
2 'U\V'
2 'U\V\W\X'
2 'U\V\M\L'
Comments:
- This table contains more than 250,000 records of folders, so we seek for the most efficient way to do so, otherwise, the script will be stuck for a long time, the time we don't have.
- I don't have the list of all folders. What I have are the "root" folders and the "leafs" folders which I need to find the "gaps" between them in the hierarchy.
- The table can contain more than one hierarchy and we need to find the "gaps" in all of the hierarchies.
- Each of the hierarchies can split, as you can see in the sample data the first hierarchy splits to two folders paths from the 'A\B' folder:
'A\B\C' and 'A\B\F'.
And the second hierarchy splits to two folders paths from the 'U\V' folder: 'U\V\W' and 'U\V\M'.
And we need to find all the "gaps" even in such cases when the hierarchies split. - We can make any changes to the table – add pk, indexes etc…
- The SQL Server version is 2012 SP3.
- The real folder names could be anything – one character or more.
- The results must only contain the "gaps".
- All missing intermediate folders are required.
Extended sample data and expected results in this dbfiddle.
This question is a continuation of the Stack Overflow question Find missing hierarchy Folders (Paths) in a table. Our question includes also the 4th comment which appears in bold.
I saw that there is a new type called hierarchyid
(starting from SQL Server 2008), which I thought might help us. What do you think?
Best Answer
I'd try something like this:
'A\B\C\D\E'
becoming'A\B\C\D'
) and add these new filepaths in the table, if they aren't already there.The initialization:
And the iterations:
After the process is finished, you can get only the missing filepaths with a simple query:
Tested in dbfiddle.uk.
The reverse strings are not strictly needed but they simplify the splitting of the strings. You could go without them (and a bit more complex code). You could also have only the reverse paths stored in the new
filepaths
table and when the iterations are complete, reverse them again into the target table (the existing one I guess or wherever you want them to go).The whole thing could also be done with a single pass - using a recursive CTE - but I don't know if that would be more or less efficient.