Sql-server – Find closed sub directory with full text search

sql servert-sql

I have a simple table with registered directory names inside a full text indexed varchar column.

For example:

  1. 'c:\Dir1\'
  2. 'c:\Dir1\Sub1\'
  3. 'c:\Dir1\Sub2\'
  4. 'c:\Dir2\'
  5. 'c:\Dir3\'

I now need to find the closest subdirectory to file paths like:

'c:\Dir1\File.txt' -> 'c:\Dir1\'

'c:\Dir1\Sub2\File.txt' -> 'c:\Dir1\Sub2\'

'c:\Dir2\Sub1\SubSub1\File.txt' -> 'c:\Dir2\'

All my efforts with CONTAINS and NEAR failed.
Can somebody help me how I can solve this with text search?

Best Answer

If you take the path of interest and parse it into it's sub part, and use those to build a search string, you should then be able to use that search string to find matches of all lengths and then pick the best fit. There is an assumption that the folder structure is embedded in other data within the field (otherwise, why use Full Text Indexing), so the manner of finding the best match will depending on your specific context.

Here is some code the demonstrates the suggested path, include a sample way of selecting the best fit.

DECLARE @PathOfInterest NVARCHAR(MAX) = 'c:\Dir1\Sub2\File.txt';


DECLARE @SearchString NVARCHAR(MAX);

DROP TABLE IF EXISTS #Search;

WITH Builder AS (
    SELECT
        0 AS Score -- How close to the initla path it is   
        , @PathOfInterest AS InitalPath -- Path
        , CAST(NULL AS NVARCHAR(MAX) ) AS SubPath -- SubPath
        , CAST(0 AS BIGINT)  AS DelimPos -- End of last Search
        , CAST(NULL AS NVARCHAR(MAX) ) AS SearchValue -- Cumulative Search Value
    UNION ALL   
    SELECT
        Score + 1 
        , InitalPath -- Pass through path
        , CAST(SUBSTRING(InitalPath, 1, CHARINDEX('\\', InitalPath, DelimPos + 1) - 1)AS NVARCHAR(MAX) ) -- Extract the string to the folder delimi
        , CHARINDEX('\\', InitalPath, DelimPos + 1) -- Find the next delim
        , CAST(
            CONCAT(
                SearchValue + ' OR ' -- When V is null, no leading OR
                , '("'
                , SUBSTRING(InitalPath, 1, CHARINDEX('\\', InitalPath, DelimPos + 1)) 
                , '")'
            ) AS NVARCHAR(MAX) -- Make sure the type matches the anchor of the recusive CTE
        )  
    FROM Builder 
    WHERE 
        CHARINDEX('\\',InitalPath, DelimPos + 1) > 0
)
SELECT
    Score, SubPath, SearchValue
--INTO #Search
FROM Builder

SELECT @SearchString = SearchValue FROM #Search


SELECT * FROM #Search s



DROP TABLE IF EXISTS #Search;
SELECT 
    FolderInfo
INTO 
    #FirstPass
FROM FullTextIndexTables
WHERE CONTAINS(FolderInfo, @SearchString)

---- Select best fit from matching results
SELECT TOP 1 * 
FROM
    #Search
CROSS APPLY (
    SELECT * FROM #FirstPass
    WHERE FolderInfo LIKE CONCAT('%', SubPath, '[^\]%' ) 
)
ORDER BY Score DESC

Notes: 1. To preserve formatting in SO, I changed '\' to '\'. 2. I did not have an environment to test the Full Index Search, so some tweaking may be needed. 3. The selection criteria in the 'best fit' may produce inconsistent results if there are multiple matches with the same score (wobble). If this is of concern, additional ORDER BY criteria should be specified.