MacOS – How to make a custom search engine in AppleScript

applescript macos search

So I have a database of folders and text files (.txt), and I'm trying to create a program where I can enter keywords and have it search those files and output the text of any file containing the keywords. I have working code that does just that, but it doesn't search for the words individually; it searches for them together, as a single string.

For instance, if I input "Joe Bill Bob," I'd like it to output files that contain each of those words anywhere in the file, even if they aren't next to each other, or in that order.

I would rather avoid doing a repeat loop to input one search term at a time.

I'd also rather avoid setting up hundreds of variables in the code and using a repeat loop that appends each character to the first blank variable unless the character is a space, in which case it moves on to the next variable.

If you have any other ideas, that would be great. Thanks!

Best Answer

Note: the text files must be encoded as UTF-8.

Here's a starting script. The walk-through below explains how it works, and the code follows it.

Example, with these keywords:

"Bill
Bob
Joe"

fgrep and sort return lines like this:

/path/of/thisFile.txt:Bill
/path/of/thisFile.txt:Bob
/path/of/thisFile.txt:Joe
/path/of/thisFile2.txt:Bob
/path/of/thisFile2.txt:Joe
/path/of/thisFile3.txt:Bob
/path/subf1/of/some_File.txt:Bill
/path/subf3/of/some_xzzz_File.txt:Bill

The script uses a loop to check the path of each item in this list.

It takes the path from the first item by removing ":Bill" from the end of the line, which gives "/path/of/thisFile.txt".

It then checks the item at (current index + number of keywords - 1), which here is the third item. The third item contains the same path, so this file has all three keywords, and the script appends the path to a new list.

The other files in the list don't contain all the keywords, so they are skipped.
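The pipeline described above can be tried on its own in a terminal. The following is a minimal sketch, not the answer's code: the temporary folder, the file names, and their contents are made up for the demo, `grep -F` stands in for `fgrep`, and the final step counts lines per path with `awk` instead of using the index look-ahead that the AppleScript uses.

```shell
#!/bin/sh
# Demo data (hypothetical): three small text files in a temporary folder.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'Joe met Bill and Bob.\n' > "$demo_dir/all_three.txt"
printf 'Bob called Joe. Bob again.\n' > "$demo_dir/two_only.txt"
printf 'Bill and Bobby\n' > "$demo_dir/sub/one_only.txt"

# One keyword per line, the same shape the script passes to fgrep.
keywords='Bill
Bob
Joe'
num_keywords=$(printf '%s\n' "$keywords" | wc -l | tr -d ' ')

# -F fixed strings, -R recurse, -o print only the matched text, -w whole words.
# sort -u groups the lines by path and removes duplicate "path:keyword" lines.
hits=$(grep -F -R -o -w --include '*.txt' "$keywords" "$demo_dir" | sort -u)
printf '%s\n' "$hits"

# Strip the trailing ":keyword" and count the lines left for each path.
# A path that occurs num_keywords times contains every keyword.
matching=$(printf '%s\n' "$hits" | awk -v n="$num_keywords" '
    { sub(/:[^:]*$/, ""); count[$0]++ }
    END { for (p in count) if (count[p] == n) print p }' | sort)
printf 'Files with all keywords:\n%s\n' "$matching"

rm -rf "$demo_dir"
```

Only all_three.txt is reported: two_only.txt lacks "Bill", and one_only.txt has "Bill" but "Bobby" is not a whole-word match for "Bob".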

set r to text returned of (display dialog "What keywords?" default answer "Joe Bill Bob") -- each keyword must be separated by a space
set tKeys to my makeKeysForGrep(r)
if tKeys is not "" then
    set masterFolder to choose folder with prompt "Select the source folder .."
    set filesList to my getFilescontainingKeywords(masterFolder, tKeys) -- get a list of files (each file contains all the keywords)
    -- do something with filesList; the list contains POSIX paths
end if

on makeKeysForGrep(t)
    (*** Delete leading and trailing spaces and collapse runs of spaces into one (stray spaces would cause problems in the grep command),
     then replace each space with a linefeed so that every line holds one keyword; sort -u sorts the keywords and removes duplicates. ***)
    set r to do shell script "perl -pe 's/ +$|^ +//g; s/ +/ /g; s/ /\\n/g; '  <<< " & (quoted form of t) & "| sort -u" without altering line endings
    if r is not linefeed then return text 1 thru -2 of r -- strip the trailing linefeed
    return "" -- r is just a blank line, so return ""
end makeKeysForGrep

on getFilescontainingKeywords(dir, tKeys)
    script o
        property tfiles : {}
    end script
    set numOfKeywords to count (paragraphs of tKeys) -- get the number of keywords
    set tFolder to quoted form of POSIX path of dir
    set o's tfiles to do shell script "fgrep -R -o -w  --include \"*.txt\" " & (quoted form of tKeys) & " " & tFolder & " | sort -u"
    -- fgrep returns the full path + ":" + the keyword; sort -u sorts the paths and deletes duplicate lines (the same file can contain multiple occurrences of a keyword)

    if o's tfiles is not "" then
        -- fgrep -o prints "path:keyword", so even with a single keyword the ":keyword" suffix must be stripped; the loop below handles that case as well
        set l to {}
        set o's tfiles to paragraphs of o's tfiles
        set tc to count o's tfiles
        set firstKeyword to ":" & (paragraph 1 of tKeys)
        set numCh to (length of firstKeyword) + 1
        set i to 1
        repeat while (i <= tc) -- check each path in the list; the same path must appear numOfKeywords times in a row
            set thisItem to (item i of o's tfiles)
            if thisItem ends with firstKeyword then
                set textFilepath to text 1 thru -numCh of thisItem
                set j to (i + numOfKeywords - 1)
                if j > tc then exit repeat
                if (item j of o's tfiles) starts with textFilepath then -- this file contains all the keywords
                    set end of l to textFilepath --- append this path to the list
                    set i to i + numOfKeywords -- skip the items which contain the same path
                else
                    set i to i + 1 -- next file
                end if
            else
                set i to i + 1 -- next file
            end if
        end repeat
        return l -- list of files which contain all the keywords
    end if
    return {} -- No files found
end getFilescontainingKeywords
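The keyword clean-up that makeKeysForGrep performs can also be tried directly in a shell. This is a rough equivalent under stated assumptions: `make_keys` is a hypothetical helper name, and it uses `tr` and `sed` rather than the perl one-liner from the script.

```shell
#!/bin/sh
# make_keys (hypothetical helper): turn a space-separated string into
# one keyword per line, sorted, with duplicates and empty lines removed.
make_keys() {
    # tr -s turns every run of spaces into a single newline;
    # sed drops the empty line left over from leading spaces.
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' | sort -u
}

keys=$(make_keys '  Joe   Bill Bob  Joe ')
printf '%s\n' "$keys"

# Input made only of spaces yields an empty result,
# which corresponds to the "" the handler returns.
empty=$(make_keys '   ')
[ -z "$empty" ] && echo 'no keywords'
```

The command substitution strips the trailing newline, which plays the same role as `text 1 thru -2 of r` in the handler.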

The fgrep options:

The --include \"*.txt\" option: only files whose names match the given pattern are searched, here any name that ends with ".txt".

The -w option: match whole words only, so Bob does not match Bobby. Remove this option if you want to match a substring in the text.

The -R option: search subdirectories recursively. Remove this option if you don't want recursion.

Add the -i option to perform case-insensitive matching. By default, fgrep is case sensitive.
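To see what -w and -i change, here is a small sketch run against a made-up file (the folder and its contents are assumptions for the demo, and `grep -F` stands in for `fgrep`). It counts the matches each option set produces.

```shell
#!/bin/sh
# Demo data (hypothetical): one file containing "Bobby" and lowercase "bob".
opt_dir=$(mktemp -d)
printf 'Bobby and bob went out.\n' > "$opt_dir/names.txt"

# -w: whole words only, case sensitive. Neither "Bobby" nor "bob" matches "Bob".
with_w=$(grep -F -R -o -w --include '*.txt' 'Bob' "$opt_dir" | wc -l | tr -d ' ')

# No -w: "Bob" matches as a substring of "Bobby".
without_w=$(grep -F -R -o --include '*.txt' 'Bob' "$opt_dir" | wc -l | tr -d ' ')

# -w -i: whole words, ignoring case, so "bob" matches.
with_i=$(grep -F -R -o -w -i --include '*.txt' 'Bob' "$opt_dir" | wc -l | tr -d ' ')

# With -o the matched text is printed as it appears in the file.
matched_text=$(grep -F -R -o -w -i -h --include '*.txt' 'Bob' "$opt_dir")

echo "-w: $with_w   no -w: $without_w   -w -i: $with_i   text: $matched_text"
rm -rf "$opt_dir"
```

One thing to keep in mind if -i is added to the script: because -o prints the text as found in the file, the part after the colon may differ in case from the keyword that was typed, and "bob" and "Bob" would survive sort -u as separate lines. The path-counting logic may need adjusting in that case.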

Related Solutions

Can “mdfind” search for phrases and not just unordered words

You need to escape your quotes like so:

mdfind \"I love Apple\" -onlyin ~/Documents

This results in just the one document being found:

~/Documents/test1.txt

Without escaping them, I don't think the quotes actually get passed to the mdfind command; they're just interpreted by your shell to say that I love Apple is a single argument. With the backslash-escaping, the argument then includes the quote characters.
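What the shell hands to a command in each case can be checked without mdfind. In this sketch, `count_args` and `show_args` are hypothetical helpers that only report the arguments they receive; how mdfind itself combines those arguments is not shown here.

```shell
#!/bin/sh
# Hypothetical helpers: report how many arguments arrived, and what they were.
count_args() { echo "$#"; }
show_args() { for a in "$@"; do printf '[%s]\n' "$a"; done; }

# Plain double quotes are consumed by the shell: one argument, no quote characters.
plain_count=$(count_args "I love Apple")
plain_args=$(show_args "I love Apple")

# Backslash-escaped quotes are kept as literal characters, but the words are
# no longer grouped, so three arguments arrive: "I  love  Apple"
escaped_count=$(count_args \"I love Apple\")
escaped_args=$(show_args \"I love Apple\")

# Single quotes around double quotes: one argument that includes the quotes.
nested_count=$(count_args '"I love Apple"')
nested_args=$(show_args '"I love Apple"')

echo "plain: $plain_count, escaped: $escaped_count, nested: $nested_count"
printf '%s\n' "$escaped_args"
```

So the escaped form does deliver the quote characters to the command, on the first and last of three words, while the single-quoted form delivers them inside one argument.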

How to get a value from a list with a string in AppleScript

This would be my take on this:

set ipList to {"8.8.8.8", "8.8.8.6", "8.8.4.4"}
set Output1 to ""
set Output2 to ""
global Output1, Output2
repeat with i from 1 to number of items in ipList
    set this_item to item i of ipList
    my ipCheck(this_item, i)
end repeat


if Output1 is not "" or Output2 is not "" then
    display dialog (Output1 & Output2) buttons {"OK"} default button 1 with title "Result"
end if

on ipCheck(IP_address, i)
    try
        set ping to do shell script ("ping -c 2 " & quoted form of IP_address & " | head -2 | tail -1 | cut -d = -f 4")
        if ping contains "ms" then
            set Output1 to Output1 & return & "DNS" & i & "  UP"
        else if ping contains "timeout" then
            set Output2 to Output2 & return & "DNS" & i & " DOWN"
        end if
    end try
end ipCheck
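The shell pipeline inside ipCheck is easier to follow when run on fixed text. The sample lines below are made up in the shape of typical ping output and are not real measurements; the sketch only shows which text the `head | tail | cut` chain extracts.

```shell
#!/bin/sh
# Made-up sample output (not a real measurement) for a host that replies.
sample_up='PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=117 time=14.2 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=13.9 ms'

# Made-up sample output for a host that does not reply.
sample_down='PING 8.8.8.6 (8.8.8.6): 56 data bytes
Request timeout for icmp_seq 0'

# head -2 | tail -1 keeps the second line (the first reply);
# cut -d = -f 4 keeps the text after the third "=" on that line.
up_field=$(printf '%s\n' "$sample_up" | head -2 | tail -1 | cut -d = -f 4)

# A line without any "=" is passed through by cut unchanged,
# which is why the script can test for the word "timeout".
down_field=$(printf '%s\n' "$sample_down" | head -2 | tail -1 | cut -d = -f 4)

echo "up: $up_field"
echo "down: $down_field"
```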
