MacOS – How to introduce a list object that is stored in a .txt file to an AppleScript

Tags: applescript, macos, performance, text

I have an AppleScript .scpt file, triggered by a key combination in FastScripts.app, that functions as a thesaurus. The script looks up the selected word in a pre-formatted list and, if the word is found, displays its synonyms to the user [1].

This list is contained in a plain text (.txt) file and is already formatted in the AppleScript list format. I would like my .scpt file to be able to accept this text as a true list [2].

It is important to note that the .txt file contains 2.5 million words [3].

This is why I am not simply copying the contents of the .txt file into the .scpt file itself, despite the fact that the text file is 100% static and will never be altered. Inserting the text directly into my script would bring with it considerable lag and sluggishness as I edit and compile my .scpt file in Script Editor.app.

Script Editor.app froze every time I tried to read the .txt file. The problem is that Script Editor reads a given text file into memory in its entirety instead of streaming the contents in a more efficient manner. So I broke the text file up into 10 smaller text files [4], each containing about 250,000 words.

At 250,000 words, of course, the text files are still extremely large (by any standard).

Here is a (severely condensed) example of what the contents of each text file look like:

{{"exaltation","accolade","adulation","advance","advancement"},{"exalted","winnowing","winsome"},{"exam","audition","blue book","examen","examination","final","examination","test","trial","tripos","viva","written","written examination"},{"examination","Pap test","Socratic method","airing","analysis","anatomic diagnosis","appraisal","work-up","written","written examination"},{"examine","air","analyze","appraise","archetype","asleep","assess","canvass","case"},{"examiner","analyst","analyzer","asker"},{"examining","analytic","examinational","exploratory"},{"example","admonishment","admonition","alarm","archetype"},{"exasperate","bedevil","vex","work up","worry"},{"exasperated","aggravated","amplified","angry","annoyed"},{"exasperating","annoying","bothering","bothersome"}}

As you can see, the contents of the text file are a nested list [5], organized in the same way that AppleScript formats a list. Each text file contains no line breaks or paragraphs.

I am looking for a way to get this list into my AppleScript with as little latency as possible [6]. Speed is key, which is why I pre-formatted the list.


Footnotes:

1. My thesaurus script is similar to the built-in thesaurus feature that exists in Microsoft Word. One notable difference is that my script works system-wide.

2. By true list, I mean that I can call, for example, item 12 of this list later on in my AppleScript.

3. My source for the thesaurus data is Grady Ward's "Moby" Thesaurus. I found this database from this answer: Looking for Thesaurus Data – Stack Overflow

4. I had to use Hex Fiend.app to cut from the text file and paste into a new text file. I could not edit the file in TextEdit.app, without TextEdit freezing on me.

5. The outer list contains each thesaurus entry. The inner lists contain all of the synonyms for that entry. The first item of each inner list is the entry title. Both the outer list and each inner list are ordered alphabetically (with the exception of the first word of each inner list, because, again, this word is the entry title).

6. I understand that even the fastest method will still have several seconds of latency, since the text file is so large.


Best Answer

I do not know the full scope of what you're doing or how the rest of your script is coded, since you have not supplied all the details and code. That said, I would take a different approach.

I downloaded the Moby Thesaurus from the page linked in your question and performed the following actions on it.

  1. Extracted the contents of the mthes.tar.Z file.
  2. Opened the ./mthes/mobythes.aur file in TextWrangler and noticed two things to change.
    • Changed the line endings from Classic Mac (CR) to Unix (LF).
    • Removed unwanted trailing commas from 6 lines.

Note that while I could have made these changes in TextWrangler, I prefer to use Terminal, and did so with the following command:

tr "\r" "\n" < mobythes.aur | sed -E 's/[,]{1,}$//' > mobythes.txt

That took about a second to run (I prefaced the command with time, out of curiosity). With mobythes.aur processed, saved as mobythes.txt, and copied to my Documents folder, I use this plain CSV file as is: the script searches for a record whose first field matches the search string and returns that record, minus the first field, as a list to choose from in AppleScript. I found this method to be extremely fast. Searching for "zoom", the last record in the CSV file, took about a second to return the match and build the list for that record on the fly.
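As a quick sanity check of the cleanup step, the following sketch wraps the same tr and sed pipeline in a shell function and runs it on a tiny sample. The function name clean_moby and the two sample records are made up for illustration; they are not part of the Moby files.

```shell
#!/bin/sh
# clean_moby is a hypothetical helper name; it wraps the same tr | sed pipeline.
# It reads from standard input and writes the cleaned text to standard output.
clean_moby() {
    # Convert Classic Mac (CR) line endings to Unix (LF),
    # then strip one or more trailing commas from each line.
    tr "\r" "\n" | sed -E 's/[,]{1,}$//'
}

# Made-up sample: two CR-terminated records, the first ending in stray commas.
printf 'exam,audition,test,,\rzoom,ascend,soar\r' | clean_moby
```

On the sample, this prints two clean records, one per line: exam,audition,test and zoom,ascend,soar. Running it as clean_moby < mobythes.aur > mobythes.txt would be equivalent to the command above.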

In AppleScript Editor, I used the following code to test against the plain CSV file, kept as a single file of 30,260 lines containing 2.5 million synonyms and related words.

set AppleScript's text item delimiters to ""
set theMobyThesaurus to POSIX path of (path to documents folder) & "mobythes.txt"

set theSearchString to the text returned of (display dialog "Find synonyms for:" default answer "" buttons {"Cancel", "Search"} default button 2 with title "Search Moby Thesaurus")

if theSearchString is not equal to "" then

    try
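        -- quoted form of protects the search string and the file path from the shell
        set theSearchResults to (do shell script "grep -i -m 1 " & quoted form of ("^" & theSearchString & ",") & " " & quoted form of theMobyThesaurus)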
    on error
        display dialog "No match for \"" & theSearchString & "\" available." buttons {"OK"} default button 1
        return
    end try

    if theSearchResults is not equal to "" then
        set AppleScript's text item delimiters to ","
        set theSynonymsList to items 2 thru -1 of text items of theSearchResults as list
        set AppleScript's text item delimiters to ""

        choose from list theSynonymsList with prompt "Choose a synonym for: " & linefeed & theSearchString
        if the result is not false then
            set theChosenWord to (item 1 of the result)
        end if
    end if

end if

In this example, assuming a match was found and nothing was canceled, the theChosenWord variable contains the word chosen from the displayed list and can be processed further as needed.

Note that this is strictly example code for testing purposes. It will need to be adapted to your use case and given appropriate error handling.
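The lookup that the script hands to grep can also be tried on its own in Terminal. This is a minimal sketch, assuming a CSV laid out like mobythes.txt (entry word first, synonyms after it); the function name lookup_synonyms and the sample records are hypothetical.

```shell
#!/bin/sh
# lookup_synonyms is a hypothetical helper: $1 = search word, $2 = path to the CSV.
lookup_synonyms() {
    # -i ignores case, -m 1 stops at the first matching line, and the pattern
    # is anchored so that only the first field of a record can match.
    # cut drops the first field; tr prints one synonym per line.
    grep -i -m 1 "^$1," "$2" | cut -d , -f 2- | tr ',' '\n'
}

# Made-up two-record sample standing in for mobythes.txt.
tmp=$(mktemp)
printf 'exam,audition,test\nzoom,ascend,soar\n' > "$tmp"
lookup_synonyms zoom "$tmp"
rm -f "$tmp"
```

Note that the search word is interpolated into a regular expression, so characters such as "." or "*" in it are treated as pattern syntax rather than literal text. The same applies to the grep call in the AppleScript.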

I believe this is going to be the fastest way while leaving the Moby Thesaurus as a single CSV file, and it is probably faster than whatever methods you have tried thus far.