How to reproduce the Finder sort order without asking Finder to sort for me

Tags: applescript, command-line, finder, performance, sort

I work with a large collection of mostly-text files subdivided by series. I periodically mirror them from an upstream repository and use a shell script to repackage them from a slightly chaotic source organization into more orderly and compact forms. The sources sit in one directory per series, each holding anywhere from a handful to hundreds of files, named with a semi-orderly, human-oriented convention dating back to the '90s that follows this general pattern:

multi-word-series-name[-$subsetnumber]-$docnumber[.{txt,html,pdf}]

A series may or may not have subsets, subsets may be numbered with Roman or Arabic numerals, a series may start with numbered docs in no subset and later get a subset labeled "II" or "2", and so on. These names work well for people browsing them in the Finder, which in most cases orders them in a human-sensible way: it recognizes that name-II-1 comes after name-6 and that name-2 does not come after name-19. Because my repackaging consists of assembling the latest version of each series into single files in that human-rational order, I use a simple bit of AppleScript that has the Finder sort the names of the items in a given directory. This yields correct results, but it is spectacularly inefficient for reasons I don't understand. The AppleScript is:

on run argv
    set op to ""
    set upath to POSIX file argv as string
    tell application "Finder"
        set foo to every item of folder upath
        set foo to sort foo by name
        repeat with curfile in foo
            set thisname to the name of curfile
            set op to op & " " & thisname as string
        end repeat
    end tell
    return op
end run

(And no, I don't recall why I did it that way. I wrote it circa 2008 and I mostly hate AppleScript…) This is compiled into a script named "flist.scpt" and run from my shell script with "osascript flist.scpt /path/to/series/folder/". Running that script on a directory containing 26 files pegs the CPU for over 2 minutes. For comparison, 'ls' gives lexically sorted results on the same machine (a G5 iMac running Leopard… don't laugh, it's a utility machine) in 0.008s, even with the script chewing on the same directory in the background.

I'm looking to get rid of that AppleScript altogether, but I have not found a way to reproduce the Finder sort logic easily, i.e. by setting a locale or giving some complex arguments to 'sort' that will order the names correctly. My fallback solution if I can't find something canned to do this will be to reproduce the Finder logic myself in some evil mix of sort, sed, awk, and shell, but I'd really like to avoid that if possible. If there's something goofy about my AppleScript that is causing the horrid performance, fixing that would be almost as good as a magic incantation to 'ls' to make it sort like the Finder.

UPDATE: SOLVED!
I first tried the Perl Sort::Naturally module, but it sorted labeled-subset names ahead of the implied subset "I" members, which isn't what I wanted. So I went ahead and dove into writing my first Ruby script, after hacking a fix into the Leopard Ruby config to make 'gem' work in the modern world and installing the naturalsort gem. The replacement Ruby script (called with a directory name as an argument) is:

#! /usr/bin/env ruby -rubygems -KU
# Largely cargo-culted from stackexchange response. 
# I dunno what exactly the shebang line opts to ruby do. Probably pwn me. 
# My first Ruby script ever. Don't laugh too hard. 

require 'natural_sort' # gem install naturalsort

dname = ARGV[0]
input = Dir.entries(dname)
puts NaturalSort.naturalsort(input)

The minor differences in output are that this includes the . and .. entries and delimits entries by line rather than with spaces, but as this is being called from a shell script that already filters out some special names these are trivial to handle.
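For anyone who would rather handle those two differences in Ruby than in the calling shell script, something like the following sketch might do. The helper names real_entries and space_delimited are made up for illustration and are not part of the script above. The plain sort in the last few lines is only there so the sketch runs without the gem installed.

```ruby
# Hypothetical helper (not part of the original script): list a directory
# without the "." and ".." entries that Dir.entries includes.
def real_entries(dname)
  Dir.entries(dname).reject { |e| e == '.' || e == '..' }
end

# Join names with single spaces, roughly matching the old AppleScript output.
# Like the AppleScript, this is ambiguous if a name itself contains a space.
def space_delimited(names)
  names.join(' ')
end

if __FILE__ == $0 && ARGV[0]
  # Plain lexical sort, used here only so the sketch has no dependencies;
  # substitute NaturalSort.naturalsort(...) to get the natural ordering.
  puts space_delimited(real_entries(ARGV[0]).sort)
end
```

Hidden files such as .DS_Store still come through, so the existing special-name filtering in the shell script would remain useful.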

Best Answer

The exact sorting order used by Finder probably depends on the locale, but scripting languages like Ruby and Python have libraries for sorting strings naturally.

#!/usr/bin/env ruby -rubygems -KU

require 'natural_sort' # gem install naturalsort

input = "name-II-1
name-6
name-2
name-19".lines
# input = Dir["#{ENV['HOME']}/Documents/*"]

puts NaturalSort.naturalsort(input)
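If installing a gem is inconvenient, the basic idea behind natural sorting is small enough to sketch by hand. The version below is an approximation under stated assumptions, not Finder's actual algorithm and not the naturalsort gem's implementation: it splits each name into runs of digits and non-digits, compares digit runs as integers, and compares everything else case-insensitively as text. The names natural_key and natural_sort are illustrative. Roman numerals get no special treatment, so "II" is compared as ordinary letters.

```ruby
# Build a sort key by splitting a name into digit and non-digit runs.
# Each run becomes [kind, number, text] so that runs always compare
# Integer to Integer and String to String; digit runs (kind 0) sort
# ahead of text runs (kind 1) at the same position.
def natural_key(name)
  name.scan(/\d+|\D+/).map do |chunk|
    chunk =~ /\A\d+\z/ ? [0, chunk.to_i, ''] : [1, 0, chunk.downcase]
  end
end

# Sort names by the key above (an assumption-laden stand-in for a library).
def natural_sort(names)
  names.sort_by { |n| natural_key(n) }
end

puts natural_sort(%w[name-II-1 name-6 name-2 name-19])
# prints:
# name-2
# name-6
# name-19
# name-II-1
```

This happens to put name-II-1 after the unlabeled docs because "name-" is a prefix of "name-ii-". Subsets labeled with larger Roman numerals (IV, IX, V) would sort alphabetically rather than numerically, so real data may need extra handling.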