Linux – recursively sort files with string “SENT_” somewhere in filename by the substring immediately after SENT_, and then display them

awk, bash, find, linux

I want to run this from any directory, say ~/, and have it descend into subdirectories. For example:

ls ~/:

/home/me/FILES/DIR_1/DIR_a/SENT_2222_....
/home/me/FILES/DIR2/SENT_3333....
/home/me/FILES/SENT_4444__....

So a full path can contain several / characters as well as several _ characters.

The output should be all files picked out by find, sorted by the first numeric substring after SENT_, and then printed (possibly to a file) with the full path and filename, e.g. just as the directory structure is listed above. (This numeric substring comes from appending $(date +%s) to the filename. I cannot use attributes.)

I do know not to parse the output of ls:

https://askubuntu.com/questions/161802/how-do-i-select-a-field-column-from-the-output-of-ls-l


https://stackoverflow.com/questions/34725005/linux-sort-files-by-part-of-name-no-delimiters

My first attempt (which almost works, except that I can no longer get it to display the filename):

find . -type f -name 'SENT*' -printf '%P\n' | awk 'BEGIN { FS="_" }; {print $2}' | sort -k1.1,1.3

where I use %P to get rid of the ./ at the beginning. I was then going to use multiple field separators in awk ('/' and '_') but couldn't get that to work. The second field, $2, worked only because the test paths had _ in the filenames alone. I do have directories with _ in their names, though, so maybe I should get awk to just search for SENT_.

(I was just experimenting with sort.)

Anyway, all this does is list the sorted field, so I no longer know the full name of the file (with path).


My second attempt, after reading the following, was:

https://www.gnu.org/software/gawk/manual/html_node/Field-Separators.html

https://www.linuxquestions.org/questions/programming-9/multiple-field-seperators-in-awk-178132/

https://www.unix.com/shell-programming-and-scripting/159544-cut-awk-reverse.html

find . -type f -name 'SENT*' -printf '%P\n' | awk -F_ '{NF-=2;}1' OFS='_'

This was an attempt to count backwards from the end of the line, splitting on _. It did not get far, as I could not get multiple field separators to work, and the number of _ in the 'local' filename can vary.


My last attempt was to search for SENT_:

https://stackoverflow.com/questions/27153582/using-awk-to-get-a-specific-string-in-line

find . -type f -name 'SENT*' -printf '%P\n' | awk -F"SENT_" '{split($2,a," ");print a[1]}' | sort -r -t_  -k1.1n

where -F"SENT_" is my field separator (afaik), so $1 is everything before SENT_ and $2 is everything after it. Since I want to sort on the first substring of what follows (now separated by _), I used $2, which leaves something like 3432432_xxxxx_yyyyy_..... Then all I had to do was sort by that first substring (3432432 above).
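To make the splitting concrete, here is a minimal sketch of what awk does with SENT_ as the field separator. The path is made up for illustration, and the second command splits $2 on _ (rather than on a space, as in the command above) purely to show where the numeric substring ends up.

```shell
# Hypothetical path, for illustration only.
path='FILES/DIR_1/DIR_a/SENT_3432432_xxxxx_yyyyy.txt'

# With FS set to "SENT_", $1 is the text before the marker and $2 the text after it.
after=$(printf '%s\n' "$path" | awk -F'SENT_' '{ print $2 }')

# Splitting $2 on "_" leaves the leading numeric substring in a[1].
key=$(printf '%s\n' "$path" | awk -F'SENT_' '{ split($2, a, "_"); print a[1] }')

printf 'after=%s\nkey=%s\n' "$after" "$key"
```

Note that the directory names containing _ do not interfere here, because the only separator awk sees is the whole string SENT_.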


From the last link I also tried several of the other, unaccepted answers, e.g. grep -o 'SENT_[a-z0-9]\+' piped to cut -d "_" -f 2 | sort -bn, but not even the sorting worked, let alone the reassembly back into a full path.

IM(very)HO the last attempt is the best approach, and it is the only one I am asking for help with.

I don't know, maybe this is too big for a 'one-liner' but a simple function would work also…

EDIT

Although I am asking for help with the method immediately above, I also tried, as another option, moving the $(date +%s) to the front and sorting that way:

shopt -s globstar
for i in **; do ni=$(echo $i | awk -F'[_.]' '{print $2"_"$4"_"$1"_"$3"."$5 }') && mv "$i" "$ni" ; done

but I think the double __ is breaking it. Also, traversing directories doesn't seem to work.

Best Answer

A solution:

find . -type f -name 'SENT*' -printf '%P|%f\n' | \
   sort -t'|' -k2 | \
   cut -d'|'  -f 1 

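Explanation:

  1. The find output will look like:

    <DIRNAME>/<FILENAME>|<FILENAME>

  2. sort orders the lines on the second column, using | as the separator (-t'|' -k2). Because -name 'SENT*' only matches filenames that begin with SENT, this orders them by the text after SENT_. The comparison is lexical, so it agrees with numeric order only while the numbers all have the same number of digits, as $(date +%s) values currently do.

  3. cut keeps only the first column, i.e. the path.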
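If SENT_ can appear in the middle of a filename, or the numbers can differ in length, the same decorate, sort, undecorate idea can be keyed on the digits themselves. The following is only a sketch under some assumptions: sort_sent is a hypothetical helper name, it relies on GNU find for -printf, and it assumes that paths contain neither tabs nor newlines.

```shell
# Sketch: list files whose name contains SENT_<digits>, ordered numerically
# by those digits. sort_sent is a hypothetical name; GNU find is assumed.
sort_sent() {
    root=${1:-.}
    tab=$(printf '\t')

    # Emit "<filename><TAB><path>" so the key is taken from the filename only,
    # never from a directory that happens to contain SENT_.
    find "$root" -type f -name '*SENT_[0-9]*' -printf '%f\t%p\n' |
        # Decorate: replace the filename column with the digits after the first SENT_.
        awk -F'\t' -v OFS='\t' '
            match($1, /SENT_[0-9]+/) {
                print substr($1, RSTART + 5, RLENGTH - 5), $2
            }' |
        # Sort numerically on the key column only.
        sort -t "$tab" -k1,1n |
        # Undecorate: keep just the path.
        cut -f2-
}
```

Called as sort_sent ~/FILES > sorted.txt, this would write the paths in timestamp order. %p keeps the starting directory as a prefix of each path; using %P instead drops it, as in the command above.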
