I have 1000s of files in a single directory that I want to sort into subdirectories based on their filenames. They're all consistently named with a set structure of p-[number]_n-[number]_a-[number].[ext].
Here's a small sample…
- p-12345_n-987_a-1254.jpg
- p-12345_n-987_a-9856.pdf
- p-12345_n-987_a-926.docx
- p-12345_n-384_a-583.pdf
- p-12345_n-384_a-987.pdf
- p-2089_n-2983_a-2348.gif
- p-2089_n-1982_a-403.jpeg
- p-38422_n-2311_a-126.pdf
- p-38422_n-2311_a-5231.docx
What I'm after is a folder structure like this:
p-12345
  ⊢ n-987
      ⊢ p-12345_n-987_a-1254.jpg
      ⊢ p-12345_n-987_a-9856.pdf
      ⊢ p-12345_n-987_a-926.docx
  ⊢ n-384
      ⊢ p-12345_n-384_a-583.pdf
      ⊢ p-12345_n-384_a-987.pdf
p-2089
  ⊢ n-2983
      ⊢ p-2089_n-2983_a-2348.gif
  ⊢ n-1982
      ⊢ p-2089_n-1982_a-403.jpeg
p-38422
  ⊢ n-2311
      ⊢ p-38422_n-2311_a-126.pdf
      ⊢ p-38422_n-2311_a-5231.docx
I hope that makes sense.
Is it possible to write a script to organise the files in this way?
EDIT: To clarify: Yes, my question should be how can I write a script to organise the files? 🙂 I'm very new to Unix and the command line in general. So far I've only written/used basic shell scripts. I have a hunch that the answer will probably involve regular expressions but beyond that I'm not really sure where to start.
The best idea I've come up with is to
- Export the file list to a text file
- Find and replace "_n" and "_a" with "/n" and "/a"
- Create a series of mv commands from that
- Save it as a shell script
I'm sure that's far more long-winded than it needs to be though. I'd also like to have something repeatable in case I need to do it for more files in future.
Best Answer
As already noted, the short answer is "yes".
The long answer is: You can do it with a bash script that uses `awk` to extract the filename elements you want to base your directory structure on. It could look something like this (where more emphasis is placed on readability than "one-liner" compactness).

To explain:
- Parsing the output of `ls` is not recommended, so something like `FILES=$(ls p-*); for FILE in $FILES; do ...` would be considered a no-go.
- The numerals between `p-` and `_n` needed to generate the first level of your directory structure are extracted using `awk` (as you suspected, with regular expressions), and the same is done for the numerals between `n-` and `_a` for the second level. The idea is to use the `match` function, which not only looks for the place where the specified regular expression occurs in your input, but also stores the matched text of every element enclosed in round brackets `( ... )` in the array `fields`.

For more information, have a look at the Advanced Bash-Scripting Guide and the GNU Awk User's Guide.
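A minimal sketch of the approach described above, not necessarily the exact script this answer was written around. It assumes GNU awk is installed as `gawk` (the three-argument form of `match` is a gawk extension), and the function name `sort_files` and its optional directory argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: sort files named p-<num>_n-<num>_a-<num>.<ext> into p-<num>/n-<num>/.
# Assumption: GNU awk is available as `gawk` (three-argument match() needs it).
# `sort_files` is a hypothetical name; pass a directory, default is ".".

sort_files() {
    local dir=${1:-.} file name p n
    for file in "$dir"/p-*; do          # shell glob, not the output of `ls`
        [ -f "$file" ] || continue      # skip directories and an unmatched glob
        name=${file##*/}                # strip the leading directory part

        # First level: the numerals between "p-" and "_n".
        p=$(printf '%s\n' "$name" |
            gawk '{ if (match($0, /^p-([[:digit:]]+)_n/, fields)) print fields[1] }')
        # Second level: the numerals between "n-" and "_a".
        n=$(printf '%s\n' "$name" |
            gawk '{ if (match($0, /_n-([[:digit:]]+)_a/, fields)) print fields[1] }')

        # Leave files that do not follow the naming scheme where they are.
        [ -n "$p" ] && [ -n "$n" ] || continue

        # Create each level only if it is not there yet, then move the file.
        [ -d "$dir/p-$p" ] || mkdir "$dir/p-$p"
        [ -d "$dir/p-$p/n-$n" ] || mkdir "$dir/p-$p/n-$n"
        mv -- "$file" "$dir/p-$p/n-$n/"
    done
}
```

Calling `sort_files .` from inside the directory holding the files (or `sort_files /path/to/files`) would perform the move. Try it on a copy of the data first.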
Once you are more comfortable with scripting and regular expressions, you can make this much more compact; in the above script, for example, the generation of the directory/subdirectory path could easily be contracted to just one `awk` call.

For one, since the directory names are actually `p-<number>` and `n-<number>`, the same as in your filename, we could have let `awk` do the work of extracting these characters for us, too, by writing `match($1,"(^p-[[:digit:]]+)_(n-[[:digit:]]+)_[[:print:]]*",fields)`.
We can further offload work to `awk` by having it generate the directory-subdirectory path at the same time with a suitable argument of `print`, which would readily yield (e.g.) `p-12345/n-384` for file `p-12345_n-384_a-583.pdf`. If we combine that with the usage of `mkdir -p` as indicated by @wurtel, the whole script can be made considerably more compact.
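A minimal sketch of such a compact version, under the same assumptions as before: GNU awk installed as `gawk`, and a made-up function name (`organise`) with an optional directory argument. The `print fields[1] "/" fields[2]` argument is one plausible way to build the path and is an assumption, not necessarily the original wording.

```shell
#!/usr/bin/env bash
# Compact sketch: one gawk call builds the relative path, mkdir -p creates it.
# Assumption: GNU awk is available as `gawk`; `organise` is a hypothetical name.

organise() {
    local dir=${1:-.} file path
    for file in "$dir"/p-*; do
        [ -f "$file" ] || continue       # skip directories and an unmatched glob
        # Build e.g. "p-12345/n-384" from "p-12345_n-384_a-583.pdf".
        path=$(printf '%s\n' "${file##*/}" |
            gawk '{ if (match($1, "(^p-[[:digit:]]+)_(n-[[:digit:]]+)_[[:print:]]*", fields))
                        print fields[1] "/" fields[2] }')
        [ -n "$path" ] || continue       # name does not follow the scheme
        mkdir -p "$dir/$path"            # creates both levels; no error if they exist
        mv -- "$file" "$dir/$path/"
    done
}
```

Since `mkdir -p` creates missing parent directories and does not complain about existing ones, the function can be run again later when more files arrive.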