Removing files based on MIME types in Linux

find grep mime-types rm

I'm quite a beginner to Linux and I'm having trouble removing files based on the mime type. Currently, I have a set of files on my Linux machine, and below are the types for a small subset.

0FiTahKc: M3U playlist, ASCII text, with very long lines, with CRLF line terminators
0FJsEpTc: ASCII text, with CRLF line terminators
0fKPkmwe: ASCII text, with CRLF line terminators
0FLR6MWB: ASCII text
0FMa2xL2: C source, ASCII text, with CRLF line terminators
0fN8DDbf: exported SGML document, ASCII text, with very long lines, with no line terminators
0fSM3YyG: ASCII text, with CRLF line terminators
0fTXKtZD: UTF-8 Unicode text, with CRLF line terminators
0FUcusxr: ASCII text, with CRLF line terminators

I listed the different types of files in my directory; the output is below:

$ find -type f -exec file {} \; | sed 's/^.*: //' | sort -u

ASCII text
ASCII text, with CRLF line terminators
ASCII text, with no line terminators
ASCII text, with very long lines, with CRLF line terminators
ASCII text, with very long lines, with no line terminators
C source, ASCII text, with CRLF line terminators
exported SGML document, ASCII text, with very long lines, with no line terminators
M3U playlist, ASCII text, with very long lines, with CRLF line terminators
M3U playlist, UTF-8 Unicode text, with CRLF line terminators
UTF-8 Unicode text, with CRLF line terminators

I want to match types such as 'C source', 'M3U playlist' and 'SGML' (with grep or find) and delete those files from the directory, keeping only the plain ASCII types shown in the first 5 lines. I'm looking for a command or script to which I can pass a list of these file types and have the matching files removed.

Best Answer

Piece some tools together into a single line:

  • Use find and file to list every file's type (as shown in your question).
  • Use awk to filter that list based on the type.
  • Use xargs to take that filtered list and rm each file.

I recommend you use echo to prevent rm doing anything first. This will dry-run the command so you can check which files it will remove!

For example, to remove files identified as "C source":

find . -type f -exec file {} + | awk -F: '$(NF) ~ "C source" {print $1}' | xargs echo rm

Then run the same line removing echo to actually remove the files.
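
One caveat with the pipeline above: xargs splits its input on whitespace, so file names containing spaces or quotes may be mangled. As a minimal sketch of a more defensive variant, the function below lets find hand the names directly to a small sh loop. The name remove_by_type and its "delete" argument are hypothetical, introduced here only for illustration; the pattern is treated as a plain substring of the description printed by file -b.

```shell
# remove_by_type DIR PATTERN [delete]   (hypothetical helper, not a standard tool)
#   DIR     directory to search recursively
#   PATTERN literal substring to look for in the output of `file -b`
#   delete  pass this word as the third argument to really remove files;
#           without it the function only prints what it would remove
remove_by_type() {
    find "$1" -type f -exec sh -c '
        pattern=$1 mode=$2
        shift 2
        for f in "$@"; do
            # -b prints the description only, without the "name:" prefix
            desc=$(file -b "$f")
            case $desc in
                *"$pattern"*)
                    if [ "$mode" = delete ]; then
                        rm -- "$f"
                    else
                        printf "would remove: %s (%s)\n" "$f" "$desc"
                    fi
                    ;;
            esac
        done
    ' sh "$2" "${3:-dry-run}" {} +
}
```

Run remove_by_type . "C source" first to review the list, then remove_by_type . "C source" delete once it looks right.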


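To explain the use of awk for filtering: -F: splits each line on colons, so $1 is the file name and $(NF) is the last field, i.e. the description that file prints after the final colon (this assumes the file names themselves contain no colons). The clause $(NF) ~ "C source" therefore matches any line whose description contains C source. The condition can be extended as you like; for example, to match either C source or M3U playlist: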

$(NF) ~ "C source" || $(NF) ~ "M3U playlist"

Example:

find . -type f -exec file {} + | awk -F: '$(NF) ~ "C source" || $(NF) ~ "M3U playlist" {print $1}' | xargs echo rm
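
When the list of unwanted types grows, chaining || conditions gets unwieldy. One possible shortcut is a single regular expression with alternation. The sketch below runs that filter on sample file output copied from the question (prefixed with ./ as find would print it), so it can be tried without touching any real files; the variable names sample and matches are just for illustration.

```shell
# Sample `file` output, taken from the listing in the question.
sample='./0FiTahKc: M3U playlist, ASCII text, with very long lines, with CRLF line terminators
./0FLR6MWB: ASCII text
./0FMa2xL2: C source, ASCII text, with CRLF line terminators
./0fN8DDbf: exported SGML document, ASCII text, with very long lines, with no line terminators'

# Print the name of every file whose description matches any alternative.
matches=$(printf '%s\n' "$sample" |
    awk -F: '$NF ~ /C source|M3U playlist|SGML/ {print $1}')

printf '%s\n' "$matches"
```

Only the plain "ASCII text" entry is left out of the result. The same awk condition can be dropped into the find pipeline in place of the || version.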