Ubuntu – File list command line (hidden and subfolders)

command line

I need to get a text list of all the files (hidden and visible) off a hard drive, subfolders included. Ideally the list would have the filename, path, size, and creation (or last modified) date if possible. Can someone please tell me what command I need? Also, is it possible to have this created as a .csv file or something similar for use in Excel?

I am not super proficient with Ubuntu, so an explanation of each part of the command would also be greatly appreciated.

Best Answer

With bash

Assuming the disk in question is mounted under /media/disk1:

$ shopt -s globstar dotglob
$ stat -c '"%n",%s,%y' /media/disk1/**/* >disk1.csv

shopt -s globstar dotglob turns on two bash options. globstar enables recursive globbing, so that ** matches any number of nested directories (see https://unix.stackexchange.com/questions/49913/recursive-glob). dotglob makes globs also match names starting with a ., also known as hidden files.
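To see what the two options change, here is a small sketch that builds a throwaway directory tree (the names are made up for the demo, and bash 4 or later is assumed) and expands the same pattern with the options off and then on:

```shell
#!/usr/bin/env bash
# Throwaway tree for the demo; these names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/sub/deeper"
touch "$demo/visible.txt" "$demo/.hidden" "$demo/sub/deeper/file.txt"

# Options off: ** behaves like a plain * and dotfiles are skipped,
# so only $demo/sub/deeper matches */*.
shopt -u globstar dotglob
before=( "$demo"/**/* )

# Options on: matches at any depth, dotfiles included (5 entries here).
shopt -s globstar dotglob
after=( "$demo"/**/* )

printf 'before: %s\n' "${before[@]#"$demo"/}"
printf 'after:  %s\n' "${after[@]#"$demo"/}"

shopt -u globstar dotglob   # restore the defaults
rm -rf "$demo"
```

With the options on, the list contains .hidden, sub, sub/deeper, sub/deeper/file.txt and visible.txt; the exact order depends on your locale's sorting rules.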

stat is the program that prints file metadata. Here bash starts it once and passes every name found on the disk to it as a separate argument.

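-c '"%n",%s,%y' specifies the output format for stat. %n is the file name; the double quotes around it are literal characters, so a comma inside a file name does not split the CSV column. %s is the file size in bytes and %y is the time of last modification. The commas between the fields separate the CSV columns. (See stat --help.)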
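A quick way to check the format string is to run it on a file whose size you already know. This sketch assumes GNU coreutils stat (the default on Ubuntu; BSD/macOS stat uses different options) and uses a made-up file name containing a comma:

```shell
#!/usr/bin/env bash
# A file with known contents: 'hello' is 5 bytes. The comma in the name is
# deliberate, to show why %n is wrapped in literal double quotes.
tmp=$(mktemp -d)
printf 'hello' > "$tmp/a,b.txt"

# Same format string as in the answer.
line=$(stat -c '"%n",%s,%y' "$tmp/a,b.txt")
echo "$line"
# The row starts with "<tmp dir>/a,b.txt",5, followed by the modification time.

rm -rf "$tmp"
```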

/media/disk1/**/* tells bash to pass all the names found recursively under that path to the program (stat), both normal and hidden ones, since dotglob is enabled. Note that directories match too, so they appear in the list alongside the files.
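Because bash expands the pattern before stat starts, you can preview exactly which arguments stat would receive by storing the expansion in an array instead of running stat. A minimal sketch, with a temporary directory standing in for /media/disk1:

```shell
#!/usr/bin/env bash
shopt -s globstar dotglob
root=$(mktemp -d)   # stand-in for /media/disk1 in this demo
mkdir -p "$root/photos"
touch "$root/.config" "$root/photos/img1.jpg" "$root/photos/img2.jpg"

# The glob becomes a plain list of words; stat would get all of them at once.
args=( "$root"/**/* )
echo "stat would be started once with ${#args[@]} arguments:"
printf '  %s\n' "${args[@]}"
# The four entries are .config, photos (a directory), and the two .jpg files.

rm -rf "$root"
```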

>disk1.csv redirects output into a file named disk1.csv.
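If you want column titles when opening the file in a spreadsheet, one option is to group a header line and the stat output so that a single redirection writes both. A sketch under the same assumptions (GNU stat, a temporary directory instead of /media/disk1; the header names are just a suggestion):

```shell
#!/usr/bin/env bash
shopt -s globstar dotglob
root=$(mktemp -d)      # stand-in for /media/disk1
touch "$root/one" "$root/.two"

# { ...; } groups both commands, so one redirection writes the header
# line followed by stat's rows into the same file. The CSV is written
# outside $root so it does not show up in its own listing.
{
  echo '"name",size,modified'
  stat -c '"%n",%s,%y' "$root"/**/*
} > "$root.csv"

cat "$root.csv"
```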

For instance, run on my home directory (printing to the terminal instead of disk1.csv), the command produces output like this:

$ stat -c '"%n",%s,%y' /home/seb/**/*
"/home/seb/111",82,2018-03-26 18:38:04.048099912 +0200
"/home/seb/app",4096,2017-07-13 23:39:06.509862769 +0200
"/home/seb/Applications",4096,2018-03-14 20:20:48.552005660 +0100
"/home/seb/Applications/arduino-1.8.2",4096,2017-05-29 20:45:01.184017517 +0200
"/home/seb/Applications/arduino-1.8.2/arduino",946,2017-03-22 13:32:41.000000000 +0100
[...]

I tested importing the resulting CSV into LibreOffice Calc and it worked nicely, even with odd file names containing line breaks. It will probably choke on file names that contain double quotes, since stat does not escape them.
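If your disk does contain names with double quotes, one possible workaround is to build each row in bash and apply the usual CSV escaping rule of doubling every " inside a quoted field. This is a sketch, not part of the original answer: csv_rows is a hypothetical helper name, and it runs stat once per file, so it is much slower on large disks:

```shell
#!/usr/bin/env bash
shopt -s globstar dotglob nullglob

# Hypothetical helper: one CSV row per path under $1, with " doubled to "".
csv_rows() {
  local f name
  for f in "$1"/**/*; do
    name=${f//\"/\"\"}            # replace every " with ""
    printf '"%s",%s\n' "$name" "$(stat -c '%s,%y' -- "$f")"
  done
}

root=$(mktemp -d)                 # stand-in for /media/disk1
printf 'abc' > "$root/say \"hi\".txt"
csv_rows "$root"
# e.g. "<tmp dir>/say ""hi"".txt",3,<modification time>
```

Usage on the real disk would be csv_rows /media/disk1 >disk1.csv.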

ARG_MAX

The above command will fail (bash reports "Argument list too long") if the number of files, or the total length of all file names, exceeds the limit the kernel places on the arguments passed to a single program. For small drives (USB thumb drives etc.) it should be fine, but if you are indexing a big disk with millions of files you will probably hit that limit.
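To get a feel for how close you are, you can compare the limit your system reports with the size of the expanded glob. This is only a rough sketch: the real kernel check also counts the environment and per-argument overhead, and /media/disk1 is the example mount point from above. printf is a bash builtin, so measuring the expansion this way does not itself run into the limit:

```shell
#!/usr/bin/env bash
# Limit (in bytes) reported for arguments plus environment of a new program.
limit=$(getconf ARG_MAX)
echo "ARG_MAX: $limit bytes"

# Rough size of what the glob would hand to stat: every name plus a NUL.
# With nullglob, a missing mount point expands to nothing.
shopt -s globstar dotglob nullglob
needed=$(printf '%s\0' /media/disk1/**/* | wc -c)
echo "glob expansion needs about $needed bytes"
```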

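You can run the following instead. It produces rows in the same format (and uses less memory), because xargs splits the file list into batches that fit within the limit and runs stat once per batch. Unlike the glob version, -type f restricts the list to regular files, so directories are left out: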

$ find /media/disk1 -type f -print0 | xargs -0 stat -c '"%n",%s,%y' >disk1.csv
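The following sketch (GNU find and xargs assumed, a temporary directory standing in for /media/disk1, made-up file names) shows that names with spaces, leading dots and even line breaks come through intact, and what changes when -type f is dropped:

```shell
#!/usr/bin/env bash
root=$(mktemp -d)   # stand-in for /media/disk1
mkdir "$root/dir with spaces"
printf 'x'  > "$root/dir with spaces/.hidden"
printf 'yy' > "$root"/$'line\nbreak'

# -print0 ends each path with a NUL byte and xargs -0 splits on NUL only,
# so spaces and newlines in names survive; find includes dotfiles by default.
find "$root" -type f -print0 | xargs -0 stat -c '"%n",%s,%y' > "$root.csv"

# Without -type f, directories (including $root itself) are listed as well,
# which is closer to what the globstar version produced.
find "$root" -print0 | xargs -0 stat -c '"%n",%s,%y' > "$root-all.csv"

cat "$root.csv"
```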

Many answers here already explain the "find .. -print0 | xargs -0 .." pattern, e.g. Difference between "xargs" and command substitution?
