I have been using ls -Rlh /path/to/directory > file
to create some text file records of what's in some hard drives.
I want to delete some strings from the text files after they've been created.
An example of part of a text file is:
external1:
total 36K
drwxrwxr-x 2 emma emma 4.0K Oct 31 01:29 dir1
drwxrwxr-x 2 emma emma 12K Oct 31 01:29 dir2
drwxrwxr-x 2 emma emma 20K Oct 31 01:29 dir3
external1/dir1:
total 4.5M
-rw-rw-r-- 1 emma emma 769K Oct 31 01:12 a001.jpg
-rw-rw-r-- 1 emma emma 698K Oct 31 01:12 a002.jpg
-rw-rw-r-- 1 emma emma 755K Oct 31 01:12 a003.jpg
-rw-rw-r-- 1 emma emma 656K Oct 31 01:12 a004.jpg
-rw-rw-r-- 1 emma emma 756K Oct 31 01:12 a005.jpg
-rw-rw-r-- 1 emma emma 498K Oct 31 01:12 a006.jpg
-rw-rw-r-- 1 emma emma 455K Oct 31 01:12 a007.jpg
external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma 952K Oct 31 01:13 a001.jpg
-rw-rw-r-- 1 emma emma 891K Oct 31 01:13 a002.jpg
-rw-rw-r-- 1 emma emma 838K Oct 31 01:13 a003.jpg
-rw-rw-r-- 1 emma emma 846K Oct 31 01:13 a004.jpg
-rw-rw-r-- 1 emma emma 876K Oct 31 01:13 a005.jpg
-rw-rw-r-- 1 emma emma 834K Oct 31 01:13 a006.jpg
-rw-rw-r-- 1 emma emma 946K Oct 31 01:13 a007.jpg
-rw-rw-r-- 1 emma emma 709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 emma emma 1007K Oct 31 01:13 a009.jpg
-rw-rw-r-- 1 emma emma 940K Oct 31 01:13 a010.jpg
external1/dir3:
total 4.6M
-rw-rw-r-- 1 emma emma 408K Oct 31 01:15 a001.jpg
-rw-rw-r-- 1 emma emma 525K Oct 31 01:15 a002.jpg
-rw-rw-r-- 1 emma emma 383K Oct 31 01:15 a003.jpg
-rw-rw-r-- 1 emma emma 512K Oct 31 01:15 a004.jpg
-rw-rw-r-- 1 emma emma 531K Oct 31 01:15 a005.jpg
-rw-rw-r-- 1 emma emma 532K Oct 31 01:15 a006.jpg
-rw-rw-r-- 1 emma emma 400K Oct 31 01:15 a007.jpg
-rw-rw-r-- 1 emma emma 470K Oct 31 01:15 a008.jpg
-rw-rw-r-- 1 emma emma 407K Oct 31 01:15 a009.jpg
-rw-rw-r-- 1 emma emma 470K Oct 31 01:15 a010.jpg
The actual text files are thousands of lines long and several megabytes in size.
What I want to do is delete everything before the file size from each applicable line, so that each line starts with the file size. E.g.
512K Oct 31 01:15 a004.jpg
531K Oct 31 01:15 a005.jpg
532K Oct 31 01:15 a006.jpg
400K Oct 31 01:15 a007.jpg
470K Oct 31 01:15 a008.jpg
However, I want to keep all of the other lines (with the directory names and total sizes) intact, which means that I can't use colrm or cut.
Best Answer
Parsing the output of ls is unreliable, but in this particular case a sed substitution should work: delete everything up to and including "emma emma " on each line. If that string doesn't appear on a line, the line is left unchanged.
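The command itself is not shown here, so the following is only a minimal sketch of such a substitution. The function name strip_owner and the sample listing are made up for illustration, and the sample assumes that ls padded the size column with spaces.

```shell
# Hypothetical sample listing; in practice this is the file created by ls -Rlh.
listing=$(mktemp)
cat > "$listing" <<'EOF'
external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 emma emma 1007K Oct 31 01:13 a009.jpg
EOF

# strip_owner is an illustrative name, not part of any tool.
# It deletes everything from the start of the line through "emma emma "
# (including exactly one trailing space). Lines that do not contain
# that string are printed unchanged.
strip_owner() {
    sed -e 's/^.*emma emma //' "$1"
}

strip_owner "$listing"
```

Because `.*` is greedy, a file name that itself contains "emma emma " would be cut too far; that is one more reason this only suits this particular file.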
I've written the regexp to remove only the first space after emma, so that the size field remains right-aligned (e.g. ' 709K' and '1007K' both take up the same number of characters on the line).
If you don't want that, make the regexp match every space after emma instead. That will delete all whitespace after emma up to the start of the next field.
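As a sketch of that variant (again with a made-up function name and sample data), adding `*` after the final space in the pattern makes it consume the whole run of spaces:

```shell
# Hypothetical sample listing with a padded size column.
listing=$(mktemp)
cat > "$listing" <<'EOF'
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 emma emma 1007K Oct 31 01:13 a009.jpg
EOF

# strip_owner_and_padding is an illustrative name.
# " *" after the second "emma" matches zero or more spaces, so the
# size field ends up at the very start of the line (left-aligned).
strip_owner_and_padding() {
    sed -e 's/^.*emma emma *//' "$1"
}

strip_owner_and_padding "$listing"
```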
A sed version can also be written to work with any user and group. It relies even more heavily on the exact format of your ls output, so it is technically even worse than the first version, but it should work for your particular file. See "Why *not* parse `ls`?" for info on why parsing ls is bad.
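One possible shape for an owner-independent pattern is sketched below. This is an assumption about how such a regexp could look, not necessarily the exact command meant above: it matches the permission string, the link count, and two arbitrary non-space words (owner and group), and it uses sed -E for extended regular expressions. The function name and sample data are hypothetical.

```shell
# Hypothetical sample listing with two different owners.
listing=$(mktemp)
cat > "$listing" <<'EOF'
external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-r--r-- 1 root root 1007K Oct 31 01:13 a009.jpg
drwxrwxr-x 2 emma emma  4.0K Oct 31 01:29 dir1
EOF

# strip_any_owner is an illustrative name.
# Pattern, left to right:
#   [-bcdlps][-rwxsStT]{9}  file type plus nine permission characters
#   [.+@]?                  optional SELinux/ACL/extended-attribute marker
#    +[0-9]+                link count
#    +[^ ]+ +[^ ]+          owner and group (assumed to contain no spaces)
#   one literal space       only a single space is removed after the group
strip_any_owner() {
    sed -E 's/^[-bcdlps][-rwxsStT]{9}[.+@]? +[0-9]+ +[^ ]+ +[^ ]+ //' "$1"
}

strip_any_owner "$listing"
```

Anchoring on the permission string keeps the "total" and directory-name lines from matching. If group names differ in length, ls pads the group column, and that padding would remain in front of the size field.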
If not all files are owned by emma, you might want to use an awk script instead. For lines with more than 2 fields, it prints only fields 5-9; for lines with fewer than 3 fields, it prints the entire line. Unfortunately, this loses the right-alignment of the size field. That can be fixed with a slightly more complicated awk script.
A final version merges the for loop from jasonwryan's answer, so it copes with filenames that have any number of single spaces in them (but not consecutive spaces, as mentioned by G-Man).
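The three awk variants described above could look roughly like the sketch below. The function names, the sample listing, and the choice of a 5-character width for the size field are assumptions made for illustration.

```shell
# Hypothetical sample listing; the last file name contains single spaces.
listing=$(mktemp)
cat > "$listing" <<'EOF'
external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 root root 1007K Oct 31 01:13 my holiday photo.jpg
EOF

# Variant 1: print fields 5-9 on long lines, whole line otherwise.
# The size is no longer right-aligned, and names with spaces are cut off.
fields_only() {
    awk 'NF > 2 { print $5, $6, $7, $8, $9; next } { print }' "$1"
}

# Variant 2: right-align the size in a 5-character column (assumed width).
fields_aligned() {
    awk 'NF > 2 { printf "%5s %s %s %s %s\n", $5, $6, $7, $8, $9; next }
         { print }' "$1"
}

# Variant 3: loop over every field from 6 to NF, so file names containing
# single spaces survive. Runs of consecutive spaces collapse to one space.
fields_loop() {
    awk 'NF > 2 {
             printf "%5s", $5
             for (i = 6; i <= NF; i++) printf " %s", $i
             printf "\n"
             next
         }
         { print }' "$1"
}

fields_loop "$listing"
```

All three treat any line with more than 2 fields as a file entry, so a directory header whose name contains two or more spaces would be mangled as well.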