How to delete a fixed part of some lines from a text file

text processing

I have been using ls -Rlh /path/to/directory > file to create some text file records of what's in some hard drives.

I want to delete some strings from the text files after they've been created.

An example of part of a text file is:

external1:
total 36K
drwxrwxr-x 2 emma emma 4.0K Oct 31 01:29 dir1
drwxrwxr-x 2 emma emma  12K Oct 31 01:29 dir2
drwxrwxr-x 2 emma emma  20K Oct 31 01:29 dir3

external1/dir1:
total 4.5M
-rw-rw-r-- 1 emma emma 769K Oct 31 01:12 a001.jpg
-rw-rw-r-- 1 emma emma 698K Oct 31 01:12 a002.jpg
-rw-rw-r-- 1 emma emma 755K Oct 31 01:12 a003.jpg
-rw-rw-r-- 1 emma emma 656K Oct 31 01:12 a004.jpg
-rw-rw-r-- 1 emma emma 756K Oct 31 01:12 a005.jpg
-rw-rw-r-- 1 emma emma 498K Oct 31 01:12 a006.jpg
-rw-rw-r-- 1 emma emma 455K Oct 31 01:12 a007.jpg

external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  952K Oct 31 01:13 a001.jpg
-rw-rw-r-- 1 emma emma  891K Oct 31 01:13 a002.jpg
-rw-rw-r-- 1 emma emma  838K Oct 31 01:13 a003.jpg
-rw-rw-r-- 1 emma emma  846K Oct 31 01:13 a004.jpg
-rw-rw-r-- 1 emma emma  876K Oct 31 01:13 a005.jpg
-rw-rw-r-- 1 emma emma  834K Oct 31 01:13 a006.jpg
-rw-rw-r-- 1 emma emma  946K Oct 31 01:13 a007.jpg
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 emma emma 1007K Oct 31 01:13 a009.jpg
-rw-rw-r-- 1 emma emma  940K Oct 31 01:13 a010.jpg

external1/dir3:
total 4.6M
-rw-rw-r-- 1 emma emma 408K Oct 31 01:15 a001.jpg
-rw-rw-r-- 1 emma emma 525K Oct 31 01:15 a002.jpg
-rw-rw-r-- 1 emma emma 383K Oct 31 01:15 a003.jpg
-rw-rw-r-- 1 emma emma 512K Oct 31 01:15 a004.jpg
-rw-rw-r-- 1 emma emma 531K Oct 31 01:15 a005.jpg
-rw-rw-r-- 1 emma emma 532K Oct 31 01:15 a006.jpg
-rw-rw-r-- 1 emma emma 400K Oct 31 01:15 a007.jpg
-rw-rw-r-- 1 emma emma 470K Oct 31 01:15 a008.jpg
-rw-rw-r-- 1 emma emma 407K Oct 31 01:15 a009.jpg
-rw-rw-r-- 1 emma emma 470K Oct 31 01:15 a010.jpg

The actual text files are thousands of lines long and several megabytes in size.

What I want to do is delete everything before the file size from each applicable line, so that each line starts with the file size. E.g.

512K Oct 31 01:15 a004.jpg
531K Oct 31 01:15 a005.jpg
532K Oct 31 01:15 a006.jpg
400K Oct 31 01:15 a007.jpg
470K Oct 31 01:15 a008.jpg

However, I want to keep all of the other lines (with the directory names and total sizes) intact, so this means that I can't use colrm or cut.

Best Answer

Parsing the output of ls is unreliable, but this should work in this particular case:

sed -e 's/^.*emma emma //' file

That deletes everything up to and including "emma emma " on each line. If that string doesn't appear on a line, the line is left unchanged.
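
As a quick check, here is a minimal sketch that runs the command on a few lines taken from the question. The variable names sample and result are only for illustration, and the data is fed through a pipe rather than read from a file.

```shell
# A few lines copied from the question's listing (illustrative sample only).
sample='external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 emma emma 1007K Oct 31 01:13 a009.jpg'

# Strip everything up to and including "emma emma " where it occurs.
result=$(printf '%s\n' "$sample" | sed -e 's/^.*emma emma //')
printf '%s\n' "$result"
# Prints:
# external1/dir2:
# total 8.7M
#  709K Oct 31 01:13 a008.jpg
# 1007K Oct 31 01:13 a009.jpg
```

The header and total lines pass through untouched because they do not contain the matched string.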

I've written the regexp to remove only the first space after emma, so that the size field remains right-aligned (e.g. ' 709K' and '1007K' both take up the same number of characters on the line).

If you don't want that, use this instead:

sed -e 's/^.*emma emma  *//' file

That deletes all of the whitespace after emma, up to the start of the next field.

Here's a sed version that works with any user and group:

sed -e 's/^.\{10\} [0-9]\+ [^ ]\+ [^ ]\+ //' file

It relies even more heavily on the exact format of your ls output, so it is technically even worse than the first version, but it should work for your particular file.
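
One caveat: \+ is a GNU extension in sed's basic regular expressions, so that command may not behave the same with other sed implementations. A sketch of a more portable spelling, assuming the same ls -l layout, writes each X\+ as XX* instead. The root/wheel line below is made-up sample data, and the variable names are only for illustration.

```shell
# Sample input; the second file line uses a hypothetical owner and group.
sample='external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-r--r-- 1 root wheel 1007K Oct 31 01:13 a009.jpg'

# 10 permission characters, link count, owner, group, each followed by
# exactly one space. [0-9][0-9]* means "one or more digits" without \+.
result=$(printf '%s\n' "$sample" |
    sed -e 's/^.\{10\} [0-9][0-9]* [^ ][^ ]* [^ ][^ ]* //')
printf '%s\n' "$result"
# Prints:
# external1/dir2:
# total 8.7M
#  709K Oct 31 01:13 a008.jpg
# 1007K Oct 31 01:13 a009.jpg
```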

See Why *not* parse `ls`? for info on why parsing ls is bad.


If not all files are owned by emma, you might want to use an awk script like this instead.

awk 'NF>2 {print $5,$6,$7,$8,$9} ; NF<3 {print}' file

For lines with more than 2 fields, it prints only fields 5-9. For lines with fewer than 3 fields, it prints the entire line. Unfortunately, this loses the right-alignment of the size field. That can be fixed with a slightly more complicated awk script:

awk 'NF>2 {printf "%5s %s %s %s %s\n", $5, $6, $7, $8, $9} ; NF<3 {print}' file 
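
To make the difference concrete, here is a small sketch that runs both awk commands on the same lines from the question. The variable names are illustrative. Note that the %5s width assumes the size field is never wider than five characters, which holds for the sample shown but is an assumption about the real files.

```shell
sample='total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 a008.jpg
-rw-rw-r-- 1 emma emma 1007K Oct 31 01:13 a009.jpg'

# Plain print: fields are rejoined with single spaces, so padding is lost.
plain=$(printf '%s\n' "$sample" |
    awk 'NF>2 {print $5,$6,$7,$8,$9} ; NF<3 {print}')

# printf with %5s: the size is right-aligned in a 5-character column.
aligned=$(printf '%s\n' "$sample" |
    awk 'NF>2 {printf "%5s %s %s %s %s\n", $5, $6, $7, $8, $9} ; NF<3 {print}')

printf '%s\n' "$plain"
# total 8.7M
# 709K Oct 31 01:13 a008.jpg
# 1007K Oct 31 01:13 a009.jpg

printf '%s\n' "$aligned"
# total 8.7M
#  709K Oct 31 01:13 a008.jpg
# 1007K Oct 31 01:13 a009.jpg
```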

This final version merges in the for loop from jasonwryan's answer, so it copes with filenames that contain any number of single spaces (but not consecutive spaces, as mentioned by G-Man, because awk's default field splitting collapses each run of whitespace into a single separator):

awk 'NF>2 {printf "%5s", $5; for(i=6;i<=NF;i++){printf " %s", $i}; printf "\n"} ; NF<3 {print}' file 
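
A short sketch of how that behaves, using made-up filenames (the names with spaces below are not from the original listing): single spaces in a name survive, while a run of two spaces comes out as one.

```shell
# Hypothetical listing lines with spaces in the filenames.
sample='external1/dir2:
total 8.7M
-rw-rw-r-- 1 emma emma  709K Oct 31 01:13 my holiday photo.jpg
-rw-rw-r-- 1 emma emma  838K Oct 31 01:13 two  spaces.jpg'

result=$(printf '%s\n' "$sample" |
    awk 'NF>2 {printf "%5s", $5; for(i=6;i<=NF;i++){printf " %s", $i}; printf "\n"} ; NF<3 {print}')
printf '%s\n' "$result"
# Prints:
# external1/dir2:
# total 8.7M
#  709K Oct 31 01:13 my holiday photo.jpg
#  838K Oct 31 01:13 two spaces.jpg
```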