Lum – Easily get a particular column from output without sed or awk

awkcolumnssedtext processing

Is there a quicker way of getting a couple of column of values than futzing with sed and awk?

For instance, if I have the output of ls -hal / and I want to just get the file and directory names and sizes, how can I easily and quickly doing that, without having to spend several minutes tweaking my command.

total 16078
drwxr-xr-x    33 root  wheel   1.2K Aug 13 16:57 .
drwxr-xr-x    33 root  wheel   1.2K Aug 13 16:57 ..
-rw-rw-r--     1 root  admin    15K Aug 14 00:41 .DS_Store
d--x--x--x     8 root  wheel   272B Jun 20 16:40 .DocumentRevisions-V100
drwxr-xr-x+    3 root  wheel   102B Mar 27 12:26 .MobileBackups
drwx------     5 root  wheel   170B Jun 20 15:56 .Spotlight-V100
d-wx-wx-wt     2 root  wheel    68B Mar 27 12:26 .Trashes
drwxrwxrwx     4 root  wheel   136B Mar 30 20:00 .bzvol
srwxrwxrwx     1 root  wheel     0B Aug 13 16:57 .dbfseventsd
----------     1 root  admin     0B Aug 16  2012 .file
drwx------  1275 root  wheel    42K Aug 14 00:05 .fseventsd
drwxr-xr-x@    2 root  wheel    68B Jun 20  2012 .vol
drwxrwxr-x+  289 root  admin   9.6K Aug 13 10:29 Applications
drwxrwxr-x     7 root  admin   238B Mar  5 20:47 Developer
drwxr-xr-x+   69 root  wheel   2.3K Aug 12 21:36 Library
drwxr-xr-x@    2 root  wheel    68B Aug 16  2012 Network
drwxr-xr-x+    4 root  wheel   136B Mar 27 12:17 System
drwxr-xr-x     6 root  admin   204B Mar 27 12:22 Users
drwxrwxrwt@    6 root  admin   204B Aug 13 23:57 Volumes
drwxr-xr-x@   39 root  wheel   1.3K Jun 20 15:54 bin
drwxrwxr-t@    2 root  admin    68B Aug 16  2012 cores
dr-xr-xr-x     3 root  wheel   4.8K Jul  6 13:08 dev
lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 etc -> private/etc
dr-xr-xr-x     2 root  wheel     1B Aug 12 21:41 home
-rw-r--r--@    1 root  wheel   7.8M May  1 20:57 mach_kernel
dr-xr-xr-x     2 root  wheel     1B Aug 12 21:41 net
drwxr-xr-x@    6 root  wheel   204B Mar 27 12:22 private
drwxr-xr-x@   68 root  wheel   2.3K Jun 20 15:54 sbin
lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 tmp -> private/tmp
drwxr-xr-x@   13 root  wheel   442B Mar 29 23:32 usr
lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 var -> private/var

I realize there are a bazillion options for ls and I could probably do it for this particular example that way, but this is a general problem and I'd like a general solution to getting specific columns easily and quickly.

cut doesn't cut it because it doesn't take a regular expression, and I virtually never have the situation where there's a single space delimiting columns. This would be perfect if it would work:

ls -hal / | cut -d'\s' -f5,9

awk and sed are more general than I want, basically entire languages unto themselves. I have nothing against them, it's just that unless I've recently being doing a lot with them, it requires a pretty sizable mental shift to start thinking in their terms and write something that works. I'm usually in the middle of thinking about some other problem I'm trying to solve and suddenly having to solve a sed/awk problem throws off my focus.

Is there a flexible shortcut to achieving what I want?

Best Answer

I'm not sure why

ls -hal / | awk '{print $5, $9}'

seems to you to be much more disruptive to your thought processes than

ls -hal / | cut -d'\s' -f5,9

would have been, had it worked. Would you really have to write that down? It only takes a few awk lines before adding the {} becomes automatic. (For me the hardest issue is remembering which field number corresponds to which piece of data, but perhaps you don't have that problem.)

You don't have to use all of awk's features; for simply outputing specific columns, you need to know very little awk.

The irritating issue would have been if you'd wanted to output the symlink as well as the filename, or if your filenames might have spaces in them. (Or, worse, newlines). With the hypothetical regex-aware cut, this is not a problem (except for the newlines); you would just replace -f5,9 with -f5,9-. However, there is no awk syntax for "fields 9 through to the end", and you're left with having to remember how to write a for loop.

Here's a little shell script which turns cut-style -f options into an awk program, and then runs the awk program. It needs much better error-checking, but it seems to work. (Added bonus: handles the -d option by passing it to the awk program.)

#!/bin/bash
prog=\{
while getopts f:d: opt; do
  case $opt in
    f) IFS=, read -ra fields <<<"$OPTARG"
       for field in "${fields[@]}"; do
         case $field in
           *-*) low=${field%-*}; high=${field#*-}
                if [[ -z $low  ]]; then low=1; fi
                if [[ -z $high ]]; then high=NF; fi
                ;;
            "") ;;
             *) low=$field; high=$field ;;
         esac
         if [[ $low == $high ]]; then
           prog+='printf "%s ", $'$low';'
         else
           prog+='for (i='$low';i<='$high';++i) printf "%s ", $i;'
         fi
       done
       prog+='printf "\n"}'
       ;;
    d) sep="-F$OPTARG";;
    *) exit 1;;
  esac
done
if [[ -n $sep ]]; then
  awk "$sep" "$prog"
else
  awk "$prog"
fi

Quick test:

$ ls -hal / | ./cut.sh -f5,9-
7.0K bin 
5.0K boot 
4.2K dev 
9.0K etc 
1.0K home 
8.0K host 
33 initrd.img -> /boot/initrd.img-3.2.0-51-generic 
33 initrd.img.old -> /boot/initrd.img-3.2.0-49-generic 
...

Related Solutions

Concatenate lines by first column by awk or sed

To get the first columns in each line using awk you can do the following:

< testfile awk '{print $1}'
aaa
aaa
aaa
www
hhh
hhh

These are your keys for the rest of the lines. So you may create a hash table, using the first column as a key and the second column of the line as the value:

< testfile awk '{table[$1]=table[$1] $2;} END {for (key in table) print key " => " table[key];}'
www => yyy
aaa => bbbNULLbbb
hhh => 111111

To get the whole rest of the line, starting with column 2, you need to collect all columns:

< testfile awk '{line="";for (i = 2; i <= NF; i++) line = line $i " "; table[$1]=table[$1] line;} END {for (key in table) print key " => " table[key];}'
www => yyy hhh NULL NULL NULL NULL 
aaa => bbb ccc ddd NULL NULL NULL NULL NULL NULL NULL NULL NULL bbb ccc    NULL NULL NULL NULL 
hhh => 111 333 yyy ooo hyy uuuioooy 111 333 yyy ooo hyy NULL

Lum – Join every other column with sed or awk

In sed:

sed 's! \([^ ]\+\)\( \|$\)!\1 !g' your_file

This will make the substitutions and print the result to standard out. To modify the file in place, add the -i switch:

sed -i 's! \([^ ]\+\)\( \|$\)!\1 !g' your_file

Explanation

This sed command will look for a space, followed by at least one non-space character, followed by a space or the end of the line. It substitutes this sequence with whatever non-space characters it found followed by a single space. The substitution is applied as many times as possible across the line (this is called a global substitution) because the g modifier is supplied at the end. So, basically, with a sequence like A B C, sed will find the pattern " B " and substitute it with "B " leaving you with AB C as the final result.

Assumptions made by this code

This code assumes the spaces between your columns are really spaces and not TABs for example. This can be easily fixed at the expense of readability:

sed 's![[:blank:]]\+\([^[:blank:]]\+\)\([[:blank:]]\+\|$\)!\1 !g' your_file

Best Answer

Related Solutions

Concatenate lines by first column by awk or sed

Lum – Join every other column with sed or awk

Related Question