text-processing – How to Convert Rows to Columns

awksedtext processing

I have a file that includes details about VMs running in a hypervisor. We run some command and redirect the output to a file. And the is data available in the below format.

Virtual Machine : OL6U5
        ID     : 0004fb00000600003da8ce6948c441bb
        Status : Running
        Memory : 65536
        Uptime : 17835 Minutes
        Server : MyOVS1.vmorld.com
        Pool   : HA-POOL
        HA Mode: false
        VCPU   : 16
        Type   : Xen PVM
        OS     : Oracle Linux 6
Virtual Machine : OL6U6
        ID     : 0004fb00000600003da8ce6948c441bc
        Status : Running
        Memory : 65536
        Uptime : 17565 Minutes
        Server : MyOVS2.vmorld.com
        Pool   : NON-HA-POOL
        HA Mode: false
        VCPU   : 16
        Type   : Xen PVM
        OS     : Oracle Linux 6
Virtual Machine : OL6U7
        ID     : 0004fb00000600003da8ce6948c441bd
        Status : Running
        Memory : 65536
        Uptime : 17835 Minutes
        Server : MyOVS1.vmorld.com
        Pool   : HA-POOL
        HA Mode: false
        VCPU   : 16
        Type   : Xen PVM
        OS     : Oracle Linux 6

This output differs from hypervisor to hypervisor since on some hypervisors we have 50 + vms running. Above file is a just an example from hypervisor where we have only 3 VMs running and hence the redirected file is expected to contain information about several( N number of VMs)

We need to get this details in the below format using awk/sed or with a shell script

Virtual_Machine  ID                                Status   Memory  Uptime  Server              Pool        HA     VCPU  Type     OS
OL6U5            0004fb00000600003da8ce6948c441bb  Running  65536   17835   MyOVS1.vmworld.com  HA-POOL     false  16    Xen PVM  Oracle Linux 6
OL6U6            0004fb00000600003da8ce6948c441bc  Running  65536   17565   MyOVS2.vmworld.com  NON-HA-POOL     false  16    Xen PVM  Oracle Linux 6
OL6U5            0004fb00000600003da8ce6948c441bd  Running  65536   17835   MyOVS1.vmworld.com  HA-POOL     false  16    Xen PVM  Oracle Linux 6

Best Answer

If walking the file twice is not a (big) problem (will store only one line in memory):

awk -F : '{printf("%s\t ", $1)}' infile
echo
awk -F : '{printf("%s\t ", $2)}' infile

Which, for a general count of fields would be (which could have many walks of the file):

#!/bin/bash
rowcount=2
for (( i=1; i<=rowcount; i++ )); do
    awk -v i="$i" -F : '{printf("%s\t ", $i)}' infile
    echo
done

But for a really general transpose, this will work:

awk '$0!~/^$/{    i++;
                  split($0,arr,":");
                  for (j in arr) {
                      out[i,j]=arr[j];
                      if (maxr<j){ maxr=j} # max number of output rows.
                  }
            }
    END {
        maxc=i                             # max number of output columns.
        for     (j=1; j<=maxr; j++) {
            for (i=1; i<=maxc; i++) {
                printf( "%s\t", out[i,j])  # out field separator.
            }
            printf( "%s\n","" )
        }
    }' infile

And to make it pretty (using tab \t as out field separator) :

./script | |column -t -s $'\t'

Virtual_Machine  ID                                Status   Memory  Uptime  Server              Pool     HA     VCPU  Type     OS
OL6U7            0004fb00000600003da8ce6948c441bd  Running  65536   17103   MyOVS1.vmworld.com  HA-POOL  false  16    Xen PVM  Oracle Linux 6

The code above for a general transpose will store the whole matrix in memory.
That could be a problem for really big files.

Update for new text.

To process the new text posted in the question, It seems to me that two pass of awk are the best answer. One pass, as short as fields exist, will print the header field titles. The next awk pass will print only field 2. In both cases, I added a way to remove leading and trailing spaces (for better formatting).

#!/bin/bash
{
awk -F: 'BEGIN{ sl="Virtual Machine"}
         $1~sl && head == 1 { head=0; exit 0}
         $1~sl && head == 0 { head=1; }
         head == 1 {
             gsub(/^[ \t]+/,"",$1);   # remove leading  spaces
             gsub(/[ \t]+$/,"",$1);   # remove trailing spaces
             printf( "%s\t", $1)
         }
         ' infile
#echo
awk -F: 'BEGIN { sl="Virtual Machine"}
         $1~sl { printf( "%s\n", "") }
         {
             gsub(/^[ \t]+/,"",$2);   # remove leading  spaces
             gsub(/[ \t]+$/,"",$2);   # remove trailing spaces
             printf( "%s\t", $2)
         }
         ' infile
echo
} | column -t -s "$(printf '%b' '\t')"

The surrounding { ... } | column -t -s "$(printf '%b' '\t')" is to format the whole table in a pretty way.
Please note that the "$(printf '%b' '\t')" could be replaced with $'\t' in ksh, bash, or zsh.

Explanation

The -l trims newlines from each input line, the -a splits input fields on whitespace into the array @F and the -p prints each input line after applying the script given by -e.

The script itself iterates over each input field (the @F array), saving each as $i. The substitution looks for 2 or more consecutive $i followed by 0 or more spaces and replaces them with $i+.

Best Answer

Update for new text.

Related Solutions

How to use awk or sed to convert rows to columns

Bash – merge duplicate rows in columns

Explanation

Related Question