Bash – Format column-based text file into tree structure using bash

Tags: bash, shell-script, text-formatting, text-processing

Is there a Unix/Linux command that can turn this:

AMERICA USA NEW_YORK    AB-100
AMERICA USA NEW_YORK    VF-200
AMERICA USA NEW_YORK    XY-243
AMERICA USA LOS_ANGELES UH-198
AMERICA CANADA  TORONTO UT-876
AMERICA CANADA  TORONTO UT-877
AMERICA CANADA  VANCOUVER   UT-871
AMERICA CANADA  VANCOUVER   UT-872
AMERICA CANADA  VANCOUVER   UT-873
AMERICA MEXICO  MEXICO  OU-098
AMERICA MEXICO  MONTERREY   OU-099
AMERICA MEXICO  MONTERREY   OU-100
EUROPE  FRANCE  PARIS   IV-122
EUROPE  FRANCE  PARIS   AV-112
EUROPE  FRANCE  PARIS   IF-111
EUROPE  FRANCE  PARIS   XX-190
EUROPE  FRANCE  TOULOUSE    TL-654

Into this:

AMERICA
    USA
        NEW_YORK
            AB-100
            VF-200
            XY-243
        LOS_ANGELES
            UH-198
    CANADA
        TORONTO
            UT-876
            UT-877
        VANCOUVER
            UT-871
            UT-872
            UT-873
    MEXICO
        MEXICO
            OU-098
        MONTERREY
            OU-099
            OU-100
EUROPE
    FRANCE
        PARIS
            IV-122
            AV-112
            IF-111
            XX-190
        TOULOUSE
            TL-654

Best Answer

In GNU awk (gawk 4.0 or later, which supports arrays of arrays), you could do:

$ gawk '{
        # Append each code under its continent/country/city (arrays of
        # arrays are a gawk extension); every entry ends in a newline.
        a[$1][$2][$3] = a[$1][$2][$3] "\t\t\t" $4 "\n"
      }
      END {
        for (cont in a) {
            print cont
            for (country in a[cont]) {
                print "\t" country
                for (city in a[cont][country])
                    printf "\t\t%s\n%s", city, a[cont][country][city]
            }
        }
      }' file
EUROPE
    FRANCE
        TOULOUSE
            TL-654
        PARIS
            IV-122
            AV-112
            IF-111
            XX-190
AMERICA
    USA
        NEW_YORK
            AB-100
            VF-200
            XY-243
        LOS_ANGELES
            UH-198
    CANADA
        VANCOUVER
            UT-871
            UT-872
            UT-873
        TORONTO
            UT-876
            UT-877
    MEXICO
        MEXICO
            OU-098
        MONTERREY
            OU-099
            OU-100
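`for (cont in a)` visits array keys in an unspecified order, which is why the output above starts with EUROPE and lists TOULOUSE before PARIS, unlike the requested tree. When lines that share a continent, country and city are adjacent, as in the sample, a streaming sketch in plain POSIX awk can keep the input order by printing a heading only when its column changes. The function name `tree_format`, the four-space indent and the skipping of short lines are assumptions for illustration.

```shell
# tree_format is a hypothetical helper name, not an existing command.
# It reads "continent country city code" lines from the given files (or
# stdin) and prints them as a tree in input order. Assumption: lines that
# share a continent, country and city are adjacent, as in the sample.
tree_format() {
    awk '
    NF < 4 { next }    # skip blank or short lines (an assumption)
    {
        # Print a heading only when its column differs from the previous
        # line; a new continent or country resets the levels below it.
        if ($1 != cont) { print $1; cont = $1; coun = ""; city = "" }
        if ($2 != coun) { print "    " $2; coun = $2; city = "" }
        if ($3 != city) { print "        " $3; city = $3 }
        print "            " $4
    }' "$@"
}
```

Call it as `tree_format file` or `some_command | tree_format`. Because nothing is stored, memory use stays constant however large the file is; the trade-off is that ungrouped input produces repeated headings.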

In Perl:

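perl -lane '
    # Group each code under its continent, country and city.
    push @{ $k{$F[0]}{$F[1]}{$F[2]} }, "\t\t\t" . $F[3];
    END {
        for my $cont (keys %k) {
            print $cont;
            for my $coun (keys %{ $k{$cont} }) {
                print "\t$coun";
                for my $city (keys %{ $k{$cont}{$coun} }) {
                    # -l appends the final newline after the joined codes.
                    print "\t\t$city\n", join "\n", @{ $k{$cont}{$coun}{$city} };
                }
            }
        }
    }' file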
EUROPE
    FRANCE
        PARIS
            IV-122
            AV-112
            IF-111
            XX-190
        TOULOUSE
            TL-654
AMERICA
    USA
        NEW_YORK
            AB-100
            VF-200
            XY-243
        LOS_ANGELES
            UH-198
    MEXICO
        MONTERREY
            OU-099
            OU-100
        MEXICO
            OU-098
    CANADA
        VANCOUVER
            UT-871
            UT-872
            UT-873
        TORONTO
            UT-876
            UT-877
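Perl's `keys` returns hash keys in no defined order, so this output begins with EUROPE and lists MEXICO before CANADA. If the requested first-seen order matters and the input lines might not be grouped, one option is to number each continent, country and city as it first appears. The sketch below does that in plain POSIX awk, using composite keys joined with `SUBSEP` in place of gawk's arrays of arrays; the function name `tree_format_any_order` and the four-space indent are assumptions for illustration.

```shell
# tree_format_any_order is a hypothetical helper name. It prints the tree
# in first-seen order and does not need lines that share a continent,
# country or city to be adjacent. Plain POSIX awk: keys joined with
# SUBSEP stand in for gawk arrays of arrays.
tree_format_any_order() {
    awk '
    NF < 4 { next }    # skip blank or short lines (an assumption)
    {
        k1 = $1; k2 = k1 SUBSEP $2; k3 = k2 SUBSEP $3
        # Remember each continent, country and city the first time it
        # appears, numbering children 1, 2, ... under their parent key.
        if (!(k1 in seen)) { seen[k1] = 1; conts[++ncont] = $1 }
        if (!(k2 in seen)) { seen[k2] = 1; couns[k1, ++ncoun[k1]] = $2 }
        if (!(k3 in seen)) { seen[k3] = 1; cities[k2, ++ncity[k2]] = $3 }
        codes[k3, ++ncode[k3]] = $4    # codes keep their input order
    }
    END {
        for (i = 1; i <= ncont; i++) {
            cont = conts[i]; print cont
            for (j = 1; j <= ncoun[cont]; j++) {
                coun = couns[cont, j]; print "    " coun
                kc = cont SUBSEP coun
                for (m = 1; m <= ncity[kc]; m++) {
                    city = cities[kc, m]; print "        " city
                    kt = kc SUBSEP city
                    for (p = 1; p <= ncode[kt]; p++)
                        print "            " codes[kt, p]
                }
            }
        }
    }' "$@"
}
```

Run it as `tree_format_any_order file`. Unlike a streaming approach, it holds the whole input in memory until `END`, which is the price of accepting ungrouped lines.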