Sed – How to Parse File to Extract 3-Digit Numbers in Group

awklatexsed

I am writing a shell script to parse a text file that is extracted from a standardization pdf file. I would like for each test group (identified by Group 0, Group 1… to get the list of test numbers, such as 101, 102, 412… for group 0. I have tried sed, awk, but I do not have sufficient skills to succeed. Ideally I would like to get the output translated into LaTeX code, ie each output item surrounded by a suitable string, such as

\section{Group0}
\Testdetails{101}
\Testdetails{102}
...............
\section{Group1}
\Testdetails{305}
................

This is the source file.

                                                Table 6

                       Tests                     EN 2591-                   Remarks

                                                            All models
 Group 0
 Visual examination                                101
 Examination of dimensions and mass                102      To be performed on one pair per layout, in
                                                            sealed and un-sealed versions
 Contact insertion and extraction forces           412      To be performed on one pair per layout, in
                                                            sealed and un-sealed versions
 Measurement of insulation resistance              206      Only specimens of group 6
 Voltage proof test                                207      Only specimens of group 6
 Contact resistance - Low level                    201
 Contact resistance at rated current               202
 Mating and unmating forces                        408      On specimens of groups 2, 4 and 6
 Visual examination                                101
 Group 1
 Rapid change of temperature                       305
 Visual examination                                101
 Interfacial sealing                               324
 Measurement of insulation resistance              206      Immersed connectors
 Voltage proof test                                207      Immersed connectors
 Insert retention in housing (axial)               410
 Contact retention in insert                       409
 Mechanical strength of rear accessories           420
 Contact retention system effectiveness            426
 (removable contact walkout)
 Visual examination                                101
 Group 2
 Contact retention in insert                       409
 Rapid change of temperature                       305

Best Answer

awk '
    $1 == "Group" {printf("\\section{%s%d}\n", $1, $2); next}
    {for (i=1; i<=NF; i++) 
        if ($i ~ /^[0-9][0-9][0-9]$/) {
            printf("\\Testdetails{%d}\n", $i)
            break
        }
    }
' 

Update based on comment:

awk '
    $1 == "Group" {printf("\\section{%s %d}\n", $1, $2); next}
    {
      title = sep = ""
      for (i=1; i<=NF; i++) 
        if ($i ~ /^[0-9][0-9][0-9]$/) {
          printf("\\subsection{%s} \\Testdetails{%d}\n", title, $i)
          break
        }
        else {
          title = title sep $i
          sep = FS
        }
    }
' 
Related Question