AWK Text Processing – How to Select File Name in URL

awk, text processing, url

I have an awk script that works as follows.

Raw data text:

date:
  1.0.1: http://example.com/1.0.1.tgz
  1.0.2: http://example.com/1.0.2.tgz
  1.0.3: http://example.com/1.0.3.tgz
  1.0.4: http://example.com/1.0.4.tgz
  1.0.5: http://example.com/1.0.5.tgz
  1.0.6: http://example.com/1.0.6.tgz
  1.0.7: http://example.com/1.0.7.tgz
  1.0.8: http://example.com/1.0.8.tgz
  1.0.9: http://example.com/1.0.9.tgz
  1.0.10: http://example.com/1.0.10.tgz

The script converts it to an HTML table:

<table>
    <thead>
        <tr>
            <th>ver</th>
            <th>link</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>1.0.1</td>
            <td><a href="http://example.com/1.0.1.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.2</td>
            <td><a href="http://example.com/1.0.2.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.3</td>
            <td><a href="http://example.com/1.0.3.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.4</td>
            <td><a href="http://example.com/1.0.4.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.5</td>
            <td><a href="http://example.com/1.0.5.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.6</td>
            <td><a href="http://example.com/1.0.6.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.7</td>
            <td><a href="http://example.com/1.0.7.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.8</td>
            <td><a href="http://example.com/1.0.8.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.9</td>
            <td><a href="http://example.com/1.0.9.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.10</td>
            <td><a href="http://example.com/1.0.10.tgz">download</a></td>
        </tr>
    </tbody>
</table>

I want to replace the "download" link text in the table with the file name from the URL. How should I modify the script? This is the existing awk code:

#!/usr/bin/awk -f

BEGIN {
    print "<table>"
    print "\t<thead>"
    print "\t\t<tr>"
    print "\t\t\t<th>ver</th>"
    print "\t\t\t<th>link</th>"
    print "\t\t</tr>"
    print "\t</thead>"
    print "\t<tbody>"
}

match($0, /^ +(.*): (.*)$/, r) {
    print "\t\t<tr>"
    printf "\t\t\t<td>%s</td>\n", r[1]
    printf "\t\t\t<td><a href=\"%s\">download</a></td>\n", r[2]
    print "\t\t</tr>"
}

END {
    print "\t</tbody>"
    print "</table>"
}

I am a beginner and would appreciate any suggestions. Thank you in advance!

Best Answer

Add a third capture group to the match() regexp to hold the file name, then print it as the link text. Note that the three-argument form of match() is a GNU awk (gawk) extension:

$ cat tst.awk
BEGIN {
    print "<table>"
    print "\t<thead>"
    print "\t\t<tr>"
    print "\t\t\t<th>ver</th>"
    print "\t\t\t<th>link</th>"
    print "\t\t</tr>"
    print "\t</thead>"
    print "\t<tbody>"
}

match($0, /^ +(.*): (.*\/([^/]+))$/, r) {
    print "\t\t<tr>"
    printf "\t\t\t<td>%s</td>\n", r[1]
    printf "\t\t\t<td><a href=\"%s\">%s</a></td>\n", r[2], r[3]
    print "\t\t</tr>"
}

END {
    print "\t</tbody>"
    print "</table>"
}

$ awk -f tst.awk data.text
<table>
        <thead>
                <tr>
                        <th>ver</th>
                        <th>link</th>
                </tr>
        </thead>
        <tbody>
                <tr>
                        <td>1.0.1</td>
                        <td><a href="http://example.com/1.0.1.tgz">1.0.1.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.2</td>
                        <td><a href="http://example.com/1.0.2.tgz">1.0.2.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.3</td>
                        <td><a href="http://example.com/1.0.3.tgz">1.0.3.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.4</td>
                        <td><a href="http://example.com/1.0.4.tgz">1.0.4.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.5</td>
                        <td><a href="http://example.com/1.0.5.tgz">1.0.5.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.6</td>
                        <td><a href="http://example.com/1.0.6.tgz">1.0.6.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.7</td>
                        <td><a href="http://example.com/1.0.7.tgz">1.0.7.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.8</td>
                        <td><a href="http://example.com/1.0.8.tgz">1.0.8.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.9</td>
                        <td><a href="http://example.com/1.0.9.tgz">1.0.9.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.10</td>
                        <td><a href="http://example.com/1.0.10.tgz">1.0.10.tgz</a></td>
                </tr>
        </tbody>
</table>
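
If gawk is not available, the same result can probably be reached with POSIX awk features only. The following is a minimal sketch, not part of the original answer. It assumes that the version and the URL never contain spaces, so that default field splitting yields them as $1 and $2. The wrapper function name to_table and the two sample input lines are made up for illustration.

```shell
#!/bin/sh
# Sketch: same table as tst.awk, but without the gawk-only
# three-argument match(). Uses default field splitting, sub() and split().
# The function name to_table is hypothetical.
to_table() {
    awk '
    BEGIN {
        print "<table>"
        print "\t<thead>"
        print "\t\t<tr>"
        print "\t\t\t<th>ver</th>"
        print "\t\t\t<th>link</th>"
        print "\t\t</tr>"
        print "\t</thead>"
        print "\t<tbody>"
    }

    # Data lines are indented and look like "  VERSION: URL"
    /^ +[^ ]+: / {
        ver = $1
        sub(/:$/, "", ver)           # drop the trailing colon
        url = $2
        n = split(url, part, "/")    # the last piece is the file name
        print "\t\t<tr>"
        printf "\t\t\t<td>%s</td>\n", ver
        printf "\t\t\t<td><a href=\"%s\">%s</a></td>\n", url, part[n]
        print "\t\t</tr>"
    }

    END {
        print "\t</tbody>"
        print "</table>"
    }' "$@"
}

# Assumed sample input: the header line plus one data line
printf '%s\n' 'date:' '  1.0.1: http://example.com/1.0.1.tgz' | to_table
```

Splitting the URL on "/" and taking the last element plays the role of the third capture group, and the unindented "date:" line is skipped because it does not match the pattern. With a file, the call would be to_table data.text.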