Extracting data from complex file structure

sed, text processing

I have a text file that is a dump from a database, with one field per line. The structure is like this:

1500
29/03/2010 
18
02
09
47
17
45
28.248
0
0.01
130
12.721
7908
298,809
YES
3.046.550,39
6.500.000,00
17,444,222


1501
30/03/2010
27
54
28
50
22
03
37.223
0
0.00
97
22,466
7379
421.90
NO
20,262,429
25,000,000.01
17,995,281.33


... the third record starts here

The database contains 21 fields. The previous lines show the dump of two records from that database. The blank lines represent blank fields in the database.

The first field (F0) is the number you see 1500, 1501…

The second field (F1) is a date in the format day, month, year.

Fields F2, F3, F4, F5, F6, F7 are six integer numbers.

What I need is to extract F0, F2, F3, F4, F5, F6, F7 from this file, creating one comma-separated line for each record.

Given the two records above, the final file would be

1500,18,02,09,47,17,45
1501,27,54,28,50,22,03

I know how to do that with a bash script that would be miles long and iterate over each line, etc. But I also know that Unix is a bag of tricks, especially the sed command, and that this can probably be done with a simple one-liner. I love to learn new stuff, so I ask you guys who are gods in Unix: how do I do that?

I am on OSX Mavericks. Thanks.

Best Answer

Here's one way:

$ perl -000ne '@f=split(/\n/); print join(",",@f[0,2..7]) , "\n"' file.txt  
1500,18,02,09,47,17,45
1501,27,54,28,50,22,03

Explanation:

  • -000 : activates "paragraph mode". It sets perl's input record separator so that one or more blank lines (consecutive newlines) end a record. This means that perl will read each of your records as if it were a single line.

  • @f=split(/\n/); : split the current record on newlines and save the result as the array @f. This array now contains each field from your record, so the array slice @f[0,2..7] will contain field 0 and fields 2 through 7.

  • print join(",",@f[0,2..7]) , "\n" : this will join the array slice with commas, and print the resulting string followed by a newline.
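
If you would rather stay with awk, the same paragraph-mode idea can be sketched as below. This is not part of the perl answer above, just a rough equivalent: the function name extract_fields is made up for illustration, and it assumes, as in your sample, that blank lines only appear between records (a blank field in the middle of a record would split it in two).

```shell
# Hypothetical helper: print F0 and F2..F7 of every record as one CSV line.
# Assumes records are separated by one or more blank lines, as in the sample.
extract_fields() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "," }   # RS="" turns on paragraph mode
         { print $1, $3, $4, $5, $6, $7, $8 }' "$1"  # awk fields are 1-based, so F0 is $1
}
```

You would call it as extract_fields file.txt. Setting RS to the empty string is awk's counterpart of perl's -000, FS="\n" makes each line of the record one field, and OFS="," supplies the commas on output.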
