Ubuntu – How to Export several grep outputs into one file

filesgrepmergeoutput

I have a little grep command line that finds certain values in a text file and exports them into a .csv-file.

Here it is:

cat file.txt | grep 'Values a' | cut -d' ' -f9 >a.csv
cat file.txt | grep 'Values b' | cut -d' ' -f9 >b.csv
cat file.txt | grep 'Values c' | cut -d' ' -f9 >c.csv
cat file.txt | grep 'Values d' | cut -d' ' -f9 >d.csv

How can I modify the commands to get one (comma separated) .csv-file with four columns a, b, c, d?
Any help is appreciated!

Best Answer

As the first comment said, without a view of your "file.txt" it is tough to know for sure we are getting this right.

My first thought was why don't you take the easy road? The >> will append, so create the one thing, add the rest.

cat file.txt | grep 'Values a' | cut -d' ' -f9 > a.csv
cat file.txt | grep 'Values b' | cut -d' ' -f9 >> a.csv
cat file.txt | grep 'Values c' | cut -d' ' -f9 >> a.csv
cat file.txt | grep 'Values d' | cut -d' ' -f9 >> a.csv

Then I wondered why do you run cat at all

grep 'Values a' file.txt | cut -d' ' -f9 > a.csv
grep 'Values b' file.txt | cut -d' ' -f9 >> a.csv
grep 'Values c' file.txt | cut -d' ' -f9 >> a.csv
grep 'Values d' file.txt | cut -d' ' -f9 >> a.csv

That works, but it puts the 4 things on new lines, rather than as comma separated values.

So I concentrated on stopping the new lines from being inserted, and getting a comma instead. awk can handle that.

grep 'Values a' file.txt | cut -d' ' -f9 | awk '{printf $0 ","}' > a.csv
grep 'Values b' file.txt | cut -d' ' -f9 | awk '{printf $0 ","}' >> a.csv

and so forth. That works, but right before I hit "send" I realized that I was following along with you using cut and we don't need to. awk can select the 9th thing.

grep 'Values a' file.txt | awk '{printf $9 ","}' > a.csv
grep 'Values b' file.txt | awk '{printf $9 ","}' >> a.csv

That worked on a little test case I made, maybe not on your test case.

If I were doing repeated greps on data to build a file, I'd probably prefer to keep it in separate lines, rather than in a big, complicated tangle like the other one recommended. So I don't think I'd go for a one liner, four lines I can understand is good enough.

If you are going to do more shell work like this, I suggest you get a copy of the old book Unix Power Tools. It has a lot of tricks like this.

Single command combination of the three greps

If you just want to run a single command you can use awk which works with regular expressions too and can combine them with logical operators. Here is the equivalent of your filter:

awk '/a/ && /c/ && $0 !~ /d/'

I think in most cases there is no reason for simplifying a pipe to a single command except when the combination results in a realatively simple grep expression which could be faster (see results below).

Unix-like systems are designed to use pipes and to connect various utilities together. Though the pipe communication is not the most effective possible but in most cases it is sufficient. Because nowadays most of new computers have multiple CPU cores you can "naturally" utilize CPU parallelization just by using a pipe!

Your original filter works very well and I think that in many cases the awk solution would be a little bit slower even on a single core.

Performance comparison

Using a simple program I have generated a random testing file with 200 000 000 lines, each with 4 characters as a random combination from characters a, b, c and d. The file has 1 GB. During the tests it was completely loaded in the cache so no disk operations affected the performance measurement. The tests were run on Intel dual core.

Single grep

$ time ( grep -E '^[^d]*a[^d]*c[^d]*$|^[^d]*c[^d]*a[^d]*$' testfile >/dev/null )
real    3m2.752s
user    3m2.411s
sys 0m0.252s

Single awk

$ time ( awk '/a/ && /c/ && $0 !~ /d/' testfile >/dev/null )
real    0m54.088s
user    0m53.755s
sys 0m0.304s

The original three greps piped

$ time ( grep a testfile | grep c | grep -v d >/dev/null )
real    0m28.794s
user    0m52.715s
sys 0m1.072s

Hybrid - positive greps combined, negative piped

$ time ( grep -E 'a.*c|c.*a' testfile | grep -v d >/dev/null )
real    0m15.838s
user    0m24.998s
sys 0m0.676s

Here you see that the single grep is very slow because of the complex expression. The original pipe of three greps is pretty fast because of a good parallelization. Without parallelization - on a single core - the original pipe runs just slightly faster than awk which as a single process is not parallelized. Awk and grep probably use the same regular expressions code and the logic of the two solutions is similar.

The clear winner is the hybring combining two positive greps and leaving the negative one in the pipe. It seems that the regular expression with | has no performance penalty.

14.04 Command-Line Dropbox – Combine Multiple Text Files by Date Created

It can be done with a python script, with one sidenote: I took the modification date instead of the creation date, since the creation date will almost certainly not match the real creation date: it is the date the file was copied to the computer, while the modification date seems unchanged during copying (see discussion at @cOrps answer). You will have to see if it works in your situation.

If that is acceptable for you, you can use the script below to create a combined file with your notes. It reads the notes, sorts them and appends them to a text file (creates it if it doesn't exist).

The good news is that you can append your new notes to the same file without overwriting the old ones.

Example output:

Mon Sep 29 08:48:31 2014
This is my first note.
As you can read, I am not really awake yet.

----------
Mon Sep 29 09:04:06 2014
It is really time I am going to eat something.
I am a bit hungry.
Making it a bit longer.

----------

How to use:

Copy the script below into an empty file, save it as add_notes.py
change the directories for files_dir (where your notes are) and the file in which you want to save the notes: combined_file (the script creates the file if it doesn't exist)
run the script in a terminal window by typing the command:
```
python3 /path/to/add_notes.py
```

The script:

#!/usr/bin/env python3

import os
import time
import subprocess

# --------------------------------------------------------
files_dir = "/path/to/your/textfiles"
combined_file = "/path/to/your/combined/file.txt"
# ---------------------------------------------------------
notes = []

if not os.path.exists(combined_file):
    subprocess.Popen(["touch", combined_file])

def read_file(file):
    with open(file) as note:
        return note.read()

def append_file(combined_file, text):
    with open(combined_file, "a") as notes:
        notes.write(text)

for root, dirs, files in os.walk(files_dir):
    for name in files:
        subject = root+"/"+name
        cr_date_text = time.ctime(os.path.getmtime(subject))
        cr_date_n = os.stat(subject).st_mtime
        notes.append((cr_date_n, cr_date_text, subject))

notes.sort(key=lambda x: x[0])

for note in notes:
    text = note[1]+"\n"+read_file(note[2])+"\n"+"-"*10+"\n"
    append_file(combined_file, text)

Best Answer

Related Solutions

Ubuntu – how can one combine a series of grep statements piped together into one grep statement

Single command combination of the three greps

Performance comparison

14.04 Command-Line Dropbox – Combine Multiple Text Files by Date Created

How to use:

The script:

Related Question