Add missing contents of one file to another

files, scripting, text processing

FILE1=$1   
FILE2=$2

if [ ! -e "$FILE2" ]
then
   touch $FILE2
fi

while read line; do
  if grep -q "$line" $FILE2
  then
    echo
  else
    echo $line >> $FILE2
  fi
done < $FILE1

I want to go through the lines of FILE1 and append to FILE2 each line that is not already present in FILE2.

What's wrong?

Best Answer

awk 'NR == FNR {first[$0];next}; ! ($0 in first)' "$file2" "$file1" >> "$file2"

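That assumes that the file names don't contain = characters (awk treats an argument of the form variable=value as a variable assignment rather than a file name). If they might, on systems that support /dev/fd/n, you could do: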

awk 'NR == FNR {first[$0];next}; ! ($0 in first)' \
  /dev/fd/3 3< "$file2" /dev/fd/4 4< "$file1" >> "$file2"

Note that if lines occur several times in $file1 and not in $file2, all occurrences will be appended to $file2. If that's not what you want, then:

awk 'NR != FNR && ! ($0 in seen); {seen[$0]}' \
  /dev/fd/3 3< "$file2" /dev/fd/4 4< "$file1" >> "$file2"
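
To illustrate the difference between the two awk variants above, here is a small sketch using made-up sample files in a scratch directory. The file contents and the all/once variable names are assumptions for the demo, and the output is captured rather than appended so that both variants see the same input:

```shell
# Scratch directory with hypothetical sample data.
tmp=$(mktemp -d) || exit
printf '%s\n' a b > "$tmp/file2"
printf '%s\n' b c c d > "$tmp/file1"

# Variant 1: print every line of file1 that is not in file2, repeats included.
all=$(awk 'NR == FNR {first[$0]; next}; !($0 in first)' "$tmp/file2" "$tmp/file1")

# Variant 2: print each missing line only the first time it is seen.
once=$(awk 'NR != FNR && !($0 in seen); {seen[$0]}' "$tmp/file2" "$tmp/file1")

printf 'variant 1: %s\n' "$(printf '%s' "$all" | tr '\n' ' ')"   # c c d
printf 'variant 2: %s\n' "$(printf '%s' "$once" | tr '\n' ' ')"  # c d
rm -rf -- "$tmp"
```

One thing to keep in mind: NR == FNR only identifies the first file if that file has at least one line. If $file2 is empty, NR and FNR stay equal while awk reads $file1, so both variants print nothing.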

If the lines are unique and you don't mind the output being sorted, you could also do:

LC_ALL=C sort -uo "$file2" "$file1" "$file2"
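
A small sketch of the sort approach on made-up data (the file contents and the merged variable are assumptions for the demo). Note that the lines already in the target file get reordered as well:

```shell
# Scratch directory with hypothetical sample data.
tmp=$(mktemp -d) || exit
printf '%s\n' b a > "$tmp/file2"
printf '%s\n' c b > "$tmp/file1"

# Merge both files into file2, sorted bytewise and without duplicates.
# sort reads all of its input before writing, so naming an input file
# as the -o output file is safe.
LC_ALL=C sort -uo "$tmp/file2" "$tmp/file1" "$tmp/file2"

merged=$(cat "$tmp/file2")
printf '%s\n' "$merged"   # a, b, c on separate lines
rm -rf -- "$tmp"
```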

As to what's wrong in your code:

FILE1=$1   
FILE2=$2

By convention, all-uppercase variable names are reserved for environment variables (variables exported to the environment, which is a shared namespace), so it's good practice not to use all-uppercase names for mere shell variables.

if [ ! -e "$FILE2" ]
then
   touch $FILE2
fi

You quoted $FILE2 in the arguments to [, but not in the argument to touch. Why?

Also, note that the argument there can be taken as either a file or an option:

touch "$file_or_option"

If you want the variable to always be treated as a file argument, you need:

touch -- "$file2"

Also, constructs of the form if ! condition; then do-something-to-make-it-true; fi are often bad practice and subject to race conditions. It's better to write enforce-the-condition || die-if-you-couldn't. So here:

touch -- "$file2" || exit

But anyway, the >> redirection will create the file (and return an error if it can't).
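
The behaviour of >> can be checked with a short sketch. The paths and the created/failed variable names are made up for the demo, and true is used as a command that outputs nothing:

```shell
tmp=$(mktemp -d) || exit

# Appending nothing is enough to create a missing file.
true >> "$tmp/new"
[ -e "$tmp/new" ] && created=yes

# If the file can't be created (here: the parent directory doesn't exist),
# the redirection fails and the command returns a non-zero status.
if { true >> "$tmp/missing-dir/new"; } 2>/dev/null; then
  failed=no
else
  failed=yes
fi

printf 'created=%s failed=%s\n' "$created" "$failed"
rm -rf -- "$tmp"
```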

while read line; do

When you start using loops in shells, especially while read loops, that generally means you're taking the wrong approach. You end up running several commands sequentially for each line of a file, which is not how shell scripting is meant to be done. Instead, you want to run a few specialised commands once, cooperating on the task.

First, to read a line in POSIX shells, the syntax is

IFS= read -r line

not read line
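
To see what the difference amounts to, here is a minimal sketch with a made-up input line containing surrounding blanks and a backslash (the input, plain and exact variable names are assumptions for the demo):

```shell
# Hypothetical input line: two leading blanks, a backslash, two trailing blanks.
input='  a\tb  '

# Plain read strips leading/trailing blanks (default IFS) and eats the backslash.
plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# IFS= read -r keeps the line exactly as it is.
exact=$(printf '%s\n' "$input" | { IFS= read -r line; printf '%s' "$line"; })

printf '[%s]\n' "$plain" "$exact"   # [atb] then [  a\tb  ]
```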

  if grep -q "$line" $FILE2

Again, as written, the argument can be taken as either an option or a regexp:

grep -q "$option_or_regexp"

Or:

grep -q -- "$regexp"

Or:

grep -qe "$regexp"
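
As a sketch of why this matters, consider a pattern that happens to start with a dash. The sample file and variable names below are made up for the demo:

```shell
tmp=$(mktemp -d) || exit
printf '%s\n' '-v' 'other' > "$tmp/data"
pattern='-v'

# grep -q "$pattern" "$tmp/data" would parse -v as an option here and then
# take the file name as the regexp, so it is deliberately not run.

# Both of these treat $pattern as a regexp no matter what it starts with.
grep -q -- "$pattern" "$tmp/data" && with_dashdash=found
grep -qe "$pattern" "$tmp/data" && with_e=found

printf '%s %s\n' "$with_dashdash" "$with_e"
rm -rf -- "$tmp"
```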

If you want to search for strings instead of matching regexps, you need the -F option and if you want to match lines as a whole, you need -x. So:

grep -qxFe "$line" < "$file2"

  then
    echo
  else
    echo $line >> $FILE2

Again, by leaving it unquoted, you're applying the split+glob operator to the content of $line (and to $FILE2 if using bash outside of sh mode).

Also, echo can't be used for arbitrary data. Here, you need:

printf '%s\n' "$line" >> "$file2"

But it's wasteful to re-open $file2 on each pass of the loop when you could open it once for the whole loop.
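
To make the split+glob and echo points concrete, here is a small sketch. The scratch directory, its two files and the sample line are all made up for the demo. Depending on the implementation, echo may additionally treat data such as -n or backslashes specially, which the sketch does not try to show:

```shell
tmp=$(mktemp -d) || exit
touch "$tmp/one" "$tmp/two"
line='a   *'

# Unquoted: $line is split on blanks and * is expanded against the
# current directory (the command substitution is a subshell, so the
# cd doesn't leak into the rest of the script).
unquoted=$(cd "$tmp" && echo $line)

# Quoted and passed to printf: reproduced verbatim.
quoted=$(cd "$tmp" && printf '%s\n' "$line")

printf '[%s]\n' "$unquoted" "$quoted"   # [a one two] then [a   *]
rm -rf -- "$tmp"
```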

  fi
done < $FILE1

Again, if using bash, that needs to be:

done < "$file1"
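
Putting the corrections above together, the loop could be sketched as follows. This only illustrates the individual fixes (the awk or sort commands at the top remain the better approach); add_missing is a hypothetical helper name and the sample data is made up:

```shell
# add_missing FILE1 FILE2 (hypothetical helper): append to FILE2 the lines
# of FILE1 that FILE2 doesn't already contain.
add_missing() {
  while IFS= read -r line; do
    # -x: whole-line match, -F: fixed string, -e: pattern may start with "-"
    grep -qxFe "$line" < "$2" || printf '%s\n' "$line"
  done < "$1" >> "$2"   # redirections set up once for the whole loop
}

tmp=$(mktemp -d) || exit
printf '%s\n' 'abc' 'foo bar' > "$tmp/file2"
printf '%s\n' 'a.c' 'foo' 'abc' 'a.c' '-x *' > "$tmp/file1"

add_missing "$tmp/file1" "$tmp/file2"
result=$(cat "$tmp/file2")
printf '%s\n' "$result"
rm -rf -- "$tmp"
```

With the sample data, a.c, foo and -x * are each appended once. The original grep -q "$line" test would have skipped a.c (as a regexp it matches abc) and foo (a substring of foo bar).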
