Can I use Notepad++ to selectively merge two text files?

notepad

I have two lists of words, one word per line, each list in a separate file, and I need to do two things:

  1. Merge the two lists, excluding duplicates.
  2. Remove all words that are shorter than 5 characters.

For example:

First list:

apple
banana
orange

Second list:

apricot
avocado
lime

Merged list:

apple
banana
orange
apricot
avocado

How can I do this with Notepad++?

Best Answer

Merging:

The easiest way to merge two files is to copy and paste. Notepad++ has no built-in file merging feature.

You can, however, install a plugin for this. See Combining files in Notepad++.

Another solution would be the command line's copy command. See Need to combine lots of files in a directory.
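If a script is an option, the concatenation can also be sketched in Python. This is only an illustration, not a Notepad++ feature: the function names are made up for this example, and UTF-8 encoding is an assumption. The one detail worth handling is a first file that does not end with a line break, since its last word would otherwise be glued to the first word of the second file.

```python
from pathlib import Path


def concat_texts(first: str, second: str) -> str:
    """Join two file contents, keeping every word on its own line."""
    # If the first file does not end with a line break, add one so its last
    # word is not glued to the first word of the second file.
    if first and not first.endswith("\n"):
        first += "\n"
    return first + second


def concat_files(first_path: str, second_path: str, merged_path: str) -> None:
    """Write the contents of both input files to merged_path (UTF-8 assumed)."""
    first = Path(first_path).read_text(encoding="utf-8")
    second = Path(second_path).read_text(encoding="utf-8")
    Path(merged_path).write_text(concat_texts(first, second), encoding="utf-8")
```

For example, concat_files("list1.txt", "list2.txt", "merged.txt") would write both lists to merged.txt, where the three file names are placeholders.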

Replacing line breaks:

Removing duplicates will be trickier than removing short words since Notepad++'s search does not search over multiple lines at once, so we will have to convert the line breaks into something else.

To achieve this, perform a replace in Extended search mode, finding all \r\n (DOS line break; use \n instead if the file has Unix line endings) and replacing them with # (or any other character that does not appear in your list).

If the last line was not blank, append a # to the end of the resulting string.

Removing duplicates:

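Now perform a Regular expression replace, finding all ([^#]+)#(.*#)\1# and replacing them with \1#\2.

If a word occurs more than twice (for example, because one of the files already contained duplicates), a single pass will not catch every copy, so repeat the replace until no more matches are found.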
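To see what this replace does outside the editor, here is a small sketch that applies the same pattern with Python's re module, which stands in for Notepad++'s regex engine (an assumption; the two engines agree on a pattern this simple as far as I know). The helper name remove_duplicates is made up for this example. It repeats the replace until the text stops changing.

```python
import re

# Same pattern as in the answer: a word, a '#', any run of text ending in '#',
# then the same word again followed by '#'.
DUPLICATE = re.compile(r"([^#]+)#(.*#)\1#")


def remove_duplicates(joined: str) -> str:
    """Repeat the replace until a pass no longer changes the text."""
    while True:
        reduced = DUPLICATE.sub(r"\1#\2", joined)
        if reduced == joined:
            return reduced
        joined = reduced


print(remove_duplicates("apple#banana#orange#banana#apple#"))
# prints: apple#banana#orange#
```

Two limitations of the pattern show up when experimenting with it. Directly adjacent duplicates such as apple#apple# are not matched, because (.*#) needs at least one # between the two copies. Also, the first copy is not anchored to the start of a word, so in pineapple#lime#apple# the standalone apple is treated as a duplicate of the tail of pineapple and removed.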

Removing words of 4 or fewer characters:

This one is easy. Just perform a Regular expression replace, finding all #.?.?.?.?# and replacing them with #. Two short words in a row share a #, so repeat the replace until no more matches are found.
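The same replace can be tried in Python as a quick check. As before, this is only a sketch: Python's re module stands in for Notepad++'s regex engine, and the helper name remove_short_words is made up for this example.

```python
import re

# Same pattern as in the answer: up to four characters between two '#'.
SHORT_WORD = re.compile(r"#.?.?.?.?#")


def remove_short_words(joined: str) -> str:
    """Repeat the replace until a pass no longer changes the text."""
    while True:
        reduced = SHORT_WORD.sub("#", joined)
        if reduced == joined:
            return reduced
        joined = reduced


print(remove_short_words("apple#banana#orange#apricot#avocado#lime#"))
# prints: apple#banana#orange#apricot#avocado#
```

Note that a short word at the very start of the text has no # in front of it, so the pattern leaves it in place. Putting a # at the beginning before running the replace (and deleting it afterwards) is one way around that.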

Line breaks:

Now you can get rid of the line break hack. Just perform an Extended replace, finding all # and replacing them with \r\n.

Finally, delete the last line as it will be blank.
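To cross-check the result of the steps above, the same outcome can be computed directly in Python without the # trick. This is a different technique from the Notepad++ replaces (a set-based filter that keeps the first occurrence of each word), and the function name and the min_length parameter are made up for this example.

```python
def merge_word_lists(first, second, min_length=5):
    """Merge two word lists, dropping duplicates and words that are too short."""
    seen = set()
    merged = []
    for word in [*first, *second]:
        word = word.strip()  # ignore stray whitespace and line breaks
        # Keep only the first occurrence of each sufficiently long word.
        if len(word) >= min_length and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


first_list = ["apple", "banana", "orange"]
second_list = ["apricot", "avocado", "lime"]
print(merge_word_lists(first_list, second_list))
# prints: ['apple', 'banana', 'orange', 'apricot', 'avocado']
```

The output matches the merged list from the question: lime is dropped for being shorter than 5 characters, and the original order is preserved.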
