How to remove watermark from pdf using pdftk

Tags: pdf, pdftk, watermark

I need to remove a stupid email watermark that appears on every page of a public domain book. I looked at the pdftk man page and some examples, but I still cannot figure out how to remove the watermark. I would appreciate any hints.

Best Answer

This is a very simple task.

Use sed:

 sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf

Afterwards, be sure to repair the resulting PDF:

 pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

All in one command:

 sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf && pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

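A text watermark is nothing more than a string of text between two markers (the BT and ET operators) inside the PDF's content streams. sed can only find the string if it appears literally in the file, so if the streams are compressed, uncompress them with pdftk first.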
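For anyone without sed at hand, the same literal replacement can be sketched in Python. This is only an illustration of the idea, not the answerer's method: the function name `strip_literal_watermark`, the `pad` option, and the sample email address are all made up for the example. With `pad=True` the match is overwritten with spaces of equal length, so byte offsets inside the file do not shift.

```python
def strip_literal_watermark(data: bytes, watermark: bytes, pad: bool = False):
    """Replace every literal occurrence of `watermark` in raw PDF bytes.

    Returns (new_bytes, number_of_replacements).
    pad=False mirrors the sed command: each match becomes a single space.
    pad=True overwrites each match with spaces of the same length.
    """
    if not watermark:
        raise ValueError("watermark must not be empty")
    count = data.count(watermark)
    replacement = b" " * len(watermark) if pad else b" "
    return data.replace(watermark, replacement), count


# Hypothetical content-stream fragment carrying a text watermark.
sample = b"BT (user@example.com) Tj ET\nBT (user@example.com) Tj ET\n"
cleaned, hits = strip_literal_watermark(sample, b"user@example.com")
```

If `hits` comes back as 0, the string is not stored literally (most likely the streams are compressed), and a plain search and replace will change nothing. The output should still be passed through pdftk as described above.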

Steps

Download pdftk and extract pdftk.exe and libiconv2.dll to %windir%\System32, to a directory in the path, or to any other location of your choice.
Download and install Notepad++.
PDF streams are usually compressed using the DEFLATE algorithm. This saves space, but it makes the PDF's source illegible.

The command
```
pdftk original.pdf output uncompressed.pdf uncompress
```
uncompresses all streams, so they can be modified by a text editor.
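To see why a compressed stream cannot be edited directly, here is a small Python sketch using the standard zlib module, which implements DEFLATE. The content-stream text and the email address are invented for the illustration; a real PDF wraps the compressed bytes in a stream object.

```python
import zlib

# Hypothetical page content: the same watermark text drawn many times.
content = b"BT /F1 12 Tf (user@example.com) Tj ET\n" * 50

compressed = zlib.compress(content)

# After compression the watermark text is no longer present as plain bytes,
# so a text editor or sed has nothing to search for.
literal_visible = b"user@example.com" in compressed

# Decompressing restores the readable, editable form.
restored = zlib.decompress(compressed)
```

This is the same round trip that the uncompress and compress options of pdftk perform on every stream in the file.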
Open uncompressed.pdf with Notepad++ to reveal the structure of the watermark.

In this specific case, every page begins with the block
```
q 9 0 0 9 2997 4118.67 cm
BI
/CS/RGB
/W 1
/H 1
/BPC 8
ID Ÿ®¼
EI Q
```
followed by nearly 4,000 blocks just like it. Each of these blocks sets only one (/W 1 /H 1) of the watermark's pixels.

Scrolling down until the pattern changes reveals that the watermark's stream is 95,906 bytes long (counting newlines). The exact same stream is repeated on every page of the PDF file.
Press Ctrl + H and set the following:
```
Find:               q 9 0 0 9 2997 4118\.67 cm.{95881}
Replace:            (blank)
Match case:         checked
Wrap around:        checked
Regular expression: selected
. matches newline:  checked
```
The regular expression q 9 0 0 9 2997 4118\.67 cm.{95881} matches the first line of the above block (q 9 0 0 9 2997 4118.67 cm) and the 95,881 characters that follow it, i.e., the whole of the watermark's stream.
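The repeat count in the expression is simply the stream length minus the length of the first line, which can be checked quickly:

```python
first_line = "q 9 0 0 9 2997 4118.67 cm"
stream_length = 95906  # total length of the watermark's stream, from the answer

# The regex matches the first line literally, then "." repeated for the rest.
remaining = stream_length - len(first_line)  # 95906 - 25 = 95881
```

For a different file, both the first line and the stream length will differ, so the count has to be recomputed.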

Clicking Replace All removes it from all pages of the PDF file.
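If Notepad++ is not available, the same find-and-replace can be sketched in Python. This is an assumed equivalent and not part of the original answer; the function name `remove_fixed_length_blocks` and its parameters are hypothetical. It works on bytes, whereas Notepad++ counts characters, so the count may need adjusting depending on the file's encoding.

```python
import re


def remove_fixed_length_blocks(data: bytes, first_line: bytes, tail_length: int) -> bytes:
    """Delete every occurrence of `first_line` plus the `tail_length` bytes after it."""
    # re.escape makes the "." in 4118.67 literal; DOTALL lets "." match newlines,
    # like the ". matches newline" checkbox in Notepad++.
    pattern = re.compile(re.escape(first_line) + b".{%d}" % tail_length, re.DOTALL)
    return pattern.sub(b"", data)


# Tiny synthetic example: a 10-byte "watermark" body in front of the page text.
header = b"q 9 0 0 9 2997 4118.67 cm"
page = header + b"\nABCDEFGHI" + b"BT (page text) Tj ET\n"
cleaned_pages = remove_fixed_length_blocks(page * 3, header, 10)
```

For the file in this answer, the call would use the bytes of uncompressed.pdf (opened in binary mode) and a `tail_length` of 95881, with the result written to a new file.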
The watermark has now been removed, but the PDF file has errors (the streams' lengths are incorrect) and it's uncompressed.

The command
```
pdftk uncompressed.pdf output nowatermark.pdf compress
```
takes care of both.
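To see what "incorrect lengths" means, the rough Python check below compares each stream's declared /Length with the number of bytes actually found between stream and endstream. It is a simplified sketch: the pattern assumes /Length is a direct number and the last key in the dictionary, which will not hold for every PDF, and the name `mismatched_lengths` is made up for the example.

```python
import re

# Simplified pattern: "/Length N >>", then the stream body up to "endstream".
STREAM_RE = re.compile(
    rb"/Length\s+(\d+)\s*>>\s*stream\r?\n(.*?)\r?\nendstream",
    re.DOTALL,
)


def mismatched_lengths(data: bytes):
    """Return (declared, actual) pairs for streams whose /Length is wrong."""
    mismatches = []
    for match in STREAM_RE.finditer(data):
        declared = int(match.group(1))
        actual = len(match.group(2))
        if declared != actual:
            mismatches.append((declared, actual))
    return mismatches


# A stream that still claims 11 bytes after its body was cut down to 5.
edited = b"<< /Length 11 >>\nstream\nhello\nendstream\n"
problems = mismatched_lengths(edited)
```

After deleting the watermark by hand, every page's stream is in this state, which is why the file has to go through pdftk once more.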
uncompressed.pdf is no longer needed. You can delete it.

The result is the same PDF without the watermark (and about half the size).

No output from pdftk

I had the same issue after upgrading to OS X 10.11 El Capitan.

The solution that Sid Steward, the creator of pdftk, offered in response to a different Stack Overflow question did the trick for me.

Question: https://stackoverflow.com/q/32505951

Answer: https://stackoverflow.com/a/33248310
