Keep copy/paste and TOC in PDF converted from ebook

copy/paste ebook pdf preview

For research, I often convert non-DRM ebooks to PDF using Calibre. The resulting PDF has a TOC with working links. However, it is not searchable in Preview, and copying text pastes only blank white space, even when pasting into TextEdit or nvALT.

(Adobe Acrobat can search and copy/paste the PDF, and the TOC works there, but I use many tools built on Apple's PDF frameworks, so I'd like to solve this.)

To make it searchable and copyable, I run the PDF through Ghostscript using this command:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="output" "input"

This renders a new PDF that is searchable, and copy/pastes properly. However, it strips the links from the TOC.

Is there a way to convert the PDF so that it will retain its TOC links and also be searchable and have copy/paste work?

Best Answer

I suppose the first thing would be to investigate Calibre's PDF output to see if it is to spec, and whether it has options that Preview might prefer.

Ghostscript should normally preserve bookmarks and other annotations. Maybe try setting -dPDFSETTINGS=/default explicitly?

See the long answer here about using GS:

https://superuser.com/questions/466031/how-do-i-reduce-a-pdfs-size-and-preserve-the-bookmarks
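As a rough sketch of that suggestion, the Ghostscript call from the question with -dPDFSETTINGS=/default added could be driven from Python. The names build_gs_command and rewrite_pdf are made up for illustration, the gs binary is assumed to be on the PATH, and I have not verified that this setting keeps the TOC links for Calibre's output.

```python
import subprocess


def build_gs_command(src, dest, settings="/default"):
    """Return the Ghostscript argument list that rewrites src into dest."""
    return [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=" + settings,  # the explicit setting suggested above
        "-sOutputFile=" + dest,
        src,
    ]


def rewrite_pdf(src, dest):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    # Assumption: gs is installed and reachable on the PATH.
    subprocess.run(build_gs_command(src, dest), check=True)
```

Calling rewrite_pdf("input.pdf", "output.pdf") would then run the same command as in the question, plus the extra setting. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.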

If that doesn't work, you could use this Python script, which copies the bookmarks (the PDF outline) from one PDF to another. Note that it will overwrite the destination PDF.

#!/usr/bin/python

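# Copy the PDF table of contents (outline) from one PDF to another.
# Requires the PyObjC Foundation and Quartz modules on macOS.
#
# copyOutlines.py <source file> <destination file>

import sys

from Foundation import NSURL
import Quartz

def copyOutlines(source, dest):
    # Open the PDF that still has its outline.
    inPDF = Quartz.PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(source))
    if inPDF is None:
        sys.exit("Could not open source PDF: " + source)
    outline = inPDF.outlineRoot()
    if outline is None:
        sys.exit("Source PDF has no outline: " + source)
    # Open the PDF that lost its outline and attach the source outline to it.
    outPDF = Quartz.PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(dest))
    if outPDF is None:
        sys.exit("Could not open destination PDF: " + dest)
    outPDF.setOutlineRoot_(outline)
    # This overwrites the destination file in place.
    outPDF.writeToFile_(dest)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("Usage: copyOutlines.py <source file> <destination file>")
    copyOutlines(sys.argv[1], sys.argv[2])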
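
Since the script overwrites the destination, it may be worth keeping a copy of that file first. A minimal sketch using only the standard library; backup_pdf and the ".bak" naming are my own invention, not part of the script above.

```python
import shutil
from pathlib import Path


def backup_pdf(dest):
    """Copy dest to '<name>.bak<suffix>' in the same folder; return the new path."""
    dest = Path(dest)
    # Hypothetical naming scheme: book.pdf -> book.bak.pdf
    backup = dest.with_name(dest.stem + ".bak" + dest.suffix)
    shutil.copy2(dest, backup)  # copy2 also keeps the file's timestamps
    return backup
```

For example, call backup_pdf("output.pdf") before running copyOutlines.py on output.pdf. An existing backup with the same name is silently replaced.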