I'm developing a "paperless" workflow and plan to save all files in PDF/A-1b format.
I'm trying to develop a simple batch file for converting PDF files that I create or receive to PDF/A-1b. Starting from this answer, I have the following batch file:
gswin32c ^
-dPDFA ^
-dNOOUTERSAVE ^
-sProcessColorModel=DeviceCMYK ^
-dUseCIEColor ^
-sDEVICE=pdfwrite ^
-o %2 ^
-dPDFACompatibilityPolicy=1 ^
"C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
%1
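If the same conversion is later driven from a script, for example to loop over a folder of files, the command above can be assembled as an argument list. This is a minimal Python sketch, not part of the original setup. It assumes `gswin32c` is on PATH and reuses the PDFA_def.ps path from the batch file. The function names `build_gs_args` and `convert` are hypothetical.

```python
import subprocess

# Path taken from the batch file above; adjust for your installation.
PDFA_DEF = r"C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps"


def build_gs_args(src, dst, process_color_model="DeviceCMYK",
                  pdfa_def=PDFA_DEF, gs="gswin32c"):
    """Return the Ghostscript argument list equivalent to the batch file."""
    return [
        gs,
        "-dPDFA",
        "-dNOOUTERSAVE",
        f"-sProcessColorModel={process_color_model}",
        "-dUseCIEColor",
        "-sDEVICE=pdfwrite",
        "-o", str(dst),  # -o also implies -dBATCH -dNOPAUSE
        "-dPDFACompatibilityPolicy=1",
        pdfa_def,
        str(src),
    ]


def convert(src, dst, **kwargs):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_args(src, dst, **kwargs), check=True)
```

For example, `convert("email.pdf", "email-pdfa.pdf")` mirrors calling the batch file with `%1` set to email.pdf and `%2` set to email-pdfa.pdf.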
In PDFA_def.ps, I've tried a few different ICC profiles, including one I found on my system (C:/Windows/System32/spool/drivers/color/CalibratedDisplayProfile-5.icc) and sRGB_IEC61966-2-1_no_black_scaling.icc from color.org.
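Because these profiles come from different sources, it can help to check which color space each one actually describes. An ICC profile starts with a 128-byte header in which the data color space signature sits at offset 16 (for example `RGB ` or `CMYK`) and the file signature `acsp` sits at offset 36. The Python sketch below reads only that header. The function names are illustrative, and only Gray, RGB and CMYK profiles are handled.

```python
# ICC profile header layout (first 128 bytes): the data color space signature
# is at offset 16 and the file signature "acsp" is at offset 36.
COMPONENTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}  # common cases only


def icc_color_space(header: bytes) -> tuple[str, int]:
    """Return (color space, number of components) from an ICC profile header."""
    if len(header) < 128 or header[36:40] != b"acsp":
        raise ValueError("not an ICC profile header")
    signature = bytes(header[16:20])
    if signature not in COMPONENTS:
        raise ValueError(f"unhandled color space {signature!r}")
    return signature.decode("ascii").strip(), COMPONENTS[signature]


def describe_profile(path: str) -> tuple[str, int]:
    """Read only the header of an .icc file and report its color space."""
    with open(path, "rb") as f:
        return icc_color_space(f.read(128))
```

For example, `describe_profile("C:/Windows/System32/spool/drivers/color/CalibratedDisplayProfile-5.icc")` returns the color space and component count of that display profile.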
My test input file is a 1-page email printed from Microsoft Outlook 2010 using CutePDF 2.8 (which uses Ghostscript 8.15).
After converting with my batch file and Ghostscript 9.07, Adobe Reader thinks the output is PDF/A, but PDF/A-1b validation by pdf-tools.com fails with the message "The value of the key N is 4 but must be 3."
I have traced this back to the following construct in the PDF output file:
<</Filter/FlateDecode
/N 4/Length 2595>>stream
If I change /N 4 to /N 3, the "value of key N" message goes away. /N apparently represents the number of objects in the stream that follows this header. I don't know how to read the encoded stream, so I don't understand what it contains or why pdf-tools thinks it must contain only 3 objects.
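/FlateDecode means the stream is zlib/deflate-compressed, so it can be inflated with Python's `zlib` module and inspected. The following is a rough sketch, not a PDF parser: it assumes an unencrypted file without object streams, a direct /Length value, and no nested dictionaries inside the stream dictionary. The function names are made up for illustration. For a stream that holds an ICC profile (header signature `acsp`), it reports the /N value next to the color space recorded in the profile header, the two values that have to agree.

```python
import re
import zlib

# A stream dictionary immediately followed by the "stream" keyword and its EOL.
# Nested dictionaries (e.g. /DecodeParms <<...>>) are not handled.
STREAM_DICT = re.compile(rb"<<(?P<d>[^<>]*)>>\s*stream\r?\n")
N_KEY = re.compile(rb"/N\s*(\d+)")
# Direct lengths only: "/Length 2595" matches, "/Length 12 0 R" does not.
LENGTH_KEY = re.compile(rb"/Length\s*(\d+)\b(?!\s+\d+\s+R)")


def flate_streams_with_n(pdf: bytes):
    """Yield (/N value, inflated data) for Flate streams whose dict has /N."""
    for m in STREAM_DICT.finditer(pdf):
        d = m.group("d")
        n, length = N_KEY.search(d), LENGTH_KEY.search(d)
        if not (n and length and b"/FlateDecode" in d):
            continue
        start = m.end()
        raw = pdf[start:start + int(length.group(1))]
        yield int(n.group(1)), zlib.decompress(raw)


def describe_icc_streams(pdf: bytes) -> list[tuple[int, str]]:
    """Return (/N, color space from the ICC header) for embedded ICC profiles."""
    found = []
    for n, data in flate_streams_with_n(pdf):
        if len(data) >= 128 and data[36:40] == b"acsp":
            found.append((n, data[16:20].decode("ascii").strip()))
    return found
```

Call it on the raw bytes of the output file, e.g. `describe_icc_streams(open("out.pdf", "rb").read())`.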
A PDF/A printed using Bullzip, which also uses Ghostscript, also fails validation with the "key N is 4 but must be 3" message.
Does this have something to do with the color space? I'm out of my depth there. I think I'd be happy with a "plain" sRGB space. The Ghostscript docs say the PDF/A encoding must be CMYK. Adobe implies that either RGB or CMYK works for PDF/A. So I'm unclear about how to find an appropriate .icc profile.
Or maybe the validator is wrong and everything is fine?
Best Answer
With the help of a Ghostscript developer in this bug report, I was able to solve the /N problem. Lessons learned:
- /N represents the number of colorants.
- PDFA_def.ps writes the /N value. The sample included with Ghostscript 9.07 only emits /N 1 (for ProcessColorModel=DeviceGray) or /N 4 (for any other ProcessColorModel).
- My batch file specified ProcessColorModel=DeviceCMYK, so the output declared /N 4, but it used an ICC profile describing an RGB color space. The validators correctly caught this discrepancy: I promised 4 colors but only described 3.
Most ICC profiles that I found for displays and office printers describe an RGB color space. (CMYK seems more specific to high-end printing presses and certain kinds of paper.) For my purposes, RGB is preferable, so I convert PDF files to PDF/A-1b with an RGB color space by passing -sProcessColorModel=DeviceRGB to Ghostscript instead of DeviceCMYK.
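The mismatch described in the lessons above can be modeled in a few lines of Python. This is an illustrative sketch, not Ghostscript code: `n_sample_907` encodes the 9.07 sample's behavior as described above, `n_with_rgb` is a hypothetical rule that also maps DeviceRGB to 3, and `check` only imitates the wording of the pdf-tools.com message.

```python
# ICC header color space signature -> number of components (common cases only).
ICC_COMPONENTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}


def n_sample_907(process_color_model: str) -> int:
    """/N as emitted by the Ghostscript 9.07 sample: 1 for gray, else 4."""
    return 1 if process_color_model == "DeviceGray" else 4


def n_with_rgb(process_color_model: str) -> int:
    """/N from a hypothetical PDFA_def.ps that also recognizes DeviceRGB."""
    if process_color_model == "DeviceGray":
        return 1
    return 3 if process_color_model == "DeviceRGB" else 4


def check(process_color_model: str, icc_signature: bytes, rule=n_sample_907):
    """Return None if the promised /N matches the profile, else a message."""
    promised = rule(process_color_model)
    described = ICC_COMPONENTS[icc_signature]
    if promised == described:
        return None
    return f"The value of the key N is {promised} but must be {described}."
```

`check("DeviceCMYK", b"RGB ")` returns "The value of the key N is 4 but must be 3.", and so does `check("DeviceRGB", b"RGB ")`: with the unmodified sample, switching ProcessColorModel alone still yields /N 4. Only `check("DeviceRGB", b"RGB ", rule=n_with_rgb)` returns None, which is why PDFA_def.ps has to change as well.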
In PDFA_def.ps, specify an ICC profile that describes an RGB color space, and change the section that defines the ICC profile so that it also handles DeviceRGB. The key change is a long line containing a nested ifelse statement that detects ProcessColorModel=DeviceRGB and emits the appropriate /N 3. The resulting file should pass validation at pdf-tools.com.
Update: I've created a somewhat more capable batch program and published it in a blog post: Batch Convert PDF to PDF/A.