Ghostscript PDF/A conversion fails validation

cutepdf ghostscript pdf

I'm developing a "paperless" workflow and plan to save all files in PDF/A-1b format.

I'm trying to develop a simple batch file for converting PDF files that I create or receive to PDF/A-1b. Starting from this answer, I have the following batch file:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -sProcessColorModel=DeviceCMYK ^
   -dUseCIEColor ^
   -sDEVICE=pdfwrite ^
   -o %2 ^
   -dPDFACompatibilityPolicy=1 ^
    "C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
    %1

In PDFA_def.ps, I've tried a few different ICC profiles, including one I found on my system

C:/Windows/System32/spool/drivers/color/CalibratedDisplayProfile-5.icc

and sRGB_IEC61966-2-1_no_black_scaling.icc from color.org.

My test input file is a 1-page email printed from Microsoft Outlook 2010 using CutePDF 2.8 (which uses Ghostscript 8.15).

After converting with my batch file and Ghostscript 9.07, Adobe Reader thinks the output is PDF/A, but PDF/A-1b validation by pdf-tools.com fails with the message "The value of the key N is 4 but must be 3."

I have traced this back to the following construct in the PDF output file:

<</Filter/FlateDecode
/N 4/Length 2595>>stream

If I change /N 4 to /N 3, the "value of key N" message goes away. /N apparently represents the number of objects in the stream that follows this header. I don't know how to read the encoded stream so I don't understand what it contains nor why pdf-tools thinks it must only contain 3 objects.

A PDF/A printed using Bullzip, which also uses Ghostscript, also fails validation with the "key N is 4 but must be 3" message.

Does this have something to do with the color space? I'm out of my depth there. I think I'd be happy with a "plain" sRGB space. The Ghostscript docs say the PDF/A encoding must be CMYK. Adobe implies that either RGB or CMYK works for PDF/A. So I'm unclear about how to find an appropriate .icc profile.

Or maybe the validator is wrong and everything is fine?

Best Answer

With the help of a Ghostscript developer in this bug report, I was able to solve the /N problem. Lessons learned:

  • The Ghostscript doc referenced in my question is out of date. The current doc, here, says that ProcessColorModel=DeviceRGB is okay.
  • ICC profiles describe a color space. Some valid color spaces are GRAY, RGB, and CMYK. You can check the color space of an ICC profile using the free ICC Profile Inspector.
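  • In the section of the PDF file causing validation errors, /N is not an object count. It is the number of colorants (color components) in the embedded ICC profile: 1 for gray, 3 for RGB, 4 for CMYK.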
  • The PDFA_def.ps file emits the /N value. The sample included with Ghostscript 9.07 only emits /N 1 (for ProcessColorModel=DeviceGray) or /N 4 (for any other ProcessColorModel).
  • My original test specified ProcessColorModel=DeviceCMYK which caused /N 4, but used an ICC profile describing an RGB color space. The validators correctly caught this discrepancy: I promised 4 colors but only described 3.
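If you'd rather check a profile from a script than with a GUI tool, the color space can be read straight from the ICC header. The following is a minimal Python sketch, not part of my original workflow: the function name icc_colorants is made up for illustration, and it only recognizes the three color spaces discussed above. It relies on the ICC header layout, where bytes 16-19 hold the data color space signature and bytes 36-39 hold the 'acsp' file signature.

```python
# Map the ICC header's data color space signature to the /N value it implies.
COLORANTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}


def icc_colorants(profile_bytes):
    """Return (color_space, n) for an ICC profile passed in as bytes.

    Hypothetical helper: only GRAY, RGB and CMYK profiles are handled.
    """
    if len(profile_bytes) < 128:
        raise ValueError("ICC header is 128 bytes; data is too short")
    if profile_bytes[36:40] != b"acsp":
        raise ValueError("missing 'acsp' signature; not an ICC profile")
    space = profile_bytes[16:20]
    if space not in COLORANTS:
        raise ValueError("unhandled color space signature: %r" % space)
    return space.decode("ascii").strip(), COLORANTS[space]
```

To use it, read the .icc file in binary mode and pass the bytes in, for example icc_colorants(open(path, "rb").read()). A result of ("RGB", 3) means the profile belongs with ProcessColorModel=DeviceRGB and /N 3.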

Most ICC profiles that I found for displays and office printers describe an RGB color space. (CMYK seems more specific to high-end printing presses and certain kinds of paper.) For my purposes, RGB is preferable. The following batch file converts a PDF file to PDF/A-1b with an RGB color space:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -sProcessColorModel=DeviceRGB ^
   -dUseCIEColor ^
   -sDEVICE=pdfwrite ^
   -o %2 ^
   -dPDFACompatibilityPolicy=1 ^
    "C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
    %1
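If you want to drive the same conversion from a script instead of a batch file, here is a rough Python sketch. It uses exactly the Ghostscript options shown above; the function name build_gs_command and its parameters are my own invention for illustration, and passing the arguments as a list is simply a way to avoid quoting problems with paths that contain spaces.

```python
import subprocess


def build_gs_command(src, dst, pdfa_def, gs_exe="gswin32c"):
    """Build the argument list for the RGB PDF/A-1b conversion shown above.

    Hypothetical helper: src, dst and pdfa_def are file paths chosen by the caller.
    """
    return [
        gs_exe,
        "-dPDFA",
        "-dNOOUTERSAVE",
        "-sProcessColorModel=DeviceRGB",
        "-dUseCIEColor",
        "-sDEVICE=pdfwrite",
        "-o", dst,
        "-dPDFACompatibilityPolicy=1",
        pdfa_def,   # must come before the input file, as in the batch file
        src,
    ]


def convert(src, dst, pdfa_def):
    """Run Ghostscript and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(src, dst, pdfa_def), check=True)
```

Calling convert("in.pdf", "out.pdf", r"C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps") would then correspond to one run of the batch file, assuming gswin32c is on the PATH.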

In PDFA_def.ps, specify an ICC profile that describes an RGB color space, and change the section for defining an ICC profile as follows:

% Define an ICC profile :

[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {systemdict /ProcessColorModel get /DeviceRGB eq {3} {4} ifelse} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark

The long line includes a nested ifelse statement that will detect ProcessColorModel=DeviceRGB and emit the appropriate /N 3. The resulting file should pass validation at pdf-tools.com.
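For a quick local check before uploading to a validator, something like the following Python sketch can compare the /N that Ghostscript wrote with the color space of the profile actually embedded. Treat it as a rough aid under several assumptions: it is not a PDF parser, it only handles flat stream dictionaries like the one quoted in the question, it only understands FlateDecode or uncompressed streams, and the name check_icc_streams is made up for illustration.

```python
import re
import zlib

# ICC data color space signature -> number of colorants.
COLORANTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}

# A flat (non-nested) stream dictionary that contains an /N entry.
STREAM_WITH_N = re.compile(
    rb"<<(?:(?!>>).)*?/N\s+(\d+)(?:(?!>>).)*?>>\s*stream\r?\n", re.DOTALL
)


def check_icc_streams(pdf_bytes):
    """Return a list of (declared_n, profile_n) for embedded ICC profiles.

    profile_n is None when the profile's color space is not GRAY, RGB or CMYK.
    """
    results = []
    for match in STREAM_WITH_N.finditer(pdf_bytes):
        end = pdf_bytes.find(b"endstream", match.end())
        if end == -1:
            continue
        data = pdf_bytes[match.start() + len(match.group(0)):end]
        if b"FlateDecode" in match.group(0):
            try:
                # decompressobj tolerates the newline that precedes endstream
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue
        # Keep only streams that look like ICC profiles ('acsp' at offset 36).
        if len(data) < 40 or data[36:40] != b"acsp":
            continue
        results.append((int(match.group(1)), COLORANTS.get(data[16:20])))
    return results
```

Run it on the bytes of the converted file, for example check_icc_streams(open("out.pdf", "rb").read()). Any pair whose two numbers differ, such as (4, 3), is the same mismatch the validator complained about.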

Update: I've created a somewhat more capable batch program and published it in a blog post: Batch Convert PDF to PDF/A.
