How to analyze whether a PDF file is valid


I have been running into problems with PDF files generated by Office 2007. You can read all about it here.

TL;DR: Some PDFs generated from PPTX files using
the "Save as PDF/XPS…" add-in are rendered only partially and trigger
error messages in Adobe Reader/Acrobat Pro.

After trying many of the options outlined in @harrymc's answer to my other question, and after other users confirmed that they have run into the same problem, I decided to get Microsoft support involved. That is quite expensive: €299 plus taxes, which you only get refunded if the support incident uncovers a bug in an MS product and support agrees that it is in fact a bug.

My problem now is that in their first call back, MS tech support suggested that if only Adobe Reader/Acrobat has a problem with the file, while Foxit or Chrome can render it correctly, then it's Adobe's problem, not theirs. So now it looks like I need to be able to prove that the generated PDF is in fact invalid.

In my other question, @harrymc provided an error message from Ghostscript which suggests to me that there is in fact an error in the PDF. But can I really take this as evidence? Is there something like an official PDF validator that can point out exactly what's wrong with my file? Or with Adobe's software, if that is where the fault lies?

For reference, here's one file that's causing these problems.


MS tech support has been able to reproduce the problem (even in their own XPS viewer), and they agree it's a bug (although they called it a "limitation", gotta remember this), so I won't have to pay for the incident. They will pass it on to the developers, but couldn't guarantee a fix and recommended an upgrade to Office 2010. I'll have to see if my university will play along with this – our standard is currently Office 2007, but I know my license is also valid for 2010.

Best Answer

From the Adobe validator (Preflight in Acrobat X Pro):

[Screenshot: Preflight error message]

I don't see how much more official you can get. That happened when running "Report PDF syntax issues" in Preflight, and the same thing happened when I tried to test for PDF/A validity. The report process aborts rather than continuing, as it would for minor errors. None of the numerous Adobe forum posts about this error has received a response.

Opening the file in Notepad++ and ripping out every stream (stream to endstream inclusive) leads to a blank file that does not report an error on opening and only a few minor syntax errors in Preflight (related to the missing streams). Obviously there's something invalid in/about one of those streams, perhaps an invalid control character or something. I don't know much about the PDF format.
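The stream-stripping experiment above can also be scripted rather than done by hand. The following is only a rough sketch of that idea, not a PDF parser: the function names are my own, and it assumes that the byte sequence "endstream" never occurs inside a stream's binary data, which is not guaranteed. Removing bytes also shifts the offsets recorded in the file's cross-reference table, so the result is only good for this kind of diagnostic test.

```python
import re
from typing import Tuple

# One stream body: the "stream" keyword (not the tail of "endstream"),
# its end-of-line marker, the raw data, and the closing "endstream".
# Assumption: "endstream" does not appear by chance inside the data.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n.*?endstream", re.DOTALL)


def strip_streams(pdf_bytes: bytes) -> Tuple[bytes, int]:
    """Return (stripped_bytes, count) with every stream...endstream span removed."""
    return STREAM_RE.subn(b"", pdf_bytes)


def strip_streams_in_file(src_path: str, dst_path: str) -> int:
    """Write a stream-free copy of src_path to dst_path; return the number removed."""
    # Binary mode matters: stream data is usually compressed, not text.
    with open(src_path, "rb") as src:
        stripped, count = strip_streams(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(stripped)
    return count
```

Calling `strip_streams_in_file("broken.pdf", "no-streams.pdf")` (both file names are placeholders) and opening the copy in Acrobat repeats the test described above. To narrow the problem down to a single stream, the same pattern could be used with `STREAM_RE.finditer` to remove the matches one at a time.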

Also, PDF creation using the built-in tool works perfectly on your presentation in PowerPoint 2010. It appears only 2007 SP3 is affected - as you found yourself, no previous version was and no later version is. Depending on Microsoft's policy, this may or may not warrant a bugfix. It could be that the encoding used in 2007 SP3's version for images is not fully supported by Adobe.

Was the file you provided exported with the "ISO 19005-1 compliant (PDF/A)" option checked? If not, could you provide one that is?

Unless the file was exported in a standards-compliant format (that option is unchecked by default!), this is not necessarily a 'bug', unless Microsoft explicitly says that Adobe Acrobat/Reader should be able to open its PDFs, especially when some other programs can open the file. You may be fighting an uphill battle for a refund.