How to read out internal pdf creation/modified date with Windows PowerShell

Tags: pdf, powershell, properties, shell-script

PDF files seem to have a separate set of file properties which contain (among others) a creation date and a modified date (see screenshot here: http://ventajamarketing.com/writingblog/wp-content/uploads/2012/02/Acrobat-Document-Properties1-300×297.png).

Those dates can obviously differ from the creation and modified dates shown in the file system (Windows Explorer).

How can I access the date information in the PDF file and read it out in Windows 7 with Windows PowerShell (or maybe another method)?

Best Answer

You can read a PDF file as though it were text (at least for files written in newer versions of the format). Somewhere in that text you will find an embedded XML section that uses the Adobe XMP schema. This section contains the metadata you need.

Here is an example:

%PDF-1.5
%âãÏÓ
2 0 obj
<<
/AcroForm 4 0 R
/Lang (en-GB)
/MarkInfo <<
/Marked true
>>
/Metadata 5 0 R
/Pages 6 0 R
/StructTreeRoot 7 0 R
/Type /Catalog
>>
endobj
5 0 obj
<<
/Length 2971
/Subtype /XML
/Type /Metadata
>>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
                xmlns:xmp="http://ns.adobe.com/xap/1.0/">
            <xmp:CreateDate>2014-03-05T15:03:02+01:00</xmp:CreateDate>
            <xmp:ModifyDate>2014-05-30T11:58:02+01:00</xmp:ModifyDate>
            <xmp:MetadataDate>2014-03-05T14:03:46Z</xmp:MetadataDate>
        </rdf:Description>
        <rdf:Description rdf:about=""
                xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
            <xmpMM:DocumentID>uuid:8b5fe011-ed77-4298-aa84-d1eda797b9ff</xmpMM:DocumentID>
            <xmpMM:InstanceID>uuid:88074e0b-42f7-4268-bc89-0162e417c9ad</xmpMM:InstanceID>
        </rdf:Description>
        <rdf:Description rdf:about=""
                xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:format>application/pdf</dc:format>
        </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>
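
If you would rather treat that section as XML than as loose text, something like the following might work. It is only a sketch under a few assumptions: the packet is stored uncompressed and unencrypted, it is wrapped in an `x:xmpmeta` element as in the example above, and the function name `Get-PdfXmpDate` is one I made up for illustration. It matches on the local element name, so it should not matter whether the file uses the `xmp:` or the older `xap:` prefix.

```powershell
# Hypothetical helper: pulls one XMP date out of PDF content passed in as a string.
function Get-PdfXmpDate {
    param(
        [Parameter(Mandatory = $true)][string]$Text,
        [ValidateSet('CreateDate', 'ModifyDate', 'MetadataDate')][string]$Name = 'CreateDate'
    )

    # Isolate the first XMP packet; (?s) lets "." span line breaks.
    $m = [regex]::Match($Text, '(?s)<x:xmpmeta\b.*?</x:xmpmeta>')
    if (-not $m.Success) { return $null }

    $xml = [xml]$m.Value

    # Match by local name so the namespace prefix is irrelevant; the second
    # branch covers writers that store the date as an attribute instead.
    $node = $xml.SelectSingleNode("//*[local-name()='$Name'] | //@*[local-name()='$Name']")
    if ($null -ne $node) { $node.InnerText }
}

# Usage (assumes .\filename.pdf exists). Latin-1 maps every byte to one
# character, so the binary parts of the file do not break the read:
#   $path = (Resolve-Path .\filename.pdf).Path
#   $text = [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::GetEncoding('iso-8859-1'))
#   Get-PdfXmpDate -Text $text -Name ModifyDate
```

Only the first packet is inspected, so a file that carries several packets (see the two hits in the output further down) would need a loop over `[regex]::Matches` instead.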

The following example will retrieve the create date:

$a = Select-String "CreateDate\>(.*)\<" .\filename.pdf

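Which returns something like the following, one line per match (this particular file uses the older xap: prefix rather than xmp:, which the pattern matches just the same):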

filename.pdf:20:         <xap:CreateDate>2009-11-03T10:54:29Z</xap:CreateDate>
filename.pdf:12921:         <xap:CreateDate>2009-11-03T10:54:29Z</xap:CreateDate>

Getting to the exact data (the capture group of the first match, as a plain string):

@($a)[0].Matches[0].Groups[1].Value

Which returns:

2009-11-03T10:54:29Z
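
Since the point of the question is to compare this against the file system dates, it may help to turn the string into a real date value. A rough sketch, assuming the value is a full ISO 8601 timestamp like the ones shown above (XMP also permits shortened forms such as a bare year, which this may not parse); `ConvertTo-XmpDateTimeOffset` is just a name I picked for the example:

```powershell
# Hypothetical helper: parses an XMP date string, returns $null if it cannot.
function ConvertTo-XmpDateTimeOffset {
    param([Parameter(Mandatory = $true)][string]$Value)

    $result = [DateTimeOffset]::MinValue
    # AssumeUniversal: a value without an explicit offset is treated as UTC.
    $ok = [DateTimeOffset]::TryParse(
        $Value,
        [System.Globalization.CultureInfo]::InvariantCulture,
        [System.Globalization.DateTimeStyles]::AssumeUniversal,
        [ref]$result)

    if ($ok) { $result } else { $null }
}

# Usage (assumes .\filename.pdf exists):
#   $created = ConvertTo-XmpDateTimeOffset '2009-11-03T10:54:29Z'
#   $created.UtcDateTime                        # date embedded in the PDF, as UTC
#   (Get-Item .\filename.pdf).CreationTimeUtc   # file system date, for comparison
```

Comparing both values in UTC avoids being thrown off by the offset stored in the PDF or by the local time zone of the machine.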