Grab the content from a .pages file, without using Pages.app

pages

I’m interested in automatically extracting the content of .pages documents, preferably via a programming language on a web server.

Several references around the web say that Pages documents are actually zipped archives containing various files (e.g. http://www.tuaw.com/2009/11/02/iwork-secret-life-as-zip-file-revealed-includes-pdf-preview/). However, a Pages document sent to me by a friend can't be unzipped by Mac OS X’s Archive Utility or The Unarchiver, no matter what I change the extension to.

Is there a way to get the content from recent Pages files?

Best Answer

If you're trying to roll your own solution, the actual .pages file is a package, which is just a folder. If you right-click it in Finder, you can choose Show Package Contents. Inside the resulting folder are all the graphic files plus a file called index.xml.gz. If you decompress that file (it is gzip-compressed), you get an XML file containing all the text in the Pages document.
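Since the question asks about doing this from a programming language on a server, here is a minimal Python sketch of the same steps. It assumes the package is an ordinary directory on disk that contains index.xml.gz, as described above. The function name `extract_pages_text` is made up for illustration, and the code makes no assumptions about the XML schema: it simply collects every text node it finds, so expect to do some filtering of your own on a real document.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_pages_text(package_dir):
    """Return the text nodes from a .pages package's index.xml.gz.

    Assumes package_dir is a directory containing index.xml.gz
    (hypothetical helper, not part of any Apple tooling).
    """
    index_path = Path(package_dir) / "index.xml.gz"

    # gzip.open decompresses on the fly; ElementTree parses the stream.
    with gzip.open(index_path, "rb") as fh:
        root = ET.parse(fh).getroot()

    # Walk every text node in document order and drop whitespace-only ones.
    chunks = (text.strip() for text in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)
```

You would call it with the path to the package itself, for example `extract_pages_text("report.pages")`, where `report.pages` is a placeholder name. If the file you have is not a directory, this sketch will not apply as written.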
