When I insert PNG screenshots into Microsoft Word, it smooths them instead of preserving the original pixels from the bitmap. When I then print to PDF (using Acrobat Distiller), depending on my downsample settings, I get either blurry screenshots or hugely bloated file sizes.
What I want:
I would like Word and Acrobat to leave the bitmaps alone so that they make it through the process with their pixels intact. This is what the original image looks like when you zoom in:
What I get:
This is what the Word document looks like when you insert the same image and zoom in. When this is printed to PDF, all those extra pixels result in a much larger file.
Sample files:
- Test.png (56K) A sample screenshot image file
- Test.docx (69K) A Word file containing nothing but this image
- Test.PDF (9.4MB) A PDF file printed from the Word file using Distiller, with all downsampling turned off
- Test2.PDF (98K) A PDF file generated using Word 2010's "Save as PDF" tool (note the very low quality of the compressed image)
Edit: This is with Word 2010 – I've updated the tags to reflect that.
Edit: I've confirmed that OpenOffice doesn't have this problem. I've opened Test.docx (referenced above) and exported it as a PDF from OO (choosing "lossless compression" under Images in the options), and the image comes through unharmed.
Unfortunately, OpenOffice mangles the formatting on more complex Word documents that I've created; so I can't just create the documents in Word and use OO to render the PDFs; I'd have to switch to OO altogether, which is a bigger step than I'm prepared to take right now.
Best Answer
Word may simply render an upscaled image and send it that way as printer input (I presume that Distiller works as a printer). If so, that is fine for real printers, but inefficient for fake printers that produce PDF files.
For instance, pdfLaTeX properly embeds the image in the output file. Check my PDF uploaded to the min.us gallery: Embedding image in LaTeX document
What matters is which PDF-producing stack you are using. If trying another PDF printer, such as the great and free PDFCreator, does not fix the problem, then you should try a dedicated PDF export, i.e. one that does not work as a printer. AFAIK recent Word versions have PDF export built in, so if it is properly implemented, you will get a small file, because the images used in the document are embedded directly.
HUGE EDIT
Gallery has been renamed to Embedding PNG image in LaTeX vs Word
I've looked more thoroughly at my mytest.pdf generated by pdfLaTeX and your test2.pdf generated by Word.
mytest.pdf, test2.pdf
Let's start with uncompressing. If you look into the uncompressed file, you'll easily spot the beginning of the image stream (a <<...>>stream line with Width and Height parameters, the same as in test.png, i.e. 176x295), which ends with an endstream tag. Peek time.
(WARNING: at this point pdftk is assumed to be version 1.41.)
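The original command listings did not survive here, so the following is only a minimal Python sketch of the idea described above, not the pdftk/sed one-liner that was actually used. It assumes the PDF has already been uncompressed (for example with pdftk), that each image object carries `/Subtype /Image`, `/Width` and `/Height` in the dictionary directly before the `stream` keyword, and that the data ends at the first `endstream`. The name `find_image_streams` and the synthetic `sample` are hypothetical.

```python
import re

# ">>" closing a stream dictionary, followed by the "stream" keyword and EOL.
STREAM_KW = re.compile(rb">>\s*stream\r?\n")
SUBTYPE_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
WIDTH = re.compile(rb"/Width\s+(\d+)")
HEIGHT = re.compile(rb"/Height\s+(\d+)")


def find_image_streams(pdf_bytes):
    """Return (width, height, stream_length) for each image stream found.

    Assumes an uncompressed PDF; stream_length counts every byte between
    the "stream" line and "endstream", including any trailing newline.
    """
    images = []
    pos = 0
    while True:
        match = STREAM_KW.search(pdf_bytes, pos)
        if match is None:
            break
        # The dictionary belongs to the nearest preceding "N G obj" header.
        obj_start = max(pdf_bytes.rfind(b" obj", 0, match.start()), 0)
        header = pdf_bytes[obj_start:match.start()]
        data_start = match.end()
        data_end = pdf_bytes.find(b"endstream", data_start)
        if data_end == -1:
            break  # truncated file, stop scanning
        width = WIDTH.search(header)
        height = HEIGHT.search(header)
        if SUBTYPE_IMAGE.search(header) and width and height:
            images.append(
                (int(width.group(1)), int(height.group(1)), data_end - data_start)
            )
        pos = data_end + len(b"endstream")
    return images


# Tiny synthetic "PDF" with one 2x2 RGB image: 12 data bytes plus a newline.
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Type /XObject /Subtype /Image /Width 2 /Height 2 "
    b"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Length 12 >>\nstream\n"
    + bytes(range(12))
    + b"\nendstream\nendobj\n"
)
print(find_image_streams(sample))
```

On a real file you would pass the bytes of the uncompressed PDF, e.g. `find_image_streams(open("test2uc.pdf", "rb").read())`. A proper PDF parser is the safer choice for anything beyond a quick peek.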
test2.pdf
So Word is giving JPEG instead of PNG on its internal output for further PDF processing. Just WOW! Same thing may happen when sending output to printer.
test2stream.jpg
mytest.pdf
It's not COM file, but it's not PNG either.
You see it now? The image stream (of the PNG) in the PDF produced by pdfLaTeX is possibly a simple raw format (176*295*3 = 155760; the 1 extra byte comes from a superfluous newline). Let's check it:
And we have our original image back! No, wait. It looks like pdftk 1.41's uncompression is buggy: the image was almost the same, with a few flaws. I upgraded to pdftk 1.44, but this version does not decompress the image stream at all. Moreover, pdftk 1.44 does not output the stream dictionary on one line, so the extraction using sed above no longer works, but there is no point in fixing it now.
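The size check above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; it assumes the stream holds uncompressed 8-bit RGB (3 bytes per pixel), which is the hypothesis being tested, and `raw_rgb_size` is a hypothetical helper name.

```python
WIDTH, HEIGHT = 176, 295  # dimensions of test.png, as stated in the text


def raw_rgb_size(width, height, bytes_per_pixel=3):
    """Size in bytes of an uncompressed image (assumption: 8-bit RGB)."""
    return width * height * bytes_per_pixel


expected = raw_rgb_size(WIDTH, HEIGHT)
print(expected)      # 155760
print(expected + 1)  # 155761, i.e. the raw data plus one trailing newline
```

If the extracted stream is exactly one byte longer than `expected`, a raw RGB dump followed by a newline is a plausible explanation.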
So what can we do about Word? Not much, methinks. At least you can transplant the embedded image from one PDF to another. I repeated the uncompression of both PDFs using the recent pdftk, opened them in vim, replaced the <<...>>stream...endstream in test2uc.pdf with its counterpart from mytestuc.pdf, saved the result as test2fixuc.pdf and compressed it to test2fix.pdf.
test2fix.pdf
test.pdf
It would be a sin not to check your big PDF after all. OK, I've prepared another one-liner for PDFs uncompressed with pdftk 1.44, which lists the image streams and the lines where they begin in the file. So I'll start by uncompressing test.pdf.
(WARNING: at this point pdftk is assumed to be version 1.44.)
Something is really insane here! There are 6 raw images (apparently this time pdftk had no problem uncompressing them), together taking 43444452 bytes! Let's recheck test2uc.pdf and mytestuc.pdf.
In both cases there is only one image stream. Why the heck would there be more of them?!
The image was cut into many pieces... It looks like some kind of utterly stupid protection, maybe introduced by Distiller (and maybe it can be turned off)? I doubt PDFCreator would spit out the same thing, unless it is Word that performs this unbelievable insanity...
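As a rough sanity check on the 43444452 bytes reported above, the following Python sketch estimates how much the image was upscaled before being cut into pieces. It assumes the 6 streams are uncompressed 8-bit RGB (3 bytes per pixel) and ignores any stray newline bytes; `upscale_estimate` is a hypothetical helper name, and the result is an estimate, not a measurement.

```python
import math

ORIGINAL_PIXELS = 176 * 295   # test.png dimensions, from the text
TOTAL_RAW_BYTES = 43444452    # combined size of the 6 streams, from the text


def upscale_estimate(total_bytes, original_pixels, bytes_per_pixel=3):
    """Return (area_ratio, linear_ratio), assuming raw 8-bit RGB data."""
    pixels = total_bytes / bytes_per_pixel
    area_ratio = pixels / original_pixels
    return area_ratio, math.sqrt(area_ratio)


area, linear = upscale_estimate(TOTAL_RAW_BYTES, ORIGINAL_PIXELS)
print(f"about {area:.0f}x the pixels, about {linear:.1f}x per side")
```

Under these assumptions the pieces together hold a few hundred times as many pixels as the original screenshot, which is consistent with the 9.4MB file size.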
testuc-stream1.png and others (use right arrow to navigate)
Conclusion
Important things are:
Phew. This investigation took some time. Word is a piece of junk.
Workarounds?
In the meantime some suggestions were given. Let me comment on them.
Using a word processor with decent PDF support, like LibreOffice (forget about OpenOffice, it's obsolete now), is a good solution, unless some incompatibilities prevent you from working with it.
Using a bigger image in the same box on the page is also not a bad idea, because even after JPEG-izing, artifacts will be less visible.
My own grosz, though, is to use JPEG from the beginning. That way Word shouldn't recompress it (you never know...) and you can provide the highest possible JPEG quality. There is also lossless JPEG compression. Developers from Redmond presumably thought it wasn't needed, so I won't be surprised if Word doesn't handle such JPEGs. Well, TBH it's not widely supported (even in the open source world), just like arithmetic coding (or rather, the situation is even worse in the case of arithmetic coding).
(On Windows, use 416 instead of the $(()) arithmetic expansion, which is available in POSIX shells.)
I think the default Mitchell filter is a good one for upscaling, but if you really want such a pixelated image, then go with Box as @ceving suggested. Of course, the first 2 files are useful only if you must (for some reason) use fake PDF printers.
I've uploaded all three files.
test-300dpi-mitchell.jpg (426 KB) test-300dpi-box.jpg (581 KB) test.jpg (74 KB)
If my hypothesis is right and Word won't recompress a JPEG image, then just use the last one, which is not upscaled, and go with the built-in PDF output, because it has fewer shortcomings (at least it avoids the needless upscale).