Word – Stop Microsoft Word 2010 from smoothing screenshots

Tags: adobe-acrobat, anti-aliasing, microsoft-word-2010, pdf, screenshot

When I insert PNG screenshots into Microsoft Word, it smooths them instead of preserving the original pixels from the bitmap. When I then print to PDF (using Acrobat Distiller), depending on my downsample settings, I either get blurry screenshots or hugely bloated file sizes.

What I want:

I would like Word and Acrobat to leave the bitmaps alone so that they make it through the process with their pixels intact. This is what the original image looks like when you zoom in:

[Image: What I want]

What I get:

This is what the Word document looks like when you insert the same image and zoom in. When this is printed to PDF, all those extra pixels result in a much larger file.

[Image: What I get]

Sample files:

Test.png (56K) A sample screenshot image file
Test.docx (69K) A Word file containing nothing but this image
Test.PDF (9.4MB) A PDF file printed from the Word file using Distiller, with all downsampling turned off
Test2.PDF (98K) A PDF file generated using Word 2010's "Save as PDF" tool (note the very low quality of the compressed image)

Edit: This is with Word 2010 – I've updated the tags to reflect that.

Edit: I've confirmed that OpenOffice doesn't have this problem. I've opened Test.docx (referenced above) and exported it as a PDF from OO (choosing "lossless compression" under Images in the options), and the image comes through unharmed.

Test_OO.pdf

Unfortunately, OpenOffice mangles the formatting on more complex Word documents that I've created; so I can't just create the documents in Word and use OO to render the PDFs; I'd have to switch to OO altogether, which is a bigger step than I'm prepared to take right now.

Best Answer

Word probably just renders an upscaled image and sends it that way as printer input (I presume that Distiller works as a printer). If so, that is fine for normal printers, but inefficient for fake printers producing PDF files.

For instance, pdfLaTeX properly embeds the image in the output file. Check my PDF uploaded to the min.us gallery: Embedding image in LaTeX document

What matters is which PDF-producing stack you are using. If trying another PDF printer, like the great and free PDFCreator, does not fix the problem, then you should try a dedicated PDF export, i.e. one that does not work as a printer. AFAIK recent Word versions have PDF export built in, so if it is properly implemented, you will get a small file, thanks to the images used in the document being embedded directly.

HUGE EDIT

Gallery has been renamed to Embedding PNG image in LaTeX vs Word

I've looked more thoroughly at my mytest.pdf generated by pdfLaTeX and your test2.pdf generated by Word.

mytest.pdf test2.pdf

Let's start with uncompressing. If you look into the uncompressed file, you'll easily spot the beginning of the image stream (a <<...>>stream line with Width and Height parameters, the same as in test.png, i.e. 176x295), which ends with the endstream keyword. Peek time.

(WARNING: at this point pdftk is assumed to be version 1.41)

test2.pdf

$ pdftk test2.pdf output test2uc.pdf uncompress
$ sed '\,^<</Width 176[^>]*/Height 295[^>]*>>stream$,!d' test2uc.pdf
<</Width 176/BitsPerComponent 8/Interpolate true/Height 295/Filter[/DCTDecode]/Subtype/Image/Length 20003/ColorSpace/DeviceRGB/Type/XObject>>stream
$ sed '1,\,^<</Width 176[^>]*/Height 295[^>]*>>stream$,d;/^endstream$/,$d' test2uc.pdf > test2stream
$ xxd test2stream | head -10
0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0048  ......JFIF.....H
0000010: 0048 0000 ffe1 005c 4578 6966 0000 4d4d  .H.....\Exif..MM
0000020: 002a 0000 0008 0004 0302 0002 0000 0016  .*..............
0000030: 0000 003e 5110 0001 0000 0001 0100 0000  ...>Q...........
0000040: 5111 0004 0000 0001 0000 0b13 5112 0004  Q...........Q...
0000050: 0000 0001 0000 0b13 0000 0000 5068 6f74  ............Phot
0000060: 6f73 686f 7020 4943 4320 7072 6f66 696c  oshop ICC profil
0000070: 6500 ffe2 0c58 4943 435f 5052 4f46 494c  e....XICC_PROFIL
0000080: 4500 0101 0000 0c48 4c69 6e6f 0210 0000  E......HLino....
0000090: 6d6e 7472 5247 4220 5859 5a20 07ce 0002  mntrRGB XYZ ....
$ file test2stream 
test2stream: JPEG image data, JFIF standard 1.01

So Word hands over a JPEG instead of a PNG on its internal output for further PDF processing. Just WOW! The same thing may happen when sending output to a printer.
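The verdict from `file` can also be checked by hand, because the hex dump above starts with `ff d8 ff`, the usual beginning of JPEG data. The following is only a sketch; `is_jpeg_stream` is a hypothetical helper name, not something used in the original investigation.

```shell
# Hypothetical helper (not part of the original answer): succeed if the
# file starts with the JPEG start-of-image marker plus the next marker byte.
is_jpeg_stream() {
    # od -An drops the offset column, -tx1 prints single bytes in hex,
    # -N3 reads only the first three bytes of the file.
    magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    [ "$magic" = "ffd8ff" ]
}

# Possible usage, assuming test2stream was extracted as shown above:
#   is_jpeg_stream test2stream && echo "JPEG data"
```

It only looks at the first three bytes, so it tells you nothing about whether the rest of the stream is intact.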

test2stream.jpg

mytest.pdf

$ pdftk mytest.pdf output mytestuc.pdf uncompress
$ sed '\,^<</Width 176[^>]*/Height 295[^>]*>>stream$,!d' mytestuc.pdf
<</Width 176/BitsPerComponent 8/Height 295/Subtype/Image/Length 155760/ColorSpace/DeviceRGB/Type/XObject>>stream
$ sed '1,\,^<</Width 176[^>]*/Height 295[^>]*>>stream$,d;/^endstream$/,$d' mytestuc.pdf > myteststream
$ xxd myteststream | head -10
0000000: ebeb ebea eaea ecec eceb ebeb ebeb ebeb  ................
0000010: ebeb ebeb ebec ecec ebeb ebeb ebeb ebeb  ................
0000020: ebeb ebeb ebeb ebeb ebeb ebeb ebeb ebeb  ................
0000030: ebeb ebea eaea eaea eaec ecec eaea eaec  ................
0000040: ecec ebeb ebec ecec ebeb ebeb ebeb ebeb  ................
0000050: ebeb ebeb ebeb ebeb ebeb ebeb ebeb ebeb  ................
0000060: ebeb ebeb ebeb ebeb ebeb ebeb ebeb ebeb  ................
0000070: ebeb ebeb ebeb ebeb ebeb ebeb ebeb ebeb  ................
0000080: ebea eaea ecec eceb ebeb ebeb ebea eaea  ................
0000090: ebeb ebeb ebeb ebeb ebeb ebeb ebeb ebeb  ................
$ file myteststream 
myteststream: DOS executable (COM)

It's not a COM file, but it's not a PNG either.

$ du -b test.png test2stream myteststream 
57727   test.png
20004   test2stream
155761  myteststream

You see it now? The image stream (of the PNG) from the PDF produced by pdfLaTeX is possibly a simple raw format (176*295*3=155760; the extra 1 byte comes from a superfluous newline). Let's check it:

$ convert -depth 8 -size 176x295 rgb:myteststream myteststream.png

And we have our original image back! No, wait. It looks like pdftk 1.41 uncompression is buggy: the image was almost the same, but with a few flaws. I upgraded to pdftk 1.44, but this version does not decompress the image stream at all. Moreover, pdftk 1.44 does not output the stream dictionary on one line, so the sed extraction above no longer works, but there is no point in fixing it now.
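The size check used earlier (width times height times 3 bytes for 8-bit RGB) is easy to script for any extracted stream. This is a minimal sketch; `raw_rgb_size` and `raw_rgb_excess` are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helpers, not part of the original answer.

# Size in bytes of an uncompressed 8-bit RGB image: width * height * 3.
raw_rgb_size() {
    echo $(( $1 * $2 * 3 ))
}

# Print how many bytes a file has beyond the expected raw RGB size.
# 0 means an exact match; 1 would fit a single stray trailing newline.
raw_rgb_excess() {
    actual=$(( $(wc -c < "$1") ))
    echo $(( actual - $(raw_rgb_size "$2" "$3") ))
}

# Possible usage, assuming myteststream was extracted as shown above:
#   raw_rgb_size 176 295
#   raw_rgb_excess myteststream 176 295
```

A small non-negative excess only suggests raw RGB data; it does not prove it, which is why the answer converts the stream with ImageMagick and looks at the result.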

So what can we do about Word? Not much, methinks. At least you can transplant the embedded image from one PDF to another. I repeated the uncompression of both PDFs using the recent pdftk, opened them in vim, replaced the <<...>>stream...endstream block in test2uc.pdf with its counterpart from mytestuc.pdf, saved the result as test2fixuc.pdf and compressed it to test2fix.pdf.

test2fix.pdf

test.pdf

It would be a sin not to check your big PDF after all. OK, I've prepared another one-liner that works on PDFs uncompressed by pdftk 1.44 and lists the image streams together with the line numbers where their data begins. So I'll start with uncompressing test.pdf.

(WARNING: at this point pdftk is assumed to be version 1.44)

$ pdftk test.pdf output testuc.pdf uncompress
$ awk '{if(i)h=h$0} /^[0-9]+ [0-9]+ obj $/{i=1;h=""}/^stream$/{i=0;if(h!~/\/Image/)next;print h,":"NR+1}' testuc.pdf 
<</ColorSpace /DeviceRGB/Subtype /Image/Length 10443804/Width 707/Type /XObject/BitsPerComponent 8/Height 4924>>stream :619
<</ColorSpace /DeviceRGB/Subtype /Image/Length 11264460/Width 953/Type /XObject/BitsPerComponent 8/Height 3940>>stream :12106
<</ColorSpace /DeviceRGB/Subtype /Image/Length 2813256/Width 953/Type /XObject/BitsPerComponent 8/Height 984>>stream :12910
<</ColorSpace /DeviceRGB/Subtype /Image/Length 11264460/Width 953/Type /XObject/BitsPerComponent 8/Height 3940>>stream :18547
<</ColorSpace /DeviceRGB/Subtype /Image/Length 2813256/Width 953/Type /XObject/BitsPerComponent 8/Height 984>>stream :19312
<</ColorSpace /DeviceRGB/Subtype /Image/Length 4845216/Width 328/Type /XObject/BitsPerComponent 8/Height 4924>>stream :19326
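To total the Length values in a listing like the one above, the dictionary lines can be piped through a small filter. This is only a sketch; `sum_image_lengths` is a hypothetical name, and it assumes each input line carries exactly one `/Length N` entry of interest.

```shell
# Hypothetical helper, not part of the original answer: read stream
# dictionary lines on stdin and print the sum of their /Length values.
sum_image_lengths() {
    awk '
        match($0, /\/Length [0-9]+/) {
            # "/Length " is 8 characters long; the digits follow it.
            total += substr($0, RSTART + 8, RLENGTH - 8)
        }
        END { printf "%d\n", total }
    '
}

# Possible usage: append "| sum_image_lengths" to the awk listing command.
```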

Something is really insane here! 6 raw images (apparently this time pdftk had no problems uncompressing them) taking 43444452 bytes altogether! Let's recheck test2uc.pdf and mytestuc.pdf.

$ awk '{if(i)h=h$0} /^[0-9]+ [0-9]+ obj $/{i=1;h=""}/^stream$/{i=0;if(h!~/\/Image/)next;print h,":"NR+1}' test2uc.pdf 
<</Width 176/BitsPerComponent 8/Interpolate true/Height 295/Filter /DCTDecode/Subtype /Image/Length 20003/ColorSpace /DeviceRGB/Type /XObject>>stream :113
$ awk '{if(i)h=h$0} /^[0-9]+ [0-9]+ obj $/{i=1;h=""}/^stream$/{i=0;if(h!~/\/Image/)next;print h,":"NR+1}' mytestuc.pdf 
<</DecodeParms <</Colors 3/Columns 176/Predictor 10/BitsPerComponent 8>>/Width 176/BitsPerComponent 8/Height 295/Filter /FlateDecode/Subtype /Image/Length 54954/ColorSpace /DeviceRGB/Type /XObject>>stream :22

In both cases there is only one image stream. Why the heck would there be more of them?!
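The awk one-liner used for these listings is dense, so here is a more readable sketch of the same logic. The function name `list_image_streams` is made up for illustration, and the patterns assume the layout seen in pdftk 1.44 output: an `N G obj ` line with a trailing space, then the dictionary spread over several lines, then a line containing only `stream`.

```shell
# Hypothetical wrapper around the one-liner above; same logic, spelled out.
list_image_streams() {
    awk '
        # While inside an object header, glue its lines into one string.
        in_obj { header = header $0 }

        # "12 0 obj " (note the trailing space) starts a new object.
        /^[0-9]+ [0-9]+ obj $/ { in_obj = 1; header = "" }

        # A bare "stream" line ends the header; the data starts on the
        # following line, hence NR + 1.
        /^stream$/ {
            in_obj = 0
            if (header ~ /\/Image/) print header, ":" (NR + 1)
        }
    ' "$1"
}

# Possible usage: list_image_streams testuc.pdf
```

Because the header is accumulated before the `stream` rule fires, the printed dictionary ends in `>>stream`, just as in the listings.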

$ sed '1,618d;/^endstream $/q' testuc.pdf | convert -depth 8 -size 707x4924 rgb:- testuc-stream1.png
$ sed '1,12105d;/^endstream $/q' testuc.pdf | convert -depth 8 -size 953x3940 rgb:- testuc-stream2.png
$ sed '1,12909d;/^endstream $/q' testuc.pdf | convert -depth 8 -size 953x984 rgb:- testuc-stream3.png
$ sed '1,18546d;/^endstream $/q' testuc.pdf | convert -depth 8 -size 953x3940 rgb:- testuc-stream4.png
$ sed '1,19311d;/^endstream $/q' testuc.pdf | convert -depth 8 -size 953x984 rgb:- testuc-stream5.png
$ sed '1,19325d;/^endstream $/q' testuc.pdf | convert -depth 8 -size 328x4924 rgb:- testuc-stream6.png

The image was cut into many pieces... It looks like some kind of utterly stupid protection, maybe introduced by Distiller (and maybe it can be turned off)? I doubt PDFCreator would spit out the same thing, unless it's Word that performs this unbelievable insanity...
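The six extraction commands above follow one pattern: delete everything before the stream's first data line, stop at `endstream `, and feed the bytes to `convert` with the Width and Height from the dictionary. As a sketch only, the commands could be generated from the listing instead of typed by hand; `print_extract_commands` is a hypothetical name, and it prints the commands rather than running them so they can be reviewed first.

```shell
# Hypothetical generator, not part of the original answer. Reads listing
# lines ("<<...>>stream :LINE") on stdin and prints one sed | convert
# command per image. $1 = uncompressed PDF, $2 = output name prefix.
print_extract_commands() {
    awk -v pdf="$1" -v prefix="$2" '
        # Return the number following "/NAME " in the current line.
        function field(name,    s) {
            if (!match($0, "/" name " [0-9]+")) return ""
            s = substr($0, RSTART, RLENGTH)
            sub(/^[^0-9]+/, "", s)
            return s
        }
        {
            start = $NF              # last field looks like ":619"
            sub(/^:/, "", start)
            # \047 is a single quote; sed deletes lines 1..start-1.
            printf "sed \0471,%dd;/^endstream $/q\047 %s | convert -depth 8 -size %sx%s rgb:- %s%d.png\n",
                   start - 1, pdf, field("Width"), field("Height"), prefix, NR
        }
    '
}

# Possible usage: pipe the awk listing for testuc.pdf into
#   print_extract_commands testuc.pdf testuc-stream
```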

testuc-stream1.png and others (use right arrow to navigate)

Conclusion

Important things are:

you can clearly see that the huge image that was cut into pieces is actually an upscaled JPEG, so my hypothesis was correct,
because you also get a huge output file with PDFCreator, it is Word that provides the awfully big image to the fake PDF printer, so my earlier supposition was also correct.

Phew. This investigation took some time. Word is a piece of junk.

Workarounds?

In the meantime some suggestions were given. Let me comment them.

Using a word processor with decent PDF support like LibreOffice (forget about OpenOffice, it's obsolete now) is a good solution, unless some incompatibilities make you unable to work with it.

Using a bigger image in the same box on the page is also not a bad idea, because even after JPEG-izing, artifacts will be less visible.

Another grosz from me, though, is to use JPEG from the beginning. That way Word shouldn't recompress it (you never know...) and you can provide the highest possible JPEG quality. There is also lossless JPEG compression. Developers from Redmond presumably thought it's not needed, so I won't be surprised if Word doesn't handle such JPEGs. Well, TBH it's not widely supported (even in the open source world), just like arithmetic coding (or rather, the situation is even worse in the case of arithmetic coding).

convert test.png -quality 100 -resize $((100*300/72))% test-300dpi-mitchell.jpg
convert test.png -quality 100 -filter box -resize $((100*300/72))% test-300dpi-box.jpg
convert test.png -quality 100 test.jpg

(On Windows, use 416 instead of the $(()) arithmetic expansion, which is available in POSIX shells.)
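For other target resolutions, the percentage can be computed the same way. A tiny sketch follows; `scale_percent` is a hypothetical helper name, and the 72 dpi source resolution is simply the value used in the commands above.

```shell
# Hypothetical helper: resize percentage for going from a source dpi to a
# target dpi. Shell arithmetic is integer-only, so 416.67 truncates to 416.
scale_percent() {
    echo $(( 100 * $1 / $2 ))
}

# Possible usage: scale_percent 300 72
```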

I think the default Mitchell filter is a good one for upscaling, but if you really want such a pixelated image, then go with Box as @ceving suggested. Of course, the first 2 files are useful only if you must (for some reason) use fake PDF printers.

I've uploaded all three files.

test-300dpi-mitchell.jpg (426 KB) test-300dpi-box.jpg (581 KB) test.jpg (74 KB)

If my hypothesis is right and Word won't recompress a JPEG image, then just use the last one, which is not upscaled, and go with the built-in PDF output, because it has fewer shortcomings (at least it avoids the needless upscale).

Related Solutions

Word – Poor quality PNG image when printing to PDF in Word 2010

The PNG format is not suitable for what you are trying to achieve.

You mention that you have an EPS version of the logo.

I would insert the EPS directly into the Word file for the best results. The EPS file is a vector graphic format that will allow you to resize it without getting the scaling artifacts you get with bitmap formats.

The only issue with doing it this way is that Word will display and most likely print a bitmap preview image instead of the actual vector information. But, when you print the Word file to PDF the vector information will be embedded in the PDF.

Support for EPS in Office 2013

It is well known that the EPS import filter in MS Office is very out of date (it seemingly has not changed much since the mid-1990s) and can import only a limited subset of EPS files. The official Microsoft website provides little information on it, but it tells us that

The Encapsulated PostScript graphics filter (Epsimp32.flt) supports the Adobe Systems Encapsulated PostScript Specification versions 3.0 and earlier.

(refs: 1, 2). The PostScript Specification version 3.0 dates back to 1992, when it was published by Adobe. Since then it has been extended substantially. Note also that PostScript Level 3 arrived at the end of 1997, and the two should not be confused: at the time of PostScript Specification version 3.0, only PostScript Level 2 had been introduced.

Besides that, one should keep in mind that MS Office works only in the sRGB colorspace and renders graphics in other colorspaces (such as CMYK, so much loved by Adobe) incorrectly. But since for embedded EPS images it sends the original PostScript code directly to a PostScript printer (and only to a PostScript printer; other printers will receive a low-resolution raster preview!), it may not be such a bad idea to work with CMYK EPS files in MS Office: despite the incorrect on-screen rendering, they will print nicely (but only to PostScript printers!).

In my experience, recent versions of CorelDraw and Illustrator produce EPS files compatible with MS Office (although it is necessary to turn off generation of CMYK colors and work exclusively in the RGB colorspace).

If you see a placeholder instead of a figure, it simply means that the EPS was not imported because the MS Office EPS import filter cannot handle this particular EPS file. One possible workaround is to import the EPS file into Illustrator or CorelDraw and then export it as EPS again. An EPS file produced this way should be compatible with the MS Office EPS import filter. You could try the same method with Inkscape, although EPS files generated by Inkscape are not always compatible with MS Office. Another approach is to convert the EPS to PDF using Acrobat Distiller, then open it in Acrobat and export it to EPS, but again, EPS files produced by Acrobat are not always compatible with MS Office.

The free utilities pdftops and pdftocairo from the Poppler utilities for Windows provide another option. They create MS Office compatible EPS files from PDF when launched with the -level2 -eps options:

pdftops -level2 -eps input.pdf
pdftocairo -level2 -eps input.pdf

It seems that the only difference between them is that pdftocairo produces a compressed EPS file while pdftops does not.

Note that if the PDF file contains transparent objects, they will be rasterized when converting to EPS, because EPS basically does not support transparency. In such cases Acrobat or Illustrator can be used to get a proper EPS file without rasterization.

P.S. Here is an interesting published example of an EPS file which can be imported into MS Office and is displayed incorrectly, but can be printed correctly to PostScript printers.