How to extract pages from a PDF as a set of images?

I was recently asked to transform a slideset (originally generated with LibreOffice, but shared as PDF) into a set of images.
This is very simple with the ImageMagick program “convert”.

The question is: how?

1- Get the job done

First, I suggest that you create a specific directory for that. Let’s say /tmp/images

mkdir /tmp/images
cd /tmp/images

Then, assuming your file is ~/myslideset.pdf, you simply launch the command:

convert ~/myslideset.pdf image.png

and this creates a set of pictures (image-0.png, image-1.png,…)

Of course, you could create JPEG pictures instead, simply by replacing “image.png” with “image.jpg” in the above command.

That’s all…

2- Cleanup

… now, if you want smart naming (all files with two-digit numbers, so that they sort correctly), you could launch this simple bash command:

# Rename image-N.png (single digit) to image-0N.png
for i in image-?.png
do r=$(echo "$i" | sed -e 's/-/-0/')
echo "$i renamed $r"
mv "$i" "$r"
done
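
The loop above only handles single-digit names. As a minimal sketch of the same idea using printf instead of sed, the following function pads every image-N.png in the current directory to two digits. The function name pad_names is made up for this illustration, and it assumes the default “image-” prefix used above.

```shell
# pad_names: hypothetical helper, not part of ImageMagick.
# Renames image-N.png to image-NN.png in the current directory.
pad_names() {
    for f in image-*.png; do
        [ -e "$f" ] || continue                     # glob matched nothing
        n=${f#image-}                               # strip the "image-" prefix
        n=${n%.png}                                 # strip the ".png" suffix
        case $n in ''|*[!0-9]*) continue ;; esac    # skip non-numeric names
        # Drop leading zeros so printf does not read the number as octal
        while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do n=${n#0}; done
        new=$(printf 'image-%02d.png' "$n")
        [ "$f" = "$new" ] || mv -- "$f" "$new"      # already padded: leave as is
    done
}
```

Run pad_names from /tmp/images once the conversion is done. Files that are already padded are left untouched.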

3- PNG transparency issue

I used the same commands to extract pictures from a booklet that I had to include in a Memo.

I ran into an issue with the PNG files generated by this command: they all had the alpha (transparency) channel enabled, so the pages came out with a transparent background.

I was looking for something else: simple pages, on a white background.

Fortunately, the “convert” command is full of options (RTFM: “man convert” is your first friend, and “imagemagick convert” on your search engine is another one…). I solved this transparency issue by adding options when converting my PDF:

convert ~/myslideset.pdf -background white -alpha remove image.png

I wanted a white background, and no alpha transparency channel…

Quite simple, isn’t it?

4- PNG quality issue

Another issue came from the quality of the generated PNG.

When looking at the PDF file, text, especially small font text, is very clean and very easy to read. A PDF stores the text itself, and your PDF reader is the one displaying it using the appropriate fonts at the appropriate zoom level…

With a converted image, you lose this feature. The text is now pixels in the picture. And if you compress too much, or too badly, things get fuzzy, sometimes even unreadable…

So, I looked at the “-quality” option when using “convert” command.

For a JPEG image, this is straightforward: the value is simply the quality percentage, from 0 to 100 (100 = highest quality, largest file). I usually set it between 85 and 95, depending on the nature of the picture.

For a PNG file, this is different. PNG compression is lossless, so the value changes the file size and the encoding time, not the sharpness of the picture. The number is not meaningful by itself; its two digits are read independently:

  • First digit (tens) = zlib compression level (from 0 to 9; higher means smaller files but slower encoding)
  • Second digit (units) = PNG data encoding filter type.
    • 0 is none,
    • 1 is “sub”,
    • 2 is “up”,
    • 3 is “average”,
    • 4 is “Paeth”,
    • 5 is “adaptive”.
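
To make the two-digit rule concrete, here is a small sketch that splits a “-quality” value into its two parts with shell arithmetic. The function name describe_png_quality is invented for this example. It only mirrors the list above and does not query ImageMagick.

```shell
# describe_png_quality: hypothetical helper that decodes a PNG -quality value
# according to the tens/units rule described above.
describe_png_quality() {
    q=${1#0}            # drop one leading zero ("05" -> "5") to avoid octal parsing
    q=${q:-0}           # "0" becomes empty after stripping, so restore it
    level=$((q / 10))   # tens digit: zlib compression level
    filter=$((q % 10))  # units digit: PNG filter type
    case $filter in
        0) name=none ;;
        1) name=sub ;;
        2) name=up ;;
        3) name=average ;;
        4) name=Paeth ;;
        5) name=adaptive ;;
        *) name="not covered in the list above" ;;
    esac
    echo "quality $1: zlib level $level, filter $filter ($name)"
}

describe_png_quality 55   # prints "quality 55: zlib level 5, filter 5 (adaptive)"
```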

So, usually, I try “-quality 00”, or “-quality 05”, or even “-quality 55”:

convert ~/myslideset.pdf -background white -alpha remove -quality 55 image.png

That works great !
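
If small text still looks fuzzy, the rendering resolution is worth a look too: ImageMagick rasterizes at 72 DPI by default, and the “-density” option, placed before the input file, raises that. The sketch below wraps all the options from this post into one function. The name pdf_to_png, the 150 DPI default and the CONVERT variable are assumptions made for this example, not something the steps above require. The “%02d” in the output name asks ImageMagick to number the files with two digits directly.

```shell
# Hypothetical wrapper: pdf_to_png INPUT.pdf [PREFIX] [DENSITY] [QUALITY]
# CONVERT can be overridden, e.g. CONVERT=magick with ImageMagick 7.
CONVERT=${CONVERT:-convert}

pdf_to_png() {
    input=$1
    prefix=${2:-image}
    density=${3:-150}   # rendering resolution in DPI (assumed default)
    quality=${4:-55}    # PNG: zlib level 5, adaptive filter

    # -density comes before the input so it applies when the PDF is rasterized
    "$CONVERT" -density "$density" "$input" \
        -background white -alpha remove \
        -quality "$quality" "${prefix}-%02d.png"
}
```

For example, “pdf_to_png ~/myslideset.pdf” writes image-00.png, image-01.png,… in the current directory. A higher density gives sharper text but larger files.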

Enjoy…
