How to Convert PDF to Image (JPG or PNG) In C#
Create a command line program in C# that converts a PDF document into a series of images, one for each page of the document. The program lets the user select the start and end pages to convert, and which bitmap file format (JPEG, BMP, GIF, or PNG) to save in.
- C# development environment (Visual Studio used in this example)
- Accusoft PDFXpress.NET SDK (download free evaluation copy here)
Accusoft's PDFXpress is a great tool for editing, annotating, and putting information into a PDF file or extracting it. But suppose you need to turn PDF pages into images for hosting on a web page: something quick, so you can create a gallery for people who just need that one page instead of grabbing the entire PDF file.
PDFXpress can do that, too. This sample program will create a command line tool in C# that will demonstrate the basic commands for using PDFXpress. This will allow a user to convert a PDF document into a series of image files, creating one file per page as a bitmap image specified by the user (BMP, JPEG, GIF, or PNG). A typical program run would be:
The flags "image format", "start page number", and "end page number" are all optional, and without them the program will default to creating a JPEG image of every page in the PDF file.
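The optional-argument behavior described above can be sketched roughly as follows. This is a minimal illustration, not the sample program's actual main function: the positional argument order, the `ConvertOptions` class, and the `ArgumentParser` helper are all hypothetical names chosen for this example. The defaults (JPEG, every page) follow the description above, with `-1` standing for "through the last page".

```csharp
using System;

// Hypothetical container for the command-line settings (names are illustrative).
public sealed class ConvertOptions
{
    public string FileName { get; set; }
    public string Format { get; set; } = "jpeg"; // default: JPEG
    public int PageStart { get; set; } = 0;       // zero-based first page
    public int PageEnd { get; set; } = -1;        // -1 means "through the last page"
}

public static class ArgumentParser
{
    // Assumed positional layout: <file.pdf> [format] [startPage] [endPage]
    public static ConvertOptions Parse(string[] args)
    {
        if (args == null || args.Length == 0)
            throw new ArgumentException("Usage: <file.pdf> [format] [startPage] [endPage]");

        var options = new ConvertOptions { FileName = args[0] };

        // Each optional argument overrides its default only when present.
        if (args.Length > 1)
            options.Format = args[1].ToLowerInvariant();
        if (args.Length > 2)
            options.PageStart = ParsePage(args[2], "start page");
        if (args.Length > 3)
            options.PageEnd = ParsePage(args[3], "end page");

        return options;
    }

    private static int ParsePage(string text, string label)
    {
        if (!int.TryParse(text, out int value))
            throw new ArgumentException($"The {label} must be an integer, got '{text}'.");
        return value;
    }
}

public static class ParserDemo
{
    public static void Main()
    {
        var options = ArgumentParser.Parse(new[] { "sample.pdf", "PNG", "0", "4" });
        Console.WriteLine($"{options.FileName} {options.Format} {options.PageStart} {options.PageEnd}");
    }
}
```

Keeping the defaults on the options object means the parser only has to handle the arguments that were actually supplied.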
You'll need a copy of the Accusoft PDFXpress.NET SDK. Copy the CMap and Font folders from the Support folder where PDFXpress was installed to a folder called ‘library’. The setup should look like this:
The PDF file will be in the same folder as the executable. If you want to dive right in, download the sample code.
For more explanation, continue reading to follow the code walkthrough.
Getting set up
To create the program, we'll be using Visual Studio 2017 Community Edition and the PDFXpress.NET SDK. Previous versions of Visual Studio are also compatible with PDFXpress.
Once you have downloaded PDFXpress, create your project as a C# Console Application. To get access to the right namespaces, we need to edit our references (either by right-clicking "References" in the Solution Explorer, or by clicking "Project -> Add Reference").
Select "Extensions", and make sure that "System.Drawing" and "Accusoft PDFXpress7.NET" are checked:
PDF to Image Code Walkthrough
To start, we need the following namespaces for our code to work properly:
In this case, the class "PDFXpressConverter" is used to hold our methods. To make sure things are set up right, we can load our initializing settings in the constructor:
This tells our PDFXpress object where the font and character mapping files are in the "library" folder (make sure to save a copy of Font and CMap into a folder called "library" with your executable).
Just to make sure our program exits cleanly, we'll also set up a destructor (a finalizer, in C# terms) to clear out the PDFXpress object from memory:
We'll skip explaining the main function – all it really does is load up the settings of the file name, file format, start and end page to convert our PDF into images. Let's take a direct look into our conversion method, ConvertPDF2Image:
The first thing to do is load the PDF file into our PDFXpress object:
You may be asking: Why are we storing the result of adding the file to our PDFXpress object as an integer? In this case, "index" identifies the document within our object's documents collection, meaning you can store multiple PDFs in one PDFXpress object, each tracked by a different number.
Next we'll set up our rendering options. We'll keep this at a resolution of 300 x 300. Here's a neat tip: one of the rendering options allows for annotations to be captured as well. See the PDFXpress API documentation for more detail.
Now let's look at the piece of our method that validates the page range: if "pageEnd" is set to "-1", then it will go through all the pages. PDFXpress, once it loads a file, has a simple property that reports the total number of pages:
Why set pageEnd to the last page minus one? The pages are numbered starting from 0, so in a 500-page document the last page is "499".
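The zero-based arithmetic can be sketched on its own, without the SDK. The following is a minimal sketch: the `PageRange.Resolve` helper is a hypothetical name, and `pageCount` stands in for whatever value the page-count property returns. Clamping an out-of-range end page to the last page is a design choice made for this example, not necessarily what the sample code does.

```csharp
using System;

public static class PageRange
{
    // Resolves a requested zero-based page range against the page count.
    // A negative pageEnd (the article uses -1) is treated as "all remaining pages".
    public static void Resolve(int pageStart, int pageEnd, int pageCount,
                               out int start, out int end)
    {
        if (pageCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageCount), "Document has no pages.");

        // Pages are zero-indexed, so the last valid index is pageCount - 1.
        int lastPage = pageCount - 1;

        end = (pageEnd < 0 || pageEnd > lastPage) ? lastPage : pageEnd;
        start = Math.Max(0, pageStart);

        if (start > end)
            throw new ArgumentException("Start page is past the end page.");
    }
}

public static class PageRangeDemo
{
    public static void Main()
    {
        // A 500-page document with pageEnd = -1 resolves to pages 0 through 499.
        PageRange.Resolve(0, -1, 500, out int start, out int end);
        Console.WriteLine($"Converting pages {start} to {end}");
    }
}
```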
And finally, here's the part of the method that does the heavy lifting, taking our PDF document and saving each page in our range of pages as images:
A quick note: The "Progress(++pageIndex, totalPages);" is a simple method that displays the current page being rendered and how many pages are left to go.
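A progress helper of this kind might look something like the sketch below. Only the call shape `Progress(++pageIndex, totalPages)` comes from the walkthrough; the message wording and the separate `FormatProgress` helper are assumptions made for illustration.

```csharp
using System;

public static class ProgressDemo
{
    // Builds the status message; kept separate from the console write so it is easy to check.
    public static string FormatProgress(int current, int total)
    {
        int remaining = total - current;
        return $"Rendering page {current} of {total} ({remaining} remaining)";
    }

    // Prints one status line per rendered page.
    public static void Progress(int current, int total)
    {
        Console.WriteLine(FormatProgress(current, total));
    }

    public static void Main()
    {
        int totalPages = 3;
        int pageIndex = 0;

        // The pre-increment makes the displayed page number one-based,
        // even though the loop counter starts at zero.
        while (pageIndex < totalPages)
            Progress(++pageIndex, totalPages);
    }
}
```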
That's how easy it is to use the PDFXpress SDK! One line of code takes an entire PDF page, with all of its text, fonts, and images, and encapsulates it in a bitmap object.
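Once a page is available as a bitmap, saving it in the user's chosen format is plain System.Drawing work. The sketch below assumes the format arrives as a string from the command line. The `ImageOutput` class, its method names, and the output file naming scheme are hypothetical, and a blank `Bitmap` stands in for a rendered page so the example does not depend on the SDK.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class ImageOutput
{
    // Maps a user-supplied format name to the matching System.Drawing ImageFormat.
    public static ImageFormat ToImageFormat(string name)
    {
        switch ((name ?? "").Trim().ToLowerInvariant())
        {
            case "jpg":
            case "jpeg": return ImageFormat.Jpeg;
            case "bmp": return ImageFormat.Bmp;
            case "gif": return ImageFormat.Gif;
            case "png": return ImageFormat.Png;
            default: throw new ArgumentException($"Unsupported image format: '{name}'.");
        }
    }

    // Builds a per-page file name such as "sample_page0001.png" (illustrative scheme).
    // pageIndex is zero-based; the name shows a one-based page number.
    public static string PageFileName(string pdfPath, int pageIndex, string extension)
    {
        string baseName = Path.GetFileNameWithoutExtension(pdfPath);
        return $"{baseName}_page{pageIndex + 1:D4}.{extension}";
    }

    public static void Main()
    {
        // Placeholder bitmap standing in for a rendered PDF page.
        using (var bitmap = new Bitmap(64, 64))
        {
            string file = PageFileName("sample.pdf", 0, "png");
            bitmap.Save(file, ToImageFormat("png"));
            Console.WriteLine("Saved " + file);
        }
    }
}
```

Zero-padding the page number keeps the output files in page order when a directory listing sorts them by name.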
Interested? Download the sample code and try it out for yourself!