How to Convert PDF to Image (JPG or PNG) In C#

Goal

Create a command line program in C# that can convert a PDF document into a series of images, one for each page of the document. The program will allow the user to select the start and end pages to convert, and which bitmap file format (JPEG, BMP, GIF, or PNG) to save in.

Requirements

C# development environment (Visual Studio used in this example)
Accusoft PDFXpress .Net SDK (download free evaluation copy here)

Accusoft’s PDFXpress is a great tool for editing, annotating, and putting information into a PDF file or extracting it. But suppose you need to turn PDF pages into images for hosting on a web page: something quick, so you can create a gallery for people who just need that one page instead of grabbing the entire PDF file.
PDFXpress can do that, too. This sample program will create a command line tool in C# that will demonstrate the basic commands for using PDFXpress. This will allow a user to convert a PDF document into a series of image files, creating one file per page as a bitmap image specified by the user (BMP, JPEG, GIF, or PNG). A typical program run would be:


PDFXpressConvertPDF2Image [PDF file] [image format] [start page number] [end page number]

The arguments “image format”, “start page number”, and “end page number” are all optional and positional. Without them, the program defaults to creating a JPEG image of every page in the PDF file.
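To make the defaults concrete, here is a minimal sketch of how the four positional arguments could be read. The `ConvertSettings` and `ArgumentParser` names are hypothetical and are not taken from the sample code; the sketch also assumes the page numbers are passed through unchanged as zero-based values, with -1 meaning "through the last page".

```csharp
using System;

// Hypothetical holder for the four command line settings (not part of the PDFXpress SDK).
public sealed class ConvertSettings
{
    public string PdfFileName { get; set; }
    public string Format { get; set; } = "jpg"; // default: JPEG
    public int PageStart { get; set; } = 0;     // default: first page (zero-based)
    public int PageEnd { get; set; } = -1;      // default: -1, meaning "through the last page"
}

public static class ArgumentParser
{
    // Reads the positional arguments in the documented order and keeps
    // the defaults for any argument that was not supplied.
    public static ConvertSettings Parse(string[] args)
    {
        if (args == null || args.Length < 1)
        {
            throw new ArgumentException(
                "Usage: PDFXpressConvertPDF2Image [PDF file] [image format] [start page number] [end page number]");
        }

        var settings = new ConvertSettings { PdfFileName = args[0] };

        if (args.Length > 1)
        {
            // Accept "PNG", ".png", or "png" and store "png".
            settings.Format = args[1].Trim().TrimStart('.').ToLowerInvariant();
        }
        if (args.Length > 2)
        {
            settings.PageStart = int.Parse(args[2]);
        }
        if (args.Length > 3)
        {
            settings.PageEnd = int.Parse(args[3]);
        }
        return settings;
    }
}
```

Calling `ArgumentParser.Parse(new[] { "doc.pdf" })` leaves the format at "jpg" and the page range at 0 through -1, which matches the default behavior described above.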

Prerequisites

You’ll need a copy of the Accusoft PDFXpress.NET SDK. Copy the CMap and Font folders from the Support folder where PDFXpress was installed to a folder called ‘library’. The setup should look like this:


PDFXpressConvertPDF2Image
├──library
│   ├───CMap
│   └───Font

The PDF file will be in the same folder as the executable. If you want to dive right in, download the sample code.

For more explanation, continue reading to follow the code walkthrough.

Getting set up

To create the program, we’ll be using Visual Studio 2017 Community Edition and the PDFXpress.NET SDK. Previous versions of Visual Studio are also compatible with PDFXpress.

Once you have downloaded PDFXpress, create your project as a C# Console Application. In order to leverage the right namespaces, we need to edit our references (either by right-clicking “References” in the Solution Explorer or by clicking “Project -> Add Reference”).

Select “Extensions”, and make sure that “System.Drawing” and “Accusoft PDFXpress7.NET” are checked.

PDF to Image Code Walkthrough

To start, we need the following namespaces for our code to work properly:


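   using System;                 // Environment, Console, Int32
   using System.Drawing;         // Bitmap
   using System.Drawing.Imaging; // ImageFormat, used when saving each page
   using System.IO;              // Path
   using Accusoft.PdfXpressSdk;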

In this case, the class “PDFXpressConverter” is used to hold our methods. To make sure things are set up right, we can load our initializing settings in the constructor:


public PDFXpressConverter()
{
        pdfXpress = new Accusoft.PdfXpressSdk.PdfXpress();
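        //build the paths with Path.Combine so the folder separators are always present
        string resourcePath = Path.Combine(Environment.CurrentDirectory, "library");
        string fontPath = Path.Combine(resourcePath, "Font");
        string cmapPath = Path.Combine(resourcePath, "CMap");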
        //init the PDFXPress object
        pdfXpress.Initialize(fontPath, cmapPath);
}

This tells our PDFXpress object where the font and character mapping files are in the “library” folder (make sure to save a copy of Font and CMap into a folder called “library” with your executable).
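If the “library” folder is missing or misnamed, the problem is easier to diagnose when it is caught before the SDK is initialized. The following is a minimal sketch of such a check; the `ResourceFolders` class and its `Locate` method are hypothetical helpers, not part of the sample code or the SDK.

```csharp
using System;
using System.IO;

public static class ResourceFolders
{
    // Returns the Font and CMap paths under <baseDirectory>\library,
    // throwing early if either folder is missing.
    public static Tuple<string, string> Locate(string baseDirectory)
    {
        string resourcePath = Path.Combine(baseDirectory, "library");
        string fontPath = Path.Combine(resourcePath, "Font");
        string cmapPath = Path.Combine(resourcePath, "CMap");

        foreach (string path in new[] { fontPath, cmapPath })
        {
            if (!Directory.Exists(path))
            {
                throw new DirectoryNotFoundException(
                    "Missing PDFXpress resource folder: " + path);
            }
        }
        return Tuple.Create(fontPath, cmapPath);
    }
}
```

The two returned paths could then be handed to `pdfXpress.Initialize` in the constructor, in place of the hand-built strings.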

Just to make sure our program exits cleanly, we’ll also set up a destructor (called a finalizer in C#) to clear out the PDFXpress object from memory:


~PDFXpressConverter()
{
         //properly dispose of our PDF object
        if (pdfXpress != null)
        {
                pdfXpress.Dispose();
                pdfXpress = null;
        }
}
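A finalizer runs whenever the garbage collector gets around to it, so the clean-up time is not predictable. A common C# alternative is to implement `IDisposable` and let the caller wrap the converter in a `using` block. The sketch below shows the shape of that pattern; `FakeEngine` is a stand-in for the PDFXpress object so that the example compiles without the SDK, and both class names are hypothetical.

```csharp
using System;

// Stand-in for the PDFXpress object so this sketch compiles without the SDK.
public sealed class FakeEngine : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

public sealed class ConverterSketch : IDisposable
{
    private FakeEngine engine = new FakeEngine();

    // Exposed only so the sketch can show that the engine really was released.
    public FakeEngine Engine
    {
        get { return engine; }
    }

    // Deterministic clean-up: runs as soon as the caller's using block ends.
    // Safe to call more than once.
    public void Dispose()
    {
        if (engine != null)
        {
            engine.Dispose();
            engine = null;
        }
    }
}
```

A caller would then write `using (var converter = new ConverterSketch()) { ... }`, and the engine is released when the block ends, even if an exception is thrown inside it.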

We’ll skip explaining the main function – all it really does is load up the settings of the file name, file format, start and end page to convert our PDF into images. Let’s take a direct look into our conversion method, ConvertPDF2Image:


public int ConvertPDF2Image(string PDFFileName, string format = "jpg", System.Int32 pageStart = 0, System.Int32 pageEnd = -1)

The first thing to do is load the PDF file into our PDFXpress object:


System.Int32 index = pdfXpress.Documents.Add(PDFFileName);

You may be asking: Why are we storing the result of adding the file to our PDFXpress object as an integer? In this case, “index” tracks the documents collection within our object – meaning you can store multiple PDFs into one PDFXpress object, each tracked by a different number.

Next we’ll set up our rendering options. We’ll keep this at a resolution of 300 x 300 dots per inch. Here’s a neat tip: one of the rendering options allows for annotations to be captured as well. See the PDFXpress API documentation for more detail.


RenderOptions renderOpts = new RenderOptions();
renderOpts.ProduceDibSection = false;
renderOpts.ResolutionX = 300;
renderOpts.ResolutionY = 300;

Next, let’s look at the piece of our method that handles the default page range: if “pageEnd” is set to “-1”, the method converts through the last page. Once PDFXpress loads a file, it exposes a simple property for the total number of pages:


//if they give us -1 as the parameter, do all the pages
if (pageEnd == -1)
{
     pageEnd = pdfXpress.Documents[index].PageCount - 1;
}

Why set pageEnd to the page count minus one? The pages are numbered from 0, so in a 500-page document the last page is “499”.
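The same zero-based arithmetic can be extended to reject ranges that fall outside the document. This is a minimal sketch under that assumption; `PageRange.Normalize` is a hypothetical helper, not something the sample code or the SDK provides.

```csharp
using System;

public static class PageRange
{
    // Turns a requested zero-based range into a validated one.
    // pageEnd == -1 means "through the last page".
    public static Tuple<int, int> Normalize(int pageStart, int pageEnd, int pageCount)
    {
        if (pageCount <= 0)
        {
            throw new ArgumentOutOfRangeException("pageCount", "The document has no pages.");
        }
        if (pageEnd == -1)
        {
            pageEnd = pageCount - 1; // last valid zero-based index
        }
        if (pageStart < 0 || pageStart >= pageCount)
        {
            throw new ArgumentOutOfRangeException("pageStart", "Start page is outside the document.");
        }
        if (pageEnd < pageStart || pageEnd >= pageCount)
        {
            throw new ArgumentOutOfRangeException("pageEnd", "End page is outside the document or before the start page.");
        }
        return Tuple.Create(pageStart, pageEnd);
    }
}
```

For the 500-page example, `PageRange.Normalize(0, -1, 500)` yields the pair (0, 499).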

And finally, here’s the part of the method that does the heavy lifting, taking our PDF document and saving each page in our range of pages as images:


//Start from the first page to be converted, and keep going through the last one.
//pageIndex only counts completed pages for the progress display; pageStart is the page being rendered.
for (int pageIndex = 0; pageStart <= pageEnd; pageStart++)
{
    //Render the current pageStart page using the options above to the Bitmap object
    using (Bitmap bp = pdfXpress.Documents[index].RenderPageToBitmap(pageStart, renderOpts))
    {
        //save the bitmap object as a file
        bp.Save(outputFileBase + pageStart.ToString(numberFormat) + "." + format, imageType);
    }
    //show current progress
    Progress(++pageIndex, totalPages);
}
//ending the progress line
System.Console.WriteLine();
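The loop relies on three values that this walkthrough does not show being set up: outputFileBase, numberFormat, and imageType. Here is one way they might be derived, as a sketch only. The `OutputNaming` class, its method names, and the "name_007.png" naming scheme are assumptions for illustration; the downloadable sample may do this differently.

```csharp
using System;
using System.Drawing.Imaging;
using System.IO;

public static class OutputNaming
{
    // Maps the user's format string to the ImageFormat that Bitmap.Save expects.
    public static ImageFormat ToImageFormat(string format)
    {
        switch (format.Trim().TrimStart('.').ToLowerInvariant())
        {
            case "jpg":
            case "jpeg":
                return ImageFormat.Jpeg;
            case "png":
                return ImageFormat.Png;
            case "bmp":
                return ImageFormat.Bmp;
            case "gif":
                return ImageFormat.Gif;
            default:
                throw new ArgumentException("Unsupported image format: " + format);
        }
    }

    // Builds a zero-padding pattern with one "0" per digit of the page count,
    // so the output files sort in page order.
    public static string NumberFormatFor(int pageCount)
    {
        int digits = Math.Max(1, pageCount).ToString().Length;
        return new string('0', digits);
    }

    // Combines the PDF's base name, the padded page number, and the extension.
    // The directory part of pdfFileName is dropped, so the result is a bare file name.
    public static string FileNameFor(string pdfFileName, int page, int pageCount, string format)
    {
        string outputFileBase = Path.GetFileNameWithoutExtension(pdfFileName) + "_";
        return outputFileBase + page.ToString(NumberFormatFor(pageCount)) + "." + format;
    }
}
```

With these helpers, page 7 of a 500-page "report.pdf" saved as PNG would be named "report_007.png".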

A quick note: The “Progress(++pageIndex, totalPages);” is a simple method that displays the current page being rendered and how many pages are left to go.
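The Progress method itself is not shown in this walkthrough. A minimal sketch of what it might look like follows, assuming it rewrites a single console line and that totalPages was computed before the loop as pageEnd - pageStart + 1. The `ProgressSketch` class name, the `FormatStatus` helper, and the message wording are hypothetical.

```csharp
using System;

public static class ProgressSketch
{
    // Builds the status text shown to the user.
    public static string FormatStatus(int pagesDone, int totalPages)
    {
        int remaining = Math.Max(0, totalPages - pagesDone);
        return string.Format("Rendered {0} of {1} pages ({2} left)", pagesDone, totalPages, remaining);
    }

    // Writes the status over the previous one: "\r" moves the cursor back to
    // the start of the line without advancing to a new line.
    public static void Progress(int pagesDone, int totalPages)
    {
        Console.Write("\r" + FormatStatus(pagesDone, totalPages));
    }
}
```

Because the status line never ends with a newline, the conversion loop finishes with a separate Console.WriteLine call to close it off.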


The key line in that loop is the call that renders the page:

using (Bitmap bp = pdfXpress.Documents[index].RenderPageToBitmap(pageStart, renderOpts))

That’s how easy it is to use the PDFXpress SDK! One line of code takes an entire PDF page, with all of its text, fonts, and images, and encapsulates it in a bitmap object.

Interested? Download the sample code and try it out for yourself!