How To Convert Microsoft Word (Docx/Doc) To PDF In C# with ImageGear


01/31/2018

Goal

Create a C# command line program that reads existing Microsoft .docx (or .doc) documents and converts them to Adobe PDF files.

Requirements
Programming Skill

Visual C# Intermediate Level

Need to turn Microsoft Word documents into PDFs? That's easy: Click File > Export > Create PDF/XPS > Publish. Want to do this 1000 times? Nah. The process is laborious if you have more than one document.

So let’s use C# to convert Docx or Doc files to PDF programmatically, so you can convert hundreds of documents in seconds.


Installing The Docx To PDF SDK (ImageGear)

First, we need to install a .NET SDK for handling the heavy lifting of the Word to PDF file conversion. The examples below will be using Microsoft Visual Studio 2017, but you can use previous versions back to Visual Studio 2010.

  1. After you've installed Visual Studio to your liking, head over to the Accusoft ImageGear Developer Toolkit page. Versions are also available for Java, C, and C++, but for this walkthrough download the .NET installer.
  2. Now you can run the Accusoft ImageGear Developer Toolkit installer. It will take a little while as it downloads additional libraries.
  3. OK – installation is done! Let's get to coding!

The ImageGear Developer Toolkit will put its files into the Public Documents folder, usually located at "C:\Users\Public\Documents\Accusoft". We'll be referring to them as we go along.


Set Up Your Project

Once you have the toolkit installed, let's create a new C# project. For this project we'll just write a C# command line program so we can dive right into the meat of the code, rather than needing to build a GUI with Windows Forms or WPF. Once the class is working here, you can import it into any other .NET project you like.

Just click on File, New, Project, and from the "Visual C#" list select Console App (.NET Framework).

To keep things simple we'll name the project "ImageGearConversionDemo."

Once the project is started in Visual Studio, we can use NuGet to add the reference files we need:

  1. From within Visual Studio, click on Tools, NuGet Package Manager, then Manage NuGet Packages for Solution.
  2. Make sure that the Package Source is set to nuget.org.
  3. Select "Browse", then type "ImageGear" into the search box. You'll see different installation options depending on your project. To keep things easy, select "Accusoft.ImageGear.All" to snag everything in one fell swoop. Alternatively, install only the packages you need: ImageGear.Core, ImageGear.Evaluation, ImageGear.Formats, ImageGear.Formats.Office, and ImageGear.Formats.PDF. Check the project you want to apply it to, click "Install", and NuGet will take care of the details.
  4. The "Solution Explorer" window shows that NuGet automatically added the references we need for the project.

Next we'll want to make sure that the components that do the document conversion are in place.

  5. Click on Project, then Properties at the bottom. In the case of our example, that will be ImageGearConversionDemo. Click on Build. Make sure the Platform Target is x64, and the output directory is bin\Debug.
  6. In a standard install, the Toolkit install directory contains the folder C:\Users\Public\Documents\Accusoft\ImageGear.NET v23 64-bit\Bin\OfficeCore. Copy the entire folder to your Debug directory.
  7. To make things easier, let's also set up our project properties for how the program is run. Click the Debug tab. Our final program is going to take two parameters:
       1. The DOCX file we're going to convert.
       2. The PDF file we're converting our DOCX file to.

You can set command line arguments in Visual Studio by right-clicking your project in the Solution Explorer and going to Properties > Debug. Put in the two file names we'll be using. In our case, we will read TheGreatestDocInTheWorld.docx and write TheGreatestDocInTheWorld.pdf. Set those as your arguments, then make sure that the Working directory points to our Debug folder, since that's where the program is built.
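As a rough sketch of how the program could pick up those two arguments: the class name Program and the helper TryGetPaths below are placeholders made up for illustration, not part of ImageGear or of the original listing.

```csharp
using System;
using System.IO;

static class Program
{
    // Hypothetical helper: succeeds only when exactly two arguments
    // were supplied and the input document exists on disk.
    public static bool TryGetPaths(string[] args, out string fileIn, out string fileOut)
    {
        fileIn = null;
        fileOut = null;
        if (args == null || args.Length != 2)
        {
            return false;
        }
        fileIn = args[0];   // the DOCX (or DOC) file to convert
        fileOut = args[1];  // the PDF file to write
        return File.Exists(fileIn);
    }

    static int Main(string[] args)
    {
        string fileIn;
        string fileOut;
        if (!TryGetPaths(args, out fileIn, out fileOut))
        {
            Console.Error.WriteLine("Usage: ImageGearConversionDemo <input.docx> <output.pdf>");
            return 1;
        }
        Console.WriteLine("Converting " + fileIn + " to " + fileOut);
        // The call into the converter class goes here once it is written.
        return 0;
    }
}
```

Returning a non-zero exit code on bad input makes the program easier to drive from a batch script when many documents are converted in a row.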


If you want to add the ImageGear references to your program manually, you can use the instructions in the Accusoft ImageGear .NET documentation.

With that, now we can get to coding!


C# Sample Code

Here's our C# code for testing out ImageGear's Word to PDF conversion capabilities. It works with .docx and .doc files. You can copy/paste this code to get started (you’ll also need a free trial version of ImageGear), or keep scrolling for a walkthrough of the code.


Understanding The Word To PDF C# Code

The first part of the C# program imports our namespaces. If you used NuGet then you have all of the references you need, but the program still needs them declared before they can be used. You can find more information on the API calls in the ImageGear User Guide.
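The declarations might look like the following. This is a sketch that assumes the namespace names mirror the NuGet package names installed earlier; check them against the ImageGear User Guide for your version.

```csharp
// Assumed namespace names, mirroring the NuGet packages installed earlier.
using System;                     // Console output
using System.IO;                  // FileStream for reading and writing files
using ImageGear.Core;             // Core ImageGear types
using ImageGear.Evaluation;       // Evaluation license
using ImageGear.Formats;          // Loading and saving file formats
using ImageGear.Formats.Office;   // Word (.doc/.docx) support
using ImageGear.Formats.PDF;      // PDF support
```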

The ImageGear.Formats.Office and ImageGear.Formats.PDF are what's going to provide the bulk of the work here – they are what will be able to read from a docx, and write to a pdf file.

To handle the conversions, we'll create a simple class and call it DocConverter. For this example, we're going to populate it with just one method, SaveDocXAsPdf.

SaveDocXAsPdf has two required arguments, and one optional argument that we'll use in this example to control console output. The console messages are just there so we can trace the program steps as it runs. By default, they won't display.
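A minimal skeleton of that class could look like this. The parameter names fileIn and fileOut are assumptions chosen for illustration, and whether the original method was static is not known; only the class name, the method name, and the "verbose" option come from the text.

```csharp
using System;

namespace ImageGearConversionDemo
{
    class DocConverter
    {
        // fileIn:  path of the .docx or .doc file to read (assumed name)
        // fileOut: path of the PDF file to write (assumed name)
        // verbose: optional; when true, progress messages go to the console
        public void SaveDocXAsPdf(string fileIn, string fileOut, bool verbose = false)
        {
            if (verbose)
            {
                Console.WriteLine("Converting " + fileIn + " to " + fileOut);
            }
            // License setup, format initialization, and the conversion itself
            // are filled in step by step in the rest of the walkthrough.
        }
    }
}
```

Because verbose defaults to false, a caller can write converter.SaveDocXAsPdf(input, output) and get a quiet run.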

Before we do anything, we have to initialize the license. We'll be using an evaluation copy for this demonstration – but if you already have a license, follow the registration steps on the Accusoft ImageGear .NET instruction page ( http://help.accusoft.com/ImageGear-Net/v24.0/Windows/HTML/webframe.html#topic601.html ).

The next thing to do is to initialize the ImageGear File Format – in this case, Microsoft Word. In another example we'll show how to expand that to other file formats.

And while we're at it, we'll also initialize the ImageGear PDF object. This is an important step: whenever we initialize an ImageGear PDF object, it must be terminated later.

ImGearPDF is not a typical C# object that cleans up after itself, so make sure it's terminated explicitly.
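The license, format, and PDF setup described above can be sketched as follows. The helper name RunWithImageGear is hypothetical, and the ImageGear calls are written as we understand the ImageGear .NET API, so verify them against the API reference for your version. Wrapping the work in try/finally is our own choice here, one way to guarantee the terminate call always runs.

```csharp
using System;
using ImageGear.Evaluation;
using ImageGear.Formats;
using ImageGear.Formats.Office;
using ImageGear.Formats.PDF;

static class ImageGearSetupSketch
{
    // Hypothetical helper: runs `work` between ImGearPDF.Initialize and ImGearPDF.Terminate.
    public static void RunWithImageGear(Action work, bool verbose = false)
    {
        // Evaluation license; a purchased license is registered differently.
        ImGearEvaluationManager.Initialize();

        // Register the Word and PDF formats so documents can be loaded and saved.
        ImGearFileFormats.Filters.Insert(0, ImGearOffice.CreateWordFormat());
        ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());

        ImGearPDF.Initialize();
        try
        {
            if (verbose) Console.WriteLine("ImageGear initialized.");
            work();
        }
        finally
        {
            // Runs even if the conversion throws, so the PDF component is always released.
            ImGearPDF.Terminate();
            if (verbose) Console.WriteLine("ImageGear PDF terminated.");
        }
    }
}
```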

Now for the actual reading of .doc/.docx files and writing of PDF files, which is pretty simple.

If we follow the code, the process is straightforward. Remember that the "verbose" option turns the console output on or off, in case you want the program to be quieter.

First, we create a file input stream and a file output stream. The Office document is loaded into the variable igDocument. We then set up the pdfOptions that will be used for exporting the file. Finally, we write the file. If there is already a PDF file with the same name, we overwrite it.
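Those steps can be sketched like this. The variable names igDocument and pdfOptions come from the description above; the method name WriteWordAsPdf, the parameter names, and the stream variable names are assumptions, and the ImageGear calls should be checked against the API reference for your version. The sketch assumes the license, the Word and PDF formats, and ImGearPDF have already been initialized.

```csharp
using System;
using System.IO;
using ImageGear.Core;
using ImageGear.Formats;
using ImageGear.Formats.PDF;

static class ConversionSketch
{
    // Assumes the license, Word/PDF formats, and ImGearPDF are already initialized.
    public static void WriteWordAsPdf(string fileIn, string fileOut, bool verbose = false)
    {
        // FileMode.Create replaces an existing PDF of the same name.
        using (FileStream inStream = new FileStream(fileIn, FileMode.Open, FileAccess.Read))
        using (FileStream outStream = new FileStream(fileOut, FileMode.Create, FileAccess.Write))
        {
            if (verbose) Console.WriteLine("Loading " + fileIn);
            ImGearDocument igDocument = ImGearFileFormats.LoadDocument(inStream);

            // Default PDF export options.
            ImGearPDFSaveOptions pdfOptions = new ImGearPDFSaveOptions();

            if (verbose) Console.WriteLine("Writing " + fileOut);
            int startPage = 0; // write from the first page
            ImGearFileFormats.SaveDocument(igDocument, outStream, startPage,
                ImGearSavingModes.OVERWRITE, ImGearSavingFormats.PDF, pdfOptions);
        }
    }
}
```

The using blocks close both streams even if loading or saving throws, which matters when the same program converts many files in one run.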

Let's see our C# DOCX to PDF code in action.

If we compare our new PDF to a PDF created using Microsoft Word's export option, the file created by ImageGear is smaller – 383 KB versus 504 KB. And the PDF file generated with ImageGear has kept all internal links and formatting.

Converting a DOCX to PDF is just scratching the surface of what ImageGear can do. ImageGear supports over 100 file formats for conversion, editing, compression, and more. To find out more, check out the ImageGear overview page.
