
How to Use an API to Convert a PDF to a DOCX File

 

There are a variety of reasons why someone would need to convert a file from one format to another, but the process can be time-consuming and confusing if you don’t know where to start. In this blog, I’ll discuss how to use an API to convert a PDF to a DOCX file. If you are interested in other file format conversions, check out How to Convert an Excel File to PDF, and keep an eye out for new blogs on different conversions coming soon.


Why PDFs Are Commonly Used File Formats

PDFs are arguably one of the most widely used document file formats in the world. Some of the primary reasons people use PDFs include:

  • PDF files are portable and cross-platform.
  • Plenty of viewing options are available.
  • They are a reasonably future-proof archival format, perhaps the best available, especially when converted and ‘flattened’ to PDF/A.
  • When you share a PDF, you can be sure that all users have the same experience, because PDF files support images, vector content, graphics, and much more. There are many types of PDF files, but they all display consistently.
  • PDF files can (usually) be compressed to save storage space.

Benefits of Converting a PDF to an MS Word DOCX File

PDF (Portable Document Format) is great for various uses. We all have PDF viewers on our Windows PCs, Macs, and even Linux systems. For many years now, our favourite browsers have been capable PDF viewers. PDFs are great for disseminating information, controlling the look and feel of your message, and being flexible enough to contain images, vector content (embedded CAD files), and even 3D objects. Some of the most common use cases for PDF viewers, outside of simply viewing documents, include form filling and e-signature ceremonies.

Editing PDF files tends to be less user-friendly. It sometimes requires expensive editing software that takes training to use and, perhaps more importantly, licensing. If you already have a potent MS Word / DOCX editor in your browser, or you license Microsoft Office™ in one of the many possible ways, why bother with licensing or learning another text editor? Instead, convert the PDF(s) to DOCX using Accusoft’s Content Conversion Services API, included with PrizmDoc Viewer. There are plenty of benefits.

  • User familiarity with MS Word / MS Office is at an all-time high, with most competing DOCX editors adopting the style language and ‘look and feel’ of MS Word.
  • Online SaaS DOCX editors are proliferating, from the consumer- and business-focused Google Docs to more specialized OEM editors, like Accusoft’s own PrizmDoc Editor.
  • DOCX editing is better understood by laypeople. Most of us have edited a Word file; not many have edited a PDF.
  • Your grandmother can do it.

Do you have your own reasons? Let us know!

Benefits of DOCX Files

When it comes to deciding whether or not to convert to a DOCX file, the simplest argument for the action is that Microsoft took the time to build PDF to DOCX right into Microsoft Word. Open Word, select File, Open, open a PDF file, and Word will convert to a DOCX for you immediately. 

Opening each file and converting it that way might not be the best option for a lot of developers and ISVs. People who build applications that target specific use cases and keep their licensed users engaged need tools that can be white-labeled and automated. That’s where the Content Conversion Services (CCS) API from Accusoft can help. Let’s look at how conversion can be streamlined using RESTful APIs.

Rather than detail one method, I’m going to provide several different options. All use the same Content Conversion Services (CCS) API. The Server endpoint can be self-hosted or you can sign up and use PrizmDoc Cloud (Accusoft Hosted).

There are several different ways to deploy a content conversion API:

  • Write your own helper application.
  • Use one provided by Accusoft, for example the sample Express application.
  • Use the API directly in your application.
  • Take advantage of Accusoft’s specialty .NET SDK.

 

Here are some popular options in no particular order.

Option 1: A Simple Node.js API for Document Processing

Link – https://github.com/Accusoft/document-processing-helper

From the readme.md:

“Simple node.js helper for document processing, powered by PrizmDoc Server. You can use this helper with either PrizmDoc Cloud or your own self-hosted PrizmDoc Server.”

If you don’t have your own PrizmDoc Server instance, the easiest way to get started is with PrizmDoc Cloud. Sign up for a free trial account to get an API key. Sample code for converting a PDF to DOCX is below. For more information, see the full documentation.



const Helper = require('@accusoft/document-processing-helper');

async function main() {
  // Point the helper at PrizmDoc Cloud (or at your self-hosted PrizmDoc Server)
  const documentProcessingHelper = new Helper({
    prizmDocServerBaseUrl: 'https://api.accusoft.com',
    apiKey: 'YOUR_API_KEY'
  });

  // Initialize a conversion
  const output = await documentProcessingHelper.convert({
    input: 'Your_PDF_FILE.PDF',
    outputFormat: 'DOCX'
  });

  // Download the output and save the file
  await output[0].saveToFile('output.docx');
}

// Report failures instead of leaving an unhandled promise rejection
main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
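
If you have more than one PDF to convert, the sample above can be wrapped in a small loop. The following is only a sketch under a few assumptions: the function names and the PRIZMDOC_BASE_URL / PRIZMDOC_API_KEY environment variable names are hypothetical, and it assumes a self-hosted server may not need an API key (check the helper’s documentation for your deployment). It relies only on the convert() and saveToFile() calls already shown above, and the helper is passed in as a parameter so the logic can be tried without a server.

```javascript
const path = require('node:path');

// Build the helper options from environment variables.
// PRIZMDOC_BASE_URL and PRIZMDOC_API_KEY are hypothetical names chosen for this sketch.
function helperOptionsFromEnv(env) {
  const options = {
    // Falls back to the PrizmDoc Cloud URL used in the sample above
    prizmDocServerBaseUrl: env.PRIZMDOC_BASE_URL || 'https://api.accusoft.com'
  };
  // Assumption: only set apiKey when one was provided
  if (env.PRIZMDOC_API_KEY) {
    options.apiKey = env.PRIZMDOC_API_KEY;
  }
  return options;
}

// Map an input such as "in/report.pdf" to "<outDir>/report.docx"
function docxPathFor(pdfPath, outDir) {
  const baseName = path.basename(pdfPath, path.extname(pdfPath));
  return path.join(outDir, `${baseName}.docx`);
}

// Convert each PDF in turn and record which ones succeeded.
// `helper` is any object exposing convert(), as in the sample above.
async function convertAll(helper, pdfPaths, outDir) {
  const results = [];
  for (const pdfPath of pdfPaths) {
    const target = docxPathFor(pdfPath, outDir);
    try {
      const output = await helper.convert({ input: pdfPath, outputFormat: 'DOCX' });
      await output[0].saveToFile(target);
      results.push({ input: pdfPath, output: target, ok: true });
    } catch (err) {
      // Keep going so one bad file does not stop the whole batch
      results.push({ input: pdfPath, ok: false, error: err.message });
    }
  }
  return results;
}
```

With the package installed, this might be called as convertAll(new Helper(helperOptionsFromEnv(process.env)), ['a.pdf', 'b.pdf'], 'converted'). The sketch does not create the output folder, so it is assumed to exist already.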

Option 2: An Express Server Sample

This option requires that you perform a local installation of PrizmDoc Server on Windows or Linux. This one provides a nice GUI with all options available.

If you want this option, follow the install guides nested within the PrizmDoc Administration section of the help file.

Done already? Excellent. You should now be able to access the Express sample at http://localhost:18001, or the domain name of your PrizmDoc Server. See this Configuration Guide for more on the Express Demo. 

Option 3: The PrizmDoc Server .NET SDK

This option is great for .NET developers who work in Visual Studio. The PrizmDoc Server .NET SDK is a wrapper around the PrizmDoc Server REST APIs, making it easy to use PrizmDoc Server functionality in .NET.

You can use this library with any deployment of PrizmDoc Server, whether it’s your own self-hosted deployment or Accusoft’s cloud-hosted offering. Simply construct an instance of PrizmDocServerClient with the information about how to connect to PrizmDoc Server and start using any of the document-processing methods to do things like:

  • Convert documents to PDF, TIFF, JPEG, PNG, or SVG.
  • Combine documents to PDF or TIFF.
  • Extract pages from documents.
  • Split and merge pages from various documents.
  • Create thumbnail images for document pages.
  • Apply headers and footers to documents.
  • Perform OCR to produce a text-searchable PDF.
  • Automatically identify text to be redacted by regex.
  • Redact to PDF or plain text.
  • Burn-in annotations to PDF.

There is extensive documentation here – https://help.accusoft.com/PrizmDoc/sdks/server/dotnet/v1/.

Option 4: The Content Conversion Services RESTful API

Content Conversion Services (CCS) is available via PrizmDoc Cloud (Accusoft Hosted) or by self-hosting PrizmDoc Server. For self-hosting, the options include Windows, Linux, and Docker. Whichever you choose, the API will be available post-install.

The documentation is detailed, but most conversion operations start by uploading a source document to the API. Once the PDF is uploaded, the conversion to DOCX starts with a POST that references the uploaded file by its fileId:


    "request": {
        "method": "POST",
        "url": "http://localhost:18001/v2/contentConverters",
        "json": {
            "input": {
                "sources": [
                    {
                        "fileId": "p6tzcndAniM_dwQybR0LZw"
                    }
                ],
                "dest": {
                    "format": "docx"
                },
                "_features": {
                    "pdfToDocx": {
                        "enabled": true
                    }
                }
            }
        }
    },
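
For readers who want to issue that request from their own code, here is a minimal Node.js sketch. It only builds the JSON body shown above and POSTs it to the contentConverters route. Several things are assumptions and not confirmed by this post: both function names are hypothetical, the global fetch requires Node 18 or later, the Acs-Api-Key header name for PrizmDoc Cloud should be checked against the documentation, the response is assumed to be JSON, and the fileId is assumed to come from an earlier upload step.

```javascript
// Build the request body shown above for a previously uploaded file.
function buildPdfToDocxRequest(fileId) {
  if (typeof fileId !== 'string' || fileId.length === 0) {
    throw new TypeError('fileId must be a non-empty string');
  }
  return {
    input: {
      sources: [{ fileId }],
      dest: { format: 'docx' },
      _features: { pdfToDocx: { enabled: true } }
    }
  };
}

// POST the body to the contentConverters route and return the parsed response.
// fetchImpl is injectable so the function can be exercised without a server.
async function startPdfToDocxConversion({ baseUrl, fileId, apiKey, fetchImpl = globalThis.fetch }) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) {
    // Assumption: header name used by PrizmDoc Cloud; verify in the documentation
    headers['Acs-Api-Key'] = apiKey;
  }
  // Strip trailing slashes so the joined URL has exactly one separator
  const url = `${baseUrl.replace(/\/+$/, '')}/v2/contentConverters`;
  const response = await fetchImpl(url, {
    method: 'POST',
    headers,
    body: JSON.stringify(buildPdfToDocxRequest(fileId))
  });
  if (!response.ok) {
    throw new Error(`Conversion request failed with HTTP ${response.status}`);
  }
  // Assumption: the server replies with a JSON description of the conversion
  return response.json();
}
```

The documentation covers the remaining steps, such as checking on the conversion and downloading the resulting DOCX.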

 

In summary, PDF files, while great for viewing, signing, and forms completion, are not a common editing format. Certainly not as common as the ubiquitous Microsoft Word. If your organization has tasked you or your developers with PDF to DOCX conversion, the Content Conversion Services API and one of the above methods will take you from zero to hero using proven, supported technology.

If you have any questions about this technology, or any Accusoft solution, please contact us.

Brandon Mount, Sr. Pre-Sales Software Engineer

Brandon Mount currently works as a senior pre-sales software engineer at Accusoft. He joined Accusoft in 2010 as a sales engineer. Prior to Accusoft, Brandon spent most of the late 90s and 2000s working as a support and implementation consultant in the Enterprise Content Management space in London, UK and Silicon Valley. Brandon spends his days working with Accusoft customers, so if you are considering adopting Accusoft technologies, there is a very good chance you will meet him and his teammates. He has a passion for researching and understanding Accusoft customer requirements, and for matching them with the technology that will meet those requirements. When he’s not working, Brandon is a proud parent. For fun, he’s an amateur photographer. To stay mentally and physically fit, he jogs daily.