How to Use an API to Convert a PDF to a DOCX File
There are a variety of reasons why someone would need to convert a file from one format to another, but the process can be time-consuming and confusing if you don’t know where to start. In this blog, I’ll discuss how to use an API to convert a PDF to a DOCX file. If you are interested in other file format conversions, check out How to Convert an Excel File to PDF, and keep an eye out for new blogs on different conversions coming soon.
Why PDFs Are Commonly Used File Formats
PDFs are arguably one of the most widely used document file formats in the world. Some of the primary reasons people use PDFs include:
- PDF files are portable and cross-platform.
- Plenty of viewing options are available.
- They are a reasonably future-proof archival format, perhaps the best available, especially when converted and ‘flattened’ to PDF/A.
- You can be sure when sharing a PDF that all users have the same experience, as PDF files support images, vector content, graphics and much more. There are many types of PDF files, but they all share a consistency in display.
- PDF files can (usually) be compressed to save storage space.
Benefits of Converting a PDF to an MS Word DOCX File
PDF (Portable Document Format) is great for various uses. We all have PDF viewers on our Windows PCs, Macs, and even Linux systems. For many years now, our favourite browsers have been capable PDF viewers. PDFs are great for disseminating information, controlling the look and feel of your message, and being flexible enough to contain images, vector content (embedded CAD files), and even 3D objects. Some of the most common use cases for PDF viewers, outside of simply viewing documents, include form filling and e-signature ceremonies.
Editing PDF files tends to be less user-friendly. It sometimes requires expensive editing software that takes training to use and, perhaps more importantly, has to be licensed. If you already have a potent MS Word / DOCX editor in your browser, or you license Microsoft Office™ in one of the many possible ways, why bother with licensing or learning another editor? Instead, convert the PDF(s) to DOCX using Accusoft’s Content Conversion Services API, included with PrizmDoc Viewer. There are plenty of benefits:
- User familiarity with MS Word / MS Office is at an all-time high, with most competing DOCX editors adopting the style language and ‘look and feel’ of MS Word.
- Online SaaS DOCX editors are proliferating, from the consumer and business focused Google Docs to more specialized OEM Editors, like Accusoft’s own PrizmDoc Editor.
- DOCX editing is better understood by laypeople. Most of us have edited a Word file; not many have edited a PDF.
- Your grandmother can do it.
Do you have your own reasons? Let us know!
Benefits of DOCX Files
When it comes to deciding whether or not to convert to a DOCX file, the simplest argument for doing so is that Microsoft took the time to build PDF to DOCX conversion right into Microsoft Word. Open Word, select File, then Open, choose a PDF file, and Word will convert it to a DOCX for you immediately.
Opening and converting each file by hand might not be the best option for a lot of developers and ISVs. People who build applications targeting specific use cases, and who want to keep their licensed users engaged, need tools that can be white-labeled and automated. That’s where the Content Conversion Services (CCS) API from Accusoft can help. Let’s look at how conversion can be streamlined using RESTful APIs.
Rather than detail one method, I’m going to provide several different options. All use the same Content Conversion Services (CCS) API. The server endpoint can be self-hosted, or you can sign up and use PrizmDoc Cloud (Accusoft Hosted).
There are several different ways to deploy a content conversion API:
- Write your own helper application.
- Use one provided by Accusoft, for example the sample Express application.
- Use the API directly in your application.
- Take advantage of Accusoft’s specialty .NET SDK.
Here are some popular options in no particular order.
Option 1: A Simple Node.js API for Document Processing
Link – https://github.com/Accusoft/document-processing-helper
From the readme.md:
“Simple node.js helper for document processing, powered by PrizmDoc Server. You can use this helper with either PrizmDoc Cloud or your own self-hosted PrizmDoc Server”.
If you don’t have your own PrizmDoc Server instance, the easiest way to get started is with PrizmDoc Cloud. Sign up for a free trial account to get an API key here. Sample code for converting a PDF to DOCX is below. For more information, see the full documentation.
const Helper = require('@accusoft/document-processing-helper');

async function main() {
  // Point the helper at PrizmDoc Cloud (or at your own PrizmDoc Server)
  const documentProcessingHelper = new Helper({
    prizmDocServerBaseUrl: 'https://api.accusoft.com',
    apiKey: 'YOUR_API_KEY'
  });

  // Initialize a conversion
  const output = await documentProcessingHelper.convert({
    input: 'Your_PDF_FILE.PDF',
    outputFormat: 'DOCX'
  });

  // Download the output and save the file
  await output[0].saveToFile('output.docx');
}

// Report a failed conversion instead of leaving an unhandled rejection
main().catch(console.error);
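Since the goal is automation rather than converting files one at a time, the same two calls can be wrapped in a loop. The following is a minimal sketch, not part of the helper itself: `docxNameFor` and `convertAll` are hypothetical names introduced here, and the code assumes only the `convert` and `saveToFile` calls shown in the sample above.

```javascript
// Hypothetical helper: swap a trailing .pdf extension (any case) for .docx.
function docxNameFor(pdfPath) {
  return pdfPath.replace(/\.pdf$/i, '') + '.docx';
}

// Hypothetical helper: convert each PDF in turn using an already constructed
// document-processing helper. One failed file does not stop the rest.
async function convertAll(documentProcessingHelper, pdfPaths) {
  const results = [];
  for (const pdfPath of pdfPaths) {
    try {
      const output = await documentProcessingHelper.convert({
        input: pdfPath,
        outputFormat: 'DOCX'
      });
      const target = docxNameFor(pdfPath);
      await output[0].saveToFile(target);
      results.push({ pdfPath, target, ok: true });
    } catch (err) {
      results.push({ pdfPath, ok: false, error: err.message });
    }
  }
  return results;
}
```

With the helper constructed as in the sample, `await convertAll(documentProcessingHelper, ['a.pdf', 'b.pdf'])` would return one result entry per input file, so the caller can report which conversions failed.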
Option 2: An Express Server Sample
This option requires that you perform a local installation of PrizmDoc Server on Windows or Linux. This one provides a nice GUI with all options available.
If you want this option, follow the install guides nested within the PrizmDoc Administration section of the help file.
Done already? Excellent. You should now be able to access the Express sample at http://localhost:18001, or the domain name of your PrizmDoc Server. See this Configuration Guide for more on the Express Demo.
Option 3: The PrizmDoc Server .NET SDK
This option is great for .NET developers who work in Visual Studio. The PrizmDoc Server .NET SDK is a wrapper around the PrizmDoc Server REST APIs, making it easy to use PrizmDoc Server functionality in .NET.
You can use this library with any deployment of PrizmDoc Server, whether it’s your own self-hosted deployment or Accusoft’s cloud-hosted offering. Simply construct an instance of PrizmDocServerClient with the information about how to connect to PrizmDoc Server and start using any of the document-processing methods to do things like:
- Convert documents to PDF, TIFF, JPEG, PNG, or SVG.
- Combine documents to PDF or TIFF.
- Extract pages from documents.
- Split and merge pages from various documents.
- Create thumbnail images for document pages.
- Apply headers and footers to documents.
- Perform OCR to produce a text-searchable PDF.
- Automatically identify text to be redacted by regex.
- Redact to PDF or plain text.
- Burn-in annotations to PDF.
There is extensive documentation here – https://help.accusoft.com/PrizmDoc/sdks/server/dotnet/v1/.
Option 4: The Content Conversion Services RESTful API
Content Conversion Services (CCS) is available via PrizmDoc Cloud (Accusoft Hosted) or by self-hosting PrizmDoc Server. For self-hosting, the options include Windows, Linux, and Docker. Whichever you choose, the API will be available post-install.
The documentation is detailed, but most conversion operations start by uploading a source document to the API. Once the PDF is uploaded, the conversion to DOCX starts with a POST that references the uploaded file by its fileId:
"request": {
  "method": "POST",
  "url": "http://localhost:18001/#/requestLo/v2/contentConverters",
  "json": {
    "input": {
      "sources": [
        {
          "fileId": "p6tzcndAniM_dwQybR0LZw"
        }
      ],
      "dest": {
        "format": "docx"
      },
      "_features": {
        "pdfToDocx": {
          "enabled": true
        }
      }
    }
  }
},
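To show how that request might be issued from code, here is a hedged sketch using the `fetch` function built into recent versions of Node.js. The function names `buildPdfToDocxBody` and `startPdfToDocx` are hypothetical. The base URL and any authentication headers are assumptions the caller must supply for their own deployment, and the sketch assumes the endpoint path is the `/v2/contentConverters` portion of the URL in the request above.

```javascript
// Build the JSON body from the request above for a given uploaded file ID.
function buildPdfToDocxBody(fileId) {
  return {
    input: {
      sources: [{ fileId }],
      dest: { format: 'docx' },
      _features: { pdfToDocx: { enabled: true } }
    }
  };
}

// Hypothetical wrapper: POST the body to the contentConverters endpoint.
// baseUrl and extraHeaders (for example, an API key header) are assumptions
// supplied by the caller; check the documentation for your deployment.
async function startPdfToDocx(baseUrl, fileId, extraHeaders = {}) {
  const response = await fetch(`${baseUrl}/v2/contentConverters`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', ...extraHeaders },
    body: JSON.stringify(buildPdfToDocxBody(fileId))
  });
  if (!response.ok) {
    throw new Error(`Conversion request failed: HTTP ${response.status}`);
  }
  // Return the parsed response as-is; its fields are described in the API docs.
  return response.json();
}
```

Keeping the body builder separate from the network call makes the request easy to inspect or unit test without a running server.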
In summary, PDF files are great for viewing, signing, and completing forms, but they are not a common editing format, and certainly not as common as the ubiquitous Microsoft Word. If your organization has tasked you or your developers with PDF to DOCX conversion, the Content Conversion Services API and one of the above methods will take you from zero to hero using proven, supported technology.
If you have any questions about this technology, or any Accusoft solution, please contact us.