A Guide for Adding Voice Support to the PrizmDoc Viewer

Technology is simplifying communication between humans and computers. The old keyboard-and-mouse model has become antiquated for many uses, and even touch screens introduce inefficiencies to the process of entering and retrieving information. As multitasking has become the norm in both personal and professional life, a simpler alternative, voice technology, is emerging as the new standard for managing devices in the home as well as the office.

The use of voice commands is a simple, natural way to control machines, freeing up the user’s hands much like speakerphones, headsets and the popular Amazon Echo and Google Home smart speakers. Word recognition technology still isn’t perfect and can lead to some unintended (and very funny) gaps in semantics, but most users are more than willing to accept the occasional misstep in exchange for the convenience of breaking away from the keyboard and screen while operating a device.

Who can voice technology help?

There are numerous cases where this technology is useful, such as presenting information. Years ago, a slide show presenter might use an assistant to manually advance through a display, typically a sequence of images or data. More recently, a Bluetooth controller with “Next” and “Previous” buttons could maneuver somewhat more smoothly through the slides. But voice commands take the interface to another level, letting the presenter interact more freely with the audience and possibly present more information in parallel.

Users reading instructions to perform a complex task can also benefit from the technology. Voice commands controlling a phone or tablet can help navigate through a PDF or Microsoft Word document detailing recipes, furniture assembly or even engine repair while leaving the operator’s hands – and eyes – free to handle the job.

Portability is another key element in the use of voice commands. The ability to verbally control an application makes it much more accessible and convenient on the go, whether at a hospital, school or other public place where typing can be awkward and time-consuming. Retrieving maps, directions and contact information are all important tasks for mobile users that are simplified by voice functionality.

An entire generation, of course, is already accustomed to this technology, using it to handle everyday tasks without a second thought: “Siri, where’s the nearest coffee shop?” “Alexa, play that song that goes ‘Make a change, and break away.’” More and more devices facilitate the exchange of information by handling voice commands just like these, indulging the growing demand for automation.

Making the PrizmDoc viewer understand voice

We identified a number of use cases where voice support can be a huge help for anyone viewing information. The versatile PrizmDoc is a powerful tool for accessing a multitude of documents and images, and here we will detail the technical process of making it understand voice commands.

PrizmDoc allows multiple types of customization, and adding basic voice control is not too different from adding a custom button. For our purposes we are using the Web Speech API to process voice data.
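As a rough illustration of what using the Web Speech API involves, here is a minimal sketch of creating and configuring a recognizer. The `SpeechRecognition` constructor (prefixed as `webkitSpeechRecognition` in Chrome) and its `lang`, `continuous`, `interimResults`, `onresult`, `start()` and `stop()` members come from the API itself. The local interfaces, the `createRecognizer` helper and the `"en-US"` setting are assumptions made for this example, not part of PrizmDoc.

```typescript
// Minimal local typings for the parts of the Web Speech API used in this sketch.
interface RecognitionAlternative {
  transcript: string;
  confidence: number;
}
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<RecognitionAlternative>>;
}
interface Recognizer {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  start(): void;
  stop(): void;
}
type RecognizerConstructor = new () => Recognizer;

// The object that may expose the constructor; in a browser this is `window`.
interface SpeechScope {
  SpeechRecognition?: RecognizerConstructor;
  webkitSpeechRecognition?: RecognizerConstructor;
}

// Hypothetical helper: returns a configured recognizer, or null when the
// browser does not expose the Web Speech API.
function createRecognizer(
  scope: SpeechScope,
  onTranscript: (text: string) => void
): Recognizer | null {
  const Ctor = scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
  if (!Ctor) {
    return null;
  }
  const recognizer = new Ctor();
  recognizer.lang = "en-US";          // assumed language for this example
  recognizer.continuous = true;       // keep listening after each phrase
  recognizer.interimResults = false;  // report final results only
  recognizer.onresult = (event) => {
    // Use the best alternative of the most recent result.
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript.trim().toLowerCase());
  };
  return recognizer;
}

// In a browser, usage might look like:
//   const recognizer = createRecognizer(window as unknown as SpeechScope, handleText);
//   recognizer?.start();
```

Passing the scope in as a parameter, rather than reading `window` directly, keeps the feature detection in one place and makes the helper easy to exercise outside a browser.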

We will add a button which starts voice recognition in the viewer. When the button is pressed, the viewer will listen and process a few voice commands, such as scrolling, page navigation and adding annotations.
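One way to handle those commands is to turn each recognized transcript into a small command object that the viewer code can act on. The sketch below is illustrative only: the spoken phrases, the `VoiceCommand` type and the `parseVoiceCommand` function are hypothetical names chosen for this example, and the PrizmDoc calls that would actually carry out each command are deliberately left out.

```typescript
// Hypothetical command shapes for scrolling, page navigation and annotations.
type AnnotationShape = "arrow" | "rectangle" | "text";
type VoiceCommand =
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "page"; target: "next" | "previous" | number }
  | { kind: "annotate"; shape: AnnotationShape };

// Maps a recognized phrase to a command, or returns null when the phrase
// is not understood. The phrases themselves are assumptions.
function parseVoiceCommand(transcript: string): VoiceCommand | null {
  const text = transcript.trim().toLowerCase();

  const scroll = /^scroll (up|down)$/.exec(text);
  if (scroll) {
    return { kind: "scroll", direction: scroll[1] as "up" | "down" };
  }

  if (text === "next page") {
    return { kind: "page", target: "next" };
  }
  if (text === "previous page") {
    return { kind: "page", target: "previous" };
  }

  // Assumes the recognizer reports the page number as digits ("go to page 5").
  const goTo = /^go to page (\d+)$/.exec(text);
  if (goTo) {
    return { kind: "page", target: Number(goTo[1]) };
  }

  const annotate = /^add (arrow|rectangle|text) annotation$/.exec(text);
  if (annotate) {
    return { kind: "annotate", shape: annotate[1] as AnnotationShape };
  }

  return null;
}
```

For example, `parseVoiceCommand("Go to page 12")` yields a page command with target `12`, while an unrelated phrase such as "make coffee" yields `null`.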

If the viewer cannot understand a command, it will indicate this with a red question mark in the voice recognition button.
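The button's behavior can be modeled as a small state machine. The following is a sketch under assumed names: the three states, the event names, the labels and the CSS class names are hypothetical and would need to match whatever markup and styles the viewer actually uses.

```typescript
// Hypothetical states for the voice recognition button.
type ButtonState = "idle" | "listening" | "unrecognized";
// "toggle" is a button press; the other two report a recognition outcome.
type RecognitionEvent = "toggle" | "understood" | "not-understood";

function nextButtonState(current: ButtonState, event: RecognitionEvent): ButtonState {
  if (event === "toggle") {
    // Pressing the button starts listening, or stops it if already active.
    return current === "idle" ? "listening" : "idle";
  }
  if (current === "idle") {
    // Ignore late results that arrive after listening has stopped.
    return "idle";
  }
  return event === "understood" ? "listening" : "unrecognized";
}

// Assumed labels and class names; a stylesheet could render
// "voice-unrecognized" as the red question mark.
function buttonView(state: ButtonState): { label: string; cssClass: string } {
  switch (state) {
    case "listening":
      return { label: "Listening", cssClass: "voice-listening" };
    case "unrecognized":
      return { label: "?", cssClass: "voice-unrecognized" };
    default:
      return { label: "Voice", cssClass: "voice-idle" };
  }
}
```

Keeping the state logic separate from the DOM update means the question mark clears automatically as soon as the next command is understood.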

Adding the button HTML

The bulk of the viewer markup is inside the file viewerTemplate.html; this includes all the toolbars and vertical slide-outs.

OS and browser requirements

This example was tested in the Chrome browser running on Windows. See the “Browser Compatibility” section on Mozilla’s developer site for more details on browser support.

Note: Chrome only allows access to the microphone if the website uses the HTTPS protocol, or if it is hosted locally.
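A page can check this condition up front and hide or disable the voice button when the microphone will not be available. The sketch below approximates the rule from the note; the function name and the list of local host names are assumptions for illustration, and in a real page the browser's own `window.isSecureContext` property is the more reliable signal.

```typescript
// Hypothetical helper approximating "HTTPS or hosted locally".
// `protocol` and `hostname` use the same format as location.protocol
// (for example "https:") and location.hostname (for example "localhost").
function microphoneLikelyAllowed(protocol: string, hostname: string): boolean {
  if (protocol === "https:") {
    return true;
  }
  // Assumed set of host names treated as local.
  const localHosts = ["localhost", "127.0.0.1", "[::1]"];
  return localHosts.indexOf(hostname) !== -1;
}

// In a browser, usage might look like:
//   if (!microphoneLikelyAllowed(location.protocol, location.hostname)) {
//     // hide or disable the voice recognition button
//   }
```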

A more complete viewer solution

PrizmDoc, compatible with all programming languages and platforms and able to display all major file formats, has established itself as the premier development toolkit for document and image viewing. Adding voice support to its user interface makes it easier to operate and saves time and trouble for developers and end users alike.

To learn more about PrizmDoc and its uses across a wide range of industries, click here. To view demos of PrizmDoc’s features, click here. To contact Accusoft with any questions or comments about this blog or its products for document and imaging solutions, click here.