I want to load an HTML document in PrizmDoc with UTF-8 encoding. Can this be done automatically in the product?
Currently, no. We have a parameter, "textFileEncoding", that does this for .txt files (detailed here), but it intentionally applies only to .txt files, not .html files. There is a feature request for this:
In the meantime, you can work around this manually by declaring the encoding in the HTML document's meta tag, i.e. adding charset="utf-8" so that the tag reads <meta charset="utf-8">. One proof-of-concept way to do this programmatically in Python 3.7 is below. It needs obvious polishing, such as checking whether the charset is already declared and handling multiple meta tags:
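```python
filename = "document.html"  # placeholder path to the HTML file to fix

# Read as UTF-8 explicitly so the platform default encoding cannot mangle the text
with open(filename, "r", encoding="utf-8") as file:
    content = file.read()

# Insert charset="utf-8" right after the first "<meta" (does nothing if there is no meta tag)
index = content.find("<meta")
if index != -1:
    index += len("<meta")
    new_content = content[:index] + ' charset="utf-8"' + content[index:]
    with open(filename, "w", encoding="utf-8") as file:
        file.write(new_content)
```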
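A slightly more polished sketch of the same workaround follows, covering the "obvious polishing" items mentioned above. It is an illustration under stated assumptions, not a PrizmDoc feature: the function names add_utf8_charset and fix_file are hypothetical, and it uses simple regular expressions rather than a full HTML parser. It skips files that already declare a charset (in either the <meta charset> or the http-equiv form), and otherwise places a new <meta charset="utf-8"> tag at the start of <head>, falling back to just after <html>, then after the DOCTYPE, then the very start of the file.

```python
import re

# An existing declaration in either form counts:
#   <meta charset="...">  or  <meta http-equiv="Content-Type" content="text/html; charset=...">
CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=", re.IGNORECASE)
# Require whitespace or ">" after the tag name so that <header> is not mistaken for <head>
HEAD_RE = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)
HTML_RE = re.compile(r"<html(?:\s[^>]*)?>", re.IGNORECASE)
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)
UTF8_META = '<meta charset="utf-8">'


def add_utf8_charset(html: str) -> str:
    """Return html with <meta charset="utf-8"> added, unless a charset is already declared."""
    if CHARSET_RE.search(html):
        return html  # respect whatever the document already declares
    # Prefer the start of <head>; otherwise after <html>, otherwise after the DOCTYPE
    anchor = HEAD_RE.search(html) or HTML_RE.search(html) or DOCTYPE_RE.search(html)
    if anchor:
        return html[:anchor.end()] + UTF8_META + html[anchor.end():]
    # Bare fragment with none of those tags: prepend the declaration
    return UTF8_META + html


def fix_file(filename: str) -> bool:
    """Rewrite filename in place if needed; return True if the file was changed."""
    # Read and write as UTF-8, and leave line endings untouched (newline="")
    with open(filename, "r", encoding="utf-8", newline="") as f:
        content = f.read()
    new_content = add_utf8_charset(content)
    if new_content == content:
        return False
    with open(filename, "w", encoding="utf-8", newline="") as f:
        f.write(new_content)
    return True
```

For example, fix_file("report.html") (a hypothetical path) would update the file before it is loaded into PrizmDoc and return False on a second run, since the charset is then already declared. The tag goes first inside <head> because the HTML encoding prescan only examines the first 1024 bytes of a document, so a declaration buried further down may be ignored.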