I want to load an HTML document in PrizmDoc with UTF-8 encoding. Can this be done automatically in the product?
Currently, no. There is a “textFileEncoding” parameter that does this for .txt files (described in the PrizmDoc documentation), but it intentionally applies only to .txt files, not .html files. There is a feature request for HTML support:
https://ideas.accusoft.com/ideas/PDV-I-546
In the meantime, you can fix this manually by adding charset="utf-8" to a meta tag in the HTML document. One proof-of-concept way to do this programmatically in Python 3.7 is below. It needs obvious polishing, such as checking whether a charset is already declared and handling multiple “meta” tags:
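```python
filename = "document.html"  # hypothetical path to the HTML file to patch

# Read as UTF-8 so non-ASCII characters survive regardless of the OS default encoding
with open(filename, "r", encoding="utf-8") as file:
    content = file.read()

# Insert the attribute right after the first "<meta"; skip the file if there is none
index = content.find("<meta")
if index != -1:
    index += len("<meta")
    new_content = content[:index] + ' charset="utf-8"' + content[index:]
    with open(filename, "w", encoding="utf-8") as file:
        file.write(new_content)
```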
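A slightly more defensive sketch that covers some of that polishing: it leaves files that already declare a charset alone (either as meta charset or the older http-equiv="Content-Type" form) and, instead of editing an arbitrary existing meta tag, inserts a standalone meta charset element at the start of the head element. Placing it first matters because HTML5 expects the encoding declaration within the first 1024 bytes of the document. The names add_utf8_charset and patch_file are hypothetical, the regular expressions are heuristics rather than a real HTML parser, and it assumes the file's bytes really are UTF-8.

```python
import re

# An existing declaration: <meta charset=...> or the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">
CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=", re.IGNORECASE)
# The opening <head> tag, with or without attributes (\b keeps it from matching <header>)
HEAD_RE = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
# A leading doctype, used as the fallback insertion point when there is no <head>
DOCTYPE_RE = re.compile(r"\s*<!doctype[^>]*>", re.IGNORECASE)
META = '<meta charset="utf-8">'


def add_utf8_charset(html: str) -> str:
    """Return html with a UTF-8 charset declaration, unless one is already present."""
    if CHARSET_RE.search(html):
        return html  # leave any existing declaration untouched
    # Prefer the start of <head>; else just after a leading doctype; else the very start
    match = HEAD_RE.search(html) or DOCTYPE_RE.match(html)
    pos = match.end() if match else 0
    return html[:pos] + META + html[pos:]


def patch_file(filename: str) -> None:
    """Rewrite filename in place, assuming its contents are already UTF-8 encoded."""
    # newline="" keeps the file's original line endings on both read and write
    with open(filename, "r", encoding="utf-8", newline="") as f:
        html = f.read()
    patched = add_utf8_charset(html)
    if patched != html:  # only touch the file when something changed
        with open(filename, "w", encoding="utf-8", newline="") as f:
            f.write(patched)
```

For example, patch_file("document.html") (a hypothetical path) adds the declaration only if the file lacks one, so running it twice is harmless. Not overriding an existing declaration is a deliberate choice: a file that declares another encoding may really be in that encoding, and forcing UTF-8 would mis-decode it.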