Can PrizmDoc Viewer load HTML documents with UTF-8?

Question

I want to load an HTML document in PrizmDoc with UTF-8 encoding. Can this be done automatically in the product?

Answer

Currently, no. We have a parameter for .txt files which does that (detailed here), but this “textFileEncoding” intentionally only works for .txt, not .html files. There is a feature request for this:

https://ideas.accusoft.com/ideas/PDV-I-546

In the meantime, this can be fixed manually by adding charset = “utf-8” to the meta tag of the HTML document. One POC way this might be done programmatically is below in Python 3.7 (need obvious polishing like checking for the tag already existing, multiple “meta” tags, etc):

with open(filename, "r") as file:
    content = file.read()

index = content.find("meta") + len("meta")

new_content = content[:index] + " charset=\"utf-8\" " + content[index:]

with open(filename, "w") as file:
    file.write(new_content)