I've got a legacy app that has just started to misbehave, and I'm not sure why. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF. The process works like this:

1. Pull an HTML template from a DB with tokens in it to be replaced (e.g. …).
2. Tidy the HTML with a simple regex function that properly formats HTML tag attribute values (ensures quotation marks, etc., since ActivePDF's rendering engine hates anything but single quotes around attribute values).
3. Send off the HTML to a web service that creates the PDF.

Somewhere in that mess, the non-breaking spaces from the HTML template (the `&nbsp;`s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (Firefox). ActivePDF pukes on these non-UTF8 characters.

My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it doesn't change anything.

    ' Requires: Imports System.Text
    Private Shared Function ConvertToUTF8(ByVal html As String) As String
        Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim source As Byte() = isoEncoding.GetBytes(html)
        ' Transcode the bytes to UTF-8, then decode them back into a String.
        Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
    End Function

I'm getting by for now with a `ReplaceNonASCIIChars(ByVal html As String) As String` function, though it hardly seems like a good solution.

> Somewhere in that mess, the non-breaking spaces from the HTML template (the `&nbsp;`s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character

That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â ". That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.

What's the regexp, and how does the templating work? There would seem to be a proper HTML parser involved somewhere if your `&nbsp;` strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM, and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.

Well anyway, for now you can add a `<meta>` charset declaration for UTF-8 to your document's `<head>` and see if that makes it look right in the browser. If you've done that, then any remaining problem is ActivePDF's fault.

KompoZer was an open-source HTML editor created in 2005 to fix bugs in an earlier HTML editor known as Nvu. Its lead developer was Fabien Cazenave, who is also known as Kaz. KompoZer's unusual spelling might be linked to Kaz's and Glazman's names. In source mode, `&#x200A;` is handled similarly: it is replaced with a small space that looks like an ordinary space.
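The byte-level explanation in the answer (0xA0 in ISO-8859-1, 0xC2,0xA0 in UTF-8, misread as "Â" plus a non-breaking space) can be checked with a few lines of code. The sketch below is in Python rather than the app's VB.NET, purely for illustration. The helper names `to_ascii_with_char_refs` and `repair_misread` are hypothetical, and the repair helper assumes that a single UTF-8-read-as-ISO-8859-1 mistake is the only damage, which may not hold for the real pipeline.

```python
# U+00A0 NO-BREAK SPACE is the character an HTML parser produces for "&nbsp;".
NBSP = "\u00a0"

# The same character as bytes in the two encodings discussed above.
latin1_bytes = NBSP.encode("iso-8859-1")   # one byte: 0xA0
utf8_bytes = NBSP.encode("utf-8")          # two bytes: 0xC2, 0xA0

# Reading the UTF-8 bytes as ISO-8859-1 gives U+00C2 (A with circumflex)
# followed by a real non-breaking space, i.e. the stray character from the question.
misread = utf8_bytes.decode("iso-8859-1")


def to_ascii_with_char_refs(html: str) -> bytes:
    """Serialise to ASCII, writing each non-ASCII character as a numeric
    character reference so the output no longer depends on the charset."""
    return html.encode("ascii", errors="xmlcharrefreplace")


def repair_misread(text: str) -> str:
    """Undo one UTF-8-read-as-ISO-8859-1 mistake (hypothetical helper).
    Raises an error if the text contains anything that does not fit that pattern."""
    return text.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    print(latin1_bytes, utf8_bytes, repr(misread))
    print(to_ascii_with_char_refs("Total:\u00a0\u00a35"))
    print(repr(repair_misread(misread)))
```

Serialising to ASCII with character references is the more robust of the two helpers: the output is plain ASCII, so it reads the same whether a consumer assumes ISO-8859-1 or UTF-8. The repair helper is a stopgap at best, since it only works when the misreading happened exactly once.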