A user is having problems with tinymce, UTF8 encoding and formatting issues

jhh

10 Apr, 2012 03:10 PM

A user is having problems with TinyMCE, UTF-8 encoding, and formatting. Here are the contents of a few email messages he passed my way:

Message 1
No change that I can discern. The behind-the-screen load of UTF-8 characters still gets them into the database correctly as UTF-8 characters and back out again for page serving. However, editing a page loaded that way (merely going into edit mode and submitting) or putting UTF-8 encoded characters into the WYSIWYG boxes (but not textareas) seems to trigger an instant translation to HTML Unicode entities, and the UTF-8 is gone for good (or bad, actually).

Message 2
Well, there’s one piece of good news, but mostly nothing’s changed. The good news is that when I do an automated behind-the-screen load of page data while converting it from the old site to a CMS one, the UTF-8 characters I stuff into the CMS stay UTF-8 characters and publish as UTF-8 characters (previously they became byte-by-byte entities). Unfortunately, the instant I edit that loaded page through the CMS, the WYSIWYG editor (apparently) converts them all to HTML Unicode entities, the same as when I enter UTF-8 characters into the WYSIWYG box. Once the CMS has made them HTML Unicode entities, it keeps them forever as HTML Unicode entities and publishes them as such too, not as UTF-8 encoded characters. I didn’t see any changes for text boxes or textareas, but I did not examine them minutely.

Message 3
I’m not getting UTF-8 output out of the Cascade server on publication. I’m getting HTML entities instead, regardless of whether I check the UTF-8 option for my destination. UTF-8 can be input, but it immediately gets translated to HTML entities and saved that way.

INPUT:

 年 ミシガン州のロムニー知事と滋賀県の野崎知事によって、姉妹県州協定が締結されました。ミシガン州と滋賀県のパートナーシップは、日本とアメリカの姉妹県州関係のなかで、最も歴史が古く、さまざまな分野に広がっています。ミシガンと滋賀の住民は、姉妹都市間の共同事業や、学生、教員、地域社会の方々、官庁職員を含む交流事業によって、緊密な関係を維持し続けています。

OUTPUT:

年 ミシガン州のロムニー知事と滋賀県の野崎知事によって、姉妹県州協定が締結されました。ミシガン州と滋賀県のパートナーシップは、日本とアメリカの姉妹県州関係のなかで、最も歴史が古く、さまざまな分野に広がっています。ミシガンと滋賀の住民は、姉妹都市間の共同事業や、学生、教員、地域社会の方々、官庁職員を含む交流事業によって、緊密な関係を維持し続けています。

In any event, it is a problem, but I suspect you folks already know about it.

Let me know if there is anything to be done for it at this time.

Thanks,
John Hayes

  1. 1 Posted by Ryan Griffith on 20 Jun, 2012 01:24 PM

    Hi John,

    Apologies for the late response.

    There was a recent discussion regarding language support that sounds similar to this.

    In Tim's response, he mentions:

    Cascade applies cleanup routines to the content. During this process, Cascade will convert those characters to numeric entities. This is done to prevent some issues we've experienced in the past with UTF-8 characters getting mangled by the editor.
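To make the conversion described above concrete, here is a minimal Python sketch of what "UTF-8 characters become numeric entities" looks like, along with one way the published markup could be turned back into literal characters in a post-processing step. This is not Cascade's or TinyMCE's actual code: the function names `to_numeric_entities` and `from_numeric_entities` are hypothetical, and the sketch assumes the entities are plain decimal references such as `&#24180;`.

```python
import re


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity.

    Illustrates the kind of output the thread describes; it is an
    assumption that the CMS cleanup produces decimal references.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_numeric_entities(markup: str) -> str:
    """Turn decimal numeric entities back into literal characters.

    Only &#NNNN; references are touched, so named entities such as
    &amp; and &lt; are left alone and the markup stays valid.
    """
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), markup)


sample = "年 ミシガン州"
encoded = to_numeric_entities(sample)
decoded = from_numeric_entities(encoded)

print(encoded)            # ASCII-only markup made of &#...; references
print(decoded == sample)  # True: the round trip is lossless
```

Both forms render identically in a browser, so the entity form is lossless. The difference matters mainly when the published source has to contain literal UTF-8, for example when it is consumed by another tool.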

    Please let us know if you have any further questions.

    Thanks.

  2. Ryan Griffith closed this discussion on 30 Jul, 2012 01:03 PM.
