Why doesn't cascade XML use CDATA for WYSIWYG content?

pkhalil

13 Aug, 2010 11:08 PM

When using the WYSIWYG editor, the raw XML from Cascade places HTML elements directly inside the XML elements. This makes the output very difficult to work with and also risks valid HTML breaking the XML. If the HTML is not also valid XHTML, the XML is no longer well-formed, which renders it broken and useless.

Here's an example ...

If the WYSIWYG contains the following valid HTML markup ...

<h1>This is a title
<p>this is a paragraph

The raw XML output from Cascade would then be malformed, since not all tags are closed ...

<system-data-structure>
    <wysiwyg-content>
        <h1>This is a title
        <p>this is a paragraph
    </wysiwyg-content>
</system-data-structure>

Somehow, either Cascade or the WYSIWYG editor itself silently converts the markup to XHTML, with varying results ...

<h1>this is a heading
<p> this is a paragraph</p></h1>

This is not what was intended, and the result is now invalid HTML, even though the XML is well-formed :(

The correct solution would be for Cascade to leave the WYSIWYG markup unaltered and embed the content in the XML using CDATA:

<system-data-structure>
    <wysiwyg-content><![CDATA[ 
        <h1>This is a title
        <p>this is a paragraph
    ]]></wysiwyg-content>
</system-data-structure>
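
A minimal Python sketch of how a consumer would read the CDATA form above. It uses only the standard library's `xml.etree.ElementTree`; the variable names are illustrative, and this is not a description of how Cascade itself works. The parser should hand back the CDATA content as plain text, with the unclosed HTML tags untouched and no child elements created.

```python
import xml.etree.ElementTree as ET

# The CDATA form proposed above; the HTML inside is not well-formed XML.
CDATA_DOC = """<system-data-structure>
    <wysiwyg-content><![CDATA[
        <h1>This is a title
        <p>this is a paragraph
    ]]></wysiwyg-content>
</system-data-structure>"""

root = ET.fromstring(CDATA_DOC)
content = root.find("wysiwyg-content")

# The CDATA section arrives as ordinary text; the HTML is not parsed as XML.
raw_html = content.text
print(raw_html.strip())
```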

Is there any way to get Cascade to correct this behavior? It seems like this should at least be a setting, if not the default.

  1. Support Staff 1 Posted by Tim on 18 Aug, 2010 09:03 PM

    Hi,

    Cascade is based on XML so Tidy will run and attempt to make the content well-formed XML.

    Do you want to output those tags as HTML markup, or are you trying to render them as text on a page? If you are trying to render them as text, try using the escape sequences for those characters (i.e. &lt; and &gt;).
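
A minimal Python sketch of the escaping approach mentioned above, assuming the markup is escaped before it is placed in the XML. `escape` comes from the standard library's `xml.sax.saxutils`; the element name and sample markup are borrowed from the original question, and the variable names are illustrative.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

html_fragment = "<h1>This is a title\n<p>this is a paragraph"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(html_fragment)

# The escaped text is safe to place inside an element.
document = "<wysiwyg-content>" + escaped + "</wysiwyg-content>"

# Parsing turns the escape sequences back into the original characters.
recovered = ET.fromstring(document).text
print(escaped)
```

The escaped form and the CDATA form carry the same text; they differ only in how that text is written in the XML file.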

  2. 2 Posted by pkhalil on 19 Aug, 2010 05:52 AM

    The problem here is that this mixes the content with the XML structure, and there is no easy way to separate the two ... this probably violates all sorts of XML data principles, but mainly it makes the output a major pain in the ass to work with, whether using XSLT or PHP's XML functions.

  3. 3 Posted by pkhalil on 23 Aug, 2010 06:30 PM

    Ok, here is a little XML quiz ... which one of these options is the best way
    to embed HTML in XML? (hint: not the way cascade does it)


    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <item>
            <!-- 0. this is plain data -->
            here is some plain text data
        </item>
        <item>
            <!-- 1. this HTML will break the XML -->
            <h1>this is a heading
            <p>this is a paragraph
            <br>
        </item>
        <item>
            <!-- 2. this XHTML will be fine as long as there are no unknown entities or encodings -->
            <h1>this is a heading</h1>
            <p>this is a paragraph</p>
            <br />
        </item>
        <item>
            <!-- 3. this XHTML has entities &amp; will break -->
            <h1>this is a heading</h1>
            <p>this is a paragraph with entities &copy; </p>
            <br />
        </item>
        <item>
            <!-- 4. this example shows unwrapped content and nesting problems -->
            this is unwrapped text
            <h1>this is a heading</h1>
            here is more unwrapped text
            <p>this is a paragraph with <span>nested content</span>
            and even more unwrapped text
        </item>
        <item>
            <!-- 5. this CDATA will be fine no matter what -->
            <![CDATA[
            this is unwrapped text
            <h1>this is a heading</h1>
            here is more unwrapped text
            <p>this is a paragraph with <span>nested content</span>and entities &copy;</p>
            and even more unwrapped text
            <br>
            ]]>
        </item>
    </root>
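
    A minimal Python sketch for checking the quiz, using shortened versions of items 1, 2, 3 and 5. It relies only on the standard library's `xml.etree.ElementTree`; the `is_well_formed` helper and the labels are illustrative names, not part of Cascade or of any library.

```python
import xml.etree.ElementTree as ET

# Shortened versions of quiz items 1, 2, 3 and 5.
ITEMS = {
    "1. unclosed HTML": "<item><h1>this is a heading<p>this is a paragraph<br></item>",
    "2. XHTML": "<item><h1>this is a heading</h1><p>this is a paragraph</p><br /></item>",
    "3. XHTML with &copy;": "<item><p>this is a paragraph with entities &copy; </p></item>",
    "5. CDATA": "<item><![CDATA[<h1>this is a heading</h1><p>entities &copy;</p><br>]]></item>",
}


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


results = {label: is_well_formed(text) for label, text in ITEMS.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "parse error")
```

    Item 3 fails because `&copy;` is an HTML entity that plain XML does not define unless a DTD declares it.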
  4. Support Staff 4 Posted by Tim on 27 Aug, 2010 06:54 PM

    Hi,

    Cascade Server uses XML throughout the entire application. The technologies that the system uses to transform content (Velocity, XSL) rely on the data being well-formed XML. Having said that, I think I have a couple of different ways that may enable you to do this:

    • Create a Text Block (New -> Default -> Block -> Text Block) containing this data and plug it into a region where you need this content

    • Create a Data Definition which has a text area. Then, you can use an XSLT Format to output the text area. In order to do this, you'll need to make use of the cdata-section-elements and disable-output-escaping attributes. Here is an example I put together which will transform a very simple Data Definition that contains 1 text area (with an identifier named 'text-area'):

      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <!-- cdata-section-elements only affects text inside output elements named text-area -->
          <xsl:output cdata-section-elements="text-area" method="xml" />
          <xsl:template match="/">
              <xsl:apply-templates select="system-data-structure" />
          </xsl:template>
          <xsl:template match="system-data-structure">
              <!-- Write the text area's value without escaping < and & -->
              <xsl:value-of disable-output-escaping="yes" select="text-area" />
          </xsl:template>
      </xsl:stylesheet>

    I am still curious to find out what this HTML is going to be used for. I tested this content on various browsers and the result seems to be inconsistent.

    Anyhow, hope my comments above will help.

  5. 5 Posted by pkhalil on 30 Aug, 2010 07:23 PM

    Hi Tim,

    Thanks for the reply.

    Maybe I'm not doing a very good job of explaining the situation.

    It's not really about what the HTML will be used for; that was just a basic example. It's more about getting the HTML out of the XML.

    I'm basically using Cascade for data input and then publishing the raw XML through a template that has a REGION=DEFAULT.

    Once the XML is published from Cascade, it is processed with PHP to output the appropriate content on various pages.

    The problem I'm having is mostly with the way the WYSIWYG editor embeds HTML in the XML, because at that point the HTML structure is intertwined with the XML structure.

    It just doesn't make sense to me why it was done this way, especially for a product that is built around XML & HTML ...
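
    A minimal sketch of one way to pull the embedded markup back out of the published XML, written in Python rather than the PHP mentioned above. It assumes the content was stored as well-formed child elements of `wysiwyg-content`; the `inner_markup` helper is a hypothetical name, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def inner_markup(element):
    """Serialize everything inside an element, without the element's own tags."""
    # Text that appears before the first child element.
    parts = [element.text or ""]
    for child in element:
        # tostring() includes the child's tail, i.e. the text that follows it.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


DOC = (
    "<system-data-structure><wysiwyg-content>"
    "unwrapped text<h1>This is a title</h1>more text"
    "<p>this is a <em>paragraph</em></p>"
    "</wysiwyg-content></system-data-structure>"
)

root = ET.fromstring(DOC)
html = inner_markup(root.find("wysiwyg-content"))
print(html)
```

    This only recovers what the XML serializer writes, so it returns the markup as it was stored, not necessarily as it was typed into the editor.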

  6. Tim closed this discussion on 21 Jun, 2011 01:27 PM.
