Velocity and CDATA in RSS Feed

erikg

21 Sep, 2012 05:39 PM

I'm having a lot of trouble creating a Velocity script that can properly parse an RSS feed containing CDATA sections. Here is a heavily shortened example of the feed:

<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
    <title>Reed Switchboard Hearts 2012</title>
    <link rel="alternate" type="text/html" href="http://blogs.reed.edu/reed_blogs/the_riffin_griffin/2012/06/reed-switchboard-hearts-2012.html" />
    <id>tag:blogs.reed.edu,2012:/reed_blogs/the_riffin_griffin//22.1707</id>

    <published>2012-06-01T16:53:33Z</published>
    <updated>2012-06-01T17:32:33Z</updated>

      <summary type="html">       
 <![CDATA[<p>Information here</p>]]></summary>
    <author>
        <name>Robin Tovey &apos;97</name>
        
    </author>
    <content type="html" xml:lang="en" xml:base="http://blogs.reed.edu/reed_blogs/the_riffin_griffin/">
        <![CDATA[<p>Information here</p>]]>
    </content>
</entry>
</feed>

and I have the following script:

#set ($feed = $_XPathTool.selectSingleNode($contentRoot,"/feed"))
#set ($entries = $feed.getChildren())

#if ( $entries.size() > 0 )

    #foreach($entry in $entries)
        #if ( $entry.getName() == "entry" )
            #set ( $elements = $entry.getChildren() )
            #foreach ( $e in $elements )
                #if ( $e.getName() == "title" ) 
                    #set ( $title = $e )
                #end
                  #if ( $e.getName() == "link" ) 
                    #set ( $link = $e )
                  #end
                  #if ( $e.getName() == "published" ) 
                    #set ( $published = $e )
                  #end
                  #if ( $e.getName() == "author" ) 
                    #set ( $author = $e )
                  #end
                  #if ( $e.getName() == "summary" ) 
                    #set ( $summary = $e )
                  #end
            #end
             
             <div class="newsPost">
            <h4><a href="$_EscapeTool.xml($link.getAttributeValue('href'))">$_EscapeTool.xml($title.value)</a></h4>
            <span class="cite">from $_EscapeTool.xml($author.value)</span> 
            
            $_SerializerTool.serialize($summary, true)
            
                </div>
        #end
    #end
#end

The issue is the presentation of the summary field. The output, despite serialization, contains the CDATA beginning and ending segments, which leak through and alter the presentation. I have not found a good way to remove the CDATA elements and still get the contents to display correctly. How do I deal with this?

  1. 1 Posted by erikg on 21 Sep, 2012 05:42 PM

    I should say I found I can do this in XSLT by disabling the escaping of the output of the summary field

    <xsl:value-of disable-output-escaping="yes" select="atom:summary"/>

    but I can see no way to do something similar in Velocity.
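
For readers who want to see the distinction outside the CMS, here is a minimal Python sketch. It is not Velocity and it makes no claim about how $_SerializerTool works internally. It only shows that, to an XML parser, a CDATA section is plain text: reading the element's text gives the raw HTML, while serializing the element escapes that text again. The trimmed feed string and the variable names are made up for illustration. Note also that Python's ElementTree drops the CDATA markers on parse, whereas the output described in this thread still contains them.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copy of the feed from the question (illustrative only).
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
    <title>Reed Switchboard Hearts 2012</title>
    <summary type="html">
 <![CDATA[<p>Information here</p>]]></summary>
</entry>
</feed>"""

root = ET.fromstring(feed_xml)
summary = root.find(f"{ATOM}entry/{ATOM}summary")

# The element's text is the raw HTML; the CDATA markers are gone after parsing.
raw_html = summary.text.strip()

# Serializing the element escapes that text, so the markup becomes literal characters.
serialized = ET.tostring(summary, encoding="unicode")

print(raw_html)
print(serialized)
```

Writing out the text value unescaped is roughly what disable-output-escaping="yes" asks the XSLT processor to do.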

  2. 2 Posted by Ryan Griffith on 21 Sep, 2012 06:56 PM

    Hi,

    I believe the issue is that HTML doesn't know what to do with the CDATA markers. I'm not sure why, but outputting the value directly works with the snippet you posted (I can't guarantee it always will):

    #set ($feed = $_XPathTool.selectSingleNode($contentRoot,"/feed"))
    #set ($entries = $feed.getChildren())
    
    #if ( $entries.size() > 0 )
    
        #foreach($entry in $entries)
            #if ( $entry.getName() == "entry" )
                #set ( $elements = $entry.getChildren() )
                #foreach ( $e in $elements )
                    #if ( $e.getName() == "title" ) 
                        #set ( $title = $e )
                    #end
                      #if ( $e.getName() == "link" ) 
                        #set ( $link = $e )
                      #end
                      #if ( $e.getName() == "published" ) 
                        #set ( $published = $e )
                      #end
                      #if ( $e.getName() == "author" ) 
                        #set ( $author = $e )
                      #end
                      #if ( $e.getName() == "summary" ) 
                        #set ( $summary = $e )
                      #end
                #end
                 
                 <div class="newsPost">
                <h4><a href="$_EscapeTool.xml($link.getAttributeValue('href'))">$_EscapeTool.xml($title.value)</a></h4>
                <span class="cite">from $_EscapeTool.xml($author.value)</span> 
                
                $summary.value
                
                    </div>
            #end
        #end
    #end
    

    Also, note that instead of looping through $entry.getChildren(), you could do:

    $entry.getChild('summary').value
    
  3. 3 Posted by erikg on 21 Sep, 2012 08:50 PM

    Actually, from what I can tell the looping has to occur. The feed has a namespace (the feed is an atom feed), so the elements can't be referred to directly like you would in regular XML constructions. Other discussion topics here regarding XML and atom feeds have used this looping convention to get around the namespace issue, and it's the only thing I've found to work. In fact, I've tried what you mention and it hasn't worked to return the node elements. Oh do I wish it could, though. :)
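
The namespace behaviour described above can be reproduced with any namespace-aware XML parser. The following Python sketch (again an illustration, not Cascade code; the two-line feed and the helper local_name are hypothetical) shows an unqualified lookup failing, a namespace-qualified lookup succeeding, and the loop-and-compare workaround that matches on the local name only.

```python
import xml.etree.ElementTree as ET

feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><title>Reed Switchboard Hearts 2012</title></entry>
</feed>"""

root = ET.fromstring(feed_xml)

# An unqualified name finds nothing: every element lives in the Atom namespace.
unqualified = root.find("entry")

# Option 1: qualify the name with the namespace URI through a prefix map.
ns = {"atom": "http://www.w3.org/2005/Atom"}
qualified = root.find("atom:entry", ns)


def local_name(element):
    """Return the tag without its '{namespace}' part (hypothetical helper)."""
    return element.tag.rsplit("}", 1)[-1]


# Option 2: loop over the children and compare local names only.
by_loop = [child for child in root if local_name(child) == "entry"]

print(unqualified, qualified is not None, len(by_loop))
```

In XPath 1.0 the second option is usually written with the standard local-name() function, for example //*[local-name()='entry']. This thread does not confirm how $_XPathTool handles such an expression, so treat that as something to try rather than a known fix.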

    As for the $summary.value part, that does not work for me either, though I imagine it will with the XML snippet I provided above. I abbreviated it for the example above, but the real content of the summary field looks more like this:

    <summary type="html">       
     <![CDATA[<p>
        <a href="http://blogs.reed.edu/reed_blogs/the_riffin_griffin/switchboard_5.12.jpg"><img alt="switchboard_5.12.jpg" class="mt-image-left" height="225" src="http://blogs.reed.edu/reed_blogs/the_riffin_griffin/assets_c/2012/05/switchboard_5.12-thumb-300x225-2989.jpg" style="float: left; margin: 0 20px 20px 0;" width="300" /></a></p>
    <p>
        The <a href="http://reedswitchboard.com/" target="_blank">Reed Switchboard</a> is a volunteer effort aimed at fostering contact between current students and alumni. We&#39;ve partnered with artist Lucy Bellwood &#39;12 to bring you the <a href="http://switchboardhearts2012.tumblr.com/" target="_blank">Switchboard Hearts 2012 project</a>. During <a href="http://reedfayre.reed.edu/" target="_blank">Reunions &#39;12: Reedfayre</a>, Lucy will be on campus taking photos of Reedies and their passions on Saturday, noon to 5 p.m., outside between Old Dorm Block and Eliot.</p>
    <p>
        <strong>Alumni</strong>: what are you passionate about? If a current student called you, what would you want to talk to them about? Who are the Reedies you&#39;d like to connect with and what are their interests? The photos we take during Reedfayre will be posted online later this summer so that everyone interested in, say, &quot;museums&quot; can learn about their shared interests and connect with each other. &nbsp;Join the photo shoot!</p>
    <p>
        For more information, send email to&nbsp;<a href="***@***">[email blocked]</a>.</p>
    ]]></summary>
    

    Because the embedded HTML contains entities such as &nbsp;, all I get are errors when I output the element with $summary.value. Sorry for not making my example above more robust.

    It seems to me that the SerializerTool is actually escaping the CDATA code segments so that they show. It would be better if it just stripped them out, or at least had an option to not escape the content like XSLT does.
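
One plausible reason for the &nbsp; errors (an assumption, not something confirmed in this thread) is that the unescaped output is being checked as XML. XML predefines only five named entities (&lt;, &gt;, &amp;, &quot;, &apos;), so &nbsp; is undefined unless a DTD declares it. The Python sketch below illustrates just that XML rule; the fragment and the helper is_well_formed are hypothetical.

```python
import xml.etree.ElementTree as ET

# A fragment like the one inside the real summary (illustrative only).
fragment = "<p>For more information, send email to&nbsp;us.</p>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &nbsp; is an HTML entity that plain XML does not define, so parsing fails.
before = is_well_formed(fragment)

# The numeric character reference for the same character is always valid XML.
after = is_well_formed(fragment.replace("&nbsp;", "&#160;"))

print(before, after)
```

If the assumption holds, replacing HTML-only entities with numeric references before output would be one thing to try. Whether that is enough in this particular setup is untested here.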

  4. 4 Posted by erikg on 21 Sep, 2012 09:05 PM

    Oh, for anyone interested, here is the source feed I'm trying to parse:

    http://blogs.reed.edu/the_riffin_griffin/feed_reunions.xml

  5. 5 Posted by Ryan Griffith on 24 Sep, 2012 02:45 PM

    Hi Erik,

    My apologies, I didn't see the entities within the summary.

    I'm not seeing a straightforward way to output that content. You could perhaps encode the HTML using $_EscapeTool.javascript() (this seems to strip out the CDATA markers), then use JavaScript to decode the HTML and append the result to your listing. I found a few posts by searching for something like "javascript html entity decode".

    As you stated above, I'm thinking your best bet would be to use an XSLT Format to output your content.

    Hopefully someone else has encountered this situation with Velocity and will chip in.

  6. 6 Posted by Ryan Griffith on 18 Oct, 2012 03:55 PM

    Hi Erik,

    I was going over some older discussions and noticed this one is still open. Were you able to get your Velocity Format working, or did you switch to XSLT?

    Please feel free to let us know if you have any other questions.

    Thanks.

  7. 7 Posted by erikg on 18 Oct, 2012 04:08 PM

    Thanks for checking in. I ended up abandoning the Velocity approach and went with XSLT instead. I just couldn't get Velocity to behave correctly without introducing unnecessary complexity. A simple XSLT script did the trick.

    From what I could see, Velocity lacked the equivalent of XSLT's disable-output-escaping option, as in the following:

    <xsl:value-of disable-output-escaping="yes" select="atom:summary"/>

    Velocity has been my script format of choice, but it's not quite ready for some things. Namespaces and CDATA seem to be its greatest weaknesses.

  8. 8 Posted by Ryan Griffith on 18 Oct, 2012 05:25 PM

    Thank you for the follow up, Erik. Glad to hear you were able to get an XSLT equivalent working.

    "Velocity has been my script format of choice, but it's not quite ready for some things. Namespaces and CDATA seem to be its greatest weaknesses."

    I definitely agree. There are some things that XSLT does better, and those are two of them.

    I'm going to go ahead and close this discussion. Feel free to reply or comment to re-open it if you have any additional questions.

    Thanks.

  9. Ryan Griffith closed this discussion on 29 Oct, 2012 02:26 PM.
