Automatically updating XML sitemap

Nicole Foster

13 Oct, 2014 02:13 PM

I am looking to create an XML sitemap that is automatically generated, updated, and published, and that pulls only published content if possible.

I have used the Format from the Sitemap SEO code repository.

My Index Block is attached as a screenshot. Here is my Data Definition XML:

<system-data-structure>
  <asset type="block" identifier="chooser" label="Index Block" render-content-depth="5" required="true"/>
</system-data-structure>

Everything is being pulled correctly, but I'm not sure how to use all of this to generate an XML sitemap.

Any advice?

  1. Posted by Ryan Griffith on 13 Oct, 2014 03:04 PM

    Hi Nicole,

    Once you have the Index Block and the Data Definition set up, what you will need is a simple Template containing a single DEFAULT region:

    <system-region name="DEFAULT"/>
    

    Next, you will need to begin setting up your Page by creating a Configuration Set. Within your Configuration Set, create an XML Output that uses your new Template. Within that Output's DEFAULT region, apply the XSLT Format provided by the repository and assign a calling-page Index Block. You will then need a Content Type that is assigned your new Data Definition and Configuration Set.

    Once you create a new page using this Content Type and select an Index Block, the resulting content should contain the sitemap XML. You can choose to publish this page manually, or add it to a Publish Set so it can be scheduled to publish automatically.
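
    For illustration only, here is a minimal Python sketch (not Cascade code) of what the finished page should contain: a sitemaps.org <urlset> that lists only published assets. It assumes each asset has already been reduced to a path, a formatted last-modified date and a published flag; the Asset class and build_sitemap function are hypothetical names, and the default_namespace argument needs Python 3.8+.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Asset:
    """Hypothetical, simplified view of one asset from an Index Block."""
    path: str            # e.g. "/majorcareer/index" (no file extension)
    last_modified: str   # already formatted, e.g. "2014-10-10T14:59:13+00:00"
    is_published: bool


def _q(tag: str) -> str:
    """Return the tag name qualified with the sitemap namespace."""
    return f"{{{SITEMAP_NS}}}{tag}"


def build_sitemap(assets, root_url: str, extension: str = ".html") -> str:
    """Return a sitemaps.org <urlset> string containing only published assets."""
    urlset = ET.Element(_q("urlset"))
    for asset in assets:
        if not asset.is_published:
            continue  # leave unpublished content out of the sitemap
        url = ET.SubElement(urlset, _q("url"))
        ET.SubElement(url, _q("loc")).text = root_url + asset.path + extension
        ET.SubElement(url, _q("lastmod")).text = asset.last_modified
    # default_namespace emits xmlns="..." instead of an ns0: prefix
    return ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)


print(build_sitemap(
    [Asset("/majorcareer/index", "2014-10-10T14:59:13+00:00", True),
     Asset("/drafts/todo", "2014-10-11T09:00:00+00:00", False)],
    "http://www.syr.edu",
))
```

    In the real setup, the XSLT Format does this work inside Cascade; the sketch only shows the shape of the output and where the published-only filter sits.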

    Please let me know if you have any questions.

    Thanks!

  2. Posted by Nicole Foster on 13 Oct, 2014 03:40 PM

    Hi Ryan,

    Thank you for explaining this all to me.

    It seems to be working fine. However, I need to modify the rootURL variable:

    <xsl:variable name="rootURL">http://www.hannonhill.com</xsl:variable>
    

    If possible, I would like to grab the site's URL dynamically so I can reuse this Format across our 20+ sites. Is there a way in XSLT to get the site URL?

    Thanks,
    Nicole

  3. Posted by Ryan Griffith on 13 Oct, 2014 07:39 PM

    Hi Nicole,

    Unfortunately, there is currently no direct way to include the Site's URL. You would need to either use a different Format per Site or, if most of your Sites share the same base URL, find a way to include the sub-directory for each individual Site.

    We do have the following related suggestions on our Idea Exchange; I highly recommend voting them up:

    Please let me know if you have any questions.

    Thanks!

  4. Posted by Nicole Foster on 14 Oct, 2014 02:11 PM

    Hi Ryan,

    Thank you for clearing that up with me.

    My sites do share the same base domain, but with different subdomains. How would you recommend I try this approach?

    Thanks,
    Nicole

  5. Posted by Ryan Griffith on 14 Oct, 2014 08:42 PM

    Hi Nicole,

    One workaround clients have used in the past is to name the Site the same as its URL, replacing slashes with a valid character such as a plus sign or underscore. You can then take the value of the <site> element, clean it up a little (e.g., replace the plus signs with slashes), and use it as the front of each URL.

    For example, a Site named www.syr.edu+folder+subfolder would, after replacing the plus signs and adding the protocol, give you http://www.syr.edu/folder/subfolder as the base for your links.
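
    The clean-up step is just a character replacement plus a protocol prefix. Here is a small Python sketch of it, assuming the naming workaround described above; site_name_to_base_url is a hypothetical helper, not part of Cascade.

```python
def site_name_to_base_url(site_name: str, separator: str = "+",
                          protocol: str = "http://") -> str:
    """Hypothetical helper: turn a Site named after its URL into a base URL.

    Assumes the naming workaround above, where slashes in the URL were
    replaced by `separator` because slashes are not valid in Site names.
    """
    return protocol + site_name.replace(separator, "/")


# Site named with plus signs standing in for slashes
print(site_name_to_base_url("www.syr.edu+folder+subfolder"))  # -> http://www.syr.edu/folder/subfolder
# Subdomain-only Site: nothing to replace, the name is used as-is
print(site_name_to_base_url("domain.syr.edu"))  # -> http://domain.syr.edu
```

    Inside the Format itself, the same replacement is a single XPath 1.0 call, translate(site, '+', '/'), whose result can then be concatenated after 'http://'.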

    Please let me know if you have any questions.

    Thanks!

  6. Posted by Nicole Foster on 21 Oct, 2014 02:19 PM

    Hi Ryan,

    Thank you for explaining this solution to me.

    Is there a way to set this up for a subdomain such as domain.syr.edu? Many of our live sites don't use subfolders.

    Thanks,
    Nicole

  7. Posted by Ryan Griffith on 21 Oct, 2014 02:33 PM

    Hi Nicole,

    You certainly can use a subdomain for the Site's URL. In fact, that's even easier: you don't have to replace anything and can simply use the value of <site> as-is.

    Please let me know if you have any questions.

    Thanks!

  8. Posted by Nicole Foster on 21 Oct, 2014 05:02 PM

    Hi Ryan,

    I figured it would be simpler with subdomains.

    How would I get the value of <site> in this instance?

    Thanks,
    Nicole

  9. Posted by Ryan Griffith on 21 Oct, 2014 05:53 PM

    Hi Nicole,

    You can obtain this value while looping over your assets: it is the <site> element of each asset. Feel free to attach your Format and I would be more than happy to help point you in the right direction.
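
    As a rough illustration of where <site> lives, this Python sketch walks a simplified Index Block and reads the <site> and <path> children of every page at any depth, which is what the Format's templates do while looping. The sample XML is hypothetical and heavily trimmed (real Index Block output carries many more elements), and page_sites_and_paths is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed Index Block output (real output has many more elements).
SAMPLE = """<system-index-block>
  <system-folder>
    <name>majorcareer</name>
    <site>DSA-CS-New</site>
    <path>/majorcareer</path>
    <system-page>
      <name>index</name>
      <site>DSA-CS-New</site>
      <path>/majorcareer/index</path>
    </system-page>
    <system-folder>
      <name>current-stories</name>
      <site>DSA-CS-New</site>
      <path>/majorcareer/current-stories</path>
      <system-page>
        <name>test</name>
        <site>DSA-CS-New</site>
        <path>/majorcareer/current-stories/test</path>
      </system-page>
    </system-folder>
  </system-folder>
</system-index-block>"""


def page_sites_and_paths(index_block_xml: str):
    """Yield (site, path) for every system-page, however deeply it is nested."""
    root = ET.fromstring(index_block_xml)
    for page in root.iter("system-page"):
        # findtext reads a direct child's text (None if the child is missing)
        yield page.findtext("site"), page.findtext("path")


for site, path in page_sites_and_paths(SAMPLE):
    print(site, path)
```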

    Please let me know if you have any questions.

    Thanks!

  10. Posted by Nicole Foster on 21 Oct, 2014 06:31 PM

    Hi Ryan,

    Are you asking for the format of the sitemap?

    If so, here it is below:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!--
        Generate On-demand Sitemap XML from Structured Data
        Created by Ross Williams on 2010-02-17.
        Copyright (c) 2010 Hannon Hill Corp. All rights reserved.
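    -->
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" version="1.0">
        <!-- Declaring the sitemap namespace here puts <url>, <loc>, <lastmod> and <changefreq> in it, not only <urlset> -->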
    
        <xsl:include href="/_internal/stylesheets/format-date"/>
        
        <xsl:variable name="rootURL">Site URL will go here</xsl:variable>
        <xsl:variable name="defaultExtension">.html</xsl:variable>
        <xsl:variable name="lastPublished" select="/system-index-block/@current-time"/>
    
        <xsl:template match="/">
            <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
                <xsl:apply-templates/>
            </urlset>
        </xsl:template>
        
        <xsl:template match="/system-index-block[calling-page]">
            <xsl:apply-templates select="calling-page/system-page/system-data-structure"/>
        </xsl:template>
        
        <xsl:template match="/system-data-structure | system-data-structure[parent::system-page[parent::calling-page]]">
            <xsl:apply-templates select="*[content/system-index-block or path]"/>
        </xsl:template>
    
        <!-- Process all non-Index-Block assets -->
        <xsl:template match="*[content[not(system-index-block)] and path] | system-page | system-file">
            <url>
                    <loc><xsl:value-of select="concat($rootURL,path,$defaultExtension)"/></loc>
                    <xsl:call-template name="generate-lastmod"/>
                    <xsl:call-template name="generate-changefreq"/>
            </url>
        </xsl:template>
        
        <!-- Catch non-Index-Block assets that are not published -->
        <xsl:template match="system-page[ancestor::system-folder[not(is-published)]] | system-file[ancestor::system-folder[not(is-published)]]"/>
        
        <!-- Catch Index Blocks and process their contained elements -->
        <xsl:template match="*[content/system-index-block]">
            <xsl:apply-templates select="content/system-index-block/descendant::system-page"/>
        </xsl:template>
        
        <!-- Output the asset's last-modified date as <lastmod> -->
        <xsl:template name="generate-lastmod">
            <xsl:variable name="lastmodFormat">UTC:yyyy-mm-dd'T'HH:MM:ss'+00:00'</xsl:variable>
            <lastmod><xsl:call-template name="format-date">
                <xsl:with-param name="date" select="last-modified"/>
                <xsl:with-param name="mask" select="$lastmodFormat"/>
            </xsl:call-template></lastmod>
        </xsl:template>
        
        <xsl:template name="generate-changefreq">
            <xsl:choose>
                <xsl:when test="dynamic-metadata[name='changefreq']/value != ''">
                    <changefreq><xsl:value-of select="dynamic-metadata[name='changefreq']/value"/></changefreq>
                </xsl:when>
                <xsl:otherwise/>
            </xsl:choose>
        </xsl:template>
        
        <!-- Catch-all -->
        <xsl:template match="*" priority="-10">
            [system-view:internal]
            <xsl:comment> Unexpected Element Encountered: name="<xsl:value-of select="local-name()"/>" </xsl:comment>
            [/system-view:internal]
        </xsl:template>
        
    </xsl:stylesheet>
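
    To make the generate-lastmod template concrete, here is a Python equivalent of the date mask UTC:yyyy-mm-dd'T'HH:MM:ss'+00:00', assuming <last-modified> holds milliseconds since the Unix epoch (an assumption about the Index Block output; check your own XML). to_w3c_lastmod is a hypothetical name, and the example timestamp is simply the one that corresponds to the first <lastmod> value in the output posted later in this thread.

```python
from datetime import datetime, timezone


def to_w3c_lastmod(last_modified_ms: int) -> str:
    """Format epoch milliseconds like the mask UTC:yyyy-mm-dd'T'HH:MM:ss'+00:00'.

    Assumes <last-modified> is milliseconds since 1970-01-01T00:00:00Z.
    """
    moment = datetime.fromtimestamp(last_modified_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S+00:00")


print(to_w3c_lastmod(1412953153000))  # -> 2014-10-10T14:59:13+00:00
```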
    

    Thanks,
    Nicole

  11. Posted by Ryan Griffith on 21 Oct, 2014 06:55 PM

    Thank you for providing the Format, Nicole.

    So, assuming all of your Sites are renamed to match their URLs, e.g., www.syr.edu and domain.syr.edu, you would modify the following line:

    <loc><xsl:value-of select="concat($rootURL,path,$defaultExtension)"/></loc>
    

    To something like the following:

    <loc><xsl:value-of select="concat('http://', substring-after(link, 'site://'),$defaultExtension)"/></loc>
    

    Alternatively, I believe you could also use the following:

    <loc><xsl:value-of select="concat('http://',site,path,$defaultExtension)"/></loc>
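
    The two expressions can be compared outside Cascade with this Python sketch, which mirrors each concat() call. It assumes <link> values look like site://SiteName/path (which is what substring-after(link, 'site://') implies) and that <path> starts with a slash; the function names are hypothetical. The last call shows that a Site which has not been renamed produces the Site name where the host should be.

```python
def loc_from_link(link: str, extension: str = ".html") -> str:
    """Mirror concat('http://', substring-after(link, 'site://'), $defaultExtension)."""
    # Like XPath substring-after, str.partition yields "" if 'site://' is absent.
    _, _, rest = link.partition("site://")
    return "http://" + rest + extension


def loc_from_site_and_path(site: str, path: str, extension: str = ".html") -> str:
    """Mirror concat('http://', site, path, $defaultExtension)."""
    return "http://" + site + path + extension


print(loc_from_link("site://www.syr.edu/majorcareer/index"))        # -> http://www.syr.edu/majorcareer/index.html
print(loc_from_site_and_path("www.syr.edu", "/majorcareer/index"))  # -> http://www.syr.edu/majorcareer/index.html
# A Site that was not renamed yields the Site name, not a host name:
print(loc_from_site_and_path("DSA-CS-New", "/majorcareer/index"))   # -> http://DSA-CS-New/majorcareer/index.html
```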
    

    Please let me know if you have any questions.

    Thanks!

  12. Posted by Nicole Foster on 21 Oct, 2014 07:16 PM

    Hi Ryan,

    Thank you for your guidance.

    It seems that the URL is getting messed up. When the XML is exported, it is grabbing the name of the site and not the actual URL.

    Here is what one of our test sites looks like:

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
    <loc>http://DSA-CS-New/majorcareer/index.html</loc>
    <lastmod>2014-10-10T14:59:13+00:00</lastmod>
    </url>
    <url>
    <loc>http://DSA-CS-New/majorcareer/current-stories/test.html</loc>
    <lastmod>2014-10-10T14:59:14+00:00</lastmod>
    </url>
    <url>
    <loc>http://DSA-CS-New/resumesletters/index.html</loc>
    <lastmod>2014-10-10T14:59:12+00:00</lastmod>
    </url>
    <url>
    <loc>http://DSA-CS-New/resumesletters/template.html</loc>
    <lastmod>2014-10-10T14:59:12+00:00</lastmod>
    </url>
    <url>
    <loc>http://DSA-CS-New/sitemap.html</loc>
    <lastmod>2014-10-21T19:10:04+00:00</lastmod>
    </url>
    <url>
    <loc>http://DSA-CS-New/sitemap.html</loc>
    <lastmod>2014-10-21T19:10:04+00:00</lastmod>
    </url>
    </urlset>
    

    DSA-CS-New is the name of the site, but the URL I gave to Cascade is http://dsa-webpriv.syr.edu/depts/cs-new

    Any idea what could be happening?

    Thanks,
    Nicole

  13. Posted by Ryan Griffith on 21 Oct, 2014 07:42 PM

    Hi Nicole,

    "It seems that the URL is getting messed up. When the XML is exported, it is grabbing the name of the site and not the actual URL."

    Correct. As I mentioned, this Format assumes you have renamed all appropriate Sites so that their names take the form www.syr.edu or domain.syr.edu. That way, you can use the Site name as the URL when generating the links. Otherwise, there is no way to obtain the Site's URL from your Format.

    Please let me know if you have any questions.

    Thanks!

  14. Posted by Nicole Foster on 21 Oct, 2014 07:48 PM

    Hi Ryan,

    Thank you for clarifying that. I must have misunderstood what you said.

    I will look into other options. Thank you for your help.

    Thanks,
    Nicole

  15. Posted by Ryan Griffith on 21 Oct, 2014 07:56 PM

    Not a problem at all, Nicole. My apologies for not being able to provide you with an easy solution here.

    Definitely vote up the suggestions I linked to previously if you have not already. I believe they would help make a solution to this problem much simpler.

    Let me check with our Services team to see if they may have any suggestions.

    In the meantime, another option would be to publish a separate sitemap file to the base folder of each Site and use an external script that reads those files and aggregates them after rewriting the URLs somehow. You might run into the same issue, though.
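
    A rough Python sketch (3.9+) of that external-script idea follows. It assumes each Site publishes its own sitemap whose <loc> values begin with http://<Site name>/ (as in the output Nicole posted) and that someone maintains a mapping from Site names to real base URLs outside Cascade; merge_sitemaps and that mapping are hypothetical. Keeping the mapping up to date is the manual step that may make this no easier than renaming Sites.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def merge_sitemaps(sitemaps: dict[str, str], site_urls: dict[str, str]) -> str:
    """Merge per-Site <urlset> documents into one, fixing each <loc> prefix.

    sitemaps:  Site name -> sitemap XML published for that Site
    site_urls: Site name -> real base URL (hypothetical, maintained by hand)
    """
    merged = ET.Element(f"{{{NS}}}urlset")
    for site_name, xml_text in sitemaps.items():
        wrong_prefix = f"http://{site_name}/"
        for url in ET.fromstring(xml_text).findall(f"{{{NS}}}url"):
            loc = url.find(f"{{{NS}}}loc")
            text = (loc.text or "").strip()  # tolerate <loc> values wrapped over lines
            if text.startswith(wrong_prefix):
                base = site_urls[site_name].rstrip("/")
                loc.text = base + "/" + text[len(wrong_prefix):]
            merged.append(url)
    return ET.tostring(merged, encoding="unicode", default_namespace=NS)


example = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://DSA-CS-New/majorcareer/index.html</loc>"
    "<lastmod>2014-10-10T14:59:13+00:00</lastmod></url></urlset>"
)
print(merge_sitemaps({"DSA-CS-New": example},
                     {"DSA-CS-New": "http://dsa-webpriv.syr.edu/depts/cs-new"}))
```

    Note that the sitemaps.org protocol expects all URLs in one sitemap file to reside on the same host as that file, so merging Sites on different subdomains into a single file may require splitting it again or cross-submitting via robots.txt.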

    Please let me know if you have any questions.

    Thanks!

  16. Ryan Griffith closed this discussion on 10 Nov, 2014 09:00 PM.
