How to select all published pages from the database

jperreault's Avatar

jperreault

18 Jul, 2014 10:35 PM

Hi,

I am trying to get a list of all published pages by querying the database. I used this query :

select cachepath from cxml_foldercontent
where assettype = 'PAG'
and iscurrentversion = 1
and siteid = '18cb76db7f0000010104957650b3d5a1'
and shouldbepublished = 1
and isrecycled = 0
order by cachepath

However, sometimes the parent folder, or the parent's-parent folder is set to not publish. The result is that my list has a lot of pages which are not actually published, because one of the parent, grand-parent, etc folders has been set to not publish.

Can you help me with a way to select just the pages that get published ? If not through the database, is there some other way ? I tried using a block, but not all published pages are indexed.

Thanks,

Jeff

  1. 1 Posted by Wing Ming Chan on 21 Jul, 2014 09:03 PM

    Wing Ming Chan's Avatar

    Hi Jeff,

    This morning I wrote a few lines of code and did exactly what you wanted by using web services. The solution is provided by the AssetTree class. Let me know if you are interested.

    Wing

  2. 2 Posted by Ryan Griffith on 22 Jul, 2014 12:34 PM

    Ryan Griffith's Avatar

    As always, thank you for helping out, Wing.

    One thing to keep in mind, as Jeff mentioned, is just because a page has a last published on timestamp, does not necessarily mean it is published. Also, the definition of what it means to be published could vary.

    For example:

    • The page may have been unpublished, which won't change this timestamp. Or, perhaps it was not (un)published to all available Destinations.
    • An ancestor Folder could be set to be unpublishable, or may have been unpublished at some point.
    • The page may have been removed from a Destination server manually, while it still appears to have been published from within Cascade.

    You would almost need to traverse the system and check the Destination server(s) to see if the asset still exists. And, perhaps even compare the last modified timestamp to the page's published on timestamp.

  3. 3 Posted by Wing Ming Chan on 22 Jul, 2014 01:59 PM

    Wing Ming Chan's Avatar

    Hi,

    My interpretation of 'publishable' is the following:

    1. The asset must have the shouldBePublished property set to true
    2. The parent folder must be publishable (defined recursively)

    This definition only identifies assets (pages, files, folders) that CAN be published. To check whether they have been published/unpublished where and so on, we still need to look at the the lastPublishedDate property and possibly the associated audit.

    Wing

  4. 4 Posted by Ryan Griffith on 23 Jul, 2014 06:09 PM

    Ryan Griffith's Avatar

    You are definitely spot on, Wing, with regards to what would be considered publishable. I guess the question is, does Jeff want to locate publishable or published assets?

  5. 5 Posted by jperreault on 23 Jul, 2014 06:25 PM

    jperreault's Avatar

    I was hoping to use a database query to select all “publishable” pages. We're re-designing our website, and want to do some QA to test if we've remembered to update every page.

    I could get published assets by logging into the webserver and using a unix command like this :

    find /var/www/html/ -name "*.html"

    In reality, they should be nearly the same, but I know that our webserver has some extra old "orphaned" html pages that someone renamed when we were using a previous version of Cascade Server. Even now, sometimes people make mistakes and uncheck the "Unpublish Content" when renaming files or directories.

    I was hoping there was a simple SQL query, but if need be, I can use the webservices calls that Wing discussed to traverse the entire site directories tree

    Jeff

  6. 6 Posted by Ryan Griffith on 23 Jul, 2014 06:34 PM

    Ryan Griffith's Avatar

    Hi Jeff,

    Let me check with the developers to see if this would be possible using a query, or if this would need to be done programatically. I will be in touch as soon as I hear back.

    Thanks!

  7. 7 Posted by Wing Ming Chan on 23 Jul, 2014 10:26 PM

    Wing Ming Chan's Avatar

    Jeff,

    Besides the CascadeInstances class, I am also working on a Report class. Getting a report you want will require a simple method call. The class can be available in a few days.

    Wing

  8. 8 Posted by Ryan Griffith on 24 Jul, 2014 12:47 PM

    Ryan Griffith's Avatar

    Jeff and Wing,

    After speaking with a developer, determining what is publishable would need to be done programatically.

    A few more scenarios he mentioned that I had overlooked:

    • If the page is set to be publishable, but its outputs are not publishable
    • If the page's content type is configured with outputs that are not publishable
    • If the page's content type is configured to publish to specific destinations and those destinations are not publishable
  9. 9 Posted by Wing Ming Chan on 24 Jul, 2014 01:50 PM

    Wing Ming Chan's Avatar

    That's why I prefer global configuration sets and content types. At Upstate we cannot just simply shut down an output in a configuration set, because this will basically make everything related to that output of that configuration set, which is used by every site, unpublishable. Site destinations are different and should be dealt with at the site level.

    Wing

  10. 10 Posted by Wing Ming Chan on 29 Jul, 2014 03:18 PM

    Wing Ming Chan's Avatar

    Just to give everybody a heads-up, my Report class is available now. See Generating Reports for details. To generate a report of publishable pages, we just need code like the following:

    $results = $report->
        setRootFolder( // where to start traversal
            $cascade->getFolder( 
                'how-to/getting-started', // folder path
                'cascade-admin' )         // site name
        )->reportPublishable();
    echo S_H2 . "Publishable Assets" . E_H2;
    DebugUtility::dump( $results );       // dump the report
    
    

    I am still adding report types though...

    Wing

  11. Ryan Griffith closed this discussion on 20 Aug, 2014 01:05 PM.

Comments are currently closed for this discussion. You can start a new one.

Keyboard shortcuts

Generic

? Show this help
ESC Blurs the current field

Comment Form

r Focus the comment reply box
^ + ↩ Submit the comment

You can use Command ⌘ instead of Control ^ on Mac