Options for indexing a lot of data in one folder

geoff

03 Feb, 2016 05:01 PM

Hi folks,

I'm trying to figure out options for indexing a folder that contains about 8,000 blocks of content & metadata. I'm not sure how many bytes of data that equates to, but it's too much data to render with a single index block. For context, we need reports of various data combinations from all content blocks in that folder. Unfortunately, it's not practical at the moment to divide that folder's content into smaller, more digestible chunks. So I'm wondering what other options are available.

-- Increase RAM: Our CMS currently has 8GB and the Max. Rendered Size is set to 40 MB. Is there an easy way to estimate how much of an increase is needed to handle the report I need? I'm hesitant to increase the Max. Rendered Size without increasing the RAM (I don't want to crash the CMS under all our concurrent users) and I don't know how much extra RAM would meet the need. Thoughts?

-- Index using multiple index blocks: Is that even possible? For example, can I create eight index blocks to index 1000 assets each without overlap?

-- Create temporary duplicates of the content to be indexed: I could copy the 8,000 blocks into smaller folders and index each folder independently, but I'd have to repeat that process at least a couple of times a year because the content isn't static, so this option isn't as efficient as I'm hoping for.

Any other options? Thanks in advance for any suggestions.

  1. Support Staff 1 Posted by Tim on 05 Feb, 2016 09:10 PM

    Hi Geoff,

    For context, we need reports of various data combinations from all content blocks in that folder.

    Do these reports need to be visible within the CMS (by viewing a Page containing the results, for example) or would the information work for you outside of the application? I ask because Web Services could potentially be an option, although performing Read operations on that many assets isn't going to be very easy on the system in terms of resource usage either.

    -- Increase RAM: Our CMS currently has 8GB and the Max. Rendered Size is set to 40 MB.

    That setting is already configured to the highest value I've ever seen. Prior to that, the highest I had seen was around 20MB. As you are probably already aware, raising that even more can put a real strain on system resources and cause performance problems system-wide for all users. So, I probably wouldn't recommend increasing this any more at this point.

    -- Index using multiple index blocks: Is that even possible? For example, can I create eight index blocks to index 1000 assets each without overlap?

    This isn't really possible unless you divide the content into different Folders, then create separate Index Blocks for each of those Folders. Even so, my guess is that you'd want to aggregate all of those Index Blocks which would essentially lead to the same resource usage as one large Index Block. One minor benefit to this approach might be that the Index Block cache could work more efficiently, but I'm not sure how much of a noticeable difference you might see.
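
    To illustrate the "divide into Folders" idea, here is a minimal sketch of planning a non-overlapping split outside the CMS. It only computes which block would go in which folder; the block names and the `report-NN` folder names are hypothetical, and moving or copying the assets would still have to be done in the CMS itself.

```python
from typing import Dict, List


def split_into_folders(block_names: List[str], per_folder: int) -> Dict[str, List[str]]:
    """Assign every block to exactly one (hypothetical) report folder."""
    if per_folder <= 0:
        raise ValueError("per_folder must be positive")
    ordered = sorted(block_names)  # fixed order so reruns give the same grouping
    folders: Dict[str, List[str]] = {}
    for start in range(0, len(ordered), per_folder):
        folder_name = f"report-{start // per_folder + 1:02d}"
        folders[folder_name] = ordered[start:start + per_folder]
    return folders


# Placeholder names standing in for the roughly 8,000 blocks in the question.
blocks = [f"block-{i:04d}" for i in range(8000)]
plan = split_into_folders(blocks, 1000)
print(len(plan), "folders")
```

    Because each block lands in one slice of the sorted list, no block is indexed twice, which is the "without overlap" property asked about.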

    I'll wait to hear from you regarding my first question and perhaps we can offer some recommendations based on that.

    Thanks

  2. 2 Posted by geoff on 05 Feb, 2016 09:27 PM

    Thanks for your feedback, Tim.

    To answer your first question, no, these particular reports don't need to be visible within the CMS. As for Web Services, we've never written any and have barely cracked the book on how to create them.

    As for RAM, are you saying that even if we doubled our RAM to 16GB, you wouldn't recommend increasing the Max. Rendered Size? (Somewhere on your site there's mention of a general rule of 5MB per 1GB of RAM, so I presumed that would scale up as needed... although I realize somewhere we'd hit the point of diminishing returns.)
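
    For reference, the arithmetic behind that rule of thumb can be sketched as below. This is only the 5MB-per-1GB ratio quoted above applied linearly; the function name is made up for illustration, and it says nothing about whether a value that large is advisable.

```python
MB_PER_GB_RULE = 5  # rule of thumb quoted in the thread: 5 MB per 1 GB of RAM


def rule_of_thumb_max_rendered_mb(ram_gb: int) -> int:
    """Linear scaling only; ignores any point of diminishing returns."""
    return ram_gb * MB_PER_GB_RULE


for ram_gb in (8, 16):
    print(f"{ram_gb} GB RAM -> {rule_of_thumb_max_rendered_mb(ram_gb)} MB")
```

    With 8GB this gives the 40 we already have configured, and doubling to 16GB would give 80.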

    As for aggregating multiple index blocks into one, in this particular case, we don't need to; having eight separate reports would work just as well.

    Thanks again.
    Geoff

  3. Support Staff 3 Posted by Tim on 05 Feb, 2016 09:49 PM

    Hey Geoff,

    Somewhere on your site there's mention of a general rule of 5MB per 1GB of RAM, so I presumed that would scale up as needed... although I realize somewhere we'd hit the point of diminishing returns.

    Yeah, the recommendation you saw is correct, but I feel diminishing returns kick in any time we're talking about rendering a document over ~15MB. Assembling that much data takes a lot of time, and just making it available to view can be problematic (for example, if you've ever tried to open a 15MB XML document in your browser, it generally doesn't work too well).

    As for aggregating multiple index blocks into one, in this particular case, we don't need to; having eight separate reports would work just as well.

    This may be your best bet then. If you can find a way to logically divide these blocks out into different folders, you can then index each folder with its own Index Block.

    If your data happened to be included in Pages as opposed to Blocks, I was considering recommending publishing to a database and then creating your reports by querying that directly. Since Blocks aren't publishable assets, this won't work in this case.

    Let me know what you think about splitting into different folders and indexing each of those individually.

    Thanks

  4. 4 Posted by geoff on 05 Feb, 2016 10:01 PM

    Thanks, Tim.

    As for Pages, that might be possible. We'd need to update the transform (and maybe the template?) to publish the metadata we need. Ideally that metadata wouldn't be accessible to our external users, but maybe we can create a second format just for this purpose and publish to our QA site. Hmm...

    As for dividing the folder into smaller chunks, it's not practical to do right now unless we make copies of the blocks when we need to run the report... which isn't horrible, but it's not ideal.

    Cheers,
    Geoff

  5. 5 Posted by Wing Ming Chan on 06 Feb, 2016 01:56 PM

    Hi Geoff,

    When you work with a large number of assets, your best bet is to use web services. I strongly encourage you to check out my online tutorials, which are still ongoing, for the following reasons:

    1. You can run your program daily, possibly after midnight
    2. You can write any reports to a database or create XML files
    3. Generating reports using my library is relatively easy, and I have been posting programs on github
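
    As a rough illustration of point 2, the output side of such a program might look like the sketch below. It is not taken from the library mentioned above: the table name, element names, and the `path`/`title`/`audience` fields are all hypothetical, and the rows are hard-coded stand-ins for data that would come from web services.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List

Row = Dict[str, str]


def save_to_database(conn: sqlite3.Connection, rows: List[Row]) -> None:
    """Store one row per block; rerunning replaces rows with the same path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS block_report "
        "(path TEXT PRIMARY KEY, title TEXT, audience TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO block_report (path, title, audience) "
        "VALUES (:path, :title, :audience)",
        rows,
    )
    conn.commit()


def to_xml(rows: List[Row]) -> str:
    """Serialize the same rows as a small XML report."""
    root = ET.Element("report")
    for row in rows:
        block = ET.SubElement(root, "block", path=row["path"])
        ET.SubElement(block, "title").text = row["title"]
        ET.SubElement(block, "audience").text = row["audience"]
    return ET.tostring(root, encoding="unicode")


# Hard-coded sample rows; a real run would fill these from the CMS.
rows = [
    {"path": "/blocks/a", "title": "Alpha", "audience": "staff"},
    {"path": "/blocks/b", "title": "Beta", "audience": "students"},
]
conn = sqlite3.connect(":memory:")  # in-memory database for the example
save_to_database(conn, rows)
report_xml = to_xml(rows)
print(report_xml)
```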

    Let me know if you need any pointers.

    Wing

  6. 6 Posted by geoff on 09 Feb, 2016 06:26 PM

    Thanks, Wing. (Bumping web services up another rung on my priorities list.)

  7. Support Staff 7 Posted by Tim on 09 Feb, 2016 06:55 PM

    Hey Geoff,

    As for dividing the folder into smaller chunks, it's not practical to do right now unless we make copies of the blocks when we need to run the report... which isn't horrible, but it's not ideal.

    I had forgotten about this part of it, so I'm going to go back to my initial feeling (and Wing's) that Web Services will be the way to go for this. Let us know how things go if you decide to take this on!

    Wing, thanks for sharing links to your library!

  8. 8 Posted by Wing Ming Chan on 10 Feb, 2016 01:57 PM

    I have just posted a web service program on github/wingmingchan, showing how to use web services to retrieve index block XML and output it as a feed. A feed block can then be used to read the feed and make the XML available to Velocity. This approach can be modified to generate reports and make them available to Cascade. We can also use AJAX to read the feed directly without using a feed block.
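
    For the report-generation step, a minimal sketch of summarizing index block XML once it has been retrieved might look like this. It is not the program posted on github; the element names (`system-index-block`, `system-block`, `author`) and the sample data are assumptions for illustration, so check them against the XML your own index block renders.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for retrieved index block XML (element names are assumed).
SAMPLE_INDEX_XML = b"""<system-index-block>
  <system-block><name>a</name><path>/blocks/a</path><author>alice</author></system-block>
  <system-block><name>b</name><path>/blocks/b</path><author>bob</author></system-block>
  <system-block><name>c</name><path>/blocks/c</path><author>alice</author></system-block>
</system-index-block>"""


def count_by_field(xml_bytes: bytes, field: str) -> Counter:
    """Count blocks per value of one child element, reading incrementally."""
    counts: Counter = Counter()
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "system-block":
            counts[elem.findtext(field, default="(none)")] += 1
            elem.clear()  # drop this block's children once it has been counted
    return counts


counts = count_by_field(SAMPLE_INDEX_XML, "author")
print(counts.most_common())
```

    Reading the XML incrementally and discarding each block after it is counted is one way to keep memory use modest when the rendered index is large.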

  9. Tim closed this discussion on 26 Feb, 2016 09:55 PM.
