Should I block Googlebot from crawling a dynamically generated calendar?

Matt's answer:

Uncle asks a question from Prague in the Czech Republic. Uncle wants to know: "Should I deny access to a dynamically generated vBulletin calendar via robots.txt? Whenever I open my Who's Online page, I can always see more than 20 Google spiders indexing it."

Yes, I probably would disallow Googlebot from the dynamically generated vBulletin calendar. Google actually has systems that try to spot what we call spider traps or infinite crawl spaces. Calendars, if you think about it, can be crawled as much as you want: you can generate calendar pages all the way out to the year 3000 and beyond. That's why we say the web is infinite, because there are so many web servers that can keep generating more links for Googlebot to crawl again, and again, and again.

My guess is that if you block the calendar in your robots.txt, then rather than having 20 simultaneous bots crawling your calendar, Googlebot will find the other pages on your site that are a little more useful, that have a little more content your users will actually want when they find it in Google's search results.

So yes, calendars are a good candidate: if there's really not a lot of rich material there, if there aren't many events, or especially if you find crawlers wandering off into the future, that can be a good opportunity to set up a block in robots.txt so Googlebot won't crawl those pages.
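
For reference, here is what such a rule could look like. This is a minimal sketch that assumes the default vBulletin calendar script lives at /calendar.php (adjust the path to match your installation). A Disallow rule is a prefix match, so it also covers query-string variants like /calendar.php?month=3&year=3000:

    User-agent: Googlebot
    Disallow: /calendar.php

If you want to confirm the rule does what you expect, Python's standard library can parse a robots.txt file and answer fetch questions for a given user agent. A small sketch, assuming a hypothetical forum at example.com:

    # Check which URLs the robots.txt above lets Googlebot fetch.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    parser.set_url("https://example.com/robots.txt")  # hypothetical site
    parser.read()  # downloads and parses the robots.txt

    # Calendar URLs, including date query strings, should be disallowed...
    print(parser.can_fetch("Googlebot", "https://example.com/calendar.php?month=3&year=3000"))  # False
    # ...while ordinary forum pages remain crawlable.
    print(parser.can_fetch("Googlebot", "https://example.com/index.php"))  # True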


by Matt Cutts - Google's Head of Search Quality Team