Can I use robots.txt to optimize Googlebot’s crawl?


Matt's answer:

Matt Cutts: Today’s question comes from Blind Five Year Old in San Francisco, who wants to know, “Can I use robots.txt to optimize Googlebot’s crawl? For example, can I disallow all but one section of a site, for one week, to ensure it is crawled, and then revert to a ‘normal’ robots.txt?”

Oh, Blind Five Year Old, this is another one of those “Noooooo!” kind of videos. I swear I had completely brown hair until you asked this question, and then suddenly grey just popped in [fingers snapping] like that. That’s where the grey came from, really.

So, no, please don’t use robots.txt in an attempt to shunt Googlebot over to one section of a website for just a week. Although we try to fetch robots.txt on a daily basis, or once every few hundred fetches, to make sure we have an accurate copy, weird things can happen if you’re flailing around and changing your robots.txt really fast.

The other thing is that robots.txt is really not the best mechanism to handle this. Suppose you want to make sure a section of, say, ten pages gets crawled well. It’s much better to link to those ten pages from your root page and say, “Hey, our featured category this week is red widgets instead of brown widgets or blue widgets,” and then link to all ten red widget pages. Most of your PageRank typically comes in at the root page of your site, because most people link to the root of your website. If you put links to the pages you care about front and center on that root page, PageRank flows to those pages more than to the rest of the pages on your site, which might be five or six or seven links away from your root page.

So, what I would say is you could try using robots.txt, but I really don’t think it would work.
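To see why the disallow-all-but-one-section trick blocks crawling rather than prioritizing it, here is a small sketch using Python’s standard-library `urllib.robotparser` to interpret a robots.txt like the one the question describes. The `/red-widgets/` and `/blue-widgets/` paths are made up for illustration:

```python
# Hypothetical robots.txt implementing the anti-pattern from the question:
# allow one featured section, block everything else.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /red-widgets/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The featured section remains fetchable...
print(parser.can_fetch("Googlebot", "https://example.com/red-widgets/item1"))
# ...but every other URL is simply blocked, not "queued for later".
print(parser.can_fetch("Googlebot", "https://example.com/blue-widgets/item1"))
```

Everything outside the allowed section is off-limits to compliant crawlers for as long as the file is in place; there is no notion of “crawl this later” in robots.txt.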
You would be much more likely to shoot yourself in the foot by trying to swap out a different robots.txt every week. What’s much better is to work on your site architecture: rearchitect things so that the parts of your site where you would like more PageRank and more crawling are linked more directly, or more closely, from your root page. That will lead Googlebot into that part of your site.

So, please, don’t swap different robots.txt files in and out, saying, “OK, now you’re going to crawl this part of the site this week, and this part of the site next week.” You’re much more likely to confuse Googlebot, and Googlebot might say, “You know what? Maybe I just won’t crawl any of these pages. This seems very strange to me.”

So the way I’d recommend is: change your site architecture, and make your site more crawlable that way.
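The architecture approach Matt describes might look something like this hypothetical homepage fragment (the category and page names are made up):

```
<!-- Homepage: feature the section you want crawled by linking to it directly -->
<section id="featured">
  <h2>This week's featured category: red widgets</h2>
  <ul>
    <li><a href="/red-widgets/classic">Classic red widget</a></li>
    <li><a href="/red-widgets/deluxe">Deluxe red widget</a></li>
    <!-- ...link all ten red-widget pages here... -->
  </ul>
</section>
```

Because the root page is where most inbound links (and hence most PageRank) arrive, direct links from it make the featured pages one click away instead of five or six, which encourages both PageRank flow and crawling.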

by Matt Cutts - Google's Head of Search Quality Team

