How can I remove old content from Google’s index?

How can I remove old content from Google’s index? - answered by Matt Cutts

Matt's answer:

CUTTS: Here is a fun question from Sebastian in Germany. Sebastian asks, “We still have old content in the index. We block them via robots.txt, use 404, and delete them via Webmaster Tools, but Google still keeps it. What can we do to quickly delete content from the index?”

This is a great question. It looks like you’re doing all the right things, so I’d be interested to find out more details. But let me tell you what most people do, because there is often some sort of accident in place.

If you want to remove a single page, you need to make sure that your web server returns a true 404 code for that single page. So, for example, if the page says “file not found” or “page not found,” but the HTTP status code that you return is a 200 and not a 404, then we’ll say, “Oh, okay, this page is still alive because it’s a 200 code,” so we won’t process that URL removal request. Instead, we’ll say, “No, this page is still live.” It needs to be truly gone and truly returning a 404 before we’ll delete it. So that’s deleting a single page.

Now, let’s talk about deleting an entire site. Because we might not be able to check every single page on the site, we require that if you want to remove the entire site, it needs to be blocked in robots.txt.

If you do those things (removing a site: block it in robots.txt; removing a page: make sure that it truly does return a 404 status code), then everything should go smoothly in the URL Removal Tool. If it doesn’t, stop by our Webmaster Help Forum and ask, “Hey, what’s going on?” It’s at google.com/webmasters. You can find the link there. And if you’re returning the right status code and you’ve got it blocked in robots.txt, that’s something we want to know if we’re not removing your content quickly, because if you don’t want your content in Google’s index, then we don’t want to return it. So those are some simple mistakes.
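One way to check the single-page case described above is to ask the server for the status code directly instead of trusting what the page says. The following is a minimal sketch using only the Python standard library; the function names `fetch_status` and `removal_verdict` are made up for illustration and are not part of any Google tool. It assumes the server answers HEAD requests the same way it answers GET, which is not guaranteed.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url`.

    Redirects are followed, so this is the status of the final response.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the error.
        return error.code


def removal_verdict(status: int) -> str:
    """Classify a status code along the lines of the answer above (hypothetical helper)."""
    if status == 404:
        return "gone"        # a true 404: the page counts as removed
    if status == 200:
        return "still live"  # a "not found" page served with 200 still counts as live
    return "other"           # anything else is not covered by the answer above
```

A call such as `removal_verdict(fetch_status("https://example.com/old-page.html"))` (the URL is a placeholder) should come back as "gone" before a single-page removal request is filed.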
Most people don’t return a true 404 code, and most people don’t know that if you want to remove an entire site (we say it somewhere in the documentation, I’m sure), it has to be blocked in robots.txt, so that we’re not just checking individual pages. Those two handle 90% of the cases where people say, “I tried to remove things and it didn’t really disappear.” So check out those two factors.
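For the whole-site case, the robots.txt file has to disallow everything. The sketch below shows such a file and checks it with Python's standard `urllib.robotparser`. The helper `site_is_blocked`, the example.com URLs, and the choice of "Googlebot" as the user agent are assumptions for illustration; this only approximates how a crawler reads the file and is not Google's own check.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks every crawler from the entire site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def site_is_blocked(robots_txt: str, user_agent: str, sample_urls: Iterable[str]) -> bool:
    """Return True if none of the sample URLs may be fetched by `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(user_agent, url) for url in sample_urls)
```

Passing a handful of URLs from different parts of the site helps catch a file that blocks only one directory, such as `Disallow: /private/`, which would not satisfy the whole-site requirement.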


by Matt Cutts - Google's Head of Search Quality Team

 
