State of the Index 2009

State of the Index 2009 - answered by Matt Cutts

Matt's answer:

CUTTS: Okay. I made a promise that whenever I give a presentation at a conference, I would try to recreate it. So today I’m recreating the State of the Index talk that I gave at PubCon in November 2009. The way I decided to structure this talk was to ask, “Okay, what has Google done lately for users, web developers, and webmasters? Have we communicated? What services have we put out there?” The point was to show that in the last year Google has done a lot of stuff which hopefully helps users and webmasters or web developers. So, starting out, what has Google done for users lately? Well, for an average user, I think Chrome, Android, and Wave are all really interesting. They all have very large open-source components. So if you think that Chrome talks back to Google (it really doesn’t, and you don’t need to worry about it), you could pull down the open-source version, Chromium, compile it, and surf away without having to worry, because you can see everything in the open-source code, which is really, really nice. But I wanted to tilt even the “what have we done for users” part toward the webmaster’s side of things. This talk was given in Las Vegas, so I talked about the Music OneBox, which we had just launched. You can search for an artist and a song name, for example, Sheryl Crow, Leaving Las Vegas. Not only can you play her song and hear the music right there, but you can also buy it, so for 99 cents or some very reasonable price, you can buy the MP3, which is pretty handy. One feature the Music OneBox has that, as far as I know, nobody else has is that you don’t have to search for the artist name and song, or even just the song; you can also type in lyrics. For example, “Lady Luck please let the dice stay hot” brings back “Viva Las Vegas” by Elvis Presley.
So if you’ve heard a song on the radio and you don’t know who’s singing it, you can still probably type in a few words and get back to that song, so you can listen to it or buy it. Going along with the theme of what Google has done for users: realistically, you’ve got the tunes on, so what are you interested in now? Well, as a user, maybe you want to do some keyword research. Over the summer we introduced a product called Google Squared, which is really pretty fun. A lot of people have played with it, but not everybody has realized how deep it is. You can type almost anything into it. I showed an example where you type in Las Vegas shows and get Bette Midler, Criss Angel, Rita Rudner, stuff that seems like it should be relatively common sense. You know, “Oh, okay, what’s so great about that?” But you can type anything into Google Squared, and you’ll usually get some pretty interesting or reasonable results. So if you’re coming up blank and trying to brainstorm, it can help. I showed an example of social networking sites, and I took all the standard stuff like Facebook and MySpace out of it. What was left was myYearbook, Skyrock, Netlog, MEETin, Essembly, all these almost niche sites that appeal to young people or to speakers of specific languages. I asked the audience, “Okay, how many of you are members of any of these social networks?” and almost no one raised their hand, of course. So I said, “Look, instead of trying to chase the market on Facebook, you could establish a valid presence on some of these social networks and participate in that community. It might be a little bit easier to get attention over there than in some of the louder, more crowded, noisier places where everybody is.” So social networking sites are one area where you could use it.
But in Google Squared you can type in almost anything and get a lot of good suggestions for brainstorming or keyword research. Speaking of social networking sites, the next thing I mentioned, which a lot of users have enjoyed and which we launched in the last year, is Google Social Search. I showed an example where I searched for PubCon and got really good posts, not just posts from two days ago but posts that had stood the test of time. For example, Michael Gray was talking about how PubCon is like Star Wars, which is a really fun, entertaining post. And that’s not just any stuff from the web; it’s public stuff from the web that involves people in your social circle. If you click on, for example, results from people in your social circle, you get to a sort of tool belt on the left-hand side where you can slice and dice your search results, and one of the options shown is Social. So you can do a search, open that tool belt, and go to Social. The example I showed was meta tags: I got all the people who had written public blog posts or talked on Twitter about meta tags, and it was really pretty interesting. One feature of Social Search that not everybody knows about shows up when you open this tool belt by clicking Show Options at the top, above the search results. Suppose you do a search like meta tags; you’ll see the people who are most relevant because they have written about meta tags in the past. And the people shown change depending on what you search for. If I search for podcasts, I’d probably get Leo Laporte. But if I search for meta tags, I get Jennifer Slegg or Eric Goldman or Danny Sullivan. So it’s pretty neat to see how different searches bring up different people who are experts in your social circle.
So I wanted to push it a little bit, because whenever I talked to the Social Search people before I left, they said, “Hey, we have capacity. Tell them to sign up for it.” The simple way to sign up for Google Social Search is this. First, it helps if you have a Google profile, because then we know from your account which services, like Twitter or FriendFeed, you use. Add links to your Google profile that point to those services; it could be your blog, or it could be Twitter. Then you have to opt in: you go to Google.com/experimental and say, “Yes, I’d like to be in Social Search.” After that, whenever you search, you just have to be signed in so we know it’s you, and when we think it’s relevant, we’ll surface the public things that people in your social circle have said. It’s very fun. I was really impressed with the quality of the stuff it surfaced; it sort of surprised me that it was such a nice blend of social and relevant. So, continuing on with what Google has done for users, I talked about Show Options, which internally at Google we call the Google Tool Belt, because it’s a nice little tool belt of different ways to slice and dice your search results. If you click on Show Options above your search results, there are all these great ways to say, “Okay, I searched for PubCon, but show me mentions of PubCon within the last 24 hours.” In fact, you can say, “Sort by date,” which is really handy. If you want to find out who has mentioned Google, it will show you the most recent blog posts or the most recent web pages that we’ve found. So it can be a good way to monitor your reputation. We also have the ability to search by date range, so you can say, “Okay, show me all the mentions of, you know, Barack Obama, but only from 2002 to 2006.” That way you don’t get stuff from the presidential election; you get stuff from when he was a senator.
So that’s a really helpful way to do power searches, and a lot of people appreciate that. One last fun thing in the tool belt on the side of the search results is what we call the Wonder Wheel. That’s another way that, as a webmaster or a publisher, you can do a lot of keyword research and brainstorming. Say you type in PubCon: you can always use the Google keyword tool, and there are a bunch of other ways to do keyword research. But the Wonder Wheel is in Flash, so you can type in PubCon, and some of the suggestions include PubCon 2009; Tony Hsieh of Zappos, who gave the keynote at the conference; and the Las Vegas Convention Center. But there are also related conferences like Search Engine Strategies, ADTECH, and Affiliate Summit. So it lets you brainstorm in ways you might not normally think of. If you click on one of those entries, PubCon moves out of the way a little bit, the new entry, whether it’s ADTECH or PubCon 2009, takes the center, and you see different concepts related to that particular keyword phrase. So you can click around, explore the space a little bit, and quickly find different ways to brainstorm and get good keyword research done. With that, we were done with what Google has done for users. I wanted to spend a little bit of time talking about web developers, not just webmasters, because we’ve done some really nice things there. At code.google.com/speed we’ve released a bunch of different tools so that people can figure out how to make their sites faster. One of them is a Firefox extension called Page Speed. Basically, whenever you load a page, it tries to show you all the different ways you could make that page load a little bit faster: leveraging browser caching, minifying JavaScript, or taking a bunch of different CSS files and combining them into one CSS file so that you have fewer HTTP requests.
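Page Speed’s suggestion to combine stylesheets can be sketched as a small build step in Python. This is a minimal illustration, not part of Page Speed itself; the function name `combine_css` and the file names are hypothetical. It assumes all stylesheets live in the same directory (relative `url()` references are not rewritten) and that only the first file uses `@charset` or `@import`, since those rules are only valid at the top of a stylesheet.

```python
from pathlib import Path


def combine_css(paths: list[Path], output: Path) -> int:
    """Concatenate stylesheets, in order, into one file served by a single request.

    Order is preserved because later rules win ties in the cascade, so `paths`
    should follow the order of the original <link> tags.
    Returns the number of files combined.
    """
    chunks = []
    for path in paths:
        # Mark where each source file begins, to make debugging easier.
        chunks.append(f"/* --- {path.name} --- */\n")
        chunks.append(path.read_text(encoding="utf-8").rstrip() + "\n")
    output.write_text("".join(chunks), encoding="utf-8")
    return len(paths)
```

For example, `combine_css([Path("reset.css"), Path("layout.css")], Path("site.css"))` would let a page replace two `<link>` tags with one. Minifying the result and setting long cache lifetimes would be separate steps.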
At the time, I told the people at the conference that while we currently don’t use page speed as a factor in our search results and how we rank them, there are people at Google who definitely want to. A lot of people within Google have been thinking about ways that page speed could be one of the factors: not the biggest factor, not the only factor, but one of the factors, because a fast site really improves the user experience. So I tried to let people know that improving the speed of your site is good for users, and there’s at least a chance that in 2010 it would be good for your rankings as well, or help at least a little bit, so it’s worth paying attention to. The next slide was about webpagetest.org. Google has similar tools, and this particular site is not associated with Google, but I wanted to throw it out there because it’s really neat. You can see a sort of waterfall chart of how long it takes to load your site and which things are loading. You can’t manage something until you can measure it, and being able to see how long it takes to load all the different stuff on a site can be a real eye-opener. I think Barry Schwartz looked at rustybrick.com afterwards and found a way to cut about two-thirds off the time it takes to load the site completely, just by making some very simple changes. So you’d be amazed at how much of a difference it can really make. Okay, so what else has Google done for web developers? We just recently released a fantastic set of tools called Closure, which you can find at code.google.com/closure. It’s a bunch of different things: a compiler, a library, and a templating system, so I’ll just focus on a couple of those. It’s a compiler in the sense that it takes JavaScript and tries to squeeze it down into something very, very compact.
I’ll talk about that on the next slide, but it will squeeze your code down to be very, very small. The library is incredibly interesting. There are over 180 UI elements alone. On this slide I’m showing goog.ui.DatePicker, which is the same date picker that Google uses, whether in Google Calendar or in a Gmail Labs feature. And you can use that code totally for free. This library has all kinds of user interface components, but also a bunch of utilities for math, time, and all sorts of stuff like that, so it saves you the work of writing that JavaScript code yourself, and it’s very nicely internationalized. It works really well; it’s the same stuff that we use. So we’re trying to make the web better by making it easier to develop for the web. To talk about the compiler part of Closure a little bit, take Google Reader’s JavaScript. I think it was Louis Gray who interviewed Mihai Parparita from the Google Reader team. Mihai said that Google Reader’s JavaScript would have been 2 megabytes uncompressed, and Closure got it down to 513 kilobytes, so about 25 percent of the original size. Then, with gzip, they were able to get it down to 184 kilobytes. Going from 2 megabytes to 184 kilobytes is well worth the few minutes it takes to run the Closure compiler and figure out how to turn on gzip, because it can make the difference between things loading in two or three seconds versus 20 or 40 seconds. So it’s a pretty good idea to pay some attention to. I didn’t want to dwell on the web developer stuff for too long, because I didn’t want to bore people, but I did want to include a couple of shout-outs to other Google tools that make things easier. One is the Google Web Toolkit; we use it ourselves for a bunch of different stuff. I believe Google Wave uses it, and I believe the latest version of AdWords uses Google Web Toolkit too.
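The Google Reader figures quoted above can be checked with a little arithmetic, and Python’s standard `gzip` module shows why compressing text-heavy JavaScript helps so much. This is a minimal sketch: the sizes are the ones quoted in the talk, reading “2 megabytes” as 2 × 1024 KB is an assumption, and the helper names are hypothetical.

```python
import gzip

# Sizes quoted in the talk for Google Reader's JavaScript, in kilobytes.
# Reading "2 megabytes" as 2 * 1024 KB is an assumption.
ORIGINAL_KB = 2 * 1024
CLOSURE_KB = 513
CLOSURE_GZIP_KB = 184


def percent_of(size: float, baseline: float) -> float:
    """Return `size` as a percentage of `baseline`."""
    return 100.0 * size / baseline


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    print(f"Closure output: {percent_of(CLOSURE_KB, ORIGINAL_KB):.1f}% of original")
    print(f"Closure + gzip: {percent_of(CLOSURE_GZIP_KB, ORIGINAL_KB):.1f}% of original")
    # Repetitive source text, typical of JavaScript, compresses well.
    sample = b"function add(a, b) { return a + b; }\n" * 200
    print(f"gzip ratio on a repetitive sample: {gzip_ratio(sample):.3f}")
```

The first two lines print 25.0% and 9.0%, consistent with the “25 percent” figure in the talk and showing that gzip removed most of what the compiler left behind.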
The Google Web Toolkit just makes it a lot easier to write your code. It’s almost like you can write it in Java and compile it down to JavaScript, and you get a lot of reusable components, so you don’t have to worry about cross-browser issues and all that sort of thing. So a lot of people enjoy GWT, the Google Web Toolkit. A final thing for web developers is the AJAX Libraries API. What is that? Google found a bunch of really useful AJAX libraries, you know, Scriptaculous and a whole bunch of different ones, as I recall, and we said to those projects, “You know what? We will host this for free on Google. We’ll pay the bandwidth bills and all that sort of stuff.” We’ll also make sure that if you include a library from there, you always get the most recent version. So now you don’t have to worry about a security hole in a JavaScript library or a third-party library that you’re using: as long as you’re using the version hosted on Google, you’ll always get the most recent version. So it’s a very handy thing. It just means there’s infrastructure you don’t have to worry about; you can let somebody else deal with it. So there’s been a lot of different stuff in the last few months that has been pretty helpful for web developers. Then we got to what was my favorite part, which was: what has Google done for webmasters? There’s a bunch of different stuff, starting in February 2009 with rel=canonical. This is something that the major search engines support. If you have two pages that are basically the same page, you can say, “You know what?
This is my preferred page, so I’m going to put a rel=canonical on the duplicate page pointing to the preferred one,” and Google can sort of glom those together and say, “Oh, the links to this page should be combined with the links to that page.” Now, if you can handle it with your site architecture so that you don’t have to worry about the incoming links and duplicate content at all, that’s best. If you can do a 301 redirect, which passes PageRank, and say, “You know what? I have duplicate URLs, but I can 301 redirect them to one single location,” that’s almost as good. But if you can’t do either of those because of your CMS or for whatever reason, rel=canonical is a pretty good way to say, “You know what? These two pages are actually the same page.” I’ve been surprised at how much uptake and traction it has gotten in just a few months. I also took people on a little bit of a tour of what’s new in the webmaster console, because not everybody goes back all the time to see what’s new. I gave a shout-out to Yahoo, because they have this great feature that lets you say, “Here are URL parameters that I don’t find useful,” session IDs, for example. Now you can also specify, in Google’s webmaster console, the URL parameters that you think should be ignored. That’s very handy, and a lot of people really enjoy it. It took us a while to deliver, but I’m glad we did, because now you can say, “You know what? Here’s this parameter. I can’t get rid of it because of my CMS. Google, please just ignore it,” and Google will do that for you. My personal favorite new feature in the webmaster console is Fetch as Googlebot. You prove that you own a site, and then you can tell Google to go and fetch a page on that site and show you exactly what Google saw.
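Both rel=canonical and ignored URL parameters come down to mapping many URLs onto one preferred URL. The Python sketch below normalizes a URL by dropping parameters that don’t change the page content, then builds the `<link rel="canonical">` element for the duplicate page’s `<head>`. The names in `IGNORED_PARAMS` are hypothetical examples; which parameters are safe to drop depends on your CMS. The same normalized URL could also serve as the target of a 301 redirect, which the talk describes as the better option when it’s possible.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters a CMS might append without changing the page content.
IGNORED_PARAMS = frozenset({"sessionid", "sid"})


def canonical_url(url: str, ignored: frozenset[str] = IGNORED_PARAMS) -> str:
    """Return `url` without ignored query parameters or a #fragment.

    Remaining parameters keep their original order, so URLs that differ
    only in ignored parameters map to the same string.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a duplicate page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```

For instance, `canonical_link_tag("http://www.example.com/shoes?color=red&sessionid=ab12")` returns `<link rel="canonical" href="http://www.example.com/shoes?color=red">`. Note that `urlencode` re-encodes the kept values, so the output may normalize percent-encoding as a side effect.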
My favorite reason to use Fetch as Googlebot is if your site has been hacked. It turns out people are so evil these days that they will only show the hacked content when Google comes crawling. If someone pretends to be Googlebot but isn’t coming from the right IP address, they don’t show the hacked content; they only show it to Googlebot when it’s coming from the right IP address. And if they’re really sneaky, they’ll put a noarchive meta tag on the page, so you can’t see the cached copy and there’s no way to see what Googlebot saw when it crawled your page. But as long as you own the site and have registered it in the webmaster tools area, you can fetch the page as Googlebot and say, “Uh-oh, here’s my hacked content: buy cheap [INDISTINCT].” And you can iterate, fetching it a bunch of different times a day: “Okay, I tried to clean it up. Is it gone? No. Crap! Okay, I’ll try to clean it up again. Oh, got it. Okay, good. Now my site is clean.” So it’s a very handy tool; in my mind, it’s primarily best for detecting malware. We’ve even seen people at Google get their blogs hacked, so being able to fetch a page as Googlebot can be really, really handy even for people at Google. What else? Better malware warnings. Not just telling you that you have malware, but trying to show you more information: what the URL is, or what specific content on the page was serving the malware or making us flag the site as having malware. The more information we can provide, the faster you can diagnose and debug the problem, clean it up, and get back into Google, or, hopefully, not have to worry about having infected your users. So it’s a very simple thing, but very, very handy.
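The IP-based cloaking described above works because a server can tell the real Googlebot apart from someone who merely sends a Googlebot user-agent string. Google’s webmaster documentation describes the check: do a reverse DNS lookup on the requesting IP, confirm the hostname is in googlebot.com or google.com, then do a forward lookup on that hostname and confirm it resolves back to the same IP. The Python sketch below shows that check under stated assumptions: it handles IPv4 only (`socket.gethostbyname_ex` returns IPv4 addresses), the helper names are hypothetical, and the lookup functions are injectable so it can be tested offline. It also shows why spoofing the user agent from your own machine won’t reveal content cloaked this way.

```python
import socket
from typing import Callable

# Domains used for Google crawler hostnames, per Google's verification guidance.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")


def _reverse_dns(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(hostname: str) -> list[str]:
    # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
    return socket.gethostbyname_ex(hostname)[2]


def is_google_hostname(hostname: str) -> bool:
    host = hostname.rstrip(".").lower()
    return any(host.endswith("." + domain) for domain in GOOGLE_CRAWLER_DOMAINS)


def is_real_googlebot(
    ip: str,
    reverse_dns: Callable[[str], str] = _reverse_dns,
    forward_dns: Callable[[str], list[str]] = _forward_dns,
) -> bool:
    """Verify a crawler IP with a reverse lookup followed by a forward lookup."""
    try:
        hostname = reverse_dns(ip)
        if not is_google_hostname(hostname):
            return False
        # The forward lookup must map back to the same IP; a PTR record alone can be faked.
        return ip in forward_dns(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

A site doing this on live traffic would typically cache the result per IP, since two DNS lookups per request are slow. For a site owner, Fetch as Googlebot sidesteps the whole problem, because the request really does come from Google.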
Another feature that’s kind of interesting is message forwarding. If you had a message in the webmaster console, it used to be that you had to go and log in every day to see whether you had a new message. And not everybody lives and breathes the webmaster console and wants to show up every morning thinking, “Oh, I wonder if I have any new messages. I can’t wait to find out.” So we added the ability to say, “You know what? If I get a message in the webmaster console, forward it to me by email.” I was kind of surprised: this one got spontaneous applause. So apparently there are a few people who are really glad they can forward their messages to their email address. A relatively straightforward thing, but very, very helpful. Another straightforward but helpful thing is keyword details. We’re starting to show more information; suppose you look at my blog, for example. I might rank for a keyword like SEO. I could click on that and see some of the top pages that contain the keyword SEO. So it can be very handy to drill down into more detail on exactly which pages contain keywords like SEO. To close out the PubCon presentation, one of the big things we talked about was communication. We did over 80 blog posts in the last year. We posted a 20-plus-page SEO Beginner’s Guide as a PDF. That’s really handy, because a lot of people think Google hates SEO, and nothing could be further from the truth. SEO, when done well and in a white-hat way, can make your site more crawlable and more accessible, and can help users find useful content on your site. So a lot of people like to think, “Oh, Google hates SEO.
Google thinks all SEO is evil.” I was really glad that we published this SEO Beginner’s Guide because, you know, if we thought SEO was evil, we wouldn’t tell people, “Hey, here’s an introduction to what it is, how to do it well, and how to do it in a white-hat way.” I think there’s nothing we would like more than to have webmasters and Google cooperating to return good information to users. That’s in everybody’s interest. So it was a really big step to put that beginner’s guide out there. Something that eased a lot of people’s minds was simply doing a blog post and a video saying, “We don’t use the keywords meta tag.” You see these lawsuits going on where somebody says, “He put my business’s name in his keywords meta tag; therefore, I’m going to sue him.” It was kind of nice to do this blog post, because we’ve already seen a bit of an effect from it, even though it’s been a well-known fact for a long time. You can test it by putting a weird, unique keyword in the keywords meta tag, searching for it later, and seeing that you don’t find it. So any reasonable person running the experiment would conclude that Google doesn’t use the keywords meta tag. But just coming out and confirming it, so that people don’t have to worry about it or waste their time on it, is really nice. So I was glad we were able to do that. Something people in the United States might not appreciate, but people around the world do, is that we now have the webmaster console in 40 different languages. That was toward the beginning of the year, but it’s still very, very important. And then one thing we’ve been doing, which you are very familiar with, is webmaster videos. I mentioned that we’ve done over 165 videos to date, with over 1 million page views. I think it’s something like 1.2 or 1.3 million page views at this point. And we have a webmaster channel on YouTube that hundreds of people subscribe to.
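The keywords meta tag experiment described above is easy to set up. This Python sketch generates a random nonsense token that is very unlikely to appear anywhere else on the web and wraps it in a keywords meta tag; the function names are hypothetical. You would put the tag on a page that is already indexed (keeping the token out of the visible text), wait for the page to be recrawled, and then search for the token.

```python
import secrets
import string
from html import escape


def unique_test_keyword(length: int = 12) -> str:
    """Return a random lowercase token that should match no existing page."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def keywords_meta_tag(*keywords: str) -> str:
    """Build a keywords meta tag with HTML-escaped, comma-separated keywords."""
    content = ", ".join(escape(k, quote=True) for k in keywords)
    return f'<meta name="keywords" content="{content}">'


if __name__ == "__main__":
    token = unique_test_keyword()
    print(token)
    print(keywords_meta_tag(token))
```

If the page keeps showing up for its visible text but never for the token, that is consistent with the point made in the talk.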
Subscribers are sometimes watching a video before I even tweet about it. It’s at YouTube.com/googlewebmasterhelp, and it has all kinds of really interesting stuff. Sometimes it’s keynote presentations, you know, recreated talks; sometimes it’s one- or two-minute videos. But it’s free, it’s often very useful information, and I’m really glad we tried that experiment to communicate more. Caffeine was kind of a hot topic right around the time PubCon was going on, so we included a slide or two about it. Just to remind everybody, Caffeine is a rewrite of our indexing infrastructure. It takes the old way we used to index the things we crawled around the web and replaces it with a fresh, new architecture written to be more scalable and more flexible: the ability to attach different types of data during indexing, the ability to index more documents or a more comprehensive version of the web, and the ability to do it faster. All of that is really, really useful. People were a little worried, so we reassured them that we got great feedback from the beta, but that we were going to open up Caffeine at only one data center, and it would stay at that one data center over the holidays. You wouldn’t see Caffeine at any other data center until after the holidays, at least January. That put everybody’s mind at ease: they don’t need to worry about Caffeine. We are mindful of the fact that when the holidays are coming, webmasters get a little jittery. They get a little anxious. They don’t want rankings to change, and they don’t want major changes to happen. To the extent that we can, we try not to make any major changes then. Now, Q4 is one-fourth of the year, and you can’t just freeze the search engine and not make any changes at all for a quarter of the year, or you’d lose a lot of productivity.
But we try to figure out whether there’s something big coming, and if so, whether we can do it earlier in the year or after the holidays, so that we avoid causing any major problems. I think people appreciate that. Looking forward, what do I see coming down the pipe? I think hacking and malware will continue. We see a lot more people checking the webmaster documentation pages about hacking, so I think hacking will keep growing. But we’re going to keep working on making our relevance better and finding ways to detect hacked sites and spam, as we always do. We’re going to keep looking for better, more scalable ways to communicate. We’ve tried everything from webmaster chats to forums to tweeting to videos, and we just keep trying to find ways, whether blogs or conferences, to answer questions, and we’re going to keep doing that. I closed the presentation with a few takeaways. I said that if you remember nothing else from this talk, there are four things I’d like you to remember. Number one, try Social Search. It’s surprisingly useful; it’s really good at surfacing relevant public content from your friends. Number two, try to speed up your site. There are a bunch of tools, and you’d be amazed at how quickly you can speed up your site without doing a ton of new things, just by tweaking a few small things, and users really appreciate that. Number three, if you haven’t looked at the webmaster console in a while, dig into it, because there’s a lot of good content there. And finally, go ahead and subscribe to the official blog, the Google Webmaster Central Blog, and to the video channel on YouTube. I’m kind of proud that I feel a little superfluous; people don’t really need me as much anymore. I’ve noticed that I don’t post as much Google stuff on my personal blog, because there’s so much more going up on the official blog.
And I’m urging people to think about switching their mental model from “let’s see what Matt has to say today” to “let’s see what the official blog has to say today,” because that’s always going to be completely comprehensive. They go into a lot of detail, and if you look at the schedule, they’re posting new information all the time. So I would definitely make sure that you subscribe to that blog and check it; there’s a lot of great information there. Okay, that’s basically how the talk went. Everything else was panel and questions. I hope you enjoyed the recreation of the presentation.

by Matt Cutts - Google's Head of Search Quality Team

