Saturday, December 14th, 2019


Posted by Dr. Pete

Secrets of the 7-result SERP (pulp sci-fi cover)

In August of 2012, Google launched 7-result SERPs, transforming page-one results. MozCast data initially showed that as many as 18% of the queries we tracked were affected. We’ve been collecting data on the phenomenon ever since, and putting some of the most common theories to the test. This is the story of the 7-result SERP as we understand it today (image created with PULP-O-MIZER).

I. 7-Result SERPs in The Wild

By now, you’ve probably seen a few 7-result SERPs in the “wild”, but I think it’s still useful to start at the beginning. Here are a few examples (with screenshots) of the various forms the 7-result SERP takes these days. I apologize in advance for the large images, but I think it's sometimes important to see the full-length SERP.

(1) The “Classic” 7-Result SERP

The classic 7-result SERP usually appears as a #1 listing with expanded site-links (more on that later), plus six more organic listings. Here’s a screenshot from a search for “some ecards”, a navigational query:

Classic 7-result SERP

(2) The 7 + 7 with Local Results

It’s also possible to see 7-result SERPs blended with other types of results, including local “pack” results. Here’s the result of a search with local intent – “williamsburg prime outlets”:

7-result SERP with 7 local

(3) The 6 + Image Mega-Pack

It’s not just organic results that can appear in the #1 spot of a 7-result SERP, though. There’s a rare exception when a “mega-pack” of images appears at the top of a SERP. Here’s a “7-result” SERP with one image pack and six organic listings – the search is “pictures of cats”:

7-result SERP with image mega-pack

II. Some 7-Result SERP Stats

Our original data set showed 7-result page-one SERPs across about 18% of the queries we tracked. That number has varied over time, dropping as low as 13%. Recently, we’ve been experimenting with a larger data set (10,000 keywords). Over the 10 days from 1/13-1/22 (the data for this post was collected around 1/23), that data set tracked 7-result SERPs in the range of 18.1% – 18.5%. While this isn’t necessarily representative of the entire internet, it does show that 7-result SERPs continue to be a significant presence on Google.

These percentages are calculated by unique queries. We can also look at query volume. Using Google’s “global” volume (exact-match), the percentage of queries by volume with 7-result SERPs for 1/22 was 19.5%, compared to 18.5% by unique queries. Factoring in volume, that’s almost a fifth of all queries we track.
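The difference between the two percentages comes down to weighting. Here’s a minimal sketch of both calculations; the queries and volumes are hypothetical stand-ins, with “volume” playing the role of AdWords exact-match volume:

```python
# Share of queries with 7-result SERPs, two ways: by unique queries
# (each query counts once) and weighted by search volume.
queries = [
    {"query": "pga national",     "volume": 12000, "result_count": 7},
    {"query": "cell signaling",   "volume": 4000,  "result_count": 7},
    {"query": "pictures of cats", "volume": 30000, "result_count": 10},
    {"query": "irs transcript",   "volume": 9000,  "result_count": 10},
]

sevens = [q for q in queries if q["result_count"] == 7]
pct_unique = len(sevens) / len(queries)
pct_volume = sum(q["volume"] for q in sevens) / sum(q["volume"] for q in queries)

print(f"by unique queries: {pct_unique:.1%}")
print(f"by query volume:   {pct_volume:.1%}")
```

If high-volume queries are more likely to get 7-result SERPs (as the 1/22 numbers suggest), the volume-weighted figure will come out higher than the unique-query figure.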

Here are the 7-result SERP percentages across 20 industry categories (500 queries per category) for 1/22:

 CATEGORY  7-SERPS
 Apparel  23.6% 
 Arts & Entertainment  16.8% 
 Beauty & Personal Care  12.6% 
 Computers & Consumer Electronics  16.8% 
 Dining & Nightlife  27.2% 
 Family & Community  13.2% 
 Finance  19.2% 
 Food & Groceries  13.4% 
 Health  3.8% 
 Hobbies & Leisure  11.0% 
 Home & Garden  20.0% 
 Internet & Telecom  12.6% 
 Jobs & Education  21.4% 
 Law & Government  16.2% 
 Occasions & Gifts  7.8% 
 Real Estate  13.2% 
 Retailers & General Merchandise  29.6% 
 Sports & Fitness  28.6% 
 Travel & Tourism  36.2% 
 Vehicles  26.0% 

These categories were all borrowed from the Google AdWords keyword research tool. The most impacted vertical is “Travel & Tourism”, at 36.2%, with “Health” being the least impacted.  At only 500 queries/category, it’s easy to over-interpret this data, but I think it’s interesting to see how much the impact varies.

III. The Site-Link Connection

Many people have hypothesized a link between expanded site-links and 7-result SERPs. We’ve seen a lot of anecdotal evidence, but I thought I’d put it to the test on a large scale, so we collected site-link data (presence and count) for the 10,000 keywords in this study.

Of the 1,846 queries (18.5%) in our data set that had 7-result SERPs on the morning of 1/22, 100% of them had expanded site-links for the #1 position. There were 45 queries that had expanded site-links, but did not show a 7-result count, but those were all anomalies based on how we count local results (we include blended local and packs in the MozCast count, whereas Google may not). There is nearly a perfect, positive correlation between 7-result SERPs and expanded site-links. Whatever engine is driving one also very likely drives the other.
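To make “nearly a perfect, positive correlation” between two yes/no observations concrete, one standard measure is the phi coefficient over the 2×2 contingency table. This is a sketch with a handful of hypothetical flags, not the actual MozCast data; the key property it mirrors is that no query has a 7-result SERP without site-links:

```python
# Phi coefficient between two binary flags per tracked query:
# "has a 7-result SERP" and "has expanded site-links at #1".
from math import sqrt

flags = [  # (seven_result, sitelinks) — hypothetical observations
    (True, True), (True, True), (True, True),
    (False, True),               # a site-links "anomaly" with 10 results
    (False, False), (False, False),
]

a = sum(1 for s, l in flags if s and l)          # both
b = sum(1 for s, l in flags if s and not l)      # 7-result only (0 here)
c = sum(1 for s, l in flags if not s and l)      # site-links only
d = sum(1 for s, l in flags if not s and not l)  # neither

phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 2))  # high positive association
```

With b = 0 (every 7-result SERP has site-links) and only a few anomalies in c, phi approaches 1.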

The only minor exception is the image blocks mentioned above. In those cases, the image “mega-pack” seems to be the equivalent of expanded site-links. Internally, we count those as 6-result SERPs, but I believe Google sees them as a 7-result variant.

While most (roughly 80%) of 7-result SERPs have six expanded site-links, there doesn’t seem to be any rule about that. We’re tracking 7-result SERPs with anywhere from one to six expanded site-links. It doesn’t take a full set of site-links to trigger a 7-result SERP. In some cases, the domain simply seems to have a limited number of query-relevant pages.

IV. 7-Result Query Stability

Originally, I assumed that once a query was deemed “worthy” of site-links and a 7-result SERP, that query would continue to have 7 results until Google made a major change to the algorithm. The data suggests that this is far from true – many queries have flipped back and forth from 7 to 10 results and vice versa since the 7-result SERP roll-out.

While our MozCast Top-View Metrics track major changes to the average result count, the real story is a bit more complicated. On any given day, a fairly large number of keywords flip from 7s to 10s and 10s to 7s. From 1/21 to 1/22, for example, 61 (0.61%) went from 10 to 7 results and 56 (0.56%) went from 7 to 10 results. A total of 117 “flips” happened in a 24-hour period – that’s just over 1% of queries, and that seems to be typical.
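The daily flip tally is just a comparison of two snapshots. A minimal sketch, with hypothetical queries and counts standing in for the tracked keyword set:

```python
# Count 10→7 and 7→10 "flips" between two daily snapshots of
# page-one result counts, keyed by query.
yesterday = {"pga national": 10, "reef": 7, "irs transcript": 10, "tracking santa": 7}
today     = {"pga national": 7,  "reef": 7, "irs transcript": 10, "tracking santa": 10}

ten_to_seven = sum(1 for q in yesterday
                   if yesterday[q] == 10 and today.get(q) == 7)
seven_to_ten = sum(1 for q in yesterday
                   if yesterday[q] == 7 and today.get(q) == 10)

print(ten_to_seven, seven_to_ten)  # one flip in each direction here
```

Run daily, the two tallies give exactly the 0.61%/0.56% style of numbers quoted above once divided by the number of tracked queries.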

Some keywords have flipped many times – for example, the query “pga national” has flipped from 7-to-10 and back 27 times (measured once/day) since the original roll-out of 7-result SERPs. This appears to be entirely algorithmic – some threshold (whether it’s authority, relevance, brand signals, etc.) determines if a #1 result deserves site-links, probably in real-time, and when that switch flips, you get a 7-result SERP.

V. The Diversity Connection

I also originally assumed that a 7-result SERP was just a 10-result SERP with site-links added and results #8-#10 removed. Over time, I developed a strong suspicion this was not the case, but tracking down solid evidence has been tricky. The simple problem is that, once we track a 7-result SERP, we can’t see what the SERP would’ve looked like with 10 results.

This is where query stability comes in – while it’s not a perfect solution (results naturally change over time), we can look at queries that flip and see how the 7-result SERP on one day compares to the 10-result SERP on the next. Let’s look at our flipper example, “pga national” – here are the sub-domains for a 7-result SERP recorded on 1/19:

  1. www.pgaresort.com
  2. www.pganational.com
  3. en.wikipedia.org
  4. www.jeffrealty.com
  5. www.tripadvisor.com
  6. www.pga.com
  7. www.pgamembersclub.com

The previous day (1/18), that same query recorded a 10-result SERP. Here are the sub-domains for those 10 results:

  1. www.pgaresort.com
  2. www.pgaresort.com
  3. www.pgaresort.com
  4. www.pgaresort.com
  5. www.pganational.com
  6. en.wikipedia.org
  7. www.tripadvisor.com
  8. www.pga.com
  9. www.jeffrealty.com
  10. www.bocaexecutiverealty.com

The 10-result SERP allows multiple listings for the top domain, whereas the 7-result SERP collapses the top domain to one listing plus expanded site-links. There is a relationship between listings #2-#4 in the 10-result SERP and the expanded site-links in the 7-result SERP, but it’s not one-to-one.

Recently, I happened across another way to compare. Google partners with other search engines to provide data, and one partner with fairly similar results is EarthLink. What’s interesting is that Google partners don’t show expanded site-links or 7-result SERPs – at least not in any case I’ve found (if you know an exception, please let me know). Here’s a search for “pga national” on EarthLink on 1/25:

  1. www.pgaresort.com
  2. www.pgaresort.com
  3. www.pgaresort.com
  4. www.pganational.com
  5. en.wikipedia.org
  6. www.tripadvisor.com
  7. www.jeffrealty.com
  8. www.pga.com
  9. www.bocaexecutiverealty.com
  10. www.devonshirepga.com

Again, the #1 domain is repeated. Looking across multiple SERPs, the pattern varies a bit, and it’s tough to pin it down to just one rule for moving from 7 results to 10 results. In general, though, the diversity pattern holds. When a query shifts from a 10-result SERP to a 7-result SERP, the domain in the #1 spot gets site-links but can’t occupy spots #2-#7.

Unfortunately, the domain diversity pattern has been hard to detect at large scale. We track domain diversity (percentage of unique sub-domains across the Top 10) in MozCast, but over the 2-3 days that 7-result SERPs rolled out, overall diversity only increased from 55.1% to 55.8%.

Part of the problem is that our broad view of diversity groups all sub-domains, meaning that the lack of diversity in the 10-result SERPs could overpower the 7-result SERPs. So, what if we separate them? Across the core MozCast data (1K queries), domain diversity on 1/22 was 53.4%. Looking at just 7-result SERPs, though, domain diversity was 62.2% (vs. 54.2% for 10-result SERPs). That’s not a massive difference, but it’s certainly evidence to support the diversity connection.
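The diversity metric itself is simple: unique sub-domains divided by total listings. A sketch using the “pga national” 10-result SERP shown above:

```python
# Domain diversity for one SERP: share of unique sub-domains
# among its listings (the 1/18 "pga national" 10-result SERP).
serp = [
    "www.pgaresort.com", "www.pgaresort.com", "www.pgaresort.com",
    "www.pgaresort.com", "www.pganational.com", "en.wikipedia.org",
    "www.tripadvisor.com", "www.pga.com", "www.jeffrealty.com",
    "www.bocaexecutiverealty.com",
]

diversity = len(set(serp)) / len(serp)
print(f"{diversity:.0%}")  # 70% — 7 unique sub-domains across 10 listings
```

Averaging this per-SERP number separately over 7-result and 10-result SERPs gives the 62.2% vs. 54.2% comparison above.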

Of course, causality is tough to piece together. Just because 7-result SERPs are more diverse, that doesn’t mean that Google is using domain crowding as a signal to generate expanded site-links. It could simply mean that the same signals that cause a result to get expanded site-links also cause it to get multiple spots in a 10-result SERP.

VI. The Big Brand Connection

So, what drives 7-result SERPs? Many people have speculated that it’s a brand signal – at a glance, there are many branded (or at least navigational) queries in the mix. Many of these are relatively small brands, though, so it’s not a classic picture of big-brand dominance. There are also some 7-result queries that don’t seem branded at all, such as:

  1. “tracking santa”
  2. “cool math games for kids”
  3. “unemployment claim weeks”
  4. “cell signaling”
  5. “irs transcript”

Granted, these are exceptions to the rule, and some of these are brand-like, for lack of a better phrase. The query “irs transcript” does pull up the IRS website in the top spot – the full phrase may not signal a brand, but there’s a clear dominant match for the search. Likewise, “tracking santa” is clearly NORAD’s domain, even if they don’t have a domain or brand called “tracking santa”, and even if they’re actually matching on “tracks santa”.

In some cases, there does seem to be a brand (or entity) bias. Take a search for “reef”, which pulls up Reef.com in the #1 spot with four site-links:

Google #1 result for Reef.com

Not to pick on Reef.com, but I don’t think of them as a household name. Are they a more relevant match to “reef” than any particular reef (like the Great Barrier Reef) or the concept of a reef in general? It could be a question of authority (DA = 66) or of the Exact-Match Domain in play – unfortunately, we throw around the term “brand” a lot, but we don’t often dig into how that translates into practical ranking signals.

I pulled authority metrics (DA and PA) for a subset of these queries, and there seems to be virtually no correlation between authority (as we measure it) and the presence of site-links. An interesting example is Wikipedia. It occupies over 11% of the #1 results (yeah, it’s not your imagination), but only seven of those 1,119 queries have 7-result SERPs. This is a site with a Domain Authority of 100 (out of 100).

VII. The "Entity" Connection

One emerging school of thought is that named entities are getting more ranking power these days. A named entity doesn’t have to be a big brand, just a clear match to a user’s intent. For example, if I searched for “sam’s barber shop”, SamsBarberShop.com would much more likely match my intent than results for barbers who happened to be named Sam. Sam’s Barber Shop is an entity, regardless of its Domain Authority or other ranking signals. This goes beyond just an exact-match domain (EMD) connection, too.

I think that 7-result SERPs and other updates like Knowledge Graph do signal a push toward classifying entities and generally making search reflect the real world. It’s not going to be enough in five years simply to use keywords well in your content or inbound anchor links. Google is going to want to return rich objects that represent “real-world” concepts that people understand, even if those concepts exist primarily online. This fits well into the idea of the dominant interpretation, too (as outlined in Google’s rater guidelines and other documents). Whether I search for “Microsoft” or “Sam’s Barber Shop”, the dominant interpretation model suggests that the entity’s website is the best match, regardless of other ranking factors or the strength of their SEO.

There's only one problem with the entity explanation. Generally speaking, I'd expect an entity to be stable – once a query was classified as an entity and acquired expanded sitelinks, I'd expect it to stay that way. As mentioned, though, the data is fairly unstable. This could indicate that entity detection is dynamic – based on some combination of on-page/link/social/user signals.

VIII. The Secret Sauce is Ketchup

Ok, maybe “secrets” was a bit of an exaggeration. The question of what actually triggers a 7-result SERP is definitely complicated, especially as Google expands into Knowledge Graph and advanced forms of entity association. I'm sure the broader question on everyone's mind is "How do I get (or stop getting) a 7-result SERP?" I'm not sure there's any simple answer, and there's definitely no simple on-page SEO trick. The data suggests that even a strong link profile (i.e. authority) may not be enough. Ultimately, query intent and complex associations are going to start to matter more, and your money keywords will be the ones where you can provide a strong match to intent. Pay attention not only to the 7-result SERPs in your own keyword mix, but to queries that trigger Knowledge Graph and other rich data – I expect many more changes in the coming year.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by Dr. Pete

There’s an app for everything – the problem is that we’re so busy chasing the newest shiny toy that we rarely stop to learn to use simple tools well. As a technical SEO, one of the tools I seem to never stop finding new uses for is the site: operator. I recently devoted a few slides to it in my BlueGlassX presentation, but I realized that those 5 minutes were just a tiny slice of all of the uses I’ve found over the years.

People often complain that site:, by itself, is inaccurate (I’ll talk about that more at the end of the post), but the magic is in the combination of site: with other query operators. So, I’ve come up with two dozen killer combos that can help you dive deep into any site.

1. site:example.com

Ok, this one’s not really a combination, but let’s start with the basics. Paired with a root domain or sub-domain, the [site:] operator returns an estimated count of the number of indexed pages for that domain. The “estimated” part is important, but we’ll get to that later. For a big picture, I generally stick to the root domain (leave out the “www”, etc.).

Each combo in this post will have a clickable example (see below). I'm picking on Amazon.com in my examples, because they're big enough for all of these combos to come into play:

You’ll end up with two bits of information: (1) the actual list of pages in the index, and (2) the count of those pages (circled in purple below):

Screenshot - site:amazon.com

I think we can all agree that 273,000,000 results is a whole lot more than most of us would want to sort through. Even if we wanted to do that much clicking, Google would stop us after 100 pages. So, how can we get more sophisticated and drill down into the Google index?

2. site:example.com/folder

The simplest way to dive deeper into this mess is to provide a sub-folder (like “/blog”) – just append it to the end of the root domain. Don’t let the simplicity of this combo fool you – if you know a site’s basic architecture, you can use it to drill down into the index quickly and spot crawl problems.

3. site:sub.example.com

You can also drill down into specific sub-domains. Just use the full sub-domain in the query. I generally start with #1 to sweep up all sub-domains, but #3 can be very useful for situations like tracking down a development or staging sub-domain that may have been accidentally crawled.

4. site:example.com inurl:www

The "inurl:" operator searches for specific text in the indexed URLs. You can pair “site:” with “inurl:” to find the sub-domain in the full URL. Why would you use this instead of #3? On the one hand, "inurl:" will look for the text anywhere in the URL, including the folder and page/file names. For tracking sub-domains this may not be desirable. However, "inurl:" is much more flexible than putting the sub-domain directly into the main query. You'll see why in examples #5 and #6.

5. site:example.com -inurl:www

Adding [-] to most operators tells Google to search for anything but that particular text. In this case, by separating out "inurl:www", you can change it to "-inurl:www" and find any indexed URLs that are not on the "www" sub-domain. If "www" is your canonical sub-domain, this can be very useful for finding non-canonical URLs that Google may have crawled.

6. site:example.com -inurl:www -inurl:dev -inurl:shop

I'm not going to list every possible combination of Google operators, but keep in mind that you can chain most operators. Let's say you suspect there are some stray sub-domains, but you aren't sure what they are. You are, however, aware of "www.", "dev." and "shop.". You can chain multiple "-inurl:" operators to remove all of these known sub-domains from the query, leaving you with a list of any stragglers.
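If you run these audits often, it can help to generate the chained query strings programmatically and paste them into Google. A tiny sketch; the helper name is hypothetical, and Google only ever sees the final string:

```python
# Build a "site:" query that excludes a list of known sub-domains
# via chained "-inurl:" operators.
def site_query(domain, exclude_subdomains=()):
    parts = [f"site:{domain}"]
    parts += [f"-inurl:{sub}" for sub in exclude_subdomains]
    return " ".join(parts)

print(site_query("example.com", ["www", "dev", "shop"]))
# → site:example.com -inurl:www -inurl:dev -inurl:shop
```

The same pattern extends to any of the combos in this post – swap in "inurl:", "intitle:", or quoted phrases as needed.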

7. site:example.com inurl:https

You can't put a protocol directly into "site:" (e.g. "https:", "ftp:", etc.). Fortunately, you can put "https" into an "inurl:" operator, allowing you to see any secure pages that Google has indexed. As with all "inurl:" queries, this will find "https" anywhere in the URL, but it's relatively rare to see it somewhere other than the protocol.

8. site:example.com inurl:param

URL parameters can be a Panda's dream. If you're worried about something like search sorts, filters, or pagination, and your site uses URL parameters to create those pages, then you can use "inurl:" plus the parameter name to track them down. Again, keep in mind that Google will look for that name anywhere in the URL, which can occasionally cause headaches.

Pro Tip: Try out the example above, and you'll notice that "inurl:ref" returns any URL with "ref" in it, not just traditional URL parameters. Be careful when searching for a parameter that is also a common word.

9. site:example.com -inurl:param

Maybe you want to know how many search pages are being indexed without sorts or how many product pages Google is tracking with no size or color selection – just add [-] to your "inurl:" statement to exclude that parameter. Keep in mind that you can combine "inurl:" with "-inurl:", specifically including some parameters and excluding others. For complex, e-commerce sites, these two combos alone can have dozens of uses.

10. site:example.com text goes here

Of course, you can always combine the "site:" operator with a plain-old, text query. This will search the contents of the entire page within the given site. Like standard queries, this is essentially a logical [AND], but it's a bit of a loose [AND] – Google will try to match all terms, but those terms may be separated on the page or you may get back results that only include some of the terms. You'll see that the example below matches the phrase "free Kindle books" but also phrases like "free books on Kindle".

11. site:example.com “text goes here”

If you want to search for an exact-match phrase, put it in quotes. This simple combination can be extremely useful for tracking down duplicate and near-duplicate copy on your site. If you're worried about one of your product descriptions being repeated across dozens of pages, for example, pull out a few unique terms and put them in quotes.

12. site:example.com/folder “text goes here”

This is just a reminder that you can combine text (with or without quotes) with almost any of the combinations previously discussed. Narrow your query to just your blog or your store pages, for example, to really target your search for duplicates.

13. site:example.com this OR that

If you specifically want a logical [OR], Google does support use of "or" in queries. In this case, you'd get back any pages indexed on the domain that contained either "this" or "that" (or both, as with any logical [OR]). This can be very useful if you've forgotten exactly which term you used or are searching for a family of keywords.

Edit: Hat Tip to TracyMu in the comments – this is one case where capitalization matters. Either use "OR" in all-caps or the pipe "|" symbol. If you use lower-case "or", Google could interpret it as part of a phrase.

14. site:example.com “top * ways”

The asterisk [*] can be used as a wildcard in Google queries to replace unknown text. Let's say you want to find all of the "Top X" posts on your blog. You could use "site:" to target your blog folder and then "Top *" to query only those posts.

Pro Tip: The wildcard [*] operator will match one or multiple words. So, "top * books" can match "Top 40 Books" or "Top Career Management Books". Try the sample query above for more examples.

15. site:example.com “top 7..10 ways”

If you have a specific range of numbers in mind, you can use "X..Y" to return anything in the range from X to Y. While the example above is probably a bit silly, you can use ranges across any kind of on-page data, from product IDs to prices.

16. site:example.com ~word

The tilde [~] operator tells Google to find words related to the word in question. Let's say you wanted to find all of the posts on your blog related to the concept of consulting – just add "~consulting" to the query, and you'll get the wider set of terms that Google thinks are relevant.

17. site:example.com ~word -word

By using [-] to exclude the specific word, you can tell Google to find any pages related to the concept that don't specifically target that term. This can be useful when you're trying to assess your keyword targeting or create new content based on keyword research.

18. site:example.com intitle:”text goes here”

The "intitle:" operator only matches text that appears in the <TITLE></TITLE> tag. One of the first spot-checks I do on any technical SEO audit is to use this tactic with the home-page title (or a unique phrase from it). It can be incredibly useful for quickly finding major duplicate content problems.

19. site:example.com intitle:”text * here”

You can use almost any of the variations mentioned in (12)-(17) with "intitle:" – I won't list them all, but don't be afraid to get creative. Here's an example that uses the wildcard search in #14, but targets it specifically to page titles.

Pro Tip: Remember to use quotes around the phrase after "intitle:", or Google will view the query as a one-word title search plus straight text. For example, "intitle:text goes here" will look for "text" in the title plus "goes" and "here" anywhere on the page.

20. intitle:”text goes here”

This one's not really a "site:" combo, but it's so useful that I had to include it. Are you suspicious that other sites may be copying your content? Just put any unique phrase in quotes after "intitle:" and you can find copies across the entire web. This is the fastest and cheapest way I've found to find people who have stolen your content. It's also a good way to make sure your article titles are unique.

21. “text goes here” -site:example.com

If you want to get a bit more sophisticated, you can use "-site:" and exclude mentions of copy on any domain (including your own). This can be used with straight text or with "intitle:" (like in #20). Including your own site can be useful, just to get a sense of where your ranking ability stacks up, but subtracting out your site allows you to see only the copies.

22. site:example.com intext:”text goes here”

The "intext:" operator looks for keywords in the body of the document, but doesn't search the <TITLE> tag. The text could appear in the title, but Google won't look for it there. Oddly, "intext:" will match keywords in the URL (seems like a glitch to me, but I don't make the rules).

23. site:example.com ”text goes here” -intitle:"text goes here"

You might think that #22 and #23 are the same, but there's a subtle difference. If you use "intext:", Google will ignore the <TITLE> tag, but it won't specifically remove anything with "text goes here" in the title. If you specifically want to remove any title mentions in your results, then use "-intitle:".

24. site:example.com filetype:pdf

One of the drawbacks of "inurl:" is that it will match any string in the URL. So, for example, searching on "inurl:pdf", could return a page called "/guide-to-creating-a-great-pdf". By using "filetype:", you can specify that Google only search on the file extension. Google can detect some filetypes (like PDFs) even without a ".pdf" extension, but others (like "html") seem to require a file extension in the indexed document.

25. site:.edu “text goes here”

Finally, you can target just the Top-Level Domain (TLD), by leaving out the root domain. This is more useful for link-building and competitive research than on-page SEO, but it's definitely worth mentioning. One of our community members, Himanshu, has an excellent post on his own blog about using advanced query operators for link-building.

Why No Allintitle: & Allinurl:?

Experienced SEOs may be wondering why I left out the operators "allintitle:" and "allinurl:" – the short answer is that I've found them increasingly unreliable over the past couple of years. Using "intitle:" or "inurl:" with your keywords in quotes is generally more predictable and just as effective, in my opinion.


Putting It All to Work

I want to give you a quick case study to show that these combos aren't just parlor tricks. I once worked with a fairly large site that we thought was hit by Panda. It was an e-commerce site that allowed members to spin off their own stores (think Etsy, but in a much different industry). I discovered something very interesting just by using "site:" combos (all URLs are fictional, to protect the client):

(1) site:example.com = 11M

First, I found that the site had a very large number (11 million) of indexed pages, especially relative to its overall authority. So, I quickly looked at the site architecture and found a number of sub-folders. One of them was the "/stores" sub-folder, which contained all of the member-created stores:

(2) site:example.com/stores = 8.4M

Over 8 million pages in Google's index were coming just from those customer stores, many of which were empty. I was clearly on the right track. Finally, simply by browsing a few of those stores, I noticed that every member-created store had its own internal search filters, all of which used the "?filter" parameter in the URL. So, I narrowed it down a bit more:

(3) site:example.com/stores inurl:filter = 6.7M

Over 60% of the indexed pages for this site were coming from search filters on user-generated content. Obviously, this was just the beginning of my work, but I found a critical issue on a very large site in less than 30 minutes, just by using a few simple query operator combos. It didn't take an 8-hour desktop crawl or millions of rows of Excel data – I just had to use some logic and ask the right questions.


How Accurate Is Site:?

Historically, some SEOs have complained that the numbers you get from "site:" can vary wildly across time and data centers. Let's cut to the chase: they're absolutely right. You shouldn't take any single number you get back as absolute truth. I ran an experiment recently to put this to the test. Every 10 minutes for 24 hours, I automatically queried the following:

  1. site:seomoz.org
  2. site:seomoz.org/blog
  3. site:seomoz.org/blog intitle:spam

Even using a fixed IP address (single data center, presumably), the results varied quite a bit, especially for the broad queries. The range for each of the "site:" combos across 24 hours (144 measurements) was as follows:

  1. 67,700 – 114,000
  2. 8,590 – 8,620
  3. 40 – 40

Across two sets of IPs (unique C-blocks), the range was even larger (see the "/blog" data):

  1. 67,700 – 114,000
  2. 4,580 – 8,620
  3. 40 – 40

Does that mean that "site:" is useless? No, not at all. You just have to be careful. Sometimes, you don't even need the exact count – you're just interested in finding examples of URLs that match the pattern in question. Even if you need a count, the key is to drill down. The narrowest range in the experiment was completely consistent across 24 hours and both data centers. The more you drill down, the better off you are.

You can also use relative numbers. In my example above, it didn't really matter if the 11M total indexed page count was accurate. What mattered was that I was able to isolate a large section of the index based on one common piece of site architecture. Assumedly, the margin of error for each of those measurements was similar – I was only interested in the relative percentages at each step. When in doubt, take more than one measurement.
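Taking the median of a few measurements and working in percentages makes the noise mostly cancel out. A sketch with hypothetical counts echoing the case study above:

```python
# Stabilize noisy "site:" index counts: take the median of several
# measurements, then compare sections of the site as percentages.
from statistics import median

total_counts  = [10_800_000, 11_200_000, 11_000_000]  # site:example.com
stores_counts = [8_300_000, 8_500_000, 8_400_000]     # site:example.com/stores

total  = median(total_counts)
stores = median(stores_counts)
print(f"stores share of index: {stores / total:.0%}")
```

Even if every individual count is off by 10%, the ratio between sections tends to hold steady, which is all the case study above needed.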

Keep in mind that this problem isn't unique to the "site:" operator – all search result counts on Google are estimates, especially the larger numbers. Matt Cutts discussed this in a recent video, along with how you can use the page 2 count to sometimes reduce the margin of error.


The True Test of An SEO

If you run enough "site:" combos often enough, even by hand, you may eventually be greeted with this:

Google Captcha

If you managed to trigger a CAPTCHA without using automation, then congratulations, my friend! You're a real SEO now. Enjoy your new tools, and try not to hurt anyone.
