ranking-factors-smx-munich-2011-110404181056-phpapp01

					Google’s Ranking Factors 2011
Early data from SEOmoz’s survey of 132 SEO professionals and
       correlation data from 10,000+ keyword rankings




                            Download at:

   http://bit.ly/rankfactorsmunich
                  Rand Fishkin, SEOmoz CEO, April 2011
SEOmoz Makes Software! We don’t offer consulting.
Understanding, Interpreting & Using
       Survey Opinion Data

                     Everybody’s wrong sometimes,
                      but there’s a lot we can learn
                    from the aggregation of opinions
              #1: Opinions are Not Fact
(these are smart people, but they can’t know everything about Google’s rankings)



               #2: Not Everyone Agrees
         (standard deviation can help show us the degree of consensus)



             #3: Data is Still Preliminary
                 (these are raw responses without any filtering)



       http:/googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html
      Many thanks to all who contributed their time to take the survey!
Understanding, Interpreting & Using
         Correlation Data

                   This is powerful, useful information,
                        but with that power comes
                  responsibility to present it accurately
                             Methodology
  10,271 Keywords, pulled from Google AdWords US Suggestions
  (all SERPs were pulled from Google in March 2011, after the Panda/Farmer update)


                Top 30 Results Retrieved for Each Keyword
                       (excluding all vertical/non-standard results)


 Correlations are for Pages/Sites that Appear Higher in the Top 30
        (we use the mean of Spearman’s correlation coefficient across all SERPs)


   Results Where <2 URLs Contain a Given Feature Are Excluded
(this also holds true for results where all the URLs contain the same values for a feature)


     More details, including complete documentation and the raw dataset, will be
     released in May with the published version of the 2011 Ranking Factors
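
A minimal sketch of the methodology described above, not SEOmoz's actual code: compute Spearman's correlation between ranking position and a feature within each SERP, skip SERPs where fewer than 2 URLs have the feature or where every URL has the same value, then take the mean across the remaining SERPs. The function names, the use of None for a missing feature, the average-rank handling of ties, and the sign convention (positive means more of the feature goes with a higher ranking) are all assumptions made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def serp_correlation(feature_values):
    """feature_values[i] is the feature for the result at position i + 1.

    None marks a URL without the feature. Returns None when the SERP is excluded.
    """
    pairs = [(pos, v) for pos, v in enumerate(feature_values, start=1) if v is not None]
    if len(pairs) < 2:
        return None  # fewer than 2 URLs contain the feature
    values = [v for _, v in pairs]
    if len(set(values)) == 1:
        return None  # every URL has the same value
    # Assumed sign convention: negate position so that a positive result
    # means higher feature values appear at higher rankings.
    return spearman([-pos for pos, _ in pairs], values)


def mean_correlation(serps):
    """Mean of the per-SERP Spearman correlations, ignoring excluded SERPs."""
    per_serp = [c for c in map(serp_correlation, serps) if c is not None]
    return mean(per_serp) if per_serp else None
```

For example, `mean_correlation([[30, 20, 10], [5, 5, 5], [None, None, 7], [1, 2, 3]])` uses only the first and last SERPs; the second is dropped because all values are equal and the third because only one URL has the feature.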
             Correlation & Dolphins




Dolphins who swim at the front of the pod tend to have larger dorsal fins, more muscular
tails and more damage on their flippers. The first two might have a causal link, but the
damaged flippers are likely a result of swimming at the front (i.e. having damaged flippers
doesn’t make a dolphin a better front-of-the-pod swimmer). Likewise, with ranking
correlations, there are probably many features that are correlated with, but not necessarily
the cause of, the positive/negative rankings.
       Correlation IS NOT Causation




   Earning more linking root                 But, will adding more characters
domains to a URL may indeed                     to the HTML code of a page
 increase that page’s ranking.               increase rankings? Probably not.

  Just because a feature is correlated, even very highly, doesn’t necessarily mean that
  improving that metric on your site will necessarily improve your rankings.
How Confident Can We Be in the Accuracy
        of these Correlations?




  Because we have such a large data set, standard error is extremely low.
    This means even for small correlations, our estimates of the mean
  correlation are close to the actual mean correlation across all searches.

      Standard error won’t be reported in this presentation, but it’s less than 0.0035 for all of
      Spearman correlation results (so we can feel quite confident about our numbers)
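
As a rough illustration of why a large keyword set gives a low standard error: the standard error of a mean is the sample standard deviation divided by the square root of the number of samples. The sketch below assumes each SERP contributes one correlation value; the function names are hypothetical, and the only numbers taken from this deck are the 10,271 keywords and the 0.0035 bound.

```python
from math import sqrt
from statistics import stdev


def standard_error(per_serp_correlations):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(per_serp_correlations)
    return stdev(per_serp_correlations) / sqrt(n)


def implied_max_spread(se_bound, n):
    """Largest per-SERP standard deviation consistent with a given standard error."""
    return se_bound * sqrt(n)


# With 10,271 SERPs, a standard error below 0.0035 allows the per-SERP
# correlations to have a standard deviation of up to this value.
spread = implied_max_spread(0.0035, 10271)
```

In other words, individual SERPs can vary a great deal while the estimate of the mean stays tight, because the spread is divided by roughly 101 (the square root of 10,271).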
Do Correlations in this Range Have
        Value/Meaning?




           Most of our data is                        A factor w/ 1.0 correlation
             in this range                         would explain 100% of Google’s
                                                   algorithm across 10K+ keywords
A rough rule of thumb with linear fit numbers is that they explain the number squared of the
system’s variance. Thus, a factor with correlation 0.3 would explain ~9% of Google’s algorithm.
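
The rule of thumb above is just the square of the correlation coefficient. A small sketch, with a hypothetical function name; note that the rule comes from linear (Pearson) fits, so applying it to Spearman correlations is only an approximation.

```python
def variance_explained(correlation):
    """Rough share of variance explained by a factor: the correlation squared."""
    return correlation ** 2


# Correlations in the range most of this deck's data falls into.
shares = {r: variance_explained(r) for r in (0.1, 0.2, 0.3)}
```

By this rule a correlation of 0.1 corresponds to about 1% of the variance, 0.2 to about 4%, and 0.3 to about 9%, so small differences in correlation matter more as the values grow.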
Are You Ready for Some Data?!
Overall Algorithmic Factors


              These compare opinion/survey data
                      from 2009 vs. 2011
In 2009, link-based factors (page and domain-level) comprised 65%+ of voters’
algorithmic assessment
In 2011, link-based factors (page and domain-level) have shrunk in the voters’ minds to only
~45% of algorithmic components. Note: because the question options changed slightly (and more
options were added), direct comparison may not be entirely fair.
Page-Specific Link Signals


              These metrics are based on links that
              point specifically to the ranking page
            Most Important Page-Level Link Factors
                            (as voted on by 132 SEOs)




                                                               My guess: Some voters
                                                             didn’t fully understand the
                                                              “linking c-blocks” choice




With opinion data, voters ordered the factors from most important to least. Thus, when looking
at opinion stats, the factor voters felt was most important will have the smallest rank.
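
One way such ordered votes could be summarized is sketched below; this is an assumption about the aggregation, not the survey's documented method. Each ballot is a list of factors from most to least important, the mean rank shows which factor voters placed highest (smallest is most important), and the standard deviation shows how much voters disagreed. The ballots and factor names are made up.

```python
from statistics import mean, pstdev


def summarize_ballots(ballots):
    """Return (factor, (mean rank, std dev of rank)) sorted by mean rank.

    Every ballot must list the same factors, ordered most to least important.
    """
    summary = {}
    for factor in ballots[0]:
        # Position 0 on a ballot is rank 1 (most important).
        ranks = [ballot.index(factor) + 1 for ballot in ballots]
        summary[factor] = (mean(ranks), pstdev(ranks))
    return sorted(summary.items(), key=lambda item: item[1][0])


# Hypothetical ballots from three voters.
ballots = [
    ["linking root domains", "total links", "linking c-blocks"],
    ["linking root domains", "linking c-blocks", "total links"],
    ["total links", "linking root domains", "linking c-blocks"],
]
results = summarize_ballots(ballots)
```

A factor with a small mean rank and a small standard deviation is one voters both rated highly and agreed on; a large standard deviation flags a factor where opinions were split.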
                                                                      In the rest of this deck,
                                                                      we’ll use linking c-blocks
                                                                       as a reference point,
                                                                          hence the red 




This data is exactly what an SEO would expect – the more diverse the sources, the greater the
correlation with higher rankings. These numbers are relatively similar to June 2010 data.
Correlations of Page-Level, Anchor Text-Based Link Data




                                                     No Surprise: Total links (including internal)
                                                     w/ anchor text is less well-correlated than
                                                           external links w/ anchor text


Partial anchor text matches have greater correlation than exact match. This might be correlation
only, or could indicate that the common SEO wisdom to vary anchor text is accurate.
                            Rand’s Takeaways
      #1: SEOs Believe the Power of Links Has Declined
       (correlation of link data w/ rankings has fallen slightly from 2010 to 2011 as well)


                 #2: Diversity of Links > Raw Quantity
(This fits well with most SEOs’ expectations. Also helps me feel better about the correlation data)


 #3: Exact Match Anchor Text Appears Slightly Less Well
  Correlated than Partial Anchor Text in External Links
    (This was surprising to me, though from Google’s perspective, it makes good sense. The
                        aggregated voter opinions agreed with this, too.)


         These are my personal takeaways from the data; others’ interpretations may vary
Domain-Wide Link Signals


          These metrics are based on links that point
             to anywhere on the ranking domain
        Most Important Domain-Level Link Factors
                       (as voted on by 132 SEOs)



                                                    C-Blocks: Likely the same
                                                    vote interpretation issue
                                                       as with page-level




Voters seem to believe that diversity/quantity is more important than quality.
          Correlation of Domain-Level Link Data




                                                              Nice Work! Excluding the
                                                              “c-blocks” issue, voters +
                                                              correlations match nicely.



Domain-level link data is surprisingly similar to page-level link data in correlation
                       Rand’s Takeaways

 #1: Google May Rank Pages, But Domains Matter Too
    (the closeness of correlation data and the opinions of voters both back this up)



 #2: Link Velocity & Diversity of Link Types Would Be
   Interesting to Measure Given Voters’ Opinions
                  (Hopefully we can look at these in future analyses)


#3: Correlations w/ “All” Links vs. Followed-Only is Odd
                     (Let’s take a closer look at these correlations)


Something Funny About Nofollows


              These compare followed vs. nofollowed
                    links to the domain + page
              Correlation of Followed vs. Nofollowed Links




 Nofollowed Matters? Many SEOs have been
saying that nofollow links can help w/ rankings.
 The correlation suggests maybe they’re right.

       These numbers exhibit why we like to build ranking models using machine learning. Models can
       help determine whether nofollowed links have a causal impact or whether it’s mere correlation.
On-Page Signals


     These metrics are based on keyword usage
       and features of the ranking document
      Most Important On-Page, Keyword-Use Factors
                           (as voted on by 132 SEOs)


                                                                 My guess: Some voters
                                                                 didn’t fully understand
                                                                  the internal/external
                                                                   link anchors choice




NOTE: We surveyed SEOs about more on-page optimization features, but I didn’t include them all
on this chart as it would make the labels very tiny and hard to read 
       Correlation of On-Page Keyword-Use Elements



                                                                    Curious: Longer
                                                                documents seem to rank
                                                                better than shorter ones




                                                            Keyword-based factors are
                                                           generally less well correlated
                                                           w/ higher rankings than links.



This is just a sampling of the on-page elements we observed; some factors haven’t yet been
calculated and thus couldn’t be compared for this presentation. They’ll be in the full version.
             Correlation of On-Page Keyword-Use Elements




The theory that AdSense use boosts rankings isn’t supported by the data

More reason to believe Google when they say page load speed is a factor, but a very small one
      There’s a longtime rumor that linking externally to Google.com (or Microsoft on Bing) helps with
      rankings. It’s comforting to see that correlation-wise, linking to MS is better on Google 
                          Rand’s Takeaways
#1: Very Tough to Differentiate w/ On-Page Optimization
       (as in the past, the data suggests that lots of results are getting on-page right)


   #2: Longer/Larger Documents Tend to Rank Better
      (It could be that post-Panda/Farmer update, robust content is rewarded more)


     #3: Long Titles + URLs are Still Likely Bad for SEO
 (In addition to the negative correlations, they’re harder to share, to type-in and to link to)


  #4: Using Keywords Earlier in Tags/Docs Seems Wise
  (Correlation backs up the common wisdom that keywords closer to the top matter more)


       We definitely need to look at more on-page factors in the data for the full report, too.
Social Signals

         These signals are based on data
        from users of Twitter, Facebook &
            Google Buzz via their APIs
         Most Important Social Media-Based Factors
                            (as voted on by 132 SEOs)
                                                             Curious: For Twitter, voters felt
                                                            authority matters more, while for
                                                            Facebook, it’s raw quantity (could
                                                             be because GG doesn’t have as
                                                             much access to FB graph data).




Although we didn’t ask voters for a cutoff on what they believe matters vs. doesn’t, I suspect
many/most would have said that Google Buzz and Digg/Reddit/SU aren’t used in the rankings.
           Correlation of Social Media-Based Factors
                     (data via Topsy API & Google Buzz API)




                                                                  Amazing: Facebook Shares
                                                                     is our single highest
                                                                   correlated metric with
                                                                   higher Google rankings.



Although voters thought Twitter data / tweets to URLs were more influential, Facebook’s metrics
are substantially better correlated with rankings. Time to get more FB Shares!
Percent of Results (from our 10,200 Keyword Set) in Which the
                      Feature Was Present




                                                                   It amazed me that
                                                                Facebook Share data was
                                                                present for 61% of pages
                                                                  in the top 30 results




  For most link factors, 99%+ of results had data from Linkscape; for social data, this was much
  lower, but still high enough that standard error is below 0.0025 for each of the metrics.
                          Rand’s Takeaways
             #1: Social is Shockingly Well-Correlated
(it’s hard to doubt causation, particularly after reading the SearchEngineLand interview here)


   #2: Facebook may be more influential than Twitter
  (Or it may be that Facebook data is simply more robust/available for URLs in the SERPs)


      #3: Google Buzz is Probably Not in Use Directly
 (Since so many users simply have their Tweet streams go to Buzz, and correlation is lower)


#4: We Need to Learn More About How Social is Used
   (Understanding how Google uses social metrics, parses “anchor text,” etc. looms large)

        Expect more experimentation and, sadly, some gaming attempts w/
        Twitter + Facebook by SEOs (and spammers) in the future.
Highest Positively + Negatively
  Correlated Metrics Overall

                These are the features most indicative
                     of higher vs. lower rankings
                Top 8 Strongest Correlated Metrics




Exact match domain is actually not in the top 8, but I thought I should include it, as we haven’t
discussed domain name matches yet. More data on that will be calculated/released in the full set.
                 Top 8 Most Negatively Correlated Metrics




Be concise and to-the-point;
 it’s good for users and for
       your rankings 




     Long domain names, titles and URLs all had negative correlations with rankings.
     Again, I’ve included # of words in title, which isn’t technically in the top 8, but is still interesting.
                    Top 8 Most Negatively Correlated Metrics




   One of the most surprising finds in our
  dataset. We double-checked to be sure.
  40% of URLs in the set had only followed
links, and these tended to have lower Page
 Authority (and lower rankings) than those
   w/ both followed and nofollowed links.
    Our data scientist thinks there’s some
   correlation between having nofollowed
     and other good/natural link signals.




        Also note that % of followed links on a page has a slightly negative correlation with rankings.
        Perhaps sites that make all their links out followed aren’t being careful about what they link to?
Which Domains Appeared Most
Frequently in Our 10K+ SERPs?
    Top 20 Root Domains Most
Prevalent in our 10,200 keyword set
       (top 30 rank positions)




    SEOs may be disappointed to see
   eHow.com performing so well, but
     classic content aggregators like
 About.com + Wikipedia still beat them.




What do the Experts think
   the Future Holds?
 What Do SEOs Believe Will Happen w/ Google’s Use of
           Ranking Features in the Future?




While there was some significant contention about issues like paid links and ads vs. content, the
voters nearly all agreed that social signals and perceived user value signals have bright futures.
                            IMPORTANT!
            Don’t Misuse or Misattribute Correlation Data!

Think of correlation data as a way of seeing features of sites that rank
well, rather than a way of seeing what metrics search engines are
actually measuring and counting.

A well-correlated metric can often be its own reward, even if it
doesn’t directly impact search engine rankings. Virtually all the data
in this report reflect the best practices of inbound marketing overall –
and using the data to help support these is an excellent application 

Thanks much!
Rand

     We are looking forward to sharing the full data in the new version of the Search Ranking Factors
     report coming in May 2011. Lots more cool info along with the full dataset will be available then.
                           Q+A
                            Download at:

     http://bit.ly/rankfactorsmunich
                                           You can now try SEOmoz PRO Free!
                                           http://www.seomoz.org/freetrial

Rand Fishkin, CEO & Co-Founder, SEOmoz

    • Twitter: @randfish
    • Blog: www.seomoz.org/blog
    • Email: rand@seomoz.org

				