Whether you own a small personal blog or run the e-commerce department for a massive high-street chain, link analysis is important to you.
It’s how you can tell whether your posts are being promoted, how far your last campaign reached – and it’s also how you know if you’re about to get hit by a Google penalty.
Google Penguin is now a constant threat, and in case you’ve never seen the effect of a penalty before, here’s what happened to the search visibility of Interflora back in February 2013:
They didn’t even rank for their own brand name.
And if no-one can find you in the search results because you’ve been penalised, you’re in big trouble.
However, link analysis is a tricky art. Link data can be expensive, and you can only analyse the links you find.
If you don’t use Bing Webmaster Tools you could miss 37% of the whole link profile.
With this profile, Bing WMT provided an additional 1,070 domains that all the other big names missed – so it’d be impossible to find 100% of this site’s backlinks without it!
There are multiple places to obtain backlink data, and everyone has their own opinions on which is best. The majority of providers will bill you based on how much you’d like to download. I like to use a broad variety of sources, so I’m able to compare and contrast data accurately.
I took the backlink profile of one website (under NDA, so no name dropping, sorry) and pieced together links from ahrefs, Bing Webmaster Tools, Google Webmaster Tools, Majestic’s fresh and historic indexes, Moz Open Site Explorer and SEO Spyglass.
In the end there were a total of 6,289 unique linking domains, and a respectable 225,664 unique links.
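As a rough illustration of how exports from several tools can be pieced together into one profile, here is a minimal Python sketch. The source names, the example URLs and the helper names (`linking_domain`, `compile_profile`) are all hypothetical, and it assumes each export has already been reduced to a plain list of linking URLs. It treats the full hostname (minus a leading `www.`) as the "domain", which is one possible convention and not necessarily the one used for the figures above.

```python
from urllib.parse import urlsplit


def linking_domain(url: str) -> str:
    """Return the lower-cased hostname of a linking URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def compile_profile(exports: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Merge every export into a set of unique links and a set of unique linking domains."""
    links = {url.strip() for urls in exports.values() for url in urls if url.strip()}
    domains = {linking_domain(url) for url in links}
    domains.discard("")  # drop rows where no hostname could be parsed
    return links, domains


# Hypothetical exports, one list of linking URLs per data source.
exports = {
    "ahrefs": ["http://www.example.com/a", "http://blog.example.org/post"],
    "bing_wmt": ["http://example.com/b", "http://blog.example.org/post"],
}
links, domains = compile_profile(exports)
print(len(links), "unique links from", len(domains), "unique linking domains")
```

The same link reported by two tools is only counted once, which is why the compiled totals are smaller than the sum of the individual downloads.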
If you were to look at this by data source you’d see that Majestic Historic and ahrefs come way out in front as the providers of the most links:
To me this is no surprise; this particular profile has an issue with historical sitewide links. The interesting data comes when you look at which source provided the most linking domains:
Here we can see that Majestic Historic and Bing provided the most ample supply of domains.
If you look at it from a different perspective, we know that there are 6,289 linking domains on file – so we can deduce precisely how much of the total profile each data source could possibly cover:
While ahrefs provided a substantial number of backlinks, these boil down to only 13% of the linking domains in the full profile.
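The coverage figure for each source is simply its number of linking domains divided by the number of domains in the combined profile. A small sketch of that calculation, using made-up source names and domains (the function name `coverage_by_source` is also just for illustration):

```python
def coverage_by_source(domains_by_source: dict[str, set[str]]) -> dict[str, float]:
    """Share of the combined linking domains that each source reports on its own."""
    all_domains: set[str] = set().union(*domains_by_source.values())
    if not all_domains:
        return {source: 0.0 for source in domains_by_source}
    return {
        source: len(domains) / len(all_domains)
        for source, domains in domains_by_source.items()
    }


# Hypothetical data: four domains in total across two sources.
sample = {
    "source_a": {"a.example", "b.example"},
    "source_b": {"b.example", "c.example", "d.example"},
}
for source, share in coverage_by_source(sample).items():
    print(f"{source}: {share:.0%} of the profile")
```

Because sources overlap, the individual shares will usually add up to more than 100%.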
But anyone who’s worked with this sort of data knows that raw data like this can be very misleading. When it comes to spam hunting, or assessing the value of your current link profile, only the live links count.
So I crawled all 225,664 links in search of my client’s site, and found that more than 70% of my compiled links were dead!
Suddenly this paints a very different picture – what use is having all this data if a link doesn’t exist anymore? Now that I can measure what’s live I can accurately show you which link data source provides the most live links and domains… and whether they’re free.
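For anyone wondering what "crawling in search of the client’s site" might look like, here is a minimal sketch using only the Python standard library. It is an assumption about the general approach, not the crawler actually used for this analysis: a link counts as live if the linking page can be fetched and still contains an anchor pointing at the target domain. The names `page_links_to` and `is_link_live` and the example domains are hypothetical.

```python
import http.client
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def page_links_to(html: str, target_domain: str) -> bool:
    """True if any anchor on the page points at target_domain or one of its subdomains."""
    collector = AnchorCollector()
    collector.feed(html)
    target = target_domain.lower()
    for href in collector.hrefs:
        try:
            host = (urlsplit(href).hostname or "").lower()
        except ValueError:  # malformed href, ignore it
            continue
        if host == target or host.endswith("." + target):
            return True
    return False


def is_link_live(url: str, target_domain: str, timeout: float = 10.0) -> bool:
    """Fetch the linking page; any fetch error or a missing anchor counts as a dead link."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return False
    return page_links_to(html, target_domain)


# Offline example with a hypothetical client domain.
print(page_links_to('<p><a href="http://www.client.example/offer">deal</a></p>', "client.example"))
```

A real crawl over hundreds of thousands of URLs would also need rate limiting, retries and handling for relative or JavaScript-generated links, none of which is covered here.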
Who provides the most links?
The answer to this is an easy one: ahrefs, which prides itself on having a diverse and fresh index, provided me with the greatest number of live links. Interestingly, Moz came last – but this is mainly because they throttle the amount of data you can download unless you pay extortionate amounts of money:
It was also no surprise that Majestic Historic had the most dead links in its sample. The historic index is a very thorough snapshot of the past, so I would expect many of the old links to have dropped out by now:
Who provides the most linking domains?
This is where I was surprised. I often download millions and millions of rows of backlink data per analysis – and the webmaster tools samples are by far the smallest datasets that go into the database. But we can clearly see that it’s actually the free data that provided the greatest number of live domains:
Looking at the dead domains, with the exception of Majestic Historic which you would expect to have a greater number of dead domains, it’s the webmaster tools accounts that yield the most diverse backlink samples:
Hold on! More isn’t necessarily better!
In a previous article, I pointed out that providing lots of links isn’t necessarily a good thing for any single data source. If you bought a big bag of fruit pastilles, but it turned out they were all green you’d be pretty annoyed.
What you need is variety, as many different flavours (or in this case domains) as possible.
Here ahrefs was the best at providing live links, but only 5th for the number of live linking domains. When it comes to managing link data, it’s all about how you measure the live links.
Each data source is only really as good as the number of domains that it, and only it, can show you. For example ‘forum.m.m.biz’, one of the many damned m.biz domains to pop up recently, appears only in the download from Google Webmaster Tools. ‘forum.m.m.biz’ would thus be considered unique.
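One way to find the domains that a source, and only that source, can show you is to count how many sources report each domain and keep the ones that appear exactly once. The sketch below assumes the live domains have already been grouped by source. Apart from ‘forum.m.m.biz’, the domains and the function name `exclusive_domains` are invented for illustration.

```python
from collections import Counter


def exclusive_domains(domains_by_source: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each source, return the domains that no other source reports."""
    seen_in = Counter(
        domain for domains in domains_by_source.values() for domain in set(domains)
    )
    return {
        source: {domain for domain in domains if seen_in[domain] == 1}
        for source, domains in domains_by_source.items()
    }


# Hypothetical sample; only 'forum.m.m.biz' comes from the example in the text.
sample = {
    "google_wmt": {"forum.m.m.biz", "shared.example"},
    "bing_wmt": {"shared.example", "bing-only.example"},
    "ahrefs": {"shared.example"},
}
for source, domains in exclusive_domains(sample).items():
    print(source, "->", sorted(domains))
```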
I did a comparison of how unique each linking domain in the profile was, and if you’re interested in these sorts of numbers, here are the results:
To make the data a little clearer, here are how many totally unique, live, domains each data source provided:
This means that without downloading Bing Webmaster tools I would absolutely have missed 1,070 domains from my link analysis. That’s around 37% of the whole profile!
Following this line of thought then, what proportion of the whole profile would be covered if you only used the free tools, Google and Bing Webmaster tools?
Would anyone reasonably have to pay money to analyse their own site at all?
I adapted my formulas and found very clear answers to both those questions:
What if you only used free link data?
In this scenario, if I had only used Bing and Google Webmaster Tools I would have covered a significant proportion of the whole profile. Here is a table of how much each source covered individually:
Using each source on its own would still give you a significant amount of the data needed, but using them together you can cover 82.6% of the whole backlink profile.
(That’s adding Google Unique Domains + Bing Unique Domains + the 587 domains present in both profiles)
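The addition in brackets is the usual way of sizing two overlapping sets: domains only in Google, plus domains only in Bing, plus domains in both. A short sketch with made-up domains shows that this matches a plain set union. Here "only" means relative to the two free sources, which is an assumption about how the figures were combined, and the names `combined_coverage` and `sample` are hypothetical.

```python
def combined_coverage(domains_by_source: dict[str, set[str]], chosen: list[str]) -> float:
    """Share of the whole profile covered by the chosen sources taken together."""
    all_domains: set[str] = set().union(*domains_by_source.values())
    covered: set[str] = set().union(*(domains_by_source[name] for name in chosen))
    return len(covered) / len(all_domains) if all_domains else 0.0


# Hypothetical live domains per source.
sample = {
    "google_wmt": {"g1.example", "both1.example", "both2.example"},
    "bing_wmt": {"b1.example", "b2.example", "both1.example", "both2.example"},
    "paid_tool": {"p1.example", "both1.example"},
}
google, bing = sample["google_wmt"], sample["bing_wmt"]

only_google = len(google - bing)   # in Google but not Bing
only_bing = len(bing - google)     # in Bing but not Google
in_both = len(google & bing)       # present in both downloads

# The three parts add up to the size of the union of the two free sources.
free_total = only_google + only_bing + in_both
print(free_total == len(google | bing))
print(f"{combined_coverage(sample, ['google_wmt', 'bing_wmt']):.1%} of the profile")
```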
What does my paid link data cover?
Surprisingly, despite having huge indexes of their own, the paid link sources would not have covered that much of the profile. If you were to use only these paid sources you would struggle to identify all the spammy links in the profile. It gets slightly complicated looking at the numbers, but here’s a representative table:
In fact, if you were to combine just these sources (say you’re working on a competitor’s site and don’t have access to sensitive webmaster data) – then you could only hope to cover around 1,424 domains, 48.7% of the total profile.
(Or at least with this scenario.)
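As a quick sanity check on how these percentages fit together, the quoted figures can be run through a few lines of Python. This assumes that the 48.7% and the 37% are both measured against the same total of live linking domains, which is never stated outright. The implied total of roughly 2,924 is derived here and does not appear in the original data.

```python
# Figures quoted in the article.
bing_only_domains = 1070   # live domains found only via Bing Webmaster Tools
paid_domains = 1424        # live domains covered by the paid sources combined
paid_share = 0.487         # the 48.7% of the profile those paid sources cover

# Assumption: both percentages share one denominator (total live linking domains).
implied_total = round(paid_domains / paid_share)
bing_only_share = bing_only_domains / implied_total

print(f"Implied live linking domains: {implied_total}")
print(f"Bing-only share of the profile: {bing_only_share:.1%}")
```

Under that assumption the Bing-only share works out at about 36.6%, which is consistent with the "around 37%" quoted earlier.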
Don’t get me wrong, paid data is great. It provides you with a bundle of different stats about each link that the free tools wouldn’t ordinarily cover. Perfect for general tasks like link prospecting – especially if it’s on a site that you don’t own, and don’t have webmaster access to.
If you have a penalty of some kind, the most cost-effective thing you can do is hire a professional with the access and ability to compile multiple sources, scour them and manually identify your problem links.
And if they’re really good, they can remove those links for you as well.
I appreciate that this is just an isolated link profile and does not represent average crawling behaviour. I’ll carry out a similar analysis on a number of profiles in future to see whether these conclusions were fair. This example is meant to point out that when it comes to link data, quantity is certainly not quality – and topline stats can be very misleading if you can’t tell which links are live or dead.