Link analysis going Crazy. Next Linkscape update. Multiple Problems
-
If you have concerns about changes in data with the latest update, the quickest and most effective way of getting answers is to email the Help Team directly at help [at] seomoz.org.
Moz Staff do not routinely read the Q&A, so they are unlikely to be aware of issues unless they are specifically alerted to a thread by an Associate who happens to see it.
-
We need to make them aware, because there are this many unhappy clients, and I am sure there will be a lot more once everyone wakes up and logs on this morning. They need to sort this out, as it may become really, really bad for them.
-
Nice one. Yeah, tweet them and make them aware. I'll do the same too.
-
I am not disputing that SEOmoz needs to be alerted to an issue.
My point was that an email to the Help Team when this thread was originated 8 hours ago would have alerted SEOmoz that there was an issue much more quickly.
Sha
-
I emailed them 3 hours before I started this post. Reliable service, huh?
-
For updates on this issue, please follow @SEOmoz on Twitter.
Tweeted in the past few minutes:
Good morning everyone. We've seen the tweets and Q&A about OSE and will update here with information soon.
-
A slight correction is that the help desk staff doesn't routinely read the Q&A, but other SEOmoz staff and associates do read Q&A. I personally try to read every new question that comes in, but am myself on Pacific time and don't see things right away when they come in overnight. Emailing help at seomoz.org is the best way to report a problem like this.
An OSE engineer did just come into the office and is aware of these Q&A threads and will be responding soon. We're so sorry that there have been problems, and we're working to resolve them as soon as possible. Sha Menz, thank you so much for helping out here.
-
Hi everyone,
First, there was an update last night, and I believe we've adjusted the calendars now.
We're actively tracking down what happened. What the team really needs now is the list of affected domains. If you could respond here with the domains involved (if you're allowed to share them), that would be great. Otherwise, please send an email to help@seomoz.org with "OSE Domains" in the subject line so they can filter the incoming messages that way.
Thanks for your patience everyone!
-
My domain was affected: www.completeoffice.co.uk. Thanks.
-
Hi Keri, Thanks for your response. The domain involved for me is confetti.co.uk. Thanks again, Brendan.
-
Hey gang - there's a thread going around the SEOmoz engineering + help teams on this topic today. We're researching what happened right now, and Kate Matsudaira, our VP Engineering, has promised to leave a reply once she's got the full story. We'll try to be as transparent as possible here and as fast as we can as well, but Linkscape investigation can take some time due to the massive complexity of the system.
Thanks for posting responses and please do keep suggesting sites we're missing, pages we might not have crawled but should have, large drops in metrics (particularly if/when they're outliers to the rest of your competitors/other sites in your sphere).
Thanks much!
Rand
p.s. Normally, I'd be much more involved in this myself, but today's my anniversary with my wife and we're on vacation in Southern Oregon. Don't worry, though, I only take one each year, so typically I'm better able to respond fast.

-
Hi everyone!
I just wanted to add a quick response to shed a bit more light on the situation.
Last year we started on a project to drastically improve our index. The first part of that was to make our crawler discover more of the web - this included crawling deeper on domains, discovering more links faster (freshness), and containing more links overall.
Background
To understand the changes, it might help if I explain how our crawler used to work and how we changed.
Our crawler used to crawl the web (for 3-4 weeks), then we would compute the link graph and create all the lists of links, and metrics you see in Open Site Explorer - this is what we called processing (and it would take 2-3 weeks). As part of processing we would select the top 10 billion urls to crawl, and then start crawling those.
The problem with this system was that the data could be 7-8 weeks old (crawling time + processing + deployment to the API and OSE). It also wasn't recursive - meaning that we would only discover new links when we did the processing of that crawl, so it could take us several months before we would see new links that were deeper in domains.
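As a rough illustration of the arithmetic above (a sketch only, not SEOmoz tooling): adding the quoted crawl and processing ranges gives 5-7 weeks before deployment even starts. The deployment time is not stated in this post, so the sketch leaves it out. The helper name `add_ranges` is hypothetical.

```python
# Stage durations in weeks as (min, max), taken from the figures quoted above.
CRAWL_WEEKS = (3, 4)
PROCESSING_WEEKS = (2, 3)


def add_ranges(*ranges):
    """Sum (min, max) ranges stage by stage (hypothetical helper)."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


# Age of the data by the time processing finishes, before deployment.
pre_deploy = add_ranges(CRAWL_WEEKS, PROCESSING_WEEKS)
print(pre_deploy)  # (5, 7)
```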
The changes
We modified our crawler so we were crawling all the time - we crawl sites every day, or week, or month - based on authority. As we crawl those sites, any new links that we find are added to one of the buckets, and will typically be crawled within that same index. This is exciting because we can go deeper, discover more links, and produce a higher quality index. The other benefit is that since we are crawling all the time, we can just take a snapshot of that crawl and run processing - without waiting for the last round of processing to finish - and this means we can update the index more often.
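To make the bucket idea concrete, here is a minimal sketch of authority-based recrawl scheduling. It is only an illustration of the description above, not the actual crawler: the thresholds, the 30-day "month", and all function names are assumptions.

```python
from datetime import date, timedelta

# Hypothetical authority cut-offs; the real values are not given in this thread.
DAILY_MIN_AUTHORITY = 80
WEEKLY_MIN_AUTHORITY = 40


def recrawl_interval(authority):
    """Return how often a site is recrawled, based on its authority score."""
    if authority >= DAILY_MIN_AUTHORITY:
        return timedelta(days=1)
    if authority >= WEEKLY_MIN_AUTHORITY:
        return timedelta(weeks=1)
    return timedelta(days=30)  # "monthly", approximated as 30 days


def due_today(sites, today):
    """Return the domains whose next crawl date has arrived, sorted by name.

    `sites` maps a domain to an (authority, last_crawled) pair.
    """
    return sorted(
        domain
        for domain, (authority, last_crawled) in sites.items()
        if last_crawled + recrawl_interval(authority) <= today
    )


# Made-up sites and dates, purely for illustration.
sites = {
    "example.com": (90, date(2020, 1, 1)),  # daily bucket
    "example.org": (50, date(2020, 1, 1)),  # weekly bucket
    "example.net": (10, date(2020, 1, 1)),  # monthly bucket
}
print(due_today(sites, date(2020, 1, 2)))  # ['example.com']
print(due_today(sites, date(2020, 1, 8)))  # ['example.com', 'example.org']
```

A newly discovered link would simply be added to `sites` with a starting authority, so it falls due inside the same crawl rather than waiting for the next processing run.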
However, in June, we had a problem with the old crawlers, and we had to roll out our new version of the crawl and index with the OSE launch on July 27th. So even though our testing looked good when we released the new index, and correlations were higher than the old crawl, we got complaints about things that were wrong.
The issues
Binary files were in the index - There are normally only supposed to be links in the index, but because the new crawler went very deep on some domains we started discovering all sorts of binary files, which when parsed, produced lots of weird links. So domains had all these links from sites that didn't link to them. We fixed this issue, and this is the first index with the fix.
We went too deep on big domains - There are a lot of knobs to turn on the new crawlers - from the number of sites we crawl daily/weekly/monthly to how many links we keep for different domains. One of the first things we noticed with this new crawl was that we had fewer domains in our index. So we dialed down how many URLs could come from a domain - and this new index also contains that change.
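The two fixes above can be sketched roughly as follows. This is an assumption-laden illustration, not the Linkscape code: the allowed content types, the cap of 2, and the function names are all made up, and the cap is applied per hostname rather than per registered domain to keep the example short.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical values; the thread does not give the real rules or limits.
PARSEABLE_TYPES = {"text/html", "application/xhtml+xml"}
MAX_URLS_PER_DOMAIN = 2


def is_parseable(content_type):
    """True if the Content-Type header names an HTML-like document."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PARSEABLE_TYPES


def select_urls(candidates, max_per_domain=MAX_URLS_PER_DOMAIN):
    """Keep HTML pages only, and at most `max_per_domain` URLs per host.

    `candidates` is an iterable of (url, content_type) pairs, assumed to be
    ordered with the most important URLs first.
    """
    kept = []
    per_host = Counter()
    for url, content_type in candidates:
        if not is_parseable(content_type):
            continue  # skip binaries such as archives, PDFs, or images
        host = urlsplit(url).hostname
        if host is None or per_host[host] >= max_per_domain:
            continue  # this host has already used up its allowance
        per_host[host] += 1
        kept.append(url)
    return kept


candidates = [
    ("http://big.example/a", "text/html; charset=utf-8"),
    ("http://big.example/file.zip", "application/zip"),
    ("http://big.example/b", "text/html"),
    ("http://big.example/c", "text/html"),
    ("http://small.example/", "TEXT/HTML"),
]
print(select_urls(candidates))
# ['http://big.example/a', 'http://big.example/b', 'http://small.example/']
```

Lowering the per-host cap leaves room in a fixed-size index for more distinct domains, which is the trade-off described above.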
What we are doing
We recognize that all of you depend on this data. And we take the index quality very seriously.
We have already made a lot of other changes, increasing the overall size and adjusting how we crawl. However, since it still takes 2-4 weeks to process an index, some of those changes won't be seen for another 2-4 weeks yet.
We are also working on an updated, higher correlating Page Authority/Domain Authority that should be out in a month or two - but also may jump around a bit.
What you can do
Definitely keep sending us feedback. It really helps us understand where we may have missed in our testing, and what we can do to fix it.
And thanks again for your patience - we really want to deliver the best possible Linkscape for you, and I assure you the team is working nights and weekends to address these concerns.
And if anyone has questions you can always email me or our help team (which tends to respond to emails much faster), as all of us care a lot and really want to hear your feedback.
Thanks again,
Kate
-
Hey Rand, Thanks for the response. I think we all appreciate the complexities involved. You go and have yourself a good anniversary. Cheers, Brendan.
-
And btw, we are still investigating the differences between indexes and will continue to update this thread as we have more information.
Thanks!
-
Happy Anniversary, what are you doing replying to posts!?

I'm sure the team will work it out.
-
Thanks for the detailed info! It'd be great to receive updates like this.
-
Thank you guys for the follow-up and explanation, but as some of the PRO members have already mentioned in this thread, most of us are not running large sites or using OSE for large sites. Seeing such huge changes without a heads-up is something that has taken many of us by surprise. There is not one single metric in the latest update that is not way off compared to the previous updates for our site. Some of my competitors have seen many changes, but some have not. What should I trust, then?
Let me be honest with you guys: the responses you have provided so far have not been satisfactory.
I know you guys are working hard at this, but I can't stress enough how urgently you need to come up with an "official" point of view moving forward.
Thanks,
Olivier
-
Thanks Gyorgy - I am glad you found it useful.
For what it is worth, we have another index update planned in 2-3 weeks, and then another 3 weeks out - each index should get progressively better.
The team is working overtime here though - the hard part is that the changes we make now can take 2+ months to propagate.
All the domains people sent us yesterday helped us identify another bug with our index, so we have a fix for that too. But since it takes 3-5 weeks to crawl, and then another 2-3 weeks to process you won't be able to see those improvements for another 2+ months. However, by December, the index will be better than it has ever been - with more domains and links.
Thanks again for your patience and all the details - it has really helped us track down issues.
-
Rand,
Thanks for keeping us up to date with your latest post, Linkscape September Update in the SEOmoz Blog.
Sha
-
Wow! You caught that fast
Thanks Sha - glad we can keep in close touch with everyone around this issue.