Welcome to the Q&A Forum

carinoverturf

Sounds great, Mike! Just send them over and I'll take a look!

Thanks,
Carin

carinoverturf

Hey Mike,

I'm sorry you're so frustrated with the issues in the index lately - I know it's an inconvenience, but I assure you the team has been working all hours to work out these kinks!

In fact, after many nights and weekends sacrificed, we'll probably be early with our next release. The bugs will be much less evident in the next index, since the stale crawl data will be dropped from it.

I know that doesn't help you right now. Can you send me some details on the corruption you're seeing? A full OSE link with all the parameters would be perfect, as well as a CSV if you have one. If you don't feel comfortable posting in Q&A, please email me at carin@seomoz.org.

This sounds like the same bug we saw emerge in this index, and have since fixed, but I want to make sure that is the case.

Again, I'm really sorry for the inconvenience and frustration this is causing - we are working hard at ironing out these final issues!

Thanks,
Carin

carinoverturf

This just came to our attention yesterday and our engineers have been investigating over the weekend. It appears to be fallout from the parsing bug that caused the initial delay of this index launch.

We're still investigating, but we do have another index in the works that doesn't have the parsing bug. We hope to have it ready in the next two weeks. In the meantime, we're looking into how we can remedy the anchor text data in the current index.

If you would like to read more about the parsing bug, Phil provided a great explanation in the forum article here.

Sorry for the inconvenience this will cause - we're looking into ways to remedy this as soon as we can!

Thanks,
Carin

carinoverturf

Yep, David is correct - that call is only available with a paid API plan. If you are interested in a paid plan, check out the different tiers on our Mozscape API page.

Thanks!
Carin

carinoverturf

No problem! I'm so sorry for the inconvenience!

I just pushed the remaining pending reports through, so I think you should be set, but if you continue to run into any problems, just let me know!

carinoverturf

Hey there!

Haha - we ran into a problem on Monday night when one of the machines fell over, causing a huge backlog to pile up. We were able to get things back on track yesterday and churn through the backed-up reports, but with yesterday's index launch, we're seeing a bit of a backlog again this morning.

We are getting a monster machine up right now to speed through this! Once things calm down you should see these come through. It looks like you have about 8 pending - I'll keep an eye on them to make sure they go through!

Thanks,
Carin

carinoverturf

Hey there!

The Top 500 list is compiled from our Mozscape (formerly Linkscape) link data, which is gathered by our crawlers. Unfortunately, we don't crawl Facebook, since its pages are served over https.

Adding the ability to crawl https is on our road map, however!

Thanks,
Carin

carinoverturf

Hey Ravi,

Sorry for the delayed response - I wanted to follow up with the engineers to see if they had any suggestions for you.

They agreed that a Limit parameter of 1,000 might be too large to process. Have you tried lowering it to 300 or even 500? Do you see better results with a lower limit?

Our system times out at about 60 seconds, so I'm not sure the hanging is on our end. If lowering the limit doesn't help, you might want to end the request after about a minute. Requests that take too long will sometimes time out but work fine on a retry, since some data will be cached from the previous request.
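If it helps, here's a minimal Python sketch of that approach: cap each request at roughly 60 seconds and retry a couple of times on a timeout. The `fetch` callable, `fetch_with_retry`, and the retry/backoff values are hypothetical placeholders for however your own client sends the request; this isn't an official Mozscape client.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[int, float], T],
    limit: int = 500,
    timeout: float = 60.0,
    retries: int = 2,
    backoff: float = 1.0,
) -> T:
    """Call fetch(limit, timeout), retrying when it raises TimeoutError.

    `fetch` is a hypothetical, caller-supplied function that sends one API
    request using the given Limit value and gives up after `timeout` seconds.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error: Optional[TimeoutError] = None
    for attempt in range(retries + 1):
        try:
            return fetch(limit, timeout)
        except TimeoutError as err:
            last_error = err
            # A retry may succeed because some data from the previous
            # request can already be cached on the server side.
            if attempt < retries:
                time.sleep(backoff)
    assert last_error is not None  # only reached after every attempt timed out
    raise last_error
```

Keeping the same Limit on the retry is deliberate, since a retry can succeed once part of the data is cached; if retries keep timing out, dropping `limit` to 300 is the next thing to try.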

I hope this information is helpful, but let me know if you're still experiencing issues!

Thanks,
Carin

carinoverturf

Hey there!

Just want to make sure I'm understanding what you're trying to do - basically you're hoping to use jQuery to send requests to the API and then fetch the JSON results?

What type of queries are you sending the API? What would the API query look like?

Also, we do have the API Help Forums to post in or search as well - not sure if you've explored these pages, but there could be some helpful information for you there as well!

Thanks!
Carin

carinoverturf

Hey! This is an issue I haven't heard of before - would you be able to provide any more information, like an example query to the API and some of the pages you are seeing hang?

Thanks!

Carin

carinoverturf

Hey guys,

Yep, Keri is correct. Unfortunately, we found a bug in our July index with our new crawlers - they were crawling binary files as if they were links, and since these are not normal links, the crawler couldn't handle them very well.

We have made some updates to our crawling so it will go deeper into sites. These odd inbound links from high-authority sites appear because the crawler is reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler counts a binary file as a link, but the larger issue is that the crawler doesn’t really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains, which is why you’re seeing inbound links to pages that don’t really exist.

There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles them. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly; however, this update will need a few more weeks to propagate, so the fix probably won’t be visible until the next update, in late September. Our improvements should catch most of the issues, but there could still be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!
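To illustrate the first step (changing how the crawler sees these file types), here's a simplified Python sketch that treats links to common download extensions as files rather than pages. The extension list and function names are hypothetical, and this isn't how our crawler is actually implemented.

```python
from posixpath import splitext
from typing import Iterable, List
from urllib.parse import urlparse

# Hypothetical set of download (binary) extensions, for illustration only.
BINARY_EXTENSIONS = frozenset(
    {".zip", ".rar", ".gz", ".exe", ".dmg", ".iso", ".mp3", ".mp4"}
)


def is_binary_link(url: str) -> bool:
    """Guess from the URL path's extension whether a link points to a file."""
    path = urlparse(url).path
    extension = splitext(path)[1].lower()
    return extension in BINARY_EXTENSIONS


def countable_links(urls: Iterable[str]) -> List[str]:
    """Keep only links that should be counted as links to pages."""
    return [url for url in urls if not is_binary_link(url)]
```

Checking the extension alone is a rough heuristic: a URL like `download.php?file=a.zip` has no telling extension, so a real crawler would also need to look at the response's Content-Type.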

The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are working through that issue right now. We’re doing everything we can to resolve this bug, as we know it is alarming to see these phantom inbound links.

Thanks for your patience!
Carin

carinoverturf

Hey Zack, I saw the ticket you filed was answered by Aaron, but I just wanted to follow up with you as well. We have made some really exciting changes to the crawler, but, unfortunately, there is a pretty obvious bug as well...

The “questionable” links coming from the Internet Wild West appear because the crawler is reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler counts a binary file as a link, but the larger issue is that the crawler doesn’t really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is probably what you're seeing with all the crazy links from China and Russia that don't actually link to the site you're researching.

There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles them. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly; however, this update will need about a month to propagate. The fix for this issue probably won’t be visible for two more updates, meaning late September. Our improvements should catch most of the issues, but there could still be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!

The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are working through that issue right now. We’re doing everything we can to resolve this bug, as we know it is alarming to see these “questionable” links associated with your sites.

I hope this helps, and thanks so much for being patient :)

Thanks,
Carin

carinoverturf

Hey!

That sounds like odd behavior and I don't think I've heard of that happening before. I'd love to dig a bit deeper to see what's going on.

Would you be able to send me the pages you are searching? I assume you are experiencing this in Open Site Explorer?

If you would prefer not to post the URLs in this forum, feel free to email me directly at carin@seomoz.org!

Thanks,
Carin

carinoverturf

Hey Berend van Bon,

The Back Burner has an interesting thought on the decrease in your DA. It's tough to say for sure why your DA has dropped - it could be due to the updated DA model or we might not have crawled as many sites linking to you. If the latter is the case, you would see a decrease in link count, thereby lowering your DA.

Most likely if you see a drop in your linking domains and/or URL counts, this was what caused your DA to decrease. If these metrics look about the same as last index, the drop would probably be due to the updated DA model.
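If it's useful, that rule of thumb can be written out as a small Python check. The `IndexSnapshot` fields and the 5% tolerance are hypothetical choices for illustration; you'd fill them in from the linking domain and URL counts you see for your site in each index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSnapshot:
    """Link counts for one site in one index (hypothetical structure)."""
    linking_domains: int
    linking_urls: int


def _dropped(before: int, after: int, tolerance: float) -> bool:
    """True if `after` fell more than `tolerance` (a fraction) below `before`."""
    return before > 0 and (before - after) / before > tolerance


def likely_da_drop_cause(
    previous: IndexSnapshot, current: IndexSnapshot, tolerance: float = 0.05
) -> str:
    """Apply the rule of thumb: fewer links crawled, otherwise the model update."""
    if _dropped(previous.linking_domains, current.linking_domains, tolerance) or _dropped(
        previous.linking_urls, current.linking_urls, tolerance
    ):
        return "fewer linking domains/URLs crawled"
    return "updated DA model"
```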

I hope that helps, but let me know if you have any other questions!

Thanks,

Carin

carinoverturf

Thanks, Ryan, for the great answer! We do have the new social features in Open Site Explorer that display Facebook shares, collected from Facebook's FQL API.

We are also developing a new tool in the PRO app that offers Social Analytics metrics. Here is Rand's blog post about it!

Hope that helps, but let me know if you have any more questions!

Thanks,

Carin

carinoverturf

Hey guys!

Keri is right - we have done some updating with our crawler, and this index represents the newest version - unfortunately with a few hiccups. People seem to be seeing two issues with this new index: link counts and domain authorities are going up or down considerably, and there is an increase in "questionable" inbound links.

Both issues have the same root cause: our new crawler is built to be fresher, but it is going deeper into domains and, unfortunately, not visiting as many domains. Domains with a high MozRank are getting crawled deeper, but domains with middle to lower MozRanks are not getting crawled.

Our top priority now is to get the domain diversity back up to or better than that of our last update as was originally designed. It's fixable and we will be focusing all efforts on this.

Previously, crawling worked by selecting a list of the top MozRank URLs (around 10B) and then crawling one page from each of them. Now we are crawling links as we discover them, and crawling high-MozRank sites daily, weekly or monthly. The advantage of the new crawlers is that we are crawling all the time, so we will have fresher data, and as links are added, we are much more likely to discover these deeper links. The new crawl had 59B URLs, a lot more than the previous 42B; however, more of these URLs come from the same domains.
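For anyone curious, here's a rough Python sketch of the two ideas in that paragraph: recrawling sites on a schedule tied to MozRank, and why more URLs can still mean fewer domains. The MozRank cut-offs, the 30-day "monthly" interval, and the function names are all hypothetical; they're not the crawler's actual settings.

```python
from datetime import datetime, timedelta
from typing import Iterable
from urllib.parse import urlparse


def recrawl_interval(mozrank: float) -> timedelta:
    """Map MozRank to a recrawl interval (hypothetical cut-offs)."""
    if mozrank >= 7.0:
        return timedelta(days=1)  # daily
    if mozrank >= 5.0:
        return timedelta(weeks=1)  # weekly
    return timedelta(days=30)  # roughly monthly


def due_for_recrawl(last_crawled: datetime, mozrank: float, now: datetime) -> bool:
    """True once the site's recrawl interval has elapsed."""
    return now - last_crawled >= recrawl_interval(mozrank)


def distinct_domains(urls: Iterable[str]) -> int:
    """Count distinct hostnames, a simple proxy for domain diversity."""
    return len({urlparse(url).hostname for url in urls})
```

A deep crawl of four URLs on one site has more URLs but fewer domains than a broad crawl of three URLs on three sites, which is the 59B-versus-42B situation in miniature.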

The "questionable" links appear because the crawler is reaching deeper into domains, where there are more download links. We are currently looking into fixing this so these won't be counted as links. We'll let you know as soon as that issue is resolved!

We are really sorry for the inconvenience. Once we have this new crawler dialed in, it will provide much fresher and higher-quality data!!

Thanks,

Carin

carinoverturf

Hey guys,

The issue you are seeing is due to the new OSE update. We have done some updating with our crawler, and this index represents the newest version - sadly, with a few bugs. We are looking into this issue and hope to have it resolved as soon as possible!

The newest version of our crawler is built to be fresher, but it is also going much deeper into high MozRank pages. This bug has probably always existed, but has never been obvious since we weren't crawling as deep into domains where there are more download links. We are currently looking into fixing this so these won't be counted as inbound links.

I'm so sorry for the inconvenience - once we get this new version of the crawler dialed in and smoothed out, it will provide you guys a much fresher and higher-quality index!

There is another thread regarding this topic, so check it out if you want more information on what is going on with this index.

Thanks,

Carin

carinoverturf

Hey John,

We have made some updates to our crawling so it will go deeper into sites, and this is the first launch including the new metrics. However, we've discovered a bug in the updated crawler. The first issue is that the crawler counts a binary file as a link, but the larger issue is that the crawler doesn’t really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is probably why you’re seeing your competitor have these .edu links - they're probably incorrectly associated with their site.

There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles them. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly; however, this update will need about a month to propagate. The fix for this issue probably won’t be visible for two more updates, meaning late September. Our improvements should catch most of the issues, but there could still be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!

The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are working through that issue right now. We’re doing everything we can to resolve this bug, as we know it is alarming to see these “questionable” links associated with either your site or your competitors' sites.

I hope this helps answer your questions around these .edu links, but let me know if you have any more questions!

Thanks,
Carin

carinoverturf

Hey guys!

I wanted to jump in here and give you all the latest on the CSV download issue. We were finally able to clear the backlog of CSV reports at about 7 pm PST last night; however, about 3,000 jobs were still in a finalizing status and just hanging. We were able to work out a quick fix to get these last remaining reports out today. The fix was such a great idea that we've decided to make it a permanent feature in Open Site Explorer!

Since the new launch of OSE, users have reported that their requested reports sometimes hang during peak times. The fix we added today helps in those scenarios by re-queueing a report if it has been hanging for a long period of time.
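The idea behind that kind of fix can be sketched roughly like this in Python: periodically scan report jobs and put any that have been stuck in a running or finalizing state for too long back on the queue. The `ReportJob` fields, status names, and 30-minute threshold are hypothetical, not Open Site Explorer's actual implementation.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Statuses in which a job is in progress and can therefore hang (hypothetical names).
IN_PROGRESS = frozenset({"running", "finalizing"})


@dataclass
class ReportJob:
    job_id: int
    status: str
    last_progress: datetime


def requeue_hanging(
    jobs: Iterable[ReportJob],
    queue: deque,
    now: datetime,
    max_stall: timedelta = timedelta(minutes=30),
) -> List[int]:
    """Re-queue in-progress jobs that have made no progress for over `max_stall`."""
    requeued = []
    for job in jobs:
        if job.status in IN_PROGRESS and now - job.last_progress > max_stall:
            # Reset the job so a worker picks it up again from the queue.
            job.status = "pending"
            job.last_progress = now
            queue.append(job.job_id)
            requeued.append(job.job_id)
    return requeued
```

Jobs that are already pending or done are left alone, so running the scan repeatedly won't queue the same report twice.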

I'm hoping this helped get the last of the missing reports out, but please let me know if you guys are still seeing pending or hanging requests.

Thanks!

Carin

Blog Posts

Another March Mozscape Index is Live!

Announcing the March Mozscape Index!

The Second February Mozscape Index is Live!

February Mozscape Index is Live

Another January Mozscape Index Has Been Released!

January Mozscape Index is Live!

December Mozscape Index is Live!

Another November Index is Live!

November Mozscape Index is Live!
