Hi Marc - if you're worried about them being potentially problematic in Google, just disavow them via Google Search Console (aka Webmaster Tools), then you can point the domains to any page or part of your site you want without concern. It's likely not a big issue regardless, but if you want to be sure, that would be how I'd do it.
Posts made by randfish
-
RE: Own Domains shown as Spam Links in Open Site Explorer
-
RE: September's Mozscape Update Broke; We're Building a New Index
Hi Joe - fair question.
The basic story is - what the other link indices do (Ahrefs and Majestic) is unprocessed link crawling and serving. That's hard, but not really a problem for us. We do it fairly easily inside the "Just Discovered Links" tab. The problem is really with our metrics, which is what makes us unique and, IMO, uniquely useful.
But, metrics like MozRank, MozTrust, Spam Score, Page Authority, Domain Authority, etc. require processing - meaning all the links need to be loaded into a series of high-powered machines and iterated on, a la the PageRank patent paper (though there are other ways we do this for other kinds of metrics). Therein lies the rub. It's really, really hard to do this - it takes lots of smart computer science folks, requires tons of powerful machines, and takes a LONG time (17+ days of processing at minimum to get all our metrics into API-shippable format). And when things break, what's worse is that it's very hard to stop and restart without losing work, and very hard to check our work by watching how processing is going while it's running.
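To make the "iterated on, a la the PageRank patent paper" point concrete, here's a minimal sketch (this is illustrative only, not Moz's actual system or metrics) of why this kind of processing can't ship partial results: in a PageRank-style power iteration, every page's score depends on every other page's score, so the whole link graph has to be loaded and iterated to completion before any score is final.

```python
# Minimal PageRank-style power iteration (illustrative, not Moz's code).
# Every score depends on every other score, so the full link graph must be
# processed end-to-end before ANY metric is shippable.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: "c" is linked to by both "a" and "b", so it ends up on top.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
```

Real link-graph processing adds many complications (dangling pages, spam filtering, distributed computation across machines), but the core property is the same: it's an all-or-nothing batch job.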
This has been the weakness and big challenge of Mozscape the last few years, and why we've been trying to build a new, realtime version of the index that can process these metrics through newer, more sophisticated, predictive systems. It's been a huge struggle for us, but we're doing our best to improve and get back to a consistent, good place while we finish that new version.
tl;dr Moz's index isn't like others due to our metrics, which take lots of weird/different types of work, hence buying/partnering w/ other indices wouldn't make much sense at the moment.
-
RE: September's Mozscape Update Broke; We're Building a New Index
Two potential solutions for you - 1) watch "Just Discovered Links" in Open Site Explorer - that tab will still be showing all the links we find, just without the metrics. And 2) Check out Fresh Web Explorer - it will only show you links from blogs, news sites, and other things that have feeds, but it's one of the sources I pay attention to most, and you can set up good alerts, too.
-
RE: September's Mozscape Update Broke; We're Building a New Index
Yeah - the new links you see via "just discovered" will take longer to be in the main index and impact metrics like MozRank, Page Authority, Domain Authority, etc. It's not that they're not picked up or not searched, but that they don't yet impact the metrics.
And yes - will check out the other question now!
-
RE: September's Mozscape Update Broke; We're Building a New Index
Hi Will - that's not entirely how I'd frame it. Mozscape's metrics will slowly, over time, degrade in their ability to predict rankings, but it's not as though exactly 31 days after the last update, all the metrics or data is useless. We've had delays before of 60-90+ days (embarrassing I know) and the metrics and link data still applied in those instances, though correlations did slowly get worse.
The best way I can put it is - our index's data won't be as good as it normally is for the next 20-30 days, though it's better now than it will be in 10 days and was better 10 days ago than it is today. It's a gradual decline as the web's link structure changes shape and as new sites and pages come into Google's index that we don't account for.
-
September's Mozscape Update Broke; We're Building a New Index
Hey gang,
I hate to write to you all again with more bad news, but such is life. Our big data team produced an index this week but, upon analysis, found that our crawlers had encountered a massive number of non-200 URLs, which meant this index was not only smaller, but also weirdly biased. PA and DA scores were way off, coverage of the right URLs went haywire, and the metrics we use to gauge quality told us this index simply was not good enough to launch. Thus, we're in the process of rebuilding an index as fast as possible, but this takes, at minimum, 19-20 days, and may take as long as 30 days.
This sucks. There's no excuse. We need to do better and we owe all of you and all of the folks who use Mozscape better, more reliable updates. I'm embarrassed and so is the team. We all want to deliver the best product, but we continue to find problems we didn't account for, and have to go back and build systems in our software to look for them.
In the spirit of transparency (not as an excuse), the problem appears to be a large number of new subdomains that found their way into our crawlers and exposed us to issues fetching robots.txt files that timed out and stalled our crawlers. In addition, some new portions of the link graph we crawled exposed us to websites/pages that we need to find ways to exclude, as these abuse our metrics for prioritizing crawls (aka PageRank, much like Google, but they're obviously much more sophisticated and experienced with this) and bias us to junky stuff which keeps us from getting to the good stuff we need.
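The robots.txt failure mode described above can be sketched in a few lines. This is a hedged, illustrative example (not Moz's actual crawler code, and the function names are my own): a crawler that fetches robots.txt without a hard deadline can stall indefinitely behind one slow host, so the fix is a per-fetch timeout that defers the host instead of blocking the queue.

```python
# Illustrative sketch: fetch robots.txt with a hard timeout so one slow
# host can't stall the whole crawl queue. Not Moz's actual crawler.
import urllib.robotparser
import urllib.request

def parse_robots(text):
    """Parse robots.txt text into a queryable RobotFileParser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser

def fetch_robots(domain, timeout_seconds=5):
    """Fetch a domain's robots.txt with a deadline; defer the host on failure."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return parse_robots(resp.read().decode("utf-8", errors="replace"))
    except Exception:
        # Timed out or unreachable: skip this host for now and retry later,
        # rather than letting it block everything queued behind it.
        return None

rules = parse_robots("User-agent: *\nDisallow: /private/")
```

At Mozscape's scale, the deferral logic is of course far more involved (per-host backoff, retry budgets, and so on), but the principle is just "never wait forever on a single fetch."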
We have dozens of ideas to fix this, and we've managed to fix problems like this in the past (prior issues like .cn domains overwhelming our index, link wheels and webspam holes, etc plagued us and have been addressed, but every couple indices it seems we face a new challenge like this). Our biggest issue is one of monitoring and processing times. We don't see what's in a web index until it's finished processing, which means we don't know if we're building a good index until it's done. It's a lot of work to re-build the processing system so there can be visibility at checkpoints, but that appears to be necessary right now. Unfortunately, it takes time away from building the new, realtime version of our index (which is what we really want to finish and launch!). Such is the frustration of trying to tweak an old system while simultaneously working on a new, better one. Tradeoffs have to be made.
For now, we're prioritizing fixing the old Mozscape system, getting a new index out as soon as possible, and then working to improve visibility and our crawl rules.
I'm happy to answer any and all questions, and you have my deep, regretful apologies for once again letting you down. We will continue to do everything in our power to improve and fix these ongoing problems.
-
RE: In lieu of the canceled Moz Index update
I can't help but agree with you. Over the last few years, we've consistently had terrible delays releasing indexes, and despite a team of people way smarter and more talented than me working their tails off to get it right, we haven't had success making it work regularly yet.
This latest index cancellation is embarrassing and it sucks. We produced an index, but when we looked at it, a huge problem had arisen that we couldn't see until processing was complete (another problem with our indices is our inability to get a good sense of what they'll look like until they're done, which takes 20+ days of processing after a crawl). I'll detail that in a Q+A thread soon (once I get the full rundown and plan from our Big Data team) and then share around.
In any case, you have my sincere apologies and deep regrets. We'll keep trying to get this right, but just FYI - we've simultaneously been building a new index system that's more real-time (like Google's, Ahrefs, Majestic, etc) that can still calculate metrics like MozRank and Page Authority. We've made a lot of progress on it, but it's still probably 6+ months away from launching, so we'll have to deal with the old Mozscape system until then.
-
RE: Does duplicate content not concern Rand?
Hi Stephen - when it comes to blogs, especially Wordpress blogs with paginated categories, Google's gotten plenty good over the years at knowing that the full post is the correct version. The category pages on moz.com/rand don't show the full content of the post, don't earn the same links, and do link to the individual posts, so it's really not a concern to noindex them (and, in fact, it might prevent crawling/indexation that I want Google to be able to do).
e.g. I want Google to be able to index https://moz.com/rand/category/archives/startups/ and https://moz.com/rand/mixergy-interview-startup-marketing-reaching-early-adopters-burnout-more/ even though the category page has a small snippet from the Mixergy post.
In these kinds of cases, the right pages are ranking for the right queries, and Google's doing a good job of recognizing and differentiating categories vs. posts.
Hope that helps!
-
RE: Is there a good website builder that can gain links?
Hi Scott - it depends. You can use Google Search Console's preferred domain setting (https://support.google.com/webmasters/answer/44231?hl=en) to help Google choose between www and non-www, but if there are other parameters or versions of the page, you really want a canonical tag or a 301.
Given the limitations it sounds like GoDaddy is giving you around this stuff, I'd probably suggest moving to a different CMS/host. Better safe than sorry later.
-
RE: Is there a good website builder that can gain links?
Hi Scott - some website builders that might be worth investigating (and will almost definitely be more SEO-friendly than the experience you've described) include:
- http://www.wix.com/
- http://squarespace.com/
- https://www.drupal.org/
- https://wordpress.org/
- https://pagely.com/
I'm partial to WordPress (and there are lots of good hosting options) because of its flexibility, but there are plenty of benefits to the other platforms as well.
-
RE: Spam Score shows No Contact Info even though I have a Contact Page
Agreed - that looks off to me, too. Again, can't fix in this index, but hopefully the next one should rectify that issue.
-
RE: Spam Score shows No Contact Info even though I have a Contact Page
Hi mztobias - I think we just got that flat out wrong. Not sure why our crawler missed your contact page, but clearly it did. Hopefully in the next index, that will be rectified. I don't have the ability to manually edit the score/notation, but once we recrawl the site and update our index, it should be fixed.
Sorry about that!
-
RE: Advanced: SEO best practice for a large forum to minimise risk...?
Hi Seomvi - yes, definitely a challenging problem, especially since you're thinking preventative rather than reactive (which is very wise!).
My advice would be to consider creating some form of threshold for forum content before you expose it to Google. For example, you could have a litmus test that says: if a forum thread has <500 words or fewer than 2 unique replies, apply a <meta name="robots" content="noindex, follow"> tag in the page's <head>. In that fashion, you keep algorithms like Panda from perceiving your forum as having lots of thin content/low-value pages.
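That litmus test could be implemented server-side with just a few lines. Here's a hedged sketch (the function name and thresholds are illustrative, taken from the example numbers above, not a prescription):

```python
# Illustrative sketch of the "quality threshold before indexing" idea:
# thin threads get a noindex,follow meta tag; substantial ones stay indexable.

MIN_WORDS = 500            # example threshold from the advice above
MIN_UNIQUE_REPLIERS = 2    # example threshold from the advice above

def robots_meta_for_thread(word_count, unique_repliers):
    """Return the robots meta tag (if any) to emit in the thread's <head>."""
    if word_count < MIN_WORDS or unique_repliers < MIN_UNIQUE_REPLIERS:
        # Thin thread: keep it out of the index, but still let crawlers
        # follow its links so link equity flows normally.
        return '<meta name="robots" content="noindex, follow">'
    return ""  # substantial thread: indexable as normal

tag = robots_meta_for_thread(word_count=120, unique_repliers=1)
```

A nice property of this approach is that it's reversible: once a thread grows past the threshold, the tag simply stops being emitted and the page becomes indexable on the next crawl.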
For PR flow and crawl budget, I'd generally worry less. Google's gotten very adept at identifying forums, crawling them effectively, and understanding how to handle that type of content/link structure. That said, you might try using rel=prev/next to help with Google's crawling.
Wish you all the best!
-
RE: HTTPs on Internet Explorer 11
Interesting. If you share a screenshot of what you're seeing with the filters (IE 11 only, Bing traffic) here on Q+A as a discussion, you may get some more folks willing to share theirs as well. The Moz team would likely be happy to share ours.
-
RE: Google's 'related:' operator
With "related:" - you could say all the same things with Moz.com, and we also don't have the related results, but I doubt it's hurting our search rankings at all.
Regarding Dmoz - there's no particular way to get Google to recognize Dmoz listings and include them. They seem to do so sometimes and not others. You can actively prevent them from using that listing with the noodp tag, but there's no way to do the reverse and get them to pay attention. One thing you might try is making sure they've actually crawled & indexed the page on Dmoz with your listing recently. If they haven't (you can look at the cache date in Google's results), you might try linking to it, using "Fetch as Googlebot," etc.
-
RE: HTTPs on Internet Explorer 11
Hi DMMoon - it's possible that Bing will make changes that mess up how analytics counts their visits, but hard to say for sure. They have made those errors in the past. My advice is just to keep an eye on your analytics data as they make the changes, and if you see Bing traffic drop when they move to HTTPS, you've got a likely culprit! The SEO news world will probably report on it, too.
-
RE: Can Google crawl this type of content?
Hi Sarwan - if you're not seeing those in Google's index, it suggests they can't crawl/index that content. You can test it out via Fetch as Google, which will show you what Google's seeing. With video content, I might suggest using a text transcript, like Moz does on our Whiteboard Friday.
-
RE: Moz's official stance on Subdomain vs Subfolder - does it need updating?
Nope. We still don't believe them and still have overwhelming evidence that Google doesn't consistently treat all pages on multiple subdomains the same way they do URLs on the same subdomain. They've said for years that it doesn't matter, but the evidence and data are clear. Putting content on multiple subdomains will almost certainly cause it to perform worse in Google than keeping it all on the same subdomain.
The comments here show some nice examples of folks who have moved their content to a single sub/root domain and seen big traffic bumps from search as a result: https://moz.com/blog/subdomains-vs-subfolders-rel-canonical-vs-301-how-to-structure-links-optimally-for-seo-whiteboard-friday
-
RE: Why is OSE showing a higher Domain Authority than Moz?
Sorry if that wasn't clear! By "soon" I meant within the aforementioned 24-hour window. We fixed this ~11:45am today, and the collections are beginning at 4pm this afternoon, so by 4pm tomorrow, you should see data (I just talked to the team and they said it might take a bit more time than that - maybe 30-36 hours, depending on where your campaign sits in the queue of those needing updates).
-
RE: August 3rd Mozscape Index Update (our largest index, but nearly a month late)
Where are you seeing that? In OSE? Or in Moz Analytics? In Moz Analytics, it's possible that it's still cached, and will be updating (a few thousand campaigns each hour, so not too long until all of them are done), but in OSE, that data should absolutely be new. If not, can you send an email to me - rand at moz dot com - with your sites, and I'll ask the Big Data team to look into it.