The Moz Q&A Forum

    September's Mozscape Update Broke; We're Building a New Index

    API
    • randfish

      Hey gang,

      I hate to write to you all again with more bad news, but such is life. Our big data team produced an index this week but, upon analysis, found that our crawlers had encountered a massive number of non-200 URLs, which meant this index was not only smaller, but also weirdly biased. PA and DA scores were way off, coverage of the right URLs went haywire, and the metrics we use to gauge quality told us this index simply was not good enough to launch. Thus, we're in the process of rebuilding an index as fast as possible, but this takes, at minimum, 19-20 days, and may take as long as 30 days.

      This sucks. There's no excuse. We need to do better and we owe all of you and all of the folks who use Mozscape better, more reliable updates. I'm embarrassed and so is the team. We all want to deliver the best product, but continue to find problems we didn't account for, and have to go back and build systems in our software to look for them.

      In the spirit of transparency (not as an excuse), the problem appears to be a large number of new subdomains that found their way into our crawlers and exposed us to issues fetching robots.txt files that timed out and stalled our crawlers. In addition, some new portions of the link graph we crawled exposed us to websites/pages that we need to find ways to exclude, as these abuse our metrics for prioritizing crawls (aka PageRank, much like Google, but they're obviously much more sophisticated and experienced with this) and bias us to junky stuff which keeps us from getting to the good stuff we need.
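A minimal sketch of one way to keep a slow robots.txt fetch from stalling a crawl worker, assuming a per-request timeout and a "defer the host" policy. This is not Moz's crawler code; `load_robots`, `partition_hosts`, the 5-second default, and the injectable `opener` parameter are hypothetical names and values chosen for illustration.

```python
import urllib.request
import urllib.robotparser
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def load_robots(
    host: str,
    opener: Callable = urllib.request.urlopen,
    timeout_s: float = 5.0,  # hypothetical budget, not a value from the post
) -> Optional[urllib.robotparser.RobotFileParser]:
    """Fetch and parse robots.txt for one host; return None if the fetch fails."""
    url = f"https://{host}/robots.txt"
    try:
        # The timeout bounds how long a single slow host can hold a worker.
        with opener(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return None
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(body.splitlines())
    return parser


def partition_hosts(
    hosts: Iterable[str],
    opener: Callable = urllib.request.urlopen,
) -> Tuple[Dict[str, urllib.robotparser.RobotFileParser], List[str]]:
    """Split hosts into those with usable rules and those deferred for a later retry."""
    ready: Dict[str, urllib.robotparser.RobotFileParser] = {}
    deferred: List[str] = []
    for host in hosts:
        parser = load_robots(host, opener=opener)
        if parser is None:
            deferred.append(host)
        else:
            ready[host] = parser
    return ready, deferred
```

Deferring a host rather than treating a failed fetch as "allow everything" is a conservative choice; a real crawler would also need retry limits and per-host rate limiting, which this sketch leaves out.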

      We have dozens of ideas to fix this, and we've managed to fix problems like this in the past (prior issues like .cn domains overwhelming our index, link wheels and webspam holes, etc. plagued us and have been addressed, but every couple of indices it seems we face a new challenge like this). Our biggest issue is one of monitoring and processing times. We don't see what's in a web index until it's finished processing, which means we don't know if we're building a good index until it's done. It's a lot of work to re-build the processing system so there can be visibility at checkpoints, but that appears to be necessary right now. Unfortunately, it takes time away from building the new, realtime version of our index (which is what we really want to finish and launch!). Such is the frustration of trying to tweak an old system while simultaneously working on a new, better one. Tradeoffs have to be made.
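To illustrate what "visibility at checkpoints" could look like, here is a rough sketch of a check that runs on each crawl batch and flags it when too many fetches came back non-200. The function names and the 20% threshold are assumptions for the example; the post does not say how Moz measures index quality.

```python
from collections import Counter
from typing import Iterable


def non_200_share(status_codes: Iterable[int]) -> float:
    """Return the fraction of fetches in a batch whose HTTP status was not 200."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts[200] / total


def checkpoint_passes(
    status_codes: Iterable[int],
    max_non_200_share: float = 0.2,  # hypothetical threshold
) -> bool:
    """Decide whether a batch looks healthy enough to keep processing."""
    return non_200_share(status_codes) <= max_non_200_share
```

Running a check like this per batch, rather than once on the finished index, is the point: a bad crawl can be caught days before processing completes.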

      For now, we're prioritizing fixing the old Mozscape system, getting a new index out as soon as possible, and then working to improve visibility and our crawl rules.

      I'm happy to answer any and all questions, and you have my deep, regretful apologies for once again letting you down. We will continue to do everything in our power to improve and fix these ongoing problems.

      • Guest

        This post is deleted!
        • donford

          Webmasters' love of subdomains... shake fist!

          • randfish
            randfish @Guest last edited by

            Hi Will - that's not entirely how I'd frame it. Mozscape's metrics will slowly, over time, degrade in their ability to predict rankings, but it's not as though exactly 31 days after the last update, all the metrics and data are useless. We've had delays before of 60-90+ days (embarrassing, I know) and the metrics and link data still applied in those instances, though correlations did slowly get worse.

            The best way I can put it is - our index's data won't be as good as it normally is for the next 20-30 days, though it's better now than it will be in 10 days and was better 10 days ago than it is today. It's a gradual decline as the web's link structure changes shape and as new sites and pages come into Google's index that we don't account for.

            • Guest
              Guest @randfish last edited by

              This post is deleted!
              • randfish
                randfish @Guest last edited by

                Yeah - the new links you see via "just discovered" will take longer to be in the main index and impact metrics like MozRank, Page Authority, Domain Authority, etc. It's not that they're not picked up or not searched, but that they don't yet impact the metrics.

                And yes - will check out the other question now!

                • LayGiri

                  I did notice no new links added to a number of projects in the last 2 months and I was wondering what went wrong. Thanks for clearing up the issue with this post. We look forward to the resolution.

                  • Highland
                    Highland @donford last edited by

                    And they would have gotten away with it too if it weren't for those meddling kids and their pesky subdomains

                    • randfish
                      randfish @LayGiri last edited by

                      Two potential solutions for you - 1) watch "Just Discovered Links" in Open Site Explorer - that tab will still be showing all the links we find, just without the metrics. And 2) Check out Fresh Web Explorer - it will only show you links from blogs, news sites, and other things that have feeds, but it's one of the sources I pay attention to most, and you can set up good alerts, too.

                      • Joe.Robison

                        Thanks for the transparency as usual. A question I've always been wondering:

                        Moz seems to have much more stature, clout, and maybe funding compared to many other SEO software companies based around the world. And of course you offer more of a suite of products rather than just focusing on Open Site Explorer. But to me one of the most important SEO tools is the backlink explorer that companies offer, and it seems like OSE, although one of the first, lags compared to a few others. I've read that OSE isn't looking to just grab all the links, but only the most important ones. It seems though that there have been lots of technical challenges, and I can't help but think that there are other companies that have already solved their indexing challenges or are a few steps ahead of OSE.

                        Would Moz ever go out and buy a pretty good backlink explorer company like Ahrefs or Majestic or some other upstart that's solved that piece of the puzzle? Combining that new technology that's solved the indexing part with your DA algorithm seems like a match made in heaven. I'm sure you guys considered this years ago internally, but it's a question I've always pondered...

                        • LayGiri
                          LayGiri @Joe.Robison last edited by

                          Talk about reading everyone's mind... I should point out though that Rand mentioned above that Moz is working on a new realtime tool like the ones we have seen elsewhere. I think a little patience might solve everyone's problems.

                          • randfish
                            randfish @Joe.Robison last edited by

                            Hi Joe - fair question.

                            The basic story is - what the other link indices do (Ahrefs and Majestic) is unprocessed link crawling and serving. That's hard, but not really a problem for us. We do it fairly easily inside the "Just Discovered Links" tab. The problem is really with our metrics, which is what makes us unique and, IMO, uniquely useful.

                            But metrics like MozRank, MozTrust, Spam Score, Page Authority, Domain Authority, etc. require processing - meaning all the links need to be loaded into a series of high-powered machines and iterated on, à la the PageRank patent paper (although there are obviously other kinds of ways we do this for other kinds of metrics). Therein lies the rub. It's really, really hard to do this - takes lots of smart computer science folks, requires tons of powerful machines, takes a LONG time (17 days+ of processing at minimum to get all our metrics into API-shippable format). And, in the case where things break, what's worse is that it's very hard to stop and restart without losing work and very hard to check our work by looking at how processing is going while it's running.
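For readers unfamiliar with what "iterated on" means here, the sketch below shows textbook PageRank computed by power iteration on a toy link graph. It is the classic published algorithm, not Moz's implementation of MozRank or any other metric; the damping factor of 0.85 and the fixed iteration count are conventional illustrative values, and `pagerank` is a hypothetical function name.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Compute PageRank scores for a graph given as {page: [pages it links to]}."""
    # Include pages that are only ever link targets.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every page passes an equal share of its rank along each outlink.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

Each pass needs the complete link graph and the previous pass's scores, which is why this style of metric is hard to inspect or restart partway through on a web-scale index.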

                            This has been the weakness and big challenge of Mozscape the last few years, and why we've been trying to build a new, realtime version of the index that can process these metrics through newer, more sophisticated, predictive systems. It's been a huge struggle for us, but we're doing our best to improve and get back to a consistent, good place while we finish that new version.

                            tl;dr Moz's index isn't like others due to our metrics, which take lots of weird/different types of work, hence buying/partnering w/ other indices wouldn't make much sense at the moment.

                            • KevnJr
                              KevnJr @randfish last edited by

                              Thanks for the update, Rand. We have hired a full-time marketing manager and he has been working hard the past month; I know he's excited to see the new results. "Putty & Paint does not a NEW Boat make." Fixing is a painstaking reality compared to building. Moz is great, so we will wait 🙂

                              • Mysites

                                Hey Rand, is this why my crawl reports are saying that I have some 404 client errors on pages where I can't see any issues? Or is this another issue that I'm incurring?

                                Thanks in advance

                                • DavidLee
                                  DavidLee @Mysites last edited by

                                  Hi Lehia

                                  Crawl reports are separate from our Mozscape indexes. Also any delays with our index only impact the ability to access new data. With your crawl reports I have a suspicion the URLs with the 404s are ones with trailing slashes e.g. domain.com/ and not domain.com

                                  If not, send us your account info and some examples at help@moz.com and we can take a look!
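One quick way to test the trailing-slash suspicion above is to compare the URLs reported as 404 against URLs known to load, ignoring a trailing slash. This is a small sketch using only the standard library; `without_trailing_slash` and `slash_only_mismatches` are hypothetical helper names, and the inputs are assumed to be plain lists of absolute URLs exported from a crawl report.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def without_trailing_slash(url: str) -> str:
    """Return the URL with any trailing slashes removed from its path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def slash_only_mismatches(error_urls: Iterable[str], ok_urls: Iterable[str]) -> List[str]:
    """List error URLs that match a working URL once the trailing slash is dropped."""
    ok = {without_trailing_slash(u) for u in ok_urls}
    return [
        u
        for u in error_urls
        if without_trailing_slash(u) != u and without_trailing_slash(u) in ok
    ]
```

Any URL this returns is a candidate for a redirect or a server rule that treats both forms the same; URLs it does not return point to some other cause for the 404.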

                                  • DmitriiK

                                    Hello, Rand. Yesterday the new update was scheduled for October 8th, and I just noticed that it now says October 14th! What's going on? I hope it didn't break again...

                                    • randfish
                                      randfish @DmitriiK last edited by

                                      It didn't break, but it is taking longer to process than we hoped. Very frustrating, but we have a plan that, starting in a few more weeks, should get us to much more consistent index releases (and better quality ones, too).

                                      • DmitriiK
                                        DmitriiK @randfish last edited by

                                        Thanks!

                                        I have been noticing for quite some time that last-minute changes in update release dates are becoming "normal". Is there a way you guys can announce those date changes earlier than on the expected release date itself?

                                        • randfish

                                          Sometimes yes. Sometimes, we don't know until we reach the last stages of processing whether it's going to finish or take longer. We're trying to get better at benchmarking along the way, too, and I'll talk to the team about what we can do to improve our metrics as an index run is compiling.

                                          • DmitriiK
                                            DmitriiK @randfish last edited by

                                            Thanks for clarifying!
