The Moz Q&A Forum

    New CMS system - 100,000 old urls - use robots.txt to block?

    On-Page / Site Optimization
    • Blenny

      Hello.

      My website has recently switched to a new CMS system.

      Over the last 10 years or so, we've used 3 different CMS systems on our current domain. As expected, this has resulted in a lot of URLs.

      Until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".

      Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to Google Webmaster Tools' "Not Found" report, there are literally over 100,000 additional URLs out there it's trying to find.

      My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, using only page-level robots tags to disallow where necessary.

      Thanks!

      • Mark_Jay_Apsey_Jr.

        It's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "Not Found" errors.

        Then you can slowly pick away at the issue and figure out whether some of the "Not Founds" really have content and are sending visitors to the wrong area.

        On a recent project we had over 200,000 additional URLs "not found". We stopped the bleeding, and then slowly, over the course of a month, spending a couple of hours a week, we found another 5,000 pages of content that we redirected correctly and removed from robots.txt.
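        A minimal "stop the bleeding" rule set might look like the sketch below; the directory names are hypothetical stand-ins for the old CMS paths, and Python's standard-library robotparser is used here just to sanity-check which URLs a rule set actually blocks:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking two legacy CMS directories;
# substitute the directories your old systems actually used.
rules = """\
User-agent: *
Disallow: /old-cms/
Disallow: /legacy/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# URLs under the blocked directories are disallowed for all crawlers...
print(rp.can_fetch("*", "https://www.example.com/old-cms/page.html"))  # False
# ...while current pages stay crawlable.
print(rp.can_fetch("*", "https://www.example.com/products/widget"))    # True
```

        Note that a Disallow rule only stops crawling; it doesn't by itself remove URLs that are already in the index.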

        Good luck.

        • Blenny

          Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these "Not Found" pages, will that benefit the indexation of the now-valid pages? Are the "Not Founds" really a concern? Will this help my indexation and/or ranking?

          Thanks!

          • Mark_Jay_Apsey_Jr. @Blenny

            Absolutely. "Not Founds" and missing content are a concern, and fixing them will help your ranking.

            • Dr-Pete

              I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best you have.

              It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems from Google crawling thousands of Not Founds all at once.

              If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.

              • Blenny @Dr-Pete

                Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.

                So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
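                If the site runs on Apache, those one-to-one 301s might look like the hypothetical .htaccess sketch below (mod_alias rules; all paths are placeholders for the real old/new URLs):

```apache
# Redirect a single retired page to its new counterpart
Redirect 301 /old-cms/about.html /about/

# Redirect an entire retired directory into a new section
RedirectMatch 301 ^/legacy/(.*)$ /archive/$1
```

                nginx and IIS have equivalent directives if the site runs on a different server.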

                Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.

                Thanks!

                • Dr-Pete @Blenny

                  It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.

                  I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.

                  • Blenny

                    Great stuff! Thanks again for your advice, much appreciated!

