The Moz Q&A Forum

    Is there a limit to how many URLs you can put in a robots.txt file?

    Technical SEO Issues
    • kcb8178

      We have a site with far too many URLs, caused by our crawlable faceted navigation.  We are trying to purge 90% of our URLs from the indexes.  We put noindex tags on the URL combinations that we do not want indexed anymore, but it is taking Google far too long to find the noindex tags.  Meanwhile, we are getting hit with excessive-URL warnings and have been hit by Panda.

      Would it help speed the purge if we added the URLs to the robots.txt file?  Could this cause any issues for us?  Could it have the opposite effect and block the crawler from finding the URLs, but not purge them from the index? The list could be in excess of 100MM URLs.
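      For reference, the noindex tags described above are per-page directives; a minimal sketch of the common form, a meta tag in each page's head (the comment is illustrative, not from the site in question):

      ```html
      <!-- In the <head> of each faceted-navigation page to be purged -->
      <meta name="robots" content="noindex">
      ```

      The same directive can also be sent as an `X-Robots-Tag: noindex` HTTP response header, which avoids touching page templates.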

      • DirkC

        You could add them to the robots.txt, but you have to remember that Google will only read the first 500 kB of the file (source) - as far as I understand, with the number of URLs you want to block, you'll pass this limit.

        Since Googlebot understands the basic pattern-matching operators * and $ in robots.txt, it's probably better to use patterns instead: you will probably be able to block all of these URLs with a few lines.
        More info on Moz: https://moz.com/blog/interactive-guide-to-robots-txt

        Dirk
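        The pattern-matching approach Dirk describes might look like the following; the facet parameter names are hypothetical examples, not taken from the site in question:

        ```
        # robots.txt — block crawling of faceted-navigation combinations
        # (parameter names are hypothetical)
        User-agent: *
        Disallow: /*?*color=
        Disallow: /*?*size=
        Disallow: /*&sort=
        ```

        A few such patterns can cover millions of URL combinations while keeping the file well under the 500 kB read limit. Note, though, that Disallow stops crawling rather than indexing, which is exactly the trade-off raised elsewhere in this thread.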

        • Guest

          This post is deleted!
          • kcb8178 @Guest

            Thanks Kristen, that's what I was afraid would happen.  Other than Fetch, is there a way to send Google these URLs en masse?  There are over 100 million URLs, so Fetch is not scalable.  Google is picking them up slowly, but at the current pace it will take a few months, and I would like to find a way to make it purge faster.
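            Since the reply above is deleted, the exact suggestion isn't visible; one common way to surface a very large URL set to Google en masse (an assumption here, not necessarily what was recommended in the thread) is to list the URLs in XML sitemap files, which the sitemaps.org protocol caps at 50,000 URLs per file. A minimal sketch, with placeholder URLs:

            ```python
            # Sketch: split a large URL list into XML sitemap files, respecting the
            # 50,000-URLs-per-file cap from the sitemaps.org protocol.
            # The example URLs below are hypothetical placeholders.
            from xml.sax.saxutils import escape

            SITEMAP_LIMIT = 50_000  # per-file URL cap in the sitemap protocol

            def build_sitemaps(urls):
                """Yield (filename, xml_text) pairs, one per chunk of up to 50,000 URLs."""
                for i in range(0, len(urls), SITEMAP_LIMIT):
                    chunk = urls[i:i + SITEMAP_LIMIT]
                    entries = "\n".join(
                        f"  <url><loc>{escape(u)}</loc></url>" for u in chunk
                    )
                    xml = (
                        '<?xml version="1.0" encoding="UTF-8"?>\n'
                        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                        f"{entries}\n"
                        "</urlset>"
                    )
                    yield f"sitemap-{i // SITEMAP_LIMIT + 1}.xml", xml

            # Example: 120,001 URLs split into 3 sitemap files
            urls = [f"https://www.example.com/page?id={n}" for n in range(120_001)]
            files = list(build_sitemaps(urls))
            ```

            The resulting files could then be submitted through Webmaster Tools or listed with `Sitemap:` lines in robots.txt, so the crawler revisits the noindexed pages sooner.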

            • Guest @kcb8178

              This post is deleted!
              • kcb8178 @Guest

                Yes, we have done that and are seeing traction on those URLs, but we can't get rid of the old URLs as fast as we would like.

                Thanks for your input

                • kcb8178 @DirkC

                  Great, thanks for the input.  Per Kristen's post, I am worried that it could just block the URLs altogether and they would never get purged from the index.

                  • CraigBradford

                    Hi all, Google Webmaster Tools has a great tool for this. In WMT, go to "Google Index" and then "Remove URLs". You can remove a large batch of URLs there, then block them in robots.txt to make sure they stay out of the index.

                    I hope this helps.

                    • Guest @CraigBradford

                      This post is deleted!
                      • CraigBradford @Guest

                        Hi Kristen,

                        I did this recently and it worked. The important part is that you need to block the pages in robots.txt or add a noindex tag to the pages to stop them from being indexed again.

                        I hope this helps.


                        © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.