The Moz Q&A Forum

    Google: How to See URLs Blocked by Robots?

    Intermediate & Advanced SEO
    • lbohen
      lbohen last edited by

      Google Webmaster Tools says we have 17K out of 34K URLs that are blocked by our Robots.txt file.

      How can I see the URLs that are being blocked?

      Here's our Robots.txt file.

      User-agent: *
      Disallow: /swish.cgi
      Disallow: /demo
      Disallow: /reviews/review.php/new/
      Disallow: /cgi-audiobooksonline/sb/order.cgi
      Disallow: /cgi-audiobooksonline/sb/productsearch.cgi
      Disallow: /cgi-audiobooksonline/sb/billing.cgi
      Disallow: /cgi-audiobooksonline/sb/inv.cgi
      Disallow: /cgi-audiobooksonline/sb/new_options.cgi
      Disallow: /cgi-audiobooksonline/sb/registration.cgi
      Disallow: /cgi-audiobooksonline/sb/tellfriend.cgi
      Disallow: /*?gdftrk

      Sitemap: http://www.audiobooksonline.com/google-sitemap.xml

      • McCannSEO
        McCannSEO last edited by

        If you want to see if Google has indexed individual pages which are supposed to be excluded, you can check the URLs in your robots.txt using the site: command.

        E.g. type the following into Google:

        site:http://www.audiobooksonline.com/swish.cgi
        site:http://www.audiobooksonline.com/reviews/review.php/new/
        ...and so on for each of the URLs in your robots.txt.

        Just from searching on the last example above (site:http://www.audiobooksonline.com/reviews/review.php/new/) I can see that you have results indexed. This is probably because you added the robots.txt rule after those pages were already indexed.

        To get rid of these results, you need to: take the offending line out of the robots.txt; add the robots meta tag set to noindex to every page you want removed; submit a URL removal request via Webmaster Tools; check that the pages have been de-indexed; and only then add the line back into the robots.txt. The line has to come out first because Google cannot see a noindex tag on a page it is blocked from crawling.

        This is the tag: <meta name="robots" content="noindex">

        I hope that makes sense and is useful!
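For the "check the tag is in place" part of the process above, here is a minimal Python sketch that reports whether a page's HTML carries a robots meta tag with a noindex directive. The names `RobotsMetaParser` and `has_noindex` and the sample markup are made up for illustration. It only inspects HTML you hand it, so it cannot tell you whether Google has actually dropped the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute may hold several comma-separated directives.
            parts = attr_map.get("content", "").split(",")
            self.directives.extend(p.strip().lower() for p in parts if p.strip())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, for illustration only.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

In practice you would feed it the source of each page you want removed, fetched however you prefer, before submitting the removal request.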

        • lbohen
          lbohen @McCannSEO last edited by

          Liz, perhaps my post was unclear or I am misunderstanding your answer.

          I want to find out the specific URLs that Google says it isn't indexing because of our Robots.txt file.

          • McCannSEO
            McCannSEO @lbohen last edited by

            Hi Larry

            For my understanding, why do you want to find those URLs? Are you concerned that the robots.txt is blocking URLs it shouldn't be?

            As for downloading a list of URLs which aren't indexed from Google Webmaster Tools, which is what I think you would really like, this isn't possible at the moment.

            • lbohen
              lbohen @McCannSEO last edited by

              I want to make sure that Google is indexing all of the pages we want it to, i.e., that every URL that is NOT indexed is one we really do want excluded.

              • McCannSEO
                McCannSEO @lbohen last edited by

                Okay. The robots.txt will only exclude robots from the folders and URLs specified in it and, as I say, there's no way to download a list of all the URLs that Google is not indexing from Webmaster Tools.

                If you have exact URLs in mind which you think might be getting excluded, you can test individual URLs in Google Webmaster Tools in:

                Health > Blocked URLs > URLs ("Specify the URLs and user-agents to test against")

                Beyond this, if you want to know whether the folders you have specified contain URLs that shouldn't be excluded, I would run a crawl of your website using SEOmoz's Crawl Test or Screaming Frog. Then sort the URLs alphabetically and make sure that all of the URLs in the folders you have excluded via robots.txt are ones that you want to exclude.
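Instead of sorting and eyeballing the crawl export, you could match each crawled URL against the Disallow rules from the robots.txt quoted at the top of the thread. The sketch below is a simplified matcher, not Google's own parser: it assumes a single `User-agent: *` group with no Allow rules, treats `*` as "any run of characters" and a trailing `$` as an end anchor, and otherwise does prefix matching. The function names and the sample crawled URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Disallow rules copied from the robots.txt quoted earlier in the thread.
DISALLOW_RULES = [
    "/swish.cgi",
    "/demo",
    "/reviews/review.php/new/",
    "/cgi-audiobooksonline/sb/order.cgi",
    "/cgi-audiobooksonline/sb/productsearch.cgi",
    "/cgi-audiobooksonline/sb/billing.cgi",
    "/cgi-audiobooksonline/sb/inv.cgi",
    "/cgi-audiobooksonline/sb/new_options.cgi",
    "/cgi-audiobooksonline/sb/registration.cgi",
    "/cgi-audiobooksonline/sb/tellfriend.cgi",
    "/*?gdftrk",
]


def rule_to_regex(rule):
    """Translate one Disallow pattern into a regex anchored at the path start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal text and let each '*' match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def blocking_rule(url, rules=DISALLOW_RULES):
    """Return the first Disallow rule that matches the URL, or None."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    for rule in rules:
        if rule_to_regex(rule).match(target):
            return rule
    return None


# Hypothetical crawl output, for illustration only.
crawled = [
    "http://www.audiobooksonline.com/index.html",
    "http://www.audiobooksonline.com/demo/sample.html",
    "http://www.audiobooksonline.com/some-title.html?gdftrk=abc",
]
for url in crawled:
    print(url, "->", blocking_rule(url))
```

Because matching is by prefix, a rule like `/demo` also catches any path that merely starts with those characters, which is exactly the kind of unintended exclusion this check is meant to surface.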

                • ThompsonPaul
                  ThompsonPaul last edited by

                  It seems you might be asking two different questions here, Larry.

                  You ask which URLs are blocked by your robots file. You then answered your own question by listing the entries in your robots file, which are the actual URL paths and patterns that it is blocking.

                  If in fact what you want to know is which pages exist on your website but are not currently indexed, that's a much bigger question and requires a lot more work to answer.

                  There is no way Webmaster Tools can give you that answer because, if it were aware of a URL, it would already be indexing it.

                  HOWEVER! It is possible to do it if you are willing to do some of the work on your own to collect and manipulate data using several tools. Essentially, you have to do it in three steps:

                  • First, create a list of all the URLs that Google says are indexed. (This info comes from Google's SERPs.)
                  • Then create a separate list of all of the URLs that actually exist on your website. (This must come from a third-party tool you run against your site yourself.)
                  • Finally, use Excel to subtract the indexed URLs from the known URLs, leaving a list of non-indexed URLs, which is what you asked for.
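The subtraction in the last step can also be done outside Excel. Below is a rough Python sketch of the same idea as a set difference. It assumes you already have the two lists as plain URL strings; the `normalize` rule (ignore scheme, a leading `www.` and a trailing slash) is my own guess at what counts as "the same page", and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key.

    Assumption: scheme, a leading 'www.' and a trailing slash do not
    distinguish pages. Adjust this if they do on your site.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def non_indexed(known_urls, indexed_urls):
    """Return the known URLs whose normalized form is absent from the indexed list."""
    indexed_keys = {normalize(u) for u in indexed_urls}
    return sorted(u for u in known_urls if normalize(u) not in indexed_keys)


# Placeholder data: 'known' would come from your crawler export,
# 'indexed' from the URLs you collected out of Google's results.
known = [
    "http://www.example.com/a",
    "http://www.example.com/b/",
    "http://www.example.com/c",
]
indexed = [
    "http://example.com/a/",
    "http://www.example.com/b",
]
print(non_indexed(known, indexed))
```

Normalizing before comparing matters because the crawler and the SERPs often report the same page with small differences in form, which a literal string comparison would count as two different URLs.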

                  I actually laid out this process step by step in response to an earlier question, so you can read it there: http://www.seomoz.org/q/how-to-determine-which-pages-are-not-indexed

                  Is that what you were looking for?

                  Paul
