The Moz Q&A Forum

    Pages getting into Google Index, blocked by Robots.txt??

    Intermediate & Advanced SEO
    • bjs2010
      bjs2010 last edited by

      Hi all,

      So yesterday we set out to remove URLs that had got into the Google index but were not supposed to be there, due to faceted navigation. We found the URLs by running these searches in Google:
      site:www.sekretza.com inurl:price=
      site:www.sekretza.com inurl:artists=

      These bring up a list of "duplicate" pages, and they show the usual: "A description for this result is not available because of this site's robots.txt – learn more."

      So we requested removal of them all, and Google removed them all, every single one.

      This morning I did a check and found that more are creeping in. If I take one of the suspected dupes to the robots.txt tester, Google tells me it's blocked, and yet it's appearing in their index??

      I'm confused as to why a path that is blocked is able to get into the index. I'm thinking of lifting the robots.txt block so that Google can see that these pages also have a meta NOINDEX,FOLLOW tag, but surely that will waste my crawl budget on unnecessary pages?

      Any ideas?

      thanks.

      • bjs2010
        bjs2010 last edited by

        For example:
        http://www.sekretza.com/eng/best-sellers-sekretza-products.html?price=1%2C1000

        Is blocked by using:
        Disallow: /*price=

        .... ?
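To check whether a rule like `Disallow: /*price=` really covers the example URL, here is a minimal sketch of wildcard matching in the style Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end). The function names are hypothetical, and the sketch ignores `Allow` rules, user-agent groups, and rule precedence, so treat it as an illustration rather than a full robots.txt parser.

```python
import re
from urllib.parse import urlsplit


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: `*` matches any run of characters and a trailing `$`
    anchors the end of the URL; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(url: str, disallow_patterns: list[str]) -> bool:
    """Return True if the URL's path plus query matches any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return any(robots_pattern_to_regex(p).match(target) for p in disallow_patterns)


url = "http://www.sekretza.com/eng/best-sellers-sekretza-products.html?price=1%2C1000"
print(is_disallowed(url, ["/*price="]))
```

A match here only means the URL is blocked from crawling. It says nothing about whether the URL can still appear in the index.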

        • AndersS
          AndersS last edited by

          Hi!

          It could be that those pages had already been indexed before you added the directives to robots.txt.

          I see that you have added rel=canonical to the pages and that you now have noindex,follow. Were those added recently? If so, it could be wise to let Googlebot access and crawl the pages again, and then they should drop out of the index after a while. You could then add the robots.txt directive again later. See https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466 for more about this.

          Hope this helps!
          Anders

          • Devanur-Rafi
            Devanur-Rafi last edited by

            AndersS has pointed to the right article. With robots.txt blocking, Googlebot will not crawl (discover links) from within the website, but what if references to these blocked pages are found elsewhere, on third-party websites? This looks like the case you have run into. So to fully stop Google from discovering and indexing these blocked pages, you should use the page-level meta robots tag on these pages. Once this is in place, the issue should fade away.

            This issue has been addressed many times here on Moz.

            Coming to your concern about the crawl budget: there is nothing to worry about here, as Google will not crawl those blocked pages while it's on your website, since they are already blocked by the robots.txt file.

            Hope it helps, my friend.

            Best regards,

            Devanur Rafi

            • bjs2010
              bjs2010 last edited by

              Hi guys,

              Appreciate your replies, but as far as I checked last time, if a URL is blocked by robots.txt, Googlebot cannot read the meta noindex,follow tag within the page.
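The reason is that the noindex directive lives inside the page's HTML, so a crawler has to fetch the page before it can see it. Here is a minimal sketch of that dependency using Python's standard `html.parser`. The class and function names are hypothetical, and the `blocked_by_robots` flag stands in for whatever robots.txt check is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html: str) -> set[str]:
    """Return the set of meta robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def noindex_is_visible(blocked_by_robots: bool, html: str) -> bool:
    """A crawler that obeys robots.txt never downloads a blocked page,
    so it cannot see a noindex tag on it, whatever the HTML says."""
    if blocked_by_robots:
        return False
    return "noindex" in robots_directives(html)


page = '<html><head><meta name="robots" content="NOINDEX,FOLLOW"></head><body></body></html>'
print(noindex_is_visible(blocked_by_robots=True, html=page))
print(noindex_is_visible(blocked_by_robots=False, html=page))
```

So the same page gives two different answers depending only on whether the crawler is allowed to fetch it.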

              There are no external references to these URLs, so Google is finding them within the site itself.

              In essence, what you are recommending is that I lift the robots.txt block and let Google crawl these pages (which could be infinite, as it is faceted navigation).

              This will waste my crawl budget.

              Any other ideas?

              • AndersS
                AndersS @Devanur-Rafi last edited by

                Hi Devanur.

                My guess is that the problem is this: as of now, Googlebot is restricted from accessing the pages (because of robots.txt), so it never fetches the pages and never updates its index with the "noindex, follow" declaration that seems to be in place.

                One other thing to consider is adding rel="nofollow" to all the faceted navigation links on the left.
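As a rough illustration of how a template could decide which links get that attribute, here is a small sketch that flags hrefs carrying facet parameters. The parameter names `price` and `artists` come from the URLs earlier in the thread. The function names are hypothetical, and whether nofollow actually stops these URLs being discovered is the suggestion above, not something this code verifies.

```python
from urllib.parse import parse_qs, urlsplit

# Facet parameters seen in the thread's example URLs (assumed to be the full set).
FACET_PARAMS = frozenset({"price", "artists"})


def is_facet_link(href: str, facet_params=FACET_PARAMS) -> bool:
    """Return True if the link's query string contains any facet parameter."""
    query = urlsplit(href).query
    names = parse_qs(query, keep_blank_values=True)
    return any(name in facet_params for name in names)


def rel_attribute(href: str) -> str:
    """Return the rel attribute to emit for a navigation link, or an empty string."""
    return ' rel="nofollow"' if is_facet_link(href) else ""


href = "/eng/best-sellers-sekretza-products.html?price=1%2C1000"
print(f'<a href="{href}"{rel_attribute(href)}>Price filter</a>')
```

Regular category and product links have no facet parameters, so they are left untouched.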

                Fully agreeing with you on the "crawl budget" part 🙂

                Anders

                • AndersS
                  AndersS @bjs2010 last edited by

                  Hi!

                  From what I could tell, there weren't that many pages in the index yet, so it could be worth lifting the block, at least for a short while, to see whether it has an impact.

                  In addition, how about configuring how Googlebot should treat your URLs via the URL parameter tool in Google Webmaster Tools? Here's what Google has to say about this: https://support.google.com/webmasters/answer/1235687

                  Best regards, Anders

                  • Devanur-Rafi
                    Devanur-Rafi @bjs2010 last edited by

                    Hi,

                    Please try this and let us know the results:

                    Suppose this is one of the pages in discussion:

                    http://www.yourdomain.com/blocked-page.html

                    Go to Google and type the following, including the double quotes, replacing the example with the actual page:

                    "yourdomain.com/blocked-page.html" -site:yourdomain.com

                    • bjs2010
                      bjs2010 @Devanur-Rafi last edited by

                      It doesn't show any result for the "blocked page" when I do that in Google.

                      • Devanur-Rafi
                        Devanur-Rafi @bjs2010 last edited by

                        Oh, OK. If that's the case, please don't worry about those in the index. You can get them removed using the Remove URLs feature in your Webmaster Tools account.
