The Moz Q&A Forum

    Google indexing despite robots.txt block

    Technical SEO Issues
    • zeepartner
      zeepartner last edited by

      Hi

      This subdomain has about 4,000 URLs indexed in Google, even though it is blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch

      This has been the case for almost a year now, and Google does not seem to respect the block in http://www1.swisscom.ch/robots.txt

      Any clues why this is or what I could do to resolve it?

      Thanks!

      • Martijn_Scheijbeler
        Martijn_Scheijbeler last edited by

        Hi Phillipp,

        You almost got me with this one, but it's fairly simple. In your question you're pointing at the robots.txt of your HTTP site. But it's mostly your HTTPS pages that are indexed, and robots.txt applies per protocol and host, so the HTTPS site is governed by its own file. If you look at that file, it's pretty clear why these pages are indexed: https://www1.swisscom.ch/robots.txt. All the indexed pages match either one of your Allow statements or the complete Disallow. Hopefully that gives you the insight you need to fix the issue.
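The point about separate robots.txt files for HTTP and HTTPS can be sketched with Python's standard `urllib.robotparser`. The file contents and the `/en/` path below are hypothetical placeholders, not the real Swisscom files, and Python's parser applies the first matching rule rather than Google's most-specific-match logic, so treat this as an illustration only.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by origin (scheme + host).
# These are assumptions for illustration, not the actual files.
ROBOTS_BY_ORIGIN = {
    "http://www1.swisscom.ch": "User-agent: *\nDisallow: /\n",
    "https://www1.swisscom.ch": "User-agent: *\nAllow: /en/\nDisallow: /\n",
}


def is_crawlable(url, user_agent="Googlebot"):
    """Check a URL against the robots.txt that belongs to its own origin."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(ROBOTS_BY_ORIGIN[origin].splitlines())
    return parser.can_fetch(user_agent, url)


for candidate in (
    "http://www1.swisscom.ch/en/page",
    "https://www1.swisscom.ch/en/page",
    "https://www1.swisscom.ch/",
):
    print(candidate, is_crawlable(candidate))
```

Under these assumed files, the same path is blocked over HTTP but crawlable over HTTPS, which is the kind of mismatch described above.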

          • zeepartner
            zeepartner @Martijn_Scheijbeler last edited by

            100 points for you Martijn, thanks! I'm pretty sure you've found the problem and I'll go about fixing it. Gotta get used to having https used more frequently now...

            • Martijn_Scheijbeler
              Martijn_Scheijbeler @zeepartner last edited by

              You're welcome. I spotted it mostly because the first result, the homepage, had no snippet while the rest of the pages did. That led me to look at the URL structure. Good luck fixing it!

              • Kingof5
                Kingof5 last edited by

                A noindex tag specific to Googlebot would also be a good idea.

                • Kingof5
                  Kingof5 @Kingof5 last edited by

                  People who are disagreeing with this, explain your reasoning.

                  • Martijn_Scheijbeler
                    Martijn_Scheijbeler @Kingof5 last edited by

                    Did you mean a noindex tag for all robots or one specifically for Googlebot? If it's the second, I can probably see where the downvotes come from.

                      • Kingof5
                        Kingof5 @Martijn_Scheijbeler last edited by

                        Specifically for Googlebot. I'm pretty surprised people would disagree - Stephan Spencer recommended this in a personal conversation with me.

                        • Martijn_Scheijbeler
                          Martijn_Scheijbeler @Kingof5 last edited by

                          I thought that value was a bit outdated, but it turns out it is still accepted. It probably only addresses the issue for him in Google, though, and I assume it will remain one in other search engines.

                          Besides that, the problem itself offered a far better solution: not allowing Google on the HTTPS site.
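To make the difference between a generic noindex tag and a Googlebot-specific one concrete, here is a rough sketch using Python's standard `html.parser`. It models the common convention that a crawler honors `<meta name="robots">` plus a tag carrying its own name; the helper `noindex_applies` and the sample pages are hypothetical, and real crawlers may behave differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated directives of every named <meta> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name:
            values = {d.strip().lower() for d in attr.get("content", "").split(",")}
            self.directives.setdefault(name, set()).update(values)


def noindex_applies(html, crawler):
    """True if a generic or crawler-specific meta tag asks for noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    for name in ("robots", crawler.lower()):
        if "noindex" in parser.directives.get(name, set()):
            return True
    return False


# Hypothetical sample pages.
GOOGLE_ONLY = '<html><head><meta name="googlebot" content="noindex"></head></html>'
ALL_ROBOTS = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_applies(GOOGLE_ONLY, "googlebot"))
print(noindex_applies(GOOGLE_ONLY, "bingbot"))
print(noindex_applies(ALL_ROBOTS, "bingbot"))
```

In this model the Googlebot-specific tag leaves other crawlers unaffected, which matches the concern that the page would still show up in other search engines.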

                          • zeepartner
                            zeepartner @Martijn_Scheijbeler last edited by

                            Yes, I think the crucial point is that addressing googlebot wouldn't resolve the specific problem I have here.

                            I would have tried addressing Googlebot otherwise. But to be honest, I wouldn't have expected a much different result than specifying all user agents. Googlebot should be part of that exclusion in any case.

                            • john4math
                              john4math last edited by

                              It sounds like Martijn solved your problem, but I still wanted to add that robots.txt exclusions keep search bots from reading disallowed pages; they do not stop those pages from being returned in search results. When those pages do appear, they will often have a page description along the lines of "A description of this page is not available due to this site's robots.txt".

                              If you want to ensure that pages are kept out of search engine results, you have to use the noindex meta tag on each page, and the pages must stay crawlable so that bots can actually read the tag.
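The interaction between a robots.txt block and a noindex tag can be summarized as a small decision function. This is a deliberately simplified model, not a description of any search engine's actual pipeline; the function name, parameters, and return strings are all made up for illustration.

```python
def search_result_state(blocked_by_robots_txt, has_noindex_tag, url_known_to_engine=True):
    """Simplified model of how a crawler-based engine might treat one URL."""
    if blocked_by_robots_txt:
        # The page is never fetched, so a noindex tag on it goes unseen.
        if url_known_to_engine:
            # The URL can still be listed, e.g. because other pages link to it.
            return "listed without description"
        return "not listed"
    if has_noindex_tag:
        # The page was fetched and the tag was read.
        return "not listed"
    return "listed with snippet"


for blocked in (True, False):
    for noindex in (True, False):
        print(blocked, noindex, search_result_state(blocked, noindex))
```

The case worth noticing is a page that is both blocked and tagged noindex: in this model it can still be listed, because the tag is never read.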
