The Moz Q&A Forum

    Disallow statement - is this tiny anomaly enough to render Disallow invalid?

    Technical SEO Issues
    • lzhao

      A Google site search (site:hbn.hoovers.com) returns 171,000 results for this subdomain. That is not what we want: the site is 100% duplicate content, and we don't want search engines spending any time here.

      Robots.txt is set up mostly right to disallow all search engines from indexing this site. The asterisk at the end of the Disallow statement looks pretty harmless, but could it be why the site has been indexed?

      User-agent: *
      Disallow: /*
      
      
      • WilliamKammer

        The extra asterisk shouldn't do you any harm, although standard practice is to use just "/". For crawlers that support wildcards, such as Googlebot, "Disallow: /*" matches the same URLs as "Disallow: /".
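To make the equivalence concrete, here is a minimal sketch of wildcard path matching in the style Googlebot documents ("*" matches any run of characters, a trailing "$" anchors the end of the URL). The function name `rule_matches` is hypothetical, and this is an illustration of the matching idea only, not any crawler's actual implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics (Google-style): the pattern is matched from the
    start of the path, '*' matches any sequence of characters, and a
    trailing '$' requires the path to end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the beginning of the path (prefix match).
    return re.match(regex, path) is not None


# Both forms of the rule block the same path.
print(rule_matches("/*", "/company/profile"))  # True
print(rule_matches("/", "/company/profile"))   # True
```

Because rules are prefix matches to begin with, a trailing "*" adds nothing: every path starts with "/", so both patterns match every URL on the host.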

        Does it look like Google is still crawling this subdomain when you check the crawl stats in Google Webmaster Tools? A Disallow in robots.txt will usually stop bots from crawling, but it doesn't stop them from indexing URLs, or from keeping pages in the index that were there before the Disallow was put in place. If you want these pages removed from the index, you can request removal through Webmaster Tools and use a meta robots noindex tag rather than relying on the robots.txt file. Moz has a good article about it here: http://moz.com/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
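If you go the noindex route, it can help to spot-check that pages really carry the directive. Below is a rough sketch using only the Python standard library. The names `has_noindex` and `_RobotsMetaParser` are hypothetical, and the function takes the HTML and an optional X-Robots-Tag header value as strings, so fetching the page is left to you. Keep in mind that a crawler can only see a noindex tag on a page it is allowed to fetch, so a blanket Disallow and noindex can work against each other.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"|"googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """Return True if the page asks search engines not to index it,
    either via a meta robots tag or an X-Robots-Tag header value."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header_tokens = {t.strip().lower() for t in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```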

        If you're just worried about bots crawling the subdomain, it's possible they've already stopped crawling it but are keeping it in the index because of its history or other signals suggesting it should be indexed.

        • lzhao (replying to WilliamKammer)

          Interesting.  I'd never heard that before.

          We've never had GA or GWT on these mirror sites before, so it's hard to say what Google is doing these days.

          But the goal is definitely to make them and their contents invisible to SEs.   We'll get GWT on there and start removing URLs.

          Thanks!
