    Trying to reduce pages crawled to within 10K limit via robots.txt

    Technical SEO Issues
    • AspenFasteners

      Our site has far too many pages for our 10K-page PRO account, and most of them are not SEO-worthy; in fact, only about 2,000 pages qualify for SEO value. Limitations of the store software mean robots.txt is the only tool I have for sculpting the rogerbot site crawl, but I am having trouble getting it to work. Our biggest problem is the 35K individual product pages, plus the related shopping cart links (at least another 35K); these aren't needed because they duplicate the SEO-worthy content in the product category pages.

      The signature of a product page is that it is contained within a folder ending in -p. So I made the following addition to robots.txt:

      User-agent: rogerbot
      Disallow: /-p/
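
      (I'm now wondering whether the rule needs a leading wildcard, since the "-p" folder is never the first segment of the URL path. If rogerbot supports Googlebot-style wildcards, which I'm not sure about, maybe the rule would need to look like this instead:)

      User-agent: rogerbot
      Disallow: /*-p/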

      However, the latest crawl results show the 10K limit is still being exceeded. I went to Crawl Diagnostics, clicked Export Latest Crawl to CSV, and to my dismay saw the report was overflowing with product page links:

      e.g. www.aspenfasteners.com/3-Star-tm-Bulbing-Type-Blind-Rivets-Anodized-p/rv006-316x039354-coan.htm

      The value in the column "Search Engine blocked by robots.txt" is FALSE. Does that mean "blocked for all search engines"? If so, FALSE is correct. But if it means "blocked for rogerbot", then the page shouldn't even be in the report, since the report seems to contain only the 10K pages that were actually crawled.
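
      To double-check whether the pattern itself is the problem, here is a rough Python sketch that approximates Googlebot-style wildcard matching (prefix match, with "*" as a wildcard and "$" as an end anchor). I'm assuming rogerbot matches roughly the same way, which may not be true:

      import re

      def is_disallowed(path, disallow_pattern):
          # Rough approximation of Googlebot-style robots.txt matching:
          # "*" matches any run of characters, "$" anchors the end of the
          # path, and an unanchored pattern matches any path it prefixes.
          # Whether rogerbot matches exactly this way is an assumption.
          regex = re.escape(disallow_pattern).replace(r"\*", ".*")
          if regex.endswith(r"\$"):
              regex = regex[:-2] + "$"
          return re.match(regex, path) is not None

      # The product URL from the crawl export (path only), tested against
      # the rule I actually used and a wildcard variant.
      path = "/3-Star-tm-Bulbing-Type-Blind-Rivets-Anodized-p/rv006-316x039354-coan.htm"
      print(is_disallowed(path, "/-p/"))   # False - only matches paths that start with "/-p/"
      print(is_disallowed(path, "/*-p/"))  # True  - the wildcard lets it match the product folder

      If that approximation holds, the /-p/ rule as written simply never matches these URLs, while the wildcard version would.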

      Any thoughts or hints on reaching my goal would REALLY be appreciated; I've been trying for weeks now. Honestly - virtual beers for everyone!

      Carlo

      • andresgmontero

        Hi, as far as I know wildcard characters (like "*") are not allowed there; each line must be an Allow, Disallow, comment, or blank-line statement. So before you get angry at Roger for not listening to you, go to Google Webmaster Tools > Crawler Access and test the robots.txt file. Hope it works.

        • AspenFasteners @andresgmontero

          Hi Andres!

          Sorry, I thought I answered this earlier. If I understand correctly, wildcards ARE allowed, according to this reply to my question on the topic: http://www.seomoz.org/q/does-rogerbot-read-url-wildcards-in-robots-txt

          Hope THIS reply sticks this time!

          • andresgmontero @AspenFasteners

            Wow, thank you! Many of the robots.txt testers still flag wildcards as not allowed, so that's good to know. Thanks!
