The Moz Q&A Forum

    Odd crawl test issues

    Other Research Tools
    • Arropa

      Hi all, first post, be gentle...

      I just signed up for Moz in the hope that it, and the learning that comes with it, will help me improve my web traffic. I have already run into trouble with one of the sites we added to the tool: I cannot get the crawl test to do any actual crawling. I've tried adding the domain three times now, but the initial crawl of a few pages (the automatic one that runs when you add a domain to Pro) will not work for me.

      Instead of getting a list of problems with the site, I have a list of 18 pages where it says 'Error Code 902: Network Errors Prevented Crawler from Contacting Server'. Being a little puzzled by this, I checked the site myself: no problems. I asked several people in different locations (and countries) to have a go, and there were no problems for them either. I ran the same site through the Raven Tools site auditor and got results; it crawled a few thousand pages. I ran the site through Screaming Frog with the Googlebot user agent, and again there were no issues. I also tried Fetch as Googlebot in Webmaster Tools (WMT), and all was fine there.

      I'm very puzzled then as to why Moz is having issues with the site when every other tool is happy with it. I know the homepage takes 7 seconds to load (caching is off at the moment while we tweak the design), but all the other pages take an average of 0.72 seconds to load, according to Screaming Frog.
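One way to repeat the checks described above from your own machine is to request a page while sending a chosen User-Agent header and time the response. The following is a minimal sketch, not Moz's actual crawler behaviour: the `probe` function and the `ROGERBOT_UA` constant are made up for illustration, and "rogerbot" is assumed only as the name token of Moz's crawler (the full header it sends may differ).

```python
import time
import urllib.error
import urllib.request

# Assumption: "rogerbot" is used here only as a User-Agent token for Moz's
# crawler; the full header the real crawler sends may differ.
ROGERBOT_UA = "rogerbot"


def probe(url, user_agent=ROGERBOT_UA, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch url once and return (outcome, seconds_elapsed).

    outcome is an HTTP status code (int) when the server answered, or the
    string "network-error" when no HTTP response came back at all.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    started = time.monotonic()
    try:
        with opener(request, timeout=timeout) as response:
            response.read()  # include the body download in the timing
            outcome = response.status
    except urllib.error.HTTPError as err:
        outcome = err.code  # the server answered, just not with a 2xx
    except OSError:
        # URLError, timeouts and connection resets are all OSError subclasses.
        outcome = "network-error"
    return outcome, time.monotonic() - started
```

Calling `probe("https://www.example.com/")` (a placeholder URL) returns something like a status code plus the elapsed seconds. Keep in mind that this only tests the route from your machine to the server, so a clean result here does not prove the crawler's own route is clear.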

      The site is a Magento one, so we have a lengthy robots.txt, but that is not causing problems for any of the other services. The robots.txt is below.

      # Google Image Crawler Setup

      User-agent: Googlebot-Image
      Disallow:

      # Crawlers Setup

      User-agent: *

      # Directories

      Disallow: /ajax/
      Disallow: /404/
      Disallow: /app/
      Disallow: /cgi-bin/
      Disallow: /downloader/
      Disallow: /errors/
      Disallow: /includes/
      #Disallow: /js/
      #Disallow: /lib/
      Disallow: /magento/
      #Disallow: /media/
      Disallow: /pkginfo/
      Disallow: /report/
      Disallow: /scripts/
      Disallow: /shell/
      Disallow: /skin/
      Disallow: /stats/
      Disallow: /var/
      Disallow: /catalog/product
      Disallow: /index.php/
      Disallow: /catalog/product_compare/
      Disallow: /catalog/category/view/
      Disallow: /catalog/product/view/
      Disallow: /catalogsearch/
      #Disallow: /checkout/
      Disallow: /control/
      Disallow: /contacts/
      Disallow: /customer/
      Disallow: /customize/
      Disallow: /newsletter/
      Disallow: /poll/
      Disallow: /review/
      Disallow: /sendfriend/
      Disallow: /tag/
      Disallow: /wishlist/
      Disallow: /catalog/product/gallery/

      # Files

      Disallow: /cron.php
      Disallow: /cron.sh
      Disallow: /error_log
      Disallow: /install.php
      Disallow: /LICENSE.html
      Disallow: /LICENSE.txt
      Disallow: /LICENSE_AFL.txt
      Disallow: /STATUS.txt

      # Paths (no clean URLs)

      #Disallow: /*.js$
      #Disallow: /*.css$
      Disallow: /*.php$
      Disallow: /*?SID=

      # Pagination

      Disallow: /?dir=
      Disallow: /&dir=
      Disallow: /?mode=
      Disallow: /&mode=
      Disallow: /?order=
      Disallow: /&order=
      Disallow: /?p=
      Disallow: /&p=

      If anyone has any suggestions, be it about the tool or my robots.txt, I would welcome them. As a side note, I'm aware that we are blocking the individual product pages. There are too many products on the site at the moment (250k plus) with manufacturer default descriptions, so we have blocked them and are working on getting the category pages and guides listed. In time we will rewrite the descriptions for the most popular products and unblock them as we go.
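To see what the robots.txt rules actually block for a given crawler, one option is to feed them to a parser and test a few paths. This is a rough sketch using Python's standard `urllib.robotparser`. The rules are a shortened copy of the ones posted above, the sample paths are hypothetical, and "rogerbot" is assumed as the crawler's user-agent token. Python's parser does simple prefix matching and does not interpret wildcards, so other crawlers may read some rules differently.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules posted above, with each group written as
# one contiguous block.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /ajax/
Disallow: /catalog/product
Disallow: /catalogsearch/
Disallow: /customer/
"""


def blocked_paths(robots_txt, user_agent, paths):
    """Return the subset of paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical sample paths; "rogerbot" is an assumed user-agent token.
sample = ["/", "/guides/", "/catalog/product/view/id/1", "/customer/account/"]
print(blocked_paths(ROBOTS_TXT, "rogerbot", sample))
```

With these rules the homepage and the guides path come back as allowed, and only the product and customer paths are reported as blocked. That fits the idea that the rules limit which URLs may be crawled rather than stopping a crawler from reaching the server at all.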

      Many thanks

      Carl

      • garfield_disliker

        This might not be the most helpful response, but this particular question has popped up in the forums a few times now. Here, here, here, and so on. It seems like it might be something that your hosting provider or your server is blocking, rather than anything in your robots.txt file.
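If you have access to the web server's access log, one way to test this theory is to look for requests from the crawler and see which status codes they received. The sketch below assumes the log is in the common Apache/nginx "combined" format and that the crawler identifies itself with "rogerbot" somewhere in its User-Agent. Both are assumptions, and `status_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: the log uses the Apache/nginx "combined" format, where the
# status code follows the quoted request line and the User-Agent is the
# last quoted field on the line.
LINE_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def status_counts(log_lines, agent_token="rogerbot"):
    """Count HTTP status codes for requests whose User-Agent contains agent_token."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and agent_token.lower() in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts
```

You can pass an open file object straight in, since it yields one line at a time. If the count comes back empty for the period when the crawl ran, the requests probably never reached the web server, which would point at a firewall or something else upstream. A run of 403 or 5xx codes would point at the server itself.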

        • Arropa @garfield_disliker

          Many thanks for the reply. The server we use is a dedicated server which we set up ourselves, including the OS and control panel. It just seems very odd that every other tool is working fine but Moz won't. I cannot see why it would need anything different from, say, Raven's site crawler.

          I will check out those other threads though to see if I missed anything, thanks for the links.

          Just checked port 80 using http:// www.yougetsignal. com/tools/open-ports/ (not sure if links allowed) and no problems there.
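The same kind of port check can be run from any machine you control with a few lines of Python. This is only a sketch, and `port_open` is a made-up helper name. Like the online tool, it reports whether a TCP connection succeeds from wherever it is run, not from the crawler's network.

```python
import socket


def port_open(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

For example, `port_open("www.example.com", 80)` (a placeholder host) returns True or False. Running it from a few different networks gives a slightly broader picture than a single check.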

          • DavidLee

            Network errors can occur somewhere between us and your site and are not necessarily caused by your server itself. The best bet would be to check with your ISP for any connectivity issues to your server. Since this is the first time these errors have been reported, the next crawl may be more successful.

            One thing, though: you will want to keep each user-agent group in a single block, with no blank lines between the User-agent line and its directives.

            So

            # Crawlers Setup

            User-agent: *

            # Directories

            Disallow: /ajax/
            Disallow: /404/
            Disallow: /app/

            would need to look like:

            # Crawlers Setup
            User-agent: *
            # Directories
            Disallow: /ajax/
            Disallow: /404/
            Disallow: /app/
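For a long file, the tidy-up could be scripted rather than done by hand. The following is a minimal sketch, assuming that every non-comment line other than `User-agent:` belongs to the current group. `compact_groups` is a hypothetical helper, not part of any Moz tool. It drops blank lines inside a group, keeps comments with the directive that follows them, and leaves a single blank line between groups.

```python
def compact_groups(robots_txt: str) -> str:
    """Remove blank lines inside robots.txt groups, keeping one between groups."""
    out = []          # finished output lines
    pending = []      # comment lines waiting for the directive that follows them
    in_rules = False  # True once the current group has at least one rule
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if not line:
            continue  # drop every blank line; separators are re-added below
        if line.startswith("#"):
            pending.append(line)
            continue
        is_agent = line.lower().startswith("user-agent:")
        if is_agent and in_rules:
            out.append("")  # a new group starts, so separate it from the last
            in_rules = False
        elif not is_agent:
            in_rules = True
        out.extend(pending)
        pending.clear()
        out.append(line)
    out.extend(pending)  # trailing comments, if any
    return "\n".join(out) + "\n"
```

Running it over the spaced-out example above gives the compact form, and a file with several groups keeps one blank line between them. It would be worth reviewing the output by eye before uploading it, since the sketch does not validate the directives themselves.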

            • Arropa @DavidLee

              Thanks for the hints re the robots, will tidy that up.
