The Moz Q&A Forum

    Initial Crawl Questions

    • RyanKent
      RyanKent last edited by

      Hello.

      I just joined and used the Crawl tool. I have many questions and am hoping the community can offer some guidance.

      1. I received an Excel file with 3k+ records. Is there a friendly online viewer for the Crawl report? Or is the Excel file the only output?

      2. Assuming the Excel file is the only output, the Time Crawled is a number (e.g. 1305798581). I have tried changing the field to a date/time format but that did not work. How can I view the field as a normal date/time such as May 15, 2011 14:02?

      3. I use the ™ symbol in my Title. This symbol appears in the output as a few ASCII characters. Is that a concern? Should I remove the trademark symbol from my Title?

      4. I am using XenForo forum software. All forum threads automatically receive a Title Tag and Meta Description as part of a template. The Crawl Test report shows my Title Tag and Meta Description as blank for many threads. I have looked at the source code of several pages and they all have clean Title tags and I don't understand why the Crawl Report doesn't show them. Any ideas?

      5. In some cases the HTTP Status Code field shows a result of "3". What does that mean?

      6. For every URL in the Crawl Report there is an entry in the Referrer field. What exactly is the relationship between these fields? I thought the Crawl Tool would inspect every page on the site. If a page doesn't have a referring page is it missed? What if a page has multiple referring pages? How is that information displayed?

      7. Under Google Webmaster Tools > Site Configurations > Settings > Parameter Handling I have the options set as either "Ignore" or "Let Google Decide" for various URL parameters. These are "pages" of my site which should mostly be ignored. For example a forum may have 7 headers, each one of which can be sorted in ascending or descending order. The only page that matters is the initial page. All the rest should be ignored by Google and the Crawl.

      Presently there are 11 records for many pages which really should only have one record due to these various sort parameters. Can I configure the crawl so it ignores parameter pages?

      I am anxious to get started on my site. I dove into the crawl results and it's just too messy in its present state for me to pull out any actionable data. Any guidance would be appreciated.

      • mattbeswick
        mattbeswick last edited by

        I can help with a few of those:

        1. Looks like you're using the crawl tool. If this is for an on-going project, go to http://www.seomoz.org/campaigns and set one up. That way you get a sexy GUI (if you like robots that is) and weekly crawls / rank tracking.

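        2. That number is almost certainly a UNIX timestamp (seconds since January 1, 1970 UTC). To convert it inside Excel, use the formula below, where A1 is the cell holding the timestamp and the -5/24 term shifts the result from UTC to UTC-5, so adjust or drop it for your own time zone (don't forget to format the cell as a date, otherwise you just see a random number!):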

        =(A1/86400)+25569+(-5/24)
        

        3. I wouldn't worry about that at all - the crawler converts any non-standard characters to ASCII but, as far as I know, it won't affect your SERP performance.

        4. Could you give a few examples of the pages that are affected so I can take a look?

        5. That's either a bug or (not too likely but worth checking) an issue with how the numbers are formatted in your spreadsheet. I'd advise opening the file using a text editor to check that the numbers that Excel shows match up with the raw format and, if they do, submitting a bug report to the SEOMoz team.

        6. The referrer cell tells you how the crawler got to that page. If you don't have any internal links to a page on your site then, chances are, the crawler won't find it. The only caveat to that (and I'm not 100% sure so would need confirmation) is if the crawl tool uses external linking data. I'd always assumed it didn't, but SEOMoz will know where some of your pages are even if you don't link to them internally, as external sites will point to them. If that's the case it could be the reason that the referrer cell is blank.

        7. Remember that this is SEOMoz crawling your site, not Google. Anything you set in Webmaster Tools isn't visible to other search engine spiders such as those used by Bing, Yahoo!, SEOMoz, Majestic, etc. Because of that they won't know how to handle your URL parameters. You're best setting this through either a meta robots tag, robots.txt, or .htaccess (depending on what you're trying to do). Be careful though - if you mess it up there's a strong possibility that you'll end up blocking pages that you want the search engines to be able to access!
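For anyone who would rather convert the Time Crawled column outside Excel, the timestamp arithmetic from point 2 can be sketched in Python. This is only an illustration: the function names are made up for this example, and the UTC offset is a parameter you would set for your own time zone.

```python
from datetime import datetime, timedelta, timezone

# Excel serial number for 1970-01-01 in the default 1900 date system.
EXCEL_EPOCH_OFFSET_DAYS = 25569
SECONDS_PER_DAY = 86400


def unix_to_datetime(ts, utc_offset_hours=0):
    """Convert a UNIX timestamp to a datetime in a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ts, tz)


def unix_to_excel_serial(ts, utc_offset_hours=0):
    """Mirror the spreadsheet formula =(A1/86400)+25569+(offset/24)."""
    return ts / SECONDS_PER_DAY + EXCEL_EPOCH_OFFSET_DAYS + utc_offset_hours / 24


if __name__ == "__main__":
    crawled = 1305798581  # the sample value from the question
    print(unix_to_datetime(crawled).strftime("%B %d, %Y %H:%M"))      # UTC
    print(unix_to_datetime(crawled, -5).strftime("%B %d, %Y %H:%M"))  # UTC-5
```

Passing utc_offset_hours=-5 reproduces the (-5/24) term in the formula above; a fixed offset like this does not account for daylight saving time.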

        Hope that's all helpful... give me a shout if there's anything else.

        • Matt
        • RyanKent
          RyanKent @mattbeswick last edited by

          Thank you very much for the detailed reply.

          For #1, I did start my campaign and I will follow up.

          2. That worked perfectly!

          3. Thank you for the information.

          4. I realize the problem. It appears the crawler differentiates on the slightest difference in a URL. There are many pages which it shows ending with a slash "/" but those pages are often linked to without an ending slash. The latter pages do not show their Titles nor Meta tags in the crawler report. I presume this is just a crawler issue and would not affect SEO performance.

          5. I checked the cell formatting and it is "General" which should be fine. All of the rest of the HTTP Status codes appear normally. What I did notice is that all of the "3" codes refer to attachments. Most attachments show a "3" code, but a few show as 301s.

          6. Good to know, thanks for sharing.

          7. My main follow-up question would be, is there any harm in setting up robots.txt to disregard all parameter URLs? Basically I want to clean things up, and all of those URLs which are style or sorting variations aren't helpful to any crawler, and those pages shouldn't be indexed.
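Before putting parameter rules into robots.txt, it can help to check which URLs a rule would actually catch. The sketch below is a rough illustration only: it implements the `*` and `$` wildcard matching that the major search engines document, it ignores Allow rules and rule precedence, and the parameter names (`order`, `direction`) are hypothetical stand-ins for whatever the forum really uses. Whether a given crawler honors wildcards is something to confirm in that crawler's own documentation.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal pieces, then let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow rule matches the path (Allow rules are ignored here)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical Disallow rules targeting sort parameters only.
DISALLOW = ["/*?order=", "/*&order=", "/*?direction=", "/*&direction="]

if __name__ == "__main__":
    for path in [
        "/forums/general/",
        "/forums/general/?order=title",
        "/forums/general/?page=2&order=title",
        "/threads/welcome.42/",
    ]:
        print(path, "blocked" if is_blocked(path, DISALLOW) else "allowed")
```

Running a list of real URLs from the crawl report through a check like this is one way to catch a rule that blocks more than intended.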

          • mattbeswick
            mattbeswick last edited by

            Good question. There are a few ways of doing it but I'd advise using a canonical URL on each page to tell the search engines where the content stems from. I had a quick look at XenForo and this looks relatively simple to do... although make sure you test things thoroughly just in case 🙂
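As a rough illustration of the canonical URL idea, the sketch below derives one canonical address from the sorted and styled variants of a page. It is not XenForo's own mechanism: the parameter names in IGNORED_PARAMS are hypothetical, and the trailing-slash rule assumes every page on the site is meant to end in "/", which would be wrong for attachment or file URLs.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only change sorting or styling, not content.
IGNORED_PARAMS = {"order", "direction", "style_id"}


def canonical_url(url):
    """Drop ignorable query parameters and normalize the trailing slash."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    # Assumption: pages on this site are meant to end with a slash.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link> element to place in the page's <head>."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


if __name__ == "__main__":
    print(canonical_link_tag("http://example.com/forums/general?order=title&direction=asc"))
```

With a tag like this on every variant, the sorted views and the slash and no-slash versions all point search engines at the same page.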

            © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.