The Moz Q&A Forum


    Working out exactly how Google is crawling my site if I have loooots of pages

    Intermediate & Advanced SEO
    • soeren.hofmayer

      I am trying to work out exactly how Google is crawling my site, including its entry points and the path it takes from there. The site has millions of pages, with hundreds of thousands indexed. I have simple log files with a timestamp and the URL that Googlebot requested. Unfortunately, there are hundreds of thousands of entries even for a single day, and because the site is so large I am finding it hard to work out the spider's paths. Is there a simple way to work this out using the log files and Excel or other tools? Also, I was expecting the bot to go through each level almost instantaneously, e.g. main page --> category page --> subcategory page (with the same timestamp), but this does not appear to be the case. Does the bot follow a path right through to the deepest level it can reach, or is allowed to reach, for that crawl and then return to the higher-level category pages at a later time? Any help would be appreciated.

      Cheers
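
A log that holds only a timestamp and a URL does not record which link led to each request, so a literal click path cannot be read off it directly. One rough way to get an overview anyway is to count hits per hour and per URL depth, which scales better than scrolling through rows in Excel. The sketch below is only an illustration: the line format (`<ISO timestamp> <URL>`), the `example.com` URLs and all function names are assumptions, not details from the logs described above.

```python
from collections import Counter, defaultdict
from datetime import datetime
from urllib.parse import urlparse


def url_depth(url):
    """Count non-empty path segments: '/' -> 0, '/a/b/' -> 2."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])


def parse_line(line):
    """Split one log line into (datetime, url).

    Assumed format: '<ISO timestamp> <URL>', separated by whitespace.
    """
    stamp, url = line.strip().split(maxsplit=1)
    return datetime.fromisoformat(stamp), url


def hits_by_hour_and_depth(lines):
    """Return {hour: Counter({depth: hits})} for all non-blank log lines."""
    counts = defaultdict(Counter)
    for line in lines:
        if not line.strip():
            continue
        when, url = parse_line(line)
        hour = when.replace(minute=0, second=0, microsecond=0)
        counts[hour][url_depth(url)] += 1
    return counts


# Hypothetical sample data standing in for a real log file.
sample = [
    "2011-05-01T00:00:05 http://example.com/",
    "2011-05-01T00:10:42 http://example.com/cars/",
    "2011-05-01T00:59:10 http://example.com/cars/sedans/",
    "2011-05-01T03:15:00 http://example.com/cars/sedans/page-17",
]

for hour, depths in sorted(hits_by_hour_and_depth(sample).items()):
    print(hour.isoformat(), dict(sorted(depths.items())))
```

For a real file, passing the open file object to `hits_by_hour_and_depth` instead of `sample` should work, since it only iterates over lines. If deep pages show up hours after their parent categories, that would at least show that the bot is not walking each branch in a single pass.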

      • eyepaq

        I've run into the same issue on a site with 40k+ pages. That is far from your overall page count, but maybe the flow is the same overall.

        The site I was working on had a structure about 5 levels deep. Some of the areas in the last level were out of reach and didn't get indexed. More than that, even a few areas on level 2 were not present in the Google index, and Googlebot didn't visit those either.

        I created a large XML sitemap and a dynamic HTML sitemap with all the pages from the site and submitted the XML sitemap via Webmaster Tools, but that didn't solve the issue: the same areas stayed out of the index and didn't get hit. Anyway, the huge HTML sitemap was impossible to follow from a user's point of view, so I didn't keep it online for long, but I am sure it can't work that way either.

        What I did that finally solved the issue was to spot the exact areas that were left out and identify the "head" of those pages, meaning the several pages that acted as gateways for the entire module. I then built a few outside links that pointed to those pages directly, and a few that pointed to the main internal pages of the modules that were left out.

        Those pages gained authority fast, and within only a few days we spotted Googlebot staying overnight 🙂

        All pages are now indexed and even ranking well.

        If you can spot some entry pages that can lead the spider to the rest, you can try this approach. It should work for you too.

        As for links, I started with social network links, a few posts with links on the site's blog (so that means internal links) and only a couple of outside links: articles with in-content links to those pages. Overall I think we are talking about 20-25 social network links (Twitter, Facebook, Digg, StumbleUpon and Delicious), about 10 blog posts published over a 2-3 day span and about 10 articles in outside sources.

        Since you have a much larger number of pages, you will probably need more gateways, and that means more links. Overall it's not a very time-consuming job and it can solve your issue... hopefully 🙂
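
For the first step, spotting the areas that were left out, one possible approach is to compare the full list of site URLs (for example, the ones in the XML sitemap) against the URLs Googlebot requested in the logs, grouped by top-level section. This is a minimal sketch under assumptions: the `example.com` URLs, the function names and the 50% threshold are all hypothetical, and "section" is taken to mean the first path segment, which may not match how a given site is organised.

```python
from collections import defaultdict
from urllib.parse import urlparse


def section_of(url):
    """Return the first path segment, or '(root)' for the home page."""
    segments = [seg for seg in urlparse(url).path.split("/") if seg]
    return segments[0] if segments else "(root)"


def crawl_coverage(all_urls, crawled_urls):
    """Return {section: (crawled_count, total_count)} over the known URLs."""
    crawled = set(crawled_urls)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for url in set(all_urls):
        section = section_of(url)
        totals[section] += 1
        if url in crawled:
            hits[section] += 1
    return {section: (hits[section], totals[section]) for section in totals}


def neglected_sections(coverage, threshold=0.5):
    """List sections where the crawled share is below the threshold."""
    return sorted(
        section
        for section, (hit, total) in coverage.items()
        if hit / total < threshold
    )


# Hypothetical data: every URL on the site, and those seen in the bot log.
all_urls = [
    "http://example.com/",
    "http://example.com/cars/",
    "http://example.com/cars/a",
    "http://example.com/boats/",
    "http://example.com/boats/x",
    "http://example.com/boats/y",
]
crawled_urls = [
    "http://example.com/",
    "http://example.com/cars/",
    "http://example.com/cars/a",
    "http://example.com/boats/",
]

coverage = crawl_coverage(all_urls, crawled_urls)
print(coverage)
print(neglected_sections(coverage))
```

Sections that come out as neglected are candidates for the gateway pages described above. The comparison only makes sense if both lists use the same URL form (same protocol, host and trailing-slash style).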

        • wazza1985
          wazza1985 @eyepaq last edited by

          Can you explain how you built your sitemap for this, please?

          © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.