The Moz Q&A Forum

    How can I get a list of every url of a site in Google's index?

    Intermediate & Advanced SEO
    • 94501
      94501 last edited by

      I work on a site that has almost 20,000 URLs in its sitemap. Google WMT claims 28,000 are indexed, and a site: search on Google shows 33,000. I'd like to find out what accounts for the difference.

      Is there a way to get an Excel sheet with every URL Google has indexed for a site?

      Thanks... Mike

      • Martijn_Scheijbeler
        Martijn_Scheijbeler last edited by

        Hi Mike,

        There are a couple of solutions, though neither of them will give you 100% of the data. The best would be to export a list of landing pages from Google Analytics, or your favorite web analytics tool, segmented by organic search/Google. This gives you a list of pages that received traffic via search and so must be indexed. Cross-referencing them with your sitemaps might already help you out a bit. Besides that, you could crawl and scrape the URLs from a site:xxx.com search.
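The cross-referencing step could be sketched roughly like this in Python. It is only an illustration: `normalize` and `cross_reference` are made-up helper names, and it assumes both lists hold full URLs (an analytics export that only contains paths would need the hostname prepended first).

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so that scheme and trailing-slash
    differences do not show up as false mismatches."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return host + path


def cross_reference(sitemap_urls, landing_urls):
    """Split URLs into three buckets: in both lists, sitemap only,
    and organic landing pages that the sitemap does not list."""
    sitemap = {normalize(u) for u in sitemap_urls if u.strip()}
    landing = {normalize(u) for u in landing_urls if u.strip()}
    return {
        "in_both": sorted(sitemap & landing),
        "sitemap_only": sorted(sitemap - landing),
        "landing_only": sorted(landing - sitemap),
    }


if __name__ == "__main__":
    # Hypothetical sample data; replace with your own exports.
    sitemap_urls = [
        "https://www.example.com/",
        "https://www.example.com/products/widget",
    ]
    landing_urls = [
        "http://www.example.com/products/widget/",
        "https://www.example.com/tag/blue",
    ]
    for bucket, urls in cross_reference(sitemap_urls, landing_urls).items():
        print(bucket, len(urls), urls)
```

The `landing_only` bucket is the interesting one here: those pages received organic traffic, so they are indexed, yet they are missing from the sitemap.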

        • Kingof5
          Kingof5 last edited by

          Might be something you can get from the WMT API.

          Also, to really see how many pages are indexed, do a site:xxxx.com search, go to the last page, include omitted results, go to the last page again, and add up how many you have. That's probably the most accurate number.

          • 94501
            94501 @Martijn_Scheijbeler last edited by

            Hi Martijn,

            Thanks for the suggestions. 2.5 years of GA organic landing pages comes to 10,000 URLs: half as many as the sitemap and a third as many as Google says are indexed. On scraping Google, do you know of a tool for that?

            Thanks... Mike

            • 94501
              94501 @Kingof5 last edited by

              Yes, the WMT API doesn't have it. The site:xxxx.com search is where I got one of the two too-high numbers. Thanks... Mike

              • DJ123
                DJ123 last edited by

                You could probably write a macro to do this, although just because you could doesn't mean you should. I don't think it is advisable, because you do not want to violate anyone's terms of use. That is never a good thing.

                • KaneJamison
                  KaneJamison @94501 last edited by

                  You can do that with a tool like Scrapebox or Outwit. Go slow, or else you'll need to use proxies to get Google to respond fast enough. As another commenter mentioned, it's probably against TOS.

                  • KaneJamison
                    KaneJamison last edited by

                    If this is still an issue you're facing, have you checked the sitemap settings to see which page types are getting included? For example, a site with a few thousand tag pages that are not listed in the sitemap and not yet set to noindex could easily produce extra indexed pages like this.

                    The next thing to check is parameterization. Is anything going on there with search URLs or product URLs? E.g. ?refid=1235134&q=search+term or ?prod=152134&variant=blue
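If you have a list of URLs to inspect (from a crawl, server logs, or an analytics export), a rough way to spot parameterization is to group them by path and parameter names while ignoring the values. This is just a sketch; `parameter_signature` and `count_signatures` are hypothetical helper names, and the sample URLs are invented from the examples above.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_signature(url: str) -> str:
    """Return the path plus the sorted query parameter names, with the
    values dropped, e.g. '/search?q&refid'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    if not names:
        return path
    return path + "?" + "&".join(names)


def count_signatures(urls):
    """Count how many URLs share each path + parameter-name pattern."""
    return Counter(parameter_signature(u) for u in urls)


if __name__ == "__main__":
    # Hypothetical sample URLs.
    urls = [
        "https://domain.com/search?refid=1235134&q=search+term",
        "https://domain.com/search?q=other&refid=9",
        "https://domain.com/product?prod=152134&variant=blue",
        "https://domain.com/about",
    ]
    for signature, count in count_signatures(urls).most_common():
        print(count, signature)
```

A pattern with a very high count is a likely source of extra indexed pages, since every parameter combination can be indexed as its own URL.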

                    If you really want to scrape through Google, take the list of URLs from your sitemap and scrape queries like "inurl:domain.com/a", "inurl:domain.com/b", "inurl:domain.com/c", etc. This should let you dig deeper than a single site: search and see what Google really has indexed. For subfolders with tons of URLs, like domain.com/product/a, you'll want to do the same thing at the subfolder level instead of at the root.
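Building that list of queries by hand gets tedious, so here is a minimal sketch that only generates the query strings. It does not send anything to Google, and as others in this thread pointed out, running them automatically is probably against the TOS. `inurl_queries` and `busy_subfolders` are made-up names, and the threshold for what counts as a busy subfolder is an assumption you would tune yourself.

```python
import string
from collections import Counter
from urllib.parse import urlsplit


def inurl_queries(domain, subfolder=""):
    """Build one 'inurl:' query per leading character (a-z, 0-9),
    either at the root or inside a single subfolder."""
    base = domain.strip("/")
    folder = subfolder.strip("/")
    prefix = f"{base}/{folder}/" if folder else f"{base}/"
    return [f"inurl:{prefix}{ch}" for ch in string.ascii_lowercase + string.digits]


def busy_subfolders(sitemap_urls, threshold):
    """Return first-level folders that contain more than `threshold`
    sitemap URLs, i.e. the ones worth querying at subfolder level."""
    counts = Counter()
    for url in sitemap_urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        if len(segments) > 1:  # the URL sits inside a folder
            counts[segments[0]] += 1
    return sorted(name for name, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical sitemap URLs.
    sitemap_urls = [
        "https://domain.com/product/a1",
        "https://domain.com/product/b2",
        "https://domain.com/about",
    ]
    queries = inurl_queries("domain.com")
    for folder in busy_subfolders(sitemap_urls, threshold=1):
        queries += inurl_queries("domain.com", folder)
    print(len(queries), queries[:3], queries[-1])
```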
