The Moz Q&A Forum

    How to determine which pages are not indexed

    Technical SEO Issues
    • BlueprintMarketing

      I hope the two links below will give you the information you are looking for. I believe you will find quite a bit in the second link. The first link is a free resource for finding exactly how many pages have been indexed; as for how many have not, you can only find that using the second link.

      http://www.northcutt.com/tools/free-seo-tools/google-indexed-pages-checker/

      along with

      http://support.google.com/webmasters/bin/answer.py?hl=en&answer=2642366

      Go to Advanced, and it will offer you a "show all" option.

      • BlueprintMarketing
        BlueprintMarketing, replying to @TakeshiYoung

        http://www.screamingfrog.co.uk/

        Google Analytics should be able to tell you the answers to this as well. I'm sorry I did not think of that earlier; however, I stand by my Google Webmaster Tools suggestion, especially after consulting with a few more people.

        You can also use

        http://marketing.grader.com/

        When it is done, go to the SEO section and scroll to the bottom; you will see exactly how many pages Google has successfully indexed.

        Mr. Young,

        I would like to know: if this person does not have a 301 redirect, would your site scan work successfully? Under your directions it would not, and I'm not giving you a thumbs down on it, you know.

        • ThompsonPaul
          ThompsonPaul, replying to @BlueprintMarketing

          Thomas, as Takeshi has tried to point out, you have misread the original question. The original poster is asking for a way to find the actual URLS of pages from his site that are NOT indexed in the search engines.

          He is not looking for the number of URLs that are indexed.

          None of the tools you have repeatedly mentioned can provide this information, which is likely why your response was downvoted.

          Best to carefully read the original question to ensure you are answering what is actually being asked, rather than what you assume is being asked. Otherwise you add significant confusion to the attempt to provide an answer to the original poster.

          Paul

          • ThompsonPaul

            There is no individual tool capable of providing the info you're looking for, Seth. At least as far as I've ever come across.

            HOWEVER! It is possible to do it if you are willing to do some of the work on your own to collect and manipulate data using several tools. Essentially this method automates the approach Takeshi has mentioned.

            The short answer
            First you'll create a list of all the pages on your website. Then you'll create a list of all the URLs that Google says are indexed. From there, you will use Excel to subtract the indexed URLs from the known URLs, leaving a list of non-indexed URLs, which is what you asked for.

            Ready? Here's how.

            Collect a list of all your site's pages
            You can do this in several ways. If you have a reliable and complete sitemap, you can get this data there. If your CMS is capable of outputting such a list, great. If neither of these is an option, you can use the Screaming Frog spider to get the data (remember, the free version will only collect up to 500 pages). Xenu's Link Sleuth is also an alternative. Put all these URLs into a spreadsheet.
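            If your sitemap is a standard sitemaps.org XML file, the URL list for this first step can be pulled out with a few lines of Python. This is a minimal sketch, assuming a single `<urlset>` sitemap rather than a sitemap index file; the function name `urls_from_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_data) -> list[str]:
    """Return every <loc> URL in a sitemap <urlset>.

    xml_data: the raw bytes of the sitemap file (a str also works).
    Assumption: this is a plain sitemap, not a sitemap index file
    (an index lists further sitemaps under <sitemap><loc> instead).
    """
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
```

            Pass it the bytes of your sitemap (for example the result of `open("sitemap.xml", "rb").read()`, with a hypothetical file name) and paste the returned URLs into your spreadsheet, one per row.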

            Collect a list of all pages Google has indexed.
            You'll do this using a scraper tool that will "scrape" all the URLs off a Google SERP page. There are many tools to do this; which one is best will depend largely on how big your site is. Assuming your site is only 700 or 800 pages, I recommend the brilliantly simple SERPS Redux bookmarklet from Liam Delahunty.

            Clicking on the bookmarklet while on a SERP page will automatically scrape all the URLs into an easily copyable format. The trick is, you want the SERP page to display as many results as possible, otherwise you'll have to iterate through many, many pages to catch everything.

            So, pro tip: if you click the settings icon while on any Google search page and select Search Settings, you will see the option to have your searches return up to 100 results instead of the usual 10. You have to select Never Show Instant Results for the Results per Page slider to become active.

            Now, in Google's search box, you'll enter site:mysite.com as Takeshi explained. (NOTE: use the canonical version of your domain, so include the www if that's the primary version of your site) You should now have a page listing 100 URLs of your site that are indexed.

            • Click the SERPS Redux bookmarklet to collect them all, then copy and paste the URLs into a spreadsheet.
            • Go back to the site:mysite.com results page, click for page 2, and repeat, adding the additional URLs to the same spreadsheet.
            • Repeat this process until you have collected all the URLs Google lists.
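            Because the sitemap, the crawl, and the site: results may each write the same page slightly differently (http vs https, a trailing slash, an upper-case host), it can help to normalize every URL while merging the pages you collect. A minimal sketch, assuming each SERP page's scraped URLs were pasted as plain text with one URL per line (the bookmarklet's real output format may differ); `normalize_url` and `merge_pages` are hypothetical helpers, and treating `/page` and `/page/` as the same page is an assumption that may not hold on every site.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Return host + path (+ query) with cosmetic differences removed.

    Assumptions (adjust for your site):
    - the scheme (http vs https) is ignored;
    - the host is lower-cased but www is kept, so www and non-www stay distinct;
    - a trailing slash on the path is dropped;
    - any #fragment is discarded.
    """
    url = url.strip()
    if "://" not in url:
        # Scraped lists sometimes omit the scheme; add one so urlsplit finds the host.
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = (parts.path or "/").rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_pages(pasted_pages: list[str]) -> list[str]:
    """Merge the URLs pasted from each SERP page into one de-duplicated, ordered list."""
    seen: dict[str, None] = {}  # a dict keeps first-seen order, unlike a set
    for page_text in pasted_pages:
        for line in page_text.splitlines():
            if line.strip():
                seen.setdefault(normalize_url(line), None)
    return list(seen)


# Example: two pasted result pages that overlap on /a
# merge_pages(["https://www.example.com/a\nhttps://www.example.com/b/",
#              "http://www.example.com/a/\nwww.example.com/c"])
# returns ['www.example.com/a', 'www.example.com/b', 'www.example.com/c']
```

            Keeping `www` in the host is deliberate: it mirrors the note above about using the canonical version of the domain, so a non-canonical host shows up as a separate entry instead of being silently merged.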

            Remove duplicates to leave just un-indexed URLs
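            Now you have a spreadsheet with all known URLs and all indexed URLs. Use Excel to delete every URL that appears in both lists (not just the extra copy, which is all Excel's Remove Duplicates command removes), and what you will be left with is all the URLs that Google doesn't list as being indexed.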
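            The subtraction in this step is a set difference: a known URL survives only if it does not appear in the indexed list at all. A minimal Python sketch of that difference; the function name `non_indexed` is hypothetical, and it assumes surrounding whitespace and a trailing slash are the only cosmetic differences between the two lists.

```python
def non_indexed(all_urls: list[str], indexed_urls: list[str]) -> list[str]:
    """Return the known URLs that never appear in the indexed list (set difference)."""

    def key(url: str) -> str:
        # Assumption: surrounding whitespace and a trailing slash are cosmetic.
        return url.strip().rstrip("/")

    indexed = {key(u) for u in indexed_urls}
    result: list[str] = []
    seen: set[str] = set()
    for url in all_urls:
        k = key(url)
        # Drop the URL entirely if Google lists it; a plain "remove duplicates"
        # would instead keep one copy of every indexed URL.
        if k not in indexed and k not in seen:
            seen.add(k)
            result.append(url.strip())
    return result


# Example:
# non_indexed(["https://example.com/", "https://example.com/a",
#              "https://example.com/b/", "https://example.com/c"],
#             ["https://example.com", "https://example.com/b"])
# returns ['https://example.com/a', 'https://example.com/c']
```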

            Voilà!

            A few notes:

            • The site: search operator doesn't guarantee that you'll actually get all indexed URLs, but it's the closest you'll be able to get. For an interesting experiment, re-run this process with the non-canonical version of your site address as well, to see where you might be indexed for duplicates.
            • If your site is bigger, or you will need to do this multiple times, there are tools that will scrape all the SERP pages at once so you don't have to iterate through them. The scraper components of SEER's SEO Toolbox or Niels Bosma's SEO Tools for Excel are good starting points. There is also a paid tool called ScrapeBox designed specifically for this kind of scraping. It's a blackhat tool, but in the right hands it is also powerful for whitehat purposes.
            • Use Takeshi's suggestion of running some of the resulting non-indexed list through manual site: searches to confirm the quality of your list.
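            For the manual confirmation in the last note, a small helper can turn a random sample of the non-indexed list into ready-to-click Google searches. This is a sketch, assuming the standard `https://www.google.com/search?q=` search URL; `site_query` and `spot_check_links` are hypothetical names, and the code only builds links for you to open by hand. It does not query Google itself.

```python
import random
from typing import Optional
from urllib.parse import quote_plus, urlsplit


def site_query(url: str) -> str:
    """Turn a full URL into a site: query without the scheme, e.g. site:example.com/page.html."""
    parts = urlsplit(url.strip())
    target = parts.netloc + parts.path
    if parts.query:
        target += "?" + parts.query
    return "site:" + target


def spot_check_links(urls: list[str], sample_size: int = 10,
                     seed: Optional[int] = None) -> list[str]:
    """Build Google search links for a random sample of URLs to check by hand."""
    rng = random.Random(seed)  # pass a seed to get the same sample on every run
    picked = rng.sample(urls, min(sample_size, len(urls)))
    return ["https://www.google.com/search?q=" + quote_plus(site_query(u)) for u in picked]
```

            Sampling rather than checking everything keeps the manual work small; if several sampled URLs turn out to be indexed after all, the scraped list was probably incomplete and is worth re-collecting.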

            Whew! I know that's a lot to throw at you as an answer to what probably seemed like a simple question, but I wanted to work through the steps for you, rather than just hint at how it could be done.

            Be sure to ask about any of the areas where my explanation isn't clear enough.

            Paul

            • BlueprintMarketing
              BlueprintMarketing, replying to @ThompsonPaul

              Dear Paul,

              Thank you for taking the time to address this.

              I was extremely hasty when I wrote my first answer: I copied and pasted it from the dictation software I use, and then wrongly said this was the correct way to do something. However, Screaming Frog SEO Spider, a tool I referenced early on, lets you see 100% of the links you are hosting at the time you run the scan.

              It also includes the ability to check whether a page is indexed by Google, Bing, and Yahoo. When I referenced this software nobody took notice, probably because I looked like I did not know what I was talking about.

              In hindsight I should have kept bringing up Screaming Frog; instead, I brought up other ways to check lost links. In my opinion, going into Google and clicking one by one on what you do or do not know is indexed is a very long and arduous task.

              In Screaming Frog, you can click Internal, then right-click a URL and choose Check Index; a table drops down on the right side where you can select from the three big search engines. You can do many more things with this fantastic tool, but I did not illustrate how it should be used, or what its capabilities are, as well as I am doing now. I truly thought that once I had referenced it, somebody would look into it and see what I was talking about. However, hindsight is 20/20. I appreciate your comment very much and hope you can see that, yes, I was mistaken in the beginning, but I did come up with an automated tool to answer the question that was asked.

              Screaming Frog runs on PC, Mac, or Linux. It is free to download, and there is a paid version with even more capabilities than the free edition. It is only 2 MB in size and uses almost no RAM on a Mac; I don't know how big it is on the PC.

              Here's the link to the software:

              http://www.screamingfrog.co.uk/seo-spider/

              I hope you will accept my apologies for not paying as much attention as I should have to what I pasted, and I hope this tool will be of use to you.

              Respectfully,

              Thomas

              • ThompsonPaul
                ThompsonPaul, replying to @BlueprintMarketing

                Thanks for the reminder that Screaming Frog has that "Check Index" functionality, Thomas.

                Unfortunately, I've never been able to get that method to check more than one link at a time, as all it does is send the request to a browser to check. Even highlighting multiple URLs and checking for indexation only checks the first one. Great for spot checks, but not what Seth is looking for, I don't think. My other post details an automatic way to check a site's hundreds (or thousands) of pages at a time.

                I only have the free version of Screaming Frog on this machine at the moment so would be very interested to know if the paid version changes this.

                Paul

                • BlueprintMarketing
                  BlueprintMarketing, replying to @ThompsonPaul

                  Hi Paul,

                  I have not had any luck with Screaming Frog actually checking every link it claims it will, either. You're exactly right: it will check the homepage or the single link you choose, but in my experience it will not check everything. I have a friend who has the paid version; I will ask him.

                  I'll be sure to let you know. I agree with you; I just found this out myself. In fact, it is misleading to say "check all" when it really checks just one.

                  Excellent tutorial, by the way, on how to do this task, which seems easy but is truly not easy at all when you attempt it.

                  Sincerely,

                  Thomas

                  PS: A site:www.example.com search

                  gives me the opportunity to see all the indexed pages Google has processed. However, I would have to compare them against a CSV file to actually know what is missing.

                  I really like your example and will definitely use it in the future.

                    • AaronH

                      Crawl the domain using Screaming Frog (SF), then use URL Profiler to check each URL's indexation status.

                      You'll need proxies.

                      It can be done with ScrapeBox too.

                      Otherwise, you can probably use Google Sheets with some IMPORTXML wizardry to create a query on Google.

                      http://www.screamingfrog.co.uk/

                      http://urlprofiler.com/

                      http://www.scrapebox.com/

                      • BlueprintMarketing
                        BlueprintMarketing, replying to @ThompsonPaul

                        DeepCrawl will provide the information with one tool. It isn't inexpensive, but it's definitely the best tool out there. You have to connect it to Google Analytics for it to give you this information, but it will show you how many of your URLs are indexed and how many are not but should be.

                        http://urlprofiler.com/

                        Connect it to Google Webmaster Tools and Google Analytics, and then to any of the many ways of scraping or indexing the site.

                        Technically that is more than one tool, but it is a good way.

                        All the best,

                        Tom

                        • brettmandoes

                          I'm running into this same issue: about a quarter of a client's site is not being indexed. The site:domain.com trick shows me 336 results, which I somehow need to get into a CSV file, compare against the URLs crawled by Screaming Frog, and then use VLOOKUP to find the unique values.

                          So how can I get those 300+ results exported to a CSV file for analysis?
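                          For the comparison half of this question (not the export of the SERP results itself), the VLOOKUP step can be replaced by a short script. A minimal sketch, assuming the Screaming Frog export is a CSV whose URL column is headed "Address" and that the site: results are already saved as plain text, one URL per line; if your export begins with a title row above the header row, drop that line first. The function name `unindexed_from_exports` is hypothetical.

```python
import csv
import io


def unindexed_from_exports(crawl_csv: str, indexed_txt: str,
                           url_column: str = "Address") -> str:
    """Return CSV text listing crawled URLs that are missing from the indexed list.

    crawl_csv: contents of the crawler export (assumed to have a header row
    containing `url_column`). indexed_txt: the site: results, one URL per line.
    """
    # Assumption: a trailing slash is the only cosmetic difference between the lists.
    indexed = {line.strip().rstrip("/") for line in indexed_txt.splitlines() if line.strip()}

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([url_column])
    for row in csv.DictReader(io.StringIO(crawl_csv)):
        url = (row.get(url_column) or "").strip()
        # The same test VLOOKUP is used for: keep the row only when no match is found.
        if url and url.rstrip("/") not in indexed:
            writer.writerow([url])
    return out.getvalue()
```

                          Read each file's text with `open(path, encoding="utf-8").read()`, pass both strings in, and write the returned text to a new .csv file to get the list of crawled but unlisted URLs.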

                          • mfrgolfgti
                            mfrgolfgti, replying to @TakeshiYoung

                            Hi, I know this is an old question, but I wanted to ask about the first paragraph of your answer: "You can start by trying the "site:domain.com" search. This won't show you all the pages which are indexed, but it can help you determine which ones aren't indexed."

                            Do you happen to know why a site:domain.com search doesn't show all the indexed pages? I've just discovered this for our website. The site: command shows 73 pages, but checking through the list, lots of pages are not included. However, if I run the site:domain.com/page.html command for those individual pages, they do come up in the search results page. I don't understand why.

                            © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.