The Moz Q&A Forum

    Why do I have so many extra indexed pages?

    Intermediate & Advanced SEO
    • Tylerj
      Tylerj last edited by

      Stats-

      Webmaster Tools Indexed Pages- 96,995

      Site: Search- 97,800 Pages

      Sitemap Submitted- 18,832

      Sitemap Indexed- 9,746

      I went through the search results up to page 28 and every item shown was correct. How do I figure out where these extra 80,000 items are coming from? I tried crawling the site with Screaming Frog a while back, but it locked up because there were so many URLs. The site is a Magento site, so there are a million URLs, but I checked and all of the canonicals are set up properly. Where should I start looking?
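To put a number on the gap, here is a minimal sketch using only the figures quoted in the post. It assumes the "Sitemap Indexed" count is a subset of the Webmaster Tools total, which the post does not state explicitly.

```python
# Figures quoted in the post above.
indexed_total = 96_995      # Webmaster Tools indexed pages
sitemap_submitted = 18_832  # URLs submitted in the sitemap
sitemap_indexed = 9_746     # submitted URLs reported as indexed

# How far the index exceeds the size of the sitemap.
extra_vs_sitemap = indexed_total - sitemap_submitted

# Indexed pages that cannot be sitemap URLs
# (assumption: sitemap_indexed is counted inside indexed_total).
indexed_outside_sitemap = indexed_total - sitemap_indexed

# Share of submitted sitemap URLs that are reported as indexed.
sitemap_coverage = sitemap_indexed / sitemap_submitted

print(f"Index exceeds sitemap size by: {extra_vs_sitemap:,}")
print(f"Indexed pages not from the sitemap: {indexed_outside_sitemap:,}")
print(f"Sitemap URLs indexed: {sitemap_coverage:.1%}")
```

So the "extra 80,000" is roughly 78,000 to 87,000 pages, depending on which baseline is used.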

      • Adriaan.Multiply
        Adriaan.Multiply last edited by

        As long as your organic traffic is doing fine, I wouldn't be too concerned. That being said:

        • Is your robots.txt or Search Console disallowing crawler access to parameters like '?count=' or '?color='?
        • Is your robots.txt disallowing crawler access to URLs that have a 'noindex' but were indexed before they got the noindex?
        • You can also take a couple of parameters from your site and test whether any URLs have been indexed, by using the 'inurl:parameter site:www.site.com' query.
        • Are some of the canonicalized URLs indexed anyway? This may indicate that the page content is different enough for Google to index both versions.
        • If there are a ton of articles that go in and out of stock and use dynamic IDs, Google may keep these in its index. Do out-of-stock articles return a 404 or are they kept alive?
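The 'inurl:parameter site:www.site.com' check from the list above can be generated for several parameters at once. This is only a sketch: `build_inurl_queries` is a hypothetical helper, `count` and `color` are the example parameters from the reply, and `www.site.com` is the placeholder domain. The queries still have to be pasted into Google by hand.

```python
def build_inurl_queries(domain, parameters):
    """Return one 'inurl:<parameter> site:<domain>' query per parameter name."""
    return [f"inurl:{parameter} site:{domain}" for parameter in parameters]


# 'count' and 'color' are the example parameters mentioned in the reply;
# replace them with parameters that actually occur on the site.
for query in build_inurl_queries("www.site.com", ["count", "color"]):
    print(query)
```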
        • Tylerj
          Tylerj @Adriaan.Multiply last edited by

          I wouldn't say it is doing fine. Before I started they launched a new site and messed up the 301 redirects. Traffic hasn't recovered yet.

          For robots.txt I am using the Inchoo file (http://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/). Maybe it is a parameters issue, but I can't figure out how to see all my indexed pages.

          I tried doing a search for both inurl:= site:www.site.com and inurl:? site:www.site.com and nothing showed up unless I am missing something.

          I can't figure out how to check if some of the canonicalized urls are indexed. The pages are all identical though.

          We have fewer than 100 out-of-stock items.

          • Martijn_Scheijbeler
            Martijn_Scheijbeler last edited by

            • Have you checked out the parameters settings in Google Search Console to find out how many pages Google has found for your site with the same parameters? That might give some insights on that side.
            • How many images do you have across the site? Do you have image sitemaps for these kinds of pages?

            What I would advise, and what you've already been trying, is to get a full crawl using either Screaming Frog or DeepCrawl. This will give you better insight into how many pages a search engine can really find.
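Once a crawl (or even a partial one) produces a list of URLs, tallying them by query parameter name shows which parameters generate the most pages. The following is a rough sketch: `count_parameters` is a hypothetical helper and the sample URLs are made up for illustration, not taken from the site in question.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_parameters(urls):
    """Count how many URLs use each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so a bare '?color=' is still counted.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical crawl output, for illustration only.
sample_urls = [
    "https://www.site.com/shoes.html",
    "https://www.site.com/shoes.html?color=12",
    "https://www.site.com/shoes.html?color=12&count=36",
    "https://www.site.com/shirts.html?count=36",
]

for name, total in sorted(count_parameters(sample_urls).items()):
    print(f"{name}: {total} URLs")
```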

            • Tylerj
              Tylerj @Martijn_Scheijbeler last edited by

              1. Your first one is interesting. I actually haven't been in there before. There are 96 rows and every one of them is set to "Let Googlebot decide". Do you think I should change that?

              2. Not sure how many images we have, but it is a lot. No, we do not have an image sitemap.

              I tried Screaming Frog and it couldn't handle it. After about 1.5 million URLs it kept locking up. I just set up a free trial of DeepCrawl. It can only do 10,000 URLs, but I will see if it turns up anything worthwhile.

              • ZestUK
                ZestUK @Tylerj last edited by

                To ensure Screaming Frog can handle the crawl, you could chunk up the site and crawl it in parts, e.g. one subdirectory at a time. This can be done within the 'Configuration' menu under 'Include'. There are loads of tutorials online.

                You can also use 'Exclude' to ensure it doesn't crawl unnecessary pages, images or scripts. For example, on WordPress I often block wp-content.

                It definitely sounds like a problem with query parameters being indexed, though, and it's often good to ensure these are addressed in Search Console.
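As a sketch of the chunking idea, the snippet below builds one include pattern per subdirectory and checks which URLs each pattern would cover. It assumes the crawler's include field takes a regular expression that must match the URL; the subdirectory names (`women`, `men`) and the `include_pattern` helper are hypothetical.

```python
import re


def include_pattern(base_url, subdirectory):
    """Build a regex matching every URL under one subdirectory of the site."""
    base = base_url.rstrip("/")
    section = subdirectory.strip("/")
    # Escape the literal part so '.' in the domain is not treated as a wildcard.
    return re.escape(f"{base}/{section}/") + ".*"


# Hypothetical subdirectories; crawl one pattern at a time.
patterns = [include_pattern("https://www.site.com", name) for name in ["women", "men"]]

url = "https://www.site.com/women/shoes.html?color=12"
for pattern in patterns:
    covered = re.fullmatch(pattern, url) is not None
    print(pattern, "covers" if covered else "skips", url)
```

Crawling each pattern separately keeps every run small enough to finish, at the cost of having to merge the exports afterwards.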

                • Tylerj
                  Tylerj last edited by

                  It ended up being my site's search results pages. I was able to use the site: operator to break it down.
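Breaking the index down with the site: operator can be sketched as one query per path prefix, so the result counts can be compared section by section. The prefixes here are assumptions: `catalogsearch` is the usual Magento search path, but the thread does not say which paths were actually checked, and `build_site_queries` is a hypothetical helper.

```python
def build_site_queries(domain, path_prefixes):
    """Return one 'site:<domain>/<prefix>/' query per path prefix."""
    return [f"site:{domain}/{prefix.strip('/')}/" for prefix in path_prefixes]


# Assumed prefixes for illustration; use the site's real top-level paths.
for query in build_site_queries("www.site.com", ["catalogsearch", "catalog"]):
    print(query)
```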
