The Moz Q&A Forum

    Indexed pages and current pages - Big difference?

    • Nathan.Smith

      Our website shows ~22k pages in the sitemap, but ~56k are showing as indexed on Google through the "site:" command. Firstly, how much attention should we be paying to the discrepancy? If we should be worried, what's the best way to find the cause of the difference?

      The domain canonical is set, so I can't really figure out whether we've got a problem or not.

      • deltasystems

        Yes, this is a potentially significant problem. The easiest way to troubleshoot is to run the 'site:' command again and go to the last page of results. You should see pages there that aren't in your sitemap, very likely duplicated content.
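If you can collect the URLs you spot in the results into a list, the comparison against the sitemap can be scripted. This is only a minimal sketch: the function names, the example.com URLs and the inline sitemap are made up for illustration, and it assumes a standard `<urlset>` sitemap rather than a sitemap index file.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sitemap content, inlined so the example runs offline.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/widgets</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def not_in_sitemap(indexed_urls, sitemap_xml):
    """Return the indexed URLs that are absent from the sitemap, sorted."""
    return sorted(set(indexed_urls) - sitemap_urls(sitemap_xml))


if __name__ == "__main__":
    # URLs copied by hand from search results (hypothetical).
    found = [
        "http://www.example.com/",
        "http://example.com/widgets",
        "http://www.example.com/widgets?sessionid=1",
    ]
    for url in not_in_sitemap(found, EXAMPLE_SITEMAP):
        print(url)
```

The comparison is an exact string match on purpose, so www/non-www variants and query-string variants show up as "not in sitemap", which is the kind of URL you are hunting for.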

        If you are having a rough time troubleshooting, post a link and I'll be glad to take a peek.

        • bronxpad

          You might have a duplicate content issue. Check that you have the proper 301 redirect in place and a canonical link element (rel="canonical") in the head of your pages. If these aren't set properly, search engines will see the www and non-www versions of your site as duplicates. Also remember that search engines, by default, place a trailing slash (/) at the end of the URL.

          Here are two links that can help if this is the issue.

          http://www.webconfs.com/how-to-redirect-a-webpage.php/

          http://www.mattcutts.com/blog/rel-canonical-html-head/
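To check what a page actually declares in its head, something along these lines can pull out the canonical link element. It is a rough sketch using only Python's standard library; `CanonicalFinder` and `find_canonicals` are names invented here, and the HTML in the usage example is made up.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, so split before testing.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html_text):
    """Return all canonical hrefs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals


if __name__ == "__main__":
    page = (
        '<html><head><title>Widgets</title>'
        '<link rel="canonical" href="http://www.example.com/widgets" />'
        '</head><body></body></html>'
    )
    print(find_canonicals(page))
```

An empty list means no canonical is declared, and more than one entry is worth a look as well, since conflicting declarations give search engines mixed signals.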

          Hope this helps. Good Luck

          • grasshopper

            Hi Nathan,

            The delta between the number of pages returned by the site: operator and the number of pages in your sitemap could be down to a number of issues:

            1. Your XML sitemap may represent only a percentage of the total number of valid content URLs that your site is capable of generating.

            a) Often sites will only generate XML sitemaps for URLs that someone has decided are "important", when the total number of URLs is much larger.

            2. Your XML sitemap contains ALL the valid content URLs that your site is capable of generating, but search engines are somehow finding more URLs.

            a) Look in Google Webmaster Tools under Optimization >> HTML improvements >> Duplicate title tags

            i) Do the pages with duplicate titles have duplicate page content?  If so, your publishing platform is allowing multiple URLs to render the same content, which is a bug that needs to be fixed

            b) Run a crawler like Xenu Link Sleuth or Screaming Frog against your site, and see how many URLs they discover.  Export the results to Excel and look for weird URLs

            i) The usual culprits for duplicate content include incorrect canonicalization (www vs non-www, URLs ending in /index.html vs just /, etc)

            ii) Look for URLs ending with strange query strings (affiliate tracking, session IDs, etc)

            c) Use the site: operator in other engines (Bing, blekko, etc) and compare the numbers they return.  Especially if this number is larger than the number Google is returning, start looking for weird URL patterns
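Once you have a crawler export, the hunt for weird URLs described in the list above can be partly automated by grouping URLs that collapse to the same normalized form. This is a sketch under assumptions: the parameter names in `TRACKING_PARAMS` are guesses you would replace with whatever your site really uses, and stripping "www." is only a grouping key, not a statement about which host should be canonical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change page content.
TRACKING_PARAMS = {"sessionid", "sid", "affid", "utm_source", "utm_medium", "utm_campaign"}


def normalize(url):
    """Collapse common duplicate-content variants into one comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Keep only query parameters that are not known tracking/session noise.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Return {normalized key: [original URLs]} for keys seen more than once."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/shop/index.html",
        "http://example.com/shop/",
        "http://www.example.com/shop/?sessionid=abc",
        "http://www.example.com/about",
    ]
    for key, found in duplicate_groups(crawled).items():
        print(key, len(found), found)
```

Each group with more than one member is a set of URLs that probably render the same content. The sketch does not treat /shop and /shop/ as the same page, so add that rule if your server serves both.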

            Also, I'm not sure what you mean by "the domain canonical has been set correctly".  If you're referring to use of the canonical link element for every URL, there are plenty of ways that can go wrong.  E.g., if your CMS requires that each published URL have rel="canonical", but allows URLs to be published with and without the trailing /index.html, you can end up with a canonical link element on the non-canonical version of the URL, further confusing engines.  Something to look into.
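The /index.html case just described can be spotted mechanically if you record, for each crawled URL, the canonical href found on it. A minimal sketch, assuming you already have that mapping (the `declared` dictionary below is invented data, and both function names are hypothetical):

```python
def strip_index(url):
    """Drop a trailing 'index.html' so both variants share one key."""
    suffix = "index.html"
    return url[: -len(suffix)] if url.endswith("/" + suffix) else url


def conflicting_canonicals(declared):
    """declared maps each crawled URL to the canonical href found on it.

    Returns {base URL: sorted canonical hrefs} for every page whose
    with- and without-index.html variants disagree about the canonical.
    """
    targets = {}
    for url, canonical in declared.items():
        targets.setdefault(strip_index(url), set()).add(canonical)
    return {base: sorted(found) for base, found in targets.items() if len(found) > 1}


if __name__ == "__main__":
    declared = {
        # Each variant points at itself: the problem case.
        "http://www.example.com/shop/": "http://www.example.com/shop/",
        "http://www.example.com/shop/index.html": "http://www.example.com/shop/index.html",
        # Both variants agree on one canonical: fine.
        "http://www.example.com/about/": "http://www.example.com/about/",
        "http://www.example.com/about/index.html": "http://www.example.com/about/",
    }
    print(conflicting_canonicals(declared))
```

Any base URL that comes back has variants declaring different canonicals, which is the situation where the canonical element ends up on the non-canonical version of the URL.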
