    Google News not indexing .index.html pages

    Technical SEO Issues
    • H-FARM
      H-FARM last edited by

      Hi all,

      We've been asked by a blog to help them improve their indexing and ranking on Google News (the site is already included in Google News, but with poor results).

      The blog had a chronic URL duplication problem, with each post existing at 3 different URLs:

      #1) www.domain.com/post.html (currently set to noindex by editorial choice, since it shows all the comments)

      #2) www.domain.com/post/index.html (currently indexed showing only top comments)

      #3) www.domain.com/post/ (very same as #2)

      We've chosen URL #2 (/index.html) as the canonical URL and added a rel=canonical tag on URL #3 (/) pointing to URL #2.
      Yesterday we also submitted a Google News sitemap that consistently lists the #2 URLs from the last 48 hours. Google has processed the sitemap properly and reports that all URLs have been submitted and indexed.
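
For anyone who wants to reproduce this kind of sitemap, here is a minimal Python sketch of a Google News sitemap limited to posts from the last 48 hours. The function name `build_news_sitemap`, the article dictionaries, and the publication name are hypothetical. Only the namespaces and element names follow Google's published News sitemap format, and this is not the blog's actual setup.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language, now,
                       max_age=timedelta(hours=48)):
    """Return a News sitemap (as a string) for articles newer than max_age."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        # Skip anything older than the 48-hour window mentioned in the post.
        if now - article["published"] > max_age:
            continue
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Always list the same URL form (here the /index.html one, URL #2).
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = (
            article["published"].isoformat()
        )
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical data: one recent post and one that falls outside the window.
example_now = datetime(2011, 4, 23, 12, 0, tzinfo=timezone.utc)
example_articles = [
    {"loc": "http://www.domain.com/new-post/index.html",
     "title": "New post",
     "published": example_now - timedelta(hours=5)},
    {"loc": "http://www.domain.com/old-post/index.html",
     "title": "Old post",
     "published": example_now - timedelta(hours=72)},
]
print(build_news_sitemap(example_articles, "Example Blog", "it", example_now))
```

Passing `now` in explicitly keeps the 48-hour cutoff reproducible when checking the output.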

      However, if we run the site:domain.com command on Google News, we see something completely different: Google News has actually indexed only some of the news, and specifically only the #3-type URLs (ending with the trailing slash instead of /index.html). Why? What's wrong?

      a) Does the Google News bot have problems indexing URLs ending with /index.html? While figuring out what's wrong, we found that http://news.google.it/news/search?aq=f&pz=1&cf=all&ned=us&hl=en&q=inurl%3Aindex.html returns no results. It seems that the Google News index does not include any URLs ending with /index.html.

      b) Does the Google News bot recognise the rel=canonical tag?

      c) Is it just a matter of time before Google News picks up the right URLs (/index.html), and/or should we notify the Google News team of any changes?

      d) Any suggestions? Or should we do it the other way around, meaning make URL #3 the canonical one?

      While Google News is showing these problems, Google Web search has actually handled the changes well, so we don't know what to do.

      Thanks for your help,

      Matteo

      • BlinkWeb
        BlinkWeb last edited by

        If you used the rel=canonical tag properly and only submitted the sitemap yesterday, it's just a waiting game. You will get crawled and indexed properly soon.
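
One way to confirm the tag is in place is to parse the page and read the canonical back out. Here is a minimal sketch using only the Python standard library. The `find_canonical` helper and the sample markup are assumptions for illustration, not the blog's real pages.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated tokens; compare case-insensitively.
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonical = attr_map.get("href") or None


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the markup, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup for URL #3 (www.domain.com/post/).
sample_html = """<html><head>
<link rel="canonical" href="http://www.domain.com/post/index.html">
</head><body>...</body></html>"""

print(find_canonical(sample_html))
```

Running this against the fetched HTML of each "/" page should return the matching /index.html URL if the tag was deployed as described.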

        • BlinkWeb
          BlinkWeb last edited by

          I had another thought too. Just because Google WMT says the pages are indexed doesn't mean the new content, including the new canonical tags, has been crawled or added to the index yet.

          I recently did a similar project adding canonical tags to an ecommerce site. The new URLs are only showing up correctly in the search results maybe 10% of the time, even for pages I know have been crawled and that I submitted a week ago. The important thing is that more URLs are updated each day.

          I don't believe they throw out their index entry the first time they crawl an established page and something has changed. I believe the index gets updated as they continue to crawl: they compare versions and index data based on aggregates of multiple crawls, especially for existing pages that have been in the index for a while. In other words, if they compare 20 recent crawls and only 1 shows a different version, they may not throw out the old version right away. They may wait until they have crawled the page multiple times and seen the new version in, say, 5 or 10 of the most recent 20 crawls. BTW, I don't have any data to back that up; it's just my personal observation/theory.

          • H-FARM
            H-FARM @BlinkWeb last edited by

            Hi Roger, and thanks for the very insightful answer!

            What about the fact that not a single URL ending with index.html is indexed in Google News?

            http://news.google.it/news/search?aq=f&pz=1&cf=all&ned=us&hl=en&q=inurl%3Aindex.html

            Compare that with the normal Google index:

            http://www.google.it/search?q=inurl%3Aindex.html&hl=en&ned=us&tab=nw

            Doesn't that sound weird to you?

            matteo

            • KeriMorgret
              KeriMorgret @H-FARM last edited by

              My hunch, and it's only a hunch, is that it relates to their URL requirements, which say the URL has to be dedicated to an article. An index.html page is usually not a page dedicated to one individual news story. See http://www.google.com/support/news_pub/bin/answer.py?hl=en&answer=68323 for their URL requirements.

              • BlinkWeb
                BlinkWeb @H-FARM last edited by

                It does sound weird, but I am not sure that search operator works in Google News.

                Here is a simple test. Search Google News for "Google"

                The second story I see is http://phandroid.com/2011/04/22/will-spotify-be-google-musics-savior/

                However a Google News search for "inurl:will-spotify-be-google-musics-savior" returns no results.

                Clearly the story is indexed!

                • BlinkWeb
                  BlinkWeb @H-FARM last edited by

                  They seem to meet these requirements. The only one that is a problem is requirement #3, but the page clearly states that it is waived for News sitemaps, which Matteo said they submitted.

                  With that said, I do like Matteo's option #1 better than the naming convention they chose to go with.

                  • H-FARM
                    H-FARM @H-FARM last edited by

                    Hey Roger,

                    Look, CNN seems to have exactly the same "problem" as we do.

                    http://www.google.com/#q=Obama+makes+stop+in+Los+Angeles%2C+wraps+up+campaign+swing+&fp=3986f88f9d6402d3&hl=en

                    They have the "/" article indexed in Google News and the index.html version in the regular (non-News) Google index. They did exactly what we did, putting a rel=canonical on the "/" version pointing to the "index.html" one. Despite this, the "/" version is still the only one showing up on Google News.

                    Here is the screenshot, just in case:

                    https://skitch.com/matsutton/r5swm/obama-makes-stop-in-los-angeles-wraps-up-campaign-swing-google-search

                    And here are the two versions of the same article:

                    - http://edition.cnn.com/2011/POLITICS/04/22/obama.campaign/

                    - http://edition.cnn.com/2011/POLITICS/04/22/obama.campaign/index.html

                    • BlinkWeb
                      BlinkWeb @H-FARM last edited by

                      Hmmm, that is strange! Check a cached version of one of your URLs to make sure the new version is in the index. If it is, maybe you should switch to option 3.

                      I am not sure what, if any, the implications would be of leaving it the way you have it.

                      Since these are 2 different areas of search, I am not sure that duplicate content issues would apply if you were to just leave it be.
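
If you do switch, the mapping from the #2 form to the #3 form is mechanical. Here is a rough Python sketch, assuming only the /index.html suffix needs to be collapsed. `to_trailing_slash_form` is a hypothetical helper name, and the .html variant (#1) is deliberately left alone because it is a different page.

```python
from urllib.parse import urlsplit, urlunsplit


def to_trailing_slash_form(url: str) -> str:
    """Map .../post/index.html to .../post/ and leave other URLs unchanged."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        # Drop only the file name so the trailing slash is kept.
        path = path[: -len("index.html")]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# The two CNN URLs quoted earlier in the thread collapse to the same value.
for candidate in (
    "http://edition.cnn.com/2011/POLITICS/04/22/obama.campaign/",
    "http://edition.cnn.com/2011/POLITICS/04/22/obama.campaign/index.html",
):
    print(to_trailing_slash_form(candidate))
```

The same function could feed both the rel=canonical value and the News sitemap, so the two always agree on one URL form.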

                      • H-FARM
                        H-FARM @H-FARM last edited by

                        To follow up on this.

                        Look what I've found in the Google News Forum:

                        http://www.google.com/support/forum/p/news/thread?tid=248ef4e6fe372e91&hl=en

                        The problem is almost the same: Google News is not indexing URLs with a trailing index.html.

                        The only person who answered was a Top Contributor, who suggested contacting the Google News team directly.
