The Moz Q&A Forum

    Google has deindexed 40% of my site because it's having problems crawling it

    Technical SEO Issues
    • Bajram.Kurtishaj

      Hi

      Last week I got my fifth email saying 'Google can't access your site'. The first one arrived in early November. Since then my site has gone from almost 80k indexed pages to fewer than 45k, and the number keeps dropping even though we publish about 100 new articles a day (it's an online newspaper).

      The site I'm talking about is http://www.gazetaexpress.com/

      We have to deal with DDoS attacks most of the time, so our server admin has set up a firewall to protect the site. We suspect the firewall is blocking Googlebot from crawling and indexing the site. But then things get more interesting: some parts of the site are crawled regularly and others not at all. If the firewall were stopping Googlebot, why would some parts of the site be crawled with no problems while others aren't?

      In the screenshot attached to this post you will see how Google Webmasters is reporting these errors.

      In this link, it says that if the 'Error' status happens again you should contact Google Webmaster support, because something is preventing Google from fetching the site. I used the Feedback form in Google Webmasters to report this error about two months ago but haven't heard back. Did I use the wrong form? If so, how can I reach them and tell them about my problem?

      If you need more details feel free to ask. I will appreciate any help.

      Thank you in advance

      C43svbv.png?1

      • DirkC

        Hi,

        It seems that your pages are extremely heavy to load. I ran 2 tests: one on your homepage and one on the /moti-sot page.

        Your homepage needed a whopping 73 sec to load (http://www.webpagetest.org/result/150312_YV_H5K/1/details/). The /moti-sot page is quicker, but 8 sec is still rather high (http://www.webpagetest.org/result/150312_SK_H9M/).
        I also noticed the Shockwave Flash plugin crashing a few times, but I'm not sure whether that is related to your problem.

        I crawled your site with Screaming Frog, but it didn't really find any indexing problems. While a lot of your pages sit very deep in your site structure, the bot didn't seem to have any specific trouble accessing them. Websniffer returns a normal 200 code when checking your site, even with user agent "Google".

        So I guess you're right about the firewall: maybe it's blocking the IP addresses used by Googlebot. Do you have reporting from the firewall showing which traffic is blocked? Try searching your log files for the user agent Googlebot and see whether this traffic is rejected. The fact that some sections are indexed and others aren't could be related to the configuration of the firewall and/or the IP addresses Googlebot uses to check your site (the bot does not always use the same IP address).
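A minimal sketch of the log check suggested above, assuming the web server writes an Apache/Nginx "combined" format access log (the thread does not say which server or log format the site uses). The function name `googlebot_status_by_section` and the sample lines are made up for illustration. It groups Googlebot requests by top-level site section and HTTP status, so a section that only ever returns 403 or 5xx to Googlebot stands out.

```python
import re
from collections import Counter

# Assumed combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_status_by_section(lines):
    """Count (top-level section, status) pairs for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "googlebot" not in m.group("agent").lower():
            continue
        path = m.group("path")
        # "/sport/article-1?x=1" -> "/sport"
        section = "/" + path.lstrip("/").split("/", 1)[0].split("?", 1)[0]
        counts[(section, int(m.group("status")))] += 1
    return counts

# Hypothetical sample lines, not taken from the real site's logs.
sample = [
    '66.249.66.1 - - [12/Mar/2015:10:00:00 +0000] "GET /sport/article-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [12/Mar/2015:10:00:05 +0000] "GET /fun/ HTTP/1.1" 403 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [12/Mar/2015:10:00:06 +0000] "GET /fun/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
for (section, status), n in sorted(googlebot_status_by_section(sample).items()):
    print(section, status, n)
```

Two caveats: the user agent string can be spoofed, and requests dropped by the firewall before they reach the web server will not appear in this log at all, so a section with no Googlebot entries is worth comparing against the firewall's own report.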

        Hope this helps,

        Dirk

        • Bajram.Kurtishaj @DirkC

          Hi Dirk

          Thanks a lot for your reply.

          Today we turned off the firewall for a couple of hours and tried to fetch the site as Google. It didn't work. The results were the same as before.

          This problem is getting pretty ugly: Google has now stopped labelling our mobile results as 'mobile-friendly', even though we have a mobile version of the site. We use rel=canonical and rel=alternate, plus 302 redirects that send smartphone users from desktop pages to the mobile ones.

          Any other idea what might be causing this?

          Thanks in advance

          • DirkC

            Hi

            Not sure if the indexing problem is solved by now, but I did a few more checks. Most of the tools I used were able to fetch the problem URL without much trouble, even from California IPs and while simulating Googlebot.

            I noticed that some of the pages (for example http://www.gazetaexpress.com/fun/) are quite empty if you browse them without JavaScript active. Navigating through the site with JavaScript is extremely slow, and a lot of links don't seem to respond. When trying to go from /fun/ to /sport/ without JavaScript, I got a 504 Gateway Time-out.

            Normally Google is now capable of indexing content by executing the JavaScript, but it's always better to have a non-JavaScript fallback that can always be indexed (http://googlewebmastercentral.blogspot.be/2014/05/understanding-web-pages-better.html). The article states explicitly:

            • If your web server is unable to handle the volume of crawl requests for resources, it may have a negative impact on our capability to render your pages. If you’d like to ensure that your pages can be rendered by Google, make sure your servers are able to handle crawl requests for resources.

            This could be the reason for the strange errors when trying to fetch as Google.

            Hope this helps,

            Dirk

            • Bajram.Kurtishaj @DirkC

              Dirk

              Thanks a lot for your help. Unfortunately the problem remains the same. More than 65% of the site has been de-indexed and it's making our work very difficult.

              I'm hoping that somebody here has an idea of what is causing this so we can find a fix.

              Thank you all for your time.

              • Bajram.Kurtishaj

                We found the problem. It was the website compression (GZIP). I found this after crawling my site with Moz and seeing lots of pages with a 608 error code. Then I searched Google and found a response by Dr. Pete to another question here in Moz Q&A (http://moz.com/community/q/how-do-i-fix-608-s-please).

                After we removed the GZIP compression, Google could crawl the site with no problems.
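For anyone wanting to test for this kind of problem before switching compression off, here is a rough sketch using only the Python standard library. It assumes the failure was a response body that does not decode as the gzip the `Content-Encoding` header declares; the thread does not confirm the exact fault. The names `check_gzip_body` and `fetch_and_check` are hypothetical helpers, and the user agent string only imitates Googlebot.

```python
import gzip
import urllib.request
import zlib

# Imitates Googlebot's user agent; the request still comes from your own IP.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def check_gzip_body(content_encoding, body):
    """Return (ok, detail) for a raw response body given its Content-Encoding header."""
    if (content_encoding or "").lower() != "gzip":
        return True, "not gzip-encoded"
    try:
        size = len(gzip.decompress(body))
    except (OSError, EOFError, zlib.error) as exc:
        # Bad header, truncated stream, or corrupt compressed data.
        return False, f"gzip decode failed: {exc}"
    return True, f"decompressed to {size} bytes"

def fetch_and_check(url, timeout=30):
    """Request url with gzip enabled, then validate the undecoded body."""
    req = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip", "User-Agent": GOOGLEBOT_UA}
    )
    # urllib does not decompress automatically, so read() returns the raw bytes.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, check_gzip_body(resp.headers.get("Content-Encoding"), resp.read())
```

Calling `fetch_and_check("http://www.gazetaexpress.com/fun/")` on a few URLs from the affected sections would show whether the server returns a 200 status together with a body that fails to decompress, which a status-only check would not flag.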

                • DirkC

                  Great news - strange that these 608 errors didn't appear while crawling the site with Screaming Frog.
