The Moz Q&A Forum

    Issues with Moz producing 404 Errors from sitemap.xml files recently.

    Moz Pro
    • BriceSMG

      My last campaign crawl produced over 4k 404 errors resulting from Moz not being able to read some of the URLs in our sitemap.xml file. This is the first time we've seen this error, and we've been running campaigns for almost 2 months now. No changes were made to the sitemap.xml file. The file isn't UTF-8 encoded; it is served as Content-Type: text/xml; charset=iso-8859-1 (which is what Movable Type uses). Has anyone had a similar issue?

      • LynnPatchett

        Hi Brice,

        What makes you think the issue is that Moz cannot read the URLs? First, I would make sure nothing else is going wrong: check the URLs Moz is flagging as 404s and confirm whether they actually exist. If they do not, find out where the link is coming from (the sitemap or another page on the site). You may have already done this, but if not, you can get all this information by downloading the error report as a CSV and filtering it in Excel to show 404 pages only.
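For anyone who prefers a script to Excel, the filtering step might look roughly like this. It is only a sketch: the column names ("URL", "HTTP Status Code", "Referrer") are assumptions for illustration and may not match the headers in the actual Moz export, so adjust them to whatever the downloaded CSV uses.

```python
import csv
import io

def rows_with_404(csv_file, status_column="HTTP Status Code"):
    """Return the rows of a crawl export whose status column is 404.

    The default column name is an assumption, not the documented export format.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if (row.get(status_column) or "").strip() == "404"]

# Tiny in-memory stand-in for a downloaded report (hypothetical columns and URLs).
sample = io.StringIO(
    "URL,HTTP Status Code,Referrer\n"
    "http://www.example.com/a,200,http://www.example.com/\n"
    "http://www.example.com/b,404,http://www.example.com/sitemap.xml\n"
)
for row in rows_with_404(sample):
    print(row["URL"], "<-", row["Referrer"])
```

With a real export you would pass an opened file instead, for example `open("report.csv", newline="")`.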

        If you have done this already, give us a sample or two of the URLs Moz is flagging, along with the referring URL and your sitemap URL, and we might be able to diagnose the issue better. It would be unusual for the Moz crawler to start throwing errors all of a sudden if nothing else has changed. I'm not saying an error on Moz's side is impossible, just that something else is more likely going on.

        Hope that helps!

        • BriceSMG

          Hi Lynn,

          I did download the CSV and found all the 404 errors were generated from our sitemap.xml file. Here's what the URLs look like:

          http://www.cmswire.com/ http:/www.cmswire.com/cms/document-management/attachmentsme-connects-your-gmail-to-microsofts-skydrive-017319.php

          http://www.cmswire.com/ http:/www.cmswire.com/cms/social-business/ibm-offers-social-learning-capabilities-with-the-release-of-kenexa-learning-suite-30-017275.php

          referring URL is http://www.cmswire.com/sitemap.xml

          You'll notice odd formatting wrapping the URL (%0A%09%09%09), plus an extra http://www.cmswire added to the front of the URL. Neither appears in the actual sitemap.xml file when I view it separately.

          Also: Moz support looked at our campaign and they thought the problem was that our sitemap wasn't UTF-8 encoded.

          Any ideas?

          • BriceSMG

            Interesting: a new crawl just completed, and now I only have 307 404 errors plus a lot of other, different errors and warnings. It's frustrating to see such different things each week.

            barb

            • LynnPatchett

              Hi,

              It can be frustrating I know, but if you are methodical you will get to the bottom of all errors and then feel much better 🙂

              Not sure why the number of 404s would have gone down, but with regard to the sitemap itself, the Moz team might be right that UTF-8 encoding could be part of the problem. I think it has more to do with non-visible formatting characters being added to your sitemap during creation. %09 is a URL-encoded tab and %0A is a URL-encoded line feed; it looks to me like these are getting into your sitemap even though they are not actually visible.
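A quick way to confirm what those sequences stand for is to decode them with Python's standard library. This only illustrates the percent-encoding itself, not anything specific to Moz:

```python
from urllib.parse import quote, unquote

# Decode the sequence reported in the crawl errors.
decoded = unquote("%0A%09%09%09")
print(repr(decoded))      # '\n\t\t\t'  (one line feed followed by three tabs)

# Encoding the same whitespace gives the sequence back.
print(quote("\n\t\t\t"))  # %0A%09%09%09
```

In other words, the sequence is what a line break plus three tabs of indentation inside a tag would turn into once URL-encoded.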

              If you download your sitemap you will see that many (but not all) of the URLs look like this:
              <loc>
                  http://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php
              </loc>

              Note the new lines and the indent. Some other URLs do not have this format, for example:

              <loc>http://www.cmswire.com/news/topic/impresspages</loc>

              It would be wise to ensure both the file creating the sitemap and the sitemap itself are in UTF-8, but the fix could be as simple as going into the file that creates the sitemap and removing those line breaks. Once that is done, wait for the next crawl and see if it brings the error numbers down (it should). As for the rest of the warnings, just be methodical: identify where they are occurring and why, and work through them. You will get to few or zero warnings, and you will feel good about it!
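To find which entries carry the stray line breaks before touching the template, something like the following could be run against a downloaded copy of the sitemap. It is a minimal sketch using Python's standard XML parser. It assumes the file uses the standard sitemaps.org namespace, and the sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def locs_with_padding(sitemap_xml):
    """Return the raw <loc> values that have leading or trailing whitespace."""
    root = ET.fromstring(sitemap_xml)
    padded = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        text = loc.text or ""
        if text != text.strip():
            padded.append(text)
    return padded

# Hypothetical two-entry sitemap: one padded <loc>, one clean.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>\n\t\t\thttp://www.example.com/a.php\n\t\t</loc></url>"
    "<url><loc>http://www.example.com/b</loc></url>"
    "</urlset>"
)
for text in locs_with_padding(sample):
    print(repr(text), "->", text.strip())
```

Only the padded entry is reported. The lasting fix is still in the template that generates the file, so that each URL sits directly between the opening and closing tags.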

              • JoelDay

                Hey Brice,

                I just want to add to Lynn's great answer by explaining why you're seeing the URLs the way they are, and to reinforce it.

                You have it formatted as such:
                <loc>
                    http://www.cmswire.com/cms/web-cms/david-hillis-10-predictions-for-web-content-management-in-2011-009588.php
                </loc>

                The crawler converts everything to URL encoding, so those line feeds and tabs are converted to percent-encoded sequences. Your root domain is there because %0A is not the proper start of a URL, so RogerBot assumes it's a relative link on the domain your sitemap is on.
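The behaviour described above can be imitated with Python's standard URL functions. To be clear, this is not RogerBot's code. It is a sketch that assumes a crawler percent-encodes the raw tag text and then resolves it against the sitemap's own URL, and the domain and path are hypothetical.

```python
from urllib.parse import quote, urljoin

SITEMAP_URL = "http://www.example.com/sitemap.xml"  # hypothetical sitemap location

def resolve_loc(raw_loc, base=SITEMAP_URL):
    """Percent-encode the raw <loc> text, then resolve it against the sitemap URL.

    This two-step order is an assumption about the crawler, made for illustration.
    """
    encoded = quote(raw_loc, safe=":/")
    return urljoin(base, encoded)

padded = "\n\t\t\thttp://www.example.com/cms/page.php"

# The encoded text begins with %0A, so it is not recognised as an absolute
# URL and gets resolved as a path relative to the sitemap's domain.
print(resolve_loc(padded))

# With the whitespace stripped, the URL is absolute and comes back unchanged.
print(resolve_loc(padded.strip()))
```

The first result has the domain prepended and the %0A%09%09%09 sequence in the middle, the same shape as the URLs in the 404 report.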

                The encoding thing is probably not affecting this.
                Cheers,
                Joel.

                • BriceSMG

                  Thanks Lynn,

                  We are looking at that. The 4k 404 errors are gone now, but it's possible they will return.

                  It's a major change for us to switch to UTF-8, so it's not something that will happen anytime soon. I'll just have to be aware that it might be causing issues.

                  barb

                  • BriceSMG

                    Thanks Joel,

                    We're looking into this.

                    barb

                    • BriceSMG

                      Joel,

                      The latest 404 errors have the same type of issue, and the referrers are all over the place (none are the sitemap.xml, as far as I can see).

                      My question is, can the fact that we don't use the UTF-8 encoding in our site potentially cause issues with other reporting? This is not something we can change easily and I don't want to waste a great deal of effort sorting through "red herring" issues due to the encoding we use on the site.

                      thoughts?

                      barb

                      • LynnPatchett

                        Hi Barb,

                        I am sure Joel will chime in also, but just to clarify: it is probably not the UTF-8 encoding, or lack of it, that is causing the issue. At least with the sitemap URLs, it is simply the formatting of the XML being produced. As to whether the other errors you are seeing are caused by the same kind of thing: if you are seeing references to the same encoded characters (%0A%09), then the answer is most likely yes.

                        So the issue is not UTF-8 related (there are plenty of non-UTF-8 sites on the web still!) but comes down to how the Moz crawler is reading your links, and whether other tools/systems will have the same trouble. Have you looked in Google Webmaster Tools to see if it reports similar 404 errors from the sitemap or elsewhere? If you see similar errors in GWT, then the issue is likely not restricted to the Moz crawler.
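One way to see why the charset is unlikely to matter for these particular URLs: plain ASCII text, including tabs and line feeds, is byte-for-byte identical in ISO-8859-1 and UTF-8. The short check below uses a made-up ASCII-only URL. It would not hold for URLs or pages containing accented or other non-ASCII characters.

```python
url = "http://www.example.com/news/topic/example"  # hypothetical ASCII-only URL
padded = "\n\t\t\t" + url

# ASCII characters map to the same single bytes in both encodings.
print(url.encode("iso-8859-1") == url.encode("utf-8"))        # True
print(padded.encode("iso-8859-1") == padded.encode("utf-8"))  # True

# A non-ASCII character is where the two encodings diverge.
print("é".encode("iso-8859-1") == "é".encode("utf-8"))        # False
```

So a sitemap whose URLs are pure ASCII produces the same bytes under either charset, and the stray whitespace is present either way.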

                        Beyond that, the fix for the sitemap at least should be relatively simple, and quite possibly the other Moz errors you see can also be fixed easily by making small adjustments to the templates and removing the extra line breaks/tabs that are creating the issue. It is worth doing so that these errors are removed and you can concentrate on the 'real' errors without all the noise.
