Too many on page links in sitemap.html
-
My crawl report is flagging one of my pages for having too many links pointing to it; that page is my sitemap.html. I have coded the page so that if an XML version is requested it generates an .xml version of the page, and if not the HTML version is displayed. What is the best way to stop the crawler from finding the HTML version while keeping it on the site for client navigation?
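For context, a minimal sketch of the kind of switch described above, assuming a hypothetical `format=xml` query parameter as the trigger (the real mechanism on the site may differ):

```python
# Sketch: serve either an XML or HTML sitemap from one endpoint,
# depending on what the request asks for. The `format=xml` query
# parameter and the URLs are assumptions for illustration only.
from urllib.parse import urlparse, parse_qs

def sitemap_response(request_url):
    """Return a (content_type, body) pair for a sitemap request."""
    query = parse_qs(urlparse(request_url).query)
    if query.get("format") == ["xml"]:
        body = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            '  <url><loc>https://example.com/</loc></url>\n'
            '</urlset>'
        )
        return ("application/xml", body)
    # Default: the human-readable HTML sitemap
    body = "<html><body><a href='/'>Home</a></body></html>"
    return ("text/html", body)
```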
-
Hide it using a robots.txt file, though you could also use the noindex meta tag. That said, search engines generally recognize sitemap pages and aren't too fussed by them; it's a good jumping-off point for them to find info.
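As a sketch of both options above, assuming the HTML version lives at `/sitemap.html` (adjust the path to match your site):

```text
# robots.txt — keep crawlers away from the HTML version only
User-agent: *
Disallow: /sitemap.html
```

Or, the meta tag alternative, placed in that page's `<head>`:

```html
<meta name="robots" content="noindex">
```

Note the two do different things: robots.txt stops crawling of the URL, while noindex allows crawling but asks engines not to index the page.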
-
Thanks for the response,
That was my first thought, but I wasn't 100% sure that hiding it in the robots.txt file alone would remove this issue, and it is still early days.
Thanks again.
-
The thing to remember is that the HTML version should only ever be served to users, and never used to redirect robots if they hit a 404 on your .xml file. The reason is that search engines may still see the file as a 404 after the redirect, or as a 301; in the latter case you have the problem of search engines thinking the sitemap was there but has now become the HTML page, which of course is not a good thing.
I would advise ensuring the fallback never fires for robots/spiders: if the file simply returns a 404, search engines will come back to it; they may not if it is 301 redirected.
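The advice above can be sketched as a small status-code decision, assuming a hypothetical `is_crawler` flag (e.g. derived from the User-Agent header, which is not part of this thread):

```python
# Sketch: if sitemap.xml cannot be served, answer crawlers with a plain
# 404 rather than redirecting them to sitemap.html, so search engines
# treat the XML file as temporarily missing and retry it later.
def sitemap_xml_status(xml_available, is_crawler):
    """Return the HTTP status code to send for a request to sitemap.xml.

    xml_available: whether the XML sitemap can be generated right now.
    is_crawler: assumed flag for bot detection (illustrative only).
    """
    if xml_available:
        return 200
    if is_crawler:
        return 404  # crawlers will return to a 404'd URL
    return 302      # temporary redirect to the HTML fallback for humans;
                    # never a 301, which signals a permanent move
```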