Where are the crawled URLs in Webmaster Tools coming from?
-
When looking at the crawl errors in Webmaster Tools/Search Console, where is Google pulling these URLs from? Sitemap?
-
These errors are the problems Googlebot encounters while crawling your site. A sitemap can help Googlebot crawl your site better, but it isn't strictly necessary.
rgds
Dirk
-
I agree 100% with Dirk!
Google is going to crawl your site as long as your domain's robots.txt file and robots meta tag allow the bot to crawl it. By not telling Google anything, you are saying "welcome to my site, please index it."
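To make the robots.txt point concrete, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` to test whether a set of rules would let Googlebot fetch a URL. The robots.txt content and the example.com URLs are made up for illustration, and the standard-library parser is not Google's own, so treat the result as an approximation. It also says nothing about the robots meta tag.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace it with your site's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_googlebot_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow Googlebot to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    for url in ("https://example.com/", "https://example.com/private/page.html"):
        print(url, can_googlebot_fetch(ROBOTS_TXT, url))
```

Passing an empty string as the rules allows every URL, which matches the point above: saying nothing means everything is open to crawling.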
Google Webmaster Tools is doing you a favor by saying, "Look, this is a problem the bot encountered while indexing your site. Look into it."
Submitting an XML sitemap to Google will definitely help show them where to look, and you can request that they index a page using Fetch as Google.
Some good advice on how to fix the issues found:
https://www.distilled.net/blog/seo/indexation-problems-diagnosis-using-google-webmaster-tools/
Great resources on indexing and robots.txt:
https://www.distilled.net/blog/seo/advanced-seo-troubleshooting-why-isnt-this-page-indexed/
https://varvy.com/robottxt.html
I hope this helps,
Tom
-
Just to be complete: Google Search Console will list errors for pages with links coming from a few general locations:
1. Crawling links on your website: starting from somewhere on your site and going link to link.
2. Crawling links in your sitemap.
3. Crawling URLs that no longer exist on your site or in your sitemap. I have seen Google keep URLs in memory and come back to hit pages that are no longer reachable through option 1 or option 2. If you used to have a bunch of 301 redirects in place for an old version of your website, and your developer then changes something that deletes all those 301s so they become 404s, you will find those pages showing up as errors again. This is really useful, as it can help you diagnose the issue and fix it.
4. Crawling links from other sites. Sometimes this is how the URLs in #3 get crawled.
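If you want to check for yourself whether old redirects have turned into 404s, a rough sketch along these lines can help. It uses only Python's standard library, and the function names (`fetch_status`, `label`, `audit`) are my own, not part of any Google tool. It sends HEAD requests without following redirects so that a 301 is reported as a 301. Some servers answer HEAD differently from GET, so treat the output as a first pass.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses arrive here; the code is what we want.
        return err.code


def label(status: int) -> str:
    """Map a status code to a coarse label for triage."""
    if status in (301, 302, 307, 308):
        return "redirect"
    if status in (404, 410):
        return "gone"
    if 200 <= status < 300:
        return "ok"
    return "other"


def audit(urls, get_status=fetch_status):
    """Return {url: label} for each URL; get_status can be swapped for testing."""
    return {url: label(get_status(url)) for url in urls}
```

Feed `audit` the URLs from the error report: anything labeled "gone" that used to be a redirect is a candidate for restoring the 301.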
Here is what really sucks about Search Console, and I mean sucks big bananas, if you are trying to diagnose an issue. On the Search Console error page, you can click a URL in the report; a box pops up, and you can click the "Linked From" tab to see which pages link to the URL in question. That is good! But if you then download the CSV, all of that info is lost. If you have more than 20 errors to deal with, you have no practical way to manage things and see whether there is a trend. You are left clicking a lot of links in the report, taking lots of notes, and going a little insane.
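One partial workaround for spotting trends is to group the downloaded error URLs by site section. The sketch below assumes the CSV has a header row with a column named `URL`. That column name is a guess on my part, so adjust `url_column` to match your actual export. It does not bring back the "Linked From" data; it only shows where the errors cluster.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def count_by_section(csv_text: str, url_column: str = "URL") -> Counter:
    """Count error URLs by their first path segment (e.g. /blog, /shop)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        path = urlparse(row[url_column]).path
        segments = [part for part in path.split("/") if part]
        # URLs with no path segments (the home page) are grouped under "/".
        section = "/" + segments[0] if segments else "/"
        counts[section] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical export contents, for illustration only.
    sample = (
        "URL,Response Code\n"
        "https://example.com/blog/old-post,404\n"
        "https://example.com/blog/another-post,404\n"
        "https://example.com/shop/item-1,404\n"
    )
    for section, total in count_by_section(sample).most_common():
        print(section, total)
```

If most of the errors sit under one section, that usually points to a single cause, such as a removed template or a dropped batch of redirects.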

Good luck!
-