HTTPS site, server root Robots.txt and Crawl errors
-
If you want to stop Rogerbot from crawling those pages, add the following directive to your robots.txt:
- User-agent: rogerbot
- Disallow: < Type your webpages >
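For example, a minimal robots.txt might look like this (the /private/ path is hypothetical; replace it with the paths you actually want blocked):

```text
# Block Moz's crawler (Rogerbot) from a hypothetical /private/ section
User-agent: rogerbot
Disallow: /private/

# Leave all other crawlers unrestricted
User-agent: *
Disallow:
```

Note that Disallow matches URL paths by prefix, so `Disallow: /private/` blocks everything under that directory.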
Afterwards, you can verify the rules with a robots.txt testing tool and re-crawl the site.
Hope this helps.
-
Hi Douglas,
I was sure I had responded to this earlier, but there must have been a glitch.
The crawler only looks at one robots.txt per domain. The http version of your robots.txt was found and obeyed. At some point, a link to an https URL led the crawler into the https version of the site, which is why those errors appeared.
For now, you can ignore those errors.
In the longer term, it would be good to 301-redirect https to http for the pages that don't need to be https. That way, when people link to you, they link to the http version rather than the https one. It's one more way of canonicalizing your website.
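As a sketch, assuming an Apache server with mod_rewrite enabled, an .htaccess rule like this would send every https request to its http equivalent with a permanent redirect (adjust the condition if some pages, such as a checkout, must stay on https):

```apache
# Permanently (301) redirect all HTTPS requests to their HTTP equivalents
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
```

A 301 (rather than 302) tells search engines the http URL is the canonical one, so link equity consolidates there.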