Misbehaving Robots
-
I have a Disallow entry in my robots.txt file, which is:
/*src=
My URLs look like this:
www.website.co.uk/example?value=1&src=2
I would expect these URLs to be blocked.
However, both Moz and Ahrefs show these in their crawl logs:
www.website.co.uk/example?value=1
It is as if the bots read the first parameter, get blocked when they reach the second, and log only the part of the URL before it.
Is this standard behavior for Roger (mozbot) or the Ahrefs bot? Does Google act in the same way?
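For reference, here is a minimal sketch of how a wildcard Disallow pattern such as /*src= could be checked against a URL. It assumes Google-style matching, where * matches any run of characters, a trailing $ anchors the end, and the pattern is compared against the path plus query string. The helper names rule_to_regex and is_blocked are made up for illustration, and the http:// scheme is added only so the URL parses cleanly. Individual crawlers may implement matching differently.

```python
import re
from urllib.parse import urlsplit

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed Google-style rules)."""
    anchored = rule.endswith("$")      # a trailing '$' means "must end here"
    if anchored:
        rule = rule[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(url: str, disallow_rule: str) -> bool:
    """Check the rule against path + query, which is what the pattern is compared to."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return rule_to_regex(disallow_rule).match(target) is not None

print(is_blocked("http://www.website.co.uk/example?value=1&src=2", "/*src="))  # True
print(is_blocked("http://www.website.co.uk/example?value=1", "/*src="))        # False
```

Under these assumptions the full URL is blocked, while the shortened URL from the crawl logs is not, so a crawler that discovers the shortened form is free to fetch and log it.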
-
Hey Andrew,
Would you be able to reach out to help@moz.com so we can have a look at your specific Campaign/Website?
Thanks!
Eli
-
No, Google does not act in the same way.
-
I have gone into Search Console > Crawl > URL Parameters.
I found the src parameter and told Google to crawl it (as opposed to "let Google decide").
Google had been indexing the pages without the parameter, so I think it chose to ignore it (because it doesn't alter the page content).
Google now states: "This will be crawled unless overridden by other commands". So robots.txt should kick in now and stop the URLs being crawled.
Why Moz and Ahrefs are also ignoring this parameter is beyond me.
-
OK, the URL looks like this:
www.site.com/engine/referrer.asp?web=http%3a%2f%2fwww.example.co.uk&src=3078c98d2da385d5468f562
However, Google, Moz and Ahrefs only get this far:
www.site.com/engine/referrer.asp?web=http%3a%2f%2fwww.example.co.uk
Could it be because there is a domain name in the parameter?
Could it be because they cannot get past the ampersand?
Could it be because src is blocked in robots.txt?
Any suggestions would be most welcome.
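One quick way to rule out the first two possibilities is to run the URL through a standard parser and see whether both parameters survive. The sketch below uses Python's urllib.parse. The helper name query_params is made up for illustration, and the http:// scheme is an assumption added so the URL parses cleanly. It only shows that the URL is well formed, not how any particular crawler handles it.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url: str) -> dict:
    """Return the decoded query parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)

url = ("http://www.site.com/engine/referrer.asp"
       "?web=http%3a%2f%2fwww.example.co.uk&src=3078c98d2da385d5468f562")

params = query_params(url)
print(params)
# {'web': ['http://www.example.co.uk'], 'src': ['3078c98d2da385d5468f562']}
```

Both parameters come out intact: the percent-encoded domain stays inside web, and the ampersand separates it from src as expected.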
-
My suspicion would be that those URLs are accessible somewhere without the second variable (i.e. that second variable isn't always present when the first variable is).
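A rough way to test that suspicion is to scan a page's HTML for links to the script that lack the second variable. This is only a sketch using Python's standard html.parser. The names LinkCollector and links_missing_param are hypothetical, it assumes the links are plain a href attributes, and it will not see links built by JavaScript or found on pages you don't feed it.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def links_missing_param(html, param="src", path_hint="referrer.asp"):
    """Return hrefs whose path contains path_hint but whose query lacks param."""
    collector = LinkCollector()
    collector.feed(html)
    missing = []
    for href in collector.hrefs:
        parts = urlsplit(href)
        if path_hint in parts.path and param not in parse_qs(parts.query):
            missing.append(href)
    return missing

# Made-up sample markup: one link with src, one without.
sample = (
    '<a href="/engine/referrer.asp?web=http%3a%2f%2fwww.example.co.uk&amp;src=abc">a</a>'
    '<a href="/engine/referrer.asp?web=http%3a%2f%2fwww.example.co.uk">b</a>'
)
print(links_missing_param(sample))
# ['/engine/referrer.asp?web=http%3a%2f%2fwww.example.co.uk']
```

Running this over every template that links to the script, not just one page, gives a better chance of finding a stray link without the parameter.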
-
Hi Paul
Thanks for your response. I'm always happy to hear your advice.
I've been through the page code line by line.
I've also fetched the page as Google, and the HTML does not contain a URL without the src value.
I'm really stuck.