Google attempting crawl of URLs with parameter randomly placed in URL
-
This is not a Google issue. You have an issue with the way your server is configured: it serves different links when a bot visits your site than when a regular user browses it. That is why you can't see the problem yourself. Google can only see what you show it.
Here is how you find it. You can run a tool like Screaming Frog, or use a plugin for Firefox or Chrome that lets you change your user agent and see how the page is served:
https://chrome.google.com/webstore/detail/user-agent-switcher/ffhkkpnppgnfaobgihpdblnhmmbodake
https://addons.mozilla.org/en-us/firefox/addon/user-agent-switcher/
If you do not want to mess with plugins, you can use the good ole Rex Swain HTTP Viewer:
http://www.rexswain.com/httpview.html
Start with one of your pages that lists jobs. I found this one:
http://www.office-angels.com/en-GB/Pages/all-job-results.aspx?kws=Catering%2c+Restaurant+%26+Bars
Take that URL and put it into the URL field of the Rex Swain tool. Don't change anything else and press Submit. You will get a page with a bunch of code (it shows you the source code that is returned). Use your browser's find-in-page function to search for the offending "job-id?/"; you should not find it.
Click back to return to the form. Leave the URL in place, add "Googlebot/2.1 (+http://www.google.com/bot.html)" as the user agent, and submit again. Do the same search for "job-id?/" and you will see it has been added in front of all the links to your jobs. See the "Rex Swain Crawl as Google Bot" image attached below.
If you do this same exercise with one of the plugins, just change the agent to Google and then hover over a link. See the "Page in Chrome with Google User Agent" image attached below. At the bottom left you see the "bad" URL, and at the top right you see my user agent switcher set to On to act as Google.
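If you'd rather script the comparison, a short Python sketch along these lines does the same check: fetch the page under each user agent and count occurrences of the rogue fragment in the returned source. The jobs URL and the "job-id?/" marker come from this thread; everything else (constant names, the sample HTML at the bottom) is purely illustrative.

```python
import urllib.request

JOBS_URL = ("http://www.office-angels.com/en-GB/Pages/all-job-results.aspx"
            "?kws=Catering%2c+Restaurant+%26+Bars")
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

def fetch(url, user_agent):
    """Fetch a URL while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def count_rogue_links(html, marker="job-id?/"):
    """Count occurrences of the rogue URL fragment in a page's source."""
    return html.count(marker)

# Against the live site (network required) you would run:
#   count_rogue_links(fetch(JOBS_URL, BROWSER_UA))
#   count_rogue_links(fetch(JOBS_URL, GOOGLEBOT_UA))
# A healthy page gives the same count both times; a cloaking page shows
# the rogue fragment only in the Googlebot fetch, as in these samples:
clean = '<a href="/en-GB/jobs/12345">Chef</a>'
cloaked = '<a href="/job-id?/en-GB/jobs/12345">Chef</a>'
print(count_rogue_links(clean), count_rogue_links(cloaked))  # 0 1
```

If both counts match across the two fetches, the cloaking is gone.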
Get with your IT guys, because you are technically cloaking right now: showing one thing to Google and another to users. You are also potentially showing Google duplicate content, since two different URLs serve the same job.
Also, I would take out that parameter handling in GWT; it has nothing to do with any of this.
Good Luck!
-
You sir, are a prince among men.
Yes that is exactly it, thank you! I've been tearing my hair out over this for weeks trying to work this out.
There seems to be some sort of condition built into the job results page, designed to distinguish page-edit mode from normal view mode, that's causing the issue. We'll have to untangle it next week.
Anyway, thanks again!
-
My pleasure!
-
One more thing - once you fix the display issue on the page, you will want to 301 all the rogue URLs to the correct ones, just to get that cleaned up for the Goog (or anyone else who might visit). That will also take care of the 404 issues in GWT.
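Since the .aspx URLs suggest the site runs on IIS, the 301s could be sketched with the IIS URL Rewrite module in web.config. This is a hypothetical fragment only: the rule name and the match pattern are assumptions, and the pattern must be adjusted to the actual shape of the rogue URLs your crawl turns up before anything is deployed.

```xml
<!-- Sketch only: assumes the rogue URLs carry a literal "job-id" path
     segment in front of the real path. Verify the real bad-URL shape
     and adjust the match pattern before using this. -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="301 rogue job-id URLs" stopProcessing="true">
        <match url="^job-id[^/]*/(.*)$" />
        <action type="Redirect" url="/{R:1}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

`redirectType="Permanent"` is what makes it a 301 rather than a temporary 302, which is what tells Google to consolidate the duplicate URLs.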