Questions
Would you rate-control Googlebot? How much crawling is too much crawling?
I agree with Matt that there can probably be a reduction of pages, but that aside, how much of an issue this is comes down to which pages aren't being indexed. It's hard to advise without seeing the site. Are you able to share the domain? If the site has been around for a long time, that seems a low level of indexation. Is this a site where the age of the content matters, for example Craigslist? Craig
Intermediate & Advanced SEO | CraigBradford
Disallow statement - is this tiny anomaly enough to render Disallow invalid?
Interesting. I'd never heard that before. We've never had Google Analytics or Webmaster Tools on these mirror sites before, so it's hard to say what Google is doing these days. But the goal is definitely to make them and their contents invisible to search engines. We'll get GWT on there and start removing URLs. Thanks!
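For reference, a minimal robots.txt sketch for keeping a whole mirror host out of compliant crawlers (placed at the root of that host). Note that Disallow only blocks crawling; it doesn't remove URLs that are already indexed, which is why pairing it with the GWT URL removal tool makes sense here:

    # robots.txt at the root of the mirror host
    # Block all compliant crawlers from every path on this host.
    User-agent: *
    Disallow: /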
Technical SEO Issues | lzhao
Are URL suffixes ignored by Google? Or is this duplicate content?
As Lesley says, it's not ignored. If the content is exactly the same on both URLs, you can ask your IT folks to include a rel=canonical link element in the head of the page that points to one specific URL as the canonical version of the content or, if one of the URLs isn't needed, it can be 301 redirected to the proper URL. See Canonicalization and the Canonical Tag - Learn SEO - Moz.
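A minimal sketch with a hypothetical URL: the canonical link element goes in the head of every duplicate variant and names the one version you want indexed:

    <!-- In the <head> of each duplicate URL, e.g. example.com/page and example.com/page?ref=partner -->
    <link rel="canonical" href="http://example.com/page" />

Google treats this as a strong hint rather than a command, which is why the 301 redirect is the firmer option when the duplicate URL isn't needed at all.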
Web Design | Chris.Menke
Temporarily suspend Googlebot without blocking users
So it seems like we've come full circle. The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture." Sounds like the answer is, 'that's not possible'.
Technical SEO Issues | lzhao
To Reduce (pages)... or not to Reduce?
It certainly seems like you've put much more effort into challenging any lazy assumptions (like a single page will inherently convert better), and I think your logic is sound. From what you describe, it sounds like the listing UX is well segmented according to the distinct types of search intent you know you field from your visitors, and I'd be wary of trying to "fix" that if the navigability ain't broke. Especially if the single-page amalgamation puts any of your strong search-intent content in a non-intuitive spot or, God forbid, below the fold.
On-Page / Site Optimization | mgalica
Does Google respect User-agent rules in robots.txt?
Does Google respect User-agent rules in robots.txt? Yes. As for having seen Googlebot ignore nofollow rules set at the page level: Google honors nofollow rules set at the page level. The issue is that there may be other links on your site, or elsewhere on the web, that Google will find and follow instead. Robots.txt is the absolute last means to use for blocking pages; you should not block a page with robots.txt unless you have exhausted all other options. A more appropriate method of keeping a page out of the index is the noindex tag. If you use the tag appropriately, Google will honor it.
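A minimal sketch of that page-level tag (it goes in the page's head):

    <!-- Keep this page out of the index while still allowing its links to be followed -->
    <meta name="robots" content="noindex">

    <!-- Or, to also tell crawlers not to follow the page's links: -->
    <meta name="robots" content="noindex, nofollow">

One caveat worth repeating: don't also Disallow the page in robots.txt, because Googlebot then never fetches the page and never sees the noindex tag.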
On-Page / Site Optimization | RyanKent
Should we use Google's crawl delay setting?
Unfortunately you can't change crawl settings for Google in a robots.txt file; they just ignore it. The best way to rate-limit them is the custom crawl settings in Google Webmaster Tools (look under Site configuration > Settings). You also might want to consider using your load balancer to direct Google (and other search engines) to a "condomised" group of servers (app, db, cache, search), thereby ensuring your users aren't inadvertently hit by performance issues caused by overzealous bot crawling.
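For illustration, the robots.txt directive being discussed; this is a sketch, and the delay value is arbitrary. Googlebot ignores Crawl-delay entirely (hence the Webmaster Tools setting above), but Bing and some other crawlers do honor it:

    # Googlebot ignores Crawl-delay; set its rate in Google Webmaster Tools instead.
    User-agent: bingbot
    Crawl-delay: 10
    # Asks the crawler to wait roughly 10 seconds between requests.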
Technical SEO Issues | RichardVaughan