On my e-commerce site, we have .html extensions on all file and category URLs.
I was wondering whether it is worth the development cost to change them all to end in a trailing slash (/) instead.
Is there any SEO benefit in doing so?
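If we did make the switch, my understanding is that every old .html URL would need a permanent (301) redirect to its trailing-slash twin so existing links keep their value. Here's a rough Python sketch I'd use to spot-check the redirects afterwards - the domain and slugs are placeholders, not our real URLs:

```python
import requests

# Placeholder URL pairs - the domain and slugs are made up for illustration.
url_pairs = [
    ("https://www.mydomain.com/some-category.html",
     "https://www.mydomain.com/some-category/"),
    ("https://www.mydomain.com/manufacturer-sku-productname.html",
     "https://www.mydomain.com/manufacturer-sku-productname/"),
]

for old_url, new_url in url_pairs:
    # allow_redirects=False lets us inspect the first hop directly.
    resp = requests.get(old_url, allow_redirects=False, timeout=10)
    ok = resp.status_code == 301 and resp.headers.get("Location") == new_url
    print(f"{old_url} -> {resp.headers.get('Location')} "
          f"({resp.status_code}) {'OK' if ok else 'CHECK'}")
```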
Thanks,
B
Hi,
In my sitemap, I have the preferred entrance pages and URLs of categories and subcategories.
But I would like to know more about how Googlebot and other spiders see a site - e.g. what is classed as a deep link? I am using the Screaming Frog SEO Spider, which has a metric called "level" representing how deep the content is, i.e. how many clicks away it sits, but I don't know if that is how Googlebot would see it.
From what the Screaming Frog SEO Spider shows, each horizontal move across the navigation is another level, which visually doesn't make sense to me.
Also, in my sitemap I list the URLs of all the products; there are no levels within the sitemap.
Should I be concerned about this?
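For what it's worth, my understanding is that "level" (click depth) is just the shortest click path from the homepage, which you can work out with a breadth-first search over the internal link graph. A toy Python sketch - the pages here are made up, not our real structure:

```python
from collections import deque

# A toy internal-link graph: page -> pages it links to (placeholder paths).
links = {
    "/": ["/category-a/", "/category-b/"],
    "/category-a/": ["/category-a/product-1.html", "/category-a/page-2/"],
    "/category-a/page-2/": ["/category-a/product-2.html"],
    "/category-b/": ["/category-b/product-3.html"],
}

def click_depth(start="/"):
    """Shortest number of clicks from the homepage to each page (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

for page, level in sorted(click_depth().items(), key=lambda kv: kv[1]):
    print(level, page)
```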
Thanks,
B
Hi Paul,
Thank you for your detailed answer - so I'm not going crazy!
I did try with canonicals, but then realized they are more of a suggestion than a directive, and since I am still correcting a lot of dupe content and 404s, I imagine Google views the site as "these guys don't know what they are doing" and may have ignored the canonical suggestion.
So what I have done is remove the robots.txt block on the pages I want de-indexed and add meta noindex, follow to those pages. From what you are saying, they should naturally de-index, after which I will put the robots.txt block back on to keep my crawl budget spent on better areas of the site.
How long, in your opinion, can it take for Googlebot to de-index the pages? Can I do anything to speed it up, e.g. fetching the pages and the pages linking to them as Googlebot?
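In the meantime I've been spot-checking that the block really is lifted and the tag really is served, since the noindex can only work once Googlebot is allowed to fetch the page. A quick Python sketch - the URLs are placeholders:

```python
from urllib import robotparser
import requests

# Placeholder URLs standing in for the pages I want de-indexed.
pages = [
    "https://www.mydomain.com/old-duplicate-1.html",
    "https://www.mydomain.com/old-duplicate-2.html",
]

rp = robotparser.RobotFileParser()
rp.set_url("https://www.mydomain.com/robots.txt")
rp.read()

for url in pages:
    crawlable = rp.can_fetch("Googlebot", url)  # is the block really lifted?
    html = requests.get(url, timeout=10).text
    # Crude check; a proper one would parse the meta robots tag itself.
    has_noindex = "noindex" in html.lower()
    print(f"{url}: crawlable={crawlable}, noindex served={has_noindex}")
```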
Thanks again,
Ben
Hi Tom.
Thank you for your answer - sounds good. So, as far as your answer goes, using subfolders is similar in SEO value to subdomains?
Thanks again 
Hi,
I've noticed that Google is not recognizing/crawling the latest changes to pages on my site - the last update shown when viewing the cached version in Google's results is over two months old.
So, do I use Fetch as Googlebot to force an update?
Or do I remove the page's cached version via Remove URLs in GWT?
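Before forcing anything, I figured it was worth ruling out misleading freshness headers on our end (a stale Last-Modified or an over-aggressive Cache-Control won't help crawlers notice changes). Something like this Python sketch - the URL is a placeholder:

```python
import requests

# Placeholder URL for a page whose cached copy looks stale.
url = "https://www.mydomain.com/recently-edited-page.html"

head = requests.head(url, timeout=10)
print("Last-Modified:", head.headers.get("Last-Modified"))
print("Cache-Control:", head.headers.get("Cache-Control"))

# Conditional GET: a well-behaved server returns 304 only if the page
# genuinely hasn't changed since the supplied date.
resp = requests.get(
    url,
    headers={"If-Modified-Since": "Sat, 01 Jan 2000 00:00:00 GMT"},
    timeout=10,
)
print("Conditional GET status:", resp.status_code)
```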
Thanks,
B
Hi,
We have an e-commerce store in English and Spanish - same products. URLs differ like this:
ENGLISH:
www.mydomain.com/en/manufacturer-sku-productnameinenglish.html
SPANISH:
www.mydomain.com/es/manufacturer-sku-productnameinspanish.html
All content on the pages is translated - e.g. the H1s, titles, keywords, descriptions, and the site content itself are in the language displayed.
Is there a risk of similar or near-dupe content here in the eyes of the big G?
Would it be worth implementing the different languages on subdomains or on completely different domains?
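From what I've read, the usual fix here is rel="alternate" hreflang annotations, so Google treats the pages as language alternates rather than duplicates. A small Python sketch generating the tags for our URL pattern - the slugs are just the placeholders from above:

```python
# Each product has an English and a Spanish slug; these are placeholders.
products = [
    ("manufacturer-sku-productnameinenglish",
     "manufacturer-sku-productnameinspanish"),
]

BASE = "https://www.mydomain.com"

for en_slug, es_slug in products:
    en_url = f"{BASE}/en/{en_slug}.html"
    es_url = f"{BASE}/es/{es_slug}.html"
    # Both tags belong in the <head> of BOTH language versions, so each
    # page references itself and its alternate.
    print(f'<link rel="alternate" hreflang="en" href="{en_url}" />')
    print(f'<link rel="alternate" hreflang="es" href="{es_url}" />')
```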
Thank you,
B
Hi,
We have a Magento website using layered navigation. It has created a lot of duplicate content, and in GWT's URL Parameters tool I set most of the query-string parameters to "No URLs", except "p", which is for pagination.
After reading up on how to tackle this issue, I tried a combination of meta noindex, robots.txt, and canonicals, but it was still a snowball I was trying to control.
In the end, I opted to use Ajax for the layered navigation: no matter what option is selected, no parameters are appended to the URL, so no dupe/near-dupe URLs are created. So please correct me if I am wrong, but no new links flow to those extra URLs now, so presumably in due course Google will remove them from the index? Am I correct in thinking that? Plus, these extra URLs have meta noindex on them too.
I still have tens of thousands of pages indexed in Google. How long will it take for Google to remove them from the index? Will having meta noindex on the pages that need to be removed help?
Is there any other way of removing thousands of URLs from the index via GWT?
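To keep an eye on progress, I've been classifying a sample of the still-indexed parameter URLs by what Googlebot will find on its next visit. A rough Python sketch - the URLs are placeholders:

```python
import re
import requests

# Placeholder sample of parameterized URLs still showing in the index.
urls = [
    "https://www.mydomain.com/category.html?color=red&size=m",
    "https://www.mydomain.com/category.html?material=wool",
]

# Crude pattern; assumes name comes before content in the meta tag.
noindex = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

for url in urls:
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code in (301, 302):
        state = f"redirects to {resp.headers.get('Location')}"
    elif resp.status_code in (404, 410):
        state = "gone - should drop out naturally"
    elif noindex.search(resp.text):
        state = "200 with noindex - removable once recrawled"
    else:
        state = "200 and indexable - will linger in the index"
    print(f"{url}: {state}")
```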
Thanks again,
B
Hi all,
I hope you can spare some time to answer the first of a few questions.
We are running a Magento site, and the layered/faceted navigation nightmare has created thousands of duplicate URLs!
Anyway, while tackling the issue, I disallowed in robots.txt anything in the query string that was not "p" (which I allowed for pagination).
When I then checked some pages in Google with site:www.mydomain.com/specificpage.html, a few duplicates came up along with the original, showing:
"There is no information about this page because it is blocked by robots.txt"
I had also added meta noindex, follow to all these duplicates, but I guess it wasn't being read because of the robots.txt block.
So, coming to my questions:
Did robots.txt block access to these pages? If so, were they already in the index, and once I disallowed them, could Googlebot no longer read the meta noindex?
Does meta noindex, follow on pages actually help Googlebot decide to remove those pages from the index?
I thought robots.txt would stop and prevent indexation, but I've read this:
"Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”.
I'm a bit confused about how to use these two, both to prevent duplicate content in the first place and to address dupe content once it's already in the index.
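To convince myself of that distinction, I tested which URLs Googlebot is even allowed to fetch under my current robots.txt; if a URL fails this check, any meta noindex on it is invisible to Google, even though the URL itself can stay in the index. A small Python sketch - the domain and parameters are placeholders:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.mydomain.com/robots.txt")  # placeholder domain
rp.read()

test_urls = [
    "https://www.mydomain.com/category.html?color=red",  # disallowed facet
    "https://www.mydomain.com/category.html?p=2",        # allowed pagination
]

for url in test_urls:
    # If can_fetch is False, Googlebot never downloads the page, so it
    # can't see a meta noindex there - the URL can still sit in the index.
    print(f"{url}: crawlable by Googlebot = {rp.can_fetch('Googlebot', url)}")
```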
Thanks!
B