Hi Chris,
Screaming Frog has been my go-to XML sitemap generator for years. Plenty of customization options for exclusions and inclusions.
Hi Ramendra,
Based on what you said, it sounds like both versions of your site exist and are indexed, and you want to mitigate your duplicate content risk. If that's accurate, here are my recommendations on this:
Hi,
You've got a really big mess on your hands, IMO. Search engines absolutely _do_ penalize duplicate content, and it sounds like you have a ton of it on the existing sites, with plans to create even more.
What types of locations are the 25 different sites going to be targeting? All within the same country, or one for each of 25 different countries? The answer to this question will drive any further recommendations.
No problem! Yes, the same is true for HTTP and HTTPS.
That is correct, you should be using rel=next/prev for markup on paginated sections. But after noticing the www and non-www issue, I don't think your problem is related to canonicals or prev/next.
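For reference, that pagination markup goes in the head of each page in the series - a minimal sketch, with placeholder URLs:

```html
<!-- On page 2 of a paginated category (URLs are placeholders) -->
<link rel="prev" href="https://www.example.com/category?page=1">
<link rel="next" href="https://www.example.com/category?page=3">
```

The first page gets only a rel=next tag, the last page only a rel=prev.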
Regardless of what you're doing with canonical tags or prev/next, your pages should never be accessible at both www and non-www versions. You're at duplicate content risk as long as both versions exist.
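If the site runs on Apache, a rule along these lines in .htaccess would force the www version (the hostname here is a placeholder; the same thing can be done in nginx or at the CMS level):

```apache
RewriteEngine On
# 301 any non-www request to the www equivalent
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```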
Hi,
Regarding the first set of URLs: I took a look at a handful of them, and it's entirely possible that you're getting duplicate notices on those. Rogerbot flags any two pages as duplicates if their source code matches at 90% or more - so not identical, but too similar for search engines to treat as distinct. Most of the products you've listed there have little or no content, meaning that when you consider the rest of the code on the page, it mostly matches the homepage.
Regarding the second set: I ran those URLs through Screaming Frog and don't see any canonical tags. Keep in mind, just because URLs aren't indexed in search engines, doesn't mean Rogerbot doesn't have access to them.
*Update - on further digging, I think I found the source of all of your duplicate issues. Both the www and non-www versions of your URLs are accessible. One should redirect to the other - it doesn't matter which, but both should not be accessible.
Hi,
It sounds like you're going down the right path. Disallow any section of the site that has personal information - there's no value in having bots crawl that, and blocking it keeps them on your important content longer. In addition to Checkout and Basket/Cart, you should also disallow the My Account area if your site has one.
For your next grouping, I'm assuming these are the parameters by which your pages can be sorted. If so, yes, disallow all of those; they're only going to cause duplicate content flags for you in the future. I'm not sure which CMS you're using, but some eComm platforms also have 'email to a friend' URLs that are a major source of dupes and can often be identified and disallowed by another parameter.
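As a rough sketch, the robots.txt entries might look something like this (the paths and parameter names are placeholders - swap in whatever your CMS actually uses):

```
User-agent: *
Disallow: /checkout/
Disallow: /basket/
Disallow: /my-account/
# Block sort/order parameters that create duplicate listings
Disallow: /*?sort=
Disallow: /*?order=
```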
Hope this helps narrow it down for you!
Hi,
There are a couple things you can do to help Google serve up your preferred TLDs in each country.
First, you should identify (if you haven't already) the preferred geography in Search Console - see this link for more info on how to set that up: https://support.google.com/webmasters/answer/62399?hl=en
Next, you can add the appropriate hreflang tags to the pages on each of the different site versions. Moz has an extensive guide on how to do this here: https://moz.com/learn/seo/hreflang-tag
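As a quick illustration, the tags for a US/UK pair might look like this in the head of each page (the domains are placeholders, and every version should list all alternates, including itself):

```html
<link rel="alternate" hreflang="en-us" href="https://www.example.com/" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.co.uk/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
```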
Hope that's helpful!
Hi,
Duplicates of this nature can only be fixed by adding canonical tags to one preferred version of that product. When you have multiple instances of the product throughout many categories, search engines aren't sure which to show, so they'll usually not show any of your results. For more on canonical tags, read this Moz article.
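For illustration, the canonical tag goes in the head of each duplicate version and points at your one preferred URL (the path here is just a placeholder):

```html
<link rel="canonical" href="https://www.example.com/products/widget">
```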
You cannot remove it from the Moz report, as that's part of its purpose - to show you instances of duplicate content.
Thanks so much for reporting back! Nonetheless, the responses here helped strengthen my case that on-site elements like this should not exist solely for bots, so I'm making headway!
They will show up in Open Site Explorer; I've seen shortlink URLs appear there for a handful of clients.
Hi,
You can use bit.ly for backlinks, but keep in mind, they are 301 redirects, so you lose a bit of link juice when search engines step through them.
If you choose to do this, you'll also want to tag the URLs before you shorten them; otherwise your Google Analytics reporting will show all that traffic as Direct/(none) in the Channel report.
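For example, a tagged URL before shortening might look like this (the parameter values are placeholders - use whatever source/medium/campaign scheme you've standardized on):

```
https://www.example.com/landing-page?utm_source=twitter&utm_medium=social&utm_campaign=spring_sale
```

Shorten that full tagged URL, and the bit.ly redirect will carry the parameters through so Analytics attributes the visit correctly.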
Yes, I typically reserve the homepage for branded search. On most sites, the content of the homepage is too broad to really be a helpful entry point from organic non-branded search.
Heyo!
I would not recommend your homepage exists at any URL other than example.com for the following reasons:
I'm sure others could drum up more reasons, but the ones I've listed here should be enough to dissuade you.
You can typically fare much better by giving non-branded keywords interior pages that are specific to that topic rather than the broader homepage content. This will increase the likelihood of people finding what they're looking for and is a better way to tailor your content to your audience and to algorithms.
Hi,
It sounds like you might need to implement hreflang tags. Have you done so already? Without being able to specify each country-level shop in Search Console via ccTLD, this is your best bet to ensure the right pages show in the right versions of Google. Here's Moz's guide to using hreflang tags. For each of your country-specific subdirectories, you'll need to add the appropriate tag which will specify the country and language of the page. This helps keep Swiss pages out of Google.de, etc, etc.
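For a subdirectory setup like yours, the tags on each country version might look roughly like this (the directory names and language-country codes are placeholders for whatever your shops actually use):

```html
<link rel="alternate" hreflang="de-de" href="https://www.example.com/de/" />
<link rel="alternate" hreflang="de-ch" href="https://www.example.com/ch/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
```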
Hi,
This isn't necessarily a problem, but XML sitemaps should be as clean as possible before they're uploaded - i.e., no 301'd URLs, no 404s, no dupes, no parameter'd URLs, no canonicalized URLs, etc.
Are they duplicates in the sense that one has caps, and the other doesn't? As in /example.html and /Example.html. If so, you'll want to fix that.
If they're identically formatted URLs, there should be no problem, but you're at duplicate content risk if they differ in any way and aren't canonicalized.
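If the duplicates are a small, known set of capitalized URLs and the site runs on Apache, a simple per-URL redirect is a low-risk fix (the paths are placeholders):

```apache
# 301 the capitalized variant to the lowercase canonical version
Redirect 301 /Example.html /example.html
```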
Hi,
I'm trying to read over your content and it just keeps loading more products. Very frustrating. You should consider pagination and having a default number of products load on the page.
Regarding your internal links: you have one that says "2000 gifts for men" and one that says "shop gifts for men". These links point to the page they exist on, with a hashed URL. Links should never point to the same URL they live on - it looks spammy to search engines and is confusing and irritating for searchers, who click that link only to be taken back to the top of this long page. Not a good user experience. There's also a link to your homepage buried in "..." in the first paragraph, which also looks spammy/cloaky to me.
I'm not saying these page load and internal linking issues are your whole problem, but they're definitely not helping your situation and the simplest things like this should be tackled first.
There are no external links pointing to this page. If your competitors have similar pages with some recently acquired, good-quality external links, that's potentially the cause of your recent decline.
Hi Rob,
Since you've got no links pointing to these pages, a 410 would be your best bet. It will get them removed from the index the quickest, and you'll start to see these errors in Search Console drop. 301 redirects would also get them removed from the index, but that process is slower, and since these pages can't be reached anywhere other than SERPs anyway, 301s aren't going to provide much long-term value.
Here's some more info on the difference between 404s and 410s: https://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes
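On Apache, returning a 410 for retired URLs can be done with mod_alias rules in .htaccess - a sketch with placeholder paths:

```apache
# Serve 410 Gone for an individual retired page
Redirect gone /old-page.html
# Or for an entire retired section
RedirectMatch gone ^/discontinued-products/
```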
Hi,
I never recommend URL changes for the purpose of improving rankings. While URL structure is a ranking factor, it's not a big enough one to justify restructuring your URLs for. See this article for more on how insignificant this is as a ranking signal.
Thanks for that article, not quite the type of links I'm addressing here, but definitely some applicable nuggets of information there.