Welcome to the Q&A Forum

Keszi

Hi Bharat,

Your website is currently unreachable.

Keszi

Hi there,

Yes, there are several ways you could do that, but my question is whether it is worth it. If we are talking about a large website, you could run into issues with Google's crawl budget: the crawlers would have to go through an additional 301 to land on your homepage.

Google describes their best practices about redirecting 404 pages here: https://support.google.com/webmasters/answer/181708?hl=en

If your page is no longer available, and has no clear replacement, it should return a 404 (not found) or 410 (Gone) response code. Either code clearly tells both browsers and search engines that the page doesn’t exist. You can also display a custom 404 page to the user, if appropriate: for example, a page containing a list of your most popular pages, or a link to your homepage.
If your page has moved or has a clear replacement, return a 301 (permanent redirect) to redirect the user as appropriate.

In my opinion, the decision should depend on the size of the website. If we are talking about a big website, it would probably be more beneficial to follow Google's guidelines and return a 410 status code. If the website is small, you could redirect the users to the homepage and hope they continue their journey on your website.
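To make the guideline quoted above concrete, here is a minimal sketch of the decision in Python. The function name and its two flags are made up for illustration; it only covers the "follow Google's guidelines" branch, not the small-site redirect to the homepage.

```python
def status_for_removed_page(has_clear_replacement: bool, gone_for_good: bool = False) -> int:
    """Pick an HTTP status code for a URL whose page no longer exists."""
    if has_clear_replacement:
        # The page moved or has an obvious successor: permanent redirect.
        return 301
    # No replacement: tell browsers and crawlers that the page does not exist.
    # 410 (Gone) is the more explicit variant of 404 (Not Found).
    return 410 if gone_for_good else 404


print(status_for_removed_page(has_clear_replacement=True))
print(status_for_removed_page(has_clear_replacement=False, gone_for_good=True))
```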

István

Keszi

Hi Naufal,

First of all, you need to identify which markup is causing the issue. Removing all markup from your website won't be beneficial (although you might get the manual action removed from the site).

Check the guidelines from Google: https://developers.google.com/search/docs/guides/sd-policies

Two very important points from the guidelines:

Don't mark up content that is not visible to readers of the page. For example, if the JSON-LD markup describes a performer, the HTML body should describe that same performer.
Don't mark up irrelevant or misleading content, such as fake reviews or content unrelated to the focus of a page.
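If you want a rough first check for the "not visible to readers" point, something like the sketch below could help. It is only an approximation under several assumptions: each JSON-LD block is a single JSON object, only the top-level "name" is compared, and "visible" simply means text outside script/style tags (CSS-hidden content is not detected). The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects JSON-LD blocks and the text outside <script>/<style> tags."""

    def __init__(self):
        super().__init__()
        self._skip = None        # name of the script/style tag we are inside
        self._is_jsonld = False
        self._buffer = []
        self.jsonld = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = tag
            self._is_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._skip:
            if self._is_jsonld:
                self.jsonld.append("".join(self._buffer))
            self._skip = None
            self._is_jsonld = False

    def handle_data(self, data):
        if self._skip:
            self._buffer.append(data)
        else:
            self.text.append(data)


def names_missing_from_page(html: str) -> list:
    """Return JSON-LD "name" values that never appear in the page text."""
    scanner = PageScanner()
    scanner.feed(html)
    page_text = " ".join(scanner.text).lower()
    missing = []
    for block in scanner.jsonld:
        name = json.loads(block).get("name")
        if name and name.lower() not in page_text:
            missing.append(name)
    return missing
```

An empty list means every marked-up name also shows up in the page text; anything returned is worth a manual look.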

Personally, I do not believe that it is caused by OpenCart. I would look into how the review/rating markup was implemented on the site (that's a point where we had issues with implementation in the past). Which method are you using for the markup?

István

Later edit: could you provide some example code? That would make it easier to analyze. (You can send it via PM if you do not want to disclose the URL publicly.)

Keszi

Let me know how it turns out. If the problem persists, I'm glad to help. Good luck!

Keszi

Hey,

Can you point out an example URL? (If you don't want to disclose the website URL here, you can send it via a personal message.) This way we can debug an exact URL and not just a theory.

Regarding blocking via robots.txt: it is never a good idea to block a search engine from URLs you want to deindex. If you do, the Google crawlers can't fetch and process those pages, so they never see the redirect or noindex, and the URLs stay in the search index.

Just check: https://support.google.com/webmasters/answer/6062608?hl=en

"While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results."

In the case of 301 redirects (make sure you are not using 302), if the crawler can access the page, the old URL should get removed from the index.
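Before relying on a redirect, it can be worth confirming that robots.txt is not hiding the old URL from the crawler. A minimal sketch with Python's standard urllib.robotparser; the robots.txt content and the example.com URLs are placeholders, not taken from your site.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules: the old section is disallowed for every crawler.
ROBOTS = """\
User-agent: *
Disallow: /old-section/
"""

# A blocked URL cannot be crawled, so its 301 would never be seen.
print(is_blocked(ROBOTS, "https://www.example.com/old-section/page.html"))
print(is_blocked(ROBOTS, "https://www.example.com/new-section/page.html"))
```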

Keszi

Sorry, I forgot to detail that case :) Thanks for pointing it out.

Keszi

Hi there,

Regarding website.com/category/product

What you have to take into consideration: if a product is going to be placed in more than one category, then this product is going to be indexable under more than one URL path. (As Gaston mentioned below, in this case you need to take care of duplication, which you can do with canonical links or a redirect to one path.)

For example, let's say you have a product which is in cat1, subcat1 and cat2. This way you will have at least 3 available paths to the product, one per category.

This means that on the product level you will have to deal with internal duplicate content. This is why I usually prefer to use the website.com/product URL path (IMO).
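If you do keep the category paths, every variant needs to point at one canonical URL. A minimal sketch of that mapping, assuming the product slug is always the last path segment (the function name is made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_product_url(url: str) -> str:
    """Collapse any /category/.../product path to /product."""
    parts = urlsplit(url)
    # Assumption: the last path segment is the product slug.
    slug = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return urlunsplit((parts.scheme, parts.netloc, "/" + slug, "", ""))


for url in (
    "https://website.com/cat1/product",
    "https://website.com/cat1/subcat1/product",
    "https://website.com/cat2/product",
):
    print(url, "->", canonical_product_url(url))
```

The returned value is what you would put into the canonical link (or use as the redirect target) on each category-path variant.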

Regarding /category/subcategory/ vs /category-subcategory/: this is really a technical question. How "deep" is your website going to be? Do you want a flat structure?

I usually prefer the /category/subcategory/ structure, because the content of an e-commerce website should be built up in content silos, where you move from the general towards the specific (regardless of how you achieve this, with subcategories or with filters within categories). I really hope this helps answer your questions.

Keszi

Glad I could help! Good luck!

Keszi

Hi there,

I would say, discuss it with your developer. This happened with one of our developers, who had been using our company's subdomain as a development/staging site for a personal project (without our knowledge) and forgot to noindex the dev site.

I found it out with exactly this kind of site: search.

As far as I can see, the links are not live anymore, but they are still indexed under the old.britishcarregistrations.co.uk subdomain.

Keszi

Hi there,

Just check the answer to the following question: https://moz.com/community/q/unable-to-crawl-after-301-permanent-redirect-how-to-fix-this.

Jordan Railsback describes the issue, and how to fix it.

Keszi

If you want to go with a free tool, you can also check Xenu's Link Sleuth (http://home.snafu.de/tilman/xenulink.html).

Just run a full crawl with the tool and export the page map to a tab-separated file. Then you can open this file in Excel (or any similar software). It should do the job.

Keszi

Hi!

I personally like to use Kraken.io. Check their pricing, but I'd say it is a low-cost and efficient way to handle the images.

We have been using it with their Magento extension, but they also have a plugin for WordPress (https://kraken.io/plugins).

I know it is not the only solution, but it worked well for me.

Good luck!

Keszi

Hi David,

There was a very good article about this topic back in 2014 (I know it sounds a little bit old, but still it is very descriptive): https://moz.com/blog/seo-guide-to-google-webmaster-recommendations-for-pagination

We also had a similar implementation, and I went with Option 3B from the article above: **Option 3: Implement Pagination Relationships + noindex, follow directive after page 2.**

So if you want only the first page indexed, set the "robots" directive to "noindex, follow" on every page after the first. Hint: if you use /page/ in your URL structure (instead of a page query parameter), you can also use that to check whether a page needs to be indexed or not, as it should only appear from page 2 onward.
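The /page/ check from the hint could look roughly like this. It is a sketch only: it assumes paginated URLs end in /page/N, and the function name is hypothetical. In practice the same logic would live in your theme or template code.

```python
import re

# Assumption: paginated archives use a trailing /page/<number> segment.
PAGE_PATH = re.compile(r"/page/(\d+)/?$")


def robots_meta(path: str) -> str:
    """Return the robots meta content for a listing URL path."""
    match = PAGE_PATH.search(path)
    if match and int(match.group(1)) >= 2:
        # Page 2 and beyond: keep links followed, keep the page out of the index.
        return "noindex, follow"
    return "index, follow"


print(robots_meta("/blog/"))
print(robots_meta("/blog/page/2/"))
```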

I hope it helps.

Keszi

I'm glad I could help! Let me know if you hit any walls with the implementation.

Keszi

Hi there,

Usually my advice is to add any custom code after the default WordPress rules, just to keep it more organised. It is very important not to add the rules in the WP section (# BEGIN WordPress -> # END WordPress).

Also I usually add comments before every rule group I create (just to have it more organised, and if anything goes wrong - check Search Console for anomalies after implementation - I know where I need to revert/adjust). You can add comments by starting the line with a # sign.

I hope it helps.

Oh, and BTW, when using Redirect 301, you should use a URL path starting with a slash for the OLD URL and a full absolute URL for the NEW URL, so the lines that you provided need to contain the full URL for the new version:
Redirect 301 /old-relative-path.html http://www.yourdomaingoeshere.com/newurl/
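If you have many of these lines, a quick sanity check before uploading the file can save a round of debugging. The sketch below is a hypothetical helper, not part of Apache: it only checks the two points above, namely that the old URL-path begins with a slash (mod_alias requires this) and that the target is a full http(s) URL.

```python
from typing import List
from urllib.parse import urlsplit


def check_redirect_line(line: str) -> List[str]:
    """Return a list of problems found in a 'Redirect 301 old new' line."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "Redirect":
        return ["expected: Redirect <status> <old-path> <new-url>"]
    _, status, old_path, new_url = parts
    problems = []
    if status not in ("301", "permanent"):
        problems.append("status is not a permanent redirect")
    if not old_path.startswith("/"):
        problems.append("old path must start with a slash")
    target = urlsplit(new_url)
    if target.scheme not in ("http", "https") or not target.netloc:
        problems.append("new URL should be a full http(s) URL")
    return problems


print(check_redirect_line("Redirect 301 /old-page.html http://www.yourdomaingoeshere.com/newurl/"))
print(check_redirect_line("Redirect 301 old-page.html /newurl/"))
```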

Keszi

Hi there,

From what you are describing, the first thing that comes to mind is a wrongly implemented relative URL.
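To illustrate what I mean by that: a link written without a leading slash is resolved against the current page's path, so the same href produces a different (and usually non-existent) URL on every page. A small sketch with Python's urljoin; the example.com URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve(page_url: str, href: str) -> str:
    """Resolve an href the way a browser or crawler would on the given page."""
    return urljoin(page_url, href)


page = "https://www.example.com/blog/post-title/"

# Missing leading slash: the link is appended to the current path.
print(resolve(page, "category/shoes/"))
# Root-relative link: resolves to the same URL from any page.
print(resolve(page, "/category/shoes/"))
```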

What I would do in this case: run a full crawl of the website with Screaming Frog (you will need a paid version) and make a bulk export of the 404 inlinks via: Bulk Export -> Response Codes -> Client Error (4xx) Inlinks. I would use that list to find a pattern in the anchor texts used to generate these kinds of URLs.

When you have found a pattern, you can dig into the source code of the pages where the links come from.

If you don't have a Screaming Frog license, send me a PM with the website and I will make a quick crawl for you.

Istvan

Keszi

Hi,

You could add the following code to your .htaccess to redirect all dated URLs to the non-dated version:

RedirectMatch 301 ^/blog/([0-9]+)/([0-9]+)/([0-9]+)/(.*)$ http://www.domain.com/blog/$4

Replace domain.com with your domain name.

This should create a redirect from http://www.website.com/blog/2016/04/10/topic-on-how-to-optimise-blog to http://www.website.com/blog/topic-on-how-to-optimise-blog (and for every similar URL).
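Before deploying a rule like this, you can test the pattern offline. The sketch below is Python, not Apache, and it uses a variant of the pattern that keeps the /blog/ prefix, which is my assumption based on the example URL above. Python's re and Apache's PCRE behave the same for a pattern this simple.

```python
import re

# Assumption: the blog lives under /blog/ and dated posts use /YYYY/MM/DD/.
DATED = re.compile(r"^/blog/([0-9]+)/([0-9]+)/([0-9]+)/(.*)$")


def strip_date(path: str) -> str:
    """Return the non-dated path, or the path unchanged if it is not dated."""
    match = DATED.match(path)
    if not match:
        return path
    # Group 4 is everything after the date segments.
    return "/blog/" + match.group(4)


print(strip_date("/blog/2016/04/10/topic-on-how-to-optimise-blog"))
print(strip_date("/blog/about-us"))
```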

Keszi

Hey,

If you check today's whiteboard Friday with Dr. Pete (https://moz.com/blog/arent-301s-302s-canonicals-all-basically-the-same-whiteboard-friday), he mentions this case:

"Some types of 302s just don't make sense at all. So if you're migrating from non-secure to secure, from HTTP to HTTPS and you set up a 302, that's a signal that doesn't quite make sense. Why would you temporarily migrate?"

So, to answer your question: Google probably treated your initial http -> https redirects as 301s.

Keszi

Hey there,

Try to avoid using both canonical and noindex; it is not advised (source: https://www.seroundtable.com/noindex-canonical-google-18274.html).

If you are using a canonical on the subdomain, it should be more than enough to deindex the subdomain version (if it gets indexed at all) and resolve the duplicate content issue.
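If you want to spot pages that send both signals, a rough check is sketched below. It flags a page only when it has a robots noindex and a canonical pointing to a different URL; the class and function names are hypothetical, and it looks only at the HTML (not at X-Robots-Tag or Link headers).

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Records the canonical href and whether a robots noindex is present."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.noindex = "noindex" in (attr.get("content") or "").lower()


def has_conflicting_signals(html: str, page_url: str) -> bool:
    """True if the page is noindexed and canonicalised to another URL."""
    signals = HeadSignals()
    signals.feed(html)
    points_elsewhere = signals.canonical is not None and signals.canonical != page_url
    return signals.noindex and points_elsewhere
```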

Greetings, Keszi

Keszi

Hi Jose,

The canonical "Warning" is a notification. Tools cannot tell which is the original page, but can alert you that you have a canonical link on the specific URL.

With this report and a little Excel work you can double-check your canonical implementation.

Greetings, Keszi

Keszi

@Keszi

Posts made by Keszi
