Hi there,
I'm using Firecheckout on a few projects, and it is really easy to use. (M 1.9.3.x)
Oh, sorry. Somehow I didn't get any notification on your reply.
For IIS you could go with the web.config of your website. Note that the attribute names are camelCase, and the "not www" condition needs negate="true" rather than a character class. The rule will be something like:
<rule name="Force WWW and SSL" enabled="true" stopProcessing="true">
  <match url="(.*)" />
  <conditions logicalGrouping="MatchAny">
    <add input="{HTTP_HOST}" pattern="^www\." negate="true" />
    <add input="{HTTPS}" pattern="off" />
  </conditions>
  <action type="Redirect" url="https://www.domainname.com/{R:1}" appendQueryString="true" redirectType="Permanent" />
</rule>
Hi Sammy,
If I understand your question correctly, you need help with .htaccess code to force both HTTPS and www with the same rule? If so, this might be what you are looking for:
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\.domainname\.com$ [NC]
RewriteRule ^(.*)$ https://www.domainname.com/$1 [L,R=301]
Hi there,
Will the URL structure remain the same? If so, you should add the following to the .htaccess file of the subdomain:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^shop\.domain\.co\.uk$ [NC]
RewriteRule (.*) https://www.domain.co.uk/$1 [R=301,L]
this should do the trick to redirect https://shop.domain.co.uk/product-category/great-merchandise/?product_order=desc to https://www.domain.co.uk/product-category/great-merchandise/?product_order=desc
I hope this helped 
You have my details on my profile. And after we resolve it, we should post the solution here without domain-specific information, so it helps others in the future (if you don't mind).
Hi there,
Probably what is happening is that your plugins are not optimized for redirects. You should address it from your .htaccess file instead (the plugin probably adds the redirects, but they are not optimized). If you can give me access, I can help you out.
Hi James,
So far as I can see you have the following architecture:
Since the listing page pagination is blocked in robots.txt, only the first 15 job postings are available to crawl via a normal crawl.
I would say you should remove the blocking from robots.txt and focus on implementing correct pagination. Which method you choose is your decision, but allow the crawler to access all of your job posts. Check https://yoast.com/pagination-seo-best-practices/
Another thing I would change is to make the job post title the anchor text for the link to the job posting (right now every single job is linked with "Find out more").
Also, if possible, create a separate sitemap.xml for your job posts and submit it in Search Console; this way you can keep track of any anomaly with indexation.
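A job-specific sitemap can be a minimal XML file along these lines (the URLs and date are placeholders, not taken from the actual site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per job posting; lastmod helps crawlers spot updates -->
  <url>
    <loc>https://www.domainname.com/jobs/example-job-posting/</loc>
    <lastmod>2018-01-15</lastmod>
  </url>
</urlset>
```

Submitting it as a separate sitemap in Search Console lets you see indexed vs. submitted counts for just the job posts.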
Last, and not least, focus on the quality of your content (just as Matt proposed in the first answer).
Good luck!
In my experience, it will help the overall site, but still, do not expect a huge impact from these. URLs are shared, but I don't believe people will start to link to them except in private conversations.
This is a technical question that they need to tackle from the database side. It can be implemented, but it needs a few extra development hours, depending on the complexity of your website architecture/CMS used/etc. Anyway, you are changing the URLs, so don't forget about the best practices for them. Good luck!
Hi there,
I believe the most logical implementation would be to use "noindex, follow" meta robots on these pages.
I wouldn't use a canonical because it does not serve this purpose. Also make sure these pages are not blocked via robots.txt.
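In practice that means a meta robots tag like the sketch below (a generic illustration, not taken from the actual site):

```html
<!-- in the <head> of the pages that should stay out of the index -->
<!-- "noindex, follow": drop the page from the index but still follow its links -->
<meta name="robots" content="noindex, follow">
```

If the page is blocked in robots.txt, the crawler never sees this tag, which is why the pages must remain crawlable.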
Hi Rachel,
Like I mentioned in the previous question, the best case would be to translate them (especially if you are creating a group of redirects now). But if that won't be implemented, then regardless, check the following after implementation:
Btw., a "must read" article when we are talking about hreflang: https://moz.com/blog/hreflang-behaviour-insights by Dave Sottimano.
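For reference, hreflang annotations are reciprocal link tags in the <head> of each language version; a sketch with placeholder URLs (example.com and the page names are just illustrations):

```html
<!-- every language version lists all alternates, including itself -->
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/page-uk.html" />
<link rel="alternate" hreflang="es" href="https://www.example.com/page-es.html" />
<!-- x-default marks the fallback for users matching no listed language -->
<link rel="alternate" hreflang="x-default" href="https://www.example.com/page.html" />
```

The set must be identical and reciprocal on every version, otherwise the annotations are ignored.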
The idea (which we both highlighted) is that blocking your listing page via robots.txt is wrong; for pagination there are several methods to deal with it (which one you choose really depends on the technical possibilities you have on the project).
Regarding James' original question, my feeling is that he is somehow blocking the posting pages. Cutting off access to these pages makes it really hard for Google, or any other search engine, to index them. But without a URL in front of us, we cannot really answer his question; we can only create theories that he can test.
Hi Rachel,
Regarding the language code in the URL, you can leave it (page-uk.html, page-es.html, etc.), but maybe it would be an idea to have a translated page URL for each language. For example:
This would serve a little bit better than the previous version, where you would have:
Usually, we are talking about a wrongly coded page, which gives you a loop when crawling. If you can show the site itself, I will gladly help you find it.
If you cannot disclose the URL, you can do the following: run a crawl with a tool such as Screaming Frog, filter for these URLs, and check their inlinks and anchor texts specifically. When you find a pattern, you will find where the code is broken.
Good luck!
Hi James,
First of all, you need to categorize these 404 pages: some may come from website sections that were deleted in the past and haven't been addressed; other 404 URLs could appear through domain redirects to your website (unfortunately, these are harder to find, process and resolve).
For the first category (when sections have been deleted, moved, etc.) you will have to ask yourself which is the correct way to resolve the issue: a 301 redirect to the most relevant URL? Or just serve a 410 and let search engines know that your pages are deleted and the URLs should be removed from the index? Please don't start redirecting every single 404 URL to your homepage or any other irrelevant page, or you will be creating soft 404s.
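In .htaccess both options can be as simple as the sketch below (the paths are placeholders, assuming Apache with mod_rewrite enabled):

```apache
RewriteEngine On

# Deleted section: answer 410 Gone so search engines drop the URLs from the index
RewriteRule ^old-section/ - [G,L]

# Moved page: 301 to the most relevant replacement, not to the homepage
Redirect 301 /old-page.html /new-relevant-page.html
```

The [G] flag is shorthand for sending a 410 response for everything matching the pattern.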
Regarding the /insertgibberishurlhere type of URLs, you should check what kind of domains are redirected to your domain (I have seen projects where this kind of 404 error came from massive domain redirects). If this is the case, first ask yourself what you are redirecting to your website. If the domains are not in your hands, you could also redirect all of these to a URL on your website that returns a 410 status code.
Oh, and the obvious version: crawl your website with a tool such as Screaming Frog, and make sure you are not creating the 404 URLs yourself. (I almost forgot about this one.)
Let me know, if you have further questions.
Sorry Richard, but using noindex together with a canonical link is not quite a good practice.
It's an old entry, but still true: https://www.seroundtable.com/noindex-canonical-google-18274.html
Hi James,
Regarding the robots.txt syntax:
Disallow: /jobs/? basically blocks every single URL whose path starts with /jobs/?
For example: domain.com/jobs/?sort-by=... will be blocked.
If you want to disallow query parameters anywhere under /jobs/, the correct implementation would be Disallow: /jobs/*?, or you can even specify which query parameter you want to block, for example Disallow: /jobs/*?page=
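Put together, a robots.txt sketch of the difference (the paths are illustrative, pick only the rule you actually need):

```
User-agent: *
# Matches only paths that literally start with /jobs/?
Disallow: /jobs/?
# Matches any query string under /jobs/ (the * covers the path in between)
Disallow: /jobs/*?
# Matches only the page parameter under /jobs/
Disallow: /jobs/*?page=
```

You can verify which URLs each pattern catches with the robots.txt tester in Search Console before deploying.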
My question to you: are these jobs linked from any other page and/or a sitemap? Or only from the listing page, which has its pagination, sorting, etc. blocked by robots.txt? If they are not linked, it could be a simple case of orphan pages, where the crawler cannot access the job posting pages because there is no actual link to them. I know it is an old rule, but it is still true: Crawl > Index > Rank.
BTW, I don't know why you would block your pagination at all. There are more optimal implementations.
And there is always the scenario that was already described by Matt. But I believe in that case you would have at least some of the pages indexed, even if they are not going to rank well.
Also, make sure other technical implementations are not stopping your job posting pages from being indexed.
I'll check in a little bit later, currently, I am getting a DNS error when trying to access it.
Hey,
Try Fetch and Render in Google Search Console. There you can check whether there is an X-Robots-Tag in the response header.
For reference: https://developers.google.com/search/reference/robots_meta_tag
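The header itself is added server-side; in Apache a rule that quietly keeps files out of the index can look like this generic sketch (not necessarily what your server has):

```apache
# If a config like this exists, every matching file is served with
# an X-Robots-Tag header and will be dropped from the index
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

So even with no meta robots tag in the HTML, check the HTTP headers before ruling out an indexation block.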