Partial Match or RegEx in Search Console's URL Parameters Tool?

Andy.Drinkwater

Hi Ria,

I have never tried regular expressions in this way, so I can't tell you if this would work or not.

However, if all 1,000 of these URLs are already indexed, just disallowing access won't remove them from Google. Ideally, you would place a noindex tag on those pages and let Google act on it; then you will be good to disallow. I am pretty sure there is no noindex option in the URL Parameters Tool.

I hope that makes sense?

-Andy

Ria_

Hi Martijn, thanks for your response!

I'm currently looking at something like this...

user-agent: *
#disallowing page.php and any parameters after it
disallow: /page.php
#but leaving anything that starts with par1=ABC
allow: /page.php?par1=ABC

I would have thought you could disallow things broadly like that and then make an exception, as you can for files in disallowed folders. But it's not passing Google's robots.txt Tester.

One thing that's probably worth mentioning is that there are only two values of the par1 parameter that I want to allow; for example's sake, ABC123 and ABC456. So it would need to be either a partial match or a "this or that" kind of deal, disallowing everything else.
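According to Google's documentation (and RFC 9309), a broad Disallow with narrower Allow lines should be able to express this kind of exception: when several rules match a URL, the longest (most specific) path wins, and Allow wins a tie. The following is a minimal Python sketch of that precedence for checking rules offline. It is an approximation, not Google's parser, and the rule list is a hypothetical version that uses the two example values ABC123 and ABC456 in place of the ABC prefix.

```python
import re

# Hypothetical rule set: the thread's Disallow plus one Allow per value to keep.
RULES = [
    ("disallow", "/page.php"),
    ("allow", "/page.php?par1=ABC123"),
    ("allow", "/page.php?par1=ABC456"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a Python regex.

    Only '*' (any run of characters) and a trailing '$' (end of URL) are
    treated as special; every other character, including '.' and '?', is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty rule value matches nothing
        if pattern_to_regex(pattern).match(path):  # re.match anchors at the start
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


for path in [
    "/page.php",                       # False: only the Disallow matches
    "/page.php?par1=ABC123",           # True: the longer Allow wins
    "/page.php?par1=ABC456&sort=asc",  # True
    "/page.php?par1=XYZ",              # False
    "/page.php?sort=asc&par1=ABC123",  # False: par1 is not the first parameter
]:
    print(path, is_allowed(path))
```

The last path is blocked because patterns match from the start of the path, so this only works when par1 comes first; a looser pattern such as `/page.php?*par1=ABC123` would also cover other parameter orders. Appending `$` (for example `/page.php?par1=ABC123$`) would stop a value like `ABC1234` from matching as well.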

Andy.Drinkwater

Sorry Martijn, just to jump in here for a second: Ria, you can test this with the robots.txt Tester in Search Console before going live to make sure it works.

-Andy

Martijn_Scheijbeler

My guess would be that this line needs an * at the end.
Allow: /page.php?par1=ABC*

Ria_

Hi Andy,

Disallowing them would really be my first priority, before removing them from the index. I didn't want to remove them before blocking Google from crawling them, in case they get added back the next time Google comes a-crawling, as has happened before when I've simply removed a URL here and there. Does that make sense, or am I getting myself mixed up here?

My other hack of a solution would be to check the URL in page.php and, if it includes par1=ABC, insert a noindex meta tag. (Not sure whether that would work well or not...)
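A minimal sketch of that server-side check, written in Python for illustration even though page.php is presumably PHP. `KEEP_VALUES`, `robots_meta_for` and the example.com URLs are hypothetical. This version adds noindex to every page.php URL whose par1 is *not* one of the two values to keep, which is the set the robots.txt rules in this thread are meant to block; if the intent is the reverse, the condition flips. As Andy points out, the tag only has an effect while Google can still crawl the page, so it would need to be live before the Disallow goes in.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical: the par1 values that should stay indexed (example values from the thread).
KEEP_VALUES = {"ABC123", "ABC456"}

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Return the tag to print inside <head>, or "" to leave the page indexable."""
    values = parse_qs(urlsplit(url).query).get("par1", [])
    # Keep the page only if par1 is present and every par1 value is an allowed one.
    if values and all(v in KEEP_VALUES for v in values):
        return ""
    return NOINDEX_TAG


print(robots_meta_for("https://example.com/page.php?par1=ABC123"))  # prints an empty line
print(robots_meta_for("https://example.com/page.php?par1=XYZ"))     # prints the noindex tag
```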

Ria_

I thought so too, but according to Google the trailing wildcard is completely unnecessary, and only needs to be used mid-URL.
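This can be checked with a simplified model of robots.txt matching: because rules already match as prefixes, a trailing `*` adds nothing, so `/page.php?par1=ABC` and `/page.php?par1=ABC*` match exactly the same paths. The sketch below is a Python approximation of the documented matching rules, not Google's implementation, and the sample paths are made up.

```python
import re


def robots_match(pattern: str, path: str) -> bool:
    """Prefix match where '*' is a wildcard and a final '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


# Hypothetical sample paths: both forms of the Allow line agree on every one.
for p in ["/page.php?par1=ABC", "/page.php?par1=ABC123", "/page.php?par1=XYZ", "/page.php"]:
    print(p, robots_match("/page.php?par1=ABC", p), robots_match("/page.php?par1=ABC*", p))
```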

Ria_

Yep, I have. (I briefly mentioned it in my previous response.) It doesn't pass.

Andy.Drinkwater

"Disallowing them would be my first priority really, before removing from index."

The trouble with this is that if you disallow first, Google won't be able to crawl the pages to see the noindex. If you add the noindex first, Google will drop them from the index the next time it comes a-crawling, and then you'll be good to disallow.

I'm not actually sure of the best way for you to get the noindex into the header of those pages, though.

-Andy

Andy.Drinkwater

Ah, sorry, I missed that bit!

-Andy

DirkC

Don't forget that . and ? have a specific meaning within regex; if you want to use them for pattern matching, you will have to escape them. Also be aware that not all bots are capable of interpreting regex in robots.txt, so you might want to be more explicit with the user agent and only use regex for Googlebot.

User-agent: Googlebot
#disallowing page.php and any parameters after it
disallow: /page.php
#but leaving anything that starts with par1=ABC
allow: /page.php?par1=ABC

Dirk
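The escaping point applies whenever a rule is read as a regular expression; the short Python illustration below (Python's `re` module standing in for a generic regex engine) shows what goes wrong unescaped. For robots.txt itself, RFC 9309 and Google's documentation list only `*` and `$` as special in Allow/Disallow paths, so `.` and `?` there are matched literally and need no escaping.

```python
import re

rule = "/page.php?par1=ABC"
url_path = "/page.php?par1=ABC123"

# Read as a regex, '.' matches any character and '?' makes the preceding 'p'
# optional, so the pattern fails on the literal '?' in the URL.
print(re.match(rule, url_path))  # None

# Escaping turns '.' and '?' back into literal characters.
print(re.match(re.escape(rule), url_path) is not None)  # True
```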

Ria_

Haha, I think the train passed the station on that one. I would have realised eventually... XD

Thanks for your help!

Ria_

Thank you!

Andy.Drinkwater

No problem

Hope you get it sorted!

-Andy
