Can you see the 'indexing rules' that are in place for your own site?

Visually

By 'indexing rules' I mean the conditions that determine whether or not a given page will be indexed.

If you can see them - how?

KeriMorgret

Hi David,

Do you mean how robots.txt is configured, and whether the robots file is blocking a certain page from being crawled (and so, in most cases, indexed)? If so, yes. If the file is complex and you're not sure whether it's blocking a particular page, you can go into Google Webmaster Tools, which has a robots.txt utility: you input a particular URL, and it tells you whether the robots.txt file you are using (or proposing) blocks that URL.
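For a quick local check along the same lines, Python's standard-library `urllib.robotparser` can evaluate a set of rules against a URL. This is only a minimal sketch: the rules and the example.com URLs below are made up for illustration, and a search engine's own parser may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the lines from your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /users/unverified/
"""


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above disallow `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


# Illustrative URLs only.
print(is_blocked("https://example.com/users/unverified/david"))
print(is_blocked("https://example.com/users/david"))
```

The first URL falls under the `Disallow` path and the second does not, so the two calls should disagree.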

Or do you mean whether the page is high enough quality for a search engine to choose to index it? If so, no. That's part of the algorithm, and none of the major engines are that nice and open.

Visually

On my site, a rule was created where user profile pages are blocked from robots unless the user has submitted a minimum number of 'files'. This was done to ensure that only quality user profile pages are indexed, and not spam/untouched profiles.

Other rules like this have been created, but I don't know what they are and I'd like to find out.

Visually

To hopefully clarify what I'm talking about, here is an example: SEOmoz will remove the "nofollow" attribute from the first link in your profile if you get 200 mozpoints.

This is a fixed rule which I believe is applied automatically once a user reaches the minimum. On my site, a similar rule exists where the meta noindex tag is removed from a user page once that user submits 10 'files'.

Other rules similar to this were created and I need to know what they are. How can I find them?

Dr-Pete

Unfortunately, that would be specific to your own platform and server-side code. When you look at the page source on SEOmoz, you're either going to see a nofollow or you're not. The code that drives that is on our servers and is unique to our build (PHP/Cake, I think).
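To illustrate the point that only the result is visible from outside, here is a rough sketch that reads the robots directives out of a page's HTML using Python's built-in `html.parser`. The class name, function name, and sample markup are hypothetical. It shows what a given page currently emits, not the server-side rule that decided it.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in `html` (empty if none)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Made-up sample page; in practice you would pass in a saved profile page.
SAMPLE = '<html><head><meta name="robots" content="NOINDEX, follow"></head><body></body></html>'
print("noindex" in robots_directives(SAMPLE))
```

Running this over profile pages with different numbers of submitted 'files' can hint at where a threshold sits, but confirming the actual rule still means reading the server-side code.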

You'd have to dig into the source code that generates the Robots.txt file. The file has to be served at the .txt URL, so either a piece of code regenerates a static Robots.txt (probably on a timer), or the server routes requests for that URL to a script. It could be called something similar, like Robots.php, Robots.aspx, etc. Just a guess.

FYI, a dynamic Robots.txt could be a little dicey - it might be better to do this with a META NOINDEX in the <head> of the user profile pages. That would also avoid the timer approach. The pages would dynamically NOINDEX themselves as they're created.
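A rule like that might look something like the sketch below. This is an assumption about how such a rule could be written, not the code actually running on any site: the function name and constant are hypothetical, and the threshold of 10 is taken from the example earlier in the thread.

```python
# Threshold mentioned in the thread; the real value lives in the site's own code.
MIN_FILES_FOR_INDEXING = 10


def robots_meta_tag(files_submitted: int) -> str:
    """Return the tag to emit in a profile page's <head>, or '' if indexable."""
    if files_submitted < MIN_FILES_FOR_INDEXING:
        return '<meta name="robots" content="noindex">'
    return ""


# A profile with 3 files gets the noindex tag; one with 10 gets nothing.
print(repr(robots_meta_tag(3)))
print(repr(robots_meta_tag(10)))
```

Searching the site's templates and code for strings like `noindex` or `robots` is one plausible way to find where conditions of this kind are defined.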
